Load species-specific annotations needed to run SeedMatchR()

load_annotations uses AnnotationHub::AnnotationHub() to load species-specific annotations. This function currently works for human, rat, and mouse.

Usage

load_annotations(
  reference.name = c("hg38", "hg38-old", "mm39", "mm10", "rnor6", "rnor7"),
  feature.type = c("3UTR", "5UTR", "exons", "cds"),
  standard.chroms = TRUE,
  remove.na.rows = TRUE,
  protein.coding = TRUE,
  transcript.support = NULL,
  canonical = TRUE,
  gene_id = NULL,
  tx_id = NULL,
  symbol = NULL,
  entrez_id = NULL,
  longest.utr = FALSE,
  reduce.features = FALSE,
  add.filter = NULL,
  min.feature.width = 8,
  return_gene_name = TRUE
)

Arguments

reference.name: Reference build name. Options: hg38, hg38-old, mm39, mm10, rnor6, rnor7
feature.type: The transcript feature type to extract. Options: "3UTR", "5UTR", "exons", "cds"
standard.chroms: Keep only standard chromosomes. Boolean.
remove.na.rows: Remove transcripts with NA in the ID column. Boolean.
protein.coding: Keep only protein coding genes. Boolean.
transcript.support: Filter by transcript support level. Integer 1-4
canonical: Keep only the ENSEMBL canonical transcript. Boolean.
gene_id: Keep specific ENSEMBL ID(s). Input is a vector
tx_id: Keep specific tx_id(s). Input is a vector
symbol: Keep specific gene symbol(s). Input is a vector
entrez_id: Keep specific ENTREZ ID(s). Input is a vector
longest.utr: Reduce annotations to the longest UTR per gene. Boolean.
reduce.features: Reduce 3' UTRs to non-overlapping intervals across all transcripts per gene. Boolean.
add.filter: Optional AnnotationFilter::AnnotationFilterList to include
min.feature.width: The minimum length for features in nucleotides.
return_gene_name: If TRUE, the gene_id is used to name the sequences. Boolean.

Value

The following are returned in the object:

$gtf: A GenomicRanges::GRanges object containing the transcript information
$txdb: An ensembldb::EnsDb object of transcript information
$dna: The rtracklayer::TwoBitFile DNA sequence
$features: A GenomicRanges::GRangesList object representing the features of interest
$seqs: A Biostrings::DNAStringSet for the features in $features
$filter: An AnnotationFilter::AnnotationFilterList used for filtering the ensembldb::EnsDb object
$tx.names: A list of transcript names in $seqs
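
As a rough illustration of how the returned components might be inspected, the sketch below assumes the result behaves like a named list, as the `$` names above suggest. The call downloads data through AnnotationHub, so it is guarded by interactive(); the variable name anno.db is only a convention borrowed from the Examples section.

```r
if (interactive()) {
  library(SeedMatchR)

  # Downloads annotations through AnnotationHub; this can be slow on first use.
  anno.db <- load_annotations("hg38", feature.type = "3UTR")

  # Feature ranges and the sequences extracted for them.
  anno.db$features
  anno.db$seqs

  # Number of extracted sequences and the first few sequence names.
  length(anno.db$seqs)
  head(names(anno.db$seqs))
}
```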

Details

This function is designed to perform all annotation loading, parsing, and sequence extractions for the reference of interest.

load_annotations uses the ensembldb package to handle annotations, so transcripts of interest can be selected with AnnotationFilter::AnnotationFilter objects.

Most of the arguments to load_annotations are geared towards customizing the output features and sequences extracted from the ensembldb::EnsDb object. Filtering parameters are inherited from build_annotation_filter().
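
A minimal sketch of supplying an extra filter through add.filter follows. The filter constructors come from the AnnotationFilter package; the particular filters chosen (chromosome "1" with Ensembl-style naming, protein-coding transcripts) are illustrative assumptions, and how add.filter is combined with the built-in filtering arguments is not specified here. The download step is guarded by interactive().

```r
if (requireNamespace("AnnotationFilter", quietly = TRUE)) {
  # Combine two filters with a logical AND:
  # features on chromosome 1 that belong to protein-coding transcripts.
  custom.filter <- AnnotationFilter::AnnotationFilterList(
    AnnotationFilter::SeqNameFilter("1"),
    AnnotationFilter::TxBiotypeFilter("protein_coding"),
    logicOp = "&"
  )

  if (interactive()) {
    # Requires network access to AnnotationHub.
    anno.db <- SeedMatchR::load_annotations(
      "hg38",
      feature.type = "3UTR",
      add.filter = custom.filter
    )
  }
}
```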

Examples

if (interactive()) {
  anno.db <- load_annotations("hg38")
}