Load species-specific annotations needed to run SeedMatchR()
load_annotations() uses AnnotationHub::AnnotationHub() to load species-specific annotations. This function currently works for human, rat, and mouse.
Usage
load_annotations(
reference.name = c("hg38", "hg38-old", "mm39", "mm10", "rnor6", "rnor7"),
feature.type = c("3UTR", "5UTR", "exons", "cds"),
standard.chroms = TRUE,
remove.na.rows = TRUE,
protein.coding = TRUE,
transcript.support = NULL,
canonical = TRUE,
gene_id = NULL,
tx_id = NULL,
symbol = NULL,
entrez_id = NULL,
longest.utr = FALSE,
reduce.features = FALSE,
add.filter = NULL,
min.feature.width = 8,
return_gene_name = TRUE
)
Arguments
- reference.name
Reference build name. Options: "hg38", "hg38-old", "mm39", "mm10", "rnor6", "rnor7"
- feature.type
The transcript feature type to extract. Options: "3UTR", "5UTR", "exons", "cds"
- standard.chroms
Keep only standard chromosomes. Boolean.
- remove.na.rows
Remove transcripts with NA in the ID column. Boolean.
- protein.coding
Keep only protein coding genes. Boolean.
- transcript.support
Filter by transcript support level. Integer 1-4
- canonical
Keep only the ENSEMBL canonical transcript. Boolean.
- gene_id
Keep specific ENSEMBL gene ID(s). Input is a vector.
- tx_id
Keep specific transcript ID(s). Input is a vector.
- symbol
Keep specific gene symbol(s). Input is a vector.
- entrez_id
Keep specific ENTREZ ID(s). Input is a vector.
- longest.utr
Reduce annotations to the longest UTR per gene. Boolean.
- reduce.features
Reduce 3' UTRs to non-overlapping intervals across all transcripts per gene.
- add.filter
Optional AnnotationFilter::AnnotationFilterList to include
- min.feature.width
The minimum length for features in nucleotides.
- return_gene_name
If TRUE, the gene_id will be used to name the sequences. Boolean.
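As a rough illustration of how these arguments combine, the sketch below requests canonical, protein-coding 3' UTRs from the hg38 build for two genes. It assumes SeedMatchR is installed and that AnnotationHub resources can be downloaded; the gene symbols are arbitrary examples and do not come from this page.

```r
# Arguments collected in a list so they can be inspected before the call.
# Every name here appears in the Usage signature above.
anno_args <- list(
  reference.name    = "hg38",
  feature.type      = "3UTR",
  protein.coding    = TRUE,
  canonical         = TRUE,
  symbol            = c("TTR", "HAO1"),  # illustrative gene symbols (assumption)
  min.feature.width = 8
)

# The call is attempted only in an interactive session with SeedMatchR
# installed, because it fetches annotation resources over the network.
if (interactive() && requireNamespace("SeedMatchR", quietly = TRUE)) {
  annodb <- do.call(SeedMatchR::load_annotations, anno_args)
}
```

Arguments left out of the list keep the defaults shown in the Usage block.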
Value
The returned object contains the following elements:
- $gtf
A GenomicRanges::GRanges object containing the transcript information.
- $txdb
An ensembldb::EnsDb object of transcript information.
- $dna
The rtracklayer::TwoBitFile DNA sequence.
- $features
A GenomicRanges::GRangesList object representing the features of interest.
- $seqs
A Biostrings::DNAStringSet for the features in $features.
- $filter
An AnnotationFilter::AnnotationFilterList used for filtering the ensembldb::EnsDb object.
- $tx.names
A list of transcript names in $seqs.
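One quick way to see what came back is to report the class of each documented element. The helper below is hypothetical (it is not part of SeedMatchR) and is demonstrated on a stand-in list so that it runs without downloading anything; with a real result, the object returned by load_annotations() would be passed instead.

```r
# Element names taken from the Value section.
documented_elements <- c("gtf", "txdb", "dna", "features",
                         "seqs", "filter", "tx.names")

# Hypothetical helper: return the first class of each documented element,
# or NA when the element is absent from the object.
describe_annotations <- function(anno) {
  vapply(
    documented_elements,
    function(nm) {
      if (is.null(anno[[nm]])) NA_character_ else class(anno[[nm]])[1]
    },
    character(1)
  )
}

# Stand-in object (assumption): a plain list with two of the elements.
mock_anno <- list(seqs = c(tx1 = "ACGT"), tx.names = list("tx1"))
describe_annotations(mock_anno)
```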
Details
This function is designed to perform all annotation loading, parsing, and sequence extraction for the reference of interest. load_annotations() uses the ensembldb package for handling annotations. This allows AnnotationFilter::AnnotationFilter objects to be used for easily selecting transcripts of interest.
Most of the arguments to load_annotations() are geared towards customizing the output features and sequences extracted from the ensembldb::EnsDb object. Filtering parameters are inherited from build_annotation_filter().
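For selections that the dedicated arguments do not cover, an AnnotationFilter::AnnotationFilterList can be built by hand and supplied through add.filter. The sketch below assumes the AnnotationFilter package is installed; the chromosome and biotype values are illustrative, and how the list is merged with the filters built from the other arguments is not described on this page.

```r
# Build the filter list only when AnnotationFilter is available.
if (requireNamespace("AnnotationFilter", quietly = TRUE)) {
  extra_filter <- AnnotationFilter::AnnotationFilterList(
    # Ensembl-style sequence name, without a "chr" prefix (illustrative value).
    AnnotationFilter::SeqNameFilter("1"),
    # Restrict to one transcript biotype (illustrative value).
    AnnotationFilter::TxBiotypeFilter("protein_coding")
  )
  print(extra_filter)
}
```

The resulting object could then be passed as `add.filter = extra_filter` in a load_annotations() call.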