Find seed matches in a DNAStringSet object of sequences. This function uses get.seed to extract the seed sequence from the guide sequence. The seed is then searched for across all sequences in the DNAStringSet object using vcountPattern.
This function returns the input DESeq2 results data.frame with an additional column that contains the match counts for the chosen seed.name.
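The two steps described above (extract a seed from the guide, then count its occurrences in every sequence) can be illustrated with a small base-R sketch. The helpers seed_7mer_m8 and count_seed are hypothetical stand-ins for get.seed and vcountPattern, not SeedMatchR functions. The sketch assumes the conventional 7mer-m8 definition (the target site is the reverse complement of guide positions 2 to 8), which may differ in detail from what get.seed returns, and it ignores the mismatches and indels options.

```r
# Hypothetical helper (not SeedMatchR's get.seed): build a 7mer-m8 target
# site, assuming it is the reverse complement of guide positions 2-8.
seed_7mer_m8 <- function(guide) {
  guide <- toupper(chartr("Uu", "Tt", guide))       # RNA -> DNA alphabet
  seed <- substr(guide, 2, 8)                        # guide positions 2-8
  comp <- chartr("ACGT", "TGCA", seed)               # complement each base
  paste(rev(strsplit(comp, "")[[1]]), collapse = "") # reverse it
}

# Hypothetical helper (a base-R stand-in for vcountPattern): count exact
# occurrences of `site` in each element of a character vector.
count_seed <- function(seqs, site) {
  # A zero-width lookahead lets overlapping occurrences count separately
  hits <- gregexpr(paste0("(?=", site, ")"), seqs, perl = TRUE)
  # gregexpr reports -1 when there is no match, so count positive positions
  counts <- vapply(hits, function(h) sum(h > 0), integer(1))
  setNames(counts, names(seqs))
}
```

For the guide used in the examples, seed_7mer_m8("UUAUAGAGCAAGAACACUGUUUU") returns "CTCTATA", and count_seed(c("AACTCTATAGG", "GGGG"), "CTCTATA") returns c(1, 0).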
Usage
SeedMatchR(
res,
gtf,
seqs,
sequence,
seed.name = "mer7m8",
col.name = NULL,
mismatches = 0,
indels = FALSE,
tx.id.col = TRUE
)
Arguments
- res
A DESeq2 results data.frame.
- gtf
GTF annotation used to map features to genes. The object must have the columns transcript_id and gene_id.
- seqs
The DNAStringSet object with sequence information for features. The sequences should be named by transcript.
- sequence
The DNAString guide sequence, oriented 5' to 3'.
- seed.name
The name of the seed to extract. Options are: mer8, mer7A1, mer7m8, mer6.
- col.name
The string to use for the new column name. Defaults to the value of seed.name.
- mismatches
The number of mismatches to allow in the search.
- indels
Whether to allow indels in the search.
- tx.id.col
Whether to use the transcript_id column instead of gene_id.
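The mismatches and indels arguments control how strictly a sequence must match the seed. As a rough illustration of the indels = FALSE case, the sketch below counts every window of a sequence that differs from the site by at most a given number of substitutions (Hamming distance). The function count_with_mismatches is hypothetical and not part of SeedMatchR. Biostrings' vcountPattern exposes max.mismatch and with.indels arguments for this purpose; how SeedMatchR forwards its own arguments is not shown here.

```r
# Hypothetical helper: count windows of `subject` that match `site` with
# at most `mismatches` substitutions (no insertions or deletions).
count_with_mismatches <- function(subject, site, mismatches = 0) {
  k <- nchar(site)
  n <- nchar(subject)
  if (n < k) return(0L)  # subject too short to contain the site
  starts <- seq_len(n - k + 1)
  # Every length-k window of the subject
  windows <- substring(subject, starts, starts + k - 1)
  site_chars <- strsplit(site, "")[[1]]
  # Number of mismatched positions in each window
  mm <- vapply(strsplit(windows, ""), function(w) sum(w != site_chars), integer(1))
  sum(mm <= mismatches)
}
```

Because indels are excluded, each candidate window has exactly the length of the site, which keeps the comparison a simple position-by-position check.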
Value
The input DESeq2 results data.frame with an additional column, named after the chosen seed (or col.name, if supplied), that contains the number of seed matches for each feature.
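A minimal sketch of how per-feature counts might be appended to a results data.frame, mirroring the documented col.name default. The helper append_seed_counts is hypothetical, and it assumes that the row names of res are the same feature IDs used to name the counts; SeedMatchR's actual matching through the gtf columns (controlled by tx.id.col) is not reproduced here.

```r
# Hypothetical helper, not part of SeedMatchR: append named per-feature
# seed counts to a results data.frame keyed by its row names (assumption).
append_seed_counts <- function(res, counts, seed.name = "mer7m8", col.name = NULL) {
  # Mirror the documented default: name the column after the seed
  if (is.null(col.name)) col.name <- seed.name
  # Look up each row's count by name; rows without a count become NA
  res[[col.name]] <- unname(counts[rownames(res)])
  res
}
```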
Examples
if (FALSE) { # interactive()
library(SeedMatchR)
library(dplyr)
# Guide sequence, 5' to 3'
seq <- "UUAUAGAGCAAGAACACUGUUUU"
# Load the human annotation database and extract feature sequences
anno.db <- load_species_anno_db("human")
features <- get_feature_seqs(anno.db$tx.db, anno.db$dna)
# Load test data
res <- Schlegel_2022_Ttr_D1_30mkg
# Filter DESeq2 results for SeedMatchR
res <- filter_deseq(res, fdr.cutoff = 1, fc.cutoff = 0, rm.na.log2fc = TRUE)
# Count mer7m8 seed matches and append them as a new column
res <- SeedMatchR(res, anno.db$gtf, features$seqs, seq, "mer7m8")
}