Find seed matches in genomic features
Find seed matches in a Biostrings::DNAStringSet object of sequences.
This function will use get.seed to extract the seed sequence
from the guide sequence.
There are two modes for running SeedMatchR. The mode is set by
the res.format argument with the options DESEQ2 or data.frame.
The DESEQ2 mode will match the input seed across all rows of the
Biostrings::DNAStringSet object using Biostrings::vcountPattern(). These matches are then
aggregated and matched to genes in the DESEQ2 results data.frame. Counts
are reported as an additional column with the seed count. By default, this
mode will be run if a DESEQ2 results data.frame is provided to the res argument.
The data.frame mode will return a data.frame with the interval ranges
associated with matches from Biostrings::vmatchPattern(). This is the default mode run
when no DESEQ2 results are provided.
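The difference between the two modes can be illustrated with the Biostrings calls named above. This is a minimal sketch, not SeedMatchR's own implementation: the toy sequences, the transcript names, and the 7-nt site are made up for illustration.

```r
library(Biostrings)

# Toy transcript sequences; the names stand in for transcript IDs (hypothetical).
seqs <- DNAStringSet(c(tx1 = "AAACAGTGTTAAACAGTGTT",
                       tx2 = "GGGGGGGGGG",
                       tx3 = "TTACAGTGTT"))

# A 7-nt site to search for (hypothetical, chosen only for illustration).
site <- DNAString("ACAGTGT")

# Count-style output: vcountPattern() returns one integer per sequence.
counts <- vcountPattern(site, seqs, max.mismatch = 0, fixed = TRUE)
count.df <- data.frame(tx = names(seqs), mer7m8 = counts)

# Range-style output: vmatchPattern() returns an MIndex; unlisting it gives
# an IRanges object named by the sequence each match came from.
hits <- unlist(vmatchPattern(site, seqs, max.mismatch = 0, fixed = TRUE))
range.df <- data.frame(tx = names(hits), start = start(hits), end = end(hits))
```

In the count-style table every input sequence gets a row, including those with zero matches, while the range-style table has one row per match.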
Usage
SeedMatchR(
  seqs,
  sequence,
  res = NULL,
  res.format = c("DESEQ2", "data.frame", "granges", "iranges"),
  seed.name = "mer7m8",
  col.name = NULL,
  start.pos = NULL,
  stop.pos = NULL,
  match.df = NULL,
  sirna.name = NULL,
  shared_genes = TRUE,
  allow_wobbles = FALSE,
  get_seed = TRUE,
  max.mismatch = 0,
  with.indels = TRUE,
  fixed = TRUE
)
Arguments
- seqs
The Biostrings::DNAStringSet object with sequence information for features. The names of the sequences should be the transcript names.
- sequence
The Biostrings::DNAString guide sequence oriented 5' > 3'.
- res
An optional DESEQ2 results data.frame. If provided, an additional column with the seed match count will be added to the data.frame. If not provided, SeedMatchR will return the interval ranges for each match for the input Biostrings::DNAStringSet.
- res.format
Format for the returned results. Either 'data.frame', 'DESEQ2', 'iranges', or 'granges'.
- seed.name
The seed name to be reported in the data.frame column called seed.
- col.name
The optional name of the column of match counts. Defaults to seed.name if not set.
- start.pos
The seed start position
- stop.pos
The seed stop position
- match.df
Optional. If a data.frame of matches is provided, the results of the current search are appended to it with rbind.
- sirna.name
Optional siRNA name. A new column called sirna.name will be added to the data.frame.
- shared_genes
If TRUE, the transcript set is reduced to the features present in both the sequences database and the differential expression data.
- allow_wobbles
If TRUE, allow G:U wobbles by replacing U with Y.
- get_seed
If TRUE, parse the input character vector and return a get_seed object.
- max.mismatch
Number of allowed mismatches, or the total edit distance when with.indels is TRUE.
- with.indels
If TRUE, allow indels when matching.
- fixed
Require that each sequence symbol matches when searching. Should be FALSE if using wobbles.
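As a rough illustration of how the seed and matching arguments interact, the sketch below calls Biostrings directly. It assumes the conventional definition of a 7mer-m8 seed as guide positions 2 to 8 and searches DNA targets for its reverse complement. The internals of get.seed and allow_wobbles are not shown on this page, so the positions, the toy targets, and the ambiguity-code pattern are assumptions rather than a description of what SeedMatchR does.

```r
library(Biostrings)

# Guide sequence from the example on this page, oriented 5' > 3'.
guide <- RNAString("UUAUAGAGCAAGAACACUGUUUU")

# Assumption: the mer7m8 seed spans guide positions 2 to 8.
seed <- subseq(guide, start = 2, end = 8)

# The site searched for in DNA transcripts is the seed's reverse complement.
site <- reverseComplement(DNAString(seed))

# Toy targets (hypothetical): an exact site, a substitution at site
# position 6, and a substitution at site position 3.
targets <- DNAStringSet(c(exact = "GGCTCTATAGG",
                          sub6  = "GGCTCTACAGG",
                          sub3  = "GGCTGTATAGG"))

# Exact matching: only the first target contains the site.
exact.n <- vcountPattern(site, targets, max.mismatch = 0, fixed = TRUE)

# One substitution allowed, no indels: all three targets match.
mm1.n <- vcountPattern(site, targets, max.mismatch = 1,
                       with.indels = FALSE, fixed = TRUE)

# fixed = FALSE lets IUPAC ambiguity codes act as wildcards.
# Y stands for C or T, so only the position-6 variant is rescued.
amb.site <- DNAString("CTCTAYA")
amb.n <- vcountPattern(amb.site, targets, max.mismatch = 0, fixed = FALSE)
```

The last call shows why fixed should be FALSE when the pattern contains ambiguity codes: with fixed = TRUE the Y would only match a literal Y in the subject.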
Examples
if (FALSE) { # interactive()
  library(dplyr)
  # Guide sequence, oriented 5' > 3'
  guide.seq = "UUAUAGAGCAAGAACACUGUUUU"
  # Load the annotations and the example siRNA data
  anno.db = load_annotations("rnor7")
  get_example_data("sirna")
  sirna.data = load_example_data("sirna")
  res <- sirna.data$Schlegel_2022_Ttr_D1_30mkg
  # Filter DESeq2 results for SeedMatchR
  res = filter_res(res, fdr.cutoff=1, fc.cutoff=0, rm.na.log2fc = TRUE)
  # Add a column of mer7m8 seed match counts to the results
  res = SeedMatchR(res = res, seqs = anno.db$seqs,
                   sequence = guide.seq, seed.name = "mer7m8")
}