Find seed matches in genomic features
SeedMatchR.Rd
Find seed matches in a Biostrings::DNAStringSet object of sequences. This function uses get.seed to extract the seed sequence from the guide sequence.
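To make the seed extraction step concrete, here is a minimal sketch using only Biostrings. It assumes that the "mer7m8" seed corresponds to guide positions 2 to 8 and that the search target is the reverse complement of the seed in DNA letters; neither assumption is stated on this page, and the sketch does not reproduce the internals of get.seed.

```r
library(Biostrings)

# Guide sequence oriented 5' > 3' (taken from the Examples section)
guide <- RNAString("UUAUAGAGCAAGAACACUGUUUU")

# Assumption: the "mer7m8" seed spans guide positions 2 to 8
seed <- subseq(guide, start = 2, end = 8)

# Assumption: the sequence searched for in the features is the
# reverse complement of the seed, expressed as DNA
target <- reverseComplement(DNAString(seed))

as.character(seed)
as.character(target)
```

In SeedMatchR itself, the start.pos and stop.pos arguments let you override the seed boundaries rather than hard-coding them as done here.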
There are two modes for running SeedMatchR. The modes are set by the res.format argument with the options DESEQ2 or data.frame.
The DESEQ2 mode will match the input seed across all rows of the Biostrings::DNAStringSet object using Biostrings::vcountPattern(). These matches are then aggregated and matched to genes in the DESEQ2 results data.frame. Counts are reported as an additional column with the seed count. By default, this mode will be run if a DESEQ2 results data.frame is provided to the res argument.
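The counting step of the DESEQ2 mode can be sketched roughly as follows. This is an illustration under stated assumptions, not the package's implementation: the sequences, the gene_id column, and the log2FoldChange values are toy data invented for the example, and aggregation across multiple transcripts per gene is omitted.

```r
library(Biostrings)

# Toy feature sequences; the names stand in for gene identifiers (hypothetical)
seqs <- DNAStringSet(c(geneA = "AACTCTATAGGCTCTATA",
                       geneB = "GGGGCCCC",
                       geneC = "TTCTCTATATT"))

# Target sequence to search for (reverse complement of a 7-mer seed)
target <- DNAString("CTCTATA")

# Count exact matches of the target in every sequence
counts <- vcountPattern(target, seqs, max.mismatch = 0, fixed = TRUE)
count.df <- data.frame(gene_id = names(seqs), mer7m8 = counts)

# Toy stand-in for a DESeq2 results data.frame (values are made up)
res <- data.frame(gene_id = c("geneA", "geneB", "geneC"),
                  log2FoldChange = c(-1.2, 0.1, -0.4))

# Attach the seed match count as an additional column
res <- merge(res, count.df, by = "gene_id", all.x = TRUE)
res
```

The count column is named after the seed here, which mirrors how col.name defaults to seed.name.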
The data.frame mode will return a data.frame with the interval ranges associated with matches from Biostrings::vmatchPattern(). This is the default mode run when no DESEQ2 results are provided.
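As a rough illustration of the data.frame mode, the sketch below collects match ranges from Biostrings::vmatchPattern() into a data.frame. The toy sequences and the column layout are assumptions made for the example; the columns returned by SeedMatchR may differ.

```r
library(Biostrings)

# Toy feature sequences; the names stand in for transcript names (hypothetical)
seqs <- DNAStringSet(c(tx1 = "AACTCTATAGGCTCTATA",
                       tx2 = "GGGGCCCC"))
target <- DNAString("CTCTATA")

# Locate every exact match; the result holds one set of ranges per sequence
hits <- vmatchPattern(target, seqs, max.mismatch = 0, fixed = TRUE)

# Flatten to a single IRanges whose names record the source sequence
ranges <- unlist(hits)

# One row per match with its interval on the source sequence
hits.df <- data.frame(names = names(ranges),
                      start = start(ranges),
                      end = end(ranges),
                      width = width(ranges))
hits.df
```

Sequences without a match, such as tx2 here, contribute no rows.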
Usage
SeedMatchR(
  seqs,
  sequence,
  res = NULL,
  res.format = c("DESEQ2", "data.frame", "granges", "iranges"),
  seed.name = "mer7m8",
  col.name = NULL,
  start.pos = NULL,
  stop.pos = NULL,
  match.df = NULL,
  sirna.name = NULL,
  shared_genes = TRUE,
  allow_wobbles = FALSE,
  get_seed = TRUE,
  max.mismatch = 0,
  with.indels = TRUE,
  fixed = TRUE
)
Arguments
- seqs
The Biostrings::DNAStringSet object with sequence information for features. The names of the sequences should be the transcript names.
- sequence
The Biostrings::DNAString guide sequence oriented 5' > 3'.
- res
An optional DESEQ2 results data.frame. If provided, an additional column with the seed match count will be added to the data.frame. If not provided, SeedMatchR will return the interval ranges for each match for the input Biostrings::DNAStringSet.
- res.format
Format for the returned results. Either 'data.frame', 'DESEQ2', 'iranges', or 'granges'.
- seed.name
The seed name to be reported in the data.frame column called seed.
- col.name
The optional name of the column of match counts. Will default to seed.name if not set.
- start.pos
The seed start position.
- stop.pos
The seed stop position
- match.df
Optional: If a matches df is provided the results of the current search will be added with rbind.
- sirna.name
Optional siRNA name. A new column called sirna.name will be added to the data.frame.
- shared_genes
If TRUE, the transcript (tx) set is reduced to overlapping features in both the sequences db and in the DE data.
- allow_wobbles
If TRUE, allow G:U wobbles by replacing U with Y.
- get_seed
If TRUE, parse the input character vector and return get_seed object.
- max.mismatch
Number of allowed mismatches or the total edit distance.
- with.indels
If TRUE, include indels.
- fixed
Require that each sequence symbol matches when searching. Should be FALSE if using wobbles.
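The interaction between wobble codes and the fixed argument can be seen directly in Biostrings. The sketch below uses a toy pattern containing the IUPAC code Y (C or T) and toy subject sequences, all invented for illustration; it only shows why fixed should be FALSE when ambiguity codes are present, not how SeedMatchR builds its wobble pattern.

```r
library(Biostrings)

# Toy subjects: the third position after "GA" is C, T, or G
subjects <- DNAStringSet(c(s1 = "AAGACAA",
                           s2 = "AAGATAA",
                           s3 = "AAGAGAA"))

# Pattern with the IUPAC ambiguity code Y, which stands for C or T
pattern <- DNAString("GAY")

# fixed = FALSE: Y is interpreted as an ambiguity and matches C or T
ambiguous.counts <- vcountPattern(pattern, subjects, fixed = FALSE)

# fixed = TRUE: Y can only match a literal Y in the subject
literal.counts <- vcountPattern(pattern, subjects, fixed = TRUE)

ambiguous.counts
literal.counts
```

With fixed = TRUE none of the subjects match, because none contains a literal Y, which is why the fixed argument should be FALSE when wobbles are allowed.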
Examples
if (FALSE) { # interactive()
  library(dplyr)
  library(SeedMatchR)

  # Guide sequence oriented 5' > 3'
  guide.seq = "UUAUAGAGCAAGAACACUGUUUU"

  # Load annotations for the rnor7 genome build
  anno.db = load_annotations("rnor7")

  # Download and load the example siRNA data
  get_example_data("sirna")
  sirna.data = load_example_data("sirna")
  res <- sirna.data$Schlegel_2022_Ttr_D1_30mkg

  # Filter DESeq2 results for SeedMatchR
  res = filter_res(res, fdr.cutoff=1, fc.cutoff=0, rm.na.log2fc = TRUE)

  # Add the seed match counts as a new column of the results
  res = SeedMatchR(res = res, seqs = anno.db$seqs,
                   sequence = guide.seq, seed.name = "mer7m8")
}