
Find seed matches in a Biostrings::DNAStringSet object of sequences.

This function will use get.seed to extract the seed sequence from the guide sequence.
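As an illustration of what seed extraction involves, here is a minimal sketch using only Biostrings. It does not call get.seed and may differ from it. It assumes that a mer7m8 seed spans guide positions 2 to 8 and that the searchable target site is the reverse complement of the seed in the DNA alphabet. The variable names are hypothetical.

```r
library(Biostrings)

# Guide sequence from the Examples section, oriented 5' > 3'
guide <- RNAString("UUAUAGAGCAAGAACACUGUUUU")

# Assumption: a 7mer-m8 seed covers guide positions 2 through 8
seed.rna <- subseq(guide, start = 2, end = 8)

# Convert to the DNA alphabet and take the reverse complement, giving the
# site that would appear in a transcript stored in a DNAStringSet
target.dna <- reverseComplement(DNAString(seed.rna))

as.character(seed.rna)
as.character(target.dna)
```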

There are two main modes for running SeedMatchR. The mode is set by the res.format argument, with the options DESEQ2 or data.frame. The usage below also lists granges and iranges as return formats.

The DESEQ2 mode matches the input seed across all rows of the Biostrings::DNAStringSet object using Biostrings::vcountPattern(). These matches are then aggregated and matched to genes in the DESeq2 results data.frame. The seed count is reported as an additional column. By default, this mode is run if a DESeq2 results data.frame is provided to the res argument.
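The counting step can be sketched with Biostrings alone. The toy sequences, the seed site, and the results table with its gene_id column are hypothetical stand-ins, and the join below is a simplification of the aggregation SeedMatchR performs, not its actual implementation.

```r
library(Biostrings)

# Hypothetical toy transcripts; names stand in for transcript IDs
seqs <- DNAStringSet(c(
  tx1 = "AAACTCTATAGGGCTCTATA",
  tx2 = "GGGGCCCCAAAATTTT",
  tx3 = "CTCTATACC"
))

# Hypothetical seed target site
seed <- DNAString("CTCTATA")

# Count seed occurrences in every sequence (one integer per sequence)
counts <- vcountPattern(seed, seqs)

# Hypothetical stand-in for a DESeq2 results table
res <- data.frame(
  gene_id = c("tx1", "tx2", "tx3"),
  log2FoldChange = c(-1.2, 0.1, -0.4)
)

# Attach the counts as a new column, matching rows by sequence name
res$mer7m8 <- counts[match(res$gene_id, names(seqs))]
res
```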

The data.frame mode will return a data.frame with the interval ranges associated with matches from Biostrings::vmatchPattern(). This is the default mode run when no DESEQ2 results are provided.
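A minimal sketch of how match ranges can be obtained and flattened into a data.frame with Biostrings. The toy sequences and seed site are hypothetical, and the column layout SeedMatchR returns may differ from what is shown here.

```r
library(Biostrings)

# Hypothetical toy transcripts; names stand in for transcript IDs
seqs <- DNAStringSet(c(
  tx1 = "AAACTCTATAGGGCTCTATA",
  tx2 = "GGGGCCCCAAAATTTT",
  tx3 = "CTCTATACC"
))

# Hypothetical seed target site
seed <- DNAString("CTCTATA")

# Locate every match; the result holds one set of ranges per sequence
matches <- vmatchPattern(seed, seqs)

# Flatten to a single set of ranges, then to a data.frame of intervals
match.ranges <- unlist(matches)
match.df <- as.data.frame(match.ranges)
match.df
```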

Usage

SeedMatchR(
  seqs,
  sequence,
  res = NULL,
  res.format = c("DESEQ2", "data.frame", "granges", "iranges"),
  seed.name = "mer7m8",
  col.name = NULL,
  start.pos = NULL,
  stop.pos = NULL,
  match.df = NULL,
  sirna.name = NULL,
  shared_genes = TRUE,
  allow_wobbles = FALSE,
  get_seed = TRUE,
  max.mismatch = 0,
  with.indels = TRUE,
  fixed = TRUE
)

Arguments

seqs

The Biostrings::DNAStringSet object with sequence information for features. The names of the sequences should be the transcript names.

sequence

The Biostrings::DNAString guide sequence oriented 5' > 3'.

res

An optional DESeq2 results data.frame. If provided, an additional column with the seed match count will be added to the data.frame. If not provided, SeedMatchR will return the interval ranges for each match in the input Biostrings::DNAStringSet.

res.format

Format for the returned results. One of 'DESEQ2', 'data.frame', 'granges', or 'iranges'.

seed.name

The seed name to be reported in the data.frame column called seed

col.name

The optional name of the column of match counts. Will default to seed.name if not set

start.pos

The seed start position

stop.pos

The seed stop position

match.df

Optional: If a matches df is provided the results of the current search will be added with rbind.

sirna.name

Optional siRNA name. A new column called sirna.name will be added to data.frame

shared_genes

If TRUE, the transcript set is reduced to the features present in both the sequences db and the DE data.

allow_wobbles

If TRUE, allow G:U wobbles by replacing U with Y.

get_seed

If TRUE, parse the input character vector and return a get_seed object.

max.mismatch

Number of allowed mismatches or the total edit distance

with.indels

If TRUE, include indels

fixed

Require that each sequence symbol matches when searching. Should be FALSE if using wobbles.
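The effect of max.mismatch and fixed can be sketched directly with Biostrings::vcountPattern(). The sequences and patterns are hypothetical, and the sketch does not reproduce SeedMatchR's own wobble substitution. It only shows why a pattern containing an IUPAC ambiguity code such as Y needs fixed = FALSE.

```r
library(Biostrings)

# Hypothetical toy transcripts
seqs <- DNAStringSet(c(
  exact     = "AACTCTATAGG",  # contains the seed site exactly
  one_off   = "AACTCTGTAGG",  # one substitution inside the site
  c_variant = "AACTCCATAGG"   # C instead of T at site position 4
))

# Hypothetical seed target site
seed <- DNAString("CTCTATA")

# Exact matching versus allowing a single mismatch
exact.counts <- vcountPattern(seed, seqs, max.mismatch = 0)
fuzzy.counts <- vcountPattern(seed, seqs, max.mismatch = 1)

# Pattern with the ambiguity code Y (C or T) at position 4
amb <- DNAString("CTCYATA")

# fixed = FALSE lets Y match either C or T in the subject
amb.counts <- vcountPattern(amb, seqs, fixed = FALSE)

# fixed = TRUE treats Y as a literal symbol, so nothing matches here
literal.counts <- vcountPattern(amb, seqs, fixed = TRUE)
```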

Value

Either a 'data.frame', 'DESEQ2', 'iranges', or 'granges' object based on res.format input

Examples

if (FALSE) { # interactive()
library(dplyr)

# Guide sequence, oriented 5' > 3'
guide.seq = "UUAUAGAGCAAGAACACUGUUUU"

anno.db = load_annotations("rnor7")

get_example_data("sirna")

sirna.data = load_example_data("sirna")

res <- sirna.data$Schlegel_2022_Ttr_D1_30mkg

# Filter DESeq2 results for SeedMatchR
res = filter_res(res, fdr.cutoff=1, fc.cutoff=0, rm.na.log2fc = TRUE)

# Count mer7m8 seed matches and add them as a column to the results
res = SeedMatchR(res = res, seqs = anno.db$seqs,
                 sequence = guide.seq, seed.name = "mer7m8")
}