Find seed matches in genomic features

Find seed matches in a DNAStringSet object of sequences. This function uses get.seed to extract the seed sequence from the guide sequence. The seed is then searched for in every sequence of the DNAStringSet object using vcountPattern.

This function returns the input DESeq2 results data.frame with an additional column that contains the counts for the input seed.name.
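The two steps described above (extract the seed, then count its occurrences in each sequence) can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: rev_comp and count_matches are hypothetical helpers standing in for get.seed and the pattern-counting step, the toy sequences are made up, and the sketch assumes the mer7m8 seed corresponds to guide positions 2 to 8 with the target site being its reverse complement in DNA.

```r
# Hypothetical helper: reverse complement of a DNA string (base R only).
rev_comp <- function(dna) {
  bases <- rev(strsplit(dna, "")[[1]])
  paste(chartr("ACGT", "TGCA", bases), collapse = "")
}

# Hypothetical helper: count exact matches of `pattern` in `subject`,
# including overlapping ones, using a zero-width lookahead.
count_matches <- function(pattern, subject) {
  hits <- gregexpr(paste0("(?=", pattern, ")"), subject, perl = TRUE)[[1]]
  if (hits[1] == -1) 0L else length(hits)
}

# Guide sequence from the Examples section, written 5' > 3'.
guide <- "UUAUAGAGCAAGAACACUGUUUU"

# Assumption: the mer7m8 seed spans guide positions 2 to 8.
seed_rna <- substr(guide, 2, 8)

# Convert to DNA and take the reverse complement to get the target site.
target <- rev_comp(chartr("U", "T", seed_rna))

# Toy feature sequences (made up); names play the role of transcript names.
seqs <- c(tx1 = "AAACTCTATAGGGCTCTATA",
          tx2 = "GGGGCCCC",
          tx3 = "CTCTATATTT")

# One count per sequence, named by transcript.
counts <- vapply(seqs, function(s) count_matches(target, s), integer(1))
print(counts)
```

In the package itself the sequences are held in a DNAStringSet rather than a character vector, and options such as mismatches and indels are handled by the search function; the sketch covers exact matching only.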

Usage

SeedMatchR(
  res,
  gtf,
  seqs,
  sequence,
  seed.name = "mer7m8",
  col.name = NULL,
  mismatches = 0,
  indels = FALSE,
  tx.id.col = TRUE
)

Arguments

res: A DESeq2 results data.frame
gtf: GTF file used to map features to genes. The object must have columns transcript_id and gene_id
seqs: The DNAStringSet object with sequence information for features. The names of the sequences should be the transcript names.
sequence: The DNAString guide sequence oriented 5' > 3'.
seed.name: The name of the specific seed to extract. Options are: mer8, mer7A1, mer7m8, mer6.
col.name: The string to use for the column name. Defaults to the seed name.
mismatches: The number of mismatches to allow in search
indels: Whether to allow indels in search
tx.id.col: Use the transcript_id column instead of gene_id

Value

A modified DESeq2 results data.frame with an additional column, named after the chosen seed (or col.name if supplied), that contains the number of seed matches for each feature.
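As a rough picture of the returned object, the sketch below attaches per-transcript match counts to a results data.frame through a transcript-to-gene mapping. Everything in it is illustrative: the toy res, gtf, and counts objects are made up, the join on gene_id is an assumption about how the mapping could work, and filling unmatched rows with 0 is also an assumption rather than documented behavior.

```r
# Toy stand-ins (made up) for the DESeq2 results and the GTF annotation.
res <- data.frame(gene_id = c("g1", "g2", "g3", "g4"),
                  log2FoldChange = c(-1.2, 0.1, 0.4, 0.0),
                  stringsAsFactors = FALSE)
gtf <- data.frame(transcript_id = c("tx1", "tx2", "tx3"),
                  gene_id = c("g1", "g2", "g3"),
                  stringsAsFactors = FALSE)

# Toy per-transcript seed match counts, named by transcript.
counts <- c(tx1 = 2L, tx2 = 0L, tx3 = 1L)

# Put the counts in a data.frame and map transcripts to genes.
tx_counts <- data.frame(transcript_id = names(counts),
                        mer7m8 = unname(counts),
                        stringsAsFactors = FALSE)
tx_counts <- merge(tx_counts, gtf, by = "transcript_id")

# Left-join onto the results so every input row is kept.
res_out <- merge(res, tx_counts[, c("gene_id", "mer7m8")],
                 by = "gene_id", all.x = TRUE)

# Assumption: features without a searched sequence get a count of 0.
res_out$mer7m8[is.na(res_out$mer7m8)] <- 0L

print(res_out)
```

The new column here is called mer7m8 to mirror the default seed.name; passing col.name to SeedMatchR would change that name.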

Examples

if (FALSE) { # interactive()
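library(dplyr)

# Guide sequence, written 5' > 3'
guide.seq <- "UUAUAGAGCAAGAACACUGUUUU"

# Load the annotation databases for the species of interest
anno.db <- load_species_anno_db("human")

# Extract the feature sequences that will be searched
features <- get_feature_seqs(anno.db$tx.db, anno.db$dna)

# Load test data
res <- Schlegel_2022_Ttr_D1_30mkg

# Filter DESeq2 results for SeedMatchR
res <- filter_deseq(res, fdr.cutoff = 1, fc.cutoff = 0, rm.na.log2fc = TRUE)

# Add a column with the number of mer7m8 seed matches per feature
res <- SeedMatchR(res, anno.db$gtf, features$seqs, guide.seq, "mer7m8")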
}