
Find seed matches in a DNAStringSet object of sequences. This function uses get.seed to extract the seed sequence from the guide sequence. The seed is then searched for in every sequence of the DNAStringSet object using Biostrings' vcountPattern.

This function returns the input DESeq2 results data.frame with an additional column that contains the match counts for the seed selected by seed.name.
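
The two steps described above (extract a seed, then count its occurrences in each sequence) can be sketched in base R. This is a minimal illustration, not the package's implementation: it assumes the conventional definition of the 7mer-m8 seed as guide positions 2-8 and searches for its reverse complement in DNA, and the helpers rev_comp_dna and count_exact, as well as the toy sequences, are hypothetical names introduced only for this example.

```r
# Guide sequence taken from the Examples section (RNA, 5' > 3').
guide <- "UUAUAGAGCAAGAACACUGUUUU"

# Assumption: the 7mer-m8 seed covers guide positions 2 through 8.
seed_rna <- substr(guide, 2, 8)

# Hypothetical helper: reverse complement an RNA string into DNA letters.
rev_comp_dna <- function(rna) {
  comp <- chartr("ACGU", "TGCA", rna)
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")
}

# The site expected in the target transcript (DNA alphabet).
target_site <- rev_comp_dna(seed_rna)

# Hypothetical helper: count exact occurrences, including overlapping ones,
# using a zero-width lookahead so that every start position is reported.
count_exact <- function(pattern, subject) {
  hits <- gregexpr(paste0("(?=", pattern, ")"), subject, perl = TRUE)[[1]]
  if (hits[1] == -1L) 0L else length(hits)
}

# Toy stand-in for the named sequences in a DNAStringSet.
seqs <- c(tx1 = "AAACTCTATAGGGCTCTATA", tx2 = "GGGGGGGG")
counts <- vapply(seqs, function(s) count_exact(target_site, s), integer(1))
print(counts)
```

In the package itself the counts are added as a column to the DESeq2 results; here they are simply returned as a named integer vector keyed by the toy transcript names.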

Usage

SeedMatchR(
  res,
  gtf,
  seqs,
  sequence,
  seed.name = "mer7m8",
  col.name = NULL,
  mismatches = 0,
  indels = FALSE,
  tx.id.col = TRUE
)

Arguments

res

A DESeq2 results data.frame

gtf

GTF annotation object used to map features to genes. The object must have the columns transcript_id and gene_id.

seqs

The DNAStringSet object with sequence information for features. The names of the sequences should be the transcript names.

sequence

The DNAString guide sequence oriented 5' > 3'.

seed.name

The name of the specific seed to extract. Options are: mer8, mer7A1, mer7m8, mer6.

col.name

The string to use for the column name. Defaults to the seed name.

mismatches

The number of mismatches to allow in the search.

indels

Whether to allow indels in the search.

tx.id.col

If TRUE, use the transcript_id column instead of gene_id.
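
To make the effect of the mismatches argument concrete, the sketch below counts seed sites while tolerating a fixed number of substitutions. It is a base R illustration under stated assumptions, not the package's code: count_with_mismatches is a hypothetical helper, it handles substitutions only (no indels), and the pattern and subject are toy strings. Biostrings' vcountPattern offers max.mismatch and with.indels arguments for the same purpose, which is presumably what mismatches and indels are passed to.

```r
# Hypothetical helper: count the windows of `subject` that differ from
# `pattern` by at most `max_mismatch` substitutions (indels not considered).
count_with_mismatches <- function(pattern, subject, max_mismatch = 0L) {
  n <- nchar(pattern)
  m <- nchar(subject)
  if (m < n) return(0L)
  p <- strsplit(pattern, "")[[1]]
  s <- strsplit(subject, "")[[1]]
  # Hamming distance between the pattern and each window of the subject.
  starts <- seq_len(m - n + 1L)
  dists <- vapply(starts, function(i) sum(s[i:(i + n - 1L)] != p), integer(1))
  sum(dists <= max_mismatch)
}

site <- "CTCTATA"
# One exact site followed by a site with a single substitution (A > T).
subject <- "CTCTATACTCTTTA"

exact_hits <- count_with_mismatches(site, subject, 0L)  # 1
fuzzy_hits <- count_with_mismatches(site, subject, 1L)  # 2
```

Allowing mismatches can only keep or increase the count for a given sequence, so columns produced with mismatches > 0 are best compared against the exact-match column rather than read in isolation.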

Value

A modified DESeq2 results data.frame with an additional column, named after the chosen seed (or col.name, if supplied), that contains the number of seed matches for each feature.

Examples

if (FALSE) { # interactive()
library(dplyr)

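# Guide sequence, oriented 5' > 3'
seq <- "UUAUAGAGCAAGAACACUGUUUU"

# Load the annotation objects for the species of interest
anno.db <- load_species_anno_db("human")

# Extract the feature sequences to search
features <- get_feature_seqs(anno.db$tx.db, anno.db$dna)

# Load test data
res <- Schlegel_2022_Ttr_D1_30mkg

# Filter DESeq2 results for SeedMatchR
res <- filter_deseq(res, fdr.cutoff = 1, fc.cutoff = 0, rm.na.log2fc = TRUE)

# Add a column with the mer7m8 seed match counts
res <- SeedMatchR(res, anno.db$gtf, features$seqs, seq, "mer7m8")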
}