SeedMatchR Search Details
Tareian Cazares, Pradeep Pugalenthi, and Gulcin Ozer
SeedMatchR_search_details.Rmd
library(SeedMatchR)
library(Biostrings)
#> Loading required package: BiocGenerics
#> Loading required package: generics
#>
#> Attaching package: 'generics'
#> The following objects are masked from 'package:base':
#>
#> as.difftime, as.factor, as.ordered, intersect, is.element, setdiff,
#> setequal, union
#>
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#>
#> IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#>
#> anyDuplicated, aperm, append, as.data.frame, basename, cbind,
#> colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
#> get, grep, grepl, is.unsorted, lapply, Map, mapply, match, mget,
#> order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
#> rbind, Reduce, rownames, sapply, saveRDS, table, tapply, unique,
#> unsplit, which.max, which.min
#> Loading required package: S4Vectors
#> Loading required package: stats4
#>
#> Attaching package: 'S4Vectors'
#> The following object is masked from 'package:utils':
#>
#> findMatches
#> The following objects are masked from 'package:base':
#>
#> expand.grid, I, unname
#> Loading required package: IRanges
#> Loading required package: XVector
#> Loading required package: GenomeInfoDb
#>
#> Attaching package: 'Biostrings'
#> The following object is masked from 'package:base':
#>
#> strsplit
Overview
SeedMatchR
uses Biostrings::matchPattern
variants for searching sequences. This article describes different
parameters of the search and their results.
Load example data
This example uses the siRNA sequence, D1, targeting the Ttr gene in rat liver from the publication:
Schlegel MK, Janas MM, Jiang Y, Barry JD, Davis W, Agarwal S, Berman D, Brown CR, Castoreno A, LeBlanc S, Liebow A, Mayo T, Milstein S, Nguyen T, Shulga-Morskaya S, Hyde S, Schofield S, Szeto J, Woods LB, Yilmaz VO, Manoharan M, Egli M, Charissé K, Sepp-Lorenzino L, Haslett P, Fitzgerald K, Jadhav V, Maier MA. From bench to bedside: Improving the clinical safety of GalNAc-siRNA conjugates using seed-pairing destabilization. Nucleic Acids Res. 2022 Jul 8;50(12):6656-6670. doi: 10.1093/nar/gkac539. PMID: 35736224; PMCID: PMC9262600.
Load sequence data
The guide sequence of interest is 23 bp long and oriented 5’ -> 3’.
# siRNA sequence of interest targeting a 23 bp region of the Ttr gene
guide.seq = "UUAUAGAGCAAGAACACUGUUUU"
# siRNA target sequence
ttr.target = get_seed(guide.seq = guide.seq, seed.name = "Full")$Target.Seq
ttr.target.wobble = get_seed(guide.seq = guide.seq, seed.name = "Full", allow_wobbles = TRUE)$Target.Seq
Load annotation databases
annodb = load_annotations(reference.name = "rnor6", canonical = FALSE, min.feature.width = 8, longest.utr = T)
#> Build AnnotationFilter for transcript features based on the following parameters:
#> Keep only standard chroms: TRUE
#> Remove rows with NA in transcript ID: TRUE
#> Keep only protein coding genes and transcripts: TRUE
#> Filtering for transcripts with support level: FALSE
#> Keep only the ENSEMBL canonical transcript: FALSE
#> Filtering for specific genes: FALSE
#> Filtering for specific transcripts: FALSE
#> Filtering for specific gene symbols: FALSE
#> Filtering for specific entrez id: FALSE
#> Loading annotations from AnnotationHub for rnor6
#> loading from cache
#> require("rtracklayer")
#> Warning: replacing previous import 'S4Arrays::makeNindexFromArrayViewport' by
#> 'DelayedArray::makeNindexFromArrayViewport' when loading 'SummarizedExperiment'
#> loading from cache
#> require("ensembldb")
#> Extracting 3UTR from ensembldb object.
#> Keeping the longest UTR per gene.
#> Extracting sequences for each feature.
#> Keeping sequences that are >= 8
Load example transcript
We will use SeedMatchR
to load an example transcript for
Ttr in rats. This sequence is the target of an siRNA that is used in
this package.
Load example DE data
get_example_data("sirna")
#> Example data directory being created at: /home/runner/.local/share/R/SeedMatchR
#> Warning in dir.create(data.path, recursive = TRUE):
#> '/home/runner/.local/share/R/SeedMatchR' already exists
sirna.data = load_example_data("sirna")
res <- sirna.data$Schlegel_2022_Ttr_D1_30mkg
res = filter_res(res, fdr_cutoff=1, fc_cutoff=0)
Search variants
Base Ttr example
We expect to see a single complete match
matchPattern(ttr.target, ttr.base.target, max.mismatch = 0, with.indels = F)
#> Views on a 23-letter DNAString subject
#> subject: AAAACAGTGTTCTTGCTCTATAA
#> views:
#> start end width
#> [1] 1 22 22 [AAAACAGTGTTCTTGCTCTATA]
Ttr with insertion
We expect the match not to be found without the proper flags.
matchPattern(ttr.target, ttr.ins.target, max.mismatch = 0, with.indels = F)
#> Views on a 24-letter DNAString subject
#> subject: AAAACAGTGTTGCTTGCTCTATAA
#> views: NONE
We expect this search to have no matches despite the indel flag since the max.mismatch argument is at 0.
matchPattern(ttr.target, ttr.ins.target, max.mismatch = 0, with.indels = T)
#> Views on a 24-letter DNAString subject
#> subject: AAAACAGTGTTGCTTGCTCTATAA
#> views: NONE
We expect there to be no match since there is no indels allowed, but we still allow for mismatches.
matchPattern(ttr.target, ttr.ins.target, max.mismatch = 1, with.indels = F)
#> Views on a 24-letter DNAString subject
#> subject: AAAACAGTGTTGCTTGCTCTATAA
#> views: NONE
We expect the search to find 1 match since max.mismatch is set to 1 and indels are allowed.
matchPattern(ttr.target, ttr.ins.target, max.mismatch = 1, with.indels = T)
#> Views on a 24-letter DNAString subject
#> subject: AAAACAGTGTTGCTTGCTCTATAA
#> views:
#> start end width
#> [1] 1 23 23 [AAAACAGTGTTGCTTGCTCTATA]
Testing multiple mismatches and indels and how they are reported
The total edit distance from the original sequence is 6 (4 mismatches, 1 insertion, 1 deletion). We want to see how these different parameters are counted in the max.mismatch category with indels.
We expect no matches.
matchPattern(ttr.target, ttr.mm4.in.del.target, max.mismatch = 4, with.indels = F)
#> Views on a 23-letter DNAString subject
#> subject: AAATACAGTGTTGCCACTCTAAA
#> views: NONE
We expect no matches.
matchPattern(ttr.target, ttr.mm4.in.del.target, max.mismatch = 4, with.indels = T)
#> Views on a 23-letter DNAString subject
#> subject: AAATACAGTGTTGCCACTCTAAA
#> views: NONE
We expect no matches.
matchPattern(ttr.target, ttr.mm4.in.del.target, max.mismatch = 6, with.indels = F)
#> Views on a 23-letter DNAString subject
#> subject: AAATACAGTGTTGCCACTCTAAA
#> views:
#> start end width
#> [1] 2 23 22 [AATACAGTGTTGCCACTCTAAA]
We expect 1 match.
matchPattern(ttr.target, ttr.mm4.in.del.target, max.mismatch = 6, with.indels = T)
#> Views on a 23-letter DNAString subject
#> subject: AAATACAGTGTTGCCACTCTAAA
#> views:
#> start end width
#> [1] 2 22 21 [AATACAGTGTTGCCACTCTAA]
vcountPattern(ttr.target, DNAStringSet(ttr.mm4.in.del.target), max.mismatch = 6, with.indels = T)
#> [1] 1