Skip to contents
library(SeedMatchR)
library(Biostrings)
#> Loading required package: BiocGenerics
#> Loading required package: generics
#> 
#> Attaching package: 'generics'
#> The following objects are masked from 'package:base':
#> 
#>     as.difftime, as.factor, as.ordered, intersect, is.element, setdiff,
#>     setequal, union
#> 
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#> 
#>     anyDuplicated, aperm, append, as.data.frame, basename, cbind,
#>     colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
#>     get, grep, grepl, is.unsorted, lapply, Map, mapply, match, mget,
#>     order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
#>     rbind, Reduce, rownames, sapply, saveRDS, table, tapply, unique,
#>     unsplit, which.max, which.min
#> Loading required package: S4Vectors
#> Loading required package: stats4
#> 
#> Attaching package: 'S4Vectors'
#> The following object is masked from 'package:utils':
#> 
#>     findMatches
#> The following objects are masked from 'package:base':
#> 
#>     expand.grid, I, unname
#> Loading required package: IRanges
#> Loading required package: XVector
#> Loading required package: GenomeInfoDb
#> 
#> Attaching package: 'Biostrings'
#> The following object is masked from 'package:base':
#> 
#>     strsplit

Overview

SeedMatchR uses Biostrings::matchPattern variants for searching sequences. This article describes different parameters of the search and their results.

Load example data

This example uses the siRNA sequence, D1, targeting the Ttr gene in rat liver from the publication:

Schlegel MK, Janas MM, Jiang Y, Barry JD, Davis W, Agarwal S, Berman D, Brown CR, Castoreno A, LeBlanc S, Liebow A, Mayo T, Milstein S, Nguyen T, Shulga-Morskaya S, Hyde S, Schofield S, Szeto J, Woods LB, Yilmaz VO, Manoharan M, Egli M, Charissé K, Sepp-Lorenzino L, Haslett P, Fitzgerald K, Jadhav V, Maier MA. From bench to bedside: Improving the clinical safety of GalNAc-siRNA conjugates using seed-pairing destabilization. Nucleic Acids Res. 2022 Jul 8;50(12):6656-6670. doi: 10.1093/nar/gkac539. PMID: 35736224; PMCID: PMC9262600.

Load sequence data

The guide sequence of interest is 23 bp long and oriented 5’ -> 3’.

# siRNA sequence of interest targeting a 23 bp region of the Ttr gene
guide.seq = "UUAUAGAGCAAGAACACUGUUUU"

# siRNA target sequence
ttr.target = get_seed(guide.seq = guide.seq, seed.name = "Full")$Target.Seq

ttr.target.wobble = get_seed(guide.seq = guide.seq, seed.name = "Full", allow_wobbles = TRUE)$Target.Seq

Load annotation databases

annodb = load_annotations(reference.name = "rnor6", canonical = FALSE, min.feature.width = 8, longest.utr = T)
#> Build AnnotationFilter for transcript features based on the following parameters: 
#> Keep only standard chroms: TRUE
#> Remove rows with NA in transcript ID: TRUE
#> Keep only protein coding genes and transcripts: TRUE
#> Filtering for transcripts with support level: FALSE
#> Keep only the ENSEMBL canonical transcript: FALSE
#> Filtering for specific genes: FALSE
#> Filtering for specific transcripts: FALSE
#> Filtering for specific gene symbols: FALSE
#> Filtering for specific entrez id: FALSE
#> Loading annotations from AnnotationHub for rnor6
#> loading from cache
#> require("rtracklayer")
#> Warning: replacing previous import 'S4Arrays::makeNindexFromArrayViewport' by
#> 'DelayedArray::makeNindexFromArrayViewport' when loading 'SummarizedExperiment'
#> loading from cache
#> require("ensembldb")
#> Extracting 3UTR from ensembldb object.
#> Keeping the longest UTR per gene.
#> Extracting sequences for each feature.
#> Keeping sequences that are >= 8

Load example transcript

We will use SeedMatchR to load an example transcript for Ttr in rats. This sequence is the target of an siRNA that is used in this package.

# Original target sequence
ttr.base.target = DNAString("AAAACAGTGTTCTTGCTCTATAA")

# Ttr + 1 insert @ 12
ttr.ins.target = DNAString("AAAACAGTGTTGCTTGCTCTATAA")

# Ttr + 4 mm @ 12-15 + 1 deletion @ 21 + 1 insert @ 4
ttr.mm4.in.del.target = DNAString("AAATACAGTGTTGCCACTCTAAA")

Load example DE data

get_example_data("sirna")
#> Example data directory being created at: /home/runner/.local/share/R/SeedMatchR
#> Warning in dir.create(data.path, recursive = TRUE):
#> '/home/runner/.local/share/R/SeedMatchR' already exists

sirna.data = load_example_data("sirna")

res <- sirna.data$Schlegel_2022_Ttr_D1_30mkg

res = filter_res(res, fdr_cutoff=1, fc_cutoff=0)

Search variants

Base Ttr example

We expect to see a single complete match

matchPattern(ttr.target, ttr.base.target, max.mismatch = 0, with.indels = F)
#> Views on a 23-letter DNAString subject
#> subject: AAAACAGTGTTCTTGCTCTATAA
#> views:
#>       start end width
#>   [1]     1  22    22 [AAAACAGTGTTCTTGCTCTATA]

Ttr with insertion

We expect the match not to be found without the proper flags.

matchPattern(ttr.target, ttr.ins.target, max.mismatch = 0, with.indels = F)
#> Views on a 24-letter DNAString subject
#> subject: AAAACAGTGTTGCTTGCTCTATAA
#> views: NONE

We expect this search to have no matches despite the indel flag since the max.mismatch argument is at 0.

matchPattern(ttr.target, ttr.ins.target, max.mismatch = 0, with.indels = T)
#> Views on a 24-letter DNAString subject
#> subject: AAAACAGTGTTGCTTGCTCTATAA
#> views: NONE

We expect there to be no match since there is no indels allowed, but we still allow for mismatches.

matchPattern(ttr.target, ttr.ins.target, max.mismatch = 1, with.indels = F)
#> Views on a 24-letter DNAString subject
#> subject: AAAACAGTGTTGCTTGCTCTATAA
#> views: NONE

We expect the search to find 1 match since max.mismatch is set to 1 and indels are allowed.

matchPattern(ttr.target, ttr.ins.target, max.mismatch = 1, with.indels = T)
#> Views on a 24-letter DNAString subject
#> subject: AAAACAGTGTTGCTTGCTCTATAA
#> views:
#>       start end width
#>   [1]     1  23    23 [AAAACAGTGTTGCTTGCTCTATA]

Testing multiple mismatches and indels and how they are reported

The total edit distance from the original sequence is 6 (4 mismatches, 1 insertion, 1 deletion). We want to see how these different parameters are counted in the max.mismatch category with indels.

We expect no matches.

matchPattern(ttr.target, ttr.mm4.in.del.target, max.mismatch = 4, with.indels = F)
#> Views on a 23-letter DNAString subject
#> subject: AAATACAGTGTTGCCACTCTAAA
#> views: NONE

We expect no matches.

matchPattern(ttr.target, ttr.mm4.in.del.target, max.mismatch = 4, with.indels = T)
#> Views on a 23-letter DNAString subject
#> subject: AAATACAGTGTTGCCACTCTAAA
#> views: NONE

We expect no matches.

matchPattern(ttr.target, ttr.mm4.in.del.target, max.mismatch = 6, with.indels = F)
#> Views on a 23-letter DNAString subject
#> subject: AAATACAGTGTTGCCACTCTAAA
#> views:
#>       start end width
#>   [1]     2  23    22 [AATACAGTGTTGCCACTCTAAA]

We expect 1 match.

matchPattern(ttr.target, ttr.mm4.in.del.target, max.mismatch = 6, with.indels = T)
#> Views on a 23-letter DNAString subject
#> subject: AAATACAGTGTTGCCACTCTAAA
#> views:
#>       start end width
#>   [1]     2  22    21 [AATACAGTGTTGCCACTCTAA]
vcountPattern(ttr.target, DNAStringSet(ttr.mm4.in.del.target), max.mismatch = 6, with.indels = T)
#> [1] 1