Generate a SeedMatchReport for select sequence definitions
Tareian Cazares
SeedMatchReport.Rmd
Generating a SeedMatchReport
A SeedMatchReport is an analysis that runs through a predefined list of
sequence definitions for your siRNA, scans the annotations found in your
DESeq2 results file, and reports some basic statistics.
Example workflow
Load annotations
# Load rnor6 3' UTR annotations: longest UTR per gene, sequences of at least 8 nt
annodb = load_annotations(reference.name = "rnor6", canonical = FALSE, min.feature.width = 8, longest.utr = TRUE)
#> Build AnnotationFilter for transcript features based on the following parameters:
#> Keep only standard chroms: TRUE
#> Remove rows with NA in transcript ID: TRUE
#> Keep only protein coding genes and transcripts: TRUE
#> Filtering for transcripts with support level: FALSE
#> Keep only the ENSEMBL canonical transcript: FALSE
#> Filtering for specific genes: FALSE
#> Filtering for specific transcripts: FALSE
#> Filtering for specific gene symbols: FALSE
#> Filtering for specific entrez id: FALSE
#> Loading annotations from AnnotationHub for rnor6
#> loading from cache
#> require("rtracklayer")
#> Warning: replacing previous import 'S4Arrays::makeNindexFromArrayViewport' by
#> 'DelayedArray::makeNindexFromArrayViewport' when loading 'SummarizedExperiment'
#> loading from cache
#> require("ensembldb")
#> Extracting 3UTR from ensembldb object.
#> Keeping the longest UTR per gene.
#> Extracting sequences for each feature.
#> Keeping sequences that are >= 8
Load example DESeq2 data
get_example_data("sirna")
#> Example data directory being created at: /home/runner/.local/share/R/SeedMatchR
#> Warning in dir.create(data.path, recursive = TRUE):
#> '/home/runner/.local/share/R/SeedMatchR' already exists
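# Load the example siRNA data and select one DESeq2 results table
sirna.data = load_example_data("sirna")
res = sirna.data$Schlegel_2022_Ttr_D1_30mkg
# Filter the results with the default settings of filter_res()
res = filter_res(res)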
Generate report
The report can be generated from searches with or without indels. It is
important to consider how indels alter the results of the analysis. The
edit distance (D) is the total number of mismatches and indels allowed
during the search. Therefore, if the indel.bool flag is set to TRUE,
every insertion or deletion counts towards the edit distance. An edit
distance of 4 could be 4 mismatches, 3 mismatches + 1 indel, or any
other combination of indels and mismatches that totals 4.
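The way an edit distance is split between mismatches and indels can be sketched in base R. This is only an illustration of the definition above, not the search code used by SeedMatchReport: the `site` and `variants` strings are hypothetical, and `adist()` (from the base `utils` package) is used as a stand-in because it computes the generalized Levenshtein distance, which counts substitutions, insertions, and deletions together.

```r
# Illustrative only: base R, not SeedMatchR's search code.
# Every way to split an edit distance D between mismatches and indels.
D <- 4
splits <- data.frame(mismatches = D:0, indels = 0:D)
# Without indels, only the all-mismatch split is possible.
splits$allowed_without_indels <- splits$indels == 0
print(splits)

# Hypothetical site strings (not taken from the example data).
site <- "ACGTACGT"
variants <- c(one_mismatch = "ACGAACGT",  # T -> A substitution at position 4
              one_deletion = "ACGACGT")   # the T at position 4 is removed

# adist() returns substitutions + insertions + deletions,
# so a single mismatch and a single indel each cost one edit.
d <- adist(site, variants)
print(d)
```

Both hypothetical variants are one edit away from `site`, but only the first would be reachable at D1 in a search that disallows indels.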
Generate report without indels
default.report = SeedMatchReport(res = res, seqs = annodb$seqs, guide.seq = "UUAUAGAGCAAGAACACUGUUUU", indel.bool = FALSE)
default.report$table
In-silico siRNA Binding Prediction: Identifying siRNA hits in the transcriptome
Column groups: Full = Full Guide Strand (g2:g23); 18-mer = g2:g19; 15-mer = g2:g19.

SeedMatchReport | Full D0 | Full D1 | Full D2 | Full D3 | Full D4 | 18-mer D0 | 18-mer D1 | 18-mer D2 | 18-mer D3 | 18-mer D4 | 15-mer D0 | 15-mer D1 | 15-mer D2 | 15-mer D3 | 15-mer D4 | 8mer | 7mer-m8 | 7mer-A1 | 6mer | Total |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
In silico predictions | 0 | 1 | 0 | 0 | 2 | 0 | 0 | 0 | 7 | 102 | 0 | 0 | 15 | 266 | 2,226 | 86 | 214 | 409 | 764 | 4,092 |
Expressed predictions | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 7 | 64 | 0 | 0 | 11 | 166 | 1,267 | 37 | 104 | 231 | 409 | 2,297 |
Off-target predictions | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 3 | 0 | 0 | 3 | 11 | 50 | 3 | 6 | 14 | 24 | 116 |
% off-target | 0.00% | 100.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 14.29% | 4.69% | 0.00% | 0.00% | 27.27% | 6.63% | 3.95% | 8.11% | 5.77% | 6.06% | 5.87% | 5.05% |
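The percentages in the last row of the table above are consistent with dividing the off-target predictions by the expressed predictions. The sketch below checks this for the seed columns and the total, using counts copied from the table. The formula is inferred from the numbers, not taken from the package source, and the variable names are illustrative.

```r
# Counts copied from the report without indels (seed columns and Total only).
expressed  <- c(`8mer` = 37, `7mer-m8` = 104, `7mer-A1` = 231, `6mer` = 409, Total = 2297)
off_target <- c(`8mer` = 3,  `7mer-m8` = 6,   `7mer-A1` = 14,  `6mer` = 24,  Total = 116)

# Assumed formula, inferred from the table: off-target / expressed * 100.
pct_off_target <- round(100 * off_target / expressed, 2)
print(pct_off_target)
```

The rounded values match the reported 8.11%, 5.77%, 6.06%, 5.87%, and 5.05%. Columns with no expressed predictions are shown as 0.00% in the table, whereas this plain division would return `NaN` for them.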
Generate report with indels
indel.report = SeedMatchReport(res = res, seqs = annodb$seqs, guide.seq = "UUAUAGAGCAAGAACACUGUUUU", indel.bool = TRUE)
indel.report$table
In-silico siRNA Binding Prediction: Identifying siRNA hits in the transcriptome
Column groups: Full = Full Guide Strand (g2:g23); 18-mer = g2:g19; 15-mer = g2:g19.

SeedMatchReport | Full D0 | Full D1 | Full D2 | Full D3 | Full D4 | 18-mer D0 | 18-mer D1 | 18-mer D2 | 18-mer D3 | 18-mer D4 | 15-mer D0 | 15-mer D1 | 15-mer D2 | 15-mer D3 | 15-mer D4 | 8mer | 7mer-m8 | 7mer-A1 | 6mer | Total |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
In silico predictions | 0 | 1 | 0 | 1 | 20 | 0 | 0 | 1 | 82 | 1,124 | 0 | 0 | 8 | 1,100 | 6,765 | 15 | 27 | 55 | 135 | 9,334 |
Expressed predictions | 0 | 1 | 0 | 1 | 11 | 0 | 0 | 0 | 53 | 663 | 0 | 0 | 3 | 642 | 3,483 | 6 | 9 | 31 | 66 | 4,969 |
Off-target predictions | 0 | 1 | 0 | 1 | 2 | 0 | 0 | 0 | 5 | 31 | 0 | 0 | 0 | 28 | 124 | 1 | 0 | 2 | 7 | 202 |
% off-target | 0.00% | 100.00% | 0.00% | 100.00% | 18.18% | 0.00% | 0.00% | 0.00% | 9.43% | 4.68% | 0.00% | 0.00% | 0.00% | 4.36% | 3.56% | 16.67% | 0.00% | 6.45% | 10.61% | 4.07% |
Generate report with wobbles
wobble.report = SeedMatchReport(res = res, seqs = annodb$seqs, guide.seq = "UUAUAGAGCAAGAACACUGUUUU", indel.bool = FALSE, allow_wobbles = TRUE)
wobble.report$table
In-silico siRNA Binding Prediction: Identifying siRNA hits in the transcriptome
Column groups: Full = Full Guide Strand (g2:g23); 18-mer = g2:g19; 15-mer = g2:g19.

SeedMatchReport | Full D0 | Full D1 | Full D2 | Full D3 | Full D4 | 18-mer D0 | 18-mer D1 | 18-mer D2 | 18-mer D3 | 18-mer D4 | 15-mer D0 | 15-mer D1 | 15-mer D2 | 15-mer D3 | 15-mer D4 | 8mer | 7mer-m8 | 7mer-A1 | 6mer | Total |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
In silico predictions | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 4 | 0 | 0 | 0 | 37 | 524 | 0 | 0 | 0 | 0 | 566 |
Expressed predictions | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 4 | 0 | 0 | 0 | 24 | 321 | 0 | 0 | 0 | 0 | 350 |
Off-target predictions | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 3 | 17 | 0 | 0 | 0 | 0 | 21 |
% off-target | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 100.00% | 0.00% | 0.00% | 0.00% | 0.00% | 12.50% | 5.30% | 0.00% | 0.00% | 0.00% | 0.00% | 6.00% |
Generate report with wobbles and with indels
indel.wobble.report = SeedMatchReport(res = res, seqs = annodb$seqs, guide.seq = "UUAUAGAGCAAGAACACUGUUUU", indel.bool = TRUE, allow_wobbles = TRUE)
indel.wobble.report$table
In-silico siRNA Binding Prediction: Identifying siRNA hits in the transcriptome
Column groups: Full = Full Guide Strand (g2:g23); 18-mer = g2:g19; 15-mer = g2:g19.

SeedMatchReport | Full D0 | Full D1 | Full D2 | Full D3 | Full D4 | 18-mer D0 | 18-mer D1 | 18-mer D2 | 18-mer D3 | 18-mer D4 | 15-mer D0 | 15-mer D1 | 15-mer D2 | 15-mer D3 | 15-mer D4 | 8mer | 7mer-m8 | 7mer-A1 | 6mer | Total |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
In silico predictions | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 15 | 0 | 0 | 0 | 124 | 2,471 | 0 | 0 | 0 | 0 | 2,611 |
Expressed predictions | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 12 | 0 | 0 | 0 | 74 | 1,384 | 0 | 0 | 0 | 0 | 1,471 |
Off-target predictions | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 8 | 55 | 0 | 0 | 0 | 0 | 64 |
% off-target | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 100.00% | 0.00% | 0.00% | 0.00% | 0.00% | 10.81% | 3.97% | 0.00% | 0.00% | 0.00% | 0.00% | 4.35% |