Examples for TFO search, TTS search and Triplex search
Triplexator Examples
We give three generic examples of how Triplexator can be applied to biological sequence data.
Each example addresses one of the three steps involved in triplex formation.
The file ./demos/P00374.fasta serves as an example sequence.
It contains the genomic region spanning 1000 bp upstream to 200 bp downstream of the transcription start site of the human DHFR gene.
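All three examples read their input from FASTA files (the demo sequence, as well as transcripts.fasta, genome.fasta and promoters.fasta in the commands). To inspect such a file before running Triplexator, a minimal Python reader like the sketch below may help. `read_fasta` is a hypothetical helper, not part of Triplexator, and it assumes plain, uncompressed FASTA with one header line per record.

```python
def read_fasta(path):
    """Minimal FASTA reader returning {header line without '>': sequence}.

    Sketch only: sequences are upper-cased, blank lines are skipped, and a
    repeated header silently overwrites the earlier record.
    """
    records = {}
    header = None
    chunks = []
    with open(path) as handle:
        for raw in handle:
            line = raw.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Store the previous record before starting a new one.
                if header is not None:
                    records[header] = "".join(chunks)
                header = line[1:].strip()
                chunks = []
            else:
                chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records
```

For example, `read_fasta("demos/P00374.fasta")` should return the DHFR promoter region keyed by its header line.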
Identify TFOs in single-strand sequences
We want to find all putative triplex-forming oligonucleotides (TFOs) in a set of
transcripts, subject to the following criteria:
- at least 15 bps in length "-l 15"
- having at most 15% errors in the motif "-e 15"
- we are only interested in TFOs that form triplexes via the
purine or the purine-pyrimidine motif "-m R,M"
- we want to remove low-complexity regions of length >= 7 and period <= 1
(e.g. for poly(A) filtering) "-fr on -mrl 7 -mrp 1"
- output all sites "-of 0"
- indicate errors with lowercase letters (pretty output) "-po"
- write the output to the file named transcripts.tfo "-o transcripts.tfo"
- place the results in a specific folder "-od folder"
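The low-complexity filter can be pictured as masking tandem repeats. The sketch below assumes a repeat is any stretch in which every base equals the base one period earlier, and reports stretches whose repeat unit is at most `max_period` nt and whose total length is at least `min_length` nt, mirroring the roles of "-mrp" and "-mrl". `low_complexity_regions` is a hypothetical helper; Triplexator's own filter may treat boundaries and overlaps differently.

```python
def low_complexity_regions(seq, min_length=7, max_period=1):
    """Half-open (start, end) intervals of tandem repeats in `seq`.

    Assumption: a repeat of period p is a stretch where seq[j] == seq[j - p]
    for every position after its first p bases.
    """
    seq = seq.upper()
    n = len(seq)
    found = []
    for period in range(1, max_period + 1):
        i = 0
        while i + period <= n:
            j = i + period
            # Extend the stretch while it keeps repeating with this period.
            while j < n and seq[j] == seq[j - period]:
                j += 1
            if j - i >= min_length:
                found.append((i, j))
                i = j
            else:
                i += 1
    # Merge overlapping hits found with different periods.
    merged = []
    for start, end in sorted(found):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# A poly(A) run of 10 nt inside a short transcript fragment.
print(low_complexity_regions("GCTAGCAAAAAAAAAACGT"))
```

With the settings "-mrl 7 -mrp 1" only homopolymer runs such as poly(A) tails are caught; raising the period (as in the TTS example, "-mrp 3") also catches di- and trinucleotide repeats.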
Command:
>triplexator -l 15 -e 15 -m R,M -fr on -mrl 7 -mrp 1 -of 0 -po -od folder -o transcripts.tfo -ss transcripts.fasta
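Error thresholds such as "-e 15" are given as percentages, so the absolute number of tolerated errors grows with the length of the match. The sketch below assumes the percentage is applied to the match length and rounded down; the exact rounding rule used by Triplexator is not stated here. `max_errors` is a hypothetical helper.

```python
def max_errors(length, error_percent):
    """Whole number of errors tolerated in a match of `length` nt.

    Assumption: the percentage is applied to the match length and rounded down.
    """
    return length * error_percent // 100


# Under this assumption, with -l 15 -e 15 a 15-nt TFO may contain at most
# 2 errors, while longer TFOs get a proportionally larger budget.
for length in (15, 20, 30):
    print(length, max_errors(length, 15))
```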
An example script searching TFOs in the DHFR gene promoter is provided in demos:
>./demos/tfo_search.sh
It produces three output files:
Identify high quality putative TTSs in a genome
We want to find all putative triplex target sites (TTSs) in a
genome that comply with the following criteria:
- at least 15 bps in length "-l 15"
- containing at least 50% guanines "-g 50"
- having at most 10% pyrimidine interruptions "-e 10"
- filter out low-complexity regions of length >= 7 and period <= 3
"-fr on -mrl 7 -mrp 3"
- at most 2 duplicates in the duplex set, using the strict detection algorithm "-dd 2 -dc 2"
- output all sites "-of 0"
- write the output to the file named genome.tts "-o genome.tts"
Command:
>triplexator -l 15 -g 50 -e 10 -fr on -mrl 7 -mrp 3 -dd 2 -dc 2 -of 0 -o genome.tts -ds genome.fasta
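The guanine ("-g 50") and interruption ("-e 10") criteria can be illustrated with a sliding-window scan over both strands of a duplex. This is a simplified sketch, not Triplexator's algorithm: it assumes fixed-length windows, takes the more purine-rich strand as the target strand, and computes both percentages over the whole window. `find_tts` and `passes_tts_criteria` are hypothetical names; duplicate handling ("-dd", "-dc") and the low-complexity filter are omitted.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def passes_tts_criteria(window, min_guanine_percent=50, max_error_percent=10):
    """Check one duplex window against guanine and interruption thresholds.

    Assumption: the strand with more purines is the target (purine) strand,
    every pyrimidine on it counts as an interruption, and both percentages
    refer to the full window length (error budget rounded down).
    """
    window = window.upper()
    strand = max(window, reverse_complement(window),
                 key=lambda s: s.count("A") + s.count("G"))
    n = len(strand)
    interruptions = strand.count("C") + strand.count("T")
    guanines = strand.count("G")
    return (interruptions <= n * max_error_percent // 100
            and guanines * 100 >= n * min_guanine_percent)


def find_tts(duplex, length=15, **criteria):
    """Start positions of all windows of `length` bp that pass the criteria."""
    duplex = duplex.upper()
    return [i for i in range(len(duplex) - length + 1)
            if passes_tts_criteria(duplex[i:i + length], **criteria)]


# A 15-bp polypurine stretch flanked by T runs: only the windows starting
# at positions 3, 4 and 5 contain at most one pyrimidine.
print(find_tts("TTTT" + "GGAGGAAGGGAGGAG" + "TTTT"))
```

Because the reverse complement is checked as well, a polypurine target is found regardless of which strand of the genome carries it.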
An example script searching TTSs in the DHFR gene promoter is provided in demos:
>./demos/tfo_search.sh
It produces three output files:
Identify TFO-TTS pairs in single-strand and duplex sequences
We want to find all putative triplexes that can form between a set of
transcripts and a set of promoters, subject to the following criteria:
- at least 15 bps in length "-l 15"
- having at most 20% errors "-e 20"
- tolerate up to 2 consecutive errors "-c 2"
- require a guanine ratio of at least 20% "-g 20"
- disable low-complexity filtering "-fr off"
- use the purine motif only "-m R"
- given the previous parameter, disable q-gram filtering since it won't help "-fm 0"
- output the alignments "-of 1"
- we'd like to inspect the alignments, so make them pretty "-po"
- write the output to the file named transcripts_promoters.tpx
"-o transcripts_promoters.tpx"
- we don't have much time but plenty of memory, so run in parallel;
promoters are fairly short, so parallelize on duplexes "-rm 2"
- but don't use all processors, since I still have to work; use 3
"-p 3"
Attention: this specific parameter setting spans a large search space and may require substantial computational resources.
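To make the interplay of "-e 20", "-c 2" and "-m R" concrete, the sketch below scores a TFO against a TTS purine strand under a simplified purine-motif rule (TFO G opposite G, TFO A opposite A). It assumes both sequences are already aligned base by base, ignores strand orientation, treats any pyrimidine on the purine strand as an error, and rounds the error budget down. `check_purine_triplex` is a hypothetical helper, not Triplexator's scoring function.

```python
def check_purine_triplex(tts, tfo, error_percent=20, max_consecutive=2):
    """Simplified purine-motif check of a pre-aligned TFO/TTS pair.

    Assumptions: position i of `tfo` faces position i of `tts` (orientation
    ignored); a position is an error if the TTS base is a pyrimidine or the
    TFO base differs from it; the error budget is rounded down.
    """
    tts, tfo = tts.upper(), tfo.upper()
    if len(tts) != len(tfo):
        raise ValueError("sequences must be aligned to the same length")
    errors = [t not in "AG" or o != t for t, o in zip(tts, tfo)]
    # Longest run of consecutive errors (compare with -c).
    longest = run = 0
    for is_error in errors:
        run = run + 1 if is_error else 0
        longest = max(longest, run)
    allowed = len(tts) * error_percent // 100
    return sum(errors) <= allowed and longest <= max_consecutive


tts = "GGAGGAAGGGAGGAG"
print(check_purine_triplex(tts, "CGAGGTAGGGTGGAG"))  # True: 3 scattered errors
print(check_purine_triplex(tts, "CCTGGAAGGGAGGAG"))  # False: 3 consecutive errors
```

Both TFOs above stay within the 20% error budget of a 15-nt match; only the consecutive-error limit separates them.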
Command:
>triplexator -l 15 -e 20 -c 2 -fr off -g 20 -m R -fm 0 -of 1 -o transcripts_promoters.tpx -po -rm 2 -p 3 -ss transcripts.fasta -ds promoters.fasta
An example script searching triplexes in the DHFR gene promoter is provided in demos:
>./demos/tfo_search.sh
It produces three output files: