| F.A.Q. |
- What is BIP-Toolbox?
BIP-Toolbox is a modular collection of biological tools which are used to create bio databases. Each module, or tool, performs a specific function which contributes to the resulting database.
BIP stands for BioInformatics Pipeline. It provides generalized pipeline functions as well as tools designed specifically for biologically relevant analysis.
- Why BIP-Toolbox?
We want to develop a modular, flexible way to construct pipelines that populate databases.
- So what does it do?
BIP creates a database based on input data. The database can be a simple database storing information regarding transcript alignment to a genome, or something more specific such as alternative splicing, SNPs or STS markers. These different databases are created based on which tools are used from the toolbox.
- How can I use this pipeline and databases?
You can use the BIP-Toolbox to create biological databases ranging from a general-purpose alignment database to a more specific database that suits your particular needs. All databases begin with the alignment tool, BIP-Align. Depending on your needs, you may then refine the resulting database using another tool such as BIP-Splice. Since most of these tools are still in the design phase, we are still evaluating which combinations are logical.
- How is the clustering performed?
Clustering in BIP-Splice is deterministic. Each transcript maps to one area in the genome, and transcripts that overlap are members of the same cluster. There is no nearest-neighbor problem and no other distance calculation is needed; distance is measured purely by overlap. If a transcript is overlapped at each end by two or more other transcripts, the entire region is taken as a cluster. The overlap must occur in exonic regions: actual transcript sequence, not introns, must overlap.
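The overlap rule described above can be sketched as a small Python function. This is only an illustration, not the BIP-Splice implementation: the function name `cluster_transcripts`, the input layout (transcript id mapped to a chromosome and a list of half-open exon intervals) and the choice to ignore strand are all assumptions made for the example.

```python
from collections import defaultdict


def cluster_transcripts(transcripts):
    """Group transcripts whose exons overlap, directly or through a chain.

    transcripts: dict of id -> (chrom, [(start, end), ...]) with half-open
    exon coordinates. This layout is hypothetical, not the BIP schema.
    """
    parent = {tid: tid for tid in transcripts}

    def find(x):
        # Follow parent links to the cluster representative (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    # Collect exons per chromosome; introns are never stored, so only
    # exonic sequence can produce an overlap.
    exons_by_chrom = defaultdict(list)
    for tid, (chrom, exons) in transcripts.items():
        for start, end in exons:
            exons_by_chrom[chrom].append((start, end, tid))

    for exons in exons_by_chrom.values():
        exons.sort()
        block_end, block_tid = None, None
        for start, end, tid in exons:
            if block_end is not None and start < block_end:
                # This exon overlaps the current run of overlapping exons.
                union(tid, block_tid)
                block_end = max(block_end, end)
            else:
                # No overlap: start a new run of exons.
                block_end, block_tid = end, tid

    clusters = defaultdict(list)
    for tid in transcripts:
        clusters[find(tid)].append(tid)
    return sorted(sorted(members) for members in clusters.values())
```

In this sketch a transcript that lies entirely inside another transcript's intron stays in its own cluster, because only exon intervals are compared.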
- Is there a probabilistic step in the pipeline?
The question of probabilistic vs. deterministic actually goes back to the alignment step in BIP-Align. Each transcript may have multiple possible mappings, and we keep only the best one; that is, we assume that the highest-scoring alignment for a transcript is the correct one. Once the alignments are passed on to clustering, a transcript can belong to only one cluster.
- Why BLAT/SIM4 and not some other combination, or just one of these tools?
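Keeping only the best mapping per transcript could look like the following Python sketch. The record layout (dictionaries with `transcript` and `score` keys) and the tie-breaking rule (the first alignment seen wins) are assumptions for illustration; the FAQ does not say how BIP-Align stores alignments or resolves ties.

```python
def best_alignments(alignments):
    """Return a dict of transcript id -> its highest-scoring alignment.

    alignments: iterable of dicts with at least "transcript" and "score"
    keys (a hypothetical layout). On equal scores the earlier record is kept.
    """
    best = {}
    for aln in alignments:
        current = best.get(aln["transcript"])
        if current is None or aln["score"] > current["score"]:
            best[aln["transcript"]] = aln
    return best
```

Because each transcript ends up with exactly one alignment, the later clustering step can place it in exactly one cluster.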
Basically, this combination gives the best outcome in terms of speed and accuracy.
SIM4 gives a very accurate exon/intron structure. This is important for splice analysis because if one exon is misplaced by the slightest number of bases, then the splice analysis will give incorrect results. The problem with SIM4 is speed. Running the entire transcriptome against the genome is too computationally expensive.
BLAT alone is very fast, but it does not give a precise enough view of the exon/intron structure, so it cannot be used on its own for splice analysis.
The current combination uses BLAT as a first step, the output of which is expanded and fed to SIM4.
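The FAQ does not say how the BLAT output is expanded. One plausible reading is that the genomic region of each BLAT hit is widened by some flanking sequence before it is passed to SIM4, so that exon ends missed by BLAT still fall inside the region. The Python sketch below illustrates that reading only; the function name `expand_region` and the default flank of 1000 bases are hypothetical.

```python
def expand_region(start, end, chrom_length, flank=1000):
    """Widen a half-open [start, end) hit by `flank` bases on each side.

    The result is clamped to the chromosome bounds. The flank size is an
    assumed value, not one taken from BIP-Align.
    """
    if not 0 <= start < end <= chrom_length:
        raise ValueError("hit must lie within the chromosome")
    return max(0, start - flank), min(chrom_length, end + flank)
```

The widened region, rather than the whole genome, would then be the target sequence for the slower but more precise SIM4 alignment.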
- What if I want to use another alignment method?
BIP-Align has a modular architecture which would allow another alignment method to be used. A new alignment module needs to be developed for this to happen.
- Which input data sources make sense?
Basically, any data source that provides mRNA, cDNA or EST sequences. The more information the better: ideally the source would also provide library/tissue information, annotation, annotated CDS, and any synonyms that map its IDs to other databases. However, BIP can use something as simple as sequences in a FASTA file. Preferred data sources are GenBank, UniGene and dbEST.
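For the simplest case, sequences in a FASTA file, the input boils down to pairs of identifier and sequence. The Python sketch below shows a minimal reader for that format; it is an illustration and not part of BIP-Toolbox, and the function name `read_fasta` is hypothetical. It takes the first word of each header line as the identifier.

```python
def read_fasta(lines):
    """Parse FASTA text into a dict of sequence id -> sequence.

    lines: any iterable of text lines, such as an open file handle.
    """
    records = {}
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line closes the previous record and opens a new one.
            if name is not None:
                records[name] = "".join(chunks)
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records
```

A file can be read with `with open(path) as handle: records = read_fasta(handle)`, where `path` points to the FASTA file.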
More about the individual tools:
| BIP-Align |
All of the following tools rely on the database created by this tool. Its function is to create a database and populate it with transcript data and the alignments to the genome. |
| BIP-Splice |
This tool adds alternative splicing analysis to a database created by BIP-Align |
| BIP-SNP |
This tool adds SNP analysis to a database created by BIP-Align |
| BIP-Marker |
This tool adds STS marker information to a database created by BIP-Align |
|