The IsoSeq sequencing method produces full-length transcripts using Single Molecule, Real-Time (SMRT) Sequencing. Long read lengths allow sequencing of full-length transcripts up to 10 kb or longer, removing the need for transcript assembly or inference. The IsoSeq bioinformatics pipeline processes the data into high-quality consensus transcript sequences that enable accurate isoform annotation and open reading frame prediction. OmicsBox implements the IsoSeq v3 pipeline, which contains the latest tools to identify transcripts in PacBio single-molecule sequencing data.
Since OmicsBox 2.0, the IsoSeq bioinformatics pipeline, based on IsoSeq v3, has been available in the Transcriptomics Module.
De-Novo Isoform Discovery of PacBio Data with OmicsBox
- The IsoSeq application accepts PacBio sequencing data in the form of subreads or circular consensus sequences (CCS). When subreads are provided, the circular consensus sequence calling step is performed first. Subreads must be in BAM format, while CCS reads can also be provided in FASTA or FASTQ format.
- The IsoSeq pipeline consists of several steps: CCS calling, primer removal and demultiplexing, refine, clustering and polishing. The IsoSeq configuration wizard allows adjusting parameters for each step.
- Output consensus transcripts are returned in several formats (FASTA, BAM, and FASTQ). They are complemented by additional results, such as charts and reports, that help interpret them.
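The relationship between input type and pipeline steps described above can be sketched in a few lines of Python. This is only an illustration of the logic, not OmicsBox or IsoSeq code: the function name `plan_steps`, the `read_type` labels, and the file extensions used to recognize each format are assumptions made for this example.

```python
from __future__ import annotations

from pathlib import Path

# Step names as listed in the pipeline description.
PIPELINE_STEPS = [
    "CCS calling",
    "primer removal and demultiplexing",
    "refine",
    "clustering",
    "polishing",
]

# Accepted formats per input type. The extension spellings are an
# assumption of this sketch; only the formats themselves come from the text.
ACCEPTED_EXTENSIONS = {
    "subreads": {".bam"},
    "ccs": {".bam", ".fasta", ".fastq"},
}


def plan_steps(input_file: str, read_type: str) -> list[str]:
    """Return the pipeline steps that apply to the given input (hypothetical helper)."""
    if read_type not in ACCEPTED_EXTENSIONS:
        raise ValueError(f"unknown read type: {read_type!r}")

    suffix = Path(input_file).suffix.lower()
    if suffix not in ACCEPTED_EXTENSIONS[read_type]:
        raise ValueError(
            f"{read_type} input must be one of {sorted(ACCEPTED_EXTENSIONS[read_type])}, "
            f"got {suffix!r}"
        )

    # CCS calling is only needed when raw subreads are provided.
    if read_type == "ccs":
        return PIPELINE_STEPS[1:]
    return list(PIPELINE_STEPS)
```

For example, `plan_steps("movie.subreads.bam", "subreads")` returns all five steps, while `plan_steps("movie.ccs.fastq", "ccs")` returns the four steps starting at primer removal and demultiplexing. Passing subreads in FASTQ format raises a `ValueError`.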
Note: OmicsBox makes it possible to obtain a single set of consensus transcripts by running multiple IsoSeq movies in parallel.
Note: Take advantage of the bioinformatic tools provided by the Transcriptomics and Functional Analysis modules, such as the Predict Coding Regions and CloudBlast tools, to analyze the consensus transcripts in depth.
Töpfer, A. and Tseng, E. 2020. IsoSeq v3: Scalable De Novo Isoform Discovery. https://github.com/PacificBiosciences/IsoSeq.