Introduction
RNA-seq data from metatranscriptomics NGS projects contain both coding and non-coding RNAs. Before any gene expression or taxonomic analysis, it is important to separate the reads into messenger RNAs (mRNAs) and ribosomal RNAs (rRNAs). SortMeRNA, first released in 2012, is a fast and accurate tool for filtering rRNAs from metatranscriptomic datasets. Its core algorithm is based on approximate seeds, that is, short seeds that are allowed to match with errors. The original publication (Kopylova et al., 2012) states:
SortMeRNA has shown to be a rapid and efficient filter that can sort a large set of metatranscriptomic reads with high accuracy comparable with the HMM-based programs. SortMeRNA implements seeds with errors (substitution and indel), and this important characteristic renders the algorithm robust to errors of different types of sequencers while providing the ability to discover new rRNA sequences from unknown species.
The method used by the algorithm is universal and flexible. The database can be constructed on any family of sequences provided by the user. Moreover, the algorithm does not require a multiple sequence alignment file to build the database, as HMM-based programs do, and this is an advantage when sequences are hard to align or only partial sequences are available. Another advantage of SortMeRNA is the small number of parameter settings required by the program.
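To illustrate what "seeds with errors" means, here is a minimal Python sketch that tests whether a read shares a k-mer with a reference sequence while allowing at most one substitution or indel. It is a naive brute-force scan written for clarity, not SortMeRNA's implementation, which builds an index of the database instead of scanning it. The function names and the seed length `k` are illustrative assumptions.

```python
def within_one_edit(a: str, b: str) -> bool:
    """True if a and b differ by at most one substitution, insertion or deletion."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        # Same length: exactly one mismatching position (a substitution).
        return sum(x != y for x, y in zip(a, b)) == 1
    # Lengths differ by one: make `a` the shorter string, then allow
    # skipping a single character of `b` (an indel).
    if len(a) > len(b):
        a, b = b, a
    i = j = 0
    skipped = False
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            i += 1
            j += 1
        elif skipped:
            return False
        else:
            skipped = True
            j += 1
    return True


def has_approximate_seed(read: str, reference: str, k: int = 18) -> bool:
    """Naive scan: does any k-mer of the read match a reference window within one edit?

    k = 18 is an illustrative default, not a claim about the tool's settings.
    """
    for i in range(len(read) - k + 1):
        seed = read[i:i + k]
        # Reference windows of length k-1, k and k+1 cover one deletion,
        # one substitution and one insertion respectively.
        for w in (k - 1, k, k + 1):
            for j in range(len(reference) - w + 1):
                if within_one_edit(seed, reference[j:j + w]):
                    return True
    return False


reference = "GGGGACGTTAGCCGGGG"
print(has_approximate_seed("ACGTAAGCC", reference, k=8))  # one substitution
print(has_approximate_seed("TTTTTTTTT", reference, k=8))  # no similar seed
```

The first read differs from the reference window `ACGTTAGC` at a single position, so an exact-match seed would miss it while a seed with one error still finds it; this tolerance is what makes approximate seeds robust to sequencing errors.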
Since OmicsBox 2.0, the rRNA Removal feature based on SortMeRNA has been available in the Metagenomics Module.
SortMeRNA in OmicsBox
- The rRNA Removal tool accepts NGS reads in FASTA, single-end FASTQ, and paired-end FASTQ format and splits the dataset into two parts: mRNA and rRNA reads. The output is again a set of FASTQ files.
- OmicsBox allows running SortMeRNA against a set of databases that includes Rfam and SILVA 16S, 18S, and 23S rRNA sequences. An additional database of rRNA sequences can be provided as a FASTA file.
- Multiple FASTQ files can be processed in parallel. Processing a 1 GB FASTQ file takes about 30 minutes.
- The output can be used directly in downstream analysis steps such as taxonomic quantification with Kraken.
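The mRNA/rRNA split described in the list above can be pictured with a small Python sketch: parse single-end FASTQ records, classify each read, and write the rRNA and non-rRNA reads to separate FASTQ outputs. The classifier here is a hypothetical stand-in (exact k-mer sharing with a reference set), not SortMeRNA's approximate-seed alignment, and all function names are illustrative. Paired-end input, where mates have to stay together, is not handled.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

FastqRecord = Tuple[str, str, str]  # (header, sequence, quality)


def read_fastq(text: str) -> Iterator[FastqRecord]:
    """Parse 4-line FASTQ records; a trailing incomplete record is ignored."""
    lines = text.splitlines()
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        yield header, seq, qual


def make_kmer_classifier(reference_seqs: Iterable[str], k: int = 18) -> Callable[[str], bool]:
    """Hypothetical classifier: a read counts as rRNA if it shares any exact k-mer
    with the reference sequences (a simplification, not SortMeRNA's method)."""
    kmers = {s[i:i + k] for s in reference_seqs for i in range(len(s) - k + 1)}
    return lambda seq: any(seq[i:i + k] in kmers for i in range(len(seq) - k + 1))


def split_reads(records: Iterable[FastqRecord],
                is_rrna: Callable[[str], bool]) -> Tuple[List[FastqRecord], List[FastqRecord]]:
    """Partition records into (non-rRNA, rRNA) lists, preserving input order."""
    other: List[FastqRecord] = []
    rrna: List[FastqRecord] = []
    for rec in records:
        (rrna if is_rrna(rec[1]) else other).append(rec)
    return other, rrna


def to_fastq(records: Iterable[FastqRecord]) -> str:
    """Serialize records back to 4-line FASTQ text."""
    return "".join(f"{h}\n{s}\n+\n{q}\n" for h, s, q in records)


fastq_text = "@r1\nACGTTAGCCGAT\n+\nIIIIIIIIIIII\n@r2\nTTTTTTTTTTTT\n+\nIIIIIIIIIIII\n"
classifier = make_kmer_classifier(["ACGTACGTTAGCCGATCG"], k=8)
other, rrna = split_reads(read_fastq(fastq_text), classifier)
print(to_fastq(rrna), end="")   # the rRNA-like read r1
print(to_fastq(other), end="")  # the remaining read r2
```

Passing the classifier as a plain function keeps the FASTQ handling independent of the matching rule, so a more realistic rule could be swapped in without changing the splitting code.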
References
Kopylova E., Noé L. and Touzet H., “SortMeRNA: Fast and accurate filtering of ribosomal RNAs in metatranscriptomic data”, Bioinformatics (2012), doi: 10.1093/bioinformatics/bts611.