
Cloud Blast and Cloud Units: How to use resources wisely


This article describes how to run CloudBlast wisely. OmicsBox allows you to perform NCBI Blast searches using a cloud system for high-performance computing. This system runs jobs in parallel and autoscales depending on demand. To control usage, we use so-called CloudUnits. These units represent computation time (CPU seconds) on the AWS infrastructure. At the moment, only Blast and InterProScan consume units in OmicsBox, as they represent over 80% of the total cloud costs.

All OmicsBox subscriptions include a certain amount of units that can easily be recharged. However, to consume these resources wisely, you should take several aspects into consideration when performing Blast searches. Please remember that the consumption of units does not depend on the overall time it takes to Blast your dataset but on the amount of computational resources used. Be cautious when analyzing big sequence datasets for the first time, and please read the following recommendations:

Reduce your search space:

Use a taxonomic filter. This saves time and computational resources (and units). Of course, this is just a recommendation, and the final decision depends on you, your research requirements, and your budget. Normally, if you are looking for potential homologous sequences for a plant species, for example, you may want to consider only plant species and exclude the large share of bacterial genomes (>50%) in the NR protein database.

Adjust search sensitivity:

We recommend using the blastx-fast configuration, which basically increases the word size of the alignments from 3 to 6. According to the NCBI, this should not alter your homology search results significantly but should provide a performance increase. This configuration is described in more detail by Shiryev et al. (2007) here: http://www.ncbi.nlm.nih.gov/pubmed/17921491

The CloudBlast wizard in OmicsBox with the filter option to include or exclude taxonomic groups.

Check on your Cloud Usage:

Use the “Cloud Usage” tab (available from the “View” menu) to review your unit consumption from within OmicsBox. Please see the image below.

If you are new to CloudBlast and OmicsBox, we recommend that you try your final blast parameters on a smaller dataset first. This avoids surprises. Use a small subset of your dataset (for example, 1000 sequences) to estimate the unit consumption, and then review a few of your alignment results alongside the amount of units consumed.

Table with jobs executed in the cloud and their corresponding consumption of CloudUnits. Details of each job are provided in the tooltip.
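One way to prepare the small trial subset suggested above, before loading it into OmicsBox, is to sample records from your FASTA file. The following is only a minimal sketch in plain Python; the function names, the fixed random seed, and the 60-character line width are assumptions made for illustration and are not part of OmicsBox.

```python
import random


def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def sample_records(records, n, seed=0):
    """Return up to n randomly chosen records, kept in their original order.

    The seed is fixed so that the same subset is drawn on every run
    (an illustrative choice, not a requirement).
    """
    records = list(records)
    if len(records) <= n:
        return records
    rng = random.Random(seed)
    keep = sorted(rng.sample(range(len(records)), n))
    return [records[i] for i in keep]


def write_fasta(records, handle, width=60):
    """Write (header, sequence) tuples to a file-like object in FASTA format."""
    for header, seq in records:
        handle.write(f">{header}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")
```

For example, opening a hypothetical input file as `src` and an output file as `dst`, the call `write_fasta(sample_records(read_fasta(src), 1000), dst)` would write a 1000-sequence subset. Sampling at random rather than taking the first 1000 records is a design choice: it makes the trial more representative if the file is sorted by length or coverage.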

Example:

Blasting 1000 nucleotide sequences (contigs, CDS) against the NR database with blastx (without the “fast” option) and without a filter on a specific taxonomy group means you are searching all 6 reading frames of all sequences against the world’s largest protein sequence collection with great sensitivity. Restricting the search to a plant subset makes a large difference: in the blastx-fast comparison below, consumption against the plant subset is more than 10 times lower than against the full database. The numbers below are from September 30th, 2020.

  • 1000 nucleotide sequences with blastx-fast against the NCBI non-redundant database: 46 min. and 99173 CloudUnits
  • 1000 nucleotide sequences with blastx-fast against the NCBI non-redundant viridiplantae subset: 26 min. and 7880 CloudUnits
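The two measurements above can be turned into a rough estimate for a larger dataset. The sketch below assumes that unit consumption scales linearly with the number of sequences, which is a simplification: actual consumption also depends on sequence length and on the number of hits. The dataset size of 25000 sequences is a hypothetical value chosen for illustration.

```python
# Figures quoted in the article (September 30th, 2020), 1000 sequences each.
TRIAL_SIZE = 1000
UNITS_FULL_NR = 99173        # blastx-fast against the full NR database
UNITS_VIRIDIPLANTAE = 7880   # blastx-fast against the Viridiplantae subset

# Hypothetical size of the full dataset (an assumption for this example).
DATASET_SIZE = 25000


def estimate_units(trial_units, trial_size, dataset_size):
    """Extrapolate unit consumption linearly from a trial run.

    Linear scaling is an assumption; treat the result as a rough guide only.
    """
    return trial_units * dataset_size / trial_size


ratio = UNITS_FULL_NR / UNITS_VIRIDIPLANTAE
print(f"Full NR consumed {ratio:.1f}x the units of the Viridiplantae subset")

for label, units in [("full NR", UNITS_FULL_NR),
                     ("Viridiplantae subset", UNITS_VIRIDIPLANTAE)]:
    estimate = estimate_units(units, TRIAL_SIZE, DATASET_SIZE)
    print(f"{label}: about {estimate:,.0f} units for {DATASET_SIZE} sequences")
```

With the quoted figures, the ratio comes out at roughly 12.6, consistent with the "more than 10 times lower" statement. Replacing the constants with the values from your own trial run in the “Cloud Usage” tab gives an estimate for your data.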

Disclaimer:

BioBam is not interested in selling any extra cloud units. Units are pass-through items for the sole purpose of covering cloud expenses.
