This article explains what a Gene Set Enrichment Analysis (GSEA) is, how it works, and how it can be performed with OmicsBox.
What is an enrichment analysis?
An enrichment analysis is a bioinformatics method that identifies enriched or over-represented gene sets in a ranked list of genes. Gene sets are groups of genes that are functionally related according to current knowledge. Commonly used gene sets are those sharing biological functions, such as Gene Ontology terms or pathways, or a common relation, such as a disease, chromosomal location or regulation.
How does a Gene Set Enrichment Analysis (GSEA) work?
GSEA is a computational method to determine whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states (for example, two phenotypes or tissues). This method is used to identify classes of genes or proteins that are over-represented in a large set of genes or proteins; these classes may have an association with biological functions or disease phenotypes. The method uses statistical approaches to identify significantly enriched or depleted classes or functions.
The standard GSEA method involves three steps in the analytical process:
- Calculate the enrichment score (ES): the ES represents the degree to which the genes in the set are over-represented at either the top or the bottom of the ranked list.
- Estimate the statistical significance of the ES: a phenotype-based permutation test produces a null distribution for the ES, against which the observed score is compared.
- Adjust for multiple hypothesis testing when a large number of gene sets are analyzed at once: the enrichment scores of each set are normalized and a false discovery rate (FDR) is calculated.
Because GSEA relies on these specific statistics, running it requires software that implements them.
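The first two steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the Broad Institute implementation used by Blast2GO: the function names, the weighting exponent `p` and the toy data are all hypothetical. Significance is estimated here by permuting gene sets (drawing random sets of the same size) rather than by the phenotype-based permutation described above, because only a ranked list is available in this sketch.

```python
import random


def enrichment_score(ranked, gene_set, p=1.0):
    """Return (ES, running_sum) for a list of (gene, score) pairs sorted by score."""
    in_set = [gene in gene_set for gene, _ in ranked]
    n_hits = sum(in_set)
    n_miss = len(ranked) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    # Hits are weighted by |score|^p; misses all carry the same weight.
    hit_total = sum(abs(s) ** p for (_, s), hit in zip(ranked, in_set) if hit)
    if hit_total == 0:
        raise ValueError("all genes in the set have a score of zero")
    running, curve = 0.0, []
    for (_, score), hit in zip(ranked, in_set):
        running += abs(score) ** p / hit_total if hit else -1.0 / n_miss
        curve.append(running)
    # The ES is the maximum deviation of the running sum from zero.
    return max(curve, key=abs), curve


def permutation_p_value(ranked, gene_set, n_perm=1000, seed=0):
    """Estimate a p-value for the ES from random gene sets of the same size."""
    rng = random.Random(seed)
    genes = [gene for gene, _ in ranked]
    size = sum(1 for gene in genes if gene in gene_set)
    observed, _ = enrichment_score(ranked, gene_set, p=1.0)
    null = [enrichment_score(ranked, set(rng.sample(genes, size)))[0]
            for _ in range(n_perm)]
    # Compare only against null scores with the same sign as the observed ES.
    same_sign = [es for es in null if (es >= 0) == (observed >= 0)]
    extreme = sum(1 for es in same_sign if abs(es) >= abs(observed))
    # Adding 1 to both counts keeps the estimate away from exactly zero.
    return observed, (extreme + 1) / (len(same_sign) + 1)


# Toy ranked list: ten hypothetical genes with decreasing scores.
toy_ranked = [(f"g{i}", s) for i, s in enumerate([5, 4, 3, 2, 1, -1, -2, -3, -4, -5])]
toy_es, toy_p = permutation_p_value(toy_ranked, {"g0", "g1", "g2"})
print(toy_es, toy_p)
```

A gene set concentrated at the top of the list yields a positive ES, while one concentrated at the bottom yields a negative ES.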
Blast2GO makes it very easy to perform a gene set enrichment analysis (GSEA)
As a complete bioinformatics toolset, Blast2GO allows you to perform gene set enrichment analysis (GSEA), among many other functions. Blast2GO makes use of the GSEA software package developed by the Broad Institute of MIT and Harvard. Its integration in Blast2GO makes it easy to run the analysis and review the results, allowing you to focus on their interpretation.
The steps on how to perform a gene set enrichment analysis (GSEA) with Blast2GO are explained in this short video.
The video shows how to identify enriched functions from a tissue comparison by performing GSEA with Blast2GO. To run GSEA, a ranked list of functionally annotated genes is required. This list can be created in different ways:
- One option is to enter the list of IDs and numeric values in a spreadsheet and save it as a text file.
- Another option is to use the differential expression data directly from within Blast2GO by creating an ID-Value List via the table context menu “Create ID-Value-List”. This ID-Value List contains two columns and can be saved as a .b2g object or exported as a text file.
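As an alternative to a spreadsheet, a two-column ranked list can also be written with a short script. The sketch below assumes a tab-separated layout without a header row, which is a guess at a typical ID-value format and not a documented Blast2GO specification; the gene IDs and values are made up.

```python
import csv
import io


def write_ranked_list(values_by_id, handle):
    """Write IDs and values as two tab-separated columns, highest value first."""
    ranked = sorted(values_by_id.items(), key=lambda item: item[1], reverse=True)
    csv.writer(handle, delimiter="\t", lineterminator="\n").writerows(ranked)
    return ranked


# Hypothetical log2 fold changes from a tissue comparison.
example_values = {"gene_A": 2.4, "gene_B": -1.7, "gene_C": 0.3}

# An in-memory buffer is used here; pass an opened text file to save to disk.
buffer = io.StringIO()
write_ranked_list(example_values, buffer)
ranked_text = buffer.getvalue()
print(ranked_text)
```

To save the list to disk, the same function can be called with a file opened as `open("ranked_list.txt", "w", newline="")`, where the file name is arbitrary.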
To start the GSEA, you have to load the functional annotations of your genes/proteins; their IDs have to match the IDs of your ranked list. Once the Blast2GO project is loaded and the ranked list is created, you are ready to run the enrichment analysis. Click on ‘Analysis – Gene set enrichment analysis (GSEA)’ and select the input file, which can be in one of several formats. Then provide the analysis parameters and hit run:
- Specify the number of gene set permutations.
- Enter the number of detailed GO enrichment plots to create.
- Choose the Gene Ontology categories you want to use.
- Set a maximum and minimum size of the gene-sets (GOs) to be included in the analysis.
- Select the filter mode and the cut-off.
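The effect of the size limits can be illustrated with a short sketch. It assumes that a gene set's size is counted as the number of its members that actually occur in the ranked list; the function name, the default limits and the example GO IDs are illustrative assumptions, not values taken from Blast2GO.

```python
def filter_gene_sets(gene_sets, ranked_ids, min_size=15, max_size=500):
    """Keep gene sets whose overlap with the ranked list is within the size limits."""
    universe = set(ranked_ids)
    kept = {}
    for name, members in gene_sets.items():
        # Only members that appear in the ranked list can contribute to the ES.
        present = set(members) & universe
        if min_size <= len(present) <= max_size:
            kept[name] = present
    return kept


# Hypothetical example with small limits so the effect is visible.
example_ranked_ids = ["g1", "g2", "g3", "g4", "g5"]
example_sets = {
    "GO:0000001": ["g1", "g2", "g9"],        # 2 members in the ranked list
    "GO:0000002": ["g1"],                    # too small
    "GO:0000003": ["g1", "g2", "g3", "g4"],  # too large
}
print(filter_gene_sets(example_sets, example_ranked_ids, min_size=2, max_size=3))
```

Very small sets tend to give unstable scores and very large sets are rarely informative, which is why both limits are usually set.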
Once the analysis is finished, you will obtain a result table that shows all significantly over-represented functions among the IDs at the top and bottom of your ranked list. In addition to the GO ID and GO term of each function, the results provide many details:
- The Enrichment Score (ES) reflects the degree of over-representation of a GO at the extremes of the ranked list.
- The normalized ES (NES) is the primary statistic for this type of enrichment result; normalization accounts for differences in gene set size, so that scores can be compared across gene sets.
- The FDR is the p-value adjusted for multiple testing.
- The last column indicates whether a GO is enriched at the top or the bottom of the ranked list.
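The relationship between the ES and the normalized ES can be sketched as follows. In the standard GSEA method, the ES is divided by the mean of the permutation-derived null scores that have the same sign; the function name and the numbers below are hypothetical and serve only as an illustration.

```python
def normalized_es(observed_es, null_es):
    """Divide the ES by the mean magnitude of same-sign scores from the null distribution."""
    same_sign = [es for es in null_es if es * observed_es > 0]
    if not same_sign:
        raise ValueError("no null scores share the sign of the observed ES")
    mean_magnitude = sum(abs(es) for es in same_sign) / len(same_sign)
    return observed_es / mean_magnitude


# Hypothetical null scores from permutations of one gene set.
example_null = [0.2, 0.4, -0.3]
print(normalized_es(0.6, example_null))   # positive NES: enriched at the top
print(normalized_es(-0.6, example_null))  # negative NES: enriched at the bottom
```

The sign of the NES corresponds to the last column of the result table: positive values indicate enrichment at the top of the ranked list, negative values at the bottom.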
Right-clicking on a GO ID opens a new page with more details, such as the GO description and the GSEA result details. An “enrichment plot” provides a graphical view of the enrichment score (ES) for a gene set.
The enrichment plot shows a green line representing the running ES for a given GO as the analysis walks down the ranked list. The value at the peak (the maximum deviation from zero) is the final ES. The middle part shows where the members (genes) of the gene set appear in the ranked list. For a positive ES, the genes that appear at or before the peak form the Leading Edge Subset; for a negative ES, it is the genes that appear after the peak. The lower part shows the value of the ranking metric as you move down the list of ranked genes.
The result page has a toolbar with several options, such as creating charts, filtering the results or saving them as a text file. The option ‘Reduce to most specific’ filters the results based on their specificity, ‘Make an enrichment graph’ generates a GO graph for each GO category selected in the wizard, and ‘Show global statistics’ generates different statistical graphs.
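The Leading Edge Subset can be read off the running ES directly, as the following sketch shows. It assumes the running sum has already been computed for each position of the ranked list; the function name and the toy values (an unweighted running sum for six hypothetical genes) are illustrative only.

```python
def leading_edge(ranked_ids, gene_set, running_sum):
    """Return the gene set members that contribute to the peak of the running ES."""
    # Position of the maximum deviation from zero, i.e. where the ES is reached.
    peak = max(range(len(running_sum)), key=lambda i: abs(running_sum[i]))
    if running_sum[peak] >= 0:
        window = ranked_ids[: peak + 1]  # positive ES: genes at or before the peak
    else:
        window = ranked_ids[peak:]       # negative ES: genes after the peak
    return [gene for gene in window if gene in gene_set]


# Toy data: hits add 1/3 and misses subtract 1/3 from the running sum.
example_ids = ["g1", "g2", "g3", "g4", "g5", "g6"]
example_set = {"g1", "g2", "g5"}
example_running = [1 / 3, 2 / 3, 1 / 3, 0.0, 1 / 3, 0.0]
print(leading_edge(example_ids, example_set, example_running))
```

In this toy case the peak is reached at the second gene, so `g5` belongs to the gene set but not to the leading edge.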
These visualizations will help in the interpretation of the results, to find biological meaning as well as to communicate your findings.
If you want to try all this yourself you can download Blast2GO from here.
Gene Set Enrichment Analysis can also be executed in OmicsBox, which you can download from here.