OmicsBox can rename all selected sequences at once by converting, replacing, or adding text to the current sequence names.
1 – Rename sequence suffix using regular expressions
You may realise, too late, that the sequence IDs in the BLAST results and in the FASTA file used for InterProScan differ slightly.
The sequence names in the FASTA file share the same prefix, but they carry an extra extension at the end.
OmicsBox offers the Batch Rename feature, which uses a regular expression to search for a term in the sequence name and replace it with the desired one.
This feature is available in the Functional Analysis module and can be found under Tools (Select, Rename, Search).
Using the data from the example, there are two possible solutions:
- Add the extra extension to the project with the BLAST results.
- The suffix to be added is _(ORF) (Figure 1).
Figure 1: Add the term to the end of the sequence name.
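- Remove the extra extension from the project with the InterProScan results by replacing it with nothing.
- The search term to match is _\(ORF\) (Figure 2). The parentheses are escaped with backslashes because they are special characters in regular expressions.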
Figure 2: Replace the search term by nothing.
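To see what the two solutions do to an identifier, here is a minimal Python sketch using the standard `re` module. It illustrates the idea only and is not how OmicsBox implements Batch Rename. The function names and the identifiers `Seq_001` and `Seq_002` are hypothetical, and the `$` anchor (which restricts the match to the end of the name) is an addition that is not part of the search term shown above.

```python
import re

# Suffix from the example above. Parentheses are regex metacharacters,
# so they are escaped when the suffix is used as a search pattern.
SUFFIX = "_(ORF)"
# Assumption: "$" is added here to match the suffix only at the end of the name.
SUFFIX_PATTERN = re.compile(r"_\(ORF\)$")


def add_suffix(name: str) -> str:
    """Solution 1: append the suffix unless the name already ends with it."""
    return name if name.endswith(SUFFIX) else name + SUFFIX


def remove_suffix(name: str) -> str:
    """Solution 2: replace the trailing suffix with nothing."""
    return SUFFIX_PATTERN.sub("", name)


# Hypothetical identifiers, for illustration only.
blast_ids = ["Seq_001", "Seq_002"]
interpro_ids = ["Seq_001_(ORF)", "Seq_002_(ORF)"]

print([add_suffix(n) for n in blast_ids])
print([remove_suffix(n) for n in interpro_ids])
```

Either direction leaves both lists with the same identifiers, which is the precondition for combining the projects.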
Now that the identifiers match, it is possible to combine both projects and add the FASTA and the InterProScan results to the project with the BLAST results.
To combine both projects visit the following Tips & Tricks.
2 – Rename using mapping file
The Batch Rename feature also accepts a mapping file for replacing sequence names. This mapping file should contain the original sequence names you want to rename and their replacements, separated by a tab.
In this use case, we have a tomato dataset with non-informative sequence names, and we want to replace these names with the original gene IDs from Ensembl. The mapping file used for this purpose looks like this (Figure 3):
Figure 3: Mapping file.
We can then configure Batch Rename as follows (Figure 4):
Figure 4: Use the mapping file to rename the original sequence names.
After running this process, the sequence names have been replaced and now contain the gene identifiers (Figure 5):
Figure 5: Sequence names replaced with the Ensembl gene IDs.
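The mapping-file approach can also be sketched in a few lines of Python, which may help when checking a mapping file before using it. This is a rough illustration under stated assumptions, not the OmicsBox implementation: the function names, the sequence names (`seq1`, `seq2`, `seq3`), and the replacement IDs (`GENE_A`, `GENE_B`) are hypothetical. Leaving unmapped names unchanged is a choice made for this sketch, and OmicsBox may behave differently.

```python
import csv
import io


def load_mapping(handle) -> dict:
    """Read 'original_name<TAB>new_name' lines into a dictionary."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and lines without a replacement column.
        if len(row) >= 2 and row[0].strip():
            mapping[row[0].strip()] = row[1].strip()
    return mapping


def rename(names, mapping):
    """Replace each name found in the mapping; leave the others unchanged."""
    return [mapping.get(name, name) for name in names]


# Hypothetical mapping file content: two tab-separated columns.
mapping_text = "seq1\tGENE_A\nseq2\tGENE_B\n"
mapping = load_mapping(io.StringIO(mapping_text))

print(rename(["seq1", "seq2", "seq3"], mapping))
```

To read a real file instead of the in-memory example, pass an open file handle, for instance `open("mapping.tsv", newline="")`, where the file name is again only a placeholder.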