Databases
Markers
Autometa comes packaged with the necessary markers files. Links to these markers files and their associated cutoff values are below:
NCBI
If you are running Autometa for the first time, you will need to download the NCBI databases. You may do this manually or with a few Autometa helper scripts. To use Autometa's scripts, you will first need to download Autometa (see Installation).
# First configure where you want to download the NCBI databases
autometa-config \
--section databases --option ncbi \
--value <path/to/your/ncbi/database/directory>
# Now download and format the NCBI databases
autometa-update-databases --update-ncbi
Note
You can check the default config paths using autometa-config --print.
See autometa-update-databases -h and autometa-config -h for the full list of options.
The previous command will download the following NCBI databases:
- Non-redundant nr database
- prot.accession2taxid.gz
- nodes.dmp, names.dmp and merged.dmp - Found within taxdump.tar.gz
After these files are downloaded, the taxdump.tar.gz tarball's files are extracted and the non-redundant protein database (nr.gz) is formatted as a diamond database (i.e. nr.dmnd). This will significantly speed up the diamond blastp searches.
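If you downloaded nr.gz manually, the formatting step described above can be approximated with DIAMOND directly. The following is a minimal sketch, not Autometa's own implementation: format_nr and the DIAMOND_BIN variable are hypothetical names introduced for illustration, and it assumes DIAMOND is installed and nr.gz already sits in your NCBI database directory.

```shell
# Sketch: manually format nr.gz as a DIAMOND database (nr.dmnd).
# Assumptions: DIAMOND is installed; format_nr and DIAMOND_BIN are
# hypothetical names used only for this illustration.
format_nr() {
    local ncbi_dir="$1"                      # directory containing nr.gz
    local cpus="${2:-1}"                     # number of threads (default: 1)
    local diamond="${DIAMOND_BIN:-diamond}"  # allow overriding the executable

    if [ ! -f "${ncbi_dir}/nr.gz" ]; then
        echo "nr.gz not found in ${ncbi_dir}" >&2
        return 1
    fi
    # diamond makedb appends .dmnd to the --db value, so this yields nr.dmnd
    "${diamond}" makedb \
        --in "${ncbi_dir}/nr.gz" \
        --db "${ncbi_dir}/nr" \
        --threads "${cpus}"
}
```

For example, format_nr <path/to/your/ncbi/database/directory> 8 would write nr.dmnd next to nr.gz using 8 threads.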
Genome Taxonomy Database (GTDB)
To incorporate the benefits of the Genome Taxonomy Database (GTDB), you can set it up manually or with a few Autometa helper scripts. To use Autometa's scripts, you will first need to install Autometa (see Installation).
You can either run the following script or manually download the respective databases.
# First configure where you want to download the GTDB databases
autometa-config \
--section databases --option gtdb \
--value <path/to/your/gtdb/database/directory>
# To use a specific GTDB release
autometa-config \
--section gtdb --option release \
--value latest
# Or --value r207 or --value r202, etc.
# Download and format the configured GTDB databases release
autometa-update-databases --update-gtdb
Note
You can check the default config paths using autometa-config --print.
See autometa-update-databases -h and autometa-config -h for the full list of options.
The previous command will download the following GTDB databases and format the gtdb_proteins_aa_reps.tar.gz to generate gtdb.dmnd to be used by Autometa:
- Amino acid sequences of representative genomes
- gtdb-taxdump.tar.gz from shenwei356/gtdb-taxdump
Once extracted, gtdb-taxdump.tar.gz will contain the taxdump files for each GTDB release. Make sure that the release you use matches the release version of gtdb_proteins_aa_reps.tar.gz. It is best to always use the latest version.
All the taxonomy files for a specific taxonomy database should be in a single directory. You can now copy the taxdump files of the desired release version into the same directory as gtdb.dmnd.
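The copy step could look something like the sketch below. It assumes gtdb-taxdump.tar.gz has already been extracted and that each release sits in its own subdirectory holding .dmp files (the R207 name in the comment is illustrative; check the extracted tarball for the actual layout). copy_gtdb_taxdump is a hypothetical helper and is not part of Autometa.

```shell
# Sketch: place one release's taxdump files next to gtdb.dmnd.
# Assumptions: the tarball is already extracted and each release has its
# own subdirectory of .dmp files. copy_gtdb_taxdump is a hypothetical helper.
copy_gtdb_taxdump() {
    local release_dir="$1"   # e.g. <path/to/gtdb-taxdump>/R207 (illustrative)
    local dbdir="$2"         # directory that holds gtdb.dmnd

    if [ ! -d "${release_dir}" ]; then
        echo "release directory not found: ${release_dir}" >&2
        return 1
    fi
    if [ ! -f "${dbdir}/gtdb.dmnd" ]; then
        echo "gtdb.dmnd not found in ${dbdir}" >&2
        return 1
    fi
    # Copy every taxdump table of the chosen release into the database directory
    cp "${release_dir}"/*.dmp "${dbdir}/"
}
```

Refusing to copy when gtdb.dmnd is absent is a deliberate choice here: it catches a mistyped database directory before taxdump files are scattered somewhere Autometa will not look.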
Alternatively, if you have manually downloaded gtdb_proteins_aa_reps.tar.gz and gtdb-taxdump.tar.gz, you can run the following command to format gtdb_proteins_aa_reps.tar.gz into gtdb.dmnd and make it ready for Autometa.
python -m autometa.taxonomy.gtdb --reps-faa <path/to/gtdb_proteins_aa_reps.tar.gz> --dbdir <path/to/output_directory> --cpus 20
Note
Again, make sure that the formatted gtdb_proteins_aa_reps.tar.gz database (gtdb.dmnd) and the GTDB taxdump files are in the same directory.
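A quick way to confirm this before running Autometa is a small check like the one below. This is only a sketch: check_gtdb_dir is a hypothetical helper, and the taxdump file names (names.dmp, nodes.dmp, merged.dmp) are an assumption that mirrors the NCBI taxdump files listed earlier, so adjust the list to whatever your GTDB release actually provides.

```shell
# Sketch: verify that gtdb.dmnd and the taxdump files share one directory.
# Assumption: the taxdump consists of names.dmp, nodes.dmp and merged.dmp,
# mirroring the NCBI files. check_gtdb_dir is a hypothetical helper.
check_gtdb_dir() {
    local dbdir="$1"
    local missing=0
    local name

    for name in gtdb.dmnd names.dmp nodes.dmp merged.dmp; do
        if [ ! -f "${dbdir}/${name}" ]; then
            echo "missing: ${dbdir}/${name}" >&2
            missing=1
        fi
    done
    # Return 0 only when every expected file was found
    return "${missing}"
}
```

Running check_gtdb_dir <path/to/your/gtdb/database/directory> prints each missing file to standard error and returns a non-zero status if anything is absent.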