autometa.config package
Submodules
autometa.config.databases module
# License: GNU Affero General Public License v3 or later # A copy of GNU AGPL v3 should have been included in this software package in LICENSE.txt.
This file contains the Databases class responsible for configuration handling of Autometa Databases.
- class autometa.config.databases.Databases(config=<configparser.ConfigParser object>, dryrun=False, nproc=2, update=False)
Bases: object
Database class containing methods to allow downloading/formatting/updating Autometa database dependencies.
- Parameters:
config (configparser.ConfigParser) – Config containing database dependency information (the default is DEFAULT_CONFIG).
dryrun (bool) – Run through database checking without performing downloads/formatting (the default is False).
nproc (int) – Number of processors to use for database formatting (the docstring gives the default as mp.cpu_count(); the signature above shows nproc=2).
update (bool) – Overwrite existing databases with more up-to-date database files. (the default is False).
- ncbi_dir
</path/to/databases/ncbi>
- Type:
str
- markers_dir
</path/to/databases/markers>
- Type:
str
- SECTIONS
keys are sections respective to database config sections and values are options within the sections.
- Type:
dict
- SECTIONS = {'markers': ['bacteria_single_copy', 'bacteria_single_copy_cutoffs', 'archaea_single_copy', 'archaea_single_copy_cutoffs'], 'ncbi': ['nodes', 'names', 'merged', 'delnodes', 'accession2taxid', 'nr']}
- compare_checksums(section: Optional[str] = None) Dict[str, Dict]
Get all invalid database files in options from section in config. An md5 checksum comparison is performed between the local file’s md5 and the file’s remote md5 to ensure file integrity before the respective file is marked as valid.
- Parameters:
section (str, optional) – Configure provided section. Choices include ‘markers’ and ‘ncbi’. (default will download/format all database directories)
- Returns:
{section:{option, option,…}, section:{…}, …}
- Return type:
dict
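The md5 comparison described above can be sketched with the standard library alone. This is a minimal illustration, not Autometa's implementation: `md5_of_file` and `checksum_matches` are hypothetical names, and the remote line is assumed to follow the conventional md5sum layout of a hash followed by a filename.

```python
import hashlib


def md5_of_file(fpath: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex md5 digest of the file at fpath, read in chunks."""
    digest = hashlib.md5()
    with open(fpath, "rb") as fh:
        # Read in fixed-size chunks so large databases are never loaded whole.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(fpath: str, remote_md5_line: str) -> bool:
    """Compare a local file against a remote md5 line ("<hash> <filename>")."""
    # Assumption: the hash is the first whitespace-separated field of the line.
    remote_hash = remote_md5_line.split()[0]
    return md5_of_file(fpath) == remote_hash
```

A file whose local digest differs from the remote hash would be reported as invalid and become a candidate for re-download.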
- configure(section: Optional[str] = None, no_checksum: bool = False) ConfigParser
Configures Autometa’s database dependencies by first checking missing dependencies then comparing checksums to ensure integrity of files.
Download and format databases for all options in each section.
This will only perform the download and formatting if self.dryrun is False. This will update out-of-date databases if self.update is True.
- Parameters:
section (str, optional) – Configure provided section. Choices include ‘markers’ and ‘ncbi’. (default will download/format all database directories)
no_checksum (bool, optional) – Do not perform checksum comparisons (the default is False).
- Returns:
config with updated options in respective databases sections.
- Return type:
configparser.ConfigParser
- Raises:
ValueError – Provided section does not match ‘ncbi’ or ‘markers’.
ConnectionError – A connection issue occurred when connecting to NCBI or GitHub.
- download_gtdb_files() None
- download_markers(options: Iterable) None
Download markers database files and amend user config to reflect this.
- Parameters:
options (iterable) – iterable containing options in ‘markers’ section to download.
- Returns:
Will update provided options in self.config.
- Return type:
NoneType
- Raises:
ConnectionError – marker file download failed.
- download_missing(section: Optional[str] = None) None
Download missing Autometa database dependencies from provided section. If no section is provided will check all sections.
- Parameters:
section (str, optional) – Section to check for missing database files (the default is None). Choices include ‘ncbi’ and ‘markers’.
- Returns:
Will update provided section in self.config.
- Return type:
NoneType
- Raises:
ValueError – Provided section does not match ‘ncbi’ or ‘markers’.
- download_ncbi_files(options: Iterable) None
Download NCBI database files.
- Parameters:
options (iterable) – iterable containing options in ‘ncbi’ section to download.
- Returns:
Will update provided options in self.config.
- Return type:
NoneType
- Raises:
subprocess.CalledProcessError – NCBI file download with rsync failed.
ConnectionError – NCBI file checksums do not match after file transfer.
- extract_taxdump() None
Extract the files Autometa requires from the NCBI taxdump.tar.gz archive into the NCBI databases directory and update the user config with the extracted paths.
This only extracts nodes.dmp, names.dmp, merged.dmp and delnodes.dmp from taxdump.tar.gz if the files do not already exist. If update was originally supplied as True to the Databases instance, then the previous files will be replaced by the new taxdump files.
After successful extraction of the files, a checksum of the archive will be written for future checking.
- Returns:
Will update self.config section ncbi with options ‘nodes’, ‘names’, ‘merged’, ‘delnodes’
- Return type:
NoneType
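The selective extraction described above can be sketched as follows. This is an illustrative outline under stated assumptions rather than the method's actual code: `extract_members` and `TAXDUMP_MEMBERS` are hypothetical names, and the archive is assumed to hold the four .dmp files at its top level.

```python
import os
import shutil
import tarfile

# The four files named in the description above.
TAXDUMP_MEMBERS = ("nodes.dmp", "names.dmp", "merged.dmp", "delnodes.dmp")


def extract_members(archive: str, outdir: str, update: bool = False) -> dict:
    """Extract only the required members, skipping files that already exist."""
    extracted = {}
    with tarfile.open(archive, "r:gz") as tar:
        for name in TAXDUMP_MEMBERS:
            out = os.path.join(outdir, name)
            # Existing files are kept unless update is True.
            if update or not os.path.exists(out):
                with tar.extractfile(name) as src, open(out, "wb") as dst:
                    shutil.copyfileobj(src, dst)
            # Key by option name ("nodes", "names", ...) for a config update.
            extracted[name.replace(".dmp", "")] = out
    return extracted
```

The returned mapping has the shape needed to set the ‘nodes’, ‘names’, ‘merged’ and ‘delnodes’ options in an ‘ncbi’ config section.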
- fix_invalid_checksums(section: Optional[str] = None) None
Download/Update/Format databases where checksums are out-of-date.
- Parameters:
section (str, optional) – Configure provided section. Choices include ‘markers’ and ‘ncbi’. (default will download/format all database directories)
- Returns:
Will update provided options in self.config.
- Return type:
NoneType
- Raises:
ConnectionError – Failed to connect to section host site.
- format_nr() None
Construct a diamond formatted database (nr.dmnd) from nr option in ncbi section in user config.
NOTE: The checksum ‘nr.dmnd.md5’ will only be generated if nr.dmnd construction is successful. If the provided nr option in ncbi is ‘nr.gz’ the database will be removed after successful database formatting.
- Returns:
Will update option ‘nr’ in section ‘ncbi’ of the config.
- Return type:
NoneType
- get_missing(section: Optional[str] = None) Dict[str, Dict]
Get all missing database files in options from sections in config.
- Parameters:
section (str, optional) – Configure provided section. Choices include ‘markers’ and ‘ncbi’. (default will download/format all database directories)
- Returns:
{section:{option, option,…}, section:{…}, …}
- Return type:
dict
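To illustrate the documented return shape, here is a minimal sketch of a missing-file check. It is an assumption about how such a check could work, not the method's source: `find_missing` is a hypothetical helper, and an option counts as missing when it is unset or its path does not exist.

```python
import os
from configparser import ConfigParser
from typing import Dict, Optional, Set

# Copied from the documented Databases.SECTIONS attribute.
SECTIONS = {
    "ncbi": ["nodes", "names", "merged", "delnodes", "accession2taxid", "nr"],
    "markers": [
        "bacteria_single_copy",
        "bacteria_single_copy_cutoffs",
        "archaea_single_copy",
        "archaea_single_copy_cutoffs",
    ],
}


def find_missing(
    config: ConfigParser, section: Optional[str] = None
) -> Dict[str, Set[str]]:
    """Map each checked section to the options whose files are absent."""
    if section is not None and section not in SECTIONS:
        raise ValueError(f"section must be one of {sorted(SECTIONS)}, got {section!r}")
    sections = [section] if section else list(SECTIONS)
    missing: Dict[str, Set[str]] = {}
    for sec in sections:
        for option in SECTIONS[sec]:
            # An unset option and a nonexistent path both count as missing.
            fpath = config.get(sec, option, fallback="")
            if not fpath or not os.path.exists(fpath):
                missing.setdefault(sec, set()).add(option)
    return missing
```

Sections with nothing missing are left out of the result, so an empty dict means every checked file is present.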
- get_remote_checksum(section: str, option: str) str
Get the checksum from provided section respective to option in self.config.
- Parameters:
section (str) – section to retrieve for checksums section. Choices include: ‘ncbi’ and ‘markers’.
option (str) – option in checksums section corresponding to the section checksum file.
- Returns:
checksum of remote md5 file. e.g. ‘hash filename\n’
- Return type:
str
- Raises:
ValueError – ‘section’ must be ‘ncbi’ or ‘markers’
ConnectionError – No internet connection available.
ConnectionError – Failed to connect to host for provided option.
- press_hmms() None
Run hmmpress on the markers HMM database files.
- Return type:
NoneType
- satisfied(section: Optional[str] = None, compare_checksums: bool = False) bool
Determines whether all database dependencies are satisfied.
- Parameters:
section (str, optional) – section to check. Choices include: ‘ncbi’ and ‘markers’ (the default is None).
compare_checksums (bool, optional) – Also check if database information is up-to-date with current hosted databases. (default is False).
- Returns:
True if all database dependencies are satisfied, otherwise False.
- Return type:
bool
- autometa.config.databases.main()
autometa.config.environ module
Configuration handling for Autometa environment.
- autometa.config.environ.bedtools()
Get bedtools version.
- Returns:
version of bedtools
- Return type:
str
- autometa.config.environ.bowtie2()
Get bowtie2 version.
- Returns:
version of bowtie2
- Return type:
str
- autometa.config.environ.configure(config: ConfigParser) Tuple[ConfigParser, bool]
Checks executable dependencies necessary to run Autometa. Will update config with the following details for each executable dependency:
1. presence/absence of the dependency and its location
2. version
- Parameters:
config (configparser.ConfigParser) – Config to update with executable dependency details.
- Returns:
(config, satisfied)
config (configparser.ConfigParser): config updated with executables details: 1. location of executable 2. version of executable
satisfied (bool)
- Return type:
2-tuple
- autometa.config.environ.diamond()
Get diamond version.
- Returns:
version of diamond
- Return type:
str
- autometa.config.environ.find_executables()
Retrieves file paths of the executables that Autometa depends on.
- Returns:
{executable:</path/to/executable>, …}
- Return type:
dict
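A lookup with this return shape can be sketched with `shutil.which`. This is a hedged illustration, not the module's code: `locate` is a hypothetical name, and the executable list is assumed from the version helpers documented in this module.

```python
import shutil
from typing import Dict, Iterable, Optional

# Assumed list, taken from the version helpers documented in this module.
EXECUTABLES = (
    "bedtools",
    "bowtie2",
    "diamond",
    "hmmpress",
    "hmmscan",
    "hmmsearch",
    "prodigal",
    "samtools",
)


def locate(executables: Iterable[str] = EXECUTABLES) -> Dict[str, Optional[str]]:
    """Map each executable name to its path on PATH, or None if not found."""
    return {exe: shutil.which(exe) for exe in executables}
```

Any entry mapped to None marks a dependency that is absent from the current environment.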
- autometa.config.environ.get_versions(program: Optional[str] = None) Union[Dict[str, str], str]
Retrieve versions from all required executable dependencies. If program is provided will only return version for program.
See: https://stackoverflow.com/a/834451/12671809
- Parameters:
program (str, optional) – the program whose version to retrieve, by default None
- Returns:
if program is None: dict of {program:version, …}; if program is provided: str version
- Return type:
dict or str
- Raises:
ValueError – program is not a string
KeyError – program is not an executable dependency.
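The documented return and error behaviour can be sketched as a dispatch over a name-to-function mapping. This is an assumption made for illustration, not the module's implementation: `lookup_versions` and its `getters` argument are hypothetical.

```python
from typing import Callable, Dict, Optional, Union


def lookup_versions(
    getters: Dict[str, Callable[[], str]], program: Optional[str] = None
) -> Union[Dict[str, str], str]:
    """Return all versions, or a single version when program is given."""
    if program is None:
        return {name: getter() for name, getter in getters.items()}
    if not isinstance(program, str):
        raise ValueError(f"program must be a string, got {type(program).__name__}")
    if program not in getters:
        raise KeyError(f"{program} is not an executable dependency")
    return getters[program]()
```

In use, `getters` would map each program name to a function that reports its version, in the manner of the per-tool helpers in this module.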
- autometa.config.environ.hmmpress()
Get hmmpress version.
- Returns:
version of hmmpress
- Return type:
str
- autometa.config.environ.hmmscan()
Get hmmscan version.
- Returns:
version of hmmscan
- Return type:
str
- autometa.config.environ.hmmsearch()
Get hmmsearch version.
- Returns:
version of hmmsearch
- Return type:
str
- autometa.config.environ.prodigal()
Get prodigal version.
- Returns:
version of prodigal
- Return type:
str
- autometa.config.environ.samtools()
Get samtools version.
- Returns:
version of samtools
- Return type:
str
autometa.config.utilities module
- autometa.config.utilities.get_config(fpath: str) ConfigParser
Load the config provided at fpath.
- Parameters:
fpath (str) – </path/to/file.config>
- Returns:
interpolated config object parsed from fpath.
- Return type:
configparser.ConfigParser
- Raises:
FileNotFoundError – Provided fpath does not exist.
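A loader with this behaviour can be sketched as below. It assumes that "interpolated" refers to configparser's ExtendedInterpolation, which the source does not confirm; `load_config` is a hypothetical name.

```python
import os
from configparser import ConfigParser, ExtendedInterpolation


def load_config(fpath: str) -> ConfigParser:
    """Parse fpath into a ConfigParser that resolves ${section:option} references."""
    # ConfigParser.read() silently skips missing files, so check explicitly.
    if not os.path.exists(fpath):
        raise FileNotFoundError(fpath)
    config = ConfigParser(interpolation=ExtendedInterpolation())
    config.read(fpath)
    return config
```

With extended interpolation, a value such as `${common:home_dir}/databases` is expanded from the `home_dir` option of a `common` section when it is read (both names are placeholders).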
- autometa.config.utilities.main()
- autometa.config.utilities.parse_args(fpath: Optional[str] = None) Namespace
Generate argparse namespace (args) from config file.
- Parameters:
fpath (str, optional) – </path/to/file.config> (default is DEFAULT_CONFIG in autometa.config)
- Returns:
namespace typical to parser.parse_args() method from argparse
- Return type:
argparse.Namespace
- Raises:
FileNotFoundError – provided fpath does not exist.
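One way to turn a parsed config into an argparse namespace is sketched below. The nesting of one sub-namespace per section is an assumption, not something the description above confirms, and `config_to_namespace` is a hypothetical name.

```python
from argparse import Namespace
from configparser import ConfigParser


def config_to_namespace(config: ConfigParser) -> Namespace:
    """Build a Namespace with one nested Namespace per config section."""
    args = Namespace()
    for section in config.sections():
        # Values stay strings, as configparser returns them.
        setattr(args, section, Namespace(**dict(config.items(section))))
    return args
```

Under this layout, an option `kmer_size` in a section `parameters` would be read as `args.parameters.kmer_size` (both names are placeholders).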
- autometa.config.utilities.put_config(config: ConfigParser, out: str) None
Writes config to out and updates checkpoints checksum.
- Parameters:
config (configparser.ConfigParser) – configuration containing user provided parameters and files information.
out (str) – </path/to/output/file.config>
- Return type:
NoneType
- autometa.config.utilities.set_home_dir() str
Set the home_dir in autometa’s default configuration (default.config) based on autometa’s current location. If the home_dir variable is already set, then this will be used as the home_dir location.
- Returns:
</path/to/package/autometa>
- Return type:
str
- autometa.config.utilities.update_config(section: str, option: str, value: str, fpath: str = '/home/docs/checkouts/readthedocs.org/user_builds/autometa/checkouts/latest/build/lib/autometa/config/default.config') None
Update option in section of the config file at fpath with value.
- Parameters:
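fpath (str) – </path/to/file.config> (the default is the packaged default.config; the absolute path shown in the signature comes from the documentation build environment)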
section (str) – section header to update within fpath.
option (str) – option to update within section.
value (str) – value to update option.
- Return type:
NoneType
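An in-place update of this kind can be sketched with configparser alone. This is a minimal illustration under stated assumptions, not the function's source: `set_option` is a hypothetical name, and interpolation is disabled so that existing values are written back unchanged.

```python
from configparser import ConfigParser


def set_option(fpath: str, section: str, option: str, value: str) -> None:
    """Read fpath, set section/option to value, and write the file back."""
    # interpolation=None keeps any ${...} or %(...)s values verbatim.
    config = ConfigParser(interpolation=None)
    config.read(fpath)
    if not config.has_section(section):
        config.add_section(section)
    config.set(section, option, value)
    with open(fpath, "w") as fh:
        config.write(fh)
```

Note that configparser drops comments when it rewrites a file, which matters if the config is also edited by hand.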