autometa.config package

Submodules

autometa.config.databases module

# License: GNU Affero General Public License v3 or later
# A copy of GNU AGPL v3 should have been included in this software package in LICENSE.txt.

This file contains the Databases class responsible for configuration handling of Autometa Databases.

class autometa.config.databases.Databases(config=<configparser.ConfigParser object>, dryrun=False, nproc=2, update=False)

Bases: object

Class containing methods for downloading, formatting and updating Autometa database dependencies.

Parameters:
  • config (configparser.ConfigParser) – Config containing database dependency information (the default is DEFAULT_CONFIG).

  • dryrun (bool) – Run through database checking without performing downloads/formatting (the default is False).

  • nproc (int) – Number of processors to use for database formatting (the default is mp.cpu_count()).

  • update (bool) – Overwrite existing databases with more up-to-date database files (the default is False).

ncbi_dir

</path/to/databases/ncbi>

Type:

str

markers_dir

</path/to/databases/markers>

Type:

str

SECTIONS

Keys are the database config sections and values are the options within each section.

Type:

dict

SECTIONS = {'markers': ['bacteria_single_copy', 'bacteria_single_copy_cutoffs', 'archaea_single_copy', 'archaea_single_copy_cutoffs'], 'ncbi': ['nodes', 'names', 'merged', 'delnodes', 'accession2taxid', 'nr']}
compare_checksums(section: Optional[str] = None) Dict[str, Dict]

Get all database files with invalid checksums for the options in section of the config. An md5 checksum comparison is performed between each current file's md5 and its remote md5 to ensure file integrity before the respective file is marked as valid.

Parameters:

section (str, optional) – Section to check. Choices include ‘markers’ and ‘ncbi’ (the default will check all database directories).

Returns:

{section:{option, option,…}, section:{…}, …}

Return type:

dict
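The md5 comparison described above can be illustrated using only the standard library. This is a minimal sketch and not Autometa's implementation. The helper names `file_md5` and `checksum_is_valid` are hypothetical, and the remote checksum is assumed to be a `hash filename` line of the kind `get_remote_checksum` returns.

```python
import hashlib


def file_md5(fpath: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex md5 digest of fpath, reading in chunks to bound memory use."""
    md5 = hashlib.md5()
    with open(fpath, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def checksum_is_valid(fpath: str, remote_checksum: str) -> bool:
    """Compare the local md5 of fpath with a remote 'hash filename' line.

    Hypothetical helper: only the first whitespace-separated field (the hash)
    of the remote line is compared.
    """
    fields = remote_checksum.split()
    if not fields:
        return False
    return file_md5(fpath) == fields[0]
```

Chunked reading matters here because database files such as nr are far too large to load into memory in one call.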

configure(section: Optional[str] = None, no_checksum: bool = False) ConfigParser

Configures Autometa’s database dependencies by first checking for missing dependencies and then comparing checksums to ensure the integrity of files.

Download and format databases for all options in each section.

This will only perform the download and formatting if self.dryrun is False. This will update out-of-date databases if self.update is True.

Parameters:

  • section (str, optional) – Configure provided section. Choices include ‘markers’ and ‘ncbi’ (the default will download/format all database directories).

  • no_checksum (bool, optional) – Do not perform checksum comparisons (the default is False).

Returns:

config with updated options in respective databases sections.

Return type:

configparser.ConfigParser

Raises:

  • ValueError – Provided section does not match ‘ncbi’ or ‘markers’.

  • ConnectionError – A connection issue occurred when connecting to NCBI or GitHub.

download_gtdb_files() None
download_markers(options: Iterable) None

Download markers database files and amend user config to reflect this.

Parameters:

options (iterable) – iterable containing options in ‘markers’ section to download.

Returns:

Will update provided options in self.config.

Return type:

NoneType

Raises:

ConnectionError – marker file download failed.

download_missing(section: Optional[str] = None) None

Download missing Autometa database dependencies from the provided section. If no section is provided, all sections will be checked.

Parameters:

section (str, optional) – Section to check for missing database files (the default is None). Choices include ‘ncbi’, and ‘markers’.

Returns:

Will update provided section in self.config.

Return type:

NoneType

Raises:

ValueError – Provided section does not match ‘ncbi’ or ‘markers’.

download_ncbi_files(options: Iterable) None

Download NCBI database files.

Parameters:

options (iterable) – iterable containing options in ‘ncbi’ section to download.

Returns:

Will update provided options in self.config.

Return type:

NoneType

Raises:
  • subprocess.CalledProcessError – NCBI file download with rsync failed.

  • ConnectionError – NCBI file checksums do not match after file transfer.

extract_taxdump() None

Extract Autometa-required files from the NCBI taxdump.tar.gz archive into the ncbi databases directory and update the user config with the extracted paths.

This only extracts nodes.dmp, names.dmp, merged.dmp and delnodes.dmp from taxdump.tar.gz if the files do not already exist. If update was originally supplied as True to the Databases instance, then the previous files will be replaced by the new taxdump files.

After the files are successfully extracted, a checksum of the archive will be written for future checks.

Returns:

Will update self.config section ncbi with options ‘nodes’, ‘names’, ‘merged’, ‘delnodes’

Return type:

NoneType
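The extraction logic described above can be sketched with Python's `tarfile` module. This is an illustration under stated assumptions and not Autometa's code. The function name `extract_taxdump_members` is hypothetical, and so is the location of the archive checksum (written as `<archive>.md5` next to the archive). The returned option names match the documented ‘nodes’, ‘names’, ‘merged’ and ‘delnodes’.

```python
import hashlib
import os
import shutil
import tarfile
from typing import Dict, Iterable

# Members named in the documentation above.
TAXDUMP_MEMBERS = ("nodes.dmp", "names.dmp", "merged.dmp", "delnodes.dmp")


def extract_taxdump_members(
    archive: str,
    outdir: str,
    update: bool = False,
    members: Iterable[str] = TAXDUMP_MEMBERS,
) -> Dict[str, str]:
    """Extract selected members of a taxdump archive into outdir.

    Returns {option: path}, where option is the member name without '.dmp'.
    """
    paths = {}
    with tarfile.open(archive, "r:gz") as tar:
        for member in members:
            outpath = os.path.join(outdir, member)
            # Existing files are kept unless update is True.
            if update or not os.path.exists(outpath):
                # Read only the named member and write it to a path built here,
                # so nothing else in the archive is extracted.
                with tar.extractfile(member) as src, open(outpath, "wb") as dst:
                    shutil.copyfileobj(src, dst)
            paths[os.path.splitext(member)[0]] = outpath
    # Record the archive checksum only after extraction succeeded.
    # ASSUMPTION: the checksum file lives next to the archive as '<archive>.md5'.
    md5 = hashlib.md5()
    with open(archive, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            md5.update(chunk)
    with open(archive + ".md5", "w") as fh:
        fh.write(f"{md5.hexdigest()}  {os.path.basename(archive)}\n")
    return paths
```

Reading each member through `extractfile()` avoids extracting the rest of the archive.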

fix_invalid_checksums(section: Optional[str] = None) None

Download/Update/Format databases where checksums are out-of-date.

Parameters:

section (str, optional) – Configure provided section. Choices include ‘markers’ and ‘ncbi’. (default will download/format all database directories)

Returns:

Will update provided options in self.config.

Return type:

NoneType

Raises:

ConnectionError – Failed to connect to section host site.

format_nr() None

Construct a DIAMOND-formatted database (nr.dmnd) from the nr option in the ncbi section of the user config.

NOTE: The checksum ‘nr.dmnd.md5’ will only be generated if nr.dmnd construction is successful. If the provided nr option in ncbi is ‘nr.gz’, the file will be removed after successful database formatting.

Returns:

Updates option ‘nr’ in section ‘ncbi’ of self.config.

Return type:

NoneType
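The steps above (build the database, write the checksum only on success, then remove nr.gz) can be sketched as follows. `diamond makedb --in <fasta> --db <path>` is DIAMOND's database-building command, and DIAMOND appends `.dmnd` to the `--db` path. Beyond that, this is an assumption-laden sketch: the helper names `diamond_makedb_cmd` and `build_nr_dmnd` are hypothetical, and it is not confirmed that Autometa passes `--threads` or names the output this way.

```python
import hashlib
import os
import subprocess
from typing import List


def diamond_makedb_cmd(nr_fasta: str, nproc: int = 2) -> List[str]:
    """Build a `diamond makedb` command that writes nr.dmnd next to nr_fasta."""
    # DIAMOND appends the .dmnd extension to the --db path.
    db_prefix = os.path.join(os.path.dirname(nr_fasta), "nr")
    return ["diamond", "makedb", "--in", nr_fasta, "--db", db_prefix, "--threads", str(nproc)]


def build_nr_dmnd(nr_fasta: str, nproc: int = 2) -> str:
    """Build nr.dmnd, write nr.dmnd.md5, and remove nr.gz only after success."""
    # check=True raises CalledProcessError, so nothing below runs if makedb fails.
    subprocess.run(diamond_makedb_cmd(nr_fasta, nproc), check=True)
    dmnd = os.path.join(os.path.dirname(nr_fasta), "nr.dmnd")
    md5 = hashlib.md5()
    with open(dmnd, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            md5.update(chunk)
    with open(dmnd + ".md5", "w") as fh:
        fh.write(f"{md5.hexdigest()}  nr.dmnd\n")
    if os.path.basename(nr_fasta) == "nr.gz":
        os.remove(nr_fasta)
    return dmnd
```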

get_missing(section: Optional[str] = None) Dict[str, Dict]

Get all missing database files in options from sections in config.

Parameters:

section (str, optional) – Section to check. Choices include ‘markers’ and ‘ncbi’ (the default will check all database directories).

Returns:

{section:{option, option,…}, section:{…}, …}

Return type:

dict
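A minimal sketch of the missing-file check follows. It walks the documented SECTIONS mapping and returns the same {section: {option, …}} shape. Treating an option as missing when its config value is absent or points to a nonexistent path is an assumption, as is raising ValueError for an unknown section. The name `find_missing` is hypothetical.

```python
import configparser
import os
from typing import Dict, Optional, Set

# Copied from the documented Databases.SECTIONS attribute.
SECTIONS = {
    "markers": ["bacteria_single_copy", "bacteria_single_copy_cutoffs",
                "archaea_single_copy", "archaea_single_copy_cutoffs"],
    "ncbi": ["nodes", "names", "merged", "delnodes", "accession2taxid", "nr"],
}


def find_missing(
    config: configparser.ConfigParser, section: Optional[str] = None
) -> Dict[str, Set[str]]:
    """Return {section: {option, ...}} for options whose files do not exist."""
    sections = [section] if section is not None else list(SECTIONS)
    missing: Dict[str, Set[str]] = {}
    for sec in sections:
        if sec not in SECTIONS:
            raise ValueError(f"section must be one of {list(SECTIONS)}, got {sec!r}")
        for option in SECTIONS[sec]:
            # fallback=None also covers a section absent from config.
            fpath = config.get(sec, option, fallback=None)
            if fpath is None or not os.path.exists(fpath):
                missing.setdefault(sec, set()).add(option)
    return missing
```

Under this sketch, a `satisfied()`-style check reduces to testing whether the returned dict is empty.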

get_remote_checksum(section: str, option: str) str
Get the checksum from the provided section respective to option in self.config.

Parameters:
  • section (str) – Section to retrieve checksums for. Choices include: ‘ncbi’ and ‘markers’.

  • option (str) – Option in the checksums section corresponding to the section checksum file.

Returns:

checksum of remote md5 file, e.g. ‘hash filename’

Return type:

str

Raises:
  • ValueError – ‘section’ must be ‘ncbi’ or ‘markers’.

  • ConnectionError – No internet connection available.

  • ConnectionError – Failed to connect to host for provided option.
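The fetch-and-parse behaviour above, including mapping network failures to ConnectionError, can be sketched with `urllib`. The following assumptions apply: the md5 URL is passed in directly (the lookup of option in the real config is not shown), and `parse_md5_text` and `fetch_remote_checksum` are hypothetical names.

```python
import urllib.error
import urllib.request


def parse_md5_text(text: str) -> str:
    """Return the first non-empty 'hash filename' line of an md5 file."""
    for line in text.splitlines():
        if line.strip():
            return line.strip()
    raise ValueError("md5 file is empty")


def fetch_remote_checksum(section: str, url: str, timeout: float = 30.0) -> str:
    """Download an md5 file from url and return its 'hash filename' line."""
    if section not in ("ncbi", "markers"):
        raise ValueError("'section' must be 'ncbi' or 'markers'")
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            text = response.read().decode("utf-8")
    except urllib.error.URLError as err:
        # URLError is an OSError, not a ConnectionError; re-raise to match the docs.
        raise ConnectionError(f"Failed to connect to {url}") from err
    return parse_md5_text(text)
```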

press_hmms() None

Run hmmpress on the markers HMM database files.

Return type:

NoneType
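HMMER's `hmmpress` writes four binary files (`.h3f`, `.h3i`, `.h3m`, `.h3p`) next to each HMM file, and its `-f` flag allows existing ones to be overwritten. The sketch below presses only files lacking that full set. It is a hedged illustration, not Autometa's method. The names `is_pressed` and `press_hmm_files` are hypothetical, and the `update` flag mirrors the Databases constructor by assumption.

```python
import os
import subprocess
from typing import Iterable, List

# Binary files hmmpress writes next to each .hmm file.
PRESSED_SUFFIXES = (".h3f", ".h3i", ".h3m", ".h3p")


def is_pressed(hmm_path: str) -> bool:
    """True if every hmmpress output file for hmm_path exists."""
    return all(os.path.exists(hmm_path + suffix) for suffix in PRESSED_SUFFIXES)


def press_hmm_files(hmm_paths: Iterable[str], update: bool = False) -> List[str]:
    """Run `hmmpress -f` on each HMM file that is not yet pressed.

    Returns the paths that were pressed.
    """
    pressed = []
    for hmm_path in hmm_paths:
        if update or not is_pressed(hmm_path):
            # -f lets hmmpress overwrite partial or stale binary files.
            subprocess.run(["hmmpress", "-f", hmm_path], check=True)
            pressed.append(hmm_path)
    return pressed
```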

satisfied(section: Optional[str] = None, compare_checksums: bool = False) bool

Determines whether all database dependencies are satisfied.

Parameters:
  • section (str, optional) – Section to check. Choices include: ‘ncbi’ and ‘markers’ (the default is None).

  • compare_checksums (bool, optional) – Also check if database information is up-to-date with current hosted databases. (default is False).

Returns:

True if all database dependencies are satisfied, otherwise False.

Return type:

bool

autometa.config.databases.main()

autometa.config.environ module

# License: GNU Affero General Public License v3 or later
# A copy of GNU AGPL v3 should have been included in this software package in LICENSE.txt.

Configuration handling for Autometa environment.

autometa.config.environ.bedtools()

Get bedtools version.

Returns:

version of bedtools

Return type:

str

autometa.config.environ.bowtie2()

Get bowtie2 version.

Returns:

version of bowtie2

Return type:

str

autometa.config.environ.configure(config: ConfigParser) Tuple[ConfigParser, bool]

Checks executable dependencies necessary to run Autometa and updates config with the following details for each dependency:

  1. presence/absence of the dependency and its location

  2. version

Parameters:

config (configparser.ConfigParser) – Config to update with executable dependency details.

Returns:

(config, satisfied)

  • config (configparser.ConfigParser) – config updated with executable details: 1. location of executable, 2. version of executable.

  • satisfied (bool) – whether the executable dependencies are satisfied.

Return type:

2-tuple
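The (config, satisfied) contract above can be sketched with `shutil.which`, which resolves an executable on PATH or returns None. The assumptions here are the section name `environ`, the placeholder value "Not found" and the helper name `configure_executables`, all of which are hypothetical. Version lookup (handled by getters such as bedtools() or diamond()) is omitted.

```python
import configparser
import shutil
from typing import Iterable, Tuple


def configure_executables(
    config: configparser.ConfigParser, executables: Iterable[str]
) -> Tuple[configparser.ConfigParser, bool]:
    """Record each executable's location in config and report whether all were found."""
    # ASSUMPTION: locations are stored in a section named 'environ'.
    if not config.has_section("environ"):
        config.add_section("environ")
    satisfied = True
    for exe in executables:
        location = shutil.which(exe)
        config.set("environ", exe, location if location is not None else "Not found")
        if location is None:
            satisfied = False
    return config, satisfied
```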

autometa.config.environ.diamond()

Get diamond version.

Returns:

version of diamond

Return type:

str

autometa.config.environ.find_executables()

Retrieves the file paths of Autometa's executable dependencies.

Returns:

{executable:</path/to/executable>, …}

Return type:

dict

autometa.config.environ.get_versions(program: Optional[str] = None) Union[Dict[str, str], str]

Retrieve versions from all required executable dependencies. If program is provided, only the version of program is returned.

See: https://stackoverflow.com/a/834451/12671809

Parameters:

program (str, optional) – the program to retrieve the version, by default None

Returns:

if program is None: dict – {program:version, …}

if program is provided: str – version

Return type:

dict or str

Raises:
  • ValueError – program is not a string.

  • KeyError – program is not an executable dependency.
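The dispatch and validation rules above (a dict when program is None, ValueError for a non-string, KeyError for an unknown program) can be sketched with an explicit registry of version getters. Passing the registry in as `getters` is an assumption made so the sketch is self-contained. In the module the getters correspond to functions such as bowtie2() and diamond(). The name `collect_versions` is hypothetical.

```python
from typing import Callable, Dict, Optional, Union

VersionGetter = Callable[[], str]


def collect_versions(
    getters: Dict[str, VersionGetter], program: Optional[str] = None
) -> Union[Dict[str, str], str]:
    """Return {program: version} for all getters, or one version if program is given."""
    if program is None:
        return {name: getter() for name, getter in getters.items()}
    if not isinstance(program, str):
        raise ValueError(f"program must be a str, not {type(program).__name__}")
    if program not in getters:
        raise KeyError(f"{program} is not an executable dependency")
    return getters[program]()
```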

autometa.config.environ.hmmpress()

Get hmmpress version.

Returns:

version of hmmpress

Return type:

str

autometa.config.environ.hmmscan()

Get hmmscan version.

Returns:

version of hmmscan

Return type:

str

autometa.config.environ.hmmsearch()

Get hmmsearch version.

Returns:

version of hmmsearch

Return type:

str

autometa.config.environ.prodigal()

Get prodigal version.

Returns:

version of prodigal

Return type:

str

autometa.config.environ.samtools()

Get samtools version.

Returns:

version of samtools

Return type:

str

autometa.config.utilities module

autometa.config.utilities.get_config(fpath: str) ConfigParser

Load the config provided at fpath.

Parameters:

fpath (str) – </path/to/file.config>

Returns:

interpolated config object parsed from fpath.

Return type:

configparser.ConfigParser

Raises:

FileNotFoundError – Provided fpath does not exist.
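A sketch of loading an interpolated config follows. The documentation only says the config is "interpolated". The choice of `configparser.ExtendedInterpolation` (which resolves `${section:option}` references) is an assumption, as are the `common`/`home_dir` names in the test and the helper name `load_config`.

```python
import configparser
import os


def load_config(fpath: str) -> configparser.ConfigParser:
    """Load fpath with ${section:option} interpolation enabled."""
    if not os.path.exists(fpath):
        # ConfigParser.read() silently skips missing files, so check explicitly.
        raise FileNotFoundError(fpath)
    # ASSUMPTION: ExtendedInterpolation; the docs only say the config is interpolated.
    config = configparser.ConfigParser(interpolation=configparser.ExtendedInterpolation())
    config.read(fpath)
    return config
```

The explicit existence check matters because `read()` ignores files it cannot open, which would otherwise yield an empty config rather than the documented FileNotFoundError.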

autometa.config.utilities.main()
autometa.config.utilities.parse_args(fpath: Optional[str] = None) Namespace

Generate argparse namespace (args) from config file.

Parameters:

fpath (str) – </path/to/file.config> (default is DEFAULT_CONFIG in autometa.config)

Returns:

namespace typical to parser.parse_args() method from argparse

Return type:

argparse.Namespace

Raises:

FileNotFoundError – provided fpath does not exist.
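One way to turn a config file into an `argparse.Namespace` is sketched below. Each section is exposed as a nested namespace, so values are read as `args.<section>.<option>`. That layout is an assumption, not necessarily what `parse_args` produces. The name `config_to_namespace` is hypothetical.

```python
import argparse
import configparser
import os


def config_to_namespace(fpath: str) -> argparse.Namespace:
    """Read fpath and expose each section as a nested Namespace (args.<section>.<option>)."""
    if not os.path.exists(fpath):
        raise FileNotFoundError(fpath)
    config = configparser.ConfigParser(interpolation=configparser.ExtendedInterpolation())
    config.read(fpath)
    args = argparse.Namespace()
    for section in config.sections():
        # items() resolves interpolation and includes any [DEFAULT] values.
        setattr(args, section, argparse.Namespace(**dict(config.items(section))))
    return args
```

Nesting by section avoids collisions when two sections share an option name.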

autometa.config.utilities.put_config(config: ConfigParser, out: str) None

Writes config to out and updates checkpoints checksum.

Parameters:
  • config (configparser.ConfigParser) – configuration containing user-provided parameters and files information.

  • out (str) – </path/to/output/file.config>

Return type:

NoneType

autometa.config.utilities.set_home_dir() str

Set the home_dir in autometa’s default configuration (default.config) based on autometa’s current location. If the home_dir variable is already set, then this will be used as the home_dir location.

Returns:

</path/to/package/autometa>

Return type:

str

autometa.config.utilities.update_config(section: str, option: str, value: str, fpath: str = '/home/docs/checkouts/readthedocs.org/user_builds/autometa/checkouts/latest/build/lib/autometa/config/default.config') None

Update option in section of fpath with value.

Parameters:
  • fpath (str) – </path/to/file.config> (the default is Autometa's bundled default.config)

  • section (str) – section header to update within fpath.

  • option (str) – option to update within section.

  • value (str) – value to update option.

Return type:

NoneType
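A read-modify-write sketch of updating one option follows. Reading with `interpolation=None` keeps `${...}` references in other options unexpanded when the file is rewritten. `ConfigParser.write()` does not preserve comments, so this sketch drops any comments in the file. Whether Autometa behaves the same way is not stated here. The name `set_config_option` is hypothetical.

```python
import configparser


def set_config_option(section: str, option: str, value: str, fpath: str) -> None:
    """Set option in section of fpath to value, creating the section if needed."""
    # interpolation=None keeps ${...} references in other options verbatim.
    config = configparser.ConfigParser(interpolation=None)
    config.read(fpath)
    if not config.has_section(section):
        config.add_section(section)
    config.set(section, option, value)
    # Note: ConfigParser.write() does not preserve comments from the original file.
    with open(fpath, "w") as fh:
        config.write(fh)
```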

Module contents