==================
large_data_mode.py
==================

.. code-block:: bash

    usage: large_data_mode.py

    Autometa Large-data-mode binning by contig set selection using max-partition-size

    optional arguments:
      -h, --help            show this help message and exit
      --kmers filepath      Path to k-mer counts table (default: None)
      --coverages filepath  Path to metagenome coverages table (default: None)
      --gc-content filepath
                            Path to metagenome GC contents table (default: None)
      --markers filepath    Path to Autometa annotated markers table (default: None)
      --taxonomy filepath   Path to Autometa assigned taxonomies table (default: None)
      --output-binning filepath
                            Path to write Autometa binning results (default: None)
      --output-main filepath
                            Path to write Autometa main table used during/after
                            binning (default: None)
      --clustering-method {dbscan,hdbscan}
                            Clustering algorithm to use for recursive binning.
                            (default: dbscan)
      --completeness 0 < float <= 100
                            completeness cutoff to retain cluster. e.g. cluster
                            completeness >= `completeness` (default: 20.0)
      --purity 0 < float <= 100
                            purity cutoff to retain cluster. e.g. cluster purity
                            >= `purity` (default: 95.0)
      --cov-stddev-limit float
                            coverage standard deviation limit to retain cluster
                            e.g. cluster coverage standard deviation <=
                            `cov-stddev-limit` (default: 25.0)
      --gc-stddev-limit float
                            GC content standard deviation limit to retain cluster
                            e.g. cluster GC content standard deviation <=
                            `gc-stddev-limit` (default: 5.0)
      --norm-method {am_clr,ilr,clr}
                            kmer normalization method to use on kmer counts
                            (default: am_clr)
      --pca-dims int        PCA dimensions to reduce normalized kmer frequencies
                            prior to embedding (default: 50)
      --embed-method {bhsne,umap,sksne,trimap}
                            kmer embedding method to use on normalized kmer
                            frequencies (default: bhsne)
      --embed-dims int      Embedding dimensions to reduce normalized kmers table
                            after PCA. (default: 2)
      --max-partition-size int
                            Maximum number of contigs to consider for a recursive
                            binning batch. (default: 10000)
      --starting-rank {superkingdom,phylum,class,order,family,genus,species}
                            Canonical rank at which to begin subsetting taxonomy
                            (default: superkingdom)
      --reverse-ranks       Reverse order at which to split taxonomy by
                            canonical-rank. When `--reverse-ranks` is given,
                            contigs will be split in order of species, genus,
                            family, order, class, phylum, superkingdom.
                            (default: False)
      --cache dirpath       Directory to store intermediate checkpoint files
                            during binning (If this is provided and the job
                            fails, the script will attempt to begin from the
                            checkpoints in this cache directory). (default: None)
      --binning-checkpoints filepath
                            File path to store intermediate contig binning
                            results (The `--cache` argument is required for this
                            feature). If `--cache` is provided without this
                            argument, a binning checkpoints file will be created.
                            (default: None)
      --rank-filter {superkingdom,phylum,class,order,family,genus,species}
                            Taxonomy column canonical rank to subset by provided
                            value of `--rank-name-filter` (default: superkingdom)
      --rank-name-filter RANK_NAME_FILTER
                            Only retrieve contigs with this name corresponding to
                            `--rank-filter` column (default: bacteria)
      --verbose             log debug information (default: False)
      --cpus int            Number of cores to use by clustering method (default
                            will try to use as many as are available)
                            (default: -1)
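
The four cluster-retention options (``--completeness``, ``--purity``, ``--cov-stddev-limit`` and ``--gc-stddev-limit``) each describe a cutoff a cluster must satisfy to be retained. The following minimal sketch illustrates that reading of the help text, assuming all four cutoffs must hold at once and that the per-cluster metrics have already been computed. ``ClusterMetrics`` and ``passes_thresholds`` are hypothetical names used for illustration only; they are not part of Autometa's API.

.. code-block:: python

    from dataclasses import dataclass


    @dataclass
    class ClusterMetrics:
        """Summary statistics for one candidate cluster (hypothetical container)."""
        completeness: float  # percent, 0-100
        purity: float  # percent, 0-100
        coverage_stddev: float
        gc_stddev: float


    def passes_thresholds(
        metrics: ClusterMetrics,
        completeness: float = 20.0,
        purity: float = 95.0,
        cov_stddev_limit: float = 25.0,
        gc_stddev_limit: float = 5.0,
    ) -> bool:
        """Return True only if the cluster meets every retention cutoff.

        Defaults mirror the documented defaults of --completeness, --purity,
        --cov-stddev-limit and --gc-stddev-limit. Combining them with a
        logical AND is an assumption of this sketch.
        """
        return (
            metrics.completeness >= completeness
            and metrics.purity >= purity
            and metrics.coverage_stddev <= cov_stddev_limit
            and metrics.gc_stddev <= gc_stddev_limit
        )


    # A cluster that clears all four cutoffs is retained...
    print(passes_thresholds(ClusterMetrics(85.0, 97.5, 12.0, 1.8)))  # True
    # ...but a complete yet impure cluster is rejected.
    print(passes_thresholds(ClusterMetrics(90.0, 80.0, 12.0, 1.8)))  # False

Note that the completeness and purity comparisons are inclusive lower bounds, while the two standard-deviation limits are inclusive upper bounds, matching the ``>=`` and ``<=`` wording in the help text.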
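
``--max-partition-size``, ``--starting-rank`` and ``--reverse-ranks`` together govern how contigs are split into taxonomy-based batches before recursive binning. The following minimal sketch shows one plausible reading, assuming that ranks are taken in canonical order from ``--starting-rank`` onward (then reversed when ``--reverse-ranks`` is set), that each taxon group at the current rank small enough to fit within ``--max-partition-size`` becomes a batch, and that oversized groups are deferred to the next rank. ``partition_contigs`` and the ``"unclassified"`` placeholder for missing ranks are hypothetical; this is not Autometa's implementation.

.. code-block:: python

    from collections import defaultdict
    from typing import Dict, List, Tuple

    CANONICAL_RANKS = [
        "superkingdom", "phylum", "class", "order", "family", "genus", "species",
    ]


    def partition_contigs(
        taxonomy: Dict[str, Dict[str, str]],
        max_partition_size: int = 10000,
        starting_rank: str = "superkingdom",
        reverse_ranks: bool = False,
    ) -> Tuple[List[List[str]], List[str]]:
        """Split contigs into taxon batches of at most max_partition_size.

        taxonomy maps contig -> {rank: taxon name}. Returns (batches, leftover),
        where leftover holds contigs that never fit within the limit.
        Hypothetical sketch only.
        """
        # Keep ranks from starting_rank onward, then optionally reverse them.
        ranks = CANONICAL_RANKS[CANONICAL_RANKS.index(starting_rank):]
        if reverse_ranks:
            ranks = ranks[::-1]
        remaining = set(taxonomy)
        batches: List[List[str]] = []
        for rank in ranks:
            # Group the not-yet-batched contigs by their taxon name at this rank.
            groups: Dict[str, List[str]] = defaultdict(list)
            for contig in sorted(remaining):
                groups[taxonomy[contig].get(rank, "unclassified")].append(contig)
            for name in sorted(groups):
                members = groups[name]
                # Small enough groups become a batch; larger ones wait for the next rank.
                if len(members) <= max_partition_size:
                    batches.append(members)
                    remaining.difference_update(members)
        return batches, sorted(remaining)


    taxonomy = {
        "c1": {"phylum": "Firmicutes"},
        "c2": {"phylum": "Firmicutes"},
        "c3": {"phylum": "Proteobacteria", "class": "Gammaproteobacteria"},
        "c4": {"phylum": "Proteobacteria", "class": "Gammaproteobacteria"},
        "c5": {"phylum": "Proteobacteria", "class": "Alphaproteobacteria"},
    }
    batches, leftover = partition_contigs(
        taxonomy, max_partition_size=2, starting_rank="phylum"
    )
    print(batches)   # [['c1', 'c2'], ['c5'], ['c3', 'c4']]
    print(leftover)  # []

In this sketch the Firmicutes group fits within the limit at the phylum rank, while the three Proteobacteria contigs exceed it and are only batched once they are grouped by class. Deferring oversized groups rather than truncating them keeps each batch taxonomically coherent.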