==================
large_data_mode.py
==================

.. code-block:: bash

	usage: large_data_mode.py

	Autometa Large-data-mode binning by contig set selection using max-partition-
	size

	optional arguments:
	  -h, --help            show this help message and exit
	  --kmers filepath      Path to k-mer counts table (default: None)
	  --coverages filepath  Path to metagenome coverages table (default: None)
	  --gc-content filepath
	                        Path to metagenome GC contents table (default: None)
	  --markers filepath    Path to Autometa annotated markers table (default:
	                        None)
	  --taxonomy filepath   Path to Autometa assigned taxonomies table (default:
	                        None)
	  --output-binning filepath
	                        Path to write Autometa binning results (default: None)
	  --output-main filepath
	                        Path to write Autometa main table used during/after
	                        binning (default: None)
	  --clustering-method {dbscan,hdbscan}
	                        Clustering algorithm to use for recursive binning.
	                        (default: dbscan)
	  --completeness 0 < float <= 100
	                        completeness cutoff to retain cluster. e.g. cluster
	                        completeness >= `completeness` (default: 20.0)
	  --purity 0 < float <= 100
	                        purity cutoff to retain cluster. e.g. cluster purity
	                        >= `purity` (default: 95.0)
	  --cov-stddev-limit float
	                        coverage standard deviation limit to retain cluster
	                        e.g. cluster coverage standard deviation <= `cov-
	                        stddev-limit` (default: 25.0)
	  --gc-stddev-limit float
	                        GC content standard deviation limit to retain cluster
	                        e.g. cluster GC content standard deviation <= `gc-
	                        stddev-limit` (default: 5.0)
	  --norm-method {am_clr,ilr,clr}
	                        kmer normalization method to use on kmer counts
	                        (default: am_clr)
	  --pca-dims int        PCA dimensions to reduce normalized kmer frequencies
	                        prior to embedding (default: 50)
	  --embed-method {bhsne,umap,sksne,trimap}
	                        kmer embedding method to use on normalized kmer
	                        frequencies (default: bhsne)
	  --embed-dims int      Embedding dimensions to reduce normalized kmers table
	                        after PCA. (default: 2)
	  --max-partition-size int
	                        Maximum number of contigs to consider for a recursive
	                        binning batch. (default: 10000)
	  --starting-rank {superkingdom,phylum,class,order,family,genus,species}
	                        Canonical rank at which to begin subsetting taxonomy
	                        (default: superkingdom)
	  --reverse-ranks       Reverse order at which to split taxonomy by canonical-
	                        rank. When `--reverse-ranks` is given, contigs will be
	                        split in order of species, genus, family, order,
	                        class, phylum, superkingdom. (default: False)
	  --cache dirpath       Directory to store intermediate checkpoint files during
	                        binning (If this is provided and the job fails, the
	                        script will attempt to begin from the checkpoints in
	                        this cache directory). (default: None)
	  --binning-checkpoints filepath
	                        File path to store intermediate contig binning results
	                        (The `--cache` argument is required for this feature).
	                        If `--cache` is provided without this argument, a
	                        binning checkpoints file will be created. (default:
	                        None)
	  --rank-filter {superkingdom,phylum,class,order,family,genus,species}
	                        Taxonomy column canonical rank to subset by provided
	                        value of `--rank-name-filter` (default: superkingdom)
	  --rank-name-filter RANK_NAME_FILTER
	                        Only retrieve contigs with this name corresponding to
	                        `--rank-filter` column (default: bacteria)
	  --verbose             log debug information (default: False)
	  --cpus int            Number of cores to use by clustering method (default
	                        will try to use as many as are available) (default:
	                        -1)
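
The four cluster-retention options above (``--completeness``, ``--purity``, ``--cov-stddev-limit`` and ``--gc-stddev-limit``) each describe a single comparison. The following is a minimal sketch of how those comparisons might be applied to one candidate cluster, using the default values from the help text. The names ``ClusterStats`` and ``is_retained`` are hypothetical and are not part of Autometa, and combining the four checks with a logical AND is an assumption made for illustration.

.. code-block:: python

	from dataclasses import dataclass


	@dataclass
	class ClusterStats:
	    """Hypothetical container for one cluster's summary metrics."""

	    completeness: float       # percent, 0 < value <= 100
	    purity: float             # percent, 0 < value <= 100
	    coverage_stddev: float    # standard deviation of contig coverages
	    gc_content_stddev: float  # standard deviation of contig GC contents


	def is_retained(
	    stats: ClusterStats,
	    completeness: float = 20.0,
	    purity: float = 95.0,
	    cov_stddev_limit: float = 25.0,
	    gc_stddev_limit: float = 5.0,
	) -> bool:
	    """Return True if the cluster passes every cutoff.

	    Assumption: all four cutoffs must hold at once (logical AND).
	    """
	    return (
	        stats.completeness >= completeness
	        and stats.purity >= purity
	        and stats.coverage_stddev <= cov_stddev_limit
	        and stats.gc_content_stddev <= gc_stddev_limit
	    )


	# A cluster that passes all four default cutoffs.
	good = ClusterStats(85.0, 98.0, 10.0, 2.5)
	# A cluster that fails only the completeness cutoff.
	incomplete = ClusterStats(15.0, 98.0, 10.0, 2.5)
	print(is_retained(good), is_retained(incomplete))

Under this sketch, raising ``--purity`` or lowering either standard deviation limit makes retention stricter, while lowering ``--completeness`` admits smaller, more fragmentary clusters.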