usage: large_data_mode.py
Autometa Large-data-mode binning by contig set selection using
max-partition-size
optional arguments:
-h, --help show this help message and exit
--kmers filepath Path to k-mer counts table (default: None)
--coverages filepath Path to metagenome coverages table (default: None)
--gc-content filepath
Path to metagenome GC contents table (default: None)
--markers filepath Path to Autometa annotated markers table (default:
None)
--taxonomy filepath Path to Autometa assigned taxonomies table (default:
None)
--output-binning filepath
Path to write Autometa binning results (default: None)
--output-main filepath
Path to write Autometa main table used during/after
binning (default: None)
--clustering-method {dbscan,hdbscan}
Clustering algorithm to use for recursive binning.
(default: dbscan)
--completeness 0 < float <= 100
Completeness cutoff to retain cluster, i.e. cluster
completeness >= `completeness` (default: 20.0)
--purity 0 < float <= 100
Purity cutoff to retain cluster, i.e. cluster purity
>= `purity` (default: 95.0)
--cov-stddev-limit float
Coverage standard deviation limit to retain cluster,
i.e. cluster coverage standard deviation <=
`cov-stddev-limit` (default: 25.0)
--gc-stddev-limit float
GC content standard deviation limit to retain cluster,
i.e. cluster GC content standard deviation <=
`gc-stddev-limit` (default: 5.0)
--norm-method {am_clr,ilr,clr}
kmer normalization method to use on kmer counts
(default: am_clr)
--pca-dims int PCA dimensions to reduce normalized kmer frequencies
prior to embedding (default: 50)
--embed-method {bhsne,umap,sksne,trimap}
kmer embedding method to use on normalized kmer
frequencies (default: bhsne)
--embed-dims int Embedding dimensions to reduce normalized kmers table
after PCA. (default: 2)
--max-partition-size int
Maximum number of contigs to consider for a recursive
binning batch. (default: 10000)
--starting-rank {superkingdom,phylum,class,order,family,genus,species}
Canonical rank at which to begin subsetting taxonomy
(default: superkingdom)
--reverse-ranks Reverse order at which to split taxonomy by canonical
rank. When `--reverse-ranks` is given, contigs will be
split in order of species, genus, family, order,
class, phylum, superkingdom. (default: False)
--cache dirpath Directory to store intermediate checkpoint files during
binning (If this is provided and the job fails, the
script will attempt to begin from the checkpoints in
this cache directory). (default: None)
--binning-checkpoints filepath
File path to store intermediate contig binning results
(The `--cache` argument is required for this feature).
If `--cache` is provided without this argument, a
binning checkpoints file will be created. (default:
None)
--rank-filter {superkingdom,phylum,class,order,family,genus,species}
Taxonomy column canonical rank to subset by provided
value of `--rank-name-filter` (default: superkingdom)
--rank-name-filter RANK_NAME_FILTER
Only retrieve contigs with this name corresponding to
`--rank-filter` column (default: bacteria)
--verbose log debug information (default: False)
--cpus int Number of cores to use by clustering method (-1 will
try to use as many as are available) (default: -1)
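
example:
The options above can be combined into a single invocation. The following is a minimal sketch that only assembles and prints the command without running it. The helper name `build_command`, every file path, the cache directory name, and launching the script through `python` are assumptions made for illustration; only the flag names, the defaults, and the 0 < value <= 100 range for the two cutoffs come from the help text.

```python
import shlex


def build_command(kmers, coverages, gc_content, markers, taxonomy,
                  output_binning, output_main,
                  completeness=20.0, purity=95.0,
                  max_partition_size=10000, cache=None):
    """Return an argv list for large_data_mode.py (hypothetical helper; does not run it)."""
    # The help text documents both cutoffs as 0 < float <= 100.
    for name, value in (("completeness", completeness), ("purity", purity)):
        if not 0 < value <= 100:
            raise ValueError(f"--{name} must satisfy 0 < value <= 100, got {value}")

    argv = [
        "python", "large_data_mode.py",  # assumption: launched as a plain script
        "--kmers", kmers,
        "--coverages", coverages,
        "--gc-content", gc_content,
        "--markers", markers,
        "--taxonomy", taxonomy,
        "--output-binning", output_binning,
        "--output-main", output_main,
        "--completeness", str(completeness),
        "--purity", str(purity),
        "--max-partition-size", str(max_partition_size),
    ]
    # --cache is optional; when given, a failed job can restart from checkpoints.
    if cache is not None:
        argv += ["--cache", cache]
    return argv


# All paths below are placeholders, not files shipped with Autometa.
command = build_command(
    kmers="kmers.tsv",
    coverages="coverages.tsv",
    gc_content="gc_content.tsv",
    markers="markers.tsv",
    taxonomy="taxonomy.tsv",
    output_binning="binning.tsv",
    output_main="main.tsv",
    cache="cache_dir",
)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` once the input tables exist. Keeping the arguments as a list rather than one string avoids shell-quoting problems with paths that contain spaces.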