Command-Line Interface

ancify provides a single CLI command with several subcommands. It can be invoked either as ancify (after installation) or as python -m ancify.

Synopsis

ancify [-h] [-v] {init,project,call,evaluate,run,train} ...

Global options

Flag	Description
`-h`, `--help`	Show help and exit
`-v`, `--verbose`	Enable debug-level logging (prints per-chromosome progress, timing, etc.)
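
The synopsis has the shape that Python's argparse prints for a parser with subcommands. The sketch below is not ancify's source; it is a hypothetical reconstruction of the documented interface (`build_parser` is an invented name, and treating `-n` as an integer is an assumption) that shows why global options such as `-v` are written before the subcommand.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical reconstruction of the documented interface, for illustration only."""
    parser = argparse.ArgumentParser(prog="ancify")
    # Global option: defined on the top-level parser, so it precedes the subcommand.
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Enable debug-level logging")
    sub = parser.add_subparsers(dest="command", required=True)

    init = sub.add_parser("init")
    init.add_argument("-o", "--output", default="config.yaml")

    for name in ("project", "call", "evaluate", "run", "train"):
        p = sub.add_parser(name)
        p.add_argument("-c", "--config", required=True)
        p.add_argument("-n", "--num-cpus", type=int, default=None)  # int is assumed
        if name == "train":
            p.add_argument("-o", "--output", default=None)
    return parser


args = build_parser().parse_args(["-v", "run", "-c", "config.yaml", "-n", "1"])
print(args.verbose, args.command, args.config, args.num_cpus)
```

In a parser built this way, `ancify -v run -c config.yaml` is accepted, while placing `-v` after `run` is rejected as an unrecognized argument. Whether ancify itself behaves identically is an assumption; the documented examples always place `-v` first.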

Subcommands

`ancify init` — Generate a config template

ancify init                    # writes config.yaml to current directory
ancify init -o my_config.yaml  # writes to a custom path

Option	Description
`-o`, `--output`	Output path (default: `config.yaml`)

This creates a fully annotated YAML template with all fields documented. It is a good starting point for any new species.

`ancify project` — Phase 1: Coordinate projection

ancify project -c config.yaml
ancify project -c config.yaml -n 8    # override num_cpus

Option	Description
`-c`, `--config`	Path to YAML config file (required)
`-n`, `--num-cpus`	Override `num_cpus` from config

What it does: For each outgroup species and each chromosome, reads the pairwise alignment file and projects the outgroup’s bases onto the focal genome’s coordinate system.

Output: <work_dir>/projected/<species>/<chrom>.fa for every (species, chromosome) pair.

Runtime: This is the slow step (hours for a full human genome). Each AXT file is streamed sequentially per chromosome.
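
To make the idea of projection concrete, here is a minimal sketch of how one AXT alignment block can be mapped onto focal coordinates. It is an illustration under stated assumptions, not ancify's implementation: `project_block` is a hypothetical helper, the focal genome is assumed to be the primary (first) sequence in the AXT block, and `N` is an assumed filler for positions no alignment covers.

```python
def project_block(header: str, focal_aln: str, outgroup_aln: str,
                  projected: list) -> None:
    """Write one AXT block's outgroup bases into focal coordinates, in place."""
    fields = header.split()
    pos = int(fields[2]) - 1  # AXT starts are 1-based; convert to a 0-based index
    for focal_base, out_base in zip(focal_aln, outgroup_aln):
        if focal_base == "-":
            continue  # insertion in the outgroup: no focal coordinate to fill
        projected[pos] = out_base.upper()  # "-" here marks a deletion in the outgroup
        pos += 1


# Toy focal chromosome of length 8; "N" (an assumed filler) marks unaligned positions.
projected = ["N"] * 8
project_block("0 chr1 3 6 chrA 10 13 + 100", "AC-GT", "ATTG-", projected)
print("".join(projected))  # NNATG-NN
```

The block covers focal positions 3 to 6, so only those four positions are filled; the extra outgroup base that aligns to a gap in the focal sequence is dropped because it has no focal coordinate.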

`ancify call` — Phase 2: Ancestral state inference

ancify call -c config.yaml
ancify call -c config.yaml -n 4

Option	Description
`-c`, `--config`	Path to YAML config file (required)
`-n`, `--num-cpus`	Override `num_cpus` from config

What it does: Reads projected FASTA files, infers the ancestral allele at every position using the configured method (voting, parsimony, likelihood, or ML), and writes confidence-encoded ancestral FASTA files.

Output: <output_dir>/<chrom>.fa for each chromosome.

Runtime: Fast (minutes for a full human genome).
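
As a rough illustration of the simplest of these methods, the sketch below applies a plain majority vote to each column of the projected outgroup sequences. The rule and the encoding are assumptions made for this example (unanimous gives uppercase, a non-unanimous majority gives lowercase, a tie or no informative base gives `N`); ancify's actual voting scheme and confidence encoding are defined by the tool and its config, not by this code. `vote` and `call_ancestral` are hypothetical names.

```python
from collections import Counter


def vote(bases: str) -> str:
    """Majority vote over one alignment column of outgroup bases (illustrative rule)."""
    informative = [b for b in bases.upper() if b in "ACGT"]
    if not informative:
        return "N"  # no outgroup has a usable base at this position
    counts = Counter(informative).most_common()
    top_base, top_count = counts[0]
    if len(counts) > 1 and counts[1][1] == top_count:
        return "N"  # tie between alleles: no call
    # Unanimous -> uppercase (high confidence); majority only -> lowercase.
    return top_base if len(counts) == 1 else top_base.lower()


def call_ancestral(projected: list) -> str:
    """Call every position from equal-length projected outgroup sequences."""
    return "".join(vote("".join(column)) for column in zip(*projected))


print(call_ancestral(["ACGTN", "ACTA-", "ATGCN"]))  # AcgNN
```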

`ancify evaluate` — Phase 3: Evaluation

ancify evaluate -c config.yaml

Option	Description
`-c`, `--config`	Path to YAML config file (required)
`-n`, `--num-cpus`	Override `num_cpus` from config

What it does: Compares ancestral calls against a reference ancestral sequence and/or VCF variant data.

Requires: The evaluation block in the config, and scikit-allel for VCF comparison.

Output: Per-chromosome evaluation files in <output_dir>/evaluation/.
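
The comparison against a reference ancestral sequence can be pictured as a per-position concordance count. The following is a minimal sketch, assuming both sequences cover the same chromosome and have equal length; `concordance` is a hypothetical helper, and the metrics and file format ancify actually writes may differ.

```python
def concordance(called: str, reference: str) -> dict:
    """Count matches and mismatches where both sequences have an A/C/G/T base."""
    if len(called) != len(reference):
        raise ValueError("sequences must have the same length")
    counts = {"match": 0, "mismatch": 0, "skipped": 0}
    for c, r in zip(called.upper(), reference.upper()):
        if c not in "ACGT" or r not in "ACGT":
            counts["skipped"] += 1  # uncalled, gap, or masked in either sequence
        elif c == r:
            counts["match"] += 1
        else:
            counts["mismatch"] += 1
    return counts


print(concordance("AcgNN", "ACTN-"))  # {'match': 2, 'mismatch': 1, 'skipped': 2}
```

Case is ignored here, so confidence encoding does not affect the counts; a real evaluation would likely report concordance separately per confidence level.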

`ancify run` — All phases end-to-end

ancify run -c config.yaml
ancify run -c config.yaml -n 24
ancify -v run -c config.yaml          # verbose output

Option	Description
`-c`, `--config`	Path to YAML config file (required)
`-n`, `--num-cpus`	Override `num_cpus` from config

What it does: Runs Phase 1 → Phase 2 → Phase 3 in sequence. Phase 3 is skipped if the evaluation block is absent.
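
The phase selection described above can be sketched in a few lines. This is an illustration of the documented behavior, not ancify's code: `plan_phases` is a hypothetical function, and the config is assumed to be already parsed into a dictionary with the evaluation settings stored under an `evaluation` key.

```python
def plan_phases(config: dict) -> list:
    """Return the subcommands that `ancify run` is documented to execute, in order."""
    phases = ["project", "call"]
    # Phase 3 runs only when the config contains an evaluation block.
    if config.get("evaluation") is not None:
        phases.append("evaluate")
    return phases


for phase in plan_phases({"work_dir": "work"}):
    print(f"ancify {phase} -c config.yaml")
```

With no evaluation block, the loop prints the `project` and `call` commands only, which is the same pair of commands you would run by hand to reproduce `ancify run`.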

`ancify train` — Train an ML model

ancify train -c config.yaml
ancify train -c config.yaml -o model.lgb -n 4

Option	Description
`-c`, `--config`	Path to YAML config file (required)
`-o`, `--output`	Output path for the trained model (default: from config or `work_dir`)
`-n`, `--num-cpus`	Override `num_cpus` from config

What it does: Trains a LightGBM classifier for use with method: ml. Uses high-confidence voting sites as labels by default, or an external reference if ml_training_reference is set. Run this once before using method: ml in your config.

Requires: lightgbm and scikit-learn (e.g. pip install '.[ml]').

Workflow patterns

Standard full run

The most common usage is to run everything from start to finish:

ancify run -c config.yaml

Iterate on Phase 2 settings

Phase 1 is expensive. Once the projected files exist, you can re-run Phase 2 with different settings without redoing Phase 1:

# First time: run everything
ancify run -c config.yaml

# Later: tweak min_inner_freq and re-call
# (edit config.yaml to change min_inner_freq)
ancify call -c config.yaml

Debug a failing run

ancify -v run -c config.yaml -n 1 2>&1 | tee ancify.log

-v enables verbose logging (per-chromosome progress)
-n 1 uses a single worker (easier to read output, avoids interleaved logs)
tee saves output to a file while still printing to screen
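
If the run is launched from a Python script rather than an interactive shell, the effect of `2>&1 | tee` can be approximated with the standard library. This is a sketch, with `run_and_tee` as a hypothetical helper name; it only uses the `ancify` invocation shown above.

```python
import subprocess
import sys


def run_and_tee(cmd: list, log_path: str) -> int:
    """Run cmd, echoing merged stdout/stderr to the screen and to log_path."""
    with open(log_path, "w") as log, subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    ) as proc:
        for line in proc.stdout:
            sys.stdout.write(line)  # still print to the screen
            log.write(line)         # and keep a copy in the log file
    return proc.returncode


# Example call, equivalent to the shell command above:
# run_and_tee(["ancify", "-v", "run", "-c", "config.yaml", "-n", "1"], "ancify.log")
```

The return code is passed through, so the calling script can tell a failed run from a successful one, which a plain shell pipeline into `tee` does not report by default.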

Test with a single chromosome first

Edit your config to process only one small chromosome:

chromosomes:
  - chr22    # one of the smallest human autosomes

Then run normally. This lets you verify the setup in minutes instead of hours.

Process species independently (Phase 1)

For large genomes, you can parallelize Phase 1 across machines by running separate configs for each outgroup, then combining:

# On machine 1:
ancify project -c config_bonobo_only.yaml

# On machine 2:
ancify project -c config_chimp_only.yaml

# Then combine on one machine:
ancify call -c config_all.yaml

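Phase 2 reads from whatever projected files exist in <work_dir>/projected/. The per-species output directories from every machine therefore need to be present under the work_dir used by config_all.yaml (for example on a shared filesystem, or copied over) before you run ancify call.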
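
Before combining, it can help to confirm that every expected projected file is in place. The sketch below relies only on the documented layout <work_dir>/projected/<species>/<chrom>.fa; `missing_projections` is a hypothetical helper, and the species directory names used in the example are assumptions.

```python
from pathlib import Path


def missing_projections(work_dir: str, species: list, chroms: list) -> list:
    """List expected <work_dir>/projected/<species>/<chrom>.fa files that do not exist."""
    base = Path(work_dir) / "projected"
    return [
        base / sp / f"{chrom}.fa"
        for sp in species
        for chrom in chroms
        if not (base / sp / f"{chrom}.fa").is_file()
    ]


# Species names here are placeholders; use the names from your own config.
for path in missing_projections("work", ["bonobo", "chimp"], ["chr21", "chr22"]):
    print("missing:", path)
```

An empty result means every (species, chromosome) pair has a projected file, so Phase 2 will see the complete set of outgroups.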

Examples

Run the included human example:

ancify run -c example_configs/hg38_bcgm.yaml

Run only projection, then call separately:

ancify project -c config.yaml
ancify call -c config.yaml

Generate a config and inspect it:

ancify init -o my_species.yaml
cat my_species.yaml