Changelog
1.5.0 (2026)
Documentation: tutorials for ancestral FASTAs and VCF polarization, plus a species catalogue.
Documentation
Tutorials. New Tutorial 4: Getting Ancestral FASTA Files, covering how to locate, verify, and read per-chromosome output; sanity checks; loading all chromosomes; and a recap of the confidence encoding. New Tutorial 5: Polarizing VCF Variants, covering annotating REF/ALT as ancestral/derived with cyvcf2 and scikit-allel, a complete polarize_vcf.py script (AA/DAF INFO fields), an unfolded SFS example, and tips on confidence filtering.
Species guide. New species catalogue with 115 commonly studied species: suggested inner and outer outgroups, UCSC assembly identifiers, and approximate divergence times, grouped by clade (primates, rodents, carnivores, ungulates, birds, fish, Drosophila, plants, fungi, etc.). Includes a short section on generating your own net AXT alignments with lastz when UCSC data is unavailable.
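The core of the polarization step can be sketched without any VCF library. This sketch assumes that the ancestral base for a site has already been read from the per-chromosome FASTA. It also assumes that uppercase means high confidence, lowercase means low confidence, and n/N means no call. polarize_site and unfolded_sfs are hypothetical helpers. They are not functions from ancify or from polarize_vcf.py, which works on cyvcf2 or scikit-allel records.

```python
from collections import Counter


def polarize_site(ref, alt, anc, alt_freq, allow_low_conf=True):
    """Return (AA, DAF) for a biallelic SNP, or None if it cannot be polarized.

    anc is the ancestral-FASTA character at this position. Assumed encoding:
    'A' high confidence, 'a' low confidence, 'n'/'N' no call.
    """
    if anc in ("n", "N"):
        return None                      # no ancestral call at this site
    if anc.islower() and not allow_low_conf:
        return None                      # confidence filtering: drop low-confidence calls
    aa = anc.upper()
    if aa == ref.upper():
        return aa, alt_freq              # REF is ancestral, so ALT is derived
    if aa == alt.upper():
        return aa, 1.0 - alt_freq        # ALT is ancestral, so REF is derived
    return None                          # ancestral base matches neither allele


def unfolded_sfs(derived_counts, n_chrom):
    """Unfolded SFS: entry k is the number of sites with k derived alleles out of n_chrom."""
    counts = Counter(derived_counts)
    return [counts.get(k, 0) for k in range(n_chrom + 1)]
```

For example, polarize_site("A", "G", "g", 0.25) returns ("G", 0.75): the low-confidence ancestral base matches ALT, so REF is the derived allele. Binning by derived-allele count is what makes the SFS unfolded. Without polarization, only minor-allele counts (the folded SFS) are available.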
1.4.0 (2026)
Likelihood-based ancestral reconstruction and expanded installation docs.
Likelihood method (Felsenstein pruning)
New method: likelihood. Infers ancestral alleles using Felsenstein's pruning algorithm on a user-supplied tree with branch lengths. Root posterior probabilities are computed under a continuous-time substitution model and mapped to the same case-encoded confidence scheme (uppercase / lowercase / n/N).
Substitution models. Four models are supported: JC69, K80, HKY85, and GTR. All use normalised rate matrices (one expected substitution per unit branch length). Transition probabilities are computed with scipy.linalg.expm (JC69 uses a closed-form shortcut).
New ancify.likelihood module: SubstitutionModel base class; JC69, K80, HKY85, and GTR classes; felsenstein_pruning(), call_ancestral_base_likelihood(), the _call_chromosome_likelihood() worker, and the build_model() factory.
New config fields: substitution_model, model_kappa, model_base_freqs, model_rates, likelihood_high_threshold, likelihood_low_threshold. Validation requires a tree whose leaf names match the outgroups; GTR requires six model_rates; base frequencies must sum to ~1.
Core dependency: added SciPy (>=1.7) for matrix exponentiation.
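The normalisation to one expected substitution per unit branch length can be sketched for HKY85 with NumPy and scipy.linalg.expm. The sketch assumes the base order A, C, G, T. hky85_rate_matrix is a hypothetical helper rather than ancify's HKY85 class. With equal base frequencies it reduces to K80, and additionally setting kappa = 1 gives JC69.

```python
import numpy as np
from scipy.linalg import expm

# Index pairs for transitions in A, C, G, T order: A<->G and C<->T.
TRANSITIONS = {(0, 2), (2, 0), (1, 3), (3, 1)}


def hky85_rate_matrix(kappa, freqs):
    """HKY85 rate matrix Q scaled to one expected substitution per unit time."""
    pi = np.asarray(freqs, dtype=float)
    q = np.zeros((4, 4))
    for i in range(4):
        for j in range(4):
            if i != j:
                # Rate i -> j is the target frequency, times kappa for transitions.
                q[i, j] = pi[j] * (kappa if (i, j) in TRANSITIONS else 1.0)
    np.fill_diagonal(q, -q.sum(axis=1))   # each row of Q sums to zero
    mu = -np.dot(pi, np.diag(q))          # expected substitution rate at equilibrium
    return q / mu                         # rescale so the expected rate is exactly 1


def transition_matrix(q, t):
    """P(t) = expm(Q t); entry [i, j] is Pr(state j after branch length t | state i)."""
    return expm(q * t)
```

Dividing by the equilibrium rate is what makes branch lengths comparable across models. A branch of length 0.1 means 0.1 expected substitutions per site, whichever model is chosen.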
Documentation and README
Algorithm docs. New section on the likelihood method: substitution models, Felsenstein pruning steps, worked example, and comparison with voting/parsimony/ML. Summary section updated for all four methods.
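The pruning steps for a single site can be sketched compactly under JC69, using its closed-form transition probabilities. Several parts of the sketch are assumptions for illustration:

- the tree format (a leaf name, or a list of (child, branch_length) pairs);
- the function names;
- the 0.9/0.6 cut-offs standing in for likelihood_high_threshold and likelihood_low_threshold.

Neither the signatures nor the exact posterior-to-case mapping are taken from ancify.

```python
import math

BASES = "ACGT"


def jc69_prob(same, t):
    """JC69 Pr(end state | start state) for a branch of t expected substitutions per site."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e


def partial_likelihood(node, leaves):
    """Conditional likelihoods [L(A), L(C), L(G), L(T)] of the data below `node`.

    node: a leaf name (str) or a list of (child, branch_length) pairs.
    leaves: leaf name -> observed base; missing or non-ACGT means no data.
    """
    if isinstance(node, str):
        base = leaves.get(node, "N").upper()
        if base not in ("A", "C", "G", "T"):
            return [1.0] * 4              # missing data: every state is compatible
        return [1.0 if b == base else 0.0 for b in BASES]
    result = [1.0] * 4
    for child, t in node:
        child_lik = partial_likelihood(child, leaves)
        for s in range(4):
            # Pruning step: sum over the child's state x of Pr(x | s, t) * L_child(x).
            result[s] *= sum(jc69_prob(s == x, t) * child_lik[x] for x in range(4))
    return result


def root_posteriors(tree, leaves):
    """Pr(root = s | data), using JC69's uniform base frequencies as the root prior."""
    lik = partial_likelihood(tree, leaves)
    total = sum(0.25 * v for v in lik)
    return {b: 0.25 * v / total for b, v in zip(BASES, lik)}


def encode_call(post, high=0.9, low=0.6):
    """Case-encode the best root state (illustrative thresholds)."""
    best = max(post, key=post.get)
    if post[best] >= high:
        return best                       # high confidence: uppercase
    if post[best] >= low:
        return best.lower()               # low confidence: lowercase
    return "N"                            # below both thresholds: no call
```

Consider out1 = C, out2 = C, and out3 = T on the tree [("out1", 0.1), ([("out2", 0.2), ("out3", 0.2)], 0.1)]. C gets the largest root posterior. Because out1 sits on a short branch close to the root, that posterior clears the illustrative high threshold.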
Configuration docs. Field reference and method table updated for likelihood; new subsection "Likelihood YAML" with the branch-length requirement and a GTR example; validation and config recipes updated.
Landing page (docs/index.rst). Likelihood added to the method list and the "Why ancify?" bullet.
README. Major expansion: Installation now includes prerequisites, core install, an optional extras table (evaluate, fast, ml, docs, dev, all), GPU acceleration (voting only), verify-install commands, and a quick-reference table. The README is also updated for four methods throughout (intro, confidence encoding, key fields, How it works, CLI, project structure).
Tests
New tests/test_likelihood.py: rate-matrix properties (row sums, detailed balance, normalisation), transition-probability properties (P(0)=I, rows sum to 1, equilibrium limit), Felsenstein pruning and posteriors, confidence encoding, and agreement with parsimony on unambiguous cases.
tests/test_config.py: new TestValidateLikelihood for tree requirement, model name, GTR rates, base freqs, and thresholds.
tests/test_ancestral.py: new TestCallAncestralBaseLikelihood mirroring the parsimony tests.
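The likelihood validation rules tested above can be expressed as a small function over a config dict. This is a hypothetical sketch, and several of its details are assumptions:

- the function name and the error messages;
- the JC69 default;
- the 1e-3 tolerance for "sum to ~1";
- the threshold-ordering rule.

It also does not check that branch lengths are present.

```python
import math

SUPPORTED_MODELS = {"JC69", "K80", "HKY85", "GTR"}


def validate_likelihood_config(cfg, tree_leaves, outgroups):
    """Collect problems with a likelihood-method config (hypothetical helper).

    cfg: dict of config fields; tree_leaves: leaf names parsed from the tree
    (empty if no tree was given); outgroups: all inner and outer outgroup names.
    """
    errors = []
    if not tree_leaves:
        errors.append("method 'likelihood' requires a tree with branch lengths")
    elif set(tree_leaves) != set(outgroups):
        errors.append("tree leaf names must match the outgroup species names")

    model = cfg.get("substitution_model", "JC69")  # default model is an assumption
    if model not in SUPPORTED_MODELS:
        errors.append(f"unknown substitution_model: {model!r}")
    rates = cfg.get("model_rates")
    if model == "GTR" and (rates is None or len(rates) != 6):
        errors.append("GTR requires six model_rates")

    freqs = cfg.get("model_base_freqs")
    if freqs is not None:
        if len(freqs) != 4 or any(f < 0 for f in freqs):
            errors.append("model_base_freqs needs four non-negative values")
        elif not math.isclose(sum(freqs), 1.0, abs_tol=1e-3):  # tolerance is assumed
            errors.append("model_base_freqs must sum to ~1")

    high = cfg.get("likelihood_high_threshold")
    low = cfg.get("likelihood_low_threshold")
    if high is not None and low is not None and not 0.0 <= low <= high <= 1.0:
        errors.append("need 0 <= likelihood_low_threshold <= likelihood_high_threshold <= 1")
    return errors
```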
Version and CLI
Version set to 1.4.0 in pyproject.toml.
CLI template (EXAMPLE_CONFIG in cli.py) updated with a commented likelihood example and a branch-length tree.
1.3.0 (2026)
Machine learning-based ancestral calling and documentation updates.
ML-based ancestral calling. The new method: ml option uses a LightGBM gradient-boosted classifier trained on per-position features (outgroup agreement, GC content, CpG flag, etc.) to predict ancestral alleles. Confidence is derived from the predicted class probabilities. Install with pip install 'ancify[ml]' (requires lightgbm and scikit-learn).
New ancify.ml module: feature extraction (extract_features()), model loading, and vectorized prediction for full-chromosome runs. Integrates with the existing pipeline via config and CLI.
Config field method now supports "voting", "parsimony", and "ml". For method: ml, the optional model_path points to a trained LightGBM model; if it is omitted, a bundled default is used when available.
CLI and config updated to pass method selection and ML options through to the calling phase.
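The sketch below computes three of the named per-position features for a single position, using only the standard library. position_features, the +/-10 bp window, and the exact feature definitions are assumptions. ancify's extract_features() may define and vectorize them differently.

```python
from collections import Counter

ACGT = {"A", "C", "G", "T"}


def position_features(focal_seq, pos, outgroup_bases, window=10):
    """Hypothetical per-position features for an ML ancestral-allele classifier.

    focal_seq: focal-species sequence; pos: 0-based position;
    outgroup_bases: the outgroup bases projected onto this position.
    """
    called = [b.upper() for b in outgroup_bases if b.upper() in ACGT]
    # Outgroup agreement: share of called outgroups carrying the most common base.
    agreement = Counter(called).most_common(1)[0][1] / len(called) if called else 0.0
    # GC content in a window of +/- `window` bases around pos (clipped at the ends).
    context = focal_seq[max(0, pos - window): pos + window + 1].upper()
    gc = sum(c in "GC" for c in context) / len(context) if context else 0.0
    # CpG flag: pos is the C or the G of a CG dinucleotide.
    s = focal_seq.upper()
    cpg = (s[pos] == "C" and s[pos + 1: pos + 2] == "G") or (
        s[pos] == "G" and pos > 0 and s[pos - 1] == "C"
    )
    return {"outgroup_agreement": agreement, "gc_content": gc, "cpg": int(cpg)}
```

A classifier such as LightGBM's LGBMClassifier would receive one such row per position. Its predict_proba output gives the per-class probabilities from which a confidence level can be derived.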
Documentation: algorithm page and configuration reference updated for the ML method; GPU logo and conf tweaks.
Tests: tests/test_ml.py for feature extraction, prediction shape, and integration.
Lock file: uv.lock added for reproducible installs.
1.2.0 (2026)
Fitch parsimony for tree-based ancestral inference.
Fitch parsimony method. The new method: parsimony option uses the Fitch (1971) algorithm on a user-supplied Newick phylogenetic tree to infer ancestral alleles. By leveraging the tree topology, it resolves many positions that the two-tier voting method marks as "unresolved".
Newick tree parser. Built-in recursive-descent parser for Newick-format trees (ancify.parsimony). Supports branch lengths, quoted labels, and multifurcations.
New config fields: method ("voting"/"parsimony") and tree (an inline Newick string or a path to a .nwk file). Config validation checks that tree leaf names match outgroup species names when parsimony is selected.
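A recursive-descent Newick parser fits in a few dozen lines. This sketch handles nested clades, multifurcations, optional internal labels, and branch lengths, but not quoted labels or comments. Node and parse_newick are hypothetical names, not the ancify.parsimony API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str = ""
    length: Optional[float] = None
    children: List["Node"] = field(default_factory=list)

    def leaves(self):
        """Leaf names in left-to-right order."""
        if not self.children:
            return [self.name]
        return [name for child in self.children for name in child.leaves()]


def parse_newick(text):
    """Parse an unquoted Newick string such as '((A:0.1,B:0.2):0.05,C:0.3);'."""
    s = "".join(text.split())  # whitespace is dropped, so labels cannot contain spaces
    pos = 0

    def peek():
        return s[pos] if pos < len(s) else ""

    def read_until(stops):
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in stops:
            pos += 1
        return s[start:pos]

    def parse_node():
        nonlocal pos
        node = Node()
        if peek() == "(":
            pos += 1
            node.children.append(parse_node())
            while peek() == ",":          # any number of children: multifurcations
                pos += 1
                node.children.append(parse_node())
            if peek() != ")":
                raise ValueError(f"expected ')' at offset {pos}")
            pos += 1
        node.name = read_until(",():;")   # optional label (leaf or internal)
        if peek() == ":":
            pos += 1
            node.length = float(read_until(",();"))
        return node

    tree = parse_node()
    if peek() != ";":
        raise ValueError("Newick string must end with ';'")
    return tree
```

After parsing, set(tree.leaves()) can be compared against the configured outgroup names. This is the kind of check the config validation performs.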
New call_ancestral_base_parsimony() function in ancify.ancestral for programmatic per-position Fitch calls.
Comprehensive test suite for the parsimony module: Newick parsing, Fitch bottom-up/top-down passes, the full algorithm with ILS scenarios, missing-data handling, and confidence encoding.
Documentation updates: algorithm page with Fitch walkthrough, configuration reference with parsimony examples, API reference for the new module, and updated README.
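The heart of Fitch's bottom-up pass is an intersection-or-union of candidate base sets at each internal node. The sketch below is a simplification, not ancify's call_ancestral_base_parsimony():

- trees are nested lists of leaf names;
- missing data is treated as the full set {A, C, G, T};
- multifurcations use Hartigan's (1973) generalization (keep the bases shared by the most children);
- the top-down pass and confidence encoding are omitted.

```python
BASES = frozenset("ACGT")


def fitch_bottom_up(node, leaves):
    """Return (candidate_set, min_changes) for a subtree.

    node: a leaf name (str) or a list of child nodes.
    leaves: leaf name -> base; a missing or non-ACGT entry means 'any base'.
    """
    if isinstance(node, str):
        base = leaves.get(node, "N").upper()
        return (frozenset(base) if base in BASES else BASES), 0
    results = [fitch_bottom_up(child, leaves) for child in node]
    cost = sum(c for _, c in results)
    # How many children's candidate sets contain each base.
    counts = {b: sum(b in s for s, _ in results) for b in BASES}
    best = max(counts.values())
    # Binary node: intersection if non-empty (best == 2), else union with one change.
    # Multifurcating node: keep the bases shared by the most children.
    states = frozenset(b for b, n in counts.items() if n == best)
    return states, cost + len(results) - best
```

For example, consider the tree [["o1", "o2"], ["o3", "o4"]] with bases A, G, A, A. The cherry (o1, o2) yields {A, G} at a cost of one change, and the root then resolves to A with one change in total.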
1.1.0 (2026)
GPU acceleration and a vectorized compute backend for much faster Phase 1 and Phase 2 runs.
Performance
GPU-accelerated ancestral calling (Phase 2). Ancestral state inference runs as a small number of tensor operations on the GPU instead of per-position Python loops. On an NVIDIA A100, the full human genome completes in under 2 minutes (vs. hours on the original scalar path).
Vectorized coordinate projection (Phase 1). Net AXT projection uses NumPy vectorized scatter (CPU) or PyTorch scatter on CUDA. The per-character Python loop is removed, giving roughly 20–50× speedup on CPU.
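The scatter idea can be sketched with NumPy for a single alignment block. A cumulative sum over non-gap target columns gives each column's focal coordinate, and one fancy-indexed assignment writes the aligned outgroup bases into place. project_block, the 0-based start, and the choice to write N where the outgroup has a gap are assumptions. Strand handling and the chain/net bookkeeping of real AXT files are left out.

```python
import numpy as np


def project_block(out, target_start, target_aln, query_aln):
    """Scatter one aligned block's query bases onto focal (target) coordinates.

    out: uint8 array for the focal chromosome, pre-filled with ord('N').
    target_start: 0-based focal position of the block's first target base.
    target_aln / query_aln: equal-length aligned strings using '-' for gaps.
    """
    t = np.frombuffer(target_aln.encode("ascii"), dtype=np.uint8)
    q = np.frombuffer(query_aln.encode("ascii"), dtype=np.uint8)
    q = np.where(q == ord("-"), ord("N"), q).astype(np.uint8)  # outgroup gap -> N
    keep = t != ord("-")                             # target gaps have no focal coordinate
    positions = target_start + np.cumsum(keep) - 1   # k-th non-gap column -> start + k
    out[positions[keep]] = q[keep]                   # one vectorized scatter, no Python loop
    return out
```

With out = np.full(10, ord("N"), dtype=np.uint8), calling project_block(out, 2, "AC-GT", "ACTG-") leaves out reading NNACGNNNNN. The target-gap column (T in the query) is dropped, and the query gap in the last column becomes N.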
Multi-GPU support. When using the GPU backend, chromosomes are distributed round-robin across available NVIDIA GPUs. Use the gpu_devices config field to restrict which devices are used.
Faster gzip decompression. The optional isal (Intel ISA-L) dependency provides 2–5× faster gzip decompression for large AXT files. Install with pip install 'ancify[fast]'.
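This release uses two small patterns, sketched here with hypothetical helper names. The first assigns chromosomes to GPU IDs round-robin. The second prefers the optional isal package's igzip module, a drop-in replacement for the standard gzip module, when it is installed. How ancify's open_gz() and scheduler are written internally is not shown here.

```python
import gzip

try:
    from isal import igzip as _gz   # optional ISA-L-backed replacement for gzip
except ImportError:
    _gz = gzip                      # fall back to the standard library


def open_gz_text(path):
    """Open a .gz file for text reading, using ISA-L when it is available."""
    return _gz.open(path, "rt")


def assign_round_robin(chromosomes, devices):
    """Map each chromosome to a device ID in round-robin order."""
    if not devices:
        raise ValueError("at least one device is required")
    return {chrom: devices[i % len(devices)] for i, chrom in enumerate(chromosomes)}
```

With devices [0, 1] and chromosomes chr1, chr2, and chr3, chr1 and chr3 go to GPU 0 and chr2 goes to GPU 1.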
New module and config
ancify.backend module. Central abstraction for CPU/GPU execution: detect_backend(), get_available_gpus(), open_gz(), and vectorized implementations of majority vote, ancestral calling, and block scatter for projection.
New config fields: backend ("auto"/"cpu"/"gpu") and gpu_devices (optional list of GPU IDs, e.g. [0, 1, 2]). With backend: auto, ancify uses the GPU when PyTorch and CUDA are available and otherwise falls back to the vectorized CPU path.
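The vectorized majority vote can be illustrated with NumPy: count each base across outgroups with one comparison per base, then take a column-wise argmax. Three details are assumptions for this sketch:

- the integer encoding (0-3 for A, C, G, T; 4 for missing);
- the single min_freq cut-off;
- treating ties as no call.

ancify's tie-breaking and its separate min_inner_freq/min_outer_freq handling are not reproduced here.

```python
import numpy as np

NO_CALL = 4  # code for N / missing; 0..3 encode A, C, G, T


def majority_vote(calls, min_freq=0.5):
    """Column-wise majority vote over an (n_species, n_positions) integer array.

    Returns codes 0..3, or NO_CALL where there is no data, the top base is tied,
    or the top base's share of non-missing species is below min_freq.
    """
    counts = np.stack([(calls == b).sum(axis=0) for b in range(4)])  # shape (4, n_positions)
    total = counts.sum(axis=0)                    # non-missing species per position
    best = counts.argmax(axis=0)
    best_count = counts.max(axis=0)
    tie = (counts == best_count).sum(axis=0) > 1  # more than one base shares the top count
    freq = best_count / np.maximum(total, 1)      # avoid division by zero at empty columns
    ok = (total > 0) & ~tie & (freq >= min_freq)
    return np.where(ok, best, NO_CALL)
```

Every step is a whole-array operation with no per-position Python loop. PyTorch tensors support the same operations (comparison, sum, argmax, where), which is why code in this style ports naturally to the GPU.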
Correctness and docs
Bit-identical output. Vectorized and GPU code paths produce the same results as the original scalar implementation. Tie-breaking, min_inner_freq/min_outer_freq behaviour, and case-encoded confidence are unchanged.
New documentation page: GPU Acceleration & Vectorization, covering GPU setup, supported hardware, an architecture overview, and tuning tips.
1.0.0 (2026)
Initial release.
Config-driven YAML pipeline for any focal species.
Three-phase workflow: project, call, evaluate.
Two-tier inner/outer outgroup voting with case-encoded confidence.
Support for arbitrary numbers of inner and outer outgroup species.
Parallel execution via ProcessPoolExecutor.
Optional evaluation against reference ancestral sequences and VCF data.
CLI with subcommands: init, project, call, evaluate, run.
Python API for programmatic use.
108 unit and integration tests.
Example configs for human, mouse, Drosophila, and Brassica rapa.
Comprehensive documentation with population genetics background, tutorials, and algorithm deep dives.
Installable with pip or uv.
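To illustrate how two voting tiers can feed a case-encoded call, here is a scalar sketch under an explicitly hypothetical rule:

- uppercase when the inner and outer consensus agree;
- lowercase when only the inner tier has a consensus;
- N otherwise.

The rule, the tie handling, and the 0.5 defaults for min_inner_freq and min_outer_freq are assumptions. The algorithm deep dive in the documentation describes ancify's actual behaviour.

```python
from collections import Counter


def consensus(bases, min_freq):
    """Most common A/C/G/T base if it is unique and reaches min_freq, else None."""
    called = [b.upper() for b in bases if b.upper() in {"A", "C", "G", "T"}]
    if not called:
        return None
    ranked = Counter(called).most_common()
    base, count = ranked[0]
    if len(ranked) > 1 and ranked[1][1] == count:
        return None                          # tie for the top base: no consensus (assumed)
    return base if count / len(called) >= min_freq else None


def two_tier_call(inner, outer, min_inner_freq=0.5, min_outer_freq=0.5):
    """Hypothetical two-tier rule with case-encoded confidence."""
    a = consensus(inner, min_inner_freq)
    b = consensus(outer, min_outer_freq)
    if a is not None and a == b:
        return a                             # both tiers agree: high confidence
    if a is not None and b is None:
        return a.lower()                     # inner tier only: low confidence
    return "N"                               # no inner consensus, or the tiers disagree
```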