Learn

  • Population Genetics Background
    • The big picture
    • Alleles, mutations, and time
    • The site frequency spectrum (SFS)
    • How do we determine the ancestral allele?
      • The parsimony argument
      • Why multiple outgroups?
      • The two-tier approach
    • What about non-model organisms?
    • The confidence encoding
    • Where does ancify fit in a typical workflow?
    • Ready to start?
  • Quickstart
    • Step 1: Install
      • Try the example scripts (optional)
    • Step 2: Generate a config template
    • Step 3: Edit the config
      • Choosing an inference method
    • Step 4: Run the pipeline
    • Step 5: Understand the output
    • Step 6: Use the output
      • Look up a single position
      • Polarize variants from a VCF
      • Filter by confidence
    • What just happened?
    • Next steps
  • Tutorials
    • Bundled example scripts
    • Tutorial 1: Polarizing the Human Genome
      • Background
      • Step 1: Download the data
      • Step 2: Create the config
      • Step 3: Run the pipeline
      • Step 4: Inspect the results
      • What do these numbers mean?
      • Step 5: Add evaluation (optional)
    • Tutorial 2: Your First Non-Human Species
      • Think phylogenetically
      • Get the data
      • Create the config
      • Run and inspect
    • Tutorial 3: Interpreting Disagreements
      • Case 1: High confidence (A)
      • Case 2: Low confidence, inner only (a)
      • Case 3: Low confidence, outer only (t)
      • Case 4: Disagreement (n)
      • Case 5: Missing (N)
    • Tutorial 4: Getting Ancestral FASTA Files
      • Locate the output
      • Quick sanity check
      • Read a single position
      • Load all chromosomes into a dictionary
      • Confidence encoding recap
    • Tutorial 5: Polarizing VCF Variants
      • Prerequisites
      • Concept
      • Minimal example
      • Complete script: write an annotated VCF
      • Using scikit-allel instead of cyvcf2
      • Computing the unfolded site frequency spectrum
      • Tips
    • Next steps

User Guide

  • Installation
    • Quick install
    • Install options
      • With evaluation dependencies
      • With development dependencies
      • With ML dependencies
      • With GPU acceleration
      • Everything at once
      • Using uv (faster alternative to pip)
    • Requirements
    • Verify the installation
    • Platform notes
      • Memory requirements
    • Troubleshooting
      • ModuleNotFoundError: No module named 'yaml'
      • ModuleNotFoundError: No module named 'allel'
      • command not found: ancify
      • ModuleNotFoundError: No module named 'lightgbm'
      • SyntaxError on older Python
  • Configuration Reference
    • Generate a starter config
    • Complete annotated config
    • Field reference
      • Required fields
      • Optional fields
      • Evaluation fields
    • Understanding key parameters
      • min_inner_freq: the stringency dial
      • num_cpus: parallelism
      • method: choosing your inference approach
        • Parsimony YAML
        • Likelihood YAML
        • ML YAML (two-step workflow)
      • backend: the compute engine
      • Choosing chromosomes
    • Pattern placeholders
    • The chromosome lengths file
      • How to create one
    • Validation
    • Config recipes
      • Minimal (2 species)
      • Maximal (many species, strict settings, evaluation)
      • Fitch parsimony
      • Likelihood
      • ML classifier
      • Quick test run (single chromosome)
  • GPU Acceleration & Vectorization
    • At a glance
    • How it works
      • Phase 1: Vectorized coordinate projection
      • Phase 2: GPU-accelerated ancestral calling
    • Multi-GPU support
      • Memory budget per GPU
    • Configuration
      • backend
      • gpu_devices
    • Installation
      • Core (CPU vectorization)
      • GPU acceleration
      • Faster gzip decompression (optional)
      • All performance extras at once
    • Verifying backend detection
    • Correctness guarantees
    • Performance tuning tips
      • Use isal for Phase 1
      • Match num_cpus to your setup
      • Storage matters for Phase 1
      • When CPU-only is fast enough
    • Supported hardware
    • Architecture overview
    • Comparison with the default path
  • Command-Line Interface
    • Synopsis
    • Global options
    • Subcommands
      • ancify init — Generate a config template
      • ancify project — Phase 1: Coordinate projection
      • ancify call — Phase 2: Ancestral state inference
      • ancify evaluate — Phase 3: Evaluation
      • ancify run — All phases end-to-end
      • ancify train — Train an ML model
    • Workflow patterns
      • Standard full run
      • Iterate on Phase 2 settings
      • Debug a failing run
      • Test with a single chromosome first
      • Process species independently (Phase 1)
    • Examples
  • Adapting to Other Species
    • A framework for choosing outgroups
      • Step 1: Draw the phylogeny
      • Step 2: Assign tiers
      • Step 3: Check data availability
      • Decision flowchart
    • Worked examples
      • Human (hg38) — the gold standard
      • Mouse (mm39) — minimal setup
      • Drosophila melanogaster (dm6) — non-chr naming
      • Brassica rapa (plant) — beyond animals
      • Zebrafish (danRer11) — fish
    • Getting the input data
      • Net AXT alignments from UCSC
      • Chromosome lengths
    • Tips for outgroup selection
      • More inner species is better
      • The outer outgroup must be clearly outside the inner clade
      • Stringency vs. coverage tradeoff
      • Sex chromosomes and other special cases
    • Species catalogue
      • Primates
      • Rodents & Lagomorphs
      • Carnivores
      • Ungulates & Cetaceans
      • Bats
      • Insectivores & other Laurasiatheria
      • Afrotheria & Xenarthra
      • Marsupials & Monotremes
      • Birds
      • Reptiles & Amphibians
      • Fish
      • Insects — Drosophila & relatives
      • Insects — other orders
      • Worms & other invertebrates
      • Plants
      • Fungi
      • Generating your own alignments
        • Prerequisites
        • Step 1: Obtain genome assemblies
        • Step 2: Convert to 2bit format
        • Step 3: Run lastz pairwise alignment
        • Step 4: Chain and net the alignment
        • Step 5: Compress and verify
        • Step 6: Run ancify
        • Quick-reference: generic pipeline
        • Notes on plant genomes

Deep Dives

  • Algorithm
    • Overview
    • Phase 1: Coordinate Projection
      • Input: Net AXT alignments
      • How projection works
      • Worked example
    • Phase 2: Ancestral State Inference
      • Step 1: Majority vote
      • Step 2: Compare inner and outer
      • The complete algorithm as pseudocode
      • Worked example: full pipeline for one position
    • Confidence encoding
    • Biological rationale
      • Why two tiers instead of one big vote?
      • Incomplete lineage sorting (ILS)
      • When the algorithm can still fail
    • The min_inner_freq parameter in depth
    • Alignment quality and its effects
    • Alternative method: Fitch parsimony
      • When to prefer parsimony
      • The Fitch algorithm
      • Handling missing data
      • Confidence encoding (parsimony)
      • Comparison: voting vs. parsimony
      • Configuration
    • Alternative method: Likelihood (Felsenstein pruning)
      • When to prefer likelihood
      • Substitution models
      • The Felsenstein pruning algorithm
      • Worked example
      • Comparison: voting vs. parsimony vs. likelihood vs. ML
      • Configuration
    • Alternative method: Machine learning classifier
      • When to prefer ML
      • Feature engineering
      • Training workflow
      • Confidence calibration
      • Comparison: voting vs. parsimony vs. ML
      • Why LightGBM?
    • Summary
      • Two-tier voting (default)
      • Fitch parsimony
      • Likelihood (Felsenstein pruning)
      • ML classifier
  • Evaluation
    • Overview
    • What gets measured
      • Coverage statistics (always computed)
      • Reference comparison (optional)
      • VCF comparison (optional)
    • Output format
    • How to interpret the results
      • Coverage: “How much of my genome has an ancestral call?”
      • Agreement rate: “Do I agree with the gold standard?”
      • VCF comparison: “Does my ancestral allele match known variants?”
    • Validation results: Human hg38 (BCGM)
    • Configuring evaluation
      • Pattern placeholders
    • Running evaluation standalone
    • When you do not need evaluation
  • FAQ & Troubleshooting
    • General Questions
      • What species can I use ancify with?
      • How accurate is ancify?
      • How long does it take?
      • How much memory do I need?
      • Can I run phases independently?
    • Data Questions
      • Where do I get net AXT alignment files?
      • What if UCSC does not have alignments for my species?
      • Can I use MAF files instead of AXT?
      • What chromosome naming convention should I use?
    • Performance & Resource Issues
      • MemoryError during Phase 2
      • Phase 1 is very slow
      • Do I need a GPU?
      • How do I enable GPU acceleration?
      • My GPU runs out of memory
      • Does the GPU backend change the output?
      • Can I resume a failed run?
    • Output Questions
      • What does each character mean in the output FASTA?
      • My output has a lot of Ns. Is something wrong?
      • Can I use the output to polarize a VCF?
      • Is the confidence encoding compatible with Ensembl EPO?
    • Scientific Questions
      • How does this compare to the Ensembl EPO method?
      • When should I NOT use parsimony-based polarization?
      • What min_inner_freq should I use?
    • Still stuck?

Reference

  • API Reference
    • Quick example
    • ancify.utils
      • read_fasta()
      • write_fasta()
      • read_chromosome_lengths()
      • majority_vote()
      • chrom_id()
    • ancify.config
      • OutgroupSpec
        • OutgroupSpec.name
        • OutgroupSpec.alignment
      • EvaluationConfig
        • EvaluationConfig.reference_dir
        • EvaluationConfig.reference_pattern
        • EvaluationConfig.vcf_dir
        • EvaluationConfig.vcf_pattern
      • PipelineConfig
        • PipelineConfig.focal_species
        • PipelineConfig.chromosome_lengths
        • PipelineConfig.outgroups_inner
        • PipelineConfig.outgroups_outer
        • PipelineConfig.chromosomes
        • PipelineConfig.work_dir
        • PipelineConfig.output_dir
        • PipelineConfig.min_inner_freq
        • PipelineConfig.min_outer_freq
        • PipelineConfig.num_cpus
        • PipelineConfig.backend
        • PipelineConfig.gpu_devices
        • PipelineConfig.method
        • PipelineConfig.tree
        • PipelineConfig.ml_model_path
        • PipelineConfig.ml_training_reference
        • PipelineConfig.ml_high_threshold
        • PipelineConfig.ml_low_threshold
        • PipelineConfig.substitution_model
        • PipelineConfig.model_kappa
        • PipelineConfig.model_base_freqs
        • PipelineConfig.model_rates
        • PipelineConfig.likelihood_high_threshold
        • PipelineConfig.likelihood_low_threshold
        • PipelineConfig.evaluation
        • PipelineConfig.resolve_chromosomes()
        • PipelineConfig.all_outgroups
      • load_config()
      • validate_config()
    • ancify.project
      • project_alignment()
      • run_projection()
    • ancify.ancestral
      • call_ancestral_base()
      • call_ancestral_base_parsimony()
      • run_ancestral_calling()
    • ancify.parsimony
      • TreeNode
        • TreeNode.name
        • TreeNode.children
        • TreeNode.branch_length
        • TreeNode.is_leaf
        • TreeNode.leaf_names()
      • get_leaf_names()
      • parse_newick()
      • fitch_bottom_up()
      • fitch_top_down()
      • fitch_ancestral()
    • ancify.evaluate
      • compute_coverage_stats()
      • compare_to_reference()
      • compare_to_vcf()
      • run_evaluation()
    • ancify.cli
      • cmd_init()
      • cmd_project()
      • cmd_call()
      • cmd_evaluate()
      • cmd_train()
      • cmd_run()
      • main()
  • Glossary
  • Changelog
    • 1.5.0 (2026)
      • Documentation
    • 1.4.0 (2026)
      • Likelihood method (Felsenstein pruning)
      • Documentation and README
      • Tests
      • Version and CLI
    • 1.3.0 (2026)
    • 1.2.0 (2026)
    • 1.1.0 (2026)
      • Performance
      • New module and config
      • Correctness and docs
    • 1.0.0 (2026)

© Copyright 2025, ancify contributors.