Adapting to Other Species
ancify works with any focal species for which you have pairwise net AXT alignments. This guide teaches you how to think about outgroup selection and walks through examples from across the tree of life.
A framework for choosing outgroups
Before configuring ancify, you need to make one key decision: which species are your inner outgroups and which are your outer outgroups?
Step 1: Draw the phylogeny
Sketch (or look up) the phylogenetic relationships around your focal species. You need at least three species total: your focal, one inner outgroup, and one outer outgroup.
Example for human:
          6 Mya     9 Mya     25 Mya
                ┌─── Bonobo
          ┌─────┤                    ┌ INNER tier
          │     └─── Chimp           │ (closely related)
   ┌──────┤                          │
   │      └──────── Gorilla         ─┘
───┤
   │
   └──────────────────── Macaque ── OUTER tier
                                    (distantly related)
Step 2: Assign tiers
Inner outgroups should be:
Closely related to the focal species (same genus or family)
Diverged more recently than the outer outgroup
Ideally 2 or more species (enables majority voting)
Outer outgroups should be:
Clearly outside the inner clade
Diverged at least 2-3x further than the inner outgroups
Far enough that convergent mutations with the inner group are extremely rare
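The 2-3x guideline can be checked with simple arithmetic. The sketch below is only an illustration: the function names are hypothetical (not part of ancify), the divergence times are the approximate values quoted in this guide, and comparing the outer outgroup against the deepest inner outgroup is an assumption chosen to be conservative.

```python
def tier_ratio(inner_mya, outer_mya):
    """Outer divergence divided by the deepest inner divergence (both in Mya)."""
    return outer_mya / max(inner_mya)


def check_tiers(inner_mya, outer_mya, min_ratio=2.0):
    """Return (ratio, passes), where passes means ratio >= min_ratio."""
    ratio = tier_ratio(inner_mya, outer_mya)
    return ratio, ratio >= min_ratio


# Approximate divergence times (Mya) taken from this guide.
examples = {
    "human": ([6, 6, 9], 25),           # bonobo, chimp, gorilla vs. macaque
    "mouse": ([12], 90),                # rat vs. rabbit
    "d_melanogaster": ([2.5, 2.5], 6),  # simulans, sechellia vs. yakuba
}

for name, (inner, outer) in examples.items():
    ratio, ok = check_tiers(inner, outer)
    verdict = "ok" if ok else "too close"
    print(f"{name}: ratio {ratio:.1f}x ({verdict})")
```

A configuration that fails the check can still be run, but the outer tier then adds less independent information.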
Step 3: Check data availability
For each candidate outgroup, check if UCSC has a pairwise net AXT alignment to your focal assembly:
https://hgdownload.soe.ucsc.edu/goldenPath/<focal_assembly>/
Look for directories named vs<Outgroup>. If the alignment exists, you are in business.
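If you are checking many candidates, the expected download address can be built programmatically. This is a minimal sketch, assuming the naming pattern seen in the human vs. chimp download used in this guide (directory vs<Outgroup> with the first letter capitalised, file <focal>.<outgroup>.net.axt.gz). Both function names are hypothetical, and not every UCSC assembly is guaranteed to follow this pattern.

```python
import urllib.error
import urllib.request

UCSC_ROOT = "https://hgdownload.soe.ucsc.edu/goldenPath"


def ucsc_net_axt_url(focal, outgroup):
    """Build the expected URL, e.g. focal='hg38', outgroup='panTro6'."""
    # Assumed convention: 'panTro6' lives in a directory named 'vsPanTro6'.
    vs_dir = "vs" + outgroup[0].upper() + outgroup[1:]
    return f"{UCSC_ROOT}/{focal}/{vs_dir}/{focal}.{outgroup}.net.axt.gz"


def url_exists(url, timeout=10):
    """Return True if a HEAD request succeeds (requires network access)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except (urllib.error.URLError, TimeoutError):
        return False


print(ucsc_net_axt_url("hg38", "panTro6"))
```

Calling url_exists on the result is optional; browsing the directory listing by hand remains the most reliable check.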
Decision flowchart
Do you have ≥2 inner outgroups with AXT alignments?
│
├─ YES → Great! Majority voting will be robust.
│
└─ NO → Do you have 1 inner + 1 outer?
        │
        ├─ YES → Still works. The outer outgroup provides
        │        the independent check. Consider adding more
        │        inner species if available.
        │
        └─ NO → You need at least 1 inner + 1 outer.
                Check UCSC or generate your own alignments.
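The flowchart can be expressed as a small check on a configuration. The sketch below is illustrative only: tier_status is a hypothetical helper, not an ancify function, and it assumes the config has already been loaded into a dictionary with an outgroups mapping containing inner and outer lists. It also assumes that the "robust" branch still needs at least one outer outgroup, in line with the three-species minimum stated earlier.

```python
def tier_status(config):
    """Classify a config dict as 'robust', 'minimal', or 'insufficient'."""
    outgroups = config.get("outgroups") or {}
    n_inner = len(outgroups.get("inner") or [])
    n_outer = len(outgroups.get("outer") or [])

    if n_inner >= 2 and n_outer >= 1:
        return "robust"        # majority voting across inner species
    if n_inner == 1 and n_outer >= 1:
        return "minimal"       # works, outer gives the independent check
    return "insufficient"      # need at least 1 inner + 1 outer


two_inner = {"outgroups": {"inner": [{"name": "a"}, {"name": "b"}],
                           "outer": [{"name": "c"}]}}
one_inner = {"outgroups": {"inner": [{"name": "a"}],
                           "outer": [{"name": "c"}]}}

print(tier_status(two_inner), tier_status(one_inner))
```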
Worked examples
Tip
Quick test: The repo includes one-chromosome example scripts for human (chr22), mouse (chr19), Drosophila (chr4), and Brassica rapa (A01). From the repo root run e.g. ./scripts/examples/human/run.sh. See Quickstart and scripts/README.md.
Human (hg38) — the gold standard
focal_species: human
chromosome_lengths: chromoLens.txt
outgroups:
  inner:
    - name: bonobo    # ~6 Mya
      alignment: hg38.panPan3.net.axt.gz
    - name: chimp     # ~6 Mya
      alignment: hg38.panTro6.net.axt.gz
    - name: gorilla   # ~9 Mya
      alignment: hg38.gorGor6.net.axt.gz
  outer:
    - name: macaque   # ~25 Mya
      alignment: hg38.rheMac10.net.axt.gz
output_dir: ./human_ancestral
num_cpus: 24
Why this works well: Three inner outgroups provide redundancy. The inner-outer divergence ratio (~6-9 Mya vs. ~25 Mya) is large enough that convergent errors between tiers are negligible. Alignment coverage is excellent for all four species.
Mouse (mm39) — minimal setup
focal_species: mouse
chromosome_lengths: mm39.chromLens.txt
outgroups:
  inner:
    - name: rat      # ~12 Mya
      alignment: mm39.rn7.net.axt.gz
  outer:
    - name: rabbit   # ~90 Mya
      alignment: mm39.oryCun2.net.axt.gz
output_dir: ./mouse_ancestral
num_cpus: 8
With only one inner species, the majority vote is trivially that species’ allele. The outer outgroup still provides the independent confirmation. To strengthen the inner tier, consider adding hamster or other rodents if alignments are available.
Drosophila melanogaster (dm6) — non-chr naming
focal_species: drosophila_melanogaster
chromosome_lengths: dm6.chromLens.txt
chromosomes: [2L, 2R, 3L, 3R, 4, X]
outgroups:
  inner:
    - name: simulans    # ~2.5 Mya
      alignment: dm6.droSim2.net.axt.gz
    - name: sechellia   # ~2.5 Mya
      alignment: dm6.droSec1.net.axt.gz
  outer:
    - name: yakuba      # ~6 Mya
      alignment: dm6.droYak3.net.axt.gz
output_dir: ./dmel_ancestral
num_cpus: 6
Note: The chromosome names (2L, 3R, etc.) do not have a chr prefix — ancify handles any naming convention. The explicit chromosomes list excludes heterochromatic scaffolds.
Brassica rapa (plant) — beyond animals
focal_species: brassica_rapa
chromosome_lengths: braRap1.chromLens.txt
outgroups:
  inner:
    - name: brassica_oleracea      # close relative, same genus
      alignment: braRap1.braOleracea.net.axt.gz
  outer:
    - name: arabidopsis_thaliana   # ~20 Mya, same family Brassicaceae
      alignment: braRap1.araTha1.net.axt.gz
output_dir: ./brassica_rapa_ancestral
num_cpus: 4
Plant genomes often use chromosome naming like A01, A02. Omit the chromosomes key to process all entries in the lengths file. Plant genome alignments may be more fragmented due to whole-genome duplications — expect higher N rates than in mammals.
Zebrafish (danRer11) — fish
focal_species: zebrafish
chromosome_lengths: danRer11.chromLens.txt
outgroups:
  inner:
    - name: medaka
      alignment: danRer11.oryLat2.net.axt.gz
  outer:
    - name: fugu
      alignment: danRer11.fr3.net.axt.gz
output_dir: ./zebrafish_ancestral
num_cpus: 4
Fish have high substitution rates compared to mammals, so the divergence between inner and outer outgroups needs to be carefully considered. Medaka and fugu provide a reasonable tier separation for zebrafish.
Getting the input data
Net AXT alignments from UCSC
Download from:
https://hgdownload.soe.ucsc.edu/goldenPath/<focal_assembly>/vs<Outgroup>/
Example for human vs. chimp:
wget https://hgdownload.soe.ucsc.edu/goldenPath/hg38/vsPanTro6/hg38.panTro6.net.axt.gz
Chromosome lengths
From a FASTA index:
samtools faidx reference.fa
cut -f1,2 reference.fa.fai > chromoLens.txt
From UCSC MySQL:
# -N (--skip-column-names) keeps the "chrom size" header row out of the file
mysql --user=genome --host=genome-mysql.soe.ucsc.edu -A -N \
    -e "SELECT chrom, size FROM chromInfo" hg38 > chromoLens.txt
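Whichever route you use, it is worth confirming that the result really is a two-column, tab-separated file of names and lengths before handing it to ancify. The following sketch shows one way to do that; read_chrom_lengths is a hypothetical helper rather than part of ancify, and the strict two-column check is an assumption about what a clean lengths file looks like.

```python
import io


def read_chrom_lengths(handle, keep=None):
    """Parse 'name<TAB>length' lines into a dict, optionally keeping a subset."""
    lengths = {}
    for line_no, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # tolerate blank lines
        fields = line.split("\t")
        if len(fields) != 2 or not fields[1].isdigit():
            raise ValueError(
                f"line {line_no}: expected 'name<TAB>length', got {line!r}"
            )
        name, size = fields[0], int(fields[1])
        if keep is None or name in keep:
            lengths[name] = size
    return lengths


# In-memory stand-in for a lengths file (hg38 sizes for chr21 and chr22).
demo = io.StringIO("chr21\t46709983\nchr22\t50818468\n")
print(read_chrom_lengths(demo, keep={"chr22"}))
```

A stray header row such as "chrom size" fails the check with a ValueError, which is the most common problem when exporting from a database client.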
Tips for outgroup selection
More inner species is better
Even 2 inner species is a major improvement over 1. With 1 inner species, a single alignment error or lineage-specific substitution produces a wrong call. With 2+, the majority vote provides robustness.
1 inner species: accuracy ≈ alignment quality
2 inner species: accuracy ≈ max(alignment quality)
3+ inner species: accuracy ≈ consensus of multiple independent signals
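To make the voting idea concrete, here is a toy per-site model. It is not ancify's implementation: the function name, the min_agree parameter, and the handling of ties and outer disagreement are all assumptions made for illustration. It only mirrors the conventions described in this guide (majority vote among inner outgroups, uppercase when the outer outgroup confirms, lowercase for low confidence, N for missing).

```python
from collections import Counter

BASES = {"A", "C", "G", "T"}


def polarize(inner_alleles, outer_allele, min_agree=1):
    """Toy ancestral call for one site from inner and outer outgroup alleles."""
    observed = [a.upper() for a in inner_alleles if a and a.upper() in BASES]
    if not observed:
        return "N"  # no inner outgroup is aligned at this site

    ranked = Counter(observed).most_common()
    top, n_top = ranked[0]
    if len(ranked) > 1 and ranked[1][1] == n_top:
        return "N"  # assumption: an unresolved tie is treated as missing
    if n_top < min_agree:
        return "N"  # too few inner species support the call

    if outer_allele and outer_allele.upper() == top:
        return top          # outer confirms: high confidence
    return top.lower()      # outer missing or different: low confidence


print(polarize(["A", "A", "G"], "A"))  # three inner species, outer agrees
print(polarize(["A"], None))           # single inner species, no outer data
```

With a single inner species the vote is decided by that one allele, so any error in that alignment passes straight through, which is the weakness described above.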
The outer outgroup must be clearly outside the inner clade
If the outer outgroup is too closely related to the inner species, ILS can affect both tiers together, producing false high-confidence calls:
BAD: outer is too close to inner
     ┌── Focal
  ┌──┤
  │  └── Inner 1          ILS can affect all three
──┤
  └──── Outer (barely     species → false agreement
        more distant)
GOOD: outer is clearly distant
     ┌── Focal
  ┌──┤
  │  └── Inner 1
──┤
  │
  │
  └────────── Outer (deep     ILS between tiers is
              divergence)     negligible
Stringency vs. coverage tradeoff
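Increasing min_inner_freq requires more species to agree, which trades coverage for accuracy:
| Setting | Coverage | Accuracy |
|---|---|---|
| Lowest min_inner_freq | Highest | Lower (but still >99%) |
| Intermediate min_inner_freq | Moderate | High |
| Highest min_inner_freq | Lowest | Highest |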
For most demographic analyses, min_inner_freq=1 is fine — the SFS shape is robust to rare misassignments.
Sex chromosomes and other special cases
chrY has very poor alignment coverage due to massive repetitive content. Expect >80% N (missing).
chrX may have lower coverage than autosomes, especially in regions of reduced synteny.
Mitochondrial genomes are not handled by standard UCSC pairwise alignments. For mtDNA polarization, consider using a multiple sequence alignment approach instead.
Species catalogue
The tables below list 115 commonly studied species with suggested outgroup configurations for ancestral allele polarization. Each entry shows:
Assembly — the UCSC genome browser identifier (where one exists).
Suggested inner outgroups — closely related species for the inner tier.
Suggested outer outgroup — a more distant species for independent confirmation.
Divergence — approximate divergence times (inner / outer) in millions of years ago (Mya).
Tip
These are starting-point suggestions. Always verify that pairwise net AXT alignments exist for your chosen focal assembly at https://hgdownload.soe.ucsc.edu/goldenPath/<assembly>/. If a pre-computed alignment is not available, you can generate your own with lastz and the UCSC axtChain/chainNet pipeline.
Important
Disclaimer — verify divergence times independently. The divergence times and outgroup configurations in this catalogue are approximate, drawn from molecular clock estimates in the published literature. They are provided as a convenience and may contain errors. Molecular divergence estimates vary substantially between studies depending on calibration fossils, clock models, and gene sets used. Before relying on any entry for your analysis, please cross-check the divergence times against an authoritative source such as TimeTree or the primary literature for your taxon. The authors of ancify make no guarantee of the accuracy of these values and accept no responsibility for downstream results based on incorrect tier assignments.
Primates
| # | Focal species | Common name | Assembly | Suggested inner outgroups | Suggested outer outgroup | Div. (inner / outer) |
|---|---|---|---|---|---|---|
| 1 | Homo sapiens | Human | hg38 | Chimp, bonobo, gorilla | Macaque | 6–9 / 25 Mya |
| 2 | Pan troglodytes | Chimpanzee | panTro6 | Bonobo, human | Macaque | 2–6 / 25 Mya |
| 3 | Pan paniscus | Bonobo | panPan3 | Chimp, human | Macaque | 2–6 / 25 Mya |
| 4 | Gorilla gorilla | Gorilla | gorGor6 | Human, chimp | Macaque | 9 / 25 Mya |
| 5 | Pongo abelii | Orangutan | ponAbe3 | Human, chimp, gorilla | Macaque | 13 / 25 Mya |
| 6 | Nomascus leucogenys | Gibbon | nomLeu3 | Human, macaque | Marmoset | 18–25 / 40 Mya |
| 7 | Macaca mulatta | Rhesus macaque | rheMac10 | Crab-eating macaque, baboon | Human | 3–8 / 25 Mya |
| 8 | Macaca fascicularis | Crab-eating macaque | macFas5 | Rhesus macaque, baboon | Human | 3–8 / 25 Mya |
| 9 | Papio anubis | Baboon | papAnu4 | Rhesus macaque, green monkey | Human | 5–8 / 25 Mya |
| 10 | Chlorocebus sabaeus | Green monkey | chlSab2 | Rhesus macaque, baboon | Human | 8 / 25 Mya |
| 11 | Nasalis larvatus | Proboscis monkey | nasLar1 | Green monkey, rhesus macaque | Human | 10 / 25 Mya |
| 12 | Callithrix jacchus | Marmoset | calJac4 | Squirrel monkey | Macaque | 15 / 40 Mya |
| 13 | Saimiri boliviensis | Squirrel monkey | saiBol1 | Marmoset | Macaque | 15 / 40 Mya |
| 14 | Carlito syrichta | Tarsier | tarSyr2 | Human, macaque | Mouse lemur, bushbaby | 58 / 64 Mya |
| 15 | Microcebus murinus | Mouse lemur | micMur2 | Bushbaby | Marmoset | 58 / 70 Mya |
| 16 | Otolemur garnettii | Bushbaby | otoGar3 | Mouse lemur | Marmoset | 58 / 70 Mya |
Rodents & Lagomorphs
| # | Focal species | Common name | Assembly | Suggested inner outgroups | Suggested outer outgroup | Div. (inner / outer) |
|---|---|---|---|---|---|---|
| 17 | Mus musculus | Mouse | mm39 | Rat | Rabbit | 12 / 90 Mya |
| 18 | Rattus norvegicus | Rat | rn7 | Mouse | Rabbit | 12 / 90 Mya |
| 19 | Cricetulus griseus | Chinese hamster | criGriChoV2 | Mouse, rat | Rabbit | 20 / 90 Mya |
| 20 | Cavia porcellus | Guinea pig | cavPor3 | Chinchilla | Mouse | 35 / 70 Mya |
| 21 | Chinchilla lanigera | Chinchilla | chiLan1 | Guinea pig | Mouse | 35 / 70 Mya |
| 22 | Heterocephalus glaber | Naked mole-rat | hetGla2 | Guinea pig | Mouse | 35 / 70 Mya |
| 23 | Ictidomys tridecemlineatus | Squirrel | speTri2 | Mouse, rat | Rabbit | 50 / 90 Mya |
| 24 | Dipodomys ordii | Kangaroo rat | dipOrd1 | Mouse, rat | Rabbit | 50 / 90 Mya |
| 25 | Oryctolagus cuniculus | Rabbit | oryCun2 | Pika | Mouse | 30 / 90 Mya |
| 26 | Ochotona princeps | Pika | ochPri3 | Rabbit | Mouse | 30 / 90 Mya |
Carnivores
| # | Focal species | Common name | Assembly | Suggested inner outgroups | Suggested outer outgroup | Div. (inner / outer) |
|---|---|---|---|---|---|---|
| 27 | Canis lupus familiaris | Dog | canFam6 | Ferret, cat | Horse | 50–55 / 80 Mya |
| 28 | Felis catus | Cat | felCat9 | Ferret, dog | Horse | 50–55 / 80 Mya |
| 29 | Mustela putorius furo | Ferret | musFur1 | Dog, cat | Horse | 50–55 / 80 Mya |
| 30 | Ailuropoda melanoleuca | Giant panda | ailMel1 | Dog, ferret | Horse | 50 / 80 Mya |
| 31 | Neomonachus schauinslandi | Hawaiian monk seal | neoSch1 | Dog, ferret | Horse | 50 / 80 Mya |
Ungulates & Cetaceans
| # | Focal species | Common name | Assembly | Suggested inner outgroups | Suggested outer outgroup | Div. (inner / outer) |
|---|---|---|---|---|---|---|
| 32 | Bos taurus | Cow | bosTau9 | Sheep | Pig | 20 / 60 Mya |
| 33 | Ovis aries | Sheep | oviAri4 | Cow | Pig | 20 / 60 Mya |
| 34 | Sus scrofa | Pig | susScr11 | Cow, dolphin | Horse | 60 / 80 Mya |
| 35 | Tursiops truncatus | Dolphin | turTru2 | Cow | Horse | 55 / 80 Mya |
| 36 | Balaenoptera acutorostrata | Minke whale | balAcu1 | Dolphin, cow | Horse | 35–55 / 80 Mya |
| 37 | Equus caballus | Horse | equCab3 | White rhinoceros | Dog | 55 / 80 Mya |
| 38 | Ceratotherium simum | White rhinoceros | cerSim1 | Horse | Dog | 55 / 80 Mya |
| 39 | Vicugna pacos | Alpaca | vicPac2 | Pig, cow | Horse | 55–60 / 80 Mya |
Bats
| # | Focal species | Common name | Assembly | Suggested inner outgroups | Suggested outer outgroup | Div. (inner / outer) |
|---|---|---|---|---|---|---|
| 40 | Myotis lucifugus | Microbat | myoLuc2 | Megabat | Dog | 55 / 80 Mya |
| 41 | Pteropus vampyrus | Megabat | pteVam1 | Microbat | Dog | 55 / 80 Mya |
Insectivores & other Laurasiatheria
| # | Focal species | Common name | Assembly | Suggested inner outgroups | Suggested outer outgroup | Div. (inner / outer) |
|---|---|---|---|---|---|---|
| 42 | Erinaceus europaeus | Hedgehog | eriEur2 | Shrew | Dog | 75 / 90 Mya |
| 43 | Sorex araneus | Shrew | sorAra2 | Hedgehog | Dog | 75 / 90 Mya |
| 44 | Manis pentadactyla | Chinese pangolin | manPen1 | Dog, cat | Horse | 75 / 85 Mya |
Afrotheria & Xenarthra
| # | Focal species | Common name | Assembly | Suggested inner outgroups | Suggested outer outgroup | Div. (inner / outer) |
|---|---|---|---|---|---|---|
| 45 | Loxodonta africana | African elephant | loxAfr3 | Manatee, rock hyrax | Armadillo | 60–65 / 100 Mya |
| 46 | Trichechus manatus | Manatee | triMan1 | Elephant | Armadillo | 60 / 100 Mya |
| 47 | Procavia capensis | Rock hyrax | proCap1 | Elephant, manatee | Armadillo | 60 / 100 Mya |
| 48 | Echinops telfairi | Tenrec | echTel2 | Elephant | Armadillo | 75 / 100 Mya |
| 49 | Dasypus novemcinctus | Armadillo | dasNov3 | Sloth | Elephant | 65 / 100 Mya |
| 50 | Choloepus hoffmanni | Sloth | choHof1 | Armadillo | Elephant | 65 / 100 Mya |
Marsupials & Monotremes
| # | Focal species | Common name | Assembly | Suggested inner outgroups | Suggested outer outgroup | Div. (inner / outer) |
|---|---|---|---|---|---|---|
| 51 | Monodelphis domestica | Opossum | monDom5 | Tasmanian devil, wallaby | Platypus | 69 / 190 Mya |
| 52 | Sarcophilus harrisii | Tasmanian devil | sarHar1 | Wallaby | Opossum | 62 / 69 Mya |
| 53 | Macropus eugenii | Wallaby | macEug2 | Tasmanian devil | Opossum | 62 / 69 Mya |
| 54 | Ornithorhynchus anatinus | Platypus | ornAna2 | Opossum | Chicken | 190 / 320 Mya |
Birds
| # | Focal species | Common name | Assembly | Suggested inner outgroups | Suggested outer outgroup | Div. (inner / outer) |
|---|---|---|---|---|---|---|
| 55 | Gallus gallus | Chicken | galGal6 | Turkey | Zebra finch | 30 / 80 Mya |
| 56 | Meleagris gallopavo | Turkey | melGal5 | Chicken | Zebra finch | 30 / 80 Mya |
| 57 | Anas platyrhynchos | Mallard duck | anaPla1 | Chicken, turkey | Zebra finch | 55–70 / 80 Mya |
| 58 | Taeniopygia guttata | Zebra finch | taeGut2 | Medium ground finch, flycatcher | Chicken | 15–45 / 80 Mya |
| 59 | Geospiza fortis | Medium ground finch | geoFor1 | Zebra finch | Chicken | 15 / 80 Mya |
| 60 | Ficedula albicollis | Collared flycatcher | ficAlb2 | Zebra finch | Chicken | 45 / 80 Mya |
| 61 | Melopsittacus undulatus | Budgerigar | melUnd1 | Falcon, eagle | Chicken | 50–60 / 80 Mya |
| 62 | Falco peregrinus | Peregrine falcon | falPer1 | Budgerigar, eagle | Chicken | 50–55 / 80 Mya |
| 63 | Aquila chrysaetos | Golden eagle | aquChr2 | Falcon, budgerigar | Chicken | 50–55 / 80 Mya |
| 64 | Columba livia | Rock pigeon | colLiv1 | Flycatcher, zebra finch | Chicken | 70 / 80 Mya |
| 65 | Apteryx mantelli | Brown kiwi | aptMan1 | Chicken, turkey | Zebra finch | 60–70 / 80 Mya |
Reptiles & Amphibians
| # | Focal species | Common name | Assembly | Suggested inner outgroups | Suggested outer outgroup | Div. (inner / outer) |
|---|---|---|---|---|---|---|
| 66 | Anolis carolinensis | Green anole | anoCar2 | Garter snake | Painted turtle | 150 / 275 Mya |
| 67 | Chrysemys picta | Painted turtle | chrPic2 | Softshell turtle | Alligator | 70 / 255 Mya |
| 68 | Pelodiscus sinensis | Chinese softshell turtle | pelSin2 | Painted turtle | Alligator | 70 / 255 Mya |
| 69 | Alligator mississippiensis | American alligator | allMis1 | Painted turtle, softshell turtle | Green anole | 255 / 275 Mya |
| 70 | Thamnophis sirtalis | Garter snake | thaSir1 | Green anole | Painted turtle | 150 / 275 Mya |
| 71 | Xenopus tropicalis | Western clawed frog | xenTro10 | X. laevis | Coelacanth | 50 / 370 Mya |
| 72 | Xenopus laevis | African clawed frog | xenLae2 | X. tropicalis | Coelacanth | 50 / 370 Mya |
| 73 | Rana temporaria | Common frog | — | Xenopus spp. | Coelacanth | 200 / 370 Mya |
Fish
| # | Focal species | Common name | Assembly | Suggested inner outgroups | Suggested outer outgroup | Div. (inner / outer) |
|---|---|---|---|---|---|---|
| 74 | Danio rerio | Zebrafish | danRer11 | Medaka, stickleback | Fugu | 110–150 / 180 Mya |
| 75 | Oryzias latipes | Medaka | oryLat2 | Stickleback, Nile tilapia | Zebrafish | 70–100 / 150 Mya |
| 76 | Gasterosteus aculeatus | Stickleback | gasAcu1 | Medaka, Nile tilapia | Zebrafish | 70–100 / 150 Mya |
| 77 | Takifugu rubripes | Fugu | fr3 | Tetraodon | Stickleback | 30 / 100 Mya |
| 78 | Tetraodon nigroviridis | Tetraodon | tetNig2 | Fugu | Stickleback | 30 / 100 Mya |
| 79 | Oreochromis niloticus | Nile tilapia | oreNil3 | Stickleback, medaka | Zebrafish | 70–100 / 150 Mya |
| 80 | Gadus morhua | Atlantic cod | gadMor1 | Stickleback, medaka | Zebrafish | 100–110 / 150 Mya |
| 81 | Salmo salar | Atlantic salmon | — | Rainbow trout | Zebrafish | 25 / 200 Mya |
| 82 | Oncorhynchus mykiss | Rainbow trout | — | Atlantic salmon | Zebrafish | 25 / 200 Mya |
| 83 | Latimeria chalumnae | Coelacanth | latCha1 | Xenopus | Spotted gar | 410 / 440 Mya |
| 84 | Lepisosteus oculatus | Spotted gar | lepOcu1 | Zebrafish | Coelacanth | 340 / 440 Mya |
| 85 | Petromyzon marinus | Sea lamprey | petMar3 | Elephant shark | Lancelet | 500 / 550 Mya |
| 86 | Callorhinchus milii | Elephant shark | calMil1 | Zebrafish | Lamprey | 450 / 500 Mya |
Insects — Drosophila & relatives
| # | Focal species | Common name | Assembly | Suggested inner outgroups | Suggested outer outgroup | Div. (inner / outer) |
|---|---|---|---|---|---|---|
| 87 | D. melanogaster | Fruit fly | dm6 | D. simulans, D. sechellia | D. yakuba | 2.5 / 6 Mya |
| 88 | D. simulans | — | droSim2 | D. sechellia, D. melanogaster | D. yakuba | 0.5–2.5 / 6 Mya |
| 89 | D. sechellia | — | droSec1 | D. simulans, D. melanogaster | D. yakuba | 0.5–2.5 / 6 Mya |
| 90 | D. yakuba | — | droYak3 | D. erecta | D. melanogaster | 6 / 12 Mya |
| 91 | D. erecta | — | droEre2 | D. yakuba | D. melanogaster | 6 / 12 Mya |
| 92 | D. ananassae | — | droAna3 | D. melanogaster | D. pseudoobscura | 25 / 40 Mya |
| 93 | D. pseudoobscura | — | dp4 | D. persimilis | D. melanogaster | 2 / 25 Mya |
| 94 | D. virilis | — | droVir3 | D. mojavensis, D. grimshawi | D. melanogaster | 20 / 40 Mya |
Insects — other orders
| # | Focal species | Common name | Assembly | Suggested inner outgroups | Suggested outer outgroup | Div. (inner / outer) |
|---|---|---|---|---|---|---|
| 95 | Anopheles gambiae | Malaria mosquito | anoGam3 | Aedes aegypti | D. melanogaster | 150 / 250 Mya |
| 96 | Apis mellifera | Honeybee | apiMel4 | Bombus spp. | Nasonia (jewel wasp) | 70 / 150 Mya |
| 97 | Tribolium castaneum | Red flour beetle | triCas2 | D. melanogaster | Honeybee | 250 / 300 Mya |
| 98 | Bombyx mori | Silkworm | — | Manduca sexta (tobacco hornworm) | D. melanogaster | 100 / 250 Mya |
Worms & other invertebrates
| # | Focal species | Common name | Assembly | Suggested inner outgroups | Suggested outer outgroup | Div. (inner / outer) |
|---|---|---|---|---|---|---|
| 99 | C. elegans | Nematode | ce11 | C. briggsae, C. remanei | C. japonica | 80–100 / 100–200 Mya |
| 100 | C. briggsae | — | cb4 | C. remanei, C. elegans | C. japonica | 80–100 / 100–200 Mya |
| 101 | Strongylocentrotus purpuratus | Purple sea urchin | strPur2 | Lancelet | C. elegans | 520 / 650 Mya |
| 102 | Branchiostoma floridae | Lancelet | braFlo1 | Sea urchin | Lamprey | 520 / 550 Mya |
| 103 | Ciona intestinalis | Sea squirt | ci3 | Lancelet | Lamprey | 520 / 550 Mya |
| 104 | Aplysia californica | Sea hare | aplCal1 | C. elegans | Sea urchin | 600 / 650 Mya |
Plants
| # | Focal species | Common name | Assembly | Suggested inner outgroups | Suggested outer outgroup | Div. (inner / outer) |
|---|---|---|---|---|---|---|
| 105 | Arabidopsis thaliana | Thale cress | araTha1† | A. lyrata | Brassica rapa | 5 / 20 Mya |
| 106 | Brassica rapa | Turnip / Chinese cabbage | braRap1† | B. oleracea | Arabidopsis thaliana | 4 / 20 Mya |
| 107 | Oryza sativa | Rice | — | O. rufipogon, O. glaberrima | Brachypodium distachyon | 1–2 / 50 Mya |
| 108 | Zea mays | Maize | — | Sorghum bicolor | Oryza sativa | 12 / 50 Mya |
| 109 | Solanum lycopersicum | Tomato | — | S. tuberosum (potato) | Coffea canephora | 8 / 80 Mya |
| 110 | Glycine max | Soybean | — | Phaseolus vulgaris (common bean) | Arabidopsis thaliana | 20 / 100 Mya |
| 111 | Vitis vinifera | Grape | — | Coffea canephora | Arabidopsis thaliana | 80 / 110 Mya |
| 112 | Populus trichocarpa | Poplar | — | Salix spp. (willow) | Arabidopsis thaliana | 8 / 100 Mya |
Fungi
| # | Focal species | Common name | Assembly | Suggested inner outgroups | Suggested outer outgroup | Div. (inner / outer) |
|---|---|---|---|---|---|---|
| 113 | Saccharomyces cerevisiae | Baker’s yeast | sacCer3 | S. paradoxus | S. pombe | 5 / 600 Mya |
| 114 | Schizosaccharomyces pombe | Fission yeast | — | S. japonicus | Neurospora crassa | 150 / 500 Mya |
| 115 | Neurospora crassa | Red bread mold | — | N. tetrasperma | S. cerevisiae | 30 / 500 Mya |
Note
Assemblies marked with † are not part of the core UCSC Genome Browser but may be available through UCSC’s GenArk or via Ensembl/Phytozome. Entries with — in the assembly column have public genome assemblies but typically no UCSC pairwise alignments; you will need to generate AXT alignments yourself (see below).
Warning
Entries with poor inner/outer divergence ratios. The guidelines above recommend the outer outgroup diverge ≥2–3× further than the inner outgroup. Several entries in the catalogue have much lower ratios due to the phylogenetic structure of the group. Use these configurations with caution — the outer tier provides limited additional confirmation:
#14 Tarsier (58 / 64 Mya, ratio 1.1×). Tarsiers are haplorhines, sister to anthropoids. The haplorhine–strepsirrhine split (64 Mya) barely exceeds the tarsier–anthropoid split (58 Mya). Consider using a non-primate outgroup (e.g. mouse, ~85 Mya) for a better outer tier.
#52–53 Tasmanian devil & Wallaby (62 / 69 Mya, ratio 1.1×). Australidelphian inter-ordinal divergences are close in time to the Didelphimorphia–Australidelphia split. Consider platypus (~190 Mya) as the outer outgroup for a ratio of ~3×.
#69 Alligator (255 / 275 Mya, ratio 1.08×). Turtles and lepidosaurs are nearly equidistant from crocodilians in deep time. Consider using a bird (e.g. chicken, ~240 Mya) as an inner outgroup to improve tiering.
#83 Coelacanth (410 / 440 Mya, ratio 1.07×). The Sarcopterygii–Actinopterygii split is only slightly deeper than the coelacanth–tetrapod split. Consider lamprey (~500 Mya) as an alternative outer outgroup.
#99–100 Caenorhabditis — divergence times in nematodes are poorly constrained due to the absence of a reliable fossil record. Published molecular clock estimates range widely (some studies give 18–30 Mya, others 80–110 Mya for the C. elegans–C. briggsae split). The values shown (80–100 Mya) reflect the most commonly cited molecular estimates.
Generating your own alignments
For species not covered by UCSC’s pre-computed pairwise alignments, you can
produce net.axt.gz files from any pair of genome assemblies. This section
walks through the full process using Brassica rapa as a concrete example.
Prerequisites
You need two command-line toolkits:
| Tool | Purpose | Install |
|---|---|---|
| lastz | Pairwise genome alignment | |
| UCSC Kent utilities | Chaining, netting, format conversion | Download pre-compiled binaries for your platform |
The specific Kent utilities you need are: faToTwoBit, twoBitInfo,
axtChain, chainSort, chainNet, netToAxt, and axtSort.
# Example: download Kent tools on Linux x86_64
KENT=https://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64
for tool in faToTwoBit twoBitInfo axtChain chainSort chainNet netToAxt axtSort; do
    wget -q "$KENT/$tool" -O "$tool" && chmod +x "$tool"
done
export PATH="$PWD:$PATH"
Step 1: Obtain genome assemblies
Download FASTA files for your focal species and each outgroup. For Brassica rapa the genomes are available from Ensembl Plants (or NCBI/Phytozome):
# Focal: Brassica rapa v1.0
wget https://ftp.ensemblgenomes.org/pub/plants/release-57/fasta/brassica_rapa/dna/Brassica_rapa.Brapa_1.0.dna.toplevel.fa.gz
gunzip Brassica_rapa.Brapa_1.0.dna.toplevel.fa.gz
mv Brassica_rapa.Brapa_1.0.dna.toplevel.fa braRap1.fa
# Inner outgroup: Brassica oleracea (same genus, ~4 Mya)
wget https://ftp.ensemblgenomes.org/pub/plants/release-57/fasta/brassica_oleracea/dna/Brassica_oleracea.BOL.dna.toplevel.fa.gz
gunzip Brassica_oleracea.BOL.dna.toplevel.fa.gz
mv Brassica_oleracea.BOL.dna.toplevel.fa braOle1.fa
# Outer outgroup: Arabidopsis thaliana (~20 Mya, same family Brassicaceae)
wget https://ftp.ensemblgenomes.org/pub/plants/release-57/fasta/arabidopsis_thaliana/dna/Arabidopsis_thaliana.TAIR10.dna.toplevel.fa.gz
gunzip Arabidopsis_thaliana.TAIR10.dna.toplevel.fa.gz
mv Arabidopsis_thaliana.TAIR10.dna.toplevel.fa araTha1.fa
Step 2: Convert to 2bit format
The commands below pass .2bit files to both lastz and the Kent chaining tools, so convert each FASTA first:
faToTwoBit braRap1.fa braRap1.2bit
faToTwoBit braOle1.fa braOle1.2bit
faToTwoBit araTha1.fa araTha1.2bit
# Generate chromosome sizes (needed for chaining/netting)
twoBitInfo braRap1.2bit braRap1.chrom.sizes
twoBitInfo braOle1.2bit braOle1.chrom.sizes
twoBitInfo araTha1.2bit araTha1.chrom.sizes
You also need the ancify chromosome lengths file (tab-separated name + length):
cp braRap1.chrom.sizes braRap1.chromLens.txt
Step 3: Run lastz pairwise alignment
Run lastz once for each outgroup. The target is always your focal species:
# B. rapa vs B. oleracea (inner outgroup)
# [multiple] tells lastz to use every sequence in a multi-sequence target file
lastz 'braRap1.2bit[multiple]' braOle1.2bit \
    --format=axt \
    --ambiguous=iupac \
    > braRap1_braOle1_raw.axt
# B. rapa vs A. thaliana (outer outgroup)
lastz 'braRap1.2bit[multiple]' araTha1.2bit \
    --format=axt \
    --ambiguous=iupac \
    > braRap1_araTha1_raw.axt
Tip
For large genomes, lastz can take many hours. Speed it up with
--step=20 --seed=match12 for a coarser but faster initial search, or
split chromosomes into separate jobs and run in parallel. See the
lastz documentation for all tuning flags.
Step 4: Chain and net the alignment
Chaining groups co-linear alignment blocks; netting selects the single best chain at each target position (removing tandem-duplication noise):
# axtChain requires -linearGap (medium or loose; loose is meant for more distant pairs)
# --- B. rapa vs B. oleracea ---
axtChain -linearGap=medium braRap1_braOle1_raw.axt braRap1.2bit braOle1.2bit braOle1.chain
chainSort braOle1.chain braOle1.sorted.chain
chainNet braOle1.sorted.chain braRap1.chrom.sizes braOle1.chrom.sizes \
    braOle1_target.net braOle1_query.net
netToAxt braOle1_target.net braOle1.sorted.chain braRap1.2bit braOle1.2bit stdout \
    | axtSort stdin braRap1.braOleracea.net.axt
# --- B. rapa vs A. thaliana ---
axtChain -linearGap=medium braRap1_araTha1_raw.axt braRap1.2bit araTha1.2bit araTha1.chain
chainSort araTha1.chain araTha1.sorted.chain
chainNet araTha1.sorted.chain braRap1.chrom.sizes araTha1.chrom.sizes \
    araTha1_target.net araTha1_query.net
netToAxt araTha1_target.net araTha1.sorted.chain braRap1.2bit araTha1.2bit stdout \
    | axtSort stdin braRap1.araTha1.net.axt
Step 5: Compress and verify
gzip braRap1.braOleracea.net.axt
gzip braRap1.araTha1.net.axt
# Quick sanity check: each file should contain alignment blocks
zcat braRap1.braOleracea.net.axt.gz | head -20
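Beyond eyeballing the first lines, a short script can count alignment blocks and aligned target bases. The sketch below is a rough illustration rather than a full validator: summarize_axt is a hypothetical helper, and it relies on the standard AXT layout (a nine-field summary line followed by the two sequence lines, with 1-based inclusive target coordinates). The demo data are made up, including the query name C1.

```python
import gzip
import io


def summarize_axt(handle):
    """Return (number of blocks, aligned target bases) for an open AXT file."""
    blocks = 0
    target_bases = 0
    for line in handle:
        fields = line.split()
        # Summary line: index tName tStart tEnd qName qStart qEnd strand score
        if len(fields) == 9 and fields[0].isdigit():
            t_start, t_end = int(fields[2]), int(fields[3])
            blocks += 1
            target_bases += t_end - t_start + 1  # coordinates are inclusive
    return blocks, target_bases


# Tiny made-up AXT content, compressed in memory to mimic a .net.axt.gz file.
demo = (
    "0 A01 101 108 C1 201 208 + 500\n"
    "ACGTACGT\n"
    "ACGTACGA\n"
    "\n"
    "1 A01 301 304 C1 401 403 - 200\n"
    "ACGT\n"
    "AC-T\n"
    "\n"
)
with gzip.open(io.BytesIO(gzip.compress(demo.encode())), "rt") as handle:
    print(summarize_axt(handle))
```

For a real file, replace the in-memory object with gzip.open("braRap1.braOleracea.net.axt.gz", "rt"). Dividing the base count by the total length in your chromosome lengths file gives a rough coverage figure.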
Step 6: Run ancify
The resulting files plug straight into an ancify config:
focal_species: brassica_rapa
chromosome_lengths: braRap1.chromLens.txt
outgroups:
  inner:
    - name: brassica_oleracea
      alignment: braRap1.braOleracea.net.axt.gz
  outer:
    - name: arabidopsis_thaliana
      alignment: braRap1.araTha1.net.axt.gz
output_dir: ./brassica_rapa_ancestral
num_cpus: 4
ancify run -c brassica_rapa_config.yaml
The repo also includes a ready-made runner script at
scripts/examples/brassica_rapa/run.sh (single-chromosome test on A01).
Quick-reference: generic pipeline
For any species pair, the minimal pipeline is:
# 1. Convert FASTA to 2bit
faToTwoBit target.fa target.2bit
faToTwoBit query.fa query.2bit
twoBitInfo target.2bit target.chrom.sizes
twoBitInfo query.2bit query.chrom.sizes
# 2. Align ([multiple] lets lastz use every sequence in the target file)
lastz 'target.2bit[multiple]' query.2bit \
    --format=axt --ambiguous=iupac \
    > raw.axt
# 3. Chain and net (axtChain requires -linearGap: medium or loose)
axtChain -linearGap=medium raw.axt target.2bit query.2bit chain.txt
chainSort chain.txt sorted.chain
chainNet sorted.chain target.chrom.sizes query.chrom.sizes target.net query.net
netToAxt target.net sorted.chain target.2bit query.2bit stdout \
    | axtSort stdin net.axt
# 4. Compress
gzip net.axt
Notes on plant genomes
Whole-genome duplications are common in plants (Brassica underwent a lineage-specific triplication). The netting step handles this by picking the single best chain per target position, but expect lower alignment coverage and more fragmented blocks than mammalian comparisons.
Chromosome naming varies widely (A01-A10 in B. rapa, Chr1-Chr5 in Arabidopsis). ancify does not assume a chr prefix, so any naming works. Omit the chromosomes key in your config to process every entry in the lengths file.
Alignment coverage for the outer outgroup (Arabidopsis, ~20 Mya) will be substantially lower than for the inner outgroup (B. oleracea, ~4 Mya). This is normal and means more positions will receive low-confidence (lowercase) calls rather than high-confidence ones.
If the Ensembl FASTA contains scaffold/contig names you do not want, filter your chromosome lengths file first:
grep -E '^A[0-9]+' braRap1.chrom.sizes > braRap1.chromLens.txt
For more details see the lastz documentation and the UCSC Kent utilities.