Last update: Oct 26, 2017, Contributors: Diep Thi Hoang, Jana, Minh Bui
This tutorial gives a beginner’s guide.
Please first download and install the binary for your platform. For the next steps, the folder containing your iqtree executable should be added to your PATH environment variable so that IQ-TREE can be invoked by simply entering iqtree at the command line. Alternatively, you can copy the iqtree binary into a folder that is already on your system search path.
TIP: For a quick overview of all supported options in IQ-TREE, run the command iqtree -h.
IQ-TREE takes as input a multiple sequence alignment and will reconstruct an evolutionary tree that best explains the input data. The input alignment can be in various common formats. For example, a file in PHYLIP format may look like this:
7 28
Frog       AAATTTGGTCCTGTGATTCAGCAGTGAT
Turtle     CTTCCACACCCCAGGACTCAGCAGTGAT
Bird       CTACCACACCCCAGGACTCAGCAGTAAT
Human      CTACCACACCCCAGGAAACAGCAGTGAT
Cow        CTACCACACCCCAGGAAACAGCAGTGAC
Whale      CTACCACGCCCCAGGACACAGCAGTGAT
Mouse      CTACCACACCCCAGGACTCAGCAGTGAT
This tiny alignment contains 7 DNA sequences from several animals with the sequence length of 28 nucleotides. IQ-TREE also supports other file formats such as FASTA, NEXUS, CLUSTALW. The FASTA file for the above example may look like this:
>Frog
AAATTTGGTCCTGTGATTCAGCAGTGAT
>Turtle
CTTCCACACCCCAGGACTCAGCAGTGAT
>Bird
CTACCACACCCCAGGACTCAGCAGTAAT
>Human
CTACCACACCCCAGGAAACAGCAGTGAT
>Cow
CTACCACACCCCAGGAAACAGCAGTGAC
>Whale
CTACCACGCCCCAGGACACAGCAGTGAT
>Mouse
CTACCACACCCCAGGACTCAGCAGTGAT
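To make the relationship between the two formats concrete, here is a minimal Python sketch that converts the PHYLIP example into FASTA. It assumes the simplest layout only: one sequence per line, with the name separated from the sequence by whitespace (no interleaved blocks). The function name `phylip_to_fasta` is made up for this illustration and is not part of IQ-TREE.

```python
def phylip_to_fasta(phylip_text: str) -> str:
    """Convert a simple sequential PHYLIP alignment to FASTA text.

    Assumption: one sequence per line, name and sequence separated by
    whitespace. Interleaved PHYLIP files are not handled by this sketch.
    """
    lines = [ln.strip() for ln in phylip_text.splitlines() if ln.strip()]
    # The header line holds the number of sequences and the number of sites.
    n_seqs, n_sites = (int(x) for x in lines[0].split())
    records = []
    for line in lines[1:]:
        name, seq = line.split(None, 1)
        seq = seq.replace(" ", "")  # tolerate spaces inside the sequence
        if len(seq) != n_sites:
            raise ValueError(f"{name}: expected {n_sites} sites, got {len(seq)}")
        records.append((name, seq))
    if len(records) != n_seqs:
        raise ValueError(f"expected {n_seqs} sequences, got {len(records)}")
    return "\n".join(f">{name}\n{seq}" for name, seq in records) + "\n"


example = """7 28
Frog       AAATTTGGTCCTGTGATTCAGCAGTGAT
Turtle     CTTCCACACCCCAGGACTCAGCAGTGAT
Bird       CTACCACACCCCAGGACTCAGCAGTAAT
Human      CTACCACACCCCAGGAAACAGCAGTGAT
Cow        CTACCACACCCCAGGAAACAGCAGTGAC
Whale      CTACCACGCCCCAGGACACAGCAGTGAT
Mouse      CTACCACACCCCAGGACTCAGCAGTGAT
"""
print(phylip_to_fasta(example))
```

IQ-TREE reads both formats directly, so a conversion like this is only needed when another tool expects FASTA.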
The download includes an example alignment called example.phy in PHYLIP format. This example contains parts of the mitochondrial DNA sequences of several animals (Source: Phylogenetic Handbook).
You can now start to reconstruct a maximum-likelihood tree from this alignment by entering (assuming that you are in the same folder as example.phy):
iqtree -s example.phy
-s is the option to specify the name of the alignment file that is always required by IQ-TREE to work. At the end of the run IQ-TREE will write several output files including:
example.phy.iqtree: the main report file, which is human-readable. You should look at this file to see the computational results. It also contains a textual representation of the final tree (see below).
example.phy.treefile: the ML tree in NEWICK format, which can be visualized by any tree viewer program that supports this format, such as FigTree or iTOL.
example.phy.log: log file of the entire run (also printed on the screen). To report bugs, please send this log file and the original alignment file to the authors.
NOTE: Starting with version 1.5.4, with this simple command IQ-TREE will by default perform ModelFinder (see choosing the right substitution model below) to find the best-fit substitution model and then infer a phylogenetic tree using the selected model.
For this example data the resulting maximum-likelihood tree may look like this (extracted from the .iqtree file):
NOTE: Tree is UNROOTED although outgroup taxon 'LngfishAu' is drawn at root

+--------------LngfishAu
|
|        +--------------LngfishSA
+--------|
|        +--------------LngfishAf
|
|      +-------------------Frog
+------|
       |               +-----------------Turtle
       |         +-----|
       |         |     |      +-----------------------Sphenodon
       |         |     |   +--|
       |         |     |   |  +--------------------------Lizard
       |         |     +---|
       |         |         |      +---------------------Crocodile
       |         |         +------|
       |         |                +------------------Bird
       +---------|
                 |                  +----------------Human
                 |               +--|
                 |               |  |  +--------Seal
                 |               |  +--|
                 |               |     |   +-------Cow
                 |               |     +---|
                 |               |         +---------Whale
                 |          +----|
                 |          |    |         +------Mouse
                 |          |    +---------|
                 |          |              +--------Rat
                 +----------|
                            |   +----------------Platypus
                            +---|
                                +-------------Opossum
This makes sense as the mammals (Human to Opossum) form a clade, whereas the reptiles (Turtle to Bird) form a separate sister clade. Here the tree is drawn at the outgroup Lungfish, which is more ancient than the other species in this example. However, please note that IQ-TREE always produces an unrooted tree as it knows nothing about this biological background; IQ-TREE simply draws the tree this way because LngfishAu is the first sequence occurring in the alignment.
During the example run above, IQ-TREE periodically wrote to disk a checkpoint file
example.phy.ckp.gz (gzip-compressed to save space). This checkpoint file is used to resume an interrupted run, which is handy if you have a very large data set or a time limit on a cluster system. If the run did not finish, invoking IQ-TREE again with the very same command line will resume the analysis from the last checkpoint, thus preserving all computation done before.
If the run successfully completed, running again will issue an error message:
ERROR: Checkpoint (example.phy.ckp.gz) indicates that a previous run successfully finished
Use `-redo` option if you really want to redo the analysis and overwrite all output files.
This prevents loss of data if you accidentally re-run IQ-TREE. However, if you really want to re-run the analysis and overwrite all previous output files, use the -redo option:
iqtree -s example.phy -redo
Finally, the default prefix of all output files is the alignment file name. You can change the prefix using the -pre option:
iqtree -s example.phy -pre myprefix
This prevents output files being overwritten when you perform multiple analyses on the same alignment within the same folder.
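If you drive IQ-TREE from a script, the options introduced so far can be assembled programmatically. The following Python sketch only builds the argument list; the helper name `build_iqtree_command` is hypothetical, and it uses nothing beyond the -s, -pre and -redo options described above.

```python
import shlex


def build_iqtree_command(alignment, prefix=None, redo=False, executable="iqtree"):
    """Assemble an IQ-TREE command line as a list of arguments.

    Only options described in this tutorial are used:
      -s    alignment file (always required)
      -pre  prefix for all output files (optional)
      -redo overwrite the results of a previously finished run (optional)
    """
    cmd = [executable, "-s", alignment]
    if prefix is not None:
        cmd += ["-pre", prefix]
    if redo:
        cmd.append("-redo")
    return cmd


# Show the command that would be run, without executing anything.
print(shlex.join(build_iqtree_command("example.phy", prefix="myprefix")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the iqtree executable is on your PATH.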
NOTE: If you use model selection please cite the following paper:
S. Kalyaanamoorthy, B.Q. Minh, T.K.F. Wong, A. von Haeseler, and L.S. Jermiin (2017) ModelFinder: fast model selection for accurate phylogenetic estimates. Nat. Methods, 14:587–589. DOI: 10.1038/nmeth.4285
IQ-TREE supports a wide range of substitution models for DNA, protein, codon, binary and morphological alignments. If you do not know which model is appropriate for your data, you can use ModelFinder to determine the best-fit model:
#for IQ-TREE version >= 1.5.4:
iqtree -s example.phy -m MFP
#for IQ-TREE version <= 1.5.3:
iqtree -s example.phy -m TESTNEW
-m is the option to specify the model name to use during the analysis. The special MFP keyword stands for ModelFinder Plus, which tells IQ-TREE to perform ModelFinder and the remaining analysis using the selected model. ModelFinder computes the log-likelihoods of an initial parsimony tree for many different models, along with the Akaike information criterion (AIC), corrected Akaike information criterion (AICc), and Bayesian information criterion (BIC). It then chooses the model that minimizes the BIC score (you can switch to AIC or AICc by adding the option -AIC or -AICc, respectively).
TIP: Starting with version 1.5.4, -m MFP is the default behavior. Thus, this run is equivalent to iqtree -s example.phy.
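To see how the three criteria trade off fit against complexity, here is a small Python sketch using their textbook definitions. It is not IQ-TREE's code: the function names, the candidate log-likelihoods and the parameter counts are made up for illustration, and the alignment length is assumed to serve as the sample size.

```python
import math


def information_criteria(log_likelihood, n_params, n_sites):
    """Return AIC, AICc and BIC for one model (textbook definitions).

    log_likelihood: maximized log-likelihood of the model
    n_params:       number of free parameters
    n_sites:        sample size (assumed here: the alignment length)
    """
    aic = 2 * n_params - 2 * log_likelihood
    aicc = aic + (2 * n_params * (n_params + 1)) / (n_sites - n_params - 1)
    bic = n_params * math.log(n_sites) - 2 * log_likelihood
    return {"AIC": aic, "AICc": aicc, "BIC": bic}


def best_model(candidates, n_sites, criterion="BIC"):
    """Pick the model with the smallest score under the chosen criterion.

    candidates maps a model name to (log-likelihood, number of parameters).
    """
    return min(
        candidates,
        key=lambda m: information_criteria(*candidates[m], n_sites)[criterion],
    )


# Hypothetical numbers, chosen so that the criteria disagree.
candidates = {"JC": (-1000.0, 1), "GTR": (-990.0, 9)}
print(best_model(candidates, n_sites=500, criterion="BIC"))
print(best_model(candidates, n_sites=500, criterion="AIC"))
```

BIC penalizes each extra parameter by ln(n) rather than 2, so on long alignments it tends to favor simpler models than AIC does.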
Here, IQ-TREE will write an additional file:
example.phy.model: log-likelihoods for all models tested. It serves as a checkpoint file to recover an interrupted model selection.
If you now look at
example.phy.iqtree you will see that IQ-TREE selected
TIM2+I+G4 as the best-fit model for this example data. Thus, for additional analyses you do not have to perform the model test again and can use the selected model:
iqtree -s example.phy -m TIM2+I+G
Sometimes you only want to find the best-fit model without doing tree reconstruction. In that case, run:
#for IQ-TREE version >= 1.5.4:
iqtree -s example.phy -m MF
#for IQ-TREE version <= 1.5.3:
iqtree -s example.phy -m TESTNEWONLY
ModelFinder is up to 100 times faster than jModelTest/ProtTest.
jModelTest/ProtTest provide only the invariable-site (+I) and Gamma (+G) models of rate heterogeneity across sites, but there is no reason to believe that evolution follows a Gamma distribution. ModelFinder additionally considers the FreeRate heterogeneity model (+R), which relaxes the assumption of a Gamma distribution: the site rates and proportions are free to vary and are inferred independently from the data. Moreover, +R allows the number of rate categories to be determined automatically, which is impossible with +G. This can be important especially for phylogenomic data, where the default 4 rate categories may “underfit” the data.
ModelFinder works transparently with tree inference in IQ-TREE, thus combining both steps in just one single run! This eliminates the need for a separate software for DNA (jModelTest) and another for protein sequences (ProtTest).
Apart from DNA and protein sequences, ModelFinder also works with codon, binary and morphological sequences.
If you still want behavior that resembles jModelTest/ProtTest, then use option -m TEST or -m TESTONLY instead.
By default, the maximum number of categories is limited to 10 for computational reasons. If your sequence alignment is long enough, you can increase this upper limit with the -cmax option:
#for IQ-TREE version >= 1.5.4:
iqtree -s example.phy -m MF -cmax 15
#for IQ-TREE version <= 1.5.3:
iqtree -s example.phy -m TESTNEWONLY -cmax 15
This will test rate categories up to +R15 instead of at most +R10.
To reduce the computational burden, you can use the option -mset to restrict the testing procedure to a subset of base models instead of testing the entire set of available models. For example, -mset WAG,LG will test only models like WAG+... or LG+.... Another useful option in this respect is -msub for AA data sets. With -msub nuclear only general AA models are included, whereas with -msub viral only AA models for viruses are included.
If you have enough computational resources, you can perform a more thorough and accurate analysis that invokes a full tree search for each model considered via the -mtree option:
#for IQ-TREE version >= 1.5.4:
iqtree -s example.phy -m MF -mtree
#for IQ-TREE version <= 1.5.3:
iqtree -s example.phy -m TESTNEWONLY -mtree
IQ-TREE supports a number of codon models. You need to input a protein-coding DNA alignment and specify codon data with the option -st CODON (otherwise, IQ-TREE applies a DNA model because it detects that your alignment contains DNA sequences):
iqtree -s coding_gene.phy -st CODON
If your alignment length is not divisible by 3, IQ-TREE will stop with an error message. IQ-TREE will group sites 1,2,3 into codon site 1; sites 4,5,6 into codon site 2; etc. Moreover, any codon that has at least one gap/unknown/ambiguous nucleotide will be treated as an unknown codon character.
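The grouping rule above can be mimicked in a few lines of Python, which is handy for checking a coding alignment before running IQ-TREE. This is only a sketch of the described behavior, not IQ-TREE's implementation; the function name `to_codons` and the `???` placeholder for an unknown codon are choices made for this example.

```python
def to_codons(sequence):
    """Split a nucleotide sequence into codons, as described in the text.

    Sites 1,2,3 form codon 1, sites 4,5,6 form codon 2, and so on.
    A codon containing anything other than A, C, G or T (gap, unknown or
    ambiguous nucleotide) is reported as "???" (placeholder for this sketch).
    """
    if len(sequence) % 3 != 0:
        raise ValueError(
            f"sequence length {len(sequence)} is not divisible by 3"
        )
    codons = []
    for start in range(0, len(sequence), 3):
        codon = sequence[start:start + 3].upper()
        if set(codon) <= set("ACGT"):
            codons.append(codon)
        else:
            codons.append("???")
    return codons


print(to_codons("ATGAC-TTNGGA"))
```

Running the same check on every sequence of an alignment shows in advance whether the length error described above would be triggered.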
Note that the above command assumes the standard genetic code. If your sequences follow ‘The Invertebrate Mitochondrial Code’ (see the full list of supported genetic code here), then run:
iqtree -s coding_gene.phy -st CODON5
Note that ModelFinder works for codon alignments. IQ-TREE version >= 1.5.4 will automatically invoke ModelFinder to find the best-fit codon model. For version <= 1.5.3, use option -m TESTNEW (ModelFinder and tree inference) or -m TESTNEWONLY (ModelFinder only).
IQ-TREE supports discrete morphological alignments via the -st MORPH option:
iqtree -s morphology.phy -st MORPH
IQ-TREE implements two morphological ML models: MK and ORDERED. Morphological data typically do not have constant (uninformative) sites. In such cases, you should apply an ascertainment bias correction model, e.g.:
iqtree -s morphology.phy -st MORPH -m MK+ASC
You can again select the best-fit binary/morphological model:
#for IQ-TREE version >= 1.5.4:
iqtree -s morphology.phy -st MORPH
#for IQ-TREE version <= 1.5.3:
iqtree -s morphology.phy -st MORPH -m TESTNEW
For SNP data (DNA) that typically do not contain constant sites, you can explicitly tell the model to include ascertainment bias correction:
iqtree -s SNP_data.phy -m GTR+ASC
You can explicitly tell model testing to include only +ASC models with:
#for IQ-TREE version >= 1.5.4:
iqtree -s SNP_data.phy -m MFP+ASC
#for IQ-TREE version <= 1.5.3:
iqtree -s SNP_data.phy -m TESTNEW+ASC
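Since ascertainment bias correction is meant for alignments without constant sites, it can help to check whether your data really contain none. The Python sketch below counts constant columns; it is a simplification that treats every character, including gaps and ambiguity codes, as an ordinary state, and the function name `count_constant_sites` is hypothetical.

```python
def count_constant_sites(sequences):
    """Count alignment columns in which all sequences show the same character.

    Simplifying assumption: gaps and ambiguity codes are compared like any
    other character, which may differ from how IQ-TREE classifies sites.
    """
    if not sequences:
        return 0
    n_sites = len(sequences[0])
    if any(len(seq) != n_sites for seq in sequences):
        raise ValueError("all sequences must have the same length")
    # zip(*sequences) iterates over alignment columns.
    return sum(1 for column in zip(*sequences) if len(set(column)) == 1)


# Tiny made-up alignment: the first two columns are constant.
snp_like = ["ACGT", "ACGA", "ACTT"]
print(count_constant_sites(snp_like))
```

A result of 0 indicates variable-sites-only data, the situation for which +ASC models are intended.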
To overcome the computational burden required by the nonparametric bootstrap, IQ-TREE introduces an ultrafast bootstrap approximation (UFBoot) (Minh et al., 2013; Hoang et al., in press) that is orders of magnitude faster than the standard procedure and provides relatively unbiased branch support values. Citation for UFBoot:
D.T. Hoang, O. Chernomor, A. von Haeseler, B.Q. Minh, and L.S. Vinh (2017) UFBoot2: Improving the ultrafast bootstrap approximation. Mol. Biol. Evol., in press. http://dx.doi.org/10.1093/molbev/msx281
To run UFBoot, use the option -bb:
iqtree -s example.phy -m TIM2+I+G -bb 1000
-bb specifies the number of bootstrap replicates where 1000 is the minimum number recommended. The section
MAXIMUM LIKELIHOOD TREE in
example.phy.iqtree shows a textual representation of the maximum likelihood tree with branch support values in percentage. The NEWICK format of the tree is printed to the file
example.phy.treefile. In addition, IQ-TREE writes the following files:
example.phy.contree: the consensus tree with assigned branch supports where branch lengths are optimized on the original alignment.
example.phy.splits: support values in percentage for all splits (bipartitions), computed as the occurrence frequencies in the bootstrap trees. This file is in “star-dot” format.
example.phy.splits.nex: has the same information as example.phy.splits but in NEXUS format, which can be viewed with the program SplitsTree to explore the conflicting signals in the data. It is therefore more informative than the consensus tree; e.g., you can see how highly supported the second-best conflicting split is, which had no chance to enter the consensus tree.
NOTE: UFBoot support values have a different interpretation to the standard bootstrap. Refer to FAQ: UFBoot support values interpretation for more information.
Starting with IQ-TREE version 1.6 we provide a new option
-bnni to reduce the risk of overestimating branch supports with UFBoot due to severe model violations. With this option UFBoot will further optimize each bootstrap tree using a hill-climbing nearest neighbor interchange (NNI) search based directly on the corresponding bootstrap alignment.
Thus, if severe model violations are present in the data set at hand, users are advised to append
-bnni to the regular UFBoot command:
iqtree -s example.phy -m TIM2+I+G -bb 1000 -bnni
The standard nonparametric bootstrap is invoked by the -b option:
iqtree -s example.phy -m TIM2+I+G -b 100
-b specifies the number of bootstrap replicates where 100 is the minimum recommended number. The output files are similar to those produced by the UFBoot procedure.
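For readers unfamiliar with the nonparametric bootstrap, the core resampling step can be sketched in Python as follows. This shows the general textbook procedure (sampling alignment columns with replacement), not IQ-TREE's internal implementation; the function name `bootstrap_alignment` and the toy sequences are made up for illustration.

```python
import random


def bootstrap_alignment(sequences, rng):
    """Create one bootstrap replicate of an alignment.

    Columns are sampled with replacement until the replicate has the same
    length as the original; the same column indices are applied to every
    sequence so that columns stay intact.
    """
    n_sites = len(sequences[0])
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[c] for c in columns) for seq in sequences]


# Toy alignment; a fixed seed makes the example reproducible.
toy = ["ACGT", "AGGA", "TCGA"]
for replicate in range(3):
    print(bootstrap_alignment(toy, random.Random(replicate)))
```

In a full analysis a tree is inferred from each replicate, which is why the standard bootstrap is so much more expensive than UFBoot.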
IQ-TREE provides an implementation of the SH-like approximate likelihood ratio test (Guindon et al., 2010). To perform this test, run:
iqtree -s example.phy -m TIM2+I+G -alrt 1000
-alrt specifies the number of bootstrap replicates for SH-aLRT where 1000 is the minimum number recommended.
You can also perform both SH-aLRT and the ultrafast bootstrap within one single run:
iqtree -s example.phy -m TIM2+I+G -alrt 1000 -bb 1000
The branches of the resulting .treefile will be assigned both SH-aLRT and UFBoot support values, which are readable by any tree viewer program like FigTree, Dendroscope or ETE. You can also look at the textual tree figure in the .iqtree file:
NOTE: Tree is UNROOTED although outgroup taxon 'LngfishAu' is drawn at root
Numbers in parentheses are SH-aLRT support (%) / ultrafast bootstrap support (%)

+-------------LngfishAu
|
|       +--------------LngfishSA
+-------| (100/100)
|       +------------LngfishAf
|
|      +--------------------Frog
+------| (99.8/100)
       |                     +-----------------Turtle
       |                  +--| (85/72)
       |                  |  |    +------------------------Crocodile
       |                  |  +----| (96.5/97)
       |                  |       +------------------Bird
       |               +--| (39/51)
       |               |  +---------------------------Sphenodon
       |         +-----| (98.2/99)
       |         |     +-------------------------------Lizard
       +---------| (100/100)
                 |                   +--------------Human
                 |                +--| (92.3/93)
                 |                |  |  +------Seal
                 |                |  +--| (68.3/75)
                 |                |     |  +-----Cow
                 |                |     +--| (99.7/100)
                 |                |        +-------Whale
                 |           +----| (99.1/100)
                 |           |    |         +---Mouse
                 |           |    +---------| (100/100)
                 |           |              +------Rat
                 +-----------| (100/100)
                             |  +--------------Platypus
                             +--| (93/98)
                                +-----------Opossum
From this figure, the branching patterns within reptiles are poorly supported (e.g.
Sphenodon with SH-aLRT: 39%, UFBoot: 51% and
Turtle with SH-aLRT: 85%, UFBoot: 72%) as well as the phylogenetic position of
Seal within mammals (SH-aLRT: 68.3%, UFBoot: 75%). Other branches appear to be well supported.
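Spotting weak branches by eye becomes tedious for large trees, so a short script can help. The Python sketch below assumes that the combined support values appear in the NEWICK string as internal node labels of the form `SH-aLRT/UFBoot` directly after a closing parenthesis. The cut-offs of 80% and 95% are a commonly quoted rule of thumb, assumed here rather than taken from this tutorial, and the function name and example tree are hypothetical.

```python
import re

# Matches labels such as ")85/72" or ")96.5/97" after a closing parenthesis.
SUPPORT_PATTERN = re.compile(r"\)(\d+(?:\.\d+)?)/(\d+(?:\.\d+)?)")


def weak_branches(newick, min_alrt=80.0, min_ufboot=95.0):
    """Return (SH-aLRT, UFBoot) pairs that fall below either threshold.

    The thresholds are assumptions for this sketch; adjust them to the
    guidance you follow.
    """
    weak = []
    for alrt_text, ufboot_text in SUPPORT_PATTERN.findall(newick):
        alrt, ufboot = float(alrt_text), float(ufboot_text)
        if alrt < min_alrt or ufboot < min_ufboot:
            weak.append((alrt, ufboot))
    return weak


# Made-up tree with three internal branches.
tree = "((A:0.1,B:0.2)39/51:0.05,((C:0.1,D:0.1)100/100:0.02,E:0.3)85/72:0.1,F:0.4);"
print(weak_branches(tree))
```

For real analyses the string would be read from the .treefile, and a tree library would be needed to report which taxa each weak branch separates.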
A specialized version of IQ-TREE (iqtree-omp) can utilize multiple CPU cores to speed up the analysis. To obtain this version please refer to the quick starting guide. A complementary option -nt allows you to specify the number of CPU cores to use. For example:
iqtree-omp -s example.phy -m TIM2+I+G -nt 2
Here, IQ-TREE will use 2 CPU cores to perform the analysis.
Note that the parallel efficiency is only good for long alignments. A good practice is to use
-nt AUTO to determine the best number of cores:
iqtree-omp -s example.phy -m TIM2+I+G -nt AUTO
While running, IQ-TREE may then print something like this to the screen:
Measuring multi-threading efficiency up to 8 CPU cores
Threads: 1 / Time: 8.001 sec / Speedup: 1.000 / Efficiency: 100% / LogL: -22217
Threads: 2 / Time: 4.346 sec / Speedup: 1.841 / Efficiency: 92% / LogL: -22217
Threads: 3 / Time: 3.381 sec / Speedup: 2.367 / Efficiency: 79% / LogL: -22217
Threads: 4 / Time: 4.385 sec / Speedup: 1.825 / Efficiency: 46% / LogL: -22217
BEST NUMBER OF THREADS: 3
Therefore, I would only use 3 cores for this example data. For later analyses with the same data set, you can stick to the determined number.
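The speedup and efficiency columns in such a log follow from the measured times: speedup is the single-thread time divided by the multi-thread time, and efficiency is the speedup divided by the number of threads. The Python sketch below recomputes them from the times printed above (results can differ in the last digit because the printed times are rounded). Choosing the thread count with the shortest time is an assumption of this sketch that happens to agree with the log; the exact rule IQ-TREE applies is not described in this tutorial.

```python
def threading_table(timings):
    """Compute speedup and efficiency per thread count.

    timings maps thread count -> wall-clock seconds and must include 1.
    Returns {threads: (speedup, efficiency)}.
    """
    baseline = timings[1]
    table = {}
    for threads, seconds in sorted(timings.items()):
        speedup = baseline / seconds
        table[threads] = (speedup, speedup / threads)
    return table


def fastest_thread_count(timings):
    """Return the thread count with the shortest measured time (assumed rule)."""
    return min(timings, key=timings.get)


# Times copied from the example log above.
timings = {1: 8.001, 2: 4.346, 3: 3.381, 4: 4.385}
for threads, (speedup, efficiency) in threading_table(timings).items():
    print(f"Threads: {threads} / Speedup: {speedup:.3f} / Efficiency: {efficiency:.0%}")
print("Fastest:", fastest_thread_count(timings))
```

The drop from 3 to 4 threads shows why more cores are not always better on short alignments.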
Once confident enough you can go on with a more advanced tutorial, which covers topics like phylogenomic (multi-gene) analyses using partition models or mixture models.