Notes by Ziheng Yang
Last modified: 25 March 2009

(A) This is the MHC dataset analyzed by Yang & Swanson (2002: table 2)
and also by Yang (2006: table 8.3 and figures 8.4 & 8.5).  Note that
Yang & Swanson implemented models M1 and M2 rather than M1a and M2a.
The modified models M1a & M2a were introduced in 2004 (paml 3.14), so
if you run the current version of paml/codeml, the results for M1a and
M2a will differ from those listed for M1 and M2 in the paper.  One
other difference is that in the Y&S2002 paper, the branch lengths were
optimized under each model, whereas the default setting in codeml.ctl
(with fix_blength = 2) has the branch lengths fixed at the values in
the tree file, which are the MLEs under M0.  Thus if you run codeml
using the default codeml.ctl, only the results for M0 match those in
table 2.  To get the same estimates for M7 and M8, you need to use
fix_blength = 1, so that the branch lengths in the tree file are used
only as initial values and are re-estimated under each model (the
computation will take much longer).
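
Switching between the two settings only requires changing one line of
the control file.  The following is a minimal Python sketch of how that
edit could be scripted; the helper name set_ctl_option and the sample
control-file text are hypothetical and are not part of paml.  It
assumes that each option sits on its own line in the form
"name = value", optionally followed by a comment.

```python
import re


def set_ctl_option(ctl_text, name, value):
    """Return ctl_text with the value of the 'name = ...' line replaced.

    Hypothetical helper, not part of paml. Anything after the old value
    on that line (such as a trailing comment) is left untouched.
    """
    pattern = re.compile(
        r"^(\s*" + re.escape(name) + r"\s*=\s*)\S+", re.MULTILINE
    )
    # A function replacement avoids ambiguity between the group
    # reference and a numeric value such as 1.
    new_text, count = pattern.subn(lambda m: m.group(1) + str(value), ctl_text)
    if count == 0:
        raise KeyError(name + " not found in control file text")
    return new_text


# Illustrative fragment only; the real codeml.ctl has many more options.
sample = (
    "        model = 0\n"
    "  fix_blength = 2  * illustrative trailing comment\n"
)

updated = set_ctl_option(sample, "fix_blength", 1)
print(updated)
```

In practice one would read codeml.ctl from disk, apply the helper, and
write the result to a separate control file, so that the default
settings remain available for reproducing the results in the book.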

The default codeml.ctl should reproduce the results in the book, which
were obtained with the branch lengths fixed at their MLEs under M0.  I
just did a run, and got the following lnL values.  The value for M7 is
lower than in the book.  The difference may be due to numerical
inaccuracies in the discretization of the beta distribution in
different versions of the program.

Model (#p):   lnL
M0     (2):  -8225.154790
M1a    (2):  -7490.993363
M2a    (4):  -7231.154540
M7     (3):  -7502.792534 (-7498.97 in the book)
M8     (5):  -7238.014961
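
The lnL values above are typically used in likelihood ratio tests that
compare a null model with a model allowing positive selection (M1a
against M2a, and M7 against M8).  The following is a minimal Python
sketch of that arithmetic.  It assumes the commonly used chi-square
approximation with 2 degrees of freedom for both comparisons; the
function names are made up for this illustration.  With 2 degrees of
freedom the chi-square tail probability is exactly exp(-x/2), so no
statistics library is needed.

```python
import math

# lnL values from the run listed above.
lnL = {
    "M0": -8225.154790,
    "M1a": -7490.993363,
    "M2a": -7231.154540,
    "M7": -7502.792534,
    "M8": -7238.014961,
}


def lrt_statistic(lnL_null, lnL_alt):
    """Likelihood ratio test statistic, 2 * (lnL_alt - lnL_null)."""
    return 2.0 * (lnL_alt - lnL_null)


def chi2_sf_df2(x):
    """Upper tail probability of a chi-square with 2 degrees of freedom."""
    return math.exp(-x / 2.0) if x > 0 else 1.0


# Each pair is (null model, alternative model); df = 2 is an assumption.
for null, alt in [("M1a", "M2a"), ("M7", "M8")]:
    stat = lrt_statistic(lnL[null], lnL[alt])
    p = chi2_sf_df2(stat)
    print(f"{null} vs {alt}: 2*dlnL = {stat:.2f}, p = {p:.3g}")
```

For these values the test statistics are about 519.68 (M1a against M2a)
and 529.56 (M7 against M8), far beyond any conventional critical value.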


The lysin dataset analyzed in Yang and Swanson (2002) is also included
in the package, in the examples/lysin/ folder.

References

Yang, Z., and W. J. Swanson. 2002. Codon-substitution models to detect
adaptive evolution that account for heterogeneous selective pressures
among site classes. Mol. Biol. Evol. 19:49-57.

Yang, Z. 2006. Computational Molecular Evolution. Oxford University
Press, Oxford, England.
