5. Quick-start guide to BAMM

This section assumes you have a compiled version of BAMM in a directory on your computer (see Setting Up BAMM for installation). This is a quick-start guide: we will provide guidance for some general parameter settings that should get BAMM running on your dataset, but you should explore the Configuration page in more detail to optimize performance.

BAMM can be used to model speciation-extinction rates and phenotypic evolutionary rates across phylogenetic trees. To run a speciation-extinciont analysis on your dataset, you need the following (easiest if all in the same directory):

  • A time-calibrated phylogenetic tree
  • A control file
  • The BAMM program

To use BAMM to analyze phenotypic evolution, you also need a file containing your phenotypic (trait) data.

Your phylogenetic tree must be ultrametric, it must be fully bifurcating (no polytomies), and all branch lengths must be greater than 0. BAMM checks for these things, but it’s also good to do a quick check in R using the ape package. You can do this as follows:

library(ape)
v <- read.tree("myPhylogeny.tre")
is.ultrametric(v)
is.binary.tree(v)
# Now to check min branch length:
min(v$edge.length)

If all of those checks are good, we can move on.

5.1. Control file

To run bamm, you must always specify a control file. The control file contains all of the settings necessary to run the program on your dataset, including the name(s) of the input files you seek to analyze. The easiest way to run BAMM is to place the control file, and all files to be analyzed (e.g., the phylogeny) in the same directory as the BAMM application. If your control file is named myControlFile.txt, you would run BAMM as follows (on the OSX operating system):

./bamm -c myControlFile.txt

On Windows machines, you would run:

bamm -c myControlFile.txt

You may download example control files for diversification or trait analyses. Use these as a template for setting up your own analyses. The control file is simply a text file with a set of parameter names, followed by the “equals” sign (=), followed by the parameter value. Anything on a line of the control file to the right of a pound sign (#) will be ignored by the program (e.g., it is considered a comment). Part of your control file might look like this:

# This line is a comment. It will not be read by BAMM
modeltype = speciationextinction
treefile = myPhylogeny.tre
runInfoFilename = run_info.txt
# This line is also a comment and not read
runMCMC = 1
useGlobalSamplingProbability = 1
globalSamplingFraction = 0.81
seed = 12345
expectedNumberOfShifts = 1.0 # Another comment here

There are many possible settings that can be tweaked in BAMM. The next two sections give you a simple recipe for running the program on your data, depending on whether you are analyzing speciation-extinction rates or phenotypic evolutionary rates. There is no guarantee that these settings will work for your dataset.

5.2. Speciation-extinction analyses

You must have an ultrametric phylogenetic tree. For optimal performance with the quick-start settings, the tree should be calibrated in units of millions of years (e.g, a branch length of 1.0 implies 1.0 million years). As a template, use the example file template_diversification.txt. The default values in this file work reasonably well for most trees calibrated on million-year timescales but may not work for your data. Here’s an example phylogenetic tree of whales that is used elsewhere in this documentation.

If you open the template file, you’ll see that there are extensive comments. For each parameter in the BAMM control file, we’ve included a brief description on the line following the parameter. For example:

modeltype = speciationextinction
# Specify "speciationextinction" or "trait" analysis

This tells us what parameter modeltype does. It specifies the type of analysis (here, a speciation-extinction analysis). If we wanted to do a phenotypic evolutionary analysis, we would have set modeltype = trait.

There are only a handful of parameters in the template file that you need to set explicitly in order to run BAMM. These are currently defined with the following symbol: %%%%. For example, you see the following immediately after modeltype:

treefile = %%%%
# File name of the phylogenetic tree to be analyzed

This is where you specify the name of your phylogenetic tree. For example, treefile = mytree.tre. The other parameters we will force you to define explicitly have to do with output. This parameter block looks like this in the template:

numberOfGenerations = %%%%

mcmcWriteFreq = %%%%

eventDataWriteFreq = %%%%

printFreq = %%%%

acceptanceResetFreq = %%%%

numberOfGenerations is the number of simulation steps you want in your MCMC analysis. printFreq is the frequency that BAMM will write some simple information to the screen so you can track the progress of the run. mcmcWriteFreq and eventDataWriteFreq tells BAMM how often to write the two basic types of output to file.

BAMM generates two types of output. The first is a file containing basic attributes of the MCMC chain as you sample it at particular timepoints. This includes: the log-likelihood of the data under the current parameters, the number of diversification shifts in the current state, the log-prior density of the current parameters, and a moving-window average of the acceptance rate for the MCMC simulation. The second type of output is the event data. This is the real stuff of interest. It contains all parameters associated with the macroevolutionary rate regimes and is used for all the subsequent analyses of evolutionary rates.

You can set these parameters to whatever you want. However, please remember that you will be working with the event data file in R, which is a bit limited on memory. As a rough guide, we suggest choosing a value for eventDataWriteFreq that gives at least 1000 samples from the posterior, but we also don’t see much advantage to having more than 5000.

acceptanceResetFreq specifies the frequency (in generations) in which to reset the acceptance and rejection counts for reporting. Its value should be a divisor of mcmcWriteFreq.

For starters, you should try a simple run with settings like this:

numberOfGenerations = 5000
mcmcWriteFreq = 1000
eventDataWriteFreq = 1000
printFreq = 100
acceptanceResetFreq = 1000

You’ll want to increase all of these once you are sure the program is correctly loading your data etc, but it’s a good first check.

One other block of parameters can be critical to BAMM performance: the priors that you place on your evolutionary rate parameters. The prior block in your control file looks similar to this (ignoring most comments in the template file):

expectedNumberOfShifts = 1.0
lambdaInitPrior = 1.0
lambdaShiftPrior = 0.05
muInitPrior = 1.0

These priors may work for your dataset. They may also be extremely inadequate. To this end, we have included a function in the BAMMtools package to help you choose appropriate prior values. The function, setBAMMpriors, will automatically generate a prior block as a text file that you can copy and paste over the prior block in the template file. To do this, you need to install BAMMtools (see here), and you need your phylogenetic tree. Assuming you have a phylogenetic tree file my_tree.tre, you can generate the prior block with:

> library(BAMMtools) # Assuming you have installed BAMMtools!
> setBAMMpriors(read.tree("my_tree.tre"))

and the relevant output file will be generated in your working directory. See the help file (?setBAMMpriors) for more information. To be clear: this does not optimize priors to your dataset. It simply chooses a set of priors that we have found to be reasonable for most datasets and scales the distributions based on the age (root depth) of your tree. A more complete explanation can be found here.

5.2.1. Incomplete taxon sampling

For speciation-extinction analyses BAMM can analytically account for incomplete taxon sampling that might otherwise bias results. You can even correct for non-random taxon sampling. An explanation of how to account for both random and non-random taxon sampling is found here.

5.3. Phenotypic evolution

This section is redundant with the preceding section on speciation-extinction, with a few differences.

You must have an ultrametric phylogenetic tree. For optimal performance with the quick-start settings, the tree should be calibrated in units of millions of years (e.g, a branch length of 1.0 implies 1.0 million years). As a template, use the example file template_trait.txt. The default values in this file work reasonably well for most trees calibrated on million-year timescales but may not work for your data.

If you open the template file, you’ll see that there are extensive comments. For each parameter in the BAMM control file, we’ve included a brief description on the line following the parameter. For example:

modeltype = trait

This tells us what parameter modeltype does. It specifies the type of analysis (here, a phenotypic evolution analysis). If we wanted to do a speciation-extinction analysis, we would have set modeltype = speciationextinction.

There are only a handful of parameters in the template file that you need to set explicitly in order to run BAMM. These are currently defined with the following symbol: %%%%. For example, you see the following immediately after modeltype:

treefile = %%%%

This is where you specify the name of your phylogenetic tree. For example, treefile = mytree.tre. Since we are analyzing phenotypes, we also need to specify the location of the trait data, which we do here:

traitfile = %%%%

The trait file should consist of a 2 column text file, with species name followed by a tab, followed by the relevant trait value. Here is an example file of log-transformed primate body masses, and here is the corresponding Newick format tree. You should be able to plug these into the control file and get BAMM to run.

The other parameters we will force you to define explicitly have to do with output. This parameter block looks like this in the template:

numberGenerations = %%%%

mcmcWriteFreq = %%%%

eventDataWriteFreq = %%%%

printFreq = %%%%

acceptanceResetFreq = %%%%

numberOfGenerations is the number of simulation steps you want in your MCMC analysis. printFreq is the frequency that BAMM will write some simple information to the screen so you can track the progress of the run. mcmcWriteFreq and eventDataWriteFreq tells BAMM how often to write the two basic types of output to file. BAMM generates two types of output. The first is a file containing basic attributes of the MCMC chain as you sample it at particular timepoints. This includes: the log-likelihood of the data under the current parameters, the number of diversification shifts in the current state, the log-prior density of the current parameters, and a moving-window average of the acceptance rate for the MCMC simulation. The second type of output is the event data. This is the real stuff of interest. It all parameters associated with the macroevolutionary rate regimes and is used for all the subsequent analyses of evolutionary rates. acceptanceResetFreq specifies the frequency (in generations) in which to reset the acceptance and rejection counts for reporting. Its value should be a divisor of mcmcWriteFreq.

You can set these parameters to whatever you want. However, please remember that you will be working with the event data file in R, which is a bit limited on memory. As a rough guide, we suggest choosing a value for eventDataWriteFreq that gives at least 1000 samples from the posterior, but we also don’t see much advantage to having more than 5000.

For starters, you should try a simple run with settings like this:

numberOfGenerations = 5000
mcmcWriteFreq = 1000
eventDataWriteFreq = 1000
printFreq = 100
acceptanceResetFreq = 1000

You’ll want to increase most of these once you are sure the program is correctly loading your data etc, but it’s a good first check.

As for the speciation-extinction models, the priors you place on phenotypic evolutionary parameters can have a substantial impact on BAMM performance. The prior block in your (trait) template control file looks similar to this:

expectedNumberOfShifts = 1
betaInitPrior = 1
betaShiftPrior = 0.05
useObservedMinMaxAsTraitPriors = 1

These priors may work for your dataset, but they may also be very poor choices: it really depends on the scale of your tree (e.g., depth of the tree) and the variances in your trait values. The function setBAMMpriors (BAMMtools) will automatically generate a prior block as a text file that you can copy and paste over the prior block in the template file. This new set of priors is matched to the “scale” of your data. To do this, you need to install BAMMtools (see here), and you need your phylogenetic tree. Assuming you have a phylogenetic tree file my_tree.tre and a trait dataset my_traitfile.txt, you can generate the prior block with:

> library(BAMMtools) # Assuming you have installed BAMMtools!
> setBAMMpriors(phy = "my_tree.tre", traits = "my_traitfile.txt")

and the relevant output file will be generated in your working directory. See the help file (?setBAMMpriors) for more information. To be clear: this does not optimize priors to your dataset. It simply chooses a set of priors that we have found to be reasonable for most datasets and scales the distributions based on the age (root depth) of your tree and the variance of your trait data. A more complete explanation can be found here.

As of BAMMtools v2.1, you can generate a customized controlfile from within R. Doing the following:

> library(BAMMtools)
> generateControlFile(file = 'divcontrol.txt', type = 'diversification')

will create a template controlfile similar to the one that is available for download from this webpage. Additionally, one can specify BAMM parameter values and have them directly supplied to the controlfile template. The parameters are supplied in the form of a list. Following the whales example used throughout the website, and using evolutionary rate parameter priors supplied by setBAMMpriors, one can easily create a customized controlfile with the following:

> library(BAMMtools)
> generateControlFile('divcontrol.txt', type = 'diversification', params = list(
        treefile = 'whales.tre',
        globalSamplingFraction = '0.98',
        numberOfGenerations = '1000000',
        overwrite = '1',
        lambdaInitPrior = '1.889,
        lambdaShiftPrior = '0.032,
        muInitPrior = '1.889',
        expectedNumberOfShifts = '1'))

5.4. BAMM output: brief

BAMM generates multiple types of output files. These (usually) include:

  • The run_info.txt file, containing a summary of your parameters/settings
  • An mcmc_out.txt or equivalent file, containing raw MCMC information useful in diagnosing convergence
  • An event_data.txt file or equivalent, containing all of evolutionary rate parameters and their topological mappings
  • A prior.txt file or equivalent, giving the prior expectation on the number of shift events (this is optional and can be turned off).
  • A chain_swap.txt file, containing data about each chain swap proposal (when a proposal occured, which chains might be swapped, and whether the swap was accepted).

In general, the post-BAMM workflow consists of:

  1. Reading your MCMC file into R and testing whether your run appears to have converged. We advocate doing this using the coda package for R, which enables you to compute the effective sample size of your log-likelihoods and numbers of rate shifts sampled during the MCMC simulation.
  2. Summarizing your posterior distribution on the number of rate shift events
  3. Loading your event_data.txt file or equivalent into R using the BAMMtools package
  4. Many potential downstream analyses, including summarizing mean evolutionary rates for clades, analyses of rate shift distributions, plotting model-averaged rate-through-time curves, and so on.

A more detailed description of BAMMtools workflows for postprocessing BAMM output can be found here.