Manual
Overview
TEASPOON is a package for the inference of adaptive molecular evolution in serially-sampled population-genetic datasets.
If you use this software, please cite us - this helps us maintain and improve TEASPOON.
Getting started
- Check Java version
- Download TEASPOON
- Prepare data as FASTA
- Open the GUI
- Load your data
- Set an alignment mask
- Run an analysis
- Interpret the output
Data requirements
Input should be population-genetic data from coding sequences, organised into one ancestral ('outgroup') alignment and one or more 'focal' (main) alignments.
Data formatting
Data should be:
- Formatted as FASTA
- Aligned identically across all timepoints, including the ancestral alignment.
- Of equal length.
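The equal-length requirement above can be checked before loading data. The following is a minimal sketch in Python, not part of TEASPOON; the function names `read_fasta` and `alignment_lengths` and the toy sequences are hypothetical and exist only for illustration.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def alignment_lengths(alignments):
    """Return the set of distinct sequence lengths across all alignments.

    `alignments` maps a label (for example a filename) to FASTA text.
    A result with exactly one element means every sequence has the same length.
    """
    lengths = set()
    for text in alignments.values():
        for _name, seq in read_fasta(text):
            lengths.add(len(seq))
    return lengths


# Hypothetical toy data: one ancestral and one focal alignment.
example = {
    "ancestral.fasta": ">anc1\nATGAAA\n>anc2\nATGAAG\n",
    "focal_2006.5.fasta": ">s1\nATGAAA\n>s2\nATG\nAAC\n",
}
lengths = alignment_lengths(example)
print("OK" if len(lengths) == 1 else f"Unequal lengths found: {sorted(lengths)}")
```

This only checks sequence length. It cannot tell whether the files were aligned identically, which still needs to be ensured when the alignments are built.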
GUI operation
Double-click on the GUI icon. Note that you may need to increase the memory allocation; see the installation page.
Loading alignments
From the TEASPOON menu, select 'load a directory'. Navigate to the directory where your input files are. All .FASTA files present will be imported by TEASPOON. You can remove files later.
Inputting date data
TEASPOON tries to guess the date of each alignment file from the last-occurring (right-most) decimal number in the filename. For example, patient_1_2006.5.fasta would give a date of '2006.5', but 2006.5_patient_1.fasta would give a date of '1'.
You can manually enter alignment dates by double-clicking on the date column in the Alignments panel.
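To predict which date will be guessed from a filename, the right-most-number rule described above can be sketched as follows. This is an illustration of the documented behaviour under the assumption that the file extension is ignored; it is not TEASPOON's own code, and `guess_date` is a hypothetical name.

```python
import os
import re


def guess_date(filename):
    """Return the right-most decimal number in the filename stem, or None.

    Assumption: the extension (e.g. '.fasta') is stripped before searching.
    """
    stem, _ext = os.path.splitext(os.path.basename(filename))
    # Match integers or decimals such as '1' or '2006.5'.
    matches = re.findall(r"\d+(?:\.\d+)?", stem)
    return float(matches[-1]) if matches else None


for name in ("patient_1_2006.5.fasta", "2006.5_patient_1.fasta"):
    print(name, "->", guess_date(name))
```

Putting the date at the end of the filename, immediately before the extension, avoids the second case.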
Specify the ancestral alignment
TEASPOON needs to use an ancestral, or 'outgroup', alignment to determine which substitutions are ancestral. If you imported data and TEASPOON automatically guessed dates successfully, then the oldest alignment will have been automatically set as the ancestral one. You can also change the ancestral alignment using the column in the Alignments table.
Note that exactly one alignment should be set as ancestral. TEASPOON can't run if zero alignments, or more than one, are selected.
Set an alignment mask
Each alignment mask is used to specify a region of the alignment, a rate estimation method, and optionally a neutral ratio. Click 'add mask'. There are three options:
- NEUTRAL_RATIO_FIXED: no rate estimation is performed, and the given ratio (any decimal value from zero up) is used to infer the number of substitutions.
- NEUTRAL_RATIO_AGGREGATED: all the mainfiles' datasets are combined into one big alignment, and the neutral ratio is inferred from the mid-frequency site class in this alignment. This neutral ratio is then applied to each of the mainfile datasets in turn to estimate substitutions in the high-frequency site class as a fixed neutral ratio.
- NEUTRAL_RATIO_AVERAGED: neutral ratio is inferred separately for each of the mainfiles, then their values averaged to produce a dataset estimate. This is then used for all datasets in a fixed-ratio analysis as above.
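The difference between the aggregated and averaged options can be illustrated with a small Python sketch. It rests on two assumptions that this manual does not state: that the neutral ratio is the number of replacement changes divided by the number of silent changes in the mid-frequency class, and that pooling per-dataset counts is an acceptable stand-in for combining the alignments (in a real combined alignment the site frequencies, and so the mid-frequency class itself, would change). The function names and counts are hypothetical.

```python
def ratio(replacement, silent):
    """Assumed definition: replacement / silent counts in the mid-frequency class."""
    return replacement / silent


def aggregated_ratio(counts):
    """Pool the counts from every dataset, then take a single ratio."""
    total_replacement = sum(r for r, _s in counts)
    total_silent = sum(s for _r, s in counts)
    return ratio(total_replacement, total_silent)


def averaged_ratio(counts):
    """Take one ratio per dataset, then average the ratios."""
    ratios = [ratio(r, s) for r, s in counts]
    return sum(ratios) / len(ratios)


# Hypothetical (replacement, silent) mid-frequency counts for two main files.
counts = [(2, 4), (9, 6)]
print("aggregated:", aggregated_ratio(counts))
print("averaged:  ", averaged_ratio(counts))
```

In this toy case the two estimates differ because the second dataset contributes more sites, so it carries more weight when counts are pooled than when ratios are averaged.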
Generate an approximate site-frequency spectrum
To perform a quick approximate site-frequency spectrum analysis, click 'show site-frequency spectrum'. This performs a quick-and-dirty site-frequency estimation using 20 bins from 0-1.
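As a rough picture of what a 20-bin spectrum over 0-1 looks like, the sketch below bins a list of site frequencies into equal-width bins. It is an illustration only, not TEASPOON's implementation; the function name, the handling of a frequency of exactly 1.0, and the example frequencies are assumptions.

```python
def site_frequency_spectrum(frequencies, n_bins=20):
    """Count frequencies falling into n_bins equal-width bins spanning 0 to 1.

    Assumption: a frequency of exactly 1.0 is placed in the last bin.
    """
    counts = [0] * n_bins
    for f in frequencies:
        if not 0.0 <= f <= 1.0:
            raise ValueError(f"frequency out of range: {f}")
        index = min(int(f * n_bins), n_bins - 1)
        counts[index] += 1
    return counts


# Hypothetical per-site frequencies.
spectrum = site_frequency_spectrum([0.0, 0.02, 0.5, 0.97, 1.0])
for i, count in enumerate(spectrum):
    if count:
        print(f"bin {i:2d} [{i / 20:.2f}, {(i + 1) / 20:.2f}): {count}")
```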
Determine site-class bins' intervals
The site-classes (low, medium and high) are used in the estimation as follows:
- Low-frequency substitutions are ignored
- Mid-frequency changes are used in neutral ratio inference if using an 'aggregated' or 'averaged' analysis (or ignored if using a fixed-ratio method)
- High-frequency substitutions are counted for adaptive substitutions.
By default, these bins' intervals are set to 0.0-0.15, 0.15-0.75 and 0.75-1.0 respectively. To change these intervals, click 'select site-class bin intervals'.
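The default intervals can be expressed as a small classification function. The Python sketch below assumes that a frequency lying exactly on a boundary belongs to the higher class; the manual does not say how boundaries are treated, and `site_class` is a hypothetical name.

```python
DEFAULT_BOUNDS = (0.15, 0.75)  # upper limits of the low and mid classes


def site_class(frequency, bounds=DEFAULT_BOUNDS):
    """Classify a site frequency as 'low', 'mid' or 'high'.

    Assumption: intervals are half-open, so 0.15 is 'mid' and 0.75 is 'high'.
    """
    low_upper, mid_upper = bounds
    if not 0.0 <= frequency <= 1.0:
        raise ValueError(f"frequency out of range: {frequency}")
    if frequency < low_upper:
        return "low"   # ignored
    if frequency < mid_upper:
        return "mid"   # used for neutral ratio inference (aggregated/averaged)
    return "high"      # counted towards adaptive substitutions


for f in (0.05, 0.5, 0.9):
    print(f, "->", site_class(f))
```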
Other settings
You can also set the number of bootstrap replicates (fewer than 30 is essentially meaningless) and enable an optional sliding-window analysis (this slows the analysis down considerably).
Run a full analysis
Click 'run analysis' to run the analysis.
Scatterplot interpretation
The scatterplots show two quantities: the number of adaptive (non-neutral) substitutions plotted against time (plus regressions for any bootstrap replicates), and the excess adaptive substitutions by site-frequency class.
Command-line operation
Open a console. For the CLI app v0.1.4, usage takes one of the following forms:
java -jar ../../jarfiles/CLI-app-v0.1.4.jar <mask> <ancestral> <output> <mainfiles>
java -jar ../../jarfiles/CLI-app-v0.1.4.jar <mask> <ancestral> <output> <mainfiles> <ratio>
java -jar ../../jarfiles/CLI-app-v0.1.4.jar <mask> <ancestral> <output> <mainfiles> <bootstraps>
java -jar ../../jarfiles/CLI-app-v0.1.4.jar <mask> <ancestral> <output> <mainfiles> <bootstraps> <ratio>
Where:
- Mask is a file containing one or more masklines. These are usually generated by the maskGenerator (hardcoded for now; runnable entrypoint TODO #18, but see the maskfiles in the examples folder).
- Ancestral is a .fasta file containing the outgroup.
- Output is where the main output will go (must be writable).
- Mainfiles is a comma-separated list of one or more main (focal) alignments.
- In the 5-argument version, the next argument can be either an integer (interpreted as the number of bootstraps) or a double (interpreted as the neutral ratio).
- In the 6-argument version, arguments 5 and 6 are parsed as the number of bootstrap replicates and the neutral ratio, respectively.
Verbose (debug) output is enabled by prepending 'true' to the argument list, e.g.
java -jar ../../jarfiles/CLI-app-v0.1.4.jar true <mask> <ancestral> <output> <mainfiles>
java -jar ../../jarfiles/CLI-app-v0.1.4.jar true <mask> <ancestral> <output> <mainfiles> <ratio>
java -jar ../../jarfiles/CLI-app-v0.1.4.jar true <mask> <ancestral> <output> <mainfiles> <bootstraps>
java -jar ../../jarfiles/CLI-app-v0.1.4.jar true <mask> <ancestral> <output> <mainfiles> <bootstraps> <ratio>
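The argument rules described above can be summarised in a short Python sketch. It shows one way to read the documented behaviour and is not the CLI's actual parser; the function name, the returned dictionary keys, and the example paths are hypothetical.

```python
def parse_cli_args(args):
    """Interpret a TEASPOON-style argument list as described in this section."""
    args = list(args)
    verbose = False
    # An optional leading 'true' switches on verbose output.
    if args and args[0] == "true":
        verbose = True
        args = args[1:]
    if not 4 <= len(args) <= 6:
        raise ValueError("expected 4 to 6 arguments after the optional 'true'")

    mask, ancestral, output, mainfiles = args[:4]
    bootstraps, ratio = None, None
    if len(args) == 5:
        # A fifth argument that parses as an integer is the bootstrap count;
        # otherwise it is treated as a double giving the neutral ratio.
        try:
            bootstraps = int(args[4])
        except ValueError:
            ratio = float(args[4])
    elif len(args) == 6:
        bootstraps = int(args[4])
        ratio = float(args[5])

    return {
        "verbose": verbose,
        "mask": mask,
        "ancestral": ancestral,
        "output": output,
        "mainfiles": mainfiles.split(","),  # comma-separated focal alignments
        "bootstraps": bootstraps,
        "ratio": ratio,
    }


# Hypothetical file names, used only to exercise the sketch.
print(parse_cli_args(["true", "mask.txt", "anc.fasta", "out.txt",
                      "t1.fasta,t2.fasta", "100", "0.5"]))
```

Under this reading, a neutral ratio given as the only extra argument should contain a decimal point (for example 1.0 rather than 1), since a bare integer would be taken as a bootstrap count.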
Interpreting your data
To do.
Troubleshooting
Check you can run the example files above. See the FAQ and contact pages for more.