Wednesday, January 04, 2017

Do seahorses evolve faster?

Genome sequencing is becoming so routine that it's difficult to publish your new genome sequence in a top journal. The trick is to find something unique and exciting about your genome so you can attract the attention of the leading journals. The latest success is the seahorse genome published in the Dec. 15, 2016 issue of Nature (Lin et al., 2016).

The species is the tiger tail seahorse Hippocampus comes. The assembled genome is 502Mb or about 1/6th the size of the human genome. The seahorse has 23,458 genes (protein-coding?) or about the same number as most other vertebrates. About 25% of the genome is junk (transposon-related).1

So, what's unique about the seahorse? Just look at the photo. The seahorse doesn't look like any other fish. It has all kinds of specific features that make it a very weird fish. We don't know if these derived features are adaptive or not but we do know they have evolved in the seahorse lineage over the past 100 million years.

By itself, this phenotypic uniqueness probably wouldn't be enough to merit publication of the seahorse genome in Nature. Rapid evolution of phenotypes is not unusual and it's hard to pinpoint the changes at the molecular level. However, in this case the authors claim to have discovered a higher overall rate of evolution in seahorses compared to other fish.

They looked at a set of 4,122 orthologous genes and calculated mutation rates. The results are shown in Figure 1 in the paper.


Figure 1 | Adaptations and evolutionary rate of H. comes. a, Schematic diagram of a pregnant male seahorse. b, The phylogenetic tree generated using protein sequences. The values on the branches are the distances (number of substitutions per site) between each of the teleost fishes and the spotted gar (outgroup). Spotted gar, Lepisosteus oculatus; zebrafish, Danio rerio.
The differences in distance are quite small, ranging from 94% to 99% of the seahorse value (0.463) in the major clade. Nevertheless, the authors claim the difference is statistically significant. They also looked specifically at neutral changes and found the same thing: faster in seahorses. The implication is that the strange morphological differences between seahorses and other species of fish can be explained by a faster mutation rate.

Here's the problem. I have no idea how they came up with these numbers. I can't possibly evaluate the quality of their data to know whether it's believable or not. Clearly the Nature referees thought it was good enough to publish. Those referees must be experts in this kind of analysis. Can someone out there help me understand the quality of this analysis? Here's the description of their method.
We obtained 4,122 one-to-one orthologous genes from the gene family analysis (Supplementary Information, section 4.1). The protein sequences of one-to-one orthologous genes were aligned using MUSCLE (ref. 48) with the default parameters. We then filtered the saturated sites and poorly aligned regions using trimAl (ref. 49) with the parameters “-gt 0.8 –st 0.001 –cons 60”. After trimming the saturated sites and poorly aligned regions in the concatenated alignment, 2,128,000 amino acids were used for the phylogenomic analysis. The trimmed protein alignments were used as a guide to align corresponding coding sequences (CDSs). The aligned protein and the fourfold degenerate sites in the CDSs were each concatenated into a super gene using an in-house Perl script.

The phylogenomic tree was reconstructed using RAxML version 8.1.19 (ref. 50) based on concatenated protein sequences. Specifically, we used the PROTGAMMAAUTO parameter to select the optimal amino acid substitution model, specified spotted gar as the outgroup, and evaluated the robustness of the result using 100 bootstraps. To compare the neutral mutation rate of different species, we also generated a phylogeny based on fourfold degenerate sites. The phylogenomic topology was used as input and the “-f e” option in RAxML was used to optimize the branch lengths of the input tree using the alignment of fourfold degenerate sites under the general time reversible (GTR) model as suggested by ModelGenerator version 0.85 (ref. 51). We calculated the pairwise distances to the outgroup (spotted gar) based on the optimized branch length of the neutral tree using the cophenetic.phylo module in the R-package APE (ref. 52). The Bayesian relaxed-molecular clock (BRMC) method, implemented in the MCMCTree program (ref. 53), was used to estimate the divergence time between different species. The concatenated CDS of one-to-one orthologous genes and the phylogenomics topology were used as inputs. Two calibration time points based on fossil records, O. latipes–T. nigroviridis (~96.9–150.9 million years ago (Mya)), and D. rerio–G. aculeatus (~149.85–165.2 Mya) (http://www.fossilrecord.net/dateaclade/index.html), were used as constraints in the MCMCTree estimation. Specifically, we used the correlated molecular clock and REV substitution model in our calculation. The MCMC process was run for 5,000,000 steps and sampled every 5,000 steps. MCMCTree suggested that H. comes diverged from the common ancestor of stickleback, Nile tilapia, platyfish, fugu, and medaka approximately 103.8 Mya, which corresponds to the Cretaceous period.
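
The "pairwise distances to the outgroup" in the quoted methods are, in essence, sums of branch lengths along the path between two tips of the tree. The Python sketch below illustrates that idea only; it is not the APE implementation, and the tree shape, taxon labels, and branch lengths are all made up for illustration rather than taken from the paper.

```python
# Toy rooted tree: child -> (parent, branch length in substitutions per site).
# All names and numbers are hypothetical, chosen only to illustrate the idea.
TREE = {
    "outgroup": ("root", 0.20),
    "ingroup": ("root", 0.05),
    "fish_a": ("ingroup", 0.21),
    "clade_bc": ("ingroup", 0.10),
    "fish_b": ("clade_bc", 0.11),
    "fish_c": ("clade_bc", 0.12),
}


def path_to_root(tree, node):
    """Map each ancestor of `node` (and `node` itself) to its distance from `node`."""
    dist = {node: 0.0}
    total = 0.0
    while node in tree:
        parent, length = tree[node]
        total += length
        dist[parent] = total
        node = parent
    return dist


def patristic_distance(tree, a, b):
    """Sum of branch lengths on the path between tips `a` and `b`."""
    up_a = path_to_root(tree, a)
    up_b = path_to_root(tree, b)
    # With non-negative branch lengths, the shared ancestor giving the
    # smallest total is the most recent common ancestor.
    return min(up_a[n] + up_b[n] for n in up_a if n in up_b)


distances = {
    tip: patristic_distance(TREE, tip, "outgroup")
    for tip in ("fish_a", "fish_b", "fish_c")
}
for tip, d in distances.items():
    print(f"{tip}: {d:.3f} substitutions per site from the outgroup")
```

In this toy tree, fish_c ends up slightly farther from the outgroup than the other two tips only because one terminal branch is a little longer, which is the kind of difference the paper's comparison rests on.
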
Is the latest version of MUSCLE a good-enough alignment program to allow you to distinguish between a difference of 0.463 and 0.460 or 0.454? Are the other programs this good?
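
To put those numbers in proportion, the short Python calculation below expresses each distance relative to the seahorse value. It uses only the three values quoted above; the function name is just an illustrative choice.

```python
def relative_gap(reference, other):
    """Fractional shortfall of `other` relative to `reference`."""
    return (reference - other) / reference


SEAHORSE = 0.463         # seahorse distance quoted in the post
OTHERS = (0.460, 0.454)  # the two comparison values quoted in the post

gaps = {d: relative_gap(SEAHORSE, d) for d in OTHERS}
for d, gap in gaps.items():
    print(f"{d:.3f} vs {SEAHORSE:.3f}: {gap:.1%} shorter")
```

The gaps come out at roughly 0.6% and 1.9%, so that is the level of precision the alignment and tree-building steps would need to deliver.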

How do the calibration time points figure into the calculation? Does it make a difference if these time points are off by 5% or so?
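
One rough way to frame the second question is sketched below, under a deliberately oversimplified strict-clock assumption (the paper used a relaxed clock, so this is not their calculation). An absolute rate is a distance divided by a time, so shifting the time rescales every absolute rate, while the ratio between two lineages measured over the same time span does not change. The time value is a hypothetical placeholder, not a number from the paper.

```python
def absolute_rate(distance, time_myr):
    """Substitutions per site per million years, assuming a strict clock."""
    return distance / time_myr


D_SEAHORSE, D_OTHER = 0.463, 0.454  # distances quoted in the post
T_ASSUMED = 300.0                   # hypothetical time in Myr, NOT from the paper

results = {}
for scale in (0.95, 1.00, 1.05):    # calibration off by -5%, 0%, +5%
    t = T_ASSUMED * scale
    r_sea = absolute_rate(D_SEAHORSE, t)
    r_other = absolute_rate(D_OTHER, t)
    results[scale] = (r_sea, r_other, r_sea / r_other)
    print(f"time x{scale:.2f}: seahorse {r_sea:.6f}, other {r_other:.6f}, "
          f"ratio {r_sea / r_other:.4f}")
```

Under these assumptions a 5% error in the time changes each absolute rate by about 5% but leaves the seahorse-to-other ratio alone. A relaxed-clock analysis with several calibrations need not behave this simply.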

The idea that seahorses evolve faster than other fish will now be incorporated into the scientific literature as a result of this publication. (See the cover of Nature, left.) But is it true?

There was a time when I could read a scientific paper in my field and evaluate the quality of the work and the validity of the conclusions. That time has passed with the reliance on big data and computer programs. Now I have to rely on the (presumably) expert reviewers to evaluate the quality of the work. That's a problem since we have plenty of evidence that the peer review process is seriously flawed.


Photo Credit: Aquariums Vietnam - International

1. Human genes take up about 30% of the total genome or approximately 960Mb. This is more DNA than the total genome of the seahorse. I assume the seahorse genes have smaller introns but that's not mentioned in the paper.

Lin, Q., Fan, S., Zhang, Y., Xu, M., Zhang, H., Yang, Y., Lee, A.P., Woltering, J.M., Ravi, V., and Gunter, H.M. (2016) The seahorse genome and the evolution of its specialized morphology. Nature, 540:395-399. [doi: 10.1038/nature20595]

21 comments :

  1. "Is MUSCLE48 a good-enough alignment program to allow you to distinguish between a difference of 0.463 and 0.460 or 0.454? Are the other programs this good?"

    It's just Muscle (48 is the number for the reference), which is certainly a standard enough multiple sequence alignment program. There are certainly different algorithms that are probably more accurate (under certain circumstances), but are often much slower.

    This line is interesting:
    "We then filtered the saturated sites and poorly aligned regions using trimAl (ref. 49) with the parameters “-gt 0.8 –st 0.001 –cons 60”."

    I believe that this means they filtered out alignment positions with gaps in more than 20% of the sequences, and positions with an average similarity score less than 0.001 (the score uses a protein substitution matrix such as BLOSUM and measures how similar, on average, the amino acids in the same alignment column are to each other), unless this would leave less than 60% of the positions in the alignment, in which case the top 60% of the alignment is retained.

    Not having read the paper or the supplementary methods information, I'm curious how sensitive the overall results are to these specific parameter choices.

    Replies
    1. My experience with multiple sequence alignments is that you can always do a better job than the program. However, it's just not possible to align by hand with so much data. Some of the older programs (CLUSTAL) weren't very good, at least not at the level required to distinguish between rates that differ by only a few percent. I've tried earlier versions of MUSCLE and I wasn't impressed with its ability to deal with gaps.

      The more data you toss out the greater the risk of missing something important, no? And not counting gaps can lead to false phylogenies. I'm not even sure that similarity, as opposed to identity, improves accuracy.

      In any case, I suspect the error bars at each step of their analyses lead to a cumulative uncertainty that's much greater than the final scores they report. I can't imagine what kind of statistical test they performed to account for those possible errors.

      The bootstrap scores are all 100. That looks very suspicious to me. Perhaps Joe can help me understand?

    2. "My experience with multiple sequence alignments is that you can always do a better job than the program."

      Mine too. But it's so tedious.

      Given enough data, you can easily get a bootstrap score of 100 for data with even a very slight bias, whether that bias is phylogenetic or otherwise. I see no reason to doubt the bootstrap.

    3. "My experience with multiple sequence alignments is that you can always do a better job than the program. However, it's just not possible to align by hand with so much data. Some of older programs (CLUSTAL) weren't very good—at least not at the level required to distinguish between rates that differ by only a few percent. I've tried earlier versions of MUSCLE and I wasn't impressed with its ability to deal with gaps. "

      Beyond the intractability of doing thousands of alignments by hand, it's also impossible for somebody to repeat the analysis later even if they have the same data. I don't particularly like how MUSCLE deals with gaps either. Other programs such as PRANK or PAGAN do a better job inserting gaps in more reasonable places, IMO, but they are much, much slower.

      "In any case, I suspect the error bars at each step of their analyses lead to a cumulative uncertainty that's much greater than the final scores they report. I can't imagine what kind of statistical test they performed to account for those possible errors."

      My guess is that they did little to account for the accumulating error potential.

      I once heard Casey Dunn give an interesting talk about attempting to account for all the uncertainty produced during each individual step of a large-scale analysis (ideas summarized in this paper: http://www.sciencedirect.com/science/article/pii/S0169534715003043), but AFAIK actually implementing some sort of workflow-wide measure of total uncertainty is not standard (nor perhaps even feasible).

    4. Larry: "My experience with multiple sequence alignments is that you can always do a better job than the program"

      What do you mean by a "better job"? If you could define what "better" means in a rigorous fashion, you could write a program to do what you want. Yes, in grad school I remember working with manual alignment editors, but it always seemed dangerous to me -- much like with a Ouija board, you could subconsciously guide the result to get what you want rather than being consistent.

  2. Perhaps it's time to consider the difference between "significant" and "meaningful". Supposing they actually do have enough statistical power to distinguish .463 from .460, is it at all sensible to call this a meaningful difference in rate of evolution? Does it really need an explanation, and is it capable of explaining anything?

    Replies
    1. Yeah, why would they think the particular properties of the seahorse are attributable to this slightly increased rate of molecular evolution, rather than to the particular nature of those changes?

      Isn't it much more likely that the unique attributes of the organism are due to particular specific mutations in combination, rather than the mere number of them? Do they even speak about this possibility in the paper?

    2. Well past time if you ask me. Everyone should read The Insignificance of Statistical Significance Testing, Douglas H. Johnson, 1999.
      http://www.auburn.edu/~tds0009/Articles/Johnson%201999.pdf

  3. Some notes:
    1) http://palaeo.gly.bris.ac.uk/fossilrecord2/dateaclade/index.html is the correct link (the paper's URL, http://www.fossilrecord.net/dateaclade/index.html, leads to a plumbing website). That implies none of the reviewers actually checked the references. The authors also use the term "fossil record" incorrectly here (I've corrected this usage in a couple of manuscripts; for some reason molecular biologists like to use "a fossil record" where "a fossil" or "a fossil occurrence" would be appropriate. The "fossil record of taxon A" always refers to all fossils that have been assigned to taxon A. For dating, usually only the oldest one (the first appearance of taxon A in the fossil record in toto) is relevant, and that's just a single occurrence).
    2) There are 3 nodes in their phylogeny which have fossil dates on the website above. Why only 2 were used is unclear.
    3) The website does give the reference for these ages: Benton, M.J. and Donoghue, P.C.J. (2006). Paleontological Evidence to Date the Tree of Life. Mol Biol Evol 24:26-53. But it's not as if citations were relevant for paleontologists, so you could just give a link to an amateur plumbing website, rather than refer to the paper.
    4) While we're on this, note that the paper is over 10 years old. Before you use this data, you should at least check if any relevant fossils have been described recently. Generally the optimal way to do this is to get in contact with paleontologists working on relevant taxa, because they will be familiar with the literature. Failing that, there's the PaleoDB, which for any given taxon will give you a list of candidate fossils, each with references for the paper in which they are described and a reference for the age of the locality. There's also Parham et al. (2012). Best practices for justifying fossil calibrations. Syst Biol 61:346–359. (note that Parham et al. give 5 criteria for a well justified fossil calibration. This paper meets precisely 0 of them)
    5) The practice of Benton and Donoghue to derive soft maxima for first appearance dates (FADs) from FADs of outgroups is at least controversial (and I would personally advocate against it).

    Replies
    1. 6) It's still unclear how the calibration points were implemented in MCMCtree. MCMCtree offers multiple options, and from the data given several could have been used. More importantly, MCMCtree requires(!) a soft maximum age for the root. This is not given here at all. Also missing is a statement on whether the approximate likelihood method was used. It makes sense here and is pretty much standard, but it should still be mentioned.
      7) "Specifically, we used the correlated molecular clock and REV substitution model in our calculation."
      And here is another whopper. The default setting in MCMCtree is the independent clock model. You could also implement the more recent DPP, which isn't fully tested, or the correlated rates model. Now, there is a reason independent rates (IR) are the default and correlated rates (CR) are not. Both IR and CR were used in early relaxed clock dating, and the inclusion of CR in MCMCtree for legacy reasons is sometimes useful. But one big issue with CR is that it can lead to very large errors in estimated mutation rates (orders of magnitude too high or too low). In the best case it makes errors comparable with IR, but it can go badly wrong. We've recently performed a CR analysis because a reviewer asked us to ("just for comparison"), but we give several references in a paragraph on why IR is more reliable. There are some theoretical ideas on how CR could be improved, but the implementations that exist just make it more likely to generate a spectacularly wrong result. At the very least the use of the CR model warrants discussion, and even more so if a central claim of the paper is an elevated evolutionary rate, since CR is known to regularly produce erroneously high evolutionary rates.

      @Dave: Filtering out positions with more than 20% gaps among taxa is reasonable for use with MCMCtree, since the program is prone to crashes otherwise (dos Reis, pers com).

    2. "@Dave: Filtering out positions with more than 20% gaps among taxa is reasonable for use with MCMCtree, since the program is prone to crashes otherwise (dos Reis, pers com)."

      Interesting, thanks!

  4. As far as I can judge, the programs used by the authors have been thoroughly tested, and e.g. I would find it highly unlikely that the higher rates in seahorses result from alignment errors produced by MUSCLE. There are a number of reasons why the branch lengths in the tree could be biased, including topological errors, incomplete lineage sorting, and introgression. So the branch lengths alone, and the rates calculated from them, would not be sufficient to conclude that the rate of evolution is higher in sea horses. However, the rate tests conducted by the authors were based on a set of three-species comparisons of the numbers of lineage-specific substitutions, which, as I see it, should not be affected by the same issues as the branch lengths in the tree of all 9 species (Fig. 1). On the other hand, the rate tests could be affected by other issues such as unidentified paralogous genes. I would doubt that this possibility can be excluded. Fortunately, as more genomes of teleost fishes are coming out (the genome of a closely related pipefish was published very recently: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-1126-6), we will soon find out whether the observed higher rate of evolution in seahorses is real or not.

    Replies
    1. The branch lengths in a tree should represent the amount of change since the last common ancestor (node). To a first approximation, the branch lengths represent real biological events such as mutation rate and fixation of alleles.

      The authors calculated branch lengths based on imperfect data (DNA sequences), imperfect sequence alignments, various assumptions about evolutionary change, and computer programs that attempt to mimic real biological evolution. We all hope that the end result is a value that corresponds, or at least correlates, with the real biological branch lengths. But let's not pretend that those values are so incredibly accurate that they can distinguish between branch lengths that differ by only a few percent.

      Perhaps it would have been better if the authors had simply stated that according to their calculations the seahorse branch length is a little bit longer than that of some other fish. This may be due to uncontrolled variables in their data or their programs or it may be due to some underlying biological phenomenon, such as an unusual increase in mutation/fixation rate.

      This kind of statement would probably not make the cover of Nature.

  5. I think it would have been more interesting if they had shown that the differences were concentrated in particular regions, such as the Hox clusters, the BMP, FGF loci etc.

    Replies
    1. Why would they be concentrated in particular regions?

    2. The changes would be concentrated in the coding and regulatory regions of developmental control genes. The changes MUST be there, but I would have assumed that the signal would be swamped by drift.
      It seems to me that, considering morphology, it's just a fact that seahorses have evolved faster. Their basic body plan has changed more than any other group of teleosts in X amount of time. The question is can you find that signature in the genome as well? As I mentioned above, I think the relevant mutations are too small in number relative to drift. ...... but what with all these fancy super-sequencers and mega-computers nowadays...who knows??

  6. Okay, I legitimately thought the journalists and/or press officers were making that claim, because a quick glance at the phylogeny in the paper did not in any way strike me as having something especially fast-evolving in it. Didn't realise the authors were making that claim, with a couple % difference, and such minimal taxon sampling (inherent to phylogenomics, which is why you don't generally use it quantitatively (yet)). That's cute.

    The "Yet Another Genome" papers are a problem of the overall publishing system and shoddy funding for "infrastructure" work -- stuff that's not very exciting on the surface, but necessary for later analysis. Genomes are expensive and proper assembly and annotation extremely time consuming (and not subject to Moore's Law), so I can see why people are a bit desperate to find some excuse to stick the results in a shinier journal. This genome will be useful for later analyses, whatever they may be, but that's not something you can immediately work out following a completed genome project, so there's not really an initial good reward for spending all the money and effort. I think we really do need more funding for this sort of "infrastructure" -- data that is inherently useful but not particularly exciting in isolation. Then we might see fewer obnoxious claims like this. Maybe.

  7. To me, the programs are not the main concern. When comparing species I am much more worried about gene models. Gene models are usually of poor quality and, most importantly, the quality is not uniform across species. In our analysis of the Iberian lynx genome we found that gene-model quality had a great impact, and we had to come up with innovative filtering strategies to get reliable results. An alternative solution was to not use gene models but genomic alignments of non-coding sequences, though for this you need closely related species.

    Replies
    1. "Closely related" meaning what? In birds, neutrally evolving sequences can be fairly well aligned for divergences of at least 100 million years, given reasonably dense taxon sampling. In crocodylians, perhaps even further, though the ages of nodes there are still not clear to me. Has anyone tried aligning junk sequences in teleosts at any taxonomic depth?

    2. I wasn't very precise and didn't know neutrally evolving sequences of birds could be aligned well even for divergences up to 100 million years, thanks for the information.

  8. I wouldn't talk about "mutation rates". The rates we see in sequences are the result of mutation rates, drift, selection, fixation...

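Two points raised in the comments can be illustrated together: the three-species rate comparisons mentioned in comment 4 and the distinction between "significant" and "meaningful" in comment 2. The Python sketch below uses a chi-square statistic of the form found in Tajima's relative rate test, where m1 and m2 are the numbers of substitutions specific to each of two lineages relative to an outgroup. Whether the paper used exactly this test is an assumption, and the counts are invented for illustration.

```python
import math


def relative_rate_test(m1, m2):
    """Chi-square statistic (1 degree of freedom) and p-value for the
    hypothesis that two lineages accumulate substitutions equally fast.

    m1, m2: lineage-specific substitution counts (hypothetical inputs).
    """
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# The same 1% excess in one lineage, at two very different data sizes.
small = relative_rate_test(1_010, 1_000)
large = relative_rate_test(1_010_000, 1_000_000)

print(f"thousands of sites: chi2 = {small[0]:.3f}, p = {small[1]:.3g}")
print(f"millions of sites:  chi2 = {large[0]:.3f}, p = {large[1]:.3g}")
```

With a thousand or so substitutions per lineage the 1% excess is nowhere near significant, while with a million it is overwhelmingly so, even though the size of the effect is identical. Statistical significance therefore says little about whether a difference of this size is biologically meaningful.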