Thursday, January 05, 2017

Birth and death of genes in a hybrid frog genome

De novo genes [1] are quite rare but genome duplications are quite common. Sometimes the duplicated regions contain genes so the new genome contains two copies of a gene that was formerly present in only one copy. "Common" in this sense means on a scale of millions of years. Michael Lynch and his colleague have calculated that the rate of fixed gene duplication is about 0.01 per gene per million years (Lynch and Conery, 2003 a,b; Lynch 2007). Since a typical vertebrate has more than 20,000 genes, this means that about 200 genes will be duplicated and fixed every million years.
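The arithmetic behind that last figure is easy to check. Here is a minimal sketch that uses only the rate and the gene count quoted above; the function and variable names are made up for illustration.

```python
# Back-of-the-envelope check of the duplication rate quoted above.
rate_per_gene_per_my = 0.01   # fixed duplications per gene per million years
gene_count = 20_000           # rough gene number for a typical vertebrate


def expected_fixed_duplications(rate, genes, million_years=1.0):
    """Expected number of fixed gene duplications over the given time span."""
    return rate * genes * million_years


per_my = expected_fixed_duplications(rate_per_gene_per_my, gene_count)
print(f"about {per_my:.0f} fixed duplications per million years")
```

Passing a different `million_years` value scales the estimate linearly, which assumes the rate stays constant over the whole interval.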

The initial duplication event is likely to be deleterious since there will now be redundant DNA in the genome. The slightly deleterious allele (duplication) can be purged by negative selection in species with large population sizes (e.g. bacteria). But in species with smaller populations, natural selection is not powerful enough to eliminate slightly deleterious alleles so the duplication persists and may become fixed in the population.

Following the "birth" of a new gene by duplication, there are several possible fates for the duplicated gene in such species. They have been explored by Masatoshi Nei and his collaborators over the past 30 years. The process is usually referred to as "Birth-and-Death Evolution" (see Nei and Rooney, 2005). The basic idea goes back to Susumu Ohno (e.g. Ohno, 1972; see Meyer and Van de Peer, 2003). Here are the possible fates.
  1. One of the genes will "die" by acquiring fatal mutations. It becomes a pseudogene.
  2. One of the genes will die by deletion.
  3. Both genes will survive because having extra gene product (e.g. protein) will be beneficial (gene dosage).
  4. One of the genes acquires a new beneficial mutation that creates a new function and at the same time causes loss of the old function (neofunctionalization). Now both genes are retained by positive selection and the complexity of the genome has increased.
  5. Both genes acquire mutations that diminish function so the genome now needs two copies of the gene in order to survive (subfunctionalization).
These five fates constitute birth-and-death evolution in gene families.

Birth and death can be studied by looking at related lineages or by looking at large gene families. (The original idea came from studying genes at the MHC locus in mammals.) But there's a much easier way of getting information on this phenomenon and that's by looking at species that have undergone whole genome duplications (WGD) to create tetraploid cells. In those cases, every gene has been duplicated and you can follow the fate of every duplicate over time.

About a year ago (Nov. 2015) I reported on a study of the salmon genome which underwent a genome duplication event about 96 million years ago [The birth and death of salmon genes]. In that species, about half the genes have been lost by deletion or by fatal mutations leading to a pseudogene. The other half appear to be functional. Most of them seem to have the same function as the original gene suggesting that there hasn't been enough time to inactivate them by mutation.

A similar study was done in carp where the genome duplication event took place only 8 My ago.

The latest study looks at the genome of the frog, Xenopus laevis (Session et al., 2016). The genome of this frog is essentially tetraploid; it derives from a hybridization between two species about 18 My ago. This was immediately followed by whole genome duplication (WGD) to restore chromosome pairing during meiosis. The two (unknown) progenitor species diverged about 34 My ago.

There's a nice News & Views article to accompany the paper: Genomics: A matched set of frog sequences (Burgess, 2017).

The genome has 45,000 protein-coding genes and many uncharacterized genes for functional RNAs. Recall that typical vertebrates have about 20,000 genes so this tetraploid (allotetraploid) species should have had about 80,000 genes originally. The authors concentrated on those genes that were clearly present in a related species (Xenopus tropicalis). Most were present in only one or two copies but some were part of large gene families. There were 8,806 homologous pairs (17,612 genes) and 6,807 single copy genes. Clearly, a great many genes have been lost over the past 18 My since the genome duplication event.
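For a rough sense of scale, the two counts in the previous paragraph can be turned into a retention fraction. This is only a sketch: treating each homologous pair or single copy gene as one ancestral locus is a simplifying assumption made here, not a description of the authors' method, and the variable names are illustrative.

```python
# Counts reported above for genes with a clear match in the related species.
pairs = 8_806      # loci where both duplicate copies are still present
singles = 6_807    # loci where only one copy is still present

loci = pairs + singles                 # ancestral loci in this subset (assumed)
retained_genes = 2 * pairs + singles   # gene copies still present
fraction_two_copies = pairs / loci     # share of loci that kept both copies

print(f"{loci} loci, {retained_genes} retained genes, "
      f"{fraction_two_copies:.1%} of loci still in two copies")
```

On these numbers a little over half of the loci still have both copies, which fits the point that a great many duplicates have already been lost.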

The analyses looked at which pairs of genes from the two different genomes survive. The authors found that genes were preferentially lost from the chromosomes of one of the parent species and not from the other. The reason for this preferential loss is not known.

Of the genes that are missing, the authors estimate that about 36% of the duplicate genes were deleted and about 64% became non-functional pseudogenes. The birth-and-death rates in Xenopus laevis are about the same as in other species that have been looked at, although there may be a slight tendency to retain more duplicate copies than in salmon or carp. It's possible that dosage effects are more important in this hybrid.

You can't easily determine whether some duplicated copies have acquired additional functions by neofunctionalization or subfunctionalization. The gene expression studies indicate that most duplicated genes have the same expression profile, suggesting they probably still have the same function. Some are different and that's an indication of divergence.

All-in-all, the results are consistent with the basic concepts of birth-and-death and neutral evolution. Following duplication, the large genome gradually evolves back to the original number of genes but many of the duplicated sequences are retained as junk DNA. That's the same thing that happens with local DNA duplications.

1. Genes that arise from DNA sequences that are not genes in closely related species.

Lynch, M., and Conery, J.S. (2003a) The evolutionary demography of duplicate genes. Journal of structural and functional genomics, 3:35-44. [doi: 10.1023/A:1022696612931]

Lynch, M., and Conery, J.S. (2003b) The origins of genome complexity. Science, 302:1401-1404. [doi: 10.1126/science.1089370]

Lynch, M. (2007) The origins of genome architecture. Sunderland Massachusetts, USA: Sinauer Associates, Inc. Publishers. p. 45

Meyer, A., and Van de Peer, Y. (2003) 'Natural selection merely modified while redundancy created'–Susumu Ohno's idea of the evolutionary importance of gene and genome duplications. Journal of structural and functional genomics, 3:7-9. [PubMed]

Nei, M., and Rooney, A.P. (2005) Concerted and birth-and-death evolution of multigene families. Annual review of genetics, 39:121-152. [doi: 10.1146/annurev.genet.39.073003.112240]

Ohno, S. (1972) So much "junk" in our genome. In H. H. Smith (Ed.), Evolution of genetic systems (Vol. 23, pp. 366-370): Brookhaven symposia in biology.

Session, A.M., Uno, Y., Kwon, T., Chapman, J.A., Toyoda, A., Takahashi, S., Fukui, A., Hikosaka, A., Suzuki, A., and Kondo, M. (2016) Genome evolution in the allotetraploid frog Xenopus laevis. Nature, 538:336-343. [doi: 10.1038/nature19840]


  1. Thank you, Larry
    You have summed up the contents of Session et al.’s paper quite well. But shouldn't it have been mentioned in the reference list?

  2. Maybe it's too late in the day, but if 36% of the duplicated genes were deleted, and 64% are now pseudogenes, then doesn't that mean that none of the genes were neofunctionalized or subfunctionalized?

  3. Very interesting post. I will use this example when teaching Evolution next semester.

  4. De novo genes are quite rare but genome duplications are quite common. Sometimes the duplicated regions contain genes so the new genome contains two copies of a gene that was formerly present in only one copy. "Common" in this sense means on a scale of millions of years.

    I don't know what to tell you Larry but unless you have some hidden, miraculous evolutionary powers of unknown evolutionary mechanisms, I just don't know how you explain evolution to yourself first and then, to your students and the public.

    If de novo genes are quite rare and the only "evolution" proven in the lab breaks functional genes, in order to do ...whatever... I just don't know how you could possibly explain the evolution of previously non existent new functions not even mentioning new organs, new body plans or totally different species.

  5. Very cool paper. Thanks for writing about it, Larry. A few years ago I developed some software for doing phylogenetic analysis on allopolyploids (I am a mathematician, not a biologist, but ended up learning quite a lot about allopolyploids).

    If assembling a genome from fragments is like solving a jigsaw, assembling an allotetraploid genome is like simultaneously solving two similar jigsaws where all the pieces have been mixed together. This seems to me the main achievement of the paper. When this lab technique becomes routine and cheap, my software will become obsolete.

    I think a definition of homoeologs would be useful. Homoeologs are "pairs of genes or chromosomes in the same species that originated by speciation and were brought back together in the same genome by allopolyploidization". See for more. Larry just calls them homologs, but they are very special things. In this case, each pair of homoeologous genes spent 16My evolving in different species. When they met again 18My ago, and became possibly-redundant, they could be quite different from one another. This is not the same sort of starting point as a single gene duplication. I'd guess neofunctionalization and subfunctionalization are both more likely for homoeologs.

    I don't understand how Larry gets from 20,000 genes to 80,000 genes. If the parental species of Xenopus laevis were diploids with 20,000 genes, Xenopus laevis would have 40,000 genes.

    I think this statement is potentially misleading: "The initial duplication event is likely to be deleterious since there will now be redundant DNA in the genome." Typically, and as a first approximation, when whole genome duplications occur, cell volumes also double, but organisms and organs stay the same size, so cell numbers halve. There isn't an obvious metabolic cost in having twice as much DNA per cell. Allopolyploidization is a big event with a lot of possible fitness consequences. I think redundant DNA is one of the least of the problems faced by a newly-formed allopolyploid.