There are two competing worldviews in the fields of biochemistry and molecular biology. The distinction was captured a few years ago by Laurence Hurst commenting on pervasive transcription when he said, "So there are two models; one, the world is messy and we're forever making transcripts we don't want. Or two, the genome is like the most exquisitely designed Swiss watch and we don't understand its working. We don't know the answer—which is what makes genomics so interesting." (Hopkins, 2009).
The distinction is important because, depending on your worldview, you will interpret things very differently. We see it in the debate over junk DNA where those in the Swiss watch category have trouble accepting that we could have a genome full of junk. Those in the Rube Goldberg category (I am one) tend to dismiss a lot of data as just noise or sloppiness.
Before I get howls of protest about false dichotomies, let me make it clear that there's quite a bit of middle ground. Some biological systems are, indeed, highly accurate and finely tuned by natural selection. But most are not. They are the product of evolution as a tinkerer, resulting in systems that are just good enough for survival. What some see as essential complexity, others see as Rube Goldberg machines. They work, but that's not the way you would have designed them if you had the power.
Let's see how this distinction plays out in a recent paper that's just been posted on the Nucleic Acids Research website.
Hecht, A., Glasgow, J., Jaschke, P.R., Bawazer, L., Munson, M.S., Cochran, J., Endy, D., and Salit, M. (2017) Measurements of translation initiation from all 64 codons in E. coli. Nucleic Acids Research. [doi: 10.1093/nar/gkx070] [bioRxiv]
First, a bit of background. Translation in bacteria begins near the 5′ end of the mRNA at a specific initiation codon that's usually AUG. AUG is normally a methionine codon, but in the context of initiation it is read by a particular tRNA called fMet-tRNAi, which carries a modified methionine called N-formylmethionine (see note 1). This initiator tRNA is only used during translation initiation because it binds to a specific initiation factor, IF-2. (All of the other aminoacylated tRNAs in the cell are bound in a complex with elongation factor Tu (EF-Tu).)
Initiation is special because the initial amino acid (fMet) must be bound to the ribosome complex in the absence of a growing polypeptide chain. It has to be inserted into the P-site that normally contains a tRNA bound to a polypeptide chain. During elongation the incoming aminoacyl-tRNA is bound to the adjacent A-site (aminoacyl site).
How does the translation complex assemble at the initiation codon and not just at any methionine codon? As you might imagine, the selection of the correct codon for initiation depends on the sequence of bases in the immediate vicinity. In bacteria, the selection depends on a short sequence upstream of the proper initiation codon. It's called the Shine-Dalgarno sequence and it forms base pairs with an exposed bit of the 16S rRNA in the small ribosomal subunit. This is how the translation complex is directed to assemble at the point where translation will begin.
Hecht et al. looked at the initiation codons in bacterial proteins and found that 82% of them were the canonical AUG that's in most of the textbooks. Almost 14% were GUG (valine) codons that the initiator tRNA sees as a start site because they're adjacent to a Shine-Dalgarno sequence. fMet is inserted at this site because two of the three bases match the standard initiation codon. About 4% of initiations occur at UUG (leucine), another two-out-of-three match. None of this is news. It's been known since the 1970s that some initiation codons are GUG and UUG (Miller, 1974).
The bottom line is that 99.95% of all initiation events occur at AUG and the two related sequences GUG and UUG. It means there's a little bit of sloppiness in forming the correct initiation complex.
Hecht et al. noted there were four other codons that popped up in the database, albeit at extremely low frequencies. The others are CUG, AUU, AUC, and AUA. All of them match two of the three bases in the standard initiation codon. No surprises there.
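The "two out of three" idea is easy to check by brute force. Here is a minimal Python sketch (my own illustration, not anything from the paper) that lists every codon differing from AUG at exactly one position and compares that list with the non-AUG start codons mentioned above. The names `mismatches`, `near_cognate`, and `observed` are made up for this example.

```python
from itertools import product

BASES = "ACGU"
CANONICAL = "AUG"

def mismatches(codon, reference=CANONICAL):
    """Count the positions at which codon differs from the reference codon."""
    return sum(a != b for a, b in zip(codon, reference))

# All 64 possible codons.
all_codons = ["".join(c) for c in product(BASES, repeat=3)]

# Codons that match AUG at exactly two of the three positions.
near_cognate = sorted(c for c in all_codons if mismatches(c) == 1)

# Non-AUG start codons named in the text above (illustrative set, typed by hand).
observed = {"GUG", "UUG", "CUG", "AUU", "AUC", "AUA"}

print("one mismatch from AUG:", near_cognate)
print("not named in the text:", sorted(set(near_cognate) - observed))
```

There are nine single-mismatch codons. The six named above are all on the list, and the three that are not named (AAG, ACG, AGG) are the ones that change the middle base.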
The authors wondered if other codons could be mistakenly used if they were in the right position relative to the Shine-Dalgarno sequence, so they constructed a series of plasmids to test them. Here's what some of those plasmids look like ...
The green part is the reporter gene whose protein product can be detected easily. (GFP stands for green fluorescent protein.) Transcription begins at a bacteriophage T7 promoter and ends at a T7 terminator. This ensures that a lot of transcripts will be made when the plasmid is inserted into E. coli cells.
RBS is the ribosome binding site, otherwise known as the Shine-Dalgarno sequence. In this case they used a very strong RBS, AGGAGA, to make sure that the translation complex was bound to the mRNA beside the initiation codon. Hecht et al. then tested every codon for its ability to initiate protein synthesis at a site that began eight bases downstream of the RBS. Here's the result ...
As expected, the best codons are AUG and its two relatives, GUG and UUG. Other codons are much less efficient with frequencies that are 0.01-0.1% of the AUG standard. There were 17 codons that didn't work at all.
This is all consistent with a Rube Goldberg worldview where sloppiness is the norm. In this case, translation initiation is pretty good but mistakes can be made. The elongation process of translation is also error-prone with a typical misincorporation rate of about 10⁻⁴, or one error in every 10,000 reactions. Most of those errors are due to single mismatches in the triplet codon.
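To get a feel for what one error in 10,000 means for a whole protein, here is a back-of-the-envelope Python sketch. It assumes every codon is read independently with the same error rate, which is a simplification, and the chain lengths (100, 300, and 1000 residues) are arbitrary values I picked for illustration.

```python
def error_free_fraction(length, error_rate=1e-4):
    """Fraction of chains of `length` residues made with no misincorporation,
    assuming each codon is read independently with the same error rate."""
    return (1.0 - error_rate) ** length

# Illustrative chain lengths (assumed, not taken from the paper).
for n in (100, 300, 1000):
    print(n, round(error_free_fraction(n), 3))
```

Under those assumptions, about 97% of the copies of a 300-residue protein contain no mistakes at all, and roughly one copy in ten of a 1000-residue protein carries at least one wrong amino acid.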
The authors are aware of this view because they say,
Almost all E. coli genes with non-AUG start codons initiate with methionine as the N-terminal amino acid and such events are not considered to be errors in translation initiation. By this same logic, we argue that translation initiation of genes with other non-AUG codons, in which methionine is observed as the N-terminal amino acid, should not be considered an error. However, those wishing a strict interpretation of the central dogma could consider such events to be errors in translation initiation. All biological processes are governed by processes that imply a certain rate of unlikely events, and such unlikely events are often referred to as errors, failures, or leaks.
This is not quite correct. As is the case with DNA-binding proteins, optimal efficiency is achieved with a perfect match to the consensus sequence (in this case AUG), but close matches can be selected when less-than-perfect expression is beneficial. Thus, there are many weak DNA binding sites that are selected in bacteria because they reduce expression below optimal levels. The weak lac promoter is a classic example.
There may well be genes where GUG and UUG are used for a reason. That doesn't change the conclusion that there's an optimal initiation codon. This explanation for the low frequency of GUG and UUG initiation codons in some genes has been around for decades. It is almost certainly correct.
Hecht et al. make this point but I think they get carried away with Swiss watch thinking.
However, focusing only on the statistical likely outcome risks overlooking any advantageous aspects of rare but purposeful possibilities. Viewing non-canonical start codons without commitment to traditional dogma may reveal them as a potential feature, rather than an error, in gene expression. [Translation, we have an open mind, not bound by traditional dogma.]
This is confusing but the basic idea is sound. Non-canonical codons can be selected. However, the authors go a bit too far in suggesting that ALL non-canonical initiation codons could be significant and could cause us to re-evaluate our understanding of translation.
For example, there may be evolutionary utility to translation initiation from non-canonical start codons. Research with yeast has shown gradual transitions of genetic sequences between genes and non-genic ORFs in related species. We can imagine a scenario wherein, over evolutionary time scales, point mutations could create a weak non-canonical initiation codon downstream of an RBS. The small amounts of protein produced from such an ORF, if beneficial to an organism, could select for further mutations that increased translation efficiency up to a point where the gene product more directly impacted organismal fitness. Further mutations could then be selected that tune for optimal expression dynamics in a given genetic context.
The presence of frequent but very low-level expression of proteins via non-canonical start codons would have widespread implications for genome annotation, cellular engineering and our fundamental understanding of translation initiation. We encourage reconsidering definitions and further exploration of what is considered a start codon.
By promoting the novelty of their work, and downplaying what was already known, the authors attracted the attention of science journalists who read the press release but didn't know the literature. Here's how the paper was reported on phys.org: "Start codons in DNA may be more numerous than previously thought."
For decades, scientists working with genetic material have labored with a few basic rules in mind. To start, DNA is transcribed into messenger RNA (mRNA), and mRNA is translated into proteins, which are essential for almost all biological functions. The central principle regarding that translation has long held that only a small number of three-letter sequences in mRNA, known as start codons, could trigger the production of proteins. But researchers might need to revisit and possibly rewrite this rule, after recent measurements from a team including scientists from the National Institute of Standards and Technology (NIST).
The implications of the work could be quite profound for our understanding of biology.
"We want to know everything going on inside cells so that we can fully understand life at a molecular scale and have a better chance of partnering with biology to flourish together," said Stanford professor and JIMB colleague and advisor, Drew Endy. "We thought we knew the rules, but it turns out there's a whole other level we need to learn about. The grammar of DNA might be even more sophisticated than we imagined."
Yes, it's true the grammar might be more sophisticated than we imagined; however, it could also be more sloppy than we imagined.
1. the 21st amino acid.
Miller, J.H. (1974) GUG and UUG are initiation codons in vivo. Cell, 1:73-76.