
Which property of the genetic code is called universality? What is the genetic code: general information

The genetic code is a unified system for recording hereditary information in nucleic acid molecules in the form of a sequence of nucleotides. It is based on an alphabet of only four nucleotide letters, which differ in their nitrogenous bases: A, T, G, C.

The main properties of the genetic code are as follows:

1. The genetic code is triplet. A triplet (codon) is a sequence of three nucleotides that codes for one amino acid. Proteins contain 20 amino acids, so one nucleotide per amino acid is not enough: DNA has only four types of nucleotides, which would leave 16 amino acids uncoded. Two nucleotides are not enough either, since 4^2 = 16 pairs can encode only 16 amino acids. Hence the smallest number of nucleotides that can encode one amino acid is three, which gives 4^3 = 64 possible triplets.

2. The redundancy (degeneracy) of the code follows from its triplet nature and means that one amino acid can be encoded by several triplets (there are 20 amino acids but 64 triplets). The exceptions are methionine and tryptophan, each of which is encoded by a single triplet. In addition, some triplets perform specific functions. In the mRNA molecule, three of them (UAA, UAG, UGA) are terminating codons, i.e. stop signals that end the synthesis of the polypeptide chain. The methionine triplet (AUG), when it stands at the beginning of the coding sequence, also serves as the start signal that initiates reading.

3. Simultaneously with redundancy, the code has the property of unambiguity, which means that each codon corresponds to only one specific amino acid.

4. The code is collinear, i.e. the order of codons in a gene corresponds to the order of amino acids in the protein.

5. The genetic code is non-overlapping and compact, that is, it contains no "punctuation marks" within a gene. This means that codons (triplets) cannot overlap and that reading, once started at a certain codon, proceeds continuously, triplet by triplet, up to a stop signal (terminating codon). For example, the mRNA sequence AUGGUGCUUAAUGUG is read only as the triplets AUG, GUG, CUU, AAU, GUG, and not as AUG, UGG, GGU, GUG, etc., or AUG, GGU, UGC, CUU, etc., or in some other way (for example, codon AUG, punctuation mark G, codon UGC, punctuation mark U, etc.).

6. The genetic code is universal, that is, the nuclear genes of all organisms encode information about proteins in the same way, regardless of the level of organization and the systematic position of these organisms.
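The counting argument and the properties listed above can be checked with a short sketch. It assumes the standard codon table written as a 64-letter string in U-C-A-G order (one-letter amino acid symbols, `*` for a stop codon); the names `CODON_TABLE`, `codeword_count` and `split_into_codons`, and the short example mRNA, are chosen here purely for illustration.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# A dict maps each codon to exactly one meaning, which mirrors unambiguity.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codeword_count(length):
    """Number of distinct words of the given length over four nucleotides."""
    return len(BASES) ** length


def split_into_codons(mrna):
    """Read an mRNA in one frame as consecutive, non-overlapping triplets."""
    return [mrna[i:i + 3] for i in range(0, len(mrna) - 2, 3)]


# Degeneracy: how many codons stand for each amino acid (or for a stop).
degeneracy = Counter(CODON_TABLE.values())

for n in (1, 2, 3):
    print(n, "nucleotide(s):", codeword_count(n), "possible words")
print("codons per meaning:", dict(degeneracy))
print(split_into_codons("AUGGUGCUUAAUGUG"))
```

Only words of length three give enough combinations for 20 amino acids, and the surplus of the 64 triplets is what shows up as degeneracy in the counter.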

Every living organism has its own special set of proteins. Particular combinations of nucleotides and their sequence in the DNA molecule form the genetic code, which conveys information about the structure of a protein. Genetics long held the concept that one gene corresponds to one enzyme (polypeptide). Research on nucleic acids and proteins has been carried out over a fairly long period. Further in the article, we take a closer look at the genetic code and its properties. A brief chronology of the research is also given.

Terminology

The genetic code is a way of encoding the amino acid sequence of a protein by means of a nucleotide sequence. This way of recording information is characteristic of all living organisms. Proteins are natural organic substances of high molecular weight that are present in all living organisms. They consist of 20 types of amino acids, which are called canonical. The amino acids are joined into a chain in a strictly defined sequence, which determines the structure of the protein and its biological properties. A protein may also consist of several amino acid chains.

DNA and RNA

Deoxyribonucleic acid (DNA) is a macromolecule. It is responsible for the storage, transmission and implementation of hereditary information. DNA uses four nitrogenous bases: adenine, guanine, cytosine and thymine. RNA consists of the same nucleotides, except for the one that contains thymine; in its place there is a nucleotide containing uracil (U). RNA and DNA molecules are chains of nucleotides. Thanks to this structure, sequences are formed that make up the "genetic alphabet".

Implementation of information

The synthesis of a protein encoded by a gene begins with the synthesis of mRNA on a DNA template (transcription). The genetic code is then translated into a sequence of amino acids, that is, the polypeptide chain is synthesized on the mRNA (translation). Three nucleotides are enough to encode all the amino acids and to signal the end of the protein sequence. Such a group of three nucleotides is called a triplet.
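The two steps just described can be sketched in a few lines. This is a simplified illustration, not a model of the cellular machinery: it assumes the standard codon table in U-C-A-G order, a DNA template strand written 3'->5', and a made-up template sequence; the function names `transcribe` and `translate` are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(template_3to5):
    """Build the mRNA (5'->3') complementary to a DNA template strand (3'->5')."""
    pairing = {"A": "U", "T": "A", "G": "C", "C": "G"}
    return "".join(pairing[base] for base in template_3to5)


def translate(mrna):
    """Translate from the first AUG, triplet by triplet, until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # terminating codon: UAA, UAG or UGA
            break
        protein.append(amino_acid)
    return "".join(protein)


template = "TACCACGAATTACACATT"  # illustrative template strand, not a real gene
mrna = transcribe(template)
print(mrna)
print(translate(mrna))
```

The protein is printed in one-letter amino acid symbols; uracil appears in the mRNA wherever the template strand carries adenine.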

Research History

Proteins and nucleic acids were studied for a long time. In the middle of the 20th century, the first ideas about the nature of the genetic code finally appeared. In 1953, it was found that some proteins are made up of sequences of amino acids. At that time, however, the exact number of amino acids could not yet be determined, and there were numerous disputes about it. In 1953, Watson and Crick published two papers. The first described the secondary structure of DNA, and the second discussed its possible copying by means of template synthesis. In addition, emphasis was placed on the fact that a particular sequence of bases is a code that carries hereditary information. The Soviet-American physicist George Gamow accepted the coding hypothesis and proposed a way to test it. In 1954, he published a paper in which he proposed establishing correspondences between amino acid side chains and diamond-shaped "holes" in DNA and using this as a coding mechanism. This code was later called the diamond (rhombic) code. Explaining his work, Gamow suggested that the genetic code could be triplet. The physicist's work was one of the first to be considered close to the truth.

Classification

Over the following years, various models of the genetic code were proposed, falling into two types: overlapping and non-overlapping. The first type was based on one nucleotide belonging to several codons. It includes the triangular, sequential and major-minor codes. The second type comprises two models: the combinational code and the "code without commas". The combinational variant is based on encoding an amino acid by a nucleotide triplet in which only the composition of the triplet matters. According to the "code without commas", certain triplets correspond to amino acids, while the rest do not. It was believed that if any meaningful triplets were arranged one after another, the triplets lying in a different reading frame would turn out to be meaningless. Scientists believed that it was possible to select a set of triplets that would meet these requirements, and that there were exactly 20 such triplets.

Although Gamow and his co-authors questioned this model, it was considered the most correct one over the next five years. At the beginning of the second half of the 20th century, new data appeared that revealed some shortcomings in the "code without commas". Codons were found to be able to induce protein synthesis in vitro. By about 1965, the meaning of all 64 triplets had been worked out. As a result, some codons turned out to be redundant. In other words, some amino acids are encoded by several triplets.

Distinctive features

The properties of the genetic code include:

Variations

A deviation of the genetic code from the standard was first discovered in 1979 during the study of human mitochondrial genes. Further similar variants were later identified, including many alternative mitochondrial codes. They include the reading of the stop codon UGA as tryptophan in mycoplasmas. In archaea and bacteria, GUG and UUG are often used as start codons. Sometimes genes code for a protein from a start codon that differs from the one normally used by that species. Also, in some proteins the ribosome inserts the non-standard amino acids selenocysteine and pyrrolysine. It does so when reading a stop codon, depending on sequences found in the mRNA. Selenocysteine is currently considered the 21st and pyrrolysine the 22nd amino acid present in proteins.
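How a single reassigned codon changes the reading of a message can be illustrated with a minimal sketch. It assumes the standard codon table in U-C-A-G order and a variant that differs only in reading UGA as tryptophan (W), as described above for mycoplasmas; the message and the names `STANDARD`, `UGA_AS_TRP` and `translate` are illustrative.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
# Variant table: identical to the standard one except that UGA means tryptophan.
UGA_AS_TRP = dict(STANDARD, UGA="W")


def translate(mrna, table):
    """Translate triplet by triplet from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = table[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


message = "AUGUGAUGGUAA"  # made-up message: AUG UGA UGG UAA
print("standard code:", translate(message, STANDARD))
print("UGA as Trp:   ", translate(message, UGA_AS_TRP))
```

Under the standard table the second codon ends translation; under the variant table the same message is read through to the UAA codon.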

General features of the genetic code

However, all these exceptions are rare. In living organisms in general, the genetic code has a number of common features. These include the composition of the codon, which consists of three nucleotides (the first two being the determining ones), and the translation of codons into an amino acid sequence by tRNA and ribosomes.

The leading scientific journal Nature announced the discovery of a second genetic code, a kind of "code within a code", which was recently cracked by molecular biologists and computer scientists. Moreover, to reveal it they used not evolutionary theory but information technology.

The new code is called the splicing code. It resides within the DNA. This code controls the underlying genetic code in a very complex yet predictable way. The splicing code controls how and when genes and regulatory elements are assembled. Revealing this code within a code helps shed light on some of the long-standing mysteries of genetics that have surfaced since the Human Genome Project. One such mystery was why there are only 20,000 genes in an organism as complex as the human being. (Scientists expected to find a lot more.) Why are genes broken into segments (exons) that are separated by non-coding elements (introns) and then joined together (i.e., spliced) after transcription? And why are genes turned on in some cells and tissues and not in others? For two decades, molecular biologists have tried to elucidate the mechanisms of genetic regulation. This article marks a very important step in understanding what is really going on. It doesn't answer every question, but it does demonstrate that the internal code exists. This code is a communication system that can be deciphered so clearly that scientists could predict, with remarkable accuracy, how a genome might behave in certain situations.

Imagine that you hear an orchestra in the next room. You open the door, look inside and see only three or four musicians playing their instruments. This is what Brendan Frey, who helped break the code, says the human genome looks like. He says: "We were only able to detect 20,000 genes, but we knew that they form a huge number of protein products and regulatory elements. How? One of the methods is called alternative splicing". Different exons (parts of genes) can be assembled in different ways. "For example, three genes for the neurexin protein can create over 3,000 genetic messages that help control the brain's wiring system," Frey says. The same article says that scientists know that 95% of our genes undergo alternative splicing, and that in most cases the resulting transcripts (RNA molecules produced by transcription) are expressed differently in different types of cells and tissues. There must be something that controls how these thousands of combinations are assembled and expressed. This is the task of the splicing code.
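Why a handful of genes can yield so many messages is easiest to see by counting. The sketch below assumes a deliberately simple model in which each optional ("cassette") exon is independently either kept or skipped; the exon names and the function `enumerate_isoforms` are hypothetical, and real splicing is regulated, so not every combination need occur in a cell.

```python
from itertools import product


def enumerate_isoforms(exons, optional):
    """List every exon combination in which each optional exon is kept or skipped."""
    # Constitutive exons have a single choice (keep); optional exons have two.
    choices = [(True, False) if name in optional else (True,) for name in exons]
    isoforms = []
    for keep in product(*choices):
        isoforms.append([name for name, kept in zip(exons, keep) if kept])
    return isoforms


exons = ["E1", "E2", "E3", "E4", "E5"]  # hypothetical gene with five exons
optional = {"E2", "E3", "E4"}           # three of them may be skipped
isoforms = enumerate_isoforms(exons, optional)

print(len(isoforms), "possible messages from one gene")
for isoform in isoforms:
    print("-".join(isoform))
```

In this toy model the number of messages doubles with every additional optional exon (2^n for n such exons), which is one way the count can far exceed the number of genes.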

Readers who want a quick overview of the discovery can read the Science Daily article entitled "Researchers who cracked the 'Splicing Code' unravel the mystery behind biological complexity". The article says: "Scientists at the University of Toronto have gained a fundamental new understanding of how living cells use a limited number of genes to form incredibly complex organs like the brain." Nature's own coverage begins with Heidi Ledford's "Code Within Code". This was followed by a paper by Tejedor and Valcarcel titled "Gene Regulation: Breaking the Second Genetic Code". Finally, the decisive paper was "Deciphering the Splicing Code" by a group of researchers from the University of Toronto led by Benjamin J. Blencowe and Brendan J. Frey.

This paper is a victory for information science that brings to mind the codebreakers of World War II. The researchers' methods included algebra, geometry, probability theory, vector calculus, information theory, program code optimization, and other advanced techniques. What they did not need was evolutionary theory, which was never mentioned in the scientific articles. Reading the abstract, you can sense the dramatic tension in the authors' words:

“We describe a ‘splicing code’ scheme that uses combinations of hundreds of RNA properties to predict tissue-dependent changes in alternative splicing of thousands of exons. The code establishes new classes of splicing patterns, recognizes different regulatory programs in different tissues and identifies mutation-verified regulatory sequences. We have uncovered widely used regulatory strategies, including: the use of unexpectedly large pools of properties; the establishment of low levels of exon inclusion, which are overcome by properties in specific tissues; the appearance of properties deeper in introns than previously thought; and modulation of the levels of the splice variant by the structural characteristics of the transcript. The code helped establish a class of exons whose inclusion silences expression in adult tissues by activating mRNA degradation, and whose exclusion promotes expression during embryogenesis. The code facilitates the discovery and detailed description of regulated alternative splicing events on a genome-wide scale.”

The team that cracked the code included specialists from the Department of Electrical and Computer Engineering, as well as from the Department of Molecular Genetics. (Frey himself works for Microsoft Research, a division of Microsoft Corporation.) Like the codebreakers of the past, Frey and Barash developed "a new method of computer-assisted biological analysis that discovers the 'code words' hidden within the genome". Using a huge amount of data produced by molecular geneticists, the group "reverse engineered" the splicing code until they could predict how it would act. Once the researchers got the hang of it, they tested the code with mutations and saw how exons were inserted or removed. They found that the code could even cause tissue-specific changes or act differently depending on whether it was an adult mouse or an embryo. One gene, Xpo4, is associated with cancer. The researchers noted: "These data support the conclusion that Xpo4 gene expression must be tightly controlled to avoid potential detrimental effects, including oncogenesis (cancer), since it is active during embryogenesis but is reduced in adult tissues." It turns out that they were thoroughly surprised by the level of control they saw. Intentionally or not, Frey used as a clue not random variation and selection but the language of intelligent design. He noted: "Understanding a complex biological system is like understanding a complex electronic circuit."

Heidi Ledford said that the apparent simplicity of the Watson-Crick genetic code, with its four bases, triplet codons, 20 amino acids and 64 DNA "characters", hides a whole world of complexity. The splicing code, encapsulated within this simpler code, is much more complex.

But between DNA and proteins lies RNA, a separate world of complexity. RNA is a transformer that sometimes carries genetic messages and sometimes controls them, using many structures that can influence its function. In an article published in the same issue, a team of researchers led by Benjamin J. Blencowe and Brendan J. Frey at the University of Toronto in Ontario, Canada, report attempts to unravel a second genetic code that can predict how segments of messenger RNA transcribed from a particular gene can mix and match to form a variety of products in different tissues. This process is known as alternative splicing. This time there is no simple table; instead, there are algorithms that combine more than 200 different properties of DNA with predictions of RNA structure.

The work of these researchers points to the rapid progress that computational methods have made in modeling RNA. In addition to understanding alternative splicing, computer science is helping scientists predict RNA structures and identify small regulatory fragments of RNA that do not code for proteins. "It's a wonderful time," says Christopher Burge, a computational biologist at the Massachusetts Institute of Technology in Cambridge. "In the future, we will have a huge success."

Computer science, computational biology, algorithms and codes were not part of Darwin's vocabulary when he developed his theory. Mendel had a very simplified model of how traits are distributed during inheritance. In addition, the idea that traits are encoded was only introduced in 1953. We see that the original genetic code is regulated by an even more complex code embedded in it. These are revolutionary ideas. Moreover, there are all indications that this level of control is not the last. Ledford reminds us that, for example, RNA and proteins have a three-dimensional structure. The function of molecules can change when their shape changes. There must be something that controls folding so that the three-dimensional structure does what the function requires. In addition, access to genes appears to be controlled by another code, the histone code. This code is written in molecular markers, or "tails", on the histone proteins that serve as centers for DNA coiling and supercoiling. Describing our time, Ledford speaks of a "permanent renaissance in RNA informatics".

Tejedor and Valcarcel agree that complexity lies behind simplicity. "In theory, everything looks very simple: DNA forms RNA, which then creates a protein," they begin their article. "But the reality is much more complicated." In the 1950s, we learned that all living organisms, from bacteria to humans, have a basic genetic code. But we soon realized that complex organisms (eukaryotes) have a strange and hard-to-understand property: their genomes have peculiar sections, introns, that must be removed so that exons can join together. Why? Today the fog is clearing. "The main advantage of this mechanism is that it allows different cells to choose alternative ways of splicing the messenger RNA precursor (pre-mRNA), and thus one gene forms different messages," they explain, "and then different mRNAs can encode different proteins with various functions". From less code you get more information, as long as there is this other code inside the code that knows how to do it.

What makes cracking the splicing code so difficult is that exon assembly is controlled by many different factors: sequences near exon boundaries, intron sequences, and regulatory factors that either aid or inhibit the splicing mechanism. Besides, "the effects of a certain sequence or factor may vary depending on its location relative to the intron-exon boundaries or other regulatory motifs," Tejedor and Valcarcel explain. "Therefore, the most difficult task in predicting tissue-specific splicing is to compute the algebra of the myriad of motifs and the relationships between the regulatory factors that recognize them."

To solve this problem, the team of researchers fed into the computer a huge amount of data about RNA sequences and the conditions under which they were formed. "The computer was then given the task of identifying the combination of properties that would best explain the experimentally established tissue-specific exon selection." In other words, the researchers reverse engineered the code. Like World War II codebreakers, once scientists know the algorithm, they can make predictions: "It correctly and accurately identified alternative exons and predicted their differential regulation between pairs of tissue types." And just like any good scientific theory, the discovery provided new insights: "This allowed us to re-explain previously established regulatory motifs and pointed to previously unknown properties of known regulators, as well as unexpected functional relationships between them," the researchers noted. "For example, the code implies that the inclusion of exons leading to processed proteins is a general mechanism for controlling the process of gene expression during the transition from embryonic tissue to adult tissue."

Tejedor and Valcarcel consider the publication of this paper an important first step: "The work... is better seen as the discovery of the first fragment of the much larger Rosetta Stone needed to decipher the alternative messages of our genome." According to these scientists, future research will undoubtedly improve our knowledge of this new code. At the end of their article, they mention evolution in passing, and they do it in a very unusual way. They say, “That doesn't mean that evolution created these codes. This means that progress will require an understanding of how the codes interact.” Another surprise was that the degree of conservation observed to date raises the question of the possible existence of "species-specific codes".

The code probably works in every single cell, and therefore must account for more than 200 types of mammalian cells. It also has to cope with a huge variety of alternative splicing patterns, not to mention simple decisions about including or skipping a single exon. The limited evolutionary conservation of the regulation of alternative splicing (estimated to be about 20% between humans and mice) raises the question of the existence of species-specific codes. Moreover, the coupling between RNA processing and gene transcription influences alternative splicing, and recent evidence points to a role for DNA packaging by histone proteins and for covalent histone modifications (the so-called epigenetic code) in the regulation of splicing. Therefore, future methods will have to establish the exact interaction between the histone code and the splicing code. The same applies to the still little understood influence of complex RNA structures on alternative splicing.

Codes, codes and more codes. The fact that scientists say almost nothing about Darwinism in these papers indicates that evolutionary theorists, adherents of old ideas and traditions, will have a lot to think about after they read them. But those who are enthusiastic about the biology of codes will be at the forefront. They have a wonderful opportunity to use an entertaining web application that the codebreakers have created to encourage further exploration. It can be found on the University of Toronto website under the name "Alternative Splicing Prediction Website". Visitors will look in vain for any mention of evolution there, despite the old axiom that nothing in biology makes sense without it. The new 2010 version of this expression might sound like this: "Nothing in biology makes sense unless viewed in the light of computer science".

Links and notes

We're glad we were able to tell you about this story on the day it was published. Perhaps this is one of the most significant scientific articles of the year. (Of course, every big discovery made by other groups of scientists, like the discovery of Watson and Crick, is significant.) The only thing we can say to this is: “Wow!” This discovery is a remarkable confirmation of Designed Creation and a huge challenge to the Darwinian empire. It is interesting how evolutionists will try to correct their simplified history of random mutations and natural selection, which was invented back in the 19th century, in the light of these new data.

Do you understand what Tejedor and Valcarcel are talking about? Species can have their own codes, specific to those species. "Therefore, future methods will have to establish the exact interaction between the histone [epigenetic] code and the splicing code," they note. In translation, this means: "Darwinists have nothing to do with it. They just can't handle it." If the simple Watson-Crick genetic code was a problem for the Darwinists, then what do they say now about the splicing code, which creates thousands of transcripts from the same genes? And how will they deal with the epigenetic code that controls gene expression? And who knows, maybe this incredible "interaction" that we are just beginning to learn about involves other codes, reminiscent of the Rosetta Stone, that are only beginning to emerge from the sand?

Now that we're thinking about codes and computer science, we're starting to think about different paradigms for new research. What if the genome partially acts as a storage network? What if cryptography or compression algorithms are at work in it? We should keep in mind modern information systems and information storage technologies. Maybe we will even find elements of steganography. Undoubtedly, there are additional resilience mechanisms, such as duplications and corrections, which may help explain the existence of pseudogenes. Whole-genome duplication may be a response to stress. Some of these phenomena may prove to be useful indicators of historical events that have nothing to do with a universal common ancestor, but that help explore comparative genomics in terms of informatics and robust design, and help understand the causes of disease.

Evolutionists find themselves in a major quandary. The researchers tried to modify the code, but got only cancer and mutations. How are they going to navigate the fitness landscape when it is a minefield of catastrophes waiting to happen as soon as someone starts tampering with these inextricably linked codes? We know that there is some built-in resilience and portability, but the whole picture is an incredibly complex, designed, optimized information system, not a jumble of pieces that can be played around with endlessly. The whole idea of a code is a concept of intelligent design.

A.E. Wilder-Smith emphasized this. A code assumes a convention between two parties. A convention is an agreement made in advance. It implies planning and purpose. The SOS symbol, as Wilder-Smith would say, we use by convention as a distress signal. SOS does not look like a disaster. It doesn't smell like a disaster. It doesn't feel like a disaster. People would not understand that these letters stand for disaster if they did not understand the essence of the convention itself. Similarly, an alanine codon, GCC, does not look, smell, or feel like alanine. A codon would have nothing to do with alanine unless there was a pre-established agreement between the two coding systems (protein code and DNA code) that "GCC should stand for alanine." To convey this agreement, a family of transducers, the aminoacyl-tRNA synthetases, is used, which translate one code into the other.

This should have strengthened the theory of design back in the 1950s, and many creationists preached it effectively. But evolutionists are like eloquent salesmen. They made up their tales about the Tinker Bell fairy, who deciphers the code and creates new species through mutation and selection, and convinced many people that miracles can still happen today. Well, it is now the 21st century, and we know the epigenetic code and the splicing code, two codes that are much more complex and dynamic than the simple code of DNA. We know about codes within codes, about codes above codes and below codes; we know a whole hierarchy of codes. This time, evolutionists can't just bluff us with their beautiful speeches when guns are trained on them from both sides, a whole arsenal aimed at their main structural elements. All this is a game. A whole era of computer science has grown up around them; they have long gone out of fashion and look like the Greeks trying to attack modern tanks and helicopters with spears.

Sad to admit, evolutionists don't understand this, or even if they do, they're not going to give up. By the way, this week, just as the article about the splicing code was published, some of the angriest and most hateful rhetoric of recent times was directed against creationism and intelligent design. We have yet to hear more such examples. And as long as they hold the microphones in their hands and control the institutions, many people will fall for them, thinking that science continues to give them a good reason. We are telling you all this so that you will read this material, study it, understand it, and stock up on the information you need in order to combat this fanatical, misleading nonsense with the truth. Now, go ahead!

The series of articles describing the origin of the genetic code can be regarded as an investigation of events of which very few traces remain. However, understanding these articles requires some effort to grasp the molecular mechanisms of protein synthesis. This article is the introduction to a series of the author's publications devoted to the origin of the genetic code, and it is the best place to start getting acquainted with the topic.
The genetic code (GC) is usually defined as a method (rule) of encoding a protein in the primary structure of DNA or RNA. In the literature, it is most often written that this is a one-to-one correspondence between a sequence of three nucleotides in a gene and one amino acid in the synthesized protein or the end point of protein synthesis. The definition has in mind the 20 so-called canonical amino acids, which are part of the proteins of all living organisms without exception. These amino acids are the monomers of proteins. However, there are two errors in this definition:

1) There are not 20 canonical amino acids, but only 19. An amino acid is a substance that contains both an amino group (-NH2) and a carboxyl group (-COOH). The protein monomer proline is not an amino acid, since it contains an imino group instead of an amino group, so it is more correct to call proline an imino acid. However, from here on, in all articles on the GC, I will for convenience write about 20 amino acids, with this nuance implied. The amino acid structures are shown in Fig. 1.

Fig. 1. Structures of the canonical amino acids. Amino acids have constant parts, marked in black in the figure, and variable parts (or radicals), marked in red.

2) The correspondence of amino acids to codons is not always unambiguous. The cases where unambiguity is violated are described below.

The emergence of the GC means the emergence of encoded protein synthesis. This event is one of the key ones for the evolutionary formation of the first living organisms.

The structure of the GC is presented in circular form in Fig. 2.



Fig. 2. The genetic code in circular form. The inner circle is the first letter of the codon, the second circle is the second letter, the third circle is the third letter, and the fourth circle gives the amino acids in three-letter abbreviation; P - polar amino acids, NP - non-polar amino acids. The order of symbols U-C-A-G is chosen to make the symmetry clear.

So, let's proceed to the description of the main properties of the GC.

1. Triplet nature. Each amino acid is encoded by a sequence of three nucleotides.

2. The presence of intergenic punctuation marks. Intergenic punctuation marks are nucleic acid sequences at which translation begins or ends.

Translation cannot begin at just any codon, but only at a strictly defined start codon. The start codon is the AUG triplet, at which translation begins. In this position the triplet encodes either methionine or another amino acid, formylmethionine (in prokaryotes), which can be incorporated only at the beginning of protein synthesis. At the end of each gene encoding a polypeptide there is at least one of the 3 termination codons, or stop signals: UAA, UAG, UGA. They terminate translation (as protein synthesis on the ribosome is called).

3. Compactness, or the absence of intragenic punctuation marks. Within a gene, each nucleotide is part of a significant codon.

4. Non-overlapping. Codons do not overlap with each other, each has its own ordered set of nucleotides, which does not overlap with similar sets of neighboring codons.
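
Properties 1 to 4 together describe how a message is read: start at AUG, take consecutive non-overlapping triplets, stop at a termination codon. The following is a minimal sketch of that reading procedure, not a model of a real ribosome. The names `CODON_TABLE` and `translate` and the example sequence are illustrative assumptions; the table is the standard code written in one-letter amino acid symbols in the U-C-A-G order, with `*` for stop. As a simplification, the first AUG found in the string is taken as the start codon.

```python
from itertools import product

BASES = "UCAG"
# Standard code, one-letter amino acid symbols, codons ordered UUU, UUC, UUA, UUG, UCU, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(mrna):
    """Read non-overlapping triplets from the first AUG up to the first stop codon."""
    start = mrna.find("AUG")  # simplification: first AUG in the string
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # UAA, UAG or UGA: translation terminates
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("GGAUGUUUGGCUAAUUU"))  # codons read: AUG UUU GGC UAA
```

Every nucleotide between the start and stop codons belongs to exactly one codon, which is what compactness and non-overlapping mean in practice.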

5. Degeneracy. The reverse correspondence, from amino acid to codon, is ambiguous. This property is called degeneracy. A series is the set of codons encoding one amino acid; in other words, it is a group of equivalent codons. Think of a codon as XYZ. If XY alone defines the "meaning" (i.e. the amino acid), the codon is called strong. If a particular Z is needed to determine the meaning, the codon is called weak.

The degeneracy of the code is closely related to the ambiguity of codon-anticodon pairing (an anticodon is a sequence of three nucleotides on a tRNA that can pair complementarily with a codon on messenger RNA; for more detail see two articles: "Molecular Mechanisms for Ensuring Code Degeneracy" and "Lagerkvist's rule. Physico-chemical substantiation of symmetries and Rumer's relations"). One anticodon on a tRNA can recognize from one to three codons on mRNA.

6. Unambiguity. Each triplet encodes only one amino acid or is a translation terminator.
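
The notions of series and of strong and weak codons can be checked mechanically. The sketch below is illustrative only: it assumes the standard code written as a 64-letter string in U-C-A-G order (`*` marks stop), and the names `series`, `is_strong`, `strong` and `weak` are hypothetical helpers, not part of any library.

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# A series is the group of equivalent codons for one meaning (amino acid or stop).
series = defaultdict(list)
for codon, meaning in CODON_TABLE.items():
    series[meaning].append(codon)


def is_strong(codon):
    """A codon XYZ is strong if XY alone fixes the meaning, whatever Z is."""
    prefix = codon[:2]
    return len({CODON_TABLE[prefix + z] for z in BASES}) == 1


strong = [c for c in CODON_TABLE if is_strong(c)]
weak = [c for c in CODON_TABLE if not is_strong(c)]

print(len(series), "series;", len(strong), "strong codons;", len(weak), "weak codons")
print("Leucine series:", series["L"])
```

Under this definition the strong codons are exactly those in the eight "boxes" where all four third letters give the same amino acid.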

There are three known exceptions.

First. In prokaryotes, the AUG codon in the first position (the "capital letter" of the gene) encodes formylmethionine, and in any other position it encodes methionine. At the beginning of a gene, formylmethionine is encoded not only by the usual methionine codon AUG but also by the valine codon GUG or the leucine codon UUG, which inside the gene encode valine and leucine, respectively.

In many proteins, formylmethionine is cleaved off or the formyl group is removed, as a result of which formylmethionine is converted to ordinary methionine.

Second. In 1986, several research groups independently discovered that the UGA termination codon on mRNA can encode selenocysteine (see Fig. 3), provided that it is followed by a special nucleotide sequence.

Fig. 3. Structure of the 21st amino acid, selenocysteine.

In E. coli (Escherichia coli), selenocysteyl-tRNA recognizes the UGA codon in mRNA during translation, but only in a certain context: for the UGA codon to be recognized as a sense codon, a sequence 45 nucleotides long located after the UGA codon is important.

This example shows that, if necessary, a living organism can change the meaning of the standard genetic code. In this case the genetic information contained in genes is encoded in a more complex way. The meaning of a codon is determined in the context of a particular extended nucleotide sequence and with the participation of several highly specific protein factors. It is important that selenocysteine tRNA has been found in representatives of all three branches of life (archaea, eubacteria and eukaryotes), which indicates the antiquity of selenocysteine synthesis and possibly its presence in the last universal common ancestor (discussed in other articles). Most likely, selenocysteine is found in all living organisms without exception. But in each individual organism, selenocysteine occurs in no more than a couple of dozen proteins. It is part of the active centers of enzymes, in a number of whose homologues ordinary cysteine can function at the corresponding position.

Until recently, it was believed that the UGA codon could be read either as selenocysteine or as a terminator, but it has recently been shown that in the ciliate Euplotes the UGA codon codes for either cysteine or selenocysteine. See "Genetic code allows for inconsistencies".

Third exception. In some prokaryotes (5 species of archaea and one eubacterium; the information on Wikipedia is very outdated) there is a special amino acid, pyrrolysine (Fig. 4). It is encoded by the UAG triplet, which in the canonical code serves as a translation terminator. It is assumed that in this case, as with selenocysteine, the reading of UAG as a pyrrolysine codon is due to a special structure on the mRNA. Pyrrolysine tRNA contains the anticodon CUA and is aminoacylated by class II aminoacyl-tRNA synthetases (for the classification of aminoacyl-tRNA synthetases, see the article "Codases help to understand how genetic code").

UAG is rarely used as a stop codon, and if it is, it is often followed by another stop codon.

Fig. 4. Structure of the 22nd amino acid, pyrrolysine.

7. Universality. After the deciphering of the genetic code was completed in the mid-1960s, it was long believed that the code is the same in all organisms, which indicates the common origin of all life on Earth.

Let's try to understand why the genetic code is universal. If even one coding rule were changed in an organism, the structure of a significant part of its proteins would change. Such a change would be too dramatic and therefore almost always lethal, since a change in the meaning of just one codon can affect, on average, 1/64 of the positions in all amino acid sequences.

One very important thought follows from this: the genetic code has hardly changed since its formation more than 3.5 billion years ago. Its structure therefore bears a trace of its origin, and the analysis of this structure can help to understand how exactly the genetic code could have arisen.

In reality, the genetic code may differ slightly in bacteria, in mitochondria, and in the nuclear genomes of some ciliates and yeasts. Currently, at least 17 genetic codes are known that differ from the canonical one by 1-5 codons. In total, across all known deviations from the universal genetic code, 18 different substitutions of codon meaning are used. Most deviations from the standard code are known in mitochondria: 10. It is noteworthy that the mitochondria of vertebrates, flatworms and echinoderms use different codes, whereas those of mold fungi, protozoa and coelenterates share a single code.

The evolutionary closeness of species is by no means a guarantee that they have similar genetic codes. The genetic code may differ even among different species of mycoplasmas (some species have the canonical code, while others differ). A similar situation is observed for yeast.

It is important to note that mitochondria are descendants of symbiotic organisms that adapted to live inside cells. They have a highly reduced genome, and some of their genes have moved to the cell nucleus. Therefore, changes in the genetic code are no longer so dramatic for them.

The exceptions discovered later are of particular interest from an evolutionary point of view, as they can help shed light on the mechanisms of code evolution.

Table 1. Mitochondrial codes in various organisms.

Codon   Universal code   Mitochondrial codes
                         Vertebrates   Invertebrates   Yeast   Plants
UGA     STOP             Trp           Trp             Trp     STOP
AUA     Ile              Met           Met             Met     Ile
CUA     Leu              Leu           Leu             Thr     Leu
AGA     Arg              STOP          Ser             Arg     Arg
AGG     Arg              STOP          Ser             Arg     Arg
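
A variant code can be thought of as the universal code plus a small set of overrides. The sketch below encodes only the reassignments listed in Table 1 and shows how the same message is read differently under each code. The names `STANDARD`, `MITO_OVERRIDES`, `mito_code` and `translate`, and the example message, are illustrative assumptions; for simplicity, reading starts at the first nucleotide.

```python
from itertools import product

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Differences from the universal code, taken from Table 1 only ("*" means STOP).
MITO_OVERRIDES = {
    "vertebrates": {"UGA": "W", "AUA": "M", "AGA": "*", "AGG": "*"},
    "invertebrates": {"UGA": "W", "AUA": "M", "AGA": "S", "AGG": "S"},
    "yeast": {"UGA": "W", "AUA": "M", "CUA": "T"},
    "plants": {},
}


def mito_code(group):
    """Return the universal table with the Table 1 overrides for one group applied."""
    return {**STANDARD, **MITO_OVERRIDES[group]}


def translate(mrna, table):
    """Read triplets from the first nucleotide until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = table[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


message = "AUGAUAUGAAGA"  # codons: AUG AUA UGA AGA
print("universal:", translate(message, STANDARD))
for group in MITO_OVERRIDES:
    print(group + ":", translate(message, mito_code(group)))
```

The four-codon message is deliberately built from codons that appear in Table 1, so each code yields a different product from the same nucleotides.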

There are three mechanisms by which the amino acid encoded by a codon can change.

The first operates when some codon is not used (or is almost never used) by an organism because of the uneven occurrence of certain nucleotides (GC content) or nucleotide combinations. As a result, such a codon may disappear from use completely (for example, through loss of the corresponding tRNA), and later it can be used to code for another amino acid without causing significant damage to the organism. This mechanism is probably responsible for the appearance of some code dialects in mitochondria.

The second is the conversion of a stop codon into a sense codon. In this case, some of the translated proteins may acquire extensions. However, the situation is partially saved by the fact that many genes end with not one but two stop codons, since translation errors are possible in which stop codons are read as amino acids.

The third is the possible ambiguous reading of certain codons, as occurs in some fungi.

8. Connectivity. Groups of equivalent codons (that is, codons that code for the same amino acid) are called series. The genetic code contains 21 series, including the stop codons. In what follows, for definiteness, a group of codons will be called connected if from each codon of the group it is possible to reach every other codon of the same group by successive single-nucleotide substitutions without leaving the group. Of the 21 series, 18 are connected, 2 series contain one codon each, and only 1 series, that of the amino acid serine, is not connected and splits into 2 connected subseries.


Fig. 5. Connectivity graphs for some code series. a - connected series of valine; b - connected series of leucine; the serine series is not connected and splits into two connected subseries. The figure is taken from the article by V.A. Ratner, "The Genetic Code as a System".

The property of connectivity can be explained by the fact that, during its formation, the genetic code captured new codons that differed minimally from those already in use.
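
The connectivity claim can be verified by treating each series as a graph whose edges join codons that differ in exactly one nucleotide, and counting connected components. This is a small sketch under that reading of the definition; `neighbours`, `components` and `unconnected` are hypothetical helper names, and the table is the standard code in U-C-A-G order with `*` for stop.

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

series = defaultdict(list)
for codon, meaning in CODON_TABLE.items():
    series[meaning].append(codon)


def neighbours(codon):
    """All nine codons that differ from `codon` by a single nucleotide."""
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                yield codon[:pos] + base + codon[pos + 1:]


def components(codons):
    """Split a group of codons into subsets connected by single substitutions."""
    remaining = set(codons)
    parts = []
    while remaining:
        stack = [remaining.pop()]
        part = set(stack)
        while stack:
            for nxt in neighbours(stack.pop()):
                if nxt in remaining:
                    remaining.remove(nxt)
                    part.add(nxt)
                    stack.append(nxt)
        parts.append(part)
    return parts


unconnected = [m for m, codons in series.items() if len(components(codons)) > 1]
print("Unconnected series:", unconnected)
print("Serine subseries:", [sorted(p) for p in components(series["S"])])
```

Serine is the only series reported as unconnected: its UCN codons and its AGU/AGC codons differ in two positions at once.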

9. Regularity of amino acid properties by triplet roots (the root being the second nucleotide of the codon). All amino acids encoded by triplets with root U are non-polar, are not extreme in properties or size, and have aliphatic radicals. All triplets with root C have strong doublets (their first two nucleotides determine the amino acid), and the amino acids they encode are relatively small. All triplets with root A have weak doublets and encode polar amino acids that are not small. Codons with root G are characterized by extreme and anomalous variants of amino acids and series. They encode the smallest amino acid (glycine), the longest and flattest (tryptophan), the longest and "clumsiest" (arginine), the most reactive (cysteine), and form an anomalous subseries for serine.

10. Blockiness. The universal genetic code is a "block" code. This means that amino acids with similar physicochemical properties are encoded by codons that differ from each other by one base. The block structure of the code is clearly visible in the following figure.


Fig. 6. Block structure of the genetic code. White indicates amino acids with an alkyl group.


Fig. 7. Color representation of the physicochemical properties of amino acids, based on the values given in Stryer's book "Biochemistry". Left: hydrophobicity. Right: ability to form an alpha helix in a protein. Red, yellow and blue indicate amino acids with high, medium and low hydrophobicity (left) or the corresponding degree of ability to form an alpha helix (right).

The properties of blockiness and regularity can also be explained by the fact that, during its formation, the genetic code captured new codons that differed minimally from those already in use.

Codons with the same first base (codon prefix) code for amino acids with similar biosynthetic pathways. The codons of amino acids belonging to the shikimate, pyruvate, aspartate, and glutamate families have the prefixes U, G, A, and C, respectively. For the ancient pathways of amino acid biosynthesis and their connection with the properties of the modern code, see "Ancient doublet genetic code was predetermined by the pathways for the synthesis of amino acids". Based on these data, some researchers conclude that the formation of the code was greatly influenced by the biosynthetic relationships between amino acids. However, similarity of biosynthetic pathways does not at all mean similarity of physicochemical properties.

11. Noise immunity. In the most general terms, the noise immunity of the genetic code means that random point mutations and translation errors do not greatly change the physicochemical properties of the encoded amino acids.

The replacement of one nucleotide in a triplet in most cases either does not lead to a replacement of the encoded amino acid, or leads to a replacement with an amino acid with the same polarity.

One of the mechanisms that ensure the noise immunity of the genetic code is its degeneracy. The average degeneracy is the total number of codons divided by the number of encoded signals, where the encoded signals include the 20 amino acids and the translation termination sign. The average degeneracy over all amino acids and the termination sign is 64/21, or about three codons per encoded signal.

In order to quantify noise immunity, we introduce two concepts. Nucleotide substitution mutations that do not change the class of the encoded amino acid are called conservative. Nucleotide substitution mutations that change the class of the encoded amino acid are called radical.

Each triplet allows 9 single substitutions. There are 61 triplets encoding amino acids in total. Therefore, the number of possible nucleotide substitutions over all sense codons is 61 x 9 = 549. Of these:

23 nucleotide substitutions result in stop codons.
134 substitutions do not change the encoded amino acid.
230 substitutions do not change the class of the encoded amino acid.
162 substitutions lead to a change in the amino acid class, i.e. are radical.
Of the 183 substitutions of the 3rd nucleotide, 7 lead to the appearance of translation terminators, and 176 are conservative.
Of the 183 substitutions of the 1st nucleotide, 9 lead to the appearance of terminators, 114 are conservative and 60 are radical.
Of the 183 substitutions of the 2nd nucleotide, 7 lead to the appearance of terminators, 74 are conservative, and 102 are radical.

Based on these calculations, we obtain a quantitative estimate of the noise immunity of the code as the ratio of the number of conservative substitutions (134 + 230 = 364) to the number of radical substitutions (162). It is equal to 364/162 = 2.25.
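
The counts that do not depend on how amino acids are divided into classes (the total, the substitutions leading to stop codons, and the synonymous substitutions) can be reproduced by brute force. The sketch below enumerates every single-nucleotide substitution of every sense codon; the variable names are illustrative, and the conservative and radical totals are simply taken from the text, since they depend on the class assignment shown in the figures.

```python
from itertools import product

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

sense_codons = [c for c, a in CODON_TABLE.items() if a != "*"]

total = 0
to_stop = [0, 0, 0]     # substitutions producing a stop codon, by codon position
synonymous = [0, 0, 0]  # substitutions keeping the same amino acid, by position

for codon in sense_codons:
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            total += 1
            if CODON_TABLE[mutant] == "*":
                to_stop[pos] += 1
            elif CODON_TABLE[mutant] == CODON_TABLE[codon]:
                synonymous[pos] += 1

print("total substitutions:", total)
print("to stop codons:", sum(to_stop), "by position", to_stop)
print("synonymous:", sum(synonymous), "by position", synonymous)

# Figures quoted in the text (they depend on the amino acid classes used there).
conservative, radical = 134 + 230, 162
print("noise immunity ratio:", round(conservative / radical, 2))
```

Nearly all synonymous substitutions fall on the third codon position, which is where the degeneracy of the code is concentrated.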

In a real assessment of the contribution of degeneracy to noise immunity, it is necessary to take into account the frequency of occurrence of amino acids in proteins, which varies in different species.

What is the reason for the noise immunity of the code? Most researchers believe that this property is a consequence of selection among alternative genetic codes.

Stephen Freeland and Laurence Hurst generated such random codes and found that only one in a hundred alternative codes is at least as noise-immune as the universal genetic code.
An even more interesting fact emerged when these researchers introduced an additional constraint to account for actual trends in DNA mutation patterns and translation errors. Under these conditions, ONLY ONE CODE IN A MILLION turned out to be better than the canonical code.
Such remarkable robustness of the genetic code is most easily explained by the fact that it was formed as a result of natural selection. Perhaps at one time there were many codes in the biological world, each with its own sensitivity to errors. The organism that coped with errors better was more likely to survive, and the canonical code simply won the struggle for existence. This assumption seems quite realistic; after all, we know that alternative codes do exist. For more information about noise immunity, see S. Freeland, L. Hurst, "Code Evolution", In the World of Science, 2004, No. 7.

In conclusion, I propose to count the number of possible genetic codes that can be generated for the 20 canonical amino acids. For some reason I have never come across this number. So, we need each of the 20 amino acids and the stop signal to be encoded by AT LEAST ONE CODON in the generated genetic codes.

Let us mentally number the codons in some order and reason as follows. If we have exactly 21 codons, then each amino acid and the stop signal will occupy exactly one codon. In this case, there will be $21!$ possible genetic codes.

If there are 22 codons, an extra codon appears, which can have any of the 21 meanings, and this codon can be located in any of the 22 places, while the remaining codons each have exactly one distinct meaning, as in the case of 21 codons. Then the number of combinations is $21! \times (21 \times 22)$.

If there are 23 codons, then arguing similarly, 21 codons each have exactly one distinct meaning ($21!$ options), and two codons can each have any of the 21 meanings ($21^{2}$ combinations of meanings at a FIXED position of these codons). The number of different positions for these two codons is $23 \times 22$. The total number of genetic code variants for 23 codons is $21! \times 21^{2} \times 23 \times 22$.

If there are 24 codons, the number of genetic codes is $21! \times 21^{3} \times 24 \times 23 \times 22$, and so on.

If there are 64 codons, the number of possible genetic codes is $21! \times 21^{43} \times 64!/21! = 21^{43} \times 64! \approx 9.1 \times 10^{145}$.
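
The final figure is easy to reproduce with exact integer arithmetic. One caveat is worth checking numerically: the product above counts the same code more than once whenever an "extra" codon shares its meaning with one of the 21 "base" codons, so it is best read as an upper estimate. The sketch below evaluates the article's expression (`article_estimate`) and, for comparison, the number of assignments in which every one of the 21 meanings is used at least once (`exact_count`, computed by inclusion-exclusion). Both function names are hypothetical, and the comparison is an addition, not part of the original argument.

```python
from math import comb, factorial

MEANINGS = 21  # 20 amino acids plus the stop signal


def article_estimate(n_codons):
    """The product used in the text: 21! * 21^(n-21) * n!/21!."""
    extra = n_codons - MEANINGS
    placements = factorial(n_codons) // factorial(MEANINGS)  # n * (n-1) * ... * 22
    return factorial(MEANINGS) * MEANINGS ** extra * placements


def exact_count(n_codons):
    """Assignments of n codons to 21 meanings with every meaning used (inclusion-exclusion)."""
    return sum(
        (-1) ** k * comb(MEANINGS, k) * (MEANINGS - k) ** n_codons
        for k in range(MEANINGS + 1)
    )


print(f"article estimate for 64 codons: {article_estimate(64):.1e}")
print(f"exact count for 64 codons:      {exact_count(64):.1e}")
print(f"all maps, 21**64 (upper bound): {21 ** 64:.1e}")
```

For 21 codons both functions give $21!$; for 22 codons the article's product is $21! \times 462$, while the exact count is $21! \times 231$, because swapping the extra codon with the base codon of the same meaning yields the same code.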

Today it is no secret to anyone that the life program of all living organisms is written on the DNA molecule. The easiest way to picture a DNA molecule is as a long ladder. The vertical uprights of this ladder are made up of alternating sugar and phosphate groups. All the important working information in the molecule is recorded on the rungs of the ladder: each rung consists of two molecules, each attached to one of the uprights. These nitrogenous base molecules are called adenine, guanine, thymine, and cytosine, but they are usually referred to simply by the letters A, G, T, and C. The shape of these molecules allows them to form bonds (complete rungs) only of certain types. These are the bonds between the bases A and T and between the bases G and C (a pair formed in this way is called a "base pair"). There can be no other types of bonds in the DNA molecule.

Going down the rungs along one strand of the DNA molecule, you get a sequence of bases. It is this message, in the form of a sequence of bases, that determines the course of chemical reactions in the cell and, consequently, the characteristics of the organism that has this DNA. According to the central dogma of molecular biology, the DNA molecule encodes information about proteins, which, in turn, acting as enzymes (see Catalysts and enzymes), regulate all chemical reactions in living organisms.

The strict correspondence between the sequence of base pairs in a DNA molecule and the sequence of amino acids that make up protein enzymes is called the genetic code. The genetic code was deciphered shortly after the discovery of the double-stranded structure of DNA. It was known that the newly discovered molecule, messenger RNA (mRNA), carries the information written on DNA. Biochemists Marshall W. Nirenberg and J. Heinrich Matthaei of the National Institutes of Health in Bethesda, Maryland, near Washington, DC, performed the first experiments that led to the unraveling of the genetic code.

They started by synthesizing artificial mRNA molecules consisting only of the repeating nitrogenous base uracil (which is analogous to thymine, "T", and forms bonds only with adenine, "A", from the DNA molecule). They added these mRNAs to test tubes with a mixture of amino acids, with only one of the amino acids in each tube labeled with a radioactive label. The researchers found that the mRNA artificially synthesized by them initiated protein formation in only one test tube, where the labeled amino acid phenylalanine was located. So they established that the sequence "-U-U-U-" on the mRNA molecule (and, therefore, the equivalent sequence "-A-A-A-" on the DNA molecule) encodes a protein consisting only of the amino acid phenylalanine. This was the first step towards deciphering the genetic code.
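
The logic of the poly-U experiment can be mimicked in a few lines: a DNA template of adenines is transcribed into an mRNA of uracils, and every UUU triplet is read as phenylalanine (F in one-letter notation). This is only a toy illustration; `transcribe` and `translate` are hypothetical helpers, strand direction is ignored, and the message is read from its first nucleotide because the artificial mRNA contains no AUG.

```python
from itertools import product

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Base pairing between a DNA template strand and the mRNA built on it.
DNA_TO_MRNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(dna_template):
    """Build the complementary mRNA (strand direction ignored for simplicity)."""
    return "".join(DNA_TO_MRNA[base] for base in dna_template)


def translate(mrna):
    """Read triplets from the first nucleotide until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


poly_u = transcribe("AAAAAAAAA")
print(poly_u, "->", translate(poly_u))
```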

Today it is known that three base pairs of a DNA molecule (such a triplet is called a codon) code for one amino acid in a protein. Performing experiments similar to the one described above, geneticists eventually deciphered the entire genetic code, in which each of the 64 possible codons corresponds to a specific amino acid or to a stop signal.