How to Memorize the Genetic Code

Overview

The same genetic code, with only a few minor variations, is used by all life on Earth. The four nucleotides adenine, thymine, guanine, and cytosine act as the "letters" of DNA and are written A, T, G, and C. In RNA, thymine is replaced by uracil, abbreviated U. Every protein-coding gene is made up of codons: groups of three nucleotides that each stand for a single "letter" of a protein. Those protein letters are the 20 commonly occurring amino acids, which are joined together head to tail to make the protein. They're called amino acids because each has an acid end and an amino (alkaline) end; they differ only in the side chains that branch out from the middle. Since there are 64 possible codons but only 20 amino acids (some organisms use one or two more), there are more than enough codons to go around, and most amino acids have two or four codons. These mnemonics mostly use the RNA letters A, U, G, and C.


Table from Wikipedia. CC-BY-SA 3.0.
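The arithmetic behind "more than enough codons" is easy to check with a short Python sketch: two-letter words would give only 4 × 4 = 16 combinations, too few for 20 amino acids, while three-letter words give 4 × 4 × 4 = 64. The helper name `dna_to_rna` is just an illustrative choice; all it does is swap T for U.

```python
from itertools import product

RNA_BASES = "UCAG"

def dna_to_rna(dna: str) -> str:
    """Rewrite a DNA coding-strand sequence in RNA letters (T becomes U)."""
    return dna.upper().replace("T", "U")

# Every possible three-letter word over the four RNA bases, UUU first and GGG last.
codons = ["".join(triple) for triple in product(RNA_BASES, repeat=3)]

print(len(RNA_BASES) ** 2)      # 16 two-letter words: fewer than 20 amino acids
print(len(codons))              # 64 three-letter codons
print(dna_to_rna("ATGGTGCAT"))  # AUGGUGCAU
```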

The genetic code can be represented as a grid divided into blocks, where all codons within a given block start with the same two nucleotides. Each block contains four codons. Two of the nucleotides, U (or T) and C, are physically smaller molecules, while the other two, A and G, are larger. Some blocks are divided into two half blocks: one half holds the codons ending in a smaller-molecule letter, and the other holds the codons ending in a larger-molecule letter. So all you need to remember is that whenever a block is split, one half ends in U or C and the other ends in A or G, plus which amino acids occupy half blocks and which occupy full blocks.
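One way to see this structure is to label each codon by its block (its first two letters) and by which half of that block its last letter falls in. This Python sketch is only an illustration; the function name `block_and_half` and the "small"/"large" labels are invented here and simply mean a U/C ending versus an A/G ending.

```python
SMALL = set("UC")  # the physically smaller bases
LARGE = set("AG")  # the physically larger bases

def block_and_half(codon: str) -> tuple[str, str]:
    """Return the codon's block (first two letters) and which half its last letter is in."""
    codon = codon.upper()
    if len(codon) != 3 or not set(codon) <= SMALL | LARGE:
        raise ValueError(f"not an RNA codon: {codon!r}")
    half = "small" if codon[2] in SMALL else "large"
    return codon[:2], half

# The 16 blocks laid out as a 4 x 4 grid: rows by first letter, columns by second letter.
grid = [[first + second for second in "UCAG"] for first in "UCAG"]
for row in grid:
    print(" ".join(row))

print(block_and_half("UUC"))  # ('UU', 'small')
print(block_and_half("GAG"))  # ('GA', 'large')
```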

Amino Acids

It's a good idea to be somewhat familiar with the 20 common amino acids before proceeding. You don't have to memorize everything about them yet, but it helps to know that each one has a one-letter abbreviation, just as the nucleotides do. The mnemonics are included in the table for completeness, but don't memorize them yet; we'll cover them in detail afterwards.

Letter | Name | Side chain | Mnemonic
A | Alanine | -CH3 | Gassy.
R | Arginine | -CH2CH2CH2NHC(NH2)=NH2+ | See her go, Argo; argent twice.
N | Asparagine | -CH2C(=O)NH2 | AA battery's N terminal.
D | Aspartic acid | -CH2C(=O)O- | Smaller acid, like peaches.
C | Cysteine | -CH2SH | UnderGround sulfur, or ugcky.
E | Glutamic acid | -CH2CH2C(=O)O- | Larger acid, like peaches.
Q | Glutamine | -CH2CH2C(=O)NH2 | Caaged bird eating gluten.
G | Glycine | -H | G in, G out.
H | Histidine | -CH2-C3H3N2 (imidazole ring) | Catch the cat: hiss.
I | Isoleucine | -CH(CH3)CH2CH3 | Hey you, Isolucy!
L | Leucine | -CH2CH(CH3)CH3 | Luuse; see you, Lucy!
K | Lysine | -CH2CH2CH2CH2NH3+ | Triple-A, 'k?
M | Methionine | -CH2CH2SCH3 | Augment.
F | Phenylalanine | -CH2-C6H5 (benzene ring) | All of you are free.
P | Proline | -CH2CH2CH2- (bonds back to the amino end) | Coil contorting.
S | Serine | -CH2OH | You see how serious; serious metals.
T | Threonine | -CH(CH3)OH | 309 volts of AC.
W | Tryptophan | -CH2-C8H6N (indole ring) | Ugly UGG.
Y | Tyrosine | -CH2-C6H4-OH | Wow, two functional groups!
V | Valine | -CH(CH3)CH3 | Guv.

The Mnemonics

For convenience's sake, we will mostly go through the blocks in the order U, C, A, G, grouping by the second letter first and then, within each group, by the first letter.
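In code, that walking order is a nested loop with the second letter on the outside and the first letter on the inside. A small Python sketch, assuming each block is named by its first two letters:

```python
ORDER = "UCAG"

# Outer loop: second letter. Inner loop: first letter. Each step names one block.
walk = [first + second for second in ORDER for first in ORDER]

print(walk[:4])  # ['UU', 'CU', 'AU', 'GU']
```

The next four steps are UC, CC, AC, GC, which is the order the mnemonics below follow.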

You are all free: UUU, F. The UUU half block (UUU and UUC) is phenylalanine (F).

The other UU half block (UUA and UUG) is "luuse" leucine.

Speaking of leucine: Imagine you're visiting your friend Lucy and it's time to go so you say "see you, Lucy." CU full block is leucine.

Say "hey you" to Lucy's sister Isolucy. AU full block is almost all isoleucine, with one exception:

AUG, like "augment," is the start codon: it marks where the ribosome begins making protein. It also codes for methionine, both at the start and anywhere else it appears in the gene, so every protein starts out with M (although the cell often trims it off afterward).

GU full block: There is a joke where someone says "give us a copper, guv." V is valine, so give us a valine, GUV. GU full block is valine.

You see how serious it is. UC full block is serine.

Because of its relative rigidity, proline puts kinks in helices. Coil-contorting proline is the CC full block. You can also remember it as CCCP; just know that it's a full block.

How about 309 volts of alternating current? AC full block is threonine.

GC kinda sounds like "gassy", and natural gas is mostly methane, and alanine's side chain is a methyl group. GC full block is alanine.

UAU looks like "wow", as in wow, tyrosine has two functional groups! (A benzene ring and a hydroxyl.) UAU half block is tyrosine.

The other UA half block consists of Useless Agent Alpha and Useless Agent Gamma (UAA and UAG). They're so useless that they can't code for anything! They're stop codons: they signal the end of the coding sequence and tell the ribosome to stop making protein.

Using DNA letters for this one: when I tried to catch (CAC) the CAT, it hissed (histidine) at me. In RNA letters, that's the CAU and CAC half block, which is histidine.

CAA and CAG are the "caaged" half block. There's a parrot in the cage eating crackers, which are full of gluten, and the caaged half block codes for glutamine.

For the other amide, think of an AA battery, maybe one with a copper top, since it is the C/U half block. But we're looking at the negative, or N, terminal. N is asparagine, so AAU and AAC code for asparagine.

Triple-A is lysine, 'k? K is the letter for lysine. This is the other AA half block.

GA forms two half blocks. Georgia peaches, like all fruits, contain acids. The smaller-molecule half block (GAU and GAC) is the smaller aspartic acid, while the larger-molecule half block (GAA and GAG) is the larger glutamic acid.

Ugly UGG, all by itself, codes for tryptophan (W), the bulkiest amino acid.

I can stop this protein: UGA-chaka. (It's a stop codon.) Or, alternatively: stop that annoying and culturally insensitive chant!

As for the smaller-molecule UG half block (UGU and UGC), think Under Ground, which is where sulfur is mined, and the smallest amino acid containing sulfur is cysteine. You can also think of it as an ugcky thiol, since cysteine is the only amino acid with a thiol group. Just be careful not to confuse ugcky with the UC full block, because that's seriously serine.

See her go, the great ship Argo, of ancient Greek legend. CG full block is arginine.

Silver (Ag) and copper (Cu) can be worth serious money. AGC and AGU are serine.

AGA and AGG are like Ag twice. In French, silver is argent, so AGA and AGG are arginine again.

GG full block is glycine. If your RNA is nothing but Gs, the protein will be nothing but Gs too (GGG GGG GGG becomes GGG); no other letter translates to itself like this.
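The whole section boils down to a few rules: which blocks are full, which are split into halves, and two single-codon exceptions (AUG and UGG). The Python sketch below rebuilds all 64 codons from those rules alone, as a way to check the mnemonics against each other; the names `FULL`, `HALF`, and `EXCEPTIONS` are made up for this illustration, and "*" stands for a stop codon.

```python
from itertools import product

BASES = "UCAG"

# Full blocks: all four codons starting with these two letters give the same amino acid.
FULL = {
    "CU": "L",  # see you, Lucy
    "AU": "I",  # hey you, Isolucy (AUG is an exception, below)
    "GU": "V",  # give us a valine, guv
    "UC": "S",  # you see how serious
    "CC": "P",  # coil contorting
    "AC": "T",  # 309 volts of AC
    "GC": "A",  # gassy
    "CG": "R",  # see her go, Argo
    "GG": "G",  # G in, G out
}

# Half blocks: (amino acid for a U/C ending, amino acid for an A/G ending).
HALF = {
    "UU": ("F", "L"),  # you are all free / luuse
    "UA": ("Y", "*"),  # wow / useless agents alpha and gamma
    "CA": ("H", "Q"),  # catch the cat / caaged
    "AA": ("N", "K"),  # AA battery's N terminal / triple-A, 'k?
    "GA": ("D", "E"),  # Georgia peaches: smaller acid / larger acid
    "UG": ("C", "*"),  # underground sulfur / UGA-chaka (UGG is an exception)
    "AG": ("S", "R"),  # Ag + Cu serious / argent twice
}

# Single codons that break their block's pattern.
EXCEPTIONS = {"AUG": "M", "UGG": "W"}

def build_code() -> dict[str, str]:
    """Map every codon to its amino acid letter using only the block rules above."""
    code = {}
    for first, second, third in product(BASES, repeat=3):
        block, codon = first + second, first + second + third
        if block in FULL:
            amino = FULL[block]
        else:
            small_half, large_half = HALF[block]
            amino = small_half if third in "UC" else large_half
        code[codon] = EXCEPTIONS.get(codon, amino)
    return code

CODE = build_code()
print(CODE["AUG"], CODE["GAG"], CODE["GUG"], CODE["UGA"])  # M E V *
```

Nine full blocks plus seven half-block pairs cover all 16 blocks, which is a handy sanity check when reviewing the mnemonics.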

And now with these mnemonics, you can read the coding sequence of a gene from almost any organism and translate it into its protein sequence. No one actually has to do this by hand, since computers can do the translation with a lookup table, although at the end of this page I will demonstrate translating a small gene from memory. What the mnemonics do let us do is look at a mutation and see what effect it has. Here are the first several codons of the human HBB gene, which codes for beta-globin, one of the protein chains that, together with alpha-globin and the heme molecule, make up hemoglobin, the protein that carries oxygen in the blood. Observe the difference between the normal and the variant sequences:

Normal:  AUG GUG CAU CUG ACU CCU GAG GAG AAG UCU GCC ...
Variant: AUG GUG CAU CUG ACU CCU GUG GAG AAG UCU GCC ...

It's hard to see, but the difference is in the seventh codon: GUG instead of GAG. (In the medical literature this position is usually called number 6, because the initial methionine is removed from the finished protein and isn't counted.) The normal sequence has GAG; recalling that GA peaches contain acids, and that the last letter G is a larger nucleotide, we get the larger glutamic acid. For GUG we remember "give us a valine, guv," so this variant has valine instead of glutamic acid.
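This codon-by-codon comparison is exactly what a lookup table automates. The Python sketch below stores the standard genetic code as a compact 64-letter string in U, C, A, G order (stop codons shown as "*"); the helper name `codon_differences` is hypothetical, and the sequences are the HBB fragments shown above.

```python
BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

def codon_differences(normal: str, variant: str):
    """Yield (1-based codon number, old codon, new codon, old aa, new aa) for each change."""
    old_codons, new_codons = normal.split(), variant.split()
    for n, (old, new) in enumerate(zip(old_codons, new_codons), start=1):
        if old != new:
            yield n, old, new, CODE[old], CODE[new]

normal = "AUG GUG CAU CUG ACU CCU GAG GAG AAG UCU GCC"
variant = "AUG GUG CAU CUG ACU CCU GUG GAG AAG UCU GCC"
for change in codon_differences(normal, variant):
    print(change)  # (7, 'GAG', 'GUG', 'E', 'V')
```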

You might have guessed by now what effect this has. Glutamic acid is very hydrophilic, i.e. it likes water and tends to face outward, away from the center of the protein, whereas valine's side chain is very hydrophobic and prefers to surround itself with other hydrophobic side chains. In the case of HBB, this mutation causes the hemoglobin molecules to link up into long chains (they polymerize) when they are not carrying oxygen, making them insoluble and distorting the red blood cell into a shape that looks like a sickle. This is the mutation that causes sickle cell disease.

Other changes at this codon can be decoded the same way. GGG would give glycine (remember all G in, all G out), and GCG gives alanine (remember GC sounds like gassy, so methane, methyl group, so alanine). Both of these side chains are also hydrophobic, but that alone doesn't make a variant behave like valine: the GCG (alanine) variant, known as hemoglobin G-Makassar, does not cause sickling and is generally harmless.
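To list every possible one-letter change at this codon at once, a short Python sketch can mutate each position of GAG in turn and look up the result. It stores the standard code as a compact 64-letter string in U, C, A, G order (an assumption of this sketch, not something the mnemonics require); "*" marks a stop codon, and the helper names are hypothetical.

```python
BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

def translate_codon(codon: str) -> str:
    """Look up one RNA codon in the compact table."""
    i, j, k = (BASES.index(base) for base in codon)
    return AMINO[16 * i + 4 * j + k]

def point_mutants(codon: str) -> list[tuple[str, str]]:
    """Every codon that differs from `codon` at exactly one position, with its amino acid."""
    results = []
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                results.append((mutant, translate_codon(mutant)))
    return results

for mutant, amino in point_mutants("GAG"):
    print(mutant, amino)
```

Running it lists nine neighbors of GAG, including GUG (valine), GCG (alanine), and GGG (glycine), as well as GAA, which still gives glutamic acid, and UAG, a stop codon that would cut the protein short.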


Now for the manual gene translation. Here's the coding sequence of the human OXT gene, which encodes the precursor of oxytocin, written in mRNA letters:

AUGGCCGGCCCCAGCCUCGCUUGCUGUCUGCUCGGCCUCCUGGCGCUGACCUCCGCCUGCUACAUCCAGAACUGCCCCCUGGGAGGCAAGAGGGCCGCG
CCGGACCUCGACGUGCGCAAGUGCCUCCCCUGCGGCCCCGGGGGCAAAGGCCGCUGCUUCGGGCCCAAUAUCUGCUGCGCGGAAGAGCUGGGCUGCUUC
GUGGGCACCGCCGAAGCGCUGCGCUGCCAGGAGGAGAACUACCUGCCGUCGCCCUGCCAGUCCGGCCAGAAGGCGUGCGGGAGCGGGGGCCGCUGCGCG
GUCUUGGGCCUCUGCUGCAGCCCGGACGGCUGCCACGCCGACCCUGCCUGCGACGCGGAAGCCACCUUCUCCCAGCGCUGA

It looks like a mouthful, but it's actually one of the smaller human genes. Going through it codon by codon, we get:

Codon | Mnemonic | Result
AUG | Augment | Start: M methionine
GCC | Gassy | A alanine
GGC | G in, G out | G glycine
CCC | Coil contorting | P proline
AGC | Ag+Cu serious | S serine
CUC | See you, Lucy! | L leucine
GCU | Gassy | A alanine
UGC | UnderGround sulfur. | C cysteine
UGU | UnderGround sulfur. | C cysteine
CUG | See you, Lucy! | L leucine
CUC | See you, Lucy! | L leucine
GGC | G in, G out | G glycine
CUC | See you, Lucy! | L leucine
CUG | See you, Lucy! | L leucine
GCG | Gassy | A alanine
CUG | See you, Lucy! | L leucine
ACC | 309 volts | T threonine
UCC | You see it's serious | S serine
GCC | Gassy | A alanine
UGC | UnderGround sulfur. | C cysteine
UAC | Same half block as wow. | Y tyrosine
AUC | Hey you, Isolucy. | I isoleucine
CAG | Caged bird eating gluten. | Q glutamine
AAC | AA battery, Cu top, N terminal. | N asparagine
UGC | UnderGround sulfur. | C cysteine
CCC | Coil contort | P proline
CUG | See you, Lucy! | L leucine
GGA | G in, G out. | G glycine
GGC | G in, G out. | G glycine
AAG | Same half block as triple-A | K lysine
AGG | Double argent. | R arginine
GCC | Gassy | A alanine
GCG | Gassy | A alanine
CCG | Coil contort. | P proline
GAC | Georgia peaches, acid, smaller mol. | D aspartic acid
CUC | See you. | L leucine
GAC | GA peaches, smaller acid. | D aspartic acid
GUG | Give us a valine, Guv! | V valine
CGC | See her go, Argo. | R arginine
AAG | Same as triple-A. | K lysine
UGC | UnderGround. | C cysteine
CUC | See you. | L leucine
CCC | Coil contorting. | P proline
UGC | UnderGround. | C cysteine
GGC | G in, G out. | G glycine
CCC | Coil contorting. | P proline
GGG | G in, G out. | G glycine
GGC | G in, G out. | G glycine
AAA | Triple-A. | K lysine
GGC | G in, G out. | G glycine
CGC | See her go. | R arginine
UGC | UnderGround. | C cysteine
UUC | All of you (half block). | F phenylalanine
GGG | G => G. | G glycine
CCC | Contort. | P proline
AAU | AA batt, Cu top. | N asparagine
AUC | Hey you! | I isoleucine
UGC | UnderGround. | C cysteine
UGC | UnderGround. | C cysteine
GCG | Gassy. | A alanine
GAA | GA peaches, larger acid. | E glutamic acid
GAG | GA peaches, larger acid. | E glutamic acid
CUG | See you! | L leucine
GGC | G => G. | G glycine
UGC | UnderGround. | C cysteine
UUC | All of U. | F phenylalanine
GUG | Guv. | V valine
GGC | G => G. | G glycine
ACC | 309V AC. | T threonine
GCC | Gassy. | A alanine
GAA | GA larger acid. | E glutamic acid
GCG | Gassy. | A alanine
CUG | See you! | L leucine
CGC | See her go. | R arginine
UGC | UnderGround. | C cysteine
CAG | Caged bird eating gluten. | Q glutamine
GAG | GA larger acid. | E glutamic acid
GAG | GA larger acid. | E glutamic acid
AAC | AA battery. | N asparagine
UAC | Wow! | Y tyrosine
CUG | See you! | L leucine
CCG | Contort. | P proline
UCG | You see how serious. | S serine
CCC | Contort. | P proline
UGC | UnderGround. | C cysteine
CAG | Caged. | Q glutamine
UCC | You see. | S serine
GGC | G => G. | G glycine
CAG | Caged. | Q glutamine
AAG | Triple-A half block. | K lysine
GCG | Gassy. | A alanine
UGC | UnderGround. | C cysteine
GGG | G => G. | G glycine
AGC | Ag+Cu serious. | S serine
GGG | G => G. | G glycine
GGC | G => G. | G glycine
CGC | See her go. | R arginine
UGC | UnderGround. | C cysteine
GCG | Gassy. | A alanine
GUC | Guv. | V valine
UUG | Luuse. | L leucine
GGC | G => G. | G glycine
CUC | See you! | L leucine
UGC | UnderGround. | C cysteine
UGC | UnderGround. | C cysteine
AGC | Ag+Cu. | S serine
CCG | Contort. | P proline
GAC | GA smaller acid. | D aspartic acid
GGC | G => G. | G glycine
UGC | UnderGround. | C cysteine
CAC | Tried to catch the cat. | H histidine
GCC | Gassy. | A alanine
GAC | GA smaller acid. | D aspartic acid
CCU | Contort. | P proline
GCC | Gassy. | A alanine
UGC | UnderGround. | C cysteine
GAC | GA smaller acid. | D aspartic acid
GCG | Gassy. | A alanine
GAA | GA larger acid. | E glutamic acid
GCC | Gassy. | A alanine
ACC | 309V AC. | T threonine
UUC | All of U half block. | F phenylalanine
UCC | You see. | S serine
CAG | Caged bird. | Q glutamine
CGC | See her go. | R arginine
UGA | Stop that chant. | Stop codon

Therefore, this gene encodes a protein with the following sequence:

MAGPSLACCLLGLLALTSACYIQNCPLGGKRAAPDLDVRKCLPCGPGGKGRCFGPNICCAEELGCFVGTAEALRCQEENYLPSPCQSGQKACGSGGRCAV
LGLCCSPDGCHADPACDAEATFSQR

...which we can see is correct because it is identical to the translation about ¾ of the way down the NCBI page.
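For comparison, here is the same translation done the "computer" way: a Python sketch that reads the OXT mRNA shown above three letters at a time, looks each codon up in the standard code (stored as a compact 64-letter string in U, C, A, G order, an assumption of this sketch), and stops at the first stop codon. It assumes the sequence begins at the start codon, as this one does; the function name `translate` is just an illustrative choice.

```python
BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG; "*" = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

OXT_MRNA = (
    "AUGGCCGGCCCCAGCCUCGCUUGCUGUCUGCUCGGCCUCCUGGCGCUGACCUCCGCCUGCUACAUCCAGAACUGCCCCCUGGGAGGCAAGAGGGCCGCG"
    "CCGGACCUCGACGUGCGCAAGUGCCUCCCCUGCGGCCCCGGGGGCAAAGGCCGCUGCUUCGGGCCCAAUAUCUGCUGCGCGGAAGAGCUGGGCUGCUUC"
    "GUGGGCACCGCCGAAGCGCUGCGCUGCCAGGAGGAGAACUACCUGCCGUCGCCCUGCCAGUCCGGCCAGAAGGCGUGCGGGAGCGGGGGCCGCUGCGCG"
    "GUCUUGGGCCUCUGCUGCAGCCCGGACGGCUGCCACGCCGACCCUGCCUGCGACGCGGAAGCCACCUUCUCCCAGCGCUGA"
)

def translate(mrna: str) -> str:
    """Translate from the first codon until a stop codon (or the end of the sequence)."""
    protein = []
    for start in range(0, len(mrna) - 2, 3):
        i, j, k = (BASES.index(base) for base in mrna[start:start + 3])
        amino = AMINO[16 * i + 4 * j + k]
        if amino == "*":
            break
        protein.append(amino)
    return "".join(protein)

protein = translate(OXT_MRNA)
print(len(protein))  # 125
print(protein)
```

The 378 letters make 126 codons: 125 amino acids followed by the UGA stop codon.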