January 2018

You can find a pdf with answers to the quest here.

# More genetics

## Single Locus

As we've learned, if there are two alleles at one locus that is not on a sex chromosome, the hybrid cross (Aa x Aa) must produce a 1:2:1 ratio of genotypes. However, the 3:1 phenotypic ratio is only predicted if there is a simple dominance relationship. An aa x aa cross is boring (every offspring is aa). The other three types of crosses are shown below: AA x aa, Aa x aa and Aa x Aa.

Here's how that maps out for simple dominance on the X chromosome:

For incomplete dominance, the case is interesting because the heterozygous phenotype ONLY shows up in females: males have a single X chromosome, so they can never be heterozygous at an X-linked locus.
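If you want to check the X-linked case yourself, here is a minimal Python sketch. It assumes a heterozygous mother (alleles A and a on her two X chromosomes) and a father whose single X carries A; the function name `x_linked_cross` is just something I made up for illustration. Each Punnett-square cell is counted once.

```python
from collections import Counter
from itertools import product


def x_linked_cross(mother, father):
    """Offspring of a cross at an X-linked locus.

    mother: her two X alleles, e.g. ("A", "a")
    father: his one X allele plus "Y", e.g. ("A", "Y")
    Returns a Counter of (sex, genotype), one count per Punnett-square cell.
    """
    offspring = Counter()
    for egg, sperm in product(mother, father):
        if sperm == "Y":
            # Sons get Mom's X and Dad's Y, so they carry only one allele.
            offspring[("male", egg)] += 1
        else:
            # Daughters get one X from each parent; sorting puts "A" before "a".
            offspring[("female", "".join(sorted(egg + sperm)))] += 1
    return offspring


result = x_linked_cross(("A", "a"), ("A", "Y"))
for (sex, genotype), count in sorted(result.items()):
    print(sex, genotype, count)
```

Only daughters can come out Aa, which is why an intermediate (heterozygous) phenotype under incomplete dominance shows up only in females.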

## Multiple Loci

The first thing is: how do we deal with multiple loci? There is a way to do it purely with math, and that's what we do when there are more than two loci. However, I'm going to show you how to do it with a Punnett square.
The 4x4 matrix is here. It is also shown below.

The expected 9:3:3:1 ratio assumes independent assortment (the two loci are not on the same chromosome). We can use that prediction to ask whether there is linkage (are they really on the same chromosome?) and test the model using a statistical test we will cover next.
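Here is one way to build the 4x4 matrix in code. It is a minimal sketch that assumes a dihybrid cross AaBb x AaBb, independent assortment, and simple dominance at each locus; the helper names are mine.

```python
from collections import Counter
from itertools import product


def gametes(genotype):
    """'AaBb' -> ['AB', 'Ab', 'aB', 'ab']: one allele per locus, all combinations
    equally likely (independent assortment)."""
    return [a + b for a, b in product(genotype[:2], genotype[2:])]


def phenotype(offspring):
    """offspring is egg + sperm, e.g. 'ABab'; an uppercase allele at a locus -> dominant."""
    locus1, locus2 = offspring[0::2], offspring[1::2]
    return tuple("dominant" if any(c.isupper() for c in pair) else "recessive"
                 for pair in (locus1, locus2))


square = [egg + sperm for egg, sperm in product(gametes("AaBb"), gametes("AaBb"))]
counts = Counter(phenotype(cell) for cell in square)
for pheno, n in counts.most_common():
    print(pheno, n)
```

The 16 cells fall into 9 dominant/dominant, 3 dominant/recessive, 3 recessive/dominant and 1 recessive/recessive. If the real counts from a cross are far from that, linkage is one possible explanation.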

## Some variants in inheritance.

Multi-gene trait.
There's really not much to this. Some traits (really, most interesting traits) may be inherited but be based on more than one gene. So, there is no single gene for being tall. There is no allele for being 6' tall. But that doesn't mean that height is not inherited. Many different loci may contribute to it. Here is a hypothetical plot of what you might expect if there are three loci that contribute to height, each with two alleles, where the dominant allele at each locus contributes positively to height (note: there is no reason any of those assumptions should be true). You might see a distribution that looks something like this:

So, you can get a distribution of heights based on some combination of alleles at different loci. In this case, we are assuming simple additive interactions among the genes. It can be more complex (see below).
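To see where a bell-shaped distribution like that comes from, here is a sketch of the hypothetical three-locus model. It assumes both parents are AaBbCc, the loci assort independently, and each dominant (uppercase) allele adds one unit of height. These assumptions are mine, for illustration only; they are not a real model of human height.

```python
from collections import Counter
from itertools import product

# Hypothetical model: three loci, two alleles each; both parents are AaBbCc.
LOCI = ["Aa", "Bb", "Cc"]


def gametes(loci):
    """Every gamete of a triple heterozygote: one allele from each locus (8 total)."""
    return ["".join(alleles) for alleles in product(*loci)]


def height_score(offspring):
    """Additive model: each uppercase (dominant) allele adds one unit, 0 to 6."""
    return sum(1 for allele in offspring if allele.isupper())


g = gametes(LOCI)
distribution = Counter(height_score(egg + sperm) for egg, sperm in product(g, g))  # 64 cells
for score in sorted(distribution):
    print(score, "#" * distribution[score])
```

The counts come out 1, 6, 15, 20, 15, 6, 1 out of 64: most offspring land in the middle, and only 1 in 64 sits at each extreme.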

Pleiotropy
This is sort of the opposite of a multi-gene trait. Here, one gene can affect many traits. We've discussed this in the context of cytoskeletal proteins before. For example, mutations to a microtubule-associated motor protein could affect male fertility (sperm flagellum), airway function (cilia on airway epithelium) and vesicle transport and secretion of proteins. Everywhere that protein is needed, you would see some effect.

Epistasis
Finally, there is epistasis. Proteins interact with other proteins, so variations in one gene can affect how you see the phenotype caused by another. Here is a classic example I stole from another website, at the University of Georgia. It covers coat color in Labrador retrievers.
One locus, the B locus, controls the color of the pigment eumelanin. Eumelanin can be either brown (bb) or black (BB or Bb).
Another gene, known as the "E" locus (for extensor…never mind) is needed to deposit eumelanin in the fur of the dog. It encodes a protein called MC1R and where it is expressed determines whether the eumelanin gets into the fur. The ee homozygote does not deposit eumelanin at all in the fur while Ee or EE do.
Thus, if the dog is ee, it will be yellow no matter whether it makes black or brown melanin. All the possibilities on the 4x4 matrix are shown below.
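Here is the same 4x4 matrix as a short sketch, assuming both parents are BbEe (heterozygous at both loci). The function names are mine; the rules inside `coat_color` are the ones described above.

```python
from collections import Counter
from itertools import product


def coat_color(b_pair, e_pair):
    """Epistasis: the E locus decides whether the B locus is visible at all."""
    if "E" not in e_pair:          # ee: no eumelanin deposited in the fur
        return "yellow"
    return "black" if "B" in b_pair else "brown"   # BB or Bb black, bb brown


def gametes(genotype):
    """'BbEe' -> ['BE', 'Be', 'bE', 'be'] (independent assortment)."""
    return [b + e for b, e in product(genotype[:2], genotype[2:])]


counts = Counter()
for egg, sperm in product(gametes("BbEe"), gametes("BbEe")):
    counts[coat_color(egg[0] + sperm[0], egg[1] + sperm[1])] += 1
print(counts)
```

Instead of 9:3:3:1 you get 9 black : 3 brown : 4 yellow, because the two ee classes collapse into one yellow class.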

# Introduction to Genetics

(Key terms will be in italics.)
We learned about replication. Sometimes mistakes are made that affect a protein. These could be in the coding sequence or the regulatory sequences. That gives rise to variations. Let's look at some examples.

## Types of small mutations.

Some changes to DNA are large rearrangements, like deletions and insertions (such as the type in exon shuffling that can lead to new protein functions). But there are also small, single-nucleotide changes, such as the one cited in the sickle cell disease question early in the year. Here are small alterations to DNA that can affect the protein made…and therefore the appearance of the individual:

### Mis-sense mutation.

This is where a nucleotide is substituted and results in the wrong amino acid being encoded in the protein. An example of this is the glutamic acid (GAG) to valine (GUG) change in sickle cell disease. In the "THECATWASBADTHEDAYSHEBITTHEDOG" case, it might be "THE
You can still read it, but the sense of it has changed.
Frameshift mutation: this inserts or deletes one or two bases so that you are no longer reading in the correct frame. So, in "THETCATWASBADTHEDAYSHEBITTHEDOG," after THE you read TCA TWA…and everything is messed up after that.
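A quick way to see the frameshift is to chop both versions into triplets. This is just a sketch using the sentence above; `triplets` is a name I made up.

```python
def triplets(message):
    """Split a string into non-overlapping groups of three, like codons."""
    return [message[i:i + 3] for i in range(0, len(message) - 2, 3)]


original = "THECATWASBADTHEDAYSHEBITTHEDOG"
# Insert one extra letter (a "T") right after the first word.
shifted = original[:3] + "T" + original[3:]

print(" ".join(triplets(original)))  # THE CAT WAS BAD THE DAY SHE BIT THE DOG
print(" ".join(triplets(shifted)))   # THE TCA TWA SBA DTH EDA YSH EBI TTH EDO
```

Everything up to the insertion reads fine; every triplet after it is garbage, and the leftover letter at the end is simply dropped.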

Finally, there is something called a nonsense mutation, which is when a stop codon is created. So, if the UGG codon for tryptophan were changed to UGA, it would result in the protein being terminated early. Note that frameshift mutations usually result in encountering a stop codon in the new frame pretty quickly.

You can also have "synonymous" mutations: a change to the DNA that does not change the protein. For example since GUA and GUU both encode valine, that switch would not change the protein.
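The three kinds of single-codon substitution can be sorted with a small lookup. This sketch only includes the handful of codons used in the examples above (not the whole code), and `classify` is a hypothetical helper.

```python
# A few codons from the standard genetic code (mRNA sense), enough for the
# examples in the text. "Stop" marks a stop codon.
CODONS = {
    "GAG": "Glu", "GUG": "Val",   # sickle cell example
    "UGG": "Trp", "UGA": "Stop",  # nonsense example
    "GUA": "Val", "GUU": "Val",   # synonymous example
}


def classify(old_codon, new_codon):
    """Classify a single-codon substitution by its effect on the protein."""
    before, after = CODONS[old_codon], CODONS[new_codon]
    if before == after:
        return "synonymous"
    if after == "Stop":
        return "nonsense"
    return "missense"


print(classify("GAG", "GUG"))  # missense (Glu -> Val)
print(classify("UGG", "UGA"))  # nonsense (Trp -> stop)
print(classify("GUA", "GUU"))  # synonymous (Val -> Val)
```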

This takes into account only mutations to the coding region. There are also mutations to promoters and other regulatory sequences, such as for blue eyes below.

One more thing, I forgot to mention signal peptides. These are sequences of amino acids at the amino terminus of the newly made protein that tell the ribosome that the protein is destined to be secreted or made in the membrane. It shouldn't surprise you that, again, we need both the structural data and signals to tell the machinery what to do.

### From Today.

If you get this part, you can read through quickly. But, make sure you are getting it.
First, a general observation: offspring tend to look like parents. No surprises there. We say that traits are general descriptions of how an organism appears or behaves and that these traits are inherited from parents. The study of inheritance is called genetics.
What is a gene? I will use two definitions here, both of which are incomplete.
• A gene is the unit of inheritance. It is the thing that is passed from parent to offspring that results in the inherited trait.
Since we now know that the information in living things is encoded in DNA and that most traits are mediated by proteins (which are encoded by DNA) I can add a second definition.
• A gene is a stretch of DNA that encodes a protein and the regulatory sequences that tell the cell when and where the gene is turned “on” and the protein is made.
There are a couple of points here. First, since a gene is a physical sequence of DNA, it must have a physical location in the long double strand of DNA we call a chromosome. Second, not all genes are “on” in all cells at all times. The thing that makes one cell in your body a liver cell and another a muscle cell is that different proteins are made in different amounts. All those structural variations you studied at the beginning of the year to know how each cell is different come down to which genes are “on” and “off.”

The physical location on a chromosome that contains a particular gene is called a locus (plural, “loci”). So, let’s pick blue eyes versus brown eyes as an example. It should be said that the details of the genetics of eye color are much more complicated than you may have been previously told. You can find a really good description of how it works here. However, there is a major gene involved in eye color, called “OCA2.” I’ll tell you what the protein it encodes is later. The gene is a stretch of DNA on chromosome 15. Like all humans, you have two copies of chromosome 15, one from Mom and one from Dad. A “map” of the chromosome is below. It is approximately 100,000,000 base pairs long, and the OCA2 gene is located from base pair 28,000,020 to base pair 28,344,457 on chromosome 15. On every person’s chromosome 15, the gene for OCA2 is in that spot. This is its locus.

However, even though every one of us has a copy of OCA2 at the same spot on each of the two copies of chromosome 15, the exact sequence of the DNA may be different (due to inherited changes to the sequence in the DNA called mutations).
The different forms of the gene are called “alleles.” There is an allele associated with brown eyes and one with blue.

What does OCA2 do? Well, it encodes a protein that transports an amino acid called tyrosine, which is converted by a specific enzyme into a dark brown pigment called “melanin.” Melanin is the source of the brown color in our skin or hair, and increased melanin in cells in our skin leads to us getting a “tan.” There is no “blue” pigment in my eyes…just the lack of brown pigment in the front of my iris. If you were to cut into my eye, you would find I have melanin in the back of the iris (though, please don’t).
The problem with my eyes that leads to them being blue is where the transport protein is expressed…or, where it isn’t. The mutation associated with blue eyes is not in the sequence that encodes the protein; it is in a nearby regulatory sequence that controls where the gene is expressed. The transport protein is not expressed in the front part of the iris, so no melanin gets put there and my eyes are blue.

So, let’s oversimplify eye color and say that there is only “brown” and “blue.” Brown results from the transport protein being made in the front of the iris as well as in the back, so melanin is in both places.
Blue eyes result when the transport protein is not expressed in the front of the eye and less melanin is put there. (Again, oversimplified.)

Blue eyes are recessive and brown eyes are dominant.
Consider this: suppose you have people that come in two varieties: ones that pour water on the floor and ones that don’t. If you have two people who don’t pour water in a room, the floor is dry. If you have two people who do pour water on the floor, the floor is wet. If you have one of each…the floor is still wet. The effect of the person pouring water on the floor is dominant to the effect of not pouring water.

Let’s suppose you got one copy of the “blue” allele from your mom and one copy of the “brown” allele from dad. Since the two copies of the locus contain different alleles, you are said to be heterozygous (having different alleles at the two copies of that locus).
The copy you got from mom does not result in the transporter being made in the front of the eye…but the one from dad does, so you have the transporter there and the eyes are brown. The blue allele is busily not doing anything…but no one notices because the brown allele makes the protein. That’s why “blue” is recessive.
In order to have blue eyes, you have to have both copies of the gene be the form that does not function in the eye (just like to have a dry floor, you have to have both people in the room not pouring water on the floor). If you have two copies of the “blue” allele, no transporter, therefore no melanin in the front of the eye, therefore blue.
You are “homozygous” for the recessive allele. If we abbreviate the allele that does not result in expression of the transporter in the front of the iris as “b” and the allele that results in proper expression in the front of the iris as “B,” you can have three possible arrangements of the alleles at the two copies of your locus:
• B,B (homozygous for the dominant brown-eye allele);
• B,b (heterozygous, one of each allele);
• b,b (homozygous for the recessive allele).

Each of these arrangements is called a “genotype.” Genotype tells you what two alleles at a particular locus you have.
BB and Bb both result in brown eyes. Brown eyes is the “phenotype” the person has. Phenotype describes the actual trait.
Only the homozygous recessive (bb) results in blue eyes.

## Example Punnett squares involving a single locus:

#### Two "true breeding" populations.

"True breeding" is code for homozygous. Do you see why?

AA x aa

| Male/female | A | A |
|---|---|---|
| a | Aa | Aa |
| a | Aa | Aa |

#### Heterozygous by heterozygous

Aa x Aa

| Male/female | A | a |
|---|---|---|
| A | AA | Aa |
| a | Aa | aa |

You get the 1:2:1 distribution of genotypes. IF it is a trait inherited by simple dominance, then you get 3:1 dominant/recessive.

#### Heterozygous by homozygous recessive:

Aa x aa

| Male/female | A | a |
|---|---|---|
| a | Aa | aa |
| a | Aa | aa |

This is often a really useful one. You get 1:1 heterozygous:homozygous recessive. This is sometimes called a "test cross," because it is an easy way to see if your fly expressing the dominant trait is heterozygous or homozygous (compare the results of this cross to the first case of AA x aa above).
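All three crosses can be generated with one small function. It is a sketch that assumes a single autosomal locus and simple dominance; `punnett` and `phenotype` are made-up names, and every Punnett-square cell is counted once.

```python
from collections import Counter
from itertools import product


def punnett(parent1, parent2):
    """Offspring genotype counts for one locus, e.g. punnett("Aa", "aa")."""
    # sorted() puts the uppercase allele first, so "aA" and "Aa" count together
    return Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))


def phenotype(genotype):
    """Simple dominance: one uppercase (dominant) allele is enough."""
    return "dominant" if any(allele.isupper() for allele in genotype) else "recessive"


for cross in [("AA", "aa"), ("Aa", "Aa"), ("Aa", "aa")]:
    genotypes = punnett(*cross)
    phenotypes = Counter()
    for genotype, n in genotypes.items():
        phenotypes[phenotype(genotype)] += n
    print(" x ".join(cross), dict(genotypes), dict(phenotypes))
```

The test-cross logic shows up directly: an AA parent crossed to aa gives only dominant offspring, while an Aa parent gives a 1:1 split.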

# Translation

## Key Words

1. Ribosome: the machine that synthesizes protein by translating the code of the mRNA (with the aid of tRNA). It has a small and a large subunit and is made mainly of RNA.
2. tRNA: Transfer RNA is the adaptor molecule that ferries the amino acid to the ribosome. It has an anticodon, a three-base sequence that reads the codon.
3. Codon: a three-base sequence on the mRNA that encodes an amino acid.
4. Anticodon: a three-base sequence on the tRNA that reads the codon. It is complementary to it.
5. “A-site”: Anterior (or “aminoacyl”) site on the ribosome. This is the site where the tRNA enters with the amino acid linked to its 3' CCA sequence.
6. “P-site”: Posterior (or “peptidyl”) site. This is the site in the ribosome where the tRNA with the growing protein chain is attached.
7. “E-site”: exit site on the ribosome.
8. Start codon: the initiator tRNA, which holds the first amino acid, reads the codon AUG and carries the amino acid methionine (Met, or “M”). The anticodon for the start codon is 5'-CAU-3' (think about it).
9. Stop codon: there are three codons that tell the ribosome to stop translation (UAG, UAA, UGA).
10. Reading frame: since the code is read in groups of three non-overlapping bases, any stretch of mRNA has three possible reading frames. Only one reading frame at a time is used.
11. ORF, or open reading frame: a stretch of codons that starts with an AUG and ends with a stop codon and therefore can encode a protein. While I usually only write out a few, a typical ORF would encode hundreds of amino acids. Collagen, for example, is a large protein and is 1400 amino acids or so long.
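The definitions of codon, start codon, stop codon, reading frame and ORF fit together in a short ORF finder. This is a minimal sketch, assuming an mRNA written 5' to 3' using only A, C, G and U, and the standard code (the one-letter string is the standard table listed in U, C, A, G order). The example mRNA is made up, and positions are counted from 0.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UUU, UUC, UUA, UUG, UCU, ... order.
# "*" marks the three stop codons (UAA, UAG, UGA).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def find_orfs(mrna):
    """Yield (start, end, protein) for every AUG...stop stretch in all three frames.

    start/end are 0-based positions; end includes the stop codon.
    """
    for frame in range(3):
        start = None
        for i in range(frame, len(mrna) - 2, 3):
            codon = mrna[i:i + 3]
            if start is None:
                if codon == "AUG":          # start codon opens an ORF
                    start = i
            elif CODE[codon] == "*":         # stop codon closes it
                protein = "".join(CODE[mrna[j:j + 3]] for j in range(start, i, 3))
                yield start, i + 3, protein
                start = None


mrna = "GGAUGGCUUGGUAAGC"   # made-up example
for start, end, protein in find_orfs(mrna):
    print(start, end, protein)   # 2 14 MAW
```

Only the third frame contains an AUG here, so only one ORF is found. A real ORF would run for hundreds of codons instead of three.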

## Overview

I’m going to draw a bit on Wikipedia for this. Here is a figure from it:

And here is a link to the video from HHMI. Translation in eukaryotes proceeds at about two amino acids per second. In bacteria, it is closer to 20 amino acids per second.

## Regulatory sequences

Recall that there is a 5' untranslated sequence on the mRNA. There you will find sequences that direct the ribosome to the start of the protein-coding sequence. In bacteria, that sequence is more important and better characterized. In eukaryotes, the regulation of where to start is less well understood…but we are learning.
Consider the sequence of letters BATHECATWASBADTHEDAYSHEBITTHEDOGOT. You can find the meaning only by starting at the correct letter (in this case, the third letter: THE CAT WAS BAD THE DAY SHE BIT THE DOG). If you start with BAT, that works, but the rest is not meaningful: HEC ATW ASB ADT…
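Here is the same idea in code: a sketch that lists all three ways to chop the letters into non-overlapping triplets. `reading_frames` is a made-up helper.

```python
def reading_frames(text):
    """The three ways to read a string as non-overlapping triplets."""
    return [[text[i:i + 3] for i in range(frame, len(text) - 2, 3)]
            for frame in range(3)]


message = "BATHECATWASBADTHEDAYSHEBITTHEDOGOT"
for frame, words in enumerate(reading_frames(message), start=1):
    print(f"start at letter {frame}:", " ".join(words))
```

Only the frame that starts at the third letter gives THE CAT WAS BAD THE DAY SHE BIT THE DOG; the other two frames are nonsense from start to finish.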
The AUG tells the ribosome: put a methionine here and keep reading in this frame. After some long series of amino acids, the ribosome will encounter a stop codon and release the mRNA (which does require a protein "release factor") and the newly made protein. Multiple ribosomes can be reading a single mRNA at one time, lined up one after the other.
The Code:
You need a triplet codon because you need three nucleotides to get enough possible combinations to encode all 20 amino acids: two nucleotides give only 4 × 4 = 16 combinations, while three give 4 × 4 × 4 = 64 (three of which are stop codons). The rest of the code is given below. Note that most amino acids have more than one codon. We say the code is “degenerate” for this reason.
Here is the general code. Note that not every organism uses exactly this code. In a couple of organisms, for example, UGA is read as a tryptophan codon. How would this happen? Take a look at where tryptophan is in the codon table and predict what could change to lead to UGA being read as tryptophan instead of as a stop codon.
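You can check the counting argument and the degeneracy directly. This sketch builds the standard code from the usual one-letter string (U, C, A, G order for each position, "*" for stop) and then makes a hypothetical variant in which UGA is read as tryptophan. It only shows the result of that change, not how it comes about.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

print(len(BASES) ** 2, "doublets vs", len(BASES) ** 3, "triplets")  # 16 vs 64

codons_per_aa = Counter(STANDARD.values())
print("stop codons:", codons_per_aa["*"])            # 3
print("amino acids:", len(codons_per_aa) - 1)         # 20
print("codons for Trp (W):", codons_per_aa["W"])      # 1 (UGG only)
print("codons for Leu (L):", codons_per_aa["L"])      # 6

# Hypothetical variant code: UGA is read as tryptophan instead of stop.
variant = dict(STANDARD, UGA="W")
print(sorted(c for c, aa in variant.items() if aa == "W"))  # ['UGA', 'UGG']
```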

## More detail on the mechanism:

Of course, there is more…much more. Here is a really cool video (with an interesting soundtrack) from a lab that works on one of my favorite proteins: EF-Tu (elongation factor Tu is the name for the protein in bacteria, but the function exists in eukaryotic systems also).
EF-Tu is arguably the original G-protein (certainly the first one we figured out). It sets the minimum time a tRNA must stay in the ribosome before the peptide bond is formed. In this way, accuracy is greatly improved.
Also, while there is some argument on the details, the GTP hydrolysis is thought to be part of what drives the assembly to "ratchet" to the next site.

The next video up should be this one:

It's good too.

# Telomeres and Telomerase:

http://www.nature.com/news/2010/101128/full/news.2010.635.html
Two problems:
1. As we have discussed, the chemistry of replication leaves the lagging strand not fully replicated. There is a 3' extension of what had been the template. This would lead to shortening of the chromosome with each round of replication.
2. We have not previously discussed this, but the ends of broken chromosomes, or just free ends of linear DNA, lead to recombination or “splicing” together of these fragments. Also, free ends of DNA tend to be degraded by enzymes quickly. So, what is special about the ends of our chromosomes that keeps this from happening?
Telomeres can be described as the specialized ends of chromosomes that protect the rest of chromosome and keep it stable.

### Cancer and the Hayflick Limit

Our cells seem to go through a fixed number of divisions before they no longer can keep going. This is called the “Hayflick Limit” after the person who noticed it.
However, cancer cells are immortal…they overcome this limit. Also, there must be a way for cells to fix the problem so that we can “reset” the Hayflick limit when a new embryo is formed. One more thing: one-celled organisms don’t have a Hayflick limit. They can grow indefinitely. All of this pointed to some enzyme activity needed to maintain the ends.
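Here is a toy model of why losing a bit of telomere each division would produce a fixed limit, and why an enzyme that adds sequence back would remove it. All the numbers (starting length, loss per division, the "critical" length, the amount added back) are hypothetical, picked only to make the arithmetic easy, and the cap simply stands in for "no limit."

```python
def count_divisions(telomere_bp, loss_bp, critical_bp, telomerase_bp=0, cap=1_000):
    """Divisions before the telomere would drop below critical_bp.

    Each division loses loss_bp (the end-replication problem) and, if an
    end-maintaining enzyme is active, regains telomerase_bp. The loop stops
    at `cap` to represent "no limit". All inputs are hypothetical.
    """
    divisions = 0
    while divisions < cap:
        new_length = telomere_bp - loss_bp + telomerase_bp
        if new_length < critical_bp:
            break
        telomere_bp = new_length
        divisions += 1
    return divisions


print(count_divisions(10_000, 100, 5_000))                     # 50
print(count_divisions(10_000, 100, 5_000, telomerase_bp=100))  # 1000 (hits the cap)
```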

### The Sequence:

Well, since we are talking about DNA, it seems likely that there is some specific sequence that corresponds to the telomere. There is. In humans and other vertebrates, the sequence is the short repeat 5'-(TTAGGG)n, where “n” is an integer between 300 and 8000. So, that can be nearly 50,000 bases (8000 repeats × 6 bases = 48,000).
• How does it get there?
• How does it achieve the goals above?
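The arithmetic behind the repeat numbers is easy to check. A tiny sketch (`telomere` is a made-up helper):

```python
REPEAT = "TTAGGG"  # vertebrate telomere repeat, written 5' to 3'


def telomere(n):
    """Return 5'-(TTAGGG)n-3' as a string of n repeats."""
    return REPEAT * n


for n in (300, 8000):
    print(n, "repeats ->", len(telomere(n)), "bases")  # 1800, then 48000
```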

### Telomerase:

The enzyme telomerase is a little unusual. It has a DNA polymerase activity and needs a 3' end to which it adds. But it carries a short RNA that acts as its template. So, it finds the free 3' end, which is already longer than the newly replicated strand (due to the lagging-strand problem), and uses the internal template to extend the repeated sequence over and over again. Because it uses RNA as a template but polymerizes DNA, it is known as an RNA-dependent DNA polymerase, also known as “reverse transcriptase,” since it is the opposite of transcription. The image is stolen from Wikipedia and shows the repeating sequence.

That’s how it gets there.
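The "copy an RNA template into DNA, over and over" step can be sketched as below. In this sketch the template is simplified to exactly one repeat's worth of RNA (written 3' to 5' so it lines up under the DNA being made); that simplification and the helper names are mine.

```python
# Simplified: the RNA template is the complement of one TTAGGG repeat,
# written 3'->5' so each base sits under the DNA base it specifies.
RNA_TEMPLATE_3_TO_5 = "AAUCCC"
RNA_TO_DNA_PAIR = {"A": "T", "U": "A", "C": "G", "G": "C"}


def copy_template(template_3_to_5):
    """Reverse transcription: the DNA made by base-pairing with the RNA template."""
    return "".join(RNA_TO_DNA_PAIR[base] for base in template_3_to_5)


def extend_3_prime(dna_5_to_3, rounds):
    """Add one repeat per round to the 3' (right-hand) end of the strand."""
    for _ in range(rounds):
        dna_5_to_3 += copy_template(RNA_TEMPLATE_3_TO_5)
    return dna_5_to_3


print(extend_3_prime("TTAGGG", 3))  # TTAGGGTTAGGGTTAGGGTTAGGG
```

Because the enzyme only ever adds to a 3' end, the new repeats go on the strand that was already sticking out, which is what the paragraph above describes.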
It fixes the shortening problem by just adding the sequence. This can then serve as a template for more lagging-strand synthesis. That will still leave a short 3' extension, but will re-lengthen the telomere.
It’s an interesting enzyme. I’ve found many other images with great detail on it, but decided to steal this one:

It shows that there is a long, complex RNA that interacts with several proteins (dyskerin, TERT and some smaller ones). The RNA is much more than the template. It folds into a complex structure that fits into the proteins. I included this one because it makes reference to a couple of forms of a genetic disease known as “Dyskeratosis.” This is a version of premature aging where the effects are seen as premature organ failure, rather than an old-looking outward appearance. Does it make sense that mutations to genes in the telomerase complex would lead to that?

As for how it solves problem number two, the extended 3’ end will fold back around and, along with some accessory proteins, will form a complex, stable structure shown below in cartoon form:

This cartoon is from:
http://www.bioscience.org/2008/v13/af/2825/fulltext.asp?bframe=figures.htm&doi=yes

A more realistic view of the DNA in its four-strand form is below (image from Wikipedia):

The green dot is a monovalent cation.
This depicts how the structure forms. Note that the strands are parallel, not antiparallel.
(image from this website:
http://proj1.sinica.edu.tw/~tigpcbmb/course%20material/cb9903/cb9903.htm)

# DNA Replication

Let me lay out the basics:
Since DNA is two complementary strands, each strand contains the information to specify the other. Thus, it seemed logical from the first time the structure was determined that the two strands would separate and new subunits (deoxyribonucleotides) would be added to make each new strand, using the other old strand as a template. Each new double helix would therefore really be one old strand and one new one.
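Here is the "one old strand, one new strand" idea as a short sketch. The sequence is made up, both strands are written 5' to 3', and the helper names are mine.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def new_strand_on(template_5_to_3):
    """Strand built on a template, written 5' to 3' (complementary and antiparallel)."""
    return template_5_to_3.translate(COMPLEMENT)[::-1]


def replicate(duplex):
    """Semiconservative copying: each daughter = one old strand + one new strand."""
    old_top, old_bottom = duplex
    daughter1 = {"old": old_top, "new": new_strand_on(old_top)}
    daughter2 = {"old": old_bottom, "new": new_strand_on(old_bottom)}
    return daughter1, daughter2


top = "GATTACA"
parent = (top, new_strand_on(top))   # ("GATTACA", "TGTAATC")
for daughter in replicate(parent):
    print(daughter)
```

Both daughters end up with the same sequences as the parent, but each one keeps exactly one of the parent's original strands.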
Here is the basic idea in video. Like most of the videos I will link, these come from the Howard Hughes Medical Institute (HHMI). Note that this video shows the new strands being made the same way for both templates. As we know (and the video alludes), this cannot happen.

## The first problem:

So, one strand is the template for the other. New bases are added one at a time via a simple chemical reaction we have talked about, mediated by a complicated enzyme machine (comprising many different proteins).
The problem is that the chemistry requires that a new subunit can only be added to the 3’ end. So, if you are moving along a replication fork, one strand cannot be replicated easily…the fork is moving the wrong way and it has to be replicated “backward.”
Here’s a more basic video that shows you how an origin of replication might work and some detail, but in a much simpler form.
Here are two other videos, here and here, that have merit, though all of them, including the cool one below, have errors in them.

Notice that there is a second problem.
As the video says, you need a short RNA primer to begin each section when synthesizing the lagging strand. This is put down by an enzyme called “primase.” The leading strand needed an RNA primer to get started too. But, since it is replicated continuously, it only needs one primer, way back at the start of replication.
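A back-of-the-envelope sketch of the primer problem for one replication fork: the leading strand needs one primer, the lagging strand needs one per section. The section length and fork distance are hypothetical inputs, not measured values, and `primers_needed` is a made-up name.

```python
import math


def primers_needed(fork_travel_nt, section_length_nt):
    """Primers for one fork: 1 for the leading strand, one per lagging-strand section."""
    leading = 1
    lagging = math.ceil(fork_travel_nt / section_length_nt)
    return leading, lagging


print(primers_needed(10_000, 200))  # (1, 50) with these made-up numbers
```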
Here is a link to the really cool video. I think you should look at it again, now that you have seen the simple one. We have to name all the enzymes and talk more about details tomorrow.

### Here are the details of the problem

The unit of DNA polymerization (synthesis) is a deoxyribonucleoside triphosphate. In the image, the “Base” would be either A, T, C or G, depending on what was on the template strand. Just like ATP, these molecules have high-energy (that is, unstable) bonds joining the phosphates. These can therefore be used in transfer reactions, just like enzymes transfer phosphates from ATP in reactions we have studied. There is a seemingly subtle change: instead of the third (γ, or “gamma”) phosphate on the end being attacked, the OH on the 3' carbon attacks the first one (called “α”). This change has a big impact, though. It links the 3' carbon of the existing DNA to the 5' carbon of the incoming nucleotide via a phosphate.
This is called a “phosphodiester.”

Here is a specific example, deoxyATP

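The 3' OH at the end of the growing strand will then attack the α phosphate of the next incoming nucleotide, using the energy in its high-energy triphosphate and releasing the other two phosphates.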

And forms this:

Thus, as we said, we must add DNA to the growing 3' end. From a chemical standpoint, there isn’t any reason why you cannot add an incoming base to the 5' end, provided that end has a 5' triphosphate. That triphosphate would be unstable, however. Should it hydrolyze, the new DNA wouldn’t be made until another enzyme came in and “recharged it” with new phosphates.

# RNA Splicing part 1

Here is the link to that same video, which is presented by Cold Spring Harbor Lab, where I used to work years ago and where some of the work I discussed today was done.

## Two benefits of splicing

While no one believes that RNA splicing evolved because of these benefits, these are real benefits organisms now enjoy because of it.
1. Alternative splicing: Not all the exons are included when splicing of specific mRNAs takes place. Different versions of the mRNA may be formed. In the example I gave earlier, exons 1, 3, and 4 could all be joined up in sequence, or all 4 could be linked. This results in different versions of the protein, with different functions. The domain structure of proteins makes this possible. NOTE: you do not alter the order of the exons when this is done.
2. Exon shuffling: Also because of the domain structure of proteins, it is possible to add exons to genes via recombination and create different versions of the protein. Any chunk of DNA that has portions of introns on its ends and is then recombined (at the DNA level) into another intron will do no damage to an existing gene (it can be spliced out at the mRNA level). But it also provides the possibility, through alternative splicing, to evolve proteins with new functions.
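The alternative-splicing example (exons 1, 3 and 4, or all four) can be enumerated with a short sketch. It assumes, purely for illustration, that the first and last exons are always kept and each middle exon can be included or skipped; real cells regulate which versions are actually made. `isoforms` is a made-up helper.

```python
from itertools import combinations


def isoforms(exons):
    """mRNAs that keep exons in their original order, always keep the first and
    last exon (an assumption for this sketch), and include or skip each middle exon."""
    first, middle, last = exons[0], exons[1:-1], exons[-1]
    result = []
    for k in range(len(middle) + 1):
        for chosen in combinations(middle, k):   # combinations preserve input order
            result.append([first, *chosen, last])
    return result


for mrna in isoforms(["exon1", "exon2", "exon3", "exon4"]):
    print("-".join(mrna))
```

Because `combinations` never reorders its input, exon order is preserved automatically, matching the NOTE above.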
Key words:
Spliceosome: (there’s a good descriptive name) It’s the assembly of proteins and RNA that carry out splicing.
snRNPs (pronounced “snurps”). These are “small nuclear ribonucleoproteins.” They are the components of the spliceosome.
snRNAs: small nuclear RNAs are the main catalytic components. It is the RNA that carries out the reaction. They have names like “U1” and “U2.”

Self Splicing:
The story about how RNAs got spliced became more obvious when self-splicing RNAs were found. I mentioned these today.
The general reaction looks like this:

There are two successive transesterification reactions that occur in a concerted way. The first involves a branch-point sequence near the 3' end of the intron: the 2' OH of that branch point attacks the 5' exon/intron boundary. This joins the branch point to the 5' end of the intron, forming a lariat, and frees the 3' end of the first exon, which then attacks the 5' end of the next exon, resulting in the spliced exons and a lariat structure of the intron.
In the self-splicing form, “internal guide sequences” form base-paired structures that hold the players together and facilitate the attacks (which usually involve that 2' OH in an intermediate).
You can see below how the stem-loop structures formed using base pairs, then can fold into three-dimensional structures, facilitating catalysis. Yes, I know, it should have been obvious.

You can see how the base pairing can be used to bring the active sites together in more detail here. By the way, these last two images are used without permission. The one below is from the Molecule of the Month blog/discussion and the one above is from Trends in Ecology and Evolution.

## Normal mRNA splicing (non self-splicing)

The story is not all that different in the splicing of mRNA. There are guide sequences and a branch-point lariat. The chemistry varies a little as to who attacks whom when and where. But, the main difference is that the sequence of the intron is not that important for the structure to work. Instead of the intron folding into a complex structure, the separate RNAs in the snRNPs form those structures and then bind the specific sequences in the intron and at the intron-exon borders.

# Operon:

After reading my blog, go to this site and do the activity, including the self test.
Also, this is an interactive demo that is moderately useful. You have to put all the elements in place on the "DNA" to see it operating. It works if you have Java running on your computer. If you have it blocked, it will download and you can tell your computer to run it. It is up to you. You don't have to do this one.

An operon is a system that exists in bacteria, but not in eukaryotes, for regulating several genes together. Suppose, for example, you are a bacterium that sometimes encounters the sugar lactose. It would be good to have the genes for proteins to process that. You’d need a transport protein to get the lactose into the cell efficiently, and an enzyme to break the lactose into its components, glucose (you know how that can be used) and galactose (which you can also convert to glucose).
But, you wouldn’t want to be making these proteins all the time. It would be a waste of energy. Ideally, you would have a system that kept the genes for these proteins “Off,” but then be able to sense the presence of lactose and turn the genes “On.”
That is the lac operon, and it serves as an example.

## What an operon needs

1. A stretch of DNA that encodes several proteins on ONE mRNA. The ribosome will make all of them reading the same message. This does not happen in eukaryotes.
2. Promoter: a site on the DNA recognized by RNA polymerase (which, in E. coli, carries its own transcription factor to bind the promoter sequence).
3. Operator: A sequence in the DNA that binds another protein, called Repressor. When repressor is bound, the polymerase cannot get access to the promoter.
4. A Repressor: The protein that binds the Operator and prevents transcription. It is usually encoded nearby, transcribed from another promoter. Importantly, the repressor has to exist in two states: One that binds the operator and prevents transcription of the genes and one that does not.
5. Effector molecule: molecule, such as lactose in the example of the lac operon, that binds to repressor and switches it between the two states.

Below is a general picture taken from Wikipedia.
The players are numbered below as:
1. RNA Polymerase
2. Repressor
3. Promoter
4. Operator
5. Inducer (such as Lactose)
6. 6, 7 and 8: the coding sequences for several proteins.
In the first panel, the repressor is bound because the inducer is absent. RNA polymerase cannot gain access to the gene.
In the second panel, the inducer changes the shape of the repressor, which causes it to release from the operator. Transcription can then occur.
Note that there are also cases where the repressor binds only in the presence of the effector molecule. So, you can turn off an operon when the effector is present. This is great for feedback inhibition. The Trp operon is an example of this and is described in the book.
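Both kinds of operon follow the same on/off logic, with one switch flipped. This sketch is deliberately simplified: it models only the repressor and the effector and ignores any other layers of regulation; the class and field names are made up.

```python
from dataclasses import dataclass


@dataclass
class Operon:
    name: str
    # False: the effector pulls the repressor OFF the operator (inducible, like lac).
    # True: the repressor binds the operator only WITH the effector (like trp).
    binds_with_effector: bool

    def repressor_bound(self, effector_present):
        """The repressor's two states: on the operator or off it."""
        return effector_present == self.binds_with_effector

    def transcribed(self, effector_present):
        """RNA polymerase can use the promoter only when the operator is free."""
        return not self.repressor_bound(effector_present)


lac = Operon("lac", binds_with_effector=False)   # effector: lactose (inducer)
trp = Operon("trp", binds_with_effector=True)    # effector: tryptophan

for operon in (lac, trp):
    for present in (False, True):
        state = "ON" if operon.transcribed(present) else "OFF"
        print(f"{operon.name}: effector present={present} -> {state}")
```

The lac operon turns on when lactose shows up; the trp operon turns off when tryptophan is plentiful, which is the feedback inhibition described above.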

# RNA Splicing part 1

Take a look at this video, which is presented by Cold Spring Harbor Lab, where I used to work years ago and where some of the work I discussed today was done.
The parts of the mRNA that are removed and discarded are called "introns," while the parts that are maintained in the final, mature mRNA are called exons. The exons include the portion that codes for the protein, as well as other regulatory sequences both before and after the "coding" region. As the video describes, there are complexes of proteins and RNA that do the splicing. The whole complex is called the "Spliceosome." The components include the generically named "small nuclear RNAs," which assemble with proteins to form "Small Nuclear Ribonucleoproteins," or "snurps."
The story of how and why this happens is one of my favorite topics. We will discuss this more later.

# RNA Processing

Most of what I’ll be talking about here occurs specifically in eukaryotes.
As usual, most images are from Wikicommons.
Following transcription, the “pre-mRNA” must be processed on its way out of the nucleus to the cytoplasm.
First, notice that the sequence that made up the promoter does not end up in the pre-mRNA. This is a general theme: with each step in the DNA→RNA→Protein pathway, the information needed to specify regulation of each step generally is lost as we go to the next step. This makes sense, since it is no longer needed. Jerry had the clever realization of that fact today in class.

## Poly-A tail and GTP cap.

This addresses two issues: what is the "stop transcription signal" and how do we protect both ends of the mRNA from being rapidly broken down?
When the RNA polymerase reaches the end of the sequence that’s supposed to be transcribed, it hits a signal called the “polyadenylation signal.” This is the "stop transcription signal," but it also triggers something else. The sequence in the DNA (coding strand) is 5'-AATAAA-3'. Of course, the polymerase is reading the template strand, so you could say the “signal” really is 3'-TTATTT-5' on the template (5'-TTTATT-3' when written in the usual direction). The sequence varies a fair bit. This signals the RNA polymerase to leave the DNA.

The sequence now in the pre-mRNA (AAUAAA, or something similar) recruits a protein complex that cleaves the pre-mRNA roughly 10 to 30 nucleotides downstream of the signal, and an enzyme called PAP, for poly(A) polymerase, uses ATP to add a long series of A’s to the new 3' end. These A’s, and there may be hundreds of them, are NOT encoded anywhere. This is the reason there was a 3' extension at the end of the mRNA in the picture today. (To be fair, the researchers already knew about this, and it is part of how they oriented themselves on the DNA/RNA duplex.)
The tail seems to be useful for recruiting several poly-A binding proteins that protect the mRNA from the nucleases that would degrade it, facilitate the next steps in processing and the transport of the mRNA out of the nucleus to the cytoplasm, and help regulate translation.
Some prokaryotes also add a version of a poly-A tail. But, again, most of this applies to eukaryotes.
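The strand bookkeeping and the "cut, then add A's" step can be sketched in a few lines. Everything here is illustrative: the function names are hypothetical, the gene sequence is made up, and the fixed 15-nucleotide cleavage offset and 20-A tail are assumptions chosen for readability (real cleavage sites vary within the 10 to 30 nucleotide window, and real tails are usually much longer).

```python
# Base-pairing partners for DNA.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def template_3to5(coding_5to3: str) -> str:
    """Template strand written 3'->5', lined up base-by-base under the coding strand."""
    return coding_5to3.translate(DNA_COMPLEMENT)


def template_5to3(coding_5to3: str) -> str:
    """The same template strand written in the usual 5'->3' direction."""
    return template_3to5(coding_5to3)[::-1]


def transcribe(coding_5to3: str) -> str:
    """The RNA matches the coding strand, with U in place of T."""
    return coding_5to3.replace("T", "U")


def cleave_and_polyadenylate(pre_mrna: str, signal: str = "AAUAAA",
                             offset: int = 15, tail_length: int = 200) -> str:
    """Cut the pre-mRNA a fixed distance past the first poly-A signal and add a tail.

    offset and tail_length are illustrative assumptions, not measured values.
    """
    i = pre_mrna.find(signal)
    if i == -1:
        return pre_mrna  # no signal found; leave the RNA unchanged
    cut = i + len(signal) + offset
    # The A's come from ATP via poly(A) polymerase, not from the DNA template.
    return pre_mrna[:cut] + "A" * tail_length


print(template_3to5("AATAAA"))  # TTATTT  (3'->5')
print(template_5to3("AATAAA"))  # TTTATT  (5'->3')

coding = "ATGGCCTAACCTCAGAATAAAGCTCCTGACTTCGCAGGGTTTCC"  # hypothetical 3' end of a gene
pre_mrna = transcribe(coding)
print(cleave_and_polyadenylate(pre_mrna, tail_length=20))
```

The last line shows that everything downstream of the cut is lost and replaced by A's that have no counterpart in the DNA.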
The other initial change is the addition of the 5' cap, which is built from GTP. Actually, it’s a modified G with a methyl group on position 7 of the base, cleverly called 7-methylguanosine. Wiki has a short discussion of the mechanism of transfer. Dedicated capping enzymes add it, and the cap looks something like this:

Note the “inverted” 5'-5' triphosphate link. The cap structure also protects the 5' end of the mRNA from degradation.
RNA splicing:

The final processed mRNA might look something like this:

It comprises the coding sequence, the cap, the 5' and 3' “untranslated regions” (UTRs), and the poly-A tail. I think all the names are pretty self-explanatory. The 5' UTR, in particular, contains sequences that help regulate translation. Only the green stretch above will make it into the protein.
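To make the layout concrete, here is a sketch that assembles a made-up mature mRNA from its parts and then picks out the coding stretch. It assumes the simplest possible rule, start at the first AUG and stop at the first in-frame stop codon; real start-codon choice also depends on surrounding sequence context, which this ignores. The function name and all sequences are hypothetical, and the cap is left out because it is a modified nucleotide, not part of the sequence string.

```python
from typing import Optional

STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_coding_sequence(mrna: str) -> Optional[str]:
    """Return the stretch from the first AUG to the first in-frame stop codon.

    Simplified rule: sequence context around the AUG is ignored.
    """
    start = mrna.find("AUG")
    if start == -1:
        return None
    # Step through codons in the reading frame set by the start codon.
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return mrna[start:i + 3]
    return None  # no in-frame stop codon


# Hypothetical mature mRNA pieces (the 5' cap is not represented in the string).
utr5 = "GGCACUCC"
cds = "AUGGCCGAUUCAUGGUAA"
utr3 = "CCUCAGAAUAAAGCU"
poly_a = "A" * 20
mrna = utr5 + cds + utr3 + poly_a

print(find_coding_sequence(mrna))  # AUGGCCGAUUCAUGGUAA
```

Everything outside the returned stretch (both UTRs and the tail) stays in the mRNA but is never translated.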

# Transcriptional regulation

As usual, images from wikicommons.
The first question is, how does the machine that makes RNA recognize a “gene”? In the past, I’ve called a “gene” the stretch of DNA encoding a protein and its regulatory sequences. We can separate it into the “structural gene,” which contains the actual code of the protein as it would be found in the mRNA, and the regulatory sequences. A very basic regulatory region is called the “promoter.” It is a region that serves as the initial recognition site that says: “This is a gene, transcribe here.”
In addition to the hydrogen bonds that make up the “Watson-Crick” base pairs, there are additional sites for hydrogen bonds and other contacts that can be made without “unzipping” the DNA, primarily along the major groove. In particular, the major groove is just about the right size for a protein alpha helix to fit in. There, R groups (side chains) on the outside of the helix can make specific contacts with the bases and “read” the sequence (that is, bind in a sequence-specific manner). The first step in transcription is recognition of the promoter, almost always upstream of the structural gene. There are several types of DNA-binding proteins. Here are two examples:
This is “Lambda repressor.” Notice two things. First, the protein binds as a dimer of two identical subunits, and the overall DNA sequence it binds is a “palindrome.” More on that later. Second, notice that one of the alpha helices in each subunit reaches down into the major groove, where it makes sequence-specific contacts. This particular protein actually blocks access of the RNA polymerase in E. coli. But similar proteins act to promote binding.
This is a bZIP (basic leucine zipper) DNA-binding domain, of a class that acts to promote transcription in our cells. Notice the same two things. Even though the overall structure of the proteins is different (note that much of the structure is left out), the binding has some similarities.
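Why does a dimer of identical subunits go with a palindromic site? A DNA “palindrome” reads the same 5'→3' on both strands, i.e. it equals its own reverse complement, so each identical subunit sees the same half-site. The sketch below checks how close a site comes to that. The sequences are short examples chosen for illustration, not the actual lambda operator, and `half_site_mismatches` is a hypothetical helper name.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """The other strand, read 5'->3'."""
    return dna.translate(DNA_COMPLEMENT)[::-1]


def half_site_mismatches(site: str) -> int:
    """Count positions where the left half fails to mirror the right half.

    A perfect palindrome equals its own reverse complement, so this returns 0.
    For odd-length sites the central base has no partner and is skipped.
    """
    rc = reverse_complement(site)
    half = len(site) // 2
    return sum(1 for a, b in zip(site[:half], rc[:half]) if a != b)


for site in ["GAATTC", "GAATTG", "TTGACAATGTCAA"]:  # illustrative example sites
    print(site, reverse_complement(site), half_site_mismatches(site))
```

Counting mismatches rather than returning a yes/no answer reflects that many natural binding sites are only approximately palindromic.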

## Assembly of the transcription complex

What is the sequence?
Well, that depends. The promoter is the most important thing that determines which genes are expressed (turned on) in which cells. Thus, genes that have to be on in every cell have different promoters than those that are only on in some. Genes that are turned on following a signal cascade have specific promoters. One of the best-known core promoter elements is a sequence called the “TATA box” (I wonder if you can guess why). It is bound by a complex depicted below. This is known as a “general transcription factor” because it is common to most genes.
As you can see, the "TATA-binding protein," or TBP, interacts with the DNA over a larger region than just the TATA box. It also bends the DNA rather strongly and partially opens it. You may remember from the video we saw that there was a sharp bend in the DNA where the initiation complex was forming. TBP is also a little unusual in that it binds to the minor groove using a beta-sheet domain rather than an alpha helix, and that binding is what bends and partially opens the DNA.

After initial proteins assemble at the binding site, they serve as binding sites for still more proteins that will be necessary to start transcription.

Here is a depiction of some of the proteins involved:

Notice the factors off to the right labeled 4 and 5. These have more to do with those larger levels of regulation I was talking about.

Here is a cool video on the process that also shows some of the next steps:

# Intro to Genetic Concepts

This just introduces some terms and ideas that we need to have in order to discuss what's going on with genes and how proteins are made.