BSCI 410 
Molecular Genetics 
Fall 2008  

 

Instructor:
Steve Mount 


 

TA:
Zenas Chang 



Questions about Homework 4

The following are questions and answers from students regarding Homework 4 (2008). Each question is offset with a horizontal line.

I recommend that you do the reading in Strachan and Read (Chapter 11), using the search option where appropriate.


In Q4 and Q9, it asks about the genetic association of A1 and B1. I am not so sure about what I am expected to talk about, is it about the linkage between them? Because as stated in the question, they are unlinked.

I am asking about allelic association, which is when alleles occur together at frequencies significantly different from those predicted from the individual allele frequencies. This is also sometimes referred to as linkage disequilibrium. Association between a disease and an allele (or a haplotype) is similar: the disease and the allele occur together at frequencies that are higher than predicted from the individual frequencies. There are various statistical tests, and I put some of those on the slides, but for the homework it is sufficient to determine whether or not the frequencies are independent.
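As a rough illustration of the independence check, the sketch below compares the observed frequency of the A1 B1 combination with the product of the individual allele frequencies. The function name and all numbers are hypothetical and are not taken from the homework.

```python
def disequilibrium(p_a1: float, p_b1: float, p_a1b1: float) -> float:
    """Return D = P(A1 B1) - P(A1) * P(B1).

    D is zero when the two alleles occur together exactly as often as
    predicted from their individual frequencies (no association).
    """
    return p_a1b1 - p_a1 * p_b1


# Hypothetical frequencies, chosen only for illustration.
d_independent = disequilibrium(0.5, 0.5, 0.25)  # observed equals 0.5 * 0.5
d_associated = disequilibrium(0.5, 0.5, 0.40)   # A1 and B1 co-occur too often

print(d_independent)
print(d_associated)
```

A value of D that differs from zero indicates association; whether a difference seen in real sample data is significant is a question for the statistical tests mentioned above.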


Questions 7-9: I am not sure whether the random mating the question talks about also includes individuals from population 1 with those from population 1 or just individuals from population 1 with those from population 2.

Random mating means that individuals are equally likely to mate with others (of the opposite sex), independent of the population from which they are drawn. Thus, a given individual is equally likely to mate with an individual from population 1 or from population 2; the number of matings is determined by the number of individuals of each type.
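A minimal sketch of how the numbers of individuals determine the proportions of each kind of mating, assuming two populations of hypothetical sizes and ignoring sex for simplicity:

```python
def mating_fractions(n1: int, n2: int) -> dict[str, float]:
    """Expected fractions of matings by population of origin under random mating."""
    f1 = n1 / (n1 + n2)  # fraction of individuals from population 1
    f2 = 1 - f1          # fraction of individuals from population 2
    return {
        "1 x 1": f1 * f1,      # both mates from population 1
        "1 x 2": 2 * f1 * f2,  # one mate from each population
        "2 x 2": f2 * f2,      # both mates from population 2
    }


# Hypothetical population sizes, for illustration only.
fractions = mating_fractions(300, 100)
print(fractions)
```

All three kinds of mating occur, in proportions set by the relative sizes of the two populations.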


Q10: What does it mean to indicate the phase of the alleles? Should I choose one of the four possibilities?

Right, those are the four possible phases. Phase refers to which alleles are on the same chromosome, that is, the haplotype.


When discussing Hardy-Weinberg equilibrium, is the question referring to the underlying assumptions? Is the other key point that allelic frequencies remain the same although genotype frequencies change?

A locus is at Hardy-Weinberg equilibrium when the allele and genotype frequencies are in the expected relationship. If all of the assumptions are met, then allele and genotype frequencies will almost certainly be in equilibrium, but it is possible for a locus to be at equilibrium despite not all of the assumptions being met. In questions 2, 6 and 7 I'm not asking about assumptions. I am asking you to calculate the frequencies and tell me if the locus is at Hardy-Weinberg equilibrium.
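The calculation can be sketched as follows, assuming a locus with two alleles and hypothetical genotype counts (not the homework data). The allele frequency p is computed from the counts, and the observed genotype frequencies are compared with the expected p^2, 2pq and q^2. The exact-match tolerance is an assumption that suits idealized numbers; sampled data would call for a statistical test instead.

```python
def hardy_weinberg(n_11: int, n_12: int, n_22: int):
    """Return (observed, expected) genotype frequencies for a two-allele locus."""
    total = n_11 + n_12 + n_22
    p = (2 * n_11 + n_12) / (2 * total)  # frequency of allele 1
    q = 1 - p                            # frequency of allele 2
    observed = (n_11 / total, n_12 / total, n_22 / total)
    expected = (p * p, 2 * p * q, q * q)
    return observed, expected


def at_equilibrium(observed, expected, tol: float = 1e-9) -> bool:
    """True when every observed genotype frequency matches its expectation."""
    return all(abs(o - e) <= tol for o, e in zip(observed, expected))


# Hypothetical counts: one population in the expected proportions, and a
# pooled sample of two populations fixed for different alleles.
single = hardy_weinberg(25, 50, 25)
pooled = hardy_weinberg(50, 0, 50)

print(at_equilibrium(*single))
print(at_equilibrium(*pooled))
```

Both examples have the same allele frequencies (p = 0.5), but only the first has genotype frequencies in the expected relationship.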


For questions #5 and 8, is a logically derived answer using Punnett squares okay, or do we need to use probability functions?

Punnett squares are just devices for applying the rules of probability, so if you apply them correctly, there is no difference.


With regards to the lod score, when "phase is unknown," are recombinants scored as either affected individuals without the suspected allele, or unaffected individuals with the suspected allele?  Can theta be derived from this scoring?

LOD scores are a function of theta, so theta is not derived (although one can observe the value of theta, or the map position, with the maximal lod score). If the phase is unknown, then you average the two possibilities. Each individual is a recombinant under one model but a non-recombinant under the other. This is illustrated in slides 52 and 53 of lecture 16 and box 11.3 of Strachan and Read (online).  It's an important question because the answer provides a quantitative measure of how important it is to know the phase.
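The averaging can be sketched as below. This is a minimal illustration that assumes a single fully informative family with complete penetrance; the function names and the example of five children are hypothetical. Under one phase, `n_a` children are recombinant and `n_b` are not; under the other phase the roles are reversed.

```python
import math


def lod(likelihood: float, n_children: int) -> float:
    """log10 of the likelihood ratio against the no-linkage model (0.5 per child)."""
    if likelihood == 0:
        return float("-inf")
    return math.log10(likelihood / 0.5 ** n_children)


def lod_phase_known(theta: float, n_rec: int, n_nonrec: int) -> float:
    likelihood = theta ** n_rec * (1 - theta) ** n_nonrec
    return lod(likelihood, n_rec + n_nonrec)


def lod_phase_unknown(theta: float, n_a: int, n_b: int) -> float:
    # Each child is recombinant under one phase and non-recombinant under
    # the other; the two phases are equally likely, so average them.
    phase_1 = theta ** n_a * (1 - theta) ** n_b
    phase_2 = theta ** n_b * (1 - theta) ** n_a
    return lod(0.5 * (phase_1 + phase_2), n_a + n_b)


# Hypothetical family: five children, all in the same class, theta = 0.
z_known = lod_phase_known(0.0, 0, 5)
z_unknown = lod_phase_unknown(0.0, 0, 5)
print(z_known, z_unknown, z_known - z_unknown)
```

In this example the two scores differ by log10(2), about 0.3, which is the cost of not knowing the phase.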


Does the occasion where Z = 0.3n occur only in simple cases where theta = 0?

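Z = 0.3n for dominant transmission when theta = 0, the phase is known, there are no recombinants, and there are no complications such as incomplete penetrance. In that case each of the n informative meioses contributes log10(1/0.5) = log10(2), which is approximately 0.3.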
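The arithmetic behind the 0.3n figure can be checked with a few lines of code. This is only a sketch of the simple case described above, and the function name is hypothetical.

```python
import math


def simple_case_lod(n_meioses: int) -> float:
    """LOD at theta = 0 with known phase and no recombinants.

    Each meiosis has probability 1 under linkage and 0.5 under no linkage,
    so each contributes log10(1 / 0.5) to the total.
    """
    return n_meioses * math.log10(1 / 0.5)


for n in (1, 5, 10):
    print(n, round(simple_case_lod(n), 3))
```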


Why does the table (also in your slides) indicate that Z is (-)infinity for theta=0?

If the probability of the data given the model is 0, then the numerator of the odds ratio, and therefore the ratio itself, is also 0. The log of 0 is negative infinity. Thus, if something that is impossible under the model is observed (such as a recombinant when theta = 0), then the model is wrong, and the LOD score is negative infinity.


In a case with zero R and theta=0 should the term theta^R be excluded? (0^0)

The probability in the numerator is the product of the probabilities of the individuals in the pedigree. If there are no recombinants, then it's simply (1-theta)^NR. If you have both recombinants and non-recombinants, then the term is (1-theta)^NR times theta^R. It's not so much that the term is excluded as it is that it doesn't arise in the first place.
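One way to see why the 0^0 term never arises is to build the numerator as a product over individuals, as in the sketch below (hypothetical function names, phase assumed known). With no recombinants, the loop over recombinants simply never runs.

```python
import math


def numerator(theta: float, n_rec: int, n_nonrec: int) -> float:
    """P(data | linkage at theta) as a product over individuals."""
    prob = 1.0
    for _ in range(n_nonrec):
        prob *= 1 - theta  # one factor per non-recombinant
    for _ in range(n_rec):
        prob *= theta      # one factor per recombinant; none if n_rec == 0
    return prob


def lod(theta: float, n_rec: int, n_nonrec: int) -> float:
    top = numerator(theta, n_rec, n_nonrec)
    bottom = 0.5 ** (n_rec + n_nonrec)  # no linkage: 0.5 per individual
    return math.log10(top / bottom) if top > 0 else float("-inf")


print(lod(0.0, 0, 4))  # no recombinants: a finite, positive score
print(lod(0.0, 1, 4))  # one recombinant is impossible at theta = 0
```

The second call returns negative infinity, because a single recombinant makes the data impossible under the theta = 0 model.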


I want to emphasize that the lod score is the log (base 10) of the ratio between two probabilities,

Z = log10(P(data|M1)/P(data|M0))

Since independent probabilities are multiplied, lod scores can be added.

I will go through the detailed calculation for one child as a way of illustration. The scores for other children, being independent, can be multiplied as probabilities or added as LOD scores.

Consider a child that has inherited both the marker and the disease.   There are two ways to calculate an odds ratio for the model that the disease maps to that marker (precisely, that is, with theta=0).  In the first case, we divide all events into recombinant and nonrecombinant.  Under the model of linkage with theta = 0, all progeny should be nonrecombinant.

We can make a table:

P(data|M1) is 1 for non-recombinants and 0 for recombinants.

P(data|M0) is 0.5 for non-recombinants and 0.5 for recombinants as well.

In the second method, we have four classes,
1) disease, marker
2) no disease, no marker
3) disease, no marker
4) no disease, marker

Under model 1 (linkage), the two non-recombinant classes are equally likely, with P = 0.5
Under model 0 (no linkage), all four classes are equally likely, with P = 0.25.   Here

P(data|M1) is 0.5 for class 1 and class 2 and 0 for the recombinant classes (class 3 and class 4).
P(data|M0) is 0.25 for each of the four classes (which are equally likely).

Note that the odds ratio for the one child, who has inherited the disease and the marker, is identical in both cases:

In the first case, P(data|M1)/P(data|M0) = 1/(0.5) = 2
In the second case, P(data|M1)/P(data|M0) = 0.5/(0.25) = 2

The calculation that I just did for one child is the essence of the method. Additional children are independent, so odds ratios can be multiplied (or LOD scores added). The effect of penetrance is accommodated in a straightforward manner using the second approach, where four distinct classes are considered.
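The two tables can be written out as a short sketch to confirm that both methods give the same odds ratio for a child who inherited both the disease and the marker. The dictionary names are hypothetical, the probabilities are the ones given above for theta = 0, and the three-child family at the end is an invented example.

```python
import math

# Method 1: two classes (recombinant or non-recombinant).
m1_two = {"nonrecombinant": 1.0, "recombinant": 0.0}  # linkage, theta = 0
m0_two = {"nonrecombinant": 0.5, "recombinant": 0.5}  # no linkage

# Method 2: four classes (disease status, marker status).
m1_four = {
    ("disease", "marker"): 0.5,
    ("no disease", "no marker"): 0.5,
    ("disease", "no marker"): 0.0,
    ("no disease", "marker"): 0.0,
}
m0_four = {cls: 0.25 for cls in m1_four}  # all four classes equally likely

child = ("disease", "marker")  # a non-recombinant child
odds_two = m1_two["nonrecombinant"] / m0_two["nonrecombinant"]
odds_four = m1_four[child] / m0_four[child]
print(odds_two, odds_four)

# Three such independent children: multiply odds ratios or add LOD scores.
odds_three = odds_four ** 3
lod_three = 3 * math.log10(odds_four)
print(odds_three, lod_three)
```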


What is G1?

G1 means "generation 1" or first generation. It is like F1.



page by Steve Mount