Monday, March 18, 2013

Estimating the Human Mutation Rate: Biochemical Method

This is the second in a series of posts on human mutation rates and their implication(s). The first one was [What Is a Mutation?].

There are basically three ways to estimate the mutation rate in the human lineage. I refer to them as the Biochemical Method, the Phylogenetic Method, and the Direct Method.

The biochemical method relies on the well-known fact that the vast majority of mutations are due to errors in DNA replication. Since we know a great deal about the replication complex and the biochemistry of the reactions, we can calculate a mutation rate per DNA replication based on this knowledge. The details are explained in a previous post [Mutation Rates]. I'll give a brief summary here.

The overall error rate of DNA polymerase in the replisome is 10⁻⁸ errors per base pair. Repair enzymes fix 99% of these lesions for an overall error rate of 10⁻¹⁰ per bp. That means one mutation in every 10 billion base pairs that are replicated.

The human haploid genome is 3.2 × 10⁹ bp [How Big Is the Human Genome?] [How Much of Our Genome Is Sequenced?]. That means that on average there are 0.32 mutations (3.2 × 10⁹ bp × 10⁻¹⁰ per bp) introduced every time the genome is replicated. In the male, there are approximately 400 cell divisions between zygote and the production of a sperm cell.1 This gives a total of about 128 new mutations in every sperm cell. In the female, there are about 30 cell divisions between zygote and the production of egg cells. That's about 10 new mutations in every egg cell.

Adding these together gives us about 138 new mutations in every zygote. Let's round this down to 130. Thus the estimate from the Biochemical Method is:

130 mutations per generation
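The arithmetic above can be re-run as a short Python sketch. It only reproduces the post's multiplication. The function names (`mutations_per_replication`, `mutations_per_generation`) are hypothetical, and the default parameters are simply the numbers quoted in the text: 3.2 × 10⁹ bp, 10⁻⁸ polymerase errors per bp, 99% repair, 400 male divisions and 30 female divisions.

```python
# Sketch of the Biochemical Method arithmetic. Defaults are the values
# quoted in the post; every parameter can be changed to test other assumptions.


def mutations_per_replication(genome_bp: float, polymerase_error: float,
                              fraction_repaired: float) -> float:
    """Expected new mutations each time one haploid genome is copied."""
    residual_error = polymerase_error * (1.0 - fraction_repaired)
    return genome_bp * residual_error


def mutations_per_generation(genome_bp: float = 3.2e9,
                             polymerase_error: float = 1e-8,
                             fraction_repaired: float = 0.99,
                             male_divisions: int = 400,
                             female_divisions: int = 30) -> dict:
    """Mutations per sperm, per egg and per zygote (sperm + egg)."""
    per_rep = mutations_per_replication(genome_bp, polymerase_error,
                                        fraction_repaired)
    sperm = per_rep * male_divisions
    egg = per_rep * female_divisions
    return {"per_replication": per_rep, "sperm": sperm, "egg": egg,
            "zygote": sperm + egg}


if __name__ == "__main__":
    for key, value in mutations_per_generation().items():
        print(f"{key}: {value:.2f}")
```

With the defaults this gives about 0.32 mutations per replication, 128 per sperm, 9.6 per egg and 137.6 per zygote, i.e. the 138 figure before it is rounded down to 130.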


[Image Credit: Wikipedia: Creative Commons Attribution 2.0 Generic license]

1. This depends on the age of the man when he has children. The value used here is approximately the average for a 30-year-old man.

20 comments:

  1. 3.2 × 10⁻⁹ bp.

    Hopefully it's a bit bigger than that.


    1. Gimme a break!!

      I was only off by 18 orders of magnitude.

      Thanks.

    2. I was only off by 18 orders of magnitude.

      By William Dembski's standards, a small error.

  2. Could you elaborate on the number of cell divisions? How many in development, how many per whatever time period in spermatogenesis, how many in oogenesis? Citations would be dandy too.

    1. Hey, Larry. Looking back, I see there was never a reply here. It would be nice to know.

    2. The number of cell divisions in females and males is from a review by James Crow in 2000 and a paper by Huttley et al. in the same year. The primary reference is Vogel and Rathenberg (1975) but I haven't read the original. I increased the number of divisions in females to 30 from 24 but I don't remember why.

      Huttley, G. A., Jakobsen, I. B., Wilson, S. R. and Easteal, S. (2000) How important is DNA replication for mutagenesis? Molecular Biology and Evolution 17, 929-937. [PDF]

      Crow, J. F. (2000) The origins, patterns and implications of human spontaneous mutation. Nature Reviews Genetics 1, 40-47. [doi: 10.1038/35049558]

      Vogel, F. and Rathenberg, R. (1975) Spontaneous mutation in man. In Advances in Human Genetics, pp. 223-318. Springer.

  3. "The biochemical method relies on the well-known fact that the vast majority of mutations are due to errors in DNA replication."

    Sadly, this fact is getting forgotten. The "DNA damage as primary source" story seems to be spreading quite a bit among my colleagues.

  4. the well-known fact that the vast majority of mutations are due to errors in DNA replication

    Well... DNA replication includes more than just DNA pol α, δ and ε, which carry out replicative DNA synthesis. If the error rate of DNA pol β, which is involved in base excision repair (BER), is 1e-6 (the enzyme has no proofreading activity), then for a 20-year-old woman whose DNA suffers 10,000 AP sites per day due to spontaneous base hydrolysis, 1e-6 × 10,000 × 20 × 365 = 73 mutations. If we stipulate that mismatch repair reduces this burden by half (since there is no marker of which strand carries the mutation), the egg still accumulates ~36 mutations from base excision repair. The same calculation applies to boys. Adding the replicative errors (9 for girls and 124 for boys) to the 36 × 2 = 72 mutations from BER gives a total mutational load of 205 bases per generation.

    This analysis is extraordinarily sensitive to the fidelity of pol β, and it further assumes that for every AP site pol β only needs to replace 1 nt. Giving pol β an error rate of 1e-6 is probably overly generous (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2846199/), in which case the mutational load, as calculated biochemically, will be dominated by DNA repair synthesis, not replicative DNA synthesis.
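The commenter's BER estimate, and its sensitivity to pol β fidelity, can be sketched in Python. `ber_mutations` is a hypothetical helper; its defaults are the commenter's assumptions (error rate 1e-6, 10,000 AP sites per day, 20 years, half removed by mismatch repair, 1 nt per AP site), not measured values.

```python
# Sketch of the commenter's estimate of mutations introduced by pol beta
# during base excision repair (BER). All defaults are stated assumptions.


def ber_mutations(pol_beta_error: float = 1e-6,
                  ap_sites_per_day: float = 10_000,
                  years: float = 20,
                  mismatch_repair_fraction: float = 0.5,
                  nt_per_ap_site: int = 1) -> float:
    """Mutations from BER synthesis in one germ line up to a given age."""
    raw = pol_beta_error * ap_sites_per_day * nt_per_ap_site * years * 365
    # Assumes mismatch repair removes a fixed fraction of these errors.
    return raw * (1.0 - mismatch_repair_fraction)


if __name__ == "__main__":
    replicative = 9 + 124  # commenter's replicative errors: girls + boys
    ber = ber_mutations()
    print(f"BER per germ line: {ber:.1f}")
    print(f"Total per generation: {replicative + 2 * ber:.0f}")
    # The estimate scales linearly with the pol beta error rate.
    for err in (1e-6, 1e-5, 1e-4):
        print(f"pol beta error {err:g}: "
              f"{ber_mutations(pol_beta_error=err):.1f} per germ line")
```

Without rounding 36.5 down to 36, the total comes out at 206 rather than 205. Each tenfold drop in pol β fidelity multiplies the BER contribution tenfold, which is why it would quickly dominate the replicative errors.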

  5. Larry,
    There seem to be too many "humans" in the title. Please delete the first one.

  6. I hope that everyone understands that this old-school estimate is closer to a Fermi guesstimate than to biological reality. In an age when the mutation rate can (and should) be measured directly, this sort of calculation has only historical and pedagogical value.


    1. I suppose I should have mentioned that I have the utmost respect for back-of-the-envelope estimates. The great value of the biochemical analysis is that it shows that the rate is almost certainly not 10 or 1000 per generation.

      But yeah, today we ought to just put a number in the textbooks based on direct observations (maybe even broken into rates for different races if it turns out to vary) and be done with it.

    2. But do you actually trust the current "direct observation" experiments as the final say? From my experience with NGS, the data are so full of errors that you have to apply so many filters and checks that I would guess these new "direct" estimates are sure to be underestimates. Add to that the "rule of thumb" that with Illumina data you need 32x genome coverage to detect 98% of the known heterozygous alleles in a human sample, and these observations become, at best, low-end estimates.

      But let's do the math:
      The mean reported rate is 1.2x10^-8 per base per generation (Nature Reviews Genetics (2012) 13:745).

      I'm using assembly statistics for genome size, since we can't assay the unassembled part. So these "direct" measures miss 6-10% of the genome (it will not assemble to anything, so it is tossed out).

      Human male - ~5.98x10^9bp
      Human female - ~6.07x10^9bp

      Note: based on (PNAS (1991) 88:7474), the genome should be 6.45x10^9, so the assembly is missing ~6% by that estimate.

      Therefore, this gives 72 per male and 73 per female. Adjust for the 6% and we have 76 or 77. Given my criticisms of the current technology, is 130 any more unreasonable?
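The conversion in this comment can be sketched in Python. The constant and function names are hypothetical; the values are the ones quoted in the comment. The correction step assumes the unassembled ~6% mutates at the same rate as the rest, and scales up by dividing by 0.94 (multiplying by 1.06 instead gives the same rounded result).

```python
# Sketch: convert a per-base, per-generation rate into new mutations per
# diploid genome, using the figures quoted in the comment.

RATE_PER_BASE = 1.2e-8                           # mean direct estimate cited
DIPLOID_BP = {"male": 5.98e9, "female": 6.07e9}  # assembly-based sizes
MISSING_FRACTION = 0.06                          # ~6% of genome unassembled


def new_mutations(genome_bp: float, rate: float = RATE_PER_BASE) -> float:
    """Expected de novo mutations across the assayed (assembled) genome."""
    return genome_bp * rate


def corrected(count: float, missing: float = MISSING_FRACTION) -> float:
    """Scale up, assuming the unassembled part mutates at the same rate."""
    return count / (1.0 - missing)


if __name__ == "__main__":
    for sex, bp in DIPLOID_BP.items():
        raw = new_mutations(bp)
        print(f"{sex}: {raw:.0f} observed, {corrected(raw):.0f} corrected")
```

This reproduces the 72 and 73 observed and the 76 and 77 corrected figures.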

    3. But the kinda recent studies with trios (father, mother, child), using NGS, gave ~200 mutations in offspring that were not present in the parents.

    4. Could you post a reference? I quit actively searching them out after Campbell et al. (2012) Nature Genetics 44, 1277.

    5. The 1000 Genomes Project Consortium (2010) A map of human genome variation from population-scale sequencing. [doi: 10.1038/nature09534]
      There was, though, a specific paper describing this further. In this one I see lots of numbers, but no direct per-generation figure. It must have been one of those published in 2012.

  7. @TheOtherJim

    Given my criticisms of the current technology, is 130 any more unreasonable?

    No, of course not. It is entirely reasonable. My point is that the number depends on so many assumptions that it remains entirely reasonable when varied by several-fold in either direction.

  8. I'm trying to work out the mutation rate in lemur species. Do you think it would be fair to apply the error rate and cell divisions used here to other primate species, taking into account known genome size and generation length for these species to get an estimate?
