Saturday, January 29, 2011

Looking for examples of BAD genetics journalism

The students in my introductory genetics class have an unusual assignment - each of them has to find an error in the reporting of genetics and write a letter to the editor about it.  They're not very skilled at finding examples of such errors (and I'm afraid I haven't given them much time), so I'm asking the twitterverse for help.  If you've recently wrung your hands about some egregious error in the reporting of some advance in DNA or genetics research, we'd be very grateful if you would post a link (or other identifying info) in the comments.

Here's some more information about the assignment:

The students are asked to find somewhere in the media where an incorrect statement is made about a genetic topic. This could be in a tabloid, newspaper or magazine, on television, or in a news-media online source.  (General blog posts are not eligible, though media-affiliated ones are.)

These students are only part-way through their first genetics course, so the error needs to be pretty basic.  Examples I've given them include
  • describing genome sequencing as 'cracking the genetic code'
  • describing a bacterium with arsenic in its DNA as 'a new form of life'
  • credulously reporting about the predicted effect of what turns out to be an imaginary gene
  • claiming that gene A causes behaviour B, when it only slightly increases the probability of the behaviour.
The students write a draft letter to the editor (polite, concise, in correct English), and then do a complex peer review of each other's drafts, using Calibrated Peer Review (CPR).  (UBC's Centre for Teaching, Learning and Technology is setting this up for us on a trial basis, as nobody here has used it before.)  They then polish their letter, submit it for final grading, and (we hope) also send it to its destination.

The present class is only 40 students, but if this assignment works well (and the CPR works well) we'd like to run it for 500 students next Fall.  This would lead to a barrage of letters to the editor complaining about the poor quality of their genetics coverage, and might even lead to an improvement in future reporting.

Thursday, January 27, 2011

How we define the phenotype is critical

The post-doc and I are discussing the following genetics problem, taken from a textbook:
Q.  For a certain gene in a diploid organism, eight units of protein product are needed for normal function.  Each wild-type allele produces five units.
a.  If a mutation creates a null allele, do you think this allele will be recessive or mutant*? 
b.  What assumptions need to be made to answer part a?
*Note: I don't know what the word 'mutant' means here, since we already know that the allele is mutant.  I suspect it's an error so I initially ignored it.

What I originally said:
The mutation is not recessive to the wildtype allele, because the heterozygote has a different phenotype than the wildtype homozygote.  I don't think this conclusion requires any assumptions other than the usual definition of recessive.  
However the postdoc and others have been arguing that the mutation should be interpreted as dominant.  This requires interpreting the word 'mutant' as an error where 'dominant' was meant, which is not unreasonable.

What I say now (after quite a bit of thinking): 

First, we're told that the mutant allele is a null allele, so the heterozygote is expected to have half the normal amount of protein (5 units instead of 10).  Since 8 units are needed for the normal phenotype, the mutant heterozygote will not be normal.  So the mutant allele certainly is not recessive.

(Here I'm assuming that the defect in one allele doesn't cause the other allele to be upregulated.  That's a possible answer to part b, though I doubt it was what the questioner was looking for, since this question comes from the first chapter on simple Mendelian inheritance.)
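
The arithmetic can be written out as a minimal sketch.  The numbers (5 units per wild-type allele, 8 units needed) come from the textbook problem; the names `UNITS_PER_ALLELE`, `THRESHOLD`, `protein_units` and `is_normal` are made up for this illustration, and the code builds in the assumption stated above, that alleles contribute additively with no compensating upregulation.

```python
# Units of protein product made by each kind of allele (from the textbook problem).
UNITS_PER_ALLELE = {"wt": 5, "null": 0}
THRESHOLD = 8  # units needed for normal function (from the textbook problem)


def protein_units(genotype):
    """Total product for a diploid genotype.

    Assumption: the two alleles contribute additively, and a defect in one
    allele does not cause the other allele to be upregulated.
    """
    return sum(UNITS_PER_ALLELE[allele] for allele in genotype)


def is_normal(genotype):
    """True if the genotype makes at least the amount needed for normal function."""
    return protein_units(genotype) >= THRESHOLD


for genotype in [("wt", "wt"), ("wt", "null"), ("null", "null")]:
    status = "normal" if is_normal(genotype) else "not normal"
    print(genotype, protein_units(genotype), status)
```

The heterozygote comes out at 5 units, below the 8 needed, which is all that is required to rule out 'recessive'.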
 
We're not told the phenotype of a mutant homozygote, so before considering whether the mutant allele could be dominant to the wildtype allele we need to carefully identify the phenotype in question.  The term 'phenotype' can have different meanings even for a given pair of alleles, depending on what is being observed and how it is being categorized. 

For example, if a pigment is being observed it could be treated qualitatively (red/white; red/pink/white; present/absent) or quantitatively (how much pigment is present).  Phenotypes are usually treated qualitatively in genetics textbooks, with quantitative phenotypes segregated into a special chapter.  But most real phenotypes have gradations, and the observer must decide whether to treat them qualitatively (with 2, 3 or more categories) or quantitatively.

Qualitative categories are usually chosen to reflect the underlying genetic effects.  For example, an observer might initially categorize flower pigment as red/white, and later realize that the 'red' category should be divided into 'red' and 'pink' because this better explained how the colours were being inherited.  If a gene were later discovered that modulated pigment production, the observer might then treat pigment quantitatively.

The problem posed above doesn't give us explicit guidance about whether this phenotype should be treated qualitatively or quantitatively.  Normal is presented as an ordinary word, not flagged as a special term by quotes or italicization, so we could certainly interpret it quantitatively.  However it could be meant qualitatively, although we're not given any clues to what the categories would be (normal/abnormal?  normal/abnormal/severely abnormal?).

If the phenotype is to be treated quantitatively (with 'normal' just taking its ordinary English meaning), then the mutant homozygote is expected to have a more severe abnormality than the heterozygote, so the allele would not be dominant.

But the postdoc argues that it's just as reasonable to treat the phenotype qualitatively with two categories, 'normal' and 'abnormal', and I agree that under this definition the mutant allele would be considered dominant.
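
To make the dependence on the phenotype definition concrete, here is a rough sketch that classifies the null allele under three different ways of scoring the same genotypes.  The function names are invented for this illustration, and the three-category scheme uses a hypothetical cutoff (no product at all counts as 'severely abnormal') that the problem does not supply.

```python
UNITS_PER_ALLELE = {"wt": 5, "null": 0}  # from the textbook problem
THRESHOLD = 8                            # units needed for normal function


def units(genotype):
    """Total product, assuming simple additive contributions from the two alleles."""
    return sum(UNITS_PER_ALLELE[allele] for allele in genotype)


def two_category(genotype):
    """Qualitative scoring with everything non-normal lumped together."""
    return "normal" if units(genotype) >= THRESHOLD else "abnormal"


def three_category(genotype):
    """Qualitative scoring with a hypothetical extra category for zero product."""
    if units(genotype) >= THRESHOLD:
        return "normal"
    return "abnormal" if units(genotype) > 0 else "severely abnormal"


def quantitative(genotype):
    """Quantitative scoring: the phenotype is the amount of product itself."""
    return units(genotype)


def classify_null_allele(phenotype_of):
    """Compare the heterozygote with both homozygotes under one phenotype definition."""
    wt_hom = phenotype_of(("wt", "wt"))
    het = phenotype_of(("wt", "null"))
    null_hom = phenotype_of(("null", "null"))
    if het == wt_hom and het != null_hom:
        return "recessive"
    if het == null_hom and het != wt_hom:
        return "dominant"
    return "neither"  # also returned if all three genotypes look the same


for scheme in (two_category, three_category, quantitative):
    print(scheme.__name__, "->", classify_null_allele(scheme))
```

Only the two-category scheme gives 'dominant'; the other two give 'neither', because they distinguish the heterozygote from the null homozygote.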

However I think that requiring the phenotype to be defined this way is tantamount to making this a 'trick question', because this definition implies that the person posing the question deliberately ignored whatever information might be given by a more nuanced definition (one that considered possible differences between the mutant homozygote and the heterozygote).

Because the wording of the question doesn't favour this interpretation over any other, we should go for interpretations that are more reasonable - qualitative with more than two categories, or quantitative.

Would it be OK to say that the mutant is dominant because the heterozygote and mutant homozygote really do have identical phenotypes under more nuanced definitions (i.e. that they are equally abnormal)?  No, because this would require a biologically unreasonable explanation for the dominance - either the null allele in the heterozygote must completely prevent expression of the normal allele, or the presence of two null alleles in the homozygote must allow them to produce 5 units.  The former is very unlikely though not impossible, and the latter is inconsistent with the meaning of 'null allele'.

Later:  I've heard back from the person who wrote this question.  He indeed meant 'dominant' rather than 'mutant', and his intended answer agrees with that of the postdoc - that all non-normal phenotypes should be lumped together into the 'abnormal' category, which would make the null allele 'dominant'.

 I think this is both scientifically bizarre and pedagogically misleading.  It reinforces the erroneous assumption that alleles must be either dominant or recessive, and requires a very improbable explanation to be treated as typical.  Either question a should give a third option (recessive, dominant or neither) or the question should be framed with "What is wrong with this question?".  Question b can be deleted.

Tuesday, January 25, 2011

Answering a complex problem after discussing it in tutorial

My genetics students are complaining about the way I've designed the tutorials.  I have them spend the first part of each two-hour tutorial in a structured discussion of the topics covered by the past week's classes, and the second half working on a complex genetics problem.  It's this second part that's generating the complaints.

They're given the problem in advance, and are asked to print it out and make a preliminary attempt at it before tutorial.  I've told them that this attempt can be quite superficial; it's only worth 1 point (out of 5).  They turn in this attempt at the start of tutorial, and are given a blank copy of the problem to work on.  The students then work on the problem in groups of 3-4 at the chalkboards.  (This classroom is in the old math building so it has lovely chalkboards filling three walls.)  Different groups then explain to the class their suggested answers to the different parts of the problem, and students discuss these answers.  They also discuss how the problem might be adapted or modified for use in different settings, for example, changing the organism so it can be reused on a test, or making part of it into a shorter stand-alone problem.  (In future we'll try to get them to also explicitly discuss what is needed for a good written answer to the problem, but they're not ready for that yet.)

All this seems to be OK with them.  But the final step is for each student to write out a careful answer to the problem they've been discussing, as if this was an exam setting.  These answers are handed in and marked; they're worth 4 points.  At present the group work is left on the chalkboards while students are writing their answers, but I've told them that in a few weeks we'll start erasing the boards before they write their answers.

Students are complaining that this is a waste of their time, that they don't learn anything by having to write answers after they've already seen how the problem should be answered, and that they would learn more by spending the time in additional discussion.  I disagree - I think that observing the right answer doesn't lead to much learning, and that having to apply what they've just observed by creating a written answer adds a lot.

In tomorrow's lecture I'm going to show them some data that might help them see the value in this.  It's from a paper that just appeared in Science (Karpicke and Blunt).  In both of the two studies they describe, the authors had students spend 5 minutes reading a half-page of text about a biological topic, and then consolidate what they'd read in various ways.  The students were then asked to predict how much they would remember a week later.  A week later they were tested on each topic.

In the first study the students either (i) did nothing more, (ii) reread the text three more times, (iii) spent 25 minutes making a concept map with the text, or (iv) tested their recall immediately by writing about it for 10 minutes, then reread the text, and retested their recall.  In the second study the students either (v) spent 25 minutes making a concept map or (vi) tested their recall, reread the text, and retested their recall.  In the first study each student read only one text and was tested a week later with a short-answer test.  In the second each student was given two texts, one learned with a concept map and one with recall testing, and these were tested a week later using either a short-answer test or a concept map (in randomized combinations). 

In both studies the students predicted that they'd remember more with the non-testing methods, but in the post-tests they always scored substantially higher when they had consolidated their reading by testing their recall.  Here are edited versions of their graphs:

All the data

Part of the data, that I'll describe to the students

I'm going to show my students this study in tomorrow's lecture, and I'm going to give them two conclusions:  First, people are not very good judges of how much they've learned.  (So my students should realize that their opinions of how much they learn by different tutorial activities may well be mistaken.)  Second, testing oneself is an excellent way to learn.  (So my students should realize that having to develop a written answer after a discussion is a valuable way to reinforce what they've discussed.)

This will take a few minutes that I could otherwise spend talking about mitosis, but I think learning how to learn is more important.  The students have a mini-midterm coming up on Friday, so they should be fairly receptive to ideas about how to learn.  I don't expect that this new data will convince them all that my tutorial design is good (that's why I wrote 'should' above instead of 'will') but at least they'll realize that I'm not just doing it to be mean.

Saturday, January 22, 2011

Genetic mapping

In yesterday's course meeting for my new second-year genetics course (which I'm now thinking of as "21st Century Genetics"), I mentioned that the syllabus doesn't include the classical technique of genetic mapping.  The others were shocked!

My students will learn how meiosis works.  They'll learn about segregation and independent assortment.  I've never really seen clear explanations of the meanings of these widely used terms, but segregation means that each daughter cell gets one version of the two homologous chromosomes (never two or none), and independent assortment means that which version of each pair a particular cell gets is random and independent of the version it got of each other pair.  They'll learn how crossing-over between parts of a pair of homologous chromosomes makes new combinations of the alleles.
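
Those two definitions can be illustrated with a small sketch.  The cell with two chromosome pairs, the homolog labels and the function name `possible_gametes` are all hypothetical, and crossing-over is deliberately left out.

```python
from fractions import Fraction
from itertools import product

# Hypothetical diploid cell with two chromosome pairs; each tuple holds the two homologs.
chromosome_pairs = [
    ("1-maternal", "1-paternal"),
    ("2-maternal", "2-paternal"),
]


def possible_gametes(pairs):
    """List every possible gamete.

    Segregation: each gamete gets exactly one homolog from every pair.
    Independent assortment: the choice for one pair does not depend on the
    choice for any other pair, so all combinations occur.
    """
    return list(product(*pairs))


gametes = possible_gametes(chromosome_pairs)

# With independent assortment (and no crossing-over) every combination is equally likely.
probability_each = Fraction(1, len(gametes))

for gamete in gametes:
    print(gamete, probability_each)
```

With n chromosome pairs this gives 2^n equally likely combinations, before crossing-over adds further new combinations of alleles.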

The students will learn how to find out if genes are linked (close enough together on the same chromosome that their alleles aren't randomized by meiotic assortment and crossing-over).  They'll also learn that the frequency of crossing-over between any two genes gives a rough estimate of how far apart they are.  They might even learn how to compare these frequencies to tell which gene is in the middle of a group of three linked genes (maybe as a homework problem).  BUT, they won't learn to use three-factor crosses to determine 'map distances'.  (Here's a web page with a fill-in-the-boxes version showing how such mapping analysis is done.)
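
The kind of comparison I have in mind for that homework problem might look something like this.  The progeny counts are invented purely for illustration (they are not real data), the gene names A, B and C are hypothetical, and the sketch assumes the parental chromosomes were ABC and abc, so that a gamete is recombinant for two genes whenever it carries an upper-case allele of one and a lower-case allele of the other.

```python
from itertools import combinations

GENES = "ABC"  # hypothetical gene names

# Invented counts of gamete types from a heterozygote whose parental
# chromosomes are assumed to be ABC and abc.  Not real data.
counts = {
    "ABC": 400, "abc": 400,
    "Abc": 50, "aBC": 50,
    "ABc": 45, "abC": 45,
    "AbC": 5, "aBc": 5,
}


def recombination_frequency(counts, i, j):
    """Fraction of gametes that are recombinant between the genes at positions i and j."""
    total = sum(counts.values())
    recombinant = sum(
        n for gamete, n in counts.items()
        if gamete[i].isupper() != gamete[j].isupper()
    )
    return recombinant / total


def middle_gene(counts):
    """The two genes with the highest pairwise frequency are taken to be the
    outer pair, so the remaining gene lies between them."""
    frequencies = {
        (i, j): recombination_frequency(counts, i, j)
        for i, j in combinations(range(len(GENES)), 2)
    }
    outer_pair = max(frequencies, key=frequencies.get)
    (middle,) = set(range(len(GENES))) - set(outer_pair)
    return GENES[middle], frequencies


gene, frequencies = middle_gene(counts)
for (i, j), freq in frequencies.items():
    print(GENES[i], GENES[j], freq)
print("middle gene:", gene)
```

This only compares pairwise frequencies to find the gene order; it does not attempt the full three-factor calculation of map distances.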

Why not?  Because they won't have any use for this skill.  Even if 1000 students take the course each year, I would be very surprised if even one ever needed to map genes using crosses, except as an exercise in an old-fashioned upper-level genetics course.

Here's a page arguing that even real geneticists didn't do this - that the idealized three-factor mapping cross was largely an exercise for students.  I don't think that's necessarily true, but it's certainly true that real geneticists rarely do this any more.  Genetic mapping in general, and mapping by three-factor crosses in particular, is fast becoming an archaic technique.  If one of my students should ever find that they need to do this (and I'm having a hard time coming up with an example where they would), there are lots of textbooks to show them how.

I think that the main reasons genetics courses have always included three-factor mapping are that (i) this used to be how accurate gene maps were made, and (ii) this provides a tidy way to test whether students understand the consequences of crossing-over.

I think I will teach the students the difference between a physical map and a genetic map of a chromosome, and I'll expect them to be able to explain why the two kinds of map might not be identical - because recombination frequencies are influenced by DNA sequences (chromosomes have hotspots and cool spots), and because the data from the crosses may have flaws (low numbers, phenotypic problems that limit detection of recombinants).  But I won't expect them to be able to do the mapping.

Saturday, January 15, 2011

Genetics problem for tutorial discussion

The postdoc and I just created an excellent genetics problem for the pilot section of my new course.

The problem needed to get students thinking about how changes to genes affect phenotype, but it couldn't involve crosses because they won't be doing those for another couple of weeks.  That rules out just about all the problems in the textbooks.

This new problem has everything:
  • haploinsufficiency
  • dominance
  • repressor gene
  • activator gene
  • natural polymorphism
  • important human diseases
  • screening of newborns
  • problems important in developing countries
  • amino acid substitutions
  • isoelectric focusing to detect changed protein charge
  • mixed-allele dimers
  • differences in protein levels
  • developmental regulation
  • interactions between fetus and mother at the placenta
  • suppressor mutations (mitigating the deleterious effects of another mutation)
  • natural selection in human populations
  • mutations that are very well characterized (DNA, RNA, protein, function)
  • genome-wide SNP analysis
  • a mutation that's lethal when homozygous but beneficial when heterozygous
  • new research in a high-profile journal (Sept. 2010 paper in Nature Genetics) 
  • students label subunits in tetramers
  • students predict bands in gels, for different genotypes and developmental stages
  • students predict protein levels through human development (draw lines on graph)
  • students diagram regulatory interactions between genes, for different genotypes
But it's still straightforward enough for second-year students who are just beginning to learn genetics (no crosses, no matings, no trees, no pedigrees). 

What do you think this fabulous problem is about?

Saturday, January 08, 2011

Teaching about 'dominance'

In my new genetics course I'll soon be teaching about how genotypes determine (or influence) phenotypes in diploid organisms.  For these Week 3 classes I want to give the students some reading material, both to read before the lectures and as a study reference for material covered in class.  But there's nothing suitable in any of the genetics textbooks I've looked at, so I need to create it myself.  Below I'm going to try to work out how best to present this and to design the reference I'll have them read.

The Week 2 lectures (= this week) will discuss natural genetic variation, how mutations generate this variation, and the phenotypic consequences of genetic differences in haploids and homozygous diploids.  In the last of these lectures I want to consider the differences caused by standing genetic variation as well as lab examples.  And here I should raise the issues we'll deal with next week, explaining that diploidy complicates the relationship between genotype and phenotype, and that the next week's classes will all focus on building a solid understanding of this relationship in diploid organisms. 

Somewhere (in the Friday Week 2 class or in the Monday Week 3 class) we'll need to consider that there are different kinds of phenotypes.  Some are strictly qualitative - presence or absence of an antigen or blood type, presence or absence of a disease - but many are best treated as quantitative, especially when we consider natural variation.  These include obvious things like height and hair colour, and less obvious things like the amount of an enzyme or metabolite present in a cell or bodily fluid.
I'll also need to introduce the idea of 'risk' as a quantitative phenotype - this is best done in the context of natural variation and genomics.

The first Week 3 class will just be about interactions between alleles of single genes.  I'll start with some of the same examples I used the Friday before, asking students to predict the phenotypes of individuals heterozygous for mutations whose homozygous phenotypes we've already established.  These should include intermediate phenotypes, 'both-type' phenotypes, and dominant/recessive phenotypes, and genes with more than two alleles. 

The existing terminology is terrible, since everything is described in terms of dominance, whereas dominance and recessiveness are really only two extremes of the range of heterozygous effects.  The problem is maintained by the practice of beginning genetics courses with Mendel, and of introducing all the important concepts with dominant/recessive allele pairs and the A/a allele representation.  Only long after students learn this (mainly by rote) are they told about genes with more than two alleles and about 'Variations on Dominance' (Introduction to Genetic Analysis), 'Modifications of Dominance Relationships' (iGenetics), or 'Complications in the Concept of Dominance' (Genetics: Principles and Analysis).  These books, and all the other genetics textbooks I've seen, present 'co-dominance' and 'incomplete dominance' or 'incomplete dominance'

Oh, and in the preceding Friday class I also need to raise the important issue of how we name alleles - when the A/a convention is appropriate and when it isn't.  I'll tell them that it's usually only appropriate for made-up examples in classrooms, because genetics researchers have different conventions for the real organisms they study.  (There's no point teaching students these conventions, because they are not only arbitrary but are different for different organisms.)  I'll also tell the students that I will only use the A/a convention for alleles known to be dominant/recessive to each other, and that they should be careful to only use it if they are confident that this is the case.

I really wish we had good terminology for the different kinds of effects.  I don't want to use 'codominant' and 'semi-dominant' (or 'incompletely dominant'), but the only alternative is to describe the actual relationship in each case.  Maybe I can at least standardize the words I'll use in this course: 'blended' for a heterozygote phenotype that's halfway between those of the homozygotes, 'both phenotypes' for co-dominance.