Evidence, Probability, and Uncertainty

In a previous post, we used CTTM data that Old Town High School students collected to map the iron levels in water systems around their community. If we ask students, “Do you think we are more likely to find higher levels of iron in well water or municipal water?” a typical answer might be something like, “Well, I think we’ll find more iron in municipal water.” Or well water. It could go either way. It is not the choice between well water or municipal water that is important; what is important are the things missing in their response.

  • The response provides no evidence to support the conclusion.
  • It does not say anything about the relative probabilities of finding iron in the two kinds of systems.
  • It does not say anything about how certain the student is about this conclusion.

Appropriate use of evidence, thinking in terms of probabilities, and consideration of uncertainty are ideas at the center of data literacy. CTTM provides rich opportunities for students to learn about them and use them.

Introducing These Big Ideas

First, I feel a little uncertain as I write what follows, and so I am asking teachers to comment on what I say. Please enter a comment below or send me an email if you see ways to wrestle with these big ideas that I don’t mention. I am suggesting that it is important to distinguish between these three concepts. They ARE different. I am less sure about the skills that students will need to be ready for these concepts. That might translate into questions about the appropriate grade level for introducing these ideas. Also, I am sure that there are good approaches to engaging students in these ideas other than those I suggest. Please share your experiences and views.

“Evidence” seems like the easy one to address and the place to start. Teachers generally believe that students should be able to provide evidence for their assertions. But, in practice, it is often not easy. In working with students and data, I often hear versions of the “Well, I think …” answer that I sketched out above. My knee-jerk response — and it really is almost reflexive — is to respond by asking, “Why do you think that?” I almost always receive a disappointing answer that creates another problem for me to solve in moving the discourse forward.

I can think of reasons why the answers to “Why do you think that” are disappointing. One might be that this response causes students to feel that I think their answer is wrong and am challenging it. Another reason might be that they have not thought much about evidence — they really WERE just telling me what they think — and now I am getting on their case, and that doesn’t seem fair.

Perhaps a more productive response would be to accept their answer and ask them to take it to the next logical step. For example, I might say something like, “OK. So, you are suggesting that we focus more on municipal water. I am wondering how much more. Suppose we had enough money for 100 tests. If we thought it is just as likely to find high iron in wells as in municipal water, we would divide the tests 50-50 between wells and municipal. But, you are saying we should focus more on municipal. How would you divide up the 100 tests?”

What is new and maybe a little provocative here is that this skips over the evidence, at least for a moment, and moves on to probability. It gives students a way to quantify their thinking about the relative frequencies of finding iron in the water from the two systems. This will be an unfamiliar idea for many students and so will take some time to absorb. Again, this is not a time to ask them for the evidence behind their answers. Our job will be to help them learn to develop such evidence. At this point in the discourse, the best thing might be to ask other students for their relative probabilities and record all their responses to show the range of uncertainty that the class has in answering the question.

Evidence for the Probabilities

Once the students have generated some ideas about the relative probabilities of finding elevated iron levels in the two kinds of systems, we can help them learn how to tie those ideas to the evidence they have already collected. Tuva provides an easy way to extract the raw data they need to do that. This is probably something you will need to show the students how to do, but if you are working with students who are comfortable with Tuva and mathematics, they might be able to figure it out, and it would be good to let them try. Below is a picture of a graph that will provide the data the students need.

Contingency Table Data - Iron and Water Source
Figure 4. Data students can use to construct a contingency table

This is just a dot plot with Iron Level on one axis and water source on the other and counts for each category. The counts provide the numbers that students need to build a table, something like the one below. (I excluded the two “small community systems” in the third row of the dot plot since I was not sure where their water was coming from.)

Contingency Counts - Water source and Iron Level
Figure 5. A contingency table extracted from the dot plot.

The idea of distilling the counts from Figure 4 into a contingency table like this will be a new one for many students; it is likely to be something that you will need to introduce, rather than expecting them to figure it out independently. It might make sense to build some scaffolding for this idea by having students work with some other, simpler data. For example, you might have them explore the question of whether gender is related to right/left-handedness by doing a show-of-hands survey in class and then collecting the data in real-time on a screen or whiteboard. It might look something like this:

Example of a Contingency Table
Figure 6. A simple contingency table.
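For teachers who want to tally the show-of-hands survey on a computer, here is a minimal Python sketch of how the raw responses turn into a contingency table. The responses below are invented for illustration; they are not the numbers in Figure 6 and not data from any real class. The names `responses`, `table`, `row_totals`, and `col_totals` are just labels I chose for this example.

```python
from collections import Counter

# Hypothetical show-of-hands responses, one (gender, handedness) pair per student.
# These values are invented for illustration only.
responses = (
    [("female", "right")] * 10
    + [("female", "left")] * 1
    + [("male", "right")] * 7
    + [("male", "left")] * 2
)

# Count how many students fall into each (row, column) cell.
cell_counts = Counter(responses)

rows = ["female", "male"]
cols = ["left", "right"]

# Arrange the counts as a table: one row per gender, one column per hand.
table = {r: {c: cell_counts[(r, c)] for c in cols} for r in rows}

# Totals along the edges of the table.
row_totals = {r: sum(table[r].values()) for r in rows}
col_totals = {c: sum(table[r][c] for r in rows) for c in cols}

# Print the table with its totals.
print(f"{'':8}{'left':>6}{'right':>6}{'total':>6}")
for r in rows:
    print(f"{r:8}{table[r]['left']:>6}{table[r]['right']:>6}{row_totals[r]:>6}")
grand_total = sum(row_totals.values())
print(f"{'total':8}{col_totals['left']:>6}{col_totals['right']:>6}{grand_total:>6}")
```

Each cell answers an IF question, such as how many students are right-handed IF they are male, which is the same reading students will need for the iron table.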

The students could use an example like this to explore the kinds of conclusions they can draw from the evidence in a contingency table. For example, can they say there are more right-handed males than females? Can they conclude that males are more likely to be right-handed? How can they transform the numbers to more easily support claims about the relationship between handedness and gender? How confident are they that these relationships would be somewhat true for another class?

If there are teachers reading this who have other suggestions about how to introduce contingency tables, please add a comment to this post or send them to me. I am sure that others will be interested.

Once the students understand how to use the numbers in a contingency table, they can use the table in Figure 5 to support, challenge, and refine their ideas about how to allocate the more expensive tests between municipal and well-water systems. Hopefully, some will figure out they can use division to turn the numbers in Figure 5 into decimals representing the proportion of samples from each source with low and high iron levels, as in Figure 7.

Table of probabilities for Iron Levels in water from different systems
Figure 7. Probabilities calculated from the contingency table
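The division step can be sketched in a few lines of Python. The counts below are hypothetical and are not the Old Town counts from Figure 5; the function name `row_proportions` is also my own. The point is only to show that each count is divided by the total for its row (its water source), so the proportions in each row add up to 1.

```python
def row_proportions(counts):
    """Divide each cell by its row total, so each row sums to 1."""
    result = {}
    for source, levels in counts.items():
        total = sum(levels.values())
        result[source] = {level: n / total for level, n in levels.items()}
    return result


# Hypothetical counts of samples by water source and iron level.
# These are NOT the Old Town numbers; they are made up for illustration.
counts = {
    "municipal": {"low": 12, "high": 8},
    "well": {"low": 15, "high": 5},
}

proportions = row_proportions(counts)
for source, levels in proportions.items():
    print(source, {level: round(p, 2) for level, p in levels.items()})
```

Dividing by the row total rather than the grand total is a choice worth discussing with students: it answers "of the municipal samples, what share had high iron?" rather than "of all samples, what share were municipal with high iron?"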

If teachers feel that the class successfully connects these tables with allocating the more expensive tests, it would be worthwhile to discuss what these decimals represent. Some students will recognize that one can think of the decimal as a percentage. So, it is accurate to say that 41% of the samples from municipal water sources and 36% from well water sources had iron levels greater than 1. The conversation will become more interesting and challenging if you ask what it means to think of these decimals as probabilities. Can we say there is a probability of 0.41 (or a 41% chance) that municipal water sources will have iron levels greater than 1?

This question provides an opportunity for students to identify and distinguish between different sources of uncertainty. If we retested the same taps as before, would it be likely that our tests show that 41% of the municipal taps have iron levels above 1? You might ask students to reflect on their experiences using the test strips to collect water chemistry information. How easy was it to make judgments about the different colors? How would they describe this kind of uncertainty?

How about if we tested a different set of municipal sources? Even if we had perfect tests, would it be likely that 41% of the tests show levels above 1? How would they describe this uncertainty?
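One way to make this second kind of uncertainty concrete is a small simulation. The sketch below assumes, purely for illustration, that the true share of municipal taps with high iron is exactly 0.41, that the tests are perfect, and that each round samples 20 taps. The sample size of 20 is my assumption, not the number of taps the students tested, and `simulate_high_iron_share` is a name I made up. The simulation deliberately ignores the first kind of uncertainty, the difficulty of reading test strip colors.

```python
import random


def simulate_high_iron_share(true_share, n_taps, n_rounds, seed=0):
    """Repeatedly sample n_taps taps and record the share with high iron.

    Each tap independently has high iron with probability true_share.
    The seed is fixed so that the same run can be repeated in class.
    """
    rng = random.Random(seed)
    shares = []
    for _ in range(n_rounds):
        highs = sum(1 for _ in range(n_taps) if rng.random() < true_share)
        shares.append(highs / n_taps)
    return shares


# Assumed values for illustration: true share 0.41, 20 taps per round.
shares = simulate_high_iron_share(true_share=0.41, n_taps=20, n_rounds=1000)

print("smallest share seen:", min(shares))
print("largest share seen: ", max(shares))
print("average share:      ", round(sum(shares) / len(shares), 2))
```

Students can change `n_taps` and watch the spread between the smallest and largest shares narrow as the sample grows, which connects to the question of how more data supports stronger claims.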

Even though there is uncertainty, is it “kind of” true that municipal sources in Old Town appear more likely to have higher iron levels than well water sources? How does the students’ analysis change their thinking about how they would allocate resources if they could do another round of tests using a more accurate procedure? These questions could support a discussion where students could reasonably disagree.
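As one possible starting point for that discussion, here is a short Python sketch that divides 100 tests in proportion to the two observed shares, 0.41 for municipal water and 0.36 for well water. Proportional allocation is only one rule among many that students might defend, and I am not suggesting it is the right one. The function name `allocate_tests` is hypothetical.

```python
def allocate_tests(total_tests, share_municipal, share_well):
    """Split tests between sources in proportion to the observed shares."""
    weight = share_municipal / (share_municipal + share_well)
    municipal = round(total_tests * weight)
    # Give the remainder to wells so the two numbers always add up.
    return municipal, total_tests - municipal


municipal_tests, well_tests = allocate_tests(100, 0.41, 0.36)
print("municipal:", municipal_tests, "well:", well_tests)
```

With shares this close together, the rule lands near an even split, which may itself prompt students to ask whether the difference between 41% and 36% is large enough to act on.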

CTTM as Authentic Scientific Work

In the first post, we showed how CTTM enables students to use maps to explore water quality questions in spatial terms. This second post showed how students could use frequency counts to support, critique, and refine claims that they generate from the data. We looked at ways to engage students in conversations about probability and uncertainty to deepen their understanding of what it means to use data as evidence for claims. The next post in the series (I have not written it yet) will explore ways to get students thinking about how to reduce uncertainty to make stronger claims.

One of the things we like about CTTM is that teachers can keep building on it. It is not a “one and done” project but instead gives students opportunities to expand and redesign their efforts to understand how drinking water quality varies across their community. That is how science works.

Finally … again … I hope to learn more about what you think about the ideas I offer here and about how you might go about presenting them to your students.

— Bill Zoellick

4 thoughts on “Evidence, Probability, and Uncertainty”

  1. John Van Dis

    This is very intriguing. I’ve not thought about approaching data this way with students. I’m trying to think how I’d apply it to work we’ve done with clams when the students were asked where the community should focus on reseeding the clam flats – an investment of limited resources. In that case, the students looked at the size of the bar charts they had created for average number of clams per square foot. That seems to be the default for most students I encounter, “Let’s compare the averages and make a bar chart.” I haven’t made the contingency tables with them before. It appears that they couldn’t miss the importance of the numbers if they are making tables with them, and the link to the visual chart from Tuva certainly provides the grounding for how to make such a table. There’s lots of necessary building skills too, talking about categorical and continuous data.

    What about asking them right away to consider the likelihood or probability of having a test result with a certain level? And, what would be the probability of that test being from a well? And then helping them move toward the contingency charts and tables.

    I don’t know that your reflex question to students is a bad thing. They do need to think through the why of their decisions. What’s driving it? This applies to discussions, evaluating an article – backing up a claim/argument with something substantial.

    Using the word ‘contingency’ is new to me, not in my vernacular for describing data, so I’m pretty sure it’s not on the tip of the tongues of my students. I like this, but it’s not something I could jump into with my students without some serious planning and scaffolding. We’re still focusing on getting work completed, on time, and of a certain quality.

    1. Bill Zoellick Post author

      John, thanks for the reply. Yes, ‘contingency table’ does have the disadvantage of making it sound like we are doing something fancy and difficult when we divide frequency counts up into different categories, even though it is actually pretty simple. Sometimes these tables are called “cross tabulations” or “crosstabs.” I decided to use “contingency” because it captures the idea that each cell in the middle of the table shows the frequency IF two things are both true. The idea of contingency (of IF) seems important as students begin thinking in terms of probabilities. The table in Figure 7 suggests that the probability of finding elevated iron is 0.41 IF the water comes from a municipal tap. I was thinking that “contingency” might draw attention to how the table is made. But I could be persuaded that “cross tabulation” is maybe just as descriptive and less off-putting. If you do try this with students, let us know which terms are better at helping them know what the table is all about.

  2. Sarah Dunbar

    I think that this is very appropriate for middle school students. A major goal I have for my middle schoolers is to be able to identify what questions they can try to answer with a data set and what they can not. I feel like we talked about this in some of our meetings last spring. I have an activity that I do with Tuva that asks students to identify a question, and they reflect on their question after they create a graph. I encourage them to think of questions they still have, and what they are interested in digging deeper into.

    I use the CER (claim, evidence, reasoning) model with my students. They are usually very solid on the claim and evidence, but the reasoning that ties their claim to their evidence can be a challenge. I think your response “why do you think” is appropriate. You are essentially trying to pull out the “why”: why your evidence supports your claim (or attempts to answer your question).

    I can relate to your worry that your probing questions are “skipping a step”. I found that this year when we were working with Tuva. I was trying to help students navigate the massive data set and talking about what attributes they would need to look at. I was guiding them to realize that, to answer some of their questions, they would not want to include “I don’t know” responses, since those are not useful in answering the question. At times I felt like I didn’t give them the chance to come to that conclusion on their own.

    It has been a bit since I have taught middle school math, but contingency tables are not in our 7/8 curriculum. At the same time I do not think it is beyond the scope of most middle school students.

    I think that with projects like this there is actually something powerful about the “uncertainty”. Students realize that they are part of a project that is “in the works” and that they are laying the groundwork for future students (and/or scientists) who will continue the project. I guess this goes back to what I said earlier: the idea that after our work is complete we still have more questions. I think an important follow-up conversation is to ask what is needed to gain more clarity (or certainty) to answer your questions.

  3. Bill Zoellick Post author

    Sam Ward, who teaches high school math in Kittery, sent the following comments in an email and gave me permission to share them here:


    Something you mention at the top describes a scenario where I’ve had a similar realization. When I ask a student “Why do you think that?”, they might assume, as you said, that I think they’re wrong. I’ve found that it helps to word it a bit differently and add more information, such as “What makes you think that? That’s a good idea, and I’m curious how you came up with it,” or “I want to hear your thoughts because it might spark some thoughts in other students,” or some variation. I almost always add something like “not that you’re wrong” or “I don’t think you’re wrong”.

    Depending on grade level, a student-led discussion about how they think about probability would be appropriate, or what words come to mind when they think about probability, or examples of probability they can think of. This could catch up some students who might not have a great hold on the math vocab, and then segue the conversation toward the relative probabilities of finding elevated iron levels in the two kinds of systems. The 9th graders I work with usually have a relatively solid idea of probability, which I think they pick up in the middle grades.

    I like the exercise of collecting some quick data with pronouns/handedness and putting the data into a contingency table. Before moving to conclusions about males vs females vs they it might be worth having students just respond to the question “what is something this table tells me” with the goal of getting simpler answers like “there are 9 right handed he’s”. This may be a more accessible question to get some other folks involved, and help kids connect what they’re seeing to what it actually means and help strengthen their ability to interpret contingency tables. Then you could move into comparisons, probabilities.

    Another question to pose may be “Well, it seems like x, y, and z are occurring. How could we be more confident about these claims?”, leading into a discussion of how more data is better for supporting your claims. This goes along with your question of “How confident are they that these relationships would be somewhat true for another class?”

    Anyway, I enjoyed reading this post. Seems like a lot of opportunity for student talk.

