6 Questions Hattie Didn’t Ask But Could Have

by Terry Heick

If you’re familiar with John Hattie’s meta-analyses and they haven’t given you fits, they may be worth a closer look.

If you missed it, in 2009, John Hattie released the results of a massive amount of work (work he updated in 2011). After poring over thousands of studies, Hattie sought to separate the wheat from the chaff–what works and what doesn’t–similar to Marzano’s work, but on a much (computationally) larger scale.

Hattie (numerically) figures that .4 is an “average” effect–a hinge point that marks performance: anything higher is “not bad,” and anything lower is “not good.” Grant Wiggins, meanwhile, aggregated all of the strategies that resulted in a .7 or better–what is considered “effective.” The top 10?

  1. Student self-assessment/self-grading
  2. Response to intervention
  3. Teacher credibility
  4. Providing formative assessments
  5. Classroom discussion
  6. Teacher clarity
  7. Feedback
  8. Reciprocal teaching
  9. Teacher-student relationships fostered
  10. Spaced vs. mass practice

So what’s below these top 10? Questioning, student motivation, quality of teaching, class size, homework, problem-based learning, mentoring, and dozens of other practices educators cherish. Guess what ranks below direct instruction and the esoteric “study skills”? Socioeconomic status. Of course, it’s not that simple.

While I leave it up to Hattie and those left-brain folks way smarter than I am to make sense of the numbers, I continue to wonder how the effect of one strategy–problem-based learning, for example–can be measured independently of other factors (assessment design, teacher feedback, family structure, and so on). It can also be difficult to untangle one strategy (inquiry-based learning) from another (inductive teaching).

Hattie’s research is stunning from a research perspective, and noble from an educational one, but there are too many vague–or downright baffling–ideas for it to be used the way so many schools and districts will be tempted to use it. Teacher Content Knowledge has an effect size of .09, which is apparently worse than if they did nothing at all? Really? So how should we respond, then?

As always, start with some questions–and you may be left with one troubling implication.

What Should You Be Asking?

Recently, we shared a list of these effect sizes, shown in ascending order. Included in Grant’s original post is a well thought-out critique of Hattie’s work (which you can read here), where the author first questions Hattie’s mathematical practice of averaging, and then brings up other issues, including comparing apples and oranges (which another educator does here). Both are much more in-depth criticisms than I have any intention of offering here.
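To make the averaging concern concrete without rehearsing those critiques: how you combine effect sizes from many studies is itself a consequential choice. The sketch below uses invented numbers and a simple sample-size weighting (just one of several common weighting schemes, and not a claim about Hattie’s actual procedure) to show how the same two studies can produce very different “average” effects.

```python
# Illustration only: two ways to average effect sizes from different studies.
# The d-values and sample sizes are invented for the example.
effects = [
    {"d": 0.9, "n": 20},    # small study, large effect
    {"d": 0.2, "n": 2000},  # large study, small effect
]

# A simple mean treats both studies as equally informative.
simple_average = sum(s["d"] for s in effects) / len(effects)

# Weighting by sample size lets the larger study dominate.
weighted_average = sum(s["d"] * s["n"] for s in effects) / sum(s["n"] for s in effects)

print(round(simple_average, 2))    # 0.55
print(round(weighted_average, 2))  # 0.21, same studies, very different story
```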

There are multiple languages going on in Hattie’s work–statistical, pedagogical, educational, and otherwise. The point of this post is to ask some questions out loud about what the takeaways should be for an “average teacher.” How should teachers respond? What kinds of questions should they be asking to make sense of it all?

1. What’s the goal of education? 

Beyond any “fringe benefits” we “hope for,” what exactly are we doing here? That, to me, is the problem with so many new ideas, trends, educational technologies, research, and more–what’s the goal of education? We can’t claim to be making or lacking progress until we know what we’re progressing toward.

The standards-based, outcomes-based, data-driven model of education has given us bravely narrow goals for student performance in a very careful-what-you-wish-for fashion.

2. How were the effect sizes measured exactly?

How are we measuring performance here so that we can establish “effect”? Tests? If so, is that ideal? We need to be clear here. If we’re saying this and this and this “work,” we should be on the same page about what that means. And what if a strategy improves test scores but stifles creativity and ambition? Is that still a “win”? 
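For reference, the kind of “effect size” behind these numbers is typically a standardized mean difference: the gap between two group averages divided by a standard deviation. The sketch below uses invented scores and assumes the common Cohen’s-d pooled-SD formulation (not a claim about Hattie’s exact procedure); it shows the arithmetic, and also why the number itself says nothing about what the underlying test measured.

```python
# Minimal sketch of a standardized mean-difference effect size (Cohen's d).
# Scores are invented; real syntheses pool many such values, each computed
# from a different study's own outcome measure.
from statistics import mean, stdev

def cohens_d(treatment, control):
    """Difference in group means divided by the pooled standard deviation."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)
    pooled_sd = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
    return (mean(treatment) - mean(control)) / pooled_sd

with_strategy = [78, 82, 85, 90, 74, 88]     # hypothetical post-test scores
without_strategy = [70, 75, 80, 72, 68, 77]  # hypothetical comparison group
print(round(cohens_d(with_strategy, without_strategy), 2))  # about 1.7 here
```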

3. What do the terms mean exactly?

Some of the language is either vague or difficult to understand. I am unsure what “Piagetian programs” are (though I can imagine), or what “Quality Teaching” (.44 ES) means. “Drugs”? “Open vs. Traditional”? This is not a small problem.

4. How were those strategies locally applied?

Also, while the “meta” function of the analysis is what makes it powerful, it also makes me wonder–how can Individualized Instruction only demonstrate a .22 ES? There must be “degrees” of individualization, so saying “Individualized Instruction” is like saying “pizza”: what kind? With 1,185 listed effects, the sample size seems large enough that you’d think an honest picture of what Individualized Instruction looked like would emerge, but it just doesn’t.

5. How should we use these results?

Problems aside, this much data has to be useful. Right? Maybe. But it might be that so much effort is required to localize and recalibrate it for a specific context that it’s just not–especially when it keeps schools and districts from becoming “researchers” on their own terms, leaning instead on Hattie’s list. Imagine “PDs” where this book is tossed down in the middle of every table in the library and teachers are told to “come up with lessons” using the strategies that appear in the “top 10.” Then, on walk-throughs for the next month, teachers are constantly asked about “reciprocal teaching” (.74 ES, after all), while project-based and inquiry-based learning with diverse assessment forms and constant meta-cognitive support is met with silence (as said administrator flips through Hattie’s book to “check the effect size” of those strategies).

If you consider the analogy of a restaurant, Hattie’s book is like a big book of cooking practices that have been shown to be effective within certain contexts: Use of Microwave (.11 ES), Chef’s Academic Training (.23 ES), Use of Fresh Ingredients (.98 ES). The problem is, without the macro-picture of instructional design, they are simply context-less, singular items. If teachers use them as a starting point to consider while planning instruction, that’s great, but that’s not how I’ve typically seen them used. Instead, they often become items to check, along with learning target, essential question, and evidence of data use.

Which brings me to the most troubling question of all…

6. Why does innovation seem unnecessary?

Scroll back up and look at the top 10. Nothing “innovative” at all. A clear, credible teacher who uses formative assessment to intervene and give learning feedback should be off the charts. But off the charts how? Really good at mastering standards? If we take these results at face value, innovation in education is unnecessary. Nothing blended, mobile, connected, self-directed, or user-generated about it. Just good old-fashioned solid pedagogy. Clear, attentive teaching that responds to data and provides feedback. That’s it.

Unless the research is miles off and offers flat-out incorrect data, that’s the path to proficiency in an outcomes-based learning environment. The only way we need innovation, then, is if we want something different.

Education: No Innovation Required

5 Comments

  • Thanks, Terry, for this interesting article, with much to sort through. I was drawn in by the title, as the Head of School where I work (Rock Point School, a small boarding high school for students who have hit bumps in the road personally or academically, or who have felt alienated at large high schools) just posted an article on our website called “It’s Not Rocket Science,” in which he hopes for thoughtful conversation regarding innovation, and wonders whether core principles (the “tried and true”) will get lost in our culture’s obsession with innovation. His (short) post can be found here:
    http://www.rockpointschool.org/its-not-rocket-science/

    I really appreciate all of the questions you put forward in this post, starting with “What is the goal of education?” I also really appreciate your questions about how effectively we can isolate factors in order to measure them. I understand that we, as a society, want to isolate and measure in order to create accountability on a large scale, but what if the large scale is part of the problem? I also understand that we need assessments in order to create accountability, but what kind of accountability do we get when, on a large scale, there isn’t a way to create nuanced assessment? For example, if an assessment measures whether a student gets an answer right or wrong (and that is generally what standardized assessments measure, for just about everything except writing skills), it isn’t measuring progress (which I think should be one of the goals of education–moving each student’s skills forward from whatever the skill levels were at the start). For example, a student can get a wrong answer by making a computational error, but looking at the work (as a teacher does in the context of a classroom test), the teacher might see that the student has grasped the concept of solving a particular kind of problem. This is relevant in learning, as is feedback about what we might call partial progress. As an educator, I’m sure it’s worth celebrating when a student’s skill level increases (and also when a student’s curiosity or risk-taking increases). These latter two can actually lead initially to more wrong answers, but they’re an incredibly important part of learning and growing.

    It’s hard to step back and look at what we can measure and what we should measure, especially when our abilities to collect and sort measurements are only increasing. I’m grateful for your summary of some of these issues and your thought-provoking questions.

  • Thanks, Terry, for your critical look at Hattie’s influential work. At Classkick, we’re trying to make student feedback more effective and timely, so we honestly felt validated when formative assessments made Hattie’s top 5. That said, we recognize the difficulty and limits of assessing individual teaching techniques, interventions and circumstances in a vacuum. When faced with such an integrated and involved issue as how to best educate children, and even more fundamental questions like “What is the goal of education?”, it’s intimidating to try to fix everything at once. That’s why I think “checklists” of solutions and the ever-present top 10 lists are so appealing. They help focus our attention and make us feel productive. In the end, I’m happy that we have both folks like Terry asking the big, tangled questions and others working on specific, targeted issues – it will take both to come up with effective answers!
