It’s rare that we write about individual football games, but as we determined there was enough data from Illinois high school varsity football this season to turn on our predictive statistics, an upset victory yesterday by 0-3 Minooka over 3-0 Plainfield Central caught our eye immediately.
Our α (alpha) statistic, which tends to be more reliable toward the end of the season because it has more data to work with, was telling us that Plainfield Central had about a 93 percent chance of winning the game (α = 37.8). But even our Δ (delta) statistic, which has been shown to be more reliable in the early weeks of the high school football season, gave Plainfield Central a 97 percent chance of winning. It could be argued that our statistics, with their high rate of predictive power, especially late in the season, are some of the most sophisticated analysis tools out there, given the limited amount of information that is reliably available for high school football games on the whole.
But even less quantitative measures put Plainfield Central near the top of the pack. The Sun-Times or AP, or somebody, had them ranked as No. 7 in the Chicago area before this game. Minooka was unranked going in. But if that would lead you to believe you were going to get a match-up like today’s college football contest between No. 5-ranked Florida State and unranked Wake Forest, you were in for a surprise. Sometimes these rankings and our stats are reliable, sometimes not so much.
And Minooka didn’t just win the game; they dominated, according to a story in the Chicago Sun-Times. They held Plainfield Central to 113 rushing yards while getting 315 rushing yards of their own from three running backs (Max Brozovich, Nate Gunn, and Cory Bee) in 30 carries. And the passing game was the same: Minooka limited Plainfield Central’s quarterback to 6-of-23 passing for 57 yards, with three interceptions by Minooka cornerback Corbett Oughton, who ran one back for a touchdown.
Near the end of the third quarter, Minooka had a 38-0 lead when a Minooka punt was returned 75 yards for a touchdown. That was followed by a Plainfield Central fumble recovery deep in Minooka territory and a subsequent touchdown, bringing the score to 38-14 early in the fourth quarter.
That was it for Plainfield Central, though. “I’m not sure we were ready to play,” Central coach John Jackson was quoted as saying. “We never got in a rhythm. We were flat. They respected us. They came out and played hard, and we were a little passive. It took until the second half for us to play like we are capable of. We put a little scare into them for a while in the third quarter, but then the mistakes happened again.”
Minooka, however, didn’t exactly have its regular offense on the field. It was notably missing senior quarterback Joe Carnagio, who has an ankle injury. Enter Shane Briscoe, a junior who “did a nice job,” according to coach Paul Forsythe, whose first win as head coach came last night.
“The offense needed the first score, and he led us down there to get it,” the Sun-Times quoted him as saying. “In three games, we never led. We needed the confidence from that.”
On any given Friday, in any given classroom
Our predictive stats, though reliable and forged by trial and error, aren’t meant to be used by Vegas bookies to make money. They’re just for fun, and on any given Friday night under the lights in Illinois, we’ll get a few games wrong. Teams with a 7 percent chance of winning who clobber their opponents are rare, but those games happen—obviously.
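To put a rough number on “a few games wrong,” here is a minimal sketch in Python. It assumes, purely for illustration, that every favorite on a slate carries the same 93 percent win probability and that the games are independent of one another. The slate size of 50 games is a made-up figure, not a count from our data, and the function names are ours for this example only.

```python
def expected_upsets(p_favorite: float, n_games: int) -> float:
    """Expected number of games the underdog wins."""
    return (1.0 - p_favorite) * n_games


def prob_at_least_one_upset(p_favorite: float, n_games: int) -> float:
    """Chance that at least one underdog wins, assuming independent games."""
    return 1.0 - p_favorite ** n_games


# Hypothetical slate: 50 games, each favorite at 93 percent (assumed values).
P_FAVORITE = 0.93
N_GAMES = 50

print(f"Expected upsets: {expected_upsets(P_FAVORITE, N_GAMES):.1f}")
print(f"Chance of at least one: {prob_at_least_one_upset(P_FAVORITE, N_GAMES):.3f}")
```

Under those assumptions the sketch expects about 3.5 upsets on the slate, and the chance of seeing at least one comes out above 95 percent. A 7 percent underdog winning on a given night is rare for that one game, but not for the schedule as a whole.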
But just as it’s easy to find games that don’t turn out the way the statistics predict, it’s easy to find teachers who have bad value-added scores and yet inspire kids every day. They need to be kept in the classroom, even in the face of test-score stats that say they should be fired. And these stats aren’t just for fun: they involve people’s careers, families, and livelihoods.
For example, a story in the Washington Post describes a teacher who got a bad value-added score yet is a great teacher. “This story is not an aberration,” writes Valerie Strauss about the teacher in her “Answer Sheet” blog. Minooka beating Plainfield Central isn’t an aberration, either, but it did take some people stepping up their game.
This is what the Chicago teachers are fighting for, folks. Any good teacher welcomes evaluations as a chance to improve, but what they don’t believe is that test-score statistics can be extrapolated to measure their effectiveness or quality as a teacher.
For our high school football stats, the only information reliably available is points scored, so that’s all we can use to extrapolate a team’s overall strength or the likelihood it will win any given game. Likewise, we can’t test kids every single day of every single year on every point in the curriculum. We’re extrapolating a great deal when we go from student test scores to teacher effectiveness. Any value-added model, therefore, is not much better than our football stats, which extrapolate overall team strength based on points scored in their games and their opponents’ games.
Pressure, pressure, pressure … on young children, no less
Just imagine telling football players, in a little sideline conference, something like, “They’re going to fire me as your coach if you don’t beat Minooka by 57 points or more, because that’s what the stats say you should score against them, so just forget about all we talked about before, get the ball in the end zone, and let’s give it all you’ve got out there tonight!”
This would put so much pressure on kids to score 57 points that they might exhaust themselves trying to meet that mark and then lose the game altogether. Yet that’s what teachers are saying to 9-year-old third graders in classrooms where value-added models are used: “If at least 87 percent of you don’t get more than 80 percent of the questions correct on this math test, which I know is much narrower in scope than we’ve been learning, they’re going to fire me, so forget about all that other stuff, just focus on adding fractions, and let’s make those scores happen, shall we?”
These tests, which at the time they were created had no impact on a child’s grades, how much people liked him, or any of the other things that are important to our children, now all of a sudden mean his favorite teacher might very well find a way to get him out of her class, because she knows he can’t achieve a certain scaled score on the test and she has a house payment to make and two daughters to put through college.
An open letter, signed by 88 Chicago-area professors, says Chicago schools should not implement a value-added teacher evaluation system at this point: “With a focus on end-of-year testing, there inevitably will be a narrowing of the curriculum as teachers focus more on test preparation and skill-and-drill teaching. Enrichment activities in the arts, music, civics, and other non-tested areas will diminish.”
Same thing in football: You can’t spend all your time guarding against an upset from an 0-3 team, because that preparation will distract you from more important overall concepts your team should be working on to have a better season, things like strength training, conditioning, the fundamentals of football. If you get distracted, you start producing teams that are flat not just for one game but for game after game. Teams, of course, can be flat on any given day, as Plainfield Central was last night. What if they’re flat for the next three weeks? Should the coach be fired? That’s kind of what we’re saying for teachers who show flat performance in a narrow range of topics, even if their overall quality as a teacher is high. If that were how it worked, we would’ve already fired Minooka’s coach.
I’ll tell you this about probability: The odds of Plainfield Central being flat are exactly the same next week as they were this week. I doubt it will happen again, but it could. And if the coach’s evaluation were based solely on those three weeks, there would be no opportunity for an athletic director to step in and say, “But they dominated in the first three games this season.” So much for probability and statistics in football. Now let’s see how long it takes the myopic people pushing the value-added model, or at least the people who are blinded by mathematics, to see more clearly and to actually learn lessons from the real world.
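The claim that the odds are the same next week is a claim of independence: one flat night doesn’t change the chance of another. Here is a minimal sketch of what that implies for a streak. The per-game chance of a flat performance, `P_FLAT`, is an assumption; the 0.07 below simply borrows this game’s upset probability as a stand-in and is not a measured rate.

```python
def prob_flat_streak(p_flat: float, weeks: int) -> float:
    """Chance of a flat performance in every one of `weeks` independent games."""
    return p_flat ** weeks


P_FLAT = 0.07  # assumed stand-in value, not a measured rate

print(f"One flat week:    {prob_flat_streak(P_FLAT, 1):.4f}")
print(f"Three in a row:   {prob_flat_streak(P_FLAT, 3):.6f}")
```

Under these assumptions, three flat weeks in a row would be roughly a 1-in-2,900 event. That is unlikely but not impossible, and the number alone can’t say whether the coach is to blame when it happens.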
The inevitability of cheating
Football isn’t as consequential as childhood education, but people in both have found ways to cheat. Evidence has shown that the more consequential a game is to big money, the greater the likelihood that cheating will occur. Same in education: In one recent cheating scandal, teachers who were accused of changing answers in test booklets were recommended for reinstatement by Baltimore City Schools because it could not be proven that they cheated.
Statisticians did erasure analysis, as it’s called, where it is assumed that kids who submit tests with more than an expected number of erasures on multiple-choice questions did not actually do the erasing themselves. Then statisticians did a little extrapolating of their own. All they have is the number of erasures on test booklets, compared to the number of erasures each kid made on the previous year’s tests, and they extrapolate how many of those booklets had answers changed by a teacher.
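The comparison described above can be sketched in a few lines of Python. This is only an illustration of the idea as we’ve described it, not the procedure the statisticians in Baltimore actually used: the function name, the `allowed_increase` threshold, and all of the counts are made up for the example.

```python
def flag_students(this_year: dict[str, int],
                  last_year: dict[str, int],
                  allowed_increase: int) -> list[str]:
    """Return IDs of students whose erasure count rose by more than
    `allowed_increase` over their own count from the previous year.

    Students with no prior-year count are skipped: there is no baseline.
    """
    flagged = []
    for student_id, count in this_year.items():
        baseline = last_year.get(student_id)
        if baseline is not None and count - baseline > allowed_increase:
            flagged.append(student_id)
    return flagged


# Made-up counts for illustration only; not data from any real case.
last_year = {"A": 2, "B": 3, "C": 1}
this_year = {"A": 3, "B": 14, "C": 2, "D": 9}

print(flag_students(this_year, last_year, allowed_increase=5))  # ['B']
```

The flag says only that student B’s booklet has far more erasures than it did a year earlier. Nothing in the counts says whose hand held the eraser, and that inference is the extrapolation.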
Anyway, there was a hearing in Baltimore, and heads rolled. But then lawyers argued that cheating had not been proven beyond a reasonable doubt and teachers should not be deprived of their livelihood on the basis of hypotheses. What if, for example, teachers provided students with a “test-taking strategy” that had them bubble in multiple ovals for each question—all four if they couldn’t narrow down the answer immediately—and then erase the ones they eliminated as they thought about the question?
That could explain a high number of erasures in the test booklets, and a hearing panel can’t exactly call 9-year-olds to the stand to check out the story. They probably like their teacher and will say whatever their teacher or parents tell them to say, making their testimony unreliable.
Many of the same lawyers who can see the above “reasonable doubt” also advise policymakers to adopt statistical wizardry for evaluating teachers that rests on no more solid footing than the erasure analysis. That’s just silly. In fact, if you’ve ever observed kids taking multiple-choice tests, you know the erasure analysis is on much more solid ground, and still, it didn’t pass muster when it came to American due process.
Take-home lessons about football and teacher evaluations
The lesson here is that not everything has to happen in the same, standardized way in every single classroom across America just because they’re all teaching core subjects from the Common Core. As we see in the Baltimore cheating scandal, our law generally doesn’t accept that statistics alone prove guilt. As we see in the football game, statistics don’t guarantee a winner.
If everything happens the same, it will make things easier for politicians and policymakers to talk about it, but it will make actual learning much more difficult. Making things easy for politicians to talk about—or for corporations to sell stuff—isn’t really the point of education, though. Or do you think I’m being naïve?
Occasionally teachers break the mold. Occasionally a junior quarterback, a star cornerback, or a first-year coach steps up his game.
Corporate reformers live in a world of tremendous transition, where I was once told by a boss in an IT department at a huge bank, “The only way to move up is to move out [to another company].” As such, they have something against the idea that experience is an important standard by which we measure teachers. I also believe it’s a completely worthless statistic, but education is a different world, where veteran teachers can often be found in the same classroom for 20 years. There’s something about that difference that escapes many reformers making headlines today.
Another way to look at this is the following: If we allow a higher percentage of teacher evaluations to depend solely on objective test scores, then there is less of an opportunity for a principal to override the statistics when a teacher has proven the stats wrong. We can’t just award Plainfield Central a victory in the game against Minooka because the stats give us a “97% chance” of a certain outcome. Nor can we fire a teacher just because certain stats, which were never designed to measure teacher effectiveness, tell us she’s a bad teacher. And that, sports fans, is why we play the game!