The National Assessment of Educational Progress, commonly called the “Nation’s Report Card,” released scores from the 2013 tests in math and reading for the entire nation. Scores can be used to compare public school and private school students for the entire country, and to compare public school students in each of the 50 states, the District of Columbia, and the Department of Defense. We’re doing a little better than we were in 2011 and a lot better than we were in 1990 or 1992, the scores show.

Fourth-grade and eighth-grade mathematics scores went up by 1 point each, a gain that was noted as statistically significant given the sample size of 376,000 fourth-graders and 341,000 eighth-graders across all 50 states. In reading, the average eighth-grade score was 2 points higher than in 2011, also a statistically significant change, but fourth-graders showed no score gains on the reading test.

### Here’s what the scores DO NOT show

Although comparisons can be made between states using the NAEP scores, you have to remember that the scores are for the entire state. Public school students in the entire state were sampled, regardless of their school’s status under any type of school improvement plan or statewide early-warning system.

You cannot use the scores to say something like, “Schools of Type A are better than schools of Type B,” or “Schools that use Curriculum A are better than schools that use Curriculum B.” You can’t even use the NAEP scores to say “Policies in State A are better than policies in State B when it comes to education and their students.”

Diane Ravitch, a former member of the NAEP board, wrote that the use of scores to promote one set of policies over another is “balderdash.” I’m glad she didn’t use a cuss word, because she is one of the most delightful bloggers and authors to read (look for my review of her book *Reign of Error* as soon as I finish reading it). “I find this statistical horse race utterly stupid. Are students in DC getting a better education than those in Massachusetts? Highly unlikely,” she wrote.

### Fourth-grade reading

Seven states—Colorado, Indiana, Iowa, Maine, Minnesota, Tennessee, and Washington—plus the District of Columbia and Department of Defense realized statistically significant gains in fourth-grade reading scores compared to fourth graders in 2011. However, three states—Massachusetts, Montana, and North Dakota—saw their fourth-grade reading scores go down in a statistically significant way.

The national public school average went from 220 to 221, a statistically **insignificant** change compared to 2011.

In Maryland, fourth-grade reading scores went from 231 to 232 for public school students, which was considered a statistically insignificant change.

(The Maryland State Department of Education issued a press release Thursday, saying this about the fourth-grade reading scores: “Maryland scores improved from 231 in 2011 to 232 in 2013.” This statement is false in that it uses the word “improved.” It is a lie to issue an official press release calling two scores that represent the same level of performance by the state’s students an “improvement.” This is a disservice to the good students and teachers of the state, and the practice of misrepresenting data must be stopped. The next time I see an official press release from a taxpayer-supported agency that calls something a thing it clearly is not, I’m going to jump out the window. At that point, when school boards or state departments consciously decide not to tell the truth, it’s hopeless, as any useful communication has been cut off.)

In Illinois, fourth-grade reading scores held at 219, obviously representing no change, statistically significant or otherwise.

### Fourth-grade mathematics

Only 14 states, plus the District of Columbia and the Department of Defense, showed statistically significant gains in fourth-grade mathematics compared to 2011: Arizona, Colorado, Delaware, Hawaii, Indiana, Iowa, Minnesota, Nebraska, New York, North Dakota, Tennessee, Washington, West Virginia, and Wyoming. All other states showed no significant gain or loss in score compared to 2011.

The national average for public school students gained 1 point, which was considered statistically **significant**.

In Maryland, math scores for all fourth-graders went from 247 to 245, a statistically insignificant change. To suggest this change was a “dip” or “decline” would be a misrepresentation. It would be elevating statistical “noise” to the level of the “signal.” Misrepresentations like this have led to bad decisions in the past.

In Illinois, math scores for all fourth-graders held at 240, so of course there was no statistically significant gain or loss.

### Eighth-grade reading

At the eighth-grade level, 12 states—Arkansas, California, Florida, Hawaii, Iowa, Nevada, New Hampshire, Oregon, Pennsylvania, Tennessee, Utah, and Washington—plus the District of Columbia and the Department of Defense saw statistically significant gains in reading scores on the NAEP.

The national average went from 264 to 266, a change which was considered statistically **significant**.

In Maryland, scores went from 271 to 274, representing no statistically significant change compared to public school performance in 2011.

In Illinois, scores went from 266 to 267, a change which was not considered statistically significant.

### Eighth-grade mathematics

Five states—Florida, Hawaii, New Hampshire, Pennsylvania, and Tennessee—plus the District of Columbia and the Department of Defense saw statistically significant increases in their eighth-grade math scores compared with 2011. Three states, however—Montana, Oklahoma, and South Dakota—realized statistically significant drops in their eighth-grade math scores.

The scores for all public school students in the country went from 283 to 284, representing a statistically **significant** increase.

In Maryland, scores on the eighth-grade math test went from 288 to 287 for all public school students, which represents no statistically significant change from 2011 performance.

In Illinois, eighth-graders scored 285 on the math test, compared to 283 in 2011, but this change was not statistically significant.

One concept drawing some coverage around the nation is “statistical significance.” NAEP uses a random sample of students from each state and doesn’t test every student. The scores are therefore estimates, drawn from only a portion of the students in a state.

If the exact same test were given to a different random sample of students on the same day, under the same conditions, the scores would come out slightly different. The larger the sample is each time, the less variability you would see in the results.
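A quick simulation makes the point. The “population” of scores below is entirely synthetic (a bell curve with a made-up mean and spread, not real NAEP data), but the pattern holds for any population: averages from bigger samples bounce around less.

```python
import random
import statistics

# Hypothetical population of 100,000 student scores, for illustration only.
# Mean and spread are invented; real NAEP scale scores are not this simple.
random.seed(1)
population = [random.gauss(274, 35) for _ in range(100_000)]

def sample_means(sample_size, n_trials=200):
    """Average score from n_trials independent random samples of the same size."""
    return [statistics.mean(random.sample(population, sample_size))
            for _ in range(n_trials)]

for n in (100, 1_000, 10_000):
    spread = statistics.stdev(sample_means(n))
    print(f"sample size {n:>6}: spread of sample averages = {spread:.2f}")
```

Run it and the spread of the sample averages shrinks as the sample size grows, which is exactly why NAEP’s large samples let it treat even a 1-point national change as meaningful.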

NAEP sets the threshold for statistical significance at the 95% confidence level (p < 0.05). Roughly speaking, if you repeated the sampling 100 times and built a confidence interval around each sample’s average, about 95 of those intervals would contain the true statewide average. The score reported is the sample average, but it carries some uncertainty, known as the standard error.

Even though one year’s average may be higher than another’s, the margin of error around each average, calculated from the standard error at the chosen confidence level, may overlap with the margin of error around the other. That’s the situation when we say there was no statistically significant increase or decrease in scores from the 2011 administration.

When we actually run the reading scores for Maryland’s eighth-graders, for example, we may find that the increase from 271 to 274 would count as significant at some confidence level lower than 95%, say 80%. Roughly translated, that would mean a random sample of students in 2013 would outscore a random sample from 2011 about 80 times out of 100 on the eighth-grade reading test.

NAEP officially sets the confidence level at 95%, and there are good historical reasons for setting it there. But when we talk about statistical significance, we are speaking very specifically about NAEP’s definition of the term, and their definition is stricter than many people believe it should be. If Maryland is 80 percent sure, let’s say, that reading scores went up, is that really any different from being 95 percent sure reading scores went up? I’m not so sure.
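Here’s a sketch of how such a comparison works in practice. The standard errors below are made up for illustration (NAEP publishes the real ones with its data tables); the arithmetic is the standard two-sample z-test on the difference between two estimated averages.

```python
import math
from statistics import NormalDist

# Maryland eighth-grade reading averages from the article.
# The standard errors are HYPOTHETICAL, chosen only to illustrate the idea;
# NAEP's published values for these estimates may differ.
score_2011, se_2011 = 271, 1.1
score_2013, se_2013 = 274, 1.2

diff = score_2013 - score_2011
se_diff = math.sqrt(se_2011**2 + se_2013**2)  # standard error of the difference
z = diff / se_diff

# Two-sided p-value for the observed difference under a normal approximation
p = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"difference = {diff}, z = {z:.2f}, p = {p:.3f}")
print("significant at the 95% level?", p < 0.05)
print("significant at the 80% level?", p < 0.20)
```

With these illustrative standard errors, the 3-point change clears an 80% bar but not NAEP’s 95% bar, which is exactly the distinction the post is drawing: “not statistically significant” means the change didn’t clear one particular, fairly strict threshold, not that nothing happened.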