Sometime before Christmas there was a comment on the ol’ Twitter that (more or less) speculated on what an upcoming Ofsted rating would be, based on an attainment score (at least that’s how I remember it; it could have been some other metric. Look, it’s not important to the story). It immediately made me wonder how close I could get to actual Ofsted judgements using nothing but published data. Can you judge a school without ever visiting?
In the spirit of dirty-data delving, I took the DfE’s Revised GCSE scores and school data from the January release and challenged myself to see how accurate my estimates would be compared to the real judgements. Some people have levelled the complaint at Ofsted that there is little point in making expensive, multi-day inspections of schools when the data already available could be used to make judgements. Of course this would not take into account any of the multitude of other factors, good and bad, that can be hidden behind metrics as simple as ATT8 and Prog8. Still, if it was possible to get even reasonably close to the real judgements, I thought it was worth a shot.
I decided to start by picking a more or less random Local Authority (I’ll keep the areas I used anonymous, though it’s all taken from publicly available sources). The only choices I made here were that I was not familiar with the schools or the area in question (I had no idea of its size, for example), nor did I know what kind of data would be useful. I went for mean Attainment 8 (ATT8) and mean Progress 8 (Prog8). Initially I looked at total ATT8 score, but since it depends on school size I removed it from my data to keep things simple. I removed Independent schools from the data and concentrated only on state schools: academies, VA, Free schools and so on.
The first thing I thought to test was the relationship between ATT8 and Prog8. If a school is getting good grades then you might assume progress would also be good. Any wide deviations from a straightforward correlation might imply something interesting. It quickly became apparent that Special schools were problematic, in that they were so far outside the other values that they skewed the results wildly.
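One way to put a number on that relationship, rather than judging it from a scatter plot alone, is a Pearson correlation coefficient. What follows is only a sketch in plain Python: the function name `pearson` is my own, and the ATT8 and Prog8 pairs are made up for illustration, not taken from any real school or authority.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


# Made-up (mean ATT8, mean Prog8) pairs, purely illustrative.
att8 = [38.0, 42.5, 46.0, 51.0, 55.5]
prog8 = [-0.40, -0.15, 0.00, 0.20, 0.45]
r = pearson(att8, prog8)
print(f"r = {r:.3f}")
```

A value near +1 would back up the hunch that good grades and good progress go together. A handful of extreme points, such as the Special schools mentioned above, can pull the coefficient around a lot, so it is worth computing it with and without them.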
Here’s what the initial scatter showed:
I’ll come back to the outliers in a moment. Satisfied that there was a reasonable correlation, I made guesses at each school’s rating based on nothing more than gut feeling about the figures. In fact, I went mainly for Prog8, since this could potentially show a school working above and beyond for their students. Here’s my shot. It took less than two minutes to do.
Blue indicates I got the guess right, orange that the real Ofsted judgement was 1 below, green 1 above. Red are my outliers. I should point out that I had already removed any schools with suppressed results or no Ofsted score available. These are generally new or recently opened schools that for whatever reason do not have values shown. So I got a total of 65% correct, but what of the wrong ones? The two red values first. The bottom score turned out to be a Special School, and from subsequent goes it turned out that I couldn’t get anywhere near the correct judgements for Special Schools using ATT8 and Prog8. Actually, a significant chunk turned out to be graded Outstanding, and I could probably have increased my correct percentage by just assigning every Special School in the list an instant Outstanding grade. However, I decided to take these schools out of the mix in subsequent trials. The other red score, where I guessed 2 against an actual 4, had its last Ofsted inspection in 2015, when it was unfortunately given the lowest rating. Hopefully their rating will change with the next inspection. In three cases I was more generous than Ofsted, though of course I’m working from current data whereas Ofsted ratings are historical, perhaps by several years. Yearly results may vary. Unless there are huge swings year on year, though, I guessed the number of schools getting worse since their last Ofsted would be matched by schools getting better. One of the orange schools was a Steiner school, which I wouldn’t consider to be typical.
This was an urban area with quite a few independent schools. I rejudged my strategy, thinking I might be being too generous, and adopted a strategy of choosing arbitrary Prog8 thresholds to guess the level. For example, I went for +0.2 and above as being a grade 1. Why? Nothing more than a hunch. I suppose I set my ‘3’ rating too low though, and came up being overly harsh. Scores:
Correct: 43%, 1 above: 43%, 1 or 2 below: 14%.
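For anyone wanting to try the threshold approach, it boils down to a few comparisons. This is a sketch only: the +0.2 cut-off for a grade 1 is the one mentioned above, but the cut-offs for grades 2 and 3 are placeholders I have invented for illustration, and `guess_grade` is a hypothetical name.

```python
# Only the +0.2 cut-off for grade 1 comes from the post.
GRADE_1_MIN = 0.2
# ASSUMPTION: placeholder cut-offs, not the values actually used.
GRADE_2_MIN = -0.2
GRADE_3_MIN = -0.5


def guess_grade(prog8):
    """Guess an Ofsted grade (1 = Outstanding ... 4 = Inadequate) from mean Prog8."""
    if prog8 >= GRADE_1_MIN:
        return 1
    if prog8 >= GRADE_2_MIN:
        return 2
    if prog8 >= GRADE_3_MIN:
        return 3
    return 4


# Made-up Prog8 values, purely illustrative.
for score in (0.35, 0.05, -0.30, -0.80):
    print(score, "->", guess_grade(score))
```

Shifting the lower cut-offs up or down is exactly the kind of tweak that makes the guesses harsher or more generous.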
This being a small authority, individual schools make a bigger difference. Still, I wasn’t happy with my guesses here so adapted the strategy to have approximately 20% 1, 50% 2, 25% 3 and 5% 4. On to the next authority:
I picked this authority because I had no idea where it was. It turned out to be pretty small. Scores:
Correct: 62.5%, 1 above: 25%, 1 below: 12.5%.
The odd school in white was a Free Studio School which again in hindsight I would judge to be an atypical example. Wondering if maybe I had my eye in by now, I decided to go big next.
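The fixed-ratio idea can be sketched as a ranking exercise: sort the schools by Prog8 and hand out grades according to each school’s position in the list. This is my own illustration rather than a record of how the guesses were actually made; the function name `assign_by_quota` is hypothetical, and the shares assume a split of roughly 20% grade 1, 50% grade 2, 25% grade 3 and 5% grade 4.

```python
# ASSUMPTION: cumulative shares for a 20:50:25:5 split across grades 1 to 4.
CUMULATIVE_SHARES = [(1, 0.20), (2, 0.70), (3, 0.95), (4, 1.00)]


def assign_by_quota(prog8_by_school):
    """Rank schools by Prog8 (best first) and assign grades by fixed shares."""
    ranked = sorted(prog8_by_school, key=prog8_by_school.get, reverse=True)
    n = len(ranked)
    grades = {}
    for i, school in enumerate(ranked):
        # Midpoint of this school's slot in the ranking, between 0 and 1.
        position = (i + 0.5) / n
        for grade, upper in CUMULATIVE_SHARES:
            if position < upper:
                grades[school] = grade
                break
    return grades


# Made-up schools and Prog8 values, purely illustrative.
example = {"School A": 0.45, "School B": 0.10, "School C": -0.05,
           "School D": -0.30, "School E": -0.75}
print(assign_by_quota(example))
```

The obvious weakness is that the quota is imposed whatever the authority looks like, so an area with very few schools in the bottom two categories will be judged too harshly.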
I didn’t bother putting this 71-school monster table up, just the results. 63% again, in less than two minutes. I ended up just picking out a threshold value again and going for roughly the 20:50:25:5 judgement ratio. For a couple of schools I changed my guesses simply on whether I thought the name of the school sounded more likely to get a higher judgement (hint: if it contains the word ‘girls’, it’s worth guessing high). It turned out that the authority only had around 10% in the bottom two categories combined. Low 60s seemed to be my score. Combining all four authorities gave me:
Correct match to Ofsted: 62%
Ofsted result was 1 above my guess: 20%
Ofsted result was 1 below my guess: 13%
Turns out I might be a harsh inspector I suppose.
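If you want to keep score the same way, the tally is simple enough to sketch. Two assumptions to flag: I am reading “1 above” as Ofsted giving a better rating than the guess (a numerically smaller grade, since 1 is the best), and the guesses and actual grades below are made up, not the real figures. The name `tally` is hypothetical.

```python
def tally(guesses, actuals):
    """Percentage of guesses that match, or miss by one, the actual grades."""
    if len(guesses) != len(actuals):
        raise ValueError("guesses and actuals must be the same length")
    counts = {"correct": 0, "ofsted 1 above": 0,
              "ofsted 1 below": 0, "off by 2 or more": 0}
    for guess, actual in zip(guesses, actuals):
        # ASSUMPTION: grade 1 is best, so a positive difference means
        # Ofsted rated the school better than the guess did.
        diff = guess - actual
        if diff == 0:
            counts["correct"] += 1
        elif diff == 1:
            counts["ofsted 1 above"] += 1
        elif diff == -1:
            counts["ofsted 1 below"] += 1
        else:
            counts["off by 2 or more"] += 1
    total = len(guesses)
    if total == 0:
        return {label: 0.0 for label in counts}
    return {label: 100 * count / total for label, count in counts.items()}


# Made-up guesses and actual grades, purely illustrative.
print(tally([2, 2, 1, 3, 2], [2, 1, 2, 3, 4]))
```

When combining several authorities, pooling all the schools into one pair of lists weights each authority by its size, which matters when one has 71 schools and another only a handful.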
What did I learn from all of this? I’m not too disappointed with my score. Maybe it depends on how much variation there is in genuine Ofsted judgements. I won’t be replacing the £200m Ofsted structure anytime soon, though for a bargain price of £5m, no scratch that, £10m, I’m willing to do it. Save a lot of people stress, too. Worth a crack?
If you want to try it yourself, here’s a sample to see if you match up. First column is mean ATT8 and second is mean Prog8. Answers later.