Striving for a personal best – who judges?

In this post I’m going to look at some of the issues surrounding a form of assessment you’re probably familiar with, though perhaps not by name: ipsative assessment. So what is ipsative assessment? It’s a means of assessment that measures a student’s progress not by comparing it against some external (and often arbitrary) standard, but by comparing the student’s achievement now with their achievement in the past. It asks the questions: Did I improve? By how much? It avoids the questions: What mark did I get? Did Fred do better than me?

This shift from external validation to internal comparison is an important one, and has a number of implications. Most importantly, when done well it moves the focus towards personal growth and development. This, to me, is what education should be about. Arguably it’s the only measure that really matters in the long term, although the secondary effects of gaining qualifications and meaningful employment are also important. Education in this sense goes far beyond the education-as-job-training model, developing citizenship as well as knowledge. My view of education always reminds me of a line from the film Dances with Wolves, when Kicking Bird (Graham Greene) is talking to John Dunbar (Kevin Costner): “I was just thinking that of all the trails in this life there is one that matters most. It is the trail of a true human being. I think you are on this trail and it is good to see.” This idea is not new; it goes back to Ancient Greece, where the responsibilities of citizenship were deeply connected to everyday life.

Coaching word cloud

Ipsative assessment is widespread in sport, and is often practised in conjunction with a coach. It’s the process behind the concept of a ‘personal best’, where achieving a personal best is a cause for celebration regardless of how the result compares with others. I have my own boat (a GP14) and sail at a local club. It’s not a competitive boat. That’s not an excuse: the boat was built in 1973 and still has the same sails as when I bought it around 2000 (and they were old even then), so I’m not going to be winning any club races, even if I were a better sailor 🙂 . But it doesn’t matter. What matters is: did I have a good race? How well did I spot the wind shifts? How well did I round the marks? In short, how am I sailing in this race compared with my own skills? Which, of course, is the only thing I can actually control. In this way, ipsative assessment is ‘performance in practice’, and as such is a prime example of authentic assessment.

The school system here in the UK uses a measure called ‘value-added’, which looks at the change in scores between various stages of schooling, with examination or test scores being the primary accountability measure. The downside is that if this measure is used to judge schools, rather than simply to record their performance, there will be pressure to game the system, which means that value-added isn’t measuring what it’s supposed to. In addition, I recall reading a blog post recently in which a teacher criticised value-added because it assumed that the class tested in year 6 contained the same children who had been tested in year 1. Their particular school had a high turnover of children because of local circumstances, so the assumption didn’t hold. How on earth can you measure progress without tying the data to an individual? Surely without that link value-added has no value at all?
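To make the point about linking data to individuals concrete, here’s a minimal Python sketch using invented scores for a hypothetical class whose membership changes between year 1 and year 6. It is not how official value-added is calculated, which is considerably more involved. It only contrasts a naive cohort-level comparison with progress measured for the same pupils at both points.

```python
from statistics import mean

# Invented scores for illustration only: pupil ID -> test score.
# Pupils B and D leave before year 6; pupils E and F join.
year1 = {"A": 40, "B": 55, "C": 60, "D": 35}
year6 = {"A": 70, "C": 80, "E": 90, "F": 85}


def cohort_change(before, after):
    """Naive cohort-level change: difference between the two group means,
    regardless of which pupils were in the group each time."""
    return mean(after.values()) - mean(before.values())


def matched_progress(before, after):
    """Mean change over only those pupils who were tested at both points."""
    common = before.keys() & after.keys()
    if not common:
        raise ValueError("no pupils were tested at both points")
    return mean(after[p] - before[p] for p in common)


print(cohort_change(year1, year6))     # 33.75
print(matched_progress(year1, year6))  # 25
```

With these made-up numbers the naive comparison suggests a gain of 33.75 points, while the two pupils who were present throughout gained 25 on average. Even the matched figure has a weakness: it silently drops every pupil who arrived or left, and at a school with high turnover that could be a large part of the class.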

What I like about ipsative assessment is that the attention is focused on what has been achieved rather than what has not. It also gives us another opportunity to encourage learners to take responsibility for their own learning. That responsibility is crucial if ipsative assessment is to be effective, although achieving it can be problematic. When my daughter was taking her degree, each assignment had a self-assessment sheet that was a copy of the one used by the tutors. The students were supposed to judge their own performance and then compare it with the tutor’s sheet when the work was returned. My daughter, despite many conversations about why self-evaluation was useful and what its educational purpose was, would simply tick the middle box all the way down. In effect, she deferred to an external authority to judge her work.

Conceptually, there is also a link to assessment for learning (A4L). While A4L allows the teacher to gauge student knowledge, it can also be seen as a form of ipsative assessment for the teacher, which then feeds into their reflective practice.

A key question is how we can formalise ipsative assessment without losing the ethos behind it. We need structures and procedures to mediate and support the process, but the last thing education needs (especially schools-based education in the UK) is another accountability stick with which to beat teachers. Firstly, if the process is simply a tick-box exercise then it’s not being given the importance it needs, and neither student nor teacher will take it seriously. Secondly, it’s vital that the process is student-owned. The student must take an active part in evaluating and processing their ipsative feedback if they are to get the gains it offers. As Hughes has pointed out, the student needs to be the main participant, not the teacher.

In a previous post I described Jeremy Levesley’s approach to assessment, and this could fit quite nicely with ipsative assessment. Suppose we use Prof. Levesley’s approach to arrive at a mark or grade quickly, and then use the freed-up time to put more effort into the ipsative process? We get a mark to meet our accreditation needs (and to satisfy those students who will still fixate on the mark above all else), and we get to develop learner independence and the self-assessment capabilities of our students. It seems like a win-win, but would a hybrid approach work, or would we just be contaminating the ipsative process? I believe it could be done if the academic systems we choose to adopt within our course reinforce the practices we wish to see in our students.

I think the reason ipsative assessment isn’t more prominent in education at the moment is the relentless focus on education (and students) as a product rather than education as a process, as Cory Doctorow recently observed in his wide-ranging video on openness, privacy and trust in education. And that’s the wrong focus. Why train students to perform in systems that are so unrepresentative of the world beyond the campus, when we could teach them to judge themselves and extend beyond their personal best?

Assessment: it’s a variable not a constant

The second post in this short series is going to be about assessment in general. How reliable is it? What are we actually trying to measure? Are we more concerned with how we measure than with what we measure?

“Assessment”: You keep using that word. I do not think it means what you think it means.

Reliability (assessing consistently) and validity (correctly assessing the thing we want to assess) are key concepts in assessment, but how well are we achieving these aims in practice? We pretend that assessment and marking are an objective, repeatable process, but like much of the rest of education, they’re messier than that.

Tom Sherrington points out that marks aren’t uniform in size. Some marks are easier and some harder to achieve, so that everyone has a chance to demonstrate their learning. We can also use ‘harder’ marks to distinguish between grades, as was the case with Hannah’s sweets in a previous post. Aggregate scores are also not directly comparable. If two students both score 36 out of 50, this implies equal achievement and level of knowledge, but that may not be the case once we take into account where within the assessment those marks came from. If that’s true of a science exam, is it any surprise that in more essay-based disciplines “there is no correct score for an essay”? And if you think marking is variable at school, then marking in higher education may come as a shock.
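Here’s a minimal sketch of the 36-out-of-50 point, using invented marks for two hypothetical students on a five-question paper. Treating the last two questions as the ‘harder’ ones is an assumption made for the example.

```python
# Invented marks for two hypothetical students on a 50-mark paper with
# five questions, ordered roughly from easiest to hardest.
MAX_MARKS = [5, 10, 10, 12, 13]

student_a = [5, 10, 10, 11, 0]  # secure on the basics, nothing on the hardest question
student_b = [2, 4, 7, 10, 13]   # patchy on the basics, full marks on the hardest


def check(marks):
    """Raise an error if a mark list doesn't fit the paper."""
    if len(marks) != len(MAX_MARKS) or any(m > cap for m, cap in zip(marks, MAX_MARKS)):
        raise ValueError("marks don't fit the paper")
    return marks


def hard_marks(marks, hard_from=3):
    """Marks earned on the questions treated as 'harder' (an assumed split:
    question index hard_from onwards)."""
    return sum(check(marks)[hard_from:])


for name, marks in [("A", student_a), ("B", student_b)]:
    print(name, "total:", sum(check(marks)), "harder questions:", hard_marks(marks))
# A total: 36 harder questions: 11
# B total: 36 harder questions: 23
```

Both students have the same total but very different profiles. Which of them ‘knows more’ depends on what the paper was meant to measure.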

There is a big difference between assessment of learning (summative) and assessment for learning (formative), and as I mentioned in the first post in this series, we tend to neglect the formative in favour of the summative. Interestingly, assessments often try to combine the two, which raises the question: are we doing either well? Assessment is typically summative, with some feedback included in the hope that it ‘feeds forward’. The downside is that, from the student’s perspective, the point at which that feedback would be useful has already passed, because they see the assessment as finished. They focus on the mark, since that tells them ‘how they did’. The feedback isn’t really assessment for learning, but rather an explanation of the assessment of learning, with a few added implications for future practice.

Jeremy Levesley recently gave a presentation describing his approach to assessment in a third-year mathematics module. His aim is to spend as much time as possible in communication initiated by students, and to engineer situations that promote it. Different assessment activities are targeted at the grade boundaries so that, for example, the question ‘is this first-class work?’ can be answered yes/no/maybe. Basic skills tests decide pass or fail, unseen exam questions decide 2i or 2ii, and student topic work decides whether the work is first-class or not. I really like this approach because it’s an example of how assessment can straddle both the formative and summative camps. Because the conversations are student-initiated, they can serve as assessment for learning, involving students in taking responsibility for managing their own learning. The summative side of the assessment explicitly recognises the inherent variability in marking, and takes advantage of it by reducing each judgement to a single answer (yes/no/maybe). This brings a corresponding reduction in workload for the tutor, freeing up time to be used more productively elsewhere.

Assessment can be misused. The emphasis can shift towards what can be measured relatively easily rather than what is valuable. For example, character education is an emerging area in UK school education and I can see its value (provided that by character we don’t mean conformity), but should we be ‘measuring’ it? This emphasis on measurement has two impacts. First, the valuable (but difficult to measure) gets replaced with some easier-to-measure proxy, so character might be measured by how many days students spend on community volunteering. Second, the proxy becomes the measurement. Originally, SATs were supposed to sample student knowledge and skills at a particular point in time. Instead, SATs have come to dominate teaching activities, as the scores have become the means by which schools are deemed to be failing or succeeding. What started as a useful benchmark has become an accountability stick with which to beat schools and teachers. Also, the pseudo-statistical judgements made about school performance are highly dubious if we’re looking for robust and valid measurements, as the Icing on the Cake blog frequently documents.

I love Carol Dweck’s work on mindset. Judgements of a student (whether from inside a school or outside) have a significant effect on achievement, because students can see themselves as failing in comparison with others, or as failing because their school is failing. Both views ignore the internal achievement that may be occurring. A runner may come, say, 439th in a marathon but have knocked ten minutes off their previous best time. They didn’t win the race, but their performance was the best of their life, so how can that be judged a ‘failure’?

A policy being floated at the moment here in the UK is that of ‘secondary readiness’. Students take SATs at the end of key stage two (age 11). Previously, their scores would have been recorded and passed on to their secondary (high) school, but the proposal now is to introduce the idea of ‘failing’ these tests: students whose scores aren’t high enough would have to retake them. This has the potential for children to be labelled (by themselves or others) as failures at a time of major social upheaval (the move to secondary school) and just before they hit puberty. Now, what could possibly go wrong with that? 🙂

I understand the need for accountability and having commonly understood standards of achievement, but the greatest educational impacts are not those projected outwards, but those reflected inwards. We like to know how we’re doing in relation to others, but more importantly how we’re doing in relation to our own ‘personal best’. That’s the subject for the next post.

Hannah got the sweets, who got indigestion?

Last Thursday in the UK, around half a million 15- and 16-year-olds took a GCSE maths exam, specifically the second paper in the non-calculator exam. By Friday the exam was trending on Twitter (#EdexcelMaths), with one particular question attracting attention:

There are n sweets in a bag.
6 of the sweets are orange.
The rest of the sweets are yellow.

Hannah takes at random a sweet from the bag.
She eats the sweet.

Hannah then takes at random another sweet from the bag.
She eats the sweet.

The probability that Hannah eats two orange sweets is 1/3.

(a) Show that n² – n – 90 = 0

I had a quick attempt and, after one unproductive sidetrack, got the answer. So why am I writing about this? Because it fits in with the other posts on assessment I’m writing, and because it raises some issues worth exploring.

First, the actual mathematical content is pretty straightforward – you only need to know how to do three things: calculate a probability without replacement, multiply fractions and rearrange an equation. This is hardly Sheldon Cooper territory.
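Those three steps can be checked by brute force. This isn’t how the exam expects the question to be answered, since the point of the question is the algebra, done without a calculator. It’s just a minimal Python sketch that uses exact fractions to search for a bag size giving a probability of 1/3, and then confirms that the same n satisfies the quadratic. The upper limit of the search is an arbitrary assumption.

```python
from fractions import Fraction


def p_two_orange(n, orange=6):
    """Probability of taking two orange sweets in a row, without replacement,
    from a bag of n sweets of which `orange` are orange."""
    return Fraction(orange, n) * Fraction(orange - 1, n - 1)


# Brute-force search over bag sizes (the upper limit of 100 is arbitrary).
# There must be at least 6 sweets, since 6 are orange.
solutions = [n for n in range(6, 101) if p_two_orange(n) == Fraction(1, 3)]
print(solutions)  # [10]

# Any bag size found should also satisfy the equation from part (a).
for n in solutions:
    assert n**2 - n - 90 == 0
```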

The exam board offers two tiers for the qualification (foundation and higher), and probability without replacement is only explicitly mentioned for the higher tier. The exam board has been quoted as saying the question was aimed at those students who would achieve the highest grades (A and A*), and I think grade discrimination is a fair approach. I did ask my daughter (who’s currently revising for and taking A-level maths) and she said she wasn’t sure she would have been able to answer it at 16. For non-UK readers, GCSE exams are taken at the end of compulsory schooling and A-levels are taken at 18, typically as a route to studying at university.

So why my unease about students finding it difficult? The charge of dumbing down is always levelled at exams, but I don’t think that’s it. True, when I did my maths exam at that age the syllabus included calculus of polynomials and their applications, which is now only introduced at A-level, but they were different qualifications: GCSEs were only introduced after my school career had ended. I think my unease comes from the feeling that this shouldn’t have been seen as a difficult question. Donald Clark has blogged seven reasons why he agrees with the children and thinks it wasn’t a fair question, some of which I agree with, though most I don’t.

There are a couple of factors involved here. Firstly, I recall reading a study that looked at who could answer questions with the same maths content written in different ways. It found that questions written as word problems rather than equations were consistently harder to answer, even though there was no difference in the actual mathematical content. Secondly, I think it’s the way that maths is taught as rules and recipes to follow rather than as a creative problem-solving activity. This is not a criticism of teachers, because I think it’s taught that way precisely because of the pressures that have (politically) been placed on education. As I’ve mentioned before, I’m a big fan of Jo Boaler’s approach and its emphasis on flexibility and application of technique rather than stamp-collecting formulae. Donald Clark makes the distinction between functional maths (maths for a practical purpose such as employment) and the type of maths typically found in exams, but I think that’s a false dichotomy in this case. As Stephen Downes said, “… what this question tells me is the difference between learning some mathematics and thinking mathematically.” The difference between functional and theoretical maths (at this level) starts to disappear when we think mathematically: maths becomes a toolbox of skills to be applied to the problem at hand, rather than a particular formula in a particular topic to be remembered.

And if you’re wondering what the answer was:

The solution to Hannah’s sweets
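Written out as algebra, one standard route to the answer is shown below. It is not necessarily laid out the same way as the original worked solution. It uses only the facts in the question: 6 orange sweets out of n, two taken without replacement, and a probability of 1/3.

```latex
\begin{align*}
P(\text{two orange}) &= \frac{6}{n} \times \frac{5}{n-1} = \frac{30}{n(n-1)} = \frac{1}{3} \\
\Rightarrow\quad n(n-1) &= 90 \\
\Rightarrow\quad n^2 - n - 90 &= 0 \\
\Rightarrow\quad (n - 10)(n + 9) &= 0 \\
\Rightarrow\quad n &= 10 \quad (\text{since } n > 0)
\end{align*}
```

Part (a) only asks for the quadratic. Factorising it then gives the number of sweets, and the negative root is discarded because n counts sweets.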

Assessment – where are we? Where are we going?

Assessment has become something of a theme in the things I’ve been involved in over the past few months, so this is the first of a series of posts on the topic.
I’ve just completed the ‘Assessment for Learning in STEM Teaching’ MOOC on FutureLearn by the University of Leeds. It was a little outside my normal area in that it was focused more on learning in schools, but I was looking for general principles and any crossover with higher education. Lately, I’ve found myself subscribing to more blogs that are specifically to do with the schools sector. Partly this is due to my interest in politics and the impact of the recent (May 2015) general election in the UK on the education system as a whole, and partly because of a personal project I’m developing (

Keep calm and assess formatively

The key starting point for the MOOC was a review of an influential paper, ‘Inside the black box: raising standards through classroom assessment’ by Paul Black and Dylan Wiliam. The paper looked at the evidence for the effect of formative assessment within the classroom on student achievement. According to Google Scholar this paper has been cited over 4500 times, and Dylan Wiliam was one of the educators on the course.

Formative assessment is important – it makes a significant difference to student outcomes – but it’s the bit that’s often missed, the bit students don’t really see the point of. For many students, formative equals optional, and optional means not required. Yet if you suggested that someone learn to drive by skipping the lessons and simply taking the test again and again, they’d think you were mad. That’s precisely what happens when formative assessment isn’t done by students or made available by staff. The term ‘formative assessment’ itself can mean different things to different people, so let’s narrow down what we’re talking about in the context of the course.

Here we’re talking about formative assessment ‘within the classroom’. This isn’t formative assessment as practice preceding a summative assignment, or as a way of (primarily) telling students how they are doing. Instead, assessment for learning means that part-way through a session the teacher takes some action to gauge the students’ current understanding, where the misconceptions are and, crucially, what action needs to be taken next. It’s the educational equivalent of orienteering: find where you are, look at your next destination, plan the route and move.

The course talked about intentional dialogue (the planned process of exploring understanding) and distinguished between facilitated discussions and hinge-point questions. Hinge-point questions are carefully crafted multiple-choice questions in which the wrong answers map to common misunderstandings. Asking particular students follow-up questions about why they answered the way they did let the teacher confirm that they got the right answer for the right reason, or explore the reasoning behind a wrong answer. Videos of teachers in action in the classroom bridged the gap between theory and practice, as well as giving us the chance to analyse and critique the technique. Intentional dialogue reminded me a lot of Laurillard’s conversational framework, although in this case the conversations are face-to-face and not mediated by technology.
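The structure of a hinge-point question can be sketched as a small data structure. Each option is tied to the misconception it is designed to expose, and the class’s responses suggest a next step. Everything here is my own hypothetical illustration, not taken from the course: the fractions question, the misconceptions attached to each option, and the 80% ‘move on’ cut-off are all assumptions.

```python
from collections import Counter

# Hypothetical hinge-point question (invented for illustration):
# "What is 1/2 + 1/3?"  Each wrong option maps to one likely misconception.
OPTIONS = {
    "A": ("5/6", None),  # correct answer
    "B": ("2/5", "adds numerators and denominators separately"),
    "C": ("2/6", "adds numerators but multiplies denominators"),
    "D": ("1/6", "multiplies the fractions instead of adding them"),
}


def diagnose(responses, move_on_threshold=0.8):
    """Tally the class's answers and suggest a next step.

    move_on_threshold is an assumed cut-off, not a rule from the course.
    """
    if not responses:
        raise ValueError("no responses to diagnose")
    counts = Counter(responses)
    correct = sum(n for opt, n in counts.items() if OPTIONS[opt][1] is None)
    if correct / len(responses) >= move_on_threshold:
        return "move on (but check a few students reached A for the right reason)"
    # The most common wrong answer points to the misconception to address first.
    wrong = Counter({opt: n for opt, n in counts.items() if OPTIONS[opt][1] is not None})
    opt, _ = wrong.most_common(1)[0]
    return f"revisit: {OPTIONS[opt][1]} (option {opt})"


print(diagnose(list("AABCBBADAB")))
print(diagnose(list("AAAAAAAAAB")))
```

The tally is only half of the technique. As the course stressed, the follow-up questions matter just as much, because a student who chose A may still have the right answer for the wrong reason.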

Assessment for learning is a significant factor in raising achievement. If we want students to engage with formative assessment (in all its forms), we need to design our courses so that formative assessment is what is emphasised and rewarded. As Jen Ebbeler says, “by incentivizing practice, we are actually incentivizing the type of behaviour that leads to learning”. What should happen is that teachers have authority over their professional practice and decide how and when to use it. I have heard anecdotally that in some areas it has become an additional accountability measure: teachers not only have to use it, but also need to be able to prove that they’ve used it. Perhaps, if we want educational results to rival those of the Scandinavian countries, it might be better to take on board some of their methods (see the Taught by Finland blog for a comparison), such as giving teachers the trust and autonomy to simply do their jobs.

So what does assessment for learning look like in higher education? I’d argue it’s much less common outside the seminar room simply because of the scale, particularly with my department’s first-year cohort now well into three figures. We might also assume that, where it is present, it bears little resemblance to assessment for learning in schools, but does it? In school, students might hold up a coloured card with a big letter A. At university we might use polling tools such as TurningPoint, and if we want to go low-tech, Keele communicubes have been around for a while now, courtesy of Dr Stephen Bostock. Are we really doing anything fundamentally different, apart from succumbing to ‘shiny toys’ syndrome and congratulating ourselves on how innovative we are?

Returning to the MOOC, did I find it useful? Yes. Would I recommend it to others? Yes, because I found it beneficial to look at learning from the perspective of school teaching. My only criticism is that I would have liked to have had a class with which to practise the technique.