Assessment: it’s a variable, not a constant

The second post in this short series is about assessment in general. How reliable is it? What are we actually trying to measure? Are we more concerned with how we measure than with what we measure?

“Assessment”. You keep using that word. I do not think it means what you think it means.

Reliability (assessing consistently) and validity (correctly assessing the thing we want to assess) are key concepts in assessment, but how well are we achieving these aims in practice? We pretend that assessment and marking are an objective, repeatable process, but like much of the rest of education, it’s messier than that.

Tom Sherrington points out that marks aren’t uniform in size. Some marks are easier and some harder to achieve, so that everyone has a chance to demonstrate their learning. We can also use ‘harder’ marks to distinguish between grades, as was the case with Hannah’s sweets in a previous post. Aggregate scores are also not directly comparable. If two students each score 36 out of 50, this implies an equal level of achievement and knowledge, but that may not be the case once we take into account where within the assessment those marks came from. If that’s true of a science exam, then when it comes to more essay-based disciplines is it any surprise that “there is no correct score for an essay”? And if you think marking is variable at school, then marking in higher education may come as a shock.

There is a big difference between assessment of learning (summative) and assessment for learning (formative), and as I mentioned in the first post in this series we tend to neglect the formative in favour of the summative. Interestingly, assessments often try to combine the two, which raises the question: are we doing either well? Assessment is typically summative, with some feedback included in the hope that it ‘feeds forward’. The downside is that, from the student’s perspective, the point at which that feedback would be useful has already passed because they see the assessment as having ended – they focus on the mark, since that tells them ‘how they did’. The feedback isn’t really assessment for learning, but rather an explanation of the assessment of learning, with a few added implications for future practice.

Jeremy Levesley recently gave a presentation describing his approach to assessment in a third-year mathematics module. His aim is to spend as much time as possible in communication initiated by students, and to engineer situations that promote it. Different assessment activities are targeted at the grade boundaries so that, for example, the question ‘is this first-class work?’ can be answered yes/no/maybe. Basic skills tests decide pass or fail, unseen exam questions decide 2i or 2ii, and student topic work decides whether the work is first-class or not. I really like this approach because it’s an example of how assessment can straddle both the formative and summative camps. Because the conversations are student-initiated, they can serve as assessment for learning, involving students in taking responsibility for managing their own learning. The summative side explicitly recognises the inherent variability in marking, and takes advantage of it by reducing each boundary decision to a single answer (yes/no/maybe). This gives a corresponding reduction in workload for the tutor, freeing up time to be used more productively elsewhere.
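The boundary-targeted scheme described above can be sketched as a toy decision rule. This is purely my own illustration of the idea, not Levesley’s actual implementation – the function name, inputs and the ‘borderline’ outcome are all assumptions:

```python
def classify(basic_skills_pass: bool, exam_upper_half: bool, topic_first: str) -> str:
    """Toy model of boundary-targeted grading (illustrative only).

    Each assessment activity answers exactly one boundary question,
    so the marker only ever makes a yes/no/maybe call at a single boundary.
    """
    if not basic_skills_pass:            # basic skills test decides pass or fail
        return "fail"
    if topic_first == "yes":             # topic work decides first-class or not
        return "first"
    if topic_first == "maybe":           # a 'maybe' prompts a follow-up conversation
        return "borderline first"
    return "2i" if exam_upper_half else "2ii"   # unseen exam decides 2i or 2ii
```

The point of the sketch is that no single activity carries the whole grading burden: each one is only asked a question it can answer cleanly.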

Assessment can be misused. There can be a shift of emphasis towards what can be measured relatively easily rather than what is valuable. For example, character education is an emerging area in UK school education and I can see its value (provided that by character we don’t mean conformity), but should we be ‘measuring’ it? This emphasis on measurement has two impacts. First, the valuable (but difficult to measure) gets replaced with some easier-to-measure proxy, so character might be measured by how many days students spend in community volunteering. Secondly, the proxy becomes the measurement. Originally, SATs were supposed to sample student knowledge and skills at a particular point in time. What has happened is that SATs have come to dominate teaching activities, as the scores have become the means by which schools are deemed to be failing or succeeding. What started as a useful benchmark has become an accountability stick with which to beat schools and teachers. Also, the pseudo-statistical judgements made about school performance are highly dubious if we’re looking for robust and valid measurements, as the Icing on the Cake blog frequently documents.

I love Carol Dweck’s work on mindset. Judgements of a student (whether from inside a school or outside) have a significant effect on student achievement because students can see themselves as failing in comparison to others, or failing because the school is failing, but this ignores the internal achievement that may be occurring. A runner may come, say, 439th in a marathon, but might have knocked ten minutes off their previous best time. They didn’t win the race, but their performance was the best of their life, so how can that be judged as a ‘failure’?
A policy being floated at the moment here in the UK is that of ‘secondary readiness’. Students take SATs at the end of key stage two (age 11). Previously, their scores would have been recorded and passed to their secondary (high) school, but the proposal now is to introduce the idea of ‘failing’ these tests. If a student’s scores aren’t high enough, they will retake the tests. This has the potential for children to be labelled (by themselves or others) as failures at a time of major social upheaval (the move to secondary school) and just before they hit puberty. Now, what could possibly go wrong with that? 🙂

I understand the need for accountability and having commonly understood standards of achievement, but the greatest educational impacts are not those projected outwards, but those reflected inwards. We like to know how we’re doing in relation to others, but more importantly how we’re doing in relation to our own ‘personal best’. That’s the subject for the next post.

Maths and Mindset

A word-based maths problem


Dr Jenny Koenig from the University of Cambridge was the presenter at one of our regular PedR (pedagogical research group) meetings recently. Now, I actually like maths. One of the first Open University courses I did was ‘MS283 An Introduction to Calculus’, so it was interesting to look at maths from a different perspective. The title was ‘Teaching and Learning Maths in the Biosciences’, and it dealt with the challenges and issues surrounding quantitative skills in the biosciences, which fell into two main areas. First was content: the mathematical knowledge a student arrives at university with, which varies according to the subjects and level they studied and the grades they achieved. In practice this means a very wide range in knowledge and ability, from a bare pass at GCSE (the qualifications taken at the end of compulsory education, around the age of 16) to a top grade in A-level maths immediately before entry into university. The second area was attitude to maths, and the issues of maths phobia and maths anxiety. This led me on to the work of Dr Jo Boaler and her ‘How to Learn Maths‘ MOOC. Unfortunately, by the time I became aware of it the course was due to finish, so I downloaded the videos and settled down for some offline viewing. Her book “The Elephant in the Classroom” is my current reading on the commute home, and goes into the ideas in more detail.
Her premise is that the typical teaching of maths is strongly counterproductive and doesn’t equip students to use maths in the way they need to in real life. This is because it relies on individual work using standardised methods, with little creativity or active problem solving. Also, the (predominantly) UK and US practice of grouping students by ability leads to fixed expectations in both student and teacher. Her solution is a problem-solving approach involving group work, active discussion and explicit demonstration that there are a variety of ways to reach the answer.

She draws heavily on the work of Dr Carol Dweck on the concept of mindset, which distinguishes between fixed mindsets and growth mindsets. A fixed mindset is the belief that people possess a fixed amount of a certain trait or talent (like mathematical ability) and that there is little they can do to change it. This manifests itself as the self-fulfilling prophecy that there are those who are good at maths and those who aren’t. A person with a growth mindset believes that development comes through persistence and practice, and that anyone can improve their skill in a particular area. While these mindsets can apply to any area, I’d argue that maths is one of the areas where the fixed mindset is particularly common and openly stated, and not only that, but that it’s culturally acceptable to be bad at maths. For example, while it’s not uncommon to hear people say that they’ve never been able to do maths, you’d never see anyone smiling, shrugging their shoulders and saying “Ah, that reading and writing stuff. Never could get the hang of it”. Dweck’s work on mindset really resonates with me, and while I’m largely in the growth mindset there are a few areas where my mindset is more fixed. Now that I’m aware of those I can take steps to change them.
This concept of mindset links in to my earlier post on behaviour and reward because, in addition to cultural and institutional barriers to innovation, we can now add internal barriers. A fixed mindset leads to risk-averse behaviour because self-worth becomes tied to success. Failure doesn’t present a learning opportunity but passes sentence on the person as a failure. Failure or success at the task becomes the embodiment of the worth of the individual.
Growth mindsets, on the other hand, allow ‘failures’ to be positive. A paper by Everingham et al. (2013) describes the introduction of a new interdisciplinary course for teaching quantitative skills, looks at its effectiveness over two years, and recounts rescuing it “… from the ashes of disaster!” Evaluation at the end of the first year produced some worrying results. Maths anxiety had increased for all students. Female students were less confident in the computing areas of the course, and male students were less engaged with the course overall. Significant changes were made to student support and assessment practices, and the second evaluation produced much better results. This is a great example of the growth mindset in action – they tried something and it went wrong. Rather than playing the ‘bail out and blame’ game, they persisted. They redesigned and tried again, and then made their initial failure public through publication. When I worked as an IT trainer, someone asked me how I ran my training room. I replied that I aimed for an atmosphere where people could screw up completely, feel comfortable and relaxed about it, and then get the support to put it right. What works for students works equally well, if permitted :-), for institutions.


Everingham, Y., Gyuris, E. and Sexton, J. (2013). Using student feedback to improve student attitudes and mathematical confidence in a first year interdisciplinary quantitative course: from the ashes of disaster! International Journal of Mathematical Education in Science and Technology, 44(6), 877–892.