Striving for a personal best – who judges?

In this post I’m going to look at some of the issues surrounding a form of assessment you’re probably familiar with, but not with the terminology that goes with it: ipsative assessment. So what is ipsative assessment? It’s a means of assessment that measures the student’s progress, not by comparing to some external (and often arbitrary) standard, but by comparing the student’s achievement now to their achievement in the past. It asks the questions: Did I improve? By how much? It avoids the questions: What mark did I get? Did Fred do better than me?

This shift from external validation to internal comparison is an important one, and has a number of implications. Firstly, when done well it moves the focus towards personal growth and development. This, to me, is what education should be about. Arguably it’s the only measure that really matters long term, although the secondary effects of gaining qualifications and meaningful employment are also important. Education in this sense vastly exceeds the education-as-job-training model, developing citizenship as well as knowledge. My model of education always reminds me of a line from the film Dances with Wolves, when Kicking Bear (Graham Greene) is talking to John Dunbar (Kevin Costner): “I was just thinking that of all the trails in this life there is one that matters most. It is the trail of a true human being. I think you are on this trail and it is good to see.” This idea is not new and goes back to Ancient Greece, where the responsibilities of citizenship were deeply connected to everyday life.

Coaching word cloud

Ipsative assessment is widespread in sports, and is often practised in conjunction with a coach – it’s the process behind the concept of a ‘personal best’, where achieving a personal best is a cause for celebration regardless of the result compared to others. I have my own boat (a GP14) and sail at a local club. It’s not a competitive boat. That’s not an excuse – the boat was built in 1973 and still has the same sails (which were old then) as when I bought it around 2000, so I’m not going to be winning any club races, even if I were a better sailor 🙂 . But it doesn’t matter: what matters is did I have a good race? How well did I spot the wind shifts? How well did I round the marks? In short, how am I sailing in this race compared to my own skills? Which, of course, is the only thing I can actually control. In this way, ipsative assessment is ‘performance in practice’, and as such is a prime example of authentic assessment.

The school system here in the UK uses a measure called ‘value-added’, which looks at the change in scores between various stages of schooling, with examination or test scores being the primary accountability measure. The downside is that if this measure is used to judge schools rather than record their performance then there will be pressure to game the system, which means that value-added isn’t measuring what it’s supposed to. In addition, I recall reading a blog recently where a teacher was criticising value-added because it assumed that the class that was tested in year 6 contained the same children as the class tested in year 1. Their particular school had a high turnover of children because of local circumstances, so the assumption didn’t hold. How on earth can you measure progress without tying the data to an individual? Surely without that link value-added has no value at all?
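The teacher’s objection can be made concrete with a toy calculation (the pupil IDs and scores below are invented for illustration): a cohort-level value-added figure compares the averages of whoever happened to sit each test, while a matched figure only counts children present at both points. With high turnover the two can tell completely different stories.

```python
# Toy illustration of why value-added needs pupil-level matching.
# Scores are fictional: pupils p1-p4 sat the year 1 test, but by
# year 6 two of them have left and two new children have arrived.
year1 = {"p1": 10, "p2": 12, "p3": 14, "p4": 16}
year6 = {"p1": 20, "p2": 22, "p5": 8, "p6": 9}

def cohort_value_added(before, after):
    """Naive measure: difference of cohort averages, ignoring churn."""
    return sum(after.values()) / len(after) - sum(before.values()) / len(before)

def matched_value_added(before, after):
    """Average progress of only those pupils who sat BOTH tests."""
    common = before.keys() & after.keys()
    return sum(after[p] - before[p] for p in common) / len(common)

print(cohort_value_added(year1, year6))   # 1.75
print(matched_value_added(year1, year6))  # 10.0
```

The unmatched figure suggests the school added less than two points, while the children who actually stayed gained ten – which is exactly the teacher’s complaint.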

What I like about ipsative assessment is that the attention is focused on what has been achieved rather than what has not. It also gives us an additional chance to engage learners with taking responsibility for their own learning and that’s crucial for ipsative assessment to be effective, although achieving it can be problematic. When my daughter was taking her degree each assignment had a self-assessment sheet that was a copy of the one used by the tutors. The students were supposed to judge their own performance and then compare it to the tutor’s sheet when the work was returned. My daughter, despite many conversations about why self-evaluation was useful and what the educational purpose was, would simply tick the middle box all the way down. In effect, she deferred to an external authority to judge her work.

Conceptually, there is also a link to assessment for learning (A4L). While A4L allows the teacher to gauge student knowledge it can also be seen as ipsative assessment for the teacher that then feeds into their reflective practice.

A key question is how can we formalise ipsative assessment without losing the ethos behind it? We need structures and procedures to mediate and support the process, but the last thing education needs (especially schools-based education in the UK) is another accountability stick with which to beat the teachers. Firstly, if the process is simply a tick-box exercise then it’s not being allocated the importance it needs, and neither student nor teacher will take it seriously. Secondly, it’s vital that the process is student-owned. The student must take an active part in evaluating and processing their ipsative feedback if they are to realise the gains it offers. As Hughes has pointed out, the student needs to be the main participant, not the teacher.

In a previous post I described Jeremy Levesley’s approach to assessment, and this could fit quite nicely with ipsative assessment. Suppose we use Prof. Levesley’s approach so that we’re getting to a mark or grade quickly and then use the freed time to put more effort into the ipsative process? We get a mark to meet our accreditation needs (and those students who will still fixate on the mark above all else), and we get to develop learner independence and the self-assessment capabilities of our students. It seems like a win-win, but would a hybrid approach work, or are we just contaminating the ipsative process? I believe it could be done if the academic systems we choose to adopt within our course reinforce the practices we wish to see in our students.

The reason I think ipsative assessment isn’t more prominent in education at the moment is the relentless focus on education (and students) as a product, rather than education as a process, as Cory Doctorow recently observed in his wide-ranging video on openness, privacy and trust in education. And that’s the wrong focus. Why train students to perform in systems that are so unrepresentative of the world beyond the campus when we could teach them to judge themselves and extend beyond their personal best?

Assessment: it’s a variable not a constant

The second post in this short series is going to be about assessment in general. How reliable is it? What are we actually trying to measure? Are we more concerned with how we measure rather than what we measure?

“Assessment”. You keep using that word. I do not think it means what you think it means.

Reliability (assessing consistently) and validity (correctly assessing the thing we want to assess) are key concepts in assessment, but how well are we achieving these aims in practice? We pretend that assessment and marking are an objective, repeatable process, but like much of the rest of education, it’s messier than that.

Tom Sherrington points out that marks aren’t uniform in size. Some marks are easier and some harder to achieve so that everyone has a chance to demonstrate their learning. We can also use ‘harder’ marks to distinguish between grades, as was the case with Hannah’s sweets in a previous post. The aggregate scores are also not directly comparable. If two students have a score of 36 out of 50 then this implies equal achievement and level of knowledge, but this may not be the case when we take into account where within the assessment those marks came from. If that’s the case when we’re talking about a science exam, then when it comes to more essay-based disciplines is it any surprise that “there is no correct score for an essay?”. And if you think marking is variable at school, then marking in higher education may come as a shock.

There is a big difference between assessment of learning (summative) and assessment for learning (formative), and as I mentioned in the first post in this series we tend to ignore the formative in favour of the summative. Interestingly, assessments often try to combine the two, which raises the question: are we doing either well? Assessment is typically summative, with some feedback included in the hope that it ‘feeds forward’. The downside is that from the perspective of the student the point at which that feedback would be useful has already passed, because they see this assessment as having ended – they focus on the mark since that tells them ‘how they did’. The feedback isn’t really assessment for learning, but rather an explanation of the assessment of learning, with a few added implications for future practice.

Jeremy Levesley recently gave a presentation describing his approach to assessment in a third year mathematics module. His aim is to spend as much time as possible in communication initiated by students, and to engineer situations in which that is promoted. Different assessment activities are targeted to the grade boundaries so that, for example, the question ‘is this first-class work?’ can be answered yes/no/maybe. Basic skills tests decide pass or fail, unseen exam questions decide 2i or 2ii, and student topic work decides whether the work is first-class or not. I really like this approach because it’s an example of how assessment can straddle both the formative and summative assessment camps. Because the conversations are student-initiated they can serve as assessment for learning, involving students in taking responsibility for managing their own learning. The summative nature of the assessment explicitly recognises the inherent variability in marking, and takes advantage of it by targeting it to a single answer (yes/no/maybe). This gives a corresponding reduction in workload for the tutor, freeing up time to be used more productively elsewhere.
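The boundary-targeting idea can be sketched in a few lines. To be clear, the decision rules below are my own guess at how the components might combine, not Prof. Levesley’s actual scheme: each assessment component answers a single yes/no boundary question, and the grade falls out of those answers.

```python
# Hypothetical sketch of grade-boundary-targeted assessment.
# The combination rules are invented for illustration; the real
# scheme described in the talk may differ in its details.
def classify(passed_basic_skills, exam_level, topic_first_class):
    """Each component answers one boundary question:
    basic skills tests -> pass or fail,
    unseen exam questions -> '2i' or '2ii',
    student topic work -> first-class or not."""
    if not passed_basic_skills:
        return "fail"
    if topic_first_class:
        return "first"
    return exam_level  # "2i" or "2ii"

print(classify(True, "2i", False))  # 2i
```

The point is that each marking decision collapses to yes/no/maybe at a single boundary, rather than placing every script on a 0–100 scale.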

Assessment can be misused. There can be a change of emphasis towards what can be measured relatively easily rather than what is valuable. For example, character education is an emerging area in UK school education and I can see its value (provided that by character we don’t mean conformity), but should we be ‘measuring’ it? This emphasis on measurement has two impacts. First, the valuable (but difficult to measure) gets replaced with some easier-to-measure proxy, so character might be measured by how many days students are involved in community volunteering. Second, the proxy becomes the measurement. Originally, SATs were supposed to sample student knowledge and skills at a particular point in time. What has happened is that SATs have come to dominate teaching activities as the scores have become the means by which schools are deemed to be failing or succeeding. What started as a useful benchmark has become an accountability stick with which to beat schools and teachers. Also, the pseudo-statistical judgements made about school performance are highly dubious if we’re looking for robust and valid measurements, as the Icing on the Cake blog frequently documents.

I love Carol Dweck’s work on mindset. Judgements of a student (whether from inside a school or outside) have a significant effect on student achievement because students can see themselves as failing in comparison to others, or failing because the school is failing, but this ignores the internal achievement that may be occurring. A runner may come, say, 439th in a marathon, but might have knocked ten minutes off their previous best time. They didn’t win the race, but their performance was the best of their life, so how can that be judged as a ‘failure’?

A policy being floated at the moment here in the UK is that of ‘secondary readiness’. Students take SATs at the end of key stage two (age 11). Previously, their scores would have been recorded and passed to their secondary (high) school, but the proposal now is to introduce the idea of ‘failing’ these tests. If the scores aren’t high enough then they are to retake them. This has the potential for children to be labelled (by themselves or others) as failures at a time of major social upheaval (the move to secondary school) and just before they hit puberty. Now, what could possibly go wrong with that? 🙂

I understand the need for accountability and having commonly understood standards of achievement, but the greatest educational impacts are not those projected outwards, but those reflected inwards. We like to know how we’re doing in relation to others, but more importantly how we’re doing in relation to our own ‘personal best’. That’s the subject for the next post.

Times tables – a matter of life and death?

Recently I took my youngest daughter to visit a university in the north-east of the UK, which involved a round trip of nearly 500 miles and an overnight stay. There’s a general election due in less than three months, which means we’re into that ‘interesting’ phase of the electoral cycle where all the parties try to outcompete each other, either by offering incentives (bribes?) to certain groups (‘Unicorns for every five-year-old!’) or by demonising whatever group is the scapegoat this month. If you’ve ever seen Monty Python’s Four Yorkshiremen sketch, you’ll know what I mean.

So what has this to do with times tables? Well, one of the announcements was for every child to know their times tables up to 12 by the time they leave primary school (i.e. by age 11), and by ‘know’ they appear to mean memorise.

I have a number of misgivings about this. Firstly, rote learning without understanding isn’t particularly useful. Memorisation isn’t education. Secondly, as the work of Jo Boaler has shown, students perform much better at maths when they learn to interact more flexibly with it (number sense) rather than simply remembering the answers. As she points out, calculating under stress works less well when relying on memory, which is presumably why politicians refuse to answer maths questions when interviewed, as Nicky Morgan, the education secretary, did recently. In one of my previous jobs I worked in a health sciences department, and the statistics on drug errors (such as calculating dosages) were frightening – and there are few things more stressful than knowing someone could die if the answer to a maths problem is wrong.

The outcome of all this memorisation is that the application suffers. As we travelled back there was a radio phone-in quiz, and as times tables were in the news one of the questions was zero times eight. The caller answered eight, and was told they were wrong. A few minutes later someone else called to tell the presenter that they were wrong because zero times eight was eight, but eight times zero was zero. (Both callers, of course, were wrong: multiplication is commutative, so 0 × 8 = 8 × 0 = 0.) And this is the real problem. While maths is seen (and taught) as a recipe, a set of instructions to follow, misconceptions like this will continue to prosper. Personally, I see maths as more of a Lego set – a creative process where you combine different components in different ways to get to the end result you want. As Jo Boaler has said, “When we emphasize memorization and testing in the name of fluency we are harming children, we are risking the future of our ever-quantitative society and we are threatening the discipline of mathematics”. Unfortunately, I’m doubtful whether that will count for anything against the one-upmanship in the closing months of an election campaign.

Assessment with portfolios – a neglected opportunity?

I finally got around to reading Tony Bates’ blog post on the outlook for online learning in the short and medium term. It’s an in-depth post, but I want to concentrate on one particular aspect. Tony puts forward the idea that the traditional lecture-based course will gradually disappear as learning shifts from the transmission of information to knowledge management. Later, he talks about an increase in the use of portfolios for assessment, and to my mind these two are a natural fit. This is because as teaching shifts from the presentation of a package of content to students meeting a common set of criteria, traditional written assessment becomes increasingly less fit for purpose.

Clipboard and pen

Photo credit: Dave Crosby (CC BY-SA)

The idea of assessment by portfolio in higher education isn’t new. A few years ago, I saw Lewis Elton give the plenary at the teaching and learning conference at the University of Birmingham, where he suggested using portfolios as a replacement for the UK system of degree classifications. The plenary covered much of the ground in his 2004 paper ‘Should classification of the UK honours degree have a future?’ and provoked much discussion.

I believe there is a case to be made for portfolio assessment to have a greater role in higher education. As Elton says, the “measurement of achievement has become more important than achievement itself”. In a recent article discussing competition in education, Natalie Bennett (the leader of the Green Party here in the UK) quoted the Education Reform Act 1988, which said that education should: “promote the spiritual, moral, cultural, mental and physical development of pupils … and prepare pupils … for the opportunities, responsibilities and experiences of adult life.” The quote relates specifically to the context of secondary education (11-18 years old), but it’s still a pretty powerful quote. And if that’s the case for school children then surely it should hold for university and higher-level study, when their capacity to engage in activities that promote those values is arguably higher?

We may want assessment that is both valid (measures what it is designed to measure) and reliable (gives consistent results) for very good reasons (such as quality control and comparisons across institutions), but that restricts the types of assessment we can do. What we get (across the education system as a whole) is assessment that comes down to a number or grade, and that steers the testing towards that which is easily measured, for example, a 1,500 word essay or a three hour exam.

So what does assessment by portfolio give us that conventional assessment doesn’t? First, it’s an active process, and in a well designed assessment the learner has to engage over a period of time, evaluating and reflecting on their learning. Students can revisit their work and re-purpose it, for example, for use within a presentation to a potential employer. This does bring in issues of ownership and access after the end of the course, but I’ll leave those aside for now, and in any case those issues aren’t unique to portfolios.

Secondly, higher order and transferable skills can be assessed more effectively than they could through a conventional written assessment. For example, selecting items for inclusion will involve evaluating individual items, reflecting on their purpose and value, and synthesising the collection into a coherent whole. This does mean that the learners need a high level of independent learning skills, which may not be the case; a supportive pedagogical design with clear scaffolding and direction can help develop them. Another point is that this form of assessment is authentic – it assesses a direct analogue of how these higher order skills are used within a workplace. Support and direction need to be explicit, not only on the process of the assessment, but also on its purpose. My daughter used an eportfolio as a record of achievement during her degree, but didn’t see the point and questioned why the students couldn’t simply submit their assignments and leave it at that. Another acquaintance is studying for a primary PGCE on a course that uses the Mahara eportfolio and said that it’s almost universally hated, mainly, it seems, because they find the user interface unintuitive.

The portfolio can be an integrating influence that draws the rest of the course together for students. Portfolios have been used very successfully on a Master’s level distance learning course by the Open University (Mason et al., 2004). The course consisted of four modules, with two items being selected from each module for inclusion. Two thirds of students were positive about the role of the portfolio as an integrating element in the course.

So what are the downsides? Well, most criticisms centre around the issues of reliability and validity, but that brings us back to Elton’s statement that achievement takes second place to the measurement of achievement. Elton also said that the “prime purpose of assessment should be to encourage good learning” (Elton, 2004), and that there should be a bias towards success and not failure. He’s not referring to grade inflation, but arguing that we should move away from the deficiency model of traditional assessment. This brings to mind the perennial debate around standards, ‘dumbing down’, and whether assessments should act as a gate-keeper (norm-referenced) or as a marker of achievement (criterion-referenced). Should everyone on a course be able to get a first class degree? Absolutely, if they’ve met the standards for a first-class degree, but I can imagine the outcry if a department were to award first class honours to an entire cohort, supposing they were lucky enough to get such a cohort in the first place.

Elton recommends that if something can be assessed reliably and validly then it should be graded conventionally. If something can be assessed validly but not reliably then it should be graded pass/fail by experienced examiners. If it can’t be assessed either reliably or validly then it should be reported on (again by experienced examiners) rather than graded. Knight (2002) made a similar argument, stating that summative assessment should be restricted to “that which can be reliably, affordably and fairly assessed”. Skills development and other aspects of the curriculum should be formatively assessed. Portfolios, then, blur the lines between formative and summative assessment.

Portfolios have great potential for assessment, provided that they are used wisely and that significant effort is made to shift the students’ focus away from the product. It’s like getting an essay for homework at school – the essay isn’t the homework, the homework is the process of research, drafting, revision and synthesis, and the printed essay is just the evidence you did it. The sticking point is that portfolios exist within an educational ecosystem that functions to support and validate conventional assessment, and that is likely to change only slowly.

References

Bates, T. (2014). 2020 Vision: Outlook for online learning in 2014 and way beyond.

Bennett, N. (2014). Let’s Get Heretical on Education: Competition Has Failed.

Elton, L. (2004). Should classification of the UK honours degree have a future? Assessment and Evaluation in Higher Education, 29(4), 415–422.

Knight, P. (2002). Summative assessment in higher education: practices in disarray. Studies in Higher Education, 27, 275–286.

Mason, R., Pegler, C. and Weller, M. (2004). E-portfolios: an assessment tool for online courses. British Journal of Educational Technology, 35(6), 717–727.

Carrots and sticks – not good enough even for donkeys?

In my last post I looked at student feedback and talked about institutional inertia in implementing new practice. Over the last couple of days I’ve come across blog posts that have led me to consider how institutions (in their widest sense) actively work against the improvement of teaching and the educational experience.

One post that came through my RSS feeds was ‘25 ways to cultivate intrinsic motivation‘. While an excellent article in itself, it contained a video of the talk Daniel Pink gave to the RSA, and that’s what provided the seed for this blog post. I’d seen this video before but it was a while ago and I’d forgotten the details. Daniel talked about what motivates and drives human beings and some of the research that has been done. He described research where people were offered monetary rewards for various tasks and their performance was measured. The reward system worked as expected (higher pay produced better performance) provided that the task only involved mechanical or rote skills. Once the task needed any sort of thinking or cognitive input, a larger reward actually led to poorer performance. As Daniel states: “When a task gets more complicated, when it requires some conceptual creative thinking those types of motivator demonstrably don’t work.” He then goes on to discuss how for those types of task a combination of autonomy (self-direction), mastery (the desire to get better at something), and having a sense of higher purpose produces performance increases. Money is only relevant (in cognitive tasks) if people are paid enough that they’re thinking more about the task and less about the reward. I’d argue that these three traits are a pretty good description of what drives the best teachers.

So how does this link to teaching? My daughter has recently passed her teaching qualification, the PGCE (Post Graduate Certificate of Education) here in the UK, and has just started her first full year of teaching. The UK government, through the Department for Education, has introduced new pay policies for teachers. The press release states that “evidence shows that improving the quality of teaching is essential to raising standards in schools.” No argument there, but I have grave doubts that any aspect of what’s been announced would actually improve ‘the quality of teaching’ within schools as a whole. There are three main elements listed in the press release for the new national pay framework. First, pay increases based on length of service are stopped. I’d argue that rather than rewarding length of service these increases recognised increased experience, in much the same way that a person with a number of years of experience could expect to start a job on a higher salary than someone without. Second, all pay progression is linked to performance based on annual appraisals. I don’t have an issue with performance monitoring or annual appraisals, provided that the process is transparent, fair, and not used as a tool to divide staff. Unfortunately, I’ve had personal experience where that was not the case. Third, the new proposals scrap mandatory pay points, meaning that the pay scales remain for reference only “to guide career expectations“.

The press release then goes on to say: “It is up to each school to decide how to implement new pay arrangement for performance-related pay”, but there’s no mention of any extra funding to meet the additional salary costs (and if extra funds were available you can be sure they’d be shouting it from the rooftops). This means that funding the performance-related pay will have to come from elsewhere in the school budget. Schools are expected to do more with less, and the blame for any failure goes to those left to implement the policy (i.e. the school management) rather than those who set up an unworkable system in the first place.

Performance is assessed against the teachers’ standards framework and “if they meet all their objectives they might receive a pay rise” (my emphasis). So what happens if a majority of the teachers in a school meet (or exceed) their objectives? Do they all receive an increase, and if so, where does the money come from within a fixed budget? An analogy here is criterion and norm-referenced assessment. In criterion-referenced assessment theoretically the entire class could get the top grade provided their work met the standards that identified the top grade. In norm-referenced assessment only a certain percentage get the top grade, because what matters is not the work they produce, but how that work compares to their cohort. It’s the same for the teachers under these policies – there is no link between their performance and the reward they receive because there is no additional funding available. Even if financial regulations allow the headteacher some flexibility the largest budget item in educational institutions by a big margin is staff costs. In a previous institution I worked in staff costs accounted for around 70% of the total annual budget. A better approach would be to have had a chunk of money available to fund improved teacher performance in a similar way to the pupil premium, where schools are given additional funds to “support their disadvantaged pupils and close the attainment gap between them and their peers.” They could even call it the ‘teacher premium’.
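The criterion/norm contrast can be sketched in a few lines of toy code (the scores, the 70-mark threshold, and the top-25% quota are all invented for illustration): under criterion-referencing everyone who clears the bar is rewarded, while under norm-referencing a fixed share of the cohort is rewarded no matter how well the rest performed – which is effectively what a fixed budget imposes on teachers.

```python
# Toy contrast between criterion- and norm-referenced reward schemes.
# Names, scores, threshold and quota are all invented.
scores = {"Ana": 82, "Ben": 75, "Cal": 71, "Dee": 64}

def criterion_referenced(scores, threshold=70):
    """Everyone who meets the standard is rewarded."""
    return {name for name, s in scores.items() if s >= threshold}

def norm_referenced(scores, top_fraction=0.25):
    """Only a fixed share of the cohort is rewarded, however good the rest are."""
    n = max(1, round(len(scores) * top_fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n])

print(criterion_referenced(scores))  # three people clear the bar...
print(norm_referenced(scores))       # ...but only one can be rewarded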

Looking at the politics of this, and with an eye to the creeping agenda of privatisation within all sectors of education in the UK, I see this more as an attack on collective pay agreements and a tool for school management to reduce staff costs. Over time, the salary you would get as a teacher would become a lottery. How can you even call this a national pay framework if teachers doing the same job to the same standard with the same amount of experience could end up being paid different salaries within the same school? And what would that do to the collegiate, collaborative environment that enables educational institutions to increase their achievement through the synergy of their staff?

So, the government has introduced a ‘performance-related’ pay scheme that isn’t related to performance in any systematic way, is likely to reduce institutional effectiveness by setting up staff to compete against each other for limited resources, and actually contradicts the economic and psychological research that shows us that monetary reward as a motivator for creative and complex cognitive tasks doesn’t work.

What does work, as we saw earlier, is autonomy, mastery, and purpose. Lack of autonomy in teaching in the UK is a frequent complaint. Mastery (getting better over time) is possible, but as I’ve just explored, doesn’t necessarily result in any extrinsic reward. It seems the Department for Education are relying on a sense of purpose to abdicate their responsibility to reward and motivate teachers through effective and evidence-based policy. In effect, they’re using the old “it’s a vocation” excuse and hoping everything else will magically fall into place.

By coincidence and in contrast, I’ve recently started following a blog where an American teacher is blogging his experiences of teaching within the Finnish system. Finland is often held up as an example of excellence in teaching (including by the UK government), but the Finnish system is very different to the UK one. Pasi Sahlberg, the author of Finnish Lessons: What Can the World Learn from Educational Change in Finland?, put forward some interesting views when interviewed in The Atlantic. In the UK, ever more command-and-control management (and student testing) is put forward as the answer to teacher accountability. Sahlberg says “Accountability is something that is left when responsibility has been subtracted.” In other words, accountability becomes more necessary (and more complicated to administer and measure) once you start to remove autonomy. At a school reunion two years ago, one of my former teachers said that they were glad to have retired because the current system meant they “weren’t allowed to teach any more”.

Teachers and administrators in Finland are “given prestige, decent pay, and a lot of responsibility“. Teacher training institutions are highly selective, with a master’s degree the minimum qualification. There is also a designed lack of competition within the Finnish educational system, discussed in the same Atlantic article. Contrast this with the Education Secretary’s recent dismissal of those within education who disagreed with his curriculum reforms as ‘marxists’ and ‘the enemies of promise’.

Here’s an idea: if we really want to improve the quality of education by using performance-related pay, how about we do a teaching version of group assessment by tying the reward to the performance of some group on a criterion-referenced basis, i.e. if the group meets the criteria the group gets the reward? The group could be those that teach a particular year, a department, or even the entire school. This would reduce the negative effects of competition because the groups are no longer in conflict over a limited resource. It’s similar to profit-sharing schemes within business – an analogy that might even appeal to those sections of the political spectrum who see any system where individuals are not in direct cut-throat competition with each other as fundamentally wrong. Of course, it would require the Government to actually fund it rather than just trot out soundbites during a photo-opportunity at a school.

To come full circle back to my starting point, institutional inertia can be a significant block to educational innovation and improvement, but it’s even worse when the systems imposed on us seem designed to actively impede us. In politics, we might hear the phrase ‘evidence-based policy’. Unfortunately, this appears to be evidence-free policy.