Striving for a personal best – who judges?

In this post I’m going to look at some of the issues surrounding a form of assessment you’re probably familiar with, but not with the terminology that goes with it: ipsative assessment. So what is ipsative assessment? It’s a means of assessment that measures the student’s progress, not by comparing to some external (and often arbitrary) standard, but by comparing the student’s achievement now to their achievement in the past. It asks the questions: Did I improve? By how much? It avoids the questions: What mark did I get? Did Fred do better than me?

This shift from external validation to internal comparison is an important one, and has a number of implications. Firstly, when done well it moves the focus towards personal growth and development. This, to me, is what education should be about. Arguably it’s the only measure that really matters long term, although the secondary effects of gaining qualifications and meaningful employment are also important. Education in this sense vastly exceeds the education-as-job-training model, developing citizenship as well as knowledge. My model of education always reminds me of a line from the film Dances with Wolves, when Kicking Bear (Graham Greene) is talking to John Dunbar (Kevin Costner): “I was just thinking that of all the trails in this life there is one that matters most. It is the trail of a true human being. I think you are on this trail and it is good to see.” This idea is not new and goes back to Ancient Greece, where the responsibilities of citizenship were deeply connected to everyday life.

Coaching word cloud

Ipsative assessment is widespread in sports, and is often practised in conjunction with a coach – it’s the process behind the concept of a ‘personal best’, where achieving a personal best is a cause for celebration regardless of the result compared to others. I have my own boat (a GP14) and sail at a local club. It’s not a competitive boat. That’s not an excuse – the boat was built in 1973 and still has the same sails (which were old then) as when I bought the boat around 2000, so I’m not going to be winning any club races, even if I were a better sailor 🙂 . But it doesn’t matter: what matters is did I have a good race? How well did I spot the wind shifts? How well did I round the marks? In short, how am I sailing in this race compared to my own skills? Which, of course, is the only thing I can actually control. In this way, ipsative assessment is ‘performance in practice’, and as such is a prime example of authentic assessment.

The school system here in the UK uses a measure called ‘value-added’, which looks at the change in scores between various stages of schooling, with examination or test scores being the primary accountability measure. The downside is that if this measure is used to judge schools rather than record their performance then there will be pressure to game the system, which means that value-added isn’t measuring what it’s supposed to. In addition, I recall reading a blog recently where a teacher was criticising value-added because it assumed that the class that were tested in year 6 contained the same children that were tested in year 1. Their particular school had a high turnover of children because of local circumstances so the assumption didn’t hold. How on earth can you measure progress without tying the data to an individual? Surely without that link value-added has no value at all?
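To make that matching point concrete, here’s a toy sketch (entirely invented pupils and scores, not real data) of why value-added only means anything when it’s computed over pupils who were actually present at both test points:

```python
# Toy illustration (invented data): value-added computed only over pupils
# who appear in both the year 1 and year 6 test results.
year1 = {"anna": 52, "ben": 47, "cara": 60}   # pupil -> year 1 score
year6 = {"anna": 68, "ben": 71, "dev": 55}    # 'cara' left, 'dev' arrived

matched = set(year1) & set(year6)             # pupils tested at both points
gains = {p: year6[p] - year1[p] for p in matched}
value_added = sum(gains.values()) / len(gains)

print(sorted(matched), value_added)           # only anna and ben count
```

With high pupil turnover the `matched` set shrinks, and a school-level figure computed over everyone tested (rather than the matched set) stops measuring progress at all.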

What I like about ipsative assessment is that the attention is focused on what has been achieved rather than what has not. It also gives us an additional chance to engage learners in taking responsibility for their own learning, and that’s crucial for ipsative assessment to be effective, although achieving it can be problematic. When my daughter was taking her degree each assignment had a self-assessment sheet that was a copy of the one used by the tutors. The students were supposed to judge their own performance and then compare it to the tutor’s sheet when the work was returned. My daughter, despite many conversations about why self-evaluation was useful and what the educational purpose was, would simply tick the middle box all the way down. In effect, she deferred to an external authority to judge her work.

Conceptually, there is also a link to assessment for learning (A4L). While A4L allows the teacher to gauge student knowledge it can also be seen as ipsative assessment for the teacher that then feeds into their reflective practice.

A key question is how we can formalise ipsative assessment without losing the ethos behind it. We need structures and procedures to mediate and support the process, but the last thing education needs (especially schools-based education in the UK) is another accountability stick with which to beat the teachers. Firstly, if the process is simply a tick-box exercise then it’s not being allocated the importance it needs, and neither student nor teacher will take it seriously. Secondly, it’s vital that the process is student-owned. The student must take an active part in evaluating and processing their ipsative feedback if they are to realise the gains it offers. As Hughes has pointed out, the student needs to be the main participant, not the teacher.

In a previous post I described Jeremy Levesley’s approach to assessment, and this could fit quite nicely with ipsative assessment. Suppose we use Prof. Levesley’s approach so that we’re getting to a mark or grade quickly, and then use the freed time to put more effort into the ipsative process? We get a mark to meet our accreditation needs (and those students who will still fixate on the mark above all else), and we get to develop learner independence and the self-assessment capabilities of our students. It seems like a win-win, but would a hybrid approach work, or are we just contaminating the ipsative process? I believe it could be done if the academic systems we choose to adopt within our course reinforce the practices we wish to see in our students.

The reason I think ipsative assessment isn’t more prominent in education at the moment is the relentless focus on education (and students) as a product, rather than education as a process, as Cory Doctorow recently observed in his wide-ranging video on openness, privacy and trust in education. And that’s the wrong focus. Why train students to perform in systems that are so unrepresentative of the world beyond the campus when we could teach them to judge themselves and extend beyond their personal best?

Assessment: it’s a variable not a constant

The second post in this short series is going to be about assessment in general. How reliable is it? What are we actually trying to measure? Are we more concerned with how we measure rather than what we measure?

“Assessment”. You keep using that word. I do not think it means what you think it means.

Reliability (assessing consistently) and validity (correctly assessing the thing we want to assess) are key concepts in assessment, but how well are we achieving these aims in practice? We pretend that assessment and marking are objective, repeatable processes, but like much of the rest of education, they’re messier and more untidy than that.

Tom Sherrington points out that marks aren’t uniform in size. Some marks are easier and some harder to achieve, so that everyone has a chance to demonstrate their learning. We can also use ‘harder’ marks to distinguish between grades, as was the case with Hannah’s sweets in a previous post. The aggregate scores are also not directly comparable. If two students have a score of 36 out of 50 then this implies equal achievement and level of knowledge, but this may not be the case when we take into account where within the assessment those marks came from. If that’s the case when we’re talking about a science exam, then when it comes to more essay-based disciplines is it any surprise that “there is no correct score for an essay”? And if you think marking is variable at school, then marking in higher education may come as a shock.

There is a big difference between assessment of learning (summative) and assessment for learning (formative), and as I mentioned in the first post in this series we tend to ignore the formative in favour of the summative. Interestingly, assessments often try to combine the two, which raises the question: are we doing either well? Assessment is typically summative, with some feedback included in the hope that it ‘feeds forward’. The downside is that from the perspective of the student the point at which that feedback would be useful has already passed because they see this assessment as having ended – they focus on the mark since that tells them ‘how they did’. The feedback isn’t really assessment for learning, but rather an explanation of the assessment of learning, with a few added implications for future practice.

Jeremy Levesley recently gave a presentation describing his approach to assessment in a third year mathematics module. His aim is to spend as much time as possible in communication initiated by students, and to engineer situations in which that is promoted. Different assessment activities are targeted to the grade boundaries so that, for example, the question ‘is this first-class work?’ can be answered yes/no/maybe. Basic skills tests decide pass or fail, unseen exam questions decide 2i or 2ii, and student topic work decides whether the work is first-class or not. I really like this approach because it’s an example of how assessment can straddle both the formative and summative assessment camps. Because the conversations are student-initiated they can serve as assessment for learning, involving students in taking responsibility for managing their own learning. The summative nature of the assessment explicitly recognises the inherent variability in marking, and takes advantage of it by targeting it to a single answer (yes/no/maybe). This gives a corresponding reduction in workload for the tutor, freeing up time to be used more productively elsewhere.

Assessment can be misused. There can be a change of emphasis towards what can be measured relatively easily rather than what is valuable. For example, character education is an emerging area in UK school education and I can see its value (provided that by character we don’t mean conformity), but should we be ‘measuring’ it? This emphasis on measurement has two impacts. Firstly, the valuable (but difficult to measure) gets replaced with some easier-to-measure proxy, so character might be measured by how many days students are involved in community volunteering. Secondly, the proxy becomes the measurement. Originally, SATs were supposed to sample student knowledge and skills at a particular point in time. What has happened is that SATs have come to dominate teaching activities as the scores have become the means by which schools are deemed to be failing or succeeding. What started as a useful benchmark has become an accountability stick with which to beat schools and teachers. Also, the pseudo-statistical judgements made about school performance are highly dubious if we’re looking for robust and valid measurements, as the Icing on the Cake blog frequently documents.

I love Carol Dweck’s work on mindset. Judgements of a student (whether from inside a school or outside) have a significant effect on student achievement because students can see themselves as failing in comparison to others, or failing because the school is failing, but this ignores the internal achievement that may be occurring. A runner may come, say, 439th in a marathon, but might have knocked ten minutes off their previous best time. They didn’t win the race, but their performance was the best of their life, so how can that be judged as a ‘failure’?
A policy being floated at the moment here in the UK is that of ‘secondary readiness’. Students take SATs at the end of key stage two (age 11). Previously, their scores would have been recorded and passed to their secondary (high) school, but the proposal now is to introduce the idea of ‘failing’ these tests. If the scores aren’t high enough then they are to retake them. This has the potential for children to be labelled (by themselves or others) as failures at a time of major social upheaval (the move to secondary school) and just before they hit puberty. Now, what could possibly go wrong with that? 🙂

I understand the need for accountability and having commonly understood standards of achievement, but the greatest educational impacts are not those projected outwards, but those reflected inwards. We like to know how we’re doing in relation to others, but more importantly how we’re doing in relation to our own ‘personal best’. That’s the subject for the next post.

Hannah got the sweets, who got indigestion?

Last Thursday in the UK around half a million 15 and 16-year olds took a GCSE maths exam, specifically the second paper in the non-calculator exam. By Friday the exam was trending on twitter (#EdexcelMaths), with one particular question attracting attention:

There are n sweets in a bag.
6 of the sweets are orange.
The rest of the sweets are yellow.

Hannah takes at random a sweet from the bag.
She eats the sweet.

Hannah then takes at random another sweet from the bag.
She eats the sweet.

The probability that Hannah eats two orange sweets is 1/3.

(a) Show that n² – n – 90 = 0

I had a quick attempt and after one unproductive sidetrack I got the answer. So why am I writing about this? Because it fits in with the other posts on assessment I’m doing, and to explore some of the issues around it.

First, the actual mathematical content is pretty straightforward – you only need to know how to do three things: calculate a probability without replacement, multiply fractions and rearrange an equation. This is hardly Sheldon Cooper territory.

The exam board has two tiers for the qualification (foundation and higher) and probability without replacement is only explicitly mentioned for the higher tier. The exam board has been quoted as saying the question was aimed at those students who would achieve the highest grades (A and A*), and I think grade discrimination is a fair approach. I did ask my daughter (who’s currently revising and taking A-level maths) and she said she wasn’t sure she would have been able to answer it at 16. For non-UK readers, GCSE exams are taken at the end of compulsory schooling and A-levels are taken at 18, typically as a route to studying at university.

So why my unease with students finding it difficult? There’s always the charge of dumbing down levelled at exams but I don’t think that’s it. True, when I did my maths exam at that age the syllabus included calculus of polynomials and their applications, which now is only introduced at A-level, but they were different qualifications – GCSEs were only introduced after my school career had ended. I think my unease comes from the fact that I think this shouldn’t have been seen as a difficult question. Donald Clark has blogged seven reasons why he agrees with the children and thinks it wasn’t a fair question, some of which I agree with and most of which I don’t.

There are a couple of factors involved here. Firstly, I recall reading a study that looked at who could answer questions with the same mathematical content but written in different ways. That study found that questions written as word problems rather than equations were consistently harder to answer, even though there was no difference in the actual mathematical content. Secondly, I think it’s the way that maths is taught as rules and recipes to follow rather than a creative problem-solving activity. This is not a criticism of the teachers because I think that it’s taught that way precisely because of the pressures that have (politically) been placed on education. As I’ve mentioned before I’m a big fan of Jo Boaler’s approach and its emphasis on flexibility and application of technique rather than stamp-collecting formulae. Donald Clark makes the distinction between functional maths (maths for a practical purpose such as employment) and the type of maths typically found in exams, but I think that’s a false dichotomy in this case. As Stephen Downes said, “… what this question tells me is the difference between learning some mathematics and thinking mathematically.” The difference between functional and theoretical maths (at this level) starts to disappear when we think mathematically – maths becomes a toolbox of skills to be applied to the problem at hand, rather than a particular formula in a particular topic to be remembered.

And if you’re wondering what the answer was:

The solution to Hannah's sweets

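For anyone who can’t see the image, the algebra is short, and uses only the three skills mentioned above (probability without replacement, multiplying fractions, rearranging an equation):

```latex
\begin{align*}
P(\text{two orange}) &= \frac{6}{n}\times\frac{5}{n-1} = \frac{1}{3}\\
\frac{30}{n(n-1)} &= \frac{1}{3}\\
n(n-1) &= 90\\
n^2 - n - 90 &= 0\\
(n-10)(n+9) &= 0 \quad\Rightarrow\quad n = 10 \text{ (rejecting } n=-9\text{)}
\end{align*}
```

So there were 10 sweets in the bag, of which 4 were yellow.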

Assessment with portfolios – a neglected opportunity?

I finally got around to reading Tony Bates’ blog post on the outlook for online learning in the short and medium term. It’s an in-depth post, but I want to concentrate on one particular aspect. Tony puts forward the idea that the traditional lecture-based course will gradually disappear as learning shifts from the transmission of information to knowledge management. Later, he talks about an increase in the use of portfolios for assessment, and to my mind these two are a natural fit. This is because as teaching shifts from the presentation of a package of content to students meeting a common set of criteria traditional written assessment becomes increasingly less fit for purpose.

Clipboard and pen

Photo credit: Dave Crosby (CC BY-SA)

The idea of assessment by portfolio in higher education isn’t new. A few years ago, I saw Lewis Elton give the plenary at the teaching and learning conference at the University of Birmingham, where he suggested using portfolios as a replacement for the UK system of degree classifications. The plenary covered much of the ground in his 2004 paper ‘Should classification of the UK honours degree have a future?’ and provoked much discussion. I believe there is a case to be made for portfolio assessment to have a greater role in higher education. As Elton says, the “measurement of achievement has become more important than achievement itself”. In a recent article discussing competition in education, Natalie Bennett (the leader of the Green Party here in the UK) quoted the Education Reform Act 1988, which said that education should: “promote the spiritual, moral, cultural, mental and physical development of pupils … and prepare pupils … for the opportunities, responsibilities and experiences of adult life.” The quote relates specifically to the context of secondary education (11-18 years old), but it’s still a pretty powerful quote. And if that’s the case for school children then surely it should hold for university and higher-level study, where students’ capacity to engage in activities that promote those values is arguably higher?

We may want assessment that is both valid (measures what it is designed to measure) and reliable (gives consistent results) for very good reasons (such as quality control and comparisons across institutions), but that restricts the types of assessment we can do. What we get (across the education system as a whole) is assessment that comes down to a number or grade, and that steers the testing towards that which is easily measured, for example, a 1,500 word essay or a three hour exam.

So what does assessment by portfolio give us that conventional assessment doesn’t? First, it’s an active process and in a well designed assessment the learner has to engage over a period of time, evaluating and reflecting on their learning. Students can revisit their work and re-purpose it, for example, for use within a presentation to a potential employer. This does bring in issues of ownership and access after the end of the course, but I’ll leave those aside for now, and in any case those criticisms aren’t unique to portfolios.

Secondly, higher order and transferable skills can be assessed more effectively than they could through a conventional written assessment. For example, selecting items for inclusion will involve evaluating individual items, reflecting on their purpose and value, and synthesising the collection into a coherent whole. This does mean that the learners need to have a high level of independent learning skills, which may not be the case. A supportive pedagogical design with clear scaffolding and direction can help develop these skills. Another point is that this form of assessment is authentic – it assesses in a direct analogue of how these higher order skills are used within a workplace. Support and direction need to be explicit, not only on the process of the assessment, but also on its purpose. My daughter used an eportfolio as a record of achievement during her degree, but didn’t see the point and questioned why they couldn’t simply submit their assignments and leave it at that. Another acquaintance is studying for a primary PGCE on a course that uses the Mahara eportfolio and said that it’s almost universally hated, mainly it seems because they find the user interface unintuitive.

The portfolio can be an integrating influence that draws the rest of the course together for students. Portfolios have been used very successfully on a Master’s-level distance learning course by the Open University (Mason et al., 2004). The course consisted of four modules, with two items being selected from each module for inclusion. Two thirds of students were positive about the role of the portfolio as an integrating element in the course.

So what are the downsides? Well, most criticisms centre on the issues of reliability and validity, but that brings us back to Elton’s statement that achievement takes second place to the measurement of achievement. Elton also said that the “prime purpose of assessment should be to encourage good learning” (Elton, 2004), and that there should be a bias towards success and not failure. He’s not referring to grade inflation, but that we should move away from the deficiency model of traditional assessment. This brings to mind the perennial debate around standards, ‘dumbing down’, and whether assessments should act as a gate-keeper (norm-referenced) or as a marker of achievement (criteria-referenced). Should everyone on a course be able to get a first class degree? Absolutely, if they’ve met the standards for a first-class degree, but I can imagine the outcry if a department were to award first class honours to an entire cohort, supposing they were lucky enough to get such a cohort in the first place.

Elton recommends that if something can be assessed reliably and validly then grade it conventionally. If something can be assessed validly but not reliably then it should be graded pass/fail using experienced examiners. If it can’t be assessed either reliably or validly then it should be reported on (again by experienced examiners) rather than graded. Knight (2002) had a similar argument, stating that summative assessment should be restricted to “that which can be reliably, affordably and fairly assessed”. Skills development and other aspects of the curriculum should be formatively assessed. Portfolios, then, blur the lines between formative and summative assessment.
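Elton’s recommendation reads almost like a decision procedure, so it can be sketched as one (the function name and the wording of the returned strings are mine, not Elton’s):

```python
def assessment_mode(valid: bool, reliable: bool) -> str:
    """Sketch of Elton's (2004) recommendation for matching the form of
    assessment to what the measurement can actually support.
    Names and phrasing are illustrative, not taken from the paper."""
    if valid and reliable:
        return "grade conventionally"
    elif valid:
        # Valid but not reliable: a binary judgement by experienced examiners.
        return "pass/fail by experienced examiners"
    else:
        # Neither reliable nor valid: report on it, don't grade it.
        return "reported on by experienced examiners"

print(assessment_mode(valid=True, reliable=False))
```

On this reading, portfolio assessment mostly lives in the second and third branches, which is exactly why it blurs the formative/summative line.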

Portfolios have great potential for assessment, provided that they are used wisely and that significant effort is made to change the students’ focus away from the product. It’s like getting an essay for homework at school – the essay isn’t the homework, the homework is the process of research, drafting, revision and synthesis, and the printed essay is just the evidence you did it. The sticking point is that portfolios exist within an educational ecosystem that functions to support and validate conventional assessment, and that is likely to change only slowly.

References

Bates, T. (2014). 2020 Vision: Outlook for online learning in 2014 and way beyond.

Bennett, N. (2014) Let’s Get Heretical on Education: Competition Has Failed.

Elton, L. (2004). Should classification of the UK honours degree have a future? Assessment and Evaluation in Higher Education, 29(4), 415–422.

Knight, P. (2002). Summative assessment in higher education: practices in disarray. Studies in Higher Education, 27, 275–286.

Mason, R., Pegler, C. and Weller, M. (2004). E-portfolios: an assessment tool for online courses. British Journal of Educational Technology, 35(6), 717–727.

Two fishy MOOCs

A few weeks ago, I completed two MOOCs that ran at the same time and covered similar subject areas (at least at first glance), so I thought I’d ‘compare and contrast’ the two. One was the University of Southampton’s Exploring Our Oceans course on Futurelearn; the other was Duke University’s Marine Megafauna course, which ran on Coursera. I do have a background in the subject – I did a degree in Marine Biology and Zoology at Bangor University – so my aim was to look at the courses from a professional (educational technology) viewpoint while refreshing my knowledge of a subject I love.

Photo credit: Strobilomyces

Although both courses involved the oceans, they focused on different disciplines. Southampton’s course was more of an oceanography course, while the marine megafauna course, as the name suggests, used the enigmatic big beasties to draw in and hold the students’ attention. Both courses could be described as xMOOCs although, as Grainne Conole has pointed out recently, there are much more nuanced ways of describing and classifying MOOCs. Any comparisons have to take the platform into account because it isn’t a neutral actor, as we can see in the way video is used on Coursera and assessment is done on Futurelearn.

Who are the students?

The marine megafauna course largely replicates a standard model of undergraduate education placed online, and doesn’t seem to assume any existing knowledge, although with a background in the subject I might be missing something. The Southampton course also doesn’t assume existing knowledge, but here the approach is different, with the target demographic being what I’ll call the ‘curious amateur’. In other words, someone who comes to the subject with curiosity and passion, but who may have little recent experience of the subject, or of studying. As well as not assuming existing knowledge, Exploring Our Oceans also had material explicitly marked as advanced and optional so that participants could explore a particular area in more depth.

Video. And more video.

Both courses make frequent use of video. Marine Megafauna, like many of the courses on Coursera, uses video as its primary way of delivering content. There were five to eight videos per week, mostly as video lectures with other video clips, simulations, and audio embedded within them. Futurelearn delivers learning materials in a very linear manner so, for example, in week three there will be items 3.1, 3.2, etc. Some of these were videos (complete with pdf transcript), but some were text-based where that was more appropriate. And that’s as it should be – video, useful as it is, is not the one medium to ‘rule them all’. In fact, one way that I’ll catch up on a MOOC is to read the video transcript and skip to particular points if I need any graphics to help with my understanding. Video needs to be appropriate and offer something that the participant can’t get more easily or faster through different media, and for the majority of the time Exploring Our Oceans did that. Production values were high. We saw staff filmed on the quayside, on ships and in labs explaining the issues and the science from authentic environments. Related to this, here’s an example of poor practice with video. I’m enrolled on another Futurelearn MOOC with a single academic as the lead educator. At the start of every video the academic introduces themselves and their academic affiliation as though we’ve never met them before. It’s week five. There are multiple videos each week – it’s not like we’re going to forget who they are between step 5.2 and step 5.5.

What didn’t I like?

I felt Marine Megafauna was a little heavy on taxonomy initially as we had introductions to each group of animals. Taxonomy is important. For example, the worms that live around hydrothermal vents (which made appearances on both courses) have moved phylum since I did my degree, and major groupings within the gastropods have also been revised in 2005 and since. I would have preferred an introduction to group X (including taxonomy) followed by exploring that group’s ecology, conservation issues and adaptations to life in the ocean in more detail. You could compare to other groups at that point or have a summary/compare and contrast section later in the course, which would serve as a good synthesis of the course so far. As it was, it felt like we were marking time until we got to the interesting parts, and course retention might have suffered at that point. For the Southampton course, the parts I disliked were outside the control of the staff. Futurelearn uses a commenting system at the bottom of the page, similar to that of blogs, rather than the forums found on other platforms. In one way, that’s good in that it keeps the comments within context, but bad in that it prevents participants from starting their own discussions and searching comments is a non-starter. The other thing I didn’t like about the Southampton course was the assessment, which I’ll come back to later.

What did I like?

In Exploring Our Oceans I liked the range of other activities that we were asked to do. We shared images, planned an expedition, and did a practical. Yes, a real-life, ‘who made that mess in the kitchen?’ practical on water masses and stratification using salt and food dye. In Marine Megafauna, I enjoyed the three peer assessments and the fact that scientific papers were an explicit part of each week’s activities. We would have between one and three PLoS ONE papers each week, and the material within them was assessed through the weekly quizzes. There were supporting materials for those unused to making sense of journal articles. Exploring Our Oceans did use some journal articles when discussing how new species were described and named, but not as an integral part of the course.

Assessment

This was the area in which I found the biggest difference between the two courses, partly I think due to the different target participants (‘undergraduate-ish’ versus ‘curious amateur’), but largely due to the restrictions of the platform. Marine Megafauna had weekly quizzes with between 20 and 25 multiple choice questions, including questions that (unusually for MOOCs) went beyond factual recall. There were three attempts allowed per quiz with the best result counting. Each quiz contributed 10% to the final course mark. There were also three peer assessments – a Google Earth assignment, a species profile, and a report on a conservation issue for a particular species. The Google Earth assignment was largely quantitative and functioned as the peer marker training for the following two.

Exploring our Oceans had quizzes of five to six multiple choice questions, with three attempts per question and a sliding scale of marks (three marks for a correct answer on the first attempt down to one mark for a correct answer on the last attempt). But this is a platform issue. At a recent conference, someone who had authored Futurelearn quizzes gave their opinion on the process, the polite version of which was “nightmare”. I have seen peer assessment used successfully on other Futurelearn courses so it is possible, but it wasn’t used within this course.

Personally, I preferred the longer assessment for a number of reasons. First, it tests me and gives me a realistic idea of how I’m doing, rather than getting a good mark for remembering something from lecture one and guessing the other four questions. Secondly, more questions means fewer marks per question, so one area of difficulty or confusion doesn’t drag my score down. Thirdly, and regardless of how it contributes to the final course mark, I see it as formative, something to help me. I want to be tested. I want to know that I ‘got it’; I also want to know that my result (formative or not) actually means something and that means rigorous assessments. This may not be the same for everyone and a more rigorous assessment may discourage those participants who only see assessment as summative and lead them to believe that they are ‘failing’ rather than being shown what they need to work on.

Some final thoughts

If I didn’t already know the subject, what would I prefer? I think I’d prefer the approach of Exploring our Oceans but with the assessment of Marine Megafauna, with a clear explanation of why that form of assessment is being used. I really enjoyed both courses so if you’re interested in marine science, then I’d say keep an eye out for their next run.

P.S. Santa? Put one of these on my Christmas list please. Ta.