
A compass for new territory: Generative AI and school assessment

The Australian Tertiary Education Quality and Standards Agency (TEQSA) recently released a thought-leadership piece on generative AI and assessments. The roles of assessment in the secondary context differ somewhat from tertiary (secondary schools are not credentialing the engineers designing the next tunnel you are going to trust your life to!). However, there are expert insights worth exploring from a secondary school perspective. The authors describe their contribution as more of a compass than a map, so let's take a short journey into the emerging territory.

Key thinking from TEQSA - from a secondary school perspective

  1. The use of AI will become commonplace in schools and workplaces, and we must prepare our students within, and for, this context.

  2. Attempted detection of AI is a fool's errand, a simplistic solution to a complex situation. Personally, I believe that AI detectors' potential for harm is significant and their ability to achieve their stated function is insignificant.

  3. AI represents an urgent catalyst for rediscovering assessment, not just because of the acute issue it presents for a "business as usual" approach to assessment tasks, but because AI will impact what is worth assessing in the first place. In a world where AI is everywhere, what are the human-bits we need to equip our students for?

  4. In order for teachers to form trustworthy judgements, they need to triangulate evidence from multiple sources which assess what we value with integrity. At strategic points, educators need to be able to form assessments of what students can do on their own, and these kinds of assessment require resource-heavy invigilation. I especially appreciate that every example in the report includes a trade-off to re-deploy existing resources, rather than demanding yet more of our teachers.

  5. Assessment should be part of the learning process, not an add-on, and the best assessment regimen includes the triangulation of many types and contexts of assessment. Good assessment should generate "rich portrayals of student learning" and provide meaningful contexts for students to deepen and refine their understanding.

  6. Some assessments should focus on the process of learning. This process needs to include collaboration with others as well as using modern tools ethically and appropriately (including generative AI).

  7. Assessment is a partnership in which student participation is promoted. Assessment should be a process, not an endpoint. It should be a dialogue between teacher and students which develops the picture of student learning, not a series of isolated, marked-and-done proclamations.

Thinking stimulated by this report

The stated aim of the authors is to stimulate thinking around assessment in learning (specifically in their sector) - and it has certainly got me thinking.

What is worth assessing (and therefore teaching)?

I have previously written about the paradox where AI is forcing us towards assessment types which are poor at teaching and valuing the skills of creativity, collaboration, communication, problem solving, and real-world application. Authentic contexts for students to demonstrate the richness of their skills and understanding are vital and must "count" towards whatever measure we value.

Chunking and the number of NESA-allowed assessment tasks

Currently NESA limits the number of assessment tasks (that count) to between 3 and 5 per year per subject. The well-meaning rationale underpinning this has been to manage student stress. However, the relationship between the number of tasks and their individual weight (and hence student anxiety) is inverse and inescapable. Fewer tasks have, in many contexts, led to students lurching from one assessment task to another. Secondary teachers know the experience of students unable to think and work in their subject this morning because of an upcoming (usually English - sorry, it's true) assessment task. It can cause an unhelpful form of "punctuated equilibrium" in learning.

Generative AI provides a compelling reason to revisit this task restriction. Chunking a larger task into smaller, varied, longitudinal submissions can help form a more trustworthy judgement about student learning. It is notable that some of the examples in the TEQSA report showcase this approach - which is currently "against the rules" in secondary schools. What if some of the chunks of the one overall task focused on the process of learning? Perhaps one is an opportunity for students to engage in the (new) skill of creating-with-AI, including reflecting on this process and the accompanying ethics, dangers and benefits? Another chunk could be an invigilated component demonstrating what they can do on their own...

Multiple, lighter-weighted assessments reduce the stakes and provide a diversity of measurements. Chunking is a process familiar to secondary teachers, so allowing the parts to be individually assessed is not a huge leap. Having multiple parts of a larger "whole task" also reduces cognitive load: the context is constant, so the automatisation of skill components can develop gradually.

Invigilated Assessments

I recall when the "exam supervision roster" became the "exam invigilation roster" - anyone else? However, the careful use of invigilation in this context is worth a second thought.

  • I have previously written about my experiences of using dialogue as a formal assessment tool. This is a form of invigilated assessment.

  • I have used a form of the Geoffrey Robertson hypothetical as a formative assessment tool (but it could be summative). Students come to a forum having researched their role and the contribution to domain understanding I have allocated them. What then transpires is an unfolding hypothetical scenario - partly scripted, largely "live" - where students transform their learning into a dynamic exchange. The teacher holds a few game-changer moments to throw in to keep the exchange (and student thinking) dynamic and evolving (it's really worth watching at least one of Geoffrey's episodes to get the idea). An AI-powered tool could capture the transcript to assist with forming a trustworthy judgement about student learning. This could be just one part of a larger assessment project. This is a form of invigilated assessment.

  • How about using a readily available tool like Zoom to record and transcribe a meeting of students working as a group on a larger task to evaluate their collaboration skills as they problem-solve some aspect of that larger task? This is a form of invigilated assessment.

Sure, a pencil and paper test is a useful arrow in the quiver of assessment tools we have. That, too, is a form of invigilated assessment - just not the only form. If you are a teacher reading this, I am sure your imagination has already run past mine and you are dreaming up all sorts of invigilated assessment ideas - please share these with me, I'd love to hear them!

Assessment as learning: conversation not canon

Ultimately, we want our students to transition to self and peer assessment. That's the kind of assessment a musician uses, or a chef peeking her head out of the kitchen door to see how the patrons are responding to her new dishes. For this, and many other good reasons, secondary school assessment should be more conversation than canon. Students' progress from novice to expert in your domain should be explicit across the points of assessment (as far as possible). To achieve that clarity requires some deep thinking about the map of assessment across a larger scale. How, in this task, can they show the progress of their understanding and skill from the earlier tasks? This is easier in some domains than others, but it is possible in all. The key is mapping assessment before designing the course, and I am going to again shout out to "understanding by design" (the "backward design" model) as a great framework for this approach.

It is not so much that every assessment has to involve AI, but that generative AI has delivered a catalytic moment for us to recapture the essence of good assessment. It might not be the most convenient time, but teachers know we must "seize the day"!

Perhaps it is time to allow schools more flexibility to wrangle assessment back to its core functions: informing subsequent teaching and learning, and recognising individual learning attainment. The Masters report into NSW education provides some system-wide thinking which the generative AI storm has made more, not less, relevant.

Resource reallocation

We need to make some decisions as we adapt to assessment in the AI world. That means letting go of some things so we can allocate resources to the new styles of assessment. Recognising that all assessment instruments are imprecise and contain biases which favour some students over others is a good first step to letting go.

We have followed a helpful compass to explore the new territory. Like a rock climber, we need to let go of a finger-hold, however familiar and secure, if we are to edge our way up to the next one. Teachers are under excruciating pressure, and retaining good teachers is increasingly difficult. This structural revolution needs to be negotiated in a way that gives teachers back more time for teaching our amazing students, not less.


1. I also referred to some ideas from an earlier document by the Australian Learning and Teaching Council.

2. The graphic was co-created with AI... - ok, I played with some prompting before it produced an image that I do not have the skill to create alone :)
