
🧬 Cloning in your classroom

4 May 2025

Digital twins could transform the way we teach English.

Dolly the sheep (left).

“But what’s the right answer, sir?”

I’d barely been teaching a week before being brought to a dead stop, mid-lesson. I’d done all the right things: pre-taught the vocabulary, broken down the context, painstakingly annotated the iambs and trochees. But that had all been vanity.

Rafael wanted an answer. And he wanted the right one.

No amount of “well, one could argue…” or “a possible interpretation might be…” or any critical expostulation would satisfy him. If Maths and Science and French and ICT could answer their questions with a tick or a cross, why couldn’t English?

Spinning the flywheel

Rafael was right, of course. This is the problem with English. Our answers are too fuzzy. There isn’t one, immutable, correct response to ‘How does Shakespeare portray the protagonist as a tragic hero?’ - and Harold Bloom will rise from his grave and throttle you if you even suggest it.

Harold Bloom in his study. Not angry, just disappointed.

But that is also the point: we do not teach knowledge. We teach the encoding and decoding of language, in all its fuzzy nuance.

This makes the essential iterative loop of learning - answer --> assess --> improve --> re-answer - a slow, manual process. Every answer from a student in English needs to be weighed on Thoth’s scales of criticality, and only the teacher in the room is fully qualified to make that judgment. We look in envy at our STEM colleagues with their binary marking and autonomous testing; for every sentence we evaluate for the nuance of language usage, they can mark a dozen sums right or wrong. The maths student receives an order of magnitude more indicators of success and correctness than the English student does, even before accounting for the ease of self-assessment.

The bottleneck to learning in Arts and Humanities is the teacher’s capacity to assess and give feedback. But that could be about to change.

Fuzzy bots, firm answers

Above everything, LLMs excel at fuzzily assessing language, which makes them the perfect candidate for ‘cloning’ an English teacher.

(Yes, I know they aren’t clones.)

They may hallucinate facts or forget data, but their ability to read and extrapolate from written text is dependably good. Ultimately, that’s what we need in English. We learn it the same way the LLM does: ingesting as much good quality writing as possible so that we can produce our own, mimicking styles and structures whilst changing the content.

The classroom of tomorrow can leverage this technology to finally widen that bottleneck: a digital twin of the teacher, serving as a low-stakes feedback machine.

They don’t need to be perfect, only ‘good enough’ to give an indication of whether the student is heading in the right direction: finally giving English some semblance of the STEM binary, though rather than correct/incorrect, we can ask for better/worse.
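For the curious, here is roughly what that better/worse nudge could look like in code. It’s only a sketch, assuming the official openai Python package and an API key in the environment; the model name, prompt wording and compare_drafts helper are my own placeholders, not a recommendation.

```python
# Sketch of a better/worse nudge: given a first draft and a redraft, ask the
# model which reads better against a single criterion. Assumes the `openai`
# package and an OPENAI_API_KEY in the environment; the model name is a placeholder.
from openai import OpenAI

client = OpenAI()

def compare_drafts(draft_a: str, draft_b: str,
                   criterion: str = "varied sentence structures") -> str:
    """Return 'better' if draft B improves on draft A for the criterion, else 'worse'."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": (
                f"You are an English teacher. Judge only on {criterion}. "
                "Reply with exactly one word: 'better' if DRAFT B improves on "
                "DRAFT A, or 'worse' if it does not."
            )},
            {"role": "user", "content": f"DRAFT A:\n{draft_a}\n\nDRAFT B:\n{draft_b}"},
        ],
    )
    answer = response.choices[0].message.content.strip().lower()
    return "better" if "better" in answer else "worse"
```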

Cautious optimisation

But tread softly, and never mistake eloquence for accuracy. The key distinction here is between evaluation and feedback. The bots are rubbish at the former: too easily swayed by context and weighting to reliably give accurate marks and levels for written work.

In my own rudimentary experiments, the bots showed the sort of capricious inconsistency that would get an examiner booted from a marking pool in minutes, with grading that varied wildly even from paragraph to paragraph. Running the same exam script through the AI three times would produce three distinctly different results. There have been some promising steps in recent months to improve on this, but at this stage the tech is simply too inconsistent to take on the quantitative aspect of marking. We may get there with a big dollop of fine-tuning and RAG-ing, but for now the auto-marking bot remains but a twinkle in every GCSE assessor’s eye.

A Vogon receiving criticism on his poetry. “The candidate has demonstrated an ambitious use of vocabulary, commensurate with AO3 Level 4.”

Feedback, on the other hand, is ideal. Load up the AI’s context window with a script of pre-written responses (‘Vary your sentence structures’, ‘Consider your paragraphing’, ‘Incorporate more ambitious vocabulary’, etc.) and they can usually be counted on to pick the right one for the right text. And for obvious reasons: it’s doing the job it was made for, to extrapolate text. Assessment frameworks may come and go, marking criteria may change with phases of the moon, but good, broad feedback is forever.
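If you’d rather see the shape of that outside the ChatGPT window, a minimal sketch might look like the following, again assuming the openai Python package; the comment bank, model name and pick_feedback helper are illustrative placeholders rather than anything battle-tested.

```python
# Sketch of a feedback clone constrained to a fixed bank of pre-written comments.
# Assumes the `openai` package and an OPENAI_API_KEY in the environment; the
# comment bank, model name and helper name are illustrative placeholders.
from openai import OpenAI

COMMENT_BANK = [
    "Vary your sentence structures.",
    "Consider your paragraphing.",
    "Incorporate more ambitious vocabulary.",
]

client = OpenAI()

def pick_feedback(student_text: str) -> str:
    """Ask the model to choose the single most relevant comment from the bank."""
    numbered = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(COMMENT_BANK))
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": (
                "You are an English teacher's feedback assistant. Reply with ONLY "
                "the number of the single most relevant comment from this list:\n"
                + numbered
            )},
            {"role": "user", "content": student_text},
        ],
    )
    choice = response.choices[0].message.content.strip()
    # If the model strays from the numbers-only format, fall back to the first comment.
    index = int(choice) - 1 if choice.isdigit() else 0
    return COMMENT_BANK[index] if 0 <= index < len(COMMENT_BANK) else COMMENT_BANK[0]
```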

For low-stakes assessment, where the practice rather than the outcome is what matters, this has the potential to work brilliantly. A quick nudge in the right direction can keep momentum up and spin the learning flywheel; nudges that you don’t always have the time or space to deliver.

In truth, the only way to know is to do. Run some student work through ChatGPT, with a strict prompt limiting it to your pre-written comments. See what it comes up with. Feed in some more tailored comment templates. Tweak, repeat, iterate.
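In code, that tweak-and-repeat loop could be as modest as the snippet below, which carries on from the hypothetical pick_feedback sketch above; the sample paragraph and the extra comments are invented purely for illustration.

```python
# Continues from the pick_feedback() sketch above; the sample text and the
# additional comments are invented for illustration only.
sample = (
    "Macbeth is brave at the start but his ambition makes him do bad things "
    "and he ends up dead, so Shakespeare shows him as a tragic hero."
)

print(pick_feedback(sample))  # See which comment the model reaches for.

# Feed in some more tailored comment templates, then run it again.
COMMENT_BANK.append("Use subject terminology such as 'hamartia' and 'hubris'.")
COMMENT_BANK.append("Develop your point with a second piece of evidence.")
print(pick_feedback(sample))
```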

It may not be classroom-ready today, but you may have a feedback clone to help you tomorrow.