AI outperforms human students in university exam study

June 27, 2024

A recent University of Reading study tested how AI-generated work performs in real academic assessments. Researchers created 33 fictitious students and used the AI tool ChatGPT to generate answers for module exams in an undergraduate psychology degree program.

The AI-generated responses consistently outperformed those of real students: on average, the AI students scored half a grade boundary higher than their human counterparts. In addition, 94% of the AI-generated essays raised no concerns with markers, so they went almost entirely undetected.

The study was led by Associate Professor Peter Scarfe and Professor Etienne Roesch, who emphasised that the findings should serve as a “wake-up call” for educators globally, given that AI submissions gained higher grades than those submitted by real students.

The research involved submitting fake exam answers and essays for first-, second-, and third-year modules without the markers' knowledge. The AI students outperformed the real undergraduates in the first and second years. Human students scored better in the third-year exams, however, which the researchers attributed to AI’s current limitations with more abstract reasoning.

These findings have raised serious concerns within educational institutions. Some are considering a return to in-person paper exams to limit the influence of AI. That raises a significant question: can universities afford to revert to traditional methods, or must they adapt and integrate AI responsibly into their assessment strategies?

The study suggests that assessment methods, and the role of AI in education, need to be re-evaluated. As AI continues to advance, educators must strike a balance between making use of the technology and ensuring the integrity and fairness of academic evaluations.