With many universities now integrating large language models like GPT-4o into academic life, the conversation has quickly moved beyond student use. Educators and institutions are exploring the use of generative AI for instructional purposes, including automating grading, enhancing curriculum design, and providing personalised learning experiences. Institutions are increasingly asking:
“Can AI enhance teaching and assessment too? Can it help address long-standing issues in grading, such as inconsistency, fairness, and feedback quality?”
In early 2025, we conducted a small pilot project to explore whether generative AI tools can mark student assignments using detailed rubrics with the same accuracy and fairness as human assessors. The goal was not to replace educators but to evaluate whether AI could act as a useful marking assistant or quality check, especially in larger cohorts where consistency becomes harder to maintain.
The Pilot: Can AI Apply a Rubric for Higher Education Marking?
In this pilot, we tested a real postgraduate assignment that had already been assessed by two academics using a shared rubric. The task required students to write a brief project proposal including a background, aim and objectives, methodology, and references.
We then submitted the same assignment instructions and rubric to GPT-4 via a structured prompt:
“Mark the following student assignment according to the rubrics. Focus on helping the student understand what they did well, what could be improved, and how to take the next step.
The assignment task and instructions are as below here:
(Paste assignment instructions)
The assessment rubrics are as below:
(Paste rubric here)
Please provide a mark for each of these categories. Thank you!”
The AI was asked to assign marks for each criterion and provide overall feedback. We then compared the AI-generated scores to those from human markers.
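For readers curious about how this could be run at scale, the sketch below shows one way a prompt of this shape might be sent programmatically to a GPT-4-class model. It is illustrative only: our pilot used RMIT's VAL interface rather than this client, and the function name, model choice, and the explicit student-submission field are assumptions added for the example.

```python
# Illustrative sketch only: the pilot used RMIT's VAL interface, not this client.
# Assumes the openai Python package and an API key in the OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()

def mark_with_rubric(assignment_instructions: str, rubric: str, submission: str) -> str:
    """Send a rubric-based marking prompt to a GPT-4-class model and return its feedback."""
    prompt = (
        "Mark the following student assignment according to the rubrics. "
        "Focus on helping the student understand what they did well, what could be "
        "improved, and how to take the next step.\n\n"
        f"The assignment task and instructions are as below here:\n{assignment_instructions}\n\n"
        f"The assessment rubrics are as below:\n{rubric}\n\n"
        # The student's work also needs to be supplied; this field is our addition for illustration.
        f"The student submission is as below:\n{submission}\n\n"
        "Please provide a mark for each of these categories. Thank you!"
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # hypothetical model choice; VAL is configured differently
        messages=[{"role": "user", "content": prompt}],
        temperature=0,   # keep marking as repeatable as possible
    )
    return response.choices[0].message.content
```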
What We Found
Although total scores between human and AI markers were similar on average, the differences became more apparent when analysing individual rubric components.
Overall, AI markers were more lenient, awarding approximately 7.5% higher scores than human assessors. The largest discrepancy appeared in the background section, where AI scores were significantly higher (p < 0.01), likely reflecting the model's emphasis on fluency and structure rather than critical depth (Figure 1).

Figure 1. Comparison of Mean Scores by Rubric Category: Human Assessors vs Generative AI
Figure Notes: Differences were tested using paired-samples t-tests (two-tailed). Rubric categories include: Background (out of 30), Aim (out of 20), Method (out of 40), References (out of 10), and Total (out of 100). Asterisks (*) indicate statistically significant differences (p < 0.05).
For aim and objectives, human assessors gave slightly higher scores than AI, though the difference was smaller. The most striking differences, however, emerged in the methodology section. While average scores were similar, human markers showed much greater variation (standard deviation ±3.85 vs AI's ±1.98 out of 40). This suggests that human markers were more discriminating: better able to reward well-justified methods and to identify vague or generic responses, an area where disciplinary expertise is crucial.
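For those wanting to replicate this kind of comparison with their own marking data, a minimal sketch of the analysis behind Figure 1 follows. The score arrays here are hypothetical placeholders, not our pilot data.

```python
# Minimal sketch of the Figure 1 analysis using hypothetical scores, not the pilot data.
import numpy as np
from scipy.stats import ttest_rel

# Paired scores for one rubric category (e.g. methodology, out of 40):
# each index is the same assignment marked by a human assessor and by the AI.
human_scores = np.array([32, 28, 35, 30, 33, 27, 31, 29])
ai_scores    = np.array([31, 30, 33, 31, 32, 30, 32, 31])

# Paired-samples t-test (two-tailed), as described in the figure notes.
t_stat, p_value = ttest_rel(human_scores, ai_scores)
print(f"mean human = {human_scores.mean():.2f}, mean AI = {ai_scores.mean():.2f}")
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")

# Spread of marks: a larger standard deviation suggests more discriminating marking.
print(f"SD human = {human_scores.std(ddof=1):.2f}, SD AI = {ai_scores.std(ddof=1):.2f}")
```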
In references and total scores, there were no statistically significant differences. Still, human markers often offered more precise critiques, particularly on citation accuracy and relevance, details that AI struggled to assess meaningfully.
Strengths and Limits of AI in Marking
One noticeable difference was in the style of the feedback provided. Human markers provided annotated comments tailored to specific sections of the student’s work. These comments were often more nuanced, and in some cases, more supportive or empathetic in tone.
In contrast, AI-generated feedback was typically summary-based: concise, rubric-aligned, but often impersonal. While efficient, this style sometimes lacked warmth or specificity. If multiple students receive nearly identical feedback, the credibility and usefulness of that feedback may be diminished. Whether these stylistic differences impact learning outcomes remains an open question, and one we hope to explore further.
In that sense, AI may be effective for offering a general overview or consistency check, but not as a replacement for personalised feedback, especially in complex or open-ended assessments.
Where to Next?
Our early findings suggest that AI holds real potential for supporting fairer and more efficient assessment practices, particularly when used not to replace educators but to enhance their decision-making. In large cohorts or moderation settings, AI may serve as a valuable second opinion or consistency check. There is also potential for thoughtful integration into university learning management systems.
But just like a well-designed rubric, it works best when combined with human insight. Striking the right balance between utilising AI tools and maintaining traditional teaching methods to address individual student needs is crucial for effective feedback.
Disclaimer: This study used VAL, RMIT’s internal GPT-4-based AI tool, which operates in a secure environment without access to live internet content. As different AI platforms vary in how they are tuned or restricted, results may differ from other publicly available generative AI models.
Banner image generated by ChatGPT