The research highlighted the potential of GPT-4 in higher education, suggesting that automated grading of comprehensive content is achievable. However, data security and privacy concerns persist for cloud-based models like GPT-4. Alternatives that can be installed locally, such as Llama 2, are being explored, though they currently lag behind GPT-4 in performance. The balance between performance, adaptability, and data security remains a key point of discussion in applying AI to educational assessment.
Key takeaways:
- The research studied the application of AI, specifically GPT-4, in educational assessment and its performance in Automated Short Answer Grading (ASAG).
- GPT-4 showed robust performance in grading short-answer responses, even without a reference answer. Its best results came on the SciEntsBank dataset, and it unexpectedly performed better on the Beetle dataset when the reference answer was withheld.
- Despite GPT-4's impressive capabilities, models from the BERT family, which undergo both pre-training and task-specific training, still outperform it, highlighting the importance of task-specific training.
- While GPT-4 shows potential in higher education and automated grading, concerns around data security and privacy persist with cloud-based models, leading to exploration of alternatives like Llama 2.
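
To make the "with or without a reference answer" setup concrete, here is a minimal Python sketch of how a grading prompt might be assembled in both modes. It is an illustration only, not the study's actual prompt: the function names `build_grading_prompt` and `parse_label`, the prompt wording, and the two-label scale are all assumptions, and the call to a model (GPT-4, Llama 2, or any other) is deliberately left out.

```python
from typing import Optional

# Hypothetical label set for illustration; the study's exact grading scale
# is not stated in this summary.
LABELS = ("correct", "incorrect")


def build_grading_prompt(question: str, student_answer: str,
                         reference_answer: Optional[str] = None) -> str:
    """Assemble a plain-text grading prompt, with or without a reference answer."""
    parts = [
        "You are grading a short-answer response.",
        f"Question: {question}",
    ]
    # The reference answer is optional, mirroring the two conditions compared above.
    if reference_answer is not None:
        parts.append(f"Reference answer: {reference_answer}")
    parts.append(f"Student answer: {student_answer}")
    parts.append("Reply with exactly one label: " + ", ".join(LABELS) + ".")
    return "\n".join(parts)


def parse_label(model_reply: str) -> Optional[str]:
    """Map a free-text model reply to one of LABELS, or None if it matches none."""
    reply = model_reply.strip().lower().rstrip(".")
    return reply if reply in LABELS else None
```

Calling `build_grading_prompt(question, answer)` without the third argument yields the reference-free variant, so the same questions can be graded in both conditions and the results compared. Because the prompt is plain text, it could be sent to a cloud-hosted or a locally installed model alike, which is where the performance versus privacy trade-off comes into play.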