Andrea Horbach, Daniel Mora Melanchthon, Nils-Jonathan Schaller, Stefan Keller, Jennifer Meyer, Thorben Jansen

Empirical Study: One Model to Score Them All? On Suitability, Stability, and Synergy in Automated Essay Evaluation


Writing is a key educational competence whose development depends on feedback. Automated Essay Scoring (AES) is increasingly used to support feedback generation, yet most prior research has neglected score stability across repeated runs, even though inconsistent scoring can undermine trust in AES and limit its educational value. We evaluate feature-based logistic regression models, transformer-based neural models, and generative large language models (LLMs) on 4,593 EFL essays from the MEWS dataset. Beyond accuracy, we analyze prediction variability across multiple runs and examine agreement patterns within and across model families. Results show that feature-based models remain competitive, while LLMs achieve high accuracy. Despite strong average performance, GPT-5 exhibits substantial variability across runs. Agreement patterns reveal that different model families succeed and fail on different item subsets. Our findings underline stability as a crucial dimension for deploying AES models in educational contexts and highlight the need for careful model selection and, potentially, model combination.

More Information
Bibliography	Andrea Horbach / Daniel Mora Melanchthon / Nils-Jonathan Schaller / Stefan Keller / Jennifer Meyer / Thorben Jansen: Empirical Study: One Model to Score Them All? On Suitability, Stability, and Synergy in Automated Essay Evaluation. 14 pages.
Pages	14
Item number	PEU20260305
Author(s)	Andrea Horbach, Daniel Mora Melanchthon, Nils-Jonathan Schaller, Stefan Keller, Jennifer Meyer, Thorben Jansen
Publication date	1 July 2026

