Description
This study aims to explore whether different prompt designs produce significant differences in the scores ChatGPT5.0 generates for IELTS Writing Task 2 essays, and to examine to what extent ChatGPT5.0's essay scores align with human ratings. Using a dataset of 56 essays, scores generated under two prompt designs (with and without calibration examples) were compared with each other and with a human benchmark derived from multiple raters. Scores were analyzed across the four IELTS criteria (Task Response, Coherence and Cohesion, Lexical Resource, and Grammatical Range and Accuracy) as well as overall performance, using descriptive statistics, repeated measures ANOVA, Pearson correlation, and intraclass correlation coefficients (ICC). Findings revealed that prompt design, specifically the inclusion of calibration examples, influenced ChatGPT's automated essay scoring: scores differed significantly between the two prompting scenarios. While ChatGPT and human scores did not differ significantly at the overall level, systematic discrepancies were detected on three of the four marking criteria, with Task Response being the exception. Specifically, ChatGPT tended to assign higher scores for Coherence and Cohesion and lower scores for the lexical and grammatical criteria. Pearson correlation analyses found moderate to strong relationships between ChatGPT-based scores and human ratings, suggesting that ChatGPT could reliably rank writing performance, whereas lower intraclass correlation coefficients indicated weaker alignment in terms of absolute scores.
Keywords: ChatGPT5.0, IELTS Writing, reliability
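To illustrate the distinction the abstract draws between rank-order consistency (Pearson r) and absolute agreement (ICC), the following is a minimal sketch, not the study's actual analysis code: it computes Pearson r with SciPy and a two-way random-effects, absolute-agreement ICC(2,1) from the standard ANOVA decomposition. The band scores below are hypothetical, and the pairing of a human benchmark column with a ChatGPT column is an assumed setup for illustration only.

```python
import numpy as np
from scipy.stats import pearsonr


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is an (n_essays, k_raters) array, one row per essay.
    """
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand_mean = ratings.mean()
    row_means = ratings.mean(axis=1)   # per-essay means
    col_means = ratings.mean(axis=0)   # per-rater means

    # Two-way ANOVA sums of squares and mean squares.
    ss_rows = k * np.sum((row_means - grand_mean) ** 2)
    ss_cols = n * np.sum((col_means - grand_mean) ** 2)
    ss_total = np.sum((ratings - grand_mean) ** 2)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical band scores for a handful of essays (not the study's data):
# column 0 = human benchmark, column 1 = ChatGPT-generated score.
scores = np.array([
    [6.0, 6.5],
    [5.5, 5.5],
    [7.0, 7.5],
    [6.5, 6.0],
    [5.0, 5.5],
    [7.5, 7.0],
])

r, p = pearsonr(scores[:, 0], scores[:, 1])   # rank-order consistency
print(f"Pearson r = {r:.2f} (p = {p:.3f})")
print(f"ICC(2,1)  = {icc_2_1(scores):.2f}")   # absolute agreement
```

A pattern like the one reported, i.e., a high Pearson r alongside a lower ICC, would arise if ChatGPT ordered the essays much as the human raters did but applied a systematic offset on some criteria, since Pearson r is insensitive to such shifts while an absolute-agreement ICC penalizes them.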