A Protocol for Generating Constructed-Response Items with ChatGPT and an Evaluation of Item Quality: Focusing on the Case of Korean Language Arts

함은혜; 박소영; 이병윤; 김기동; 이대형


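Korean Abstract

This study explored prompt guidelines for generating constructed-response items with ChatGPT and examined whether the quality of the resulting items is adequate compared with human-developed items. It focused in particular on the development and use of constructed-response items in middle and high school Korean language arts. We reviewed prior research to define the core elements of constructed-response item development, experimented with a variety of ChatGPT item-generation strategies, and developed a three-step prompt guideline. When ChatGPT-generated and human-developed items were rated on 20 evaluation criteria, the quality of the ChatGPT-generated items was significantly lower than that of the human-developed items. ChatGPT-generated items scored lower in particular on the clarity of the task content, the linkage between the task content and the provided materials, conformity to the national curriculum, and the reliability of the scoring criteria. By contrast, ChatGPT-generated items did not differ from human-developed items in the alignment among learning objectives, task content, and scoring criteria, or in the use of vocabulary at the students' level. Finally, we discuss implications for AI-assisted generation of constructed-response items and for the design of human-AI collaborative assessment.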

English Abstract

This study explores prompt guidelines for generating constructed-response items using ChatGPT and examines the quality of the AI-generated items in comparison to human-developed items, focusing on middle and high school Korean language arts. Based on a literature review and testing of various prompting strategies, we developed a three-step prompt guideline for generating both constructed-response items and scoring rubrics. We then compared the quality of the ChatGPT-generated items to human-developed items across 20 evaluation criteria. The results showed that the quality of the ChatGPT-generated items was significantly lower than that of the human-developed items, particularly in terms of task clarity, alignment between the task and the provided materials, compliance with the national curriculum, and reliability of scoring criteria. However, no significant difference was observed between the ChatGPT-generated and human-developed items in two areas: the alignment among learning objectives, task content, and scoring criteria; and the use of vocabulary appropriate to the students' level. Finally, implications for using AI in item generation and for designing human-AI collaborative assessments are discussed.

Keywords

Constructed-response items; item quality; ChatGPT; human-AI collaborative assessment; formative assessment
