A Pretrained Language Model-Based Data Augmentation Method for Korean Question-Answering Systems

Woojin Cho; Hyukjoon Lee

KIISE Transactions on Computing Practices

Korean Title	사전학습 언어모델 기반의 한국어 질문-답변 데이터 증강 방법
English Title	A Pretrained Language Model-Based Data Augmentation Method for Korean Question-Answering Systems
Author	Woojin Cho, Hyukjoon Lee
Citation	Vol. 27, No. 12, pp. 563-573 (2021. 11)
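Korean Abstract (translated)	Natural language processing has advanced rapidly as artificial intelligence has come into the spotlight. Among its many tasks, question answering is the problem of having an AI find the answer to a question within a passage. Achieving strong performance on AI problems depends heavily on securing both a suitable model and a training dataset. Question-answering datasets in particular are hard to build because they demand substantial direct human involvement, for example in checking the grammar of questions and answers and the relationship between them. To address this problem, this paper proposes a question-answering data augmentation method consisting of three stages: answer generation, question generation, and filtering. Experiments show that a model trained with the augmented data improves question-answering performance by up to 1.13 in F1-score over a model trained on KorQuAD data alone.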
English Abstract	Natural language processing (NLP) has recently made rapid progress with artificial intelligence (AI) in the spotlight. Among the many problems of NLP, question answering (QA) is the problem in which an AI algorithm finds the right answer to a question within a paragraph. Securing both AI models and training data is of the utmost importance for achieving good AI performance. QA data in particular requires a great deal of direct human intervention because of the grammar of, and the relationships between, questions and answers, which makes a dataset difficult to obtain. To solve this problem, this paper proposes a QA data augmentation method consisting of four steps: answer generation, question generation, a round-trip filter technique, and verification. Experimental results show that a model trained with the augmented data achieves an increase of up to 1.13 in F1-score compared with a model trained on KorQuAD data only.
Keyword	natural language processing; question-answering; KorQuAD; Korean corpus; natural language generation
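
The abstracts above describe the augmentation method only at the level of its stages (answer generation, question generation, filtering). As a rough illustration of how such a pipeline might fit together, here is a minimal Python sketch. It is not the authors' implementation: the names `QAPair`, `token_f1`, and `augment`, and the three callables `generate_answers`, `generate_question`, and `answer_question`, are hypothetical. The filter shown is a round-trip consistency check (keep a generated pair only if a QA model recovers the same answer from the generated question), which is assumed here to be what the English abstract means by a "round-trip filter technique". The whitespace-token F1 is likewise a simplification, since this record does not state how F1 is computed.

```python
from collections import Counter
from typing import Callable, Iterable, List, NamedTuple


class QAPair(NamedTuple):
    """One synthetic training example (hypothetical container)."""
    context: str
    question: str
    answer: str


def token_f1(prediction: str, reference: str) -> float:
    """Whitespace-token F1 overlap between two strings.

    Assumption: tokens are split on whitespace; a Korean QA benchmark
    may use a different tokenization or a character-level score.
    """
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty is a miss.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def augment(
    contexts: Iterable[str],
    generate_answers: Callable[[str], List[str]],
    generate_question: Callable[[str, str], str],
    answer_question: Callable[[str, str], str],
    min_f1: float = 1.0,
) -> List[QAPair]:
    """Generate QA pairs per context and keep those passing the filter."""
    kept: List[QAPair] = []
    for context in contexts:
        # Stage 1: propose candidate answer spans for the passage.
        for answer in generate_answers(context):
            # Stage 2: generate a question conditioned on passage and answer.
            question = generate_question(context, answer)
            # Stage 3: round-trip filter (assumed). A QA model answers the
            # generated question; keep the pair only if it agrees closely
            # enough with the candidate answer.
            predicted = answer_question(context, question)
            if token_f1(predicted, answer) >= min_f1:
                kept.append(QAPair(context, question, answer))
    return kept
```

In practice the three callables would wrap pretrained language models. With the default `min_f1=1.0` the filter keeps a pair only when the predicted answer contains exactly the same whitespace tokens as the candidate; lowering the threshold admits partial overlaps.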