Building a Korean Text Summarization Dataset Using News Articles and Social Media

Gyoung Ho Lee (이경호), Yo-Han Park (박요한), Kong Joo Lee (이공주)

KIPS Transactions on Software and Data Engineering (정보처리학회 논문지 소프트웨어 및 데이터 공학)

Korean Title	신문기사와 소셜 미디어를 활용한 한국어 문서요약 데이터 구축 (Building a Korean Text Summarization Dataset Using News Articles and Social Media)
English Title	Building a Korean Text Summarization Dataset Using News Articles of Social Media
Authors	Gyoung Ho Lee (이경호), Yo-Han Park (박요한), Kong Joo Lee (이공주)
Citation	Vol. 9, No. 8, pp. 251-258 (August 2020)
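Korean Abstract (translated)	Training data for text summarization consists of documents and their summaries. Because the summaries in existing text summarization datasets were written manually, it has been difficult to obtain large amounts of data. For this reason, online news articles, which are easy to collect and of high quality, have been widely used in text summarization research. In this study, we propose building a Korean text summarization dataset by using the descriptions, headlines, and subheads that news outlets post on social media as summaries of the article body. We built a dataset of approximately 425,000 news articles with their summaries. To demonstrate the usefulness of the dataset, we implemented an extractive summarization system and compared the performance of a supervised model trained on the dataset with that of unsupervised models. Experimental results showed that the model trained on the proposed dataset achieved higher ROUGE scores than the unsupervised algorithms.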
English Abstract	A training dataset for text summarization consists of pairs of a document and its summary. Because conventional approaches to building text summarization datasets are labor-intensive, it is difficult to construct large datasets for text summarization. News articles are one of the most popular resources for text summarization because they are easily accessible, available at large scale, and of high quality. From social media news services, we can collect not only the headlines and subheads of news articles but also the summary descriptions that human editors write about them. Approximately 425,000 pairs of news articles and their summaries were collected from social media. We implemented an automatic extractive summarizer, trained it on the dataset, and compared its performance with that of unsupervised models. The summarizer achieved higher ROUGE scores than the unsupervised models.
Keywords	Korean Text Summarization; Dataset; Description; Headline; Subhead; Automatic Extractive Summarization
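The abstracts do not state how the extractive summarizer gets sentence-level training labels from summaries written as free text (descriptions, headlines, subheads), nor how ROUGE is tokenized. The following is a minimal sketch under two assumptions that the paper does not confirm: labels come from a greedy ROUGE oracle (repeatedly add the article sentence that most increases ROUGE-N F1 against the reference summary), and text is split on whitespace (Korean evaluations often use morpheme-level tokens instead). The names `rouge_n`, `greedy_oracle`, and the `max_sents=3` limit are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """ROUGE-N F1 between two token lists, using clipped n-gram overlap."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())  # Counter '&' keeps the minimum count
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def greedy_oracle(sentences, summary, max_sents=3, n=1):
    """Greedily select sentence indices whose concatenation maximizes ROUGE-N
    against the reference summary (assumed labeling scheme, hypothetical).
    Tokenization is plain whitespace splitting (an assumption)."""
    ref = summary.split()
    selected, best = [], 0.0
    while len(selected) < max_sents:
        best_idx = None
        for i in range(len(sentences)):
            if i in selected:
                continue
            cand_idx = sorted(selected + [i])  # keep document order
            cand = [tok for j in cand_idx for tok in sentences[j].split()]
            score = rouge_n(cand, ref, n)
            if score > best:
                best, best_idx = score, i
        if best_idx is None:  # no remaining sentence improves the score
            break
        selected.append(best_idx)
    return sorted(selected)
```

For example, with the sentences "the cat sat on the mat", "dogs bark loudly", and "the cat was happy" and the summary "the cat sat happy", `greedy_oracle` returns `[2]`: the third sentence alone scores a ROUGE-1 F1 of 0.75, and adding either remaining sentence lowers the score, so the search stops. Stopping when no sentence improves F1 keeps the labeled extract short, which suits headline- and description-length references.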