Korean Standard Industry Classification Multilingual Classification Using Data Augmentation

Korean Title	데이터 증강을 이용한 한국표준산업분류 다국어 분류
English Title	Korean Standard Industry Classification Multilingual Classification Using Data Augmentation
Authors	우찬균, 오지은, 황정윤, 박재현, 김지우, 박진영 (Chankyun Woo, Jieun Oh, Jeongyun Hwang, Jaehyeon Park, Jiwoo Kim, Jinyong Pak)
Citation	Vol. 49, No. 02, pp. 696–698 (December 2022)
English Abstract	In this paper, we build a model that automatically classifies the industry-classification responses collected through foreign-language questionnaires from foreign residents in the Census, which Statistics Korea conducts every five years. We use a pre-trained language model, an approach that has become widespread, and build the multilingual classifier on XLM-R. Because the available training data is in Korean, we first trained a Korean model and tested it in six languages (footnote 1). We then machine-translated the Korean training data into English and compared the classifier's performance by training language. In this comparison, the model trained on all six languages performed best across the languages overall, with an average of 75%.
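The abstract describes augmenting Korean training data by machine translation and then comparing models by training language, reporting an average over languages. The sketch below illustrates that general idea under stated assumptions; it is not the authors' code. It assumes that augmentation means translating each labeled Korean example into a target language while reusing its original label, that `translate` is a hypothetical caller-supplied function (for example, a wrapper around some machine-translation service), and that the per-language score is accuracy averaged with equal weight per language. The abstract does not name the metric or the six languages, so both are left to the caller.

```python
from typing import Callable, Dict, List, Tuple

# (item description text, industry-classification code label)
Example = Tuple[str, str]


def augment_by_translation(
    examples: List[Example],
    translate: Callable[[str, str], str],  # hypothetical MT function: (text, lang) -> text
    target_langs: List[str],
) -> Dict[str, List[Example]]:
    """Build one training set per language.

    The Korean originals are kept under "ko"; each target language gets a
    machine-translated copy of every example with the original label reused.
    """
    datasets: Dict[str, List[Example]] = {"ko": list(examples)}
    for lang in target_langs:
        datasets[lang] = [(translate(text, lang), label) for text, label in examples]
    return datasets


def merge_languages(datasets: Dict[str, List[Example]], langs: List[str]) -> List[Example]:
    """Concatenate the training sets for the chosen languages (e.g. "ko" only, or all)."""
    return [ex for lang in langs for ex in datasets[lang]]


def per_language_accuracy(
    predict: Callable[[str], str],
    test_sets: Dict[str, List[Example]],
) -> Tuple[Dict[str, float], float]:
    """Score a classifier on each language's test set.

    Returns per-language accuracy and its unweighted mean across languages
    (an assumed metric; the abstract only reports an "average").
    """
    per_lang: Dict[str, float] = {}
    for lang, examples in test_sets.items():
        correct = sum(predict(text) == label for text, label in examples)
        per_lang[lang] = correct / len(examples)
    mean = sum(per_lang.values()) / len(per_lang)
    return per_lang, mean
```

In use, one would call `augment_by_translation` once, train one classifier (e.g. a fine-tuned XLM-R) on `merge_languages(datasets, ["ko"])` and another on the merged set for all languages, and compare the two with `per_language_accuracy` on held-out test sets in each language. Keeping the label untouched during translation is what makes the augmentation cheap: only the text changes, so no additional annotation is needed.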