Korean Title |
데이터 증강을 이용한 한국표준산업분류 다국어 분류 |
English Title |
Korean Standard Industry Classification Multilingual Classification Using Data Augmentation |
Author |
우찬균
오지은
황정윤
박재현
김지우
박진영
Chankyun Woo
Jieun Oh
Jeongyun Hwang
Jaehyeon Park
Jiwoo Kim
Jinyong Pak
|
Citation |
VOL 49 NO. 02 PP. 0696 ~ 0698 (2022. 12) |
Çѱ۳»¿ë (Korean Abstract) |
|
English Abstract |
In this paper, we build a model that automatically classifies the foreign-language industry classification responses collected from foreign residents in the Census, which Statistics Korea conducts every five years. We use a pre-trained language model, an approach that has become widely used, and build the multilingual classifier on XLM-R. Because the available training data is in Korean, we first trained a model on Korean only and tested it in six languages. We then machine-translated the Korean training data into English and compared classification performance by training language. In this comparison, the model trained on all six languages performed best across languages, with an average of 75%. |
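The translation-based augmentation and the per-language comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the language codes, and `placeholder_translate` (a stand-in for a real machine-translation system) are all assumptions, and the abstract does not name the metric behind the 75% figure, so plain accuracy is used here only for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

LabeledText = Tuple[str, str]          # (response text, industry code)
TaggedExample = Tuple[str, str, str]   # (response text, language code, industry code)


def placeholder_translate(text: str, target_lang: str) -> str:
    """Hypothetical stand-in for a machine-translation call (assumption)."""
    return f"[{target_lang}] {text}"


def augment_by_translation(
    korean_data: List[LabeledText],
    target_langs: List[str],
    translate: Callable[[str, str], str],
) -> List[TaggedExample]:
    """Keep the Korean originals and add one translated copy per target language.

    The industry code of each example is carried over unchanged.
    """
    augmented: List[TaggedExample] = [(text, "ko", code) for text, code in korean_data]
    for lang in target_langs:
        for text, code in korean_data:
            augmented.append((translate(text, lang), lang, code))
    return augmented


def per_language_accuracy(
    examples: List[TaggedExample], predictions: List[str]
) -> Dict[str, float]:
    """Fraction of correct predictions, computed separately for each language."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for (_, lang, gold), pred in zip(examples, predictions):
        total[lang] += 1
        correct[lang] += int(pred == gold)
    return {lang: correct[lang] / total[lang] for lang in total}


def macro_average(scores: Dict[str, float]) -> float:
    """Unweighted mean of the per-language scores."""
    return sum(scores.values()) / len(scores)
```

In practice the augmented examples would be used to fine-tune a multilingual classifier such as XLM-R, and its predictions on a test set in each language would be passed to `per_language_accuracy` to compare training configurations.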
Keyword |
|