정보과학회논문지 (Journal of KIISE)
한글제목 (Korean Title)
소규모 데이터 기반 한국어 버트 모델
영문제목 (English Title)
A Small-Scale Korean-Specific BERT Language Model
저자 (Author)
이상아 (Sangah Lee)
장한솔 (Hansol Jang)
백연미 (Yunmee Baik)
박수지 (Suzi Park)
신효필 (Hyopil Shin)
원문수록처 (Citation)
Vol. 47, No. 7, pp. 682-692 (July 2020)
한글내용 (Korean Abstract)
최근 자연어처리에서 문장 단위의 임베딩을 위한 모델들은 거대한 말뭉치와 파라미터를 이용하기 때문에 큰 하드웨어와 데이터를 요구하고 학습하는 데 시간이 오래 걸린다는 단점을 갖는다. 따라서 규모가 크지 않더라도 학습 데이터를 경제적으로 활용하면서 필적할 만한 성능을 가지는 모델의 필요성이 제기된다. 본 연구는 음절 단위의 한국어 사전, 자소 단위의 한국어 사전을 구축하고 자소 단위의 학습과 양방향 WordPiece 토크나이저를 새롭게 소개하였다. 그 결과 기존 모델의 1/10 사이즈의 학습 데이터를 이용하고 적절한 크기의 사전을 사용해 더 적은 파라미터로 계산량은 줄고 성능은 비슷한 KR-BERT 모델을 구현할 수 있었다. 이로써 한국어와 같이 고유의 문자 체계를 가지고 형태론적으로 복잡하며 자원이 적은 언어에 대해 모델을 구축할 때는 해당 언어에 특화된 언어학적 현상을 반영해야 한다는 것을 확인하였다.
영문내용 (English Abstract)
Recent models for sentence embedding rely on huge corpora and parameter counts; they demand massive data and hardware, and pre-training them takes extensive time. This tendency raises the need for a model that achieves comparable performance while using training data economically. In this study, we propose KR-BERT, a Korean-specific model built on sub-character-level and character-level Korean vocabularies and a BidirectionalWordPiece tokenizer. As a result, our KR-BERT model performs comparably to, and in some cases better than, existing pre-trained models while using one-tenth the size of their training data. This demonstrates that for a morphologically complex, low-resource language with its own writing system, sub-character-level representations and the BidirectionalWordPiece tokenizer capture language-specific linguistic phenomena that the Multilingual BERT model missed.
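The abstracts contrast syllable-level (음절) and sub-character-level (자소) vocabularies. As a rough illustration of what the sub-character granularity means (a minimal sketch using standard Unicode Hangul decomposition arithmetic, not code from the paper), each precomposed Hangul syllable can be split into its jamo components:

```python
# Standard Unicode Hangul decomposition: syllables U+AC00..U+D7A3 are
# composed as initial (19 choices) x medial (21) x optional final (28).
CHOSEONG = [chr(0x1100 + i) for i in range(19)]          # initial consonants
JUNGSEONG = [chr(0x1161 + i) for i in range(21)]         # medial vowels
JONGSEONG = [""] + [chr(0x11A8 + i) for i in range(27)]  # optional final consonants

def to_jamo(text: str) -> list[str]:
    """Split each precomposed Hangul syllable into its jamo (sub-characters)."""
    out = []
    for ch in text:
        code = ord(ch)
        if 0xAC00 <= code <= 0xD7A3:                 # precomposed syllable block
            idx = code - 0xAC00
            out.append(CHOSEONG[idx // 588])         # 588 = 21 * 28
            out.append(JUNGSEONG[(idx % 588) // 28])
            if idx % 28:                             # syllable has a final consonant
                out.append(JONGSEONG[idx % 28])
        else:
            out.append(ch)                           # pass non-Hangul through
    return out

print(to_jamo("한국어"))  # each syllable becomes 2-3 jamo
```

A sub-character vocabulary tokenizes over these jamo sequences rather than whole syllables, which is why it can stay small while still covering morphological variation.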
키워드 (Keyword)
언어 모델링
임베딩 모델
한국어 모델
사전
토크나이저
BERT
language modeling
embedding model
Korean language modeling
vocabulary
tokenizer
Copyright(c) Computer Science Engineering Research Information Center. All rights reserved.