Journal of KIISE
Korean Title |
자원부족 환경에 적합한 BIT 개체명 표기법 |
English Title |
A BIT Named Entity Format Suitable for Low Resource Environments |
Author |
윤 호
김창현
천민아
박호민
남궁영
최민석
김재균
김재훈
Ho Yoon
Chang-Hyun Kim
Min-ah Cheon
Ho-min Park
Young Namgoong
Min-seok Choi
Jae-kyun Kim
Jae-Hoon Kim
|
Citation |
Vol. 48, No. 3, pp. 293-301 (March 2021) |
Korean Abstract |
Named entity recognition is the task of finding the spans of named entities in a given document and classifying them. Because many named entities consist of more than one word, most named-entity training corpora are encoded in the BIO format. In the BIO format, the tag of the word that begins a named entity is prefixed with "B-", the tags of the remaining words in the entity are prefixed with "I-", and every word between named entities is tagged "O". Because roughly 90% or more of the words carry the "O" tag, this format raises the perplexity of the "O" tag and causes an imbalanced learning problem. This paper proposes the BIT format in place of the BIO format. The BIT format converts the "O" tags of the BIO format into "T" tags, where a "T" tag in this paper denotes a part-of-speech tag. In experiments, the BIT format outperformed the BIO format when the semantic projection of the word representations was weak, that is, when the word representations were trained on a relatively small amount of training data.
|
English Abstract |
Named entity recognition (NER) seeks to locate named entities in text and classify them into predefined categories such as person, organization, and location. Most named entities consist of more than one word, so most annotated NER corpora are encoded in the BIO (short for Beginning, Inside, and Outside) format: a "B-" prefix before a tag indicates that the word begins a named entity, an "I-" prefix indicates that the word is inside a named entity, and an "O" tag indicates that the word belongs to no named entity. In this format, roughly 90% or more of the words in a corpus carry the "O" tag, which causes two problems: high perplexity for words tagged "O" and imbalanced learning. In this paper, we propose a novel format for representing NER corpora, called the BIT format, which uses "T" (short for POS tags) tags in place of "O" tags. Experiments show that the BIT format outperforms the BIO format when the meaning projection of the word representation is unreliable, namely, when word embeddings are trained on a relatively small number of words.
|
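The conversion described in the abstract can be illustrated with a minimal Python sketch. It assumes that each "O" tag is replaced by the word's own POS tag and that "B-"/"I-" tags are left unchanged; the abstract does not specify the exact label string used for "T" tags, so that detail is an assumption. The function names (`bio_to_bit`, `o_fraction`), the example sentence, and the Penn Treebank-style POS tags are all hypothetical and are not taken from the paper.

```python
from typing import List


def bio_to_bit(bio_tags: List[str], pos_tags: List[str]) -> List[str]:
    """Replace each "O" tag with the word's POS tag; keep B-/I- tags as they are.

    Assumption: the "T" tag is the POS tag itself (not stated in the abstract).
    """
    if len(bio_tags) != len(pos_tags):
        raise ValueError("bio_tags and pos_tags must have the same length")
    return [pos if bio == "O" else bio for bio, pos in zip(bio_tags, pos_tags)]


def o_fraction(tags: List[str]) -> float:
    """Return the fraction of tags equal to "O" (0.0 for an empty sequence)."""
    return sum(tag == "O" for tag in tags) / len(tags) if tags else 0.0


# Hypothetical six-token sentence with hand-assigned tags.
words = ["Alice", "Kim", "visited", "Seoul", "yesterday", "."]
bio = ["B-PER", "I-PER", "O", "B-LOC", "O", "O"]
pos = ["NNP", "NNP", "VBD", "NNP", "NN", "."]

bit = bio_to_bit(bio, pos)
for word, old, new in zip(words, bio, bit):
    print(f"{word}\t{old}\t{new}")
```

In this toy sentence half of the BIO tags are "O", while the BIT sequence contains no "O" tags at all: the single dominant label is spread across several POS labels, which is the imbalance the abstract refers to.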
Keyword |
BIT 표기법
개체명 인식
Bi-LSTM/CRF
BIO 표기법
BIT format
named entity recognition
Bi-LSTM/CRF
BIO format
|