KIPS Transactions on Software and Data Engineering
Korean Title
한국어 제목 개체명 인식 및 사전 구축: 도서, 영화, 음악, TV프로그램
English Title
Named Entity Recognition and Dictionary Construction for Korean Title: Books, Movies, Music and TV Programs
Authors
Yongmin Park (박용민)
Jae Sung Lee (이재성)
Citation
VOL 03 NO. 07 PP. 0285 ~ 0292 (2014. 07)
Korean Abstract (English translation)
Named entity recognition is used to improve the performance of information retrieval, question answering, and machine translation systems, among others. It generally targets PLOs (person, location, and organization names). Because PLOs consist mainly of unregistered words and proper nouns, proper nouns and unregistered words can serve as important named entity candidates. Title named entities such as book, movie, music, and TV program titles, however, take a wide variety of forms ranging from single words to full sentences, which makes them hard to recognize. This paper proposes a method that uses news articles to quickly recognize title named entities and automatically build a dictionary of them. First, word phrases (eojeol) enclosed in special symbols are extracted, and title candidates are selected from them with an SVM that uses the surrounding context words and word distances. The extracted title candidates are then classified by title type with an SVM that uses mutual information as feature weights.
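The candidate-identification step in the abstracts (phrases enclosed in special symbols, with context words and their distances as SVM features) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the bracket inventory `BRACKET_PAIRS`, the whitespace split into eojeol, the three-eojeol window, and the 1/distance feature weight are all hypothetical choices that the abstracts do not specify, and the SVM itself is not shown.

```python
import re
from dataclasses import dataclass

# Hypothetical inventory of enclosing symbols; the abstracts do not list the exact set.
BRACKET_PAIRS = {"<": ">", "《": "》", "『": "』", "「": "」", "'": "'", '"': '"', "‘": "’", "“": "”"}


@dataclass
class Candidate:
    text: str   # phrase between the symbols
    start: int  # offset of the opening symbol
    end: int    # offset just past the closing symbol


def extract_candidates(sentence: str) -> list[Candidate]:
    """Collect every span enclosed by a matching pair of special symbols."""
    found = []
    for open_sym, close_sym in BRACKET_PAIRS.items():
        inner = "[^" + re.escape(open_sym) + re.escape(close_sym) + "]+"
        pattern = re.escape(open_sym) + "(" + inner + ")" + re.escape(close_sym)
        for m in re.finditer(pattern, sentence):
            found.append(Candidate(m.group(1), m.start(), m.end()))
    return sorted(found, key=lambda c: c.start)


def context_features(sentence: str, cand: Candidate, window: int = 3) -> dict[str, float]:
    """Context eojeol on each side of the candidate, weighted by 1/distance (assumed scheme)."""
    left = sentence[: cand.start].split()
    right = sentence[cand.end :].split()
    feats: dict[str, float] = {}
    # Nearest left eojeol gets distance 1, the next one distance 2, and so on.
    for dist, word in enumerate(reversed(left[-window:]), start=1):
        key = "L:" + word
        feats[key] = max(feats.get(key, 0.0), 1.0 / dist)
    for dist, word in enumerate(right[:window], start=1):
        key = "R:" + word
        feats[key] = max(feats.get(key, 0.0), 1.0 / dist)
    return feats


# Example sentence: "Yesterday, on MBC's <Infinite Challenge>, the members sang."
sentence = "어제 MBC <무한도전>에서 멤버들이 노래했다"
for cand in extract_candidates(sentence):
    print(cand.text, context_features(sentence, cand))
```

Because Korean particles attach to the preceding word, the first right-hand eojeol here is the particle 에서 itself, which is exactly the kind of local cue a context-word feature can pick up.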
English Abstract
Named entity recognition is used to improve the performance of information retrieval systems, question answering systems, machine translation systems, and so on. Its targets are usually PLOs (persons, locations, and organizations), which are usually proper nouns or unregistered words, and traditional named entity recognizers use these characteristics to find named entity candidates. Titles of books, movies, music, and TV programs differ from PLO entities: a title may span several phrases, form a whole sentence, or contain special characters. This makes title candidates difficult to find. In this paper, we propose a method to quickly extract title named entities from news articles and automatically build a named entity dictionary for the titles. To identify candidates, word phrases enclosed in special symbols in a sentence are first extracted and then verified by an SVM that uses feature words and their distances. To classify the extracted title candidates, an SVM is used with the mutual information of word contexts.
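For the classification step, the abstracts say only that the mutual information of context words serves as SVM weighting. One plausible reading, sketched below with made-up toy data, estimates pointwise mutual information PMI(w, c) = log(P(w, c) / (P(w) P(c))) from labeled candidates and summarizes each candidate as one score per title type. The once-per-candidate counting, the treatment of unseen pairs as 0, and the per-class summation are assumptions, and the resulting vectors would still have to be passed to an SVM (not shown).

```python
import math
from collections import Counter


def pmi_table(examples: list[tuple[list[str], str]]) -> dict[tuple[str, str], float]:
    """PMI(w, c) = log(P(w, c) / (P(w) P(c))), counting each word once per training candidate."""
    n = len(examples)
    label_count: Counter = Counter()
    word_count: Counter = Counter()
    joint: Counter = Counter()
    for words, label in examples:
        label_count[label] += 1
        for w in set(words):
            word_count[w] += 1
            joint[(w, label)] += 1
    # P(w, c) / (P(w) P(c)) simplifies to count(w, c) * n / (count(w) * count(c)).
    return {
        (w, c): math.log(k * n / (word_count[w] * label_count[c]))
        for (w, c), k in joint.items()
    }


def class_scores(words: list[str], pmi: dict[tuple[str, str], float], labels: list[str]) -> dict[str, float]:
    """One PMI-weighted score per title type; pairs never seen together contribute 0 (assumed)."""
    return {c: sum(pmi.get((w, c), 0.0) for w in set(words)) for c in labels}


# Toy labeled context words (hypothetical, not the paper's data).
train = [
    (["published", "author"], "book"),
    (["released", "director"], "movie"),
    (["published", "bookstore"], "book"),
    (["released", "audience"], "movie"),
]
pmi = pmi_table(train)
print(class_scores(["published", "author"], pmi, ["book", "movie"]))
```

On this toy data every observed word/type pair has PMI ln 2, so the context "published, author" scores 2 ln 2 (about 1.39) for book and 0 for movie.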
Keywords
SVM
Named Entity Recognition
Title Named Entity
Dictionary Construction