Conference Proceedings


KCC 2021


Korean Title: BERT Pre-trained Models for Data Augmentation in Twitter Medical Named-Entity Recognition
English Title: BERT Pre-trained Models for Data Augmentation in Twitter Medical Named-Entity Recognition
Authors: Kokoy Siti Komariah, Bong-Kee Sin
Citation: Vol. 48, No. 01, pp. 0870 ~ 0872 (2021. 06)
Korean Abstract:

English Abstract:
Data augmentation is a technique often employed to synthetically increase the size of training data while preserving the characteristics of the original data. However, in Named Entity Recognition tasks, which make predictions at the token level, it is difficult to augment a set of words without changing the existing labels and the context of the sentence. In this paper, we use BERT to generate new sentences by predicting masked words and replacing them in accordance with both the context and the labels. Experiments on six different BERT pre-trained models show that data augmentation using a deep bidirectional language model can generate more data with relevant context in short texts such as tweets and improve the classifier's performance.
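To illustrate the masked-word replacement idea the abstract describes, here is a minimal sketch using the Hugging Face transformers fill-mask pipeline. It is not the authors' implementation: the model name, the example tweet, the BIO label scheme, and the augment() helper are all assumptions chosen for illustration. The key point it demonstrates is that only non-entity tokens are masked, so the token-level labels carry over to the generated sentences unchanged.

    # Illustrative sketch only, assuming the Hugging Face transformers
    # library; model choice, sentence, and labels are hypothetical.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model="bert-base-uncased")

    # A toy tweet with token-level BIO labels; "headache" is the entity.
    tokens = ["I", "have", "a", "terrible", "headache", "today"]
    labels = ["O", "O", "O", "O", "B-SYMPTOM", "O"]

    def augment(tokens, labels, mask_index):
        """Mask one non-entity token and let BERT propose in-context
        replacements; entity tokens and all labels stay unchanged."""
        assert labels[mask_index] == "O", "only mask non-entity tokens"
        masked = tokens.copy()
        masked[mask_index] = fill_mask.tokenizer.mask_token
        new_sentences = []
        for pred in fill_mask(" ".join(masked)):
            candidate = pred["token_str"].strip()
            if candidate != tokens[mask_index]:  # skip the original word
                new_tokens = tokens.copy()
                new_tokens[mask_index] = candidate
                new_sentences.append((new_tokens, labels))
        return new_sentences

    # e.g. masking "terrible" might yield "bad", "severe", "slight", ...
    for new_tokens, new_labels in augment(tokens, labels, mask_index=3):
        print(new_tokens)

Because BERT is bidirectional, each proposed replacement is conditioned on the full sentence context on both sides of the mask, which is why the generated variants tend to remain plausible even in short texts such as tweets.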
Keywords: data augmentation; medical named-entity recognition; BERT pre-trained model; Twitter data; information extraction