Journal of KIISE
Korean Title |
Self-revising Transformer with Multi-view for Image Captioning
English Title |
Self-revising Transformer with Multi-view for Image Captioning |
Author |
Jieun Lee (이지은)
Jinuk Park (박진욱)
Sanghyun Park (박상현)
|
Citation |
Vol. 48, No. 3, pp. 340-351 (March 2021)
Korean Abstract |
Image captioning is the task of automatically generating natural-language descriptions of a scene by identifying the object elements in a given image. Prior studies mainly capture information from the image with a single feature extractor and then generate captions with a recurrent neural network-based decoder. However, because a single feature extractor is used, multi-view image information cannot be exploited, and the recurrent neural network-based decoder suffers from the long-term dependency problem. To address this, this study processes and delivers image information from various angles through a multi-view encoder that uses multiple feature extractors. In addition, to compensate for the limitations of recurrent neural networks, we propose a self-revising transformer that improves the completeness of sentences by reconstructing the generated sentence with an additional multi-head attention mechanism in the transformer-based decoder layer. To validate the proposed model, we demonstrated the superiority of the proposed method through quantitative and qualitative evaluations in various comparative experiments on the MSCOCO dataset.
|
English Abstract |
Image captioning is the task of automatically describing a scene by identifying the object elements in a given image. In prior research, information has mainly been captured from the image using a single feature extractor, and captions have then been generated by a recurrent neural network-based decoder. However, multi-view image information is not available with this approach because of the single feature extractor, and the recurrent neural network-based decoder suffers from a long-term dependency problem. To address these issues, the proposed model employs a multi-view encoder that uses multiple feature extractors to provide processed image information from various views. In addition, to overcome the limitations of the recurrent neural network, we propose a self-revising transformer that increases the completeness of sentences by revising the generated sentence through an additional multi-head attention sub-layer in the transformer-based decoder. To validate the proposed model, we verified its superiority through quantitative and qualitative evaluations with various comparative experiments on the MSCOCO dataset.
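The decoder's self-revision step builds on the standard multi-head attention mechanism. As a rough illustration of that mechanism only (a minimal NumPy sketch; it does not reproduce the paper's multi-view encoder, self-revising sub-layer, or learned projection weights, and all function names here are illustrative):

```python
import numpy as np

def multi_head_attention(Q, K, V, num_heads):
    """Minimal scaled dot-product multi-head attention over 2-D inputs
    of shape (seq_len, d_model); learned projections are omitted."""
    seq_q, d_model = Q.shape
    assert d_model % num_heads == 0
    d_head = d_model // num_heads

    def split(X):
        # (seq, d_model) -> (heads, seq, d_head)
        return X.reshape(X.shape[0], num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(Q), split(K), split(V)
    # attention scores per head: (heads, seq_q, seq_k)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    # numerically stable softmax over the key axis
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    out = weights @ v                                  # (heads, seq_q, d_head)
    # concatenate heads back to (seq_q, d_model)
    return out.transpose(1, 0, 2).reshape(seq_q, d_model)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                 # 5 tokens, model dim 8
y = multi_head_attention(x, x, x, num_heads=2)
print(y.shape)                              # (5, 8)
```

In the self-revising decoder described in the abstract, an attention sub-layer of this kind is applied a second time over the already-generated sentence so the model can revise it before emitting the final caption.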
|
Keyword |
natural language processing
image captioning
multi-head attention
multi-view encoder
|