Separation of Text and Non-text in Document Layout Analysis using a Recursive Filter

Tuan-Anh Tran; In-Seop Na; Soo-Hyung Kim

TIIS (Korean Society for Internet Information)

Korean Title	Separation of Text and Non-text in Document Layout Analysis using a Recursive Filter
English Title	Separation of Text and Non-text in Document Layout Analysis using a Recursive Filter
Author	Tuan-Anh Tran; In-Seop Na; Soo-Hyung Kim
Citation	Vol. 9, No. 10, pp. 4072-4091 (2015. 10)
Korean Abstract
English Abstract	The separation of text and non-text elements plays an important role in document layout analysis. A number of approaches have been proposed, but the quality of the separation result is still limited by the complexity of document layouts. In this paper, we present an efficient method for classifying text and non-text components in a document image. It combines whitespace analysis with multi-layer homogeneous regions, a combination we call the recursive filter. First, the input binary document is analyzed by connected component analysis and whitespace extraction. Second, a heuristic filter is applied to identify non-text components. Then, using a statistical method, we apply the recursive filter to multi-layer homogeneous regions to identify all text and non-text elements of the binary image. Finally, all regions are reshaped and noise is removed to obtain the text document and the non-text document. Experimental results on the ICDAR2009 page segmentation competition dataset and other datasets demonstrate the effectiveness and superiority of the proposed method.
Keywords	text detection, non-text identification, page segmentation, document layout analysis, OCR, recursive filter
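
The first two stages named in the abstract (connected component analysis followed by a heuristic filter) can be illustrated with a minimal sketch. The abstract does not give the paper's actual filter rules or thresholds, so everything specific here is an assumption made for illustration: the function names `connected_components` and `split_by_height`, the choice of 8-connectivity, the height-versus-median rule, and the `ratio=3.0` threshold. It is not the authors' recursive filter, only a rough picture of how a size-based heuristic might flag non-text candidates.

```python
from collections import deque
from statistics import median


def connected_components(image):
    """Return bounding boxes (top, left, bottom, right) of 8-connected
    foreground regions in a binary image given as a list of 0/1 rows."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, left = min(top, y), min(left, x)
                bottom, right = max(bottom, y), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def split_by_height(boxes, ratio=3.0):
    """Hypothetical heuristic (not from the paper): a component whose height
    exceeds ratio * median height is treated as a non-text candidate."""
    if not boxes:
        return [], []
    heights = [bottom - top + 1 for top, _, bottom, _ in boxes]
    limit = ratio * median(heights)
    text = [b for b, h in zip(boxes, heights) if h <= limit]
    non_text = [b for b, h in zip(boxes, heights) if h > limit]
    return text, non_text


# Toy page: three short "character" strokes and one tall block on the right.
page = [
    [1, 0, 1, 0, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 1, 1],
]

boxes = connected_components(page)
text, non_text = split_by_height(boxes)
print("text candidates:", text)
print("non-text candidates:", non_text)
```

On this toy input the three two-pixel strokes stay in the text set and the tall block is flagged as a non-text candidate. A single global threshold like this is exactly what tends to fail on complex layouts, which is presumably why the method described in the abstract applies its statistics recursively within homogeneous regions rather than once over the whole page.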