Large-scale Data Analysis based on Hadoop for Astroinformatics

Jae-Hyuck Kwak (곽재혁); Junweon Yoon (윤준원); Yonghwan Jung (정용환); Jaegyoon Hahm (함재균); Dongin Park (박동인)

KIISE Journal C: Computing Practices (정보과학회 논문지 C : 컴퓨팅의 실제)

Korean Title	하둡 기반 천문 응용 분야 대규모 데이터 분석 기법
English Title	Large-scale Data Analysis based on Hadoop for Astroinformatics
Author	Jae-Hyuck Kwak (곽재혁), Junweon Yoon (윤준원), Yonghwan Jung (정용환), Jaegyoon Hahm (함재균), Dongin Park (박동인)
Citation	Vol. 17, No. 11, pp. 587-591 (November 2011)
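Korean Abstract	As data-intensive computing draws growing attention in scientific applications, cloud computing has gained prominence because of the need to process large volumes of data quickly and efficiently. Hadoop provides a software framework for large-scale data processing and analysis and is widely used as a representative cloud computing technology. In particular, Hadoop offers high scalability and performance together with strong fault detection and automatic recovery, so it is increasingly being adopted in science and technology fields. This paper proposes a method for using Hadoop to analyze the large-scale data generated in astronomy applications. The astronomy data of interest come from the Super-WASP project and amount to roughly ten million small-sized observation data items; because Hadoop is specialized for processing large data, it is not well suited to processing a large number of small observation data items. In this paper, the input and output files for astronomy data processing were packed using a specialized data structure provided by Hadoop, and the astronomy application executable was wrapped as a MapReduce job so that it can run on Hadoop.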
English Abstract	As data-intensive computing gains attention in scientific applications, cloud computing has attracted interest because of the need to process large-scale data quickly and efficiently. Hadoop provides a software framework for large-scale data processing and analysis, and is widely adopted as a representative cloud computing technology. In particular, because it provides high scalability and performance together with strong fault tolerance and automatic recovery, Hadoop is increasingly used in scientific communities. In this paper, we propose a Hadoop-based method to analyse large-scale data generated in the astroinformatics research area. The astroinformatics data we deal with are generated by the Super WASP project and consist of about ten million small-sized observation data items. However, Hadoop is specialized for large-scale data analysis and is not well suited to many small-sized data items. We therefore packed the many small-sized astroinformatics data items into large ones using a specialized data structure of Hadoop, and implemented a MapReduce wrapper program to execute the astroinformatics analysis code on Hadoop.
Keywords	Data-intensive Computing, Astroinformatics, Hadoop, MapReduce
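
The two ideas in the abstract, packing many small observation files into one large container and wrapping an existing analysis executable as a map task, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not name the Hadoop data structure or the analysis code, so the length-prefixed container below is a plain stand-in (not any Hadoop file format), and the names `pack_records`, `iter_records`, `map_record`, and the `obs_*.dat` file names are hypothetical.

```python
import io
import struct
import subprocess
import sys

# Hypothetical record header: big-endian key length and value length.
HEADER = struct.Struct(">II")


def pack_records(records, out):
    """Append (name, payload) pairs to one binary stream as length-prefixed records."""
    for name, payload in records:
        key = name.encode("utf-8")
        out.write(HEADER.pack(len(key), len(payload)))
        out.write(key)
        out.write(payload)


def iter_records(stream):
    """Yield (name, payload) pairs back from a stream written by pack_records."""
    while True:
        header = stream.read(HEADER.size)
        if not header:
            return  # clean end of container
        if len(header) < HEADER.size:
            raise ValueError("truncated record header")
        key_len, value_len = HEADER.unpack(header)
        key = stream.read(key_len)
        payload = stream.read(value_len)
        if len(key) < key_len or len(payload) < value_len:
            raise ValueError("truncated record body")
        yield key.decode("utf-8"), payload


def map_record(name, payload, command):
    """Wrap an external program as a map step: payload on stdin, result from stdout."""
    result = subprocess.run(
        command, input=payload, stdout=subprocess.PIPE, check=True
    )
    return name, result.stdout


if __name__ == "__main__":
    # Stand-in for the real analysis executable: it only reports the input size.
    stand_in = [
        sys.executable,
        "-c",
        "import sys; sys.stdout.write(str(len(sys.stdin.buffer.read())))",
    ]
    container = io.BytesIO()
    pack_records(
        [("obs_0001.dat", b"1.0 2.0 3.0"), ("obs_0002.dat", b"4.0 5.0")],
        container,
    )
    container.seek(0)
    for name, payload in iter_records(container):
        print(map_record(name, payload, stand_in))
```

Keeping the file name as the record key lets each map output be traced back to its original observation, which is the property that makes packing workable when the per-file results must stay distinguishable.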