• Àüü
  • ÀüÀÚ/Àü±â
  • Åë½Å
  • ÄÄÇ»ÅÍ
´Ý±â

»çÀÌÆ®¸Ê

Loading..

Please wait....

¿µ¹® ³í¹®Áö

Ȩ Ȩ > ¿¬±¸¹®Çå > ¿µ¹® ³í¹®Áö > TIIS (Çѱ¹ÀÎÅͳÝÁ¤º¸ÇÐȸ)

TIIS (Çѱ¹ÀÎÅͳÝÁ¤º¸ÇÐȸ)

Current Result Document :

ÇѱÛÁ¦¸ñ(Korean Title) An Analytic solution for the Hadoop Configuration Combinatorial Puzzle based on General Factorial Design
¿µ¹®Á¦¸ñ(English Title) An Analytic solution for the Hadoop Configuration Combinatorial Puzzle based on General Factorial Design
ÀúÀÚ(Author) R. Sathia Priya   A. John Prakash   V. Rhymend Uthariaraj  
¿ø¹®¼ö·Ïó(Citation) VOL 16 NO. 11 PP. 3619 ~ 3637 (2022. 11)
Çѱ۳»¿ë
(Korean Abstract)
¿µ¹®³»¿ë
(English Abstract)
Big data analytics offers endless opportunities for operational enhancement by extracting valuable insights from complex voluminous data. Hadoop is a comprehensive technological suite which offers solutions for the large scale storage and computing needs of Big data. The performance of Hadoop is closely tied with its configuration settings which depends on the cluster capacity and the application profile. Since Hadoop has over 190 configuration parameters, tuning them to gain optimal application performance is a daunting challenge. Our approach is to extract a subset of impactful parameters from which the performance enhancing sub-optimal configuration is then narrowed down. This paper presents a statistical model to analyze the significance of the effect of Hadoop parameters on a variety of performance metrics. Our model decomposes the total observed performance variation and ascribes them to the main parameters, their interaction effects and noise factors. The method clearly segregates impactful parameters from the rest. The configuration setting determined by our methodology has reduced the Job completion time by 22%, resource utilization in terms of memory and CPU by 15% and 12% respectively, the number of killed Maps by 50% and Disk spillage by 23%. The proposed technique can be leveraged to ease the configuration tuning task of any Hadoop cluster despite the differences in the underlying infrastructure and the application running on it.
Å°¿öµå(Keyword) ANOVA   Big data   Configuration tuning   General Factorial Design   Hadoop   Impactful parameters  
ÆÄÀÏ÷ºÎ PDF ´Ù¿î·Îµå