Paper Title
Parallel Implementation Of K-Means Algorithm Using Hadoop
Abstract
Clustering is regarded as one of the most important tasks in data mining; it deals primarily with grouping
similar data. Clustering large datasets remains a point of concern. In recent years, data clustering has been studied extensively and
many methods and theories have been developed. Hadoop is a software framework for the distributed processing of
vast amounts of data across groups of distributed computers using the Map-Reduce programming model. The Map-Reduce
computing model has two phases: a map phase and a reduce phase. In K-Means, the map phase calculates the distance between each
point and each cluster center and assigns each point to its nearest cluster. All the points that belong to the same cluster are sent to a
single reduce call. The reduce phase calculates the new cluster centers for the next Map-Reduce job. Map-Reduce thus
parallelizes problems that involve large datasets across computing clusters, which makes it an attractive
option for clustering large datasets. This paper studies the parallel implementation of the K-Means
clustering algorithm using the Map-Reduce computing model of Hadoop on different datasets.
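The map and reduce phases described above can be sketched in a few lines of plain Python. This is a minimal, single-process illustration and not the authors' Hadoop implementation: the function names (map_phase, shuffle, reduce_phase, kmeans_iteration) are hypothetical, points are assumed to be tuples of numbers, and the grouping step that Hadoop performs between map and reduce is simulated with a dictionary.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_phase(points, centers):
    """Emit (nearest_center_index, point) pairs, as a mapper would."""
    for point in points:
        nearest = min(range(len(centers)), key=lambda i: dist(point, centers[i]))
        yield nearest, point


def shuffle(pairs):
    """Group points by cluster index (simulates Hadoop's shuffle step)."""
    groups = defaultdict(list)
    for key, point in pairs:
        groups[key].append(point)
    return groups


def reduce_phase(groups, centers):
    """Average the points of each cluster to obtain the next centers."""
    new_centers = list(centers)  # assumption: an empty cluster keeps its old center
    for key, members in groups.items():
        new_centers[key] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return new_centers


def kmeans_iteration(points, centers):
    """One Map-Reduce job: assign points, then recompute centers."""
    return reduce_phase(shuffle(map_phase(points, centers)), centers)
```

Calling kmeans_iteration repeatedly, feeding each result back in as the centers, corresponds to chaining Map-Reduce jobs until the centers stop moving. Keeping an empty cluster's old center is one simple choice among several; the abstract does not say how that case is handled.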
Keywords— Data Mining, Data Clustering, Parallel Computing, Map-Reduce, K-Means algorithm, Hadoop, HDFS, Machine Learning.
Authors - Jerril Mathson Mathew, Jyothis Joseph
Published: Volume-3, Issue-6 (Jun, 2016)
DOIONLINE Number - IJAECS-IRAJ-DOIONLINE-4853
Published on 2016-07-15