Large dataset partitioning using ensemble partition-based clustering with majority voting technique

Vunnava Dinesh Babu, Karunakaran Malathi

Abstract


Large datasets have become valuable in data mining, where vast amounts of data must be processed, stored, and handled. However, handling and processing large datasets is time-consuming and memory-intensive. As a result, researchers have adopted partitioning strategies to improve controllability and performance and to reduce the time and memory required to handle large datasets. Unfortunately, the many clustering techniques available in the literature can make it difficult for experts to choose the best technique for a given dataset. Furthermore, no single clustering technique can handle all problems, such as cluster structure, noise, or density. Existing clustering techniques also need scalable solutions to manage large datasets. Therefore, this paper proposes an ensemble partition-based clustering method with majority voting for large dataset partitioning, which aggregates k-means, k-medoids, fuzzy c-means, expectation-maximization (EM), and density-based spatial clustering of applications with noise (DBSCAN). In the first stage, each of these techniques clusters the large dataset individually. In the second stage, the final clusters are obtained through majority voting among the five clustering algorithms: each data instance is assigned to the cluster that receives the most votes. The experimental findings demonstrate that the ensemble partition-based clustering method surpasses the five individual clustering algorithms in terms of execution time and accuracy.
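The voting stage described in the abstract can be illustrated with a minimal sketch. The abstract does not say how cluster labels from different algorithms are matched to one another or how DBSCAN noise points are counted, so the following choices are assumptions made only for illustration: labels are aligned to the first labeling by trying every permutation of cluster ids (practical only for small k), and a noise label of -1 is treated as an abstention. The function names `align_labels` and `majority_vote` are hypothetical, and the five input labelings are hand-written toy values, not outputs of the actual algorithms.

```python
from collections import Counter
from itertools import permutations

NOISE = -1  # assumption: DBSCAN-style "no cluster" label, counted as an abstention


def align_labels(reference, labels, k):
    """Relabel `labels` so its cluster ids agree with `reference` as often as possible.

    Brute-force search over all permutations of the k cluster ids; this is an
    illustrative choice that only scales to small k.
    """
    best_map, best_score = None, -1
    for perm in permutations(range(k)):
        score = sum(
            1 for r, l in zip(reference, labels) if l != NOISE and perm[l] == r
        )
        if score > best_score:
            best_map, best_score = perm, score
    return [NOISE if l == NOISE else best_map[l] for l in labels]


def majority_vote(partitions, k):
    """Combine several labelings of the same instances by per-instance majority vote."""
    reference = partitions[0]
    aligned = [reference] + [align_labels(reference, p, k) for p in partitions[1:]]
    final = []
    for votes in zip(*aligned):
        counts = Counter(v for v in votes if v != NOISE)
        # If every voter abstained, the instance stays unassigned.
        final.append(counts.most_common(1)[0][0] if counts else NOISE)
    return final


# Toy labelings of six instances, standing in for the five base algorithms.
partitions = [
    [0, 0, 0, 1, 1, 1],   # stand-in for k-means (used as the reference)
    [1, 1, 1, 0, 0, 0],   # stand-in for k-medoids: same grouping, ids swapped
    [0, 0, 1, 1, 1, 1],   # stand-in for fuzzy c-means after hardening
    [1, 1, 1, 1, 0, 0],   # stand-in for EM: ids swapped, one disagreement
    [0, 0, 0, NOISE, 1, 1],  # stand-in for DBSCAN with one noise point
]

consensus = majority_vote(partitions, k=2)
print(consensus)
```

In this toy case the two disagreeing votes and the one abstention are outvoted, so the consensus matches the reference grouping. A real pipeline would need a scalable alignment step in place of the permutation search.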

Keywords


Clustering; Ensemble clustering; Large dataset; Majority voting technique; Partitioning


DOI: http://doi.org/10.11591/ijeecs.v29.i2.pp838-844


This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.