A Hybrid MPI/OpenMP Parallelization of K-Means Algorithms Accelerated Using the Triangle Inequality

Wojciech Kwedlo , Pawel J. Czochanski


The standard formulation of the K -means clustering (Lloyd's method) performs many unnecessary distance calculations. In this paper, we focus on four approaches that use the triangle inequality to avoid unnecessary distance calculations. These approaches are Drake's, Elkan's, Annulus, and Yinyang algorithms. We propose a hybrid MPI/OpenMP parallelization of these algorithms in which the dataset and the corresponding data structures storing bounds on distances are evenly divided among MPI processes. Then, in the assignment step of a K -means iteration, each MPI process computes the assignment of its portion of data using OpenMP threads. In the update step of the iteration, the cluster centroids are computed using a hierarchical all-reduce operation. In the computational experiments, we compared the strong scalability of these four algorithms with the scalability of Lloyd's algorithm, parallelized using the same approach. The results indicate that all four algorithms maintain an advantage in computing time over Lloyd's algorithm. A comparison with two software packages, whose sources are publicly available, in the same computing environment, shows that our implementations are more efficient.
Author Wojciech Kwedlo (FCS / SD)
Wojciech Kwedlo,,
- Software Department
, Pawel J. Czochanski
Pawel J. Czochanski,,
Journal seriesIEEE Access, ISSN 2169-3536, (N/A 100 pkt)
Issue year2019
Publication size in sheets0.85
Keywords in EnglishClustering , K-means , triangle inequality , MPI , OpenMP , hybrid parallelization
ASJC Classification2200 General Engineering; 2500 General Materials Science; 1700 General Computer Science
URL https://ieeexplore.ieee.org/document/8681032
Internal identifierROC 19-20
Languageen angielski
Score (nominal)100
Score sourcejournalList
ScoreMinisterial score = 100.0, 24-03-2020, ArticleFromJournal
Publication indicators Scopus SNIP (Source Normalised Impact per Paper): 2018 = 1.718; WoS Impact Factor: 2018 = 4.098 (2) - 2018=4.54 (5)
Citation count*
Share Share

Get link to the record

* presented citation count is obtained through Internet information analysis and it is close to the number calculated by the Publish or Perish system.
Are you sure?