Parallel and Distributed Data Mining


Massimo Cafaro

University of Salento, Italy

Italo Epicoco

University of Salento, Italy

Marco Pulimeno

University of Salento, Italy


The special session brings together researchers and practitioners working on the high-performance aspects of data mining algorithms that enable novel applications. Data mining techniques and algorithms that process huge amounts of data to extract useful and interesting information have become popular in many different contexts, and algorithms are required to make sense of data automatically and efficiently. Although the performance of sequential computer systems keeps improving, it cannot keep up with the growing demand for data mining applications and with the growth in data size. Moreover, the main memory of a sequential system may not be large enough to hold all the data related to current applications. Therefore, there is increasing interest in the design and implementation of parallel data mining algorithms.

On parallel computers, by exploiting the vast aggregate main memory and processing power of processors and accelerators, parallel algorithms can address both the running time and the memory requirement issues. However, parallelizing existing algorithms to achieve good performance and scalability on massive datasets is not trivial. A good data organization and decomposition strategy is of paramount importance in order to balance the workload while minimizing data dependencies. Synchronization and communication overhead must also be minimized, as must I/O costs.

The workshop will allow participants to exchange ideas and results related to ongoing research, focusing on high-performance aspects of data mining algorithms and applications. Creating breakthrough parallel algorithms for high-performance data mining applications requires addressing several key computing problems, which may lead to novel solutions and new insights in interdisciplinary applications.
The focus of the workshop is on all forms of advances in high-performance data mining algorithms and applications, and related topics.


The session will cover a variety of topics including but not limited to:

  • Parallel data mining algorithms using MPI and/or OpenMP
  • Parallel data mining algorithms targeting GPUs and many-core accelerators
  • Parallel data mining applications exploiting FPGAs
  • Distributed data mining algorithms
  • Benchmarking and performance studies of high-performance data mining applications
  • Novel programming paradigms to support high-performance computing for data mining
  • Performance models for high-performance data mining applications and middleware
  • Programming models, tools, and environments for high-performance computing in data mining
  • Caching, streaming, pipelining, and other optimization techniques for data management in high-performance computing for data mining