Process placement for multicore clusters

MPI process placement can have a decisive effect on application performance. In this project we work on a novel algorithm called TreeMatch that maps processes to resources.

In recent years, the number of clusters built from NUMA nodes with multicore processors has been increasing. Many parallel applications can already take advantage of these architectures, but several optimizations are still possible. One concerns communication between processes. On the one hand, the amount of data exchanged between the processes of an application is not homogeneous. On the other hand, the underlying architecture is strongly hierarchical (network, memory, caches). It therefore makes sense to consider both aspects and match them in order to reduce the communication cost of an application. That is what TreeMatch does.
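
To make the idea concrete, here is a minimal sketch that scores a placement by weighting the traffic of each pair of processes by how far apart the two processes land in the hierarchy. Everything in it is an assumption made for illustration: the matrix values, the two-level topology (2 nodes of 2 cores), the distance weights, and the names `distance` and `placement_cost`. It is a toy cost model, not the metric used by TreeMatch.

```python
# Illustrative only: a toy cost model, not TreeMatch's actual metric.

# comm[i][j] = amount of data exchanged between processes i and j (assumed values).
comm = [
    [0, 10, 1000, 10],
    [10, 0, 10, 1000],
    [1000, 10, 0, 10],
    [10, 1000, 10, 0],
]

CORES_PER_NODE = 2  # assumed topology: 2 nodes x 2 cores


def distance(core_a, core_b):
    """Hypothetical cost: 0 on the same core, 1 within a node, 10 across the network."""
    if core_a == core_b:
        return 0
    if core_a // CORES_PER_NODE == core_b // CORES_PER_NODE:
        return 1
    return 10


def placement_cost(comm, placement):
    """Sum traffic x distance over every unordered pair of processes.

    placement[i] is the core that process i is bound to.
    """
    n = len(comm)
    return sum(
        comm[i][j] * distance(placement[i], placement[j])
        for i in range(n)
        for j in range(i + 1, n)
    )


identity = [0, 1, 2, 3]  # process i on core i
matched = [0, 2, 1, 3]   # heavy pairs (0, 2) and (1, 3) each share a node

print(placement_cost(comm, identity))  # 20220
print(placement_cost(comm, matched))   # 2400
```

With these assumed numbers, keeping the two heavily communicating pairs inside a node cuts the modelled cost by roughly a factor of eight, which is the kind of gain that matching the communication pattern to the hierarchy aims for.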

More specifically, once you have your communication pattern stored in a matrix (obtained by monitoring an MPI implementation, for instance), you need an accurate representation of the hardware topology. For this part, TreeMatch takes as input either an XML file produced by Hwloc or a tleaf file. Given the communication matrix and the topology, TreeMatch produces a permutation of the processes, which is then used to bind them.
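
The shape of this input and output can be sketched as follows. The code below is not the TreeMatch algorithm and does not use its API: it is a simple greedy grouping, written under the assumptions that the topology is flattened to a single level (nodes with `cores_per_node` cores each) and that there are at least as many cores as processes. The function name `greedy_permutation` and the matrix values are hypothetical.

```python
# Illustrative only: a greedy stand-in, not the TreeMatch algorithm.

def greedy_permutation(comm, cores_per_node):
    """Return sigma, where sigma[i] is the core assigned to process i.

    Processes are packed node by node: each node is seeded with the
    lowest-numbered unplaced process, then filled with the processes
    that exchange the most data with those already in the node.
    """
    n = len(comm)
    unplaced = set(range(n))
    sigma = [None] * n
    next_core = 0
    while unplaced:
        group = [min(unplaced)]
        unplaced.remove(group[0])
        while len(group) < cores_per_node and unplaced:
            # sorted() makes tie-breaking deterministic (lowest rank wins).
            best = max(
                sorted(unplaced),
                key=lambda p: sum(comm[p][q] for q in group),
            )
            group.append(best)
            unplaced.remove(best)
        # Cores are numbered consecutively within a node.
        for p in group:
            sigma[p] = next_core
            next_core += 1
    return sigma


# Assumed communication matrix: processes 0 and 2, and 1 and 3, talk the most.
comm_matrix = [
    [0, 5, 800, 5],
    [5, 0, 5, 800],
    [800, 5, 0, 5],
    [5, 800, 5, 0],
]

sigma = greedy_permutation(comm_matrix, cores_per_node=2)
print(sigma)  # [0, 2, 1, 3]
```

Here the result places processes 0 and 2 on the cores of the first node and processes 1 and 3 on the second. A real run would take the topology from the Hwloc XML or tleaf file rather than from a single `cores_per_node` parameter.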