Robust clustering and outlier rejection using the Mahalanobis distance distribution
Abstract
Clustering and outlier detection both have a wide range of applications in signal processing. We focus on the case where the data are corrupted by outliers and the sample size is relatively small. We study approximations of the distribution of the Mahalanobis distance when robust estimators are used for the mean and the scatter matrix. We develop clustering and outlier rejection methods in the context of robust mixture modelling: robust clustering and parameter estimation are performed on one portion of the data, and outlier detection is performed on the rest. We illustrate the importance of our method with synthetic simulations in which we compare the theoretical asymptotic distribution and an approximated distribution to the empirical distribution. We conclude with an application to the well-known MNIST data set contaminated with noise.
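
As a rough illustration of the outlier rejection idea summarized above, the following minimal sketch flags points whose squared Mahalanobis distance exceeds a chi-square quantile. It is not the method developed in this paper: the iterative trimmed estimator is a simple stand-in for a robust estimator of the mean and scatter matrix, the names `mahalanobis_sq` and `trimmed_estimates` are hypothetical, and the chi-square threshold is the classical Gaussian reference rather than any of the approximations studied here. The example assumes two-dimensional data, for which the chi-square quantile at level q has the closed form -2 ln(1 - q).

```python
import numpy as np


def mahalanobis_sq(X, mean, scatter):
    """Squared Mahalanobis distance of each row of X to `mean` under `scatter`."""
    diff = X - mean
    solved = np.linalg.solve(scatter, diff.T).T  # rows are scatter^{-1} (x - mean)
    return np.einsum("ij,ij->i", diff, solved)


def trimmed_estimates(X, keep=0.75, n_iter=10):
    """Stand-in robust estimator (an assumption, not the paper's estimator):
    repeatedly re-estimate mean and scatter from the `keep` fraction of points
    with the smallest current Mahalanobis distance."""
    mean = np.median(X, axis=0)
    scatter = np.cov(X, rowvar=False)
    n_keep = int(keep * len(X))
    for _ in range(n_iter):
        d2 = mahalanobis_sq(X, mean, scatter)
        idx = np.argsort(d2)[:n_keep]
        mean = X[idx].mean(axis=0)
        scatter = np.cov(X[idx], rowvar=False)
    return mean, scatter


# Synthetic data: 200 Gaussian inliers plus 10 clustered outliers (illustrative only).
rng = np.random.default_rng(0)
inliers = rng.standard_normal((200, 2))
outliers = np.array([8.0, 8.0]) + 0.1 * rng.standard_normal((10, 2))
X = np.vstack([inliers, outliers])

mean, scatter = trimmed_estimates(X)
d2 = mahalanobis_sq(X, mean, scatter)

# Chi-square quantile with 2 degrees of freedom: CDF is 1 - exp(-x / 2).
level = 0.975
threshold = -2.0 * np.log(1.0 - level)

# The trimmed scatter is not rescaled for consistency, so it underestimates the
# true scatter and this rule also flags more inliers than the nominal 2.5%.
flags = d2 > threshold
print("flagged outliers:", int(flags[200:].sum()), "of 10")
print("flagged inliers:", int(flags[:200].sum()), "of 200")
```

The gap between the nominal and the observed rate of flagged inliers in this sketch shows why the distribution of the distance under robust estimates matters: a threshold taken from the asymptotic reference distribution can be poorly calibrated when the estimates come from a small or trimmed sample.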