Skip to Main content Skip to Navigation
Conference papers

Discovering Approximate Functional Dependencies using Smoothed Mutual Information

Abstract : We consider the task of discovering the top-K reliable approximate functional dependencies X → Y from high dimensional data. While naively maximizing mutual information involving high dimensional entropies over empirical data is subject to false discoveries, correcting the empirical estimator against data sparsity can lead to efficient exact algorithms for robust dependency discovery. Previous approaches focused on correcting by subtracting expected values of different null hypothesis models. In this paper, we consider a different correction strategy and counter data sparsity using uniform priors and smoothing techniques, that leads to an efficient and robust estimating process. In addition, we derive an admissible and tight bounding function for the smoothed estimator that allows us to efficiently solve via branch-and-bound the hard search problem for the top-K dependencies. Our experiments show that our approach is much faster than previous proposals, and leads to the discovery of sparse and informative functional dependencies. CCS CONCEPTS • Information systems → Data mining.
Complete list of metadatas

Cited literature [22 references]  Display  Hide  Download

https://hal-centralesupelec.archives-ouvertes.fr/hal-02880553
Contributor : Frédéric Pennerath <>
Submitted on : Thursday, June 25, 2020 - 9:49:43 AM
Last modification on : Thursday, July 2, 2020 - 3:46:23 AM
Long-term archiving on: : Wednesday, September 23, 2020 - 3:42:55 PM

File

smoothed-info-article-without-...
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02880553, version 1

Citation

Frédéric Pennerath, Panagiotis Mandros, Jilles Vreeken. Discovering Approximate Functional Dependencies using Smoothed Mutual Information. 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’20), Aug 2020, Virtual Event, United States. ⟨hal-02880553⟩

Share

Metrics

Record views

34

Files downloads

34