Conference papers

An Efficient Algorithm for Computing Entropic Measures of Feature Subsets

Frédéric Pennerath 1, 2
2 ORPAILLEUR - Knowledge representation, reasoning
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract: Entropic measures such as conditional entropy or mutual information have been used numerous times in pattern mining, for instance to characterize valuable itemsets or approximate functional dependencies. Strangely enough, the fundamental problem of designing efficient algorithms to compute the entropy of subsets of features (or the mutual information of feature subsets relative to some target feature) has received little attention compared to the analogous problem of computing the frequency of itemsets. The present article proposes to fill this gap: it introduces a fast and scalable method that computes entropy and mutual information for a large number of feature subsets by adopting the divide-and-conquer strategy used by FP-growth, one of the most efficient frequent itemset mining algorithms. To illustrate its practical interest, the algorithm is then used to solve the recently introduced problem of mining reliable approximate functional dependencies. Finally, the article provides empirical evidence that, in the context of non-redundant pattern extraction, the proposed method outperforms existing algorithms in both speed and scalability.
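To make the two quantities named in the abstract concrete, here is a minimal sketch of the naive computation: the empirical entropy of a feature subset and its mutual information with a target feature. It is not the algorithm proposed in the paper (it rescans the data for every subset and uses no FP-growth-style structure). The function names, the row-as-dictionary layout, and the toy data are assumptions made for illustration only.

```python
from collections import Counter
from math import log2


def entropy(rows, features):
    """Empirical joint entropy, in bits, of the given feature columns."""
    n = len(rows)
    # Count how often each combination of values occurs over the subset.
    counts = Counter(tuple(row[f] for f in features) for row in rows)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(rows, features, target):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for feature subset X and target Y."""
    joint = list(features) + [target]
    return entropy(rows, features) + entropy(rows, [target]) - entropy(rows, joint)


# Hypothetical toy dataset: the target y is the XOR of features a and b.
rows = [
    {"a": 0, "b": 0, "y": 0},
    {"a": 0, "b": 1, "y": 1},
    {"a": 1, "b": 0, "y": 1},
    {"a": 1, "b": 1, "y": 0},
]

h_ab = entropy(rows, ["a", "b"])
mi_a = mutual_information(rows, ["a"], "y")
mi_ab = mutual_information(rows, ["a", "b"], "y")
```

On this toy data, feature a alone carries no information about y, while the subset {a, b} determines it completely. This is why measures over subsets, rather than over single features, are of interest. Each call scans every row, so evaluating many subsets this way quickly becomes expensive, which is the cost the abstract says the proposed method addresses.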

https://hal-centralesupelec.archives-ouvertes.fr/hal-01897734
Contributor: Frédéric Pennerath
Submitted on: Wednesday, October 17, 2018 - 3:18:54 PM
Last modification on: Thursday, June 25, 2020 - 9:42:02 AM
Long-term archiving on: Friday, January 18, 2019 - 2:19:26 PM

File

article.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01897734, version 1

Citation

Frédéric Pennerath. An Efficient Algorithm for Computing Entropic Measures of Feature Subsets. ECML-PKDD 2018 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Sep 2018, Dublin, Ireland. ⟨hal-01897734⟩
