Visible to the public A fast information-theoretic approximation of joint mutual information feature selection

TitleA fast information-theoretic approximation of joint mutual information feature selection
Publication TypeConference Paper
Year of Publication2017
AuthorsLiu, H., Ditzler, G.
Conference Name2017 International Joint Conference on Neural Networks (IJCNN)
Date Publishedmay
ISBN Number978-1-5090-6182-2
Keywordsapproximation theory, classification, Collaboration, Complexity theory, composability, computational complexity, Data analysis, data reduction, dimensionality reduction, Entropy, fast information-theoretic approximation, feature selection, feature size, filtering theory, greedy algorithms, greedy forward search, greedy search complexity, high dimensional datasets, Human Behavior, human factors, Information theory, JMI maximization, JMI scales, joint mutual information feature selection, joint mutual information maximization, Metrics, minimum features subset, Mutual information, pattern classification, Policy-Governed Secure Collaboration, polynomial time, pubcrawl, Redundancy, Resiliency, Scalability, science of security, search problems, UCI datasets, unified framework, Upper bound

Feature selection is an important step in data analysis to address the curse of dimensionality. Such dimensionality reduction techniques are particularly important when if a classification is required and the model scales in polynomial time with the size of the feature (e.g., some applications include genomics, life sciences, cyber-security, etc.). Feature selection is the process of finding the minimum subset of features that allows for the maximum predictive power. Many of the state-of-the-art information-theoretic feature selection approaches use a greedy forward search; however, there are concerns with the search in regards to the efficiency and optimality. A unified framework was recently presented for information-theoretic feature selection that tied together many of the works in over the past twenty years. The work showed that joint mutual information maximization (JMI) is generally the best options; however, the complexity of greedy search for JMI scales quadratically and it is infeasible on high dimensional datasets. In this contribution, we propose a fast approximation of JMI based on information theory. Our approach takes advantage of decomposing the calculations within JMI to speed up a typical greedy search. We benchmarked the proposed approach against JMI on several UCI datasets, and we demonstrate that the proposed approach returns feature sets that are highly consistent with JMI, while decreasing the run time required to perform feature selection.

Citation Keyliu_fast_2017