Optimized Graph Classification Model using Frequent Significant Patterns for Analysis
Analysis of chemical compounds can reveal structural and functional information that may not be derivable from sequences alone. Graph classification algorithms and models that classify and analyse such data accurately have found a number of applications in the study of chemical compounds. However, computational complexity and poor scalability caused by the large number of patterns are bottlenecks for these algorithms, because most of them use either frequent subgraphs only or discriminative subgraphs only for classification. Using only frequent substructures for learning is not enough to identify the most discriminative factors, and it does not scale because of the large number of features generated from frequent patterns. Identifying discriminative features is computationally complex, and such features are few in number and small in size. Moreover, using only discriminative features loses the structure of the graph. This paper presents an optimized graph classification model that uses frequent and significant substructures. Frequent and significant substructures are generated from the given data set by significant subgraph mining using a representative set algorithm. The generated substructures are transformed into feature vectors, which facilitates computing the similarity between two graphs. We present a novel framework that uses frequent and significant substructures as features; it preserves the structure of the graphs and supports the construction of an efficient classifier model because it contains both discriminative and frequent features. With this approach, classification model construction admits all relevant substructures, which allows the classifier to choose the most discriminating patterns. An evaluation study on datasets explores the strengths and limitations of the proposed work. The results indicate that this is an optimized approach for constructing an accurate classification model using frequent and significant substructures as features.
Keywords: Graph mining, frequent pattern, significant pattern, classification, feature.
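The transformation of mined substructures into feature vectors, and the graph similarity computed from them, can be illustrated with a minimal sketch. It assumes that a mining step has already reported which substructures occur in each graph. The pattern names, the graph identifiers, and the choice of the Tanimoto (Jaccard) coefficient as the similarity measure are illustrative assumptions and are not taken from the paper.

```python
from typing import Dict, List, Set


def to_feature_vector(patterns_in_graph: Set[str], vocabulary: List[str]) -> List[int]:
    """Binary indicator vector: 1 if the substructure occurs in the graph, else 0."""
    return [1 if pattern in patterns_in_graph else 0 for pattern in vocabulary]


def tanimoto(u: List[int], v: List[int]) -> float:
    """Tanimoto (Jaccard) similarity between two binary feature vectors."""
    both = sum(a & b for a, b in zip(u, v))
    either = sum(a | b for a, b in zip(u, v))
    return both / either if either else 0.0


# Hypothetical output of a mining step: the frequent and significant
# substructures (vocabulary) and the ones found in each graph.
vocabulary: List[str] = ["C-C", "C=O", "C-N", "ring6"]
occurrences: Dict[str, Set[str]] = {
    "g1": {"C-C", "C=O", "ring6"},
    "g2": {"C-C", "C-N", "ring6"},
}

# One feature vector per graph, with a fixed feature order given by vocabulary.
vectors = {gid: to_feature_vector(found, vocabulary) for gid, found in occurrences.items()}
similarity = tanimoto(vectors["g1"], vectors["g2"])
```

Because every graph is mapped onto the same ordered vocabulary, the resulting vectors can be passed to any standard vector-based classifier, which is then free to weight the most discriminating substructures.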