PCA Based Two Bond Chem Structured Subgraph Isomorphic Matching
Abstract
Advances in biochemistry and the study of chemical molecular structure have made the classification of chemical bonds increasingly important. Like chemical classification, chemical pattern matching is attracting growing attention. However, selecting the proper bonds and atoms or molecules against a chemical molecular structure is time-consuming with traditional methods such as structure-based chemical taxonomy and fragmentation pattern matching. There is therefore an urgent need for effective computational methods for pattern matching against compounds. This study extracts single-bond and two-bond elements for chemical structure pattern matching using a posterior measure, such as the likelihood ratio, in a framework called PCA-based Subgraph Isomorphic Matching (PCA-SIM) applied to chemical structured data. Each chemical molecular structure pair in the PCA-SIM framework is represented by single- and double-bond properties derived from the Posterior T-bond Classification. Applying PCA-SIM yields PCA-based dimensionality-reduced features. To confirm the importance of these reduced features, the associations between them are further examined using the PCA-based dimensionality reduction technique. Finally, Principal Component Analysis (PCA) is used to identify the exact match between the chemical bond pattern of a test sample and the trained BIS patterns, which are supplied as input. The experiments are implemented in Java to evaluate pattern matching of sequences in chemical bonding. The performance of the PCA-SIM framework is evaluated against existing state-of-the-art methods in terms of density of chemical bonds, size of chemical bond structure, true positive rate, classification time, pattern matching time, and pattern matching rate.
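To make the dimensionality reduction step concrete, the following is a minimal Java sketch of generic PCA on bond-count feature vectors. It is not the PCA-SIM implementation. The class name PcaSketch, the two-column feature layout (single-bond count, double-bond count), and the sample values are assumptions made purely for illustration. The sketch centers the data, builds the sample covariance matrix, and recovers the first principal component by power iteration, a standard substitute for a full eigendecomposition.

```java
// Illustrative sketch only: generic PCA, not the paper's PCA-SIM code.
// The feature layout and sample values are hypothetical.
class PcaSketch {

    // Center each column (feature) of the data matrix on its mean.
    static double[][] center(double[][] x) {
        int n = x.length, d = x[0].length;
        double[] mean = new double[d];
        for (double[] row : x)
            for (int j = 0; j < d; j++) mean[j] += row[j] / n;
        double[][] c = new double[n][d];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < d; j++) c[i][j] = x[i][j] - mean[j];
        return c;
    }

    // Sample covariance matrix of already-centered data (needs n >= 2 rows).
    static double[][] covariance(double[][] c) {
        int n = c.length, d = c[0].length;
        double[][] cov = new double[d][d];
        for (double[] row : c)
            for (int a = 0; a < d; a++)
                for (int b = 0; b < d; b++)
                    cov[a][b] += row[a] * row[b] / (n - 1);
        return cov;
    }

    // Leading eigenvector of the covariance matrix by power iteration.
    // The start vector (1, 2, ..., d) is arbitrary; if it happens to be
    // orthogonal to the leading eigenvector the result is not meaningful.
    static double[] firstComponent(double[][] cov, int iterations) {
        int d = cov.length;
        double[] v = new double[d];
        for (int a = 0; a < d; a++) v[a] = a + 1;
        normalize(v);
        for (int it = 0; it < iterations; it++) {
            double[] w = new double[d];
            for (int a = 0; a < d; a++)
                for (int b = 0; b < d; b++) w[a] += cov[a][b] * v[b];
            if (!normalize(w)) return v; // product vanished: keep last estimate
            v = w;
        }
        return v;
    }

    // Scale v to unit length in place; returns false if v is the zero vector.
    static boolean normalize(double[] v) {
        double norm = 0;
        for (double t : v) norm += t * t;
        norm = Math.sqrt(norm);
        if (norm == 0) return false;
        for (int a = 0; a < v.length; a++) v[a] /= norm;
        return true;
    }

    // Project each centered row onto the component: one score per sample.
    static double[] project(double[][] c, double[] v) {
        double[] scores = new double[c.length];
        for (int i = 0; i < c.length; i++)
            for (int j = 0; j < v.length; j++) scores[i] += c[i][j] * v[j];
        return scores;
    }

    public static void main(String[] args) {
        // Hypothetical rows: {single-bond count, double-bond count}.
        double[][] features = {{1, 2}, {2, 4}, {3, 6}, {4, 8}};
        double[][] centered = center(features);
        double[] pc1 = firstComponent(covariance(centered), 50);
        double[] scores = project(centered, pc1);
        System.out.println(java.util.Arrays.toString(pc1));
        System.out.println(java.util.Arrays.toString(scores));
    }
}
```

In this sketch each molecule is reduced from two features to a single score along the first principal component. A matching step could then compare the score of a test sample with the scores of trained patterns, although how PCA-SIM performs that comparison is not specified here.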