Investigating the Chemical Diversity of hERG Inhibitors Using Cheminformatics and Machine Learning
hERG channels regulate the heart's action potential by controlling potassium ion flow during repolarization. Based on the hypothesis that molecular properties influence inhibition potency, this study analyzed a dataset of 6362 molecules as potential hERG inhibitors using cheminformatics and machine learning techniques, examining properties such as the number of hydrogen bond acceptors (nHA), topological polar surface area (TPSA), molecular weight (MW), and cLogP. Principal component analysis (PCA) and scaffold visualization were used to explore molecular diversity and identify structural motifs. We developed 14 classification structure-activity relationship (CSAR) models with the Scikit-learn package, each validated through ten rounds of cross-validation. The Random Forest model's performance was evaluated by its accuracy on the training, cross-validation, and test sets. Potent hERG inhibitors typically had lower nHA and TPSA and slightly higher cLogP values, with no consistent trend in MW. PCA indicated an overlap between the potent and intermediate/inactive molecules, with less diversity in the potent group. The scaffold analysis identified five main cyclic skeleton systems (CSKs) among 144. The Random Forest model achieved accuracies of 0.987, 0.817, and 0.747 for the training, cross-validation, and test sets, respectively, effectively predicting hERG inhibition. Overall, lower nHA and TPSA and higher cLogP are associated with potent hERG inhibitors. Identifying common structural motifs can aid in discovering new inhibitors, and the Random Forest model proved effective in classifying compounds by their hERG inhibition activity.
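The evaluation scheme described above (accuracy on training, cross-validation, and test sets for a Random Forest classifier) can be sketched with Scikit-learn as follows. This is only an illustrative sketch, not the study's pipeline: the data are synthetic stand-ins generated with `make_classification`, the 80/20 split and the hyperparameters are assumptions, and "ten rounds of cross-validation" is interpreted here as 10-fold cross-validation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in data (assumption): NOT the 6362-molecule hERG dataset.
# In practice X would hold molecular descriptors and y the activity class.
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=8, random_state=0
)

# Hold out a test set; the 80/20 stratified split is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hyperparameters are illustrative defaults, not those used in the study.
model = RandomForestClassifier(n_estimators=100, random_state=0)

# 10-fold cross-validated accuracy on the training portion only.
cv_scores = cross_val_score(model, X_train, y_train, cv=10, scoring="accuracy")

# Fit on the full training portion, then score training and test sets.
model.fit(X_train, y_train)
train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)

print(f"training accuracy:         {train_acc:.3f}")
print(f"cross-validation accuracy: {cv_scores.mean():.3f}")
print(f"test accuracy:             {test_acc:.3f}")
```

Reporting the three accuracies side by side, as the abstract does, makes it possible to see how far training performance exceeds performance on data the model has not seen.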