MOLECULE FRAGMENTATION SCHEME AND METHOD FOR DESIGNING NEW MOLECULES

Document Type and Number:

WIPO Patent Application WO/2008/087658

Abstract:

A group-based QSAR method (G-QSAR) is reported which uses descriptors evaluated only for the substituent groups or molecular fragments, rather than for the whole molecule, to generate QSAR models. In addition, cross terms are calculated from the product of descriptors at different substituent sites or fragments and used as descriptors to improve the QSAR models. This method provides QSAR models with predictive ability similar to or better than that of conventional methods and, in addition, provides hints about the sites or fragments of the molecules to be improved. The descriptor ranges for substituents or fragments are used to search for new groups/fragments, leading to the design of novel molecules with improved activity/property.

Inventors:

DESHPANDE SUPREET K (IN)
AJMANI SUBHASH (IN)
JADHAV KAMALAKAR (IN)
KULKARNI SUDHIR A (IN)

Application Number:

PCT/IN2008/000023

Publication Date:

July 24, 2008

Filing Date:

January 16, 2008

Assignee:

VLIFE SCIENCES TECHNOLOGIES PV (IN)
DESHPANDE SUPREET K (IN)

International Classes:

G06F19/00; G06F19/16

Other References:

VARNEK A ET AL: ""In silico" design of potential anti-HIV actives using fragment descriptors" COMBINATORIAL CHEMISTRY & HIGH THROUGHPUT SCREENING, vol. 8, no. 5, August 2005 (2005-08), pages 403-416, XP008095456 ISSN: 1386-2073
SOLOV'EV V P ET AL: "Anti-HIV activity of HEPT, TIBO, and cyclic urea derivatives: structure-property studies, focused combinatorial library generation, and hits selection using substructural molecular fragments method." JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES 2003 SEP-OCT, vol. 43, no. 5, September 2003 (2003-09), pages 1703-1719, XP002492668 ISSN: 0095-2338
ESTRADA E ET AL: "Novel local (fragment-based) topological molecular descriptors for QSPR/QSAR and molecular design" JOURNAL OF MOLECULAR GRAPHICS & MODELLING ELSEVIER USA, vol. 20, no. 1, 2001, pages 54-64, XP002492669 ISSN: 1093-3263
JAPERTAS PRANAS ET AL: "Fragmental methods in the design of new compounds. Applications of the advanced algorithm builder" QUANTITATIVE STRUCTURE-ACTIVITY RELATIONSHIPS, vol. 21, no. 1, May 2002 (2002-05), pages 23-37, XP002492670 ISSN: 0931-8771
WANG JUNMEI ET AL: "Genetic algorithm-optimized QSPR models for bioavailability, protein binding, and urinary excretion." JOURNAL OF CHEMICAL INFORMATION AND MODELING 2006 NOV-DEC, vol. 46, no. 6, November 2006 (2006-11), pages 2674-2683, XP002492671 ISSN: 1549-9596
WINKLER DAVID A: "The role of quantitative structure--activity relationships (QSAR) in biomolecular discovery." BRIEFINGS IN BIOINFORMATICS MAR 2002, vol. 3, no. 1, March 2002 (2002-03), pages 73-86, XP002492672 ISSN: 1467-5463

Claims:

We claim:

1. A method to design novel molecules comprising:
a. generation of molecular fragments of given set of compounds based on defined specific rules for the set
b. evaluating properties of said fragments
c. deriving relationship of said fragment properties with molecular activity or property leading to identification of important properties of fragments
d. identifying important property ranges of fragments
e. searching the fragments in the fragment database satisfying the said ranges of important properties
f. combining the searched fragments to create novel molecules

2. The method according to claim 1 wherein for the said given set of compounds, the activities or properties are experimentally obtained from the same assay method or same experimental procedure.

3. The method according to claim 1 wherein the said fragments are derived based on common rules for the given set of compounds, where for the congeneric series of molecules, such fragments are the substituents at the substitution sites of the common template and for non congeneric series, the fragments may be derived from fragmentation of specific bonds, bonds on the ring fusion, regions of molecules that can be separated from common structural feature such as atom, bond and ring, or any pharmacophoric feature such as hydrogen bond donor, acceptor, charged group or atom, hydrophobic group, etc.

4. The method according to claim 1 wherein the said fragment properties are those obtained from various two dimensional and three dimensional molecular descriptors like: molecular weight, volume, number of hydrogen bond donors, number of hydrogen bond acceptors, number of rotatable bonds, logP values from various methods, molecular connectivity indices like Chi and ChiV, Hosoya indices, Wiener indices, topological indices, electro-topological indices, path count, chain count, kappa indices, polar surface area, electrostatic descriptors over van der Waals surface like negative potential surface area, positive potential surface area, mean potential, maximum and minimum potential, alignment independent descriptors, and other molecular descriptors.

5. The method according to claims 1 and 4 wherein the said fragment properties also include cross terms or interaction terms obtained from any mathematical operator or function such as scalar product of descriptor properties.

6. The method according to claim 1 wherein the said relationship of activity/ property with the fragment based descriptors is derived using different combinations of variable selection methods and statistical methods.

7. The method according to claim 6 wherein the said variable selection method is systematic selection such as stepwise forward selection, stepwise backward selection, stepwise forward-backward selection and stochastic selection methods such as simulated annealing, genetic algorithm.

8. The method according to claim 6 wherein the said statistical method is any of linear methods such as multiple regression method, principal component regression, partial least squares regression or any of the non-linear methods such as k-nearest neighbor method, neural networks.

9. The method according to claim 1 wherein the said ranges of fragment properties are derived from the ranges of properties or descriptors that form relationship with the said activity or property for active molecules or molecules with desired property ranges in the dataset.

10. The method according to claim 1 and 9 wherein the said novel fragments are obtained by search of fragments in database that satisfy derived ranges for all the fragment descriptors that form relationship with activity or property.

11. The method according to claim 1 and 10 wherein the said novel molecules are generated by combining derived fragments which satisfy the said property ranges of the descriptors of all fragments.

12. The method to design novel molecules using a computer program as substantially described herein particularly with reference to the description and examples.

13. A computer program for designing novel molecules comprising of
a. generation of molecular fragments of given set of compounds based on defined specific rules for the set
b. evaluating properties of said fragments
c. deriving relationship of said fragment properties with molecular activity or property leading to identification of important properties of fragments
d. identifying important property ranges of fragments
e. searching the fragments in the fragment database satisfying the said ranges of important properties
f. combining the searched fragments to create novel molecules

Description:

Molecule Fragmentation Scheme and Method for Designing New Molecules

Field of Invention

This invention relates to designing of novel molecules using a method which allows defining and identifying the properties as well as sites of molecule governing the desired activity. This method uses chemical rules for fragmenting the molecules, calculating their properties and relating them with the activity.

Background

[01] Historically, in the Hansch method, the descriptors used for QSAR were experimentally determined group properties, such as Hammett and Taft constants, that are related to the chemical environment and steric properties of groups (see Gasteiger, J. and Engel, T. Ed. "Chemoinformatics: A Textbook", Wiley-VCH, Weinheim, 2003; Oprea, T.I. Ed. "Chemoinformatics in Drug Discovery", Wiley-VCH, Weinheim, 2005; Kubinyi, H. "QSAR: Hansch analysis and related approaches", VCH, Weinheim, 1993). These group constants are considered to be independent of each other, and their interactions are completely ignored in this method.

[02] After introduction of several theoretical molecular descriptors such as topological, electro- topological, etc., the current QSAR models are generated using these descriptors that represent properties of whole molecule rather than corresponding group contributions. Although these properties have played important role in identifying relationship with the activity, the exact interpretation of these conventional QSAR models has always been a challenging task. These models do not clearly specify site at which modification is required.

[03] For this purpose, 3D-QSAR models such as CoMFA have played a vital role. See Cramer, R.D.; Patterson, D.E.; Bunce, J.D. J. Am. Chem. Soc. 1988, 110, 5959-5967; see also U.S. Patent Application Pub. No.: US5025388. The 3D-QSAR descriptors are local shape fields, i.e. steric and electrostatic fields calculated at the grid points generated around an aligned set of molecules. As the descriptor space is very large, 3D-QSAR models are generated using regression methods such as the partial least squares (PLS) method, which can reduce the dimensionality of the descriptor space. The 3D-QSAR models can provide clues for designing new molecules by specifying regions of the molecules along with their steric and electrostatic requirements. However, one of the major limitations of 3D-QSAR methods is their dependency on the molecular alignment and the choice of conformation used to generate the QSAR. In addition, for a non-congeneric series of molecules, identifying a rule for alignment would be a challenge.

[04] In order to overcome the limitations of 2D/3D QSAR methods, a recent patent application reports a 1D QSAR method that creates a 1D profile of a set of molecules having the same biological activity and then identifies the features that are common to all or most of the molecules. See Patent Application Pub. No.: WO2006055918.

[05] From the above discussion, it is clear that there is a need for a QSAR method that is site specific (in terms of molecular fragment/group) and captures the various possible interactions amongst the sites. In addition, unlike 3D-QSAR, this method does not require conformational analysis and alignment of the molecules to provide clues about the sites and the nature of the interactions responsible for activity variation.

[06] Although the above QSAR methods are used for screening virtual combinatorial libraries, they do not provide clues for the choice of substitution groups or fragments that would improve the activity or property of new molecules to be synthesized.

[07] The objective of the present invention is to design novel molecules with desired properties by overcoming some of the problems mentioned in the prior art. Another object is to develop an approach that can be applied to a wide variety of problems, i.e. deriving QSAR for congeneric / non-congeneric sets of molecules, and that provides ease of interpretation in terms of inverse QSAR, i.e. providing direction for novel molecule design. The present invention reports a method that derives a quantitative relationship of activity or property with the groups or fragments of the molecules, generated on the basis of a rule derived for the dataset under consideration. The definition of chemical rules allows flexibility to focus on the specific molecular site(s) of interest for establishing QSAR and hence can provide clues for the design of new molecules from various aspects of molecular structure.

[08] The present invention also reports method of identifying new groups or fragments based on ranges of descriptors of the groups or fragments leading to design of novel molecules with desired properties.

Summary

Present invention reports an approach which deals with molecular fragment/group (derived by applying specific chemical rules) based descriptors to build QSAR model and identify important molecular site(s) and their corresponding property to aid in novel molecule design with desired molecular activity or property.

In the present study we have demonstrated the partitioning of molecular descriptor information into substituent group or molecular fragment based descriptors. In addition, we have shown that cross terms (i.e. products of group based descriptors) can be used to improve conventional QSAR models as well as G-QSAR models.

The methodology was applied to two datasets, Cox-2 inhibitors (congeneric series) and antifungal molecules (non-congeneric), by evaluating simple 2D descriptors to generate QSAR, G-QSAR and G-QSAR-IT models using multiple regression and partial least squares regression methods.

Herein we have demonstrated that applying simple chemical rules to divide a chemical structure (to obtain the corresponding fragment descriptors for G-QSAR) can give a much better understanding of the molecular mechanism of biological activity variation than conventional QSAR. In addition, it is shown that cross terms (i.e. products of fragment based descriptors) can improve G-QSAR models. The proposed G-QSAR methodology allows ease of interpretation, unlike a conventional QSAR method, which can only suggest an important descriptor but does not indicate the site at which it has to be optimized for the design of new molecules.

Detailed Description of Invention

[09] The present invention allows deriving a quantitative relationship between activity and descriptors calculated for various molecular groups or fragments of interest. Thus the fragmentation of the molecules is a prerequisite step for performing QSAR. Hereinafter, the method reported herein, which allows generation of a QSAR model based on descriptors of groups or fragments, is designated as the G-QSAR method.

[10] Fragmentation is simple for a set of congeneric molecules: each site at which the substituents vary gives one fragment, so a molecule has as many fragments as it has varying substitution sites. The following is an example of such a case:

Structure of Anti-adrenergic compounds

X and Y are the substitution sites of a congeneric series of antiadrenergic meta-, para-, and meta,para-disubstituted N,N-dimethyl-2-bromophenethylamines. For QSAR of this set, the molecules are divided into two fragments composed of the various substituents at the two sites X and Y.

[11] A set of non-congeneric molecules, i.e. molecules having chemically diverse structures or different templates, has to be broken up using a predefined set of chemical rules, under which each molecule is considered to be composed of different fragments, as represented below with a simple example of 3 fragments:

In order to consider the environment of the neighboring fragment(s), the attachment point atoms are included in the fragments. This differentiates fragment B with respect to its environment: B includes the attachment atoms of both A and C, whereas fragments A and C each include only the attachment atom of B at their respective attachment points. For non-congeneric series, the fragments may be derived from fragmentation of specific bonds, bonds on the ring fusion, regions of molecules that can be separated from a common structural feature such as an atom, bond or ring, or any pharmacophore feature such as hydrogen bond donor, acceptor, charge, hydrophobe, and other features.
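The inclusion of attachment point atoms can be illustrated with a small sketch. The toy molecular graph, the atom indices, and the assignment of atoms to fragments A, B and C below are invented for illustration and are not taken from the application; the helper name `with_attachment_atoms` is likewise hypothetical.

```python
# Toy molecular graph: atom index -> set of bonded neighbour indices.
# A linear chain of six atoms, invented purely for illustration.
bonds = {
    0: {1},
    1: {0, 2},
    2: {1, 3},
    3: {2, 4},
    4: {3, 5},
    5: {4},
}

# Core atoms of each fragment, as some fragmentation rule might assign them.
core = {"A": {0, 1}, "B": {2, 3}, "C": {4, 5}}


def with_attachment_atoms(core_atoms, bonds):
    """Return the core atoms plus every outside atom directly bonded to them."""
    attachment = set()
    for atom in core_atoms:
        attachment |= bonds[atom] - core_atoms
    return core_atoms | attachment


fragments = {name: with_attachment_atoms(atoms, bonds) for name, atoms in core.items()}
for name in sorted(fragments):
    print(name, sorted(fragments[name]))
```

In this toy case the middle fragment B gains one atom from each neighbor, while the terminal fragments A and C each gain a single atom from B, which mirrors the distinction drawn in the text.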

[12] Once the molecular fragments are prepared, the next step in the present invention is to calculate various 2D/3D descriptors (the same as whole-molecule descriptors) for those fragments, such as the established 2D descriptors (chi indices, valence based chi indices, electro-topological indices, HBA, HBD, rotatable bonds) and/or 3D alignment independent descriptors (dipole moment, radius of gyration, group volume, group polar surface area and similar descriptors).

[13] In addition, the present invention also utilizes terms corresponding to interactions between the various fragments by calculating interaction/cross terms using a mathematical operator such as the product. As an example, if two descriptors (D1 and D2) are calculated for the two fragments A and B, the following descriptors will be generated:

D1A, D1B, D2A, D2B, D1A*D1B, D1A*D2B, D1B*D2A, D2A*D2B, D1A*D2A, D1B*D2B

where D1A and D1B are the values of descriptor D1 calculated for fragments A and B, and D2A and D2B are the values of descriptor D2 calculated for fragments A and B.
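The descriptor list in this example is the four base descriptors plus the product of every distinct pair. A minimal sketch of that enumeration follows; the numeric values and the function name `add_cross_terms` are assumptions made for illustration only.

```python
from itertools import combinations

# Hypothetical descriptor values for fragments A and B (not from the source).
descriptors = {"D1A": 2.0, "D1B": 3.0, "D2A": 0.5, "D2B": 4.0}


def add_cross_terms(values):
    """Return the base descriptors plus the product of every distinct pair."""
    expanded = dict(values)
    for left, right in combinations(sorted(values), 2):
        expanded[f"{left}*{right}"] = values[left] * values[right]
    return expanded


expanded = add_cross_terms(descriptors)
for name, value in expanded.items():
    print(name, value)
```

With four base descriptors there are six distinct pairs, which gives the ten descriptors listed in the example. The product is only one possible operator; any other function of two descriptors could be substituted inside the loop.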

[14] The third step in the present invention is to build a quantitative model. Since a large pool of descriptors is now available for building a quantitative model and not all of the descriptors are important for the activity, one needs a method to pick optimal subset of descriptors that explains variation in the activity. For this purpose various variable selection methods are available and can be coupled with variety of statistical methods available for building quantitative model.

[15] A few of the variable selection methods and quantitative model building methods used in the present invention are enumerated below:

Variable selection methods:

Stepwise forward, stepwise forward-backward, stepwise backward, simulated annealing method, genetic algorithm and others

Statistical model building methods:

Multiple regression, principal component regression, partial least squares regression, continuum regression, k-nearest neighbor, neural networks and others

In principle any variable selection method can be coupled with any statistical method of choice for building quantitative model.
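As one illustration of such a coupling, the sketch below pairs stepwise forward selection with multiple linear regression, using the residual sum of squares as the selection criterion. It is a simplified pure-Python sketch under stated assumptions: the descriptor names and data are invented, no stopping rule other than a fixed number of terms is applied, and the software used in the application may work differently.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in reversed(range(n)):
        known = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - known) / aug[row][row]
    return solution


def residual_ss(columns, y):
    """Fit y on an intercept plus the given columns; return the residual sum of squares."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(design[0])
    xtx = [[sum(r[a] * r[b] for r in design) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(design, y)) for a in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2 for r, yi in zip(design, y))


def forward_select(data, y, max_terms):
    """Greedily add the descriptor that most reduces the residual sum of squares."""
    chosen = []
    while len(chosen) < max_terms:
        candidates = [name for name in data if name not in chosen]
        best = min(candidates, key=lambda name: residual_ss([data[n] for n in chosen + [name]], y))
        chosen.append(best)
    return chosen


# Invented fragment descriptors for six molecules (hypothetical names and values).
data = {
    "A_MolWt": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "B_slogp": [0.5, -1.0, 2.0, 0.0, 1.5, -0.5],
    "C_HBD": [1.0, 0.0, 0.0, 1.0, 1.0, 0.0],
}
# Synthetic activity that depends only on A_MolWt and B_slogp.
activity = [2.0 * a + 3.0 * b for a, b in zip(data["A_MolWt"], data["B_slogp"])]

selected = forward_select(data, activity, max_terms=2)
print(selected)
```

Because the synthetic activity is built from two of the three descriptors, the selection recovers those two. Swapping `residual_ss` for a cross-validated criterion, or the greedy loop for simulated annealing, would give other combinations of the kind listed above.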

[16] The present invention also describes use of quantitative models generated.
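One use described in the abstract and claims is to derive descriptor ranges from the groups of active molecules, search a fragment database for fragments that fall inside those ranges, and combine the hits into candidate molecules. The sketch below illustrates that flow only; the substitution sites, descriptor names, fragment names and values are all hypothetical, and a real implementation would restrict the ranges to the descriptors that appear in the fitted model.

```python
from itertools import product

# Hypothetical descriptors of the R1 and R2 groups in the most active training molecules.
active_groups = {
    "R1": [{"slogp": 1.2, "mol_wt": 45.0}, {"slogp": 1.8, "mol_wt": 60.0}],
    "R2": [{"slogp": -0.5, "mol_wt": 17.0}, {"slogp": 0.1, "mol_wt": 31.0}],
}

# Hypothetical fragment database: fragment name -> descriptor values.
database = {
    "frag_a": {"slogp": 1.5, "mol_wt": 50.0},
    "frag_b": {"slogp": -0.2, "mol_wt": 20.0},
    "frag_c": {"slogp": 3.0, "mol_wt": 90.0},
}


def descriptor_ranges(groups):
    """Return {descriptor: (min, max)} over the given groups."""
    names = groups[0].keys()
    return {n: (min(g[n] for g in groups), max(g[n] for g in groups)) for n in names}


def search(database, ranges):
    """Return names of fragments whose every descriptor lies inside its range."""
    return sorted(
        name for name, values in database.items()
        if all(low <= values[d] <= high for d, (low, high) in ranges.items())
    )


# Search once per substitution site, then combine one hit per site.
hits = {site: search(database, descriptor_ranges(groups)) for site, groups in active_groups.items()}
candidates = list(product(hits["R1"], hits["R2"]))
print(candidates)
```

Each tuple in `candidates` names one fragment per site; attaching those fragments to the common template would give a candidate molecule to score with the quantitative model.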

EXAMPLE 1

1.1 Cox-2 inhibitor dataset

This method was tested on the series of Cox-2 inhibitors (NSAID) reported in the literature (see Desiraju, G. R.; Gopalakrishnan B.; Jetti, R. K. R.; Raveendra, D.; Sarma, J. A. R. P.; Subramanya, H. S., Molecules 2000, 5, 945-955). Initially we derived a conventional QSAR model from molecular descriptors and compared it with the corresponding group based QSAR model. Based on the common fragment of 1,5-diphenylpyrazole, several group based descriptors were evaluated. We used 25 molecules as the training set and 5 molecules as the test set, as described in the original paper of Desiraju et al.

Structure of Cox-2 Inhibitors

This is an example of application of the proposed methodology to a congeneric series of molecules, i.e. molecules having a common template and varying chemical substituents at various substitution sites. Since the same descriptors are calculated for the groups at different sites, the following nomenclature is used for naming a descriptor at a particular position: e.g. R1_Mol.Wt. represents the molecular weight of the group present at the R1 substitution site. The following formula was used for calculation of the interaction/cross terms of the group descriptors at different substituent sites, e.g.:

R3_slogp*R4_Mol.Wt. = R3_slogp × R4_Mol.Wt.

where R3_slogp is the value of slogP of the group at the R3 substitution site and R4_Mol.Wt. is the value of the molecular weight of the group at the R4 substitution site.

1.2 QSAR model

To build the QSAR model, the PLS regression method was applied to a selected set of 8 descriptors, which resulted in a statistically significant model with 6 PLS components, as reported in table 1.

1.3 Group/Fragment based QSAR (G-QSAR) model

The stepwise multiple linear regression analysis resulted in a significant G-QSAR model with 5 descriptors. The descriptors and the statistical parameters of the model are reported in table 1.

1.4 Group/Fragment based QSAR with interaction terms (G-QSAR-IT) model

The PLS regression method applied to a selected set of 12 descriptors, which include both group based and interaction term descriptors, led to a statistically significant QSAR model with 4 PLS components, as reported in table 1.

The conventional QSAR model (table 1) is statistically significant and indicates the significance of basic molecular properties such as hydrogen bond acceptor counts, hydrogen bond donor counts, log partition coefficient (slogP) etc. However, it does not show the site where variation is required, which makes interpretation difficult. In order to get better insight into the group descriptors that are important in explaining the variation of activity, the G-QSAR and G-QSAR-IT models were developed.

It can be seen from table 1 that substitution sites R2, R3 and R4 were found to play a major role in the G-QSAR and G-QSAR-IT models, and this is in line with the amount of variation in chemical substitution at the various substitution sites. The R1 and R5 site descriptors do not appear in the models since the variation of groups at those sites is not significant.

EXAMPLE 2

2.1 Anti-fungal dataset

The biological activity data of two series, i) heterocyclecarboxamide derivatives of 3-amino-2-aryl-1-azolyl-2-butanol and ii) 3-substituted-4(3H)-quinazolinones, reported as anti-fungal molecules, were collected from the research papers (see Bartroli, J.; Turmo, E.; Forn, J. J. Med. Chem. 1998, 41, 1855-1868 & 1869-1882). In order to consider further structural variation in the molecules in the present study, other standard anti-fungal molecules, i.e. itraconazole, voriconazole etc., were included in the dataset.

The biological activities were expressed in terms of the geometric mean of MIC values (μg/ml) against 10 yeasts (i.e. anti-candida) and against 6 filamentous fungi (i.e. anti-aspergillus). For QSAR analysis the activity was converted into the negative logarithm of the MIC values (pMIC). In the present study two activities, i.e. pMICyst and pMICff, were used, which represent the anti-candida and anti-aspergillus activities respectively.
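The activity transformation can be sketched as follows. The text states only that the negative logarithm of the MIC values was used; a base-10 logarithm applied directly to the geometric mean in μg/ml is assumed here, and the MIC values are invented.

```python
import math


def geometric_mean(values):
    """Geometric mean, computed in log space to avoid overflow on long lists."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def p_mic(mic_values):
    """Negative base-10 logarithm of the geometric-mean MIC (assumed convention)."""
    return -math.log10(geometric_mean(mic_values))


# Hypothetical MIC values (ug/ml) of one compound against several strains.
yeast_mics = [0.5, 0.25, 1.0, 0.5]
print(p_mic(yeast_mics))
```

A lower MIC means a more potent compound, so the sign change makes larger pMIC values correspond to higher activity.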

The main objective of the present study was to develop a single QSAR model for both the activities so that it can provide an insight into various structural features influencing both activities simultaneously which could finally be used for optimization and design of dual active molecule.

2.2 Rules for Molecular Fragmentation and Descriptor Calculation

The present case of anti-fungal molecules is an example of non-congeneric series of molecules, in which the molecules are considered as composed of different fragments as follows:

Template for antifungal molecules

Template - A' - B - C

Fragment A: It is defined as the part of the molecule traced from the template along either path R1 or R2 until a ring structure is found. If the ring found in the R1/R2 path is fused, the first ring is considered part of A and the second ring forms part of fragment B.

Fragment B: It is formed by a single ring structure after fragment A.

Fragment C: Finally the remaining portion of the molecule that follows fragment B is considered as fragment C.

In order to consider the environment of the neighboring fragment(s) the attachment point atoms are included in the fragments.

For QSAR analysis, various 2D descriptors (a total of 360), such as element counts, molecular weight, molecular refractivity, logP, topological index, electro-topological index, Baumann alignment independent topological descriptors etc., were calculated using the VLifeMDS software (see VLifeMDS: Molecular Design Suite developed by VLife Sciences Technologies Pvt. Ltd., Pune, India 2006).

Each molecule was divided into 3 fragments as described above, and the descriptors of the molecules (the same as in QSAR) were calculated for the various fragments of the molecule. The following are a few representative molecules and their corresponding fragments considered in the present study.

The preprocessing of the calculated descriptors led to a total of 729 descriptors. Since the same descriptors are calculated for the different fragments of the molecules, the following nomenclature is used for naming a descriptor for a particular fragment: e.g. A_Mol.Wt. represents the molecular weight of fragment A.

The following formula was used for calculation of the interaction/cross terms of the various fragments, e.g.:

B_slogp*C_Mol.Wt. = B_slogp × C_Mol.Wt.

where B_slogp is the value of slogP of fragment B and C_Mol.Wt. is the value of the molecular weight of fragment C.

In the present study the interaction/cross terms were calculated only for the fragment descriptors that were found to be significant in the G-QSAR analysis, which resulted in 240 descriptors after removing the invariable descriptors. To analyze this information and build models, various regression methods, i.e. multiple regression and PLS regression, and variable selection methods, i.e. stepwise forward-backward and simulated annealing, were used.

2.3 QSAR model

For model validation, the dataset was first divided into a training set of 81 molecules and a test set of 20 molecules. PLS regression applied to 22 selected descriptors (extracting 13 PLS components) resulted in a statistically significant model with respect to both activities; the model parameters are reported in table 2.

2.4 Group based QSAR (G-QSAR) model

The simulated annealing variable selection coupled with partial least squares regression analysis resulted in a significant G-QSAR model with 12 PLS components extracted from a selected subset of 23 descriptors. The descriptors and the statistical parameters of the model are reported in table 2. An advantage of the G-QSAR method is that it provides information about the contribution (%) of each fragment to the model, which is shown in graph 1.
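The text does not state how the percentage contribution of each fragment is computed. One plausible reading, offered here only as an assumption, is to group the model coefficients (on standardized descriptors) by the fragment prefix in the descriptor name and normalize the summed absolute values to 100%. The coefficient values and descriptor names below are invented.

```python
from collections import defaultdict

# Hypothetical coefficients of a fitted model on standardized fragment descriptors.
coefficients = {"A_MolWt": 0.4, "B_slogp": -0.6, "C_HBD": 0.5, "C_PSA": -0.5}


def fragment_contributions(coefs):
    """Percent share of the total absolute coefficient carried by each fragment prefix."""
    totals = defaultdict(float)
    for name, value in coefs.items():
        fragment = name.split("_", 1)[0]  # "C_PSA" -> "C"
        totals[fragment] += abs(value)
    grand_total = sum(totals.values())
    return {frag: 100.0 * total / grand_total for frag, total in totals.items()}


shares = fragment_contributions(coefficients)
for fragment in sorted(shares):
    print(fragment, round(shares[fragment], 1))
```

The grouping relies on the naming convention described earlier, in which each descriptor name starts with the fragment it was calculated for.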

2.5 Group based QSAR with interaction terms (G-QSAR-IT) model

The simulated annealing variable selection coupled with partial least squares regression analysis was applied to a selected set of 240 descriptors, which includes both group based and interaction term descriptors. This analysis resulted in a statistically significant G-QSAR-IT model with an optimal subset of 25 descriptors (7 PLS components), as reported in table 2. Graph 2 shows the contribution (%) of each fragment and of their interactions in the PLS model.

The resulting G-QSAR-IT model is a better model, with optimal statistical parameters, as compared with the QSAR and G-QSAR models. Graphs 3 and 4 show plots of the observed vs. predicted values for the anti-candida and anti-aspergillus activities by this model.

The above study resulted in QSAR, G-QSAR and G-QSAR-IT models built using simple 2D descriptors. It can be seen from table 2 that both the G-QSAR and G-QSAR-IT models are comparable to or better than the conventional QSAR model.

The conventional QSAR model (table 2) indicates the significance of basic molecular properties such as rotatable bond counts, log partition coefficient (slogP), polar surface area (PSA) etc. It is noticed that the count of oxygen atoms in a molecule (OxygenCounts), which indicates the importance of hydrogen bonding, has the largest influence (~12%) and is directly proportional to anti-aspergillus activity, while the flexibility of a molecule (RotatableBondCount) is found to be of major importance (~14%) and is inversely proportional to anti-candida activity. However, the model does not show the site where variation is required, which makes interpretation difficult. In order to get better insight into the fragment descriptors that are important in explaining the variation of activity, the G-QSAR and G-QSAR-IT models were developed.

It can be seen from the descriptors in the G-QSAR model (graph 1) that chemical variation in fragment C plays the major role (~50%) in determining both anti-fungal activities. In addition, it can be noticed that fragments A and B contribute equally (~25%) to anti-candida activity, whilst fragment B (~30%) influences anti-aspergillus activity more than fragment A (~20%).

The G-QSAR-IT analysis reveals that fragment A and the interaction of fragments B and C, i.e. BC, mainly influence the variation in both antifungal activities. It can also be noticed that the AB interaction is more important than the AC interaction in governing anti-candida activity, whilst the AB and AC interactions influence anti-aspergillus activity almost equally.

This study has allowed comparison of the conventional QSAR method with the proposed G-QSAR methodology. It can be noticed that, although not all the descriptors in the QSAR and G-QSAR models are the same, a few descriptors are common, which adds to the confidence in the statistical models developed using the different approaches. In addition, as an advantage over QSAR, G-QSAR also indicates the fragment from which a descriptor contributes to the model. This combination of methods allows a better interpretation of the models in terms of the contribution of each molecular fragment and/or their interactions.

Flowchart for GQSAR

Table 2: Statistical parameters and descriptors obtained for QSAR, G-QSAR and G-QSAR-IT models for anti-fungal dataset
