Title:
A METHOD FOR PREDICTION OF HERG POTASSIUM CHANNEL INHIBITION IN ACIDIC AND ZWITTERIONIC COMPOUNDS
Document Type and Number:
WIPO Patent Application WO/2015/028597
Kind Code:
A1
Abstract:
The present invention provides a method for developing predictive models of ion channel inhibition by use of a training set of compounds selected from acidic and/or zwitterionic compounds, methods for predicting ion channel inhibition of compounds selected from acidic and/or zwitterionic compounds, and computer-assisted methods of the above. More specifically, the present invention relates to methods for prediction of hERG inhibition, which is particularly useful for the screening of drugs for cardiac toxicity.

Inventors:
NIKOLOV NIKOLAI GEORGIEV (DK)
WEDEBYE EVA BAY (DK)
Application Number:
PCT/EP2014/068360
Publication Date:
March 05, 2015
Filing Date:
August 29, 2014
Assignee:
UNIV DENMARK TECH DTU (DK)
International Classes:
G01N33/68; G06F19/00
Other References:
ARONOV ET AL: "Predictive in silico modeling for hERG channel blockers", DRUG DISCOVERY TODAY, ELSEVIER, RAHWAY, NJ, US, vol. 1, no. 2, 15 January 2005 (2005-01-15), pages 149 - 155, XP027685049, ISSN: 1359-6446, [retrieved on 20050115]
WARING ET AL: "A quantitative assessment of hERG liability as a function of lipophilicity", BIOORGANIC & MEDICINAL CHEMISTRY LETTERS, PERGAMON, AMSTERDAM, NL, vol. 17, no. 6, 20 February 2007 (2007-02-20), pages 1759 - 1764, XP005895406, ISSN: 0960-894X, DOI: 10.1016/J.BMCL.2006.12.061
TOBITA M ET AL: "A discriminant model constructed by the support vector machine method for HERG potassium channel inhibitors", BIOORGANIC & MEDICINAL CHEMISTRY LETTERS, PERGAMON, AMSTERDAM, NL, vol. 15, no. 11, 2 June 2005 (2005-06-02), pages 2886 - 2890, XP027801109, ISSN: 0960-894X, [retrieved on 20050602]
ALESSIO COI ET AL: "Quantitative Structure-Activity Relationship Models for Predicting Biological Properties, Developed by Combining Structure- and Ligand-Based Approaches: An Application to the Human Ether-a-go-go-Related Gene Potassium Channel Inhibition", CHEMICAL BIOLOGY & DRUG DESIGN, vol. 74, no. 4, 1 October 2009 (2009-10-01), pages 416 - 433, XP055106607, ISSN: 1747-0277, DOI: 10.1111/j.1747-0285.2009.00873.x
SCHIESARO ANDREA ET AL: "Prediction of hERG Channel Inhibition Using In Silico Techniques", 2011, ION CHANNELS AND THEIR INHIBITORS SPRINGER-VERLAG BERLIN, HEIDELBERGER PLATZ 3, D-14197 BERLIN, GERMANY, PAGE(S) 191-239, ISSN: null, XP009176729
Attorney, Agent or Firm:
HØIBERG A/S (Copenhagen K, DK)
Claims:
1. A prediction method for predicting hERG channel inhibiting activity of chemical substances selected from the group consisting of acids and zwitterionic compounds, wherein said prediction method uses a combination of at least all descriptors a) to c):

a) a descriptor of the size of one or more conformers, and

b) a descriptor of the reactivity on nitrogen atoms of one or more conformers, and

c) a descriptor of the acidity (pKa (acidic)) of a compound.

2. The prediction method according to claim 1, wherein the prediction method uses one or more structural descriptors.

3. The prediction method according to any of the previous claims, wherein the prediction method uses one or more structural descriptors derived from conformational descriptors.

4. The prediction method according to any of the previous claims, wherein the descriptor of the size of one or more conformers is:

a) a conformational descriptor selected from the group consisting of minimal diameter (DiamMin), maximum diameter (DiamMax), Van der Waals surface (VAN_D_WAALS_SUR) and effective cross-sectional diameter (DiamEff), or

b) a structural descriptor selected from the group consisting of MaxDiamMin, MaxDiamMax, MaxVAN_D_WAALS_SUR and MaxDiamEff.

5. The prediction method according to the preceding claims, wherein the descriptor of the size of one or more conformers is a conformational descriptor selected from the group consisting of minimal diameter (DiamMin), maximum diameter (DiamMax), Van der Waals surface (VAN_D_WAALS_SUR) and effective cross-sectional diameter (DiamEff).

6. The prediction method according to the preceding claims, wherein the descriptor of the size of one or more conformers is a structural descriptor selected from the group consisting of MaxDiamMin, MaxDiamMax, MaxVAN_D_WAALS_SUR and MaxDiamEff.

7. The prediction method according to any of the preceding claims, wherein the prediction method uses at least one descriptor of the conformer effective cross-sectional diameter.

8. The prediction method according to the preceding claims, wherein the descriptor of the size of one or more conformers is the conformational descriptor DiamEff, or the structural descriptor MaxDiamEff derived by taking the maximum of the effective cross-sectional diameter (DiamEff) on a set of one or more conformers.

9. The prediction method according to the preceding claims, wherein the descriptor of the size of one or more conformers is a structural descriptor (MaxDiamEff) derived by taking the maximum of the effective cross-sectional diameter (DiamEff) for a set of one or more conformers.

10. The prediction method according to the preceding claims, wherein the descriptor of the reactivity on nitrogen atoms of one or more conformers is either a) an atomic descriptor, or b) a conformational descriptor, or c) a structural descriptor.

11. The prediction method according to the preceding claims, wherein the descriptor of the reactivity on nitrogen atoms of one or more conformers is an atomic descriptor and/or a descriptor derived from a selected type of atomic descriptor, wherein said atomic descriptor is calculated on the nitrogen atoms of a conformer, and wherein said atomic descriptor is selected from the group consisting of donor (electrophilic) superdelocalizabilities (Donor DLC), atomic self-polarizability (POLAR) and partial electron densities of the nitrogen atom in the frontier orbital (POP_LUMO).

12. The prediction method according to the preceding claims, wherein the descriptor of the reactivity on nitrogen atoms of one or more conformers is an atomic descriptor selected from the group consisting of donor (electrophilic) superdelocalizabilities (Donor DLC), atomic self-polarizability (POLAR) and partial electron densities of the nitrogen atom in the frontier orbital (POP_LUMO).

13. The prediction method according to claim 11, wherein the descriptor of the reactivity on nitrogen atoms of one or more conformers is a conformational descriptor derived by taking the maximum of the values of said atomic descriptor on all nitrogen atoms of a given conformer.

14. The prediction method according to claim 11, wherein the descriptor of the reactivity on nitrogen atoms of one or more conformers is a structural descriptor derived by taking the maximum of the values of the conformational descriptors as defined in claim 13 on a set of one or more conformers, said structural descriptor being selected from the group consisting of DEstructure, MAXPOLAR, and MAXPOP_LUMO.

15. The method according to any one of the preceding claims, wherein the predictive model uses at least one descriptor of the maximum donor (electrophilic) superdelocalizability on nitrogen atoms of one or more conformers of a chemical compound.

16. The prediction method according to the preceding claims, wherein the descriptor of the reactivity on nitrogen atoms of one or more conformers is the atomic descriptor DONOR_DLC, or the conformational descriptor DEConformer, or the structural descriptor DEstructure.

17. The prediction method according to the preceding claims, wherein the descriptor of the reactivity on nitrogen atoms of one or more conformers is DEstructure.

18. The prediction method according to any of the previous claims, wherein said prediction method uses a combination of descriptors comprising:

a) a descriptor of effective diameter of one or more conformers of a chemical compound and,

b) a descriptor of donor (electrophilic) superdelocalizability on nitrogen atoms of one or more conformers of a chemical compound, and

c) a descriptor of pKa (acidic).

19. The prediction method according to any of the previous claims, wherein said prediction method uses a combination of descriptors comprising:

a) a structural descriptor of maximum conformer effective diameter on one or more conformers of a chemical compound (MaxDiamEff) and,

b) a structural descriptor of maximum donor (electrophilic) superdelocalizability on nitrogen atoms on one or more conformers of a chemical compound (DEstructure), and

c) a structural descriptor of pKa (acidic).

20. The prediction method according to any of the previous claims, wherein said prediction method comprises the use of a binary classification model.

21. The prediction method according to claims 19 to 20, wherein said prediction method uses a predictive threshold of the maximum donor (electrophilic) superdelocalizability on the nitrogen atoms of one or more conformers (DEstructure) in the range of about 0.1 a.u./eV to about 0.4 a.u./eV, such as about 0.2 a.u./eV to about 0.3 a.u./eV, such as about 0.25 a.u./eV to about 0.3 a.u./eV, such as about 0.26 a.u./eV to 0.28 a.u./eV, such as about 0.278 a.u./eV.

22. The prediction method according to claims 19 to 21, wherein said prediction method uses a predictive threshold of the maximum conformer effective cross-sectional diameter calculated on one or more conformers of a given chemical compound or structure (MaxDiamEff) in the range of about 5 Å to 15 Å, such as 9 Å to 11 Å, such as 10 Å to 10.5 Å, or such as in the range of 10.3 Å to 10.4 Å, such as about 10.36 Å.

23. The prediction method according to claims 19 to 22, wherein said prediction method uses a predictive threshold of pKa (acidic) in the range of about 0 to about 16, such as about 2 to about 8, such as about 4 to about 6, such as about 4, or such as about 5, or such as about 6.

24. The prediction method according to claims 19 to 23, wherein said prediction method uses a predictive threshold of pKa (acidic) in the range of about 0 to 16, such as about 2 to 8, and a predictive threshold of maximum donor (electrophilic) superdelocalizability on the nitrogen atoms of one or more conformers of a chemical compound (DEstructure) in the range of about 0.1 a.u./eV to about 0.4 a.u./eV, such as within the range of about 0.2 a.u./eV to about 0.3 a.u./eV, and a predictive threshold of maximum conformer effective cross-sectional diameter calculated on one or more conformers of a given chemical compound or structure (MaxDiamEff) in the range of about 5 Å to 15 Å, such as 8 Å to 12 Å, such as 9 Å to 11 Å.

25. The prediction method according to claims 19 to 24, wherein the predictive threshold of pKa (acidic) is in the range of about 2 to 8, and the predictive threshold of maximum donor (electrophilic) superdelocalizability on the nitrogen atoms of one or more conformers of a chemical compound (DEstructure) is in the range of about 0.275 a.u./eV to 0.280 a.u./eV, and the predictive threshold of maximum conformer effective cross-sectional diameter calculated on one or more conformers of a given chemical compound or structure (MaxDiamEff) may be in the range of 9 Å to 11 Å, such as in the range of about 10.3 Å to about 10.4 Å.

26. The prediction method according to any of the previous items, wherein the prediction method uses 1 to 10 descriptors, such as 1 to 5 descriptors, such as 1 to 3 descriptors.

27. The prediction method according to any of the previous items, wherein the descriptors are calculated using a set of conformers that consists of at least one conformer per chemical structure, such as a set of conformers that consists of 1 to 500 conformers, such as a set of 1 to 50 conformers per structure.

28. A method for developing a prediction method of hERG channel inhibiting activity of chemical substances, wherein the prediction method is obtained by training on a set of compounds which is divided based on ionization.

29. The method according to claim 28, wherein said prediction method is obtained by training on a set of compounds selected from a group consisting of acids and/or zwitterionic compounds.

30. The method according to claims 28 and 29 wherein said prediction method is further defined by claim 1 to 27.

31. A computer-assisted method as defined in claims 1 to 27 or as defined in claims 28 to 30.

32. A computer program product comprising a computer-assisted method as defined in claim 31.

33. A data carrier comprising a computer-assisted method as defined in claim 31.

Description:
A method for prediction of hERG potassium channel inhibition in acidic and zwitterionic compounds

Field of invention

The present invention provides a method for developing predictive models of hERG ion channel inhibition by use of a training set of compounds selected from acidic and/or zwitterionic compounds, as well as a predictive method of hERG ion channel inhibition activity of compounds selected from acidic and/or zwitterionic compounds. Also, the present invention relates to computer assisted methods of the above. Such methods are useful for the screening of drugs for cardiac toxicity.

Background of invention

Ion channels are cellular proteins that regulate the flow of ions, including potassium, calcium, chloride and sodium into and out of cells. Such channels are present in all animal and human cells and affect a variety of processes including neuronal transmission, muscle contraction, and cellular secretion. Potassium (K+) channels are structurally and functionally diverse families of potassium selective channel proteins, which are ubiquitous in cells, and have central importance in regulating a number of key cell functions for example in the brain, heart, pancreas, prostate, kidney, gastro-intestinal tract, small intestine and peripheral blood leukocytes, placenta, lung, spleen, colon, thymus, testis and ovaries, epithelia and inner ear organs. Humans have over 70 genes encoding potassium channel subtypes (Jentsch Nature Reviews Neuroscience 2000, 1, 21-30) with a great diversity with regard to both structure and function. While widely distributed as a class, potassium channels are differentially distributed as individual members of this class or as families. The human ether-a-go-go-related gene (hERG) encodes the pore forming alpha subunit of the hERG potassium ion channel (also called Kv11.1 or KCNH2) which plays a crucial role in repolarization of the heart and mediates the repolarizing IKr current in the cardiac action potential. Inhibition or blocking of the channel is associated with QT interval prolongation (long QT syndrome) which in turn may cause torsades de pointes, a potentially fatal arrhythmia (Mitcheson 2000, Sanguinetti 2006). Blockade of hERG has been extensively investigated in the recent decade, including mechanistic studies, a number of in silico approaches (reviews are available e.g. in Aronov 2005, Schiesaro et al. 2011), and new in vitro assays proposed as alternatives to the traditional cost-intensive patch-clamp method.
Although recent models of other important cardiotoxicity endpoints exist, hERG blocking remains an important marker for cardiac risk.

A high number of clinically successful drugs have had the tendency to inhibit hERG, and create a concomitant risk of sudden death, as a side-effect, which is a common reason for drug failure in preclinical trials. Therefore hERG inhibition is an important activity that must be avoided during drug development, and the need for assessment of QT prolongation liability of drugs under development is recognized in topic E14 of the International Conference on Harmonization in 2005. However, a relatively diverse group of drugs have been found to induce arrhythmias by blockage of hERG ion channels.

Predictive models for hERG inhibition can assist the elimination of possibly cardiotoxic drug candidates at an early stage in drug design. In this way, both the costs of drug development and the time spent on development may be reduced since the research can be focused on drug candidates with decreased risk of cardiotoxicity.

Due to the necessity of the elimination of cardiotoxicity, a high number of predictive models have been developed in order to predict inhibition or binding to hERG. Since hERG is a relatively promiscuous target which has been shown to interact with pharmaceuticals of a highly varying structure, the development of an accurate prediction method is a difficult task.

A number of previously developed predictive modeling efforts have included molecular docking, pharmacophore-based positive predictions and quantitative structure-activity relationship (QSAR) models. The application of these often either requires substantial manual intervention into calculation procedures (e.g. in molecular docking) or is optimized to predictions of presence and not absence of activity (e.g. in pharmacophore-based models). Also, the use of multiple descriptors for example in QSAR models makes the predictions less transparent and the results difficult to interpret.

It has been found that the addition of an acidic ionogenic group to a potential drug candidate could reduce hERG binding. This has led to particular attention on acidic compounds in order to avoid cardiotoxicity and hERG binding. Still, a considerable number of acidic compounds have been shown to bind hERG channels, and the physico-chemical attributes which discriminate acidic compounds capable of blocking hERG from acidic compounds which do not bind hERG have not been described in detail. Thus, there is a specific need for prediction and analysis of hERG binding in acidic and zwitterionic chemicals. A recent review (Taboureau et al. 2011) has disclosed that hERG ion channel pharmacophore models from a number of different studies agree that charged nitrogen (hydrogen bond acceptor) and aromatic rings (hydrophobic feature) were important features to consider in hERG binding. The article does not disclose the descriptors effective cross-sectional diameter and donor (electrophilic) superdelocalizability.

A scientific article by Waring et al. has suggested that the ionization status of a compound influences the hERG potency, but lipophilicity is shown to be a stronger driver for the hERG potency. Waring does not disclose or suggest QSAR models for compounds divided based on ionization status or pKa values.

The descriptor donor (electrophilic) superdelocalizability on oxygen atoms has been successfully used to model androgen receptor binding (Todorov et al. 2011). However, the article neither indicates nor suggests that the donor (electrophilic) superdelocalizability on nitrogen atoms may be successful for modeling inhibition of the unrelated hERG ion channel.

Summary of invention

The present invention provides a method for prediction of hERG binding and/or inhibition which is based on a relatively low number of descriptors, and which at the same time gives a high predictive performance and transparency. A high transparency is particularly favourable in the design of new drugs with low cardiotoxicity, since it provides a simplified interpretation of the attributes of a compound which influence hERG binding and/or inhibition activity.

Thus the present invention in one aspect provides a method for developing a predictive model of hERG channel inhibiting activity of chemical substances, wherein the predictive model is obtained by training on a set of compounds which is divided based on ionization.

In further aspects, the invention relates to a method for predicting hERG channel inhibiting activity of chemical substances selected from the group consisting of acids and zwitterionic compounds, wherein the method comprises the use of a predictive model. The predictive model may be as herein defined.

In one embodiment of the present invention, a prediction method for predicting hERG channel inhibiting activity of chemical substances selected from the group consisting of acids and zwitterionic compounds uses a combination of all descriptors a) to c):

a) a descriptor of the size of one or more conformers of a chemical compound, and

b) a descriptor of the reactivity on nitrogen atoms of one or more conformers of a chemical compound, and

c) a descriptor of the acidity (pKa (acidic)) of a chemical compound.

In a preferred embodiment of the present invention, the prediction method uses a combination of the below descriptors a) to c):

a) a descriptor of effective diameter of one or more conformers of a chemical compound and,

b) a descriptor of donor (electrophilic) superdelocalizability on nitrogen atoms of one or more conformers of a chemical compound, and

c) a descriptor of pKa (acidic) of a chemical compound.

In a still more preferred embodiment of the present invention, the prediction method uses a combination of descriptors comprising:

a) a structural descriptor of maximum conformer effective diameter on one or more conformers of a chemical compound (MaxDiamEff), and

b) a structural descriptor of maximum donor (electrophilic) superdelocalizability on nitrogen atoms on one or more conformers of a chemical compound (DEstructure), and

c) a structural descriptor of pKa (acidic).

Furthermore, the invention relates to a computer-assisted method or prediction model further defined as in the present application.

Detailed description of the invention

Definitions:

Acid: In the present invention, an acid is meant as defined by the terms conventionally used in the art. The strength of an acid is commonly described by use of its dissociation constant Ka or the negative logarithm of the dissociation constant, pKa. The larger the value of pKa, the smaller the extent of dissociation at any given pH. A compound is an acid if it has one or more acidic ionogenic groups which have a pKa less than 16. A weak acid has a pKa in the range of -2 to 16 in water. Strong acids are almost completely dissociated in water and have a pKa less than -2 as determined experimentally or theoretically.
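The acid definition above can be illustrated with a short Python sketch. The function names are hypothetical and not part of the described method. The pKa bands (-2 and 16) are taken from the definition, with the boundary value -2 assigned to the weak band as an assumption, while the dissociated fraction uses the standard Henderson-Hasselbalch relation for a monoprotic acid, which is an assumption added here for illustration.

```python
import math


def pka_from_ka(ka: float) -> float:
    """Negative base-10 logarithm of the dissociation constant Ka."""
    return -math.log10(ka)


def acid_strength(pka: float) -> str:
    """Band a pKa value using the limits given in the definition above."""
    if pka < -2:
        return "strong acid"
    if pka < 16:
        return "weak acid"
    return "not an acid"


def fraction_dissociated(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch estimate for a monoprotic acid (assumption)."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))
```

At a fixed pH, a larger pKa gives a smaller dissociated fraction, which is the behaviour the definition describes.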

Applicability domain (AD): AD of a model is the physico-chemical, structural or biological space, knowledge or information on which the training set of the model has been developed, and for which it is applicable to make predictions for new compounds. For example the AD of a QSAR model is defined on the basis of the training set which has been used for developing the QSAR model.

Base: In the present invention, a base is meant as defined by the terms conventionally used in the art. A compound is a base if it has one or more basic ionogenic groups which have a pKb below 16. The strength of a base is commonly described by use of its dissociation constant Kb or the negative logarithm of the dissociation constant, pKb. The larger the value of pKb, the smaller the extent of dissociation at any given pH.

Binary classification model: A model that predicts two categorical values, such as for example 1) being a hERG ion channel inhibitor or, 2) not being a hERG ion channel inhibitor.

Half maximal inhibitory concentration (IC50): A measure of the effectiveness of a compound in inhibiting biological or biochemical function. This quantitative measure indicates how much of a particular drug or other substance is needed to inhibit a given biological process by half. In other words, it is the half maximal (50%) inhibitory concentration (IC) of a substance (50% IC, or IC50). In the present invention, the biological process is defined by the functionality of the hERG channels, for example the transportation of potassium across a membrane, or such as the repolarizing IKr current in the cardiac action potential. The IC50 of hERG can for example be measured by use of conventional techniques in the art such as patch clamp assays of mammalian cell lines expressing hERG or radioligand binding assays.
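To make the meaning of IC50 concrete, the following Python sketch evaluates the inhibited fraction at a given concentration. It assumes a Hill-type dose-response curve, which the text above does not specify; the function name and the default Hill coefficient of 1 are illustrative choices only.

```python
def fraction_inhibited(conc: float, ic50: float, hill: float = 1.0) -> float:
    """Inhibited fraction under an assumed Hill-type dose-response curve.

    conc and ic50 must use the same concentration unit.
    """
    if conc < 0 or ic50 <= 0:
        raise ValueError("conc must be >= 0 and ic50 must be > 0")
    if conc == 0:
        return 0.0
    return 1.0 / (1.0 + (ic50 / conc) ** hill)
```

Under this assumption the inhibited fraction is exactly one half when the concentration equals the IC50, matching the definition.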

QSAR: A quantitative structure-activity relationship (QSAR) model uses descriptors (predictor variables derived from physico-chemical properties or theoretical molecular descriptors of chemicals) for prediction of activity of compounds. For example in the case of a hERG inhibition QSAR model, a regression QSAR model relates predictor variables (descriptors) to the hERG binding or inhibition of a compound (IC50, Ki or % inhibition). A classification QSAR model relates the predictor variables (descriptors) to a categorical value of the response variable. The descriptors can be related to a value of hERG inhibition activity, such as for example hERG IC50. In the case of a binary hERG inhibition QSAR classification model, the descriptors can be related to a value of 1) being a hERG inhibitor or, 2) not being a hERG inhibitor.

Zwitterionic ampholyte: An amphoteric compound (zwitterionic compound) with both acidic and basic ionogenic groups and wherein the pKa of the acidic group (pKa (acidic)) is less than the pKa of the basic group (pKa (basic)), thus pKa (acidic) < pKa (basic).
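The ionization classes defined above lend themselves to a simple rule, sketched here in Python. The function names and the labels "ordinary ampholyte" and "neutral" are hypothetical and introduced only for illustration. A missing ionogenic group is represented by None, and the presence of a basic group is taken as given by the caller rather than derived from pKb.

```python
from typing import Optional


def ionization_class(pka_acidic: Optional[float],
                     pka_basic: Optional[float]) -> str:
    """Assign an ionization class from pKa (acidic) and pKa (basic)."""
    has_acid = pka_acidic is not None and pka_acidic < 16
    has_base = pka_basic is not None
    if has_acid and has_base:
        # Zwitterionic ampholyte: pKa (acidic) < pKa (basic).
        if pka_acidic < pka_basic:
            return "zwitterionic ampholyte"
        return "ordinary ampholyte"  # hypothetical label for the other case
    if has_acid:
        return "acid"
    if has_base:
        return "base"
    return "neutral"  # hypothetical label


def in_scope(label: str) -> bool:
    """True for the classes the prediction method is stated to address."""
    return label in {"acid", "zwitterionic ampholyte"}
```

Such a rule could be used to divide a compound set by ionization before training or prediction.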

Conformer effective cross-sectional diameter: A descriptor that is also called DiamEff. The descriptor is defined as the diameter of the least-diameter cylinder containing the conformer (this parameter depends on the conformation, therefore the three-dimensional coordinates of all atoms are used for calculating the conformer effective cross-sectional diameter). The maximum of DiamEff (MaxDiamEff) over several conformers is a measure of both size and flexibility of the whole structure of a chemical compound. We use the implementation of DiamEff in OASIS Database Manager 1.7.3 (http://oasis-lmc.org) calculated according to the following definition.

Let $I$ be a set of points in $\mathbb{R}^3$ so that each atom of the conformer is represented by exactly one element of $I$ which has the same three-dimensional coordinates as the atom. Then DiamEff is defined by formula 1 below:

$$\mathrm{DiamEff} = \min_{l \text{ is a line in } \mathbb{R}^3} \max\{\, d(l, c) \mid c \in I \,\} \qquad (1)$$

where $\mathbb{R}^3$ is the three-dimensional Euclidean space of real numbers and $d(x, y)$ denotes the Euclidean distance between a line $x$ and a point $y$ in $\mathbb{R}^3$ (the definition of a smallest encompassing cylinder can be found e.g. in (Schomer et al. 2000)).

Donor (electrophilic) superdelocalizability $D_E$: The atomic descriptor donor (electrophilic) superdelocalizability is a variant of reactivity indices in the Hückel molecular orbital scheme and was originally defined by (Fukui et al. 1954) and implemented into MOPAC (Stewart 1990 and 1993, http://openmopac.net/manual/super.html) and used in the Oasis DatabaseManager system (http://oasis-lmc.org) to calculate the capability of atoms to make covalent bonds by donation of electrons. The donor (electrophilic) superdelocalizabilities are calculated according to the method described in (Schuurmann 1990A and Schuurmann 1990B):

$D_E(r)$, the donor (electrophilic) delocalizabilities of a reactant's centre $r$ according to Fukui et al. 1961 can be defined within all-valence electron schemes as described in Schuurmann 1990A, according to the following formula 2:

$$D_E(r) = 2 \sum_{i}^{\mathrm{occ}} \sum_{\sigma}^{(r)} \frac{c_{\sigma i}^{2}}{\alpha - \varepsilon_i} \qquad (2)$$

In these formulas (Schuurmann 1990A) the outer sums go over all occupied ('occ') molecular orbitals $i$ of the molecule in the self-consistent field (SCF) ground state, and the inner sums put together the contributions of all atomic orbitals $\sigma$ belonging to the center $r$ of interest. In particular, $c_{\sigma i}$ is the linear combination of atomic orbitals - molecular orbitals (LCAO-MO) coefficient of atomic orbital $\sigma$ at center $r$ in the molecular orbital $i$, $\varepsilon_i$ is the energy of the $i$-th molecular orbital and $\alpha$ is according to formula 3 below defined as the average of the HOMO and LUMO energies, i.e.

$$\alpha = \frac{1}{2}\left(\varepsilon_{\mathrm{HOMO}} + \varepsilon_{\mathrm{LUMO}}\right) \qquad (3)$$
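The double sum described above can be sketched in Python as follows. The sketch assumes the form D_E(r) = 2 Σ_i(occ) Σ_σ(on r) c_σi² / (α − ε_i), in which the prefactor of 2 and the sign of the denominator are assumptions inferred from the verbal description, not a confirmed transcription. The function names, argument layout and toy numbers are hypothetical, and in practice the coefficients and orbital energies would come from a semi-empirical calculation.

```python
from typing import Sequence


def mo_shift_alpha(e_homo: float, e_lumo: float) -> float:
    """Average of the HOMO and LUMO energies (formula 3)."""
    return 0.5 * (e_homo + e_lumo)


def donor_superdelocalizability(coeffs: Sequence[Sequence[float]],
                                energies: Sequence[float],
                                n_occ: int,
                                ao_on_center: Sequence[int]) -> float:
    """Assumed form: 2 * sum over occupied MOs of (sum of c^2 on r) / (alpha - e_i).

    coeffs[i][s] is the LCAO-MO coefficient of atomic orbital s in MO i;
    energies are sorted ascending, so MO n_occ - 1 is the HOMO.
    """
    alpha = mo_shift_alpha(energies[n_occ - 1], energies[n_occ])
    total = 0.0
    for i in range(n_occ):
        density_on_r = sum(coeffs[i][s] ** 2 for s in ao_on_center)
        total += density_on_r / (alpha - energies[i])
    return 2.0 * total


# Toy two-orbital system with one occupied MO (illustrative numbers only).
c = 2 ** -0.5
toy_value = donor_superdelocalizability(
    coeffs=[[c, c], [c, -c]], energies=[-10.0, 2.0], n_occ=1, ao_on_center=[0])
```

With energies in eV, the result carries the unit a.u./eV that the thresholds in the claims use.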

In the context of the present invention:

• Given a specific conformer, the maximum of the atomic descriptor donor (electrophilic) superdelocalizability $D_E$ (where the maximum is calculated among all nitrogen atoms of the conformer) will be called the conformational maximum of donor (electrophilic) superdelocalizability and denoted by DEConformer.

• Given a specific structure, the maximum of DEConformer on all available conformers (i.e. a set of one or more conformers) of the structure will be called the structural maximum of donor (electrophilic) superdelocalizability and denoted by DEstructure. In other words, DEstructure will be equal to the maximum of the atomic descriptor donor (electrophilic) superdelocalizability $D_E$ where the maximum is calculated among all nitrogen atoms of all available conformers of the given structure.

Following the definition, we note that DEConformer is a conformational descriptor and DEstructure is a structural descriptor.
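The two-level maximum defined above (over nitrogen atoms within a conformer, then over conformers) can be sketched in Python. The data layout, a list of (element symbol, atomic value) pairs per conformer, and the function names are hypothetical. Returning None when no nitrogen atom is present is an assumption, since the text does not say how such structures are handled.

```python
from typing import List, Optional, Tuple

Conformer = List[Tuple[str, float]]  # (element symbol, atomic D_E value)


def de_conformer(conformer: Conformer) -> Optional[float]:
    """Maximum atomic D_E over the nitrogen atoms of one conformer."""
    values = [value for element, value in conformer if element == "N"]
    return max(values) if values else None


def de_structure(conformers: List[Conformer]) -> Optional[float]:
    """Maximum of the conformational maxima over all available conformers."""
    maxima = [m for m in map(de_conformer, conformers) if m is not None]
    return max(maxima) if maxima else None


# Illustrative numbers only, not measured or computed data.
example = [
    [("C", 0.40), ("N", 0.21), ("N", 0.26)],
    [("C", 0.35), ("N", 0.29), ("O", 0.50)],
]
```

The same max-over-conformers pattern yields MaxDiamEff from per-conformer DiamEff values.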

Decision tree classifier: A non-parametric machine learning technique. A decision tree can be used as a predictive model which maps observations about an item to conclusions about the item's target value. More descriptive names for such tree models are classification trees or regression trees. In these tree structures, leaves represent class labels and branches represent conjunctions of features that lead to those class labels. In the example of a QSAR decision tree model, observations regarding an item are for example various descriptors of the structure and physico-chemical nature of a compound.

Minimal diameter of a specific conformation of a conformer (DiamMin): The descriptor is defined as the minimum distance between two parallel planes circumscribing a conformer (Dimitrov et al 2003, see also Brooke and Cronin 2009).

Let $I$ be a finite set of points in $\mathbb{R}^3$ so that each atom of a conformer is represented by exactly one element $i \in I$ which has the same three-dimensional coordinates $x_i, y_i, z_i$ as the atom. Then DiamMin is defined by formula 4:

$$\mathrm{DiamMin} = \min_{\substack{a, b, c, d_1, d_2 \in \mathbb{R} \\ d_1 < d_2 \\ \forall i \in I:\ (p_i + d_1)(p_i + d_2) \le 0 \\ p_i = a x_i + b y_i + c z_i}} (d_2 - d_1) \qquad (4)$$

where $\mathbb{R}$ is the real line.

Maximum diameter of a specific conformation of a conformer (DiamMax): The descriptor is defined as the diameter of the smallest sphere circumscribing a conformer (Dimitrov et al 2003, Brooke and Cronin 2009). Let $I$ be a finite set of points in $\mathbb{R}^3$ so that each atom of the conformer is represented by exactly one element $i \in I$ which has the same three-dimensional coordinates $x_i, y_i, z_i$ as the atom. Then DiamMax is defined by formula 5:

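$$\mathrm{DiamMax} = \min_{\substack{x_0, y_0, z_0, r \in \mathbb{R} \\ \forall i \in I:\ (x_i - x_0)^2 + (y_i - y_0)^2 + (z_i - z_0)^2 \le r^2}} r \qquad (5)$$

where $\mathbb{R}$ is the real line.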
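As an illustration of the minimal diameter DiamMin defined in formula 4, the Python sketch below estimates the smallest distance between two parallel planes enclosing a set of atom coordinates. It is not the OASIS implementation. It assumes unit plane normals, so that the plane separation equals the spread of the projections, and it samples directions on a Fibonacci lattice, so the result is an upper bound that tightens as more directions are used. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def width_along(points: Sequence[Point], normal: Point) -> float:
    """Separation of the two enclosing planes perpendicular to `normal`."""
    nx, ny, nz = normal
    norm = math.sqrt(nx * nx + ny * ny + nz * nz)
    proj = [(x * nx + y * ny + z * nz) / norm for x, y, z in points]
    return max(proj) - min(proj)


def fibonacci_directions(n: int) -> List[Point]:
    """Roughly uniform unit vectors on the sphere (Fibonacci lattice)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    directions = []
    for k in range(n):
        z = 1.0 - (2.0 * k + 1.0) / n
        radius = math.sqrt(max(0.0, 1.0 - z * z))
        phi = k * golden_angle
        directions.append((radius * math.cos(phi), radius * math.sin(phi), z))
    return directions


def diam_min_estimate(points: Sequence[Point], n_directions: int = 2000) -> float:
    """Upper-bound estimate of DiamMin from sampled plane normals."""
    return min(width_along(points, d) for d in fibonacci_directions(n_directions))
```

For a planar arrangement of atoms the estimate approaches zero, since a normal nearly perpendicular to the plane is among the sampled directions.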

Van der Waals surface area of a specific conformer (VAN_D_WAALS_SUR): It is defined by conventional methods as the area of a surface formed by the spheres of van der Waals radii around the atoms of the conformer (as described in Meyer 1985). The Van der Waals surface area is calculated by conventional algorithms including tables for the radii of different atoms. One such algorithm is presented in (Gaudio and Takahata 1992). Another method for calculation of the Van der Waals surface area is for example the proprietary algorithm implemented in OASIS Database Manager 1.7.3, commercially available from the Laboratory of Mathematical Chemistry, University of Burgas, Bulgaria (http://oasis-lmc.org), which in turn uses the free software MOPAC to calculate some of its descriptors.

Self-polarizability of a nitrogen atom (POLAR): Self-polarizability $\pi_S(r)$ of an atom $r$ in a conformer was introduced as a reactivity measure for π electron systems by (Coulson and Longuet-Higgins 1947); the all-valence electron formula is defined by formula 6:

(6) following http://openmopac.net/manual/super.html. Note that the meaning of the quantities used in the formula is the same as in the definition of donor (electrophilic) delocalizability in formula (2), namely the outer sums go over all occupied ('occ') molecular orbitals $i$ of the conformer in the self-consistent field (SCF) ground state, the middle sums go over all unoccupied ('vac') molecular orbitals $k$ of the conformer in the self-consistent field (SCF) ground state and the inner sums put together the contributions of all atomic orbitals $\sigma$ belonging to the center $r$ of interest. In particular, $c_{\sigma i}$ is the linear combination of atomic orbitals - molecular orbitals (LCAO-MO) coefficient of atomic orbital $\sigma$ at center $r$ in the molecular orbital $i$, $\varepsilon_i$ is the energy of the $i$-th molecular orbital and $\alpha$ is according to formula 3 defined as the average of the HOMO and LUMO energies. It will be understood that the self-polarizability of a nitrogen atom can be calculated from the above formula where $r$ is nitrogen.

Population Lowest Unoccupied Molecular Orbital energy (POP_LUMO): The population LUMO (Lowest Unoccupied Molecular Orbital) energy is a descriptor assessing the partial electron densities of the nitrogen atom in the frontier orbital (from MOPAC 93). The descriptor can be calculated by proprietary software, OASIS Database Manager 1.7.3, commercially available from the Laboratory of Mathematical Chemistry, University of Burgas, Bulgaria (http://oasis-lmc.org), which in turn uses the free software MOPAC to calculate some of its descriptors.

Models for predicting hERG ion channel binding

Predictive models (also called prediction methods herein) are commonly used in drug design today as a low-cost tool for screening chemical compounds for biological activity. Predictive modelling is particularly useful for prediction of toxicity. Early-stage screening of the toxicity of compounds can potentially focus the drug development on compounds which have relatively low toxicity, and thereby avoid the large costs and time spent in the development of drugs which are later found to be toxic in clinical trials. The predictive quality of models for chemical compounds is closely connected to the quality and quantity of the data available for training, and to the complexity of the interaction between the target molecule (for example a receptor or an ion channel) and the binding compound (the ligand). Low data quality in combination with a high complexity of the interaction results in low predictive performance. Therefore, methods for increasing the quality of the data and/or reducing the complexity of the modelled binding are beneficial in the development of models for prediction.

The hERG ion channel is a promiscuous receptor and is a target for compounds of highly variable structure and with diverse physico-chemical properties. This feature of the hERG ion channel increases the complexity of predictive models for hERG inhibition, and makes the task of developing high-performance predictive models more difficult.

The inventors of the present invention have surprisingly found that the division of a training set according to the acidic and/or zwitterionic status of compounds can effectively reduce the complexity of the predictive model of hERG ion channel inhibition and lead to high predictive performance and/or a reduction of the number of descriptors of a model developed from the training set. In one aspect, the present invention provides a method for developing models of hERG ion channel inhibition wherein the training set for development of the model is divided based on the ionization tendency of the compounds, such as for example based on the pKa (acidic) and/or pKb (basic) and/or the relationship between pKa (acidic) and pKa (basic) in a zwitterionic ampholyte. More preferably, the present invention provides a method wherein a predictive model of hERG ion channel inhibition (IC50) is obtained by training on a set of compounds selected from a group consisting of acids and/or zwitterionic compounds, such as for example a data set confined to compounds having at least one acidic group and either no basic ionogenic groups, or both acidic and basic ionogenic groups with pKa (acidic) < pKa (basic).
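The selection rule above can be illustrated with a minimal Python sketch. It is not the applicants' implementation; the record fields `pka_acidic` and `pka_basic` are hypothetical names, and a value of `None` is assumed to mean that the compound has no ionogenic group of that kind. The pKa values themselves would come from any conventional prediction tool.

```python
from typing import Optional


def ionization_class(pka_acidic: Optional[float], pka_basic: Optional[float]) -> str:
    """Classify a compound from its predicted pKa values (None = no such group)."""
    if pka_acidic is None:
        return "other"        # no acidic group: outside the acid/zwitterion set
    if pka_basic is None:
        return "acid"         # at least one acidic group, no basic ionogenic group
    if pka_acidic < pka_basic:
        return "zwitterion"   # both kinds of group and pKa(acidic) < pKa(basic)
    return "other"


def select_training_subset(compounds):
    """Keep only the acidic and zwitterionic compounds of a data set."""
    keep = ("acid", "zwitterion")
    return [
        c for c in compounds
        if ionization_class(c["pka_acidic"], c["pka_basic"]) in keep
    ]
```

For example, a record with `pka_acidic=4.2` and `pka_basic=None` is kept as an acid, while a record without any acidic group is left out of the training subset.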

Predictive models for hERG ion channel inhibition according to the present invention may be developed by use of any machine-learning method conventionally used in the field, such as, but not limited to, partial least squares, artificial neural networks, support vector machines, decision trees, Bayesian probabilistic methods, self-organizing maps, recursive partitioning methods and genetic algorithms. In a preferred embodiment of the present invention, the predictive model uses a decision tree. In order to optimize the use of all available experimental data on hERG and achieve the largest possible training set, a binary classification approach may be chosen for the prediction of hERG channel inhibition. This approach is for example useful for data points where the activity is only available as 'less than' or 'greater than' a specific value. Moreover, as data may have been generated by different protocols and as inter-laboratory variation may occur, the IC50 results may not be the optimal basis for making a so-called continuous model, since experimental errors lead to different results, which will influence the training of a predictive model. The binary classification approach has two possible response variable outcomes, i.e. either a positive prediction (for example equivalent to hERG inhibiting activity) or a negative prediction (for example equivalent to no inhibiting activity). This is in contrast to the continuous prediction approach where the exact IC50 value of a compound is predicted. The present inventors have found that the use of a binary classification model is useful for prediction of hERG ion channel inhibition. Thus in a preferred embodiment of the present invention, the predictive model is a binary classification model. Such a binary classification model can be developed by use of a binary decision tree classifier.
Thus in a preferred embodiment of the present invention, the predictive model is developed by use of a binary decision tree classifier.

Algorithms for construction of binary decision tree classifiers split the training set in two so that the content of inhibitors and non-inhibitors is different enough between the two partitions, according to predefined fitness criteria. If the split is not sufficient to achieve the desired classification accuracy, one or both parts are in turn partitioned, until a stop condition is met. Stop conditions relate to the achieved precision (e.g. concordance) as a result of the split. A skilled person will appreciate that various conventionally used stop conditions may be used in the development of a binary decision tree classifier.

See5 is a state-of-the-art classifier construction system (Quinlan (1993 and 1997)) using decision trees, a non-parametric machine-learning technique. See5 and its predecessors use formulas based on information theory to evaluate the "goodness" of a test; in particular, they choose the test that extracts the maximum amount of information from a set of cases, given the constraint that only one attribute is tested. To this end, the entropy criterion formula 7 is used:

$$\mathrm{Entropy} = -\sum_{j=1}^{k} \frac{n_j}{N} \log_2 \frac{n_j}{N} \qquad (7)$$

where N is the total number of observations, k is the number of classes and n_j is the number of observations belonging to class j. The entropy of an information item is a measure of its randomness or uncertainty or can be taken as a measure of the average amount of information that is supplied by the knowledge of the information item. In a preferred embodiment of the present invention, the decision tree is based on the See5 algorithm or predecessors of See5 as described above.
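As an illustration of the entropy criterion, the following Python sketch computes the entropy of a set of class labels and uses it to pick the threshold on a single attribute that yields the largest reduction in entropy (information gain). It is a simplified sketch, not See5 itself: the function names are hypothetical, only one numeric attribute is tested, and refinements used by real tree builders (such as pruning) are omitted.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy = -sum_j (n_j / N) * log2(n_j / N) over the classes present."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting on value <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - remainder


def best_threshold(values, labels):
    """Return the midpoint cut with the highest information gain, or None."""
    distinct = sorted(set(values))
    cuts = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    if not cuts:
        return None
    return max(cuts, key=lambda t: information_gain(values, labels, t))
```

A set with equal numbers of inhibitors and non-inhibitors has an entropy of one bit; a split that separates the two classes completely removes all of it.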

In one embodiment of the present invention, the training set for developing a model of hERG channel inhibiting activity consists of a set of active chemicals (below a certain threshold for IC50), a set of inactive chemicals (above a certain threshold for IC50, possibly different from the first) and a set of chemicals of marginal activity (with IC50 between the two thresholds, in case they are different). The set of marginals may or may not be used for training. Introducing a set of marginals may also be beneficial for the model performance because a single theoretical breakpoint between active chemicals and inactive chemicals is difficult to define, specifically in cases where there is variation in the experimental test results for hERG blocking affinity. Even with a known theoretical breakpoint, data points close to the breakpoint are more likely to be misclassified. Such misclassified data points, if included in the basis of the modelling, may 'confuse' the training of the predictive model. In addition, some data points may only describe the activity as 'less than' or 'greater than' a specific value, for example '< 10 μM' or '≥ 40 μM'. In a more preferred embodiment of the present invention, the training set for developing a model of hERG channel inhibiting activity is confined to compounds having IC50 < about 10 μM (active compounds) and compounds having IC50 ≥ about 40 μM (inactive compounds). Thus, in one embodiment of the present invention, when the predictive model is a binary classification model, a negative prediction (related to a compound that is not a hERG ion channel inhibitor) is associated with a hERG IC50 ≥ 40 μM and/or a positive prediction (related to a compound that is a hERG ion channel inhibitor) is associated with a hERG IC50 < 10 μM.
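The assignment of training labels from IC50 values can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the qualifier strings are hypothetical, the thresholds default to the 10 μM and 40 μM values of the more preferred embodiment, and censored values that do not determine a class are returned as `None`.

```python
def herg_label(ic50_um, active_below=10.0, inactive_from=40.0):
    """Label an exact IC50 value (in micromolar) as active, inactive or marginal."""
    if ic50_um < active_below:
        return "active"
    if ic50_um >= inactive_from:
        return "inactive"
    return "marginal"


def herg_label_censored(qualifier, value_um, active_below=10.0, inactive_from=40.0):
    """Label a data point reported as '=', '<', '>' or '>=' a value in micromolar."""
    if qualifier == "=":
        return herg_label(value_um, active_below, inactive_from)
    if qualifier == "<" and value_um <= active_below:
        return "active"       # IC50 is certainly below the active threshold
    if qualifier in (">", ">=") and value_um >= inactive_from:
        return "inactive"     # IC50 is certainly at or above the inactive threshold
    return None               # the bound does not decide the class


def training_points(records):
    """Drop marginal and undetermined points before training a binary model."""
    labelled = [(r, herg_label_censored(r["qualifier"], r["ic50_um"])) for r in records]
    return [(r, lab) for r, lab in labelled if lab in ("active", "inactive")]
```

Whether the marginal set is dropped, as in `training_points`, or retained is a modelling choice; the text above allows both.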

Descriptors of methods for prediction of hERG channel inhibition

Several types of descriptors of physico-chemical properties of compounds are known in the art and often used in predictive models of hERG channel inhibition and in QSAR models in general. Such descriptors may be useful in methods of the present invention.

The role of conformers in prediction methods

The three-dimensional structure of a ligand and/or its target is commonly used for modelling of binding or biological activity since three-dimensional structure often holds important information regarding the properties of the modelled interaction. The three-dimensional conformation of most molecules varies in aqueous solution and often there are multiple stable conformers of the same ligand found close to the conformer of the lowest energy. This means that multiple conformers can potentially be energetically favourable for binding.

The conformational variation of a chemical compound can be calculated by use of a number of conventional methods in the field, for example freely available tools such as BALLOON, CONFAB, FROG2, and RDKIT, and commercial tools such as OMEGA, Catalyst and MOE, or as was done for the present invention by use of the GAS algorithm (Mekenyan 2005). The GAS algorithm is a method for coverage of the conformational space of highly flexible chemicals by a limited number of conformers.

The GAS algorithm employs a genetic algorithm to minimize 3D similarity among the generated conformers. This makes the problem computationally feasible even for large, flexible molecules, at the cost of the non-deterministic character of the algorithm. In contrast to traditional genetic algorithms, the fitness of a conformer is not quantified individually, but only in conjunction with the population it belongs to. The approach handles the following stereo-chemical and conformational degrees of freedom: rotation around acyclic single and double bonds, inversion of stereo-centers, flip of free corners in saturated rings, and reflection of pyramids on the junction of two or three saturated rings. The fitness function, based on maximization of the RMS distance between conformers, is combined with a Shannon function accounting for the evenness of the conformer distribution across conformational space, and a procedure is included for automated determination of the number of conformers needed for an appropriate coverage of conformational space (Mekenyan 2005). When strained conformers are obtained by any of the algorithms, the possible violations of imposed geometric constraints are corrected with a strain-relief procedure (pseudo molecular mechanics; PMM) based on a truncated force field energy-like function, where the electrostatic terms are omitted. Geometry optimization of conformers is further completed by quantum-chemical methods. MOPAC 93 (Stewart 1990 and 1993) is employed by making use of the AM1 Hamiltonian. Next, the conformers are screened to eliminate those whose heat of formation, ΔH_f^0, exceeds the ΔH_f^0 of the conformer with the absolute energy minimum by more than a specified threshold (the default value used by the OASIS Database Manager software is 20 kcal/mol). Subsequently, conformational degeneracy due to molecular symmetry and geometry convergence is detected within a user-defined torsion angle resolution (Mekenyan 2005).
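The energy screening step described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the OASIS or GAS implementation: conformers are assumed to be dictionaries with a hypothetical `heat_of_formation` field in kcal/mol, and the 20 kcal/mol default is the value quoted above.

```python
def screen_by_heat_of_formation(conformers, window_kcal_mol=20.0):
    """Keep conformers within an energy window above the lowest-energy conformer."""
    if not conformers:
        return []
    lowest = min(c["heat_of_formation"] for c in conformers)
    return [
        c for c in conformers
        # A conformer is eliminated only if it lies MORE than the window above the minimum.
        if c["heat_of_formation"] - lowest <= window_kcal_mol
    ]
```

A conformer exactly at the threshold is retained, since only conformers exceeding the minimum by more than the threshold are eliminated.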

As a consequence, the set of conformers generated for a given 2D structure can be used as an approximation for the entire conformational variety of the structure and used to formulate and test hypotheses about it. In particular, maximum and minimum values of conformational parameters (e.g. effective cross-sectional conformer diameter) across the range of available conformers can be used as an approximation of the actual maximum and minimum values of the parameters in the entire conformational space of the structure. We will use the expression 'the maximum or minimum of a parameter where the maximum or minimum is calculated on all conformers of the structure' to refer to a set of one or more available conformers of the structure (which can be generated using a suitable procedure for covering the conformational space with a limited number of conformers, such as the GAS algorithm). Thus, in the context of the present invention, the term "all conformers" or "all available conformers" denotes a set, a selection or group of conformers that is representative of the conformational variance of a given chemical compound or structure. Such a set of conformers may according to the present invention consist of one or more conformers, such as 1 to 500 conformers, or such as 1 to 200 conformers, or such as 1 to 100 conformers, or such as 1 to 50 conformers, or such as 1 to 30 conformers, or such as 1 to 15 conformers, or such as 1 to 10 conformers, or such as 1 to 5 conformers; at least 5 conformers such as 5 to 10 conformers, or such as 5 to 30 conformers, or such as 5 to 50 conformers, or such as 5 to 100 conformers, or such as 5 to 200 conformers, or such as 5 to 500 conformers; at least 10 conformers such as 10 to 30 conformers, or such as 10 to 50 conformers, or such as 10 to 100 conformers, or such as 10 to 200 conformers, or such as 10 to 500 conformers; at least 30 conformers such as 30 to 50 conformers, or such as 30 to 100 conformers, or such as 30 to 200 conformers, or such as 30 to 500 conformers; at least 50 conformers such as 50 to 100 conformers, or such as 50 to 200 conformers, or such as 50 to 500 conformers; or at least 75 conformers; or at least 100 conformers, such as 100 to 200 conformers, or such as 100 to 500 conformers; at least 200 conformers such as 200 to 500 conformers, or at least 500 conformers.

Atomic descriptors

Atomic descriptors are used to describe attributes of a specific atom or a specific type of atom in a specific conformation of a chemical structure. Atomic descriptors may have different values for different atoms in the same conformer or different values for different conformers of the same structure. Thus, in one embodiment of the present invention, the predictive model uses one or more atomic descriptors, and/or one or more descriptors derived from atomic descriptors. Thus in one embodiment of the present invention, the prediction method uses one or more atomic descriptors and/or one or more descriptors derived from atomic descriptors, for example one or more atomic descriptors selected from the group consisting of donor (electrophilic) superdelocalizabilities, acceptor (nucleophilic) superdelocalizabilities and atomic self-polarizability. Such atomic descriptors may be calculated using conventional protocols in the field, for example as implemented by OASIS Database Manager v. 1.7.3 described in http://oasis-lmc.org and Nikolov et al. 2006 or Molecular Orbital PACkage (MOPAC) described in http://openmopac.net/manual.

In another embodiment the prediction method uses one or more atomic descriptors and/or one or more descriptors derived from atomic descriptors, wherein such atomic descriptors are for example selected from the group consisting of donor (electrophilic) superdelocalizability (Donor DLC), atomic self-polarizability (POLAR), and partial electron densities of the nitrogen atom in the frontier orbital (POP_LUMO). The inventors have found that descriptors of the reactivity of nitrogen atoms are particularly useful in the methods of the present invention. Therefore in a preferred embodiment of the present invention, the prediction method uses a descriptor of the reactivity of nitrogen atoms in a set of one or more conformers such as one or more atomic descriptors and/or one or more descriptors derived from a selected type of atomic descriptors, wherein said atomic descriptor is calculated on the nitrogen atoms of each conformer in a set of one or more conformers, and wherein said atomic descriptor is selected from the group consisting of donor (electrophilic) superdelocalizabilities (Donor DLC), atomic self-polarizability (POLAR) and partial electron densities of the nitrogen atom in the frontier orbital (POP_LUMO).

In other, even more preferred embodiments, said atomic descriptor is calculated on the nitrogen atoms of each conformer in a set of one or more conformers, and said atomic descriptor is selected from the group consisting of donor (electrophilic) superdelocalizabilities (Donor DLC) and atomic self-polarizability (POLAR).

Conformational descriptors

Conformational descriptors are used to describe attributes of a specific conformation of a chemical structure. Different conformers of a given chemical compound can have variable biological activity, and thus conformational descriptors calculated for each conformer of a given chemical compound or structure may be used in prediction methods of hERG ion channel inhibition. Such conformational descriptors may in general have different values for different conformers of the same structure. In one embodiment the predictive model uses one or more conformational descriptors calculated from one or more conformers of a chemical compound or structure. Conformational descriptors according to the present invention include for example descriptors of volume, surface descriptors, frontier molecular orbital energies, geometric indices such as effective cross-sectional diameter, maximum distance between atoms (maximum diameter), planarity index, polarizability, electronegativity, heat of formation and geometric topological indices, calculated for each individual conformer or for the ensemble of conformers and used in a model for prediction of hERG ion channel inhibition. Such conformational descriptors may be calculated using conventional protocols in the field, for example as implemented by OASIS Database Manager v. 1.7.3 described in http://oasis-lmc.org and Nikolov et al. 2006.

The inventors of the present invention have found that descriptors of the size of one or more conformers are particularly useful in the methods of the present invention. Therefore in a preferred embodiment of the present invention, the prediction methods use a descriptor of the size of one or more conformers such as one or more conformational descriptors and/or one or more descriptors derived from conformational descriptors, wherein said conformational descriptors are selected from the group consisting of minimal diameter (DiamMin), maximum diameter (DiamMax), Van der Waals surface (VAN_D_WAALS_SUR) and effective cross-sectional diameter (DiamEff).

In a most preferred embodiment, the prediction method uses a descriptor of the size of one or more conformers and the descriptor is the effective cross-sectional diameter (DiamEff) or a descriptor derived from the effective cross-sectional diameter (DiamEff) of a set of conformers, such as for example MaxDiamEff.

Conformational descriptors derived from atomic descriptors

In order to take into account the diversity between different atoms of the same conformer, it may be useful to derive a conformational descriptor d_conf by taking the maximum of a selected type of atomic descriptor d on all atoms of a given conformer, as defined by formula 8:

$$d_{conf} = \max_{A} d(A) \qquad (8)$$

where the maximum is taken on all atoms A of the conformer, or on specific atoms of the conformer, such as for example oxygen (O), nitrogen (N) and/or carbon (C) atoms. Such a descriptor will then be an attribute of a whole conformer and not only of an individual atom; therefore it is used as a conformational descriptor. Thus, in this case, a conformational descriptor is derived from atomic descriptors.
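Formula 8 can be illustrated by a short Python sketch. The data layout is an assumption made for the example: each atom is a dictionary with a hypothetical `element` field and one field per atomic descriptor (here `donor_dlc`), with values that would in practice be produced by a quantum-chemical package.

```python
def conformer_descriptor(atoms, key, elements=None):
    """d_conf = max over atoms A of d(A), optionally restricted to given elements."""
    values = [
        atom[key] for atom in atoms
        if elements is None or atom["element"] in elements
    ]
    # A conformer without any atom of the requested elements has no value.
    return max(values) if values else None
```

Restricting the maximum to nitrogen, for instance with `conformer_descriptor(atoms, "donor_dlc", elements={"N"})`, gives a conformer-level value of the kind used for the nitrogen reactivity descriptors in this description.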

In one embodiment of the present invention, the prediction model uses the maximum of atomic descriptors on all atoms of a given conformer, for example selected from the group consisting of donor (electrophilic) superdelocalizability, acceptor (nucleophilic) superdelocalizability, atomic self-polarizability and atomic charge, all of which are atomic descriptors that can be calculated using conventional methods in the field such as the protocols implemented in the OASIS Database Manager v. 1.7.3 as described in http://oasis-lmc.org and Nikolov et al. 2006. In one embodiment the prediction model uses the maximum of atomic descriptors of a given conformer selected from atomic descriptors of oxygen (O), nitrogen (N) and/or carbon (C) atoms.

In a preferred embodiment the prediction model uses the maximum of atomic descriptors of the acceptor superdelocalizability D N of a given conformer and/or the maximum of the donor (electrophilic) superdelocalizability D E of a given conformer calculated on all atoms of a given conformer or selected from atomic descriptors of oxygen (O), nitrogen (N) and/or carbon (C), wherein the maximum of the donor (electrophilic) superdelocalizability D E on nitrogen atoms of a given conformer (D E conformer) is more preferred.

In other preferred embodiments, the prediction model uses one or more conformational descriptors derived by taking the maximum of the values of a selected atomic descriptor on all nitrogen atoms of a given conformer, wherein said atomic descriptors are selected from the group consisting of donor (electrophilic) superdelocalizability (Donor DLC), atomic self-polarizability (POLAR) and partial electron densities of the nitrogen atom in the frontier orbital (POP_LUMO). In another preferred embodiment, the prediction model uses a conformational descriptor derived by taking the maximum of donor (electrophilic) superdelocalizability (Donor DLC) descriptors on all nitrogen atoms of a given conformer.

Structural descriptors

Structural descriptors describe attributes of a chemical structure as a whole rather than of individual conformations or atoms. In one embodiment of the present invention, the predictive model thus uses one or more structural (two-dimensional) descriptors, such as for example lipophilicity, topological index (InfoWiener) related to the sum of interatomic distances, counts of the numbers of atoms, bonds, and rings, as well as acidic dissociation constants (pKa) and basic dissociation constants (pKb). Such structural descriptors may be calculated by using conventional methods in the field, for example by using protocols implemented in OASIS Database Manager v. 1.7.3 by the Laboratory of Mathematical Chemistry, University of Burgas, Bulgaria (http://oasis-lmc.org, Nikolov et al. 2006).

Acidic dissociation constants, basic dissociation constants, pKa, pKb may also be calculated by using other conventional methods in the field, such as for example the protocols of ACD/ToxSuite 2.95 by ACD/Labs (http://www.acdlabs.com/products/admet/tox/, Juska 2008), or the default tools mentioned in Mannallack et al. In a preferred embodiment of the present invention, the acidic dissociation constants, basic dissociation constants, pKa, pKb are calculated by using protocols of ACD/ToxSuite 2.95 by ACD/Labs (http://www.acdlabs.com/products/admet/tox/).

The ionization state of a chemical compound can influence the hERG ion channel inhibition. Tendencies of ionization are reflected in acidic dissociation constants, basic dissociation constants, pKa, pKb. In one embodiment of the present invention, the predictive model uses at least one structural descriptor of the ionization state of a chemical compound or conformer, such as for example one or more descriptors selected from the group consisting of acidic dissociation constants, basic dissociation constants, pKa and pKb. In a more preferred embodiment of the present invention, the predictive model uses a descriptor of pKa (acidic). In an even more preferred embodiment of the present invention the predictive model uses a descriptor of pKa (acidic), wherein the acidic dissociation constants, basic dissociation constants, pKa, pKb are calculated by using protocols of ACD/ToxSuite 2.95 by ACD/Labs as described in (http://www.acdlabs.com/products/admet/tox/).

Structural descriptors derived from conformational descriptors

In order to take into account the diversity between different conformers of the same structure (chemical compound), it may be useful to derive structural descriptors defined by taking the maximum or the minimum of a selected type of conformational descriptor d on all conformers of a given structure, as shown in formula 9:

$$\mathrm{Max\_d} = \max_{c} d(c), \qquad \mathrm{Min\_d} = \min_{c} d(c) \qquad (9)$$

where c ranges over all conformers of a given structure (approximated by all generated conformers of the structure, i.e. a set of one or more conformers).

Such a descriptor will then be an attribute of a whole structure and not only of an individual conformer; therefore it is used as a structural descriptor.

Note that the conformational descriptor used for derivation of a structural descriptor may be in turn derived from an atomic descriptor using the procedure above.
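The two-level derivation (atoms to conformer, then conformers to structure) can be sketched as follows. This is a minimal illustration with hypothetical function names and made-up numbers; the input is assumed to be, for each generated conformer, the list of values of one atomic descriptor on the atoms of interest.

```python
def structural_max_min(conformer_values):
    """Return (Max_d, Min_d) over the per-conformer descriptor values d(c)."""
    values = [v for v in conformer_values if v is not None]
    if not values:
        return None, None
    return max(values), min(values)


def structural_from_atomic(atomic_values_per_conformer):
    """Take the max over atoms within each conformer, then max/min over conformers."""
    per_conformer = [
        max(atom_values) if atom_values else None
        for atom_values in atomic_values_per_conformer
    ]
    return structural_max_min(per_conformer)


# Hypothetical descriptor values on the nitrogen atoms of three conformers.
nitrogen_values = [[0.31, 0.42], [0.38, 0.29], [0.40]]
max_d, min_d = structural_from_atomic(nitrogen_values)
```

Here the per-conformer maxima are 0.42, 0.38 and 0.40, so the structural maximum is 0.42 and the structural minimum is 0.38.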

Accordingly, in one embodiment of the present invention, the prediction model uses one or more structural descriptors derived from conformational descriptors by taking the maximum and/or the minimum of a conformational descriptor wherein the conformational descriptors are for example selected from effective cross-sectional conformer diameter, included volume and surface descriptors, frontier molecular orbitals energies, geometric indices such as effective cross-sectional diameter, maximum diameter, planarity index, polarizability, electronegativity, heat of formation, geometric topological indices.

In a preferred embodiment, the predictive model uses a structural descriptor of the maximum of a conformational descriptor, such as for example the maximum of the descriptor of effective cross-sectional conformer diameter (DiamEff).

Descriptors derived from conformational descriptors of the size of a compound are for example structural descriptors derived by taking the maximum value of a conformational descriptor of the size of a conformer calculated on a set of one or more conformers. Such structural descriptors are then descriptors of the size of a compound. In another preferred embodiment, the predictive method of the present invention uses a structural descriptor derived by taking the maximum of a selected type of conformational descriptor on a set of one or more conformers, wherein said conformational descriptor is selected from the group consisting of minimal diameter (DiamMin), maximum diameter (DiamMax), Van der Waals surface (VAN_D_WAALS_SUR) and effective cross-sectional diameter (DiamEff). Such structural descriptors are thus selected from the group consisting of MaxDiamMin, MaxDiamMax, MaxVAN_D_WAALS_SUR and MaxDiamEff. In a most preferred embodiment, the predictive method uses a structural descriptor (MaxDiamEff) derived by taking the maximum of the effective cross-sectional diameter (DiamEff) for a set of one or more conformers.

In another preferred embodiment, the predictive method uses a structural descriptor derived by taking the maximum of a selected type of conformational descriptor on a set of one or more conformers, wherein said conformational descriptor is derived by taking the maximum of a selected type of atomic descriptor on all nitrogen atoms of a given conformer, and wherein said atomic descriptor is selected from the group consisting of donor (electrophilic) superdelocalizability (Donor DLC), atomic self-polarizability (POLAR) and partial electron densities of the nitrogen atom in the frontier orbital (POP_LUMO). Such a structural descriptor is then a structural descriptor (derived from an atomic descriptor) of the reactivity of nitrogen atoms in a set of one or more conformers. When such a structural descriptor is derived from the descriptor Donor DLC it is called MaxDonor DLC or D E structure herein. When such a structural descriptor is derived from the descriptor POLAR it is called MaxPOLAR herein. When such a structural descriptor is derived from the descriptor POP_LUMO it is called MaxPOP_LUMO herein.

Thus, according to the present invention, the prediction method can use a descriptor of the reactivity of nitrogen atoms in a set of one or more conformers which is either a) an atomic descriptor, or b) a conformational descriptor, or c) a structural descriptor.

In such embodiments, descriptors derived from atomic descriptors are for example a) conformational descriptors derived by taking the maximum value of atomic descriptors calculated on all nitrogen atoms in a conformer or b) structural descriptors derived by taking the maximum value of a) in a set of one or more conformers.

In a most preferred embodiment, the prediction method uses a descriptor of the reactivity of nitrogen atoms which is:

a) the donor (electrophilic) superdelocalizabilities (Donor DLC) on all the nitrogen atoms in a conformer, or

b) a descriptor derived from donor (electrophilic) superdelocalizabilities (Donor DLC) on the nitrogen atoms, such as:

i) a conformational descriptor derived by taking the maximum value of Donor DLC calculated on all nitrogen atoms in a conformer (D E conformer), or

ii) a structural descriptor derived by taking the maximum value of i) in a set of one or more conformers (D E structure),

wherein the descriptor of ii) D E structure is most preferred.

Thus, in another preferred embodiment, the predictive model uses a structural descriptor of the maximum of the donor (electrophilic) superdelocalizability D E on nitrogen atoms of all available conformers of a given chemical compound or structure (D E structure).

In one embodiment of the present invention, the prediction method uses a combination of all descriptors a) to c) below:

a) a descriptor of the size of one or more conformers, and

b) a descriptor of the reactivity on nitrogen atoms of one or more conformers, and

c) a descriptor of the acidity (pKa (acidic)) of a compound.

In a more preferred embodiment of the present invention, the prediction method uses a combination of all the descriptors a) to c) below:

a) a descriptor of the size of one or more conformers, and

b) a descriptor of the reactivity on nitrogen atoms of one or more conformers, and

c) a descriptor of the acidity (pKa (acidic)) of a compound,

wherein the descriptor of the size of a conformer or a compound is a conformational descriptor selected from the group consisting of minimal diameter (DiamMin), maximum diameter (DiamMax), Van der Waals surface (VAN_D_WAALS_SUR) and effective cross-sectional diameter (DiamEff), or a structural descriptor selected from the group consisting of MaxDiamMin, MaxDiamMax, MaxVAN_D_WAALS_SUR and MaxDiamEff, and wherein the structural descriptors of the size are most preferred, and wherein the descriptor of the reactivity on nitrogen atoms is a structural descriptor derived from a selected type of atomic descriptor calculated on all nitrogen atoms of one or more conformers, said descriptor of the reactivity on nitrogen atoms being selected from the group consisting of D E structure, MaxPOLAR and MaxPOP_LUMO.

In a still more preferred embodiment, the prediction method uses a combination of:

a) a descriptor of the size of one or more conformers, and

b) a descriptor of the reactivity on nitrogen atoms of one or more conformers, and

c) a descriptor of the acidity (pKa (acidic)) of a compound,

wherein the descriptor of the size of one or more conformers is a structural descriptor selected from the group consisting of MaxDiamMin, MaxDiamMax, MaxVAN_D_WAALS_SUR and MaxDiamEff, and

wherein the descriptor of the reactivity on nitrogen atoms is a structural descriptor derived from a selected type of atomic descriptor calculated on all nitrogen atoms of one or more conformers, selected from the group consisting of D E structure, MaxPOLAR and MaxPOP_LUMO.

In an even more preferred embodiment, the prediction method of the present invention uses a combination of the descriptors a) to c) comprising:

a) a descriptor of effective diameter of one or more conformers of a chemical compound, and

b) a descriptor of donor (electrophilic) superdelocalizability on nitrogen atoms of one or more conformers of a chemical compound, and

c) a descriptor of pKa (acidic).

In a most preferred embodiment of the present invention, the predictive model or method uses a combination of all descriptors a) to c) below:

a) a structural descriptor of the maximum of donor (electrophilic) superdelocalizability, D^E_structure, equal to the maximum of the donor (electrophilic) superdelocalizability D^E on nitrogen atoms of one or more conformers of a given chemical compound or structure, and

b) a structural descriptor of the maximum of a conformational descriptor of effective cross-sectional conformer diameter calculated for one or more conformers of a given chemical compound or structure (MaxDiamEff), and

c) a descriptor of pKa (acidic).

In another preferred embodiment the prediction model uses a combination of descriptors comprising:

a) a structural descriptor of the maximum of the donor (electrophilic) superdelocalizability D^E on nitrogen atoms of all available conformers of a given chemical compound or structure (D^E_structure), and

b) a structural descriptor derived by taking the maximum of effective cross-sectional conformer diameter for all available conformers of a given chemical compound or structure (MaxDiamEff).

In another preferred embodiment of the present invention, the predictive model uses a combination of descriptors comprising:

a) a structural descriptor of the maximum of the donor (electrophilic) superdelocalizability D^E on nitrogen atoms of all available conformers of a given chemical compound or structure (D^E_structure), and/or

b) a structural descriptor derived by taking the maximum of effective cross-sectional conformer diameter for all available conformers of a given chemical compound or structure (MaxDiamEff), and/or

c) a descriptor of pKa (acidic).

In a more preferred embodiment of the present invention, the predictive model uses a combination of all descriptors a) to c) below:

a) a structural descriptor of the maximum of donor (electrophilic) superdelocalizability, D^E_structure, equal to the maximum of the donor (electrophilic) superdelocalizability D^E on nitrogen atoms of all available conformers of a given chemical compound or structure, and/or

b) a structural descriptor of the maximum of a conformational descriptor of effective cross-sectional conformer diameter calculated for all available conformers of a given chemical compound or structure (MaxDiamEff), and/or

c) a descriptor of pKa (acidic).

In order to increase the transparency of the results of a prediction model, it is favorable to have predictive models using relatively few descriptors. Few descriptors also reduce the risk of a predictive model being over-trained. However, the complexity of binding interactions and the promiscuity of a receptor often result in the use of multiple descriptors in a prediction model.

The inventors of the present invention have surprisingly found that a predictive model of hERG channel inhibition can be developed which only uses 1 to 10 descriptors, such as 1 to 5 descriptors, preferably such as 1 to 3 descriptors, such as for example 1 descriptor or 2 descriptors or 3 descriptors.

When the prediction model involves the use of a classification decision tree, useful rules may be based on one or more of the following descriptors: 1) whether the compound comprises a nitrogen atom, 2) the pKa (acidic) of the compound, 3) the maximum donor (electrophilic) superdelocalizability on the nitrogen atoms, and 4) the maximum conformer effective cross-sectional diameter.

Predictive thresholds are descriptor values that can be used to associate a given compound with a biological activity, for example to separate hERG ion channel inhibitors from non-inhibitors. Predictive thresholds according to the present invention may vary depending on the data used for training of the methods. According to the present invention, the predictive threshold of pKa (acidic) may be in the range of about 0 to about 16, more preferably such as about 2 to about 8, such as about 4 to about 6, such as about 4, or such as about 5, or such as about 6, wherein a value of about 5 is most preferred.

According to the present invention, the predictive threshold of maximum donor (electrophilic) superdelocalizability on the nitrogen atoms of all conformers (D^E_structure) may be in the range of about 0.1 a.u./eV to about 0.4 a.u./eV, such as within the range of about 0.2 a.u./eV to about 0.3 a.u./eV, such as about 0.25 a.u./eV to about 0.3 a.u./eV, such as about 0.26 a.u./eV to 0.28 a.u./eV, such as about 0.265 a.u./eV, such as about 0.27 a.u./eV, such as about 0.275 a.u./eV to 0.280 a.u./eV, such as about 0.275 a.u./eV, or such as about 0.276 a.u./eV, or such as about 0.277 a.u./eV, or such as about 0.278 a.u./eV, or such as about 0.279 a.u./eV, or such as about 0.280 a.u./eV, or such as about 0.290 a.u./eV, or such as about 0.3 a.u./eV.

According to the present invention, the predictive threshold of maximum conformer effective cross-sectional diameter calculated on all conformers of a given chemical compound or structure (MaxDiamEff) may be in the range of about 5 Å to 15 Å, such as 8 Å to 12 Å, such as 9 Å to 11 Å, such as about 9 Å, or such as about 10 Å, such as 10 Å to 10.5 Å, such as about 10.1 Å, or such as about 10.2 Å, or such as about 10.3 Å, or such as in the range of 10.3 Å to 10.4 Å, such as about 10.31 Å, or such as about 10.32 Å, or such as about 10.33 Å, or such as about 10.35 Å, or such as about 10.36 Å, or such as about 10.37 Å, or such as about 10.38 Å, or such as about 10.39 Å, or such as about 10.4 Å, or such as about 10.5 Å, or such as about 11 Å.

In one specific embodiment of the present invention, a predictive model is used wherein the predictive threshold of the pKa (acidic) may be in the range of about 0 to 16, such as about 2 to 8; and the predictive threshold of maximum donor (electrophilic) superdelocalizability on the nitrogen atoms of all conformers (D^E_structure) may be in the range of about 0.1 a.u./eV to about 0.4 a.u./eV, such as within the range of about 0.2 a.u./eV to about 0.3 a.u./eV; and the predictive threshold of maximum conformer effective cross-sectional diameter calculated on all conformers of a given chemical compound or structure (MaxDiamEff) may be in the range of about 5 Å to 15 Å, such as 8 Å to 12 Å, such as 9 Å to 11 Å.

In a more specific embodiment of the present invention, a predictive model is used wherein the predictive threshold of the pKa (acidic) is in the range of about 2 to 8, preferably about 5; and the predictive threshold of maximum donor (electrophilic) superdelocalizability on the nitrogen atoms of all conformers (D^E_structure) may be in the range of about 0.275 a.u./eV to 0.280 a.u./eV, more preferably about 0.278 a.u./eV; and the predictive threshold of maximum conformer effective cross-sectional diameter calculated on all conformers of a given chemical compound or structure (MaxDiamEff) may be in the range of 9 Å to 11 Å, more preferably in the range of about 10.3 Å to about 10.4 Å, wherein a value of about 10.36 Å is preferred.

In a preferred embodiment of the present invention, the predictive model is a classification decision tree using a rule defined as:

1) A positive prediction is returned if all conditions a), b), c) and d) are fulfilled,

2) A negative prediction is returned if condition a) is fulfilled, and one or more of the conditions b), c) and d) are not fulfilled,

wherein the conditions a), b), c) and d) are defined as:

a) The compound comprises a nitrogen atom,

b) pKa (acidic) > 5,

c) There exists a conformer such that 0.278 a.u./eV < maximum donor (electrophilic) superdelocalizability on the nitrogen atoms (D^E_conformer),

d) Maximum conformer effective cross-sectional diameter (MaxDiamEff) > 10.36 Å.

In another preferred embodiment of the present invention, the predictive model is a classification decision tree using a rule defined as:

1) A negative prediction is returned if condition a) is not fulfilled, and one or more of conditions e) and f) is fulfilled,

wherein the conditions a), e) and f) are defined as:

a) The compound comprises a nitrogen atom,

e) pKa (acidic) < 5,

f) Maximum conformer effective cross-sectional diameter (MaxDiamEff) < 10.36 Å.

In an even more preferred embodiment of the present invention, the predictive model is a classification decision tree using rules defined as:

1) A positive prediction is returned if all conditions a), b), c) and d) are fulfilled,

2) A negative prediction is returned if condition a) is fulfilled, and one or more of the conditions b), c) and d) are not fulfilled,

3) A negative prediction is returned if condition a) is not fulfilled, and one or both of conditions b) and d) are not fulfilled,

wherein the conditions a), b), c) and d) are defined as:

a) The compound comprises a nitrogen atom,

b) pKa (acidic) > 5,

c) There exists a conformer such that 0.278 a.u./eV < maximum donor (electrophilic) superdelocalizability on the nitrogen atoms (D^E_conformer),

d) Maximum conformer effective cross-sectional diameter (MaxDiamEff) > 10.36 Å.

In an even more preferred embodiment of the present invention, the predictive model is a classification decision tree using the rules defined below and requiring that, where nitrogen atoms are present, both the maximum conformer effective cross-sectional diameter condition and the maximum donor (electrophilic) superdelocalizability on the nitrogen atoms condition be fulfilled in the same conformer(s) in order for a positive prediction to be generated:

1) A positive prediction is returned if all conditions a), b) and c) are fulfilled,

2) A negative prediction is returned if condition a) is fulfilled, and one or both of the conditions b) and c) are not fulfilled,

3) A negative prediction is returned if condition a) is not fulfilled, and one or both of conditions b) and d) are not fulfilled,

wherein the conditions a), b), c) and d) are defined as:

a) The compound comprises a nitrogen atom,

b) pKa (acidic) > 5,

c) There exists a conformer with 0.278 a.u./eV < maximum donor (electrophilic) superdelocalizability on the nitrogen atoms (D^E_conformer) and with an effective cross-sectional diameter > 10.36 Å,

d) Maximum conformer effective cross-sectional diameter (MaxDiamEff) > 10.36 Å.
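The decision-tree rules above can be illustrated with a minimal Python sketch. It is not the inventors' implementation: the function name `predict_herg`, the tuple layout of a conformer and the `conformerwise` switch are assumptions made for illustration, and only the three thresholds (5, 0.278 a.u./eV, 10.36 Å) come from the text. The sketch returns no prediction when none of the rules 1) to 3) applies.

```python
from typing import Optional, Sequence, Tuple

PKA_MIN = 5.0      # pKa (acidic) threshold from the text
DE_MIN = 0.278     # a.u./eV, donor superdelocalizability threshold
DIAM_MIN = 10.36   # angstrom, effective cross-sectional diameter threshold

# Hypothetical layout: each conformer is (DiamEff, D^E_conformer), where
# D^E_conformer is None for every conformer of a nitrogen-free compound.
Conformer = Tuple[float, Optional[float]]


def predict_herg(pka_acidic: float, conformers: Sequence[Conformer],
                 conformerwise: bool = True) -> Optional[bool]:
    """Return True (positive), False (negative) or None (no rule applies)."""
    has_nitrogen = any(de is not None for _, de in conformers)
    pka_ok = pka_acidic > PKA_MIN                             # condition b)
    diam_ok = max(d for d, _ in conformers) > DIAM_MIN        # condition d)
    if not has_nitrogen:
        # Rule 3: negative if b) or d) fails; otherwise the rules are silent.
        return None if (pka_ok and diam_ok) else False
    if conformerwise:
        # Both conditions must hold in the same conformer.
        alert = any(d > DIAM_MIN and de > DE_MIN for d, de in conformers)
    else:
        # Each maximum may come from a different conformer.
        alert = diam_ok and max(de for _, de in conformers) > DE_MIN
    return pka_ok and alert                                   # rules 1 and 2
```

A compound whose widest conformer and whose most reactive conformer differ is predicted positive with `conformerwise=False` and negative with `conformerwise=True`, which is the practical difference between the last two embodiments.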

The applicability domain (AD) of a predictive model defines the physico-chemical, structural or biological space, knowledge or information on which the training set of the model has been developed, and for which it is applicable to make predictions for new compounds.

According to the present invention, a predictive model may have an AD comprising compounds that are acids and zwitterions. In a preferred embodiment of the present invention, a predictive model has an AD confined by compounds having

at least one acidic ionogenic group and either

a) no basic ionogenic groups, or

b) pKa (acidic) < pKa (basic).

In a preferred embodiment the AD of a predictive model is confined by compounds having 1.3 < pKa (acidic) < 16. Thus, such a predictive model is a method for predicting hERG channel inhibiting activity of chemical substances selected from the group consisting of acids and zwitterionic compounds.

In one embodiment, the present invention provides a predictive model which has an AD comprising compounds that have a maximum conformer effective cross-sectional diameter (MaxDiamEff) < 18.78 Å. In a preferred embodiment, the AD is confined by compounds having 6.38 Å < maximum conformer effective cross-sectional diameter (MaxDiamEff) < 18.78 Å.

In one embodiment, the present invention provides a predictive model which has an AD comprising compounds that have either:

a) at least one nitrogen atom and 0.114 a.u./eV < maximum donor (electrophilic) superdelocalizability on nitrogen atoms (D^E_structure) ≤ 0.317 a.u./eV, or

b) no nitrogen atom and one or more of the following conditions: pKa (acidic) < 5 or maximum conformer effective diameter < 10.36 Å.

Thus, even more preferably, the AD of the predictive models of the present invention includes compounds having 1.3 < pKa (acidic) < 16, 6.38 Å < maximum conformer effective cross-sectional diameter (MaxDiamEff) < 18.78 Å, and either:

a) at least one nitrogen atom and 0.114 a.u./eV < maximum donor (electrophilic) superdelocalizability on nitrogen atoms (D^E_structure) ≤ 0.317 a.u./eV, or

b) no nitrogen atom and one or more of the following conditions: pKa (acidic) < 5 or maximum conformer effective diameter < 10.36 Å.
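As an illustration only, the applicability domain conditions listed above can be combined into a single check. The function name `in_domain` and the convention of passing `None` for a missing ionogenic group or a nitrogen-free compound are assumptions of this sketch; the numeric bounds are those stated in the text.

```python
from typing import Optional


def in_domain(pka_acidic: Optional[float], pka_basic: Optional[float],
              max_diam_eff: float, de_structure: Optional[float]) -> bool:
    """Check the AD bounds; None means 'group absent' or 'no nitrogen atom'."""
    # At least one acidic ionogenic group is required.
    if pka_acidic is None:
        return False
    # Either no basic group, or pKa (acidic) < pKa (basic).
    if pka_basic is not None and pka_acidic >= pka_basic:
        return False
    if not (1.3 < pka_acidic < 16.0):
        return False
    if not (6.38 < max_diam_eff < 18.78):        # angstrom
        return False
    if de_structure is not None:
        # a) nitrogen present: D^E_structure must lie in (0.114, 0.317] a.u./eV
        return 0.114 < de_structure <= 0.317
    # b) no nitrogen: at least one of the two conditions must hold
    return pka_acidic < 5.0 or max_diam_eff < 10.36
```

Under these assumptions, a nitrogen-free acid with pKa (acidic) of 6 and MaxDiamEff of 11 Å falls outside the domain, which matches the case in which the decision-tree rules return no prediction.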

The present invention provides methods for use in the development of predictive models of hERG ion channel inhibition as well as predictive models of hERG ion channel inhibition. In one aspect of the present invention, such methods and models are assisted by a computer. In another aspect of the present invention, such methods and models may be located on conventional computer storage media.

Thus one aspect of the present invention is a computer-assisted prediction method or predictive model as defined herein.

Another aspect of the invention is a computer program product comprising a computer-assisted prediction method as described herein.

Another aspect of the invention is a data carrier comprising a computer-assisted method as described herein.

Examples

Example 1: Preparation of data sets for training and validation of a predictive model of hERG inhibition

Below is an example of the construction of a data set for development of a hERG prediction method. It will be appreciated that other data sets which are constructed in a different way may also be useful for the development of a predictive model for hERG binding.

Experimental data on hERG blocking was taken from literature (Li 2005, Polak 2009, Obiol-Pardo 2010, Doddareddy 2010, Liu 2007). Based on threshold data used in the available literature sources, a threshold IC50 value of 10 μM was used as an upper limit for the actives and 40 μM as a lower limit for the inactives. These values were chosen in order to maximize the number of chemicals that could be used to train and validate a binary classification model, because many of the published test data were listed only with an upper/lower value of hERG IC50. Introducing an intermediate area between 10 and 40 μM was also beneficial to the quality of the data set in view of the previously observed varying level of inter-laboratory reproducibility of hERG tests.

Chemicals with contradictory activity according to the different sources (reported as < 10 μM and ≥ 40 μM at the same time) were analysed with additional literature data and resolved depending on the available information; a chemical was ignored if no final decision was possible.
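The class assignment described above can be sketched as follows. The function names are hypothetical, and the sketch only flags contradictory chemicals; in the example they were then resolved manually from additional literature data, which is not modelled here.

```python
from typing import Iterable, Optional


def activity_class(ic50_um: float) -> Optional[str]:
    """Map an IC50 in micromolar to a class; None marks the 10-40 uM gap."""
    if ic50_um < 10.0:
        return "active"
    if ic50_um >= 40.0:
        return "inactive"
    return None


def consensus_class(ic50_values_um: Iterable[float]) -> Optional[str]:
    """Return the class shared by all sources, or None if they disagree."""
    classes = {activity_class(v) for v in ic50_values_um}
    return classes.pop() if len(classes) == 1 else None
```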

In several cases, data from (Liu 2007) were incorrectly reproduced from the original publication. For example, tests from (Keseru 2003) had -pIC50 reported as IC50. Furthermore, in four other cases (Brugel et al. 2010, Shaw et al. 2009, Marquis et al. 2009, Haga et al. 2011), IC50 in μM was reported in (Liu et al. 2007) as IC50 in nM; the respective data points were corrected. The data from (Liu 2007) referring to (Keseru 2003) were ignored as the latter publication was also used in (Li 2005).

The structures were imported into an OASIS Database Manager 1.7.3 database (http://oasis-lmc.org, Nikolov et al. 2006), where structure correctness was checked and canonical SMILES codes were generated. All structures were then submitted to hydrolysis simulation and identification of salts as well as removal of mixtures, inorganics and chemicals containing toxic ions (e.g. heavy metals).

Duplicate structures and stereoisomers were identified using the concept of parent 2D structure. The parent 2D structure was taken to be the original 2D structure without any stereo information; for salts, the parent structure was then generated by removing the relevant (inorganic and small organic) counterions.

For every set of two or more structures sharing the same parent 2D structure, if all structures from the set belonged to the same activity class (either IC50 < 10 μM or IC50 ≥ 40 μM), the structure with the most pronounced activity (the lowest IC50 in the case of actives and the highest IC50 in the case of inactives) was selected and the rest were removed from the data set. If the structures from the set had different activity classes, the whole set was removed.
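A minimal sketch of this duplicate resolution is given below. The record layout (parent identifier, IC50 in μM, class label) and the function name are assumptions for illustration; how the parent 2D structure is actually derived is outside the sketch.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

Record = Tuple[str, float, str]   # (parent_id, ic50_um, "active" | "inactive")


def resolve_duplicates(records: Iterable[Record]) -> List[Record]:
    """Keep one record per parent structure, dropping mixed-class groups."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[0]].append(rec)
    kept = []
    for recs in groups.values():
        labels = {label for _, _, label in recs}
        if len(labels) > 1:
            continue                                        # conflicting classes
        if "active" in labels:
            kept.append(min(recs, key=lambda r: r[1]))      # lowest IC50
        else:
            kept.append(max(recs, key=lambda r: r[1]))      # highest IC50
    return kept
```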

All active structures from (Li et al.) were checked, using additional literature data when necessary, to determine whether there was evidence of their activity being in the sub-10 μM range. Where there was no such evidence, the chemicals were excluded from the training set (as their IC50 could not match either the definition of active or inactive in the present work). In two cases, we added chemicals to the training set based on Ki and not IC50.

Doddareddy et al. (2008) have reported hERG tests of 60 chemicals (carried out as a means of estimating the quality of a hERG predictive model) in a radioligand assay measuring displacement of [3H]astemizole, with 50% displacement as the criterion. Eighteen of the 60 chemicals exhibited more than 50% displacement. While IC50 data were not available for all 18 chemicals, the concentration of the test chemicals was 10 μM, and the authors found a significant correlation between IC50 estimated by patch-clamp methods and the displacement in the radioligand assay, so it was reasonable to accept these 18 chemicals as actives for the present model, given that astemizole has been reported (Taglialatela 1998) to be one of the most potent hERG blockers with an IC50 of 480 nM. Another set of 24 chemicals was taken from (Murphy et al.), based on the correlation between Ki for the dofetilide assay and hERG IC50 found in (Diaz et al.).

Next, acid and base pKa constants were calculated using the default algorithm in ACD Labs ACD/ToxSuite 2.95 (http://www.acdlabs.com/products/admet/tox/) as well as the other available algorithm in the same system, marked as pKa/ACDLabs. The macrodissociation constants were predicted for standard conditions (25 °C and zero ionic strength) in aqueous solutions by a proprietary algorithm that uses microconstant predictions at the corresponding protonation sites. The algorithm is based on an internal training set of 17593 compounds (http://www.acdlabs.com/products/admet/tox/). For every structure, the pKa values calculated by the default algorithm were compared with those calculated by the alternative algorithm in ACD Labs ToxBoxes 2.95, pKa-ACD/Labs; where both algorithms found an acidic ionogenic group, the difference between the two pKa values was required not to exceed 8, otherwise the pKa value was considered unreliable.

The resulting data set consisted of 1718 experimental data points, of which 1215 were hERG inhibitors (IC50 < 10 μM) and 503 non-inhibitors (IC50 ≥ 40 μM). Before any modeling was performed, it was randomly split into a training set T_1 (1374 chemicals, or 80% of the data set) and a validation set V_1 with the remaining 20% (344 chemicals). The ratio of hERG blockers to hERG non-blockers was maintained in the random selection for both the training and the validation sets.
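A class-preserving random split of this kind can be sketched as below. The function name, the fixed seed and the rounding of the per-class training size are assumptions; the text does not state how the random selection was implemented.

```python
import random
from typing import List, Sequence, Tuple


def stratified_split(labels: Sequence[int], train_fraction: float = 0.8,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Split indices into training and validation sets, class by class."""
    rng = random.Random(seed)
    train, valid = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(train_fraction * len(idx))   # per-class training size
        train.extend(idx[:n_train])
        valid.extend(idx[n_train:])
    return train, valid
```

With 1215 inhibitors and 503 non-inhibitors, this rounding gives 972 + 402 = 1374 training chemicals and 243 + 101 = 344 validation chemicals, consistent with the set sizes reported above.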

A second validation set was compiled from the training chemicals of the predictive model for hERG blocking included in ACD/Labs ACD/ToxSuite 2.95 (http://www.acdlabs.com/products/admet/tox/, Juska 2008). Salts and mixtures were identified and contradictory experimental results and duplicates were removed in the same way as for the main data set. The parent structure for each of these chemicals was compared to all parent structures in T_1 and V_1; if any match was found, the structure was ignored. A set V_2 was thus constructed, having no structures in common with either T_1 or V_1; moreover, no structures from the latter two sets were stereoisomers of, or salts of the same parent structure as, any of the structures in V_2. The set V_2 contained 242 chemicals, 125 of them actives (hERG IC50 < 10 μM) and 117 inactives (hERG IC50 ≥ 10 μM). Note the different inactivity threshold compared to the training and the first validation sets.


Example 2: Generation of conformers from structures of chemical compounds

In this example, the training and/or validation sets described in Example 1 were used, but other conventional methods of generation of conformers may also be useful for the present invention.

Three-dimensional structure generation and conformational multiplication were performed for all 1718 chemicals of T_1 and V_1 as well as for the 242 structures of V_2. The GAS algorithm (Mekenyan 1999 and 2005) for coverage of the conformational space of highly flexible chemicals by a limited number of conformers was used. The fitness function based on maximization of RMS distance between conformers was combined with a Shannon function accounting for evenness of conformer distribution across conformational space, and a procedure was included for automated determination of the number of conformers needed for an appropriate coverage of conformational space (Mekenyan 2005).

When strained conformers were obtained by any of the algorithms, the possible violations of imposed geometric constraints were corrected with a strain-relief procedure (pseudo molecular mechanics; PMM) based on a truncated force field energy-like function, where the electrostatic terms were omitted (Mekenyan 2005). Geometry optimization was further completed by quantum-chemical methods. MOPAC 93 (Stewart 1990 and 1993) was employed by making use of the AM1 Hamiltonian. Next, the conformers were screened to eliminate those whose heat of formation, ΔHf°, exceeded the ΔHf° of the conformer with the absolute energy minimum by more than a specified threshold (the default value used by the OASIS Database Manager software is 20 kcal/mol). Subsequently, conformational degeneracy due to molecular symmetry and geometry convergence was detected within a user-defined torsion angle resolution (Mekenyan 2005). In the present example, a set of at most 30 conformers representing the conformational space was generated for each chemical compound and used for calculation of descriptors as mentioned in Example 3.

Example 3: Calculation of structural, conformational and atomic descriptors for training and validation of a predictive model of hERG inhibition

In this example, the data sets of Example 1 were used and conformers were previously calculated as described in Example 2. However, the skilled person will appreciate that the below calculation of descriptors may be done on any other data set of chemical compounds and with any conformers calculated by use of conventional methods in the field. Three groups of descriptors were calculated from the data sets of Example 1 and the conformers as described in Example 2, and used in the descriptor selection.

Structural descriptors included lipophilicity, a topological index (InfoWiener) related to the sum of interatomic distances, counts of the numbers of atoms, bonds, and rings, as well as acidic and basic pKa.

Conformer descriptors (different values for the different conformers of the same structure) included volume and surface descriptors, frontier molecular orbitals energies, geometric indices such as effective cross-sectional diameter, maximum diameter, planarity index, polarizability, electronegativity, heat of formation, geometric topological indices etc.

Atomic descriptors (different values for each atom of each conformer of each structure) included donor (electrophilic) and acceptor (nucleophilic) superdelocalizabilities, atomic self-polarizability, atomic charge and others. The full list of descriptors is presented in Table 1 below.

Table 1. Initial descriptors generated for the data sets

Descriptor type | List of descriptors

Atomic | ACCEPT_DLC, BOND_ORDER, DONOR_DLC, POLAR, POP_HOMO, POP_LUMO, Q, VWACWN, VWACWP, VWPNSA, VWPPSA

Conformer | A_alpha_C, A_max, A_max_Benzene, Atom_dist_ratio, Bond_Order_Hlg, CALC._HEAT_FORM., D_max, DiamEff, DiamMax, DiamMin, DIPOLE_MOMENT, E_GAP, E_HOMO, ELECTRONEGATIVITY, Electrophilicity, E_LUMO, GEOM._INFO_WIENER, GEOM._WIENER, PLANARITY, PLANARITY_conjugate, Q_Aldehyde_0, RNCG, RPCG, SASurf_FNSA1, SASurf_FNSA2, SASurf_FNSA3, SASurf_FPSA1, SASurf_FPSA2, SASurf_FPSA3, SASurf_RNCS, SASurf_RPCS, SASurf_WNSA1, SASurf_WNSA2, SASurf_WNSA3, SASurf_WPSA1, SASurf_WPSA2, SASurf_WPSA3, SVWNPSA, SVWPPSA, VAN_D._WAALS_SUR., VAN_D._WAALS_VOL., VdWSurf_DPSA1, VdWSurf_DPSA2, VdWSurf_DPSA3, VdWSurf_FNSA1, VdWSurf_FNSA2, VdWSurf_FNSA3, VdWSurf_FPSA1, VdWSurf_FPSA2, VdWSurf_FPSA3, VdWSurf_PNSA1, VdWSurf_PNSA2, VdWSurf_PNSA3, VdWSurf_PPSA1, VdWSurf_PPSA2, VdWSurf_PPSA3, VdWSurf_RNCS, VdWSurf_RPCS, VdWSurf_WNSA1, VdWSurf_WNSA2, VdWSurf_WNSA3, VdWSurf_WPSA1, VdWSurf_WPSA2, VdWSurf_WPSA3, VOLUME_POLARIZAB.

Structural | Log(Kow), Info_Wiener, pKa(acidic), pKa(basic), N_AromaticBonds, N_CycleBonds, N_HEAVY_ATOMS

ACD/ToxSuite 2.95 by ACD/Labs (http://www.acdlabs.com/products/admet/tox/) was used to calculate the acidic and basic dissociation constants. All other descriptors were calculated using OASIS Database Manager v. 1.7.3 (http://oasis-lmc.org).

Example 4: Generating conformational descriptors from atomic descriptors, and structural descriptors from conformational descriptors

In the present example, the data sets of Example 1, conformers of Example 2 and descriptors of Example 3 were used. However, the skilled person will appreciate that the below calculation of descriptors may be done on any data set of chemical compounds, with any conformers calculated by use of conventional methods in the field, and with any set of descriptors calculated for a given chemical structure or conformer.

Taking the maximum of an atomic descriptor d on all atoms of a given conformer, we defined a conformational (non-atomic) descriptor as described previously in formula 4, where the maximum was taken on all atoms A of the conformer.

This procedure was performed for all atomic descriptors from Table 1 of Example 3. As a result, we calculated conformational descriptors: the maximum of the acceptor (nucleophilic) superdelocalizability D^N on all atoms of a given conformer, the maximum of the donor (electrophilic) superdelocalizability D^E on all atoms of a given conformer, etc.

The procedure was repeated taking the maxima only on specific atom types (O, N, C). As a result, we calculated conformational descriptors, namely the maximum of the acceptor (nucleophilic) superdelocalizability D^N on all oxygen atoms of a given conformer, the maximum of the acceptor superdelocalizability D^N on all nitrogen atoms of a given conformer, etc.

Furthermore, taking the maximum of a conformer descriptor d on all conformers of a given structure, we defined structural descriptors as described previously in formula 5, where c ranges over all conformers of a given structure (approximated by all generated conformers of the structure). This procedure was carried out for all conformer parameters, both for the original ones from Table 1 of Example 3 and for the ones defined by calculating maxima of atomic parameters. Note that the descriptors derived through the second equation above are structural ones, although they were derived from conformer information (and possibly also from atomic information). Thus, we defined the structural descriptors MaxDiamEff (the maximum of DiamEff, the effective cross-sectional conformer diameter of a conformer), the maximum and minimum (on all conformers) of the maximum D^E taken on all nitrogen atoms of a conformer, etc. These generated parameters, together with the original structural descriptors, were used for the derivation of a hERG rule.
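The two aggregation steps of this example (atoms to conformer, then conformers to structure) can be sketched as follows. The dictionary layout of atoms and conformers and both function names are assumptions made for illustration; the sketch also assumes that the atomic descriptor DONOR_DLC from Table 1 holds the donor superdelocalizability.

```python
from typing import Dict, Iterable, Optional, Sequence

Atom = Dict[str, object]   # hypothetical, e.g. {"element": "N", "DONOR_DLC": 0.21}


def conformer_max(atoms: Sequence[Atom], descriptor: str,
                  element: Optional[str] = None) -> Optional[float]:
    """Maximum of an atomic descriptor over all atoms, or one element type."""
    values = [a[descriptor] for a in atoms
              if element is None or a["element"] == element]
    return max(values) if values else None


def structure_max(conformer_values: Iterable[Optional[float]]) -> Optional[float]:
    """Maximum of a conformer-level value over all generated conformers."""
    values = [v for v in conformer_values if v is not None]
    return max(values) if values else None
```

With conformers stored as dictionaries holding a `DiamEff` value and an `atoms` list, MaxDiamEff would be `structure_max(c["DiamEff"] for c in conformers)` and the nitrogen-restricted structural maximum would be `structure_max(conformer_max(c["atoms"], "DONOR_DLC", "N") for c in conformers)`.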

Example 5: Derivation of a binary prediction model for hERG ion channel inhibition of acids and zwitterions

In the present example, the data sets, descriptors and conformers of the previous Examples 1-4 were used. However, the skilled person will appreciate that the below Example could be done with any data set of chemical compounds, with any conformers calculated by use of conventional methods in the field, and with any set of descriptors calculated for a given chemical structure or conformer.

A subset of chemicals T_A was selected from the training set T_1. The subset T_A consisted of 153 chemicals with at least one acidic ionogenic group and either no basic ionogenic groups at all or pKa(acidic) < pKa(basic) (acids and zwitterionic ampholytes (AZA)).

All 153 chemicals from T_A were submitted to the See5 decision tree system by RuleQuest Research (http://www.rulequest.com). See5 is a state-of-the-art classifier construction system using decision trees, a non-parametric machine-learning technique.

The See5 algorithm (Quinlan (1993 and 1997)) is the latest version of the ID3 and C4.5 algorithms developed by the same author. See5 and its predecessors use formulas based on information theory to evaluate the "goodness" of a test; in particular, they choose the test that extracts the maximum amount of information from a set of cases, given the constraint that only one attribute is tested. To this end, the entropy criterion as described previously in formula 3 is used, where N is the total number of observations, k is the number of classes and n_i is the number of observations belonging to class i. The entropy of an information item is a measure of its randomness or uncertainty, or can be taken as a measure of the average amount of information that is supplied by the knowledge of the information item.
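The entropy criterion can be illustrated with the standard Shannon entropy over class counts and the resulting information gain of a candidate test. This is a sketch under the assumption that formula 3, which is not reproduced in this passage, has the usual form -Σ (n_i/N) log2(n_i/N); the function names are hypothetical, and the code is not the See5 implementation.

```python
import math
from typing import Sequence


def entropy(class_counts: Sequence[int]) -> float:
    """Shannon entropy in bits of a set with the given per-class counts."""
    n = sum(class_counts)
    return -sum((c / n) * math.log2(c / n) for c in class_counts if c > 0)


def information_gain(parent_counts: Sequence[int],
                     child_counts: Sequence[Sequence[int]]) -> float:
    """Entropy reduction obtained by splitting the parent into the children."""
    n = sum(parent_counts)
    weighted = sum(sum(child) / n * entropy(child) for child in child_counts)
    return entropy(parent_counts) - weighted
```

A test that separates two equally sized classes perfectly has a gain of 1 bit, while a test that leaves the class proportions unchanged in every branch has a gain of 0.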

A confidence value of 5% was used in order to enhance the reliability of the derived rules.

In order to reduce randomness in the choice of descriptors, the decision tree was required to have a large enough minimum leaf size. A series of decision tree models was produced with different settings for this parameter. The number of actives in T_A was 35. Using more than 35 chemicals as the minimum leaf size resulted in no parameters being selected and the trivial classifier being built (classifying all structures as positive).

Using precisely 35 chemicals as the minimum leaf size resulted in two parameters being selected: MaxDiamEff, the maximum effective cross-sectional diameter, and D^E_structure, the structural maximum of donor (electrophilic) superdelocalizability on nitrogen atoms. Exactly the same result was obtained when the minimum leaf size was set to any value between 20 and 35.

Next, the See5 decision tree generation system was used to create a set of rules for prediction of hERG ion channel activity based on the two most significant selected descriptors. Ionization (acidic pKa), already used in the construction of T_A, was added to the selected parameter list. The construction of a classifier in See5 for hERG IC50 < 10 μM of acids and zwitterionic ampholytes resulted in the following rule based on three descriptor ranges:

A prediction is positive (hERG IC 50 < 10 μM) if

MaxDiamEff > 10.36 [Å] and

D E structure > 0.278 [a.u./eV] and

pKa (acidic) > 5.

Here the three descriptors are pKa (acidic), MaxDiamEff (the maximum of the effective cross-sectional diameter), and D E structure (the maximum donor (electrophilic) superdelocalizability D E calculated at all nitrogen atoms of all available conformers). The maxima were taken over all conformers of a given structure (approximated by all generated conformers of the structure); therefore the condition translates into the following structural alert:

there exists a conformer such that DiamEff > 10.36 [Å] and

D E conformer > 0.278 [a.u./eV]

(D E conformer is the maximum donor (electrophilic) superdelocalizability D E calculated at all nitrogen atoms of the conformer)
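The conformer-level alert can be sketched as follows. This is only an illustration under stated assumptions: the `Conformer` container and the function name are hypothetical, and the descriptor values are assumed to have been computed beforehand by external software. Only the three thresholds are taken from the rule above.

```python
from dataclasses import dataclass, field
from typing import List

PKA_ACIDIC_MIN = 5.0   # rule threshold on pKa (acidic)
DIAM_EFF_MIN = 10.36   # rule threshold on DiamEff, in Å
DONOR_DLC_MIN = 0.278  # rule threshold on D E conformer, in a.u./eV


@dataclass
class Conformer:
    """Hypothetical container for precomputed descriptors of one conformer."""
    diam_eff: float  # effective cross-sectional diameter, in Å
    # donor (electrophilic) superdelocalizability at each nitrogen atom
    nitrogen_donor_dlc: List[float] = field(default_factory=list)


def predict_herg_inhibitor(pka_acidic: float, conformers: List[Conformer]) -> bool:
    """Return True if the alert fires (predicted hERG IC 50 < 10 μM)."""
    if pka_acidic <= PKA_ACIDIC_MIN:
        return False
    for conformer in conformers:
        if not conformer.nitrogen_donor_dlc:
            continue  # no nitrogen atoms: the nitrogen condition cannot hold
        d_e_conformer = max(conformer.nitrogen_donor_dlc)
        # Both conditions must hold on the same conformer.
        if conformer.diam_eff > DIAM_EFF_MIN and d_e_conformer > DONOR_DLC_MIN:
            return True
    return False
```

In this sketch a structure whose size condition and nitrogen condition are met only on different conformers is predicted negative, because the alert asks for a single conformer satisfying both.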

Validation of the derived rule

The performance of the derived rule was estimated on the training set of observations as well as on two independent external validation sets.

The training set T A consisted of 153 experimental data points, of which 35 were hERG inhibitors (IC 50 < 10 μM) and 118 were non-inhibitors (IC 50 ≥ 40 μM). Validation set V 1A consisted of 35 experimental data points, of which 8 were hERG inhibitors and 27 were non-inhibitors. This validation set consisted of 20% of the chemicals from the initial data set, set aside at random for validation while preserving the inhibitor/non-inhibitor ratio, as described in Section 2.3.

A subset of chemicals V 2A was selected from the validation set V 2 . The subset V 2A consisted of 48 chemicals with at least one acidic ionogenic group and either no basic ionogenic groups at all or pKa(acidic) < pKa(basic). The hERG inhibitors were defined as having hERG IC 50 < 10 μΜ, while non-inhibitors (37) were defined as having hERG IC 50 ≥ 10 μΜ. Note the different inactivity threshold compared to the training and the first validation sets.

The Cooper statistics of the alert performance for the training set and for V 1A are presented in Table 2 below.

The Cooper statistics in all tables are calculated as follows:

Sensitivity: the ratio of true positives to all experimentally positive compounds, whether predicted positive or negative.

Specificity: the ratio of true negatives to all experimentally negative compounds, whether predicted positive or negative.

Concordance: the ratio of correct predictions to all predictions.
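The three statistics can be computed from the four cells of a confusion matrix, as in the following minimal sketch. The function name and the returned dictionary are illustrative choices, not part of the described method.

```python
def cooper_statistics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Sensitivity, specificity and concordance, each expressed in percent."""
    return {
        # true positives over all experimentally positive compounds
        "sensitivity": 100.0 * tp / (tp + fn),
        # true negatives over all experimentally negative compounds
        "specificity": 100.0 * tn / (tn + fp),
        # correct predictions over all predictions
        "concordance": 100.0 * (tp + tn) / (tp + fp + tn + fn),
    }


if __name__ == "__main__":
    # Counts implied by the V 1A column of Table 7: 9 predicted positive
    # (7 true) and 25 predicted negative (24 true).
    print(cooper_statistics(tp=7, fp=2, tn=24, fn=1))
```

For these counts the values are about 87.5%, 92.3% and 91.2%, consistent with the 88, 92 and 91 listed for V 1A in Table 7.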

The two statistics denoted respectively by (C) and (S) in Table 2 below reflect two possible interpretations of the derived rule.

The conformerwise (C) application of the rule requires the existence of a conformer satisfying both the diameter and the nitrogen donor (electrophilic) superdelocalizability condition.

Alternatively (S), the structure may be required to have DiamEff > 10.36 [Å] (reached at some conformers) and D E structure > 0.278 [a.u./eV] (reached at a possibly different subset of conformers), thus satisfying the maxima conditions. The latter interpretation can have a certain value because the active conformer does not necessarily have to be the one with the largest effective diameter; MaxDiamEff is a measure of both size and flexibility of the structure. However, below we will only use the stricter conformerwise (C) interpretation. The external validation on the set V 1A did not differ under the two interpretations.

The Cooper statistics of the alert performance for V 2A are presented in Table 3 below.

Table 3. Cooper statistics of the alert for V 2A .

Because the derived rule includes a condition on nitrogen atoms, nitrogen-free AZA would trivially be predicted to be hERG non-inhibitors. This matches both T A and V 1A , where all nitrogen-free AZA have an IC 50 of over 40 μM; V 2A , however, contained three active chemicals of this type. Due to the scarcity of nitrogen-free hERG inhibitors in the AZA set, statistically reliable rules for characterization of their hERG affinity were difficult to produce. Nevertheless, in view of the existence of such chemicals, we applied the following two domain restrictions to the hERG rule (other domain definitions will be considered below). The Nitrogen domain included only the nitrogen-containing AZA. The Extended nitrogen domain included the nitrogen-containing AZA as well as any AZA with pKa < 5 or DiamEff < 10.36 Å. The motivation for this was that these negative rules were derived on the entire set of AZA and did not relate to nitrogen. Within this domain definition, a nitrogen-free AZA would be considered in domain (and predicted negative) if it matches either of the two negative rules on size and pKa; otherwise, it would be considered outside of the model domain. Table 3 lists the results for all AZA as well as for the two domain definitions.
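The two domain definitions can be written as simple membership tests. The sketch below is illustrative only: the function names are hypothetical, the inputs are assumed to be precomputed, and the structure-level maximum MaxDiamEff is assumed to stand in for the DiamEff condition of the negative rule.

```python
PKA_ACIDIC_MIN = 5.0  # threshold of the negative rule on pKa (acidic)
DIAM_EFF_MIN = 10.36  # threshold of the negative rule on diameter, in Å


def in_nitrogen_domain(has_nitrogen: bool) -> bool:
    """Nitrogen domain: only nitrogen-containing AZA are covered."""
    return has_nitrogen


def in_extended_nitrogen_domain(has_nitrogen: bool, pka_acidic: float,
                                max_diam_eff: float) -> bool:
    """Extended nitrogen domain.

    Nitrogen-containing AZA are always covered. A nitrogen-free AZA is
    covered (and predicted negative) only if one of the two negative
    rules on acidity or size applies to it.
    """
    if has_nitrogen:
        return True
    return pka_acidic < PKA_ACIDIC_MIN or max_diam_eff < DIAM_EFF_MIN
```

A nitrogen-free compound with pKa (acidic) of 6 and a diameter of 12 Å therefore falls outside both domains, and no prediction is reported for it.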

Table 4 presents the performance of the hERG rule for the T A and V 1A sets using the strict and the extended domain definition.

Table 4. Cooper statistics for the alert for T A and V 1A using nitrogen and extended domains

The results from imposing an additional structural domain requirement of at least 30% Tanimoto structural similarity to a training set structure are presented in Table 5.

                                              Nitrogen   Extended
Predicted positive (of which true positives)  9 (7)      9 (7)
Predicted negative (of which true negatives)  16 (16)    22 (21)
Sensitivity, %                                100        88
Specificity, %                                89         91
Concordance, %                                92         90

Table 5. Cooper statistics for V 1A with a structural domain requirement

The structural domain requirement resulted in a minor difference in performance compared to the original model (all AZA, nitrogen or extended): four true negatives and one false negative were eliminated. For this reason, we consider another domain restriction. Let p Min and p Max be the minimum and maximum values of a parameter p for all training set chemicals. We require the domain members to be AZA, under the Nitrogen or Extended nitrogen domain definitions, and to satisfy p Min ≤ p ≤ p Max for each of their model parameters p. This approach has been used, for example, by Coi et al. (2009) in the definition of a model domain for hERG affinity.
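A descriptor-range check of this kind can be sketched as follows. The range values are taken from items 39, 40 and 42 later in this document; that they coincide with the training set minima and maxima is an assumption, and the dictionary keys and function name are hypothetical.

```python
# Assumed training set ranges (p Min, p Max), taken from items 39, 40 and 42.
TRAINING_RANGES = {
    "pka_acidic": (1.3, 16.0),              # pKa (acidic)
    "max_diam_eff": (6.38, 18.78),          # MaxDiamEff, in Å
    "donor_dlc_structure": (0.114, 0.317),  # D E structure, in a.u./eV
}


def within_descriptor_ranges(descriptors: dict, ranges: dict = TRAINING_RANGES) -> bool:
    """True if every supplied descriptor p satisfies p Min <= p <= p Max.

    Descriptors that are undefined for a compound (for example D E structure
    of a nitrogen-free compound) are simply left out of the dictionary.
    """
    for name, value in descriptors.items():
        low, high = ranges[name]
        if not (low <= value <= high):
            return False
    return True
```

The sketch uses the inclusive bounds stated in the paragraph above; the items quoted as the source of the values use strict inequalities on some bounds.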

Table 6 shows the minima and maxima of the three model parameters, and the results from imposing this restriction on the model domain instead of the structural restriction are shown in Table 7.

Table 6. Minimum and maximum values of the alert descriptors for the training set chemicals

                                              V 1A       V 2A
Size                                          35         44
Actives                                       8          8
Inactives                                     27         36
Predicted positive (of which true positives)  9 (7)      8 (6)
Predicted negative (of which true negatives)  25 (24)    36 (34)
Sensitivity, %                                88         75
Specificity, %                                92         94
Concordance, %                                91         91

Table 7. Cooper statistics for the validation sets with a descriptor range domain

The hERG blocking affinity (IC 50 < 10 μM) of acids and zwitterionic ampholytes is described by an alert based on three descriptor ranges: pKa (acidic), D E structure and MaxDiamEff, where D E structure is the maximum donor (electrophilic) superdelocalizability, D E , calculated at all nitrogen atoms of all (available) conformers, and MaxDiamEff is the maximum of the effective cross-sectional diameter. The alert correctly classifies 91% of the observations, with a positive predictive value of 89% and a negative predictive value of 91%. The results were confirmed by two external validations, showing sensitivity of 88%, specificity of 92%, and concordance of 91%.

Example 6: Comparison of the binary classification tree model to QSAR models of the same training set

The present example illustrates another use of the method wherein the predictive model is obtained by training on a set of compounds which is divided based on ionization as presented herein. In this example, the developed predictive model is not a binary decision tree classifier but a QSAR model built using Leadscope Predictive Data Miner.

The training set described in Example 1 was also used to develop predictive QSAR models of hERG IC 50 < 10 μM vs. hERG IC 50 ≥ 40 μM using Leadscope Predictive Data Miner by Leadscope Inc. (Cross et al. 2003, Valerio et al. 2010, http://www.leadscope.com). Leadscope is software for systematic substructural analysis of a compound set using predefined structural features stored in a template library. The feature library contains approximately 27,000 structural features, and the features chosen for analysis are motivated by those typically found in small molecules: aromatics, heterocycles, spacer groups and simple substituents. Additionally, the system can generate training set-dependent structural features (scaffolds), and it also estimates molecular descriptors for each structure: the octanol/water partition coefficient (AlogP), hydrogen bond acceptors, hydrogen bond donors, Lipinski score, atom count, parent compound molecular weight, polar surface area and rotatable bonds. The model building process in Leadscope includes an automated procedure for selecting structural features and numeric descriptors (using t- and Yates' χ2 statistics). The Leadscope algorithm for building QSAR models is based on structural features and numeric descriptors, using partial logistic regression for a binary response variable.

The molecular structures were converted into SD format using OASIS Database Manager 1.7.3 (http://oasis-lmc.org) and imported into Leadscope. The structures were then mined for the predefined structural features from Leadscope's template library by substructure analysis. The selection was done according to Yates' χ2 test. In addition, the eight molecular descriptors listed above (AlogP, hydrogen bond acceptors, hydrogen bond donors, Lipinski score, atom count, parent compound molecular weight, polar surface area and rotatable bonds) were calculated for each structure. Redundant features were removed using the least redundant feature option in Leadscope.

Two predictive models were constructed using the above procedure. The model L Total (Leadscope Total) was built on the entire training set (T 1 ; 1336 chemicals). The model L A (Leadscope AZA) was built on training set T A (acids and zwitterionic ampholytes only, 153 chemicals).

Both L Total and L A were estimated by cross-validation and by external validation using validation sets V 1 and V 2 (for L Total ) and V 1A and V 2A (for L A ).

The QSAR models were based on structural features and a small number of 2D calculated descriptors and were estimated by cross-validation and by external validation using the same validation sets as for the proposed alert. More specifically, the predictive models were developed based on the identified set of structural features and molecular descriptors, using partial logistic regression (PLR). Using the default mode recommended by Leadscope for the case of unbalanced training sets, three separate sub-models were developed based on three balanced training subsets, randomly selected so as to provide disjoint subsets of negatives. The sub-models were then combined into an overall ensemble model with all sub-models assigned equal weights. The predictive performance of the overall model was evaluated by 10-fold cross-validation and by external validation. Only predictions within the defined applicability domain were accepted. The applicability domain required that a compound had at least 30% Tanimoto structural similarity with a training set compound (the similarity coefficient was proposed by Jaccard (Jaccard 1901) and independently by Tanimoto in 1957). The Tanimoto similarity was calculated based on fingerprints of the Leadscope features used for each of the models. Compounds screened with the ensemble model were required to have 30% similarity with at least one training set compound in either sub-model. In addition, predictions were required to have a positive prediction probability of over 0.7 for positives and less than or equal to 0.3 for negatives, rendering predictions with probabilities between 0.3 and 0.7 out of the domain.
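The applicability domain logic described above can be sketched as follows. This is not Leadscope's implementation: fingerprints are represented here as plain sets of hypothetical feature identifiers, the function names are invented for illustration, and only the 30% similarity cut-off, the equal weighting and the 0.7/0.3 probability cut-offs come from the text.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity of two feature sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def ensemble_probability(sub_model_probabilities):
    """Combine sub-model outputs with equal weights."""
    return sum(sub_model_probabilities) / len(sub_model_probabilities)


def classify(probability, query_features, training_features,
             min_similarity=0.3, positive_cutoff=0.7, negative_cutoff=0.3):
    """Return 'positive', 'negative' or 'out of domain' for one compound.

    training_features is the pooled list of fingerprints of the training
    compounds of all sub-models.
    """
    best = max((tanimoto(query_features, t) for t in training_features), default=0.0)
    if best < min_similarity:
        return "out of domain"  # too dissimilar to every training compound
    if probability > positive_cutoff:
        return "positive"
    if probability <= negative_cutoff:
        return "negative"
    return "out of domain"      # probability between the two cut-offs
```

For example, a compound sharing two of four distinct features with a training compound has a similarity of 0.5 and passes the structural requirement, but it still receives no prediction if the ensemble probability is 0.5.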

The cross-validation performance of the models L Total and L A is presented in Table 8 below.

Table 8. Cross-validation Cooper statistics of QSAR models L Total and L A

Table 9 presents the external validation performance of model L Total .

Table 9. External validation of L Total : Cooper statistics

Table 10 presents the external validation performance of L A .

Table 10. External validation of L A : Cooper statistics

The proposed rule based on a binary decision tree model using the descriptors demonstrated superior predictivity and coverage for the AZA class compared to the QSAR models.

Example 7: Comparison of other descriptors of molecular size, reactivity on nitrogen atoms and acidity of the same training set

Other descriptors reflecting molecular properties similar to pKa (acidic), the maximum donor (electrophilic) superdelocalizability calculated at all nitrogen atoms of all (available) conformers, and the maximum of the effective cross-sectional diameter may also be useful for prediction of hERG inhibition activity. The maximum of the effective cross-sectional diameter (MaxDiamEff) is a descriptor of molecular size. The performance of other descriptors of the molecular size of a compound, calculated according to Examples 3 and 4, is shown in Table 11 below. The maximum effective cross-sectional diameter (DiamEff) was calculated as mentioned above.

The minimal diameter (DiamMin) of a conformer is defined as the minimum distance between two parallel planes circumscribing the molecule, and the maximum diameter (DiamMax) is defined as the diameter of the smallest sphere circumscribing the molecule (Dimitrov et al 2003, Brooke and Cronin 2009). These descriptors were calculated as defined herein.

The Van der Waals surface area of a molecule can be defined in the usual way as the area of a surface formed by the spheres of van der Waals radii around the atoms of the molecule (see Meyer 1985). The descriptor of Van der Waals surface area (VAN_D_WAALS_SUR) was likewise calculated as defined herein by using the proprietary algorithm implemented in OASIS Database Manager 1.7.3, commercially available from the Laboratory of Mathematical Chemistry, University of Burgas, Bulgaria (http://oasis-lmc.org), which in turn uses the free software MOPAC to calculate some of its descriptors. Each of these descriptors correlates with hERG blocking. While the value of the descriptors lies in using them in combination with others, even when used alone they show good agreement with hERG blocking on the training set of AZA, as can be seen from Table 11 below:

Table 11

Descriptor            True negatives  False positives  False negatives  True positives  Sensitivity, %  Specificity, %
MaxDiamMin            110             6                26               10              28              95
MaxDiamMax            77              39               4                32              89              66
MaxDiamEff            59              57               1                35              97              51
MaxVan_d_Waals_Sur    56              60               3                33              92              48

The above results therefore demonstrate that other molecular descriptors of the size of a molecule may be useful in the prediction methods of the present invention.

D E structure is a structural descriptor of reactivity on the nitrogen atoms of a molecule. Other similar descriptors of reactivity on the nitrogen atoms of a molecule may also be useful in the prediction methods according to the present invention. The performance of other descriptors of reactivity on the nitrogen atoms of a compound, calculated according to Examples 3 and 4, is given in Table 12 below.

Reactivity indices calculated on nitrogen atoms can be descriptors such as donor (electrophilic) superdelocalizability (DONOR_DLC), self-polarizability (POLAR), or the population of the Lowest Unoccupied Molecular Orbital (POP_LUMO). The donor (electrophilic) superdelocalizability on nitrogen atoms was calculated as defined herein.

Self-polarizability (POLAR), or π s (r) of an atom r, is a reactivity measure for π electron systems and was introduced by Coulson and Longuet-Higgins (1947). The descriptor was calculated by using the electron formula defined according to formula 6 as defined herein (also available at http://openmopac.net/manual/index.html).

POP_LUMO is the population LUMO, a descriptor assessing the partial electron densities of the nitrogen atom in the frontier orbital (from MOPAC 93). It was calculated as defined herein.

As can be seen from Table 12 below, each of these descriptors correlates with hERG blocking. While the value of the descriptors lies in using them in combination with others, even when used alone they show good agreement with hERG blocking on the training set of AZA.

Table 12

Descriptor            True negatives  False positives  False negatives  True positives  Sensitivity, %  Specificity, %
Max_POP_LUMO          105             11               13               23              64              91
Max_POLAR             115             1                26               10              28              99
Max_DONOR_DLC         87              29               7                29              81              75

The above results therefore demonstrate that other molecular descriptors of the reactivity on the nitrogen atoms may be useful in the prediction methods of the present invention.

The following items further serve to define the present invention:

Items

1. A method for developing a predictive model of hERG channel inhibiting activity of chemical substances, wherein the predictive model is obtained by training on a set of compounds which is divided based on ionization.

2. The method according to item 1, wherein the predictive model is obtained by training on a set of compounds selected from a group consisting of acids and/or zwitterionic compounds.

3. The method according to any of the preceding items, wherein the predictive model uses one or more atomic descriptors, and/or descriptors derived from atomic descriptors.

4. The method according to any of the preceding items, wherein the predictive model uses one or more conformational descriptors derived from one or more conformers of a chemical compound or structure.

5. The method according to any of the previous items, wherein the predictive model uses one or more structural descriptors.

6. The method according to any of the preceding items, wherein the predictive model uses at least one descriptor of the conformer effective cross-sectional diameter.

7. The method according to any of the previous items, wherein the predictive model uses one or more structural descriptors derived from conformational descriptors.

8. The method according to any of the preceding items, wherein the predictive model uses at least one descriptor of the conformer effective cross-sectional diameter, such as the conformer effective cross-sectional diameter calculated on each conformer of a chemical compound (DiamEff), and/or the maximum conformer effective cross-sectional diameter calculated on all conformers of a chemical compound (MaxDiamEff).

9. The method according to any of the preceding items, wherein the predictive model uses at least one descriptor of the maximum conformer effective cross-sectional diameter calculated on all conformers of a chemical compound (MaxDiamEff).

10. The method according to any of the preceding items, wherein the predictive model uses one or more descriptors of donor (electrophilic) superdelocalizability on oxygen, nitrogen or carbon atoms.

11. The method according to any of the preceding items, wherein the method uses one or more descriptors of donor (electrophilic) superdelocalizability on nitrogen atoms.

12. The method according to any one of the preceding items, wherein the predictive model uses at least one descriptor of the maximum donor (electrophilic) superdelocalizability on nitrogen atoms of a chemical compound.

13. The method according to any one of the preceding items, wherein the predictive model uses at least one descriptor of the maximum donor (electrophilic) superdelocalizability on nitrogen atoms on each conformer of a chemical compound (D E conformer) and/or the maximum donor (electrophilic) superdelocalizability on nitrogen atoms on all conformers of a chemical compound (D E structure).

14. The method according to any one of the preceding items, wherein the predictive model uses a descriptor of the maximum donor (electrophilic) superdelocalizability on nitrogen atoms on all conformers of a chemical compound (D E structure).

15. The method according to any of the previous items, wherein the predictive method uses pKa (acidic) as a descriptor.

16. The method according to any of the previous items, wherein the predictive model uses a combination of descriptors comprising a descriptor of conformer effective diameter and a descriptor of donor (electrophilic) superdelocalizability on nitrogen atoms.

17. The method according to any of the previous items, wherein the predictive model uses a predictive threshold of pKa (acidic) in the range of about 0 to about 16, such as about 2 to about 8, such as about 4 to about 6, such as about 4, or such as about 5, or such as about 6.

18. The method according to any of the previous items, wherein the predictive model uses a predictive threshold of the maximum donor (electrophilic) superdelocalizability on the nitrogen atoms of all conformers of a chemical compound in the range of about 0.1 a.u./eV to about 0.4 a.u./eV, such as about 0.2 a.u./eV to about 0.3 a.u./eV, such as about 0.25 a.u./eV to about 0.3 a.u./eV, such as about 0.26 a.u./eV to 0.28 a.u./eV, such as about 0.278 a.u./eV.

19. The method according to any of the previous items, wherein the predictive model uses a predictive threshold of the maximum conformer effective cross-sectional diameter calculated on all conformers of a given chemical compound or structure in the range of about 5 Å to 15 Å, such as 9 Å to 11 Å, such as 10 Å to 10.5 Å, or such as in the range of 10.3 Å to 10.4 Å, such as about 10.36 Å.

20. The method according to any of the previous items, wherein the predictive model uses a predictive threshold of pKa (acidic) in the range of about 0 to 16, such as about 2 to 8, and a predictive threshold of maximum donor (electrophilic) superdelocalizability on the nitrogen atoms of all conformers of a chemical compound (D E structure) in the range of about 0.1 a.u./eV to about 0.4 a.u./eV, such as within the range of about 0.2 a.u./eV to about 0.3 a.u./eV, and a predictive threshold of maximum conformer effective cross-sectional diameter calculated on all conformers of a given chemical compound or structure (MaxDiamEff) in the range of about 5 Å to 15 Å, such as 8 Å to 12 Å, such as 9 Å to 11 Å.

21. The method according to any of the previous items, wherein the predictive threshold of pKa (acidic) is in the range of about 2 to 8, and the predictive threshold of maximum donor (electrophilic) superdelocalizability on the nitrogen atoms of all conformers of a chemical compound (D E structure) is in the range of about 0.275 a.u./eV to 0.280 a.u./eV, and the predictive threshold of maximum conformer effective cross-sectional diameter calculated on all conformers of a given chemical compound or structure (MaxDiamEff) may be in the range of 9 Å to 11 Å, such as in the range of about 10.3 Å to about 10.4 Å.

22. The method according to any of the previous items, wherein the predictive model uses a combination of descriptors comprising:

a) a structural descriptor of maximum conformer effective diameter (MaxDiamEff) and/or,

b) a structural descriptor of maximum donor (electrophilic) superdelocalizability on nitrogen atoms (D E structure), and/or

c) a structural descriptor of pKa (acidic).

23. The method according to any of the previous items, wherein the predictive model uses a combination of all the descriptors a) to c):

a) a structural descriptor of maximum conformer effective diameter (MaxDiamEff) and,

b) a structural descriptor of maximum donor (electrophilic) superdelocalizability on nitrogen atoms (D E structure), and

c) a structural descriptor of pKa (acidic).

24. The method according to any of the previous items, wherein the method comprises the use of a binary classification model.

25. The method according to any of the previous items, wherein the training set is sorted with respect to hERG channel inhibiting activity.

26. The method according to any of the previous items, wherein the training set is sorted with respect to hERG channel inhibiting activity measured as IC 50 .

27. The method according to any of the previous items, wherein the training set is confined by compounds having channel inhibiting activity IC 50 < 10 μM and compounds having IC 50 ≥ 40 μM.

28. The method according to any of the previous items, wherein a negative prediction is associated with a hERG IC 50 ≥ 40 μM.

29. The method according to any of the previous items, wherein a positive prediction is associated with a hERG IC 50 < 10 μΜ.

30. The method according to any of the previous items, wherein the predictive model uses 1 to 10 descriptors.

31. The method according to any of the previous items, wherein the predictive model uses 1 to 5 descriptors.

32. The method according to any of the previous items, wherein the predictive model uses 1 to 3 descriptors.

33. The method according to any of the previous items, wherein the binary classification model is developed by use of a binary decision tree classifier.

34. The method according to any of the previous items, wherein the binary classification model is developed by use of the See5 classifier construction system.

35. The method according to any of the previous items, using the rule defined as: a positive prediction is returned if all conditions a), b), c) and d) are fulfilled, a negative prediction is returned if condition a) is fulfilled, and one or more of the conditions b), c) and d) is not fulfilled,

wherein the conditions a), b), c) and d) are defined as:

a) The compound comprises a nitrogen atom,

b) pKa (acidic) > 5,

c) there exists a conformer such that 0.278 a.u./eV < maximum donor (electrophilic) superdelocalizability on the nitrogen atoms (D E structure),

d) Maximum conformer effective cross-sectional diameter (MaxDiamEff) > 10.36 Å.

36. The method according to any of the previous items, using the rule defined as: a negative prediction is returned if condition a) is not fulfilled, and one or more of conditions b) and c) is fulfilled, wherein the conditions a), b) and c) are defined as:

a) The compound does not comprise a nitrogen,

b) pKa (acidic) < 5,

c) Maximum conformer effective cross-sectional diameter (MaxDiamEff) < 10.36 Å.

37. The method according to any of the previous items wherein both rules as defined in items 35 and 36 are used.

38. The method according to any of the previous items, wherein a positive prediction requires that the maximum conformer effective cross-sectional diameter (MaxDiamEff) and the maximum donor (electrophilic) superdelocalizability on the nitrogen atoms (D E structure) conditions are fulfilled in the same conformer.

39. The method according to any of the previous items, wherein the applicability domain is confined by compounds having 1.3 < pKa (acidic) < 16.

40. The method according to any of the previous items, wherein the applicability domain is confined by compounds having 6.38 Å < Maximum conformer effective cross-sectional diameter (MaxDiamEff) < 18.78 Å.

41. The method according to any of the previous items, wherein the applicability domain is confined by compounds having at least one acidic ionogenic group and either a) no basic ionogenic groups, or b) pKa (acidic) < pKa (basic).

42. The method according to any of the previous items, wherein the applicability domain is confined by compounds having a) at least one nitrogen atom and 0.114 a.u./eV < maximum donor (electrophilic) superdelocalizability on nitrogen atoms (D E structure) ≤ 0.317 a.u./eV, or b) no nitrogen atom and one or more of the following conditions: pKa (acidic) < 5 or there exists no conformer whose effective diameter is > 10.36 Å.

43. The method according to any of the previous items, wherein the applicability domain is confined by the compounds having all properties of the compounds defined in items 39 to 42.

44. A method for predicting hERG channel inhibiting activity of chemical substances selected from the group consisting of acids and zwitterionic compounds, wherein the method comprises the use of a predictive method.

45. The method according to item 44, wherein the predictive method is further defined as in items 2 to 43.

46. A computer-assisted method or prediction method further defined as in any one of the preceding items.

47. A method for predicting hERG channel inhibiting activity of chemical substances selected from the group consisting of acids and zwitterionic compounds, wherein the method comprises the use of a prediction model.

48. The method according to item 47, wherein the prediction method comprises the use of a binary classification model.

49. The method according to any of items 47 and 48, wherein prediction method uses one or more structural descriptors.

50. The method according to any of items 47 to 49, wherein prediction method uses one or more structural descriptors derived from conformational descriptors.

51. The method according to any of items 47 to 50, wherein the prediction method uses at least one descriptor of the conformer effective cross-sectional diameter.

52. The method according to any of items 47 to 51, wherein the prediction method uses at least one descriptor of the maximum donor (electrophilic) superdelocalizability on nitrogen atoms of a chemical compound.

53. The method according to any of items 47 to 52, wherein the prediction method uses pKa (acidic) as a descriptor.

54. The method according to any of items 47 to 53, wherein the prediction method uses a combination of descriptors comprising:

a) a structural descriptor of maximum conformer effective diameter on all conformers of a chemical compound (MaxDiamEff) and/or,

b) a structural descriptor of maximum donor (electrophilic) superdelocalizability on nitrogen atoms on all conformers of a chemical compound (D E structure), and/or

c) a structural descriptor of pKa (acidic).

55. The method according to any of items 47 to 54, wherein the prediction method uses a predictive threshold of the maximum donor (electrophilic) superdelocalizability on the nitrogen atoms of all conformers (D E structure) in the range of about 0.1 a.u./eV to about 0.4 a.u./eV, such as about 0.2 a.u./eV to about 0.3 a.u./eV, such as about 0.25 a.u./eV to about 0.3 a.u./eV, such as about 0.26 a.u./eV to 0.28 a.u./eV, such as about 0.278 a.u./eV.

56. The method according to any of items 47 to 55, wherein the prediction method uses a predictive threshold of the maximum conformer effective cross-sectional diameter calculated on all conformers of a given chemical compound or structure (MaxDiamEff) in the range of about 5 Å to 15 Å, such as 9 Å to 11 Å, such as 10 Å to 10.5 Å, or such as in the range of 10.3 Å to 10.4 Å, such as about 10.36 Å.

57. The method according to any of items 47 to 56, wherein the prediction method uses a predictive threshold of pKa (acidic) in the range of about 0 to about 16, such as about 2 to about 8, such as about 4 to about 6, such as about 4, or such as about 5, or such as about 6.

58. A computer-assisted method or prediction method further defined as in any one of the preceding items.

References

Aptula AO, Cronin MT (2004), Prediction of hERG K+ blocking potency: application of structural knowledge, SAR QSAR Environ. Res. 15 (5-6), 399-41 1 . Aronov AM (2005), Predictive in silico modeling for hERG channel blockers, Drug Discovery Today 10 (2), 149-155.

Brooke DN, Cronin MTD, Calculation of Molecular Dimensions Related to Indicators for Low Bioaccumulation Potential, Environment Agency, Bristol, 2009, Science Report, https://www.gov.Uk/government/uploads/system/uploads/attachm ent_data/file/291060/s cho0109bpgt-e-e.pdf

Cianchetta G, Li Y, Kang J, Rampe D, Fravolini A, Cruciani G, Vaz RJ (2005),

Predictive models for hERG potassium channel blockers, Bioorganic and Medicinal Chemistry Letters 15 (15), Pages 3637-3642.

Coulson CA, Longuet-Higgins HC, Proc. Roy. SOC. (London) A 192, 16-32 (1947). Cross KP, Myatt G, Yang C, Fligner MA, Verducci JS, Blower PE, Finding

Discriminating Structural Features by Reassembling Common Building Blocks, J. Med. Chem. 2003, 46, 4770-4775.

G.J. Diaz et al. / Journal of Pharmacological and Toxicological Methods 50 (2004) 187— 199.

Dimitrov SD, Dimitrova NC, Walker JD, Veith GD, Mekenyan OG (2003),

Bioconcentration potential predictions based on molecular attributes - an early warning approach for chemicals found in humans, birds, fish and wildlife. QSAR Comb. Sci. 22: 58-68. Doddareddy MR, Klaasse EC, Shagufta, IJzerman AP, Bender A, Prospective Validation of a Comprehensive In silico hERG Model and its Applications to Commercial Compound and Drug Databases, ChemMedChem 2010, 5, 716 - 729.

Fernandez D, Ghanta A, Kauffman GW, Sanguinetti MC (2004), Physicochemical features of the HERG channel drug binding site, J Biol Chem. 279 (11), 10120-7.

Fukui, K., Kato, H. and Yonezawa, T., Bull. Chem. Soc. Jpn. 27, 423-427 (1961).

Gaudio AC, Takahata Y (1992), Calculation of molecular surface area with numerical factors, Computers and Chemistry, Vol. 16 (4), 277-284.

Gepp MM, Hutter MC (2006), Determination of hERG channel blockers using a decision tree, Bioorganic and Medicinal Chemistry 14, 5325-5332.

Haga Y, Mizutani S, Naya A, Kishino H, Iwaasa H, Ito M, Ito J, Moriya M, Sato N, Takenaga N, Ishihara A, Tokita S, Kanatani A, Ohtake N (2011), Discovery of novel phenylpyridone derivatives as potent and selective MCH1R antagonists, Bioorganic & Medicinal Chemistry, Volume 19, Issue 2, pp. 883-893.

Jentsch, Nature Reviews Neuroscience 2000, 1, 21-30.

Juska L, Didziapetris R, Japertas P (2008), Trainable model of hERG channel inhibition prediction, Abstracts/Toxicology Letters 180S, S32-S246.

Keseru G, Prediction of hERG potassium channel affinity by traditional and hologram qSAR methods, Bioorganic and Medicinal Chemistry Letters 2003, Volume 13, Issue 16, pp. 2773-2775.

Li, Q., Jorgensen, F.S., Oprea, T., Brunak, S., Taboureau, O (2008), hERG classification model based on a combination of support vector machine method and GRIND descriptors, Molecular Pharmaceutics, 5 (1), pp. 117-127.

Liu T, Lin Y, Wen X, Jorissen RN, Gilson MK (2007), BindingDB: a web-accessible database of experimentally determined protein-ligand binding affinities, Nucleic Acids Research 35:D198-D201.

Marquis RW, Lago AM, Callahan JF, Rahman A, Dong X, Stroup GB, Hoffman S, Gowen M, DelMar Eric G, Van Wagenen BC, Logan S, Shimizu S, Fox J, Nemeth EF, Roethke T, Smith BR, Ward KW, Bhatnagar P, Antagonists of the Calcium Receptor. 2. Amino Alcohol-Based Parathyroid Hormone Secretagogues, Journal of Medicinal Chemistry 2009, Volume 52, Issue 21, pp. 6599-6605.

Mekenyan, O. G.; Dimitrov, D.; Nikolova, N.; Karabunarliev, St. Conformational Coverage by a Genetic Algorithm. J. Chem. Inf. Comput. Sci. 1999, 39 (6), 997-1016.

Mekenyan, O. G.; Pavlov, T.; Grancharov, V.; Todorov, M.; Schmieder, P.; Veith, G. 2D-3D Migration of Large Chemical Inventories with Conformational Multiplication. Application of the Genetic Algorithm. J. Chem. Inf. Model. 2005, 45 (2), 283-292.

Meyer AY, Molecular Mechanics and Molecular Shape. Part 1. van der Waals Descriptors of Simple Molecules, J. Chem. Soc. Perkin Trans. II 1985, 1161-1169.

Mitcheson JS, Chen J, Lin M, Culberson C, Sanguinetti MC (2000), A structural basis for drug-induced long QT syndrome, Proc. Natl. Acad. Sci. U.S.A., 97, 12329-12333.

Nikolov N, Grancharov V, Stoyanova G, Pavlov T, Mekenyan O (2006), Representation of Chemical Information in OASIS Centralized 3D Database for Existing Chemicals, J. Chem. Inf. Model., 46 (6), 2537-2551.

S. T. Murphy et al., Bioorg. Med. Chem. Lett. 17 (2007), 2150-2155.

Obiol-Pardo C, Gomis-Tena J, Sanz F, Saiz J, Pastor M (2011), A Multiscale Simulation System for the Prediction of Drug-Induced Cardiotoxicity, J. Chem. Inf. Model. 51, 483-492.

P. Jaccard (1901), Distribution de la flore alpine dans le bassin des Dranses et dans quelques régions voisines, Bulletin de la Société Vaudoise des Sciences Naturelles 37, 241-272.

Perry M, Sanguinetti M, Mitcheson J (2010), Revealing the structural basis of action of hERG potassium channel activators and blockers, J Physiol 588.17, 3157-3167.

Polak S, Wisniowska B, Brandys J, Collation, assessment and analysis of literature in vitro data on hERG receptor blocking potency for subsequent modeling of drugs' cardiotoxic properties, J. Applied Toxicology 2009; 29: 183-206, doi 10.1002/jat.1395.

Sanguinetti MC, Jiang C, Curran ME, Keating MT (1995), A mechanistic link between an inherited and an acquired cardiac arrhythmia: HERG encodes the IKr potassium channel, Cell, 81 (2), 299-307.

Sanguinetti MC, Tristani-Firouzi M, hERG potassium channels and cardiac arrhythmia, Nature 2006, 440, 463-469.

Schiesaro A, Ecker GF, Prediction of hERG channel inhibition using in silico techniques, in: Gupta S (Ed.), Ion channels and their inhibitors, DOI 10.1007/978-3-642-19922-6_7, Springer, 2011.

Schomer E, Smallest enclosing cylinders, Algorithmica (2000) 27: 170-186.

Schuurmann, G. Env. Tox. Chem. (9), 417 (1990), (A)

Schuurmann, G. Quant. Struct.-Act. Relat. (9), 326 (1990), (B)

Simon J. Shaw, Yue Chen, Hao Zheng, Hong Fu, Mark A. Burlingame, Saul Marquez, Yong Li, Mark Claypool, Christopher W. Carreras, William Crumb, Dwight J. Hardy, David C. Myles, and Yaoquan Liu, Structure-Activity Relationships of 9-Substituted-9-Dihydroerythromycin-Based Motilin Agonists: Optimizing for Potency and Safety, J. Med. Chem. 2009, 52, 6851-6859.

Stewart, J. J. P. MOPAC 93; Fujitsu Limited: Chiba 261, Japan; Stewart Computational Chemistry: Colorado Springs, CO, 1993.

Stewart, J. J. P. MOPAC: A Semiempirical Molecular Orbital Program. J. Comput.-Aided Mol. Des. 1990, 4, 1-105.

Taglialatela M, Pannaccione A, Castaldo P et al., Molecular basis for the lack of HERG K+ channel block-related cardiotoxicity by the H1 receptor blocker cetirizine compared with other second-generation antihistamines, Mol Pharmacol. 1998 Jul;54(1):113-121.

Todd A. Brugel, Reed W. Smith, Michael Balestra, Christopher Becker, Thalia Daniels, Tiffany N. Hoerter, Gerard M. Koether, Scott R. Throner, Laura M. Panko, James J. Folmer, Joseph Cacciola, Angela M. Hunter, Ruifeng Liu, Philip D. Edwards, Dean G. Brown, John Gordon, Norman C. Ledonne, Mark Pietras, Patricia Schroeder, Linda A. Sygowski, Lee T. Hirata, Anna Zacco, Matthew F. Peters, Discovery of 8-azabicyclo[3.2.1]octan-3-yloxy-benzamides as selective antagonists of the kappa opioid receptor. Part 1, Bioorganic & Medicinal Chemistry Letters 20 (2010) 5847-5852.

Todorov M, Mombelli E, Aït-Aïssa S, Mekenyan O (2011), Androgen receptor binding affinity: a QSAR evaluation, SAR and QSAR in Environmental Research, 22:3-4, 265-291.

Valerio LG, Yang C, Arvidson KB, Kruhlak NL, A structural feature-based computational approach for toxicology predictions, Expert Opin. Drug Metab. Toxicol. (2010) 6(4):505-518.

Wang M, Yang XG, Xue Y (2008), Identifying hERG potassium channel inhibitors by machine learning methods, QSAR and Combinatorial Science 27, No. 8, 1028-1035, doi: 10.1002/qsar.200810015. (hERG C4.5)

Waring MJ, Johnstone C (2007), A quantitative assessment of hERG liability as a function of lipophilicity, Bioorganic & Medicinal Chemistry Letters 17, 1759-1764.