METHOD OF PREDICTION OF THE THREE-DIMENSIONAL CONFORMATION OF FLEXIBLE PROTEINS

Title:

METHOD OF PREDICTION OF THE THREE-DIMENSIONAL CONFORMATION OF FLEXIBLE PROTEINS

Document Type and Number:

WIPO Patent Application WO/2007/127367

Abstract:

Computer-implemented methods to predict the structure of an alternative conformation for a macromolecule comprising a hinge, and computer-implemented methods of predicting the location of a hinge in a polypeptide, are provided. A computer readable medium and a system useful for practicing the methods in a computing environment are also provided.

Inventors:

FLORES SAMUEL C (US)
GERSTEIN MARK B (US)

Application Number:

PCT/US2007/010234

Publication Date:

November 13, 2008

Filing Date:

April 25, 2007

Assignee:

UNIV YALE (US)
FLORES SAMUEL C (US)
GERSTEIN MARK B (US)

International Classes:

G01N33/48; G16B15/00

Foreign References:

US20020072864A1

2002-06-13

Other References:

KROL ET AL.: "Local and long-range structural effects caused by the removal of the N-terminal polypeptide fragment from immunoglobulin L chain gamma", BIOPOLYMERS, vol. 69, 2003, pages 189 - 200
SHEINERMAN ET AL.: "On the role of electrostatic interactions in the design of protein-protein interfaces", JOURNAL OF MOLECULAR BIOLOGY, vol. 318, 2002, pages 171 - 177
BEROZA ET AL.: "Calculation of amino acid pKas in a protein from a continuum electrostatic model: method and sensitivity analysis", JOURNAL OF COMPUTATIONAL CHEMISTRY, vol. 17, 1996, pages 1229 - 1244

Attorney, Agent or Firm:

DOYLE, Kathryn (One Logan Square, 18th And Cherry Street, Philadelphia PA, US)

Claims:

CLAIMS

What is claimed is:

1. A computer-implemented method for identifying an alternative structure for a macromolecule having a hinge, comprising: a) identifying a hinge location between a first domain and a second domain in said macromolecule; b) rotating one of said domains with respect to the other domain to generate a first conformer; c) equilibrating said first generated conformer to generate an equilibrated first conformer; d) determining a free energy for said generated equilibrated first conformer; and e) repeating steps b), c) and d) iteratively on said equilibrated first conformer until model space is sufficiently populated thereby generating an ensemble of conformers, wherein each generated conformer has a free energy value, and wherein a conformer with a low free energy is an alternative structure for said macromolecule.

2. The method of claim 1, wherein said macromolecule is selected from the group consisting of polypeptide, RNA and DNA.

3. The method of claim 1, wherein said rotating step comprises rotating a domain around one of the x-axis, the y-axis and the z-axis in a fixed increment to generate one of six different conformers.

4. The method of claim 3, wherein said rotating step is repeated to generate said six different conformers.

5. The method of claim 3, wherein said fixed increment is +15° or -15°.

6. The method of claim 1, wherein said equilibrating step comprises a molecular dynamics equilibration run for a sufficient length of time.

7. The method of claim 4 wherein at least one of said six different conformers is subjected to steps b), c) and d) to generate additional different conformers.

8. A computer readable medium having computer readable instructions to instruct a computer to perform steps for a method for identifying an alternative structure for a macromolecule having a hinge, comprising: a) identifying a hinge location between a first domain and a second domain in said macromolecule; b) rotating one of said domains with respect to the other domain to generate a first conformer; c) equilibrating said first generated conformer to generate an equilibrated first conformer; d) determining a free energy for said generated equilibrated first conformer; and e) repeating steps b), c) and d) iteratively on said equilibrated first conformer until model space is sufficiently populated thereby generating an ensemble of conformers, wherein each generated conformer has a free energy value, and wherein a conformer with a low free energy is an alternative structure for said macromolecule.

9. A system for identifying an alternative structure for a macromolecule having a hinge, said system comprising: a computing environment executing a macromolecule processing computing application comprising, a hinge location identification module configured to identify the location of a hinge between a first domain and a second domain in said macromolecule; a rotation module configured to rotate one domain with respect to the other domain to generate a first conformer; an equilibration module configured to perform a molecular dynamics equilibration on said generated first conformer to generate an equilibrated first conformer; and a calculation module configured to calculate the free energy of said generated equilibrated first conformer.

10. The system of claim 9, further comprising an iteration module configured to perform iteratively the operations of said rotation module, said equilibration module and said calculation module on said generated equilibrated first conformer until model space is sufficiently populated thereby generating an ensemble of conformers.

11. A computer-implemented method for identifying an alternative ligand-binding structure for a macromolecule having a hinge, comprising: a) identifying a hinge location between a first domain and a second domain in said macromolecule; b) rotating one of said domains with respect to the other domain to generate a first conformer, wherein a ligand is docked to one of said domains and remains stationary with respect to that domain; c) equilibrating said first generated conformer to generate an equilibrated first conformer; d) re-docking said ligand to said generated equilibrated first conformer to generate an equilibrated ligand-docked conformer; e) determining a binding energy for said generated equilibrated ligand-docked conformer; and f) repeating steps b), c), d) and e) iteratively until model space is sufficiently populated thereby generating an ensemble of ligand-docked conformers, wherein each generated ligand-docked conformer has a binding energy value, and wherein a ligand-docked conformer with a low binding energy is an alternative ligand-binding structure for said macromolecule.

12. The method of claim 11, wherein said macromolecule is selected from the group consisting of polypeptide, DNA and RNA.

13. The method of claim 11, wherein said rotating step comprises rotating a domain around one of the x-axis, the y-axis and the z-axis in a fixed increment to generate one of six different conformers.

14. The method of claim 13, wherein said rotating step is repeated to generate said six different conformers.

15. The method of claim 13, wherein said fixed increment is +15° or -15°.

16. The method of claim 11, wherein said equilibrating step comprises a molecular dynamics equilibration run for a sufficient length of time.

17. The method of claim 14 wherein at least one of said six different conformers is subjected to steps b), c) and d) to generate additional different conformers.

18. A computer readable medium having computer readable instructions to instruct a computer to perform steps for a method for identifying an alternative ligand-binding structure for a macromolecule having a hinge, comprising: a) identifying a hinge location between a first domain and a second domain in said macromolecule; b) rotating one of said domains with respect to the other domain to generate a first conformer, wherein a ligand is docked to one of said domains and remains stationary with respect to that domain; c) equilibrating said first generated conformer to generate an equilibrated first conformer; d) re-docking said ligand to said generated equilibrated first conformer to generate an equilibrated ligand-docked conformer; e) determining a binding energy for said generated equilibrated ligand-docked conformer; and f) repeating steps b), c), d) and e) iteratively until model space is sufficiently populated thereby generating an ensemble of ligand-docked conformers, wherein each generated ligand-docked conformer has a binding energy value, and wherein a ligand-docked conformer with a low binding energy is an alternative ligand-binding structure for said macromolecule.

19. A system for identifying an alternative ligand-binding structure for a macromolecule having a hinge, said system comprising: a computing environment executing a macromolecule processing computing application comprising, a hinge location identification module configured to identify the location of a hinge between a first domain and a second domain in said macromolecule; a rotation module configured to rotate one domain with respect to the other domain to generate a first conformer, wherein a ligand is docked to one of said domains and remains stationary with respect to that domain; an equilibration module configured to perform a molecular dynamics equilibration on said first conformer to generate an equilibrated first conformer; a re-docking module configured to dock said ligand to said generated equilibrated first conformer to generate an equilibrated ligand-docked first conformer; and a calculation module configured to calculate the binding energy of said generated equilibrated ligand-docked first conformer.

20. The system of claim 19, further comprising an iteration module configured to perform iteratively the operations of said rotation module, said equilibration module and said calculation module on said generated equilibrated ligand-docked first conformer until model space is sufficiently populated thereby generating an ensemble of conformers.

21. A computer-implemented method for identifying the location of a hinge in a polypeptide, comprising: a) cutting the polypeptide backbone at a cut site between a first residue and a second residue to produce a first fragment and a second fragment; b) determining an energy value for each of said first and second fragments; c) calculating the energy change associated with said cut site based on said energy value for each of said first and second fragments; d) repeating steps a), b) and c) iteratively wherein each iteration cuts at a different cut site to produce a database of cut sites and associated energy changes; and e) analyzing said database for local energy minima, wherein a cut site associated with a local energy minimum is identified as a hinge location.

22. A computer-implemented method for identifying the location of a hinge in a polypeptide, comprising calculating the first normal mode displacements of said polypeptide, wherein the nodal surface of the lower order normal mode eigenvector is identified as a hinge location.

23. A computer-implemented method for identifying the location of a hinge in a polypeptide, comprising: a) obtaining a first identification of hinge location based on a first hinge identification method; b) obtaining a second identification of hinge location based on a second hinge identification method; c) calculating a weighted vote of said first identification and said second identification to identify a location of a hinge in said polypeptide.

24. The method of claim 23, further comprising a third identification of hinge location based on a third hinge identification method, wherein said calculating step comprises calculating a weighted vote of said first identification, said second identification and said third identification.

25. The method of claim 24, further comprising a fourth hinge identification of hinge location based on a fourth hinge identification method, wherein said calculating step comprises calculating a weighted vote of said first identification, said second identification, said third identification and said fourth identification.

26. The method of any of claims 23, 24 and 25, wherein one of said hinge identification methods comprises: a) cutting the polypeptide backbone at a cut site between a first residue and a second residue to produce a first fragment and a second fragment; b) determining an energy value for each of said first and second fragments; c) calculating the energy change associated with said cut site based on said energy value for each of said first and second fragments; d) repeating steps a), b) and c) iteratively wherein each iteration cuts at a different cut site to produce a database of cut sites and associated energy changes; and e) analyzing said database for local energy minima, wherein a cut site associated with a local energy minimum is identified as a hinge location.

27. The method of any of claims 23, 24, 25 and 26, wherein one of said hinge identification methods comprises calculating the first normal mode displacements of said polypeptide, wherein the nodal surface of the lower order normal mode eigenvector is identified as a hinge location.

Description:

Method of Prediction of the Three-Dimensional Conformation of Flexible Proteins

BACKGROUND OF THE INVENTION

Protein motion is often the link between structure and function. The ability to predict protein motion and associated alternative conformations of proteins would be extremely useful in many applications, for instance, screening early-stage lead compounds for binding to a target protein as part of drug development. Computational drug screening is desirable for its potential to reduce the time and expense of finding drugs. While significant advances have been made on the problem of predicting thermodynamically accessible alternate conformations of proteins, given a single set of protein structural coordinates, the motions involving the most extensive conformational changes remain beyond the reach of economical prediction for most proteins.

The problem of docking small-molecule ligands to protein receptors can be approached by several successively more accurate methods. In rigid-enzyme, rigid-ligand docking, both the protein receptor and the ligand are assumed rigid. Only six degrees of freedom, the rotations and translations of the ligand, are treated. Many docking codes are capable of performing this type of analysis; however, it is not currently considered an important problem. In rigid-enzyme, flexible-ligand docking, the atoms and chemical groups in the ligand are free to move about rotatable bonds. AutoDock, GOLD, DOCK, and other codes with this capability have reached a high level of sophistication and have proven useful in a wide range of applications. In flexible-ligand, flexible-sidechain, rigid-backbone docking, backbone atoms remain stationary, while the protein side chains are flexible. There are various methods currently available for this type of docking. Methods for predicting docking with flexible ligands and flexible sidechains, with small backbone fluctuations, are currently under development. To date, however, a program designed to predict docking of flexible ligands to proteins with large-scale domain, hinge-bending motions has not been developed.

Thus, there is a need in the art for a method of predicting alternative protein conformations and ligand-binding conformations based on large scale domain, hinge bending motions. This invention addresses this need.

BRIEF SUMMARY OF THE INVENTION

The invention provides a computer-implemented method for identifying an alternative structure for a macromolecule having a hinge. The method comprises the steps of a) identifying a hinge location between a first domain and a second domain in the macromolecule; b) rotating one of the domains with respect to the other domain to generate a first conformer; c) equilibrating the first generated conformer to generate an equilibrated first conformer; d) determining a free energy for the generated equilibrated first conformer; and e) repeating steps b), c) and d) iteratively on the equilibrated first conformer until model space is sufficiently populated thereby generating an ensemble of conformers, wherein each generated conformer has a free energy value, and wherein a conformer with a low free energy is an alternative structure for said macromolecule.

A computer-implemented method for identifying an alternative ligand-binding structure for a macromolecule having a hinge is also provided by the invention. The method comprises a) identifying a hinge location between a first domain and a second domain in the macromolecule; b) rotating one of the domains with respect to the other domain to generate a first conformer, wherein a ligand is docked to one of the domains and remains stationary with respect to that domain; c) equilibrating the first generated conformer to generate an equilibrated first conformer; d) re-docking the ligand to the generated equilibrated first conformer to generate an equilibrated ligand-docked conformer; e) determining a binding energy for the generated equilibrated ligand-docked conformer; and f) repeating steps b), c), d) and e) iteratively until model space is sufficiently populated thereby generating an ensemble of ligand-docked conformers, wherein each generated ligand-docked conformer has a binding energy value, and wherein a ligand-docked conformer with a low binding energy is an alternative ligand-binding structure for the macromolecule.

In embodiments of the computer-implemented methods, the macromolecule is selected from the group consisting of polypeptide, RNA and DNA. In some embodiments, the rotating step comprises rotating a domain around one of the x-axis, the y-axis and the z-axis in a fixed increment to generate one of six different conformers. In some embodiments, the rotating step is repeated to generate the six different conformers. In one aspect, the fixed increment is +15° or -15°. In other embodiments, the equilibrating step comprises a molecular dynamics equilibration run for a sufficient length of time. In yet other embodiments, at least one of the six different conformers is subjected to steps b), c) and d) to generate additional different conformers.

The invention further provides a computer-readable medium having computer readable instructions to instruct a computer to perform steps for a method for identifying an alternative structure for a macromolecule having a hinge. Also provided is a system for identifying an alternative structure for a macromolecule having a hinge comprising a computing environment executing a macromolecule processing computing application comprising, a hinge location identification module configured to identify the location of a hinge between a first domain and a second domain in the macromolecule; a rotation module configured to rotate one domain with respect to the other domain to generate a first conformer; an equilibration module configured to perform a molecular dynamics equilibration on the generated first conformer to generate an equilibrated first conformer; and a calculation module configured to calculate the free energy of the generated equilibrated first conformer.

The invention further provides a computer-readable medium having computer readable instructions to instruct a computer to perform steps for a method for identifying an alternative ligand-binding structure for a macromolecule having a hinge. A system for identifying an alternative ligand-binding structure for a macromolecule having a hinge is also provided, the system comprising: a computing environment executing a macromolecule processing computing application comprising, a hinge location identification module configured to identify the location of a hinge between a first domain and a second domain in said macromolecule; a rotation module configured to rotate one domain with respect to the other domain to generate a first conformer, wherein a ligand is docked to one of said domains and remains stationary with respect to that domain; an equilibration module configured to perform a molecular dynamics equilibration on said first conformer to generate an equilibrated first conformer; a re-docking module configured to dock said ligand to said generated equilibrated first conformer to generate an equilibrated ligand-docked first conformer; and a calculation module configured to calculate the binding energy of said generated equilibrated ligand-docked first conformer.

The invention also provides computer-implemented methods for identifying the location of a hinge in a polypeptide.
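The hinge-location claims describe scanning cut sites for local energy minima and combining several predictors by a weighted vote. The following is a minimal Python sketch of those two ideas only. The function names, the energy profile, the second predictor's output, the weights and the threshold are all hypothetical values chosen for illustration; the disclosure does not specify how the vote is tallied, so the additive scoring here is an assumption.

```python
from typing import Dict, List, Sequence


def local_energy_minima(energy_change: Sequence[float]) -> List[int]:
    """Return indices of cut sites whose energy change is a strict local minimum."""
    minima = []
    for i in range(1, len(energy_change) - 1):
        if energy_change[i] < energy_change[i - 1] and energy_change[i] < energy_change[i + 1]:
            minima.append(i)
    return minima


def weighted_vote(predictions: Dict[str, Sequence[int]],
                  weights: Dict[str, float],
                  n_sites: int,
                  threshold: float) -> List[int]:
    """Sum the weight of every method that flags a site; keep sites at or above threshold.

    Additive scoring with a cutoff is an assumption made for this sketch.
    """
    score = [0.0] * n_sites
    for method, sites in predictions.items():
        for site in sites:
            score[site] += weights[method]
    return [site for site, s in enumerate(score) if s >= threshold]


# Hypothetical energy-change profile, one value per cut site (arbitrary units).
profile = [5.0, 4.0, 1.5, 3.0, 6.0, 2.0, 2.5, 7.0]
cut_scan_hinges = local_energy_minima(profile)

# Hypothetical output of a second predictor, with made-up weights.
combined = weighted_vote(
    {"cut_scan": cut_scan_hinges, "normal_mode": [2, 3]},
    {"cut_scan": 0.6, "normal_mode": 0.4},
    n_sites=len(profile),
    threshold=0.7,
)
```

With these made-up inputs, only the site flagged by both predictors clears the threshold, which is the behavior a weighted vote is meant to provide.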

BRIEF DESCRIPTION OF THE DRAWINGS

For the purpose of illustrating the invention, there are depicted in the drawings certain embodiments of the invention. However, the invention is not limited to the precise arrangements and instrumentalities of the embodiments depicted in the drawings.

Figure 1 depicts a flow chart of a method according to an embodiment of the present invention.

Figure 2 depicts a graph of a family of 20 ROC curves. These curves represent the performance of the HingeMaster hinge predictor, a method of the invention, against test sets of 10 protein structures each.

Figure 3 depicts a schematic structure of biotin carboxylase in its apo conformation. The arrow points to the hinge location identified by FlexOracle (residues 86-89 and 182-185), another method of the invention.

Figure 4 depicts a graph of the results of the method using biotin carboxylase. The data are plotted as docked energy versus the RMSD of domain 3. The docked energy was used as the figure of merit. The datapoints for the apo structure and for the bound conformer predicted using the method of the invention are circled.

Figure 5 depicts an alignment of the conformation of ligand-bound biotin carboxylase known from crystal structure data and the conformation predicted by the method of the invention to be the bound structure.

Figure 6 depicts a schematic structure of glutamine binding protein (GlnBP) in its apo conformation. The arrow points to the hinge location used in the example (residues 86-89 and 182-185).

Figure 7 depicts a graph of the results of the method on GlnBP. The data are plotted as docked energy versus the RMSD of domain 3. The docked energy was used as the figure of merit. The datapoints for the apo structure and for the bound conformer predicted using the method of the invention are circled.

Figure 8 depicts an alignment of the conformation of ligand-bound GlnBP known from crystal structure data and the conformation predicted by the method of the invention to be the bound structure.

Figure 9 depicts a schematic structure of MurA in its apo conformation. The arrow points to the hinge location used in the example (residues 20-21 and 228-229).

Figure 10 depicts a graph of the results of the method using MurA. The data are plotted as docked energy versus the RMSD of domain 3. The docked energy was used as the figure of merit. The datapoints for the apo structure and the bound conformer predicted using the method of the invention are circled.

Figure 11 depicts an alignment of the conformation of the apo and ligand-bound glutamine binding protein configurations known from crystal structure data and the conformation predicted by the method of the invention to be the bound structure.

Figure 12 depicts an exemplary computing system in accordance with herein described system and methods.

Figure 13 illustrates an exemplary networked computing environment, with a server in communication with client computers via a communications network, in which the herein described apparatus and methods may be employed.

Figure 14 illustrates an implementation of an exemplary macromolecule processing platform of the invention.

DETAILED DESCRIPTION OF THE INVENTION

The invention provides a method using a computer to identify an alternative structure for a macromolecule for which structural information is known. In brief, the method involves identification of a hinge location in the macromolecule. A collection of conformers is generated by iterative rotation around the hinge and each conformer is equilibrated using molecular dynamics. Advantageously, the method is also applicable to ligand binding molecules, by introducing a step of docking a ligand to each conformer. The free energy of binding can be estimated for the docked structure, providing a figure of merit used to identify a correct ligand-bound conformer.

Definitions

Unless defined otherwise, all technical and scientific terms used herein generally have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Generally, the nomenclature used herein and the laboratory procedures in cell culture, molecular genetics, organic chemistry, and nucleic acid chemistry and hybridization are those well known and commonly employed in the art.

As used herein, each of the following terms has the meaning associated with it in this section.

The articles "a" and "an" are used herein to refer to one or to more than one (i.e., to at least one) of the grammatical object of the article. By way of example, "an element" means one element or more than one element.

As used herein, "conformer" refers to a molecule having the same primary sequence as a parent molecule, but having a different conformation compared to the parent molecule as a result of a rotation around a hinge region.

As used herein, a "starting structure" refers to the conformation which is rotated to generate a conformer. At the start of the method, the structure of the parent molecule is the starting structure. In subsequent iterations of the method, the structure of a conformer generated by the method is the starting structure.

As used herein, a "macromolecule" refers to a very large molecule made up of hundreds or even thousands of atoms. Macromolecules include proteins and nucleic acids, such as DNA and RNA.

As used herein, "equilibrate" refers to a computational simulation of the thermal motion of the molecule for a defined period of time or until some convergence criterion is met.

As used herein, "free energy of binding" refers to the difference in free energy between a protein-ligand complex and the protein and ligand separately.

As used herein, "low free energy" and "low binding energy" are relative to the values associated with other generated conformers.
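The definition of free energy of binding given above can be written compactly as follows. The symbols are notation introduced here for illustration and do not appear in the disclosure: G_complex is the free energy of the protein-ligand complex, and G_protein and G_ligand are the free energies of the separated protein and ligand.

```latex
\Delta G_{\mathrm{bind}} = G_{\mathrm{complex}} - \left( G_{\mathrm{protein}} + G_{\mathrm{ligand}} \right)
```

Under this sign convention, a more negative value indicates more favorable binding, which is consistent with selecting conformers having a low binding energy.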

As used herein, a "hinge" refers to a localized region in a macromolecule at which large changes in main-chain torsional angles occur. In polypeptides, hinge motions usually involve a small number of residues, since even one bond can provide the required rotational freedom. This kind of protein motion is free of packing constraints.

As used herein, "docking" refers to the computational process of finding the most favorable conformation, orientation and position of a ligand with respect to a macromolecule, such as a protein. An estimate of the free energy of binding is typically generated as part of this process.

As used herein, "molecular dynamics" refers to a computer simulation of the motion of atoms in a molecule or molecules, due to various stimuli, such as temperature, applied force, initial position and velocity.

As used herein, "model space" refers to the collection of all possible pitch, yaw and roll angular orientations for the domain that is rotated.

As used herein, model space is "sufficiently populated" when conformers have been generated that span a range of pitch, yaw and roll angles specified by the user, when no new child conformers can be generated (e.g., due to steric clash), or both.

It is understood that any and all whole or partial integers between any ranges set forth herein are included herein.

Description

The invention provides a method using a computer to identify an alternative structure for a macromolecule having a hinge. The macromolecule may be any macromolecule, including but not limited to a polypeptide, a DNA molecule or an RNA molecule. In a preferred embodiment, the macromolecule is a polypeptide. The method assumes that the macromolecule consists of two rigid domains, separated by a flexible hinge. In brief, the method includes rotating one domain with respect to the other domain in fixed increments to generate an alternative conformation of the macromolecule, equilibrating the conformer, and calculating the enthalpy, free energy of folding, free energy of binding or other meaningful energy parameter for the alternative conformation. This process is repeated iteratively to generate a data set of conformers having alternative conformations with respect to the parent macromolecule conformation and an energy parameter associated with each conformer, some of which are structurally similar to thermodynamically probable ligand-bound protein structures. If the motion consists of hinge bending without significant translations or changes in secondary structure, the conformer with a low free energy of binding is expected to represent a thermodynamically-probable alternative conformation of the macromolecule. One advantage of this program is that it generates an ensemble of conformers that more closely resembles the actual thermal ensemble than any ensemble generated by other methods, such as normal mode analysis.
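The fixed-increment rotation described above can be illustrated with a short Python sketch that rotates one rigid domain about the x-, y- and z-axes by +15° and -15° to produce six child conformers. This is a sketch under stated assumptions, not the disclosed implementation: the function names are hypothetical, coordinates are plain tuples rather than a PDB file, and the choice of the hinge position as the pivot through which the rotation axes pass is an assumption made here.

```python
import math
from typing import Dict, List, Tuple

Vec = Tuple[float, float, float]


def rotation_matrix(axis: str, degrees: float) -> List[List[float]]:
    """Right-handed rotation matrix about a Cartesian axis."""
    c = math.cos(math.radians(degrees))
    s = math.sin(math.radians(degrees))
    if axis == "x":
        return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]
    if axis == "y":
        return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
    if axis == "z":
        return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    raise ValueError(f"unknown axis: {axis}")


def rotate_about_pivot(coords: List[Vec], pivot: Vec, axis: str, degrees: float) -> List[Vec]:
    """Rigidly rotate coordinates about an axis passing through the pivot point."""
    m = rotation_matrix(axis, degrees)
    rotated = []
    for p in coords:
        d = [p[i] - pivot[i] for i in range(3)]  # move pivot to the origin
        rotated.append(tuple(sum(m[row][k] * d[k] for k in range(3)) + pivot[row]
                             for row in range(3)))
    return rotated


def six_child_conformers(fixed: List[Vec], mobile: List[Vec], pivot: Vec,
                         increment: float = 15.0) -> Dict[Tuple[str, float], List[Vec]]:
    """Generate the six conformers: +/- increment about each of x, y and z."""
    children = {}
    for axis in ("x", "y", "z"):
        for sign in (1.0, -1.0):
            angle = sign * increment
            # The fixed domain is untouched; only the mobile domain moves.
            children[(axis, angle)] = fixed + rotate_about_pivot(mobile, pivot, axis, angle)
    return children


# Toy coordinates standing in for two domains joined at a hinge at the origin.
fixed_domain = [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
mobile_domain = [(1.0, 0.0, 0.0), (2.0, 1.0, 0.0)]
hinge = (0.0, 0.0, 0.0)
children = six_child_conformers(fixed_domain, mobile_domain, hinge)
```

Because each rotation is rigid, distances within the mobile domain and from the mobile domain to the pivot are preserved, which matches the assumption of two rigid domains separated by a flexible hinge.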

Figure 1 depicts a flow chart of an embodiment of the method of the invention. At the start 15 of the process 10, a person provides a data set of structural coordinates for a macromolecule for which an alternative structure is sought. Processing then proceeds to block 20 where the location of a hinge is identified in the macromolecule, thereby delineating two domains separated by the hinge. This structure is then used as the starting structure in the processing of block 25. A conformer is generated from the starting structure by rotating one domain with respect to the other, creating a second data set of structural coordinates corresponding to the generated conformer. Processing then proceeds to block 30, where the conformer is equilibrated. Processing proceeds to block 35 where a check is performed. If the generated conformer fails the equilibration step, the conformer is discarded (block 40) and is not used in further iterations of the process. The process then proceeds to block 45 where a check is performed to determine if the model space is sufficiently populated. If the answer to check 45 is yes, the process ends (block 65). If the model space is not sufficiently populated (answer to check 45 is no), processing returns to the starting structure (block 50) from which the discarded conformer was generated. Processing then returns to block 25 where a different rotation is applied to the starting structure. If the answer to the check at block 35 is yes (the conformer passes the equilibration step), processing proceeds to block 55 where the free energy of the equilibrated conformer is calculated and recorded in association with the structural information for the conformer. Structural information refers to the set of coordinates of the atoms in the conformer. For polypeptides, the coordinates are preferably in PDB (protein data bank) file format. Processing then proceeds to block 60 where a check is performed to determine if the model space is sufficiently populated. 
If the check at block 60 indicates that the model space is not sufficiently populated, the processing goes back to block 25, wherein the first generated conformer is used as the starting structure to generate a different conformer by a rotation different from what was used to generate the discarded conformer. If the check at block 60 indicates that the model space is sufficiently populated, the process ends (block 65) with the determination of an alternative structure for the macromolecule, made based on the conformer(s) having the lowest free energy. The user can specify a range of pitch, yaw and roll angles to define the model space. When the iterations reach the user-specified range, the iteration ends. Alternatively, the iterative process ends when no new conformer can be generated because doing so results in steric clashes or irreconcilable unnatural bond lengths or angles. A combination of both is also possible.
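The control flow of Figure 1 can be sketched as a breadth-first exploration of model space. The Python sketch below is an illustration only. The names explore_model_space, toy_equilibrate and toy_free_energy are hypothetical, the two toy callables stand in for a real molecular dynamics equilibration and a real free energy calculation, and labelling each conformer by accumulated (pitch, yaw, roll) angles is a simplifying assumption, since successive rotations about different axes do not commute in general.

```python
from collections import deque
from typing import Callable, Dict, Tuple

Orientation = Tuple[int, int, int]  # accumulated (pitch, yaw, roll) in degrees

# The six fixed-increment moves: +15 or -15 degrees about each axis.
STEPS = [(15, 0, 0), (-15, 0, 0), (0, 15, 0), (0, -15, 0), (0, 0, 15), (0, 0, -15)]


def explore_model_space(
    equilibrate: Callable[[Orientation], bool],
    free_energy: Callable[[Orientation], float],
    max_angle: int,
) -> Dict[Orientation, float]:
    """Populate model space and return the free energy of every conformer kept."""
    start: Orientation = (0, 0, 0)  # the parent structure
    energies: Dict[Orientation, float] = {start: free_energy(start)}
    visited = {start}
    queue = deque([start])
    while queue:  # the loop ends when no new child conformer can be generated
        parent = queue.popleft()
        for step in STEPS:
            child = tuple(p + s for p, s in zip(parent, step))
            if child in visited or any(abs(a) > max_angle for a in child):
                continue  # already generated, or outside the user-specified range
            visited.add(child)
            if not equilibrate(child):
                continue  # failed equilibration: discard, do not iterate from it
            energies[child] = free_energy(child)
            queue.append(child)  # becomes a starting structure in a later iteration
    return energies


# Toy stand-ins (assumptions): one orientation "clashes", and the energy is
# lowest at an arbitrary target orientation.
TARGET = (15, -15, 0)


def toy_equilibrate(o: Orientation) -> bool:
    return o != (30, 0, 0)


def toy_free_energy(o: Orientation) -> float:
    return float(sum((a - t) ** 2 for a, t in zip(o, TARGET)))


ensemble = explore_model_space(toy_equilibrate, toy_free_energy, max_angle=30)
best = min(ensemble, key=ensemble.get)
```

In this toy run the search stops once every orientation within the 30° range has been attempted, and the discarded orientation never appears in the ensemble, mirroring blocks 40 to 50 of the flow chart.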

Advantageously, the method can be readily adapted to the problem of ligand binding motions. In brief, a ligand of interest is docked to each alternative conformer and the binding free energy for the ligand-bound conformer is calculated. This process is repeated iteratively to generate a data set of ligand-bound conformers having alternative conformations and a binding free energy associated with each ligand-bound conformer. As shown herein, the calculated protein-ligand binding free energy can be used to select a correct ligand-bound structure from the ensemble of generated conformers. Thus, those conformers having the lowest free energies of ligand binding represent thermodynamically-likely ligand-bound conformations of the macromolecule.

The method can be used to predict a ligand-bound conformation of a hinge-bending macromolecule, such as a polypeptide, given its apo structure (structure in the absence of a ligand). Advantageously, the computational cost of the method is moderate, permitting practical implementation on a single processor. The method can also be used to generate trajectories of hinge bending motion based on a single structure. As an additional feature, the generated motion conserves the rigidity of the domains, the intermediate structures are equilibrated by molecular dynamics, and the final position of the ligand is predicted. The method is useful both for predicting conformers resulting from large scale hinge bending movement and for minimal hinge bending movement.

While the following disclosure refers to a polypeptide, the method should not be construed as being limited to a polypeptide. Armed with the present disclosure and knowledge in the art, the skilled artisan is able to perform the method of the invention using any macromolecule.

Hinge location

The method of the invention is carried out with a macromolecule for which structural information of the apo or other configuration is available. This structure of the macromolecule is the "parent structure." Preferably, the structural information is based on a crystal structure. However, structural information from NMR or a theoretical structural prediction is also useful.

The location of the hinge in the macromolecule can be determined using any method known in the art. For a polypeptide or other macromolecule that has been crystallized in two different conformations, these conformations can be inspected visually to determine the hinge location. Experimental methods of determining the location of a hinge in a polypeptide include analysis of proteolytic fragments, NMR (nuclear magnetic resonance) and the like.

The hinge location can also be determined theoretically with reasonable accuracy. There are several processes known in the art for identifying a hinge location theoretically. These processes include, but are not limited to, Gaussian Network Model (GNM; Bahar et al., 1997, Fold Des. 2:173-181), Floppy Inclusions and Rigid Substructure Topography (FIRST; U.S. Pat. No. 6,014,449), Translation Libration Screw Motion Determination (TLSMD; Painter et al., 2005, Acta Crystallogr D Biol Crystallogr 61 (Pt 4):465-471), and the FlexOracle, NM and HingeMaster methods of the invention.

In a preferred embodiment, the hinge location is identified using FlexOracle, NM (normal modes) or HingeMaster. Advantageously, the flexibility information provided by these methods can be used for other applications. Such applications include motion prediction by methods other than the method of the invention, elucidation of protein function and other methods apparent to the skilled artisan in view of this disclosure and the knowledge in the art. FlexOracle refers to a method of predicting a hinge location in a polypeptide. There is a 1-cut hinge predictor embodiment and a 2-cut hinge predictor embodiment. The FlexOracle 2-cut hinge predictor is a highly accurate method for hinge identification based on a single structure. NM is a family of hinge predictors based on normal modes. HingeMaster combines several predictors, including FlexOracle and NM, to generate a weighted vote of the combined predictors.

FlexOracle: FlexOracle is a novel method based on the premise that if two or more domains are joined by a hinge, and if a peptide bond is broken on the protein, the energetic cost of separating and solvating the two resulting fragments will be lowest if that break is in a hinge. Conversely, if the break is inside a rigid domain, the energetic cost will be high. Domains can move relative to each other only if the motion is permitted energetically. Thus if two domains have many interdomain interactions they are unlikely to separate. Similarly, if a motion results in the exposure of large hydrophobic areas on the protein, then the energetic and entropic cost of solvation will make that motion less likely to occur.

In one embodiment, FlexOracle is a single-cut hinge predictor. The idea of evaluating the cost of separating two fragments is implemented using the minimization and single point energy evaluation features available in almost any molecular mechanics engine. This energy of separation is equivalent, up to an additive constant, to the difference in enthalpies between the two fragments generated by introducing a single cut on the protein chain on the one hand, and the original, undivided chain on the other hand. This energy evaluation is carried out for every choice of cut location, and the resulting energy vs. cut location graph is expected to have minima at locations that coincide with flexible hinges between domains.

The method starts with an energy minimization step, to relieve any close contacts or unnatural bond lengths or angles in the undivided chain which would bias the results. There are several programs known in the art for carrying out this step. In one embodiment, the step is carried out using TINKER's minimize routine with the OPLS-All Atom (Jorgensen et al., 1996, J Amer Chem Soc. 118:11225-11236) force field and the Ooi-Scheraga Solvent Accessible Surface Area (SASA) (Ooi et al., 1987, PNAS 84:3086-3090) continuum solvation free energy term. For each iteration of the predictor, a cut is introduced between residues i - 1 and i of a polypeptide having N residues. This cut divides the protein into two fragments, numbered 1 and 2. Fragment 1 is a polypeptide containing residues 1 to i - 1, and fragment 2 is another polypeptide containing residues i to N.

These fragments are used in an energy calculation as follows. E_c is defined as the single point energy of the complete (undivided) protein. This includes bonded and non-bonded interactions. In the energy evaluation step, the OPLS-All Atom force field with the SASA implicit solvent model is used again. For each choice of cut location i, fragment single point energies E_frag1(i) and E_frag2(i) are calculated. The method relies on the assumption that

$$\Delta E(i) = E_{\mathrm{frag1}}(i) + E_{\mathrm{frag2}}(i) - E_c \qquad [1]$$

is related to the energy change associated with hinge motion about the selected hinge. The quantity ΔE(i) represents the intra-fragment energy gained or lost by breaking all of the interactions between fragment 1 and fragment 2, as might occur in an opening motion. It also includes the solvation energy which might be gained or lost.

The quantity E_c is a constant independent of the cut location and can be set to zero without consequence. Even when the actual motion of the protein is not an opening one, the method has predictive value because for incorrect choices of the hinge location, i.e., cut locations that are actually inside one of the domains, many inter-fragment interactions would be broken. Also, significant hydrophobic areas would be exposed on the surfaces of fragments 1 and 2. In either case, ΔE(i) would be relatively high.

The procedure of cutting the protein before residue i and computing ΔE(i) is repeated for values of i that are scanned from 2 through N. A plot of ΔE(i) vs. i is made and minima on this graph correspond to hinge locations. Local minima tend to coincide with hinges; globally lowest energy values were not the best indicators of flexibility. However, many minima are generated by short range fluctuations in the predictor results, which do not correspond to hinges. A moving window minimum identifier is used to more clearly define the minima that are most likely to correspond to hinges. In a preferred embodiment, moving window minimum identification starts with the energies being normalized to range from 0 to 1. A given residue is considered to be a minimum if it has the lowest energy of any residue in a window that also includes 8 residues to the left and right (for a total of 17 residues in the window). However, the residue also has to be lower in energy than the highest energy residue in the window by 0.12; the value of this difference can be optimized by the skilled artisan for each type of macromolecule by routine methods. Lastly, residues less than 20 amino acids from either terminus are not considered as possible minima. Whenever any residue i is found to be a minimum, residue i - 1 is also considered to be a minimum. This is because the energy value associated with residue i actually corresponds to a cut between residues i - 1 and i.
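The moving window minimum identifier can be sketched in Python as follows. This is an illustration under stated assumptions, not the code of the invention: the function and parameter names are hypothetical, positions are 0-based list indices rather than residue numbers, tied values within a window are all accepted, and the terminus rule is read as excluding the first and last 20 positions.

```python
def moving_window_minima(energies, half_window=8, min_depth=0.12, terminus_margin=20):
    """Return the sorted 0-based positions flagged as minima of an energy profile.

    energies[k] is the energy associated with the cut at position k.
    """
    lowest, highest = min(energies), max(energies)
    span = highest - lowest
    if span == 0:
        return []  # a flat profile carries no hinge information
    # Normalize the energies to range from 0 to 1.
    norm = [(e - lowest) / span for e in energies]
    n = len(norm)
    minima = set()
    for idx in range(terminus_margin, n - terminus_margin):
        window = norm[max(0, idx - half_window):min(n, idx + half_window + 1)]
        is_lowest = norm[idx] == min(window)
        deep_enough = max(window) - norm[idx] >= min_depth
        if is_lowest and deep_enough:
            minima.add(idx)
            minima.add(idx - 1)  # the cut lies between positions idx - 1 and idx
    return sorted(minima)
```

The default values 8, 0.12 and 20 are the ones given in the text; as noted there, the depth threshold can be tuned for each type of macromolecule.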

Standard molecular mechanics force fields do not account for the backbone and side chain entropy, which is not needed to calculate dynamics. For the single-cut hinge predictor, entropy is important, since it is possible that changes in freedom of motion influence conformational change. Therefore, in a preferred embodiment, the energy evaluation step is carried out using the FoldX force field (Schymkowitz et al., 2005, Nucleic Acids Res 33(Web Server issue):W382-388; Schymkowitz et al., 2005, PNAS 102(29):10147-10152) instead of the OPLS-All Atom force field. The fundamental difference between the FoldX and OPLS-All Atom force fields is that the former is an Empirical Effective Energy Function, based entirely on experimental data. FoldX includes terms that estimate the entropic cost of constraining the backbone and side chains in particular conformations. The interaction with solvent is treated mostly implicitly, although persistent entrained water molecules are treated explicitly. Other terms account for Van der Waals, hydrogen bonding, electrostatic, and steric interactions. This change, which accounts for entropy in the energy evaluation step, improves the single-cut hinge predictor method.

The predictor method described above is implicitly geared towards the detection of single-stranded hinges since it cuts the chain at a single location. It is to be expected that there exists a "single-cut" error associated with the fact that the backbone is cut at only one location, because, in many proteins, the backbone crosses the hinge region two or more times. One way to deal with double stranded hinges is to make not one but two cuts in the backbone, at residues i and j. To do this, the single index i is replaced with the indices i and j. These define two fragments consisting of the following residues:

Fragment 1: residues 1 to (i - 1) and j to N

Fragment 2: residues i to (j - 1)

In one embodiment, CHARMm with the Born Solvation Model is used to compute the enthalpies of the fragments. This embodiment is not preferred as the computational expense is prohibitively high and the accuracy relatively low. In a preferred embodiment, FoldX is used to compute the free energy. This embodiment is preferred because the accuracy of prediction improved, while the computational expense remained reasonable. In order to find the choice of i and j corresponding to the hinge location, one may generate two fragments for every possible choice of i, j. Advantageously, however, it was found that restricting i and j to multiples of four is sufficient to locate the hinge in most cases. Furthermore, the resulting 16-fold reduction in computational expense brings the method into the realm of practical calculation on a single processor. Additional savings are obtained by restricting the range of i, j to no fewer than 5 residues from either terminus and requiring that j ≥ (i + 8), although numbers greater than 8 can potentially be used for even greater savings. Thus, the calculation scheme looks like this:

for (i = 8 to N - 5 - 8 step 4)
    for (j = i + 8 to N - 5 step 4)
        compute stability of fragment 1 + fragment 2
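The scheme above can be written as a short Python generator. This is a sketch only: the function name and keyword names are hypothetical, the loop bounds are assumed to be inclusive, and the stability calculation itself (for example with FoldX) is left to the caller.

```python
def two_cut_fragments(n_residues, step=4, first_cut=8, min_gap=8, end_margin=5):
    """Yield (i, j, fragment1, fragment2) for every sampled pair of cuts.

    Residues are numbered 1 to n_residues. Fragment 1 holds residues
    1 to i - 1 and j to N; fragment 2 holds residues i to j - 1.
    """
    for i in range(first_cut, n_residues - end_margin - min_gap + 1, step):
        for j in range(i + min_gap, n_residues - end_margin + 1, step):
            fragment1 = list(range(1, i)) + list(range(j, n_residues + 1))
            fragment2 = list(range(i, j))
            yield i, j, fragment1, fragment2
```

Each yielded pair of residue lists would then be passed to the chosen energy function, and the resulting value stored under the key (i, j).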

Once the free energies of folding for all such fragments are calculated they can be plotted, for instance, as a graph of i versus j color-coded by energy. On examining exemplary graphs and comparing local minima of free energy to known hinge locations, the following cases were observed:

1. The i,j indices of a minimum were near the diagonal, meaning the corresponding fragment 2 was small. Such minima were discarded since the diagonal energies are generally small, and small fragment motions are not of interest in this method.

2. Both i and j were near the termini. These minima were also discarded. Although the termini are usually flexible, such motions are not of interest in this method.

3. Of the minima that did not fall in cases 1 or 2, the lowest minimum sometimes had one of its two indices near a terminus, but the other substantially far from either terminus. In this case, the former index was discarded for the reasons cited in (2). The latter index however tended to coincide with a single-stranded hinge.

4. Of the minima that did not fall in cases 1, 2, or 3, the lowest very often indicated the location of a double stranded hinge.

5. Lastly, on occasion, the minimum reported following cases (3) or (4) did not correspond to the known hinge location, but one of the higher minima not eliminated per cases 1 and 2 did.

To address these cases, the method may include clustering and postprocessing steps. Exemplary steps are now described. As a culling step, all choices of i,j that result in

FoldX energy < min(FoldX energy) + 0.1 × (max(FoldX energy) - min(FoldX energy))

are flagged. If this results in fewer than 30 fragment pairs, the 15% of pairs with lowest energy are flagged instead. All the remaining (unflagged) elements are not considered to be candidates for the hinge location.
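As an illustration, the culling step might be coded as below. The function and parameter names are hypothetical, the energies are assumed to be held in a dictionary keyed by (i, j), and rounding the 15% fallback to the nearest whole number of pairs is an assumption not stated in the text.

```python
def cull_pairs(energies, range_fraction=0.1, min_flagged=30, fallback_fraction=0.15):
    """Return the flagged subset of a {(i, j): energy} dictionary."""
    lowest, highest = min(energies.values()), max(energies.values())
    threshold = lowest + range_fraction * (highest - lowest)
    flagged = [pair for pair, energy in energies.items() if energy < threshold]
    if len(flagged) < min_flagged:
        # Too few candidates: flag the lowest-energy 15% of all pairs instead.
        keep = max(1, round(fallback_fraction * len(energies)))
        flagged = sorted(energies, key=energies.get)[:keep]
    return {pair: energies[pair] for pair in flagged}
```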

The k-means clustering algorithm is then used to identify and separate the local minima. Centroids are initially generated in a regular grid spaced 50 residues apart starting at i,j = 25,25. The pairs flagged in the culling step are each assigned to the nearest centroid. The location of each centroid is then recomputed for each resulting cluster, and the pairs are once again reassigned to the nearest recomputed centroid. This process is repeated until all centroids stop moving. The lowest-energy element of each cluster is taken as the local minimum corresponding to that cluster.
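A minimal pure-Python sketch of this clustering step follows. The names are hypothetical, the protein is assumed to have at least 25 residues so that the grid is not empty, empty clusters simply keep their grid position, and an iteration cap is added as a safeguard that the text does not mention.

```python
def cluster_minima(flagged, n_residues, spacing=50, start=25, max_iter=100):
    """Cluster flagged {(i, j): energy} pairs; return one minimum per cluster.

    The returned (i, j) pairs are ordered from lowest to highest energy.
    """
    pairs = list(flagged)
    if not pairs:
        return []
    grid = range(start, n_residues + 1, spacing)
    centroids = [(float(ci), float(cj)) for ci in grid for cj in grid]
    clusters = {}
    for _ in range(max_iter):
        # Assign every flagged pair to its nearest centroid.
        clusters = {}
        for p in pairs:
            nearest = min(
                range(len(centroids)),
                key=lambda c: (p[0] - centroids[c][0]) ** 2 + (p[1] - centroids[c][1]) ** 2,
            )
            clusters.setdefault(nearest, []).append(p)
        # Recompute the centroid of every non-empty cluster.
        updated = list(centroids)
        for c, members in clusters.items():
            updated[c] = (
                sum(m[0] for m in members) / len(members),
                sum(m[1] for m in members) / len(members),
            )
        if updated == centroids:
            break  # all centroids have stopped moving
        centroids = updated
    minima = [min(members, key=flagged.get) for members in clusters.values()]
    return sorted(minima, key=flagged.get)
```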

The minima identified are recorded in order of energy, with the lowest corresponding to the global minimum. Any minima such that i ≥ (j - 24) are discarded since they border the diagonal, per case (1) above. If, for any minimum, both i and j are within 20 residues of the termini, that minimum is also discarded, per case (2). For the lowest remaining minimum, if only one of the two indices is within 20 residues of a terminus, then the protein is identified as having a single-stranded hinge, per case (3). The index near the terminus is discarded and the remaining index is taken to be the location of the single-stranded hinge. Otherwise, both indices are taken together to indicate the location of a double stranded hinge, per case (4). Since the calculation is done only for every fourth residue, the hinge prediction is reported as a range:

Hinge 1: residues i - 2 to i + 1

Hinge 2: residues j - 2 to j + 1
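The postprocessing rules of cases (1) to (4) can be sketched as follows. The function name is hypothetical, and "within 20 residues of a terminus" is interpreted here as residue number at most 20 or greater than N - 20, which is an assumption.

```python
def primary_hinge(minima, n_residues, diagonal_gap=24, terminus_margin=20):
    """Return the primary hinge prediction as a list of (first, last) ranges.

    minima is a list of (i, j) pairs ordered from lowest to highest energy.
    One range indicates a single-stranded hinge, two a double stranded hinge.
    """
    def near_terminus(residue):
        return residue <= terminus_margin or residue > n_residues - terminus_margin

    for i, j in minima:
        if i >= j - diagonal_gap:
            continue  # case (1): borders the diagonal
        near_i, near_j = near_terminus(i), near_terminus(j)
        if near_i and near_j:
            continue  # case (2): both indices near the termini
        # Case (3) keeps the one index far from the termini; case (4) keeps both.
        kept = [idx for idx, near in ((i, near_i), (j, near_j)) if not near]
        return [(idx - 2, idx + 1) for idx in kept]
    return []
```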

Case (5) occurred somewhat less frequently. Thus, although the method described outputs the remaining local minima, these are much less accurate than the primary hinge prediction.

NM: NM is a family of hinge predictors based on normal modes. The first member of this family, called NMA, posits that the minima of the normalized squared normal mode fluctuations should coincide with hinges. As shown herein for the case of domain hinge bending, the first (rather than higher) normal mode is most informative, a point of some debate in the literature. A second, novel method, designated NMB, detects the most significant structural domain through segmentation of normal mode correlation matrices. Subsidiary novel predictors, NMC and NMD, use similar information to find additional hinges.

Normal mode expansions provide the form of displacements of a structure at each of a progressive series of resonant frequencies, or excitation frequencies to which an elastic structure responds strongly. Various studies underscore the importance of low-order modes in describing protein motion, but some particular motions appear best described by high order modes. The NM method makes use of the concept of a nodal surface. As an example, consider a one dimensional guitar string driven at its second harmonic frequency. The string will have a nodal point in the middle which remains stationary. A drum head (effectively two dimensional) similarly will have nodal lines, depending on which mode is excited. A three-dimensional object such as a tuning fork or a protein will have a surface which describes the locus of points that remain stationary when the object vibrates at one of its normal frequencies. The displacements of points on opposite sides of this nodal surface have opposite sign. This surface is in some sense a hinge, about which the motion occurs. In view of the knowledge in the art, two ideas emerge regarding hinges:

1. The nodal surface of the lowest order normal mode eigenvector should coincide with the hinge location.

2. The nodal surfaces of the second, third, and higher normal mode eigenvectors should also coincide with the hinge, but to a lesser degree.

To test these, the mobility score M_ik for each residue i in the k-th mode was extracted for k = 1 to 7. This quantity is the square fluctuation of residue i in mode k, normalized such that the most mobile residue has mobility M_ik = 1 for mode k. One ROC curve was generated for each mode k.

The results indicated that the first idea was correct. Specifically, significant hinge information is contained in the first normal mode displacements. The second and third normal modes have almost no predictive information, as reflected by areas under the curve near 0.5. In fact, with some sets of proteins, areas under the curve less than 0.5 were obtained, indicating that the second and higher modes are negative predictors. Modes higher than 3 were also found to have very little hinge information. Therefore, the second idea is incorrect. These data indicate that M_i1 alone should be used for hinge prediction. For consistency with the notation, x_NMA(i) ≡ M_i1 is used herein.

To describe NMB, NMC and NMD, a review of the calculation of normal mode motional correlations between α-carbon atoms in a protein is useful. It is possible to compute this quantity by means of a weighted sum of the correlations due to each normal mode in a thermal ensemble, viz.:

where F is the Kirchhoff, or connectivity, matrix, and Q is the diagonal matrix of eigenvalues of F. The elements of F are simple to obtain approximately using the GNM method. [u_k]_i is the displacement of the α-carbon of residue i due to normal mode k. ΔR_i is the net displacement of the α-carbon of residue i from its equilibrium position. υ is the effective spring constant between atoms under the GNM method. Cross-correlations are normalized with respect to the auto-correlations as follows:

The matrix of average correlations was computed for all contiguous stretches of residues in the protein chain as follows:

A matrix W is generated by weighting this matrix to favor pairs k,l that are maximally distant from each other, and are likely the endpoints of a structural domain:

W(k,l)= IH AKt). [5]

200

W(k,l) is treated as a two-dimensional discrete function of k,l and its minima are identified using the algorithm described for FlexOracle. NMB (Contiguous Domain Boundary Identifier), NMC and NMD (Excluded Region Identifier) all use this list of minima, but treat it differently. NMB ranks the minima by the value of W(k,l) at the minimum. The particular values of the indices k,l, where k<l, at the location of the global minimum are taken as the residue numbers of a pair of hinge points. If k (l) is within 5 residues of the N (C) terminus, then k (l) is dropped and the other index is reported as the sole hinge point. The last modification is that for the hinge point at k (l), the hinge is reported as spanning residues k-1 to k (l-1 to l).

NMC goes through the same procedure, except it ignores the lowest minimum (already reported by NMB) and processes all remaining minima as above. If any hinge point is within five residues of a hinge point corresponding to a lower minimum, the hinge point corresponding to the higher minimum is discarded.

NMD (Excluded Region Identifier) works somewhat differently. It is based on the idea that, while it may not be possible to precisely identify the flexible regions of protein, parts of the protein can be identified that are rigid. The method is premised on the idea that the hinge may lie anywhere except in these rigid regions.

For a minimum of W located at residues k,l, NMD considers residues k+1 to l-1 to be part of a structural domain and excludes them from consideration as a hinge. The process is repeated with the remaining minima k,l of W(k,l). Any residues that were not excluded after all minima have been considered in this way are reported as potential hinges.

HingeMaster: Different hinge algorithms use substantially different information to make hinge predictions. Consequently, they have different strengths and yield very different results. StoneHinge is good at finding the general region of the hinge, but often overestimates the size of the hinge region. FlexOracle is also not very precise but often shows wide regions of high energy that indicate domains, or wide valleys where hinges are likely. TLSMD, on the other hand, makes a small number of predictions, well spaced apart, one or two of which often lie exactly on or very close to domain hinges, and the rest of which are incorrect or lie on points of non-domain flexibility.

HingeMaster is a novel method that combines the hinge predictions of StoneHinge (Keating et al., 2006, submitted), translation libration screw motion determination (TLSMD; Painter et al., 2005, Acta Crystallogr D Biol Crystallogr 61 (Pt 4):465-471), HingeSeq (Flores et al., submitted), different embodiments of FlexOracle, and the NM family of predictors to provide a single prediction of a hinge location. HingeMaster produces an output which is a weighted vote of the four individual predictors:

$$x_{\mathrm{HingeMaster}}(i) = \sum_{c \in C} \lambda_c \, x_c(i) \qquad [6]$$

where

C = {StoneHinge, FO1, FO1M, FO, HingeSeq, TLSMD, NMA, NMB, NMC, NMD, 1}; x_c(i) = output of predictor c for residue i; λ_c = weighting coefficient of predictor c, determined below. FO1 refers to single-cut FlexOracle with the FoldX force field. FO1M refers to a second embodiment of the FlexOracle predictor which detects the local minima of the same. FO refers to the two-cut FlexOracle predictor.

In a preferred embodiment, least squares fitting is used to find the λ_c's in Equation 6 corresponding to an optimal predictor. The procedure follows. Let y = a column vector, the components y(i) of which are the hinge annotations of the m residues in the HAG, in the format 1 = hinge, 0 = non-hinge. The index i counts over all residues in all proteins of the set in question, which in this work will be either the training, test, or complete HAG set. Order is unimportant as long as the i's in y are in the same order as the i's in x, below. Let x = an m×9 matrix, the rows of which will be used to predict the rows of y. Each column of x is an m-component vector x_c, such that c ∈ C. Each component x_c(i) of each such column vector is the output of the predictor c for residue i. Correspondingly, x(i) (without a subscript) is a row vector with 9 components corresponding to the output each of the 9 predictors emitted for residue i. Let λ = a column vector, the components λ_c of which will give the weight to be applied to the various predictors in order to make the composite HingeMaster predictor. Thus according to the definition of HingeMaster (Equation 6):

$$x_{\mathrm{HingeMaster}} = x\lambda \approx y \qquad [7]$$

Note that HingeMaster ∉ C.

To obtain λ, the quantity (xλ - y)² is minimized. The least squares regression methodology is a standard one. The result is that:

$$\lambda = (x^{T} x)^{-1} x^{T} y \qquad [8]$$
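The least squares solution λ = (xᵀx)⁻¹xᵀy and the weighted vote of Equation 6 can be illustrated with a dependency-free Python sketch. The function names are hypothetical, and Gauss-Jordan elimination on the normal equations is used only for compactness; a numerical library routine would normally be preferred.

```python
def fit_weights(x, y):
    """Solve the normal equations (x^T x) lam = x^T y for the weight vector lam.

    x is a list of m rows, one per residue, each holding the predictor outputs
    (including the constant 1 column); y is the list of 0/1 hinge annotations.
    """
    m, p = len(x), len(x[0])
    # Augmented matrix [x^T x | x^T y].
    a = [
        [sum(x[r][c1] * x[r][c2] for r in range(m)) for c2 in range(p)]
        + [sum(x[r][c1] * y[r] for r in range(m))]
        for c1 in range(p)
    ]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("x^T x is singular; predictors are linearly dependent")
        a[col], a[pivot] = a[pivot], a[col]
        scale = a[col][col]
        a[col] = [value / scale for value in a[col]]
        for r in range(p):
            if r != col:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [row[p] for row in a]

def hingemaster_score(row, weights):
    """Weighted vote for one residue: the sum over c of lam_c * x_c(i)."""
    return sum(w * value for w, value in zip(weights, row))
```

For example, with a single perfect predictor plus the constant column, the fit assigns weight 1 to the predictor and 0 to the constant.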

The above Equation 8 can be said to train λ based on predictor output and gold standard annotation over some set of residues i. The best available value of λ is likely to be one fitted using the set of all residues in all proteins in the HAG, which was designated as {HAG}. That is to say, in Equation 8, x, y(i | i ∈ {HAG}) is used to obtain a particular value of λ called λ^HAG.

HingeMaster was validated by first randomly separating the 20 homologous pairs of proteins in HAG into a training set consisting of 15 of these pairs (30 total proteins) and a test set consisting of the remaining 5 pairs. The set of all residues in all proteins in the training set is called {TRAINING}, while the set of residues in the test set is called {TEST}. Equation 8 was evaluated using x, y(i | i ∈ {TRAINING}) to obtain the cross-validation value of the vector λ, which is called λ*. This vector was used with the predictor results for residues in the test set to obtain predictor results as follows:

$$x^{*}_{\mathrm{HingeMaster}}(i \mid i \in \{\mathrm{TEST}\}) = x(i \mid i \in \{\mathrm{TEST}\}) \, \lambda^{*} \qquad [9]$$

A ROC curve was generated by gradually decreasing the threshold above which values of x*_HingeMaster(i | i ∈ {TEST}) were taken to correspond to predicted hinge locations, and comparing these to the annotated hinge locations y(i | i ∈ {TEST}). For each value of the threshold, residues i with scores x*_HingeMaster(i) above that threshold are taken to be test positives. The test positives were further classified using a strict criterion, meaning that those that coincide exactly with annotated hinges (y(i) = 1) are taken as true positives, while those that coincide with non-hinge residues (y(i) = 0) are taken as false positives, even if they are immediately adjacent to a hinge residue. A loose criterion may also be used for a more qualitative measure of success.

The above process was repeated a total of 20 times. Each time, {TEST} and {TRAINING} were randomized. Thus, 20 different values of λ* were obtained and 20 different ROC curves were generated.

The fitting of λ^HAG and λ* was carried out as described above. The resulting weighting factors are shown in Table 1. Values of c are given in the left column. The output x_c is given for predicted hinge and predicted non-hinge residues, for each predictor c. For example, NMA gives output ranging continuously from 0 to 1, with the lower values more likely to correspond to hinge locations. NMB, on the other hand, gives discrete output: 1 for predicted hinge locations and 0 for predicted non-hinge locations. Note that the sign of λ_c corresponds to whether higher or lower values correspond to hinges for that predictor. c = 1 is a dummy constant which compensates for the difference in mean values of predictors x vs. gold standard annotation y.

Table 1

The predictors were evaluated using the statistical measures of sensitivity (true positives/gold standard positives), specificity (true negatives/gold standard negatives), and p-value (probability of obtaining the observed predictor results by random selection) in Table 2. Note that these were computed under the strict criterion, meaning that a test positive was considered to be a false positive if it coincided with a non-hinge residue, even if it was immediately adjacent to an annotated hinge residue. Test positives are predicted hinge locations. NMA and FO1 give continuous (rather than discrete) output, normalized to range from 0 to 1 for each protein. Therefore for NMA and FO1, values below 0.02 and 0.1, respectively, were taken as test positives. There were a total of 13259 residues in the HAG, of which 152 (13107) were Gold Standard Positives (Negatives). Therefore for the example of StoneHinge, sensitivity was calculated as 42/152=0.28 and specificity was (13259-1204)/13107=0.91. For the same example, p-value was computed as the probability of finding 42 or more true positive residues in a set of 1204 residues selected randomly and without replacement from a set of 13259, using the cumulative hypergeometric distribution.
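These three measures can be computed with the Python standard library alone, as in the sketch below. The function names are hypothetical, and specificity is coded directly from the definition given above (true negatives divided by gold standard negatives), with true negatives taken as the gold standard negatives that are not test positives.

```python
from math import comb

def sensitivity(true_positives, gold_positives):
    """True positives divided by gold standard positives."""
    return true_positives / gold_positives

def specificity(true_positives, test_positives, gold_positives, population):
    """True negatives divided by gold standard negatives."""
    gold_negatives = population - gold_positives
    false_positives = test_positives - true_positives
    return (gold_negatives - false_positives) / gold_negatives

def hypergeom_p_value(population, gold_positives, draws, hits):
    """Probability of at least `hits` gold positives among `draws` residues
    selected randomly and without replacement (cumulative hypergeometric)."""
    upper = min(gold_positives, draws)
    favourable = sum(
        comb(gold_positives, k) * comb(population - gold_positives, draws - k)
        for k in range(hits, upper + 1)
    )
    # Exact integer arithmetic; the final ratio is at most 1.
    return favourable / comb(population, draws)
```

For the StoneHinge example, `hypergeom_p_value(13259, 152, 1204, 42)` evaluates the tail probability described in the text.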

Table 2

The ROC curve generated for HingeMaster is shown in Figure 2. In each case, the ROC curve rises quite steeply near the origin, and also has area under the curve substantially greater than 0.5, indicating significant predictive power.

The hinge predictions generated by the described methods can be used to choose the hinge location for generation of alternative conformations in the method of the invention, as well as for any other application wherein knowledge of hinge location is useful. In the method of the invention of identifying an alternative structure for a macromolecule, the next step is assignment of residues to the domains delimited by the hinge location.

Assignment of domains and COMs: Residues in the protein are then assigned to one of three regions: a "stationary" domain, a hinge region, and a "mobile" domain. Specifically, once the hinge location is identified, the hinges are identified as residues i to j, k to l, and m to n. Domain 1 (D1) consists of residues 1 to i-1 and l+1 to m-1. Domain 3 (D3) consists of residues j+1 to k-1 and n to N, where N is the number of residues in the protein. "Domain" 2 (D2) consists of the hinge residues i to j, k to l, and m to n. Note that more or fewer hinge points are possible; most proteins have two hinge points. The centers of mass (COM; Pang et al., 2003, FEBS Lett 550:168-174) of domains D1, D2, and D3 are determined using the center_of_mass function from Visual Molecular Dynamics (VMD; Humphrey et al., 1996, J Molec Graphics 14:33-38; see also www(dot)ks(dot)uiuc(dot)edu/Research/vmd) and labeled X1, X2, and X3, respectively. These domain and COM definitions are used in the subsequent preparation and manipulation of the structure.
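A small Python sketch of the domain assignment and of a mass-weighted center of mass is given below. It does not use VMD; the function names are hypothetical, the hinge segments are assumed to be given as (first, last) residue pairs, the stretches between hinges are assumed to alternate between D1 and D3, and the final D3 stretch is started at n+1 so that no residue is labeled twice.

```python
def assign_domains(n_residues, hinges):
    """Label residues 1..n_residues as 'D1' (stationary), 'D2' (hinge) or 'D3' (mobile).

    hinges is a list of (first, last) residue ranges, e.g. [(i, j), (k, l), (m, n)].
    """
    labels = {}
    mobile = False
    previous_end = 0
    for first, last in sorted(hinges):
        for residue in range(previous_end + 1, first):
            labels[residue] = "D3" if mobile else "D1"
        for residue in range(first, last + 1):
            labels[residue] = "D2"
        mobile = not mobile  # crossing a hinge switches domains
        previous_end = last
    for residue in range(previous_end + 1, n_residues + 1):
        labels[residue] = "D3" if mobile else "D1"
    return labels

def center_of_mass(coords, masses):
    """Mass-weighted mean position of a set of (x, y, z) coordinates."""
    total = sum(masses)
    return tuple(
        sum(m * c[axis] for m, c in zip(masses, coords)) / total for axis in range(3)
    )
```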

Preparation and standard orientation of protein: The starting structure is then put into a standard orientation. In one embodiment, the standard orientation is the convention in which X2 coincides with the origin, X1 lies along the z-axis, and X3 lies in the -y part of the yz plane. In further preparation, where the method is carried out on a single protein chain, all additional peptides, ligands, metals, water, and dissolved counterions are removed from the analysis.

When the method is used to identify an alternative ligand-binding structure for a macromolecule having a hinge, the ligand of interest is docked to the starting structure. Any docking program suitable for docking a ligand to a macromolecule may be used. Many docking programs are known in the art, including, but not limited to, AutoDock (Molecular Graphics Lab, Scripps Research Institute, La Jolla, CA), GOLD (Cambridge Crystallographic Data Center, Cambridge, UK), DOCK (Molecular Design Institute, UCSF, San Francisco, CA) and Glide (Schrödinger, Portland, OR). Preferably, the ligand is present from the beginning of the method, so that subsequent equilibration steps do not lead to side chains obstructing the active site. The docked ligand-protein complex is then put into standard orientation, as described above.

Rotation: To generate a conformer of the polypeptide, the mobile domain is rotated around at least one of three axes with respect to the stationary domain. The three axes have their origin in the center of mass of the hinge. In one embodiment, D3 is the mobile domain and is rotated with respect to D1. The three possible rotations are rotation about the x-axis, σ_x, also referred to as the "pitch" rotation, rotation around the y-axis, σ_y, the "yaw" rotation, and rotation around the z-axis, σ_z, the "roll" rotation. σ_x is generically used herein to denote the matrix of Euler rotation about the x-axis. σ_x(α) specifically refers to an Euler rotation of α degrees. Therefore, when this matrix is multiplied from the right by any coordinate vector R, i.e., σ_x(α)·R, a new coordinate vector R' is generated, which is rotated about the x-axis by an angle α. Similar definitions hold for σ_y and σ_z. Since the conformer in question is in standard orientation, with the center of mass of the hinge region positioned at the origin of the Cartesian coordinate system, applying such rotations to D3 results in the rotation of that region about the hinge. When a ligand is docked to the polypeptide, the ligand is conventionally not rotated, i.e., remains stationary with respect to D1. The rotations can be effected in various increments, with 15° giving reasonable results. A smaller rotation increment may yield greater accuracy but at increased computational expense, while a larger increment may yield the opposite result.
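The pitch, yaw and roll rotations can be illustrated with the short Python sketch below. The function names are hypothetical, and the right-handed sign convention chosen for the matrices is an assumption; the text does not specify the handedness.

```python
import math

def rotation_matrix(axis, degrees):
    """Return the 3x3 matrix for a rotation of `degrees` about 'x', 'y' or 'z'."""
    c = math.cos(math.radians(degrees))
    s = math.sin(math.radians(degrees))
    if axis == "x":  # pitch
        return [[1, 0, 0], [0, c, -s], [0, s, c]]
    if axis == "y":  # yaw
        return [[c, 0, s], [0, 1, 0], [-s, 0, c]]
    if axis == "z":  # roll
        return [[c, -s, 0], [s, c, 0], [0, 0, 1]]
    raise ValueError("axis must be 'x', 'y' or 'z'")

def rotate(coords, axis, degrees):
    """Apply R' = sigma(alpha) . R to each (x, y, z) coordinate in coords.

    The coordinates are assumed to be in standard orientation, so that the
    rotation is about the center of mass of the hinge at the origin.
    """
    m = rotation_matrix(axis, degrees)
    return [
        tuple(sum(m[row][k] * point[k] for k in range(3)) for row in range(3))
        for point in coords
    ]
```

Only the coordinates of the mobile domain D3 would be passed to `rotate`; D1 and any docked ligand are left in place.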

D3 may be rotated in one axis, two axes or all three axes to generate a conformer. In one embodiment, the mobile domain is rotated about the hinge in fixed increments in all three possible axial directions. A collection of six conformers is generated by rotation around a single axis in a plus or minus fixed increment. For instance, a first conformer is generated by a +15° pitch rotation, a second conformer is generated by a -15° pitch rotation, a third conformer is generated by a +15° yaw rotation, a fourth conformer is generated by a -15° yaw rotation, a fifth conformer is generated by a +15° roll rotation and a sixth conformer is generated by a -15° roll rotation. These six child conformers are the first generation of conformers and result from a single rotation to the parent polypeptide structure. To generate a second generation of conformers, at least one of the six child conformers of the first generation is used as the starting structure for another rotation. Preferably, two or more of the six child conformers of the first generation are used as the starting structure for additional rotation. If the same increment value is used (e.g., 15°) then each of the six first generation conformers can be used to generate five new conformers. The conformer that results from the reverse of the rotation used to generate the first generation conformer reproduces the parent molecule. For instance, if the first generation conformer results from a +15° yaw rotation to the parent molecule structure, a -15° yaw rotation takes the conformer back to the parent molecule structure. Since this would provide redundant information and waste computing resources, this is preferably avoided.
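The enumeration of child rotations, including the exclusion of the reverse rotation, can be illustrated with a short sketch. The names `AXES` and `child_moves` and the `(axis, angle)` tuple representation are assumptions made for this illustration only.

```python
AXES = ("pitch", "yaw", "roll")

def child_moves(parent_move=None, increment=15.0):
    """List the single-axis rotations that spawn child conformers.

    parent_move is the (axis, angle) rotation that produced the current
    conformer, or None for the starting structure. The reverse of that
    rotation is skipped because it would only reproduce the parent.
    """
    moves = [(axis, sign * increment) for axis in AXES for sign in (+1, -1)]
    if parent_move is not None:
        axis, angle = parent_move
        moves.remove((axis, -angle))
    return moves

first_generation = child_moves()                  # six rotations
second_generation = child_moves(("yaw", 15.0))    # five rotations
```

With a 15° increment the starting structure yields six moves, and a conformer produced by a +15° yaw rotation yields the five moves other than the -15° yaw rotation.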

Equilibration: The rotation step almost invariably results in unphysical bond lengths and bond angles in the boundary between D2 and D3, and often in steric clashes between D3 and the rest of the protein or protein-ligand complex. To relieve these factors, an equilibration step is performed on each conformer, to permit side chains to adjust to the new domain arrangement, to allow minor backbone motions for strain relief, and, where applicable, to refine the interactions between the protein and ligand. A Molecular Dynamics equilibration is performed for a sufficient length of time to allow enthalpy to level off while not allowing significant domain motions. There are numerous Molecular Dynamics programs available in the art, including but not limited to, TINKER (Department of Biochemistry and Molecular Biophysics, Washington University, St. Louis, MO), CHARMM (Accelrys, Inc. or Martin Karplus, CHARMM Development Project, Harvard University, Cambridge, MA) and NAMD (The Theoretical and Computational Biophysics Group, University of Illinois, Urbana, IL). In a preferred embodiment, the equilibration step is performed by using TINKER's mdrun program. In one embodiment for a ligand-bound protein, 10000 time steps (20 ps) were sufficient for equilibration. The skilled artisan can readily optimize this parameter, given the present disclosure and the knowledge in the art. Any conformer that fails in the equilibration step is prevented from spawning further child conformers, since such a failure is indicative of irreconcilable steric clashes. Failure refers to the case in which the molecular dynamics code fails to converge in the equilibration step. Excessive steric clashes, unnatural bond lengths and angles, and other unphysical circumstances result in very large forces that the code cannot deal with. Any of these, when sufficiently severe, prevents the molecular dynamics code from solving the equations of motion for the conformer, and the code therefore fails to converge.

Scoring: A free energy of ligand binding or other suitable energy parameter is then estimated for each equilibrated conformer, for instance, using the AutoDock force field, and is recorded.

For a ligand-bound conformer, prior to the free energy calculation step, the ligand is removed from the structure file for the conformer and is re-docked to the conformer structure. The lowest docked energy is recorded, and the corresponding ligand coordinates are inserted into the protein structure file for that conformer. The position of D3 shifts slightly during the equilibration, therefore its angular orientation is calculated and recorded as explained elsewhere herein.

Iteration: The conformer is then used as the starting point for a subsequent rotation to generate another conformer. In a preferred embodiment, the six first generation conformer structures are then used as starting points for subsequent rotations to create a second generation of conformers. Due to the presence of the starting structure, as well as the children of neighboring structures, on average only about three child conformers need to be generated by each of the six conformers. This process is continued iteratively, with children conformers in turn spawning other child conformers. Any child conformer that fails in the equilibration step is prevented from spawning further children, since such a failure is indicative of irreconcilable steric clashes. In this way, the sterically accessible boundary of the pitch-yaw-roll space is defined without user supervision.
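The iterative spawning of child conformers, with failed conformers pruned, resembles a breadth-first search over a grid in pitch-yaw-roll space. The sketch below is one possible reading under stated assumptions: `explore` is a hypothetical name, `equilibrate` is a hypothetical callback that returns True when the equilibration of the conformer at the given nominal angles converges, and conformers with the same nominal angles are treated as the same grid point. The last point is a simplification, since rotations about different axes do not commute and D3 shifts slightly during equilibration.

```python
from collections import deque

def explore(equilibrate, increment=15, max_conformers=1000):
    """Breadth-first exploration of pitch-yaw-roll space.

    Returns the nominal (pitch, yaw, roll) angles of every conformer that
    equilibrated successfully, starting from the parent at (0, 0, 0).
    """
    start = (0, 0, 0)
    seen = {start}            # grid points already generated
    queue = deque([start])    # conformers still allowed to spawn children
    accepted = [start]
    while queue and len(accepted) < max_conformers:
        parent = queue.popleft()
        for axis in range(3):
            for step in (increment, -increment):
                child = list(parent)
                child[axis] += step
                child = tuple(child)
                if child in seen:
                    continue  # already produced by the parent or a neighbor
                seen.add(child)
                if equilibrate(child):
                    accepted.append(child)
                    queue.append(child)
                # a failed conformer is never queued, so it spawns no children
    return accepted

# Toy stand-in for molecular dynamics: only angles within +/-15 degrees converge.
reachable = explore(lambda angles: all(abs(a) <= 15 for a in angles))
```

Skipping grid points that are already in `seen` is what reduces the number of new children per conformer below five, and pruning failed conformers is what lets the sterically accessible boundary emerge without user supervision.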

Angle calculation, display of results and selection of best conformer: The angular position of D3 of a generated conformer structure is calculated in pitch-yaw-roll space as the σ_x σ_y σ_z rotation that would have to be applied to D3 of the starting structure in standard orientation, to obtain the given structure. This step is done by first structurally aligning the generated structure with the starting structure by minimizing D1 RMSD. Then, the rotation-translation matrix required to move D3 of the starting structure to align with D3 of the generated structure is computed using VMD's "measure fit" command. The rotational part of the rotation-translation matrix is then compared to a generic σ_x σ_y σ_z Euler rotation matrix and the unknown angles are solved for algebraically. Note that the generic matrix is arranged according to the convention of rotating first about z, then y, then x. This calculation is performed immediately following each equilibration step, and the computed pitch, yaw, roll angles are recorded. Any program designed for viewing biological macromolecules may be used in the practice of the present invention. In one embodiment, the viewer is Jmol, which is a free, open source molecule viewer, available on the internet at www(dot)jmol(dot)org. Jmol is used to create a viewer which represents each rotated, equilibrated, and, where applicable, re-docked ligand-bound structure as a single sphere. The location of this sphere in the three-axis coordinate system shown in the viewer corresponds to the σ_x-σ_y-σ_z rotation angles applied to obtain the structure. The color of the sphere corresponds to the estimated free energy of ligand binding for that conformer, for instance, red=high energy and blue=low energy. With this viewer, it is easy to see regions of σ_x, σ_y, σ_z space which contain low-binding-energy conformers. In some instances, it is possible to visually identify the lowest energy conformer, or at least identify regions where a more refined search of σ_x, σ_y, σ_z space is likely to be fruitful. In a refined search, further conformers in a region of lower energy are generated using smaller rotational increments. A more rigorous analysis is to sort the conformers by energy. The lowest-scoring conformer is selected as the conformer most likely to resemble the thermodynamically most probable alternative conformation or, for a ligand-bound polypeptide, the thermodynamically most probable bound conformation.
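The algebraic solution for the pitch, yaw and roll angles can be sketched as follows. The sketch assumes that the rotational part of the fitted matrix is available as a 3x3 nested list and that the generic matrix is R = σ_x(pitch)·σ_y(yaw)·σ_z(roll), so that the z rotation acts first, then y, then x. For that product, R[0][2] = sin(yaw), R[1][2] = -sin(pitch)·cos(yaw), R[2][2] = cos(pitch)·cos(yaw), R[0][1] = -cos(yaw)·sin(roll) and R[0][0] = cos(yaw)·cos(roll). The function names are hypothetical, and the formulas hold only while the yaw angle stays strictly between -90° and +90°.

```python
import math

def _rot(axis, deg):
    """Elementary rotation matrix about the x, y or z axis."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    if axis == "x":
        return [[1, 0, 0], [0, c, -s], [0, s, c]]
    if axis == "y":
        return [[c, 0, s], [0, 1, 0], [-s, 0, c]]
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def _matmul(A, B):
    """Product of two 3x3 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def compose(pitch, yaw, roll):
    """Generic matrix R = Rx(pitch) . Ry(yaw) . Rz(roll); z acts first."""
    return _matmul(_rot("x", pitch), _matmul(_rot("y", yaw), _rot("z", roll)))

def pitch_yaw_roll(R):
    """Recover (pitch, yaw, roll) in degrees from a 3x3 rotation matrix."""
    yaw = math.asin(max(-1.0, min(1.0, R[0][2])))  # clamp against round-off
    pitch = math.atan2(-R[1][2], R[2][2])
    roll = math.atan2(-R[0][1], R[0][0])
    return tuple(math.degrees(v) for v in (pitch, yaw, roll))

recovered = pitch_yaw_roll(compose(30.0, -15.0, 45.0))
```

Round-tripping a known set of angles through `compose` and `pitch_yaw_roll`, as in the last line, is a convenient check that the assumed convention matches the one used to build the matrix.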

Apparatuses

In an aspect of the invention, each of the methods described herein may be implemented as a program or programs of instructions executed by a computer. In a typical realization, such a program or programs of instructions can be saved on a computer readable medium, such as, for example, a hard disk drive, a floppy disk drive, or a magnetic tape storage device, or even a plurality of such devices. Thus the program or programs of instructions may be read in and executed by one or more machines, either serially or in parallel, depending on the data under consideration. It will be understood that the novelty and utility of both the methods and their implementations are not dependent on any particular embodiment of computer(s) or computer readable medium.

Figure 12 depicts an exemplary computing system 100 in accordance with herein described system and methods. Computing system 100 is capable of executing a variety of operating systems 180 and computing applications 180' (e.g., web browser and mobile desktop environment) operable on operating system 180. Exemplary computing system 100 is controlled primarily by computer readable instructions, which may be in the form of software, where and how such software is stored or accessed. Such software may be executed within central processing unit (CPU) 110 to cause data processing system 100 to do work. In many known computer servers, workstations and personal computers, central processing unit 110 is implemented by micro-electronic chips called microprocessors. Coprocessor 115 is an optional processor, distinct from main CPU 110, that performs additional functions or assists CPU 110. CPU 110 may be connected to coprocessor 115 through interconnect 112. One common type of coprocessor is the floating-point coprocessor, also called a numeric or math coprocessor, which is designed to perform numeric calculations faster and better than general-purpose CPU 110.

It is appreciated that although an illustrative computing environment is shown to comprise a single CPU 110, such description is merely illustrative, as computing environment 100 may comprise a number of CPUs 110. Additionally, computing environment 100 may exploit the resources of remote CPUs (not shown) through communications network 160 or some other data communications means (not shown). In operation, CPU 110 fetches, decodes, and executes instructions, and transfers information to and from other resources via the computer's main data-transfer path, system bus 105. Such a system bus connects the components in computing system 100 and defines the medium for data exchange. System bus 105 typically includes data lines for sending data, address lines for sending addresses, and control lines for sending interrupts and for operating the system bus. An example of such a system bus is the PCI (Peripheral Component Interconnect) bus. Some of today's advanced busses provide a function called bus arbitration that regulates access to the bus by extension cards, controllers, and CPU 110. Devices that attach to these busses and arbitrate to take over the bus are called bus masters. Bus master support also allows multiprocessor configurations of the busses to be created by the addition of bus master adapters containing a processor and its support chips.

Memory devices coupled to system bus 105 include random access memory (RAM) 125 and read only memory (ROM) 130. Such memories include circuitry that allows information to be stored and retrieved. ROMs 130 generally contain stored data that cannot be modified. Data stored in RAM 125 can be read or changed by CPU 110 or other hardware devices. Access to RAM 125 and/or ROM 130 may be controlled by memory controller 120. Memory controller 120 may provide an address translation function that translates virtual addresses into physical addresses as instructions are executed. Memory controller 120 may also provide a memory protection function that isolates processes within the system and isolates system processes from user processes. Thus, a program running in user mode can normally access only memory mapped by its own process virtual address space; it cannot access memory within another process's virtual address space unless memory sharing between the processes has been set up.

In addition, computing system 100 may contain peripherals controller 135 responsible for communicating instructions from CPU 110 to peripherals, such as, printer 140, keyboard 145, mouse 150, and data storage drive 155.

Display 165, which is controlled by display controller 163, is used to display visual output generated by computing system 100. Such visual output may include text, graphics, animated graphics, and video. Display 165 may be implemented with a CRT-based video display, an LCD-based flat-panel display, a gas plasma-based flat-panel display, a touch-panel, or other display forms. Display controller 163 includes electronic components required to generate a video signal that is sent to display 165.

Further, computing system 100 may contain network adaptor 170 which may be used to connect computing system 100 to an external communication network 160. Communications network 160 may provide computer users with means of communicating and transferring software and information electronically. Additionally, communications network 160 may provide distributed processing, which involves several computers and the sharing of workloads or cooperative efforts in performing a task. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used. It is appreciated that exemplary computer system 100 is merely illustrative of a computing environment in which the herein described apparatus and methods may operate and does not limit the implementation of the herein described apparatus and methods in computing environments having differing components and configurations as the inventive concepts described herein may be implemented in various computing environments having various components and configurations.

Illustrative Computer Network Environment:

Computing system 100, described above, can be deployed as part of a computer network. In general, the above description for computing environments applies to both server computers and client computers deployed in a network environment. Figure 13 illustrates an exemplary networked computing environment 200, with a server in communication with client computers via a communications network, in which the herein described apparatus and methods may be employed. Server 205 may be interconnected via a communications network 160 (which may be any of, or a combination of, a fixed-wire or wireless LAN, WAN, intranet, extranet, peer-to-peer network, the Internet, or other communications network) with a number of client computing environments such as tablet personal computer 210, mobile telephone 215, telephone 220, personal computer 100, and personal digital assistant 225. In a network environment in which the communications network 160 is the Internet, for example, server 205 can be a dedicated computing environment server operable to process and communicate web services to and from client computing environments 100, 210, 215, 220, and 225 via any of a number of known protocols, such as hypertext transfer protocol (HTTP), file transfer protocol (FTP), simple object access protocol (SOAP), or wireless application protocol (WAP). Each client computing environment 100, 210, 215, 220, and 225 can be equipped with browser operating system 180 operable to support one or more computing applications such as a web browser (not shown), or a mobile desktop environment (not shown) to gain access to server computing environment 205. In operation, a user (not shown) may interact with a computing application running on a client computing environment to obtain desired data and/or computing applications.
The data and/or computing applications may be stored on server computing environment 205 and communicated to cooperating users through client computing environments 100, 210, 215, 220, and 225, over exemplary communications network 160. A participating user may request access to specific data and applications housed in whole or in part on server computing environment 205. The applications and/or data may be communicated between client computing environments 100, 210, 215, 220, and 225 and server computing environments for processing and storage. Server computing environment 205 may host computing applications, processes and applets for the generation, authentication, encryption, and communication of web services and may cooperate with other server computing environments (not shown), third party service providers (not shown), network attached storage (NAS) and storage area networks (SAN).

Thus, the apparatus and methods described herein can be utilized in a computer network environment having client computing environments for accessing and interacting with the network and a server computing environment for interacting with client computing environments. However, the apparatus and methods providing the identification of an alternative structure for a macromolecule can be implemented with a variety of network-based architectures, and thus should not be limited to the example shown. The herein described apparatus and methods will now be described in more detail with reference to a presently illustrative implementation. Figure 14 shows an illustrative implementation of exemplary macromolecule processing platform 300, such as would be used for the generation of alternative structures for a given macromolecule having desired characteristics. As is shown in Figure 14, macromolecule processing platform 300 comprises client computing environment A 320, client computing environment B 325, up to and including client computing environment N 330, communications network 335, server computing environment 360, data storage containing energy data 340, data storage containing user conformer data 345, data storage containing macromolecule data 350, and management and macromolecule processing application 370. Additionally, as is shown in Figure 14, client computing environments 320, 325, and 330 are capable of displaying, manipulating, and navigating processed macromolecule data 302, 304, and 306 respectively. Communications network 335 can comprise one or more of fixed-wire and/or wireless intranets, extranets, and/or the Internet.
Exemplary macromolecule processing application 370 can operatively comprise one or more modules (not shown) (e.g., applications, applets, scripts, or other computing environment executables) to perform one or more selected operations and/or functions as part of generating processed macromolecule data, including but not limited to, a hinge location identification module for use in identifying suitable hinge locations, a rotation module for use to rotate a domain with respect to another domain in generating a first conformer, a rotation module for use to rotate a domain with respect to another domain in generating a first conformer, wherein a ligand is docked to one of the domains and remains stationary with respect to that domain, an equilibration module for use to perform a molecular dynamics equilibration of generated conformers to generate one or more equilibrated conformers, a re-docking module configured to dock a ligand to a generated equilibrated first conformer, and a calculation module configured to calculate the free energy of generated equilibrated conformers or to calculate the binding energy of a generated equilibrated ligand-docked conformer.

In an illustrative operation, one or more of client computing environments 320, 325, or 330 can operatively communicate with server computing environment 360 over communications network 335 to process one or more portions of a macromolecule to generate alternative structures having desired characteristics. Responsive to requests for macromolecule processing, server computing environment 360, executing macromolecule processing application 370, can process data (e.g., inputted data from any of client computing environments 320, 325, or 330) and/or retrieve data from data stores 340, 345, and/or 350 to generate processed macromolecule data for communication to client computing environments 320, 325, up to and including 330 over communications network 335. The processed macromolecule data 302, 304, 306, can then be displayed, manipulated, and navigated on client computing environments 320, 325, and/or 330, respectively.

In the illustrative implementation, macromolecule data store 350 can comprise archived data representative of previously processed macromolecules. Conformer data store 345 can comprise data of one or more conformers for use in generating the processed macromolecule data. Energy data store 340 can comprise data representative of various energy states for a given macromolecule. Additionally, energy data store 340 can comprise one or more instructions for execution by macromolecule processing application 370 to calculate various energy levels for one or more portions of a macromolecule.

It is appreciated that although macromolecule processing platform 300 is described as having various components cooperating in a manner to generate processed macromolecule data, such configuration and deployment is merely illustrative, as the herein described systems and methods contemplate the generation of macromolecule data using computing environments having various configurations including but not limited to stand alone computing environments and mobile computing environments. Further, although Figure 14 describes platform 300 as comprising various data stores, it is appreciated that such description is merely illustrative, as the inventive concepts described herein can be performed using various data stores including but not limited to single partitioned data stores and distributed data stores.

The herein described apparatus and methods provide the identification of an alternative structure for a macromolecule. It is understood, however, that the invention is susceptible to various modifications and alternative constructions. There is no intention to limit the invention to the specific constructions described herein.

On the contrary, the herein described apparatus and methods are intended to cover all modifications, alternative constructions, and equivalents falling within the scope and spirit of the herein described apparatus and methods.

It should also be noted that the herein described apparatus and methods may be implemented in a variety of computer environments (including both non-wireless and wireless computer environments), partial computing environments, and real world environments. The various techniques described herein may be implemented in hardware or software, or a combination of both. Preferably, the techniques are implemented in computing environments maintaining programmable computers that include a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. Computing hardware logic cooperating with various instruction sets is applied to data to perform the functions described above and to generate output information. The output information is applied to one or more output devices. Programs used by the exemplary computing hardware may preferably be implemented in various programming languages, including high level procedural or object oriented programming languages, to communicate with a computer system. Illustratively, the herein described apparatus and methods may be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language. Each such computer program is preferably stored on a storage medium or device (e.g., ROM or magnetic disk) that is readable by a general or special purpose programmable computer for configuring and operating the computer when the storage medium or device is read by the computer to perform the procedures described above. The apparatus may also be considered to be implemented as a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner.

Although exemplary implementations of the herein described apparatus and methods have been described in detail above, those skilled in the art will readily appreciate that many additional modifications are possible in the exemplary embodiments without materially departing from the novel teachings and advantages of the herein described apparatus and methods. Accordingly, these and all such modifications are intended to be included within the scope of the herein described apparatus and methods. The invention may be better defined by the following exemplary claims.

EXAMPLES

The invention is further described in detail by reference to the following experimental examples. These examples are provided for purposes of illustration only, and are not intended to be limiting unless otherwise specified. Thus, the invention should in no way be construed as being limited to the following examples, but rather, should be construed to encompass any and all variations which become evident as a result of the teaching provided herein.

Experimental Example 1: FlexOracle method

The three embodiments of the FlexOracle method (single-cut hinge predictor using TINKER; single-cut hinge predictor using FoldX; and 2-cut hinge predictor) were tested against 20 pairs of protein structures (40 total structures) in the Hinge Atlas Gold (HAG).

The HAG is a dataset of manually annotated hinges publicly available on the Database of Macromolecular Motions (http://MolMovDB(dot)org) (Gerstein et al., 1998, Nucleic Acids Res. 26(18):4280-4290; Gerstein et al., pp. 401-442 in: Rigidity Theory and Applications, Thorpe et al., eds., Kluwer Academic, New York, NY 1999; Krebs et al., 2000, Nucleic Acids Res. 28(8):1665-1675; Krebs et al., 2003, Methods Enzymol. 374:544-584). The HAG provides a collection of 20 homologous pairs of single-chain protein structures. The HAG is specifically compiled for the purpose of testing structure-based predictors of domain hinges and therefore includes only structures that meet the following conditions:

1. The structure is independently stable, rather than relying on other chains or molecules to maintain its conformation.

2. The structural coordinates were obtained by x-ray crystallography, with the exception of calcium-free calmodulin.

3. At least two sets of atomic coordinates are available, and together they represent a domain motion that is biologically relevant or thermodynamically feasible.

4. The motion involves two or more rigid domains moving about a flexible hinge.

Each of these pairs of protein structures, also known as morphs, has an annotated hinge location. This location was chosen prior to running any hinge prediction codes, by visual inspection of the corresponding morph movie. Manual annotation has been found to be more reliable than the use of automated methods such as FlexProt, DynDom, or HingeFind, which depend on user-adjustable parameters and sometimes incorrectly assign the hinge location. The process of inspection and annotation was aided by the "Hinge Annotation Tool" available on the morph page for each morph in MolMovDB. It consists of a set of arrow buttons which adjust the position of a window of residues, which are highlighted as the protein moves. This tool can also take annotations from the public for various uses. The result of the annotation effort is a set of hinge residues for structural pairs against which FlexOracle and other hinge predictors can be tested. The hinge annotation in the HAG is not encyclopedic. It is based on the comparison of two sets of structural coordinates, but other motions not reflected by this measure may be thermodynamically feasible. Notably, in some cases, FlexOracle predicted hinges not annotated in HAG but for which experimental evidence was later found in the published literature. Since the point of the HAG is to be objective rather than comprehensive, in these cases, the annotation or the scoring of the predictor results was not changed.

Statistical evaluation: FlexOracle assumes hinges do not simply correspond to points of globally lowest energy, but rather to local minima identified and postprocessed in various ways. The set of residues reported as predicted hinge locations by any of the three versions of FlexOracle are referred to as test positives, and the number of residues in this set is called M. The residues annotated as hinges in the HAG are referred to as gold standard positives, and the number of these hinges is called H. In this section, the test positives were compared to the gold standard positives to objectively evaluate the predictor. Other standard statistical terms as they relate to the current context are defined in Table 3.

Table 3

The p-value is computed for all predictors in this study using the cumulative hypergeometric function,

$$ p\text{-value} = \sum_{x=TP}^{M} \mathrm{HYP}(H, D, x, M) \qquad [10] $$

where the hypergeometric function gives the probability of finding exactly x of the H gold standard positive residues in a set of M residues randomly chosen from the population numbering D:

$$ \mathrm{HYP}(H, D, x, M) = \frac{\binom{H}{x}\binom{D-H}{M-x}}{\binom{D}{M}} $$

The sensitivity, specificity, and p-value are used in the statistical evaluation. The p-value is a particularly useful quantity, since it compares directly to random picking. The three quantities were used to evaluate the three versions of FlexOracle and to compare them to GNM (Bahar et al., 1997, Fold Des. 2:173-181), long a popular flexibility prediction algorithm.
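As a worked illustration of these three quantities, the sketch below evaluates the cumulative hypergeometric p-value with Python's `math.comb`, taking the hypergeometric term in its standard binomial-coefficient form. The sensitivity and specificity functions use the standard definitions, TP/(TP+FN) and TN/(TN+FP), which are assumed here rather than taken from Table 3. The function names are hypothetical; H, D, M and TP have the meanings given in the text.

```python
from math import comb

def hyp(H, D, x, M):
    """Probability of exactly x gold standard positives among M residues
    drawn at random, without replacement, from D residues containing H."""
    return comb(H, x) * comb(D - H, M - x) / comb(D, M)

def p_value(H, D, M, TP):
    """Probability of TP or more true positives arising by random picking."""
    return sum(hyp(H, D, x, M) for x in range(TP, M + 1))

def sensitivity(TP, H):
    """Fraction of gold standard positives recovered: TP / (TP + FN)."""
    return TP / H

def specificity(TP, H, D, M):
    """Fraction of gold standard negatives not reported: TN / (TN + FP)."""
    false_positives = M - TP
    gold_negatives = D - H
    return (gold_negatives - false_positives) / gold_negatives

# Toy numbers: 4 residues, 2 annotated hinges, 2 predictions, 1 correct.
toy_p = p_value(H=2, D=4, M=2, TP=1)
```

For the toy numbers, the only way to obtain no true positives is to pick both non-hinge residues, which has probability 1/6, so the p-value is 5/6.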

The results of the experimental example are now described. The TINKER and FoldX versions of the single-cut predictor were evaluated first. The test positives were those residues identified as local minima according to the algorithm described in the detailed description section. The various statistical quantities per the above definitions were then tabulated. GNM required a slightly different treatment. To evaluate this predictor, the absolute value of the first normal mode displacements was computed and was normalized to range from 0 to 1. The nodes, or points of zero displacement, were taken to correspond to the hinge location. Therefore, all residues with normalized displacement smaller than 0.02 were taken to be test positives. The results are shown in Table 4.
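The thresholding applied to the GNM output can be sketched as follows. The text does not say how the displacements were normalized to the range 0 to 1; min-max scaling of the absolute values is assumed here, and the function name and the toy displacement values are hypothetical.

```python
def gnm_test_positives(displacements, cutoff=0.02):
    """Return indices of residues whose normalized absolute displacement
    in the first normal mode falls below the cutoff."""
    magnitudes = [abs(d) for d in displacements]
    low, high = min(magnitudes), max(magnitudes)
    span = high - low
    if span == 0:
        return []  # a flat profile has no distinguishable node
    normalized = [(m - low) / span for m in magnitudes]
    return [i for i, value in enumerate(normalized) if value < cutoff]

# Toy first-mode displacements: the sign change near index 2 marks a node.
toy_hinges = gnm_test_positives([-1.0, -0.5, 0.01, 0.6, 1.0])
```

In the toy profile only the residue at index 2, where the displacement changes sign, falls below the 0.02 cutoff.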

Table 4

Qualitatively, the FoldX version of the single-cut predictor was observed to be significantly less noisy, and therefore had fewer minima than the TINKER version (292 residues for FoldX vs. 923 for TINKER). This led to a lower sensitivity for the FoldX version, but improved specificity and p-value. GNM is less specific than either of the single-cut predictors, but has better sensitivity and p-value. The two-cut hinge predictor embodiment of FlexOracle was run on the 40 proteins in HAG, and the results were compared to the hinge annotation. Test positives were reported by the two-cut predictor in windows 4 residues wide due to the 4-residue grid spacing. This window width is referred to as the strict criterion and was used for the statistical benchmark. The results are shown in Table 4. Notably, the p-value is 3.5 × 10^-66. This value indicates the method has a very high predictive power.

This result establishes the statistical significance of the test, but in practice, for a given protein, a prediction that is in some sense close enough to the correct hinge may for practical purposes be considered a true positive, even if it does not coincide exactly. Therefore, for a more operational benchmark, the definition of the test positives was widened to include 5 residues to the left and right of the predicted hinge location, for a window width of 14 residues (loose criterion). When a gold standard positive residue was found within the 14-residue window, this was considered a true positive. The test was considered a success for a given protein if there were no false positives or false negatives under this criterion. The test was considered a partial success if there were one or more true positives but also one or more false positives and/or false negatives. Finally, the test was considered a failure if there were no true positives for that protein. The results are shown in Table 5. As can be seen, the majority of the proteins were successes.
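The success, partial success and failure categories of the loose criterion can be expressed as a small classifier. The sketch rests on assumptions that the text leaves open: each prediction is represented by the first residue of its 4-residue window, a widened window containing at least one gold standard residue counts as one true positive, a widened window containing none counts as a false positive, and a gold standard residue covered by no window counts as a false negative. The function name and the residue numbers are hypothetical.

```python
def classify_protein(predicted_starts, gold_residues, grid=4, pad=5):
    """Classify one protein as 'success', 'partial success' or 'failure'."""
    # Widen each 4-residue window by 5 residues per side: 14 residues in all.
    windows = [(start - pad, start + grid - 1 + pad) for start in predicted_starts]

    def hits(window):
        low, high = window
        return any(low <= g <= high for g in gold_residues)

    true_positives = sum(1 for w in windows if hits(w))
    false_positives = len(windows) - true_positives
    false_negatives = sum(
        1 for g in gold_residues
        if not any(low <= g <= high for low, high in windows)
    )
    if true_positives == 0:
        return "failure"
    if false_positives == 0 and false_negatives == 0:
        return "success"
    return "partial success"

outcome = classify_protein(predicted_starts=[100], gold_residues=[103, 104])
```

A prediction starting at residue 100 is widened to residues 95 through 108, which covers both annotated residues, so the toy protein is classified as a success.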

Table 5

Under this criterion, there were 47 true positive hinge points. For these, the average distance between the center of the gold standard positive residues and the center of the test positive residues was 1.66 residues. For 29 out of the 47, the distance was 1 or 0 residues. Thus, even under the loose criterion, the predictions had a tendency to line up closely with the HAG hinges. The predictor did not work well for the two pairs of proteins with triple-stranded hinges. As discussed previously, the HAG annotations reflect hinges chosen under a very specific crystallographic criterion and are not encyclopedic. Therefore, for some of these "failures," it is possible that the prediction is correctly suggesting a motion which is thermodynamically permitted but is not reflected in the pairs of structures used to generate the hinge annotations.

In summary, the single-cut version of FlexOracle naturally works best on single-stranded hinges. It nonetheless retains predictive ability for proteins with two strands in the hinge, although the two-cut predictor is much more accurate in those cases. The two-cut predictor, in contrast, was specifically designed to handle double-stranded hinges. It was also designed to respond to single-stranded hinges by discarding one cut of the pair, as described earlier.

Under either scheme, only one chain is analyzed at a time, in the absence of ligands, bound metals, or additional subunits of a complex. The method is robust under removal of small ligands from co-crystallized coordinate sets. The method obtained mixed results with calmodulin; thus, careful use is necessary with metal-bound proteins. Similarly, care should be taken with single subunits taken from complexes.

Experimental Example 2: Alternative conformation for Biotin carboxylase

The method of the invention was tested using a polypeptide known to undergo large scale domain hinge bending and for which there is a crystal structure for both the apo and the ligand-bound conformation. Acetyl-CoA carboxylase, found in all animals, plants, and bacteria, catalyzes the carboxylation of acetyl-CoA to malonyl-CoA, the first committed step of fatty acid synthesis. Biotin carboxylase is one of the three components that comprise acetyl-CoA carboxylase in E. coli. Biotin carboxylase, a member of the ATP-grasp superfamily, is composed of three domains, A, B, and C. A and C share a large interface area and appear to move as a single unit, separated from domain B by a helix-turn-helix motif spanning residues 107-126 and flexible hinges spanning residues 127-130 and 204-207. Upon binding ATP, domain B rotates approximately 45° with respect to A and C (Thoden et al., 2000, J Biol Chem. 275:16183-16190). The large scale domain motion appears to be the cause of cracking when protein crystals are soaked with ligand. Biotin carboxylase operates as a dimer, but the dimerization interface is far from domain B.

The hinge location was determined using the 2-cut FlexOracle hinge predictor, which has been shown to be successful in locating the hinge within a few residues for hinge bending proteins. Figure 3 indicates the hinge location, residues 131-132 and 192-193, used in this example.

The D3 RMSD gives a good measure of the scale of domain rearrangements. Therefore, to benchmark the results, each conformer was aligned structurally with the ligand-bound structure known from crystallography. This was done by minimizing the RMSD between D1 of the predicted structure and D1 of the known ligand-bound structure. Once this was done, the RMSD between D3 of the predicted structure and D3 of the known ligand-bound structure was calculated. Since it is D3 which has moved, this difference gives a measure of the large-scale conformational change which has occurred over the course of the simulation. If a large scale conformational change is indeed required for ligand binding, the RMSD between the D3 domains should be significantly lower for the predicted bound conformation than for the apo structure used as a starting point.
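The benchmark just described (superimpose on one domain, then measure the RMSD of the other) may be sketched as follows. This is a minimal illustration and not the code used in the example. It assumes NumPy, uses the standard Kabsch least-squares superposition as the way of "minimizing RMSD" (the description does not name the algorithm), and assumes that the coordinate arrays hold matched atoms in the same order. All function names are hypothetical.

```python
import numpy as np


def kabsch_transform(mobile, target):
    """Return rotation R and translation t so that mobile @ R.T + t best fits target."""
    mobile_center = mobile.mean(axis=0)
    target_center = target.mean(axis=0)
    covariance = (mobile - mobile_center).T @ (target - target_center)
    U, _, Vt = np.linalg.svd(covariance)
    # Correct for a possible reflection so that R is a proper rotation.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = target_center - R @ mobile_center
    return R, t


def rmsd(a, b):
    """Root-mean-square deviation between two (N, 3) coordinate arrays."""
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))


def d3_rmsd_after_d1_fit(pred_d1, pred_d3, ref_d1, ref_d3):
    """Superimpose the predicted structure on the reference using the first
    domain only, then report the RMSD of the moved domain (D3)."""
    R, t = kabsch_transform(pred_d1, ref_d1)
    moved_d3 = pred_d3 @ R.T + t
    return rmsd(moved_d3, ref_d3)


# Synthetic coordinates, for illustration only.
rng = np.random.default_rng(0)
ref_d1 = rng.normal(size=(10, 3))
ref_d3 = rng.normal(size=(8, 3)) + 5.0
print(d3_rmsd_after_d1_fit(ref_d1, ref_d3 + [3.0, 0.0, 0.0], ref_d1, ref_d3))
```

Because the fit uses only the first domain, any rigid displacement of D3 relative to that domain appears directly in the reported value.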

As shown in Figure 4, the predicted bound conformer of biotin carboxylase had a markedly lower D3 RMSD than the apo structure. The predicted bound conformer superimposed well with the bound structure known crystallographically (Figure 5).

Experimental Example 3: Alternative conformation for Glutamine binding protein

The motion of glutamine binding protein (GluBP) as it binds glutamine involves large-displacement domain hinge bending, estimated to take ~5 ns (Pang et al., 2003, FEBS Lett 550:168-174). Molecular Dynamics simulations of the apo structure for this length of time failed to result in domain closure. Pang et al. suggested that the failure was due in part to closure being a stochastic process and in part to the fact that the ligand would be expected to induce closure, while the dynamics were computed with no ligand information. The method of the invention for identifying a ligand-binding conformation was tested using the structural coordinates for glutamine binding protein in its apo configuration and its ligand.

The hinge location was determined by the FlexOracle 2-cut hinge predictor, which found a hinge at residues 86-89 and 182-185. Residues 88-89 and 181-182 were selected as the hinge location for the purpose of generating conformers using the method of the invention (Figure 6).

Conformers were systematically generated in a grid with 15° spacing.
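One way to picture a systematic grid of this kind is sketched below. It is illustrative only: the example does not state how many rotational degrees of freedom were sampled or over what range, so the three rotation angles about a hinge pivot, the ±45° range, and all function names are assumptions made for the sketch. Only the 15° spacing is taken from the text.

```python
import itertools
import math


def grid_angles(spacing_deg=15.0, max_deg=45.0):
    """Angles from -max_deg to +max_deg in steps of spacing_deg (range is hypothetical)."""
    steps = int(round(max_deg / spacing_deg))
    return [k * spacing_deg for k in range(-steps, steps + 1)]


def rotation_matrix(rx_deg, ry_deg, rz_deg):
    """Rotation Rz * Ry * Rx for angles about the x, y, and z axes."""
    cx, sx = math.cos(math.radians(rx_deg)), math.sin(math.radians(rx_deg))
    cy, sy = math.cos(math.radians(ry_deg)), math.sin(math.radians(ry_deg))
    cz, sz = math.cos(math.radians(rz_deg)), math.sin(math.radians(rz_deg))
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def rotate_about_pivot(coords, pivot, R):
    """Rotate each (x, y, z) point about the pivot (for example, a hinge atom)."""
    rotated = []
    for point in coords:
        v = [point[i] - pivot[i] for i in range(3)]
        rotated.append(tuple(sum(R[i][j] * v[j] for j in range(3)) + pivot[i]
                             for i in range(3)))
    return rotated


def generate_conformers(domain_coords, pivot, spacing_deg=15.0, max_deg=45.0):
    """Yield (angles, rotated coordinates) for every grid point."""
    grid = grid_angles(spacing_deg, max_deg)
    for angles in itertools.product(grid, repeat=3):
        yield angles, rotate_about_pivot(domain_coords, pivot, rotation_matrix(*angles))


# Toy moving domain of two atoms, rotated about the origin.
conformers = list(generate_conformers([(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)], (0.0, 0.0, 0.0)))
print(len(conformers))
```

In the method of the invention each such conformer would then be equilibrated and scored, as recited in claim 1; the sketch covers only the rigid rotation step.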

A region of low binding energy was found. The predicted bound conformer was about 10 Å lower in D3 RMSD, and about 4 kcal/mol lower in binding energy, compared to the apo structure (see Figure 7). The superimposition of the predicted bound conformer with the ligand-bound structure known crystallographically is shown in Figure 8. These data demonstrate that the method of the invention is able to predict the ligand binding motion of GluBP based solely on the structural coordinates of its apo structure and its ligand.

Experimental Example 4: Alternative conformation for MurA

The method of the invention has been shown to predict the bound conformation of a protein when large scale domain motion is required. To test the program on a protein which does not have a large scale domain motion, the method was carried out on MurA. When MurA binds to the antibiotic T6361, the bound structure is very similar to the apo structure. The peculiarity of this ligand is that it binds to the open conformation of MurA, rather than the closed. Thus, if the bound conformation predicted by the method of the invention is correct, it should not differ significantly from the apo structure by the measure of D3 RMSD. It should, however, have significantly lower estimated binding energy, since the Molecular Dynamics equilibration should have resulted in side chain rearrangements conducive to better ligand binding.

The 2-cut FlexOracle predictor identified residues 18-21 and 230-233 as the most likely hinge residues. There was no strong HingeMaster prediction in the range of 18-21; however, there was a strong GNM minimum coinciding with residues 20-21. HingeMaster's global minimum was at residue 230.

Figure 9 depicts the hinge location, residues 20-21 and 228-229, selected for this example. As illustrated in Figure 10, the binding energy of the predicted ligand-bound conformer decreased about 6 kcal/mol compared to the apo structure, while the D3 RMSD changed only slightly. The superimposition of the apo structure, the predicted bound conformer and the ligand-bound structure known crystallographically is shown in Figure 11.

The disclosures of each and every patent, patent application, and publication cited herein are hereby incorporated herein by reference in their entirety. While the invention has been disclosed with reference to specific embodiments, it is apparent that other embodiments and variations of this invention may be devised by others skilled in the art without departing from the true spirit and scope of the invention. The appended claims are intended to be construed to include all such embodiments and equivalent variations.
