

Title:
THE ESTABLISHMENT OF PROTEOME STRUCTURE PROFILE DATABASES AND THEIR USES
Document Type and Number:
WIPO Patent Application WO/2004/081535
Kind Code:
A2
Abstract:
The present invention provides methods for the determination of the proteome, including primary profile and quaternary profile of a proteome. The present invention also provides methods for determining a proteome status in a biological sample. The methods of the invention may be used to analyze the structure of protein drugs or to analyze the status of gene expression.

Inventors:
TSAY YEOU-GUANG
WANG CHENG-NAN
Application Number:
PCT/US2004/007628
Publication Date:
September 23, 2004
Filing Date:
March 12, 2004
Assignee:
BIONOVA CORP (US)
TSAY YEOU-GUANG
WANG CHENG-NAN
International Classes:
C12Q1/68; G01N31/00; G01N33/68; G06F17/00; G01N; (IPC1-7): G01N/
Foreign References:
US20020087273A1 2002-07-04
US20020045194A1 2002-04-18
Other References:
ANDERSON N.L. AND ANDERSON N.G.: 'The human plasma proteome. History, character, and diagnostic prospects' MOLECULAR AND CELLULAR PROTEOMICS vol. 1, no. 11, 2002, pages 845 - 867, XP002986053
Attorney, Agent or Firm:
Feng, Flora W. (Levin Cohn, Ferris, Glovsky and Popeo, P.C., 666 Third Avenue, New York NY, US)
Claims:
CLAIMS We Claim:
1. A method for determining a proteome structure profile in an organism comprising the steps of : (1) collecting a sample from the organism ; (2) determining a proteome structure profile of at least one proteome in said sample.
2. The method of claim 1 wherein the proteome structure profile is selected from the group consisting of a protein abundance profile of at least one protein in said sample, a protein post translational modification status profile of at least one protein in said sample, and a proteome quaternary structure profile of at least one proteome in said sample.
3. The method of claim 1 wherein the proteome structure profile assesses the presence, absence or severity of a medical disorder; determines the likely outcome resulting of a course of treatment ; identifies pharmacological properties of pharmaceuticals ; or identify the etiology and pathophysiology of a medical disorder.
4. The method of claim 1 wherein the sample is a body fluid or a blood cell.
5. The method of claim 4 wherein said body fluid is plasma, urine, pleural effusion, cerebrospinal fluid, sputum abdominal effusion or semen.
6. The method of claim 1 wherein the proteome structure profile is a protein abundance profile.
7. A method for determining a protein abundance profile in an organism comprising the steps of : (1) collecting a sample from the organism; (2) determining a protein abundance profile of at least one protein in said sample.
8. The method of claim 7 wherein the protein abundance profile assesses the presence, absence or severity of a medical disorder; determines the likely outcome resulting of a course of treatment ; identifies pharmacological properties of pharmaceuticals ; or identify the etiology and pathophysiology of a medical disorder.
9. The method of claim 7 wherein the sample is a body fluid or a blood cell.
10. The method of claim 9 wherein said body fluid is plasma, urine, pleural effusion, cerebrospinal fluid, sputum abdominal effusion or semen.
11. The method of claim 7 wherein the protein abundance profile comprises measuring the concentration of at least five proteins in said sample.
12. The method of claim 7 wherein said protein abundance profile is determined by the steps of: (1) adding a standard protein with a known concentration to said sample; (2) digesting said sample with a sequence-specific protein cleavage agent to produce digested sample peptides and digested standard protein peptides; (3) analyzing said peptides by mass spectrometry to generate mass spectrometric data comprising mass spectroscopy peaks; and (4) comparing the mass spectroscopy peaks of said digested sample peptides and said digested standard protein peptide to determine a concentration of at least one digested sample peptide.
13. The method of claim 12 wherein said sequence-specific protein cleavage agent is a protease selected from the group consisting of trypsin, Asp-N, Arg-C, Lys-C, Glu-C and chymotrypsin.
14. The method of claim 12 wherein said sequence-specific protein cleavage agent is a chemical selected from the group consisting of hydroxylamine, CNBr, BNPS-skatole, and 2-nitro-5-thiocyanatobenzoate.
15. The method of claim 12 wherein said mass spectrometry is a method selected from the group consisting of liquid chromatography-tandem mass spectrometry, capillary electrophoresis-tandem mass spectrometry, matrix-assisted laser-induced desorption/ionization time-of-flight mass spectrometry and matrix-assisted laser-induced desorption/ionization quadrupole time-of-flight hybrid mass spectrometry.
16. The method of claim 12 wherein said comparing step comprising: (1) processing said mass spectrometric data to generate the specific ion chromatograms for a peptide of said protein in said sample for deriving ion counts; (2) processing said mass spectrometric data to generate the specific ion chromatograms for a digested peptide of said standard protein for deriving ion counts; (3) combining the ion counts at all charge states of said digested sample peptide to derive a value ΣT; (4) combining the ion counts at all charge states of said digested standard protein peptides to derive a value ΣS; (5) determining the said concentration from the ratio ΣT/ΣS.
17. The method of claim 12, wherein said comparing step comprises: (1) processing said mass spectrometric data by mass spectrum averaging and deconvolution method to derive a series of synthetic peptide mass maps; (2) performing mass spectrum integration of said synthetic peptide mass maps to generate a final synthetic peptide mass map; (3) calculating said concentration of a protein as a ratio IT/IS wherein IT is the intensity of a peptide from said protein and IS is the intensity of a peptide from said standard protein in said final synthetic peptide mass map.
18. The method of claim 7 further comprising the step of comparing the protein abundance profile with a second protein abundance profile from a second patient suffering from said disorder to diagnose said disorder.
19. The method of claim 7 further comprising comparing the protein abundance profile with a protein abundance profile database to diagnose said health condition.
20. A method of determining a protein post translational modification status profile in an organism comprising the steps of : (a) collecting a sample from the organism ; (b) determining a post translational modification status of at least one amino acid position of at least one protein in said sample.
21. The method of claim 20 wherein the protein post translational modification status profile assesses the presence, absence or severity of a medical disorder; determines the likely outcome resulting of a course of treatment ; identifies pharmacological properties of pharmaceuticals ; or identify the etiology and pathophysiology of a medical disorder.
22. The method of claim 20 wherein the sample is a body fluid or a blood cell.
23. The method of claim 22 wherein said body fluid is plasma, urine, pleural effusion, cerebrospinal fluid, sputum abdominal effusion or semen.
24. The method of claim 20 where said determining step comprises: (1) separating at least one protein from said sample; (2) digesting said protein with a sequence-specific protein cleavage agent to form peptides; (3) performing mass spectrometry on the peptides to produce mass spectrometric data; (4) identifying modified peptide candidates from the mass spectrometric data; and (5) mapping the amino acid position of the modification within said modified peptide candidates by tandem mass spectrometry.
25. The method of claim 24 wherein said protein is separated by size, mass, charge, sedimentation coefficient, or a combination thereof.
26. The method of claim 24 wherein said separating step is performed by SDS-PAGE, non-denaturing PAGE, two-dimensional gel electrophoresis, immunoaffinity purification, and a combination thereof.
27. The method of claim 24 wherein said protein cleavage agent is a protease selected from the group consisting of trypsin, Asp-N, Arg-C, Lys-C, Glu-C and chymotrypsin.
28. The method of claim 24, wherein said protein cleavage agent is a chemical selected from the group consisting of hydroxylamine, CNBr, BNPS-skatole, and 2-nitro-5-thiocyanatobenzoate.
29. The method of claim 24 wherein said mass spectrometry is a method selected from the group consisting of liquid chromatography-tandem mass spectrometry, capillary electrophoresis-tandem mass spectrometry, matrix-assisted laser-induced desorption/ionization time-of-flight mass spectrometry and matrix-assisted laser-induced desorption/ionization quadrupole time-of-flight hybrid mass spectrometry.
30. The method of claim 24 where said identifying step comprises: (1) processing mass spectrometric data to generate specific ion chromatograms for determining retention time of a peptide from said protein; (2) processing mass spectrometric data to generate specific ion chromatograms for determining retention times of peaks for a modified form of said peptide; (3) comparing the said retention times of said peptide and said modified form for determining the differences of retention time (ΔRT); (4) finding said modified peptide candidates if the said differences of retention time (ΔRT) is less than ten minutes.
31. The method of claim 24, wherein said identifying steps comprising: (1) processing said mass spectrometric data by mass spectrum averaging and deconvolution method to derive a series of synthetic peptide mass maps; (2) performing mass spectrum integration of said synthetic peptide mass maps to generate a final synthetic peptide mass map; (3) analyzing said final synthetic peptide mass map to find said modified peptide candidates.
32. The method of claim 20 where said determining steps comprise: (1) separating at least one protein from said sample by a physical characteristic; (2) digesting said protein with a sequence-specific protein cleavage agent to form peptides; (3) analyzing said peptides to acquire mass spectrometric data by mass spectrometry; (4) interpreting said mass spectrometric data to determine said posttranslational modification status.
33. The method of claim 20 where said determining steps comprise: (1) digesting said sample directly with a sequence-specific protein cleavage agent to form peptides; (2) analyzing said peptides to acquire mass spectrometric data by a mass spectrometric technique; and (3) interpreting said mass spectrometric data to determine said posttranslational modification status.
34. The method of claim 33 wherein said physical characteristic is size, mass, charge, sedimentation coefficient, or a combination thereof.
35. The method of claim 33 wherein said separating step is performed by SDS-PAGE, non-denaturing PAGE, two-dimensional gel electrophoresis, immunoaffinity purification, and a combination thereof.
36. The method of claims 32 or 33 wherein said protein cleavage agent is a protease selected from the group consisting of trypsin, Asp-N, Arg-C, Lys-C, Glu-C and chymotrypsin.
37. The method of claims 32 or 33 wherein said protein cleavage agent is a chemical selected from the group consisting of hydroxylamine, CNBr, BNPS-skatole, and 2-nitro-5-thiocyanatobenzoate.
38. The method of claims 32 or 33 wherein said mass spectrometric technique is a method selected from the group consisting of liquid chromatography-tandem mass spectrometry, capillary electrophoresis-tandem mass spectrometry, matrix-assisted laser-induced desorption/ionization time-of-flight mass spectrometry and matrix-assisted laser-induced desorption/ionization quadrupole time-of-flight hybrid mass spectrometry.
39. The method of claims 32 or 33 where said interpreting step comprises: (1) processing mass spectrometric data to generate specific ion chromatograms for determining ion counts of a modified peptide from said protein; (2) processing mass spectrometric data to generate specific ion chromatograms for determining ion counts of the unmodified form of said modified peptide; (3) calculating said protein post translational modification status as ΣM/(ΣM + ΣU) where ΣM is the value of the ion counts of said modified peptide and ΣU is the value of ion counts of said unmodified form.
40. The method of claims 32 or 33, wherein said interpreting steps comprising: (1) processing said mass spectrometric data by mass spectrum averaging and deconvolution method to derive a series of synthetic peptide mass maps; (2) performing mass spectrum integration of said synthetic peptide mass maps to generate a final synthetic peptide mass map; (3) calculating said protein post translational modification status as IM/(IM + IU) wherein IM is the intensity of said modified peptide, IU is the intensity of unmodified form of said modified peptide in the final synthetic peptide mass map.
41. The method of claim 20 wherein said protein post translational modification status profile comprises a fractional value representing the fraction of amino acids in said amino acid position that has post translational modification.
42. The method of claim 20 wherein protein post translational modification profile is a profile of modifications selected from the group consisting of disulfide bond, acylation, acetylation, carbamylation, hydroxylation, glycosylation, methylation and phosphorylation.
43. The method of claim 20 wherein said protein post translational modification status profile comprises determining the post translational modification of at least five amino acid positions in one or more proteins in said sample.
44. The method of claim 20 wherein said protein is selected from the group consisting of albumin, hemoglobin alpha, transferrin, fibrinogen beta, alpha-2-macroglobulin, alpha-1-antitrypsin, ApoA1, C3a complement protein, haptoglobin, C-reactive protein, and a combination thereof.
45. The method of claim 20 wherein said protein is albumin and said at least one amino acid is selected from the group consisting of: S256; T260; S294; Y435; S443, and T444.
46. The method of claim 20 further comprising the step of comparing the protein post translational modification status profile with a second protein posttranslational modification status profile from a second patient suffering from said health condition to diagnose said disorder health condition.
47. The method of claim 20 further comprising comparing the protein post translational modification profile with a protein posttranslational modification status profile database to diagnose said health condition.
48. The method of claim 20 wherein said disorder is selected from the group consisting of : HCV infection, HBV infection, chronic renal failure, hepatocellular carcinoma, cervical cancer, and HIV infection.
49. A method of determining a proteome quaternary structure profile in an organism comprising the steps of : (1) collecting a proteome sample from the organism; (2) identifying at least one species and compositions of quaternary structures in said proteome sample; and (3) determining the abundances of at least one quaternary structures in said proteome sample.
50. The method of claim 49 wherein the protein quaternary structure profile assesses the presence, absence or severity of a medical disorder ; determines the likely outcome resulting of a course of treatment; identifies pharmacological properties of pharmaceuticals ; or identify the etiology and pathophysiology of a medical disorder.
51. The method of claim 49 wherein the sample is a body fluid or a blood cell.
52. The method of claim 51 wherein said body fluid is plasma, urine, pleural effusion, cerebrospinal fluid, sputum abdominal effusion or semen.
53. The method of claim 49 wherein said identifying steps comprise: (1) fractionating said proteome sample into a plurality of samples by a physical characteristic; (2) analyzing a plurality of said fractions for their primary structure profiles; (3) reconstructing said primary structure profiles to generate distribution silhouette of a plurality of proteins; (4) identifying said quaternary structure species by discerning peak shapes in said distribution silhouettes of said proteins; (5) grouping proteins with similar peak shapes in said silhouettes to identify said compositions of quaternary structures.
54. The method of claim 49 wherein said determining steps comprise: (1) fractionating said proteome sample into a plurality of samples by a physical characteristic; (2) analyzing a plurality of said fractions for their primary structure profiles; (3) reconstructing said primary structure profiles to generate distribution silhouette of a plurality of proteins; (4) determining the abundances of said quaternary structure by measuring corresponding peak areas of its protein components.
55. The method of claims 53 or 54 wherein said physical characteristic is size, mass, charge, sedimentation coefficient, or a combination thereof.
56. The method of claims 53 or 54 wherein said fractionating step is selected from a group of methods including gel filtration chromatography, ion exchange chromatography, affinity chromatography, hydrophobic interaction chromatography and ultracentrifugation sedimentation.
57. The method of claim 49 wherein said determining steps comprise: (1) identifying all the conversion factors between peak area of all protein components and said quaternary structures; (2) determining protein abundance profile of said proteome sample; (3) calculating proteome quaternary structure profile with said protein abundance profile and said converting factors.
58. The method of claim 49, further comprising a prefractionating step between said collecting step and said analyzing step, wherein said fractionation step removes at least 10% of the monomeric proteins from said proteome sample.
59. The method of claim 58 wherein said prefractionation step is gel filtration chromatography.
60. The method of claim 49 further comprising the step of comparing the quaternary structure profile with a second quaternary structure profile from a second patient suffering from said disorder to diagnose said disorder.
61. The method of claim 49 further comprising comparing the quaternary structure profile with a quaternary structure profile database to diagnose said health condition.
62. A method of determining the effects of a physiological condition on a proteome comprising the steps of: (a) collecting a proteome sample from a patient; (b) administering the physiological condition to the patient; (c) collecting a second proteome sample from the patient; (d) determining a primary proteome profile, a quaternary proteome profile, or both from said first and second sample; (e) comparing said primary proteome profile or said quaternary proteome profile of the first and second sample to determine the effects of a physiological condition on said patient.
63. The method of claim 62 wherein the physiological condition is the administration of an agent is selected from the group consisting of a drug, a protein drug, a toxin, chemical, DNA, RNA, protein, receptor, antibody, ligand, virus, cell, metabolite waste, toxins, combinatorial libraries of chemicals, and fragments and combinations thereof.
64. The method of claim 63 wherein the agent is labeled with a detectable label.
65. The method of claim 62 wherein the method determine the binding partner for the agent.
66. A method for determining the structure of a protein comprising the steps of: (a) collecting a sample comprising said protein; (b) digesting said protein; (c) analyzing said digested protein by liquid chromatography-tandem mass spectrometry to determine a plurality of chromatography spectra of said digested protein and a partial identity of said protein; (d) collecting a plurality of candidate peptides, wherein the candidate peptides representing a digestion product of a plurality of candidate proteins; (e) analyzing said plurality of peptides by liquid chromatography-tandem mass spectrometry to determine a plurality of chromatography spectra of said candidate peptides; (f) comparing the chromatography spectra of said digested protein and said candidate peptides to determine the structure of said protein.
67. The method of claim 66 wherein said liquid chromatography-tandem mass spectrometry is an electrospray ionization-ion trap tandem mass spectrometry.
68. The method of claim 66 wherein said protein is a ribosomal frameshift mutant protein and wherein said method determines the site of the ribosomal frameshift.
69. The method of claim 66 wherein said candidate proteins is made by the steps of : (a) analyzing a DNA database to determine the candidate nucleic acids that encode the protein; (b) producing a protein for each possible translation product of said candidate nucleic acids.
70. A fraction collection device comprising a switch valve, a fluid inlet, a fluid outlet, a pressure source, a first sample conduit loop having a first conduit inlet and a first conduit outlet and a second sample conduit loop having a second conduit inlet and a second conduit outlet.
71. The fraction collection device according to claim 70, wherein the fluid outlet is in communication with an outlet needle.
72. The fraction collection device according to claim 70, wherein upon the switch valve being in a first position, the first sample conduit is in fluid communication with the fluid inlet and the second sample conduit is in fluid communication with the pressure source and the fluid outlet.
73. The fraction collection device according to claim 70, wherein upon the switch valve being in a second position, the second sample conduit is in fluid communication with the fluid inlet and the first sample conduit is in fluid communication with the pressure source and the fluid outlet.
74. The fraction collection device according to claim 73, wherein upon the switch valve being in a second position, the second sample conduit is in fluid communication with the fluid inlet and the first sample conduit is in fluid communication with the pressure source and the fluid outlet.
75. The fraction collection device according to claim 70, further comprising an actuator for changing positions for the switch valve.
76. The fraction collection device according to claim 75, further comprising a timer for activating the actuator for changing the position of the switch valve.
77. A method for fraction collection comprising: diverting a fluid flow into a first sample conduit during a first period of time to collect a first fluid fraction from the fluid flow therein; diverting the fluid flow into a second sample conduit during a second period of time to collect a second fluid fraction from the fluid flow therein; flushing the first fluid fraction out of the first sample conduit and out an outlet during the second period of time; and flushing the second fluid fraction out of the second sample conduit and out an outlet during the first period of time.
78. An apparatus for fraction collection comprising: at least one diverting means for diverting a fluid flow into a first sample conduit during a first period of time to collect a first fluid fraction from the fluid flow therein and for diverting the fluid flow into a second sample conduit during a second period of time to collect a second fluid fraction from the fluid flow therein ; and at least one flushing means for flushing the first fluid fraction out of the first sample conduit and out an outlet during the second period of time and for flushing the second fluid fraction out of the second sample conduit and out an outlet during the first period of time.
79. A fraction collection device comprising: a switch valve having a plurality of ports, wherein the switch valve includes multiple positions for alternately connecting multiple pairs of ports together; a fluid flow connected to a first port of the plurality of ports; an outlet needle connected to a second port of the plurality of ports; a pressure source connected to at least one third of the plurality of ports; a first sample conduit loop having a first inlet in communication with a fourth port of the plurality of ports and a first outlet in communication with a fifth port of the plurality of ports ; and a second sample conduit loop having a second inlet in communication with a sixth port of the plurality of ports and a second outlet in communication with a seventh port of the plurality of ports.
80. The fraction collection device according to claim 79, wherein upon the switch valve being in a first position, the first port is in communication with fourth port and the third port is in fluid communication with the sixth port.
81. The fraction collection device according to claim 80, wherein the seventh port is in communication with the second port.
82. The fraction collection device according to claim 79, wherein upon the switch valve being in a second position, the first port is in communication with sixth port and the third port is in fluid communication with the fourth port.
83. The fraction collection device according to claim 79, wherein the fifth port is in communication with the second port.
Description:
THE ESTABLISHMENT OF PROTEOME STRUCTURE PROFILE DATABASES AND THEIR USES

RELATED APPLICATIONS

This application claims benefit under 35 U.S.C. § 119(e) of United States Provisional Application No. 60/453,890, filed March 12, 2003, and Provisional Application No. 60/526,832, filed December 3, 2003, the entire disclosures of which are herein incorporated by reference.

TECHNICAL FIELD

The invention is directed to methods and systems for proteome analysis in a biological sample. The biological samples that can be analyzed include plasma, urine, blood cells and others. The biological samples also include pharmaceuticals such as protein drugs, including antibodies and hormones.

BACKGROUND OF THE INVENTION

Throughout this specification, various patents, published applications and scientific references are cited to describe the state and content of the art. Those disclosures, in their entireties, are hereby incorporated into the present specification by reference.

Amino acid sequences along with post-translational modifications (PTMs) define the primary structures of proteins. While the amino acid sequence represents a more constant component, post-translational modification is considered a highly regulated and usually reversible process. Post-translational modifications are the major properties that may rapidly and sometimes reversibly modulate protein activities. The most studied post-translational modifications include disulfide bond linkage, acylation, glycosylation, methylation, and phosphorylation. The roles of these post-translational modifications in most proteomes remain largely unclear. This gap in knowledge is partly due to the lack of adequate analytical methodologies, particularly quantitative ones. For example, the conventional biochemical approach for identification and quantitation of modified peptides usually requires the use of radioactive reagents. Nevertheless, this kind of strategy is not suitable for the analysis of many proteomes in many organisms.

There have been advances in the methodologies of modified peptide identification, particularly for phosphorylated peptides. For example, phosphorylated peptide ions in a spectrum derived from the tryptic digest can be identified by their susceptibility to phosphatases (Neubauer and Mann, 1999; Carr et al., 1996).

Immobilized iron(III) or gallium(III) affinity chromatography is an alternative method of selectively enriching and identifying phosphopeptides (Posewitz and Tempst, 1999; Neville et al., 1997; Li and Dass, 1999). There are several other methods that incorporate affinity tags at the phosphorylated residues to facilitate specific purification of phosphorylated peptides (Oda et al., 2001; Zhou et al., 2001).

On the other hand, mass spectrometry is becoming the method of choice for mapping the modified residues, considering its outstanding sensitivity and rapidity. Currently, modification sites on proteins at subpicomole ranges can be effectively identified by mass spectrometry (Betts et al., 1997).

The protein quaternary structure is an aggregate of tertiary structure units that exist as homo- or hetero-multimers. The formation of protein quaternary structures is mediated through the same thermodynamic forces that stabilize tertiary structures, such as electrostatic interaction, hydrogen bonding, hydrophobic interaction and disulfide bond linkage. According to Branden and Tooze (1991), quaternary protein structures can be categorized into four different classes: (1) covalently connected tertiary domains: in this class of protein, domains are usually formed as modules covalently combined together on a single polypeptide chain; (2) hetero-multimers: in this case, different tertiary domains aggregate together to form a unit; (3) homo-multimers: it is far more common to find copies of the same tertiary domain assembling non-covalently; and (4) larger macromolecular structures: the molecular machinery contains components made from multimeric assemblies of proteins, nucleic acids, and sugars. Examples include viruses, microtubules, flagella, fibers of various sorts, ribosomes, histones and gap junctions.

While the components in a proteome like plasma have been analyzed, there are currently no methods for the analysis of an entire proteome to diagnose a diseased or normal state in a patient (reviewed in Anderson et al., 2002). It is noteworthy that certain proteins in a proteome like plasma have been quantitated individually in clinical practice. For example, plasma albumin level is routinely examined in patients with liver or renal diseases. However, the current methods identify, at most, one or two proteins for diagnosis. Such profiles, while still useful, do not take advantage of the diagnostic potential of whole proteome analysis.

Although many proteomes like plasma are intensively studied, the research interest has primarily focused on the search for unidentified scarce proteins and the analyses of individual protein levels. Little is known about the roles of other proteome structures, such as post-translational modifications and quaternary structures, in the plasma proteome. One of the major reasons is the lack of good methodologies to identify and quantitatively determine these proteome structures in a cost-effective manner.

BRIEF SUMMARY OF THE INVENTION

The current invention is related to liquid chromatography-tandem mass spectrometric methodologies (abbreviated herein as LC-MS/MS) for profiling the protein structures in the entire proteome, including protein abundances, post-translational modification statuses and quaternary structures. The invention, by providing methods for analyzing proteome structural profiles, allows a better functional characterization of the proteome than was previously possible.

For the proteome protein abundance profiling system, the proteome composition information is gathered first. The samples are processed by the proteome preparation procedure (provided in detail below) and then are subjected to protein cleavage. The peptide products from the protein cleavage are analyzed by LC-MS/MS methods to determine the abundances of the proteins in the proteome. Data on protein abundances are combined to form the protein abundance profiles.
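
The quantitation step can be illustrated with a minimal sketch, assuming the ion-count ratio recited in claim 16: ion counts are summed over all charge states for a sample peptide (ΣT) and for a peptide of the spiked standard protein (ΣS), and the concentration is derived from ΣT/ΣS. The function names and the dictionary layout are hypothetical, and scaling the known standard concentration directly by the ratio assumes that both peptides respond equally in the mass spectrometer, which the specification does not state.

```python
from typing import Dict, Mapping


def summed_ion_counts(counts_by_charge: Mapping[int, float]) -> float:
    """Combine the ion counts from every observed charge state of one peptide."""
    return float(sum(counts_by_charge.values()))


def estimate_concentration(
    sample_counts: Mapping[int, float],
    standard_counts: Mapping[int, float],
    standard_concentration: float,
) -> float:
    """Estimate a sample peptide concentration from the ratio sum_T / sum_S.

    Assumption (illustrative only): equal instrument response for the sample
    and standard peptides, so the ratio scales the known standard
    concentration directly.
    """
    sum_t = summed_ion_counts(sample_counts)
    sum_s = summed_ion_counts(standard_counts)
    if sum_s <= 0:
        raise ValueError("standard peptide has no ion counts")
    return standard_concentration * sum_t / sum_s


def abundance_profile(
    peptides: Mapping[str, Mapping[int, float]],
    standard_counts: Mapping[int, float],
    standard_concentration: float,
) -> Dict[str, float]:
    """Collect the per-peptide estimates into one abundance profile."""
    return {
        name: estimate_concentration(counts, standard_counts, standard_concentration)
        for name, counts in peptides.items()
    }


if __name__ == "__main__":
    # Hypothetical ion counts keyed by charge state (2+, 3+).
    standard = {2: 1.5e6, 3: 0.5e6}
    sample = {"peptide_A": {2: 3.0e6, 3: 1.0e6}, "peptide_B": {2: 1.0e6}}
    print(abundance_profile(sample, standard, standard_concentration=10.0))
```

The result is expressed in whatever unit is used for the standard concentration; the numbers in the usage example are invented for illustration.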

For the proteome post-translational modification status profiling system, the proteome post-translational modifications are first identified via mass spectrometric analysis. The proteome samples are processed and then subjected to either in-gel or in-solution protein cleavage procedures. The resulting peptides are analyzed by the LC-MS/MS method to quantitatively determine the modification statuses at particular amino acid positions. The data on protein post-translational modification statuses are combined to form the protein PTM status profiles.
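
The identification and quantitation steps can be sketched in the same spirit, assuming the retention-time screen recited in claim 30 (a modified peptide candidate is accepted when its retention time differs from that of the unmodified peptide by less than ten minutes) and the status formula recited in claim 39, ΣM/(ΣM + ΣU). The class and function names are hypothetical, and the example values are invented.

```python
from dataclasses import dataclass

# Threshold from claim 30: candidates must elute within ten minutes
# of the unmodified peptide.
MAX_DELTA_RT_MIN = 10.0


@dataclass
class PeptideSignal:
    """Summary of one peptide form taken from its specific ion chromatogram."""
    retention_time_min: float
    ion_counts: float  # ion counts summed for this peptide form


def is_modified_candidate(
    unmodified: PeptideSignal,
    modified: PeptideSignal,
    max_delta_rt: float = MAX_DELTA_RT_MIN,
) -> bool:
    """Return True when the retention-time difference is below the threshold."""
    delta_rt = abs(modified.retention_time_min - unmodified.retention_time_min)
    return delta_rt < max_delta_rt


def modification_status(unmodified: PeptideSignal, modified: PeptideSignal) -> float:
    """Fraction of the peptide carrying the modification: sum_M / (sum_M + sum_U)."""
    total = modified.ion_counts + unmodified.ion_counts
    if total <= 0:
        raise ValueError("no ion counts for either peptide form")
    return modified.ion_counts / total


if __name__ == "__main__":
    # Hypothetical unmodified and modified forms of the same peptide.
    unmodified = PeptideSignal(retention_time_min=32.0, ion_counts=3.0e6)
    modified = PeptideSignal(retention_time_min=35.5, ion_counts=1.0e6)
    if is_modified_candidate(unmodified, modified):
        print(modification_status(unmodified, modified))
```

A status of 0 means no modified form was detected at the position, and 1 means only the modified form was detected, which matches the fractional value described in claim 41.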

Post-translational modification profiling includes determining any modifications to individual amino acids of a protein. These modifications include the covalent attachment of glycosyl groups, acetyl groups, phosphate groups, lipid groups and the like, all of which are expressly encompassed by the term "modification". The natural or other chemical modifications, such as those listed in the examples above, can occur anywhere in a polypeptide (protein), including the peptide backbone, the amino acid side-chains and the amino or carboxyl termini. It will be appreciated that the same type of modification may be present in the same or varying degrees at several sites in a given polypeptide. Also, a given polypeptide may contain many types of modifications. Polypeptides may be branched, for example, as a result of ubiquitination, and they may be cyclic, with or without branching. Modifications include, at least, acetylation, acylation, carbamylation, ADP-ribosylation, amidation, covalent attachment of flavin, covalent attachment of a heme moiety, covalent attachment of a nucleotide or nucleotide derivative, covalent attachment of a lipid or lipid derivative, covalent attachment of phosphatidylinositol, cross-linking, cyclization, disulfide bond formation, demethylation, formation of covalent cross-links, formation of cysteine, formation of pyroglutamate, formylation, gamma-carboxylation, glycosylation, GPI anchor formation, hydroxylation, iodination, methylation, myristoylation, oxidation, pegylation, proteolytic processing, phosphorylation, prenylation, racemization, selenoylation, sulfation, transfer-RNA mediated addition of amino acids to proteins such as arginylation, and ubiquitination. (See, e.g., PROTEINS--STRUCTURE AND MOLECULAR PROPERTIES, 2nd Ed., T. E. Creighton, W. H. Freeman and Company, New York (1993); POSTTRANSLATIONAL COVALENT MODIFICATION OF PROTEINS, B. C. Johnson, Ed., Academic Press, New York, pgs. 1-12 (1983); Seifter et al., Meth Enzymol 182: 626-646 (1990); Rattan et al., Ann NY Acad Sci 663: 48-62 (1992).)

For proteome quaternary structure profiling system, the entire proteome is first divided by fractionation procedures. These fractions are then analyzed for their primary structure profiles, including both the protein abundance and protein PTM status profiles. The primary structure profiles are reconstructed to generate the distribution/elution silhouettes. Through examining these distribution/elution silhouettes, quaternary structure grouping procedures are used to discern the presence of any quaternary structures as well as to identify the components within a quaternary structure. Meanwhile, quaternary structure quantitation procedures determine the abundance of a quaternary structure by examining the areas of corresponding peaks in the silhouette of individual protein components. The combination of the amounts of the quaternary structure becomes the proteome quaternary structure profile.

The invention also provides methods for creating clinical proteome structure databases and the databases so established. The databases are made by coupling the protein structure profiles at various health conditions with clinical parameters. In producing the database, samples are collected from a patient at various physiological and pathological conditions. These samples are analyzed for their proteome structure profiles using the disclosed procedures. The clinical parameters of the patient are also determined and recorded. These parameters and the proteome structure profiles are combined to establish the proteome structure profile databases. These databases can be eventually used for many purposes including diagnostic and therapeutic ones.
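By way of illustration only, the coupling of proteome structure profiles with clinical parameters can be sketched as a simple record store. The class names, field names and example values below are hypothetical and are not part of the disclosure; the sketch assumes each profile can be held as a mapping from a protein, site or complex name to a numeric value.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProfileRecord:
    """One sample: clinical parameters coupled with its structure profiles."""
    patient_id: str
    condition: str                       # e.g. "normal" or a named disorder
    clinical_parameters: Dict[str, float]
    abundance_profile: Dict[str, float] = field(default_factory=dict)
    ptm_status_profile: Dict[str, float] = field(default_factory=dict)
    quaternary_profile: Dict[str, float] = field(default_factory=dict)


class ProfileDatabase:
    """Minimal in-memory store; a real database would persist records."""

    def __init__(self) -> None:
        self._records: List[ProfileRecord] = []

    def add(self, record: ProfileRecord) -> None:
        self._records.append(record)

    def by_condition(self, condition: str) -> List[ProfileRecord]:
        """Return every record collected under the given health condition."""
        return [r for r in self._records if r.condition == condition]


# Hypothetical values, for illustration only.
db = ProfileDatabase()
db.add(ProfileRecord("P001", "normal", {"creatinine_mg_dl": 0.9},
                     abundance_profile={"protein_A": 1.0}))
db.add(ProfileRecord("P002", "renal failure", {"creatinine_mg_dl": 6.2},
                     abundance_profile={"protein_A": 0.4}))
print(len(db.by_condition("renal failure")))
```

Keeping the three profile types as separate fields mirrors the way the disclosure treats abundance, PTM status and quaternary structure as distinct profiles that may be combined.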

It is understood that while the disclosure refers to patients and humans, the invention is applicable to all living unicellular and multicellular organisms. Unicellular organisms include viruses, bacteria, yeasts and others. The multicellular organisms include humans, dogs, cats, cattle, pigs, chickens, goats, ducks and other animals as well as rice, soybean, potato and other plants. The term "patient" is thus defined as any organism. Further, it is understood that any organism may be substituted for "human" in this disclosure.

Furthermore, the current invention provides one additional apparatus design for facilitating the analyses using these methodologies. This is a two-conduit fraction collection design that enables sample fractionation at a higher rate while avoiding the sample loss during the movement of fluid outlet.

Furthermore, the current invention defines four important concepts. The first is the proteome primary-quaternary structure relationship concept, which holds that the quaternary structure profile of a proteome can be calculated using the proteome protein abundance profiles. The second is the reverse molecular biology concept, which holds that analysis of the final protein products and the genome sequences enables identification of the molecular mechanisms at the RNA processing, translational and post-translational levels. The third is the function-specific quaternary structure concept, which holds that assembly or disassembly of a quaternary structure is central to key physiological functions. The fourth is the pathogenic quaternary structure concept, which describes the central role of quaternary structures in the pathogenesis of various diseases.

The current invention also discloses the uses of the core methodologies, derived information and related product materials. With these, tools are provided to allow one of skill in the art to gain more insight into how protein abundances and post-translational modifications control protein quaternary structures and the associated functions. In addition, the procedures entailed in the current invention also have other important uses such as documentation of protein-protein or protein-ligand interactions, and characterization of recombinant protein drugs or purified plasma products. Moreover, the information contained in these structure profile databases may shed light on how quaternary structures are modulated at different health conditions and how the quaternary structural changes play roles in physiological and pathophysiological processes. Finally, this knowledge should help in designing better materials for research, diagnostic and therapeutic uses. Such a systemic approach should provide a platform for studying the structural and functional aspects in the field of proteomics.

The current invention also provides evidence that the plasma protein abundance profiles, defined by two-dimensional gel electrophoresis, are altered in disease states, including both changes in protein abundances and changes in overall protein post-translational modification statuses. With the provided methods, experiments were performed to identify phosphorylation, acetylation, hydroxylation and carbamylation on several plasma proteins. Surprisingly, our analysis of the albumin phosphorylation status profile at four separate regions demonstrated that the albumin phosphorylation status profile was altered in various disease conditions, including human immunodeficiency virus, hepatitis C virus, or hepatitis B virus infection as well as hepatocellular carcinoma, cervical carcinoma, and renal failure.

With our quaternary structure profiling procedures, the methods of the invention showed, for the first time, that plasma samples from renal failure patients had different overall gel filtration silhouettes in comparison with those from normal individuals. Several plasma proteins formed large quaternary structures (>700 kDa), including fibrinogen α/β/γ, α2-macroglobulin, C4b-binding protein and haptoglobin. In contrast, albumin quaternary structures, previously known in dimeric or monomeric form, were not detected at this particular chromatographic range.

Another embodiment of the invention is directed to methods for setting up a urine proteome primary structure profiling procedure. During the acquisition of the proteome composition information, it was discovered that the overall urine proteome primary structure profiles defined by two-dimensional gel electrophoresis were altered in patients with diabetes mellitus.

In addition, our protein structure profiling procedures revealed that the therapeutic monoclonal antibody Herceptin was highly phosphorylated at the N-terminus of its heavy chain, while its quaternary structure migrated almost exclusively as a ~150-kD molecule in the gel filtration column.

Finally, we report a reverse molecular biology technique for analyzing the protein products resulting from a -1 ribosomal frameshift. This method consists of three steps: LC-MS/MS analysis of the protein digests, a first data analysis with the verified mRNA sequence, and a second data analysis with the single-insertion mutant mRNA sequences. This method successfully verified the transframe product and mapped the transframe positions. This exemplifies the use of reverse molecular biology in protein analysis.
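The sequence-processing step behind the second data analysis can be sketched as follows. This is a minimal illustration, not the disclosed procedure: it assumes that a -1 frameshift at a given position can be modeled by duplicating the nucleotide at that position (so that it is read twice), and that the standard genetic code applies. The function names and the toy sequence are hypothetical.

```python
from itertools import product
from typing import Dict, List

# Standard genetic code, codons ordered with bases U, C, A, G.
BASES = "UCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
CODON_TABLE: Dict[str, str] = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate complete codons from position 0; '*' marks a stop codon."""
    codons = (mrna[i:i + 3] for i in range(0, len(mrna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


def single_insertion_mutants(mrna: str) -> List[str]:
    """One mutant per position, made by duplicating the nucleotide there.

    Assumption: re-reading one nucleotide is how the -1 shift is modeled.
    """
    return [mrna[:i + 1] + mrna[i] + mrna[i + 1:] for i in range(len(mrna))]


# Toy sequence, for illustration only.
mrna = "AUGAAAUUUGGG"
mutants = single_insertion_mutants(mrna)
print(translate(mrna))        # zero-frame product
print(translate(mutants[5]))  # transframe product for a shift after AAA
```

Each translated mutant could then be digested in silico and searched against the LC-MS/MS data; a peptide that matches only one mutant would indicate the transframe position.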

One embodiment of the invention is directed to a method for determining a proteome structure profile in an organism. The method comprises the steps of collecting a sample from the organism and determining a proteome structure profile of the sample or determining a proteome structure profile of at least one proteome in said sample.

In this disclosure, a proteome structure profile is defined as (1) a protein abundance profile of at least one protein in a sample, (2) a protein post-translational modification status profile of at least one protein in said sample, (3) a protein quaternary structure profile of at least one protein in said sample and (4) any combination of the above.

Any of the methods of the invention may be used to assess the presence, absence or severity of a medical disorder (health condition) or to determine the likely outcome of a course of treatment. In this application, the terms medical disorder, disorder, and health condition are interchangeable.

Disorders can include one or more of the following. In certain embodiments of the invention, the disorder may be associated with an infection. Bacterial infections include, but are not limited to, those associated with Staphylococcus, Neisseria, Erysipelothrix, Listeria, Salmonella, Shigella, Escherichia, Klebsiella, Enterobacter, Serratia, Proteus, Morganella, Providencia, Yersinia, Clostridium, Spirochete, or Mycobacteria. Viral infections include, but are not limited to, those associated with Rhinoviruses, Influenza viruses, Parainfluenza viruses, Adenoviruses, Coronaviruses, Herpes virus (e.g., HSV-1 or HSV-2), Rabies virus, Human T-Lymphotropic Virus (HTLV-I or HTLV-II), Arboviruses, Arenaviruses, Hantaviruses, Marburg or Ebola viruses, Anthrax virus, Smallpox virus, or Human immunodeficiency viruses (HIV-1 or HIV-2). Fungal infections include, but are not limited to, those associated with Coccidioides, Blastomyces, Paracoccidioides, Sporothrix, Cryptococcus, Candida, Aspergillus, Rhizopus, Rhizomucor, Absidia, Basidiobolus, Bipolaris, Cladophialophora, Cladosporium, Drechslera, Exophiala, Fonsecaea, Phialophora, Xylohypha, Ochroconis, Rhinocladiella, Scolecobasidium, or Wangiella organisms. Parasite infections/infestations include, but are not limited to, those associated with Chlamydia, Protozoa, Amebas, Nematodes, Trematodes, or Cestodes organisms.

In other embodiments of the invention, the disorder may be associated with a proliferative disease (e.g., cancer, tumor, or neoplastic growth). Cancers and associated disorders include, but are not limited to, renal cell carcinoma, secondary renal cancer, cancer of the renal pelvis and ureter, bladder cancer, prostate cancer, urethral cancer, penile cancer, testicular cancer, endometrial cancer, ovarian cancer, cervical cancer, vulvar cancer, vaginal cancer, fallopian tube cancer, gestational trophoblastic disease, fibrocystic disease, breast cancer, Paget's Disease, cystosarcoma phyllodes, intracranial neoplasms, benign intracranial hypertension, spinal cord neoplasms, CNS paraneoplastic syndromes, esophageal tumors, stomach cancer, small-bowel tumors, large-bowel tumors (e.g., colon cancer), pancreatic tumors, lung cancers, thyroid cancers, cervical metastases, liver metastases, primary liver cancer, basal cell carcinoma, squamous cell carcinoma, malignant melanoma, and Kaposi's Sarcoma.

In other embodiments of the invention, the disorder may be associated with contact with a toxin (e. g., poison, drug, venom, or allergen). Venoms include, but are not limited to those associated with venomous snakes, venomous lizards, spiders, bees, wasps, yellow jackets, hornets, ants, other biting arthropods, centipedes, millipedes, scorpions, and marine animals such as Coelenterates, Mollusks, stingrays, and sea urchins.

In a still further embodiment of the invention, the condition may be associated with pregnancy, old age, dementia and the like.

Samples that can be analyzed by any method of the invention include a body fluid or a cell. The body fluid can be plasma, urine, pleural effusion, cerebrospinal fluid, sputum, abdominal effusion or semen. The cell can be a blood cell, a tissue biopsy or a pathological specimen.

The proteome structure profile can be a protein abundance profile.

Another embodiment of the invention is directed to a method for determining a protein abundance profile in an organism. The method has the steps of collecting a sample from the organism ; and determining a protein abundance profile of at least one protein in said sample. The protein abundance profile may comprise measuring the concentration of at least five proteins in said sample. The protein abundance profile can be determined by (1) adding a standard protein with a known concentration to said sample; (2) digesting said sample with a sequence-specific protein cleavage agent to produce digested sample peptides and digested standard protein peptides; (3) analyzing said peptides by mass spectrometry to generate mass spectrometric data ; and (4) interpreting said mass spectrometric data to determine an abundance of at least one protein.

The "sequence-specific protein cleavage agent" can be protein or chemical based. Protein based agents (proteases) can be trypsin, Asp-N, Arg-C, Lys-C, Glu-C and chymotrypsin. Chemical based agents can be hydroxylamine, CNBr, BNPS-skatole, and 2-Nitro-5-thiocyanatobenzoate. Other such agents are known.

The term "mass spectrometry" may mean any form of mass spectrometry and includes, at least, liquid chromatography-tandem mass spectrometry, capillary electrophoresis-tandem mass spectrometry, matrix-assisted laser-induced desorption/ionization time-of-flight mass spectrometry and matrix-assisted laser-induced desorption/ionization quadrupole time-of-flight hybrid mass spectrometry.

To determine a concentration (abundance), the interpreting step can involve: (1) processing said mass spectrometric data to derive ion counts ΣT for a peptide of said protein in said sample; (2) processing said mass spectrometric data to derive ion counts ΣS for a peptide of said standard protein; and (3) determining said abundance as the ratio ΣT/ΣS. Alternatively, the steps may involve (1) processing said mass spectrometric data to derive a series of peptide mass maps; (2) processing said peptide mass maps to generate a final peptide mass map; and (3) calculating said abundance of said protein in said sample as a ratio IT/IS, wherein IT is the intensity of a peptide from said protein and IS is the intensity of a peptide from said standard protein in said final peptide mass map.
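The ion-count route can be sketched as below, assuming the peak areas for each charge state of the target and standard peptides have already been extracted from the LC-MS/MS data. The function names and the numeric values are hypothetical. The final scaling by the known standard concentration is an added assumption (equal instrument response for both peptides), not a step stated above.

```python
from typing import Sequence


def summed_ion_counts(peak_areas: Sequence[float]) -> float:
    """Sum one peptide's peak areas over all observed charge states."""
    return float(sum(peak_areas))


def relative_abundance(target_areas: Sequence[float],
                       standard_areas: Sequence[float]) -> float:
    """Ratio of summed target ion counts to summed standard ion counts."""
    sigma_s = summed_ion_counts(standard_areas)
    if sigma_s <= 0:
        raise ValueError("standard peptide has no signal")
    return summed_ion_counts(target_areas) / sigma_s


def absolute_level(ratio: float, standard_conc: float) -> float:
    """Scale the ratio by the known concentration of the standard protein.

    Assumption: target and standard peptides respond equally; a real
    workflow may need a peptide-specific calibration factor.
    """
    return ratio * standard_conc


# Hypothetical peak areas for charge states 1+, 2+ and 3+.
ratio = relative_abundance([1.0e6, 4.0e6, 1.0e6], [0.5e6, 2.0e6, 0.5e6])
print(ratio)                        # 2.0
print(absolute_level(ratio, 10.0))  # 20.0, in the units of the standard
```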

In any of the methods of the invention, any of the results (such as protein abundance profile, a protein post-translational modification status profile, a proteome quaternary structure profile) may be compared to the results of a second patient (known to suffer from a disorder or known to be healthy) to diagnose the disorder in the first patient. As an alternative, the results can be compared to a database of results from other patients to diagnose a disorder or a lack of a disorder.

Another embodiment of the invention is directed to a method of determining a protein post-translational modification status profile in an organism by (a) collecting a sample from the organism ; (b) determining a post-translational modification status of at least one protein in said sample. The determining steps may be (a) separating at least one protein from said sample by a physical characteristic; (b) digesting said protein with a sequence-specific protein cleavage agent to form peptides; (c) performing mass spectrometry on said peptides to produce mass spectrometric data; (d) identifying modified peptide candidates from the mass spectrometric data; and (e) mapping said protein post-translational modification within said modified peptide candidates by mass spectrometry.

Whenever proteins are separated in the methods of the invention, the separation may be based on size, mass, charge, sedimentation coefficient, or a combination thereof. Such separation techniques include SDS-PAGE, non-denaturing PAGE, two-dimensional gel electrophoresis, immunoaffinity purification, and a combination thereof.

The determining steps may involve (1) separating at least one protein from said sample by a physical characteristic; (2) digesting said protein with a sequence-specific protein cleavage agent to form peptides; (3) analyzing said peptides to acquire mass spectrometric data by mass spectrometry ; (4) interpreting said mass spectrometric data to determine said post-translational modification status. Alternatively, it may involve (1) digesting said sample with a sequence-specific protein cleavage agent to form peptides; (2) analyzing said peptides to acquire mass spectrometric data by mass spectrometry; and (3) interpreting said mass spectrometric data to determine said post-translational modification status.

Post-translational modification may be identified in a number of methods.

One method involves: (1) processing mass spectrometric data to determine the retention time of a peptide from said protein; (2) processing mass spectrometric data to determine retention times of peaks for a modified form of said peptide; (3) comparing said retention times of said peptide and said modified form to determine the differences of retention time (ΔRT); and (4) finding said modified peptide candidates if said differences of retention time (ΔRT) are less than ten minutes.
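The retention-time screen can be sketched as a simple filter. The sketch assumes retention times are expressed in minutes and that the difference is taken as an absolute value; the function name and example values are hypothetical.

```python
from typing import List, Tuple

MAX_DELTA_RT_MIN = 10.0  # threshold stated in the text, in minutes


def find_modified_candidates(
    unmodified_rt: float,
    modified_peak_rts: List[float],
    max_delta: float = MAX_DELTA_RT_MIN,
) -> List[Tuple[float, float]]:
    """Return (retention time, delta RT) for peaks inside the threshold.

    Assumption: the difference is unsigned, so peaks eluting earlier or
    later than the unmodified peptide are treated alike.
    """
    candidates = []
    for rt in modified_peak_rts:
        delta = abs(rt - unmodified_rt)
        if delta < max_delta:
            candidates.append((rt, delta))
    return candidates


# Hypothetical retention times, in minutes.
hits = find_modified_candidates(32.5, [20.0, 30.0, 36.5, 45.0])
print(hits)  # [(30.0, 2.5), (36.5, 4.0)]
```

Peaks that pass this screen are only candidates; the site of modification would still need to be mapped by tandem mass spectrometry as described above.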

Another method involves (1) processing said mass spectrometric data to derive a series of peptide mass maps ; (2) processing said peptide mass maps to generate a final peptide mass map; (3) analyzing said final peptide mass map to find said modified peptide candidates.

In any of the methods of the invention, the results of an intermediate step or a final step may be interpreted by (1) processing mass spectrometric data to determine ion counts of a modified peptide from said protein; (2) processing mass spectrometric data to determine ion counts of the unmodified form of said modified peptide; and (3) calculating said protein post-translational modification status as ΣM/(ΣM + ΣU), where ΣM is the value of the ion counts of said modified peptide and ΣU is the value of the ion counts of said unmodified form. Another method of interpretation is (1) processing said mass spectrometric data to derive a series of peptide mass maps; (2) processing said peptide mass maps to generate a final peptide mass map; and (3) calculating said protein post-translational modification status as IM/(IM + IU), wherein IM is the intensity of said modified peptide and IU is the intensity of said unmodified form in the final peptide mass map.
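The fractional modification status can be sketched as follows, assuming the ion counts of the modified and unmodified forms are already available. The function name, the site labels and the counts are hypothetical.

```python
from typing import Dict, Tuple


def ptm_status(modified_counts: float, unmodified_counts: float) -> float:
    """Fraction modified: modified / (modified + unmodified)."""
    total = modified_counts + unmodified_counts
    if total <= 0:
        raise ValueError("no signal for either peptide form")
    return modified_counts / total


# Hypothetical (modified, unmodified) ion counts per site.
site_counts: Dict[str, Tuple[float, float]] = {
    "site_1": (2.0e5, 6.0e5),
    "site_2": (0.0, 5.0e5),
}
profile = {site: ptm_status(m, u) for site, (m, u) in site_counts.items()}
print(profile)  # {'site_1': 0.25, 'site_2': 0.0}
```

The same function applies to the peptide-mass-map route if intensities are passed in place of ion counts.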

In one embodiment, the protein post-translational modification status profile may be a fractional value representing the fraction of amino acids in said amino acid position that has post-translational modification. These modifications may be disulfide bond, acylation, acetylation, carbamylation, hydroxylation, glycosylation, methylation and phosphorylation.

The proteins that can be examined, using the methods of the invention, include albumin, hemoglobin alpha, transferrin, fibrinogen beta, alpha-2-macroglobulin, alpha-1-antitrypsin, ApoA1, C3a complement protein, haptoglobin, C-reactive protein, and a combination thereof. In particular, for albumin, the amino acids (or peptides containing these amino acids) useful for examination include S256, T260, S294, Y435, S443, and T444. The diseases that can be diagnosed by the methods of the invention include any of the diseases and disorders listed. They at least include HCV infection, HBV infection, chronic renal failure, hepatocellular carcinoma, cervical cancer, and HIV infection.

Another embodiment of the invention is directed to a method of determining a proteome quaternary structure profile in an organism comprising the steps of: (1) collecting a sample from the organism; and (2) determining the abundance of a quaternary structure of at least one protein in said sample. The determining step may include (1) fractionating said sample into a plurality of fractions by a physical characteristic; (2) analyzing said fractions for their protein abundance profiles; (3) reconstructing said protein abundance profiles to generate the distribution silhouette of said protein; and (4) determining the abundance of said quaternary structure by analyzing said distribution silhouette. Another determining step may include (1) determining the protein abundance profile of said sample; and (2) calculating the abundance of said quaternary structure with said protein abundance profile and a set of conversion factors. The conversion factors are determined by (1) fractionating a sample into a plurality of fractions by a physical characteristic; (2) analyzing said fractions for their protein abundance profiles; (3) reconstructing said protein abundance profiles to generate distribution silhouettes of individual proteins in said sample; and (4) determining said set of conversion factors by analyzing the distribution silhouettes of individual proteins in said sample. The method may also comprise a fractionating step before said determining step, wherein said fractionation step removes at least 50% of the monomeric proteins from said sample. The fractionation step is gel filtration or dialysis.
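The silhouette reconstruction and peak-area steps can be sketched as below. The sketch assumes the fractions are ordered by elution, that each fraction's protein abundance profile is a mapping from protein name to abundance, and that peak area is estimated by trapezoidal integration with unit spacing between fractions. The peak boundaries are supplied by hand; peak detection is not shown. All names and values are hypothetical.

```python
from typing import Dict, List


def build_silhouette(fraction_profiles: List[Dict[str, float]],
                     protein: str) -> List[float]:
    """Collect one protein's abundance across the ordered fractions."""
    return [profile.get(protein, 0.0) for profile in fraction_profiles]


def peak_area(silhouette: List[float], start: int, end: int) -> float:
    """Trapezoidal area over fractions start..end inclusive, unit spacing."""
    window = silhouette[start:end + 1]
    return sum((a + b) / 2.0 for a, b in zip(window, window[1:]))


# Hypothetical abundance profiles for five consecutive fractions.
fractions = [
    {"protein_A": 0.0, "protein_B": 0.0},
    {"protein_A": 2.0, "protein_B": 0.0},
    {"protein_A": 4.0, "protein_B": 0.0},
    {"protein_A": 2.0, "protein_B": 1.0},
    {"protein_A": 0.0, "protein_B": 5.0},
]
silhouette = build_silhouette(fractions, "protein_A")
print(silhouette)                   # [0.0, 2.0, 4.0, 2.0, 0.0]
print(peak_area(silhouette, 0, 4))  # 8.0
```

Proteins whose silhouettes peak in the same fractions would be candidates for membership in the same quaternary structure, which is the grouping idea described above.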

The quaternary structures are identified by steps including (1) fractionating said sample into a plurality of fractions by physical characteristic ; (2) analyzing said fractions for their protein abundance profiles; (3) reconstructing said protein abundance profiles to generate distribution silhouette of individual proteins in said sample; (4) identifying said quaternary structure by discerning peak shapes in said distribution silhouettes.

Another embodiment of the invention is a method to uncover post-transcriptional events involved in the generation of a protein, comprising the steps: (1) digesting the protein with a sequence-specific protein cleavage agent into peptides; (2) analyzing said peptides with mass spectrometry to acquire mass spectrometric data; and (3) identifying said post-transcriptional events involved in the generation of said protein by analyzing the mass spectrometric data. The post-transcriptional event is alternative splicing, trans-splicing, RNA editing, ribosomal frameshift, ribosomal skipping or post-translational modification. The identifying step may comprise: (1) processing the genome sequence of said protein based on said post-transcriptional event; and (2) identifying said post-transcriptional event by interpreting said mass spectrometric data with said processed genome sequence. Another identifying step may comprise: (1) processing said mass spectrometric data to derive a series of peptide mass maps; (2) processing said peptide mass maps to generate a final peptide mass map; and (3) identifying said post-transcriptional event by analyzing said final peptide mass map.

Although many of the methods above discuss the analysis of a sample collected from an organism, it is understood that any of the methods of the invention may be used to analyze a single protein, such as, for example, a protein drug. For example, the methods of the invention may be used to analyze a monoclonal antibody from a cell culture supernatant to determine if the monoclonal antibody is homogeneous or whether the monoclonal antibody comprises a mixture of molecules with different post-translational modifications. The methods of the invention may be used to analyze purified proteins to determine if the purified proteins are homogeneous or heterogeneous with respect to their post-translational modifications. Thus, it is understood that any of the methods of the invention may be used to analyze a single protein sample, or a mixture of proteins. The step of collecting a sample from a patient or organism may be replaced by the step of providing a sample.

Other aspects and advantages of the present invention are described further in the following detailed description of preferred embodiments of the present invention.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 Protein/proteome structure profiling system.

FIG. 2 Proteome protein abundance profiling system.

FIG. 3 Derivation of protein abundance index.

FIG. 4 Synthetic peptide mass map method.

FIG. 5 Reverse molecular biology concept.

FIG. 6 Ion trap-TOF (time-of-flight) mass spectrometry.

FIG. 7 Proteome PTM status profiling system.

FIG. 8 Methods for quantitation of protein PTM.

FIG. 9 Principles underlying PTM identification analysis.

FIG. 10 Methods for identification of modified peptides and residues: SIC analysis.

FIG. 11 Proteome quaternary structure profiling system.

FIG. 12A-B Two-conduit fraction collection design.

FIG. 13 Primary-quaternary structure relationship.

FIG. 14 A fast solution to quaternary structure profiling is protein abundance profiling.

FIG. 15 The establishment of proteome structural profile databases.

FIG. 16 Application in pharmacoproteomics : protein drugs per se.

FIG. 17 Application in pharmacoproteomics : pharmacological properties.

FIG. 18 Application in pharmacoproteomics : identification of pathogenic quaternary structures.

FIG. 19 Identification of plasma proteins of a normal individual.

FIG. 20 Differentially expressed/modified plasma proteins in different diseases.

FIG. 21 Methods for plasma proteome primary structure profiling.

FIG. 22 Some examples of SIC analysis for quantitating representative peptides (I).

FIG. 23 Some examples of SIC analysis for quantitating representative peptides (II).

FIG. 24 Some examples of SIC analysis for quantitating representative peptides (III).

FIG. 25 The method for quantitation of PTM status of serum albumin.

FIG. 26 MS/MS spectra of modified peptides from plasma proteins (I).

FIG. 27 MS/MS spectra of modified peptides from plasma proteins (II).

FIG. 28 MS/MS spectra of modified peptides from plasma proteins (III).

FIG. 29 The PTM status profiles of serum albumin in normal individuals and patients with hepatocellular CA, cervical CA or HIV infection.

FIG. 30 PTM status profiles of serum albumin in normal individuals and patients with HBV infection, HCV infection and renal failure.

FIG. 31 The overall GFC elution silhouettes of plasma from normal individuals.

FIG. 32 The overall GFC elution silhouettes of plasma from patients with renal failure.

FIG. 33 Elution silhouettes of plasma proteins (1).

FIG. 34 Elution silhouettes of plasma proteins (2).

FIG. 35 Urine proteome primary structure profiling system.

FIG. 36 2-D Electrophoretic analysis of urine proteome.

FIG. 37 Changes of urine proteome profiles in diabetes mellitus patients.

FIG. 38 The scheme of primary structure profiling of herceptin.

FIG. 39 The tandem mass spectra of phosphorylated peptides of herceptin.

FIG. 40 Quaternary structure profile of herceptin.

FIG. 41 The LC-MS/MS method for verifying the transframe protein product and mapping the -1 frameshift site.

FIG. 42 Processing of the sequence database.

FIG. 43 Diagram of the DNA template.

FIG. 44 Generation and detection of the transframe protein.

FIG. 45 Summary of the result of LC-MS/MS analysis.

FIG. 46 Mapping of the transframe site by tandem mass spectrometry.

DETAILED DESCRIPTION OF THE INVENTION

One embodiment of the invention is directed to a new LC-MS/MS-based system that can be used for analyzing the primary and quaternary structure profiles of a protein, protein complex or even the whole proteome. The primary structure profile consists of two major parts, the protein abundance profile and the protein PTM status profile. The protein and protein complex can be natural products derived from body fluids, cells, or tissues. They can also be recombinant proteins generated using a viral, bacterial or cellular expression system. This system can also be used to analyze entire proteomes, such as those of plasma, urine, cells, tissues or even intact organisms.

The system is broadly described in FIG. 1. With this system, we may couple the proteome structure profiles at various health conditions with clinical parameters to establish the proteome structure profile databases. Finally, we discuss the uses of the methodologies, information and materials associated with the analysis system and the clinical databases.

The current invention is described in five parts : (I) the proteome protein abundance profiling system, (II) the proteome post-translational modification status profiling system, (III) the proteome quaternary structure profiling system, (IV) the establishment of proteome structure profile databases, and (V) the uses of the methodologies, information and materials related to the proteome structure profiling procedures and proteome structure profile database.

Part I: THE PROTEOME PROTEIN ABUNDANCE PROFILING SYSTEM

In one embodiment, the methodology used for profiling the proteome primary structure is illustrated in FIG. 2. First, the samples are processed by the proteome preparation procedure and are then subjected to the protein cleavage procedure. The peptide products are analyzed by LC-MS/MS-based protein quantitation procedures to determine the amounts of individual proteins in the proteome. The proteome composition information is gathered first to facilitate the design of the protein abundance profiling procedure. To that end, the protein components in the proteomes are separated using adequate proteome resolution procedures. The separated proteins are then subjected to protein identification procedures. Combining these results, we can derive the proteome composition information.

A. The proteome preparation procedures

The proteome preparation procedures are needed to prepare the proteome for later analysis. They are used for (1) conversion of proteins into a more stable form and (2) removal of the unwanted components in the sample. In-solution reduction and alkylation is used in the current invention to convert the labile cysteine into a more stable derivative. The current invention incorporates a methanol/chloroform precipitation procedure to remove salts and lipid components in the sample. The precipitated proteins are resuspended before the protein cleavage procedure is carried out. Other methods such as dialysis are also suitable for this use.

In order to accurately quantitate the protein components, it is most preferred to account for the effect of unavoidable sample loss on the subsequent quantitative analysis. In a preferred embodiment, a standard protein (e.g., bovine casein kappa) with a known concentration is added prior to any treatment of the original sample. Once the relative abundances of proteins are determined, their absolute levels in the original sample can be derived by referring to this standard protein. It is understood that any known protein may be used as a standard protein. Naturally, proteins that can be easily distinguished, such as a protein from another species, are preferred. Before the protein cleavage procedure is undertaken, the proteins are chemically treated, for example by reduction and alkylation, to change the labile thiol groups into more stable chemical structures.

B. The protein cleavage procedure-In order to better characterize the physicochemical properties of target proteins, it is necessary to cleave the proteins into peptides/polypeptides. The protein cleavage procedures include chemical and enzymatic methods. Throughout this application, mention of enzymatic or chemical cleavage alone will be understood to include both. Among these, digestion with site-specific endopeptidases, like trypsin, Lys-C, Asp-N and Glu-C, has been proven consistent and useful. Other proteases include enterokinase, factor Xa, thrombin, IgA protease and chymosin. Chemical cleavage reagents are known and include, at least, low pH (which cleaves at Asp-Gly or Asp-Pro), hydroxylamine, CNBr, BNPS-skatole, and 2-nitro-5-thiocyanatobenzoate.

C. The protein quantitation procedure-In one embodiment, a quantitative mass spectrometric technique such as liquid chromatography-tandem mass spectrometry is used to document the amounts of the peptides from each protein in the proteome. The liquid chromatography is usually performed at low flow rates, so capillary or nanoflow HPLC is the preferred setting for such analysis. The tandem mass spectrometers used in the current invention are an ion trap mass spectrometer (ThermoFinnigan) and a quadrupole-time-of-flight hybrid mass spectrometer (Applied Biosystems). However, any instrument that can be equipped with an electrospray ion source should fit this use well.

The LC-MS/MS data are used for calculating the relative abundances of a set of peptides that are derived from different proteins in the proteome. The selection of these peptides is preferably derived from proteome composition information that is acquired beforehand (FIG. 2; see below). The abundance of a peptide from any target protein is represented by $\Sigma T$, which is defined as $\Sigma T = T_1 + T_2 + T_3 + \dots + T_n$, where $T_n$ is the peak area at the charge state $n$. Meanwhile, the abundance of a peptide from the standard protein is represented by $\Sigma S$, which is defined as $\Sigma S = S_1 + S_2 + S_3 + \dots + S_n$, where $S_n$ is the peak area at the charge state $n$. Finally, the protein abundance index (or PAI) of any protein is the ratio between $\Sigma T$ and $\Sigma S$, i.e. $\mathrm{PAI} = \Sigma T / \Sigma S$ (FIG. 3). The protein abundance index reflects the absolute level of a protein in the original sample. The protein abundance profile is the combination of these PAIs for the proteins in the proteome. It is also noteworthy that the same raw data can be used for determining the proteome post-translational modification status profile when another set of peptides is employed.
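The protein abundance index calculation described above can be sketched in a few lines of Python. This is a minimal illustration and not the software used in the invention: the function names and the peak-area values are hypothetical, and the peak areas are assumed to have been extracted from the LC-MS/MS data already.

```python
def summed_peak_area(areas_by_charge):
    """Sum the peak areas of one peptide over all observed charge states."""
    return sum(areas_by_charge.values())


def protein_abundance_index(target_areas, standard_areas):
    """PAI = summed target peak area / summed standard peak area."""
    standard_total = summed_peak_area(standard_areas)
    if standard_total <= 0:
        raise ValueError("standard peptide has no measurable signal")
    return summed_peak_area(target_areas) / standard_total


# Hypothetical peak areas keyed by charge state (arbitrary units).
target = {1: 1.0e6, 2: 4.0e6, 3: 1.0e6}
standard = {2: 2.0e6, 3: 1.0e6}

pai = protein_abundance_index(target, standard)
print(pai)
```

Because the standard protein is added at a known concentration before any sample treatment, a PAI computed this way can be related back to an absolute level in the original sample.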

Another embodiment for determining the abundances of peptides in a complicated mixture is presented in FIG. 4. This method aims to summarize all the peptide ions detected by liquid chromatography-tandem mass spectrometry, which generates a synthetic peptide mass map. The final synthetic mass map is constructed using mass spectrum averaging procedures, deconvolution procedures and integration procedures sequentially. Importantly, in addition to the mass and relative abundance information, the retention time of a peptide is also recorded. The synthetic mass spectrum can be used for identification of new modified peptides, protein abundance analysis as well as protein PTM status determination. The peak height of a peptide is a true reflection of its abundance in the sample. If the intensities of a target peptide and a standard peptide in the synthetic peptide mass map are $I_T$ and $I_S$ respectively, the protein abundance index (PAI) should be equal to the ratio $I_T / I_S$.

In addition, tandem mass spectra of these peptides are acquired to verify their identities. Computer programs such as Sequest (ThermoFinnigan) or ProID (Applied Biosystems) are used for interpretation of these data. This allows unambiguous identity verification of each peptide. Another parameter for verifying a peptide is the elution time in a chromatographic run. Under the same HPLC conditions, the elution time is usually very consistent. While elution time changes upon adjustment of an HPLC system, the tandem mass spectrum is a parameter independent of the HPLC conditions.

D. The acquisition of proteome composition information-Before protein quantitation procedure is carried out, we first need to acquire the information on the proteome composition such that the set of peptides in the analysis can be properly selected. This information arises from several different sources, like (1) literature search (2) genomic analysis or (3) mass spectrometric analysis.

If mass spectrometric analysis is used, there are usually three experimental steps involved sequentially: (1) proteome resolution procedure, (2) protein cleavage procedure, and (3) protein identification procedure.

(1) The proteome resolution procedures-These procedures separate proteins based on their charges, sizes and other chemical properties (e. g. hydrophobicities).

While they can be used for all samples, these procedures are preferred for acquisition of protein samples of simpler compositions. The techniques used in the examples are mainly electrophoretic methods, including one-dimensional SDS-polyacrylamide gel electrophoresis, two-dimensional gel electrophoresis consisting of isoelectric focusing and SDS-polyacrylamide electrophoresis. Methods that exclusively use chromatographic techniques or that combine chromatographic and electrophoretic techniques can also be used.

(2) The protein cleavage procedures-Proteins may be cut into smaller peptides using chemical and enzymatic methods. Among these, digestion with site-specific endopeptidases, like trypsin and Lys-C, has been proven consistent and is preferred.

For some embodiments, the proteins can first be cleaved into peptides and these peptides are resolved for subsequent protein identification analysis. Techniques such as multi-dimensional protein identification technology (MudPIT) are helpful in collection of the proteome composition information.

(3) The protein identification procedures-Any method of protein identification may be used, including protein sequencing, identification by antibodies, and the like. The preferred method of protein identification is based on mass spectrometric techniques like LC-MS/MS. LC-MS/MS facilitates the collection of tandem mass spectra from a complicated mixture of peptides produced by cleavage of proteins. Each tandem mass spectrum is searched against a database that contains protein or DNA sequences, such as the non-redundant protein database of the National Center for Biotechnology Information (NCBI). The search results are combined to uncover the identities of the proteins in the sample. The database search is facilitated by computer programs such as Sequest (ThermoFinnigan) and ProID (Applied Biosystems). Other mass spectrometric techniques like peptide mass mapping by MALDI-TOF methods, or non-MS methods like Edman sequencing, are also helpful in collection of the composition information.

E. Reverse molecular biology (FIG. 5)-In one embodiment, we define a novel concept and disclose techniques for scrutinizing the primary structures of proteins. It is known that transcription, RNA splicing and ribosomal translation are the major events that impact the expression of the final protein products. Some rare events, like RNA editing, RNA trans-splicing, protein trans-splicing and ribosomal frameshifting, are also crucial for generation of the protein products. Therefore, many mechanisms may participate in formation of different forms of the proteins.

The method of the invention provides techniques to identify all these mechanisms through analyzing the final protein products. The methods of the invention can be used to establish a new discipline, which we have named "reverse molecular biology." As the human genome project is close to completion, reverse molecular biology should be helpful in verifying the amino acid sequences of proteins. This also includes the methodologies used for post-translational modification analysis.

F. Ion trap-time-of-flight hybrid tandem mass spectrometry (FIG. 6)-At present, liquid chromatography-tandem mass spectrometry is the mainstay of advanced protein analysis. However, improved mass spectrometric analysis requires efficient peptide fragmentation, high accuracy and high resolution, properties that are still inadequate in the current state of the art. Based on the properties of various mass analyzers, we provide, in one embodiment, a tandem mass spectrometer configuration consisting of an upstream ion trap and a downstream time-of-flight mass analyzer (ion trap-time-of-flight or IT-TOF mass spectrometer) for the improved analysis of proteins. Such a mass spectrometer combines the very good fragmentation power of the ion trap mass analyzer with the excellent resolution and accuracy of the time-of-flight mass analyzer. This type of instrument satisfies an unfilled need for improved methods in analyses of protein structure properties, including protein abundance analyses, post-translational modification studies, and quaternary structure analyses.

Part II: THE PROTEOME POST-TRANSLATIONAL MODIFICATION STATUS PROFILING SYSTEM

In one embodiment, the methodology used for profiling a proteome post-translational modification (PTM) status is illustrated in FIG. 7. First, the samples are processed by the appropriate proteome preparation procedure and then are subjected to the protein cleavage procedure. The peptide products are analyzed by LC-MS/MS-based protein PTM quantitation procedures to determine the amounts of individual PTMs in the proteome. The proteome PTM information is gathered first to facilitate the profiling procedure. Therefore, it is preferred that the protein components in the proteomes are separated using appropriate proteome resolution procedures. The separated proteins are then subjected to modified peptide identification and subsequent modified residue identification procedures. Combining these results, the proteome PTM information is derived.

A. The proteome preparation procedures-In one embodiment, these procedures can be identical with those used for proteome protein abundance profiling.

For certain cases, we may gather significant clinical information by analyzing only the PTM statuses of a particular protein. Such proteins can be isolated using techniques such as SDS-polyacrylamide gel electrophoresis, two-dimensional gel electrophoresis and immunoaffinity purification.

B. The protein cleavage procedures-The procedures are identical with those used for proteome protein abundance profiling.

C. The protein PTM quantitation procedures-In a preferred embodiment, a mass spectrometric method is employed to quantitatively determine post-translational modifications on the plasma proteome. This method is designed based on analyses of specific ion chromatograms (SIC). The current disclosure features the use of the unmodified peptide as the reference in quantitative analysis.

The specific ion chromatograms are used for calculating the ion counts for modified and unmodified peptides. Summation of the ion counts of peptide ions with different charges estimates the level of the modified peptide, which is proportional to $\Sigma M = M_1 + M_2 + M_3 + \dots + M_n$, where $n$ is the charge state. The abundance of the reference peptide is indicated by $\Sigma U$, which is equal to $U_1 + U_2 + U_3 + \dots + U_n$, where $n$ is the charge state.

The ratio $\Sigma M / (\Sigma M + \Sigma U)$ represents the modification status of this peptide region. If the modification site is mapped, this ratio also reflects the modification status at the specific amino acid residue (FIG. 8).
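As an illustration of the ratio described above, the following Python sketch sums ion counts over charge states for a modified peptide and for its unmodified reference, then computes the modified fraction. The function name and the ion counts are hypothetical; the counts are assumed to come from specific ion chromatograms that have already been integrated.

```python
def modification_status(modified_counts, unmodified_counts):
    """Fraction of a peptide region that carries the modification.

    Both arguments map charge state -> ion count taken from the
    specific ion chromatogram of that peptide ion.
    """
    modified_total = sum(modified_counts.values())
    unmodified_total = sum(unmodified_counts.values())
    total = modified_total + unmodified_total
    if total <= 0:
        raise ValueError("no signal for either peptide form")
    return modified_total / total


# Hypothetical ion counts keyed by charge state.
modified = {2: 150.0, 3: 50.0}
unmodified = {2: 500.0, 3: 100.0}

status = modification_status(modified, unmodified)
print(status)
```

The result is bounded between 0 (no modified peptide detected) and 1 (no unmodified peptide detected), which makes values comparable across peptides and samples.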

In another preferred embodiment, the above-mentioned synthetic peptide mass map method is of great use in determining the PTM status at specific amino acid residues. As the peak heights in the final synthetic peptide mass map are a true reflection of the peptide abundance, the protein PTM status can be defined by the ratio $I_M / (I_M + I_U)$, where $I_M$ and $I_U$ are the intensities of the modified and unmodified peptides in the synthetic peptide mass map, respectively (FIG. 4).

Meanwhile, tandem mass spectra of these peptides are acquired to verify their identities. Computer programs such as Sequest (ThermoFinnigan) or ProID (Applied Biosystems) are used for interpretation of these data. This allows unambiguous identity verification of each peptide. Another parameter for verifying a peptide is the elution time in a chromatographic run. Under the same HPLC conditions, the elution time is usually very consistent. While elution time changes upon adjustment of an HPLC system, the tandem mass spectrum is a parameter independent of the HPLC conditions.

The current invention primarily uses LC-MS/MS as the protein PTM quantitation procedure. For some embodiments, other quantitative mass spectrometric methods, such as MALDI-TOF or MALDI-Q-TOF analysis, may be helpful.

D. The acquisition of proteome PTM information-Before protein PTM quantitation procedure is performed, it is preferred that information is acquired on the proteome PTMs such that the set of peptide pairs in the analysis can be properly selected. This information arises from several different sources, like (1) literature search and (2) mass spectrometric analysis. If mass spectrometric analysis is used to gather the PTM information, there are four experimental steps involved: (1) proteome resolution procedures (2) protein cleavage procedures (3) modified peptide identification procedures and (4) modified residue identification procedures.

(1) The proteome resolution procedures-These procedures separate proteins based on their charges, sizes and other chemical properties (e.g. hydrophobicities).

They are used to acquire protein samples of simpler compositions. The techniques used in the examples are mainly electrophoretic methods, including one-dimensional SDS-polyacrylamide gel electrophoresis and two-dimensional gel electrophoresis consisting of isoelectric focusing and SDS-polyacrylamide electrophoresis. Methods that exclusively use chromatographic techniques or that combine chromatographic techniques and electrophoretic techniques are also helpful.

(2) The protein cleavage procedures-They cleave proteins into peptides/polypeptides. The protein cleavage procedures include chemical and enzymatic methods. Among these, digestion with site-specific endopeptidases has been proven useful. A preferred method uses a combination of different endopeptidases, including trypsin, Lys-C, Glu-C and Asp-N. The choice of digestive enzymes is primarily made based on the amino acid content and sequence of the tested plasma proteins. One advantage of this method is that it ensures a higher protein coverage percentage, which is essential for a successful protein modification analysis. A second advantage is that it facilitates the generation of adequate modified-unmodified peptide pairs that are used for identification and quantitation analysis.

(3) The modified peptide identification procedures-The preferred techniques to analyze the protein PTM structure are mass spectrometric techniques. In one preferred embodiment, the system is primarily liquid chromatography-tandem mass spectrometry. Liquid chromatography provides several advantages in this system: (1) a chromatographic column may concentrate the targeted peptides/polypeptides while removing unwanted contaminants, which greatly enhances the detection sensitivity of the mass spectrometric method; (2) HPLC enables temporal separation of peptides in a complex mixture and thus automatic acquisition of their structural information; and (3) peptides/polypeptides can be characterized for their elution properties, which are instrumental in identification of modified peptides in subsequent protein PTM profiling procedures.

In one preferred embodiment, the current invention provides a new concept for identifying PTMs in the plasma proteome. The identification procedure is based on two basic postulates. First, every modified peptide must have an unmodified peptide of equal or similar length as the reference peptide. This can be expected if modification on a protein does not severely interfere with the cleavage by endopeptidases (FIG. 9). Usually, these two peptides have exactly the same residue number. For certain peptides, we have observed that the modified peptides may be one residue shorter or longer than the unmodified counterpart. Secondly, the peptide pair consisting of the modified and unmodified peptides has very similar retention times (RT) in an LC-MS run. When the modification group is not very bulky, its conjugation does not drastically alter the elution properties of peptides. Therefore, there is usually a small difference of retention time (ΔRT) between the modified and unmodified peptides (FIG. 9). This property enables us to track the elution of the modified peptide by referring to its unmodified counterpart.

The SIC analysis examines the specific ion chromatograms of the hypothetical modified species in reference to the unmodified peptides, which may uncover putative modified peptides. Usually, the modified and unmodified peptides are closely eluted from the reverse-phase column due to comparable hydrophobicity. Therefore, the retention time of a modified peptide should be approximately equal to that of the unmodified form. The m/z values for hypothetical modified peptides are calculated based on the first-step experiment, and they are then used to plot the specific ion chromatograms for a peptide. Proprietary software has been developed to calculate these m/z values and also to graph specific ion chromatograms. When a peak is observed on the specific ion chromatograms for the hypothetical modified peptide, its retention time difference from the reference peptide, ΔRT, is used to determine whether it is a signal from the modified peptide. If |ΔRT| is large (e.g. > 10 min), the peak is not likely related to the unmodified peptide. If |ΔRT| is small (e.g. < 10 min), the peak likely represents the signal from the modified peptide (FIG. 10).
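The screening logic above can be illustrated with a short Python sketch. It is not the proprietary software mentioned in the text. The function names, the example peptide mass and the retention times are hypothetical, and phosphorylation (monoisotopic shift of about 79.96633 Da) is used only as an example modification. The 10 min cutoff is the example value given above.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton


def mz(neutral_mass, charge):
    """m/z of a peptide ion carrying `charge` protons."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def hypothetical_modified_mz(unmodified_mass, mass_shift, charges=(1, 2, 3)):
    """m/z values to extract for the hypothetical modified peptide."""
    return {z: mz(unmodified_mass + mass_shift, z) for z in charges}


def is_candidate_modified_peak(peak_rt, reference_rt, max_abs_delta_rt=10.0):
    """True when |delta RT| to the unmodified reference is below the cutoff."""
    return abs(peak_rt - reference_rt) < max_abs_delta_rt


PHOSPHO_SHIFT = 79.96633  # Da, example modification only

# Hypothetical unmodified peptide of neutral mass 1500.0 Da eluting at 30.0 min.
targets = hypothetical_modified_mz(1500.0, PHOSPHO_SHIFT)
near = is_candidate_modified_peak(32.5, 30.0)
far = is_candidate_modified_peak(55.0, 30.0)
print(targets, near, far)
```

A peak that passes this filter is only a putative modified peptide; as described in the text, its identity still has to be verified by tandem mass spectrometry.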

In another preferred embodiment, the synthetic peptide mass map method provides significant data to allow the discovery of new peptides bearing modifications.

Through comparing the synthetic mass maps of the proteins from different samples, we can identify the sample-specific peptides which are highly likely caused by differential modification mechanisms (FIG 4).

In some embodiments, mass spectrometric techniques, like neutral loss scan or precursor ion scan, are also useful to identify the modified peptides. Other non-MS techniques, such as affinity purification with or without coupling of chemical modifications are also embodiments of the invention.

(4) The modified residue identification procedures-In order to further characterize the structure of the putative modified peptides, mass-specific LC-MS/MS analyses are performed on these putative modified peptides. The tandem mass spectra not only enable us to examine the peptide identities but also to map the exact modified residue within the sequence. The acquired CID spectra are initially interpreted by programs like SEQUEST using an adequate mass tag at specific residues and are finally confirmed by our proprietary computer programs. The modified amino acid residues can thus be pinpointed by examining the mass spectrum of the fragment ions. These results are gathered as part of the proteome PTM information.

Part III: THE PROTEOME QUATERNARY STRUCTURE PROFILING SYSTEM

In one embodiment, the proteome quaternary structure profiles can be characterized with the scheme illustrated in FIG. 11. First, the entire proteome is divided by fractionation procedures. These fractions are then analyzed for their primary structure profiles, including both the protein abundance and protein PTM status profiles. The primary structure profiles are reconstructed to generate the distribution silhouettes of proteins (also called specific protein distribution silhouettes).

Through examining these distribution silhouettes, quaternary structure grouping procedures are used to discern the presence of any quaternary structures as well as to identify the components within a quaternary structure. Meanwhile, quaternary structure quantitation procedures determine the abundances of a quaternary structure by examining the areas of corresponding peaks in the specific protein distribution silhouettes.

A. Fractionation procedures-Techniques such as ultracentrifugation sedimentation analysis, native gel electrophoresis, or conventional chromatographic methods are preferred. In principle, it is preferred that the fractionation step avoid unwanted assembly or disassembly of any quaternary structures during the analysis.

All of the protein quaternary structures are separated based on their unique physicochemical properties. Among various chromatographic techniques, the gel filtration analysis is preferred for its minimal effects on the protein quaternary structures as well as for identification of the apparent molecular weights based on the elution positions. For fractions derived from chromatographic techniques, the subproteomic fractions are also termed chromatographic or eluted fractions. One important feature of the current invention is not only to determine the peak positions but also to discern the exact shapes of the peaks in specific protein distribution silhouettes. In one embodiment, we analyze their peak shapes to distinguish one quaternary structure from the others. Therefore, it is essential to collect multiple fractions across one elution peak for a quaternary structure analysis. In other words, a peak shape in specific protein distribution silhouettes can be correctly revealed only when this peak is dissected into multiple fractions.

B. Primary structure profiling procedures-In one embodiment, the techniques used are identical with those for proteome primary structure profiling, with the information on proteome composition and PTM acquired beforehand. Briefly, the proteins in subproteomic fractions are cleaved with site-specific enzymes or chemical methods. The protein abundance index and the PTM status of each protein are determined using LC-MS/MS-based profiling procedures.

C. Data reconstruction procedures-These procedures reconstruct the primary structure profiles of all fractions by sorting out the protein abundance indexes in all the fractions for every protein. It is noteworthy that the primary structure profiling procedure in the current invention determines the absolute amounts in a sample. Therefore, in one embodiment, we use all the protein abundance indexes to graph the specific protein distribution silhouettes for every single protein. Thus we acquire the distribution/elution silhouette for every single protein. In addition, the absolute amounts of modified peptides can also be determined to examine whether these PTMs, as judged by their distribution/elution silhouettes, are selectively present in particular quaternary structures.
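A minimal Python sketch of this reconstruction step follows. It assumes that each fraction has already been profiled into a table of protein abundance indexes, and it treats a protein that was not detected in a fraction as having an index of zero. Both the function name and the data are hypothetical.

```python
def build_silhouettes(fraction_profiles):
    """Reorganize per-fraction PAI tables into one PAI series per protein.

    `fraction_profiles` is a list ordered by elution, where each item maps
    protein name -> protein abundance index in that fraction.
    """
    proteins = sorted({name for profile in fraction_profiles for name in profile})
    return {
        name: [profile.get(name, 0.0) for profile in fraction_profiles]
        for name in proteins
    }


# Hypothetical PAIs for three consecutive fractions.
fractions = [
    {"protein_A": 0.1, "protein_B": 0.0},
    {"protein_A": 0.8, "protein_B": 0.7},
    {"protein_A": 0.2},
]

silhouettes = build_silhouettes(fractions)
print(silhouettes)
```

Each resulting series can be plotted against the fraction number to give the distribution/elution silhouette of that protein.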

D. Quaternary structure grouping procedures-These procedures locate the peaks on the specific protein distribution silhouettes. In one embodiment, we group proteins with similar peak shapes together, and they are designated as the components of a particular quaternary structure. When multiple proteins have similar distribution/elution peaks, these proteins define a hetero-multimeric quaternary structure. It is envisioned that certain proteins do not share peak shapes with any other proteins because they form homo-multimeric quaternary structures. It is also envisioned that multiple peaks may exist for a particular protein, which means this protein is present in multiple types of quaternary structures. Thus the quaternary structure species and their compositions are defined by these grouping procedures. In one embodiment, the distribution/elution peaks of all known PTM structures are also grouped with the peaks of the quaternary structures. This should define the PTM composition in the particular quaternary structures.
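One way to make "similar peak shapes" concrete is to compare silhouettes by Pearson correlation and to group proteins whose correlation exceeds a threshold. The text does not specify a similarity measure, so the correlation metric, the greedy grouping and the 0.95 threshold in the following Python sketch are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long series (0.0 if either is flat)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def group_by_peak_shape(silhouettes, min_correlation=0.95):
    """Greedily group proteins whose silhouettes match a group's first member."""
    groups = []
    for name, profile in silhouettes.items():
        for group in groups:
            if pearson(profile, silhouettes[group[0]]) >= min_correlation:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


# Hypothetical silhouettes over six fractions.
silhouettes = {
    "protein_A": [0.0, 1.0, 4.0, 1.0, 0.0, 0.0],
    "protein_B": [0.0, 2.0, 8.0, 2.0, 0.0, 0.0],
    "protein_C": [0.0, 0.0, 0.0, 1.0, 5.0, 1.0],
}

groups = group_by_peak_shape(silhouettes)
print(groups)
```

In this toy data, protein_A and protein_B co-elute with the same shape and are grouped as one candidate hetero-multimeric structure, while protein_C forms its own group. A protein with several peaks would first need its silhouette split into individual peaks before such a comparison.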

E. Quaternary structure quantitation procedures-In one embodiment, these procedures determine the areas of corresponding peaks discernible in the specific protein distribution silhouettes of individual proteins. The abundances of these quaternary structures are in proportion to the peak areas of any components in the quaternary structure. Therefore, the levels of all quaternary structures in the entire proteome can be formulated. For a protein with multiple peaks in its distribution/elution silhouettes, the ratio between areas of different peaks reflects how this protein distributes over different quaternary structures.
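The peak-area comparison described above can be sketched as follows. The Python example integrates a silhouette with the trapezoidal rule, which is an assumed choice since the text does not name an integration method, and it assumes equally spaced fractions and peak boundaries that have already been located. The function names, labels and values are hypothetical.

```python
def peak_area(profile, start, stop):
    """Trapezoidal area of profile[start..stop], with unit fraction spacing."""
    segment = profile[start:stop + 1]
    return sum((a + b) / 2.0 for a, b in zip(segment, segment[1:]))


def distribution_over_structures(profile, peak_ranges):
    """Share of one protein found in each quaternary structure peak."""
    areas = {
        label: peak_area(profile, lo, hi)
        for label, (lo, hi) in peak_ranges.items()
    }
    total = sum(areas.values())
    if total <= 0:
        raise ValueError("no signal inside the given peak ranges")
    return {label: area / total for label, area in areas.items()}


# Hypothetical silhouette with two peaks (fractions 0-4 and 4-8).
profile = [0.0, 2.0, 6.0, 2.0, 0.0, 1.0, 3.0, 1.0, 0.0]
ranges = {"complex_1": (0, 4), "complex_2": (4, 8)}

shares = distribution_over_structures(profile, ranges)
print(shares)
```

Here two thirds of the protein is assigned to the first structure and one third to the second, mirroring the ratio between the two peak areas.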

F. Two-conduit fraction collection-During the development of the needed methodologies, it is preferred to collect many fractions across one single elution peak of a particular quaternary structure. However, we have observed that the design of the present fraction collector lacks such capacity to collect the eluted components without significant sample loss. Therefore, we propose a two-conduit design immediately upstream of the flow exit in the current invention in one embodiment, as shown in Figs. 12A and 12B, to prevent accidental fluid drop that often occurs during movement of the fluid flow exit. Specifically, eluted fluid is collected in a first sample conduit during a first time period, while fluid previously collected in a second sample conduit is flushed out of that conduit and into a collection area (e. g. , tube, flask, and the like). In a preferred embodiment of the invention, pressurized air may be used to flush the fluid collected in a sample conduit into a collection area via an outlet needle (for example). In addition, the flushing of the fluid into a collection tube can become independent of the steady flow from the column. Thus, faster and better timed elution is feasible.

Accordingly, as shown in Figs. 12A and 12B, a switch valve 1202 having a plurality of ports (1-8) may be provided. Such switches are commercially available and include, for example, a selector valve (part number 70001230) from Waters Corp.

The switch valve may be used with an actuation device (not shown), which may include an electronic control/timer (either digital or analog) for controlling the positioning of the switch. The switch allows for fluid connection of one or more pairs 1214 of ports, preferably multiple pairs of ports (e.g., paired ports 1214a-1214d). Thus, when the switch valve is placed in a first position, for example, as shown in Fig. 12A, port 1 and port 2 may be connected, port 3 and port 4 may be connected, port 5 and port 6 may be connected and port 7 and port 8 may be connected. In contrast, as shown in Fig. 12B, when the switch valve is in a second position, port 2 and port 3 may be connected, port 4 and port 5 may be connected, port 6 and port 7 may be connected and port 8 and port 1 may be connected.

When the switch valve is in the first position, sample conduit 1206 is collecting the fluid fraction from fluid flow 1208. The other end of conduit 1206 may be in communication with port 3, which may be connected to a pressure source or an overflow area (for example). Preferably, connection of the sample conduit 1206 to the fluid flow is done over a predetermined period of time. While the switch valve is in the first position, conduit 1204 may have an inlet connected to port 8, which may be in communication with a pressure source (e.g., controlled pressurized air flow). An outlet of conduit 1204 may be in communication with outlet needle 1212 via connected ports 5 and 6. The outlet needle 1212 may direct fluid to particular fraction tubes (for example, odd-number fractions). In that way, during the period of time that conduit 1204 is connected to the pressure source, pressurized air forces fluid contained in conduit 1204 out of outlet needle 1212.

Similarly, in Fig. 12B, when the switch valve is in a second position, conduit 1204 is now connected to the fluid flow 1208 (and may also be connected to a pressure source or to an overflow area) during a period of time (which may be the same as or different from the period of time that conduit 1206 was connected to the fluid flow).

Conduit 1206 correspondingly may then be connected to the pressure source and the outlet needle 1212, so that the fluid contained therein may be directed into a fraction collection tube (e. g. , even number fractions).
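The alternating operation of the two conduits can be modeled with a small Python sketch. It encodes only the port pairings and conduit roles stated above for the two valve positions; the function names are hypothetical, and equal-length collection intervals that simply alternate between the two positions are an assumption.

```python
PORTS = 8


def connected_pairs(position):
    """Port pairs joined by the switch valve in position 1 or 2."""
    if position not in (1, 2):
        raise ValueError("the valve has two positions: 1 and 2")
    start = 1 if position == 1 else 2
    return [(p, p % PORTS + 1) for p in range(start, start + PORTS, 2)]


def collection_schedule(n_intervals):
    """Which conduit fills and which is flushed in each collection interval."""
    schedule = []
    for i in range(n_intervals):
        position = 1 if i % 2 == 0 else 2
        # Position 1: conduit 1206 collects while conduit 1204 is flushed.
        # Position 2: the roles are exchanged.
        filling, flushing = ("1206", "1204") if position == 1 else ("1204", "1206")
        schedule.append(
            {"interval": i + 1, "position": position,
             "filling": filling, "flushing": flushing}
        )
    return schedule


print(connected_pairs(1))
print(connected_pairs(2))
for step in collection_schedule(4):
    print(step)
```

Because one conduit is always connected to the column flow while the other is being emptied by pressurized air, flushing does not interrupt collection, which is the point of the two-conduit design.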

G. The proteome primary-quaternary structure relationship concept (FIG. 13)-As described above, the quaternary structure level is proportional to the corresponding peak area in the specific protein distribution silhouettes of any protein components (FIG. 11). Briefly, the entire proteome is divided by fractionation procedures. These fractions are then analyzed for their primary structure profiles, including both the protein abundance and protein PTM status profiles. The primary structure profiles are reconstructed to generate the distribution silhouettes. Through examining these distribution silhouettes, quaternary structure grouping procedures are used to discern the presence of any quaternary structures as well as to identify the components within a quaternary structure. Meanwhile, quaternary structure quantitation procedures determine the abundances of a quaternary structure by examining the areas of any peaks in the specific protein distribution silhouettes.

The quaternary structure level can be calculated if the conversion factor between this quaternary structure and a particular protein is derived. In one embodiment, through the quaternary structure profiling procedures disclosed in the current invention, these conversion factors can be measured empirically.

Conversely, the abundance of any protein in a proteome is the sum of its amounts in all quaternary structures. An equation describes such a relationship: $P_m = \sum_{n=1}^{N} K_{m,n} \, QS_n$, where $m = 1$ to $M$ and $n = 1$ to $N$. $K_{m,n}$ is the conversion factor between protein $P_m$ and quaternary structure $QS_n$. $M$ and $N$ are the total numbers of protein species and quaternary structures, respectively, in the entire proteome. When a protein is not present in a particular quaternary structure, the $K$ value is zero. In contrast, when a protein is present in a quaternary structure, the $K$ value is a positive constant. As disclosed in FIG. 13, all of the protein abundances can be reconstructed using such equations.

In one embodiment, we calculate the quaternary structure profiles from the protein abundance profiles on the premise that the number of proteins (M) is equal to or larger than the number of quaternary structures (N). Under these conditions, one and only one proteome quaternary structure profile can be calculated using the determined proteome protein abundance profile. As proteome protein abundance profiling is a much faster analysis, this approach becomes a fast quaternary structure profiling technique. With our proteome structure profiling procedures, we should be able to determine how N and M are related to each other.
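To illustrate how a quaternary structure profile might be calculated from a protein abundance profile, the following Python sketch solves the relationship between protein abundances, conversion factors and quaternary structure levels by least squares (normal equations with exact fractions). The least-squares approach is an assumed choice, not a method stated in the text. It further assumes that the conversion factors are already known and that the columns of the conversion factor matrix are linearly independent, which is needed for a unique solution in addition to M being at least N. All names and numbers are hypothetical.

```python
from fractions import Fraction


def solve_linear(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    aug = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("conversion factors do not give a unique profile")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [rv - factor * cv for rv, cv in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def quaternary_profile(k, p):
    """Least-squares levels q with p ~ K q, via (K^T K) q = K^T p."""
    m, n = len(k), len(k[0])
    ktk = [
        [sum(Fraction(k[r][i]) * Fraction(k[r][j]) for r in range(m))
         for j in range(n)]
        for i in range(n)
    ]
    ktp = [sum(Fraction(k[r][i]) * Fraction(p[r]) for r in range(m))
           for i in range(n)]
    return solve_linear(ktk, ktp)


# Hypothetical conversion factors: M = 3 proteins (rows), N = 2 structures (columns).
K = [[2, 0],
     [1, 1],
     [0, 3]]
P = [10, 9, 12]  # hypothetical protein abundances

QS = quaternary_profile(K, P)
print(QS)
```

In this toy case the protein abundances are exactly consistent with quaternary structure levels of 5 and 4. With measured data the fit would generally leave a residual, which could serve as a check on the assumed conversion factors.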

In another envisioned scenario, M is indeed larger than N, but the number of free proteins is the primary reason for a larger M. In another embodiment, we can derive the protein quaternary structure profiles by removing most of the free proteins. A feasible scheme is illustrated in FIG. 14. The proteome with M>N is processed by techniques like gel filtration or dialysis so as to acquire one single subproteomic fraction with N>M. For example, a gel filtration column with a low exclusion limit in the 100-150 kDa range should suit this purpose well. For these GFC columns, the fractions collected at the void volume should contain mostly multimeric protein quaternary structures. For these samples, the quaternary structure profiles can be calculated using the protein abundance profile of the fraction.

PART IV: THE ESTABLISHMENT OF PROTEOME STRUCTURE PROFILE DATABASES

With this system, we may couple the protein structure profiles with clinical parameters at various health conditions to establish the proteome structure profile databases. First, the samples are collected from individuals or animals at various physiological and pathological conditions. These samples are analyzed for their proteome structure profiles using the procedures described above. The clinical parameters of these individuals or animals are also determined and recorded. These parameters and the proteome structure profiles are combined to establish the proteome structure profile database (FIG. 15).

A. Samples collected from individuals at various conditions - In humans and other organisms, there are usually multiple factors affecting the protein structures in a proteome. Sometimes, well-controlled experiments can be used for direct detection of proteomic changes at different health conditions in other organisms. However, for analyses using specimens from certain organisms (e.g. humans), it is very difficult to carry out experiments with immaculate controls. In one embodiment, in order to investigate how specific health conditions may be linked with any proteome structures, it is preferred to carry out a large-scale survey of proteome samples from the general population.

The conditions that are envisioned are all the possible health problems encountered, including intoxications (e.g. alcoholism), medical therapies (e.g. chemotherapy), allergic reactions, viral infections (e.g. HBV, HCV), bacterial infections, parasitic infections, prion diseases, metabolic diseases (e.g. diabetes mellitus), cardiovascular diseases (e.g. acute myocardial infarction), renal diseases (e.g. chronic renal failure), degenerative diseases, neoplastic diseases (e.g. hepatocellular carcinoma) and other categories of diseases. Using the methods of the invention, the proteome changes at various physiological conditions such as pregnancy, aging, gender difference and so on may also be included.

B. Proteome structure profiling procedures - All the proteome structure profiling procedures can be applied to the samples, including the protein abundance profiling system, the proteome PTM status profiling system and the proteome quaternary structure profiling system. Standard procedures are formulated for assaying these structure profiles.

C. Acquisition of the clinical parameters - The clinical information of the tested humans or other organisms is collected and organized. The health conditions at the time when the proteome is sampled are particularly important. Clinical data that can be collected include the results of serological, biochemical, pathological and other medical tests. Some more general features are also recorded, including gender, age, food and drink habits, hobbies, exercise and other related factors.

D. Organization of the proteome structure profile database - In one embodiment, the structure profiles, consisting of protein abundance, PTM status and quaternary structure profiles, are integrated with the clinical parameters acquired. Algorithms are developed to statistically correlate the proteome structure profiles with the clinical parameters. A search engine is set up to match any proteome structure profile against a very large database. The clinical parameters associated with the matched proteome structure profiles are statistically evaluated to indicate the most likely clinical situations.
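By way of a non-limiting illustration, the matching step can be sketched as a similarity search followed by a tally of the clinical conditions attached to the closest stored profiles. Cosine similarity and the top-k vote are assumptions chosen for simplicity; the disclosure does not specify the matching algorithm. The record layout, feature names and values are likewise hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two profiles stored as {feature: value} dicts."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in keys)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def likely_conditions(query, database, top_k=3):
    """Rank stored records by similarity to the query profile and return
    (condition, fraction of the top matches) pairs, most frequent first."""
    ranked = sorted(database,
                    key=lambda rec: cosine_similarity(query, rec["profile"]),
                    reverse=True)
    top = ranked[:top_k]
    if not top:
        return []
    votes = Counter(rec["condition"] for rec in top)
    return [(cond, count / len(top)) for cond, count in votes.most_common()]


# Hypothetical database: each record pairs a structure profile with a clinical label.
database = [
    {"profile": {"feature_1": 1.0, "feature_2": 5.0}, "condition": "chronic renal failure"},
    {"profile": {"feature_1": 1.2, "feature_2": 4.5}, "condition": "chronic renal failure"},
    {"profile": {"feature_1": 5.0, "feature_2": 1.0}, "condition": "healthy"},
    {"profile": {"feature_1": 4.8, "feature_2": 0.9}, "condition": "healthy"},
]
query = {"feature_1": 1.1, "feature_2": 4.8}
matches = likely_conditions(query, database, top_k=3)
```

In practice a profile would combine abundance, PTM status and quaternary structure features, and the vote would be replaced by a proper statistical evaluation of the associated clinical parameters.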

Part Uses Of The Methodologies, Information And Materials Related To The Proteome Structure Profiling Procedures And Proteome Structure Profile Database

The current invention first discloses the procedures for analyzing the structures of protein/protein complexes as well as analyzing proteome structures collectively in a proteome such as plasma. The disclosed procedures include those for analyzing protein/proteome protein abundances, post-translational modifications and quaternary structures. It also discloses the information regarding protein primary and quaternary structure profiles in proteomes such as plasma, urine and blood cells. The proteome primary and quaternary structure information is also organized into clinical databases that correlate disease conditions with the proteome structure profiles. It also discloses how peptides, proteins, antibodies and other molecules can be designed with the above information for research, diagnostic and therapeutic uses.

I. The uses of related proteome structure analysis methodologies

It is always a major goal to understand the functions when we study a complicated proteome like plasma (Kenon et al., 2002). Among the different protein structure levels, quaternary structures are considered to have a direct association with protein functions. Therefore, the invention provides new proteome quaternary structure profiling procedures for studying quaternary structures and thus functions in a proteome. As quaternary structures are frequently modulated by post-translational modifications, the invention also incorporates post-translational modification status profiling procedures such that the mechanisms underlying quaternary structure changes can be explored. These proteome structure analysis methodologies are useful in several important aspects:

(1) Identifying and quantitating protein primary and quaternary structures in the proteomes from samples such as plasma, urine, and blood cells. As mentioned above, the information collected has been used for describing proteome primary and quaternary structure profiles in the clinical databases. The application of proteome PTM profiling procedures is exemplified in the EXAMPLES section.

(2) Determining the protein primary and quaternary structure profiles of related purified plasma products used for therapeutic uses. Huge amounts of plasma proteins have been prepared for therapeutic uses. However, the current understanding of how the protein primary and quaternary structures may impact the efficacy and safety of these therapeutic products is lacking. The methods of the invention address these issues. Certain PTMs, like glycosylation, have been considered very important in plasma clearance and metabolism of plasma proteins. Since quaternary structures directly relate to protein functions, we also expect that such a property has a great impact on the pharmacological properties of these therapeutic products. In one embodiment, the technologies entailed in this invention are used to gather qualitative and quantitative information on their structures. Using the methods of the invention will enable better knowledge in developing the preparation as well as quality-control procedures (FIG. 16). For instance, many small proteins may have different apparent molecular weights if they do not assume the correct quaternary structures. Improper quaternary structures may greatly change their pharmacokinetic properties, as small molecules below the excretion limits may leak into renal excretion. Therefore, if the quaternary structures are not characterized and monitored, their exact therapeutic effects remain open to question.

(3) Identifying and quantitating primary and quaternary structures on recombinant protein products for research and clinical uses. There has been a dramatic increase in recombinant protein and antibody molecules for clinical uses in recent years. It is a general belief that PTMs such as glycosylation on these molecules have significant effects on their pharmacokinetic properties such as absorption, transport, stability and clearance. Similarly, for therapeutic protein molecules, the quaternary structures of these products should have considerable effects on their pharmacodynamic as well as pharmacokinetic properties. In one embodiment, these procedures are used to qualitatively and quantitatively determine the primary and quaternary structure profiles of these protein products. The methods of the invention will enable the user to delineate the effect of specific protein structures on the pharmacokinetic and pharmacodynamic properties (FIG. 17). This, in turn, will facilitate the design, preparation and assessment of these protein molecules. For example, therapeutic antibodies like Herceptin (Genentech) can be analyzed for these protein structure properties with the methods in the current invention.

(4) Identifying and quantitating the specific plasma protein structures that bind metabolites, wastes, toxins, and small non-protein drugs. In one embodiment, the disclosed systems are used to determine which proteins may participate in the binding of these compounds, since these binding proteins are responsible for the transport and excretion of these acting molecules. Therefore the disclosed invention will facilitate such physiological, toxicological and pharmacological research (FIG. 17). The pharmacological properties of therapeutic agents, such as those in the form of a small chemical, polysaccharide, RNA, DNA, protein, virus or cell, can be analyzed using the systems in the current invention.

In one embodiment, the method of the invention is used to characterize the primary and quaternary structure profiles of the binding protein. It is well known that one single plasma protein like albumin may bind a great variety of chemicals. One intriguing possibility is that only the molecule bearing particular primary or quaternary structure has the best affinity. The method of the invention will allow one to examine whether particular compounds may associate with a particular class of protein structures and provide knowledge about how small compounds are transported and excreted by the plasma.

(5) Identifying and quantitating binding partners for proteins, antibodies and other macromolecular structures. There are a limited number of methodologies that can be used for documenting protein-protein interactions. Moreover, their applications are frequently restricted to certain organisms and materials. Since no antibodies or tag proteins are required, the method of the invention provides a superior alternative to the conventional methods for studying this important property. For example, the method can identify the antigens for an antibody and vice versa. This will directly facilitate the development of research, diagnostic or therapeutic products.

In one embodiment, the methods of the invention may also help identify the binding partners for proteins or other macromolecules. For example, the methods enable the user to identify the transport protein in the circulation for foreign proteins of invading microorganisms such as viruses and bacteria. It also has applicability to the studies of portal proteins in cells or tissues. This will be instrumental in understanding many physiological and pathophysiological processes and in developing more efficient diagnostic and therapeutic measures.

(6) Identifying and quantitating primary and quaternary structure profiles of other proteomes. The current invention embodies the development of several new procedures to study protein structures, including the primary structures (post-translational modifications) and the quaternary structures. These techniques are very important for the studies of the proteomes from other body fluids of any organism, such as urine, saliva, and cerebrospinal fluids. These methodologies can also be used for analyses of the proteomes from particular pathways, organelles, cells, tissues and whole organisms from all species. For example, the blood cells can be studied using the same methodologies to determine the protein primary and quaternary structures within their proteomes.

Our procedures can provide a panoramic view of how proteomes may change in response to various stimuli. It is a general belief that the proteome of a tissue or a cell is frequently organized into macromolecular quaternary structures to exert its actions and that proteins may have different functions in different quaternary structures. The methods of the invention not only identify protein quaternary structures and their compositions but also determine the abundances of these macromolecular quaternary structures. Such perspectives should allow the study of how the proteome changes in response to various physiological and pathophysiological stimuli, as delineated by quaternary structure species and abundances.

(7) Identifying the quaternary structures that are responsible for particular biochemical activities (e.g. enzyme activity) in the proteome. Since the quaternary structures are directly responsible for the protein activities, their amounts should directly correlate with the studied activities. In one embodiment, the methods of the invention allow the user to identify the quaternary structures that are responsible for specific biochemical activities and allow a better understanding of the dynamic changes of quaternary structures during evolving biochemical processes. For example, the proteins in the coagulation and complement pathways in plasma may be studied using the methods of the invention.

(8) Comparing the quaternary structural profiles among different cells and tissues to identify the quaternary structures central to particular physiological processes (function-specific quaternary structures). Many biological processes have been known to be regulated at the epigenetic levels. However, there is little information about how assembly or disassembly of quaternary structures may participate in regulation of cellular functions. In one embodiment, through carrying out the comprehensive survey of the quaternary structure profiles, we can look for the quaternary structures that are differentially assembled or disassembled during various processes. Our quaternary structure analysis system undoubtedly will shed light on many important but unclear questions. For example, we can compare how stem cells and differentiated cells are different in terms of their quaternary structure profiles.

We can examine how young and aged cells are different at the quaternary structure level. Through getting this kind of knowledge, we may have a chance to devise specific measures to modulate these processes.

(9) The method of the invention may be used to compare the quaternary structural profiles between pathologic and normal specimens to identify the quaternary structures that are central to the pathogenesis of diseases (i.e. pathogenic quaternary structures) (FIG. 18). These quaternary structures are those that arise during the disease progression and have a direct impact on the disease progression. For example, from its entry to the establishment of the infection, a microorganism such as a virus or bacterium inevitably has to interact with the host proteomes. The quaternary structures formed between viral/bacterial and host proteins are frequently essential for their transport, entry into cells, as well as transformation of host cells. All these quaternary structures are directly involved in the establishment of an infectious disease. Comparing the proteomes at different stages of infection may help us identify the quaternary structures essential for disease progression.

Moreover, many diseases still have no definite etiology, particularly those with delayed onsets. Some of these diseases, like many chronic diseases, are apparently affected by multiple factors. It is an interesting but important possibility that all these factors have an accumulating and progressive effect on the formation of disease-causing quaternary structures. In other words, these quaternary structures can be the only causes of these diseases. Their assembly signifies the establishment of a particular disease, an advanced stage at which the disease becomes established and also irreversible. It is expected that they will be excellent targets for new diagnostic and therapeutic measures. The methods in the current invention should be instrumental in the discovery of these pathogenic quaternary structures. This knowledge will also greatly enhance our understanding of the underlying mechanisms of many complicated diseases, e.g. chronic infections, cancers, hypertension, and diabetes mellitus, as well as facilitate the development of more effective and specific therapeutic measures by targeting the specific pathogenic structures.

In one embodiment, the disclosed systems are used to discover and characterize the function-specific or pathogenic quaternary structures as well as to realize these concepts.

II. The uses of the information contained in proteome structure profile databases

In addition to systematically characterizing proteome structural information, including primary and quaternary structures, one signature feature is the further composition of the proteome structure profiles as well as the correlation of these profiles with clinical conditions.

The information entailed in these databases is very useful in several aspects:

(1) Monitoring the body conditions and diagnosing animal diseases through examining proteome structure profiles. The quaternary structure is dictated by its primary structure, which is defined as the amino acid sequence plus post-translational modifications. Because of its reversible nature, post-translational modification is considered the major mechanism that changes proteome quaternary structures as well as their associated functions. The methods of the invention may be used to identify many post-translational modifications that act as major molecular switches governing the assembly and disassembly of proteome quaternary structures and the accompanying stimulation and suppression of proteome functions. Therefore, this invention enables research on the correlation between primary and quaternary structures and on how primary structures modulate proteome functions. This provides a novel and useful platform system for investigating protein structure-function relationships.

In one embodiment, the clinical databases that can be assembled using the methods of the invention will allow better understanding of how proteome structures are altered as proteome functions adjust in different physiological and pathophysiological conditions. Through direct comparison among proteome primary and quaternary structure profiles at different disease conditions, the disease-specific profiles can be identified.

(2) Evaluating the effects and toxicities of trial therapeutic measures. The primary challenge when screening drugs and therapeutic measures is not only in finding biologically relevant activities, but in identifying the active components of a drug or the mechanism of action of such drugs. Current analytical techniques are limited in their ability to adequately identify all the protein components that are affected by a drug. In one embodiment, the present invention is used to answer the need for a reliable method to rapidly assess all the interactions of one drug (or a combination of drugs) with a multitude of possible biological targets by analyzing the proteome's primary and quaternary structures.

The proteome primary and quaternary structure profiles are identified in body fluid/tissue proteomes from individuals receiving therapies. These profiles will be matched with those in the clinical databases. The body responses, including therapeutic effects and toxicity, are revealed through such database searching. These proteome primary and quaternary structure profiles upon therapy can be incorporated into the clinical databases for monitoring agents that share similar therapeutic effects or toxicity in the future. Thus, the methods of the invention greatly enhance the development of more efficient and safer therapeutic measures (FIG. 16).

Not only does the present invention provide a method for determination of the primary structure of a drug, as part of the primary structure analysis, but it also provides valuable information about the interaction of a drug with other biomolecular targets through quaternary structure analysis. In one embodiment, such information permits the identification of compounds having particular biological activity and gives rise to useful drugs, veterinary drugs, agricultural chemicals, industrial chemicals, diagnostics and other useful compounds.

Moreover, in one embodiment, the methods of the invention may be used to analyze combinatorial libraries and mixtures of compounds. Different mixtures of compounds may be screened in a non-human subject and the proteome analyses may be stored in a computer database. The database, comprising data from multiple combinatorial screens, may be analyzed by deconvolution to determine target compounds that have the potential for the most significant effects. The resynthesis and combination of a pool of mixtures that has the most effect on a particular target protein may be performed and the compounds may be rescreened until an optimal compound, or optimal mixture of compounds, is derived. The screening procedure may have multiple criteria. For example, combinations of compounds may be screened for maximum effectiveness in suppressing autoimmunity (e.g., antibody production) and reduced liver toxicity (e.g., reduced liver enzymes).
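By way of a non-limiting illustration, one simple form of deconvolution scores each compound by contrasting the pools that contain it with the pools that do not. This contrast heuristic is an assumption chosen for brevity and is not stated in the disclosure; the compound labels and effect values are hypothetical.

```python
def deconvolve_pools(pools):
    """Score each compound from pooled screening results.

    pools is a list of (set_of_compound_names, measured_effect) pairs.
    The score of a compound is the mean effect of the pools containing it
    minus the mean effect of the pools lacking it. Returns (compound, score)
    pairs sorted from the highest score to the lowest.
    """
    compounds = set().union(*(members for members, _ in pools))
    scores = {}
    for compound in compounds:
        with_c = [effect for members, effect in pools if compound in members]
        without_c = [effect for members, effect in pools if compound not in members]
        if not with_c or not without_c:
            continue  # no contrast can be estimated for this compound
        scores[compound] = (sum(with_c) / len(with_c)
                            - sum(without_c) / len(without_c))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical screen: five pools of two compounds each, with one measured effect per pool.
pools = [
    ({"A", "B"}, 0.90),
    ({"A", "C"}, 0.80),
    ({"B", "C"}, 0.20),
    ({"C", "D"}, 0.10),
    ({"A", "D"}, 0.85),
]
ranking = deconvolve_pools(pools)
best_compound = ranking[0][0]
```

The top-ranked compounds would then be resynthesized, recombined and rescreened. Where several criteria apply, a separate score could be computed per criterion and the rankings combined.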

(3) Monitoring the efficacy and toxicity of approved therapeutic measures.

The proteome primary and quaternary structure profiles are examined in individuals receiving approved therapies. Because the proteome structures change as organ functions improve or deteriorate, we expect that proteome structure profile analysis should provide evidence of the improvement as well as the deterioration of organ functions. In one embodiment, the proteome structure profiles will be compared with those in the clinical databases to determine whether the therapy is effective and safe (FIG. 16).

(4) Monitoring the body conditions and diagnosing animal diseases through examining proteome structure profiles in other body fluids, particularly the urine. Many small proteins as well as degradation products of larger proteins are filtered into urine for their excretion. The methods of the invention enable us to see whether these proteins have the same or similar primary structure profiles as they have in plasma. If the similarity is established, it is reasonable to monitor the plasma proteome primary structure profiles through analyzing those in the urine samples. It is still very likely that their primary structure profiles are modified by the cells or tissues lining the urinary tracts. The modification activity is supposed to be a reflection of cell/tissue activities in the urinary system. Hence monitoring the primary structure profiles of the urine proteome should provide an additional advantage. If quaternary structures are not disrupted in the urine sample, they would also be analyzed.

(5) Developing and producing protein molecules with proper protein primary and quaternary structure profiles for research, diagnostic and therapeutic uses.

In one embodiment, with these databases, we have the needed information for designing new peptides/polypeptides bearing proper protein structures. Functional recombinant protein products need to be compared with the primary and quaternary structure profiles of the counterparts present in normal individuals. These reagents can also be used to generate specific recognition molecules, such as antibodies, for targeting these specific protein structures. These molecules will be of great research, diagnostic and therapeutic use (see below).

(6) Establishing the concept on the proteome primary-quaternary structure relationship (FIG. 13). One of the most striking suppositions in the current invention is that proteome quaternary structure profile can be calculated through determining the proteome protein abundance profiles. However, such a calculation procedure first requires the acquisition of the proteome quaternary structure information.

Particularly, we have to measure the factors that define how protein abundance and quaternary structure are converted into each other. Once these are completed, it becomes feasible to develop these fast quaternary structure profiling procedures.

(7) Establishing the concept that post-translational modifications in plasma proteomes play major roles in the modulation of the proteome quaternary structures.

In one embodiment, the accompanying procedures for characterizing proteome post-translational modifications should provide us extra information about how the plasma proteome quaternary structural changes are elicited. The proteome PTMs are considered as the molecular switches that induce the assembly and disassembly of proteome quaternary structures. PTM characterization of protein components in each plasma quaternary structure becomes a fairly important task. For proteins shared by different quaternary structures, the quantitative comparison of their PTM status profiles should help us define the possible keys that initiate the quaternary structural changes in the proteomes.

This concept should prompt us to study the mechanisms underlying the catalysis of these PTMs. Thus far we know little about where and how these PTMs are incorporated. For PTMs like phosphorylation and acetylation, the coupling of these groups usually requires the presence of high-energy intermediates (e.g. ATP for phosphorylation and acetyl-CoA for acetylation). It is also a general belief that these compounds are unstable in a milieu like plasma. It is not known where these groups are coupled. Most likely, the plasma is not where the modification reactions are carried out. Rather, as mentioned above, the classical plasma proteins are generated by organs like the liver and intestines. It is possible that these modifications are already present when the proteins are synthesized and secreted. Another intriguing possibility is that these PTMs may occur not only post-translationally but also "post-secretionally", meaning that these PTMs are coupled by active mechanisms outside the cells. Consistent with this notion, certain phosphorylation machineries have been observed at the extracellular side of the plasma membrane of cultured cells. If the cell surface is where the "post-secretional" modification proceeds, this would explain why the PTM status profiles of the plasma proteome may reflect the activities of different tissues and organs. How the cell surface post-secretional machinery is linked to cellular activities should become a focus of physiological and pathophysiological research.

(8) Establishing the concept that proteome structure profiles are altered under different physiological and pathophysiological conditions in animals. In one embodiment, this invention leads to the finding that plasma proteome structures, primary as well as quaternary ones, are altered at different health conditions. For PTM status profiles, it should be stressed that the modified residues are first identified exactly and then subjected to quantitative analysis. In other words, we emphasize the local modification states in addition to the overall modification statuses of the polypeptides.
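By way of a non-limiting illustration, the distinction between local and overall modification status can be sketched as follows. The sketch assumes that each identified residue has a measured signal for its modified and unmodified forms; the function names, residue labels and signal values are hypothetical and are not taken from the disclosure.

```python
def site_occupancy(site_signals):
    """Fraction modified at each residue.

    site_signals maps a residue label to (modified_signal, unmodified_signal).
    Residues with no signal at all map to None.
    """
    occupancy = {}
    for site, (modified, unmodified) in site_signals.items():
        total = modified + unmodified
        occupancy[site] = modified / total if total > 0 else None
    return occupancy


def overall_modification(site_signals):
    """Fraction modified pooled over all residues of the polypeptide."""
    modified = sum(m for m, _ in site_signals.values())
    total = sum(m + u for m, u in site_signals.values())
    return modified / total if total > 0 else None


# Two hypothetical samples with the same overall status but different local states.
sample_x = {"Lys-100": (30.0, 70.0), "Lys-200": (10.0, 90.0)}
sample_y = {"Lys-100": (10.0, 90.0), "Lys-200": (30.0, 70.0)}

overall_x = overall_modification(sample_x)
overall_y = overall_modification(sample_y)
local_x = site_occupancy(sample_x)
local_y = site_occupancy(sample_y)
```

Here the two samples are indistinguishable by their overall modification level, while the residue-level occupancies separate them, which is the reason for quantifying each identified residue individually.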

(9) Establishing the concept that, in one embodiment, proteome structure profiles can be used for early detection of diseases and subsequent assessment of their progression and therapeutic efficacy. Our results indicate that certain plasma proteome modifications are highly regulated processes. This may explain why different health conditions may be associated with particular proteome structural changes.

We hypothesize that the overall plasma proteome structures could be the products of interplays between pathologies and host defense mechanisms. For diseases like cancers and infectious diseases, the pathogenic agents may also take advantage of plasma protein modifications to shape a plasma milieu where the diseases can be continued.

In certain diseases, plasma proteins are modified by special metabolites produced by the diseases. For these diseases, plasma proteome structures may not have major roles in the pathogenesis, but they can indeed serve as marker molecules that correlate with the accumulation of metabolites. Since plasma proteins usually have half-lives of up to several weeks, the levels of protein post-translational modification should be indicative of the exposure time as well as the metabolite concentration. For example, we showed that carbamylation at Lys-549 is increased in patients whose blood urea nitrogen accumulates.

(10) Establishing the concept that, in one embodiment, proteome structure profile databases are important for exploring the biological functions of animal proteomes. Coupling the profiles with detailed clinical information such as serological, biochemical and other parameters provides an understanding of how plasma proteome structures, particularly the quaternary ones, change in response to various physiological and pathological stimuli. Since the quaternary structures directly impact the proteome functions, the current invention should provide good handles for studies of plasma proteome functions. For example, it should be very interesting to see how proteome functions may adjust to different pathogens like chemicals, viruses, and bacteria. Also, we can test how proteome functions may change when an animal is exposed to therapeutic agents such as drugs and vaccines.

(11) Helping to define the concepts of pharmacoproteomics. Pharmacoproteomics is defined as the investigation of the pharmacologic properties of therapeutic agents through proteomics methodologies. The scope of this discipline includes the concepts discussed in USES (2), (3), (5), (8) and (9) in this section. For example, it entails studies addressing how the primary and quaternary structure profiles of protein drugs affect the pharmacokinetic properties of these drugs. It also involves research into how proteome structure profiles are used for monitoring therapeutic effects and toxicity. Through establishing the proteome structure profile database, we should be able to identify the structure parameters associated with the pharmacokinetic and pharmacodynamic properties of therapeutic agents.

Throughout this application, the term drug or protein drug include any protein drugs. These protein drugs may be used or analyzed using the methods of the invention. Nonlimiting examples of protein drugs includes, at least, Luteinizing hormone-releasing hormone, Somatostatin, Bradykinin, Goserelin, Somatotropin, antibodies, Buserelin, Platelet-derived growth factor, Triptorelin, Gonadorelin, Asparaginase, Nafarelin, Bleomycin sulfate, Leuprolide, Chymopapain, Growth hormone-releasing factor, Cholecystokinin, Chorionic gonadotropin, Insulin, Corticotropin, Calcitonin, Erythropoietin, Glucagon, Calcitonin gene, Hyaluronidase,

Interferons, Endorphin, Interleukins, Thyrotropin-releasing hormone, CSIF, Liprecin, Menotropins, Pituitary hormones, Urofollitropin, Follicle luteoids, Leutinizing hormone, aANF growth factor releasing factor, LH-releasing hormone, Melanocyte-stimulating hormone, Gonadotropin releasing hormone, Oxytocin, Vasopressin, Streptokinase, ACTH analogs, Tissue plasminogen activator, Atrial natriuretic peptide, ANP clearance inhibitors, Urokinase, Angiotensin II antagonists, Bradykinin potentiator B, Bradykinin antagonists, Bradykinin potentiator C, CD4, Brain-derived neurotrophic factor, Ceredase, Colony stimulating factors, Cystic fibrosis trans-membrane conduct regulator, Enkephalins, FAB fragments, IgE peptide suppressors, Chorionic gonadotropin, Insulin-like growth factors, Ciliary neurotrophic factor, Neurotrophic factors, Parathyroid hormones, Corticotropin-releasing factor, Prostaglandin antagonists, Granuloycte colony stimulating factor,, Pentigetide, Protein C, filgrastim, Protein S, Granulocyte macrophage colony stim. 
factor., Renin inhibitors, Thymosin a-1, sargramostrim, Thrombolytics, Multilineage colony stimulating factor, Tumor necrosis factor, Vaccines, Macrophage-specific colony stimulating factor, Vasopressin antagonist, Colony stimulating factor 4, alpha-1 Anti-trypsin, Adenosine deaminase, Epidermal growth factor, Amylin, Atrial natriuretic peptide, Enkephalin leu, beta-Glucocerebrosidase, Enkephalin met, Bone morphogenesis protein 2, Factor IX, Bombesin, Factor VIII, Bactericidal/pennability increasing protein, Follicular gonadotropin releasing peptide, Hirudin G-1128, IEV inhibitor peptide, Gastrin-releasing peptide, Inhibin-like peptide, Glucagon, Insulinotropin, Growth hormone releasing factor, Lipotropin, Macrophage-derived neutrophil chemotaxis factor, Heparin binding, Magainin I/II, neurotrophic factor, Melatonin, tryptophan hydroxylase, Fibroblast growth factor, Midkine, Neurophysin, Somatostatin, Neurotrophin-3, Somatotropin, Nerve growth factor, Oxytocin, Phospholipase A2, Sauvagine, Soluble IL-1 receptor, Thymidine kinase, Thymosin alpha one, TNF

receptor, soluble, TPA, Transforming growth factor beta, TRH, Thyroid stimulating hormone, Vasopressin and Vasotocin. Each of these protein drugs may exist in multiple forms, such as alpha, beta and gamma forms, and these forms are also envisioned as part of the definition of "protein drugs" above. One advantage of the techniques of the invention is that they are capable of separately and combinatorially analyzing all the proteins (i.e., the proteome) in a patient in response to exposure to a drug. The methods of this invention can, for example, reveal in a straightforward manner all the proteins that interact with or are affected by a drug.

(12) Another embodiment is based on the concept in pharmacoproteomics that the idiosyncratic efficacy and toxicity of a particular drug in patients can be predicted by analyzing the proteome structure profiles of these patients.

Particularly, pharmacoproteomics should emphasize the individualization of therapeutic measures based on the particular proteome structure profiles in patients.

Plasma proteome structures are expected to have direct impacts on the pharmacologic properties (e.g., drug binding and transport) of almost all, if not all, therapeutic agents.

Therefore plasma proteome structure profiling will be instrumental in calculation of the dosages and intervals with which these agents are delivered to have efficacy while avoiding toxicity. Such information, in combination with the pharmacogenetic profiles, will greatly improve our knowledge of idiosyncratic effects and toxicity of drugs on different individuals.

Thus, in one embodiment, the methods of this invention can be used for the rapid screening of a drug, to see in one experiment, all the proteins that the single drug affects. In the method, the patient is exposed to a drug and a proteome analysis using any of the methods of the invention is performed. The number of affected proteins in the body can be determined by seeing which proteins have affected (changed) primary or quaternary structure.

(13) Developing the concept that specific quaternary structures may serve as the targets of diagnostic as well as therapeutic measures in one embodiment. As described above, quaternary structures should play many distinct but important roles in the progression of different diseases. For example, the interactions between host and foreign proteins play many important roles during the infection of microorganisms.

These quaternary structures should be crucial in the early stages of infection, such as transport via the circulation, entry into the target cells/tissues, and even circumvention of attack by the host immune system. In order to replicate and spread at the later stage, the invading organisms, particularly those with a limited number of genes, can also alter the host proteome quaternary structures to provide a favorable milieu. All of these quaternary structures at different stages are good targets for diagnostic and therapeutic measures. Furthermore, the function-specific and pathogenic quaternary structures described in uses I-(8) and -(9) are excellent targets when therapeutic measures are designed.

The conventional wisdom of therapeutic methods is to use a particular molecule with strong and unique affinity to target and neutralize a particular gene or protein. The concepts and methodologies in the current invention can drastically advance the current therapeutic strategy. In one embodiment, we use multiple agents to target these special quaternary structures and to treat or prevent diseases. The specificity of such a therapy is conferred by the convergence of these agents at the specific quaternary structures, while no activity of any protein is completely eliminated. Other quaternary structures sharing the same components are still well preserved even in the presence of these reagents. This approach may allow us to design more physiological and thus safer therapeutic measures. Therefore, the concepts and methodologies in the current invention provide the solid biochemical basis for the combinatorial or multiple-drug therapy.

III. The uses of the materials related to the proteome structure profile databases

Many useful materials can be designed using the copious information contained in these databases. These may include peptides and proteins that have particular primary and quaternary structure profiles, as well as recognition molecules such as antibodies that are targeted against specific proteome primary or quaternary structures.

These reagents should be very useful for studying the mechanisms involved in plasma function. Some of these molecules may even have diagnostic and therapeutic values.

The information entailed in these databases is very useful in several aspects: (1) Advancing the research and diagnostic methods using particular structure-specific antibodies. In one embodiment, the clinical databases should identify proteome structures that are diagnostic of specific diseases. The antibodies against the specific structures can be used for developing rapid, inexpensive and specific immunochemical methods for detecting specific proteome structures.

(2) Advancing the research and clinical methods like antibody arrays with the structure-specific antibodies in one embodiment. For evaluation of certain specific acute diseases, rapid tests for detecting protein/proteome structures are frequently needed. In such cases, antibody-based methods are very helpful for their excellent speed and sensitivity. Methods like structure-specific antibody arrays can be considered as good tests for comprehensive assessment of the body condition.

(3) Advancing the research and diagnostic methods using molecules with specific structures for characterization of immune responses in one embodiment. An important supposition in the current invention is that certain proteome structures may be induced when plasma is exposed to certain disease conditions. It is a very interesting question whether these induced protein structures may be deemed foreign antigens and elicit responses by our immune systems. If this happens, there should be antibodies or cells that recognize these peculiar plasma proteome structures.

We can first synthesize peptides or proteins that carry the same or similar protein features as those in our proteome structure profile databases. These molecules can then be used for quantitative analysis of the specific antibodies and/or cells, which may help us document "auto-immune profiles" in humans and other animals.

(4) Advancing the concept that the auto-immune profiles are related to pathogenesis of certain diseases. The auto-immune profiles can be organized into databases along with clinical conditions. Such databases should reveal the association between these profiles and health conditions. This will enable us to determine how autoimmunity mediated by these antibodies and cells may participate in the pathogenesis of diseases. Such knowledge should help us to further understand the roles of immune systems in animal physiology and pathophysiology.

(5) Continuing with developing and producing therapeutic methods with protein molecules with adequate protein structures in one embodiment. Recombinant plasma proteins with specific structure profiles can be developed and tested for their therapeutic effects. With the protein structure information in the databases, the structural differences between recombinant protein drugs and endogenous plasma proteins, in terms of primary and quaternary structures, can be analyzed. This should allow us to test whether the functional defects of the synthetic products result from improper primary or quaternary structures. Through adjusting the protein structures in the recombinant proteins, it becomes likely to generate recombinant plasma proteins with appropriate functionality.

Furthermore, peptide/polypeptide products with specific local structures can also be tested for any therapeutic effect. If any protein structures have a direct role in disease progression, inactivation or activation of these protein structures may disrupt the disease processes. Instead of introducing the intact proteins, the peptide/polypeptide molecules may elicit suppressive or enhancing effects on these structures.

In another embodiment, we propose to deliver therapeutic agents given via intravenous, intra-arterial or intramuscular injection in association with their individual interacting plasma proteins. The therapeutic agents include small chemicals and macromolecular drugs such as DNA, RNA and proteins. The interacting proteins from the plasma proteome are identified using the disclosed systems. As mentioned, these interacting proteins may help to modulate the pharmacodynamic and/or pharmacokinetic properties of the therapeutic agents, and thus may enhance the efficacy or reduce the toxicity of these drugs. As these are resident proteins of the plasma, they should elicit minimal adverse reactions. In some embodiments, artificial molecules are designed to mimic the properties of these interacting proteins such that they are delivered in association with these drugs. These are used as an alternative to the native interacting proteins.

The following examples are provided to illustrate, but not to limit, the claimed invention.

EXAMPLES

EXAMPLE 1 Establishment of plasma proteome protein abundance profile database

1. Acquisition of proteome composition information: The plasma/serum samples were resolved by SDS-PAGE and the gels were developed by the Coomassie blue staining method (FIG. 19). Briefly, 0.2 µl of human serum was solubilized in 10 µl of 1X SDS sample buffer and resolved on a 10% polyacrylamide gel. The gel was developed with Coomassie blue staining and then photographed. The numbers and bars at left indicate the migration positions of molecular weight markers. The major serum proteins, grouped based on their MW, are labeled with dotted lines and capital letters.

The bands were excised and subjected to in-gel digestion. The peptide products were analyzed by LC-MS/MS, and the MS data were analyzed with the SEQUEST program to reveal their protein identities. The proteins identified are listed, along with their apparent sizes on the SDS polyacrylamide gel (Table 1).

Table 1 List of plasma proteins identified in SDS polyacrylamide gel

Band  Mr (K)  Proteins
A     200     α2-macroglobulin, apolipoprotein B-100, fibronectin isoform 1
B     135     α2-macroglobulin, ceruloplasmin
C     100     C1 inhibitor, plasminogen, properdin B factor
D     80      Transferrin, hemopexin, α1-B-glycoprotein, complement C4A
E     72      Complement C3
F     60      Albumin
G     55      Fibrinogen β, α2-HS-glycoprotein, apolipoprotein H
H     52      Fibrinogen γ, immunoglobulin G heavy chain
I     49      Haptoglobin, complement C4A
I     20      Lacritin, lipocalin
J     19      Immunoglobulin G light chain
      18      ApoAI

For each disease category (e.g., hepatocellular carcinoma), the plasma/serum samples were pooled and then subjected to 2-D difference gel electrophoresis to reveal the differentially expressed or modified proteins (FIG. 20). Figure 20 shows the two-dimensional difference gel electrophoresis of serum samples from different disease groups. 0.2 µl of pooled human serum was solubilized in lysis buffer and the proteins were conjugated with Cy3 (disease; red-colored) or Cy5 (normal; green-colored) dyes. The serum samples from the two groups were combined and then resolved on the same two-dimensional gel. When a protein is present in similar amounts in both samples, a yellow signal results. The stars (*) mark examples whose expression is enhanced in the disease group. The crosses (+) mark examples whose positions along the isoelectric point axis are dramatically shifted.

Sodium dodecyl sulfate-polyacrylamide gel electrophoresis of plasma proteins - The serum/plasma samples were diluted in 1X SDS-PAGE sample buffer and stored at -80°C in aliquots until use. One microliter of human plasma/serum was resolved in a 10% SDS-PAGE system per the instructions of the manufacturer (Amersham). Typically, the mini-gel was run at a constant current of 20 mA per gel. Following completion, the gel was washed with deionized water twice and then subjected to Coomassie blue staining.

In-gel digestion - The gel piece containing the studied protein was first soaked in 25 mM NH4HCO3 for 10 min and then in 25 mM NH4HCO3/50% acetonitrile for 10 min. After drying in a Speed-Vac (Savant), the gel was incubated in 100 µl of 2% β-mercaptoethanol/25 mM NH4HCO3 for 20 min at room temperature in the dark. The same volume of 10% 4-vinylpyridine in NH4HCO3/50% acetonitrile was added for cysteine alkylation. After a 20-min incubation, the gel was soaked in 1 ml of 25 mM NH4HCO3 for 10 min. After being washed in 25 mM NH4HCO3/50% acetonitrile for 10 min, the gel was dried and then incubated with 25 mM NH4HCO3 containing 50 ng of modified trypsin (Promega) overnight (~18 h). The supernatant was separated from the gel, which was extracted with 200 µl of 0.1% formic acid. The supernatant and the gel extract were combined and then dried in a Speed-Vac. The samples were kept at -20°C for storage and were resuspended in 0.1% formic acid immediately before use.

LC-MS/MS analysis - The identities of these proteins were determined using capillary liquid chromatography-ion trap tandem mass spectrometry. A Finnigan LCQ XP ion trap mass spectrometer was interfaced with an Agilent 1100D HPLC system equipped with a 150 x 0.3 mm Agilent 300SB C18 column (5 µm particle diameter, 300 Å pore size). The mobile phase consisted of water (solvent A) and acetonitrile (solvent B). The peptides were eluted at a constant flow rate of 1.0 µl/min, with the concentration of B increasing from 5 to 85% in 50 min. The spectra for the eluate were acquired as successive sets of three scan modes. The MS scan determined the intensity of the ions in the m/z range of 395 to 1605, and the most abundant ion in the MS spectrum was selected for ZOOM and MS/MS scans. The ZOOM scan is a slower scan over a narrow m/z range and examined the charge number of the selected ion. The MS/MS scan examined the fragment ions of the selected peptide ions and gave tandem mass spectra. The acquired tandem mass spectra were interpreted using a Finnigan Corporation software package, the SEQUEST Browser, which correlates the MS/MS spectra with a protein sequence database containing all human proteins. A 105.14-Da mass tag was constitutively assigned to the Cys residue, which was modified with 4-vinylpyridine during the in-gel digestion procedure.
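The scan window and the cysteine mass tag described above can be illustrated with a short calculation. The following is a minimal sketch, not the software used in these experiments: it assumes average (rather than monoisotopic) residue masses, the function names are hypothetical, and it simply asks which charge states of a pyridylethylated peptide would fall inside the 395-1605 m/z range of the MS scan.

```python
# Average residue masses (Da) of the 20 standard amino acids.
# Using average masses is an assumption made for this illustration.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153             # mass of H2O added for the peptide termini
PROTON = 1.00728            # mass of one proton
CYS_TAG = 105.14            # mass tag per Cys residue, as given in the text
SCAN_RANGE = (395.0, 1605.0)  # MS scan window, as given in the text


def peptide_mass(sequence, cys_tag=CYS_TAG):
    """Neutral average mass of a peptide, adding cys_tag for every Cys."""
    residues = sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence)
    return residues + WATER + cys_tag * sequence.count("C")


def mz(sequence, charge, cys_tag=CYS_TAG):
    """m/z of the peptide carrying `charge` protons."""
    return (peptide_mass(sequence, cys_tag) + charge * PROTON) / charge


def charges_in_scan_range(sequence, max_charge=4):
    """Charge states (1..max_charge) whose m/z lies inside the scan window."""
    low, high = SCAN_RANGE
    return [z for z in range(1, max_charge + 1) if low <= mz(sequence, z) <= high]


if __name__ == "__main__":
    peptide = "YVMLPVADQDQCIR"  # haptoglobin peptide, SEQ ID NO: 3
    for z in charges_in_scan_range(peptide):
        print(z, round(mz(peptide, z), 2))
```

For the haptoglobin peptide used here, the singly charged ion lies above the scan window, so under these assumptions only its multiply charged forms would be candidates for selection.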

2-D difference gel electrophoresis - The plasma/serum proteins were diluted in 7 M urea/2 M thiourea containing 4% CHAPS, 2% DTT and 2% pH 4-7 IPG buffer and then subjected to Cy3 or Cy5 dye labeling per the instructions of the manufacturer (Amersham). The serum proteins from normal individuals were conjugated with Cy5, while those from disease groups were conjugated with Cy3. After centrifugation at 109000 g for 15 min, 400 µg of the sample was resolved by isoelectric focusing for 100,000 Vh. After equilibration in buffer containing 50 mM Tris-HCl pH 8.8, 6 M urea, 30% glycerol and 0.02% SDS, the IPG gel strip was placed at the top of a 10-15% gradient gel per the instructions of the manufacturer (Bio-Rad). The gel image was developed in an FLA-5000 fluorescence image scanner (Fuji).

RESULTS AND DISCUSSION - When the plasma proteins were analyzed on an SDS-PAGE gel, about a dozen bands were resolved (FIG. 19). With few exceptions, multiple proteins were present in each band upon LC-MS/MS analysis. As many as four proteins were sometimes identified in one gel band (Table 1).

With two-dimensional difference gel electrophoresis, we have also attempted to find differentially expressed and/or modified proteins in different disease groups (FIG. 20). When comparing these proteomes in general, we observed that no two 2-D images from these disease groups were exactly the same. These results clearly indicate that the primary structure profiles, which are defined by the 2-D gel pattern, differ in plasma specimens from different disease groups. Quite strikingly, there were certain proteins that were indeed differentially expressed in particular disease groups. For example, the expression level of a group of proteins whose pI is ~6.5 and MW was apparently higher in sera associated with HIV infection and cervical cancers. Furthermore, there were also certain proteins that had shifted positions along the isoelectric point axis, which indicated that these proteins were differentially modified in the disease state. These results corroborate that the profiles of primary structures, both protein abundance and post-translational modification status, are changed in different diseases.

2. Plasma proteome protein abundance profiling: Figure 21 shows the methods for plasma proteome primary structure profiling.

A standard protein in a defined amount was added prior to any sample treatment, and the mixture was subjected to reduction and alkylation in solution. After being precipitated with methanol/chloroform, the proteins were resuspended and then subjected to in-solution digestion. The protein digests were analyzed by LC-MS/MS and the raw data were analyzed to derive the protein abundance profile or the PTM status profile. In Figure 22, the specific ion chromatograms (SICs) are graphed for quantitating the representative peptides. The peptides YLQEIYNSNNQK (SEQ ID NO: 1) and SKFQLFSSPHGK (SEQ ID NO: 2) are used for quantitating fibrinogen γ and transferrin, respectively.

The small variation among three independent experiments demonstrated the consistency of such an LC-MS/MS analytical system (FIG. 23). In Figure 23, the specific ion chromatograms (SICs) are graphed for quantitating the representative peptides. The peptides YVMLPVADQDQCIR (SEQ ID NO: 3) and LTIGEGQQHHLGGAK (SEQ ID NO: 4) are used for quantitating haptoglobin and α2-macroglobulin, respectively.

In-solution protein digestion - One µg of bovine casein κ in 9 M urea was added to the serum samples to denature and dissolve the serum proteins. β-Mercaptoethanol was added and the mixture was incubated in the dark for 20 min, after which 4-vinylpyridine was added for another 20-min incubation in the dark. To the solution were added 4X volume of methanol, 1X volume of chloroform and 3X volume of water. After vigorous mixing by vortex for 10 min, the whole mixture was centrifuged at 3000 rpm on a tabletop centrifuge for 10 min. The upper organic phase was removed and 0.5 ml of methanol was added. After centrifugation at 3000 rpm on a tabletop centrifuge for 15 min, the supernatant was removed. The pellet was washed with cold acetone twice and then resuspended in 9 M urea. 0.1 µg of trypsin in 25 mM NH4HCO3 was added to the protein sample for digestion overnight.
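The solvent volumes for the methanol/chloroform precipitation step scale with the sample volume. A minimal sketch of that arithmetic follows, assuming that "4X", "1X" and "3X" are all multiples of the starting sample volume; the function name is hypothetical.

```python
def precipitation_volumes(sample_volume_ul):
    """Solvent volumes (in µl) for methanol/chloroform precipitation.

    Assumes the 4X / 1X / 3X ratios in the protocol are taken relative
    to the volume of the reduced and alkylated sample.
    """
    if sample_volume_ul <= 0:
        raise ValueError("sample volume must be positive")
    return {
        "methanol": 4 * sample_volume_ul,
        "chloroform": 1 * sample_volume_ul,
        "water": 3 * sample_volume_ul,
    }


if __name__ == "__main__":
    volumes = precipitation_volumes(100.0)
    total = 100.0 + sum(volumes.values())
    print(volumes, "total volume in tube:", total, "µl")
```

Under this reading, a 100 µl sample ends up in roughly 900 µl of mixture, which is a useful check on tube size before centrifugation.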

Liquid chromatography-tandem mass spectrometric analysis - Electrospray mass spectrometry was performed on an ABI Q-STAR XL hybrid mass spectrometer. The mass spectrometer was interfaced with an Agilent 1100D HPLC system equipped with 150 x 0.3 mm Agilent 300SB C18 columns (5 µm particle diameter, 300 Å pore size) with mobile phases of solvents A and B described above. The peptides were eluted in 50 min with 5 to 85% acetonitrile at a flow rate of 2 µl/min. Specific ion chromatograms of the studied peptides were graphed using in-house proprietary computer programs and AnalystQS (Applied Biosystems).

RESULTS AND DISCUSSION - We have developed a novel procedure for proteome protein abundance profiling, which primarily consists of inclusion of a standard protein, methanol/chloroform precipitation, in-solution digestion, LC-MS/MS analysis and protein abundance determination (FIG. 21). The specific ion chromatograms in FIGS. 22-24 demonstrated the feasibility of detecting the peptides representative of the plasma proteins as well as the added standard protein.

In Figure 24, the specific ion chromatograms (SICs) are graphed for quantitating the representative peptides. The peptides AVMDDFAAFVEK (SEQ ID NO : 5) and SPAQILQWQVLSNTVPAK (SEQ ID NO : 6) were used for quantitating albumin and the standard protein, bovine casein K, respectively.

In a triplicate experiment, the protein abundance index of albumin averaged 2.8 with a standard deviation of 0.2. The results, listed in Table 2, show the protein abundance index of albumin in the plasma sample. The total ion counts of peptides AVMDDFAAFVEK (SEQ ID NO: 7) from albumin and SPAQILQWQVLSNTVPAK (SEQ ID NO: 8) from the standard protein casein were determined in three independent experiments. The average and standard deviation were also derived and are listed. Such a small variation validated the consistency of our proteome abundance profiling procedure.
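The averaging described above can be reproduced with a few lines of code. This is a minimal sketch rather than the in-house program: it assumes that the protein abundance index (PAI) is the ratio of the albumin peptide's total ion count to that of the casein standard peptide, which agrees with the tabulated values to within rounding, and the function name is hypothetical.

```python
from statistics import mean, stdev


def protein_abundance_index(protein_ion_count, standard_ion_count):
    """Assumed definition: ion count of the protein's representative
    peptide divided by the ion count of the spiked standard's peptide."""
    if standard_ion_count <= 0:
        raise ValueError("standard ion count must be positive")
    return protein_ion_count / standard_ion_count


# PAI values reported for the three independent experiments (Table 2).
reported_pai = [2.8, 2.6, 3.0]

if __name__ == "__main__":
    # Experiment 3 of Table 2: albumin 1.9 x 10^5, casein 6.4 x 10^4.
    print(round(protein_abundance_index(1.9e5, 6.4e4), 1))
    print(round(mean(reported_pai), 1), round(stdev(reported_pai), 1))
```

The sample standard deviation is used here because the three experiments are treated as a sample of repeated measurements; with the reported PAI values it reproduces the average of 2.8 and the standard deviation of 0.2 given in the text.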

Table 2 Consistency of LC-MS/MS analysis-based protein abundance analysis

Exp   Casein       Albumin      PAI
1     6.6 x 10^4   1.8 x 10^5   2.8
2     7.8 x 10^4   2.0 x 10^5   2.6
3     6.4 x 10^4   1.9 x 10^5   3.0

Average = 2.8
Standard Deviation = 0.2

EXAMPLE 2 Establishment of plasma proteome post-translational modification status profile database

I. Acquisition of proteome post-translational modification information: The plasma/serum samples were resolved by SDS-PAGE and the gels were developed by the Coomassie blue staining method. Bands containing the target proteins were excised and subjected to in-gel digestion. The peptide products were analyzed by LC-MS/MS to reveal their protein post-translational modifications (PTMs).

For each disease category (e.g., hepatocellular carcinoma), the protein spots with differential modifications in disease states were also excised from the regular two-dimensional gels for PTM analysis (FIG. 25). The modified peptides identified thus far are listed in Table 3. In Table 3, the lowercase letters preceding the residues indicate the types as well as the positions of the modification groups. p: phosphorylation; c: carbamylation; a: acetylation; o: hydroxylation/oxidation. Their individual MS/MS spectra are shown in FIGS. 26-28. In these figures, pS, pT and pY indicate phosphorylated serine, threonine and tyrosine residues, respectively.

Table 3 Summary of identified plasma protein PTMs

Sequence              Position  Protein       Modification     SEQ ID NO
AEFAEVpSKLVTDLTK      250-264   Albumin       phosphorylation  9
FQNALLVRpY            427-435   Albumin       phosphorylation  10
VPQVpSpTPTLVEVSR      439-452   Albumin       phosphorylation  11
YLpSVVLNQLCVLH        476-488   Albumin       phosphorylation  12
QpTALVELVKHK          550-560   Albumin       phosphorylation  13
DpSISSKLK             293-300   Albumin       phosphorylation  14
cKQTALVELVK           549-558   Albumin       carbamylation    15
DDFAAFVEKCCaK         573-584   Albumin       acetylation      16
aKaKVPQVSTPTLVEVSR    438-452   Albumin       acetylation      17
SPVGVQPILNEHTFCAGoMpS 326-344   Haptoglobin   phosphorylation  18
DLEEVKAKVQoPYL        89-101    ApoAI         hydroxylation    19
pYCDMNTENGGWTVIQNR    269-285   Fibrinogen β  phosphorylation  20

Sodium dodecyl sulfate-polyacrylamide gel electrophoresis of plasma proteins - The serum/plasma samples were diluted in 1X SDS-PAGE sample buffer and stored at -80°C in aliquots until use. One microliter of human plasma/serum was resolved in a 10% SDS-PAGE system per the instructions of the manufacturer (Amersham). Typically, the mini-gel was run at a constant current of 20 mA per gel. Following completion, the gel was washed with deionized water twice and then subjected to Coomassie blue staining.

Two-dimensional gel electrophoresis - Six µl of plasma was diluted with 94 µl of lysis buffer (containing 7 M urea, 2 M thiourea, 2% DTT, 4% CHAPS and 2% pH 3-10 IPG buffer) and placed onto the 18-cm immobilized pH gradient (IPG) gel, which separates proteins according to their isoelectric points (pIs). The instrument, the IPGphore IEF system, as well as the IPG gels, was purchased from Amersham Pharmacia Biotech. After the first-dimension experiment was finished, the proteins were resolved by SDS-PAGE analysis. The gels were developed by Coomassie blue staining.

In-gel digestion - The gel piece containing the studied protein was first soaked in 25 mM NH4HCO3 for 10 min and then in 25 mM NH4HCO3/50% acetonitrile for 10 min. After drying in a Speed-Vac (Savant), the gel was incubated in 100 µl of 2% β-mercaptoethanol/25 mM NH4HCO3 for 20 min at room temperature in the dark. The same volume of 10% 4-vinylpyridine in NH4HCO3/50% acetonitrile was added for alkylation of cysteine residues. After a 20-min incubation, the gel was soaked in 1 ml of 25 mM NH4HCO3 for 10 min. After being washed in 25 mM NH4HCO3/50% acetonitrile for 10 min, the gel was dried and then incubated with 25 mM NH4HCO3 containing 50 ng of an endopeptidase such as modified trypsin, Arg-C, Lys-C or Asp-N overnight (~18 h). The protein digests were removed from the gel, which was extracted with 200 µl of 0.1% formic acid. These two fractions were combined, dried in a Speed-Vac and kept at -20°C for storage. The sample was resuspended in 0.1% formic acid before LC-MS/MS analysis.

Liquid chromatography-tandem mass spectrometric analysis - Electrospray mass spectrometry was performed on either a Finnigan Mat LCQ ion trap mass spectrometer or an ABI Q-STAR XL hybrid mass spectrometer. The mass spectrometers were interfaced with an Agilent 1100D HPLC system or a Waters Cap-LC system equipped with 150 x 0.3 mm Agilent 300SB C18 columns (5 µm particles and 300 Å pore size) with mobile phases of solvents A and B described above. The peptides were eluted in 50 min with 5 to 85% acetonitrile at a flow rate of 2 µl/min.

For the LCQ mass spectrometry, the spectra for the eluate were acquired as successive sets of three scan modes. The MS scan determined the intensities of the ions in the m/z range of 395 to 1605, and the most abundant ion in an MS spectrum was selected for ZOOM scan and MS/MS scan. The former examined the charge number of the selected ion and the latter acquired the spectrum (CID spectrum or MS/MS spectrum) for the fragment ions derived by collision-induced dissociation.

For Q-STAR mass spectrometry, the mass spectra were collected as successive sets of two events, MS scan and product ion scan.

The acquired CID spectra, from either the LCQ or the Q-STAR, were interpreted using SEQUEST Browser (ThermoFinnigan). The MS/MS spectra were correlated with a protein sequence database containing only the target protein. No enzyme was specified in the search parameters, which increases the confidence of identification when matched peptides have the predicted cleavage sites. A 105.14-Da mass tag was constitutively assigned to the Cys residue, which was modified with 4-vinylpyridine in the experiments. Those MS/MS scans that matched peptide sequences with the expected cleavage sites were considered significant and were also subjected to further evaluation with our proprietary computer program to confirm the SEQUEST results.
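The check for expected cleavage sites can be illustrated as follows. This is a minimal sketch and not the proprietary program mentioned above: it assumes trypsin as the endopeptidase, applies the simple rule that trypsin cleaves C-terminal to Lys or Arg (ignoring the proline exception), and uses a hypothetical function name and a toy sequence rather than a real protein.

```python
def has_expected_tryptic_termini(protein, peptide):
    """Return True if `peptide` occurs in `protein` with both ends at
    sites consistent with tryptic cleavage (after K or R), or at the
    protein termini. Only the first occurrence is examined."""
    if not peptide:
        return False
    start = protein.find(peptide)
    if start == -1:
        return False
    end = start + len(peptide)
    # N-terminal side: the preceding residue must be K/R, or the peptide
    # starts at the protein N-terminus.
    n_term_ok = start == 0 or protein[start - 1] in "KR"
    # C-terminal side: the peptide must end in K/R, or at the protein
    # C-terminus.
    c_term_ok = end == len(protein) or peptide[-1] in "KR"
    return n_term_ok and c_term_ok


if __name__ == "__main__":
    toy_protein = "MKAEFAEVSKLVTDLTKGG"  # illustrative sequence only
    for candidate in ("AEFAEVSK", "EFAEVSK", "LVTDLTK", "GG"):
        print(candidate, has_expected_tryptic_termini(toy_protein, candidate))
```

Because the search itself places no constraint on the enzyme, a match that nonetheless passes this kind of filter is less likely to have arisen by chance, which is the rationale given in the text.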

The hypothetical m/z values and the selected ion chromatograms were generated using another proprietary computer program. This experiment was particularly focused on finding methylation, simple glycosylation, acetylation, phosphorylation and other groups on these tested plasma proteins.

RESULTS AND DISCUSSION - The serum proteins from various disease groups were resolved on SDS polyacrylamide gels. We have previously documented that a ~70-kDa major band (band F in FIG. 19) contained almost exclusively human albumin. Based on this observation, this form of albumin was a candidate for post-translational modification identification analysis. In one embodiment, we identified six phosphorylated, one carbamylated and two acetylated peptides on serum albumin. This abundance of post-translational modifications was consistent with the observation that albumin was always resolved as multiple isoelectric variants using regular two-dimensional gel electrophoresis. We have also observed with 2-D difference gel electrophoresis that several proteins were differentially modified (FIG. 20).

We have prepared these proteins using regular 2-D gel electrophoresis and subjected them to in-gel digestion and subsequent LC-MS/MS analysis. The modification sites of these proteins have been identified and the results are listed in Table 3 in one embodiment. The modifications include phosphorylation of haptoglobin and fibrinogen β and hydroxylation of the ApoAI protein.

II. Plasma proteome PTM status profiling: (A) In-gel digestion protocol: The plasma/serum samples from different categories of patients as well as normal individuals were processed using the procedure outlined in FIG. 27. Following SDS-PAGE, the serum albumin was excised from the gels and subjected to the in-gel digestion procedure. The protein PTM status of albumin was determined using LC-MS/MS analyses. The results of this PTM status profiling are shown in FIGS. 32 and 33. In these figures, the analyzed modified residues are listed on the abscissas. The modification status at each modified residue, defined as ΣM/(ΣM+ΣU) (FIG. 8), is indicated by an open circle.
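The modification status defined above is a simple ratio and can be sketched in code. The example below is illustrative only: it assumes that M and U denote the ion signals (for example, selected ion chromatogram peak areas) of the peptide forms that are modified and unmodified at the residue of interest, and the function name and input values are hypothetical.

```python
def modification_status(modified_signals, unmodified_signals):
    """Modification status = sum(M) / (sum(M) + sum(U)).

    `modified_signals` and `unmodified_signals` are sequences of
    non-negative ion signals for the modified and unmodified forms.
    """
    signals = list(modified_signals) + list(unmodified_signals)
    if any(s < 0 for s in signals):
        raise ValueError("ion signals must be non-negative")
    total_modified = sum(modified_signals)
    total = total_modified + sum(unmodified_signals)
    if total == 0:
        raise ValueError("no signal detected for this residue")
    return total_modified / total


if __name__ == "__main__":
    # Hypothetical peak areas for one residue in one sample.
    print(modification_status([2.0e4], [8.0e4]))
```

The result always lies between 0 (no modified form detected) and 1 (only the modified form detected), which makes values for different residues and samples directly comparable.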

Preparation of plasma/serum sample panel - We have selected seven human disease groups as well as normal individuals for the investigation of plasma proteome post-translational modification statuses. The disease groups include hepatitis C carrier, hepatitis B carrier with and without e conversion, renal failure, hepatocellular carcinoma and cervical carcinoma. The presence of these diseases has been verified by independent serological and biochemical testing of the patients. There are at least ten individuals in each group (Table 4).

Table 4 The categories and the sample numbers tested for albumin post-translational modification status profiles

Category                                               Sample number
Normal                                                 40
Hepatitis C virus infection (a)                        24
Chronic renal failure (b)                              45
Hepatitis B virus infection without e conversion (c)   11
Hepatitis B virus infection with e conversion (d)      10
Hepatocellular carcinoma (e)                           28
Cervical cancer (e)                                    10
Human immunodeficiency virus infection (f)             12

a: From individuals who are anti-HCV seropositive
b: From individuals receiving dialysis therapy
c: From individuals who are anti-HBV S seropositive but anti-HBV e seronegative
d: From individuals who are anti-HBV S and anti-HBV e seropositive
e: From individuals who have a pathology report
f: From individuals who are anti-HIV seropositive

Sodium dodecyl sulfate-polyacrylamide gel electrophoresis of plasma proteins - The serum/plasma samples were diluted in 1X SDS-PAGE sample buffer and stored at -80°C in aliquots until use. One microliter of human plasma/serum was resolved in a 10% SDS-PAGE system per the instructions of the manufacturer (Amersham). Typically, the mini-gel was run at a constant current of 20 mA per gel. Following completion, the gel was washed with deionized water twice and then subjected to Coomassie blue staining.

In-gel digestion - The gel piece containing the studied protein was first soaked in 25 mM NH4HCO3 for 10 min and then in 25 mM NH4HCO3/50% acetonitrile for 10 min. After drying in a Speed-Vac (Savant), the gel was incubated in 100 µl of 2% β-mercaptoethanol/25 mM NH4HCO3 for 20 min at room temperature in the dark. The same volume of 10% 4-vinylpyridine in NH4HCO3/50% acetonitrile was added for alkylation of cysteine residues. After a 20-min incubation, the gel was soaked in 1 ml of 25 mM NH4HCO3 for 10 min. After washing in 25 mM NH4HCO3/50% acetonitrile for 10 min, the gel was dried and then incubated with 25 mM NH4HCO3 containing 50 ng of an endopeptidase such as modified trypsin, Arg-C, Lys-C or Asp-N overnight (~18 h). The protein digests were removed from the gel, which was extracted with 200 µl of 0.1% formic acid. These two fractions were combined, dried in a Speed-Vac and kept at -20°C for storage. The sample was resuspended in 0.1% formic acid before LC-MS/MS analysis.

Liquid chromatography-tandem mass spectrometric analysis - Electrospray mass spectrometry was performed on an ABI Q-STAR XL hybrid mass spectrometer. The mass spectrometer was interfaced with an Agilent 1100D HPLC system equipped with 150 x 0.3 mm Agilent 300SB C18 columns (5 µm particles and 300 Å pore size) with mobile phases of solvents A and B described above. The peptides were eluted in 50 min with 5 to 85% acetonitrile at a flow rate of 2 µl/min. For Q-STAR mass spectrometry, the mass spectra were collected as successive sets of two events, MS scan and product ion scan. The hypothetical m/z values and the selected ion chromatograms were generated using another proprietary computer program.

RESULTS AND DISCUSSION-With these PTM status profiles (Table 5), we made several significant observations. First, we observed that the modification index of each modified residue has its own unique range. For example, the modification status of phosphorylated Ser-443 and Thr-444 ranges mostly from 0.0 to 0.2, while the modification status of phosphorylated Tyr-435 averages only 0.002-0.004 in certain patients and is sometimes as low as zero.

This implies that the mechanisms, enzymatic or non-enzymatic, responsible for these post-translational modifications have distinct activities at different positions.

In other words, some residues are more prone to modification than others. This may reflect differences in the interaction between the target residues and their respective modifying mechanisms. Such differences may result from how exposed these target residues are on the protein surface and, most likely, from how efficient the modifying/de-modifying mechanisms are.

Table 5 The albumin PTM statuses at different health conditions.

Peptide   Modified residue(s)       Modification status (one column per health condition; column headings are illegible in the source)
M1        pSer-256 + pThr-260       002     0.059   0.019   0.015   0.015   0.044   0.031   0.000
M2        pSer-294                  0.007   0.14    0.030   0.14    0.056   0.10    0.055   0.000
M3        pTyr-435                  0.004   0.005   0.002   0.000   0.003   0.000   0.000   0.000
M4        pSer-443 + pThr-444       14      0.20    0.022   0.16    0.23    0.079   0.10    0.000

Second, these data show that modifications at these residues probably occur through well coordinated mechanisms. For example, we have attempted to identify the singly phosphorylated species of the two doubly phosphorylated peptides, M1 and M4.

However, the singly modified peptides are barely detectable or absent altogether. If dual modification at these segments were mediated by random events, we would expect the singly modified peptides to be much more abundant. The exclusive detection of doubly phosphorylated peptides and their unmodified counterparts argues that modification at these neighboring sites is a well coordinated and specific process.
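The argument about random events can be made concrete with a small calculation. The sketch below is only an illustration under assumptions that are not stated in the text: the two neighboring sites are taken to be modified independently and with the same probability p, and the doubly modified fraction is set to 0.2, the upper end of the range quoted for Ser-443 plus Thr-444. The function name is hypothetical.

```python
import math


def expected_fractions(double_fraction: float) -> dict:
    """Expected species fractions if two sites were modified independently.

    Assumption (ours, for illustration): both sites share the same
    modification probability p, so P(double) = p**2 and p = sqrt(P(double)).
    """
    p = math.sqrt(double_fraction)
    return {
        "unmodified": (1 - p) ** 2,
        "single": 2 * p * (1 - p),  # either site alone
        "double": p ** 2,
    }


fractions = expected_fractions(0.2)
for species, value in fractions.items():
    print(f"{species}: {value:.3f}")
```

Under these assumptions the singly modified species would be more abundant than the doubly modified one, which is the opposite of a pattern in which singly modified peptides are barely detectable.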

Third, all these modifications show distinctive modification status values under different health conditions. Some of them are consistently found in all albumin samples, while showing variations among patient groups. For certain modifications, the difference can be over several hundred-fold. Our observations establish that the post-translational modification status of a plasma protein/proteome may indeed change under different health conditions. More importantly, these changes could be specifically associated with particular diseases.

Fourth, our results show that there are distinct mechanisms that mediate the modifications at different sites. If the same mechanism were responsible for such activities, we would expect a uniform elevation or depression of modification at each residue. However, our current data show otherwise (i.e., they point to the same mechanism applied in varying degrees or to distinct mechanisms). For example, there is a dramatic increase of phosphorylation at Ser-294, accompanied by a decrease of phosphorylation at Ser-443 plus Thr-444, in patients with hepatocellular carcinoma.

Such a divergence in changes of modification state argues that different mechanisms are responsible for phosphorylation at these residues.

Fifth, it is quite striking that every health condition is ostensibly associated with a unique modification profile (FIG. 29-30). For example, normal individuals have significant phosphorylation at residues Ser-443 and Thr-444, with no apparent phosphorylation at Ser-256 + Thr-260, Ser-294 and Tyr-435. There are dramatic changes in this modification profile in diseased individuals.

For example, in patients with chronic HCV infection, there are dramatic increases of phosphorylation at Ser-294. In contrast, the increase at Ser-294 was relatively moderate at Tyr-394 in patients infected with HBV. It is very intriguing that the modification profiles were so different in viral infections involving the same organ.

Furthermore, we observed that the plasma albumin from hepatocellular carcinoma patients has a modification profile very similar to that in HCV patients. Here we show that two diseases that have distinct pathways but are related in their effects (in that they affect liver function) also have converging patterns. Nevertheless, while the data are similar, they are still distinct enough from each other to enable the method of the invention to distinguish between them.

Such changes in modification profiles are observed in different cancers and infections as well. For example, the plasma albumin in cervical cancer patients has increases over three different regions, involving Ser-256/Thr-260 and Ser-294. On the other hand, HIV infection causes a dramatic general decrease of phosphorylation status over all tested modified residues. Again, the profile change is specifically associated with each disease. Such profile changes are not features seen only in infectious diseases or cancers. There is also a significant modification change on albumin from patients with renal failure. Its pattern is also a general decrease in phosphorylation states involving all residues. However, there is only a ~50% decrease at these residues, in contrast to the 10-fold depression seen in HIV patients.

In summary, all these data clearly indicate that quantitative determination of local modification states for plasma proteome has the potential to gauge various physiological or pathophysiological processes in animals. Determination of such protein/proteome structure profiles should allow us to distinguish various diseases and even to discern the distinct stages of a disease. The prowess of this proteome structure analysis system will rely on the collective consideration of changes involving multiple modification sites and, eventually, proteome quaternary structures.

(B) In-solution digestion protocol: A standard protein, bovine casein κ, was first added to the sample and the mixture was subjected to reduction and alkylation.

After precipitation with the methanol/chloroform method, the proteins were resuspended in 9M urea and then subjected to in-solution digestion. The protein digests were analyzed by LC-MS/MS and the raw data were analyzed to derive the protein PTM status (FIG. 21). The small variation among three independent experiments demonstrated the consistency of such an LC-MS/MS analytical system (Table 6). It should be noted that the experiment was performed independently three separate times.

Table 6 Consistency of LC-MS/MS analysis-based protein PTM quantitation analysis

Exp    FQNALLVRpY     FQNALLVR       #M/(#M+#U)
1      2.7 x 10^4     2.9 x 10^5     8.6 x 10^-2
2      2.6 x 10^4     2.6 x 10^5     8.9 x 10^-2
3      2.8 x 10^4     2.9 x 10^5     8.9 x 10^-2

Average = 8.8 x 10^-2
Standard Deviation = 1.7 x 10^-3
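The summary statistics in Table 6 can be checked with a few lines of Python. This is a minimal sketch, not the proprietary program mentioned in the text: it assumes that #M and #U are the signals of the modified and unmodified peptide and that the reported standard deviation is the sample (n-1) standard deviation. The function name is hypothetical.

```python
from statistics import mean, stdev


def modification_index(modified: float, unmodified: float) -> float:
    """PTM status as defined in the table header: #M / (#M + #U)."""
    return modified / (modified + unmodified)


# Ratios as reported in Table 6 for the three independent experiments.
reported = [8.6e-2, 8.9e-2, 8.9e-2]
average = mean(reported)
spread = stdev(reported)  # sample standard deviation (n - 1)

print(f"average = {average:.3f}, standard deviation = {spread:.4f}")

# The tabulated signals are rounded to two significant figures, so
# recomputing a ratio from them only approximates the reported value.
print(f"experiment 1 from rounded signals: {modification_index(2.7e4, 2.9e5):.4f}")
```

With the three reported ratios, the sample standard deviation reproduces the tabulated 1.7 x 10^-3, which suggests (but does not prove) that the n-1 form was used.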

In-solution protein digestion-One µg of bovine casein κ in 9M urea was added into serum samples to denature and dissolve the serum proteins. β-mercaptoethanol was added and the mixture was incubated in the dark for 20 min; 4-vinylpyridine was then added for another 20-min incubation in the dark. To the solution were added 4X volumes of methanol, 1X volume of chloroform and 3X volumes of water. After vigorous mixing by vortex for 10 min, the whole mixture was centrifuged at 3000 rpm on a tabletop centrifuge for 10 min. The upper organic phase was removed and 0.5 ml of methanol was added. After centrifugation at 3000 rpm for 15 min, the supernatant was removed. The pellet was washed with cold acetone twice, and then resuspended in 9M urea. 0.1 µg of trypsin in 25 mM NH4HCO3 was added to the protein sample for digestion overnight.

Liquid chromatography-tandem mass spectrometric analysis-Electrospray mass spectrometry was performed on an ABI Q-STAR XL hybrid mass spectrometer.

The mass spectrometers were interfaced with an Agilent 1100D HPLC system equipped with 150 x 0.3 mm Agilent 300SB C18 columns (5 µm particle diameter, 300 Å pore size) with mobile phases of solvent A and B described above. The peptides were eluted in 50 min with 5 to 85% of acetonitrile at a flow rate of 2 µl/min.

The hypothetical m/z values and the selected ion chromatograms were generated using another proprietary computer program. SIC analysis was carried out with the AnalystQS software (Applied Biosystems) and our proprietary computer programs.

RESULTS AND DISCUSSION-We have developed a new procedure for proteome protein abundance profiling, which primarily consists of inclusion of a standard protein, methanol/chloroform precipitation, in-solution digestion, LC-MS/MS analysis and protein PTM status determination (FIG. 21). In a triplicate experiment, the PTM status of albumin from a normal individual at Tyr-435 averaged 8.8 x 10^-2 with a standard deviation of 1.7 x 10^-3 (Table 6). Such a small variation validated the consistency of our proteome abundance profiling procedure.

EXAMPLE 3 Establishment of plasma proteome QUATERNARY STRUCTURE profile database Gel filtration analysis was carried out on the plasma samples from normal individuals (FIG. 31; the numbers above the arrows indicate the elution positions of molecular weight markers) and patients with renal failure (FIG. 32). After addition of a standard protein, bovine casein κ, the eluted fractions were subjected to in-solution digestion and the peptides were analyzed using LC-MS/MS methods. The abundance of peptides representative of different proteins (Table 7) in each fraction was determined using the methods described above; the relative abundances of the individual proteins in GFC fractions 10-30 are shown in Table 8. The relative abundance of each protein is expressed as a percentage of the highest amount of that protein in all fractions. The specific protein distribution silhouettes of these proteins were then graphed (FIG. 33, FIG. 34). In these figures, the numbers above the arrows in the overall elution silhouette of plasma indicate the elution positions of molecular weight markers.

Table 7 Peptide list used for generating elution silhouettes of individual proteins

(Fields lost in the source are marked with "…".)

Peptide sequence (a)                            Peptide MW (Da) (b)   Accession number   Protein                                     MW (Da)
RHPYFYAPELLFFAK (SEQ ID NO: 21)                 …99                   P02768             Serum albumin precursor                     69,366
TFPGFFSPMLGEFVSETESR (SEQ ID NO: 22)            …05                   P02671             Fibrinogen alpha/alpha-E chain precursor    94,973
MGPTELLIEMEDWKGDK (SEQ …)                       1990.…                …                  …                                           …
AIQLTYNPDESSKPNM…AATLK (SEQ ID NO: 24)          2519.…                …                  … precursor                                 …
LMQCLPNPEDVK (SEQ ID NO: 25)                    1385.…                …                  … alpha chain precursor                     …
VDLSFSPSQSLPASHAHLR (SEQ ID NO: 26)             …04                   P01023             Alpha-2-macroglobulin precursor             …277
DIAPTLTLYVGK (SEQ ID NO: 27)                    …72                   AAC27432           Haptoglobin                                 38,233

Notes: a. The superscripted numbers indicate the start and end positions of the peptide in the protein sequence. b. The monoisotopic masses are listed. The mass of the pyridyl-ethyl cysteine residue, 208.067, is used for calculations.

Table 8 Relative abundance of individual proteins in GFC fractions 10-30

(One column per protein; the column headings are illegible in the source. Truncated values are reproduced as they appear.)

Fraction
10    0.0   0.0     6.25   0.0    0.0    0.0     0.0
11    0.0   0.0     0.0    0.0    0.0    0.0     0.0
12    0.0   0.0     0.0    0.0    0.0    0.0     0.0
13    0.0   0.0     0.0    0.0    0.0    0.0     0.0
14    0.0   0.0     2.1    0.0    0.0    0.0     0.0
15    0.0   0.0     8.0    12.6   0.0    0.0     0.0
16    0.0   0.0     0.0    0.0    0.0    0.0     0.0
17    0.0   0.0     0.0    0.0    0.0    0.0     0.0
18    0.0   35.7    6.5    22.1   0.0    0.0     0.0
19    0.0   100.0   98.7   100.   5.1    3.6     4.84
20    0.0   74.1    99.    64.4   10.4   16.3    16.8
21    0.0   59.3    60.4   59.3   17.9   44.7    8.89
22    0.0   33      28.4   34.2   10.2   76.5    9.46
23    0.0   25.9    12.2   20.4   61.4   79.5    27.3
24    0.0   20.7    13.7   13.2   100.   100.0   46.7
25    0.0   12.2    10.8   13.8   25.5   78.4    66.5
26    0.0   10.4    12.8   10.8   26.7   53.3    88.1
27    0.0   7.4     7.6    4.1    2.6    39.9    93.5
28    0.0   6.7     5.5    5.3    5.3    31.5    100.0
29    0.0   6.6     2.7    3.9    18.6   29.5    79.4
30    0.0   3 1. 19.3 49.6
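The normalization used for Table 8, in which each protein is expressed as a percentage of its highest amount across all fractions, can be sketched as follows. The function name and the example abundance values are hypothetical and chosen only for illustration; the actual abundance indexes were derived with the authors' own programs.

```python
from typing import List


def relative_abundance(abundances: List[float]) -> List[float]:
    """Scale one protein's abundances so that its peak fraction equals 100.

    `abundances` holds the abundance index of a single protein in
    consecutive gel filtration fractions (hypothetical input format).
    """
    peak = max(abundances)
    if peak <= 0:
        # Protein not detected in any fraction: report zeros throughout.
        return [0.0 for _ in abundances]
    return [round(100.0 * value / peak, 1) for value in abundances]


# Made-up abundance indexes for five consecutive fractions.
example = [0.0, 1.4, 4.0, 2.8, 0.4]
print(relative_abundance(example))
```

Because every protein is scaled to its own maximum, the resulting silhouettes compare peak positions and peak shapes rather than absolute amounts.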

Gel filtration chromatography-Plasma was collected into a 15 mL tube containing CPD-A (Sigma). Samples were prepared by centrifugation at 3,500 g for 10 min. Plasma proteins were kept at -80°C before gel filtration chromatographic analysis. Gel filtration analysis was performed on a Shodex PROTEIN KW-804 column (8 mm in diameter and 300 mm in length). The column was equilibrated with gel filtration equilibration buffer (PBS). The plasma was diluted with gel filtration equilibration buffer to a final volume of 100 µl. The mixture was applied to the column and the analysis was performed with a flow rate of 0.5 mL/min at 4°C using the SPECTRA P4000 HPLC system (ThermoFinnigan), with 0.1 ml fractions collected at a rate of 5 fractions per min on a FRAC-300 fraction collector (Amersham Pharmacia).

In-solution protein digestion-One µg of bovine casein κ in 9M urea was added into serum samples to denature and dissolve the serum proteins.

β-mercaptoethanol was added and the mixture was incubated in the dark for 20 min; 4-vinylpyridine was then added for another 20-min incubation in the dark. To the solution were added 4X volumes of methanol, 1X volume of chloroform and 3X volumes of water. After vigorous mixing by vortex for 10 min, the whole mixture was centrifuged at 3000 rpm on a tabletop centrifuge for 10 min. The upper organic phase was removed and 0.5 ml of methanol was added. After centrifugation at 3000 rpm on a tabletop centrifuge for 15 min, the supernatant was removed. The pellet was washed with cold acetone twice, and then resuspended in 9M urea. 0.1 µg of trypsin in 25 mM NH4HCO3 was added to the protein sample for digestion overnight.

Liquid chromatography-tandem mass spectrometric analysis-Electrospray mass spectrometry was performed on an ABI Q-STAR XL hybrid mass spectrometer.

The mass spectrometers were interfaced with an Agilent 1100D HPLC system equipped with 150 x 0.3 mm Agilent 300SB C18 columns (5 µm particle diameter, 300 Å pore size) with mobile phases of solvent A and B described above. The peptides were eluted in 50 min with 5 to 85% of acetonitrile at a flow rate of 2 µl/min.

The hypothetical m/z values and the selected ion chromatograms were generated using another proprietary computer program. SIC analysis was carried out with the AnalystQS software (Applied Biosystems) and our proprietary computer programs.

RESULTS AND DISCUSSION-The plasma of normal individuals was resolved on a gel filtration column. Two major peaks were observed at the ~150 and ~50 kDa regions of the overall elution silhouettes defined by absorbance at 280 nm.

Meanwhile, there is one major peak whose size is larger than 700 kDa and another at ~660 kDa. The patterns were quite similar among normal individuals (FIG. 31).

However, the patterns were altered in renal failure patients. First, in several patients, the separation between the 150 kDa and 50 kDa peaks was blurred, with another peak emerging between these two peaks. Second, the >700 kDa peak was abnormally elevated in these plasma specimens. For patients #2 and #4, this peak became dominant over the ~660 kDa peak (FIG. 32). In short, these observations confirm that the overall elution silhouettes change in patients with chronic renal failure.

We used a set of peptides to graph the specific GFC elution silhouettes for their corresponding proteins (Table 7). During the elution time of 11-16 min, we collected 0.1-ml fractions at 0.5-min intervals. The protein abundance indexes of these seven proteins were determined in reference to the added standard protein (Table 8). Quite strikingly, the three proteins that were expected to be in the same quaternary structure, fibrinogen α, β and γ, shared very similar peak shapes in their respective elution silhouettes. Since albumin is generally in dimeric or monomeric form, it was not detectable at all in this >700 kDa elution region. C4b-binding protein (C4b-BP) was eluted with a narrow peak base of ~0.6 min and a peak time of ~13.8 min. Although the peak time of C4b-BP was quite close to that of α2-macroglobulin, its elution silhouette was easily distinguished from that of α2-macroglobulin, which had a much wider peak base of ~2 min. Finally, haptoglobin, whose tertiary structure is ~50 kDa, was also detected in this region.

Based on its elution position, this haptoglobin-containing complex had an apparent MW > 700 kDa (FIG. 33; FIG. 34). These findings indicate that our procedure has the prowess to resolve quaternary structures based on their peak shapes.

EXAMPLE 4 Establishment of urine proteome STRUCTURE profile database Urine is the final destination of most plasma proteins. Therefore, analysis of the structural profile of the urine proteome should help us determine structural features of plasma proteins. We can acquire the structural information using the methodologies detailed in the current invention.

In the current invention, we followed a scheme to analyze the urine proteome structure profiles (FIG. 35). We first demonstrate how urine proteins were prepared for proteome structural analyses such as two-dimensional gel electrophoresis (FIG. 36). In the meantime, we compared the overall proteome structural profiles to identify the differentially expressed or modified proteins in diabetes mellitus patients (FIG. 37). We gathered the proteome composition information by analyzing the identities of urine proteins on these two-dimensional gels (Table 9). In these urine proteome primary structure profiling systems, the proteins are concentrated and de-salted using the methanol-chloroform method and then prepared for in-solution enzyme digestion. The peptide products are then subjected to LC-MS/MS analysis to determine the protein abundance profiles as well as post-translational modification status profiles. In order to carry out the quantitative mass spectrometric analysis, the proteome composition and PTM information are collected first.

Table 9 Results of protein identification analyses 3,..... Spotj 1 Collagen 1 16 Gelsolin 33 Pancreatic a-amylase, a2 amylase 3 Cadherin-l, 17 Gelsolin 34 Heparan albumin, periecan al antitrypsin 4 alB Apolipoprotein 35 Prostaglandin synthase, Ig A-IV, heavy chain, beta-trace a2-glycoprotein 5 alB glycoprotein, albumin 6 Acida 20 Zn-a2-glycoprotein 36 Ig glucosidase, albumin 7 Acid 21 Fibrinogen 37 Ig chain glucosidase, endothelial cell albumin protein C receptor 8 22 Fibrinogen chain 9 Kininogen 23 Inter-a-tTypsin 39 MApl99 regenerating islet lectin inhibitor heavy la, chain 10 a2 thiol 24 Inter-a-trypsin proteinase, inhibitor heavy a2 glycoprotein, 25 ER glycoprotein 41 Ra-reactive factor uromodulin, protein C inhibitor 11 al-antitrypsin, 26 AMBP protein 42 Polymeric immunoglobulin angiotensionogen receptor, Mac-2 binding 12 al-antitrypsin, 27 Prostaglandin D thyroxine-binding synthase protein 13 al-antitrypsin 28 AMBP protein 43 Albumin 14 Antithrombin, 29 Prostaglandin 44 Uromodulin, Cl inhibitor, Mac-2 albumin, al synthase binding glycoprotein antitrypsin, 4F2 30 Albumin, 45 Orosomucoid 2, al-acid cell surface beta-trace glycoprotein 1, AMBP, pepsin A antigen, a2-antiplasmin 15 C4A

Precipitation of urine proteome-To the urine were added 4X volumes of methanol, 1X volume of chloroform and 3X volumes of water. After vigorous mixing by vortex for 10 min, the whole mixture was centrifuged at 3000 rpm on a tabletop centrifuge for 10 min. The upper organic phase was removed and 0.5 ml of methanol was added. After centrifugation at 3000 rpm on a tabletop centrifuge for 15 min, the supernatant was removed. The pellet was washed with cold acetone twice, and then resuspended in 9M urea.

Two-dimensional gel electrophoresis-The concentrated urine proteins were diluted with 94 µl of lysis buffer (containing 7 M urea, 2 M thiourea, 2% DTT, 4% CHAPS and 2% pH 3-10 IPG buffer) and placed onto an 18-cm immobilized pH gradient (IPG) gel, which separates proteins according to their isoelectric points (pIs).

The instrument, an IPGphor IEF system, as well as the IPG gels were purchased from Amersham Pharmacia Biotech. After the first-dimension experiment was finished, the proteins were resolved by SDS-PAGE analysis. The gels were developed by Coomassie blue staining.

2-D difference gel electrophoresis-The urine proteins were diluted in 7M urea/2M thiourea containing 4% CHAPS, 2% DTT, 2% pH4-7 IPG buffer and then subjected to Cy3 or Cy5 dye labeling per instruction of the manufacturer (Amersham).

After centrifugation at 10,000 g for 15 min, 400 µg of the sample were resolved by isoelectric focusing for 100,000 Vh. After equilibration in the buffer containing 50 mM Tris-HCl pH 8.8, 6M urea, 30% glycerol and 0.02% SDS, the IPG gel strip was placed at the top of a 10-15% gradient gel per instruction of the manufacturer (Bio-Rad). The gel image was developed in a fluorescence image scanner FLA-5000 (Fuji).

In-gel digestion-The gel piece containing the studied protein was first soaked in 25 mM NH4HCO3 for 10 min and then in 25 mM NH4HCO3/50% acetonitrile for 10 min. After drying in a Speed-Vac (Savant), the gel was incubated in 100 µl of 2% β-mercaptoethanol/25 mM NH4HCO3 for 20 min at room temperature in the dark.

The same volume of 10% 4-vinylpyridine in NH4HCO3/50% acetonitrile was added for cysteine alkylation. After a 20-min incubation, the gel was soaked in 1 ml of 25 mM NH4HCO3 for 10 min. After washing in 25 mM NH4HCO3/50% acetonitrile for 10 min, the gel was dried and then incubated with 25 mM NH4HCO3 containing 50 ng of modified trypsin (Promega) overnight (~18 h). The supernatant was separated from the gel, which was then extracted with 200 µl of 0.1% formic acid. The supernatant and the gel extract were combined and then dried in a Speed-Vac. The samples were kept at -20°C for storage and were resuspended in 0.1% formic acid immediately before use.

LC-MS/MS analysis-The identities of these proteins were determined using capillary liquid chromatography-ion trap tandem mass spectrometry. A ThermoFinnigan LCQ XP ion trap mass spectrometer was interfaced with an Agilent 1100D HPLC system equipped with a 150 x 0.3 mm Agilent 300SB C18 column (5 µm particle diameter, 300 Å pore size). The mobile phase consisted of water (solvent A) and acetonitrile (solvent B). The peptides were eluted at a constant flow rate of 1.0 µl/min, with the concentration of B increasing from 5 to 85% in 50 min. The spectra for the eluate were acquired as successive sets of three scan modes. The MS scan determined the intensity of the ions in the m/z range of 395 to 1605, and the most abundant ion in the MS spectrum was selected for ZOOM and MS/MS scans. The ZOOM scan is a slower scan over a narrow m/z range and examined the charge number of the selected ion. The MS/MS scan examined the fragment ions of the selected peptide ions and gave tandem mass spectra. The acquired tandem mass spectra were interpreted using a Finnigan Corporation software package, the SEQUEST Browser, which correlates the MS/MS spectra with the protein sequence database containing all human proteins. A 105.14-Da mass tag was constitutively assigned to the Cys residue, which was modified with 4-vinylpyridine during the protein cleavage step.
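The 105.14-Da tag is consistent with the addition of one 4-vinylpyridine molecule (C7H7N) to the cysteine side chain, and the pyridyl-ethyl cysteine residue mass of 208.067 quoted in the notes to Table 7 is consistent with the corresponding monoisotopic sum. The sketch below checks both numbers. It assumes that the tag is the average mass of C7H7N and that the Table 7 value is monoisotopic; the atomic masses are standard reference values rounded for illustration, and all names are ours.

```python
from typing import Dict

# Standard atomic masses (rounded); not taken from the source document.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740,
                "O": 15.9949146, "S": 31.9720707}
AVERAGE = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}

VINYLPYRIDINE = {"C": 7, "H": 7, "N": 1}                 # C7H7N
CYS_RESIDUE = {"C": 3, "H": 5, "N": 1, "O": 1, "S": 1}   # C3H5NOS


def formula_mass(formula: Dict[str, int], masses: Dict[str, float]) -> float:
    """Sum atomic masses over an elemental composition."""
    return sum(masses[element] * count for element, count in formula.items())


tag_average = formula_mass(VINYLPYRIDINE, AVERAGE)
pe_cys_mono = (formula_mass(CYS_RESIDUE, MONOISOTOPIC)
               + formula_mass(VINYLPYRIDINE, MONOISOTOPIC))

print(f"average mass of C7H7N: {tag_average:.2f} Da")
print(f"monoisotopic pyridyl-ethyl Cys residue: {pe_cys_mono:.3f} Da")
```

Whether a search engine expects the average or the monoisotopic shift depends on how its mass settings are configured, so the choice made here should be treated as an assumption.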

RESULTS AND DISCUSSION-In the current invention, we devise a urine proteome primary structure profiling system (FIG. 35). The proteins are concentrated and de-salted using the methanol-chloroform method and then prepared for in-solution enzyme digestion. The peptide products are then subjected to LC-MS/MS analysis to determine the protein abundance profiles as well as post-translational modification status profiles. In order to carry out the quantitative mass spectrometric analysis, the proteome composition and PTM information need to be collected first.

We used the same protein precipitation procedure to concentrate the urine proteins from normal individuals and then subjected them to regular two-dimensional gel electrophoresis (FIG. 37). Clearly, the overall patterns of DM patients and normal individuals were quite distinct. Two urine samples each were collected from two normal individuals and two DM patients. The urine proteins were concentrated and resolved by pI 4-7 isoelectric focusing followed by 10-15% gradient SDS polyacrylamide gels. The stars (X) indicate the proteins whose accumulation was observed in DM patients. More proteins were resolved in urine samples from DM patients. The protein identification analysis of these proteins is being carried out.

The proteins identified thus far are listed in Table 9. With our methodologies, we have identified several proteins which were not previously known to exist in the urine. These include collagen α1, cadherin 1, α1B glycoprotein, acid α-glucosidase, α2 thiol proteinase, uromodulin, protein C inhibitor, thyroxine-binding protein, antithrombin, 4F2 cell surface antigen, C4A, ER glycoprotein, prostaglandin D synthase, MAp19, regenerating islet lectin 1α, transthyretin, Ra-reactive protein, Mac-2 binding glycoprotein, C1 inhibitor, orosomucoid 2 and pepsin A. As there were many more proteins in the urine from diabetes mellitus patients, as well as from others under different conditions, the urine proteome composition information should keep expanding.

EXAMPLE 5 Determination of protein primary and quaternary structure profiles of a therapeutic antibody herceptin Herceptin is a therapeutic antibody used for treating breast cell carcinoma (Genentech). Here, we employed our protein structure profiling procedures to analyze the primary and quaternary structures of this monoclonal antibody. Herceptin was subjected to in-solution digestion and then analyzed for its post-translational modifications (FIG. 38). We identified several phosphorylation sites that have not been reported previously (Table 10, FIG. 39; for both figures, pS, pT and pY mean phosphorylated Ser, Thr and Tyr residues respectively). Meanwhile, we also used gel filtration chromatography to analyze its quaternary structures (FIG. 38).

Table 10 Modified peptides identified in herceptin

(Fields lost in the source are marked with "…"; remaining entries are reproduced in the order in which they appear.)

Sequence                              Position
pSCAApSGFNIKDTYIHWVR (SEQ …)          21-38
pTApYLQMNSLRAEDTAVYYCSR (SEQ …)       78-98
DIQMTQSPS… (SEQ …)                    1-24       2816.228   Heavy

Gel filtration chromatography-Gel filtration analysis of herceptin (Lot no. 1588) was performed on a Shodex PROTEIN KW-804 column (8 mm in diameter and 300 mm in length). The column was equilibrated with gel filtration equilibration buffer (PBS). The antibody was diluted with gel filtration equilibration buffer to a final volume of 100 µl. The mixture was applied to the column and the analysis was performed with a flow rate of 0.5 mL/min at 4°C using the SPECTRA P4000 HPLC system (ThermoFinnigan), with 0.1 ml fractions collected at a frequency of 5 fractions per min on a FRAC-300 fraction collector (Amersham Pharmacia).

In-solution protein digestion-Herceptin was denatured and dissolved in 9M urea. β-mercaptoethanol was added and the mixture was incubated in the dark for 20 min; 4-vinylpyridine was then added for another 20-min incubation in the dark. To the solution were added 4X volumes of methanol, 1X volume of chloroform and 3X volumes of water. After vigorous mixing by vortex for 10 min, the whole mixture was centrifuged at 3000 rpm on a tabletop centrifuge for 10 min. The upper organic phase was removed and 0.5 ml of methanol was added. After centrifugation at 3000 rpm on a tabletop centrifuge for 15 min, the supernatant was removed. The pellet was washed with cold acetone twice, and then resuspended in 9M urea. 0.1 µg of trypsin in 25 mM NH4HCO3 was added to the protein sample for digestion overnight.

Liquid chromatography-tandem mass spectrometric analysis-Electrospray mass spectrometry was performed on an ABI Q-STAR XL hybrid mass spectrometer.

The mass spectrometers were interfaced with an Agilent 1100D HPLC system equipped with 150 x 0.3 mm Agilent 300SB C18 columns (5 µm particle diameter, 300 Å pore size) with mobile phases of solvent A and B described above. The peptides were eluted in 50 min with 5 to 85% of acetonitrile at a flow rate of 2 µl/min.

The acquired CID spectra were interpreted using SEQUEST Browser (ThermoFinnigan). The MS/MS spectra were correlated with the protein sequence database containing only the target protein. Enzyme was not specified in the search parameters, which increases the confidence of identification when matched peptides have predicted cleavage sites. A 105.14-Da mass tag was constitutively assigned to the Cys residue, which was modified with 4-vinylpyridine in experiments. Those MS/MS scans that matched peptide sequences with expected cleavage sites were considered significant and were also subjected to further evaluation with our proprietary computer program to confirm the SEQUEST results. The hypothetical m/z values and the selected ion chromatograms were generated using another proprietary computer program. We particularly focused on finding methylation, simple glycosylation, acetylation, phosphorylation and other groups on these tested plasma proteins. SIC analysis was carried out with the AnalystQS software (Applied Biosystems) and our proprietary computer programs.

RESULTS AND DISCUSSION-Here, we employed our protein structure profiling procedures to analyze the primary and quaternary structures of this monoclonal antibody. This antibody was subjected to in-solution digestion and then analyzed for its post-translational modifications (FIG. 38). Thus far, we have identified three phosphorylated peptides that have not been reported previously (Table 10). The exact phosphorylation positions were mapped out using tandem mass spectrometry (FIG. 39).

We are working on the quantitative analysis of these modifications.

Meanwhile, we also used gel filtration chromatography to analyze its quaternary structures. Over 99% of the herceptin protein migrated as ~150-kDa proteins. Based on this figure, the quaternary structure of herceptin seems to be quite homogeneous. However, there was a minute signal in the 22.9-min region, whose identity remains to be analyzed.

EXAMPLE 6 REVERSE MOLECULAR BIOLOGY: mapping of the -1 ribosomal frameshift sites The -1 ribosomal frameshifting is an important post-transcriptional mechanism for regulating functions of a number of viral and cellular genes. Here, we report a new liquid chromatography-tandem mass spectrometric approach for exact mapping of the frameshift windows as well as verification of the -1 frameshifted translated protein products (FIG. 39). The method consists of three experimental steps: LC-MS/MS analysis of the protein digests, initial data analysis using the verified mRNA sequence and advanced data analysis using the sequences with single insertion mutations (FIG. 40). Through analyzing the chimeric protein IS3-MBP-His6 (FIG. 41; FIG. 42), we identified a transframe peptide DIGSLAILQKGR (SEQ ID NO: 37) that indicates the -1 frameshifting at the previously documented slippage sequence, the A4G motif (FIG. 43, FIG. 44). Now that an overview is provided, each Figure is explained in more detail below.

In one embodiment, one novel method is used to verify the occurrence of frameshifting and to identify the -1 ribosomal frameshift window. Figure 41 is a diagram illustrating the method. The whole procedure consists of three experimental steps: (1) The protein product is digested and then subjected to LC-MS/MS analysis; (2) The acquired MS spectra are initially analyzed with the regular mRNA sequence database; and (3) The MS/MS spectra are further analyzed with the mRNA sequences with single insertion mutations. A signature peptide is identified for defining the exact ribosomal frameshift window. The black line underscores the nucleotides covered by the most downstream +0-frame peptide while the hatched line highlights the residues covered by the most upstream -1-frame peptide. The shaded nucleotide sequence denotes the region harboring the -1 ribosomal frameshift window.

Figure 42 is a diagram illustrating how to generate mRNA sequences with single insertion mutations. In the region harboring the putative -1 frameshift window, a nucleotide residue is inserted every three nucleotides. The insertion position is at the rear of each frame 0 codon and the inserted residue is a duplicate of the third nucleotide of the codon. WT: wild-type; IM: insertion mutant.

Figure 43 is a diagram illustrating the composition of the IS3-MBP-His6 reporter gene. The numbers above the nucleotide sequence denote the positions of these residues relative to the translational start codon of the IS3-MBP-His6 gene. The amino acid sequences listed below are the putative products using frame 0 and frame -1. The residues supposed to be translated upon -1 ribosomal frameshifting are underlined. A short tilted line denotes the shift from frame 0 to frame -1. BamHI, BssHII and Ecl136II indicate the restriction sites used during construction of the vector. T7: T7 promoter; fsw: frameshift window; His6: histidine hexamer tag.

Figure 44 shows the expression of the -1 frameshifted IS3-MBP-His6 protein products. Left: the proteins purified by nickel affinity chromatography were resolved on a 10% SDS polyacrylamide gel and later stained with Coomassie blue dye. Right: proteins were electrotransferred to a polyvinylidene difluoride membrane, which was probed with MBP-specific monoclonal antibody and developed with chemiluminescent reagents. The arrowheads indicate the positions of the major frameshift product in the gel and on the membrane. The numbers at the left of the gel and the membrane denote the migration of molecular size markers.

Plasmid construction - Standard molecular cloning techniques were used for construction of all plasmids. For construction of plasmid pEM-IS3-MBP-His6, the plasmid expressing IS3-MBP-His6 protein, an IS3 fragment containing the slippery sequence was first amplified from the chromosomal DNA of E. coli strain XL-1 Blue with the primers IS3(315-338) (5'-GGATCCCTGGCTATCCTCCAAAAGGCCGCG-3' (SEQ ID NO: 31)) and IS3(440-410) (5'-GGCGCGCCACCCGGAGCACGCGGCACATTG-3' (SEQ ID NO: 32)). The PCR product was cloned into the pGEMT-Easy vector to generate pGEMT-IS3. A BssHII/Ecl136II fragment of the E. coli maltose-binding protein (malE) sequence from pMAL-C2 was inserted into the corresponding sites of pGEMT-IS3 to yield pGM-IS3-MBP. Finally, pEM-IS3-MBP-His6 was generated by inserting the BamHI/Ecl136II fragment from pGM-IS3-MBP into the T7 expression vector pET29a.

Purification of transframe protein product - Escherichia coli BL21 (DE3) was transformed with the indicated plasmid and then grown in LB broth supplemented with kanamycin (50 µg/ml). When the bacterial culture had an OD600 of 0.8 at its 1:100 dilution, IPTG was added to a final concentration of 1 mM and the culture was incubated for another 2.5 h. The cells were then harvested and disrupted with B-PER bacterial protein extraction reagent (Pierce) and lysis buffer (50 mM NaH2PO4, 10 mM Tris-HCl, 8 M urea, 500 mM NaCl, pH 8.0). The whole cell lysate was clarified by centrifugation and the solubilized recombinant protein was purified by affinity chromatography on ProBond™ resin (Invitrogen). A column with 2 ml of bed volume was washed with 15 ml of lysis buffer, loaded with 30 ml of cell extract, and washed first with 50 ml of lysis buffer and then with 50 ml of wash buffer (500 mM NaCl, 20 mM NaPO4 buffer, pH 6.0). Bound proteins were eluted with 30 ml of elution buffer (150 mM imidazole, 500 mM NaCl, 20 mM NaPO4 buffer, pH 6.0).

These proteins were concentrated and washed extensively with wash buffer using an Amicon Ultra PL-30 (Millipore). The purified proteins were resolved on 10% SDS polyacrylamide gels. Some gels were developed using the Coomassie blue staining method. For other gels, the proteins were electrotransferred onto polyvinylidene difluoride membranes. The membranes were probed with anti-MBP antibody and then with horseradish peroxidase-conjugated anti-mouse IgG antibodies. After incubation with the chemiluminescent reagents, the signals on the membranes were detected.

In-gel tryptic digestion - The gel piece containing polypeptides was soaked in 25 mM NH4HCO3 for 10 min and then in 25 mM NH4HCO3/50% acetonitrile for 10 min. After drying in a Speed-Vac (Savant), the gel was incubated in 100 µl of 2% β-mercaptoethanol/25 mM NH4HCO3 for 20 min at room temperature in the dark. The same volume of 10% 4-vinylpyridine in NH4HCO3/50% acetonitrile was added for cysteine alkylation. After a 20-min incubation, the gel was soaked in 1 ml of 25 mM NH4HCO3 for 10 min. After being washed in 25 mM NH4HCO3/50% acetonitrile for 10 min, the gel was dried and then incubated with 25 mM NH4HCO3 containing 50 ng of modified trypsin (Promega) or 0.2 µg of Asp-N (Roche) overnight (18 h). The digest was removed from the gel, which was then extracted with 200 µl of 0.1% formic acid. These two fractions were combined, dried in a Speed-Vac and kept at -20°C for storage. The sample was resuspended in 0.1% formic acid immediately before mass spectrometric analysis.

Mass spectrometric analysis - Electrospray ionization-ion trap tandem mass spectrometry was performed using a ThermoFinnigan LCQ Deca XP ion trap mass spectrometer interfaced with an Agilent 1100D HPLC system. A 150 × 0.3 mm Agilent 300SB C18 column (3 µm particle diameter, 300 Å pore size) with mobile phases of A (0.1% formic acid in water) and B (0.1% formic acid in acetonitrile) was used. The peptides were eluted at a flow rate of 2 µl/min with an acetonitrile gradient, which consisted of 5-16% B in 5 min, 16-20% B in 40 min, and 20-65% B in 40 min.
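The gradient program above can be written out as a piecewise-linear function of time. The following is only a minimal sketch: it assumes that the three segments run back to back starting at time zero (85 min in total) and that each segment is linear, neither of which is stated explicitly in the text. The names GRADIENT and percent_b are hypothetical.

```python
from bisect import bisect_right

# (time in min, %B) breakpoints, assuming the three segments are consecutive:
# 5-16% B over 5 min, 16-20% B over 40 min, 20-65% B over 40 min.
GRADIENT = [(0.0, 5.0), (5.0, 16.0), (45.0, 20.0), (85.0, 65.0)]


def percent_b(t_min: float) -> float:
    """Linearly interpolate %B at time t_min; clamp outside the program."""
    times = [t for t, _ in GRADIENT]
    if t_min <= times[0]:
        return GRADIENT[0][1]
    if t_min >= times[-1]:
        return GRADIENT[-1][1]
    i = bisect_right(times, t_min) - 1  # index of the segment containing t_min
    (t0, b0), (t1, b1) = GRADIENT[i], GRADIENT[i + 1]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
```

Under these assumptions, the shallow middle segment (0.1% B per min) is where most of the peptide separation would take place.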

The spectra for the eluate were acquired as successive sets of three scan modes. The MS scan determined the intensity of the ions in the m/z range of 395 to 1605, and the most abundant ion was selected for zoom scan and MS/MS scan. The former examined the charge number of the selected ion and the latter acquired the tandem mass spectrum for the fragment ions derived by collision-induced dissociation. The acquired CID spectra were interpreted using a ThermoFinnigan Corporation software package, the TurboSEQUEST Browser, which correlated the MS/MS spectrum with the indicated databases. In the initial data analysis, the nucleotide sequence of IS3-MBP-His6 was used for matching the MS/MS spectra. In the advanced data analysis, we first generated a series of nucleotide sequences with insertion mutations in the putative frameshift region (FIG. 36). Over the putative frameshift region, the last nucleotide of every codon in frame 0 was duplicated as the inserted residue, and every insertion mutant contained only one mutation.
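The construction of the insertion-mutant sequences can be sketched in a few lines. This is an illustrative sketch only, not the software used in the experiments: the function name, the 0-based codon indexing and the toy sequence are assumptions, and only the first two codons (ATG, AAA) follow the example given in the text.

```python
def single_insertion_mutants(mrna: str, first_codon: int, last_codon: int):
    """Return (position, mutant) pairs for frame 0 codons first_codon..last_codon.

    Codon indices are 0-based and inclusive. For each codon, the third
    nucleotide is duplicated immediately behind the codon, so that every
    mutant carries exactly one insertion. `position` is the 1-based position
    of the duplicated nucleotide in the wild-type sequence.
    """
    mutants = []
    for codon in range(first_codon, last_codon + 1):
        end = 3 * (codon + 1)  # number of nucleotides up to the end of this codon
        mutants.append((end, mrna[:end] + mrna[end - 1] + mrna[end:]))
    return mutants


# Hypothetical toy sequence: a G is added behind ATG, an A behind AAA.
for position, mutant in single_insertion_mutants("ATGAAACTGGCT", 0, 1):
    print(position, mutant)
```

The resulting sequences would then be supplied to the database search in place of, or in addition to, the wild-type sequence.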

No enzyme was specified in the search parameters, which increased the confidence of identification when the matched peptides nevertheless had the appropriate cleavage sites. A 105.14-Da mass tag was constitutively assigned to the Cys residue, which was modified with 4-vinylpyridine in the experiments. Those MS/MS scans that matched peptide sequences with cleavage sites at adequate positions were considered significant and were also subjected to evaluation with proprietary computer software to confirm the search results.
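The 105.14-Da tag corresponds to the average mass of one 4-vinylpyridine molecule (C7H7N) added across the cysteine thiol. The short calculation below reproduces this value; it is a sketch that uses rounded standard average atomic masses, and the variable and function names are hypothetical.

```python
# Rounded standard average atomic masses (Da).
AVERAGE_ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007}

# 4-Vinylpyridine, C7H7N. Alkylation adds the whole molecule to the Cys thiol,
# so the mass shift of the residue equals the mass of the reagent.
VINYLPYRIDINE = {"C": 7, "H": 7, "N": 1}


def average_mass(formula: dict) -> float:
    """Sum the average atomic masses over an elemental composition."""
    return sum(AVERAGE_ATOMIC_MASS[element] * n for element, n in formula.items())


cys_mass_tag = average_mass(VINYLPYRIDINE)
print(f"{cys_mass_tag:.2f} Da")
```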

RESULTS AND DISCUSSION - The simultaneous slippage mechanism predicts that the third nucleotide of a codon in the region containing the heptanucleotide slippery sequence is used again as the first nucleotide of the next codon. Based on this model, the transframe protein product is equivalent to the protein product from the gene template carrying a particular single insertion mutation. The inserted nucleotide is the duplicated third nucleotide of the first of these two codons.
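The equivalence stated above can be checked directly. The sketch below translates a sequence once with a simulated -1 slip (the third nucleotide of a codon is read again as the first nucleotide of the next codon) and once from the corresponding single insertion mutant, and the two products agree. The standard genetic code is used; the function names and the toy sequence are hypothetical and are not taken from the IS3-MBP-His6 construct.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq: str) -> str:
    """Translate consecutive complete codons from the first nucleotide on."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def translate_with_minus1_slip(mrna: str, codon: int) -> str:
    """Read frame 0 through codon `codon` (0-based), then step back one
    nucleotide so that its third base is re-read as the next first base."""
    end = 3 * (codon + 1)
    return translate(mrna[:end]) + translate(mrna[end - 1:])


def insertion_mutant(mrna: str, codon: int) -> str:
    """Duplicate the third nucleotide of codon `codon` right behind it."""
    end = 3 * (codon + 1)
    return mrna[:end] + mrna[end - 1] + mrna[end:]


toy = "ATGAAACTGGCT"  # hypothetical sequence
print(translate_with_minus1_slip(toy, 1), translate(insertion_mutant(toy, 1)))
```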

In practice, the analysis consists of three steps: LC-MS/MS analysis of the protein digests, the first data analysis with the verified mRNA sequence and the second data analysis with the single insertion mutant mRNA sequences (FIG. 41). In the first step, we cleave the tested protein in the gel piece with adequate endopeptidases, such as trypsin and Asp-N, and collect the digestion products. The digested products are then subjected to LC-MS/MS analysis to acquire the MS/MS spectra of these peptides.

In the second step, the MS/MS spectra are matched against the verified nucleotide sequence for identifying the possible frameshift region. The MS/MS spectra from a transframe protein product should be able to match the amino-terminus of the frame 0 translated sequence and the carboxyl-terminus of the frame -1 translated sequence. Through the initial data analysis, we observe the most downstream peptide in the frame 0 sequence and the most upstream peptide in the frame -1 sequence. The putative frameshift region is located between these two peptides.
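The bracketing of the putative frameshift region can be expressed as a small helper. This is a sketch under stated assumptions: each identified peptide is represented by the 1-based nucleotide interval that encodes it, the function name is hypothetical, and the coordinates in the usage lines are invented for illustration.

```python
def putative_frameshift_region(frame0_peptides, frame_minus1_peptides, gene_start=1):
    """Return (first_nt, last_nt) of the region between the most downstream
    frame 0 peptide and the most upstream frame -1 peptide.

    Each peptide is a (first_nt, last_nt) pair, 1-based and inclusive. If no
    frame 0 peptide was found, the region starts at gene_start. At least one
    frame -1 peptide is required; otherwise min() raises ValueError.
    """
    lower = max((end for _, end in frame0_peptides), default=gene_start - 1) + 1
    upper = min(start for start, _ in frame_minus1_peptides) - 1
    return lower, upper


# Hypothetical coordinates: no frame 0 peptide, first frame -1 peptide at nt 87.
print(putative_frameshift_region([], [(87, 98), (120, 150)]))
# Hypothetical coordinates: one frame 0 peptide covering nt 1-30.
print(putative_frameshift_region([(1, 30)], [(87, 98)]))
```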

In the last step of analysis, we first generate a series of nucleotide sequences with insertion mutations in the putative frameshift region (FIG. 42). Over the putative frameshift region, the last nucleotide of every codon in frame 0 is duplicated as the inserted residue, and every insertion mutant contains only one mutation. The MS/MS spectra are then analyzed again against the database containing these insertion mutant sequences. If there exists one -1 frameshift site in this region and the simultaneous slippage mechanism is used, we should be able to find a transframe peptide that matches only one mutant sequence in the database. Examination of the corresponding nucleotide sequence of the transframe peptide will reveal where the frameshifting occurs. If there indeed exist multiple frameshift sites and distinct translated products are generated, this method should allow us to identify all these locations in the same region by observing different transframe peptides.
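The logic of this last step can be illustrated with a simplified stand-in for the database search. In the sketch below, a plain substring match of a peptide sequence against the translated mutants replaces the actual correlation of MS/MS spectra, only frame 0 of each mutant is translated, and the function names and the toy sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq: str) -> str:
    """Translate consecutive complete codons from the first nucleotide on."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def find_frameshift_sites(mrna: str, peptide: str, first_codon: int, last_codon: int):
    """Return the 1-based positions of duplicated nucleotides whose insertion
    mutant encodes `peptide`, for a peptide absent from the wild-type frame 0
    product. Codon indices are 0-based and inclusive."""
    if peptide in translate(mrna):
        return []  # an ordinary frame 0 peptide says nothing about frameshifting
    sites = []
    for codon in range(first_codon, last_codon + 1):
        end = 3 * (codon + 1)
        mutant = mrna[:end] + mrna[end - 1] + mrna[end:]
        if peptide in translate(mutant):
            sites.append(end)
    return sites


# Hypothetical toy sequence with codons ATG GAT ATC CTC CAA AAA GGC CGC TTG ACC.
TOY_MRNA = "ATGGATATCCTCCAAAAAGGCCGCTTGACC"
print(find_frameshift_sites(TOY_MRNA, "LQKRPL", 0, 9))  # spans the shift site
print(find_frameshift_sites(TOY_MRNA, "RPLD", 0, 9))    # lies entirely in frame -1
```

In this toy case the transframe peptide LQKRPL is assigned to positions 15 and 18, because duplicating an A within a run of A residues yields the same mutant sequence at either position. The peptide RPLD, which lies entirely in frame -1, is found in every mutant whose insertion is upstream of it, which is why a peptide spanning the shift site is needed to define the frameshift window.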

In order to validate our analytical method, we first employed it to verify the frameshifting potential of a previously documented frameshift window, the A4G motif, from the Escherichia coli IS3 sequence. We first constructed a vector that expresses IS3-MBP-His6 protein in bacterial cells. The IS3-MBP-His6 gene is a synthetic construct that contains two heterologous ORFs. The 5' ORF is part of the IS3 insertion sequence segment, whereas the 3' ORF encodes part of the maltose-binding protein with six His residues (MBP-His6) at its 5' region. The MBP-His6 nucleotide sequence is deliberately fused in-frame with the -1 reading frame of the IS3 nucleotide sequence. The MBP-His6 polypeptide, once expressed, should allow us to rapidly detect the usage of the frame -1 sequence. It is also of great help in purification of the supposed transframe protein for subsequent mass spectrometric analysis (FIG. 43).

This vector was introduced into Escherichia coli cells and IS3-MBP-His6 protein expression was induced. The proteins containing the MBP-His6 domain in the whole cell extract were purified with nickel affinity chromatography. The purified polypeptides were resolved by SDS-polyacrylamide gel electrophoresis, which revealed that the nickel column enriched a multitude of polypeptides. Western blot analysis showed that only two bands were recognized by MBP-specific antibodies, indicating that the 3' ORF had been used in these protein products. These two bands have apparent molecular weights of ~32.5 K and ~29.9 K respectively, with much higher staining intensity for the upper one (FIG. 44). However, it was not clear whether these polypeptides indeed contained the sequence encoded by the 5' ORF and were the true transframe proteins.

Based on the verified IS3-MBP-His6 sequence, the transframe product is supposed to have 302 amino acid residues and a molecular weight of 33,334. Since the most intensely stained band had a molecular size of ~32.5 kDa, this band was supposed to be the intact transframe protein. Thus, this band was excised and subjected to in-gel digestion with trypsin and Asp-N. The first LC-MS/MS analysis using the verified mRNA sequence identified eleven Asp-N and nineteen tryptic peptides, which covered 648 of 909 nucleotides (71%). All these peptides were found in frame -1 downstream of the putative fsw position (FIG. 45). Figure 45 summarizes the result of the initial data analysis on IS3-MBP-His6. The amino acid sequences of frame 0 and frame -1 are listed below the nucleotide sequence of the IS3-MBP-His6 gene. The numbering at right indicates the position of a nucleotide relative to the translation start codon. The shaded characters highlight the residues corresponding to the recovered peptides. The hatched lines underscore the amino acids in the -1 frame identified during the analysis. The box indicates the putative frameshift window. In accordance with the results of the Western blot analysis, these results indicate the inclusion of the MBP-His6 domain in the protein product, while it was yet to be determined whether any of the 5' ORF sequence was translated into this polypeptide.
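The coverage figure quoted above is the number of nucleotides encoded by at least one recovered peptide divided by the length of the gene. A minimal sketch of that bookkeeping follows; the function name and the intervals in the usage line are hypothetical, and only the totals 648 and 909 come from the text.

```python
def covered_nucleotides(intervals) -> int:
    """Count nucleotides covered by at least one (first_nt, last_nt) interval.

    Intervals are 1-based and inclusive; overlapping peptides are counted once.
    """
    covered = 0
    current_end = 0  # last nucleotide already counted
    for start, end in sorted(intervals):
        start = max(start, current_end + 1)  # skip the part already counted
        if end >= start:
            covered += end - start + 1
            current_end = end
    return covered


# Hypothetical overlapping peptides: nt 1-20 and nt 30-40 are covered.
print(covered_nucleotides([(1, 10), (5, 20), (30, 40)]))

# Totals reported in the text: 648 of 909 nucleotides.
print(f"{100 * 648 / 909:.0f}%")
```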

Among the frame -1 peptides, DILR is the most upstream one (FIG. 45), while we found no peptides from the 5' ORF. This indicated that the -1 frameshifting should occur before nucleotide 87 of the IS3-MBP-His6 gene. Based on this prediction, we generated a series of single insertion mutants with duplication of the third residue of each frame 0 codon. For instance, the first insertion mutant has a G added behind the first frame 0 codon, ATG, while the second mutant has an A added behind the second frame 0 codon, AAA (FIG. 43; FIG. 46). In Figure 46, the a, b and y ions are assigned based on the common nomenclature system. Their positions in the peptide are summarized on the peptide sequence at top.

The LC-MS/MS data were then analyzed again against the database containing these single insertion mutants. Besides those peptides identified during the first analysis, one specific peptide, DIGSLAILQKGR (SEQ ID NO: 33), was identified according to its tandem mass spectrum (FIG. 46). Among all mutant sequences in the database, it only matched the sequence with insertion of a duplicated adenosine residue at either position 75 or 78, indicating that either position could be the exact -1 ribosomal frameshift site. This peptide has the 18th residue of the frame 0 sequence at its amino-terminus and the 28th residue of the frame -1 sequence at its carboxyl-terminus, indicating that it is a transframe peptide. The identification of this peptide indicates not only that the 5' ORF was indeed used in this protein product but also that this polypeptide is a bona fide transframe protein. Therefore, our results successfully demonstrated that the frameshifting potential of the A4G slippery signal was preserved in such a synthetic reporter system.

Methods for the practice of the invention have been described. Other details are known biological techniques and may be found, for example, in Yates et al., 1995a; Yates et al., 1995b; Tsay et al., 2000; Chiu et al., 2002; Chen et al., 2002; Wang et al., 2003.

CITED REFERENCES

Anderson, N. L., and Anderson, N. G. (2002) The human plasma proteome: history, character, and diagnostic prospects. Mol. Cell. Proteomics 1, 845-867

Betts, J. C., Blackstock, W. P., Ward, M. A., and Anderton, B. H. (1997) Identification of phosphorylation sites on neurofilament proteins by nanoelectrospray mass spectrometry. J. Biol. Chem. 272, 12922-12927

Bjellqvist, B., Pasquali, C., Ravier, F., Sanchez, J. C., and Hochstrasser, D. (1993) A nonlinear wide-range immobilized pH gradient for two-dimensional electrophoresis and its definition in a relevant pH scale. Electrophoresis 14, 1357-1365

Carr, S. A., Huddleston, M. J., and Annan, R. S. (1996) Selective detection and sequencing of phosphopeptides at the femtomole level by mass spectrometry. Anal. Biochem. 239, 180-192, doi: 10.1006/abio.1996.0313

Chen, C.-W., Tsay, Y.-G., Wu, H.-L., Lee, C.-H., Chen, D.-S., and Chen, P.-J. (2002) The double-stranded RNA-activated kinase, PKR, can phosphorylate hepatitis D virus small delta antigen at functional serine and threonine residues. J. Biol. Chem. 277, 33058-33067

Chiu, C.-M., Tsay, Y.-G., Chang, C.-J., and Lee, S.-C. (2002) Nopp140 is a mediator of the protein kinase A signaling pathway that activates the acute phase response alpha1-acid glycoprotein gene. J. Biol. Chem. 277, 39102-39111

Jonscher, K. R., and Yates, J. R., 3rd (1997) The quadrupole ion trap mass spectrometer - a small solution to a big challenge. Anal. Biochem. 244, 1-15, doi: 10.1006/abio.1996.9877

Kenyon, G. L., DeMarini, D. M., Fuchs, E., Galas, D. J., Kirsch, J. F., Leyh, T. S., Moos, W. H., Petsko, G. A., Ringe, D., Rubin, G. M., and Sheahan, L. C. (2002) Defining the mandate of proteomics in the post-genomics era: workshop report. Mol. Cell. Proteomics 1, 763-780

Li, S., and Dass, C. (1999) Iron(III)-immobilized metal ion affinity chromatography and mass spectrometry for the purification and characterization of synthetic phosphopeptides. Anal. Biochem. 270, 9-14

Neubauer, G., and Mann, M. (1999) Mapping of phosphorylation sites of gel-isolated proteins by nanoelectrospray tandem mass spectrometry: potentials and limitations. Anal. Chem. 71, 235-242

Neville, D. C., Rozanas, C. R., Price, E. M., Gruis, D. B., Verkman, A. S., and Townsend, R. R. (1997) Evidence for phosphorylation of serine 753 in CFTR using a novel metal-ion affinity resin and matrix-assisted laser desorption mass spectrometry. Protein Sci. 6, 2436-2445

Oda, Y., Nagasu, T., and Chait, B. T. (2001) Enrichment analysis of phosphorylated proteins as a tool for probing the phosphoproteome. Nat. Biotechnol. 19, 379-382

Posewitz, M. C., and Tempst, P. (1999) Immobilized gallium(III) affinity chromatography of phosphopeptides. Anal. Chem. 71, 2883-2892

Tsay, Y.-G., Wang, Y.-H., Chiu, C.-M., Shen, B.-J., and Lee, S.-C. (2000) A strategy for identification and quantitation of phosphopeptides by liquid chromatography/tandem mass spectrometry. Anal. Biochem. 287, 55-64

Yates, J. R., 3rd, Eng, J. K., McCormack, A. L., and Schieltz, D. (1995a) Method to correlate tandem mass spectra of modified peptides to amino acid sequences in the protein database. Anal. Chem. 67, 1426-1436

Yates, J. R., 3rd, Eng, J. K., and McCormack, A. L. (1995b) Mining genomes: correlating tandem mass spectra of modified and unmodified peptides to sequences in nucleotide databases. Anal. Chem. 67, 3202-3210

Zhou, H., Watts, J. D., and Aebersold, R. (2001) A systematic approach to the analysis of protein phosphorylation. Nat. Biotechnol. 19, 375-378