


Title:
DETECTING SINGLE AMINO ACID POLYMORPHISMS BY ISOTOPIC LABELING AND MASS SPECTROMETRY
Document Type and Number:
WIPO Patent Application WO/2007/134117
Kind Code:
A2
Abstract:
A combination of split-field drift tube/mass spectrometry and isotopic labeling techniques is described for identifying single amino acid polymorphisms (SAAPs).

Inventors:
CLEMMER DAVID E (US)
VALENTINE STEPHEN J (US)
Application Number:
PCT/US2007/068584
Publication Date:
November 22, 2007
Filing Date:
May 09, 2007
Assignee:
UNIV INDIANA RES & TECH CORP (US)
PREDICTIVE PHYSIOLOGY AND MEDI (US)
CLEMMER DAVID E (US)
VALENTINE STEPHEN J (US)
International Classes:
G06F19/22
Other References:
JULKA ET AL.: 'Quantification in proteomics through stable isotope coding: A review' JOURNAL OF PROTEOME RESEARCH vol. 3, no. 3, May 2004 - June 2004, pages 350 - 363
GATLIN ET AL.: 'Automated identification of amino acid sequence variations in proteins by HPLC/microspray tandem spectrometry' ANAL. CHEM. vol. 72, no. 4, February 2000, pages 757 - 763
LIU ET AL.: 'Technique Review: Development of high throughput dispersive LC-ion mobility-TOFMS techniques for analyzing the human plasma proteome' BRIEFINGS IN FUNCTIONAL GENOMICS AND PROTEOMICS vol. 3, no. 2, August 2004, pages 177 - 186
Attorney, Agent or Firm:
ADDISON, Bradford, G. (11 South Meridian Street, Indianapolis, IN, US)
Claims:

1. A method of detecting the presence of a single amino acid polymorphism in a complex mixture of peptides, said method comprising the steps of providing a first and second sample, said first and second samples comprising a plurality of peptides; labeling the peptides of the first sample with a first label and labeling the peptides of the second sample with an isotopic derivative of said first label; mixing the labeled first and second samples together to provide a mixed composition; and subjecting the mixed composition to analysis by a gas-phase separation technique and time of flight (TOF) mass spectrometer to detect peptides in the first sample having a single amino acid polymorphism relative to peptides in the second sample.

2. The method of claim 1 wherein the gas-phase separation technique is selected from the group consisting of ion mobility spectrometry and field asymmetric waveform ion mobility spectrometry.

3. The method of claim 1 wherein the gas-phase separation technique is ion mobility spectrometry.

4. The method of claim 3 wherein said peptides have an average size ranging from about 5 to about 30 amino acids.

5. The method of claim 1 wherein a third sample is provided, wherein the step of labeling the peptides comprises labeling the first, second and third samples with isotopic labels, in a manner that allows the first, second and third sample peptides to be distinguished from one another through the use of mass spectrometry, wherein the first, second and third samples are then mixed together and multiplexed in a single experiment.

6. The method of claim 5 wherein a fourth sample is provided, wherein the step of labeling the peptides comprises labeling the first, second, third and fourth samples with isotopic labels, in a manner that allows the first, second, third and fourth sample peptides to be distinguished from one another through the use of mass spectrometry, wherein the first, second, third and fourth samples are then mixed together and multiplexed in a single experiment.

7. The method of claim 1 wherein the isotopic derivative of the first label is a deuterium analogue of the first label.

8. The method of claim 7 wherein the first label comprises a methyl group and the isotopic derivative of the first label comprises a CD₃ group substituting for the methyl group of the first label.

9. The method of claim 3 wherein said ion mobility spectrometer is a split-field drift tube mass spectrometer.

10. The method of claim 9 further comprising the step of conducting collision induced dissociation in the second, short mobility region at the back of the split-field drift tube.

11. The method of claim 1 further comprising the step of separating the labeled peptides by 2-D liquid chromatography techniques prior to the step of subjecting the mixed composition to analysis by a gas-phase separation technique and time of flight (TOF) mass spectrometer.

12. The method of claim 1 further comprising the step of examining ion mobility spectrometer peak shapes and positions as a means of confirming single amino acid polymorphism peptides that are indicated based on m/z analysis.

13. A method of detecting the presence of a single amino acid polymorphism in a complex mixture of peptides, said method comprising the steps of isolating a first and second set of polypeptides from two distinct biological sources; cleaving said first and second sets of polypeptides to produce a first and second sample, said first and second samples comprising a plurality of peptides wherein the average length of said peptides is within a range of about 5 to 30 amino acids; labeling the peptides of the first sample with a first label and labeling the peptides of the second sample with an isotopic derivative of said first label; mixing the labeled first and second samples together to provide a mixed composition; electrospraying the mixed composition to create labeled peptide ions; and subjecting the labeled peptide ions to analysis by an ion mobility spectrometer and time of flight (TOF) mass spectrometer to detect peptides in the first sample having a single amino acid polymorphism relative to peptides in the second sample.

14. The method of claim 13 wherein the polypeptides are cleaved with trypsin.

15. The method of claim 13, wherein the labeled peptide ions are stored in an ion funnel trap prior to subjecting the labeled peptide ions to the combination of ion mobility spectrometry and mass spectrometry.

16. A method for determining the relative abundance and identity of single amino acid polymorphisms present in a sample, said method comprising the steps of providing a first and second sample, said first and second samples comprising a plurality of peptides; labeling the peptides of the first sample with a first label and labeling the peptides of the second sample with an isotopic derivative of said first label; mixing the labeled first and second samples together to provide a mixed composition; electrospraying the mixed composition to create labeled peptide ions; storing the labeled peptide ions in an ion funnel trap; introducing a pulse of the labeled peptide ions into a split-field ion mobility spectrometer and time of flight (TOF) mass spectrometer for analysis to detect peptides in the first sample having a single amino acid polymorphism relative to peptides in the second sample; introducing a second pulse of the labeled peptide ions into the split-field ion mobility spectrometer and subjecting said second pulse of the labeled peptide ions to collision induced dissociation to generate a ladder of peptide fragments; and determining the amino acid sequence of the single amino acid polymorphism peptide through analysis of the mass spectrometer data.

17. The method of claim 16 further comprising the step of examining IMS peak shapes and positions, prior to introducing the second pulse of the labeled peptide ions into the split-field ion mobility spectrometer, as a means of confirming the presence of single amino acid polymorphism peptides.

18. The method of claim 16 wherein the peptides of the first and second samples were prepared from a first and second set of polypeptides isolated from two distinct biological sources.

Description:

DETECTING SINGLE AMINO ACID POLYMORPHISMS BY ISOTOPIC LABELING AND MASS SPECTROMETRY

BACKGROUND

Missense gene mutations [i.e., nucleotide substitutions that result in a single amino acid polymorphism (SAAP)] are one of the most common forms of genetic alteration. Such events account for ~53% and ~46% of the mutations catalogued in the Human Gene Mutation Database and the Online Mendelian Inheritance in Man database, respectively. The substitution of one amino acid for another can alter a protein's structure and function as well as its solubility and stability, and may lead to disease. It has recently been estimated that roughly 60% of the SAAPs in the Swiss-Prot protein database are linked to human diseases. For example, mutations have been found that disrupt protein activation and protein-protein interactions, as well as DNA-binding, ATP-binding, and catalytic protein domains. One of the best-known examples of a SAAP-associated disease is sickle cell anemia, which results from the substitution of a nonpolar valine residue for a glutamic acid residue in the β-chain of hemoglobin. Many other SAAP-related pathologies exist, ranging from psychiatric disorders to cardiovascular disease. A relatively recent study determined the role of SAAPs in the development of graft-versus-host disease, indicating the relevance of SAAP determination for tissue typing. SAAPs not only play a significant role in the development of certain diseases but may also affect treatment strategies. Polymorphisms in drug-metabolizing enzymes are of interest because they have been implicated as the primary source of interindividual variability in drug response. For example, genetic data have been used to group individuals into drug-metabolizing categories ranging from "poor" to "ultrarapid" metabolizers. Polymorphisms within drug-metabolizing enzymes may result in individuals being categorized as poor or intermediate metabolizers; ultrarapid metabolism, however, arises primarily from gene duplication. Because drug toxicity is associated with metabolism, the analysis of SAAPs within metabolic enzymes is of considerable interest.

Generally, the presence of a genetic mutation associated with a disease is determined by single nucleotide polymorphism (SNP) analysis. Multiple analytical platforms exist for SNP analysis, and several offer advantages in sensitivity (e.g., those that amplify the molecules of interest via the polymerase chain reaction) and throughput (e.g., multiplexed gene chip approaches). However, SNP analyses suffer from several drawbacks, especially as they relate to SAAPs. First, the presence of a SNP provides no information about the expression level of a particular gene product; thus, such an analysis may not provide important information about the onset of disease. Second, only 1% of SNPs are actually manifested as SAAPs, so many observed SNPs are not relevant at the protein level. Third, as with any other screening technique, SNP analyses can at times be inaccurate.

At present, the determination of SAAPs within proteins can be cumbersome with mass spectrometry (MS) techniques. Typically, a purified protein is enzymatically digested, and the masses of the resulting peptides are measured and compared with expected masses obtained from a sequence database. Peptides whose molecular weights do not match the theoretical values are candidate SAAP peptides (mismatches may also result from post-translational modification). Problems can then arise from the incompleteness or limited fidelity of the database used in the comparison. Additionally, finding all SAAPs within a protein requires observing peptides across the entire protein sequence (i.e., complete sequence coverage). High sequence coverage can be difficult to obtain by enzymatic digestion, both because digestion efficiency may be limited and because some SAAPs are expected to occur at the cleavage sites of specific enzymes; the use of multiple enzymes may therefore at times be required to reveal protein polymorphisms.
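The database comparison described above can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the disclosed method: peptides are treated as unmodified, masses are monoisotopic residue masses, ions are singly protonated [M+H]+, and the 0.02 Da tolerance and the function names are hypothetical. It computes a theoretical [M+H]+ and lists every single-residue substitution consistent with an observed mass shift.

```python
"""Flag candidate single amino acid substitutions from a peptide mass mismatch."""

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once for the free N- and C-termini
PROTON = 1.007276   # charge carrier of the [M+H]+ ion


def peptide_mh(seq: str) -> float:
    """Theoretical monoisotopic [M+H]+ of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER + PROTON


def candidate_substitutions(seq: str, observed_mh: float, tol: float = 0.02):
    """Return (position, old, new) tuples (1-based) for every single-residue
    substitution whose mass change matches observed - theoretical within tol."""
    delta = observed_mh - peptide_mh(seq)
    hits = []
    for pos, old in enumerate(seq, start=1):
        for new, mass in RESIDUE_MASS.items():
            if new != old and abs((mass - RESIDUE_MASS[old]) - delta) <= tol:
                hits.append((pos, old, new))
    return hits


if __name__ == "__main__":
    reference = "TGQAPGFTYTDANK"                # database sequence (SEQ ID NO: 3)
    observed = peptide_mh("TGQAPGFSYTDANK")     # simulated observed mass (SEQ ID NO: 15)
    print(candidate_substitutions(reference, observed))
```

For this pair the true T→S change at position 8 is reported, but so are T→S at the other threonines and the A→G and Q→N swaps, because all of them change the mass by the same CH₂ (14.01565 Da). Mass alone is therefore ambiguous, which is why fragmentation data are needed to localize a substitution.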

As disclosed herein, an improved method for analyzing samples for the presence of SAAPs is provided, wherein a first and a second sample are compared with one another to identify SAAPs. More particularly, the method comprises separately labeling the first and second populations of peptides with two labeling entities that can be distinguished from one another. The first and second samples are then mixed and subjected to split-field drift tube/mass spectrometry techniques to detect SAAPs.

SUMMARY

A device, system or method in accordance with illustrative embodiments of the present disclosure may include one or more of the following features or combinations thereof:

In one illustrative embodiment, a method for identifying a SAAP is described. The method includes digesting a protein to produce peptides. A number of the peptides can be labeled with an isotopic reagent. The labeled peptides can be subjected to a combination of ion mobility spectrometry and mass spectrometry. In one embodiment, a first set of the peptides is labeled with a light isotopic agent relative to a second isotopic agent used to label a second set of peptides. In one embodiment, the isotopic reagent can be a heavy isotopic agent. The peptides can be tryptic peptides or the products of other known proteolytic or chemical fragmentation. In accordance with one method, two separately and distinctly labeled pools of peptides are combined in a single fluid to form a labeled peptide composition. The labeled peptide composition can be electrosprayed to create labeled peptide ions. The labeled peptide ions can then be subjected to a combination of ion mobility spectrometry and mass spectrometry. In one embodiment, the labeled peptide ions are stored in an ion funnel trap prior to subjecting them to the combination of ion mobility spectrometry and mass spectrometry.

In another illustrative embodiment, a system for identifying SAAPs is described. The system comprises an ion mobility spectrometry device functionally coupled to a mass spectrometry device. The system can comprise an ion trap device functionally coupled to the ion mobility spectrometry device and the mass spectrometry device. The system can also comprise an electrospray ionization device coupled to the ion trap device, the ion mobility spectrometry device, and the mass spectrometry device.

In yet another illustrative embodiment, an analytical method is described. The method comprises enzymatically digesting or chemically fragmenting two separate samples containing related proteins to produce two pools of related peptides. The two pools of peptides are then labeled in such a manner that the individual species of one pool can be distinguished from the peptides of the other pool using mass spectrometry techniques. In one embodiment, this is accomplished by labeling the peptides with an isotopic reagent. The method further comprises combining the two pools of labeled peptides and subjecting the labeled peptides to a combination of ion mobility spectrometry and mass spectrometry to generate data. These data can be used to identify a number of SAAPs.

Additional features of the present disclosure will become apparent to those skilled in the art upon consideration of the following detailed description of preferred embodiments exemplifying the best mode of carrying out the subject matter of the disclosure as presently perceived.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 represents a schematic diagram of one embodiment of an IMS-MS instrument.

Figs. 2A & 2B show two-dimensional (2D) t_D(m/z) dot plots for the cytochrome c (Fig. 2A) and hemoglobin (Fig. 2B) digest mixtures. On the left of each 2D plot are total-ion mass spectra obtained by integrating all ion counts across the drift time range at each m/z value. On the bottom of each 2D plot are total-ion drift time distributions obtained by integrating all ion counts across the entire mass spectrum at each drift time bin. Charge-state families in both 2D plots are delineated with solid lines. An intensity threshold of 4 counts was used for each 2D plot. In these experiments, peptides from bovine proteins are labeled with the light (H₃) label, while peptides from equine (cytochrome c) and sheep (hemoglobin) proteins are labeled with the heavy (D₃) label.

Figs. 3A & 3B. Fig. 3A shows a two-dimensional t_D(m/z) dot plot representing an expanded region of Fig. 2A for the cytochrome c digest mixture. The mobility distributions for the features labeled A through E in the 2D plot are shown in Fig. 3B. The mobility distributions are obtained by integrating all drift bins across the 2D plot range at the m/z value corresponding to the monoisotopic peak of each feature. The identity of the assigned doublet feature is given as the peptide sequence YIPGTK (SEQ ID NO: 1). The SAAP peptide sequences are given in Table II and are assigned the sequence identifiers: GITWK (SEQ ID NO: 8), NKGITWK (SEQ ID NO: 9), TYTDANK (SEQ ID NO: 10), TYTDANKNK (SEQ ID NO: 11), KTEREDLIAY (SEQ ID NO: 12), TGQAPGFTYTDANK (SEQ ID NO: 3), KTGQAPGFSY (SEQ ID NO: 13), KGEREDLIAY (SEQ ID NO: 14), TGQAPGFSYTDANK (SEQ ID NO: 15).

Figs. 4A-4C. Fig. 4A shows an expanded region of the 2D dot plot of the hemoglobin digest mixture shown in Fig. 2B. Here the lower-mobility, singly charged peptide ions are highlighted and the assigned features are labeled. The sequences for the SAAP and unique peptides are given in Table III and are assigned the sequence identifiers: GNVK (SEQ ID NO: 21), NFGK (SEQ ID NO: 5), GHGAK (SEQ ID NO: 4), AVTAFWGK (SEQ ID NO: 22), GNVKAAWGK (SEQ ID NO: 23), AAVTAFWGK (SEQ ID NO: 24), VDEVGGEALGR (SEQ ID NO: 25), AAVTAFWGKVK (SEQ ID NO: 26), VKVDEVGGEALGR (SEQ ID NO: 27), GTFAALSELHCDK (SEQ ID NO: 28), EFTPVLQADFQK (SEQ ID NO: 29), WAGVANALAHRYH (SEQ ID NO: 30), VGGHAAEYGAEALER (SEQ ID NO: 31), GHGEK (SEQ ID NO: 32), AAVTGFWGK (SEQ ID NO: 33), SNVKAAWGK (SEQ ID NO: 34), AAVTGKWGK (SEQ ID NO: 35), VDEVGAEALGR (SEQ ID NO: 36), VVAGVANALAHK (SEQ ID NO: 37), GHGEKVAAALTK (SEQ ID NO: 38), VKVDEVGAEALGR (SEQ ID NO: 39), GTFAQLSELHCDK (SEQ ID NO: 40), VGGNAGAYGAEALER (SEQ ID NO: 41), WAGVANALAHKYH (SEQ ID NO: 42), HHGNEFTPVLQADFQK (SEQ ID NO: 43), FFEHFGDLSNADAVMNNPK (SEQ ID NO: 44). Two doublets representing identical peptides are indicated as D₁ and D₂. Fig. 4B shows the mass spectrum obtained by integrating all ion counts across the entire drift time distribution, whereas Fig. 4C shows the mass spectrum obtained by integrating only ions within a narrow drift bin tolerance (±3 bins) of the [M+H]⁺ mobility family line shown in Fig. 2B. A SAAP peptide and a unique peptide (see Table III, attached hereto) that are not observed in the total-ion mass spectrum are labeled in the mobility-selected mass spectrum.

Figs. 5A & 5B show mobility-selected MS/MS spectra obtained by integrating a narrow range of drift bins (centered about the precursor ion drift time) for each m/z value. Precursor ions as well as fragment ions from the homologous y-series are labeled. The mass spectrum shown in Fig. 5A was obtained upon collisional activation of the indicated SAAP peptide, WAGVANALHK (SEQ ID NO: 20; sheep hemoglobin), while the mass spectrum shown in Fig. 5B was obtained upon collision-induced dissociation of a peptide doublet (having the sequence TGPNLHGLFGR (SEQ ID NO: 51)) from the cytochrome c digest mixture.

Fig. 6 shows the precursor mass spectral doublet (having the sequence TGQAPGFTYTDANK (SEQ ID NO: 3)), indicating the incorporation of a single label.

DETAILED DESCRIPTION

While the invention is susceptible to various modifications and alternative forms, specific embodiments will herein be described in detail. It should be understood, however, that there is no intent to limit the invention to the particular forms described, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention.

As used herein, the term "peptide" encompasses a sequence of 3 or more amino acids and typically less than 50 amino acids, wherein the amino acids are naturally occurring or synthetic (non-naturally occurring) amino acids. Synthetic or non-naturally occurring amino acids refer to amino acids that do not naturally occur in vivo but which, nevertheless, can be incorporated into the peptide structures described herein.

Furthermore, the terms "polypeptide" and "protein" are used interchangeably to refer to a polymer of amino acids, without regard to the length of the polymer. However, polypeptides and proteins typically have a polymer length greater than that of "peptides."

As disclosed herein, a method of using mass spectrometry to analyze complex polypeptide mixtures for the presence of single amino acid polymorphisms (SAAPs) is provided. One problem associated with existing MS analysis of complex mixtures is that the signal from low-abundance species (or species with low ionization efficiencies) can be obscured by isobaric ions. IMS separation, or other related gas-phase separation techniques such as field asymmetric waveform IMS (FAIMS), can reduce such chemical noise, revealing features that would otherwise be buried in higher-signal chemical noise and therefore not observed by MS analysis alone. The ability to identify low-abundance (or low-response) species leads to relatively high sequence coverage for the proteins, with relatively short analysis times of less than 60 seconds. As described herein, drift tube/mass spectrometry techniques are combined with isotopic labeling methods to detect, and optionally identify, peptides that have a single amino acid polymorphism (SAAP). Accordingly, this disclosure demonstrates this combination as a rapid and sensitive approach to distinguishing and identifying SAAP peptides.

One embodiment of the present invention is directed to a method of identifying the presence of SAAPs in one or more peptides present in a sample, relative to a second sample. In accordance with one embodiment, two or more samples are provided wherein the polypeptides of each sample are isolated from a similar source. For example, each of the samples may comprise the total proteins isolated from a cell extract or from a bodily fluid (e.g., blood, serum, plasma, urine, saliva, spinal fluid, or lymphatic fluid) of different individuals of the same species, including, for example, humans. In one embodiment, the proteins are isolated from an individual's plasma or serum. Alternatively, the samples may comprise only a subset of the total proteins initially isolated from a natural source.

Typically, the samples will be isolated from different individuals of the same species. However, comparisons can be conducted between samples recovered from different species, particularly when a subset of the total isolated proteins is subjected to analysis. Alternatively, the samples may come from different bodily fluids or tissues of the same individual (including, for example, a comparison of samples recovered from tumor tissue relative to non-tumor tissue). In one embodiment, two samples are provided, representing a first and second set of polypeptides isolated from two different individuals of the same species.

Proteins can be recovered from their native sources (e.g., cell extracts, serum, or plasma) using standard techniques known to those skilled in the art. Furthermore, the proteins initially recovered can be separated into subsets based on molecular weight, charge, antibody or ligand interactions, hydrophobicity/hydrophilicity, or other physical properties, or any combination thereof, prior to SAAP analysis. In accordance with one embodiment, each polypeptide of a first sample shares at least 60% sequence identity with its corresponding polypeptide in a second sample. More typically, the corresponding polypeptides of the first and second samples share at least 65, 75, 80, 85, 90, or 95% amino acid sequence identity. In one embodiment, the vast majority of the polypeptides (e.g., 80-90% of the total polypeptides present) in the respective samples are identical to one another in amino acid sequence, with the remaining polypeptides of each sample exhibiting only 80, 85, 90, or 95% amino acid sequence identity to one another.
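The percent-identity thresholds above can be made concrete with a minimal Python sketch. It assumes the two sequences are already aligned and of equal length (ungapped identity); comparing full-length proteins would normally require a gapped alignment such as Needleman-Wunsch, which this sketch does not implement. The function name is hypothetical.

```python
"""Percent identity between two aligned, equal-length sequences."""


def percent_identity(a: str, b: str) -> float:
    """Ungapped identity: percentage of positions carrying the same residue."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


if __name__ == "__main__":
    seq_a = "TGQAPGFSYTDANK"   # SEQ ID NO: 15
    seq_b = "TGQAPGFTYTDANK"   # SEQ ID NO: 3
    pid = percent_identity(seq_a, seq_b)
    print(f"{pid:.1f}% identity; meets 60% threshold: {pid >= 60}")
```

A single substitution in a 14-residue peptide gives 13/14 identity, about 92.9%, well above the 60% floor named in the text.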

Prior to SAAP analysis, the polypeptides present in the samples are subjected to one or more treatments that fragment them into peptides ranging from about 5 to about 40 amino acids in length, or from about 10 to about 30 amino acids in length. In one embodiment, the polypeptides present in the samples are fragmented, and optionally segregated based on size (using filtration or chromatography techniques), to produce compositions that each comprise peptides having an average length of about 5 to about 30 amino acids, or in one embodiment an average length of about 10 to about 20 amino acids. The reduction in average length can be accomplished using any standard technique, including enzymatic or chemical treatment of the polypeptides. In accordance with one embodiment, the polypeptides are treated with a proteolytic enzyme such as trypsin, endo-Lys-C, endo-Arg-C, or similar proteolytic enzymes known to those skilled in the art. In one embodiment, the polypeptides of a first and second sample are digested with trypsin to produce a first and second composition, each comprising peptides having an average length of about 5 to about 30 amino acids.
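Tryptic digestion and the length window above can be previewed in silico. The following Python sketch assumes the common simplified trypsin rule (cleave C-terminal to Lys or Arg unless the next residue is Pro), no missed cleavages, and the 5-30 residue window from the text; real digests also contain missed cleavages, as several peptides in Tables II and III show. The function name is hypothetical.

```python
"""In silico tryptic digestion with a peptide-length filter."""
import re


def tryptic_digest(protein: str, min_len: int = 5, max_len: int = 30) -> list:
    """Cleave after K or R unless followed by P (no missed cleavages), then
    keep peptides whose length lies in [min_len, max_len]."""
    # Zero-width split points: preceded by K/R and not followed by P.
    peptides = re.split(r"(?<=[KR])(?!P)", protein)
    return [p for p in peptides if min_len <= len(p) <= max_len]


if __name__ == "__main__":
    # Illustrative string assembled from peptides named in this document.
    fragment = "TGPNLHGLFGRKTGQAPGFTYTDANKNKGITWK"
    print(tryptic_digest(fragment))
```

Short products such as `K` and `NK` fall below the 5-residue cutoff and are discarded, leaving `TGPNLHGLFGR`, `TGQAPGFTYTDANK`, and `GITWK`. Splitting on a zero-width pattern requires Python 3.7 or later.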

After the polypeptides have been fragmented to the desired size, the peptides are optionally purified, using standard techniques, and labeled. More particularly, the peptides of each sample are differentially labeled using labels that can be independently detected during mass spectrometry analysis. In accordance with one embodiment, more than two samples are analyzed in one experiment to determine the presence of SAAPs in the respective samples. In this method, each set of peptides present in the samples to be analyzed is separately labeled with a different isotopic label, wherein each label can be distinguished from the others by mass spectrometry analysis. In accordance with one embodiment, the iTRAQ (Applied Biosystems) technique is used to chemically tag the N-terminus of the peptides. Four distinct isotopic tags are available, enabling four different conditions to be multiplexed together in one experiment. The two, three, or four labeled samples are then combined, optionally fractionated by condensed-phase separation (e.g., by nanoLC), and analyzed by an ion mobility spectrometer and time-of-flight (TOF) mass spectrometer. Upon fragmentation, each iTRAQ tag releases a reporter ion of distinct mass; measuring the intensities of these reporter ions enables relative quantification of the peptides in each sample.
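The reporter-ion quantification step can be sketched in Python as a simple ratio calculation. This assumes reporter intensities have already been extracted from an MS/MS spectrum and keyed by the nominal 4-plex iTRAQ reporter m/z values (114-117); it omits the isotope-impurity correction that practical iTRAQ workflows apply, and the function name and the intensities in the example are hypothetical.

```python
"""Relative quantification from 4-plex iTRAQ reporter-ion intensities."""
from typing import Dict


def reporter_ratios(intensities: Dict[int, float], reference: int = 114) -> Dict[int, float]:
    """Express each reporter channel relative to a reference channel.

    Keys are nominal reporter m/z values (114-117 for 4-plex iTRAQ)."""
    ref = intensities[reference]
    if ref <= 0:
        raise ValueError("reference channel must have positive intensity")
    return {channel: value / ref for channel, value in sorted(intensities.items())}


if __name__ == "__main__":
    # Hypothetical reporter intensities for one peptide across four samples.
    print(reporter_ratios({114: 1000.0, 115: 500.0, 116: 2000.0, 117: 1000.0}))
```

Choosing one channel as the reference makes each ratio read directly as a fold change of that peptide in the other samples.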

In accordance with one embodiment, the analysis is conducted using two separate samples of peptides. In one embodiment, the first and/or second set of peptides is isotopically labeled using standard techniques known to those skilled in the art and previously described; see, for example, Julka, S.; Regnier, F. J. Proteome Res. 2004, 3, 350-363; Goshe, M. B.; Smith, R. D. Curr. Opin. Biotechnol. 2003, 14(1), 101-109; Tao, W. A.; Aebersold, R. Curr. Opin. Biotechnol. 2003, 14(1), 110-118; and Ferguson, P. L.; Smith, R. D. Annu. Rev. Biophys. Biomol. Struct. 2003, 32, 399-424. In accordance with one embodiment, the global internal standard technology (GIST) has been used to isotopically label the primary amines of all tryptic peptides with light (H₃) or heavy (D₃) labels (for the respective proteins in each pair). GIST techniques have been fully described in the prior art and are known to those skilled in the art (see, for example, Ji, J.; Chakraborty, A.; Geng, M.; Zhang, X.; Amini, A.; Bina, M.; Regnier, F. J. Chromatogr. B 2000, 745, 197-210; and Geng, M.; Ji, J.; Regnier, F. E. J. Chromatogr. A 2000, 870, 295-313). The digest mixtures are then analyzed with IMS-MS instrumentation. As shown herein, the mobility distributions of complementary SAAP peptides are often similar (though they may be shifted in overall drift time) and can thus be used as a signature to link related SAAP peptides. It should be appreciated that differences in mobilities are also useful for interpreting spectra. In addition, the IMS-MS approach reveals many identical peptides and peptide variants, resulting in relatively high sequence coverage for both protein pairs.
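Because the light (H₃) and heavy (D₃) labels differ by three ²H-for-¹H substitutions, a peptide present in both samples should appear as a doublet spaced by about 3.0188 Da per incorporated label, divided by the charge state; a peak without a partner is a candidate SAAP (or unique) peptide. The Python sketch below illustrates that pairing logic under stated assumptions: a centroided list of monoisotopic peaks with known charge states, at most two labels per peptide (e.g., N-terminus plus one lysine), and a hypothetical 0.01 m/z tolerance. It ignores drift time, which an IMS-MS implementation could add as a second matching criterion.

```python
"""Pair light/heavy (H3/D3) label doublets in a peak list."""

D_MINUS_H = 2.014101778 - 1.007825032   # 2H minus 1H atomic mass (Da)
LABEL_SHIFT = 3 * D_MINUS_H             # CD3 vs CH3: about 3.0188 Da per label


def find_doublets(peaks, max_labels=2, tol=0.01):
    """peaks: list of (mz, charge) tuples for monoisotopic peaks.

    Returns (pairs, singles): pairs are (light_idx, heavy_idx, n_labels);
    singles are indices with no partner (candidate SAAP/unique peptides)."""
    order = sorted(range(len(peaks)), key=lambda i: peaks[i][0])
    used, pairs = set(), []
    for i in order:                       # lowest unpaired m/z is the light member
        if i in used:
            continue
        mz_i, z_i = peaks[i]
        for j in order:
            if j == i or j in used or peaks[j][1] != z_i:
                continue
            gap = peaks[j][0] - mz_i                 # heavy partner lies above
            n = round(gap * z_i / LABEL_SHIFT)       # implied number of labels
            if 1 <= n <= max_labels and abs(gap - n * LABEL_SHIFT / z_i) <= tol:
                pairs.append((i, j, n))
                used.update((i, j))
                break
    singles = [k for k in range(len(peaks)) if k not in used]
    return pairs, singles


if __name__ == "__main__":
    # Hypothetical peaks: two doublets and one unpaired feature.
    peaks = [(700.0, 1), (700.0 + LABEL_SHIFT, 1),
             (500.0, 2), (500.0 + LABEL_SHIFT, 2),
             (812.4, 1)]
    print(find_doublets(peaks))
```

Note that a 2+ doublet carrying two labels has the same m/z spacing as a 1+ doublet carrying one, so the charge state must be known to infer the label count.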

In accordance with one method disclosed herein, ion mobility spectrometry is used with time-of-flight (TOF) mass spectrometry to characterize peptides generated from purified proteins (e.g., produced from tryptic digests of the proteins) as well as peptide mixtures produced from proteomics samples. Detailed reviews of the history, instrumentation, and theory of IMS have been given elsewhere. Various IMS devices can be used in conjunction with the presently disclosed method of SAAP analysis; one exemplified embodiment is split-field IMS instrumentation (Valentine, S. J.; Koeniger, S. L.; Clemmer, D. E. Anal. Chem. 2003, 75, 6202-6208). Fig. 1 shows a schematic diagram of one instrument that has been used for conducting SAAP analysis. Peptide ions are produced by electrospray ionization (ESI) of an infused peptide solution. Ions are stored in an ion funnel trap similar to that described by Smith and coworkers (see Shaffer, S. A.; Prior, D. C.; Anderson, G. A.; Udseth, H. R.; Smith, R. D. Anal. Chem. 1998, 70, 4111-4119; Kim, et al., Anal. Chem. 2000, 72, 2247-2255; and Tang, et al., Anal. Chem. 2005, 77(10), 3330-3339).

In accordance with one embodiment, other known related gas-phase separation techniques can be used in place of IMS to separate the ionized labeled peptides. Such related gas-phase separation techniques include field asymmetric waveform IMS (FAIMS) separations (see Buryakov, I. A.; Krylov, E. V.; Nazarov, E. G.; Rasulev, U. K. Int. J. Mass Spectrom. Ion Processes 1993, 128, 143; and Shvartsburg, A. A.; Tang, K.; Smith, R. D. Anal. Chem. 2004, 76, 7366). FAIMS and other known related gas-phase separation techniques can be used in conjunction with mass spectrometry to detect and/or identify SAAPs present in one or more samples. As used herein, the phrase "gas-phase separation techniques" refers to separations based on the mobilities of ions in buffer gases under the influence of an applied electric field. Such separations can include devices that use gas flow.

Periodically, a 100-μs-wide pulse of ions from the funnel is introduced into a split-field drift tube. In the first drift region (200 cm), ions are separated based on differences in their mobilities through a buffer gas (for example, ~2.8 Torr He and 0.2 Torr N₂) under the influence of a weak electric field (~10 to 11 V·cm⁻¹). The field in the second region of the drift tube (1.3 cm) can be set either to transmit precursor ions or to induce fragmentation. Ions exiting the drift tube are extracted into the source region of an orthogonal-extraction TOF mass spectrometer, where high-frequency, high-voltage pulses synchronized with the mobility measurement pulse initiate flight time (t_F) measurements. The time required for an ion to traverse the drift tube (the drift time, t_D) depends on the mobility (K) of the ion, as given by the expression K = v_D/E, where v_D is the drift velocity and E is the drift field. The transit time of an ion (and thus v_D) depends on the overall charge state of the ion as well as its collision cross section: more highly charged ions have shorter t_D values than ions of lower charge, and more compact ions have shorter t_D values than more elongated species. Additionally, because the timescale of the drift measurement is significantly longer than that of the TOF measurement (ms versus μs), flight time distributions can be recorded within individual drift time windows. This is termed a "nested" drift(flight) time [t_D(t_F)] measurement. Alternatively, t_F values can be converted to mass-to-charge ratios (m/z) to produce t_D(m/z) datasets.

In accordance with one embodiment, a method of detecting the presence of a single amino acid polymorphism in a complex mixture of peptides is provided. The method comprises the steps of obtaining a first and second sample, wherein the first and second samples each comprise a plurality of peptide sequences (i.e., a set of peptides). As noted above, the peptide sequences typically represent the products of proteolytic digestion or chemical cleavage of an initial polypeptide-containing sample. The polypeptide sample itself may represent the entire protein content recovered from a biological source, or it may represent a subset of the polypeptides initially recovered from a biological source and subsequently selected based on a common physical feature (e.g., shared binding affinity, size, or another feature). The peptides of the first and second samples are labeled in such a manner that identical peptides present in the first and second samples can be distinguished from one another through the use of mass spectrometry.
In one embodiment the analysis is conducted on three or four samples, wherein each set of peptides is labeled with an isotopic label that can be distinguished from the others through the use of mass spectrometry, and the samples are then mixed together and multiplexed in a single experiment. In accordance with one embodiment, the peptides of one of the first or second samples are isotopically labeled relative to the peptides of the other sample. In one embodiment, two samples are analyzed wherein one set of peptides is labeled with deuterium relative to the other set of peptides. More specifically, in one embodiment the peptides of the first sample are labeled with a first label and the peptides of the second sample are labeled with an isotopic (e.g., deuterium) derivative of the first label. For example, the first set of peptides may be labeled with a compound comprising a methyl group, and the second set of peptides labeled with a derivative of the first label in which the methyl group has been replaced with a CD3 substituent.
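Because a deuterium atom is about 1.00628 Da heavier than a protium atom, each CH3 to CD3 replacement shifts the label mass by slightly more than 3 Da. A short sketch of that arithmetic, using standard monoisotopic atomic masses (the function and constant names are illustrative):

```python
# Monoisotopic atomic masses (Da) of protium (1H) and deuterium (2H).
H1_MASS = 1.00782503207
H2_MASS = 2.01410177812


def deuterium_shift(n_deuterium: int) -> float:
    """Mass increase (Da) when n hydrogen atoms are replaced by deuterium."""
    return n_deuterium * (H2_MASS - H1_MASS)


# One CH3 -> CD3 replacement per attached label.
PER_LABEL_SHIFT = deuterium_shift(3)

if __name__ == "__main__":
    for n_labels in (1, 2, 3):
        print(n_labels, "label(s):", round(n_labels * PER_LABEL_SHIFT, 4), "Da")
```

The ~3.019 Da per label, rather than exactly 3 Da, matters when pairing light and heavy features at the ~0.01 m/z level.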

Once the first and second samples (comprising the respective first and second sets of peptides) have been labeled, the first and second samples are mixed together and the mixture is subjected to analysis by an ion mobility spectrometer (or other related gas-phase separation techniques) and a time-of-flight (TOF) mass spectrometer. Analysis by the ion mobility spectrometer and time of flight (TOF) mass spectrometer includes the step of ionizing the mixture of peptides using standard techniques including corona discharge, atmospheric pressure photoionization (APPI), electrospray ionization (ESI), matrix assisted laser desorption ionization (MALDI) or use of a radioactive source. In one embodiment electrospray ionization is used to ionize the peptides. The mass spectrometer can be provided with an ion funnel trap and the labeled peptide ions can be stored in the ion funnel trap prior to subjecting the labeled peptide ions to the combination of ion mobility spectrometry and mass spectrometry. At specified intervals, a sample of the ions is let into the drift chamber of the ion mobility spectrometer and the time of flight (TOF) mass spectrometer. In one embodiment the ion mobility spectrometer comprises a split-field drift tube.

In accordance with one embodiment, a system for analyzing complex peptide mixtures for the presence of SAAP peptides is provided. The system comprises an ion mobility spectrometry device functionally coupled to a time-of-flight (TOF) mass spectrometry device. The system further comprises an electrospray ionization device and an ion trap functionally coupled to the ion mobility spectrometry device and the mass spectrometry device.

In accordance with one embodiment, condensed-phase separations are used in combination with IMS or FAIMS and time-of-flight (TOF) mass spectrometer characterization. For example, condensed-phase separation can be conducted using liquid chromatography (LC; see Shen et al., Anal Chem. 2002;74:4235), capillary electrophoresis (CE; see Shen et al., Anal Chem. 1999;71:5348) and/or gel techniques (see Shevchenko A, Wilm M, Vorm O, Mann M. Anal Chem. 1996;68:850). In one embodiment the separation of the labeled peptides includes multidimensional separations using two or more different stages, followed by MS characterization.

In one embodiment the peptides are separated using 2-D gels, in which an isoelectric focusing (IEF) separation by isoelectric point is followed by an electrophoretic (SDS-PAGE) separation by molecular size. The labeled peptides are then extracted from the gel and analyzed using IMS and MS techniques. In alternative embodiments using liquid-phase separations, a combination of strong-cation-exchange LC and reversed-phase (RP) LC can be used. For example, in one implementation a technique known as MudPIT (see Washburn MP, Wolters D, Yates JR. Nature Biotechnol. 2001;19:242 and Skop AR, Liu HB, Yates J, Meyer BJ, Heald R. Science. 2004;305:61) is used.

Other useful 2-D separations include anion exchange/RPLC (Wagner K, Miliotis T, Marco-Varga T, Bischoff R, Unger KK. Anal Chem. 2002;74:809), size-exclusion chromatography/RPLC (Opiteck GJ, Jorgenson JW, Anderegg RJ. Anal Chem. 1997;69:2283), and other LC/LC combinations (see Evans CR, Jorgenson JW. Anal Bioanal Chem. 2004;378:1952), RPLC/CE (Bushey MM, Jorgenson JW. Anal Chem. 1990;62:978 and Lewis KC, Opiteck GJ, Jorgenson JW, Sheeley D. J Am Soc Mass Spectrom. 1997;69:2283), micellar electrokinetic chromatography/CE (see Rocklin RD, Ramsey RS, Ramsey JM. Anal Chem. 2000;72:5244) and CE/CE methods (e.g., IEF coupled to isotachophoresis/zone electrophoresis; Mohan D, Pasa-Tolic L, Masselon CD, Tolic N, Bogdanov B, Hixson KK, Smith RD, Lee CS. Anal Chem. 2003;75:4432).

Two-dimensional LC approaches can provide peak capacities of greater than 1000 for proteolytic digests (see Valentine, et al., J. Proteome Research 2006, 5, 2977-2984 and Valentine, et al, Int. J. Mass Spectrom. 2001, 212, 97-109). Separations with additional dimensions, including combinations of methods at the intact protein and peptide level (i.e., with intermediate enzymatic digestion) can provide yet greater total peak capacities.
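The gain from adding dimensions is often estimated with the product rule for peak capacity. That rule is a general chromatographic approximation (not a calculation from this disclosure) and holds only for fully orthogonal dimensions, so it gives an upper bound; the per-dimension capacities below are hypothetical.

```python
from math import prod


def ideal_peak_capacity(*capacities: float) -> float:
    """Product-rule peak capacity for fully orthogonal separation dimensions.

    Real multidimensional separations fall below this bound because of
    correlation between dimensions and undersampling of the first dimension.
    """
    if not capacities or any(c < 1 for c in capacities):
        raise ValueError("each dimension needs a peak capacity >= 1")
    return prod(capacities)


if __name__ == "__main__":
    # Hypothetical per-dimension capacities (not values reported in the text).
    lc_lc = ideal_peak_capacity(20, 60)          # e.g. SCX followed by RPLC
    lc_lc_ims = ideal_peak_capacity(20, 60, 10)  # with an additional mobility dimension
    print(lc_lc, lc_lc_ims)
```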

Detecting the presence of sequence variants is the first step in SAAP peptide analysis. However, an additional goal is to determine the nature of the polymorphism itself (i.e., the position and identities of the amino acids). For protein systems where the sequence differences are known, the parent mass, together with the number of added isotopic labels, is used to determine the identity of peptide variants. However, for systems where the peptide variants are not known or where the mixture contains multiple proteins, such an approach is intractable. In these cases it is useful to examine collision-induced dissociation (CID) data (or data from other fragmentation methods, including photodissociation, surface-induced dissociation (SID), electron capture, etc.). In the present IMS-MS system, as exemplified in Fig. 1, CID is implemented using a split-field approach. Briefly, the field in the second, short mobility region at the back of the drift tube can be modulated to transmit precursor ions or to create fragments. Thus, both precursor MS and CID-MS information can be obtained.

Accordingly, in one embodiment a method is provided for determining the relative abundance of SAAPs in a sample as well as the identity of the SAAPs. The method comprises the steps of providing a first and second sample, wherein the first and second samples comprise a plurality of peptide sequences. The peptides of the first sample are then labeled with a detectable isotopic label relative to the peptides of the second sample, and the first and second samples are mixed together to provide a mixed composition. The mixed composition is then analyzed using an ion mobility spectrometer and a time-of-flight (TOF) mass spectrometer to determine the presence of SAAPs in the first or second sample. Upon identification of one or more SAAPs in the first or second sample, peptide ions of the first or second sample, or of the mixed composition, can be subjected to collision-induced dissociation or electron transfer dissociation in the second, short mobility region at the back of the drift tube to generate a ladder of peptide fragments that allows the amino acid sequence of the SAAPs to be determined from the mass spectrometer data.
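The detection step can be illustrated by pairing features. Two features of the same charge whose m/z values differ by n × 3.0188/z (n = number of labels) behave like a light/heavy pair of identical sequences; features left unpaired are candidates for SAAP or unique peptides to be examined further. This is a simplified, hypothetical sketch assuming a per-label shift of 3.01883 Da (three H to D replacements), at most three labels, and a fixed m/z tolerance; it is not the feature-finding procedure used in this disclosure.

```python
from typing import List, Tuple

Feature = Tuple[float, int]  # (m/z, charge)
LABEL_SHIFT = 3.01883        # Da per CD3 vs CH3 label (assumption)


def pair_doublets(features: List[Feature], max_labels: int = 3, tol: float = 0.02):
    """Split (m/z, z) features into light/heavy doublets and unpaired features.

    A doublet is two features of equal charge whose m/z difference matches
    n * LABEL_SHIFT / z (within tol, in m/z units) for some n in 1..max_labels.
    Returns (doublets, singles); each doublet is (light, heavy, n).
    """
    used = set()
    doublets = []
    order = sorted(range(len(features)), key=lambda k: features[k][0])
    for i in order:
        if i in used:
            continue
        mz_i, z_i = features[i]
        for j in order:
            if j == i or j in used:
                continue
            mz_j, z_j = features[j]
            if z_j != z_i or mz_j <= mz_i:
                continue
            mass_diff = (mz_j - mz_i) * z_i  # neutral mass difference
            n = round(mass_diff / LABEL_SHIFT)
            if 1 <= n <= max_labels and abs(mass_diff - n * LABEL_SHIFT) <= tol * z_i:
                doublets.append((features[i], features[j], n))
                used.update((i, j))
                break
    singles = [features[k] for k in range(len(features)) if k not in used]
    return doublets, singles


if __name__ == "__main__":
    # m/z values quoted later in this Example for peaks A-D of Fig. 3B.
    peaks = [(762.50, 1), (768.53, 1), (770.84, 2), (780.84, 2)]
    pairs, unpaired = pair_doublets(peaks)
    print("doublets:", pairs)
    print("unpaired (SAAP/unique candidates):", unpaired)
```

The unpaired features in this example are the two S→T variant peptides, whose spacing also contains the substitution mass; a label-only spacing rule therefore separates them from identical-sequence doublets.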

EXAMPLE 1

Use of ion mobility spectrometry (IMS) combined with mass spectrometry (MS) for SAAP analysis.

The present disclosure describes the use of a gas-phase separation technique, ion mobility spectrometry (IMS), combined with mass spectrometry (MS) for SAAP analysis. Tryptic digests of two proteins, cytochrome c (bovine and equine) and hemoglobin (bovine and sheep), are used to create identical peptides as well as peptides differing by a single (or multiple) amino acid(s). Although the single amino acid variant peptides come from different animal species, for purposes of this Example they will be referred to as SAAP peptides. Hemoglobin (bovine and sheep) and cytochrome c (horse, >95%, and bovine, >95%) were purchased from Sigma-Aldrich and used without further purification. Each protein was dissolved in 0.1 M ammonium bicarbonate buffer solution (pH 7.5) with 1 M urea to a final protein concentration of 20 mg·mL⁻¹. TPCK-treated trypsin (Sigma-Aldrich) was added at a ratio of 1:50 (w/w) to each protein solution, and the mixture was incubated for 24 hours at 37 °C. The digest solution was then filtered using a 3 kDa molecular weight cutoff microconcentrator (Amicon Bioseparations). The filtered solution was then subjected to solid phase extraction (Oasis HLB) to clean the digest peptides. The remaining peptides were dried using a CentriVap Concentrator (Labconco).
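How two species' digests give rise to shared and species-specific peptides can be sketched with an in silico digest. This sketch assumes the simplified trypsin rule (cleave after K or R unless the next residue is P), no missed cleavages, and the bovine and equine cytochrome c sequences (SEQ ID NO: 6 and 7), which differ at positions 47, 60 and 89.

```python
import re
from typing import List, Set, Tuple

BOVINE_CYT_C = ("GDVEKGKKIF" "VQKCAQCHTV" "EKGGKHKTGP" "NLHGLFGRKT" "GQAPGFSYTD"
                "ANKNKGITWG" "EETLMEYLEN" "PKKYIPGTKM" "IFAGIKKKGE" "REDLIAYLKK" "ATNE")
EQUINE_CYT_C = ("GDVEKGKKIF" "VQKCAQCHTV" "EKGGKHKTGP" "NLHGLFGRKT" "GQAPGFTYTD"
                "ANKNKGITWK" "EETLMEYLEN" "PKKYIPGTKM" "IFAGIKKKTE" "REDLIAYLKK" "ATNE")


def trypsin_digest(sequence: str) -> List[str]:
    """Fully cleaved peptides: cut after K or R unless the next residue is P."""
    # Zero-width split point: preceded by K/R and not followed by P.
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def compare_digests(a: str, b: str) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (shared peptides, peptides only in a, peptides only in b)."""
    pa, pb = set(trypsin_digest(a)), set(trypsin_digest(b))
    return pa & pb, pa - pb, pb - pa


if __name__ == "__main__":
    shared, bovine_only, equine_only = compare_digests(BOVINE_CYT_C, EQUINE_CYT_C)
    print("shared:", sorted(shared))
    print("bovine only:", sorted(bovine_only))
    print("equine only:", sorted(equine_only))
```

Shared peptides such as YIPGTK appear as light/heavy doublets, while the species-specific sets contain the variant peptides (e.g., TGQAPGFSYTDANK versus TGQAPGFTYTDANK) and peptides created or destroyed by a K/R substitution (e.g., GITWK and EETLMEYLENPK in equine).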

Isotopic labeling strategies for MS have been described (see for example Julka, S.; Regnier, F. J. Proteome Res. 2004, 3, 350-363; Goshe, M. B.; Smith, R. D. Curr. Opin. Biotechnol. 2003, 14(1), 101-109; Tao, W. A.; Aebersold, R. Curr. Opin. Biotechnol. 2003, 14(1), 110-118; and Ferguson, P. L.; Smith, R. D. Annu. Rev. Biophys. Biomol. Struct. 2003, 32, 399-424). Accordingly, only the labeling procedure used here is briefly described. N-hydroxysuccinimide, acetic anhydride and acetic anhydride-D6 (99% purity, Sigma-Aldrich) were used for synthesizing N-acetoxysuccinimide and its D3-analogue. The reactions that produce the light (H3)- and heavy (D3)-labeled peptides are shown below in Scheme I, where R represents a tryptic peptide.

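Scheme I (reaction scheme not reproduced): tryptic peptides (R) from the bovine cytochrome c and hemoglobin digests are labeled with N-acetoxysuccinimide (light, H3), and those from the equine cytochrome c and sheep hemoglobin digests with its D3-analogue, N-acetoxy-[D3]succinimide (heavy, D3).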

A 100-fold excess of N-acetoxysuccinimide or N-acetoxy-[D3]succinimide was added individually to two equal aliquots of 1 mg/mL digest solution (0.1 M sodium phosphate buffer, pH 7.5). The light (H3)- and heavy (D3)-labeling reaction mixtures are stirred for 5 h at 300 K. Following this, 0.5 mL of N-hydroxylamine is added per 1 mL aliquot of the labeling reaction solution, and a 5 M NaOH solution is added to a pH of about 11. After 10 min, the pH of each light (H3)- and heavy (D3)-labeled peptide solution is adjusted to about 7 to 7.5 using 0.1 M HCl. The labeled peptides of each protein are extracted and purified by solid phase extraction (Oasis HLB) and subsequently dried using the CentriVap Concentrator (Labconco). Complete labeling results in the incorporation of an isotopic label at the amino terminus of each tryptic peptide as well as at each lysine residue.
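The labeling rule above (one label at the N-terminus plus one per lysine) fixes how many labels a fully labeled peptide carries and therefore its light and heavy m/z. A minimal sketch, assuming that each label adds an acetyl group (C2H2O, 42.0106 Da; the expected product of N-acetoxysuccinimide), that the heavy label carries CD3, and that cysteine is unmodified. Values computed this way can differ slightly in the second decimal from the calculated values in Tables II and III, which were obtained with the Peptide Mass program.

```python
from typing import Optional

# Monoisotopic residue masses (Da); cysteine unmodified.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565         # H2O added to the residue sum of a free peptide
PROTON = 1.007276         # proton mass
ACETYL_LIGHT = 42.010565  # C2H2O per CH3-CO- label (assumed label chemistry)
ACETYL_HEAVY = ACETYL_LIGHT + 3 * (2.014102 - 1.007825)  # CD3-CO- label


def label_count(peptide: str) -> int:
    """Labels expected on complete labeling: N-terminal amine plus one per lysine."""
    return 1 + peptide.count("K")


def labeled_mz(peptide: str, z: int, heavy: bool = False,
               n_labels: Optional[int] = None) -> float:
    """m/z of the [M+zH]z+ ion of a labeled peptide (n_labels overrides complete labeling)."""
    n = label_count(peptide) if n_labels is None else n_labels
    label = ACETYL_HEAVY if heavy else ACETYL_LIGHT
    neutral = sum(MONO[aa] for aa in peptide) + WATER + n * label
    return (neutral + z * PROTON) / z


if __name__ == "__main__":
    light = labeled_mz("YIPGTK", 1)
    heavy = labeled_mz("YIPGTK", 1, heavy=True)
    print(f"YIPGTK [M+H]+: light {light:.3f}, heavy {heavy:.3f}, diff {heavy - light:.3f}")
```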

Light (H3)- and heavy (D3)-labeled digest peptides are reconstituted by dissolving ~1 mg of each in 1 mL of water. Then two aliquots (100 μL each of the light (H3)- and heavy (D3)-labeled peptide solutions) are mixed together and diluted to a final peptide concentration of ~5 μg·mL⁻¹ in a water-acetonitrile-formic acid (49:49:2, v/v/v) solution. This solution is used directly for ESI-IMS-MS analysis. Positively charged (protonated) ions are formed by electrospraying the peptide mixture solutions. The ESI needle is biased at +2000 V relative to the entrance of the desolvation region, and the solution flow rate is held at 0.25 μL·min⁻¹. In the present disclosure, datasets were collected for 60 seconds (~1 to 2 ng of sample consumed); however, it should be understood that shorter analysis times (requiring less sample) can be utilized. Two model protein systems, cytochrome c and hemoglobin, were selected for IMS-MS analyses and characterization of SAAPs. Cytochrome c consists of a single polypeptide chain that is 104 amino acid residues long. Hemoglobin contains α- and β-polypeptide chains such that the total number of amino acid residues is ~2.8 times larger. The choice of species is aimed at generating different levels of amino acid variance and thus different numbers of SAAPs. Table I lists all amino acid variations and their positions within the protein sequences, as well as the sample source and mass difference. The information in Table I shows how SAAPs are distinguished with MS techniques, including those described herein. Consider, for example, the substitution of serine by threonine (S→T) at residue position 47 in bovine and equine cytochrome c, respectively. A pair of peptides containing this single variation would be distinguished by a shift in m/z equal to the sum of the mass difference from the SAAP (~14 Da for the S→T variation) and the mass difference associated with the incorporated isotopic label(s), divided by the peptide charge state.
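The shift described in the last sentence can be written as Δ(m/z) = (ΔM_SAAP + n × ΔM_label)/z. A minimal sketch, assuming both partner peptides carry the same number of labels n (true when the substitution does not add or remove a lysine) and a per-label shift of three H to D replacements:

```python
# Per-label mass difference between a CD3- and a CH3-containing label (Da).
LABEL_SHIFT = 3 * (2.01410178 - 1.00782503)


def expected_mz_shift(delta_residue: float, n_labels: int, z: int) -> float:
    """Predicted m/z offset between heavy- and light-labeled partner peptides.

    delta_residue: mass change of the substitution (0.0 for identical sequences).
    n_labels: labels carried by each peptide (assumed equal for both partners).
    z: charge state shared by both partner ions.
    """
    if z < 1:
        raise ValueError("charge state must be positive")
    return (delta_residue + n_labels * LABEL_SHIFT) / z


if __name__ == "__main__":
    s_to_t = 101.04768 - 87.03203  # monoisotopic Thr minus Ser residue mass
    print(f"[M+2H]2+, 2 labels, S->T: {expected_mz_shift(s_to_t, 2, 2):.3f}")
    print(f"[M+H]+, 2 labels, identical: {expected_mz_shift(0.0, 2, 1):.3f}")
```

For comparison, this Example reports observed differences of 10.00 and 6.03 m/z for the corresponding cytochrome c pairs.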

Table I. Complete list of amino acid variations between species for these proteins.

Protein                 Residue(a)       Mass difference(b)
cytochrome c            S47 → T47        14.02
cytochrome c            G60 → K60        71.07
cytochrome c            G89 → T89        44.03
hemoglobin α chain      G8 → S8          30.01
hemoglobin α chain      H20 → N20        23.02
hemoglobin α chain      A22 → G22        14.02
hemoglobin α chain      E23 → A23        58.00
hemoglobin α chain      A60 → E60        58.00
hemoglobin α chain      E71 → G71        72.02
hemoglobin α chain      A79 → T79        30.00
hemoglobin α chain      E82 → D82        14.02
hemoglobin α chain      S111 → C111      15.98
hemoglobin α chain      S115 → N115      27.01
hemoglobin β chain      A12 → G12        14.02
hemoglobin β chain      G24 → A24        14.02
hemoglobin β chain      S43 → H43        50.02
hemoglobin β chain      T49 → N49        13.00
hemoglobin β chain      A86 → Q86        57.02
hemoglobin β chain      K103 → R103      28.01
hemoglobin β chain      N116 → H116      23.02
hemoglobin β chain      F117 → H117      10.01
hemoglobin β chain      F119 → N119      33.03
hemoglobin β chain      R143 → K143      28.01

(a) Variant amino acid residues and positions. The residues on the left and right correspond to those found in bovine and equine cytochrome c and in bovine and sheep hemoglobin, respectively.
(b) Mass differences (Da) are obtained from the monoisotopic values for each residue.
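The mass differences in Table I follow directly from monoisotopic residue masses. A short sketch that recomputes them (the table lists magnitudes; the signed value below is positive when the variant residue is heavier). The residue-mass dictionary uses standard monoisotopic values; the function name is illustrative.

```python
# Monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def substitution_delta(original: str, variant: str) -> float:
    """Signed mass change (Da) when residue `original` is replaced by `variant`."""
    return MONO[variant] - MONO[original]


if __name__ == "__main__":
    for orig, var in [("S", "T"), ("G", "K"), ("G", "T"), ("S", "C"), ("T", "N")]:
        print(f"{orig}->{var}: {abs(substitution_delta(orig, var)):.2f} Da")
```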

Fig. 2 shows nested tD(m/z) distributions for isotopically labeled mixtures of bovine/equine cytochrome c and bovine/sheep hemoglobin digests. Typically, features in tD(m/z) datasets fall into mobility families based on peptide charge state. Such families consist of a low-mobility, singly-charged ion family as well as higher-mobility doubly- and triply-charged ion families. Under the spray conditions employed, the doubly- and triply-charged families are favored in both protein datasets. Visual inspection of the two datasets demonstrates the increased complexity of the hemoglobin mixture; many more high-abundance features are evident. Fig. 2 also demonstrates a major advantage of the IMS-MS analysis. That is, the mobility dispersion of all species makes it possible to observe many low-abundance species (e.g., features in the singly-charged, [M+H]+, family) because they are removed from regions of interfering higher-abundance signals. This is discussed in more detail below. Fig. 3A shows an expanded region of the two-dimensional plot of the cytochrome c data shown in Fig. 2A for several typical peptides. In examining the plot in detail (see Fig. 3B), we find that peptides (A and B) from different species having the same sequence exhibit essentially identical mobility distributions. The 6.03 m/z difference between the experimentally determined values of 762.50 and 768.53 for this [M+H]+ pair indicates that two labels have been incorporated into the peptides and that the sequences are identical. Additionally, two labels suggest that the C-terminal residue is lysine. The combined information is used to identify these peaks; we assign this pair to the light (H3)- and heavy (D3)-labeled peptide ion [YIPGTK+H]+ (SEQ ID NO: 1) from bovine and equine cytochrome c [calculated m/z 762.38 and 768.38, respectively]. Peptides with amino acid sequence variations are observed as peaks that differ in m/z depending upon the amino acid variation. Fig. 3B shows two peptide sequence variants (C and D) assigned as [TGQAPGFSYTDANK+2H]2+ (SEQ ID NO: 2; m/z = 770.84) and [TGQAPGFTYTDANK+2H]2+ (SEQ ID NO: 3; m/z = 780.84) from bovine and equine cytochrome c, respectively. The experimentally determined m/z difference, 10.00, corresponds to the S47 → T47 substitution (bovine → equine) plus the isotopic shift associated with the incorporation of two labels in each peptide. A complete list of all SAAP peptides, as well as unique peptides (i.e., those found in only one of the protein samples), is given in Table II and Table III for cytochrome c and hemoglobin, respectively.
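The reasoning used for peaks A to D can be run in reverse: an identical-sequence doublet gives the label count, and for a variant pair, removing the label contribution leaves a residual mass that can be matched against residue substitutions. A minimal sketch, assuming a per-label shift of 3.01883 Da and a 0.1 Da matching tolerance (both assumptions); the function names are illustrative.

```python
from itertools import permutations
from typing import List

# Monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
LABEL_SHIFT = 3.01883  # Da per CD3 vs CH3 label


def infer_label_count(light_mz: float, heavy_mz: float, z: int) -> int:
    """Nearest whole number of labels explaining an identical-sequence doublet."""
    return round((heavy_mz - light_mz) * z / LABEL_SHIFT)


def residual_mass(light_mz: float, heavy_mz: float, z: int, n_labels: int) -> float:
    """Neutral mass difference left after removing the label contribution."""
    return (heavy_mz - light_mz) * z - n_labels * LABEL_SHIFT


def candidate_substitutions(residual: float, tol: float = 0.1) -> List[str]:
    """Residue replacements X->Y whose monoisotopic mass change matches `residual`."""
    hits = []
    for original, variant in permutations(MONO, 2):
        if abs((MONO[variant] - MONO[original]) - residual) <= tol:
            hits.append(f"{original}->{variant}")
    return sorted(hits)


if __name__ == "__main__":
    print("labels in A/B doublet:", infer_label_count(762.50, 768.53, 1))
    r = residual_mass(770.84, 780.84, 2, 2)
    print(f"C/D residual: {r:.3f} Da ->", candidate_substitutions(r))
```

Several substitutions (for example S→T, G→A, D→E and V→L/I) share the same 14.016 Da difference, so the residual alone narrows, but does not fix, the identity of the change; the known sequences or fragmentation data resolve it.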
In total 12 and 26 peptide variants were observed, corresponding to 5 and 11 amino acid polymorphisms for cytochrome c and hemoglobin, respectively.

An interesting issue that arises in the IMS distributions for the [TGQAPGFSYTDANK+2H]2+ (SEQ ID NO: 2) and [TGQAPGFTYTDANK+2H]2+ (SEQ ID NO: 3) ions is their degree of similarity. Many factors contribute to the establishment of gas-phase peptide ion conformation, including peptide amino acid composition and primary sequence, the overall length of the peptide, and the cation (for positively charged ions) used in the electrospray ionization process. Because the mobility separation distinguishes peptide ions based on differences in overall collision cross section (as well as charge), the nature of an amino acid substitution must be considered when assessing the utility of the mobility dimension as a differentiator of SAAP peptides. Fig. 3B also shows the drift time distributions for the two cytochrome c SAAP peptides (C and D). Notably, both distributions contain three peaks at similar drift times. That three peaks are resolved requires that at least three conformation types are stable at 300 K over the timescale of the experiment.

Additionally, the width (~2.5 ms) at the base of the triplet peaks is nearly the same for both peptides. The similarity of these distributions, which serves as a signature of the two SAAP peptides, contrasts with the quite different mobility distribution of the unassigned, doubly-charged feature E shown in Fig. 3B. That distribution is largely composed of a single peak at ~15 ms, corresponding to ions with relatively high mobilities. Thus, one approach is to examine IMS peak shapes and positions as a means of confirming SAAPs that are found from m/z analysis.
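Comparing peak shapes can be made quantitative with a similarity score between drift-time distributions. The text does not specify a metric; the sketch below uses cosine similarity with a small search over whole-bin shifts (to tolerate the slight tD offset between partners) as an illustrative choice, and assumes both distributions are sampled on the same drift-bin grid.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def shift_bins(values: Sequence[float], shift: int) -> List[float]:
    """Shift a profile by whole drift bins, zero-padding (positive = later times)."""
    n = len(values)
    if abs(shift) >= n:
        return [0.0] * n
    if shift >= 0:
        return [0.0] * shift + list(values[: n - shift])
    return list(values[-shift:]) + [0.0] * (-shift)


def best_shifted_similarity(a: Sequence[float], b: Sequence[float],
                            max_shift: int = 3) -> Tuple[float, int]:
    """Highest cosine similarity of a with b shifted by -max_shift..max_shift bins."""
    best = (-1.0, 0)
    for s in range(-max_shift, max_shift + 1):
        score = cosine_similarity(a, shift_bins(b, s))
        if score > best[0]:
            best = (score, s)
    return best


if __name__ == "__main__":
    # Synthetic profiles, not data from Fig. 3B.
    triplet = [0, 1, 5, 2, 6, 1, 3, 0, 0]
    triplet_later = shift_bins(triplet, 1)   # same shape, one bin later
    single_peak = [0, 0, 0, 0, 0, 0, 0, 9, 0]
    print(best_shifted_similarity(triplet, triplet_later))
    print(best_shifted_similarity(triplet, single_peak))
```

A high score at a small shift matches the pattern described for peaks C and D (same shape, slightly offset), while a profile like feature E scores low at every shift.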

As also illustrated in Fig. 3B, the two mobility distributions for the SAAP peptides are dissimilar in some respects. For example, the relative abundances of the three features are not identical: the middle peak dominates the bovine distribution, while the lowest-mobility ions dominate the equine distribution. In addition to the difference in relative abundances, the entire distributions occur over different tD ranges; that is, all three features from the higher molecular weight peptide are observed at slightly longer times (see Fig. 3B). Such a shift is expected because of the correlation of size and mass (Valentine, S. J.; Counterman, A. E.; Hoaglund-Hyzer, C. S.; Clemmer, D. E. J. Phys. Chem. B 1999, 103, 1203-1207). In this case, the observed shift in tD results from the increase in peptide size originating with the S→T substitution (a difference of a single methyl group).

The data presented in Fig. 3 demonstrate the possibility of using mobility information both to relate SAAP peptides (through similarities in mobility distributions) and to distinguish these species (through distribution differences). For example, the S→T amino acid substitution described above might be expected to produce similar mobility distributions (triplet peaks), as both amino acids are polar and of relatively similar size. However, substitutions involving other types of amino acid residues (non-polar or charged), or substitutions occurring at key positions within the peptide sequence, can result in significant changes in mobility distributions (including large shifts in tD as well as changes in the number of stable structures). Thus, for some peptides the mobility information can be combined with molecular modeling and cross section calculations to confirm the presence of SAAPs. For example, this type of analysis can be used to identify peptide sequences having identical m/z values (but different cross sections). See Henderson, S. C.; Valentine, S. J.; Counterman, A. E.; Clemmer, D. E. Anal. Chem. 1999, 71, 291-301.

One problem with MS analyses of complex mixtures such as the hemoglobin sample (Fig. 2) is that the signal from low-abundance species (or those with low ionization efficiencies) can be obscured by isobaric ions. As described previously, IMS separation can reduce chemical noise. Fig. 4A shows an expanded region of the bovine and sheep hemoglobin data shown in Fig. 2B. The selected region highlights the low-m/z, low-mobility (singly-charged) peptide ions. Many features are evident that would otherwise not be observed with MS analysis alone; they would be buried in the higher-mobility chemical noise. This is illustrated by the mass spectral insets associated with specific regions of the data. Integrating all drift bins at each m/z value (i.e., the mass spectrum that would be obtained in the absence of mobility separation) yields a relatively complicated spectrum, with a baseline of ~500 ion counts over the region shown. A narrow slice across the [M+H]+ family shows two distinct features corresponding to the SAAP peptide ion [GHGAK+H]+ (SEQ ID NO: 4) and the unique peptide ion [NFGK+H]+ (SEQ ID NO: 5; see Table III). The observation of these ions illustrates the ability of the combined IMS-MS approach to reveal low-intensity features that are not observed with MS alone.
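The contrast between integrating over all drift bins and taking a narrow drift slice can be sketched with a toy tD(m/z) grid. The grid layout (rows are drift bins, columns are m/z bins) and the toy counts are assumptions for illustration, not data from Fig. 4.

```python
from typing import List, Optional, Sequence


def integrate_drift(grid: Sequence[Sequence[float]], drift_lo: int = 0,
                    drift_hi: Optional[int] = None) -> List[float]:
    """Sum a tD(m/z) intensity grid over drift bins [drift_lo, drift_hi).

    grid[i][j] holds the counts in drift bin i and m/z bin j. Summing over all
    drift bins gives the mass spectrum expected without mobility separation;
    a narrow drift window gives the spectrum of a single mobility family.
    """
    rows = list(grid[drift_lo:drift_hi])
    if not rows:
        return []
    return [float(sum(column)) for column in zip(*rows)]


if __name__ == "__main__":
    # Toy grid: intense higher-mobility signal in early drift bins and a weak,
    # low-mobility (long drift time) feature in the last drift bin, m/z bin 1.
    toy = [[40, 50, 45],
           [35, 45, 50],
           [0, 0, 0],
           [0, 3, 0]]
    print("all drift bins:", integrate_drift(toy))          # weak feature is swamped
    print("last drift bin:", integrate_drift(toy, 3, 4))    # weak feature is isolated
```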

The ability to identify low-abundance (or low-response) species leads to relatively high sequence coverage for the proteins, even with the short analysis times employed here (< 60 s). The percent sequence coverage from IMS-MS analysis is ~76% and ~94% for bovine and equine cytochrome c, respectively, compared with only ~52% and ~68% for MS analysis alone. In addition to providing high sequence coverage, almost all of the amino acid variations (~83%) between the two cytochrome c molecules are accounted for by at least one peptide. The observed variant peptides constitute a significant portion of these proteins, ~24% and ~25% of the total sequence of bovine and equine cytochrome c, respectively. A relatively large fraction of each hemoglobin protein has also been observed (~64% and ~73% for IMS-MS analysis, compared with ~52% and ~59% for MS analysis alone, for bovine and sheep hemoglobin, respectively). Variant peptides again constitute a significant fraction of the protein sequence, comprising ~29% and ~42% of the bovine and sheep hemoglobin sequences, respectively. Only ~50% of the sequence variations are accounted for in the hemoglobin analysis.
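Sequence coverage, as quoted above, is the percentage of residues covered by at least one observed peptide. A minimal sketch, assuming exact substring matching of observed peptide sequences against the protein sequence; it is not the software used in this disclosure and is not intended to reproduce the percentages above.

```python
from typing import Iterable


def sequence_coverage(protein: str, peptides: Iterable[str]) -> float:
    """Percent of residues in `protein` covered by at least one observed peptide."""
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for pep in peptides:
        if not pep:
            continue
        start = protein.find(pep)
        while start != -1:  # mark every occurrence of the peptide
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return 100.0 * sum(covered) / len(protein)


if __name__ == "__main__":
    # Toy example: a 16-residue stretch of bovine cytochrome c and one observed peptide.
    print(sequence_coverage("TGQAPGFSYTDANKNK", ["TGQAPGFSYTDANK"]))
```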

The sequence coverage for cytochrome c and hemoglobin obtained from experiments is shown below in Scheme II and Scheme III, respectively.

SCHEME II.

CYTOCHROME C BOVINE

GDVEKGKKIF VQKCAQCHTV EKGGKHKTGP NLHGLFGRKT

GQAPGF[S]YTD ANKNKGITW[G] EETLMEYLEN PKKYIPGTKM IFAGIKKK[G]E REDLIAYLKKATNE (SEQ ID NO: 6)

CYTOCHROME C EQUINE

GDVEKGKKIF VQKCAQCHTV EKGGKHKTGP NLHGLFGRKT

GQAPGF[T]YTD ANKNKGITW[K] EETLMEYLEN PKKYIPGTKM IFAGIKKK[T]E REDLIAYLKKATNE (SEQ ID NO: 7)

SEQ ID NO: 6 and SEQ ID NO: 7 of scheme II show the complete sequences of bovine and equine cytochrome c, respectively. Peptides observed with the IMS-MS are shown in bold (SAAP peptides), italic (identical peptides observed as MS doublets), and underline (unique peptides or an identical peptide observed from only one protein sample). Amino acid variations are placed within brackets. For a list of observed SAAP peptides as well as peptides that are, in fact, unique to each protein see Table II. Sequence identifiers for each of the peptide sequences disclosed in Table II are as follows:

GITWK (SEQ ID NO: 8)
NKGITWK (SEQ ID NO: 9)
TYTDANK (SEQ ID NO: 10)
TYTDANKNK (SEQ ID NO: 11)
KTEREDLIAY (SEQ ID NO: 12)
TGQAPGFTYTDANK (SEQ ID NO: 3)
EETLMEYLENPK (SEQ ID NO: 52)
KTGQAPGFSY (SEQ ID NO: 13)
KGEREDLIAY (SEQ ID NO: 14)
TGQAPGFSYTDANK (SEQ ID NO: 15)

Table II. Assignments for peptide ions from a cytochrome c digest mixture.

Bovine peptide     Equine peptide     Charge(a)  No. labels(b)  Experimental m/z(c)  Calculated m/z(d)  Peak(e)
-                  GITWK              1          1, 2           649.39, 694.39       649.34, 694.34     ue1, ue2
-                  NKGITWK            1, 2       1              891.57, 446.32       891.48, 446.24     ue3, ue4
-                  TYTDANK            2          2              451.76               451.69             se1
-                  TYTDANKNK          2          1              550.30               550.26             se2
KTGQAPGFSY         -                  2          2              570.32               570.26             sb3
KGEREDLIAY         KTEREDLIAY         2          2              639.31, 664.33       639.31, 664.32     sb4, se4
TGQAPGFSYTDANK     TGQAPGFTYTDANK     2          2              770.84, 780.84       770.84, 780.84     sb5, se5
-                  EETLMEYLENPK       2          2              793.42               793.35             ue5

(a) Charge (z) of the assigned peptide ion. Two numbers indicate that the peptide was observed as both a singly- and a doubly-charged ion.
(b) The total number of isotopic labels incorporated into the peptide. Two numbers correspond to the observation of the same peptide as two different dataset features, with the incorporation of one or two isotopic labels.
(c) The experimentally determined m/z value for the monoisotopic peptide ion.
(d) Calculated m/z values are obtained for digest peptides from the Peptide Mass program (http://ca.expasy.org/tools/peptide-mass.html). To these values the masses of the incorporated labels are added, as well as the total number of attached protons. The summed value is then divided by the charge state of the ion. Two values correspond to the SAAP peptides from different sources, the same peptide in different charge states, or the same peptide with differing numbers of attached labels.
(e) Peaks observed in the cytochrome c dataset. Here, ue, ub, se and sb are unique equine, unique bovine, SAAP equine, and SAAP bovine peptides, respectively.

Scheme III.

HEMOGLOBIN α-CHAIN BOVINE

VLS 1 -VAPKiG)NV KAA WGKVGG (HI AIAEIYGAEALE RMFLSFPTTK TYFPHFDLSH GSAQVKGHG[A) KVAAALTKAV { EjHLDDLPG {A} L S{E}LSDLHAHK LRVDPVNFKL LSHSLLVTLA SHLPSDFTPA VRASLDKFLA NVSTVLTSKYR (SEQ ID NO: 16)

HEMOGLOBIN β-CHAIN BOVINE

MLTAEEKAAV T{A}FWGKVKVD EVG(G)EALGRI LVVYPWTQRF

FE{S}FGDLS{T}A DAVMNNPKVK AHGKKVLDSF SNGMKHLDDL U-GTFA(A)LSEL HCDKLHVDPE NF(K)LLGNVLV WLAR(NF)G(KIE FTPVLQADFQ KWAGVANAL AH(R)YH (SEQ ID NO: 17)

HEMOGLOBIN α-CHAIN SHEEP

VLSAAPK(S)NV KAAWGKVGGiN] A(GA|YGAEALE RMFLSFPTTK TYFPHFDLSH GSAQVKGHG[K) KVAAALTKAY (G)HLDDLPG(T)L S(D)LSDLHAHK LRVDPVNFKL LSHSLLVTLA (C)HLP(N)DFTPA VHASLDKFL4 NVSTVLTSKY R (SEQ ID NO: 18)

HEMOGLOBIN β-CHAIN SHEEP

MLTAEEKAAV T(GIFWGKVKVD EVG(A)EALGRL LWYPWTQKt

FE(HIFGDLS (NIA DAVMNNPKVK AHGKKFI-Z) 1 S 1 F SNGMKHLDDL J^GTFA(Q)LSEL HCDKLHVDPE NF(R)LLGNVLV WLAR(HH)G(N)E FTPVLOADFQ KWAGVANAL AH(K)YH (SEQ ID NO: 19)

Scheme III shows the complete sequences of bovine and sheep hemoglobin; both chains are shown for each protein. Peptides are indicated in the same fashion as in Scheme II. For a list of observed SAAP peptides, as well as peptides that are, in fact, unique to each protein, see Table III. Sequence identifiers for each of the peptide sequences disclosed in Table III are as follows:
GNVK (SEQ ID NO: 21)
NFGK (SEQ ID NO: 5)
GHGAK (SEQ ID NO: 4)
AAVTAFWGK (SEQ ID NO: 22)
GNVKAAWGK (SEQ ID NO: 23)
AAVTAFWGK (SEQ ID NO: 24)
VDEVGGEALGR (SEQ ID NO: 25)
AAVTAFWGKVK (SEQ ID NO: 26)
VKVDEVGGEALGR (SEQ ID NO: 27)
GTFAALSELHCDK (SEQ ID NO: 28)
EFTPVLQADFQK (SEQ ID NO: 29)
VVAGVANALAHRYH (SEQ ID NO: 30)
VGGHAAEYGAEALER (SEQ ID NO: 31)
GHGEK (SEQ ID NO: 32)
AAVTGFWGK (SEQ ID NO: 33)
SNVKAAWGK (SEQ ID NO: 34)
AAVTGKWGK (SEQ ID NO: 35)
VDEVGAEALGR (SEQ ID NO: 36)
VVAGVANALAHK (SEQ ID NO: 37)
GHGEKVAAALTK (SEQ ID NO: 38)
VKVDEVGAEALGR (SEQ ID NO: 39)
GTFAQLSELHCDK (SEQ ID NO: 40)
VGGNAGAYGAEALER (SEQ ID NO: 41)
VVAGVANALAHKYH (SEQ ID NO: 42)
HHGNEFTPVLQADFQK (SEQ ID NO: 43)
FFEHFGDLSNADAVMNNPK (SEQ ID NO: 44)

It is instructive to consider the tryptic peptides containing amino acid variations that are not observed in the experiments. Such peptides include GITW{G}EETLMEYLENPK (SEQ ID NO: 45) from bovine cytochrome c; AV{E}HLDDLPG{A}LS{E}LSDLHAHK (SEQ ID NO: 46), LLSHSLLVTLA{S}HLP{S}DFTPAVHASLDK (SEQ ID NO: 47) and FFE{S}FGDLS{T}ADAVMNNPK (SEQ ID NO: 48) from bovine hemoglobin; and AV(G)HLDDLPG(T)LS(D)LSDLHAHK (SEQ ID NO: 49) and LLSHSLLVTLA(C)HLP(N)DFTPAVHASLDK (SEQ ID NO: 50) from sheep hemoglobin. The calculated masses of these peptides are 2008.95, 2366.19, 2968.60, 2088.95, 2310.16, and 3011.59, respectively; all of these values fall within ~33% of the value for the molecular weight cutoff filter employed in the sample cleanup steps (see above).

Table III. Assigned peptides from a hemoglobin digest mixture.

Bovine peptide       Sheep peptide         Chain(a)  Charge(b)  Labels(c)  Experimental m/z(d)  Calculated m/z(e)  Peak(f)
GNVK                 -                     α         1          1          459.25               459.24             sb1
NFGK                 -                     β         1          1, 2       507.27, 549.23       507.24, 549.24     ub1
GHGAK                GHGEK                 α         1          2          553.29, 617.30       553.25, 617.25     sb2, ss2
AAVTAFWGK            AAVTGFWGK             β         1          2          1034.59, 1026.57     1034.50, 1026.49   sb3, ss3
GNVKAAWGK            SNVKAAWGK             α         2          2          507.70, 525.76       507.76, 525.76     sb4, ss4
AAVTAFWGK            AAVTGKWGK             β         2          2          517.71, 513.78       517.75, 513.75     sb5, ss5
VDEVGGEALGR          VDEVGAEALGR                                                                .28                sb6, ss6
-                    VVAGVANALAHK                                                                                  ss7
AAVTAFWGKVK          -                                                                                             sb8
-                    GHGEKVAAALTK                                                                                  ss9
VKVDEVGGEALGR        VKVDEVGAEALGR                                                              .37                sb10, ss10
GTFAALSELHCDK        GTFAQLSELHCDK                                                              .84                sb11, ss11
-                    VGGNAGAYGAEALER                                                                               us1
EFTPVLQADFQK         -                                                                                             ub2
VVAGVANALAHRYH       VVAGVANALAHKYH                                                             .40                sb12, ss12
VGGHAAEYGAEALER      -                                                                                             ub3
-                    HHGNEFTPVLQADFQK                                                                              us2
-                    FFEHFGDLSNADAVMNNPK                                                                           us3

(a) Hemoglobin chain in which the peptide is found.
(b) Charge (z) of the assigned peptide ion.
(c) The total number of isotopic labels incorporated into the peptide. Two numbers correspond to the observation of the same peptide as two different dataset features, from the incorporation of one or two isotopic labels.
(d) The experimentally determined m/z value for the monoisotopic peptide ion.
(e) Calculated m/z values are obtained for digest peptides from the Peptide Mass program (http://ca.expasy.org/tools/peptide-mass.html). To these values the masses of the incorporated labels are added, as well as the total number of attached protons. The summed value is then divided by the charge state of the ion. Two values correspond either to the SAAP peptides from different sources or to the same peptide with differing numbers of attached labels.
(f) Peaks observed in the hemoglobin dataset. Here, ub, us, sb and ss are unique bovine, unique sheep, SAAP bovine, and SAAP sheep peptides, respectively.

Thus, such species may not constitute an appreciable amount of the sample peptides. This argument does not exclude other factors (such as trypsinization or ionization efficiency) that may also contribute to the non-observation of these and other variant peptides.
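The comparison between the unobserved variant peptides and the 3 kDa cutoff filter used during cleanup can be restated as simple percentages. This sketch only re-expresses the calculated masses quoted above relative to a nominal 3000 Da cutoff; it assumes, as is typical of ultrafiltration membranes, that retention near the nominal cutoff is partial rather than all-or-nothing.

```python
# Calculated masses (Da) of the unobserved variant peptides quoted in the text.
UNOBSERVED = [2008.95, 2366.19, 2968.60, 2088.95, 2310.16, 3011.59]
CUTOFF_DA = 3000.0  # nominal 3 kDa molecular weight cutoff


def percent_of_cutoff(mass_da: float, cutoff_da: float = CUTOFF_DA) -> float:
    """Peptide mass expressed as a percentage of the filter's nominal cutoff."""
    return 100.0 * mass_da / cutoff_da


if __name__ == "__main__":
    for m in UNOBSERVED:
        print(f"{m:8.2f} Da -> {percent_of_cutoff(m):5.1f}% of cutoff")
```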

Distinguishing sequence variants is the first step in SAAP peptide analysis; however, a further goal is to determine the nature of the polymorphism itself (i.e., the position and identities of the amino acids). For these protein systems, where the sequence differences are known, the parent mass together with the number of added isotopic labels is used to determine the identity of peptide variants. However, for systems where the peptide variants are not known or where the mixture contains multiple proteins, such an approach is intractable. In these cases it is useful to examine collision-induced dissociation (CID) data. In the present IMS-MS system, CID is implemented using a split-field approach (see Valentine, S. J.; Koeniger, S. L.; Clemmer, D. E. Anal. Chem. 2003, 75, 6202-6208, and Hoaglund-Hyzer, C. S.; Li, J.; Clemmer, D. E. Anal. Chem. 2000, 72, 2737-2740). Briefly, the field in the second, short mobility region at the back of the drift tube can be modulated to transmit precursor ions or to create fragments. Thus, both precursor MS and CID-MS information can be obtained.

Fig. 5 shows two CID-MS spectra: one for the peptide [TGPNLHGLFGR+2H]2+ (SEQ ID NO: 51) from cytochrome c (bovine and equine) and one for the SAAP peptide [VVAGVANALAHK+2H]2+ (SEQ ID NO: 39) from sheep hemoglobin. Both spectra contain a homologous y-type ion series, and both series are quite extensive. For example, the y2 to yi \ and the y2 to y10 fragment ions are observed for the former and latter peptides, respectively. In this type of defined system, the fragment ion assignments can be interpreted manually. The approach can also be combined with database search methods.
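The y-ion ladders discussed above can be predicted from sequence for manual interpretation. A minimal sketch, assuming monoisotopic residue masses, singly protonated fragments, unmodified cysteine, and an optional label mass carried on each lysine side chain. The 45.0294 Da heavy (CD3-acetyl) label value in the example is a calculation under that labeling assumption, not a number from the text.

```python
from typing import List, Tuple

# Monoisotopic residue masses (Da); cysteine unmodified.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def y_ions(peptide: str, lys_label: float = 0.0) -> List[Tuple[str, float]]:
    """Singly protonated y1..y(n-1) m/z values for a peptide.

    lys_label: mass added to every lysine side chain (e.g. an acetyl label).
    The N-terminal label stays with the b-type fragments, so it never appears here.
    """
    ions = []
    mass = WATER + PROTON
    for k, aa in enumerate(reversed(peptide[1:]), start=1):
        mass += MONO[aa] + (lys_label if aa == "K" else 0.0)
        ions.append((f"y{k}", mass))
    return ions


if __name__ == "__main__":
    for name, mz in y_ions("TGPNLHGLFGR"):
        print(f"{name}: {mz:.3f}")
    heavy = y_ions("VVAGVANALAHK", lys_label=45.029396)  # assumed CD3-acetyl on K
    print("VVAGVANALAHK y1 (heavy-labeled K):", round(heavy[0][1], 3))
```

Comparing such predicted ladders for two variant sequences shows where the y-ion masses begin to diverge, which is how a fragment ladder localizes the substituted residue.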

While the disclosure has been illustrated and described in detail above, such illustration and description is to be considered as exemplary and not restrictive in character, it being understood that only the preferred embodiments have been shown and described and that all changes and modifications that come within the spirit of the disclosure are desired to be protected. All cited references are expressly incorporated herein by reference.