Title:
IDENTIFICATION OF PROTEINS IN THE ABSENCE OF PEPTIDE IDENTIFICATIONS
Document Type and Number:
WIPO Patent Application WO/2017/037563
Kind Code:
A1
Abstract:
A plurality of measured product ion spectra are produced using a DIA tandem mass spectrometry method for a sample of two or more known proteins. A desired confidence probability for the identification of at least one known protein of two or more known proteins is received. Two or more combinations of N theoretical product ions that are exclusive to the at least one known protein are calculated. The at least one known protein is identified by finding product ions in the plurality of measured product ion spectra that match two or more of the two or more combinations of N theoretical product ions that provide a combined confidence probability greater than or equal to the desired confidence probability.

Inventors:
SHERMAN JAMES ANDREW (CA)
TATE STEPHEN A (CA)
Application Number:
PCT/IB2016/054990
Publication Date:
March 09, 2017
Filing Date:
August 19, 2016
Assignee:
DH TECHNOLOGIES DEV PTE LTD (SG)
International Classes:
G01N33/68; G16B40/10; H01J49/00
Domestic Patent References:
WO2014195783A1 2014-12-11
Foreign References:
US20150144778A1 2015-05-28
US20120049058A1 2012-03-01
US20090090857A1 2009-04-09
Other References:
GILLET ET AL.: "Targeted Data Extraction of the MS/MS Spectra Generated by Data-independent Acquisition: A New Concept for Consistent and Accurate Proteome Analysis", MOLECULAR & CELLULAR PROTEOMICS, vol. 11, no. 6, 1 June 2012 (2012-06-01), pages 1 - 17, XP055201307
See also references of EP 3345003A4
Claims:
WHAT IS CLAIMED IS:

1. A system for deterministically identifying to a desired confidence probability a known protein of a sample from other known proteins of the sample using a combination of two or more exclusive combinations of theoretical product ions as an error detection and correction code in a tandem mass spectrometry data independent acquisition (DIA) method that does not provide peptide identifying information, comprising:

a separation device that separates peptides of two or more known proteins from a sample over time;

an ion source that receives the peptides from the separation device and ionizes the peptides, producing an ion beam of precursor ions;

a tandem mass spectrometer that receives the ion beam, divides a mass-to-charge ratio (m/z) range of the ion beam into two or more precursor ion mass selection windows, and selects and fragments the two or more precursor ion mass selection windows during each cycle of a plurality of cycles, producing a plurality of measured product ion spectra; and

a processor in communication with the tandem mass spectrometer that

(a) receives the plurality of measured product ion spectra from the tandem mass spectrometer,

(b) receives a desired confidence probability for the identification of at least one known protein of two or more known proteins,

(c) calculates two or more combinations of N theoretical product ions that are exclusive to the at least one known protein, and

(d) identifies the at least one known protein by finding product ions in the plurality of measured product ion spectra that match two or more of the two or more combinations of N theoretical product ions that provide a combined confidence probability greater than or equal to the desired confidence probability.

2. The system of claim 1, wherein the two or more of the two or more combinations of N theoretical product ions that match the product ions in the plurality of measured product ion spectra comprise an error detection and correction code.

3. The system of claim 1, wherein the processor calculates two or more combinations of N theoretical product ions that are exclusive to the at least one known protein in the set of two or more proteins by

(ci) retrieving from a memory a sequence for each protein of the two or more known proteins,

(cii) calculating for each sequence of the two or more known proteins one or more theoretical peptides and computationally selecting and fragmenting each theoretical peptide of the one or more theoretical peptides, producing a plurality of theoretical product ions for each protein of the two or more known proteins,

(ciii) selecting a number, N, of theoretical product ions to be used to identify known proteins,

(civ) from the plurality of theoretical product ions for each protein of the two or more known proteins, calculating every different combination of N theoretical product ions, producing one or more combinations for each protein, and

(cv) comparing each combination of each protein of the two or more known proteins to each combination of every other protein of the two or more known proteins to determine two or more combinations of N theoretical product ions that exclusively identify the at least one known protein of the two or more known proteins.

4. The system of claim 1, wherein the processor further calculates a confidence probability for each of the two or more combinations of N theoretical product ions by directly calculating a likelihood that each product ion of each combination occurs at random from data stored about the product ions.

5. The system of claim 1, wherein the processor further calculates a confidence probability for each of the two or more combinations of N theoretical product ions by estimating a likelihood that any product ion occurs at random, Y, calculating a confidence probability for any product ion as (1 - Y), and calculating a confidence probability for each of the two or more combinations of N theoretical product ions as a product of the confidence probabilities of the product ions of the at least one exclusive combination, (1 - Y)^N.

6. The system of claim 2, wherein the processor further iteratively executes steps (ciii)-(cv) and increases the number N in each iteration until the processor further determines in step (cv) two or more combinations of N theoretical product ions that exclusively identify the at least one known protein of the two or more known proteins.

7. The system of claim 1, wherein in step (d) the processor finds product ions in the plurality of measured product ion spectra that match two or more of the two or more combinations of N theoretical product ions that provide a combined confidence probability greater than or equal to the desired confidence probability by

comparing the measured intensity levels of the plurality of measured product ion spectra at the m/z values of the product ions of two or more of the two or more combinations of N theoretical product ions that provide a combined confidence probability greater than or equal to the desired confidence probability to a threshold level that indicates the presence of a product ion.

8. The system of claim 2, wherein

step (ciii) the processor selects a number, N, of theoretical product ion pairs to be used to identify known proteins,

step (civ) the processor, from the plurality of theoretical product ions for each protein of the two or more known proteins, calculates every different combination of N theoretical product ion pairs, producing one or more combinations for each protein, wherein each theoretical ion pair is from the same theoretical peptide, and

step (d) the processor finds product ion pairs in the plurality of measured product ion spectra that match two or more of the two or more combinations of N theoretical product ion pairs that provide a combined confidence probability greater than or equal to the desired confidence probability by

performing curve subtraction on two ion extracted chromatograms (XICs) calculated from the plurality of measured product ion spectra for each pair of product ions of two or more of the two or more combinations of N theoretical product ion pairs.

9. The system of claim 1, wherein before step (civ) the processor compares each product ion of each plurality of theoretical product ions for each protein of the two or more known proteins to the plurality of measured product ion spectra and removes each product ion from the plurality of theoretical product ions if the product ion is not present in the plurality of measured product ion spectra.

10. The system of claim 2, wherein in step (cii) the processor further calculates an elution order for each theoretical peptide of each known protein and stores the elution order with each theoretical product ion calculated from theoretical peptide, and

in step (cv) the processor further uses elution order in comparing each combination of each protein of the two or more known proteins to each combination of every other protein of the two or more known proteins to determine two or more combinations of N theoretical product ions that exclusively identify the at least one known protein of the two or more known proteins.

11. The system of claim 10, wherein in step (d) the processor finds product ion pairs in the plurality of measured product ion spectra that match two or more of the two or more combinations of N theoretical product ion pairs that provide a combined confidence probability greater than or equal to the desired confidence probability by

calculating extracted ion chromatograms (XICs) for product ions of the plurality of measured product ion spectra,

comparing the measured intensity levels of the XICs of the product ions of the plurality of measured product ion spectra at the m/z values of the product ions of two or more of the two or more combinations of N theoretical product ions that provide a combined confidence probability greater than or equal to the desired confidence probability to a threshold level that indicates the presence of a product ion, and

comparing the retention times of the XICs of the product ions of the plurality of measured product ion spectra at the m/z values of the product ions of the two or more of the two or more combinations of N theoretical product ions to the elution orders of the product ions of the two or more of the two or more combinations of N theoretical product ions.

12. A method for deterministically identifying a known protein of a sample from other known proteins of the sample using a tandem mass spectrometry data independent acquisition (DIA) method that does not provide peptide identifying information, comprising:

(a) receiving a plurality of measured product ion spectra from a tandem mass spectrometer using a processor,

wherein the plurality of measured product ion spectra are produced by the tandem mass spectrometer by dividing a mass-to-charge ratio (m/z) range of an ion beam into two or more precursor ion mass selection windows and selecting and fragmenting the two or more precursor ion mass selection windows during each cycle of a plurality of cycles,

wherein the ion beam is produced by an ion source that ionizes peptides of two or more known proteins, producing an ion beam of precursor ions, and wherein the peptides are separated from a sample by a separation device;

(b) receiving a desired confidence probability for the identification of at least one known protein of two or more known proteins using the processor;

(c) calculating two or more combinations of N theoretical product ions that are exclusive to the at least one known protein using the processor; and

(d) identifying the at least one known protein by finding product ions in the plurality of measured product ion spectra that match two or more of the two or more combinations of N theoretical product ions that provide a combined confidence probability greater than or equal to the desired confidence probability using the processor.

13. The method of claim 12, wherein calculating two or more combinations of N theoretical product ions that are exclusive to the at least one known protein comprises

(ci) retrieving from a memory a sequence for each protein of the two or more known proteins,

(cii) calculating for each sequence of the two or more known proteins one or more theoretical peptides and computationally selecting and fragmenting each theoretical peptide of the one or more theoretical peptides, producing a plurality of theoretical product ions for each protein of the two or more known proteins,

(ciii) selecting a number, N, of theoretical product ions to be used to identify known proteins,

(civ) from the plurality of theoretical product ions for each protein of the two or more known proteins, calculating every different combination of N theoretical product ions, producing one or more combinations for each protein, and

(cv) comparing each combination of each protein of the two or more known proteins to each combination of every other protein of the two or more known proteins to determine two or more combinations of N theoretical product ions that exclusively identify the at least one known protein of the two or more known proteins.

14. A computer program product, comprising a non-transitory and tangible computer-readable storage medium whose contents include a program with instructions being executed on a processor so as to perform a method for deterministically identifying a known protein of a sample from other known proteins of the sample using a tandem mass spectrometry data independent acquisition (DIA) method that does not provide peptide identifying information, comprising:

(a) providing a system, wherein the system comprises one or more distinct software modules, and wherein the distinct software modules comprise a measurement module and an analysis module;

(b) receiving a plurality of measured product ion spectra from a tandem mass spectrometer using the measurement module,

wherein the plurality of measured product ion spectra are produced by the tandem mass spectrometer by dividing a mass-to-charge ratio (m/z) range of an ion beam into two or more precursor ion mass selection windows and selecting and fragmenting the two or more precursor ion mass selection windows during each cycle of a plurality of cycles,

wherein the ion beam is produced by an ion source that ionizes peptides of two or more known proteins, producing an ion beam of precursor ions, and wherein the peptides are separated from a sample by a separation device;

(c) receiving a desired confidence probability for the identification of at least one known protein of two or more known proteins using the analysis module;

(d) calculating two or more combinations of N theoretical product ions that are exclusive to the at least one known protein using the analysis module; and

(e) identifying the at least one known protein by finding product ions in the plurality of measured product ion spectra that match two or more of the two or more combinations of N theoretical product ions that provide a combined confidence probability greater than or equal to the desired confidence probability using the analysis module.

15. The computer program product of claim 14, wherein calculating two or more combinations of N theoretical product ions that are exclusive to the at least one known protein comprises

(di) retrieving from a memory a sequence for each protein of the two or more known proteins,

(dii) calculating for each sequence of the two or more known proteins one or more theoretical peptides and computationally selecting and fragmenting each theoretical peptide of the one or more theoretical peptides, producing a plurality of theoretical product ions for each protein of the two or more known proteins,

(diii) selecting a number, N, of theoretical product ions to be used to identify known proteins,

(div) from the plurality of theoretical product ions for each protein of the two or more known proteins, calculating every different combination of N theoretical product ions, producing one or more combinations for each protein, and

(dv) comparing each combination of each protein of the two or more known proteins to each combination of every other protein of the two or more known proteins to determine two or more combinations of N theoretical product ions that exclusively identify the at least one known protein of the two or more known proteins.

Description:
IDENTIFICATION OF PROTEINS IN THE ABSENCE OF PEPTIDE IDENTIFICATIONS

CROSS REFERENCE TO RELATED APPLICATION

This application claims the benefit of U.S. Provisional Patent Application Serial No. 62/212,282, filed August 31, 2015, the content of which is incorporated by reference herein in its entirety.

INTRODUCTION

[0002] Various embodiments relate generally to tandem mass spectrometry. More particularly various embodiments relate to systems and methods for deterministically identifying known proteins in a sample from product ion data produced by a data independent acquisition (DIA) tandem mass spectrometry method without the aid of any peptide precursor ion data and for increasing the levels of confidence of such identifications.

[0003] A common problem in proteomics is determining the identity of the proteins present in a sample. Typically, proteins are identified in a sample using a two-step tandem mass spectrometry process. In the first step, experimental data is obtained. The proteins in the sample are digested using an enzyme such as trypsin, producing one or more peptides for each protein. Note that a peptide, as used herein, is a digested portion of a protein. Some proteins can be digested intact, so a peptide can also be the entire protein. However, in most cases peptides are digested portions of proteins.

[0004] The peptides digested from proteins are then separated from the sample over time using a sample introduction device or separation device. The separated peptides are then ionized using an ion source. The ionized peptides, or peptide precursor ions, are selected by mass-to-charge ratio (m/z), the selected precursor ions are fragmented, and the resulting product ions are mass analyzed using a tandem mass spectrometer. The result of the first step is a collection of one or more product ion mass spectra measured at one or more different times.

[0005] In the second step, computer generated information about known proteins expected to be in the experimental sample is compared to the experimental data. The known proteins are obtained from a database, and are computationally digested using the same enzyme used in the tandem mass spectrometry experiment, producing one or more theoretical peptides for each known protein. Theoretical peptides are computationally selected and fragmented, producing theoretical product ions for each known protein. The resulting theoretical product ions are then compared to each of the one or more measured product ion mass spectra at each of the one or more different times. Typically, known proteins are scored based on how well their theoretical product ions match the one or more measured product ion mass spectra. The proteins in the sample are then identified from the highest scoring known proteins.
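The computational digestion and fragmentation described in this step can be illustrated with a minimal Python sketch. It is an illustration only, under stated assumptions: trypsin is taken to cleave after K or R except when the next residue is P, only singly charged b and y ions of unmodified peptides are generated, and the function names `digest` and `fragment` are hypothetical. The residue masses are standard monoisotopic values rounded to five decimals.

```python
import re

# Monoisotopic residue masses in daltons (standard values, rounded).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # mass of H2O
PROTON = 1.007276   # mass of one proton (charge carrier)


def digest(sequence):
    """Split a sequence after K or R, except when followed by P.

    Assumed simplified trypsin rule; relies on re.split handling
    zero-width matches, which requires Python 3.7 or later.
    """
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def fragment(peptide):
    """Return singly charged b- and y-ion m/z values for one peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions = {}
    for i in range(1, len(peptide)):
        ions[f"b{i}"] = sum(masses[:i]) + PROTON           # N-terminal fragment
        ions[f"y{i}"] = sum(masses[-i:]) + WATER + PROTON  # C-terminal fragment
    return ions


# Hypothetical toy sequence: theoretical product ions per theoretical peptide.
peptides = digest("AKRPGK")
product_ions = {p: fragment(p) for p in peptides}
```

A real workflow would also handle missed cleavages, modifications, and higher charge states, none of which are modeled here.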

[0006] Unfortunately, however, this traditional method of identifying proteins in a sample has at least three problems. First of all, it is not deterministic. In other words, even though the product ions of a particular known protein match the experimental product ion spectra better than the product ions of other known proteins, it cannot actually be determined that the particular known protein was identified. This is because all of the matching product ions of the particular known protein may be shared by other proteins. In other words, the product ions that best match the experimental product ion spectra may not be exclusive to the particular known protein. Consequently, the match may be a result of product ions from two or more other known proteins.

[0007] A second problem with the traditional method of identifying proteins in a sample is that there is no way to establish increasing quantifiable levels of confidence (since traditional peptide ID is non-deterministic). In the traditional method, a known protein may be identified in comparison to other known proteins with a 95% confidence level. As a result, there is a 5% chance the identification is incorrect independent of the non-deterministic nature of the assay. In addition, there is no way to increase the confidence level in the traditional method. Such levels of confidence do not allow tandem mass spectrometry identifications to be used in clinical analysis. Error rates required for clinical use are typically on the order of 0.01% (1 in 10,000) or 0.001% (1 in 100,000), for example.

[0008] A third problem with the traditional method of identifying proteins in a sample is that, for DIA workflows, the traditional identification method typically requires additional precursor ion information to disambiguate convolved experimental product ions. In other words, the traditional method cannot reliably identify proteins directly from DIA product ion data without peptide precursor ion identification information.

[0009] DIA is a tandem mass spectrometry workflow. In general, tandem mass spectrometry, or MS/MS, is a well-known technique for analyzing compounds. As described above, tandem mass spectrometry involves ionization of one or more compounds from a sample, selection of one or more precursor ions of the one or more compounds, fragmentation of the one or more precursor ions into product ions, and mass analysis of the product ions. Tandem mass spectrometry can provide both qualitative and quantitative information. The product ion spectrum can be used to identify a molecule of interest. The intensity of one or more product ions can be used to quantitate the amount of the compound present in a sample.

A large number of different types of experimental methods or workflows can be performed using a tandem mass spectrometer. Three broad categories of these workflows are targeted acquisition, information dependent acquisition (IDA) or data dependent acquisition (DDA), and DIA.

In a targeted acquisition method, one or more transitions of a peptide precursor ion to a product ion are predefined for one or more proteins. As a sample is being introduced into the tandem mass spectrometer, the one or more transitions are interrogated during each time period or cycle of a plurality of time periods or cycles. In other words, the mass spectrometer selects and fragments the peptide precursor ion of each transition and performs a targeted mass analysis for the product ion of the transition. As a result, a mass spectrum is produced for each transition. Targeted acquisition methods include, but are not limited to, multiple reaction monitoring (MRM) and selected reaction monitoring (SRM).

IDA is a flexible tandem mass spectrometry method in which a user can specify criteria for performing targeted or untargeted mass analysis of product ions while a sample is being introduced into the tandem mass spectrometer. For example, in an IDA method a precursor ion or mass spectrometry (MS) survey scan is performed to generate a precursor ion peak list. The user can select criteria to filter the peak list for a subset of the precursor ions on the peak list. MS/MS is then performed on each precursor ion of the subset of precursor ions. A product ion spectrum is produced for each precursor ion. MS/MS is repeatedly performed on the precursor ions of the subset of precursor ions as the sample is being introduced into the tandem mass spectrometer.
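The peak list filtering step of an IDA method can be sketched in Python as follows. This is a hedged illustration, not any instrument's actual selection logic: the criteria used here (a minimum intensity and a maximum number of precursors per cycle) and the function name `select_precursors` are assumptions chosen for the example.

```python
def select_precursors(peak_list, min_intensity, max_precursors):
    """Filter a survey-scan peak list down to the precursors sent to MS/MS.

    peak_list: list of (mz, intensity) tuples from the MS survey scan.
    Keeps peaks at or above min_intensity, then the most intense
    max_precursors of those (assumed criteria for illustration).
    """
    candidates = [peak for peak in peak_list if peak[1] >= min_intensity]
    candidates.sort(key=lambda peak: peak[1], reverse=True)
    return [mz for mz, _ in candidates[:max_precursors]]


# Hypothetical survey scan: (m/z, intensity) pairs.
survey_peaks = [(500.3, 1200.0), (622.8, 300.0), (710.4, 5000.0), (845.1, 950.0)]
selected = select_precursors(survey_peaks, min_intensity=500.0, max_precursors=2)
```

Because the selected subset depends on the survey scan of each cycle, the precursors that are fragmented can differ from run to run.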

[0014] In proteomics and many other sample types, however, the complexity and dynamic range of compounds is very large. This poses challenges for traditional targeted and IDA methods, requiring very high speed MS/MS acquisition to deeply interrogate the sample in order to both identify and quantify a broad range of analytes.

[0015] As a result, DIA methods have been used to increase the reproducibility and comprehensiveness of data collection from complex samples. DIA methods can also be called non-specific fragmentation methods. In a traditional DIA method, the actions of the tandem mass spectrometer are not varied among MS/MS scans based on data acquired in a previous precursor or product ion scan. Instead a precursor ion mass range is selected. A precursor ion mass selection window is then stepped across the precursor ion mass range. All precursor ions in the precursor ion mass selection window are fragmented and all of the product ions of all of the precursor ions in the precursor ion mass selection window are mass analyzed.

[0016] The precursor ion mass selection window used to scan the mass range can be very narrow so that the likelihood of multiple precursors within the window is small. This type of DIA method is called, for example, MS/MS^ALL. In an MS/MS^ALL method a precursor ion mass selection window of about 1 amu is scanned or stepped across an entire mass range. A product ion spectrum is produced for each 1 amu precursor mass window. A product ion spectrum for the entire precursor ion mass range is produced by combining the product ion spectra for each mass selection window. The time it takes to analyze or scan the entire mass range once is referred to as one scan cycle. Scanning a narrow precursor ion mass selection window across a wide precursor ion mass range during each cycle, however, is not practical for some instruments and experiments.

[0017] As a result, a larger precursor ion mass selection window, or selection window with a greater width, is stepped across the entire precursor mass range. This type of DIA method is called, for example, SWATH™ acquisition. In SWATH™ acquisition the precursor ion mass selection window stepped across the precursor mass range in each cycle may have a width of 5-25 amu, or even larger. Like the MS/MS^ALL method, all the precursor ions in each precursor ion mass selection window are fragmented, and all of the product ions of all of the precursor ions in each mass isolation window are mass analyzed. However, because a wider precursor ion mass selection window is used, the cycle time can be significantly reduced in comparison to the cycle time of the MS/MS^ALL method.
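The effect of window width on the number of windows stepped per cycle can be shown with a short Python sketch. The precursor m/z range of 400 to 1200 is a hypothetical value chosen for illustration, the windows are assumed to be contiguous and non-overlapping, and `selection_windows` is not a name from any instrument software.

```python
import math


def selection_windows(mz_start, mz_stop, width):
    """Divide [mz_start, mz_stop] into contiguous windows of the given width.

    Returns a list of (low, high) bounds; the last window is clipped
    to mz_stop if the range is not an exact multiple of the width.
    """
    count = math.ceil((mz_stop - mz_start) / width)
    return [
        (mz_start + i * width, min(mz_start + (i + 1) * width, mz_stop))
        for i in range(count)
    ]


# Hypothetical 400-1200 m/z precursor range.
narrow = selection_windows(400, 1200, 1)    # 1 amu windows
wide = selection_windows(400, 1200, 25)     # 25 amu windows
```

With these assumed values the 1 amu scheme needs 800 windows per cycle and the 25 amu scheme needs 32, which is why the wider window shortens the cycle time when the time spent per window is comparable.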

[0018] U.S. Patent No. 8,809,770 describes how SWATH™ acquisition can be used to provide quantitative and qualitative information about the precursor ions of compounds of interest. In particular, the product ions found from fragmenting a precursor ion mass selection window are compared to a database of known product ions of compounds of interest. In addition, ion traces or extracted ion chromatograms (XICs) of the product ions found from fragmenting a precursor ion mass selection window are analyzed to provide quantitative and qualitative information.
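As a rough illustration of what an XIC is, the following Python sketch sums the intensity found near one product ion m/z value in the spectrum of each cycle and then applies a presence threshold. The data layout, the 0.01 m/z tolerance, and the names `extract_xic` and `is_present` are assumptions made for this example and are not taken from the cited patent.

```python
def extract_xic(cycles, target_mz, tolerance=0.01):
    """Build an extracted ion chromatogram for one product ion.

    cycles: list of (retention_time, spectrum) tuples for one precursor
    ion mass selection window, where spectrum is a list of
    (mz, intensity) tuples. Returns (retention_time, intensity) points.
    """
    xic = []
    for retention_time, spectrum in cycles:
        intensity = sum(
            i for mz, i in spectrum if abs(mz - target_mz) <= tolerance
        )
        xic.append((retention_time, intensity))
    return xic


def is_present(xic, threshold):
    """Treat the ion as present if any XIC point reaches the threshold."""
    return max((i for _, i in xic), default=0.0) >= threshold


# Hypothetical product ion spectra from three consecutive cycles.
cycles = [
    (10.0, [(147.113, 50.0), (300.2, 10.0)]),
    (10.5, [(147.112, 400.0)]),
    (11.0, [(147.114, 80.0), (147.50, 999.0)]),
]
xic = extract_xic(cycles, target_mz=147.1128)
```

The resulting trace rises and falls with retention time, so both the peak intensity and the elution time of a product ion can be read from it.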

[0019] As described above, however, identifying proteins in a sample analyzed using SWATH™ acquisition, for example, can be difficult, because there is no peptide precursor ion information provided with a precursor ion mass selection window to help determine the precursor ion that produces each product ion. In addition, because there is no peptide precursor ion information provided with a precursor ion mass selection window, it is also difficult to determine if a product ion is convolved with or includes contributions from multiple precursor ions within the precursor ion mass selection window.

As a result, the mass spectrometry industry currently is unable to deterministically identify proteins in a sample from experimental product ion data alone. Also, the mass spectrometry industry currently is unable to perform identifications of proteins from any type of experimental data with increasing levels of confidence. Finally, the mass spectrometry industry currently is unable to reliably identify proteins in a sample produced by a DIA tandem mass spectrometry method from product ion data without the aid of any precursor ion data.

SUMMARY

[0021] A system is disclosed for deterministically identifying to a desired confidence probability a known protein of a sample from other known proteins of the sample using a combination of two or more exclusive combinations of theoretical product ions as an error detection and correction code in a tandem mass spectrometry data independent acquisition (DIA) method that does not provide peptide identifying information. The system includes a separation device, an ion source, a tandem mass spectrometer, and a processor.

[0022] The separation device separates peptides of two or more known proteins from a sample over time. The ion source receives the peptides from the separation device and ionizes the peptides, producing an ion beam of precursor ions. The tandem mass spectrometer receives the ion beam. The tandem mass spectrometer divides a mass-to-charge ratio (m/z) range of the ion beam into two or more precursor ion mass selection windows. The tandem mass spectrometer selects and fragments the two or more precursor ion mass selection windows during each cycle of a plurality of cycles, producing a plurality of measured product ion spectra.

The processor receives the plurality of measured product ion spectra from the tandem mass spectrometer. The processor receives a desired confidence probability for the identification of at least one known protein of two or more known proteins. The processor calculates two or more combinations of N theoretical product ions that are exclusive to the at least one known protein. The processor identifies the at least one known protein by finding product ions in the plurality of measured product ion spectra that match two or more of the two or more combinations of N theoretical product ions that provide a combined confidence probability greater than or equal to the desired confidence probability.

A method is disclosed for deterministically identifying a known protein of a sample from other known proteins of the sample using a tandem mass spectrometry DIA method that does not provide peptide identifying information.

A plurality of measured product ion spectra are received from a tandem mass spectrometer using a processor. The plurality of measured product ion spectra are produced by the tandem mass spectrometer by dividing an m/z range of an ion beam into two or more precursor ion mass selection windows and selecting and fragmenting the two or more precursor ion mass selection windows during each cycle of a plurality of cycles. The ion beam is produced by an ion source that ionizes peptides of two or more known proteins, producing an ion beam of precursor ions. The peptides are separated from a sample by a separation device.

A desired confidence probability for the identification of at least one known protein of two or more known proteins is received using the processor. Two or more combinations of N theoretical product ions that are exclusive to the at least one known protein are calculated using the processor. The at least one known protein is identified by finding product ions in the plurality of measured product ion spectra that match two or more of the two or more combinations of N theoretical product ions that provide a combined confidence probability greater than or equal to the desired confidence probability using the processor.
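One way to read the calculation of exclusive combinations is sketched below in Python. It is a simplified illustration and not the disclosed implementation: product ions are represented as integer-binned m/z values so that they can be compared exactly (real data would need a mass tolerance), a combination is treated as exclusive when no other protein contains all N of its ions, and the names `exclusive_combinations` and `smallest_n_with_two` are hypothetical. The second function increases N until at least two exclusive combinations are found.

```python
from itertools import combinations


def exclusive_combinations(ions_by_protein, target, n):
    """Return combinations of n ions of `target` found in no other protein.

    ions_by_protein: dict mapping protein name to a set of binned m/z values.
    A combination is kept only if it is not a subset of any other
    protein's theoretical product ion set.
    """
    others = [ions for name, ions in ions_by_protein.items() if name != target]
    result = []
    for combo in combinations(sorted(ions_by_protein[target]), n):
        if not any(set(combo) <= other for other in others):
            result.append(combo)
    return result


def smallest_n_with_two(ions_by_protein, target, max_n=5):
    """Increase n until two or more exclusive combinations exist."""
    for n in range(1, max_n + 1):
        combos = exclusive_combinations(ions_by_protein, target, n)
        if len(combos) >= 2:
            return n, combos
    return None, []


# Hypothetical binned product ions for two known proteins.
ions = {"A": {100, 200, 300}, "B": {100, 200, 400}}
n, combos = smallest_n_with_two(ions, "A")
```

In this toy case protein A has only one exclusive single ion (300), so N = 1 is insufficient, while N = 2 yields two exclusive combinations, (100, 300) and (200, 300). The number of combinations grows quickly with the number of ions, so a practical implementation would need to prune the ion lists first.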

[0027] A computer program product is disclosed that includes a non-transitory and tangible computer-readable storage medium whose contents include a program with instructions being executed on a processor so as to perform a method for deterministically identifying to a desired confidence probability a known protein of a sample from other known proteins of the sample using a combination of two or more exclusive combinations of theoretical product ions as an error detection and correction code in a tandem mass spectrometry DIA method that does not provide peptide identifying information. In various embodiments, the method includes providing a system, wherein the system comprises one or more distinct software modules, and wherein the distinct software modules comprise a measurement module and an analysis module.

[0028] The measurement module receives a plurality of measured product ion spectra from a tandem mass spectrometer. The plurality of measured product ion spectra are produced by the tandem mass spectrometer by dividing an m/z range of an ion beam into two or more precursor ion mass selection windows and selecting and fragmenting the two or more precursor ion mass selection windows during each cycle of a plurality of cycles. The ion beam is produced by an ion source that ionizes peptides of two or more known proteins, producing an ion beam of precursor ions. The peptides are separated from a sample by a separation device.

[0029] The analysis module receives a desired confidence probability for the identification of at least one known protein of two or more known proteins. The analysis module calculates two or more combinations of N theoretical product ions that are exclusive to the at least one known protein. The analysis module identifies the at least one known protein by finding product ions in the plurality of measured product ion spectra that match two or more of the two or more combinations of N theoretical product ions that provide a combined confidence probability greater than or equal to the desired confidence probability.

[0030] These and other features of the applicant's teachings are set forth herein.

BRIEF DESCRIPTION OF THE DRAWINGS

[0031] The skilled artisan will understand that the drawings, described below, are for illustration purposes only. The drawings are not intended to limit the scope of the present teachings in any way.

[0032] Figure 1 is a block diagram that illustrates a computer system, upon which embodiments of the present teachings may be implemented.

[0033] Figure 2 is an exemplary diagram of a precursor ion mass-to-charge ratio (m/z) range that is divided into ten precursor ion mass selection windows for a data independent acquisition (DIA) workflow, in accordance with various embodiments.

[0034] Figure 3 is an exemplary diagram that graphically depicts the steps for obtaining product ion traces or XICs from each precursor ion mass selection window during each cycle of a DIA workflow, in accordance with various embodiments.

[0035] Figure 4 is an exemplary diagram that shows the three-dimensionality of an XIC obtained for a precursor ion mass selection window over time, in accordance with various embodiments.

[0036] Figure 5 is an exemplary diagram that graphically depicts the steps for calculating exclusive addresses for known proteins in a sample, in accordance with various embodiments.

[0037] Figure 6 is an exemplary diagram that graphically depicts the step for identifying known proteins in a sample by using their exclusive addresses, in accordance with various embodiments.

[0038] Figure 7 is a schematic diagram of a system for deterministically identifying to a desired confidence probability a known protein of a sample from other known proteins of the sample using a combination of two or more exclusive combinations of theoretical product ions as an error detection and correction code in a tandem mass spectrometry DIA method that does not provide peptide identifying information, in accordance with various embodiments.

[0039] Figure 8 is a flowchart showing a method for deterministically identifying to a desired confidence probability a known protein of a sample from other known proteins of the sample using a combination of two or more exclusive combinations of theoretical product ions as an error detection and correction code in a tandem mass spectrometry DIA method that does not provide peptide identifying information, in accordance with various embodiments.

[0040] Figure 9 is a schematic diagram of a system that includes one or more distinct software modules that performs a method for deterministically identifying to a desired confidence probability a known protein of a sample from other known proteins of the sample using a combination of two or more exclusive combinations of theoretical product ions as an error detection and correction code in a tandem mass spectrometry DIA method that does not provide peptide identifying information, in accordance with various embodiments.

[0041] Before one or more embodiments of the present teachings are described in detail, one skilled in the art will appreciate that the present teachings are not limited in their application to the details of construction, the arrangements of components, and the arrangement of steps set forth in the following detailed description or illustrated in the drawings. Also, it is to be understood that the phraseology and terminology used herein is for the purpose of description and should not be regarded as limiting.

DESCRIPTION OF VARIOUS EMBODIMENTS

COMPUTER-IMPLEMENTED SYSTEM

[0042] Figure 1 is a block diagram that illustrates a computer system 100, upon which embodiments of the present teachings may be implemented. Computer system 100 includes a bus 102 or other communication mechanism for communicating information, and a processor 104 coupled with bus 102 for processing information. Computer system 100 also includes a memory 106, which can be a random access memory (RAM) or other dynamic storage device, coupled to bus 102 for storing instructions to be executed by processor 104. Memory 106 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 104. Computer system 100 further includes a read only memory (ROM) 108 or other static storage device coupled to bus 102 for storing static information and instructions for processor 104. A storage device 110, such as a magnetic disk or optical disk, is provided and coupled to bus 102 for storing information and instructions.

[0043] Computer system 100 may be coupled via bus 102 to a display 112, such as a cathode ray tube (CRT) or liquid crystal display (LCD), for displaying information to a computer user. An input device 114, including alphanumeric and other keys, is coupled to bus 102 for communicating information and command selections to processor 104. Another type of user input device is cursor control 116, such as a mouse, a trackball or cursor direction keys for communicating direction information and command selections to processor 104 and for controlling cursor movement on display 112. This input device typically has two degrees of freedom in two axes, a first axis (i.e., x) and a second axis (i.e., y), that allows the device to specify positions in a plane.

[0044] A computer system 100 can perform the present teachings. Consistent with certain implementations of the present teachings, results are provided by computer system 100 in response to processor 104 executing one or more sequences of one or more instructions contained in memory 106. Such instructions may be read into memory 106 from another computer-readable medium, such as storage device 110. Execution of the sequences of instructions contained in memory 106 causes processor 104 to perform the process described herein. Alternatively, hard-wired circuitry may be used in place of or in combination with software instructions to implement the present teachings. Thus, implementations of the present teachings are not limited to any specific combination of hardware circuitry and software.

[0045] In various embodiments, computer system 100 can be connected to one or more other computer systems, like computer system 100, across a network to form a networked system. The network can include a private network or a public network such as the Internet. In the networked system, one or more computer systems can store and serve the data to other computer systems. The one or more computer systems that store and serve the data can be referred to as servers or the cloud, in a cloud computing scenario. The one or more computer systems can include one or more web servers, for example. The other computer systems that send and receive data to and from the servers or the cloud can be referred to as client or cloud devices, for example.

[0046] The term "computer-readable medium" as used herein refers to any media that participates in providing instructions to processor 104 for execution. Such a medium may take many forms, including but not limited to, non-volatile media, volatile media, and transmission media. Non-volatile media includes, for example, optical or magnetic disks, such as storage device 110. Volatile media includes dynamic memory, such as memory 106. Transmission media includes coaxial cables, copper wire, and fiber optics, including the wires that comprise bus 102.

[0047] Common forms of computer-readable media or computer program products include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, or any other magnetic medium, a CD-ROM, digital video disc (DVD), a Blu-ray Disc, any other optical medium, a thumb drive, a memory card, a RAM, PROM, and EPROM, a FLASH-EPROM, any other memory chip or cartridge, or any other tangible medium from which a computer can read.

[0048] Various forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to processor 104 for execution. For example, the instructions may initially be carried on the magnetic disk of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to computer system 100 can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal. An infra-red detector coupled to bus 102 can receive the data carried in the infra-red signal and place the data on bus 102. Bus 102 carries the data to memory 106, from which processor 104 retrieves and executes the instructions. The instructions received by memory 106 may optionally be stored on storage device 110 either before or after execution by processor 104.

[0049] In accordance with various embodiments, instructions configured to be executed by a processor to perform a method are stored on a computer-readable medium. The computer-readable medium can be a device that stores digital information. For example, a computer-readable medium includes a compact disc read-only memory (CD-ROM) as is known in the art for storing software. The computer-readable medium is accessed by a processor suitable for executing instructions configured to be executed.

[0050] The following descriptions of various implementations of the present teachings have been presented for purposes of illustration and description. It is not exhaustive and does not limit the present teachings to the precise form disclosed. Modifications and variations are possible in light of the above teachings or may be acquired from practicing of the present teachings. Additionally, the described implementation includes software but the present teachings may be implemented as a combination of hardware and software or in hardware alone. The present teachings may be implemented with both object-oriented and non-object-oriented programming systems.

DETERMINISTIC IDENTIFICATION OF PROTEINS

[0051] As described above, the mass spectrometry industry currently is unable to deterministically identify proteins in a sample from experimental product ion data alone. Also, the mass spectrometry industry currently is unable to perform identifications of proteins from any type of experimental data with increasing levels of confidence. Finally, the mass spectrometry industry currently is unable to reliably identify proteins in a sample produced by a data independent acquisition (DIA) tandem mass spectrometry method from product ion data without the aid of any precursor ion data.

[0052] In various embodiments, proteins are deterministically identified in a sample from experimental product ion data alone. This is done by calculating for each protein in the sample one or more combinations of two or more product ions that are exclusive to the protein and comparing the one or more combinations to the experimental product ion data. For each protein, each of the one or more combinations can be thought of as an address or code for the protein. The terms address and code are used interchangeably herein. As a result, one or more unique addresses or codes are calculated for each known protein in the sample.

[0053] In various embodiments, proteins are deterministically identified in a sample with increasing confidence by manipulating the addresses or codes exclusive to each protein according to information theory. An address or code is deterministic if it has enough information to differentiate the protein from all other proteins in the group. However, deterministic addresses can further be manipulated to include more information to increase the confidence in the identification. For example, each address can include more product ions, or two or more addresses can be used together.

[0054] In various embodiments, proteins are reliably and deterministically identified with increasing levels of confidence from a sample produced by a DIA tandem mass spectrometry method from product ion data without the aid of any precursor ion data. As described above, DIA tandem mass spectrometry methods increase the reproducibility and comprehensiveness of data collection from complex samples. They collect product ion spectra for all precursor ions within a large m/z range. By calculating deterministic addresses for each protein in a sample, manipulating the addresses to provide increasing levels of confidence, and comparing the manipulated addresses to product ion spectra produced from a DIA method, the proteins in a sample are reliably and deterministically identified with increasing levels of confidence and without the aid of any precursor ion data or precursor ion identification.

[0055] Sherman et al., Unique Ion Signature Mass Spectrometry, a Deterministic Method to Assign Peptide Identity, Mol Cell Proteomics, 2009 Sep, 8(9): 2051-2062, proposed a deterministic method for identifying peptides. They computationally digested proteins in a proteome and computationally selected and fragmented the resulting theoretical peptides, producing a set of theoretical product ions for each theoretical peptide in the proteome. They then compared all theoretical peptide precursor ions and product ions and found the precursor ion to product ion transitions exclusive to each theoretical peptide. They called a set of two or more of these transitions a unique ion signature (UIS). They then showed how these UISs can be used to reduce the number of transitions needed to identify a peptide using a targeted acquisition method, such as selected reaction monitoring (SRM). In contrast to the various embodiments described herein, Sherman et al. calculated addresses or codes exclusive to peptides and not to proteins. In other words, they found sets of precursor ion to product ion transitions that are exclusive to peptides. The various embodiments described herein calculate sets of precursor ion to product ion transitions that are exclusive to proteins. As a result, an address of a protein can include precursor ion to product ion transitions that are not exclusive to any of its peptides.

Sherman et al. also describe that further development of the peptide identification method would benefit from solutions adapted from the field of signal analysis. They provide that optimized solutions can provide added confidence in UIS identification.

In contrast to the various embodiments described herein, Sherman et al. do not describe any specific solutions adapted from the field of signal analysis. They also do not provide how such solutions can be adapted to peptide identification let alone to protein identification.

DIA methods provide a larger amount of data than targeted acquisition methods like SRM. In addition, unlike targeted acquisition methods like SRM, DIA methods do not provide information about the specific precursor ions from which each product ion is produced.

Figure 2 is an exemplary diagram 200 of a precursor ion mass-to-charge ratio (m/z) range that is divided into ten precursor ion mass selection windows for a data independent acquisition (DIA) workflow, in accordance with various embodiments. The m/z range shown in Figure 2 is 200 m/z. Note that the terms "mass" and "m/z" are used interchangeably herein. Generally, mass spectrometry measurements are made in m/z and converted to mass by multiplying by charge.

[0061] Each of the ten precursor ion mass selection or isolation windows spans or has a width of 20 m/z. Three of the ten precursor ion mass selection windows, windows 201, 202, and 210, are shown in Figure 2. Precursor ion mass selection windows 201, 202, and 210 are shown as non-overlapping windows with the same width. In various embodiments, precursor ion mass selection windows can overlap and/or can have variable widths. U.S. Patent Application No. 14/401,032 describes using overlapping precursor ion mass selection windows in a single cycle of SWATH™ acquisition, for example. U.S. Patent No. 8,809,772 describes using precursor ion mass selection windows with variable widths in a single cycle of SWATH™ acquisition, for example. In a conventional SWATH™ acquisition, each of the ten precursor ion mass selection windows is selected and then fragmented, producing ten product ion spectra for the entire m/z range shown in Figure 2.
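As a minimal, non-limiting sketch, the division of a 200 m/z range into ten equal, non-overlapping windows of 20 m/z can be expressed as follows. The function name `make_windows` and the starting value of 400 m/z are assumptions made for illustration only; the range boundaries are not specified above.

```python
from typing import List, Tuple


def make_windows(start_mz: float, end_mz: float, n_windows: int) -> List[Tuple[float, float]]:
    """Split [start_mz, end_mz] into equal, non-overlapping selection windows."""
    if n_windows < 2:
        raise ValueError("a DIA cycle uses two or more selection windows")
    width = (end_mz - start_mz) / n_windows
    return [(start_mz + i * width, start_mz + (i + 1) * width) for i in range(n_windows)]


# Hypothetical 200 m/z range; the lower bound of 400 m/z is assumed.
windows = make_windows(400.0, 600.0, 10)
for low, high in windows:
    print(f"{low:.0f}-{high:.0f} m/z")
```

Overlapping or variable-width windows would need a different construction, for example an explicit list of window boundaries.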

[0062] Figure 2 depicts non-variable and non-overlapping precursor ion mass selection windows used in a single cycle of an exemplary SWATH™ acquisition. A tandem mass spectrometer that can perform a SWATH™ acquisition method can further be coupled with a sample introduction device. The proteins of a sample are typically digested using an enzyme, such as trypsin, before the sample is introduced into the tandem mass spectrometer. As a result, the sample introduction device separates one or more digested proteins, or peptides, from the sample over time, for example. A sample introduction device can introduce a sample to the tandem mass spectrometer using a technique that includes, but is not limited to, injection, liquid chromatography, gas chromatography, capillary electrophoresis, or ion mobility. The separated one or more peptides are ionized by an ion source, producing an ion beam of precursor ions of the one or more proteins that are selected and fragmented by the tandem mass spectrometer.

As a result, for each time step of a sample introduction of separated proteins, each of the ten precursor ion mass selection windows is selected and then fragmented, producing ten product ion spectra for the entire m/z range. In other words, each of the ten precursor ion mass selection windows is selected and then fragmented during each cycle of a plurality of cycles.

Figure 3 is an exemplary diagram 300 that graphically depicts the steps for obtaining product ion traces or XICs from each precursor ion mass selection window during each cycle of a DIA workflow, in accordance with various embodiments. For example, ten precursor ion mass selection windows, represented by precursor ion mass selection windows 201, 202, and 210 in Figure 3, are selected and fragmented during each cycle of a total of 1000 cycles.

During each cycle a product ion spectrum is obtained for each precursor ion mass selection window. For example, product ion spectrum 311 is obtained by fragmenting precursor ion mass selection window 201 during cycle 1, product ion spectrum 312 is obtained by fragmenting precursor ion mass selection window 201 during cycle 2, and product ion spectrum 313 is obtained by fragmenting precursor ion mass selection window 201 during cycle 1000.

By plotting the intensities of the product ions in each product ion spectrum of each precursor ion mass selection window over time, XICs are obtained for each precursor ion mass selection window. For example, XIC 320 is calculated from the 1000 product ion spectra of precursor ion mass selection window 201. XIC 320 includes XIC peaks or traces for all of the product ions that are produced from fragmenting precursor ion mass selection window 201 during the 1000 cycles. Note that XICs can be plotted in terms of time or cycles.
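The construction of XICs from the per-cycle product ion spectra of one precursor ion mass selection window can be sketched as follows. This is an illustrative sketch only: the name `build_xic`, the representation of a spectrum as a dictionary of m/z to intensity, and the grouping of m/z values by rounding to two decimals are all assumptions, and the numeric values are invented.

```python
from collections import defaultdict
from typing import Dict, List


def build_xic(spectra: List[Dict[float, float]], decimals: int = 2) -> Dict[float, List[float]]:
    """Return {product ion m/z: [intensity in cycle 1, cycle 2, ...]} for one window."""
    n_cycles = len(spectra)
    traces: Dict[float, List[float]] = defaultdict(lambda: [0.0] * n_cycles)
    for cycle, spectrum in enumerate(spectra):
        for mz, intensity in spectrum.items():
            # Group nearby m/z values into one trace by rounding (assumed binning).
            traces[round(mz, decimals)][cycle] += intensity
    return dict(traces)


# Three cycles of one window; m/z values and intensities are hypothetical.
spectra_window_201 = [
    {175.119: 0.0, 304.161: 10.0},
    {175.119: 50.0, 304.161: 80.0},
    {175.119: 5.0},
]
xic = build_xic(spectra_window_201)
for mz, trace in sorted(xic.items()):
    print(mz, trace)
```

Each trace is indexed by cycle number; converting to time only requires the acquisition time of each cycle.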

[0067] XIC 320 is shown plotted in two dimensions in Figure 3. However, each XIC of each precursor ion mass selection window is actually three-dimensional, because the different XIC peaks represent different m/z values.

[0068] Figure 4 is an exemplary diagram 400 that shows the three-dimensionality of an XIC obtained for a precursor ion mass selection window over time, in accordance with various embodiments. In Figure 4, the x axis is time or cycle number, the y axis is product ion intensity, and the z axis is m/z. From this three-dimensional plot, more information is obtained. For example, XIC peaks 410 and 420 both have the same shape and occur at the same time, or same retention time. However, XIC peaks 410 and 420 have different m/z values. This may mean that XIC peaks 410 and 420 are isotopic peaks or represent different product ions from the same precursor ion. Similarly, XIC peaks 430 and 440 have the same m/z value, but occur at different times. This may mean that XIC peaks 430 and 440 are the same product ion, but they are from two different precursor ions.

[0069] Due to the three-dimensional nature of DIA XICs, proteins can also be identified based on the elution order and time between product ions. As a result, in various embodiments, addresses or codes exclusive to proteins can also include timing information.

[0070] The XIC for each precursor ion mass selection window can be filtered so that it includes only product ion XIC peaks produced from peptides. For example, XIC peaks with intensities below a predetermined threshold intensity and non-isotopic XIC peaks can be excluded from each XIC. The charge state of product ions of peptides can be determined through high resolution mass spectrometry. This is done, for example, by determining the m/z difference between isotopic peaks. As a result, the mass or molecular weight of product ions of peptides can be determined from high resolution mass spectrometry. In various embodiments, therefore, product ion data can be expressed as ion mass or molecular weight rather than m/z.
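The determination of charge state from the spacing of isotopic peaks can be sketched as follows. This sketch assumes positive-mode, protonated ions and an isotopic spacing of approximately 1.003355 Da divided by the charge; the proton-mass correction in `neutral_mass` is a standard detail added here and is not stated above. The function names and peak values are hypothetical.

```python
PROTON_MASS = 1.007276      # Da, mass of a proton
ISOTOPE_SPACING = 1.003355  # Da, approximate 13C - 12C mass difference


def charge_from_isotopes(mz_mono: float, mz_next_isotope: float) -> int:
    """Estimate charge from the m/z gap between two adjacent isotopic peaks."""
    delta = mz_next_isotope - mz_mono
    if delta <= 0:
        raise ValueError("the second peak must lie above the first")
    return max(1, round(ISOTOPE_SPACING / delta))


def neutral_mass(mz: float, charge: int) -> float:
    """Convert m/z of a protonated ion to neutral mass."""
    return charge * (mz - PROTON_MASS)


# Hypothetical isotopic pair spaced about 0.5 m/z apart, implying charge 2.
z = charge_from_isotopes(500.250, 500.752)
print(z, round(neutral_mass(500.250, z), 4))
```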

After obtaining product ion experimental data using a DIA method, proteins in a sample are identified by comparing theoretical product ions of known proteins to the product ion experimental data. Theoretical product ions are computationally generated from stored information about the one or more known proteins expected to be in the sample. This stored information can be stored in many different forms including, but not limited to, databases and flat files.

In various embodiments, stored information about known proteins or peptides is obtained from a FASTA file. The FASTA file is parsed. The proteins parsed from the FASTA file are then computationally digested using the same enzyme used to digest the sample in the experiments. Computational digestion of the one or more known proteins produces one or more theoretical peptides, or one or more peptide precursor ions, for each protein. Theoretical product ions for each protein are obtained by computationally fragmenting theoretical peptide precursor ions of each protein. For example, theoretical product ions are obtained by selecting the b and y fragments of theoretical peptide precursor ions.
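Computational digestion and fragmentation can be sketched as follows. The sketch assumes trypsin as the enzyme and the commonly used rule of cleaving after K or R unless the next residue is P. It lists b and y fragments by sequence only, without computing masses. The function names are hypothetical, and the example protein is formed by concatenating the three peptide sequences of the example protein given below, which is itself an assumption.

```python
from typing import List, Tuple


def digest_trypsin(sequence: str) -> List[str]:
    """Cleave after K or R unless the next residue is P (assumed trypsin rule)."""
    peptides: List[str] = []
    start = 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if residue in "KR" and not is_last and sequence[i + 1] != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


def b_y_fragments(peptide: str) -> List[Tuple[str, str]]:
    """Return (ion label, fragment sequence) for every b (prefix) and y (suffix) ion."""
    fragments: List[Tuple[str, str]] = []
    for k in range(1, len(peptide)):
        fragments.append((f"b{k}", peptide[:k]))
        fragments.append((f"y{k}", peptide[-k:]))
    return fragments


# Example protein assembled from three peptide sequences (assumed concatenation).
protein = "SGEPQSDDIEASR" + "HLIER" + "DVTYLTEEK"
peptides = digest_trypsin(protein)
print(peptides)
print(b_y_fragments("HLIER"))
```

A fuller implementation would also handle missed cleavages, modifications, and the conversion of each fragment sequence to an m/z value.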

In various embodiments, theoretical peptides and product ions of each protein are compared with theoretical peptides and product ions of every other protein expected in the sample in order to determine an exclusive combination of theoretical peptides and product ions for each protein. For example, a sample may include three known proteins, P1, P2, and P3. The i'th protein is denoted as Pi. The j'th theoretical peptide is denoted as pj. Theoretical peptide pj has fragments or product ions denoted as f(j,1) ... f(j,n_j). Protein and peptide indexes are independent. Known proteins P1, P2, and P3 are found to have the following sequences: P1 = (p1, p2, p3) = SGEPQSDDIEASR HLIER DVTYLTEEK; P2 = (p1, p2, p4) = SGEPQSDDIEASR HLIER LPLAAQGK; P3 = (p2, p3, p5) = HLIER DVTYLTEEK DSVLIR.

Note that P1 cannot be identified, because it is composed entirely of shared theoretical peptides p1, p2, and p3. Also, if there is insufficient information to confidently identify theoretical peptides p4 and p5, proteins P2 and P3 also cannot be identified. Peptide identifications cannot be used to identify proteins P2 and P3, because there is no way to determine that one source of p1 was P1 and the other source of p1 was P2. In addition, if there is also only partial fragmentation, the peptide may not be identifiable. Differentiating one peptide from more than one million peptides is a significantly more difficult problem than differentiating one protein from 20,000 possible proteins.

Assuming that each theoretical peptide generates only two product ions, the product ions for each theoretical peptide are: p1 → {f(1,1), f(1,2)}; p2 → {f(2,1), f(2,2)}; p3 → {f(3,1), f(3,2)}; p4 → {f(4,1), f(4,2)}; p5 → {f(5,1), f(5,2)}. A product ion combination length for the proteins is chosen. The length can be two or more product ions. In this example, the length is three.

In various embodiments, choosing the length of the product ion combinations can be an iterative process. A short length can be chosen first, and the length can be increased until exclusive combinations can be found for the proteins of interest.

[0079] In this example, the length is chosen to be three product ions. Every permutation of three product ions for each of proteins P2 and P3 is then calculated. Each combination is then compared against every other combination. Shared combinations or addresses are discarded.

[0080] Table 1 shows some of the permutations or combinations, Uk, calculated for proteins P2 and P3 that are exclusive to proteins P2 and P3.

Table 1

[0081] To summarize, for example, if U03 of Table 1 is detected in the experimental data without noise, then P2 is present in the sample. The same is true of all of U01 ... U08. Similarly, if any one of U09 ... U16 of Table 1 is detected in the experimental data without noise, then P3 is present in the sample. Note that Table 1 only depicts product ions from three different peptides in each combination. However, this is not a requirement. A combination or address can include two or more product ions from the same peptide.

[0082] Figure 5 is an exemplary diagram 500 that graphically depicts the steps for calculating exclusive addresses for known proteins in a sample, in accordance with various embodiments. In Step 510, the sequences of known or expected proteins in a sample are obtained from stored information 501 about proteins. Stored information 501 can be, but is not limited to, a flat file or database. As shown in Figure 5, and like the example described above, three protein sequences, P1, P2, and P3, are obtained, parsed, or queried from stored information 501.

In step 520, each of the three protein sequences is computationally digested based on the same enzyme used in the mass spectrometry experiment. The three sequences P1, P2, and P3 are digested into different combinations of five peptides, p1, p2, p3, p4, and p5.

In step 530, each of the peptides of each protein is computationally fragmented. As shown in Figure 5, each peptide fragments into two product ions. For example, peptide p1 fragments into f(1,1) and f(1,2).

In step 540, every permutation of combinations of three product ions of each protein is calculated. These are the addresses or codes of each protein. The use of three product ions in each combination or address is arbitrary. Generally the number of product ions, or the length of the address, can be increased iteratively.

In step 550, each combination or address of the known proteins is compared to every other combination or address of the known proteins. Redundant combinations or addresses are removed. The remaining addresses are the exclusive addresses for each protein. Note that P1 has no exclusive addresses. This is because it is composed entirely of shared theoretical peptides p1, p2, and p3, as described above. As a result, P1 cannot be identified.
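Steps 540 and 550 can be sketched for the three-protein example as follows. Two assumptions are made and should be read as such. First, each address takes one product ion from each of three different peptides, as in Table 1. Second, an address is treated as exclusive only if it cannot be assembled from product ions that the other proteins could contribute; this interpretation is chosen so that the result agrees with the statement that the first protein has no exclusive addresses. All names in the sketch are hypothetical.

```python
from itertools import combinations, product
from typing import Dict, FrozenSet, List, Set, Tuple

Ion = Tuple[int, int]  # (j, k) stands for product ion f(j,k) of peptide pj

# Peptide content of each protein, taken from the example above.
PROTEINS: Dict[str, List[int]] = {"P1": [1, 2, 3], "P2": [1, 2, 4], "P3": [2, 3, 5]}
IONS_PER_PEPTIDE = 2  # each peptide is assumed to yield two product ions


def protein_ions(peptides: List[int]) -> Set[Ion]:
    """All product ions f(j,k) that a protein's peptides can produce."""
    return {(j, k) for j in peptides for k in range(1, IONS_PER_PEPTIDE + 1)}


def exclusive_addresses(
    proteins: Dict[str, List[int]], length: int = 3
) -> Dict[str, List[FrozenSet[Ion]]]:
    """Enumerate addresses of the given length and keep the exclusive ones."""
    result: Dict[str, List[FrozenSet[Ion]]] = {}
    for name, peptides in proteins.items():
        # Product ions that the other proteins could contribute.
        others: Set[Ion] = set()
        for other_name, other_peptides in proteins.items():
            if other_name != name:
                others |= protein_ions(other_peptides)
        kept: List[FrozenSet[Ion]] = []
        # One product ion from each of `length` different peptides.
        for chosen in combinations(peptides, length):
            for frags in product(range(1, IONS_PER_PEPTIDE + 1), repeat=length):
                address = frozenset(zip(chosen, frags))
                if not address <= others:  # assumed exclusivity test
                    kept.append(address)
        result[name] = kept
    return result


addresses = exclusive_addresses(PROTEINS)
for name, found in addresses.items():
    print(name, len(found))
```

Under these assumptions the first protein has no exclusive address, while the second and third each have eight, one for every choice of product ion from their three peptides.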

After determining exclusive addresses for each known protein, one or more of the addresses are compared to the sample product ion spectra. If an exclusive address of a protein matches the sample product ion spectra, a protein is identified in the sample.

Figure 6 is an exemplary diagram 600 that graphically depicts the step for identifying known proteins in a sample by using their exclusive addresses, in accordance with various embodiments. In Step 610, for example, one or more of the exclusive addresses of proteins P2 and P3 are compared to the product ion spectral data collected for a sample using a DIA method. The exclusive addresses of proteins P2 and P3 are the same exclusive addresses shown in Figure 5, and the product ion spectral data is the same data shown in Figure 3, for example.

[0089] A match of just one exclusive address of proteins P2 and P3 is enough to identify proteins P2 and P3 in the absence of noise. However, noise is present in every experiment. As a result, matching one exclusive address of proteins P2 and P3 provides identifications with a certain level of confidence. As described above, the mass spectrometry industry to date has been unable to provide clinical levels of confidence for protein identifications.

[0090] In various embodiments, exclusive addresses calculated for known proteins are used to provide clinical levels of confidence for protein identifications. More specifically, exclusive addresses calculated for known proteins are used as error correcting codes to increase the levels of confidence for protein identifications.

[0091] Error correcting codes are a part of information theory that has long been used in the field of communications. More specifically, error correcting codes have been used to improve the communication of digital signals between a transmitter and a receiver through a channel that includes noise. At some point the digital ones and zeros sent through the channel get corrupted along the way. An exemplary error correcting code is simply to send each digital packet from the transmitter two or more times through the channel. The redundancy of the packets is increased until the uncertainty of the digital message is driven below some small level, for example. Error correcting codes could be applied to protein identification using mass spectrometry by, for example, repeatedly rerunning the same sample. This is like increasing the redundancy of the transmitted message in the field of communications. This is, however, impractical because it significantly reduces the throughput of samples.

Instead, in various embodiments, error correcting codes are applied in tandem mass spectrometry by increasing the redundancy of the identifications. In the field of communications, this is like increasing the number of times a message is read at the receiver. In protein identification, the tandem mass spectrometer is like a transmitter that sends product ion spectra rather than messages. A processor in communication with the tandem mass spectrometer is like the receiver that compares multiple exclusive combinations of product ions of a particular protein to the product ion spectra. Error correcting codes that can transform long lists of exclusive addresses of known proteins into high confidence identifications include, but are not limited to, best-N out of M, Reed-Solomon codes to mass spectrometry (MS)-data, low density parity check (LDPC) codes, convolutional codes, and the Bahl, Cocke, Jelinek and Raviv (BCJR) algorithm.

In various embodiments, the error correcting codes used in protein identification are simply the addresses of exclusive product ions for each known protein. The level of confidence for these codes can be established in a number of ways. For example, a large set of data can be used and the likelihood of each product ion occurring at random is calculated directly. Alternatively, a likelihood of random occurrence is estimated that errs on the safe side.

Using this latter method, for example, the likelihood of a product ion's occurrence being random interference is estimated to be Y = 0.1. The likelihood that three product ions are all true is the product of the likelihoods of the individual product ions. For example, the probability of exclusive address $U_{01}$ of protein P2 matching the measured product ion spectra is $\Pr(U_{01})$, where

$$\Pr(U_{01}) = \Pr(f_1) \times \Pr(f_2) \times \Pr(f_3) = (1 - Y)^3 = (1 - 0.1)^3 = 0.729,$$

and $f_1$, $f_2$, and $f_3$ denote the three product ions that make up the address. So the likelihood of protein P2 given exclusive address $U_{01}$ made up of three product ions is 0.729.

[0096] The likelihood of a protein is increased by identifying more exclusive addresses in the sample data. For example, if addresses $U_{01}$ and $U_{04}$ of protein P2 are identified, the confidence in the identification is given by the probability of $U_{01}$ OR $U_{04}$ (logical inclusive disjunction), $\Pr(U_{01} \lor U_{04})$, where

$$\Pr(U_{01} \lor U_{04}) = \Pr(U_{01})\Pr(U_{04}) + \Pr(U_{01})Q(U_{04}) + Q(U_{01})\Pr(U_{04})$$

$$\Pr(U_{01} \lor U_{04}) = 1 - \zeta^2.$$

Here $Q(U) = 1 - \Pr(U)$ is the probability that an address is not a true match, and $\zeta = Q(U_{01}) = Q(U_{04}) = 1 - 0.729 = 0.271$ when both addresses are made up of three product ions.
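The arithmetic above can be checked with a short sketch. The function names are illustrative and are not part of the disclosure. The sketch assumes that every product ion has the same likelihood Y of being random interference and that the two addresses are independent.

```python
def address_confidence(n_ions, y):
    """Probability that all n_ions product ions of one address are true: (1 - Y)^N."""
    return (1.0 - y) ** n_ions


def either_address(pr_a, pr_b):
    """Probability that address A OR address B is true, written as the three-term sum."""
    q_a, q_b = 1.0 - pr_a, 1.0 - pr_b  # Q(U) = 1 - Pr(U)
    return pr_a * pr_b + pr_a * q_b + q_a * pr_b


pr_u = address_confidence(3, 0.1)   # one address of three product ions, Y = 0.1
zeta = 1.0 - pr_u                   # probability that one address is not true
print(round(pr_u, 3))                       # single-address confidence
print(round(either_address(pr_u, pr_u), 6)) # two-address confidence
print(round(1.0 - zeta ** 2, 6))            # same value in closed form
```

The three-term sum and the closed form 1 - ζ² agree because the only excluded state is the one in which both addresses are false.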

[0097] Table 2 shows how the confidence level of protein identification is increased through the use of increasing numbers of exclusive addresses or codes.

Table 2

So if the desired error rate is 1 : 1,000, then matches to five exclusive addresses must be found in the product ion spectral data.

[0099] If the observed addresses in Table 2 share product ions, the symmetry of using powers of ζ breaks down. However, it is still possible to calculate the likelihood. It just requires enumeration of all the possible states and then the use of the binomial distribution and Y.
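A minimal sketch of the state enumeration for addresses that share product ions follows. It assumes that each distinct product ion is independently true with probability (1 - Y); the function name and the ion labels are hypothetical. For addresses with no shared ions it reduces to the 1 - ζ² form.

```python
from itertools import product


def prob_any_address_true(addresses, y):
    """Sum the probability of every ion state in which at least one address is fully true.

    addresses: list of collections of product ion labels (labels may be shared).
    y: assumed likelihood that any one product ion is random interference.
    """
    ions = sorted({ion for address in addresses for ion in address})
    total = 0.0
    for state in product([True, False], repeat=len(ions)):
        truth = dict(zip(ions, state))
        n_true = sum(state)
        # Probability of this exact true/false pattern under independent ions.
        p_state = (1.0 - y) ** n_true * y ** (len(ions) - n_true)
        if any(all(truth[ion] for ion in address) for address in addresses):
            total += p_state
    return total


disjoint = [("a", "b", "c"), ("d", "e", "f")]
shared = [("a", "b", "c"), ("a", "d", "e")]  # both addresses use ion "a"
print(round(prob_any_address_true(disjoint, 0.1), 6))
print(round(prob_any_address_true(shared, 0.1), 6))
```

Sharing an ion lowers the combined confidence, because a single interfered ion can invalidate both addresses at once. The enumeration grows as 2 to the power of the number of distinct ions, so it is practical only for small sets of addresses.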

[00100] As described above, each element of an address can be a single product ion from a different peptide of a protein. So, each element has a different m/z value. In order for an address to match the measured mass spectra, each m/z value of the address must have an intensity in the measured mass spectra above a certain threshold value. In other words, an intensity threshold level is used to determine if each element of an address is present, if each element of an address is a single product ion from a different peptide of a protein.
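The threshold test described above can be sketched as follows. The peak-list layout, the m/z tolerance, and the function names are assumptions made for illustration; the text specifies only that each m/z value of the address must have an intensity above a threshold.

```python
def ion_present(spectrum, mz, threshold, tol=0.01):
    """True if some peak within tol of mz has an intensity above threshold.

    spectrum: list of (m/z, intensity) pairs; tol is an assumed m/z tolerance.
    """
    return any(abs(peak_mz - mz) <= tol and intensity > threshold
               for peak_mz, intensity in spectrum)


def address_matches(spectrum, address, threshold, tol=0.01):
    """An address matches only if every one of its m/z values is present."""
    return all(ion_present(spectrum, mz, threshold, tol) for mz in address)


spectrum = [(72.044, 20.0), (147.113, 500.0), (300.2, 900.0)]
print(address_matches(spectrum, [147.11, 300.2], threshold=100.0))  # both ions above threshold
print(address_matches(spectrum, [147.11, 72.04], threshold=100.0))  # one ion too weak
```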

[00101] In various embodiments, each element of an address is a pair of product ions from the same peptide of the protein. If pairs of product ions from the same peptide are used as an element of the address, XIC curve subtraction can be used to determine if each element of the address is present. XIC curve subtraction is described in U.S. Provisional Patent Application No. 62,112,212, which is incorporated herein by reference. Essentially, XIC curve subtraction allows the presence of a pair of product ions of the peptide to be confirmed by calculating a measure of the difference between the XICs of the two product ions rather than by their measured intensities.
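The referenced application is not reproduced here, so the following is only one possible difference measure and not the incorporated method. It assumes that two product ions of the same peptide co-elute, so that their XICs have the same shape once each is scaled to its maximum. The function names and the cutoff value are hypothetical.

```python
def xic_difference(xic_a, xic_b):
    """Mean absolute difference between two XICs after scaling each to unit maximum."""
    if len(xic_a) != len(xic_b) or not xic_a:
        raise ValueError("XICs must be non-empty and sampled at the same time points")
    max_a, max_b = max(xic_a), max(xic_b)
    if max_a <= 0 or max_b <= 0:
        return 1.0  # an empty trace cannot confirm a pair
    return sum(abs(a / max_a - b / max_b) for a, b in zip(xic_a, xic_b)) / len(xic_a)


def pair_present(xic_a, xic_b, max_difference=0.1):
    """Treat the pair as present when the scaled curves nearly cancel (assumed cutoff)."""
    return xic_difference(xic_a, xic_b) <= max_difference


same_peptide_a = [0, 10, 50, 100, 50, 10, 0]
same_peptide_b = [0, 2, 10, 20, 10, 2, 0]     # same shape, lower intensity
unrelated = [100, 50, 10, 0, 0, 0, 0]          # elutes earlier
print(pair_present(same_peptide_a, same_peptide_b))
print(pair_present(same_peptide_a, unrelated))
```

Because the comparison uses curve shape rather than absolute intensity, a weak but co-eluting pair can still be confirmed.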

[00102] As described above, every permutation of product ions for the length of address chosen is calculated for each of the known proteins. Then every permutation or address of a protein is compared against every permutation or address of the other known proteins to find the exclusive permutations or addresses for each protein. This method of calculating the exclusive addresses is independent of the measured data.
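The data-independent calculation of exclusive addresses can be sketched as below. The sketch treats an address as an unordered combination of N product ion m/z values, which is an assumption (the paragraph above speaks of permutations), and it assumes the m/z values have already been rounded so that equal ions compare equal. The function name is hypothetical.

```python
from itertools import combinations


def exclusive_addresses(product_ions, n):
    """Return, for each protein, the N-ion addresses that no other protein can produce.

    product_ions: dict mapping protein name -> iterable of product ion m/z values.
    """
    addresses = {
        protein: {frozenset(c) for c in combinations(sorted(set(ions)), n)}
        for protein, ions in product_ions.items()
    }
    exclusive = {}
    for protein, own in addresses.items():
        # Union of the addresses of every other known protein.
        others = set().union(*(a for p, a in addresses.items() if p != protein))
        exclusive[protein] = own - others
    return exclusive


ions = {"P1": [1, 2, 3], "P2": [2, 3, 4]}
for protein, found in exclusive_addresses(ions, 2).items():
    print(protein, sorted(sorted(address) for address in found))
```

The number of addresses per protein grows combinatorially with the number of product ions, which is why shortening the ion lists before this step is attractive.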

In various embodiments, the exclusive addresses are more efficiently found by using the measured data in their calculation. For example, every product ion for each of the known proteins is still computationally generated. However, the list of product ions for each protein is shortened by determining if each product ion is present in the measured data first. Product ions are identified if they are above a certain threshold intensity level, for example. All the permutations of identified product ions for the length of address chosen are then calculated for each of the known proteins, and every permutation or address of identified product ions of a protein is compared against every permutation or address of identified product ions of the other known proteins to find the exclusive permutations or addresses for each protein.

System for Deterministic Protein Identification

[00104] Figure 7 is a schematic diagram of a system 700 for deterministically identifying to a desired confidence probability a known protein of a sample from other known proteins of the sample using a combination of two or more exclusive combinations of theoretical product ions as an error detection and correction code in a tandem mass spectrometry DIA method that does not provide peptide identifying information, in accordance with various embodiments. System 700 includes ion source 710, tandem mass spectrometer 720, and processor 730. In various embodiments, system 700 can also include separation device 740.

[00105] Separation device 740 can separate peptides of two or more known proteins from a sample over time using one of a variety of techniques. These techniques include, but are not limited to, ion mobility, gas chromatography (GC), liquid chromatography (LC), capillary electrophoresis (CE), or flow injection analysis (FIA).

[00106] Ion source 710 can be part of tandem mass spectrometer 720, or can be a separate device. Ion source 710 receives the peptides from separation device 740 and ionizes the peptides, producing an ion beam of precursor ions.

[00107] Tandem mass spectrometer 720 can include, for example, one or more physical mass filters and one or more physical mass analyzers. A mass analyzer of tandem mass spectrometer 720 can include, but is not limited to, a time-of-flight (TOF), quadrupole, an ion trap, a linear ion trap, an orbitrap, or a Fourier transform mass analyzer.

[00108] Tandem mass spectrometer 720 receives the ion beam from ion source 710. Tandem mass spectrometer 720 divides an m/z range of the ion beam into two or more precursor ion mass selection windows and selects and fragments the two or more precursor ion mass selection windows during each cycle of a plurality of cycles, producing a plurality of measured product ion spectra.
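The windowing scheme can be illustrated with a short sketch. The m/z range, the window width, and the function names are illustrative assumptions; the text requires only that the range be divided into two or more windows and that every window be selected and fragmented in each cycle.

```python
def precursor_windows(mz_start, mz_end, width):
    """Divide [mz_start, mz_end] into adjacent precursor ion mass selection windows."""
    windows = []
    low = mz_start
    while low < mz_end:
        high = min(low + width, mz_end)
        windows.append((low, high))
        low = high
    return windows


def acquisition_schedule(windows, n_cycles):
    """Yield (cycle, window) in acquisition order: every window once per cycle."""
    for cycle in range(n_cycles):
        for window in windows:
            yield cycle, window


windows = precursor_windows(400, 500, 25)
print(windows)
print(len(list(acquisition_schedule(windows, n_cycles=3))))  # spectra acquired in 3 cycles
```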

[00109] Processor 730 can be, but is not limited to, a computer, microprocessor, or any device capable of sending and receiving control signals and data from tandem mass spectrometer 720 and processing data. Processor 730 can be, for example, computer system 100 of Figure 1. In various embodiments, processor 730 is in communication with tandem mass spectrometer 720 and separation device 740.

[00110] Processor 730 performs a number of steps. In step (a), processor 730 receives the plurality of measured product ion spectra from tandem mass spectrometer 720. In step (b), processor 730 receives a desired confidence probability for the identification of at least one known protein of two or more known proteins. In step (c), processor 730 calculates two or more combinations of N theoretical product ions that are exclusive to the at least one known protein. Finally, in step (d), processor 730 identifies the at least one known protein by finding product ions in the plurality of measured product ion spectra that match two or more of the two or more combinations of N theoretical product ions that provide a combined confidence probability greater than or equal to the desired confidence probability.

[00111] In various embodiments, processor 730 identifies the at least one known protein by finding product ions in the plurality of measured product ion spectra that match combinations of the two or more combinations of N theoretical product ions. Confidence probabilities are then calculated for the matching combinations. Two or more matching combinations are then combined to provide a combined confidence probability greater than or equal to the desired confidence probability.
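One way to combine matching combinations until the desired confidence probability is reached is sketched below. It assumes the matched combinations are independent, so that the combined confidence is one minus the product of the individual miss probabilities. The function name is hypothetical.

```python
def addresses_needed(matched_confidences, desired):
    """Return how many matched combinations are needed to reach the desired confidence.

    Combinations are taken highest confidence first. Returns None when even all
    of the matched combinations together fall short of the desired confidence.
    """
    miss = 1.0  # probability that none of the combinations used so far is true
    ranked = sorted(matched_confidences, reverse=True)
    for count, confidence in enumerate(ranked, start=1):
        miss *= 1.0 - confidence
        if 1.0 - miss >= desired:
            return count
    return None


print(addresses_needed([0.729] * 5, desired=0.99))
print(addresses_needed([0.729] * 2, desired=0.99))
```

With a single-combination confidence of 0.729, three combinations give about 0.980 and four give about 0.995, so four are needed for a desired confidence of 0.99.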

[00112] Alternatively, in various embodiments, a confidence probability is calculated for each combination of the two or more of the two or more combinations of N theoretical product ions. Combinations of two or more of the two or more combinations of N theoretical product ions that provide a combined confidence probability greater than or equal to the desired confidence probability are then compared to the plurality of measured product ion spectra to identify the at least one known protein.

[00113] In various embodiments, the two or more of the two or more combinations of N theoretical product ions that match the product ions in the plurality of measured product ion spectra represent an error detection and correction code.

[00114] In various embodiments, processor 730 calculates two or more combinations of N theoretical product ions that are exclusive to the at least one known protein in the set of two or more proteins by performing a number of additional steps. In step (ci), processor 730 retrieves from a memory a sequence for each protein of the two or more known proteins. In step (cii), processor 730 calculates for each sequence of the two or more known proteins one or more theoretical peptides and computationally selects and fragments each theoretical peptide of the one or more theoretical peptides, producing a plurality of theoretical product ions for each protein of the two or more known proteins. In step (ciii), processor 730 selects a number, N, of theoretical product ions to be used to identify known proteins. In step (civ), from the plurality of theoretical product ions for each protein of the two or more known proteins, processor 730 calculates every different combination of N theoretical product ions, producing one or more combinations for each protein. In step (cv), processor 730 compares each combination of each protein of the two or more known proteins to each combination of every other protein of the two or more known proteins to determine two or more combinations of N theoretical product ions that exclusively identify the at least one known protein of the two or more known proteins.
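Steps (ci) and (cii) can be sketched as follows. The digestion rule (tryptic cleavage after K or R unless followed by P, with no missed cleavages) and the fragment types (singly charged b and y ions) are assumptions chosen for illustration; the paragraph above does not fix either. The masses are standard monoisotopic residue masses, and the function names are hypothetical.

```python
# Monoisotopic residue masses in daltons.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def tryptic_peptides(sequence):
    """Cleave after K or R unless the next residue is P (assumed digestion rule)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def product_ions(peptide):
    """Singly charged b- and y-ion m/z values for one theoretical peptide."""
    masses = [MONO[residue] for residue in peptide]
    ions = []
    for i in range(1, len(peptide)):
        ions.append((f"b{i}", sum(masses[:i]) + PROTON))
        ions.append((f"y{i}", sum(masses[-i:]) + WATER + PROTON))
    return ions


for peptide in tryptic_peptides("AKRPGK"):
    print(peptide, [(label, round(mz, 4)) for label, mz in product_ions(peptide)])
```

The per-protein lists produced this way are the input to steps (ciii) through (cv), where combinations of N product ions are formed and compared across proteins.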

[00115] In various embodiments, processor 730 further calculates a confidence probability for each of the two or more combinations of N theoretical product ions by directly calculating a likelihood that each product ion of each combination occurs at random from data stored about the product ions.

[00116] In various embodiments, processor 730 further calculates a confidence probability for each of the two or more combinations of N theoretical product ions by estimating a likelihood that any product ion occurs at random, Y, calculating a confidence probability for any product ion as (1 - Y), and calculating a confidence probability for each of the two or more combinations of N theoretical product ions as a product of the confidence probabilities of the product ions of the at least one exclusive combination, $(1 - Y)^N$.

[00117] In various embodiments, processor 730 further iteratively executes steps (ciii)-(cv) and increases the number N in each iteration until the processor further determines in step (cv) two or more combinations of N theoretical product ions that exclusively identify the at least one known protein of the two or more known proteins.
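The iteration over N can be sketched as below. The upper bound on N, the minimum number of exclusive combinations, and the function name are assumptions; product ion m/z values are assumed to be rounded so that equal ions compare equal.

```python
from itertools import combinations


def smallest_n_with_exclusive(product_ions, target, min_combinations=2, max_n=5):
    """Increase N until the target protein has enough exclusive N-ion combinations.

    product_ions: dict mapping protein name -> iterable of product ion m/z values.
    Returns (N, exclusive combinations), or None if no N up to max_n suffices.
    """
    for n in range(1, max_n + 1):
        own = {frozenset(c) for c in combinations(sorted(set(product_ions[target])), n)}
        for protein, ions in product_ions.items():
            if protein != target:
                # Remove every combination that another protein can also produce.
                own -= {frozenset(c) for c in combinations(sorted(set(ions)), n)}
        if len(own) >= min_combinations:
            return n, own
    return None


ions = {"P1": [1, 2, 3, 5], "P2": [1, 2, 3, 4], "P3": [5, 6]}
n, found = smallest_n_with_exclusive(ions, "P1")
print(n, sorted(sorted(combination) for combination in found))
```

In this toy input no single ion is exclusive to P1, because ion 5 also occurs in P3, but three pairs containing ion 5 are, so the loop stops at N = 2.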

[00118] In various embodiments, in step (d), processor 730 finds product ions in the plurality of measured product ion spectra that match two or more of the two or more combinations of N theoretical product ions that provide a combined confidence probability greater than or equal to the desired confidence probability by comparing the measured intensity levels of the plurality of measured product ion spectra at the m/z values of the product ions of two or more of the two or more combinations of N theoretical product ions that provide a combined confidence probability greater than or equal to the desired confidence probability to a threshold level that indicates the presence of a product ion.

[00119] In various embodiments, in step (ciii), processor 730 further selects a number, N, of theoretical product ion pairs to be used to identify known proteins. In step (civ), processor 730 further, from the plurality of theoretical product ions for each protein of the two or more known proteins, calculates every different combination of N theoretical product ion pairs, producing one or more combinations for each protein, wherein each theoretical ion pair is from the same theoretical peptide. In step (d), processor 730 further finds product ion pairs in the plurality of measured product ion spectra that match two or more of the two or more combinations of N theoretical product ion pairs that provide a combined confidence probability greater than or equal to the desired confidence probability by performing curve subtraction on two extracted ion chromatograms (XICs) calculated from the plurality of measured product ion spectra for each pair of product ions of two or more of the two or more combinations of N theoretical product ion pairs.

[00120] In various embodiments, before step (civ), processor 730 compares each product ion of each plurality of theoretical product ions for each protein of the two or more known proteins to the plurality of measured product ion spectra and removes the product ion from the plurality of theoretical product ions if the product ion is not present in the plurality of measured product ion spectra.

[00121] In various embodiments, in step (cii), processor 730 further calculates an elution order for each theoretical peptide of each known protein and stores the elution order with each theoretical product ion calculated from the theoretical peptide. In step (cv), processor 730 further uses elution order in comparing each combination of each protein of the two or more known proteins to each combination of every other protein of the two or more known proteins to determine two or more combinations of N theoretical product ions that exclusively identify the at least one known protein of the two or more known proteins.

In various embodiments, in step (d), processor 730 finds product ion pairs in the plurality of measured product ion spectra that match two or more of the two or more combinations of N theoretical product ion pairs that provide a combined confidence probability greater than or equal to the desired confidence probability by calculating extracted ion chromatograms (XICs) for product ions of the plurality of measured product ion spectra, comparing the measured intensity levels of the XICs of the product ions of the plurality of measured product ion spectra at the m/z values of the product ions of two or more of the two or more combinations of N theoretical product ions that provide a combined confidence probability greater than or equal to the desired confidence probability to a threshold level that indicates the presence of a product ion, and comparing the retention times of the XICs of the product ions of the plurality of measured product ion spectra at the m/z values of the product ions of the two or more of the two or more combinations of N theoretical product ions to the elution orders of the product ions of the two or more of the two or more combinations of N theoretical product ions.
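The combined intensity and elution-order check can be sketched as follows. The XIC layout, the use of the apex retention time as the elution time, and the function names are assumptions made for illustration.

```python
def apex_time(xic, threshold):
    """Return the retention time of the most intense XIC point, or None if too weak.

    xic: list of (retention_time, intensity) pairs for one product ion m/z value.
    """
    retention_time, intensity = max(xic, key=lambda point: point[1])
    return retention_time if intensity > threshold else None


def elution_order_consistent(xics, predicted_order, threshold):
    """Check that every ion is present and that the apexes follow the predicted order.

    xics: dict mapping ion label -> XIC.
    predicted_order: ion labels listed from earliest to latest predicted elution.
    """
    apexes = []
    for label in predicted_order:
        retention_time = apex_time(xics[label], threshold)
        if retention_time is None:
            return False  # ion not present above the threshold
        apexes.append(retention_time)
    return all(earlier <= later for earlier, later in zip(apexes, apexes[1:]))


xics = {
    "ion_a": [(1.0, 5.0), (2.0, 200.0), (3.0, 10.0)],
    "ion_b": [(4.0, 10.0), (5.0, 300.0), (6.0, 20.0)],
}
print(elution_order_consistent(xics, ["ion_a", "ion_b"], threshold=100.0))
print(elution_order_consistent(xics, ["ion_b", "ion_a"], threshold=100.0))
```

Only the order of the apexes is compared, not their absolute retention times, which matches the use of a calculated elution order rather than a calculated retention time.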

Method for Deterministic Protein Identification

[00122] Figure 8 is a flowchart showing a method 800 for deterministically identifying to a desired confidence probability a known protein of a sample from other known proteins of the sample using a combination of two or more exclusive combinations of theoretical product ions as an error detection and correction code in a tandem mass spectrometry DIA method that does not provide peptide identifying information, in accordance with various embodiments.

[00123] In step 810 of method 800, a plurality of measured product ion spectra are received from a tandem mass spectrometer using a processor. The plurality of measured product ion spectra are produced by the tandem mass spectrometer by dividing an m/z range of an ion beam into two or more precursor ion mass selection windows and selecting and fragmenting the two or more precursor ion mass selection windows during each cycle of a plurality of cycles. The ion beam is produced by an ion source that ionizes peptides of two or more known proteins, producing an ion beam of precursor ions. The peptides are separated from a sample by a separation device.

[00124] In step 820, a desired confidence probability for the identification of at least one known protein of two or more known proteins is received using the processor.

[00125] In step 830, two or more combinations of N theoretical product ions that are exclusive to the at least one known protein are calculated using the processor.

[00126] In step 840, the at least one known protein is identified by finding product ions in the plurality of measured product ion spectra that match two or more of the two or more combinations of N theoretical product ions that provide a combined confidence probability greater than or equal to the desired confidence probability using the processor.

Computer Program Product for Identifying Compounds using a Binary Bit Matrix

[00127] In various embodiments, computer program products include a tangible computer-readable storage medium whose contents include a program with instructions being executed on a processor so as to perform a method for deterministically identifying to a desired confidence probability a known protein of a sample from other known proteins of the sample using a combination of two or more exclusive combinations of theoretical product ions as an error detection and correction code in a tandem mass spectrometry DIA method that does not provide peptide identifying information. This method is performed by a system that includes one or more distinct software modules.

[00128] Figure 9 is a schematic diagram of a system 900 that includes one or more distinct software modules that perform a method for deterministically identifying to a desired confidence probability a known protein of a sample from other known proteins of the sample using a combination of two or more exclusive combinations of theoretical product ions as an error detection and correction code in a tandem mass spectrometry DIA method that does not provide peptide identifying information, in accordance with various embodiments. System 900 includes measurement module 910 and analysis module 920.

[00129] Measurement module 910 receives a plurality of measured product ion spectra from a tandem mass spectrometer. The plurality of measured product ion spectra are produced by the tandem mass spectrometer by dividing an m/z range of an ion beam into two or more precursor ion mass selection windows and selecting and fragmenting the two or more precursor ion mass selection windows during each cycle of a plurality of cycles. The ion beam is produced by an ion source that ionizes peptides of two or more known proteins, producing an ion beam of precursor ions. The peptides are separated from a sample by a separation device.

[00130] Analysis module 920 receives a desired confidence probability for the identification of at least one known protein of two or more known proteins. Analysis module 920 calculates two or more combinations of N theoretical product ions that are exclusive to the at least one known protein. Analysis module 920 identifies the at least one known protein by finding product ions in the plurality of measured product ion spectra that match two or more of the two or more combinations of N theoretical product ions that provide a combined confidence probability greater than or equal to the desired confidence probability.

[00131] While the present teachings are described in conjunction with various embodiments, it is not intended that the present teachings be limited to such embodiments. On the contrary, the present teachings encompass various alternatives, modifications, and equivalents, as will be appreciated by those of skill in the art. Further, in describing various embodiments, the specification may have presented a method and/or process as a particular sequence of steps. However, to the extent that the method or process does not rely on the particular order of steps set forth herein, the method or process should not be limited to the particular sequence of steps described. As one of ordinary skill in the art would appreciate, other sequences of steps may be possible. Therefore, the particular order of the steps set forth in the specification should not be construed as limitations on the claims. In addition, the claims directed to the method and/or process should not be limited to the performance of their steps in the order written, and one skilled in the art can readily appreciate that the sequences may be varied and still remain within the spirit and scope of the various embodiments.