Title:
METHOD FOR DETERMINING AML PROGNOSIS
Document Type and Number:
WIPO Patent Application WO/2023/111313
Kind Code:
A1
Abstract:
The present invention relates to a method for stratifying an Acute Myeloid Leukemia ("AML") patient into a group of predicted outcome in relation to determinants such as, but not limited to, overall survival, event free survival and/or reaching complete remission, as well as predicted response to treatment and/or drugs. The method comprises generating a proteomic and/or a proteo-genomic profile of a patient suffering from Acute Myeloid Leukemia ("AML") and determining e.g., if said patient belongs to a high-risk or a low-risk group.

Inventors:
LEHTIÖ JANNE (SE)
WESTERLUND MATTIAS (SE)
Application Number:
PCT/EP2022/086454
Publication Date:
June 22, 2023
Filing Date:
December 16, 2022
Assignee:
LEHTIOE JANNE (SE)
International Classes:
G01N33/574; G01N33/68; G16B25/00; G16H50/00
Domestic Patent References:
WO2005103719A2 2005-11-03
Foreign References:
US20140199273A1 2014-07-17
Other References:
NICOLAS E ET AL: "Expression of S100A8 in leukemic cells predicts poor survival in de novo AML patients", LEUKEMIA, NATURE PUBLISHING GROUP UK, LONDON, vol. 25, no. 1, 12 November 2010 (2010-11-12), pages 57 - 65, XP037786205, ISSN: 0887-6924, [retrieved on 20101112], DOI: 10.1038/LEU.2010.251
WANG M ET AL: "Validation of risk stratification models in acute myeloid leukemia using sequencing-based molecular profiling", LEUKEMIA, NATURE PUBLISHING GROUP UK, LONDON, vol. 31, no. 10, 7 February 2017 (2017-02-07), pages 2029 - 2036, XP037653642, ISSN: 0887-6924, [retrieved on 20170207], DOI: 10.1038/LEU.2017.48
HARTMUT DÖHNER ET AL: "Diagnosis and management of AML in adults: 2017 ELN recommendations from an international expert panel", BLOOD, vol. 129, no. 4, 26 January 2017 (2017-01-26), US, pages 424 - 447, XP055754595, ISSN: 0006-4971, DOI: 10.1182/blood-2016-08-733196
SCHAAB C ET AL: "Global phosphoproteome analysis of human bone marrow reveals predictive phosphorylation markers for the treatment of acute myeloid leukemia with quizartinib", LEUKEMIA, vol. 28, no. 3, 19 November 2013 (2013-11-19), London, pages 716 - 719, XP055929852, ISSN: 0887-6924, DOI: 10.1038/leu.2013.347
S M KORNBLAU ET AL: "Functional proteomic profiling of AML predicts response and survival", BLOOD, vol. 113, no. 1, 7 October 2008 (2008-10-07), pages 154 - 164, XP055381786, DOI: 10.1182/blood-2007-10-119438
ALBITAR M ET AL: "Proteomics-based prediction of clinical response in acute myeloid leukemia", EXPERIMENTAL HEMATOLOGY, ELSEVIER INC, US, vol. 37, no. 7, 1 July 2009 (2009-07-01), pages 784 - 790, XP026266323, ISSN: 0301-472X, [retrieved on 20090505], DOI: 10.1016/J.EXPHEM.2009.03.011
JU BAI ET AL: "Potential biomarkers for adult acute myeloid leukemia minimal residual disease assessment searched by serum peptidome profiling", PROTEOME SCIENCE, BIOMED CENTRAL, LONDON, GB, vol. 11, no. 1, 3 August 2013 (2013-08-03), pages 39, XP021160051, ISSN: 1477-5956, DOI: 10.1186/1477-5956-11-39
AASEBØ ELISE ET AL: "Global Cell Proteome Profiling, Phospho-signaling and Quantitative Proteomics for Identification of New Biomarkers in Acute Myeloid Leukemia Patients", vol. 17, 1 January 2016 (2016-01-01), pages 52 - 70, XP055926308, Retrieved from the Internet [retrieved on 20220614]
ILYAS, A.M. ET AL.: "Next generation sequencing of acute myeloid leukemia: influencing prognosis", BMC GENOMICS, vol. 16, 2015, pages S5, XP021208945, DOI: 10.1186/1471-2164-16-S1-S5
LEY, T.J. ET AL.: "Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia", N ENGL J MED, vol. 368, no. 22, 2013, pages 2059 - 74, XP055101067, DOI: 10.1056/NEJMoa1301689
PAPAEMMANUIL, E., H. DOHNER, P.J. CAMPBELL: "Genomic Classification in Acute Myeloid Leukemia", N ENGL J MED, vol. 375, no. 9, 2016, pages 900 - 1
HARTMUT DOHNER ET AL.: "Diagnosis and management of AML in adults: 2017 ELN recommendations from an international expert panel", BLOOD, vol. 129, no. 4, 26 January 2017 (2017-01-26), XP055754595, DOI: 10.1182/blood-2016-08-733196
YAFFE, M.B.: "Why geneticists stole cancer research even though cancer is primarily a signalling disease", SCI SIGNAL, vol. 12, no. 565, 2019
AASEBO, E. ET AL.: "Global Cell Proteome Profiling, Phospho-signaling and Quantitative Proteomics for Identification of New Biomarkers in Acute Myeloid Leukemia Patients", CURRENT PHARMACEUTICAL BIOTECHNOLOGY, vol. 17, no. 1, 2016, pages 52 - 70
FLORES-MORALES, A. ET AL.: "Proteogenomic Characterization of Patient-Derived Xenografts Highlights the Role of REST in Neuroendocrine Differentiation of Castration-Resistant Prostate Cancer", CLIN CANCER RES, vol. 25, no. 2, 2019, pages 595 - 608, XP055929742, DOI: 10.1158/1078-0432.CCR-18-0729
MERTINS, P. ET AL.: "Proteogenomics connects somatic mutations to signalling in breast cancer", NATURE, vol. 534, no. 7605, 2016, pages 55 - 62, XP055929753, DOI: 10.1038/nature18003
ARABI, A. ET AL.: "Proteomic screen reveals Fbw7 as a modulator of the NF-κB pathway", NATURE COMMUNICATIONS, vol. 3, no. 976, 2012
WANG, M. ET AL.: "Validation of risk stratification models in acute myeloid leukemia using sequencing-based molecular profiling", LEUKEMIA, vol. 31, no. 10, 2017, pages 2029 - 2036, XP037653642, DOI: 10.1038/leu.2017.48
MOGGRIDGE, S. ET AL.: "Extending the Compatibility of the SP3 Paramagnetic Bead Processing Approach for Proteomics", J PROTEOME RES, vol. 17, no. 4, 2018, pages 1730 - 1740, XP093020679, DOI: 10.1021/acs.jproteome.7b00913
HUGHES, C.S. ET AL.: "Ultra sensitive proteome analysis using paramagnetic bead technology", MOL SYST BIOL, vol. 10, 2014, pages 757
"Immunoassays for the 80's", 1981
MOGGRIDGE, S. ET AL., J PROTEOME RES, vol. 17, no. 4, 2018, pages 1730 - 1740
HUGHES, C.S. ET AL., MOL SYST BIOL, vol. 10, 2014, pages 757
BRANCA, R.M. ET AL., NAT METHODS, vol. 11, no. 1, 2014, pages 59 - 62
JOHANSSON, H. ET AL.: "Proteogenomics and Hi-C reveal transcriptional dysregulation in high hyperdiploid childhood acute lymphoblastic leukemia", NAT COMMUN, vol. 10, no. 1, 2019, pages 1519
GUTIERREZ, N.C. ET AL.: "Gene expression profile reveals deregulation of genes with relevant functions in the different subclasses of acute myeloid leukemia", LEUKEMIA, vol. 19, no. 3, 2005, pages 402 - 9, XP037781000, DOI: 10.1038/sj.leu.2403625
JOHANSSON, H. ET AL.: "Breast cancer quantitative proteome and proteogenomic landscape", NAT COMMUN, 2019
ZHANG, H. ET AL.: "Integrated Proteogenomic Characterization of Human High-Grade Serous Ovarian Cancer", CELL, vol. 166, no. 3, 2016, pages 755 - 765, XP029667812, DOI: 10.1016/j.cell.2016.05.069
BENNETT, J.M. ET AL.: "Proposals for the classification of the acute leukaemias. French-American-British (FAB) co-operative group", BR J HAEMATOL, vol. 33, no. 4, 1976, pages 451 - 8
KRAUSE, D.S. ET AL.: "CD34: structure, biology, and clinical utility", BLOOD, vol. 87, no. 1, 1996, pages 1 - 13
WUCHTER, C. ET AL.: "Impact of CD133 (AC133) and CD90 expression analysis for acute leukemia immunophenotyping", HAEMATOLOGICA, vol. 86, no. 2, 2001, pages 154 - 61
VAN GALEN, P. ET AL.: "Single-Cell RNA-Seq Reveals AML Hierarchies Relevant to Disease Progression and Immunity", CELL, vol. 176, no. 6, 2019, pages 1265 - 1281
ZHANG, Y. ET AL.: "Low Platelet Counts at Diagnosis Predict Better Survival for Patients with Intermediate-Risk Acute Myeloid Leukemia", ACTA HAEMATOL, vol. 143, no. 1, 2020, pages 9 - 18
DOHNER H, ESTEY E, GRIMWADE D ET AL., BLOOD, vol. 129, no. 4, 2017, pages 424 - 447
NICOLAS, E.: "Expression of S100A8 in leukemic cells predicts poor survival in de novo AML patients", LEUKEMIA, vol. 25, 2011, pages 57 - 65
NICOLAS, E.: "Expression of S100A8 in leukemic cells predicts poor survival in de novo AML patients", LEUKEMIA, vol. 25, 2011, pages 57 - 65, XP037786205, DOI: 10.1038/leu.2010.251
Attorney, Agent or Firm:
AERA A/S (DK)
Claims:
CLAIMS

1. A method for stratifying an AML patient into a high-risk or a low-risk group in relation to overall survival and/or complete remission, based on proteomic profiling of said patient, compared to an AML patient cohort, wherein the method comprises
a. providing a blood and/or tissue sample from said patient,
b. processing said sample, wherein the processing comprises extracting expressed proteins from said sample,
c. analyzing the extracted proteins, including quantitatively determining their expression level,
d. determining the median expression level of at least 5 proteins of said extracted proteins, selected from the list of proteins with SEQ ID NOs: 1-17500 listed in table 1 for the intermediate samples of the cohort,
e. quantitatively determining the proteomic risk profile of said patient by identifying abnormal protein expression levels of at least 5 proteins selected from the list of proteins with SEQ ID NOs: 1-17500 listed in table 1, wherein the protein expression level is considered abnormal when
i. the protein expression level of a protein with a SEQ ID NO listed in Table 2 is downregulated in such a way that the protein expression level of said protein is less than 95% of the median expression level of said protein for the intermediate samples of the cohort, or
ii. the protein expression level of a protein with a SEQ ID NO listed in Table 3 is upregulated in such a way that the protein expression level of said protein is at least 5% above the median expression level of said protein for the intermediate samples of the cohort, and
f. wherein said AML patient is stratified as belonging to a high-risk group when abnormal protein levels are detected in step e).

2. A method for stratifying an AML patient into a high-risk or a low-risk group according to claim 1, wherein the median expression level of the at least 5, such as at least 10, 50 or 100 proteins selected from the list of proteins with SEQ ID NOs: 1-17500 listed in table 1 is quantitatively determined.

3. A method for stratifying an AML patient into a high-risk or a low-risk group according to any of the preceding claims, wherein said patient is classified as belonging to a high-risk group, when said patient’s proteomic profile shows abnormal protein levels in at least 50% of the at least 5 proteins selected from the list of proteins with SEQ ID NOs: 1-17500.

4. A method for stratifying an AML patient into a high-risk or a low-risk group according to any of the preceding claims, wherein one or more clustering algorithm(s) is/are used to cluster patients into at least a high-risk and a low-risk group based on abnormal protein levels of at least 5 proteins selected from the list of proteins with SEQ ID NOs: 1-17500.

5. A method for stratifying an AML patient into a high-risk or a low-risk group according to claim 4, wherein the clustering algorithm(s) is/are one or more algorithms selected from the group consisting of Random Forest, k-Top Scoring Pairs, k-Nearest Neighbor, Support Vector Machines, oPLS-DA, PLS-DA, t-SNE, UMAP, PCA, lasso, Decision Trees, Naive Bayes and Logistic Regression.

6. A method for stratifying an AML patient into a high-risk or a low-risk group according to any of the preceding claims, wherein less than 1000, 500, 100 or 50 protein sequences of any of SEQ ID NO: 1-17500 are used for stratifying patients into at least a high-risk and a low-risk group.

7. A method for stratifying an AML patient into a high-risk or a low-risk group according to any of the preceding claims, wherein a. a patient with a proteomic profile, where at least 50 % of the proteins selected from the list of proteins with SEQ ID NO: 1-17500 have an abnormal level, is considered a high-risk patient and b. a patient with a proteomic profile, where less than 50 % of the proteins selected from the list of proteins with SEQ ID NO: 1-17500, have an abnormal level, is considered a low-risk patient.

8. A method for stratifying an AML patient into a high-risk or a low-risk group according to any of the preceding claims, wherein the proteomic risk profile of said patient is done by identifying abnormal protein levels of at least 5 proteins selected from the list of proteins with a SEQ ID NO provided in Table 4 or in Table 5.

9. A method for stratifying an AML patient into a high-risk or a low-risk group according to any of claims 1-7, wherein the proteomic risk profile of said patient is done by identifying abnormal protein levels of at least 20 proteins from any one or more of tables 15-19.

10. A method for stratifying an AML patient into a high-risk or a low-risk group according to any of the preceding claims, wherein the method further comprises extracting chromosomal DNA from said sample in step b., analyzing the extracted DNA and determining genetic risk factors based on genetic variants in one or more genes listed in a genetic standard.

11. A method according to claim 10, wherein the genetic standard is selected from the group consisting of ELN2017 and NCCN guidelines.

12. A method for stratifying an AML patient into a high-risk or a low-risk group according to any one of claims 10-11, wherein genetic risk factors are determined based on genetic variants in at least 5, 10, 50 or 100 genes listed in a genetic standard and providing a patient stratification into groups based on their clinical and molecular profile.

13. A method for stratifying an AML patient into a high-risk or a low-risk group according to any one of claims 10-12, comprising providing a patient stratification into groups based on their overall, complete, combined, clinical and molecular profile.

14. A method for stratifying an AML patient into a high-risk or a low-risk group according to any of the preceding claims, wherein the method further comprises stratifying AML patients belonging to the following patient groups: age, gender, pre-treatment, other complications, co-administrations, AML-aetiology, ECOG status.

15. A method for predicting response to one or more apoptosis modulating drug(s), such as to a drug selected from the group consisting of venetoclax, navitoclax and triciribine, or to chemotherapy, in an AML patient based on a method according to any one of claims 1-14, wherein the patient is stratified as respondent when said patient is found to belong to the high-risk group.

16. A method of predicting response to therapy in a patient with acute myeloid leukemia (AML) according to claim 15, comprising:
a. isolating blood and/or tissue samples from said patient,
b. processing said sample(s), wherein the processing comprises
i. extracting expressed proteins from said sample,
c. analyzing the extracted proteins,
d. determining the median expression level of said extracted proteins for the intermediate samples of the cohort, and
e. quantitatively determining the proteomic risk profile of said patient by identifying abnormal protein levels of at least 5 proteins selected from the list of proteins with SEQ ID NOs: 1-17500, wherein the protein level is considered abnormal when said protein level is downregulated or upregulated compared to the median expression level of the cohort, wherein
i. downregulated refers to a protein level that is less than 95% of the median expression level of said protein for the intermediate samples of the cohort,
ii. upregulated refers to a protein level that is at least 5% above the median expression level of said protein for the intermediate samples of the cohort,
iii. proteins that are downregulated are proteins with a SEQ ID NO provided in Table 2, and
iv. proteins that are upregulated are proteins with a SEQ ID NO provided in Table 3.

Description:
METHOD FOR DETERMINING AML PROGNOSIS

FIELD OF THE INVENTION

The present invention relates to biomarkers and methods for stratifying an Acute Myeloid Leukemia (“AML”) patient into a group of predicted outcome in relation to determinants such as, but not limited to, overall survival, event free survival and/or reaching complete remission, as well as predicted response to treatment and/or drugs. The method comprises generating a proteomic profile to be used alone or together with a genomic profile of a patient suffering from AML and determining e.g., if said patient belongs to a high-risk or a low-risk group, or to a high-risk, an intermediate-risk or a low-risk group.

BACKGROUND

Among patients with leukemia there can be a highly variable clinical course as reflected by varying survival times and resistance to therapy. Reliable individual prognostic tools are limited at present. However, advances in proteomic technologies provide new diagnostic and prognostic indicators for hematologic malignancies such as leukemia.

Acute myeloid leukemia (AML) is a genetically heterogeneous disease with frequent relapses and overall poor survival. Improving the understanding of molecular drivers and the genotype-phenotype interplay in AML is needed to improve clinical decision making and prognostication.

Acute myeloid leukemia (AML) is an aggressive hematologic malignancy where immature blast cells arising from myeloid leukemic progenitors accumulate in the bone marrow and bloodstream. The incidence of AML increases with age with a median age of onset in Sweden of 71 years. It has been established that, like for many cancers, prevalence of AML is higher amongst males than females.

In the 1970s, cytarabine coupled to an anthracycline was first introduced as a treatment for AML and this has remained the mainstay treatment since. In recent years, new, targeted therapies focused on specific mutations (e.g., IDH1/2 or FLT3 mutations), genetic aberrations or key pathways have been introduced or advanced to late clinical trials. Although these drugs generally can improve clinical outcome, the vast genetic and phenotypic heterogeneity of AML patients, leading to highly varying therapeutic responses, remains a major clinical challenge. Hence, it is paramount to apply comprehensive proteomic profiling to identify AML patients that can benefit from novel targeted therapies. Several large cohort studies have helped elucidate the complex genomic and mutational landscape of AML [Ilyas, A.M., et al., Next generation sequencing of acute myeloid leukemia: influencing prognosis. BMC Genomics, 2015. 16 Suppl 1(Suppl 1): p. S5; Ley, T.J., et al., Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. N Engl J Med, 2013. 368(22): p. 2059-74.]. Recent publications have also described genomically defined subtypes of AML associated with chromosomal aberrations and mutations with clear links to patient survival [Papaemmanuil, E., H. Dohner, and P.J. Campbell, Genomic Classification in Acute Myeloid Leukemia. N Engl J Med, 2016. 375(9): p. 900-1.]. A summarizing illustration can be seen in figure 1 [Hartmut Dohner et al., Diagnosis and management of AML in adults: 2017 ELN recommendations from an international expert panel, Blood, 26 January 2017, Volume 129, Number 4].

Cancer is manifested by a number of genetic aberrations that alter the cellular phenotype and contribute to carcinogenesis. While these aberrations contribute to the evolution of cancer, cancer is predominantly a signalling disease [Yaffe, M.B., Why geneticists stole cancer research even though cancer is primarily a signalling disease. Sci Signal, 2019. 12(565).]. Global proteomics is an excellent way to study how genetic aberrations influence the molecular phenotype in cancer on a systems level. However, this approach has been hindered by the lack of analytical depth in standard proteomics methods.

Techniques to analyse the proteome have historically lagged behind those of DNA and RNA sequencing in terms of scale and analytical depth. Therefore, there are large gaps in knowledge on how genomic aberrations influence the functional proteome, including mutations found in prevalent diseases like cancer. While AML has been well characterized by sequencing methods, the proteome of AML has only been investigated to a shallow extent [Aasebo, E., et al., Global Cell Proteome Profiling, Phospho-signaling and Quantitative Proteomics for Identification of New Biomarkers in Acute Myeloid Leukemia Patients. Current pharmaceutical biotechnology, 2016. 17(1): p. 52-70.].

While there have been a few large proteogenomic studies published recently, none so far have focused on AML [Flores-Morales, A., et al., Proteogenomic Characterization of Patient-Derived Xenografts Highlights the Role of REST in Neuroendocrine Differentiation of Castration-Resistant Prostate Cancer. Clin Cancer Res, 2019. 25(2): p. 595-608., Mertins, P., et al., Proteogenomics connects somatic mutations to signalling in breast cancer. Nature, 2016. 534(7605): p. 55-62.].

Recent development of in-depth proteomics by the current inventors has for the first time allowed reaching the same analytical depth in proteomics as in transcriptomics [Arabi, A., et al., Proteomic screen reveals Fbw7 as a modulator of the NF-κB pathway, Nature Communications, volume 3, Article number: 976 (2012)]. The current disclosure thus for the first time describes an in-depth proteomic characterization of a treatment-naive AML patient cohort and applications of said analytical outcome to prognostically stratify individual AML patients as belonging either to a high-risk, an intermediate-risk or a low-risk group, or to a high-risk or a low-risk group.

SUMMARY

The present invention is based on an in-depth mass spectrometry-based proteomic analysis, disclosed herein for the first time, of a clinically and molecularly well-defined, treatment-naive cohort of AML patients. It was found that altered expression levels of proteins involved in mRNA splicing correspond to shorter survival and a poor treatment response. It was determined that this is independent of currently used genetic risk stratifications and can on its own be used to improve prognostic estimates. It was shown that heterogeneous genetic backgrounds contribute to this proteome phenotype and that it is related to epigenetic changes leading to altered splicing patterns which contribute to cancer progression. In addition to being a powerful tool on its own, the herein disclosed proteomic analysis can also be used in combination with currently used genetic risk stratifications for an even more complex stratification of AML patients. Thus, the herein disclosed method relates to stratifying individual AML patients as belonging either to a high-risk or a low-risk group, or to a high-risk, an intermediate-risk or a low-risk group.

The present invention thus in its broadest aspect relates to a method for stratifying an AML patient into a high-risk or a low-risk group, or into a high-risk, an intermediate-risk or a low-risk group, in relation to overall survival and/or complete remission, based on proteomic profiling and/or on proteogenomic profiling, compared to a cohort of AML patients, wherein the method comprises
a. isolating a blood and/or tissue sample from said patient,
b. processing said sample, wherein the processing comprises extracting expressed proteins from said sample,
c. analyzing the extracted proteins, including quantitatively determining their expression level,
d. determining the median expression level of at least 5 proteins of said extracted proteins, selected from the list of proteins with SEQ ID NOs: 1-17500 listed in table 1 for the intermediate samples of the cohort,
e. quantitatively determining the proteomic risk profile of said patient by identifying abnormal protein expression levels of at least 5 proteins selected from the list of proteins with SEQ ID NOs: 1-17500 listed in table 1, wherein the protein expression level is considered abnormal when
i. the protein expression level of a protein with a SEQ ID NO listed in Table 2 is downregulated in such a way that the protein expression level of said protein is less than 95% of the median expression level of said protein for the intermediate samples of the cohort, or
ii. the protein expression level of a protein with a SEQ ID NO listed in Table 3 is upregulated in such a way that the protein expression level of said protein is at least 5% above the median expression level of said protein for the intermediate samples of the cohort, and
f. wherein said AML patient is stratified as belonging to a high-risk group when abnormal protein levels are detected in step e).
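Steps d. and e. above can be illustrated with a minimal Python sketch. It assumes linear-scale (not log-transformed) expression values and uses hypothetical protein identifiers (`P1`, `P2`) in place of SEQ ID NOs. The function names and the two sets standing in for Table 2 and Table 3 are illustrative assumptions, not part of the disclosure.

```python
from statistics import median


def cohort_medians(intermediate_samples):
    """Median expression level per protein over the cohort's intermediate samples.

    intermediate_samples: list of dicts mapping protein id -> expression level.
    """
    proteins = intermediate_samples[0].keys()
    return {p: median(s[p] for s in intermediate_samples) for p in proteins}


def abnormal_proteins(patient, medians, down_ids, up_ids):
    """Return the ids of proteins whose level is abnormal under steps e(i) and e(ii).

    down_ids: ids standing in for Table 2 (abnormal if below 95% of the median).
    up_ids:   ids standing in for Table 3 (abnormal if at least 5% above the median).
    """
    abnormal = set()
    for protein, ref in medians.items():
        level = patient.get(protein)
        if level is None:
            continue  # protein not quantified in this patient sample
        if protein in down_ids and level < 0.95 * ref:
            abnormal.add(protein)
        elif protein in up_ids and level >= 1.05 * ref:
            abnormal.add(protein)
    return abnormal
```

A call might pass the cohort medians once and reuse them for every new patient, which matches the embodiment in which the median expression levels are predetermined.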

In one embodiment, the method of the present invention comprises that the median expression level of the at least 5 proteins selected from the list of proteins with SEQ ID NOs: 1-17500 listed in table 1 for the intermediate samples of the cohort is predetermined.

In one method for stratifying an AML patient into a high-risk or a low-risk group, according to the present invention, said patient is classified as belonging to a high-risk group, when said patient’s proteomic profile shows abnormal protein levels in at least 50% of the selected proteins.

In one method for stratifying an AML patient into a high-risk, an intermediate-risk or a low-risk group, according to the present invention, said patient is classified as belonging to a high-risk group when said patient’s proteomic profile shows abnormal protein levels in at least 50% of the selected proteins, as belonging to an intermediate-risk group when said patient’s proteomic profile shows abnormal protein levels in at least 25% but less than 50% of the selected proteins, and as belonging to a low-risk group when said patient’s proteomic profile shows abnormal protein levels in less than 25% of the selected proteins.
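The percentage thresholds in the two preceding paragraphs can be written as a short Python sketch. The function name `risk_group` and the `three_tier` flag are illustrative assumptions. The two-tier "low" label for less than 50% follows the wording of the claims.

```python
def risk_group(n_abnormal, n_selected, three_tier=False):
    """Map the share of abnormal proteins to a risk group.

    Two-tier:   >= 50% -> "high", otherwise "low".
    Three-tier: >= 50% -> "high", >= 25% and < 50% -> "intermediate", < 25% -> "low".
    """
    if n_selected < 5:
        raise ValueError("the method calls for at least 5 selected proteins")
    fraction = n_abnormal / n_selected
    if fraction >= 0.50:
        return "high"
    if three_tier and fraction >= 0.25:
        return "intermediate"
    return "low"
```

For example, 3 abnormal proteins out of 10 selected would give "low" in the two-tier scheme and "intermediate" in the three-tier scheme.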

In another method for stratifying an AML patient into a high-risk or a low-risk group according to the present invention, one or more clustering algorithm(s) is/are used to cluster patients into at least a high-risk and a low-risk group based on abnormal protein levels of at least 5 proteins selected from the list of proteins with SEQ ID NOs: 1-17500.

The algorithm(s) used to cluster patients in a method according to the present invention is/are typically one or more algorithm(s) selected from the group consisting of Random Forest, k-Top Scoring Pairs, k-Nearest Neighbor, Support Vector Machines, oPLS-DA, PLS-DA, t-SNE, UMAP, PCA, lasso, Decision Trees, Naive Bayes and Logistic Regression. Said one or more algorithm(s) can be used to cluster proteins of SEQ ID NO: 1-17500 into sub-groups, wherein one or more of the sub-groups is/are used to stratify patients into at least a high-risk and a low-risk group.
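As one illustration of the listed algorithms, the sketch below applies k-Nearest Neighbor to assign a new patient to a risk group from labelled cohort profiles. The disclosure does not specify parameters, so the choice of `k=3`, Euclidean distance, and feature vectors made of expression levels relative to the cohort median are all assumptions made for this example.

```python
from collections import Counter
from math import dist


def knn_risk(profile, cohort, k=3):
    """Assign a risk label by majority vote among the k closest cohort profiles.

    profile: sequence of floats, one value per selected protein.
    cohort:  list of (profile_vector, risk_label) pairs with known labels.
    """
    # Sort cohort members by Euclidean distance to the new profile, keep the k nearest.
    nearest = sorted(cohort, key=lambda item: dist(profile, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

In practice a library implementation with cross-validated parameters would likely be preferred; the pure-Python version is only meant to show the idea.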

In one embodiment, a method for stratifying an AML patient into a high-risk or a low-risk group or into a high-risk, an intermediate-risk or a low-risk group according to the present invention comprises the use of less than 1000, 500, 100 or 50 protein sequences of any of SEQ ID NO: 1-17500.

In one method of the present invention, the proteomic risk profile of said patient is determined by identifying abnormal protein levels of at least 5 proteins selected from the list of proteins with a SEQ ID NO as provided in Table 4 and/or in Table 5.

In yet another aspect of the present invention, the method of the present invention further comprises extracting chromosomal DNA from said sample in step b. and analyzing the extracted DNA and determining genetic risk factors based on genetic variants in one or more genes listed in a genetic standard and providing a patient stratification into groups based on their overall, complete, combined, clinical and/or molecular profile. The standard can typically be selected from the group consisting of ELN2017 and NCCN Guidelines.

Typically, the genetic risk factors are determined based on genetic variants in at least 2, 4 or 5, such as in at least 10, 50 or 100 genes listed in a genetic standard and providing a patient stratification into groups based on their clinical and molecular profile. Thus, providing an additional aspect to the method of the present invention for a patient stratification into groups based on their overall, complete, combined, clinical and molecular profile.

Thus, in one method for stratifying an AML patient into a high-risk, an intermediate-risk or a low-risk group according to the present invention said patient is classified as belonging to a high-risk group, when said patient’s proteomic profile shows abnormal protein levels in at least 50% of the selected proteins and said patient is classified as high-risk by determining genetic risk factors listed in a genetic standard. In one method for stratifying an AML patient into a high-risk, an intermediate-risk or a low-risk group according to the present invention, said patient is classified as belonging to a low-risk group, when said patient’s proteomic profile shows abnormal protein levels in less than 50% of the selected proteins and said patient is classified as low-risk by determining genetic risk factors listed in a genetic standard.

In one method for stratifying an AML patient into a high-risk, an intermediate-risk or a low-risk group according to the present invention said patient is classified as belonging to a low-risk group, when said patient’s proteomic profile shows abnormal protein levels in less than 25% of the selected proteins and said patient is classified as intermediate by determining genetic risk factors listed in a genetic standard. In one method for stratifying an AML patient into a high-risk, an intermediate-risk or a low-risk group according to the present invention said patient is classified as belonging to an intermediate-risk group, when said patient’s proteomic profile shows abnormal protein levels in at least 25% and less than 50% of the selected proteins and said patient is classified as intermediate by determining genetic risk factors listed in a genetic standard.
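The four combinations of proteomic and genetic findings described in the two preceding paragraphs can be collected in a small Python sketch. The function name `combined_risk` and the string labels are illustrative assumptions. Combinations that the text does not address (for example a genetically high-risk patient with few abnormal proteins) return `None` rather than a guessed group.

```python
def combined_risk(abnormal_fraction, genetic_risk):
    """Combine the share of abnormal proteins with a genetic-standard risk label.

    abnormal_fraction: share of selected proteins with abnormal levels (0.0 to 1.0).
    genetic_risk: "high", "intermediate" or "low" according to a genetic standard.
    Returns the combined group, or None when the text gives no rule.
    """
    if genetic_risk == "high" and abnormal_fraction >= 0.50:
        return "high"
    if genetic_risk == "low" and abnormal_fraction < 0.50:
        return "low"
    if genetic_risk == "intermediate":
        if abnormal_fraction < 0.25:
            return "low"
        if abnormal_fraction < 0.50:
            return "intermediate"
    return None  # combination not covered by the paragraphs above
```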

A method for stratifying an AML patient according to the present invention comprises analyzing proteins and/or DNA extracted from blood and/or tissue samples from said patient e.g., selected from the group consisting of skin, mucosa, bone marrow, peripheral blood, isolated cells from blood and tumour cells. The currently described method thus enables the stratification of a patient by taking a single sample, or by taking a series and/or variety of samples from said same patient.

The extracted DNA and/or proteins are typically analysed using DNA hybridization, DNA sequencing, quantitative proteomic mass spectrometry, targeted mass spectrometry, ELISA, proximity ligation assay, proximity extension assay, aptamer-based assays, analytical protein microarrays, reverse phase protein arrays, 2D-PAGE, data-independent acquisition (DIA) mass spectrometry and/or data-dependent acquisition (DDA) mass spectrometry.

The method of the present invention is particularly useful for stratifying a single AML patient as belonging to a high-risk, an intermediate-risk or a low-risk group according to the present invention wherein a median expression level of said extracted proteins has been predetermined. Thus, the single AML patient can routinely be checked against a predetermined standard provided to the clinician. Said predetermined standard can include both proteomic marker values as well as information from prevailing genetic standards.

Thus, a method for stratifying an AML patient according to the present invention also comprises providing a patient stratification into groups based on their overall, complete, combined, clinical and molecular profile. The method of the present invention is in one embodiment used for predicting response to one or more apoptosis modulating drug(s) in an AML patient, wherein the patient is stratified as respondent when said patient is found to belong to the high-risk group or intermediate-risk group. An exemplary drug is selected from the group consisting of venetoclax, navitoclax and triciribine. The method of the present invention is in another embodiment used for predicting response to chemotherapy in an AML patient, wherein the patient is stratified as respondent when said patient is found to belong to the low-risk group or the intermediate-risk group.
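The mapping from risk group to predicted response described in this paragraph amounts to a lookup, sketched below in Python. The dictionary keys and the function name are illustrative assumptions. Note that the sketch follows this paragraph, whereas the corresponding claim names only the high-risk group.

```python
# Risk groups stratified as respondent, per therapy class, as stated in the paragraph above.
RESPONDENT_GROUPS = {
    "apoptosis_modulating": {"high", "intermediate"},  # e.g. venetoclax, navitoclax, triciribine
    "chemotherapy": {"low", "intermediate"},
}


def predicted_respondent(risk_group, therapy):
    """Return True when the patient's risk group is stratified as respondent to the therapy."""
    return risk_group in RESPONDENT_GROUPS[therapy]
```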

The method of the present invention is in one embodiment used for predicting response to therapy in a patient with acute myeloid leukemia (AML) comprising performing mass spectrometry on a plasma sample from a patient to generate a protein spectra comprising protein peaks, identifying a protein peak or group of protein peaks in the protein spectra corresponding to one or more proteins selected from the list of proteins with SEQ ID NOs: 1-17500, and predicting the patient's response to therapy based on the identification of one or more of the protein peaks.

BRIEF DESCRIPTION OF THE FIGURES

Figure 1

Hartmut Döhner et al., Diagnosis and management of AML in adults: 2017 ELN recommendations from an international expert panel, BLOOD, 26 JANUARY 2017, VOLUME 129, NUMBER 4

Figure 2

Patient outcomes by length of classifying list. Kaplan-Meier curves for patients stratified by the different lists into two risk levels, high and low risk, using DIA data for the discovery cohort. Log-rank tests were performed to evaluate significance levels and p-values are indicated in the figure. (A) Long list, (B) Medium list, (C) Short list.

Figure 3

Patient outcomes by length of classifying list. Kaplan-Meier curves for patients stratified by the different lists into three risk levels, high, intermediate and low risk, using DIA data for the discovery cohort. Log-rank tests were performed to evaluate significance levels and p-values are indicated in the figure. (A) Long list, (B) Medium list, (C) Short list.

Figure 4

Patient outcomes by risk assessment method for 10 prospectively collected patients. Kaplan-Meier curves for patients stratified by the different risk assessment methods. Log-rank tests were performed to evaluate significance levels and p-values are indicated in the figure. (A) Patients stratified by ELN2017. (B) Same as (A) but Adverse and Intermediate were combined into a single group. (C) Patients stratified by proteomic risk assessment. (D) Patients stratified by Combined score 1 (see table 5). Patients with a score of 5 or 6 were grouped together and patients with a combined score of 2, 3 or 4 were grouped together. (E) Patients stratified by Combined score 2 (see table 11). Patients with a score of 5 or 6 were grouped together and patients with a combined score of 3 or 4 were grouped together.

Figure 5

Proteomics analysis of AML. (A) Overview of the analysis workflow. (B) Relationship between peptide spectral matches and proteins identified. (C) Hierarchical clustering of patients by proteomic profiles; selected mutations and clinical features are shown. (D) Kaplan-Meier curves for patients belonging to cluster 1-4. (E) Heatmap of p-values (Fisher test) for enrichment of genetic aberrations and mutations across proteomics clusters. (F) Sankey plot depicting how patients from different proteomic clusters align to the main genetic classifications from Papaemmanuil et al. (G) Same as E but to RNAseq classifications from TCGA (NEJM, 2013; https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3767041/). (H) Enrichment for proteomics clusters for 6 separate differentiation states, color denotes normalized enrichment score while adjusted p-values are visualized by the size of the circles. (I) UMAP of most variable proteins in the dataset, 8 protein clusters are highlighted and their enrichments and median levels in the 9 patient clusters are highlighted in the heatmap.

Figure 6

Spliceosome levels and clinical features in AML. (A) Heatmap of spliceosomal proteins. Three separate clusters were identified: high levels (n=45), intermediate levels (n=45) and low levels (n=28). (B) Kaplan-Meier curves for patients belonging to the three groups. (C) Distribution of protein-mRNA correlations for individual genes for spliceosomal proteins and all protein-mRNA pairs in the dataset. (D) Enrichment for the three groups for 6 separate differentiation states, color denotes normalized enrichment score while -log10(adjusted p-values) are visualized by the size of the circles. (E) Distribution of ELN2017 classifications in the three groups. (F) Complete remission rates for the three groups. (G) Survival curves for groups constructed by combining ELN2017 classifications with spliceosomal levels (Best: patients with lowest risk in at least one classification (Adverse, low levels) and not classified as highest risk in the other (Adverse, high levels); Intermediate: Intermediate in both, or high risk in one and low in the other; Worst: highest risk class in at least one classification and not classified as lowest risk in the other). (H) Waterfall plot of common AML mutations in the three groups. (I) Enrichment analysis for comparisons between the individual groups, color denotes normalized enrichment score while -log10(q-values) are visualized by the size of the circles.

Figure 7

Patient outcomes by expression of S100A8 protein biomarker. Kaplan-Meier curves for patients stratified by the different lists into high or low levels of expressed S100A8 using DIA data from the HiRIEF-data (cohort size n=118). Log-rank tests were performed to evaluate significance levels and p-values are indicated in the figure. (A) High (n=59) and low (n=59) expression based on median expression level (B) The quartile of patients with either the highest (n=30) or lowest (n=30) expression based on median expression.

DEFINITIONS AND ABBREVIATIONS

In the present invention “AML” relates to “acute myeloid leukemia”.

MS, Mass Spectrometry;

LC-MS/MS, liquid chromatography-tandem mass spectrometry;

LC-SRM-MS, liquid chromatography selected reaction monitoring mass spectrometry;

OGS, N-octyl glucoside;

MMTS, Methyl methanethiosulfonate;

TCEP, Tris-(2-carboxyethyl)-phosphine;

FA, Formic acid;

FAB, French-American-British (classification)

TPCK treated trypsin, Trypsin treated with L-(tosylamido-2-phenyl) ethyl chloromethyl ketone;

beta-Gal, beta-galactosidase;

CV percent, Coefficient of variation;

SIL peptide, Stable Isotope-Labeled Peptide;

DIA-MS, Data Independent acquisition mass spectrometry;

DDA-MS, Data Dependent acquisition mass spectrometry;

PRM, Parallel reaction monitoring

FDR, false discovery rate

OS, overall survival

SPE, solid phase extraction;

IQ panel, internal quality panel.

The term "proteome" refers to all the proteins expressed by a genome, and thus proteomics involves the identification of proteins in the body and the determination of their role in physiological and pathophysiological functions. The ~30,000 genes defined by the Human Genome Project translate into 300,000 to 1 million proteins when alternate splicing and post-translational modifications are considered. While a genome remains unchanged to a large extent, the proteins in any particular cell change dramatically as genes are turned on and off in response to their environment.

As a reflection of the dynamic nature of the proteome, some researchers prefer to use the term "functional proteome" to describe all the proteins produced by a specific cell in a single time frame. Ultimately, it is believed that through proteomics, new disease markers and drug targets can be identified.

As used herein, the terms “centralized control system” or “centralized control network” refer to information and equipment management systems (e.g., a computer processor and computer memory) operably linked to multiple devices or apparatus (e.g., automated sample handling devices and separating apparatus). In preferred embodiments, the centralized control network is configured to control the operations of the apparatus and/or device linked to the network. For example, in some embodiments, the centralized control network controls the operation of multiple chromatography apparatuses, the transfer of sample between the apparatuses, and the analysis and presentation of data.

As used herein, the terms “computer memory” and “computer memory device” refer to any storage media readable by a computer processor. Examples of computer memory include, but are not limited to, RAM, ROM, computer chips, digital video disc (DVDs), compact discs (CDs), hard disk drives (HDD), and magnetic tape.

As used herein, the term “computer readable medium” refers to any device or system for storing and providing information (e.g., data and instructions) to a computer processor. Examples of computer readable media include, but are not limited to, DVDs, CDs, hard disk drives, magnetic tape and servers for streaming media over networks.

As used herein, the terms “processor” and “central processing unit” or “CPU” are used interchangeably and refer to a device that is able to read a program from a computer memory (e.g., ROM or other computer memory) and perform a set of steps according to the program.

It is to be understood that this invention is not limited to the particular methodology, protocols, and reagents described, as such may vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the scope of the present invention, which will be limited only by the appended claims.

As used herein the singular forms “a”, “an”, and “the” include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to “a compound” includes a plurality of such compounds and reference to “the agent” includes reference to one or more agents and equivalents thereof known to those skilled in the art, and so forth. All technical and scientific terms used herein have the same meaning as commonly understood to one of ordinary skill in the art to which this invention belongs unless clearly indicated otherwise.

This invention is not limited in its application to the details of construction and the arrangement of components set forth in the following description or illustrated in the drawings. The invention is capable of other embodiments and of being practiced or of being carried out in various ways. Also, the phraseology and terminology used herein is for the purpose of description and should not be regarded as limiting. The use of "including," "comprising," or "having," "containing," "involving," and variations thereof herein, is meant to encompass the items listed thereafter and equivalents thereof as well as additional items.

DETAILED DESCRIPTION

The invention relates to the application of recently developed proteomic methods for identifying expression levels of AML-related proteins selected to correlate with AML progression and predicting disease outcome useful for diagnosis and prognosis, as well as to reagents, products and kits for performing said identification.

Defined lists of proteins specifically expressed in AML patients are presented herein. The AML-specific expression profiles of the proteins, disclosed herein for the first time, are associated with various properties of the disease AML. The invention therefore relates to the use of these protein panels or signature sets of such proteins for stratification, detection, diagnosis and treatment of AML, as well as for monitoring therapeutic progress in an AML patient. Thus, these proteins, both individually and as sets or "signatures", can be used to stratify the individual AML patient at least to belong to a high-risk or low-risk group of AML patients, thereby facilitating improved, optimized and/or individualized treatment of said AML patient displaying a specific expression profile of proteins identified herein for the first time as associated with the disease AML.

The disclosure presents applied in-depth proteomics to characterize the biology of AML. The inventors observed that certain genetically defined subtypes form well defined clusters or subclusters on the proteome level (e.g., inv(16), CEBPA) while other mutations and aberrations have little or varying impacts on protein levels. It could be observed that proteome profiles and clusters had clear relationships to survival. For the first time, a clear relationship between protein levels of particular spliceosome components and outcome was identified which could not be readily seen at the transcript level. Patients with high spliceosome levels demonstrated poorer survival, and this was independent of genetic risk factors and clinically used risk stratification models including ELN2017.

By combining the proteomic and genetic level information an improvement on existing models for patient stratification is herein provided, especially regarding short to mid-term outcomes. This highlights the clinical benefit of mass spectrometry screening of diagnostic patient samples. The invention presented herein can thus serve as an important tool for identifying patients with elevated risk and to adapt therapeutic strategies accordingly.

In the current context, the term “stratification” is used to describe the distribution of one or more patient(s) into subgroups, for instance a patient may be stratified by stage of AML, age, gender, response to certain drugs etc.

Proteogenomic analysis of a treatment naive Acute Myeloid Leukemia - study cohort

The experimental part shows a proteogenomic analysis of a treatment-naive Acute Myeloid Leukemia study cohort, which was selected from the larger Clinseq cohort and consisted of 118 Swedish, treatment-naive patients who all received first-line induction therapy for AML. Patients were characterized by mass-spectrometry-based proteomics as well as RNA-sequencing, DNA-panel sequencing, Epic array and ex-vivo drug screening at baseline (Fig 5A). Longitudinal follow-up, survival outcome and clinical characteristics as well as treatment response were recorded for all patients (not shown). Mutational frequencies of commonly mutated genes were comparable to previous studies.

In-depth HiRIEF-LC-MS/MS was applied to the samples. The workflow identified and quantified more than 12000 protein products (gene centric, FDR<1%) across the entire cohort, with a full overlap of 8632 proteins (Fig 5B). Using hierarchical clustering, 9 clusters were identified in the cohort (Fig 5C). Survival outcomes differed between the distinct clusters (Fig 5D), with clusters 4 and 1 exhibiting poorer survival and patients in clusters 2 and 7 exhibiting longer survival. The experimental part also discloses that observed treatment response, measured as whether patients achieved complete remission following induction therapy, and genetic risk classification (ELN2017) varied considerably between the clusters (not shown).
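The cluster-wise survival comparison referred to above rests on Kaplan-Meier estimation. The following is a minimal sketch of that estimator only, not the inventors' analysis code. The function name `kaplan_meier` and the follow-up times and event flags in the usage lines are hypothetical illustrations and are not data from the cohort.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return the (time, survival probability) steps of a Kaplan-Meier curve.

    times  -- follow-up time for each patient
    events -- 1 if death was observed at that time, 0 if the patient was censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up data (months) for one proteomic cluster.
example_times = [5.0, 8.0, 8.0, 12.0, 20.0]
example_events = [1, 1, 0, 1, 0]
for time_point, probability in kaplan_meier(example_times, example_events):
    print(f"t={time_point:5.1f}  S(t)={probability:.3f}")
```

One such curve would be computed per cluster and the curves compared, for instance with a log-rank test as stated in the figure descriptions.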

To investigate the relationship between mutational patterns and proteome phenotypes the panel sequencing data was utilized to investigate if there were enrichment in the clusters for specific genetic aberrations (Fig 5E).
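Enrichment of a genetic aberration in a proteomic cluster can be tested with a Fisher exact test, as named in the description of Fig 5E. The sketch below assumes a one-sided test on a 2x2 table; the description does not state whether a one-sided or two-sided test was used, and the function name and the counts in the usage lines are hypothetical.

```python
from math import comb


def fisher_enrichment_p(in_cluster_mut: int, in_cluster_wt: int,
                        other_mut: int, other_wt: int) -> float:
    """One-sided Fisher exact p-value for enrichment of a mutation in a cluster.

    Returns the hypergeometric probability of observing at least
    `in_cluster_mut` mutated patients inside the cluster, given the margins.
    """
    cluster_size = in_cluster_mut + in_cluster_wt
    total_mut = in_cluster_mut + other_mut
    total = cluster_size + other_mut + other_wt
    upper = min(cluster_size, total_mut)
    favourable = sum(
        comb(total_mut, k) * comb(total - total_mut, cluster_size - k)
        for k in range(in_cluster_mut, upper + 1)
    )
    return favourable / comb(total, cluster_size)


# Hypothetical counts: 8 of 12 patients in the cluster carry the mutation,
# compared with 10 of 106 patients outside the cluster.
print(fisher_enrichment_p(8, 4, 10, 96))
```

Repeating the test for each cluster and aberration yields the matrix of p-values that a heatmap such as Fig 5E visualizes; a multiple-testing correction would normally be applied afterwards.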

Conversely, it was also investigated which mutations and genetic aberrations had the largest impact on protein levels, and it was found that NPM1 and FLT3 ITD led to the most prominent changes on the proteome level, closely followed by transcription related changes due to biallelic CEBPA mutation and inv(16) (not shown). For 117 of the patients, matching RNAseq and proteomics data were obtained, and the correlation of the overlap (n=8971) was compared on the gene symbol level. A median Spearman correlation of 0.36 was found, which is within the range of what has been reported in previous studies of solid tumors. Notably, it is higher, and a greater proportion of significant correlations was found (66%, adjusted P-value < 0.01), compared to what was found previously in acute lymphoblastic leukemia. Correlations of complex members were also significantly higher on the protein level compared to the mRNA level, in line with previous reports.
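The gene-wise mRNA-protein comparison can be illustrated as a Spearman correlation computed per gene across patients, followed by the median over genes. This is a minimal sketch under that assumption; the helper names and the two example genes with their values are hypothetical and do not come from the dataset.

```python
from statistics import median
from typing import Dict, List, Sequence, Tuple


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def median_gene_correlation(
        pairs: Dict[str, Tuple[List[float], List[float]]]) -> float:
    """Median over genes of the per-gene mRNA-protein Spearman correlation."""
    return median(spearman(mrna, protein) for mrna, protein in pairs.values())


# Hypothetical per-gene (mRNA levels, protein levels) across four patients.
example = {
    "GENE_A": ([1.0, 2.0, 3.0, 4.0], [0.9, 2.1, 2.0, 4.2]),
    "GENE_B": ([5.0, 3.0, 4.0, 1.0], [2.0, 2.5, 1.0, 3.0]),
}
print(median_gene_correlation(example))
```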

It was also noted that the proteome level clusters differed in terms of FAB-classification, with stem-like or granulocytic differentiated subtypes (M0-M3) being more prevalent in clusters 1, 4 and 6 while monocytic subtypes (M4-M5) were dominant in cluster 2 (not shown). In line with this, monocytic markers (CD14 and CD36) were elevated in samples from patients from cluster 2 and to a degree also in cluster 7 (inv(16)) (not shown). Patients in cluster 4 also exhibited increased levels of stemness markers CD34 and CD133. Additionally, it was explored how signature gene sets for AML subtype differentiation derived from single cell RNA seq differed between the presently presented clusters (Fig 5H).

To more broadly characterize the proteomic phenotypes, the individual clusters were compared, and enrichment analysis was employed to investigate up and down regulated pathways (Fig 5I - enrichment). It was observed that monocyte related genes as well as genes associated with inflammation, TNF-signaling and IL10-signaling were upregulated in cluster 2. In clusters 3, 5 and 6, increased levels of integrin signalling and integrin interaction related proteins were found. The inv(16) related cluster 7 exhibited decreased oxidative phosphorylation as well as increased chromosome maintenance. Clusters 1 and 4 exhibited increased levels of proteins related to transcription, mRNA processing and splicing. Specifically, spliceosome proteins were generally upregulated in these two clusters (Fig 6A).

The present invention relates to the surprising finding that increased spliceosome levels define a poor outcome population in AML patients.

The patients were stratified into three groups based on clustering of spliceosomal proteins (Fig 6A), and it was investigated whether there is an association between outcome and spliceosome proteins. It was found that higher spliceosome levels were significantly associated with shorter overall survival (Fig 6B) and that lower levels were associated with a higher rate of complete remission (Chi-square High vs Low p-value = 0.02; Intermediate vs Low p-value = 0.05). Spliceosomal proteins were overall well correlated (not shown) while transcripts exhibited a lower level of correlation (Wilcoxon p-value < 1E-27). Additionally, the mRNA-protein correlations for the spliceosomal gene products were overall poor and the same phenotype could not be observed using the RNA sequencing data (Fig 6C). No gender differences were observed between the groups, but patients with low levels of spliceosome proteins tended to be older.
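The pairwise comparison of complete remission rates between two spliceosome groups can be sketched as a chi-square test on a 2x2 table. The sketch assumes a Pearson chi-square statistic without continuity correction, which the description does not specify, and the counts in the usage lines are hypothetical rather than taken from the cohort.

```python
from math import erfc, sqrt
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square test for the table [[a, b], [c, d]].

    Rows are two patient groups, columns are remission yes / no.
    Returns (statistic, p-value) with one degree of freedom and no
    continuity correction (an assumption of this sketch).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For one degree of freedom the chi-square tail equals erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: complete remission yes/no in a low-level group
# and in a high-level group.
stat, p = chi_square_2x2(24, 4, 28, 17)
print(f"chi-square = {stat:.2f}, p = {p:.3f}")
```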

There were no obvious differences in FAB classification between the three groups but using the scRNAseq derived gene sets described above it was observed that low spliceosome levels were more associated to a promono- or monocyte-like subtype and that the patients with high levels of spliceosome proteins also had increased levels of proteins associated to a progenitor-like state (Fig 6D).

As can be seen in the experimental part, no association was found (χ²-test, p = 0.62) between ELN2017 classification and spliceosome levels, indicating that they are independent risk factors (Fig. 6E-F). Univariate Cox regression confirmed that both Adverse ELN-classification and high spliceosome levels were related to survival, and both metrics retained their significance in a multivariable model, indicating independent contributions (not shown). Indeed, combining the two metrics led to improved stratification of the patient cohort in relation to overall survival (Fig 6G). To better elucidate the genetic contribution to the high spliceosome phenotype, it was further investigated which mutations were found more frequently in the high and low spliceosome groups respectively (Fig 6H).
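The combination of the two metrics follows the Best, Intermediate and Worst rule given in the description of Figure 6G. A minimal sketch of that rule is shown below. It assumes that each classification is mapped to a rank of 0 (lowest risk), 1 or 2 (highest risk), with low spliceosome levels ranked as lowest risk because high levels were associated with poorer survival; the dictionary and function names are hypothetical.

```python
# Risk ranks: 0 = lowest risk, 2 = highest risk (assumed mapping).
ELN_RANK = {"Favorable": 0, "Intermediate": 1, "Adverse": 2}
SPLICEOSOME_RANK = {"low": 0, "intermediate": 1, "high": 2}


def combined_group(eln: str, spliceosome: str) -> str:
    """Combine ELN2017 class and spliceosome level into one survival group."""
    ranks = (ELN_RANK[eln], SPLICEOSOME_RANK[spliceosome])
    if 0 in ranks and 2 not in ranks:
        # Lowest risk in at least one classification, not highest in the other.
        return "Best"
    if 2 in ranks and 0 not in ranks:
        # Highest risk in at least one classification, not lowest in the other.
        return "Worst"
    # Intermediate in both, or high risk in one and low risk in the other.
    return "Intermediate"


for eln_class in ELN_RANK:
    for level in SPLICEOSOME_RANK:
        print(f"{eln_class:12s} {level:12s} -> {combined_group(eln_class, level)}")
```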

As can be seen in the experimental section, it was found that in addition to spliceosome and RNA processing genes being upregulated, genes involved in chromatin organization, sumoylation and DNA repair were also upregulated. Gene sets related to fatty acid metabolism, GPCR and chemokine signalling, hematopoietic lineage markers and cell adhesion/integrins/focal adhesion were comparatively downregulated (Fig 6I).

A method for stratifying

The surprising findings documented in the experimental section for the first time allow stratifying an AML patient by generating a proteomic profile of a patient and comparing said proteomic profile with the herein disclosed AML-specific expression profiles of the proteins selected from the list of proteins with SEQ ID NOs: 1-17500 (see sequence listing).

The stratification is based on a predicted outcome in relation to determinants such as but not limited to overall survival after 1 (one) year, event free survival, overall survival and/or reaching complete remission.

Thus, in one embodiment the AML patient stratification is done on the basis of predicted survival after 1 year. In another embodiment, the AML patient stratification is done on the basis of overall survival, event free survival and/or complete remission.

A method is herein disclosed for stratifying an AML patient into a high-risk or a low-risk group in relation to overall survival and/or complete remission, based on proteomic profiling, compared to an AML patient cohort of AML patients, wherein the method comprises, a. isolating blood and/or tissue samples from said patient, b. processing said sample(s), wherein the processing comprises i. extracting expressed proteins from said sample, c. analyzing the extracted proteins, d. determining the median expression level of said extracted proteins for the intermediate samples of the cohort, and e. quantitatively determining the proteomic risk profile of said patient by identifying abnormal protein levels of at least 5 proteins selected from the list of proteins with SEQ ID NOs: 1-17500, wherein the protein level is considered abnormal when said protein level is downregulated or upregulated compared to the median expression level of the cohort, wherein i. downregulated refers to a protein level that is less than 95% of the median expression level of said protein for the intermediate samples of the cohort, ii. upregulated refers to a protein level that is at least 5% above the median expression level of said protein for the intermediate samples of the cohort, iii. proteins that are downregulated are proteins with a SEQ ID NO provided in Table 2, and iv. proteins that are upregulated are proteins with a SEQ ID NO provided in Table 3.
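Steps d. and e. above can be sketched as a comparison of each measured protein level against the cohort median, using the thresholds stated in the method (less than 95% of the median for downregulated proteins, at least 5% above the median for upregulated proteins, and at least 5 abnormal proteins). This is an illustrative sketch only. It assumes that levels are on a linear scale, that the reference cohort values are supplied as plain lists, and that the panel memberships stand in for Table 2 and Table 3; all identifiers such as `P1` and all function names are hypothetical.

```python
from statistics import median
from typing import Dict, List, Set


def cohort_medians(cohort: Dict[str, List[float]]) -> Dict[str, float]:
    """Median expression level per protein across the reference samples."""
    return {protein: median(levels) for protein, levels in cohort.items()}


def abnormal_proteins(patient: Dict[str, float], medians: Dict[str, float],
                      down_panel: Set[str], up_panel: Set[str]) -> Set[str]:
    """Proteins whose level deviates from the cohort median in the panel direction."""
    flagged = set()
    for protein, level in patient.items():
        reference = medians.get(protein)
        if reference is None:
            continue  # protein not quantified in the reference cohort
        if protein in down_panel and level < 0.95 * reference:
            flagged.add(protein)  # downregulated: below 95% of the median
        elif protein in up_panel and level >= 1.05 * reference:
            flagged.add(protein)  # upregulated: at least 5% above the median
    return flagged


def is_high_risk(patient: Dict[str, float], medians: Dict[str, float],
                 down_panel: Set[str], up_panel: Set[str],
                 min_abnormal: int = 5) -> bool:
    """High-risk call when at least `min_abnormal` panel proteins are abnormal."""
    return len(abnormal_proteins(patient, medians, down_panel, up_panel)) >= min_abnormal


# Hypothetical reference cohort and patient profile.
reference_cohort = {f"P{i}": [90.0, 100.0, 110.0] for i in range(1, 7)}
down = {"P1", "P2", "P3"}   # stands in for proteins of Table 2
up = {"P4", "P5", "P6"}     # stands in for proteins of Table 3
patient_profile = {"P1": 80.0, "P2": 80.0, "P3": 80.0,
                   "P4": 120.0, "P5": 120.0, "P6": 100.0}
print(is_high_risk(patient_profile, cohort_medians(reference_cohort), down, up))
```

Keeping the median calculation separate from the patient comparison mirrors the use of a predetermined standard: the medians can be computed once from the cohort and then reused for each new patient.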

In addition, a method is herein disclosed for stratifying an AML patient into a high-risk or a low-risk group in relation to overall survival and/or complete remission, based on proteomic profiling, compared to an AML patient cohort of AML patients, wherein the method comprises, a. providing blood and/or tissue samples from said patient, b. processing said sample(s), wherein the processing comprises i. extracting expressed proteins from said sample, c. analyzing the extracted proteins, d. determining the median expression level of said extracted proteins for the intermediate samples of the cohort, and e. quantitatively determining the proteomic risk profile of said patient by identifying abnormal protein levels of at least 5 proteins selected from the list of proteins with SEQ ID NOs: 1-17500, wherein the protein level is considered abnormal when said protein level is downregulated or upregulated compared to the median expression level of the cohort, wherein i. downregulated refers to a protein level that is less than 95% of the median expression level of said protein for the intermediate samples of the cohort, ii. upregulated refers to a protein level that is at least 5% above the median expression level of said protein for the intermediate samples of the cohort, iii. proteins that are downregulated are proteins with a SEQ ID NO provided in Table 2, and iv. proteins that are upregulated are proteins with a SEQ ID NO provided in Table 3.

In embodiments, the method for stratifying an AML patient into a high-risk or a low-risk group in relation to overall survival and/or complete remission, based on proteomic profiling, compared to an AML patient cohort of AML patients, comprises providing blood and/or tissue samples from said patient.

A method is herein disclosed for stratifying an AML patient into a high-risk, intermediate risk or a low-risk group in relation to overall survival and/or complete remission, based on proteomic profiling, compared to an AML patient cohort of AML patients, wherein the method comprises, a. isolating blood and/or tissue samples from said patient, b. processing said sample(s), wherein the processing comprises i. extracting expressed proteins from said sample, c. analyzing the extracted proteins, d. determining the median expression level of said extracted proteins for the intermediate samples of the cohort, and e. quantitatively determining the proteomic risk profile of said patient by identifying abnormal protein levels of at least 5 proteins selected from the list of proteins with SEQ ID NOs: 1-17500, wherein the protein level is considered abnormal when said protein level is downregulated or upregulated compared to the median expression level of the cohort, wherein i. downregulated refers to a protein level that is less than 95% of the median expression level of said protein for the intermediate samples of the cohort, ii. upregulated refers to a protein level that is at least 5% above the median expression level of said protein for the intermediate samples of the cohort,

iii. proteins that are downregulated are proteins with a SEQ ID NO provided in Table 2, and iv. proteins that are upregulated are proteins with a SEQ ID NO provided in Table 3.

In addition, a method is herein disclosed for stratifying an AML patient into a high-risk, intermediate risk or a low-risk group in relation to overall survival and/or complete remission, based on proteomic profiling, compared to an AML patient cohort of AML patients, wherein the method comprises, a. providing blood and/or tissue samples from said patient, b. processing said sample(s), wherein the processing comprises i. extracting expressed proteins from said sample, c. analyzing the extracted proteins, d. determining the median expression level of said extracted proteins for the intermediate samples of the cohort, and e. quantitatively determining the proteomic risk profile of said patient by identifying abnormal protein levels of at least 5 proteins selected from the list of proteins with SEQ ID NOs: 1-17500, wherein the protein level is considered abnormal when said protein level is downregulated or upregulated compared to the median expression level of the cohort, wherein i. downregulated refers to a protein level that is less than 95% of the median expression level of said protein for the intermediate samples of the cohort, ii. upregulated refers to a protein level that is at least 5% above the median expression level of said protein for the intermediate samples of the cohort,

iii. proteins that are downregulated are proteins with a SEQ ID NO provided in Table 2, and iv. proteins that are upregulated are proteins with a SEQ ID NO provided in Table 3.

In embodiments, the method for stratifying an AML patient into a high-risk, intermediate-risk or a low-risk group in relation to overall survival and/or complete remission, based on proteomic profiling, compared to an AML patient cohort of AML patients, comprises providing blood and/or tissue samples from said patient.

In further embodiments, the invention relates to a method of predicting response to therapy in a patient with acute myeloid leukemia (AML) comprising: a. isolating blood and/or tissue samples from said patient, b. processing said sample(s), wherein the processing comprises i. extracting expressed proteins from said sample, c. analyzing the extracted proteins, d. determining the median expression level of said extracted proteins for the intermediate samples of the cohort, and e. quantitatively determining the proteomic risk profile of said patient by identifying abnormal protein levels of at least 5 proteins selected from the list of proteins with SEQ ID NOs: 1-17500, wherein the protein level is considered abnormal when said protein level is downregulated or upregulated compared to the median expression level of the cohort, wherein i. downregulated refers to a protein level that is less than 95% of the median expression level of said protein for the intermediate samples of the cohort, ii. upregulated refers to a protein level that is at least 5% above the median expression level of said protein for the intermediate samples of the cohort,

iii. proteins that are downregulated are proteins with a SEQ ID NO provided in Table 2, and iv. proteins that are upregulated are proteins with a SEQ ID NO provided in Table 3.

In addition, a method is herein disclosed for stratifying an AML patient into a high-risk or a low-risk group in relation to overall survival and/or complete remission, based on proteomic profiling, compared to an AML patient cohort of AML patients, wherein the method comprises: a. providing blood and/or tissue samples from said patient, b. processing said sample(s), wherein the processing comprises i. extracting expressed proteins from said sample, c. analyzing the extracted proteins, d. determining the median expression level of said extracted proteins for the intermediate samples of the cohort, and e. quantitatively determining the proteomic risk profile of said patient by identifying abnormal protein levels of at least 5 proteins selected from the list of proteins with SEQ ID NOs: 1-17500, wherein the protein level is considered abnormal when said protein level is downregulated or upregulated compared to the median expression level of the cohort, wherein i. downregulated refers to a protein level that is less than 95% of the median expression level of said protein for the intermediate samples of the cohort, ii. upregulated refers to a protein level that is at least 5% above the median expression level of said protein for the intermediate samples of the cohort,

iii. proteins that are downregulated are proteins with a SEQ ID NO provided in Table 2, and iv. proteins that are upregulated are proteins with a SEQ ID NO provided in Table 3.

In further embodiments, the invention relates to a method of predicting response to therapy in a patient with acute myeloid leukemia (AML) comprising providing blood and/or tissue samples from said patient.

In one embodiment, said method is used for generating a proteomic profile of a patient suspected of suffering from high-risk Acute Myeloid Leukemia (“AML”) or from low-risk Acute Myeloid Leukemia (“AML”).

Thus, in one or more embodiments the AML patient is stratified as belonging to a high-risk group when abnormal protein levels are detected in step e) of the method.

In addition, a method is herein disclosed for stratifying an AML patient into a high-risk or a low-risk group in relation to overall survival and/or complete remission, wherein the method comprises: a. providing a blood and/or plasma sample from said patient, b. processing said sample and extracting expressed proteins from said sample, c. performing mass spectrometry on said blood and/or plasma sample from said patient, to generate a protein-spectra comprising protein peaks; d. based on said protein spectra identifying an abnormal protein expression of at least 20 proteins, such as at least 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95 or such as at least 100 proteins selected from any one of tables 1, 4, 15, 16, 17, 18 or 19, and e. predicting said patient's response to therapy based on the identification of abnormal expression of said proteins.

In addition, a method is herein disclosed for stratifying an AML patient into a high-risk or a low-risk group in relation to overall survival and/or complete remission, wherein the method comprises: a. providing a tissue sample from said patient, b. processing said sample and extracting expressed proteins from said sample, c. performing mass spectrometry on said tissue sample from said patient, to generate a protein-spectra comprising protein peaks; d. based on said protein spectra identifying an abnormal protein expression of at least 20 proteins, such as at least 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95 or such as at least 100 proteins selected from any one of tables 1, 4, 15, 16, 17, 18 or 19, and e. predicting said patient's response to therapy based on the identification of abnormal expression of said proteins.

In addition, a method is herein disclosed for stratifying an AML patient into a high-risk, intermediate- risk or a low-risk group in relation to overall survival and/or complete remission, wherein the method comprises: a. providing a blood and/or plasma sample from said patient, b. processing said sample and extracting expressed proteins from said sample, c. performing mass spectrometry on said blood and/or plasma sample from said patient, to generate a protein-spectra comprising protein peaks; d. based on said protein spectra identifying an abnormal protein expression of at least 20 proteins, such as at least 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95 or such as at least 100 proteins selected from any one of tables 1 , 4, 15, 16, 17, 18 or 19, and e. predicting said patient's response to therapy based on the identification of abnormal expression of said proteins.

In addition, a method is herein disclosed for stratifying an AML patient into a high-risk, intermediate- risk or a low-risk group in relation to overall survival and/or complete remission, wherein the method comprises: a. providing a tissue sample from said patient, b. processing said sample and extracting expressed proteins from said sample, c. performing mass spectrometry on said tissue sample from said patient, to generate a protein-spectra comprising protein peaks; d. based on said protein spectra identifying an abnormal protein expression of at least 20 proteins, such as at least 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95 or such as at least 100 proteins selected from any one of tables 1 , 4, 15, 16, 17, 18 or 19, and e. predicting said patient's response to therapy based on the identification of abnormal expression of said proteins.

In embodiments, the invention relates to a method of predicting response to therapy in a patient with acute myeloid leukemia (AML) comprising: a. providing a blood and/or plasma sample from said patient, b. processing said sample and extracting expressed proteins from said sample, c. performing mass spectrometry on said blood and/or plasma sample from said patient, to generate a protein-spectra comprising protein peaks; d. based on said protein spectra identifying an abnormal protein expression of at least 20 proteins, such as at least 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95 or such as at least 100 proteins selected from any one of tables 1 , 4, 15, 16, 17, 18 or 19, and e. predicting said patient's response to therapy based on the identification of abnormal expression of said proteins. In embodiments, the invention relates to a method of predicting response to therapy in a patient with acute myeloid leukemia (AML) comprising: a. providing a tissue sample from said patient, b. processing said sample and extracting expressed proteins from said sample, c. performing mass spectrometry on said tissue sample from said patient, to generate a protein-spectra comprising protein peaks; d. based on said protein spectra identifying an abnormal protein expression of at least 20 proteins, such as at least 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95 or such as at least 100 proteins selected from any one of tables 1 , 4, 15, 16, 17, 18 or 19, and e. predicting said patient's response to therapy based on the identification of abnormal expression of said proteins.

An AML patient

An AML patient is a patient, subject or individual diagnosed with AML.

As used herein, the term “patient” refers to an animal, preferably to a mammal, more preferably to a human. Depending on the embodiment in question, said patient may suffer from AML with or without diagnosis, be suspected to suffer from AML, be at risk of AML, or may have already been treated for AML. In one embodiment, the patient is suspected of suffering from high-risk Acute Myeloid Leukemia (“AML”). In another embodiment, the patient is suspected of suffering from low-risk Acute Myeloid Leukemia (“AML”).

In the present context, the terms “human subject”, “patient” and “individual” are interchangeable.

A sample

A method for generating a proteomic profile of a patient suspected of suffering from high-risk Acute Myeloid Leukemia (“AML”), and/or for stratifying an AML patient into a high-risk or a low-risk group or into a high-risk, intermediate-risk or low-risk group in relation to overall survival and/or complete remission, based on proteomic profiling according to the present invention comprises isolating blood and/or tissue samples from said patient. A method for generating a proteomic profile of a patient suspected of suffering from high-risk Acute Myeloid Leukemia (“AML”), and/or for stratifying an AML patient into a high-risk or a low-risk group or into a high-risk, intermediate-risk or low-risk group in relation to overall survival and/or complete remission, based on proteomic profiling according to the present invention comprises providing blood and/or tissue samples obtained from said patient.

Thus, in one embodiment, the method is for generating a proteomic profile of a patient suspected of suffering from high-risk Acute Myeloid Leukemia (“AML”) based on proteomic profiling according to the present invention, and comprises isolating or providing blood and/or tissue samples from said patient.

In another embodiment, the method is for stratifying an AML patient into a high-risk or a low-risk group in relation to overall survival and/or complete remission, based on proteomic profiling according to the present invention, and comprises isolating or providing blood and/or tissue samples from said patient.

In a further embodiment, the method is for stratifying an AML patient into a high-risk, intermediate-risk or low-risk group, in relation to overall survival and/or complete remission, based on proteomic profiling according to the present invention, and comprises isolating or providing blood and/or tissue samples from said patient.

In yet another embodiment, the method is for stratifying an AML patient into one of at least three risk groups, such as but not limited to a high-risk group, an intermediate-risk group and a low-risk group, in relation to overall survival and/or complete remission, based on proteomic profiling according to the present invention, and comprises isolating or providing blood and/or tissue samples from said patient.

As used herein, the term “sample” is used in its broadest sense. In one sense it can refer to a cell lysate. In another sense, it is meant to include a specimen or culture obtained and/or provided from a patient. In particular, the term "sample" refers to a biological sample, typically a clinical sample which may be, for example, a tissue sample such as a skin sample, a cell sample such as a skin cell sample (e.g., a skin fibroblast sample), or a skin punch or skin shave, or a sample of a bodily fluid such as urine, blood, plasma, or serum, obtained from a patient. Preferred samples include biological samples encompassing fluids, solids, tissues, and gases. Presently, biological samples in particular include blood products (e.g., plasma and serum), saliva, urine, and the like. These examples are not to be construed as limiting the sample types applicable to the present disclosure. In embodiments, a sample from the patient is obtained for the method of the invention. In a presently preferred embodiment, bone marrow and/or peripheral blood samples are obtained, isolated and/or provided at the time of diagnosis.

Generally, obtaining the sample to be analysed from a patient is not part of the present method, which may therefore be regarded as an in vitro method. Taking samples from cancer patients is a standard procedure and is often done for performing a plethora of analyses. Thus, in some embodiments, the sample has been obtained for another analysis and can thus readily be used for the stratification method described herein as well. Accordingly, the sample from the patient to be analysed is in embodiments provided in order to perform the method of the invention.

AML patient cohort

A method for generating a proteomic profile of a patient suspected of suffering from high-risk Acute Myeloid Leukemia (“AML”), and/or for stratifying an AML patient into a high-risk or a low-risk group in relation to overall survival and/or complete remission, based on proteomic profiling according to the present invention relies on comparing the proteomic profile of said patient to that of a cohort of AML patients.

The experimental part shows a proteogenomic analysis of a treatment naive Acute Myeloid Leukemia study cohort. The cohort was selected from the larger Clinseq cohort [Wang, M., et al., Validation of risk stratification models in acute myeloid leukemia using sequencing-based molecular profiling. Leukemia, 2017. 31(10): p. 2029-2036] and consisted of 118 Swedish, treatment naive patients who all received first-line induction therapy for AML. In the present invention, the cohort is thus exemplified by 118 patients and the median expression level of said extracted proteins is determined for the intermediate samples of that cohort. Thus, the median expression level of the extractable proteins with SEQ ID NOs: 1-17500 is currently determined as shown in the experimental part.

The currently studied cohort is an exemplary cohort and is based on predominantly Caucasian patients. It might be advisable to use another cohort of treatment naive Acute Myeloid Leukemia patients to determine the median expression level of the extractable proteins with SEQ ID NOs: 1-17500, such as a cohort defined by a certain age-group or genetic background. In certain aspects, it might be advisable to use a cohort that is non-treatment naive.

It is understood that the exact size of the patient cohort can vary, as long as a median expression level of the extracted proteins can be determined and the quantitatively determined proteomic risk profile of said patient can be identified as abnormal protein levels of at least 5 proteins selected from the list of proteins with SEQ ID NOs: 1-17500 (Table 1), wherein the protein level is considered abnormal when the level of any one of the proteins with a SEQ ID NO as provided in Table 3 is <95% of the median expression level for the intermediate samples of the cohort, or the level of any one of the proteins with a SEQ ID NO as provided in Table 2 is up-regulated by at least 5% above the median expression level of the cohort.
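As a minimal sketch of how the cohort reference described in this section could be computed, the following Python example derives a per-protein median from cohort measurements and expresses a patient's levels relative to it. The function names, the dictionary layout and the identifiers such as "SEQ_1" are hypothetical and chosen for illustration only. The sketch assumes linear-scale (not log-transformed) expression levels and assumes that the cohort input has already been restricted to the samples used as reference.

```python
from statistics import median


def cohort_medians(cohort_levels):
    """Median expression level per protein across the reference cohort samples.

    cohort_levels: dict mapping a protein identifier to a list of expression
    levels, one per cohort sample (hypothetical input layout).
    """
    return {pid: median(levels) for pid, levels in cohort_levels.items() if levels}


def ratios_to_median(patient_levels, medians):
    """Patient level divided by the cohort median, for proteins present in both."""
    ratios = {}
    for pid, level in patient_levels.items():
        ref = medians.get(pid)
        if ref is not None and ref > 0:  # skip proteins without a usable reference
            ratios[pid] = level / ref
    return ratios


# Illustrative values only; these are not data from the study cohort.
cohort = {"SEQ_1": [1.0, 2.0, 3.0], "SEQ_2": [10.0, 10.0, 12.0]}
patient = {"SEQ_1": 1.5, "SEQ_2": 11.0, "SEQ_3": 4.0}

medians = cohort_medians(cohort)
ratios = ratios_to_median(patient, medians)
```

A ratio below 0.95 or of at least 1.05 would then correspond to the 95% and 5% limits stated above; proteins without a cohort reference (here "SEQ_3") are simply left out of the comparison.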

Processing said sample(s) and extracting expressed proteins

A method for generating a proteomic profile of a patient suspected of suffering from high-risk Acute Myeloid Leukemia (“AML”), and/or for stratifying an AML patient into a high-risk or a low-risk group in relation to overall survival and/or complete remission, based on proteomic profiling according to the present invention comprises processing the patient’s sample and extracting expressed proteins from said sample.

In one embodiment, at least one expressed protein selected from the list of proteins with SEQ ID NO: 1-1014, and/or at least one expressed protein selected from the list of proteins with SEQ ID NO: 1015-2519 are extracted from said sample. In one embodiment, at least five expressed proteins selected from the list of proteins with SEQ ID NO: 1-1014, and/or at least five expressed proteins selected from the list of proteins with SEQ ID NO: 1015-2519 are extracted from said sample.

The term “protein extraction” is in the current context used interchangeably with protein purification to describe a series of processes intended to isolate one or a few proteins from a complex mixture, such as cells or tissues. The purification process separates the protein and non-protein parts of the mixture. Separation steps usually exploit differences in protein size, physico-chemical properties, binding affinity and biological activity. The purified result may be termed protein isolate. The person skilled in the art will be able to choose from several standard laboratory procedures to effect the protein extraction. Several preparative purification steps are often deployed. The first step of each purification process is the disruption of the cells containing the protein. Non-limiting examples are for instance any one of the following methods: i) repeated freezing and thawing, ii) sonication, iii) homogenization by high pressure (French press), iv) homogenization by grinding (bead mill), and v) permeabilization by detergents (e.g., Triton X-100) and/or enzymes (e.g., lysozyme). Finally, the cell debris can be removed by centrifugation and/or filtration so that the proteins and other soluble compounds remain in the supernatant.

In some embodiments, the proteins are derived by proteolysis or chemical cleavage of a polypeptide. In an embodiment, a protease is utilized to cleave polypeptides into proteins. For example, the protease is trypsin. In additional embodiments, proteases or cleavage agents may be used including but not limited to trypsin, chymotrypsin, endoproteinase Lys-C, endoproteinase Asp-N, pepsin, thermolysin, papain, proteinase K, subtilisin, clostripain, exopeptidase, carboxypeptidase, cathepsin C, cyanogen bromide, formic acid, hydroxylamine, or NTCB, or a combination thereof. In some embodiments, the protease is trypsin. In various embodiments, the proteins are derived by proteolysis or chemical cleavage of a protein. In an embodiment, a protease is utilized to cleave the protein into proteins or peptides. For example, the protease is trypsin. In additional embodiments, proteases or cleavage agents may be used including but not limited to trypsin, chymotrypsin, endoproteinase Lys-C, endoproteinase Asp-N, pepsin, thermolysin, papain, proteinase K, subtilisin, clostripain, exopeptidase, carboxypeptidase, cathepsin C, cyanogen bromide, formic acid, hydroxylamine, or NTCB, or a combination thereof.

Protein extraction and preparation is exemplified in the example section as dissolving cell pellets in a lysis buffer (4% SDS, 50 mM HEPES pH 7.6, 1 mM DTT) followed by heating to 95°C and sonication. The total protein amount was estimated (Bio-Rad DC). Thereafter samples were prepared for mass spectrometry analysis using a modified version of the SP3 protein clean-up and a digestion protocol [Moggridge, S., et al., Extending the Compatibility of the SP3 Paramagnetic Bead Processing Approach for Proteomics. J Proteome Res, 2018. 17(4): p. 1730-1740; Hughes, C.S., et al., Ultrasensitive proteome analysis using paramagnetic bead technology. Mol Syst Biol, 2014. 10: p. 757].

Analyzing the extracted proteins and determining the expression levels of the extracted proteins

As used herein, the term “detection system capable of detecting proteins” refers to any detection apparatus, assay, or system that detects proteins derived from a protein separating apparatus (e.g., proteins in one or more fractions collected from a separating apparatus). Such detection systems may detect properties of the protein itself (e.g., UV spectroscopy or mass spectrometry) or may detect labels (e.g., fluorescent labels) or other detectable signals associated with the protein. The detection system converts the detected criteria (e.g., absorbance, fluorescence, luminescence etc.) of the protein into a signal that can be processed or stored electronically or through similar means (e.g., detected through the use of a photomultiplier tube or similar system).

The extracted proteins are generally analysed by mass spectrometry. In a presently preferred embodiment, the mass spectrometry is data dependent acquisition (DDA) mass spectrometry and/or the mass spectrometry is data independent acquisition (DIA) mass spectrometry. In some embodiments, decreased or increased overall translation is determined quantitatively and/or qualitatively on the basis of signature molecules indicative of decreased or increased overall translation resulting from the use of alternative open reading frames or changes in post-translational modification, or signatures indicative of decreased or increased overall translation even though the precise mechanism of translation block or activation is not known. Suitable methods for measuring signatures include mass spectrometry, such as but not limited to selected reaction monitoring (SRM) assays, multiple reaction monitoring (MRM) assays or parallel reaction monitoring (PRM) assays of translation products, as well as DDA shotgun and/or DIA approaches that provide diagnostic or predictive information. Additionally, antibody-based methods such as ELISA, proximity ligation assay, proximity extension assay, reverse phase protein arrays, aptamer-based assays, affibody-based assays, and 2D-PAGE, for detection of decreased or increased overall translation can be used to measure a single protein or group of proteins designating a "signature" that results from decreased or increased overall translation and is characteristic or predictive, i.e., abnormal.

To determine whether a level of translation from a selected protein or group of proteins is abnormal, the median expression level of said extracted protein(s) for the intermediate samples of the cohort has to be determined. Once the median expression level is known, the determined translation levels can be compared therewith, and the significance of the difference can be assessed using standard statistical methods. Before being compared with the median expression level, the translation levels are normalized using standard methods.

The detection methods particularly envisioned include detection by mass spectrometry, ELISA type antibody recognition or by measuring the enzymatic activity of unique translation products.

Mass spectrometry

In various embodiments, the MS data is collected by a targeted acquisition method. Examples of the targeted acquisition method include but are not limited to Selected Reaction Monitoring (SRM) and/or Multiple Reaction Monitoring (MRM) methods and PRM methods. In various embodiments, acquiring MS data comprises acquiring Selected Reaction Monitoring (SRM) data and/or Multiple Reaction Monitoring (MRM) data.

In various embodiments, the MS data is collected by a data independent acquisition (DIA) method. In various embodiments, the MS data is collected by data dependent acquisition (DDA) method. Non-limiting examples of mass spectrometry techniques include collision-induced dissociation (CID), higher-energy collisional dissociation (HCD), electron-transfer dissociation (ETD), etc.

Data Dependent Acquisition (DDA)

In various embodiments, the MS data is collected by a data dependent acquisition method. Thus, in some embodiments, the mass spectrometry is data dependent acquisition (DDA) mass spectrometry. In some embodiments, the one or more proteins are correlated to the one or more proteins with a SEQ ID NO as provided in the sequence listing. In some embodiments, the one or more proteins are correlated to the one or more proteins with a sequence of SEQ ID NOs: 1-17500, wherein the protein level is considered abnormal when said protein level is downregulated or upregulated compared to the median expression level of the cohort. In embodiments, downregulated refers to a protein level that is less than 95% of the median expression level of said protein for the intermediate samples of the cohort. In embodiments, upregulated refers to a protein level that is at least 5% above the median expression level of said protein for the intermediate samples of the cohort. In that regard, proteins that are downregulated are proteins with a SEQ ID NO provided in Table 2, and proteins that are upregulated are proteins with a SEQ ID NO provided in Table 3.

Data Independent Acquisition (DIA)

In various embodiments, the MS data is collected by a data independent acquisition method. Thus, in some embodiments, the mass spectrometry is data independent acquisition (DIA) mass spectrometry. In some embodiments, the one or more proteins are correlated to the one or more proteins with a SEQ ID NO as provided in Tables 4-5. In some embodiments, the one or more proteins are correlated to the one or more proteins according to one or more of SEQ ID NOs: 1-17500, wherein the protein level is considered abnormal when the level of any one or more of said proteins is downregulated or upregulated compared to the median expression level of the cohort, wherein downregulated refers to a protein level that is less than 95% of the median expression level of said protein for the intermediate samples of the cohort, upregulated refers to a protein level that is at least 5% above the median expression level of said protein for the intermediate samples of the cohort, proteins that are downregulated are proteins with a SEQ ID NO provided in Table 2, and proteins that are upregulated are proteins with a SEQ ID NO provided in Table 3.

Examples of the data independent acquisition (DIA) method include but are not limited to Shotgun CID (see e.g., Purvine et al. 2003), Original DIA (see e.g., Venable et al. 2004), MSE (see e.g., Silva et al. 2005), p2CID (see e.g., Ramos et al. 2006), PAcIFIC (see e.g., Panchaud et al. 2009), AIF (see e.g., Geiger et al. 2010), XDIA (see e.g., Carvalho et al. 2010), SWATH (see e.g., Gillet et al. 2012), and FT-ARM (see e.g., Weisbrod et al. 2012).

Triple quadrupole mass spectrometer

In some embodiments, the mass spectrometer is a triple quadrupole mass spectrometer. In some embodiments the mass spectrometer is a Triple-Time Of Flight (Triple-TOF) mass spectrometer configured for SWATH or a Q-Exactive mass spectrometer (Thermo Scientific), or any instrument with sufficiently high scan speed and a quadrupole mass filter to perform data independent acquisition. Examples of triple quadrupole mass spectrometers (TQMS) that can perform MRM/SRM/SIM include but are not limited to: QTRAP(R) 6500 and 5500 System (Sciex); Triple Quad 6500 System (Sciex); Agilent 6400 Series Triple Quadrupole LC/MS systems; Thermo Scientific™ TSQ™ Triple Quadrupole system; quadrupole time-of-flight (QTOF) mass spectrometers, or hybrid quadrupole-orbitrap (QOrbitrap) mass spectrometers. Examples of quadrupole time-of-flight (QTOF) mass spectrometers include but are not limited to: TripleTOF(R) 6600 or 5600 System (Sciex); X500R QTOF System (Sciex); 6500 Series Accurate-Mass Quadrupole Time-of-Flight (Q-TOF) (Agilent); or Xevo G2-XS QTof Quadrupole Time-of-Flight Mass Spectrometry (Waters). Examples of hybrid quadrupole-orbitrap (QOrbitrap) mass spectrometers include but are not limited to: Q Exactive™ Hybrid Quadrupole-Orbitrap Mass Spectrometer (Thermo Scientific); or Orbitrap Fusion™ Tribrid™ (Thermo Scientific).

Tandem mass spectrometry (MS/MS)

In some embodiments, the mass spectrometry technique is tandem mass spectrometry (MS/MS). In some embodiments, the mass spectrometry technique is liquid chromatography-tandem mass spectrometry (LC-MS/MS). In some embodiments, the mass spectrometry technique is liquid chromatography-selected reaction monitoring-mass spectrometry (LC-SRM-MS). In some embodiments, the mass spectrometry technique is liquid chromatography-multiple reaction monitoring-mass spectrometry (LC-MRM-MS). In some embodiments, the mass spectrometry technique is selected reaction monitoring (SRM). In some embodiments, the mass spectrometry technique is multiple reaction monitoring (MRM). In some embodiments, the mass spectrometry technique is parallel reaction monitoring (PRM). In some embodiments, the mass spectrometry technique is data-independent analysis (DIA). In some embodiments, the mass spectrometry technique is data-dependent analysis (DDA).

In various other embodiments, the method further comprises adding a stable isotope-labelled peptide or protein to the sample prior to mass spectrometry. In some embodiments, the absolute amount of a protein in the sample is determined by comparing the MS signals of natural and stable isotope-labelled peptides and/or proteins.
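The comparison of natural and stable isotope-labelled signals can be illustrated with a short Python sketch. It assumes, as simplifications not stated in the text, that the natural ("light") and labelled ("heavy") forms give the same MS response per unit amount and that both signals lie within the linear range of the instrument. The function name and the example numbers are hypothetical.

```python
def absolute_amount(light_signal, heavy_signal, heavy_spiked_amount):
    """Estimate the amount of the natural (light) analyte in the sample.

    light_signal, heavy_signal: MS signals (e.g., peak areas) of the natural
    and the stable isotope-labelled form of the same peptide or protein.
    heavy_spiked_amount: known amount of labelled standard added to the sample.
    The result has the same unit as heavy_spiked_amount.
    """
    if heavy_signal <= 0:
        raise ValueError("signal of the labelled standard must be positive")
    # Equal-response assumption: amount scales with the light/heavy signal ratio.
    return (light_signal / heavy_signal) * heavy_spiked_amount


# Illustrative numbers only: 50 fmol of labelled standard was spiked in and the
# natural peptide shows twice the peak area of the standard.
estimated_fmol = absolute_amount(2.0e6, 1.0e6, 50.0)
```

In practice several peptides per protein and a calibration curve would typically be used; the single-point ratio above is only the simplest form of the calculation.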

In various other embodiments, transitions for each protein with high and reproducible peak intensities are identified. In other embodiments, the collision energy for each transition is optimized. In other embodiments, mass spectrometry comprises selected reaction monitoring (SRM), or multiple reaction monitoring (MRM). In other embodiments, SRM or MRM is performed on a triple quadrupole mass spectrometer. In other embodiments, the proteins of interest are those with high correlations, strong signals, high signal/noise and/or sequences unique to the protein of interest.

SRM/MRM/SIM

Selected-ion monitoring (SIM) or selected reaction monitoring (SRM) or multiple reaction monitoring (MRM) provide the simplest method set up and the most selective and sensitive quantification. SRM/MRM/SIM is a method used in tandem mass spectrometry in which an ion of a particular mass is selected in the first stage of a tandem mass spectrometer and an ion product of a fragmentation reaction of the precursor ion is selected in the second mass spectrometer stage for detection. Examples of triple quadrupole mass spectrometers (TQMS) that can perform MRM/SRM/SIM include but are not limited to: QTRAP(R) 6500 and 5500 System (Sciex); Triple Quad 6500 System (Sciex); Agilent 6400 Series Triple Quadrupole LC/MS systems; or Thermo Scientific™ TSQ™ Triple Quadrupole system.

Parallel-Reaction Monitoring (PRM)

In addition to MRM, the protein(s) can also be quantified through Parallel-Reaction Monitoring (PRM). Parallel reaction monitoring (PRM) is the application of SRM with parallel detection of all transitions in a single analysis using a high-resolution mass spectrometer. PRM provides high selectivity, high sensitivity and high throughput to quantify selected peptide (Q1), hence quantifying proteins. Again, multiple peptides or proteins can be specifically selected for each protein. PRM methodology uses the quadrupole of a mass spectrometer to isolate a target precursor ion, fragments the targeted precursor ion in the collision cell, and then detects the resulting product ions in the Orbitrap mass analyser. Quantification is carried out after data acquisition by extracting one or more fragment ions with 5-10 ppm mass windows. PRM uses a quadrupole time-of-flight (QTOF) or hybrid quadrupole-orbitrap (QOrbitrap) mass spectrometer to carry out the peptide/protein quantitation. Examples of QTOF include but are not limited to: TripleTOF(R) 6600 or 5600 System (Sciex); X500R QTOF System (Sciex); 6500 Series Accurate-Mass Quadrupole Time-of-Flight (Q-TOF) (Agilent); or Xevo G2-XS QTof Quadrupole Time-of-Flight Mass Spectrometry (Waters). Examples of QOrbitrap include but are not limited to: Q Exactive™ Hybrid Quadrupole-Orbitrap Mass Spectrometer (Thermo Scientific); or Orbitrap Fusion™ Tribrid™ (Thermo Scientific).

Non-limiting advantages of PRM include that it eliminates most interferences, provides more accuracy and attomole-level limits of detection and quantification, enables the confident confirmation of the peptide identity with spectral library matching, reduces assay development time since no target transitions need to be preselected, and ensures UHPLC-compatible data acquisition speeds with spectrum multiplexing and advanced signal processing.
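The post-acquisition extraction of fragment ions within a ppm mass window can be sketched as follows. This is an illustration under stated assumptions rather than the procedure of any particular software: the window is interpreted here as a symmetric tolerance of plus or minus the given ppm value around the target m/z, the spectrum is represented as a plain list of (m/z, intensity) pairs, and all names and numbers are hypothetical.

```python
def ppm_window(target_mz, tol_ppm):
    """Lower and upper m/z bounds of a symmetric +/- tol_ppm window."""
    delta = target_mz * tol_ppm / 1e6  # ppm is a relative tolerance
    return target_mz - delta, target_mz + delta


def extract_fragment_intensity(spectrum, target_mz, tol_ppm=10.0):
    """Sum the intensity of all peaks that fall inside the ppm window.

    spectrum: iterable of (mz, intensity) pairs from one MS/MS scan.
    """
    low, high = ppm_window(target_mz, tol_ppm)
    return sum(intensity for mz, intensity in spectrum if low <= mz <= high)


# Illustrative scan: only the first peak lies within 10 ppm of m/z 1000.0,
# because 10 ppm of 1000.0 corresponds to 0.01 m/z units.
scan = [(999.995, 100.0), (1000.02, 50.0), (500.0, 7.0)]
signal = extract_fragment_intensity(scan, 1000.0, tol_ppm=10.0)
```

Because the tolerance is relative, the absolute window width grows with m/z, which is why ppm rather than a fixed m/z width is used on high-resolution instruments.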

In various embodiments, stable isotope-labelled peptide or protein standards for absolute quantification are used. In other embodiments, the peptide or protein labelled with a stable isotope is used as an internal standard to obtain absolute quantification of the protein of interest. In other embodiments, the proteins are quantified and then the amount of the parent protein present is inferred before digesting the sample with trypsin. In other embodiments, MS responses are used to determine an upper limit of quantification (ULOQ) and a lower limit of quantification (LLOQ).

In various embodiments, the MS data comprises raw MS data obtained from a mass spectrometer and/or processed MS data in which proteins and their fragments (e.g., transitions and MS peaks) are already identified, analysed and/or quantified. In various embodiments, the MS data is Selected Reaction Monitoring (SRM) data or Parallel-Reaction Monitoring (PRM) data and/or Multiple Reaction Monitoring (MRM) data. In various embodiments, the MS data is Shotgun CID MS data, Original DIA MS Data, MSE MS data, p2CID MS Data, PAcIFIC MS Data, AIF MS Data, XDIA MS Data, SWATH MS data, or FT-ARM MS Data, or a combination thereof.

In various embodiments, acquiring MS data comprises operating a TripleTOF mass spectrometer, a triple quadrupole mass spectrometer, a liquid chromatography-mass spectrometry (LC-MS) system, a gas chromatography-mass spectrometry (GC-MS) system, or a tandem mass spectrometry (MS/MS) system, a dual time-of-flight (TOF-TOF) mass spectrometer, or a combination thereof.

In various embodiments, acquiring MS data comprises operating a mass spectrometer. Examples of the mass spectrometer include but are not limited to high-resolution instruments such as TripleTOF, Orbitrap, Fourier transform, and tandem time-of-flight (TOF/TOF) mass spectrometers; and high-sensitivity instruments such as triple quadrupole, ion trap, quadrupole TOF (QTOF), and Q trap mass spectrometers; and their hybrid and/or combination. High-resolution instruments are used to maximize the detection of proteins with minute mass-to-charge ratio (m/z) differences.

Conversely, because targeted proteomics emphasizes sensitivity and throughput, high-sensitivity instruments are used. In some embodiments, the mass spectrometer is a TripleTOF mass spectrometer. In some embodiments, the mass spectrometer is a triple quadrupole mass spectrometer.

In some embodiments, acquiring MS data does not require operating a mass spectrometer. For example, MS data can be acquired from MS experiments run previously and/or MS databases. In some embodiments, previously acquired SWATH MS data can be queried with a more comprehensive library to identify additional MS peaks derived from different and macromolecules.

Detection of a protein in a test sample involves routine methods. The skilled artisan can detect the presence or absence of a protein using well known methods. Such methods include diverse immunoassays. In general, immunoassays involve the binding of antibodies or similar probes to proteins in a sample such as a histological section or binding of proteins in a sample to a solid phase support such as a plastic surface. Detectable antibodies are then added which selectively bind to the protein of interest. Detection of the antibody indicates the presence of the protein. The detectable antibody may be a labelled or an unlabelled antibody. Unlabelled antibody may be detected using a second, labelled antibody that specifically binds to the first antibody or a second, unlabelled antibody which can be detected using labelled protein A, a protein that complexes with antibodies. Various immunoassay procedures are described in Immunoassays for the 80's, A. Voller et al., Eds., University Park, 1981.

Detection of protein expression level is exemplified in the example section as determining the labelling efficiency by LC-MS/MS before pooling of the samples. For the sample clean-up step, a solid phase extraction (SPE strata-X-C, Phenomenex) was performed, and purified samples were dried in a SpeedVac. An aliquot of approximately 10 µg was suspended in LC mobile phase A and 1 µg was injected on the LC-MS/MS system.

Protein quantification by TMT plex reporter ions was calculated using TMT PSM ratios to the entire sample set (all 10 TMT-channels) and normalized to the sample median. The median PSM TMT reporter ratio from peptides unique to a gene symbol was used for quantification. Protein false discovery rates were calculated using the picked-FDR method using gene symbols as protein groups and limited to 1% FDR.
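The quantification steps in the preceding paragraph can be sketched in Python as follows. This is a simplified illustration and not the pipeline used in the example section: it assumes that the "ratio to the entire sample set" is the reporter intensity of a channel divided by the mean intensity over all channels of the same PSM, that all intensities are positive, and that the input PSMs have already been filtered to peptides unique to one gene symbol. The function name and input layout are hypothetical, and FDR control is not covered.

```python
from collections import defaultdict
from statistics import mean, median


def tmt_gene_ratios(psms):
    """Gene-level TMT ratios per channel from PSM reporter intensities.

    psms: list of (gene_symbol, intensities), where intensities holds one
    reporter ion intensity per TMT channel (same channel order for all PSMs).
    """
    # Step 1: ratio of each channel to the mean of all channels in that PSM.
    ratios = [(gene, [x / mean(ch) for x in ch]) for gene, ch in psms]
    n_channels = len(ratios[0][1])

    # Step 2: normalize each channel (sample) to its own median PSM ratio.
    channel_medians = [median(r[c] for _, r in ratios) for c in range(n_channels)]
    normalized = [
        (gene, [r[c] / channel_medians[c] for c in range(n_channels)])
        for gene, r in ratios
    ]

    # Step 3: gene-level value is the median PSM ratio per channel.
    by_gene = defaultdict(list)
    for gene, r in normalized:
        by_gene[gene].append(r)
    return {
        gene: [median(row[c] for row in rows) for c in range(n_channels)]
        for gene, rows in by_gene.items()
    }


# Illustrative two-channel example (a real TMT set would have e.g. 10 channels).
example_psms = [("A", [1.0, 3.0]), ("A", [2.0, 2.0]), ("B", [3.0, 1.0])]
gene_ratios = tmt_gene_ratios(example_psms)
```

Using medians at both the sample and the gene level keeps the estimate robust against individual outlier PSMs, which is one common motivation for this kind of summarization.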

Quantitatively determining the proteomic risk profile by identifying abnormal protein levels

A method for generating a proteomic profile of a patient suspected of suffering from high-risk Acute Myeloid Leukemia (“AML”), and/or for stratifying an AML patient into a high-risk or a low-risk group or into a high risk, an intermediate risk or a low risk group in relation to overall survival, event-free survival and/or complete remission, based on proteomic profiling according to the present invention comprises quantitatively determining the proteomic risk profile of said patient by identifying abnormal protein expression levels of at least 5 proteins selected from the list of proteins with SEQ ID NOs: 1-17500, wherein the protein level is considered abnormal when said protein level is downregulated or upregulated compared to the median expression level of the cohort, wherein downregulated refers to a protein level that is less than 95% of the median expression level of said protein for the intermediate samples of the cohort, upregulated refers to a protein level that is at least 5% above the median expression level of said protein for the intermediate samples of the cohort, proteins that are downregulated are proteins with a SEQ ID NO provided in Table 2, and proteins that are upregulated are proteins with a SEQ ID NO provided in Table 3. Patients with abnormal protein expression levels are determined to belong in the high-risk group, i.e. suffering from high-risk Acute Myeloid Leukemia (“AML”).

In one embodiment, the method comprises identifying abnormal protein levels of the extracted proteins, wherein an abnormal protein level of the at least 5 proteins selected from the list of proteins with a SEQ ID NO as provided in Table 3 is defined as < 95% of the median expression level for those proteins from a cohort of AML patients; and wherein an abnormal protein level of the at least one protein selected from the list of proteins with a SEQ ID NO as provided in Table 2 is defined as >5% above the median expression level for those proteins of the cohort. Patients with abnormal protein expression levels are determined to belong in the high-risk group, i.e., suffering from high-risk Acute Myeloid Leukemia (“AML”).

Protein levels are generally tightly regulated by their expression, modification and degradation, all of which are in turn regulated directly or indirectly by other regulators of the specific pathway. In that sense, proteins that are maladapted in their expression level, i.e., displaying an abnormal expression level, may have a consequence for other proteins and vice versa. Thus, an abnormal expression level of a specific protein may be either upregulated or downregulated relative to the normal protein level of that protein. In that regard, identifying abnormal protein expression levels requires knowledge of the normal level of the specific protein in question.

In one embodiment, identification of an abnormal protein level relates to a level of a protein which is at least 5 % upregulated or at least 5 % downregulated in relation to the average or median level of the protein in question across the cohort.

Thus, in a further embodiment, the protein level is considered abnormal when the level of said protein is at least 105 % or below 95 % in relation to the average or median level of the protein in question across the cohort, such as a protein level of at least 106 %, 110 %, 150 %, 200 %, 300 %, 400 %, 500 % or 1000 %, or such as a protein level below 94 %, 90 %, 80 %, 50 %, 25 %, 10 %, 5 % or 1 %.
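As a minimal sketch of the threshold described above, a measured protein level can be compared with the cohort median as follows. The function name and the `margin` parameter are hypothetical and chosen for illustration only; the 5 % default reflects the embodiment above.

```python
def classify_level(level, cohort_median, margin=0.05):
    """Return 'up', 'down' or 'normal' for one protein relative to the cohort median.

    'up'   : level is at least (1 + margin) times the median (at least 105 % by default)
    'down' : level is below (1 - margin) times the median (below 95 % by default)
    """
    ratio = level / cohort_median
    if ratio >= 1 + margin:
        return "up"
    if ratio < 1 - margin:
        return "down"
    return "normal"
```

A stricter embodiment, e.g. at least 110 % or below 90 %, would correspond to `margin=0.10` in this sketch.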

The median expression level of a protein, in the present disclosure, relates to the median expression level of that protein across the cohort. In the experimental section, the cohort is exemplified by 118 patients and the median expression level of said extracted proteins is determined for the intermediate samples of that cohort. Thus, the median expression level of the extractable proteins with SEQ ID NOs: 1-17500 is currently determined as shown in the experimental part.

In the current context the term “protein level” is used interchangeably with the term “protein expression level”.

Proteomic profiling

A set of 17500 protein markers is provided herein whose expression level may be correlated to the prognosis of acute myeloid leukemia using cluster analysis. A series of subsets of these markers, identified as useful for disease prognosis, are listed in Tables 4, 5, 6, 7, 8, 9, 10 and 11. The markers with SEQ ID NOs provided in Tables 6, 7 and 8 were found suitable for retrospective diagnosis and markers with SEQ ID NOs provided in Tables 9, 10 and 11 were found suitable for prospective diagnosis. Thus, the invention also relates to a method for clustering these markers into sub-groups, which may in turn be used to cluster patients, making it possible to stratify patients based on the predicted disease prognosis. Proteomic profiling relates to obtaining a profile describing the level of individual proteins in a sample obtained from a patient. In relation to the proteomic profiling, a protein sample may be obtained from a tissue, blood, saliva, mucus and/or bone marrow sample, as described herein. In one embodiment, a proteomic profile relates to the individual protein level of every protein obtained in a sample. In another embodiment, a proteomic profile relates to a selected subset of proteins obtained from a sample.

The present disclosure describes a set of 17500 protein markers that can be used to distinguish between patients with a good AML prognosis (treatment regimen, event-free survival, overall survival or reaching complete remission) and patients with a poor AML prognosis (disease relapse, non-responders to treatment etc.). These markers are listed with their gene symbol in Table 1 and with their amino acid sequence in the sequence listing. Subsets are provided of at least 5, 10, 20, 30, 40, 50, 75, 100, 150, 200, 500, 1000, 2000 or 10000 markers, drawn from the set of 17500, which also distinguish between patients with good and poor prognosis.

The 17500 protein markers of the present invention relate to a total of 2519 genes/coding regions; thus some genes are translated into more than one protein marker. As a consequence, some of the 17500 protein markers listed in Table 1 are encompassed in the sequence or sequences of other protein markers which relate to the same gene.

In that regard, in one embodiment of the present disclosure a proteomic profile relates to the level of the 17500 protein markers, described in the present disclosure.

In another embodiment, a proteomic profile relates to a selected subset of proteins obtained from a sample, wherein the subset is selected amongst the 17500 protein markers disclosed herein.

In a further embodiment, the subset is selected amongst the 17500 protein markers disclosed herein, wherein the subset comprises at least 5, 10, 20, 30, 40, 50, 75, 100, 150, 200, 500, 1000 or 2000 of the protein markers.

Signatures defined by mass spectrometry

Several methods are described based on application and exploitation of specific AML signatures and the proteins therein. These include methods for determining these "signatures" from patient samples, thus providing evidence as to whether those patients are "high-risk", "intermediate-risk" or "low-risk". Such methods are based on the signatures defined by mass spectrometry (examples of which are listed herein) and can be carried out using specific sets of antibodies or similar specific probe reagents to define which proteins of the signatures are present in the samples. Methods include immunohistochemistry, ELISA, protein arrays and similar methods known to the skilled person. Use of such methods provides diagnostic, prognostic and monitoring information allowing improved management of AML patients' care and therapy.

The list of proteins

All proteins with SEQ ID NOs: 1-17500 are listed in the sequence listing, wherein they are additionally provided with their respective Gene Symbols. The full protein sequences for each of the identified proteins are known in the art. Detailed information on each of these proteins, including genomic information and a listing of sources for nucleotide and protein sequences, is provided in the sequence listing and, for specific proteins, in the below tables in the form of a list of Gene Symbols. For each gene symbol there are the associated ENSGs (Ensembl Gene IDs), the associated ENSPs (Ensembl protein IDs) and a description. In some cases it is a 1:1 ratio, i.e.: GENESYMBOL_A : ENSG_A : ENSP_A. For others, it is not a 1:1 relationship, as several entries in the database (ENSGs/ENSPs) map to the same Gene Symbol: GENESYMBOL_B : ENSG_B1, ENSG_B2, ... : ENSP_B1, ENSP_B2.1, ENSP_B2.2. All gene symbols and their correlating amino acid sequences are given in the sequence listing provided with the application. The full sequences can be identified by accessing any of these accession numbers in the https://www.ensembl.org search query. As these proteins are well known to the skilled artisan, the signatures of proteins and binding reagents described herein include variations and modifications to the sequences as well.
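The relationship between Gene Symbols and Ensembl identifiers described above can be pictured with a simple lookup structure. The following is a sketch only; the identifiers are the placeholders used in the text, not real Ensembl accession numbers, and the helper name is hypothetical.

```python
# Placeholder annotation table: Gene Symbol -> associated Ensembl gene and protein IDs.
annotations = {
    "GENESYMBOL_A": {"ENSG": ["ENSG_A"], "ENSP": ["ENSP_A"]},
    "GENESYMBOL_B": {
        "ENSG": ["ENSG_B1", "ENSG_B2"],
        "ENSP": ["ENSP_B1", "ENSP_B2.1", "ENSP_B2.2"],
    },
}

def is_one_to_one(entry):
    """True when a Gene Symbol maps to exactly one ENSG and one ENSP."""
    return len(entry["ENSG"]) == 1 and len(entry["ENSP"]) == 1

# Gene Symbols with several database entries, i.e. not a 1:1 relationship.
multi_mapped = [symbol for symbol, entry in annotations.items() if not is_one_to_one(entry)]
```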

ENSEMBL references and gene names are provided for each SEQ ID NO in the sequence listing.

In total the present disclosure relates to a set of 2519 individual marker proteins, wherein each marker protein is also described by its Gene Symbol as provided in Table 1. The Gene Symbol in some cases covers a number of different sequence variants; thus the 2519 unique marker proteins of Table 1 result in a total of 17500 individual protein sequences identified as SEQ ID NOs: 1-17500, which are identified in the sequence listing provided with the application. Subsets of the marker proteins were generated, and these may be used for the stratification of AML patients into groups.

Upregulated protein markers are provided with SEQ ID NOs in Table 2 and downregulated protein markers are provided with SEQ ID NOs in Table 3.

Preferred sets of markers are provided in Tables 4-5.

Table 1 - Gene Symbols of selected 2519 markers

Table 2 - upregulated proteins

Table 3 - downregulated proteins

Table 4 - medium list

Table 5 - small list

Table 15 - Gene symbols of applied markers from large list (1028 markers)

Table 16 - Gene symbols of the medium list (537 markers)

Table 17 - Gene symbols of applied markers from medium list (332 markers)

Table 18 - Gene symbols of short list (249 markers)

Table 19 - Gene symbols of applied markers from short list (178 markers)

In a specific embodiment, the set of markers consists of spliceosomal proteins. The invention provides a method of using the above markers to distinguish between patients with good or poor prognosis, i.e., between patients belonging to groups classified as AML high-risk or low-risk.

The sets of markers listed in Tables 4-5 partially overlap; in other words, some markers are present in multiple sets, while other markers are unique to a set.

Sets of protein markers:

In the clinic, the selected protein markers may differ from patient to patient, as not all protein markers are identified in all patients. As exemplified in Example 2, protein markers used in a prospective diagnosis may be any protein markers that are identified from one or more samples from one or more patients and that are also provided herein, i.e., in Tables 1, 4 and/or 5, or whose sequence is any of SEQ ID NOs: 1-17500. As described in Example 1, in the data analysis protein groups were defined based on gene symbols to obtain a gene symbol centric quantification, i.e., the SEQ ID NOs: 1-17500 provided in the sequence listing can be defined in groups based on the gene symbols provided in Table 1. Thus, in a sample from a patient, it is common that only a subset of the Gene Symbols listed in Table 1 is identified, and that subset is used for the classification. Due to the difference in protein expression between individual patients, specifying a minimum number of protein markers from the list that should be present is preferred. For instance, in a non-limiting example, a sample could contain 2005 of the 2519 protein markers identified by their Gene Symbols listed in Table 1. Of the 2005 protein markers, only 1038 are also correctly upregulated or downregulated, as they share their sequence with the sequences of the SEQ ID NOs provided in Tables 2 and 3; thus the classification is done on the basis of the 1038 protein markers out of the total 2519 protein markers.
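The selection of usable markers in a patient sample, as in the non-limiting example above, can be sketched as a comparison between the observed direction of each identified marker and the direction expected from Tables 2 and 3. The function and variable names below are hypothetical, and the toy data merely stand in for the much larger marker lists.

```python
def usable_markers(observed, expected):
    """Select markers usable for classification in one sample.

    observed: gene_symbol -> 'up', 'down' or 'normal', as measured in the sample
    expected: gene_symbol -> 'up' (Table 2) or 'down' (Table 3)
    Returns (identified, concordant): markers from the list found in the sample,
    and the subset whose observed direction matches the expected direction.
    """
    identified = {gene for gene in observed if gene in expected}
    concordant = {gene for gene in identified if observed[gene] == expected[gene]}
    return identified, concordant

# Toy illustration with three listed markers, two of which are identified in the sample.
expected = {"G1": "up", "G2": "down", "G3": "up"}
observed = {"G1": "up", "G2": "up", "G4": "down"}
identified, concordant = usable_markers(observed, expected)
```

In the example of the text, `identified` would hold 2005 markers and `concordant` 1038 markers, the latter forming the basis for the classification.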

In example 1 of the present disclosure, at least about 7 % of the markers (178 markers / 2519 markers) of table 1 are used to stratify patients into a long-term high-risk or low-risk group. In figure 2A-C, it can also be seen that the predictive effect of the full marker list, the medium marker list or the small marker list is also substantial within about 1-2.5 years, thereby showing that the prediction can be done with different subsets of the full marker set, from at least as low as 7 % of the total markers in table 1.

In embodiments, the method for stratifying an AML patient comprises determining an abnormal expression of at least 5 %, such as at least 7 %, 10 %, 15 %, 20 %, 25 %, 30 %, 35 %, 40 %, 45 %, 50 %, 55 %, 60 %, 65 %, 70 %, 75 %, 80 %, 85 %, 90 % or such as at least 95 % of the markers provided in table 1. The expression of a marker is considered abnormal when it has an expression level which is at least 5 % higher or at least 5 % lower than the median expression level of said marker in a reference cohort. In the present disclosure, markers presented in table 1 were shown to have an abnormal expression in the AML patient cohort discussed in example 1. Markers which were found to be upregulated are presented in table 2 and markers which were shown to be downregulated are presented in table 3.
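For orientation, the share of the 2519 markers of Table 1 represented by each applied marker list can be computed from the marker counts stated in the captions of Tables 15-19. This is plain arithmetic on those counts; the variable names are illustrative.

```python
TOTAL_MARKERS = 2519  # Gene Symbols in Table 1

# Marker counts as stated in the captions of Tables 15-19.
list_sizes = {
    "Table 15": 1028,
    "Table 16": 537,
    "Table 17": 332,
    "Table 18": 249,
    "Table 19": 178,
}

fractions = {name: count / TOTAL_MARKERS for name, count in list_sizes.items()}

for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%} of the markers in Table 1")
```

The smallest list, with 178 markers, corresponds to the "about 7 %" referred to above.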

In embodiments, the method for stratifying an AML patient comprises determining an abnormal expression of about 30 % to about 80 % of the markers provided in table 15. In embodiments, the method for stratifying an AML patient comprises determining an abnormal expression of about 70 % to about 95 % of the markers provided in table 15. In embodiments, the method for stratifying an AML patient comprises using the markers provided in any of tables 1-5 or any of tables 15-19 in the stratification of the patient. In embodiments, the method for stratifying an AML patient comprises using at least 5, 10, 25, 35, 50 or at least 100 of the markers provided in tables 1, 15-19, in the stratification of the patient. In embodiments, the method for stratifying an AML patient comprises using at least 90 % of the markers provided in table 1 in the stratification of the patient. In embodiments, the method for stratifying an AML patient comprises using at least about 90 % of the markers provided in table 15 in the stratification of the patient. In embodiments, the method for stratifying an AML patient comprises using at least 20 %, such as at least 30 %, 35 %, 40 %, 45 %, 50 %, 55 %, 60 %, 65 %, 70 %, 75 %, 80 %, 85 %, 90 %, 95 % or such as all of the markers provided in table 16 in the stratification of the patient. In embodiments, the method for stratifying an AML patient comprises using at least about 90 % of the markers provided in table 16 in the stratification of the patient. In embodiments, the method for stratifying an AML patient comprises using at least about 90 % of the markers provided in table 17 in the stratification of the patient. In embodiments, the method for stratifying an AML patient comprises using at least about 90 % of the markers provided in table 18 in the stratification of the patient.
In embodiments, the method for stratifying an AML patient comprises using at least 20 %, such as at least 30 %, 35 %, 40%, 45 %, 50 %, 55 %, 60 %, 65 %, 70 %, 75 %, 80 %, 85 %, 90 %, 95 % or such as all of the markers provided in table 19 in the stratification of the patient. In embodiments, the method for stratifying an AML patient comprises using at least about 90 % of the markers provided in table 19 in the stratification of the patient.

In embodiments, the method for stratifying an AML patient comprises using the markers provided in table 1 in the stratification of the patient. In embodiments, the method for stratifying an AML patient comprises using the markers provided in table 4 in the stratification of the patient. In embodiments, the method for stratifying an AML patient comprises using the markers provided in table 5 in the stratification of the patient. In embodiments, the method for stratifying an AML patient comprises using the markers provided in table 15 in the stratification of the patient. In embodiments, the method for stratifying an AML patient comprises using the markers provided in table 16 in the stratification of the patient. In embodiments, the method for stratifying an AML patient comprises using the markers provided in table 17 in the stratification of the patient. In embodiments, the method for stratifying an AML patient comprises using the markers provided in table 18 in the stratification of the patient. In embodiments, the method for stratifying an AML patient comprises using the markers provided in table 19 in the stratification of the patient.

In that regard, all proteins that are found to be usable for quantitatively determining the proteomic risk profile of an AML patient are protein markers with a sequence listed in the sequence listing, i.e., SEQ ID NOs: 1-17500. By identifying abnormal protein levels of at least 5 proteins selected from the list of proteins with SEQ ID NOs: 1-17500 as listed in the sequence listing, patients can be stratified to belong to high-risk or low-risk groups or to high-risk, intermediate-risk or low-risk groups. Tables 4-5 list presently preferred protein markers, the determination of abnormal levels of which allows stratifying patients to belong to high-risk or low-risk groups. Tables 15-19 list the gene symbols of the presently preferred protein markers, the determination of abnormal levels of which allows stratifying patients to belong to high-risk or low-risk groups.

In the list of proteins with SEQ ID NOs: 1-17500, protein expression levels are considered abnormal when the level of any one of the proteins with a SEQ ID NO as provided in Table 3 is downregulated to <95% of the median expression level for the intermediate samples of the cohort, or any one of the proteins with a SEQ ID NO as provided in Table 2 is upregulated >5% above the median expression level of the cohort.

In a preferred embodiment, the method for stratifying an AML patient comprises determining that at least 10 proteins selected from the list of proteins with SEQ ID NOs: 1-17500 have an abnormal expression level. In a preferred embodiment, the method for stratifying an AML patient comprises determining that at least 50 proteins selected from the list of proteins with SEQ ID NOs: 1-17500 have an abnormal expression level. In a preferred embodiment, the method for stratifying an AML patient comprises determining that at least 100 proteins selected from the list of proteins with SEQ ID NOs: 1-17500 have an abnormal expression level. In a preferred embodiment, the method for stratifying an AML patient comprises determining that at least 1000 proteins selected from the list of proteins with SEQ ID NOs: 1-17500 have an abnormal expression level. In a preferred embodiment, the method for stratifying an AML patient comprises determining that at least 10000 proteins selected from the list of proteins with SEQ ID NOs: 1-17500 have an abnormal expression level. In a preferred embodiment, the method for stratifying an AML patient comprises determining that at least 15000 proteins selected from the list of proteins with SEQ ID NOs: 1-17500 have an abnormal expression level.

In a preferred embodiment, the method for stratifying an AML patient comprises determining that at least 10, such as at least 100, 1000, 10000 or 15000 proteins selected from the list of proteins with SEQ ID NOs: 1-17500 have an abnormal expression level that corresponds to the direction of abnormal expression as provided in Table 2, which specifies upregulated proteins, or as provided in Table 3, which specifies downregulated proteins.

In a preferred embodiment, the method for stratifying an AML patient comprises determining that at least 50 % of the proteins selected from the list of proteins with SEQ ID NO: 1-17500 have an abnormal level. In a preferred embodiment, the method for stratifying an AML patient as belonging to a high-risk group comprises determining that at least 50 % of the proteins selected from the list of proteins with SEQ ID NO: 1-17500 have an abnormal level. In a preferred embodiment, the method for stratifying an AML patient as belonging to a low-risk group comprises determining that less than 50 % of the proteins selected from the list of proteins with SEQ ID NO: 1-17500 have an abnormal level.

In a preferred embodiment, the method for stratifying an AML patient as belonging to an intermediate-risk group comprises determining that more than 25 % but less than 50 % of the proteins selected from the list of proteins with SEQ ID NO: 1-17500 have an abnormal level. In a preferred embodiment, the method for stratifying an AML patient as belonging to a low-risk group comprises determining that less than 25 % of the proteins selected from the list of proteins with SEQ ID NO: 1-17500 have an abnormal level.

In a preferred embodiment, the method for stratifying an AML patient comprises determining that at least 50 % of the proteins selected from the list of protein markers identified by their Gene Symbol as provided in Table 1 have an abnormal level. In a preferred embodiment, the method for stratifying an AML patient as belonging to a high-risk group comprises determining that at least 50 % of the proteins selected from the list of protein markers identified by their Gene Symbol as provided in Table 1 have an abnormal level. In a preferred embodiment, the method for stratifying an AML patient as belonging to a low-risk group comprises determining that less than 50 % of the proteins selected from the list of protein markers identified by their Gene Symbol as provided in Table 1 have an abnormal level.

In a preferred embodiment, the method for stratifying an AML patient as belonging to an intermediate-risk group comprises determining that more than 25 % but less than 50 % of the proteins selected from the list of protein markers identified by their Gene Symbol as provided in Table 1 have an abnormal level.

In a preferred embodiment, the method for stratifying an AML patient as belonging to a low-risk group comprises determining that less than 25 % of the proteins selected from the list of protein markers identified by their Gene Symbol as provided in Table 1 have an abnormal level.
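The percentage-based embodiments above can be summarized in a short sketch that assigns a risk group from the fraction of assessed markers showing an abnormal level. The function name is hypothetical. The text leaves a fraction of exactly 25 % unassigned; the sketch places it in the low-risk group, which is an assumption made only so that every input receives a group.

```python
def risk_group(n_abnormal, n_markers):
    """Assign a risk group from the fraction of markers with an abnormal level.

    high         : at least 50 % abnormal
    intermediate : more than 25 % but less than 50 % abnormal
    low          : less than 25 % abnormal (exactly 25 % is placed here by assumption)
    """
    fraction = n_abnormal / n_markers
    if fraction >= 0.50:
        return "high"
    if fraction > 0.25:
        return "intermediate"
    return "low"
```

For the two-group embodiments, the intermediate branch would simply be merged into the low-risk group.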

The herein presented method is a tool for stratifying an AML patient into a high-risk or a low-risk group, wherein the proteomic risk profile of said patient is determined by identifying abnormal protein levels of at least 5 proteins selected from the list of proteins with SEQ ID NOs as provided in Table 4, wherein the protein level is considered abnormal when said protein level is downregulated or upregulated compared to the median expression level of the cohort, wherein downregulated refers to a protein level that is less than 95% of the median expression level of said protein for the intermediate samples of the cohort, upregulated refers to a protein level that is at least 5% above the median expression level of said protein for the intermediate samples of the cohort, proteins that are upregulated are proteins with a SEQ ID NO provided in Table 2, and proteins that are downregulated are proteins with a SEQ ID NO provided in Table 3.

The herein presented method is likewise a tool for stratifying an AML patient into a high-risk or a low-risk group, wherein the proteomic risk profile of said patient is determined by identifying abnormal protein levels of at least 5 proteins selected from the list of proteins with SEQ ID NOs as provided in Table 5, wherein the protein level is considered abnormal when said protein level is downregulated or upregulated compared to the median expression level of the cohort, wherein downregulated refers to a protein level that is less than 95% of the median expression level of said protein for the intermediate samples of the cohort, upregulated refers to a protein level that is at least 5% above the median expression level of said protein for the intermediate samples of the cohort, proteins that are upregulated are proteins with a SEQ ID NO provided in Table 2, and proteins that are downregulated are proteins with a SEQ ID NO provided in Table 3.

Clustering patients

As shown in Example 1, AML patients were stratified into three groups based on clustering of spliceosomal proteins (Fig 2, 3 and 6A), and it was investigated whether there is an association between outcome and spliceosome proteins. It was found that higher spliceosome levels were significantly associated with shorter overall survival (Fig 2, 3 and 6B) and that lower levels were associated with a higher rate of complete remission.

In conclusion, it is possible to cluster proteins of SEQ ID NO: 1-17500 into sub-groups, wherein one or more of the sub-groups is/are used to stratify patients into at least a high-risk and a low-risk group. One such sub-group is provided in Table 4, another such sub-group is provided in Table 5.

Previous attempts at stratifying AML patients according to the expression level of specific markers have, for example, been shown in Nicolas et al. (2011), where especially the expression level of the marker S100A8 was shown to have a significant predictive ability for categorizing a patient cohort (n=17) into a short-term (up to 100 days) high-risk or low-risk group. Example 3 shows that the predictive power of S100A8 was not found significant for the cohort of example 1, neither in the short term (<200 days) nor in the long term (>800 days).

Thus, the present invention also relates to a method for stratifying an AML patient into a high-risk or a low-risk group, wherein the proteomic risk profile of said patient is determined by identifying abnormal protein levels of at least 5 proteins selected from the list of proteins with SEQ ID NOs as provided in Table 4, wherein the protein level is considered abnormal when said protein level is downregulated or upregulated compared to the median expression level of the cohort, wherein downregulated refers to a protein level that is less than 95% of the median expression level of said protein for the intermediate samples of the cohort, upregulated refers to a protein level that is at least 5% above the median expression level of said protein for the intermediate samples of the cohort, proteins that are upregulated are proteins with a SEQ ID NO provided in Table 2, and proteins that are downregulated are proteins with a SEQ ID NO provided in Table 3.

The present invention also relates to a method for stratifying an AML patient into a high-risk or a low-risk group, wherein the proteomic risk profile of said patient is determined by identifying abnormal protein levels of at least 5 proteins selected from the list of proteins with SEQ ID NOs as provided in Table 5, wherein the protein level is considered abnormal when said protein level is downregulated or upregulated compared to the median expression level of the cohort, wherein downregulated refers to a protein level that is less than 95% of the median expression level of said protein for the intermediate samples of the cohort, upregulated refers to a protein level that is at least 5% above the median expression level of said protein for the intermediate samples of the cohort, proteins that are upregulated are proteins with a SEQ ID NO provided in Table 2, and proteins that are downregulated are proteins with a SEQ ID NO provided in Table 3.

Furthermore, the present invention relates to a method for stratifying an AML patient into a high-risk, intermediate-risk or a low-risk group, wherein the proteomic risk profile of said patient is determined by identifying abnormal protein levels of at least 5 proteins selected from the list of proteins with SEQ ID NOs as provided in Table 4, wherein the protein level is considered abnormal when said protein level is downregulated or upregulated compared to the median expression level of the cohort, wherein downregulated refers to a protein level that is less than 95% of the median expression level of said protein for the intermediate samples of the cohort, upregulated refers to a protein level that is at least 5% above the median expression level of said protein for the intermediate samples of the cohort, proteins that are upregulated are proteins with a SEQ ID NO provided in Table 2, and proteins that are downregulated are proteins with a SEQ ID NO provided in Table 3.

Additionally, the present invention relates to a method for stratifying an AML patient into a high-risk, intermediate-risk or a low-risk group, wherein the proteomic risk profile of said patient is determined by identifying abnormal protein levels of at least 5 proteins selected from the list of proteins with SEQ ID NOs as provided in Table 5, wherein the protein level is considered abnormal when said protein level is downregulated or upregulated compared to the median expression level of the cohort, wherein downregulated refers to a protein level that is less than 95% of the median expression level of said protein for the intermediate samples of the cohort, upregulated refers to a protein level that is at least 5% above the median expression level of said protein for the intermediate samples of the cohort, proteins that are upregulated are proteins with a SEQ ID NO provided in Table 2, and proteins that are downregulated are proteins with a SEQ ID NO provided in Table 3.

Algorithms

Clustering algorithm(s) used in the presented method is/are one or more algorithms selected from the group consisting of Random Forest, k-Top Scoring Pairs, k-Nearest Neighbor, Support Vector Machines, oPLS-DA, PLS-DA, t-SNE, UMAP, PCA, lasso, Decision Trees, Naive Bayes and Logistic Regression.

In one or more embodiments, the present disclosure relates to a method for clustering a cohort of AML patients into risk groups, wherein the clustering method comprises the use of one or more of the algorithms selected from the group consisting of Random Forest, k-Top Scoring Pairs, k-Nearest Neighbor, Support Vector Machines, oPLS-DA, PLS-DA, t-SNE, UMAP, PCA, lasso, Decision Trees, Naive Bayes and Logistic Regression.

In one or more embodiments, the present disclosure relates to a method for clustering a cohort of AML patients into risk groups, wherein the clustering method comprises the use of the k-Top Scoring Pairs, Support Vector Machines and/or Random Forest, and wherein the data is obtained using DIA-MS, DDA-MS, PRM-MS and/or ELISA.

In one or more embodiments, the present disclosure relates to a method for clustering a cohort of AML patients into risk groups, wherein the clustering method comprises the use of Random Forest, k-Top Scoring Pairs, k-Nearest Neighbor and/or Support Vector Machines.

In one or more embodiments, the present invention relates to a method for generating a proteomic profile of a patient suspected of suffering from high-risk Acute Myeloid Leukemia (“AML”), and/or for stratifying an AML patient into a high-risk or a low-risk group or into a high-risk, intermediate risk and low risk group in relation to overall survival and/or complete remission, based on proteomic profiling according to the present invention, wherein the stratification method comprises the use of one or more of the algorithms selected from this group consisting of Random Forest, k-Top Scoring Pairs, k-Nearest Neighbor, Support Vector Machines, oPLS-DA, PLS-DA, t-SNE, UMAP, PCA, lasso, Decision Trees, Naive Bayes and Logistic Regression.

In one or more embodiments, the present invention relates to a method for generating a proteomic profile of a patient suspected of suffering from high-risk Acute Myeloid Leukemia (“AML”), and/or for stratifying an AML patient into a high-risk or a low-risk group in relation to overall survival and/or complete remission, based on proteomic profiling according to the present invention, wherein the stratification method comprises the use of Random Forest, k-Top Scoring Pairs, k-Nearest Neighbor and/or Support Vector Machines.
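To illustrate one of the listed algorithms, a minimal k-Nearest Neighbor sketch is given below. It assigns a new patient to the risk group most common among the k closest reference patients, using Euclidean distance between protein-level vectors. The distance measure, the value of k, the function name and the toy data are assumptions for illustration; the present disclosure does not prescribe them, and in practice an established library implementation would typically be used.

```python
from collections import Counter
from math import dist

def knn_predict(train, query, k=3):
    """Predict a risk group by majority vote among the k nearest reference patients.

    train: list of (protein_level_vector, risk_group) for reference patients
    query: protein_level_vector of the patient to be stratified
    """
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy reference cohort with two marker levels per patient (relative to cohort median).
reference = [
    ((0.0, 0.0), "low"),
    ((0.1, 0.2), "low"),
    ((1.0, 1.0), "high"),
    ((0.9, 1.1), "high"),
    ((1.2, 0.8), "high"),
]
predicted = knn_predict(reference, (1.0, 0.9), k=3)
```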

Proteogenomic method

In one embodiment, a method for generating a proteomic profile of a patient suspected of suffering from high-risk Acute Myeloid Leukemia (“AML”), and/or for stratifying an AML patient into a high-risk or a low-risk group in relation to overall survival and/or complete remission, based on proteomic profiling according to the present invention, in addition comprises processing the patient’s sample, extracting the patient’s DNA and analyzing the extracted DNA.

Thus, the present invention also relates to a method for stratifying an AML patient into a high-risk or a low-risk group as described herein, wherein the method further comprises extracting chromosomal DNA from said sample, analyzing the extracted DNA, determining genetic risk factors based on genetic variants in one or more genes, and providing a patient stratification into groups based on their clinical and molecular profile.

Said sample can, e.g., be obtained from bone marrow, peripheral blood, isolated cells from blood, tumour and normal tissue (mouth swab, skin biopsy, etc.), often required for genetic analysis (WGS/WES, for example for ELN2017).

Extracting DNA

Extracting DNA is a routine procedure in molecular biology. For the chemical method, there are many different commercially available kits used for extraction.

Analyzing the extracted DNA

Quantitation of nucleic acids is commonly performed to determine the average concentrations of DNA or RNA present in a mixture, as well as their purity. To date, there are two main approaches used by the person skilled in the art to quantitate, or establish the concentration of, nucleic acids (such as DNA or RNA) in a solution. These are spectrophotometric quantification and UV fluorescence tagging in the presence of a DNA dye.

The extracted DNA can be analysed using DNA hybridization and sequencing, and/or quantitative mass spectrometry analysis and/or ELISA or similar methodologies known to the skilled artisan.
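As a minimal sketch of spectrophotometric quantification, the widely used conversion that an absorbance of 1.0 at 260 nm corresponds to approximately 50 ng/µL of double-stranded DNA (1 cm path length) can be applied, and the A260/A280 ratio can be used as a rough purity indicator (a value around 1.8 is conventionally taken to indicate reasonably pure DNA). The function names and the absorbance readings below are illustrative assumptions only.

```python
def dsdna_concentration_ng_per_ul(a260, dilution_factor=1.0, path_length_cm=1.0):
    """Estimate dsDNA concentration from absorbance at 260 nm."""
    # Conventional factor: A260 of 1.0 ~ 50 ng/uL dsDNA at a 1 cm path length
    return a260 / path_length_cm * 50.0 * dilution_factor


def purity_ratio(a260, a280):
    """A260/A280 ratio; values near 1.8 are conventionally read as pure DNA."""
    return a260 / a280


# Hypothetical readings for a sample diluted 1:10 before measurement
conc = dsdna_concentration_ng_per_ul(0.25, dilution_factor=10)
ratio = purity_ratio(0.25, 0.135)
print(f"{conc:.1f} ng/uL, A260/A280 = {ratio:.2f}")
```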

The extracted DNA can e.g., be analysed by Short tandem repeat (STR) analysis which builds upon restriction fragment length polymorphism (RFLP) and Amplified fragment-length polymorphism (AmpFLP) used in the past by shrinking the size of the repeat units, to 2 to 6 base pairs, and by combining multiple different loci into one PCR reaction. Alternatively, next-generation sequencing, such as, but not limited to massively parallel sequencing (MPS) can be employed.

For identification of e.g., single nucleotide polymorphisms or other DNA sequence variations, Dynamic allele-specific hybridization (DASH), molecular beacons or microarrays, e.g., high-density oligonucleotide arrays, can be employed.

Genetic standards

In one embodiment, a method for generating a proteomic profile of a patient suspected of suffering from high-risk Acute Myeloid Leukemia (“AML”), and/or for stratifying an AML patient into a high-risk or a low-risk group in relation to overall survival and/or complete remission, based on proteomic-genomic profiling according to the present invention, comprises extracting chromosomal DNA from said patient, analyzing the extracted DNA, determining genetic risk factors based on genetic variants in one or more genes listed in a genetic standard selected from the group consisting of ELN2017 and NCCN Guidelines, and providing a patient stratification into groups based on their overall, complete, combined, clinical and molecular profile.

ELN Guidelines

The ELN guidelines for stratification of AML patients are updated regularly and comprise a number of mutated or otherwise abnormal genes. In general, AML patients are stratified into three groups, Favourable, Intermediate and Adverse, thus aiming at predicting the expected clinical outcome for a patient.

The ELN2017 guidelines to stratify patients into different risk groups comprise the following abnormal genes:

Favourable:

t(8;21)(q22;q22.1); RUNX1-RUNX1T1

inv(16)(p13.1;q22) or t(16;16)(p13.1;q22); CBFB-MYH11

Mutated NPM1 without FLT3-ITD or with FLT3-ITD low

Biallelic mutated CEBPA

Intermediate:

Mutated NPM1 and FLT3-ITD high

Wild-type NPM1 without FLT3-ITD or with FLT3-ITD low (without adverse-risk genetic lesions)

t(9;11)(p21.3;q23.3); MLLT3-KMT2A

Cytogenetic abnormalities not classified as favourable or adverse

Adverse:

t(6;9)(p23;q34.1); DEK-NUP214

t(v;11q23.3); KMT2A rearranged

t(9;22)(q34.1;q11.2); BCR-ABL1

inv(3)(q21.3;q26.2) or t(3;3)(q21.3;q26.2); GATA2, MECOM(EVI1)

-5 or del(5q); -7; -17/abn(17p)

Complex karyotype, monosomal karyotype

Wild-type NPM1 and FLT3-ITD high

Mutated RUNX1

Mutated ASXL1

Mutated TP53
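The NPM1/FLT3-ITD entries of the ELN2017 list above can be expressed as a simple lookup. The following is only a sketch of those four entries; it deliberately ignores all other cytogenetic and molecular findings (including the "without adverse-risk genetic lesions" condition), and the function name and argument encoding are assumptions made for illustration.

```python
def npm1_flt3_risk(npm1_mutated, flt3_itd):
    """Map NPM1 status and FLT3-ITD allelic burden to the ELN2017 group listed above.

    flt3_itd is one of "none", "low" or "high" (hypothetical encoding).
    Other abnormalities are NOT considered in this sketch.
    """
    if flt3_itd not in ("none", "low", "high"):
        raise ValueError("flt3_itd must be 'none', 'low' or 'high'")
    if npm1_mutated:
        # Mutated NPM1: favourable unless FLT3-ITD is high
        return "Intermediate" if flt3_itd == "high" else "Favourable"
    # Wild-type NPM1: adverse if FLT3-ITD is high, otherwise intermediate
    return "Adverse" if flt3_itd == "high" else "Intermediate"


for npm1 in (True, False):
    for itd in ("none", "low", "high"):
        print(npm1, itd, npm1_flt3_risk(npm1, itd))
```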

NCCN Guidelines

The NCCN guidelines for stratification of AML patients are updated regularly and comprise a number of mutated or otherwise abnormal genes. In general, AML patients are stratified into three groups, Favourable, Intermediate, and Poor or Adverse, thus aiming at predicting the expected clinical outcome for a patient.

The NCCN 2020 guidelines to stratify patients into different risk groups comprise the following abnormal genes:

Favourable:

t(8;21)(q22;q22.1); RUNX1-RUNX1T1

inv(16)(p13.1q22) or t(16;16)(p13.1q22); CBFB-MYH11

Biallelic mutated CEBPA

Mutated NPM1 without FLT3-ITD

Intermediate:

Mutated NPM1 and FLT3-ITD high

Wild-type NPM1 without FLT3-ITD or with FLT3-ITD low (without adverse-risk genetic lesions)

t(9;11)(p21.3;q23.3); MLLT3-KMT2A

Cytogenetic abnormalities not classified as favourable or adverse

Poor or Adverse:

t(6;9)(p23;q34.1); DEK-NUP214

t(v;11q23.3); KMT2A rearranged

t(9;22)(q34.1;q11.2); BCR-ABL1

inv(3)(q21.3q26.2) or t(3;3)(q21.3;q26.2); GATA2, MECOM(EVI1)

-5 or del(5q); -7; -17/abn(17p)

Complex karyotype, monosomal karyotype

Wild-type NPM1 and FLT3-ITD high

Mutated RUNX1

Mutated ASXL1

Mutated TP53

In embodiments, the invention relates to a method for stratifying an AML patient into a high-risk or a low-risk group in relation to overall survival after 1 year, wherein genetic risk factors are determined based on genetic variants in at least 5 genes listed in a genetic standard, such as, but not limited to, a standard selected from the group consisting of ELN2017 and NCCN Guidelines, and a patient stratification into groups is provided based on their clinical and molecular profile.

A method for generating a proteomic profile of a patient suspected of suffering from high-risk Acute Myeloid Leukemia (“AML”), and/or for stratifying an AML patient into a high-risk or a low-risk group in relation to overall survival and/or complete remission, based on proteomic-genomic profiling according to the present invention, comprises determining genetic risk factors based on genetic variants in at least 10 genes listed in a genetic standard selected from the group consisting of ELN2017 and NCCN Guidelines, and providing a patient stratification into groups based on their clinical and molecular profile.

A method for generating a proteomic profile of a patient suspected of suffering from high-risk Acute Myeloid Leukemia (“AML”), and/or for stratifying an AML patient into a high-risk or a low-risk group in relation to overall survival and/or complete remission, based on proteomic-genomic profiling according to the present invention, comprises determining genetic risk factors based on genetic variants in at least 50 genes listed in a genetic standard selected from the group consisting of ELN2017 and NCCN Guidelines, and providing a patient stratification into groups based on their clinical and molecular profile.

A method for generating a proteomic profile of a patient suspected of suffering from high-risk Acute Myeloid Leukemia (“AML”), and/or for stratifying an AML patient into a high-risk or a low-risk group in relation to overall survival and/or complete remission, based on proteomic-genomic profiling according to the present invention, comprises determining genetic risk factors based on genetic variants in at least 100 genes listed in a genetic standard selected from the group consisting of ELN2017 and NCCN Guidelines, and providing a patient stratification into groups based on their clinical and molecular profile.

Put to practice in different ways

The present invention can be put to practice in different ways. First, decreased and/or increased expression of the proteins may be used for diagnosing high-risk AML. As used herein, the term "diagnosing" refers, without limitation, to a process aimed at determining whether or not a subject is afflicted with high-risk AML. This is also meant to include instances where the presence or a stage of AML is not finally determined but that further diagnostic testing is warranted. In such embodiments, the method is not by itself determinative of the presence or absence of AML, or the stage of AML in the subject but can indicate that further diagnostic testing is needed or would be beneficial. Therefore, the present method may be combined with one or more other diagnostic methods for the final determination of the presence or absence of AML and/or high-risk AML in the subject. Such other diagnostic methods are well known to those skilled in the art.

Mass spectrometry may be used to detect changes in the absolute or relative expression level of a protein or set of proteins (or their peptide products produced by digestion). This is to be taken to include changes in alternative splicing of proteins, their post-translational modifications or changes in the structure of the post-translational modifications themselves (e.g., changes in glycan structures of O- or N-linked glycosylated peptides).

Importantly, the present invention enables early detection or diagnosis of high-risk AML. Early diagnosis of high-risk AML could allow early treatment and, thus, delay progression. Moreover, early diagnosis would enable the subject to take appropriate measures aiming at delaying progression or alleviating symptoms.

In some embodiments, the present method may be used to define a subgroup of AML, i.e., to stratify a patient into an AML subgroup, such as, but not limited to, belonging to a high-risk or low-risk group. High-risk and low-risk groups are in some embodiments defined in relation to overall survival after 1 year, in relation to overall survival, complete remission and/or event-free remission.

Overall survival, overall survival after 1 year

In the current context, patient overall survival is used to describe whether the patient is still alive for a given period of time after diagnosis. It is a method of describing prognosis in certain disease conditions. Overall survival rate can be used as a yardstick for the assessment of standards of therapy. The survival period is usually reckoned from the date of diagnosis or start of treatment. Survival rates are important for prognosis and may be different depending on treatments as well as the overall general health of the patient. In general, it is known in the field to use mean overall survival rates to estimate the patient's prognosis. This is often expressed over standard time periods, like one, five, and ten years. For example, a patient with a higher one-year overall survival has a better prognosis. Sometimes the overall survival is reported as a death rate (%) without specifying the period the % applies to (possibly one year) or the period it is averaged over (possibly 5 years). Disease-specific survival rate refers to "the percentage of people in a study or treatment group who have not died from a specific disease in a defined period of time. The time period usually begins at the time of diagnosis or at the start of treatment and ends at the time of death. Patients who died from causes other than the disease being studied are not counted in this measurement."

In some embodiments of the current context, the herein described stratification method will allow the patient to be stratified into a high-risk or low-risk group in relation to overall AML-specific survival, such as over a standard time period, such as over one, five or ten years. A patient belonging to a high-risk group in relation to overall survival will have a low expectation of overall survival. E.g., a patient belonging to a high-risk group in relation to overall survival after 1, 5 or 10 year(s) will have a low expectation of overall survival after 1, 5 or 10 year(s), respectively.

Median survival, or "median overall survival" is also commonly used to express survival rates. This is the amount of time after which 50% of the patients have died and 50% have survived. In some embodiments of the current context, the herein described stratification method will allow to stratify the patient into a high-risk or low-risk group in relation to median AML specific survival. A patient belonging in a high-risk group in relation to median survival will have a low expectation of survival after the amount of time after which 50% of the patients have died.

In cancer research, various types of survival rates can be relevant, depending on the cancer type and stage. These include the disease-free survival (DFS) (the period after curative treatment [disease eliminated] when no disease can be detected), the progression-free survival (PFS) (the period after treatment when disease [which could not be eliminated] remains stable, that is, does not progress), and the metastasis-free survival (MFS) or distant metastasis-free survival (DMFS) (the period until metastasis is detected). Progression can be categorized as local progression, regional progression, locoregional progression, and metastatic progression. In some embodiments of the current context, the herein described stratification method will further allow to stratify the patient into a high-risk or low-risk group in relation to disease-free survival (DFS), progression-free survival (PFS), metastasis-free survival (MFS) or distant metastasis-free survival (DMFS).

The comparison is usually made through the Kaplan-Meier estimator approach.
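For illustration, the Kaplan-Meier estimator and the median survival described above can be sketched as follows. The follow-up times and event flags are made-up values, and the function names are hypothetical; a validated statistics package would normally be used for real cohort data.

```python
def kaplan_meier(times, events):
    """Return the Kaplan-Meier curve as a list of (time, survival probability).

    events[i] is True if a death was observed at times[i], False if the
    patient was censored (e.g., still alive at last follow-up).
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same time point
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def median_survival(curve):
    """First time at which the estimated survival drops to 50% or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached during follow-up


# Hypothetical follow-up in months; False marks censored observations
times = [3, 5, 5, 8, 12, 14]
events = [True, True, False, True, False, True]
curve = kaplan_meier(times, events)
print(curve)
print("median survival:", median_survival(curve))
```

Curves computed separately for the high-risk and low-risk groups can then be compared, for example with a log-rank test, which is not shown here.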

Complete remission and/or event-free survival

In the current context, the term “Complete remission” is used interchangeably with the term “permanently cured”. The survival at any given time is equal to those that are cured plus those that are not cured, but who have not yet died or, in the case of diseases that feature asymptomatic remissions, have not yet re-developed signs and symptoms of the disease. When all of the noncured people have died or re-developed the disease, only the permanently cured members of the population will remain, and the DFS curve will be perfectly flat. The earliest point in time that the curve goes flat is the point at which all remaining disease-free survivors are declared to be permanently cured. If the curve never goes flat, then the disease is formally considered incurable (with the existing treatments).

Cure rate curves can be determined through an analysis of the data. The analysis allows the statistician to determine the proportion of people that are permanently cured by a given treatment, and also how long after treatment it is necessary to wait before declaring an asymptomatic individual to be cured.

Several cure rate models exist, such as the expectation-maximization algorithm and Markov chain Monte Carlo model. It is possible to use cure rate models to compare the efficacy of different treatments. Generally, the survival curves are adjusted for the effects of normal aging on mortality, especially when diseases of older people are being studied.

AML has been proven to have multiple plateaus, so that what was once hailed as a "cure" results unexpectedly in very late relapses. The goal of treatment for acute myeloid leukemia (AML) is to put the leukemia into complete remission (the bone marrow and blood cell counts return to normal), preferably a complete molecular remission (no signs of leukemia in the bone marrow, even using sensitive lab tests), and to keep it that way. If the patient remains in complete remission for at least 3 years, such as at least 5 years or more, the patient is considered to be cured.

In the current context, the term remission (complete remission) is defined as having no evidence of leukemia after treatment. This means the bone marrow contains fewer than 5% blast cells, the blood cell counts are within normal limits, and there are no signs or symptoms of the disease.

In the present context, the term “event-free remission” is used interchangeably with the term “event-free survival (EFS)”. Event-free survival (EFS) may be the preferred endpoint, where the investigator wants the endpoint to reflect the primary treatment, and not subsequent treatments that are given where the study drug fails, and not subsequent treatments that are given if relapse occurs. Where the study drug fails, subsequent treatments are often not controlled by the investigator. In contrast to the endpoint of EFS, the endpoint of overall survival takes into account second-line treatments that are given where the study drug fails, or where the study drug is unacceptably toxic. In some clinical trials, EFS may be the preferred endpoint, where the cancer in question can be reliably treated by existing drugs. In this situation, use of overall survival as the endpoint would not make much sense, as this particular endpoint would be triggered by so few study subjects. Recent advances for treating childhood acute lymphocytic leukemia (ALL) allow the “vast majority” of patients to achieve complete remission and then to be cured as opposed to in adult oncology.

Event-free survival has been defined in a number of ways in clinical trials for the various leukemias. For AML, the events counted for EFS have been defined as failure to achieve remission, resistant leukemia, relapse, second malignancy, or death of any cause, or alternatively as failure to achieve CR, relapse, or death as a result of any cause. CR means complete remission (recovery of morphologically normal BM and blood counts (i.e., neutrophils ≥1,500/µL and platelets ≥100,000/µL), and no circulating leukemic blasts or evidence of extramedullary leukemia).
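As a sketch of how such a composite endpoint can be computed, the EFS time may be taken as the time to the first of the qualifying events, with patients who experience none of them censored at last follow-up. The event labels, the month-based time unit and the function name below are assumptions for illustration and correspond to only one of the definitions mentioned above.

```python
# Hypothetical labels for the events counted toward EFS in this sketch
EFS_EVENTS = ("induction_failure", "relapse", "death")


def event_free_survival(event_times, last_follow_up):
    """Return (time, event_observed) for one patient.

    event_times maps an event label to the time (in months) at which it
    occurred; events that did not occur are absent or None.
    """
    observed = [
        event_times[name]
        for name in EFS_EVENTS
        if event_times.get(name) is not None
    ]
    if observed:
        return min(observed), True  # first qualifying event ends EFS
    return last_follow_up, False    # censored at last follow-up


print(event_free_survival({"relapse": 14.0, "death": 20.0}, 20.0))
print(event_free_survival({}, 36.0))
```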

Regarding the endpoint of time-to-event, or event-free-survival, this sort of composite endpoint should not be configured too broadly. Where the endpoint of time-to-event is defined, for example as “progression of major symptoms,” and “death,” the value of this particular endpoint would be reduced by using a broader composite that also includes the event “discontinuation of treatment” or “exposure to rescue treatment.”

In some embodiments of the current context, the herein described stratification method will allow the patient to be stratified into a high-risk or low-risk group in relation to complete remission or event-free remission. A patient belonging to a high-risk group in relation to complete remission or event-free remission will have a lower-than-average expectation, compared to a cohort of AML patients, of achieving complete remission or event-free remission.

Sub-categories and combination of risk assessment

The method of the present invention for the first time provides a tool for generating a proteomic profile of a patient suspected of suffering from high-risk Acute Myeloid Leukemia (“AML”), and/or for stratifying an AML patient into a high-risk or a low-risk group, or into a high-risk, intermediate-risk or low-risk group, in relation to overall survival, event-free survival and/or reaching complete remission.

Thus, the patient stratification is used as a basis to determine whether the patient belongs to one of two predefined groups of AML patients, i.e., those with high risk or low risk in relation to a median in a given patient cohort.

Alternatively, the patient stratification is used as a basis to determine whether the patient belongs to one of three predefined groups of AML patients, i.e., those with high risk, intermediate risk or low risk in relation to a median in a given patient cohort. In some embodiments, the stratifications from the proteomic and genomic methods are combined to generate additional groups. In some embodiments, the stratifications from the proteomic and genomic methods are combined to generate the categories adverse-high, adverse-low, intermediate-high, intermediate-low, favorable-high and favorable-low. Thus, in some embodiments, the proteomics-based stratification or the genomics-based stratification is used as a secondary risk stratification that further divides the proteomic stratification and/or genetic stratification into subgroups, such as two subgroups for each group, resulting in six subgroups in total.
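A minimal sketch of the two-step stratification described above is given below: patients are first split into high and low proteomic risk relative to the cohort median, and the result is then combined with the genetic group to form the six subgroups. The per-patient proteomic risk score, the patient identifiers and the function names are hypothetical; the present disclosure does not prescribe this particular score.

```python
from statistics import median


def stratify_by_median(risk_scores):
    """Label each patient 'high' or 'low' relative to the cohort median score."""
    cutoff = median(risk_scores.values())
    return {
        pid: ("high" if score > cutoff else "low")
        for pid, score in risk_scores.items()
    }


def combine(genetic_groups, proteomic_groups):
    """Combine genetic and proteomic groups, e.g. 'intermediate-high'."""
    return {
        pid: f"{genetic_groups[pid]}-{proteomic_groups[pid]}"
        for pid in genetic_groups
    }


# Made-up proteomic risk scores and genetic groups for four patients
risk_scores = {"P1": 0.9, "P2": 0.2, "P3": 0.6, "P4": 0.4}
genetic_groups = {
    "P1": "adverse",
    "P2": "favorable",
    "P3": "intermediate",
    "P4": "intermediate",
}

proteomic_groups = stratify_by_median(risk_scores)
subgroups = combine(genetic_groups, proteomic_groups)
print(subgroups)
```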

In an embodiment, subgroups are employed selectively to divide certain genetic categories. In another embodiment, subgroups are employed selectively to divide certain proteomic categories.

In an embodiment the result of the proteomic stratification is used to subgroup the groups of the genetic stratification. In another embodiment the genetic stratification is used to subgroup the groups of the proteomic stratification. In an embodiment the proteomic stratification of a patient into a high-risk or low-risk group is used to subdivide the intermediate group of a genetic standard, such as ELN and/or NCCN, into Intermediate-high-risk and Intermediate-low-risk.

In another embodiment the sub-groups Intermediate-high-risk and Intermediate-low-risk are combined with the groups of the genetic standard to form three combined groups, comprising Highest combined risk (Adverse-high, Intermediate-high), Intermediate combined risk (Adverse-low, Intermediate-low) and Favorable (Favorable-high, Favorable-low).

Alternatively, the categories can be constructed as: Highest combined risk (Adverse-high, Intermediate-high), Intermediate combined risk (Adverse-low, Favorable-high) and Lowest combined risk (Intermediate-low, Favorable-low).
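The two alternative groupings described above can be written as lookup tables, which makes the difference between them explicit. The dictionary and function names are illustrative only.

```python
# First construction: favorable patients stay in one group
SCHEME_A = {
    "adverse-high": "Highest combined risk",
    "intermediate-high": "Highest combined risk",
    "adverse-low": "Intermediate combined risk",
    "intermediate-low": "Intermediate combined risk",
    "favorable-high": "Favorable",
    "favorable-low": "Favorable",
}

# Alternative construction: favorable-high moves up, intermediate-low moves down
SCHEME_B = {
    "adverse-high": "Highest combined risk",
    "intermediate-high": "Highest combined risk",
    "adverse-low": "Intermediate combined risk",
    "favorable-high": "Intermediate combined risk",
    "intermediate-low": "Lowest combined risk",
    "favorable-low": "Lowest combined risk",
}


def combined_risk(subgroup, scheme):
    """Look up the combined risk group of a genetic-proteomic subgroup."""
    return scheme[subgroup]


for subgroup in SCHEME_A:
    print(subgroup, "|", combined_risk(subgroup, SCHEME_A), "|", combined_risk(subgroup, SCHEME_B))
```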

The proteomic risk stratification comprising high-risk, intermediate-risk and low-risk can also be used as stated above i.e., as a secondary measure to further subdivide patients within the genetic categories providing in total 9 categories when combined with the genetic standard.

In the present disclosure “guidelines” and “standard” are used interchangeably.

Scoring matrix

Alternatively, a scoring matrix can be used where patients are scored 3 for the highest risk, 2 for intermediate and 1 for low risk / favourable in the genetic risk stratification and the proteomics risk stratification respectively, as shown in Table 6.

Table 6 - scoring matrix

Thus, in an embodiment, the proteomic and genetic risk stratification are combined using a scoring matrix, providing the groups Adverse-risk, Intermediate-risk and Low-risk.

In an embodiment, the proteomic and genetic risk stratification are combined using a scoring matrix, as shown in Table 6.
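One possible reading of such a scoring matrix is sketched below. The individual scores (3, 2, 1) follow the description above; however, the way the two scores are combined (here, by summation) and the cut-offs used to assign the final group are placeholders chosen for illustration only and are not taken from Table 6.

```python
GENETIC_SCORE = {"adverse": 3, "intermediate": 2, "favourable": 1}
PROTEOMIC_SCORE = {"high": 3, "intermediate": 2, "low": 1}


def combined_score(genetic, proteomic):
    """Sum of the genetic and proteomic scores (summation is an assumption)."""
    return GENETIC_SCORE[genetic] + PROTEOMIC_SCORE[proteomic]


def group_from_score(score, adverse_min=5, low_max=3):
    """Assign a combined group; adverse_min and low_max are placeholder cut-offs."""
    if score >= adverse_min:
        return "Adverse-risk"
    if score <= low_max:
        return "Low-risk"
    return "Intermediate-risk"


for genetic in GENETIC_SCORE:
    for proteomic in PROTEOMIC_SCORE:
        score = combined_score(genetic, proteomic)
        print(genetic, proteomic, score, group_from_score(score))
```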

Accordingly, the present invention may also be used for stratifying participants for clinical studies. An example here could be the choice between using Oxford Biomedica's gene therapy called ProSavin (OXB101) that is most effective in treating early-stage patients or using their higher dosage drug, OXB102 which is targeted at patients in an advanced disease state.

Overall, complete, combined, clinical and molecular profile

The patient to be stratified with the method described herein is a human either suspected of having AML, or having been diagnosed with AML, such as high-risk AML. Methods for identifying subjects suspected of having AML may include physical examination, subject's family medical history, subject's medical history, biopsy, or a number of imaging technologies such as ultrasonography, computed tomography, magnetic resonance imaging, magnetic resonance spectroscopy, or positron emission tomography. Diagnostic methods for AML and the clinical delineation of AML diagnoses are well known to those of skill in the medical arts. The present invention relates to the application of recently developed proteomic methods for identifying expression levels of AML-related proteins selected to correlate with AML progression and predicting disease outcome useful for diagnosis and prognosis. As is well known in the field, a patient’s response to treatment and survival or remission is always highly dependent on the patient’s overall, complete, combined, clinical and molecular profile. Thus, the presented method can in the clinic be combined with assessing age, gender, pre-treatment, other complications, co-administrations, etc.

Thus, a method for stratifying an AML patient into a high-risk or a low-risk group according to the present invention in one embodiment further comprises stratifying AML patients according to one or more patient characteristics selected from the group consisting of age, gender, pre-treatment, AML-aetiology, ECOG status, other complications and co-administrations.

The proteomics based risk stratification can be combined with genetics based risk assessment (e.g. ELN2017) in several straightforward ways.

Drug screening

Furthermore, the present invention may be utilized in drug screening for identifying drugs suitable for restoring normal translation in a cell. For instance, compound libraries may be screened for modulators of decreased overall translation by applying a candidate compound over a cell culture showing decreased or increased protein expression of any one of the proteins listed in Table 1, 4, or 5, such as in a cell culture obtained from a subject with AML, and analyzing the compound for any change in the level of translation by any standard technique known in the art. Normalized translation by pharmacological means would be expected to prevent, cure, ameliorate or alleviate AML.

Predicting response to apoptosis modulating drugs

The present invention relates to the application of recently developed proteomic methods for identifying expression levels of AML-related proteins selected to correlate with AML progression and predicting disease outcome useful for diagnosis and prognosis. Furthermore, said method can be used for predicting response to apoptosis modulating drugs (including but not limited to Venetoclax, Navitoclax and Triciribine or combinations thereof) in an AML patient, wherein the patient is stratified as respondent when said patient is found to belong to the high-risk group as defined by proteomic risk profiling.

Chemotherapy

Furthermore, the herein disclosed method can be used for predicting response to standard chemotherapy, wherein the patient is stratified as a responder when said patient is found to belong to the low-risk group. As shown in the experimental section, patients belonging to the low-risk group defined by proteomic risk profiling are significantly better responders (a higher proportion reach complete remission, defined as <5% blasts in bone marrow).

Monitoring

In some embodiments, biological samples may be obtained from the subject at various time points before, during, or after treatment. The level of translation in the biological sample is then determined and compared with that in a biological sample obtained from the same subject at a different time point, or with a control level obtained, for example, from a reference sample derived from an individual whose AML state is known and/or who has not been exposed to said treatment. In some embodiments, predetermined reference values obtained from a pool of apparently healthy individuals may be used as control levels in said comparisons.

Accordingly, the present invention may be used not only for diagnostic purposes but also for monitoring AML. As used herein, the term "monitoring" includes monitoring a subject's disease state or progression of AML over time, as well as monitoring any possible remission or relapse of the disease, or response to treatment. Said monitoring may be carried out by continuously assessing the level of translation and/or performing a medical test repeatedly. In some embodiments of the present invention, a subject's disease state is monitored by obtaining samples repeatedly, assaying the samples for decreased overall translation, and comparing assay results with one another and with a reference value to identify any change in the subject's disease state. Any of these aspects of the invention may be used in combination with other diagnostic tests.
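By way of example only, the comparison of serial samples with one another and with a reference value can be sketched as follows. The measured levels, the time point labels, the 10% tolerance and the function name are hypothetical and serve only to illustrate the comparison logic.

```python
def monitor(samples, reference, tolerance=0.10):
    """Compare serial measurements with the previous sample and a reference value.

    samples is a list of (timepoint, level) in chronological order.
    tolerance is a placeholder threshold for the relative deviation
    from the reference that is reported as a deviation.
    """
    report = []
    previous = None
    for timepoint, level in samples:
        change_vs_reference = (level - reference) / reference
        change_vs_previous = (
            None if previous is None else (level - previous) / previous
        )
        report.append({
            "timepoint": timepoint,
            "change_vs_reference": change_vs_reference,
            "change_vs_previous": change_vs_previous,
            "deviates": abs(change_vs_reference) > tolerance,
        })
        previous = level
    return report


# Made-up relative levels at three time points; reference level set to 1.0
samples = [("diagnosis", 0.60), ("post-induction", 0.95), ("follow-up", 0.70)]
report = monitor(samples, reference=1.0)
for row in report:
    print(row)
```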

In some embodiments, any presently known or future preventative and/or treatment regimen may be prescribed and/or administered to subjects diagnosed with AML by the present method. Methods are thus provided for identifying a subject with high-risk or low-risk AML, and then prescribing a preventative or therapeutic regimen to said subject. Thus, some embodiments may be directed to methods of reducing risk of AML or treating AML in a subject. In some further embodiments, such methods may be carried out in the context of a clinical study.

Therapeutic target

In some aspects, the specific AML-associated proteins identified herein may be utilized as a therapeutic target. These proteins can be targeted by specific reagents designed to interfere with their functions and/or expression. For example, many of these proteins have specific receptors, and therapeutic agents can be used to block the interactions of proteins with their receptors or with other AML-associated proteins in order to treat AML. Additionally, some of the proteins are enzymes. Therapeutics may be used to interfere with the enzymatic activities of these proteins. Additionally, the expression of these proteins can be inhibited using inhibitory RNA. A therapeutic agent useful for blocking a protein-receptor or a protein-protein interaction is any type of reagent that binds to one or both of the proteins (receptor or ligand) and blocks the proteins from interacting. The reagent may be a protein, small molecule, nucleic acid or any other type of molecule which binds to and blocks the interaction, such as a receptor antagonist. For example, the reagent may be an antibody, antibody fragment, peptide or peptidomimetic. Integrins as therapeutic reagents are described in, for example, Goodman and Picard, TIPS, 968, p. 1 (2012). Additionally, anti-integrins, such as anti-integrin antibodies, may be used as therapeutic reagents.

A therapeutic agent useful for blocking enzyme function is any reagent that interrupts the interaction or activity of the enzyme with its substrate. For example, the reagent may directly interfere with the interaction. For instance, a structural antagonist of the substrate may compete for binding to the enzyme and block the interaction between the enzyme and substrate. Additionally, the reagent may indirectly interfere with the interaction by causing a conformational change or stability change in the enzyme which results in a loss of the enzyme's ability to bind to the substrate or act on the substrate.

As used herein, the term treat, treated, or treating when used with respect to a disorder refers to a prophylactic treatment which increases the resistance of a subject to development of the disease or, in other words, decreases the likelihood that the subject will develop the disease as well as a treatment after the subject has developed the disease in order to fight the disease, prevent the disease from becoming worse, or slow the progression of the disease compared to in the absence of the therapy.

Kit for use in diagnosing or stratifying an AML patient

The present invention also provides a kit for use in the herein disclosed method for stratifying an AML patient. Said kit may comprise any reagents or test agents necessary for assessing protein translation as disclosed herein. Those skilled in the art can easily determine the reagents to be included depending on the specifics of the embodiment in question and a desired technique for carrying out said assessment. In some embodiments, an appropriate control sample or a threshold value may be comprised in the kit. The kit may also comprise a computer readable medium, comprising computer-executable instructions for performing any of the methods of the present disclosure.

Machine learning

In addition, the present invention also presents a method for determining proteomic marker profiles for stratifying AML patients into at least a high-risk or a low-risk group, based on comparing the proteomic profiles of a cohort of AML patients, wherein the method comprises,

a. isolating blood and/or tissue samples from said patients,

b. processing said sample(s), wherein the processing comprises

i. extracting expressed proteins from said sample,

c. analyzing the extracted proteins,

d. determining the median expression level of said extracted proteins for the intermediate samples of the cohort, and

e. quantitatively determining the proteomic risk profile of said patients by identifying abnormal protein levels of at least 5 proteins selected from the list of proteins with SEQ ID NOs: 1-17500, wherein the protein level is considered abnormal when the level of any one of the proteins with a SEQ ID NO as provided in Table 3 is <95% of the median expression level for the intermediate samples of the cohort, or the level of any one of the proteins with a SEQ ID NO as provided in Table 2 is up-regulated by more than 5% above the median expression level of the cohort,

f. feeding said proteomic profiles into a machine learning algorithm to determine which protein profiles are high-risk or low-risk determinants,

g. validating the outcome of step f) in a sample group of patients with known disease outcome.
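Steps d. and e. above can be sketched as follows: the median level of each protein is computed over a set of reference samples, and a patient's protein is counted as abnormal when a down-regulated marker falls below 95% of the median or an up-regulated marker exceeds 105% of the median. The protein names (D1, U1, etc.), the values and the function names are hypothetical stand-ins for the proteins of Tables 2 and 3, which are not reproduced here.

```python
from statistics import median


def cohort_medians(reference_samples):
    """Median level per protein over the reference samples (list of dicts)."""
    proteins = reference_samples[0].keys()
    return {p: median(s[p] for s in reference_samples) for p in proteins}


def abnormal_proteins(patient, medians, down_markers, up_markers):
    """Return the markers whose level is abnormal relative to the median."""
    abnormal = []
    for p in down_markers:
        if patient[p] < 0.95 * medians[p]:   # below 95% of the median
            abnormal.append(p)
    for p in up_markers:
        if patient[p] > 1.05 * medians[p]:   # more than 5% above the median
            abnormal.append(p)
    return abnormal


# Made-up medians and patient levels for hypothetical marker proteins
medians = {p: 100.0 for p in ("D1", "D2", "D3", "U1", "U2", "U3", "U4")}
patient = {"D1": 90.0, "D2": 96.0, "D3": 80.0,
           "U1": 110.0, "U2": 104.0, "U3": 120.0, "U4": 106.0}

hits = abnormal_proteins(patient, medians,
                         down_markers=("D1", "D2", "D3"),
                         up_markers=("U1", "U2", "U3", "U4"))
print(hits, "profile criterion met:", len(hits) >= 5)
```

The resulting per-patient profiles would then be passed to the machine learning step f., which is not shown in this sketch.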

A machine learning algorithm to determine which protein profiles are high-risk or low-risk determinants in step f. can e.g., be either Hidden Markov Model (HMM) or unsupervised machine learning (ML) or supervised machine learning (ML) or a combination thereof.

As is apparent to a skilled person, any embodiments, details, advantages, etc. of the disclosed methods apply accordingly to other aspects of the present invention, including a kit for use in said methods, and vice versa. It will be obvious to a person skilled in the art that, as technology advances, the inventive concept can be implemented in various ways. The invention and its embodiments are not limited to the examples described above but may vary within the scope of the claims.

EXAMPLES

The following examples are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description of how to make and use the subject invention and are not intended to limit the scope of what is regarded as the invention. Efforts have been made to ensure accuracy with respect to the numbers used (e.g., amounts, temperature, concentrations, etc.) but some experimental errors and deviations should be allowed for. Unless otherwise indicated, parts are parts by weight, molecular weight is average molecular weight, temperature is in degrees centigrade; and pressure is at or near atmospheric.

Example 1 Proteogenomic Analysis of treatment naive Acute Myeloid Leukemia cohort

The present example relates to prognostic analysis using the proteomic analysis of the present disclosure. In brief, the cohort consisted of 274 patients from the Clinseq-AML cohort who were treated in Sweden between February 1997 and August 2014.

Materials and Methods

Patients

Bone marrow or peripheral blood samples were obtained at the time of diagnosis from 274 AML patients of the Clinseq-AML cohort treated in Sweden between February 1997 and August 2014. Samples were separated for mononuclear cells and stored in isothermal liquid nitrogen freezers at -180 °C. All patients were treated with intensive induction regimens, including anthracyclines and cytosine arabinoside, according to national guidelines. Clinical data was retrieved from the Swedish Adult Acute Leukemia Registry or from patient records. For the proteomics study 118 patient samples from the Clinseq-AML cohort, which had sufficient material and were representative of the greater cohort were selected.

Data dependent acquisition Mass Spectrometry using TMT 10 plex labelling

Cell pellets were dissolved in lysis buffer (4% SDS, 50 mM HEPES pH 7.6, 1 mM DTT), heated to 95 °C and sonicated. The total protein amount was estimated (Bio-Rad DC). Samples were then prepared for mass spectrometry analysis using a modified version of the SP3 protein clean-up and digestion protocol [Moggridge, S., et al., J Proteome Res, 2018. 17(4): p. 1730-1740; Hughes, C.S., et al., Mol Syst Biol, 2014. 10: p. 757], where proteins were digested by LysC and trypsin (sequencing grade modified, Pierce). In brief, up to 250 µg protein from each sample was alkylated with 4 mM chloroacetamide. Sera-Mag SP3 bead mix (20 µl) was transferred into the protein sample together with 100% acetonitrile to a final concentration of 70%. The mix was incubated under rotation at room temperature for 18 min. The mix was placed on the magnetic rack and the supernatant was discarded, followed by two washes with 70% ethanol and one with 100% acetonitrile. The beads-protein mixture was reconstituted in 100 µl LysC buffer (0.5 M urea, 50 mM HEPES pH 7.6 and 1:50 enzyme (LysC) to protein ratio) and incubated overnight. Finally, trypsin was added in 1:50 enzyme to protein ratio in 100 µl 50 mM HEPES pH 7.6 and incubated overnight. The peptides were eluted from the mixture after placing the mixture on a magnetic rack, followed by peptide concentration measurement (Bio-Rad DC Assay). The samples were then pH adjusted using TEAB pH 8.5 (100 mM final conc.), and 65 µg of peptides from each sample were labelled with isobaric TMT-tags (TMT plex reagent) according to the manufacturer’s protocol (Thermo Scientific). Each set consisted of 9 individual patient samples and the tenth channel contained the same sample pool in each set, consisting of a mixture of patient samples. Sample pools were used as denominators when calculating TMT-ratios and thus served to link the 8 sets together. The tryptic peptides for each set were separated by immobilized pH gradient - isoelectric focusing (IPG-IEF) on 3-10 strips.

Of note, the labelling efficiency was determined by LC-MS/MS before pooling of the samples. For the sample clean-up step, a solid phase extraction (SPE strata-X-C, Phenomenex) was performed and purified samples were dried in a SpeedVac. An aliquot of approximately 10 µg was suspended in LC mobile phase A and 1 µg was injected on the LC-MS/MS system.

Online LC-MS was performed using a Dionex UltiMate™ 3000 RSLCnano System coupled to a Q-Exactive-HF mass spectrometer (Thermo Scientific). Each of the 72 plate wells was dissolved in 20 µl solvent A and 10 µl were injected. Samples were trapped on a C18 guard-desalting column (Acclaim PepMap 100, 75 µm x 2 cm, nanoViper, C18, 5 µm, 100 Å), and separated on a 50 cm long C18 column (Easy spray PepMap RSLC, C18, 2 µm, 100 Å, 75 µm x 50 cm). The nano capillary solvent A was 95% water, 5% DMSO, 0.1% formic acid; and solvent B was 5% water, 5% DMSO, 95% acetonitrile, 0.1% formic acid. At a constant flow of 0.25 µl min-1, the curved gradient went from 6-8% B up to 40% B in each fraction in a dynamic range of gradient length, followed by a steep increase to 100% B in 5 min. FTMS master scans with 60,000 resolution (and mass range 300-1500 m/z) were followed by data-dependent MS/MS (30,000 resolution) on the top 5 ions using higher energy collision dissociation (HCD) at 30% normalized collision energy. Precursors were isolated with a 2 m/z window. Automatic gain control (AGC) targets were 1e6 for MS1 and 1e5 for MS2. Maximum injection times were 100 ms for MS1 and 100 ms for MS2. The entire duty cycle lasted ~2.5 s. Dynamic exclusion was used with 30 s duration. Precursors with unassigned charge state or charge state 1 were excluded. An underfill ratio of 1% was used.

Protein and peptide identification and quantification were carried out as previously described [Branca, R.M., et al., Nat Methods, 2014. 11(1): p. 59-62; Johansson, H., et al., Nat Commun, 2019]. Briefly, Orbitrap raw MS/MS files were converted to mzML format using msConvert from the ProteoWizard tool suite. Spectra were then searched using MSGF+ (v10072) and Percolator (v2.08), where search results from 8 subsequent fractions were grouped for Percolator target/decoy analysis. All searches were done against the human protein subset of Ensembl 92 in the Galaxy platform. MSGF+ settings included precursor mass tolerance of 10 ppm, fully-tryptic peptides, maximum peptide length of 50 amino acids and a maximum charge of 6. Fixed modifications were TMT-10plex on lysines and peptide N-termini, and carbamidomethylation on cysteine residues; a variable modification was used for oxidation on methionine residues. Quantification of TMT-10plex reporter ions was done using OpenMS project’s IsobaricAnalyzer (v2.0). PSMs found at 1% FDR (false discovery rate) were used to infer gene identities.

Protein quantification by TMT plex reporter ions was calculated using TMT PSM ratios to the entire sample set (all 10 TMT-channels) and normalized to the sample median. The median PSM TMT reporter ratio from peptides unique to a gene symbol was used for quantification. Protein false discovery rates were calculated using the picked-FDR method using gene symbols as protein groups and limited to 1% FDR.
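The ratio-based quantification described above can be illustrated with a minimal Python sketch. It is not the pipeline actually used (which relied on OpenMS IsobaricAnalyzer); the data layout, the function name `quantify_genes`, the use of the mean over all channels as the ratio denominator, and the order of the normalization and summarization steps are assumptions made purely for illustration.

```python
from collections import defaultdict
from statistics import mean, median


def quantify_genes(psms):
    """Hypothetical helper: gene-level TMT ratios from PSM reporter intensities.

    psms: list of (gene_symbol, [reporter intensity per TMT channel]),
    assumed to contain only PSMs from peptides unique to one gene symbol.
    """
    n_channels = len(psms[0][1])

    # 1) Ratio of each channel to the mean over all channels of the same PSM
    #    (assumed reading of "ratios to the entire sample set").
    ratios = []
    for gene, intensities in psms:
        denominator = mean(intensities)
        if denominator == 0:
            continue  # skip PSMs without any reporter signal
        ratios.append((gene, [x / denominator for x in intensities]))

    # 2) Normalize every channel (sample) to its own median PSM ratio.
    channel_medians = [median(r[c] for _, r in ratios) for c in range(n_channels)]

    by_gene = defaultdict(list)
    for gene, r in ratios:
        by_gene[gene].append([r[c] / channel_medians[c] for c in range(n_channels)])

    # 3) Gene-level value per channel: median over the PSMs of that gene.
    return {
        gene: [median(row[c] for row in rows) for c in range(n_channels)]
        for gene, rows in by_gene.items()
    }
```

Called with a list such as `[("A", [1, 1]), ("A", [2, 2]), ("B", [1, 3])]`, the function returns one normalized ratio per channel for each gene symbol.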

Data independent acquisition Mass Spectrometry - label free

Cell pellets were dissolved in lysis buffer (4% SDS, 50 mM HEPES pH 7.6, 1 mM DTT), heated to 95 °C and sonicated. The total protein amount was estimated (Bio-Rad DC). Samples were then prepared for mass spectrometry analysis using a modified version of the SP3 protein clean-up and digestion protocol, where proteins were digested by LysC and trypsin (sequencing grade modified, Pierce). In brief, up to 250 µg protein from each sample was alkylated with 4 mM chloroacetamide. Sera-Mag SP3 bead mix (20 µl) was transferred into the protein sample together with 100% acetonitrile to a final concentration of 70%. The mix was incubated under rotation at room temperature for 18 min. The mix was placed on the magnetic rack and the supernatant was discarded, followed by two washes with 70% ethanol and one with 100% acetonitrile. The beads-protein mixture was reconstituted in 100 µl LysC buffer (0.5 M urea, 50 mM HEPES pH 7.6 and 1:50 enzyme (LysC) to protein ratio) and incubated overnight. Finally, trypsin was added in 1:50 enzyme to protein ratio in 100 µl 50 mM HEPES pH 7.6 and incubated overnight. The peptides were eluted from the mixture after placing the mixture on a magnetic rack, followed by peptide concentration measurement (Bio-Rad DC Assay).

Thereafter, a peptide clean-up was performed for all samples using the SP3 method. Briefly, fresh SP3 bead suspension (10 µg/µl, 1:10 bead to sample volume) and ACN (final concentration of 95%) were added to 50-200 µg of peptides and incubated under rotation at RT for 30 min. The tubes were then placed on a magnetic rack, the supernatant was discarded, and the beads were washed twice with 200 µl of ACN. The beads were briefly air-dried, after which the peptides were eluted with 100 µl of 3% ACN/0.1% FA and transferred to a new tube. The peptide concentration was measured using the Bio-Rad DC protein assay. The required quantities for further LC-MS analysis were aliquoted and dried in a SpeedVac.

MS data acquisition

Peptides were separated using a FAIMS system coupled to a Q Exactive Exploris (Thermo Fisher Scientific, San Jose, CA, USA). Samples were trapped on an Acclaim PepMap nanotrap column (C18, 3 µm, 100 Å, 75 µm x 20 mm, Thermo Scientific), and separated on an Acclaim PepMap RSLC column (C18, 2 µm bead size, 100 Å, 75 µm x 50 cm, Thermo Scientific). Peptides were separated using a gradient of mobile phase A (5% DMSO, 0.1% FA) and B (90% ACN, 5% DMSO, 0.1% FA), ranging from 6% to 30% B in 180 min with a flow of 250 nl/min.

For the DIA-based analysis of the individual tumor samples, the samples were dissolved in phase A (5% DMSO, 0.1% FA) and 5 µg of peptides were injected into the LC-MS system. The data was acquired using a variable window strategy. The survey scan was performed at 120,000 resolution from 400-1200 m/z, with a max injection time of 200 ms and a target of 1e6 ions. For generation of HCD fragmentation spectra, the maximum ion injection time was set to auto and an AGC target of 2e5 was used before fragmentation at 25% normalized collision energy, at 30,000 resolution. The sizes of the precursor ion selection windows were optimized to have a similar density of precursor m/z values, based on identified peptides from the spectral library. The median size of the windows was 18.3 m/z with a range of 15-88 m/z covering the scan range of 400-1200 m/z. Neighboring windows had a 2 m/z overlap.

DIA-based peptide and protein identification and quantification

Peptide and protein identification and quantification were performed using Spectronaut. All parameters were set as default and for each peptide, the best 3 to 6 fragments were used. Results were filtered at the precursor, peptide, and protein levels with 1% false discovery rate (FDR). For protein identification and quantification, all DIA raw files were analyzed by the Spectronaut software package (version 13.10) from Biognosys. Files were searched against the ENSEMBL protein database (GRCh38.92.pep.all.fasta). Briefly, runs were recalibrated using iRT (retention time normalization) standard peptides in a local and non-linear regression. The decoy database was created by the mutation method. For quantification, only peptides unique to a protein group were used. Protein groups were defined based on gene symbols to obtain a gene symbol centric quantification. Stripped peptide quantification was defined as the top precursor quantity. Protein group quantification was calculated as the median value of up to the 3 most abundant peptides. Normalization was performed at the MS2 level and quantification at the MS2 level based on the peak area. The data filtering was set as Q value for each sample. Some identifications did not have true quantifications at the MS1 level and the instrument’s software automatically imputed these with 1; these values of 1 were therefore treated as NAs for further quantitative analysis.
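The two quantification rules stated above (median of up to the 3 most abundant peptides, and treating software-imputed values of 1 as missing) can be sketched as follows. This is only an illustrative reading of the text, not Spectronaut's implementation; the function name, the argument names and the choice to apply the placeholder filter to peptide quantities are assumptions.

```python
from statistics import median


def protein_group_quantity(peptide_quantities, top_n=3, placeholder=1.0):
    """Hypothetical helper illustrating the protein group roll-up.

    peptide_quantities: quantities of peptides unique to one protein group.
    Values equal to `placeholder` (the imputed 1) or None are treated as NA.
    Returns the median of the top_n most abundant usable peptides, or None.
    """
    usable = [q for q in peptide_quantities if q is not None and q != placeholder]
    if not usable:
        return None  # nothing truly quantified: report NA
    top = sorted(usable, reverse=True)[:top_n]
    return median(top)
```

For example, `protein_group_quantity([10.0, 1.0, 50.0, 30.0, 20.0])` discards the imputed 1.0, keeps the three largest values (50, 30, 20) and returns their median.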

From the identified protein groups, a list of 2519 protein groups, denoted by their gene symbols, was identified as relevant for the classification of the patients into different groups, the groups being high-risk and low-risk groups or high-risk, intermediate-risk and low-risk groups, in relation to overall survival after 1 (one) year, event free survival, overall survival and/or reaching complete remission.

Classifying patients into groups

The identified gene symbol lists, protein groups, and/or proteins can be utilized in several ways to stratify patients into 2 or 3 groups.

Data preparation

Using label-free LC-MS/MS data generated by Data Independent Analysis, as described above, the following steps were performed after the data was obtained from the mass spectrometer(s):

a) Data normalization using Spectronaut (automated).

b) MS2-level value quantification (described above).

c) Samples with fewer than 4700 protein groups with corresponding gene symbols identified (as described above) were excluded due to poor coverage; in the present example this excluded 16 samples.

d) All markers identified and quantified in all samples were utilized:

I. For the full protein group set comprising 2519 protein groups/gene symbols/markers (protein sequences provided in the sequence listing, SEQ ID NOs 1-17500), n = 1028/2519.

II. For the medium protein group set comprising 537 protein groups/gene symbols/markers (protein sequences provided in Table 4), n = 332/537.

III. For the small protein group set comprising 249 protein groups/gene symbols/markers (protein sequences provided in Table 5), n = 178/249.

e) Markers seen as upregulated in the high-risk group (“up”) were considered upregulated if levels were >5% above the median level for intermediate samples. Similarly, markers seen as downregulated in the high-risk group were considered downregulated if they were <95% of the median level for the intermediate samples.

f) The number of markers having the correct regulation (up or down) was summed for each sample.
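Steps c), e) and f) can be sketched in a few lines of Python. The sketch below is a hedged illustration only: the function names, the dictionary-based data layout and the decision to express the result as a fraction of the evaluated markers are assumptions, and the reference medians of the intermediate samples are taken as given inputs.

```python
MIN_PROTEIN_GROUPS = 4700  # coverage cut-off from step c)


def passes_coverage(n_identified_protein_groups):
    """Step c): samples with fewer than 4700 protein groups are excluded."""
    return n_identified_protein_groups >= MIN_PROTEIN_GROUPS


def marker_positivity(sample, reference_median, direction):
    """Steps e) and f): fraction of markers regulated in the expected direction.

    sample:           {marker: level measured in this sample}
    reference_median: {marker: median level in the intermediate samples}
    direction:        {marker: "up" or "down"} as expected in the high-risk group
    (all three mappings are hypothetical input structures)
    """
    positive = 0
    evaluated = 0
    for marker, expected in direction.items():
        if marker not in sample or marker not in reference_median:
            continue  # marker not quantified: not counted
        evaluated += 1
        ratio = sample[marker] / reference_median[marker]
        if expected == "up" and ratio > 1.05:      # >5% above the median
            positive += 1
        elif expected == "down" and ratio < 0.95:  # <95% of the median
            positive += 1
    return positive / evaluated if evaluated else None
```

The returned fraction corresponds to the marker positivity percentage that is compared against the cut-offs in the patient classification step.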

Patient classification

The data prepared for each sample could now be classified into different risk groups in different ways. In the present example, two or three groups were initially used for the classification, as described in the following steps using a simple algorithm:

a) Samples were classified using the following cut-offs:

I. 2 groups (a high-risk and a low-risk group)

i. If more than 50% of identified markers were up or down regulated in the correct direction (marker positivity), the sample/patient was considered high-risk; otherwise the sample/patient was considered low-risk.

II. 3 groups (high risk - intermediate risk - low risk)

i. If less than 35% of identified markers were up or down regulated in the correct direction (see tables 2 and 3), the sample/patient was considered low-risk.

ii. If >50-58% of identified markers were up or down regulated in the correct direction (markers listed in table 1 >55%, table 4 >54%, table 5 >50%), the sample/patient was considered high-risk.

iii. Remaining patients were considered intermediate risk.

b) Kaplan-Meier curves were calculated for each classification, as shown in Figures 2 and 3 for the marker lists of different sizes.
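The cut-offs above translate directly into a small decision rule. The following Python sketch is a minimal illustration under stated assumptions: the function names and the list labels "large", "medium" and "small" are hypothetical, and the mapping of "table 1" to the large list (with Table 4 as the medium and Table 5 as the small list) is inferred from the surrounding text.

```python
# High-risk cut-offs for the 3-group scheme, per marker list (from the text).
# Assumed mapping: large = table 1, medium = Table 4, small = Table 5.
HIGH_CUTOFF = {"large": 0.55, "medium": 0.54, "small": 0.50}
LOW_CUTOFF = 0.35


def classify_two_groups(positivity):
    """High-risk if more than 50% of markers are regulated as expected."""
    return "High" if positivity > 0.50 else "Low"


def classify_three_groups(positivity, marker_list):
    """Low below 35%, High above the list-specific cut-off, else Intermediate."""
    if positivity < LOW_CUTOFF:
        return "Low"
    if positivity > HIGH_CUTOFF[marker_list]:
        return "High"
    return "Intermediate"
```

For instance, a marker positivity of 0.51 on the large list is "High" in the 2-group scheme but "Intermediate" in the 3-group scheme, because the large list requires more than 55% for the high-risk call.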

The classification of the patients is also exemplified in Examples 1 and 2 of the present disclosure.

Similar approaches could also be employed if the samples are analysed by DDA (Data Dependent Analysis) and/or isobaric labelling approaches, in which case the relationship to an internal standard could be utilized. For Parallel Reaction Monitoring (PRM) based approaches, relationships to spiked-in labelled peptides could be used.

Additionally, the protein markers could be quantified using affinity-based approaches and compared to a reference or standard. Possible technologies include Western blot, ELISA, Proximity Ligation Assays, Proximity Extension Assays, aptamer-, affimer or bead-based technologies (e.g. Luminex), Reversed phase protein arrays or similar technologies known to the skilled person.

Classification of patients based on the gene lists could be done as specified above (with potential method or reference modifications) or could utilize alternative approaches based on clustering or machine learning. This would include but not be limited to the following either alone or in combination: Random Forest, k-Top Scoring Pairs, k-Nearest Neighbor, Support Vector Machines, oPLS-DA, PLS-DA, t-SNE, UMAP, PCA, lasso, Decision Trees, Naive Bayes, Logistic Regression.

Output from classification can be utilized in combination with existing genomic based stratification methods to improve patient risk assessment.

Results

Patients were characterized by mass-spectrometry based proteomics as well as RNA-sequencing, DNA-panel sequencing, Epic array and ex-vivo drug screening at baseline (Fig 5A). Longitudinal follow-up, survival outcome and clinical characteristics as well as treatment response was recorded for all patients. Mutational frequencies of commonly mutated genes were comparable to previous studies.

The proteomic landscape of AML

To better understand the phenotypes of the proteome landscape of AML and how they relate to survival and treatment outcomes, a previously developed method, in-depth HiRIEF-LC-MS/MS [22], was applied. This method has been utilized previously to perform in-depth characterizations of clinical ALL [23], breast [24] and lung cancer [Lehtiö et al. in revision] samples. The workflow identified and quantified more than 12000 protein products (gene centric, FDR<1%) across the entire cohort, with a full overlap of 8632 proteins (Fig 5B).

Using hierarchical clustering, 9 clusters were identified in the cohort (Fig 5C). Survival outcomes differed between the distinct clusters (Fig 5D), with clusters 4 and 1 exhibiting poorer survival and patients in clusters 2 and 7 exhibiting longer survival. Observed treatment response, measured as whether patients achieved complete remission following induction therapy, and genetic risk classification (ELN2017) also varied considerably between the clusters (not shown). Patients grouped in cluster 1 showed an average treatment response (60% CR, compared to 64% for the entire cohort); they also exhibited an overall intermediate risk classification (ELN2017, not shown). Patients in cluster ? exhibited better treatment response (83% CR) as well as overall favorable ELN2017-scores, while patients in clusters 2 and 3 had improved CR-rates (80% and 70%, respectively), but more adverse ELN-scores. For clusters 4-6 the remission rates were close to 50% in general (not shown).

To investigate the relationship between mutational patterns and proteome phenotypes, the panel sequencing data [Wang, M., et al., Validation of risk stratification models in acute myeloid leukemia using sequencing-based molecular profiling. Leukemia, 2017. 31(10): p. 2029-2036] was utilized to investigate if there was enrichment in the clusters for specific genetic aberrations. It was found that cluster 1 was enriched for FLT3 internal tandem duplications (ITD) (adj. pval = 0.0006), NPM1 mutations (adj. pval = 0.0022) and biallelic CEBPA mutations (adj. pval = 0.030), while cluster 4 showed enrichment for mutations in splice factors SF3B1 and U2AF1 (adj. pvals = 0.005 and 0.089, respectively, figure 5D and F). Cluster 7 consisted entirely of patients with inv(16) (enrichment adj. pval < 3.1E-9), which is in line with previous reports from transcriptomic studies of AML where the expression profile of inv(16) is clearly distinguishable from other AML subtypes [Gutierrez, N.C., et al., Gene expression profile reveals deregulation of genes with relevant functions in the different subclasses of acute myeloid leukemia. Leukemia, 2005. 19(3): p. 402-9]. By applying the mutation-based classifications from Papaemmanuil et al. [Papaemmanuil, E., H. Döhner, and P.J. Campbell, Genomic Classification in Acute Myeloid Leukemia. N Engl J Med, 2016. 375(9): p. 900-1], their classification was also compared to the proteomic clusters; again inv(16) overlapped with cluster 7, and it was found that NPM1 mutated AML accounted for the majority of cases in clusters 1, 2 and 6. Biallelic CEBPA was predominantly found in cluster 1, but MLL-rearranged cases divided evenly between clusters 1 and 2. The p53-Complex Karyotype subtype was also evenly divided between clusters 1 and 3, while the spliceosome-chromosomal modifier subtype was spread out across several clusters. Cluster 4, however, was almost entirely made up of this subtype, reflecting the enrichment for splice factor mutations observed in that cluster (Fig 5F).

Conversely, it was also investigated which mutations and genetic aberrations had the largest impact on protein levels, and it was found that NPM1 and FLT3 ITD led to the most prominent changes on the proteome level, closely followed by transcription related changes due to biallelic CEBPA mutation and inv(16) (not shown). For 117 of the patients, matching RNAseq and proteomics data were obtained, and the correlation of the overlap (n=8971) was compared on the gene symbol level. A median Spearman correlation of 0.36 was found, which is within the range of what has been reported in previous studies of solid tumors [Johansson, H., et al., Breast cancer quantitative proteome and proteogenomic landscape. Nat Commun, 2019. In Press; Mertins, P., et al., Proteogenomics connects somatic mutations to signalling in breast cancer. Nature, 2016. 534(7605): p. 55-62; Zhang, H., et al., Integrated Proteogenomic Characterization of Human High-Grade Serous Ovarian Cancer. Cell, 2016. 166(3): p. 755-765]. Notably, the correlation was higher, and a greater proportion of significant correlations was found (66%, adjusted P-value < 0.01), compared to what was found previously in acute lymphoblastic leukemia [Yang, M., et al., Proteogenomics and Hi-C reveal transcriptional dysregulation in high hyperdiploid childhood acute lymphoblastic leukemia. Nat Commun, 2019. 10(1): p. 1519] (not shown). Correlations of complex members were also significantly higher on the protein level compared to the mRNA level, in line with previous reports [Johansson, H., et al., Breast cancer quantitative proteome and proteogenomic landscape. Nat Commun, 2019. In Press; Yang, M., et al., Proteogenomics and Hi-C reveal transcriptional dysregulation in high hyperdiploid childhood acute lymphoblastic leukemia. Nat Commun, 2019. 10(1): p. 1519] (not shown).
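To make the mRNA-protein comparison concrete, a median per-gene Spearman correlation of the kind reported above could be computed as in the following Python sketch. It is an assumed, simplified illustration using only the standard library: the function names and the dictionary layout (gene symbol mapped to per-patient values in matching patient order) are hypothetical and do not reproduce the analysis code actually used.

```python
from statistics import median


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def median_gene_correlation(mrna, protein):
    """Median Spearman correlation over gene symbols present in both data sets."""
    shared = sorted(set(mrna) & set(protein))
    return median(spearman(mrna[g], protein[g]) for g in shared)
```

Each gene contributes one correlation computed across patients, and the median over all overlapping gene symbols summarizes the agreement between the two data levels.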

It was also noted that the proteome level clusters differed in terms of FAB-classification [Bennett, J.M., et al., Proposals for the classification of the acute leukaemias. French-American-British (FAB) co-operative group. Br J Haematol, 1976. 33(4): p. 451-8], with stem-like or granulocytic differentiated subtypes (M0-M3) being more prevalent in clusters 1, 4 and 6, while monocytic subtypes (M4-M5) were dominant in cluster 2 (not shown). In line with this, monocytic markers (CD14 and CD36) were elevated in samples from patients from cluster 2 and to a degree also in cluster 7 (inv(16)) (not shown). Patients in cluster 4 also exhibited increased levels of stemness markers CD34 [Krause, D.S., et al., CD34: structure, biology, and clinical utility. Blood, 1996. 87(1): p. 1-13] and CD133 [Wuchter, C., et al., Impact of CD133 (AC133) and CD90 expression analysis for acute leukemia immunophenotyping. Haematologica, 2001. 86(2): p. 154-61]. Additionally, it was also explored how signature gene sets for AML subtype differentiation derived from single cell RNAseq [van Galen, P., et al., Single-Cell RNA-Seq Reveals AML Hierarchies Relevant to Disease Progression and Immunity. Cell, 2019. 176(6): p. 1265-1281 e24] differed between the presently presented clusters, and it was found that protein levels of genes associated to a HSC-like phenotype were up in clusters 4 and 5, clusters 1, 6 and 8 had a more progenitor/GMP-like phenotype, while clusters 2, 3 and to a degree 7 exhibited a promono/monocytic phenotype (Fig 5H).

To more broadly characterize the proteomic phenotypes, the individual clusters were compared, and enrichment analysis was employed to investigate up and down regulated pathways (Fig 5I). It was observed that monocyte related genes as well as genes associated to inflammation, TNF-signaling and IL10-signaling were upregulated in cluster 2. In clusters 3 and 5, increased levels of integrin signalling and integrin interaction related proteins were found. The inv(16) related cluster 7 exhibited decreased oxidative phosphorylation as well as increased chromosome maintenance. Clusters 1 and 4 exhibited increased levels of proteins related to transcription, mRNA processing and splicing. Specifically, spliceosome proteins were generally upregulated in these two clusters (Fig 6A).

Clustering of spliceosomal proteins

The patients were stratified into three groups based on clustering of spliceosomal proteins (Fig 6A), and it was investigated whether there is an association between outcome and spliceosome proteins. It was found that higher spliceosome levels were significantly associated with shorter overall survival (Fig 6B) and that lower levels were associated with a higher rate of complete remission (Chi-square High vs Low p-value = 0.02; Intermediate vs Low p-value = 0.05). Spliceosomal proteins were overall well correlated (not shown) while transcripts exhibited a lower level of correlation (Wilcoxon pval <1E-27). Additionally, the mRNA-protein correlations for the spliceosomal gene products were overall poor and the same phenotype could not be observed using the RNA sequencing data (Fig 6C). No gender differences were observed between the groups, but patients with low levels tended to be older. Also, patients with low or intermediate spliceosome levels had lower percentages of blasts in the bone marrow and lower absolute levels in peripheral circulation compared to patients with high levels. Patients with a high spliceosome phenotype also had lower thrombocyte levels (median TPK: 54); low thrombocyte levels have previously been linked to improved OS in AML [Zhang, Y., et al., Low Platelet Counts at Diagnosis Predict Better Survival for Patients with Intermediate-Risk Acute Myeloid Leukemia. Acta Haematol, 2020. 143(1): p. 9-18] (not shown).

There were no obvious differences in FAB classification between the three groups but using the scRNAseq derived gene sets described above it was observed that low spliceosome levels were more associated to a promono- or monocyte-like subtype and that the patients with high levels of spliceosome proteins also had increased levels of proteins associated to a progenitor-like state (Fig 6D).

No association was found (χ2-test, p = 0.62) between ELN2017 classification and spliceosome levels, indicating that they are independent risk factors (Fig. 6E-F). Univariate Cox regression confirmed that both adverse ELN-classification and high spliceosome levels were related to survival, and both metrics retained their significance in a multivariable model, indicating independent contributions (not shown). Indeed, combining the two metrics led to improved stratification of the patient cohort in relation to overall survival (Fig 6G). To better elucidate the genetic contribution to the high spliceosome phenotype, it was further investigated which mutations were found more frequently in the high and low spliceosome groups respectively (Fig 6H).

Markers for cell proliferation (MKI67 (Spearman cor = -0.158, p >0.05), PCNA (Spearman cor = 0.180, p >0.05), TOP2A (Spearman cor = -0.222, p <0.05)) did not display significant positive correlation to spliceosome levels. Comparing the three groups it was found that in addition to spliceosome and RNA processing genes being upregulated, genes involved in chromatin organization, sumoylation and DNA repair were also upregulated. Gene sets related to fatty acid metabolism, GPCR and chemokine signalling, hematopoietic lineage markers and cell adhesion/integrins/focal adhesion were comparatively downregulated (Fig 6I).

Example 2 - prospective analysis of patients

Introduction and Results

To evaluate the prognostic power of the method of the present disclosure, a small set of diagnosis samples was analysed prospectively via DIA-LC-MS/MS. The set comprised 15 patients who were diagnosed with AML at a single center in the Stockholm County region and who, after sampling, either received only a combination of Cytarabine and Daunorubicin as induction treatment for their AML or received no treatment.

Of the 15 patient samples analysed, 10 of the samples had robust quantification (>4700 gene products quantified, selection criteria as set in Example 1). These samples were classified in the same manner as the retrospective, discovery cohort of Example 1.

When stratifying the patients into two groups (high and low risk, respectively), 7 patients were classified as high risk and 3 as low risk. Of the 7 high-risk patients, 3 patients had very high marker positivity (>65%; range: 67% - 76%) on average for the three lists (Table 7).

Table 7 - Classification output of 10 prospective samples.

              Marker Positivity*                       Classification (2 groups)
Patient       Large list   Medium list   Small list    Large list   Medium list   Small list
Patient 3     74%          78%           75%           High         High          High
Patient 10    63%          69%           68%           High         High          High
Patient 1     73%          64%           63%           High         High          High
Patient 9     61%          61%           57%           High         High          High
Patient 8     51%          60%           60%           High         High          High
Patient 5     59%          59%           60%           High         High          High
Patient 2     64%          57%           61%           High         High          High
Patient 4     42%          48%           46%           Low          Low           Low
Patient 6     44%          40%           43%           Low          Low           Low
Patient 7     24%          26%           27%           Low          Low           Low

In Table 7, the marker positivity was assessed in all patients. The percentages indicate what proportion of detected markers (n = 935 for the Large, n = 312 for the Medium and n = 168 for the Small list) had abundance levels in the correct direction relative to threshold values. Classification was based on the same percentage cut-offs used for classifying the retrospective DIA samples of Example 1.
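As a worked check of the figures quoted for Table 7, the short Python sketch below averages the marker positivity over the three lists and applies the 2-group cut-off of more than 50%. The percentages are those listed in Table 7; patient numbers follow the labels used in Table 8, and the variable and function names are purely illustrative.

```python
# Marker positivity (%) per patient for the (Large, Medium, Small) lists, Table 7.
table7 = {
    3: (74, 78, 75), 10: (63, 69, 68), 1: (73, 64, 63), 9: (61, 61, 57),
    8: (51, 60, 60), 5: (59, 59, 60), 2: (64, 57, 61), 4: (42, 48, 46),
    6: (44, 40, 43), 7: (24, 26, 27),
}


def mean_positivity(values):
    """Average marker positivity over the three marker lists."""
    return sum(values) / len(values)


# Patients whose average marker positivity exceeds 65% (rounded to whole %).
very_high = {
    patient: round(mean_positivity(values))
    for patient, values in table7.items()
    if mean_positivity(values) > 65
}

# Patients classified as high risk on all three lists (more than 50% positivity).
high_risk = [p for p, values in table7.items() if all(v > 50 for v in values)]

print(very_high)       # patients 3, 10 and 1, with averages between 67% and 76%
print(len(high_risk))  # 7 high-risk patients
```

The three patients with an average above 65% span 67% to 76%, and seven patients exceed the 50% cut-off on every list, which agrees with the classification columns of Table 7.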

Next, the marker positivity percentages were employed to stratify the patients into 3 groups (High, Intermediate and Low risk) using the same cut-offs as employed for classifying the retrospective DIA- samples in Example 1.

Table 8 - Classification output of 10 prospective samples into 3 groups.

              Marker Positivity*              Classification (3 groups)
Patient       Large    Medium    Small        Large           Medium          Small
Patient 3     74%      78%       75%          High            High            High
Patient 10    63%      69%       68%          High            High            High
Patient 1     73%      64%       63%          High            High            High
Patient 9     61%      61%       57%          High            High            High
Patient 8     51%      60%       60%          Intermediate    High            High
Patient 5     59%      59%       60%          High            High            High
Patient 2     64%      57%       61%          High            High            High
Patient 4     42%      48%       46%          Intermediate    Intermediate    Intermediate
Patient 6     44%      40%       43%          Intermediate    Intermediate    Intermediate
Patient 7     24%      26%       27%          Low             Low             Low

In Table 8, the marker positivity was assessed in all patients. The percentages indicate what proportion of detected markers (n = 935 for the Large, n = 312 for the Medium and n = 168 for the Small list) had abundance levels in the correct direction relative to threshold values. Classification was based on the same percentage cut-offs used for classifying the retrospective DIA samples of Example 1 into 3 groups.

To assess concordance with existing genetic risk stratification, the standard ELN2017 risk was estimated from the existing clinical data, as shown in Table 9.

Table 9 - Estimation of ELN2017 risk in 10 prospective patients.

Patient      ELN estimate   ELN-related changes
Patient 3    Adverse        RUNX1 mutation
Patient 10   Favorable      NPM1, no FLT3 mutation, normal karyotype
Patient 1    Favorable      Biallelic CEBPA mutation
Patient 9    Favorable      NPM1, no FLT3 mutation, normal karyotype
Patient 8    Intermediate   Normal karyotype
Patient 5    Intermediate   Normal karyotype
Patient 2    Adverse        ASXL1 mutation
Patient 4    Favorable      inv(16)
Patient 6    Adverse        KMT2A-rearr (not 9;11)
Patient 7    Intermediate   None

In Table 9, the ELN2017 risk level was estimated based on existing clinical and genetic data using Table 5 in [Döhner H, Estey E, Grimwade D, et al. Blood. 2017;129(4):424-447].

To evaluate the two risk-stratification methods, information from clinical records was tabulated (Table 10).

Table 10 - Patient outcomes in 10 prospective patients.

Patient      Treatment Outcome    Status           EFS     Survival (cens allo)   Overall Survival
Patient 3    CR                   Allo             150*    175*                   273
Patient 10   CR                   Death            156     185                    185
Patient 1    CR2 (venetoclax)     Allo             71*     121*                   416
Patient 9    CR                   Allo             224*    252*                   341
Patient 8    no CR                Death            0***    44                     44
Patient 5    no CR (untreated)    Death            0***    6                      6
Patient 2    CR                   Death            65      93                     93
Patient 4    CR                   Lives, no allo   157**   182                    182
Patient 6    CR                   Lives, no allo   137**   163                    163
Patient 7    CR                   Allo             140*    166*                   451

In Table 10, the following patient outcomes are tabulated. Treatment outcome: CR = Complete Remission, CR2 = Complete Remission after second treatment (second treatment in parentheses), no CR = Did not reach CR, untreated = Patient died before treatment could begin. Status: Allo = Patient received allogenic stem cell transplantation, Death = Patient died, Lives, no allo = Patient is still alive at end of follow-up and did not receive an allogenic HSCT. EFS = Event-free survival, calculated from date of CR until any of the following events: * = allogenic HSCT, ** = End of follow-up, *** = did not reach CR and 0 is imputed. Survival (cens allo) = Overall survival time for individual patients from date of diagnosis until either death, end of follow-up or date of allogenic HSCT. Overall Survival = Overall survival time for individual patients from date of diagnosis until either death or end of follow-up.

Most of the patients reached CR. One patient (Patient 5) died before treatment could begin, one patient (Patient 8) did not reach CR, and one patient (Patient 1) reached CR only after a second round of treatment, with Venetoclax.

To evaluate the method, the outcomes for patients in the different groups were compared. For the classification of patients into 2 groups (high and low, see Table 7), the seven patients classified as high risk had an initial treatment response of 57% (4 patients reached CR1), whereas in the low-risk group all 3 patients reached CR1. In the high-risk group, four of the patients died (median survival 69 days) and the remaining 3 received allogenic HSCT (median survival time before transplantation: 175 days). In the low-risk group, one patient received a transplant 166 days after diagnosis and two patients were still alive at end of follow-up despite no transplantation (median survival: 173 days). In the 3-group classification, all 7 patients classified as high risk remained in that group, while two of the patients previously classified as low risk were classified as intermediate risk (Table 8).

Comparatively, the ELN2017 estimation into 3 groups (Adverse, Intermediate and Favorable; Table 9) grouped three patients as Adverse risk. All three patients reached CR1 and had a median EFS of 137 days (Tables 9 and 10). One patient died, one received an allogenic HSCT and one was still alive at end of follow-up. Three patients were assessed as belonging to the intermediate-risk group; one of them reached CR1, while the other two died within 44 days of diagnosis. The remaining 4 patients were assessed as favorable by ELN2017 criteria; of these, 3 reached CR1 and one reached CR after a second round of treatment with Venetoclax (Tables 9 and 10). Two patients received allogenic HSCT (median survival until transplant: 149 days), one patient died 185 days after diagnosis and one patient was still alive at end of follow-up (182 days after diagnosis).

To evaluate whether combining the two risk assessment methods could improve patient stratification, the ELN2017 assessments were combined with the proteomic classifications (Table 11).

Table 11 - Combined scoring of prospective patients based on ELN2017 assessment and proteomic risk groups.

Patient      ELN risk estimate   Proteomic risk (2 groups)   Proteomic risk (3 groups)   Combined score 1   Combined score 2
Patient 3    Adverse             High                        High                        6                  6
Patient 10   Favorable           High                        High                        4                  4
Patient 1    Favorable           High                        High                        4                  4
Patient 9    Favorable           High                        High                        4                  4
Patient 8    Intermediate        High                        High                        5                  5
Patient 5    Intermediate        High                        High                        5                  5
Patient 2    Adverse             High                        High                        6                  6
Patient 4    Favorable           Low                         Intermediate                2                  3
Patient 6    Adverse             Low                         Intermediate                4                  5
Patient 7    Intermediate        Low                         Low                         3                  3

In Table 11, the classifications of patients into 2 groups (Table 7) and 3 groups (Table 8) are summarized, using the median classification across the three marker lists (Patient 8 was classified as borderline high/intermediate in one list and high in the other two). For the ELN estimate, Adverse gives 3 points, Intermediate 2 points and Favorable 1 point. For the proteomic risk groups, High gives 3 points, Intermediate 2 points and Low 1 point. For each patient, the ELN score was summed with the score from the 2-group and 3-group classifications to give Combined score 1 and Combined score 2, respectively.
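The scoring rule described for Table 11 can be illustrated with a short sketch. The point values and the per-patient classifications are taken from the text and Table 11; the function and variable names are illustrative assumptions and are not part of the described method.

```python
# Point values as stated in the text accompanying Table 11.
ELN_POINTS = {"Adverse": 3, "Intermediate": 2, "Favorable": 1}
PROTEOMIC_POINTS = {"High": 3, "Intermediate": 2, "Low": 1}

# (ELN estimate, proteomic 2-group class, proteomic 3-group class) per patient,
# transcribed from Table 11. The dictionary layout is an illustrative choice.
PATIENTS = {
    3: ("Adverse", "High", "High"),
    10: ("Favorable", "High", "High"),
    1: ("Favorable", "High", "High"),
    9: ("Favorable", "High", "High"),
    8: ("Intermediate", "High", "High"),
    5: ("Intermediate", "High", "High"),
    2: ("Adverse", "High", "High"),
    4: ("Favorable", "Low", "Intermediate"),
    6: ("Adverse", "Low", "Intermediate"),
    7: ("Intermediate", "Low", "Low"),
}


def combined_scores(eln, two_group, three_group):
    """Return (Combined score 1, Combined score 2) for one patient."""
    eln_points = ELN_POINTS[eln]
    score_1 = eln_points + PROTEOMIC_POINTS[two_group]
    score_2 = eln_points + PROTEOMIC_POINTS[three_group]
    return score_1, score_2


SCORES = {pid: combined_scores(*calls) for pid, calls in PATIENTS.items()}

for pid, (score_1, score_2) in SCORES.items():
    print(f"Patient {pid}: Combined score 1 = {score_1}, Combined score 2 = {score_2}")
```

Both combined scores range from 2 (Favorable and Low) to 6 (Adverse and High).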

Table 12 - Patient outcomes and Combined score 1.

Patient      Combined score 1   Treatment Outcome      Status           EFS      Survival (cens allo)   Overall Survival
Patient 3    6                  CR                     Allo             150*     175*                   273
Patient 2    6                  CR                     Death            65       93                     93
Patient 8    5                  no CR                  Death            0**      44                     44
Patient 5    5                  no CR (no treatment)   Death            0**      6                      6
Patient 10   4                  CR                     Death            156      185                    185
Patient 1    4                  CR2 (venetoclax)       Allo             71*      121*                   416
Patient 9    4                  CR                     Allo             224*     252*                   341
Patient 6    4                  CR                     Lives, no allo   137***   163                    163
Patient 7    3                  CR                     Allo             140*     166*                   451
Patient 4    2                  CR                     Lives, no allo   157***   182                    182

In Table 12, patients are sorted by Combined score 1 and combined with patient outcomes from Table 10.

Table 13 - Patient outcomes and Combined score 2.

Patient      Combined score 2   Treatment Outcome      Status           EFS      Survival (cens allo)   Overall Survival
Patient 3    6                  CR                     Allo             150*     175*                   273
Patient 2    6                  CR                     Death            65       93                     93
Patient 8    5                  no CR                  Death            0**      44                     44
Patient 5    5                  no CR (no treatment)   Death            0**      6                      6
Patient 6    5                  CR                     Lives, no allo   137***   163                    163
Patient 10   4                  CR                     Death            156      185                    185
Patient 1    4                  CR2 (venetoclax)       Allo             71*      121*                   416
Patient 9    4                  CR                     Allo             224*     252*                   341
Patient 4    3                  CR                     Lives, no allo   157***   182                    182
Patient 7    3                  CR                     Allo             140*     166*                   451

In Table 13, patients are sorted by Combined score 2 and combined with patient outcomes from Table 10.

Four patients had a Combined score 1 of 5 or 6. The median EFS for this group was 33 days (0-150 days), median survival (cens allo) was 69 days (6-175 days) and median Overall Survival was 69 days (6-273 days). 2 out of 4 patients reached CR in this group and 3 out of 4 patients died. The remaining patient received allogenic HSCT (Table 12) 150 days after reaching CR. Four patients had a Combined score 1 of 4. The median EFS for this group was 147 days (71-224 days), median survival (cens allo) was 174 days (121-252 days) and median Overall Survival was 263 days (163-451 days). 3 out of 4 patients reached CR1 in this group and 1 out of 4 patients died. Two of the remaining patients received allogenic HSCT (Table 6) 71-224 days after reaching CR. Of the two patients with a Combined score 1 of 3 or less, both reached CR1; one was transplanted and the other was still alive at the end of follow-up. Combined score 2 produced very similar results for all patients except Patient 6, whose score went from 4 (Combined score 1) to 5 (Combined score 2). Patient 6 was classified as low risk (2-group classification) and intermediate risk (3-group classification). Patient 6 was also assessed to have an adverse ELN2017 risk due to a KMT2A rearrangement (Table 9). Despite this, Patient 6 reached CR1 and was still alive at the end of follow-up, 137 days after reaching CR, without any transplantation.
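The group medians quoted for Combined score 1 can be recomputed from the values in Table 12. The following is a minimal sketch; the data layout and function name are illustrative assumptions. The unrounded medians come out at half-day values in several cases, which the text reports rounded to whole days.

```python
from statistics import median

# (Combined score 1, EFS, survival censored at allo-HSCT, overall survival),
# all times in days, transcribed from Table 12. Footnote markers are dropped.
OUTCOMES = {
    3: (6, 150, 175, 273),
    2: (6, 65, 93, 93),
    8: (5, 0, 44, 44),
    5: (5, 0, 6, 6),
    10: (4, 156, 185, 185),
    1: (4, 71, 121, 416),
    9: (4, 224, 252, 341),
    6: (4, 137, 163, 163),
    7: (3, 140, 166, 451),
    4: (2, 157, 182, 182),
}


def group_medians(min_score, max_score):
    """Median (EFS, survival cens allo, overall survival) for a score range."""
    rows = [r for r in OUTCOMES.values() if min_score <= r[0] <= max_score]
    return tuple(median(r[i] for r in rows) for i in (1, 2, 3))


SCORE_5_OR_6 = group_medians(5, 6)
SCORE_4 = group_medians(4, 4)

print("Combined score 1 of 5 or 6:", SCORE_5_OR_6)
print("Combined score 1 of 4:", SCORE_4)
```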

Survival curves were plotted for patients stratified by ELN, proteomics or a combination of the two, and the log-rank test was used to evaluate the significance of differences between groups (Figure 4). Stratifying the 10 patients by the estimated ELN2017 risk produced no significant differences (Figure 4A). However, contrasting Adverse+Intermediate risk against Favorable risk showed a trend towards better survival in the Favorable group (Figure 4B): at 100 days after diagnosis all Favorable patients were still alive, while 50% of the Adverse/Intermediate patients had succumbed to the disease. Similarly, contrasting the 7 patients classified as high risk by our proteomic assessment against the 3 classified as low risk did not reveal any significant differences between the groups but produced the expected trend, with the high-risk group exhibiting shorter survival (Figure 4C); at 100 days post diagnosis all low-risk patients were alive, while 43% of high-risk patients had died. Surprisingly, when the combined scores that incorporate both the proteomic risk and the ELN2017 risk were used (Figure 4D and E), significant differences were observed despite the very limited number of patients and the short follow-up period.
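For illustration, a two-group log-rank test can be sketched in plain Python as below. This is not the analysis code behind Figure 4. The choice of endpoint (survival censored at allogenic HSCT from Table 10, with only Status "Death" counted as an event) and the grouping (Combined score 1 of 5 or 6 versus 4 or less) are assumptions made for this example, since the text does not specify them here.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. Returns (chi-square statistic, p-value), 1 d.f."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for ti, _, _ in data if ti >= t)               # at risk, both groups
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)  # at risk, group A
        d = sum(1 for ti, e, _ in data if ti == t and e)         # events at t
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Assumed grouping: Combined score 1 of 5 or 6 (Patients 3, 2, 8, 5).
high_times = [175, 93, 44, 6]
high_events = [False, True, True, True]
# Assumed grouping: Combined score 1 of 4 or less (Patients 10, 1, 9, 6, 7, 4).
low_times = [185, 121, 252, 163, 166, 182]
low_events = [True, False, False, False, False, False]

chi2, p_value = logrank_test(high_times, high_events, low_times, low_events)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```

With only 10 patients the chi-square approximation is coarse, so any p-value obtained this way should be read as indicative only.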

Materials and Methods

Patient samples

Bone marrow or peripheral blood samples were obtained at the time of diagnosis from 15 AML patients treated in Sweden between 2018 and 2020. Mononuclear cells were isolated from the samples, washed in PBS and pelleted by centrifugation. Cell pellets were snap-frozen and stored at -80 °C. All patients were treated with intensive induction regimens, including anthracyclines and cytosine arabinoside, according to national guidelines. Patients treated with other induction regimens, such as Hydrea, Venetoclax or Azacytidine, or who had their initial course of cytarabine and daunorubicin supplemented with additional targeted treatments, such as Mylotarg or Midostaurin, were not included. Clinical data were retrieved from the Swedish Adult Acute Leukemia Registry or from patient records.

Mass Spectrometry - DIA based proteomics

Each sample was dissolved in 200 µl lysis buffer (25 mM HEPES pH 7.6, 4 % SDS, 1 mM DTT), heated at 90 °C for 5 min and sonicated for 1 min. The total protein amount was estimated (Bio-Rad DC). Samples were then prepared for mass spectrometry analysis using a modified version of the SP3 protein clean-up and digestion protocol (as described earlier in Example 1), where proteins were digested by Lys-C and trypsin (sequencing grade modified, Pierce). In brief, 200 µg (or the entire amount if <200 µg protein was available) from each sample was alkylated with 4 mM chloroacetamide. Sera-Mag SP3 (GE Healthcare products 45152105050250 and 65152105050250, distributed by Thermo Fisher) bead mix (20 µl) was transferred into the protein sample together with 100 % acetonitrile to a final concentration of 70 %. The mix was incubated under rotation at room temperature for 20 min. The mix was placed on the magnetic rack and the supernatant was discarded, followed by two washes with 70 % ethanol and one with 100 % acetonitrile. The bead-protein mixture was reconstituted in 100 µl Lys-C buffer (0.5 M urea, 50 mM HEPES pH 7.6 and 1:50 enzyme (Lys-C) to protein ratio) and incubated overnight. Finally, trypsin was added at a 1:50 enzyme to protein ratio in 100 µl 50 mM HEPES pH 7.6 and incubated overnight. Peptide concentration was measured using Bio-Rad DC.

50 µg of peptides from each sample were cleaned by SP3 beads. For that, peptides were dried by SpeedVac and dissolved in 20 µl water. 10 µl beads were added to each tube and mixed by short vortex. 570 µl acetonitrile was added to each sample to reach 95 % ACN. The mixture was incubated for 30 minutes at room temperature. To remove the buffer, the tube was placed on a magnetic rack and incubated for 2 minutes at room temperature. Supernatant was discarded. Magnetic beads were washed by addition of 250 µl of acetonitrile and incubated for 30 seconds on the magnetic stand. Supernatant was discarded and the beads air-dried. Tryptic peptides were detached from the beads by addition of 100 µl of 3 % ACN, 0.1 % FA and transferred to a new tube.
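The stated 95 % acetonitrile follows from the volumes given (20 µl water, 10 µl beads, 570 µl acetonitrile). A minimal check is sketched below, assuming that the bead suspension contains no acetonitrile and that volumes are additive; the function name is illustrative.

```python
def acetonitrile_fraction(sample_ul, beads_ul, acn_ul):
    """Volume fraction of acetonitrile after mixing (volumes assumed additive)."""
    return acn_ul / (sample_ul + beads_ul + acn_ul)


# 20 µl dissolved peptides + 10 µl bead suspension + 570 µl acetonitrile.
FRACTION = acetonitrile_fraction(20, 10, 570)
print(f"{FRACTION:.0%} ACN in a total volume of {20 + 10 + 570} µl")
```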

5 µg of peptides from each sample were injected and separated using an Ultimate 3000 RSLCnano system coupled to an Exploris 480 (Thermo Fisher Scientific, San Jose, CA, USA). Samples were trapped on an Acclaim PepMap nanotrap column (C18, 3 µm, 100 Å, 75 µm x 20 mm, Thermo Scientific), and separated on an Aurora Series UHPLC emitter column (25 cm x 75 µm ID, 1.6 µm C18, IonOpticks). Peptides were separated using a gradient of mobile phase A (5 % DMSO, 0.1 % FA) and B (90 % ACN, 5 % DMSO, 0.1 % FA), ranging from 6 % to 30 % B in 120 min with a flow of 0.400 µl/min.

For data independent acquisition (DIA), data was acquired using a FAIMS device with two CVs: -45 and -65 V. The DIA window size was 25 m/z over a mass range from 375 to 1175 m/z, and neighboring windows had an overlap of 1 m/z. For each of the CVs, the survey scan was performed at 120,000 resolution from 375-1175 m/z, with the max injection time set to auto and a target of 3e6 ions. For generation of HCD fragmentation spectra, the max ion injection time was set to auto and an AGC target of 2e5 was used before fragmentation at 28 % normalized collision energy, 15,000 resolution.

For protein identification and quantification, all raw files were analyzed by Spectronaut using the Direct-DIA option without the use of a spectral library, and files were searched against the ENSEMBL protein database (GRCh38.98.pep.all.fasta). All parameters were kept as default for protein identification. Briefly, runs were recalibrated using iRT standard peptides in a local and non-linear regression. Precursors, peptides and proteins were filtered with FDR 1 %. The decoy database was created by the mutation method. For quantification, only peptides unique to a protein group were used. Protein groups were defined based on gene symbols to obtain a gene symbol centric quantification. Stripped peptide quantification was defined as the top precursor quantity. Protein group quantification was calculated as the median value of the top 3 most abundant peptides. Quantification was performed at the MS2 level based on the peak area. The quantitative values were filtered using the q-value for each sample. Imputation was not performed at any stage of the quantification data generation.
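The protein group roll-up described above (median of the top 3 most abundant peptides) can be illustrated as follows. This is a sketch of the arithmetic only, not Spectronaut's implementation. The function name is hypothetical, and the handling of protein groups with fewer than three peptides (median of those available, or no value when none are present) is an assumption.

```python
from statistics import median


def protein_group_quantity(peptide_quantities, top_n=3):
    """Median of the top_n most abundant peptide quantities for one protein group.

    Returns None when no peptide was quantified, so that no value is imputed.
    """
    top = sorted(peptide_quantities, reverse=True)[:top_n]
    if not top:
        return None
    return median(top)


# Hypothetical peptide quantities for a single protein group.
print(protein_group_quantity([5.0, 1.0, 9.0, 7.0]))
```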

Patient classification

Patient samples with sufficiently deep proteomics data (>4700 gene products quantified) were classified using the same approach as for the retrospective cohort. Proteins fully quantified across all samples (both the retrospective discovery cohort and the new prospective samples) were retained. Proteins/gene products overlapping with the 3 lists were utilized for classification, yielding: Large list: n = 935/2500 (1028/2500 for the previous retrospective-only DIA classification), Medium: n = 312/537 (332/537 previously), Small: n = 168/249 (178/249 previously). Markers seen as upregulated in the high-risk group ("up") were considered upregulated if levels were >5% above the median level for intermediate samples (in the retrospective group based on the HiRIEF data). Similarly, markers seen as downregulated in the high-risk group were considered downregulated if they were <95% of the median level for the intermediate samples (again, as defined by the HiRIEF data).
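The thresholding rule above, together with a per-sample percentage of markers in the expected direction, can be sketched as follows. The marker names and values are hypothetical, abundance levels are assumed to be on a linear (not log-transformed) scale, and the function name is illustrative.

```python
def marker_positivity(sample_levels, intermediate_median, direction):
    """Percentage of detected markers whose level moves in the expected direction.

    sample_levels: marker -> abundance in the sample (linear scale, assumed)
    intermediate_median: marker -> median abundance in intermediate samples
    direction: marker -> "up" or "down" in the high-risk group
    """
    detected = 0
    positive = 0
    for marker, level in sample_levels.items():
        if marker not in direction or marker not in intermediate_median:
            continue  # marker is not on the list or has no reference level
        detected += 1
        reference = intermediate_median[marker]
        if direction[marker] == "up" and level > 1.05 * reference:
            positive += 1
        elif direction[marker] == "down" and level < 0.95 * reference:
            positive += 1
    return 100.0 * positive / detected if detected else 0.0


# Hypothetical four-marker example.
direction = {"MARKER_A": "up", "MARKER_B": "up", "MARKER_C": "down", "MARKER_D": "down"}
reference = {"MARKER_A": 100.0, "MARKER_B": 100.0, "MARKER_C": 100.0, "MARKER_D": 100.0}
sample = {"MARKER_A": 110.0, "MARKER_B": 104.0, "MARKER_C": 90.0, "MARKER_D": 80.0}

POSITIVITY = marker_positivity(sample, reference, direction)
print(f"Marker positivity: {POSITIVITY:.0f}%")
```

In this toy example MARKER_B is only 4% above its reference level and therefore does not count as upregulated.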

For every sample the results were tallied and the percentage of markers trending in the correct direction was calculated.

Discussion

The proteomic classification method was assessed by applying it to diagnostic samples from 10 prospectively collected AML patients. It was found that 7 out of 10 patients were classified as high risk and 3 as intermediate/low risk. Treatment response was worse in the high-risk group (57% reaching CR1) and 4 of 7 patients died. In the low-risk group, all patients responded to treatment and no deaths were recorded.

The combination of our method with an existing genetic risk stratification method (ELN2017) was also evaluated, and it was found that the two could be combined to produce improved stratification compared to using only the genetic method for the patients evaluated here. As the number of patients was limited, finding statistically significant differences in survival was not expected, but the combination of both methods nonetheless stratified patients into 2 groups with significantly different survival outcomes.

Additionally, the method was designed based on data from a retrospective cohort where patient samples were viably frozen, then thawed and washed before LC-MS/MS analysis. Here, the method was evaluated using prospectively collected samples that were not freeze-thawed, demonstrating that the method is compatible with this sample collection technique as well.

Example 3

The protein S100A8 was shown in the prior art (Nicolas, E., Expression of S100A8 in leukemic cells predicts poor survival in de novo AML patients, Leukemia, 2011, 25, 57-65) to have prognostic value in the stratification of AML patients. The present example thus aims to establish whether the protein S100A8 has prognostic value in relation to AML in the HiRIEF data (cohort size n=118) described in Example 2.

Patient classification

The patients were classified according to their expression level of the protein marker. They were initially split into two groups based on the median expression level of the S100A8 protein: patients (n=59) with an expression level above the median were categorized as high expressers (high in Figure 7A), and patients with an expression level below the median were categorized as low expressers (low in Figure 7A). Kaplan-Meier plots were drawn for each of the groups (see Figure 7A). The groups were statistically compared with relation to overall survivability (see Table 14). To test whether the more extreme expression levels of S100A8 could be used for the stratification, the 25 % of the patients with the highest expression (n=30) and the 25 % of the patients with the lowest expression (n=30) of S100A8 were compared with relation to overall survivability (see Table 14 and Figure 7B).
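The two groupings described above can be sketched as follows. The expression values are synthetic, the identifiers are hypothetical, and rounding the quartile size up (29.5 to 30 for n=118) is an assumption chosen because it reproduces the stated group sizes.

```python
import math


def split_by_expression(levels):
    """Split patients into median-based and quartile-based expression groups.

    levels: patient identifier -> relative S100A8 level (hypothetical input)
    """
    ranked = sorted(levels, key=levels.get)  # lowest expression first
    half = len(ranked) // 2
    quarter = math.ceil(len(ranked) / 4)     # assumed rounding; 118 -> 30
    return {
        "low": ranked[:half],
        "high": ranked[half:],
        "bottom25": ranked[:quarter],
        "top25": ranked[-quarter:],
    }


# Synthetic cohort of 118 patients with distinct expression values.
cohort = {f"patient_{i}": float(i) for i in range(118)}
GROUPS = split_by_expression(cohort)
print({name: len(members) for name, members in GROUPS.items()})
```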

Table 14. Summary of S100A8 prognostic effect.

The statistical analysis (Cox regression) revealed no correlation to survival for the individual groups in HiRIEF-data (cohort size n=118).

In Table 14, the top row shows Cox regression using relative protein levels as a continuous variable. The middle row shows the comparison after dividing the cohort into high levels (n=59) vs low levels (n=59). The bottom row shows the most extreme cases (top 25% in terms of S100A8 levels, n=30, vs bottom 25% in terms of S100A8 levels, n=30).

None of these comparisons showed any significant correlation between S100A8 expression level and overall survivability. In fact, in the intermediate term (<800 days) there appears to be a reversed correlation compared to that reported in Nicolas, E., Expression of S100A8 in leukemic cells predicts poor survival in de novo AML patients, Leukemia, 2011, 25, 57-65. Nevertheless, statistical comparisons aimed at detecting short- to intermediate-term correlations did not reveal any significant association between group membership and overall survivability in the short or intermediate term.

The results of the present example clearly show that the protein biomarker S100A8 is not suitable as a prognosis marker for the stratification of AML patients with relation to their overall survivability.