

Title:
PROGRESSION MARKERS FOR COLORECTAL ADENOMAS
Document Type and Number:
WIPO Patent Application WO/2021/015619
Kind Code:
A1
Abstract:
The invention relates to methods for typing an individual suffering from a colorectal lesion, or suspected of suffering therefrom, based on a set of biomarkers. The invention further relates to methods of assigning treatment to an individual having a high risk colorectal adenoma, and to a kit comprising reagents for directly or indirectly determining a level of expression of the biomarkers.

Inventors:
KOMÓR MALGORZATA ANNA (NL)
PINTO MORAIS DE CARVALHO BEATRIZ (NL)
FIJNEMAN REMONDUS JOHANNES ADRIAAN (NL)
JIMENEZ CORNELIA RAMONA (NL)
MEIJER GERRIT ALBERT (NL)
DE WIT MEIKE (NL)
Application Number:
PCT/NL2020/050482
Publication Date:
January 28, 2021
Filing Date:
July 23, 2020
Assignee:
STICHTING HET NEDERLANDS KANKER INST ANTONI VAN LEEUWENHOEK ZIEKENHUIS (NL)
STICHTING VUMC (NL)
International Classes:
G01N33/574
Domestic Patent References:
WO2013162368A1 (2013-10-31)
WO2018217087A1 (2018-11-29)
WO2007134779A1 (2007-11-29)
WO2007068985A2 (2007-06-21)
Foreign References:
US7138226B2 (2006-11-21)
Other References:
BOSCH L.J.W. ET AL.: "Novel stool-based protein biomarkers for improved colorectal cancer screening : A case-control study", ANN. INTERN. MED., vol. 167, no. 12, 855, 21 November 2017 (2017-11-21), pages 1 - 12, XP055649546
KOMOR M. ET AL.: "Proteins in stool as biomarkers for non-invasive detection of colorectal adenomas with high risk of progression", 29 November 2019 (2019-11-29), pages 1 - 36, XP009517572, Retrieved from the Internet DOI: 10.1002/PATH.5369
BRAY ET AL., CA: CANCER J CLIN, vol. 68, 2018, pages 394 - 424
NAVARRO ET AL., WORLD J GASTROENTEROL, vol. 23, 2017, pages 3632 - 3642
KERR ET AL., NZ MED J, vol. 120, 2007, pages U2629
CARROLL, SEAMAN AND HALLORAN, CLIN BIOCHEM, vol. 47, 2014, pages 921 - 39
ZAUBER ET AL., N ENGL J MED, vol. 366, 2012, pages 687 - 96
MANDEL ET AL., J NATL CANCER INST, vol. 91, 1999, pages 434 - 7
ROBERTSON ET AL., GASTROENTEROLOGY, vol. 152, 2017, pages 1217 - 1237
LEE ET AL., ANN INTERN MED, vol. 160, 2014, pages 171
DE WIJKERSLOOTH ET AL., AM J GASTROENTEROL, vol. 107, 2012, pages 1570 - 8
SONG AND LI, WORLD J GASTROINTEST ONCOL, vol. 8, 2016, pages 793 - 800
HAUG ET AL., INT J CANCER, vol. 136, 2015, pages 2864 - 74
IMPERIALE AND KAHI, CLIN GASTROENTEROL HEPATOL, vol. 16, 2018, pages 483 - 485
SHINYA AND WOLFF, ANN SURG, vol. 190, 1979, pages 679 - 83
HERMSEN ET AL., GASTROENTEROLOGY, vol. 123, 2002, pages 1109 - 19
CARVALHO ET AL., GUT, vol. 58, 2009, pages 79 - 89
CARVALHO ET AL., CANCER PREV RES (PHILA), vol. 11, 2018, pages 403 - 412
BOSCH ET AL., ANN INTERN MED, vol. 167, 2017, pages 855 - 866
QUE-GEWIRTH AND SULLENGER, GENE THERAPY, vol. 14, 2007, pages 283 - 291
NORD ET AL., PROT ENG, vol. 8, 1995, pages 601 - 608
SKERRA, FEBS J., vol. 275, 2008, pages 2677 - 2683
SILVERMAN ET AL., NAT BIOTECHNOL, vol. 23, 2005, pages 1556 - 1561
ALESSANDRINI, PEZZE AND CIRIBILLI, SEMIN ONCOL, vol. 44, 2017, pages 239 - 253
DE WIJKERSLOOTH ET AL., BMC GASTROENTEROL, vol. 10, 2010, pages 47
STOOP ET AL., LANCET ONCOL, vol. 13, 2012, pages 55 - 64
BUFFART ET AL., CELL ONCOL, vol. 29, 2007, pages 351 - 9
COX AND MANN, NAT BIOTECHNOL, vol. 26, 2008, pages 1367 - 72
LIU ET AL., ANAL CHEM, vol. 76, 2004, pages 4193 - 201
PHAM ET AL., BIOINFORMATICS, vol. 26, 2010, pages 363 - 9
ROBIN ET AL., BMC BIOINFORMATICS, 2011
Attorney, Agent or Firm:
WITMANS, H.A. (NL)
Claims:

1. A method for typing an individual suffering from a colorectal lesion, or suspected of suffering therefrom, the method comprising

a. providing a sample comprising protein expression molecules from colorectal cells, or suspected to comprise protein expression molecules from colorectal cells; b. determining protein expression levels of a set of genes in said sample; and c. typing said sample on the basis of the determined protein expression levels; whereby said set of genes includes at least haptoglobin (Hp).

2. The method of claim 1, wherein said sample is or comprises stool.

3. The method according to claim 1 or claim 2, wherein protein expression levels are determined with one or more binding molecules directed against said protein expression molecules of the set of genes.

4. The method according to any one of claims 1-3, whereby the set of genes further comprises at least one of LAMP1, SYNE2, ANXA6, LRG1, RBP4 and FN1.

5. The method according to any one of claims 1-4, whereby the set of genes comprises Hp and at least one of LAMP1, SYNE2 and ANXA6.

6. The method according to any one of claims 1-5, whereby the set of genes comprises Hp, LAMP1, SYNE2 and ANXA6.

7. The method according to any one of claims 1-6, whereby said typing distinguishes colorectal adenoma cells with a low risk of progression to cancer from colorectal adenoma cells with a high risk of progression to cancer.

8. The method according to any one of claims 1-4, whereby the set of genes further comprises at least one of LRG1, RBP4 and FN1.

9. The method according to any one of claims 1-4 and 8, whereby the set of genes comprises Hp, LRG1, RBP4 and FN1.

10. The method according to any one of claims 1-4 and 8-9, whereby an enhanced expression level of the protein expression molecules, when compared to the expression level of the protein expression molecules in a control sample, is indicative of the presence of high risk adenoma cells, colorectal cancer cells, or both.

11. A method comprising:

(a) providing a sample comprising protein expression molecules from colorectal cells from an individual;

(b) extracting protein expression molecules from said sample;

(c) reacting said extracted protein expression molecules of a set of genes with at least one binding molecule that binds to at least one of said protein expression molecules; and

(d) quantifying a reaction product between said at least one binding molecule and said extracted protein expression molecules; and

(e) determining a level of expression of said at least one extracted protein expression molecule, based on the quantified reaction product;

whereby said set of genes includes at least haptoglobin (Hp) and at least one of LAMP1, SYNE2, ANXA6, LRG1, RBP4 and FN1, preferably Hp, LAMP1, SYNE2 and ANXA6.

12. A method comprising:

(a) providing a sample comprising extracted protein expression molecules of a set of genes from colorectal cells from an individual;

(b) reacting said extracted protein expression molecules with at least one binding molecule that binds to at least one of said protein expression molecules;

(c) quantifying a reaction product between said at least one binding molecule and said at least one protein expression molecule; and

(d) determining a level of expression of said at least one protein expression molecule, based on the quantified reaction product; whereby said set of genes includes haptoglobin (Hp) and at least one of LAMP1, SYNE2, ANXA6, LRG1, RBP4 and FN1, preferably Hp, LRG1, RBP4 and FN1.

13. A method of assigning treatment to an individual having a high risk colorectal adenoma, comprising

(a) typing an individual according to the methods of any one of claims 1-10;

(b) identifying an individual that is suspected to have a high risk adenoma;

(c) assigning treatment comprising removal of at least part of the high risk adenoma, preferably by polypectomy and/or surgical resection, to the identified individual.

14. The method according to claim 13, further comprising assigning 5-fluorouracil (5-FU), preferably in combination with leucovorin.

15. A kit comprising reagents for directly or indirectly determining a level of expression of protein expression products of haptoglobin (Hp) and at least one of LAMP1, SYNE2, ANXA6, LRG1, RBP4 and FN1, preferably reagents for an immunochemical assay.

Description:
TITLE: Progression markers for colorectal adenomas

1. FIELD OF THE INVENTION

The invention relates to the field of oncology. More specifically, the invention relates to methods for typing a colorectal lesion. The invention provides methods and means for differentiating colorectal adenomas and cancer cells from normal cells, based on biomarkers in stool.

2. BACKGROUND OF THE INVENTION

Colorectal cancer (CRC) is still a major health care problem, representing 6.1% of all diagnosed cancers worldwide (Bray et al., 2018. CA: Cancer J Clin 68: 394-424). Early detection by way of screening is the most efficient method to reduce the burden of CRC and is now applied in most European countries, Canada, regions of Australia and North and South America, and parts of Asia (Navarro et al., 2017. World J Gastroenterol 23: 3632-3642). Such screening aims at detection of disease at a premalignant stage (i.e. colorectal adenoma) or of CRC at a curable stage, and has been proven to reduce CRC mortality rates (Kerr et al., 2007. NZ Med J 120: U2629; Carroll, Seaman, and Halloran, 2014. Clin Biochem 47: 921-39; Zauber et al., 2012. N Engl J Med 366: 687-96). Most population-wide screening programs use a stool-based fecal immunochemical test (FIT) as a triage test for colonoscopy (Mandel et al., 1999. J Natl Cancer Inst 91: 434-7; Robertson et al., 2017. Gastroenterology 152: 1217-1237). In this setting, only a positive FIT is followed by referral for a diagnostic colonoscopy, during which a detected adenoma can be removed or biopsies can be taken.

The FIT has a high sensitivity for CRC (79%), but its sensitivity for colorectal adenomas (31%) is clearly suboptimal, leaving room for improvement (Lee et al., 2014. Ann Intern Med 160: 171; de Wijkerslooth et al., 2012. Am J Gastroenterol 107: 1570-8; Song and Li, 2016. World J Gastrointest Oncol 8: 793-800). In this respect, it has been suggested that increasing the sensitivity for colorectal adenomas is the best approach to making CRC screening more cost-effective and efficient (Haug et al., 2015. Int J Cancer 136: 2864-74; Imperiale and Kahi, 2018. Clin Gastroenterol Hepatol 16: 483-485). Advanced adenomas (AAs), defined as adenomas with a size of ≥10 mm, a villous component and/or high-grade dysplasia, are currently regarded as an intermediate endpoint for CRC, since these AAs are considered to carry a higher risk of developing into a CRC. However, based on the incidence of AAs and CRC, not all AAs are expected to progress: only approximately 5% of all adenomas are expected to develop into cancer (Shinya and Wolff, 1979. Ann Surg 190: 679-83). Therefore, it is important to develop new screening tests that aim to identify the lesions with the highest risk of progression.

Studies on specific changes in DNA copy numbers in colorectal neoplasia have identified cancer-associated events (CAEs), i.e. gains of chromosomal arms 8q, 13q and 20q and losses of 8p, 15q, 17p and 18q, to be associated with adenoma-to-carcinoma progression (Hermsen et al., 2002. Gastroenterology 123: 1109-19; Carvalho et al., 2009. Gut 58: 79-89; Carvalho et al., 2018. Cancer Prev Res (Phila) 11: 403-412). Adenomas carrying two or more of these aberrations are considered to be at high risk of progression and are defined as high-risk adenomas; such adenomas were found in 23%-36% of advanced adenomas and in 1.7%-4.8% of non-advanced adenomas (Carvalho et al., 2018. Cancer Prev Res (Phila) 11: 403-412). Therefore, these CAEs may reflect the true progression risk better than the advanced adenoma phenotype does.
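The high-risk definition above is a simple counting rule over seven arm-level events. A minimal Python sketch of that rule, assuming arm-level gains and losses have already been called from copy-number data (how those calls are made is outside this sketch); the function names are hypothetical, while the list of events and the two-event threshold come from the text:

```python
# Cancer-associated events (CAEs) as listed in the text: gains of three
# chromosome arms and losses of four chromosome arms.
CAE_GAINS = frozenset({"8q", "13q", "20q"})
CAE_LOSSES = frozenset({"8p", "15q", "17p", "18q"})


def count_caes(gains, losses):
    """Count the CAEs among the arm-level gains and losses called for one adenoma.

    A gain only counts if the arm is in CAE_GAINS, and a loss only counts if the
    arm is in CAE_LOSSES (e.g. a loss of 8q or a gain of 8p is not a CAE).
    """
    return len(CAE_GAINS & set(gains)) + len(CAE_LOSSES & set(losses))


def adenoma_risk(gains, losses, min_events=2):
    """Return 'high' if the adenoma carries at least `min_events` CAEs, else 'low'."""
    return "high" if count_caes(gains, losses) >= min_events else "low"


if __name__ == "__main__":
    print(adenoma_risk(gains=["20q"], losses=["18q"]))   # two CAEs -> high
    print(adenoma_risk(gains=["20q", "7p"], losses=[]))  # one CAE -> low
```

Keeping gains and losses in separate sets matters because the same arm can appear in both lists with opposite meaning (8q counts only as a gain, 8p only as a loss).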

A previous study reported on stool protein biomarkers that added sensitivity to hemoglobin and could be detected and quantified in FIT fluid (Bosch et al., 2017. Ann Intern Med 167: 855-866). This approach would involve logistics and costs similar to those of current FIT testing and would therefore be a feasible option to improve on the currently low sensitivity for adenomas.

BRIEF DESCRIPTION OF THE INVENTION

The aim of this study was to identify specific biomarkers, preferably stool-based biomarkers, that complement or even outperform hemoglobin in the detection of molecularly defined high-risk adenomas and colorectal carcinomas (CRCs). The currently widely used FIT has a high sensitivity for CRC, but its sensitivity for AAs remains suboptimal; additional biomarkers could therefore complement hemoglobin and improve the performance of current CRC screening. Based on known cancer-associated events (CAEs) that were previously identified as associated with progression, adenomas were classified as having a low or a high risk of progressing to cancer. Marker panels were identified that significantly outperform HBA1 for the detection of high-risk adenomas and CRCs.

The invention provides a method for typing an individual suffering from a colorectal lesion, or suspected of suffering therefrom, the method comprising a. providing a sample comprising protein expression molecules from colorectal cells, or suspected to comprise protein expression molecules from colorectal cells; b. determining protein expression levels of a set of genes in said sample; and c. typing said sample on the basis of the determined protein expression levels; whereby said set of genes includes at least haptoglobin (Hp).

Said sample preferably is or comprises stool.

In a preferred method of the invention, protein expression levels are determined with one or more binding molecules directed against said protein expression molecules of the set of genes.

A preferred set of genes includes Hp and further comprises at least one of LAMP1, SYNE2, ANXA6, LRG1, RBP4 and FN1, more preferably comprises Hp and at least one of LAMP1, SYNE2 and ANXA6, especially Hp, LAMP1, SYNE2 and ANXA6. Using this set of genes, said typing distinguishes colorectal adenoma cells with a low risk of progression to cancer from colorectal adenoma cells with a high risk of progression to cancer.

A further preferred set of genes includes Hp and further at least one of LRG1, RBP4 and FN1, especially Hp, LRG1, RBP4 and FN1. Using this set of genes, an enhanced expression level of the protein expression molecules, when compared to the expression level of the protein expression molecules in a control sample, is indicative of the presence of high risk adenoma cells, colorectal cancer cells, or both.

A preferred set of genes includes HBA1 and/or HBB (together termed hemoglobin or Hb) and further comprises at least one of LAMP1, SYNE2, ANXA6, LRG1, RBP4 and FN1, more preferably comprises Hb and at least one of LAMP1, SYNE2 and ANXA6, especially Hb, LAMP1, SYNE2 and ANXA6. Using this set of genes, said typing distinguishes colorectal adenoma cells with a low risk of progression to cancer from colorectal adenoma cells with a high risk of progression to cancer.

A further preferred set of genes includes Hb and further at least one of LRG1, RBP4 and FN1, especially Hb, LRG1, RBP4 and FN1. Using this set of genes, an enhanced expression level of the protein expression molecules, when compared to the expression level of the protein expression molecules in a control sample, is indicative of the presence of high risk adenoma cells, colorectal cancer cells, or both.
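The four marker sets and the comparison with a control sample described above can be sketched as follows. This is an illustration only: the panel labels, the function name and the two-fold cut-off are hypothetical, since the text does not state how "enhanced" is thresholded, and in the study itself markers were combined in a logistic regression model rather than judged one by one.

```python
# Marker panels named in the text ("Hb" = HBA1 and/or HBB). The dictionary keys
# are hypothetical labels introduced for this sketch.
PANELS = {
    "Hp/progression": ("Hp", "LAMP1", "SYNE2", "ANXA6"),  # low- vs high-risk adenoma
    "Hp/detection": ("Hp", "LRG1", "RBP4", "FN1"),        # high-risk adenoma/CRC vs control
    "Hb/progression": ("Hb", "LAMP1", "SYNE2", "ANXA6"),
    "Hb/detection": ("Hb", "LRG1", "RBP4", "FN1"),
}


def enhanced_markers(sample, control, panel, min_fold=2.0):
    """Return {gene: fold change} for the panel genes whose level in `sample` is at
    least `min_fold` times their level in `control`.

    `sample` and `control` map gene symbols to measured protein levels in the same
    units; `min_fold` is a hypothetical cut-off, not a value from the text.
    """
    enhanced = {}
    for gene in PANELS[panel]:
        fold = sample[gene] / control[gene]
        if fold >= min_fold:
            enhanced[gene] = fold
    return enhanced


if __name__ == "__main__":
    sample = {"Hp": 10.0, "LRG1": 3.0, "RBP4": 1.0, "FN1": 5.0}
    control = {"Hp": 2.0, "LRG1": 2.0, "RBP4": 1.0, "FN1": 1.0}
    print(enhanced_markers(sample, control, "Hp/detection"))
```

With these made-up levels, Hp and FN1 reach a five-fold increase and are reported, whereas LRG1 (1.5-fold) and RBP4 (unchanged) fall below the assumed cut-off.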

The invention further provides a method comprising (a) providing a sample comprising protein expression molecules from colorectal cells from an individual; (b) extracting protein expression molecules from said sample; (c) reacting said extracted protein expression molecules of a set of genes with at least one binding molecule that binds to at least one of said protein expression molecules; and (d) quantifying a reaction product between said at least one binding molecule and said extracted protein expression molecules; and (e) determining a level of expression of said at least one extracted protein expression molecule, based on the quantified reaction product; whereby said set of genes includes at least haptoglobin (Hp) and at least one of LAMP1, SYNE2, ANXA6, LRG1, RBP4 and FN1, preferably Hp, LAMP1, SYNE2 and ANXA6. As an alternative, said set of genes includes at least Hb and at least one of LAMP1, SYNE2, ANXA6, LRG1, RBP4 and FN1, preferably Hb, LAMP1, SYNE2 and ANXA6.

The invention further provides a method comprising (a) providing a sample comprising extracted protein expression molecules of a set of genes from colorectal cells from an individual; (b) reacting said extracted protein expression molecules with at least one binding molecule that binds to at least one of said protein expression molecules; (c) quantifying a reaction product between said at least one binding molecule and said at least one protein expression molecule; and (d) determining a level of expression of said at least one protein expression molecule, based on the quantified reaction product; whereby said set of genes includes haptoglobin (Hp) and at least one of LAMP1, SYNE2, ANXA6, LRG1, RBP4 and FN1, preferably Hp, LRG1, RBP4 and FN1. As an alternative, said set of genes includes at least Hb and at least one of LAMP1, SYNE2, ANXA6, LRG1, RBP4 and FN1, preferably Hb, LRG1, RBP4 and FN1.

The invention further provides a method of assigning treatment to an individual having a high risk colorectal adenoma, comprising (a) typing an individual according to the methods of the invention; (b) identifying an individual that is suspected to have a high risk adenoma; and (c) assigning treatment comprising removal of at least part of the high risk adenoma, preferably by polypectomy and/or surgical resection, to the identified individual.

The invention further provides a kit comprising reagents for directly or indirectly determining a level of expression of protein expression products of haptoglobin (Hp) and at least one of LAMP1, SYNE2, ANXA6, LRG1, RBP4 and FN1, preferably reagents for an immunochemical assay.

LEGENDS TO THE FIGURES

Figure 1. Overview of the data analysis approach for identifying the best biomarker panel. Feature selection was performed using a beta-binomial test (BB-test) comparing cases with controls, in particular high-risk adenomas vs healthy controls and high-risk adenomas together with CRCs vs healthy controls. Up-regulated proteins were selected using different thresholds for each comparison (see Materials and Methods). Logistic regression with Lasso regularization was applied to build a model based on X features (where X is two, three or four). The data analysis approach was evaluated using leave-one-out cross-validation, in which feature selection with the BB-test and logistic regression with Lasso regularization were repeated in each iteration. The cross-validated performance of the resulting models was evaluated relative to hemoglobin (HBA1) at high specificity levels.

Figure 2. Biomarker panels from logistic regression analysis to identify high-risk adenomas and CRCs. A. ROC curve of the regression model using a four-biomarker panel to distinguish between stool samples from individuals with high-risk adenomas (n = 15) and healthy controls (n = 129). The ROC curve was obtained from logistic regression predictions in the leave-one-out cross-validation analysis. The partial area under the curve (pAUC) was calculated for specificities of 95%-100% and compared with the pAUC of hemoglobin to obtain the p-value. B. Frequency plot of biomarkers occurring in the regression models built during the cross-validation analysis to distinguish between high-risk adenomas and healthy controls. Four proteins were clearly selected more frequently by the Lasso regularization in the cross-validation analysis.

Figure 3. Biomarker panels from logistic regression analysis to identify high-risk adenomas and CRCs. A. ROC curve of the model based on the panel of four biomarkers for high-risk adenomas and CRCs (n = 94) compared with healthy controls (n = 129). The ROC curve was obtained from logistic regression predictions in the leave-one-out cross-validation analysis. B. Frequency plot of biomarkers occurring in the regression models built during the cross-validation analysis to discriminate high-risk adenomas and CRCs from healthy controls based on four proteins. Four proteins were clearly selected more frequently by the Lasso regularization in the cross-validation analysis. C. ROC curve of the model based on the panel of two biomarkers for high-risk adenomas and CRCs (n = 94) compared with healthy controls (n = 129). The ROC curve was obtained from logistic regression predictions in the leave-one-out cross-validation analysis. D. Frequency plot of biomarkers occurring in the regression models built during the cross-validation analysis to discriminate high-risk adenomas and CRCs from healthy controls based on two proteins. The same two proteins were consistently selected in the cross-validation analysis.

Figure 4. Validation of Hp protein expression using the MSD immunoassay. A. The study series. B. The validation series.
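The analysis outlined in the legends of Figures 1-3 (feature selection and Lasso-regularized logistic regression repeated inside leave-one-out cross-validation, followed by a partial AUC at 95%-100% specificity) can be sketched in plain Python as below. This is a hedged illustration, not the authors' pipeline: the beta-binomial test is replaced by a simple mean-difference ranking, the model is fitted by proximal gradient descent (ISTA) rather than a dedicated Lasso solver, and `k`, `lam`, the step size and all function names are hypothetical. The pAUC is not standardized, and the comparison with hemoglobin that yields the p-value is not reproduced; dedicated tools such as the pROC package (Robin et al., 2011) cover both.

```python
import math
from collections import Counter


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_lasso_logistic(X, y, lam=0.05, lr=0.1, n_iter=2000):
    """L1-penalised (Lasso) logistic regression fitted by proximal gradient
    descent (ISTA). Returns (weights, intercept); the intercept is not penalised."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        gw, gb = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gb += err / n
            for j in range(p):
                gw[j] += err * xi[j] / n
        b -= lr * gb
        # gradient step, then soft-thresholding: the proximal step of the L1 penalty
        steps = [wj - lr * gj for wj, gj in zip(w, gw)]
        w = [math.copysign(max(abs(s) - lr * lam, 0.0), s) for s in steps]
    return w, b


def select_upregulated(X, y, k):
    """Stand-in for the beta-binomial test: keep up to k features with the largest
    positive difference in mean level between cases (y=1) and controls (y=0)."""
    cases = [x for x, t in zip(X, y) if t == 1]
    ctrls = [x for x, t in zip(X, y) if t == 0]
    diffs = []
    for j in range(len(X[0])):
        d = sum(x[j] for x in cases) / len(cases) - sum(x[j] for x in ctrls) / len(ctrls)
        diffs.append((d, j))
    diffs.sort(reverse=True)
    return [j for d, j in diffs[:k] if d > 0]


def loocv(X, y, names, k=2, lam=0.05):
    """Leave-one-out cross-validation. Feature selection and model fitting are
    repeated in every fold, so the held-out sample never influences which
    markers are chosen. Returns held-out probabilities and selection counts."""
    preds, chosen = [], Counter()
    for i in range(len(X)):
        train = [j for j in range(len(X)) if j != i]
        y_tr = [y[j] for j in train]
        feats = select_upregulated([X[j] for j in train], y_tr, k)
        if not feats:  # nothing up-regulated in this fold: uninformative score
            preds.append(0.5)
            continue
        w, b = fit_lasso_logistic([[X[j][f] for f in feats] for j in train], y_tr, lam=lam)
        chosen.update(names[f] for f, wf in zip(feats, w) if wf != 0.0)
        preds.append(sigmoid(b + sum(wf * X[i][f] for f, wf in zip(feats, w))))
    return preds, chosen


def partial_auc(scores, labels, max_fpr=0.05):
    """Unstandardised ROC area over FPR in [0, max_fpr], i.e. specificity >= 1 - max_fpr.
    Labels are 1 (case) or 0 (control); higher scores indicate cases."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pairs = sorted(zip(scores, labels), key=lambda t: -t[0])
    pts, tp, fp, i = [(0.0, 0.0)], 0, 0, 0
    while i < len(pairs):
        s = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == s:  # tied scores share one ROC point
            tp += pairs[i][1]
            fp += 1 - pairs[i][1]
            i += 1
        pts.append((fp / n_neg, tp / n_pos))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:  # interpolate the curve at the FPR cut-off
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2
    return area
```

For a matrix `X` of marker levels (rows are samples, columns are markers), `preds, chosen = loocv(X, y, names)` followed by `partial_auc(preds, y)` gives a cross-validated pAUC at 95%-100% specificity, and `chosen` counts how often each marker kept a non-zero weight across folds, which corresponds to the frequency plots in Figures 2B and 3B. The fixed step size assumes features on a similar, modest scale, so real levels would normally be log-transformed and standardized first.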

DETAILED DESCRIPTION OF THE INVENTION

Definitions

The term “lesion”, as is used herein, refers to a cancerous growth of epithelial tissue that covers or lines surfaces of the colorectal tract. Said cancerous growth preferably is an adenocarcinoma. The term lesion includes early adenoma, advanced adenoma and colorectal cancer.

The term “adenoma”, as is used herein, refers to a benign tumor of epithelial tissue with glandular origin, glandular characteristics, or both. Said adenoma preferably is a colorectal adenoma, also referred to as an adenomatous polyp.

The term “typing”, as is used herein, refers to assessing the presence and/or stage of a lesion. The term typing preferably refers to differentiating adenomas that are at risk of developing into a carcinoma, including early adenomas and, especially, advanced adenomas. The term typing also refers to discriminating colorectal cancer from non-cancerous growth, including normal colorectal tissue. Said typing is intended to provide information to aid in the clinical evaluation of patients. The methods of the invention find particular use in choosing appropriate treatment for said patients.

The term “protein expression molecules”, as is used herein, refers to protein products of genes, or parts of these products.

The term “directly conjugated with a detectable label”, as used herein, refers to the labeling of the antibody itself with a detectable label.

The term “indirectly conjugated with a detectable label”, as used herein, refers to the indirect labeling of an antibody, for example using a biotin-labelled antibody and a detectable label that is bound to streptavidin, or by using a further antibody that is directed against the indirectly labeled antibody and which further antibody is labeled with a detectable label.

The term “binding molecule”, as used herein, refers to a molecule, preferably a proteinaceous molecule such as a protein, that can specifically bind to a target epitope that is present in a protein expression molecule. Said binding molecule preferably is an antibody.

The term “antibody”, as used herein, includes classical VH region-containing proteins that may be paired with a light chain variable region (VL). The term antibody also includes synthetic antibody-like molecules or antibody mimics that are known to those skilled in the art, such as APTAMERS (Que-Gewirth and Sullenger, 2007. Gene Therapy 14: 283-291), AFFIBODY® molecules (Nord et al., 1995. Prot Eng 8: 601-608), ANTICALINS® (Skerra, 2008. FEBS J. 275: 2677-2683), and AVIMERS® (Silverman et al., 2005. Nat Biotechnol 23: 1556-1561). The term also encompasses a single domain antibody, a single chain antibody, a nanobody, a unibody, a single chain variable fragment (scFv), an Fd fragment, a Fab fragment and an F(ab')2 fragment.

A "detectable label" is a label which may be detected and of which the absolute or relative amount and/or location (for example, the location on an array) can be determined.

The term “reference”, as is used herein, refers to a sample, preferably a stool sample, that comprises protein expression molecules, preferably proteins, from a healthy individual not suffering from a colorectal lesion, or from an individual that is known to suffer from a high risk adenoma or from a carcinoma. The levels of expression of the protein expression molecules are preferably stored on a computer, or on computer-readable media, to be used in comparisons with expression level data from a sample of an individual.

The term “specifically binding”, as is used herein, refers to a binding reaction between an antibody and an antigen, or another binding pair, which is determinative of the presence of a protein comprising the antigen in a heterogeneous population of proteins and/or other biologics. Thus, under designated conditions, a specified antibody or functional part thereof binds to a particular antigen and does not bind in a significant amount to other proteins present in the sample.

The term “polypectomy”, as is used herein, refers to the partial or complete removal of an adenoma.

The term “enzyme-linked immunosorbent assay (ELISA)”, as is used herein, refers to a plate-based assay that is designed for detecting and quantifying antigens such as protein expression molecules.

Sample preparation

A sample from an individual suffering from a colorectal lesion, or suspected to suffer therefrom, comprising protein expression molecules can be obtained in numerous ways, as is known to a skilled person, such as by esophagogastroduodenoscopy, colonoscopy, or sigmoidoscopy.

Said sample preferably is or comprises stool from an individual suffering from a colorectal lesion, or suspected to suffer from said lesion. A preferred sample is a sample that is obtained from stool by contacting a stool surface, for example with a stick or a brush, and providing a part of the obtained sample in a test tube or on an absorbent surface, for example a test card. Said test tube preferably comprises a buffer, for example a stool stabilization buffer such as a buffer comprising phosphate-buffered saline and sodium azide. A sample comprising protein expression molecules can be freshly prepared at the moment of isolation of the specimen, or it can be prepared from specimens that have been stored, for example at -20°C, until processing for sample preparation. Alternatively, said stool specimen can be stored under conditions that preserve the quality of the protein expression products. Examples of such preservative conditions are fixation, addition of protease inhibitors, addition of reducing agents such as dithiothreitol (DTT) or 2-mercaptoethanol (2-ME), and non-aqueous solutions such as Universal Molecular Fixative (Sakura Finetek USA Inc.; US7138226).

A stool sample may be mixed with a stool stabilization buffer (for example Exact Sciences, Madison, WI, USA) after defecation, preferably immediately after defecation, and processed to a final stool:buffer w/v ratio of between 1:1 and 1:2, preferably between 1:2 and 1:7, more preferably about 1:4, within 72 hours, and stored at -80°C until use.

Said sample preferably is pretreated to remove contaminants and/or to increase the concentration of the protein expression molecules. This will result in a lower detection limit and will improve reliability of the methods of the invention.

A preferred pretreatment method comprises homogenization in a buffer, for example by vortexing, followed by centrifugation, for example for 15 minutes at 16,000 × g. The supernatant may then be centrifuged for 10 minutes at full speed. Supernatants may be filtered, for example through a 0.22 µm PVDF filter (Merck Millipore, Billerica, MA, USA), and concentrated using a molecular size cut-off filter, for example a 3 kDa cut-off filter (Amicon Ultra, Merck Millipore, Billerica, MA, USA).
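Centrifugation steps are given either as relative centrifugal force (16,000 × g) or as "full speed", and the rotor speed that delivers a given force depends on the rotor radius. A small sketch using the standard relation RCF = 1.118 × 10⁻⁵ × r × N², with r in cm and N in rpm; the function names and the 8.4 cm radius in the example are assumptions for illustration, not values from the text:

```python
import math

# Standard conversion between relative centrifugal force (RCF, in x g) and rotor
# speed: RCF = 1.118e-5 * r * N**2, with r the rotor radius in cm and N in rpm.
RCF_COEFFICIENT = 1.118e-5


def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) produced at `rpm` for a rotor radius in cm."""
    return RCF_COEFFICIENT * radius_cm * rpm ** 2


def rpm_for_rcf(rcf, radius_cm):
    """Rotor speed (rpm) needed to reach `rcf` (x g) at the given radius."""
    return math.sqrt(rcf / (RCF_COEFFICIENT * radius_cm))


if __name__ == "__main__":
    # 8.4 cm is an assumed rotor radius for illustration only.
    print(round(rpm_for_rcf(16000, 8.4)))
```

For the assumed 8.4 cm radius, 16,000 × g corresponds to a little over 13,000 rpm; a rotor with a larger radius reaches the same force at a lower speed.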

Detection assay

An expression level of a protein expression molecule may be determined by any assay known to a skilled person. A level of expression may be determined by polyacrylamide gel electrophoresis, including two-dimensional gel electrophoresis, multidimensional protein identification technology, ELISA, bead-based immunoassays, immuno-PCR using, for example, the Thunder-Link® antibody-oligonucleotide conjugation kit (Innova Biosciences, Cambridge, UK), surface plasmon resonance, liquid chromatography-tandem mass spectrometry (LC-MS/MS), and matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF). Examples of suitable assays are chemiluminescence assays, fluorescence assays, mass spectrometry, affinity chromatography, Western blotting, Northern blotting, histology, protein expression chips and probes. Preferred are multiplex systems that can measure protein expression molecules from different genes at the same time.

Mass spectrometry is a suitable means of determining a level of expression of a protein. A preferred method comprises liquid chromatography coupled to tandem mass spectrometry in positive electrospray ionization mode. The LC-MS/MS analysis may be performed, for example, using an I-Class UPLC system connected to a Xevo TQ-S mass spectrometer (Waters, Manchester, UK), or using a Q Exactive mass spectrometer (Thermo Fisher). A suitable multiplex system for determining an expression level of a protein product is multiple reaction monitoring (MRM), which is a quantitative MS-based approach.

Said protein expression molecules are preferably detected and quantified using an immunochemical assay, preferably employing binding molecules such as antibodies that specifically bind to a ligand on said protein expression molecules, preferably proteins. A protein expression molecule is an antigen for a binding molecule that specifically reacts with said protein expression molecule.

The binding molecules, preferably antibodies, are preferably coupled to a solid support such as a bead, monolithic material or a multi-well array. The binding molecules, preferably antibodies, may be coupled directly, or indirectly, for example by coupling of a second binding molecule that specifically recognizes the first binding molecule that binds to a protein expression molecule. Indirect coupling may be accomplished, for example, by coupling of antibody-binding molecules such as protein A, protein G, or a mixture of protein A and G to beads, monolithic material or array. Direct coupling may be accomplished, for example, by cross-linking, covalently binding or physically adsorbing said binding molecule, preferably antibody, to the solid support.

Preferred methods for determining a level of expression of a protein or multiple proteins include Enzyme-Linked Immunosorbent Assay (ELISA) and Flow Cytometric Immunoassay (FCIA).

In a competition ELISA, known amounts of an antigen are immobilized to a surface. A sample comprising unknown amounts of said antigen is added, and the antigen is subsequently complexed with a binding molecule that is preferably conjugated, directly or indirectly, to a detectable label such as a colorimetric label, a fluorescent label, a radioactive label or a chemiluminescent label, or to an enzyme. Following washing, detection of the binding molecule that is complexed to the immobilized antigen is accomplished by assessing the conjugated label or enzyme activity via incubation with a substrate to produce a measurable product. The amount of label or enzyme activity is inversely proportional to the amount of antigen in the sample.

A preferred assay is a sandwich ELISA, in which a receptacle is coated with a first binding molecule that is specific to a protein expression molecule, termed “capture binding molecule”, and detection of bound protein expression molecule is accomplished with a second binding molecule, termed “detection binding molecule”. It is preferred that the capture and detection binding molecules do not interfere with each other and can bind simultaneously to said protein expression molecule.

Said coating of a receptacle or bead, preferably the surface of a receptacle or bead, may be performed directly or indirectly. Indirect coating may be accomplished, for example, by using a biotin-labeled capture binding molecule that is attached to a linker molecule, for example a U-PLEX Linker (Meso Scale Discovery, Rockville, USA). The employment of different linker molecules for different capture antibodies allows the generation of arrayed spots on a receptacle, each of which will bind to a specific protein expression molecule. Said receptacle preferably is a multi-well plate, such as a 24 well plate, a 96 well plate, a 192 well plate, or a 384 well plate, in which each of the wells comprises arrayed spots, whereby each of the spots will bind to a specific protein expression molecule.

Said second binding molecule is preferably conjugated, directly or indirectly, to a detectable label such as a colorimetric, fluorescent, radioactive or chemiluminescent label, or to an enzyme. The amount of enzyme-conjugated binding molecule is preferably detected by incubation with a substrate to produce a measurable product. As an alternative, turbidimetric assays are preferred, especially for competition ELISAs.

Detectable labels are well known in the art. A detectable label may be a fluorescent, luminescent, chemiluminescent and/or electrochemiluminescent moiety which, when exposed to specific conditions, may be detected. For example, a fluorescent label may be exposed to radiation (i.e. light) at a specific wavelength and intensity to cause excitation of the fluorescent label, thereby enabling it to emit detectable fluorescence at a specific wavelength that may be detected.

Alternatively, the detectable label may be an enzyme which is capable of converting a (preferably undetectable) substrate into a detectable product that can be visualized and/or detected. Suitable enzymes include horseradish peroxidase, phosphatase, phosphatase/pyrophosphatase and luciferase.

Alternatively, the detectable label may be a radioactive label, which may be incorporated by methods known in the art.

Indirect labeling of a binding molecule may be accomplished, for example, by conjugating the binding molecule with biotin and reacting the biotin with labelled or enzyme-linked avidin or streptavidin.

As an alternative, carbon-coated wells may be equipped with electrodes that produce chemical energy when subjected to an electrical charge, such as the Multi-array® and Multi-spot® 96-well plates of Meso-Scale Discovery. When combined with a SULFO-TAG® labeled antibody, the chemical energy is transformed into emitted light, which is measured using a high-resolution CCD camera.

Reference

A level of expression of protein expression molecules is preferably compared with a level of expression of said molecules in a reference. Said reference preferably comprises a stool sample from an individual that is known to suffer from a high risk adenoma or a colorectal cancerous growth, or known not to suffer therefrom.

Based on a comparison with the level of expression of the at least two protein expression molecules, preferably proteins, in the reference, it can be determined whether an individual is likely to have a high risk adenoma or to suffer from a colorectal cancerous growth. For example, when the reference is a sample from a person who is known not to suffer from a high risk adenoma or a colorectal cancerous growth, a difference between the level of expression of the at least two protein expression molecules, preferably proteins, determined in the individual's sample and the level in the reference may indicate that the individual suffers from a high risk adenoma or a colorectal cancerous growth.

Typing of a sample can be performed in various ways. In one method, a coefficient is determined that is a measure of a similarity or dissimilarity of a sample with said reference. A number of different coefficients can be used for determining a correlation between the determined expression levels in a sample from an individual and the comparative levels of expression in said reference.

Preferred methods are parametric methods which assume a normal distribution of the data.
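The text does not specify which coefficient is used. As a minimal sketch, assuming Pearson's correlation coefficient (a parametric measure) between a sample's marker profile and each reference profile, a sample can be assigned to the reference it most resembles. The function names and marker levels below are hypothetical and for illustration only.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def type_sample(sample, references):
    """Return the reference label whose profile correlates best with the sample,
    together with all correlation scores."""
    scores = {label: pearson(sample, profile) for label, profile in references.items()}
    return max(scores, key=scores.get), scores


# Hypothetical log-scale levels for Hp, LAMP1, SYNE2 and ANXA6 (illustrative only).
references = {
    "high_risk": [3.0, 2.5, 2.0, 2.8],
    "control": [1.0, 2.0, 2.4, 1.2],
}
label, scores = type_sample([2.9, 2.2, 1.9, 2.6], references)
print(label, scores)
```

A correlation-based coefficient compares the shape of the profile rather than absolute levels; a distance measure would be the alternative if absolute abundance matters more.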

Markers

A preferred set of protein expression molecules comprises at least expression products from haptoglobin (Hp). The haptoglobin gene is identified by references 5141 (HGNC database) and ENSG00000257017 (Ensembl data base). The protein expression product is identified by UniProtKB reference number P00738. The higher abundance of Hp protein was validated in 953 fecal immunochemical test (FIT) samples. The Hp levels were significantly increased in high-risk adenoma FIT samples, when compared to healthy controls (p=0.036 and 9e-5, in the study and validation series, respectively).

Haptoglobin protein is known to interact with hemoglobin. Upon hemolysis, free hemoglobin can accumulate in the kidney and be excreted in the urine. Haptoglobin captures and binds free plasma hemoglobin, allowing hepatic recycling of heme iron and preventing kidney damage. Haptoglobin also acts as an antioxidant, has antibacterial activity, and plays a role in modulating many aspects of the acute phase response.

Hp protein was consistently and significantly more abundant in high-risk adenomas and CRCs than in controls. Hp protein was selected in the three panels, whereas hemoglobin (HBA1, HBB or HBD) was not. That hemoglobin levels were not significantly different between high-risk adenomas and controls is in line with the limited sensitivity of FIT for adenomas. Although one might expect Hp to be merely a marker of blood in stool, and therefore to have no complementary value to hemoglobin, our data show that Hp may be of added value, especially for the detection of high-risk adenomas.

Besides Hp, Lysosomal Associated Membrane Protein 1 (LAMP1), Spectrin Repeat Containing Nuclear Envelope Protein 2 (SYNE2) and Annexin A6 (ANXA6) were selected in the analysis for high-risk adenomas only, and Leucine Rich Alpha-2-Glycoprotein 1 (LRG1), Retinol Binding Protein 4 (RBP4) and Fibronectin 1 (FN1) in the analysis for high-risk adenomas and CRC combined.

A preferred set of genes for typing an individual suffering from a colorectal lesion, or suspected of suffering therefrom, comprises Hp and at least one of LAMP1, SYNE2, ANXA6, LRG1, RBP4, and FN1.

Said set of genes preferably comprises Hp and LAMP1; Hp and SYNE2; Hp and ANXA6; Hp, LAMP1 and SYNE2; Hp, SYNE2 and ANXA6; Hp, LAMP1 and ANXA6; or Hp, SYNE2, LAMP1 and ANXA6. Said set of genes is preferably used to distinguish colorectal adenoma cells with a low risk of progression to cancer from colorectal adenoma cells with a high risk of progression to cancer.
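The preferred sets listed above consist of Hp combined with every non-empty subset of LAMP1, SYNE2 and ANXA6, seven panels in total. A short sketch that enumerates them, using a hypothetical helper `panels`:

```python
from itertools import combinations


def panels(anchor, partners):
    """All panels made of the anchor marker plus one or more partner markers."""
    result = []
    for k in range(1, len(partners) + 1):
        for combo in combinations(partners, k):
            result.append((anchor,) + combo)
    return result


hp_panels = panels("Hp", ["LAMP1", "SYNE2", "ANXA6"])
for panel in hp_panels:
    print(", ".join(panel))
```

Calling `panels("Hb", ["LAMP1", "SYNE2", "ANXA6"])` or `panels("Hp", ["RBP4", "FN1", "LRG1"])` enumerates the corresponding hemoglobin-based and CRC-oriented sets in the same way.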

An additional set of genes comprises hemoglobin (Hb, identified by HBA1 and HBB) and at least one of LAMP1, SYNE2, ANXA6, LRG1, RBP4 and FN1, preferably Hb and LAMP1; Hb and SYNE2; Hb and ANXA6; Hb, LAMP1 and SYNE2; Hb, SYNE2 and ANXA6; Hb, LAMP1 and ANXA6; or Hb, SYNE2, LAMP1 and ANXA6. Said set of genes is preferably used to distinguish colorectal adenoma cells with a low risk of progression to cancer from colorectal adenoma cells with a high risk of progression to cancer.

The hemoglobin alpha 1 (HBA1) gene is identified by references 4823 (HGNC database) and ENSG00000206172 (Ensembl data base). The protein expression product is identified by UniProtKB reference number P69905.

The hemoglobin beta (HBB) gene is identified by references 4827 (HGNC database) and ENSG00000244734 (Ensembl data base). The protein expression product is identified by UniProtKB reference number P68871.

LAMP1, also termed CD107 Antigen-Like Family Member A, is a lysosome-associated membrane protein that has been implicated in several tumor-promoting activities, such as promotion of metastasis, drug resistance and cancer cell survival (Alessandrini, Pezze, and Ciribilli, 2017. Semin Oncol 44: 239-253).

In addition, the gene coding for LAMP1 is located on chromosome arm 13q, a region that is commonly gained; 13q gain is one of the seven CAEs. The LAMP1 gene is identified by references 6499 (HGNC database) and ENSG00000185896 (Ensembl data base). The protein expression product is identified by UniProtKB reference number P11279.

SYNE2 (or Nesprin 2) is a nuclear envelope protein involved in the regulation of nuclear trafficking. The role of SYNE2 in cancer is yet to be established, although there are indications that it is pivotal for the DNA damage response (Kelkar et al., 2015). Since high-risk adenomas are characterized by chromosomal gains and losses, SYNE2 might be upregulated in response to the associated DNA damage. The SYNE2 gene is identified by references 17084 (HGNC database) and ENSG00000054654 (Ensembl data base). The protein expression product is identified by UniProtKB reference number Q8WXH0.

ANXA6 is present at the cell membrane and in endosomal compartments, where it functions as a multifunctional scaffolding protein. In that position, ANXA6 can contribute to many different processes, including cancer cell migration and invasion (Grewal et al., 2017). ANXA6 belongs to a family of calcium-dependent membrane and phospholipid binding proteins. The ANXA6 gene is identified by references 544 (HGNC database) and ENSG00000197043 (Ensembl data base). The protein expression product is identified by UniProtKB reference number P08133.

At the specificity level of 95% the biomarker panel could identify 8 out of 15 high-risk adenomas (sensitivity = 54%), which was significantly more than hemoglobin (p-value = 0.05, see Table 2A).

As an alternative, or in addition, said set of genes preferably comprises Hp and RBP4; Hp and FN1; Hp and LRG1; Hp, RBP4 and FN1; Hp, FN1 and LRG1; Hp, RBP4 and LRG1; or Hp, FN1, RBP4 and LRG1. Said set of genes is preferably used to distinguish high risk adenoma cells, colorectal cancer cells, or both, from normal colorectal cells or from low risk adenoma cells.

RBP4 has been linked to insulin resistance, has been shown to be present in serum, and was previously described as a potential stool marker, especially for advanced adenomas (AAs) (Bosch et al., 2017. Ann Intern Med 167: 855-866). The RBP4 gene is identified by references 9922 (HGNC database) and ENSG00000138207 (Ensembl data base). The protein expression product is identified by UniProtKB reference number P02753.

FN1 is an extracellular matrix protein involved in cell adhesion and migration; the protein has been shown to be present in serum and has been suggested as a biomarker for hepatocellular carcinoma (Kim et al., 2017). The FN1 gene is identified by references 3778 (HGNC database) and ENSG00000115414 (Ensembl data base). The protein expression product is identified by UniProtKB reference number P02751.

Finally, LRG1 was reported to be highly upregulated in CRC, at both the mRNA and the protein level (Zhou et al., 2017; Choi et al., 2013). A clear role in tumor development has been established for LRG1, as it stimulates proliferation and inhibits apoptosis through regulation of RUNX1 expression. In addition to this tumor-promoting role, the protein is secreted and may therefore end up in the blood, or in stool passing the tumor. Indeed, increased plasma levels of LRG1 protein have been reported in patients with colorectal cancer and colon adenoma (Ladd et al., 2012; Zhang et al., 2018; Zhou et al., 2017). The LRG1 gene is identified by references 29480 (HGNC database) and ENSG00000171236 (Ensembl data base). The protein expression product is identified by UniProtKB reference number P02750.

A further preferred set of genes comprises Hb and RBP4; Hb and FN1; Hb and LRG1; Hb, RBP4 and FN1; Hb, FN1 and LRG1; Hb, RBP4 and LRG1; or Hb, FN1, RBP4 and LRG1. Said set of genes is preferably used to distinguish high risk adenoma cells, colorectal cancer cells, or both, from normal colorectal cells or from low risk adenoma cells.

As indicated in Figure 3, the cross-validated pAUCs of the four-protein (pAUC = 70.4%) and two-protein (pAUC = 71.1%) models significantly outperformed hemoglobin (pAUC HBA1 = 62.7%; p-value = 0.007 for both). At a specificity of 95%, the biomarker panels identified 58 and 62 out of 94 high-risk adenomas and CRCs combined for the four- and two-feature models, respectively (sensitivity = 62% and 66%), significantly more than HBA1 (sensitivity = 40%; p-value = 5.57e-3 and 4.74e-4, respectively; see Table 2B).
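Sensitivity at 95% specificity is the fraction of cases scoring above a cut-off chosen so that at most 5% of controls exceed it. The sketch below uses a hypothetical helper to set such a cut-off from control scores, and recomputes the reported percentages from the stated counts. The cut-off rule is an assumption; the text does not state how ties or interpolation were handled.

```python
import math


def sensitivity_at_specificity(case_scores, control_scores, specificity=0.95):
    """Fraction of cases above a cut-off that keeps at least `specificity`
    of the controls at or below it (higher score = more suspicious)."""
    controls = sorted(control_scores)
    n = len(controls)
    # Number of controls that may be (falsely) called positive.
    allowed_fp = math.floor(n * (1 - specificity) + 1e-9)
    # Cases must score strictly above this control value.
    cutoff = controls[n - 1 - allowed_fp]
    detected = sum(1 for s in case_scores if s > cutoff)
    return detected / len(case_scores)


# Reported counts: 58 and 62 of 94 high-risk adenomas and CRCs detected.
for model, detected in (("four-protein", 58), ("two-protein", 62)):
    print(f"{model}: {detected}/94 = {100 * detected / 94:.0f}%")
```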

We conclude that novel biomarker proteins, with potentially important roles in colorectal carcinogenesis, have been identified that show higher sensitivities for high-risk adenomas and CRCs than HBA1. Our data suggest that these biomarker panels can be used to improve current FIT-based screening approaches.

Treatment

The invention further provides a method of assigning treatment to an individual having a high risk colorectal adenoma or a colorectal cancer, comprising (a) typing an individual according to the methods of the invention, (b) identifying an individual that is suspected to have a high risk adenoma; and (c) assigning treatment comprising removal of at least part of the high risk adenoma, preferably by polypectomy and/or surgical resection, to the identified individual.

The identification of the biomarkers indicated herein above allows not only the detection of a high risk adenoma, but also enables methods of treating an individual with said high risk adenoma before it progresses into adenocarcinoma. Early diagnosis of high risk adenoma often allows for curative surgical removal of the adenoma, whereas such curative surgical removal may not be possible if diagnosis is delayed.

Said treatment preferably comprises colonoscopy or polypectomy if said individual is classified as having a high risk adenoma, comprising removal of at least part of the high risk adenoma, preferably complete removal of the high risk adenoma.

Said removal is preferably followed by an intensified surveillance program, according to which a patient from whom a high risk adenoma has been removed is repeatedly screened at intervals of less than 5 years, preferably less than 3 years, more preferably once yearly.

Therefore, provided is a method of assigning or prescribing treatment to an individual whose adenoma is typed as a high risk adenoma according to the typing methods of this invention, said method comprising (a) classifying said individual as having a high risk adenoma or as not having a high risk adenoma according to a method of the invention, and (b) assigning treatment comprising polypectomy if said individual is classified as having said high risk adenoma.

A method of assigning or prescribing treatment may further comprise complete removal of the adenoma by surgery. Detection of a high risk adenoma using a method of the invention is preferably followed by polypectomy and complete removal of the lesion. Further provided is a method of assigning or prescribing treatment to an individual who is typed as having a colorectal carcinoma according to the typing methods of this invention, said method comprising (a) classifying said individual as having a colorectal carcinoma or as not having a colorectal carcinoma according to a method of the invention, and (b) assigning treatment comprising polypectomy if said individual is classified as having said high risk adenoma, and/or (chemo)therapy, such as by assigning or administering 5-fluorouracil (5-FU), preferably in combination with leucovorin, or by assigning or administering capecitabine and/or oxaliplatin and/or irinotecan.

Therapeutic agents used to treat a colorectal carcinoma include monoclonal antibodies, small molecule inhibitors and chemotherapeutic agents.

Typical therapeutic monoclonal antibodies include, but are not limited to, bevacizumab, cetuximab and panitumumab. Typical small molecule inhibitors include, but are not limited to, erlotinib, sorafenib and alisertib. Typical chemotherapeutic agents include, but are not limited to, 5-fluorouracil (5-FU), preferably in combination with leucovorin, capecitabine, irinotecan and/or oxaliplatin. A preferred treatment comprises 5-FU in combination with leucovorin and oxaliplatin, or 5-FU in combination with leucovorin and irinotecan.

A further preferred treatment comprises capecitabine. Capecitabine may be used as adjuvant treatment, as monotherapy, or in combination with other agents for advanced or metastatic disease. Capecitabine may be used with either irinotecan or oxaliplatin, or used to replace 5-FU in any one of the above indicated combination treatments.

Any of the above indicated combinations comprising 5-FU or capecitabine may be combined with one or more of cetuximab, bevacizumab and panitumumab.

Combination therapies of, for example, a therapeutic monoclonal antibody and a small molecule inhibitor may be used. Thus, any combination of two or more of a monoclonal antibody, a small molecule inhibitor and a chemotherapeutic agent is envisaged.

Kits

The invention further provides a kit for determining whether an individual has an adenoma with a high risk of progressing to an adenocarcinoma, the kit comprising a device for collecting a sample comprising colorectal cells from said individual, and reagents for directly or indirectly determining a level of expression of at least Hp, preferably Hp and at least one of LAMP1, SYNE2, ANXA6, LRG1, RBP4 and FN1, or Hb and at least one of LAMP1, SYNE2, ANXA6, LRG1, RBP4 and FN1, in said sample, preferably reagents for an immunochemical assay.

The kit for performing the method according to the invention may be selected from any suitable assay and data processing apparatus and equipment.

Said reagents for determining a level of expression of at least Hp preferably are reagents for an immunochemical assay.

It is preferred that the reagents for determining a level of expression of at least Hp include a receptacle that is coated with antibodies against at least Hp, or monolithic material or microbeads that are coated with antibodies against at least Hp, allowing detection of a level of expression of at least Hp.

Said receptacle preferably is an array, comprising a solid support and antibodies against at least Hp that are immobilized on the solid support in an arrayed format. The solid support is typically glass or a polymer, the most commonly used polymers being cellulose, polyacrylamide, nylon, polystyrene, polyvinyl chloride or polypropylene. The solid support may be in the form of tubes, beads, discs, silicon chips, microplates, polyvinylidene difluoride (PVDF) membrane, nitrocellulose membrane, nylon membrane, other porous membrane, non-porous membrane (e.g. plastic, polymer, polymethyl methacrylate, silicon), a plurality of polymeric pins, a plurality of microtitre wells, or any other surface suitable for immobilising proteins, antibodies and other suitable molecules and/or for conducting an immunoassay. Using well-known techniques, such as contact or non-contact printing, masking or photolithography, the location of each spot on said solid support can be defined.

Said monolithic material or microbeads are preferably coated with antibodies against at least Hp, more preferably against Hp and at least 1 further gene expression product of a gene selected from LAMP1, SYNE2, ANXA6, LRG1, RBP4 and FN1, more preferably against Hp and at least 2 further gene expression products of LAMP1, SYNE2, ANXA6, LRG1, RBP4 and FN1, most preferably against Hp and at least 3 further gene expression products of LAMP1, SYNE2, ANXA6, LRG1, RBP4 and FN1, preferably Hp, LAMP1, SYNE2 and ANXA6, or Hp, LRG1, RBP4 and FN1.

Said monolithic material or microbeads are preferably coated with antibodies against Hb and at least 1 further gene expression product of a gene selected from LAMP1, SYNE2, ANXA6, LRG1, RBP4 and FN1, more preferably against Hb and at least 2 further gene expression products of LAMP1, SYNE2, ANXA6, LRG1, RBP4 and FN1, most preferably against Hb and at least 3 further gene expression products of LAMP1, SYNE2, ANXA6, LRG1, RBP4 and FN1, preferably Hb, LAMP1, SYNE2 and ANXA6, or Hb, LRG1, RBP4 and FN1.

Said monolithic material or microbeads coated with antibodies enable simultaneous detection of multiple protein expression molecules. Such simultaneous analysis is cost-effective and amenable to high-throughput analysis and automation. The invention further provides the use of a kit according to the invention in a method for typing a sample of colorectal adenoma of an individual, preferably for determining the presence or absence of an adenoma with a high risk of progressing to an adenocarcinoma.

General

For the purpose of clarity and a concise description, features are described herein as part of the same or separate aspects and preferred embodiments thereof, however, it will be appreciated that the scope of the invention may include embodiments having combinations of all or some of the features described.

The invention will now be illustrated by the following examples, which are provided by way of illustration and not of limitation and it will be understood that many variations in the methods described and the amounts indicated can be made without departing from the spirit of the invention and the scope of the appended claims.

EXAMPLES

Example 1

Materials and methods

Samples

Written informed consent was obtained from all subjects who provided stool and FIT samples. Collection, storage and use of patient-derived tissue and data were performed in compliance with the ‘Code for Proper Secondary Use of Human Tissue in The Netherlands’ of the Dutch Federation of Biomedical Scientific Societies (Dutch Federation of Biomedical Scientific Societies 2011).

Study Samples

Whole stool samples from 293 individuals diagnosed with CRC (n=81), AA (n=40) or non-advanced adenoma (n=43), and from individuals without colorectal neoplasia (n=129), were collected from a colonoscopy-controlled referral population at multiple centers in the Netherlands and Germany between 2005 and 2012. Processing for mass spectrometry analysis has been described previously (Bosch et al., 2017. Ann Intern Med 167: 855-866).

From a subset of 162 of these individuals, FIT samples (OC-sensor, Eiken Chemical, Tokyo, Japan) were obtained prior to colonoscopy. These included patients diagnosed with CRC (n=17), high-risk adenoma (n=10), low-risk adenoma (n=39) or individuals without colorectal neoplasia (n=96).

For all 83 adenoma patients, formalin-fixed paraffin-embedded (FFPE) tissue was requested from the pathology archive of the Amsterdam UMC, location VUmc, the Netherlands. In total, 12 patients dropped out for various reasons, leaving 71 patients for the final analyses.

Validation Samples

Between June 2009 and July 2010, a population-based screening pilot (COlonoscopy or COlonography for Screening (COCOS) trial) ran in the Netherlands, in which asymptomatic individuals were invited for primary colonoscopy screening (de Wijkerslooth et al., 2010. BMC Gastroenterol 10: 47; Stoop et al., 2012. Lancet Oncol 13: 55-64; de Wijkerslooth et al., 2012. Am J Gastroenterol 107: 1570-8). Screening participants allocated to the colonoscopy arm of the COCOS trial (n=1426) and willing to undergo colonoscopy were invited to collect a FIT sample (OC-sensor, Eiken Chemical, Tokyo, Japan) prior to their screening colonoscopy and before the start of laxative treatment. In a parallel study, DNA from FFPE tissues of the patients with advanced neoplasia in the COCOS trial was analyzed for the presence of CAEs. The FIT samples from 795 individuals diagnosed with CRC (n=8), high-risk adenomas (n=19) or low-risk adenomas (n=52), or without colorectal neoplasia (n=716), were used as the validation series in the current study.

DNA isolation and copy number identification by low-coverage whole genome sequencing

DNA was isolated from FFPE tissues with a column-based method (QIAamp DNA Micro Kit, Qiagen, Hilden, Germany) as described before (Buffart et al., 2007. Cell Oncol 29: 351-9), with some adaptations for small lesions. DNA copy number analysis and status for the adenomas of both the study samples and the validation samples were reported in a previous publication (Carvalho et al., 2018. Cancer Prev Res (Phila) 11: 403-412). In short, isolated DNA was subjected to low-coverage whole-genome sequencing on the HiSeq 2000 (Illumina) in 50-bp single-read mode using the Illumina TruSeq Nano kit. Raw sequence reads were mapped to the human reference genome build GRCh37/hg19, and data were further analyzed using QDNAseq, CGHcall and CGHregions (Carvalho et al., 2018. Cancer Prev Res (Phila) 11: 403-412). Adenomas were characterized for gains of chromosomal arms 8q, 13q and 20q, and losses of 8p, 15q, 17p and 18q. If two or more of these CAEs were present, the adenoma was classified as a high-risk adenoma (Carvalho et al., 2018. Cancer Prev Res (Phila) 11: 403-412; Hermsen et al., 2002. Gastroenterology 123: 1109-19). Individuals with at least one high-risk adenoma are defined as high-risk.
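The classification rule for high-risk adenomas and high-risk individuals can be expressed directly. This minimal sketch assumes that arm-level gain and loss calls are already available from upstream copy number analysis, which is not reproduced here; the input format and function names are hypothetical.

```python
# The seven cancer-associated events (CAEs) named in the text.
CAE_GAINS = {"8q", "13q", "20q"}
CAE_LOSSES = {"8p", "15q", "17p", "18q"}


def count_caes(gains, losses):
    """Count CAEs among the observed arm-level gains and losses.
    Note that direction matters: a loss of 8q is not a CAE, a gain of 8q is."""
    return len(CAE_GAINS & set(gains)) + len(CAE_LOSSES & set(losses))


def is_high_risk_adenoma(gains, losses, min_events=2):
    """An adenoma with two or more CAEs is classified as high-risk."""
    return count_caes(gains, losses) >= min_events


def is_high_risk_individual(adenomas):
    """An individual with at least one high-risk adenoma is high-risk.
    `adenomas` is a list of (gains, losses) pairs, one per adenoma."""
    return any(is_high_risk_adenoma(g, l) for g, l in adenomas)


print(is_high_risk_adenoma(["13q", "20q"], []))
print(is_high_risk_individual([(["13q"], []), (["20q"], ["18q"])]))
```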

LC-MS/MS data analysis

The tandem mass spectrometry (LC-MS/MS) data were already available and have been described in a previous publication (Bosch et al., 2017. Ann Intern Med 167: 855-866). Protein identification was performed with MaxQuant (Cox and Mann, 2008. Nat Biotechnol 26: 1367-72) as described previously (Bosch et al., 2017. Ann Intern Med 167: 855-866), using the Swiss-Prot human reference proteome FASTA file (canonical and isoforms, obtained in October 2017, 20237 entries). Contaminants and reversed proteins were removed. Protein groups with a positive Andromeda score were extracted. Proteins were quantified by spectral counting (Liu et al., 2004. Anal Chem 76: 4193-201). Protein groups were excluded from further analysis if they had missing data for over 15% of the cases, i.e. 13 samples for high-risk adenomas or 80 samples for high-risk adenomas and CRCs. Differential protein expression analysis was performed using the beta-binomial test (Pham et al., 2010. Bioinformatics 26: 363-9), yielding log2 fold changes and p-values. P-values adjusted for multiple hypothesis testing were obtained with the Benjamini-Hochberg procedure. A multidimensional scaling algorithm with Euclidean distance was used to visualize protein expression profiles. Differential analysis was performed for the following groups, with the following thresholds applied to select proteins more highly expressed in cases than in healthy controls: high-risk adenomas compared to samples without colorectal neoplasia (log2 fold change > 0 and p-value ≤ 0.1), and CRCs and high-risk adenomas combined compared to samples without colorectal neoplasia (log2 fold change ≥ 2 and adjusted p-value ≤ 0.05).
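The Benjamini-Hochberg adjustment and the two selection rules can be sketched in plain Python. The beta-binomial test itself is not reproduced, so log2 fold changes and p-values are taken as given; the helper names are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_high_risk_adenoma(log2fc, p_value):
    """High-risk adenomas vs. controls: log2 FC > 0 and raw p-value <= 0.1."""
    return log2fc > 0 and p_value <= 0.1


def select_high_risk_and_crc(log2fc, p_adjusted):
    """High-risk adenomas + CRCs vs. controls: log2 FC >= 2 and adjusted p <= 0.05."""
    return log2fc >= 2 and p_adjusted <= 0.05


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```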

Proteins more highly expressed in cases than in controls were clustered by hierarchical clustering, with protein abundances normalized to Z-scores. The Euclidean distance measure was used, with Ward linkage for samples and complete linkage for proteins.
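A minimal sketch of the protein-side clustering: row-wise Z-scores, Euclidean distance, and naive agglomerative clustering with complete linkage. It assumes a small protein-by-sample matrix and is illustrative only. For the Ward linkage used for samples, a library routine such as SciPy's `scipy.cluster.hierarchy.linkage(..., method="ward")` would typically be used instead.

```python
import math


def zscore_rows(matrix):
    """Normalize each row (protein) to mean 0 and unit sample standard deviation."""
    out = []
    for row in matrix:
        mean = sum(row) / len(row)
        sd = math.sqrt(sum((v - mean) ** 2 for v in row) / (len(row) - 1))
        out.append([(v - mean) / sd if sd > 0 else 0.0 for v in row])
    return out


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage(rows, n_clusters):
    """Naive agglomerative clustering; cluster distance = max pairwise distance."""
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = max(euclidean(rows[i], rows[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


# Hypothetical abundances: proteins 0 and 1 share a profile shape, protein 2 is reversed.
rows = zscore_rows([[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]])
print(complete_linkage(rows, 2))
```

Z-scoring first means proteins cluster by the shape of their profile across samples rather than by their absolute spectral counts.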

Protein biomarker panel identification with logistic regression

An overview of the data analysis approach is presented in Figure 1. Proteins overexpressed in cases (high-risk adenomas with or without CRCs) compared to controls (samples without colorectal neoplasia) constituted the input for selecting biomarker panels. Logistic regression analysis with Lasso regularization was used to identify biomarker panels of two, three or four proteins that best distinguish cases from controls. A leave-one-out cross-validation procedure was applied to evaluate the performance of the data analysis approach, and cross-validated logistic predictions were obtained. Receiver operating characteristic (ROC) analysis was used to evaluate the performance of the protein panels in discriminating cases from controls, by calculating the partial area under the curve (pAUC) at a specificity of 95%-100% and the sensitivity at 95% specificity. The pAUC was compared to the pAUC of hemoglobin (HBA1); p-values were obtained by stratified bootstrap resampling of the case/control labels of the individuals with 2000 permutations (Robin et al., 2011. BMC Bioinformatics 12: 77).
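The cited reference (Robin et al., 2011) describes the pROC package; the pure-Python sketch below only approximates that analysis. It computes the empirical ROC curve, the partial AUC over 95%-100% specificity (false-positive rate 0 to 0.05) with McClish standardization, so that 0.5 corresponds to chance and 1.0 to perfect discrimination, and a paired, stratified bootstrap test of the pAUC difference between two markers measured in the same individuals. Whether the reported pAUC values were standardized this way is an assumption, and all function names are hypothetical.

```python
import math
import random
import statistics


def roc_points(cases, controls):
    """Empirical ROC points (FPR, TPR); a higher score means 'more likely a case'."""
    thresholds = sorted(set(cases) | set(controls), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(s >= t for s in cases) / len(cases)
        fpr = sum(s >= t for s in controls) / len(controls)
        points.append((fpr, tpr))
    return points


def partial_auc(cases, controls, max_fpr=0.05, standardize=True):
    """Partial AUC for FPR in [0, max_fpr], i.e. specificity 95%-100% by default."""
    area = 0.0
    pts = roc_points(cases, controls)
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Linearly interpolate the TPR at the FPR boundary.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2  # trapezoid rule
    if not standardize:
        return area
    # McClish standardization: chance diagonal -> 0.5, perfect curve -> 1.0.
    min_area = max_fpr ** 2 / 2
    max_area = max_fpr
    return 0.5 * (1 + (area - min_area) / (max_area - min_area))


def bootstrap_pauc_test(cases_a, controls_a, cases_b, controls_b, n_boot=2000, seed=0):
    """Paired, stratified bootstrap test of pAUC(A) - pAUC(B).

    Marker A and marker B scores must be listed in the same individual order.
    Returns the observed difference and a two-sided normal-approximation p-value.
    """
    rng = random.Random(seed)
    observed = partial_auc(cases_a, controls_a) - partial_auc(cases_b, controls_b)
    diffs = []
    for _ in range(n_boot):
        # Resample cases and controls separately (stratified), using the same
        # individuals for both markers (paired).
        ci = [rng.randrange(len(cases_a)) for _ in cases_a]
        ki = [rng.randrange(len(controls_a)) for _ in controls_a]
        pa = partial_auc([cases_a[i] for i in ci], [controls_a[i] for i in ki])
        pb = partial_auc([cases_b[i] for i in ci], [controls_b[i] for i in ki])
        diffs.append(pa - pb)
    sd = statistics.stdev(diffs)
    if sd == 0:
        return observed, float("nan")  # no bootstrap variability to test against
    z = observed / sd
    return observed, math.erfc(abs(z) / math.sqrt(2))
```

In practice the case and control scores would be the cross-validated panel predictions and the HBA1 levels of the same individuals.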

Haptoglobin quantification in FIT samples

FIT fluids from the two sample series were analyzed with antibody-based assays. The first series comprised FIT samples corresponding to the stool samples used for the proteomics study, i.e. “the study series”. Of 162 samples, four were excluded for technical reasons (healthy controls n = 3, CRC n = 1), leaving 158 samples for Hp quantification (healthy controls n = 93, low-risk adenomas n = 39, high-risk adenomas n = 10 and CRCs n = 16). The second, independent series comprised samples collected from individuals in a screening setting, i.e. “the validation series” (healthy controls n = 716, low-risk adenomas n = 52, high-risk adenomas n = 19 and colorectal cancers n = 8) (de Wijkerslooth et al., 2010. BMC Gastroenterol 10: 47). Screening MULTI-SPOT 96 4-Spot Prototype Human 4-plex plates pre-coated with capture antibodies directed against Hp, and the corresponding kit reagents, were purchased from MSD (Rockville, USA). All solutions were prepared and protocols performed according to the manufacturer’s instructions. The undiluted FIT fluid sample and, subsequently, the detection antibody were each incubated on the plate for two hours at room temperature with vigorous shaking. Standard curves were prepared in FIT buffer (Eiken Chemical Co.) using kit calibrators. After washing three more times and adding 150 µl of diluted read buffer, plates were immediately measured by electrochemiluminescence detection on the MSD SECTOR Imager 2400.

Data were analyzed using the MSD Discovery Workbench 4.0 software, applying a 4-parameter logistic curve-fitting algorithm with a 1/Y² weighting function to generate standard curves for calculating analyte concentrations in the FIT fluid samples.
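The 4-parameter logistic (4PL) model and the 1/Y² weighting can be written out as below. This minimal sketch is not the Discovery Workbench implementation. The parameter names follow a common 4PL convention, the example parameters are hypothetical, and the fitting step itself (minimizing the weighted sum of squares) is left to a numerical optimizer.

```python
def four_pl(x, a, b, c, d):
    """4PL curve: a = response at zero concentration, d = response at infinite
    concentration, c = inflection point, b = slope factor."""
    return d + (a - d) / (1 + (x / c) ** b)


def inverse_four_pl(y, a, b, c, d):
    """Back-calculate a concentration from a signal within the curve's range."""
    if y == d:
        raise ValueError("signal at the upper asymptote cannot be back-calculated")
    ratio = (a - d) / (y - d) - 1
    if ratio <= 0:
        raise ValueError("signal outside the quantifiable range of the curve")
    return c * ratio ** (1 / b)


def weighted_sse(params, concentrations, signals):
    """Fitting objective with 1/Y^2 weights: the sum of squared relative residuals,
    so low and high calibrators contribute on a comparable scale."""
    a, b, c, d = params
    return sum(((y - four_pl(x, a, b, c, d)) / y) ** 2
               for x, y in zip(concentrations, signals))


# Hypothetical calibration parameters (signal rises with concentration).
params = (100.0, 1.2, 50.0, 1.0e6)
signal = four_pl(20.0, *params)
print(round(inverse_four_pl(signal, *params), 6))
```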

FIT values - correlation analysis

FIT samples corresponded to the stool samples used for the proteomics study (the study series) and included healthy controls (n=96), low-risk adenomas (n=43), high-risk adenomas (n=10), unclassified adenomas (n=8) and CRCs (n=17).

Hemoglobin (HBA1 and HBB) and haptoglobin (Hp) protein expression as determined by mass spectrometry were compared to FIT values in the same samples. Missing values were excluded from the analysis. Spearman correlation analysis was performed on normalized spectral counts of HBA1, HBB, Hp and FIT values, correlation coefficients (rho) and p-values were obtained.
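Spearman's rho is Pearson's correlation computed on ranks. The minimal sketch below assigns average ranks to ties and drops pairs with a missing value, as described above. It returns rho only, whereas a library routine such as `scipy.stats.spearmanr` also returns the p-value. The function names are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho; pairs where either value is missing (None) are dropped."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    rx = average_ranks([a for a, _ in pairs])
    ry = average_ranks([b for _, b in pairs])
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = math.sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))
    return num / den
```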

Results

Characterization of Cancer Associated Events in colorectal adenomas

Stool samples from 291 individuals, including 83 patients with colorectal adenomas, were previously profiled by proteomics. In the current study, the adenoma tissues were collected for the identification of Cancer Associated Events (CAEs). In total, 95 adenomas from 71 patients were available for CAE identification (data not shown). Gains of chromosomal arms 13q and 20q were the most frequently observed CAEs in this cohort, present in 17.9% and 16.8% of adenomas, respectively (data not shown). Gains of 13q and 20q were also significantly more frequent in the advanced adenomas than in the non-advanced adenomas.

Two or more CAEs, indicating a higher risk of progression, were identified in 15.8% of the adenomas overall, in 36.4% (12/33) of the advanced adenomas and in 4.8% (3/62) of the non-advanced adenomas. These 15 adenomas with two or more CAEs are hereafter referred to as high-risk adenomas.

Protein profiling and selection of candidate biomarkers

Proteomics profiling of stool samples from 129 healthy control individuals, 83 adenoma patients and 79 CRC patients revealed 792 protein groups at an FDR threshold of ≤ 0.01 in all stool samples, and 632 in the stool samples derived from individuals with adenomas. As FIT values were available for a subset of individuals with adenomas, we performed a correlation analysis between FIT values and normalized spectral counts for hemoglobin, specifically for HBA1 and HBB separately. This revealed significant positive correlations for both HBA1 (rho = 0.46, p-value < 0.001) and HBB (rho = 0.43, p-value < 0.001), confirming the feasibility of quantifying human proteins in stool by mass spectrometry.

Dimensionality reduction of the protein expression profiles distinguished stool samples from individuals with cancer from those from individuals with adenomas and from healthy controls. To identify candidate biomarker proteins that discriminate high-risk adenomas from healthy controls, we performed differential expression analysis between stool samples from individuals with adenomas at high risk of progressing to cancer and those from healthy controls. This yielded 31 up-regulated proteins in stool samples from individuals with high-risk adenomas (log2 fold change > 0 and p-value ≤ 0.1, Table 1A).

Additionally, to identify proteins that differentiate all high-risk lesions, i.e. cancers and high-risk adenomas, from controls, we performed a differential protein expression analysis comparing these screen-relevant lesions with controls. Applying the same threshold as for the comparison of high-risk adenomas with healthy controls revealed 125 protein groups with higher expression in high-risk adenomas and CRCs. For further analysis, a more stringent threshold was applied (p-value ≤ 0.05 and log2 fold change ≥ 2), yielding 61 up-regulated proteins in high-risk adenomas and CRCs compared with controls (Table 1B). The differential proteins from the two analyses overlapped significantly (p-value = 1.47e-4, hypergeometric test), with the following 13 proteins in common: CP, Hp, A2M, C3, C5, APCS, TF, ANXA6, C4B, C6, STOM, SERPINA4, ITIH4. These 13 proteins are therefore the most promising candidate biomarkers for the identification of high-risk adenomas and CRCs combined.
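The overlap test can be sketched as an upper-tail hypergeometric probability: if a list of 61 proteins is drawn at random from a background of N detected proteins, how likely is it to share at least 13 proteins with a fixed list of 31? The background size is an assumption here (the text reports 792 protein groups in all stool samples and 632 in adenoma samples, but does not say which background was used), so this sketch does not claim to reproduce the reported p-value. With SciPy, the same tail probability is `scipy.stats.hypergeom.sf(k - 1, M, n, N)`.

```python
from math import comb


def hypergeom_overlap_pvalue(n_background, n_list_a, n_list_b, n_overlap):
    """P(overlap >= n_overlap) when list B (n_list_b items) is drawn at random,
    without replacement, from a background of n_background items that contains
    the n_list_a items of list A (upper tail of the hypergeometric distribution)."""
    max_overlap = min(n_list_a, n_list_b)
    total = comb(n_background, n_list_b)
    tail = sum(
        comb(n_list_a, k) * comb(n_background - n_list_a, n_list_b - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total  # exact integer arithmetic until this final division


# A background of 792 protein groups is an assumption, not stated in the source.
p = hypergeom_overlap_pvalue(792, 31, 61, 13)
```

Exact integer binomial coefficients avoid the underflow that multiplying many small floating-point probabilities could cause for lists of this size.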

Biomarker panel selection for high-risk adenomas

We next searched for panels of complementary proteins that would outperform hemoglobin in distinguishing high-risk adenomas from healthy controls, and high-risk adenomas and CRCs combined from healthy controls. Panels of four, three or two proteins were examined. To evaluate the diagnostic performance of each biomarker panel in the context of population-wide screening for colorectal cancer, we performed receiver operating characteristic (ROC) curve analysis for each panel and compared its performance with that of hemoglobin, which is currently used in CRC screening. Since FIT values were not available for the whole dataset, HBA1 quantified by LC-MS/MS was used as a substitute for hemoglobin in this comparison. Performance was assessed by the partial AUC (pAUC) over the 95% to 100% specificity range, and sensitivity was evaluated at 95% specificity, since high specificity is pivotal for the success of a population-wide screening program.
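A minimal sketch of the two performance measures, assuming that higher scores indicate cases: an empirical ROC curve, its partial area over false-positive rates 0 to 0.05 (specificity 95% to 100%), and the sensitivity at a cut-off that keeps at least 95% of controls negative. The text reports pAUC as a percentage without stating the normalization; the McClish standardization used here (which maps a non-informative marker to 50% and a perfect one to 100%) is an assumption, as are all function names.

```python
import math
from fractions import Fraction


def roc_points(scores, labels):
    """Empirical ROC vertices (FPR, TPR), sweeping the threshold from high to low.
    labels: 1 for cases, 0 for controls; tied scores are stepped over together."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda sl: -sl[0])
    points, tp, fp, i = [(0.0, 0.0)], 0, 0, 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            tp += ranked[i][1]
            fp += 1 - ranked[i][1]
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def partial_auc(points, max_fpr=0.05):
    """Trapezoidal area under the ROC curve for FPR in [0, max_fpr],
    interpolating the curve linearly at the boundary."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def mcclish_standardized(pauc, max_fpr=0.05):
    """Assumed normalization: 0.5 for the chance diagonal, 1.0 for a perfect marker."""
    min_area = max_fpr ** 2 / 2  # area under the diagonal on [0, max_fpr]
    return 0.5 * (1 + (pauc - min_area) / (max_fpr - min_area))


def sensitivity_at_specificity(case_scores, control_scores, specificity=0.95):
    """Cut-off = lowest control score with >= `specificity` of controls at or below it;
    cases scoring strictly above the cut-off count as detected."""
    ranked = sorted(control_scores)
    k = math.ceil(Fraction(str(specificity)) * len(ranked))  # exact, avoids float rounding
    cutoff = ranked[k - 1]
    return sum(s > cutoff for s in case_scores) / len(case_scores)
```

Fixing the cut-off on controls first and then counting detected cases mirrors how sensitivity "at 95% specificity" is read off a ROC curve.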

First, we applied logistic regression with lasso regularization to the 31 up-regulated proteins in high-risk adenomas to identify a biomarker panel (see Figure 1 for an overview of the data analysis). The resulting regression model selected Hp, LAMP1, SYNE2 and ANXA6; no three- or two-protein models could be built, because the coefficients of the proteins shrank to zero at the same time. Performance was then evaluated by leave-one-out cross-validation, and ROC curves were used to compare it with that of hemoglobin. Only four-protein models were included in the cross-validation procedure (Figure 2A for the four-protein panel). Although the pAUC of the biomarker panel (pAUC = 60.2%) was higher than that of HBA1 (pAUC = 54.5%), the difference was not significant. At a specificity of 95%, the biomarker panel identified 8 out of 15 high-risk adenomas (sensitivity = 53%), significantly more than hemoglobin (p-value = 0.05, see Table 2A). The markers most frequently selected in the cross-validation procedure were Hp, LAMP1, SYNE2 and ANXA6, each with a frequency of over 90%, indicating that these proteins play the most discriminative roles in the regression models (Figure 2B). The model was also applied to low-risk adenomas: five (9%) low-risk adenomas were classified as cases and 51 (91%) as controls, indicating that this biomarker panel is specific for the identification of high-risk adenomas (data not shown).
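The panel-selection and cross-validation logic can be sketched as below. This is an illustration under stated assumptions, not the authors' code: lasso-penalized logistic regression is fitted by plain proximal gradient descent (ISTA) on standardized features, the penalty is scanned from strong to weak, and the first fit with exactly k non-zero coefficients is taken as the k-protein panel. If the count jumps past k (several coefficients become non-zero at the same penalty), no panel of that size is returned; on a coarse grid a size can also be skipped simply because it falls between grid points. Leave-one-out cross-validation refits the panel without each sample, scores the held-out sample, skips folds without a k-protein model and counts how often each feature is selected. The penalty grid, step size and iteration count are hypothetical choices.

```python
import math
from collections import Counter


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_l1_logistic(X, y, lam, lr=0.5, n_iter=300):
    """Lasso-penalized logistic regression by proximal gradient descent (ISTA).
    X: rows of standardized features; y: 0/1 labels; the intercept is not penalized.
    lr is assumed small enough for the data (not tuned)."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = _sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        for j in range(p):
            v = w[j] - lr * grad_w[j] / n
            # Soft-thresholding: the proximal step of the L1 penalty sets small weights to 0.
            w[j] = math.copysign(max(abs(v) - lr * lam, 0.0), v)
    return w, b


def panel_of_size(X, y, k, lambdas):
    """Scan penalties from strong to weak; return (features, w, b) for the first fit
    with exactly k non-zero coefficients, or None if size k is skipped."""
    for lam in sorted(lambdas, reverse=True):
        w, b = fit_l1_logistic(X, y, lam)
        active = [j for j, wj in enumerate(w) if wj != 0.0]
        if len(active) == k:
            return active, w, b
        if len(active) > k:
            return None  # several coefficients became non-zero together
    return None


def loocv_panel(X, y, k, lambdas):
    """Leave-one-out CV: refit the k-feature panel without sample i and score sample i.
    Folds with no k-feature model get score None; selection frequencies are counted."""
    scores, selected = [], Counter()
    for i in range(len(X)):
        fit = panel_of_size(X[:i] + X[i + 1:], y[:i] + y[i + 1:], k, lambdas)
        if fit is None:
            scores.append(None)
            continue
        active, w, b = fit
        selected.update(active)
        scores.append(_sigmoid(b + sum(wj * xj for wj, xj in zip(w, X[i]))))
    return scores, selected
```

The held-out scores can then be passed, together with the labels of the folds that produced a model, to a ROC analysis, and the `selected` counter divided by the number of folds gives selection frequencies like the ones reported in Figure 2B.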

Biomarker panel selection for high-risk adenomas and CRCs combined

Next, we performed the logistic regression analysis on the 61 up-regulated proteins in stool samples from individuals with high-risk adenomas and CRCs. The four-protein model consisted of Hp, LRG1, RBP4 and FN1, and the two-protein model of Hp and LRG1. No three-protein model was built, because the coefficients of FN1 and RBP4 shrank to zero at the same time.

In the cross-validation procedure, we evaluated the four- and two-protein models (Figure 3A-B for the four-protein panel, Figure 3C-D for the two-protein panel). The cross-validated pAUCs of the four-protein (pAUC = 70.4%) and two-protein (pAUC = 71.1%) models were significantly higher than that of hemoglobin (HBA1 pAUC = 62.7%, p-value = 0.007 for both comparisons, Figure 3A and C).

At a specificity of 95%, the four- and two-protein panels identified 58 and 62, respectively, of the 94 high-risk adenomas and CRCs combined (sensitivity = 62% and 66%), significantly more than HBA1 (sensitivity = 40%; p-values = 5.57e-3 and 4.74e-4, respectively; see Table 2B).
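The comparison of sensitivities can be sketched as a χ² test with Yates' continuity correction on a 2×2 table of detected and missed cases for each method. Like any ordinary χ² test on such a table, this treats the two sets of calls as independent groups. The source does not report the HBA1 count; the example value of 38 of 94 (about 40%) is an assumption, and the function names are illustrative.

```python
import math


def chi2_yates_2x2(a, b, c, d):
    """Pearson chi-square test with Yates' continuity correction on [[a, b], [c, d]].
    Returns (statistic, two-sided p-value) with 1 degree of freedom."""
    n = a + b + c + d
    row = (a + b, c + d)
    col = (a + c, b + d)
    stat = 0.0
    for obs, r, k in ((a, 0, 0), (b, 0, 1), (c, 1, 0), (d, 1, 1)):
        expected = row[r] * col[k] / n
        stat += max(abs(obs - expected) - 0.5, 0.0) ** 2 / expected
    # For 1 degree of freedom, P(chi2 > s) = erfc(sqrt(s / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def compare_sensitivities(detected_1, detected_2, n_cases):
    """Rows: method 1 and method 2; columns: detected and missed cases."""
    return chi2_yates_2x2(detected_1, n_cases - detected_1, detected_2, n_cases - detected_2)


# 58 of 94 cases for the four-protein panel (from the text) vs a hypothetical
# HBA1 count of 38 (about 40% of 94; the exact count is not reported).
stat, p = compare_sensitivities(58, 38, 94)
```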

The original four- and two-protein models were also applied to low-risk adenomas. The four-protein model classified 6 (11%) of 56 low-risk adenomas as cases and 50 (89%) as controls, while the two-protein model classified 7 (12.5%) low-risk adenomas as cases and 49 (87.5%) as controls (data not shown).

The proteins most frequently included in the four-protein regression models during cross-validation were Hp, LRG1, RBP4 and FN1, each with a frequency of over 90%, supporting their predictive value and the stability of the model. The two-protein model consisted of Hp and LRG1 in every cross-validation fold, indicating that these two proteins have the strongest predictive value (Figure 3C and D). When considering both the overlap of up-regulated proteins between the two comparisons and the biomarker panels selected by lasso regularization, only one protein, Hp, was present in all panels. This suggests that Hp may be a crucial component for distinguishing cases from controls, as it was also among the most frequently selected proteins in both analyses.

Validation of Hp expression by immunoassay in FIT samples

As Hp forms a complex with hemoglobin, we explored whether its abundance as measured by mass spectrometry correlated with FIT and/or hemoglobin (data not shown). As expected, we observed strong correlations with HBA1 and HBB and a somewhat weaker correlation with FIT (correlation coefficients 0.77, 0.67 and 0.55, respectively; p-value < 0.001 for all comparisons). In line with this, Hp as a single marker did not outperform FIT in a ROC curve analysis (data not shown).

However, Hp was the only protein consistently selected in all three marker panels, and we therefore further explored Hp levels in two FIT cohorts. For a subset of 153 individuals whose stool samples were used for proteomics profiling, i.e. the study series, FIT samples were also available for analysis: 16 CRCs, 10 high-risk adenomas, 39 low-risk adenomas and 93 controls. We quantified Hp in these FIT samples with an immunoassay and found a significantly higher Hp concentration in the high-risk adenoma samples than in the controls (fold change = 1.9, p-value = 0.036, Figure 4A). Additionally, an independent validation series, in which CAEs were also profiled and the progression-risk status of the adenomas was available, was analyzed (Figure 4B). The validation series consisted of 716 healthy controls, 52 low-risk adenomas, 19 high-risk adenomas and 8 CRCs. Here, a higher abundance of Hp in high-risk adenomas and in CRCs compared with controls was confirmed (fold change = 15.9, p-value = 9e-5, and fold change = 42.6, p-value = 9.7e-5, respectively). This confirms our mass spectrometry findings and suggests that Hp can be applied as a biomarker for high-risk adenomas and CRCs.

Table 1A. Overview of the proteomics data from the discovery series. Protein abundance per sample and the results of the differential protein expression analysis are reported. Filtering the data for log2 fold change > 0 and p-value ≤ 0.1 yielded 31 up-regulated proteins in stool samples from individuals with high-risk adenomas.

Table 1B. Overview of the proteomics data from the discovery series. Protein abundance per sample and the results of the differential protein expression analysis are reported. Filtering the data for log2 fold change ≥ 0 and p-value ≤ 0.05 yielded 61 up-regulated proteins in stool samples from individuals with high-risk adenomas and CRC.

Table 2. Confusion matrices for the cross-validated performance of the biomarker panel models. The performance of the biomarker panel regression models was evaluated at 95% specificity. p-values comparing the sensitivity of each panel with that of hemoglobin were calculated with Fisher's exact test for high-risk adenomas and with the χ² test for high-risk adenomas and CRCs.

A. High-risk adenomas versus healthy controls

B. High-risk adenomas and CRCs versus healthy controls.
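For the small high-risk adenoma group, Fisher's exact test replaces the χ² approximation. A minimal two-sided version is sketched below: it sums the probabilities of all tables with the observed margins that are no more likely than the observed table, with a small relative tolerance for floating-point ties. `scipy.stats.fisher_exact` provides the same two-sided test. The function name is illustrative.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test on [[a, b], [c, d]] with fixed margins."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over tables no more likely than the observed one (tolerance for float ties).
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)
```

For Table 2A, the panel row would be `(8, 7)` (8 of 15 high-risk adenomas detected); the source does not report the hemoglobin count, so a call such as `fisher_exact_2x2(8, 7, k, 15 - k)` needs a hypothetical `k`.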