
Title:
METHODS FOR DETERMINING CANCER
Document Type and Number:
WIPO Patent Application WO/2022/152784
Kind Code:
A1
Abstract:
Subject of the invention is a method for determining which type of cancer an individual has and/or if the individual has cancer or not, comprising (a) providing a sample from the individual, which is a body liquid or fraction thereof, (b) defining a group of cancer types, which comprise at least two from the group of bladder, brain, breast, colorectal, lung, ovarian, pancreas, prostate and stomach cancer, (c) determining the levels of markers in the sample, wherein the markers comprise (i) at least one single nucleotide variant (SNV), (ii) at least one microRNA (miRNA), and (iii) at least one DNA methylation, wherein the markers comprise (a1) at least 2 markers selected from AR H875Y, TP53 (COSM10758), MLH1_meth and hsa_miR_17_5p, and (b1) at least 2 markers selected from APC (COSM18561), hsa_miR_133a_3p, hsa_miR_148b_3p, hsa_miR_29c_3p, hsa_miR_20a_5p, hsa_miR_92a_3p, hsa_miR_155_5p, hsa_miR_195_5p, hsa_miR_101_3p, hsa_miR_27a_3p, hsa_miR_26a_5p and hsa_miR_21_5p, (d) comparing the levels to a known standard, and (e) determining, based on the result of step (d), which type of cancer from the group defined in step (b) the individual has and/or if the individual has cancer or not. Subject of the invention is also the use of the single nucleotide variant AR H875Y as a marker for determining from a body liquid or fraction thereof which type of cancer an individual has, wherein the type of cancer is selected from bladder, colorectal, lung, stomach, ovarian or brain cancer.

Inventors:
TOMEVA ELENA (AT)
HASLBERGER ALEXANDER (AT)
HIPPE BERIT (AT)
SCHMID JUERG DANIEL (CH)
SWITZENY OLIVIER (AT)
HEITZINGER CLEMENS (AT)
Application Number:
PCT/EP2022/050625
Publication Date:
July 21, 2022
Filing Date:
January 13, 2022
Assignee:
HEALTHBIOCARE GMBH (AT)
International Classes:
C12Q1/6886
Domestic Patent References:
WO2016141169A1 2016-09-09
WO2019067092A1 2019-04-04
WO2016033114A1 2016-03-03
Other References:
ARORA ARSHI ET AL: "Pan-cancer identification of clinically relevant genomic subtypes using outcome-weighted integrative clustering", GENOME MEDICINE, vol. 12, no. 1, 3 December 2020 (2020-12-03), pages 1 - 13, XP055813067
HOADLEY KATHERINE A ET AL: "Cell-of-Origin Patterns Dominate the Molecular Classification of 10,000 Tumors from 33 Types of Cancer", CELL, ELSEVIER, AMSTERDAM NL, vol. 173, no. 2, 5 April 2018 (2018-04-05), pages 291 - 304, XP085371385, ISSN: 0092-8674, DOI: 10.1016/J.CELL.2018.03.022
BO WANG ET AL: "Similarity network fusion for aggregating data types on a genomic scale", NATURE METHODS, vol. 11, no. 3, 1 March 2014 (2014-03-01), New York, pages 333 - 337, XP055576311, ISSN: 1548-7091, DOI: 10.1038/nmeth.2810
LUO ZHENHUA ET AL: "Pan-cancer analysis identifies telomerase-associated signatures and cancer subtypes", vol. 18, no. 1, 10 June 2019 (2019-06-10), XP055813072, Retrieved from the Internet DOI: 10.1186/s12943-019-1035-x
OMBERG LARSSON ET AL: "Enabling transparent and collaborative computational analysis of 12 tumor types within The Cancer Genome Atlas", NATURE GENETICS, vol. 45, no. 10, Sp. Iss. SI, October 2013 (2013-10-01), pages 1121 - 1126, XP055813106
MAMATJAN YASIN ET AL: "Molecular Signatures for Tumor Classification", THE JOURNAL OF MOLECULAR DIAGNOSTICS, vol. 19, no. 6, 1 November 2017 (2017-11-01), pages 881 - 891, XP055813074, ISSN: 1525-1578, DOI: 10.1016/j.jmoldx.2017.07.008
SEHGAL VASUDHA ET AL: "Robust Selection Algorithm (RSA) for Multi-Omic Biomarker Discovery; Integration with Functional Network Analysis to Identify miRNA Regulated Pathways in Multiple Cancers", PLOS ONE, vol. 10, no. 10, 27 October 2015 (2015-10-27), pages e0140072, XP055813081, DOI: 10.1371/journal.pone.0140072
YANG ZIJIAN ET AL: "A Multianalyte Panel Consisting of Extracellular Vesicle miRNAs and mRNAs, cfDNA, and CA19-9 Shows Utility for Diagnosis and Staging of Pancreatic Ductal Adenocarcinoma", CLINICAL CANCER RESEARCH, vol. 26, no. 13, 16 April 2020 (2020-04-16), US, pages 3248 - 3258, XP055912997, ISSN: 1078-0432, DOI: 10.1158/1078-0432.CCR-19-3313
AZAD, A.A.; VOLIK, S.V.; WYATT, A.W.; HAEGERT, A.; LE BIHAN, S.; BELL, R.H.; ANDERSON, S.A.; MCCONEGHY, B.; SHUKIN, R.; BAZOV, J. ET AL.: "Androgen Receptor Gene Aberrations in Circulating Cell-Free DNA: Biomarkers of Therapeutic Resistance in Castration-Resistant Prostate Cancer", CLINICAL CANCER RESEARCH : AN OFFICIAL JOURNAL OF THE AMERICAN ASSOCIATION FOR CANCER RESEARCH, vol. 21, 2015, pages 2315 - 2324
BROOKE, G.N.; BEVAN, C.L.: "The role of androgen receptor mutations in prostate cancer progression", CURR GENOMICS, vol. 10, 2009, pages 18 - 25
"Comprehensive molecular characterization of human colon and rectal cancer", NATURE, vol. 487, 2012, pages 330 - 337
COHEN, J.D.; JAVED, A.A.; THOBURN, C.; WONG, F.; TIE, J.; GIBBS, P.; SCHMIDT, C.M.; YIP-SCHNEIDER, M.T.; ALLEN, P.J.; SCHATTNER, M. ET AL.: "Combined circulating tumor DNA and protein biomarker-based liquid biopsy for the earlier detection of pancreatic cancers", PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, vol. 114, 2017, pages 10202 - 10207, XP055520162, DOI: 10.1073/pnas.1704961114
COHEN, J.D.; LI, L.; WANG, Y.; THOBURN, C.; AFSARI, B.; DANILOVA, L.; DOUVILLE, C.; JAVED, A.A.; WONG, F.; MATTOX, A. ET AL.: "Detection and localization of surgically resectable cancers with a multi-analyte blood test", SCIENCE, vol. 359, 2018, pages 926 - 930, XP055687252, DOI: 10.1126/science.aar3247
ESPOSITO, A.; CRISCITIELLO, C.; LOCATELLI, M.; MILANO, M.; CURIGLIANO, G.: "Liquid biopsies for solid tumors: Understanding tumor heterogeneity and real time monitoring of early resistance to targeted therapies", PHARMACOL THER, vol. 157, 2016, pages 120 - 124, XP029374575, DOI: 10.1016/j.pharmthera.2015.11.007
FLEISCHHACKER, M.; SCHMIDT, B.: "Circulating nucleic acids (CNAs) and cancer--a survey", BIOCHIMICA ET BIOPHYSICA ACTA, vol. 1775, 2007, pages 181 - 232, XP002490780, DOI: 10.1016/j.bbcan.2006.10.001
GAI, W.; SUN, K.: "Epigenetic Biomarkers in Cell-Free DNA and Applications in Liquid Biopsy", GENES, vol. 10, 2019, pages 32
LEHMANN-WERMAN, R.; NEIMAN, D.; ZEMMOUR, H.; MOSS, J.; MAGENHEIM, J.; VAKNIN-DEMBINSKY, A.; RUBERTSSON, S.; NELLGARD, B.; BLENNOW, K.; ZETTERBERG, H.: "Identification of tissue-specific cell death using methylation patterns of circulating DNA", PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, vol. 113, 2016, pages E1826 - 1834, XP055436315, DOI: 10.1073/pnas.1519286113
LI, J.; HARRIS, L.; MAMON, H.; KULKE, M.H.; LIU, W.H.; ZHU, P.; MIKE MAKRIGIORGOS, G.: "Whole genome amplification of plasma-circulating DNA enables expanded screening for allelic imbalance in plasma", THE JOURNAL OF MOLECULAR DIAGNOSTICS : JMD, vol. 8, 2006, pages 22 - 30, XP055149339, DOI: 10.2353/jmoldx.2006.050074
RAZAVI, P.; CHANG, M.T.; XU, G.; BANDLAMUDI, C.; ROSS, D.S.; VASAN, N.; CAI, Y.; BIELSKI, C.M.; DONOGHUE, M.T.A.; JONSSON, P. ET AL.: "The Genomic Landscape of Endocrine-Resistant Advanced Breast Cancers", CANCER CELL, vol. 34, 2018, pages 427 - 438
SHIGEYASU, K.; TODEN, S.; ZUMWALT, T.J.; OKUGAWA, Y.; GOEL, A.: "Emerging Role of MicroRNAs as Liquid Biopsy Biomarkers in Gastrointestinal Cancers", CLINICAL CANCER RESEARCH : AN OFFICIAL JOURNAL OF THE AMERICAN ASSOCIATION FOR CANCER RESEARCH, vol. 23, 2017, pages 2391 - 2399, XP055736639, DOI: 10.1158/1078-0432.CCR-16-1676
TAPLIN, M.E.; BUBLEY, G.J.; SHUSTER, T.D.; FRANTZ, M.E.; SPOONER, A.E.; OGATA, G.K.; KEER, H.N.; BALK, S.P.: "Mutation of the androgen-receptor gene in metastatic androgen-independent prostate cancer", THE NEW ENGLAND JOURNAL OF MEDICINE, vol. 332, 1995, pages 1393 - 1398
ARORA, A.OLSHEN, A.B.SESHAN, V.E. ET AL.: "Pan-cancer identification of clinically relevant genomic subtypes using outcome-weighted integrative clustering", GENOME MED, vol. 12, 2020, pages 110
HOADLEY KA ET AL.: "Cell-of-Origin Patterns Dominate the Molecular Classification of 10,000 Tumors from 33 Types of Cancer", CELL, vol. 173, no. 2, 5 April 2018 (2018-04-05), pages 291 - 304, XP085371385, DOI: 10.1016/j.cell.2018.03.022
WANG B ET AL.: "Similarity network fusion for aggregating data types on a genomic scale", NAT METHODS, vol. 11, no. 3, March 2014 (2014-03-01), pages 333 - 7, XP055576311, DOI: 10.1038/nmeth.2810
LUO Z ET AL.: "Pan-cancer analysis identifies telomerase-associated signatures and cancer subtypes", MOL CANCER, vol. 18, no. 1, 10 June 2019 (2019-06-10), pages 106
OMBERG L: "Enabling transparent and collaborative computational analysis of 12 tumor types within The Cancer Genome Atlas", NAT GENET, vol. 45, no. 10, October 2013 (2013-10-01), pages 1121 - 6, XP055813106
MAMATJAN Y ET AL.: "Molecular Signatures for Tumor Classification: An Analysis of The Cancer Genome Atlas Data", J MOL DIAGN, vol. 19, no. 6, November 2017 (2017-11-01), pages 881 - 891
Attorney, Agent or Firm:
BANSE & STEGLICH PATENTANWÄLTE PARTMBB (DE)
Claims:
A method for determining which type of cancer an individual has and/or for determining if the individual has cancer or not, comprising

(a) providing a sample from the individual, which is a body liquid or fraction thereof,

(b) defining a group of cancer types, which comprise at least two from the group of bladder, brain, breast, colorectal, lung, ovarian, pancreas, prostate and stomach cancer,

(c) determining the levels of markers in the sample, wherein the markers comprise

(i) at least one single nucleotide variant (SNV),

(ii) at least one microRNA (miRNA), and

(iii) at least one DNA methylation, wherein the markers comprise

(a1) at least 2 markers selected from AR H875Y, TP53 (COSM10758), MLH1_meth and hsa_miR_17_5p, and

(b1) at least 2 markers selected from APC (COSM 18561), hsa_miR_133a_3p, hsa_miR_148b_3p, hsa_miR_29c_3p, hsa_miR_20a_5p, hsa_miR_92a_3p, hsa_miR_155_5p, hsa_miR_195_5p, hsa_miR_101_3p, hsa_miR_27a_3p, hsa_miR_26a_5p and hsa_miR_21_5p,

(d) comparing the levels to a known standard, and

(e) determining, based on the result of step (d), which type of cancer from the group defined in step (b) the individual has and/or if the individual has cancer or not.

The method according to at least one of the preceding claims, wherein the sample is blood or a fraction thereof and/or a liquid biopsy sample.

The method according to at least one of the preceding claims, wherein the at least one SNV comprises AR H875Y.

The method according to at least one of the preceding claims, wherein the at least one miRNA comprises hsa_miR_17_5p and wherein the at least one DNA methylation comprises MLH1_meth.

The method according to at least one of the preceding claims, wherein the markers comprise AR H875Y, TP53 (COSM 10758), MLH1_meth and hsa_miR_17_5p.

The method according to at least one of claims 3 to 5, wherein the markers comprise at least 5 markers selected from the group of APC (COSM 18561), hsa_miR_133a_3p, hsa_miR_148b_3p, hsa_miR_29c_3p, hsa_miR_20a_5p, hsa_miR_92a_3p, hsa_miR_155_5p, hsa_miR_195_5p, hsa_miR_101_3p, hsa_miR_27a_3p, hsa_miR_26a_5p and hsa_miR_21_5p.

The method according to at least one of the preceding claims, wherein the markers comprise

AR H875Y,

(a1) at least 2 markers selected from TP53 (COSM10758), MLH1_meth and hsa_miR_17_5p,

(b1) at least 5 markers selected from APC (COSM 18561), hsa_miR_133a_3p, hsa_miR_148b_3p, hsa_miR_29c_3p, hsa_miR_20a_5p, hsa_miR_92a_3p, hsa_miR_155_5p, hsa_miR_195_5p, hsa_miR_101_3p, hsa_miR_27a_3p, hsa_miR_26a_5p and hsa_miR_21_5p, and

(c1) optionally at least one additional marker selected from GATA5_meth, Stratifin_meth and MDR1_meth.

The method according to at least one of the preceding claims, wherein in step (b) a group of at least 5, preferably at least 8 types of cancer is defined.

The method according to at least one of the preceding claims, wherein the group of cancers defined in step (b) comprises at least 3, preferably at least 5 from the group of bladder, brain, breast, colorectal, lung, ovarian, pancreas, prostate and stomach cancer.

The method according to at least one of the preceding claims, wherein the markers comprise:

- for bladder cancer, at least one of AR H875Y, TP53 (COSM10758), hsa_miR_17_5p

- for brain cancer, at least one of MLH1_meth, GATA5_meth, hsa_miR_133a_3p

- for breast cancer, at least one of AR H875Y, hsa_miR_17_5p, TP53 (COSM 10758), MDR1_meth

- for colorectal cancer, at least one of AR H875Y, hsa_miR_17_5p, hsa_miR_195_5p, TP53 (COSM 10758)

- for lung cancer, at least one of hsa_miR_155_5p, TP53 (COSM10758), hsa_miR_92a_3p, hsa_miR_17_5p

- for ovarian cancer, at least one of hsa_miR_148b_3p, hsa_miR_29c_3p, hsa_miR_101_3p, hsa_miR_92a_3p

- for pancreas cancer, at least one of hsa_miR_148b_3p, hsa_miR_29c_3p, hsa_miR_27a_3p, Stratifin_meth

- for prostate cancer, at least one of AR H875Y, hsa_miR_26a_5p, hsa_miR_17_5p

- for stomach cancer, at least one of hsa_miR_20a_5p, hsa_miR_21_5p, APC (COSM 18561).

The method according to at least one of the preceding claims, wherein the levels of the markers are determined by quantitative PCR (qPCR).

The method according to at least one of the preceding claims, wherein the total number of markers used is from 5 to 25.

Use of the single nucleotide variant AR H875Y as a marker for determining from a body liquid or fraction thereof which type of cancer an individual has, wherein the type of cancer is selected from bladder, colorectal, lung, stomach, ovarian or brain cancer.

A method for treating cancer, comprising determining in a method of at least one of the preceding claims, which type of cancer an individual has, and

(f) providing a therapeutic treatment for the individual, which is effective against the type of cancer identified in step (e).

A method for providing information for use in determining which type of cancer an individual has and/or if the individual has cancer or not, comprising

(a) providing a sample from the individual, which is a body liquid or fraction thereof,

(b) defining a group of cancer types, which comprise at least two from the group of bladder, brain, breast, colorectal, lung, ovarian, pancreas, prostate and stomach cancer,

(c) determining the levels of markers in the sample, wherein the markers comprise

(i) at least one single nucleotide variant (SNV),

(ii) at least one microRNA (miRNA), and

(iii) at least one DNA methylation; wherein the markers comprise

(a1) at least 2 markers selected from AR H875Y, TP53 (COSM 10758), MLH1_meth and hsa_miR_17_5p, and

(b1) at least 2 markers selected from APC (COSM 18561), hsa_miR_133a_3p, hsa_miR_148b_3p, hsa_miR_29c_3p, hsa_miR_20a_5p, hsa_miR_92a_3p, hsa_miR_155_5p, hsa_miR_195_5p, hsa_miR_101_3p, hsa_miR_27a_3p, hsa_miR_26a_5p and hsa_miR_21_5p; and

(d) comparing the levels to a known standard and optionally storing the information obtained on a storage device.

Description:
Methods for Determining Cancer

The invention relates to a method for determining which type of cancer an individual has or if the individual has cancer or not, wherein single nucleotide variants, microRNA and DNA methylation are used as markers. The invention also relates to uses, methods for treating cancer and methods for providing information.

State of the art

Cancer is mostly a manageable disease as long as it is diagnosed before metastasis has begun. In most cases, higher-grade cancer evolves from lower-grade cancer. Therefore, it is important to detect cancer as early as possible by reliable and accessible diagnostic methods.

Research on tumor markers in the past decades was primarily based on the elevation of specific antigens such as CA-125, CA15-3, CA19-9, PSA and vimentin. Single cancer markers have been found to often give false-negative results, or to be abnormally elevated in cancer-free patients (false positives). This could be explained by the variability in biomarker expression across individuals and the heterogeneity of cancer cells of the same tissue. It is unusual that a single molecular biomarker can accurately diagnose a cancerous disease (Esposito et al., 2016).

Researchers in the field of 'liquid biopsy' are developing technologies to analyze sparse molecular biomarkers shed from inaccessible tissue in easily sampled bodily fluids, such as urine, blood, saliva, sweat, feces, and tears (Cohen et al., 2018). Cell-free DNA (cfDNA) and miRNA (cfmiRNA) from apoptotic, necrotic or viable tumor cells are released into the bloodstream and have a variable half-life in the circulation ranging from 15 minutes to several hours. The majority of cfDNA in the circulation measures 130 to 180 bp (Fleischhacker and Schmidt, 2007). Genome-wide miRNA expression profiling by cfmiRNA sequencing led to finding useful biomarkers for the early diagnosis of various cancers (Shigeyasu et al., 2017).

Tumor-derived cfDNA harbors somatic mutations originating from the tumor and comprises tissue-specific DNA methylation. Hence, an organ-specific epigenetic pattern is measurable in the circulation (Lehmann-Werman et al., 2016). Since many tumors originating from different tissues share identical SNVs (Olivier et al., 2010), epigenetic information adds a tissue-specific data layer.

A combination of more than one analyte has been found to improve specificity and sensitivity in several liquid biopsy studies. It was found that a cancer screening approach based on cfDNA-based SNV analysis improves significantly when it is combined with protein biomarker analysis (Cohen et al., 2017; Cohen et al., 2018). The vast amount of data generated by these methods needs automated data processing to deliver clinically relevant information. The processing methods range from simple approaches, such as logistic regression and support vector machines, to complex artificial neural networks with many hidden layers.

Mutations of the androgen receptor (AR) have been known since 1995. They are primarily found in prostate cancer cells and result in constant activation of the receptor that is "androgen-independent", thus providing a constant growth signal for the cell (Brooke and Bevan, 2009; Taplin et al., 1995). The AR H875Y mutation is predominantly found in prostate cancer, including its lethal castration-resistant form (CRPC). AR mutations were also identified in CRC and breast cancer cells, but at a very low frequency. This mutation was found by deep sequencing of cfDNA in 5 out of 62 CRC patients (Azad et al., 2015). Other studies documented this mutation in one colon carcinoma (Cancer Genome Atlas, 2012) and one breast cancer sample (Razavi et al., 2018). Other cancer types are not documented to harbor the mutation. The results in these studies do not suggest that the mutation could be a relevant marker for determining which type of cancer an individual has.

WO 2016/141169 A1 relates to methods for identifying a treatment associated with cancer. The methods are based on the analysis of protein markers, which is relatively complicated. Further, it is not suggested to use the assay for determining which type of cancer an individual has. Overall, the methods could still be improved.

WO 2019/067092 A1 relates to methods for identifying a single cancer type based on genetic and protein biomarkers. Since the methods are based on the analysis of protein markers, they are relatively complicated. Overall, the methods could still be improved.

WO 2016/033114 A1 relates to methods for determining levels of androgen receptor variants in a sample from a prostate cancer patient. The method is specific for a single cancer type and does not relate to distinguishing types of cancer.

Arora et al., 2020, relates to an outcome-weighted clustering algorithm for integrative molecular stratification focusing on patient survival, which was performed on 18 cancer types across multiple data modalities including somatic mutation, DNA copy number, DNA methylation, and mRNA, miRNA, and protein expression. However, the method is not suitable as a standard diagnostic test, because it is based on a large number of markers which cannot be determined rapidly and conveniently.

Hoadley et al., 2018, relates to integrative molecular analyses of 10,000 tumor specimens from 33 types of cancer, which included chromosome-arm-level aneuploidy, DNA hypermethylation, mRNA and miRNA expression levels and reverse-phase protein arrays. However, the method is not suitable as a standard diagnostic test, because it is based on a large number of markers which cannot be determined rapidly and conveniently.

Wang et al., 2014, relates to the use of similarity network fusion (SNF) to combine mRNA expression, DNA methylation and microRNA (miRNA) expression data for five cancer data sets. Omberg et al., 2013, relates to an analysis of data files encompassing protein expression, copy number variation, somatic mutation, mRNA expression, DNA methylation, microRNA (miRNA) expression and clinical data for 12 cancer types. Mamatjan et al., 2017, relates to an analysis of The Cancer Genome Atlas data regarding molecular signatures for tumor classification. However, no specific diagnostic assays are disclosed, and the accuracy and specificity of the methods could still be improved.

Luo et al., 2019, investigates the correlation between telomerase reverse transcriptase (TERT) activation and cancer types.

The known methods for diagnosing cancer or a cancer type have various drawbacks. Most methods relate to the validation of one single cancer type, but allow neither an early distinction between several different cancer types nor a reliable prediction of whether an individual has cancer or not. The known methods are often complicated and cannot be carried out in a fast and cost-efficient routine manner. Frequently, they are based on data which are not easily available, such as protein levels. Further, the reliability of known methods could still be improved. There is an ongoing need for simple, efficient and reliable methods for determining a type of cancer.

Problem underlying the invention

The problem underlying the invention is to provide methods for determining the type of cancer of an individual, which overcome the above-mentioned problems. Preferably, the method should also be capable of determining whether the individual has cancer or not. It is a problem to provide respective methods which are reliable, but also simple and efficient. The method shall provide significant information about a cancer type from a relatively large number of potential cancer types.

It is a specific problem to provide a method which can be carried out in a simple, convenient, cost-efficient and preferably automated manner. The method shall be based on a relatively low number of markers and should be effective for a sample which is easily accessible. The number of reagents and process steps shall be relatively low.

Disclosure of the invention

Surprisingly, it was found that the problem underlying the invention is overcome by methods and uses according to the claims. Further embodiments of the invention are outlined in the description.

Subject of the invention is a method for determining which type of cancer an individual has and/or for determining if the individual has cancer or not, comprising

(a) providing a sample from the individual, which is a body liquid or fraction thereof,

(b) defining a group of cancer types, which comprise at least two from the group of bladder, brain, breast, colorectal, lung, ovarian, pancreas, prostate and stomach cancer,

(c) determining the levels of markers in the sample, wherein the markers comprise

(i) at least one single nucleotide variant (SNV),

(ii) at least one microRNA (miRNA), and

(iii) at least one DNA methylation, wherein the markers comprise

(a1) at least 2 markers selected from AR H875Y, TP53 (COSM10758), MLH1_meth and hsa_miR_17_5p, and

(b1) at least 2 markers selected from APC (COSM 18561), hsa_miR_133a_3p, hsa_miR_148b_3p, hsa_miR_29c_3p, hsa_miR_20a_5p, hsa_miR_92a_3p, hsa_miR_155_5p, hsa_miR_195_5p, hsa_miR_101_3p, hsa_miR_27a_3p, hsa_miR_26a_5p and hsa_miR_21_5p,

(d) comparing the levels to a known standard, and

(e) determining, based on the result of step (d), which type of cancer from the group defined in step (b) the individual has and/or if the individual has cancer or not.
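
The marker requirements of step (c) can be illustrated with a minimal Python sketch. It only checks whether a chosen panel meets the stated conditions: at least one SNV, one miRNA and one DNA methylation marker, at least 2 markers from group (a1) and at least 2 from group (b1). The function names and the rule of classifying a marker type from its name (prefix `hsa_miR_`, suffix `_meth`) are assumptions made for illustration and are not part of the disclosure.

```python
# Marker groups (a1) and (b1) as listed in step (c) of the method.
A1_GROUP = {"AR H875Y", "TP53 (COSM10758)", "MLH1_meth", "hsa_miR_17_5p"}
B1_GROUP = {
    "APC (COSM18561)", "hsa_miR_133a_3p", "hsa_miR_148b_3p", "hsa_miR_29c_3p",
    "hsa_miR_20a_5p", "hsa_miR_92a_3p", "hsa_miR_155_5p", "hsa_miR_195_5p",
    "hsa_miR_101_3p", "hsa_miR_27a_3p", "hsa_miR_26a_5p", "hsa_miR_21_5p",
}


def marker_type(name):
    """Classify a marker by its name (assumed naming convention, illustration only)."""
    if name.startswith("hsa_miR_"):
        return "miRNA"
    if name.endswith("_meth"):
        return "methylation"
    return "SNV"


def panel_satisfies_step_c(panel):
    """Return True if the panel meets conditions (i)-(iii), (a1) and (b1)."""
    panel = set(panel)
    types_present = {marker_type(m) for m in panel}
    has_all_types = {"SNV", "miRNA", "methylation"} <= types_present
    enough_a1 = len(panel & A1_GROUP) >= 2
    enough_b1 = len(panel & B1_GROUP) >= 2
    return has_all_types and enough_a1 and enough_b1


# Example panel: one SNV, one methylation marker and two miRNAs.
example_panel = ["AR H875Y", "MLH1_meth", "hsa_miR_21_5p", "hsa_miR_92a_3p"]
print(panel_satisfies_step_c(example_panel))
```

A panel without any methylation marker, for instance, would fail the check even if it contains enough markers from both groups.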

The method is for determining which cancer type an individual has and/or if the individual has cancer at all. Therefore, it is a diagnostic method. Cancers involve abnormal cell growth with the potential to invade or spread to other parts of the body. They form a subset of neoplasms (tumors). As used herein, the term “type of cancer” (cancer type) can refer to the body part in which the cancer originates. However, some body parts contain multiple types of tissue, so that the type of cancer can also be the type of cell that the tumor cells originated from.

The individual is a mammal, most preferably a human. However, the individual could also be a non-human mammal, such as a farm or domestic animal, for example a horse or dog. The user of the method is preferably a medical professional, such as a physician or laboratory or medical staff. Alternatively, the method may be carried out at least in part by the individual or by a layperson.

Before the method is carried out, there may be no information or only preliminary information about the type of cancer and/or the presence of cancer. The individual, typically human, may be suspected of having cancer. Accordingly, the method may provide the first information in this regard. The method can also be used for confirming a type of cancer for which there has been a prior indication by other means. The method can also be used for screening or monitoring whether an individual who is not suspected of having cancer, such as a healthy individual, has cancer and which type of cancer. The method can also be used for individuals who are not healthy and in whom the presence of cancer shall be excluded.

In the inventive method, it is determined in step (e) which type of cancer an individual has. The method determines a type of cancer, which is from a defined group of cancer types. The group comprises at least two different cancer types, preferably at least three different types. Preferably, the method is for determining the type of cancer from a group of at least 4, at least 6 or at least 10 different types. Preferably, the method is for determining the type of cancer from a group of 2 to 50, preferably 4 to 30, or 6 to 20 different types of cancer. The method is for differentiating between different cancer types. Thereby, the method is different from methods in the art, in which it is only examined or verified if an individual has one single defined type of cancer or not. According to the invention, it was surprisingly found that a combination of markers, which are described in the following, is especially suitable for distinguishing between different cancer types, although it has not been described in the art that these specific markers would be highly correlated to a single type of cancer. Thus, the present invention is based on a novel approach for distinguishing between a group of different cancers based on a set of novel markers.

In the method, a group of cancer types is defined in step (b). In this step, the term “defining” means selecting cancer types, which could be of interest in the diagnosis. For example, the defined group could comprise the most common cancer types, such that the result would be as comprehensive as possible. Alternatively, the defined group could comprise several cancer types, which are of special interest in view of the symptoms or history of the individual or other circumstances. In step (e), it is determined which cancer from this group the individual has and/or if the individual has cancer, or at least has a type of cancer from the defined group. In step (c), a combination of markers is used which provides relevant information regarding the cancer types defined in step (b).

As used herein, to “determine a type of cancer” shall mean that significant information is provided if the individual has a specific type of cancer or not. Thus, the method shall predict with high likelihood if the individual has a specific type of cancer. Preferably, the likelihood is at least 70%, more preferably at least 80% or even more than 90% or more than 95%. Based on the result, a medical professional can verify by other means, such as a highly specific assay, if the individual has the single cancer type which was determined in the inventive method.

In step (a), a sample from the individual is provided, which is a body liquid or fraction thereof. Preferably, the body liquid is blood, especially in the form of plasma or serum. Most preferred is blood plasma, especially from venous blood. In other embodiments, the body liquid may be saliva, urine, sweat, feces or tears. Thus, the body liquid is not a solid tissue or derived from solid tissue.

Preferably, the body liquid is a liquid biopsy sample. Liquid biopsy is a minimally invasive technology for detection of molecular biomarkers without the need for invasive procedures. A liquid biopsy (fluid biopsy, fluid phase biopsy) is a sample from body fluid, typically blood. Liquid biopsy samples may comprise traces of the cancer’s RNA or DNA, which can be identified in the sample as circulating free DNA (cfDNA) or circulating free RNA (cfRNA). Circulating nucleic acids in blood are typically protected by extracellular micro-vesicles, mainly exosomes. With a blood liquid biopsy sample, nucleic acids can be analysed which are released by the tumor or tumor environment into peripheral venous blood.

According to the invention, it is advantageous that the method can be carried out with a body liquid, especially a liquid biopsy sample, which is not invasive and not burdensome for the individual. The method does not require surgical or invasive provision of tissue samples. Thus, the sample can be obtained easily by medical or unskilled staff or even the individual himself.

According to the invention, it was found that a specific combination of marker levels can provide highly relevant information about a type of cancer from a body liquid, especially a liquid biopsy sample. Surprisingly, the same combination of markers can provide less relevant or even meaningless results from other samples, especially cancer tissue. Also for this reason, it is advantageous that the method is carried out with body liquid, especially a liquid biopsy sample.

In a preferred embodiment, the sample is blood or a fraction thereof. The blood fraction can be serum, plasma or a fraction thereof. The sample should comprise circulating free nucleic acids, preferably at relatively high levels. The sample should be processed in a manner such that a relatively high level of circulating free nucleic acids is preserved or accumulated. The sample may comprise known additives, especially for stabilizing nucleic acids.

The method takes into account the levels of at least three different marker types, which are single nucleotide variant (SNV), microRNA (miRNA) and DNA methylation. It was found that the specific combination of the markers can provide highly relevant results. The markers whose levels are examined in step (c) are those which are compared in step (d) and from which the type of cancer is determined in step (e). Herein, this set of markers is also referred to as the markers “used” in the method, or on which the method is “based”. In contrast, markers are not considered “used” if their levels were only determined additionally in step (c), for example because a larger microarray was used, but are not compared in step (d) and/or not used for determining the type of cancer in step (e).
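
The distinction between markers that are merely measured and markers that are “used” can be sketched in Python as a simple filtering step. This is an illustration under assumptions: the function name, the dictionary layout and the example values are hypothetical, and the extra marker `hsa_miR_16_5p` is only a placeholder for something measured on a larger array.

```python
def select_used_levels(measured_levels, used_markers):
    """Keep only the levels of the markers on which steps (d) and (e) are based.

    measured_levels: dict mapping marker name -> measured level (hypothetical layout)
    used_markers: iterable of marker names that form the panel
    """
    missing = [m for m in used_markers if m not in measured_levels]
    if missing:
        # A used marker without a measurement cannot be compared in step (d).
        raise KeyError(f"no level measured for: {missing}")
    return {m: measured_levels[m] for m in used_markers}


# Hypothetical measurement that includes one marker outside the panel.
measured = {
    "AR H875Y": 1,
    "MLH1_meth": 0,
    "hsa_miR_21_5p": 3.2,
    "hsa_miR_92a_3p": 1.7,
    "hsa_miR_16_5p": 5.0,  # measured additionally, not part of the panel
}
panel = ["AR H875Y", "MLH1_meth", "hsa_miR_21_5p", "hsa_miR_92a_3p"]
print(select_used_levels(measured, panel))
```

Only the four panel markers are passed on; the additionally measured value is ignored and therefore does not count as “used”.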

In the method, the level of at least one single-nucleotide variant (SNV; also single-nucleotide alteration) is determined. An SNV is a variation in a single nucleotide without limitations of frequency. SNVs may arise in somatic cells, for example due to cancer, and thus can be found in low numbers of individuals below or significantly below 1%. In this regard, an SNV is distinct from a single-nucleotide polymorphism (SNP), a substitution of a single nucleotide at a specific position in the genome that is present in a sufficiently large fraction of the population, such as 1% or more. It is known in the art that various SNVs can be associated with specific cancer types. Preferably, the level of SNV markers is determined by quantitative PCR. Preferably, determining a level means that it is determined if a mutation is present or not. However, it can also be determined what the ratio and/or amount of the mutation is compared to the wild type gene.
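
The two ways of expressing an SNV level mentioned above, presence or absence and the ratio of mutant to wild type, can be sketched in Python as follows. This is a minimal illustration only: the function name, the use of copy counts as input and the detection threshold `min_mutant_copies` are assumptions and are not specified in the description.

```python
def snv_level(mutant_copies, wild_type_copies, min_mutant_copies=1.0):
    """Summarize an SNV marker as a presence call and a mutant fraction.

    mutant_copies, wild_type_copies: quantified copies of the mutant and
    wild-type allele (hypothetical inputs, e.g. derived from qPCR).
    min_mutant_copies: assumed detection threshold for calling the mutation present.
    """
    if mutant_copies < 0 or wild_type_copies < 0:
        raise ValueError("copy numbers must be non-negative")
    total = mutant_copies + wild_type_copies
    # Fraction of all copies at this position that carry the mutation.
    fraction = mutant_copies / total if total > 0 else 0.0
    return {
        "present": mutant_copies >= min_mutant_copies,
        "mutant_fraction": fraction,
    }


print(snv_level(5, 95))
print(snv_level(0, 100))
```

The presence call corresponds to the preferred yes/no level, while the mutant fraction corresponds to the optional ratio relative to the wild type gene.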

Preferably, the SNV marker(s) is or comprises the androgen receptor (AR) mutation AR H875Y [AR_CT_Y_N, COSMIC ID: cosm238555; Genomic coordinates GRCh38, X:67723701..67723701, CDS mutation; AA mutation: p.H875Y (Substitution - Missense, position 875, H→Y) (former designation H874Y); CDS mutation: c.2623C>T (Substitution, position 2623, C→T)]. This marker was described in the art for tissue samples of prostate cancer (Taplin, 1995). Surprisingly, it was found herein that the occurrence of the AR mutation in liquid biopsies of patients who have cancer other than prostate cancer is much higher than reported in the literature. For example, it was found that the marker can be suitable for determining bladder, colorectal or breast cancers.

It is highly preferred that the SNV comprise TP53 (COSM10758) (Cosmic ID 10758, TP53_10758_mu_Y_N; substitution - missense). The SNV may also comprise APC (COSM18561) (APC_18561_mu_Y_N, COSMIC ID 18561, Insertion- Frameshift).

In a preferred embodiment of the invention, the SNV comprise AR H875Y and/or TP53 (COSM10758). Preferably, the SNV comprise AR H875Y in combination with TP53 (COSM10758) and/or APC (COSM18561). It was found that these specific SNV markers or groups of markers can provide especially relevant results.

Preferably, the total number of SNV used in the method is at least 2 or at least 3. Preferably, it is up to 30, up to 20, or as low as up to 10, or up to 5. Preferably, the number of SNV is 1 to 30, especially 2 to 20, or 3 to 10. It is advantageous that relevant results can be obtained based on a relatively low number of SNV markers.

In the inventive method, the level of at least one microRNA (miRNA) is determined. A microRNA (miRNA) is a small non-coding RNA molecule, which typically consists of about 22 nucleotides and functions in RNA silencing and post-transcriptional regulation of gene expression. Preferably, the miRNA expression level is determined. Methods for determining miRNA levels in liquid biopsy samples are known in the art (Lan, 2015).

In a preferred embodiment of the invention, the at least one miRNA marker comprises hsa_miR_17_5p. This miRNA was found to provide especially relevant results for determining a type of cancer in the method.

In a preferred embodiment of the invention, the markers comprise at least one, preferably at least 2 or at least 3 miRNA from the group of hsa_miR_17_5p, hsa_miR_133a_3p, hsa_miR_148b_3p, hsa_miR_29c_3p, hsa_miR_20a_5p, hsa_miR_92a_3p, hsa_miR_155_5p, hsa_miR_195_5p, hsa_miR_101_3p, hsa_miR_27a_3p, hsa_miR_26a_5p and hsa_miR_21_5p. Preferably, the markers comprise 2 to 13, preferably 3 to 13 or 5 to 13 miRNA from this group. Preferably, the markers comprise not more than 10 miRNA, preferably not more than 5 miRNA from this group. These miRNA were found to provide especially relevant results for determining a type of cancer in the method. Preferably, the markers from this group include hsa_miR_17_5p, which can be especially relevant for determining a type of cancer. It is advantageous that relevant results can be obtained based on such a relatively low number of miRNA markers from the defined group.

Preferably, the total number of miRNA used in the method is at least 2 or at least 3. Preferably, it is up to 20, up to 10 or even up to 5 only. Preferably, the number of miRNA is 1 to 20, more preferably 2 to 10. It is advantageous that relevant results can be obtained based on a relatively low number of miRNA markers.

In the inventive method, the level of at least one DNA methylation is determined. DNA methylation is a biological process in which methyl groups are attached to a DNA molecule. Methylation can change the activity of a DNA segment without changing the sequence. Typically, the DNA methylation is at a CpG site. CpG sites or CG sites are regions of DNA where a cytosine nucleotide is followed by a guanine nucleotide in the linear sequence of bases along its 5' → 3' direction. CpG sites occur with high frequency in genomic regions called CpG islands. Cytosines in CpG dinucleotides can be methylated to form 5-methylcytosines. In mammals, DNA methylation is almost exclusively found in CpG dinucleotides. Methods for determining DNA methylation in liquid biopsy samples are known in the art (Gai, 2017). Methylation is typically examined on a gene of interest or region thereof. Preferably, DNA methylation levels are determined by quantitative PCR. Preferably, determining a level means that it is determined if a methylation is present or not. However, it can also be determined what the ratio and/or amount of methylation is compared to a corresponding sample from healthy individuals.

In a preferred embodiment of the invention, the at least one DNA methylation marker comprises MLH1_meth. This marker was found to provide especially relevant results for determining a type of cancer in the method. In general, it is preferred that the DNA methylation marker(s) comprise at least one, preferably 2, 3 or all 4 from the group of MLH1_meth, GATA5_meth, Stratifin_meth and MDR1_meth. These DNA methylations were found to provide relevant results for determining a type of cancer in the method. Preferably, the markers from this group include MLH1_meth, which can be especially relevant for determining a type of cancer.

Preferably, the total number of DNA methylation markers used in the method is not more than 10, or not more than 6, preferably not more than 4. Preferably, the number of DNA methylation markers is 1 to 6, preferably 1, 2, 3 or 4. It is advantageous that relevant results can be obtained based on such relatively low numbers of DNA methylation markers.

In the inventive method, specific combinations of the markers from the three types are preferred, which are outlined in the following. The combinations are special, because they basically do not correspond to the markers which are considered as most significant in the art for determining single cancer types. It was surprising that such a group of markers could provide highly significant information for distinguishing between multiple cancer types.

The markers comprise at least 2 markers, preferably 3 or all 4 from a group (A) of AR H875Y, TP53 (COSM10758), MLH1_meth and hsa_miR_17_5p. It was found that these markers can be especially relevant for determining a type of cancer from a group of multiple types.

The markers comprise markers from the group (A) and additional markers, which are at least 2, preferably at least 3 or at least 5 from a group (B) consisting of APC (COSM18561), hsa_miR_133a_3p, hsa_miR_148b_3p, hsa_miR_29c_3p, hsa_miR_20a_5p, hsa_miR_92a_3p, hsa_miR_155_5p, hsa_miR_195_5p, hsa_miR_101_3p, hsa_miR_27a_3p, hsa_miR_26a_5p and hsa_miR_21_5p. Preferably, between 2 and 12, especially between 5 and 10 markers from this group (B) are comprised. Preferably, up to 10 or up to 8 markers from this group are comprised.
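The minimum requirement on groups (A) and (B) can be expressed as a simple set check. The following is only an illustrative sketch: the function names `marker_type` and `panel_satisfies_minimum` are hypothetical, and inferring the marker type from the naming pattern (`hsa_miR_` prefix, `_meth` suffix) is an assumption made for this example, not part of the method.

```python
# Illustrative sketch only: membership lists are copied from groups (A) and (B)
# in the text; the helper names are hypothetical.
GROUP_A = {"AR H875Y", "TP53 (COSM10758)", "MLH1_meth", "hsa_miR_17_5p"}
GROUP_B = {
    "APC (COSM18561)", "hsa_miR_133a_3p", "hsa_miR_148b_3p", "hsa_miR_29c_3p",
    "hsa_miR_20a_5p", "hsa_miR_92a_3p", "hsa_miR_155_5p", "hsa_miR_195_5p",
    "hsa_miR_101_3p", "hsa_miR_27a_3p", "hsa_miR_26a_5p", "hsa_miR_21_5p",
}


def marker_type(name):
    """Guess the marker type from its name (assumption: naming pattern only)."""
    if name.startswith("hsa_miR_"):
        return "miRNA"
    if name.endswith("_meth"):
        return "methylation"
    return "SNV"


def panel_satisfies_minimum(panel, min_a=2, min_b=2):
    """True if the panel has >= min_a markers from (A), >= min_b from (B),
    and covers all three marker types (SNV, miRNA, DNA methylation)."""
    panel = set(panel)
    types = {marker_type(m) for m in panel}
    return (
        len(panel & GROUP_A) >= min_a
        and len(panel & GROUP_B) >= min_b
        and types == {"SNV", "miRNA", "methylation"}
    )


example_panel = ["AR H875Y", "MLH1_meth", "hsa_miR_21_5p", "hsa_miR_20a_5p"]
print(panel_satisfies_minimum(example_panel))
```

Raising `min_a` and `min_b` reproduces the stricter preferred embodiments, for example at least 3 markers from group (A) and at least 5 from group (B).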

In a preferred embodiment, the markers comprise the markers from group (A) and (B) as outlined above, and additionally at least one, preferably at least 2 or at least 3, or all 4 markers selected from a group (C) of GATA5_meth, hsa_miR_133a_3p, Stratifin_meth and MDR1_meth.

In a preferred embodiment of the invention, the markers comprise

(a1) at least 3 markers selected from AR H875Y, TP53 (COSM10758), MLH1_meth and hsa_miR_17_5p, and

(b1) at least 3, preferably at least 5 markers selected from APC (COSM18561), hsa_miR_133a_3p, hsa_miR_148b_3p, hsa_miR_29c_3p, hsa_miR_20a_5p, hsa_miR_92a_3p, hsa_miR_155_5p, hsa_miR_195_5p, hsa_miR_101_3p, hsa_miR_27a_3p, hsa_miR_26a_5p and hsa_miR_21_5p.

In a preferred embodiment, the markers comprise

AR H875Y,

(a1) at least 2 markers selected from TP53 (COSM10758), MLH1_meth and hsa_miR_17_5p,

(b1) at least 5, preferably at least 8 markers selected from APC (COSM18561), hsa_miR_133a_3p, hsa_miR_148b_3p, hsa_miR_29c_3p, hsa_miR_20a_5p, hsa_miR_92a_3p, hsa_miR_155_5p, hsa_miR_195_5p, hsa_miR_101_3p, hsa_miR_27a_3p, hsa_miR_26a_5p and hsa_miR_21_5p, and

(c1) optionally at least one additional marker selected from GATA5_meth, Stratifin_meth and MDR1_meth.

In step (b), a group of at least 2 cancers (cancer types), preferably at least 3, more preferably at least 5, or even at least 8 types is defined. Preferably, the group consists of 2 to 30, or 3 to 25, or 5 to 20 types, especially 6 to 15 types of cancer. Accordingly, it is determined in step (e), which cancer from this pre-determined group the individual has.

In a preferred embodiment, the types of cancer defined in step (b) comprise at least 3, at least 4 or at least 5, more preferably at least 7, at least 8 or all, from the group of bladder, brain, breast, colorectal, lung, ovarian, pancreas, prostate and stomach cancer. Preferably, the types of cancer comprise at least bladder, brain, colorectal and/or stomach cancer, because it was found that the method can identify these types with excellent precision. Preferably, the types of cancer comprise stomach and brain cancer, because it was found that the method can identify these types with especially high precision.

Additionally, the types of cancer may comprise other known types, such as kidney, liver, uterine, oesophageal and thyroid cancer.

In a preferred embodiment of the invention, the markers comprise:

- for bladder cancer, at least one, preferably at least 2, more preferably all of AR H875Y, TP53 (COSM10758), hsa_miR_17_5p,

- for brain cancer, at least one, preferably at least 2, more preferably all of MLH1_meth, GATA5_meth, hsa_miR_133a_3p,

- for breast cancer, at least one, preferably at least 2, more preferably all of AR H875Y, hsa_miR_17_5p, TP53 (COSM10758), MDR1_meth,

- for colorectal cancer, at least one, preferably at least 2, more preferably all of AR H875Y, hsa_miR_17_5p, hsa_miR_195_5p, TP53 (COSM10758),

- for lung cancer, at least one, preferably at least 2, more preferably all of hsa_miR_155_5p, TP53 (COSM10758), hsa_miR_92a_3p, hsa_miR_17_5p,

- for ovarian cancer, at least one, preferably at least 2, more preferably all of hsa_miR_148b_3p, hsa_miR_29c_3p, hsa_miR_101_3p, hsa_miR_92a_3p,

- for pancreas cancer, at least one, preferably at least 2, more preferably all of hsa_miR_148b_3p, hsa_miR_29c_3p, hsa_miR_27a_3p, Stratifin_meth,

- for prostate cancer, at least one, preferably at least 2, more preferably all of AR H875Y, hsa_miR_26a_5p, hsa_miR_17_5p, and/or

- for stomach cancer, at least one, preferably at least 2, more preferably all of hsa_miR_20a_5p, hsa_miR_21_5p, APC (COSM18561).

In a preferred embodiment, the total number of markers used in the method (the sum of SNV, miRNA and DNA methylation markers) is not more than 50, or not more than 30, or preferably not more than 20. Preferably, the total number is from 5 to 50, especially from 10 to 40, or from 5 to 25. More preferably, the total number is from 10 to 25, especially from 15 to 20. Typically, a minimum total number of at least 5 or at least 10 markers may be used. It is a special advantage of the method that it can be carried out with a relatively low total number of markers. Thus, the method can be conducted with a relatively simple test kit and/or test tools at comparably low costs. Moreover, it was found that the specificity of the result can be enhanced when the number of markers is reduced. In this regard, it is preferred that only the levels of markers used for evaluation in step (e) are determined in step (c) and/or compared in step (d), because this simplifies the method and the specificity of the result can be improved. However, it is also conceivable that a larger data set on marker levels is obtained, although the result in step (e) is obtained only based on a limited set of markers selected in step (c).

In a preferred embodiment, the likelihood that the type of cancer is determined correctly (the accuracy) is at least 80%, preferably at least 90%, more preferably at least 95%. Due to the specific selection of markers, it was found that the method allows predictions with such a high accuracy. This provides significant information for the medical professional, which may be used for subsequent verification, for example with other markers or diagnostic tools which are highly specific only for the type of cancer identified in the method.

In a preferred embodiment, the levels of single nucleotide variants (SNV) and DNA methylation are determined in cell free DNA (cfDNA) and/or the levels of the miRNA are determined in cell free miRNA (cfmiRNA). Preferably, the SNV and/or DNA methylation are determined in circulating tumor DNA (ctDNA). It is known in the art that ctDNA, including tumor specific SNV and DNA methylation, can be examined in liquid biopsy samples, and can be detected in the plasma of cancer patients in the early stages of their disease (Han et al., 2017; Gai, 2019). It is also known in the art that miRNA are potential biomarkers for cancer (Lan, 2015). However, methods for determining a type of cancer based on the marker combination as in the present invention have not been described in the art.

In a preferred embodiment, the levels of the markers are determined by quantitative PCR (qPCR). Preferably, all marker levels are determined by qPCR. This is advantageous, because all the levels of the different markers SNV, miRNA and DNA methylation can be determined uniformly with this highly specific and relatively simple method. In this regard, microRNA expression can be quantified in a two-step polymerase chain reaction of modified RT-PCR followed by qPCR.

Since the inventive method provides significant information based on SNV, miRNA and DNA methylation markers, it is preferred that no other marker types are additionally used. It is especially preferred that no protein marker is used in the method. This is advantageous, because analyzing protein markers is generally more complicated. Moreover, qPCR is not applicable for determining protein marker levels. In a less preferred embodiment, additional other marker types, which are not SNV, miRNA or DNA methylation markers, can be used. In one embodiment, the method does not comprise a nucleic acid sequencing step. This can be advantageous, because sequencing methods, such as next-generation sequencing, are often more complicated and can be less sensitive. In another embodiment, the method may comprise nucleic acid sequencing, especially instead of qPCR. In principle, a high sensitivity may also be achieved with a sequencing technique. Preferably, the method is conducted in an automated manner, such as on an automated workstation.

The method can be applied for determining what type of cancer an individual has. In this case, the result can be that the individual has a cancer type from a defined group. If the result is that the individual does not have a cancer type from the defined group, the result can be that the individual is healthy, or that it may have cancer of another type, which is not part of the defined group.

In a preferred embodiment, it is additionally determined if the individual has cancer or not. In this case, the method is for determining a type of cancer from the defined group, or alternatively that the individual does not have cancer at all, or at least not a type of cancer from the defined group. Preferably, the likelihood to determine correctly that the individual does not have cancer is at least 80%, preferably at least 90%, more preferably at least 95%.

In a preferred embodiment, the method is applied for determining if the individual has cancer or not. In this case, the result will be that the individual has a cancer from the defined group or is healthy. In this application, it is not required to determine which type of cancer the individual has. In case the result is positive, it can be determined subsequently what type of cancer the individual has. This method is of high practical relevance, because it is often necessary in clinical practice to determine rapidly and reliably if an individual has cancer or not, before performing subsequent specific diagnostic methods, which are generally more time consuming, complicated and costly. Preferably, the likelihood to determine correctly that the individual has cancer is at least 80%, preferably at least 90%, more preferably at least 95%.

Subject of the invention is also the use of the single nucleotide variant AR H875Y as a marker for determining from a body liquid or fraction thereof, preferably a liquid biopsy sample, which type of cancer an individual has, wherein the type of cancer is selected from bladder, colorectal, lung, stomach, ovarian or brain cancer, preferably from bladder or colorectal cancer. In the art, it has not been suggested that this SNV could be of special relevance for distinguishing between different types of cancer in liquid biopsy samples.

In a preferred embodiment, the method comprises a final step, in which it is confirmed by other diagnostic means that the type of cancer was identified correctly in step (e). For example, the confirmation could be made with one or more highly specific markers for the type of cancer determined, possibly based on a tissue sample.

The inventive method comprises steps (a) to (e). Whilst steps (a), (c), (d) and (e) are carried out in consecutive order, the definition of the group of types of cancers in step (b) does not necessarily have to be performed between steps (a) and (c).

In step (a), the liquid biopsy sample can be taken from the individual, preferably in a non- or minimally invasive method. However, the sample can also be obtained from the individual before step (a), such that the overall method is an in vitro method.

Marker levels in step (c) can be determined by routine methods of a medical or biochemistry laboratory, preferably qPCR. Respective methods are known and have been described in the art.

In principle, comparing levels in step (d) to a known standard means that the levels of the markers, which were determined in step (c), are compared to levels which are known from the art or which were determined in advance for the same markers, for the same cancer types from the group defined in step (b), and for healthy individuals. Standard levels of SNVs, miRNA and DNA methylation markers for comparison in step (d) can be obtained from the literature, from this specification or experimentally. Comparative data regarding the marker levels can be obtained experimentally from samples from individuals which are known to have the defined cancer types and/or healthy individuals as a standard control group. The results from step (c), when compared to a known standard accordingly, provide relevant information for determining in step (e) which type of cancer the individual has.

In step (e), the type of cancer from the defined group is determined based on the comparative data from step (d). The type of cancer is determined from the comparative data based on a suitable algorithm. The algorithm can be selected and adjusted by known methods or as described in detail in the experimental part of this specification. Briefly, the algorithm can be segmented into the following steps: (i) calculation of a correlation matrix for all variables of each cancer type in contrast to the healthy control group; (ii) ranking all correlations by their correlation coefficient; (iii) saving the 12 top-ranked variables of each cancer type; (iv) scaling each variable if it is not a Boolean variable; (v) performing a linear regression with the transformed variables to predict the cancer type, ranking their importance and computing a confusion matrix; (vi) optimization of the classification by computing a score of all possible combinations of the 12 variables to avoid the inclusion of variables only displaying covariances; (vii) predicting the cancer type with the reduced number of variables analogously to step (v). Instead of 12 top-ranked variables, it is also possible to select fewer or more top-ranked variables, such as 8 to 15. Preferably, the most relevant information is if the sample has the AR H875Y mutation or not. Preferably, the samples are initially split into two groups which either have or do not have the AR H875Y mutation.
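Steps (i) to (iv) of this outline can be sketched in a few lines of standard-library Python. This is a minimal sketch under stated assumptions, not the implementation used in the study: the use of the Pearson coefficient, z-score scaling, and all function names (`pearson`, `top_ranked_markers`, `prepare_features`) are choices made for illustration, and the toy marker levels are invented. The regression and combination search of steps (v) to (vii) are not shown.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def top_ranked_markers(samples, labels, k=12):
    """Steps (i)-(iii): correlate each marker with the label
    (1 = cancer type of interest, 0 = healthy), rank by |r|, keep the top k."""
    markers = sorted(samples[0])
    scored = [(abs(pearson([s[m] for s in samples], labels)), m) for m in markers]
    scored.sort(key=lambda t: (-t[0], t[1]))  # highest |r| first, name as tie-break
    return [m for _, m in scored[:k]]


def scale(values):
    """Step (iv), assumed here to be z-score scaling."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0:
        return [0.0] * n
    return [(v - mean) / sd for v in values]


def prepare_features(samples, labels, k=12):
    """Select the top-k markers and scale every non-Boolean column."""
    columns = {}
    for m in top_ranked_markers(samples, labels, k):
        col = [s[m] for s in samples]
        is_boolean = all(v in (0, 1) for v in col)
        columns[m] = col if is_boolean else scale(col)
    return columns


# Toy data, invented for illustration only.
toy_samples = [
    {"AR H875Y": 1, "hsa_miR_17_5p": 5.0, "noise": 1.0},
    {"AR H875Y": 1, "hsa_miR_17_5p": 6.0, "noise": 2.0},
    {"AR H875Y": 0, "hsa_miR_17_5p": 2.0, "noise": 2.0},
    {"AR H875Y": 0, "hsa_miR_17_5p": 1.0, "noise": 1.0},
]
toy_labels = [1, 1, 0, 0]
print(top_ranked_markers(toy_samples, toy_labels, k=2))
```

Ranking each cancer type separately against the healthy group, as in step (i), would amount to calling `top_ranked_markers` once per cancer type with the corresponding labels.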

Subject of the invention is also a method for treating cancer, comprising determining in a method of the invention as described above, which type of cancer an individual has, and

(f) providing a therapeutic treatment for the individual, which is effective against the type of cancer identified in step (e).

Subject of the invention is also a method for identifying a therapeutic treatment against cancer for an individual, which comprises determining which type of cancer the individual has by the method as described above, and subsequent identification of the treatment against the type of cancer. The therapeutic treatment can be any treatment known in the art for the specific type of cancer, such as provision of an active agent and/or physical treatment, such as radiation or surgical treatment.

Subject of the invention is also a method for providing information for use in determining which type of cancer an individual has and/or if the individual has cancer or not, comprising (a) providing a sample from the individual, which is a body liquid or fraction thereof,

(b) defining a group of cancer types, which comprise at least two from the group of bladder, brain, breast, colorectal, lung, ovarian, pancreas, prostate and stomach cancer,

(c) determining the levels of markers in the sample, wherein the markers comprise

(i) at least one single nucleotide variant (SNV),

(ii) at least one microRNA (miRNA), and

(iii) at least one DNA methylation; wherein the markers comprise

(a1) at least 2 markers selected from AR H875Y, TP53 (COSM10758), MLH1_meth and hsa_miR_17_5p, and

(b1) at least 2 markers selected from APC (COSM18561), hsa_miR_133a_3p, hsa_miR_148b_3p, hsa_miR_29c_3p, hsa_miR_20a_5p, hsa_miR_92a_3p, hsa_miR_155_5p, hsa_miR_195_5p, hsa_miR_101_3p, hsa_miR_27a_3p, hsa_miR_26a_5p and hsa_miR_21_5p; and

(d) comparing the levels to a known standard, and optionally storing the information obtained in step (d) on a storage device.

This method can provide information in step (d), which can be used later and/or at a different location for determining which type of cancer an individual has. The information can be stored and/or transmitted to a user, such as a medical professional. The storage device can be a digital device, such as a computer or a cloud solution.

The inventive method solves the problem underlying the invention. The method is based on a specific novel combination of markers. Surprisingly, the markers are generally not those considered most relevant in the art for a single specific type of cancer. The specific groups of markers used can distinguish between multiple cancer types. Moreover, it was found that the miRNA and DNA methylation can be especially relevant for distinguishing the types of cancer. This was unexpected, because it is generally assumed that SNV are predominantly significant for identifying a specific cancer type. The inventive method can be carried out with a low number of markers and provides highly significant information. It is advantageous that it can also be used for determining if an individual is healthy or not. Since all markers can be examined by the same method, especially quantitative PCR, the method can be simple, convenient and cost-efficient. Accordingly, it can be used by medical professionals in a routine and automated manner at high throughput.

Fig. 1a and 1b provide a list of TaqMan™ castPCR™ assays (company ThermoFisher, US) used for the analysis of cfDNA of 206 cancer patients and 15 healthy donors. Assay ID/Name and Target Information refer to the assay ID and names issued by ThermoFisher. Ct Threshold refers to the ΔCt that was applied in this study and is defined as: ΔCt = Ct(SNV Assay) - Ct(Reference Assay).

Fig. 2 provides a list of TaqMan™ Advanced miRNA PCR assays from ThermoFisher used for the analysis of cfmiRNA.

Fig. 3 provides a list of amplicons for qPCR used for the methylation analysis of cfDNA.

A further figure provides a graphic overview of the study results. Fractions correspond to the proportion of correctly classified patients. Error bars represent 95% confidence intervals.

Further figures provide a graphic overview of correlation plots for each cancer type. The correlation coefficient of each biomarker is plotted on the y axis (“abs(sorted)”). An index value (“index”) is assigned to each biomarker according to the value of its correlation coefficient (ranked from highest to lowest value) and plotted on the x axis. The 15 biomarkers with the highest correlation coefficients are displayed in the legends of the plots for each cancer type (A-I).

A further figure provides a graphic overview of the cancer type specific cfDNA methylation levels which were determined in the working examples. (A-C) Boxplots of cfDNA methylation for (A) MLH1, (B) SFN, (C) MDR1. Boxes are the 25th to 75th percentile, the line is the median, and whiskers are 1.5x IQR. P-values are shown as * p < 0.05; ** p < 0.005; *** p < 0.001. Lower case letters indicate the group with significantly different cfDNA methylation levels: a healthy, b bladder, c brain, d breast, e CRC, f lung, g ovarian, h prostate, i stomach, k pancreas.

A further figure provides a graphic overview of the cancer type specific miRNA expression levels which were determined in the working examples. Boxplots of the miRNAs deregulated between the different cancer types and control group (healthy). Boxes are the 25th to 75th percentile, the line is the median, and whiskers are 1.5x IQR. Lower case letters indicate the group with significantly different miRNA levels: a healthy, b bladder, c brain, d breast, e CRC, f lung, g ovarian, h prostate, i stomach, k pancreas. (L) Grouped plot of the differentially expressed miRNAs between cancer samples with (AR+) and without (AR-) H875Y androgen receptor mutation. The line is the mean value and whiskers are 95% CI. P-values are shown as * p < 0.05; ** p < 0.005; *** p < 0.001.

EXAMPLES

A clinical study was conducted to determine whether cfmiRNA, SNVs in cfDNA, and DNA methylation in cfDNA biomarkers can be combined to predict an early cancer setting in a specific tissue under conditions that preserve high specificity for bladder, brain, breast, colorectal, lung, ovarian, prostate, stomach, and pancreatic cancers.

First, we developed commercially unavailable new SNV assays covering known mutational hotspots tightly linked to various cancerous states.

Secondly, we developed a pan-cancer panel comprising 6 miRNA expression and 2 DNA methylation profiles implicated across 9 tumor tissue types.

Thirdly, we developed panels to predict a cancerous disease in nine tissues using only 3 SNVs, 13 miRNA expression, and 4 DNA methylation profiles.

STUDY DESIGN

Plasma DNA Samples

A total of 220 plasma samples were obtained through a human biospecimen CRO with collection centers in Bulgaria, Romania, and Serbia following approval by the IRBs for Human Research at each institution and informed consent. 205 patients suffering for the first time in their life from one of nine different cancer types (bladder, brain, breast, colorectal, lung, ovarian, prostate, stomach, and pancreatic) at stage I - III were included in this study. Peripheral blood was collected before surgery and before any neoadjuvant therapy between February 2019 and February 2020. The samples were liquid biopsy samples. The following patient information was documented: age, weight, height, gender, current infectious diseases, AJCC stage (7th edition), and family cancer history. For the control group, we collected plasma from 15 healthy donors. Blood was sampled using K2-EDTA blood collection tubes. The blood underwent double centrifugation within the first two hours after the blood draw. It was first centrifuged at 2,000 x g for 10 min at 4°C; the resulting plasma was then transferred to a new tube and centrifuged at 16,000 x g for 10 min at 4°C. The cell-free plasma was then stored at -80°C until shipping on dry ice.

Nucleic acid purification

cfDNA was purified from 4 ml plasma on a Kingfisher™ Duo Prime purification system using the MagMAX™ Cell-Free DNA Isolation Kit (company ThermoFisher, US). The elution volume for the cfDNA was 80 µl. The DNA concentration was measured with the Qubit™ dsDNA HS Assay Kit on a Qubit™ 4 fluorometer (ThermoFisher). Total RNA was purified from 100 µl plasma on a Kingfisher™ Duo Prime purification system using the MagMAX™ mirVana™ Total RNA Isolation Kit (ThermoFisher). As a spike-in, C. elegans miRNA 39 was added to the lysis buffer at a concentration of 15 fmol/sample. The elution volume for the total RNA was 50 µl.

SNV marker detection

Based on the publicly available COSMIC database, TaqMan PCR assays were designed to cover SNVs playing a significant role in different cancerous diseases. The maximal Ct difference between the SNV and corresponding wild-type reference assay is set to 9, reflecting a minimal mutant allele fraction (MAF) of 0.2%. This allows us to detect 0.2 % of mutant alleles in the presence of 99.8 % wild-type alleles.

4.45 µl of the eluted cfDNA underwent Blunt End Ligation-Mediated Whole Genome Amplification (BL-WGA) according to a previous study (Li et al., 2006). 1 µl of BL-WGA DNA was measured with the Qubit™ dsDNA BR Assay Kit on a Qubit™ 4 fluorometer (ThermoFisher). 1-2 µg BL-WGA DNA was mixed with TaqMan™ Fast Advanced Master Mix (2X), TaqMan™ Mutation Detection IPC Reagent Kit, and nuclease-free water to a final volume of 1 ml. 10 µl of the mixture was dispensed to each well of a pre-spotted 96-well Competitive allele-specific TaqMan™ PCR (castPCR™) plate and analyzed on a QS3 Real-time PCR system (all ThermoFisher). TaqMan™ Mutation Detection Assays are powered by competitive allele-specific TaqMan™ PCR (castPCR™ Technology) to detect and measure somatic mutations in genes associated with cancer research. The castPCR™ technology is highly specific and sensitive and can detect rare amounts of mutated DNA in a sample that contains large amounts of normal, wild-type DNA. Competitive, allele-specific TaqMan™ PCR utilizes an allele-specific primer for mutant allele detection that competes with an MGB blocker oligonucleotide to suppress the wild-type background. These assays can detect down to 0.1% mutation in a background of wild-type DNA. The PCR conditions comprised an initial denaturation step of 10 minutes at 95 °C, followed by 5 cycles of 15 sec denaturation at 92 °C and one minute extension at 58 °C. This was followed by 40 cycles of 15 sec denaturation at 92 °C and one minute extension at 60 °C. Real-time data were collected during the last 40 cycles of amplification and analyzed using the Mutation Detector™ software v.2.0 (ThermoFisher). Briefly, the abundance of an SNV is computed by

$$\Delta Ct = Ct(\text{SNV Assay}) - Ct(\text{Reference Assay})$$

Theoretically, if ΔCt is between 9.99 and 0, the sample is declared as being positive for the SNV (0.1% detection limit). Not all PCR assays have the same sensitivity, and the overall sensitivity can be influenced by many factors, such as PCR reaction volume, cfDNA extraction method, or cell lysis, which increases the wild-type background DNA. To address this issue, we decreased the sensitivity of each assay to a ΔCt that led to no positive SNV detection in our 15 healthy donors. Assay specific ΔCt values are declared in Fig. 1a and 1b.
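The relation between the ΔCt cut-off and the mutant allele fraction can be illustrated with a short sketch. It assumes ideal amplification, i.e. one doubling per cycle, so that the mutant fraction is approximately 2^(-ΔCt); real assays deviate from this. The function names are hypothetical and the Ct values in the usage lines are invented.

```python
def delta_ct(ct_snv, ct_reference):
    """Delta Ct = Ct(SNV assay) - Ct(reference assay)."""
    return ct_snv - ct_reference


def approx_mutant_fraction(dct):
    """Approximate mutant allele fraction, assuming perfect doubling per cycle."""
    return 2.0 ** (-dct)


def is_snv_positive(ct_snv, ct_reference, threshold):
    """Call a sample positive if 0 <= Delta Ct <= assay-specific threshold."""
    dct = delta_ct(ct_snv, ct_reference)
    return 0 <= dct <= threshold


# A threshold of 9 corresponds to roughly 0.2 % mutant alleles,
# a threshold of 9.99 to roughly 0.1 %.
print(round(100 * approx_mutant_fraction(9), 3))     # percent
print(round(100 * approx_mutant_fraction(9.99), 3))  # percent

# Invented Ct values for illustration.
print(is_snv_positive(ct_snv=33.0, ct_reference=25.0, threshold=9))
print(is_snv_positive(ct_snv=36.0, ct_reference=25.0, threshold=9))
```

Lowering `threshold` for an individual assay, as done in the study to avoid positive calls in the healthy donors, trades sensitivity for specificity.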

TaqMan® Advanced miRNA cDNA Synthesis Kit (ThermoFisher) was used to reverse transcribe miRNAs from 2 µl of the eluted total RNA. We did not change the protocol of the manufacturer. Briefly, 1:10 diluted pre-amplified cDNA template was mixed with TaqMan® Fast Advanced Master Mix (2X) and nuclease-free water to a final volume of 528 µl. The mixture was dispensed to 48 wells of a pre-spotted 96-well TaqMan Advanced miRNA PCR plate. The reaction volume in each well was 10 µl. The plate was analyzed on a QS3 Real-time PCR system (all ThermoFisher). The fast PCR conditions comprised an initial denaturation step of 20 seconds at 95 °C. This was followed by 40 cycles of 1 sec denaturation at 95 °C and 20 sec annealing/extension at 60 °C. Real-time data were collected at the end of each annealing/extension step and analyzed using the relative quantification app, which is part of the ThermoFisher cloud (ThermoFisher).

Up to 75 µl of the remaining purified cfDNA was used to enrich for methylated DNA using the MethylMiner Enrichment Kit (ThermoFisher). Methylated and non-methylated control duplexes provided by the manufacturer were used as controls for methyl-CpG-binding-domain (MBD) capture. The primers (150 nM final concentration) were designed to amplify 62 to 119 bp segments (Fig. 3). The abundance of each region of interest was quantified in the depleted and enriched fractions with specific primers mixed with GoTaq® qPCR Master Mix (2X) (Promega) and nuclease-free water in a 10 µl reaction volume on a QS3 Real-time PCR system. The PCR conditions comprised an initial denaturation step of 2 minutes at 95 °C. This was followed by 40 cycles of 3 sec denaturation at 95 °C and 30 sec annealing/extension at 60 °C. Real-time data were collected at the end of each annealing/extension step. A melt curve analysis was performed for quality control. The amount of each region of interest in the depleted and enriched fractions was analyzed with the QuantStudio v1.4.3 software (ThermoFisher). The methylation of each region of interest was calculated with the following formula:

$$\Delta C_t = C_t(\text{Enriched}) - C_t(\text{Depleted})$$

$$\%\ \text{Methylation} = 100 - \frac{100}{1 + 2^{-\Delta C_t}}$$
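As a worked illustration of the methylation calculation, the sketch below assumes the formula reads % Methylation = 100 - 100 / (1 + 2^(-ΔCt)) with ΔCt = Ct(Enriched) - Ct(Depleted). Under that reading, and assuming 100% PCR efficiency, the result equals the enriched fraction's share of the combined signal. The function name and the Ct values are hypothetical.

```python
def percent_methylation(ct_enriched: float, ct_depleted: float) -> float:
    """Percent methylation of a region from the Ct values of both fractions.

    Assumed formula: 100 - 100 / (1 + 2 ** -ΔCt),
    where ΔCt = Ct(Enriched) - Ct(Depleted).
    """
    dct = ct_enriched - ct_depleted
    return 100.0 - 100.0 / (1.0 + 2.0 ** (-dct))


# Equal Ct in both fractions: half of the signal is in the enriched fraction
print(percent_methylation(26.0, 26.0))
# Enriched fraction amplifies two cycles earlier: four times more template
print(percent_methylation(25.0, 27.0))
```

A lower Ct in the enriched fraction (negative ΔCt) pushes the value towards 100%, and a lower Ct in the depleted fraction pushes it towards 0%.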

Marker selection

For the initial SNV marker pool, we used the Catalog of Somatic Mutations in Cancer (COSMIC) database to find SNVs that occur in 95 % of each cancer type (bladder, brain, breast, colorectal, lung, ovarian, prostate, stomach, and pancreatic cancer). This list, consisting of 203 SNVs in 47 genes, was then reduced to SNVs that are more common for one or more cancer types so that it fits onto a 96-well PCR plate, including a reference assay for each gene. The final SNV list consists of 75 mutational assays and 21 reference assays. This list was further reduced to include only assays that detected at least one positive case in our study population (Fig. 1a, 1b).

For the selection of the initial miRNA biomarkers, we performed literature research for each cancer type and ranked the observed expression changes as high (weighted as 10), medium (weighted as 5), or slight (weighted as 1) to rate the importance of each miRNA in each cancer. The 46 miRNA targets with the highest rating were selected. hsa-miR-16-5p was added as reference and cel-miR-39-3p as spike-in control (Fig. 2).
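The weighted rating can be sketched as below. The text does not say how ratings were combined across cancer types, so summing the weights per miRNA is an assumption, and the miRNA names and observations are hypothetical placeholders.

```python
from collections import defaultdict

# Weights stated in the text for the observed expression change
WEIGHTS = {"high": 10, "medium": 5, "slight": 1}


def rank_mirnas(observations, top_n=46):
    """Rank miRNAs by summed weight (assumed aggregation across cancer types).

    observations: iterable of (mirna, cancer_type, change) tuples,
    where change is one of "high", "medium", "slight".
    """
    scores = defaultdict(int)
    for mirna, _cancer_type, change in observations:
        scores[mirna] += WEIGHTS[change]
    # Highest score first; ties broken alphabetically for reproducibility
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Hypothetical literature observations
example = [
    ("miR-A", "lung", "high"),
    ("miR-A", "breast", "slight"),
    ("miR-B", "lung", "medium"),
]
print(rank_mirnas(example))
```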

For the selection of the initial DNA methylation biomarker, we performed literature research for each cancer type. Further criteria for selection were that the PCR amplicon is shorter than 120 bp and that the primers are specific for the selected region of interest, which was tested by aligning them against the whole genome using Primer-BLAST (NCBI). PCR assays for 26 regions in 21 genes were found to meet these criteria and were tested by a SYBR Green qPCR reaction with melt curve analysis. The PCR product was also checked on a 2% agarose gel for length confirmation and possible PCR byproduct determination. 12 assays covering CpG rich regions in 12 genes were selected for the study (Fig. 3).

ALGORITHM

When many variables/features are measured relative to a small sample size, estimates tend to be less accurate. To avoid random associations between features and cancer types, a subset of features is selected for each cancer type. The algorithm is divided into three modules:

Correlations

A correlation matrix is calculated for all variables of each cancer type in contrast to the healthy control group to measure the strength of association of each variable with each cancer type. Relevant variables are selected based on significant correlation and high correlation coefficients. Further, some metric variables were dichotomized upon an automatically defined threshold value and used for cancer type classification.
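The correlation-based selection can be illustrated with a small sketch. It ranks variables by the absolute Pearson correlation with a binary label (1 = cancer type, 0 = healthy control) and dichotomizes a metric variable at a given threshold. Significance testing and the automatic choice of threshold are not shown, and all names and data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def top_correlated(features, labels, k=12):
    """Names of the k variables with the largest absolute correlation."""
    scored = {name: pearson(values, labels) for name, values in features.items()}
    return sorted(scored, key=lambda name: abs(scored[name]), reverse=True)[:k]


def dichotomize(values, threshold):
    """Turn a metric variable into 0/1 at the given threshold."""
    return [1 if value > threshold else 0 for value in values]


features = {"marker_a": [1, 2, 3, 4], "marker_b": [1, 1, 1, 1], "marker_c": [4, 1, 3, 2]}
labels = [0, 0, 1, 1]
print(top_correlated(features, labels, k=1), dichotomize(features["marker_a"], 2.5))
```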

Classification

The correlation data is then used for the classification module. The Rpart tree classification performs the classification itself. Modifications made to the default values of the packages:

• We adjusted the maximal number of variables to be considered everywhere to 12. This reduces the computing time by only including the first 12 variables that show the highest correlation with a distinct cancer type or cancer itself.

• False-negative classifications are penalized compared to false positives by a factor of 2. This leads to fewer false-negative results.
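The effect of penalizing false negatives can be shown with a single cost-weighted split. This is a simplified stand-in for one node of a classification tree, not the rpart algorithm itself, and how the penalty was configured in the original code is not stated. Function names and data are hypothetical.

```python
def best_split(values, labels, fn_cost=2.0, fp_cost=1.0):
    """Find the threshold with the lowest weighted misclassification cost.

    labels: 1 = cancer, 0 = healthy. A sample is predicted as cancer when
    its value is above the threshold. Returns (threshold, cost), or None
    when fewer than two distinct values are available.
    """
    distinct = sorted(set(values))
    best = None
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        false_neg = sum(1 for v, y in zip(values, labels) if y == 1 and v <= threshold)
        false_pos = sum(1 for v, y in zip(values, labels) if y == 0 and v > threshold)
        cost = fn_cost * false_neg + fp_cost * false_pos
        if best is None or cost < best[1]:
            best = (threshold, cost)
    return best


values = [1, 2, 3, 4, 5, 6, 7, 8]
labels = [0, 1, 1, 0, 0, 0, 1, 1]
print(best_split(values, labels))               # false negatives cost twice as much
print(best_split(values, labels, fn_cost=1.0))  # equal costs
```

With the factor of 2, the split moves to a threshold that misses no cancer sample at the price of more false positives; with equal costs, the split accepts two missed cancer samples instead.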

This classification by the algorithm gives each variable a score. The higher a score is, the more it is associated with a particular cancer type or cancer itself. Each model's predictive accuracy is determined by calculating a confusion matrix, out-of-bag (OOB) estimate of error rate, McNemar's test, and/or Cohen's kappa coefficient. Cohen's kappa measures the agreement between two raters who classify N items into C mutually exclusive categories. Kappas over 0.75 are regarded as excellent, 0.40 to 0.75 as fair to good, and below 0.40 as inferior (Fleiss, 1973). Kappa values range from 0 (observed allocation is random) to 1 (perfect agreement between prediction and reference).
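As a worked example of Cohen's kappa, the sketch below uses the standard definition, kappa = (p_o - p_e) / (1 - p_e), and the interpretation bands quoted above. The counts are the bladder cancer counts reported in the results of this document (16 of 20 patients and 15 of 15 healthy subjects correctly classified). The function names are hypothetical.

```python
def cohens_kappa(matrix):
    """Cohen's kappa from a square confusion matrix.

    matrix[i][j] is the number of samples with reference class i
    that were predicted as class j.
    """
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(size)) / total
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(size)
    ) / total ** 2
    return (observed - expected) / (1 - expected)


def agreement_band(kappa):
    """Interpretation bands as quoted in the text (Fleiss, 1973)."""
    if kappa > 0.75:
        return "excellent"
    if kappa >= 0.40:
        return "fair to good"
    return "inferior"


# Rows: reference (bladder cancer, healthy); columns: prediction
bladder = [[16, 4], [0, 15]]
kappa = cohens_kappa(bladder)
print(round(kappa, 4), agreement_band(kappa))
```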

Optimize Classification

The correlation plots for each cancer type are shown in Fig. 5a and 5b. The correlation coefficient of each biomarker is plotted on the y axis. An index value is assigned to each biomarker according to the value of its correlation coefficient (ranked from highest to lowest value) and plotted on the x axis. The 15 biomarkers with the highest correlation coefficients are displayed in the legends of the plots for each cancer type.

The twelve variables showing the highest scores for each tumor were taken for further optimization. All combinations of these 12 variables were tested for their importance in predicting a particular tumor type. This step eliminates variables with redundant information. The exclusion of certain variables did not worsen the predictive power for a particular tumor type.
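Testing all combinations of 12 variables means evaluating 2^12 - 1 = 4095 non-empty subsets. A minimal sketch of such an exhaustive search is shown below. The scoring function is a hypothetical placeholder for whatever performance measure is used, and preferring the smallest subset among equally good ones is an assumption.

```python
from itertools import combinations


def best_subset(variables, score):
    """Exhaustively search all non-empty subsets for the highest score.

    Subsets are visited from small to large and only a strictly better
    score replaces the current best, so redundant variables are dropped.
    """
    best_vars, best_score = (), float("-inf")
    for size in range(1, len(variables) + 1):
        for subset in combinations(variables, size):
            current = score(subset)
            if current > best_score:
                best_vars, best_score = subset, current
    return best_vars, best_score


# Hypothetical score: only "a" and "b" carry information, "c" is redundant
informative = {"a", "b"}
print(best_subset(["a", "b", "c"], lambda subset: len(informative & set(subset))))
```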

Based on these results for the single cancer types, the most useful combinations of variables/features are identified by measuring how they influence specific performance metrics. Performance metrics can be defined based, e.g., on the confusion matrix such that maximal sensitivity is achieved while allowing a moderate specificity, i.e., false positives can be defined to decrease performance less than false negatives, or false positives can be allowed while strictly limiting the allowed relative number of false negatives. The variables/features used for the classification of all cancer types are engineered by varying the number of the features used for each cancer type starting from the most correlated ones, by leaving out (superfluous) features that yield no improvement concerning the performance measure, and by choosing features that are especially useful to discern between cancer types even if their correlations are lower.

An additional design goal is to limit the total number of features necessary to achieve the desired performance by identifying the most meaningful ones concerning the performance metric, as outlined above. Sought results are statements of the form that a particular tumor type/stage is likely present, that another tumor type/stage is present with lower probability, or that no tumor is present.

Since many such combinations of features exist and need to be evaluated with respect to the performance metric, the necessary calculations are performed in an automated manner. This allows for identifying meaningful features for different performance metrics by rerunning the program, and it is useful to automatically evaluate the features and possibly adjust their choice when the sample size increases.

The following R packages were used in the algorithm code:

• Caret: The caret package (short for Classification And REgression Training) is a set of functions that attempt to streamline the process for creating predictive models. The package contains tools for data splitting, pre-processing, feature selection, model tuning using resampling, variable importance estimation, as well as other functionality.

• Corrplot: A graphical display of a correlation matrix or general matrix. It also contains some algorithms to do matrix reordering.

• Hmisc: Contains many functions useful for data analysis, high-level graphics, utility operations, functions for computing sample size and power, importing and annotating datasets, imputing missing values, advanced table making, variable clustering, character string manipulation, conversion of R objects to LaTeX and html code, and recoding variables.

• Readxl: Import excel files into R.

• Rpart: The rpart code builds classification or regression models of a very general structure using a two-stage procedure; the resulting models can be represented as binary trees.

• Rpart.plot: Plot an rpart model, automatically tailoring the plot for the model's response type.

RESULTS

Classification of subjects as cancerous or healthy - first analysis

The algorithm of the pan-cancer analysis classifies a sample as cancerous or healthy. The data is based on the analysis of 205 cancer patients and 15 healthy, cancer-free subjects. A group of eight markers is included in the analysis. The algorithm is based on a tree model in R, and at each node of a 17-node tree, different thresholds for each marker are applied to split the data into the two bins, cancerous and healthy.

Included markers: GATA5_meth, APC_meth, hsa_miR_126_3p, hsa_miR_143_3p, hsa_miR_148b_3p, hsa_miR_155_5p, hsa_miR_205_5p, hsa_miR_22_3p.

Confusion Matrix and Statistics:

Accuracy : 0.9364

95% CI : (0.8955, 0.9648)

No Information Rate : 0.9318

P-Value [Acc > NIR] : 0.4620

Kappa : 0.5283

Mcnemar's Test P-Value : 0.7893

Sensitivity : 0.60000

Specificity : 0.96098

Pos Pred Value : 0.52941

Neg Pred Value : 0.97044

Positive Class : Healthy

197 out of 205 patients with cancer were correctly classified as cancer cases. 9 out of 15 healthy subjects were correctly classified as healthy. 8 cancer patients were incorrectly classified as healthy.
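The statistics listed above can be re-derived from these counts using the standard definitions. In the sketch below, "Healthy" is treated as the positive class, as stated in the output, so a true positive is a healthy subject classified as healthy. The function name is hypothetical.

```python
def binary_stats(tp, fn, fp, tn):
    """Standard confusion-matrix statistics for a two-class problem."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "pos_pred_value": tp / (tp + fp),
        "neg_pred_value": tn / (tn + fn),
        # Accuracy obtained by always predicting the larger class
        "no_information_rate": max(tp + fn, fp + tn) / total,
    }


# Positive class: healthy. 9 of 15 healthy subjects and 197 of 205 cancer
# patients were classified correctly.
stats = binary_stats(tp=9, fn=6, fp=8, tn=197)
for name, value in stats.items():
    print(f"{name}: {value:.5f}")
```

The low sensitivity and positive predictive value therefore describe the healthy class, which contains only 15 subjects.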

With only 8 variables, the algorithm could correctly allocate 96 % of the cancerous cases. Due to the relatively small healthy subject group, the correct allocation of healthy subjects is 60%. When the healthy subject group is enlarged, it can be expected that the algorithm will classify more healthy subjects as healthy and that the false-negative and false-positive rates will diminish.

One would guess that differentiation between healthy subjects and cancerous patients could easily be performed by SNVs, as no healthy subject harbors an analyzed SNV. A remarkable finding here is that, although no healthy subject harbors a SNV and a mean of 5.8 SNVs is found in every cancerous patient, no SNV variable is used to discriminate these two groups. miRNA and DNA methylation were more powerful variables than SNVs to separate these two groups from each other.

Cancer type-specific algorithm results

In the following, the results are summarized for the individual cancer types. For each type, cancer-positive samples were discriminated from cancer-free samples. The tree classification package in the R programming language was used for classification. A confusion matrix and statistics are provided for each cancer type.

Bladder cancer

Included variables: AR H875Y (AR_CT_Y_N), TP53 (COSM10758, TP53_10758_mu_Y_N), hsa_miR_17_5p

Accuracy : 0.8857

95% CI : (0.7326, 0.968)

No Information Rate : 0.5714

P-Value [Acc > NIR] : 6.136e-05

Kappa : 0.7742

16 out of 20 patients with bladder cancer were correctly classified as bladder cancer cases, and all 15 healthy subjects were correctly classified as healthy. 4 bladder cancer patients were incorrectly classified as healthy. The Kappa value of 0.7742 indicates an excellent agreement between predicted and reference samples.

Brain cancer

Included variables: MLH1_meth, GATA5_meth, hsa_miR_133a_3p

Accuracy : 0.9583

95% CI : (0.7888, 0.9989)

No Information Rate : 0.625

P-Value [Acc > NIR] : 0.0001944

Kappa : 0.9091

8 out of 9 patients with brain cancer were correctly classified as brain cancer cases, and all 15 healthy subjects were correctly classified as healthy. 1 brain cancer patient was incorrectly classified as healthy. The Kappa value of 0.9091 indicates an excellent agreement between predicted and reference samples.

Breast cancer

Included variables: AR H875Y, hsa_miR_17_5p, TP53 (COSM10758), MDR1_meth

Accuracy : 0.8444

95% CI : (0.7054, 0.9351)

No Information Rate : 0.6667

P-Value [Acc > NIR] : 0.006328

Kappa : 0.6182

29 out of 30 patients with breast cancer were correctly classified as breast cancer cases. 9 out of 15 healthy subjects were correctly classified as healthy. 1 breast cancer patient was incorrectly classified as healthy, and 6 healthy subjects were incorrectly classified as breast cancer cases. The Kappa value of 0.6182 indicates a good agreement between predicted and reference samples.

Colorectal cancer

Included variables: AR H875Y, hsa_miR_17_5p, hsa_miR_195_5p, TP53 (COSM10758)

Accuracy : 0.907

95% CI : (0.7786, 0.9741)

No Information Rate : 0.6512

P-Value [Acc > NIR] : 0.0001203

Kappa : 0.8072

24 out of 28 patients with colorectal cancer were correctly classified as colorectal cancer cases, and all 15 healthy subjects were correctly classified as healthy. 4 colorectal cancer patients were incorrectly classified as healthy. The Kappa value of 0.8072 indicates an excellent agreement between predicted and reference samples.

Lung cancer

Included variables: hsa_miR_155_5p, TP53 (COSM10758), hsa_miR_92a_3p, hsa_miR_17_5p

Accuracy : 0.8636

95% CI : (0.7265, 0.9483)

No Information Rate : 0.6591

P-Value [Acc > NIR] : 0.002023

Kappa : 0.6966

26 out of 29 patients with lung cancer were correctly classified as lung cancer cases. 12 out of 15 healthy subjects were correctly classified as healthy. 3 lung cancer patients were incorrectly classified as healthy. The Kappa value of 0.6966 indicates a good agreement between predicted and reference samples.

Ovarian cancer

Included variables: hsa_miR_148b_3p, hsa_miR_29c_3p, hsa_miR_101_3p, hsa_miR_92a_3p

Accuracy : 0.8571

95% CI : (0.6733, 0.9597)

No Information Rate : 0.7143

P-Value [Acc > NIR] : 0.06532

Kappa : 0.65

18 out of 20 patients with ovarian cancer were correctly classified as ovarian cancer cases. 6 out of 8 healthy female subjects were correctly classified as healthy. 2 ovarian cancer patients were incorrectly classified as healthy. The Kappa value of 0.65 indicates a good agreement between predicted and reference samples.

Pancreas cancer

Included variables: hsa_miR_148b_3p, Stratifin_meth, hsa_miR_29c_3p, hsa_miR_27a_3p

Accuracy : 0.7778

95% CI : (0.5774, 0.9138)

No Information Rate : 0.5556

P-Value [Acc > NIR] : 0.01448

Kappa : 0.5645

11 out of 12 patients with pancreatic cancer were correctly classified as pancreatic cancer cases. 10 out of 15 healthy subjects were correctly classified as healthy. 1 pancreatic cancer patient was incorrectly classified as healthy. The Kappa value of 0.5645 indicates a fair agreement between predicted and reference samples.

Prostate cancer

Included variables: AR H875Y, hsa_miR_26a_5p, hsa_miR_17_5p

Accuracy : 0.8235

95% CI : (0.6547, 0.9324)

No Information Rate : 0.7941

P-Value [Acc > NIR] : 0.4323

Kappa : 0.5546

22 out of 27 patients with prostate cancer were correctly classified as prostate cancer cases. 6 out of 7 healthy male subjects were correctly classified as healthy. 5 prostate cancer patients were incorrectly classified as healthy. The Kappa value of 0.5546 indicates a fair agreement between predicted and reference samples.

Stomach cancer

Included variables: hsa_miR_20a_5p, hsa_miR_21_5p

Accuracy : 0.9474

95% CI : (0.8225, 0.9936)

No Information Rate : 0.6053

P-Value [Acc > NIR] : 1.681e-06

Kappa : 0.8899

22 out of 23 patients with stomach cancer were correctly classified as stomach cancer cases. 14 out of 15 healthy subjects were correctly classified as healthy. 1 stomach cancer patient was incorrectly classified as healthy. The Kappa value of 0.8899 indicates an excellent agreement between predicted and reference samples.

Summary of markers

The markers which are suitable for the group of cancers in this study are shown in table 1 below.

Table 1: Markers used for determining cancer type

The markers used can be grouped into three categories: highly relevant, relevant, and complementary. Based on the categories and the present study and disclosure, the skilled person can select markers suitable for diagnosing cancer of a defined type and patient group. Overall, the relevance of the markers for determining the type of cancer in the specific group can be summarized as follows:

Highly relevant: AR H875Y, TP53 (COSM10758), MLH1_meth, hsa_miR_17_5p.

Relevant: APC (COSM18561), hsa_miR_133a_3p, hsa_miR_148b_3p, hsa_miR_29c_3p, hsa_miR_20a_5p, hsa_miR_92a_3p, hsa_miR_155_5p, hsa_miR_195_5p, hsa_miR_101_3p, hsa_miR_27a_3p, hsa_miR_26a_5p, hsa_miR_21_5p.

Complementary: GATA5_meth, hsa_miR_133a_3p, Stratifin_meth, MDR1_meth.

Cancer marker AR H875Y

It was found that the androgen receptor mutation AR H875Y was among the most important markers for determining which type of cancer a subject has. The distributions of AR H875Y in the current study for a specific cancer type are listed in Table 2. In our study, the analysis of plasma from 205 cancer patients consisted of 75 mutational assays. The assay covering the AR mutation showed that this mutation was present in 102 cancer patients. The occurrence of the AR mutation in liquid biopsies of patients who have cancer other than prostate cancer is much higher than reported in the scientific literature. It was found that the AR H875Y is highly suitable for diagnosing bladder, colorectal and breast cancers.

Table 2: Abundance of AR H875Y marker in different cancer types.

Cancer results

This inventive method, based on a selection of markers which may comprise 3 SNVs, 13 miRNA expression levels, and 4 DNA methylation markers from a liquid biopsy, enables a non-invasive diagnosis of nine different solid cancer types.

The cancer type specific methylation levels for 3 highly relevant cfDNA markers are shown in Fig. 6. The cancer type specific expression levels for 11 highly relevant miRNA markers are shown in Fig. 7a and 7b. Cancer samples with the AR mutation (AR+, N=101) showed significantly lower levels of miRNAs 148a-3p, 148b-3p, 195-5p, 210-3p, 23a-3p, 25-3p when compared to samples without the AR mutation (AR-, N=96) (Fig. 7b L).

Notably, in the art these markers were mostly not considered highly relevant for a specific cancer type. According to the present invention, it was surprisingly found that these markers may not be the most relevant markers for a specific, single cancer type, but can be highly relevant for discriminating between a group of cancers and/or for determining with high accuracy if an individual has cancer or not.

We have designed a blood test that can detect the presence of nine common solid tumor types. By an algorithmic combination of somatic SNVs, DNA methylation, and miRNA signatures, we were able to correctly distinguish a cancer sample from a healthy donor sample in 87 % of all cases (FIG. 4). The accuracy of detection, dependent on the cancer type, ranges from fair to excellent, as stated by high kappa values.

Notably, the SNV detection, miRNA expression, and DNA methylation analyses are based on a quantitative PCR method. Methods in the prior art generally focus on just one cancer type and one analyte, and use next-generation sequencing methods for SNV detection, miRNA expression, or DNA methylation analysis. In the present study, qPCR-based methods have been found to be more sensitive in the quantification of SNVs and miRNA expression compared to sequencing-based methods (except for ultra-deep sequencing). The small number of variables to be analyzed, combined with the high sensitivity and high accuracy of the algorithm in predicting cancer in one of nine different tissues, makes this method stand out from all other methods described in the art.

Classification of subjects as cancerous or healthy - second analysis

Based on the data which was used in the first analysis for classification of subjects as cancerous or healthy, which is described above, a second analysis was made, in which the samples were initially split into two groups, wherein the first group comprises the mutation AR H875Y (AR_CT_Y_N) in the androgen receptor, whereas the second group does not.

Firstly, samples were split into two groups (χ² = 14.688, p < 0.001): samples with the AR mutation (AR+, N=101) and samples without the AR mutation (AR-, N=111). The AR+ group consisted only of tumor samples, since no AR mutation was detected in the control group. However, the AR- group contained the healthy controls (N=15) as well as tumor samples (N=96); therefore, no further classification of these groups was possible based only on the AR mutation. In order to separate healthy from tumor samples in the AR- group, several discriminant function analyses with leave-one-out cross-validation were carried out on different sets of biomarkers, none of which included the AR mutation. The sets of biomarkers were as follows: discriminant analysis 1 (DA1) incorporated all measured targets; discriminant analysis 2 (DA2) only cfDNA mutations; discriminant analysis 3 (DA3) only cfDNA methylation; discriminant analysis 4 (DA4) only miRNAs; discriminant analysis 5 (DA5) included the biomarkers with the highest correlations identified through the correlation matrices (Table 3).

Table 3: Discriminant analysis for classification of AR- samples in healthy and tumor samples

The results show that the DA5 model yielded the best results and classified healthy and tumor samples with 95.4% accuracy, 97.9% sensitivity, 80% specificity, and receiver operating characteristic area under the curve (ROC AUC) of 0.884.
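Leave-one-out cross-validation, as used for the discriminant analyses, holds out each sample once, fits the model on the remaining samples, and tests it on the held-out one. The sketch below shows the procedure with a deliberately simple one-dimensional nearest-mean classifier as a stand-in; it is not a discriminant function analysis, and all names and data are hypothetical.

```python
def fit_nearest_mean(xs, ys):
    """Stand-in model: the mean value of each class."""
    means = {}
    for label in set(ys):
        members = [x for x, y in zip(xs, ys) if y == label]
        means[label] = sum(members) / len(members)
    return means


def predict_nearest_mean(means, x):
    """Predict the class whose mean is closest to x."""
    return min(means, key=lambda label: abs(x - means[label]))


def leave_one_out_accuracy(samples, labels, fit, predict):
    """Fraction of samples predicted correctly when each is held out once."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit(train_x, train_y)
        if predict(model, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)


# Hypothetical biomarker values: 0 = healthy, 1 = tumor
samples = [1.0, 1.2, 0.9, 3.0, 3.1, 2.9]
labels = [0, 0, 0, 1, 1, 1]
print(leave_one_out_accuracy(samples, labels, fit_nearest_mean, predict_nearest_mean))
```

Because every prediction is made for a sample the model has not seen, the resulting accuracy is less optimistic than one computed on the training data, which matters for a control group of only 15 subjects.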

The AR H875Y mutation plays a key role in this model. Androgen receptor alterations have been identified as some of the main drivers of castration-resistant prostate cancer. The AR H875Y mutation has been predominantly found in prostate cancer, but this mutation has been reported also for breast cancer and CRC. However, to our knowledge, this is the first time that AR H875Y mutation is reported for bladder, lung, stomach, ovarian, brain, and pancreas cancer. AR mutations have been predominantly studied in connection to prostate and breast cancer, especially treatment response. We analyzed all predefined targets in all samples, and not only the genes reported to be relevant in the specific cancer type. Besides, we used a qPCR-based method for the detection of cfDNA mutations which is shown to have a better sensitivity to detect low allele fraction variants than sequencing.

Here, we describe a model consisting of two steps: sorting samples into AR+ and AR- groups, and subsequently classifying the AR- group into cancer patients and healthy subjects (95.4% accuracy, 97.9% sensitivity, 80% specificity, 0.884 ROC AUC). The results show that classification models based solely on mutations, cfDNA methylation, or miRNAs have poor specificity (DA2 26.7%, DA3 33.3%, and DA4 33.3%, respectively). Combining all the analyzed biomarkers improved the specificity to some extent (DA1 57.3%), but the sensitivity declined. The large number of biomarkers included in the DA1 model decreases the performance of the classifier, since some of them contain redundant and superfluous information. To alleviate the effects of this so-called “curse of dimensionality” (also known as “Hughes phenomenon”), the number of biomarkers included in the model was decreased. Hence, biomarker selection was carried out and a classification model was built based on the most relevant biomarkers (DA5), which displayed the best results (Table 3). Interestingly, the results demonstrate that the combination of three different analytes can improve the performance of a classification model. Each analyte type provides distinct information and adds value to the classification model, which highlights the importance of a multi-analyte based liquid biopsy test for cancer detection. The results demonstrate that the inventive method makes it possible to conclude with very high accuracy whether an individual has a cancer from the defined group or not.
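The two-step decision logic can be sketched as follows. The secondary score stands in for the output of the DA5 discriminant function, whose coefficients are not given here, so the score, the threshold, and the function name are hypothetical.

```python
def classify_sample(ar_h875y_positive, secondary_score, threshold=0.0):
    """Two-step classification of a sample as "tumor" or "healthy".

    Step 1: a sample carrying the AR H875Y mutation is called a tumor,
            since the mutation was not detected in the control group.
    Step 2: AR- samples are classified by a secondary model; here a
            hypothetical score above the threshold means tumor.
    """
    if ar_h875y_positive:
        return "tumor"
    return "tumor" if secondary_score > threshold else "healthy"


print(classify_sample(True, -5.0))   # AR+: the secondary score is ignored
print(classify_sample(False, 1.2))   # AR-, score above threshold
print(classify_sample(False, -0.3))  # AR-, score below threshold
```

In this arrangement, only the AR- samples depend on the secondary model, which is why the reported performance figures refer to the AR- group.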

References

Azad, A. A., Volik, S.V., Wyatt, A.W., Haegert, A., Le Bihan, S., Bell, R.H., Anderson, S.A., McConeghy, B., Shukin, R., Bazov, J., et al. (2015). Androgen Receptor Gene Aberrations in Circulating Cell-Free DNA: Biomarkers of Therapeutic Resistance in Castration-Resistant Prostate Cancer. Clinical cancer research : an official journal of the American Association for Cancer Research 21, 2315-2324.

Brooke, G.N., and Bevan, C.L. (2009). The role of androgen receptor mutations in prostate cancer progression. Curr Genomics 10, 18-25.

Cancer Genome Atlas, N. (2012). Comprehensive molecular characterization of human colon and rectal cancer. Nature 487, 330-337.

Cohen, J.D., Javed, A. A., Thoburn, C., Wong, F., Tie, J., Gibbs, P., Schmidt, C.M., Yip-Schneider, M.T., Allen, P.J., Schattner, M., et al. (2017). Combined circulating tumor DNA and protein biomarker-based liquid biopsy for the earlier detection of pancreatic cancers. Proceedings of the National Academy of Sciences of the United States of America 114, 10202-10207.

Cohen, J.D., Li, L., Wang, Y., Thoburn, C., Afsari, B., Danilova, L., Douville, C., Javed, A. A., Wong, F., Mattox, A., et al. (2018). Detection and localization of surgically resectable cancers with a multi-analyte blood test. Science 359, 926-930.

Esposito, A., Criscitiello, C., Locatelli, M., Milano, M., and Curigliano, G. (2016). Liquid biopsies for solid tumors: Understanding tumor heterogeneity and real time monitoring of early resistance to targeted therapies. Pharmacol Ther 157, 120-124.

Fleischhacker, M., and Schmidt, B. (2007). Circulating nucleic acids (CNAs) and cancer--a survey. Biochimica et biophysica acta 1775, 181-232.

Fleiss, J.L. (1973). Statistical methods for rates and proportions (New York: Wiley).

Han, Wang, Sun (2017). Circulating Tumor DNA as Biomarkers for Cancer Detection. Gen Proteom Bioinf 15, 59-72

Gai, Sun (2019). Epigenetic Biomarkers in Cell-Free DNA and Applications in Liquid Biopsy. Genes 10, 32

Lan, Lu, Wang, Jin (2015), MicroRNAs as Potential Biomarkers in Cancer: Opportunities and Challenges, BioMed Res Int, Art. ID125094

Lehmann-Werman, R., Neiman, D., Zemmour, H., Moss, J., Magenheim, J., Vaknin-Dembinsky, A., Rubertsson, S., Nellgard, B., Blennow, K., Zetterberg, H., et al. (2016). Identification of tissue-specific cell death using methylation patterns of circulating DNA. Proceedings of the National Academy of Sciences of the United States of America 113, E1826-1834.

Li, J., Harris, L., Mamon, H., Kulke, M.H., Liu, W.H., Zhu, P., and Mike Makrigiorgos, G. (2006). Whole genome amplification of plasma-circulating DNA enables expanded screening for allelic imbalance in plasma. The Journal of molecular diagnostics : JMD 8, 22-30.

Olivier, M., Hollstein, M., and Hainaut, P. (2010). TP53 mutations in human cancers: origins, consequences, and clinical use. Cold Spring Harb Perspect Biol 2, a001008.

Razavi, P., Chang, M.T., Xu, G., Bandlamudi, C., Ross, D.S., Vasan, N., Cai, Y., Bielski, C.M., Donoghue, M.T.A., Jonsson, P., et al. (2018). The Genomic Landscape of Endocrine-Resistant Advanced Breast Cancers. Cancer cell 34, 427-438 e426.

Shigeyasu, K., Toden, S., Zumwalt, T.J., Okugawa, Y., and Goel, A. (2017). Emerging Role of MicroRNAs as Liquid Biopsy Biomarkers in Gastrointestinal Cancers. Clinical cancer research : an official journal of the American Association for Cancer Research 23, 2391-2399.

Taplin, M.E., Bubley, G.J., Shuster, T.D., Frantz, M.E., Spooner, A.E., Ogata, G.K., Keer, H.N., and Balk, S.P. (1995). Mutation of the androgen-receptor gene in metastatic androgen-independent prostate cancer. The New England journal of medicine 332, 1393-1398.

Arora, A., Olshen, A.B., Seshan, V.E. et al. Pan-cancer identification of clinically relevant genomic subtypes using outcome-weighted integrative clustering. Genome Med 12, 110 (2020).

Hoadley KA et al., Cell-of-Origin Patterns Dominate the Molecular Classification of 10,000 Tumors from 33 Types of Cancer. Cell. 2018 Apr 5;173(2):291-304.

Wang B et al., Similarity network fusion for aggregating data types on a genomic scale. Nat Methods. 2014 Mar;11(3):333-7

Luo Z et al., Pan-cancer analysis identifies telomerase-associated signatures and cancer subtypes. Mol Cancer. 2019 Jun 10; 18(1): 106.

Omberg L et al., Enabling transparent and collaborative computational analysis of 12 tumor types within The Cancer Genome Atlas. Nat Genet. 2013 Oct;45(10):1121-6.

Mamatjan Y et al, Molecular Signatures for Tumor Classification: An Analysis of The Cancer Genome Atlas Data. J Mol Diagn. 2017 Nov;19(6):881-891.