Title:
DNA COPY NUMBER ALTERATIONS FOR PREDICTING TREATMENT RESPONSE IN PATIENTS WITH BREAST CANCER
Document Type and Number:
WIPO Patent Application WO/2023/208933
Kind Code:
A1
Abstract:
The present invention refers to the in vitro use of DNA copy number alterations (CNAs) for predicting the response of patients with HR+/HER2− breast cancer to a treatment comprising targeted therapy, such as CDK4/6 inhibitors, and/or endocrine therapy; for the prognosis of patients with HR+/HER2− breast cancer; for monitoring patients with HR+/HER2− breast cancer; or for classifying patients with HR+/HER2− breast cancer into responder or non-responder to a treatment comprising targeted therapy, such as CDK4/6 inhibitors, and/or endocrine therapy.

Inventors:
PRAT APARICIO ALEIX (ES)
BRASÓ MARISTANY FARA (ES)
VILLAGRASA-GONZALEZ PATRICIA (ES)
VIVANCOS ANA (ES)
Application Number:
PCT/EP2023/060810
Publication Date:
November 02, 2023
Filing Date:
April 25, 2023
Assignee:
HOSPITAL CLINIC BARCELONA (ES)
FUNDACIO DE RECERCA CLINIC BARCELONA INST DINVESTIGACIONS BIOMEDIQUES AUGUST PI I SUNYER (ES)
UNIV BARCELONA (ES)
FUNDACIO PRIVADA INST DINVESTIGACIO ONCOLÒGICA DE VALL HEBRON (ES)
REVEAL GENOMICS S L (ES)
International Classes:
C12Q1/6886
Other References:
RAKHA EMAD A ET AL: "New Advances in Molecular Breast Cancer Pathology", SEMINARS IN CANCER BIOLOGY, SAUNDERS SCIENTIFIC PUBLICATIONS, PHILADELPHIA, PA, US, vol. 72, 5 April 2020 (2020-04-05), pages 102 - 113, XP086573993, ISSN: 1044-579X, [retrieved on 20200405], DOI: 10.1016/J.SEMCANCER.2020.03.014
BERTUCCI FRANÇOIS ET AL: "The therapeutic response of ER+/HER2- breast cancers differs according to the molecular Basal or Luminal subtype", vol. 6, no. 1, 1 December 2020 (2020-12-01), XP055971211, Retrieved from the Internet DOI: 10.1038/s41523-020-0151-5
MIGLIACCIO ILENIA ET AL: "Circulating Biomarkers of CDK4/6 Inhibitors Response in Hormone Receptor Positive and HER2 Negative Breast Cancer", CANCERS, vol. 13, no. 11, 27 May 2021 (2021-05-27), pages 2640, XP055971123, DOI: 10.3390/cancers13112640
FINN RICHARD S. ET AL: "Biomarker Analyses of Response to Cyclin-Dependent Kinase 4/6 Inhibition and Endocrine Therapy in Women with Treatment-Naïve Metastatic Breast Cancer", CLINICAL CANCER RESEARCH, vol. 26, no. 1, 16 September 2019 (2019-09-16), US, pages 110 - 121, XP055814465, ISSN: 1078-0432, DOI: 10.1158/1078-0432.CCR-19-0751
MIGLIACCIO ILENIA ET AL: "CDK4/6 inhibitors: A focus on biomarkers of response and post-treatment therapeutic strategies in hormone receptor-positive HER2-negative breast cancer", CANCER TREATMENT REVIEWS, ELSEVIER, AMSTERDAM, NL, vol. 93, 7 December 2020 (2020-12-07), XP086485421, ISSN: 0305-7372, [retrieved on 20201207], DOI: 10.1016/J.CTRV.2020.102136
PAPAKONSTANTINOU ANDRI ET AL: "Prognostic value of ctDNA detection in patients with early breast cancer undergoing neoadjuvant therapy: A systematic review and meta-analysis", CANCER TREATMENT REVIEWS, ELSEVIER, AMSTERDAM, NL, vol. 104, 18 February 2022 (2022-02-18), XP086977131, ISSN: 0305-7372, [retrieved on 20220218], DOI: 10.1016/J.CTRV.2022.102362
XIA Y, FAN C, HOADLEY KA, PARKER JS, PEROU CM: "Genetic determinants of the molecular portraits of epithelial cancers", NATURE COMMUNICATIONS, vol. 10, no. 1, 2019, pages 5666
XIA ET AL., NAT COMMS, 2019
Attorney, Agent or Firm:
HOFFMANN EITLE S.L.U. (ES)
Claims:
CLAIMS

1. In vitro method for the prognosis or predicting the response of patients with HR+/HER2- breast cancer to a treatment selected from targeted therapy, comprising CDK4/6 inhibitors, and/or endocrine therapy, which comprises:
a. Assessing, in a biological sample obtained from the patient, the presence of DNA copy number alterations (CNA) over any of the following DNA segments selected from the list consisting of: chr20:33386980-33969561, chr13:46362859-48209064, chr17:1-22200000, chr10:129812260-135374737, chr17:7471230-7717938, chr8:128774432-128849112, chr16:1-38200000, chr4:83634873-83961360, chr5:76408288-81082828, chr19:1-526082 or chr3:58626894-61524607;
b. Wherein the presence of CNA over any of the following DNA segments selected from the list consisting of: chr20:33386980-33969561, chr8:128774432-128849112 or chr16:1-38200000 is an indication of poor prognosis or poor response; or
c. Wherein the presence of CNA over any of the following DNA segments selected from the list consisting of: chr13:46362859-48209064, chr17:1-22200000, chr10:129812260-135374737, chr17:7471230-7717938, chr4:83634873-83961360, chr5:76408288-81082828, chr19:1-526082 or chr3:58626894-61524607 is an indication of good prognosis or good response.

2. In vitro method for monitoring patients with HR+/HER2- breast cancer to assess whether they are responding to a treatment selected from targeted therapy, comprising CDK4/6 inhibitors, and/or endocrine therapy, which comprises:
a. Assessing, in a biological sample obtained from the patient, the presence of DNA copy number alterations (CNA) over any of the following DNA segments selected from the list consisting of: chr20:33386980-33969561, chr13:46362859-48209064, chr17:1-22200000, chr10:129812260-135374737, chr17:7471230-7717938, chr8:128774432-128849112, chr16:1-38200000, chr4:83634873-83961360, chr5:76408288-81082828, chr19:1-526082 or chr3:58626894-61524607;
b. Wherein the presence of CNA over any of the following DNA segments selected from the list consisting of: chr20:33386980-33969561, chr8:128774432-128849112 or chr16:1-38200000 is an indication of poor response; or
c. Wherein the presence of CNA over any of the following DNA segments selected from the list consisting of: chr13:46362859-48209064, chr17:1-22200000, chr10:129812260-135374737, chr17:7471230-7717938, chr4:83634873-83961360, chr5:76408288-81082828, chr19:1-526082 or chr3:58626894-61524607 is an indication of good response.

3. In vitro method for classifying patients with HR+/HER2- breast cancer into groups associated with a different response to a treatment selected from targeted therapy, comprising CDK4/6 inhibitors, and/or endocrine therapy, which comprises:
a. Assessing, in a biological sample obtained from the patient, the presence of DNA copy number alterations (CNA) over any of the following DNA segments selected from the list consisting of: chr20:33386980-33969561, chr13:46362859-48209064, chr17:1-22200000, chr10:129812260-135374737, chr17:7471230-7717938, chr8:128774432-128849112, chr16:1-38200000, chr4:83634873-83961360, chr5:76408288-81082828, chr19:1-526082 or chr3:58626894-61524607;
b. Wherein the presence of CNA over any of the following DNA segments selected from the list consisting of: chr20:33386980-33969561, chr8:128774432-128849112 or chr16:1-38200000 is an indication of poor response; or
c. Wherein the presence of CNA over any of the following DNA segments selected from the list consisting of: chr13:46362859-48209064, chr17:1-22200000, chr10:129812260-135374737, chr17:7471230-7717938, chr4:83634873-83961360, chr5:76408288-81082828, chr19:1-526082 or chr3:58626894-61524607 is an indication of good response.

4. In vitro method, according to any of the previous claims, which further comprises assessing the presence of CNA over any of the following DNA segments selected from the list consisting of: chr17:63942109-65847254, chr2:32460827-55039898, chr7:16017926-18944036, chr2:1-93300000, or chr12:1-1311104.

5. In vitro method, according to any of the previous claims, which comprises:
a. Assessing the presence of DNA copy number alterations (CNA) over any of the combinations of DNA segments of Table 14 and Table 15 in a biological sample obtained from the patient;
b. Processing the measured CNAs in order to obtain a score;
c. Wherein if a deviation or variation of the score value is identified in any of the combinations of DNA segments of Table 14, as compared with a pre-established reference value, this is indicative of poor prognosis or poor response; or
d. Wherein if a deviation or variation of the score value is identified in any of the combinations of DNA segments of Table 15, as compared with a pre-established reference value, this is indicative of good prognosis or good response.

6. In vitro method, according to any of the previous claims, characterized in that it is a computer-implemented method which comprises:
a. Receiving a plurality of CNAs data sets from the patient;
b. Processing the information according to step a) for finding statistically significant variations or deviations;
c. Providing a result by the computer system based on the information received according to the step a) and a pre-established standard already stored in the computer.

7. In vitro method, according to any of the previous claims, wherein the presence of CNAs is assessed in all of the following DNA segments: chr20:33386980-33969561, chr13:46362859-48209064, chr17:63942109-65847254, chr17:1-22200000, chr10:129812260-135374737, chr17:7471230-7717938, chr2:32460827-55039898, chr7:16017926-18944036, chr2:1-93300000, chr8:128774432-128849112, chr12:1-1311104, chr16:1-38200000, chr4:83634873-83961360, chr5:76408288-81082828, chr19:1-526082 and chr3:58626894-61524607.

8. In vitro method, according to any of the previous claims, wherein the presence of CNAs is assessed in any or all the DNA segments of Table 7, Table 8 or Table 9.

9. In vitro method, according to any of the previous claims, wherein the biological sample is selected from plasma, serum, breast milk, cerebrospinal fluid or blood samples.

10. In vitro method, according to any of the previous claims, wherein the cancer subtype is selected from: HR+/HER2-, HER2+ and triple negative.

11. In vitro use of a DNA segment selected from the list consisting of: chr20:33386980-33969561, chr13:46362859-48209064, chr17:1-22200000, chr10:129812260-135374737, chr17:7471230-7717938, chr8:128774432-128849112, chr16:1-38200000, chr4:83634873-83961360, chr5:76408288-81082828, chr19:1-526082 or chr3:58626894-61524607, or any of the combinations of DNA segments of Table 14 or Table 15, for predicting the response of patients with HR+/HER2- breast cancer to a treatment selected from targeted therapy, comprising CDK4/6 inhibitors, and/or endocrine therapy; for the prognosis of patients with HR+/HER2- breast cancer; for monitoring patients with HR+/HER2- breast cancer to assess whether they are responding to a treatment selected from targeted therapy, comprising CDK4/6 inhibitors, and/or endocrine therapy; or for classifying patients into groups associated with a different response to a treatment selected from targeted therapy, comprising CDK4/6 inhibitors, and/or endocrine therapy.

12. Kit, suitable for performing any of the methods of claims 1 to 10, comprising tools and reagents for assessing the presence of CNAs in a segment selected from the list consisting of: chr20:33386980-33969561, chr13:46362859-48209064, chr17:1-22200000, chr10:129812260-135374737, chr17:7471230-7717938, chr8:128774432-128849112, chr16:1-38200000, chr4:83634873-83961360, chr5:76408288-81082828, chr19:1-526082 or chr3:58626894-61524607, or any of the combinations of DNA segments of Table 14 or Table 15.

13. Use of the kit of claim 12 for predicting the response of patients with HR+/HER2- breast cancer to a treatment selected from targeted therapy, comprising CDK4/6 inhibitors, and/or endocrine therapy; for the prognosis of patients with HR+/HER2- breast cancer; for monitoring patients with HR+/HER2- breast cancer to assess whether they are responding to a treatment selected from targeted therapy, comprising CDK4/6 inhibitors, and/or endocrine therapy; or for classifying patients into groups associated with a different response to a treatment selected from targeted therapy, comprising CDK4/6 inhibitors, and/or endocrine therapy.

14. Targeted therapy, comprising CDK4/6 inhibitors, and/or endocrine therapy, for use in the treatment of patients with HR+/HER2- breast cancer, wherein the patients have been identified as responder patients according to any of the methods of claims 1 to 10.

15. In vitro method for classifying patients with HR+/HER2- breast cancer according to their survival probability which comprises:
a) Assessing, in a biological sample obtained from the patient, the presence of DNA copy number alterations (CNA) over any of the following DNA segments selected from the list consisting of: chr20:33386980-33969561, chr13:46362859-48209064, chr17:63942109-65847254, chr17:1-22200000, chr10:129812260-135374737, chr17:7471230-7717938, chr2:32460827-55039898, chr7:16017926-18944036, chr2:1-93300000, chr8:128774432-128849112, chr12:1-1311104, chr16:1-38200000, chr4:83634873-83961360, chr5:76408288-81082828, chr19:1-526082 and/or chr3:58626894-61524607;
b) Processing the measured CNAs in order to obtain a score value;
c) Wherein if a deviation or variation of the score value is identified as compared with a first pre-established reference value, this is indicative that the patient belongs to cluster 1 characterized by the highest survival probability; or
c) Wherein if a deviation or variation of the score value is identified as compared with a second pre-established reference value, this is indicative that the patient belongs to cluster 2 characterized by the second highest survival probability; or
c) Wherein if a deviation or variation of the score value is identified as compared with a third pre-established reference value, this is indicative that the patient belongs to cluster 3 characterized by the second worst survival probability; or
c) Wherein if a deviation or variation of the score value is identified as compared with a fourth pre-established reference value, this is indicative that the patient belongs to cluster 4 characterized by the worst survival probability.

Description:
DNA COPY NUMBER ALTERATIONS FOR PREDICTING TREATMENT RESPONSE IN PATIENTS WITH BREAST CANCER

FIELD OF THE INVENTION

The present invention refers to the medical field. Particularly, the present invention refers to the in vitro use of DNA copy number alterations (CNAs) for predicting the response of patients with HR+/HER2- breast cancer to a treatment comprising targeted therapy such as a CDK4/6 inhibitor and/or endocrine therapy, for the prognosis of patients with HR+/HER2- breast cancer, for monitoring patients with HR+/HER2- breast cancer, or for classifying patients with HR+/HER2- breast cancer into responder or non-responder to a treatment comprising targeted therapy and/or endocrine therapy.

STATE OF THE ART

Sequencing of tumor DNA has brought many new biomarkers and possibilities to precision oncology. Detection of somatic gene mutations, amplifications, and gene fusions allows the delivery of targeted therapies in multiple cancer types, such as lung cancer, colorectal cancer, melanoma and breast cancer. In addition, detection of a high number of somatic mutations (i.e., tumor mutational burden) or a microsatellite instability-high phenotype can help identify candidates for anti-PD1/PDL1 immune checkpoint inhibitors. Importantly, sequencing of circulating tumor DNA in blood samples (i.e., the so-called liquid biopsy, and henceforth called “ctDNA”) allows easy access to some tumor-based genetic information at any given timepoint and can replace a tumor tissue biopsy in some cases, thus avoiding the delays and complications of an invasive solid tumor biopsy procedure, which can be quite challenging in the metastatic setting.

Identification of single tumor DNA alterations can be clinically useful. However, cancer is highly complex and additional biological information is likely needed to refine the prediction of patients' prognosis and/or treatment benefit. Breast cancer is the perfect example, since RNA-based signature profiling tests provide clinically and biologically useful information beyond individual somatic gene mutations or amplifications of genes such as PIK3CA or ERBB2. In early disease, multi-gene RNA-based prognostic assays (e.g., OncotypeDX, Mammaprint and Prosigna) are available and recommended by clinical guidelines. In advanced disease, RNA-based profiling is becoming a promising prognostic and predictive tool. Unfortunately, tissue samples in patients with advanced disease are not readily available and, even when they are, the type of metastatic organ or site can compromise the expression patterns obtained from bulk RNA and might not reflect the intra-patient tumor heterogeneity. The present invention is focused on solving this technical and clinical problem, and it is herein proposed to use DNA sequencing, preferably DNA sequencing of ctDNA in plasma, serum, breast milk, cerebrospinal fluid or blood, to capture clinically relevant information beyond simple single genetic alterations. This approach could be highly relevant in the metastatic setting, where ctDNA might be the only readily available genetic material from tumors.

Although the ability of ctDNA to capture complex data with clinical value is unknown in the context of metastatic breast cancer, the present invention demonstrates that complex and clinically relevant tumor phenotypic traits can be identified in DNA, particularly in ctDNA.

DESCRIPTION OF THE INVENTION

Brief description of the invention

The present invention refers to the in vitro use of CNAs for predicting the response of HR+/HER2- breast cancer to a treatment comprising targeted therapy, such as CDK4/6 inhibition, and/or endocrine therapy, for the prognosis of patients with HR+/HER2- breast cancer, for monitoring patients with HR+/HER2- breast cancer, or for classifying patients with HR+/HER2- breast cancer into responder or non-responder to a treatment comprising targeted therapy, such as CDK4/6 inhibition, and/or endocrine therapy.

Particularly, the present invention is focused on the use of CNAs, preferably from ctDNA, to capture complex and clinically relevant tumor phenotypes in breast cancer.

The inventors of the present invention herein demonstrate that machine learning multi-gene signatures obtained from DNA, preferably ctDNA, sense many parts of a given pathway and identify several complex biological features, including measures of tumor proliferation and estrogen receptor signalling, similar to what is accomplished using direct tumor RNA profiling. For instance, it is herein demonstrated that a ctDNA-based genomic signature tracking retinoblastoma loss-of-heterozygosity (RB-LOH) is significantly associated with poor prognosis and drug response in patients with metastatic breast cancer treated with a CDK4/6 inhibitor and/or endocrine therapy, independently of tumor cell fraction and other clinical-pathological variables.

The contribution made by the present invention to the prior art is the possibility of using CNA over a plurality of DNA segments for predicting complex phenotypes in patients with HR+/HER2- breast cancer, for instance to predict whether the patient will respond to targeted therapy and/or endocrine therapy. The special technical feature conferring unity of invention is therefore the assessment of the presence of CNA over a plurality of DNA segments on a plurality of chromosomes in precisely this clinical context, preferably starting from blood, serum, breast milk, cerebrospinal fluid or plasma samples.

The 150 CNA-based signatures from [Xia Y, Fan C, Hoadley KA, Parker JS, Perou CM. Genetic determinants of the molecular portraits of epithelial cancers. Nature Communications 2019; 10(1):5666, doi 10.1038/s41467-019-13588-2] (see Figure 1) were interrogated in 87 plasma samples with tumor fraction (TF) >3% from patients with hormone receptor-positive/HER2-negative (HR+/HER2-) metastatic breast cancer (mBC) treated with CDK4/6 inhibitors plus endocrine therapy (CDK4/6i + ET), hereafter the CDK plasma-1 cohort. Cox regression models determined the association of each individual signature with progression-free survival (PFS) and overall survival (OS) and identified 27 prognostic signatures for both PFS and OS. The 27 signatures were interrogated in the MSKCC-CDK (n=381, PFS) and METABRIC-HR+/HER2- (n=1131, DFS, OS) cohorts. The prognostic value of the 27 signatures was validated in at least one of the validation cohorts.

Table 1. Number of significant signatures in each cohort

A total of 514 segments were present in the 27 signatures, and we selected those segments included in at least 15 signatures, which meant 75 segments. Redundant segments (segments of the same chromosomal region that had the same signal) were excluded.
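The frequency filter described above (keeping segments that occur in at least 15 of the 27 signatures) can be sketched as follows. This is a minimal illustration only: the function name `frequent_segments`, the toy signature names and the lowered threshold are assumptions made for the example and do not come from the source.

```python
from collections import Counter

def frequent_segments(signatures, min_signatures=15):
    """Return the segments present in at least `min_signatures` signatures."""
    # Count each segment once per signature, even if it is listed twice there.
    counts = Counter(seg for segs in signatures.values() for seg in set(segs))
    return sorted(seg for seg, n in counts.items() if n >= min_signatures)

# Toy data (hypothetical): three signatures, threshold lowered to 2 for illustration.
toy_signatures = {
    "sig_A": ["chr13:46362859-48209064", "chr8:128774432-128849112"],
    "sig_B": ["chr13:46362859-48209064", "chr19:1-526082"],
    "sig_C": ["chr13:46362859-48209064", "chr8:128774432-128849112"],
}
selected = frequent_segments(toy_signatures, min_signatures=2)
```

The removal of redundant segments mentioned in the text is a separate step and is not modelled here.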

Sixteen segments (Table 2) were initially identified as the main drivers of the 27 signatures (Table 3).

Table 2. A list of the 16 preferred DNA segments is shown.

Table 3. A list of the 27 preferred analysed phenotypes is shown.

Table 4. Weights of the 16 segments in the 27 ctDNA-based signatures.

Table 5. Number of times a segment is present in a significant combination

Using the CDK plasma-1 (n=87, PFS, OS), MSKCC-CDK (n=381, PFS) and METABRIC-HR+/HER2- (n=1131, DFS, OS) cohorts, we also demonstrate that the percentage of prognostic combination scores is higher than the percentage of prognostic segments.

Table 6. % of prognostic segments and combinations

In a preferred embodiment of the invention, the presence of CNAs is assessed in any or all the DNA segments of Table 7, Table 8 or Table 9.

Table 7. Quantitative SAM analysis of chromosomic regions associated with RB-LOH signature score.

Table 8. Univariate analyses of 514 DNA segments for PFS in 87 patients with advanced HR+/HER2- breast cancer treated with CDK4/6 inhibitors and endocrine therapy.

Table 9. Univariate analyses of 514 DNA segments for overall survival (OS) in 87 patients with advanced HR+/HER2- breast cancer treated with CDK4/6 inhibitors and endocrine therapy.

The methodology comprises calculating segment-level CNA scores instead of gene-level CNA scores. First, segmentation files from CNVkit output (for tumor DNA) and ichorCNA output (for ctDNA) are mapped to gene-level features. Then, each segment score is calculated as the mean copy number score across genes within the segment. The prognostic significance of each segment score is tested as a continuous variable using univariate and multivariable Cox models for PFS and OS. DNA-based signature scores are calculated as the weighted average of DNA segment values for each sample. The coefficients of DNA segments for predicting gene signatures are those reported in Xia et al. Nat Comms 2019. The prognostic significance of each signature score is tested as a continuous variable or as a categorical variable (low, medium, high as defined by tertiles) using univariate and multivariable Cox models for PFS and OS.
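Two of the steps above, the segment score as a mean over genes and the tertile categorisation of a score, can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the gene names, the numeric values and the handling of scores that fall exactly on a cut point are hypothetical, and the tertile cut points use the default method of Python's `statistics.quantiles`, which the source does not specify.

```python
from statistics import mean, quantiles

def segment_score(gene_cn, genes_in_segment):
    """Mean copy-number value over the genes located inside one segment."""
    return mean(gene_cn[g] for g in genes_in_segment)

def tertile_labels(scores):
    """Label each score low/medium/high using the cohort's tertile cut points."""
    low_cut, high_cut = quantiles(scores, n=3)
    labels = []
    for s in scores:
        if s <= low_cut:          # assumption: ties go to the lower group
            labels.append("low")
        elif s <= high_cut:
            labels.append("medium")
        else:
            labels.append("high")
    return labels

# Hypothetical gene-level copy-number values for one sample.
gene_cn = {"GENE_A": 0.5, "GENE_B": 1.5, "GENE_C": -1.0}
score = segment_score(gene_cn, ["GENE_A", "GENE_B"])

# Hypothetical signature scores for a six-sample cohort.
cohort_scores = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
labels = tertile_labels(cohort_scores)
```

The Cox models themselves are not shown; they would require a survival-analysis library and the cohort data.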

So, the first embodiment of the present invention refers to an in vitro method for predicting the response of patients with HR+/HER2- breast cancer to a treatment comprising targeted therapy, such as CDK4/6 inhibitors, and/or endocrine therapy, which comprises: a) assessing the presence of CNA over a plurality of DNA segments on a plurality of chromosomes in a biological sample obtained from the patient; b) processing the measured CNAs in order to obtain a score; and c) wherein if a deviation or variation of the score value is identified, as compared with a pre-established reference value, this is indicative that the patient with HR+/HER2- breast cancer will respond or will be resistant to the treatment.

The second embodiment of the present invention refers to an in vitro method for the prognosis of patients with HR+/HER2- breast cancer which comprises: a) assessing the presence of CNAs over a plurality of DNA segments on a plurality of chromosomes in a biological sample obtained from the patient; b) processing the measured CNAs in order to obtain a score; and c) wherein if a deviation or variation of the score value is identified, as compared with a pre-established reference value, this is indicative of the prognosis of the patient with HR+/HER2- breast cancer.

The third embodiment of the present invention refers to an in vitro method for monitoring patients with HR+/HER2- breast cancer to assess whether they are responding to treatment comprising targeted therapy, such as CDK4/6 inhibitors, and/or endocrine therapy, which comprises: a) assessing the presence of CNAs over a plurality of DNA segments on a plurality of chromosomes in a biological sample obtained from the patient; b) processing the measured CNAs in order to obtain a score; and c) wherein if a deviation or variation of the score value is identified, as compared with a pre-established reference value, this is indicative that the patient with HR+/HER2- breast cancer will respond or will be resistant to the treatment.

The fourth embodiment of the present invention refers to an in vitro method for classifying patients with HR+/HER2- breast cancer into biologically and clinically relevant groups which are associated with a different response to a treatment comprising targeted therapy, such as CDK4/6 inhibitors, and/or endocrine therapy, which comprises: a) assessing the presence of CNAs over a plurality of DNA segments on a plurality of chromosomes in a biological sample obtained from the patient; b) processing the measured CNAs in order to obtain multiple scores; c) classifying breast cancer samples based on their scoring profile into different groups as compared with a pre-established reference value; and d) wherein each group is indicative of whether the patient with HR+/HER2- breast cancer will respond to the treatment.

In a preferred embodiment of the invention, the method is a computer-implemented method which comprises: a) receiving a plurality of CNAs data sets from the patient; b) processing the information according to step a) for finding statistically significant variations or deviations; and c) providing a result by the computer system based on the information received according to the step a) and a pre-established standard already stored in the computer.

In a preferred embodiment, the “score” is obtained after assessing the presence of CNAs in tumoral DNA segments which are present in the biological sample. The signal of each segment is calculated by averaging the signal of each gene within each segment. The final score is calculated by multiplying the signal of each DNA segment by a previously established coefficient or weight and summing up all of them. After processing all the values, a single score is obtained. This single score will be compared with the “pre-established reference value” to finally make clinical decisions. The “pre-established reference value” is a threshold value obtained after assessing the presence of CNAs in “normal DNA” segments (i.e., non-tumoral DNA segments) which are present in the biological sample. Particularly, if a deviation or variation of the “score” value is identified, typically a “score” value higher or lower than the “pre-established reference value”, this is indicative that the patient with HR+/HER2- breast cancer will respond or will be resistant to the treatment, or that the patient will have a good or poor prognosis.
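The score calculation described above (each segment signal multiplied by its weight, then summed, then compared with a reference value) can be illustrated with a short sketch. The weights, signals and reference value below are hypothetical placeholders; the actual coefficients are those of Table 4, which are not reproduced in this text, and the direction of deviation that counts as clinically meaningful is not fixed here.

```python
def weighted_score(segment_signals, weights):
    """Sum of (segment signal x pre-established weight) over the weighted segments."""
    return sum(weights[seg] * segment_signals[seg] for seg in weights)

def compare_to_reference(score, reference):
    """Report whether the score lies above, below or at the reference value."""
    if score > reference:
        return "higher"
    if score < reference:
        return "lower"
    return "equal"

# Hypothetical weights and per-segment signals for one sample.
weights = {"chr13:46362859-48209064": 0.5, "chr8:128774432-128849112": -0.25}
signals = {"chr13:46362859-48209064": 2.0, "chr8:128774432-128849112": 2.0}

score = weighted_score(signals, weights)            # 0.5*2.0 + (-0.25)*2.0
deviation = compare_to_reference(score, reference=0.25)
```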

In a preferred embodiment of the invention, the presence of CNAs is assessed in any of the following DNA segments (see Table 2, Table 4 and Table 5) (the nomenclature of the segments is as found in NCBI Genome data): chr20:33386980-33969561, chr13:46362859-48209064, chr17:63942109-65847254, chr17:1-22200000, chr10:129812260-135374737, chr17:7471230-7717938, chr2:32460827-55039898, chr7:16017926-18944036, chr2:1-93300000, chr8:128774432-128849112, chr12:1-1311104, chr16:1-38200000, chr4:83634873-83961360, chr5:76408288-81082828, chr19:1-526082 or chr3:58626894-61524607.

In a preferred embodiment of the invention, the presence of CNAs is assessed in all of the following DNA segments (see Table 2, Table 4 and Table 5): chr20:33386980-33969561, chr13:46362859-48209064, chr17:63942109-65847254, chr17:1-22200000, chr10:129812260-135374737, chr17:7471230-7717938, chr2:32460827-55039898, chr7:16017926-18944036, chr2:1-93300000, chr8:128774432-128849112, chr12:1-1311104, chr16:1-38200000, chr4:83634873-83961360, chr5:76408288-81082828, chr19:1-526082 and chr3:58626894-61524607.

In a preferred embodiment, the biological sample is selected from plasma, serum, breast milk, cerebrospinal fluid or blood samples.

In a preferred embodiment, the patient is suffering from breast cancer.

In a preferred embodiment, the breast cancer subtype is selected from: HR+/HER2-, HER2+ and triple-negative.

The fifth embodiment of the present invention refers to a kit, suitable for performing the above described methods, comprising tools and reagents for assessing the presence of CNAs in the following DNA segments: chr20:33386980-33969561, chr13:46362859-48209064, chr17:63942109-65847254, chr17:1-22200000, chr10:129812260-135374737, chr17:7471230-7717938, chr2:32460827-55039898, chr7:16017926-18944036, chr2:1-93300000, chr8:128774432-128849112, chr12:1-1311104, chr16:1-38200000, chr4:83634873-83961360, chr5:76408288-81082828, chr19:1-526082 and/or chr3:58626894-61524607, preferably in the DNA segments of Table 7, Table 8 or Table 9.

The sixth embodiment of the present invention refers to the use of the above defined kit for predicting the response of patients with HR+/HER2- breast cancer to a treatment comprising targeted therapy such as a CDK4/6 inhibitor and/or endocrine therapy, for the prognosis of patients with HR+/HER2- breast cancer, for monitoring patients with HR+/HER2- breast cancer, or for classifying HR+/HER2- breast cancer into biologically relevant groups which are associated with response to a treatment comprising targeted therapy such as a CDK4/6 inhibitor and/or endocrine therapy.

The last embodiment of the present invention refers to targeted therapy such as a CDK4/6 inhibitor and/or endocrine therapy for use in the treatment of patients with HR+/HER2- breast cancer, wherein the patients have been identified as responder patients according to the method described in any of the above embodiments. Alternatively, the present invention refers to a method for treating patients with HR+/HER2- breast cancer which comprises the administration of a therapeutically effective dose or amount of targeted therapy such as a CDK4/6 inhibitor and/or endocrine therapy once the patients have been identified as being responder by means of the method described in any of the above embodiments.

In a preferred embodiment, the endocrine therapy comprises letrozole, anastrozole, exemestane, tamoxifen, and selective estrogen receptor degraders such as fulvestrant, and the targeted therapy comprises CDK4/6 inhibitors (palbociclib, abemaciclib, ribociclib or trilaciclib), PI3K/mTOR inhibitors (alpelisib, everolimus) and antibody-drug conjugates targeting HER2 (trastuzumab deruxtecan, trastuzumab duocarmazine, disitamab vedotin, ARX788 or BAT8001), HER3 (patritumab deruxtecan), TROP2 (sacituzumab govitecan, datopotamab deruxtecan or SKB264) and LIV-1 (ladiratuzumab vedotin).

Moreover, it is important to note that the prognostic value of the 16 segment scores was assessed in the CDK plasma-1 (n=87, PFS, OS), MSKCC-CDK (n=381, PFS), METABRIC-HR+/HER2- (n=1131, DFS, OS) cohorts.

A total of 11 segments (68.75%) were prognostic in at least one cohort: 4 segments were associated with bad prognosis and 8 segments were associated with good prognosis.

Table 10. Number of significant segments in each cohort

Table 11. Segments significantly associated with poor prognosis

Table 12. Segments significantly associated with good prognosis

Moreover, it is important to note that the signals from at least 2 segments were combined and the association of each combination with prognosis was evaluated. Scores for each combination comprising 2 of the above identified 16 segments were calculated as: Combination score = adjusted score of segment 2 - adjusted score of segment 1. The prognostic value of the 240 possible combination scores was assessed in the CDK plasma-1 (n=87, PFS, OS), MSKCC-CDK (n=381, PFS) and METABRIC-HR+/HER2- (n=1131, DFS, OS) cohorts.
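The count of 240 combinations is what one obtains from ordered pairs of the 16 segments (16 x 15). The sketch below enumerates those pairs and computes a difference score for each. It assumes that the combination score is the adjusted score of the second segment minus that of the first, and it uses placeholder adjusted scores; neither the reading of the formula nor the numbers are confirmed by the source.

```python
from itertools import permutations

SEGMENTS = [
    "chr20:33386980-33969561", "chr13:46362859-48209064",
    "chr17:63942109-65847254", "chr17:1-22200000",
    "chr10:129812260-135374737", "chr17:7471230-7717938",
    "chr2:32460827-55039898", "chr7:16017926-18944036",
    "chr2:1-93300000", "chr8:128774432-128849112",
    "chr12:1-1311104", "chr16:1-38200000",
    "chr4:83634873-83961360", "chr5:76408288-81082828",
    "chr19:1-526082", "chr3:58626894-61524607",
]

def combination_scores(adjusted):
    """Difference score for every ordered pair (segment 1, segment 2)."""
    return {
        (seg1, seg2): adjusted[seg2] - adjusted[seg1]
        for seg1, seg2 in permutations(adjusted, 2)
    }

# Placeholder adjusted scores (hypothetical): 0.0, 1.0, 2.0, ... per segment.
adjusted = {seg: float(i) for i, seg in enumerate(SEGMENTS)}
scores = combination_scores(adjusted)
n_combinations = len(scores)
```

Under this reading, reversing the order of a pair only flips the sign of its score, so every pair has a mirror image with the opposite association.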

Table 13. Number of significant combinations in each cohort

A total of 184 combinations (76.7%) were prognostic: 92 combinations were associated with bad prognosis and 92 combinations were associated with good prognosis.

Table 14. Combinations significantly associated with poor prognosis

Table 15. Combinations significantly associated with good prognosis

Therefore, the present invention also refers to an in vitro method for the prognosis or predicting the response of patients with HR+/HER2- breast cancer to a treatment selected from targeted therapy, comprising CDK4/6 inhibitors, and/or endocrine therapy, which comprises: a) assessing, in a biological sample obtained from the patient, the presence of CNA over any of the following DNA segments selected from the list consisting of: chr20:33386980-33969561, chr13:46362859-48209064, chr17:1-22200000, chr10:129812260-135374737, chr17:7471230-7717938, chr8:128774432-128849112, chr16:1-38200000, chr4:83634873-83961360, chr5:76408288-81082828, chr19:1-526082 or chr3:58626894-61524607; b) wherein the presence of CNA over any of the following DNA segments selected from the list consisting of: chr20:33386980-33969561, chr8:128774432-128849112 or chr16:1-38200000 is an indication of poor prognosis or poor response; or c) wherein the presence of CNA over any of the following DNA segments selected from the list consisting of: chr13:46362859-48209064, chr17:1-22200000, chr10:129812260-135374737, chr17:7471230-7717938, chr4:83634873-83961360, chr5:76408288-81082828, chr19:1-526082 or chr3:58626894-61524607 is an indication of good prognosis or good response.
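Steps b) and c) of the method above amount to a lookup of each altered segment in two lists. A minimal sketch follows, assuming that CNA calling has already been performed upstream and that segments are identified by their coordinates written in chrN:start-end form. The function name and the input format are illustrative assumptions, not part of the described method.

```python
# Segments listed in step b) (poor prognosis or poor response).
POOR = {
    "chr20:33386980-33969561",
    "chr8:128774432-128849112",
    "chr16:1-38200000",
}

# Segments listed in step c) (good prognosis or good response).
GOOD = {
    "chr13:46362859-48209064",
    "chr17:1-22200000",
    "chr10:129812260-135374737",
    "chr17:7471230-7717938",
    "chr4:83634873-83961360",
    "chr5:76408288-81082828",
    "chr19:1-526082",
    "chr3:58626894-61524607",
}


def indications(altered_segments):
    """Split the altered segments of one sample into poor and good indications."""
    altered = set(altered_segments)
    return {"poor": sorted(altered & POOR), "good": sorted(altered & GOOD)}


# Hypothetical sample: two listed segments and one unrelated region.
print(indications(["chr8:128774432-128849112", "chr13:46362859-48209064", "chr1:1-100"]))
```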

The present invention also refers to an in vitro method for monitoring patients with HR+/HER2- breast cancer to assess whether they are responding to a treatment selected from targeted therapy, comprising CDK4/6 inhibitors, and/or endocrine therapy which comprises: a) assessing, in a biological sample obtained from the patient, the presence of CNA over any of the following DNA segments selected from the list consisting of: chr20:33386980-33969561, chr13:46362859-48209064, chr17:1-22200000, chr10:129812260-135374737, chr17:7471230-7717938, chr8:128774432-128849112, chr16:1-38200000, chr4:83634873-83961360, chr5:76408288-81082828, chr19:1-526082 or chr3:58626894-61524607; b) wherein the presence of CNA over any of the following DNA segments selected from the list consisting of: chr20:33386980-33969561, chr8:128774432-128849112 or chr16:1-38200000 is an indication of poor prognosis or poor response; or c) wherein the presence of CNA over any of the following DNA segments selected from the list consisting of: chr13:46362859-48209064, chr17:1-22200000, chr10:129812260-135374737, chr17:7471230-7717938, chr4:83634873-83961360, chr5:76408288-81082828, chr19:1-526082 or chr3:58626894-61524607 is an indication of good prognosis or good response.

The present invention also refers to an in vitro method for classifying patients with HR+/HER2- breast cancer into groups associated with a different response to a treatment selected from targeted therapy, comprising CDK4/6 inhibitors, and/or endocrine therapy, which comprises: a) assessing, in a biological sample obtained from the patient, the presence of CNA over any of the following DNA segments selected from the list consisting of: chr20:33386980-33969561, chr13:46362859-48209064, chr17:1-22200000, chr10:129812260-135374737, chr17:7471230-7717938, chr8:128774432-128849112, chr16:1-38200000, chr4:83634873-83961360, chr5:76408288-81082828, chr19:1-526082 or chr3:58626894-61524607; b) wherein the presence of CNA over any of the following DNA segments selected from the list consisting of: chr20:33386980-33969561, chr8:128774432-128849112 or chr16:1-38200000 is an indication of poor prognosis or poor response; or c) wherein the presence of CNA over any of the following DNA segments selected from the list consisting of: chr13:46362859-48209064, chr17:1-22200000, chr10:129812260-135374737, chr17:7471230-7717938, chr4:83634873-83961360, chr5:76408288-81082828, chr19:1-526082 or chr3:58626894-61524607 is an indication of good prognosis or good response.

In a preferred embodiment the method further comprises assessing the presence of CNA over any of the following DNA segments selected from the list consisting of: chr17:63942109-65847254, chr2:32460827-55039898, chr7:16017926-18944036, chr2:1-93300000 or chr12:1-1311104.

In a preferred embodiment the method comprises: a) assessing the presence of CNA over any of the combinations of DNA segments of Table 14 and Table 15 in a biological sample obtained from the patient; b) processing the measured CNAs in order to obtain a score; c) wherein if a deviation or variation of the score value is identified in any of the combinations of DNA segments of Table 14, as compared with a pre-established reference value, this is indicative of poor prognosis or poor response; or d) wherein if a deviation or variation of the score value is identified in any of the combinations of DNA segments of Table 15, as compared with a pre-established reference value, this is indicative of good prognosis or good response.

In a preferred embodiment the method is characterized in that it is a computer-implemented method which comprises: a) receiving a plurality of CNA data sets from the patient; b) processing the information received in step a) to find statistically significant variations or deviations; and c) providing a result by the computer system based on the information received in step a) and a pre-established standard already stored in the computer.

In a preferred embodiment the method comprises assessing the presence of CNAs in all of the following DNA segments: chr20:33386980-33969561, chr13:46362859-48209064, chr17:63942109-65847254, chr17:1-22200000, chr10:129812260-135374737, chr17:7471230-7717938, chr2:32460827-55039898, chr7:16017926-18944036, chr2:1-93300000, chr8:128774432-128849112, chr12:1-1311104, chr16:1-38200000, chr4:83634873-83961360, chr5:76408288-81082828, chr19:1-526082 and chr3:58626894-61524607.

In a preferred embodiment the method comprises assessing the presence of CNAs in any or all the DNA segments of Table 7, Table 8 or Table 9.

In a preferred embodiment the biological sample is selected from plasma, serum, breast milk, cerebrospinal fluid or blood samples.

In a preferred embodiment the cancer subtype is selected from: HR+/HER2-, HER2+ and triple negative.

The present invention also refers to the use of a DNA segment selected from the list consisting of: chr20:33386980-33969561, chr13:46362859-48209064, chr17:1-22200000, chr10:129812260-135374737, chr17:7471230-7717938, chr8:128774432-128849112, chr16:1-38200000, chr4:83634873-83961360, chr5:76408288-81082828, chr19:1-526082 or chr3:58626894-61524607, or any of the combinations of DNA segments of Table 14 or Table 15, for predicting the response of patients with HR+/HER2- breast cancer to a treatment selected from targeted therapy, comprising CDK4/6 inhibitors, and/or endocrine therapy; for the prognosis of patients with HR+/HER2- breast cancer; for monitoring patients with HR+/HER2- breast cancer to assess whether they are responding to a treatment selected from targeted therapy, comprising CDK4/6 inhibitors, and/or endocrine therapy; or for classifying patients into groups associated with a different response to a treatment selected from targeted therapy, comprising CDK4/6 inhibitors, and/or endocrine therapy.

The present invention also refers to a kit, suitable for performing any of the methods of the invention, comprising tools and reagents for assessing the presence of CNAs in a segment selected from the list consisting of: chr20:33386980-33969561, chr13:46362859-48209064, chr17:1-22200000, chr10:129812260-135374737, chr17:7471230-7717938, chr8:128774432-128849112, chr16:1-38200000, chr4:83634873-83961360, chr5:76408288-81082828, chr19:1-526082 or chr3:58626894-61524607, or any of the combinations of DNA segments of Table 14 or Table 15.

The present invention also refers to the use of the kit for predicting the response of patients with HR+/HER2- breast cancer to a treatment selected from targeted therapy, comprising CDK4/6 inhibitors, and/or endocrine therapy; for the prognosis of patients with HR+/HER2- breast cancer; for monitoring patients with HR+/HER2- breast cancer to assess whether they are responding to a treatment selected from targeted therapy, comprising CDK4/6 inhibitors, and/or endocrine therapy; or for classifying patients into groups associated with a different response to a treatment selected from targeted therapy, comprising CDK4/6 inhibitors, and/or endocrine therapy.

The present invention also refers to targeted therapy, comprising CDK4/6 inhibitors, and/or endocrine therapy, for use in the treatment of patients with HR+/HER2- breast cancer, wherein the patients have been identified as responder patients according to any of the methods of the invention.

Alternatively, the present invention refers to a method for treating patients suffering from HR+/HER2- breast cancer with targeted therapy, comprising CDK4/6 inhibitors, and/or endocrine therapy, which comprises identifying the patients as responder patients according to any of the methods of the invention.

Finally, the present invention refers to an in vitro method for classifying patients with HR+/HER2- breast cancer according to their survival probability which comprises: a) assessing, in a biological sample obtained from the patient, the presence of DNA copy number alterations (CNA) over any of the following DNA segments selected from the list consisting of: chr20:33386980-33969561, chr13:46362859-48209064, chr17:63942109-65847254, chr17:1-22200000, chr10:129812260-135374737, chr17:7471230-7717938, chr2:32460827-55039898, chr7:16017926-18944036, chr2:1-93300000, chr8:128774432-128849112, chr12:1-1311104, chr16:1-38200000, chr4:83634873-83961360, chr5:76408288-81082828, chr19:1-526082 and chr3:58626894-61524607; b) processing the measured CNAs in order to obtain a score value; c) wherein if a deviation or variation of the score value is identified as compared with a first pre-established reference value, this is indicative that the patient belongs to cluster 1 characterized by the highest survival probability; or d) wherein if a deviation or variation of the score value is identified as compared with a second pre-established reference value, this is indicative that the patient belongs to cluster 2 characterized by the second highest survival probability; or e) wherein if a deviation or variation of the score value is identified as compared with a third pre-established reference value, this is indicative that the patient belongs to cluster 3 characterized by the second worst survival probability; or f) wherein if a deviation or variation of the score value is identified as compared with a fourth pre-established reference value, this is indicative that the patient belongs to cluster 4 characterized by the worst survival probability.

For the purpose of the present invention the following terms are defined:

• Copy number alterations (CNA): is a phenomenon in which sections of the genome are repeated and the number of repeats in the genome varies between individuals. CNAs are a type of structural variation: specifically, it is a type of duplication or deletion event that affects a considerable number of base pairs. CNAs can be generally categorized into two main groups: short repeats and long repeats. However, there are no clear boundaries between the two groups and the classification depends on the nature of the loci of interest. Short repeats include mainly dinucleotide repeats (two repeating nucleotides e.g. A-C-A-C-A-C...) and trinucleotide repeats. Long repeats include repeats of entire genes. This classification based on size of the repeat is the most obvious type of classification as size is an important factor in examining the types of mechanisms that most likely gave rise to the repeats, hence the likely effects of these repeats on phenotype.

• The expression “pre-established reference value” refers to a threshold value obtained after assessing the presence of CNAs in “normal DNA” segments (i.e. non-tumoral DNA segments) which are present in the biological sample.

• The expression “score” refers to a value obtained after assessing the presence of CNAs in tumoral DNA segments which are present in the biological sample. The signal of each segment is calculated by averaging the signal of each gene within each segment. The final score is calculated by multiplying the signal of each DNA segment by its weight and summing up all of them. After processing all the values, a single score is obtained. This single value will be compared with the “pre-established reference value” to finally make clinical decisions. Particularly, if a deviation or variation of the “score” value is identified, typically a “score” value higher or lower than the “pre-established reference value”, this is indicative that the patient with HR+/HER2- breast cancer will respond or not to the treatment, or that the patient will have a good or poor prognosis.

• By "comprising" it is meant including, but not limited to, whatever follows the word "comprising". Thus, use of the term "comprising" indicates that the listed elements are required or mandatory, but that other elements are optional and may or may not be present.

• By "consisting of" it is meant including, and limited to, whatever follows the phrase "consisting of". Thus, the phrase "consisting of" indicates that the listed elements are required or mandatory, and that no other elements may be present.

• By “therapeutically effective dose or amount” is intended an amount that, when administered as described herein, brings about a positive therapeutic response in a subject suffering from breast cancer. The exact amount required will vary from subject to subject, depending on the age, and general condition of the subject, the severity of the condition being treated, mode of administration, and the like.
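The definition of “score” given above (average the gene signals within each segment, multiply each segment signal by its weight, sum, and compare with the reference value) can be illustrated with a short sketch. The segment names, gene signals, weights and reference value below are hypothetical placeholders, and the direction of the comparison that matters clinically is not fixed by this sketch.

```python
def segment_signal(gene_signals):
    """Signal of one segment: the average of the signals of the genes within it."""
    return sum(gene_signals) / len(gene_signals)


def weighted_score(segment_gene_signals, weights):
    """Final score: each segment signal multiplied by its weight, summed over segments."""
    return sum(
        weights[segment] * segment_signal(genes)
        for segment, genes in segment_gene_signals.items()
    )


def compare_with_reference(score, reference_value):
    """Report whether the score lies above, below or at the reference value."""
    if score > reference_value:
        return "higher"
    if score < reference_value:
        return "lower"
    return "equal"


# Hypothetical inputs for two placeholder segments (not real data).
signals = {"segA": [0.2, 0.4], "segB": [-0.5]}
weights = {"segA": 1.0, "segB": -2.0}
score = weighted_score(signals, weights)
print(score, compare_with_reference(score, reference_value=0.0))
```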

Description of the figures

Figure 1. Circulating tumor DNA (ctDNA) in metastatic breast cancer. (a) Plasma samples were obtained from 207 patients (174 with HR+/HER2-, 16 HER2+, 16 TNBC, 1 N/A). After purification of plasma cell-free DNA, shallow whole genome sequencing (shWGS) was performed. Using the ctDNA-based sequencing data from 514 DNA segments, 150 previously developed DNA copy number-based signatures [Xia Y, Fan C, Hoadley KA, Parker JS, Perou CM. Genetic determinants of the molecular portraits of epithelial cancers. Nature Communications 2019;10(1):5666 doi 10.1038/s41467-019-13588-2] tracking a variety of biological processes were applied in patients with a Tumor cell Fraction (TF) >3%. Individual scores for each signature were obtained. (b) Relationship between TF and number of altered DNA segments detected across 246 plasma ctDNA samples before (left) and after (right) adjusting the DNA copy-number signal by TF and ploidy using the ichorCNA tool. (c) Examples of correlation between the scores of two DNA-based signatures when determined in plasma versus tumor tissue. Of note, tissue samples were obtained at different timepoints than the plasma samples. (d) Association between the ER (left) and HER2 (right) tumor tissue status and expression of two ctDNA-based signatures tracking ER-related and HER2-related biology, respectively.

Figure 2. The ctDNA-based RB-LOH signature predicts clinical outcome in advanced HR+/HER2- breast cancer treated with endocrine therapy and a CDK4/6 inhibitor. (a) A plasma sample was obtained from 124 patients within 48 hours prior to starting endocrine therapy and CDK4/6 inhibition. ctDNA-based signatures were applied in plasma samples with a TF>3% (n=87). (b) RB-LOH ctDNA-based signature score in patients with complete or partial response (CR/PR), stable disease and progressive disease (PD). (c) Kaplan-Meier curves of PFS (left) and OS (right) of the RB-LOH ctDNA-based signature. Each patient group is based on tertiles. (d) Average ctDNA signal of 16 features of the original RB-LOH DNA-based signature (column on the left) and the weight and direction of each feature (column on the right) as previously reported in Xia et al [Xia Y, Fan C, Hoadley KA, Parker JS, Perou CM. Genetic determinants of the molecular portraits of epithelial cancers. Nature Communications 2019;10(1):5666 doi 10.1038/s41467-019-13588-2]. (e) Forest plots of hazard ratios (HRs) for PFS (left) and OS (right) of the RB-LOH DNA-based signature when evaluated in plasma alone (i.e., plasma - univariate; n=87), in plasma when adjusted for TF (n=87), in plasma when adjusted for PAM50 RNA-based subtypes (n=53), in plasma when adjusted for TF+PAM50+clinical variables (n=53), in tissue alone (i.e., tissue - univariate; n=63) and in plasma when adjusted for tissue and vice-versa (n=28). (f) ctDNA-based signature scores of the RB-LOH signature, the Luminal A signature, the 13q14.2 RB1 locus and TF across 7 patients with paired plasma samples (baseline vs post-CDK4/6 inhibitor treatment). P-values (p) were determined by two-tailed paired t-tests.

Figure 3. ctDNA-based profiling of metastatic breast cancer. (a) Unsupervised cluster analysis of 178 plasma samples with a TF>3% (columns) and the scores of 150 ctDNA-based signatures (rows). Orange and violet colors represent scores above and below the median score of the signature across the dataset. Below the array tree, the IHC subtype and the PAM50 molecular subtype are shown for each sample. Four clusters of samples (clusters 1 to 4) were identified. Within cluster 2, two subgroups of samples were also identified (clusters 2A and 2B). (b) Expression of 2 tissue PAM50 RNA-based signatures (i.e., Luminal A and HER2-enriched) in cluster 3 versus the other clusters. This analysis was performed in 107 paired plasma and tumor tissue samples. Of note, 58 tumor tissue samples were obtained at the same timepoint as the plasma sample and 49 tumor tissue samples were obtained at different timepoints prior to obtaining the plasma sample. (c) Unsupervised cluster heatmap analysis of Pearson's correlation coefficients obtained by comparing the scores of the top individual ctDNA-based signature versus the scores of each individual PAM50 RNA-based tissue signature across 58 matched-timepoint paired plasma-tissue cases. (d) Detail of an unsupervised cluster heatmap analysis representing the correlation coefficients obtained by comparing the scores of luminal-related and proliferation-related ctDNA-based signatures versus the log2 values of the mRNA expression of each of the 771 genes.

Figure 4. DNA-based tumor profiles in tissue samples and association with clinical outcomes. (a) Unsupervised cluster analysis of 1,689 tumor samples (columns) from the METABRIC dataset and the 150 DNA-based signature scores (rows). Orange and violet colors represent scores above and below the median value of the signature across the dataset. Below the array tree, the IntClust classification and the PAM50 molecular subtypes are shown for each sample. The 4 clusters are shown below the data matrix. (b) PAM50 molecular subtype distribution across the 4 DNA-based clusters in 1,517 breast tumors of the METABRIC dataset. (c) TP53 mutation distribution across the 4 DNA-based clusters in 1,517 breast tumors of the METABRIC dataset. (d) Kaplan-Meier curves of DFS (left) and OS (right) of the 4 DNA-based clusters assessed in all tumors (n=1,683) and HR+/HER2-negative tumors (n=1,131) of the METABRIC database.

Figure 5. Kaplan-Meier curves for PFS (progression-free survival). Kaplan-Meier curves of PFS for the 4 ctDNA-based clusters (determined with the 16 segments in Table 5) in 152 patients with HR+/HER2- metastatic breast cancer treated with CDK4/6 inhibitors plus endocrine therapy.

Figure 6. Kaplan-Meier curves for DFS (disease-free survival) (METABRIC). Kaplan-Meier curves of DFS for the 4 ctDNA-based clusters (determined with the 16 segments in Table 5) in 1,131 HR+/HER2-negative tumors of the METABRIC database.

Detailed description of the invention

The present invention is illustrated by means of the Examples set below, without the intention of limiting its scope of protection.

Example 1. Material and Methods

Example 1.1. Study participants and samples

We collected baseline pre-treatment blood plasma samples from 124 patients with HR+/HER2- advanced breast cancer treated with endocrine therapy in combination with a CDK4/6 inhibitor (i.e., palbociclib, ribociclib or abemaciclib) at Hospital Clinic of Barcelona between the years of 2018 and 2021. All plasma samples were obtained before the start of treatment. In 7 patients, we obtained an additional plasma sample after progressing while on therapy.

To complement the blood plasma dataset of 124 patients treated with endocrine therapy and a CDK4/6 inhibitor, we collected 121 additional plasma samples from patients treated at Hospital Clinic of Barcelona at different stages of the disease: 85 plasmas from 77 patients with advanced HR+/HER2- breast cancer, 19 plasmas from 16 patients with HER2+ advanced breast cancer, 17 plasmas from 16 patients with advanced TNBC and 1 plasma from 1 patient with unknown ER and HER2 status. In addition, FFPE tumor tissues from 110 patients with available plasma samples were collected, including 71 patients treated with endocrine therapy and a CDK4/6 inhibitor. Finally, we collected FFPE tumors from 17 patients with HR+/HER2- advanced breast cancer who did not have plasma samples but were treated with endocrine therapy and a CDK4/6 inhibitor.

The hospital institutional ethics committee approved the study in accordance with the principles of Good Clinical Practice, the Declaration of Helsinki, and other applicable local regulations. Written informed consent was obtained from all patients before enrolment. The medical records were retrospectively reviewed to obtain the necessary clinical data.

Example 1.2. DNA-sequencing of plasma samples

Approximately 30 mL of peripheral blood was collected into K2-EDTA Vacutainer tubes (Becton Dickinson) and plasma isolation was performed within 2 hours of blood collection through two centrifugation steps. Centrifugation at 1,600 x g for 10 minutes at 4°C separated plasma from peripheral-blood cells. Approximately 12 mL of plasma was obtained per patient, which was subsequently centrifuged at 16,000 x g for 10 minutes at 4°C to remove the residual supernatant and any remaining contaminants including cells. Plasma samples were then aliquoted in 1.5 mL tubes and immediately stored at -80°C. cfDNA was obtained from 3 mL of plasma using the QIAamp Circulating Nucleic Acid Kit (QIAGEN Inc.) according to the manufacturer’s instructions and quantified with a Qubit dsDNA high-sensitivity assay kit and the Qubit 4.0 fluorometer (Life Technologies, Carlsbad, CA, USA). cfDNA was concentrated using a SpeedVac to fulfil the requirements for library preparation. Library preparation was performed by ligating custom unique dual index (UDI) adapters to a minimum of 10 ng of the isolated cfDNA (10-50 ng dsDNA). More specifically, the fragment ends of cfDNA were blunted and 5’ phosphorylated and, after that, 3’ ends were A-tailed to favour adapter ligation. Adapters were 10-bp UDIs, as recommended to mitigate errors introduced by index-hopping or switching in Illumina instruments with patterned flow cells, such as the NovaSeq 6000. Indexed libraries were quantified by qPCR using the KAPA Library Quantification Kit (Roche Sequencing Solutions), pooled, and sequenced on an Illumina NovaSeq 6000 at 0.5x mean coverage with a read length of 2 x 150 bp. ShWGS data were analyzed with hmmcopy_utils (https://github.com/shahcompbio/hmmcopy_utils) and ichorCNA v0.2.0 (https://github.com/broadinstitute/ichorCNA), with a bin size of 500 kb and default parameters.

Example 1.3. DNA-sequencing of FFPE tumor samples

DNA obtained from FFPE-derived tissues was purified with the QIAamp DNA FFPE Tissue kit (QIAGEN Inc.) for all samples available, following the manufacturer’s instructions. Quantification was performed with a Qubit dsDNA broad-range assay kit and the Qubit 4.0 fluorometer (Life Technologies, Carlsbad, CA, USA). A minimum of 100 ng of extracted DNA was processed for library preparation using a custom hybridization-based capture panel targeting 435 genes with reported somatic mutations in different tumor types (VHIO-300 v4 panel) performed with the Agilent SureSelectXT Low Input Target Enrichment System (Agilent Technologies, Inc). Indexed libraries were quantified by qPCR using the KAPA Library Quantification Kit (Roche Sequencing Solutions), pooled and sequenced on an Illumina HiSeq 2500 (2 x 100 bp) at an average coverage of 500x. Reads were aligned to the hg19 reference genome with BWA; GATK base quality score recalibration, indel realignment and duplicate removal were applied; and variant calling was performed using VarScan2 (v2.4.3) and Mutect2 (v4.1.0.0) with the following parameters: minimum variant allele frequency (VAF) of 5% for single nucleotide variants (SNVs) and 10% for indels. Germline variants were excluded by filtering with single nucleotide polymorphism (SNP) databases.
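The VAF thresholds stated above can be expressed as a simple post-calling filter. The sketch below is illustrative only: it does not reproduce the VarScan2 or Mutect2 configuration, the variant record layout (ref, alt, alt_reads, depth) is a hypothetical one, and any variant that is not a single-base substitution is treated as an indel for simplicity.

```python
# Minimum variant allele frequency per variant class, as stated in the text.
MIN_VAF = {"SNV": 0.05, "INDEL": 0.10}


def variant_class(ref, alt):
    """Single-base substitutions are SNVs; everything else is treated as an indel here."""
    return "SNV" if len(ref) == 1 and len(alt) == 1 else "INDEL"


def passes_vaf(variant):
    """Keep a variant when its allele frequency reaches the minimum for its class."""
    vaf = variant["alt_reads"] / variant["depth"]
    return vaf >= MIN_VAF[variant_class(variant["ref"], variant["alt"])]


# Hypothetical calls (not real data).
calls = [
    {"ref": "A", "alt": "T", "alt_reads": 6, "depth": 100},    # SNV at 6%
    {"ref": "AT", "alt": "A", "alt_reads": 6, "depth": 100},   # deletion at 6%
    {"ref": "G", "alt": "GCC", "alt_reads": 30, "depth": 200}, # insertion at 15%
]
kept = [v for v in calls if passes_vaf(v)]
print(len(kept))
```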

Example 1.4. DNA-based signature estimation

For both tumor DNA sequencing and plasma cell-free ctDNA sequencing, segmentation files from CNVkit output (for tumor DNA) and ichorCNA output (for ctDNA) were first mapped to gene-level features. Values from 514 DNA segments were then determined as described in Xia et al [Xia Y, Fan C, Hoadley KA, Parker JS, Perou CM. Genetic determinants of the molecular portraits of epithelial cancers. Nature Communications 2019;10(1):5666 doi 10.1038/s41467-019-13588-2]. Briefly, each segment score was calculated as the mean copy number score across genes within the segment. The coefficients of DNA segments for predicting gene signatures were obtained from Xia et al. DNA-based signature scores were calculated as the weighted average of DNA segment values for each sample.
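The mapping from a segmentation file to gene-level values and then to segment scores can be sketched as follows. This is a minimal illustration under stated assumptions: each gene takes the value of the segmentation interval that contains its midpoint (the text does not specify the mapping rule), and the intervals, gene coordinates and segment membership shown are hypothetical.

```python
def gene_level_values(segmentation, genes):
    """Assign each gene the value of the segmentation interval containing its midpoint.

    segmentation: list of (chrom, start, end, value) tuples.
    genes: {gene_name: (chrom, start, end)}.
    """
    values = {}
    for name, (chrom, start, end) in genes.items():
        midpoint = (start + end) // 2
        for seg_chrom, seg_start, seg_end, value in segmentation:
            if seg_chrom == chrom and seg_start <= midpoint <= seg_end:
                values[name] = value
                break
    return values


def segment_scores(gene_values, segment_genes):
    """Score of each DNA segment: mean copy number value across its genes."""
    scores = {}
    for segment, members in segment_genes.items():
        present = [gene_values[g] for g in members if g in gene_values]
        if present:
            scores[segment] = sum(present) / len(present)
    return scores


# Hypothetical toy inputs (not real data).
segmentation = [("chr1", 1, 1000, 0.5), ("chr1", 1001, 2000, -0.3)]
genes = {"G1": ("chr1", 100, 200), "G2": ("chr1", 900, 1300), "G3": ("chr1", 1500, 1600)}
segment_genes = {"S1": ["G1", "G2"], "S2": ["G3"]}

gene_values = gene_level_values(segmentation, genes)
print(segment_scores(gene_values, segment_genes))
```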

For ctDNA, TF and tumor ploidy were estimated by ichorCNA. For ctDNA samples with TF>0, TF- and tumor ploidy-adjusted signature scores were calculated by first adjusting the copy number values in the ichorCNA segmentation file: adjusted copy number ratio = log2(logR_copy_number/tumor_ploidy). DNA-based signature scores were then derived as described for tumor tissue. For calculating the number of altered segments, we used an arbitrary gain/loss threshold of +/- 0.07 for unadjusted segment values and 0.32/-0.42 for adjusted segment values [Xia Y, Fan C, Hoadley KA, Parker JS, Perou CM. Genetic determinants of the molecular portraits of epithelial cancers. Nature Communications 2019;10(1):5666 doi 10.1038/s41467-019-13588-2]. Segments with values above the gain threshold or below the loss threshold were called altered.
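The adjustment and the altered-segment call described above can be sketched as follows. The formula and the adjusted thresholds (0.32 for gain, -0.42 for loss) are taken from the text; the function names and the example copy number values are hypothetical, and reading the values from an actual ichorCNA segmentation file is left out.

```python
import math

# Gain/loss thresholds for adjusted segment values, as stated in the text.
GAIN_ADJUSTED = 0.32
LOSS_ADJUSTED = -0.42


def adjusted_ratio(logr_copy_number, tumor_ploidy):
    """Adjusted copy number ratio = log2(logR_copy_number / tumor_ploidy)."""
    return math.log2(logr_copy_number / tumor_ploidy)


def count_altered(values, gain, loss):
    """Number of segments above the gain threshold or below the loss threshold."""
    return sum(1 for value in values if value > gain or value < loss)


# Hypothetical copy number values for four segments of one sample (not real data).
tumor_ploidy = 2.0
copy_numbers = [2.0, 3.0, 1.0, 2.2]
adjusted = [adjusted_ratio(cn, tumor_ploidy) for cn in copy_numbers]
print(count_altered(adjusted, GAIN_ADJUSTED, LOSS_ADJUSTED))
```

In this toy sample the second segment (log2 of 1.5) is called a gain and the third (log2 of 0.5) a loss, while the other two fall inside the thresholds.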

Example 1.5. Gene expression analysis of FFPE tumor samples

RNA was extracted using the High Pure FFPET RNA isolation kit (Roche, Indianapolis, IN, USA) following the manufacturer’s protocol. One to five 10-µm FFPE slides, depending on tumor cellularity, were used for each tumor sample, and macrodissection was performed, when needed, to avoid normal tissue contamination. A minimum of ~100 ng of total RNA was analyzed on the nCounter platform (Nanostring Technologies, Seattle, USA) using the 770-gene Breast Cancer 360™ Gene Panel, which includes the 50 PAM50 genes. Gene expression for each sample was independently normalized to the geometric mean of 5 housekeeping genes (ACTB, MRPL19, PSMC4, RPLP0, and SF3A1). Research-based PAM50 subtyping was performed as previously described.
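Normalization to the geometric mean of housekeeping genes can be illustrated with a short sketch. It assumes that each gene count is divided by the geometric mean of the housekeeping counts of the same sample; the exact normalization procedure used in the study is not detailed in the text. The counts below are hypothetical, and only two housekeeping genes are used to keep the example small.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive values, computed in log space."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def normalize_sample(counts, housekeeping):
    """Divide each non-housekeeping gene count by the housekeeping geometric mean."""
    scale = geometric_mean([counts[gene] for gene in housekeeping])
    return {
        gene: count / scale
        for gene, count in counts.items()
        if gene not in housekeeping
    }


# Hypothetical raw counts for one sample (not real data).
counts = {"ACTB": 100.0, "MRPL19": 400.0, "ESR1": 50.0}
print(normalize_sample(counts, housekeeping=["ACTB", "MRPL19"]))
```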

Example 1.6. METABRIC breast cancer dataset

Clinical-pathological data were obtained from cBioPortal. Processed DNA segment values were downloaded, and DNA-based signature scores were calculated as the weighted average of DNA segment values for each sample.

Example 1.7. A DNA-based 4 subtype predictor

To identify the 4 subtype clusters using DNA-based data, we selected signatures that were significantly differentially expressed across the 4 clusters identified in ctDNA using a multi-class significance analysis of microarrays (SAM) with < 5% FDR. Then we used the selected gene list and calculated 4 centroids from the training data. For every new sample in METABRIC, we calculated the Euclidean distances to the 4 centroids and assigned a cluster class to each sample based on the nearest centroid.
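The nearest-centroid assignment described above can be sketched as follows: one centroid per cluster is computed from the training data, and each new sample is assigned to the cluster whose centroid is closest in Euclidean distance. The feature vectors and labels are hypothetical toy values, and the SAM-based feature selection step is not reproduced.

```python
import math


def compute_centroids(training, labels):
    """Mean feature vector of the training samples of each cluster."""
    sums, counts = {}, {}
    for vector, label in zip(training, labels):
        acc = sums.setdefault(label, [0.0] * len(vector))
        for i, value in enumerate(vector):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def nearest_centroid(sample, centroids):
    """Cluster whose centroid has the smallest Euclidean distance to the sample."""
    return min(centroids, key=lambda label: math.dist(sample, centroids[label]))


# Hypothetical two-feature training data for two clusters (not real data).
training = [[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.0, 0.8]]
labels = [1, 1, 2, 2]
centroids = compute_centroids(training, labels)
print(nearest_centroid([0.9, 1.0], centroids))
```

The same two functions extend unchanged to 4 clusters and to as many selected signatures as there are features.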

Example 1.8. General statistical procedures

Categorical variables were expressed as number (%) and compared by the χ2 test or Fisher's exact test. Differentially expressed signatures between two groups were identified using a two-class unpaired SAM with an FDR<5%. Differentially expressed signatures between two timepoints (i.e., baseline versus post-progression to endocrine therapy and a CDK4/6 inhibitor) were identified using a two-class paired SAM with an FDR<5%. Survival was estimated with Kaplan-Meier curves and differences were tested with the log-rank test. Univariate and multivariable Cox models for PFS and OS were used to test the prognostic significance of each variable. The Bonferroni correction method was used to control the family-wise error rate in case of multiple comparisons. PFS was defined as the period from initiation of endocrine therapy and a CDK4/6 inhibitor until disease progression or date of last follow-up. OS was defined as the period from initiation of endocrine therapy and a CDK4/6 inhibitor until death or date of last follow-up. All cluster analyses were displayed using Java Treeview version 1.1.3. Average linkage hierarchical clustering was performed using Cluster v3.0. Two-sided p-values <0.05 were considered statistically significant. Statistical computations were carried out in R 4.0.3 (http://cran.r-project.org).
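For readers unfamiliar with the Kaplan-Meier estimate mentioned above, the product-limit calculation can be sketched in a few lines. The study's computations were carried out in R; this Python version is only an illustration, and the follow-up times and event indicators are hypothetical (1 = event observed, 0 = censored at last follow-up).

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time where at least one event occurred."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        current_time = data[i][0]
        observed = 0
        leaving = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == current_time:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed:
            survival *= 1 - observed / at_risk
            curve.append((current_time, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up (months) and event indicators (not real data).
curve = kaplan_meier(times=[2, 3, 3, 5, 8], events=[1, 1, 0, 1, 0])
print(curve)
```

With these toy inputs the estimate drops to 0.8 at month 2, to 0.6 at month 3 and to 0.3 at month 5; the censored subject at month 8 does not change the curve.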

Example 2. Results

To demonstrate that ctDNA can capture complex tumor phenotypes, shallow whole genome sequencing (shWGS) was performed on 209 plasma samples from 174 patients with advanced hormone receptor-positive and HER2-negative breast cancer (HR+/HER2-). Additional samples from the other clinical subtypes were also assayed, including 19 plasma samples from 16 patients with HER2-positive (HER2+) breast cancer, 17 plasma samples from 16 patients with triple-negative breast cancer (TNBC) and 1 plasma sample from 1 patient with unknown HR and HER2 status.

Example 2.1. Plasma tumor fraction

From 246 plasma samples (Figure 1a), 178 (72.4%) had a Tumor cell Fraction (TF) of >3% (range 4-84%; median 9.4%), according to ichorCNA. In plasma samples with a TF>3%, we calculated the scores for each of the 150 previously reported Elastic Net Regression analysis-based DNA signatures that predict tumor RNA and protein phenotypes [Xia Y, Fan C, Hoadley KA, Parker JS, Perou CM. Genetic determinants of the molecular portraits of epithelial cancers. Nature Communications 2019;10(1):5666 doi 10.1038/s41467-019-13588-2], noting that all signatures/models were applied exactly as previously reported; thus, in these cases, the 246 samples can be considered a ‘test/validation’ dataset. TF as a continuous variable was strongly correlated with the number of altered DNA copy-number segments found in each sample (Pearson’s rho = 0.76; Figure 1b). Strong correlations (i.e., Pearson's rho >0.70 or <-0.70) with TF were also identified in 46 of 150 (31.0%) ctDNA-based signatures, most of which were tracking biological processes associated with Luminal B (i.e., high TF) versus Luminal A (i.e., low TF) disease. This result reaffirms the hypothesis that TF not only reflects the amount of disease burden in each patient but also its biological aggressiveness. As expected, adjustment of the tumor copy-number signal detected in plasma by the TF in each sample decreased the strength of association between TF and the number of altered copy-number segments, and between TF and each ctDNA-based signature score (Figure 1b).
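The screen for signatures strongly correlated with TF can be sketched as follows: the Pearson coefficient between TF and each signature score is computed across samples, and signatures with a coefficient above 0.70 or below -0.70 are retained. The TF values, signature names and scores below are hypothetical toy values.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def strongly_correlated(tf, signature_scores, cutoff=0.70):
    """Names of signatures whose correlation with TF exceeds the cutoff in absolute value."""
    return [
        name
        for name, scores in signature_scores.items()
        if abs(pearson(tf, scores)) > cutoff
    ]


# Hypothetical TF values and signature scores for four samples (not real data).
tf = [0.05, 0.10, 0.20, 0.40]
signatures = {"signature_a": [1, 2, 4, 8], "signature_b": [1, -1, 1, -1]}
print(strongly_correlated(tf, signatures))
```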

Example 2.2. Plasma versus tissue DNA-based signatures

We next explored the correlation of each of the 150 DNA-based signatures determined using plasma ctDNA versus tumor DNA across 54 patients with available paired sample-types obtained at different timepoints (Figure 1c). Tumor DNA sequencing was performed on formalin-fixed paraffin-embedded (FFPE) tumors using a capture-based approach that covers the entire chromosomal landscape, whereas ctDNA was profiled by shWGS and is not FFPE-derived. Across all 150 signatures, the average correlation coefficient was 0.40 (range 0.02 to 0.66) and 40 signatures (26.7%) had a correlation coefficient >0.50. When the correlations were evaluated in the 27 cases where plasma and tumor were obtained <8.0 weeks apart, the number of signatures with a correlation coefficient >0.50 was 63 (42%), versus 19.3% in the 27 cases where plasma and tumor were obtained >8.0 weeks apart (p-value<0.001). Overall, these results suggest a moderate association between ctDNA-based and tumor DNA-based signatures across timepoints and DNA sequencing approaches (i.e., ctDNA shWGS versus capture-based sequencing of FFPE DNA).

Example 2.3. ctDNA-based signatures versus tissue ER and HER2 status

Estrogen receptor (ER) expression by immunohistochemistry (IHC), and HER2 overexpression by IHC and/or amplification by in-situ hybridization, are key biological features of breast cancer. To evaluate the relationship between ctDNA-based information and tumor ER or HER2 clinical biomarker status, we evaluated the association of each of the 150 ctDNA-based signatures with either ER clinical status (i.e., positive versus negative) or HER2 status (i.e., positive versus negative) in the 177 samples with TF>3% that had tumor ER and HER2 IHC available. As expected, ctDNA-based signatures tracking luminal biological processes (e.g., luminal-cluster-signature and GSEA-median-GP7-estrogen-signaling) were found enriched in ER+ disease (p<0.001; false discovery rate [FDR]<1%; highest AUC=0.77) compared to ER-negative disease (Figure 1d). Similarly, ctDNA-based signatures tracking HER2 expression or amplification (e.g., HER2-signature and HER2-amplified-HER2-amplicon) were found significantly enriched (p<0.001; FDR<1%; highest AUC=0.72) in HER2+ disease compared to HER2-negative disease (Figure 1d). Overall, these results suggest that ctDNA-based profiling captures and predicts specific phenotypic tumor traits.
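For orientation, the AUC values quoted above measure how well a signature score separates positive from negative cases. A minimal sketch of that quantity, computed as the fraction of positive/negative pairs ranked correctly, is shown here; the function name and the toy scores are hypothetical and the document does not state how its AUCs were computed.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case; ties count as 0.5.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical luminal-signature scores in ER+ and ER-negative samples.
er_positive = [0.8, 0.6, 0.7, 0.3]
er_negative = [0.2, 0.5, 0.4]
toy_auc = auc(er_positive, er_negative)
```

An AUC of 0.5 corresponds to no separation and 1.0 to perfect separation, so values such as 0.77 and 0.72 indicate moderate discrimination.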

Example 2.4. Prognosis of ctDNA-based signatures

To evaluate the association of ctDNA-based signatures with prognosis, we evaluated baseline pre-treatment plasma samples from 124 patients with advanced HR+/HER2- breast cancer treated with endocrine therapy and a CDK4/6 inhibitor (Figure 2a). Eighty-seven plasma samples had a TF>3%. The median follow-up was 12.5 months (range 1.0 to 56.7 months), and most patients were defined as endocrine-sensitive (83.9%) and treated in the first-line setting (59.8%) (Table 16).

Table 16. Baseline clinical characteristics of patients with HR+/HER2- advanced disease treated with endocrine therapy and a CDK4/6 inhibitor.

Characteristic                  All patients (n=124)    TF>3% (n=87)
Median age (range)              61 (34-86)              62 (34-86)
Setting in advanced disease
  1st line                      71 (57.30%)             52 (59.77%)
  2nd line                      27 (21.80%)             15 (17.24%)
  ≥3rd line                     26 (21.00%)             20 (22.99%)
Type of metastasis
  Visceral metastasis           66 (53.20%)             34 (39.08%)
  De novo metastasis            30 (24.20%)             19 (21.84%)
  Bone only                     12 (9.70%)              0 (0.00%)
ECOG Performance status
  0                             50 (40.30%)             38 (43.68%)
  1                             61 (49.20%)             39 (44.83%)
  2                             12 (9.70%)              10 (11.49%)
  Unknown                       1 (0.80%)               0 (0.00%)

From the 150 ctDNA-based signatures, 36 (24%) and 37 (25%) were found significantly associated with progression-free survival (PFS) and overall survival (OS), respectively, and 27 (18%) signatures were found significantly associated with both PFS and OS. In general, signatures associated with poor survival outcome were those hypothesized to be tracking proliferation- and non-ER+/non-luminal-related biological processes, such as the MM_p53null.Luminal (i.e., TP53-deficient) and MM_Myc (i.e., high MYC/MYC amplification) signatures. Conversely, ctDNA-signatures associated with better outcome were tracking luminal A-related biological processes.

Consistent with the known mechanism of resistance to CDK4/6 inhibitors, high enrichment of a signature tracking RB-LOH was associated with poor outcome and treatment response (Figure 2b-c). The DNA-based RB-LOH signature is composed of 224 copy number features, including amplification of 2p (e.g., ETV6), 3q (e.g., PIK3CA), 8q (e.g., MYC), 20q (e.g., AURKA) and 21q (e.g., TMPRSS2 and ERG), and deletion of 2q (e.g., PARD3B), 4q, 5q, 12q, 13q (e.g., RB1), 15q and 17p. As expected, the direction (i.e., amplification or deletion) and strength (i.e., coefficient) of the 48 main features of the original tissue-based DNA RB-LOH signature [Xia Y, Fan C, Hoadley KA, Parker JS, Perou CM. Genetic determinants of the molecular portraits of epithelial cancers. Nature Communications 2019; 10(1): 5666 doi 10.1038/s41467-019-13588-2] were properly detected in ctDNA (correlation coefficient = 0.75, p-value<0.001; Figure 2d). Finally, the association of the ctDNA RB-LOH signature with PFS and OS was independent of TF (as a continuous variable), type of CDK4/6 inhibitor, line of treatment (first line versus second line versus later lines), presence of visceral disease and number of metastases (Figure 2e).
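A signature built from copy number features with a direction and a coefficient can be read as a weighted sum of segment-level signals. The sketch below illustrates that reading only. It assumes a plain linear model (as an Elastic Net regression would yield); the segment names, coefficients and signal values are hypothetical and are not the published RB-LOH weights.

```python
def signature_score(segment_signal, coefficients, intercept=0.0):
    """Linear signature score: intercept + sum(coefficient * segment signal).

    segment_signal: mapping of segment name -> copy-number signal for one
        sample (positive = gain, negative = loss); hypothetical units.
    coefficients: mapping of segment name -> model weight.
    """
    missing = set(coefficients) - set(segment_signal)
    if missing:
        raise KeyError(f"sample lacks signal for segments: {sorted(missing)}")
    return intercept + sum(
        weight * segment_signal[name] for name, weight in coefficients.items()
    )

# Hypothetical three-feature signature: one gain-weighted segment and two
# loss-weighted segments (negative weight times negative signal adds to the score).
toy_coefficients = {"8q24.21": 0.30, "13q14.2": -0.25, "17p13.1": -0.20}
toy_sample = {"8q24.21": 0.5, "13q14.2": -0.4, "17p13.1": -0.3, "1p36": 0.1}
toy_score = signature_score(toy_sample, toy_coefficients)
```

Segments that are not part of the signature (here the made-up "1p36" entry) are simply ignored.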

Example 2.5. ctDNA RB-LOH signature versus ctDNA RB1 individual region

The DNA-based RB-LOH signature considers the signal of the RB1 locus (13q14.2) among 224 other features [Xia Y, Fan C, Hoadley KA, Parker JS, Perou CM. Genetic determinants of the molecular portraits of epithelial cancers. Nature Communications 2019; 10(1): 5666 doi 10.1038/s41467-019-13588-2]. The correlation coefficient between the ctDNA signal of 13q14.2 and the ctDNA RB-LOH signature score was -0.12 across the 178 samples with TF>3%. In the previous cohort of patients with advanced HR+/HER2- breast cancer treated with endocrine therapy and a CDK4/6 inhibitor, the ctDNA signal of the individual 13q14.2 segment was not significantly associated with PFS (p-value=0.061) but was significantly associated with OS (p-value=0.020). However, the RB-LOH signature was the only variable significantly associated with PFS and OS in a bivariate Cox model. Overall, the RB-LOH ctDNA-based signature better captured the clinical behavior than did an individual DNA region looking only at RB1, thus highlighting the power of a multi-feature algorithm for sensing pathway activity.

Example 2.6. Signal from individual DNA segments as prognostic drivers

To further understand the prognostic value of individual DNA segments, we focused on the 27 ctDNA-based signatures (see Table 3 and Table 4 above) significantly associated with both PFS and OS. From each signature, we evaluated the signal from 534 DNA segments and their original weights. We identified 16 DNA segments whose weights are high (i.e., defined as >0.10 or < -0.10) in at least 11 of 40 (40%) signatures. These included deletion of 13q14.2 (where RB1 is located) and 17p13.1 (e.g., TP53) and amplification of 8q24.21 (e.g., MYC) and 12p13.33 (e.g., FOXM1). Next, we evaluated the association of each of the 16 DNA segments with PFS in 87 patients with advanced HR+/HER2- breast cancer treated with endocrine therapy and a CDK4/6 inhibitor. The signals from 3 of 16 (18.8%) DNA segments were associated with PFS. We then combined the signals from pairs of segments (i.e., a total of 120 different combinations), and evaluated the association of each combination with PFS. A total of 40 of the 120 combinations (30%) were significantly associated with PFS, and all 16 DNA segments were found in at least 1 combination (see Table 2, Table 4 and Table 5 above).

Example 2.7. Prognosis of RB-LOH in tumor versus plasma

To compare the prognostic value of the DNA-based RB-LOH signature when determined in tumor versus plasma, we analyzed the 63 of 124 patients (51.0%) with advanced HR+/HER2- breast cancer treated with endocrine therapy and a CDK4/6 inhibitor who had paired tumor DNA samples (Figure 2e). In a univariate analysis, both the RB-LOH tumor and ctDNA plasma signatures (as continuous variables) were significantly associated with PFS and OS. When both signatures were evaluated head-to-head in a bivariate Cox model, the RB-LOH ctDNA plasma signature was found significantly associated with PFS, but the RB-LOH tumor signature was not (Figure 2e). Overall, these results suggest that baseline pre-treatment ctDNA captures the prognosis of patients better than archival tumor tissue DNA.

Example 2.8. Capturing biological features before and after endocrine therapy and CDK4/6 inhibition

Scores from the 150 ctDNA-based signatures were evaluated in paired plasma samples (i.e., baseline versus post-treatment after progressive disease) across 7 patients with advanced HR+/HER2- breast cancer treated with endocrine therapy and a CDK4/6 inhibitor (Figure 2f). Among them, 103 signatures (57.2%) were found differentially enriched between the two timepoints (FDR<5%). As might be expected, enrichment of signatures tracking non-luminal/proliferation-related biological processes (e.g., RB-LOH) and luminal A-related biological processes were found significantly increased and decreased, respectively, in post-treatment samples compared to pre-treatment samples (Figure 2f). Of note, TF did not significantly change between the two timepoints across the 7 patients (Figure 2f), and 1 patient with a substantial decrease in TF still showed an increase in the RB-LOH score and a decrease of the luminal A signature. These biological changes identified in ctDNA are concordant with similar biological changes identified across 18 patients with paired tumor-based RNA expression before and at progression to endocrine therapy and a CDK4/6 inhibitor. Specifically, PAM50 Luminal A and proliferation signatures were found significantly decreased and increased, respectively, in progression samples.
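Testing 150 signatures at once requires a multiple-testing correction, which is what the FDR<5% threshold refers to. The text does not say which FDR procedure was used; the sketch below assumes the common Benjamini-Hochberg step-up procedure, and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical p-values for five signatures compared between two timepoints.
toy_p = [0.001, 0.008, 0.039, 0.041, 0.60]
toy_q = benjamini_hochberg(toy_p)
n_significant = sum(q < 0.05 for q in toy_q)
```

In this toy case four raw p-values fall below 0.05, but only two signatures remain significant at FDR<5% after adjustment.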

Example 2.9. ctDNA-based tumor profiling

To explore the biology identified by the 150 ctDNA-based signatures, we performed an unsupervised hierarchical cluster analysis of all 150 signatures across 178 plasma samples with a TF>3% (Figure 3a). Four main clusters of samples were identified using ConsensusClusterPlus. Clusters 3 and 4 showed high scores of ctDNA-based proliferation-related signatures and low scores of differentiation status and of luminal A-related signatures. Compared to Cluster 4, Cluster 3 showed high expression of basal-like gene expression subtype-related biology (p-value<0.001). Cluster 2 showed high enrichment of differentiation and luminal B-related signatures, and low enrichment of basal-like related biology. Visually, Cluster 2 could be further subdivided (minimum 20 samples and a correlation coefficient >0.75) into Cluster 2A and Cluster 2B, which differed in the enrichment of ctDNA-based proliferation features and luminal A-related signatures. Consistent with the Luminal A-related biology identified in Cluster 2A, this group was characterized by 16p amplification and 16q deletion, both of which are known features of low-grade and low-proliferative breast cancers. Finally, Cluster 1 showed low enrichment of proliferation and luminal B-related signatures and high enrichment of luminal A subtype-related signatures. Plasma TF in Cluster 1 was significantly lower compared to the other clusters combined (average 6.8% vs 9.8%, p-value<0.001; Figure 3a), which might be predicted for slow-growing luminal A tumors.

Example 2.10. ctDNA-based data versus tissue RNA-based expression data

RNA-based expression data from FFPE tissue using a research-based PAM50 intrinsic subtype assay was available for 108 cases with a TF >3% in plasma. Tissue samples were obtained at various timepoints. As expected, Cluster 3 was enriched for tumors with a PAM50 non-luminal subtype (i.e., HER2-enriched or Basal-like) compared to the other clusters (85.7% versus 25%, p-value<0.001). Concordant with this finding, the PAM50 Luminal A and HER2-enriched signatures (as a continuous variable) were found differentially expressed in Cluster 3 versus the other clusters (Figure 3b). In addition, we observed that Cluster 2B was enriched for PAM50 Luminal B tumors compared to Cluster 2A (53.3% versus 18.8%, p-value=0.044).

To further explore the association between ctDNA enrichments and RNA expression data, we evaluated the correlation of 6 PAM50 RNA-based tissue signatures with each of the 150 ctDNA-based signatures. To summarize these results, we plotted in an unsupervised cluster analysis the correlation coefficients of the most correlated signatures across 58 matched-timepoint paired plasma-tissue cases (Figure 3c). In general, ctDNA-based signatures were positively or negatively correlated with each PAM50 subtype signature in line with the biology that signature is hypothesized to be tracking. For example, a ctDNA-based signature tracking the RB-LOH gene expression signature, which is enriched in E2F target genes and tracks tumor proliferation rates, was found positively correlated with the PAM50 RNA-based Basal-like, HER2-enriched and proliferation tumor signatures, and negatively correlated with the PAM50 RNA-based Luminal A tumor signature (Figure 3c).

Expression of 771 genes in tumors using the nCounter Breast Cancer 360 Panel was determined in 107 cases with a TF >3% in plasma. Correlation coefficients of the mRNA expression of each individual gene with each of the 150 ctDNA-based signature scores were also determined. Like the PAM50 RNA tumor signatures, the mRNA expression of luminal genes (e.g., ESR1 and GATA3) was positively correlated with luminal ctDNA-based signatures, while proliferation and cell cycle-related genes by mRNA (e.g., MKI67, AURKA, TTK, E2F1 and CCNE1) were positively correlated with proliferation-related ctDNA-based signatures (Figure 3d). Overall, these findings confirm that DNA copy number-based signatures derived from ctDNA can track the main breast cancer phenotypes and their known gene expression features.

Example 2.11. Prognosis of ctDNA-based tumor subtypes

This work demonstrates that tumor profiles identify samples with similar patterns of expression, and these patterns are associated with clinically relevant genotypes. We then hypothesized that ctDNA-defined subtypes, representing repeatably observed combinations of these patterns, may explain variation in clinical outcome. To explore this, we evaluated the prognostic value of the 4 ctDNA-based tumor groups (Figure 3a). Compared to Clusters 1-2-4 (as a group), Cluster 3 was significantly associated with worse PFS (median 2.4 vs. 11.6 months; hazard ratio=9.23; 95% confidence interval [CI] 3.1-27.3; p-value<0.001). Regarding OS, a tendency was observed which did not reach statistical significance (median 16.8 vs. 55.4 months; hazard ratio=2.73; 95% CI 0.79-9.38; p-value=0.112). As expected, the expression of the ctDNA-based RB-LOH signature was significantly higher in Cluster 3 compared to the other clusters (p<0.001).
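Median PFS and OS figures such as those above are typically read off a Kaplan-Meier curve, which accounts for patients whose follow-up ended before an event (censoring). The sketch below shows a standard Kaplan-Meier estimate and the resulting median. It is an illustration with hypothetical follow-up data, not the analysis code used here, and it does not compute hazard ratios.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up time per patient (e.g., months).
    events: 1 if progression/death was observed at that time, 0 if censored.
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve

def median_survival(times, events):
    """Smallest event time at which estimated survival is <= 0.5, else None."""
    for t, s in kaplan_meier(times, events):
        if s <= 0.5:
            return t
    return None

# Hypothetical follow-up (months); 0 marks a censored patient.
toy_times = [2, 3, 3, 5, 8, 12]
toy_events = [1, 1, 0, 1, 1, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
toy_median = median_survival(toy_times, toy_events)
```

If the curve never drops to 50%, the median is not reached and the function returns None.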

Example 2.12. DNA-based tumor subtypes in tissue samples

To explore how tumor subtypes identified in ctDNA data perform when applied to tumor tissue DNA, the scores of the 150 DNA-based signatures were determined and evaluated using tumor DNAs from 1,038 patients with early-stage breast cancer from the publicly available METABRIC dataset (Figure 4a). We developed a 4-class subtype classifier from the ctDNA groups (Figure 3a) and applied this predictor to METABRIC's tumor DNA data. Overall, the 4 main clusters were identified in METABRIC, including Cluster 1, suggesting that the profiles identified in ctDNA are observed in primary tumors. Concordant with this finding, the PAM50 non-luminal subtypes in the METABRIC cohort were enriched in Cluster 3 compared to the other clusters (79.6% versus 11.6%, p-value<0.001) (Figure 4b). In addition, TP53 somatic mutations in METABRIC were also enriched in Cluster 3 compared to the other clusters (80.7% versus 51.2%, p-value<0.001) (Figure 4c). Finally, the 4 main clusters were found significantly associated with disease-free survival (DFS) and OS in all patients, and in patients with HR+/HER2-negative breast cancer (Figure 4d).

Example 2.13. Analysis to identify the 4-subtype clustering

We also performed an analysis to identify the 4 subtype clustering using the 16 segments (see Table 2) that were significantly differentially expressed across the 4 clusters identified in ctDNA using a multi-class significance analysis of microarrays (SAM) with < 5% FDR. Then we used the selected gene list and calculated 4 centroids from the training data. For every new sample, we calculated the Euclidean distances to the 4 centroids and assigned a cluster class to each sample based on the nearest centroid.
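The nearest-centroid assignment described above can be sketched as follows. This is a minimal illustration: the helper names are hypothetical, the toy profiles use 2 features instead of the 16 segments, and any scaling or normalization applied before computing distances is not specified in the text and is omitted here.

```python
import math

def compute_centroids(train_profiles, train_labels):
    """Mean feature vector (centroid) of each cluster in the training data."""
    sums = {}
    counts = {}
    for profile, label in zip(train_profiles, train_labels):
        acc = sums.setdefault(label, [0.0] * len(profile))
        for j, value in enumerate(profile):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: [value / counts[label] for value in acc]
        for label, acc in sums.items()
    }

def assign_cluster(profile, centroids):
    """Label of the centroid with the smallest Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(profile, centroids[label]))

# Hypothetical training data: 2 features per sample, 2 samples per cluster.
toy_train = [
    [0.0, 0.0], [0.2, 0.0],
    [1.0, 1.0], [1.0, 1.2],
    [-1.0, 1.0], [-1.2, 1.0],
    [0.0, -1.0], [0.0, -1.2],
]
toy_labels = [
    "Cluster 1", "Cluster 1",
    "Cluster 2", "Cluster 2",
    "Cluster 3", "Cluster 3",
    "Cluster 4", "Cluster 4",
]
toy_centroids = compute_centroids(toy_train, toy_labels)
new_sample_class = assign_cluster([-0.9, 0.8], toy_centroids)
```

Because the classifier only stores 4 centroids, it can be applied to any new sample for which the same segments have been measured.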

We have performed an analysis comparing the clusters obtained using the 150 gene signatures and the clusters obtained using the 16 segments. To assess the association between these two sets of clusters, we conducted a Chi-squared test of independence. The results of the Chi-squared test showed a p-value of < 2.2e-16, indicating a statistically significant association between the two sets of clusters.
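The agreement test described above operates on a contingency table that cross-tabulates the two cluster assignments. The sketch below computes the Pearson Chi-squared statistic and its degrees of freedom for such a table; the counts are hypothetical and the p-value lookup is left out to keep the example dependency-free.

```python
def chi_squared_independence(table):
    """Pearson Chi-squared statistic and degrees of freedom for an r x c table.

    table: list of rows of observed counts; every row and column total
    must be positive so that all expected counts are non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

# Hypothetical cross-tabulation: rows = clusters from the 150 signatures,
# columns = clusters from the 16 segments.
toy_table = [
    [30, 2, 1, 1],
    [3, 40, 5, 2],
    [1, 4, 25, 3],
    [2, 3, 2, 28],
]
toy_statistic, toy_dof = chi_squared_independence(toy_table)
```

A 4 x 4 table has 9 degrees of freedom; a large statistic relative to that reference distribution corresponds to a very small p-value, as reported above.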

We then evaluated the prognostic value of the 4 ctDNA-based tumor groups in the CDK plasma-1 and CDK plasma-2 cohorts combined (n=152). Compared to Cluster 1 (median PFS=27.6 months), Clusters 2, 3 and 4 were found significantly associated with worse PFS (median PFS of 9.5, 5.8 and 7.4 months, respectively) (p-value<0.0005). See Figure 5 showing Kaplan-Meier curves for PFS. The prognostic value of the DNA-subtypes was validated in DNA from tumor tissue using the METABRIC-HR+/HER2- cohort (n=1131, DFS) (Figure 6).