Title:
METHOD FOR PREDICTING THE EFFICACY OF CANCER THERAPY
Document Type and Number:
WIPO Patent Application WO/2009/074968
Kind Code:
A2
Abstract:
The present invention relates to a method and a kit for predicting the efficacy of cancer therapy in a subject who has undergone or is undergoing chemotherapy treatment for cancer.

Inventors:
FARMER PIERRE (FR)
DELORENZI MAURO (CH)
BONNEFOI HERVE (FR)
IGGO RICHARD (FR)
Application Number:
PCT/IB2008/055252
Publication Date:
June 18, 2009
Filing Date:
December 12, 2008
Assignee:
ECOLE POLYTECH (CH)
FARMER PIERRE (FR)
DELORENZI MAURO (CH)
BONNEFOI HERVE (FR)
IGGO RICHARD (FR)
International Classes:
C12Q1/68
Other References:
FARMER P ET AL: "A stroma-related gene signature predicts resistance to epirubicin-containing neoadjuvant chemotherapy in breast cancer" BREAST CANCER RESEARCH AND TREATMENT, vol. 106, no. Suppl. 1, 3 November 2007 (2007-11-03), page S11, XP002531722 & 30TH ANNUAL SAN ANTONIO BREAST CANCER SYMPOSIUM; SAN ANTONIO, TX, USA; DECEMBER 13 -16, 2007 ISSN: 0167-6806
CLEATOR SUSAN J ET AL: "The effect of the stromal component of breast tumours on prediction of clinical outcome using gene expression microarray analysis." BREAST CANCER RESEARCH : BCR 2006, vol. 8, no. 3, 2006, page R32, XP002531723 ISSN: 1465-542X
"AFFYMETRIX CATALOG; PRODUKT: HUMAN GENOME U133A ARRAY; ARRAY FINDER" AFFYMETRIX PRODUCT CATALOG, XX, XX, 1 July 2002 (2002-07-01), page 1, XP002267612
NEWTON TANYA R ET AL: "Expression profiling correlates with treatment response in women with advanced serous epithelial ovarian cancer" INTERNATIONAL JOURNAL OF CANCER, vol. 119, no. 4, August 2006 (2006-08), pages 875-883, XP002531724 ISSN: 0020-7136
KÖNINGER JÖRG ET AL: "Overexpressed decorin in pancreatic cancer: potential tumor growth inhibition and attenuation of chemotherapeutic action." CLINICAL CANCER RESEARCH : AN OFFICIAL JOURNAL OF THE AMERICAN ASSOCIATION FOR CANCER RESEARCH 15 JUL 2004, vol. 10, no. 14, 15 July 2004 (2004-07-15), pages 4776-4783, XP002531725 ISSN: 1078-0432
WEST ROBERT B ET AL: "Determination of stromal signatures in breast carcinoma" PLOS BIOLOGY, vol. 3, no. 6, June 2005 (2005-06), pages 1101-1110, XP002531726 ISSN: 1544-9173(print) 1545-7885(ele
BONNEFOI HERVÉ ET AL: "Validation of gene signatures that predict the response of breast cancer to neoadjuvant chemotherapy: a substudy of the EORTC 10994/BIG 00-01 clinical trial." THE LANCET ONCOLOGY DEC 2007, vol. 8, no. 12, 14 November 2007 (2007-11-14), pages 1071-1078, XP002531727 ISSN: 1474-5488
VEER VAN 'T L J ET AL: "Gene expression profiling predicts clinical outcome of breast cancer" NATURE, NATURE PUBLISHING GROUP, LONDON, UK, vol. 415, no. 6871, 31 January 2002 (2002-01-31), pages 530-536, XP002259781 ISSN: 0028-0836
Attorney, Agent or Firm:
KATZAROV SA (Geneva, Geneva, CH)
Claims:

CLAIMS

1. A method for predicting the efficacy of cancer therapy in a subject who has undergone or is undergoing chemotherapy treatment for cancer, characterized in that said method comprises

(a) obtaining a stromal tissue sample from a subject,

(b) determining in said stromal tissue sample the expression values of at least two stromal genes and of at least two reference genes,

(c) defining stromal content (SC) from the expression values of step (b),

(d) comparing the stromal content (SC) with a reference threshold,

(e) predicting resistance to chemotherapy of said subject based on the step (d), wherein high stromal content is indicative of resistance to chemotherapy, while low stromal content is indicative of sensitivity to chemotherapy,

(f) adapting the treatment of said subject.

2. The method of claim 1, wherein cancer is selected from the group comprising breast cancer, colon cancer, lung cancer, colorectal cancer, head and neck cancer, or ovarian cancer.

3. The method of claims 1 to 2, wherein the chemotherapy is anthracycline-based neoadjuvant chemotherapy.

4. The method of claims 1 to 3, wherein the chemotherapy is selected from the group comprising fluorouracil, epirubicin and cyclophosphamide based chemotherapy or combination thereof.

5. The method of claims 1 to 4, wherein said stromal tissue sample is taken from a tumour biopsy.

6. The method of claims 1 to 5, wherein said stromal genes are selected from the group consisting of the genes of Table 4.

7. The method of claim 6, wherein said stromal genes are DCN, CSPG2, CDH11, ASPN, SPARC, ITGBL1, PLAU, COL1A2, SNAI2, POSTN and THBS2.

8. The method of claims 1 to 7, wherein said reference genes are selected from the group consisting of the genes of Table 5.

9. The method of claim 8, wherein said reference genes are GAPDH, ACTB, TCF2, ZNF333, ADH6, FOXH1, TPX2, CENPA, BIRC5, TOP2A.

10. The method of claims 1 to 9, wherein said reference genes are genes selected from the group consisting of the genes of Table 4 or Table 5 determined within stromal tissue of non-tumor reference biopsy.

11. The method of claims 1 to 10, wherein determining in said stromal tissue sample the expression values of at least 5 stromal genes and of at least 5 reference genes.

12. The method of claims 1 to 11, wherein the determining of expression values of said stromal genes and said reference genes is obtained by detecting mRNA levels of said stromal genes and said reference genes.

13. A kit for predicting the efficacy of cancer therapy in a subject who has undergone or is undergoing chemotherapy treatment for cancer, characterized in that said kit comprises (a) a reagent for detecting mRNA levels of at least two stromal genes selected from the group consisting of the genes of Table 4 and of at least two reference genes selected from the group consisting of the genes of Table 5 in a stromal tissue sample from a subject, and (b) an instruction sheet.

14. The kit of claim 13, wherein said reagent comprises buffers and premeasured portions of probes that hybridize to mRNA of at least two stromal genes of claim 6 and to mRNA of at least two reference genes of claim 8.

15. The kit of claims 13 to 14, further comprising a reagent for preparing and processing a stromal tissue sample from the subject.

Description:

METHOD FOR PREDICTING THE EFFICACY OF CANCER THERAPY

FIELD OF THE INVENTION

The present invention relates to a method and a kit for predicting the efficacy of cancer therapy in a subject who has undergone or is undergoing chemotherapy treatment for cancer.

BACKGROUND OF THE INVENTION

Many cancer patients are diagnosed at a stage in which the cancer is too far advanced to be cured, and most cancer treatments are effective in only a minority of patients undergoing therapy. Therefore, there has been much interest in cancer signatures (herein biomarkers) for predicting future patterns of disease, especially as cancer treatment has made such positive strides in the last few years. Cancer signatures provide a powerful and dynamic approach to understanding the spectrum of malignancies with applications in observational and analytic epidemiology, randomized clinical trials, screening, diagnosis and prognosis. Defined as alterations in the constituents of tissues or body fluids, these signatures offer a means for homogeneous classification of a disease and risk factor, and they can extend one's basic information about the underlying pathogenesis of disease. The goals in cancer research include finding signatures (biomarkers) that can be used to detect cancers early, to predict the efficacy of a cancer therapy and to identify underlying processes involved in the disease.

For example, breast cancer is the most common malignancy among women, and has one of the highest fatality rates of all cancers affecting females. In fact, breast cancer remains the leading cause of cancer deaths in women aged 20-59. Adjuvant systemic chemotherapy for breast cancer decreases the risk of relapse and improves overall survival by 10 to 50% depending on the patient's age and the estrogen receptor (ER) status of the tumour 1 . When chemotherapy is administered before surgery, a complete disappearance of the invasive component of the primary tumour (complete pathological response; pCR) is observed in approximately 10% of ER positive tumours and 20 to 30% of ER negative tumours 2-5 ; breast cancers thus have heterogeneous sensitivity to chemotherapy. Complete pathological response correlates with a longer survival and is therefore considered as a surrogate measure of chemosensitivity 2, 4, 6-8 .

In December 2006, Potti et al. 8 developed predictive signatures based on the in vitro response of cell lines to chemotherapy and validated these signatures in clinical datasets. However, the use of cell lines has the disadvantage of ignoring the influence of the tumour stroma microenvironment on drug response. Indeed, recent pre-clinical findings showed the influence of extracellular matrix proteins, such as fibronectin and laminin, on cell line sensitivity to cytotoxic drugs and radiation 12-15 . However, these signatures did not give new insights in the understanding of chemosensitivity or resistance mechanisms in vivo.

In many cases of anticancer therapies, biomarkers are critical to predict efficacy of the therapy for individual subjects. Biomarkers can be used to predict efficacy before treatment or can be monitored to predict the therapeutic response shortly after initiation of treatment. These biomarkers are useful to select appropriate subjects for the therapy and to save remaining subjects, in whom the therapy is unlikely to exhibit any clinical benefit, from unnecessary side effects and costs. Although a variety of gene alterations have been identified, no single biomarker can reliably predict response to therapy and outcome. There still exists a need for additional sets of biomarkers for individuals having cancers. Currently, there is no diagnostic test used in the clinic that predicts sensitivity of subjects with cancer to various chemotherapy regimens. Therefore, it would be highly desirable to be able to identify whether a subject with cancer, and in particular with breast cancer or colorectal cancer, who has undergone or is undergoing chemotherapy treatment for cancer will be responsive to chemotherapy, in particular anthracycline-based neo-adjuvant chemotherapy.

SUMMARY OF THE INVENTION

This object has been achieved by the Applicants in the present invention which provides for a method for predicting the efficacy of cancer therapy in a subject who has undergone or is undergoing chemotherapy treatment for cancer, characterized in that said method comprises

(a) obtaining a stromal tissue sample from a subject,

(b) determining in said stromal tissue sample the expression values of at least two stromal genes and of at least two reference genes,

(c) defining stromal content (SC) from the expression values of step (b),

(d) comparing the stromal content (SC) with a reference threshold,

(e) predicting resistance to chemotherapy of said subject based on the step (d), wherein high stromal content is indicative of resistance to chemotherapy, while low stromal content is indicative of sensitivity to chemotherapy,

(f) adapting the treatment of said subject.
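Steps (b) to (e) can be illustrated with a minimal sketch. It assumes, purely for illustration, that stromal content is the mean log-scale expression of the stromal genes minus the mean log-scale expression of the reference genes; the summary does not fix a formula at this point. The function names, the expression values and the threshold are hypothetical.

```python
from statistics import fmean


def stromal_content(expr, stromal_genes, reference_genes):
    """Assumed definition: mean stromal expression minus mean reference expression."""
    if len(stromal_genes) < 2 or len(reference_genes) < 2:
        raise ValueError("at least two stromal and two reference genes are required")
    stromal = fmean(expr[g] for g in stromal_genes)
    reference = fmean(expr[g] for g in reference_genes)
    return stromal - reference


def predict_response(sc, threshold):
    """High stromal content -> predicted resistance, low -> predicted sensitivity."""
    return "resistant" if sc > threshold else "sensitive"


# Hypothetical log2 expression values for one tumour sample.
sample = {"DCN": 9.0, "SPARC": 10.0, "GAPDH": 8.0, "ACTB": 8.5}
sc = stromal_content(sample, ["DCN", "SPARC"], ["GAPDH", "ACTB"])
print(sc, predict_response(sc, threshold=1.0))
```

In practice the reference threshold would be derived from a training cohort rather than chosen by hand as it is here.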

The present invention further provides for a kit for predicting the efficacy of cancer therapy in a subject who has undergone or is undergoing chemotherapy treatment for cancer, characterized in that said kit comprises

(a) a reagent for detecting mRNA levels of at least two stromal genes selected from the group consisting of the genes of Table 4 and of at least two reference genes selected from the group consisting of the genes of Table 5 in a stromal tissue sample from a subject, and

(b) an instruction sheet.

BRIEF DESCRIPTION OF THE FIGURES

Figure 1. Flow chart of experimental procedures. Flow chart showing the steps in the procedure for selecting and testing gene expression modules identified with the multiple regression model. The brackets indicate the dataset used for a particular step.

Figure 2. Gene Set Enrichment Analysis (GSEA) confirms that expression modules are associated with their respective biological processes. The GSEA measures the distributional bias of a subset of genes defined a priori within a larger ordered list of genes. Nine independent tests were performed, one for each of the 9 biological processes included in the study. All reference gene subsets were taken from MSigDB (http://www.broad.mit.edu/gsea/). For each test, all genes of the NKI-EMC matrix were ordered (from left to right) and weighted according to their expression similarity to the representative gene indicated below the panel. The distributional bias of genes of the geneset indicated above each panel was measured. A positive relationship between the gene ordering and the tested geneset is visualized by a leftward distribution bias. Results are expressed as percentage of maximal theoretical score. P-values were obtained empirically by randomizing 100 000 times the composition of the genes included in the geneset.

Figure 3. Heatmap and response data for stromal genes. The expression data for the individual genes are shown in the heatmap after mean centring. Panels (A) and (B) show the results observed in the EORTC study and in the MDA study respectively. The lower panels (C) and (D) show the stroma metagene score for the EORTC and MDA studies. The score is defined as the mean expression value of the 50 genes included in the metagene. Patient ordering in panels (A) and (B) is maintained in (C) and (D). pCR, red; non-pCR, blue.

Figure 4. Influence of the relative decision threshold on classification statistics. The upper and lower panels represent results obtained in the EORTC and MDA datasets respectively. The vertical gray line represents the decision threshold that maximizes the Youden index in the EORTC dataset and the associated circle symbols for each line indicate the points used for the classification statistic reported in table 2 c. The horizontal axis represents the various decision thresholds (cutoff) used to make the pCR-npCR prediction. Accuracy (ACC), Sensitivity (SEN), Specificity (SPE), Positive predictive value (PPV), Negative predictive value (NPV) and Youden index (YOU).

Figure 5. Prediction of pCR using metagenes derived from the DCN expression module. The horizontal axis represents different metagenes composed of genes chosen in decreasing order of association to the representative gene in the NKI-EMC dataset (window position 1 is constructed with the first 15 genes, window position 2 with the next non-overlapping 15 genes). Red points are metagenes where all genes in the expression module are significantly associated with the representative gene (p ≤ 0.05). Black points are metagenes where at least one gene is not significantly associated with the representative gene. The vertical axis is the AUC of the ROC curve for predicting pCR. When the AUC is over 0.5, the metagene predicts resistance or response better than chance. The error bars show the 95% confidence interval of the bootstrap.

Figure 6. Kaplan Meier survival analysis of patients in the Amsterdam, Rotterdam and Duke datasets. For all panels, patients were divided into 2 groups based on their metagene score and survival of patients with tumours falling into the upper (red) and lower (blue) halves was compared. The NKI (n=189) 21 and EMC (n=286) 36 datasets comprise breast cancer patients that received no chemotherapy while all patients of the Duke dataset (n = 120) received chemotherapy. Panels (A, B, C): The metagene score for the proliferation module was used to split the patients. Panels (D, E, F): The metagene score for the stroma was used to split patients. Significance was determined with the log rank statistics.

Figure 7. Activated stroma is associated with response to FEC chemotherapy. Panels (A) and (B): Scatter plots of metagene scores for individual tumours. The stroma metagene is plotted against SFT (A) or DTF (B) metagene scores derived from the published gene lists. Each point represents a single tumour: pCR, red; non-pCR, blue. Panel (C): Gene Set Enrichment Analysis (GSEA) measuring the distribution of Wnt target genes within the DCN expression module. All genes of the NKI-EMC expression matrix were ranked (from left to right) and weighted according to the meta-analytical t statistics for the DCN representative gene. Results are expressed as percentage of maximal theoretical score. Significance was determined by 100 000 permutations of the ranks of the DCN expression module. Panel (D): Stripchart of epithelial-stroma specific scores for selected gene lists. Scores were calculated as follows: Laser Dissection Microscopy (LDM) was performed on 3 colon carcinoma patients. For each patient, 2 fractions were isolated: (1) cancerous epithelial cells and (2) reactive stroma. The score is the log ratio between the average expression in the epithelial fraction and the average expression in the reactive stroma. A negative score suggests that the gene is more specifically expressed in reactive stroma compared to epithelial cells. Each symbol represents the mean value of an individual gene of the list; the red vertical bar represents the average score of all genes of the list. Significance was determined by randomly selecting the same number of genes, then comparing their mean score to the observed mean score. A total of 1000 permutations were performed. NBS and NBE are gene lists of normal mammary fibroblast and normal mammary epithelial tissues; DTF, desmoid-type fibromatosis and SFT, solitary fibrous tumour.

Figure 8. Stripchart of epithelial-Cancer Associated Fibroblasts specific scores as a function of selected gene lists. Scores were calculated as follows: Laser Dissection Microscopy (LDM) was performed on 3 colon carcinoma patients. Scores are defined as the average expression in the epithelial fraction minus the average expression in the CAF fraction. A negative score suggests that the gene is more specifically expressed in CAF relative to epithelial cells. Each symbol represents the mean value of an individual gene of the list; the red vertical bar represents the average score of all genes of the list. Significance was determined empirically as the probability that the observed average score would be generated by the same number of randomly selected genes. A total of 1000 permutations were performed. NBS and NBE are respectively gene lists of normal mammary fibroblast and epithelial tissues; DTF, desmoid-type fibromatosis and SFT, solitary fibrous tumour.

Figure 9. Prediction of pCR using metagenes derived from the DCN signature. Tumors from the EORTC dataset are ranked according to the probability of pCR after conversion of the metagene scores into probabilities by logistic regression. The metagene scores are the averages of the expression level of genes included in the DCN signatures. Open symbol (white) npCR, Closed symbol (black) pCR.

Figure 10. Prediction of pCR using metagenes derived from the DCN signature. Tumors from the MDA dataset are ranked according to the probability of pCR after conversion of the metagene scores into probabilities by logistic regression. The metagene scores are the averages of the expression level of genes included in the DCN signatures. Open symbol (white) npCR, Closed symbol (black) pCR.

Figure 11. Stromal signature associated with response in an independent cohort of 23 rectal carcinoma patients treated with preoperative fluorouracil (1,000 mg/m2/d) as a single agent.

DETAILED DESCRIPTION OF THE INVENTION

Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, suitable methods and materials are described below. All publications, patent applications, patents, and other references mentioned herein are incorporated by reference in their entirety. The publications and applications discussed herein are provided solely for their disclosure prior to the filing date of the present application. Nothing herein is to be construed as an admission that the present invention is not entitled to antedate such publication by virtue of prior invention. In addition, the materials, methods, and examples are illustrative only and are not intended to be limiting.

In the case of conflict, the present specification, including definitions, will control.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as is commonly understood by one of skill in the art to which the subject matter herein belongs. As used herein, the following definitions are supplied in order to facilitate the understanding of the present invention.

The term "comprise" is generally used in the sense of include, that is to say permitting the presence of one or more features or components.

As used in the specification and claims, the singular forms "a", "an" and "the" include plural references unless the context clearly dictates otherwise. For example, the term "a cell" includes a plurality of cells, including mixtures thereof. The term "a gene" includes a plurality of genes.

As herein used, "a gene signature" is used to designate a specific combination of genes, which serves as a biomarker for a specific phenotype, state or outcome, herein chemosensitivity.

The abbreviation "FEC" refers to a combination of 5-Fluorouracil, Epirubicin, and Cyclophosphamide, a widely prescribed pre-operative chemotherapy regimen.

The abbreviation "LDM" refers to laser dissection microscopy.

The abbreviation "TOP2A" refers to TOPOISOMERASE, DNA, II, ALPHA. Enzymes that control and alter the topologic states of DNA in both prokaryotes and eukaryotes. Topoisomerase II from eukaryotic cells catalyzes the relaxation of supercoiled DNA molecules, catenation, decatenation, knotting, and unknotting of circular DNA. It appears likely that the reaction catalyzed by topoisomerase II involves the crossing-over of 2 DNA segments.

The abbreviation "T-FAC" refers to a chemotherapy regimen (i.e. a combination of chemotherapy drugs) given to breast cancer patients. T-FAC is the acronym for Taxotere® (Docetaxel), Fluorouracil® (5-FU), Adriamycin® (doxorubicin), Cytoxan® (cyclophosphamide).

The abbreviation "NKI-EMC" refers to two external publicly available data sets, the van de Vijver 3 (n=295, Agilent platform, obtained from author's web-site) and Wang 4 (n=286, Affymetrix platform, GEO:GSE2034) datasets, used herein to define the expression modules. A total of 10317 genes could be cross-matched between these two platforms and were used to define the expression modules.

The term "Expression Module" is used herein to designate a group of genes significantly associated, in terms of similarity of expression, with that of the prototype gene.

The abbreviation "pCR" refers to Pathological Complete Response, i.e. disappearance of the tumour after treatment, with at most scattered tumour cells detected by the pathologist in the resection specimen.

The abbreviation "npCR" refers to Non Pathological Complete Response.

The abbreviation "MIAME" refers to Minimum Information About a Microarray Experiment, an international standard for annotation of microarray data.

The term "Multiple Regression" designates linear regression with two or more independent variables.

As used herein, the term "prototype gene" (or "representative gene") is used to designate a gene identified as typically representative of a large, highly correlated cluster of genes. These clusters were observed to consist of functionally related genes that consistently cluster together regardless of the dataset being analyzed.

As used herein, the term "Metagene" designates the average expression value of a subset of genes, all belonging to the same expression module. Therefore a metagene is a "virtual" gene that summarizes the information contained in many real genes into a single value per sample.
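The definition above amounts to a per-sample average, which the following minimal sketch illustrates. The data layout (a dictionary of samples, each mapping gene names to expression values), the function name and the numbers are assumptions made for the example only.

```python
from statistics import fmean


def metagene_scores(expression, module_genes):
    """Return one value per sample: the mean expression of the module genes."""
    return {
        sample: fmean(values[gene] for gene in module_genes)
        for sample, values in expression.items()
    }


# Hypothetical mean-centred expression values for two tumours.
expression = {
    "tumour_1": {"DCN": 2.0, "POSTN": 4.0, "TPX2": 7.0},
    "tumour_2": {"DCN": 1.0, "POSTN": 1.0, "TPX2": 9.0},
}
scores = metagene_scores(expression, ["DCN", "POSTN"])
print(scores)
```

Genes outside the module (TPX2 in this toy input) do not contribute to the score.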

The abbreviation "MDA dataset" is herein used as follows: the nine metagenes were tested for their ability to predict pathological complete response by using the area under the receiver operating characteristic curves (AUC) as a measure of association (Fig. 1 step 5).

An AUC value significantly different from random association (AUC=0.5) was observed for the interferon signalling and stromal metagenes (Tab. 1a). These two signatures were then tested in an independent cohort of ER negative tumours (referred to here as the "MDA" dataset).

The term "HUGO name" designates the unique name given by the Human Genome Organisation to a gene. Use of HUGO names is encouraged in order to prevent confusion when, as is often the case, multiple different names have been used in the literature.

The term "sensitivity" is herein used to measure the ability of a classification function to predict pCR when it is truly present. Sensitivity is the proportion of all pCR patients for whom there is a positive prediction, determined as the number of true positives divided by the sum of true positives + false negatives.

The term "specificity" is herein used to measure the ability of a classification function to predict the absence of pCR when a patient is truly npCR. Specificity is the proportion of npCR patients for whom there is a correct prediction, expressed as the number of true negatives divided by the sum of true negatives + false positives.

As herein used, the term "ROC curve" refers to a Receiver Operating Characteristic curve, a plot of [sensitivity] vs [1 - specificity] for a classification function. The AUC (area under the ROC curve) is a useful global measure for how well the two classes are separated, independent from a particular threshold.

The abbreviation "PPV" is Positive Predictive Value. It is the proportion of the correct decisions among the cases declared positive by a particular classification function and a particular threshold, that is the ratio of the true positives to the number of positive calls.

The abbreviation "NPV" refers to Negative Predictive Value. It is the proportion of predicted npCR patients (negative tests) that are truly npCR.
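The four definitions above, together with the accuracy and Youden index named in the legend of Figure 4, can be computed from a single confusion matrix. The sketch below assumes that a metagene score below the cutoff is called pCR (low stromal content taken as chemosensitive); the function name, the scores and the cutoff are hypothetical.

```python
def classification_stats(scores, is_pcr, cutoff):
    """Confusion-matrix statistics; pCR is predicted when score < cutoff (assumption)."""
    tp = fp = tn = fn = 0
    for score, truth in zip(scores, is_pcr):
        predicted_pcr = score < cutoff
        if predicted_pcr and truth:
            tp += 1
        elif predicted_pcr and not truth:
            fp += 1
        elif not predicted_pcr and truth:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        return num / den if den else float("nan")

    sen = ratio(tp, tp + fn)  # true positives / all true pCR
    spe = ratio(tn, tn + fp)  # true negatives / all true npCR
    return {
        "SEN": sen,
        "SPE": spe,
        "PPV": ratio(tp, tp + fp),  # correct calls among predicted pCR
        "NPV": ratio(tn, tn + fn),  # correct calls among predicted npCR
        "ACC": ratio(tp + tn, tp + fp + tn + fn),
        "YOU": sen + spe - 1.0,  # Youden index
    }


# Hypothetical metagene scores and observed responses for six tumours.
scores = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2]
is_pcr = [True, True, False, True, False, False]
stats = classification_stats(scores, is_pcr, cutoff=0.7)
print(stats)
```

Sweeping `cutoff` over a grid and keeping the value with the largest `YOU` is one way to pick a decision threshold.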

The term "Bootstrapping" refers to sampling with replacement from a set of data to produce simulated data sets and approximately determine the variability of a parameter estimate. The term "95% confidence interval of the AUC" is used here for the range of values bounded by the 2.5th and 97.5th centiles of the bootstrap distribution of the AUC.
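A minimal sketch of a bootstrap percentile interval for the AUC follows. The AUC is computed here as the fraction of positive-negative pairs in which the positive case has the higher score, which equals the area under the ROC curve. The function names, the number of resamples, the nearest-rank percentile rule and the toy data are assumptions made for illustration.

```python
import random


def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, lab in zip(scores, labels) if lab]
    neg = [s for s, lab in zip(scores, labels) if not lab]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(scores, labels, n_boot=2000, seed=0):
    """2.5th and 97.5th centiles of the AUC over resamples drawn with replacement."""
    rng = random.Random(seed)
    size = len(scores)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(size) for _ in range(size)]
        resampled_labels = [labels[i] for i in idx]
        if all(resampled_labels) or not any(resampled_labels):
            continue  # AUC is undefined when one class is absent
        estimates.append(auc([scores[i] for i in idx], resampled_labels))
    estimates.sort()
    low = estimates[int(0.025 * (n_boot - 1))]   # nearest-rank approximation
    high = estimates[int(0.975 * (n_boot - 1))]
    return low, high


# Hypothetical scores; label 1 marks the class expected to score higher.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
point = auc(scores, labels)
low, high = bootstrap_auc_ci(scores, labels)
print(point, low, high)
```

An interval that excludes 0.5 indicates an association different from chance at roughly the 5% level.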

The abbreviation "DTF" refers to Desmoid-Type Fibromatosis. A form of fibroblastic tumour occurring in patients with germline mutations in the adenomatous polyposis coli gene (APC).

The abbreviation "SFT" is herein used to designate Solitary Fibrous Tumour. A generally benign tumour of fibroblasts.

The abbreviation "CAF" refers to Cancer-Associated Fibroblasts.

As herein used, the term "Multiple regression for the selection of expression modules" refers to:

$$\mathrm{GENE}_i = \beta_{0i} + \beta_{1i}\,\mathrm{ESR1} + \beta_{2i}\,\mathrm{CLCA2} + \beta_{3i}\,\mathrm{FABP4} + \beta_{4i}\,\mathrm{GZMA} + \beta_{5i}\,\mathrm{CD83} + \beta_{6i}\,\mathrm{MX1} + \beta_{7i}\,\mathrm{DCN} + \beta_{8i}\,\mathrm{ADM} + \beta_{9i}\,\mathrm{TPX2} + \varepsilon_i$$

(sample indices are removed for clarity), wherein

$\mathrm{GENE}_i$: the expression of gene i, which is the dependent variable of the linear model. Its variation over the set of profiled tumours is decomposed into linear terms given by the prototype genes as explanatory variables.

ESR1, CLCA2, FABP4, GZMA, CD83, MX1, DCN, ADM, TPX2: the expression vectors of the nine prototype genes.

$\beta_{0i}$: the intercept term for gene i.

$\beta_{ji}$: the regression coefficient for prototype j and gene i. It is a measure of the correlation between the expression vectors of the genes j and i, adjusted for the presence of the other explanatory variables in the model.

$\varepsilon_i$: the unexplained variation (residual) term for gene i.

HUGO names of the prototypes (the independent variables):

ESR1: estrogen receptor α, classic marker that distinguishes ER+ and ER- breast cancer subtype

CLCA2: chloride-activated calcium channel 2. CLCA2 is a marker that, conjointly with ESR1, distinguishes the molecular apocrine subtype postulated by Farmer et al 10 from the luminal and basal subtypes

FABP4: fatty acid binding protein 4, a marker for adipocytes

GZMA: granzyme A, a marker for T lymphocytes

CD83: cluster of differentiation 83, a marker for B lymphocytes

MX1: myxovirus resistance gene 1, a marker for interferon signalling

DCN: decorin, a marker for stroma

ADM: adrenomedullin, a marker for hypoxia

TPX2: aurora kinase targeting subunit, a marker for proliferation.
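The multiple regression of one gene on the prototype genes can be sketched with ordinary least squares. The example below solves the normal equations by Gaussian elimination using only the standard library, and uses two prototypes instead of nine to keep the toy data small. The function names and all numbers are hypothetical, and the specification does not state which fitting routine was actually used.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_gene(gene, prototypes):
    """Least-squares coefficients [b0, b1, ...] of gene = b0 + sum_j b_j * prototype_j."""
    rows = [[1.0] + [p[k] for p in prototypes] for k in range(len(gene))]
    width = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(width)] for i in range(width)]
    xty = [sum(r[i] * y for r, y in zip(rows, gene)) for i in range(width)]
    return solve(xtx, xty)


# Hypothetical expression vectors over five tumours.
dcn = [0.0, 1.0, 2.0, 3.0, 4.0]
tpx2 = [1.0, 0.0, 2.0, 1.0, 3.0]
gene = [1.0 + 2.0 * d - 0.5 * t for d, t in zip(dcn, tpx2)]  # noise-free toy gene
coefficients = fit_gene(gene, [dcn, tpx2])
print(coefficients)
```

A gene whose coefficient for DCN is large relative to its uncertainty would be a candidate member of the stroma expression module.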

The term "chemotherapy" generally refers to a treatment of a cancer using specific chemotherapeutic/chemical agents. A chemotherapeutic agent refers to a pharmaceutical agent generally used for treating cancer. The chemotherapeutic agents for treating cancer include, for example, cisplatin, carboplatin, etoposide, vincristine, cyclophosphamide, doxorubicin, ifosfamide, paclitaxel, gemcitabine, fluorouracil and docetaxel. More specifically, the chemotherapeutic agents of the present invention include 5-fluorouracil, epirubicin, and cyclophosphamide or combinations thereof (for example "FEC").

The term "adapting the treatment" generally refers to the choice of a treatment among different options, based on the specificities of the disease, concomitant pathologies or patient conditions, or the switch from one treatment to another in the course of the therapy because of the non-response, progression or resistance of the disease to the initial treatment, with the intent to offer to the patients the best treatment for their diseases under the given circumstances.

"Stromal tissue" as referred herein is the supportive tissue of an epithelial organ, tumor, gonad, etc., consisting of connective tissues and blood vessels.

As used herein the terms "subject" or "patient" are well-recognized in the art, and are used interchangeably herein to refer to a mammal, including dog, cat, rat, mouse, monkey, cow, horse, goat, sheep, pig, camel, and, most preferably, a human. In some embodiments, the subject is a subject in need of treatment. However, in other embodiments, the subject can be a normal subject. The term does not denote a particular age or sex. Thus, adult and newborn subjects, as well as fetuses, whether male or female, are intended to be covered. Patient or subject are used interchangeably and refer to a subject with a disease or disorder. The term patient or subject includes human and veterinary subjects.

As used herein the term "biomarker" is virtually any detectable compound, such as, but not limited to, a protein, a peptide, a carbohydrate, a lipid, or a nucleic acid (e.g., DNA, such as cDNA or amplified DNA, or RNA, such as mRNA), that is present in or derived from a biological sample. "Derived from" as used in this context refers to a compound that, when detected, is indicative of a particular molecule being present in the biological sample. For example, detection of a particular mRNA can be indicative of the presence of the expression of a particular gene in the biological sample. A biomarker can, for example, be isolated from the biological sample, directly measured in the biological sample, or detected in or determined to be in the biological sample. "Biological sample" can be serum, blood, peripheral blood cells, plasma, saliva, amniotic fluid, synovial fluid, lacrimal fluid, milk, lymph and tissue. The tissue is usually a biopsy or surgical specimen taken at tumor removal.

As used herein the term "predicting the efficacy" means to assess the reaction of a cancer to treatment with chemotherapy, i.e. to assess the ability of a cancer to respond favourably to, or to resist, the chemotherapy.

The standard approach to identify a biomarker is to randomly split the dataset into two parts: a learning and a validation set. The learning set is used to identify genes differentially expressed in the two phenotypes using a variable selection method, for example a t-test. Using this methodological framework, biomarkers for pathological complete response (pCR) to FEC neoadjuvant therapy were constructed and tested in three-fold cross-validation. However, statistical significance was never observed. The Applicants noticed that the selected genes varied considerably depending on which tumours were present in the learning set. This gene selection instability may account for the failure to identify a discriminatory signature in this case 16 . To circumvent this difficulty, the Applicants developed a new strategy that aimed to explicitly test the association of the most prominent cancer specific gene clusters with the phenotype of interest.

Figure 1 illustrates the strategy used to construct metagenes associated with biological processes prominent in cancer gene expression data. The Applicants 1? identified nine major clusters of co-expressed genes. These clusters are related to epithelial tumour types (luminal, basal and molecular apocrine), cell physiology (proliferation, hypoxia and interferon signalling) and the tumour microenvironment (T and B cells, adipocytes and stroma) and have been described by others 18-21 (Figure 1 step 1). The Applicants aimed to identify groups of genes comprised in these nine clusters using an automated and non-biased procedure. This was achieved by identifying, a priori, a single "representative-gene" (prototype gene) that is typical of each cluster (Figure 1 step 2). These representative-genes (prototype genes) are included as the explanatory variables of a multi-linear regression model in order to identify other genes that are most strongly associated with them (see Example 1). The linear model was not fitted to the Applicants' data but to a large external dataset (NKI-EMC; see Example 1) comprising 583 tumours hybridized to two different types of microarray (Figure 1 step 3). Importantly, no information regarding response to therapy for the NKI-EMC patients was used in this process. The use of a large external dataset brings the advantages of a higher statistical power to correctly identify the most strongly correlated genes and a lower chance of observing a spurious correlation specific to a single study. For each representative-gene, the 50 genes most strongly associated with it were identified and mapped to the EORTC dataset. For each sample of the EORTC dataset, the expression values of these 50 genes were averaged to generate a single summary value per biological process. The Applicants refer to this value as a "metagene" (Figure 1 step 4). As a result of this procedure the information contained in the initial nine clusters of genes was condensed into 9 metagenes.
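The metagene construction described above can be sketched as follows. This is a minimal illustration and not the Applicants' implementation: the function names, the use of ordinary least squares on standardised expression values, and the ranking of genes by their regression coefficient are all assumptions made for the example (the actual model is described in Example 1). Under these assumptions the prototype gene itself ranks first in its own list.

```python
import numpy as np

def prototype_associations(expr, prototype_rows):
    """Regress every gene on the prototype genes (multi-linear model).

    expr: genes x samples expression matrix of the external dataset.
    prototype_rows: row indices of the representative (prototype) genes.
    Returns a genes x prototypes matrix of regression coefficients.
    """
    # Standardise each gene so that coefficients are comparable across genes.
    z = (expr - expr.mean(axis=1, keepdims=True)) / expr.std(axis=1, keepdims=True)
    design = z[prototype_rows].T                          # samples x prototypes
    coef, *_ = np.linalg.lstsq(design, z.T, rcond=None)   # prototypes x genes
    return coef.T

def top_genes(coef, prototype_col, k=50):
    """Row indices of the k genes most strongly associated with one prototype."""
    return np.argsort(-coef[:, prototype_col])[:k]

def metagene(expr, gene_rows):
    """Average the selected genes within each sample (one value per sample)."""
    return expr[gene_rows].mean(axis=0)
```

In this sketch the gene lists would be derived from the external dataset only, and `metagene` would then be applied to the study dataset after mapping the selected genes across platforms.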

The ability of the linear model to single out functionally related genes was verified. For each of the studied processes, the Applicants identified a corresponding entry in the Molecular Signatures Database (MSigDB), a database of manually curated gene sets 22. For each test, all genes of the NKI-EMC matrix were ordered and weighted according to their expression similarity to the representative-gene. In each case, gene set enrichment analysis (GSEA) revealed a significant association (see Figure 2). The Applicants conclude that each metagene is, as expected, a measure of the activity of the intended biological process at the gene expression level.

The nine metagenes were tested for their ability to predict pathological complete response by using the area under the receiver operating characteristic curve (AUC) as a measure of association (Figure 1 step 5). This statistic was chosen as it is not dependent on any decision threshold. An AUC value significantly different from random association (AUC = 0.5) was observed for the interferon signalling and stromal metagenes (Table 1A). These two signatures were then tested in an independent cohort of ER negative tumours (referred to here as the "MDA" dataset) included in a recent study of response to T-FAC chemotherapy 10 (Figure 1 step 6). The stromal metagene was significantly associated with response in this independent dataset (AUC = 0.70; p < 0.01), whereas the interferon metagene was not (Table 1B). Figure 3 shows that the stromal genes have a coherent expression pattern in both the EORTC and the MDA studies. The mean pairwise correlation of stroma genes in the EORTC dataset was 0.55 (p < 0.0001). High stromal gene expression is associated with resistance to chemotherapy (i.e. absence of a pCR) (Figure 3c and 3d). A similar range of expression (about 6-fold in absolute change) was observed in the two datasets, meaning that the stromal metagene reveals strong differences between these tumours. The datasets were sufficiently similar that a logistic regression model trained on the EORTC data could be successfully applied, with identical model coefficients and decision threshold, to the MDA data. A significant odds ratio of 3.41 (p < 0.05) was observed using a decision threshold set at 0.38 (Table 1C). The decision threshold was determined by maximizing the Youden index (specificity + sensitivity - 1) on the EORTC data only. The impact of varying the decision threshold on PPV, NPV, sensitivity and specificity statistics is presented in Figure 4. The positive likelihood ratio, a good measure of the added information, is equal to 1.7 and 1.6, respectively, for the EORTC and MDA datasets. The maximum observed PPVs and NPVs are respectively 77% and 86% in the EORTC data and 76% and 89% in the MDA data, indicating that the current classifier is better at predicting resistance than sensitivity. A likely reason is that multiple different mechanisms can confer resistance to anthracycline-based chemotherapy. See also Figures 9 and 10.
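The two statistics used above, the threshold-free AUC and the Youden-index decision threshold, can be illustrated with a short sketch. The function names and the tie handling are assumptions made for this example; it is not the code used to produce Table 1.

```python
import numpy as np

def auc(scores, labels):
    """Probability that a random positive scores higher than a random negative.

    Ties count one half. This equals the area under the ROC curve.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]          # all positive/negative pairs
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))

def youden_threshold(probs, labels):
    """Threshold maximising J = sensitivity + specificity - 1.

    A sample is called positive when its predicted probability is >= threshold.
    Returns (threshold, J); the lowest threshold wins when several tie.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best_t, best_j = None, -np.inf
    for t in np.unique(probs):                  # candidate thresholds, ascending
        pred = probs >= t
        sens = (pred & labels).sum() / labels.sum()
        spec = (~pred & ~labels).sum() / (~labels).sum()
        j = sens + spec - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j
```

To mirror the procedure in the text, the threshold would be chosen on the training dataset only and then reused unchanged on the independent dataset.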

Table 1. Prediction of pCR using metagene signatures. The 95% confidence intervals were calculated from the AUC distribution over 1000 cycles of bootstrapping. (A) Prediction of pCR using metagene signatures in the EORTC dataset. P-values were determined empirically and adjusted for False Discovery Rate. (B) Validation of the interferon and stroma metagenes in the MDA study. (C) Prediction of pCR using the stromal signature, with a logistic regression model trained on the EORTC data and applied to the MDA dataset. The accuracy (ACC), sensitivity (SENS), specificity (SPEC), positive (PPV) and negative (NPV) predictive values and odds ratio (OR) were determined at a probability threshold of 0.38 defined by maximizing the Youden index on the EORTC dataset. The P-value of a Fisher's exact test is reported.
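A bootstrap confidence interval of the kind mentioned in the caption could be computed along the following lines. This is a sketch only: the percentile method, the resampling of whole samples with replacement, and the skipping of resamples that contain a single class are assumptions, since the caption does not specify these details.

```python
import numpy as np

def auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))

def bootstrap_auc_ci(scores, labels, n_boot=1000, level=0.95, seed=0):
    """Percentile confidence interval of the AUC over n_boot resamples.

    Assumes both classes are present in the input data.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    rng = np.random.default_rng(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = rng.integers(0, n, size=n)        # sample with replacement
        if labels[idx].all() or not labels[idx].any():
            continue                            # AUC undefined with one class
        stats.append(auc(scores[idx], labels[idx]))
    lo, hi = np.percentile(stats, [100 * (1 - level) / 2, 100 * (1 + level) / 2])
    return lo, hi
```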

Table 1A. EORTC

| Biological Process | Representative Gene (HUGO) | Name | AUC [95% CI] | P-Value (FDR) |
| --- | --- | --- | --- | --- |
| Luminal-Basal | ESR1 | Estrogen receptor alpha | 0.53 [0.37-0.68] | 0.40 |
| Apocrine | CLCA2 | Chloride-Associated Calcium Channel 2 | 0.46 [0.32-0.59] | 0.70 |
| Stroma | DCN | Decorin | 0.68 [0.54-0.80] | 0.03 |
| T Cell | GZMA | Granzyme A | 0.62 [0.48-0.75] | 0.14 |
| B Cell | CD83 | CD83 Antigen | 0.58 [0.44-0.71] | 0.23 |
| Adipocyte | FABP4 | Fatty Acid Binding Protein 4 | 0.54 [0.38-0.68] | 0.40 |
| Proliferation | TPX2 | TPX2 microtubule-associated homologue | 0.55 [0.39-0.69] | 0.40 |
| Interferon | MX1 | Myxovirus resistance gene 1 | 0.72 [0.59-0.84] | < 0.01 |
| Hypoxia | ADM | Adrenomedullin | 0.59 [0.44-0.73] | 0.23 |

Table 1B. MDA

| Biological Process | Representative Gene (HUGO) | Name | AUC [95% CI] | P-Value |
| --- | --- | --- | --- | --- |
| Stroma | DCN | Decorin | 0.70 [0.52-0.85] | 0.01 |
| Interferon | MX1 | Myxovirus resistance gene 1 | 0.50 [0.33-0.67] | 0.60 |

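Table 1C.

| Dataset | ACC | PPV | SENS | NPV | SPEC | OR | P-Value |
| --- | --- | --- | --- | --- | --- | --- | --- |
| EORTC | 0.65 | 0.57 | 0.86 | 0.81 | 0.49 | 5.51 | 0.01 |
| MDA | 0.65 | 0.64 | 0.78 | 0.67 | 0.50 | 3.41 | 0.05 |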
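As a consistency check, the positive likelihood ratios quoted in the text can be recovered from the sensitivity and specificity values of Table 1C, assuming the standard definition LR+ = sensitivity / (1 - specificity). The Youden index at the chosen threshold follows from the same EORTC values. Because the table entries are rounded, the results are approximate.

$$
\mathrm{LR}^{+}_{\mathrm{EORTC}} = \frac{0.86}{1 - 0.49} \approx 1.7, \qquad
\mathrm{LR}^{+}_{\mathrm{MDA}} = \frac{0.78}{1 - 0.50} \approx 1.6, \qquad
J_{\mathrm{EORTC}} = 0.86 + 0.49 - 1 = 0.35
$$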

The stromal metagene and the clinical variables histological grade, node status, tumour size and ERBB2 status were each tested for association with pCR by logistic regression (Table 2). The stromal metagene was the only significant variable in the univariate model in both datasets. The multivariate model showed that the stromal metagene is an independent predictive factor in both the EORTC and the MDA datasets. Thus, the stromal metagene does not merely capture biological information already measured by the other clinical variables.

It was shown that ER negative tumours comprise at least two different molecular subtypes (basal and ERBB2 or molecular apocrine) 17, 19. The ERBB2-molecular apocrine class was too small to test, but a significant association of the stromal metagene with response was observed in the basal molecular type in both the EORTC dataset (AUC = 0.69 [0.51-0.84]; p < 0.02; N = 39) and the MDA dataset (AUC = 0.73 [0.54-0.92]; p < 0.01; N = 27). This result shows that the stromal metagene signature is not confounded with the different tumour molecular classes found within ER-negative breast cancers.

Table 2. Multivariate analysis of predictive factors for Pathological Complete Response (pCR). Univariate and multivariate logistic regression models for the Stroma signature and other clinical variables.

Table 2

EORTC

| Variable | Univariate Coef | Univariate P-Value | Multivariate Coef | Multivariate P-Value |
| --- | --- | --- | --- | --- |
| Clinical Node (cN0 vs cN1 & cN2) | -0.099 | 0.862 | 0.014 | 0.982 |
| Clinical Size (cT2 & cT2 vs cT3) | -0.076 | 0.895 | -0.222 | 0.736 |
| Grade (Grade 1 & 2 vs 3) | 1.030 | 0.080 | 0.923 | 0.141 |
| ERBB2 (Low vs High) | -0.588 | 0.362 | 0.114 | 0.882 |
| DCN Metagene (Low vs High) | -1.658 | 0.022 | 1.673 | 0.036 |

MDA

| Variable | Univariate Coef | Univariate P-Value | Multivariate Coef | Multivariate P-Value |
| --- | --- | --- | --- | --- |
| Clinical Node (cN0 vs cN1 & cN2) | 0.789 | 0.231 | 1.844 | 0.044 |
| Clinical Size (cT2 & cT2 vs cT3) | -0.357 | 0.540 | -1.698 | 0.040 |
| Grade (Grade 1 & 2 vs 3) | 1.191 | 0.181 | 1.009 | 0.333 |
| ERBB2 (Neg vs Pos) | 0.577 | 0.336 | 0.489 | 0.477 |
| DCN Metagene (Low vs High) | 1.217 | 0.043 | 1.605 | 0.039 |

The Applicants' results show that the predictive value of the stromal metagene is independent of other clinical variables and this signature remains predictive when tested with tumours restricted to the basal molecular-type.

Using the output of the multiple regression model, all 10317 genes of the NKI-EMC datasets were ranked according to the strength of their association with the stromal representative gene decorin (DCN). Rather than using the top 50 genes, as done previously, the Applicants constructed a series of metagenes from non-overlapping groups of 15 genes (see Figure 5). Nine of the first 12 metagenes, involving a total of 180 genes, gave an AUC significantly greater than 0.5, with a maximum of 0.70 in the EORTC dataset. The Applicants found that the predictive value of the stromal metagene is not limited to the top 50 genes but can be broadened to other genes of the same biological process (see Table 4).
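The grouping step described above amounts to cutting the ranked gene list into consecutive blocks. A minimal sketch follows, in which the function name and the choice to stop at the first incomplete block are assumptions made for illustration:

```python
def consecutive_groups(ranked_genes, group_size=15, n_groups=12):
    """Split a ranked gene list into consecutive, non-overlapping groups."""
    groups = []
    for g in range(n_groups):
        chunk = ranked_genes[g * group_size:(g + 1) * group_size]
        if len(chunk) < group_size:   # not enough genes left for a full group
            break
        groups.append(chunk)
    return groups
```

With the default arguments, the first 12 groups cover the 180 top-ranked genes; each group would then be averaged per sample to give one metagene.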

One possible explanation is that the stromal metagene detects intrinsically more aggressive tumours that would be more likely to resist chemotherapy (i.e. npCR). To verify this possibility, the impact of the stromal metagene on relapse-free survival was investigated in three cohorts of patients, one treated with chemotherapy, the two others not. As a positive control, a proliferation metagene known to be associated with high grade and poor survival 20 was used. Patients in the NKI and EMC studies who did not receive either chemotherapy or hormonal therapy were taken as the reference untreated population. Patients were split into two equally sized groups based on the value of their metagenes. The proliferation metagene divided patients into two groups with significantly different survival in both datasets (Figure 6a and 6b), while the stromal metagene did not (Figure 6d and 6e). In the cohort of breast cancer patients that received adjuvant chemotherapy 23, the group of patients having the higher stromal metagene signature showed significantly shorter relapse-free survival (Figure 6f). This observation shows that the stromal metagene is a predictive rather than a prognostic signature and reinforces the hypothesis that the main association is between high stromal content and resistance to chemotherapy.

The Applicants have tested the ability of other published stroma-related gene lists to predict response to chemotherapy. West et al 24 compared two neoplastic conditions with fibroblastic features: desmoid-type fibromatosis (DTF) and solitary fibrous tumours (SFT). The former arise in patients with mutations in the adenomatous polyposis coli gene that lead to activation of Wnt signalling in these tumours. The set of genes up-regulated in DTF is enriched in genes involved in the fibrotic response, including genes for matrix remodelling, whereas the set up-regulated in SFT includes more basement membrane genes and lacks remodelling genes 24.

The DTF but not the SFT signature predicts response in the EORTC and MDA datasets (see Table 3). The Applicants' stromal metagene is closely related to the DTF metagene but not to the SFT metagene, with correlation coefficients respectively of 0.91 and 0.41 in the EORTC dataset (Figure 7 a and b). GSEA confirmed a significant relationship (p < 0.01) between the stromal metagene and a published list of Wnt target genes (Figure 7c). This shows that it is not just a difference in the amount of fibroblasts but a difference in the type of stroma that is associated with response.

Table 3. Prediction of pCR using stroma related gene list from the literature. The 95% confidence intervals were calculated from the AUC distribution over 1000 cycles of bootstrapping.

Table 3

| Name | Nb Genes | Reference | EORTC AUC [95% CI] | MDA AUC [95% CI] |
| --- | --- | --- | --- | --- |
| DTF | 415 | West et al. 2005 | 0.66 [0.51-0.79] | 0.67 [0.51-0.81] |
| SFT | 194 | West et al. 2005 | 0.52 [0.38-0.66] | 0.53 [0.37-0.69] |
| LDM-Stroma | 143 | Finale et al. 2006 | 0.53 [0.37-0.68] | 0.62 [0.45-0.76] |

Further evidence linking the stromal metagene with the stromal tissue in tumours was obtained from gene expression profiles of samples in an ongoing study of colon carcinoma. Well-defined tumour epithelial tissue and stromal tissue were isolated from tumours of three patients by laser dissection microscopy (LDM). Cancer-associated fibroblasts (CAFs) were also isolated. mRNA from the LDM material and CAFs was hybridised to microarrays. To confirm that the microarrayed samples showed the expected expression patterns, genes known to be expressed selectively in normal mammary epithelial cells (Figure 7d, NBE) and normal mammary stroma (Figure 7d, NBS) were identified from the literature 25. Figure 7d shows the "epithelial : stromal" expression ratio of these genes in the microarrayed samples. A negative log ratio indicates that genes are more strongly expressed in stromal than in epithelial tissue. Normal mammary epithelial and stromal genes show positive and negative mean log ratio values of 0.95 and -2.41, respectively. Almost all of the 50 genes included in the stromal metagene were more highly represented in microdissected stromal than epithelial tissue (mean log2 difference -4.68; p < 0.001). The DTF gene list shows a similar pattern. To further confirm that genes included in the stromal metagene are expressed by fibroblasts, rather than some other cellular component of the tumour stroma, the analysis was repeated with purified CAFs in place of microdissected stroma (see Figure 8). The stromal metagene is shown to be significantly associated with the CAF fraction, with a mean log2 difference of -4.12 (p < 0.001). The Applicants found that the stromal metagene does indeed measure the amount of activated stroma in the tumour.
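The sign convention of the "epithelial : stromal" log ratio can be made concrete with a small sketch. It assumes one linear-scale expression value per gene and per compartment. How the Applicants combined the three patients, and which test produced the reported p-values, is not stated here, so the function below is purely illustrative.

```python
import numpy as np

def mean_log2_ratio(epithelial, stromal):
    """Mean over genes of log2(epithelial / stromal).

    Negative values mean the genes are, on average, more strongly
    expressed in the stromal compartment.
    """
    epithelial = np.asarray(epithelial, dtype=float)
    stromal = np.asarray(stromal, dtype=float)
    return float(np.mean(np.log2(epithelial) - np.log2(stromal)))
```

For instance, two genes with epithelial values 1 and 2 and stromal values 4 and 8 are each four-fold higher in stroma, giving a mean log2 ratio of -2.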

Thus the Applicants have identified stromal gene signatures that predict poor pathological response to anthracycline-based neo-adjuvant chemotherapy in two independent datasets. These signatures were shown to be a reflection of the activation state of the tumour stroma. Preferably the anthracycline-based neo-adjuvant chemotherapy is the neoadjuvant therapy with both FEC and T-FAC. The Applicants identified a stromal gene signature that influences the response of cancers to anthracycline-based neo-adjuvant chemotherapy. Preferably, the cancers are estrogen receptor (ER) and progesterone receptor negative breast cancers.

The Applicants have identified several specific combinations of stromal genes (Table 4), which are part of the stromal metagene and which are biomarkers for chemosensitivity of cancer subjects to anthracycline-based neo-adjuvant chemotherapy. Quantified mRNA levels for genes included in the expression signature within tumour biopsies are associated with the sensitivity of the cancer to chemotherapy. More precisely, high mRNA levels for genes of the expression signature are indicative of non pathological complete response (resistance) to anthracycline-based neo-adjuvant chemotherapy, while low mRNA levels for these genes are associated with pathological complete response (sensitivity) to this treatment.

Table 4 Genes associated with the tumour sensitivity to chemotherapy. The list contains official gene symbol defined by the Human Genome Organisation (HUGO; hUp://ww w .hugo- bi a 'lignc ^ search.pi) :

Moreover, this expression signature predicts resistance to each of the individual drugs given in the FEC chemotherapy regimen (i.e. Cyclophosphamide, Adriamycin/Doxorubicin and Fluorouracil). The gene signature predicts response to chemotherapy regimens composed of drugs having a similar mechanism of action, such as other pyrimidine analogues (e.g. Capecitabine, Cytarabine, Floxuridine, Gemcitabine), purine analogues (Cladribine, Clofarabine, Fludarabine, Mercaptopurine, Pentostatin, Thioguanine), cytotoxic/antitumor antibiotics of the anthracycline family (Daunorubicin, Doxorubicin, Epirubicin, Idarubicin, Mitoxantrone, Valrubicin) or nitrogen mustard agents (Chlorambucil, Chlormethine, Cyclophosphamide, Ifosfamide, Melphalan).

The present invention further provides that mRNA levels of genes included in the expression signatures were not associated with pathological or clinical response in subjects treated with the Epirubicin / Taxotere (ET) chemotherapy regimen. Thus the expression signature will not be associated with pathological or clinical response to drugs having a similar mechanism of action, such as spindle poisons/mitotic inhibitors of the taxane family (Docetaxel, Paclitaxel). Therefore, the stromal metagene signature could also serve as a treatment indicator. Subjects having a high stromal metagene signature should be treated with taxane-based chemotherapy.

The stromal metagene was shown to be associated with relapse-free survival in a third independent study performed in an adjuvant setting. That the signature is not merely detecting a difference in the innate aggressiveness of cancers is confirmed since the stromal metagene signature was not associated with differences in relapse-free survival in the patients in the NKI-EMC datasets who had not been given adjuvant systemic therapy. Thus the stromal metagene signature only predicts response to therapy, meaning it is predictive rather than prognostic.

The mRNA levels of genes included in the gene signatures are also associated with a well-defined physiological activation state of the tissue surrounding the tumour, termed "stroma activation". The present invention can also be used in any clinical situation where the state of "stroma activation" needs to be determined.

The Applicants' results also show that the stroma within the tumour needs to be in an activated state in order to confer resistance to anthracycline-based neo-adjuvant chemotherapy. Indeed neither the normal stromal expression profile nor the SFT signature was able to predict FEC response. Interestingly, the two signatures that produced a significant prediction (stroma and DTF) comprised genes that are more strongly associated with CAFs purified from colon carcinomas (Figure 7). Technical limitations prevented the isolation of stromal tissue by laser dissection microscopy (LDM) from the breast tumour samples used for the microarray study. However, genes shown, by LDM, to be specifically expressed in normal breast stroma and epithelial compartments 25 were equally associated with their respective tumour compartments in colon carcinomas. Furthermore, the stromal gene signatures of the present invention were strongly associated with the stromal compartment of the colon carcinomas, suggesting that the phenotype modelled by the signature is specific to the stromal tissue regardless of the studied organ.

A precedent, and possible mechanism, for a direct role for stroma as a resistance factor is provided by studies showing that β4-integrin activation renders mammary epithelial cells resistant to apoptosis induction by a wide range of different treatments 26. Hyaluronic acid was found to promote doxorubicin resistance in the MCF7 human mammary carcinoma cell line 2?. Adhesion of multiple myeloma cells to fibronectin was also shown to provide a survival advantage in the presence of doxorubicin 14. The same group proved that adhesion to fibronectin by means of β1-integrins protects U937 human monocytic leukaemia cells from doxorubicin-induced DNA damage by reducing the activity and modifying the subcellular distribution of topoisomerase II 28. Growth factors linked to the stromal compartment could also contribute to drug resistance. Fibroblast growth factors were shown to induce paclitaxel, doxorubicin and 5-fluorouracil resistance in human prostate PC3 tumour cells and rat MAT-LyLu tumour cells in vitro 29, while suramin, an inhibitor of FGF and other growth factors, sensitized these tumour cells to doxorubicin in both in vitro and in vivo experimental models. Interestingly, despite including TOP2A, the target of epirubicin, the proliferation metagene was not associated with response. This signature was shown to be functional as it successfully identified patients having the worst prognosis in both the NKI and EMC datasets (Figure 6). The proliferation metagene was not associated with response in the MDA dataset (AUC = 0.63 [0.39-0.83]). However, a significant AUC (0.68 [0.58-0.78]) was observed when all ER positive tumours were included in the MDA dataset. Therefore, proliferation is a prognostic but not a predictive signature. From these results, the Applicants discovered that the low response rate to chemotherapy observed in ER positive tumours is not explained by their lower proliferation relative to ER negative tumours but is rather due to differences intrinsic to their epithelial biology.

Compared to traditional supervised analysis, building a predictor from expression modules has several advantages. First, testing groups of co-expressed genes related to a defined biological process facilitates the interpretation of the results. Second, random effects in single measurements are tamed by averaging over redundant measurements (because each metagene contains many similarly expressed genes), reducing the problem of chance associations between one gene and the response. Over a wide range, the choice of genes among the stromal module used to produce the metagene did not affect the result (Figure 5). As a practical consequence, a customised diagnostic kit would not have to include all the genes in the expression module, and the genes themselves can be flexibly chosen. Third, the approach reduces the number of variables that are tested, which decreases the probability of observing a significant association by chance. Fourth, performing the multiple regressions on data from two gene expression studies that used different microarray platforms reduces the probability of observing spurious associations. Fifth, the use of metagenes eases cross-platform mapping because many genes are available within each metagene, and the loss of a few genes after mapping does not impair the predictive power of the metagene.

The present invention provides for a method for predicting the efficacy of cancer therapy in a subject who has undergone or is undergoing chemotherapy treatment for cancer, characterized in that said method comprises

(a) obtaining a stromal tissue sample from a subject,

(b) determining in said stromal tissue sample the expression values of at least two stromal genes and of at least two reference genes,

(c) defining stromal content (SC) from the expression values of step (b),

(d) comparing the stromal content (SC) with a reference threshold,

(e) predicting resistance to chemotherapy of said subject based on the step (d), wherein high stromal content is indicative of resistance to chemotherapy, while low stromal content is indicative of sensitivity to chemotherapy,

(f) adapting the treatment of said subject

Preferably said cancer is selected from the group comprising breast cancer, colon cancer, lung cancer, colorectal cancer, head and neck cancer, or ovarian cancer.

Preferably said chemotherapy is anthracycline-based neo-adjuvant chemotherapy and most preferably said chemotherapy is selected from the group comprising fluorouracil, epirubicin and cyclophosphamide based chemotherapy or a combination thereof (two or three compounds together), such as for example "FEC" chemotherapy.

The stromal tissue sample is taken from a tumour biopsy. "Sample" or "tissue sample" refers to a collection of similar cells obtained from a tissue of a subject or patient. The source of the tissue sample may be solid tissue as from a fresh, frozen and/or preserved organ or tissue sample or biopsy or aspirate; blood or any blood constituents; bodily fluids such as cerebral spinal fluid, amniotic fluid, peritoneal fluid, or interstitial fluid; or cells from any time in gestation or development of the subject. The tissue sample may contain compounds which are not naturally intermixed with the tissue in nature such as preservatives, anticoagulants, buffers, fixatives, nutrients, antibiotics, or the like. In one aspect of the invention, tissue samples or patient samples are fixed, particularly conventional formalin-fixed paraffin- embedded samples. Such samples are typically used in an assay for receptor complexes in the form of thin sections, e.g. 3-10 μm thick, of fixed tissue mounted on a microscope slide, or equivalent surface. Such samples also typically undergo a conventional re-hydration procedure, and optionally, an antigen retrieval procedure as a part of, or preliminary to, assay measurements.

Biopsy refers to the removal of a sample of tissue for purposes of diagnosis. For example, a biopsy is from a cancer or tumour, including a sample of tissue from an abnormal area or an entire tumour. A non-limiting list of different types of cancers include lymphoma, B cell lymphoma, T cell lymphoma, mycosis fungoides, Hodgkin's Disease, myeloid leukemia, bladder cancer, brain cancer, nervous system cancer, head and neck cancer, kidney cancer, lung cancers such as small cell lung cancer and non-small cell lung cancer, brain cancers such as neuroblastoma and glioblastoma, ovarian cancer, pancreatic cancer, prostate cancer, liver cancer, melanoma, squamous cell carcinomas, cervical carcinoma, breast cancer, renal cancer, genitourinary cancer, esophageal carcinoma, hematopoietic cancers, testicular cancer, or colon and rectal cancers.

The stromal genes of the present invention are selected from the group consisting of the genes of Table 4. Preferably said stromal genes are DCN, CSPG2, CDH11, ASPN, SPARC, ITGBL1, PLAU, COL1A2, SNAI2, POSTN and THBS2.

The reference genes are selected from the group consisting of the genes of Table 5. Preferably said reference genes are GAPDH, ACTB, TCF2, ZNF333, ADH6, FOXH1, TPX2 1, CENPA, BIRC5, TOP2A.

In another embodiment of the present invention, said reference genes are genes selected from the group consisting of the genes of Table 4 or Table 5 determined within stromal tissue of non-tumor reference biopsy.

Preferably the expression values of at least 5 stromal genes and of at least 5 reference genes are determined in the stromal tissue sample. Most preferably the expression values of at least 10 stromal genes and of at least 10 reference genes are determined in the stromal tissue sample.

Said subject is preferably a mammal and most preferably a human.

Table 5. Reference genes not associated with the tumour sensitivity to chemotherapy.

The list contains official gene symbols defined by the Human Genome Organisation (HUGO; http://www.hugo-international.org/) and gene bank accession numbers (http://www.genenames.org/cgi-bin/hgnc_search.pl):

The stromal content (SC) can be determined by any technique or calculation method known to the person skilled in the art. For example the stromal content (SC) can be defined as follows:

$$
\mathrm{SC} = \log_2\!\left[\frac{\mathrm{CTE}_1 + \text{Stromal Metagene Score}}{\mathrm{CTE}_2 + \text{Reference Metagene Score}}\right] + \mathrm{CTE}_3
$$

The stromal metagene score (equivalent to stromal metagene signature) is a weighted average of the expression values of the stromal genes of Table 4 measured within the tumor biopsy:

$$
\text{Stromal Metagene Score} = \frac{1}{n}\sum_{i=1}^{n}\left(\text{Stromal\_Gene}_i \cdot ks_i\right)
$$

wherein

- n can be any value within the range of 2 (inclusive) and 300 (inclusive),

- "Stromal_Gene_i" represents the expression value of each selected stromal gene,

- ks_i is specific for each stromal gene and defines the importance of the corresponding stromal gene in the calculation of the weighted average of the Stromal Metagene Score. The variable ks_i may take any positive real value within the range of zero (inclusive) and 1000 times the maximal expression value of the stromal gene included in the calculation of the Stromal Metagene Score.

Preferably the expression value of at least 5 stromal genes is used to calculate the Stromal Metagene Score. Most preferably the expression value of at least 10 stromal genes is used to calculate the Stromal Metagene Score.

The purpose of the variable ks_i is to adjust (or correct) for the difference in expression magnitude between stromal genes, making the expression value of each gene more similar to those of all other stromal genes included in the calculation of the Stromal Metagene Score.

The variable CTE1 may take any real value within the range of plus / minus 1000 times the average of the stromal metagene score. The purpose of the CTE1 variable is to adjust for differences in efficiency in extracting the mRNA of stromal genes from the tumor sample relative to the reference genes.

The reference metagene score (equivalent to reference metagene signature) is a weighted average of the expression values of the reference genes of Table 5 measured within the tumor biopsy, or of the reference genes of Table 5 and/or the stromal genes of Table 4 measured within stromal tissue of a non-tumour reference biopsy (normal non-pathological biopsy):

$$
\text{Reference Metagene Score} = \frac{1}{n}\sum_{i=1}^{n}\left(\text{Reference\_Gene}_i \cdot kr_i\right)
$$

wherein

- n can be any value within the range of 2 (inclusive) and 237 (inclusive),

- "Reference_Gene_i" represents the expression value of each selected reference gene,

- kr_i is specific to each reference gene and defines the importance of the corresponding reference gene in the calculation of the weighted average of the Reference Metagene Score. The variable kr_i may take any positive real value within the range of zero (inclusive) and 1000 times the maximal expression value of the reference gene included in the calculation of the Reference Metagene Score.

Preferably the expression value of at least 5 reference genes is used to calculate the Reference Metagene Score. Most preferably the expression value of at least 10 reference genes is used to calculate the Reference Metagene Score.

The purpose of the variable kr_i is to adjust (or correct) for the difference in expression magnitude between reference genes, making the expression value of each gene more similar to those of all other reference genes included in the calculation of the Reference Metagene Score.

The variable CTE2 may take any real value within the range of plus / minus 1000 times the average of the reference metagene score. The purpose of the CTE2 variable is to adjust for differences in efficiency in extracting the mRNA of reference genes from the tumor sample relative to the stromal genes.

The purpose of the variable CTE3 is to adjust for systematic bias due to experimental measurements.

A tumor sample is considered as having high stromal content if the score SC is greater than the threshold TH1 (i.e. SC > TH1), which is indicative of resistance to chemotherapy.

A tumor sample is considered as having low stromal content if the score SC is lower than the threshold TH2 (i.e. SC < TH2), which is indicative of sensitivity to chemotherapy.

The variables TH1 and TH2 can take any real value between -50 and +50, depending on the selected stromal genes, reference genes and the method used to determine the expression values of said genes. The purpose of the TH1 constant is to adjust for the desired sensitivity and specificity in declaring a tumour sample as having high stromal content. As the threshold TH1 increases, there will be an increase in the true positive rate when classifying a tumour sample as having high stroma. The purpose of the TH2 constant is to adjust for the desired sensitivity and specificity in declaring a tumour sample as having low stromal content. As the value of TH2 decreases, the true positive rate of classifying a sample as having low stromal content becomes higher. The use of both constants brings the advantage of controlling the specificity and selectivity for both low and high stromal content samples, thus leaving a security margin for samples having "dubious" (i.e. ambiguous) stromal content (see Example 2).
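The stromal content calculation and the two-threshold decision rule can be sketched as follows. This is an illustrative reading of the definitions above, not a validated implementation: the function names, the default weights of 1, the default constants of 0 and the "ambiguous" label are assumptions. The sketch also assumes linear-scale (positive) expression values and TH2 <= TH1.

```python
import math

def metagene_score(values, weights=None):
    """Weighted average (1/n) * sum(value_i * k_i) over the selected genes."""
    if weights is None:
        weights = [1.0] * len(values)           # assumption: equal importance
    if len(values) < 2 or len(values) != len(weights):
        raise ValueError("need at least two genes and one weight per gene")
    return sum(v * k for v, k in zip(values, weights)) / len(values)

def stromal_content(stromal, reference, ks=None, kr=None,
                    cte1=0.0, cte2=0.0, cte3=0.0):
    """SC = log2[(CTE1 + stromal score) / (CTE2 + reference score)] + CTE3."""
    numerator = cte1 + metagene_score(stromal, ks)
    denominator = cte2 + metagene_score(reference, kr)
    return math.log2(numerator / denominator) + cte3

def classify(sc, th1, th2):
    """Apply the two thresholds, leaving a margin between them."""
    if sc > th1:
        return "high"        # indicative of resistance to chemotherapy
    if sc < th2:
        return "low"         # indicative of sensitivity to chemotherapy
    return "ambiguous"       # falls in the security margin
```

For example, with stromal values of 8 and 8, reference values of 2 and 2, and all constants left at zero, the score is log2(8 / 2) = 2, which `classify` labels "high" for TH1 = 1 and TH2 = -1. Suitable values of the constants and thresholds depend on the genes and the measurement platform, as stated above.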

Generally, the expression values of the stromal genes and reference genes of the present invention are determined by detecting mRNA levels of said stromal genes and said reference genes. Usually the mRNA levels are detected through, but not limited to, microarray hybridization, real-time polymerase chain reaction, Northern blot, in situ hybridization, sequencing-based methods, reverse transcription-polymerase chain reaction, RNA expression microarray or RNAse protection assay.

The present invention also provides for a kit for predicting the efficacy of cancer therapy in a subject who has undergone or is undergoing chemotherapy treatment for cancer, characterized in that said kit comprises (a) a reagent for detecting mRNA levels of at least two stromal genes selected from the group consisting of the genes of Table 4 and of at least two reference genes selected from the group consisting of the genes of Table 5 in a stromal tissue sample from a subject, and (b) an instruction sheet. Said reagent comprises buffers and premeasured portions of probes that hybridize to mRNA of at least two stromal genes of Table 4 and to mRNA of at least two reference genes of Table 5. The kit of the present invention further comprises a reagent for preparing and processing a stromal tissue sample from the subject.

For all cancer patients that require chemotherapy, the kit of the present invention informs the clinician of the probable therapy outcome. Such a kit allows clinicians to prescribe the chemotherapy regimen with the highest probability of a favourable outcome, based on the result of the prediction test, leading to an overall increase in treatment efficacy.

The kit of the present invention can further include reagents for collecting a stromal tissue sample from a subject, such as by biopsy, and reagents for preparing and processing the stromal tissue. The kit can also include one or more reagents for performing a gene expression analysis, such as, but not limited to, reagents for determining mRNA expression levels in a tumor sample. Suitable techniques for the determination of mRNA expression levels include, but are not limited to, microarray hybridization, real-time polymerase chain reaction, Northern blot, in situ hybridization, sequencing-based methods, reverse transcription-polymerase chain reaction, RNA expression microarray or RNAse protection assay. For example, Northern hybridization is known in the art. See, for example, Sambrook and Russell, Molecular Cloning: A Laboratory Manual, 3rd Edition, 2001, Cold Spring Harbor Laboratory Press; and Harlow and Lane, Using Antibodies, supra. For example, probes for performing Northern blot analyses can be included in such kits. Appropriate buffers for the assays can also be included. Detection reagents required for the assay can also be included. The kits featured herein can also include an instruction sheet describing how to perform the assay for measuring gene expression.

Alternatively, the kit can include reagents for detecting protein levels, said proteins being encoded by the stromal genes of the present invention. Such analysis can be performed by, but is not limited to, Western blotting, ELISA and immunohistochemistry.

The instruction sheet can also include instructions for how to define stromal content (SC) and the thresholds, including how to determine expression levels of the reference genes of the present invention in stromal tissue from a tumor biopsy or from a non-pathological reference biopsy. The instruction sheet can also include instructions to subsequently determine the appropriate chemotherapy for the subject. Methods for determining the appropriate chemotherapy are described above and can be described in detail in the instruction sheet.

The kit can contain separate containers, dividers or compartments for the reagents and informational material. A container can be labeled for use for the determination of gene expression levels and the subsequent determination of an appropriate chemotherapy for the human. The informational material of the kits is not limited in its form. In many cases, the informational material, e.g., instructions, is provided in printed matter, e.g., a printed text, drawing, and/or photograph, e.g., a label or printed sheet. However, the informational material can also be provided in other formats, such as Braille, computer readable material, video recording, or audio recording. Of course, the informational material can also be provided in any combination of formats.

Those skilled in the art will appreciate that the invention described herein is susceptible to variations and modifications other than those specifically described. It is to be understood that the invention includes all such variations and modifications without departing from the spirit or essential characteristics thereof. The invention also includes all of the steps, features, compositions and compounds referred to or indicated in this specification, individually or collectively, and any and all combinations of any two or more of said steps or features. The present disclosure is therefore to be considered as in all aspects illustrative and not restrictive, the scope of the invention being indicated by the appended Claims, and all changes which come within the meaning and range of equivalency are intended to be embraced therein.

Various references are cited throughout this specification, each of which is incorporated herein by reference in its entirety.

The foregoing description will be more fully understood with reference to the following Examples. Such Examples are, however, exemplary of methods of practising the present invention and are not intended to limit the scope of the invention.

EXAMPLES

EXAMPLE 1

Patient selection and sample processing

This study was performed in the context of a prospective phase III intergroup trial of neoadjuvant chemotherapy (EORTC 10994/BIG 00-01). Eligible patients had no evidence of metastatic disease, and had histologically confirmed large operable invasive tumor or locally advanced breast cancer. This sub-study was restricted to all cases evaluated at the EORTC data center on April 1st, 2005 meeting the following criteria: (1) estrogen receptor negative tumors, defined as <10% of tumor cells stained positive for ER by immunohistochemistry of the pretreatment formalin-fixed biopsy; (2) patients who had completed the planned chemotherapy regimen with no major protocol violation; (3) non T4 tumors; (4) good quality and >200 ng yield of RNA available from a pretreatment frozen biopsy. Ethical approval for the clinical trial and associated translational projects was obtained in all participating institutions. Patients gave signed informed consent for both the clinical and translational studies. Patients randomized to FEC received either six cycles of FEC 100, a European non-taxane regimen consisting of 500 mg/m2 5-fluorouracil, 100 mg/m2 epirubicin and 500 mg/m2 cyclophosphamide, or six cycles of dose escalated FEC (Swedish patients) 30. At the completion of chemotherapy all patients underwent either tumorectomy or mastectomy. Pathological complete response (pCR) was used as the outcome measure, defined as disappearance of the invasive component of the primary tumor after treatment, with at most scattered tumor cells detected by the pathologist in the resection specimen. Analysis of pCR was performed locally in each centre. All patients had one incisional or two trucut biopsies frozen before starting chemotherapy. Frozen sections of these biopsies were examined centrally by one pathologist and excluded if the tumor cell content was below 20%. RNA was extracted from frozen sections as previously described and hybridized to Affymetrix X3P chips.

Laser dissection microscopy and cell culture

Samples were collected from fresh colon tumor tissue from 3 independent patients, embedded in OCT, and frozen by immersion in dry ice/ethanol. 12 μm frozen sections were cut, mounted on membrane slides and stained with hematoxylin and eosin solution. Samples were then processed using a laser dissecting microscope coupled to a CCD camera. Cancer-associated fibroblast (CAF) cultures were prepared as described in 31. Total RNA of microdissected samples and cell cultures was extracted and analyzed using HU133 Plus 2.0 chips (Affymetrix, USA) after assessing the quality by Bioanalyzer. A tissue-specific score, defined as the mean expression of the epithelial fraction (tumour epithelial cells, EPI) minus that of the Reactive Stroma (RS) or CAF fraction, was calculated for each gene of the tested list. Negative values imply that the genes are preferentially expressed in RS (or CAF) compared to EPI. Significance was measured empirically by estimating the probability of the score (i.e. its departure from zero) by randomly selecting the same number of genes and measuring their average score (1000 randomizations were performed).
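
The tissue-specific score and its randomization test may be sketched, purely by way of example, as follows. The function names, the toy expression values and the two-sided comparison against the observed mean are assumptions of this sketch and are not taken from the study; only the score definition (mean EPI minus mean RS or CAF) and the number of randomizations (1000) follow the text.

```python
import random
from statistics import mean


def tissue_scores(epi, stroma):
    """Per-gene score: mean EPI expression minus mean stromal (RS or CAF) expression."""
    return {gene: mean(epi[gene]) - mean(stroma[gene]) for gene in epi}


def empirical_p(scores, gene_list, n_random=1000, seed=0):
    """Average score of gene_list and the fraction of random gene sets of the
    same size whose average score departs from zero at least as much."""
    rng = random.Random(seed)
    observed = mean(scores[g] for g in gene_list)
    all_genes = sorted(scores)
    hits = 0
    for _ in range(n_random):
        sample = rng.sample(all_genes, len(gene_list))
        if abs(mean(scores[g] for g in sample)) >= abs(observed):
            hits += 1
    return observed, hits / n_random


# Toy data (hypothetical): 20 genes, three EPI and three stromal samples each.
epi = {f"g{i}": [7.0, 7.2, 6.8] for i in range(20)}
stroma = {f"g{i}": [7.0, 7.2, 6.8] for i in range(20)}
for g in ("g0", "g1", "g2", "g3"):
    stroma[g] = [11.0, 11.2, 10.8]  # genes enriched in the stromal fraction

scores = tissue_scores(epi, stroma)
observed, p = empirical_p(scores, ["g0", "g1", "g2", "g3"])
print(observed, p)
```

A negative observed value with a small empirical probability corresponds to a gene list that is preferentially expressed in the stromal fraction.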

Microarray data analysis

MIAME-compliant data were deposited in the Gene Expression Omnibus database (www.ncbi.nlm.nih.gov/geo) under accession number GSE4779. Raw data were processed with the statistical programming language R (cran.r-project.org) and Bioconductor packages (www.bioconductor.org). Gene expression was normalized with the rma package and transformed to a log2 scale. Four exclusion criteria were applied to all probesets: (1) a consensus sequence shorter than 56 nucleotides; (2) no annotation to a defined Entrez-gene id; (3) a standard deviation below 0.5 across all experiments; and (4) in case of multiple probesets representing a single Entrez-gene id, only the most variable was considered. Cross-platform mapping was performed by taking the Entrez-gene id as primary key. Two external publicly available data sets, the van de Vijver 32 (n=295, Agilent platform, obtained from the author's web-site) and Wang 33 (n=286, Affymetrix platform, GEO:GSE2034) datasets, were used to define the expression modules. A total of 10317 genes could be cross-matched between these two platforms and were used to define the expression modules. For convenience, these two datasets are referred to as NKI-EMC. Two external validation sets were also used. From the study by Hess et al. 34, the 51 IHC-ER negative patients of the 133 stage 1-3 breast cancer patients were included and referred to as "MDA". The "CEL" files were obtained from the authors' web page (http://www.bioinformatics.mdanderson.org/pubdata.html) and normalized using rma. The Duke University dataset (DUKE) from the study by Bild et al. 35 was obtained from the authors' web-site (http://data.cgt.duke.edu). A total of 120 patients were treated with the following chemotherapies: A3CMF (35), AC4 (11), FAC (48) and CMF (26), with a total of 44 events. For both the MDA and DUKE datasets, only probesets having a standard deviation greater than 0.5 were analysed. When multiple probesets were mapped to the same Entrez-gene id, only the most variable one was kept.
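
As a non-limiting sketch, the annotation, variability and one-probeset-per-gene criteria could be applied as shown below. The consensus-sequence length criterion is omitted because it requires probe design data; the function name, the use of the sample standard deviation and the toy probeset and gene identifiers are assumptions of this sketch.

```python
from statistics import stdev


def filter_probesets(expr, entrez, min_sd=0.5):
    """Select one probeset per Entrez-gene id.

    expr   : probeset id -> list of log2 expression values across experiments
    entrez : probeset id -> Entrez-gene id, or None when unannotated
    Returns a dict mapping each Entrez-gene id to its most variable probeset.
    """
    best = {}  # Entrez-gene id -> (standard deviation, probeset id)
    for probe, values in expr.items():
        gene = entrez.get(probe)
        if gene is None:
            continue  # criterion (2): no annotation to an Entrez-gene id
        sd = stdev(values)
        if sd < min_sd:
            continue  # criterion (3): too little variation across experiments
        if gene not in best or sd > best[gene][0]:
            best[gene] = (sd, probe)  # criterion (4): keep the most variable
    return {gene: probe for gene, (sd, probe) in best.items()}


# Toy data with hypothetical probeset and gene identifiers.
expr = {
    "p1": [5.0, 7.0, 9.0],
    "p2": [6.0, 6.5, 7.0],
    "p3": [8.0, 8.1, 8.2],
    "p4": [3.0, 6.0, 9.0],
}
entrez = {"p1": 101, "p2": 101, "p3": 202, "p4": None}
print(filter_probesets(expr, entrez))
```

Keying the result by Entrez-gene id also provides the primary key used for cross-platform mapping.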

Procedure for analysis

Classifier based on variable selection by association with outcome

A classifier was built by selecting, on a training set, the 50 genes most strongly associated with pCR by a two-sample Student's t-test. The average of these 50 genes was calculated for each tumour. The resulting vector was used to fit a logistic regression model for pCR. Alternatively, the rank-sum statistic was used for gene selection. The performance of the classifier was tested on the respective test set in three-fold cross-validation and by pooling the predictions from the three test sets, so that each sample was classified once. The full cross-validation procedure was repeated 1000 times.

Classifier based on gene modules defined on external data

A method was developed that uses public datasets to measure the statistical association between response and large clusters of functionally related genes that have been observed repeatedly 36-39. For each of these biological processes, a prototype gene was chosen as the representative gene of the cluster. The prototype gene is a gene regularly found grouped with other functionally related genes when clustering different breast cancer profile datasets. A more quantitative system than hierarchical clustering was used to identify groups of genes associated with each prototype. The method is based on a multiple linear regression model. Linear models provide the framework to allow easy adjustment for potential confounding effects and integration of data coming from different technological platforms. This selection of the genes belonging to a module was unsupervised and was performed using the NKI-EMC datasets. The multilinear model was fitted separately for each study and the associated t-statistics were combined using the fixed-effect meta-analytical method 40. P-values were estimated by random permutation according to the method of Westfall 41 in order to correct for multiple testing. For all expression modules, more than 50 genes were significantly associated (p < 0.05) with the prototype. The expression of the module genes in the Applicants' data was visualized with heatmaps, which are color-coded representations of the mean-centred expression matrix. In the definition of the metagenes, the Applicants fixed the number of genes used per module to 50. A sensitivity analysis was performed to test the impact of the number of genes included. For this, the Applicants ranked the genes by the strength of association with the prototype gene that defines the expression module, that is, in decreasing order of the meta-analytical t statistic. Then, starting from the top, the Applicants took consecutive non-overlapping groups of 15 genes and computed their average.
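
The metagene definition and the sensitivity analysis may be illustrated by the following sketch. It assumes that the genes have already been ranked by decreasing meta-analytical t statistic; the function names, the toy expression values and the decision to discard an incomplete trailing group are assumptions made for the example.

```python
from statistics import mean


def metagene(ranked_genes, sample_expr, n_genes=50):
    """Average expression of the top n_genes module genes in one tumour."""
    return mean(sample_expr[g] for g in ranked_genes[:n_genes])


def consecutive_group_means(ranked_genes, sample_expr, group_size=15):
    """Average expression of each consecutive non-overlapping group of
    group_size genes, taken from the top of the ranking. An incomplete
    trailing group is discarded (assumption of this sketch)."""
    last_start = len(ranked_genes) - group_size
    return [
        mean(sample_expr[g] for g in ranked_genes[start:start + group_size])
        for start in range(0, last_start + 1, group_size)
    ]


# Toy ranking of 40 hypothetical genes with expression values 0.0 to 39.0.
ranked = [f"g{i}" for i in range(40)]
expr = {f"g{i}": float(i) for i in range(40)}
print(metagene(ranked, expr, n_genes=10))
print(consecutive_group_means(ranked, expr))
```

Comparing the predictive value of successive groups indicates how quickly the signal decays with decreasing association to the prototype gene.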

Prediction of Accuracy

The ability of the metagenes to classify the samples by their pCR status was assessed with the area under the receiver operating characteristic curve (AUC). In all cases, gene selection was performed using external NKI-EMC data only; therefore no pCR-npCR labels were used. Consequently, no cross-validation is needed for an unbiased assessment of classification performance: the classification function is not trained on the dataset where it is applied, so a cross-validation procedure would always give the same results as the full data set. The 95% confidence intervals for the AUC were estimated by bootstrapping (1000 iterations). P-values were adjusted for multiple testing using the false discovery rate (FDR) method 42.
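
For illustration, the three computations named above (AUC, bootstrap confidence interval and FDR adjustment) can be sketched with the Python standard library alone. The function names, the percentile form of the bootstrap interval, the skipping of resamples that contain a single class and the toy data are assumptions of this sketch; the analysis described above was carried out in R.

```python
import random


def auc(scores, labels):
    """AUC as the probability that a positive sample scores higher than a
    negative one (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)
    indices = range(len(scores))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(indices) for _ in indices]
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < len(ys):  # both classes must be present
            stats.append(auc([scores[i] for i in sample], ys))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value downwards
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy data: label 1 marks the class expected to have the higher metagene score.
toy_scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]
toy_labels = [0, 0, 1, 1, 1, 0]
print(auc(toy_scores, toy_labels))
print(bootstrap_ci(toy_scores, toy_labels))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```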

Gene Set Enrichment Assay (GSEA)

Nine genesets from the MSigDB database 43 were tested by weighted GSEA as previously described 43. The genes were ranked and weighted according to the t statistic for each prototype. The test was performed using R statistical software. P-values were obtained empirically by randomizing the gene composition of the geneset 100 000 times.

Molecular Sub-type Classification

Molecular classification of tumours was performed as previously described 39. Briefly, the expression profile of each tumour was compared to reference profiles based on the mean expression level of the previously defined Luminal, Basal, Molecular-Apocrine (LAB) gene lists.

Stromal signature and colorectal cancer

To confirm the association of the stromal signature with fluorouracil activity, we tested whether the stromal signature is associated with response in an independent cohort of 23 rectal carcinoma patients 44 treated with preoperative fluorouracil (1,000 mg/m2/d) as a single agent. Response was measured by T level downsizing and histopathologic tumour regression grading (see Figure 11). Results show a significant association between response to fluorouracil (5-FU) and the stromal metagene scores (AUC 0.77; p = 0.032; figure xx). In summary, results show that the stromal signature predicts response to fluorouracil in rectal cancer patients. The significant association with 5-FU resistance in both of these studies confirms results derived from the NCI60 datasets. More importantly, these results show that the stromal signature could be clinically useful in other cancers, such as colon and rectal cancer, in addition to breast carcinoma.

EXAMPLE 2:

The following genes are selected to be part of the stromal signature: DCN, PLAU, CYR61 and SPARC, with respective expression values of 8.0, 4.6, 8.2 and 7.6. Their respective coefficients ks are 1, 2, 1 and 1.

The Stroma Metagene Score = average (ks 1 * DCN + ks 2 * PLAU + ks 3 * CYR61 + ks 4 * SPARC) = average (1 * 8.0 + 2 * 4.6 + 1 * 8.2 + 1 * 7.6) = 33.0 / 4 = 8.25

The ks 2 has a value of 2 rather than 1 to compensate for the property of PLAU to have a lower expression level relative to the 3 other genes.

The following genes are selected to be part of the reference signature: GAPDH, KRT5, BIRC5 and TPX2, with respective expression values of 4.2, 5.1, 4.7 and 3.9. Their respective coefficients kr are 1, 1, 1 and 1.

The Reference Metagene Score = average (kr 1 * GAPDH + kr 2 * KRT5 + kr 3 * BIRC5 + kr 4 * TPX2) = average (1 * 4.2 + 1 * 5.1 + 1 * 4.7 + 1 * 3.9) = 17.9 / 4 = 4.475

The efficiency of extracting mRNA for stromal and reference genes is equal, therefore the values of both the CT1 and CTE2 constants will be zero. As there is no batch-specific bias, the value of the CTE3 constant will also be zero.

The TH1 threshold is defined as +1 and the TH2 threshold is defined as -1.

The SC value is (8.25+0) / (4.475+0) = 1.84. As 1.84 is > +1, the tumor will be classified as having a high stromal content.
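
The arithmetic of this Example can be reproduced with the short sketch below. The function and variable names are assumptions; the reference genes are given unit coefficients, as in the written-out sum above; and the CTE3 constant is left out because it is zero here and its position in the formula is not shown in this Example.

```python
def metagene_score(expression, coefficients):
    """Average of the coefficient-weighted expression values."""
    weighted = [coefficients[gene] * expression[gene] for gene in expression]
    return sum(weighted) / len(weighted)


stromal_expr = {"DCN": 8.0, "PLAU": 4.6, "CYR61": 8.2, "SPARC": 7.6}
ks = {"DCN": 1, "PLAU": 2, "CYR61": 1, "SPARC": 1}

reference_expr = {"GAPDH": 4.2, "KRT5": 5.1, "BIRC5": 4.7, "TPX2": 3.9}
kr = {"GAPDH": 1, "KRT5": 1, "BIRC5": 1, "TPX2": 1}  # unit weights

ct1 = 0.0   # equal extraction efficiency for stromal and reference genes
cte2 = 0.0

stroma_score = metagene_score(stromal_expr, ks)
reference_score = metagene_score(reference_expr, kr)
sc = (stroma_score + ct1) / (reference_score + cte2)

th1, th2 = 1.0, -1.0
if sc > th1:
    call = "high stromal content"
elif sc < th2:
    call = "low stromal content"
else:
    call = "ambiguous stromal content"

print(round(stroma_score, 3), round(reference_score, 3), round(sc, 2), call)
```

Carried through without intermediate rounding, the Stroma Metagene Score is 8.25 and the SC value is approximately 1.84, which exceeds the threshold of +1.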

Reference list

1. Effects of chemotherapy and hormonal therapy for early breast cancer on recurrence and 15-year survival: an overview of the randomised trials. Lancet 365, 1687-717 (2005).

2. Colleoni, M. et al. Chemotherapy is more effective in patients with breast cancer not expressing steroid hormone receptors: a study of preoperative treatment. Clin Cancer Res 10, 6622-8 (2004).

3. Hannemann, J. et al. Changes in gene expression associated with response to neoadjuvant chemotherapy in breast cancer. J Clin Oncol 23, 3331-42 (2005).

4. Guarneri, V. et al. Prognostic value of pathologic complete response after primary chemotherapy in relation to hormone receptor status and other factors. J Clin Oncol 24, 1037-44 (2006).

5. Fisher, E.R. et al. Pathobiology of preoperative chemotherapy: findings from the National Surgical Adjuvant Breast and Bowel (NSABP) protocol B-18. Cancer 95, 681-95 (2002).

6. Bonadonna, G. et al. Primary chemotherapy in operable breast cancer: eight-year experience at the Milan Cancer Institute. J Clin Oncol 16, 93-100 (1998).

7. Fisher, B. et al. Effect of preoperative chemotherapy on the outcome of women with operable breast cancer. J Clin Oncol 16, 2672-85 (1998).

8. Potti, A. et al. Genomic signatures to guide the use of chemotherapeutics. Nat Med 12, 1294-300 (2006).

9. Gianni, L. et al. Gene expression profiles in paraffin-embedded core biopsy tissue predict response to chemotherapy in women with locally advanced breast cancer. J Clin Oncol 23, 7265-77 (2005).

10. Hess, K.R. et al. Pharmacogenomic predictor of sensitivity to preoperative chemotherapy with paclitaxel and fluorouracil, doxorubicin, and cyclophosphamide in breast cancer. J Clin Oncol 24, 4236-44 (2006).

11. Thuerigen, O. et al. Gene expression signature predicting pathologic complete response with gemcitabine, epirubicin, and docetaxel in primary breast cancer. J Clin Oncol 24, 1839-45 (2006).

12. Aoudjit, F. & Vuori, K. Integrin signaling inhibits paclitaxel-induced apoptosis in breast cancer cells. Oncogene 20, 4995-5004 (2001).

13. Cordes, N., Blaese, M.A., Plasswilm, L., Rodemann, H.P. & Van Beuningen, D. Fibronectin and laminin increase resistance to ionizing radiation and the cytotoxic drug Ukrain in human tumour and normal cells in vitro. Int J Radiat Biol 79, 709-20 (2003).

14. Damiano, J.S., Cress, A.E., Hazlehurst, L.A., Shtil, A.A. & Dalton, W.S. Cell adhesion mediated drug resistance (CAM-DR): role of integrins and resistance to apoptosis in human myeloma cell lines. Blood 93, 1658-67 (1999).

15. Sethi, T. et al. Extracellular matrix proteins protect small cell lung cancer cells against apoptosis: a mechanism for small cell lung cancer growth and drug resistance in vivo. Nat Med 5, 662-8 (1999).

16. Ein-Dor, L., Zuk, O. & Domany, E. Thousands of samples are needed to generate a robust gene list for predicting outcome in cancer. Proc Natl Acad Sci USA 103, 5923-8 (2006).

17. Farmer, P. et al. Identification of molecular apocrine breast tumours by microarray analysis. Oncogene 24, 4660-71 (2005).

18. Perou, C.M. et al. Molecular portraits of human breast tumours. Nature 406, 747-52 (2000).

19. Sorlie, T. et al. Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci USA 98, 10869-74 (2001).

20. Sotiriou, C. et al. Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. J Natl Cancer Inst 98, 262-72 (2006).

21. van de Vijver, M.J. et al. A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med 347, 1999-2009 (2002).

22. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA 102, 15545-50 (2005).

23. Bild, A.H., Potti, A. & Nevins, J.R. Linking oncogenic pathways with therapeutic opportunities. Nat Rev Cancer 6, 735-41 (2006).

24. West, R.B. et al. Determination of stromal signatures in breast carcinoma. PLoS Biol 3, e187 (2005).

25. Finak, G. et al. Gene expression signatures of morphologically normal breast tissue identify basal-like tumors. Breast Cancer Res 8, R58 (2006).

26. Weaver, V.M. et al. beta4 integrin-dependent formation of polarized three-dimensional architecture confers resistance to apoptosis in normal and malignant mammary epithelium. Cancer Cell 2, 205-16 (2002).

27. Misra, S., Ghatak, S. & Toole, B.P. Regulation of MDR1 expression and drug resistance by a positive feedback loop involving hyaluronan, phosphoinositide 3-kinase, and ErbB2. J Biol Chem 280, 20310-5 (2005).

28. Hazlehurst, L.A. et al. Reduction in drug-induced DNA double-strand breaks associated with beta1 integrin-mediated adhesion correlates with drug resistance in U937 cells. Blood 98, 1897-903 (2001).

29. Song, S., Wientjes, M.G., Gan, Y. & Au, J.L. Fibroblast growth factors: an epigenetic mechanism of broad spectrum resistance to anticancer drugs. Proc Natl Acad Sci USA 97, 8658-63 (2000).

30. Bergh, J. et al. Dosage of adjuvant G-CSF (filgrastim)-supported FEC polychemotherapy based on equivalent haematological toxicity in high-risk breast cancer patients. Scandinavian Breast Group, Study SBG 9401. Ann Oncol 9, 403-11 (1998).

31. Orimo, A. et al. Stromal fibroblasts present in invasive human breast carcinomas promote tumor growth and angiogenesis through elevated SDF-1/CXCL12 secretion. Cell 121, 335-48 (2005).

32. van de Vijver, M.J. et al. A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med 347, 1999-2009 (2002).

33. Wang, Y. et al. Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet 365, 671-9 (2005).

34. Hess, K.R. et al. Pharmacogenomic predictor of sensitivity to preoperative chemotherapy with paclitaxel and fluorouracil, doxorubicin, and cyclophosphamide in breast cancer. J Clin Oncol 24, 4236-44 (2006).

35. Bild, A.H., Potti, A. & Nevins, J.R. Linking oncogenic pathways with therapeutic opportunities. Nat Rev Cancer 6, 735-41 (2006).

36. Perou, C.M. et al. Molecular portraits of human breast tumours. Nature 406, 747-52 (2000).

37. Sorlie, T. et al. Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci USA 98, 10869-74 (2001).

38. Sotiriou, C. et al. Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. J Natl Cancer Inst 98, 262-72 (2006).

39. Farmer, P. et al. Identification of molecular apocrine breast tumours by microarray analysis. Oncogene 24, 4660-71 (2005).

40. Hedges, L. & Olkin, I. Statistical methods for meta-analysis., (London, 1985).

41. Westfall, P. & Young, S. Resampling-based multiple testing: Examples and methods for p-value adjustment, (New York, 1993).

42. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society 57, 289-300 (1995).

43. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA 102, 15545-50 (2005).

44. B. Michael Ghadimi, Marian Grade, Michael J. Difilippantonio, Sudhir Varma, Richard Simon, Cristina Montagna, Laszlo Füzesi, Claus Langer, Heinz Becker, Torsten Liersch, and Thomas Ried. Effectiveness of Gene Expression Profiling for Response Prediction of Rectal Adenocarcinomas to Preoperative Chemoradiotherapy. J Clin Oncol (2005).