Title:
PREDICTING RESPONSIVENESS TO CANCER THERAPEUTICS
Document Type and Number:
WIPO Patent Application WO/2009/052484
Kind Code:
A3
Abstract:
Provided herein are methods for predicting the responsiveness of a cancer to a chemotherapeutic agent using gene expression profiles. In particular, methods for predicting the responsiveness to 5-fluorouracil, adriamycin, cytotoxan, docetaxol, etoposide, taxol, topotecan, PI3 kinase inhibitors and Src inhibitors are provided. Methods for developing treatment plans for individuals with cancer are also provided. Kits including gene chips and instructions for predicting responsiveness and computer readable media comprising responsivity information are also provided.

Inventors:
POTTI ANIL (US)
NEVINS JOSEPH R (US)
LANCASTER JOHNATHAN M (US)
Application Number:
PCT/US2008/080481
Publication Date:
June 04, 2009
Filing Date:
October 20, 2008
Assignee:
UNIV DUKE (US)
POTTI ANIL (US)
NEVINS JOSEPH R (US)
LANCASTER JOHNATHAN M (US)
International Classes:
C12Q1/68; G16B25/10; G16B40/20; G16B40/30
Foreign References:
US20070172844A1 2007-07-26
Other References:
POTTI ET AL.: "Genomic signatures to guide the use of chemotherapeutics.", NATURE MEDICINE, vol. 12, no. 11, November 2006 (2006-11-01), pages 1294 - 1300, Retrieved from the Internet
IWAO-KOIZUMI ET AL.: "Prediction of Docetaxel Response in Human Breast Cancer by Gene Expression Profiling.", JOUR. CLIN. ONCOL., vol. 23, no. 3, 20 January 2005 (2005-01-20), pages 422 - 431
WATTERS ET AL.: "Developing gene expression signatures of pathway deregulation in tumors.", MOL. CANCER. THER., vol. 5, no. 10, October 2006 (2006-10-01), pages 2444 - 2449
Attorney, Agent or Firm:
HARTWIG, Gregory, J. (100 East Wisconsin Avenue, Suite 330, Milwaukee WI, US)
Claims:
CLAIMS

What is claimed is:

1. A method for predicting responsiveness of a cancer to a chemotherapeutic agent comprising: using a comparison of a first gene expression profile of the cancer to a chemotherapy responsivity predictor set of gene expression profiles to predict the responsiveness of the cancer to the chemotherapeutic agent, the first gene expression profile and the chemotherapy responsivity predictor set each comprising at least five genes from one of Tables 1-8, wherein Tables 1-8 comprise the chemotherapy responsivity predictor set for 5-fluorouracil, adriamycin, cytotoxan, docetaxol, etoposide, taxol, topotecan and PI3 kinase inhibitors, respectively.

2. A method for predicting responsiveness of a cancer to an inhibitor of the PI3 kinase pathway comprising: using a comparison of a first gene expression profile of the cancer to a docetaxol chemotherapy responsivity predictor set of gene expression profiles to predict the responsiveness of the cancer to the inhibitor of the PI3 kinase pathway, the first gene expression profile and the docetaxol chemotherapy responsivity predictor set each comprising at least five genes from Table 4.

3. A method for predicting responsiveness of a cancer to an inhibitor of the Src pathway comprising: using a comparison of a first gene expression profile of the cancer to a topotecan chemotherapy responsivity predictor set of gene expression profiles to predict the responsiveness of the cancer to the inhibitor of the Src pathway, the first gene expression profile and the topotecan chemotherapy responsivity predictor set each comprising at least five genes from Table 7.

4. The method of any of claims 1-3, wherein the first gene expression profile is obtained by analyzing a nucleic acid sample from the cancer.

5. The method of any of claims 1-4, wherein the first gene expression profile is obtained by analyzing a sample from a tumor or ascites.

6. The method of any of claims 1-5, wherein the first gene expression profile is determined using a nucleic acid microarray.

7. The method of any of claims 1-6, wherein the first gene expression profile and the chemotherapy responsivity predictor set each comprises at least 10 genes.

8. The method of any of claims 1-7, wherein the first gene expression profile and the chemotherapy responsivity predictor set each comprises at least 20 genes.

9. The method of any of claims 1-8, wherein the cancer is from an individual and wherein step (b) identifies the individual as a complete responder or as an incomplete responder to the chemotherapeutic agent.

10. The method of any of claims 1-9, wherein the first gene expression profile is compared to at least two chemotherapy responsivity predictor sets each comprising at least five genes from the corresponding Tables 1-8.

11. The method of any of claims 1-10, wherein the cancer is selected from the group consisting of lung, breast, ovarian, prostate, renal, colon, leukemia, skin, and brain cancer.

12. The method of any of claims 1-11, wherein the chemotherapy responsivity predictor set is defined by extracting a single dominant value using singular value decomposition (SVD) and determining the value of the chemotherapy responsivity predictor set in the cancer.

13. The method of any of claims 1-12, wherein predicting comprises applying one or more statistical models to the comparison, each model producing a statistical probability of the sensitivity of the cancer to the chemotherapeutic agent.

14. The method of claim 13, wherein the statistical model is a binary regression model.

15. The method of claim 13, wherein the statistical model is a tree model, the tree model including one or more nodes, each node representing a metagene, each node including a statistical probability of sensitivity of the cancer to the chemotherapeutic agent.

16. The method of any of claims 1-15, wherein the method predicts responsiveness to the chemotherapeutic agent with at least 80% accuracy.

17. The method of any of claims 1-16, wherein the chemotherapy responsivity predictor set is developed using at least one resistant cell line and at least one sensitive cell line of one of Tables 9-15, Tables 9-15 listing cell lines sensitive or resistant to 5-fluorouracil, adriamycin, cytotoxan, docetaxol, etoposide, taxol, and topotecan, respectively.

18. A method of developing a treatment plan for an individual with cancer comprising using the predicted responsivity of a cancer to a chemotherapeutic agent obtained by the method of any of claims 1-17 to develop a treatment plan.

19. The method of claim 18, wherein the treatment plan includes administering an effective amount of a chemotherapeutic agent to the individual with the cancer if the cancer is predicted to respond to the chemotherapeutic agent.

20. The method of claims 18 or 19, further comprising comparing the first gene expression profile to an alternative chemotherapy responsivity predictor set of gene expression profiles predictive of responsivity to alternative chemotherapeutic agents; predicting responsiveness of the cancer to the alternative chemotherapeutic agents and administering an alternative chemotherapeutic agent to the individual with the cancer.

21. The method of claim 20, wherein the alternative chemotherapeutic agent is selected from the group comprising docetaxel, paclitaxel, abraxane, topotecan, adriamycin, etoposide, fluorouracil (5-FU), cyclophosphamide, denopterin, edatrexate, methotrexate, nolatrexed, pemetrexed, piritrexim, pteropterin, raltitrexed, trimetrexate, cladribine, clofarabine, fludarabine, 6-mercaptopurine, nelarabine, thiamiprine, thioguanine, tiazofurin, ancitabine, azacitidine, 6-azauridine, capecitabine, carmofur, cytarabine, decitabine, doxifluridine, enocitabine, floxuridine, fluorouracil, gemcitabine, tegafur, troxacitabine, pentostatin, hydroxyurea, cytosine arabinoside.

22. The method of any of claims 18-21, wherein the plan includes administering the chemotherapeutic agent before, after or concurrently with the administration of one or more alternative chemotherapeutic agents.

23. The method of any of claims 18-22, wherein the alternative chemotherapeutic agent targets a signal transduction pathway.

24. The method of claim 23, wherein the first gene expression profile of the cancer comprises at least one gene expression profile indicative of deregulation of the signal transduction pathway.

25. The method of claims 23 or 24, wherein the alternative chemotherapeutic agent is selected from inhibitors of a signal transduction pathway selected from the group consisting of Src, E2F3, Myc, PI3 kinase and β-catenin.

26. The method of any of claims 18-25, wherein the cancer is predicted to be responsive to more than one chemotherapeutic agent.

27. The method of claim 26, wherein the treatment plan includes administering an effective amount of at least two chemotherapeutic agents to the individual with the cancer.

28. The method of claim 27, wherein the plan includes administering at least two chemotherapeutic agents before, after or concurrently with each other.

29. The method of any of claims 18-28, wherein the treatment plan has an estimated efficacy of at least 50%.

30. A kit comprising a gene chip for predicting responsivity of a cancer to a chemotherapeutic agent comprising nucleic acids capable of detecting at least five genes selected from Tables 1-8 and instructions for predicting responsivity of a cancer to the chemotherapeutic agent.

31. A computer readable medium comprising gene expression profiles and corresponding responsivity information for chemotherapeutic agents comprising at least five genes from any of Tables 1-8.
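Claim 12 refers to extracting a single dominant value from a predictor set using singular value decomposition (SVD). The following is a minimal sketch of that step only, not the applicants' implementation. It assumes a small genes-by-samples expression matrix held as nested lists and uses power iteration as a stand-in for a full SVD routine; the function name `dominant_factor` and the example values are hypothetical.

```python
import math

def dominant_factor(matrix, iterations=200):
    """Leading singular component of a genes-by-samples matrix.

    Returns (singular_value, gene_loadings, sample_scores). Power iteration
    stands in here for a full SVD routine (an assumption of this sketch).
    """
    n_samples = len(matrix[0])
    # Center each gene (row) so the factor reflects variation across samples.
    centered = []
    for row in matrix:
        mean = sum(row) / n_samples
        centered.append([x - mean for x in row])

    # Start from the centered row with the largest norm, so that the first
    # matrix-vector product is guaranteed to be non-zero.
    v = max(centered, key=lambda r: sum(x * x for x in r))
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("expression matrix has no variation across samples")
    v = [x / norm for x in v]

    sigma = 0.0
    for _ in range(iterations):
        # u = A v, normalized: one loading per gene.
        u = [sum(a * b for a, b in zip(row, v)) for row in centered]
        u_norm = math.sqrt(sum(x * x for x in u))
        u = [x / u_norm for x in u]
        # v = A^T u: one score per sample; its length estimates the
        # dominant singular value.
        v = [sum(row[s] * u[g] for g, row in enumerate(centered))
             for s in range(n_samples)]
        sigma = math.sqrt(sum(x * x for x in v))
        v = [x / sigma for x in v]
    return sigma, u, v

# Hypothetical expression values: 3 genes (rows) by 4 samples (columns).
expression = [[1.0, 2.0, 3.0, 4.0],
              [2.0, 4.0, 6.0, 8.0],
              [-1.0, -2.0, -3.0, -4.0]]
sigma, loadings, metagene = dominant_factor(expression)
```

The per-sample scores in `metagene` are what a downstream statistical model (claims 13-15) would take as input. The sign of a singular vector is arbitrary, so a practical pipeline would fix its orientation against samples of known sensitivity.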

Description:

PREDICTING RESPONSIVENESS TO CANCER THERAPEUTICS

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to U.S. Utility Application No. 11/975,722, filed October 19, 2007, which is incorporated herein by reference in its entirety.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

This invention was made with government support under NCI-U54 CA112952-02 and R01-CA106520 awarded by the National Cancer Institute. The government has certain rights in the invention.

BACKGROUND OF THE INVENTION

The National Cancer Institute has estimated that in the United States alone, one in three people will be afflicted with cancer. Moreover, approximately 50% to 60% of people with cancer will eventually die from the disease. The inability to predict responses to specific therapies is a major impediment to improving outcome for cancer patients. Because treatment of cancer typically is approached empirically, many patients with chemo-resistant disease receive multiple cycles of often toxic therapy before the lack of efficacy becomes evident. As a consequence, many patients experience significant toxicities, compromised bone marrow reserves, and reduced quality of life while receiving chemotherapy. Further, initiation of efficacious therapy is delayed.

BRIEF SUMMARY OF THE INVENTION

In one aspect, methods for predicting responsiveness of a cancer to a chemotherapeutic agent are provided. The method includes using a comparison of a first gene expression profile of the cancer to a chemotherapy responsivity predictor set of gene expression profiles to predict the responsiveness of the cancer to the chemotherapeutic agent. The first gene expression profile and the chemotherapy responsivity predictor set each comprise at least five genes from one of Tables 1-8. Tables 1-8 comprise the chemotherapy responsivity predictor set for 5-fluorouracil, adriamycin, cytotoxan, docetaxol, etoposide, taxol, topotecan and PI3 kinase inhibitors, respectively. Also included are methods of predicting the responsiveness to PI3 kinase pathway inhibitors and Src pathway inhibitors using the chemotherapy response predictor sets for docetaxol and topotecan, respectively.

In another aspect, methods of developing a treatment plan for an individual with cancer are provided. The predicted responsivity of a cancer to a chemotherapeutic agent may be used to develop a treatment plan for the individual with the cancer. The treatment plan may include administering an effective amount of a chemotherapeutic agent to the individual with the cancer which is predicted to respond to the chemotherapeutic agent.

In yet another aspect, kits including a gene chip for predicting responsivity of a cancer to a chemotherapeutic agent comprising nucleic acids capable of detecting at least five genes selected from any one of Tables 1-8 and instructions for predicting responsivity of a cancer to the chemotherapeutic agents are provided.

In a further aspect, computer readable mediums including gene expression profiles and corresponding responsivity information for chemotherapeutic agents comprising at least five genes from any of Tables 1-8 are provided. Throughout this specification, reference numbering is sometimes used to refer to the full citation for the references, which can be found in the "Reference Bibliography" after the Examples section. The disclosure of all patents, patent applications, and publications cited herein are hereby incorporated by reference in their entirety for all purposes.

BRIEF DESCRIPTION OF THE DRAWINGS

The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.

Figures 1A-1E show a gene expression signature that predicts sensitivity to docetaxel. (A) Strategy for generation of the chemotherapeutic response predictor. (B) Top panel - Cell lines from the NCI-60 panel used to develop the in vitro signature of docetaxel sensitivity. The figure shows a statistically significant difference (Mann Whitney U test of significance) in the IC50/GI50 and LC50 of the cell lines chosen to represent the sensitive and resistant subsets. Bottom panel - Expression plots for genes selected for discriminating the docetaxel resistant and sensitive NCI-60 cell lines, depicted by color coding with blue representing the lowest level and red the highest. Each column in the figure represents individual samples. Each row represents an individual gene, ordered from top to bottom according to regression coefficients. (C) Top panel - Validation of the docetaxel response prediction model in an independent set of lung and ovarian cancer cell line samples. A collection of lung and ovarian cell lines were used in a cell proliferation assay to determine the 50% inhibitory concentration (IC50) of docetaxel in the individual cell lines. A linear regression analysis demonstrates a statistically significant (p < 0.01, log rank) relationship between the IC50 of docetaxel and the predicted probability of sensitivity to docetaxel. Bottom panel - Validation of the docetaxel response prediction model in another independent set of 29 lung cancer cell line samples (Gemma A, GEO accession number: GSE4127). A linear regression analysis demonstrates a very significant (p < 0.001, log rank) relationship between the IC50 of docetaxel and the predicted probability of sensitivity to docetaxel. (D) Left panel - A strategy for assessment of the docetaxel response predictor as a function of clinical response in the breast neoadjuvant setting. Middle panel - Predicted probability of docetaxel sensitivity in a collection of samples from a breast cancer single agent neoadjuvant study. Twenty of twenty four samples (91.6%) were predicted accurately using the cell line based predictor of response to docetaxel. Right panel - A single variable scatter plot demonstrating a significance test of the predicted probabilities of sensitivity to docetaxel in the sensitive and resistant tumors (p < 0.001, Mann Whitney U test of significance). (E) Left panel - A strategy for assessment of the docetaxel response predictor as a function of clinical response in advanced ovarian cancer. Middle panel - Predicted probability of docetaxel sensitivity in a collection of samples from a prospective single agent salvage therapy study. Twelve of fourteen samples (85.7%) were predicted accurately using the cell line based predictor of response to docetaxel. Right panel - A single variable scatter plot demonstrating statistical significance (p < 0.01, Mann Whitney U test of significance).

Figures 2A-2C show the development of a panel of gene expression signatures that predict sensitivity to chemotherapeutic drugs. (A) Gene expression patterns selected for predicting response to the indicated drugs. The genes involved in the individual predictors are shown in Tables 1-8, as indicated. (B) Independent validation of the chemotherapy response predictors in an independent set of cancer cell lines (37) that have dose response and Affymetrix expression data (38). A single variable scatter plot demonstrating a significance test of the predicted probabilities of sensitivity to any given drug in the sensitive and resistant cell lines (p value, Mann Whitney U test of significance). Red symbols indicate resistant cell lines, and blue symbols indicate those that are sensitive. (C) Prediction of single agent therapy response in patient samples using in vitro cell line based expression signatures of chemosensitivity. In each case, red represents non-responders (resistance) and blue represents responders (sensitivity). The top panel shows the predicted probability of sensitivity to topotecan when compared to actual clinical response data (n = 48), the middle panel demonstrates the accuracy of the adriamycin predictor in a cohort of 122 samples (Evans W, GSE650 and GSE651). The bottom panel shows the predictive accuracy of the cell line based paclitaxel (taxol) predictor when used as a salvage chemotherapy in advanced ovarian cancer (n = 35). The positive and negative predictive values for all the predictors are summarized in Table 16.

Figures 3A-3B show the prediction of response to combination therapy. (A) Top panel - Strategy for assessment of chemotherapy response predictors in combination therapy as a function of pathologic response. Middle panel - Prediction of patient response to neoadjuvant chemotherapy involving paclitaxel, 5-fluorouracil (5-FU), adriamycin, and cyclophosphamide (TFAC) using the single agent in vitro chemosensitivity signatures developed for each of these drugs. Bottom panel - Prediction of response (38 non-responders, 13 responders) employing a combined probability predictor assessing the probability of all four chemosensitivity signatures in 51 patients treated with TFAC chemotherapy shows statistical significance (p < 0.0001, Mann Whitney) between responders (blue) and non-responders (red). Response was defined as a complete pathologic response after completion of TFAC neoadjuvant therapy. (B) Top panel - Prediction of patient response (n = 45) to adjuvant chemotherapy involving 5-FU, adriamycin, and cyclophosphamide (FAC) using the single agent in vitro chemosensitivity predictors developed for these drugs. Middle panel - Prediction of response (34 responders, 11 non-responders) employing a combined probability predictor assessing the probability of all four chemosensitivity signatures in 45 patients treated with FAC chemotherapy. Bottom panel - Kaplan Meier survival analysis for patients predicted to be sensitive (blue curve) or resistant (red curve) to FAC adjuvant chemotherapy.

Figure 4 shows patterns of predicted sensitivity to common chemotherapeutic drugs in human cancers. Hierarchical clustering of a collection of breast (n = 171), lung cancer (n = 91) and ovarian cancer (n = 119) samples according to patterns of predicted sensitivity to the various chemotherapeutics. These predictions were then plotted as a heatmap in which high probability of sensitivity/response is indicated by red, and low probability or resistance is indicated by blue.

Figures 5A-5B show the relationship between predicted chemotherapeutic sensitivity and oncogenic pathway deregulation. (A) Top panel - Probability of oncogenic pathway deregulation as a function of predicted docetaxel sensitivity in a series of lung cancer cell lines (red = sensitive, blue = resistant). Bottom panel - Probability of oncogenic pathway deregulation as a function of predicted topotecan sensitivity in a series of ovarian cancer cell lines (red = sensitive, blue = resistant). (B) Top left panel - The lung cancer cell lines showing an increased probability of PI3 kinase were also more likely to respond to a PI3 kinase inhibitor (LY-294002) (p = 0.001, log-rank test), as measured by sensitivity to the drug in assays of cell proliferation. Top right panel - Those cell lines predicted to be resistant to docetaxel were more likely to be sensitive to PI3 kinase inhibition (p < 0.001, log-rank test). Bottom left panel - Ovarian cancer cell lines showing an increased probability of Src pathway deregulation were also more likely to respond to a Src inhibitor (SU6656) (p < 0.007, log-rank test). Bottom right panel - The relationship between Src pathway deregulation and topotecan resistance can be demonstrated in a set of 13 ovarian cancer cell lines. Ovarian cell lines that are predicted to be topotecan resistant have a higher likelihood of Src pathway deregulation and there is a significant linear relationship (p = 0.001, log rank) between the probability of topotecan resistance and sensitivity to a drug that inhibits the Src pathway (SU6656).

Figure 6 shows a scheme for utilization of chemotherapeutic and oncogenic pathway predictors for identification of individualized therapeutic options.

Figures 7A-7C show that a patient-derived docetaxel gene expression signature predicts response to docetaxel in cancer cell lines. (A) Top panel - A ROC curve analysis to show the approach used to define a cut-off, using docetaxel as an example. Middle panel - A t-test plot of significance between the probability of docetaxel sensitivity and IC50 for docetaxel sensitive in cell lines, shown by histologic type. Bottom panel - A linear regression analysis showing the significant correlation between predicted sensitivity and actual sensitivity (IC50) for docetaxel, in lung and ovarian cancer cell lines. (B) Generation of a docetaxel response predictor based on patient data that was then validated in a leave one out cross validation and linear regression analysis (p-value obtained by log-rank), evaluated against the IC50 for docetaxel in two NCI-60 cell line drug screening experiments. (C) A comparison of predictive accuracies between a predictor for docetaxel generated from the cell line data (top panel, accuracy: 85.7%) and a predictor generated from patient treatment data (bottom panel, accuracy: 64.3%) shows the relative inferiority of the latter approach, when applied to an independent dataset of ovarian cancer patients treated with single agent docetaxel.

Figures 8A-8C show the development of gene expression signatures that predict sensitivity to a panel of commonly used chemotherapeutic drugs. Panel A shows the gene expression models selected for predicting response to the indicated drugs, with resistant lines on the left, sensitive on the right for each predictor. Panel B shows the leave one out cross validation accuracy of the individual predictors. Panel C demonstrates the results of an independent validation of the chemotherapy response predictors in an independent set of cancer cell lines (37) shown as a plot with error bars (blue - sensitive, red - resistant).

Figure 9 shows the specificity of chemotherapy response predictors. In each case, individual predictors of response to the various cytotoxic drugs were plotted against cell lines known to be sensitive or resistant to a given chemotherapeutic agent (e.g., adriamycin, paclitaxel).

Figures 10A-10C show the relationships in predicted probability of response to chemotherapies in breast (A), lung (B) and ovarian (C) cancers. In each case, a regression analysis (log rank) of predicted probability of response to two drugs is shown.

Figure 11 shows the absolute probabilities of response to various chemotherapies in human lung and breast cancer samples.

Figure 12 shows a gene expression based signature of PI3 kinase pathway deregulation. Image intensity display of expression levels for genes that most differentiate control cells expressing GFP from cells expressing the oncogenic activity of PI3 kinase. The expression value of genes composing each signature is indicated by color, with blue representing the lowest value and red representing the highest level. The panel below shows the results of a leave one out cross validation showing a reliable differentiation between GFP controls (blue) and cells expressing PI3 kinase (red).

Figures 13A-13C show the relationship between oncogenic pathway deregulation and chemosensitivity patterns (using docetaxel as an example). (A) Probability of oncogenic pathway deregulation as a function of predicted docetaxel sensitivity in the NCI-60 cell line panel (red = sensitive, blue = resistant). (B) Linear regression analysis (log-rank test of significance) to identify relationships between predicted docetaxel sensitivity or resistance and deregulation of PI3 kinase and E2F3 pathways. (C) A non-parametric t-test of significance demonstrating a significant difference in docetaxel sensitivity, between those cell lines predicted to be either pathway deregulated (>50% probability, red) or quiescent (<50% probability, blue), shown for both E2F and PI3 kinase pathways.

Figure 14 shows a scatter plot demonstrating a linear regression analysis that identifies a statistically significant correlation between probability of docetaxel resistance and PI3 kinase pathway activation in an independent cohort of 17 non-small cell lung cancer cell lines.

Figure 15 shows a functional block diagram of general purpose computer system 1500 for performing the functions of the software provided by the invention.

BRIEF DESCRIPTION OF THE TABLES

Tables 1-8 include the chemotherapy responsivity predictor set for 5-fluorouracil, adriamycin, cytotoxan, docetaxol, etoposide, taxol, topotecan and PI3 kinase inhibitors, respectively.

Tables 9-15 list cell lines and indicate their sensitivity or resistance to 5-fluorouracil, adriamycin, cytotoxan, docetaxol, etoposide, taxol, and topotecan, respectively.

Table 16 is a summary of the chemotherapy response predictors - validations in cell line and patient data sets.

Table 17 shows an enrichment analysis demonstrating that a genomic-guided response prediction increases the probability of a clinical response in the different data sets studied.

Table 18 shows the accuracy of genomic-based chemotherapy response predictors as compared to previously reported predictors of response.

DETAILED DESCRIPTION OF THE INVENTION

The difficulty with administering one or more chemotherapeutic agents to an individual with cancer is that not all individuals with cancer will respond favorably to the chemotherapeutic agent selected by the physician. Frequently, the administration of one or more chemotherapeutic agents results in the individual becoming even more ill from the toxicity of the agent, while the cancer persists. Due to the cytotoxic nature of chemotherapeutic agents, the individual is physically weakened and immunologically compromised such that the individual cannot tolerate multiple rounds of therapy. Hence a personalized treatment plan is highly desirable.

As described in the Examples, the inventors identified gene expression patterns within primary tumors or cell lines that predict response to various chemotherapeutic agents. These predictions may be used to develop treatment plans for individual cancer patients. The invention also provides for integrating gene expression profiles that predict responsiveness to combination therapies as a strategy for developing personalized treatment plans for individual patients. Treatment plans may result in individuals having a complete response, a partial response or an incomplete response to the cancer.

A "complete response" (CR) to treatment of cancer is defined as a complete disappearance of all measurable and assessable disease. In ovarian cancer a complete response includes, in the absence of measurable lesions, a normalization of the CA-125 level following adjuvant therapy. An individual who exhibits a complete response is known as a "complete responder."

An "incomplete response" (IR) includes those who exhibited a "partial response" (PR), had "stable disease" (SD), or demonstrated "progressive disease" (PD) during primary therapy.

A "partial response" refers to a response that displays 50% or greater reduction in bi-dimensional size (area) of the lesion for at least 4 weeks or, in ovarian cancer, a drop in the CA-125 level by at least 50% for at least 4 weeks.

"Progressive disease" refers to a response that is a 50% or greater increase in the product from any lesion documented within 8 weeks of initiation of therapy, the appearance of any new lesion within 8 weeks of initiation of therapy, or in the case of ovarian cancer, any increase in the CA-125 from baseline at initiation of therapy.

"Stable disease" was defined as disease not meeting any of the above criteria.

"Effective amount" refers to an amount of a chemotherapeutic agent that is sufficient to exert a prophylactic or therapeutic effect in the subject, i.e., that amount which will stop or reduce the growth of the cancer or cause the cancer to become smaller in size compared to the cancer before treatment or compared to a suitable control. In most cases, an effective amount will be known or available to those skilled in the art. The result of administering an effective amount of a chemotherapeutic agent may lead to effective treatment of the patient. It is desirable for an effective amount to be an amount sufficient to exert cytotoxic effects on cancerous cells.

"Predicting" and "prediction" as used herein includes, but is not limited to, generating a statistically based indication of whether a particular chemotherapeutic agent will be effective to treat the cancer. This does not mean that the event will happen with 100% certainty.

As used herein, "individual" and "subject" are interchangeable. A "patient" refers to an "individual" who is under the care of a treating physician.
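The response categories defined above (complete response, partial response, progressive disease, stable disease) amount to a small decision rule. The sketch below encodes only the lesion-size criteria and omits the CA-125 criteria for ovarian cancer. The function and parameter names are hypothetical, and the order in which the criteria are checked is an assumption, since the definitions do not state a precedence.

```python
def classify_response(baseline_area, current_area, weeks_sustained,
                      weeks_since_start, new_lesion=False):
    """Classify a lesion-size response as 'CR', 'PR', 'PD' or 'SD'.

    baseline_area / current_area: bi-dimensional lesion size (same units).
    weeks_sustained: how long the current measurement has persisted.
    weeks_since_start: weeks elapsed since initiation of therapy.
    """
    if baseline_area <= 0:
        raise ValueError("baseline_area must be positive")
    change = (current_area - baseline_area) / baseline_area

    # Progressive disease: >= 50% increase, or any new lesion, documented
    # within 8 weeks of initiation of therapy.
    if weeks_since_start <= 8 and (new_lesion or change >= 0.5):
        return "PD"
    # Complete response: disappearance of all measurable disease.
    if current_area == 0 and not new_lesion:
        return "CR"
    # Partial response: >= 50% reduction held for at least 4 weeks.
    if change <= -0.5 and weeks_sustained >= 4:
        return "PR"
    # Stable disease: none of the above criteria met.
    return "SD"

def response_group(category):
    """Map a category to 'complete' or 'incomplete' (PR, SD and PD)."""
    return "complete" if category == "CR" else "incomplete"

# Hypothetical example: a lesion shrinks from 10.0 to 4.0 and stays there
# for 5 weeks, assessed 12 weeks after therapy began.
category = classify_response(10.0, 4.0, weeks_sustained=5, weeks_since_start=12)
group = response_group(category)
```

Grouping PR, SD and PD together as "incomplete" mirrors the definition of an incomplete response given above.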

The present invention may be practiced using any suitable technique, including techniques known to those skilled in the art. Such techniques are available in the literature or in scientific treatises, such as, Molecular Cloning: A Laboratory Manual, second edition (Sambrook et al., 1989) and Molecular Cloning: A Laboratory Manual, third edition (Sambrook and Russell, 2001), (jointly referred to herein as "Sambrook"); Current Protocols in Molecular Biology (F.M. Ausubel et al., eds., 1987, including supplements); PCR: The Polymerase Chain Reaction, (Mullis et al., eds., 1994); Harlow and Lane (1988) Antibodies, A Laboratory Manual, Cold Spring Harbor Publications, New York; Harlow and Lane (1999) Using Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY (jointly referred to herein as "Harlow and Lane"); Current Protocols in Nucleic Acid Chemistry (Beaucage et al. eds., John Wiley & Sons, Inc., New York, 2000); and Casarett and Doull's Toxicology: The Basic Science of Poisons, C. Klaassen, ed., 6th edition (2001).

Methods for Predicting Responsiveness to Chemotherapy

Methods of predicting responsiveness of a cancer to a chemotherapeutic agent are provided herein. Specifically, the methods rely on using a comparison of a gene expression profile of the cancer to a chemotherapy responsivity predictor set to predict the responsiveness to the chemotherapeutic agent. See Tables 1-8 for the chemotherapeutic responsivity predictor sets. The chemotherapy responsivity predictor set is expected to be distinct for each class of chemotherapeutic agents and may vary between chemotherapeutic agents within the same class. A class of chemotherapeutic agents is chemotherapeutic agents that are similar in some way. For example, the agents may be known to act through a similar mechanism, or have similar targets or structures. An example of a class of chemotherapeutic agents is agents that inhibit PI3 kinase. The chemotherapy predictor set is, or may be derived from, a set of gene expression profiles obtained from samples (cell lines, tumor samples, etc.) with known sensitivity or resistance to the chemotherapeutic agent. The comparison of the expression of a specific set of genes in the cancer to the same set of genes in samples known to be sensitive or resistant to the chemotherapeutic agent allows prediction of the responsiveness of the cancer to the chemotherapeutic agent. The prediction may indicate that the cancer will respond completely to the chemotherapeutic agent, or it may predict that the cancer will be only partially responsive or non-responsive (i.e. resistant) to the chemotherapeutic agent. The cell lines used to generate the chemotherapy responsivity predictor sets and an indication of the cell lines' sensitivity or resistance to the chemotherapeutic agents are provided in Tables 9-15.

The methods described herein provide an indication of whether the cancer in the patient is likely to be responsive to a particular chemotherapeutic prior to beginning treatment that is more accurate than predictions using population-based approaches from clinical studies. The methods allow identification of chemotherapeutics estimated to be useful in combating a particular cancer in an individual patient, resulting in a more cost-effective, targeted therapy for the cancer patient and avoiding side effects from non-efficacious chemotherapeutic agents. Tables 1-8 also provide the relative "weights" of each of the individual genes that make up the responsivity predictor set. The weights demonstrate that some genes are more strongly indicative of sensitivity or resistance of a cancer to a particular therapeutic agent. Predictions based on the complete set of genes are expected to provide the most accurate predictions regarding the efficacy of treating the cancer with a particular therapeutic agent. Those of skill in the art will understand based on the weights of each gene in the responsivity predictor set that some genes are more predictive of outcome than others and thus that the entire responsivity predictor set need not be used to develop a useful prediction.
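By way of illustration only, the use of a subset of the responsivity predictor set chosen on the basis of the gene weights may be sketched as follows. The function name, the gene labels and the weight values are hypothetical and are not taken from Tables 1-8; ranking by absolute weight is an assumption made for the sketch, not a selection rule stated herein.

```python
from typing import Dict, List


def top_weighted_genes(weights: Dict[str, float], k: int) -> List[str]:
    """Return the k gene labels with the largest absolute weight."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Rank by magnitude so strongly negative weights count as informative too.
    ranked = sorted(weights, key=lambda gene: abs(weights[gene]), reverse=True)
    return ranked[:k]


# Hypothetical weights for illustration; not values from Tables 1-8.
example_weights = {"GENE_A": 0.9, "GENE_B": -1.4, "GENE_C": 0.2, "GENE_D": -0.05}
subset = top_weighted_genes(example_weights, 2)
print(subset)
```

Whether a given subset retains useful predictive value would still need to be checked against samples of known sensitivity or resistance.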

Once an individual's cancer is predicted to be responsive to a particular chemotherapy, then a treatment plan can be developed incorporating the chemotherapeutic agent and an effective amount of the chemotherapeutic agent(s) may be administered to the individual with the cancer. Those of skill in the art will appreciate that the methods do not guarantee that the individuals will be responsive to the chemotherapeutic agent, but the methods will increase the probability that the selected treatment will be effective to treat the cancer. Also encompassed is the ability to predict the responsiveness of the cancer to multiple chemotherapeutic agents and then to develop a treatment plan using a combination of two or more chemotherapeutic agents. Those of skill in the art appreciate that combination therapy is often suitable.

Treatment or treating a cancer includes, but is not limited to, reduction in cancer growth or tumor burden, enhancement of an anti-cancer immune response, induction of apoptosis of cancer cells, inhibition of angiogenesis, enhancement of cancer cell apoptosis, and inhibition of metastases. Administration of an effective amount of a chemotherapeutic agent to a subject may be carried out by any means known in the art including, but not limited to, intraperitoneal, intravenous, intramuscular, subcutaneous, transcutaneous, oral, nasopharyngeal or transmucosal absorption. The specific amount or dosage administered in any given case will be adjusted in accordance with the specific cancer being treated, the condition, including the age and weight, of the subject, and other relevant medical factors known to those of skill in the art.

In one embodiment, the methods involve predicting responsiveness to chemotherapeutic agents of an individual with cancer. Cancers include but are not limited to any cancer treatable with the chemotherapeutic agents described herein. Cancers include, but are not limited to, ovarian cancer, lung cancer, prostate cancer, renal cancer, colon cancer, leukemia, skin cancer, brain or central nervous system cancer and breast cancer. In another embodiment, the individual has advanced stage cancer (e.g., Stage III/IV ovarian cancer). In other embodiments, the individual has early stage cancer. For the individuals with advanced cancer, one form of primary treatment practiced by treating physicians is to surgically remove as much of the tumor as possible, a practice sometimes known as "debulking."

The sample of the cancer used to obtain the first gene expression profile may be directly from a tumor that was surgically removed. Alternatively, the sample of the cancer could be from cells obtained in a biopsy or other tumor sample. A sample from ascites surrounding the tumor may also be used.

The sample is then analyzed to obtain a first gene expression profile. This can be achieved by any suitable means, including those available to those of skill in the art. One method that can be used is to isolate RNA (e.g., total RNA) from the cellular sample and use a publicly or commercially available microarray system to analyze the gene expression profile from the cellular sample. One microarray that may be used is the Affymetrix Human U133A chip. One of skill in the art follows the standard directions that come with a commercially available microarray. Other types of microarrays may be used, for example, microarrays using RT-PCR for measurement. Other sources of microarrays include, but are not limited to, Stratagene (e.g., Universal Human Microarray), Genomic Health (e.g., Oncotype DX chip), Clontech (e.g., Atlas™ Glass Microarrays), and other types of Affymetrix microarrays. In one embodiment, the microarray may be made by a researcher or obtained from an educational institution. In other embodiments, customized microarrays, which include the particular set of genes that are particularly suitable for prediction, can be used. The gene expression profile may be obtained by any other means, including those known to those of skill in the art, e.g., Northern blots, real-time RT-PCR, Western blots for the expressed proteins or protein assays.

Once a first gene expression profile has been obtained from the sample, it is compared with a chemotherapy responsivity predictor set of gene expression profiles. Tables 1-8 describe the chemotherapy responsivity predictor sets for 5-FU, adriamycin, cytoxan, docetaxel, etoposide, taxol, topotecan, and PB kinase inhibitors, respectively. The use of the chemotherapy responsivity predictor set in its entirety is contemplated; however, it is also possible to use subsets of the predictor set. For example, a subset of at least 2, 5, 10, 15, 20, 25, 30, 35 or 40 or more genes from one of Tables 1-8 can be used for predictive purposes. For example, 40, 45, 50, 55, 60, 65, 70, 75 or 80 genes from Table 7 could be used in a topotecan chemotherapy responsivity predictor set. Thus, one of skill in the art may use the chemotherapy responsivity predictor set as detailed in the Examples to predict whether an individual or patient with cancer will be responsive to the selected chemotherapeutic agent. If the individual is a complete responder to a chemotherapeutic agent, then a treatment plan may be designed in which the therapeutic agent will be administered in an effective amount. If the complete responder stops being a complete responder, as sometimes happens, then the first gene expression profile may be further analyzed for responsivity to an alternative agent to determine which alternative agent should be administered to most effectively combat the cancer while minimizing the toxic side effects to the individual. If the individual is an incomplete responder, then the individual's gene expression profile can be further analyzed for responsivity to an alternative agent to determine which agent should be administered, or alternatively which combination of agents is predicted to be most effective to treat the cancer.

Those of skill in the art will understand that the first gene expression profile may be tested against more than one chemotherapy responsivity predictor set to allow development of a treatment plan with the best likelihood of treating the individual with the cancer. For example, an individual can be evaluated for responsiveness to one or more chemotherapeutic agents. In certain embodiments, the methods of the application are performed outside of the human body. In addition, an individual can be assessed to determine if they will be refractory to a commonly used first-line therapy such that additional alternative therapeutic intervention can be started. For the individuals who appear to be incomplete responders to a chemotherapeutic agent or for those individuals who have ceased being complete responders, an important step in the treatment is to determine other alternative cancer therapies that may be administered to the individual to best combat the cancer while minimizing the toxicity of these additional agents.

Alternative therapeutic agents include, but are not limited to, cisplatin, denopterin, edatrexate, methotrexate, nolatrexed, pemetrexed, piritrexim, pteropterin, raltitrexed, trimetrexate, cladribine, clofarabine, fludarabine, 6-mercaptopurine, nelarabine, thiamiprine, thioguanine, tiazofurin, ancitabine, azacitidine, 6-azauridine, capecitabine, carmofur, cytarabine, decitabine, doxifluridine, enocitabine, floxuridine, fluorouracil, gemcitabine, tegafur, troxacitabine, pentostatin, hydroxyurea, cytosine arabinoside, docetaxel, paclitaxel, abraxane, topotecan, adriamycin, etoposide, fluorouracil (5-FU), and cyclophosphamide. In one embodiment, the agent may be selected from platinum-based chemotherapeutic agents (e.g., cisplatin), alkylating agents (e.g., nitrogen mustards), antimetabolites (e.g., pyrimidine analogs), radioactive isotopes (e.g., phosphorus and iodine), miscellaneous agents (e.g., substituted ureas) and natural products (e.g., vinca alkaloids and antibiotics). In another embodiment, the therapeutic agent may be selected from the group consisting of allopurinol sodium, dolasetron mesylate, pamidronate disodium, etidronate, fluconazole, epoetin alfa, levamisole HCl, amifostine, granisetron HCl, leucovorin calcium, sargramostim, dronabinol, mesna, filgrastim, pilocarpine HCl, octreotide acetate, dexrazoxane, ondansetron HCl, ondansetron, busulfan, carboplatin, cisplatin, thiotepa, melphalan HCl, melphalan, cyclophosphamide, ifosfamide, chlorambucil, mechlorethamine HCl, carmustine, lomustine, polifeprosan 20 with carmustine implant, streptozocin, doxorubicin HCl, bleomycin sulfate, daunorubicin HCl, dactinomycin, daunorubicin citrate, idarubicin HCl, plicamycin, mitomycin, pentostatin, mitoxantrone, valrubicin, cytarabine, fludarabine phosphate, floxuridine, cladribine, methotrexate, mercaptopurine, thioguanine, capecitabine, methyltestosterone, nilutamide, testolactone, bicalutamide, flutamide, anastrozole, toremifene citrate, estramustine phosphate sodium, ethinyl estradiol, estradiol, esterified estrogens, conjugated estrogens, leuprolide acetate, goserelin acetate, medroxyprogesterone acetate, megestrol acetate, levamisole HCl, aldesleukin, irinotecan HCl, dacarbazine, asparaginase, etoposide phosphate, gemcitabine HCl, altretamine, topotecan HCl, hydroxyurea, interferon alpha-2b, mitotane, procarbazine HCl, vinorelbine tartrate, E. coli L-asparaginase, Erwinia L-asparaginase, vincristine sulfate, denileukin diftitox, aldesleukin, rituximab, interferon alpha-1a, paclitaxel, abraxane, docetaxel, BCG live (intravesical), vinblastine sulfate, etoposide, tretinoin, teniposide, porfimer sodium, fluorouracil, betamethasone sodium phosphate and betamethasone acetate, letrozole, etoposide, citrovorum factor, folinic acid, calcium leucovorin, 5-fluorouracil, adriamycin, cytoxan, and diamino-dichloro-platinum.

In another aspect, the first gene expression profile from the individual with cancer is analyzed and compared to gene expression profiles (or signatures) that are reflective of deregulation of various oncogenic signal transduction pathways. In one embodiment, the alternative cancer therapeutic agent is directed to a target that is implicated in oncogenic signal transduction deregulation. Such targets include, but are not limited to, Src, myc, beta-catenin and E2F3 pathways. Thus, in one aspect, the invention contemplates using an inhibitor that is directed to one of these targets as an additional therapy for cancer. One of skill in the art will be able to determine the dosages for each specific chemotherapeutic agent.

As shown in Example 1, the teachings herein provide a gene expression model that predicts response to docetaxel therapy. The other Examples provide predictors for 5-FU, adriamycin, cytoxan, taxol, etoposide, topotecan, PB kinase inhibitors and Src inhibitors.

The gene expression model was developed by using Bayesian binary regression analysis to identify genes highly correlated with drug sensitivity. The developed models were validated in a leave-one-out cross-validation.
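A minimal sketch of a leave-one-out cross-validation loop is given below for illustration. It is not the Bayesian binary regression analysis referred to above: a simple nearest-centroid classifier is substituted as a stand-in so that the example is self-contained, and the function names and the small data set are hypothetical.

```python
from typing import List


def nearest_centroid_predict(train_x: List[List[float]],
                             train_y: List[int],
                             x: List[float]) -> int:
    """Stand-in classifier: return the label whose class mean is closest to x."""
    centroids = {}
    for label in set(train_y):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def sqdist(a: List[float], b: List[float]) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return min(centroids, key=lambda lab: sqdist(centroids[lab], x))


def leave_one_out_accuracy(xs: List[List[float]], ys: List[int]) -> float:
    """Hold out each sample once, train on the rest, and score the held-out one."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Hypothetical two-feature samples: label 0 = resistant, 1 = sensitive.
xs = [[0.1, 0.2], [0.0, 0.3], [0.2, 0.1], [1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]
ys = [0, 0, 0, 1, 1, 1]
loo_acc = leave_one_out_accuracy(xs, ys)
print(loo_acc)
```

In practice, any step that selects genes using the outcome labels would ordinarily be repeated inside each fold, so that the held-out sample does not influence the model being tested.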

Chemotherapy Responsivity Predictor Set of Gene Expression Profiles

The chemotherapy responsivity predictor sets were created by a method described in detail in the Examples and similar to that detailed in Potti et al. (Genomic signatures to guide the use of chemotherapeutics. Nature Medicine 12(11):1294-1300, 2006, incorporated herein by reference). Unless otherwise noted in the Examples, the [-log10(M)] GI50/IC50 and LC50 (50% cytotoxic dose) data on the NCI-60 cell line panel for each of the indicated therapeutic agents was used to populate a matrix with MATLAB software with the relevant expression data for each individual cell line. When multiple entries for a drug screen existed (by NSC number), the entry with the largest number of replicates was included. To develop in vitro gene expression based predictors for chemotherapeutic agent sensitivity from the pharmacologic data used in the NCI-60 drug screen studies, we chose cell lines within the NCI-60 panel that would represent the extremes of sensitivity (see Tables 9-15). Relevant expression data (updated data available on the Affymetrix U95A2 GeneChip) for the selected NCI-60 cell lines were then used in a supervised analysis using Bayesian regression methodologies, as described previously (Pittman J, Huang E, Nevins J, et al: Bayesian analysis of binary prediction tree models for retrospectively sampled outcomes. Biostatistics 5(4):587-601, 2004), to develop a probit model predictive of sensitivity to the indicated chemotherapeutic agent.
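The two data preparation steps described above, namely keeping the drug screen entry with the largest number of replicates and choosing cell lines at the extremes of sensitivity, may be sketched as follows. The sketch is illustrative only: the record layout, the cell line labels, the numerical values and the number of lines taken at each extreme (n_each) are assumptions and are not taken from the NCI-60 data or from Tables 9-15.

```python
from typing import Dict, List, Tuple


def entry_with_most_replicates(entries: List[dict]) -> dict:
    """Among several screen entries for one agent, keep the most replicated one."""
    return max(entries, key=lambda entry: entry["replicates"])


def select_extremes(neg_log_gi50: Dict[str, float],
                    n_each: int) -> Tuple[List[str], List[str]]:
    """Return (sensitive, resistant) cell lines from the two ends of the ranking.

    A larger -log10(GI50) means a lower concentration inhibits growth,
    i.e. the cell line is more sensitive.
    """
    if n_each < 1 or 2 * n_each > len(neg_log_gi50):
        raise ValueError("n_each must leave two non-overlapping groups")
    ranked = sorted(neg_log_gi50, key=lambda line: neg_log_gi50[line])
    resistant = ranked[:n_each]
    sensitive = ranked[-n_each:]
    return sensitive, resistant


# Hypothetical entries and values for illustration only.
entries = [{"id": "entry_1", "replicates": 2}, {"id": "entry_2", "replicates": 5}]
chosen = entry_with_most_replicates(entries)

values = {"LINE_A": 7.9, "LINE_B": 5.1, "LINE_C": 6.4, "LINE_D": 8.3, "LINE_E": 4.8}
sensitive, resistant = select_extremes(values, 2)
print(chosen["id"], sensitive, resistant)
```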

Method of Treating Individuals with Cancer

The methods described herein also include treating an individual afflicted with cancer. This method involves administering an effective amount of a chemotherapeutic agent to those individuals predicted to be responsive to such therapy. In the alternative, an effective amount of a combination of chemotherapeutic agents may be administered to individuals predicted to be responsive to combination therapy. In the instance where the individual is predicted to be a non-responder, a physician may decide to administer alternative therapeutic agents alone. In many instances, the treatment will comprise a combination of chemotherapeutic agents.

The methods described herein include, but are not limited to, treating individuals afflicted with NSCLC, breast cancer and ovarian cancer. In one aspect, a chemotherapeutic agent is administered in an effective amount by itself (e.g., for complete responders). In another embodiment, the therapeutic agent is administered with an alternative chemotherapeutic in an effective amount concurrently. In another embodiment, the two therapeutic agents are administered in an effective amount in a sequential manner. In yet another embodiment, the alternative therapeutic agent is administered in an effective amount by itself. In yet another embodiment, the alternative therapeutic agent is administered in an effective amount first and then followed concurrently or step-wise by a second or third chemotherapeutic agent.

Methods of Predicting/Estimating the Efficacy of a Therapeutic Agent in Treating an Individual Afflicted with Cancer

One aspect of the invention provides a method for predicting, estimating, aiding in the prediction of, or aiding in the estimation of, the efficacy of a therapeutic agent in treating a subject afflicted with cancer. In certain embodiments, the methods of the application are performed outside of the human body.

One method comprises (a) determining the expression level of multiple genes in a tumor biopsy sample from the subject; (b) defining the value of one or more metagenes from the expression levels of step (a), wherein each metagene is defined by extracting a single dominant value using singular value decomposition (SVD) from a chemotherapy responsivity predictor set; and (c) averaging the predictions of one or more statistical tree models applied to the values of the metagenes, wherein each model includes one or more nodes, each node representing a metagene, each node including a statistical predictive probability of tumor sensitivity to the therapeutic agent, thereby estimating the efficacy of a therapeutic agent in a subject afflicted with cancer.

Another method comprises (a) determining the expression level of multiple genes in a tumor biopsy sample from the subject; (b) defining the value of one or more metagenes from the expression levels of step (a), wherein each metagene is defined by extracting a single dominant value using singular value decomposition (SVD) from a chemotherapy responsivity predictor set; and (c) averaging the predictions of one or more binary regression models applied to the values of the metagenes, wherein each model includes a statistical predictive probability of tumor sensitivity to the therapeutic agent, thereby estimating the efficacy of a therapeutic agent in a subject afflicted with cancer.
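Step (c) of the binary regression variant may be illustrated by the following sketch, in which each model is a probit regression on the metagene values and the model predictions are combined by an unweighted mean. The coefficients and metagene values are hypothetical, and the unweighted mean is an assumption made for the example; the models described herein may be combined differently.

```python
import math
from typing import List, Sequence, Tuple

# A model is an (intercept, coefficients) pair; one coefficient per metagene.
Model = Tuple[float, Sequence[float]]


def probit_probability(intercept: float,
                       coefs: Sequence[float],
                       metagenes: Sequence[float]) -> float:
    """Probability of sensitivity under a probit link: Phi(intercept + coefs . m)."""
    z = intercept + sum(c * m for c, m in zip(coefs, metagenes))
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def averaged_prediction(models: List[Model], metagenes: Sequence[float]) -> float:
    """Unweighted mean of the per-model predicted probabilities."""
    probs = [probit_probability(b0, b, metagenes) for b0, b in models]
    return sum(probs) / len(probs)


# Hypothetical models and metagene values for illustration only.
models = [(0.0, [1.0, 0.0]), (0.0, [0.0, 1.0])]
p_neutral = averaged_prediction(models, [0.0, 0.0])
p_high = averaged_prediction(models, [2.0, 2.0])
print(p_neutral, p_high)
```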

In one embodiment, the methods predict the efficacy of a therapeutic agent in treating a subject afflicted with cancer with at least 70% accuracy. In another embodiment, the methods predict the efficacy of a therapeutic agent in treating a subject afflicted with cancer with at least 80% accuracy. In another embodiment, the methods predict the efficacy of a therapeutic agent in treating a subject afflicted with cancer with at least 85% accuracy. In another embodiment, the methods predict the efficacy of a therapeutic agent in treating a subject afflicted with cancer with at least 90% accuracy. In another embodiment, the methods predict the efficacy of a therapeutic agent in treating a subject afflicted with cancer with at least 70%, 75%, 80%, 85%, 90% or 95% accuracy when tested against a validation sample. In another embodiment, the methods predict the efficacy of a therapeutic agent in treating a subject afflicted with cancer with at least 70%, 75%, 80%, 85%, 90% or 95% accuracy when tested against a set of training samples. In another embodiment, the methods predict the efficacy of a therapeutic agent in treating a subject afflicted with cancer with at least 70%, 75%, 80%, 85%, 90% or 95% accuracy when tested on human primary tumors ex vivo or in vivo. Accuracy is the ability of the methods to predict whether a cancer is sensitive or resistant to the chemotherapeutic agent.

The methods predict the efficacy of a therapeutic agent to treat a subject with cancer with at least 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, 99% or 100% sensitivity for a particular chemotherapeutic agent. In another embodiment, the methods predict the efficacy of a therapeutic agent in treating a subject afflicted with cancer with at least 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, 99% or 100% sensitivity when tested against a validation sample. In another embodiment, the methods predict the efficacy of a therapeutic agent in treating a subject afflicted with cancer with at least 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, 99% or 100% sensitivity when tested against a set of training samples. In another embodiment, the methods predict the efficacy of a therapeutic agent in treating a subject afflicted with cancer with at least 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, 99% or 100% sensitivity when tested on human primary tumors ex vivo or in vivo. Sensitivity measures the ability of the methods to predict all cancers that will be sensitive to the chemotherapeutic agent.
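For clarity, accuracy and sensitivity as used above may be computed from paired observed and predicted labels as in the following sketch. The labels are hypothetical (1 denotes a cancer sensitive to the agent, 0 a resistant cancer), and the function names are chosen for the example only.

```python
from typing import Sequence


def accuracy(observed: Sequence[int], predicted: Sequence[int]) -> float:
    """Fraction of cancers whose sensitive/resistant status is predicted correctly."""
    matches = sum(1 for o, p in zip(observed, predicted) if o == p)
    return matches / len(observed)


def sensitivity(observed: Sequence[int], predicted: Sequence[int]) -> float:
    """Fraction of truly sensitive cancers that are predicted to be sensitive."""
    true_pos = sum(1 for o, p in zip(observed, predicted) if o == 1 and p == 1)
    actual_pos = sum(1 for o in observed if o == 1)
    if actual_pos == 0:
        raise ValueError("no sensitive cancers among the observed labels")
    return true_pos / actual_pos


# Hypothetical labels for illustration only.
observed = [1, 1, 1, 0, 0, 1, 0, 0]
predicted = [1, 1, 0, 0, 0, 1, 0, 0]
acc = accuracy(observed, predicted)
sens = sensitivity(observed, predicted)
print(acc, sens)
```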

(A) Sample of the cancer

In one embodiment, the methods comprise determining the expression level of genes in a tumor sample from the subject. In certain embodiments, the tumor is a breast tumor, an ovarian tumor, or a lung tumor. In one embodiment, the tumor is not a breast tumor. In one embodiment, the tumor is not an ovarian tumor. In one embodiment, the tumor is not a lung tumor. In one embodiment of the methods described herein, the methods comprise the step of surgically removing a tumor sample from the subject, obtaining a tumor sample from the subject, or providing a tumor sample from the subject.

Alternatively, the sample may be derived from cells from the cancer, or cancerous cells. In another embodiment, the cells may be from ascites surrounding the tumor. The sample may contain nucleic acids from the cancer. Any method may be used to remove the sample from the patient. In one embodiment, at least 40%, 50%, 60%, 70%, 80% or 90% of the cells in the sample are cancer cells. In preferred embodiments, samples having greater than 50% cancer cell content are used. In one embodiment, the sample is a live tumor sample. In another embodiment, the sample is a frozen sample. In one embodiment, the sample is one that was frozen within less than 5, 4, 3, 2, 1, 0.75, 0.5, 0.25, 0.1, or 0.05 hours after extraction from the patient. Frozen samples include those stored in liquid nitrogen or at a temperature of about -80 °C or below.

(B) Gene Expression

The expression of the genes may be determined using any method known in the art for assaying gene expression. Gene expression may be determined by measuring mRNA or protein levels for the genes. In one embodiment, an mRNA transcript of a gene may be detected for determining the expression level of the gene. Based on the sequence information provided by the GenBank™ database entries, the genes can be detected and expression levels measured using techniques well known to one of ordinary skill in the art, including but not limited to RT-PCR, Northern blot analysis and microarray analysis. For example, sequences within the sequence database entries corresponding to polynucleotides of the genes can be used to construct probes for detecting mRNAs by, e.g., Northern blot hybridization analyses. The hybridization of the probe to a gene transcript in a subject biological sample can also be carried out on a DNA array. The use of an array is suitable for detecting the expression level of a plurality of the genes. As another example, the sequences can be used to construct primers for specifically amplifying the polynucleotides in, e.g., amplification-based detection methods such as reverse-transcription based polymerase chain reaction (RT-PCR). As another example, mRNA levels can be assayed by quantitative RT-PCR. Furthermore, the expression level of the genes can be analyzed based on the biological activity or quantity of proteins encoded by the genes. Methods for determining the quantity of the protein include immunoassay methods such as Western blot analysis.

In one exemplary embodiment, about 1-50 mg of cancer tissue was added to a chilled tissue pulverizer, such as to a BioPulverizer H tube (Bio101 Systems, Carlsbad, CA). Lysis buffer, such as from the Qiagen RNeasy Mini kit, was added to the tissue and homogenized. A device such as a Mini-Beadbeater (Biospec Products, Bartlesville, OK) was used. Tubes were spun briefly as needed to pellet the mixture and reduce foam. The resulting lysate was passed through a syringe fitted with a needle, such as a 21 gauge needle, to shear DNA. Total RNA was extracted using commercially available kits, such as the Qiagen RNeasy Mini kit. The samples were prepared and arrayed using Affymetrix U133 plus 2.0 GeneChips or Affymetrix U133A GeneChips. Any suitable gene chip may be used.

In one exemplary embodiment, total RNA was extracted using the Qiashredder and Qiagen RNeasy Mini kit and the quality of RNA was checked by an Agilent 2100 Bioanalyzer. The targets for Affymetrix DNA microarray analysis were prepared according to the manufacturer's instructions. Biotin-labeled cRNA, produced by in vitro transcription, was fragmented and hybridized to the Affymetrix U133A GeneChip arrays at 45 °C for 16 hrs and then washed and stained using the GeneChip Fluidics. The arrays were scanned by a GeneArray Scanner and patterns of hybridization were detected as light emitted from the fluorescent reporter groups incorporated into the target and hybridized to oligonucleotide probes. Full details of the methods used for RNA extraction and development of gene expression data from lung and ovarian tumors have been described previously (Bild A, Yao G, Chang JT, et al: Oncogenic pathway signatures in human cancers as a guide to targeted therapies. Nature 439(7074):353-357, 2006; Potti A, Dressman HK, Bild A, et al: Genomic signatures to guide the use of chemotherapeutics. Nature Medicine 12(11):1294-1300, 2006).

In one embodiment, determining the expression level (or obtaining a first gene expression profile) of multiple genes in a tumor sample from the subject comprises extracting a nucleic acid sample from the sample from the subject. In certain embodiments, the nucleic acid sample is an mRNA sample. In one embodiment, the expression level of the nucleic acid is determined by hybridizing the nucleic acid, or amplification products thereof, to a DNA microarray. Amplification products may be generated, for example, with reverse transcription, optionally followed by PCR amplification of the products.

(C) Genes Screened

In one embodiment, the predictive methods of the invention comprise determining the expression level of all the genes in the cluster that define at least one therapeutic sensitivity/resistance determinative metagene. In one embodiment, the predictive methods of the invention comprise determining the expression level of at least 50%, 60%, 70%, 80%, 90%, 95%, 98%, 99% of the genes in each of the clusters that defines 1 or 2 or more therapeutic sensitivity/resistance determinative metagenes. A metagene is a cluster or set of genes which may be used to predict sensitivity or resistance to a therapeutic agent.

In one embodiment, at least 50%, 60%, 70%, 80%, 90%, 95%, 98%, 99% of the genes whose expression levels are used in order to predict sensitivity to the chemotherapeutic agent (or the genes in the cluster that define a metagene having said predictivity) are genes listed in one of Tables 1-8. In one embodiment, at least 50%, 60%, 70%, 80%, 90%, 95%, 98%, 99% of the genes whose expression levels are determined to predict sensitivity to more than one chemotherapeutic agent (or the genes in the cluster that define a metagene having said predictivity) includes genes listed in more than one of Tables 1-8.

In one embodiment, at least 50%, 60%, 70%, 80%, 90%, 95%, 98%, 99% of the genes listed in one of Tables 1-8 are used to predict responsiveness of a cancer to the corresponding chemotherapeutic agent. Tables 1-8 show the genes in the cluster that are used to define metagenes and indicate the therapeutic agent whose sensitivity it predicts. In one embodiment, at least 50%, 60%, 70%, 80%, 90%, 95%, 98%, 99% of the genes whose expression levels are determined to predict 5-FU sensitivity (or the genes in the cluster that define a metagene having said predictivity) are genes represented by the following symbols: LOC92755 (TUBB, LOC648765), CDKN2A, TRA@, GABRA3, COL11A2, ACTB, PDLIM4, ACTA2, FTSJ1, NBR1 (LOC727732), CFL1, ATP1A2, APOC4, KIAA1509, ZNF516, GRIK5, PDE5A, ARSF, ZC3H7B, WBP4, CSTB, TSPY1 (TSPY2, LOC653174, LOC728132, LOC728137, LOC728395, LOC728403, LOC728412), HTR2B, KBTBD11, SLC25A17, HMGN3, FIBP, IFT140, FAM63B, ZNF337, KIAA0100, FAM13C1, STK25, CPNE1, PEX19, EIF5B, EEF1A1 (APOLD1, LOC440595), SRR, THEM2, ID4, GGT1 (GGTL4), IFNA10, TUBB2A (TUBB4, TUBB2B), and TUBB3.

In one embodiment, at least 50%, 60%, 70%, 80%, 90%, 95%, 98%, 99% of the genes whose expression levels are determined to predict adriamycin sensitivity are genes represented by the following symbols: MLANA, CSPG4, DDR2, ETS2, EGFR, BIK, CD24, ZNF185, DSCR1, GSN, TPST1, LCN2, FAIM3, NCK2, PDZRN3, FKBP2, KRT8, NRP2, PKP2, CLDN3, CAPN1, STXBP1, LY96, WWC1, C10orf56, SPINT2, MAGED2, SYNGR2, SGCD, LAMC2, C19orf21, ZFHX1B, KRT18, CYBA, DSP, ID1, ID1, PSAP, ZNF629, ARHGAP29, ARHGAP8 (LOC553158), GPM6B, EGFR, CALU, KCNK1, RNF144, FEZ1, MEST, KLF5, CSPG4, FLNB, GYPC, SLC23A2, MITF, PITPNM1, GPNMB, PMP22, PLXNB3 (SRPK3), MIA, RAB40C, MAD2L1BP, PLOD3, VIL2, KLF9, PODXL, ATP6V1B2, SLC6A8, PLP1, KRT7, PKP3, DLG3, ZHX2, LAMA5, SASH1, GAS1, TACSTD1, GAS1, and CYP27A1.

In one embodiment, at least 50%, 60%, 70%, 80%, 90%, 95%, 98%, 99% of the genes whose expression levels are determined to predict Cytoxan sensitivity (or the genes in the cluster that define a metagene having said predictivity) are genes represented by the following symbols: DAP3, RPS9, TTR, ACTB, MARCKS, GGT1 (GGT2, GGTL4, GGTLA4, LOC643171, LOC653590, LOC728226, LOC728441, LOC729838, LOC731629), FANCA, CDC42EP3, TSPAN4, C6orf145, ARNT2, KIF22 (LOC728037), NBEAL2, CAV1, SCRN1, SCHIP1, PHLDB1, AKAP12, ST5, SNAI2, ESD, ANP32B, CD59, ACTN1, CD59, PEG10, SMARCA1, GGCX, SAMD4A, CNN3, LPP, SNRPF, SGCE, CALD1, and C22orf5.

In one embodiment, at least 50%, 60%, 70%, 80%, 90%, 95%, 98%, 99% of the genes whose expression levels are determined to predict docetaxel sensitivity (or the genes in the cluster that define a metagene having said predictivity) are genes represented by the following symbols: BLR1, EIF4A2, FLT1, BAD, PIP5K3, BIN1, YBX1, BCKDK, DOHH, FOXD1, TEX261, NBR1 (LOC727732), APOA4, DDX5, TBCA, USP52, SLC25A36, CHP, ANKRD28, PDXK, ATP6AP1, SETD2, CCS, BRD2, ASPHD1, B4GALT6, ASL, CAPZA2, STARD3, LIMK2 (PPP1R14BP1), BANF1, GNB2, ENSA, SH3GL1, ACVR1B, SLC6A1, PPP2R1A, PCGF1, LOC643641, INPP5A, TLE1, PLLP, ZKSCAN1, TIAL1, TK1, PPP2R1A, and PSMB6.

In one embodiment, at least 50%, 60%, 70%, 80%, 90%, 95%, 98%, 99% of the genes whose expression levels are determined to predict etoposide sensitivity are genes represented by the following symbols: LIMK1, LIG3, AXL, IFI16, MMP14, GRB7, VAV2, FLT1, JUP, FN1, FN1, PKM2, LYPLA3, RFTN1, LAD1, SPINT1, CLDN3, PTRF, SPINT2, MMP14, FAAH, CLDN4, ST14, C19orf21, KIAA0506, LLGL2 (MADD), COBL, ZFHX1B, GBP1, IER2, PPL, TMEM30B, CNKSR1, CLDN7, BTN3A2, BTN3A2, TUBB2A, MAP7, HNRNPG-T, UGCG, GAK, PKP3, DFNA5, DAB2, TACSTD1, SPARC, and PPP2R5A.

In one embodiment, at least 50%, 60%, 70%, 80%, 90%, 95%, 98%, 99% of the genes whose expression levels are determined to predict taxol sensitivity (or the genes in the cluster that define a metagene having said predictivity) are genes represented by the following symbols: NR2F6, TOP2B, RARG, PCNA, PTPN11, ATM, NFATC4, CACNG1, C22orf31, PIK3R2, PRSS12, MYH8, SCCPDH, PHTF2, IQSEC2, TRPC3, TRAFD1, HEPH, SOX30, GATM, LMNA, HD, YIPF3, DNPEP, PCDH9, KLHDC3, SLC10A3, LHX2, CKS2, SECTM1, SF1, RPS6KA4, DYRK2, GDI2, and IFI30.

In one embodiment, at least 50%, 60%, 70%, 80%, 90%, 95%, 98%, 99% of the genes whose expression levels are determined to predict topotecan sensitivity (or the genes in the cluster that define a metagene having said predictivity) are genes represented by the following symbols: DUSP1, THBS1, AXL, RAP1GAP, QSCN6, IL1R1, TGFBI, PTX3, BLM, TNFRSF1A, FGF2, VEGFC, ACO2, FARSLA, RIN2, FGF2, RRAS, FIGF, MYB, CDH2, FGFR1, FGFR1, LAMC1, HIST1H4K (HIST1H4J), COL6A2, TMC6, PEA15, MARCKS, CKAP4, GJA1, FBN1, BASP1, BASP1, BTN2A1, ITGB1, DKFZP686A01247, MYLK, LOXL2, HEG1, DEGS1, CAP2, CAP2, PTGER4, BAI2, NUAK1, DLEU1 (SPANXC), RAB11FIP5, FSTL3, MYL6, VIM, GNA12, PRAF2, PTRF, CCL2, PLOD2, COL6A2, ATP5G3, GSR, NDUFS3, ST14, NID1, MYO1D, SDHB, CAV1, DPYSL3, PTRF, FBXL2, RIN2, PLEKHC1, CTGF, COL4A2, TPM1, TPM1, TPM1, FZD2, LOXL1, SYK, HADHA, TNFAIP1, NNMT, HPGD, MRC2, MEIS3P1, AOX1, SEMA3C, SEMA3C, SYNE1, SERPINE1, IL6, RRAS, GPD1L, AXL, WDR23, CLDN7, IL15, TNFAIP2, CYR61, LRP1, AMOTL2, PDE1B, SPOCK1, RAI14, PXDN, COL4A1, C1R, KIAA0802 (C21orf57), C5orf13, TUFM, EDIL3, BDNF, PRSS23, ATP5A1, FRAT2, C16orf51, TUSC4, NUP50, TUBA3, NFIB, TLE4, AKT3, CRIM1, RAD23A, COX5A, SMCR7L, MXRA7, STARD7, STC1, TTC28, PLK2, TGDS, CALD1, OPTN, IFITM3, DFNA5, FGFR1, HTATIP, SYK, LAMB1, FZD2, SERPINE1, THBS1, CCL2, ITGA3, ITGA3, and UBE2A.

(D) Metagene Valuation

In one embodiment, the predictive methods of the invention comprise defining the value of one or more metagenes from the expression levels of the genes. A metagene value is defined by extracting a single dominant value from a cluster of genes associated with sensitivity to an anti-cancer agent.

In one embodiment, the dominant single value is obtained using singular value decomposition (SVD). In one embodiment, the cluster of genes of each metagene or at least of one metagene comprises at least 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 18, 20 or 25 genes.
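By way of illustration only, the extraction of a single dominant value per sample from a cluster of genes might be sketched as follows. The sketch assumes that each gene is mean-centered across samples and that the dominant singular component is found by power iteration (a library SVD routine would normally be used instead). The function name metagene_values, the centering step, the sign convention and the toy expression values are assumptions made for this example and are not taken from the disclosure.

```python
import math

def metagene_values(expr, iters=100):
    """Return one metagene value per sample for a cluster of genes.

    expr: list of genes, each a list of expression values (one per sample).
    The value returned for each sample is sigma * v, where sigma and v are
    the dominant singular value and right singular vector of the
    gene-centered matrix (found here by power iteration, an assumption).
    """
    n_samples = len(expr[0])
    centered = []
    for row in expr:
        mean = sum(row) / n_samples
        centered.append([x - mean for x in row])

    # Start from the centered gene with the largest norm; it lies in the
    # row space, so the iteration cannot collapse to the zero vector.
    v = max(centered, key=lambda r: sum(x * x for x in r))
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0] * n_samples  # no variation across samples
    v = [x / norm for x in v]

    for _ in range(iters):
        u = [sum(a * b for a, b in zip(row, v)) for row in centered]      # A v
        w = [sum(centered[g][s] * u[g] for g in range(len(centered)))
             for s in range(n_samples)]                                    # A^T u
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    u = [sum(a * b for a, b in zip(row, v)) for row in centered]
    sigma = math.sqrt(sum(x * x for x in u))
    # Sign convention (assumed): gene loadings sum to a non-negative number.
    sign = 1.0 if sum(u) >= 0 else -1.0
    return [sign * sigma * x for x in v]

# Toy cluster: three hypothetical genes that follow one shared pattern.
toy_cluster = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [3.0, 6.0, 9.0, 12.0],
]
toy_metagene = metagene_values(toy_cluster)
```

In this toy cluster every gene is a multiple of the same pattern, so the single metagene captures all of the variation and increases from the first sample to the last.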

In one embodiment, the predictive methods of the invention comprise defining the value of at least one metagene wherein the genes in the cluster of genes from which the metagene is defined share at least 50%, 60%, 70%, 80%, 90%, 95% or 98% of genes in common with the genes in one of Tables 1-8. In one embodiment, the predictive methods of the invention comprise defining the value of at least two metagenes, wherein the genes in the cluster of genes from which each metagene is defined share at least 50%, 60%, 70%, 80%, 90%, 95% or 98% of genes in common with the genes in any one of Tables 1-8. In one embodiment, the predictive methods of the invention comprise defining the value of a metagene from a cluster of genes, wherein at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 genes in the cluster are selected from the genes listed in one of Tables 1-8.
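As a minimal sketch of how such a shared-gene percentage could be checked, the fraction of a cluster's genes that also appear in a table may be computed with a set intersection. The function name fraction_in_table is hypothetical, the table excerpt below is only a five-symbol illustration rather than a full table, and GENE_X is a placeholder symbol.

```python
def fraction_in_table(cluster_genes, table_genes):
    """Fraction of the distinct genes in a cluster that are listed in a table."""
    cluster = set(cluster_genes)
    if not cluster:
        return 0.0
    return len(cluster & set(table_genes)) / len(cluster)

# Illustrative excerpt only; GENE_X is a placeholder not drawn from any table.
table_excerpt = ["NR2F6", "TOP2B", "RARG", "PCNA", "ATM"]
cluster = ["TOP2B", "PCNA", "ATM", "GENE_X"]
shared = fraction_in_table(cluster, table_excerpt)   # 3 of 4 genes
meets_50_percent = shared >= 0.50
```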

In one embodiment, the clusters of genes that define each metagene were identified using supervised classification methods of analysis as previously described. See, for example, West, M. et al. Proc Natl Acad Sci USA 98, 11462-11467 (2001). A set of genes whose expression levels are most highly correlated with the classification of tumor samples into sensitivity to an anti-cancer agent versus no sensitivity to an anti-cancer agent was selected. The dominant principal components from such a set of genes define a relevant phenotype-related metagene, and regression models, such as binary regression models, were used to assign the relative probability of sensitivity to an anti-cancer agent.

(E) Predictions from Tree Models

In one embodiment, the methods comprise averaging the predictions of one or more statistical tree models applied to the metagene values, wherein each model includes one or more nodes, each node representing a metagene, each node including a statistical predictive probability of sensitivity to an anti-cancer agent. The statistical tree models may be generated using the methods described herein for the generation of tree models. General methods of generating tree models may also be found in the art (See for example Pitman et al. Biostatistics 2004;5:587-601; Denison et al. Biometrika 1999;85:363-77; Nevins et al. Hum Mol Genet 2003;12:R153-7; Huang et al. Lancet 2003;361:1590-6; West et al. Proc Natl Acad Sci USA 2001;98:11462-7; U.S. Patent Pub. Nos. 2003-0224383; 2004-0083084; 2005-0170528; 2004-0106113; and U.S. Application No. 11/198782).

In one embodiment, the methods comprise deriving a prediction from a single statistical tree model, wherein the model includes one or more nodes, each node representing a metagene, each node including a statistical predictive probability of sensitivity to an anticancer agent. In alternative embodiments, the tree may comprise at least 2, 3, 4, or 5 nodes.

In one embodiment, the methods comprise averaging the predictions of one or more statistical tree models applied to the metagene values, wherein each model includes one or more nodes, each node representing a metagene, each node including a statistical predictive probability of sensitivity to an anti-cancer agent. Accordingly, the invention provides methods that use mixed trees, where a tree may contain at least two nodes, where each node represents a metagene representative of the sensitivity/resistance to a particular agent.
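The averaging of tree-model predictions may be pictured with the following sketch. It assumes that each node is a metagene/threshold pair, that each leaf stores a predictive probability of sensitivity, and that the trees are combined by a simple unweighted mean; a weighted average over models would be an equally plausible reading. The dictionary layout, the function names and all numeric values are hypothetical.

```python
def tree_predict(node, metagenes):
    """Walk one tree: internal nodes are dicts, leaves are probabilities."""
    while isinstance(node, dict):
        value = metagenes[node["metagene"]]
        node = node["high"] if value > node["threshold"] else node["low"]
    return node

def averaged_prediction(trees, metagenes):
    """Unweighted mean of the per-tree probabilities (an assumption)."""
    return sum(tree_predict(t, metagenes) for t in trees) / len(trees)

# Two hypothetical trees built on two metagenes (indices 0 and 1).
tree_a = {
    "metagene": 0, "threshold": 0.0,
    "low": 0.2,
    "high": {"metagene": 1, "threshold": 1.0, "low": 0.6, "high": 0.9},
}
tree_b = {"metagene": 1, "threshold": 0.5, "low": 0.3, "high": 0.7}

sample_metagenes = [0.4, 0.8]
p_sensitive = averaged_prediction([tree_a, tree_b], sample_metagenes)
```

For this sample, tree_a reaches its 0.6 leaf and tree_b reaches its 0.7 leaf, so the averaged probability is about 0.65.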

In one embodiment, the statistical predictive probability was derived from a Bayesian analysis. In another embodiment, the Bayesian analysis included a sequence of Bayes factor based tests of association to rank and select predictors that define a node binary split, the binary split including a predictor/threshold pair. Bayesian analysis is an approach to statistical analysis that is based on Bayes' law, which states that the posterior probability of a parameter p is proportional to the prior probability of parameter p multiplied by the likelihood of p derived from the data collected. This methodology represents an alternative to the traditional (or frequentist probability) approach: whereas the latter attempts to establish confidence intervals around parameters, and/or falsify a priori null hypotheses, the Bayesian approach attempts to keep track of how a priori expectations about some phenomenon of interest can be refined, and how observed data can be integrated with such a priori beliefs, to arrive at updated posterior expectations about the phenomenon. Bayesian analysis has been applied to numerous statistical models to predict outcomes of events based on available data. These include standard regression models, e.g. binary regression models, as well as more complex models that are applicable to multi-variate and essentially non-linear data.
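The proportionality between posterior, prior and likelihood can be made concrete with a small grid calculation. The sketch below assumes a parameter p (the probability that a cell line is sensitive), a uniform prior over three candidate values, and hypothetical data in which 3 of 4 cell lines are sensitive; none of these numbers come from the disclosure.

```python
def grid_posterior(prior, likelihood):
    """Posterior over a grid of parameter values: prior * likelihood, normalized."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# Candidate values for p and a uniform prior (both assumptions).
p_grid = [0.25, 0.50, 0.75]
prior = [1 / 3, 1 / 3, 1 / 3]

# Hypothetical data: 3 sensitive and 1 resistant cell line.
likelihood = [p ** 3 * (1 - p) for p in p_grid]

posterior = grid_posterior(prior, likelihood)
# With a uniform prior the posterior is proportional to the likelihood,
# i.e. 3/46, 16/46 and 27/46 for the three grid values.
```

The posterior mass shifts toward p = 0.75, which is the grid value most consistent with the assumed data.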

Another such model is commonly known as the tree model, which is essentially based on a decision tree. Decision trees can be used in classification, prediction and regression. A decision tree model is built starting with a root node, and training data are partitioned to what are essentially the "children" nodes using a splitting rule. For instance, for classification, training data contain sample vectors that have one or more measurement variables and one variable that determines the class of the sample. Various splitting rules may be used. A statistical predictive tree model to which Bayesian analysis is applied may consistently deliver accurate results with high predictive capabilities. Other statistical models known to those of skill in the art may be used. Gene expression signatures that reflect the activity of a given pathway may be identified using the supervised classification method of analysis previously described (e.g., West, M. et al. Proc Natl Acad Sci USA 98, 11462-11467, 2001). The analysis selects a set of genes whose expression levels are most highly correlated with the classification of tumor samples into sensitivity to an anti-cancer agent versus no sensitivity to an anti-cancer agent. The dominant principal components from such a set of genes then define a relevant phenotype-related metagene, and regression models assign the relative probability of sensitivity to an anti-cancer agent.
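The gene-selection step can be sketched as a ranking of genes by the strength of their association with the sensitive/resistant labels. The sketch below uses the absolute Pearson correlation with a 0/1 label as a simple stand-in for the association measure; the disclosure itself refers to supervised classification and regression methods, so this substitution is an assumption, as are the function names, gene names and expression values.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def top_correlated_genes(expr_by_gene, labels, k):
    """Return the k genes whose expression tracks the class labels most strongly."""
    ranked = sorted(
        expr_by_gene,
        key=lambda gene: abs(pearson(expr_by_gene[gene], labels)),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical expression values for four samples; 1 = sensitive, 0 = resistant.
labels = [0, 0, 1, 1]
expr_by_gene = {
    "GENE_A": [1.0, 1.2, 3.0, 3.1],   # higher in sensitive samples
    "GENE_B": [2.0, 2.1, 1.9, 2.0],   # weakly related to the labels
    "GENE_C": [5.0, 4.8, 1.0, 1.2],   # lower in sensitive samples
}
selected = top_correlated_genes(expr_by_gene, labels, k=2)
```

The selected genes would then form the cluster from which a metagene is derived; both positively and negatively associated genes are retained because the ranking uses the absolute correlation.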

In one embodiment, each statistical tree model generated by the methods described herein comprises 2, 3, 4, 5, 6 or more nodes. In one embodiment of the methods described herein for defining a statistical tree model predictive of sensitivity/resistance to a therapeutic, the resulting model predicts cancer sensitivity to an anti-cancer agent with at least 70%, 80%, 85%, or 90% or higher accuracy. In another embodiment, the model predicts sensitivity to an anti-cancer agent with greater accuracy than clinical variables. In one embodiment, the clinical variables are selected from age of the subject, gender of the subject, tumor size of the sample, stage of cancer disease, histological subtype of the sample and smoking history of the subject. In one embodiment, the cluster of genes that defines each metagene comprises at least 3, 4, 5, 6, 7, 8, 9, 10, 12 or 15 genes. In one embodiment, the correlation-based clustering is Markov chain correlation-based clustering or K-means clustering.

Gene Chips and Kits

Arrays and microarrays which contain the gene expression profiles for determining responsivity to the chemotherapeutic agents as disclosed here are also encompassed within the scope of this invention. Methods of making arrays are well-known in the art and as such do not need to be described in detail here.

Such arrays can contain the profiles of 5, 10, 15, 20, 25, 30, 40, 50, 75, 100, 150, 200 or more genes as disclosed in the Tables. Accordingly, arrays for detection of responsivity to particular therapeutic agents can be customized for diagnosis or treatment of specific cancers, such as ovarian cancer, breast cancer, or NSCLC. The array can be packaged as part of a kit comprising the customized array itself and a set of instructions for how to use the array to determine an individual's responsivity to a specific cancer therapeutic agent.

Also provided are reagents and kits for practicing one or more of the above described methods. The subject reagents and kits thereof may vary greatly. Reagents of interest include reagents specifically designed for use in production of the above described metagene values.

One type of such reagent is an array probe of nucleic acids, such as a DNA chip, in which the genes defining the metagenes in the therapeutic efficacy predictive tree models are represented. A variety of different array formats are known in the art, with a wide variety of different probe structures, substrate compositions and attachment technologies. Representative array structures of interest include those described in U.S. Pat. Nos. 5,143,854; 5,288,644; 5,324,633; 5,432,049; 5,470,710; 5,492,806; 5,503,980; 5,510,270; 5,525,464; 5,547,839; 5,580,732; 5,661,028; 5,800,992; as well as WO 95/21265; WO 96/31622; WO 97/10365; WO 97/27317; EP 373 203; and EP 785 280; the disclosures of which are herein incorporated by reference.

The DNA chip is conveniently used to compare the expression levels of a number of genes at the same time. DNA chip-based expression profiling can be carried out, for example, by the method as disclosed in "Microarray Biochip Technology" (Mark Schena, Eaton Publishing, 2000). A DNA chip comprises immobilized high-density probes to detect a number of genes. Thus, the expression levels of many genes can be estimated at the same time by a single-round analysis. Namely, the expression profile of a specimen can be determined with a DNA chip. A DNA chip may comprise probes, which have been spotted thereon, to detect the expression level of the metagene-defining genes of the present invention, i.e. the genes described in Tables 1-8. A probe may be designed for each marker gene selected, and spotted on a DNA chip. Such a probe may be, for example, an oligonucleotide comprising 5-50 nucleotide residues. Methods for synthesizing such oligonucleotides on DNA chips are known to those skilled in the art. Longer DNAs can be synthesized by PCR or chemically. Methods for spotting long DNA, which is synthesized by PCR or the like, onto a glass slide are also known to those skilled in the art. A DNA chip that is obtained by the methods described above can be used for estimating the efficacy of a therapeutic agent in treating a subject afflicted with cancer according to the present invention. DNA microarrays and methods of analyzing data from microarrays are well-described in the art, including in DNA Microarrays: A Molecular Cloning Manual, ed. by Bowtell and Sambrook (Cold Spring Harbor Laboratory Press, 2002); Microarrays for an Integrative Genomics by Kohane (MIT Press, 2002); A Biologist's Guide to Analysis of DNA Microarray Data, by Knudsen (Wiley, John & Sons, Incorporated, 2002); DNA Microarrays: A Practical Approach, Vol. 205 by Schena (Oxford University Press, 1999); and Methods of Microarray Data Analysis II, ed. by Lin et al. (Kluwer Academic Publishers, 2002), all of which are incorporated herein by reference.

One aspect of the invention provides a kit comprising: (a) any of the gene chips described herein; and (b) one of the computer-readable mediums described herein. In some embodiments, the arrays include probes for at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, or 50 of the genes listed in one of Tables 1-8. In certain embodiments, the number of genes that are from one of the Tables that are represented on the array is at least 5, at least 10, at least 25, at least 50, at least 75 or more, including all of the genes listed in the table. Where the subject arrays include probes for additional genes not listed in the tables, in certain embodiments the percentage of additional genes that are represented does not exceed about 50%, 40%, 30%, 20%, 15%, 10%, 8%, 6%, 5%, 4%, 3%, 2% or 1%. In some embodiments, a great majority of genes in the collection are genes that define the metagenes of the invention, whereby great majority is meant at least about 75%, usually at least about 80% and sometimes at least about 85, 90, 95% or higher, including embodiments where 100% of the genes in the collection are metagene-defining genes. In an alternative embodiment, the arrays for use in the invention may include a majority of probes that are not listed in any of Tables 1-8.

The kits of the subject invention may include the above described arrays or gene chips. The kits may further include one or more additional reagents employed in the various methods, such as primers for generating target nucleic acids, dNTPs and/or rNTPs, which may be either premixed or separate, one or more uniquely labeled dNTPs and/or rNTPs, such as biotinylated or Cy3 or Cy5 tagged dNTPs, gold or silver particles with different scattering spectra, or other post synthesis labeling reagent, such as chemically active derivatives of fluorescent dyes, enzymes, such as reverse transcriptases, DNA polymerases, RNA polymerases, and the like, various buffer mediums, e.g. hybridization and washing buffers, prefabricated probe arrays, labeled probe purification reagents and components, like spin columns, etc., signal generation and detection reagents, e.g. streptavidin-alkaline phosphatase conjugate, chemifluorescent or chemiluminescent substrate, and the like.

In addition to the above components, the subject kits further include instructions for practicing the subject methods. These instructions may be present in the subject kits in a variety of forms, one or more of which may be present in the kit. One form in which these instructions may be present is as printed information on a suitable medium or substrate, e.g., a piece or pieces of paper on which the information is printed, in the packaging of the kit, in a package insert, etc. Yet another means would be a computer readable medium, e.g., diskette, CD, etc., on which the information has been recorded. Yet another means that may be present is a website address which may be used via the internet to access the information at a remote site. Any convenient means of conveying instructions may be present in the kits. The kits also include packaging material such as, but not limited to, ice, dry ice, styrofoam, foam, plastic, cellophane, shrink wrap, bubble wrap, paper, cardboard, starch peanuts, twist ties, metal clips, metal cans, drierite, glass, and rubber.

Diagnostic Business Methods

One aspect of the invention provides methods of conducting a diagnostic business, including a business that provides a health care practitioner with diagnostic information for the treatment of a subject afflicted with cancer. One such method comprises one, more than one, or all of the following steps: (i) obtaining a tumor sample from the subject; (ii) determining the expression level of multiple genes in the sample; (iii) defining the value of one or more metagenes from the expression levels of step (ii), wherein each metagene is defined by extracting a single dominant value using singular value decomposition (SVD) from a cluster of genes associated with sensitivity to an anti-cancer agent; (iv) averaging the predictions of one or more statistical tree models applied to the values, wherein each model includes one or more nodes, each node representing a metagene, each node including a statistical predictive probability of sensitivity to an anti-cancer agent, wherein at least one metagene is one of metagenes 1-7; and (v) providing the health care practitioner with the prediction from step (iv).

In one embodiment, obtaining a tumor sample from the subject is effected by having an agent of the business (or a subsidiary of the business) remove a tumor sample from the subject, such as by a surgical procedure. In another embodiment, obtaining a tumor sample from the subject comprises receiving a sample from a health care practitioner, such as by shipping the sample, preferably frozen. In one embodiment, the sample is a cellular sample, such as a mass of tissue. In one embodiment, the sample comprises a nucleic acid sample, such as a DNA, cDNA, mRNA sample, or combinations thereof, which was derived from a cellular tumor sample from the subject. In one embodiment, the prediction from step (iv) is provided to a health care practitioner, to the patient, or to any other business entity that has contracted with the subject.

In one embodiment, the method comprises billing the subject, the subject's insurance carrier, the health care practitioner, or an employer of the health care practitioner. A government agency, whether local, state or federal, may also be billed for the services. Multiple parties may also be billed for the service. In some embodiments, all the steps in the method are carried out in the same general location. In certain embodiments, one or more steps of the methods for conducting a diagnostic business are performed in different locations. In one embodiment, step (ii) is performed in a first location, and step (iv) is performed in a second location, wherein the first location is remote to the second location. The other steps may be performed at either the first or second location, or in other locations. In one embodiment, the first location is remote to the second location. A remote location could be another location (e.g. office, lab, etc.) in the same city, another location in a different city, another location in a different state, another location in a different country, etc. As such, when one item is indicated as being "remote" from another, what is meant is that the two items are at least in different buildings, and may be at least one mile, ten miles, or at least one hundred miles apart. In one embodiment, two locations that are remote relative to each other are at least 1, 2, 3, 4, 5, 10, 20, 50, 100, 200, 500, 1000, 2000 or 5000 km apart. In another embodiment, the two locations are in different countries, where one of the two countries is the United States.

Some specific embodiments of the methods described herein where steps are performed in two or more locations comprise one or more steps of communicating information between the two locations. "Communicating" information means transmitting the data representing that information as electrical signals over a suitable communication channel (for example, a private or public network). "Forwarding" an item refers to any means of getting that item from one location to the next, whether by physically transporting that item or otherwise (where that is possible) and includes, at least in the case of data, physically transporting a medium carrying the data or communicating the data. The data may be transmitted to the remote location for further evaluation and/or use. Any convenient telecommunications means may be employed for transmitting the data, e.g., facsimile, modem, internet, etc.

In one specific embodiment, the method comprises one or more data transmission steps between the locations. In one embodiment, the data transmission step occurs via an electronic communication link, such as the internet. In one embodiment, the data transmission step from the first to the second location comprises experimental parameter data, such as the level of gene expression of multiple genes. In some embodiments, the data transmission step from the second location to the first location comprises data transmission to intermediate locations. In one specific embodiment, the method comprises one or more data transmission substeps from the second location to one or more intermediate locations and one or more data transmission substeps from one or more intermediate locations to the first location, wherein the intermediate locations are remote to both the first and second locations. In another embodiment, the method comprises a data transmission step in which a result from gene expression is transmitted from the second location to the first location.

In one embodiment, the methods of conducting a diagnostic business comprise the step of determining if the subject carries an allelic form of a gene whose presence correlates to sensitivity or resistance to a chemotherapeutic agent. This may be achieved by analyzing a nucleic acid sample from the patient and determining the DNA sequence of the allele. Any technique known in the art for determining the presence of mutations or polymorphisms may be used. The method is not limited to any particular mutation or to any particular allele or gene. For example, mutations in the epidermal growth factor receptor (EGFR) gene are found in human lung adenocarcinomas and are associated with sensitivity to the tyrosine kinase inhibitors gefitinib and erlotinib. (See, e.g., Yi et al. Proc Natl Acad Sci USA. 2006 May 16;103(20):7817-22; Shimato et al. Neuro-oncol 2006 Apr;8(2):137-44). Similarly, mutations in breast cancer resistance protein (BCRP) modulate the resistance of cancer cells to BCRP-substrate anticancer agents (Yanase et al., Cancer Lett. 2006 Mar 8;234(1):73-80).

Computer Readable Media Comprising Gene Expression Profiles

The invention also contemplates computer readable media that comprise gene expression profiles. Such media can contain all or part of the gene expression profiles of the genes listed in the Tables that comprise the responsivity predictor set. The media can be a list of the genes or contain the raw data for running a user's own statistical calculation, such as the methods disclosed herein.

Another aspect of the invention provides a program product (i.e., software product) for use in a computer device that executes program instructions recorded in a computer-readable medium to perform one or more steps of the methods described herein, such as for estimating the efficacy of a therapeutic agent in treating a subject afflicted with cancer.

One aspect of the invention provides a computer readable medium having computer readable program codes embodied therein, the computer readable medium program codes performing one or more of the following functions: defining the value of one or more metagenes from the expression levels of genes in known responsive and sensitive cells; defining a metagene value by extracting a single dominant value using singular value decomposition (SVD) from a cluster of genes associated with tumor sensitivity to a therapeutic agent; averaging the predictions of one or more statistical tree models applied to the values of the metagenes; or averaging the predictions of one or more binary regression models applied to the values of the metagenes, wherein each model includes a statistical predictive probability of tumor sensitivity to a therapeutic agent.

Another related aspect of the invention provides kits comprising the program product or the computer readable medium, optionally with a computer system. One aspect of the invention provides a system, the system comprising: a computer (See Figure 15); a computer readable medium, operatively coupled to the computer, the computer readable medium program codes performing one or more of the following functions: defining the value of one or more metagenes from the expression levels of genes; defining a metagene value by extracting a single dominant value using singular value decomposition (SVD) from a cluster of genes associated with tumor sensitivity to a therapeutic agent; averaging the predictions of one or more statistical tree models applied to the values of the metagenes; or averaging the predictions of one or more binary regression models applied to the values of the metagenes, wherein each model includes a statistical predictive probability of tumor sensitivity to a therapeutic agent.

In one embodiment, the program product comprises: a recordable medium; and a plurality of computer-readable instructions executable by the computer device to analyze data from the array hybridization steps, to transmit array hybridization data from one location to another, or to evaluate genome-wide location data between two or more genomes. Computer readable media include, but are not limited to, CD-ROM disks (CD-R, CD-RW), DVD-RAM disks, DVD-RW disks, floppy disks and magnetic tape.

A related aspect of the invention provides kits comprising the program products described herein. The kits may also optionally contain paper and/or computer-readable format instructions and/or information, such as, but not limited to, information on DNA microarrays, on tutorials, on experimental procedures, on reagents, on related products, on available experimental data, on using kits, on chemotherapeutic agents including their toxicity, and on other information. The kits optionally also contain in paper and/or computer-readable format information on minimum hardware requirements and instructions for running and/or installing the software. The kits optionally also include, in a paper and/or computer readable format, information on the manufacturers, warranty information, availability of additional software, technical services information, and purchasing information. The kits optionally include a video or other viewable medium or a link to a viewable format on the internet or a network that depicts the use of the software, and/or use of the kits. The kits also include packaging material such as, but not limited to, styrofoam, foam, plastic, cellophane, shrink wrap, bubble wrap, paper, cardboard, starch peanuts, twist ties, metal clips, metal cans, drierite, glass, and rubber.

The analysis of data, as well as the transmission of data steps, can be implemented by the use of one or more computer systems. Computer systems are readily available. The processing that provides the displaying and analysis of image data, for example, can be performed on multiple computers or can be performed by a single, integrated computer or any variation thereof.
The components contained in the computer system are those typically found in general purpose computer systems used as servers, workstations, personal computers, network terminals, and the like. See Figure 15. In fact, these components are intended to represent a broad category of such computer components that are well known in the art.

Figure 15 shows a functional block diagram of general purpose computer system 1500 for performing the functions of the software according to an illustrative embodiment of the invention. The exemplary computer system 1500 includes a central processing unit (CPU) 1502, a memory 1504, and an interconnect bus 1506. The CPU 1502 may include a single microprocessor or a plurality of microprocessors for configuring computer system 1500 as a multi-processor system. The memory 1504 illustratively includes a main memory and a read only memory. The computer 1500 also includes the mass storage device 1508 having, for example, various disk drives, tape drives, etc. The main memory 1504 also includes dynamic random access memory (DRAM) and high-speed cache memory. In operation, the main memory 1504 stores at least portions of instructions and data for execution by the CPU 1502.

The mass storage 1508 may include one or more magnetic disk or tape drives or optical disk drives, for storing data and instructions for use by the CPU 1502. At least one component of the mass storage system 1508, preferably in the form of a disk drive or tape drive, stores one or more databases, such as databases of transcriptional start sites, genomic sequence, promoter regions, or other information.

The mass storage system 1508 may also include one or more drives for various portable media, such as a floppy disk, a compact disc read only memory (CD-ROM), or an integrated circuit non-volatile memory adapter (i.e., PC-MCIA adapter) to input and output data and code to and from the computer system 1500. The computer system 1500 may also include one or more input/output interfaces for communications, shown by way of example as interface 1510 for data communications via a network. The data interface 1510 may be a modem, an Ethernet card or any other suitable data communications device. To provide the functions of a computer system according to Figure 15, the data interface 1510 may provide a relatively high-speed link to a network, such as an intranet, internet, or the Internet, either directly or through another external interface. The communication link to the network may be, for example, optical, wired, or wireless (e.g., via satellite or cellular network). Alternatively, the computer system 1500 may include a mainframe or other type of host computer system capable of Web-based communications via the network. The computer system 1500 also includes suitable input/output ports or uses the interconnect bus 1506 for interconnection with a local display 1512 and keyboard 1514 or the like serving as a local user interface for programming and/or data retrieval purposes. Alternatively, server operations personnel may interact with the system 1500 for controlling and/or programming the system from remote terminal devices via the network.

The following examples are provided to illustrate aspects of the invention but are not intended to limit the invention in any manner.

EXAMPLES

Example 1. A gene expression based predictor of sensitivity to docetaxel

The NCI-60 panel 49 was used to develop predictors of chemotherapeutic drug response, and cell lines that were most resistant or sensitive to docetaxel were identified (Figure 1A, B). Genes whose expression most highly correlated with drug sensitivity, using Bayesian binary regression analysis, were selected to develop a model that differentiates a pattern of docetaxel sensitivity from resistance. A gene expression signature consisting of 50 genes was identified that classified on the basis of docetaxel sensitivity (Figure 1B, bottom panel). In addition to leave-one-out cross validation, we utilized an independent dataset derived from docetaxel sensitivity assays in a series of 30 lung and ovarian cancer cell lines for further validation. As shown in Figure 1C (top panel), the correlation between the predicted probability of sensitivity to docetaxel (in both lung and ovarian cell lines) and the respective IC50 for docetaxel confirmed the capacity of the docetaxel predictor to predict sensitivity to the drug in cancer cell lines (Figure 7). In each case, the accuracy exceeded 80%. Finally, a second independent dataset including 29 lung cancer cell lines (Gemma A, GEO accession number: GSE4127) was used to predict and measure docetaxel sensitivity. As shown in Figure 1C (bottom panel), the docetaxel sensitivity model developed from the NCI-60 panel again predicted sensitivity in this independent data set, again with an accuracy exceeding 80%.
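Leave-one-out cross validation, as referred to in this example, can be sketched as follows: each cell line is held out in turn, a classifier is fitted on the remaining lines, and the held-out line is then predicted. The sketch assumes a single score per cell line and a nearest-class-mean rule in place of the Bayesian binary regression named above; the function name, scores and labels are hypothetical and are not data from this example.

```python
def loocv_accuracy(scores, labels):
    """Leave-one-out accuracy of a nearest-class-mean rule on one score.

    Assumes both classes remain represented after any one sample is held out.
    """
    correct = 0
    for i in range(len(scores)):
        train = [(s, y) for j, (s, y) in enumerate(zip(scores, labels)) if j != i]
        sensitive = [s for s, y in train if y == 1]
        resistant = [s for s, y in train if y == 0]
        mean_sensitive = sum(sensitive) / len(sensitive)
        mean_resistant = sum(resistant) / len(resistant)
        # Assign the held-out sample to the closer class mean.
        closer_to_sensitive = (abs(scores[i] - mean_sensitive)
                               < abs(scores[i] - mean_resistant))
        predicted = 1 if closer_to_sensitive else 0
        correct += int(predicted == labels[i])
    return correct / len(scores)

# Hypothetical signature scores for seven cell lines; 1 = sensitive.
scores = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9, 0.35]
labels = [0, 0, 0, 1, 1, 1, 1]
accuracy = loocv_accuracy(scores, labels)   # 6 of 7 held-out lines correct
```

Only the last cell line, whose score lies closer to the resistant group, is misclassified when it is held out.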

Example 2. Utilization of the expression signature to predict docetaxel response in patients

The development of a gene expression signature capable of predicting in vitro docetaxel sensitivity provides a tool that might be useful in predicting response to the drug in patients. We made use of published studies that linked gene expression data with clinical response to docetaxel in a breast cancer neoadjuvant study 50 (Figure 1D) to test the capacity of the in vitro docetaxel sensitivity predictor to accurately identify those patients who responded to docetaxel. Using a 0.45 predicted probability of response as the cut-off for predicting positive response, as determined by ROC curve analysis (Figure 7A), the in vitro generated profile correctly predicted docetaxel response in 22 out of 24 patient samples, achieving an overall accuracy of 91.6% (Figure 1D). Applying a Mann-Whitney U test for statistical significance demonstrates the capacity of the predictor to distinguish resistant from sensitive patients (Figure 1D, right panel). We extended this further by predicting the response to docetaxel as salvage therapy for ovarian cancer. As shown in Figure 1E, the prediction of response to docetaxel in patients with advanced ovarian cancer achieved an accuracy exceeding 85% (Figure 1E, middle panel). Further, an analysis of statistical significance demonstrated the capacity of the predictors to distinguish patients with resistant versus sensitive disease (Figure 1E, right panel).
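As an illustration of the Mann-Whitney comparison described above, the following sketch computes the U statistic by direct pair counting for two groups of predicted probabilities. The probabilities are hypothetical toy values, not patient data, and no p-value is computed. Dividing U by the number of pairs gives the area under the ROC curve, which links this test to the ROC analysis used to choose the cut-off.

```python
def mann_whitney_u(group_a, group_b):
    """U statistic for group_a: count of (a, b) pairs with a > b, ties counted as 0.5."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

def auc_from_u(group_a, group_b):
    """Area under the ROC curve, equal to U divided by the number of pairs."""
    return mann_whitney_u(group_a, group_b) / (len(group_a) * len(group_b))

# Hypothetical predicted probabilities of sensitivity (illustrative only).
responders = [0.91, 0.78, 0.66, 0.52]
non_responders = [0.40, 0.31, 0.48, 0.12]
print(mann_whitney_u(responders, non_responders), auc_from_u(responders, non_responders))
```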

We also performed a complementary analysis using the patient response data to generate a predictor and found that the in vivo generated signature of response predicted sensitivity of NCI-60 cell lines to docetaxel (Figure 7B). This crossover is further emphasized by the fact that the genes represented in the initial in vitro generated docetaxel predictor and in the alternative in vivo predictor exhibit considerable overlap (Table 4). We also note that the predictor of docetaxel sensitivity developed from the NCI-60 data was more accurate in predicting patient response in the ovarian samples than the predictor developed from the breast neoadjuvant patient data (85.7% vs. 64.3%) (Figure 7C).

Example 3. Development of a panel of gene expression signatures that predict sensitivity to chemotherapeutic drugs

Given the development of a docetaxel response predictor, we examined the NCI-60 data set for other opportunities to develop predictors of chemotherapy response. Shown in Figure 2A are a series of expression profiles developed from the NCI-60 dataset that predict response to topotecan, adriamycin, etoposide, 5-fluorouracil (5-FU), taxol (paclitaxel), and cyclophosphamide (cytotoxan). In each case, the leave-one-out cross validation analyses demonstrate a capacity of these profiles to accurately predict the samples utilized in the development of the predictor (Figure 8B). Each profile was then further validated using in vitro response data from independent datasets; in each case, the profile developed from the NCI-60 data was capable of accurately (> 85%) predicting response in the separate dataset of approximately 30 cancer cell lines for which the dose response information and relevant Affymetrix U133A gene expression data are publicly available 37 (Figure 8C and Table 16). Once again, applying a Mann-Whitney U test for statistical significance demonstrates the capacity of the predictor to distinguish resistant from sensitive patients (Figure 2B). In addition to the capacity of each signature to distinguish cells that are sensitive or resistant to a particular drug, we also evaluated the extent to which a signature was specific for an individual chemotherapeutic agent. From the example shown in Figure 9, using the validations of chemosensitivity seen in the independent European (IJC) cell line data, it is clear that each of the signatures is specific for the drug that was used to develop the predictor. In each case, individual predictors of response to the various cytotoxic drugs were plotted against cell lines known to be sensitive or resistant to a given chemotherapeutic agent (e.g., adriamycin, paclitaxel).

Given the ability of the in vitro developed gene expression profiles to predict response to docetaxel in the clinical samples, we extended this approach to test the ability of additional signatures to predict response to commonly used salvage therapies for ovarian cancer and in an independent data set of samples from adriamycin-treated patients (Evans W, GSE650, GSE651). As shown in Figure 2C, each of these predictors was capable of accurately predicting the response to the drugs in patient samples, achieving an accuracy in excess of 81% overall. In each case, the positive and negative predictive values confirm the validity and clinical utility of the approach (Table 16).

Example 4. Chemotherapy response signatures predict response to multi-drug regimens

Many therapeutic regimens make use of combinations of chemotherapeutic drugs, raising the question of the extent to which the signatures of individual therapeutic response will also predict response to a combination of agents. To address this question, we have made use of data from a breast neoadjuvant treatment that involved the use of paclitaxel, 5-fluorouracil, adriamycin, and cyclophosphamide (TFAC) 55, 56 (Figure 3A). Using available data from the 51 patients, we predicted response with each of the single agent signatures (paclitaxel, 5-FU, adriamycin and cyclophosphamide) developed from the NCI-60 cell line analysis, and then compared the predictions to the clinical outcome information, which was represented as complete pathologic response. As shown in Figure 3A (middle panel), the predicted response based on each of the individual chemosensitivity signatures indicated a significant distinction between the responders (n = 13) and non-responders (n = 38), with the exception of 5-fluorouracil. Importantly, the combined probability of sensitivity to the four agents in this TFAC neoadjuvant regimen was calculated using the probability theorem, and it is clear from this analysis that the prediction of response based on a combined probability of sensitivity, built from the individual chemosensitivity predictions, yielded a statistically significant (p < 0.0001, Mann-Whitney U) distinction between the responders and non-responders (Figure 3A, bottom panel).

As a further validation of the capacity to predict response to combination therapy, we made use of gene expression data generated from a collection of breast cancer samples (n = 45) from patients who received 5-fluorouracil, adriamycin and cyclophosphamide (FAC) in the adjuvant chemotherapy setting. As shown in Figure 3B (top panel), the predicted response based on signatures for 5-FU, adriamycin, and cyclophosphamide indicated a significant distinction between the responders (n = 34) and non-responders (n = 11) for each of the single agent predictors. Furthermore, the combined probability of sensitivity to the three agents in the FAC regimen was calculated and is shown in the middle panel of Figure 3B. It is evident from this analysis that the prediction of response based on a combined probability of sensitivity to the FAC regimen yielded a clear, significant (p < 0.001, Mann-Whitney U) distinction between the responders and non-responders (accuracy: 82.2%, positive predictive value: 90.3%, negative predictive value: 64.3%). We note that while it is difficult to interpret the prediction of clinical response in the adjuvant setting, since many of these patients were likely free of disease following surgery, the accurate identification of non-responders is a clear endpoint that does confirm the capacity of the signatures to predict clinical response.
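The accuracy, positive predictive value and negative predictive value quoted above follow from a two-by-two table of predicted versus observed response. A minimal sketch, assuming a hypothetical cut-off of 0.5 and toy probabilities (the function name and all values are illustrative and do not reproduce the patient data):

```python
def classification_summary(probabilities, responded, cutoff):
    """Accuracy, PPV and NPV when a probability >= cutoff is called a predicted responder."""
    tp = fp = tn = fn = 0
    for p, r in zip(probabilities, responded):
        if p >= cutoff:
            if r:
                tp += 1  # predicted responder, observed responder
            else:
                fp += 1  # predicted responder, observed non-responder
        else:
            if r:
                fn += 1  # predicted non-responder, observed responder
            else:
                tn += 1  # predicted non-responder, observed non-responder
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "npv": tn / (tn + fn) if tn + fn else float("nan"),
    }

# Hypothetical toy inputs (illustrative only).
toy_probs = [0.9, 0.8, 0.6, 0.3, 0.7, 0.2, 0.1, 0.4]
toy_resp = [True, True, True, True, False, False, False, False]
print(classification_summary(toy_probs, toy_resp, cutoff=0.5))
```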

As a further measure of the relevance of the predictions, we examined the prognostic significance of the ability to predict response to FAC. As shown in Figure 3B (bottom panel), there was a clear distinction in the population of patients identified as sensitive or resistant to FAC, as measured by disease-free survival (sensitive = blue, resistant = red). These results, taken together with the accuracy of prediction of response in the neoadjuvant setting, where clinical endpoints are uncomplicated by confounding variables such as prior surgery, and with the results of the single agent validations, lead us to conclude that the signatures of chemosensitivity generated from the NCI-60 panel do indeed have the capacity to predict therapeutic response in patients receiving either single agent or combination chemotherapy (Table 17).

When comparing the individual genes that constitute the predictors, it was interesting to observe that the gene coding for MAP-Tau, described previously as a determinant of paclitaxel sensitivity, 56 was also identified as a discriminator gene in the paclitaxel predictor generated using the NCI-60 data. However, similar to the docetaxel example described earlier, a predictor for TFAC chemotherapy developed using the NCI-60 data was superior to the MAP-Tau based predictor described by Pusztai et al. (Table 18).

Example 5. Patterns of predicted chemotherapy response across a spectrum of tumors

The availability of genomic-based predictors of chemotherapy response could potentially provide an opportunity for a rational approach to selection of drugs and combinations of drugs. With this in mind, we have utilized the panel of chemotherapy response predictors described in Figure 6 to profile the potential options for use of these agents, by predicting the likelihood of sensitivity to the agents in a large collection of breast, lung, and ovarian tumor samples. We then clustered the samples according to patterns of predicted sensitivity to the various chemotherapeutics, and plotted a heatmap in which high probability of sensitivity response is indicated by red and low probability or resistance is indicated by blue (Figure 4).
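The clustering step can be sketched as follows. The Methods state that average linkage with the uncentered correlation similarity metric was used; this sketch assumes pairwise average linkage (the mean of all pairwise distances between two clusters) and a distance of one minus the uncentered correlation. The function names and the toy sensitivity profiles are hypothetical.

```python
import math

def uncentered_distance(x, y):
    """1 - uncentered correlation (cosine similarity, no mean subtraction).
    Assumes neither profile is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (nx * ny)

def average_linkage(profiles, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters with the
    smallest mean pairwise distance until n_clusters remain."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(
                    uncentered_distance(profiles[a], profiles[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Hypothetical predicted probabilities of sensitivity to three agents for four tumors.
toy_profiles = [
    [0.90, 0.80, 0.10],
    [0.85, 0.90, 0.20],
    [0.10, 0.20, 0.90],
    [0.20, 0.10, 0.80],
]
print(average_linkage(toy_profiles, 2))
```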

There are clearly evident patterns of predicted sensitivity to the various agents. In many cases, the predicted sensitivities to the chemotherapeutic agents are consistent with the previously documented efficacy of single agent chemotherapies in the individual tumor types. 57 For instance, the predicted response rate for etoposide, adriamycin, cyclophosphamide, and 5-FU approximates the observed response for these single agents in breast cancer patients (Figure 11). Likewise, the predicted sensitivity to etoposide, docetaxel, and paclitaxel approximates the observed response for these single agents in lung cancer patients (Figure 11). This analysis also suggests possibilities for alternate treatments. As an example, it would appear that breast cancer patients likely to respond to 5-fluorouracil are resistant to adriamycin and docetaxel (Figure 10A). Likewise, in lung cancer, docetaxel sensitive populations are likely to be resistant to etoposide (Figure 10B). This is a potentially useful observation considering that both etoposide and docetaxel are viable front-line options (in conjunction with cis/carboplatin) for patients with lung cancer. 58 A similar relationship is seen between topotecan and adriamycin, both agents used in salvage chemotherapy for ovarian cancer (Figure 10C). Thus, by identifying patients or patient cohorts resistant to certain standard of care agents, one could avoid the side effects of that agent (e.g., topotecan) without compromising patient outcome, by choosing an alternative standard of care (e.g., adriamycin).

Example 6. Linking predictions of chemotherapy sensitivity to oncogenic pathway deregulation

Most patients who are resistant to chemotherapeutic agents are then recruited into a second or third line therapy or enrolled in a clinical trial. 38, 59 Moreover, even those patients who initially respond to a given agent are likely to eventually suffer a relapse, and in either case additional therapeutic options are needed. As one approach to identifying such options, we have taken advantage of our recent work that describes the development of gene expression signatures that reflect the activation of several oncogenic pathways. 36 To illustrate the approach, we first stratified the NCI cell lines based on predicted docetaxel response and then examined the patterns of pathway deregulation associated with docetaxel sensitivity or resistance (Figure 13A). Regression analysis revealed a significant relationship between PI3 kinase pathway deregulation and docetaxel resistance, as seen by the linear relationship (p = 0.001) between the probability of PI3 kinase activation and the IC50 of docetaxel in the cell lines (Figure 12 and Table 8).
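A linear relationship of the kind described above can be summarized with an ordinary least-squares fit and a Pearson correlation coefficient. The sketch below is illustrative only: the function name is hypothetical, the paired values are toy numbers rather than cell line data, and no significance test is computed.

```python
import math

def linear_fit(x, y):
    """Ordinary least-squares slope and intercept of y on x, plus Pearson r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

# Hypothetical pairs: probability of pathway activation vs. a drug response measure.
toy_pathway_prob = [0.1, 0.3, 0.5, 0.7, 0.9]
toy_response = [1.1, 1.9, 3.2, 3.8, 5.0]
print(linear_fit(toy_pathway_prob, toy_response))
```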

The results linking docetaxel resistance with deregulation of the PI3 kinase pathway suggest an opportunity to employ a PI3 kinase inhibitor in this subgroup, given our recent observations that have demonstrated a linear positive correlation between the probability of pathway deregulation and targeted drug sensitivity. 36 To address this directly, we predicted docetaxel sensitivity and probability of oncogenic pathway deregulation using DNA microarray data from 17 NSCLC cell lines (Figure 5A, top panel). Consistent with the analysis of the NCI-60 cell line panel, the cell lines predicted to be resistant to docetaxel were also predicted to exhibit PI3 kinase pathway activation (p = 0.03, log-rank test, Figure 14). In parallel, the lung cancer cell lines were subjected to assays for sensitivity to a PI3 kinase specific inhibitor (LY-294002), using a standard measure of cell proliferation. 36, 38, 59 As shown by the analysis in Figure 5B (top left panel), the cell lines showing an increased probability of PI3 kinase pathway activation were also more likely to respond to a PI3 kinase inhibitor (LY-294002) (p = 0.001, log-rank test). The same relationship held for prediction of resistance to docetaxel: these cells were more likely to be sensitive to PI3 kinase inhibition (p < 0.001, log-rank test) (Figure 5B, top right panel).

An analysis of a panel of ovarian cancer cell lines provided a second example. Ovarian cell lines that are predicted to be topotecan resistant (Figure 5A, bottom panel) have a higher likelihood of Src pathway deregulation, and there is a significant linear relationship (p = 0.001, log-rank test) between the probability of topotecan resistance and sensitivity to a drug that inhibits the Src pathway (SU6656) (Figure 5B, bottom right panel). The results of these assays clearly demonstrate an opportunity to potentially mitigate drug resistance (e.g., to docetaxel or topotecan) using a specific pathway-targeted agent, based on a predictor developed from pathway deregulation (i.e., PI3 kinase or Src inhibition).

Taken together, these data demonstrate an approach to the identification of therapeutic options for chemotherapy resistant patients, as well as the identification of novel combinations for chemotherapy sensitive patients, and thus represent a potential strategy toward a more effective treatment plan for cancer patients, after future prospective validation trials (Figure 6).

Example 7. Methods

NCI-60 data. The (-log10(M)) GI50/IC50, TGI (Total Growth Inhibition dose) and LC50 (50% cytotoxic dose) data were used to populate a matrix with MATLAB software, together with the relevant expression data for the individual cell lines. Where multiple entries for a drug screen existed (by NSC number), the entry with the largest number of replicates was included. Incomplete data were assigned as NaN (not a number) for statistical purposes. To develop an in vitro gene expression based predictor of sensitivity/resistance from the pharmacologic data used in the NCI-60 drug screen studies, we chose cell lines within the NCI-60 panel that would represent the extremes of sensitivity to a given chemotherapeutic agent (mean GI50 +/- 1 SD). Relevant expression data (updated data available on the Affymetrix U95A2 GeneChip) for the solid tumor cell lines and the respective pharmacological data for the chemotherapeutics were downloaded from the NCI website (http://dtp.nci.nih.gov/docs/cancer/cancer_data.html). The individual drug sensitivity and resistance data from the selected solid tumor NCI-60 cell lines were then used in a supervised analysis using binary regression methodologies, as described previously, 60 to develop models predictive of chemotherapeutic response.
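The selection of cell lines at the extremes of sensitivity can be sketched as below. This is one reading of the "mean GI50 +/- 1 SD" rule and not a confirmed implementation: the sketch assumes values are expressed as -log10(GI50), so larger values mean greater sensitivity, uses the sample standard deviation, and drops NaN entries before computing the statistics. The function name and cell line labels are hypothetical.

```python
import math
import statistics

def select_extremes(neg_log_gi50_by_line):
    """Return (sensitive, resistant) cell line names lying at least 1 SD above
    or below the mean -log10(GI50). NaN entries (incomplete data) are ignored."""
    valid = {name: v for name, v in neg_log_gi50_by_line.items() if not math.isnan(v)}
    values = list(valid.values())
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (assumption)
    sensitive = sorted(n for n, v in valid.items() if v >= mean + sd)
    resistant = sorted(n for n, v in valid.items() if v <= mean - sd)
    return sensitive, resistant

# Hypothetical -log10(GI50) values; "F" has incomplete data.
toy_gi50 = {"A": 8.0, "B": 6.0, "C": 6.1, "D": 5.9, "E": 4.0, "F": float("nan")}
print(select_extremes(toy_gi50))
```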

Human ovarian cancer samples. We measured expression of 22,283 genes in 13 ovarian cancer cell lines and 119 advanced (FIGO stage III/IV) serous epithelial ovarian carcinomas using Affymetrix U133A GeneChips. All ovarian cancers were obtained at initial cytoreductive surgery from patients. All tissues were collected under the auspices of respective institutional (Duke University Medical Center and H. Lee Moffitt Cancer Center) IRB approved protocols involving written informed consent.

Full details of the methods used for RNA extraction and development of gene expression signatures representing deregulation of oncogenic pathways in the tumor samples were recently described. 36 Response to therapy was evaluated using standard criteria for patients with measurable disease, based upon WHO guidelines. 28

Lung and ovarian cancer cell culture. Total RNA was extracted and oncogenic pathway predictions were performed using methods similar to those described previously. 36

Cross-platform Affymetrix GeneChip comparison. To map the probe sets across various generations of Affymetrix GeneChip arrays, we utilized an in-house program, Chip Comparer (http://tenero.duhs.duke.edu/genearray/perl/chip/chipcomparer.pl), as described previously. 36

Cell proliferation assays. Growth curves for cells were produced by plating 500-10,000 cells per well in 96-well plates. The growth of cells at 12 hr time points (from t = 12 hrs) was determined using the CellTiter 96 Aqueous One 23 Solution Cell Proliferation Assay Kit by Promega, which is a colorimetric method for determining the number of growing cells. 36 The growth curves plot the growth rate of cells vs. each concentration of drug tested against individual cell lines. Cumulatively, these experiments determined the concentration of cells to use for each cell line, as well as the dosing range of the inhibitors. The final dose-response curves in our experiments plot the percent of the cell population responding to the chemotherapy vs. the concentration of the drug for each cell line. Sensitivity to docetaxel and a phosphatidylinositol 3-kinase (PI3 kinase) inhibitor (LY-294002) 36 in 17 lung cell lines, and to topotecan and a Src inhibitor (SU6656) in 13 ovarian cell lines, was determined by quantifying the percent reduction in growth (versus DMSO controls) at 96 hrs using a standard MTT colorimetric assay. 36 Concentrations used ranged from 1-10 nM for docetaxel, 300 nM-10 μM for SU6656, and 300 nM-10 μM for LY-294002. All experiments were repeated at least three times.
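The percent reduction in growth versus DMSO controls can be computed as in the following sketch. It assumes that replicate absorbance readings are averaged before taking the ratio and that no background subtraction is applied; the function name and the readings are hypothetical.

```python
def percent_growth_reduction(treated_readings, dmso_readings):
    """Percent reduction in growth of drug-treated wells relative to DMSO controls,
    using the mean of replicate absorbance readings (assumption)."""
    treated_mean = sum(treated_readings) / len(treated_readings)
    dmso_mean = sum(dmso_readings) / len(dmso_readings)
    return 100.0 * (1.0 - treated_mean / dmso_mean)

# Hypothetical absorbance readings from three replicate experiments.
toy_treated = [0.30, 0.25, 0.35]
toy_dmso = [1.0, 1.1, 0.9]
print(percent_growth_reduction(toy_treated, toy_dmso))
```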

Statistical analysis methods. Analysis of expression data is as previously described. 36, 60-62 Briefly, prior to statistical modeling, gene expression data are filtered to exclude probe sets with signals present at background noise levels and probe sets that do not vary significantly across samples. Each signature summarizes its constituent genes as a single expression profile, here derived as the top principal components of that set of genes. When predicting the chemosensitivity patterns or pathway activation of cancer cell lines or tumor samples, gene selection and identification are based on the training data, and metagene values are then computed using the principal components of the training data and the additional cell line or tumor expression data. Bayesian fitting of binary probit regression models to the training data then permits an assessment of the relevance of the metagene signatures in within-sample classification, 60 and estimation and uncertainty assessments for the binary regression weights mapping metagenes to probabilities. To guard against over-fitting given the disproportionate number of variables to samples, we also performed leave-one-out cross validation analysis to test the stability and predictive capability of our model. Each sample was left out of the data set one at a time, the model was refitted (both the metagene factors and the partitions used) using the remaining samples, and the phenotype of the held-out case was predicted and the certainty of the classification was calculated. Given a training set of expression vectors (of values across metagenes) representing two biological states, a binary probit regression model of predictive probabilities for each of the two states (resistant vs. sensitive) for each case is estimated using Bayesian methods. Predictions of the relative oncogenic pathway status and chemosensitivity of the validation cell lines or tumor samples are then evaluated using methods previously described, 36, 60 producing estimated relative probabilities, and associated measures of uncertainty, of chemosensitivity/oncogenic pathway deregulation across the validation samples. In instances where a combined probability of sensitivity to a combination chemotherapeutic regimen was required based on the individual drug sensitivity patterns, we employed the theorem for combined probabilities as described by Feller:

Pr(A, B, C, ..., N) = Pr(A) + Pr(B) + Pr(C) + ... + Pr(N) - [Pr(A) × Pr(B) × Pr(C) × ... × Pr(N)]

Hierarchical clustering of tumor predictions was performed using Gene Cluster 3.0. 63 Genes and tumors were clustered using average linkage with the uncentered correlation similarity metric. Standard linear regression analyses and their significance (log-rank test) were generated for the drug response data and for the correlation between drug response and probability of chemosensitivity/pathway deregulation using GraphPad ® software.
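The combined probability expression above can be transcribed directly, as in the sketch below. It implements the expression exactly as written in the text (sum of the individual probabilities minus their product); the function name and example values are hypothetical. For two agents this coincides with the inclusion-exclusion result for independent events, Pr(A) + Pr(B) - Pr(A) × Pr(B). With more than two agents the expression as written is not constrained to the interval [0, 1], so the sketch returns the raw value without clipping.

```python
import math

def combined_probability_as_stated(probs):
    """Sum of the individual probabilities minus their product,
    transcribed from the expression given in the text."""
    return sum(probs) - math.prod(probs)

# Hypothetical single agent probabilities of sensitivity for a two-drug regimen.
print(combined_probability_as_stated([0.5, 0.4]))
```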

Reference Bibliography

1. Levin L, Simon R, Hryniuk W: Importance of multiagent chemotherapy regimens in ovarian carcinoma: dose intensity analysis. J. Natl. Cancer Inst. 85:1732-1742, 1993

2. McGuire WP, Hoskins WJ, Brady MF, et al: Assessment of dose-intensive therapy in suboptimally debulked ovarian cancer: a Gynecologic Oncology Group study. J. Clin. Oncol. 13:1589-1599, 1995

3. Jodrell DI, Egorin MJ, Canetta RM, et al: Relationships between carboplatin exposure and tumor response and toxicity in patients with ovarian cancer. J. Clin. Oncol. 10:520-528, 1992

4. McGuire WP, Hoskins WJ, Brady MF, et al: Cyclophosphamide and cisplatin compared with paclitaxel and cisplatin in patients with stage III and stage IV ovarian cancer. N. Engl. J. Med. 334:1-6, 1996

5. McGuire WP, Brady MF, Ozols RF: The Gynecologic Oncology Group experience in ovarian cancer. Ann. Oncol. 10:29-34, 1999

6. Piccart MJ, Bertelsen K, Stuart G, et al: Long-term follow-up confirms a survival advantage of the paclitaxel-cisplatin regimen over the cyclophosphamide-cisplatin combination in advanced ovarian cancer. Int. J. Gynecol. Cancer 13:144-148, 2003

7. Wenham RM, Lancaster JM, Berchuck A: Molecular aspects of ovarian cancer. Best Pract. Res. Clin. Obstet. Gynaecol. 16:483-497, 2002

8. Berchuck A, Kohler MF, Marks JR, et al: The p53 tumor suppressor gene frequently is altered in gynecologic cancers. Am. J. Obstet. Gynecol. 170:246-252, 1994

9. Kohler MF, Marks JR, Wiseman RW, et al: Spectrum of mutation and frequency of allelic deletion of the p53 gene in ovarian cancer. J. Natl. Cancer Inst. 85:1513-1519, 1993

10. Havrilesky L, Alvarez AA, Whitaker RS, et al: Loss of expression of the p16 tumor suppressor gene is more frequent in advanced ovarian cancers lacking p53 mutations. Gynecol. Oncol. 83:491-500, 2001

11. Reles A, Wen WH, Schmider A, et al: Correlation of p53 mutations with resistance to platinum-based chemotherapy and shortened survival in ovarian cancer. Clinical Cancer Research 7:2984-2997, 2001

12. Schmider A, Gee C, Friedmann W, et al: p21 (WAF1/CIP1) protein expression is associated with prolonged survival but not with p53 expression in epithelial ovarian carcinoma. Gynecol. Oncol. 77:237-242, 2000

13. Wong KK, Cheng RS, Mok SC: Identification of differentially expressed genes from ovarian cancer cells by MICROMAX cDNA microarray system. Biotechniques 30:670-675, 2001

14. Welsh JB, Zarrinkar PP, Sapinoso LM, et al: Analysis of gene expression profiles in normal and neoplastic ovarian tissue samples identifies candidate molecular markers of epithelial ovarian cancer. Proc. Natl. Acad. Sci. USA 98:1176-1181, 2001

15. Shridhar V, Lee J-S, Pandita A, et al: Genetic analysis of early- versus late-stage ovarian tumors. Cancer Res. 61:5895-5904, 2001

16. Schummer M, Ng WW, Bumgarner RE, et al: Comparative hybridization of an array of 21,500 ovarian cDNAs for the discovery of genes overexpressed in ovarian carcinomas. Gene 238:375-385, 1999

17. Ono K, Tanaka T, Tsunoda T, et al: Identification by cDNA microarray of genes involved in ovarian carcinogenesis. Cancer Res. 60:5007-5011, 2000

18. Sawiris GP, Sherman-Baust CA, Becker KG, et al: Development of a highly specialized cDNA array for the study and diagnosis of epithelial ovarian cancer. Cancer Res. 62:2923-2928, 2002

19. Jazaeri AA, Yee CJ, Sotiriou C, et al: Gene expression profiles of BRCA1-linked, BRCA2-linked, and sporadic ovarian cancers. J. Natl. Cancer Inst. 94:990-1000, 2002

20. Schaner ME, Ross DT, Ciaravino G, et al: Gene expression patterns in ovarian carcinomas. Mol. Biol. Cell 14:4376-4386, 2003

21. Lancaster JM, Dressman H, Whitaker RS, et al: Gene expression patterns that characterize advanced stage serous ovarian cancers. J Surgical Gynecol. Invest. 11:51-59, 2004

22. Berchuck A, Iversen ES, Lancaster JM, et al: Patterns of gene expression that characterize long term survival in advanced serous ovarian cancers. Clin. Can. Res. 11:3686-3696, 2005

23. Berchuck A, Iversen E, Lancaster JM, et al: Prediction of optimal versus suboptimal cytoreduction of advanced stage serous ovarian cancer using microarrays. Am. J. Obstet. Gynecol. 190:910-925, 2004

24. Jazaeri AA, Awtrey CS, Chandramouli GV, et al: Gene expression profiles associated with response to chemotherapy in epithelial ovarian cancers. Clin. Cancer Res. 11:6300-6310, 2005

25. Helleman J, Jansen MP, Span PN, et al: Molecular profiling of platinum resistant ovarian cancer. Int. J. Cancer 118:1963-1971, 2005

26. Spentzos D, Levine DA, Kolia S, et al: Unique gene expression profile based on pathologic response in epithelial ovarian cancer. J. Clin. Oncol. 23:7911-7918, 2005

27. Spentzos D, Levine DA, Ramoni MF, et al: Gene expression signature with independent prognostic significance in epithelial ovarian cancer. J. Clin. Oncol. 22:4700-4710, 2004

28. Miller AB, Hoogstraten B, Staquet M, et al: Reporting results of cancer treatment. Cancer 47:207-214, 1981

29. Rustin GJ, Nelstrop AE, Bentzen SM, et al: Use of tumor markers in monitoring the course of ovarian cancer. Ann. Oncol. 10:21-27, 1999

30. Rustin GJ, Nelstrop AE, McClean P, et al: Defining response of ovarian carcinoma to initial chemotherapy according to serum CA 125. J. Clin. Oncol. 14:1545-1551, 1996

31. Irizarry RA, Hobbs B, Collin F, et al: Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4:249-263, 2003

32. Bolstad BM, Irizarry RA, Astrand M, et al: A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19:185-193, 2003

33. Lucas J, Carvalho C, Wang Q, et al: Sparse statistical modeling in gene expression genomics. Cambridge, Cambridge University Press, 2006

34. Rich J, Jones B, Hans C, et al: Gene expression profiling and genetic markers in glioblastoma survival. Cancer Res. 65:4051-4058, 2005

35. Hans C, Dobra A, West M: Shotgun stochastic search for regression with many candidate predictors. JASA in press. 2006

36. Bild A, Yao G, Chang JT, et al: Oncogenic pathway signatures in human cancers as a guide to targeted therapies. Nature 439:353-357, 2006

37. Gyorffy B, Surowiak P, Kiesslich O, Denkert C, Schafer R, Dietel M, Lage H: Gene expression profiling of 30 cancer cell lines predicts resistance towards 11 anticancer drugs at clinically achieved concentrations. Int. J. Cancer 118(7):1699-712, 2006

38. Minna, JD, Gazdar, AF, Sprang, SR & Herz, J: Cancer. A bull's eye for targeted lung cancer therapy. Science 304: 1458-1461, 2004

39. Jemal et al., CA Cancer J Clin. 53:5-26, 2003

40. Cancer Facts and Figures: American Cancer Society, Atlanta, p. 11, 2002

41. Travis et al., Lung Cancer Principles and Practice, Lippincott-Raven, New York, pps. 361-395, 1996

42. Gazdar et al., Anticancer Res. 14:261-267

43. Niklinska et al., Folia Histochem. Cytobiol. 39:147-148, 2001

44. Parker et al., CA Cancer J Clin. 47:5-27, 1997

45. Chu et al., J. Natl. Cancer Inst. 88:1571-1579, 1996

46. Baker, VV: Salvage therapy for recurrent epithelial ovarian cancer. Hematol. Oncol. Clin. N. Am. 17:977-988, 2003

47. Hansen, HH, Eisenhauer, EA, Hansen M, Neijt JP, Piccart MJ, Sessa C, Thigpen JT: New cytostatic drugs in ovarian cancer. Ann. Oncol. 4:S63-S70, 1993

48. Herrin, VE, Thigpen JT: Chemotherapy for ovarian cancer: current concepts. Semin. Surg. Oncol. 17:181-188, 1999

49. Staunton, J.E. et al. Chemosensitivity prediction by transcriptional profiling. Proc Natl Acad Sci USA 98:10787-10792, 2001

50. Chang, J.C. et al. Gene expression profiling for the prediction of therapeutic response to docetaxel in patients with breast cancer. Lancet 362:362-369, 2003

51. Emi, M., Kim, R., Tanabe, K., Uchida, Y. & Toge, T. Targeted therapy against Bcl-2-related proteins in breast cancer cells. Breast Cancer Res 7:R940-R952, 2005

52. Takahashi, T. et al. Cyclin A-associated kinase activity is needed for paclitaxel sensitivity. Mol Cancer Ther 4:1039-1046, 2005

53. Modi, S. et al. Phosphorylated/activated HER2 as a marker of clinical resistance to single agent taxane chemotherapy for metastatic breast cancer. Cancer Invest 23: 483-487, 2005

54. Langer, R. et al. Association of pretherapeutic expression of chemotherapy-related genes with response to neoadjuvant chemotherapy in Barrett carcinoma. Clin Cancer Res. 11:7462-7469, 2005

55. Rouzier, R. et al. Breast cancer molecular subtypes respond differently to preoperative chemotherapy. Clin Cancer Res. 11:5678-5685, 2005

56. Rouzier, R. et al. Microtubule-associated protein tau: a marker of paclitaxel sensitivity in breast cancer. Proc Natl Acad Sci USA 102:8315-8320, 2005

57. DeVita, V. T., Hellman, S. & Rosenberg, S.A. Cancer: Principles and Practice of Oncology, Lippincott-Raven, Philadelphia, 2005

58. Herbst, R.S. et al. Clinical Cancer Advances 2005: Major research advances in cancer treatment, prevention, and screening - a report from the American Society of Clinical Oncology. J. Clin. Oncol. 24:190-205, 2006

59. Broxterman, H.J. & Georgopapadakou, N.H. Anticancer therapeutics: Addictive targets, multi-targeted drugs, new drug combinations. Drug Resist Update 8:183-197, 2005

60. Pittman, J., Huang, E., Wang, Q., Nevins, JR. & West, M. Bayesian analysis of binary prediction tree models for retrospectively sampled outcomes. Biostatistics 5:587-601, 2004

61. West, M. et al. Predicting the clinical status of human breast cancer by using gene expression profiles. Proc Natl Acad Sci USA 98:11462-11467, 2001

62. Ihaka, R. & Gentleman, R. A language for data analysis and graphics. J. Comput. Graph. Stat. 5: 299-314, 1996

63. Eisen, M. B., Spellman, P. T., Brown, P.O. & Botstein, D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 95:14863-14868, 1998

Table 1: 5-Fluorouracil responsivity predictor set

Table 2: Adriamycin responsivity predictor set

Table 3: Cytotoxan responsivity predictor set

Table 4: Docetaxol responsivity predictor set

Table 5: Etoposide responsivity predictor set

Table 6: Taxol responsivity predictor set

Table 7: Topotecan responsivity predictor set

36119_at | -0.40934 | CAV1 | inactivation of MAPK activity /// vasculogenesis /// response to hypoxia /// negative regulation of endothelial cell proliferation /// triacylglycerol metabolic process /// calcium ion transport /// cellular calcium ion homeostasis /// endocytosis /// regulation of smooth muscle contraction /// skeletal muscle development /// protein localization /// vesicle organization and biogenesis /// regulation of fatty acid metabolic process /// sequestering of lipid /// regulation of blood coagulation /// cholesterol transport /// negative regulation of epithelial cell differentiation /// mammary gland development /// nitric oxide homeostasis /// cholesterol homeostasis /// cholesterol homeostasis /// negative regulation of MAPKKK cascade /// negative regulation of nitric oxide biosynthetic process /// positive regulation of vasoconstriction /// negative regulation of vasodilation /// negative regulation of JAK-STAT cascade /// positive regulation of metalloenzyme activity /// protein homooligomerization /// membrane depolarization /// regulation of peptidase activity /// calcium ion homeostasis /// mammary gland involution | 857

36149_at | -1.468146 | DPYSL3 | nucleobase, nucleoside, nucleotide and nucleic acid metabolic process /// signal transduction /// nervous system development /// nervous system development | 1809

36369_at | 0.937326 | PTRF | transcription /// transcription termination /// regulation of transcription, DNA-dependent /// transcription initiation from RNA polymerase I promoter | 284119

36525_at | -0.978609 | FBXL2 | protein modification process /// proteolysis /// ubiquitin cycle | 25827

36550_at | -1.009504 | RIN2 | endocytosis /// signal transduction /// small GTPase mediated signal transduction | 54453

36577_at | -1.446371 | FERMT2 | cell adhesion /// cell adhesion /// regulation of cell shape /// actin cytoskeleton organization and biogenesis | 10979

Table 8: PI3 Kinase inhibitor responsivity predictor set

Table 9: 5-Fluorouracil cell lines

Table 10: Adriamycin cell lines

Table 11: Cytotoxan cell lines

Table 12: Taxotere (docetaxel) cell lines

Table 13: Etoposide cell lines

Table 14: Taxol cell lines

Table 15: Topotecan cell lines

Table 16: Validation of predictor sets in cell line and patient data sets

Table 17: Accuracy of predictions in cell lines and patients

PPV: positive predictive value; NPV: negative predictive value. * Determining accuracy for the docetaxel predictor in the IJC cell line data set was not possible since docetaxel was not one of the drugs studied. Instead, the docetaxel predictor was validated in two independent cell line experiments, correlating predicted probability of response to docetaxel in vitro with actual IC50 of docetaxel by cell line (Figure 1C).

Table 18: Comparison of different predictors

PPV: positive predictive value; NPV: negative predictive value. ** For both the Chang and Pusztai data, the actual numbers of predicted responders were not available, only the predictive accuracies. Also, the predictive accuracy reported for the Chang data is not from an independent validation; instead, it is from leave-one-out cross validation.