Title:
METHOD FOR PREDICTING RENAL CELL CARCINOMA (RCC)
Document Type and Number:
WIPO Patent Application WO/2015/170105
Kind Code:
A1
Abstract:
The present invention relates to methods, uses and kits for predicting the prognosis and/or progression of Renal Cell Carcinoma in an individual, and methods, uses and kits for predicting the response to therapy of, and/or selecting a treatment for, Renal Cell Carcinoma in an individual.

Inventors:
OVERTON IAN MICHAEL (GB)
STEWART GRANT (GB)
LUBBOCK ALEXANDER LYULPH ROBERT (GB)
HARRISON DAVID JAMES (GB)
POWLES THOMAS (GB)
Application Number:
PCT/GB2015/051345
Publication Date:
November 12, 2015
Filing Date:
May 07, 2015
Assignee:
UNIV EDINBURGH (GB)
UNIV LONDON QUEEN MARY (GB)
International Classes:
G01N33/574
Foreign References:
US20130005597A1 2013-01-03
US20100222230A1 2010-09-02
Other References:
GRANT D. STEWART ET AL: "What can molecular pathology contribute to the management of renal cell carcinoma?", NATURE REVIEWS UROLOGY, vol. 8, no. 5, 12 April 2011 (2011-04-12), pages 255 - 265, XP055206609, ISSN: 1759-4812, DOI: 10.1038/nrurol.2011.43
ALEXANDER LAIRD ET AL: "Differential Expression of Prognostic Proteomic Markers in Primary Tumour, Venous Tumour Thrombus and Metastatic Renal Cell Cancer Tissue and Correlation with Patient Outcome", PLOS ONE, vol. 8, no. 4, 5 April 2013 (2013-04-05), pages 1 - 14, XP055206601
SAMIRA A. BROOKS ET AL: "ClearCode34: A Prognostic Risk Predictor for Localized Clear Cell Renal Cell Carcinoma", EUROPEAN UROLOGY, vol. 66, no. 1, 25 February 2014 (2014-02-25), pages 77 - 84, XP055206732, ISSN: 0302-2838, DOI: 10.1016/j.eururo.2014.02.035
A. J. ARMSTRONG ET AL: "Circulating Tumor Cells from Patients with Advanced Prostate and Breast Cancer Display Both Epithelial and Mesenchymal Markers", MOLECULAR CANCER RESEARCH, vol. 9, no. 8, 10 June 2011 (2011-06-10), pages 997 - 1007, XP055206642, ISSN: 1541-7786, DOI: 10.1158/1541-7786.MCR-10-0490
Attorney, Agent or Firm:
DIDMON, Mark (The Belgrave Centre, Talbot Street, Nottingham NG1 5GG, GB)
Claims:
1. A method for predicting the prognosis of Renal Cell Carcinoma (RCC) in an individual, comprising the steps of:

- providing a sample comprising one or more Renal Cell Carcinoma cell from the individual;

- determining in the one or more cell the expression level of at least three genes selected from those listed in Table A; and

- predicting the prognosis of Renal Cell Carcinoma in the individual based on the expression level of the at least three genes.

2. A method for predicting the progression of Renal Cell Carcinoma in an individual, comprising the steps of:

- providing a sample comprising one or more Renal Cell Carcinoma cell from the individual;

- determining in the one or more cell the expression level of at least three genes selected from those listed in Table A; and

- predicting the progression of Renal Cell Carcinoma in the individual based on the expression level of the at least three genes.

3. The method according to Claim 1 or 2, wherein the step of predicting prognosis or progression of Renal Cell Carcinoma in the individual is additionally based on the age of the individual.

4. The method according to any preceding claim, wherein the individual has been treated, or is being treated, with an anti-Renal Cell Carcinoma treatment.

5. The method according to Claim 4, wherein the anti-Renal Cell Carcinoma treatment is a Tyrosine Kinase Inhibitor or an mTOR Inhibitor.

6. The method according to Claim 5, wherein the Tyrosine Kinase Inhibitor is selected from the group comprising: sunitinib; sorafenib; bevacizumab; pazopanib; axitinib.

7. The method according to Claim 5, wherein the mTOR Inhibitor is selected from the group comprising: temsirolimus; everolimus.

8. The method according to any preceding claim, wherein the Renal Cell Carcinoma is characterised in that it comprises one or more of the following: a Stage I tumour; a Stage II tumour; a Stage III tumour; a Stage IV tumour; a Grade 1 tumour; a Grade 2 tumour; a Grade 3 tumour; a Grade 4 tumour.

9. The method according to any preceding claim, wherein the step of predicting the prognosis or progression of Renal Cell Carcinoma in the individual is based on the expression level of the at least three genes, relative to a control.

10. The method according to Claim 9, wherein the control comprises one or more Renal Cell Carcinoma cell known to be associated with a poor prognosis or progression, and/or is resistant to an anti-Renal Cell Carcinoma treatment.

11. The method according to Claim 9, wherein the control comprises one or more Renal Cell Carcinoma cell known to be associated with a favourable prognosis or progression, and/or is treatable with an anti-Renal Cell Carcinoma treatment.

12. The method according to Claim 9, wherein the control comprises one or more corresponding renal cell which is not cancerous.

13. The method according to any of Claims 10 or 11, wherein the control comprises Renal Cell Carcinoma cells obtained from a group of individuals.

14. The method according to Claim 12, wherein the control comprises renal cells obtained from a group of individuals.
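As a minimal sketch of what expression "relative to a control" from a group of individuals could look like, the Python below divides each sample value by the control-group mean for the same gene. The ratio-to-mean comparison, the function name `relative_expression` and all numeric values are illustrative assumptions; the claims do not specify how the comparison is computed.

```python
from statistics import mean


def relative_expression(
    sample: dict[str, float],
    control_group: list[dict[str, float]],
) -> dict[str, float]:
    """Return, per gene, the sample level divided by the control-group mean.

    Assumption: a simple ratio to the group mean; other comparisons
    (differences, z-scores, log ratios) would be equally consistent
    with the wording "relative to a control".
    """
    relative = {}
    for gene, level in sample.items():
        control_mean = mean(individual[gene] for individual in control_group)
        if control_mean <= 0:
            raise ValueError(f"control mean for {gene} must be positive")
        relative[gene] = level / control_mean
    return relative


# Hypothetical, unitless expression values for illustration only.
controls = [
    {"mTOR": 1.0, "N-cadherin": 2.0, "EpCAM": 4.0},
    {"mTOR": 3.0, "N-cadherin": 2.0, "EpCAM": 6.0},
]
sample = {"mTOR": 1.0, "N-cadherin": 3.0, "EpCAM": 5.0}
ratios = relative_expression(sample, controls)
```

A ratio below 1 then means the gene is expressed at a lower level in the sample than in the control group on average, and a ratio above 1 means a higher level.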

15. The method according to any preceding claim wherein the at least three genes comprise: N-cadherin, EpCAM and mTOR.

16. The method according to Claim 15, wherein the step of predicting the prognosis or progression of Renal Cell Carcinoma in the individual is based on the algorithm:

Hazard = exp(-18.385 mTOR + 8.927 N-Cadherin + 3.800 EpCAM + 0.129 Age).

17. The method according to Claim 16, wherein a Hazard of >1 indicates a poor prognosis or progression, and wherein a Hazard of <1 indicates a favourable prognosis or progression.
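The claimed algorithm and its threshold can be sketched in Python as follows. The coefficients are copied from the formula above; the function names, the assumption that Age is in years, the handling of a Hazard of exactly 1, and the example input values are hypothetical, and the scaling of the expression levels is not specified in this section.

```python
import math

# Coefficients copied from the claimed algorithm.
COEF_MTOR = -18.385
COEF_N_CADHERIN = 8.927
COEF_EPCAM = 3.800
COEF_AGE = 0.129


def hazard(mtor: float, n_cadherin: float, epcam: float, age: float) -> float:
    """Hazard = exp(-18.385 mTOR + 8.927 N-Cadherin + 3.800 EpCAM + 0.129 Age).

    Assumptions: expression levels are already normalised as the algorithm
    expects, and age is given in years.
    """
    exponent = (
        COEF_MTOR * mtor
        + COEF_N_CADHERIN * n_cadherin
        + COEF_EPCAM * epcam
        + COEF_AGE * age
    )
    return math.exp(exponent)


def interpret(hazard_value: float) -> str:
    """Apply the claimed threshold: >1 is poor, <1 is favourable."""
    if hazard_value > 1:
        return "poor"
    if hazard_value < 1:
        return "favourable"
    # The claims do not say how a Hazard of exactly 1 is read.
    return "indeterminate"


# Hypothetical input values for illustration only.
example_hazard = hazard(mtor=0.5, n_cadherin=0.1, epcam=0.1, age=60)
example_call = interpret(example_hazard)
```

Note the signs of the coefficients: a higher mTOR level lowers the Hazard, whereas higher N-Cadherin, EpCAM and Age raise it.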

18. The method according to any preceding claim, wherein the step of predicting the prognosis or progression of Renal Cell Carcinoma in the individual comprises predicting one or more of the following:

- percentage response of the individual to anti-Renal Cell Carcinoma treatment;

- overall survival of the individual;

- disease-specific survival of the individual; and/or

- progression-free survival of the individual.

19. The method according to any preceding claim, wherein the expression level is determined by measuring the presence and/or amount of one or more product of the gene, for example: protein or mRNA.

20. The method according to Claim 19, wherein protein is measured using a method selected from the list comprising: Raman spectroscopy; Acoustic Membrane MicroParticle technology; an antibody-based detection method, for example, RPPA or tissue microarray (such as AQUA).

21. The method according to Claim 19, wherein mRNA is measured using a PCR-based approach, for example RT-PCR.

22. The method according to any preceding claim further comprising the step of selecting a treatment for the individual.

23. The method according to Claim 22, further comprising the step of administering the selected treatment to the individual.

24. The method according to Claim 22 or 23, wherein the selected treatment is sunitinib.

25. The method according to any preceding claim wherein the sample is selected from the group comprising: a tumour biopsy; blood; serum; plasma; lymphatic fluid; urine.

26. The method according to Claim 25, wherein the sample is a tumour biopsy and the sample comprises tissue from two or more distinct parts of the tumour.

27. The method according to Claim 26, wherein the sample comprises: tissue from three or more distinct parts of the tumour; or tissue from four or more distinct parts of the tumour; or tissue from five or more distinct parts of the tumour; or tissue from six or more distinct parts of the tumour; or tissue from seven or more distinct parts of the tumour; or tissue from eight or more distinct parts of the tumour; or tissue from nine or more distinct parts of the tumour; or tissue from ten or more distinct parts of the tumour.

28. The method according to Claim 26 or 27, wherein the parts of the tumour are distinguished from each other, for example, on the basis of one or more of the following: tumour morphology; and/or physical location within the tumour.

29. A method for predicting the response to therapy of Renal Cell Carcinoma in an individual, comprising the steps of:

- providing a sample comprising one or more Renal Cell Carcinoma cell from the individual;

- determining in the one or more cell the expression level of at least three genes selected from those listed in Table A; and

- predicting the response to therapy of Renal Cell Carcinoma in the individual based on the expression level of the at least three genes.

30. A method for selecting a treatment for an individual with Renal Cell Carcinoma, comprising the steps of:

- providing a sample comprising one or more Renal Cell Carcinoma cell from the individual;

- determining in the one or more cell the expression level of at least three genes selected from those listed in Table A; and

- selecting a treatment for the individual based on the expression level of the at least three genes.

31. The method according to Claim 29 or 30, wherein the step of predicting the response to therapy or selecting a treatment for the individual is additionally based on the age of the individual.

32. The method according to any of Claims 29 to 31, wherein the step of predicting the response to therapy or selecting a treatment for the individual is based on the expression level of the at least three genes, relative to a control.

33. The method according to Claim 32, wherein the control comprises one or more Renal Cell Carcinoma cell known to be associated with a poor prognosis or progression, and/or is resistant to an anti-Renal Cell Carcinoma treatment.

34. The method according to Claim 32, wherein the control comprises one or more Renal Cell Carcinoma cell known to be associated with a favourable prognosis or progression, and/or is treatable with an anti-Renal Cell Carcinoma treatment.

35. The method according to Claim 32, wherein the control comprises one or more corresponding renal cell which is not cancerous.

36. The method according to Claim 33 or 34, wherein the control comprises Renal Cell Carcinoma cells obtained from a group of individuals.

37. The method according to Claim 35, wherein the control comprises renal cells obtained from a group of individuals.

38. The method according to any of Claims 29 to 37 wherein the at least three genes comprise: N-cadherin, EpCAM and mTOR.

39. The method according to Claim 38, wherein the step of predicting the response to therapy of Renal Cell Carcinoma in the individual, or the step of selecting a treatment for the individual, is based on the algorithm:

Hazard = exp(-18.385 mTOR + 8.927 N-Cadherin + 3.800 EpCAM + 0.129 Age).

40. The method according to Claim 39, wherein a Hazard of >1 indicates a poor prognosis or progression, and wherein a Hazard of <1 indicates a favourable prognosis or progression.
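Because the exponential function is strictly increasing and exp(0) = 1, the Hazard threshold of 1 is equivalent to a sign test on the exponent of the algorithm. Writing the exponent as η (a symbol introduced here for illustration only):

```latex
\[
\eta = -18.385\,\mathrm{mTOR} + 8.927\,\text{N-Cadherin} + 3.800\,\mathrm{EpCAM} + 0.129\,\mathrm{Age},
\qquad \mathrm{Hazard} = e^{\eta}
\]
\[
\mathrm{Hazard} > 1 \iff \eta > 0,
\qquad
\mathrm{Hazard} < 1 \iff \eta < 0
\]
```

On this reading, a poor prognosis or progression corresponds to a positive exponent and a favourable one to a negative exponent.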

41. The method according to any of Claims 29 to 40, wherein the expression level is determined by measuring the presence and/or amount of one or more product of the gene, for example: protein or mRNA.

42. The method according to any of Claims 29 and 31 to 41, further comprising the step of selecting a treatment for the individual.

43. The method according to Claim 28 or 42, further comprising the step of administering the selected treatment to the individual.

44. The method according to any of Claims 29, 42 or 43, wherein the anti-Renal Cell Carcinoma treatment is a Tyrosine Kinase Inhibitor or an mTOR Inhibitor.

45. The method according to Claim 44, wherein the Tyrosine Kinase Inhibitor is selected from the group comprising: sunitinib; sorafenib; bevacizumab; pazopanib; axitinib.

46. The method according to Claim 44, wherein the mTOR Inhibitor is selected from the group comprising: temsirolimus; everolimus.

47. The method according to any of Claims 29 to 46, wherein the sample is selected from the group comprising: a tumour biopsy; blood; serum; plasma; lymphatic fluid; urine.

48. The method according to Claim 47, wherein the sample is a tumour biopsy and the sample comprises tissue from two or more distinct parts of the tumour.

49. The method according to Claim 48, wherein the sample comprises: tissue from three or more distinct parts of the tumour; or tissue from four or more distinct parts of the tumour; or tissue from five or more distinct parts of the tumour; or tissue from six or more distinct parts of the tumour; or tissue from seven or more distinct parts of the tumour; or tissue from eight or more distinct parts of the tumour; or tissue from nine or more distinct parts of the tumour; or tissue from ten or more distinct parts of the tumour.

50. The method according to Claim 48 or 49, wherein the parts of the tumour are distinguished from each other, for example, on the basis of one or more of the following: tumour morphology; and/or physical location within the tumour.

51. A method for identifying an agent suitable for treating Renal Cell Carcinoma, comprising the steps of:

- providing an agent to be tested;

- providing a sample comprising one or more Renal Cell Carcinoma cell, wherein the one or more cell is characterised in that the expression level of at least three genes selected from those listed in Table A is associated with poor prognosis and/or resistance to sunitinib;

- contacting the agent to be tested with the one or more cell; and

- identifying the agent as suitable for treating Renal Cell Carcinoma if the agent reduces and/or prevents proliferation and/or differentiation of the one or more cell.

52. The method according to Claim 51, wherein the at least three genes comprise: N-cadherin, EpCAM and mTOR.

53. The method according to Claim 51 or 52 further comprising the step of manufacturing the identified agent.

54. The method according to any of Claims 51 to 53, further comprising the step of formulating the identified agent into a pharmaceutical composition.

55. The method according to any preceding claim wherein the Renal Cell Carcinoma is metastatic clear cell Renal Cell Carcinoma.

56. A kit for performing a method as defined in any of Claims 1 to 55, the kit comprising:

(i) one or more reagent for determining the expression level of at least three genes selected from those listed in Table A; and

(ii) one or more control sample comprising a standard level of expression level of the same at least three genes as defined in (i), above.

57. A kit according to Claim 56, wherein the at least three genes comprise N-Cadherin and EpCAM and mTOR.

58. A kit according to Claim 56 or 57, wherein the one or more reagent for determining the expression level comprises: an antibody against N-Cadherin, and an antibody against EpCAM, and an antibody against mTOR.

59. A method or a use or a kit substantially as claimed herein, with reference to the accompanying description and/or examples and/or drawings.

Description:
METHOD FOR PREDICTING RENAL CELL CARCINOMA (RCC)

The present invention relates to methods, uses and kits for predicting the prognosis and/or progression of Renal Cell Carcinoma in an individual, and methods, uses and kits for predicting the response to therapy of, and/or selecting a treatment for, Renal Cell Carcinoma in an individual.

Renal Cell Carcinoma (RCC) is the most common type of kidney cancer in adults, in whom it is responsible for approximately 90-95% of cases. Mortality is approximately 40%, and five-year survival for those with metastatic Renal Cell Carcinoma is <10% (Stewart et al., 2011, Nat Rev Urol., 8:255-265). There is a great unmet need for significant improvement in the treatment of localised and metastatic cancer as the disease remains the most lethal of all urological malignancies. Renal Cell Carcinoma typically originates in the lining of the proximal convoluted tubule. Unlike many other cancers, Renal Cell Carcinoma is not a single entity, but is instead composed of different cell and tumour types derived from distinct parts of the nephron (such as the epithelium and/or renal tubules), each of which has distinct genotypes, gene expression profiles, histological features and clinical phenotypes.

Currently, risk-stratification of patients with Renal Cell Carcinoma relies on clinico-pathological scoring systems (such as the "Leibovich score"), but it is well-recognised that such methods of stratification have limited value and that there is a need to improve prognostication by inclusion of molecular markers (Galsky, 2013, Lancet Oncol., 14:102-103). To date, attempts to identify molecular markers which can be used to evaluate patients and predict prognosis of Renal Cell Carcinoma have failed, particularly because the heterogeneous nature of Renal Cell Carcinoma tumours has made it extremely difficult to identify markers which strongly associate with the disease. Thus, in contrast to many other cancer types (such as lung cancer and breast cancer), there is no molecular means to identify and select patients that are likely to be responsive to a particular treatment. A range of targeted therapies now exist for Renal Cell Carcinoma (such as sunitinib and axitinib), but the lack of molecular selection criteria means that the majority (around 70%) of Renal Cell Carcinoma patients are subjected to drugs without having any tumour response, whilst incurring potentially significant toxicity and cost (estimated at £70 million per annum in the UK). As a further complication, the heterogeneity of genotype and gene expression profile in Renal Cell Carcinoma tumours appears to increase following drug treatment (O'Mahony et al., 2013, J. Vis. Exp. (71), e50221, doi:10.3791/50221), so it is even more difficult to identify relevant molecular markers in patients once treatment has started.

Against this background, the present inventors have surprisingly discovered that the expression level of certain genes, when taken in combination, can be used as an indicator of the severity and progression of Renal Cell Carcinoma. That finding provides effective molecular markers for predicting the prognosis and/or progression of Renal Cell Carcinoma in an individual, and allows predictions to be made regarding the likely response to therapy of Renal Cell Carcinoma in an individual, thereby permitting an appropriate therapeutic treatment to be selected for that individual.

As described in the accompanying Examples, based on the need for improved prognostication and prediction of response to targeted therapies for patients with Renal Cell Carcinoma, the present inventors used statistical machine-learning techniques to integrate clinical and pathological information with proteomic data to create a novel, robust prognostic and predictive algorithm in RCC. The present invention therefore provides a general approach for predicting the prognosis and/or progression of Renal Cell Carcinoma in an individual, irrespective of whether treatment has commenced or not. Additionally, the invention provides an approach for predicting the response to therapy of Renal Cell Carcinoma in an individual, and provides a means for selecting an appropriate treatment for an individual with Renal Cell Carcinoma.

Accordingly, in a first aspect, the invention provides a method for predicting the prognosis of Renal Cell Carcinoma in an individual, comprising the steps of:

- providing a sample comprising one or more Renal Cell Carcinoma cell from the individual;

- determining in the one or more cell the expression level of at least three genes selected from those listed in Table A; and

- predicting the prognosis of Renal Cell Carcinoma in the individual based on the expression level of the at least three genes.

In a second aspect, the invention provides a method for predicting the progression of Renal Cell Carcinoma in an individual, comprising the steps of:

- providing a sample comprising one or more Renal Cell Carcinoma cell from the individual;

- determining in the one or more cell the expression level of at least three genes selected from those listed in Table A; and

- predicting the progression of Renal Cell Carcinoma in the individual based on the expression level of the at least three genes.

Thus, the present invention provides a new molecular approach for assessing, characterising and monitoring Renal Cell Carcinoma in patients. The inventors' findings are particularly surprising given the heterogeneity of gene expression in Renal Cell Carcinoma, and the invention will therefore be of real clinical benefit, for example in:

(i) Identifying patients in whom a particular treatment (such as sunitinib) is not likely to be effective, thereby sparing patients unnecessary treatments and the associated side-effects and enhancing quality of life; saving cost and allowing more-effective spending of health service budgets; and enabling the exploration of alternative therapeutic treatments where appropriate.

(ii) Identifying patients in whom a particular treatment (such as sunitinib) is likely to be effective, thereby guiding treatment decisions (particularly where patients are on the borderline of receiving that treatment due to other clinical factors), and providing additional confidence in the clinician's decision to treat.

(iii) Informing patient counselling and, in combination with (i) and (ii), providing for evidence-based medicine and improved patient management.

It will be appreciated that the invention involves an individual that has, and is known to have, Renal Cell Carcinoma. Renal Cell Carcinoma is a well known disorder, and those skilled in the arts of medicine and oncology will be familiar with the associated symptoms and be capable of identifying and diagnosing the presence of Renal Cell Carcinoma in an individual. Historically, medical practitioners expected an individual with Renal Cell Carcinoma to present with three findings - in particular: (1) haematuria; (2) flank pain; and (3) an abdominal mass - but it is now known that this triad of symptoms only occurs in 10-15% of cases, and is usually indicative of Renal Cell Carcinoma at an advanced stage.

The initial symptoms of Renal Cell Carcinoma typically include: blood in the urine (occurring in 40% of affected persons at the time that medical advice is sought); and/or flank pain (40%); and/or a mass in the abdomen or flank (25%); and/or weight loss (33%); and/or fever (20%); and/or high blood pressure (20%); and/or night sweats; and/or malaise. Renal Cell Carcinoma is also typically associated with a number of "paraneoplastic syndromes", which are conditions caused by either the hormones produced by the tumour itself or by the body's attack on the tumour, and which commonly affect tissues which do not actually house the tumour. The most common syndromes are selected from: anaemia or polycythaemia; and/or high blood calcium levels; and/or thrombocytosis; and/or secondary amyloidosis.

Thus, by "Renal Cell Carcinoma in an individual", we include an individual that has been diagnosed as having Renal Cell Carcinoma, for example, due to the presentation of one or more of the associated symptoms as discussed herein.

It will be appreciated that Renal Cell Carcinoma is a general term that encompasses a range of distinct types of RCC, including: metastatic clear cell RCC; localised clear cell RCC; multilocular cystic clear cell RCC; tubulocystic RCC; thyroid-like follicular RCC; acquired cystic kidney disease-associated RCC; hybrid oncocytoma/chromophobe RCC. Thus, preferably, the invention involves an individual that has, and is known to have, one or more type of RCC selected from the group comprising: metastatic clear cell RCC; localised clear cell RCC; multilocular cystic clear cell RCC; tubulocystic RCC; thyroid-like follicular RCC; acquired cystic kidney disease-associated RCC; hybrid oncocytoma/chromophobe RCC. Most preferably, the individual has, and is known to have, metastatic clear cell RCC.

Whilst it is preferred that the individual is a human, the individual may also be a non-human mammal (i.e. any mammal other than a human), such as a horse, cow, goat, sheep, pig, dog, cat, rabbit, mouse or rat.

A particularly important aspect of the present invention is the inventors' finding that the expression level of at least three selected genes (of those listed in Table A) in combination provides an indicator of the severity and progression of Renal Cell Carcinoma.

Table A: Genes used in the present invention.

It will be appreciated that any combination of at least three genes from those listed in Table A could be selected and used in the present invention.

Additionally, it will be appreciated that the expression level of more than three genes selected from those listed in Table A could be used in the present invention - for example: four or more; or five or more; or six or more; or seven or more; or eight or more; or nine or more; or ten or more; or 11 or more; or 12 or more; or 13 or more; or 14 or more; or 15 or more; or 16 or more; or 17 or more; or 18 or more; or 19 or more; or 20 or more; or 21 or more; or 22 or more; or 23 or more; or 24 or more; or 25 or more; or 26 or more; or 27 or more; or 28 or more; or 29 or more; or 30 of the genes in Table A.
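To give a sense of scale for these options, the following Python sketch counts the possible gene panels. It assumes that Table A lists exactly 30 genes, which is inferred from the phrase "30 of the genes in Table A" rather than from the table itself; the variable names are illustrative.

```python
from math import comb

# Assumption: Table A contains 30 genes (inferred from the text, not the table).
TABLE_A_SIZE = 30

# Number of distinct panels of exactly three genes.
panels_of_three = comb(TABLE_A_SIZE, 3)

# Number of distinct panels of three or more genes, up to all 30.
panels_of_three_or_more = sum(
    comb(TABLE_A_SIZE, k) for k in range(3, TABLE_A_SIZE + 1)
)

print(panels_of_three)
print(panels_of_three_or_more)
```

Under that assumption there are 4,060 possible three-gene panels alone, which illustrates why a specific combination such as N-cadherin, EpCAM and mTOR is singled out elsewhere in the document.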

It will be appreciated by those skilled in the art of molecular biology that gene expression involves the steps of gene transcription (in which the gene coding sequence is transcribed to mRNA) and translation (in which mRNA is translated to form the encoded protein molecule). Methods for measuring gene expression are known in the art and typically involve detecting the presence and/or activity of the product of transcription (i.e. mRNA), and/or the presence and/or activity of the product of translation (i.e. protein). Exemplary methods for detecting mRNA or protein associated with a particular gene are discussed below. By "expression level" we include a measure of the amount of mRNA and/or protein that is produced by gene expression. Expression may be quantified as the total amount of mRNA and/or protein detectable in a particular sample (such as in a single cell or group of cells), or the amount of mRNA and/or protein produced over a given period.

Methods suitable for obtaining one or more Renal Cell Carcinoma cell from the individual will be known to those skilled in the art of medicine and cellular biology. An exemplary method is described in O'Mahony et al. (2013, J. Vis. Exp. (71), e50221, doi:10.3791/50221).

By "prognosis of Renal Cell Carcinoma in an individual" we include the likely clinical development and outcome of Renal Cell Carcinoma in the individual, including the severity of the disease and/or the life expectancy or survival of the individual. Thus by "predicting the prognosis of Renal Cell Carcinoma in an individual" we include the prediction of the likely clinical development and outcome of Renal Cell Carcinoma in the individual, including the likely severity of the disease and/or the likely life expectancy or survival of the individual.

Those skilled in the art of medicine will be familiar with predicting the prognosis of particular diseases. Prior to the present invention, the prognosis for Renal Cell Carcinoma was known to be influenced by a variety of factors, including tumour size, degree of invasion and metastasis, histologic type, and nuclear grade. For example, for metastatic Renal Cell Carcinoma, factors which may present a poor prognosis include a low "Karnofsky" performance-status score (a standard way of measuring functional impairment in patients with cancer), a low haemoglobin level, a high level of serum lactate dehydrogenase, and a high corrected level of serum calcium. For non-metastatic cases, the "Leibovich" scoring system may be used to predict disease progression.

By "progression of Renal Cell Carcinoma in an individual" we include the likely physical, cellular and/or molecular development of Renal Cell Carcinoma in the individual, including the progression between Stages and Grades of the disease. Thus by "predicting the progression of Renal Cell Carcinoma in an individual" we include the prediction of the likely physical, cellular and/or molecular development of Renal Cell Carcinoma in the individual, including the likelihood of progression between Stages and Grades of the disease.

It will be appreciated that Staging and/or Grading may be used to classify and characterise Renal Cell Carcinoma in the methods of the invention. Various staging approaches are known, such as the "TNM" staging system where the size and extent of the tumour ("T"), involvement of lymph nodes ("N") and metastases ("M") are classified separately. Alternatively, an overall stage grouping (into Stage I-IV) can be used, in line with various clinical guidelines, as shown below:

Grading may be performed using the "Fuhrman system", which is an assessment based on the microscopic morphology of a neoplasm, using haematoxylin and eosin (H&E) staining. That system categorises Renal Cell Carcinoma with Grades 1-4 based on nuclear characteristics, as shown below:

Preferably, the step of predicting prognosis or progression of Renal Cell Carcinoma in the individual is additionally based on the age of the individual.

The inventors have found that the age of the individual is associated with prognosis and/or progression of Renal Cell Carcinoma in that individual. In particular, as demonstrated in the accompanying Examples, increasing age correlates with a poor prognosis whilst decreasing age correlates with a good prognosis; and increasing age correlates with an increased likelihood of progression whilst decreasing age correlates with a decreased likelihood of progression. Thus, older individuals have a greater risk of a poor prognosis than younger individuals.

It is preferred that the age of the individual is 38 years old, or older; for example: 50 years old or older; or 60 years old or older; or 70 years old or older; or 80 years old or older; or 90 years old or older. Preferably, the individual is aged between 38 years old and 79 years old. As discussed above and demonstrated in the accompanying Examples, an older individual has a greater risk of a poor prognosis than a younger individual.

Preferably, the individual has been treated, or is being treated, with one or more anti-Renal Cell Carcinoma treatment. In an alternative embodiment, the individual has not previously been treated with an anti-Renal Cell Carcinoma treatment.

Various Renal Cell Carcinoma treatments are known and used to manage the disorder and alleviate the associated symptoms.

Renal Cell Carcinoma may be treated with radiation therapy and/or chemotherapy, although many cases are relatively resistant to such therapies and immunotherapy is often preferable. As will be appreciated, all such treatments carry the risk of drug toxicity which can lead to undesirable and unpleasant side-effects in the treated individual.

In a preferred embodiment, the anti-Renal Cell Carcinoma treatment comprises or consists of a Tyrosine Kinase Inhibitor (such as a Receptor Tyrosine Kinase inhibitor) or an mTOR Inhibitor. Such inhibitors are well known to those in the art of medicine. It is preferred that the Tyrosine Kinase Inhibitor is selected from the group comprising: sunitinib; sorafenib; bevacizumab; pazopanib; axitinib. Preferably, the mTOR Inhibitor is selected from the group comprising: temsirolimus; everolimus. Most preferably, the anti-Renal Cell Carcinoma treatment is sunitinib. Typically, sunitinib is administered in a six-week treatment cycle comprising four weeks of continuous treatment with 50 mg sunitinib administered once per day, followed by two weeks of no treatment. The dosage may be adjusted in steps of 12.5 mg according to tolerability.
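The six-week cycle described above can be laid out day by day, as in the following Python sketch. The function name `cycle_schedule`, the signed `dose_steps` parameter and the requirement that the adjusted dose stay positive are assumptions made for illustration; the text states only the 50 mg starting dose, the four-weeks-on and two-weeks-off pattern, and the 12.5 mg step size.

```python
ON_TREATMENT_DAYS = 28   # four weeks of once-daily dosing
OFF_TREATMENT_DAYS = 14  # two weeks with no treatment
STANDARD_DOSE_MG = 50.0
DOSE_STEP_MG = 12.5


def cycle_schedule(dose_steps: int = 0) -> list[float]:
    """Return the daily dose in mg for each day of one six-week cycle.

    dose_steps shifts the daily dose by multiples of 12.5 mg
    (negative to reduce, positive to increase). The permitted range of
    adjustment is not given in the text, so only a positive resulting
    dose is enforced here.
    """
    daily_dose = STANDARD_DOSE_MG + dose_steps * DOSE_STEP_MG
    if daily_dose <= 0:
        raise ValueError("adjusted daily dose must be positive")
    return [daily_dose] * ON_TREATMENT_DAYS + [0.0] * OFF_TREATMENT_DAYS


standard_cycle = cycle_schedule()
reduced_cycle = cycle_schedule(dose_steps=-1)
```

Here `standard_cycle` covers 42 days, with the last 14 entries equal to zero, and `reduced_cycle` uses a daily dose one 12.5 mg step below the standard dose.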

Sunitinib and pazopanib are typically used as first-line treatment in many individuals with Renal Cell Carcinoma. Axitinib is typically used as a second-line treatment in individuals with Renal Cell Carcinoma, for example, when the individual is not responsive to sunitinib and/or pazopanib or has developed resistance to sunitinib and/or pazopanib.

It will be appreciated that the present invention may be performed on an individual that has not been treated with an anti-Renal Cell Carcinoma treatment, or may be performed on an individual that has been treated and/or is being treated, with an anti-Renal Cell Carcinoma treatment.

In one embodiment, the present invention may be used to predict the prognosis and/or progression of Renal Cell Carcinoma in an individual that has been treated, and/or is being treated, with an anti-Renal Cell Carcinoma treatment (such as sunitinib or pazopanib). In that embodiment, the invention therefore provides a method for determining whether the treatment is therapeutically effective for that individual, and therefore allows a decision to be made regarding the future treatment of that individual.

For example, where the individual has been treated, or is being treated, with sunitinib and it is determined not to be therapeutically effective, another treatment (such as pazopanib or axitinib) can be administered to the individual instead of, or in addition to, sunitinib.

As an alternative example, where the individual has been treated, or is being treated, with sunitinib and it is determined to be therapeutically effective, that treatment can be continued.

Preferably, the invention provides a method wherein the Renal Cell Carcinoma is characterised in that it comprises one or more of the following: a Stage I tumour; a Stage II tumour; a Stage III tumour; a Stage IV tumour. Preferably, the invention provides a method wherein the Renal Cell Carcinoma is characterised in that it comprises one or more of the following: a Grade 1 tumour; a Grade 2 tumour; a Grade 3 tumour; a Grade 4 tumour.

Based on the accompanying Examples, the inventors believe that the VHL mutation (a known mutation that predisposes to multiple cancers, including Renal Cell Carcinoma) is not responsible for the heterogeneity observed in Renal Cell Carcinoma. Accordingly, the present invention is applicable to Renal Cell Carcinoma regardless of the VHL gene status of the cancer, and may therefore be performed on Renal Cell Carcinoma which has the VHL mutation or does not have the VHL mutation.

Preferably, the step of predicting the prognosis or progression of Renal Cell Carcinoma in the individual is based on the expression level of the at least three genes, relative to a control.

In a preferred embodiment, the control comprises a standard level of expression of the same at least three genes that are analysed in the sample from the individual being tested. For example, where the expression level of N-Cadherin and EpCAM and mTOR is determined in the sample, the control comprises a standard level of expression of the N-Cadherin and EpCAM and mTOR genes.
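As a minimal sketch of expressing a sample "relative to a control", the snippet below divides each gene's level in the sample by the mean level in a control group. The use of a simple ratio, the mean as the "standard level", the function name `relative_expression` and all numeric values are assumptions made for illustration only; the text does not prescribe a particular normalisation.

```python
from statistics import mean


def relative_expression(sample, control_group):
    """Return each gene's level in `sample` divided by the control group's mean level."""
    ratios = {}
    for gene, level in sample.items():
        # Hypothetical "standard level": mean of the same gene across control individuals.
        standard = mean(individual[gene] for individual in control_group)
        ratios[gene] = level / standard
    return ratios


# Hypothetical expression values (arbitrary units), for illustration only.
sample = {"N-Cadherin": 0.30, "EpCAM": 0.10, "mTOR": 0.40}
controls = [
    {"N-Cadherin": 0.20, "EpCAM": 0.10, "mTOR": 0.50},
    {"N-Cadherin": 0.10, "EpCAM": 0.10, "mTOR": 0.30},
]
ratios = relative_expression(sample, controls)
```

A ratio above 1 then indicates elevated expression relative to the control, and a ratio below 1 indicates reduced expression.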

In one embodiment, the control comprises one or more Renal Cell Carcinoma cell known to be associated with a poor prognosis or progression, and/or is resistant to an anti-Renal Cell Carcinoma treatment. In that embodiment of the invention, it is preferred that the control comprises Renal Cell Carcinoma cells obtained from a group of individuals.

In another embodiment, the control comprises one or more Renal Cell Carcinoma cell known to be associated with a favourable prognosis or progression, and/or is treatable with an anti-Renal Cell Carcinoma treatment. For example, the control may comprise one or more corresponding renal cell which is not cancerous. In that embodiment, it is preferred that the control comprises renal cells obtained from a group of individuals.

By a "group of individuals" we mean a group comprising two or more individuals, and preferably four or more individuals. Ideally, the group will not exceed 10,000 individuals, and will preferably not exceed 1000 individuals. Preferably, the group comprises or consists of between 4 and 100 individuals. Preferably, the control will be one that is appropriately matched with the individual being tested - for example, in terms of being the same sex and/or of similar age and/or smoking status.

It is preferred in the methods of the invention that the at least three genes comprise or consist of: N-cadherin, EpCAM and mTOR.

It is particularly preferred in the methods of the invention that the at least three genes comprise or consist of: N-cadherin, EpCAM and mTOR, and that the method of the invention is additionally based on the age of the individual.

Preferably, the step of predicting the prognosis or progression of Renal Cell Carcinoma in the individual is based on the algorithm: Hazard = exp(-18.385 mTOR + 8.927 N-Cadherin + 3.800 EpCAM + 0.129 Age).

In the algorithm of the invention, the value assigned to the expression level of N-cadherin, EpCAM and mTOR is an estimate of the concentration of that protein relative to total protein or to the concentration of a specific control protein (such as pan-cytokeratin), as described in the accompanying Examples.
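The exemplary algorithm can be sketched in a few lines. The coefficients below are copied from the formula above; the function name `neat_hazard`, the argument names and the example input values are hypothetical, and the expression inputs are assumed to be already normalised as described in the accompanying Examples.

```python
import math

# Coefficients copied from the exemplary algorithm in the text.
COEFFICIENTS = {"mTOR": -18.385, "N-Cadherin": 8.927, "EpCAM": 3.800, "Age": 0.129}


def neat_hazard(mtor, n_cadherin, epcam, age):
    """Hazard = exp(-18.385 mTOR + 8.927 N-Cadherin + 3.800 EpCAM + 0.129 Age)."""
    linear_predictor = (
        COEFFICIENTS["mTOR"] * mtor
        + COEFFICIENTS["N-Cadherin"] * n_cadherin
        + COEFFICIENTS["EpCAM"] * epcam
        + COEFFICIENTS["Age"] * age
    )
    return math.exp(linear_predictor)


# Hypothetical inputs, for illustration only (not taken from the Examples).
example_hazard = neat_hazard(mtor=0.5, n_cadherin=0.2, epcam=0.1, age=60)
```

Because mTOR carries a negative coefficient, a higher mTOR value lowers the computed Hazard, whereas higher N-Cadherin, EpCAM or Age values raise it.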

It will be appreciated that the above algorithm provides a mathematical statement of the inventors' findings in relation to the age and expression level of N-cadherin, EpCAM and mTOR, and that alternative algorithms could be derived.

For instance, as described in the accompanying Examples, an algorithm may be derived using the following approach:

- Variables are selected for Cox multivariate analysis using backward elimination regularised by Bayesian Information Criterion (BIC) on a "training" dataset, a form of wrapper feature selection (Ann Stat 1978;6:461, Artif Intell 1997;97:273). BIC regularisation controls overfitting, which is particularly important when training data is limited (JAMA 2010;105:312).

- An initial Cox regression model is fitted using all features by the 'coxph' function from the R survival library (Therneau and Grambsch (2000). 'Modelling Survival Data: Extending the Cox Model'. Springer, New York. ISBN 0-387-98784-3). Backward elimination iteratively removes a single feature at each step, selected for the greatest improvement in BIC value. Thus, features with low predictive power or high redundancy are removed. The procedure terminates with a final model when removing any single feature does not improve the BIC value. The function 'stepAIC' from the MASS R library is used, with the value of k (a multiplier penalising model complexity) specified for BIC regularisation (Venables and Ripley (2002). 'Modern Applied Statistics with S'. Fourth edition. Springer).

- A total of 12 features are input to wrapper selection; these may include key clinical parameters where data is available for the cohorts (such as: grade, gender, age, neutrophils, haemoglobin level, DCM score (Lancet Oncol 2013;14:141)). Other features that may be included are the median tumour expression of proteins that are significantly differentially expressed and/or have substantively increased variance upon treatment (for example: BCL2, MLH1, CAIX, mTOR, N-cadherin and EpCAM).

It will be appreciated that in the accompanying Examples, the selected features were N-cadherin, EpCAM, Age and mTOR (NEAT) and the resulting multivariate Cox proportional hazards model learned on the RCC_TRAIN dataset had likelihood ratio test $p = 1.18 \times 10^{-4}$. The proportional hazards assumption was met; Grambsch-Therneau test results are given in Table 5 [Biometrika 1994;81:515].

Those skilled in the art will be familiar with the Cox analysis approach. In short, the method of Cox proportional hazards regression (J R Stat Soc B 1972;24:187) is a popular approach for multivariate modelling of survival data. This method takes as input a set of variables under consideration (covariates) and patient survival times. The resulting regression model relates covariate values and survival time, expressed as the hazard function which takes the general form:

$h(t) = h_0(t) \times \exp(b_1 x_1 + b_2 x_2 + \dots + b_n x_n)$

where: $h(t)$ is the hazard function, determined by n covariates ($x_1 \dots x_n$) weighted by their coefficients ($b_1 \dots b_n$); the baseline hazard $h_0(t)$ represents the intercept and the hazard where all covariate values are 0.

Cox modelling assumes that covariates are related multiplicatively to hazard, which is known as the proportionality assumption. It is important that the proportionality assumption is met, and the scaled Schoenfeld residuals test is appropriate for this (Biometrika 1994;81:515, Stat Med 1997;16:611). For further discussion see BJC 2003;89:431 and BJC 2003;89:605.
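The backward elimination procedure described above can be sketched generically. This is not the R 'coxph'/'stepAIC' workflow itself: the function names `bic` and `backward_eliminate` are hypothetical, and the toy scoring function below uses invented additive log-likelihood contributions in place of a fitted Cox model, purely so that the loop is runnable. In R's 'stepAIC', BIC-style penalisation corresponds to setting k to the logarithm of the number of observations.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion; lower values are better."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def backward_eliminate(features, score):
    """Remove one feature per step while doing so lowers `score` (e.g. BIC)."""
    current = list(features)
    best = score(current)
    while current:
        # Score every model obtained by dropping a single feature.
        candidates = [(score([f for f in current if f != drop]), drop) for drop in current]
        candidate_score, drop = min(candidates)
        if candidate_score >= best:  # no single removal improves the criterion
            break
        current.remove(drop)
        best = candidate_score
    return current, best


# Hypothetical log-likelihood gain per feature (NOT from the Examples).
TOY_GAIN = {"mTOR": 6.0, "N-cadherin": 5.0, "EpCAM": 4.0, "Age": 4.5, "grade": 0.5, "gender": 0.2}
N_OBS = 50  # hypothetical training-set size


def toy_score(features):
    # Stand-in for refitting a Cox model and reading off its log-likelihood.
    log_likelihood = -100.0 + sum(TOY_GAIN[f] for f in features)
    return bic(log_likelihood, len(features), N_OBS)


selected, final_bic = backward_eliminate(list(TOY_GAIN), toy_score)
```

In this toy setting the two weakly informative features are dropped because the penalty saved by removing them outweighs the log-likelihood they contribute; in practice `score` would refit the survival model on each candidate feature set.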

In short, the exemplary algorithm given above reflects the inventors' findings that:

• A favourable prognosis or progression of Renal Cell Carcinoma in an individual is predicted when:

- the expression level of N-cadherin is reduced; and

- the expression level of EpCAM is reduced; and

- the expression level of mTOR is elevated; and

- there is an increasing hazard with increasing Age of the individual.

• A poor prognosis or progression of Renal Cell Carcinoma in an individual is predicted when:

- the expression level of N-cadherin is elevated; and

- the expression level of EpCAM is elevated; and

- the expression level of mTOR is reduced; and

- there is an increasing hazard with increasing Age of the individual.
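As a worked illustration of the Age term, consider two hypothetical individuals with identical N-cadherin, EpCAM and mTOR values whose ages differ by $\Delta$ years. Using only the Age coefficient (0.129) from the exemplary algorithm, the other terms cancel and the ratio of their Hazard values is:

```latex
\[
\frac{\mathrm{Hazard}(\mathrm{Age} + \Delta)}{\mathrm{Hazard}(\mathrm{Age})}
  = \exp(0.129\,\Delta),
\qquad
\exp(0.129) \approx 1.14,
\qquad
\exp(0.129 \times 10) \approx 3.63 .
\]
```

Under the exemplary coefficients, each additional year of age therefore multiplies the Hazard by roughly 1.14, and a ten-year difference by roughly 3.6, all else being equal.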

Conveniently, according to the methods of the invention, a Hazard of >1 indicates a poor prognosis or progression, and a Hazard of <1 indicates a favourable prognosis or progression. In other words, a Hazard of >1 indicates a poor prognosis and a high risk/likelihood that the Renal Cell Carcinoma will progress to a more serious or severe stage and/or grade; a Hazard of <1 indicates a good prognosis and a low risk/likelihood that the Renal Cell Carcinoma will progress to a more serious or severe stage and/or grade.

Those skilled in the art will be familiar with the concept of assessing risk using a "Hazard" value. As will be appreciated, the term "Hazard" describes the probability of an individual experiencing an event - as used herein, in the context of survival, Hazard describes the instantaneous death rate for an individual who is alive at a given time. As discussed above, the method of Cox proportional hazards regression is a popular approach for multivariate modelling of survival data, and that method takes as input a set of variables under consideration (covariates) and patient survival times. The resulting regression model relates covariate values and survival time, expressed as the hazard function which takes a general form: $h(t) = h_0(t) \times \exp(b_1 x_1 + b_2 x_2 + \dots + b_n x_n)$, as described above.

It is preferred that the step of predicting the prognosis or progression of Renal Cell Carcinoma in the individual comprises predicting one or more of the following: percentage response of the individual to anti-Renal Cell Carcinoma treatment; overall survival of the individual; disease-specific survival of the individual; and/or progression-free survival of the individual.
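The Hazard thresholds given above (>1 for a poor prognosis or progression, <1 for a favourable one) can be sketched as a small helper. The function name `classify_hazard` and the handling of a Hazard of exactly 1 are assumptions; the text does not address that boundary case.

```python
def classify_hazard(hazard):
    """Map a Hazard value to the prognosis category described in the text."""
    if hazard > 1:
        return "poor"
    if hazard < 1:
        return "favourable"
    # Assumption: the text does not say how a Hazard of exactly 1 is treated.
    return "indeterminate"
```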

Those skilled in the art will be familiar with the use of such criteria for predicting the prognosis or progression of diseases.

"Overall survival of the individual" refers to the length of time that the individual survives, regardless of the cause of death. "Disease-specific survival of the individual" refers to the length of time that the individual survives, where the cause of death is the disease being monitored (and in which individuals that die from other causes are disregarded). Thus, the "disease-specific survival rate" is the percentage of individuals in a study who have not died from the disease in a defined period of time; the time period usually begins at the time of diagnosis or at the start of treatment and ends at the time of death.

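The survival-rate definitions above can be illustrated with a simplified calculation. The sketch assumes every individual is followed for the whole period (no censoring), counts deaths from other causes as non-events for the disease-specific rate, and uses a hypothetical record layout and function name (`survival_rates`); it is not a substitute for a proper survival analysis.

```python
def survival_rates(records, period_months):
    """Return (overall %, disease-specific %) survival within `period_months`.

    Each record is a dict with:
      "months_to_death": time from diagnosis to death, or None if still alive
      "cause": "disease", "other", or None
    """
    n = len(records)
    deaths_any = 0
    deaths_disease = 0
    for record in records:
        t = record["months_to_death"]
        if t is not None and t <= period_months:
            deaths_any += 1
            if record["cause"] == "disease":
                deaths_disease += 1
    overall = 100.0 * (n - deaths_any) / n
    disease_specific = 100.0 * (n - deaths_disease) / n
    return overall, disease_specific


# Hypothetical cohort of four individuals, for illustration only.
cohort = [
    {"months_to_death": 10, "cause": "disease"},
    {"months_to_death": 5, "cause": "other"},
    {"months_to_death": None, "cause": None},
    {"months_to_death": None, "cause": None},
]
overall_rate, disease_specific_rate = survival_rates(cohort, period_months=12)
```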
"Progression-free survival of the individual" refers to the length of time during and after the diagnosis and any treatment, that the individual lives with the disease without it worsening. Methods of measuring the expression level of a gene are well known to those in the art of molecular biology. In one embodiment of the invention, the expression level is determined by measuring the presence and/or amount of one or more product of the gene, for example: protein or mRNA. Assaying protein levels in a biological sample can be performed using any art-known method. Preferred for assaying protein levels in a biological sample are antibody-based techniques. Such techniques may involve a primary antibody (which specifically recognises the target protein) and a secondary antibody (which specifically recognises the primary antibody) which comprises a detectable moiety.

Other antibody-based methods useful for detecting protein levels include immunoassays, such as the enzyme linked immunosorbent assay (ELISA) and the radioimmunoassay (RIA). For example, a protein-specific monoclonal antibody can be used both as an immune-adsorbent and as an enzyme-labelled probe to detect and quantify the protein. The amount of protein present in the sample can be calculated by reference to the amount present in a standard preparation using a linear regression computer algorithm. Such an ELISA for detecting a tumour antigen is described in Iacobelli et al., Breast Cancer Research and Treatment 11:19-30 (1988). In another ELISA assay, two distinct specific monoclonal antibodies can be used to detect protein in a body fluid. In this assay, one of the antibodies is used as the immune-adsorbent and the other as the enzyme-labelled probe.
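The calculation "by reference to a standard preparation using a linear regression" can be sketched as a least-squares standard curve that is then inverted for an unknown sample. The function names, the concentration and signal values, and the assumption that the assay is operating in its linear range are all illustrative and are not taken from the cited reference.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_signal(signal, slope, intercept):
    """Invert the standard curve to estimate concentration from a measured signal."""
    return (signal - intercept) / slope


# Hypothetical standard preparation: known concentrations and measured signals.
standard_concentrations = [0.0, 10.0, 20.0, 40.0]
standard_signals = [0.05, 0.25, 0.45, 0.85]

slope, intercept = fit_line(standard_concentrations, standard_signals)
estimated = concentration_from_signal(0.65, slope, intercept)
```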

The above techniques may be conducted essentially as a "one-step" or "two-step" assay. The "one-step" assay involves contacting protein with immobilized antibody and, without washing, contacting the mixture with the labelled antibody. The "two-step" assay involves washing before contacting the mixture with the labelled antibody. Other conventional methods may also be employed as suitable. It is usually desirable to immobilize one component of the assay system on a support, thereby allowing other components of the system to be brought into contact with the component and readily removed from the sample. Suitable enzyme labels include, for example, those from the oxidase group, which catalyse the production of hydrogen peroxide by reacting with substrate. Glucose oxidase is particularly preferred as it has good stability and its substrate (glucose) is readily available. Activity of an oxidase label may be assayed by measuring the concentration of hydrogen peroxide formed by the enzyme-labelled antibody/substrate reaction. Besides enzymes, other suitable labels include radioisotopes, such as iodine (125I, 121I), carbon (14C), sulphur (35S), tritium (3H), indium (112In), and technetium (99mTc), and fluorescent labels, such as fluorescein and rhodamine, and biotin.

It will be appreciated that protein-specific antibodies for use in the present invention can be raised against the intact protein or an antigenic polypeptide fragment thereof, which may be presented together with a carrier protein, such as an albumin, to an animal system (such as rabbit or mouse) or, if it is long enough (at least about 25 amino acids), without a carrier. As used herein, the term "antibody" (Ab) or "monoclonal antibody" (Mab) is meant to include intact molecules as well as antibody fragments (such as, for example, Fab and F(ab')2 fragments) which are capable of specifically binding to the target protein. Fab and F(ab')2 fragments lack the Fc fragment of intact antibody, clear more rapidly from the circulation, and may have less non-specific tissue binding than an intact antibody (Wahl et al., J. Nucl. Med. 24:316-325 (1983)). Thus, these fragments are preferred.

Further suitable labels for protein-specific antibodies are provided below. Examples of suitable enzyme labels include malate dehydrogenase, staphylococcal nuclease, delta-5-steroid isomerase, yeast-alcohol dehydrogenase, alpha-glycerol phosphate dehydrogenase, triose phosphate isomerase, peroxidase, alkaline phosphatase, asparaginase, glucose oxidase, beta-galactosidase, ribonuclease, urease, catalase, glucose-6-phosphate dehydrogenase, glucoamylase, and acetylcholine esterase.

Examples of suitable radio-isotopic labels include 3H, 111In, 125I, 131I, 32P, 35S, 14C, 51Cr, 57To, 58Co, 59Fe, 75Se, 152Eu, 90Y, 67Cu, 217Ci, 211At, 212Pb, 47Sc, and 109Pd. Examples of suitable non-radioactive isotopic labels include 157Gd, 55Mn, 162Dy, 52Tr, and 56Fe.

Examples of suitable fluorescent labels include an 152Eu label, a fluorescein label, an isothiocyanate label, a rhodamine label, a phycoerythrin label, a phycocyanin label, an allophycocyanin label, an o-phthaldehyde label, and a fluorescamine label. Examples of suitable toxin labels include diphtheria toxin, ricin, and cholera toxin. Examples of chemiluminescent labels include a luminol label, an isoluminol label, an aromatic acridinium ester label, an imidazole label, an acridinium salt label, an oxalate ester label, a luciferin label, a luciferase label, and an aequorin label. Examples of nuclear magnetic resonance contrasting agents include heavy metal nuclei such as Gd, Mn, and iron.

Typical techniques for binding the above-described labels to antibodies are provided by Kennedy et al., Clin. Chim. Acta 70:1-31 (1976), and Schurs et al., Clin. Chim. Acta 81:1-40 (1977). Coupling techniques mentioned in the latter are the glutaraldehyde method, the periodate method, the dimaleimide method, and the m-maleimidobenzyl-N-hydroxy-succinimide ester method, all of which methods are incorporated by reference herein.

Preferably, protein is measured using a method selected from the list comprising: Raman spectroscopy; Acoustic Membrane MicroParticle technology; an antibody-based detection method, for example, RPPA or AQUA.

Acoustic Membrane MicroParticle (AMMP) technology is a non-optical detection technology for determining protein concentration. In brief, micro-particles are used to capture a protein analyte in a sample, and rapidly transport it to a sensor surface, thus resulting in a measurable signal that is tracked by observing the sensor response.

Preferably, mRNA is measured using a PCR-based approach, for example RT-PCR. The RT-PCR method is described in Makino et al., Technique 2:295-301 (1990), and involves the radio-activities of the "amplicons" in the polyacrylamide gel bands being linearly related to the initial concentration of the target mRNA. Briefly, that method involves adding total RNA isolated from a biological sample in a reaction mixture containing a RT primer and appropriate buffer. After incubating for primer annealing, the mixture can be supplemented with a RT buffer, dNTPs, DTT, RNase inhibitor and reverse transcriptase. After incubation to achieve reverse transcription of the RNA, the RT products are subjected to PCR using labelled primers. Alternatively, rather than labelling the primers, a labelled dNTP can be included in the PCR reaction mixture. PCR amplification can be performed in a DNA thermal cycler according to conventional techniques. After a suitable number of rounds to achieve amplification, the PCR reaction mixture is electrophoresed on a polyacrylamide gel. After drying the gel, the radioactivity of the appropriate bands (corresponding to the mRNA) is quantified using an imaging analyser. RT and PCR reaction ingredients and conditions, reagent and gel concentrations, and labelling methods are well known in the art. Variations on the RT-PCR method will be apparent to those skilled in the art. Any set of oligonucleotide primers which will amplify reverse transcribed target mRNA can be used and those skilled in the art will be aware of how to design, manufacture and use such primers.

In a preferred embodiment, the method of the first and/or second aspect of the invention further comprises the step of selecting a treatment for the individual.

Still more preferably, the method of the first and/or second aspect of the invention further comprises the step of administering the selected treatment to the individual.

As discussed above, the present invention may be used to predict the prognosis and/or progression of Renal Cell Carcinoma in an individual that has been treated, or is being treated, with an anti-Renal Cell Carcinoma treatment (such as sunitinib or pazopanib). In that embodiment, the invention therefore provides a method for determining whether the treatment is therapeutically effective for that individual, and therefore allows a decision to be made regarding the future treatment of that individual.

For example, where the individual has been treated, or is being treated, with sunitinib and it is determined not to be therapeutically effective, another treatment (such as pazopanib or axitinib) can be administered to the individual instead of, or in addition to, sunitinib. As an alternative example, where the individual has been treated, or is being treated, with sunitinib and it is determined to be therapeutically effective, that treatment can be continued. As discussed above, in a preferred embodiment, the anti-Renal Cell Carcinoma treatment comprises or consists of a Tyrosine Kinase Inhibitor or an mTOR Inhibitor. Such inhibitors are well known to those in the art of medicine. It is preferred that the Tyrosine Kinase Inhibitor is selected from the group comprising: sunitinib; sorafenib; bevacizumab; pazopanib; axitinib. Preferably, the mTOR Inhibitor is selected from the group comprising: temsirolimus; everolimus.

Most preferably, the anti-Renal Cell Carcinoma treatment is sunitinib. Typically, sunitinib is administered in a six-week treatment cycle comprising four-weeks of continuous treatment of 50mg sunitinib administered once per day, followed by two-weeks of no treatment. The dosage may be adjusted in steps of 12.5mg according to tolerability.

In an embodiment of the methods of the invention, the sample is selected from the group comprising: a tumour biopsy; blood; serum; plasma; lymphatic fluid; urine. Clinical approaches for obtaining such samples from an individual will be known to those skilled in the art of medicine.

Preferably, the sample is a tumour biopsy. Samples of bodily fluids (such as blood; serum; plasma; lymphatic fluid; urine) contain Circulating Tumour Cells (CTCs) derived from the Renal Cell Carcinoma, which may be purified using standard techniques known to those in the art (for example, Nagrath et al., 2007, Nature, 450:1235-1239).

The inventors have identified that, where the methods of the invention are performed using a tumour biopsy sample, the results are of greater significance if the sample contains tissue from two or more (and preferably, three or more) distinct parts of the tumour. As discussed above, Renal Cell Carcinoma tumours exhibit significant heterogeneity across the tumour, and the inventors' approach of generating a "combined" or "averaged" sample, is thought to provide a sample that is a better representation of the tumour as a whole - by doing so, the methods of the invention are less likely to be based on an atypical part of the tumour, and the results consequently have greater significance and accuracy. Thus, in a particularly preferred embodiment, the sample is a tumour biopsy and the sample comprises tissue from two or more distinct parts of the tumour. It will be appreciated that the tissue will comprise one or more Renal Cell Carcinoma cell from the part of the tumour from which it is taken.

It is preferred that the sample comprises tissue from three or more distinct parts of the tumour. However, the sample may comprise tissue from four or more distinct parts of the tumour; or tissue from five or more distinct parts of the tumour; or tissue from six or more distinct parts of the tumour; or tissue from seven or more distinct parts of the tumour; or tissue from eight or more distinct parts of the tumour; or tissue from nine or more distinct parts of the tumour; or tissue from ten or more distinct parts of the tumour.

Typically, each sample comprises 1 cubic centimetre (i.e. 1cm3) volume of tissue, or more (such as 2cm3 of tissue; or 3cm3 of tissue).

Preferably, tissue from the relevant parts of the tumour is obtained by taking a separate biopsy from each distinct part of the tumour. However, it will be appreciated that, where distinct parts of a tumour are adjacent to one another, tissue from the relevant parts of the tumour may be obtained by taking a single biopsy which comprises distinct parts of the tumour.

Where separate biopsies are taken, it is preferred in the methods and uses of the invention that the step of determining the expression level of the selected genes is performed separately in each biopsy.
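One way to turn separately measured biopsies into a single "combined" or "averaged" value per gene is sketched below. The use of the median as the summary statistic, the function name `combine_regions`, the region labels and all expression values are assumptions for illustration; the text requires only that expression be determined separately in each biopsy.

```python
from statistics import median


def combine_regions(region_measurements):
    """Summarise per-gene expression across biopsies from distinct tumour parts."""
    genes = region_measurements[0].keys()
    # Assumption: the median is used as the tumour-level summary.
    return {gene: median(region[gene] for region in region_measurements) for gene in genes}


# Hypothetical measurements from three distinct parts of one tumour.
regions = [
    {"N-Cadherin": 0.20, "EpCAM": 0.10, "mTOR": 0.50},  # e.g. centre
    {"N-Cadherin": 0.50, "EpCAM": 0.30, "mTOR": 0.40},  # e.g. edge
    {"N-Cadherin": 0.30, "EpCAM": 0.20, "mTOR": 0.60},  # e.g. in between
]
tumour_level = combine_regions(regions)
```

Summarising across regions in this way means that a single atypical region has limited influence on the tumour-level value.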

Preferably, the parts of the tumour are distinguished from each other, for example, on the basis of one or more of the following: tumour morphology (such as Fuhrman grade and/or sarcomatoid features and/or necrosis); and/or physical location within the tumour. Approaches for sampling a tumour are known to those in the art and include:

1) Sampling guided by Fuhrman grade, to collect samples representative of the Fuhrman grade diversity in the tumour;

2) Sampling of the tumour to avoid collection of material from necrotic regions;

3) Sampling guided by physical proximity (for example, in the centre of the tumour, at the edge of the tumour, in between the edge and the centre of the tumour). It is known that such parts of tumours may have differences in terms of oxygenation, tumour cell biology (for example, invasive character and/or differentiation status), and composition of the stromal components;

4) Sampling guided by sarcomatoid features;

5) Sampling of the tumour at biopsy (not nephrectomy) where areas which are radiologically variable (i.e. on CT scanning) are sampled.

It will be appreciated that any or all of the above approaches may be used within the methods and uses of the invention. Suitable approaches are known in the art for selecting and taking tissue from distinct parts of a tumour in order to generate a tumour biopsy sample for use in the invention. For example, the parts of the tumour may be randomly selected from within the tumour. Alternatively, the parts of the tumour may be purposely selected from distinct regions of the tumour - for example, to ensure that certain distinct parts of the tumour are selected for analysis, as discussed above.

Thus, a particularly-preferred embodiment of the first aspect of the invention is: a method for predicting the prognosis of Renal Cell Carcinoma (RCC) in an individual, comprising the steps of:

- providing a sample comprising one or more Renal Cell Carcinoma cell from the individual;

- determining in the one or more cell the expression level of at least three genes selected from those listed in Table A; and

- predicting the prognosis of Renal Cell Carcinoma in the individual based on the expression level of the at least three genes; wherein the step of predicting the prognosis of Renal Cell Carcinoma in the individual is additionally based on the age of the individual;

and wherein the at least three genes comprise: N-cadherin, EpCAM and mTOR; and wherein the sample is a tumour biopsy and the sample comprises tissue from two or more (and, preferably, three or more) parts of the tumour.

A particularly-preferred embodiment of the second aspect of the invention is: a method for predicting the progression of Renal Cell Carcinoma in an individual, comprising the steps of:

- providing a sample comprising one or more Renal Cell Carcinoma cell from the individual;

- determining in the one or more cell the expression level of at least three genes selected from those listed in Table A; and

- predicting the progression of Renal Cell Carcinoma in the individual based on the expression level of the at least three genes; wherein the step of predicting the progression of Renal Cell Carcinoma in the individual is additionally based on the age of the individual;

and wherein the at least three genes comprise: N-cadherin, EpCAM and mTOR; and wherein the sample is a tumour biopsy and the sample comprises tissue from two or more (and, preferably, three or more) parts of the tumour.

In a third aspect, the invention provides a method for predicting the response to therapy of Renal Cell Carcinoma in an individual, comprising the steps of:

- providing a sample comprising one or more Renal Cell Carcinoma cell from the individual;

- determining in the one or more cell the expression level of at least three genes selected from those listed in Table A; and

- predicting the response to therapy of Renal Cell Carcinoma in the individual based on the expression level of the at least three genes.

In a fourth aspect, the invention provides a method for selecting a treatment for an individual with Renal Cell Carcinoma, comprising the steps of:

- providing a sample comprising one or more Renal Cell Carcinoma cell from the individual;

- determining in the one or more cell the expression level of at least three genes selected from those listed in Table A; and

- selecting a treatment for the individual based on the expression level of the at least three genes.

It will be appreciated that, in the methods of the third and fourth aspects of the invention, the step of "providing a sample comprising one or more Renal Cell Carcinoma cell from the individual" and "determining in the one or more cell the expression level of at least three genes selected from those listed in Table A" may be performed as described above in relation to the first or second aspects of the invention. It will be appreciated that any combination of at least three genes from those listed in Table A could be selected and used in the third and/or fourth aspects of the invention. Additionally, it will be appreciated that the expression level of more than three genes selected from those listed in Table A could be used in the present invention - for example: four or more; or five or more; or six or more; or seven or more; or eight or more; or nine or more; or ten or more; or 11 or more; or 12 or more; or 13 or more; or 14 or more; or 15 or more; or 16 or more; or 17 or more; or 18 or more; or 19 or more; or 20 or more; or 21 or more; or 22 or more; or 23 or more; or 24 or more; or 25 or more; or 26 or more; or 27 or more; or 28 or more; or 29 or more; or 30 of the genes in Table A.

As discussed herein and in the accompanying Examples, the inventors' findings provide an approach for predicting the prognosis and/or progression of Renal Cell Carcinoma in an individual that has been treated, or is being treated, with an anti-Renal Cell Carcinoma treatment (such as sunitinib or pazopanib). Thus, the invention provides a method for determining whether the treatment is therapeutically effective for that individual, and therefore allows a decision to be made regarding the future treatment of that individual. Those skilled in the art will be aware of methods for determining the response to therapy - for example, the RECIST (Response Evaluation Criteria In Solid Tumours) criteria provide a set of rules that define when cancer patients improve ("respond"), stay the same ("stable") or worsen ("progression") during treatments; they were originally published in February 2000 and updated in January 2009 (www.recist.com).

For example, where the individual has been treated, or is being treated, with sunitinib and it is determined not to be therapeutically effective, another treatment (such as pazopanib or axitinib) can be administered to the individual instead of, or in addition to, sunitinib.

As an alternative example, where the individual has been treated, or is being treated, with sunitinib and it is determined to be therapeutically effective, that treatment can be continued.

It is preferred that the step of predicting the response to therapy or selecting a treatment for the individual is additionally based on the age of the individual.

As discussed above, the age of the individual is associated with prognosis and/or progression of Renal Cell Carcinoma in that individual. In particular, as demonstrated in the accompanying Examples, increasing age correlates with a poor prognosis whilst decreasing age correlates with a good prognosis; and increasing age correlates with an increased likelihood of progression whilst decreasing age correlates with a decreased likelihood of progression. Thus, older individuals have a greater risk of a poor prognosis than younger individuals. It is preferred that, in the third or fourth aspect of the invention, the individual has been treated, or is being treated, with one or more anti-Renal Cell Carcinoma treatment. In an alternative embodiment, the individual has not previously been treated with an anti-Renal Cell Carcinoma treatment. Preferably, the invention provides a method wherein the Renal Cell Carcinoma is characterised in that it comprises one or more of the following: a Stage I tumour; a Stage II tumour; a Stage III tumour; a Stage IV tumour.

Preferably, the invention provides a method wherein the Renal Cell Carcinoma is characterised in that it comprises one or more of the following: a Grade 1 tumour; a Grade 2 tumour; a Grade 3 tumour; a Grade 4 tumour.

In a preferred embodiment of the third and/or fourth aspect of the invention, the step of predicting the response to therapy or selecting a treatment for the individual is based on the expression level of the at least three genes, relative to a control.

In a preferred embodiment, the control comprises a standard level of expression of the same at least three genes that are analysed in the sample from the individual being tested. For example, where the expression level of N-Cadherin and EpCAM and mTOR is determined in the sample, the control comprises a standard level of expression of the N-Cadherin and EpCAM and mTOR genes.

In one embodiment, the control comprises one or more Renal Cell Carcinoma cell known to be associated with a poor prognosis or progression, and/or is resistant to an anti-Renal Cell Carcinoma treatment.

In that embodiment of the invention, it is preferred that the control comprises Renal Cell Carcinoma cells obtained from a group of individuals. In another embodiment, the control comprises one or more Renal Cell Carcinoma cell known to be associated with a favourable prognosis or progression, and/or is treatable with an anti-Renal Cell Carcinoma treatment. For example, the control may comprise one or more corresponding renal cell which is not cancerous. In that embodiment, it is preferred that the control comprises renal cells obtained from a group of individuals.

In the third and fourth aspects of the invention, it is preferred that the at least three genes comprise: N-cadherin, EpCAM and mTOR.

It is particularly preferred in the methods of the invention that the at least three genes comprise or consist of: N-cadherin, EpCAM and mTOR, and that the method of the invention is additionally based on the age of the individual.

Preferably, the step of predicting the response to therapy of Renal Cell Carcinoma in the individual, or the step of selecting a treatment for the individual, is based on the algorithm:

Hazard = exp(-18.385 mTOR + 8.927 N-Cadherin + 3.800 EpCAM + 0.129 Age).

Conveniently, according to the methods of the invention, a Hazard of >1 indicates a poor prognosis or progression, and wherein a Hazard of <1 indicates a favourable prognosis or progression.

In the algorithm of the invention, the value assigned to the expression level of N-cadherin, EpCAM and mTOR is an estimate of the concentration of that protein relative to total protein or to the concentration of a specific control protein (such as pan-cytokeratin), as described in the accompanying Examples.
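The formula above can be evaluated directly. The following is a minimal sketch only: the coefficients and the Hazard thresholds are those stated in the text, while the function names, the "indeterminate" label for a Hazard of exactly 1, and any input values are hypothetical. The scaling of the expression values and of age is assumed to match whatever was used in the accompanying Examples, which this passage does not specify.

```python
import math

# Coefficients exactly as given in the formula in the text.
COEFFS = {"mTOR": -18.385, "N-Cadherin": 8.927, "EpCAM": 3.800, "Age": 0.129}


def neat_hazard(mtor, n_cadherin, epcam, age):
    """Return Hazard = exp(linear predictor) for one individual.

    Expression inputs are relative protein estimates (assumption: scaled
    as in the Examples); age scaling is likewise an assumption.
    """
    linear_predictor = (
        COEFFS["mTOR"] * mtor
        + COEFFS["N-Cadherin"] * n_cadherin
        + COEFFS["EpCAM"] * epcam
        + COEFFS["Age"] * age
    )
    return math.exp(linear_predictor)


def classify(hazard):
    """Map a Hazard value to the outcome wording used in the text."""
    if hazard > 1:
        return "poor"
    if hazard < 1:
        return "favourable"
    # The text does not say how a Hazard of exactly 1 is handled.
    return "indeterminate"
```

Because mTOR carries a negative coefficient, a higher mTOR value lowers the Hazard when the other inputs are held fixed, whereas higher N-Cadherin, EpCAM or age values raise it.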

In one embodiment of the third and fourth methods of the invention, the expression level is determined by measuring the presence and/or amount of one or more product of the gene, for example: protein or mRNA.

Methods for measuring the expression level of a gene are well known to those in the art of biochemistry and are described above.

Preferably, protein is measured using a method selected from the list comprising: Raman spectroscopy; Acoustic Membrane MicroParticle technology; an antibody-based detection method, for example, RPPA or AQUA. Preferably, mRNA is measured using a PCR-based approach, for example RT-PCR.

In an embodiment, the method of the third aspect of the invention further comprises the step of selecting a treatment for the individual. In an embodiment, the method of the third or fourth aspect of the invention further comprises the step of administering the selected treatment to the individual.

Preferably, the treatment selected and/or administered in the methods of third or fourth aspects of the invention is an anti-Renal Cell Carcinoma treatment, such as a Tyrosine Kinase Inhibitor or an mTOR Inhibitor. Anti-Renal Cell Carcinoma treatments, and such inhibitors, are known to those skilled in the art. It is preferred that the Tyrosine Kinase Inhibitor is selected from the group comprising: sunitinib; sorafenib; bevacizumab; pazopanib; axitinib. It is preferred that the mTOR Inhibitor is selected from the group comprising: temsirolimus; everolimus.

Most preferably, the anti-Renal Cell Carcinoma treatment is sunitinib. Typically, sunitinib is administered in a six-week treatment cycle comprising four-weeks of continuous treatment of 50mg sunitinib administered once per day, followed by two-weeks of no treatment. The dosage may be adjusted in steps of 12.5mg according to tolerability.

In common with the first and second aspects of the invention, in the methods of the third and fourth aspects of the invention it is preferred that the sample is selected from the group comprising: a tumour biopsy; blood; serum; plasma; lymphatic fluid; urine. Details of those sample types are discussed above.

As discussed above, the inventors have identified that, where the methods of the invention are performed using a tumour biopsy sample, the results are of greater significance if the sample contains tissue from two or more (and preferably, three or more) distinct parts of the tumour. As discussed above, Renal Cell Carcinoma tumours exhibit significant heterogeneity across the tumour, and the inventors' approach of generating a "combined" or "averaged" sample, is thought to provide a sample that is a better representation of the tumour as a whole - by doing so, the methods of the invention are less likely to be based on an atypical part of the tumour, and the results consequently have greater significance and accuracy.

Thus, in a particularly preferred embodiment, the sample is a tumour biopsy and the sample comprises tissue from two or more distinct parts of the tumour. It will be appreciated that the tissue will comprise one or more Renal Cell Carcinoma cell from the part of the tumour from which it is taken.

It is preferred that the sample comprises tissue from three or more distinct parts of the tumour. However, the sample may comprise tissue from four or more distinct parts of the tumour; or tissue from five or more distinct parts of the tumour; or tissue from six or more distinct parts of the tumour; or tissue from seven or more distinct parts of the tumour; or tissue from eight or more distinct parts of the tumour; or tissue from nine or more distinct parts of the tumour; or tissue from ten or more distinct parts of the tumour.

Typically, each sample comprises 1 cubic centimetre (i.e. 1 cm3) volume of tissue, or more (such as 2cm3 of tissue; or 3cm3 of tissue).

Preferably, tissue from the relevant parts of the tumour is obtained by taking a separate biopsy from each distinct part of the tumour. However, it will be appreciated that, where distinct parts of a tumour are adjacent to one another, tissue from the relevant parts of the tumour may be obtained by taking a single biopsy which comprises distinct parts of the tumour. Where separate biopsies are taken, it is preferred in the methods and uses of the invention that the step of determining the expression level of the selected genes is performed separately in each biopsy.

Preferably, the parts of the tumour are distinguished from each other, for example, on the basis of one or more of the following: tumour morphology (such as Fuhrman grade and/or sarcomatoid features and/or necrosis); and/or physical location within the tumour.

Approaches for sampling a tumour are known to those in the art and include:

1) Sampling guided by Fuhrman grade, to collect samples representative of the Fuhrman grade diversity in the tumour;

2) Sampling of the tumour to avoid collection of material from necrotic regions;

3) Sampling guided by physical proximity (for example, in the centre of the tumour, at the edge of the tumour, in between the edge and the centre of the tumour). It is known that such parts of tumours may have differences in terms of oxygenation, tumour cell biology (for example, invasive character and/or differentiation status), and composition of the stromal components;

4) Sampling guided by sarcomatoid features;

5) Sampling of the tumour at biopsy (not nephrectomy) where areas which are radiologically variable (i.e. no CT scanning) are sampled.

It will be appreciated that any or all of the above approaches may be used within methods and uses of the invention.

Suitable approaches are known in the art for selecting and taking tissue from distinct parts of a tumour in order to generate a tumour biopsy sample for use in the invention. For example, the parts of the tumour may be randomly selected from within the tumour. Alternatively, the parts of the tumour may be purposely selected from distinct regions of the tumour - for example, to ensure that certain distinct parts of the tumour are selected for analysis, as discussed above.

Thus, a particularly-preferred embodiment of the third aspect of the invention is: a method for predicting the response to therapy of Renal Cell Carcinoma in an individual, comprising the steps of:

- providing a sample comprising one or more Renal Cell Carcinoma cell from the individual;

- determining in the one or more cell the expression level of at least three genes selected from those listed in Table A; and

- predicting the response to therapy of Renal Cell Carcinoma in the individual based on the expression level of the at least three genes; wherein the step of predicting the response to therapy of Renal Cell Carcinoma in the individual is additionally based on the age of the individual;

and wherein the at least three genes comprise: N-cadherin, EpCAM and mTOR; and wherein the sample is a tumour biopsy and the sample comprises tissue from two or more (and, preferably, three or more) parts of the tumour.

A particularly-preferred embodiment of the fourth aspect of the invention is: a method for selecting a treatment for an individual with Renal Cell Carcinoma, comprising the steps of:

- providing a sample comprising one or more Renal Cell Carcinoma cell from the individual;

- determining in the one or more cell the expression level of at least three genes selected from those listed in Table A; and

- selecting a treatment for the individual based on the expression level of the at least three genes; wherein the step of selecting a treatment for the individual is additionally based on the age of the individual;

and wherein the at least three genes comprise: N-cadherin, EpCAM and mTOR; and wherein the sample is a tumour biopsy and the sample comprises tissue from two or more (and, preferably, three or more) parts of the tumour.

As discussed above, the invention provides methods and uses for characterising and classifying Renal Cell Carcinoma cells in an individual, and provides a means for identifying cells that are associated with poor prognosis and/or resistance to sunitinib. Such cells may be useful as tools in methods for identifying alternative agents that can be used as effective therapeutics against such cells.

Accordingly, in a fifth aspect, the invention provides a method for identifying an agent suitable for treating Renal Cell Carcinoma, comprising the steps of:

- providing an agent to be tested;

- providing a sample comprising one or more Renal Cell Carcinoma cell, wherein the one or more cell is characterised in that the expression level of at least three genes selected from those listed in Table A is associated with poor prognosis and/or resistance to sunitinib;

- contacting the agent to be tested with the one or more cell; and

- identifying the agent as suitable for treating Renal Cell Carcinoma if the agent reduces and/or prevents proliferation and/or differentiation of the one or more cell.

Preferably in the fifth aspect of the invention, the sample is a tumour biopsy and the sample comprises tissue from two or more distinct parts of the tumour. It will be appreciated that the tissue will comprise one or more Renal Cell Carcinoma cell from the part of the tumour from which it is taken.

It is preferred that the sample comprises tissue from three or more distinct parts of the tumour. However, the sample may comprise tissue from four or more distinct parts of the tumour; or tissue from five or more distinct parts of the tumour; or tissue from six or more distinct parts of the tumour; or tissue from seven or more distinct parts of the tumour; or tissue from eight or more distinct parts of the tumour; or tissue from nine or more distinct parts of the tumour; or tissue from ten or more distinct parts of the tumour.

Typically, each sample comprises 1 cubic centimetre (i.e. 1 cm3) volume of tissue, or more (such as 2cm3 of tissue; or 3cm3 of tissue).

Preferably in the fifth aspect of the invention, tissue from the relevant parts of the tumour is obtained by taking a separate biopsy from each distinct part of the tumour. However, it will be appreciated that, where distinct parts of a tumour are adjacent to one another, tissue from the relevant parts of the tumour may be obtained by taking a single biopsy which comprises distinct parts of the tumour.

Where separate biopsies are taken, it is preferred in the methods and uses of the invention that the step of determining the expression level of the selected genes is performed separately in each biopsy.

Preferably in the fifth aspect of the invention, the parts of the tumour are distinguished from each other, for example, on the basis of one or more of the following: tumour morphology (such as Fuhrman grade and/or sarcomatoid features and/or necrosis); and/or physical location within the tumour.

Approaches for sampling a tumour are known to those in the art and include:

1) Sampling guided by Fuhrman grade, to collect samples representative of the Fuhrman grade diversity in the tumour;

2) Sampling of the tumour to avoid collection of material from necrotic regions;

3) Sampling guided by physical proximity (for example, in the centre of the tumour, at the edge of the tumour, in between the edge and the centre of the tumour). It is known that such parts of tumours may have differences in terms of oxygenation, tumour cell biology (for example, invasive character and/or differentiation status), and composition of the stromal components;

4) Sampling guided by sarcomatoid features;

5) Sampling of the tumour at biopsy (not nephrectomy) where areas which are radiologically variable (i.e. no CT scanning) are sampled.

It will be appreciated that any or all of the above approaches may be used within the methods and uses of the invention.

Suitable approaches are known in the art for selecting and taking tissue from distinct parts of a tumour in order to generate a tumour biopsy sample for use in the invention. For example, the parts of the tumour may be randomly selected from within the tumour. Alternatively, the parts of the tumour may be purposely selected from distinct regions of the tumour - for example, to ensure that certain distinct parts of the tumour are selected for analysis, as discussed above.

Assays suitable for determining proliferation and/or differentiation of the one or more cell are well known to those in the art of cell biology. For example, one or more of the following assays may be used: a 2D invasion assay (for example, scratch plate and/or transwell); a 3D invasion assay (for example, as described in PLoS One 2011;6:e17083); an apoptosis assay; a cell viability assay; a growth inhibition assay; a vascularisation assay (for example, as described in Microvasc. Res 2014;92:72). Kits suitable for performing such assays are commercially available and may be sourced from, for example, Roche, Cell Biolabs, and/or Millipore.

It will be appreciated that any combination of at least three genes from those listed in Table A could be selected and used in the present invention.

Additionally, it will be appreciated that the expression level of more than three genes selected from those listed in Table A could be used in the present invention - for example: four or more; or five or more; or six or more; or seven or more; or eight or more; or nine or more; or ten or more; or 11 or more; or 12 or more; or 13 or more; or 14 or more; or 15 or more; or 16 or more; or 17 or more; or 18 or more; or 19 or more; or 20 or more; or 21 or more; or 22 or more; or 23 or more; or 24 or more; or 25 or more; or 26 or more; or 27 or more; or 28 or more; or 29 or more; or 30 of the genes in Table A.

Preferably, in the method of the fifth aspect of the invention, the at least three genes comprise: N-cadherin, EpCAM and mTOR.

In an embodiment, the method of the fifth aspect of the invention further comprises the step of manufacturing the identified agent. In a further preferred embodiment, the method of the fifth aspect of the invention further comprises the step of formulating the identified agent into a pharmaceutical composition.

It is preferred in the methods, uses and kits of the invention that the Renal Cell Carcinoma is metastatic clear cell Renal Cell Carcinoma. Typically, in metastatic Renal Cell Carcinoma, the cancer has metastasised to one or more of the following: lymph node; lung; liver; adrenal gland; brain; bone.

In a sixth aspect, the invention provides a kit for performing the method of the first and/or second and/or third and/or fourth aspect of the invention, the kit comprising:

(i) one or more reagent for determining the expression level of at least three genes selected from those listed in Table A; and

(ii) one or more control sample comprising a standard level of expression of the same at least three genes as defined in (i), above.

In one embodiment of the sixth aspect of the invention, the one or more control sample comprises one or more Renal Cell Carcinoma cell known to be associated with a poor prognosis or progression, and/or is resistant to an anti-Renal Cell Carcinoma treatment. In that embodiment of the invention, it is preferred that the control comprises Renal Cell Carcinoma cells obtained from a group of individuals. In another embodiment of the sixth aspect of the invention, the control comprises one or more Renal Cell Carcinoma cell known to be associated with a favourable prognosis or progression, and/or is treatable with an anti-Renal Cell Carcinoma treatment. For example, that control may comprise one or more corresponding renal cell which is not cancerous. In that embodiment, it is preferred that the control comprises renal cells obtained from a group of individuals.

Preferably, the one or more reagent for determining the expression level is an antibody, and is preferably one or more antibody selected from those listed in Table 2. It will be appreciated that the kit may comprise one or more reagent for determining the expression level of any combination of at least three genes from those listed in Table A. Additionally, it will be appreciated that the kit may comprise one or more reagent for determining the expression level of more than three genes selected from those listed in Table A - for example: four or more; or five or more; or six or more; or seven or more; or eight or more; or nine or more; or ten or more; or 11 or more; or 12 or more; or 13 or more; or 14 or more; or 15 or more; or 16 or more; or 17 or more; or 18 or more; or 19 or more; or 20 or more; or 21 or more; or 22 or more; or 23 or more; or 24 or more; or 25 or more; or 26 or more; or 27 or more; or 28 or more; or 29 or more; or 30 of the genes in Table A.

It is preferred that the at least three genes comprise or consist of N-Cadherin and EpCAM and mTOR. More preferably, the one or more reagent for determining the expression level comprises: an antibody against N-Cadherin, and an antibody against EpCAM, and an antibody against mTOR. Most preferably, those antibodies are as defined in Table 2.

In a seventh aspect, the invention provides the use of one or more Renal Cell Carcinoma cell from an individual in predicting the prognosis of Renal Cell Carcinoma in the individual, comprising the step of determining the expression level in the one or more Renal Cell Carcinoma cell of at least three genes selected from those listed in Table A and, optionally, additionally based on the age of the individual. It is preferred that the at least three genes comprise N-cadherin, EpCAM and mTOR. It is preferred that the sample is a tumour biopsy and the sample comprises tissue from two or more (and, preferably, three or more) distinct parts of the tumour. Exemplary approaches for performing such a use are discussed above.

In an eighth aspect, the invention provides the use of one or more Renal Cell Carcinoma cell from an individual in predicting the progression of Renal Cell Carcinoma in the individual, comprising the step of determining the expression level in the one or more Renal Cell Carcinoma cell of at least three genes selected from those listed in Table A and, optionally, additionally based on the age of the individual. It is preferred that the at least three genes comprise N-cadherin, EpCAM and mTOR. It is preferred that the sample is a tumour biopsy and the sample comprises tissue from two or more (and, preferably, three or more) distinct parts of the tumour. Exemplary approaches for performing such a use are discussed above.

In a ninth aspect, the invention provides the use of one or more Renal Cell Carcinoma cell from an individual in predicting the response to therapy of Renal Cell Carcinoma in the individual, comprising the step of determining the expression level in the one or more Renal Cell Carcinoma cell of at least three genes selected from those listed in Table A and, optionally, additionally based on the age of the individual. It is preferred that the at least three genes comprise N-cadherin, EpCAM and mTOR. It is preferred that the sample is a tumour biopsy and the sample comprises tissue from two or more (and, preferably, three or more) distinct parts of the tumour. Exemplary approaches for performing such a use are discussed above.

In a tenth aspect, the invention provides the use of one or more Renal Cell Carcinoma cell from an individual in selecting a treatment for the Renal Cell Carcinoma in the individual, comprising the step of determining the expression level in the one or more Renal Cell Carcinoma cell of at least three genes selected from those listed in Table A and, optionally, additionally based on the age of the individual. It is preferred that the at least three genes comprise N-cadherin, EpCAM and mTOR. It is preferred that the sample is a tumour biopsy and the sample comprises tissue from two or more (and, preferably, three or more) distinct parts of the tumour. Exemplary approaches for performing such a use are discussed above.

In an eleventh aspect, the invention provides the use of one or more Renal Cell Carcinoma cell from an individual in identifying an agent suitable for treating Renal Cell Carcinoma, characterised in that the one or more cell has an expression level of at least three genes selected from those listed in Table A, that is associated with poor prognosis and/or resistance to sunitinib. It is preferred that the at least three genes comprise N-cadherin, EpCAM and mTOR. It is preferred that the sample is a tumour biopsy and the sample comprises tissue from two or more (and, preferably, three or more) distinct parts of the tumour. Exemplary approaches for performing such a use are discussed above.

In a twelfth aspect, the invention provides the use of the formula: Hazard = exp(-18.385 mTOR + 8.927 N-Cadherin + 3.800 EpCAM + 0.129 Age), in predicting the prognosis of Renal Cell Carcinoma in an individual, or predicting the progression of Renal Cell Carcinoma in an individual, or predicting the response to therapy of Renal Cell Carcinoma in an individual, or selecting a treatment for Renal Cell Carcinoma in an individual. Exemplary approaches for performing such uses are discussed above.

The invention further provides a method or a use or a kit substantially as claimed or described herein, with reference to the accompanying description and/or examples and/or drawings.

The listing or discussion in this specification of an apparently prior-published document should not necessarily be taken as an acknowledgement that the document is part of the state of the art or is common general knowledge.

Preferred, non-limiting examples which embody certain aspects of the invention will now be described, with reference to the following figures:

Figure 1 - Expression of proteins differentially expressed in sunitinib-naïve and sunitinib-exposed tumours. The distributions of median expression values are shown in pairs for drug-naïve (left of each pair, light shading) and exposed (right of each pair, dark shading) tumour samples. Thirty proteins had significant expression differences (FDR p <0.05).

Figure 2 - Median expression values per tumour for proteins that had differential expression and substantively higher variance following sunitinib exposure. Values for sunitinib-exposed tumours are shown on the right (dark shading) and sunitinib-naïve on the left (light shading). This figure does not capture the intratumoral variance, because median expression values per tumour are plotted.

Figure 3 - Overall survival of mccRCC development cohort stratified by the NEAT algorithm. Stratification is based on a Cox proportional hazards model trained using the neoadjuvant sunitinib cohort (n=22, 'RCC_TRAIN'), hence sunitinib-exposed tissue was analysed. Features selected by regularised machine learning were median values of N-cadherin, EpCAM and mTOR plus age (NEAT). The identified features and model coefficients were learned on the data shown, which therefore does not provide an independent test. The high risk patient group is shown with the dashed line; the low risk group with dotted line (Mantel-Cox test p = 0.00553). There is a striking survival difference at two years.

Figure 4 - Overall survival of mccRCC validation cohort, stratified by the NEAT algorithm. Groups were defined according to risk score from the Cox proportional hazards model based on median tumour expression value of N-Cadherin, EpCAM and mTOR, plus age (NEAT). The data shown is for the cytoreductive nephrectomy validation cohort (n=22, 'RCC_TEST') where marker expression was determined from sunitinib-naïve tissue. The data analysed was independent of feature selection and of fitting the model coefficients. The high risk patient group is shown with the dashed line; the low risk group with dotted line (Mantel-Cox test p = 7.62 × 10^-7). The survival difference between these groups at 2 years is dramatic.

Figure 5 - Overall Survival distributions for the RCC_TRAIN (SuMR) and RCC_TEST (SCOTRCC) cohorts. The above distributions show bimodality for both cohorts studied, with very similar mode positions around 11 and 27 months. RCC_TRAIN has the largest proportion of patients in the mode centred around 27 months (reaching a density value of 0.037). RCC_TEST has the largest proportion of patients in the mode around 11 months (reaching a density value of 0.049). All data, including censored, are shown. The proportion of patients in the long and short survival times differs between the groups. The apparently larger proportion of patients with short survival in RCC_TEST is partly due to greater censoring in this cohort (55%) compared with RCC_TRAIN (41%).

Figure 6 - Within tumour variance ratio of sunitinib naive and sunitinib treated samples. The number of proteins analysed for which the median intratumoural variance is greater in the sunitinib naive group (n=15) or greater in the sunitinib treated group (n=40) is shown. There are significantly more proteins with higher median variance in the treated group than the untreated group (P=0.00101, binomial test). Proteins with variance below zero on the x-axis are greater for the untreated primary tumour; those above the x-axis zero were greater in the treated primary tumour. This result therefore demonstrates significantly increased molecular variance upon treatment.
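The binomial test quoted for Figure 6 can be illustrated with a short calculation. This is a sketch under stated assumptions: it assumes a two-sided exact test against a null probability of 0.5 (the caption does not state sidedness or the software used), and the function name is hypothetical. Applied to the counts in the caption (40 of 55 proteins with greater variance in the treated group), it gives a p-value on the order of 0.001, in line with the reported figure.

```python
from math import comb


def binomial_two_sided_p(k, n):
    """Exact two-sided p-value for k successes in n trials when p = 0.5.

    Under p = 0.5 the distribution is symmetric, so the two-sided value
    is twice the probability of the smaller tail (capped at 1).
    """
    tail = min(k, n - k)
    # Sum the binomial coefficients for outcomes at least as extreme
    # as the observed one in the smaller tail.
    tail_count = sum(comb(n, i) for i in range(tail + 1))
    return min(1.0, 2 * tail_count / 2 ** n)


# Counts taken from the Figure 6 caption: 40 proteins with greater
# variance in the treated group, 15 in the naive group.
p_value = binomial_two_sided_p(40, 40 + 15)
```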

Figure 7 - RPPA differential protein expression results. Box-and-whisker plot showing differential expression analysis of 55 proteins evaluated by RPPA in sunitinib naive (light blue) and treated (dark blue) ccRCC samples. *=P<0.05, **=P<0.01, ***=P<0.001.

Figure 8 - Results for significantly variable and differentially expressed proteins in sunitinib naive and treated patient test and validation samples. (A) Box-and-whisker plot showing test set RPPA differential expression results of 4 key proteins. Medians and inter-quartile ranges are shown in the figure. (B) Box-and-whisker plot showing AQUA evaluated protein expression of 4 key proteins using validation cohort of 61 sunitinib and 25 pazopanib treated and untreated paired mccRCC samples. Of the 4 proteins, CA9 was the only one with significant differential protein expression (P=0.01). Medians and inter-quartile ranges are shown in the figure. (C) Kaplan-Meier curve showing relationship of CA9 protein expression determined by AQUA in situ analysis (low vs high, as determined using X-tile; Camp et al., Clin Cancer Res 2004;10:7252-9) in sunitinib/pazopanib treated patients to OS (HR=0.260, 95% CI: 0.111-0.608, P=0.001).

Figure 9 - Array CGH and RNA interference CA9 results. (A) Heat map plotting gains (red) and losses (blue) of CA9. The right-hand bar represents the lod score (-log10) of the adjusted P-value (Fisher's test), dashed line represents P=0.05. There were significantly more losses in the treated samples relative to the untreated patient samples (P=0.002, Fisher's test). See Figure 10 for further description of regional chromosomal changes and Figure 11 for details of the genome wide changes in gains and losses following sunitinib therapy. (B) RCC11 and (C) CAKI-2 human RCC transfected with either control or CA9 siRNA, followed by sunitinib treatment and cell viability analysis 5 days later. Error bars represent standard errors of the mean. To confirm silencing, cell lysates from RCC11 and CAKI-2 siRNA transfected cells were analysed by western blotting using CA9 and β-actin-specific antibodies, as indicated.

Figure 10 - Heatmap from aCGH data illustrating the recurrently lost region containing CA9 comprises 5.2MB on chromosome 9 from 32554042 to 38751914. CA9 locus is indicated.

Figure 11 - Frequency plot illustrating quantity of aCGH gains (green) and losses (red) across the whole genome. In the sunitinib treated patient samples there are increases in gains (P=0.02) and losses (P=0.2) relative to the untreated patient samples.

Figure 12 - Overall approach for investigation of the effect of subsampling on NEAT predictive performance. A total of 10^6 combinations of n={1,2,3} samples were analysed across the 22 patients in RCC_TEST where multiregional sampling was performed to capture identified morphological heterogeneity (top left). The distributions of hazard ratios and log-rank p-values (bottom right) across the 10^6 samples taken using a low-discrepancy sequence (methods) are shown in more detail in Figure 13.

Figure 13 - Log hazard ratios (A) and log-rank p-values (B) from testing NEAT on RCC_TEST using a maximum of 1, 2 or 3 tumour samples from each patient, over one million sampling runs for each maximum. Protein concentrations for input to NEAT used median expression from the selected samples. Where patients had equal or fewer samples than the maximum indicated, all their samples were used. Input age values were unchanged. Hazard and p-values were determined by testing NEAT on the subsampled RCC_TEST cohorts. The vertical dot-dash lines indicate baseline NEAT performance using all available samples for each patient (log-hazard ratio, A). Using one sample per patient results in poor algorithm performance, although hazard ratios are significantly better than random; higher numbers of samples result in significantly improved performance (p<10^-324).
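The per-patient subsampling step described for Figure 13 can be sketched as follows. This is illustrative only: the function name and the dictionary layout for a tumour sample are hypothetical, and ordinary pseudo-random selection is used here as a stand-in for the low-discrepancy sequence referred to in the Figure 12 caption.

```python
import random
from statistics import median


def subsample_median(samples, max_samples, rng):
    """Median marker expression from at most max_samples tumour samples.

    samples: list of dicts mapping marker name -> expression value, one
    dict per tumour sample from a single patient (hypothetical layout).
    If the patient has max_samples or fewer samples, all are used, as
    stated in the caption.
    """
    if len(samples) <= max_samples:
        chosen = samples
    else:
        # Stand-in for the low-discrepancy sampling used in the source.
        chosen = rng.sample(samples, max_samples)
    markers = chosen[0].keys()
    return {m: median(s[m] for s in chosen) for m in markers}


# Hypothetical expression values for one patient with four samples.
rng = random.Random(0)
patient = [
    {"mTOR": 0.10, "N-Cadherin": 0.20, "EpCAM": 0.05},
    {"mTOR": 0.30, "N-Cadherin": 0.10, "EpCAM": 0.15},
    {"mTOR": 0.20, "N-Cadherin": 0.40, "EpCAM": 0.10},
    {"mTOR": 0.25, "N-Cadherin": 0.30, "EpCAM": 0.20},
]
inputs_for_neat = subsample_median(patient, 2, rng)
```

Repeating this over many runs and all patients, with age left unchanged, would yield the subsampled cohorts on which the hazard ratios and log-rank p-values are evaluated.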

Figure 14 - Variation in per-patient NEAT risk score using a limited number of tumour samples. Each plot shows the distribution of NEAT risk scores using every possible combination of tumour samples for a particular patient. Vertical bars indicate log risk score (logRS) range using the specified number of samples. All patients had 2-8 samples available (median 4). The baseline risk using all samples is thus shown on the right of each plot as a single point. In general, logRS spread decreases as the number of samples increases. For many patients the risk score distribution encompasses the classification threshold (logRS=0); therefore assignment of high or low risk is sensitive to the tumour sampling regime.
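The enumeration described for Figure 14 can be sketched in the same way. This is a minimal sketch that assumes the log risk score (logRS) is the linear predictor inside the exponential of the Hazard formula given earlier, computed from the per-marker median of the selected samples; the function names and the input values are hypothetical.

```python
from itertools import combinations
from statistics import median

# Marker coefficients from the Hazard formula stated in the description.
MARKER_COEFFS = {"mTOR": -18.385, "N-Cadherin": 8.927, "EpCAM": 3.800}
AGE_COEFF = 0.129


def log_risk_score(samples, age):
    """logRS from per-marker medians of the given samples plus age.

    Assumption: logRS equals the linear predictor of the Hazard formula.
    """
    score = AGE_COEFF * age
    for marker, coeff in MARKER_COEFFS.items():
        score += coeff * median(s[marker] for s in samples)
    return score


def log_rs_range(samples, age, k):
    """(min, max) logRS over every combination of k tumour samples."""
    scores = [log_risk_score(combo, age) for combo in combinations(samples, k)]
    return min(scores), max(scores)
```

When the returned range straddles zero for a given number of samples, the high or low risk assignment for that patient depends on which samples happen to be taken, which is the sensitivity the caption describes.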

Table 1 - Clinical characteristics of cohorts studied. o Hb <130 (M), <110 (F). * One patient in the SuMR cohort had insufficient tissue for grade analysis.

Table 2 - Antibodies used.

Table 3 - Multivariate Cox proportional-hazards results for overall survival using predictive variables identified by regularised wrapper selection on the SuMR mccRCC cohort (neo-adjuvant sunitinib).

Table 4 - Performance Characteristics of the NEAT Algorithm Against Clinical Nomograms. *n=20 due to missing data; **n=16 due to missing data.

Table 5 - Results of Grambsch-Therneau Test of Proportional Hazards Assumption for NEAT model.

Table 6 - Multivariate Cox modelling results with a single molecular marker excluded. * Likelihood-ratio p values for models 1, 2, 3 were respectively 0.00175, 0.65, 0.081.

Table 7 - Patient demographics, pathology details and clinical outcomes of test and validation patient cohorts from whom there was adequate tumour tissue for molecular analysis, in Example 2. IQR: interquartile range; N/A: not applicable; NA: not available; P-values comparing sunitinib naive and sunitinib treated patients: P>0.05, S P=0.02; + P=0.04; A number of sunitinib naive patients who had post-nephrectomy TKIs.

Table 8 - Validated antibodies used in RPPA and AQUA experiments in Example 2.

Table 9 - Multivariate Cox regression analysis to assess the effect of multiple prognostic factors on overall survival of the validation cohort in Example 2. ** p<0.05.

Table 10 - Multivariate Cox proportional-hazards results for overall survival using NEAT normalised by total protein (NEAT_NORM) values on the SuMR mccRCC cohort (neoadjuvant sunitinib).

REFERENCES

Armstrong, A.J., George, D.J. & Halabi, S., 2012. Serum Lactate Dehydrogenase Predicts for Overall Survival Benefit in Patients With Metastatic Renal Cell Carcinoma Treated With Inhibition of Mammalian Target of Rapamycin. Journal of Clinical Oncology, 30(27), pp.3402-3407.

Banumathy, G. & Cairns, P., 2010. Signaling pathways in renal cell carcinoma. Cancer biology & therapy, 10(7), pp.658-664.

Bradburn, M.J. et al., 2003a. Survival Analysis Part II: Multivariate data analysis - an introduction to concepts and methods. British Journal of Cancer, 89(3), pp.431-436.

Bradburn, M.J. et al., 2003b. Survival Analysis Part III: Multivariate data analysis - choosing a model and assessing its adequacy and fit. British Journal of Cancer, 89(4), pp.605-611.

Conover, W.J., Johnson, M.E. & Johnson, M.M., 1981. A Comparative Study of Tests for Homogeneity of Variances, with Applications to the Outer Continental Shelf Bidding Data. Technometrics, 23(4), pp.351-361.

Cox, D., 1972. Regression models and life tables. Journal of the Royal Statistical Society B, 34, pp.187-220.

Donhuijsen, K. & Schulz, S., 1989. Prognostic significance of vimentin positivity in formalin-fixed renal cell carcinomas. Pathology research and practice, 184(3), pp.287-291.

Eichelberg, C. et al., 2013. Epithelial cell adhesion molecule is an independent prognostic marker in clear cell renal carcinoma. International journal of cancer. Journal international du cancer, 132(12), pp.2948-2955.

Galsky, M.D., 2013. A prognostic model for metastatic renal-cell carcinoma. The Lancet Oncology, 14(2), pp.102-103.

Gerlinger, M. et al., 2012. Intratumor Heterogeneity and Branched Evolution Revealed by Multiregion Sequencing. New England Journal of Medicine, 366(10), pp.883-892.

Grambsch, P.M. & Therneau, T.M., 1994. Proportional hazards tests and diagnostics based on weighted residuals. Biometrika, 81(3), pp.515-526.

Harada, K. et al., 2012. Expression of epithelial-mesenchymal transition markers in renal cell carcinoma: impact on prognostic outcomes in patients undergoing radical nephrectomy. BJU international, 110(11 Pt C), pp.E1131-1137.

Heng, D.Y. et al., 2013. External validation and comparison with other models of the International Metastatic Renal-Cell Carcinoma Database Consortium prognostic model: a population-based study. The Lancet Oncology, 14(2), pp.141-148.

Johnson, W.E., Li, C. & Rabinovic, A., 2007. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics, 8(1), pp.118-127.

Katz, E. et al., 2011. An In Vitro Model That Recapitulates the Epithelial to Mesenchymal Transition (EMT) in Human Breast Cancer. PLoS ONE, 6(2). Available at: http://dx.doi.org/10.1371/journal.pone.0017083 [Accessed May 6, 2014].

Kim, H.L. et al., 2005. Using tumor markers to predict the survival of patients with metastatic renal cell carcinoma. The Journal of urology, 173(5), pp.1496-1501.

Kohavi, R. & John, G.H., 1997. Wrappers for feature subset selection. Artificial Intelligence, 97(1-2), pp.273-324.

Kolch, W. & Pitt, A., 2010. Functional proteomics to dissect tyrosine kinase signalling pathways in cancer. Nature Reviews Cancer, 10(9), pp.618-629.

Laplante, M. & Sabatini, D.M., 2012. mTOR signaling in growth control and disease. Cell, 149(2), pp.274-293.

Leibovich, B.C. et al., 2005. A Scoring Algorithm to Predict Survival for Patients with Metastatic Clear Cell Renal Cell Carcinoma: A Stratification Tool for Prospective Clinical Trials. The Journal of Urology, 174(5), pp.1759-1763.

Lynch, T.J. et al., 2004. Activating Mutations in the Epidermal Growth Factor Receptor Underlying Responsiveness of Non-Small-Cell Lung Cancer to Gefitinib. New England Journal of Medicine, 350(21), pp.2129-2139.

Maetzel, D. et al., 2009. Nuclear signalling by tumour-associated antigen EpCAM. Nature cell biology, 11(2), pp.162-171.

Mendel, D.B. et al., 2003. In Vivo Antitumor Activity of SU11248, a Novel Tyrosine Kinase Inhibitor Targeting Vascular Endothelial Growth Factor and Platelet-derived Growth Factor Receptors Determination of a Pharmacokinetic/Pharmacodynamic Relationship. Clinical Cancer Research, 9(1), pp.327-337.

Motzer, R.J. et al., 2002. Interferon-Alfa as a Comparative Treatment for Clinical Trials of New Therapies Against Advanced Renal Cell Carcinoma. Journal of Clinical Oncology, 20(1), pp.289-296.

Motzer, R.J. et al., 2009. Overall survival and updated results for sunitinib compared with interferon alfa in patients with metastatic renal cell carcinoma. Journal of clinical oncology: official journal of the American Society of Clinical Oncology, 27(22), pp.3584-3590.

Motzer, R.J. et al., 2007. Sunitinib versus Interferon Alfa in Metastatic Renal-Cell Carcinoma. New England Journal of Medicine, 356(2), pp.115-124.

Neeley, E.S. et al., 2009. Variable slope normalization of reverse phase protein arrays. Bioinformatics (Oxford, England), 25(11), pp.1384-1389.

Ng'andu, N.H., 1997. An Empirical Comparison of Statistical Tests for Assessing the Proportional Hazards Assumption of Cox's Model. Statistics in Medicine, 16(6), pp.611-626.

O'Mahony, F.C. et al., 2013. The use of reverse phase protein arrays (RPPA) to explore protein expression variation within individual renal cell cancers. Journal of visualized experiments: JoVE, (71).

Pantuck, A.J. et al., 2010. NF-kappaB-dependent plasticity of the epithelial to mesenchymal transition induced by Von Hippel-Lindau inactivation in renal cell carcinomas. Cancer research, 70(2), pp.752-761.

Pencina, M.J., D'Agostino, R.B., Sr & Steyerberg, E.W., 2011. Extensions of net reclassification improvement calculations to measure usefulness of new biomarkers. Statistics in medicine, 30(1), pp.11-21.

Powles, T. et al., 2011. The Outcome of Patients Treated with Sunitinib Prior to Planned Nephrectomy in Metastatic Clear Cell Renal Cancer. European Urology, 60(3), pp.448-454.

Romond, E.H. et al., 2005. Trastuzumab plus Adjuvant Chemotherapy for Operable HER2-Positive Breast Cancer. New England Journal of Medicine, 353(16), pp.1673-1684.

Schwarz, G., 1978. Estimating the Dimension of a Model. The Annals of Statistics, 6(2), pp.461-464.

Seligson, D.B. et al., 2004. Epithelial cell adhesion molecule (KSA) expression: pathobiology and its role as an independent predictor of survival in renal cell carcinoma. Clinical cancer research: an official journal of the American Association for Cancer Research, 10(8), pp.2659-2669.

Sharpe, K. et al., 2013. The Effect of VEGF-Targeted Therapy on Biomarker Expression in Sequential Tissue from Patients with Metastatic Clear Cell Renal Cancer. Clinical Cancer Research, 19(24), pp.6924-6934.

Shimazui, T. et al., 2006. Expression profile of N-cadherin differs from other classical cadherins as a prognostic marker in renal cell carcinoma. Oncology Reports. Available at: http://www.spandidos-publications.com/or/15/5/1181 [Accessed May 6, 2014].

Sokic, S. et al., 2014. Label-free nondestructive imaging of vascular network structure in 3D culture. Microvascular research, 92, pp.72-78.

Spizzo, G. et al., 2011. EpCAM expression in primary tumour tissues and metastases: an immunohistochemical analysis. Journal of clinical pathology, 64(5), pp.415-420.

Stephens, M.A., 1974. EDF Statistics for Goodness of Fit and Some Comparisons. Journal of the American Statistical Association, 69(347), pp.730-737.

Stewart, G.D. et al., 2011. What can molecular pathology contribute to the management of renal cell carcinoma? Nature Reviews Urology, 8(5), pp.255-265.

Sun, M., Thuret, R., et al., 2011. Age-Adjusted Incidence, Mortality, and Survival Rates of Stage-Specific Renal Cell Carcinoma in North America: A Trend Analysis. European Urology, 59(1), pp.135-141.

Sun, M., Shariat, S.F., et al., 2011. Prognostic Factors and Predictive Models in Renal Cell Carcinoma: A Contemporary Review. European Urology, 60(4), pp.644-661.

Therneau, T.M., 2000. Modeling Survival Data: Extending the Cox Model, Springer.

Thiery, J.P. et al., 2009. Epithelial-Mesenchymal Transitions in Development and Disease. Cell, 139(5), pp.871-890.

Trzpis, M. et al., 2007. Epithelial cell adhesion molecule: more than a carcinoma marker and adhesion molecule. The American journal of pathology, 171(2), pp.386-395.

Vazquez, S. et al., 2012. Sunitinib: the first to arrive at first-line metastatic renal cell carcinoma. Advances in therapy, 29(3), pp.202-217.

Venables, W.N. & Ripley, B.D., 2002. Modern Applied Statistics with S, Springer.

Vogelstein, B. & Kinzler, K.W., 2004. Cancer genes and the pathways they control. Nature medicine, 10(8), pp.789-799.

Benjamini, Y. & Hochberg, Y., 1995. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society, Series B, 57, pp.289-300.

Zhang, Y., Li, R. & Tsai, C.-L., 2010. Regularization Parameter Selections via Generalized Information Criterion. Journal of the American Statistical Association, 105(489), pp.312-323.

EXAMPLES

Example 1: Experimental data

Molecular Risk Stratification of Clear Cell Renal Cell Carcinomas by Overall Survival and Sunitinib Response

Introduction

There is great unmet need for better treatment of localised and metastatic renal cancer, which remains the most lethal of all genitourinary cancers. Indeed, five-year survival in Renal Cell Carcinoma (RCC) is approximately 40% overall, and approximately 10% when metastasis occurs [Nat Rev Urol 2011;8:255, Eur Urol 2011;59:135]. Around one-third of patients present with detectable metastasis, and clear cell RCC (ccRCC) represents about 85% of cases. Existing risk stratification uses clinico-pathological scoring systems (e.g. Leibovich score [J Urol 2005;174:1759]). It is well recognised that such methods of stratification have plateaued and that advances in prognostication require inclusion of molecular markers, an approach that has not succeeded to date [Lancet Oncol 2013;14:102; Eur Urol 2015;67:913].

Sunitinib is an orally administered first-line treatment for metastatic clear cell Renal Cancer (mccRCC), and doubles median overall survival compared with immunotherapy [NEJM 2007;356:115, JCO 2009;27:3584]. Sunitinib targets tumour, endothelial and pericyte cells, where the mechanism of action includes competitive inhibition of multiple tyrosine kinases [Adv Ther 2012;29:202, Clin Cancer Res 2003;9:327]. Approximately 70% of patients treated with sunitinib show little or no tumour response [NEJM 2007;356:115], but incur potentially significant toxicity and also cost (roughly £70 million per annum in the UK alone, estimated as 70% of 5,000 mccRCC patients treated with sunitinib at a cost of >£20,000 each). In contrast to other tumours treated with targeted therapies (e.g. selection by HER2 expression in breast cancer [NEJM 2005;353:1673] or EGFR mutations in lung cancer [NEJM 2004;350:2129]), there is no molecular means to select patients with renal cancer for whom targeted therapy is likely to confer benefit.

Current guidelines indicate that percutaneous biopsy may inform mccRCC treatment, taking at least two quality cores >10mm length [Eur Urol 2015;67:913]. However, marked differences have been identified between key pathological characteristics (i.e. grade, sarcomatoid features) determined from biopsies and cytoreductive nephrectomy [J Urol 2010;184:1877]. Indeed, intratumoural molecular heterogeneity appears highly prevalent in ccRCC [Nat Genet 2014;46:225, Nat Rev Cancer 2012;12:323, NEJM 2012;366:883].

Cancer is a disease of dysregulated pathways; protein abundance and post-translational modifications are key determinants of pathway activity, thus functional proteomics is an attractive approach [Nat Med 2004;10:789, Nat Rev Cancer 2010;10:618]. However, cancer biomarker discovery has been dogged by failure [Cancer Res 2012;72:6079]. Intratumoural heterogeneity is a potential limiting factor in validation studies where molecular readouts are derived from sampling a small fraction of the overall tumour volume.

We have developed a novel molecular- and/or protein-based approach for prognosis and prediction of treatment response (sunitinib) in mccRCC. The resulting algorithm (NEAT) controls for currently used clinical variables and, on the data examined, outperforms current clinico-pathological nomograms in both development and validation cohorts. Our unique proteomic data from multiregional tumour sampling enabled investigation of the relationship between number of samples and risk stratification performance. The sampling regimen was shown to have a dramatic effect on NEAT performance in the validation cohort, demonstrating the importance of taking steps to mitigate heterogeneity in predictive cancer medicine.

Methods

Initial work used Reverse Phase Protein Array (RPPA) and clinical data from 44 patients split into two distinct cohorts: a) the SuMR phase II clinical trial of neoadjuvant sunitinib (development, n=22) [Eur Urol 2011;60:448] and b) cytoreductive nephrectomy patients from the SCOTRRCC study (validation, n=22) [Nat Rev Urol 2011;8:255]. An algorithm was developed using protein and clinical features selected by regularised machine learning with overall survival as primary endpoint.

Cohorts Studied and Reverse Phase Protein Arrays (RPPA)

Ethical approval was obtained prior to commencing work (REC references: 10/S1102/68 and 10/S1402/33). The characteristics and clinical outcome of the metastatic clear cell Renal Cell Carcinoma (mccRCC) cohorts studied are given in Table 1. Excluding necrotic tissue, a median of four fresh-frozen samples was obtained per tumour for both cohorts: SuMR (n=22; 'RCC_TRAIN') and SCOTRRCC (n=22; 'RCC_TEST'). Samples were selected as representative of the diversity of Fuhrman grade observed across the tumour. Mixture modelling analysis of overall survival with unsupervised cardinality selection (Bayesian Information Criterion regularisation) identified a bimodal distribution for the combined cohorts (n=44, 'RCC_ALL') and for the RCC_TEST cohort. A bimodal model was also fitted to the RCC_TRAIN cohort, although BIC regularisation identified a unimodal model by a small margin. The two cohorts had very similar overall survival modes (Table 1), with the caveat that the censoring of RCC_TEST was more extensive. Further inspection reveals that the proportion of patients in these modes differs between the cohorts (Figure 5).

Expression of 58 proteins was investigated, selected according to prior knowledge, antibody availability and successful validation. No signal was detected for three of the 58 proteins examined, leaving a total of 55 for analysis. Details of antibodies are given in Table 2. In addition, total protein staining was performed using FAST green (Sigma). The RCC_TEST patients were subsequently treated with sunitinib/similar targeted agents (n=11), no treatment (n=9) or radiation therapy only (n=2). RPPA data were obtained using previously described approaches [J Vis Exp 2013;doi:10.3791/50221]. For total protein, the SuperCurve algorithm was employed to measure intensities, and median values were taken per patient across five replicate slides. Batch effects were mitigated using ComBat [Biostat 2007;8:118] and data were normalised using VSN [Bioinformatics 2009;25:1384].

Median Protein Expression and Intra-tumoural Variance in Sunitinib-Exposed and Sunitinib-Naive Tumours

Differential expression of median values per tumour was assessed for the sunitinib-naive (SN) and sunitinib-exposed (SE) tumour samples using established approaches. The t-test may be applied when marker expression values are normally distributed and have equivalent variance (homoscedasticity). Therefore, to identify normally distributed markers we used the Lilliefors test, which examines the difference between the empirical and hypothetical (normal) cumulative distribution; the null hypothesis is that the data are normally distributed [J Am Stat Assoc 1974;69:730]. Homoscedasticity was tested using the Fligner-Killeen test, where the null hypothesis is that variances are equivalent [Technometrics 1981;23:351]. For markers where Fligner-Killeen and Lilliefors test results were consistent with the assumptions of homoscedasticity and normality, the t-test was applied to assess differential expression. For markers where either the Fligner-Killeen or Lilliefors test indicated that the t-test was not appropriate, the non-parametric Mann-Whitney U test was performed to assess differential expression. The resultant list of 55 p-values was corrected for multiple hypothesis testing using the method of Benjamini and Hochberg [J Roy Stat Soc B 1995;57:289].
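The test-selection rule and the multiple-testing correction described above can be sketched as follows. This is a minimal illustration only, not the code used in the study: the function names, the cut-off `alpha=0.05` for the assumption tests and the example p-values are all assumptions introduced here. The adjustment itself follows the standard Benjamini-Hochberg step-up procedure.

```python
def choose_test(lilliefors_p, fligner_p, alpha=0.05):
    """Pick the two-sample test following the rule described in the text.

    alpha is an assumed cut-off; the text does not state the value used.
    """
    if lilliefors_p >= alpha and fligner_p >= alpha:
        return "t-test"          # normality and homoscedasticity not rejected
    return "Mann-Whitney U"      # at least one assumption rejected


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Illustrative p-values only; these are not results from the study.
raw = [0.001, 0.008, 0.039, 0.041, 0.20]
adj = benjamini_hochberg(raw)
```

In the study the same correction would be applied to the full list of 55 marker p-values.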

Molecular heterogeneity was studied using an ANOVA framework, calculating intra-tumoural variances for the SN and SE tumour groups separately. The F-test was applied to assess differences in variance, as determined by the ratio of mean squared errors between SE and SN groups. The F-test is sensitive to deviation from normality and therefore was only applied where marker expression distributions were not significantly different to the normal distribution according to the Lilliefors test on both the SN and SE groups. Normality is also a requirement for ANOVA. Within-group homoscedasticity was assessed using the Fligner-Killeen test and proteins were excluded from analysis unless they met this assumption in each of the SE and SN groups, because homoscedasticity is a formal requirement for ANOVA. Again, p-values were corrected for false discovery rate according to the method of Benjamini and Hochberg. All p-values were two-tailed. Four proteins (BCL2, CAIX, mTOR and MLH1) had significantly different variance between the two groups (F-test FDR p<0.05). Ranking proteins according to the SE:SN variance log-ratio enabled investigation of increased variance upon treatment where F-test assumptions were not met. Proteins were included for analysis when their SE:SN variance rank was equal to or higher than any of those identified as significant by the F-test. This approach identified a further two proteins (N-cadherin, EpCAM) for analysis; therefore six proteins were candidate features for survival modelling.
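The variance log-ratio ranking can be illustrated with a short sketch. It assumes that the intra-tumoural variance of a group is the pooled within-tumour mean squared error from a one-way ANOVA, and that the inclusion rule amounts to a cut-off at the lowest log-ratio among the F-test-significant proteins; both are interpretations of the text rather than a confirmed implementation. The function names, protein labels and numbers are hypothetical.

```python
import math


def within_tumour_mse(tumours):
    """Pooled within-tumour variance (one-way ANOVA mean squared error).

    `tumours` is a list of lists: one inner list of expression values
    (one value per sample) for each tumour in the group.
    """
    sum_squares = 0.0
    n_samples = 0
    for samples in tumours:
        mean = sum(samples) / len(samples)
        sum_squares += sum((x - mean) ** 2 for x in samples)
        n_samples += len(samples)
    # Degrees of freedom: total samples minus number of tumours.
    return sum_squares / (n_samples - len(tumours))


def variance_log_ratio(exposed, naive):
    """Log of the exposed:naive ratio of within-tumour variances."""
    return math.log(within_tumour_mse(exposed) / within_tumour_mse(naive))


def select_by_rank(log_ratios, f_test_significant):
    """Keep proteins ranked at or above the lowest F-test-significant one."""
    cutoff = min(log_ratios[name] for name in f_test_significant)
    kept = [name for name, value in log_ratios.items() if value >= cutoff]
    return sorted(kept, key=lambda name: -log_ratios[name])


# Hypothetical expression values: two tumours per group, two samples each.
exposed = [[1.0, 3.0], [2.0, 6.0]]
naive = [[1.0, 2.0], [3.0, 4.0]]
ratio = variance_log_ratio(exposed, naive)

# Hypothetical log-ratios for four placeholder proteins.
log_ratios = {"protein_A": 1.2, "protein_B": 0.9, "protein_C": 0.4, "protein_D": 1.0}
selected = select_by_rank(log_ratios, ["protein_A", "protein_B"])
```

Here `protein_D` is retained even though it is not in the F-test-significant list, because its log-ratio ranks above that of `protein_B`.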

Selection of Predictive Features and Multivariate Modelling

Variables were selected for Cox multivariate analysis using backward elimination regularised by Bayesian Information Criterion (BIC) on the RCC_TRAIN dataset, a form of wrapper feature selection (Schwarz 1978; Kohavi & John 1997). BIC regularisation controls overfitting, which is particularly important when training data are limited (Zhang et al. 2010). An initial Cox regression model was fitted using all features by the 'coxph' function from the R survival library (Therneau 2000). Backward elimination iteratively removed a single feature at each step, selected for the greatest improvement in BIC value. Thus, features with low predictive power or high redundancy were removed. The procedure terminated with a final model when removing any single feature would not improve the BIC value. The function 'stepAIC' was used from the MASS R library, with the value of k (a multiplier penalising model complexity) specified for BIC regularisation (Venables & Ripley 2002).
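The backward elimination loop described above can be sketched independently of the model-fitting code. The sketch below is an illustration in Python rather than the R workflow used in the study; `backward_eliminate`, `toy_bic`, the feature names `x1` to `x4` and their log-likelihood gains are all hypothetical, and the sample size of 22 entering the penalty term is an assumption. In practice the `bic` callable would refit a Cox model for each candidate feature subset.

```python
import math


def backward_eliminate(features, bic):
    """Greedy backward elimination: drop one feature per step while BIC improves.

    `bic` is any callable mapping a tuple of feature names to a BIC value
    (lower is better).
    """
    current = tuple(features)
    best = bic(current)
    while current:
        # Every model reachable by removing exactly one feature.
        candidates = [tuple(f for f in current if f != drop) for drop in current]
        scores = [bic(c) for c in candidates]
        idx = min(range(len(scores)), key=scores.__getitem__)
        if scores[idx] >= best:
            break  # no single removal improves BIC: stop
        current, best = candidates[idx], scores[idx]
    return current, best


# Toy stand-in for a fitted model: each feature adds a fixed, hypothetical
# amount to twice the log-likelihood; BIC = -2*logL + (n_params * ln(n)).
GAINS = {"x1": 12.0, "x2": 0.5, "x3": 7.0, "x4": 1.0}
N = 22  # assumed sample size for the penalty term


def toy_bic(subset):
    return -sum(GAINS[f] for f in subset) + len(subset) * math.log(N)


selected, score = backward_eliminate(["x1", "x2", "x3", "x4"], toy_bic)
```

With these toy gains the two weakly informative features are removed in turn and the procedure stops at `("x1", "x3")`, since dropping either remaining feature would worsen the BIC.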

A total of 12 features were input to wrapper selection; these included six key clinical parameters where data were available for both SE and SN cohorts (grade, gender, age, neutrophils, hemoglobin level, DCM score (Heng et al. 2013)). Also included were the median tumour expression of six proteins that were significantly differentially expressed and had substantively increased variance upon treatment (BCL2, MLH1, CAIX, mTOR, N-cadherin and EpCAM). The selected features were N-cadherin, EpCAM, Age and mTOR (NEAT) and the resulting multivariate Cox proportional hazards model learned on the RCC_TRAIN dataset had likelihood ratio test p=1.18x10^-4. The proportional hazards assumption was met; Grambsch-Therneau test results are given in Table 5 (Grambsch & Therneau 1994).

Comparison with Clinical Nomograms

Scores for the Database Consortium Model (DCM) and the Memorial-Sloan Kettering Cancer Centre (MSKCC) were calculated according to the relevant clinical data (Heng et al. 2013; Motzer et al. 2002). There was sufficient data available for DCM to make unambiguous classification for 20/22 patients in RCC_TEST, all of which were either intermediate or poor prognosis. Patients were grouped by MSKCC scores as either a) favourable/intermediate or b) poor prognosis; data were available for unambiguous classification for 14/22 patients in RCC_TEST. A further two patients were on the borderline of intermediate or poor prognosis with MSKCC parameters, due to missing data, but had short survival times and were therefore assigned to the poor prognosis group; hence 16/22 patients were assigned MSKCC scores in RCC_TEST. All patients in RCC_TRAIN had sufficient data for unambiguous DCM and MSKCC scoring. Hazard ratios for the NEAT, DCM and MSKCC approaches on RCC_TRAIN and RCC_TEST were calculated by stratification into better and worse than average risk groups according to hazard values derived from the model fitted on RCC_TRAIN.

Results

Candidate protein drivers of sunitinib resistance

We hypothesised that increased protein variance following sunitinib exposure reflects dynamic changes in protein expression and cellular heterogeneity in response to drug treatment. Intra-tumoural heterogeneity has been demonstrated in renal cancers, consistent with expectations that clonal selection is important in tumour progression (Gerlinger et al. 2012; Sharpe et al. 2013). Increased intra-tumoural variance of a protein marker upon treatment implies that tumour cells change the concentration of that marker in response to drug. Some of these clones (i.e. cells) may acquire drug resistance and go on to populate the majority of the tumour under selective pressure from the drug. We reasoned that the associated change in overall tumour cell composition would be observed as differential expression of the relevant protein, when comparing overall expression before and after sunitinib treatment. Therefore, we hypothesised that proteins with high intra-tumoural variance and that are differentially expressed may provide markers for clonal changes underlying drug resistance. Importantly, the development cohort all showed initial response to treatment and completed three cycles of sunitinib prior to nephrectomy (4 weeks on, 2 weeks off). Indeed, at surgery, 16/22 patients showed >5% shrinkage of primary tumour longest diameter, and all patients had either partial renal response or stable disease at time of nephrectomy. We expect simultaneous acquisition of drug resistance and other aggressive characteristics such as invasiveness, for example in an EMT-like cellular programme (Sharpe et al. 2013; Thiery et al. 2009). Therefore, we hypothesised that differentially expressed proteins with increased intra-tumoural variance post-treatment would be useful variables for assessment of sustained treatment response and overall survival; indeed, these clinical factors are inherently linked. Thirty proteins had significant differential expression (Figure 1).
Six proteins were identified as both differentially expressed and where intra-tumoural variance was substantively increased following treatment (methods); these were N-cadherin, mTOR, CA9, BCL2, EpCAM, and MLH1 (Figure 2).

A novel algorithm for risk stratification of metastatic renal cancer patients

Regularised wrapper selection with Cox multivariate analysis of the development cohort (methods) selected four variables as predictive features (Table 3). Features considered in machine learning included six established clinical parameters plus median protein expression of the candidate drivers of sunitinib resistance. The final predictor set (N-cadherin, EpCAM, Age and mTOR) was therefore selected over and above current clinical variables, including the DCM score, which is thought to perform best out of all current clinical nomograms designed for metastatic RCC (Heng et al. 2013). Hazard can therefore be calculated from the protein variables (mTOR, N-cadherin, EpCAM) plus age at diagnosis as follows:

Hazard = exp(8.927 × N-cadherin + 3.800 × EpCAM + 0.129 × Age - 18.385 × mTOR)
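As an illustration, the hazard formula and the hazard threshold of 1 used elsewhere in this Example for high/low risk assignment can be written as a short sketch. The coefficients are copied from the formula above; the function names and the input values are hypothetical, and the inputs are assumed to be on the same normalised scale as the data used to fit the model.

```python
import math

# Coefficients copied from the hazard formula in the text.
COEF_N_CADHERIN = 8.927
COEF_EPCAM = 3.800
COEF_AGE = 0.129
COEF_MTOR = -18.385


def neat_log_hazard(n_cadherin, epcam, age, mtor):
    """Linear predictor (log hazard) of the NEAT model."""
    return (COEF_N_CADHERIN * n_cadherin
            + COEF_EPCAM * epcam
            + COEF_AGE * age
            + COEF_MTOR * mtor)


def neat_hazard(n_cadherin, epcam, age, mtor):
    """Hazard = exp(linear predictor)."""
    return math.exp(neat_log_hazard(n_cadherin, epcam, age, mtor))


def neat_risk_group(n_cadherin, epcam, age, mtor, threshold=1.0):
    """Assign 'high' risk when hazard exceeds the threshold (hazard of 1)."""
    hazard = neat_hazard(n_cadherin, epcam, age, mtor)
    return "high" if hazard > threshold else "low"


# Hypothetical inputs, for illustration only (not patient data).
group_a = neat_risk_group(n_cadherin=0.5, epcam=0.5, age=60, mtor=0.8)
group_b = neat_risk_group(n_cadherin=0.5, epcam=0.5, age=60, mtor=0.7)
```

With all other inputs held fixed, the lower mTOR value moves the hypothetical patient from the low to the high risk group, reflecting the negative mTOR coefficient.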

In the above model (termed "NEAT" - i.e. N-cadherin, EpCAM, age, mTOR = NEAT), EpCAM, N-cadherin and age all have significant negative correlations with survival. N-cadherin is normally expressed in proximal tubules of the kidney, the presumed origin of RCCs, and is a marker of aggressiveness (Shimazui et al. 2006). Canonical Epithelial to Mesenchymal Transition (EMT) involves gain of N-cadherin, and EMT-like changes in RCCs correlate with poor prognosis (Harada et al. 2012; Pantuck et al. 2010; Thiery et al. 2009; Donhuijsen & Schulz 1989). High EpCAM expression is broadly associated with poor prognosis in cancers (Spizzo et al. 2011; Trzpis et al. 2007), although in RCC reports link EpCAM with better prognosis, especially in localised disease, e.g. (Seligson et al. 2004; Eichelberg et al. 2013). Therefore, our finding that EpCAM expression correlates with RCC poor prognosis in a multivariate model differs from current thinking. This difference may be due to the material analysed (i.e. sunitinib-exposed metastatic renal cancer tissue) and the technologies employed. For example, we used quantitative immunofluorescence, which provides continuous scores rather than categorical classification; we also used RPPA on whole tissue lysates, hence including signal from all of the cells populating the tumour (e.g. macrophages, pericytes, endothelium) and not restricted to plasma membrane staining as done in other studies, e.g. (Seligson et al. 2004; Eichelberg et al. 2013). Indeed, EpCAM can function as a signal transducer, involving nuclear localisation of the cleaved intracellular domain (Maetzel et al. 2009); unlike other work, our analysis would include this nuclear EpCAM signal. A positive correlation with survival was found for mTOR. This was investigated further by training univariate Cox models on the SE and SN cohorts separately, which indicated a positive relationship with survival for mTOR, although neither group reached significance at p<0.05 (SE: p=0.0656; SN: p=0.309). In contrast to current thinking (Banumathy & Cairns 2010), this result suggests that mTOR variation in mccRCC may predominantly reflect changes in mTORC1 complex concentration, particularly under sunitinib treatment. The mTORC1 complex exerts negative feedback on receptor tyrosine kinases to suppress proliferation and survival (Laplante & Sabatini 2012), and so mTORC1 signalling may synergise with sunitinib. However, mTOR inhibitors are currently in clinical use (e.g. in USA), possibly in conjunction with sunitinib or agents with similar activity profile. Therefore a positive relationship with survival for mTOR is of immediate interest, because this indicates that mTOR inhibitors may adversely affect outcome in patients where RTK-active drugs such as sunitinib are, or will be, part of the treatment schedule.

The NEAT algorithm performed well in stratifying ccRCC patients in both development (sunitinib-exposed) (Figure 3) and validation (sunitinib-naive) samples (Figure 4). Following up the results of feature selection to investigate predictive power with smaller numbers of variables, we fitted Cox models on RCC_TRAIN leaving out one of the three selected molecular features to investigate the pairwise combinations with age. Consistent with the full model, we identified that mTOR and N-cadherin were the strongest predictive variables; significant models (likelihood ratio test p<0.1) were obtained on all combinations except for EpCAM with N-cadherin. Results are summarised in Table 6.

Exploration of spot-specific correction using total protein values

One strategy to protect the robustness of the NEAT algorithm across geographically distinct locations is spot-specific correction by normalisation of marker expression values, where a quotient is obtained with divisor from total protein (TP) staining. We investigated Cox modelling of TP-normalised values for the NEAT markers (Age was unchanged) in RCC_TRAIN to generate the model NEAT_NORM (Table 10, likelihood ratio test p=0.0011). According to the same approach applied for the NEAT model, a hazard threshold of 1 was used with NEAT_NORM for identification of low and high risk groups in RCC_TEST (log-rank p=0.011). While significant, the worse p-value on RCC_TEST is largely due to the reduction in patients classified as high risk: n=2 (9%) by NEAT_NORM versus n=4 (18%) by NEAT. All patients identified by either NEAT or NEAT_NORM as high risk in RCC_TEST experienced an event within 10 months and therefore the algorithms have very high specificity for classification of the high risk group. Interestingly, TP values correlated with the number of metastatic sites at surgery in RCC_TRAIN (Spearman rho=0.4, p=0.044), which may partly explain reduced predictive power for NEAT_NORM compared with NEAT. Overall, these data show that spot-specific correction by TP has potential as an additional measure to safeguard robustness, if required.

Comparison with clinical nomograms

The NEAT model outperformed both the DCM (Heng et al. 2013) and MSKCC (Motzer et al. 2002) nomograms on the RCC_TRAIN (SuMR) and RCC_TEST (SCOTRRCC) cohorts, using hazard of 1 to stratify into high (above average) and low (below average) risk. The MSKCC approach is probably the most popular, although DCM has been found to be best-performing when compared with key current approaches (Heng et al. 2013). The performance comparison results are summarised in Table 4; Net Reclassification Improvement (Pencina et al. 2011) on RCC_TEST found that NEAT outperformed the DCM and MSKCC nomograms with respective values of 7.1% and 25.4%. All patients in RCC_TEST that were identified as high risk by NEAT experienced an event within 10 months. Therefore NEAT has very high estimated specificity for high risk patients at one or two years (100%). NEAT also achieves good sensitivity; in RCC_TEST the predicted high risk group contains 75% of patients experiencing an event within 1 year and 50% of those experiencing an event within 2 years.
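Sensitivity and specificity at a fixed time horizon, as quoted above, can be computed along the following lines. This is a sketch under stated assumptions: the text does not describe how patients censored before the horizon were handled, so here they are simply skipped, and the function name and the patient records are hypothetical illustrations rather than study data.

```python
def sens_spec_at(horizon_months, patients):
    """Sensitivity and specificity of a high-risk call at a time horizon.

    `patients` is a list of (time_months, event_observed, predicted_high_risk).
    A patient counts as positive if an event occurred by the horizon and as
    negative if follow-up exceeded the horizon. Patients censored before the
    horizon have unknown status and are skipped (an assumption of this sketch).
    """
    tp = fp = tn = fn = 0
    for time, event, high in patients:
        if event and time <= horizon_months:
            if high:
                tp += 1
            else:
                fn += 1
        elif time > horizon_months:
            if high:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical records: (months of follow-up, event observed, predicted high risk).
toy_patients = [
    (4, True, True),
    (9, True, True),
    (11, True, False),
    (20, True, False),
    (30, False, False),
    (36, True, False),
    (8, False, False),   # censored before 12 months: skipped at that horizon
]
sens_12, spec_12 = sens_spec_at(12, toy_patients)
```

In this toy cohort every predicted high-risk patient has an event before the horizon, so specificity is 1.0, mirroring the pattern described for NEAT, while sensitivity depends on how many early events fall in the predicted low-risk group.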

Example 2: Experimental data

Carbonic anhydrase 9 expression increases with VEGF targeted therapy and is predictive of outcome in metastatic clear cell renal cancer

Abstract

BACKGROUND: There is a lack of biomarkers to predict outcome with targeted therapy in metastatic clear cell renal cancer (mccRCC). This may be because dynamic molecular changes occur with therapy.

OBJECTIVE: To explore if dynamic, targeted therapy driven molecular changes correlate with mccRCC outcome.

DESIGN, SETTING, AND PARTICIPANTS: Multiple frozen samples from primary tumours were taken from sunitinib naive (n=22) and sunitinib treated mccRCC patients (n=23) for protein analysis. A cohort (n=86) of paired untreated and sunitinib/pazopanib treated mccRCC samples was used for validation. Array CGH analysis and RNA interference (RNAi) were used to support the findings.

INTERVENTION: Three cycles sunitinib 50mg (4 weeks on, 2 weeks off).

MEASUREMENTS AND STATISTICAL ANALYSIS: Reverse phase protein arrays (training set) and immunofluorescence automated quantitative analysis (validation set) assessed protein expression.

RESULTS AND LIMITATIONS: Differential expression between sunitinib naive and treated samples was seen in 30/55 proteins (p<0.05 for each). BCL2, MLH1, carbonic anhydrase 9 (CA9), and mTOR had both increased intratumoural variance and significant differential expression with therapy. The validation cohort confirmed increased CA9 with therapy. Multivariate analysis showed high CA9 after treatment was associated with longer survival (HR=0.48 [95%CI: 0.26-0.87] p=0.02). Array CGH profiles revealed sunitinib was associated with significant CA9 region loss. RNAi CA9 silencing in two cell lines inhibited the anti-proliferative effects of sunitinib. Shortcomings include: selection of specific proteins for analysis, and the specific time-points at which the treated tissue was analysed.

CONCLUSIONS: CA9 levels increase with targeted therapy in mccRCC. Lower CA9 levels are associated with a poor prognosis and possible resistance, as indicated by the validation cohort.

PATIENT SUMMARY: Drug treatment of advanced kidney cancer alters molecular markers of treatment resistance. Measuring CA9 levels may be helpful in determining which patients benefit from therapy.

1. Introduction

VEGF targeted tyrosine kinase inhibitor (TKI) therapy is established as first line therapy in metastatic clear cell renal cancer (mccRCC) [1]. Clinical benefit with sunitinib varies between mccRCC patients. While there are a number of prognostic clinical factors, there are presently few validated molecular means of improving prognosis or prediction of response of mccRCC patients to targeted therapies [2]; the recent report of serum IL-6 predicting response to pazopanib being an exception [3]. This lack of predictive ability is in contrast to numerous other tumour types, such as chronic myeloid leukaemia and breast cancer, where protein expression and mutation analysis can be used to predict response and treatment failure [4,5].

Analysis of molecular markers from single tumour tissue samples taken at baseline in mccRCC has failed to identify predictive biomarkers associated with response to sunitinib [6]. We hypothesise that dynamic changes occur to biomarker expression with VEGF targeted therapy, and only tissue taken later in the course of treatment can predict drug activity. Therefore, by analysing protein expression from VEGF treated and untreated renal cancer tissue, it may be possible to identify and validate protein biomarkers.

In this work we compared the expression of 55 key proteins in nephrectomy tumour samples from patients with mccRCC who were treated with sunitinib prior to nephrectomy or were sunitinib naive at the time of surgery. Recent publications have demonstrated extensive intratumoural heterogeneity (ITH) in mccRCC [7,8]. ITH is likely to hamper biomarker research. To address ITH in this study, lysates were taken for multiple spatially separate areas of each primary tumour. We attempted to identify not only biomarkers which significantly change with VEGF targeted therapy, but also those which demonstrated increased protein variance with therapy. To confirm these findings a validation cohort was used, consisting of paired untreated and anti-VEGF TKI treated samples (n=86) taken from previously untreated mccRCC patients enrolled in 3 clinical trials. To further explore the cause and relevance of changes in protein expression array comparative genomic hybridization (aCGH) was used to identify relevant chromosomal changes, while RNA interference (RNAi) in RCC cell lines addressed the functional relevance of significant changes.

2. Methods

2.1 Cell lines

See supplementary methods (below) for cell line details.

2.2. Patient samples

Fresh frozen primary ccRCC tissue was obtained from the nephrectomy samples of 22 sunitinib naive mccRCC patients as part of the SCOTRRCC study (UK CRN ID: 12229). Tissue was also obtained from 27 mccRCC patients treated with 3 cycles of pre-surgical sunitinib (18 weeks) as part of the SuMR trial (NCT01024205). Tissue from four of these patients was entirely necrotic, leaving 23 patients with adequate tissue for analysis (table 7). These two sample cohorts made up the test sample set.

A tissue microarray (TMA), with paraffin embedded tissue from matched pre-treatment primary tumour biopsy tissue and post-treatment nephrectomy tumour tissue from the same patients (n=86), was used as a validation sample set. See supplementary methods for TMA construction details. This tissue came from 3 prospective studies including the SuMR study described above (NCT01024205, NCT01512186, NCT01064310) and included patients treated with sunitinib and pazopanib. Patients were followed up according to standard guidelines with 12 weekly cross sectional imaging. Outcome data were recorded. All studies underwent ethics approval prior to commencement.

Each piece of fresh frozen tumour tissue was mapped and separated into small pieces (~1 cm³) from which lysates were created. A frozen section was performed (MO) on each 1 cm³ piece of tissue to confirm the presence of viable ccRCC and for grading. Where possible, at least four protein/DNA lysates were prepared per patient.

2.3. Reverse phase protein arrays (RPPA)

Protein extraction and RPPA spotting and protein quantification were performed as described previously [9,10]. Protein expression levels from RC124 and HUVEC cell lines were used as references on each RPPA slide. Batch effects across the three RPPA slides per marker were mitigated using ComBat [11] and data normalized using variable slope normalisation (VSN) [12].

58 proteins were evaluated using RPPA. These proteins were relevant in RCC pathogenesis or sunitinib response and belonged to the following functional groups: cell cycle, apoptosis, protein kinases, angiogenesis, cell adhesion, PI3K pathway, epithelial-to-mesenchymal transition, MET/HGF and mismatch repair. There was no signal detected for three of the proteins (Ki67, FLT3 and phospho-Jak2). As such, 55 proteins were analysed in this study (antibodies detailed in Table 8).

2.4. Automated Quantitative Analysis (AQUA) of immunofluorescence

Immunofluorescence and AQUA analysis was performed on the validation cohort, using methods previously described [13,14]. Table 8 details antibodies used.

To correct for any bias due to the separation of pre- and post-treatment samples on unique tissue microarrays AQUA results for the matched tissue samples on each TMA were median normalised prior to analysis of significance by Wilcoxon matched pairs test. X-tile was used for determining the cut-off for defining high and low protein expression in the primary tumour [15].
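The median normalisation step can be illustrated with a short sketch. It assumes normalisation by division by the per-TMA median (the text does not state whether division or subtraction was used), and the AQUA scores are invented. The resulting paired differences are what a matched-pairs test such as the Wilcoxon signed-rank test would then take as input.

```python
from statistics import median

def median_normalise(scores):
    """Scale all scores from one TMA so that the TMA median becomes 1."""
    centre = median(scores)
    if centre == 0:
        raise ValueError("median of zero cannot be used as a divisor")
    return [score / centre for score in scores]

# Invented AQUA scores; position i in each list is the same patient.
pre_treatment_tma = [100.0, 200.0, 400.0]
post_treatment_tma = [300.0, 900.0, 600.0]

pre_norm = median_normalise(pre_treatment_tma)
post_norm = median_normalise(post_treatment_tma)

# Per-patient change after removing the TMA-level offset.
paired_differences = [post - pre for pre, post in zip(pre_norm, post_norm)]
```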

2.5. aCGH

DNA extraction from fresh frozen tissue and FFPE was carried out using the DNeasy Blood and Tissue Kit (Qiagen, UK) according to the manufacturer's instructions. aCGH hybridisation and analysis was carried out as recently described [16] using Roche UK Nimblegen 12X135k whole-genome array.

The CGH-segMNT module of NimbleScan was used for the analysis with a minimum segment length of five probes and an averaging window of 130kb. Nimblegen arrays were positionally annotated based upon hg19 genomic coordinates and log ratio data was pre-processed in R as previously described [17]. Briefly, array data was normalised with print tip Loess from the limma package to produce normalised log ratios, filtered to remove outliers based upon a 1 MAD deviation of each probe from its immediate genomic neighbours and smoothed with a circular binary segmentation algorithm from the DNACopy package. Smoothed log ratios were then thresholded for gain/loss (±0.1) and amplification/deletion (±0.45) to identify contiguous copy number aberrations containing at least 15 consecutive probes. Further details of aCGH analysis are given in supplementary methods.
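The final thresholding step, which keeps only aberrations spanning at least 15 consecutive probes, can be sketched as below. This is a simplified stand-in for the R pipeline, not a reproduction of it: the function name and the synthetic profile are hypothetical, and the probes are assumed to be already ordered by genomic position within one chromosome.

```python
def contiguous_runs(smoothed, is_aberrant, min_probes=15):
    """Return (start_index, length) for runs of consecutive probes passing is_aberrant."""
    runs = []
    start = None
    for i, value in enumerate(smoothed):
        if is_aberrant(value):
            if start is None:
                start = i  # a new run begins here
        else:
            if start is not None and i - start >= min_probes:
                runs.append((start, i - start))
            start = None
    # Close a run that extends to the last probe.
    if start is not None and len(smoothed) - start >= min_probes:
        runs.append((start, len(smoothed) - start))
    return runs

# Synthetic smoothed log ratios: a 10-probe loss (too short) and a 20-probe loss.
profile = [0.0] * 5 + [-0.3] * 10 + [0.0] * 3 + [-0.2] * 20
loss_runs = contiguous_runs(profile, lambda v: v < -0.1)
gain_runs = contiguous_runs(profile, lambda v: v > 0.1)
```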

2.6. RNAi experiments

The human RCC cell lines CAKI-2 (wild type VHL) and RCC11 (VHL mutant) were transfected with a non-targeting control short interfering RNA (siRNA) (5'-CATGCCTGATCCGCTAGTC-3') or CA9 siRNA (5'-GAGGAGGATCTGCCCAGTGAA-3') (Qiagen, UK). Twenty-four hours after transfection, cells were treated with either 0.01% DMSO or 4 μM sunitinib and cell viability was assessed after 5 days using the Cell Titer Glo assay (Promega, UK). See supplementary methods for further RNAi methodology.

2.7. Statistical analysis

Differential expression was assessed per protein between sunitinib treated and sunitinib naive tumours by application of the t-test where normality and homoscedasticity assumptions were satisfied, otherwise using the nonparametric Mann-Whitney U test (MWT). The F-test was used to assess intratumoural variances within an ANOVA framework for those proteins where assumptions of normality and homoscedasticity (within-group) were met. Appropriate false discovery rate (FDR) correction was applied to all P-values [18]. Further details on the assessment of intratumoural variance are given in supplementary methods.
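The false discovery rate correction cited above [18] is the Benjamini-Hochberg step-up procedure. A minimal sketch of the adjusted P-value calculation follows; the four raw P-values are invented for illustration, whereas the study corrected a list of 55.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

raw_p = [0.01, 0.04, 0.03, 0.20]  # invented example values
adjusted_p = benjamini_hochberg(raw_p)
```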

Overall survival was estimated using Kaplan-Meier methods, with differences assessed using the log rank test. Multivariate analysis was performed using Cox regression. SPSS version 20 or R were used for all statistical analyses.

3. Results

3.1. Patient demographics

The key patient characteristics and treatment outcomes were comparable for patients in the test set who were not treated with sunitinib prior to a cytoreductive nephrectomy and those patients who had sunitinib therapy prior to nephrectomy (table 7). Of the 45 patients included, 44 patients had multiple samples taken (median 4 regions, range 2-10 regions).

3.2. Effect of sunitinib treatment on protein expression assessed by RPPA

There were significant differences in protein expression between the treated and untreated samples for 30 of the 55 proteins evaluated in the test set (figure 7). Of particular note were four proteins that had both significant differential expression and significantly increased intratumoural variance after sunitinib (BCL2, MLH1, carbonic anhydrase 9 (CA9), and mTOR: p<0.05 for each) (figure 8a).

3.3. AQUA results from the validation cohort

BCL2, MLH1, CA9 and mTOR protein expression was evaluated using in situ staining and AQUA of the validation TMA (paired treated and untreated samples from the same patient (n=86)). This analysis revealed that of these four proteins only CA9 was significantly differentially expressed (increased) with treatment (figure 8b) (P=0.01). High expression of CA9 in sunitinib treated tissue was found to be associated with good OS (HR=0.26, 95% CI: 0.11-0.61, P=0.001; log rank test (figure 8c)). Results from the multivariate analysis which included a number of prognostic factors (Heng prognostic score, Fuhrman grade, T stage at diagnosis, number of metastatic sites, age, and CA9 expression in nephrectomy ccRCC specimen) in the model showed that low Fuhrman grade (HR=0.51, 95% CI: 0.30-0.89) and high CA9 at nephrectomy (HR=0.48, 95% CI: 0.26-0.87) were associated with a good OS (full results in Table 9).

3.4. aCGH analysis

DNA from the test set of sunitinib naive and treated patient samples was used for aCGH analysis to compare chromosomal aberrations. The total number of aberrations was significantly greater in treated samples. Comparisons of gains, losses, amplifications and deletions in sunitinib treated and untreated samples revealed significantly greater levels of chromosomal losses in the region encoding CA9 in the treated samples (P=0.002, Fisher's test) (figure 9a). Conversely, the increase in losses across the whole genome was not significant (Figure 11).
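A comparison of loss frequency between treated and untreated samples with Fisher's exact test can be sketched as follows. The 2x2 counts are invented and do not reproduce the reported P=0.002. The two-sided P-value is taken, as is conventional, to be the sum of the probabilities of all tables with the same margins that are no more likely than the observed one.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def table_probability(x):
        # Hypergeometric probability of x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    observed = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance so equally likely tables are not lost to rounding.
    return sum(
        p
        for p in (table_probability(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )

# Invented counts: rows are treated / naive, columns are loss present / absent.
p_value = fisher_exact_two_sided(8, 2, 1, 9)
```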

3.5. Functional analysis of CA9 using RNAi in renal cancer cell lines

Results from both renal cancer cell lines (CAKI-2 and RCC11) showed CA9 was successfully silenced with siRNA (figure 9b and 9c). Cell viability assay showed CA9 silencing inhibited the anti-proliferative effects of sunitinib, regardless of cell line VHL status. These results support the findings from the clinical tissue where low levels of CA9 are associated with poor outcome from sunitinib therapy.

4. Discussion

In this work we have demonstrated that VEGF targeted therapy significantly alters the expression of a number of selected proteins despite protein ITH.

Four proteins, CA9, MLH1, mTOR and BCL2, showed both significant changes in expression and increased intratumoural variance with sunitinib. These dynamically variable and changing proteins were chosen for further evaluation because they were considered likely to be biologically relevant. Of these 4 proteins, only CA9 up-regulation was supported by significant results in the validation cohort and functional cell line work.

CA9 is a hypoxia-regulated transmembrane protein overexpressed in a number of cancers. It is usually associated with hypoxic stress and poor prognosis [19]. Extensive investigation has been performed in renal cancer due to the frequency of CA9 over expression and conflicting results regarding its prognostic value [19-21]. CA9 is over expressed in the vast majority of ccRCC and has promise at a diagnostic level [20], but paradoxically, high CA9 levels correlate with good outcomes in some studies [21]. Tumour samples in the pre-VEGF TKI era showed conflicting data on the prognostic value of baseline CA9 [21,22]. Prospective studies showed CA9 was not able to predict response to immune therapy [23]. Together these data suggested that CA9 did not have a crucial predictive role in the era of immune therapy.

Biomarker studies in the era of VEGF targeted therapy have failed to consistently show that high baseline CA9 protein levels are associated with a good outcome [24,25]. This may be because of the dynamic changes that occur with therapy and the use of archival tissue from a single time point for biomarker analysis. The work presented here shows that not only does targeted therapy increase the expression of CA9 but these changing levels are also prognostic. These findings were observed in both inter-patient (unmatched test set) and, critically, intra-patient (matched sequential biopsy and nephrectomy validation set) samples. This dynamic change in a prognostically important biomarker suggests the drive towards predictive biomarkers may be possible. Anti-VEGF therapy is associated with vasoconstriction and subsequent hypoxia [8]; therefore, the up-regulation of CA9 with VEGF targeted therapy could be a consequence of effective VEGF targeting. Indeed, figure 7 shows that a number of VEGF and hypoxia associated markers are also affected by sunitinib (VEGFR-1, VEGFR-3, PDGFR-β, c-KIT, VEGF-A, VEGF-D), supporting this argument. Of note, HIF-1α was not differentially expressed, most likely due to its short half-life, which was exceeded by the warm ischaemia time during sample acquisition. Alternatively, CA9 may have more of a direct oncogenic effect as a reaction to VEGF targeted therapy. The silencing of CA9 in renal cancer cell lines resulted in inhibition of the anti-proliferative effects of sunitinib. While not conclusive, these in vitro findings support our clinical sample data. Together these results suggest a role for CA9 in sunitinib activity, and that CA9 may be relevant in the development of sunitinib resistance. Previous studies have shown that CA9 can affect cell adhesion and contact inhibition, demonstrating a role beyond simply a reaction to hypoxia [26,27].
The work presented here also shows that sunitinib treatment is associated with chromosomal changes to CA9. These chromosomal changes also point to a change in tumour DNA rather than simply a stromal reaction to hypoxia.

A biomarker identified after a specific period of therapy is of potential clinical use, provided patients are willing to have a repeat biopsy during treatment. Repeat biopsy is challenging in practice. A randomised trial comparing continued therapy with a change in therapy, in those patients who failed to gain a rise in CA9 with therapy, would test this biomarker prospectively.

There are several strengths to this work. Obtaining sequential tissue in metastatic renal cancer is challenging and this is to our knowledge the largest series available. Different techniques to measure protein expression were used in the training and validation set; furthermore, the chromosomal and in vitro work support the CA9 findings. However, there are a number of limitations of this study. Firstly, the 55 proteins chosen for analysis may be subject to selection bias. Also, the initial biomarker testing was not performed in matched pairs from the same individual. Ideally RPPA and aCGH would have been performed using matched samples from the same patient before and after sunitinib therapy; this was not feasible due to the amount of fresh frozen tumour tissue required to allow multilevel molecular analysis from multiple spatially variant regions of the same tumours. However, the characteristics of the two test set groups were similar and validation of the biomarkers occurred in paraffin embedded matched pairs. Biomarker analysis took place at a specific time point with therapy and following a two-week break before nephrectomy, which may have influenced biomarker expression. Finally, the in vitro study does not assess the effect of sunitinib on the tumour vasculature [28], the main target of this treatment, which is a limitation of the epithelial cell culture used in this study.

5. Conclusions

This study illustrates the dynamic changes to relevant proteins with anti-VEGF targeted therapies. Despite these dynamic changes and factoring in ITH, it was possible to identify and validate CA9 as an independent predictor of outcome following anti-VEGF targeted therapy. There were consistent dynamic changes to CA9 at chromosomal and protein levels. Together with the multivariate analysis and in vitro studies, these results suggest CA9 may have relevance to sunitinib resistance. CA9 modulation to overcome anti-VEGF therapy resistance may be a potential therapeutic area of investigation in the future.

6. Example 2 - References

[1] Motzer RJ, Hutson TE, Tomczak P, Michaelson MD, Bukowski RM, Oudard S, et al. Overall survival and updated results for sunitinib compared with interferon alfa in patients with metastatic renal cell carcinoma. J Clin Oncol 2009;27:3584-90.

[2] Stewart GD, O'Mahony FC, Powles T, Riddick ACP, Harrison DJ, Faratian D. What can molecular pathology contribute to the management of renal cell carcinoma? Nat Rev Urol 2011;8:255-65.

[3] Tran HT, Liu Y, Zurita AJ, Lin Y, Baker-Neblett KL, Martin A-M, et al. Prognostic or predictive plasma cytokines and angiogenic factors for patients treated with pazopanib for metastatic renal-cell cancer: a retrospective analysis of phase 2 and phase 3 trials. Lancet Oncol 2012;13:827-37.

[4] Druker BJ, Guilhot F, O'Brien SG, Gathmann I, Kantarjian H, Gattermann N, et al. Five-year follow-up of patients receiving imatinib for chronic myeloid leukemia. N Engl J Med 2006;355:2408-17.

[5] Slamon D, Eiermann W, Robert N, Pienkowski T, Martin M, Press M, et al. Adjuvant trastuzumab in HER2-positive breast cancer. N Engl J Med 2011;365:1273-83.

[6] Powles T, Hutson TE. Difficulty in predicting survival in metastatic renal cancer. Lancet Oncol 2012;13:859-60.

[7] Gerlinger M, Rowan AJ, Horswell S, Larkin J, Endesfelder D, Gronroos E, et al. Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. N Engl J Med 2012;366:883-92.

[8] Sharpe K, Stewart GD, Mackay A, Van Neste C, Rofe C, Berney D, et al. The effect of VEGF-targeted therapy on biomarker expression in sequential tissue from patients with metastatic clear cell renal cancer. Clin Cancer Res 2013;19:6924-34.

[9] O'Mahony FC, Nanda J, Laird A, Mullen P, Caldwell H, Overton IM, et al. The use of reverse phase protein arrays (RPPA) to explore protein expression variation within individual renal cell cancers. J Vis Exp JoVE 2013; Jan 22;(71).

[10] Faratian D, Um I, Wilson DS, Mullen P, Langdon SP, Harrison DJ. Phosphoprotein pathway profiling of ovarian carcinoma for the identification of potential new targets for therapy. Eur J Cancer 2011;47:1420-31.

[11] Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostat Oxf Engl 2007;8:118-27.

[12] Neeley ES, Kornblau SM, Coombes KR, Baggerly KA. Variable slope normalization of reverse phase protein arrays. Bioinformatics 2009;25:1384-9.

[13] Camp RL, Chung GG, Rimm DL. Automated subcellular localization and quantification of protein expression in tissue microarrays. Nat Med 2002;8:1323-7.

[14] O'Mahony FC, Faratian D, Varley J, Nanda J, Theodoulou M, Riddick ACP, et al. The use of automated quantitative analysis to evaluate epithelial-to-mesenchymal transition associated proteins in clear cell renal cell carcinoma. PloS One 2012;7:e31557.

[15] Camp RL, Dolled-Filhart M, Rimm DL. X-tile: a new bio-informatics tool for biomarker assessment and outcome-based cut-point optimization. Clin Cancer Res 2004;10:7252-9.

[16] Gerth-Kahlert C, Williamson K, Ansari M, Rainger JK, Hingst V, Zimmermann T, et al. Clinical and mutation analysis of 51 probands with anophthalmia and/or severe microphthalmia from a single center. Mol Genet Genomic Med 2013;1:15-31.

[17] Natrajan R, Weigelt B, Mackay A, Geyer FC, Grigoriadis A, Tan DSP, et al. An integrative genomic and transcriptomic analysis reveals molecular pathways and networks regulated by copy number aberrations in basal-like, HER2 and luminal cancers. Breast Cancer Res Treat 2010;121:575-89.

[18] Benjamini Y, Hochberg Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. J R Stat Soc Ser B Methodol 1995;57:289-300.

[19] Wykoff CC, Beasley NJ, Watson PH, Turner KJ, Pastorek J, Sibtain A, et al. Hypoxia-inducible expression of tumor-associated carbonic anhydrases. Cancer Res 2000;60:7075-83.

[20] Luong-Player A, Liu H, Wang HL, Lin F. Immunohistochemical Reevaluation of Carbonic Anhydrase IX (CA IX) Expression in Tumors and Normal Tissues. Am J Clin Pathol 2014;141:219-25.

[21] Bui MHT, Seligson D, Han K, Pantuck AJ, Dorey FJ, Huang Y, et al. Carbonic anhydrase IX is an independent predictor of survival in advanced renal clear cell carcinoma: implications for prognosis and therapy. Clin Cancer Res 2003;9:802-11.

[22] Zhang BY, Thompson RH, Lohse CM, Dronca RS, Cheville JC, Kwon ED, et al. Carbonic anhydrase IX (CAIX) is not an independent predictor of outcome in patients with clear cell renal cell carcinoma (ccRCC) after long-term follow-up. BJU Int 2013;111:1046-53.

[23] Clement JM, McDermott DF. The high-dose aldesleukin (IL-2) "select" trial: a trial designed to prospectively validate predictive models of response to high-dose IL-2 treatment in patients with metastatic renal cell carcinoma. Clin Genitourin Cancer 2009;7:E7-9.

[24] Choueiri TK, Cheng S, Qu AQ, Pastorek J, Atkins MB, Signoretti S. Carbonic anhydrase IX as a potential biomarker of efficacy in metastatic clear-cell renal cell carcinoma patients receiving sorafenib or placebo: Analysis from the treatment approaches in renal cancer global evaluation trial (TARGET). Urol Oncol 2012.

[25] Garcia-Donas J, Leandro-Garcia LJ, Gonzalez Del Alba A, Morente M, Alemany I, Esteban E, et al. Prospective study assessing hypoxia-related proteins as markers for the outcome of treatment with sunitinib in advanced clear-cell renal cell carcinoma. Ann Oncol 2013;24:2409-14.

[26] Parkkila S, Rajaniemi H, Parkkila AK, Kivela J, Waheed A, Pastorekova S, et al. Carbonic anhydrase inhibitor suppresses invasion of renal cancer cells in vitro. Proc Natl Acad Sci 2000;97:2220-4.

[27] Zavada J, Zavadova Z, Pastorek J, Biesova Z, Jezek J, Velek J. Human tumour- associated cell adhesion protein MN/CA IX: identification of M75 epitope and of the region mediating cell adhesion. Br J Cancer 2000;82:1808-13.

[28] Huang D, Ding Y, Li Y, Luo W-M, Zhang Z-F, Snider J, et al. Sunitinib acts primarily on tumor endothelium rather than tumor cells to inhibit the growth of renal cell carcinoma. Cancer Res 2010;70:1053-62.

7. Example 2 - Supplemental information

Supplemental methods

Cell lines

HUVECs were a gift from Kathryn Sangster of the Tissue Injury and Repair Group at the University of Edinburgh. HUVECs were cultured in EBM-2 media (Lonza Clonetics, UK) supplemented with endothelial cell growth kit (Lonza Clonetics, UK) and used in the RPPA experiments. RC124 human kidney cells (CLS, Cell Line Service, Germany) were also used in RPPA experiments and cultured in McCoy's 5A medium, 90% (Invitrogen, UK); fetal bovine serum (FBS), 10%. The human renal cancer cell lines CAKI-2 and RCC11, a gift from Dr Tyson Sharp (Barts Cancer Institute, London, UK), were grown in RPMI medium (Sigma, UK) supplemented with FBS (10%) and antibiotics and used in RNAi experiments.

TMA construction

The nephrectomy TMA was constructed using standardised techniques. Tumour cores of 1 mm were taken from at least 3 regions per tumour specimen [1]. For the biopsy TMA, a previously described technique for creating TMAs from biopsy specimens was used [2].

RNAi experiments

Cells were plated in 96 well plates and the following day transfected with 50nM non-targeting control or CA9 siRNA, using Lipofectamine 2000 as per the manufacturer's instructions. Dose finding experiments were performed with a range of sunitinib concentrations; 4 μM was determined to be the optimal concentration as it reduced cell viability by 70-90% relative to DMSO control treated cells and hence allowed for assessment of the effect of CA9 knockdown. The effect of sunitinib was normalized to DMSO treatment. To confirm silencing, protein was isolated from cells seventy-two hours post siRNA transfection. Lysates were immunoblotted as previously described [3], using anti-CA9 and anti-β-actin antibodies (Cell Signaling, Boston, MA).
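Normalisation of the sunitinib effect to DMSO treatment can be illustrated as below. The luminescence readings are invented placeholders rather than study data, and averaging replicate wells before taking the ratio is an assumption.

```python
from statistics import mean

def viability_relative_to_dmso(sunitinib_readings, dmso_readings):
    """Mean sunitinib signal expressed as a fraction of the mean DMSO-control signal."""
    return mean(sunitinib_readings) / mean(dmso_readings)

# Invented replicate readings for each siRNA condition.
control_sirna = viability_relative_to_dmso([1900.0, 2100.0], [9800.0, 10200.0])
ca9_sirna = viability_relative_to_dmso([3900.0, 4100.0], [7900.0, 8100.0])

# A 70-90% reduction in viability corresponds to a relative viability of 0.1-0.3.
within_dose_finding_range = 0.1 <= control_sirna <= 0.3
```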

aCGH Methodology

aCGH data was pre-processed on the basis of smoothed log ratios essentially as described [4,5]. In brief, raw Log2 ratios of intensity between samples and pooled female genomic DNA were read without background subtraction and normalised in the LIMMA package in R using PrintTipLoess. Outliers were removed based upon their deviation from neighbouring genomic probes using an estimation of the genome-wide median absolute deviation of all probes. A final dataset of 134937 probes with unambiguous mapping information according to the February 2009 build (hg19) of the human genome was used. Log2 ratios were rescaled using the genome-wide median absolute deviation in each sample, and then smoothed using circular binary segmentation in the DNACopy package as previously described. Losses and gains were defined as a circular binary segmentation (cbs)-smoothed Log2 ratio below -0.1 or above +0.1, respectively. Copy number thresholds for Nimblegen arrays were chosen based on an average genome-wide median absolute deviation of 0.139 across all arrays, similar to those used in previous studies. These thresholds were determined as previously described and validated empirically by means of in situ hybridisation methods [6,7]. Gene amplification was defined as having a Log2 ratio > 0.45, corresponding to more than five copies. A categorical analysis was applied to the probes after classifying them as representing amplification (>0.45), gain (>0.1 and ≤0.45), loss (<-0.1), or no-change according to their cbs-smoothed Log2 ratio values.
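The categorical classification of probes can be sketched as below. The function name and example values are hypothetical. Gain is taken to cover values above 0.1 up to and including 0.45, on the assumption that the gain band ends where the amplification threshold begins.

```python
from collections import Counter

def classify_probe(log2_ratio, gain=0.1, amplification=0.45):
    """Assign a copy number category from a cbs-smoothed Log2 ratio."""
    if log2_ratio > amplification:
        return "amplification"
    if log2_ratio > gain:
        return "gain"
    if log2_ratio < -gain:
        return "loss"
    return "no-change"

# Invented smoothed Log2 ratios for a handful of probes.
smoothed_ratios = [0.60, 0.30, 0.05, -0.20, -0.05]
categories = [classify_probe(value) for value in smoothed_ratios]
category_counts = Counter(categories)
```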

Statistical methods

Differential protein expression was assessed for the sunitinib naive and treated samples using established approaches. The t-test may be applied when marker expression values are normally distributed and have equivalent variance (homoscedasticity). Therefore, to identify normally distributed markers we used the Lilliefors test, which examines the difference between the empirical and hypothetical (normal) cumulative distribution. The null hypothesis is that the data are normally distributed; for further detail, see [8]. Homoscedasticity was tested using the Fligner-Killeen test where the null hypothesis is that variances are equivalent, see [9]. For markers where Fligner-Killeen and Lilliefors test results were consistent with the assumptions of homoscedasticity and normality the t-test was applied to assess differential expression. For markers where either the Fligner-Killeen or Lilliefors tests indicated that the t-test was not appropriate the non-parametric Mann-Whitney U test was performed to assess differential expression. The resultant list of 55 P-values was corrected for multiple hypothesis testing using the method of Benjamini and Hochberg [10]. Molecular heterogeneity was studied using an ANOVA framework, calculating intratumoural variances for the sunitinib naive and treated groups separately. The F-test was applied to assess differences in variance, as determined by the ratio of mean squared errors between naive and treated groups for individual markers. The F-test is sensitive to deviation from normality and therefore was only applied where marker expression distributions were not significantly different to the normal distribution according to the Lilliefors test on both the naive and treated groups. Normality is also a requirement for ANOVA.
Within-group homoscedasticity was assessed using the Fligner-Killeen test and proteins excluded from analysis unless they met this assumption in each of the treated and untreated groups, because homoscedasticity is a formal requirement for ANOVA. Again, P-values were corrected for false discovery rate according to the method of Benjamini and Hochberg. All P-values were two-tailed.
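The choice between the t-test and the Mann-Whitney U test described above amounts to a simple decision rule, sketched here. The significance level of 0.05 and the use of one Lilliefors P-value per group are assumptions; the text does not state either for the differential expression analysis.

```python
def choose_differential_test(lilliefors_p_naive, lilliefors_p_treated, fligner_p, alpha=0.05):
    """Pick the differential expression test from the assumption-check P-values."""
    # Failing to reject the null is treated as consistent with the assumption.
    normal = lilliefors_p_naive >= alpha and lilliefors_p_treated >= alpha
    homoscedastic = fligner_p >= alpha
    return "t-test" if normal and homoscedastic else "Mann-Whitney U"

# Invented assumption-check results for three markers.
both_assumptions_met = choose_differential_test(0.40, 0.25, 0.60)
non_normal_marker = choose_differential_test(0.01, 0.25, 0.60)
unequal_variance_marker = choose_differential_test(0.40, 0.25, 0.02)
```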

References for supplemental information

[1] Ilyas M, Grabsch H, Ellis IO, Womack C, Brown R, Berney D, et al. Guidelines and considerations for conducting experiments using tissue microarrays. Histopathology 2013;62:827-39.

[2] McCarthy F, Fletcher A, Dennis N, Cummings C, O'Donnell H, Clark J, et al. An improved method for constructing tissue microarrays from prostate needle biopsy specimens. J Clin Pathol 2009;62:694-8.

[3] Martin SA, McCabe N, Mullarkey M, Cummins R, Burgess DJ, Nakabeppu Y, et al. DNA Polymerases as Potential Therapeutic Targets for Cancers Deficient in the DNA Mismatch Repair Proteins MSH2 or MLH1. Cancer Cell 2010;17:235-48.

[4] Natrajan R, Lambros MB, Rodriguez-Pinilla SM, Moreno-Bueno G, Tan DSP, Marchio C, et al. Tiling Path Genomic Profiling of Grade 3 Invasive Ductal Breast Cancers. Clin Cancer Res 2009;15:2711-22.

[5] Natrajan R, Mackay A, Wilkerson PM, Lambros MB, Wetterskog D, Arnedos M, et al. Functional characterization of the 19q12 amplicon in grade III breast cancers. Breast Cancer Res 2012;14:R53.

[6] Lacroix-Triki M, Suarez PH, MacKay A, Lambros MB, Natrajan R, Savage K, et al. Mucinous carcinoma of the breast is genomically distinct from invasive ductal carcinomas of no special type. J Pathol 2010;222:282-98.

[7] Mackay A, Tamber N, Fenwick K, Iravani M, Grigoriadis A, Dexter T, et al. A high-resolution integrated analysis of genetic and expression profiles of breast cancer cell lines. Breast Cancer Res Treat 2009;118:481-98.

[8] Stephens MA. EDF Statistics for Goodness of Fit and Some Comparisons. J Am Stat Assoc 1974;69:730-7.

[9] Conover WJ, Johnson ME, Johnson MM. A Comparative Study of Tests for Homogeneity of Variances, with Applications to the Outer Continental Shelf Bidding Data. Technometrics 1981;23:351-61.

[10] Benjamini Y, Hochberg Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. J R Stat Soc Ser B Methodol 1995;57:289-300.

Example 3: Experimental data

Focus on Impact of Heterogeneity on Patient Stratification Performance

1. Methods

NEAT Performance Variation when Limiting Number of Samples Per Patient

The NEAT algorithm performance was tested on modifications of RCC_TEST using a maximum number of tumour samples (MNTS) of 1, 2 or 3 from each patient. The median of the selected tumour samples' protein expression values for each patient was used as input to NEAT. Patient age was input unchanged. For patients whose number of samples was fewer than or equal to the current MNTS, all samples were used. The hazard ratio and log-rank p-value for stratification into 'high' and 'low' risk groups were calculated by the NEAT algorithm on each subsampled cohort. The subsampling approach is summarised in Figure 12.
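As a rough illustration of the per-patient input described above, the following Python sketch collapses the selected tumour samples into one median expression value per marker. The function name, the data layout (one dictionary per sample) and the marker names are hypothetical; the NEAT algorithm itself is not reproduced here.

```python
from statistics import median


def median_profile(samples, chosen, mnts):
    """Collapse one patient's tumour samples into a single profile.

    samples: list of dicts mapping marker name -> expression value
    chosen:  indices of the samples picked for this sampling run
    mnts:    maximum number of tumour samples allowed per patient
    """
    # Patients with no more samples than the MNTS contribute all of them.
    if len(samples) <= mnts:
        picked = samples
    else:
        picked = [samples[i] for i in chosen]
    markers = picked[0].keys()
    return {m: median(s[m] for s in picked) for m in markers}


# Hypothetical patient with three samples and two markers.
patient = [
    {"marker_a": 1.0, "marker_b": 4.0},
    {"marker_a": 3.0, "marker_b": 2.0},
    {"marker_a": 5.0, "marker_b": 6.0},
]
print(median_profile(patient, chosen=[0, 2], mnts=2))
```

With an MNTS of 1 the median reduces to the single selected sample's values, so the same function covers all three settings.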

For each MNTS, there were >10^7 possible tumour sample combinations across the cohort. For the analysis presented, one million tumour sample combinations for each MNTS were taken using an approach to ensure good coverage of the sample space [Sobol' I.M., Comput. Math. Math. Phys., 7, 86-112 (1967); Sobol' I.M., USSR Computational Mathematics and Mathematical Physics, 16, 5, 236-242 (1976)].

The number of possible tumour sample combinations for each MNTS is given by the product of the number of sample combinations per patient: N_i choose {1,2,3}, where N_i indicates the number of samples available for patient i (2-8 for RCC_TEST) and {1,2,3} is the current MNTS. Each of these possibilities may be indexed by a unique integer; one million such integers were generated using a Sobol sequence, a quasi-random procedure ensuring low discrepancy [Sobol (1967) and Sobol (1976), supra].
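The counting argument can be illustrated with a short Python sketch. The cohort below is hypothetical, and the function names are illustrative. The index generator uses a base-2 van der Corput sequence as a simple stand-in for a low-discrepancy sequence; the Sobol implementation used in the study is not specified in the text, so this part is an assumption.

```python
from math import comb, prod


def combinations_for_patient(n_samples, mnts):
    """Number of ways to pick this patient's samples for one run."""
    # Patients with no more samples than the MNTS contribute all of
    # them, so they offer exactly one possibility.
    if n_samples <= mnts:
        return 1
    return comb(n_samples, mnts)


def total_combinations(sample_counts, mnts):
    """Product over patients of the per-patient combination counts."""
    return prod(combinations_for_patient(n, mnts) for n in sample_counts)


def van_der_corput(k):
    """k-th point of the base-2 van der Corput sequence in [0, 1)."""
    x, denom = 0.0, 1.0
    while k:
        denom *= 2.0
        k, bit = divmod(k, 2)
        x += bit / denom
    return x


def quasi_random_indices(total, count):
    """Map low-discrepancy points onto integer indices in range(total).

    Floating point limits this sketch to totals well below 2**53.
    """
    return [int(van_der_corput(k) * total) for k in range(1, count + 1)]


# Hypothetical cohort: five patients with 2 to 8 samples each.
sample_counts = [2, 3, 4, 5, 8]
for mnts in (1, 2, 3):
    total = total_combinations(sample_counts, mnts)
    print(mnts, total, quasi_random_indices(total, 3))
```

Each returned index identifies one cohort-wide choice of samples; successive indices fill the index range progressively rather than clustering as pseudo-random draws can.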

The mapping of each (master) integer generated by the sequence to a list of samples for each patient was calculated in two steps. First, the master integer was converted to a set of integers, one integer for each patient, that represented the combinadic (combinatorial index) for each patient's sample combination. This was done by converting the master integer to a mixed radix number, where a radix equal to N_i choose {1,2,3} was present for each patient, which is interpreted as a patient's sample combinadic. Then, each combinadic was mapped onto the specific samples for the corresponding patient for inclusion in the sampling run.
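A minimal Python sketch of this two-step mapping follows. It assumes that the first patient corresponds to the least significant mixed radix digit and that combinations are ranked in lexicographic order; the text specifies neither convention, and the function names are hypothetical.

```python
from math import comb


def to_mixed_radix(master, radices):
    """Split a master integer into one digit per patient.

    The first patient takes the least significant digit (an assumed
    convention).
    """
    digits = []
    for radix in radices:
        master, digit = divmod(master, radix)
        digits.append(digit)
    return digits


def unrank_combination(rank, n, k):
    """Return the rank-th k-subset of range(n) in lexicographic order."""
    chosen = []
    start = 0
    for remaining in range(k, 0, -1):
        # Skip whole blocks of combinations until the block that
        # begins with element `start` contains the requested rank.
        while True:
            block = comb(n - start - 1, remaining - 1)
            if rank < block:
                break
            rank -= block
            start += 1
        chosen.append(start)
        start += 1
    return chosen


def select_samples(master, sample_counts, mnts):
    """Map a master integer to the sample indices used for each patient."""
    radices = [1 if n <= mnts else comb(n, mnts) for n in sample_counts]
    digits = to_mixed_radix(master, radices)
    return [
        list(range(n)) if n <= mnts else unrank_combination(d, n, mnts)
        for d, n in zip(digits, sample_counts)
    ]


# Hypothetical cohort of three patients with 2, 3 and 4 samples, MNTS=2.
print(select_samples(7, [2, 3, 4], 2))
```

Because every master integer below the product of the radices yields a distinct set of digits, each index corresponds to exactly one cohort-wide sample selection.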

2. Results and Discussion

NEAT overall performance on the validation cohort (RCC_TEST) was poor when limited to a single tumour sample per patient, and was significantly decreased when limited to two samples per patient (Figure 13). For one sample per patient, performance was only just better than random (median HR=0.109), although significantly better due to the large number of samples (binomial p<10^-322). Two samples (HR=1.614) improved performance over a single sample, as did three samples (median HR=3.030) over two samples (both p<10^-324, Mann-Whitney test).

The log HR starts to coalesce at specific values as the maximum number of samples increases, particularly where MNTS=3. This is in part due to some patients using the same tumour samples in every sampling run (0, 7 and 10 patients in the maximum 1, 2 and 3 tumour sample experiments respectively). Nevertheless, the trend towards improved performance with more samples is pronounced and statistically significant. The variation in estimated risk with different samples is also clear for individual patients, where predicted risk scores often fall above and below zero, indicating that the patient may fall into a low or high risk group depending on the sample(s) taken (Figure 14).

Together, these findings reflect the impact of intratumoural heterogeneity on predictive power. Indeed, in the current study, tumour samples were selected to be representative across the range of grades found in the tumour (low, mixed, high) and necrotic tissue was excluded. Our results suggest that some prognostic and predictive methods evaluated at low sampling rates will have variable performance and suffer from low repeatability. Approaches that capture tumour heterogeneity improve risk stratification of metastatic clear cell renal cell cancer specifically; furthermore, the sampling approach may be severely limiting in the validation of novel predictive tools for cancer medicine.