


Title:
MIRNA PROSTATE CANCER MARKER
Document Type and Number:
WIPO Patent Application WO/2018/049506
Kind Code:
A1
Abstract:
There is described herein a method of determining the likelihood of disease or disease progression in a patient with respect to prostate cancer, the method comprising: a) providing a biological fluid sample, preferably urine, from the patient containing miRNA; b) determining or measuring the abundance of at least one miRNA biomarker selected from the group consisting of: hsa-miR-3195, hsa-let-7b-5p, hsa-miR-144-3p, hsa-miR-451a, hsa-miR-148a-3p, hsa-miR-512-5p, and hsa-miR-431-5p; c) comparing the abundance of said at least one miRNA biomarker in the sample with a reference or control abundance of at least one miRNA biomarker; and d) determining the likelihood of disease or disease progression; wherein a likelihood of disease or disease progression is higher when there is a statistically significantly higher abundance in the sample in comparison with the reference or control abundance.

Inventors:
BOUTROS PAUL CHRISTOPHER (CA)
BRISTOW ROBERT A (CA)
JEON JOUHYUN (CA)
BAPAT BHARATI (CA)
LIU STANLEY (CA)
Application Number:
PCT/CA2017/000205
Publication Date:
March 22, 2018
Filing Date:
September 14, 2017
Assignee:
ONTARIO INSTITUTE FOR CANCER RES OICR (CA)
UNIV HEALTH NETWORK UHN (CA)
International Classes:
C12Q1/68; C07H21/02; C12N15/113; G01N33/48; G06F19/10; G06F19/20
Domestic Patent References:
WO2016127998A1 (2016-08-18)
WO2011080315A1 (2011-07-07)
WO2014085906A1 (2014-06-12)
Foreign References:
CA2951016A1 (2015-12-17)
Other References:
WALTER, B. A. ET AL.: "Comprehensive microRNA profiling of prostate cancer", JOURNAL OF CANCER, vol. 4, no. 5, 9 May 2013 (2013-05-09), pages 350-357
SCHUBERT, M. ET AL.: "Distinct microRNA expression profile in prostate cancer patients with early clinical failure and the impact of let-7 as prognostic marker in high-risk prostate cancer", PLOS ONE, vol. 8, no. 6, 14 June 2014 (2014-06-14), pages e65064
KIM, T. ET AL.: "Targeted proteomics identifies liquid-biopsy signatures for extracapsular prostate cancer", NATURE COMMUNICATIONS, vol. 7, 28 June 2016 (2016-06-28), pages 11906, Retrieved from the Internet [retrieved on 20171123]
Attorney, Agent or Firm:
CHIU, Jung-Kay et al. (CA)
Claims:

1. A method of determining the likelihood of disease or disease progression in a patient with respect to prostate cancer, the method comprising: a) providing a biological fluid sample, preferably urine, from the patient containing miRNA; b) determining or measuring the abundance of at least one miRNA biomarker selected from the group consisting of: hsa-miR-3195, hsa-let-7b-5p, hsa-miR-144-3p, hsa-miR-451a, hsa-miR-148a-3p, hsa-miR-512-5p, and hsa-miR-431-5p; c) comparing the abundance of said at least one miRNA biomarker in the sample with a reference or control abundance of at least one miRNA biomarker; and d) determining the likelihood of disease or disease progression; wherein a likelihood of disease or disease progression is higher when there is a statistically significantly higher abundance in the sample in comparison with the reference or control abundance.

2. The method according to claim 1, wherein the at least one miRNA biomarker is at least 2, 3, 4, 5, 6 or 7 patient biomarkers.

3. The method according to claim 2, wherein the at least one miRNA biomarker is all of hsa-miR-3195, hsa-let-7b-5p, hsa-miR-144-3p, hsa-miR-451a, hsa-miR-148a-3p, hsa-miR-512-5p, and hsa-miR-431-5p.

4. The method according to any one of claims 1-3, further comprising building a subject biomarker profile from the determined or measured patient biomarkers.

5. The method of any one of claims 1 to 4, wherein the prediction of disease progression is following at least one of active surveillance, surgery, endocrine therapy, chemotherapy, radiotherapy, hormone therapy, gene therapy, thermal therapy, and ultrasound therapy.

6. The method of any one of claims 1 to 5, further comprising classifying the patient into a high risk group if the likelihood of disease progression is relatively high or a low risk group if the likelihood of disease progression is relatively low.

7. The method of claim 6, further comprising treating the patient with more aggressive therapy if the patient is in the high risk group.

8. The method of claim 7, wherein the more aggressive therapy comprises adjuvant therapy, preferably hormone therapy, chemotherapy or radiotherapy.

9. A computer-implemented method of determining the likelihood of disease or disease progression in a patient with respect to prostate cancer, the method comprising: a) receiving, at at least one processor, data reflecting the abundance of at least one miRNA biomarker in a subject bodily fluid, preferably urine, selected from the group consisting of: hsa-miR-3195, hsa-let-7b-5p, hsa-miR-144-3p, hsa-miR-451a, hsa-miR-148a-3p, hsa-miR-512-5p, and hsa-miR-431-5p; b) constructing, at the at least one processor, an expression profile corresponding to the abundance; c) comparing, at the at least one processor, said subject abundance to corresponding reference or control abundance; d) determining, at the at least one processor, the likelihood of disease progression; wherein a likelihood of disease or disease progression is higher when there is a statistically significantly higher abundance in the sample in comparison with the reference or control abundance.

10. The method according to claim 9, wherein the at least one miRNA biomarker is at least 2, 3, 4, 5, 6 or 7 miRNA biomarkers, preferably all 7 miRNA biomarkers.

11. A computer program product for use in conjunction with a general-purpose computer having a processor and a memory connected to the processor, the computer program product comprising a computer readable storage medium having a computer program mechanism encoded thereon, wherein the computer program mechanism may be loaded into the memory of the computer and cause the computer to carry out the method of any one of claims 1 to 6.

12. A computer readable medium having stored thereon a data structure for storing the computer program product according to claim 11.

13. A device for determining the likelihood of disease or disease progression in a patient with respect to prostate cancer, the device comprising: at least one processor; and electronic memory in communication with the at least one processor, the electronic memory storing processor-executable code that, when executed at the at least one processor, causes the at least one processor to: a) receive data reflecting abundance of at least one miRNA biomarker in a patient bodily fluid, preferably urine, selected from the group consisting of: hsa-miR-3195, hsa-let-7b-5p, hsa-miR-144-3p, hsa-miR-451a, hsa-miR-148a-3p, hsa-miR-512-5p, and hsa-miR-431-5p; b) compare said patient biomarkers to corresponding reference or control biomarkers; and c) determine the likelihood of disease progression; wherein a likelihood of disease or disease progression is higher when there is a statistically significantly higher abundance in the sample in comparison with the reference or control abundance.

14. The device according to claim 13, wherein the at least one miRNA biomarker is at least 2, 3, 4, 5, 6 or 7 miRNA biomarkers, preferably all 7 miRNA biomarkers.

Description:
miRNA PROSTATE CANCER MARKER

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of priority of U.S. Provisional Patent Application No. 62/394,535 filed September 14, 2016 and incorporated herein by reference in its entirety.

FIELD OF INVENTION

The present disclosure relates generally to a prostate cancer biomarker signature. More particularly, the present disclosure relates to an miRNA signature for the prognosis of prostate cancer outcomes, which can inform treatment decisions and guide therapy.

BACKGROUND

Prostate cancer (PCa) is the most common non-skin male malignancy [1] and the second-leading cause of oncological mortality for men in developed countries [2]. Many PCa diagnoses involve indolent disease, and the clinical estimation of prognosis involves serum prostate-specific antigen (PSA) measurement, digital rectal examination (DRE) and multiple prostate biopsies to assess tumour grade (Gleason Score) [3]. While clinically effective, the low specificity of PSA testing, the low sensitivity of DRE and the complications of biopsies create an urgent clinical need for non-invasive prognostic tests [4]. Furthermore, prostate cancer is clinically heterogeneous, with major spatial [5,6] and temporal [7] genomic variability, confounding the development of tissue-based molecular prognostic tests.

Several approaches have been considered to develop non-invasive molecular tests to identify aggressive prostate cancer. Cell-free DNA in plasma is increased in PCa patients [8], and the number of circulating tumour cells in blood is associated with worse survival of PCa patients [9]. Although these liquid biopsies provide information on cancer detection and progression, accurate isolation and quantification remain technically challenging. Urine proteins are another promising candidate matrix for non-invasive biomarkers [10,11], but clinical translation of mass-spectrometry tests remains challenging.

SUMMARY OF INVENTION

In an aspect, there is provided a method of determining the likelihood of disease or disease progression in a patient with respect to prostate cancer, the method comprising: a) providing a biological fluid sample, preferably urine, from the patient containing miRNA; b) determining or measuring the abundance of at least one miRNA biomarker selected from the group consisting of: hsa-miR-3195, hsa-let-7b-5p, hsa-miR-144-3p, hsa-miR-451a, hsa-miR-148a-3p, hsa-miR-512-5p, and hsa-miR-431-5p; c) comparing the abundance of said at least one miRNA biomarker in the sample with a reference or control abundance of at least one miRNA biomarker; and d) determining the likelihood of disease or disease progression; wherein a likelihood of disease or disease progression is higher when there is a statistically significantly higher abundance in the sample in comparison with the reference or control abundance.

In an aspect, there is provided a computer-implemented method of determining the likelihood of disease or disease progression in a patient with respect to prostate cancer, the method comprising: a) receiving, at at least one processor, data reflecting the abundance of at least one miRNA biomarker in a subject bodily fluid, preferably urine, selected from the group consisting of: hsa-miR-3195, hsa-let-7b-5p, hsa-miR-144-3p, hsa-miR-451a, hsa-miR-148a-3p, hsa-miR-512-5p, and hsa-miR-431-5p; b) constructing, at the at least one processor, an expression profile corresponding to the abundance; c) comparing, at the at least one processor, said subject abundance to corresponding reference or control abundance; d) determining, at the at least one processor, the likelihood of disease progression; wherein a likelihood of disease or disease progression is higher when there is a statistically significantly higher abundance in the sample in comparison with the reference or control abundance.

In an aspect, there is provided a computer program product for use in conjunction with a general-purpose computer having a processor and a memory connected to the processor, the computer program product comprising a computer readable storage medium having a computer program mechanism encoded thereon, wherein the computer program mechanism may be loaded into the memory of the computer and cause the computer to carry out the method described herein.

In an aspect, there is provided a computer readable medium having stored thereon a data structure for storing the computer program product described herein.

In an aspect, there is provided a device for determining the likelihood of disease or disease progression in a patient with respect to prostate cancer, the device comprising: at least one processor; and electronic memory in communication with the at least one processor, the electronic memory storing processor-executable code that, when executed at the at least one processor, causes the at least one processor to: a) receive data reflecting abundance of at least one miRNA biomarker in a patient bodily fluid, preferably urine, selected from the group consisting of: hsa-miR-3195, hsa-let-7b-5p, hsa-miR-144-3p, hsa-miR-451a, hsa-miR-148a-3p, hsa-miR-512-5p, and hsa-miR-431-5p; b) compare said patient biomarkers to corresponding reference or control biomarkers; and c) determine the likelihood of disease progression; wherein a likelihood of disease or disease progression is higher when there is a statistically significantly higher abundance in the sample in comparison with the reference or control abundance.

Other aspects and features of the present disclosure will become apparent to those ordinarily skilled in the art upon review of the following description of specific embodiments in conjunction with the accompanying figures.

BRIEF DESCRIPTION OF FIGURES

Embodiments of the present disclosure will now be described, by way of example only, with reference to the attached Figures.

Figure 1 shows the DRE-urine miRNA transcriptome profile. (a) Overview of the analysis of miRNA abundance variance in PCa patients. (b) Parameter selection to optimize miRNA abundance. Similarity (ρ) represents the similarity of the miRNA profiles of two control samples. Misinterpreted samples indicate the fraction of samples with failed normalization. The similarity between control samples tends to increase when there are more misinterpreted samples. Since we considered samples with fewer than 10% of miRNAs expressed after normalization to be misinterpreted, this correlation could be an unavoidable effect of calculating similarity over a small number of expressed miRNAs. To mitigate this effect, we only considered parameters that show high similarity between controls and zero misinterpreted samples (red arrow). (c) Normalized miRNA transcriptome profile. Green and grey bars (top) represent the number of miRNAs detected in patient urine samples and control samples, respectively. Yellow (≥ 5) and grey (< 5) bars (right) represent the number of samples in which a given miRNA is detected.
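Figure 1(b) screens normalization parameters using two quantities: the similarity (ρ) between control profiles and the fraction of "misinterpreted" samples, i.e. samples with fewer than 10% of miRNAs expressed after normalization. The Python sketch below computes both for profiles stored as dicts mapping miRNA name to normalized count. It assumes that "expressed" means a count above zero and that ρ is Spearman's rank correlation with tied values sharing their average rank; the normalization procedure itself is not described here and is not reproduced.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant profile")
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(profile_a, profile_b):
    """Spearman's rho between two miRNA -> count dicts over shared miRNAs."""
    shared = sorted(profile_a.keys() & profile_b.keys())
    a = [profile_a[m] for m in shared]
    b = [profile_b[m] for m in shared]
    return pearson(average_ranks(a), average_ranks(b))


def is_misinterpreted(profile, min_fraction=0.10):
    """Flag a normalized sample in which fewer than 10% of miRNAs are
    expressed (expressed = count > 0, an assumption)."""
    expressed = sum(1 for v in profile.values() if v > 0)
    return expressed / len(profile) < min_fraction
```

A parameter setting would then be kept only if `spearman_rho` between controls is high and `is_misinterpreted` is false for every sample, mirroring the red-arrow criterion.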

Figure 2 shows the correlation of miRNA abundance within and between individual patients. (a) Global similarity of the miRNA transcriptome. Black boxes indicate intra-individual similarity. Yellow and purple lines in the right panel indicate the distributions of the correlation coefficient (ρ) for inter- and intra-individual pairs, respectively. (b) Coefficient of variation (CV) of individual miRNAs (left) and the difference between intra- and inter-individual CV (right). A negative grey bar indicates that the inter-individual CV is larger than the intra-individual CV. (c) The distribution of estimated variability of miRNA abundance. Intra- and inter-individual variability are estimated using the intra-class correlation coefficient (ICC). The bar graph (right) shows the average proportion of inter-individual variability (yellow) and intra-individual variability (purple) of miRNA abundance.
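Figure 2 compares intra- and inter-individual variability using the CV and the intra-class correlation coefficient. The description does not state which ICC form was used, so the sketch below assumes the one-way random-effects ICC(1) on a balanced design (every patient contributes the same number of urine samples). Under that model the ICC estimates the inter-individual share of variance and 1 - ICC the intra-individual share, which is how the proportions in Figure 2(c) can be read. The intra- and inter-CV definitions in `intra_inter_cv` are likewise assumptions.

```python
from statistics import mean, stdev


def cv(values):
    """Coefficient of variation: sample standard deviation over the mean."""
    return stdev(values) / mean(values)


def intra_inter_cv(groups):
    """groups: one list of abundances per patient, for a single miRNA.

    Intra-CV is taken as the average within-patient CV and inter-CV as the
    CV of the per-patient means; both definitions are assumptions.
    """
    intra = mean(cv(g) for g in groups)
    inter = cv([mean(g) for g in groups])
    return intra, inter


def icc_oneway(groups):
    """One-way random-effects ICC(1) for a balanced design in which every
    patient contributes the same number k of urine samples."""
    k = len(groups[0])
    if k < 2 or any(len(g) != k for g in groups):
        raise ValueError("sketch assumes a balanced design with k >= 2")
    n = len(groups)
    grand = mean(x for g in groups for x in g)
    # Between-patient and within-patient mean squares from one-way ANOVA.
    ms_between = k * sum((mean(g) - grand) ** 2 for g in groups) / (n - 1)
    ms_within = sum((x - mean(g)) ** 2 for g in groups for x in g) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

Real longitudinal cohorts are rarely balanced; an unbalanced design would need a mixed-model variance-component estimate instead of this closed form.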

Figure 3 shows biological properties of miRNAs and their target genes. (a) Chromosomal positions of the studied miRNAs. miRNAs are divided into four variable groups depending on ICC (Q25, Q50, Q75 and Q100). Dashed red boxes indicate chromosomes enriched in a given group (q < 0.1). Q25 and Q100 represent the miRNAs that are most and least variable within individuals, respectively. (b) Number of target genes in each variable group. (c) Overlapping target genes among variable groups. (d) Enriched biological functions of target genes in each variable group. In total, 1,215 GO terms with q < 0.25 in at least one variable group are coloured. Q25, Q50, Q75 and Q100 represent targets that are regulated by a specific variable group. Common indicates targets that are regulated by all four variable groups. GO terms with q < 0.05 in each variable group are shown. The full list of GO terms and their enrichment scores is available but not shown.
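The four variable groups in Figure 3 can be formed by ranking miRNAs on their ICC and cutting the ranking into quartiles. A minimal sketch, assuming equal-size groups by rank position (the exact quartile boundaries and tie handling are not specified). Because a low ICC means most variance lies within individuals, Q25 receives the lowest ICCs.

```python
def icc_quartile_groups(icc_by_mirna):
    """Assign each miRNA to Q25/Q50/Q75/Q100 by the rank of its ICC.

    Q25 holds the lowest ICCs (most variable within individuals) and Q100
    the highest (least variable). Ties are broken by sort order, an assumption.
    """
    labels = ("Q25", "Q50", "Q75", "Q100")
    ordered = sorted(icc_by_mirna, key=icc_by_mirna.get)
    n = len(ordered)
    # rank * 4 // n maps ranks 0..n-1 onto group indices 0..3.
    return {m: labels[(rank * 4) // n] for rank, m in enumerate(ordered)}
```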

Figure 4 shows a predictive model to distinguish different risk groups of PCa. (a) Schematic view of the design of the machine-learning-based predictive model used to classify two risk groups. (b) Performance of the predictive model in a training cohort. The bold green line indicates the median AUC of 10-times repeated 5-fold cross-validation. Grey shading indicates all cross-validated AUCs. (c) Performance of the predictive model in a validation cohort. ROC curves of intra-stable (purple), intra-variable (yellow) and randomly selected (grey) miRNA signatures are compared. (d) AUC distribution of random models. In total, 10,000 random models are generated and their AUCs are measured. Purple, yellow and grey dashed lines represent the AUCs of predictive models based on intra-stable, intra-variable and random miRNAs (median AUC of random models), respectively.
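Figure 4 evaluates the classifier with 10-times repeated 5-fold cross-validation, ROC AUC, and a comparison against 10,000 random miRNA signatures. The sketch below covers only that evaluation scaffolding, in plain Python: an AUC computed as the probability that a high-risk sample outranks a low-risk one (the Mann-Whitney formulation), a repeated k-fold index generator, and an empirical p-value for the observed AUC against random-model AUCs. The classifier itself is not specified in this passage and is omitted; the unstratified folds, the fixed seed and the +1 correction are assumptions.

```python
import random


def roc_auc(scores, labels):
    """AUC as the probability that a random high-risk sample (label 1) scores
    above a random low-risk sample (label 0); ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both risk groups must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else (0.5 if p == q else 0.0)
    return wins / (len(pos) * len(neg))


def repeated_kfold(n_samples, k=5, repeats=10, seed=0):
    """Yield (train, test) index lists for `repeats` shuffles of k-fold CV.
    Folds are not stratified by risk group here (a simplification)."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        folds = [idx[f::k] for f in range(k)]
        for f in range(k):
            test = sorted(folds[f])
            train = sorted(i for g in range(k) if g != f for i in folds[g])
            yield train, test


def empirical_p(observed_auc, random_aucs):
    """Share of random-signature AUCs at least as high as the observed AUC,
    with a +1 correction so the estimate is never exactly zero."""
    exceed = sum(1 for a in random_aucs if a >= observed_auc)
    return (exceed + 1) / (len(random_aucs) + 1)
```

The median of the 50 per-fold AUCs from `repeated_kfold(n, 5, 10)` would correspond to the bold line in panel (b).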

Figure 5 shows optimization of the miRNA transcriptome. (a) Similarity of the miRNA profiles of control samples before (left) and after (right) normalization. (b) Distribution of miRNA abundance before (left) and after (right) normalization. (c) Number of miRNAs that are detected in a given number of samples.

Figure 6 shows miRNA abundances in tissues and urine samples of PCa patients. (a) Fraction of tissues and urine samples in which each miRNA is detected. Each hexbin represents the number of miRNAs that are detected in a given fraction of urine samples and tumour tissues of PCa patients. (b) Presence and absence of miRNAs in urine samples and tissues are compared (blue). A miRNA is defined as absent when its mean normalized count across samples is 0. The numbers of urine samples (green) and tissues (yellow) in which the miRNA is detected are shown. The red dashed box shows miRNAs that are used for further analyses. (c) Comparison of the number of target genes between tissue-specific miRNAs and all miRNAs in miRTarBase. The number of miRNAs with known target genes is in parentheses. (d and e) Comparison of miRNA abundance between urine samples and tissues: (d) all studied miRNAs and (e) miRNAs that are detected in both urine and tissue (mean normalized count > 1).
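Figure 6 defines a miRNA as absent when its mean normalized count across samples is 0, and restricts panel (e) to miRNAs with a mean normalized count > 1 in both urine and tissue. The sketch below applies those two rules to lists of per-sample dicts; the dict layout and the function names are hypothetical.

```python
from statistics import mean


def mean_counts(samples):
    """samples: list of dicts miRNA -> normalized count (same keys in each)."""
    mirnas = samples[0].keys()
    return {m: mean(s[m] for s in samples) for m in mirnas}


def compare_presence(urine_samples, tissue_samples, detect_threshold=1.0):
    """Classify shared miRNAs by the absence and detection rules of Figure 6."""
    urine = mean_counts(urine_samples)
    tissue = mean_counts(tissue_samples)
    shared = urine.keys() & tissue.keys()
    return {
        # Absent: mean normalized count of exactly 0 across samples.
        "absent_in_urine": sorted(m for m in shared if urine[m] == 0),
        "absent_in_tissue": sorted(m for m in shared if tissue[m] == 0),
        # Detected in both: mean normalized count above 1 in each matrix.
        "detected_in_both": sorted(
            m for m in shared
            if urine[m] > detect_threshold and tissue[m] > detect_threshold
        ),
    }
```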

Figure 7 shows the effect of expression differences on the similarity of miRNA expression between samples. Grey bars represent the similarity of miRNA abundance between DRE-urine samples from the same patient. Green shows the difference in the number of undetected miRNAs between the two samples. Purple represents the fraction of undetected miRNAs in each sample.

Figure 8 shows clustering of miRNA expression. (a) The variation of global expression profiles was further evaluated by performing consensus hierarchical clustering (k-means clustering, k = 5). Samples from the same patient are clustered together at k = 5 and (b) tend to fall in the same cluster regardless of the number of clusters.

Figure 9 shows miRNA abundance variability within and between individual PCa patients. (a) Coefficient of variation (CV) within (purple) and between (yellow) individuals. Variability estimates (ICC) are compared with (b) mean miRNA abundance in samples and (c) the number of urine samples in which a given miRNA is detected.

Figure 10 shows biological properties of miRNAs and their target genes. (a) Tested miRNAs are divided into 4 variable groups depending on ICC. Q25, Q50, Q75 and Q100 represent the 1st, 2nd, 3rd and 4th quartiles, respectively. (b) Chromosomal enrichment of miRNAs. A red dot indicates a significantly enriched chromosome for a given variable group (q < 0.1). Orange bars represent the number of studied miRNAs on a given chromosome. (c) Number of target genes depending on variable group. Target genes with different numbers of supporting experimental evidence are compared. The number of miRNAs with known target genes is in parentheses. The red dotted line indicates the average number of target genes in miRTarBase.

Figure 11 shows the relationship between miRNA abundances in urine and PCa tissues. (a) Variable group-specific correlation of miRNA abundance between urine samples and PCa tissues. Spearman's ρ and its statistical significance are tabulated. (b) Correlation coefficients of different variable groups.

Figure 12 shows univariate analysis of miRNAs in a training cohort. (a) miRNA abundance profile of an independent training cohort. (b) Relationship of miRNA abundances between the discovery cohort and the training cohort. (c) Abundance changes between high-risk and low-risk groups and their statistical significance. Each miRNA is coloured according to its estimated variability. The dashed horizontal line indicates p = 0.05. (d) Fraction of variable groups depending on statistical significance to discriminate the two risk groups.

Figure 13 shows univariate analysis of miRNAs in PCa tissues. (a) Abundance changes between high-risk and low-risk PCa tissues and their statistical significance. Each miRNA is coloured according to its estimated variability. The dashed horizontal line indicates p = 0.001. Dashed vertical lines indicate -1.5 fold-change, no change and 1.5 fold-change, respectively. (b) Fraction of variable groups among significantly changed miRNAs. Significantly changed miRNAs show more than a 1.5-fold change in abundance between the two risk groups with statistical significance, p < 0.001.

Figure 14 shows performance evaluation of the predictive model. (a) Importance of miRNAs for discriminating the two risk groups. The importance of each miRNA represents the average mean decrease in accuracy from resampling. (b) Median area under the ROC curve (AUC) depending on the number of selected miRNAs. miRNAs ranked from top 2 to top 15 are selected and used to generate a predictive model.
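Figure 13(b) calls a miRNA significantly changed when its abundance differs by more than 1.5-fold between risk groups with p < 0.001. The sketch below applies that filter to precomputed group means and p-values. How the p-values are obtained is not stated here, and the signed fold-change convention (a decrease reported as a negative ratio, matching the -1.5 line in Figure 13(a)) is an assumption.

```python
def signed_fold_change(mean_high, mean_low):
    """High/low ratio; ratios below 1 are reported as -low/high, so a
    1.5-fold decrease reads as -1.5. Assumes both means are positive."""
    ratio = mean_high / mean_low
    return ratio if ratio >= 1 else -1 / ratio


def significantly_changed(stats, fc_cutoff=1.5, p_cutoff=0.001):
    """stats: miRNA -> (mean_high_risk, mean_low_risk, p_value).
    Returns miRNA -> signed fold change for miRNAs passing both cutoffs."""
    hits = {}
    for mirna, (high, low, p) in stats.items():
        fc = signed_fold_change(high, low)
        if abs(fc) > fc_cutoff and p < p_cutoff:
            hits[mirna] = fc
    return hits
```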

Figure 15 shows univariate analysis of miRNAs in a validation cohort. (a) miRNA abundance profile of the validation cohort. (b) Relationship of miRNA abundances between the discovery cohort and the validation cohort. (c) Fraction of variable groups depending on statistical significance to discriminate the two risk groups.

Figure 16 shows the relationship between miRNA profile similarity and other clinical information. The miRNA profile similarity between two samples from the same patient is compared with PSA, age and the time difference between sample preparations (delta of sample preparation). The discovery cohort was used to examine the relationship.

Figure 17 shows a suitably configured computer device, and associated communications networks, devices, software and firmware, to provide a platform for enabling one or more embodiments as described herein.

DETAILED DESCRIPTION

The development of non-invasive tests for the early detection and assessment of tumour aggressiveness is a major goal in prostate cancer research. miRNAs are promising non-invasive biomarkers: they play an essential role in tumorigenesis, are stable under diverse analytical conditions, and can be detected in body fluids. Specifically, these small RNAs are involved in prostate cancer development and progression [12], influence treatment response [13], are stable under harsh conditions [14] and have been detected in urine [15]. We therefore investigated the intra-individual stability of the urine miRNA transcriptome by examining longitudinal changes over months to years in a cohort of patients with localized prostate cancer. We find that each individual has a specific urine miRNA fingerprint that is temporally stable and biased towards specific biological functions. We combined this observation with machine-learning techniques to create a urine biomarker that identifies aggressive prostate cancer. This biomarker, composed of seven miRNAs, was validated in an independent prostate cancer cohort to non-invasively predict high-risk disease at an accuracy similar to that of the best tissue-based prognostic markers (AUC: 0.71). By understanding the intra- and inter-tumoural heterogeneity of the urine miRNA transcriptome, non-invasive biomarkers can be developed to precede or supplement tissue-based clinical tests.

In an aspect, there is provided a method of determining the likelihood of disease or disease progression in a patient with respect to prostate cancer, the method comprising: a) providing a biological fluid sample, preferably urine, from the patient containing miRNA; b) determining or measuring the abundance of at least one miRNA biomarker selected from the group consisting of: hsa-miR-3195, hsa-let-7b-5p, hsa-miR-144-3p, hsa-miR-451a, hsa-miR-148a-3p, hsa-miR-512-5p, and hsa-miR-431-5p; c) comparing the abundance of said at least one miRNA biomarker in the sample with a reference or control abundance of at least one miRNA biomarker; and d) determining the likelihood of disease or disease progression; wherein a likelihood of disease or disease progression is higher when there is a statistically significantly higher abundance in the sample in comparison with the reference or control abundance.
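Steps c) and d) of the method require deciding whether a patient's abundance is significantly higher than the reference. The method does not name a statistical test, so the following is only a sketch under stated assumptions: counts are log2-transformed with a pseudocount of 1, each panel miRNA is compared with a reference cohort using a one-sided z-score under a normal approximation, and markers with p < 0.05 are reported as elevated. The seven panel names are taken from the method; everything else, including `elevated_markers`, is hypothetical.

```python
from math import log2
from statistics import NormalDist, mean, stdev

# The seven-miRNA panel named in the method.
PANEL = ("hsa-miR-3195", "hsa-let-7b-5p", "hsa-miR-144-3p", "hsa-miR-451a",
         "hsa-miR-148a-3p", "hsa-miR-512-5p", "hsa-miR-431-5p")


def elevated_markers(patient, reference_cohort, alpha=0.05, pseudocount=1.0):
    """patient: miRNA -> normalized count for one urine sample.
    reference_cohort: list of such dicts from reference/control subjects.

    Returns miRNA -> one-sided p-value for each panel marker whose log2
    abundance is significantly above the reference (assumed z-test).
    """
    flagged = {}
    for m in PANEL:
        if m not in patient:
            continue
        ref = [log2(s[m] + pseudocount) for s in reference_cohort if m in s]
        if len(ref) < 2:
            continue  # not enough reference values to estimate spread
        mu, sd = mean(ref), stdev(ref)
        if sd == 0:
            continue  # z-score undefined for a constant reference
        z = (log2(patient[m] + pseudocount) - mu) / sd
        p = 1 - NormalDist().cdf(z)  # upper tail: higher than reference
        if p < alpha:
            flagged[m] = p
    return flagged
```

Treating any flagged marker as a higher likelihood of disease is a permissive rule; with seven markers tested, a multiple-testing correction or a multivariate score would be a reasonable refinement.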

The methods described herein are useful for prognosing the outcome of a subject that has, or has had, a cancer associated with the prostate. The cancer may be prostate cancer or a cancer that has metastasized from a cancer of the prostate.

The term "subject" as used herein refers to any member of the animal kingdom, preferably a human being and most preferably a human being that has, has had, or is suspected of having prostate cancer.

The term "biological fluid sample" as used herein refers to any fluid from a subject that can be assayed for the biomarkers described herein. Biological fluids generally include amniotic fluid, aqueous humour and vitreous humour, bile, blood, blood serum, breast milk, cerebrospinal fluid, cerumen (earwax), chyle, chyme, endolymph and perilymph, exudates, feces, female ejaculate, gastric acid, gastric juice, lymph, mucus (including nasal drainage and phlegm), pericardial fluid, peritoneal fluid, pleural fluid, pus, rheum, saliva, sebum (skin oil), serous fluid, semen, smegma, sputum, synovial fluid, sweat, tears, urine, vaginal secretion, and vomit.

The term "prognosis" as used herein refers to the prediction of a clinical outcome associated with a disease subtype, which is reflected by a reference profile such as a biomarker reference profile. The prognosis provides an indication of disease progression and includes an indication of the likelihood of death due to cancer. The prognosis may be a prediction of metastasis or, alternatively, of disease recurrence. In one embodiment, the clinical outcome class includes a better survival group and a worse survival group.

The term "prognosing or classifying" as used herein means predicting or identifying the clinical outcome of a subject according to the subject's similarity to a reference profile or biomarker associated with the prognosis. For example, prognosing or classifying comprises a method or process of determining whether an individual has a better or worse survival outcome, or grouping individuals into a better survival group or a worse survival group, or predicting whether or not an individual will respond to therapy.
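Prognosing or classifying is defined above by a subject's similarity to a reference profile. One simple way to realize this, shown here as a hypothetical sketch rather than the disclosed model, is a nearest-centroid rule: correlate the patient's profile with a reference profile for each outcome class (for example a better survival group and a worse survival group) and assign the class with the highest Pearson correlation. The class labels and function names are illustrative only.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant profile")
    return sxy / (sxx * syy) ** 0.5


def classify_by_similarity(patient, reference_profiles):
    """patient: miRNA -> abundance.
    reference_profiles: class label -> (miRNA -> abundance).
    Returns the best-matching label and the similarity score for each class."""
    scores = {}
    for label, ref in reference_profiles.items():
        shared = sorted(patient.keys() & ref.keys())
        scores[label] = pearson([patient[m] for m in shared],
                                [ref[m] for m in shared])
    return max(scores, key=scores.get), scores
```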

As used herein, the term "control" refers to a specific value or dataset that can be used to prognose or classify the value (e.g., the measured biomarker or reference biomarker profile) obtained from the test sample associated with an outcome. In one embodiment, a dataset may be obtained from samples from a group of subjects known to have cancer with different tumor states and/or from healthy individuals. The state or expression data of the biomarkers in the dataset can be used to create a control value that is used in testing samples from new patients. In some embodiments, a cohort of subjects is used to obtain a control dataset. A control cohort may be a group of individuals with or without cancer.

In some embodiments, the at least one miRNA biomarker is at least 2, 3, 4, 5, 6 or 7 patient biomarkers. Preferably, the at least one miRNA biomarker is all of hsa-miR-3195, hsa-let-7b-5p, hsa-miR-144-3p, hsa-miR-451a, hsa-miR-148a-3p, hsa-miR-512-5p, and hsa-miR-431-5p.

In some embodiments, the method further comprises building a subject biomarker profile from the determined or measured patient biomarkers.

In some embodiments, the method further comprises building a patient biomarker profile from the determined or measured patient biomarkers.

The term "biomarker profile" as used herein refers to a dataset representing the state or expression level(s) of one or more biomarkers. A biomarker profile may represent one subject, or alternatively a consolidated dataset of a cohort of subjects, for example to establish a reference biomarker profile as a control.
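A biomarker profile may describe one subject or a consolidated cohort. A minimal data structure for that, assuming a dict of miRNA abundances and per-miRNA median consolidation (the consolidation statistic is not specified; a mean would also fit the definition), could look like this. `BiomarkerProfile`, its `source` field and `consolidate` are hypothetical names.

```python
from dataclasses import dataclass, field
from statistics import median
from typing import Dict, List


@dataclass
class BiomarkerProfile:
    """State or expression levels of one or more biomarkers.

    `source` names a subject or a cohort (hypothetical field).
    """
    source: str
    levels: Dict[str, float] = field(default_factory=dict)


def consolidate(profiles: List[BiomarkerProfile],
                name: str = "reference") -> BiomarkerProfile:
    """Collapse a cohort of subject profiles into one reference profile,
    using the per-miRNA median over the subjects that report that miRNA."""
    mirnas = sorted({m for p in profiles for m in p.levels})
    levels = {m: median(p.levels[m] for p in profiles if m in p.levels)
              for m in mirnas}
    return BiomarkerProfile(source=name, levels=levels)
```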

In some embodiments, the prediction of disease progression is following at least one of active surveillance, surgery, endocrine therapy, chemotherapy, radiotherapy, hormone therapy, gene therapy, thermal therapy, and ultrasound therapy.

In some embodiments, the method further comprises classifying the patient into a high risk group if the likelihood of disease progression is relatively high or a low risk group if the likelihood of disease progression is relatively low. Preferably, the method further comprises treating the patient with more aggressive therapy if the patient is in the high risk group. Preferably, the more aggressive therapy comprises adjuvant therapy, preferably hormone therapy, chemotherapy or radiotherapy.
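Classifying into high- and low-risk groups needs a cutoff on the likelihood score, and the description does not state how it is chosen. As one hypothetical option, the sketch below picks the cutoff that maximizes Youden's J (sensitivity + specificity - 1) on labelled training data and then assigns a new patient's score to a group. Both function names are illustrative.

```python
def youden_cutoff(scores, labels):
    """Return the score threshold maximizing sensitivity + specificity - 1
    on training data (label 1 = high risk, 0 = low risk; both required)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_j, best_cut = -1.0, None
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cut and y == 0)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_j, best_cut = j, cut
    return best_cut


def risk_group(score, cutoff):
    """High risk if the likelihood score reaches the cutoff, else low risk."""
    return "high risk" if score >= cutoff else "low risk"
```

A high-risk assignment would then be the trigger for considering more aggressive therapy, as described above.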

As used herein, "overall survival" refers to the percentage of people in a study or treatment group who are still alive, or the length of time they remain alive, measured from either the date of diagnosis or the start of treatment for a disease, such as cancer. In a clinical trial, measuring overall survival is one way to see how well a new treatment works.

As used herein, "relapse-free survival" refers to, in the case of cancer, the percentage of or length of time that people in a study or treatment group survive without any signs or symptoms of that cancer after primary treatment for that cancer. In a clinical trial, measuring relapse-free survival is one way to see how well a new treatment works. Relapse is defined as any disease recurrence (local, regional, or distant).

The term "good survival" or "better survival" as used herein refers to an increased chance of survival as compared to patients in the "poor survival" group. For example, the biomarkers of the application can prognose or classify patients into a "good survival group". These patients are at a lower risk of death after surgery and can also be categorized into a "low-risk group".

The term "poor survival" or "worse survival" as used herein refers to an increased risk of disease progression or death as compared to patients in the "good survival" group. For example, biomarkers or genes of the application can prognose or classify patients into a "poor survival group". These patients are at greater risk of death or adverse reaction from disease or surgery, treatment for the disease or other causes, and can also be categorized into a "high-risk group".

The present system and method may be practiced in various embodiments. A suitably configured computer device, and associated communications networks, devices, software and firmware may provide a platform for enabling one or more embodiments as described above. By way of example, FIG. 17 shows a generic computer device 100 that may include a central processing unit ("CPU") 102 connected to a storage unit 104 and to a random access memory 106. The CPU 102 may process an operating system 101, application program 103, and data 123. The operating system 101, application program 103, and data 123 may be stored in storage unit 104 and loaded into memory 106, as may be required. Computer device 100 may further include a graphics processing unit (GPU) 122 which is operatively connected to CPU 102 and to memory 106 to offload intensive image processing calculations from CPU 102 and run these calculations in parallel with CPU 102.
An operator 107 may interact with the computer device 100 using a video display 108 connected by a video interface 105, and various input/output devices such as a keyboard 115, mouse 112, and disk drive or solid state drive 114 connected by an I/O interface 109. In a known manner, the mouse 112 may be configured to control movement of a cursor in the video display 108, and to operate various graphical user interface (GUI) controls appearing in the video display 108 with a mouse button. The disk drive or solid state drive 114 may be configured to accept computer readable media 116. The computer device 100 may form part of a network via a network interface 111, allowing the computer device 100 to communicate with other suitably configured data processing systems (not shown). One or more different types of sensors 135 may be used to receive input from various sources.

The present system and method may be practiced on virtually any manner of computer device, including a desktop computer, laptop computer, tablet computer or wireless handheld. The present system and method may also be implemented as a computer-readable/useable medium that includes computer program code to enable one or more computer devices to implement each of the various process steps in a method in accordance with the present invention. Where more than one computer device performs the entire operation, the computer devices are networked to distribute the various steps of the operation. It is understood that the terms computer-readable medium or computer useable medium comprise one or more of any type of physical embodiment of the program code. In particular, the computer-readable/useable medium can comprise program code embodied on one or more portable storage articles of manufacture (e.g. an optical disc, a magnetic disk, a tape, etc.), or on one or more data storage portions of a computing device, such as memory associated with a computer and/or a storage system.

In an aspect, there is provided a computer-implemented method of determining the likelihood of disease or disease progression in a patient with respect to prostate cancer, the method comprising: a) receiving, at at least one processor, data reflecting the abundance of at least one miRNA biomarker in a subject bodily fluid, preferably urine, selected from the group consisting of: hsa-miR-3195, hsa-let-7b-5p, hsa-miR-144-3p, hsa-miR-451a, hsa-miR-148a-3p, hsa-miR-512-5p, and hsa-miR-431-5p; b) constructing, at the at least one processor, an expression profile corresponding to the abundance; c) comparing, at the at least one processor, said subject abundance to a corresponding reference or control abundance; d) determining, at the at least one processor, the likelihood of disease progression; wherein a likelihood of disease or disease progression is higher when there is statistically significant higher abundance in the sample in comparison with the reference or control abundance.
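The comparing and determining steps above can be illustrated with a minimal Python sketch. The claims do not name a statistical test, so this sketch assumes one: each biomarker's log2 normalized abundance in the subject is compared with the mean and standard deviation of a reference (control) cohort using a one-sided z-test. The function names, the input layout and the `alpha` threshold are hypothetical.

```python
from statistics import NormalDist, mean, stdev

# The seven biomarkers listed in the claims.
BIOMARKERS = (
    "hsa-miR-3195", "hsa-let-7b-5p", "hsa-miR-144-3p", "hsa-miR-451a",
    "hsa-miR-148a-3p", "hsa-miR-512-5p", "hsa-miR-431-5p",
)


def elevated_biomarkers(subject, reference, alpha=0.05):
    """Return {miRNA: one-sided p-value} for biomarkers significantly above reference.

    subject:   dict miRNA -> log2 normalized abundance for one patient (assumed layout)
    reference: dict miRNA -> list of log2 abundances from a control cohort (assumed layout)
    The one-sided z-test is an illustrative assumption, not the test used in the source.
    """
    flagged = {}
    for mirna in BIOMARKERS:
        ref = reference.get(mirna, [])
        if mirna not in subject or len(ref) < 2:
            continue  # nothing to compare against
        sd = stdev(ref)
        if sd == 0:
            continue  # a constant reference gives no scale for a z-score
        z = (subject[mirna] - mean(ref)) / sd
        p = 1.0 - NormalDist().cdf(z)  # upper tail: higher abundance than reference
        if p < alpha:
            flagged[mirna] = p
    return flagged


def higher_likelihood(subject, reference, alpha=0.05):
    """True when at least one biomarker is significantly elevated (illustrative rule)."""
    return bool(elevated_biomarkers(subject, reference, alpha))
```

Under this sketch, a subject whose hsa-miR-3195 abundance lies three reference standard deviations above the reference mean (one-sided p ≈ 0.0013) would be flagged, whereas a subject at the reference mean (p = 0.5) would not.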

In an aspect, there is provided a computer program product for use in conjunction with a general-purpose computer having a processor and a memory connected to the processor, the computer program product comprising a computer readable storage medium having a computer program mechanism encoded thereon, wherein the computer program mechanism may be loaded into the memory of the computer and cause the computer to carry out the method described herein. In an aspect, there is provided a computer readable medium having stored thereon a data structure for storing the computer program product described herein.

In an aspect, there is provided a device for determining the likelihood of disease or disease progression in a patient with respect to prostate cancer, the device comprising: at least one processor; and electronic memory in communication with the at least one processor, the electronic memory storing processor-executable code that, when executed at the at least one processor, causes the at least one processor to: a) receive data reflecting abundance of at least one miRNA biomarker in a patient bodily fluid, preferably urine, selected from the group consisting of: hsa-miR-3195, hsa-let-7b-5p, hsa-miR-144-3p, hsa-miR-451a, hsa-miR-148a-3p, hsa-miR-512-5p, and hsa-miR-431-5p; b) compare said patient biomarkers to corresponding reference or control biomarkers; and c) determine the likelihood of disease progression; wherein a likelihood of disease or disease progression is higher when there is statistically significant higher abundance in the sample in comparison with the reference or control abundance.

A person skilled in the art would understand how to implement differing cut-offs for good survival vs. worse survival, depending on the clinical outcome one is predicting and the biomarkers being assayed.

As used herein, "processor" may be any type of processor, such as, for example, any type of general-purpose microprocessor or microcontroller (e.g., an Intel™ x86, PowerPC™, ARM™ processor, or the like), a digital signal processing (DSP) processor, an integrated circuit, a field programmable gate array (FPGA), or any combination thereof. As used herein, "memory" may include a suitable combination of any type of computer memory that is located either internally or externally such as, for example, random-access memory (RAM), read-only memory (ROM), compact disc read-only memory (CDROM), electro-optical memory, magneto-optical memory, erasable programmable read-only memory (EPROM), and electrically-erasable programmable read-only memory (EEPROM), or the like. Portions of memory 106 may be organized using a conventional filesystem, controlled and administered by an operating system governing overall operation of a device.

As used herein, "computer readable storage medium" (also referred to as a machine-readable medium, a processor-readable medium, or a computer usable medium having a computer-readable program code embodied therein) is a medium capable of storing data in a format readable by a computer or machine. The machine-readable medium can be any suitable tangible, non-transitory medium, including magnetic, optical, or electrical storage medium including a diskette, compact disk read only memory (CD-ROM), memory device (volatile or non-volatile), or similar storage mechanism. The computer readable storage medium can contain various sets of instructions, code sequences, configuration information, or other data, which, when executed, cause a processor to perform steps in a method according to an embodiment of the disclosure. Those of ordinary skill in the art will appreciate that other instructions and operations necessary to implement the described implementations can also be stored on the computer readable storage medium. The instructions stored on the computer readable storage medium can be executed by a processor or other suitable processing device, and can interface with circuitry to perform the described tasks. As used herein, "data structure" refers to a particular way of organizing data in a computer so that it can be used efficiently. Data structures can implement one or more particular abstract data types (ADT), which specify the operations that can be performed on a data structure and the computational complexity of those operations. In comparison, a data structure is a concrete implementation of the specification provided by an ADT.

The above listed aspects and/or embodiments may be combined in various combinations as appreciated by a person of skill in the art. The advantages of the present disclosure are further illustrated by the following examples. The examples and their particular details set forth herein are presented for illustration only and should not be construed as a limitation on the claims of the present invention.

EXAMPLES

Methods and Materials

Urine sample procurement - Discovery cohort

To measure intra- and inter-individual variability of miRNA abundance, 10 prostate cancer (PCa) patients with Gleason score (GS) 6 were recruited prospectively from the Active Surveillance program of the Odette Cancer Centre at Sunnybrook Health Sciences Centre (Toronto, Canada). Urine samples were obtained during scheduled surveillance check-ups (mean interval between check-ups: 291 days). Following a digital rectal exam (DRE) performed by the attending oncologist, first-catch urine samples (20-70 mL) were collected in vials containing 25 mM EDTA. Within 2-5 hours of collection, samples were centrifuged at 1800 × g for 10 minutes and washed with 5 mL of cold 1× PBS to obtain the urinary cell sediments. Sediments were re-suspended in 200 μL of 1× PBS and frozen at -80 °C. Two to three consecutive DRE-urine samples were collected from each PCa patient. In total, 22 DRE-urine samples were collected from 10 PCa patients (data not shown).

miRNA extraction and profiling

Small RNA molecules (200 nucleotides or less, including miRNAs) were isolated from urinary cell sediments using the Urine miRNA Purification kit (Norgen Biotek Corp., Thorold, Ontario, Canada, Catalogue # 29000) according to the manufacturer's protocol. Following isolation, RNA was purified by ammonium acetate-ethanol precipitation: 25 μL of 7.5 mM ammonium acetate and 125 μL of cold 100% ethanol were added to the isolated RNA samples (50 μL), which were left at -80 °C overnight. Then 1 mL of cold 80% ethanol was added and samples were centrifuged at 18,000 × g for 30 minutes at 4 °C. The RNA pellet was washed twice with 0.5 mL of cold 80% ethanol and centrifuged at 18,000 × g for 10 minutes at 4 °C. Ethanol was removed and pellets were allowed to dry at room temperature. Dried pellets were re-suspended in 22 μL of nuclease-free water.

Quality and quantity of RNA samples were evaluated using a NanoDrop 8000 Spectrophotometer (Thermo Scientific, Wilmington, DE, USA). miRNA profiling was performed using the nCounter® Human v.2 miRNA Expression Assay (NanoString Technologies, Seattle, WA, USA). Up to 100 ng per sample was used for profiling. Two batches were each loaded with 11 samples and a mixture of four PCa cell lines (DU-145, PC-3, C42, and LNCaP). Cell line RNA served as an internal control for normalization of sample miRNA expression between batches. Raw data are deposited in the Gene Expression Omnibus (GEO; GSE86474, http://www.ncbi.nlm.nih.gov/geo/).

Measuring miRNA Abundance and Normalization

miRNA abundance was normalized using the R package NanoStringNorm (v1.1.20) 1. The package examines the signal intensities of housekeeping genes, positive controls and negative controls under different optimization options (such as SampleContent, CodeCount, and Background) and normalizes abundance values so that housekeeping genes and positive controls show high abundance while negative controls show low abundance. We used seven metrics for SampleContent, three metrics for CodeCount, four metrics for Background and three other optimization options to correct the scale of abundance values; in total, 252 (7 × 3 × 4 × 3) parameter combinations were tested. To find the best parameter combination, we adopted a new strategy using a control sample. The control sample is composed of miRNAs extracted from a mixture of four prostate cancer cell lines (DU-145, PC-3, C42, and LNCaP), mixed in equal proportions and loaded into each batch. We measured the similarity of abundance profiles between the control samples in different batches. Additionally, we counted the fraction of misinterpreted samples, i.e., samples erroneously normalized under a given parameter setting. Finally, we identified the best parameter combination as the one that minimizes both the variability of control samples across the two batches and the number of misinterpreted samples.
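The size of the search space follows directly from the option counts: 7 × 3 × 4 × 3 = 252 combinations. The Python sketch below enumerates such a grid and picks a winner. The option labels are hypothetical placeholders (not NanoStringNorm's own option names), `evaluate` is a hypothetical callback standing in for running the normalization and scoring the control samples, and ranking first by fewest misinterpreted samples and then by highest control-sample Spearman's ρ is an assumed way of "minimizing both" criteria.

```python
from itertools import product

# Option counts taken from the text; the labels are hypothetical placeholders.
SAMPLE_CONTENT = [f"sc{i}" for i in range(7)]
CODE_COUNT = [f"cc{i}" for i in range(3)]
BACKGROUND = [f"bg{i}" for i in range(4)]
OTHER = [f"ot{i}" for i in range(3)]


def parameter_grid():
    """Return every combination of the four option families as a dict."""
    return [
        {"sample_content": s, "code_count": c, "background": b, "other": o}
        for s, c, b, o in product(SAMPLE_CONTENT, CODE_COUNT, BACKGROUND, OTHER)
    ]


def select_best(grid, evaluate):
    """Pick the combination with the fewest misinterpreted samples, breaking ties
    by the highest between-batch Spearman's rho of the control samples.

    evaluate(params) -> (control_rho, misinterpreted_fraction) is a hypothetical
    callback that would run the normalization and score it.
    """
    def score(params):
        rho, misinterpreted = evaluate(params)
        return (misinterpreted, -rho)  # smaller is better on both components

    return min(grid, key=score)
```

A lexicographic ranking keeps the two criteria separate; a weighted sum of the two would be an equally plausible reading of the text.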
Similarity between control samples was measured using Spearman's correlation coefficient (ρ). A sample was considered misinterpreted when its positive controls (ratio of the average abundance of positive controls to the abundance of each positive control < 0.3 or > 3), negative controls (Z-score > 5) or housekeeping genes (Z-score > 5) were not expressed or showed high abundance variation, and when the fraction of detected miRNAs (read count after normalization > 0) in the sample was less than 10% of total endogenous miRNAs. In the selected setting, data were normalized using the geometric mean of the housekeeping genes and background-corrected by subtracting the mean of the negative controls. Normalized miRNA abundances were log2-transformed.
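The selected normalization (background subtraction by the mean of the negative controls, scaling by the geometric mean of the housekeeping genes, log2 transformation) and the 10% detection criterion can be sketched in Python as follows. This is not NanoStringNorm's implementation: the order of the two corrections, the use of the across-sample mean of housekeeping geometric means as the common scale, flooring negative values at zero and the pseudocount of 1 before the log2 transform are all assumptions, and the probe names in the example are hypothetical.

```python
import math
from statistics import geometric_mean, mean


def normalize(samples, housekeeping, negatives):
    """samples: dict sample_id -> dict probe -> raw count (assumed layout).

    Returns endogenous probes only, background-subtracted and scaled so that every
    sample's housekeeping geometric mean matches the across-sample average.
    """
    hk_gm = {s: geometric_mean([counts[h] for h in housekeeping]) for s, counts in samples.items()}
    target = mean(hk_gm.values())  # common scale (assumption)
    normalized = {}
    for s, counts in samples.items():
        background = mean(counts[n] for n in negatives)
        factor = target / hk_gm[s]
        normalized[s] = {
            probe: max(value - background, 0.0) * factor  # floor at zero (assumption)
            for probe, value in counts.items()
            if probe not in housekeeping and probe not in negatives
        }
    return normalized


def log2_transform(profile, pseudocount=1.0):
    """log2 of normalized counts; the pseudocount avoids log2(0) (assumption)."""
    return {p: math.log2(v + pseudocount) for p, v in profile.items()}


def detected_fraction(profile):
    """Fraction of endogenous miRNAs with a normalized count > 0."""
    return sum(v > 0 for v in profile.values()) / len(profile)
```

A sample would fail the detection criterion when `detected_fraction(profile) < 0.10`.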

Measuring similarity of miRNA profiles

To examine global similarity of miRNA profiles, we created a sample-by-sample correlation matrix using Spearman's correlation coefficient (ρ). To measure the similarity in abundance for individual miRNAs, we calculated the coefficient of variation (CV) of each miRNA as:

$$\mathrm{CV} = \frac{\sigma}{\mu} \qquad (1)$$

where σ and μ represent the standard deviation and mean of miRNA abundance, respectively.
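Both similarity measures are straightforward to compute. The Python sketch below implements Spearman's ρ as the Pearson correlation of average (tie-corrected) ranks, builds a sample-by-sample matrix from it, and computes the CV as σ/μ. It uses the sample standard deviation, which is an assumption since the text does not say which estimator was used, and it assumes every profile lists the same miRNAs in the same order. Constant inputs would cause a division by zero and are not handled.

```python
from statistics import mean, stdev


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def correlation_matrix(profiles):
    """profiles: dict sample -> list of abundances over the same miRNAs."""
    names = list(profiles)
    return {(a, b): spearman(profiles[a], profiles[b]) for a in names for b in names}


def coefficient_of_variation(values):
    """CV = sigma / mu, using the sample standard deviation (assumption)."""
    return stdev(values) / mean(values)
```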

Quantify intra- and inter-individual variance of miRNA abundance

The relative effects of intra- and inter-individual miRNA variance were assessed via linear mixed-effects regression using the lme4 package (v1.1-10) 2 in the R statistical environment. In the model, subjects were specified as a random factor to control for their associated intra-class correlation:

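$$Y_{ij} = \mu + A_j + e_{ij}$$

where $Y_{ij}$ is the normalized abundance of the $i$-th replicate (sample) in the $j$-th individual (patient), $\mu$ is the mean abundance of a given miRNA, and the individual ($A_j$) and replicate ($e_{ij}$) effects are assumed to be random with variances $\sigma_A^2$ and $\sigma_e^2$, respectively. To measure the intra- and inter-individual variances, we calculated the intra-class correlation coefficient (ICC):

$$\mathrm{ICC} = \frac{\sigma_A^2}{\sigma_A^2 + \sigma_e^2}$$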

The ICC represents the proportion of inter-individual variance relative to the total intra- and inter-individual variance explained by the model. A high ICC indicates a high level of inter-individual variability relative to intra-individual variability.
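The study estimated the variance components with a linear mixed-effects model in lme4. As a stand-in that needs no modelling library, the Python sketch below uses a different but classical estimator: the one-way random-effects ANOVA (method-of-moments) estimator, which handles unequal numbers of replicates per individual. For balanced designs it coincides with REML when the between-individual estimate is non-negative; for unbalanced designs the two can differ. Negative between-individual variance estimates are truncated at zero here.

```python
from statistics import mean


def icc_oneway(groups):
    """ICC = var_between / (var_between + var_within) from one-way random-effects ANOVA.

    groups: list of lists; each inner list holds the replicate abundances of one
    individual for a single miRNA (at least two individuals, more samples than individuals).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    group_means = [mean(g) for g in groups]
    grand_mean = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    # Effective number of replicates per individual for unbalanced designs
    n0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (k - 1)
    var_between = max((ms_between - ms_within) / n0, 0.0)  # truncate negatives at zero
    var_within = ms_within
    total = var_between + var_within
    return var_between / total if total > 0 else float("nan")
```

An individual whose replicates agree perfectly while individuals differ gives an ICC of 1 (intra-stable); identical spread within every individual with no difference between them gives 0 (intra-variable).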

Biological properties of intra-stable and intra-variable miRNAs

To examine the chromosomal locations of miRNAs, miRBase (v21) 3 was used. This database compiles all miRNA sequences and their annotations. For the identification of miRNA target genes, miRTarBase (v6.0) 4 was used. This database collects experimentally validated miRNA-target interactions. In total, 5,506 miRNA-target gene interactions with strong experimental evidence (e.g., reporter assay and western blot) are deposited in the database. These interactions comprise 515 human miRNAs and 2,180 target genes. We identified 3,462 interactions between 199 studied miRNAs and 1,669 potential target genes (data not shown). To analyze the biological functions of target genes, we used g:Profiler (v0.6.1) to classify target genes into gene ontology (GO) categories and to calculate the statistical significance of over-representation of a given biological function 5. GO terms enriched in our dataset (FDR < 0.25) are available but not shown.
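g:Profiler's internal procedure is not reproduced here. As an illustration of what an over-representation p-value measures, this Python sketch computes the upper tail of the hypergeometric distribution (equivalent to a one-sided Fisher's exact test): the probability of seeing at least `k` genes annotated to a GO term among `n` target genes, when `K` of `N` background genes carry that annotation. The parameter names are hypothetical, and the FDR step across GO terms is not shown.

```python
from math import comb


def overrepresentation_p(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N background genes, K annotated, n drawn)."""
    upper = min(n, K)
    hits = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return hits / comb(N, n)
```

For example, drawing 2 targets from 10 background genes of which 3 carry the annotation, both targets are annotated with probability 3/45 = 1/15.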

Generating a predictive model to discriminate PCa patient risk groups

To generate a predictive model that can distinguish high-risk from low-risk PCa patients, we first defined an independent training cohort that was not used in the analysis of intra- and inter-individual variance of miRNA abundance. The training cohort (n = 99) is composed of 50 DRE-urine samples from high-risk PCa patients (positive set) and 49 DRE-urine samples from low-risk PCa patients (negative set). PCa can be grouped based on the risk of recurrence. High-risk PCa commonly refers to the most aggressive tumours, whereas low-risk PCa is unlikely to grow or spread for a few years 6. High-risk PCa was defined as biopsy Gleason score (GS) > 7, and low-risk PCa as biopsy GS = 6. Urinary miRNAs were prepared as described above. Nine batches were loaded with the 99 samples and a mixture of four PCa cell lines (DU-145, PC-3, C42, and LNCaP). miRNA abundance was normalized as described above.

To confirm that the predictive model has similar prognostic value in a different population, an external validation set (18 high-risk and 29 low-risk samples) was used. Urinary miRNAs were prepared as described above and loaded into four batches without a control sample. To reduce batch effects and obtain optimal miRNA profiles, the adjusted Rand index (ARI) was measured after K-means clustering (number of clusters = 4), and the NanoStringNorm parameter combination that minimized the ARI between batches was selected.
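A low ARI between cluster assignments and batch labels indicates that samples do not cluster by batch. The Python sketch below computes the ARI from the pair-counting contingency table; comparing K-means cluster labels against batch labels is our reading of the text, and the K-means step itself is not shown (any implementation that returns one label per sample would do).

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same samples."""
    n = len(labels_a)
    if n < 2 or n != len(labels_b):
        raise ValueError("need two labelings of the same >= 2 samples")
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)  # agreement expected by chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both labelings trivial
    return (sum_cells - expected) / (max_index - expected)
```

Identical partitions score 1, chance-level agreement scores about 0, and the score can be negative, so "minimizing" favours parameter settings in which clusters ignore batch membership.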

A random forest was used to discriminate PCa patient risk groups. Random forest is a widely used machine learning algorithm with excellent performance in many applications in cancer biomarker identification 7. First, we used an independent training cohort (n = 99) to identify miRNA biomarker signatures and generate a predictive model. Relevant intra-stable miRNAs, taken from the Q100 quartile, were selected as features for the predictive model. For feature selection and random forest parameter optimization, we performed 10-times repeated 5-fold cross-validation (50 resamplings). During each resampling, we measured the mean decrease in accuracy as a metric of the importance of each miRNA for discriminating the two risk groups (Fig. 14a), using the R package 'randomForest' (v4.6-12). We then selected the top-ranked miRNAs as features to build a predictive model. The resulting area under the receiver operating characteristic curve (AUC) was calculated and used as a performance measure. The set of top-ranked miRNAs that showed the highest AUC was chosen as the most relevant features (Fig. 14b). Random forest parameters (mtry and ntree) were optimized by grid search using 10-times repeated 5-fold cross-validation. As a result, we selected the top seven miRNAs (hsa-miR-3195, hsa-let-7b-5p, hsa-miR-144-3p, hsa-miR-451a, hsa-miR-148a-3p, hsa-miR-512-5p, hsa-miR-431-5p) as the most relevant features. All machine-learning approaches were performed using randomForestSRC (v2.4.2) for the R statistical environment 8.

miRNA abundance in tissue samples

miRNA expression data for prostate adenocarcinoma were downloaded from the TCGA data portal (https://portal.gdc.cancer.gov). A total of 480 tumour samples with sample code '01' and vial code 'A' were found at the time the data were downloaded. Normalized quantification expression levels for these samples were further examined for each investigated miRNA.
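The resampling scheme (10 repeats of 5-fold cross-validation, 50 train/test splits in total) can be sketched in Python as below. Stratifying folds by risk group and dealing shuffled samples round-robin into folds are assumptions, not details from the text; model fitting and the mean-decrease-in-accuracy ranking are not shown.

```python
import random


def repeated_stratified_kfold(labels, k=5, repeats=10, seed=0):
    """Yield (train_indices, test_indices) for `repeats` x `k` resamplings.

    Within each repeat, samples of each class are shuffled and dealt round-robin
    into k folds, so every sample is held out exactly once per repeat.
    """
    rng = random.Random(seed)
    n = len(labels)
    for _ in range(repeats):
        folds = [[] for _ in range(k)]
        position = 0  # keep dealing across classes so fold sizes stay balanced
        for cls in sorted(set(labels)):
            members = [i for i, y in enumerate(labels) if y == cls]
            rng.shuffle(members)
            for i in members:
                folds[position % k].append(i)
                position += 1
        for fold in folds:
            held_out = set(fold)
            train = [i for i in range(n) if i not in held_out]
            yield train, sorted(fold)
```

With the training cohort's 50 high-risk and 49 low-risk labels, this yields 50 splits whose test folds each contain 19 or 20 samples from both risk groups.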

Statistical Analysis

All analyses were carried out in the R programming language (v3.4.0) with the aforementioned packages. In general, unpaired Student's t-tests with Welch's adjustment for heteroscedasticity were used to examine statistical significance for two-group comparisons. To evaluate the chromosomal enrichment of intra-stable and intra-variable miRNAs, a bootstrap test with 10,000 iterations was used. To examine functional enrichment of variant and non-variant miRNAs, Benjamini-Hochberg adjusted p-values were used to account for multiple testing. Hierarchical clustering analysis was performed with the ConsensusClusterPlus package (v1.40.0) in the R statistical environment 9. Data visualization employed the BPG package (v5.7.1) for the R statistical environment (P'ng et al. in submission).
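Two of the procedures named above are short enough to show directly. The Python sketch below computes the Welch t statistic with its Welch-Satterthwaite degrees of freedom, and the Benjamini-Hochberg step-up adjustment. It mirrors the standard definitions rather than the R functions actually used; the p-value for the Welch test needs a t-distribution CDF, which in Python is available for example through `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def benjamini_hochberg(pvalues):
    """BH-adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min  # enforces monotonicity of adjusted values
    return adjusted
```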

Results/Discussion

miRNA Landscape in Urine of Prostate Cancer Patients

miRNAs are known to be both stable and present in urine 14,15. However, to form robust markers, it is critical that analyte profiles remain stable over time and distinctive from individual to individual. We employed NanoString nCounter technology to profile the abundances of 673 human miRNAs in 22 serial DRE-urine samples from 10 PCa patients (discovery cohort) with localized disease undergoing active surveillance (i.e., repeated monitoring for disease progression without active administration of therapy; Fig. 1a). In the discovery cohort, all patients were clinically comparable, with the same Gleason score (3+3) and clinical T-category (T1c; data not shown). The median time between urine collections was 245 days. We systematically assessed a panel of 252 pre-processing strategies based on similarity across batches (Spearman's ρ) and the rate of failed normalizations (Fig. 1b). The optimal pre-processing methodology minimized control-sample variance (Spearman's ρ = 0.73; Fig. 5a) and yielded similar distributions of miRNA abundance in all samples (Fig. 5b).

Overall miRNA Abundance Profile

A subset of miRNAs was detected in urine: 481 were detected in at least one sample, 358 in at least three samples, and 25 in all 22 samples (normalized read count > 0; Fig. 5c). miRNAs detected in urine overlap strongly with those expressed in PCa tissue 16: 88.98% (428/481) of miRNAs detected in at least one urine sample were also detected in at least one PCa tumour tissue sample (Fig. 6a,b). There are 53 miRNAs (7.88%, 53/673) that are observed only in urine, while 167 miRNAs are observed only in PCa tumour tissue (Fig. 6b). These tissue-specific miRNAs regulate fewer genes (average 5 genes) than known human miRNAs in miRTarBase, the most up-to-date miRNA-target interaction database [ref] (average 11 genes; bootstrap test with 10,000 iterations, p = 0.001; Fig. 6c). This suggests that tissue-specific miRNAs occupy a peripheral position in the miRNA-target gene network and have minimal functional impact [ref]. In addition, urine miRNA abundance was positively correlated with miRNA abundance in PCa tissues (Spearman's ρ = 0.23 for all tested miRNAs, Fig. 6d; ρ = 0.36 for miRNAs observed in both tissue and urine, Fig. 6e). We focused analyses of the longitudinal stability of urine miRNA profiles on the 298 miRNAs detected in at least five samples (Fig. 1c and Table 2). Urine miRNA abundance profiles from a single individual resemble one another more closely (ρ_intra = 0.67 ± 0.10) than they resemble profiles from different individuals (ρ_inter = 0.40 ± 0.15; p = 9.20 × 10^-9, Student's t-test; Fig. 2a). Further, the primary difference within individuals is the number of miRNAs detected, rather than abundance changes within the set detected (Fig. 7). This suggests differences in experimental sensitivity rather than biology. These results were confirmed using k-means clustering: samples from the same individual cluster together (Fig. 8a,b).
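The within-individual versus between-individual comparison can be derived from a sample-by-sample correlation matrix by labelling each off-diagonal pair according to whether both samples come from the same patient. The Python sketch below assumes a full, symmetric matrix stored as a dict keyed by sample pairs (a hypothetical layout); the two resulting lists would then be summarized as mean ± SD and compared with a t-test.

```python
from statistics import mean, stdev


def split_pairwise(corr, patient_of):
    """Split off-diagonal sample pairs into intra- and inter-individual rho lists.

    corr: dict (sample_a, sample_b) -> Spearman's rho (full, symmetric matrix)
    patient_of: dict sample -> patient identifier
    """
    intra, inter = [], []
    seen = set()
    for (a, b), rho in corr.items():
        pair = frozenset((a, b))
        if a == b or pair in seen:
            continue  # skip the diagonal and the mirrored half of the matrix
        seen.add(pair)
        (intra if patient_of[a] == patient_of[b] else inter).append(rho)
    return intra, inter


def summarize(values):
    """Mean and sample standard deviation, for 'mean ± SD' reporting."""
    return mean(values), stdev(values)
```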

To understand which miRNA species are most and least variable within individuals, we used variance analysis (Fig. 9a). Overall, 89.93% of miRNA species showed more variability between individuals than within individuals (Fig. 2b). To deconvolve the variance of each miRNA into intra- and inter-individual components, we performed linear mixed-effects modelling (LMM) and measured the intra-class correlation coefficient (ICC). The higher a miRNA's ICC, the more its variation arises across individuals rather than within them. Overall, 41 ± 28% of total variance occurs between individuals (Fig. 2c). This suggests that a subset of miRNAs shows unusual variability within individuals. These urinary miRNAs should be excluded from biomarker-discovery studies because of their inherent variability (Table 3). Interestingly, the amount of miRNA in a urine sample is weakly correlated with ICC (Spearman's ρ = 0.20, p = 6.16 × 10^-4; Fig. 9b), as is the number of samples in which each miRNA was detected (Spearman's ρ = 0.11, p = 4.97 × 10^-2; Fig. 9c).

The biological consequences of miRNA variability

We hypothesized that miRNAs that vary within individuals would have distinct biological roles. We divided miRNAs into four quartiles based on their ICC (Fig. 10a). The miRNAs most variable within individuals are in the first quartile (Q25, intra-variable), while those least variable within individuals are in the fourth quartile (Q100, intra-stable). Intra-stable miRNAs (Q100) showed a clear chromosomal bias (Fig. 3a and Fig. 10b), for example a large enrichment on chromosomes 6 and 17 (bootstrap test with 10,000 iterations, q < 0.1).
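The quartile split and the chromosomal enrichment test can be sketched in Python as follows. The bootstrap procedure is not described in detail, so this sketch assumes that the observed number of miRNAs from one quartile on a given chromosome is compared with the counts in random sets of the same size drawn with replacement from all studied miRNAs, giving a one-sided p-value with a +1 correction; q-values would then come from a multiple-testing correction across chromosomes (not shown). The equal-size, rank-based quartile split is also an assumption.

```python
import random


def icc_quartiles(icc):
    """Map each miRNA to Q25..Q100 by ICC rank; Q25 = lowest ICC (most intra-variable)."""
    ordered = sorted(icc, key=icc.get)
    labels = ("Q25", "Q50", "Q75", "Q100")
    n = len(ordered)
    return {mirna: labels[4 * rank // n] for rank, mirna in enumerate(ordered)}


def chromosome_enrichment_p(group, chromosome_of, chrom, iterations=10_000, seed=0):
    """One-sided bootstrap p-value that `group` holds more miRNAs on `chrom` than
    random sets of the same size drawn with replacement from all studied miRNAs."""
    rng = random.Random(seed)
    universe = list(chromosome_of)
    observed = sum(chromosome_of[m] == chrom for m in group)
    at_least_as_many = 0
    for _ in range(iterations):
        resample = rng.choices(universe, k=len(group))
        if sum(chromosome_of[m] == chrom for m in resample) >= observed:
            at_least_as_many += 1
    return (at_least_as_many + 1) / (iterations + 1)
```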

To determine whether miRNAs that showed intra-individual variability targeted a set of genes with specific functions, we collected experimentally validated miRNA target genes (data not shown). Intra-stable miRNAs (Q100) targeted more genes (average 24 target genes) than intra-variable miRNAs (Q25, average 11 target genes; p = 0.02, two-sided Student's t-test; Fig. 3b). This relationship held true even when only more reliable target genes were considered (Fig. 10c). About 68% of all targets (1,133 genes) were regulated in a miRNA variability-specific manner, and only 3.8% of targets (63 genes) were regulated independently of miRNA variability (Fig. 3c). We found that the target genes of each variability group played different roles in cellular functions (Fig. 3d). Targets of intra-variable miRNAs (Q25) are involved in the initiation or perpetuation of an immune response (q < 0.05). Targets of moderately intra-variable miRNAs (Q50) were involved in nucleotide metabolic processes and vesicular transport. Targets of moderately intra-stable miRNAs (Q75) were likely to be located in the extracellular matrix and to regulate cell morphogenesis, homeostatic processes and defense response. Finally, significantly intra-stable miRNAs (Q100) targeted genes that were likely to be located in the plasma membrane region and adherens junctions and to participate in the organization of extracellular structure and the actin cytoskeleton.

Furthermore, among the variability groups, intra-stable miRNAs (Q100) showed the strongest positive correlation between miRNA abundances in urine and PCa tissues (Spearman's ρ = 0.46, p = 3.17 × 10^-5; Fig. 11a). This correlation was significantly different from that of intra-variable miRNAs (Q25; p = 7.44 × 10^-5 for the comparison between the two correlation coefficients; Fig. 11a). Interestingly, the correlation between miRNA abundances in urine and tumour tissues increased as miRNAs became more intra-stable (Q100 > Q75 > Q50 > Q25; asymptotic general independence test, one-tailed, p = 0.04; Fig. 11b). Taken together, these results suggest that urine changes directly reflect tumour changes and that variability of miRNA abundances may occur in response to different genetic and biological stimuli.

Predictive model that distinguishes PCa patient risk groups
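The text does not state how the two correlation coefficients were compared. One common choice, assumed here, is Fisher's r-to-z transformation for two independent correlations, where each n would be the number of miRNAs in the corresponding quartile; applying it to Spearman's ρ rather than Pearson's r is an approximation. The function name is hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def compare_correlations(r1, n1, r2, n2):
    """Two-sided test of H0: rho1 == rho2 for two independent samples via Fisher's z.

    Returns (z, p). Requires n1, n2 > 3 and |r| < 1.
    """
    z1, z2 = atanh(r1), atanh(r2)  # variance-stabilizing transform
    se = sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p
```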

Given that some urine miRNAs show minimal intra-individual variability and regulate distinct functional processes, we hypothesized that urine miRNAs could yield non-invasive biomarkers of disease states. We first examined the abundances of urinary miRNAs in an independent training cohort of 50 high-risk (GS > 7) and 49 low-risk (GS = 6) samples (Fig. 12a). The 298 tested miRNAs showed similar abundances in the discovery cohort (n = 22) and the training cohort (n = 99; Spearman's ρ = 0.46, p = 9.52 × 10^-16; Fig. 12b). In the training cohort, more than half of the miRNAs (180/298, 60%) showed lower abundance in the high-risk group, and 12 miRNAs showed statistically significant differences in abundance between the two risk groups (p < 0.05; Fig. 12c). Of these, eight (67%) were stable within individuals (Q75 and Q100), implying that intra-stable miRNAs make better biomarkers. Indeed, miRNAs that better discriminate between risk groups are likely to be stable within individuals (Fig. 12d). Furthermore, in tumour tissue, we found similar variability-related changes in miRNA abundance between the two risk groups (Fig. 13a). About half of the significantly changed miRNAs in PCa tissues (abundance changed more than 1.5-fold with p < 0.001) were intra-stable miRNAs (Q100; Fisher's exact test, p = 0.05; Fig. 13b). These findings strongly suggest the clinical utility of a subset of intra-stable urine miRNAs as biomarkers.

To assess the potential clinical utility of urine miRNAs for predicting patient risk groups, we used a standard signature-generation strategy to create a multi-miRNA risk model (Fig. 4a). To select relevant miRNAs that are stable within individuals, we performed 10-times repeated 5-fold cross-validation using the training cohort and measured the importance of each miRNA for discriminating the two risk groups (Fig. 14a). We then selected the top-ranked miRNAs as features to build predictive models using random forest. The final model comprised seven intra-stable miRNAs (Table 1 and Fig. 14b). In cross-validation, the predictive model strongly distinguished the two groups, with a median AUC of 0.74 (95% confidence interval, 0.69 - 0.76; green bold line in Fig. 4b).

Table 1
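The AUC used as the performance measure equals the probability that a randomly chosen high-risk sample receives a higher model score than a randomly chosen low-risk sample, with ties counted as one half (the Mann-Whitney formulation). The Python sketch below computes it directly from two score lists and summarizes resamplings by their median; the model scores themselves would come from the random forest, which is not shown.

```python
from statistics import median


def auc(pos_scores, neg_scores):
    """ROC AUC as P(random positive scores higher than random negative); ties count 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def median_auc(per_resample_aucs):
    """Summarize cross-validation as the median AUC across resamplings."""
    return median(per_resample_aucs)
```

The pairwise loop is O(n_pos × n_neg), which is negligible at the cohort sizes described here.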

To confirm that these seven intra-stable miRNAs had similar prognostic value in a different population, we applied the predictive model to an external validation cohort (n = 47) composed of 18 high-risk and 29 low-risk samples (Fig. 4a and Fig. 15a). As observed in the training cohort, miRNAs showed similar abundance in the external validation and discovery cohorts (Spearman's ρ = 0.48, p = 2.34 × 10^-17; Fig. 15b), and intra-stable miRNAs tended to discriminate risk groups better (Fig. 15c). As shown in Fig. 4c, the intra-stable miRNA-based predictive model achieved an AUC of 0.71 (95% confidence interval, 0.55 - 0.88; purple line in Fig. 4c). By contrast, the best signature that could be generated from intra-variable miRNAs (Q25) had an AUC of 0.53 (95% confidence interval, 0.36 - 0.70; yellow line in Fig. 4c), highlighting the importance of temporal stability. To compare against a null distribution of biomarkers 17, we generated 10,000 random sets of seven miRNAs and measured their performance. These random models had a median AUC of 0.559 (95% confidence interval, 0.557 - 0.560; grey line in Fig. 4c), and our top miRNA model improved upon this null distribution (p = 0.009; Fig. 4d).
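The comparison against random seven-miRNA signatures can be sketched in Python as below. `evaluate` is a hypothetical callback that would train and validate a model on a given miRNA subset and return its AUC, and the one-sided empirical p-value with a +1 correction is an assumed formulation; the text does not state exactly how its p-value was computed.

```python
import random


def null_scores(features, evaluate, set_size=7, iterations=10_000, seed=0):
    """Score `iterations` random feature sets of size `set_size` (no repeats within a set).

    evaluate(feature_subset) -> AUC is a hypothetical callback.
    """
    rng = random.Random(seed)
    pool = list(features)
    return [evaluate(rng.sample(pool, set_size)) for _ in range(iterations)]


def empirical_p(observed, null):
    """One-sided empirical p-value: share of random sets scoring at least as well (+1)."""
    return (1 + sum(score >= observed for score in null)) / (1 + len(null))
```

The median of the null scores (`statistics.median`) corresponds to the "median AUC of random models" reported above.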

Summary

Alterations in miRNA abundances are associated with PCa progression 12, and these molecules can be detected in the urine of PCa patients. We show that the urinary miRNA profile of an individual remains stable over longitudinal samplings taken over a year apart. Nevertheless, intra-individual variability is observed, and it likely results at least in part from differences in the number of miRNAs detected, as well as from diet and other epidemiological factors. Recent studies of the placental mRNA transcriptome have shown similar intra-individual variance 18, as have studies of mRNA and methylation in tissues 19,20. We provide a new strategy for generating robust fluid biomarkers: quantifying longitudinal intra-patient variability improves accuracy and generalizability. Indeed, the AUC for our non-invasive test (median AUC in the training cohort = 0.74; AUC in the validation cohort = 0.71) compares favourably with those of the best tissue-based prognostic assays 21. Encouragingly, four of the seven miRNA species comprising this assay have previously been functionally characterized in cancer aggressiveness 22-26.

There remains an urgent clinical need for accurate non-invasive diagnostic tests in both the pre- and post-treatment settings. Prior to treatment, there is a need to avoid the discomfort, expense and complications of biopsies, which can include infection and sepsis 27. After treatment, rapid and accurate monitoring of disease relapse is needed. Not surprisingly, then, there is more baseline variability in older PCa patients (Fig. 15). Therefore, incorporating clinical and urine miRNA data could lead to increasingly accurate biomarkers 28,29. Overall, the urine miRNA transcriptome may be a clinically important source of biomarkers, especially for genito-urinary diseases. All documents disclosed herein, including those in the following reference list, are incorporated by reference. Although preferred embodiments of the invention have been described herein, it will be understood by those skilled in the art that the detailed description and the specific examples, while indicating preferred embodiments of the invention, are given by way of illustration only, since various changes and modifications within the spirit and scope of the invention will become apparent to those skilled in the art from this detailed description.

REFERENCES

1. Mohler, J., et al. NCCN clinical practice guidelines in oncology: prostate cancer. Journal of the National Comprehensive Cancer Network : JNCCN 8, 162-200 (2010).

2. Siegel, R., Naishadham, D. & Jemal, A. Cancer statistics, 2013. CA: a cancer journal for clinicians 63, 11-30 (2013).

3. Troyer, D.A., Mubiru, J., Leach, R.J. & Naylor, S.L. Promise and challenge: Markers of prostate cancer detection, diagnosis and prognosis. Disease markers 20, 117-128 (2004).

4. Loeb, S., et al. Systematic review of complications of prostate biopsy. European urology 64, 876-892 (2013).

5. Cooper, C.S., et al. Analysis of the genetic phylogeny of multifocal prostate cancer identifies multiple independent clonal expansions in neoplastic and morphologically normal prostate tissue. Nature genetics 47, 367-372 (2015).

6. Boutros, P.C., et al. Spatial genomic heterogeneity within localized, multifocal prostate cancer. Nature genetics 47, 736-745 (2015).

7. Gundem, G., et al. The evolutionary history of lethal metastatic prostate cancer. Nature 520, 353-357 (2015).

8. Cortese, R., et al. Epigenetic markers of prostate cancer in plasma circulating DNA. Human molecular genetics 21, 3619-3631 (2012).

9. Danila, D.C., et al. Circulating tumor cell number and prognosis in progressive castration-resistant prostate cancer. Clin Cancer Res 13, 7053-7058 (2007).

10. Kim, Y., et al. Targeted proteomics identifies liquid-biopsy signatures for extracapsular prostate cancer. Nature communications 7, 11906 (2016).

11. Surinova, S., et al. Non-invasive prognostic protein biomarker signatures associated with colorectal cancer. EMBO molecular medicine 7, 1153-1165 (2015).

12. Walter, B.A., Valera, V.A., Pinto, P.A. & Merino, M.J. Comprehensive microRNA Profiling of Prostate Cancer. Journal of Cancer 4, 350-357 (2013).

13. Korpela, E., Vesprini, D. & Liu, S.K. MicroRNA in radiotherapy: miRage or miRador? British journal of cancer 112, 777-782 (2015).

14. Chen, X., et al. Characterization of microRNAs in serum: a novel class of biomarkers for diagnosis of cancer and other diseases. Cell research 18, 997-1006 (2008).

15. Korzeniewski, N., et al. Identification of cell-free microRNAs in the urine of patients with prostate cancer. Urologic oncology 33, 16.e17-22 (2015).

16. Cancer Genome Atlas Research Network. The Molecular Taxonomy of Primary Prostate Cancer. Cell 163, 1011-1025 (2015).

17. Boutros, P.C., et al. Prognostic gene signatures for non-small-cell lung cancer. Proceedings of the National Academy of Sciences of the United States of America 106, 2824-2828 (2009).

18. Hughes, D.A., et al. Evaluating intra- and inter-individual variation in the human placental transcriptome. Genome biology 16, 54 (2015).

19. Cowley, M.J., et al. Intra- and inter-individual genetic differences in gene expression. Mammalian genome: official journal of the International Mammalian Genome Society 20, 281-295 (2009).

20. Turan, N., et al. Inter- and intra-individual variation in allele-specific DNA methylation and gene expression in children conceived using assisted reproductive technology. PLoS genetics 6, e1001033 (2010).

21. Lalonde, E., et al. Tumour genomic and microenvironmental heterogeneity for integrated prediction of 5-year biochemical recurrence of prostate cancer: a retrospective cohort study. The Lancet. Oncology 15, 1521-1532 (2014).

22. Kalimutho, M., et al. Differential expression of miR-144* as a novel fecal-based diagnostic marker for colorectal cancer. Journal of Gastroenterology 46, 1391-1402 (2011).

23. Rapa, I., et al. Identification of MicroRNAs Differentially Expressed in Lung Carcinoid Subtypes and Progression. Neuroendocrinology 101, 246-255 (2015).

24. McCann, M.J., et al. Expression profiling indicating low selenium-sensitive microRNA levels linked to cell cycle and cell stress response pathways in the CaCo-2 cell line. British Journal of Nutrition 9, 1212-1221 (2017).

25. Liu, C., et al. Distinct microRNA expression profiles in prostate cancer stem/progenitor cells and tumor-suppressive functions of let-7. Cancer Research 72, 3393-3404 (2012).

26. Schubert, M., et al. Distinct microRNA expression profile in prostate cancer patients with early clinical failure and the impact of let-7 as prognostic marker in high-risk prostate cancer. PLoS One 8, e65064 (2013).

27. Djavan, B., et al. Safety and morbidity of first and repeat transrectal ultrasound guided prostate needle biopsies: results of a prospective European prostate cancer detection study. The Journal of urology 166, 856-860 (2001).

28. Salido-Guadarrama, A.I., et al. Urinary microRNA-based signature improves accuracy of detection of clinically relevant prostate cancer within the prostate-specific antigen grey zone. Molecular medicine reports 13, 4549-4560 (2016).

29. Bell, E.H., et al. A novel miRNA-based predictive model for biochemical failure following post-prostatectomy salvage radiation therapy. PLoS One 10, e0118745 (2015).