
Title:
MITOCHONDRIAL DNA PROSTATE CANCER MARKER AND RELATED SYSTEMS AND METHODS
Document Type and Number:
WIPO Patent Application WO/2017/205963
Kind Code:
A1
Abstract:
There is described herein a method of prognosing and/or predicting disease progression and/or in subject with prostate cancer, the method comprising: a) providing a sample containing mitochondrial genetic material from prostate cancer cells; b) sequencing the mitochondrial genetic material with respect to at least 1 patient biomarker selected from CSB1, OHR, ATP8 and HV1 (hypervariable region 1); c) comparing the sequence of said patient biomarkers to control or reference biomarkers to determine mitochondrial single nucleotide variations (mtSNVs); and d) determining a prostate cancer prognosis; wherein a relatively worse outcome is associated with the presence of mtSNVs in CSB1, OHR, ATP8 and a relatively better outcome is associated with the presence of mtSNVs in HV1.

Inventors:
HOPKINS JULIA A (CA)
BOUTROS PAUL (CA)
BRISTOW ROBERT G (CA)
Application Number:
PCT/CA2017/000139
Publication Date:
December 07, 2017
Filing Date:
June 02, 2017
Assignee:
ONTARIO INSTITUTE FOR CANCER RES (OICR) (CA)
UNIV HEALTH NETWORK (CA)
International Classes:
C12Q1/68; G01N33/48; G06F19/22
Foreign References:
US20070190534A12007-08-16
Other References:
PETROS, JA. ET AL.: "mtDNA mutations increase tumorigenicity in prostate cancer", PNAS, vol. 102, no. 3, 18 January 2005 (2005-01-18), pages 719 - 724, XP002524593, ISSN: 0027-8424
Attorney, Agent or Firm:
CHIU, Jung-Kay et al. (CA)
Claims:
CLAIMS:

1. A method of prognosing and/or predicting disease progression and/or in subject with prostate cancer, the method comprising: a) providing a sample containing mitochondrial genetic material from prostate cancer cells; b) sequencing the mitochondrial genetic material with respect to at least 1 patient biomarker selected from CSB1, OHR, ATP8 and HV1 (hypervariable region 1); c) comparing the sequence of said patient biomarkers to control or reference biomarkers, preferably from the subject's own matched normal tissue or blood, to determine mitochondrial single nucleotide variations (mtSNVs); and d) determining a prostate cancer prognosis; wherein a relatively worse outcome is associated with the presence of mtSNVs in CSB1, OHR, ATP8 and a relatively better outcome is associated with the presence of mtSNVs in HV1.

2. The method according to claim 1, wherein the at least 1 patient biomarker is at least 2, 3 or 4 patient biomarkers.

3. The method of any one of claims 1-2, wherein the prostate cancer is localized prostate cancer, preferably non-indolent localized prostate cancer.

4. The method according to any one of claims 1-3, further comprising building a patient biomarker profile from the determined or measured patient biomarkers.

5. The method of any one of claims 1 to 4, wherein the prostate cancer prognosis is the likelihood of disease recurrence, preferably measured by biochemical relapse.

6. The method of claim 5, further comprising classifying the patient into a high risk group if the likelihood of disease recurrence is relatively high or a low risk group if the likelihood of disease recurrence is relatively low.

7. The method of claim 6, further comprising treating the patient with more aggressive therapy if the patient is in the high risk group.

8. The method of claim 7, wherein the more aggressive therapy comprises adjuvant therapy, preferably hormone therapy, chemotherapy or radiotherapy.

9. The method of any one of claims 1-8, wherein the patient biomarkers further comprise CO2, CO3 and ND4L.

10. The method of claim 9, wherein the at least 1 biomarker is at least 5, 6 or all 7 biomarkers.

11. The method of claim 10, wherein the at least 1 biomarker is all 7 biomarkers.

12. The method of claim 11, wherein the subject is classified as low risk if there exist mtSNVs in CO2, CO3, and HV1 and high risk if there exist mtSNVs in ATP8, OHR, ND4L and CSB1.

13. The method of any one of claims 1-12, wherein the mtSNVs are the mtSNVs identified in Table 5.

13. A computer-implemented method of prognosing or predicting disease progression in a patient with prostate cancer, the method comprising: a) receiving, at at least one processor, sequencing data of mitochondrial genetic material from prostate cancer cells of the patient, the sequencing data reflecting at least 1 patient biomarker selected from CSB1, OHR, ATP8 and HV1 (hypervariable region 1); b) comparing, at the at least one processor, said sequencing data to corresponding control or reference sequences, preferably from the subject's own matched normal tissue or blood, to determine mitochondrial single nucleotide variations (mtSNVs); d) determining, at the at least one processor, a prostate cancer prognosis; wherein a relatively worse outcome is associated with the presence of mtSNVs in CSB1, OHR, ATP8 and a relatively better outcome is associated with the presence of mtSNVs in HV1.

14. The method according to claim 13, wherein the method further comprises displaying the prostate cancer prognosis on a user display.

15. A computer program product for use in conjunction with a general-purpose computer having a processor and a memory connected to the processor, the computer program product comprising a computer readable storage medium having a computer mechanism encoded thereon, wherein the computer program mechanism may be loaded into the memory of the computer and cause the computer to carry out the method of any one of claims 1 to 12.

16. A computer readable medium having stored thereon a data structure for storing the computer program product according to claim 15.

17. A device for prognosing or predicting disease progression in a patient with prostate cancer, the device comprising: at least one processor; and electronic memory in communication with the at least one processor, the electronic memory storing processor-executable code that, when executed at the at least one processor, causes the at least one processor to: a) receive sequencing data of mitochondrial genetic material from prostate cancer cells of the patient, the sequencing data reflecting at least 1 patient biomarker selected from CSB1, OHR, ATP8 and HV1 (hypervariable region 1); b) compare said sequencing data to corresponding control or reference sequences, preferably from the subject's own matched normal tissue or blood, to determine mitochondrial single nucleotide variations (mtSNVs); and c) determine, at the at least one processor, a prostate cancer prognosis; wherein a relatively worse outcome is associated with the presence of mtSNVs in CSB1, OHR, ATP8 and a relatively better outcome is associated with the presence of mtSNVs in HV1.

18. The device according to claim 17, wherein the processor further displays the prostate cancer prognosis on a user display.

19. A kit for prognosing or predicting disease progression in a patient with prostate cancer, the kit comprising primer sequences that permit the sequencing of a mitochondrial genome to determine mtSNVs in ATP8, OHR, ND4L and CSB1.

20. The kit of claim 19, wherein the primers further permit sequencing of CO2, CO3 and

Description:
MITOCHONDRIAL DNA PROSTATE CANCER MARKER AND RELATED SYSTEMS AND METHODS

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of priority of U.S. Provisional Patent Application No. 62/344,723 filed June 2, 2016 and incorporated herein by reference in its entirety.

FIELD OF INVENTION

The present disclosure relates generally to a prostate cancer biomarker signature. More particularly, the present disclosure relates to a mitochondrial DNA signature for the prognosis of prostate cancer outcomes, which can inform treatment decisions and guide therapy.

BACKGROUND

Prostate cancer remains the most prevalent non-skin cancer in men 1 and exhibits a remarkably quiet mutational profile 2 . Exome sequencing studies of localized tumours have revealed few recurrent somatic single nucleotide variants (SNVs) 3,4 , while whole-genome sequencing studies have not identified highly recurrent driver non-coding SNVs or genomic rearrangements (GRs) 5-8 . Although strong mutagenic field effects have been observed 9,10 , their underlying mechanisms and to what extent they drive tumour initiation or progression are unknown. Nevertheless, promising molecular diagnostics predictive of aggressive disease have been created using supervised machine-learning techniques, both from RNA abundance data 11 ,12 and from DNA copy number data 13 , showing strong linkage between molecular features of prostate tumour cells and patient outcome.

Most studies of the prostate cancer genome have focused on mutations occurring in the nuclear genome, and have ignored the other genome of the cell: the mitochondrial genome. Mitochondria are maternally inherited and play critical roles in pathways dysregulated in cancer cells, including energy production, metabolism and apoptosis 14 . While mitochondrial mutations have been observed in several tumour types 15-17 , including prostate cancer 18-22 , their global frequency and clinical impact have not yet been comprehensively characterized. Previous studies have found that mitochondrial mutations are associated with increased serum prostate-specific antigen (PSA) levels 21 , have suggested that mtDNA mutations increase cancer cell tumourigenicity 20 , and indicate that overall mitochondrial mutation burden is correlated with higher Gleason Scores 22 .

SUMMARY OF INVENTION

In an aspect, there is provided a method of prognosing and/or predicting disease progression and/or in subject with prostate cancer, the method comprising: a) providing a sample containing mitochondrial genetic material from prostate cancer cells; b) sequencing the mitochondrial genetic material with respect to at least 1 patient biomarker selected from CSB1, OHR, ATP8 and HV1 (hypervariable region 1); c) comparing the sequence of said patient biomarkers to control or reference biomarkers to determine mitochondrial single nucleotide variations (mtSNVs); and d) determining a prostate cancer prognosis; wherein a relatively worse outcome is associated with the presence of mtSNVs in CSB1, OHR, ATP8 and a relatively better outcome is associated with the presence of mtSNVs in HV1.

In an aspect, there is provided a computer-implemented method of prognosing or predicting disease progression in a patient with prostate cancer, the method comprising: a) receiving, at at least one processor, sequencing data of mitochondrial genetic material from prostate cancer cells of the patient, the sequencing data reflecting at least 1 patient biomarker selected from CSB1, OHR, ATP8 and HV1 (hypervariable region 1); b) comparing, at the at least one processor, said sequencing data to corresponding control or reference sequences to determine mitochondrial single nucleotide variations (mtSNVs); d) determining, at the at least one processor, a prostate cancer prognosis; wherein a relatively worse outcome is associated with the presence of mtSNVs in CSB1, OHR, ATP8 and a relatively better outcome is associated with the presence of mtSNVs in HV1.

In an aspect, there is provided a computer program product for use in conjunction with a general-purpose computer having a processor and a memory connected to the processor, the computer program product comprising a computer readable storage medium having a computer mechanism encoded thereon, wherein the computer program mechanism may be loaded into the memory of the computer and cause the computer to carry out the method described herein.

In an aspect, there is provided a computer readable medium having stored thereon a data structure for storing the computer program product described herein.

In an aspect, there is provided a device for prognosing or predicting disease progression in a patient with prostate cancer, the device comprising: at least one processor; and electronic memory in communication with the at least one processor, the electronic memory storing processor-executable code that, when executed at the at least one processor, causes the at least one processor to: a) receive sequencing data of mitochondrial genetic material from prostate cancer cells of the patient, the sequencing data reflecting at least 1 patient biomarker selected from CSB1, OHR, ATP8 and HV1 (hypervariable region 1); b) compare said sequencing data to corresponding control or reference sequences to determine mitochondrial single nucleotide variations (mtSNVs); and c) determine, at the at least one processor, a prostate cancer prognosis; wherein a relatively worse outcome is associated with the presence of mtSNVs in CSB1, OHR, ATP8 and a relatively better outcome is associated with the presence of mtSNVs in HV1. In some embodiments, the processor further displays the prostate cancer prognosis on a user display.

In an aspect, there is provided a kit for prognosing or predicting disease progression in a patient with prostate cancer, the kit comprising primer sequences that permit the sequencing of a mitochondrial genome to determine mtSNVs in ATP8, OHR, ND4L and CSB1.

Other aspects and features of the present disclosure will become apparent to those ordinarily skilled in the art upon review of the following description of specific embodiments in conjunction with the accompanying figures.

BRIEF DESCRIPTION OF FIGURES

Embodiments of the present disclosure will now be described, by way of example only, with reference to the attached Figures and Tables.

Figure 1 shows a panorama of mitochondrial mutations in prostate cancer. (a) The top panel displays the number of mtSNVs per patient sorted first by T-Category and then by the number of mtSNVs; histogram bars are coloured by the average difference in the heteroplasmy fraction (ΔHF) between tumour and normal samples, light-blue 20-40%, medium-blue 40-60%, dark-blue ≥60%. A heatmap showing the location of each mtSNV on the mitochondrial genome (middle), where the colour of each dot represents ΔHF. The mitochondrial genome is represented on the left. The bottom panel shows the clinical covariates for all 384 patients: Age, Gleason Score, PSA and T-Category. Bottom right: Associations between the covariates and number of mtSNVs. (b) Frequency and distribution of single nucleotide variants (SNVs) within the mitochondrial genome. Mutation frequency normalized by dividing the number of mutations per locus of each patient by (length of the locus (kbp) × MCN). (c) Distribution of mtSNVs across the mitochondrial genome. mtSNVs were fairly evenly distributed across the genome (black bars) and recurrent mutation positions are indicated by the histogram.
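The normalization described for panel (b) can be illustrated with a minimal Python sketch. The function name and its arguments are hypothetical and chosen only for illustration; the sketch assumes the locus length is supplied in base pairs and converted to kbp, and that MCN is the mitochondrial copy number of the same sample.

```python
def normalized_mutation_frequency(n_mutations: int,
                                  locus_length_bp: int,
                                  mcn: float) -> float:
    """Mutations per locus divided by (locus length in kbp x MCN).

    Hypothetical helper: the name and argument units are assumptions,
    not taken from the disclosure.
    """
    if locus_length_bp <= 0 or mcn <= 0:
        raise ValueError("locus length and MCN must be positive")
    locus_length_kbp = locus_length_bp / 1000.0
    return n_mutations / (locus_length_kbp * mcn)


# Illustrative numbers only: 2 mutations in a 500 bp locus, MCN of 400.
freq = normalized_mutation_frequency(2, 500, 400)
```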

Figure 2 shows the difference in mitochondrial mutational frequency and copy number with age. (a) Association of nuclear (green) and mitochondrial (yellow) mutation SNV/Mbp rates with patient age. Mitochondrial mutation rate normalized by MCN. (b) Distribution of mtSNVs in EOPC (red) and LOPC (blue) patients. The histogram indicates presence and frequency of a mtSNV. The most recurrent mtSNV was at position 16093. (c) The fraction of patients by number of mtSNVs, EOPC (gray bars), LOPC (black bars). (d) Tumour mitochondrial copy number (MCN) for both patient age groups. EOPC: n=164; LOPC: n=220.

Figure 3 shows associations between mitochondrial and nuclear genome mutations. (a) Correlations of mitochondrial features with nuclear genome features. The size and colour of the dot represent the Spearman correlation and the background shading represents the p-value. Nuclear features: SNVs, CTXs, INVs, kataegis data available for 172 patients; Chromothripsis: n = 159; CNAs: MYC, NKX3-1 (n = 203); CDH1, CDKN1B, CHD1, PTEN, RB1, TP53 (n = 194); Methylation: n = 104. Mitochondrial features: 216 patients. (b) Mutations in OHR are associated with CNAs in MYC. Heatmap showing those patients with CNA gains (red) in MYC and those with mtSNVs in OHR, CSB1, the control region and ATP6; mtSNV colour represents the ΔHF. Since CSB1 is a subregion within OHR, mutations in CSB1 are also considered as OHR mtSNVs; similarly, mtSNVs in OHR are also within the control region (n = 203). The barplot on the right shows the fraction of patients with or without a MYC CNA that have a specific mtSNV. (c) Kaplan-Meier plot of 165 patients with OHR and MYC mutations. Patients were grouped according to whether they had neither MYC CNAs nor OHR SNVs (black line), a MYC CNA or an OHR mtSNV (blue) or had both (red line). The group that had a CNA gain in MYC and an mtSNV in the OHR region had significantly worse outcomes than those without the mutations. Biochemical RFR (Biochemical relapse-free rate).

Figure 4 shows clinical impact of mitochondrial mutations in prostate cancer. (a) The associations of biochemical recurrence (BCR) and 21 mitochondrial features: 19 mitochondrial genes or regions, MCN (median-dichotomized), and mtSNV count (0 vs. 1+) were calculated using Cox models in 165 LOPC patients. Hazard ratios (HRs) are shown in the middle panel and p-values from the log-rank test in the right panel. The change in the 10 year survival for patients with mutations in each mitochondrial region is indicated (left panel). The colour of the bars indicates the average ΔHF for mtSNVs in that region; light-blue 20-40%, medium-blue 40-60%, dark-blue >60%. (b) Kaplan-Meier plots of mtSNVs occurring within HV1 and (c) OHR. (d) Kaplan-Meier plot of results of leave-one-out cross-validation predictions (p-value from log-rank test).

Figure 5 shows the experimental design/experimental workflow for the project. Whole genome sequencing was performed on 333 CPC-GENE and EOPC samples. In addition, 51 publicly available samples with whole genome sequences were included in the dataset and realigned. Mitochondrial reads were extracted and the mitochondrial analysis tool MToolBox was run on the resulting BAM files. Heteroplasmic fractions (HF) were calculated for each nucleotide and only those positions that differed by ≥ 0.2 HF between the tumour and matched normal were included in the list of mtSNVs.

Figure 6 shows MCN association with clinical variables. Tumour MCN categorized by age (a), T-category (b), and Gleason score (c). EOPC patients are indicated by red dots, LOPC by blue dots. (d) MCN of the matched normal samples show a significant difference between the two age groups.

Figure 7 shows PCR validation confirms predicted mtSNVs. A comparison of chromatograms after PCR amplification and Sanger sequencing from (a) normal and (b) tumour samples from patient CPCG0196 for the mtDNA region: 187-208. Arrow indicates position 195 which has significant heteroplasmy in tumour.

Figure 8 shows mtSNVs chosen for PCR validation. The 25 mtSNVs validated by PCR amplification and Sanger sequencing had varying levels in the difference in heteroplasmy (ΔHF) between tumour and normal samples. Light blue 20-40%, medium blue 40-60% and dark blue ≥ 60% ΔHF. mtDNA position on x-axis. Labels in red indicate those mtSNVs that failed PCR validation.

Figure 9 shows frequency of mutations by patient and mitochondrial loci. Heatmap showing the distribution of mutations in the different mitochondrial regions (y-axis) by patients (x-axis). The difference in heteroplasmy fraction between tumour and normal sample (ΔHF) is indicated by colour, white: no mutation; light-blue: 20-40%; blue: 40-60% and dark blue >60%. Patients with more than one mtSNV in a particular mtDNA region are indicated by gray dots. Note: CSB1 and OHR are overlapping regions with the mtDNA Control region; mtSNVs in CSB1 are necessarily mtSNVs in OHR and both are mtSNVs within the Control region.

Figure 10 shows distributions of mtSNV fractions by mitochondrial genome loci for EOPC and LOPC patients. The fraction of total mtSNVs per locus for EOPC and LOPC cohorts, those ≤ 50 years old (164 patients) and those > 50 years old (220 patients) respectively.

Figure 11 shows correlations between nuclear and mitochondrial features as a function of heteroplasmy fraction. Spearman's ρ (a) and p-values (b) were calculated using increasing ΔHF cutoffs for mtSNVs for several nuclear and mitochondrial features: MYC CNAs and OHR mtSNVs, the non-coding SNV chr4:39684557 and ND2 mtSNVs, TP53 SNVs and ND5 mtSNVs, and MYC CNAs and RNR2 mtSNVs. (c) The total number of mtSNVs for 384 patients at each ΔHF threshold (unadjusted).

Figure 12 shows prognostic synergy between mitochondrial and nuclear mutations. Kaplan-Meier plots of patients with (a) methylation events in miR129-2 and mtSNVs in HV1 or (b) ND5; (c) NKX3-1 CNAs and OHR mtSNVs; (d) mtSNVs in HV1 and methylation events in TCERG1L-5' or (e) TUBA3C; and (f) MYC CNAs and HV2 mtSNVs. Patients were grouped according to whether they had no mutations (black line), either a mtSNV or nuclear genomic mutation (blue) or had both (red line).

Figure 13 shows signature flow chart and subset signature. (a) Flowchart showing details of the leave-one-out cross validation method. (b) Mitochondrial signature using three genes (HV1, OHR, CO3).

Figure 14 shows mitochondrial signature in intermediate risk patients. Only patients classified as NCCN-intermediate risk were used with the mtSNV signature and were separated into three risk-prediction groups, 'high' (red line), 'intermediate' (black line), and 'low' (blue line).

Figure 15 shows a suitably configured computer device, and associated communications networks, devices, software and firmware to provide a platform for enabling one or more embodiments as described herein.

Table 1 shows results of PCR validation of 25 mtSNVs. The table includes the mtSNV position, which PCR primers were used to validate, the heteroplasmy fraction (adjusted by cellularity) of the major allele for both tumour and normal and the results of the PCR amplification and Sanger sequencing.

Table 2 shows results from univariate Cox proportional hazards modeling. Hazard ratios were calculated for the different mitochondrial loci individually; the table includes the HR and 95% CI, p-values, the change in 10 year survival and the number of patients with a mtSNV in that locus.

Table 3 shows the sequence and mtDNA targeted region of 20 forward and reverse PCR primers.

Table 4 shows clinical and sequencing data per patient. The data includes patient age at treatment, Gleason Score, T-category, PSA (ng/mL) level, tumour cellularity, number of mtSNVs and the mean coverage depth, mitochondrial copy number for both normal and tumour sample and the aligner used for each wgs. The presence or absence of mutations in each of 20 mitochondrial regions and MYC and NKX3-1 copy number aberrations is indicated for each sample and the amount of DNA that was sent for sequencing for the CPC-GENE samples are included.

Table 5 shows 293 somatic mtSNVs. List of mtSNVs, including heteroplasmic fractions (HF), reference allele nucleotide, identity of tumour and normal major alleles and major allele heteroplasmy fractions (both adjusted and unadjusted by tumour cellularity), tumour and normal coverage at each position, the mtDNA gene or region and pathogenicity scores from MutPred and Polyphen2 obtained from MToolBox.

Table 6 shows mitochondrial mutation recurrence for 41 nuclear genomic features. The table includes the number of patients that had a specific nuclear genome CNA, GR, methylation event or SNV and of those patients the number that also harbours an mtSNV in any of 22 mtDNA features.

Table 7 shows mtSNVs with ΔHF values between 0.1 and 0.2. List of 265 mtSNVs that had ΔHF values greater than 0.1, but less than 0.2. The table includes heteroplasmic frequencies, reference allele nucleotide, identity of tumour and normal major alleles and major allele heteroplasmy fractions, tumour and normal coverage at each position and the mtDNA gene or region.

DETAILED DESCRIPTION

Nuclear mutations are well-known to drive tumour incidence, aggression and response to therapy. By contrast, the frequency and roles of mutations in the maternally-inherited mitochondrial genome are poorly understood. To characterize the mitochondrial mutation landscape of prostate cancer, we analyzed the mitochondrial genomes of 384 adenocarcinomas of the prostate across all National Comprehensive Cancer Network (NCCN) defined risk categories, including 164 early-onset prostate cancers (EOPCs, age at diagnosis less than 50). We identified a median of one mitochondrial single nucleotide variant (mtSNV) per patient.

We identify recurrent mutational hotspots in the mitochondrial genome, which included recurrently mutated bases or recurrently mutated genes or regions. We also confirm increasing mutation burden with patient age 23-26 , identify interactions between nuclear and mitochondrial mutation profiles and reveal specific mitochondrial mutations enriched in aggressive prostate tumours. For example, certain control region mtSNVs co-occur with gain of the MYC oncogene, and these mutations are jointly associated with patient survival.

These data demonstrate frequent mitochondrial mutation in prostate cancer, and suggest interplay between nuclear and mitochondrial mutational profiles in prostate cancer.

The methods described herein are useful for prognosing the outcome of a subject that has, or has had, a cancer associated with the prostate. The cancer may be prostate cancer or a cancer that has metastasized from a cancer of the prostate.

In an aspect, there is provided a method of prognosing and/or predicting disease progression and/or in subject with prostate cancer, the method comprising: a) providing a sample containing mitochondrial genetic material from prostate cancer cells; b) sequencing the mitochondrial genetic material with respect to at least 1 patient biomarker selected from CSB1, OHR, ATP8 and HV1 (hypervariable region 1); c) comparing the sequence of said patient biomarkers to control or reference biomarkers to determine mitochondrial single nucleotide variations (mtSNVs); and d) determining a prostate cancer prognosis; wherein a relatively worse outcome is associated with the presence of mtSNVs in CSB1, OHR, ATP8 and a relatively better outcome is associated with the presence of mtSNVs in HV1.
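As an illustration of step c), the following Python sketch keeps only those positions whose heteroplasmy fraction (HF) differs between the tumour and the matched normal by at least 0.2, the threshold stated in the description of Figure 5. It is a sketch under stated assumptions, not the pipeline used in the study: the function and type names are hypothetical, per-position HF values are assumed to have been computed upstream (for example by a tool such as MToolBox), the difference is taken as an absolute value, and the HF values in the usage lines are invented for illustration.

```python
from typing import Dict, List, NamedTuple


class MtSNV(NamedTuple):
    """One called variant (hypothetical record layout)."""
    position: int      # mtDNA coordinate
    tumour_hf: float   # heteroplasmy fraction in the tumour sample
    normal_hf: float   # heteroplasmy fraction in the matched normal
    delta_hf: float    # absolute difference between the two


def call_mtsnvs(tumour_hf: Dict[int, float],
                normal_hf: Dict[int, float],
                min_delta_hf: float = 0.2) -> List[MtSNV]:
    """Return positions whose HF differs by at least min_delta_hf.

    Only positions present in both samples are compared; using the
    absolute difference is an assumption of this sketch.
    """
    calls = []
    for pos in sorted(tumour_hf.keys() & normal_hf.keys()):
        delta = abs(tumour_hf[pos] - normal_hf[pos])
        if delta >= min_delta_hf:
            calls.append(MtSNV(pos, tumour_hf[pos], normal_hf[pos], delta))
    return calls


# Invented HF values at three positions, for illustration only.
tumour = {195: 0.65, 3000: 0.05, 16093: 0.10}
normal = {195: 0.02, 3000: 0.05, 16093: 0.05}
calls = call_mtsnvs(tumour, normal)
```

With these invented inputs, only position 195 passes the filter; the other two positions differ by less than 0.2 and are dropped.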

The term "subject" as used herein refers to any member of the animal kingdom, preferably a human being and most preferably a human being that has, has had, or is suspected of having prostate cancer. The term "sample" as used herein refers to any fluid (e.g. blood, urine, semen), cell, tumor or tissue sample from a subject which can be assayed for the biomarkers described herein.

The term "genetic material" used herein refers to materials found in or originating from the nucleus, mitochondria and cytoplasm, which play a fundamental role in determining the structure and nature of cell substances, and which are capable of self-propagation and variation. In the context of the present methods, the genetic material is any material from which one can measure the biomarkers described herein. The genetic material is preferably DNA.

The term "prognosis" as used herein refers to the prediction of a clinical outcome associated with a disease subtype which is reflected by a reference profile such as a biomarker reference profile. The prognosis provides an indication of disease progression and includes an indication of likelihood of death due to cancer. The prognosis may be a prediction of metastasis, or alternatively disease recurrence. In one embodiment the clinical outcome class includes a better survival group and a worse survival group. The term "prognosing or classifying" as used herein means predicting or identifying the clinical outcome of a subject according to the subject's similarity to a reference profile or biomarker associated with the prognosis. For example, prognosing or classifying comprises a method or process of determining whether an individual has a better or worse survival outcome, or grouping individuals into a better survival group or a worse survival group, or predicting whether or not an individual will respond to therapy.

The term "biomarker profile" as used herein refers to a dataset representing the state or expression level(s) of one or more biomarkers. A biomarker profile may represent one subject, or alternatively a consolidated dataset of a cohort of subjects, for example to establish a reference biomarker profile as a control.

As used herein, the term "control" refers to a specific value or dataset that can be used to prognose or classify the value, e.g. the measured biomarker or reference biomarker profile obtained from the test sample associated with an outcome. In one embodiment, a dataset may be obtained from samples from a group of subjects known to have cancer having different tumor states and/or healthy individuals. The state or expression data of the biomarkers in the dataset can be used to create a control value that is used in testing samples from new patients. In some embodiments, a cohort of subjects is used to obtain a control dataset. A control cohort of patients may be a group of individuals with or without cancer. In a particular embodiment, the control is a patient's own matched normal profile (e.g. from blood or normal tissue).

As used herein, "overall survival" refers to the percentage of or length of time that people in a study or treatment group are still alive from either the date of diagnosis or the start of treatment for a disease, such as cancer. In a clinical trial, measuring the overall survival is one way to see how well a new treatment works.

As used herein, "relapse-free survival" refers to, in the case of cancer, the percentage of or length of time that people in a study or treatment group survive without any signs or symptoms of that cancer after primary treatment for that cancer. In a clinical trial, measuring the relapse-free survival is one way to see how well a new treatment works. It is defined as any disease recurrence or relapse (local, regional, or distant).

The term "good survival" or "better survival" as used herein refers to an increased chance of survival as compared to patients in the "poor survival" group. For example, the biomarkers of the application can prognose or classify patients into a "good survival group". These patients are at a lower risk of death after surgery and can also be categorized into a "low-risk group".

The term "poor survival" or "worse survival" as used herein refers to an increased risk of disease progression or death as compared to patients in the "good survival" group. For example, biomarkers or genes of the application can prognose or classify patients into a "poor survival group". These patients are at greater risk of death or adverse reaction from disease or surgery, treatment for the disease or other causes, and can also be categorized into a "high-risk group".

A person skilled in the art would understand how to implement differing cut-offs for good survival vs. worse survival, depending on the clinical outcome one is predicting and the biomarkers being assayed.

In some embodiments, the at least 1 patient biomarker is at least 2, 3 or 4 patient biomarkers.

In some embodiments, the prostate cancer is localized prostate cancer, preferably non-indolent localized prostate cancer.

In some embodiments, the method further comprises building a patient biomarker profile from the determined or measured patient biomarkers.
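One way such a patient biomarker profile might be represented is as a simple presence/absence record per biomarker region. The Python sketch below is illustrative only: the function name and the dictionary layout are assumptions, the input is assumed to be the list of region labels already assigned to a patient's mtSNVs, and the cytochrome oxidase genes are spelled CO2 and CO3 with the letter O. Because CSB1 is described as a subregion of OHR, a CSB1 mtSNV is also counted as an OHR mtSNV.

```python
from typing import Dict, Iterable

# The seven biomarker regions named in the disclosure.
BIOMARKERS = ("CSB1", "OHR", "ATP8", "HV1", "CO2", "CO3", "ND4L")


def build_biomarker_profile(mutated_regions: Iterable[str]) -> Dict[str, bool]:
    """Map each biomarker to True if the patient has an mtSNV in it.

    Hypothetical helper: regions outside the seven biomarkers are ignored.
    """
    mutated = set(mutated_regions)
    if "CSB1" in mutated:
        # CSB1 lies within OHR, so a CSB1 mtSNV is also an OHR mtSNV.
        mutated.add("OHR")
    return {marker: marker in mutated for marker in BIOMARKERS}


# Illustrative input: mtSNVs annotated to CSB1 and to ND5 (not a biomarker).
profile = build_biomarker_profile(["CSB1", "ND5"])
```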

In some embodiments, the prostate cancer prognosis is the likelihood of disease recurrence, preferably measured by biochemical relapse.

In some embodiments, the method further comprises classifying the patient into a high risk group if the likelihood of disease recurrence is relatively high or a low risk group if the likelihood of disease recurrence is relatively low.

In some embodiments, the method further comprises treating the patient with more aggressive therapy if the patient is in the high risk group. Preferably, the more aggressive therapy comprises adjuvant therapy, preferably hormone therapy, chemotherapy or radiotherapy.

In some embodiments, the patient biomarkers further comprise CO2, CO3 and ND4L. Preferably, the at least 1 biomarker is at least 5, 6 or all 7 biomarkers. Further preferably, the at least 1 biomarker is all 7 biomarkers.

In some embodiments, the subject is classified as low risk if there exist mtSNVs in CO2, CO3 and HV1, and as high risk if there exist mtSNVs in ATP8, OHR, ND4L and CSB1.
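The classification rule in this embodiment can be sketched in a few lines of Python. This is a minimal illustration only, not the applicants' implementation: the function name `classify_risk` is hypothetical, the cytochrome c oxidase subunit genes are labelled CO2 and CO3, and the handling of patients with no informative mtSNV or with mtSNVs in both locus sets (returned here as "intermediate") is an assumption that the text does not specify.

```python
# Hypothetical sketch of the low-risk / high-risk rule; names are illustrative.
LOW_RISK_LOCI = {"CO2", "CO3", "HV1"}
HIGH_RISK_LOCI = {"ATP8", "OHR", "ND4L", "CSB1"}


def classify_risk(mutated_loci):
    """Return 'low', 'high' or 'intermediate' for the loci carrying mtSNVs."""
    loci = set(mutated_loci)
    has_low = bool(loci & LOW_RISK_LOCI)
    has_high = bool(loci & HIGH_RISK_LOCI)
    if has_high and not has_low:
        return "high"
    if has_low and not has_high:
        return "low"
    # Assumption: no informative mtSNV, or conflicting loci, is left in a
    # middle group; the source text does not define this case.
    return "intermediate"
```

For example, a tumour with mtSNVs in OHR and ND5 would be placed in the high-risk group under this sketch, because ND5 is not one of the seven informative loci.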

In some embodiments, the mtSNVs are the mtSNVs identified in Table 5.

The present system and method may be practiced in various embodiments. A suitably configured computer device, and associated communications networks, devices, software and firmware may provide a platform for enabling one or more embodiments as described above. By way of example, Fig. 15 shows a generic computer device 100 that may include a central processing unit ("CPU") 102 connected to a storage unit 104 and to a random access memory 106. The CPU 102 may process an operating system 101, application program 103, and data 123. The operating system 101, application program 103, and data 123 may be stored in storage unit 104 and loaded into memory 106, as may be required. Computer device 100 may further include a graphics processing unit (GPU) 122 which is operatively connected to CPU 102 and to memory 106 to offload intensive image processing calculations from CPU 102 and run these calculations in parallel with CPU 102. An operator 107 may interact with the computer device 100 using a video display 108 connected by a video interface 105, and various input/output devices such as a keyboard 115, mouse 112, and disk drive or solid state drive 114 connected by an I/O interface 109. In known manner, the mouse 112 may be configured to control movement of a cursor in the video display 108, and to operate various graphical user interface (GUI) controls appearing in the video display 108 with a mouse button. The disk drive or solid state drive 114 may be configured to accept computer readable media 116. The computer device 100 may form part of a network via a network interface 111, allowing the computer device 100 to communicate with other suitably configured data processing systems (not shown). One or more different types of sensors 135 may be used to receive input from various sources. The present system and method may be practiced on virtually any manner of computer device including a desktop computer, laptop computer, tablet computer or wireless handheld.

The present system and method may also be implemented as a computer-readable/useable medium that includes computer program code to enable one or more computer devices to implement each of the various process steps in a method in accordance with the present invention. Where more than one computer device performs the entire operation, the computer devices are networked to distribute the various steps of the operation. It is understood that the terms computer-readable medium or computer useable medium comprise one or more of any type of physical embodiment of the program code. In particular, the computer-readable/useable medium can comprise program code embodied on one or more portable storage articles of manufacture (e.g. an optical disc, a magnetic disk, a tape, etc.), or on one or more data storage portions of a computing device, such as memory associated with a computer and/or a storage system.

In an aspect, there is provided a computer-implemented method of prognosing or predicting disease progression in a patient with prostate cancer, the method comprising: a) receiving, at at least one processor, sequencing data of mitochondrial genetic material from prostate cancer cells of the patient, the sequencing data reflecting at least 1 patient biomarker selected from CSB1, OHR, ATP8 and HV1 (hypervariable region 1); b) comparing, at the at least one processor, said sequencing data to corresponding control or reference sequences to determine mitochondrial single nucleotide variations (mtSNVs); and c) determining, at the at least one processor, a prostate cancer prognosis; wherein a relatively worse outcome is associated with the presence of mtSNVs in CSB1, OHR, ATP8 and a relatively better outcome is associated with the presence of mtSNVs in HV1.

In some embodiments, the method further comprises displaying the prostate cancer prognosis on a user display.

In an aspect, there is provided a computer program product for use in conjunction with a general-purpose computer having a processor and a memory connected to the processor, the computer program product comprising a computer readable storage medium having a computer program mechanism encoded thereon, wherein the computer program mechanism may be loaded into the memory of the computer and cause the computer to carry out the method described herein.

In an aspect, there is provided a computer readable medium having stored thereon a data structure for storing the computer program product described herein.

In an aspect, there is provided a device for prognosing or predicting disease progression in a patient with prostate cancer, the device comprising: at least one processor; and electronic memory in communication with the at least one processor, the electronic memory storing processor-executable code that, when executed at the at least one processor, causes the at least one processor to: a) receive sequencing data of mitochondrial genetic material from prostate cancer cells of the patient, the sequencing data reflecting at least 1 patient biomarker selected from CSB1, OHR, ATP8 and HV1 (hypervariable region 1); b) compare said sequencing data to corresponding control or reference sequences to determine mitochondrial single nucleotide variations (mtSNVs); and c) determine a prostate cancer prognosis; wherein a relatively worse outcome is associated with the presence of mtSNVs in CSB1, OHR, ATP8 and a relatively better outcome is associated with the presence of mtSNVs in HV1. In some embodiments, the processor further displays the prostate cancer prognosis on a user display.

As used herein, "processor" may be any type of processor, such as, for example, any type of general-purpose microprocessor or microcontroller (e.g., an Intel™ x86, PowerPC™, ARM™ processor, or the like), a digital signal processing (DSP) processor, an integrated circuit, a field programmable gate array (FPGA), or any combination thereof.

As used herein, "memory" may include a suitable combination of any type of computer memory that is located either internally or externally such as, for example, random-access memory (RAM), read-only memory (ROM), compact disc read-only memory (CDROM), electro-optical memory, magneto-optical memory, erasable programmable read-only memory (EPROM), and electrically-erasable programmable read-only memory (EEPROM), or the like. Portions of memory 106 may be organized using a conventional filesystem, controlled and administered by an operating system governing overall operation of a device.

As used herein, "computer readable storage medium" (also referred to as a machine-readable medium, a processor-readable medium, or a computer usable medium having a computer-readable program code embodied therein) is a medium capable of storing data in a format readable by a computer or machine. The machine-readable medium can be any suitable tangible, non-transitory medium, including magnetic, optical, or electrical storage medium including a diskette, compact disk read only memory (CD-ROM), memory device (volatile or non-volatile), or similar storage mechanism. The computer readable storage medium can contain various sets of instructions, code sequences, configuration information, or other data, which, when executed, cause a processor to perform steps in a method according to an embodiment of the disclosure. Those of ordinary skill in the art will appreciate that other instructions and operations necessary to implement the described implementations can also be stored on the computer readable storage medium. The instructions stored on the computer readable storage medium can be executed by a processor or other suitable processing device, and can interface with circuitry to perform the described tasks.

As used herein, "data structure" refers to a particular way of organizing data in a computer so that it can be used efficiently. Data structures can implement one or more particular abstract data types (ADT), which specify the operations that can be performed on a data structure and the computational complexity of those operations. In comparison, a data structure is a concrete implementation of the specification provided by an ADT.

In an aspect, there is provided a kit for prognosing or predicting disease progression in a patient with prostate cancer, the kit comprising primer sequences that permit the sequencing of a mitochondrial genome to determine mtSNVs in ATP8, OHR, ND4L and CSB1.

In some embodiments, the primers further permit sequencing of CO2, CO3 and ND4L.

The above listed aspects and/or embodiments may be combined in various combinations as appreciated by a person of skill in the art. The advantages of the present disclosure are further illustrated by the following examples. The examples and their particular details set forth herein are presented for illustration only and should not be construed as a limitation on the claims of the present invention.

EXAMPLES

Methods/Materials

Patient Cohort

We collected 384 prostate cancer tumour samples with matched normal samples (381 blood, 3 tissue-derived). The samples had Gleason Scores ranging from 3+3 to 5+4. The 165 patients from the Canadian Prostate Cancer Genome Network (CPC-GENE) underwent either radical prostatectomy or image-guided radiotherapy as detailed in Fraser et al. (2017) 7. In addition, 51 samples from publicly available datasets were included in the somatic mutation analysis and correlations with clinical variables, age, Gleason Score and T-category 4-6,8; three of the TCGA samples had tissue-derived normal samples as opposed to blood normals. All samples were manually macro-dissected and were assessed by an expert urological pathologist to have tumour cellularity >70%. All tumour specimens were taken from the index lesion. Publicly available tumour tissues were obtained and used following University Health Network Research Ethics Board (REB) approved study protocols (UHN 06-0822-CE, UHN 11-0024-CE, CHUQ 2012-913:H12-03-192). Local REB and ICGC guidelines were used to collect whole blood and informed consent from CPC-GENE patients at the time of clinical follow-up.

EOPC patient cohort and sample processing

We collected 168 tumour samples from EOPC patients. Informed consent and an ethical vote (institutional reviewing board) were obtained according to the current ICGC guidelines. The patients did not receive any neo-adjuvant radiotherapy, androgen deprivation therapy, or chemotherapy prior to the surgical removal of tumor tissue. Tumor samples and a normal blood control were frozen at -20 °C and subsequently stored at -80 °C.

EOPC DNA Library Preparation, Sequencing and Alignment

DNA library preparation and whole-genome sequencing were performed on Illumina sequencers, with the raw reads having a median length of 101 bp. Reads were aligned to the hg19 reference genome using BWA-MEM version 0.7.8-r455 [arXiv:1303.3997v2] and duplicates were removed using Picard (http://broadinstitute.github.io/picard). Mitochondrial reads were extracted using SAMtools 39.

Nuclear mutation calling

Recurrent nuclear genomic features were obtained from Fraser et al. (2017) 7. These included five recurrent coding SNVs from commonly mutated genes in prostate cancer; the six most recurrent noncoding SNVs; CNAs from eight commonly mutated prostate cancer genes; 10 genomic rearrangements (GRs), comprising the five most recurrent translocations, the four most recurrent inversions and a recurrent inversion containing the PTEN gene; the TMPRSS2-ERG fusion; presence or absence of kataegis events; chromothripsis; three metrics of mutation density (median-dichotomized PGA estimates, number of SNVs and number of GRs); and six methylation events identified through univariate CoxPH modelling as associated with disease progression. Nuclear somatic single nucleotide variants were predicted by SomaticSniper (v1.0.2) 38 (n = 172 samples), setting the mapping quality threshold to 1 and otherwise using default parameters. Nuclear SNVs were filtered using SAMtools (v0.1.6) 39 and SomaticSniper (v1.0.2) provided filters, as well as a mapping quality filter and false positive filter from bam-readcount (downloaded Jan 10th, 2014). Nuclear SNVs were then annotated by ANNOVAR (v2015-06-17) 40. The nuclear mutation rate was obtained by dividing the number of SNVs after filtering by the number of callable loci. Copy number aberrations were analyzed by Affymetrix OncoScan microarrays (n = 194) and methylation data was generated by Illumina Infinium Human Methylation 450k BeadChip kits (n = 104). Genomic rearrangements were called using Delly (v0.5.5) 41 (n = 172). Chromothripsis scores (n = 159) were calculated by Shatterproof (v0.14) 42 and subsequently dichotomized with a 0.517 threshold. Sample processing, whole genome sequencing and whole genome sequencing data analysis are as described in detail by Fraser et al. (2017) 7.

Mitochondrial SNV Calling

Reads mapped to the mitochondria during whole genome alignment were extracted using BAMQL (v1.1) 43 using the command:

bamql -I -o out_mito_reads.bam -f input_wgs.bam '(chr(M) & mate_chr(M)) | (chr(Y) & after(59000000) & mate_chr(M))'

The second part of the query statement collects read pairs in which one read mapped to chrM and the other was unmapped; in our data, such unmapped reads were also assigned to an unresolved region in chrY.

The output files from BAMQL were used as input bam files for the mitochondrial genome analysis program MToolBox (v0.2.2) 44. The versions of the various system requirements were: Python v2.7.2; gmap v.2013-07-20 45; samtools v0.1.18 39; java v1.7.0_72; picard v1.92 (http://broadinstitute.github.io/picard); muscle v3.8.31 46. We used default parameters for MToolBox and used the default RSRS 47 as the reference genome. The default parameters include a minimum base quality score of 25. Samples that failed the MToolBox program using default parameters, but successfully completed at a lower base quality parameter setting of 20, were nonetheless removed from the analysis. The command was:

MToolBox_v0.2.2/MToolBox.sh -i bam -r RSRS -M -I -m '-D genome_index/ -H hg19RSRS -M chrRSRS' -a '-r genome_fasta/ -F -P -C

The predicted mitochondrial genome for each tumour sample and the number of reads supporting each base were compared to the corresponding normal sample, if available, from each patient. Positions where the absolute difference in heteroplasmy fraction (ΔHF) was greater than 0.2 were considered to be mitochondrial SNVs (mtSNVs). While this does not preclude the possibility of tissue-specific heteroplasmy being mislabeled as somatic mutations, this allowed us to identify somatic variants as well as ignore those positions that could be called population variants, reducing the number of potentially false positive variant calls. Heteroplasmy fraction estimates were adjusted to account for tumour cellularity using cellularity values calculated by qpure. Tumour HF values were adjusted with the following equation:

$$\text{Tumour HF}_{\text{cellularity}} = \frac{\text{Tumour HF} - (1 - \text{cellularity}) \times \text{Normal HF}}{\text{cellularity}}$$

If there were no cellularity values available we assumed cellularity = 1.0. Those values of Tumour HF_cellularity that were less than zero or greater than one were rounded to zero and one respectively.
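The cellularity adjustment and the ΔHF call can be illustrated with a short Python sketch. It assumes the usual two-component mixture model, in which the observed tumour HF is a cellularity-weighted blend of the true tumour HF and the normal HF; that model, the function names and the default arguments are assumptions made for illustration, not the applicants' code. Clamping to the range 0 to 1 and the default cellularity of 1.0 follow the text.

```python
def adjust_tumour_hf(tumour_hf, normal_hf, cellularity=1.0):
    """Cellularity-adjusted tumour heteroplasmy fraction, clamped to [0, 1].

    Assumed mixture model:
        observed = cellularity * true_tumour + (1 - cellularity) * normal
    """
    adjusted = (tumour_hf - (1.0 - cellularity) * normal_hf) / cellularity
    # Values below zero or above one are rounded to zero and one.
    return min(1.0, max(0.0, adjusted))


def is_mtsnv(tumour_hf, normal_hf, cellularity=1.0, threshold=0.2):
    """Call an mtSNV when the absolute HF difference exceeds the threshold."""
    adjusted = adjust_tumour_hf(tumour_hf, normal_hf, cellularity)
    return abs(adjusted - normal_hf) > threshold
```

Under these assumptions, an observed tumour HF of 0.4 at 80% cellularity with a normal HF of 0 adjusts to 0.5 and would be called as an mtSNV.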

In the mitochondrial reference genome there are three positions encoded as 'N' to preserve historical numbering (523, 524 and 3107). In addition, position 310 is located within a homopolymer region and is a common variant 28. These four positions can result in misalignments 49 and were therefore filtered out of our analyses, as in previous studies 50. We also filtered out positions with low coverage (read depth of less than 100). Positions of mitochondrial genes and subregions of the noncoding control region were obtained from http://www.mitomap.org. Pathogenicity scores from MutPred 51, PolyPhen-2 52 and SiteVar 53 were obtained from the MToolBox output. Mutations in tRNA genes were compared to the Mamit-tRNA database 54.
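The position and coverage filters described here amount to a simple predicate. The sketch below is illustrative only; the function names and the representation of a call as a (position, read depth) pair are assumptions.

```python
# Positions excluded from analysis: 523, 524 and 3107 are encoded as 'N',
# and 310 lies in a homopolymer region.
EXCLUDED_POSITIONS = {310, 523, 524, 3107}
MIN_READ_DEPTH = 100


def keep_position(position, read_depth):
    """True if a position passes both the blacklist and the coverage filter."""
    return position not in EXCLUDED_POSITIONS and read_depth >= MIN_READ_DEPTH


def filter_calls(calls):
    """Filter an iterable of (position, read_depth) pairs (hypothetical format)."""
    return [(pos, depth) for pos, depth in calls if keep_position(pos, depth)]
```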

We chose a threshold of 0.2 ΔHF in order to balance removing false positives against excluding a large number of mtSNVs unnecessarily (Fig. 11c). As part of this assessment, we looked at four correlations between different nuclear and mitochondrial features using mtSNVs assessed at increasing ΔHF cutoffs from 0.1 to 0.6 (Fig. 11, Table 7). In each of these four cases, raising ΔHF from 0.1 to 0.2 led to increasing correlation coefficients between the two features. Three of the correlations that were not significant at 0.1 ΔHF became significant at higher ΔHF, suggesting that some mtSNVs with lower HF values may be either false positives or low-level tissue-specific heteroplasmies. Any further increases in ΔHF had differing effects on the four correlations.

mtDNA copy number

Mitochondrial copy number per cell (MCN) was calculated using the equation: (mitochondrial coverage/nuclear coverage) x 2, using nuclear coverage data from the whole genome alignment 7 and mitochondrial coverage data calculated by bedtools genomecov (v2.24.0) 55. The mitochondrial mutation rate per megabase DNA was calculated by dividing the number of mtSNVs by the tumour MCN multiplied by the number of callable bases, 16565, accounting for the 4 positions that were removed.
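The two calculations in this paragraph can be written out as follows. This is a minimal sketch with hypothetical function names; the factor of 10^6 used to express the rate per megabase is an assumption, since the text does not state the scaling explicitly.

```python
# Callable bases after removing the 4 filtered positions, as stated above.
CALLABLE_BASES = 16565


def mitochondrial_copy_number(mito_coverage, nuclear_coverage):
    """MCN per cell: (mitochondrial coverage / nuclear coverage) x 2."""
    return (mito_coverage / nuclear_coverage) * 2


def mutation_rate_per_mb(n_mtsnvs, tumour_mcn):
    """mtSNVs divided by (tumour MCN x callable bases).

    Assumption: multiplied by 1e6 to express the result per megabase.
    """
    return n_mtsnvs / (tumour_mcn * CALLABLE_BASES) * 1e6
```

The factor of 2 reflects the two copies of the nuclear genome per cell, which is why the absence of nuclear whole genome duplication in the cohort matters for this estimate.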

Survival and Statistical Analyses

The mtSNV data were compared to patient clinical features in the R statistical environment (v3.2.3). Binomial regression (age, PSA) and Chi-square tests (T-category, Gleason Score) were used to identify associations between the clinical variables and mtSNVs for all 384 patients. Survival analyses were performed on 165 patients due to survival data availability. Cox proportional hazards models were used to calculate HRs for mtSNVs in the different mitochondrial features such as genes or MCN, with verification of the proportional hazards assumption. The mitochondrial feature MT-ND4L was removed from this analysis as only one patient in the 165 cohort had a mtSNV in this gene. Change in 10 year percent survival was calculated using survival rates. Kaplan-Meier plots were created comparing biochemical recurrence with the presence or absence of mutations in certain mitochondrial loci (genes or noncoding regions) or median-dichotomized tumour MCN. Nuclear genomic features were chosen based on recurrence in a previous prostate cancer study 7. Data was visualized using the R-environment and lattice (v0.20-31), latticeExtra (v0.6-26) and circos (v0.67-4) 56. Associations between nuclear and mitochondrial genome features were calculated using Spearman's correlation.
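The analyses above were run in R. Purely as an illustration of what a Kaplan-Meier curve for biochemical recurrence computes, the following Python sketch implements the product-limit estimator for one group of patients. The function name and input layout are hypothetical, and it is not a substitute for the R survival tooling used in the study.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of recurrence-free survival.

    times  : follow-up time for each patient
    events : 1 if biochemical recurrence was observed, 0 if censored
    Returns a list of (time, survival) pairs, one per distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_at_this_time = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_at_this_time += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        # Both recurrences and censored patients leave the risk set.
        at_risk -= n_at_this_time
    return curve
```

Running the estimator separately for patients with and without an mtSNV in a given locus gives the two curves that a Kaplan-Meier plot compares.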

PCR Validation

Single nucleotide variants in mitochondrial DNA were validated by Sanger re-sequencing, as previously reported 7. Briefly, 10 ng of total genomic DNA (including mitochondrial DNA) was subjected to PCR amplification using primer pairs flanking SNVs identified from whole-genome sequencing (Table 3). Sequence data surrounding the region of interest was obtained from http://www.mitomap.org/bin/view.pl/MITOMAP/HumanMitoSeq. The amplicon sequence generated by the in silico PCR was then entered into the NCBI genome BLAST search engine to identify non-mitochondrial sequences that were similar. This was done to ensure that there were some differences between the designed primers and nuclear sequences, as well as to identify any sequence regions that could confound downstream analyses. The genome used for the BLAST search was GRCh38.p2 reference assembly top-level. These web pages were used on August 20th and 21st, 2015 and verified on September 13th, 2016. PCR reactions were purified using the QIAquick PCR purification kit (Qiagen, Toronto, Canada). Sanger re-sequencing was performed using amplicon-specific primers on an ABI 3730XL capillary electrophoresis instrument (Thermo Fisher Scientific, Burlington, Canada) at The Centre for Applied Genomics, Hospital for Sick Children, Toronto, Canada.

Data Availability Statement

Sequencing data is available at the European Genotype-Phenotype Archive (EGA) repository under accession EGAS00001001782.

Results

Mitochondrial Genome Sequence Analysis

We collected 384 tumours from patients with localized prostate cancer, comprising 164 EOPCs and 220 late-onset prostate cancers (LOPC; Table 4; Fig. 5). The LOPC patients represented the three NCCN risk groups: 19 low-risk, 151 intermediate-risk and 36 high-risk. The average sequencing depth of the mitochondrial genome was 13,577x, allowing extremely sensitive mutation detection. This cohort does not include any nuclear whole genome duplication events, as demonstrated by SNP microarray analysis 7. We first evaluated the mitochondrial copy number (MCN) for each sample from the sequencing coverage of the mitochondrial and nuclear genomes. MCN ranged from 75 to 1405 (mean: 431) across the cohort, and was strongly associated with age (linear model, p = 1.67 x 10^-26), as well as with clinical indices such as T-category (ANOVA, p = 6.01 x 10^-3) and Gleason Score (GS; ANOVA, p = 6.46 x 10^-3; Fig. 6). We next conservatively identified mitochondrial SNVs (mtSNVs) as those positions that had an absolute difference in their heteroplasmy fraction (ΔHF) between purity-adjusted tumour and paired-normal samples of at least 0.20 (Fig. 5). Because the number of identified mtSNVs is dependent on the heteroplasmy fraction threshold, we chose to balance false positives and false negatives with an intermediate value. There were 293 mtSNVs across all patients, with 47.4% of tumours (182/384) harbouring at least one and 6.8% (26/384) harbouring three or more (Figure 1a; Table 5). Proportions of patients with 0, 1, 2 and ≥ 3 mtSNVs are 202/384 (52.6%), 110/384 (28.6%), 46/384 (12.0%) and 26/384 (6.8%), respectively. The number of patients with no mtSNVs was greater than expected by chance, suggesting significant variability in mtSNV burden (permutation test; p = 3.4 x 10^-5). Tumours with a larger number of mitochondria were more likely to have an mtSNV (generalized linear model (GLM) family binomial; p = 8.38 x 10^-7).

mtSNVs were associated with tumour size (T-category; χ2 test; p = 2.47 x 10^-4), but not other clinical prognostic indices like pre-treatment PSA and GS (Figure 1a). PCR followed by Sanger sequencing validated 18/25 predicted mtSNVs (Fig. 7; Fig. 8; Table 1), suggesting precision of ~75%, comparable to somatic indel detection accuracy 27.

Frequently Mutated Mitochondrial Loci

The noncoding control region of the mitochondria (mtDNA positions: 1-576 and 16024-16569) was the most frequently mutated region, with 15.4% (59/384) of tumours harbouring mutations in that region (Table 5; Fig. 9). The control region comprises several elements, including the heavy- and light-strand promoters, as well as the origins of replication for the heavy strand (OHR), two hypervariable regions (HV1, HV2) and three conserved sequence blocks (CSB1, CSB2, CSB3). All functional locations were defined from mitomap.org 28. Of these regions, HV1 was the most frequently mutated (mtDNA positions: 16024-16383). Overall, mutation rates were generally consistent across regions of the mitochondrial genome (Figure 1b).

There were 157 mtSNVs in the 13 protein coding genes, 82% (129/157) of which were nonsynonymous, including 6 premature stop codons and two mutated stop codons. The most frequently mutated protein coding gene was ND5 (30/157). We identified 21 specific positions mutated in at least two patients (Figure 1c): ten within the control region, eight in protein-coding regions and three in rRNA subunits. Of the coding mutations, seven were non-synonymous and one introduced a premature stop codon. In the control region, position 16093 - a common site of tissue specific heteroplasmy 29,30 - was the most frequently mutated position (nine patients; Figure 1c). Of protein-coding genes, ND1 was frequently mutated, with two patients having G3946A mutations (ΔHF: 0.63, 0.24), leading to a structure-disrupting E214K amino acid change, resulting in a reduction of complex assembly 31. A second mutation, G4142A, was found in two patients (ΔHF: 1.0, 0.21; R279Q) and a third mutation, G3842A, in three patients (ΔHF: 0.45, 0.21, 0.95; premature stop codon).

There were 22 mutations within mitochondrial tRNA genes, and eight of these were located within anticodon stems. In CO1 there were non-synonymous mutations at G5910A (A2T in one patient; ΔHF: 0.84) and T6664C (I254T in one patient; ΔHF: 0.46), two amino acids previously observed to be mutated in prostate cancer cells 20. Two patients with mutations at position 6419 were detected within the CO1 gene (ΔHF: 0.2, 0.23), although these two showed heteroplasmy within the normal samples and homoplasmy in the tumour, suggesting that these mtSNVs represent either tissue-specific heteroplasmy or mutations that have gone to fixation in the tumour. Overall CO1 was mutated in 4.7% (18/384) of patients, markedly lower than the 11% rate previously reported 20.

Age effect on the distribution of mtSNVs in prostate cancer

As expected, the occurrence of mitochondrial mutations was strongly associated with patient age (GLM family binomial; p = 5.88 x 10^-9; Figure 1a) 23-26. The mitochondrial mutation rate was significantly lower than the nuclear genome mutation rate (Figure 2a; p = 0.040, F-test), which may in part be explained by differential mutation detection accuracy in the two genomes. To further understand the association of mtSNVs with age, we separated patients into those 50 years of age and under (EOPC; n = 164) and those over 50 (LOPC; n = 220). The median ages of the EOPC and LOPC cohorts were 47 and 63.5 years old, respectively. Patients with EOPC were significantly more likely to have no mitochondrial mutations (117/164, 71.3%) than those with LOPC (85/220, 38.6%; p = 4.22 x 10^-10, proportion test; Figure 2b, c). Despite this difference in mutational load, the two groups have similar distributions of mtSNVs across the mitochondrial genome, with the highest fraction of mtSNVs within the control region (Fig. 10). EOPC patients had about 224 fewer copies of the mitochondria than LOPC patients (Mann-Whitney test; p = 4.56 x 10^-30; Figure 2d). This effect was inverted in the normal samples, with EOPC patients having 86 more copies (Mann-Whitney; p = 1.54 x 10^-14; Fig. 6d), consistent with the decline in lymphocyte MCN with age 33.

Associations between mtSNVs and nuclear genomic mutations
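The EOPC versus LOPC comparison reported in the age-effect analysis (117/164 versus 85/220 patients without mtSNVs) can be checked with a two-sample proportion test. The sketch below assumes a pooled normal approximation with a continuity correction; the text says only "proportion test", so the exact variant used is an assumption, and the function name is hypothetical.

```python
import math


def two_proportion_test(x1, n1, x2, n2, correct=True):
    """Two-sided test for equality of two proportions (normal approximation).

    Returns (z, p_value). With correct=True a continuity correction is
    applied to the difference in proportions (an assumption about the test).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    diff = abs(p1 - p2)
    if correct:
        diff = max(0.0, diff - 0.5 * (1 / n1 + 1 / n2))
    z = diff / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(z / math.sqrt(2))
    return z, p_value


z, p = two_proportion_test(117, 164, 85, 220)
print(f"z = {z:.2f}, p = {p:.2e}")
```

With the continuity correction, the resulting p-value is of the same order of magnitude as the value reported above.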

Intriguingly, mutations in the large rRNA subunit (RNR2) were significantly correlated with mutations in the mitochondrial gene ND4 (Spearman's ρ = 0.19; p = 0.00015), suggesting to us an interplay between different mutational types. To rigorously assess this phenomenon, we studied mutational associations between the nuclear and mitochondrial genomes. We exploited a set of 40 candidate nuclear somatic driver events recently identified through recurrence analyses, including six methylation events, six non-coding SNVs, five coding SNVs, five measures of mutational density, ten genomic rearrangements and eight copy number aberrations (CNAs) 7. The SNVs included recurrent coding SNVs in genes that are commonly mutated in prostate cancer, as well as the six most recurrent non-coding SNVs. To characterize per-region mtSNVs, we defined 22 mutational features representing the broad functional aspects of the mitochondria: 13 protein coding genes, 2 rRNAs, tRNAs (treated as one group), the control region and 3 subregions within the control region, along with mtSNV number and MCN. For each of the nuclear features, we evaluated their correlation to the 22 mitochondrial mutational features in 194 LOPCs with nuclear mutational data (Table 6). We detected multiple nuclear-mitochondrial mutational associations (Figure 3a). For example, SNVs in FOXA1 were significantly positively correlated with multiple mitochondrial features, as were SNVs in MED12. Nuclear-mitochondrial correlations were weakly dependent on the ΔHF threshold used to call mtSNVs (Fig. 11, Table 7).
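The pairwise associations described here use Spearman's rank correlation. As a rough illustration of how the coefficient is computed for two per-patient feature vectors (for example, presence or absence of a nuclear event and of an mtSNV in a mitochondrial region), the following Python sketch ranks each vector with average ranks for ties and takes the Pearson correlation of the ranks. The function names are hypothetical and significance testing is not included.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)
```

Binary features produce many ties, which is why average ranks are used rather than the shortcut formula based on squared rank differences.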

One prominent nuclear-mitochondrial mutational interaction was the co-occurrence of MYC copy number gain and mtSNVs within the OHR (Figure 3b). Mutations within the OHR may dysregulate mtDNA replication, while MYC induces mitochondrial biogenesis by activating genes required for mitochondrial function 34 and influences metabolic plasticity in cancer stem cells 35. Risk of biochemical failure (BCR) after primary definitive treatment by radiotherapy or surgery was significantly higher for patients whose tumours harboured both MYC CNAs and OHR mtSNVs relative to those with neither or one of these two mutations, suggesting a synergistic mitochondrial-nuclear effect on disease aggression (Figure 3c). Several other similar instances of apparent synergistic mitochondrial-nuclear effects on disease aggression were observed (Fig. 12a-e), suggesting that this is a common phenomenon in prostate cancer. While we have used the region defined as OHR (mtDNA positions: 110-441) as the mitochondrial feature, this subregion of the Control Region significantly overlaps with a region defined as HV2 (mtDNA positions: 57-372). We confirmed that HV2 mtSNVs show the same synergistic effect with MYC CNAs as mtSNVs defined as OHR (Fig. 12f). Interestingly, MYC CNAs were more common in LOPCs (14.5%; 29/200) than in EOPCs (8.4%; 10/119), making it impossible to assess if the same nuclear-mitochondrial interactions occur in both disease states. Further evaluation of changes in nuclear-mitochondrial associations across disease progression will be revealing.
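The overlap between OHR and HV2 is easy to see when positions are annotated against the coordinate ranges quoted in this document. The sketch below uses only those stated ranges (control region 1-576 and 16024-16569, HV1 16024-16383, HV2 57-372, OHR 110-441); coordinates for CSB1 and the other elements are not given here and are deliberately left out. The dictionary layout and function name are hypothetical.

```python
# Coordinate ranges as quoted in the text (1-based, inclusive).
REGIONS = {
    "control_region": [(1, 576), (16024, 16569)],
    "HV1": [(16024, 16383)],
    "HV2": [(57, 372)],
    "OHR": [(110, 441)],
}


def regions_for_position(position):
    """Return the sorted names of all listed regions containing a position."""
    return sorted(
        name
        for name, spans in REGIONS.items()
        if any(start <= position <= end for start, end in spans)
    )
```

A position such as 200 falls in both HV2 and OHR, so an mtSNV there contributes to both features, whereas position 400 is assigned to OHR only.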

Clinical impact of mtSNVs in prostate cancer

The recurrence of mitochondrial mutations in specific regulatory regions and their association with prognostic nuclear mutations strongly suggested their ability to drive disease aggression. We therefore systematically evaluated the association of individual mitochondrial somatic mutational features with disease aggression in 165 patients with clinical follow-up using Cox proportional hazards modeling. Of our 22 mitochondrial mutational features (Figure 3a), four were significantly associated with biochemical relapse rates (Figure 4a; Table 2): mutations in CSB1, OHR, ATP8 and HV1. We should note that MT-ND4L was not included in this analysis as only one patient of the 165 had a mtSNV in this gene. To evaluate if these mutations were independent prognostic variables, we employed multivariable modeling to adjust for age, pre-treatment PSA, T-category and GS. After adjustment, mtSNVs in HV1 were associated with better patient outcome (Figure 4b; Hazard Ratio, HR = 0.28, 95% CI = 0.08-0.9, p = 0.032, Wald test), while mtSNVs in OHR were associated with significantly worse patient outcome (Figure 4c; HR = 2.47, 95% CI = 1.13-5.38, p = 0.023, Wald test).

These data suggested that mtSNVs might comprise a novel way to predict patient outcome. We therefore assessed the ability of a multi-mtSNV signature to identify patients at elevated risk for biochemical failure (who therefore might benefit from treatment intensification) and those at low risk (who might therefore be appropriate for surveillance protocols). Using leave-one-out cross-validation and univariate feature-selection, we created a three-class signature that separated patients into three distinct risk groups for biochemical failure (Fig. 13a). The signature identified both patients at elevated risk (Figure 4d; HR = 3.41, 95% CI = 1.71-6.80, p = 0.0005, Wald test) and patients at low risk (HR = 0.23, 95% CI = 0.08-0.65, p = 0.005, Wald test). These effects are independent of clinical features: when we considered only the clinically-homogeneous NCCN intermediate risk group, the same mtSNV signature again separated three groups with distinct risk profiles (Fig. 14). The cross-validation method identified seven loci (CO2, CO3, ATP8, HV1, OHR, CSB1, ND4L) as informative for classification. Patients with mtSNVs in {CO2, CO3, HV1} were classified as low-risk and patients with mtSNVs in {ATP8, OHR, ND4L, CSB1} were classified as high-risk. To show that this does not lead to over-fitting, we chose the three most frequently mutated regions of the seven (CO3, HV1, OHR), which also clearly separated patients into three groups (Fig. 13b).

Discussion

The mitochondrial mutational landscape of cancer has been relatively unexplored. Previous work has shown that a large-scale mtDNA deletion has predictive value for prostate biopsy outcomes 36, suggesting the feasibility of mtDNA-based molecular tests. We identify a large number of mtSNVs in localized prostate cancer. These mutations show complex interplay with nuclear mutational characteristics, and appear to work together with them to drive tumour aggressiveness. Mitochondrial mutations also show associations with risk of biochemical relapse. Interestingly, mtSNVs within the control region can be associated with conflicting outcomes; however, when the region was separated into its noncoding subregions (HV1, OHR), we found that certain loci were associated with better outcomes and others with worse outcomes. The overlap of the OHR and HV2 within the control region and their association with MYC CNAs highlight the need for a better understanding of the functions of the control region 37. In future, treating the control region as a set of distinct regulatory regions may provide further insight into the roles of these regions, as well as any contribution they may make towards tumour aggression. We note that the number of pairs of nuclear-mitochondrial mutational features tested may elevate false-positive rates, and it will be key to perform validation studies in larger cohorts to verify their effect sizes and biological significance.

The differences observed in the mitochondrial mutational profiles of EOPC and LOPC patients show a need to better understand the association between mtSNVs and aging and how this may relate to the development of prostate cancer. While the mitochondrial copy number of matched-normal samples decreases with patient age, a previously observed trend 33, tumour MCN estimates were significantly higher in older patients, which could account for the higher frequency of mtSNVs in these patients. However, since the majority of the samples in each age group come from different research centres, this striking difference in tumour MCN will require further investigation to exclude any confounding effects. Further studies will be needed to assess when different mtSNVs occur during tumour evolution, their timing relative to common nuclear mutations and the effects of these mutations on mitochondrial function. This will more clearly identify the mitochondrial mutations that are important for mitochondrial-nuclear communication and how they may interact to drive tumour formation. Localized prostate cancer remains the most diagnosed non-skin cancer in men, and identification of aggressive disease remains an urgent clinical dilemma. Addition of mtSNVs to prognostic biomarkers may be an effective way of improving prediction of patient outcome, supporting triage of patients with low-risk disease to surveillance protocols and of those with high-risk disease to adjuvant therapy regimens.

All documents disclosed herein, including those in the following reference list, are incorporated by reference. Although preferred embodiments of the invention have been described herein, it will be understood by those skilled in the art that the detailed description and the specific examples, while indicating preferred embodiments of the invention, are given by way of illustration only, since various changes and modifications within the spirit and scope of the invention will become apparent to those skilled in the art from this detailed description.

REFERENCES

1. Lozano, R. et al. Global and regional mortality from 235 causes of death for 20 age groups in 1990 and 2010: a systematic analysis for the Global Burden of Disease Study 2010. Lancet 380, 2095-2128 (2012).

2. Barbieri, C. E. et al. The mutational landscape of prostate cancer. Eur. Urol. 64, 567-576 (2013).

3. Barbieri, C. E. et al. Exome sequencing identifies recurrent SPOP, FOXA1 and MED12 mutations in prostate cancer. Nat. Genet. 44, 685-689 (2012).

4. Cancer Genome Atlas Research Network. The Molecular Taxonomy of Primary Prostate Cancer. Cell 163, 1011-1025 (2015).

5. Baca, S. C. et al. Punctuated evolution of prostate cancer genomes. Cell 153, 666-677 (2013).

6. Berger, M. F. et al. The genomic complexity of primary human prostate cancer. Nature 470, 214-220 (2011).

7. Fraser, M. et al. Genomic hallmarks of localized, non-indolent prostate cancer. Nature 541, 359-364 (2017).

8. Weischenfeldt, J. et al. Integrative genomic analyses reveal an androgen-driven somatic alteration landscape in early-onset prostate cancer. Cancer Cell 23, 159-170 (2013).

9. Boutros, P. C. et al. Spatial genomic heterogeneity within localized, multifocal prostate cancer. Nat. Genet. 47, 736-745 (2015).

10. Cooper, C. S. et al. Analysis of the genetic phylogeny of multifocal prostate cancer identifies multiple independent clonal expansions in neoplastic and morphologically normal prostate tissue. Nat. Genet. 47, 367-372 (2015).

11. Erho, N. et al. Discovery and validation of a prostate cancer genomic classifier that predicts early metastasis following radical prostatectomy. PloS One 8, e66855 (2013).

12. Wu, C.-L. et al. Development and validation of a 32-gene prognostic index for prostate cancer progression. Proc. Natl. Acad. Sci. U. S. A. 110, 6121-6126 (2013).

13. Lalonde, E. et al. Tumour genomic and microenvironmental heterogeneity for integrated prediction of 5-year biochemical recurrence of prostate cancer: a retrospective cohort study. Lancet Oncol. 15, 1521-1532 (2014).

14. Wallace, D. C. Mitochondria and cancer. Nat. Rev. Cancer 12, 685-698 (2012).

15. Kumimoto, H. et al. Frequent somatic mutations of mitochondrial DNA in esophageal squamous cell carcinoma. Int. J. Cancer 108, 228-231 (2004).

16. Larman, T. C. et al. Spectrum of somatic mitochondrial mutations in five cancers. Proc. Natl. Acad. Sci. U. S. A. 109, 14087-14091 (2012).

17. McMahon, S. & LaFramboise, T. Mutational patterns in the breast cancer mitochondrial genome, with clinical correlates. Carcinogenesis 35, 1046-1054 (2014).

18. Chen, J. Z., Gokden, N., Greene, G. F., Mukunyadzi, P. & Kadlubar, F. F. Extensive somatic mitochondrial mutations in primary prostate cancer using laser capture microdissection. Cancer Res. 62, 6470-6474 (2002).

19. Gomez-Zaera, M. et al. Identification of somatic and germline mitochondrial DNA sequence variants in prostate cancer patients. Mutat. Res. 595, 42-51 (2006).

20. Petros, J. A. et al. mtDNA mutations increase tumorigenicity in prostate cancer. Proc. Natl. Acad. Sci. U. S. A. 102, 719-724 (2005).

21. Kloss-Brandstatter, A. et al. Somatic mutations throughout the entire mitochondrial genome are associated with elevated PSA levels in prostate cancer patients. Am. J. Hum. Genet. 87, 802-812 (2010).

22. McCrow, J. P. et al. Spectrum of mitochondrial genomic variation and associated clinical presentation of prostate cancer in South African men. Prostate 76, 349-358 (2016).

23. Cortopassi, G. A. & Arnheim, N. Detection of a specific mitochondrial DNA deletion in tissues of older humans. Nucleic Acids Res. 18, 6927-6933 (1990).

24. Corral-Debrinski, M., Shoffner, J. M., Lott, M. T. & Wallace, D. C. Association of mitochondrial DNA damage with aging and coronary atherosclerotic heart disease. Mutat. Res. 275, 169-180 (1992).

25. Zhang, C. et al. Differential occurrence of mutations in mitochondrial DNA of human skeletal muscle during aging. Hum. Mutat. 11, 360-371 (1998).

26. Michikawa, Y., Mazzucchelli, F., Bresolin, N., Scarlato, G. & Attardi, G. Aging-dependent large accumulation of point mutations in the human mtDNA control region for replication. Science 286, 774-779 (1999).

27. Alioto, T. S. et al. A comprehensive assessment of somatic mutation detection in cancer using whole-genome sequencing. Nat. Commun. 6, 10001 (2015).

28. Lott, M. T. et al. mtDNA Variation and Analysis Using Mitomap and Mitomaster. Curr. Protoc. Bioinforma. 44, 1.23.1-26 (2013).

29. Krjutskov, K. et al. Tissue-specific mitochondrial heteroplasmy at position 16,093 within the same individual. Curr. Genet. 60, 11-16 (2014).

30. Samuels, D. C. et al. Recurrent tissue-specific mtDNA mutations are common in humans. PLoS Genet. 9, e1003929 (2013).

31. Kervinen, M. et al. The MELAS mutations 3946 and 3949 perturb the critical structure in a conserved loop of the ND1 subunit of mitochondrial complex I. Hum. Mol. Genet. 15, 2543-2552 (2006).

32. He, Y. et al. Heteroplasmic mitochondrial DNA mutations in normal and tumour cells. Nature 464, 610-614 (2010).

33. Ding, J. et al. Assessing Mitochondrial DNA Variation and Copy Number in Lymphocytes of ~2,000 Sardinians Using Tailored Sequencing Analysis Tools. PLoS Genet. 11, e1005306 (2015).

34. Li, F. et al. Myc stimulates nuclearly encoded mitochondrial genes and mitochondrial biogenesis. Mol. Cell. Biol. 25, 6225-6234 (2005).

35. Sancho, P. et al. MYC/PGC-1α Balance Determines the Metabolic Phenotype and Plasticity of Pancreatic Cancer Stem Cells. Cell Metab. 22, 590-605 (2015).

36. Robinson, K. et al. Accurate prediction of repeat prostate biopsy outcomes by a mitochondrial DNA deletion assay. Prostate Cancer Prostatic Dis. 13, 126-131 (2010).

37. Nicholls, T. J. & Minczuk, M. In D-loop: 40 years of mitochondrial 7S DNA. Exp. Gerontol. 56, 175-181 (2014).

38. Larson, D. E. et al. SomaticSniper: identification of somatic point mutations in whole genome sequencing data. Bioinformatics 28, 311-317 (2012).

39. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078-2079 (2009).

40. Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010).

41. Rausch, T. et al. DELLY: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics 28, i333-i339 (2012).

42. Govind, S. K. et al. Shatterproof: operational detection and quantification of chromothripsis. BMC Bioinformatics 15, 78 (2014).

43. Masella, A. P. et al. BAMQL: a query language for extracting reads from BAM files. BMC Bioinformatics 17, 305 (2016).

44. Calabrese, C. et al. MToolBox: a highly automated pipeline for heteroplasmy annotation and prioritization analysis of human mitochondrial variants in high-throughput sequencing. Bioinformatics 30, 3115-3117 (2014).

45. Wu, T. D. & Watanabe, C. K. GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics 21, 1859-1875 (2005).

46. Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792-1797 (2004).

47. Behar, D. M. et al. A 'Copernican' reassessment of the human mitochondrial DNA tree from its root. Am. J. Hum. Genet. 90, 675-684 (2012).

48. Song, S. et al. qpure: A Tool to Estimate Tumor Cellularity from Genome-Wide Single-Nucleotide Polymorphism Profiles. PLOS ONE 7, e45835 (2012).

49. Guo, Y. et al. The use of Next Generation Sequencing Technology to Study the Effect of Radiation Therapy on Mitochondrial DNA Mutation. Mutat. Res. 744, 154-160 (2012).

50. Ju, Y. S. et al. Origins and functional consequences of somatic mitochondrial DNA mutations in human cancer. eLife 3, (2014).

51. Li, B. et al. Automated inference of molecular mechanisms of disease from amino acid substitutions. Bioinformatics 25, 2744-2750 (2009).

52. Adzhubei, I., Jordan, D. M. & Sunyaev, S. R. Predicting functional effect of human missense mutations using PolyPhen-2. Curr. Protoc. Hum. Genet. 7, Unit7.20 (2013).

53. Rubino, F. et al. HmtDB, a genomic resource for mitochondrion-based human variability studies. Nucleic Acids Res. 40, D1150-1159 (2012).

54. Putz, J., Dupuis, B., Sissler, M. & Florentz, C. Mamit-tRNA, a database of mammalian mitochondrial tRNA primary and secondary structures. RNA 13, 1184-1190 (2007).

55. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841-842 (2010).

56. Krzywinski, M. I. et al. Circos: An information aesthetic for comparative genomics. Genome Res. (2009).