Title:
POLYGENIC RISK SCORES FOR PREDICTING DISEASE COMPLICATIONS AND/OR RESPONSE TO THERAPY
Document Type and Number:
WIPO Patent Application WO/2019/237209
Kind Code:
A1
Abstract:
Methods, processes, and systems for predicting a subject's disease complications and/or response to therapy are described herein. The methods generally comprise genotyping or receiving genotyping information from the subject at a plurality of risk alleles associated with the disease and at a plurality of ancestry-informative markers. The genotyping information is used to generate a polygenic risk score (PRS) by weighting the number of risk alleles by the effect size of their association (weighted genetic risk score or wGRS), combined with a geo-ethnic principal component (PC) determined from the subject's genotype at said ancestry-informative markers. The PRS enables better prediction of the subject's disease complications and/or response to therapy, as compared to a corresponding PRS generated lacking the geo-ethnic principal component. Computer-implemented methods and processes are also described herein.

Inventors:
HAMET PAVEL (CA)
TREMBLAY JOHANNE (CA)
Application Number:
PCT/CA2019/050848
Publication Date:
December 19, 2019
Filing Date:
June 14, 2019
Assignee:
OPTI THERA INC (CA)
SERVIER LAB (FR)
International Classes:
C12Q1/68; G16B5/00; G01N33/50; G16B20/00
Domestic Patent References:
WO2014134970A1 2014-09-12
Foreign References:
US20100070455A1 2010-03-18
US20070059722A1 2007-03-15
Other References:
See also references of EP 3807883A4
Attorney, Agent or Firm:
ROBIC, LLP (CA)
Claims:
CLAIMS:

1. A method for predicting a subject’s disease complications and/or response to therapy, the method comprising:

(a) genotyping said subject at a plurality of risk alleles associated with the disease;

(b) genotyping said subject at a plurality of ancestry-informative markers; and

(c) generating a polygenic risk score (PRS) by weighting the number of risk alleles by the effect size of their association (weighted genetic risk score or wGRS), combined with a geo-ethnic principal component (PC) determined from the subject’s genotype at said ancestry-informative markers, such as generating the PRS by comparing the genomic and ancestral profile of said subject with a matching cohort of subjects with outcome data collected during a selected period of interest,

wherein the PRS enables prediction of said subject’s disease complications and/or response to therapy.

2. The method of claim 1, wherein the wGRS is determined by the equation

$wGRS_{ki} = \sum_{j} (X_{kij} \times \beta_{kj})$,

where $X_{kij}$ is the allele frequency of the ith subject at the jth SNP for the kth phenotype, and $\beta_{kj}$ is the corresponding effect size for the phenotype.

3. The method of claim 1 or 2, wherein the PRS further comprises one or more clinical components.

4. The method of claim 3, wherein the clinical components comprise said subject’s sex, age, age of onset of said disease, duration of said disease, or any combination thereof.

5. The method of any one of claims 1 to 4, wherein said plurality of risk alleles comprise at least 100, 200, 300, 400, 500 or 600 different single nucleotide polymorphisms (SNPs).

6. The method of any one of claims 1 to 5, wherein said plurality of ancestry-informative markers comprise at least 1000, 2000, 3000, 4000, 5000, 10 000, 15 000, 20 000, or 30 000 different SNPs.

7. The method of any one of claims 1 to 6, wherein the disease is diabetes and said subject has been diagnosed with diabetes.

8. The method of claim 7, wherein said PRS distinguishes subjects who benefit the most from intensive antihypertensive and/or glucose lowering therapy.

9. The method of claim 7 or 8, wherein said PRS enables the prediction of diabetic patients with increased risk for vascular complications and/or cardiovascular mortality.

10. The method of any one of claims 7 to 9, wherein said disease complications comprise macroalbuminuria, new or worsening nephropathy, new or worsening retinopathy, doubling serum creatinine, major microvascular, major macrovascular, myocardial infarction, stroke, heart failure, all causes of death, cardiovascular death, or any combination thereof.

11. The method of any one of claims 7 to 10, wherein said plurality of risk alleles comprise the SNPs set forth in Fig. 6.

12. The method of any one of claims 7 to 11, wherein said plurality of risk alleles comprise at least 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550 or 600 different SNPs selected from: rs10923931, rs2779116, rs340874, rs7578597, rs243021, rs7593730, rs552976, rs7578326, rs1801282, rs4607103, rs11708067, rs1470579, rs10010131, rs7754840, rs1800562, rs9472138, rs2191349, rs864745, rs1799884, rs4607517, rs972283, rs4737009, rs896854, rs13266634, rs10811661, rs13292136, rs12779790, rs1111875, rs2334499, rs231362, rs2237892, rs5215, rs1552224, rs1387153, rs10830963, rs1153188, rs1531343, rs7961581, rs7957197, rs7998202, rs11634397, rs8042680, rs9939609, rs1046896, rs855791, rs3127553, rs2815752, rs7531118, rs1514175, rs4130548, rs11165623, rs1555543, rs984222, rs1011731, rs633715, rs543874, rs2820292, rs4846567, rs6429082, rs2867125, rs6755502, rs713586, rs887912, rs6545714, rs2890652, rs10195252, rs2176040, rs6784615, rs6795735, rs3849570, rs2325036, rs13078807, rs6440003, rs9816226, rs1516725, rs10938397, rs2112347, rs4836133, rs6861681, rs1294421, rs806794, rs206936, rs16894959, rs6905288, rs987237, rs943005, rs9400239, rs9491696, rs2489623, rs1055144, rs10968576, rs6163, rs7903146, rs4929949, rs10840100, rs10767658, rs10767664, rs2293576, rs3817334, rs7121446, rs718314, rs7138803, rs1443512, rs4771122, rs12429545, rs10132280, rs12885454, rs10150332, rs7144011, rs4776970, rs2241423, rs2531992, rs12444979, rs12446632, rs2650492, rs7498665, rs7359397, rs1549293, rs1558902, rs7239883, rs6567160, rs571312, rs29941, rs3810291, rs16996700, rs4823006, rs11579312, rs4072037, rs914615, rs13427836, rs17346504, rs16827742, rs11678190, rs11678190, rs13079877, rs1077216, rs7634770, rs13160548, rs12719264, rs17738155, rs2110904, rs4722909, rs17301329, rs7851726, rs1109861, rs17343073, rs1801239, rs6602163, rs12764441, rs3740393, rs7922045, rs729014, rs10899033, rs649529, rs2303658, rs7145202, rs1728897, rs1528472, rs231227, rs6513791, rs2828785,
rs1800615, rs12136063, rs267734, rs3850625, rs2802729, rs807601, rs1260326, rs6546838, rs13538, rs4667594, rs2712184, rs6795744, rs2861422, rs347685, rs9682041, rs10513801, rs17319721, rs228611, rs11959928, rs11959928, rs7759001, rs881858, rs9472135, rs316009, rs3127573, rs3750082, rs848490, rs7805747, rs7805747, rs3758086, rs1731274, rs4744712, rs1044261, rs10994860, rs163160, rs963837, rs4014195, rs10774021, rs10491967, rs7956634, rs1106766, rs653178, rs716877, rs626277, rs476633, rs2453533, rs2467853, rs491567, rs1394125, rs4293393, rs13329952, rs164748, rs11657044, rs8068318, rs12460876, rs11666497, rs6088580, rs17216707, rs4821467, rs17367504, rs17367504, rs848309, rs4360494, rs112557609, rs3889199, rs2932538, rs2289081, rs11690961, rs74181299, rs11689667, rs1250259, rs13082711, rs3774372, rs419076, rs419076, rs871606, rs1458038, rs13107325, rs78049276, rs146853253, rs13139571, rs1566497, rs17059668, rs1173771, rs10057188, rs31864, rs1799945, rs805303, rs185819, rs11154027, rs36083386, rs449789, rs1322639, rs76206723, rs17477177, rs6557876, rs35783704, rs2071518, rs4454254, rs72765298, rs4373814, rs1813353, rs9337951, rs10826995, rs4590817, rs932764, rs11191548, rs7129220, rs381815, rs11442819, rs2289125, rs633185, rs8258, rs11222084, rs10770612, rs73099903, rs7312464, rs17249754, rs139236208, rs3184504, rs3184504, rs10850411, rs12434998, rs9323988, rs1378942, rs56249585, rs7500448, rs7226020, rs62080325, rs17608766, rs12940887, rs57927100, rs7236548, rs2116941, rs2206815, rs1327235, rs6081613, rs6015450, rs6015450, rs73161324, rs12628032, rs12037222, rs4420065, rs4129267, rs2794520, rs12239046, rs1260326, rs6734238, rs511154, rs1800789, rs2522056, rs4705952, rs6901250, rs13233571, rs9987289, rs10745954, rs1183910, rs340029, rs2847281, rs4420638, rs12027135, rs4660293, rs2479409, rs2131925, rs7515577, rs629301, rs1689800, rs2642442, rs4846914, rs514230, rs1367117, rs4299376, rs7570971, rs2972146, rs2290159, rs645040, rs442177,
rs6450176, rs9686661, rs12916, rs6882076, rs3757354, rs3177928, rs2814944, rs9488822, rs605066, rs1564348, rs12670798, rs2072183, rs17145738, rs4731702, rs11776767, rs1495741, rs12678919, rs2081687, rs2293889, rs2954029, rs11136341, rs581080, rs1883025, rs9411489, rs10761731, rs2255141, rs2923084, rs10128711, rs3136441, rs174546, rs12280753, rs964184, rs7941030, rs11220462, rs7134375, rs11613352, rs7134594, rs1169288, rs4759375, rs4765127, rs2929282, rs1532085, rs11649653, rs3764261, rs16942887, rs2000999, rs2925979, rs11869286, rs7206971, rs4148008, rs4129767, rs7241918, rs12967135, rs737337, rs10401969, rs2277862, rs2902940, rs6029526, rs6065906, rs181362, rs5756931, rs161802, rs225132, rs11206510, rs17114036, rs9970807, rs56170783, rs7528419, rs646776, rs646776, rs602633, rs12122341, rs11810571, rs6689306, rs12118721, rs13376333, rs1800594, rs10911021, rs35700460, rs17465637, rs17465637, rs67180937, rs585967, rs4299376, rs13407662, rs10176176, rs7568458, rs17678683, rs6725887, rs6725887, rs114123510, rs1250229, rs13003675, rs7623687, rs142695226, rs139016349, rs9818870, rs16851055, rs12493885, rs72627509, rs10857147, rs2634074, rs12646447, rs17042171, rs2200733, rs1906599, rs6843082, rs7678555, rs4593108, rs6841581, rs2306556, rs72689147, rs7692395, rs4975709, rs9369640, rs9349379, rs12526453, rs6909752, rs3130683, rs4472337, rs12205331, rs1544935, rs56015508, rs2916260, rs556621, rs632728, rs783396, rs12202017, rs12190287, rs6922269, rs2048327, rs10455872, rs10455872, rs2315065, rs6941513, rs4721377, rs10486776, rs12669789, rs11984041, rs11984041, rs7798197, rs2107595, rs2107595, rs10230207, rs112370447, rs11556924, rs11556924, rs264, rs2083636, rs2001846, rs2954029, rs6475606, rs1537370, rs4977574, rs4977574, rs2891168, rs10757278, rs1333047, rs1333049, rs514659, rs514659, rs532436, rs1887318, rs2505083, rs1870634, rs501120, rs1746048, rs1746048, rs1004467, rs11191416, rs12413409, rs11196288, rs10840293, rs3993105, rs2019090,
rs2839812, rs9326246, rs964184, rs12425791, rs2229357, rs2681472, rs3184504, rs10774625, rs653178, rs2238151, rs10744777, rs17696736, rs2244608, rs11057830, rs4304924, rs16945184, rs11617955, rs55940034, rs1924981, rs12435908, rs1005224, rs963474, rs10139550, rs72743461, rs1994016, rs1994016, rs3825807, rs7164479, rs7165042, rs7173743, rs2083460, rs247616, rs7193343, rs2106261, rs879324, rs7500448, rs4843416, rs113348108, rs2281727, rs9914266, rs4792143, rs12936587, rs9897596, rs35895680, rs4643373, rs7212798, rs8068952, rs6565653, rs1122608, rs1122608, rs8108632, rs56131196, rs28451064, rs9982601, rs9982601, rs2473248, rs4330912, rs72480273, rs61830764, rs7575873, rs1374204, rs2168443, rs11719201, rs10935733, rs900399, rs2724475, rs2131354, rs4432842, rs2946179, rs35261542, rs9379832, rs9368777, rs1187118, rs1415701, rs10872678, rs798489, rs11765649, rs6959887, rs62466330, rs13266210, rs6989280, rs12543725, rs12551019, rs3780573, rs1411424, rs4836833, rs10818797, rs2497304, rs79237883, rs740746, rs2421016, rs10830963, rs11055034, rs2306547, rs1351394, rs7964361, rs7998537, rs34217484, rs1819436, rs7402982, rs1011939, rs113086489, rs72833480, rs10402712, rs6040076, rs28530618, rs6016377, rs2229742, and rs134594.

13. The method of any one of claims 7 to 12, wherein said ancestry-informative markers are Caucasian ancestry-informative markers.

14. A method for treating a subject having diabetes, said method comprising predicting the subject’s disease complications and/or response to therapy as set forth in any one of claims 1 to 13, and beginning or modifying the anti-diabetes treatment of said subject based on said PRS.

15. A computer-implemented process of predicting a subject’s disease complications and/or response to therapy, the process comprising:

(a) inputting or receiving genotyping information from said subject at a plurality of risk alleles associated with the disease;

(b) inputting or receiving genotyping information from said subject at a plurality of ancestry-informative markers;

(c) generating a polygenic risk score (PRS) by weighting the number of risk alleles by the effect size of their association (weighted genetic risk score or wGRS), combined with a geo-ethnic principal component (PC) determined from the subject’s genotype at said ancestry-informative markers, such as generating the PRS by comparing the genomic and ancestral profile of said subject with a matching cohort of subjects with outcome data collected during a selected period of interest, wherein the PRS enables prediction of said subject’s disease complications and/or response to therapy; and

(d) optionally communicating said PRS to said subject and/or to said subject’s health care provider.

16. The computer-implemented process of claim 15, which is a cloud-based computer-implemented process.

17. The computer-implemented process of claim 15 or 16, said process comprising one or more features as defined in any one of claims 2 to 13.

18. A computer-implemented system for predicting a subject’s disease complications and/or response to therapy, the computer-implemented system comprising a computer configured to: (i) receive a subject’s genotyping information at a plurality of risk alleles associated with the disease and at a plurality of ancestry-informative markers; (ii) generate a polygenic risk score (PRS) by weighting the number of risk alleles by the effect size of their association (weighted genetic risk score or wGRS), combined with a geo-ethnic principal component (PC) determined from the subject’s genotype at said ancestry-informative markers, such as generating the PRS by comparing the genomic and ancestral profile of said subject with a matching cohort of subjects with outcome data collected during a selected period of interest, wherein the PRS enables prediction of said subject’s disease complications and/or response to therapy; and (iii) optionally communicate said PRS to said subject and/or to said subject’s health care provider.

19. The computer-implemented system of claim 18, which is a cloud-based computer-implemented system.

20. The computer-implemented system of claim 18 or 19, wherein said computer is configured to implement the method as defined in any one of claims 2 to 13, or the process of any one of claims 15 to 17.

21. A non-transitory computer-readable medium storing processor-executable instructions, the instructions when executed by a processor cause the processor to perform the method of: (i) receiving a subject’s genotyping information at a plurality of risk alleles associated with the disease and at a plurality of ancestry-informative markers; (ii) generating a polygenic risk score (PRS) by weighting the number of risk alleles by the effect size of their association (weighted genetic risk score or wGRS), combined with a geo-ethnic principal component (PC) determined from the subject’s genotype at said ancestry-informative markers, such as generating the PRS by comparing the genomic and ancestral profile of said subject with a matching cohort of subjects with outcome data collected during a selected period of interest, wherein the PRS enables prediction of said subject’s disease complications and/or response to therapy; and (iii) optionally outputting said PRS to a user.

22. The non-transitory computer-readable medium of claim 21, wherein said instructions when executed by a processor cause the processor to perform the method as defined in any one of claims 1 to 13, the computer-implemented process as defined in any one of claims 15 to 17, or the computer-implemented system as defined in any one of claims 18 to 20.

23. A method for treating a subject having diabetes, said method comprising predicting the subject’s disease complications and/or response to therapy using a PRS generated by the computer-implemented process as defined in any one of claims 15 to 17, the computer-implemented system as defined in any one of claims 18 to 20, or by executing instructions stored on the non-transitory computer-readable medium as defined in claim 21 or 22.

Description:
POLYGENIC RISK SCORES FOR PREDICTING DISEASE COMPLICATIONS AND/OR RESPONSE TO THERAPY

The high incidence and increasing prevalence of type 2 diabetes (T2D) are among the greatest challenges in public health worldwide. Diabetes is the leading cause of cardiovascular and renal diseases that are both serious and costly 1. T2D decreases life expectancy by 5 to 10 years, resulting in excess deaths, particularly in younger age groups (60-69 years old) 2. Since the onset of vascular complications of diabetes can be postponed or partially prevented by early medical interventions that control glycaemia and blood pressure 3, improved risk prediction is becoming crucial for targeting the high-risk individuals who could benefit most from early prevention 3.

Over the last few decades, several clinical risk factors have been combined into clinical prediction tools for cardiovascular diseases, such as the Framingham Risk Score (FRS) and its many derivatives, some of which even included genetic data 4,5. Two prediction models exist for patients with newly diagnosed diabetes, both from the UK Prospective Diabetes Study (UKPDS) 6, but their major limitation is that they were developed many decades ago with the medications and therapeutic targets of that period. Several other prediction models were developed in populations with varying duration of diagnosed diabetes 7. The majority of these models predicted 5-year cardiovascular risk with predictors that include age, sex, duration of diabetes, HbA1c, and smoking. The areas under the receiver operating characteristic (ROC) curve (AUC) were reported to range from 0.68 to 0.85 for these models, but only a minority of them have been validated and tested for their predictive accuracy 7. Data from the ADVANCE trial were also used to develop a model for cardiovascular risk prediction in people with T2D 8. The risk factors included in the ADVANCE model are age at diagnosis and known duration of diabetes, sex, pulse pressure, treated hypertension, atrial fibrillation, retinopathy, HbA1c, urinary albumin/creatinine ratio and non-HDL cholesterol. The ADVANCE risk engine reached an AUC of 0.70 for major cardiovascular events occurring over a period of 4.5 years 8 and was replicated (AUC=0.69) in an independent set of patients with T2D, the DIABHYCAR cohort.

There is an increasing interest in the use of genetic variants to predict the risk of diseases 9. A successful application of genome-wide association studies (GWAS) has been the identification of multiple common variants associated with complex traits such as T2D 10-12 and renal and cardiovascular diseases 13. Taken individually, these genetic variants account for only a small effect size. Combinations of hundreds or even thousands of genetic variants into polygenic risk scores (PRS) were recently introduced into models used to predict individual risk of diseases 14,15, and combining PRS with clinical risk scores somewhat improved the predictive power of the model 4. However, there remains a need for methods for predicting a subject’s disease complications and/or response to therapy, particularly in T2D, where early risk prediction can lead to early medical interventions in the subjects who could benefit the most.

SUMMARY

Our objective was to develop a polygenic risk score (PRS) with high predictive value for complications of type 2 diabetes and other diseases. Genetic variants encompassing the main risk factors of diabetes complications were selected from publicly available GWAS data. A PRS was generated by weighting the number of risk alleles by the effect size of their association, combined with a geo-ethnic principal component as an individualized genomic background. Its predictive value was tested in Caucasian subjects of the ADVANCE trial. The PRS was a significant predictor of micro- and macrovascular complications, and of total and cardiovascular mortality, and its performance improved with the inclusion of sex, age of onset and diabetes duration in the model. The AUC for prediction of cardiovascular death for this enhanced PRS was 0.720 (95% CI, 0.688-0.752), compared to 0.650 (95% CI, 0.617-0.683) for the Framingham risk score and 0.597 (95% CI, 0.563-0.631) for the UK Prospective Diabetes Study score in the same population. While the highest risk of macrovascular events and death (total and cardiovascular) was seen in older patients with high PRS, the risk of microvascular events, including renal events, was highest in patients with high PRS and early onset of diabetes. High PRS patients had the greatest relative risk reduction with the combined therapy of ADVANCE, with a number needed to treat of 12 to prevent one cardiovascular death over 4.5 years (p=0.0062); this benefit persisted during the ADVANCE-ON post-trial observational study. Strikingly, the PRS described herein outperformed clinical scores in the earlier identification of diabetic patients at increased risk of incident and prevalent vascular and renal complications and mortality, highlighting the clinical utility of the present invention in targeting high-risk individuals who would benefit the most from early therapy.
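As a reading aid for the number needed to treat (NNT) quoted above: by its standard definition, the NNT is the reciprocal of the absolute risk reduction (ARR). The ARR is not reported in this passage; the value below is inferred from the stated NNT of 12 and is therefore only approximate.

```latex
\mathrm{NNT} = \frac{1}{\mathrm{ARR}}
\quad\Longrightarrow\quad
\mathrm{ARR} \approx \frac{1}{12} \approx 0.083
```

Under that reading, the combined therapy would correspond to roughly 8 fewer cardiovascular deaths per 100 high-PRS patients treated over 4.5 years.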

In some aspects, described herein is a method for predicting a subject’s disease complications and/or response to therapy, the method comprising: (a) genotyping said subject at a plurality of risk alleles associated with the disease; (b) genotyping said subject at a plurality of ancestry-informative markers; and (c) generating a PRS by weighting the number of risk alleles by the effect size of their association (weighted genetic risk score or wGRS), combined with a geo-ethnic principal component (PC) determined from the subject’s genotype at said ancestry-informative markers such as generating the PRS by comparing the genomic and ancestral profile of said subject with a matching cohort of subjects with outcome data collected during a selected period of interest, wherein the PRS enables prediction of said subject’s disease complications and/or response to therapy (e.g., based on data from the matching cohort).
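Step (c) can be sketched in a few lines of Python. This is a minimal illustration, not the implementation described in this specification: the effect sizes, the model coefficients, and the choice of a logistic link for combining the wGRS with the first principal component (PC1) are assumptions made for the example, and the function names are hypothetical.

```python
import math

def weighted_grs(allele_counts, effect_sizes):
    """Sum of risk-allele counts (0, 1 or 2 per SNP) weighted by effect size."""
    if allele_counts.keys() != effect_sizes.keys():
        raise ValueError("allele counts and effect sizes must cover the same SNPs")
    return sum(allele_counts[snp] * effect_sizes[snp] for snp in allele_counts)

def predicted_risk(wgrs, pc1, intercept, coef_wgrs, coef_pc1):
    """Combine the wGRS with PC1 in a logistic model (assumed form)."""
    linear = intercept + coef_wgrs * wgrs + coef_pc1 * pc1
    return 1.0 / (1.0 + math.exp(-linear))

# Hypothetical subject genotyped at three SNPs; all numbers are illustrative.
counts = {"rs7903146": 2, "rs9939609": 1, "rs340874": 0}
betas = {"rs7903146": 0.30, "rs9939609": 0.10, "rs340874": 0.05}

wgrs = weighted_grs(counts, betas)
risk = predicted_risk(wgrs, pc1=0.02, intercept=-2.0, coef_wgrs=1.5, coef_pc1=3.0)
print(f"wGRS = {wgrs:.2f}, predicted risk = {risk:.3f}")
```

In practice the coefficients would be estimated from a matching cohort with outcome data, as the paragraph above describes, rather than fixed by hand.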

In some aspects, described herein is a method for treating a subject having diabetes, the method comprising predicting the subject’s disease complications and/or response to therapy (e.g., based on data from the matching cohort) as described herein, and beginning or modifying the anti-diabetes treatment of said subject based on said PRS.

In some aspects, described herein is a computer-implemented process of predicting a subject’s disease complications and/or response to therapy, the process comprising: (a) inputting or receiving genotyping information from said subject at a plurality of risk alleles associated with the disease; (b) inputting or receiving genotyping information from said subject at a plurality of ancestry-informative markers; (c) generating a PRS by weighting the number of risk alleles by the effect size of their association (weighted genetic risk score or wGRS), combined with a geo-ethnic principal component (PC) determined from the subject’s genotype at said ancestry-informative markers, such as generating the PRS by comparing the genomic and ancestral profile of said subject with a matching cohort of subjects with outcome data collected during a selected period of interest, wherein the PRS enables prediction of said subject’s disease complications and/or response to therapy (e.g., based on data from a matching cohort); and (d) optionally communicating said PRS to said subject and/or to said subject’s health care provider.

In some aspects, described herein is a computer-implemented system for predicting a subject’s disease complications and/or response to therapy, the computer-implemented system comprising a computer configured to: (i) receive a subject’s genotyping information at a plurality of risk alleles associated with the disease and at a plurality of ancestry-informative markers; (ii) generate a PRS by weighting the number of risk alleles by the effect size of their association (weighted genetic risk score or wGRS), combined with a geo-ethnic principal component (PC) determined from the subject’s genotype at said ancestry-informative markers such as generating the PRS by comparing the genomic and ancestral profile of said subject with a matching cohort of subjects with outcome data collected during a selected period of interest, wherein the PRS enables prediction of said subject’s disease complications and/or response to therapy (e.g., based on data from the matching cohort); and (iii) optionally communicate said PRS to said subject and/or to said subject’s health care provider.

In some aspects, described herein is a non-transitory computer-readable medium storing processor-executable instructions, the instructions when executed by a processor cause the processor to perform the method of: (i) receiving a subject’s genotyping information at a plurality of risk alleles associated with the disease and at a plurality of ancestry-informative markers; (ii) generating a PRS by weighting the number of risk alleles by the effect size of their association (weighted genetic risk score or wGRS), combined with a geo-ethnic principal component (PC) determined from the subject’s genotype at said ancestry-informative markers; and (iii) optionally outputting said PRS to a user.

In some aspects, described herein is a method for treating a subject having diabetes, the method comprising predicting the subject’s disease complications and/or response to therapy using a PRS generated by a computer-implemented process, a computer-implemented system, or by executing instructions stored on the non-transitory computer-readable medium, as described herein.

General Definitions

Headings, and other identifiers, e.g., (a), (b), (i), (ii), etc., are presented merely for ease of reading the specification and claims. The use of headings or other identifiers in the specification or claims does not necessarily require the steps or elements be performed in alphabetical or numerical order or the order in which they are presented.

The use of the word “a” or “an” when used in conjunction with the term “comprising” in the claims and/or the specification may mean “one” but it is also consistent with the meaning of “one or more”, “at least one”, and “one or more than one”.

As used in this specification and claim(s), the words “comprising” (and any form of comprising, such as “comprise” and “comprises”), “having” (and any form of having, such as “have” and “has”), “including” (and any form of including, such as “includes” and “include”) or “containing” (and any form of containing, such as “contains” and “contain”) are inclusive or open-ended and do not exclude additional, unrecited elements or method steps.

As used herein, “subject” generally refers to a mammal, including primates, and particularly to a human.

Other objects, advantages and features of the present description will become more apparent upon reading of the following non-restrictive description of specific embodiments thereof, given by way of example only with reference to the accompanying drawings.

BRIEF DESCRIPTION OF THE FIGURES

In the appended figures:

Fig. 1: Characteristics of ADVANCE genotyped participants at baseline in comparison with the whole ADVANCE cohort. * The values of all characteristics for the whole ADVANCE cohort are extracted from Patel et al. 16, except for eGFR and UACR, which are extracted from Ninomiya T. et al. 17. Abbreviations: n: Number of patients, yr: Years, SD: Standard deviation, IQR: Interquartile range, BMI: Body mass index, HbA1c: Glycated hemoglobin, SBP: Systolic blood pressure, DBP: Diastolic blood pressure, eGFR: Estimated glomerular filtration rate based on CKD-EPI formula, UACR: Urinary albumin creatinine ratio.

Fig. 2: Stepwise approach for selection of SNPs included in the PRS. The approach of SNP selection to construct the PRS integrates public-access resources, including a continuously updated database of GWAS results, the NHGRI-EBI GWAS Catalog, and PubMed. Curating consists of extracting SNPs associated with one of the selected phenotypes at a given significance threshold from each GWAS and from pertinent PubMed literature including only Caucasian participants. Clustering: extracted SNPs are clustered according to patterns of linkage disequilibrium (LD) determined from a HapMap CEU 1000 genomes reference population that matches the European population of the selected SNPs, to yield a set of LD blocks. Matching: Tag SNPs identified from LD blocks were used for matching with SNPs from ADVANCE GWAS databases. SNPs retained were used to construct the PRS that is tested on ADVANCE genotyped participants. Abbreviation: NHGRI-EBI GWAS catalog: National Human Genome Research Institute-European Bioinformatics Institute catalog of published genome-wide association studies.

Fig. 3: Polygenic risk score construction from published data & testing on ADVANCE genotyped patients. The PRS creation was based on 27 risk predictors grouped into 9 risk groups, and weighted PRS testing was performed on 4098 ADVANCE genotyped subjects using an additive model.

Fig. 4: Optimization of the predictive metrics of the PRS and effect of different control groups on its performance on prevalent and incident cases of microvascular and macrovascular diseases and death. The logistic regression model used for this analysis included age of onset, diabetes duration, sex and PRS (PRS4). The “Controls” group is composed of subjects who do not have a specific phenotype at baseline or during the study but may have others; the “Super-controls” group is composed of subjects who had none of the complications at entry or during the whole ADVANCE trial; the “Normotensive controls” group is composed of Controls with exclusion of hypertensive subjects at baseline; and the “Clean normotensive controls” group is composed of subjects without hypertension at baseline who had none of the complications at entry into the ADVANCE study. Abbreviations: AUC: area under the curve, PPV: positive predictive value, NPV: negative predictive value.

Fig. 5: SNPs selected from the literature and divided into 9 risk groups of complications of T2D. SNPs are selected from the listed references as adapted from the model described by Ibrahim-Verbaas CA 18.

Fig. 6: List of 622 SNPs organized by risk groups. Abbreviations: SNP: Single nucleotide polymorphism, EAF: Effect Allele Frequency, T2D: Type 2 diabetes, HbA1c: Glycated hemoglobin, BMI: Body mass index, WC: Waist circumference, WHR: Waist hip ratio, MA: Microalbuminuria, UACR: Urinary albumin creatinine ratio, eGFR: Estimated glomerular filtration rate based on CKD-EPI formula, CREAT: Plasma creatinine, ESRD: End-stage renal disease, HTN: Hypertension, PP: Pulse pressure, SBP: Systolic blood pressure, CRP: C-reactive protein, HDL: High density lipoprotein, LDL: Low density lipoprotein, TC: Total cholesterol, TG: Triglycerides, AF: Atrial fibrillation, CAC: Coronary artery calcification, CAD: Coronary artery disease, CHD: Coronary heart disease, ICA: Intracranial aneurysm, MI: Myocardial infarction.

Fig. 7: Improvements in discrimination power of different outcomes when adding sex, age of onset of T2D & diabetes duration to genetic models (PRS to PRS4) and comparison with clinical models (Framingham and UKPDS). The polygenic risk score is composed of the weighted T2D complications genetic risk scores. AUCs and percentile-based confidence intervals were estimated from ROC curves and calculated from the predicted risks derived from the regression models. PRS = PC1 + genetic variants, PRS1 = sex + PRS, PRS2 = age of onset + sex + PRS, PRS3 = diabetes duration + sex + PRS, and PRS4 = age of onset + diabetes duration + sex + PRS. The controls used for each outcome did not have the specific phenotype at baseline or during the study but may have others (control group). The Framingham and UKPDS risk scores were calculated as described in the literature and tested for association with ADVANCE phenotypes using linear regression. Microalbuminuria is defined as urinary albumin creatinine ratio of 30 to 300 mg/g at the end of the study. Macroalbuminuria is defined as urinary albumin creatinine ratio of >300 mg/g at the end of the study. New or worsening nephropathy is defined as the development of macroalbuminuria, doubling of serum creatinine to a level of at least 200 μmol/L, end-stage renal disease (ESRD) defined as a need for dialysis or renal transplantation, or death due to renal disease. New or worsening retinopathy is defined as proliferative retinopathy, macular oedema or history of retinal photocoagulation therapy. Major microvascular events is a composite of ESRD and defined as requirement for renal replacement therapy, death induced by renal disease, requirement of retinal photocoagulation, or diabetes-related blindness in either eye. Major macrovascular events is a composite of nonfatal myocardial infarction, nonfatal stroke, or cardiovascular death. Abbreviations: n: Number of events, AUC: Area under the curve, CI: Confidence interval.

Fig. 8: Frequencies of myocardial infarction, stroke, major micro- & macrovascular events and cardiovascular & all cause death by PRS and age strata. Frequencies during the 4.5-year follow-up of ADVANCE of all cause death (A), cardiovascular death (B), major macrovascular events (C), major microvascular events (D), myocardial infarction (E) and stroke (F) by age and PRS thirds. The model used here includes the PRS only (see Fig. 7). The control group is composed of normotensive patients that did not have a specific outcome at any time during the study (normotensive controls). The trend testing was done within formal regression analysis using parametric method separately for different age categories and PRS strata. Major microvascular events is a composite of ESRD and defined as requirement for renal replacement therapy, death induced by renal disease, requirement of retinal photocoagulation, or diabetes-related blindness in either eye. Major macrovascular events is a composite of nonfatal myocardial infarction, nonfatal stroke, or cardiovascular death.

Fig. 9: Frequencies of cardiovascular death separated by sex during the 4.5-year follow-up of ADVANCE by age thirds and global PRS quintiles. The model used here includes PRS only (see Fig. 7). The global risk includes major macrovascular events (a composite of nonfatal myocardial infarction, nonfatal stroke or cardiovascular death), total renal events (including doubling of serum creatinine, macroalbuminuria event, new microalbuminuria and ESRD) and all-cause mortality. The controls used here are composed of subjects who had none of the complications at entry or during the whole ADVANCE trial (Super Controls). The non-parametric ANOVA was used to compare the slopes.

Fig. 10: Major microvascular and macrovascular events frequency by PRS and age of onset strata. Frequencies during the 4.5-year follow-up of ADVANCE of major microvascular (A) and major macrovascular (B) events by age of onset of T2D and PRS thirds. The model used here includes PRS only (see Fig. 7). P-value of interaction was calculated between PRS and age of onset of T2D. The controls used are normotensive subjects with no complication phenotype at baseline (normotensive controls). The trend testing was done within formal regression analysis using parametric method. Major microvascular and major macrovascular events are defined as in Fig. 7.

Fig. 11: Frequency of albuminuria and low eGFR events by PRS, age, age of onset of T2D and its duration strata. Frequencies during the 4.5-year follow-up of ADVANCE of albuminuria events (A-C) and total low eGFR (D-F) by age, diabetes duration, age of onset of T2D and PRS thirds. The model used here includes PRS only. The control group is composed of normotensive subjects that did not have albuminuria or low eGFR events at any time during the study. The trend testing was done within formal regression analysis using parametric method separately for different age categories and PRS strata. P-value of interaction was calculated between PRS and age, between PRS and diabetes duration, and between PRS and age of onset of T2D. Albuminuria event is defined as urinary albumin creatinine ratio > 30 mg/g by the end of the study. Total low eGFR is defined as estimated glomerular filtration rate < 60 ml/min/1.73 m² based on CKD-EPI formula at baseline or by the end of the study.

Fig. 12: Cumulative hazard of cardiovascular death stratified by both glucose and BP lowering treatment and risk. Adjusted cumulative hazard curves for 9.5-year cardiovascular death by combined intensive blood pressure and glucose lowering treatment arms in the high (red), medium (blue) and low (green) PRS4 thirds. Hazard ratios are estimated by Cox proportional hazards regression analysis. The PRS4 used here includes PRS, sex, age of diagnosis, and diabetes duration. The controls used are normotensive subjects. The difference between high, medium and low risk categories was highly significant (p<0.0001). The effect of BP and glucose lowering treatments was significant for individuals included in the high risk group (p=0.005 at year 4.5, end of ADVANCE trial, and p=0.025 at year 9.5, end of ADVANCE-ON follow-up). Abbreviation: HR: Hazard ratio.

Fig. 13: Effect of ADVANCE treatments on cardiovascular death in patients stratified into low (A) and high (B) risk groups by UKPDS, FRS and PRS4. The model used here includes PRS, sex, age of diagnosis and diabetes duration (see PRS4 in Fig. 7). The NNT is shown for patients who received the combined BP lowering and intensive glucose control treatments and the p values are for the differences in NNT between this group and the combined control group. Abbreviation: NNT: Number needed to treat, BP: Blood pressure, Per-Ind: Perindopril-Indapamide.

Fig. 14: Cumulative hazard plots of all cause death, cardiovascular death and end-stage renal disease stratified by PRS thirds in standard and intensive blood pressure and glucose treatment arms. Adjusted cumulative hazard curves for all cause death, cardiovascular death, and end-stage renal disease by standard and intensive blood pressure or glucose treatment arms in the high, medium and low PRS4 thirds. Hazard ratio was analyzed by Cox proportional hazards regression analysis. Red: high risk, blue: medium risk, green: low risk categories. The model used includes PRS, sex, age of diagnosis and diabetes duration (see PRS4 in Fig. 7). The control group includes normotensive subjects with no complications at baseline (normotensive controls). The effect of BP lowering treatment was significant only for individuals included in the high risk third for all cause death (p=0.046 at the end of ADVANCE trial and p=0.047 at the end of ADVANCE-ON) and cardiovascular death (p=0.009 at the end of ADVANCE trial and p=0.014 at the end of ADVANCE-ON). The effect of glucose treatment was observed only in individuals in the high risk third for end-stage renal disease (p=0.043 at the end of ADVANCE trial and p=0.026 at the end of ADVANCE-ON). Abbreviations: HR: Hazard ratio, AD: ADVANCE trial, AD-ON: ADVANCE-ON follow-up.

Fig. 15: Ethnic- and sex-specific calibration plots of PRS4 for cardiovascular and all cause death.

Fig. 16: Percentage of events by deciles of PRS4. OR and p-values were obtained by comparing the top 30% of the distribution with the remainder of the population. (A) Microalbuminuria; (B) Macroalbuminuria; (C) New or worsening nephropathy; (D) Myocardial infarction; (E) Stroke; (F) Heart failure; (G) Major Microvascular; (H) Major Macrovascular; (I) All cause death; and (J) Cardiovascular death events.

Fig. 17: Clustering of combined macrovascular disease risk by PRS4 using unsupervised hierarchical clustering algorithm. This clustering method identified three clusters of individuals with low (green; “g”), medium (orange; “o”), or high (red; “r”) risk for combined macrovascular risk representing 37.1%, 33.5%, and 29.4% respectively of ADVANCE patients. (A) The PRS4 values for each participant and each outcome were represented by Z-score (blue color: low risk score & red color: high risk score) in the heat map. (B) The incidence (%) of cardiovascular and all cause death were compared between the low and high risk clusters. UACR and eGFR values were determined in the three clusters and compared between the low and high risk clusters. Abbreviations: UACR: Urinary albumin creatinine ratio, eGFR: Estimated glomerular filtration rate based on CKD-EPI formula.

DETAILED DESCRIPTION

The present study selected a large number of SNPs within most loci determining risk factors of complications of T2D to construct a polygenic risk score (PRS), and compared its performance with two established clinical risk scores (Framingham risk score and the UKPDS risk engine). We tested these three risk scores on 4098 genotyped patients with T2D of Caucasian origin of the ADVANCE trial 16 · 19 , extended to its post-trial follow-up, ADVANCE-ON 20 . Our selection of informative genetic variants was based on publicly available results of meta-analyses of GWAS data. The risk alleles of the selected genetic variants were used to generate the PRS by weighting the number of risk alleles by the effect size of their association in the original association study. Ethnicity plays a role in the development of several diabetic complications and we recently reported 19 that a principal component (PC) analysis with several thousand SNPs can be used to stratify ADVANCE Caucasian participants into two main geo-ethnic groups. The first principal component (PC1) divided the individuals of Europe along an east-west gradient of Balto-Slavic and Germano-Celtic origins 19 . These two geo-ethnic groups exhibit different risk profiles for T2D complications, so we integrated PC1 into the PRS as an individualized genetic background (IGB). This PRS was then combined with age of onset and diabetes duration, the two diabetes-specific predictors used in the ADVANCE risk engine 8 , in order to define specific predictive models that were tested by the AUC of the ROC curves. We also investigated whether the PRS could be used to identify individuals who benefit most from the combined therapy administered in ADVANCE 21 .

Accordingly, in some embodiments, the present description relates to a method for predicting a subject’s disease complications and/or response to therapy, the method comprising: (a) genotyping said subject at a plurality of risk alleles associated with the disease; (b) genotyping said subject at a plurality of ancestry-informative markers; and (c) generating a polygenic risk score (PRS) by weighting the number of risk alleles by the effect size of their association (weighted genetic risk score or wGRS), combined with a geo-ethnic principal component (PC) determined from the subject’s genotype at said ancestry-informative markers (e.g., generating the PRS by comparing the genomic and ancestral profile of said subject with a matching cohort of subjects with outcome data collected during a selected period of interest), wherein the PRS enables prediction of said subject’s disease complications and/or response to therapy (e.g., based on data from the matching cohort). For greater clarity, the PRS of the subject may be generated by comparing the genomic and ancestral profile of the subject with a matching cohort of subjects with outcome data collected during a selected period of interest (e.g., a 5- or 10-year period). For example, the PRS determination may take into account multiple genomic and/or ancestral profile variants of a subject and match them to subjects in a database with similar genomic and/or ancestral profiles, and who did or did not develop the disease or its complications over the same period of time.

Advantageously, the PRSs described herein (e.g., generated in (c) above) may enable better and/or earlier prediction of the subject’s disease complications and/or response to therapy, as compared to a corresponding PRS generated lacking the geo-ethnic principal component determined from the subject’s genotype at said ancestry-informative markers. In some embodiments, the PRSs described herein may enable earlier prediction of the subject’s disease complications and/or response to therapy, as compared to an approved clinical risk score for the disease. For example, the PRSs described herein may enable better and/or earlier prediction of the subject’s disease complications and/or response to therapy as compared to Framingham Risk Score and/or UKPDS risk score, for example for one or more outcomes comprising microalbuminuria, macroalbuminuria, new or worsening nephropathy, new or worsening retinopathy, doubling serum creatinine, major microvascular, major macrovascular, myocardial infarction, stroke, heart failure, all cause death, cardiovascular death, or any combination thereof.

In some embodiments, the genotyping is performed from a biological sample from the subject, such as a blood or tissue sample. In some embodiments, the genotyping is performed by microarray analysis.

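In some embodiments, the wGRS is determined by the equation

$$\mathrm{wGRS}_{ik} = \sum_{j} \left( X_{ijk} \times \beta_{jk} \right),$$

where $X_{ijk}$ is the allele frequency of the $i$th subject in the $j$th SNP for the $k$th phenotype, and $\beta_{jk}$ is the effect size of the $j$th SNP for the $k$th phenotype.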
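As a minimal sketch of the weighted sum described above, the following Python function multiplies each risk-allele count by its effect size and adds the products for one subject and one phenotype. The function name, the SNP identifiers and the effect sizes are hypothetical and chosen only for illustration; treating an ungenotyped SNP as contributing zero is likewise an assumption, not something the description specifies.

```python
from typing import Mapping


def weighted_genetic_risk_score(
    allele_counts: Mapping[str, float],
    effect_sizes: Mapping[str, float],
) -> float:
    """Sum of risk-allele counts weighted by per-SNP effect size."""
    score = 0.0
    for snp, beta in effect_sizes.items():
        # Assumption: a SNP absent from the genotype contributes nothing.
        score += allele_counts.get(snp, 0.0) * beta
    return score


# Hypothetical identifiers and effect sizes, for illustration only.
effects = {"snp_a": 0.10, "snp_b": 0.25, "snp_c": 0.05}
genotype = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
score = weighted_genetic_risk_score(genotype, effects)
print(f"wGRS = {score:.2f}")
```

In practice one such score would be computed per subject and per phenotype, using effect sizes taken from the original association studies.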

In some embodiments, the PRS further comprises one or more clinical components (e.g., the subject’s sex, age, age of onset of said disease, duration of said disease, or any combination thereof). In some embodiments, the inclusion of one or more clinical components further increases the performance of the PRSs described herein for predicting the subject’s disease complications and/or response to therapy (e.g., based on data from a matching cohort).

In some embodiments, the plurality of risk alleles may comprise at least 100, 200, 300, 400, 500 or 600 different single nucleotide polymorphisms (SNPs) (e.g., known to be associated with T2D or another disease or outcome described). In some embodiments, the plurality of ancestry-informative markers may comprise at least 1000, 2000, 3000, 4000, 5000, 10 000, 15 000, 20 000, or 30 000 different SNPs.

In some embodiments, the diseases and/or disease complications suitable for the PRS-based prediction methods, processes, and systems generally described herein are those for which different geo-ethnic groups exhibit different risk profiles. Prediction of such diseases and/or disease complications is expected to benefit from PRSs generated by combining a weighted genetic risk score with a geo-ethnic principal component determined from a subject’s genotype at a plurality of ancestry-informative markers (e.g., based on data from a matching cohort), as described herein.

In some embodiments, the subject has been recently (or for the first time) diagnosed with diabetes (e.g., T2D). In some embodiments, the PRS distinguishes subjects who benefit the most from intensive antihypertensive and/or glucose lowering therapy. In some embodiments, the PRS enables the prediction of diabetic patients with increased risk for vascular complications and/or cardiovascular mortality.

In some embodiments, the method described herein may be useful for predicting a subject’s diabetes complications, such as macroalbuminuria, new or worsening nephropathy, new or worsening retinopathy, doubling serum creatinine, major microvascular, major macrovascular, myocardial infarction, stroke, heart failure, all causes of death, cardiovascular death, or any combination thereof.

In some embodiments, the plurality of risk alleles comprise the SNPs set forth in Fig. 6. In some embodiments, the plurality of risk alleles comprise at least 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550 or 600 different SNPs selected from: rsl092393 l, rs2779H6, rs340874, rs7578597, rs24302l, rs7593730, rs552976, rs7578326, rsl80l282, rs4607l03, rsl 1708067, rsl470579, rsl00l0l3 l, rs7754840, rs 1800562, rs9472l38, rs2l9l349, rs864745, rsl799884, rs46075 l7, rs972283, rs4737009, rs896854, rsl3266634, rsl08H66l, rsl3292l36, rsl2779790, rsl l l l875, rs2334499, rs23 l362, rs2237892, rs52l5, rsl 552224, rsl387l53, rsl0830963, rsl l53 l88, rsl53 l343, rs796l58l, rs7957l97, rs7998202, rsl 1634397, rs8042680, rs9939609, rsl046896, rs85579l, rs3 l27553, rs28l5752, rs753 l l l8, rsl5 l4l75, rs4l30548, rsl 1165623, rsl555543, rs984222, rslOH73 l, rs6337l5, rs543874, rs2820292, rs4846567, rs6429082, rs2867l25, rs6755502, rs7l3586, rs8879l2, rs65457l4, rs2890652, rsl0l95252, rs2l76040, rs67846l5, rs6795735, rs3849570, rs2325036, rsl3078807, rs6440003, rs98l6226, rsl5 l6725, rsl0938397, rs2l 12347, rs4836l33, rs686l68l, rsl29442l, rs806794, rs206936, rsl6894959, rs6905288, rs987237, rs943005, rs9400239, rs949l696, rs2489623, rsl055 l44, rsl0968576, rs6l63, rs7903 l46, rs4929949, rsl0840l00, rsl0767658, rsl0767664, rs2293576, rs38l7334, rs7l2l446, rs7l83 l4, rs7l38803, rsl4435 l2, rs477H22, rsl2429545, rsl0l32280, rsl2885454, rsl0l50332, rs7l440l l, rs4776970, rs224l423, rs253 l992, rsl2444979, rsl2446632, rs2650492, rs7498665, rs7359397, rsl549293, rsl558902, rs7239883, rs6567l60, rs57l3 l2, rs2994l, rs38l029l, rsl6996700, rs4823006, rsl l5793 l2, rs4072037, rs9l46l5, rsl3427836, rsl7346504, rsl6827742, rsl l678l90, rsl l678l90, rsl3079877, rsl0772l6, rs7634770, rsl3 l60548, rsl27l9264, rsl7738l55, rs2H0904, rs4722909, rsl730l329, rs785 l726, rsl 109861, rsl7343073, rsl80l239, rs6602l63, rsl276444l, rs3740393, rs7922045, rs7290l4, rsl0899033, rs649529, rs2303658, 
rs7l45202, rsl728897, rsl528472, rs23 l227, rs6513791, rs2828785, rsl8006l5, rsl2l36063, rs267734, rs3850625, rs2802729, rs80760l, rsl260326, rs6546838, rsl3538, rs4667594, rs27l2l84, rs6795744, rs286l422, rs347685, rs968204l, rsl05 l380l, rsl7319721, rs2286l l, rsl l959928, rsl l959928, rs775900l, rs88l858, rs9472l35, rs3 l6009, rs3l27573, rs3750082, rs848490, rs7805747, rs7805747, rs3758086, rsl73 l274, rs47447l2, rsl04426l, rsl0994860, rsl63160, rs963837, rs40l4l95, rsl077402l, rsl049l967, rs7956634, rsl 106766, rs653 l78, rs7l6877, rs626277, rs476633, rs2453533, rs2467853, rs49l567, rsl394l25, rs4293393, rsl3329952, rsl64748, rsl 1657044, rs80683 l8, rsl2460876, rsl 1666497, rs6088580, rsl72l6707, rs482l467, rsl7367504, rsl7367504, rs848309, rs4360494, rsl 12557609, rs3889l99, rs2932538, rs228908l, rsl 1690961, rs74l8l299, rsl 1689667, rsl250259, rsl30827l l, rs3774372, rs4l9076, rs4l9076, rs87l606, rsl458038, rsl3 l07325, rs78049276, rsl46853253, rsl3 l3957l, rsl566497, rsl7059668, rsl l7377l, rsl0057l88, rs3 l864, rsl799945, rs805303, rsl858l9, rsl l l54027, rs36083386, rs449789, rsl322639, rs76206723, rsl7477l77, rs6557876, rs35783704, rs207l5 l8, rs4454254, rs72765298, rs43738l4, rsl8l3353, rs933795 l, rsl0826995, rs45908l7, rs932764, rsl l l9l548, rs7l29220, rs38l8l5, rsl l4428l9, rs2289l25, rs633185, rs8258, rsl 1222084, rsl07706l2, rs73099903, rs7312464, rsl7249754, rsl39236208, rs3184504, rs3184504, rsl08504l l, rsl2434998, rs9323988, rsl378942, rs56249585, rs7500448, rs7226020, rs62080325, rsl7608766, rsl2940887, rs57927l00, rs7236548, rs2l 16941, rs22068l5, rsl327235, rs608l6l3, rs60l5450, rs60l5450, rs73 l6l324, rsl2628032, rsl2037222, rs4420065, rs4129267, rs2794520, rsl2239046, rsl260326, rs6734238, rs5 l l l54, rsl800789, rs2522056, rs4705952, rs690l250, rsl323357l, rs9987289, rsl0745954, rsl 183910, rs340029, rs284728l, rs4420638, rsl2027l35, rs4660293, rs2479409, rs2l3 l925, rs7515577, rs62930l, rsl689800, rs2642442, rs48469l4, rs514230, 
rsl367H7, rs4299376, rs757097l, rs2972l46, rs2290l59, rs645040, rs442l77, rs6450l76, rs968666l, rsl29l6, rs6882076, rs3757354, rs3 l77928, rs28l4944, rs9488822, rs605066, rsl564348, rsl2670798, rs2072l83, rsl7l45738, rs473 l702, rsl l776767, rsl49574l, rsl26789l9, rs208l687, rs2293889, rs2954029, rsl 1136341, rs58l080, rsl883025, rs94l 1489, rsl076l731, rs2255141, rs2923084, rsl0l287l 1, rs3136441, rsl74546, rsl2280753, rs964l84, rs794l030, rsl 1220462, rs7l34375, rsl 1613352, rs7l34594, rsl l69288, rs4759375, rs4765 l27, rs2929282, rsl532085, rsl l649653, rs3764261, rsl6942887, rs2000999, rs2925979, rsl 1869286, rs720697l, rs4l48008, rs4l29767, rs724l9l8, rsl2967l35, rs737337, rsl040l969, rs2277862, rs2902940, rs6029526, rs6065906, rsl8l362, rs575693 l, rsl6l802, rs225 l32, rsl l2065 l0, rsl7H4036, rs9970807, rs56l70783, rs75284l9, rs646776, rs646776, rs602633, rsl2l2234l, rsl 1810571, rs6689306, rsl2l 18721, rsl3376333, rsl800594, rsl09H02l, rs35700460, rsl7465637, rsl7465637, rs67l80937, rs585967, rs4299376, rsl3407662, rsl0l76l76, rs7568458, rsl7678683, rs6725887, rs6725887, rsl 14123510, rsl250229, rsl3003675, rs7623687, rsl42695226, rsl390l6349, rs98l8870, rsl685 l055, rsl2493885, rs72627509, rsl0857l47, rs2634074, rsl2646447, rsl7042l7l, rs2200733, rsl906599, rs6843082, rs7678555, rs4593l08, rs684l58l, rs2306556, rs72689l47, rs7692395, rs4975709, rs9369640, rs9349379, rsl2526453, rs6909752, rs3l30683, rs4472337, rsl220533 l, rsl544935, rs560l5508, rs29l6260, rs55662l, rs632728, rs783396, rsl22020l7, rsl2l90287, rs6922269, rs2048327, rsl0455872, rsl0455872, rs2315065, rs694l5 l3, rs472l377, rsl0486776, rsl2669789, rsl l98404l, rsl l98404l, rs7798l97, rs2l07595, rs2l07595, rsl0230207, rsl 12370447, rsl 1556924, rsl 1556924, rs264, rs2083636, rs200l846, rs2954029, rs6475606, rsl537370, rs4977574, rs4977574, rs289H68, rsl0757278, rsl333047, rsl333049, rs514659, rs514659, rs532436, rsl8873 l8, rs2505083, rsl870634, rs50H20, rsl746048, rsl746048, rsl004467, rsl 
1191416, rsl24l3409, rsl l l96288, rsl0840293, rs3993 l05, rs20l9090, rs28398l2, rs9326246, rs964l84, rsl242579l, rs2229357, rs268l472, rs3 l84504, rsl0774625, rs653 l78, rs2238l5 l, rsl0744777, rsl7696736, rs2244608, rsl l057830, rs4304924, rsl6945 l84, rsl l6l7955, rs55940034, rsl92498l, rsl2435908, rsl005224, rs963474, rsl0l39550, rs7274346l, rsl9940l6, rsl9940l6, rs3825807, rs7l64479, rs7l65042, rs7l73743, rs2083460, rs2476l6, rs7l93343, rs2l0626l, rs879324, rs7500448, rs48434l6, rsl 13348108, rs228l727, rs99l4266, rs4792l43, rsl2936587, rs9897596, rs35895680, rs4643373, rs72l2798, rs8068952, rs6565653, rsl l22608, rsl l22608, rs8l08632, rs56l3 H96, rs2845l064, rs998260l, rs998260l, rs2473248, rs43309l2, rs72480273, rs6l830764, rs7575873, rs 1374204, rs2l68443, rsl 1719201, rsl0935733, rs900399, rs2724475, rs2l3 l354, rs4432842, rs2946l79, rs3526l542, rs9379832, rs9368777, rsl 187118, rsl4l570l, rsl0872678, rs798489, rsl 1765649, rs6959887, rs62466330, rsl32662l0, rs6989280, rsl2543725, rsl255 l0l9, rs3780573, rsl4H424, rs4836833, rsl08 l8797, rs2497304, rs79237883, rs740746, rs242l0l6, rsl0830963, rsl l055034, rs2306547, rsl35 l394, rs796436l, rs7998537, rs342l7484, rsl8 l9436, rs7402982, rslOH939, rsl 13086489, rs72833480, rsl04027l2, rs6040076, rs285306l8, rs60l6377, rs2229742, and rsl34594

(Fig. 6).

In some embodiments, the ancestry-informative markers may be Caucasian ancestry-informative markers.

In some embodiments, the present description relates to a method for treating a subject having diabetes, the method comprising predicting the subject’s disease complications and/or response to therapy as set forth herein, and beginning or modifying the anti-diabetes treatment of said subject based on said PRS (e.g., intensive antihypertensive and/or glucose lowering therapy).

In some aspects, the present description relates to a computer-implemented process of predicting a subject’s disease complications and/or response to therapy, the process comprising: (a) inputting or receiving genotyping information from said subject at a plurality of risk alleles associated with the disease; (b) inputting or receiving genotyping information from said subject at a plurality of ancestry-informative markers; and (c) generating a polygenic risk score (PRS) by weighting the number of risk alleles by the effect size of their association (weighted genetic risk score or wGRS), combined with a geo-ethnic principal component (PC) determined from the subject’s genotype at said ancestry-informative markers, such as generating the PRS by comparing the genomic and ancestral profile of said subject with a matching cohort of subjects with outcome data collected during a selected period of interest (e.g., a 10-year period), wherein the PRS enables prediction of said subject’s disease complications and/or response to therapy (e.g., based on data from the matching cohort). In some embodiments, the computer-implemented process may further comprise (d) communicating said PRS to a user or party of interest (e.g., said subject and/or said subject’s health care provider).

In some embodiments, the computer-implemented process described herein may be or comprise a cloud-based computer-implemented process. In some embodiments, the computer-implemented process described herein may further include one or more features of the methods described herein.

In some aspects, the present description relates to a computer-implemented system for predicting a subject’s disease complications and/or response to therapy, the computer-implemented system comprising a computer configured to: (i) receive a subject’s genotyping information at a plurality of risk alleles associated with the disease and at a plurality of ancestry-informative markers; and (ii) generate a polygenic risk score (PRS) by weighting the number of risk alleles by the effect size of their association (weighted genetic risk score or wGRS), combined with a geo-ethnic principal component (PC) determined from the subject’s genotype at said ancestry-informative markers such as generating the PRS by comparing the genomic and ancestral profile of said subject with a matching cohort of subjects with outcome data collected during a selected period of interest, wherein the PRS enables prediction of said subject’s disease complications and/or response to therapy (e.g., based on data from the matching cohort). In some embodiments, the computer-implemented system may further comprise (iii) communicating the PRS to a user or party of interest (e.g., the subject and/or to the subject’s health care provider).

In some embodiments, the computer-implemented system described herein may be or comprise a cloud-based computer-implemented system.

In some embodiments, the computer-implemented system described herein may be configured to implement a method or process as described herein.
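The receive, generate and communicate steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the description says that predicted risks were derived from multivariable logistic regression over the PRS including PC1, but it gives no coefficients, so every coefficient below is a placeholder. The class and function names are also hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class SubjectProfile:
    wgrs: float               # weighted genetic risk score from the risk alleles
    pc1: float                # first ancestry principal component
    is_male: int              # 1 = male, 0 = female
    age_of_onset: float       # years
    diabetes_duration: float  # years


# Placeholder coefficients for illustration only. They are NOT taken from the
# source and would normally be estimated on a matching cohort.
HYPOTHETICAL_COEFFICIENTS = {
    "intercept": -3.0,
    "wgrs": 0.8,
    "pc1": 0.3,
    "is_male": 0.2,
    "age_of_onset": 0.01,
    "diabetes_duration": 0.03,
}


def predicted_risk(subject, coefficients=HYPOTHETICAL_COEFFICIENTS):
    """Logistic-model probability of an outcome for one subject."""
    linear = (
        coefficients["intercept"]
        + coefficients["wgrs"] * subject.wgrs
        + coefficients["pc1"] * subject.pc1
        + coefficients["is_male"] * subject.is_male
        + coefficients["age_of_onset"] * subject.age_of_onset
        + coefficients["diabetes_duration"] * subject.diabetes_duration
    )
    return 1.0 / (1.0 + math.exp(-linear))


def format_report(subject_id, risk):
    """Plain-text message for the subject or the health care provider."""
    return f"Subject {subject_id}: predicted risk {risk:.1%}"


subject = SubjectProfile(
    wgrs=1.2, pc1=-0.5, is_male=1, age_of_onset=52.0, diabetes_duration=8.0
)
risk = predicted_risk(subject)
message = format_report("demo-001", risk)
print(message)
```

The covariates mirror the PRS4 model of Fig. 7 (PRS, sex, age of onset and diabetes duration); dropping the clinical terms would correspond to the PRS-only model.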

In some aspects, the present description relates to a non-transitory computer-readable medium storing processor-executable instructions, the instructions when executed by a processor cause the processor to perform the method of: (i) receiving a subject’s genotyping information at a plurality of risk alleles associated with the disease and at a plurality of ancestry-informative markers; and (ii) generating a polygenic risk score (PRS) by weighting the number of risk alleles by the effect size of their association (weighted genetic risk score or wGRS), combined with a geo-ethnic principal component (PC) determined from the subject’s genotype at said ancestry-informative markers. In some embodiments, the processor-executable instructions, when executed by a processor, further comprise (iii) outputting the PRS to a user or party of interest (e.g., the subject and/or the subject’s health care provider).

In some embodiments, the processor-executable instructions, when executed by a processor, may cause the processor to perform a method, computer-implemented process, or computer-implemented system as described herein.

In some aspects, the present description relates to a method for treating a subject having diabetes, the method comprising predicting the subject’s disease complications and/or response to therapy using a PRS generated by a computer-implemented process, a computer-implemented system, or by executing instructions stored on the non-transitory computer-readable medium, as described herein.

EXAMPLES

Example 1: Methods

1.1 ADVANCE cohort and subset of genotyped patients

ADVANCE was a 2x2 factorial randomized controlled clinical trial of blood pressure (BP) lowering (perindopril-indapamide vs placebo) and intensive glucose control (gliclazide MR-based intensive intervention vs standard care) in patients with T2D. A total of 11,140 participants were recruited from 215 centers in 20 countries. They were 55 years of age or older and had T2D diagnosed after the age of 30 years. The trial was successful in decreasing total mortality by attenuation of combined microvascular and macrovascular outcomes with blood pressure control 16 and combination of blood glucose and blood pressure control 22 .

In the present study, a genotyped subset of 4098 T2D patients of Caucasian origin from the ADVANCE cohort was analysed. The baseline phenotypes (see Fig. 1) used for this study consist of age, gender, BMI, age at diagnosis of T2D, diabetes duration, HbA1c, systolic blood pressure (SBP), diastolic blood pressure (DBP), history of currently treated hypertension and renal function as determined by eGFR and UACR. ADVANCE-ON was a 4.5-year post-trial observational extension of ADVANCE conducted in 80% of subjects, which demonstrated that the benefits of intensive blood pressure control in reduction of mortality persisted 20 . The long-term benefit of intensive glycemic control persisted only in reduction of end-stage kidney disease during ADVANCE-ON, similarly to what has been reported in the long-term observation of the DCCT/EDIC trial on intensive therapy of type 1 diabetes 23 · 24 . Details of statistical analysis, genotyping and imputation as well as a stepwise approach for selection of SNPs associated to risk factors of complications of T2D are described in Examples 1.3-1.5.

Similarly, the creation of the polygenic risk score (PRS), incorporating geo-ethnicity/individualized genetic background (IGB), is detailed in Examples 1.5 and 1.6, as well as in Figs. 2 and 3. Finally, in order to optimize the predictive power of our PRS, we evaluated the impact of various control groups, since all subjects were diabetics, recruited for high risk of cardiovascular outcomes, independently of their initial blood pressure, including many with past events. Thus, as described in Example 1.7, we tested as “controls” subjects without the phenotype in question at baseline or during the study, with and without hypertension, and with or without any phenotype at baseline or throughout the ADVANCE trial (Fig. 4).

1.2 Clinical prediction using Framingham and UKPDS risk predictors

We did not calculate the ADVANCE clinical risk score 5 in our set of genotyped patients as the original ADVANCE prediction model was developed on the same source population. In a similar way, we did not select the SNPs of our PRS from ADVANCE GWAS data on which it was to be tested. As comparative clinical risk predictors, we used the popular Framingham and UKPDS predictive tools. The FRS includes age, sex, total cholesterol, HDL cholesterol, smoking status, diabetes, SBP, and blood pressure treatment as clinical risk factors 25 , while the UKPDS prediction model (UKPDS 56) includes age, sex, diabetes duration, total cholesterol, HDL cholesterol, SBP, smoking status and HbA1c 6 . We calculated the risk scores of our 4098 ADVANCE patients. These formulated risk scores were then used in our models to predict micro- and macrovascular complications and mortality. The PRS was calibrated overall and separately for both sexes and geo-ethnic background.

1.3 Statistical analyses

Descriptive summary statistics were computed, using frequencies (%) for categorical variables and mean (±SD) for continuous variables. A binomial test was used to compare the two proportions of categorical variables. AUCs with 95% confidence interval were calculated by fitting the multivariable logistic regression over the PRS, including genetically determined ethnicity (PC1). FRS and UKPDS engine were computed on ADVANCE trial subjects as was done in other studies 18 . We divided the study participants into thirds representing equal number of individuals carrying low, medium or high PRS values, thirds of age strata, thirds of age of onset of diabetes and thirds of diabetes duration, and analyzed the predictive performance of these strata on total and cardiovascular death, micro- and macrovascular events, myocardial infarction and stroke events. A proportion trend test 26 was used to calculate the trend p-values of the stratified data. We used the pROC package in R for the analysis of ROC curves 27 . The areas under the receiver operating characteristics (ROC) curves (AUCs; 95% confidence intervals), calculated from the predicted risks derived from the regression models, were used to assess the predictive performance of the PRS and the two clinical risk scores. The DeLong method was used to calculate the p-value for the comparison of two ROC curves 28 . PRS thirds and treatment effects were examined through cumulative hazards curves with the use of Cox proportional hazard models. The log-rank test was used to compare the cumulative hazards and the plots are shown over the period of 9.5 years (ADVANCE-ON) to examine post-trial effects of the intensive blood pressure-lowering and the intensive glucose therapies on cardiovascular death, all cause death and end-stage renal disease in the three genetic risk groups.
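Two of the steps above, splitting participants into thirds and computing an AUC from predicted risks, can be illustrated with a short Python sketch. It is a stand-in written for this illustration and not the analysis code of the study, which used the pROC package in R; the AUC here is computed through its rank-based (Mann-Whitney) definition, without confidence intervals or the DeLong comparison. The function names and the toy data are hypothetical.

```python
def auc(scores, outcomes):
    """Area under the ROC curve via the rank-based (Mann-Whitney) identity."""
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    controls = [s for s, y in zip(scores, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5  # ties count as half a concordant pair
    return wins / (len(cases) * len(controls))


def assign_thirds(scores):
    """Label each score 0 (low), 1 (medium) or 2 (high) by rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    labels = [0] * len(scores)
    for rank, index in enumerate(order):
        labels[index] = rank * 3 // len(scores)
    return labels


# Toy predicted risks and outcomes (1 = event), for illustration only.
risks = [0.10, 0.40, 0.35, 0.80, 0.70, 0.20]
events = [0, 0, 1, 1, 1, 0]
toy_auc = auc(risks, events)
thirds = assign_thirds(risks)
print(round(toy_auc, 3), thirds)
```

The pairwise loop is quadratic in cohort size, which is acceptable for a sketch; a rank-sum implementation would be preferable for thousands of subjects.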

1.4 Genotyping and imputation

Genotyping was performed using the Affymetrix Genome-Wide Human SNP Arrays 5.0 or 6.0 or the Affymetrix UK Biobank Axiom arrays (Affymetrix, Santa Clara, California, USA) following standard protocols recommended by the manufacturer. A quality control filtering step was applied to the genotype calls as described in our previous work 19. Additional quality control steps included coarse-grain stratification to ensure a Caucasian population ratio greater than 0.8 (STRUCTURE software 29), a genetic relatedness check to ensure independent samples (PLINK) and a sex check to ensure genetic accuracy and database integrity 30. Quality control was also performed on the final genotypes to remove any SNPs with more than 4% of missing values across the entire cohort and any samples with more than 2% of missing SNP genotypes. A more stringent threshold was used for SNPs with minor allele frequencies (MAF) between 1 and 5%: such low-MAF SNPs with more than 1% of missing values were removed prior to imputation. A total of 4098 samples passed these quality filters. Three sets of imputation were performed separately for the individuals genotyped on the Affymetrix 5.0, 6.0 or UK Biobank arrays, using SHAPEIT 31, IMPUTE2 software 32 and the 1000 Genomes project 33,34 phase 3 data set as reference. Only SNPs with an imputation quality score greater than or equal to 0.80 were kept, as proposed in other studies 35. A subset of 34,570 independent SNPs common to all three microchips was selected to perform a principal component (PC) 19 analysis for the ADVANCE study participants of Caucasian origin using the EIGENSOFT 3.0 package 36. The first principal component (PC1) separated the 4098 individuals along a geographical gradient from East (Balto-Slavic) to West (Germano-Celtic) Europe as described previously 19. The individual PC1 value was added to the 622 SNPs described below to create the PRS.
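The missingness thresholds described above can be sketched as a simple filter. This is only an illustration under stated assumptions: the function names and SNP identifiers are hypothetical, the 1 to 5% MAF band is treated as inclusive at both ends, and SNPs outside that band are given the general 4% threshold. In practice such filters are typically applied with dedicated genotype tools rather than hand-written code.

```python
def snp_passes_qc(missing_rate, maf):
    """Return True if a SNP passes the missingness filter.

    SNPs with MAF between 1% and 5% (inclusive here, by assumption) must have
    at most 1% missing genotypes; all other SNPs must have at most 4%.
    """
    if 0.01 <= maf <= 0.05:
        return missing_rate <= 0.01
    return missing_rate <= 0.04


def sample_passes_qc(missing_rate):
    """Return True if a sample has at most 2% missing SNP genotypes."""
    return missing_rate <= 0.02


def filter_snps(snp_stats):
    """Keep the identifiers of SNPs that pass QC.

    snp_stats maps a SNP identifier to a (missing_rate, maf) tuple.
    """
    return [
        snp_id
        for snp_id, (missing_rate, maf) in snp_stats.items()
        if snp_passes_qc(missing_rate, maf)
    ]


# Hypothetical SNPs: identifier -> (missing rate, minor allele frequency).
example_snps = {
    "snp_a": (0.030, 0.20),  # common SNP, 3% missing: kept
    "snp_b": (0.030, 0.03),  # low-MAF SNP, 3% missing: removed
    "snp_c": (0.005, 0.03),  # low-MAF SNP, 0.5% missing: kept
    "snp_d": (0.050, 0.40),  # common SNP, 5% missing: removed
}
kept_snps = filter_snps(example_snps)
```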

1.5 Stepwise approach for selection of SNPs associated to risk factors of complications of T2D

We selected 27 risk factors of vascular complications of T2D that we divided into 9 risk groups as initially suggested by Ibrahim-Verbaas 18 and modified as described in the table shown in Fig. 5. The 9 risk groups include SNPs associated to diabetes, obesity, blood pressure, albuminuria, glomerular filtration rate (GFR), biomarker levels, lipids, cardiovascular disease and birth weight. We identified most of the GWAS (as of October 10, 2017) that were reported in the NHGRI GWAS Catalog and, using HuGE Navigator, we extracted all SNPs together with their p-values and effect sizes (β) for the 27 risk factors listed in Fig. 5. We also performed manual literature curation and included additional SNPs by relaxing the p-value threshold of association if they had evidence of independent replication and/or were reported in meta-analyses published by major GWAS consortia for blood pressure, renal function, stroke and lipids. We then defined LD blocks and identified tag SNPs (SNPs that are in LD with the lead SNP contained in the block at R² > 0.8) using the HapMap CEU samples from 1000 Genomes phase 3. We matched the tag SNPs with SNPs in our genotype database. When a tag SNP could not be found in ADVANCE, the LD threshold was relaxed to R² > 0.7 and the closest LD proxy was selected. If no tag SNP could be found, the LD block was removed from the analysis. When several SNPs within a class of risk factors were located in the same locus, we selected the top SNP with the lowest p-value. In some cases, a SNP was associated with more than one trait and was thus included in more than one risk group. We identified a total of 622 SNPs (594 unique SNPs) (Figs. 5 and 6). Among other things, their rs number, genomic position, the risk group to which they belong and the published references that describe them are included in Supplementary table 3. Our stepwise strategy for selection of SNPs to be included in the PRS is illustrated in Figs. 2 and 3.

1.6 Creation of the PRS

To create a PRS we first constructed a weighted genetic risk score (wGRS) for each of the 27 aforementioned risk predictors over all study participants to evaluate the effect of the 622 SNPs (Fig. 5). We used the additive model, assuming that each SNP is independently associated with risk, to construct the wGRS (different SNPs contribute with different weights to the GRS value) according to the effect size (β) attributed to the tested SNPs in the original association study (Fig. 5).

We calculated the wGRS for these predictors, as previously described, by summing the product of the number of risk alleles for each patient by the effect size of those SNPs, i.e., $wGRS_i^k = \sum_j X_{ij}^k \beta_j^k$, where $X_{ij}^k$ is the allele frequency (number of risk alleles) of the $i$-th subject at the $j$-th SNP for the $k$-th phenotype and $\beta_j^k$ is the effect size of that SNP for the phenotype. Neither the number of loci nor the unit used is the same for the 27 predictors, so the wGRS had to be scaled by the sum of its effect coefficients and multiplied by the number of loci of that specific trait. With this scaling, each risk predictor has an equivalent weight at an equivalent number of loci (Fig. 5). Using the wGRS of these 27 predictors, we then formed 9 risk groups that were added to the PC1 value to constitute the PRS. The predictive performance of the PRS was computed alone and with the inclusion of sex (PRS1), sex and age of onset of diabetes (PRS2), sex and diabetes duration (PRS3), or sex and both ages (PRS4) in the model.
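The weighting and scaling steps described above can be sketched for a single subject and a single trait as follows. The function names and the example values are hypothetical, and the sketch assumes that effect sizes are oriented to the risk allele so that their sum is positive.

```python
def weighted_grs(risk_allele_counts, effect_sizes):
    """Raw wGRS: sum over SNPs of risk-allele count times effect size."""
    if len(risk_allele_counts) != len(effect_sizes):
        raise ValueError("one effect size is required per SNP")
    return sum(x * beta for x, beta in zip(risk_allele_counts, effect_sizes))


def scaled_weighted_grs(risk_allele_counts, effect_sizes):
    """wGRS divided by the sum of effect sizes and multiplied by the number of loci."""
    raw = weighted_grs(risk_allele_counts, effect_sizes)
    return raw / sum(effect_sizes) * len(effect_sizes)


# Hypothetical subject: risk-allele counts (0, 1 or 2) at three SNPs of one trait.
counts = [2, 1, 0]
betas = [0.1, 0.2, 0.3]
score = scaled_weighted_grs(counts, betas)
```

One consequence of this scaling is that, when all effect sizes of a trait are equal, the scaled score reduces to the plain count of risk alleles, so traits reported in different units end up on a comparable scale.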

1.7 Optimization of the predictive value of the PRS

It is known that the performance of a risk prediction tool depends not only on the definition of cases but also on that of the control group 16. Since high-risk patients were recruited in the ADVANCE study and many of them had multiple risk factors at study entry, we used different sets of controls to estimate the maximal predictive power (AUC max) of PRS4: 1) a control group defined as patients who did not have a specific phenotype either at baseline or during the study, but who may have had other manifestations; 2) a super-control group composed of subjects who had none of the complications at entry or during the whole ADVANCE trial; 3) a normotensive control group (controls no HT), defined as in 1) with exclusion of hypertensive subjects at baseline; and finally 4) a normotensive cleaned control group, composed of controls without hypertension who had none of the complications at baseline. As shown in Fig. 4, the performance of PRS4 as a classifier or predictor of prevalent or incident complications during the median 4.5-year trial period of ADVANCE was higher with super-controls, normotensive controls, or normotensive clean controls than with the control group. However, since the number of normotensive clean controls or super-controls compared to cases was too small, we reported data using either controls or normotensive controls unless indicated otherwise.

Example 2: Results

We analysed the data of 4098 Caucasian subjects from the total set of 11,140 subjects of the ADVANCE study. Their clinical characteristics at baseline are shown in Fig. 1 along with those of the entire set. A sizable difference in the sex ratio was caused by a higher proportion of males in the most numerous geo-ethnic group, patients of Celtic origin 19. During the median 4.5 years of the study, the genotyped patients had 334 microvascular and 559 macrovascular events (Fig. 1). During the same period, 192 myocardial infarcts, 154 strokes and 225 heart failures also occurred in these patients. Eight hundred fifty-one patients developed micro-albuminuria and 150 had macro-albuminuria, 198 had new or worsening nephropathy, 62 doubled their serum creatinine and 151 had new or worsening retinopathy. A total of 549 genotyped patients died (including 283 cardiovascular deaths) during the follow-up time of ADVANCE (Fig. 1).

2.1 Predictive performance of PRS compared to UKPDS and FRS scores

The AUCs of Fig. 7 represent the discrimination of cases, defined as having a phenotype, from controls that did not have the specific phenotype either at baseline or during the study but could have other manifestations, using the PRS models and the Framingham and UKPDS scores. The AUCs of the PRS alone were modest but significant for all T2D outcomes listed in Fig. 7, ranging from 0.536 (95%CI, 0.510-0.563) for all-cause death to 0.612 (95%CI, 0.567-0.657) for new or worsening retinopathy. This shows that a significant prediction can be achieved with a PRS based on a high number of genomic variants associated to risk factors and IGB. The AUCs of the PRS alone were higher than those obtained with the UKPDS score for major microvascular events (micro- and macroalbuminuria, new or worsening nephropathy and retinopathy, doubling of serum creatinine). The predictive performance of the PRS improved with the inclusion of sex (PRS1), sex and age of onset of diabetes (PRS2), sex and diabetes duration (PRS3), or sex, age of onset, and diabetes duration (PRS4). The AUC for prediction of cardiovascular death with PRS4 was 0.720 (95%CI, 0.688-0.752) compared to 0.650 (95%CI, 0.617-0.683) for the Framingham risk score and 0.597 (95%CI, 0.563-0.631) for UKPDS in the same population. Thus, PRS4 exhibited the best prediction of the risk of outcomes and mortality in T2D patients of Caucasian origin, outperforming the two popular Framingham and UKPDS risk scores without requiring the presence of any clinical manifestations or initial outcomes. PRS4 was well calibrated for cardiovascular death in the whole population (p=0.67, Hosmer-Lemeshow test for predicted vs. observed), with better fit for males (p=0.66) than females (p=0.48) and for Slavic (p=0.77) than Celtic (p=0.44) individuals. The calibration for total death was p=0.59 for the whole population, with better fit for females (p=0.83) than males (p=0.23) and equally good fit for Celtic (p=0.81) and Slavic (p=0.83) individuals (Fig. 15). In general, an AUC of 0.7 or greater is considered acceptable for prediction. The AUCs of PRS4 exceeded 0.70 for all incident and prevalent cases when a normotensive control group was used, underscoring the importance of hypertension, and reached 0.79 for cardiovascular death with a group of super-controls or normotensive clean controls (Fig. 4). The adjusted negative predictive values (NPV) of PRS4 were over 80% for most prevalent and incident cases and the adjusted positive predictive value (PPV) exceeded 0.60 for micro- and macrovascular prevalent and incident cases (Fig. 4).
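To make the AUC values above easier to interpret, the following minimal sketch computes an AUC as the probability that a randomly chosen case has a higher predicted risk than a randomly chosen control, with ties counted as one half. This pairwise definition is equivalent to the area under the empirical ROC curve; it is not the pROC implementation, and the function name and example risks are hypothetical.

```python
def auc_from_scores(case_scores, control_scores):
    """AUC as the fraction of case-control pairs ranked correctly (ties count 0.5)."""
    if not case_scores or not control_scores:
        raise ValueError("at least one case and one control are required")
    concordant = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                concordant += 1.0
            elif case == control:
                concordant += 0.5
    return concordant / (len(case_scores) * len(control_scores))


# Hypothetical predicted risks from a regression model.
cases = [0.9, 0.8, 0.4]
controls = [0.5, 0.3, 0.2]
auc = auc_from_scores(cases, controls)  # 8 of 9 pairs are ranked correctly
```

Under this reading, an AUC of 0.5 corresponds to no discrimination and an AUC of 1.0 to perfect separation of cases from controls.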

2.2 Impact of PRS, age, age of onset and diabetes duration on risk prediction

The significance of the age factor was higher than that of the predicted probability of the PRS for total death (p=2.1 × 10⁻²⁷ and 8.6 × 10⁻¹⁴, age and PRS respectively), cardiovascular death (p=6.0 × 10⁻²² and 4.8 × 10⁻¹⁵) and macrovascular events (p=2.7 × 10⁻¹⁶ and 3.1 × 10⁻¹⁴) occurring during the 4.5-year period of ADVANCE. When individuals were divided into equal thirds of PRS values and of age strata, the frequency of macrovascular events and of total and cardiovascular deaths increased across all age and PRS strata (trend p-values of PRS were significant at all age strata and those of age were significant at all PRS thirds) (Fig. 8). The highest risk was seen in older patients of the high PRS third. On the other hand, age and PRS had comparable capacity of prediction (p=10⁻⁵ for both) of microvascular events, while the prediction of myocardial infarction (p=3.8 × 10⁻⁶ and 2.8 × 10⁻⁵, PRS and age respectively) and stroke (p=7.9 × 10⁻¹⁴ and 7.4 × 10⁻⁷) was more significant with the PRS than with age, and the prediction with age was the highest in patients with high PRS. No interactions were noted between age and PRS on the prediction of micro- and macrovascular events, myocardial infarction and stroke (all p interactions >0.4).

We then calculated a PRS for global risk as defined in ADVANCE 37, and divided individuals into quintiles of their PRS values to assess the difference in cardiovascular death between lower and higher PRS quintiles, in men and women respectively. Fig. 9 shows that cardiovascular death increased across all PRS quintiles and age strata. The slopes of cardiovascular death as a function of PRS quintiles increased from 0.005 and 0.011 in men and women younger than 65 years to 0.010 and 0.025 in middle-aged men and women (65 to 70 years old) and to 0.015 and 0.037 in men and women older than 70 years (ANOVA of increasing slopes, p=0.004 for men and p=0.013 for women), suggesting an interaction between age, PRS values of global risk and cardiovascular death.

Zoungas et al. 38 reported that the best predictors of microvascular events were age of onset and diabetes duration in ADVANCE. We therefore analyzed micro- and macrovascular events as a function of PRS values and age of onset of T2D. Fig. 10 shows that the PRS was a better predictor (p=5.6 × 10⁻⁴ and 2.1 × 10⁻¹³) than age of onset (p=1.5 × 10⁻² and 2.0 × 10⁻²) of major microvascular and major macrovascular events, respectively. No significant interactions were noted between PRS and age of onset (p=0.19 for microvascular and 0.15 for macrovascular events). The predicted risk of microvascular events was significantly lower in the low PRS third (p=0.010 and p=0.017 comparing the low third to the middle and late age of onset thirds). The risk of microvascular events was lower in individuals with later onset of diabetes across all PRS groups, contrasting with macrovascular events, for which the highest risk was seen in the highest PRS group independently of the age of onset of diabetes. It is noteworthy that the stratification capacity of the PRS was best in patients with earlier onset of diabetes for both micro- and macrovascular events, as shown by the p trend values in Fig. 10.

As shown previously in ADVANCE 38, renal events are more dependent on age of onset of diabetes than on age itself. We further investigated the contribution of the PRS, age, diabetes duration and age of onset of diabetes to the prediction of renal events (albuminuria and low eGFR), which have themselves been shown to be independent predictors of cardiovascular and renal outcomes in T2D 17. The PRS was a more powerful predictor of development of albuminuria than age (PRS p=4.2 × 10⁻¹⁵ and age p=4.2 × 10⁻⁵) or diabetes duration (PRS p=6.0 × 10⁻¹⁴ and diabetes duration p=1.2 × 10⁻²). The prediction with the PRS was also more significant (p=4.2 × 10⁻¹⁴) than with age of onset (p=0.25). While no significant interactions were observed between diabetes duration and PRS (p=0.15) or age and PRS (p=0.07), a significant interaction was observed between the PRS and age of onset on the risk of albuminuria (p=0.02), such that the prediction capacity of the PRS was highest in patients with early onset of diabetes (younger than 56 years old). The risk stratification of developing albuminuria with the PRS was the most significant in younger patients (p trend=1.5 × 10⁻⁶), in patients who had longer diabetes duration (p trend=1.7 × 10⁻⁵) or in patients with early onset of diabetes (p trend=2.7 × 10⁻⁷) (Fig. 11A-C).

Age was a more important predictor of low eGFR than the PRS (age p value=3.4 × 10⁻³² and PRS p value=2.6 × 10⁻¹³), but the PRS had a higher predictive value (p=1.8 × 10⁻⁹) than diabetes duration (p=3.2 × 10⁻⁵) and a higher predictive value (p=3.4 × 10⁻¹²) than age of onset of diabetes (p=1.1 × 10⁻⁹). No interaction was noted between PRS and the three strata of age, diabetes duration and age of onset (all p interaction values >0.2). These results suggest that age is a dominant predictor of low eGFR, while for albuminuria the PRS is the major predictor, particularly in young subjects or at the onset of diabetes (Fig. 11D-F). This is an important characteristic of the prediction with the PRS, highlighting its clinical utility for targeting individuals at high risk earlier than with other types of risk engine 18.

2.3 Clinical utility of the PRS

To further evaluate the clinical utility of the PRS, we compared the reduction of cardiovascular death achieved by the two ADVANCE treatment arms, intensification of blood pressure control and of blood glucose control (see Example 1: Methods), between patients stratified by the two clinical scores (UKPDS and Framingham) or by PRS4. While the UKPDS score failed to identify patients who benefited the most from the ADVANCE treatment combination, we observed a significant reduction of cardiovascular death by the ADVANCE treatment combination in individuals with a high Framingham score (FRS) (p=0.013) compared to no significant effect in individuals with low FRS (Fig. 13A, individuals with low risk scores, and Fig. 13B, individuals with high risk scores). Patients with high PRS4 values benefited even more from the ADVANCE treatment (p=0.0062), and the number needed to treat (NNT) to save one life from cardiovascular death during the 4.5-year duration of ADVANCE was 12 in the high PRS risk category (Fig. 13B). These results indicate that stratification by PRS values was the most effective way to identify patients who benefited the most from ADVANCE combination therapy.
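For readers less familiar with the NNT, the following minimal sketch shows the usual definition as the reciprocal of the absolute risk reduction (ARR), rounded up to a whole patient. The function name and the event counts are hypothetical and are not the ADVANCE data; the rounding convention is an assumption of this sketch.

```python
import math
from fractions import Fraction


def number_needed_to_treat(events_control, n_control, events_treated, n_treated):
    """NNT = 1 / ARR, rounded up, where ARR is the control risk minus the treated risk."""
    # Fractions keep the arithmetic exact before rounding up.
    arr = Fraction(events_control, n_control) - Fraction(events_treated, n_treated)
    if arr <= 0:
        raise ValueError("treatment shows no absolute risk reduction")
    return math.ceil(1 / arr)


# Hypothetical arms: 30/300 deaths without and 15/300 deaths with treatment.
nnt = number_needed_to_treat(30, 300, 15, 300)  # ARR = 0.05, so NNT = 20
```

A lower NNT within a risk stratum means that fewer patients in that stratum need to be treated to prevent one event, which is why the NNT is reported per PRS category.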

The cumulative hazard plots (Fig. 14) for all-cause death, cardiovascular death and end-stage renal disease (ESRD) in ADVANCE (4.5 years for the trial) and ADVANCE-ON (for a total of 9.5 years) are concordant with previous clinical reports 16,20. The cumulative incidence rate of total death, cardiovascular death, and end-stage renal disease was significantly different (p<0.0001) between individuals with low, medium and high PRS4 (PRS with sex, age of onset and diabetes duration). In addition, our data show that intensive blood pressure control achieved during the ADVANCE trial led to a significant reduction of total death (HR=0.797, p=0.046) and cardiovascular death (HR=0.677, p=0.009) in individuals included in the highest PRS4 third only, and the reduction in cardiovascular and total death remained significant during ADVANCE-ON. Again, in line with previous clinical observations 23, no such benefit was observed for total and cardiovascular death with intensive glycemic control, but a benefit was observed for ESRD only in individuals carrying the highest PRS4 values (HR=0.345, p=0.043 at year 4.5), remaining significant in ADVANCE-ON (HR=0.455, p=0.026 at year 9.5). It should be noted that 59.2% of ESRD cases occurred in the highest PRS4 third (Fig. 14).

The risk of macrovascular outcomes and death increased exponentially according to PRS4 deciles, rising sharply at the last three deciles of the distribution, suggesting that 30% can be considered the threshold for high-risk individuals (Fig. 16). For instance, the top three deciles of ADVANCE participants with the highest PRS4 had a 4.4-fold (p=1.9 × 10⁻³⁰) increased risk of cardiovascular death and a 3.1-fold (p=6.8 × 10⁻³³) increased risk of all-cause death compared with the rest of participants. Here PRS4 identified about one third of ADVANCE participants at more than 2-fold greater risk than the rest of subjects for almost all T2D complications shown in Fig. 16.
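The comparison of the top three deciles with the remaining participants can be sketched as a simple ratio of event proportions. This is an illustrative relative risk only; the estimator behind the fold increases reported above is not specified in the text and may differ. The function name and the toy data are hypothetical.

```python
def top_fraction_relative_risk(scores, events, top_fraction=0.3):
    """Event proportion in the highest-scoring fraction divided by that in the rest.

    scores[i] is the risk score of subject i and events[i] is 1 if the subject
    had the outcome, 0 otherwise.
    """
    n = len(scores)
    n_top = round(n * top_fraction)
    # Subject indices ordered from highest to lowest score.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    top, rest = order[:n_top], order[n_top:]
    risk_top = sum(events[i] for i in top) / len(top)
    risk_rest = sum(events[i] for i in rest) / len(rest)
    return risk_top / risk_rest


# Hypothetical cohort of 10 subjects with scores 1..10 and outcome flags.
toy_scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_events = [0, 0, 1, 0, 0, 0, 0, 1, 1, 0]
relative_risk = top_fraction_relative_risk(toy_scores, toy_events)
```

In this toy cohort two of the three highest-scoring subjects and one of the remaining seven have the outcome. The sketch does not guard against a comparison group with no events, which would need separate handling.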

The threshold of 30% was confirmed by an unbiased, unsupervised hierarchical clustering analysis that identified three clusters of individuals representing 37.1%, 33.5%, and 29.4% of ADVANCE participants having a low, medium, or high genetic risk of major macrovascular events including myocardial infarction, stroke, heart failure, and cardiovascular and all-cause death (Fig. 17). The incidence of cardiovascular death was 3.8-fold higher in individuals at high (11%) vs low genetic risk (2.9%) (p=1.5 × 10⁻¹³). One fifth (20%) of individuals of the high genetic risk group died during the ADVANCE trial compared to only 5% in the low risk category (p=1.8 × 10⁻²¹). The difference was also highly significant for microvascular events (including albuminuria and decrease of eGFR) known to contribute to the high level of mortality in high-risk individuals (Fig. 17, right panel).

Example 3:

PRS outperformed clinical scores in identifying diabetic patients with increased risk for incident and prevalent vascular complications and cardiovascular mortality

It is generally perceived that clinical risk scores are better than genetic ones to predict the risk of diseases. However, in the whole ADVANCE cohort, it was noted several years ago that the risks of major CVD and of major coronary heart diseases were over-estimated by the Framingham and UKPDS scores. The AUCs for major cardiovascular events (95% CI) with the re-calibrated UKPDS equations were 0.61 (0.57-0.66), 0.62 (0.57-0.66) and 0.70 (0.65-0.76), performing better than the Framingham one (p<0.004 for UKPDS vs both Framingham) 39. The authors explained the poor performance of these clinical scores in the ADVANCE cohort by the fact that patients in ADVANCE are generally older, are from many countries with different ethnic backgrounds, and have characteristics different from those of the populations on which the scores were initially developed.

Prior analyses have evaluated genetic prediction of T2D 13, kidney diseases 40, stroke 18 or cardiovascular outcomes 41. However, such analyses were limited to either a few SNPs or employed SNPs whose association with the disease was not well validated. Other limitations were that the genetic scores were required to improve the performance of the clinical ones rather than the reverse, or that the clinical component included phenotypes that are clearly genetically based (for instance sex, race, ethnicity, family history).

Potential clinical utility of GRS emerged in the last few years with the demonstration that subjects in the highest genetic risk category for coronary heart disease events had the largest relative and absolute clinical benefit from statin therapy 42 . Interestingly, GRS were recently applied in re-classification of diabetes into 5 genetic/clinical categories leading to distinct outcomes 43 .

In our study, we selected a high number of well validated common SNPs to construct our PRS and considered that the strength of genetic markers over clinical ones is their presence from birth, contrasting with clinical markers of processes already initiated 44. By dividing the 622 SNPs into 27 risk factors that were further divided into 9 risk groups of vascular complications of T2D, we noticed that more than one SNP per locus was included in the PRS because these SNPs were associated to different complications in ADVANCE (Fig. 6). We propose here that taking into account genetic pleiotropy and population-specific SNP determinants contributes to a better coverage of the populations. Pleiotropy has recently been proposed as a way of improving the accuracy of GRS 45,46.

Most recently, a re-analysis of the robust and highly reproducible PRS developed for schizophrenia demonstrated that it contains SNPs strongly associated to ancestry, which led the author to suggest caution about PRS 47. He concluded that any PRS derived from European subjects cannot be applied to non-Europeans and that previous studies, including those cited above on coronary heart disease and statin therapy 42 and on T2D re-classification 43, should be re-examined in the light of these findings. We recently published a strong East-West gradient in the prevalence and incidence of T2D complications in Europe 19 that appears to be due in part to a highly significant and genetically based earlier age of onset of T2D in patients of Slavic origin. We thus selected a subset of 34,570 independent SNPs to perform a principal component (PC) 19 analysis of our genotyped patients 48. PC1 separated the 4098 individuals into Caucasians of either Slavic or Celtic origin as described previously 19, and this was important to distinguish genomic from environmentally based determinants. For instance, we showed that age of onset of T2D and development of albuminuria were more genetically dependent compared to the presence of hypertension or low eGFR, which were more influenced by the environment 19. We thus incorporated the individual PC1 values with the 622 SNPs as an IGB to create the PRS. Our PRS is thus composed of a combination of SNPs associated to risk factors of T2D complications and PC1/IGB values associated to geo-ethnic stratification within the Caucasian population. Thus, while we concur with Curtis 47 on the importance of ethnicity in the development of diseases (even with increased granularity to capture lesser differences within the same racial group as a consequence of various migrations and admixtures), we propose that ancestral genomic background, when relevant to diseases, needs to be incorporated into the PRS. Our study demonstrates that the predictive power of the PRS can be improved by the inclusion of sex, age, age of onset and duration of disease exposure, in addition to geo-ethnicity, in the model.

Several studies have discussed the importance of timing in the application of a genetic test for it to be clinically useful, underscoring the relative contribution to prevalent and incident cases. In ADVANCE, we observed that the penetrance of outcomes is different between macro- and microvascular complications 38. For instance, in patients with T2D, age, age at T2D diagnosis and diabetes duration are independently positively associated to macrovascular events and death, whereas only diabetes duration is independently associated to microvascular events, and this is particularly true in young patients. Similarly to ADVANCE, the TODAY clinical trial reported a rapid rise in hypertension and nephropathy in youth with T2D 49. Age of onset of diabetes is the most important contributor to the PRS to predict microvascular events (Figs. 7 and 10), while age contributes the most to the prediction of macrovascular events and death by the PRS (see Fig. 8). Furthermore, while a GRS, even with a limited number of SNPs, was significant for prevalence of myocardial infarct in a coronary angiography study, it was not significant for incident cases. The authors concluded that the utility of genetic scoring is in primary prevention 50. A similar observation was made by de Vries et al. 51 in the Rotterdam study, where a GRS composed of 152 SNPs improved the prediction of prevalent coronary heart disease (CHD) beyond traditional risk factors and family history but was not significant for incident CHD cases. We adhere to this conclusion, particularly when testing subjects who already had the pathologic condition at the time of inclusion into the study. We demonstrated here the power of stratifying subjects along gradients of PRS. As discussed above, the prediction is better with prevalence than incidence of microvascular complications since they were already present at the time of the patients' recruitment into the ADVANCE trial. Again, the main clinical utility of the PRS is in primary prevention before target organ damage.

A genetic risk score alone does not have inherent clinical utility; its clinical utility refers more to its ability to prevent or ameliorate adverse health outcomes by using its results to inform clinical decision-making. Thus, the clinical utility of a risk engine depends not only on its predictive capacity but also on the discrimination thresholds chosen, the set of covariates used and the selection of appropriate control groups. It also depends on the analysis of prevalent as well as incident cases and, finally, on the contribution of ethnicity, as shown here.

The capacity to detect subjects with the best response to medication is one of the most important observations of this study. The PRS was the best at distinguishing subjects who benefit the most from the intensive antihypertensive and glucose-lowering therapies applied in ADVANCE. Fig. 13 illustrates three main components of the present study: 1) subjects classified into the low PRS category did not benefit from intensive therapies compared to patients of the higher PRS thirds, and classification was the best with PRS compared to the UKPDS and Framingham scores; 2) the combination of glucose and blood pressure intensive therapies showed the best reduction in risk, as reported in ADVANCE 22; and finally, 3) the highest third of PRS had the lowest number needed to treat with combined therapies: only 12 subjects need to be treated to save one life over the 4.5-year period of ADVANCE.

In conclusion, we showed here that PRS outperformed clinical scores in identifying diabetic patients with increased risk for incident and prevalent vascular complications and cardiovascular mortality. The highest benefit of treatment was confined to the highest genetic risk category.

The strength of the present study resides in the dataset used for risk prediction that includes longitudinal data collected over a period of 9.5 years with relatively few missing values. Second, the selection of validated genomic variants was made from large meta-analyses and therefore overfitting was low in our cohort. Third, our prediction models are based on genetic variants that are present at birth and reliable demographic variables that are usually collected during clinical practice without requiring the presence of any clinical manifestations or initial outcomes.

REFERENCES

1. Emerging Risk Factors Collaboration, Sarwar N, Gao P, et al. Diabetes mellitus, fasting blood glucose concentration, and risk of vascular disease: a collaborative meta-analysis of 102 prospective studies. Lancet 2010;375:2215-22.

2. Jacobs E, Hoyer A, Brinks R, Kuss O, Rathmann W. Burden of Mortality Attributable to Diagnosed Diabetes: A Nationwide Analysis Based on Claims Data From 65 Million People in Germany. Diabetes Care 2017;40:1703-9.

3. American Diabetes Association. 9. Cardiovascular Disease and Risk Management: Standards of Medical Care in Diabetes-2018. Diabetes Care 2018;41:S86-S104.

4. Zarkoob H, Lewinsky S, Almgren P, Melander O, Fakhrai-Rad H. Utilization of genetic data can improve the prediction of type 2 diabetes incidence in a Swedish cohort. PLoS One 2017;12:e0180180.

5. Kengne AP. The ADVANCE cardiovascular risk model and current strategies for cardiovascular disease risk evaluation in people with diabetes. Cardiovasc J Afr 2013;24:376-81.

6. Stevens RJ, Kothari V, Adler AI, Stratton IM, United Kingdom Prospective Diabetes Study Group. The UKPDS risk engine: a model for the risk of coronary heart disease in Type II diabetes (UKPDS 56). Clin Sci (Lond) 2001;101:671-9.

7. van Dieren S, Beulens JW, Kengne AP, et al. Prediction models for the risk of cardiovascular disease in patients with type 2 diabetes: a systematic review. Heart 2012;98:360-9.

8. Kengne AP, Patel A, Marre M, et al. Contemporary model for cardiovascular risk prediction in people with type 2 diabetes. Eur J Cardiovasc Prev Rehabil 2011;18:393-8.

9. Abraham G, Inouye M. Genomic risk prediction of complex human disease and its clinical application. Curr Opin Genet Dev 2015;33:10-6.

10. Lall K, Magi R, Morris A, Metspalu A, Fischer K. Personalized risk prediction for type 2 diabetes: the potential of genetic risk scores. Genet Med 2017;19:322-9.

11. Lei X, Huang S. Enrichment of minor allele of SNPs and genetic prediction of type 2 diabetes risk in British population. PLoS One 2017;12:e0187644.

12. Wu Y, Jing R, Dong Y, et al. Functional annotation of sixty-five type-2 diabetes risk SNPs and its application in risk prediction. Sci Rep 2017;7:43709.

13. Anand SS, Meyre D, Pare G, et al. Genetic information and the prediction of incident type 2 diabetes in a high-risk multiethnic population: the EpiDREAM genetic study. Diabetes Care 2013;36:2836-42.

14. Mihaescu R, Moonesinghe R, Khoury MJ, Janssens AC. Predictive genetic testing for the identification of high-risk groups: a simulation study on the impact of predictive ability. Genome Med 2011;3:51.

15. Escott-Price V, Shoai M, Pither R, Williams J, Hardy J. Polygenic score prediction captures nearly all common genetic risk for Alzheimer's disease. Neurobiol Aging 2017;49:214 e7- e l 1.

16. Patel A, Group AC, MacMahon S, et al. Effects of a fixed combination of perindopril and indapamide on macrovascular and microvascular outcomes in patients with type 2 diabetes mellitus (the ADVANCE trial): a randomised controlled trial. Lancet 2007;370:829-40. 17. Ninomiya T, Perkovic V, de Galan BE, et al. Albuminuria and kidney function independently predict cardiovascular and renal outcomes in diabetes. J Am Soc Nephrol 2009;20: 1813-21.

18. Ibrahim-Verbaas CA, Fornage M, Bis JC, et al. Predicting stroke through genetic risk functions: the CHARGE Risk Score Project. Stroke 2014;45:403-12.

19. Hamet P, Haloui M, Harvey F, et al. PROX1 gene CC genotype as a major determinant of early onset of type 2 diabetes in Slavic study participants from Action in Diabetes and Vascular Disease: Preterax and Diamicron MR Controlled Evaluation study. J Hypertens 2017;35 Suppl 1:S24-S32.

20. Zoungas S, Chalmers J, Neal B, et al. Follow-up of blood-pressure lowering and glucose control in type 2 diabetes. N Engl J Med 2014;371:1392-406.

21. ADVANCE Collaborative Group, Patel A, MacMahon S, et al. Intensive blood glucose control and vascular outcomes in patients with type 2 diabetes. N Engl J Med 2008;358:2560-72.

22. Zoungas S, de Galan BE, Ninomiya T, et al. Combined effects of routine blood pressure lowering and intensive glucose control on macrovascular and microvascular outcomes in patients with type 2 diabetes: New results from the ADVANCE trial. Diabetes Care 2009;32:2068-74.

23. Wong MG, Perkovic V, Chalmers J, et al. Long-term Benefits of Intensive Glucose Control for Preventing End-Stage Kidney Disease: ADVANCE-ON. Diabetes Care 2016;39:694-700.

24. DCCT/EDIC Research Group. Effect of intensive diabetes treatment on albuminuria in type 1 diabetes: long-term follow-up of the Diabetes Control and Complications Trial and Epidemiology of Diabetes Interventions and Complications study. Lancet Diabetes Endocrinol 2014;2:793-800.

25. D'Agostino RB, Sr., Vasan RS, Pencina MJ, et al. General cardiovascular risk profile for use in primary care: the Framingham Heart Study. Circulation 2008;117:743-53.

26. Dalgaard P. Introductory statistics with R. 2nd ed. 2008.

27. Robin X, Turck N, Hainard A, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics 2011;12:77.

28. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 1988;44:837-45.

29. Raj A, Stephens M, Pritchard JK. fastSTRUCTURE: variational inference of population structure in large SNP data sets. Genetics 2014;197:573-89.

30. Purcell S, Neale B, Todd-Brown K, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 2007;81:559-75.

31. Delaneau O, Marchini J, Zagury JF. A linear complexity phasing method for thousands of genomes. Nat Methods 2011;9:179-81.

32. Howie BN, Donnelly P, Marchini J. A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet 2009;5:e1000529.

33. https://mathgen.stats.ox.ac.uk/impute/1000GP_Phase3.html.

34. Cunningham F, Amode MR, Barrell D, et al. Ensembl 2015. Nucleic Acids Res 2015;43:D662-9.

35. Southam L, Panoutsopoulou K, Rayner NW, et al. The effect of genome-wide association scan quality control on imputation outcome for common variants. Eur J Hum Genet 2011;19:610-4.

36. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 2006;38:904-9.

37. Radholm K, Chalmers J, Ohkuma T, et al. Use of the waist-to-height-ratio to predict cardiovascular risk in patients with diabetes: results from ADVANCE-ON. Diabetes Obes Metab 2018.

38. Zoungas S, Woodward M, Li Q, et al. Impact of age, age at diagnosis and duration of diabetes on the risk of macrovascular and microvascular complications and death in type 2 diabetes. Diabetologia 2014;57:2465-74.

39. Kengne AP, Patel A, Colagiuri S, et al. The Framingham and UK Prospective Diabetes Study (UKPDS) risk equations do not reliably estimate the probability of cardiovascular events in a large ethnically diverse sample of patients with diabetes: the Action in Diabetes and Vascular Disease: Preterax and Diamicron-MR Controlled Evaluation (ADVANCE) Study. Diabetologia 2010;53:821-31.

40. Ma J, Yang Q, Hwang SJ, Fox CS, Chu AY. Genetic risk score and risk of stage 3 chronic kidney disease. BMC Nephrol 2017;18:32.

41. van Setten J, Isgum I, Pechlivanis S, et al. Serum lipid levels, body mass index, and their role in coronary artery calcification: a polygenic analysis. Circ Cardiovasc Genet 2015;8:327-33.

42. Mega JL, Stitziel NO, Smith JG, et al. Genetic risk, coronary heart disease events, and the clinical benefit of statin therapy: an analysis of primary and secondary prevention trials. Lancet 2015;385:2264-71.

43. Ahlqvist E, Storm P, Karajamaki A, et al. Novel subgroups of adult-onset diabetes and their association with outcomes: a data-driven cluster analysis of six variables. Lancet Diabetes Endocrinol 2018.

44. Hamet P. Missing heritability or need for reality check of clinical utility in genomic testing? J Hypertens 2014;32:1395-6.

45. Witoelar A, Jansen IE, Wang Y, et al. Genome-wide Pleiotropy Between Parkinson Disease and Autoimmune Diseases. JAMA Neurol 2017;74:780-92.

46. Li J, Wei Z, Hakonarson H. Application of computational methods in genetic study of inflammatory bowel disease. World J Gastroenterol 2016;22:949-60.

47. Curtis D. Polygenic risk score for schizophrenia is more strongly associated with ancestry than with schizophrenia. bioRxiv 2018.

48. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 2006;38:904-9.

49. TODAY Study Group. Rapid rise in hypertension and nephropathy in youth with type 2 diabetes: the TODAY clinical trial. Diabetes Care 2013;36:1735-41.

50. Patel RS, Sun YV, Hartiala J, et al. Association of a genetic risk score with prevalent and incident myocardial infarction in subjects undergoing coronary angiography. Circ Cardiovasc Genet 2012;5:441-9.

51. de Vries PS, Kavousi M, Ligthart S, et al. Incremental predictive value of 152 single nucleotide polymorphisms in the 10-year risk prediction of incident coronary heart disease: the Rotterdam Study. Int J Epidemiol 2015;44:682-8.