Title:
PROTEIN PREDICTORS FOR LUNG CANCER
Document Type and Number:
WIPO Patent Application WO/2023/242206
Kind Code:
A1
Abstract:
Disclosed herein are methods for analyzing predictors including quantitative values of biomarkers (e.g., protein biomarkers) for predicting risk of cancer in a human subject. Further disclosed herein are kits for measuring quantitative values of the markers as well as computer systems and software embodiments for predicting risk of cancer in a human subject based on the quantitative values of the biomarkers (e.g., protein biomarkers).
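
The workflow summarized in the abstract (measure a panel of protein biomarkers, apply a predictive model to the quantitative values, report a risk) can be sketched roughly as follows. This is only an illustrative sketch, not the applicants' implementation: the five marker names come from claim 1, but the weights, the intercept, and the function names `panel_markers_present`, `predict_risk`, and `auc` are hypothetical, and a plain logistic score stands in for the elastic net, support vector machine, random forest, and XGBoost models named in the claims. The `auc` helper uses the standard rank-based definition of the area under the ROC curve, the metric for which the claims recite thresholds.

```python
import math

# Core panel named in claim 1 of this application.
CORE_PANEL = ("TSPAN1", "CD28", "SCN3B", "ADGRB3", "IGFBP6")

# HYPOTHETICAL coefficients, for illustration only; not taken from the application.
HYPOTHETICAL_WEIGHTS = {
    "TSPAN1": 0.8,
    "CD28": -0.5,
    "SCN3B": 0.3,
    "ADGRB3": -0.2,
    "IGFBP6": 0.6,
}
HYPOTHETICAL_INTERCEPT = -1.0


def panel_markers_present(dataset, panel=CORE_PANEL):
    """Return the panel markers that have a quantitative value in the dataset."""
    return [m for m in panel if dataset.get(m) is not None]


def predict_risk(dataset, min_markers=2):
    """Return a risk in (0, 1) from a logistic score over the measured markers.

    The claims require "two or more of" the panel, hence min_markers=2.
    The logistic form is an assumed stand-in for the claimed model types.
    """
    present = panel_markers_present(dataset)
    if len(present) < min_markers:
        raise ValueError(
            f"need at least {min_markers} panel markers, got {len(present)}"
        )
    z = HYPOTHETICAL_INTERCEPT + sum(
        HYPOTHETICAL_WEIGHTS[m] * dataset[m] for m in present
    )
    return 1.0 / (1.0 + math.exp(-z))


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen case (label 1) scores
    higher than a randomly chosen control (label 0); ties count one half.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))
```

For example, `predict_risk({"TSPAN1": 1.2, "CD28": 0.4})` scores a subject with two measured panel markers, while a dataset containing only one of the five raises `ValueError`. Comparing `auc(scores, labels)` on held-out subjects against a threshold such as 0.85 is one way the performance limits recited in the claims could be checked.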

Inventors:
SATO TAKAHIRO (US)
YANG ROBERT YUNCHUAN (US)
WHITNEY DUNCAN H (US)
Application Number:
PCT/EP2023/065832
Publication Date:
December 21, 2023
Filing Date:
June 13, 2023
Assignee:
JANSSEN PHARMACEUTICA NV (BE)
International Classes:
G01N33/574; G16H50/20
Foreign References:
US20150064210A1 2015-03-05
Other References:
RENÉE T FORTNER ET AL: "Endometrial cancer risk prediction including serum-based biomarkers: results from the EPIC cohort", INTERNATIONAL JOURNAL OF CANCER, JOHN WILEY & SONS, INC, US, vol. 140, no. 6, 27 January 2017 (2017-01-27), pages 1317 - 1323, XP071289649, ISSN: 0020-7136, DOI: 10.1002/IJC.30560
MA CHENHUI ET AL: "Bioinformatics analysis reveals TSPAN1 as a candidate biomarker of progression and prognosis in pancreatic cancer", BOSNIAN JOURNAL OF BASIC MEDICAL SCIENCES, 1 February 2021 (2021-02-01), Sarajevo, Bosnia, XP093063778, ISSN: 1512-8601, Retrieved from the Internet DOI: 10.17305/bjbms.2020.5096
Attorney, Agent or Firm:
DUFFIELD, Stephen et al. (GB)
Claims:
1. A method for predicting risk of cancer in a subject, the method comprising: obtaining or having obtained a dataset derived from the subject comprising quantitative levels of a plurality of biomarkers, wherein the plurality of biomarkers comprises protein biomarkers comprising two or more of TSPAN1, CD28, SCN3B, ADGRB3, and IGFBP6, and generating a prediction of risk of cancer for the subject by applying a predictive model to the quantitative values of the plurality of biomarkers.
2. The method of claim 1, wherein the protein biomarkers comprise three or more of TSPAN1, CD28, SCN3B, ADGRB3, and IGFBP6.
3. The method of claim 1, wherein the protein biomarkers comprise four or more of TSPAN1, CD28, SCN3B, ADGRB3, and IGFBP6.
4. The method of claim 1, wherein the protein biomarkers comprise each of TSPAN1, CD28, SCN3B, ADGRB3, and IGFBP6.
5. The method of any one of claims 1-4, wherein the protein biomarkers further comprise one or more of NRTN, AIF1L, HSPB6, MB, TNFRSF19, IL5RA, TNR, CDNF, CST1, FGFBP2, S100A16, CD248, GFRA3, LMOD1, and POF1B.
6. The method of any one of claims 1-4, wherein the protein biomarkers further comprise five or more of NRTN, AIF1L, HSPB6, MB, TNFRSF19, IL5RA, TNR, CDNF, CST1, FGFBP2, S100A16, CD248, GFRA3, LMOD1, and POF1B.
7. The method of any one of claims 1-4, wherein the protein biomarkers further comprise ten or more of NRTN, AIF1L, HSPB6, MB, TNFRSF19, IL5RA, TNR, CDNF, CST1, FGFBP2, S100A16, CD248, GFRA3, LMOD1, and POF1B.
8. The method of any one of claims 1-4, wherein the protein biomarkers further comprise each of NRTN, AIF1L, HSPB6, MB, TNFRSF19, IL5RA, TNR, CDNF, CST1, FGFBP2, S100A16, CD248, GFRA3, LMOD1, and POF1B.
9. The method of any one of claims 1-8, wherein the protein biomarkers further comprise one or more of DENND2B, COMP, CNTN2, SCARA5, CSPG4, ITGAV, SOST, SERPINA4, LILRA4, SPINK5, PINLYP, ACTN2, JAM2, FAP, TMOD4, GUCA2A, MFAP3L, DKK4, LAMA1, BAG3, SNCG, SEPTIN3, VWC2, KLRC1, ATRAID, ART3, SLITRK2, SIGLEC6, TMED4, and SLAMF7.
10. The method of any one of claims 1-8, wherein the protein biomarkers further comprise five or more of DENND2B, COMP, CNTN2, SCARA5, CSPG4, ITGAV, SOST, SERPINA4, LILRA4, SPINK5, PINLYP, ACTN2, JAM2, FAP, TMOD4, GUCA2A, MFAP3L, DKK4, LAMA1, BAG3, SNCG, SEPTIN3, VWC2, KLRC1, ATRAID, ART3, SLITRK2, SIGLEC6, TMED4, and SLAMF7.
11. The method of any one of claims 1-8, wherein the protein biomarkers further comprise ten or more of DENND2B, COMP, CNTN2, SCARA5, CSPG4, ITGAV, SOST, SERPINA4, LILRA4, SPINK5, PINLYP, ACTN2, JAM2, FAP, TMOD4, GUCA2A, MFAP3L, DKK4, LAMA1, BAG3, SNCG, SEPTIN3, VWC2, KLRC1, ATRAID, ART3, SLITRK2, SIGLEC6, TMED4, and SLAMF7.
12. The method of any one of claims 1-8, wherein the protein biomarkers further comprise twenty or more of DENND2B, COMP, CNTN2, SCARA5, CSPG4, ITGAV, SOST, SERPINA4, LILRA4, SPINK5, PINLYP, ACTN2, JAM2, FAP, TMOD4, GUCA2A, MFAP3L, DKK4, LAMA1, BAG3, SNCG, SEPTIN3, VWC2, KLRC1, ATRAID, ART3, SLITRK2, SIGLEC6, TMED4, and SLAMF7.
13. The method of any one of claims 1-8, wherein the protein biomarkers further comprise each of DENND2B, COMP, CNTN2, SCARA5, CSPG4, ITGAV, SOST, SERPINA4, LILRA4, SPINK5, PINLYP, ACTN2, JAM2, FAP, TMOD4, GUCA2A, MFAP3L, DKK4, LAMA1, BAG3, SNCG, SEPTIN3, VWC2, KLRC1, ATRAID, ART3, SLITRK2, SIGLEC6, TMED4, and SLAMF7.
14. The method of any one of claims 1-13, wherein the protein biomarkers further comprise one or more of CKMT1A, SEMA6C, CD2, CST5, PBXIP1, LECT2, PYY, AGRN, INSL5, CD38, PI16, CCN5, TNFRSF17, LY9, GPC1, CLMP, MEP1B, CCN1, PCDH7, SPARCL1, CRNN, PM20D1, TNFRSF12A, DSCAM, PALM, CX3CL1, MEP1A, SLURP1, APOA4, ADAMTSL5, MEPE, WFDC1, RPS10, CD300C, RIPK4, CALCB, RTBDN, ENO3, NTF3, PTPRZ1, LRP2BP, CPE, MCAM, BGN, PLB1, YAP1, TGFBI, CYB5A, EDDM3B, and SELENOP.
15. The method of any one of claims 1-13, wherein the protein biomarkers further comprise five or more of CKMT1A, SEMA6C, CD2, CST5, PBXIP1, LECT2, PYY, AGRN, INSL5, CD38, PI16, CCN5, TNFRSF17, LY9, GPC1, CLMP, MEP1B, CCN1, PCDH7, SPARCL1, CRNN, PM20D1, TNFRSF12A, DSCAM, PALM, CX3CL1, MEP1A, SLURP1, APOA4, ADAMTSL5, MEPE, WFDC1, RPS10, CD300C, RIPK4, CALCB, RTBDN, ENO3, NTF3, PTPRZ1, LRP2BP, CPE, MCAM, BGN, PLB1, YAP1, TGFBI, CYB5A, EDDM3B, and SELENOP.
16. The method of any one of claims 1-13, wherein the protein biomarkers further comprise ten or more of CKMT1A, SEMA6C, CD2, CST5, PBXIP1, LECT2, PYY, AGRN, INSL5, CD38, PI16, CCN5, TNFRSF17, LY9, GPC1, CLMP, MEP1B, CCN1, PCDH7, SPARCL1, CRNN, PM20D1, TNFRSF12A, DSCAM, PALM, CX3CL1, MEP1A, SLURP1, APOA4, ADAMTSL5, MEPE, WFDC1, RPS10, CD300C, RIPK4, CALCB, RTBDN, ENO3, NTF3, PTPRZ1, LRP2BP, CPE, MCAM, BGN, PLB1, YAP1, TGFBI, CYB5A, EDDM3B, and SELENOP.
17. The method of any one of claims 1-13, wherein the protein biomarkers further comprise twenty or more of CKMT1A, SEMA6C, CD2, CST5, PBXIP1, LECT2, PYY, AGRN, INSL5, CD38, PI16, CCN5, TNFRSF17, LY9, GPC1, CLMP, MEP1B, CCN1, PCDH7, SPARCL1, CRNN, PM20D1, TNFRSF12A, DSCAM, PALM, CX3CL1, MEP1A, SLURP1, APOA4, ADAMTSL5, MEPE, WFDC1, RPS10, CD300C, RIPK4, CALCB, RTBDN, ENO3, NTF3, PTPRZ1, LRP2BP, CPE, MCAM, BGN, PLB1, YAP1, TGFBI, CYB5A, EDDM3B, and SELENOP.
18. The method of any one of claims 1-13, wherein the protein biomarkers further comprise thirty or more of CKMT1A, SEMA6C, CD2, CST5, PBXIP1, LECT2, PYY, AGRN, INSL5, CD38, PI16, CCN5, TNFRSF17, LY9, GPC1, CLMP, MEP1B, CCN1, PCDH7, SPARCL1, CRNN, PM20D1, TNFRSF12A, DSCAM, PALM, CX3CL1, MEP1A, SLURP1, APOA4, ADAMTSL5, MEPE, WFDC1, RPS10, CD300C, RIPK4, CALCB, RTBDN, ENO3, NTF3, PTPRZ1, LRP2BP, CPE, MCAM, BGN, PLB1, YAP1, TGFBI, CYB5A, EDDM3B, and SELENOP.
19. The method of any one of claims 1-13, wherein the protein biomarkers further comprise forty or more of CKMT1A, SEMA6C, CD2, CST5, PBXIP1, LECT2, PYY, AGRN, INSL5, CD38, PI16, CCN5, TNFRSF17, LY9, GPC1, CLMP, MEP1B, CCN1, PCDH7, SPARCL1, CRNN, PM20D1, TNFRSF12A, DSCAM, PALM, CX3CL1, MEP1A, SLURP1, APOA4, ADAMTSL5, MEPE, WFDC1, RPS10, CD300C, RIPK4, CALCB, RTBDN, ENO3, NTF3, PTPRZ1, LRP2BP, CPE, MCAM, BGN, PLB1, YAP1, TGFBI, CYB5A, EDDM3B, and SELENOP.
20. The method of any one of claims 1-13, wherein the protein biomarkers further comprise each of CKMT1A, SEMA6C, CD2, CST5, PBXIP1, LECT2, PYY, AGRN, INSL5, CD38, PI16, CCN5, TNFRSF17, LY9, GPC1, CLMP, MEP1B, CCN1, PCDH7, SPARCL1, CRNN, PM20D1, TNFRSF12A, DSCAM, PALM, CX3CL1, MEP1A, SLURP1, APOA4, ADAMTSL5, MEPE, WFDC1, RPS10, CD300C, RIPK4, CALCB, RTBDN, ENO3, NTF3, PTPRZ1, LRP2BP, CPE, MCAM, BGN, PLB1, YAP1, TGFBI, CYB5A, EDDM3B, and SELENOP.
21. The method of any one of claims 1-20, wherein the protein biomarkers further comprise one or more of ENPP6, TMEM25, GIP, CSPG5, SCGN, TMPRSS15, LAIR2, KIRREL1, NTF4, TSPAN7, ENDOU, KLK10, CCL24, GPR37, CD3D, TJP3, DKKL1, CFC1, LRRC38, GCG, AGBL2, FASLG, AHNAK2, WFIKKN2, ANXA10, HS6ST1, DUSP29, CA14, CLEC7A, PHLDB2, SCRG1, RSPO3, TOP1, TINAGL1, NCAM1, FAM3D, FLT3LG, ZP3, AGRP, ASAH2, PDGFRB, AFM, NPY, PPY, XG, MFGE8, PROS1, MEGF11, CTSO, CTLA4, CSF3R, FCAR, CTAG1A, SCPEP1, PRSS53, CRELD2, PILRA, PROC, VASH1, NOS3, BPIFB2, UPK3BL1, NOP56, JAM3, HLA-DRA, SIL1, TRPV3, EDEM2, POLR2A, CBLN1, FKBP7, CCL20, PILRB, SIRPB1, VSTM1, BST2, DLL4, C1RL, RNASET2, KCNH2, IL12RB2, FZD10, OXCT1, TREML2, GRIN2B, GFRAL, RGS8, LRPAP1, LRP2, IGSF21, DPT, HEPACAM2, MATN3, UXS1, PTTG1, BTN1A1, IL17C, SCIN, TK1, FKBP14, VWA5A, PRKG1, SV2A, PMCH, NEXN, CDCP1, DDX53, THSD1, PAK4, MMP12, FCN1, UMOD, PDIA4, IL6, BRK1, LILRA2, RBPMS2, SERPIND1, TPSG1, CEACAM5, FGF9, PPIF, RNF43, SIGLEC9, TOMM20, PDE5A, NELL1, GBA, PAEP, ERN1, PCSK7, CHCHD6, MARCO, SFTPA1, IL9, KYNU, SPINT1, LRFN2, NECTIN1, OSCAR, PZP, BPIFB1, LILRA5, CALY, RRAS, GADD45GIP1, ISM2, SCGB3A2, CEACAM6, LPP, GKN1, LRIG1, CLSPN, CXCL13, SFTPA2, COX6B1, PTGR1, RBPMS, PPT1, AOC1, PDLIM5, L3HYPDH, LONP1, APOL1, CEACAM18, FGF7, and KRT14.
22. The method of any one of claims 1-21, wherein the predictive model comprises an elastic net regression model, and wherein the predictive model achieves an area under a curve (AUC) value of at least 0.85.
23. The method of any one of claims 1-21, wherein the predictive model comprises a support vector machine, and wherein the predictive model achieves an area under a curve (AUC) value of at least 0.84.
24. The method of any one of claims 1-21, wherein the predictive model comprises a random forest model, and wherein the predictive model achieves an area under a curve (AUC) value of at least 0.72.
25. The method of any one of claims 1-21, wherein the predictive model comprises an XGBoost model, and wherein the predictive model achieves an area under a curve (AUC) value of at least 0.73.
26. A method for predicting risk of cancer in a subject, the method comprising: obtaining or having obtained a dataset derived from the subject comprising quantitative levels of a plurality of biomarkers, wherein the plurality of biomarkers comprises protein biomarkers comprising two or more of GAST, ENPP2, FZD8, FGF23, and TFF1, and generating a prediction of risk of cancer for the subject by applying a predictive model to the quantitative values of the plurality of biomarkers.
27. The method of claim 26, wherein the protein biomarkers comprise three or more of GAST, ENPP2, FZD8, FGF23, and TFF1.
28. The method of claim 26, wherein the protein biomarkers comprise four or more of GAST, ENPP2, FZD8, FGF23, and TFF1.
29. The method of claim 26, wherein the protein biomarkers comprise each of VWA5A, GAST, ENPP2, FZD8, FGF23, and TFF1.
30. The method of any one of claims 26-29, wherein the protein biomarkers further comprise one or more of MAPT, FGF16, OXT, BRD1, MFAP4, WNT9A, FLRT2, CRTAC1, PAPPA, POMC, NGF, IDI2, TPT1, EPHA10, and MFAP3.
31. The method of any one of claims 26-29, wherein the protein biomarkers further comprise five or more of MAPT, FGF16, OXT, BRD1, MFAP4, WNT9A, FLRT2, CRTAC1, PAPPA, POMC, NGF, IDI2, TPT1, EPHA10, and MFAP3.
32. The method of any one of claims 26-29, wherein the protein biomarkers further comprise ten or more of MAPT, FGF16, OXT, BRD1, MFAP4, WNT9A, FLRT2, CRTAC1, PAPPA, POMC, NGF, IDI2, TPT1, EPHA10, and MFAP3.
33. The method of any one of claims 26-29, wherein the protein biomarkers further comprise each of MAPT, FGF16, OXT, BRD1, MFAP4, WNT9A, FLRT2, CRTAC1, PAPPA, POMC, NGF, IDI2, TPT1, EPHA10, and MFAP3.
34. The method of any one of claims 26-33, wherein the protein biomarkers further comprise one or more of SOWAHA, RARRES1, DUSP3, SEMA3F, CNTN3, LPA, KLK11, RPGR, EPO, TDGF1, IL17A, CD160, TNPO1, GAMT, ENPP6, TMEM25, GIP, CSPG5, SCGN, TMPRSS15, LAIR2, KIRREL1, NTF4, TSPAN7, ENDOU, KLK10, CCL24, GPR37, CD3D, and TJP3.
35. The method of any one of claims 26-33, wherein the protein biomarkers further comprise five or more of SOWAHA, RARRES1, DUSP3, SEMA3F, CNTN3, LPA, KLK11, RPGR, EPO, TDGF1, IL17A, CD160, TNPO1, GAMT, ENPP6, TMEM25, GIP, CSPG5, SCGN, TMPRSS15, LAIR2, KIRREL1, NTF4, TSPAN7, ENDOU, KLK10, CCL24, GPR37, CD3D, and TJP3.
36. The method of any one of claims 26-33, wherein the protein biomarkers further comprise ten or more of SOWAHA, RARRES1, DUSP3, SEMA3F, CNTN3, LPA, KLK11, RPGR, EPO, TDGF1, IL17A, CD160, TNPO1, GAMT, ENPP6, TMEM25, GIP, CSPG5, SCGN, TMPRSS15, LAIR2, KIRREL1, NTF4, TSPAN7, ENDOU, KLK10, CCL24, GPR37, CD3D, and TJP3.
37. The method of any one of claims 26-33, wherein the protein biomarkers further comprise twenty or more of SOWAHA, RARRES1, DUSP3, SEMA3F, CNTN3, LPA, KLK11, RPGR, EPO, TDGF1, IL17A, CD160, TNPO1, GAMT, ENPP6, TMEM25, GIP, CSPG5, SCGN, TMPRSS15, LAIR2, KIRREL1, NTF4, TSPAN7, ENDOU, KLK10, CCL24, GPR37, CD3D, and TJP3.
38. The method of any one of claims 26-33, wherein the protein biomarkers further comprise each of SOWAHA, RARRES1, DUSP3, SEMA3F, CNTN3, LPA, KLK11, RPGR, EPO, TDGF1, IL17A, CD160, TNPO1, GAMT, ENPP6, TMEM25, GIP, CSPG5, SCGN, TMPRSS15, LAIR2, KIRREL1, NTF4, TSPAN7, ENDOU, KLK10, CCL24, GPR37, CD3D, and TJP3.
39. The method of any one of claims 26-38, wherein the protein biomarkers further comprise one or more of DKKL1, CFC1, LRRC38, GCG, AGBL2, FASLG, AHNAK2, WFIKKN2, ANXA10, HS6ST1, DUSP29, CA14, CLEC7A, PHLDB2, SCRG1, RSPO3, TOP1, TINAGL1, NCAM1, FAM3D, FLT3LG, ZP3, AGRP, ASAH2, PDGFRB, AFM, NPY, PPY, XG, MFGE8, PROS1, MEGF11, SCT, CFB, F11, ANK2, ENOPH1, UGDH, ASAH1, ERBB4, IL36A, FGA, C5, OSMR, SSBP1, RICTOR, LRG1, C4BPB, AIDA, and SSC4D.
40. The method of any one of claims 26-38, wherein the protein biomarkers further comprise five or more of DKKL1, CFC1, LRRC38, GCG, AGBL2, FASLG, AHNAK2, WFIKKN2, ANXA10, HS6ST1, DUSP29, CA14, CLEC7A, PHLDB2, SCRG1, RSPO3, TOP1, TINAGL1, NCAM1, FAM3D, FLT3LG, ZP3, AGRP, ASAH2, PDGFRB, AFM, NPY, PPY, XG, MFGE8, PROS1, MEGF11, SCT, CFB, F11, ANK2, ENOPH1, UGDH, ASAH1, ERBB4, IL36A, FGA, C5, OSMR, SSBP1, RICTOR, LRG1, C4BPB, AIDA, and SSC4D.
41. The method of any one of claims 26-38, wherein the protein biomarkers further comprise ten or more of DKKL1, CFC1, LRRC38, GCG, AGBL2, FASLG, AHNAK2, WFIKKN2, ANXA10, HS6ST1, DUSP29, CA14, CLEC7A, PHLDB2, SCRG1, RSPO3, TOP1, TINAGL1, NCAM1, FAM3D, FLT3LG, ZP3, AGRP, ASAH2, PDGFRB, AFM, NPY, PPY, XG, MFGE8, PROS1, MEGF11, SCT, CFB, F11, ANK2, ENOPH1, UGDH, ASAH1, ERBB4, IL36A, FGA, C5, OSMR, SSBP1, RICTOR, LRG1, C4BPB, AIDA, and SSC4D.
42. The method of any one of claims 26-38, wherein the protein biomarkers further comprise twenty or more of DKKL1, CFC1, LRRC38, GCG, AGBL2, FASLG, AHNAK2, WFIKKN2, ANXA10, HS6ST1, DUSP29, CA14, CLEC7A, PHLDB2, SCRG1, RSPO3, TOP1, TINAGL1, NCAM1, FAM3D, FLT3LG, ZP3, AGRP, ASAH2, PDGFRB, AFM, NPY, PPY, XG, MFGE8, PROS1, MEGF11, SCT, CFB, F11, ANK2, ENOPH1, UGDH, ASAH1, ERBB4, IL36A, FGA, C5, OSMR, SSBP1, RICTOR, LRG1, C4BPB, AIDA, and SSC4D.
43. The method of any one of claims 26-38, wherein the protein biomarkers further comprise thirty or more of DKKL1, CFC1, LRRC38, GCG, AGBL2, FASLG, AHNAK2, WFIKKN2, ANXA10, HS6ST1, DUSP29, CA14, CLEC7A, PHLDB2, SCRG1, RSPO3, TOP1, TINAGL1, NCAM1, FAM3D, FLT3LG, ZP3, AGRP, ASAH2, PDGFRB, AFM, NPY, PPY, XG, MFGE8, PROS1, MEGF11, SCT, CFB, F11, ANK2, ENOPH1, UGDH, ASAH1, ERBB4, IL36A, FGA, C5, OSMR, SSBP1, RICTOR, LRG1, C4BPB, AIDA, and SSC4D.
44. The method of any one of claims 26-38, wherein the protein biomarkers further comprise forty or more of DKKL1, CFC1, LRRC38, GCG, AGBL2, FASLG, AHNAK2, WFIKKN2, ANXA10, HS6ST1, DUSP29, CA14, CLEC7A, PHLDB2, SCRG1, RSPO3, TOP1, TINAGL1, NCAM1, FAM3D, FLT3LG, ZP3, AGRP, ASAH2, PDGFRB, AFM, NPY, PPY, XG, MFGE8, PROS1, MEGF11, SCT, CFB, F11, ANK2, ENOPH1, UGDH, ASAH1, ERBB4, IL36A, FGA, C5, OSMR, SSBP1, RICTOR, LRG1, C4BPB, AIDA, and SSC4D.
45. The method of any one of claims 26-38, wherein the protein biomarkers further comprise each of DKKL1, CFC1, LRRC38, GCG, AGBL2, FASLG, AHNAK2, WFIKKN2, ANXA10, HS6ST1, DUSP29, CA14, CLEC7A, PHLDB2, SCRG1, RSPO3, TOP1, TINAGL1, NCAM1, FAM3D, FLT3LG, ZP3, AGRP, ASAH2, PDGFRB, AFM, NPY, PPY, XG, MFGE8, PROS1, MEGF11, SCT, CFB, F11, ANK2, ENOPH1, UGDH, ASAH1, ERBB4, IL36A, FGA, C5, OSMR, SSBP1, RICTOR, LRG1, C4BPB, AIDA, and SSC4D.
46. The method of any one of claims 26-45, wherein the protein biomarkers further comprise one or more of GRN, IFNAR1, ENPEP, ACADSB, MAN1A2, GBP4, SERPING1, COL4A4, SOX2, GRSF1, PRAME, KIR2DS4, ADAMTS1, ITPRIP, CRISP3, DSG4, ITIH4, MRC1, GABRA4, SERPINA3, MILR1, PLIN1, SHH, KLKB1, IL17RA, MMP10, LBP, SMAD5, ADRA2A, SESTD1, CFI, AKR7L, CTSH, LYPD3, CBLIF, SMTN, CFH, SERPINC1, GDF15, PDZD2, ALDH2, IZUMO1, DNM3, CCL19, CSF2, MCEE, FDX1, SDC1, POSTN, GP2, CST7, CD14, NEK7, SHC1, CRELD1, TCN2, CMIP, CRHBP, C9, PXDNL, NRCAM, DLG4, TRAF3IP2, SULT2A1, GSTT2B, ITIH1, MRPL24, MUC16, IL3, CLU, FHIP2A, TK1, FKBP14, VWA5A, PRKG1, SV2A, PMCH, NEXN, CDCP1, DDX53, THSD1, PAK4, MMP12, FCN1, UMOD, PDIA4, IL6, BRK1, LILRA2, RBPMS2, SERPIND1, TPSG1, CEACAM5, FGF9, PPIF, RNF43, SIGLEC9, TOMM20, PDE5A, NELL1, GBA, PAEP, ERN1, PCSK7, CHCHD6, MARCO, SFTPA1, IL9, KYNU, SPINT1, LRFN2, NECTIN1, OSCAR, PZP, BPIFB1, LILRA5, CALY, RRAS, GADD45GIP1, ISM2, SCGB3A2, CEACAM6, LPP, GKN1, LRIG1, CLSPN, CXCL13, SFTPA2, COX6B1, PTGR1, RBPMS, PPT1, AOC1, PDLIM5, L3HYPDH, LONP1, APOL1, CEACAM18, FGF7, and KRT14.
47. The method of any one of claims 26-46, wherein the predictive model comprises an elastic net regression model, and wherein the predictive model achieves an area under a curve (AUC) value of at least 0.79.
48. The method of any one of claims 26-46, wherein the predictive model comprises a support vector machine, and wherein the predictive model achieves an area under a curve (AUC) value of at least 0.81.
49. The method of any one of claims 26-46, wherein the predictive model comprises a random forest model, and wherein the predictive model achieves an area under a curve (AUC) value of at least 0.71.
50. The method of any one of claims 26-46, wherein the predictive model comprises an XGBoost model, and wherein the predictive model achieves an area under a curve (AUC) value of at least 0.70.
51. A method for predicting risk of cancer in a subject, the method comprising: obtaining or having obtained a dataset derived from the subject comprising quantitative levels of a plurality of biomarkers, wherein the plurality of biomarkers comprises protein biomarkers comprising two or more of TGFA, MMP12, TNFRSF13B, TNFSF14, and MASP1, and generating a prediction of risk of cancer for the subject by applying a predictive model to the quantitative values of the plurality of biomarkers.
52. The method of claim 51, wherein the protein biomarkers comprise three or more of TGFA, MMP12, TNFRSF13B, TNFSF14, and MASP1.
53. The method of claim 51, wherein the protein biomarkers comprise four or more of TGFA, MMP12, TNFRSF13B, TNFSF14, and MASP1.
54. The method of claim 51, wherein the protein biomarkers comprise each of TGFA, MMP12, TNFRSF13B, TNFSF14, and MASP1.
55. The method of any one of claims 51-54, wherein the protein biomarkers further comprise one or more of THBS2, GDNF, FLT1, FXYD5, CST5, ARNT, CDCP1, CCL20, FLT3LG, CLEC7A, PRKCQ, SCGN, IL5, NPY, and S100A16.
56. The method of any one of claims 51-54, wherein the protein biomarkers further comprise five or more of THBS2, GDNF, FLT1, FXYD5, CST5, ARNT, CDCP1, CCL20, FLT3LG, CLEC7A, PRKCQ, SCGN, IL5, NPY, and S100A16.
57. The method of any one of claims 51-54, wherein the protein biomarkers further comprise ten or more of THBS2, GDNF, FLT1, FXYD5, CST5, ARNT, CDCP1, CCL20, FLT3LG, CLEC7A, PRKCQ, SCGN, IL5, NPY, and S100A16.
58. The method of any one of claims 51-54, wherein the protein biomarkers further comprise each of THBS2, GDNF, FLT1, FXYD5, CST5, ARNT, CDCP1, CCL20, FLT3LG, CLEC7A, PRKCQ, SCGN, IL5, NPY, and S100A16.
59. The method of any one of claims 51-58, wherein the protein biomarkers further comprise one or more of IL1B, CD84, STC1, PRDX3, LAP3, GAMT, CASP2, ITGA6, DECR1, and YTHDF3.
60. The method of any one of claims 51-58, wherein the protein biomarkers further comprise five or more of IL1B, CD84, STC1, PRDX3, LAP3, GAMT, CASP2, ITGA6, DECR1, and YTHDF3.
61. The method of any one of claims 51-58, wherein the protein biomarkers further comprise each of IL1B, CD84, STC1, PRDX3, LAP3, GAMT, CASP2, ITGA6, DECR1, and YTHDF3.
62. The method of any one of claims 51-61, wherein the predictive model comprises an elastic net regression model, and wherein the predictive model achieves an area under a curve (AUC) value of at least 0.65.
63. The method of any one of claims 51-61, wherein the predictive model comprises a support vector machine, and wherein the predictive model achieves an area under a curve (AUC) value of at least 0.70.
64. The method of any one of claims 51-61, wherein the predictive model comprises a random forest model, and wherein the predictive model achieves an area under a curve (AUC) value of at least 0.67.
65. The method of any one of claims 51-61, wherein the predictive model comprises an XGBoost model, and wherein the predictive model achieves an area under a curve (AUC) value of at least 0.68.
66. The method of any one of claims 1-65, wherein the cancer is lung cancer.
67. The method of any one of claims 1-66, wherein the risk of cancer is a level of risk of the subject developing cancer within 1 year, within 2 years, within 3 years, within 4 years, within 5 years, within 6 years, within 7 years, within 8 years, within 9 years, or within 10 years.
68. The method of any one of claims 1-66, wherein the risk of cancer is a presence or absence of cancer.
69. The method of any one of claims 1-68, wherein the dataset is derived from a test sample obtained from the subject.
70. The method of claim 69, wherein the test sample is a blood, serum or plasma sample.
71. The method of any one of claims 1-70, wherein obtaining or having obtained the dataset comprises performing one or more assays.
72. The method of claim 71, wherein performing the one or more assays comprises performing an immunoassay to determine the expression levels of the plurality of biomarkers.
73. The method of claim 72, wherein the immunoassay is a Proximity Extension Assay (PEA) or LUMINEX xMAP Multiplex Assay.
74. The method of any one of claims 1-73, wherein the dataset comprises plasma proteomics data.
75. The method of any one of claims 1-74, further comprising: selecting a therapy for providing to the subject based on the prediction of cancer.
76. A non-transitory computer readable medium comprising instructions that, when executed by a processor, cause the processor to: obtain or have obtained a dataset derived from the subject comprising quantitative levels of a plurality of biomarkers, wherein the plurality of biomarkers comprises protein biomarkers comprising two or more of TSPAN1, CD28, SCN3B, ADGRB3, and IGFBP6, and generate a prediction of risk of cancer for the subject by applying a predictive model to the quantitative values of the plurality of biomarkers.
77. The non-transitory computer readable medium of claim 76, wherein the protein biomarkers comprise three or more of TSPAN1, CD28, SCN3B, ADGRB3, and IGFBP6.
78. The non-transitory computer readable medium of claim 76, wherein the protein biomarkers comprise four or more of TSPAN1, CD28, SCN3B, ADGRB3, and IGFBP6.
79. The non-transitory computer readable medium of claim 76, wherein the protein biomarkers comprise each of TSPAN1, CD28, SCN3B, ADGRB3, and IGFBP6.
80. The non-transitory computer readable medium of any one of claims 76-79, wherein the protein biomarkers further comprise one or more of NRTN, AIF1L, HSPB6, MB, TNFRSF19, IL5RA, TNR, CDNF, CST1, FGFBP2, S100A16, CD248, GFRA3, LMOD1, and POF1B.
81. The non-transitory computer readable medium of any one of claims 76-79, wherein the protein biomarkers further comprise five or more of NRTN, AIF1L, HSPB6, MB, TNFRSF19, IL5RA, TNR, CDNF, CST1, FGFBP2, S100A16, CD248, GFRA3, LMOD1, and POF1B.
82. The non-transitory computer readable medium of any one of claims 76-79, wherein the protein biomarkers further comprise ten or more of NRTN, AIF1L, HSPB6, MB, TNFRSF19, IL5RA, TNR, CDNF, CST1, FGFBP2, S100A16, CD248, GFRA3, LMOD1, and POF1B.
83. The non-transitory computer readable medium of any one of claims 76-79, wherein the protein biomarkers further comprise each of NRTN, AIF1L, HSPB6, MB, TNFRSF19, IL5RA, TNR, CDNF, CST1, FGFBP2, S100A16, CD248, GFRA3, LMOD1, and POF1B.
84. The non-transitory computer readable medium of any one of claims 76-83, wherein the protein biomarkers further comprise one or more of DENND2B, COMP, CNTN2, SCARA5, CSPG4, ITGAV, SOST, SERPINA4, LILRA4, SPINK5, PINLYP, ACTN2, JAM2, FAP, TMOD4, GUCA2A, MFAP3L, DKK4, LAMA1, BAG3, SNCG, SEPTIN3, VWC2, KLRC1, ATRAID, ART3, SLITRK2, SIGLEC6, TMED4, and SLAMF7.
85. The non-transitory computer readable medium of any one of claims 76-83, wherein the protein biomarkers further comprise five or more of DENND2B, COMP, CNTN2, SCARA5, CSPG4, ITGAV, SOST, SERPINA4, LILRA4, SPINK5, PINLYP, ACTN2, JAM2, FAP, TMOD4, GUCA2A, MFAP3L, DKK4, LAMA1, BAG3, SNCG, SEPTIN3, VWC2, KLRC1, ATRAID, ART3, SLITRK2, SIGLEC6, TMED4, and SLAMF7.
86. The non-transitory computer readable medium of any one of claims 76-83, wherein the protein biomarkers further comprise ten or more of DENND2B, COMP, CNTN2, SCARA5, CSPG4, ITGAV, SOST, SERPINA4, LILRA4, SPINK5, PINLYP, ACTN2, JAM2, FAP, TMOD4, GUCA2A, MFAP3L, DKK4, LAMA1, BAG3, SNCG, SEPTIN3, VWC2, KLRC1, ATRAID, ART3, SLITRK2, SIGLEC6, TMED4, and SLAMF7.
87. The non-transitory computer readable medium of any one of claims 76-83, wherein the protein biomarkers further comprise twenty or more of DENND2B, COMP, CNTN2, SCARA5, CSPG4, ITGAV, SOST, SERPINA4, LILRA4, SPINK5, PINLYP, ACTN2, JAM2, FAP, TMOD4, GUCA2A, MFAP3L, DKK4, LAMA1, BAG3, SNCG, SEPTIN3, VWC2, KLRC1, ATRAID, ART3, SLITRK2, SIGLEC6, TMED4, and SLAMF7.
88. The non-transitory computer readable medium of any one of claims 76-83, wherein the protein biomarkers further comprise each of DENND2B, COMP, CNTN2, SCARA5, CSPG4, ITGAV, SOST, SERPINA4, LILRA4, SPINK5, PINLYP, ACTN2, JAM2, FAP, TMOD4, GUCA2A, MFAP3L, DKK4, LAMA1, BAG3, SNCG, SEPTIN3, VWC2, KLRC1, ATRAID, ART3, SLITRK2, SIGLEC6, TMED4, and SLAMF7.
89. The non-transitory computer readable medium of any one of claims 76-88, wherein the protein biomarkers further comprise one or more of CKMT1A, SEMA6C, CD2, CST5, PBXIP1, LECT2, PYY, AGRN, INSL5, CD38, PI16, CCN5, TNFRSF17, LY9, GPC1, CLMP, MEP1B, CCN1, PCDH7, SPARCL1, CRNN, PM20D1, TNFRSF12A, DSCAM, PALM, CX3CL1, MEP1A, SLURP1, APOA4, ADAMTSL5, MEPE, WFDC1, RPS10, CD300C, RIPK4, CALCB, RTBDN, ENO3, NTF3, PTPRZ1, LRP2BP, CPE, MCAM, BGN, PLB1, YAP1, TGFBI, CYB5A, EDDM3B, and SELENOP.
90. The non-transitory computer readable medium of any one of claims 76-88, wherein the protein biomarkers further comprise five or more of CKMT1A, SEMA6C, CD2, CST5, PBXIP1, LECT2, PYY, AGRN, INSL5, CD38, PI16, CCN5, TNFRSF17, LY9, GPC1, CLMP, MEP1B, CCN1, PCDH7, SPARCL1, CRNN, PM20D1, TNFRSF12A, DSCAM, PALM, CX3CL1, MEP1A, SLURP1, APOA4, ADAMTSL5, MEPE, WFDC1, RPS10, CD300C, RIPK4, CALCB, RTBDN, ENO3, NTF3, PTPRZ1, LRP2BP, CPE, MCAM, BGN, PLB1, YAP1, TGFBI, CYB5A, EDDM3B, and SELENOP.
91. The non-transitory computer readable medium of any one of claims 76-88, wherein the protein biomarkers further comprise ten or more of CKMT1A, SEMA6C, CD2, CST5, PBXIP1, LECT2, PYY, AGRN, INSL5, CD38, PI16, CCN5, TNFRSF17, LY9, GPC1, CLMP, MEP1B, CCN1, PCDH7, SPARCL1, CRNN, PM20D1, TNFRSF12A, DSCAM, PALM, CX3CL1, MEP1A, SLURP1, APOA4, ADAMTSL5, MEPE, WFDC1, RPS10, CD300C, RIPK4, CALCB, RTBDN, ENO3, NTF3, PTPRZ1, LRP2BP, CPE, MCAM, BGN, PLB1, YAP1, TGFBI, CYB5A, EDDM3B, and SELENOP.
92. The non-transitory computer readable medium of any one of claims 76-88, wherein the protein biomarkers further comprise twenty or more of CKMT1A, SEMA6C, CD2, CST5, PBXIP1, LECT2, PYY, AGRN, INSL5, CD38, PI16, CCN5, TNFRSF17, LY9, GPC1, CLMP, MEP1B, CCN1, PCDH7, SPARCL1, CRNN, PM20D1, TNFRSF12A, DSCAM, PALM, CX3CL1, MEP1A, SLURP1, APOA4, ADAMTSL5, MEPE, WFDC1, RPS10, CD300C, RIPK4, CALCB, RTBDN, ENO3, NTF3, PTPRZ1, LRP2BP, CPE, MCAM, BGN, PLB1, YAP1, TGFBI, CYB5A, EDDM3B, and SELENOP.
93. The non-transitory computer readable medium of any one of claims 76-88, wherein the protein biomarkers further comprise thirty or more of CKMT1A, SEMA6C, CD2, CST5, PBXIP1, LECT2, PYY, AGRN, INSL5, CD38, PI16, CCN5, TNFRSF17, LY9, GPC1, CLMP, MEP1B, CCN1, PCDH7, SPARCL1, CRNN, PM20D1, TNFRSF12A, DSCAM, PALM, CX3CL1, MEP1A, SLURP1, APOA4, ADAMTSL5, MEPE, WFDC1, RPS10, CD300C, RIPK4, CALCB, RTBDN, ENO3, NTF3, PTPRZ1, LRP2BP, CPE, MCAM, BGN, PLB1, YAP1, TGFBI, CYB5A, EDDM3B, and SELENOP.
94. The non-transitory computer readable medium of any one of claims 76-88, wherein the protein biomarkers further comprise forty or more of CKMT1A, SEMA6C, CD2, CST5, PBXIP1, LECT2, PYY, AGRN, INSL5, CD38, PI16, CCN5, TNFRSF17, LY9, GPC1, CLMP, MEP1B, CCN1, PCDH7, SPARCL1, CRNN, PM20D1, TNFRSF12A, DSCAM, PALM, CX3CL1, MEP1A, SLURP1, APOA4, ADAMTSL5, MEPE, WFDC1, RPS10, CD300C, RIPK4, CALCB, RTBDN, ENO3, NTF3, PTPRZ1, LRP2BP, CPE, MCAM, BGN, PLB1, YAP1, TGFBI, CYB5A, EDDM3B, and SELENOP.
95. The non-transitory computer readable medium of any one of claims 76-88, wherein the protein biomarkers further comprise each of CKMT1A, SEMA6C, CD2, CST5, PBXIP1, LECT2, PYY, AGRN, INSL5, CD38, PI16, CCN5, TNFRSF17, LY9, GPC1, CLMP, MEP1B, CCN1, PCDH7, SPARCL1, CRNN, PM20D1, TNFRSF12A, DSCAM, PALM, CX3CL1, MEP1A, SLURP1, APOA4, ADAMTSL5, MEPE, WFDC1, RPS10, CD300C, RIPK4, CALCB, RTBDN, ENO3, NTF3, PTPRZ1, LRP2BP, CPE, MCAM, BGN, PLB1, YAP1, TGFBI, CYB5A, EDDM3B, and SELENOP.
96. The non-transitory computer readable medium of any one of claims 76-95, wherein the protein biomarkers further comprise one or more of ENPP6, TMEM25, GIP, CSPG5, SCGN, TMPRSS15, LAIR2, KIRREL1, NTF4, TSPAN7, ENDOU, KLK10, CCL24, GPR37, CD3D, TJP3, DKKL1, CFC1, LRRC38, GCG, AGBL2, FASLG, AHNAK2, WFIKKN2, ANXA10, HS6ST1, DUSP29, CA14, CLEC7A, PHLDB2, SCRG1, RSPO3, TOP1, TINAGL1, NCAM1, FAM3D, FLT3LG, ZP3, AGRP, ASAH2, PDGFRB, AFM, NPY, PPY, XG, MFGE8, PROS1, MEGF11, CTSO, CTLA4, CSF3R, FCAR, CTAG1A, SCPEP1, PRSS53, CRELD2, PILRA, PROC, VASH1, NOS3, BPIFB2, UPK3BL1, NOP56, JAM3, HLA-DRA, SIL1, TRPV3, EDEM2, POLR2A, CBLN1, FKBP7, CCL20, PILRB, SIRPB1, VSTM1, BST2, DLL4, C1RL, RNASET2, KCNH2, IL12RB2, FZD10, OXCT1, TREML2, GRIN2B, GFRAL, RGS8, LRPAP1, LRP2, IGSF21, DPT, HEPACAM2, MATN3, UXS1, PTTG1, BTN1A1, IL17C, SCIN, TK1, FKBP14, VWA5A, PRKG1, SV2A, PMCH, NEXN, CDCP1, DDX53, THSD1, PAK4, MMP12, FCN1, UMOD, PDIA4, IL6, BRK1, LILRA2, RBPMS2, SERPIND1, TPSG1, CEACAM5, FGF9, PPIF, RNF43, SIGLEC9, TOMM20, PDE5A, NELL1, GBA, PAEP, ERN1, PCSK7, CHCHD6, MARCO, SFTPA1, IL9, KYNU, SPINT1, LRFN2, NECTIN1, OSCAR, PZP, BPIFB1, LILRA5, CALY, RRAS, GADD45GIP1, ISM2, SCGB3A2, CEACAM6, LPP, GKN1, LRIG1, CLSPN, CXCL13, SFTPA2, COX6B1, PTGR1, RBPMS, PPT1, AOC1, PDLIM5, L3HYPDH, LONP1, APOL1, CEACAM18, FGF7, and KRT14.
97. The non-transitory computer readable medium of any one of claims 76-96, wherein the predictive model comprises an elastic net regression model, and wherein the predictive model achieves an area under a curve (AUC) value of at least 0.85.
98. The non-transitory computer readable medium of any one of claims 76-96, wherein the predictive model comprises a support vector machine, and wherein the predictive model achieves an area under a curve (AUC) value of at least 0.84.
99. The non-transitory computer readable medium of any one of claims 76-96, wherein the predictive model comprises a random forest model, and wherein the predictive model achieves an area under a curve (AUC) value of at least 0.72.
100. The non-transitory computer readable medium of any one of claims 76-96, wherein the predictive model comprises an XGBoost model, and wherein the predictive model achieves an area under a curve (AUC) value of at least 0.73.
101. A non-transitory computer readable medium comprising instructions that, when executed by a processor, cause the processor to: obtain or have obtained a dataset derived from the subject comprising quantitative levels of a plurality of biomarkers, wherein the plurality of biomarkers comprises protein biomarkers comprising two or more of GAST, ENPP2, FZD8, FGF23, and TFF1, and generate a prediction of risk of cancer for the subject by applying a predictive model to the quantitative values of the plurality of biomarkers.
102. The non-transitory computer readable medium of claim 101, wherein the protein biomarkers comprise three or more of GAST, ENPP2, FZD8, FGF23, and TFF1.
103. The non-transitory computer readable medium of claim 101, wherein the protein biomarkers comprise four or more of GAST, ENPP2, FZD8, FGF23, and TFF1.
104. The non-transitory computer readable medium of claim 101, wherein the protein biomarkers comprise each of GAST, ENPP2, FZD8, FGF23, and TFF1.

. The non-transitory computer readable medium of any one of claims 101-104, wherein the protein biomarkers further comprise one or more of MAPT, FGF16, OXT, BRD1, MFAP4, WNT9A, FLRT2, CRTAC1, PAPP A, POMC, NGF, IDI2, TPT1, EPHA10, and MFAP3. . The non-transitory computer readable medium of any one of claims 101-104, wherein the protein biomarkers further comprise five or more of MAPT, FGF16, OXT, BRD1, MFAP4, WNT9A, FLRT2, CRTAC1, PAPP A, POMC, NGF, IDI2, TPT1, EPHA10, and MFAP3. . The non-transitory computer readable medium of any one of claims 101-104, wherein the protein biomarkers further comprise ten or more of MAPT, FGF16, OXT, BRD1, MFAP4, WNT9A, FLRT2, CRTAC1, PAPP A, POMC, NGF, IDI2, TPT1, EPHA10, and MFAP3. . The non-transitory computer readable medium of any one of claims 101-104, wherein the protein biomarkers further comprise each of MAPT, FGF16, OXT, BRD1, MFAP4, WNT9A, FLRT2, CRTAC1, PAPP A, POMC, NGF, IDI2, TPT1, EPHA10, and MFAP3.. The non-transitory computer readable medium of any one of claims 101-108, wherein the protein biomarkers further comprise one or more of SOW AHA, RARRES1, DUSP3, SEMA3F, CNTN3, LPA, KLK11, RPGR, EPO, TDGF1, IL17A, CD160, TNPO1, GAMT, ENPP6, TMEM25, GIP, CSPG5, SCGN, TMPRSS15, LAIR2, KIRREL1, NTF4, TSPAN7, ENDOU, KLK10, CCL24, GPR37, CD3D, and TJP3. . The non-transitory computer readable medium of any one of claims 101-108, wherein the protein biomarkers further comprise five or more of SOW AHA, RARRES1, DUSP3, SEMA3F, CNTN3, LPA, KLK11, RPGR, EPO, TDGF1, IL17A, CD160, TNPO1, GAMT, ENPP6, TMEM25, GIP, CSPG5, SCGN, TMPRSS15, LAIR2, KIRREL1, NTF4, TSPAN7, ENDOU, KLK10, CCL24, GPR37, CD3D, and TJP3. . 
The non-transitory computer readable medium of any one of claims 101-108, wherein the protein biomarkers further comprise ten or more of SOWAHA, RARRES1, DUSP3, SEMA3F, CNTN3, LPA, KLK11, RPGR, EPO, TDGF1, IL17A, CD160, TNPO1, GAMT, ENPP6, TMEM25, GIP, CSPG5, SCGN, TMPRSS15, LAIR2, KIRREL1, NTF4, TSPAN7, ENDOU, KLK10, CCL24, GPR37, CD3D, and TJP3.

. The non-transitory computer readable medium of any one of claims 101-108, wherein the protein biomarkers further comprise twenty or more of SOW AHA, RARRES1, DUSP3, SEMA3F, CNTN3, LPA, KLK11, RPGR, EPO, TDGF1, IL17A, CD160, TNPO1, GAMT, ENPP6, TMEM25, GIP, CSPG5, SCGN, TMPRSS15, LAIR2, KIRREL1, NTF4, TSPAN7, ENDOU, KLK10, CCL24, GPR37, CD3D, and TJP3.. The non-transitory computer readable medium of any one of claims 101-108, wherein the protein biomarkers further comprise each of SOW AHA, RARRES1, DUSP3, SEMA3F, CNTN3, LPA, KLK11, RPGR, EPO, TDGF1, IL17A, CD160, TNPO1, GAMT, ENPP6, TMEM25, GIP, CSPG5, SCGN, TMPRSS15, LAIR2, KIRREL1, NTF4, TSPAN7, ENDOU, KLK10, CCL24, GPR37, CD3D, and TJP3. . The non-transitory computer readable medium of any one of claims 101-113, wherein the protein biomarkers further comprise one or more of DKKL1, CFC1, LRRC38, GCG, AGBL2, FASLG, AHNAK2, WFIKKN2, ANXA10, HS6ST1, DUSP29, CA14, CLEC7A, PHLDB2, SCRG1, RSPO3, TOPI, HNAGL1, NCAM1, FAM3D, FLT3LG, ZP3, AGRP, ASAH2, PDGFRB, AFM, NPY, PPY, XG, MFGE8, PROS1, MEGF11, SCT, CFB, Fl 1, ANK2, ENOPH1, UGDH, ASAHI, ERBB4, IL36A, FGA, C5, OSMR, SSBP1, RICTOR, LRG1, C4BPB, AIDA, and SSC4D. . The non-transitory computer readable medium of any one of claims 101-113, wherein the protein biomarkers further comprise five or more of DKKL1, CFC1, LRRC38, GCG, AGBL2, FASLG, AHNAK2, WFIKKN2, ANXA10, HS6ST1, DUSP29, CA14, CLEC7A, PHLDB2, SCRG1, RSPO3, TOPI, HNAGL1, NCAM1, FAM3D, FLT3LG, ZP3, AGRP, ASAH2, PDGFRB, AFM, NPY, PPY, XG, MFGE8, PROS1, MEGF11, SCT, CFB, Fl l, ANK2, ENOPH1, UGDH, ASAHI, ERBB4, IL36A, FGA, C5, OSMR, SSBP1, RICTOR, LRG1, C4BPB, AIDA, and SSC4D. . 
The non-transitory computer readable medium of any one of claims 101-113, wherein the protein biomarkers further comprise ten or more of DKKL1, CFC1, LRRC38, GCG, AGBL2, FASLG, AHNAK2, WFIKKN2, ANXA10, HS6ST1, DUSP29, CAI 4, CLEC7A, PHLDB2, SCRG1, RSPO3, TOPI, HNAGL1, NCAM1, FAM3D, FLT3LG, ZP3, AGRP, ASAH2, PDGFRB, AFM, NPY, PPY, XG, MFGE8, PROS1, MEGF11, SCT, CFB, Fl l, ANK2, EN0PH1, UGDH, ASAHI, ERBB4, IL36A, FGA, C5, OSMR, SSBP1, RICTOR, LRG1, C4BPB, AIDA, and SSC4D. . The non-transitory computer readable medium of any one of claims 101-113, wherein the protein biomarkers further comprise twenty or more of DKKL1, CFC1, LRRC38, GCG, AGBL2, FASLG, AHNAK2, WFIKKN2, ANXA10, HS6ST1, DUSP29, CAI 4, CLEC7A, PHLDB2, SCRG1, RSPO3, TOPI, TINAGL1, NCAM1, FAM3D, FLT3LG, ZP3, AGRP, ASAH2, PDGFRB, AFM, NPY, PPY, XG, MFGE8, PROS1, MEGF11, SCT, CFB, Fl l, ANK2, ENOPH1, UGDH, ASAHI, ERBB4, IL36A, FGA, C5, OSMR, SSBP1, RICTOR, LRG1, C4BPB, AIDA, and SSC4D. . The non-transitory computer readable medium of any one of claims 101-113, wherein the protein biomarkers further comprise thirty or more of DKKL1, CFC1, LRRC38, GCG, AGBL2, FASLG, AHNAK2, WFIKKN2, ANXA10, HS6ST1, DUSP29, CAI 4, CLEC7A, PHLDB2, SCRG1, RSPO3, TOPI, TINAGL1, NCAM1, FAM3D, FLT3LG, ZP3, AGRP, ASAH2, PDGFRB, AFM, NPY, PPY, XG, MFGE8, PROS1, MEGF11, SCT, CFB, Fl l, ANK2, ENOPH1, UGDH, ASAHI, ERBB4, IL36A, FGA, C5, OSMR, SSBP1, RICTOR, LRG1, C4BPB, AIDA, and SSC4D. . The non-transitory computer readable medium of any one of claims 101-113, wherein the protein biomarkers further comprise forty or more of DKKL1, CFC1, LRRC38, GCG, AGBL2, FASLG, AHNAK2, WFIKKN2, ANXA10, HS6ST1, DUSP29, CA14, CLEC7A, PHLDB2, SCRG1, RSPO3, TOPI, TINAGL1, NCAM1, FAM3D, FLT3LG, ZP3, AGRP, ASAH2, PDGFRB, AFM, NPY, PPY, XG, MFGE8, PROS1, MEGF11, SCT, CFB, Fl l, ANK2, ENOPH1, UGDH, ASAHI, ERBB4, IL36A, FGA, C5, OSMR, SSBP1, RICTOR, LRG1, C4BPB, AIDA, and SSC4D. . 
The non-transitory computer readable medium of any one of claims 101-113, wherein the protein biomarkers further comprise each of DKKL1, CFC1, LRRC38, GCG, AGBL2, FASLG, AHNAK2, WFIKKN2, ANXA10, HS6ST1, DUSP29, CA14, CLEC7A, PHLDB2, SCRG1, RSPO3, TOP1, TINAGL1, NCAM1, FAM3D, FLT3LG, ZP3, AGRP, ASAH2, PDGFRB, AFM, NPY, PPY, XG, MFGE8, PROS1, MEGF11, SCT, CFB, F11, ANK2, ENOPH1, UGDH, ASAH1, ERBB4, IL36A, FGA, C5, OSMR, SSBP1, RICTOR, LRG1, C4BPB, AIDA, and SSC4D.

. The non-transitory computer readable medium of any one of claims 101-120, wherein the protein biomarkers further comprise one or more of GRN, IFNAR1 , ENPEP, ACADSB, MAN1A2, GBP4, SERPING1, COL4A4, SOX2, GRSF1, PRAME, KIR2DS4, ADAMTS1, ITPRIP, CRISP3, DSG4, ITIH4, MRC1, GABRA4, SERPINA3, MILR1, PLIN1, SHH, KLKB1, IL17RA, MMP10, LBP, SMAD5, ADRA2A, SESTD1, CFI, AKR7L, CTSH, LYPD3, CBLIF, SMTN, CFH, SERPINC1, GDF15, PDZD2, ALDH2, IZUM01, DNM3, CCL19, CSF2, MCEE, FDX1, SDC1, POSTN, GP2, CST7, CD 14, NEK7, SHC1, CRELD1, TCN2, CMIP, CRHBP, C9, PXDNL, NRCAM, DLG4, TRAF3IP2, SULT2A1, GSTT2B, ITIH1, MRPL24, MUC16, IL3, CLU, FHIP2A, TK1, FKBP14, VWA5A, PRKG1, SV2A, PMCH, NEXN, CDCP1, DDX53, THSD1, PAK4, MMP12, FCN1, UMOD, PDIA4, IL6, BRK1, LILRA2, RBPMS2, SERPIND1, TPSG1, CEACAM5, FGF9, PPIF, RNF43, SIGLEC9, TOMM20, PDE5A, NELLI, GBA, PAEP, ERN1, PCSK7, CHCHD6, MARCO, SFTPA1, IL9, KYNU, SPINT1, LRFN2, NECTIN1, OSCAR, PZP, BPIFB1, LILRA5, CALY, RRAS, GADD45GIP1, ISM2, SCGB3A2, CEACAM6, LPP, GKN1, LRIG1, CLSPN, CXCL13, SFTPA2, COX6B1, PTGR1, RBPMS, PPT1, AOC1, PDLIM5, L3HYPDH, LONP1, APOL1, CEACAM18, FGF7, and KRT14. . The non-transitory computer readable medium of any one of claims 101-121, wherein the predictive model comprises a elastic net regression model, and wherein the predictive model achieves an area under a curve (AUC) value of at least 0.79. . The non-transitory computer readable medium of any one of claims 101-121, wherein the predictive model comprises a support vector machine, and wherein the predictive model achieves an area under a curve (AUC) value of at least 0.81. . The non-transitory computer readable medium of any one of claims 101-121, wherein the predictive model comprises a random forest model, and wherein the predictive model achieves an area under a curve (AUC) value of at least 0.71. . 
The non-transitory computer readable medium of any one of claims 101-121, wherein the predictive model comprises an XGBoost model, and wherein the predictive model achieves an area under a curve (AUC) value of at least 0.70.

The non-transitory computer readable medium of any one of claims 101-125, wherein the cancer is lung cancer. The non-transitory computer readable medium of any one of claims 101-126, wherein the risk of cancer is a level of risk of the subject developing cancer within 1 year, within 2 years, within 3 years, within 4 years, within 5 years, within 6 years, within 7 years, within 8 years, within 9 years, or within 10 years. The non-transitory computer readable medium of any one of claims 101-127, wherein the risk of cancer is a presence or absence of cancer. The non-transitory computer readable medium of any one of claims 101-128, wherein the dataset is derived from a test sample obtained from the subject. The non-transitory computer readable medium of any one of claims 101-129, wherein the test sample is a blood, serum or plasma sample. The non-transitory computer readable medium of any one of claims 101-130, wherein the dataset is obtained from having performed one or more assays. The non-transitory computer readable medium of claim 131, wherein the one or more assays comprises an immunoassay to determine the expression levels of the plurality of biomarkers. The non-transitory computer readable medium of claim 132, wherein the immunoassay is a Proximity Extension Assay (PEA) or LUMINEX xMAP Multiplex Assay. A non-transitory computer readable medium comprising instructions that, when executed by a processor, cause the processor to: obtain or have obtained a dataset derived from the subject comprising quantitative levels of a plurality of biomarkers, wherein the plurality of biomarkers comprises protein biomarkers comprising two or more of TGFA, MMP12, TNFRSF13B, TNFSF14, and MASP1, and generate a prediction of risk of cancer for the subject by applying a predictive model to the quantitative values of the plurality of biomarkers.
The non-transitory computer readable medium of claim 134, wherein the protein biomarkers comprise three or more of TGFA, MMP12, TNFRSF13B, TNFSF14, and MASP1.

The non-transitory computer readable medium of claim 134, wherein the protein biomarkers comprise four or more of TGFA, MMP12, TNFRSF13B, TNFSF14, and MASP1. The non-transitory computer readable medium of claim 134, wherein the protein biomarkers comprise each of TGFA, MMP12, TNFRSF13B, TNFSF14, and MASP1. The non-transitory computer readable medium of any one of claims 134-137, wherein the protein biomarkers further comprise one or more of THBS2, GDNF, FLT1, FXYD5, CST5, ARNT, CDCP1, CCL20, FLT3LG, CLEC7A, PRKCQ, SCGN, IL5, NPY, and S100A16. The non-transitory computer readable medium of any one of claims 134-137, wherein the protein biomarkers further comprise five or more of THBS2, GDNF, FLT1, FXYD5, CST5, ARNT, CDCP1, CCL20, FLT3LG, CLEC7A, PRKCQ, SCGN, IL5, NPY, and S100A16. The non-transitory computer readable medium of any one of claims 134-137, wherein the protein biomarkers further comprise ten or more of THBS2, GDNF, FLT1, FXYD5, CST5, ARNT, CDCP1, CCL20, FLT3LG, CLEC7A, PRKCQ, SCGN, IL5, NPY, and S100A16. The non-transitory computer readable medium of any one of claims 134-137, wherein the protein biomarkers further comprise each of THBS2, GDNF, FLT1, FXYD5, CST5, ARNT, CDCP1, CCL20, FLT3LG, CLEC7A, PRKCQ, SCGN, IL5, NPY, and S100A16. The non-transitory computer readable medium of any one of claims 134-141, wherein the protein biomarkers further comprise one or more of IL1B, CD84, STC1, PRDX3, LAP3, GAMT, CASP2, ITGA6, DECR1, and YTHDF3. The non-transitory computer readable medium of any one of claims 134-141, wherein the protein biomarkers further comprise five or more of IL1B, CD84, STC1, PRDX3, LAP3, GAMT, CASP2, ITGA6, DECR1, and YTHDF3. The non-transitory computer readable medium of any one of claims 134-141, wherein the protein biomarkers further comprise each of IL1B, CD84, STC1, PRDX3, LAP3, GAMT, CASP2, ITGA6, DECR1, and YTHDF3.

The non-transitory computer readable medium of any one of claims 134-144, wherein the predictive model comprises an elastic net regression model, and wherein the predictive model achieves an area under a curve (AUC) value of at least 0.65. The non-transitory computer readable medium of any one of claims 134-144, wherein the predictive model comprises a support vector machine, and wherein the predictive model achieves an area under a curve (AUC) value of at least 0.70. The non-transitory computer readable medium of any one of claims 134-144, wherein the predictive model comprises a random forest model, and wherein the predictive model achieves an area under a curve (AUC) value of at least 0.67. The non-transitory computer readable medium of any one of claims 134-144, wherein the predictive model comprises an XGBoost model, and wherein the predictive model achieves an area under a curve (AUC) value of at least 0.68. The non-transitory computer readable medium of any one of claims 76-148, wherein the dataset comprises plasma proteomics data. The non-transitory computer readable medium of any one of claims 76-149, wherein a therapy is selected for providing to the subject based on the prediction of cancer.

Description:
PROTEIN PREDICTORS FOR LUNG CANCER

CROSS REFERENCE TO RELATED APPLICATIONS

[0001] This application claims the benefit of and priority to U.S. Provisional Patent Application No. 63/351,689 filed June 13, 2022, the entire disclosure of which is hereby incorporated by reference in its entirety for all purposes.

FIELD

[0002] The field relates to predictive models that are useful for predicting risk of cancer (e.g., lung cancer). These predictive models are based at least on the measurement of protein profiles from samples (e.g., blood plasma samples).

BACKGROUND

[0003] Lung cancer is the leading cause of cancer deaths worldwide. This is largely due to its advanced stage at the time of diagnosis, with 5-year survival of only 15% or less. It is difficult to identify people who have early stage lung cancer in a cost-efficient manner. Hence, people are often referred to hospital clinics with late stage disease, which leads to poor curative opportunities and outlook.

SUMMARY

[0004] Disclosed herein are methods for predicting risk of cancer (e.g., future risk of cancer or presence or absence of cancer) in a subject using plasma proteomics data derived from the subject. Further disclosed are methods, such as recursive feature elimination, for selecting a subset of protein biomarkers for predicting risk of cancer. Additionally disclosed herein are non-transitory computer readable mediums for predicting risk of cancer in a subject using predictive models. Additionally disclosed herein are kits containing one or more sets of reagents for determining quantitative values of protein predictors for predicting risk of cancer. In various embodiments, the prediction for risk of cancer for the subject is a prediction of presence or absence of cancer in the subject, or a prediction of whether the subject is likely to develop cancer in the future (e.g., within 1-20 years). In various embodiments, the terms “levels” and “values”, such as the levels or values of metabolites, biomarkers, markers or predictors, are synonymous and may be used interchangeably. Therefore, in these embodiments, any reference to “values”, such as the values of metabolites, biomarkers, markers or predictors, may equally be construed as “levels”, such as the levels of those metabolites, biomarkers, markers or predictors. Similarly, in these embodiments, any reference herein to “levels”, such as the levels of metabolites, biomarkers, markers or predictors, may equally be construed as “values”, such as the values of those metabolites, biomarkers, markers or predictors.
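
By way of illustration only, recursive feature elimination can be sketched as follows. This is a minimal sketch under stated assumptions and is not presented as the selection procedure of the disclosure: the ranking model (a small L2-penalized logistic regression fit by gradient descent), the function names `fit_logistic` and `recursive_feature_elimination`, the hyperparameters, and the toy data for placeholder proteins P1 to P3 are all hypothetical. Feature values are assumed to be standardized already.

```python
import math


def fit_logistic(rows, labels, n_iter=300, lr=0.5, l2=0.01):
    """Fit an L2-penalized logistic regression by batch gradient descent.

    Returns one weight per feature (the intercept is fit but not returned).
    """
    n_features = len(rows[0])
    n_samples = len(rows)
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            z = bias + sum(w * v for w, v in zip(weights, x))
            residual = 1.0 / (1.0 + math.exp(-z)) - y  # predicted minus observed
            for j, v in enumerate(x):
                grad_w[j] += residual * v
            grad_b += residual
        weights = [w - lr * (g / n_samples + l2 * w)
                   for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n_samples
    return weights


def recursive_feature_elimination(rows, labels, names, n_keep):
    """Refit the model repeatedly, dropping the feature with the smallest
    absolute weight each round, until n_keep features remain."""
    names = list(names)
    rows = [list(r) for r in rows]
    eliminated = []
    while len(names) > n_keep:
        weights = fit_logistic(rows, labels)
        drop = min(range(len(names)), key=lambda j: abs(weights[j]))
        eliminated.append(names.pop(drop))
        rows = [r[:drop] + r[drop + 1:] for r in rows]
    return names, eliminated


# Toy, made-up data: P1 tracks the label, P2 is unrelated, P3 is weakly related.
TOY_NAMES = ["P1", "P2", "P3"]
TOY_ROWS = [
    [-1, 1, -1], [-1, -1, -1], [-1, 1, -1], [-1, -1, 1],
    [1, 1, 1], [1, -1, 1], [1, 1, 1], [1, -1, -1],
]
TOY_LABELS = [0, 0, 0, 0, 1, 1, 1, 1]

kept, dropped = recursive_feature_elimination(TOY_ROWS, TOY_LABELS, TOY_NAMES, 1)
```

In this sketch the model is refit after every removal, so the importance of each remaining feature is re-evaluated in the context of the smaller set; that refitting step is what distinguishes recursive elimination from a single one-off ranking.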

[0005] Advantageously, the methods, non-transitory computer readable mediums, and/or kits as described herein can lead to early detection of lung cancer (e.g., before diagnosis), which may result in early intervention and treatment. This informs which patients to target with disease interception strategies and can thus improve survival and decrease mortality rates due to lung cancer.

[0006] Disclosed herein is a method for predicting risk of cancer in a subject, the method comprising: obtaining or having obtained a dataset derived from the subject comprising quantitative levels of a plurality of biomarkers, wherein the plurality of biomarkers comprises protein biomarkers comprising two or more of TSPAN1, CD28, SCN3B, ADGRB3, and IGFBP6, and generating a prediction of risk of cancer for the subject by applying a predictive model to the quantitative values of the plurality of biomarkers.

[0007] In various embodiments, the protein biomarkers comprise three or more of TSPAN1, CD28, SCN3B, ADGRB3, and IGFBP6.

[0008] In various embodiments, the protein biomarkers comprise four or more of TSPAN1, CD28, SCN3B, ADGRB3, and IGFBP6.

[0009] In various embodiments, the protein biomarkers comprise each of TSPAN1, CD28, SCN3B, ADGRB3, and IGFBP6.
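
The tiered embodiments above (two or more, three or more, four or more, or each of the five biomarkers) amount to a count of panel members present in the dataset. The following is a minimal sketch of such a check, offered for illustration only; the function names `count_measured` and `satisfies_tier`, the dictionary layout of the dataset, and the numeric values are assumptions and do not come from the disclosure.

```python
# Core panel as listed in paragraph [0006].
CORE_PANEL = ("TSPAN1", "CD28", "SCN3B", "ADGRB3", "IGFBP6")


def count_measured(dataset, panel=CORE_PANEL):
    """Count panel biomarkers that have a numeric quantitative level in the dataset."""
    return sum(1 for name in panel if isinstance(dataset.get(name), (int, float)))


def satisfies_tier(dataset, minimum, panel=CORE_PANEL):
    """Return True if at least `minimum` panel biomarkers are measured."""
    return count_measured(dataset, panel) >= minimum


# Made-up quantitative levels for one subject; MB is not part of the core panel.
sample = {"TSPAN1": 2.4, "CD28": 1.1, "IGFBP6": 3.7, "MB": 0.9}

n_core = count_measured(sample)                        # three core biomarkers present
has_two_or_more = satisfies_tier(sample, 2)            # True
has_each = satisfies_tier(sample, len(CORE_PANEL))     # False
```

The same helper can be reused for the additional biomarker lists by passing a different `panel` tuple.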

[0010] In various embodiments, the protein biomarkers further comprise one or more of NRTN, AIF1L, HSPB6, MB, TNFRSF19, IL5RA, TNR, CDNF, CST1, FGFBP2, S100A16, CD248, GFRA3, LMOD1, and POF1B.

[0011] In various embodiments, the protein biomarkers further comprise five or more of NRTN, AIF1L, HSPB6, MB, TNFRSF19, IL5RA, TNR, CDNF, CST1, FGFBP2, S100A16, CD248, GFRA3, LMOD1, and POF1B.

[0012] In various embodiments, the protein biomarkers further comprise ten or more of NRTN, AIF1L, HSPB6, MB, TNFRSF19, IL5RA, TNR, CDNF, CST1, FGFBP2, S100A16, CD248, GFRA3, LMOD1, and POF1B.

[0013] In various embodiments, the protein biomarkers further comprise each of NRTN, AIF1L, HSPB6, MB, TNFRSF19, IL5RA, TNR, CDNF, CST1, FGFBP2, S100A16, CD248, GFRA3, LMOD1, and POF1B.

[0014] In various embodiments, the protein biomarkers further comprise one or more of DENND2B, COMP, CNTN2, SCARA5, CSPG4, ITGAV, SOST, SERPINA4, LILRA4, SPINK5, PINLYP, ACTN2, JAM2, FAP, TMOD4, GUCA2A, MFAP3L, DKK4, LAMA1, BAG3, SNCG, SEPTIN3, VWC2, KLRC1, ATRAID, ART3, SLITRK2, SIGLEC6, TMED4, and SLAMF7.

[0015] In various embodiments, the protein biomarkers further comprise five or more of DENND2B, COMP, CNTN2, SCARA5, CSPG4, ITGAV, SOST, SERPINA4, LILRA4, SPINK5, PINLYP, ACTN2, JAM2, FAP, TMOD4, GUCA2A, MFAP3L, DKK4, LAMA1, BAG3, SNCG, SEPTIN3, VWC2, KLRC1, ATRAID, ART3, SLITRK2, SIGLEC6, TMED4, and SLAMF7.

[0016] In various embodiments, the protein biomarkers further comprise ten or more of DENND2B, COMP, CNTN2, SCARA5, CSPG4, ITGAV, SOST, SERPINA4, LILRA4, SPINK5, PINLYP, ACTN2, JAM2, FAP, TMOD4, GUCA2A, MFAP3L, DKK4, LAMA1, BAG3, SNCG, SEPTIN3, VWC2, KLRC1, ATRAID, ART3, SLITRK2, SIGLEC6, TMED4, and SLAMF7.

[0017] In various embodiments, the protein biomarkers further comprise twenty or more of DENND2B, COMP, CNTN2, SCARA5, CSPG4, ITGAV, SOST, SERPINA4, LILRA4, SPINK5, PINLYP, ACTN2, JAM2, FAP, TMOD4, GUCA2A, MFAP3L, DKK4, LAMA1, BAG3, SNCG, SEPTIN3, VWC2, KLRC1, ATRAID, ART3, SLITRK2, SIGLEC6, TMED4, and SLAMF7.

[0018] In various embodiments, the protein biomarkers further comprise each of DENND2B, COMP, CNTN2, SCARA5, CSPG4, ITGAV, SOST, SERPINA4, LILRA4, SPINK5, PINLYP, ACTN2, JAM2, FAP, TMOD4, GUCA2A, MFAP3L, DKK4, LAMA1, BAG3, SNCG, SEPTIN3, VWC2, KLRC1, ATRAID, ART3, SLITRK2, SIGLEC6, TMED4, and SLAMF7.

[0019] In various embodiments, the protein biomarkers further comprise one or more of CKMT1A, SEMA6C, CD2, CST5, PBXIP1, LECT2, PYY, AGRN, INSL5, CD38, PI16, CCN5, TNFRSF17, LY9, GPC1, CLMP, MEP1B, CCN1, PCDH7, SPARCL1, CRNN, PM20D1, TNFRSF12A, DSCAM, PALM, CX3CL1, MEP1A, SLURP1, APOA4, ADAMTSL5, MEPE, WFDC1, RPS10, CD300C, RIPK4, CALCB, RTBDN, ENO3, NTF3, PTPRZ1, LRP2BP, CPE, MCAM, BGN, PLB1, YAP1, TGFBI, CYB5A, EDDM3B, and SELENOP.

[0020] In various embodiments, the protein biomarkers further comprise five or more of CKMT1A, SEMA6C, CD2, CST5, PBXIP1, LECT2, PYY, AGRN, INSL5, CD38, PI16, CCN5, TNFRSF17, LY9, GPC1, CLMP, MEP1B, CCN1, PCDH7, SPARCL1, CRNN, PM20D1, TNFRSF12A, DSCAM, PALM, CX3CL1, MEP1A, SLURP1, APOA4, ADAMTSL5, MEPE, WFDC1, RPS10, CD300C, RIPK4, CALCB, RTBDN, ENO3, NTF3, PTPRZ1, LRP2BP, CPE, MCAM, BGN, PLB1, YAP1, TGFBI, CYB5A, EDDM3B, and SELENOP.

[0021] In various embodiments, the protein biomarkers further comprise ten or more of CKMT1A, SEMA6C, CD2, CST5, PBXIP1, LECT2, PYY, AGRN, INSL5, CD38, PI16, CCN5, TNFRSF17, LY9, GPC1, CLMP, MEP1B, CCN1, PCDH7, SPARCL1, CRNN, PM20D1, TNFRSF12A, DSCAM, PALM, CX3CL1, MEP1A, SLURP1, APOA4, ADAMTSL5, MEPE, WFDC1, RPS10, CD300C, RIPK4, CALCB, RTBDN, ENO3, NTF3, PTPRZ1, LRP2BP, CPE, MCAM, BGN, PLB1, YAP1, TGFBI, CYB5A, EDDM3B, and SELENOP.

[0022] In various embodiments, the protein biomarkers further comprise twenty or more of CKMT1A, SEMA6C, CD2, CST5, PBXIP1, LECT2, PYY, AGRN, INSL5, CD38, PI16, CCN5, TNFRSF17, LY9, GPC1, CLMP, MEP1B, CCN1, PCDH7, SPARCL1, CRNN, PM20D1, TNFRSF12A, DSCAM, PALM, CX3CL1, MEP1A, SLURP1, APOA4, ADAMTSL5, MEPE, WFDC1, RPS10, CD300C, RIPK4, CALCB, RTBDN, ENO3, NTF3, PTPRZ1, LRP2BP, CPE, MCAM, BGN, PLB1, YAP1, TGFBI, CYB5A, EDDM3B, and SELENOP.

[0023] In various embodiments, the protein biomarkers further comprise thirty or more of CKMT1A, SEMA6C, CD2, CST5, PBXIP1, LECT2, PYY, AGRN, INSL5, CD38, PI16, CCN5, TNFRSF17, LY9, GPC1, CLMP, MEP1B, CCN1, PCDH7, SPARCL1, CRNN, PM20D1, TNFRSF12A, DSCAM, PALM, CX3CL1, MEP1A, SLURP1, APOA4, ADAMTSL5, MEPE, WFDC1, RPS10, CD300C, RIPK4, CALCB, RTBDN, ENO3, NTF3, PTPRZ1, LRP2BP, CPE, MCAM, BGN, PLB1, YAP1, TGFBI, CYB5A, EDDM3B, and SELENOP.

[0024] In various embodiments, the protein biomarkers further comprise forty or more of CKMT1A, SEMA6C, CD2, CST5, PBXIP1, LECT2, PYY, AGRN, INSL5, CD38, PI16, CCN5, TNFRSF17, LY9, GPC1, CLMP, MEP1B, CCN1, PCDH7, SPARCL1, CRNN, PM20D1, TNFRSF12A, DSCAM, PALM, CX3CL1, MEP1A, SLURP1, APOA4, ADAMTSL5, MEPE, WFDC1, RPS10, CD300C, RIPK4, CALCB, RTBDN, ENO3, NTF3, PTPRZ1, LRP2BP, CPE, MCAM, BGN, PLB1, YAP1, TGFBI, CYB5A, EDDM3B, and SELENOP.

[0025] In various embodiments, the protein biomarkers further comprise each of CKMT1A, SEMA6C, CD2, CST5, PBXIP1, LECT2, PYY, AGRN, INSL5, CD38, PI16, CCN5, TNFRSF17, LY9, GPC1, CLMP, MEP1B, CCN1, PCDH7, SPARCL1, CRNN, PM20D1, TNFRSF12A, DSCAM, PALM, CX3CL1, MEP1A, SLURP1, APOA4, ADAMTSL5, MEPE, WFDC1, RPS10, CD300C, RIPK4, CALCB, RTBDN, ENO3, NTF3, PTPRZ1, LRP2BP, CPE, MCAM, BGN, PLB1, YAP1, TGFBI, CYB5A, EDDM3B, and SELENOP.

[0026] In various embodiments, the protein biomarkers further comprise one or more of ENPP6, TMEM25, GIP, CSPG5, SCGN, TMPRSS15, LAIR2, KIRREL1, NTF4, TSPAN7, ENDOU, KLK10, CCL24, GPR37, CD3D, TJP3, DKKL1, CFC1, LRRC38, GCG, AGBL2, FASLG, AHNAK2, WFIKKN2, ANXA10, HS6ST1, DUSP29, CA14, CLEC7A, PHLDB2, SCRG1, RSPO3, TOP1, TINAGL1, NCAM1, FAM3D, FLT3LG, ZP3, AGRP, ASAH2, PDGFRB, AFM, NPY, PPY, XG, MFGE8, PROS1, MEGF11, CTSO, CTLA4, CSF3R, FCAR, CTAG1A, SCPEP1, PRSS53, CRELD2, PILRA, PROC, VASH1, NOS3, BPIFB2, UPK3BL1, NOP56, JAM3, HLA-DRA, SIL1, TRPV3, EDEM2, POLR2A, CBLN1, FKBP7, CCL20, PILRB, SIRPB1, VSTM1, BST2, DLL4, C1RL, RNASET2, KCNH2, IL12RB2, FZD10, OXCT1, TREML2, GRIN2B, GFRAL, RGS8, LRPAP1, LRP2, IGSF21, DPT, HEPACAM2, MATN3, UXS1, PTTG1, BTN1A1, IL17C, SCIN, TK1, FKBP14, VWA5A, PRKG1, SV2A, PMCH, NEXN, CDCP1, DDX53, THSD1, PAK4, MMP12, FCN1, UMOD, PDIA4, IL6, BRK1, LILRA2, RBPMS2, SERPIND1, TPSG1, CEACAM5, FGF9, PPIF, RNF43, SIGLEC9, TOMM20, PDE5A, NELL1, GBA, PAEP, ERN1, PCSK7, CHCHD6, MARCO, SFTPA1, IL9, KYNU, SPINT1, LRFN2, NECTIN1, OSCAR, PZP, BPIFB1, LILRA5, CALY, RRAS, GADD45GIP1, ISM2, SCGB3A2, CEACAM6, LPP, GKN1, LRIG1, CLSPN, CXCL13, SFTPA2, COX6B1, PTGR1, RBPMS, PPT1, AOC1, PDLIM5, L3HYPDH, LONP1, APOL1, CEACAM18, FGF7, and KRT14.

[0027] In various embodiments, the predictive model comprises an elastic net regression model, and wherein the predictive model achieves an area under a curve (AUC) value of at least 0.85.

[0028] In various embodiments, the predictive model comprises a support vector machine, and wherein the predictive model achieves an area under a curve (AUC) value of at least 0.84.

[0029] In various embodiments, the predictive model comprises a random forest model, and wherein the predictive model achieves an area under a curve (AUC) value of at least 0.72.

[0030] In various embodiments, the predictive model comprises an XGBoost model, and wherein the predictive model achieves an area under a curve (AUC) value of at least 0.73.
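
For illustration, an AUC value of the kind recited in paragraphs [0027] to [0030] can be computed from model scores and known outcomes as sketched below. The sketch assumes that the curve in question is the receiver operating characteristic (ROC) curve and uses the pairwise-comparison form of the ROC AUC; the function names `roc_auc` and `meets_minimum`, the dictionary keys, and the example labels and scores are hypothetical. Only the four minimum values are taken from the paragraphs above.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (case, control) pairs in which the case
    receives the higher score; ties count as half a win."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Minimum AUC values stated in paragraphs [0027]-[0030].
MIN_AUC = {
    "elastic_net": 0.85,
    "support_vector_machine": 0.84,
    "random_forest": 0.72,
    "xgboost": 0.73,
}


def meets_minimum(model_name, auc):
    """Compare an observed AUC against the stated minimum for that model type."""
    return auc >= MIN_AUC[model_name]


# Made-up outcomes (1 = cancer, 0 = no cancer) and made-up model scores.
example_labels = [0, 0, 1, 1]
example_scores = [0.10, 0.40, 0.35, 0.80]
example_auc = roc_auc(example_labels, example_scores)  # 3 of 4 pairs ranked correctly
```

The pairwise form is quadratic in the number of samples, which is acceptable for a sketch; a rank-based computation would be preferable for large cohorts.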

[0031] Additionally disclosed herein is a method for predicting risk of cancer in a subject, the method comprising: obtaining or having obtained a dataset derived from the subject comprising quantitative levels of a plurality of biomarkers, wherein the plurality of biomarkers comprises protein biomarkers comprising two or more of GAST, ENPP2, FZD8, FGF23, and TFF1, and generating a prediction of risk of cancer for the subject by applying a predictive model to the quantitative values of the plurality of biomarkers.
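
As a purely illustrative sketch of the step of applying a predictive model to the quantitative values, the following assumes a linear model with a logistic link, which is one common form of a fitted regression-type classifier. The function name `predict_risk`, the coefficients, the intercept, and the biomarker levels are all hypothetical and are not values from the disclosure.

```python
import math


def predict_risk(levels, coefficients, intercept):
    """Return a risk score in (0, 1) from quantitative biomarker levels.

    Raises KeyError if a biomarker required by the model is missing from `levels`.
    """
    z = intercept + sum(coefficients[name] * levels[name] for name in coefficients)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients for three of the biomarkers named in paragraph [0031].
hypothetical_coefficients = {"GAST": 1.2, "ENPP2": -0.7, "FZD8": 0.4}
hypothetical_intercept = -0.5

# Made-up standardized levels for two subjects who differ only in GAST.
subject_a = {"GAST": 0.2, "ENPP2": 0.1, "FZD8": -0.3}
subject_b = {"GAST": 1.5, "ENPP2": 0.1, "FZD8": -0.3}

risk_a = predict_risk(subject_a, hypothetical_coefficients, hypothetical_intercept)
risk_b = predict_risk(subject_b, hypothetical_coefficients, hypothetical_intercept)
```

With a positive coefficient on GAST in this hypothetical model, the subject with the higher GAST level receives the higher risk score; the direction and size of any real effect would depend on the fitted model.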

[0032] In various embodiments, the protein biomarkers comprise three or more of GAST, ENPP2, FZD8, FGF23, and TFF1.

[0033] In various embodiments, the protein biomarkers comprise four or more of GAST, ENPP2, FZD8, FGF23, and TFF1.

[0034] In various embodiments, the protein biomarkers comprise each of GAST, ENPP2, FZD8, FGF23, and TFF1.

[0035] In various embodiments, the protein biomarkers further comprise one or more of MAPT, FGF16, OXT, BRD1, MFAP4, WNT9A, FLRT2, CRTAC1, PAPPA, POMC, NGF, IDI2, TPT1, EPHA10, and MFAP3.

[0036] In various embodiments, the protein biomarkers further comprise five or more of MAPT, FGF16, OXT, BRD1, MFAP4, WNT9A, FLRT2, CRTAC1, PAPPA, POMC, NGF, IDI2, TPT1, EPHA10, and MFAP3.

[0037] In various embodiments, the protein biomarkers further comprise ten or more of MAPT, FGF16, OXT, BRD1, MFAP4, WNT9A, FLRT2, CRTAC1, PAPPA, POMC, NGF, IDI2, TPT1, EPHA10, and MFAP3.

[0038] In various embodiments, the protein biomarkers further comprise each of MAPT, FGF16, OXT, BRD1, MFAP4, WNT9A, FLRT2, CRTAC1, PAPPA, POMC, NGF, IDI2, TPT1, EPHA10, and MFAP3.

[0039] In various embodiments, the protein biomarkers further comprise one or more of SOWAHA, RARRES1, DUSP3, SEMA3F, CNTN3, LPA, KLK11, RPGR, EPO, TDGF1, IL17A, CD160, TNPO1, GAMT, ENPP6, TMEM25, GIP, CSPG5, SCGN, TMPRSS15, LAIR2, KIRREL1, NTF4, TSPAN7, ENDOU, KLK10, CCL24, GPR37, CD3D, and TJP3.

[0040] In various embodiments, the protein biomarkers further comprise five or more of SOWAHA, RARRES1, DUSP3, SEMA3F, CNTN3, LPA, KLK11, RPGR, EPO, TDGF1, IL17A, CD160, TNPO1, GAMT, ENPP6, TMEM25, GIP, CSPG5, SCGN, TMPRSS15, LAIR2, KIRREL1, NTF4, TSPAN7, ENDOU, KLK10, CCL24, GPR37, CD3D, and TJP3.

[0041] In various embodiments, the protein biomarkers further comprise ten or more of SOWAHA, RARRES1, DUSP3, SEMA3F, CNTN3, LPA, KLK11, RPGR, EPO, TDGF1, IL17A, CD160, TNPO1, GAMT, ENPP6, TMEM25, GIP, CSPG5, SCGN, TMPRSS15, LAIR2, KIRREL1, NTF4, TSPAN7, ENDOU, KLK10, CCL24, GPR37, CD3D, and TJP3.

[0042] In various embodiments, the protein biomarkers further comprise twenty or more of SOWAHA, RARRES1, DUSP3, SEMA3F, CNTN3, LPA, KLK11, RPGR, EPO, TDGF1, IL17A, CD160, TNPO1, GAMT, ENPP6, TMEM25, GIP, CSPG5, SCGN, TMPRSS15, LAIR2, KIRREL1, NTF4, TSPAN7, ENDOU, KLK10, CCL24, GPR37, CD3D, and TJP3.

[0043] In various embodiments, the protein biomarkers further comprise each of SOWAHA, RARRES1, DUSP3, SEMA3F, CNTN3, LPA, KLK11, RPGR, EPO, TDGF1, IL17A, CD160, TNPO1, GAMT, ENPP6, TMEM25, GIP, CSPG5, SCGN, TMPRSS15, LAIR2, KIRREL1, NTF4, TSPAN7, ENDOU, KLK10, CCL24, GPR37, CD3D, and TJP3.

[0044] In various embodiments, the protein biomarkers further comprise one or more of DKKL1, CFC1, LRRC38, GCG, AGBL2, FASLG, AHNAK2, WFIKKN2, ANXA10, HS6ST1, DUSP29, CA14, CLEC7A, PHLDB2, SCRG1, RSPO3, TOP1, TINAGL1, NCAM1, FAM3D, FLT3LG, ZP3, AGRP, ASAH2, PDGFRB, AFM, NPY, PPY, XG, MFGE8, PROS1, MEGF11, SCT, CFB, F11, ANK2, ENOPH1, UGDH, ASAH1, ERBB4, IL36A, FGA, C5, OSMR, SSBP1, RICTOR, LRG1, C4BPB, AIDA, and SSC4D.

[0045] In various embodiments, the protein biomarkers further comprise five or more of DKKL1, CFC1, LRRC38, GCG, AGBL2, FASLG, AHNAK2, WFIKKN2, ANXA10, HS6ST1, DUSP29, CA14, CLEC7A, PHLDB2, SCRG1, RSPO3, TOP1, TINAGL1, NCAM1, FAM3D, FLT3LG, ZP3, AGRP, ASAH2, PDGFRB, AFM, NPY, PPY, XG, MFGE8, PROS1, MEGF11, SCT, CFB, F11, ANK2, ENOPH1, UGDH, ASAH1, ERBB4, IL36A, FGA, C5, OSMR, SSBP1, RICTOR, LRG1, C4BPB, AIDA, and SSC4D.

[0046] In various embodiments, the protein biomarkers further comprise ten or more of DKKL1, CFC1, LRRC38, GCG, AGBL2, FASLG, AHNAK2, WFIKKN2, ANXA10, HS6ST1, DUSP29, CA14, CLEC7A, PHLDB2, SCRG1, RSPO3, TOP1, TINAGL1, NCAM1, FAM3D, FLT3LG, ZP3, AGRP, ASAH2, PDGFRB, AFM, NPY, PPY, XG, MFGE8, PROS1, MEGF11, SCT, CFB, F11, ANK2, ENOPH1, UGDH, ASAH1, ERBB4, IL36A, FGA, C5, OSMR, SSBP1, RICTOR, LRG1, C4BPB, AIDA, and SSC4D.

[0047] In various embodiments, the protein biomarkers further comprise twenty or more of DKKL1, CFC1, LRRC38, GCG, AGBL2, FASLG, AHNAK2, WFIKKN2, ANXA10, HS6ST1, DUSP29, CA14, CLEC7A, PHLDB2, SCRG1, RSPO3, TOP1, TINAGL1, NCAM1, FAM3D, FLT3LG, ZP3, AGRP, ASAH2, PDGFRB, AFM, NPY, PPY, XG, MFGE8, PROS1, MEGF11, SCT, CFB, F11, ANK2, ENOPH1, UGDH, ASAH1, ERBB4, IL36A, FGA, C5, OSMR, SSBP1, RICTOR, LRG1, C4BPB, AIDA, and SSC4D.

[0048] In various embodiments, the protein biomarkers further comprise thirty or more of DKKL1, CFC1, LRRC38, GCG, AGBL2, FASLG, AHNAK2, WFIKKN2, ANXA10, HS6ST1, DUSP29, CA14, CLEC7A, PHLDB2, SCRG1, RSPO3, TOP1, TINAGL1, NCAM1, FAM3D, FLT3LG, ZP3, AGRP, ASAH2, PDGFRB, AFM, NPY, PPY, XG, MFGE8, PROS1, MEGF11, SCT, CFB, F11, ANK2, ENOPH1, UGDH, ASAH1, ERBB4, IL36A, FGA, C5, OSMR, SSBP1, RICTOR, LRG1, C4BPB, AIDA, and SSC4D.

[0049] In various embodiments, the protein biomarkers further comprise forty or more of DKKL1, CFC1, LRRC38, GCG, AGBL2, FASLG, AHNAK2, WFIKKN2, ANXA10, HS6ST1, DUSP29, CA14, CLEC7A, PHLDB2, SCRG1, RSPO3, TOP1, TINAGL1, NCAM1, FAM3D, FLT3LG, ZP3, AGRP, ASAH2, PDGFRB, AFM, NPY, PPY, XG, MFGE8, PROS1, MEGF11, SCT, CFB, F11, ANK2, ENOPH1, UGDH, ASAH1, ERBB4, IL36A, FGA, C5, OSMR, SSBP1, RICTOR, LRG1, C4BPB, AIDA, and SSC4D.

[0050] In various embodiments, the protein biomarkers further comprise each of DKKL1, CFC1, LRRC38, GCG, AGBL2, FASLG, AHNAK2, WFIKKN2, ANXA10, HS6ST1, DUSP29, CA14, CLEC7A, PHLDB2, SCRG1, RSPO3, TOP1, TINAGL1, NCAM1, FAM3D, FLT3LG, ZP3, AGRP, ASAH2, PDGFRB, AFM, NPY, PPY, XG, MFGE8, PROS1, MEGF11, SCT, CFB, F11, ANK2, ENOPH1, UGDH, ASAH1, ERBB4, IL36A, FGA, C5, OSMR, SSBP1, RICTOR, LRG1, C4BPB, AIDA, and SSC4D.

[0051] In various embodiments, the protein biomarkers further comprise one or more of GRN, IFNAR1, ENPEP, ACADSB, MAN1A2, GBP4, SERPING1, COL4A4, SOX2, GRSF1, PRAME, KIR2DS4, ADAMTS1, ITPRIP, CRISP3, DSG4, ITIH4, MRC1, GABRA4, SERPINA3, MILR1, PLIN1, SHH, KLKB1, IL17RA, MMP10, LBP, SMAD5, ADRA2A, SESTD1, CFI, AKR7L, CTSH, LYPD3, CBLIF, SMTN, CFH, SERPINC1, GDF15, PDZD2, ALDH2, IZUMO1, DNM3, CCL19, CSF2, MCEE, FDX1, SDC1, POSTN, GP2, CST7, CD14, NEK7, SHC1, CRELD1, TCN2, CMIP, CRHBP, C9, PXDNL, NRCAM, DLG4, TRAF3IP2, SULT2A1, GSTT2B, ITIH1, MRPL24, MUC16, IL3, CLU, FHIP2A, TK1, FKBP14, VWA5A, PRKG1, SV2A, PMCH, NEXN, CDCP1, DDX53, THSD1, PAK4, MMP12, FCN1, UMOD, PDIA4, IL6, BRK1, LILRA2, RBPMS2, SERPIND1, TPSG1, CEACAM5, FGF9, PPIF, RNF43, SIGLEC9, TOMM20, PDE5A, NELL1, GBA, PAEP, ERN1, PCSK7, CHCHD6, MARCO, SFTPA1, IL9, KYNU, SPINT1, LRFN2, NECTIN1, OSCAR, PZP, BPIFB1, LILRA5, CALY, RRAS, GADD45GIP1, ISM2, SCGB3A2, CEACAM6, LPP, GKN1, LRIG1, CLSPN, CXCL13, SFTPA2, COX6B1, PTGR1, RBPMS, PPT1, AOC1, PDLIM5, L3HYPDH, LONP1, APOL1, CEACAM18, FGF7, and KRT14.

[0052] In various embodiments, the predictive model comprises an elastic net regression model, and wherein the predictive model achieves an area under a curve (AUC) value of at least 0.79.

[0053] In various embodiments, the predictive model comprises a support vector machine, and wherein the predictive model achieves an area under a curve (AUC) value of at least 0.81.

[0054] In various embodiments, the predictive model comprises a random forest model, and wherein the predictive model achieves an area under a curve (AUC) value of at least 0.71.

[0055] In various embodiments, the predictive model comprises an XGBoost model, and wherein the predictive model achieves an area under a curve (AUC) value of at least 0.70.

[0056] Additionally disclosed herein is a method for predicting risk of cancer in a subject, the method comprising: obtaining or having obtained a dataset derived from the subject comprising quantitative levels of a plurality of biomarkers, wherein the plurality of biomarkers comprises protein biomarkers comprising two or more of TGFA, MMP12, TNFRSF13B, TNFSF14, and MASP1, and generating a prediction of risk of cancer for the subject by applying a predictive model to the quantitative values of the plurality of biomarkers.

[0057] In various embodiments, the protein biomarkers comprise three or more of TGFA, MMP12, TNFRSF13B, TNFSF14, and MASP1.

[0058] In various embodiments, the protein biomarkers comprise four or more of TGFA, MMP12, TNFRSF13B, TNFSF14, and MASP1.

[0059] In various embodiments, the protein biomarkers comprise each of TGFA, MMP12, TNFRSF13B, TNFSF14, and MASP1.

[0060] In various embodiments, the protein biomarkers further comprise one or more of THBS2, GDNF, FLT1, FXYD5, CST5, ARNT, CDCP1, CCL20, FLT3LG, CLEC7A, PRKCQ, SCGN, IL5, NPY, and S100A16.

[0061] In various embodiments, the protein biomarkers further comprise five or more of THBS2, GDNF, FLT1, FXYD5, CST5, ARNT, CDCP1, CCL20, FLT3LG, CLEC7A, PRKCQ, SCGN, IL5, NPY, and S100A16.

[0062] In various embodiments, the protein biomarkers further comprise ten or more of THBS2, GDNF, FLT1, FXYD5, CST5, ARNT, CDCP1, CCL20, FLT3LG, CLEC7A, PRKCQ, SCGN, IL5, NPY, and S100A16.

[0063] In various embodiments, the protein biomarkers further comprise each of THBS2, GDNF, FLT1, FXYD5, CST5, ARNT, CDCP1, CCL20, FLT3LG, CLEC7A, PRKCQ, SCGN, IL5, NPY, and S100A16.

[0064] In various embodiments, the protein biomarkers further comprise one or more of IL1B, CD84, STC1, PRDX3, LAP3, GAMT, CASP2, ITGA6, DECR1, and YTHDF3.

[0065] In various embodiments, the protein biomarkers further comprise five or more of IL1B, CD84, STC1, PRDX3, LAP3, GAMT, CASP2, ITGA6, DECR1, and YTHDF3.

[0066] In various embodiments, the protein biomarkers further comprise each of IL1B, CD84, STC1, PRDX3, LAP3, GAMT, CASP2, ITGA6, DECR1, and YTHDF3.

[0067] In various embodiments, the predictive model comprises an elastic net regression model, and wherein the predictive model achieves an area under a curve (AUC) value of at least 0.65.

[0068] In various embodiments, the predictive model comprises a support vector machine, and wherein the predictive model achieves an area under a curve (AUC) value of at least 0.70.

[0069] In various embodiments, the predictive model comprises a random forest model, and wherein the predictive model achieves an area under a curve (AUC) value of at least 0.67.

[0070] In various embodiments, the predictive model comprises an XGBoost model, and wherein the predictive model achieves an area under a curve (AUC) value of at least 0.68.
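By way of illustration only, the AUC thresholds recited above could be checked against a set of model outputs as in the following minimal sketch. It computes AUC as the probability that a randomly chosen case receives a higher score than a randomly chosen control (ties counted as one half). The function names, the labels, and the scores are hypothetical and are not taken from the disclosure; the scores could come from any of the recited model types.

```python
def auc_from_scores(labels, scores):
    """AUC as P(case score > control score), with ties counted as 0.5."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def meets_threshold(labels, scores, min_auc):
    """True if the scores reach at least the stated minimum AUC."""
    return auc_from_scores(labels, scores) >= min_auc


# Hypothetical data: 1 = cancer case, 0 = control; scores are model outputs.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
auc = auc_from_scores(labels, scores)
print(round(auc, 3), meets_threshold(labels, scores, 0.68))
```

The pairwise formulation is quadratic in the number of samples, which is acceptable for a sketch; a rank-based computation would be preferable for large cohorts.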

[0071] In various embodiments, the cancer is lung cancer.

[0072] In various embodiments, the risk of cancer is a level of risk of the subject developing cancer within 1 year, within 2 years, within 3 years, within 4 years, within 5 years, within 6 years, within 7 years, within 8 years, within 9 years, or within 10 years.

[0073] In various embodiments, the risk of cancer is a presence or absence of cancer.

[0074] In various embodiments, the dataset is derived from a test sample obtained from the subject.

[0075] In various embodiments, the test sample is a blood, serum or plasma sample.

[0076] In various embodiments, obtaining or having obtained the dataset comprises performing one or more assays.

[0077] In various embodiments, performing the one or more assays comprises performing an immunoassay to determine the expression levels of the plurality of biomarkers.

[0078] In various embodiments, the immunoassay is a Proximity Extension Assay (PEA) or LUMINEX xMAP Multiplex Assay.

[0079] In various embodiments, the dataset comprises plasma proteomics data.

[0080] In various embodiments, the method further comprises: selecting a therapy for providing to the subject based on the prediction of cancer.

[0081] Additionally disclosed herein is a non-transitory computer readable medium comprising instructions that, when executed by a processor, cause the processor to: obtain or have obtained a dataset derived from the subject comprising quantitative levels of a plurality of biomarkers, wherein the plurality of biomarkers comprises protein biomarkers comprising two or more of TSPAN1, CD28, SCN3B, ADGRB3, and IGFBP6, and generate a prediction of risk of cancer for the subject by applying a predictive model to the quantitative values of the plurality of biomarkers.

[0082] In various embodiments, the protein biomarkers comprise three or more of TSPAN1, CD28, SCN3B, ADGRB3, and IGFBP6.

[0083] In various embodiments, the protein biomarkers comprise four or more of TSPAN1, CD28, SCN3B, ADGRB3, and IGFBP6.

[0084] In various embodiments, the protein biomarkers comprise each of TSPAN1, CD28, SCN3B, ADGRB3, and IGFBP6.

[0085] In various embodiments, the protein biomarkers further comprise one or more of NRTN, AIF1L, HSPB6, MB, TNFRSF19, IL5RA, TNR, CDNF, CST1, FGFBP2, S100A16, CD248, GFRA3, LMOD1, and POF1B.

[0086] In various embodiments, the protein biomarkers further comprise five or more of NRTN, AIF1L, HSPB6, MB, TNFRSF19, IL5RA, TNR, CDNF, CST1, FGFBP2, S100A16, CD248, GFRA3, LMOD1, and POF1B.

[0087] In various embodiments, the protein biomarkers further comprise ten or more of NRTN, AIF1L, HSPB6, MB, TNFRSF19, IL5RA, TNR, CDNF, CST1, FGFBP2, S100A16, CD248, GFRA3, LMOD1, and POF1B.

[0088] In various embodiments, the protein biomarkers further comprise each of NRTN, AIF1L, HSPB6, MB, TNFRSF19, IL5RA, TNR, CDNF, CST1, FGFBP2, S100A16, CD248, GFRA3, LMOD1, and POF1B.

[0089] In various embodiments, the protein biomarkers further comprise one or more of DENND2B, COMP, CNTN2, SCARA5, CSPG4, ITGAV, SOST, SERPINA4, LILRA4, SPINK5, PINLYP, ACTN2, JAM2, FAP, TMOD4, GUCA2A, MFAP3L, DKK4, LAMA1, BAG3, SNCG, SEPTIN3, VWC2, KLRC1, ATRAID, ART3, SLITRK2, SIGLEC6, TMED4, and SLAMF7.

[0090] In various embodiments, the protein biomarkers further comprise five or more of DENND2B, COMP, CNTN2, SCARA5, CSPG4, ITGAV, SOST, SERPINA4, LILRA4, SPINK5, PINLYP, ACTN2, JAM2, FAP, TMOD4, GUCA2A, MFAP3L, DKK4, LAMA1, BAG3, SNCG, SEPTIN3, VWC2, KLRC1, ATRAID, ART3, SLITRK2, SIGLEC6, TMED4, and SLAMF7.

[0091] In various embodiments, the protein biomarkers further comprise ten or more of DENND2B, COMP, CNTN2, SCARA5, CSPG4, ITGAV, SOST, SERPINA4, LILRA4, SPINK5, PINLYP, ACTN2, JAM2, FAP, TMOD4, GUCA2A, MFAP3L, DKK4, LAMA1, BAG3, SNCG, SEPTIN3, VWC2, KLRC1, ATRAID, ART3, SLITRK2, SIGLEC6, TMED4, and SLAMF7.

[0092] In various embodiments, the protein biomarkers further comprise twenty or more of DENND2B, COMP, CNTN2, SCARA5, CSPG4, ITGAV, SOST, SERPINA4, LILRA4, SPINK5, PINLYP, ACTN2, JAM2, FAP, TMOD4, GUCA2A, MFAP3L, DKK4, LAMA1, BAG3, SNCG, SEPTIN3, VWC2, KLRC1, ATRAID, ART3, SLITRK2, SIGLEC6, TMED4, and SLAMF7.

[0093] In various embodiments, the protein biomarkers further comprise each of DENND2B, COMP, CNTN2, SCARA5, CSPG4, ITGAV, SOST, SERPINA4, LILRA4, SPINK5, PINLYP, ACTN2, JAM2, FAP, TMOD4, GUCA2A, MFAP3L, DKK4, LAMA1, BAG3, SNCG, SEPTIN3, VWC2, KLRC1, ATRAID, ART3, SLITRK2, SIGLEC6, TMED4, and SLAMF7.

[0094] In various embodiments, the protein biomarkers further comprise one or more of CKMT1A, SEMA6C, CD2, CST5, PBXIP1, LECT2, PYY, AGRN, INSL5, CD38, PI16, CCN5, TNFRSF17, LY9, GPC1, CLMP, MEP1B, CCN1, PCDH7, SPARCL1, CRNN, PM20D1, TNFRSF12A, DSCAM, PALM, CX3CL1, MEP1A, SLURP1, APOA4, ADAMTSL5, MEPE, WFDC1, RPS10, CD300C, RIPK4, CALCB, RTBDN, ENO3, NTF3, PTPRZ1, LRP2BP, CPE, MCAM, BGN, PLB1, YAP1, TGFBI, CYB5A, EDDM3B, and SELENOP.

[0095] In various embodiments, the protein biomarkers further comprise five or more of CKMT1A, SEMA6C, CD2, CST5, PBXIP1, LECT2, PYY, AGRN, INSL5, CD38, PI16, CCN5, TNFRSF17, LY9, GPC1, CLMP, MEP1B, CCN1, PCDH7, SPARCL1, CRNN, PM20D1, TNFRSF12A, DSCAM, PALM, CX3CL1, MEP1A, SLURP1, APOA4, ADAMTSL5, MEPE, WFDC1, RPS10, CD300C, RIPK4, CALCB, RTBDN, ENO3, NTF3, PTPRZ1, LRP2BP, CPE, MCAM, BGN, PLB1, YAP1, TGFBI, CYB5A, EDDM3B, and SELENOP.

[0096] In various embodiments, the protein biomarkers further comprise ten or more of CKMT1A, SEMA6C, CD2, CST5, PBXIP1, LECT2, PYY, AGRN, INSL5, CD38, PI16, CCN5, TNFRSF17, LY9, GPC1, CLMP, MEP1B, CCN1, PCDH7, SPARCL1, CRNN, PM20D1, TNFRSF12A, DSCAM, PALM, CX3CL1, MEP1A, SLURP1, APOA4, ADAMTSL5, MEPE, WFDC1, RPS10, CD300C, RIPK4, CALCB, RTBDN, ENO3, NTF3, PTPRZ1, LRP2BP, CPE, MCAM, BGN, PLB1, YAP1, TGFBI, CYB5A, EDDM3B, and SELENOP.

[0097] In various embodiments, the protein biomarkers further comprise twenty or more of CKMT1A, SEMA6C, CD2, CST5, PBXIP1, LECT2, PYY, AGRN, INSL5, CD38, PI16, CCN5, TNFRSF17, LY9, GPC1, CLMP, MEP1B, CCN1, PCDH7, SPARCL1, CRNN, PM20D1, TNFRSF12A, DSCAM, PALM, CX3CL1, MEP1A, SLURP1, APOA4, ADAMTSL5, MEPE, WFDC1, RPS10, CD300C, RIPK4, CALCB, RTBDN, ENO3, NTF3, PTPRZ1, LRP2BP, CPE, MCAM, BGN, PLB1, YAP1, TGFBI, CYB5A, EDDM3B, and SELENOP.

[0098] In various embodiments, the protein biomarkers further comprise thirty or more of CKMT1A, SEMA6C, CD2, CST5, PBXIP1, LECT2, PYY, AGRN, INSL5, CD38, PI16, CCN5, TNFRSF17, LY9, GPC1, CLMP, MEP1B, CCN1, PCDH7, SPARCL1, CRNN, PM20D1, TNFRSF12A, DSCAM, PALM, CX3CL1, MEP1A, SLURP1, APOA4, ADAMTSL5, MEPE, WFDC1, RPS10, CD300C, RIPK4, CALCB, RTBDN, ENO3, NTF3, PTPRZ1, LRP2BP, CPE, MCAM, BGN, PLB1, YAP1, TGFBI, CYB5A, EDDM3B, and SELENOP.

[0099] In various embodiments, the protein biomarkers further comprise forty or more of CKMT1A, SEMA6C, CD2, CST5, PBXIP1, LECT2, PYY, AGRN, INSL5, CD38, PI16, CCN5, TNFRSF17, LY9, GPC1, CLMP, MEP1B, CCN1, PCDH7, SPARCL1, CRNN, PM20D1, TNFRSF12A, DSCAM, PALM, CX3CL1, MEP1A, SLURP1, APOA4, ADAMTSL5, MEPE, WFDC1, RPS10, CD300C, RIPK4, CALCB, RTBDN, ENO3, NTF3, PTPRZ1, LRP2BP, CPE, MCAM, BGN, PLB1, YAP1, TGFBI, CYB5A, EDDM3B, and SELENOP.

[00100] In various embodiments, the protein biomarkers further comprise each of CKMT1A, SEMA6C, CD2, CST5, PBXIP1, LECT2, PYY, AGRN, INSL5, CD38, PI16, CCN5, TNFRSF17, LY9, GPC1, CLMP, MEP1B, CCN1, PCDH7, SPARCL1, CRNN, PM20D1, TNFRSF12A, DSCAM, PALM, CX3CL1, MEP1A, SLURP1, APOA4, ADAMTSL5, MEPE, WFDC1, RPS10, CD300C, RIPK4, CALCB, RTBDN, ENO3, NTF3, PTPRZ1, LRP2BP, CPE, MCAM, BGN, PLB1, YAP1, TGFBI, CYB5A, EDDM3B, and SELENOP.

[00101] In various embodiments, the protein biomarkers further comprise one or more of ENPP6, TMEM25, GIP, CSPG5, SCGN, TMPRSS15, LAIR2, KIRREL1, NTF4, TSPAN7, ENDOU, KLK10, CCL24, GPR37, CD3D, TJP3, DKKL1, CFC1, LRRC38, GCG, AGBL2, FASLG, AHNAK2, WFIKKN2, ANXA10, HS6ST1, DUSP29, CA14, CLEC7A, PHLDB2, SCRG1, RSPO3, TOP1, TINAGL1, NCAM1, FAM3D, FLT3LG, ZP3, AGRP, ASAH2, PDGFRB, AFM, NPY, PPY, XG, MFGE8, PROS1, MEGF11, CTSO, CTLA4, CSF3R, FCAR, CTAG1A, SCPEP1, PRSS53, CRELD2, PILRA, PROC, VASH1, NOS3, BPIFB2, UPK3BL1, NOP56, JAM3, HLA-DRA, SIL1, TRPV3, EDEM2, POLR2A, CBLN1, FKBP7, CCL20, PILRB, SIRPB1, VSTM1, BST2, DLL4, C1RL, RNASET2, KCNH2, IL12RB2, FZD10, OXCT1, TREML2, GRIN2B, GFRAL, RGS8, LRPAP1, LRP2, IGSF21, DPT, HEPACAM2, MATN3, UXS1, PTTG1, BTN1A1, IL17C, SCIN, TK1, FKBP14, VWA5A, PRKG1, SV2A, PMCH, NEXN, CDCP1, DDX53, THSD1, PAK4, MMP12, FCN1, UMOD, PDIA4, IL6, BRK1, LILRA2, RBPMS2, SERPIND1, TPSG1, CEACAM5, FGF9, PPIF, RNF43, SIGLEC9, TOMM20, PDE5A, NELL1, GBA, PAEP, ERN1, PCSK7, CHCHD6, MARCO, SFTPA1, IL9, KYNU, SPINT1, LRFN2, NECTIN1, OSCAR, PZP, BPIFB1, LILRA5, CALY, RRAS, GADD45GIP1, ISM2, SCGB3A2, CEACAM6, LPP, GKN1, LRIG1, CLSPN, CXCL13, SFTPA2, COX6B1, PTGR1, RBPMS, PPT1, AOC1, PDLIM5, L3HYPDH, LONP1, APOL1, CEACAM18, FGF7, and KRT14.

[00102] In various embodiments, the predictive model comprises an elastic net regression model, and wherein the predictive model achieves an area under a curve (AUC) value of at least 0.85.

[00103] In various embodiments, the predictive model comprises a support vector machine, and wherein the predictive model achieves an area under a curve (AUC) value of at least 0.84.

[00104] In various embodiments, the predictive model comprises a random forest model, and wherein the predictive model achieves an area under a curve (AUC) value of at least 0.72.

[00105] In various embodiments, the predictive model comprises an XGBoost model, and wherein the predictive model achieves an area under a curve (AUC) value of at least 0.73.

[00106] Additionally disclosed herein is a non-transitory computer readable medium comprising instructions that, when executed by a processor, cause the processor to: obtain or have obtained a dataset derived from the subject comprising quantitative levels of a plurality of biomarkers, wherein the plurality of biomarkers comprises protein biomarkers comprising two or more of GAST, ENPP2, FZD8, FGF23, and TFF1, and generate a prediction of risk of cancer for the subject by applying a predictive model to the quantitative values of the plurality of biomarkers.

[00107] In various embodiments, the protein biomarkers comprise three or more of GAST, ENPP2, FZD8, FGF23, and TFF1.

[00108] In various embodiments, the protein biomarkers comprise four or more of GAST, ENPP2, FZD8, FGF23, and TFF1.

[00109] In various embodiments, the protein biomarkers comprise each of GAST, ENPP2, FZD8, FGF23, and TFF1.

[00110] In various embodiments, the protein biomarkers further comprise one or more of MAPT, FGF16, OXT, BRD1, MFAP4, WNT9A, FLRT2, CRTAC1, PAPPA, POMC, NGF, IDI2, TPT1, EPHA10, and MFAP3.

[00111] In various embodiments, the protein biomarkers further comprise five or more of MAPT, FGF16, OXT, BRD1, MFAP4, WNT9A, FLRT2, CRTAC1, PAPPA, POMC, NGF, IDI2, TPT1, EPHA10, and MFAP3.

[00112] In various embodiments, the protein biomarkers further comprise ten or more of MAPT, FGF16, OXT, BRD1, MFAP4, WNT9A, FLRT2, CRTAC1, PAPPA, POMC, NGF, IDI2, TPT1, EPHA10, and MFAP3.

[00113] In various embodiments, the protein biomarkers further comprise each of MAPT, FGF16, OXT, BRD1, MFAP4, WNT9A, FLRT2, CRTAC1, PAPPA, POMC, NGF, IDI2, TPT1, EPHA10, and MFAP3.

[00114] In various embodiments, the protein biomarkers further comprise one or more of SOWAHA, RARRES1, DUSP3, SEMA3F, CNTN3, LPA, KLK11, RPGR, EPO, TDGF1, IL17A, CD160, TNPO1, GAMT, ENPP6, TMEM25, GIP, CSPG5, SCGN, TMPRSS15, LAIR2, KIRREL1, NTF4, TSPAN7, ENDOU, KLK10, CCL24, GPR37, CD3D, and TJP3.

[00115] In various embodiments, the protein biomarkers further comprise five or more of SOWAHA, RARRES1, DUSP3, SEMA3F, CNTN3, LPA, KLK11, RPGR, EPO, TDGF1, IL17A, CD160, TNPO1, GAMT, ENPP6, TMEM25, GIP, CSPG5, SCGN, TMPRSS15, LAIR2, KIRREL1, NTF4, TSPAN7, ENDOU, KLK10, CCL24, GPR37, CD3D, and TJP3.

[00116] In various embodiments, the protein biomarkers further comprise ten or more of SOWAHA, RARRES1, DUSP3, SEMA3F, CNTN3, LPA, KLK11, RPGR, EPO, TDGF1, IL17A, CD160, TNPO1, GAMT, ENPP6, TMEM25, GIP, CSPG5, SCGN, TMPRSS15, LAIR2, KIRREL1, NTF4, TSPAN7, ENDOU, KLK10, CCL24, GPR37, CD3D, and TJP3.

[00117] In various embodiments, the protein biomarkers further comprise twenty or more of SOWAHA, RARRES1, DUSP3, SEMA3F, CNTN3, LPA, KLK11, RPGR, EPO, TDGF1, IL17A, CD160, TNPO1, GAMT, ENPP6, TMEM25, GIP, CSPG5, SCGN, TMPRSS15, LAIR2, KIRREL1, NTF4, TSPAN7, ENDOU, KLK10, CCL24, GPR37, CD3D, and TJP3.

[00118] In various embodiments, the protein biomarkers further comprise each of SOWAHA, RARRES1, DUSP3, SEMA3F, CNTN3, LPA, KLK11, RPGR, EPO, TDGF1, IL17A, CD160, TNPO1, GAMT, ENPP6, TMEM25, GIP, CSPG5, SCGN, TMPRSS15, LAIR2, KIRREL1, NTF4, TSPAN7, ENDOU, KLK10, CCL24, GPR37, CD3D, and TJP3.

[00119] In various embodiments, the protein biomarkers further comprise one or more of DKKL1, CFC1, LRRC38, GCG, AGBL2, FASLG, AHNAK2, WFIKKN2, ANXA10, HS6ST1, DUSP29, CA14, CLEC7A, PHLDB2, SCRG1, RSPO3, TOP1, TINAGL1, NCAM1, FAM3D, FLT3LG, ZP3, AGRP, ASAH2, PDGFRB, AFM, NPY, PPY, XG, MFGE8, PROS1, MEGF11, SCT, CFB, F11, ANK2, ENOPH1, UGDH, ASAH1, ERBB4, IL36A, FGA, C5, OSMR, SSBP1, RICTOR, LRG1, C4BPB, AIDA, and SSC4D.

[00120] In various embodiments, the protein biomarkers further comprise five or more of DKKL1, CFC1, LRRC38, GCG, AGBL2, FASLG, AHNAK2, WFIKKN2, ANXA10, HS6ST1, DUSP29, CA14, CLEC7A, PHLDB2, SCRG1, RSPO3, TOP1, TINAGL1, NCAM1, FAM3D, FLT3LG, ZP3, AGRP, ASAH2, PDGFRB, AFM, NPY, PPY, XG, MFGE8, PROS1, MEGF11, SCT, CFB, F11, ANK2, ENOPH1, UGDH, ASAH1, ERBB4, IL36A, FGA, C5, OSMR, SSBP1, RICTOR, LRG1, C4BPB, AIDA, and SSC4D.

[00121] In various embodiments, the protein biomarkers further comprise ten or more of DKKL1, CFC1, LRRC38, GCG, AGBL2, FASLG, AHNAK2, WFIKKN2, ANXA10, HS6ST1, DUSP29, CA14, CLEC7A, PHLDB2, SCRG1, RSPO3, TOP1, TINAGL1, NCAM1, FAM3D, FLT3LG, ZP3, AGRP, ASAH2, PDGFRB, AFM, NPY, PPY, XG, MFGE8, PROS1, MEGF11, SCT, CFB, F11, ANK2, ENOPH1, UGDH, ASAH1, ERBB4, IL36A, FGA, C5, OSMR, SSBP1, RICTOR, LRG1, C4BPB, AIDA, and SSC4D.

[00122] In various embodiments, the protein biomarkers further comprise twenty or more of DKKL1, CFC1, LRRC38, GCG, AGBL2, FASLG, AHNAK2, WFIKKN2, ANXA10, HS6ST1, DUSP29, CA14, CLEC7A, PHLDB2, SCRG1, RSPO3, TOP1, TINAGL1, NCAM1, FAM3D, FLT3LG, ZP3, AGRP, ASAH2, PDGFRB, AFM, NPY, PPY, XG, MFGE8, PROS1, MEGF11, SCT, CFB, F11, ANK2, ENOPH1, UGDH, ASAH1, ERBB4, IL36A, FGA, C5, OSMR, SSBP1, RICTOR, LRG1, C4BPB, AIDA, and SSC4D.

[00123] In various embodiments, the protein biomarkers further comprise thirty or more of DKKL1, CFC1, LRRC38, GCG, AGBL2, FASLG, AHNAK2, WFIKKN2, ANXA10, HS6ST1, DUSP29, CA14, CLEC7A, PHLDB2, SCRG1, RSPO3, TOP1, TINAGL1, NCAM1, FAM3D, FLT3LG, ZP3, AGRP, ASAH2, PDGFRB, AFM, NPY, PPY, XG, MFGE8, PROS1, MEGF11, SCT, CFB, F11, ANK2, ENOPH1, UGDH, ASAH1, ERBB4, IL36A, FGA, C5, OSMR, SSBP1, RICTOR, LRG1, C4BPB, AIDA, and SSC4D.

[00124] In various embodiments, the protein biomarkers further comprise forty or more of DKKL1, CFC1, LRRC38, GCG, AGBL2, FASLG, AHNAK2, WFIKKN2, ANXA10, HS6ST1, DUSP29, CA14, CLEC7A, PHLDB2, SCRG1, RSPO3, TOP1, TINAGL1, NCAM1, FAM3D, FLT3LG, ZP3, AGRP, ASAH2, PDGFRB, AFM, NPY, PPY, XG, MFGE8, PROS1, MEGF11, SCT, CFB, F11, ANK2, ENOPH1, UGDH, ASAH1, ERBB4, IL36A, FGA, C5, OSMR, SSBP1, RICTOR, LRG1, C4BPB, AIDA, and SSC4D.

[00125] In various embodiments, the protein biomarkers further comprise each of DKKL1, CFC1, LRRC38, GCG, AGBL2, FASLG, AHNAK2, WFIKKN2, ANXA10, HS6ST1, DUSP29, CA14, CLEC7A, PHLDB2, SCRG1, RSPO3, TOP1, TINAGL1, NCAM1, FAM3D, FLT3LG, ZP3, AGRP, ASAH2, PDGFRB, AFM, NPY, PPY, XG, MFGE8, PROS1, MEGF11, SCT, CFB, F11, ANK2, ENOPH1, UGDH, ASAH1, ERBB4, IL36A, FGA, C5, OSMR, SSBP1, RICTOR, LRG1, C4BPB, AIDA, and SSC4D.

[00126] In various embodiments, the protein biomarkers further comprise one or more of GRN, IFNAR1, ENPEP, ACADSB, MAN1A2, GBP4, SERPING1, COL4A4, SOX2, GRSF1, PRAME, KIR2DS4, ADAMTS1, ITPRIP, CRISP3, DSG4, ITIH4, MRC1, GABRA4, SERPINA3, MILR1, PLIN1, SHH, KLKB1, IL17RA, MMP10, LBP, SMAD5, ADRA2A, SESTD1, CFI, AKR7L, CTSH, LYPD3, CBLIF, SMTN, CFH, SERPINC1, GDF15, PDZD2, ALDH2, IZUMO1, DNM3, CCL19, CSF2, MCEE, FDX1, SDC1, POSTN, GP2, CST7, CD14, NEK7, SHC1, CRELD1, TCN2, CMIP, CRHBP, C9, PXDNL, NRCAM, DLG4, TRAF3IP2, SULT2A1, GSTT2B, ITIH1, MRPL24, MUC16, IL3, CLU, FHIP2A, TK1, FKBP14, VWA5A, PRKG1, SV2A, PMCH, NEXN, CDCP1, DDX53, THSD1, PAK4, MMP12, FCN1, UMOD, PDIA4, IL6, BRK1, LILRA2, RBPMS2, SERPIND1, TPSG1, CEACAM5, FGF9, PPIF, RNF43, SIGLEC9, TOMM20, PDE5A, NELL1, GBA, PAEP, ERN1, PCSK7, CHCHD6, MARCO, SFTPA1, IL9, KYNU, SPINT1, LRFN2, NECTIN1, OSCAR, PZP, BPIFB1, LILRA5, CALY, RRAS, GADD45GIP1, ISM2, SCGB3A2, CEACAM6, LPP, GKN1, LRIG1, CLSPN, CXCL13, SFTPA2, COX6B1, PTGR1, RBPMS, PPT1, AOC1, PDLIM5, L3HYPDH, LONP1, APOL1, CEACAM18, FGF7, and KRT14.

[00127] In various embodiments, the predictive model comprises an elastic net regression model, and wherein the predictive model achieves an area under a curve (AUC) value of at least 0.79.

[00128] In various embodiments, the predictive model comprises a support vector machine, and wherein the predictive model achieves an area under a curve (AUC) value of at least 0.81.

[00129] In various embodiments, the predictive model comprises a random forest model, and wherein the predictive model achieves an area under a curve (AUC) value of at least 0.71.

[00130] In various embodiments, the predictive model comprises an XGBoost model, and wherein the predictive model achieves an area under a curve (AUC) value of at least 0.70.

[00131] In various embodiments, the cancer is lung cancer.

[00132] In various embodiments, the risk of cancer is a level of risk of the subject developing cancer within 1 year, within 2 years, within 3 years, within 4 years, within 5 years, within 6 years, within 7 years, within 8 years, within 9 years, or within 10 years.

[00133] In various embodiments, the risk of cancer is a presence or absence of cancer.

[00134] In various embodiments, the dataset is derived from a test sample obtained from the subject.

[00135] In various embodiments, the test sample is a blood, serum or plasma sample.

[00136] In various embodiments, the dataset is obtained from having performed one or more assays.

[00137] In various embodiments, the one or more assays comprises an immunoassay to determine the expression levels of the plurality of biomarkers.

[00138] In various embodiments, the immunoassay is a Proximity Extension Assay (PEA) or LUMINEX xMAP Multiplex Assay.

[00139] Additionally disclosed herein is a non-transitory computer readable medium comprising instructions that, when executed by a processor, cause the processor to: obtain or have obtained a dataset derived from the subject comprising quantitative levels of a plurality of biomarkers, wherein the plurality of biomarkers comprises protein biomarkers comprising two or more of TGFA, MMP12, TNFRSF13B, TNFSF14, and MASP1, and generate a prediction of risk of cancer for the subject by applying a predictive model to the quantitative values of the plurality of biomarkers.

[00140] In various embodiments, the protein biomarkers comprise three or more of TGFA, MMP12, TNFRSF13B, TNFSF14, and MASP1.

[00141] In various embodiments, the protein biomarkers comprise four or more of TGFA, MMP12, TNFRSF13B, TNFSF14, and MASP1.

[00142] In various embodiments, the protein biomarkers comprise each of TGFA, MMP12, TNFRSF13B, TNFSF14, and MASP1.

[00143] In various embodiments, the protein biomarkers further comprise one or more of THBS2, GDNF, FLT1, FXYD5, CST5, ARNT, CDCP1, CCL20, FLT3LG, CLEC7A, PRKCQ, SCGN, IL5, NPY, and S100A16.

[00144] In various embodiments, the protein biomarkers further comprise five or more of THBS2, GDNF, FLT1, FXYD5, CST5, ARNT, CDCP1, CCL20, FLT3LG, CLEC7A, PRKCQ, SCGN, IL5, NPY, and S100A16.

[00145] In various embodiments, the protein biomarkers further comprise ten or more of THBS2, GDNF, FLT1, FXYD5, CST5, ARNT, CDCP1, CCL20, FLT3LG, CLEC7A, PRKCQ, SCGN, IL5, NPY, and S100A16.

[00146] In various embodiments, the protein biomarkers further comprise each of THBS2, GDNF, FLT1, FXYD5, CST5, ARNT, CDCP1, CCL20, FLT3LG, CLEC7A, PRKCQ, SCGN, IL5, NPY, and S100A16.

[00147] In various embodiments, the protein biomarkers further comprise one or more of IL1B, CD84, STC1, PRDX3, LAP3, GAMT, CASP2, ITGA6, DECR1, and YTHDF3.

[00148] In various embodiments, the protein biomarkers further comprise five or more of IL1B, CD84, STC1, PRDX3, LAP3, GAMT, CASP2, ITGA6, DECR1, and YTHDF3.

[00149] In various embodiments, the protein biomarkers further comprise each of IL1B, CD84, STC1, PRDX3, LAP3, GAMT, CASP2, ITGA6, DECR1, and YTHDF3.

[00150] In various embodiments, the predictive model comprises an elastic net regression model, and wherein the predictive model achieves an area under a curve (AUC) value of at least 0.65.

[00151] In various embodiments, the predictive model comprises a support vector machine, and wherein the predictive model achieves an area under a curve (AUC) value of at least 0.70.

[00152] In various embodiments, the predictive model comprises a random forest model, and wherein the predictive model achieves an area under a curve (AUC) value of at least 0.67.

[00153] In various embodiments, the predictive model comprises an XGBoost model, and wherein the predictive model achieves an area under a curve (AUC) value of at least 0.68.

[00154] In various embodiments, the dataset comprises plasma proteomics data.

[00155] In various embodiments, a therapy is selected for providing to the subject based on the prediction of cancer.

BRIEF DESCRIPTION OF THE DRAWINGS

[00156] These and other features, aspects, and advantages of the present invention will become better understood with regard to the following description and accompanying drawings.

[00157] Figure (FIG.) 1A depicts an overview of an environment for predicting risk of cancer in a subject via a cancer prediction system, in accordance with an embodiment.

[00158] FIG. 1B depicts a block diagram of the cancer prediction system, in accordance with an embodiment.

[00159] FIG. 2 depicts example training data for training a prediction model, in accordance with an embodiment.

[00160] FIG. 3 depicts implementation of an example prediction model, in accordance with an embodiment.

[00161] FIG. 4 illustrates an example computer for implementing the entities shown in FIGS. 1A, 1B, 2, and 3.

[00162] FIGS. 5A-5C show the performance of predictive models using various machine learning algorithms in an Olink® Target 96 platform, in accordance with the embodiments of the prediction model shown in FIGS. 1-3.

[00163] FIGS. 6A-6B show the performance of predictive models using various machine learning algorithms in an Olink® Explore 3072 platform, in accordance with the embodiments of the prediction model shown in FIGS. 1-3.

[00164] FIGS. 7A-7B show the performance of predictive models using various machine learning algorithms in an Olink® Explore 3072 platform, in accordance with the embodiments of the prediction model shown in FIGS. 1-3.

[00165] FIGS. 8A-8E illustrate circulating plasma proteins prediction of future lung cancer using the 240 proteins in the 1-3Y cohort (as identified in Table 13). FIG. 8A illustrates a boxplot of training AUC values from four different machine learning models (e.g., Elastic Net, Random Forest, Support Vector Machine, XGBoost; 5-fold CV repeated 5 times) trained on the LLP cohort to predict lung cancer in patients 1-3 years before diagnosis (53 cancer and 109 control samples). FIG. 8B illustrates combined z-scores plotted over time in the LLP cohort for 1-3Y proteins, where protein levels in LLP subjects were transformed using the z-score method and combined to generate one score. FIG. 8C illustrates the AUROC (Area Under the Receiver Operating Characteristic Curve) of the 1-3Y SVM model trained in Liverpool and tested in UK Biobank samples 1-3 years before lung cancer diagnosis (62 cancer and 5500 control samples). FIG. 8D illustrates performance of the 1-3Y SVM model in the UK Biobank across different years of diagnosis of lung cancer. Samples taken at different times prior to lung cancer were segregated by year (2-12 years) and the SVM model for 1-3Y was tested by ROC analysis. FIG. 8E illustrates a bar plot of AUROC values for the SVM model predicting future development of cancer for several cancer types from UK Biobank 1-3 years before diagnosis, where the same approach as taken for lung cancer was taken to identify plasma samples at least 2 years prior to other first cancer diagnosis (number of cases labelled on bar chart), and the AUC for ROC analysis is shown.

[00166] FIGS. 9A-9B illustrate combined z-scores from 1-3Y in relation to cancer stage and pack years of smoking, where protein levels in LLP subjects were transformed using the z-score method and combined to generate one score. FIG. 9A illustrates combined z-scores plotted in time-frame categories (5-10 years, 3-5 years, 1-3 years prior to diagnosis or at diagnosis) for healthy subjects and cases of different lung cancer stage for 1-3Y proteins, with P-values generated using the Wilcoxon signed-rank test. FIG. 9B illustrates z-scores correlated with pack years of smoking at time of sample in the same time-frame categories, where the correlation was measured using the Pearson correlation coefficient.
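For illustration, the z-score transformation and Pearson correlation referred to above could be sketched as follows. The source states only that per-protein z-scores were "combined to generate one score"; averaging the per-protein z-scores for each subject is an assumption made here, as are the placeholder protein names, the protein levels, and the pack-year values. The sketch also does not account for whether a protein is elevated or reduced in cases.

```python
import math


def zscores(values):
    """Standardize one protein's levels across subjects (sample SD)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    sd = math.sqrt(var)
    return [(v - mean) / sd for v in values]


def combined_z(protein_levels):
    """Mean of per-protein z-scores for each subject (assumed combination rule)."""
    per_protein = [zscores(levels) for levels in protein_levels.values()]
    n_subjects = len(per_protein[0])
    return [sum(z[i] for z in per_protein) / len(per_protein)
            for i in range(n_subjects)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical levels for four subjects and two placeholder proteins.
levels = {"PROTEIN_A": [1.0, 2.0, 3.0, 4.0], "PROTEIN_B": [2.0, 4.0, 6.0, 8.0]}
pack_years = [10, 20, 30, 40]  # hypothetical smoking exposure per subject
score = combined_z(levels)
r = pearson(score, pack_years)
```

In this toy input both proteins rise linearly with pack years, so the combined score is perfectly correlated with exposure; real measurements would not be expected to behave this way.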

[00167] FIGS. 10A-10C illustrate circulating plasma proteins prediction of long-term future lung cancer. FIG. 10A illustrates a boxplot of training AUC values from four different machine learning models (Elastic Net, Random Forest, Support Vector Machine, XGBoost; 5-fold CV repeated 5 times) trained on the LLP cohort to predict lung cancer in patients 1-5 years before diagnosis (110 cancer and 215 control samples). FIG. 10B illustrates combined z-scores plotted over time in the LLP cohort for 1-5Y proteins, where protein levels in LLP subjects were transformed using the z-score method and combined to generate one score. FIG. 10C illustrates z-scores correlated with age at time of sample in the same time-frame categories; correlation was measured using the Pearson correlation coefficient.
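As a non-limiting sketch, the "5-fold CV repeated 5 times" scheme mentioned above could be realized by generating train/test index splits as shown below. Stratifying the folds by case/control status and the use of a fixed random seed are assumptions made for illustration and are not stated in the source; the function name is hypothetical. The sample counts (110 cancer, 215 control) are those recited for FIG. 10A.

```python
import random


def repeated_stratified_folds(labels, n_splits=5, n_repeats=5, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV.

    Within each repeat, indices of each class are shuffled and dealt
    round-robin into the folds so that class proportions are preserved.
    """
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        for cls in sorted(set(labels)):
            idx = [i for i, y in enumerate(labels) if y == cls]
            rng.shuffle(idx)
            for pos, i in enumerate(idx):
                folds[pos % n_splits].append(i)
        for k in range(n_splits):
            test_idx = sorted(folds[k])
            train_idx = sorted(i for j, fold in enumerate(folds)
                               if j != k for i in fold)
            yield repeat, k, train_idx, test_idx


# 110 cancer (1) and 215 control (0) samples, as recited for FIG. 10A.
labels = [1] * 110 + [0] * 215
splits = list(repeated_stratified_folds(labels))
print(len(splits))  # number of train/test splits: 5 folds x 5 repeats
```

Each split would be used to fit a model on the training indices and to compute one AUC on the held-out indices, giving 25 AUC values per model of the kind summarized in a boxplot.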

[00168] FIG. 11 illustrates Gene Enrichment Analysis including the top 20 pathways over- or under-represented in plasma samples from 1-3Y or 1-5Y models. FIG. 11 demonstrates pathways for predictive panels, including three shared over-represented and three shared under-represented pathways.

[00169] FIG. 12 illustrates an example Study Design.

[00170] FIG. 13 illustrates identification of future lung cancer cases and relevant matched controls from the UK Biobank.

[00171] FIG. 14 illustrates correlation between plasma protein measurements utilizing the Olink Target 96 platform (“old”) and the Olink Explore 3072 platform (“new”).

[00172] FIG. 15 illustrates longitudinal changes in z score for 1-3Y and 1-5Y proteins.

[00173] FIGS. 16A-16F illustrate combined z-scores from 1-3Y and 1-5Y in relation to histology, history of COPD, age, and stage.

[00174] FIG. 17 illustrates examples of time-dependent levels for selected plasma proteins.

DETAILED DESCRIPTION

I. Definitions

[00175] Terms used in the claims and specification are defined as set forth below unless otherwise specified.

[00176] The term “subject” encompasses a cell, tissue, or organism, human or non-human, whether in vivo, ex vivo, or in vitro, male or female.

[00177] The term “mammal” encompasses both humans and non-humans and includes but is not limited to humans, non-human primates, canines, felines, murines, bovines, equines, and porcines.

[00178] The term “sample” can include a single cell or multiple cells or fragments of cells or an aliquot of body fluid, such as a blood sample, taken from a subject, by means including venipuncture, excretion, ejaculation, massage, biopsy, needle aspirate, lavage sample, scraping, surgical incision, or intervention or other means known in the art. Examples of an aliquot of body fluid include amniotic fluid, aqueous humor, bile, lymph, breast milk, interstitial fluid, blood, blood plasma, cerumen (earwax), Cowper’s fluid (pre-ejaculatory fluid), chyle, chyme, female ejaculate, menses, mucus, saliva, urine, vomit, tears, vaginal lubrication, sweat, serum, semen, sebum, pus, pleural fluid, cerebrospinal fluid, synovial fluid, intracellular fluid, and vitreous humour.

[00179] The term “predictor” or “predictors” refers to variables, such as markers or biomarkers, analyzed by a prediction model, or one or more panels of a prediction model. In various embodiments, a “predictor” refers to biomarkers, such as protein biomarkers.

[00180] The terms “marker,” “markers,” “biomarker,” and “biomarkers” encompass, without limitation, lipids, lipoproteins, proteins, cytokines, chemokines, growth factors, peptides, nucleic acids (e.g., DNA, mRNA, or micro-RNA (miRNA)), genes, and oligonucleotides, together with their related complexes, metabolites, mutations, variants, polymorphisms, modifications, fragments, subunits, degradation products, elements, and other analytes or sample-derived measures. A marker can also include mutated proteins, mutated nucleic acids, variations in copy numbers, and/or transcript variants, in circumstances in which such mutations, variations in copy number and/or transcript variants are useful for generating a prediction model, or are useful in prediction models developed using related markers (e.g., non-mutated versions of the proteins or nucleic acids, alternative transcripts, etc.). In particular embodiments, a marker or biomarker refers to a protein biomarker. In particular embodiments, a marker or biomarker refers to a non-invasive protein biomarker.

[00181] The term "antibody" is used in the broadest sense and specifically covers monoclonal antibodies (including full length monoclonal antibodies), polyclonal antibodies, multispecific antibodies (e.g., bispecific antibodies), and antibody fragments that are antigen-binding so long as they exhibit the desired biological activity, e.g., an antibody or an antigen-binding fragment thereof.

[00182] "Antibody fragment", and all grammatical variants thereof, as used herein are defined as a portion of an intact antibody comprising the antigen binding site or variable region of the intact antibody, wherein the portion is free of the constant heavy chain domains (i.e., CH2, CH3, and CH4, depending on antibody isotype) of the Fc region of the intact antibody. Examples of antibody fragments include Fab, Fab', Fab'-SH, F(ab')2, and Fv fragments; diabodies; and any antibody fragment that is a polypeptide having a primary structure consisting of one uninterrupted sequence of contiguous amino acid residues (referred to herein as a "single-chain antibody fragment" or "single chain polypeptide").

[00183] A “predictive model” or “prediction model” refers to a model that analyzes values for a plurality of predictors and determines a prediction of risk of cancer. In various embodiments, a prediction model includes one panel. In various embodiments, a prediction model includes more than one panel, such as two panels, three panels, four panels, five panels, six panels, seven panels, eight panels, nine panels, or ten panels. The two or more panels can provide combinable information for predicting risk of cancer for the subject.

[00184] The term “panel” refers to a set of predictors that are informative for predicting risk of cancer. In one example, quantitative values of biomarkers in a panel can be informative for predicting risk of cancer. In various embodiments, a panel can include two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty, twenty one, twenty two, twenty three, twenty four, twenty five, twenty six, twenty seven, twenty eight, twenty nine, thirty, thirty one, thirty two, thirty three, thirty four, thirty five, thirty six, thirty seven, thirty eight, thirty nine, forty, forty one, forty two, forty three, forty four, forty five, forty six, forty seven, forty eight, forty nine, fifty, fifty one, fifty two, fifty three, fifty four, fifty five, fifty six, fifty seven, fifty eight, fifty nine, sixty, sixty one, sixty two, sixty three, sixty four, sixty five, sixty six, sixty seven, sixty eight, sixty nine, seventy, seventy one, seventy two, seventy three, seventy four, seventy five, seventy six, seventy seven, seventy eight, seventy nine, eighty, eighty one, eighty two, eighty three, eighty four, eighty five, eighty six, eighty seven, eighty eight, eighty nine, ninety, ninety one, ninety two, ninety three, ninety four, ninety five, ninety six, ninety seven, ninety eight, ninety nine, or one hundred predictors. In various embodiments, a panel can include at least one hundred, at least two hundred, at least three hundred, at least four hundred, at least five hundred, at least six hundred, at least seven hundred, at least eight hundred, at least nine hundred, or at least one thousand predictors.

[00185] The term “obtaining a dataset associated with a sample” encompasses obtaining a set of data determined from at least one sample. Obtaining a dataset encompasses obtaining a sample and processing the sample to experimentally determine the data. The phrase also encompasses receiving a set of data, e.g., from a third party that has processed the sample to experimentally determine the dataset. Additionally, the phrase encompasses mining data from at least one database or at least one publication or a combination of databases and publications. A dataset can be obtained by one of skill in the art in a variety of known ways, including from data stored on a storage memory.

[00186] It must be noted that, as used in the specification, the singular forms “a,” “an” and “the” include plural referents unless the context clearly dictates otherwise.

II. System Environment Overview

[00187] FIG. 1A depicts an overview of an environment 100 for predicting risk of cancer in a subject 110 via a cancer prediction system 130. The system environment 100 provides context in order to introduce a marker quantification assay 120 and a cancer prediction system 130 for determining a cancer prediction 140.

[00188] In various embodiments, a test sample is obtained from the subject 110. The sample can be obtained by the individual or by a third party, e.g., a medical professional. Examples of medical professionals include physicians, emergency medical technicians, nurses, first responders, psychologists, phlebotomists, medical physics personnel, nurse practitioners, surgeons, dentists, and any other medical professional as would be known to one skilled in the art.

[00189] The test sample is tested to determine values of one or more biomarkers (e.g., protein biomarkers) by performing one or more marker quantification assays 120. A marker quantification assay 120 determines quantitative values of one or more biomarkers from the test sample. In various embodiments, more than one marker quantification assay 120 can be performed to determine values of one or more biomarkers. In particular embodiments, the marker quantification assay 120 is a protein quantification assay. Therefore, by performing the marker quantification assay 120, quantitative values of one or more protein biomarkers are determined.

[00190] In various embodiments, the marker quantification assay 120 may be an assay useful for detecting and/or quantifying proteins in a biological sample. Example assays useful for detecting and/or quantifying proteins in a biological sample include an immunoassay (e.g., Proximity Extension Assay (PEA) or LUMINEX xMAP Multiplex Assay) to determine the expression levels of the plurality of biomarkers. In various embodiments, the quantitative values of various biomarkers can be obtained in a single run using a single test sample obtained from the subject 110. In some embodiments, the quantitative values of biomarkers are obtained through multiple test samples obtained from the subject 110 (e.g., a blood sample). The quantified values of the biomarkers are provided to the cancer prediction system 130.

[00191] Generally, the cancer prediction system 130 analyzes the quantitative values of biomarkers (e.g., protein biomarkers) determined by the marker quantification assay(s) 120 and generates the cancer prediction 140. In various embodiments, the cancer prediction 140 represents a prediction of presence or absence of cancer in the subject. In various embodiments, the cancer prediction 140 can be a future risk of cancer prediction for the subject 110 (e.g., a likelihood of the subject developing cancer within a time period, e.g., within 1-5 years, within 1-3 years, or within 2-5 years). In various embodiments, the cancer prediction 140 can be a current risk of cancer prediction for the subject 110 (e.g., a current presence or absence of cancer in the subject 110). In various embodiments, the cancer prediction 140 can be informative for identifying a therapeutic that is likely to be effective in treating a cancer that is present or is predicted to occur within a predetermined time. In various embodiments, the therapeutic can serve as a prophylactic to delay or prevent the onset of the cancer within the predetermined time.

[00192] The cancer prediction system 130 can include one or more computers, embodied as a computer system 400 as discussed below with respect to FIG. 4. Therefore, in various embodiments, the steps described in reference to the cancer prediction system 130 are performed in silico.

[00193] In various embodiments, the marker quantification assay 120 and the cancer prediction system 130 can be employed by different parties. For example, a first party performs the marker quantification assay 120 and then provides the determined quantitative values to a second party which implements the cancer prediction system 130. For example, the first party may be a clinical laboratory that obtains test samples from subjects 110 and performs marker quantification assay(s) 120 on the test samples. The second party receives the quantitative values of biomarkers resulting from performed marker quantification assay(s) 120 and analyzes the quantitative values using the cancer prediction system 130.

[00194] Reference is now made to FIG. 1B, which depicts a block diagram illustrating the computer logic components of the cancer prediction system 130, in accordance with an embodiment. Specifically, the cancer prediction system 130 may include a model training module 150, a model deployment module 160, and a training data store 170.

[00195] Each of the components of the cancer prediction system 130 is hereafter described in reference to two phases: 1) a training phase and 2) a deployment phase. More specifically, the training phase refers to the building and training of one or more prediction models based on training data that includes quantitative values of biomarkers obtained from individuals that are known to be healthy (e.g., absence of cancer), known to have cancer (e.g., previously diagnosed with cancer), or known to develop cancer within a certain amount of time (e.g., within 1-5 years). Therefore, the prediction models are trained to predict a risk of cancer in a subject based on at least quantitative biomarker values.

[00196] During the deployment phase, a prediction model is applied to quantitative biomarker values (e.g., protein biomarker values) from a test sample obtained from a subject of interest to predict risk of cancer for the subject of interest. In various embodiments, the prediction model only analyzes quantitative biomarker values from a test sample obtained from the subject.

[00197] In some embodiments, the components of the cancer prediction system 130 are applied during one of the training phase and the deployment phase. For example, the model training module 150 and training data store 170 (indicated by the dotted lines in FIG. 1B) are applied during the training phase whereas the model deployment module 160 is applied during the deployment phase. In various embodiments, the components of the cancer prediction system 130 can be operated by different parties depending on whether the components are applied during the training phase or the deployment phase. In such scenarios, the training and deployment of the prediction model are performed by different parties. For example, the model training module 150 and training data store 170 applied during the training phase can be employed by a first party (e.g., to train a prediction model) and the model deployment module 160 applied during the deployment phase can be employed by a second party (e.g., to deploy the prediction model).

III. Prediction model

III.A. Training a Prediction model

[00198] During the training phase, the model training module 150 trains one or more prediction models using training data. In various embodiments, the training data can be derived from samples obtained from individuals. In various embodiments, the training data includes quantitative values of biomarkers (e.g., protein biomarkers) derived from the samples obtained from individuals. Such individuals can be healthy individuals, individuals known to have cancer (e.g., individuals previously diagnosed with cancer), or individuals that are known to develop cancer within a particular timeframe (e.g., within 1-3 years, within 1-5 years, or within 2-5 years). In various embodiments, the individuals from which training data are derived are clinical subjects. For example, the training data can include quantitative values of biomarkers (e.g., protein biomarkers) that were measured from test samples obtained from clinical subjects, such as subjects that were enrolled in a clinical study or clinical trial.

[00199] Referring to FIG. 1B, the training data may be stored in the training data store 170. In various embodiments, the cancer prediction system 130 generates the training data and analyzes quantitative values of biomarkers from test samples. In various embodiments, the cancer prediction system 130 obtains the training data from a third party. The third party may have analyzed test samples to determine the quantitative biomarker values from the individuals.

[00200] In various embodiments, the training data includes reference ground truths that indicate information about a cancer. As an example, the training data can include a reference ground truth that indicates a presence or absence of cancer. As another example, the training data can include a reference ground truth that indicates development of cancer within a certain time. For example, the training data can include a reference ground truth that indicates that a subject developed cancer within a particular time period. In various embodiments, the time period can be any one of 1 month, 2 months, 3 months, 4 months, 5 months, 6 months, 7 months, 8 months, 9 months, 10 months, 11 months, 1 year, 1.5 years, 2 years, 2.5 years, 3 years, 3.5 years, 4 years, 4.5 years, 5 years, 5.5 years, 6 years, 6.5 years, 7 years, 7.5 years, 8 years, 8.5 years, 9 years, 9.5 years, 10 years, 10.5 years, 11 years, 11.5 years, 12 years, 12.5 years, 13 years, 13.5 years, 14 years, 14.5 years, 15 years, 15.5 years, 16 years, 16.5 years, 17 years, 17.5 years, 18 years, 18.5 years, 19 years, 19.5 years, or 20 years. In various embodiments, the training data can include two or more reference ground truths, each reference ground truth indicating development of cancer within a particular timeframe. For example, the training data can include a first reference ground truth indicating whether the individual developed cancer within 1 year and can further include a second reference ground truth indicating whether the individual developed cancer within 3 years.

[00201] Reference is made to FIG. 2, which depicts an example set of training data 200, in accordance with an embodiment. As shown in FIG. 2, the training data 200 includes data corresponding to multiple individuals (e.g., column 1 depicting individual 1, 2, 3, 4... ). For each individual, the training data 200 includes quantitative values (e.g., A1, B1, A2, B2, etc.) for different markers (e.g., protein biomarkers) obtained from the corresponding individual. In some embodiments, the quantitative values are determined by the marker quantification assay 120 shown in FIG. 1A. Although FIG. 2 explicitly depicts four individuals and two different markers (marker A and marker B), the training data 200 may include tens, hundreds, or thousands of individuals and tens, hundreds, or thousands of markers.

[00202] As shown in FIG. 2, a first training example (e.g., first row) of the training data refers to individual 1 and the corresponding quantitative values of marker A (e.g., A1) and marker B (e.g., B1). Similarly, the second training example (e.g., second row) of the training data refers to individual 2 and the corresponding quantitative values of marker A (e.g., A2) and marker B (e.g., B2). Individuals 3 and 4 have similar corresponding marker values as shown in FIG. 2.

[00203] The training data 200 further includes a reference ground truth (e.g., column titled “Indication”) that indicates cancer information pertaining to the corresponding individual. As an example, an indication may be a current presence or current absence of cancer in the individual. As another example, an indication may be a presence or absence of cancer in the individual within a time period. For example, referring to the first training example (e.g., first row), a “Positive” indication under the column titled “Indication” can indicate that the individual 1 developed cancer within the time period (e.g., within any one of 1 month, 2 months, 3 months, 4 months, 5 months, 6 months, 7 months, 8 months, 9 months, 10 months, 11 months, 1 year, 1.5 years, 2 years, 2.5 years, 3 years, 3.5 years, 4 years, 4.5 years, 5 years, 5.5 years, 6 years, 6.5 years, 7 years, 7.5 years, 8 years, 8.5 years, 9 years, 9.5 years, 10 years, 10.5 years, 11 years, 11.5 years, 12 years, 12.5 years, 13 years, 13.5 years, 14 years, 14.5 years, 15 years, 15.5 years, 16 years, 16.5 years, 17 years, 17.5 years, 18 years, 18.5 years, 19 years, 19.5 years, or 20 years).

Referring to the second training example (e.g., second row), the second training example includes an indication of “Positive” under the column titled “Indication”, which indicates that the second individual developed cancer within the time period. The third and fourth training examples, corresponding to Individual 3 and Individual 4, respectively, include reference ground truths with an indication of “Negative”, which indicates that those individuals did not develop cancer within the time period.
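The layout of the training data 200 described above can be sketched in Python as follows. This is only an illustrative sketch: the class name `TrainingExample`, the helper `to_matrix`, and the numeric marker values (placeholders for A1, B1, A2, B2, and so on) are assumptions and do not come from FIG. 2.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class TrainingExample:
    """One row of the training data: an individual, marker values, and a label."""
    individual: int
    markers: Dict[str, float]   # marker name -> quantitative value
    indication: str             # reference ground truth: "Positive" or "Negative"


# Placeholder numbers standing in for A1, B1, A2, B2, ... (hypothetical values).
training_data: List[TrainingExample] = [
    TrainingExample(1, {"A": 1.2, "B": 0.4}, "Positive"),
    TrainingExample(2, {"A": 0.9, "B": 0.7}, "Positive"),
    TrainingExample(3, {"A": 0.3, "B": 0.2}, "Negative"),
    TrainingExample(4, {"A": 0.1, "B": 0.5}, "Negative"),
]


def to_matrix(rows: List[TrainingExample],
              marker_names: List[str]) -> Tuple[List[List[float]], List[int]]:
    """Split rows into a feature matrix X and a binary label vector y."""
    X = [[row.markers[name] for name in marker_names] for row in rows]
    y = [1 if row.indication == "Positive" else 0 for row in rows]
    return X, y


X, y = to_matrix(training_data, ["A", "B"])
```

Additional reference ground truths (for example, one per time period) could be represented by adding further label fields to the same row structure.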

[00204] Although the training data 200 in FIG. 2 depicts one reference ground truth (e.g., “Indication”), in various embodiments, training data 200 can include more reference ground truths (e.g., two indications or more). As one example, the training data 200 can additionally include reference ground truth values that indicate whether the individual developed cancer within two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, or twenty other time periods.

[00205] In some embodiments, for training the prediction model, the model training module 150 retrieves the training data from the training data store 170 and randomly partitions the training data into a training set and a test set. As an example, about 66% of the training data may be partitioned into the training set and the remaining data (about 33%) can be partitioned into the test set. Other proportions of training set and test set may be implemented. The training set is used to train prediction models whereas the test set is used to validate the prediction models.
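A minimal sketch of such a random partition is shown below, assuming a two-thirds training fraction by default. The function name `partition`, the `train_fraction` parameter, and the fixed seed are illustrative assumptions rather than details taken from the disclosure.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def partition(rows: Sequence[T], train_fraction: float = 2 / 3,
              seed: int = 0) -> Tuple[List[T], List[T]]:
    """Randomly split rows into a training set and a test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # seeded so the split is reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Nine hypothetical training examples, identified here only by index.
train_set, test_set = partition(range(9))
```

Every example lands in exactly one of the two sets, so the test set stays unseen during training and can be used for validation.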

[00206] In various embodiments, the prediction model is any one of a regression model (e.g., linear regression, logistic regression, Cox regression, elastic net regression, Cox Elastic regression model, ridge regression, or polynomial regression), decision tree, random forest, support vector machine, elastic net regularization, Naive Bayes model, k-means cluster, or neural network (e.g., feed-forward networks, convolutional neural networks (CNN), deep neural networks (DNN), autoencoder neural networks, generative adversarial networks, or recurrent networks (e.g., long short-term memory networks (LSTM), bi-directional recurrent networks, or deep bi-directional recurrent networks)), or any combination thereof. In particular embodiments, the prediction model is any one of an elastic net logistic regression model, random forest model, support vector machine, or XGBoost model. In particular embodiments, the prediction model is an elastic net logistic regression model. In particular embodiments, the prediction model is a random forest model. In particular embodiments, the prediction model is a support vector machine. In particular embodiments, the prediction model is an XGBoost model.

[00207] The prediction model can be trained using a machine learning implemented method, such as any one of a linear regression algorithm, logistic regression algorithm, decision tree algorithm, support vector machine classification, elastic net regularization, Naive Bayes classification, K-Nearest Neighbor classification, random forest algorithm, deep learning algorithm, gradient boosting algorithm, and dimensionality reduction techniques such as manifold learning, principal component analysis, factor analysis, autoencoder regularization, and independent component analysis, or combinations thereof. In various embodiments, the prediction model is trained using supervised learning algorithms, unsupervised learning algorithms, semi-supervised learning algorithms (e.g., partial supervision), weak supervision, transfer learning, multi-task learning, or any combination thereof.

[00208] In various embodiments, the prediction model has one or more parameters, such as hyperparameters or model parameters. Hyperparameters are generally established prior to training. Examples of hyperparameters include the learning rate, depth or leaves of a decision tree, number of hidden layers in a deep neural network, number of clusters in a k-means cluster, penalty in a regression model, and a regularization parameter associated with a cost function. Model parameters are generally adjusted during training. Examples of model parameters include weights associated with nodes in layers of neural network, support vectors in a support vector machine, and coefficients in a regression model. The model parameters of the prediction model are trained (e.g., adjusted) using the training data to improve the predictive capacity of the prediction model.

[00209] The model training module 150 trains a prediction model using the training data. In various embodiments, the model training module 150 constructs a prediction model that receives, as input, two or more predictors (e.g., values of biomarkers). In various embodiments, the model training module 150 constructs a prediction model that receives, as input, three predictors. In various embodiments, the model training module 150 constructs a prediction model that receives, as input, four predictors. In various embodiments, the model training module 150 constructs a prediction model that receives, as input, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty, twenty one, twenty two, twenty three, twenty four, twenty five, twenty six, twenty seven, twenty eight, twenty nine, thirty, thirty one, thirty two, thirty three, thirty four, thirty five, thirty six, thirty seven, thirty eight, thirty nine, forty, forty one, forty two, forty three, forty four, forty five, forty six, forty seven, forty eight, forty nine, fifty, fifty one, fifty two, fifty three, fifty four, fifty five, fifty six, fifty seven, fifty eight, fifty nine, sixty, sixty one, sixty two, sixty three, sixty four, sixty five, sixty six, sixty seven, sixty eight, sixty nine, seventy, seventy one, seventy two, seventy three, seventy four, seventy five, seventy six, seventy seven, seventy eight, seventy nine, eighty, eighty one, eighty two, eighty three, eighty four, eighty five, eighty six, eighty seven, eighty eight, eighty nine, ninety, ninety one, ninety two, ninety three, ninety four, ninety five, ninety six, ninety seven, ninety eight, ninety nine, or one hundred predictors. In various embodiments, a panel can include at least one hundred, at least two hundred, at least three hundred, at least four hundred, at least five hundred, at least six hundred, at least seven hundred, at least eight hundred, at least nine hundred, or at least one thousand predictors.

[00210] In various embodiments, the model training module 150 constructs a prediction model that receives, as input, quantitative values of three biomarkers. In various embodiments, the model training module 150 constructs a prediction model that receives, as input, quantitative values of four biomarkers. In some embodiments, the model training module 150 constructs a prediction model that receives, as input, quantitative values for more than four biomarkers. In various embodiments, the model training module 150 constructs a prediction model that receives as input, quantitative values for five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty, twenty one, twenty two, twenty three, twenty four, twenty five, twenty six, twenty seven, twenty eight, twenty nine, thirty, thirty one, thirty two, thirty three, thirty four, thirty five, thirty six, thirty seven, thirty eight, thirty nine, forty, forty one, forty two, forty three, forty four, forty five, forty six, forty seven, forty eight, forty nine, fifty, one hundred, two hundred, three hundred, four hundred, five hundred, six hundred, seven hundred, eight hundred, nine hundred, one thousand, or more markers. In particular embodiments, the model training module 150 constructs a prediction model that receives as input, quantitative values for 5 markers. In particular embodiments, the model training module 150 constructs a prediction model that receives as input, quantitative values for at least 10 markers. In particular embodiments, the model training module 150 constructs a prediction model that receives as input, quantitative values for at least 20 markers. In particular embodiments, the model training module 150 constructs a prediction model that receives as input, quantitative values for at least 30 markers. 
In particular embodiments, the model training module 150 constructs a prediction model that receives as input, quantitative values for at least 40 markers. In particular embodiments, the model training module 150 constructs a prediction model that receives as input, quantitative values for at least 50 markers. In particular embodiments, the model training module 150 constructs a prediction model that receives as input, quantitative values for at least 100 markers. In particular embodiments, the model training module 150 constructs a prediction model that receives as input, quantitative values for at least 400 markers. In particular embodiments, the model training module 150 constructs a prediction model that receives as input, quantitative values for at least any of 5, 10, 15, 20, 30, 50, 100, 425, or 493 biomarkers.

[00211] In various embodiments, the model training module 150 identifies a set of biomarkers that are to be used to train a prediction model. The model training module 150 may begin with a list of candidate biomarkers that are promising for diagnosing a cancer. In various embodiments, the model training module 150 performs a feature selection process to identify the set of biomarkers to be included for the prediction model. For example, candidate biomarkers that are determined to be highly correlated with a presence of cancer would be deemed important and are therefore more likely to be included in the panel than other biomarkers that are not highly correlated.
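One possible form of such a feature selection process is sketched below: candidate biomarkers are ranked by the absolute Pearson correlation between their values and a binary cancer label, and the top-ranked candidates are kept. The disclosure does not specify the selection criterion, so the correlation measure, the function names, the `top_k` cutoff, and the marker values are all illustrative assumptions.

```python
import math
from typing import Dict, List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_biomarkers(candidates: Dict[str, List[float]], labels: List[int],
                      top_k: int) -> List[str]:
    """Rank candidates by |correlation| with the label and keep the top_k."""
    ranked = sorted(candidates,
                    key=lambda name: abs(pearson(candidates[name], labels)),
                    reverse=True)
    return ranked[:top_k]


# Hypothetical values for three candidate markers across four individuals.
candidates = {
    "marker_A": [1.2, 0.9, 0.3, 0.1],
    "marker_B": [0.4, 0.7, 0.2, 0.5],
    "marker_C": [0.5, 0.5, 0.5, 0.5],   # constant, carries no information
}
labels = [1, 1, 0, 0]                   # 1 = cancer present, 0 = absent
selected = select_biomarkers(candidates, labels, top_k=2)
```

In this toy input, the constant marker is ranked last and excluded, while the two markers whose values track the label are retained.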

[00212] In various embodiments, each prediction model is iteratively trained using, as input, the quantitative values of the markers for each individual. For example, referring again to FIG. 2, one iteration involves providing a training example (e.g., a row of the training data). Each prediction model is trained on reference ground truth data that includes the indication(s). In various embodiments, over training iterations, the prediction model is trained (e.g., the parameters are tuned) to minimize a prediction error between a prediction outputted by the prediction model and the ground truth data. In various embodiments, the prediction error is calculated based on a loss function, examples of which include an L1 regularization (Lasso Regression) loss function, an L2 regularization (Ridge Regression) loss function, or a combination of L1 and L2 regularization (ElasticNet).
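As an illustration of a combined L1 and L2 loss, the sketch below evaluates a logistic regression loss with an elastic net penalty. The parameterization (a strength `lam` and a mixing weight `l1_ratio`) follows a common convention and is an assumption; the disclosure does not fix a particular form.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def elastic_net_loss(weights: List[float], bias: float,
                     X: List[List[float]], y: List[int],
                     lam: float = 0.1, l1_ratio: float = 0.5) -> float:
    """Mean logistic loss plus a weighted mix of L1 and L2 penalties."""
    eps = 1e-12  # guards against log(0)
    data_loss = 0.0
    for row, label in zip(X, y):
        p = sigmoid(bias + sum(w * x for w, x in zip(weights, row)))
        data_loss -= label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps)
    data_loss /= len(y)
    l1 = sum(abs(w) for w in weights)   # Lasso term
    l2 = sum(w * w for w in weights)    # Ridge term
    penalty = lam * (l1_ratio * l1 + 0.5 * (1.0 - l1_ratio) * l2)
    return data_loss + penalty


# Hypothetical marker values and labels for two individuals.
X = [[1.0, 2.0], [3.0, 4.0]]
y = [1, 0]
loss_at_zero = elastic_net_loss([0.0, 0.0], 0.0, X, y)
```

Setting `l1_ratio` to 1 gives a pure L1 (Lasso) penalty and setting it to 0 gives a pure L2 (Ridge) penalty; training would adjust `weights` and `bias` to reduce this value.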

[00213] In various embodiments, a penalty factor is employed to lower the risk of false-positive selection of predictive biomarkers arising from their low levels. In various embodiments, a penalty factor is added to the general Elastic Net penalty based on the proportion of values of each biomarker at or below a lower limit of quantitation (LLOQ).
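The sketch below shows one way a per-biomarker penalty factor could be derived from the proportion of values at or below the LLOQ and applied to an elastic net penalty. The specific form used here (a multiplier of 1 plus the proportion) is purely an assumption for illustration, as are the function names and the sample values; the disclosure states only that the factor is based on that proportion.

```python
from typing import List


def lloq_proportion(values: List[float], lloq: float) -> float:
    """Fraction of measurements at or below the lower limit of quantitation."""
    return sum(1 for v in values if v <= lloq) / len(values)


def penalty_factors(columns: List[List[float]], lloqs: List[float]) -> List[float]:
    """Per-biomarker multiplier; the '1 + proportion' form is an assumption."""
    return [1.0 + lloq_proportion(col, lloq) for col, lloq in zip(columns, lloqs)]


def weighted_elastic_net_penalty(weights: List[float], factors: List[float],
                                 lam: float = 0.1, l1_ratio: float = 0.5) -> float:
    """Elastic net penalty in which each coefficient has its own multiplier."""
    return lam * sum(
        f * (l1_ratio * abs(w) + 0.5 * (1.0 - l1_ratio) * w * w)
        for w, f in zip(weights, factors)
    )


# Hypothetical measurements: one list of values per biomarker.
columns = [[0.1, 0.1, 2.0, 3.0],    # half of the values sit at the LLOQ
           [1.0, 2.0, 3.0, 4.0]]    # all values are above the LLOQ
lloqs = [0.1, 0.5]
factors = penalty_factors(columns, lloqs)
penalty = weighted_elastic_net_penalty([1.0, 1.0], factors)
```

Under this assumed form, a biomarker that is frequently unquantifiable is shrunk more strongly, which makes it less likely to be selected on the basis of noise near the detection limit.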

III.B. Deploying a Prediction model

[00214] During the deployment phase, the model deployment module 160 (as shown in FIG. 1B) applies a trained prediction model to generate a prediction for risk of cancer in the subject. In various embodiments, the prediction for risk of cancer for the subject is a prediction of presence or absence of cancer in the subject. In particular embodiments, the subject has not previously been diagnosed with a disease. Therefore, the deployment of the prediction model enables in silico prediction of whether the subject is likely to develop cancer in the future (e.g., within 1-20 years). In various embodiments, the model deployment module 160 applies a trained prediction model that analyzes quantitative values of biomarkers to determine a risk of cancer in a subject.

[00215] In various embodiments, the trained prediction model includes a single panel that includes one or more biomarkers. Thus, the trained prediction model outputs a prediction based on the one or more biomarkers of the single panel.

[00216] In various embodiments, the trained prediction model includes two or more panels, each panel comprising one or more biomarkers. In various embodiments, a panel includes a set of biomarkers that are distinct from a set of biomarkers of another panel in the prediction model. In various embodiments, one or more biomarkers of one panel can overlap with one or more biomarkers of another panel. In other words, two panels may share one or more biomarkers. In various embodiments, two panels may share at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least fifteen, at least twenty, at least thirty, at least fifty, at least one hundred, at least two hundred, at least three hundred, at least four hundred, at least five hundred, at least six hundred, at least seven hundred, at least eight hundred, at least nine hundred, or at least one thousand biomarkers.

[00217] In such embodiments where the trained prediction model includes two or more panels, the trained prediction model outputs a prediction based on the biomarkers of each of the two or more panels. To generate an overall prediction, the trained prediction model combines an output of a first panel with an output of a second panel. Thus, the one or more biomarkers of the first panel as well as the one or more biomarkers of the second panel contribute towards the overall prediction outputted by the trained prediction model.

[00218] In various embodiments, the output of each of the panels of the prediction model is a score (e.g., an indication of how likely it is that the subject has cancer or will develop cancer). Thus, the trained prediction model combines scores outputted by the individual panels to generate an overall prediction. In various embodiments, the trained prediction model combines the scores outputted by the individual panels by comparing the scores outputted by the individual panels and selecting one of the scores. Thus, the selected score serves as the basis for the overall prediction of the prediction model. In various embodiments, the trained prediction model combines the scores outputted by the individual panels by comparing the scores outputted by the individual panels and selecting the higher score.

[00219] In various embodiments, the trained prediction model combines the supplemented scores by comparing the supplemented scores and selecting one of the supplemented scores. In various embodiments, the prediction model selects the highest supplemented score. In such embodiments, the overall prediction outputted by the prediction model can be the selected score or can be derived from the selected score (e.g., overall prediction is generated based on the comparison between the selected score and a reference score as described above).

[00220] In various embodiments, prior to comparing the scores and selecting a score, the prediction model normalizes each score outputted by a panel to a corresponding reference score. Thus, normalized scores are compared to one another to select the score.
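The normalize-then-select step can be sketched as follows. Normalization is assumed here to be a simple ratio of each panel score to its reference score; the disclosure does not state the normalization, and the panel names and score values are hypothetical.

```python
from typing import Dict, Tuple


def combine_panel_scores(panel_scores: Dict[str, float],
                         reference_scores: Dict[str, float]) -> Tuple[str, float]:
    """Normalize each panel score to its reference, then select the highest."""
    normalized = {
        # Assumed normalization: ratio to a nonzero reference score.
        panel: score / reference_scores[panel]
        for panel, score in panel_scores.items()
    }
    best_panel = max(normalized, key=normalized.get)
    return best_panel, normalized[best_panel]


# Hypothetical scores from two panels and their corresponding reference scores.
panel_scores = {"panel_1": 0.60, "panel_2": 0.45}
reference_scores = {"panel_1": 0.50, "panel_2": 0.30}
selected_panel, selected_score = combine_panel_scores(panel_scores, reference_scores)
```

In this example the raw score of panel_1 is higher, but panel_2 is selected because its score is larger relative to its own reference, which illustrates why normalizing before comparison can change the selected score.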

[00221] In various embodiments, the overall prediction outputted by the prediction model is the selected score that is selected from the scores outputted by the panels. In various embodiments, the prediction model generates the overall prediction by comparing the selected score to one or more reference scores. In various embodiments, the reference score can be a score corresponding to healthy patients (e.g., a “healthy score”), a baseline score at a prior timepoint (e.g., longitudinal analysis), a score corresponding to patients clinically diagnosed with cancer (e.g., a “reference cancer score”), a score corresponding to patients diagnosed with a particular subtype of cancer (e.g., a cancer subtype score), a score corresponding to patients who are known to develop cancer within a particular time period (e.g., a time to event score), or a threshold score (e.g., a cutoff).

[00222] In particular embodiments, the reference score can be a “healthy score” corresponding to healthy patients and can be generated by implementing a prediction model to analyze quantitative values of biomarkers. In particular embodiments, the reference score is a time to event score corresponding to patients who are known to develop cancer within a time period (e.g., within any one of 1 month, 2 months, 3 months, 4 months, 5 months, 6 months, 7 months, 8 months, 9 months, 10 months, 11 months, 1 year, 1.5 years, 2 years, 2.5 years, 3 years, 3.5 years, 4 years, 4.5 years, 5 years, 5.5 years, 6 years, 6.5 years, 7 years, 7.5 years, 8 years, 8.5 years, 9 years, 9.5 years, 10 years, 10.5 years, 11 years, 11.5 years, 12 years, 12.5 years, 13 years, 13.5 years, 14 years, 14.5 years, 15 years, 15.5 years, 16 years, 16.5 years, 17 years, 17.5 years, 18 years, 18.5 years, 19 years, 19.5 years, or 20 years).

[00223] In various embodiments, the overall prediction is generated based on the comparison between a score of the prediction model and one or more reference scores. The overall prediction is informative for predicting risk of cancer for the subject within one or more time periods. To provide an example, the score can be from a panel of the prediction model. The score is compared to a healthy score (e.g., reference score derived from healthy patients). If the score is significantly different (e.g., p < 0.05) from the healthy score, the overall prediction can indicate that the subject has cancer, or will likely develop cancer. As another example, the score from the prediction model can be compared to one or more time to event scores of patients who are known to develop cancer within a particular time period. If the score is significantly different (e.g., p < 0.05) from a time to event score, then the overall prediction can indicate that the subject is unlikely to develop cancer within a period of time corresponding to the time to event score. If the score is not significantly different (e.g., p > 0.05) from a time to event score, then the overall prediction can indicate that the subject is likely to develop cancer within a period of time corresponding to the time to event score. As described herein, a period of time can be within any one of 1 month, 2 months, 3 months, 4 months, 5 months, 6 months, 7 months, 8 months, 9 months, 10 months, 11 months, 1 year, 1.5 years, 2 years, 2.5 years, 3 years, 3.5 years, 4 years, 4.5 years, 5 years, 5.5 years, 6 years, 6.5 years, 7 years, 7.5 years, 8 years, 8.5 years, 9 years, 9.5 years, 10 years, 10.5 years, 11 years, 11.5 years, 12 years, 12.5 years, 13 years, 13.5 years, 14 years, 14.5 years, 15 years, 15.5 years, 16 years, 16.5 years, 17 years, 17.5 years, 18 years, 18.5 years, 19 years, 19.5 years, or 20 years.

[00224] In various embodiments, the subject can undergo treatment depending on the overall prediction. For example, if the subject is predicted to likely develop cancer within a particular period of time, the subject can be administered a therapeutic intervention. Here, the therapeutic intervention can serve as a prophylactic treatment to delay or prevent the onset of the cancer.

[00225] Reference is now made to FIG. 3, which depicts implementation of an example prediction model, in accordance with a fourth embodiment. Here, the prediction model 350 may include a single panel 315. Thus, single panel 315 of the prediction model analyzes the quantitative biomarker levels 310.

[00226] Based on the analysis of the quantitative biomarker levels 310, the prediction model 350 generates a cancer score 330. The cancer score 330 is compared to one or more reference scores. In various embodiments, the cancer score 330 can be compared to a time to event score. If the cancer score 330 is not significantly different (e.g., p > 0.05) from the time to event score, then the overall prediction 340 can indicate that the individual is likely to develop cancer within a time period corresponding to the time to event score. Alternatively, if the cancer score 330 is significantly different (e.g., p < 0.05) from the time to event score, then the overall prediction 340 can indicate that the individual is not likely to develop cancer within the time period corresponding to the time to event score. The cancer score 330 can be compared to multiple time to event scores corresponding to different time periods to predict whether the individual is likely to develop cancer within any of the time periods corresponding to the time to event scores.
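The comparison of a cancer score against multiple time to event scores can be sketched in Python as follows. The text does not state how significance is computed, so this sketch assumes each time to event score is represented by a sample of scores from a reference cohort and uses a two-sided empirical p-value; the function names, the 0.05 cutoff default, and the reference values are all hypothetical.

```python
def empirical_p_value(score, reference_scores):
    """Two-sided empirical p-value of `score` against a reference sample."""
    n = len(reference_scores)
    lower = sum(1 for r in reference_scores if r <= score)
    upper = sum(1 for r in reference_scores if r >= score)
    # Add-one correction keeps the p-value strictly positive.
    return min(1.0, 2 * (min(lower, upper) + 1) / (n + 1))


def likely_periods(cancer_score, time_to_event_scores, alpha=0.05):
    """Return the periods whose reference scores the cancer score does NOT
    differ from significantly (p > alpha), i.e. periods flagged as likely."""
    return [
        period
        for period, reference in time_to_event_scores.items()
        if empirical_p_value(cancer_score, reference) > alpha
    ]


# Hypothetical reference cohorts, for illustration only.
references = {
    "1 year": [0.50 + 0.004 * i for i in range(100)],
    "5 years": [0.20 + 0.004 * i for i in range(100)],
}
result = likely_periods(0.701, references)
print(result)
```

In this example the score falls inside the "1 year" reference sample and outside the "5 years" sample, so only the one-year period is reported.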

[00227] As shown and described in reference to FIG. 3, the prediction model 350 can generate a cancer score (e.g., cancer score 330) that is informative for determining an overall prediction 340. In various embodiments, the cancer score represents an aggregate score of the levels (e.g., altered or dysregulated levels) of the biomarkers of the prediction model 350. This means that it is not necessary to know how the level of any individual marker has changed to obtain the cancer score. For example, assuming a prediction model of 20 biomarkers, the upregulation or downregulation of any one biomarker represents one component that results in the cancer score. Thus, even though a first patient and second patient may both exhibit upregulation of a biomarker, the final aggregate cancer scores may indicate that the first patient is likely to develop cancer within a certain timeframe, whereas the second patient is unlikely to develop cancer within the certain timeframe.
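The point that a shared change in one biomarker does not determine the aggregate score can be illustrated with a short Python sketch. It assumes, purely for illustration, a logistic model over standardized biomarker levels; the model form, weights, marker names, and patient values are hypothetical and are not taken from the disclosure.

```python
import math


def cancer_score(levels, weights, intercept=0.0):
    """Aggregate biomarker levels into a single score with a logistic model."""
    linear = intercept + sum(weights[name] * levels[name] for name in weights)
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical weights and standardized levels, for illustration only.
weights = {"marker_1": 0.8, "marker_2": -1.2, "marker_3": 0.5}

# Both patients show the same upregulation of marker_1 ...
first_patient = {"marker_1": 1.5, "marker_2": -1.0, "marker_3": 1.0}
second_patient = {"marker_1": 1.5, "marker_2": 1.5, "marker_3": -1.0}

# ... but the remaining markers pull the aggregate scores apart.
first_score = cancer_score(first_patient, weights)
second_score = cancer_score(second_patient, weights)
print(round(first_score, 3), round(second_score, 3))
```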

[00228] As further shown in FIG. 3, the output of the prediction model 350 is an overall prediction 340. In particular embodiments, the overall prediction 340 represents a prediction of risk of cancer (e.g., lung cancer) for the subject. In particular embodiments, the overall prediction 340 represents a prediction of whether the subject is likely to develop lung cancer within a particular time period. In various embodiments, the time period is any one of 1 month, 2 months, 3 months, 4 months, 5 months, 6 months, 7 months, 8 months, 9 months, 10 months, 11 months, 1 year, 1.5 years, 2 years, 2.5 years, 3 years, 3.5 years, 4 years, 4.5 years, 5 years, 5.5 years, 6 years, 6.5 years, 7 years, 7.5 years, 8 years, 8.5 years, 9 years, 9.5 years, 10 years, 10.5 years, 11 years, 11.5 years, 12 years, 12.5 years, 13 years, 13.5 years, 14 years, 14.5 years, 15 years, 15.5 years, 16 years, 16.5 years, 17 years, 17.5 years, 18 years, 18.5 years, 19 years, 19.5 years, or 20 years. In various embodiments, the overall prediction 340 can represent multiple predictions of whether the subject is likely to develop lung cancer within N different time periods. In various embodiments, N is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 different time periods.

[00229] In various embodiments, the prediction model 350 achieves, e.g., an area under the curve (AUC) performance metric (e.g., minimum, median, mean, maximum, first quartile, second quartile, third quartile, or fourth quartile AUC value) of at least 0.5, 0.51, 0.52, 0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.6, 0.61, 0.62, 0.63, 0.64, 0.65, 0.66, 0.67, 0.68, 0.69, 0.7, 0.71, 0.72, 0.73, 0.74, 0.75, 0.76, 0.77, 0.78, 0.79, 0.8, 0.81, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87, 0.88, 0.89, 0.9, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, or 0.99. In various embodiments, the prediction model 350 achieves, e.g., an AUC performance metric (e.g., minimum, median, mean, maximum, first quartile, second quartile, third quartile, or fourth quartile AUC value) of about 0.5, 0.51, 0.52, 0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.6, 0.61, 0.62, 0.63, 0.64, 0.65, 0.66, 0.67, 0.68, 0.69, 0.7, 0.71, 0.72, 0.73, 0.74, 0.75, 0.76, 0.77, 0.78, 0.79, 0.8, 0.81, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87, 0.88, 0.89, 0.9, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, or 0.99.

IV. Panel(s) of a prediction model

[00230] Embodiments described herein involve implementing a prediction model that includes one or more panels. Each panel includes one or more predictors, examples of which include biomarkers (e.g., protein biomarkers).

[00231] In various embodiments, multiple panels can be included in a prediction model. The implementation of multiple panels is informative for generating an overall prediction for risk of cancer in a subject. In various embodiments, a panel of the prediction model is a univariate panel. In such embodiments, the univariate panel includes one predictor. In other embodiments, a panel is a multivariate panel. In such embodiments, the multivariate panel includes more than one predictor. In various embodiments, the multivariate panel includes two predictors. In various embodiments, the multivariate panel includes 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100 predictors. In various embodiments, the multivariate panel includes at least 100, at least 200, at least 300, at least 400, at least 500, at least 600, at least 700, at least 800, at least 900, at least 1000, or more predictors. In particular embodiments, the multivariate panel includes five predictors. In particular embodiments, the multivariate panel includes ten predictors. In particular embodiments, the multivariate panel includes fifteen predictors. In particular embodiments, the multivariate panel includes twenty predictors. In particular embodiments, the multivariate panel includes thirty predictors. In particular embodiments, the multivariate panel includes fifty predictors. In particular embodiments, the multivariate panel includes at least one hundred predictors. In particular embodiments, the multivariate panel includes at least two hundred predictors. In particular embodiments, the multivariate panel includes at least three hundred predictors. In particular embodiments, the multivariate panel includes at least four hundred predictors. In particular embodiments, the multivariate panel includes at least five hundred predictors. In particular embodiments, the multivariate panel includes at least six hundred predictors. In particular embodiments, the multivariate panel includes at least seven hundred predictors. In particular embodiments, the multivariate panel includes at least eight hundred predictors. In particular embodiments, the multivariate panel includes at least nine hundred predictors. In particular embodiments, the multivariate panel includes at least one thousand predictors. In particular embodiments, the multivariate panel includes 425 predictors. In particular embodiments, the multivariate panel includes 493 predictors.

[00232] In various embodiments, the prediction model (such as the prediction model in FIG. 3) includes between 1 and 1000 biomarkers. In various embodiments, the prediction model (such as the prediction model in FIG. 3) includes between 1 and 500 biomarkers. In various embodiments, the prediction model (such as the prediction model in FIG. 3) includes between 1 and 100 biomarkers. In various embodiments, the prediction model (such as the prediction model in FIG. 3) includes between 1 and 60 biomarkers. In various embodiments, the prediction model includes between 10 and 50 biomarkers. In various embodiments, the prediction model includes between 20 and 40 biomarkers. In various embodiments, the prediction model includes between 25 and 38 biomarkers. In various embodiments, the prediction model includes between 30 and 35 biomarkers. In various embodiments, the prediction model includes between 20 and 30 biomarkers. In various embodiments, the prediction model includes between 30 and 40 biomarkers. In various embodiments, the prediction model includes between 40 and 50 biomarkers. In particular embodiments, the prediction model includes 5 biomarkers. In particular embodiments, the prediction model includes 10 biomarkers. In particular embodiments, the prediction model includes 15 biomarkers. In particular embodiments, the prediction model includes 20 biomarkers. In particular embodiments, the prediction model includes 30 biomarkers. In particular embodiments, the prediction model includes 50 biomarkers.

[00233] In various embodiments, a panel of the prediction model (such as the panel of the prediction model shown in any of FIG. 3) includes one or more protein biomarkers. Example protein biomarkers included in panels of the prediction model or the prediction model include protein biomarkers shown below in Tables 1-3.

[00234] In particular embodiments, a panel of the prediction model (such as the panel of the prediction model shown in any of FIG. 3) includes one or more, two or more, three or more, four or more, or each protein biomarker selected from TGFA, MMP12, TNFRSF13B, TNFSF14, and MASP1.

[00235] In particular embodiments, a panel of the prediction model (such as the panel of the prediction model shown in any of FIG. 3) includes one or more, two or more, three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, ten or more, eleven or more, twelve or more, thirteen or more, fourteen or more, or each protein biomarker selected from THBS2, GDNF, FLT1, FXYD5, CST5, ARNT, CDCP1, CCL20, FLT3LG, CLEC7A, PRKCQ, SCGN, IL5, NPY, and S100A16.

[00236] In particular embodiments, a panel of the prediction model (such as the panel of the prediction model shown in any of FIG. 3) includes one or more, two or more, three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, or each protein biomarker selected from IL1B, CD84, STC1, PRDX3, LAP3, GAMT, CASP2, ITGA6, DECR1, and YTHDF3.

[00237] In particular embodiments, a panel of the prediction model (such as the panel of the prediction model shown in any of FIG. 3) includes one or more, two or more, three or more, four or more, or each protein biomarker selected from CEACAM5, TOP1, NCAM1, SCGB3A2, and CALY.

[00238] In particular embodiments, a panel of the prediction model (such as the panel of the prediction model shown in any of FIG. 3) includes one or more, two or more, three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, ten or more, eleven or more, twelve or more, thirteen or more, fourteen or more, or each protein biomarker selected from TGFBI, CABP2, ENPP6, KRT14, HEPACAM2, TMEM25, SGSH, MFAP3L, TNFSF14, CD3D, TMED4, ZP3, MMP12, GCG, and AFM.

[00239] In particular embodiments, a panel of the prediction model (such as the panel of the prediction model shown in any of FIG. 3) includes one or more, two or more, three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, ten or more, eleven or more, twelve or more, thirteen or more, fourteen or more, fifteen or more, sixteen or more, seventeen or more, eighteen or more, nineteen or more, twenty or more, twenty one or more, twenty two or more, twenty three or more, twenty four or more, twenty five or more, twenty six or more, twenty seven or more, twenty eight or more, twenty nine or more, or each protein biomarker selected from SPINT1, LILRA4, FLT3LG, AGBL2, PAEP, SCGB3A1, LRFN2, TJP3, FGF7, LRIG1, CA14, CEACAM18, CST1, ANXA10, CDCP1, GPC5, OSCAR, CEACAM6, CD2, SNCG, GPR37, SEPTIN3, RAB10, DKK4, DKKL1, SOST, CSF3, VWA5A, TSPAN7, and PAK4.

[00240] In particular embodiments, a panel of the prediction model (such as the panel of the prediction model shown in any of FIG. 3) includes one or more, two or more, three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, ten or more, eleven or more, twelve or more, thirteen or more, fourteen or more, fifteen or more, sixteen or more, seventeen or more, eighteen or more, nineteen or more, twenty or more, twenty one or more, twenty two or more, twenty three or more, twenty four or more, twenty five or more, twenty six or more, twenty seven or more, twenty eight or more, twenty nine or more, thirty or more, thirty one or more, thirty two or more, thirty three or more, thirty four or more, thirty five or more, thirty six or more, thirty seven or more, thirty eight or more, thirty nine or more, forty or more, forty one or more, forty two or more, forty three or more, forty four or more, forty five or more, forty six or more, forty seven or more, forty eight or more, forty nine or more, or each protein biomarker selected from BPIFB1, SIGLEC9, ZNRD2, PM20D1, TK1, RPS10, PMCH, RNF43, MEP1B, BGN, NELL1, CD101, LRP2BP, PRSS53, MFGE8, THSD1, CKMT1A, MEPE, APOL1, RBPMS, MARCO, KLRC1, FGFBP2, TPSG1, SELENOP, CLEC7A, UPK3BL1, HS6ST1, ENDOU, IL12RB2, CYB5A, GKN1, NRTN, CCL26, CRNN, PINLYP, LAIR2, BAG3, SCPEP1, RIPK4, CTSE, TMOD4, SFTPA1, SEMA4D, IL17C, GFRA3, DPEP2, EDEM2, CD84, and KIRREL2.

[00241] In particular embodiments, a panel of the prediction model (such as the panel of the prediction model shown in any of FIG. 3) includes one or more protein biomarkers selected from NECTIN1, CBLN1, NTF3, PYY, XG, NPY, CCL20, SIL1, PLB1, DUSP29, UMOD, ATXN2L, LEO1, PROS1, EDDM3B, ENO3, DCBLD2, MMP9, KIF22, DENND2B, C1RL, PVALB, CXCL8, PPY, CCN1, KLK10, RRAS, SCN3B, BPIFB2, ITGAL, DDX1, MEGF11, NOP56, NTF4, HNMT, IL9, SCRIB, UXS1, MEP1A, ACTN2, NECAP2, CLEC10A, DDX53, SV2A, ATXN10, PI16, KCNH2, TNR, PDGFRB, SERPINA4, CDC27, MICALL2, CD28, BRK1, SLC16A1, DSCAM, PBXIP1, MATN3, SFTPA2, PTTG1, ASAH2, SCG2, PTGR1, GBA, PTPRZ1, ERN1, LECT2, SCGN, HLA-DRA, IL5RA, LRPAP1, CXCL13, NEXN, CD248, KYNU, ADAMTS15, WFIKKN2, CLEC14A, FZD10, PROC, LY9, LRP2, CX3CL1, RNASET2, CTSS, MCEMP1, COMP, SIGLEC6, CCL24, AOC1, PLXNB3, TMPRSS15, FCAR, SCIN, IFI30, KIRREL1, FXYD5, S100A16, LILRA5, CLSPN, AHNAK2, CTLA4, INSL5, WDR46, CST5, PHLDB2, TREML2, GUCA2A, PFDN2, PDIA4, LAMA1, SLAMF7, RGS8, IL6, PSG1, PZP, RRM2, GFRAL, AIF1L, LGMN, C1QTNF9, TSPAN1, DLL4, CRELD2, SCARF1, FGF9, JAM3, LPP, HSPB1, PPT1, PPIF, TRPV3, APOA4, LYSMD3, TGFA, ATP6V1D, LRRC38, CTAG1A, TINAGL1, POLR2A, EDIL3, LAP3, SORD, ARHGAP30, CSPG4, ART3, GADD45GIP1, SLURP1, LILRA2, GZMH, FKBP7, SLC27A4, CALCB, GIT1, CTSO, PCBD1, CSF3R, EIF1AX, CSPG5, CD93, ADAMTSL5, ISM2, CPE, WFDC1, VWC2, SPINK5, BTN1A1, DPT, FCN1, AIF1, GPC1, FAP, CLNS1A, CFC1, FASLG, NCS1, PRKAR1A, RCOR1, SLITRK2, SPARCL1, HSPB6, TNFRSF12A, IL6, SERPIND1, CEBPB, CASC3, AMPD3, YTHDF3, AAMDC, STX7, AGRP, ICA1, CHCHD6, IGSF21, VSTM1, PCDH7, VNN2, GP6, ITGAV, CD40LG, GIP, MB, TPD52L2, HPSE, GRIN2B, TREML1, C3, TNFRSF17, IL6, CD226, PALM, FKBP14, RBPMS2, CLEC6A, DAAM1, FAM3D, WASF1, HS1BP3, NOS3, POF1B, PLXNA4, MITD1, ERMAP, SYAP1, LRRC59, CNTN2, RAB2B, PENK, MCAM, EIF2S2, EGF, PTPN6, NID2, EHD3, IGFBP6, LMOD1, PAGR1, CD300C, SKAP2, PRKG1, SYTL4, GYS1, CASP3, PILRA, CD69, CCN5, PCBP2, LMOD1, PDIA5, PCSK7, SCARA5, METAP1D, ADGRB3, MPIG6B, NUMB, L3HYPDH, DENR, AGRN, COX6B1, JAM2, TIA1, CACYBP, SEMA6C, VAT1, SUSD1, RSPO3, TWF2, BOLA1, OXCT1, ITGA6, BST2, F2R, PILRB, RTBDN, ENOX2, DOK1, VASH1, DTD1, DDHD2, TBC1D23, GLRX5, CDNF, SIRPB1, NMT1, STK11, RPL14, PSTPIP2, FHIT, CLMP, LMOD1, ERP29, BECN1, CD38, YAP1, CAB, CRKL, PPP1R9B, FLU, CMC1, CDC37, ARHGAP45, PDAP1, NUDC, CLEC1B, USO1, SNAP23, HGS, FUS, PIK3AP1, FUR, TBC1D17, ITPA, IL1B, ENO1, THTPA, SAFB2, JPT2, GIMAP7, NIT2, RILPL2, PRTFDC1, TADA3, TOMM20, HPCAL1, LONP1, CALCOCO1, ATRAID, TYMP, TNFRSF19, DNPEP, NRGN, STK4, SSNA1, CRYGD, LZTFL1, SNAP29, PDLIM5, CASP2, MANF, BACH1, DAPP1, AKR1B1, EREG, DAG1, HSBP1, DUT, AKT2, PLA2G4A, TXLNA, PIKFYVE, FYB1, CSDE1, RHOC, HNRNPK, DCTD, SCRG1, LACTB2, RGCC, GIMAP8, GRHPR, SNX5, NCK2, EIF4G1, BNIP3L, ACOT13, MECR, MAP2K6, SEC31A, MGLL, MESD, NUDT16, SULT1A1, GOPC, VTA1, PDLIM7, ANXA2, GGACT, PMVK, USP8, SNCA, CAMSAP1, HEXIM1, SHMT1, LGALS8, APPL2, MAP2K1, EHBP1, MAP4K5,

[00242] In particular embodiments, a panel of the prediction model (such as the panel of the prediction model shown in any of FIG. 3) includes one or more, two or more, three or more, four or more, or each protein biomarker selected from VWA5A, ENPP6, TMEM25, ALDH2, and LEO1.

[00243] In particular embodiments, a panel of the prediction model (such as the panel of the prediction model shown in any of FIG. 3) includes one or more, two or more, three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, ten or more, eleven or more, twelve or more, thirteen or more, fourteen or more, or each protein biomarker selected from GAMT, TPSG1, ANK2, SCT, TSPAN7, GPC5, PGLYRP1, PAK4, TNFSF14, CLEC6A, TMPRSS15, PMCH, KRT14, SFTPA1, and LRFN2.

[00244] In particular embodiments, a panel of the prediction model (such as the panel of the prediction model shown in any of FIG. 3) includes one or more, two or more, three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, ten or more, eleven or more, twelve or more, thirteen or more, fourteen or more, fifteen or more, sixteen or more, seventeen or more, eighteen or more, nineteen or more, twenty or more, twenty one or more, twenty two or more, twenty three or more, twenty four or more, twenty five or more, twenty six or more, twenty seven or more, twenty eight or more, twenty nine or more, or each protein biomarker selected from MMP12, TNPO1, GAST, CD3D, TK1, DLGAP5, SCGN, CCL24, PSG1, CLU, CFB, LBP, CRYM, LAIR2, TCN2, SV2A, CRHBP, C5, SCGB3A2, ANXA10, GCG, RPGR, PAPPA, FZD8, CSPG5, BRK1, OXT, FDX1, ENPEP, and LRG1.

[00245] In particular embodiments, a panel of the prediction model (such as the panel of the prediction model shown in any of FIG. 3) includes one or more, two or more, three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, ten or more, eleven or more, twelve or more, thirteen or more, fourteen or more, fifteen or more, sixteen or more, seventeen or more, eighteen or more, nineteen or more, twenty or more, twenty one or more, twenty two or more, twenty three or more, twenty four or more, twenty five or more, twenty six or more, twenty seven or more, twenty eight or more, twenty nine or more, thirty or more, thirty one or more, thirty two or more, thirty three or more, thirty four or more, thirty five or more, thirty six or more, thirty seven or more, thirty eight or more, thirty nine or more, forty or more, forty one or more, forty two or more, forty three or more, forty four or more, forty five or more, forty six or more, forty seven or more, forty eight or more, forty nine or more, or each protein biomarker selected from PRAME, KIRREL1, KIF22, SPINT1, FGA, C1QTNF9, KIR2DS4, MMP9, NEXN, FCN1, MFGE8, ZNRD2, PDGFRB, HS6ST1, DUSP3, CABP2, DNM3, FGL1, TOP1, CDCP1, RAB10, THSD1, FASLG, MCEMP1, COL4A4, ENO1, BRD1, GP5, ZP3, SERPIND1, NCAM1, ATXN10, MUC16, GABRA4, POSTN, MAEA, SHH, DDX53, PRKG1, PAEP, RICTOR, IL6, FKBP14, CCL26, AIDA, GIP, TGFA, ITIH4, PCSK7, and RARRES1.

[00246] In particular embodiments, a panel of the prediction model (such as the panel of the prediction model shown in any of FIG. 3) includes one or more protein markers selected from SLC27A4, IL6, DKKL1, MFAP3, STX7, SSBP1, AKR7L, UGDH, IGHMBP2, GBP4, RBPMS, ST6GAL1, LILRA5, LILRA2, SOWAHA, ACADSB, CAMLG, CRTAC1, SUSD1, IL6, KLK10, GRSF1, MFAP4, NMT1, CNTN3, IL36A, EHD3, MAPT, AGBL2, ERN1, POMC, PDIA4, LGMN, EPHA10, PCBP2, PTGR1, GIT1, TREML1, GALNT2, TDGF1, INSR, OSCAR, MMP10, MRPL24, EIF1AX, AHNAK2, TP53, GBA, LRRC38, CLEC12A, TPT1, PPP1CC, BPIFB1, CFC1, SIGLEC9, CALY, OSM, ADAMTS1, OSMR, TYMP, GPR37, CLEC7A, SMAD5, SFTPA2, CTSS, HNMT, BATF, CCL19, SHC1, CST7, S100A12, ASAH2, PPIB, LYPD3, APOL1, AFM, SSC4D, FGF7, TDRKH, SCG2, ENPP2, PRKAR1A, FAM3D, GADD45GIP1, SEMA4D, PPP1R14A, EGF, NTF4, SERPING1, COX6B1, NECAP2, TFF1, IDI2, TJP3, CA14, PZP, PLIN1, ERBB4, TBC1D23, CRISP3, IFI30, ITIH1, C9, LAP3, PDIA5, ENDOU, FLT3LG, VNN2, MILR1, SDC1, CEACAM18, FHIP2A, CEACAM5, F11, WFIKKN2, USO1, CD40LG, GSTT2B, DUSP29, ATXN2L, IL6, RRM2, FGF23, ARHGAP30, SERPINA3, CXCL13, MMP8, NUDC, ENOPH1, NEK7, MAN1A2, ASAH1, STX5, IZUMO1, SERPINC1, IL9, PVALB, GZMH, FGF16, TFF2, WASF1, TMEM106A, GP2, PLXNA4, GNE, LGALS8, AOC1, FLRT2, CHCHD6, RNF43, TPD52L2, CSDE1, GPD1, PLA2G4A, LRIG1, NGF, RAB27B, VAT1, NUDT16, TRAF3IP2, MARCO, UMOD, PIK3AP1, MEGF11, NEDD4L, PKD2, CEBPB, RILPL2, IL3, RGCC, SARG, SMAD2, CTSH, KLKB1, ERP44, SULT2A1, SORD, IFNAR1, KLK11, TOMM20, C3, ADRA2A, NCK2, KIRREL2, CACNB3, SKAP2, CEACAM6, DNAJC21, PROS1, NRCAM, NPY, FYB1, RAB2B, MANF, MECR, LPA, DAAM1, DCTD, FXYD5, CRELD1, PLEKHO1, TINAGL1, ZBTB16, PROK1, MAP2K1, DAPP1, DSG4, PPP1R9B, RILP, EIF4G1, SESTD1, KIFBP, HGS, CD14, ANKMY2, WNT9A, CA13, GP1BB, CLIP2, BANK1, WDR46, HSPB1, CSF2, SNCA, RRAS, PRTFDC1, RBPMS2, LARP1, KAZN, CLSPN, RHOC, PPT1, DPEP2, METAP1D, STK11, CFH, PDE5A, MRC1, BIN2, IL17A, PXDNL, GP6, EPO, MAP3K5, MCEE, DDHD2, PHLDB2, NECTIN1, CCDC50, GKN1, MPIG6B, CBLIF, SYTL4, SSH3, PDZD2, SULT1A1, DLG4, HPCAL1, ICA1, GDF15, CD160, APPL2, GRN, IL17RA, CDC42BPB, C4BPB, DAG1, CMIP, KYNU, NUMB, PPY, PPIF, CFI, DTD1, LDLRAP1, FGF9, STXBP1, CMC1, GOPC, SMTN, PTPN6, L3HYPDH, PDAP1, LPP, THTPA, XG, AGRP, RAB11FIP3, FUR, BCR, LONP1, BNIP3L, SELP, GYS1, MGLL, PDLIM5, MESD, DNPEP, SRC, PMVK, ITPRIP, CD69, CALCOCO1, PAFAH2, GIPC3, SNAP23, STAT5B, RSPO3, AKT1S1, SNAP29, CASP2, AKT2, NELL1, MCTS1, TIA1, SCRG1, CIRBP, SEMA3F, SOX2, NRGN, PSTPIP2, ISM2, EHBP1, VTA1, and DUE

[00247] In various embodiments, the panel of biomarkers includes one or more proteins identified in Table 13 under the column “Gene Name”. In various embodiments, the panel of biomarkers includes one or more proteins identified in Table 13 under the column “Gene Name” and differentially expressed in the 1-5Y cohort (identified as “1-5Y only” or “Both” under the column “Cohort”). In various embodiments, the panel of biomarkers includes two or more, five or more, ten or more, twenty or more, thirty or more, forty or more, fifty or more, one hundred or more, two hundred or more, or each of the proteins identified in Table 13 under the column “Gene Name” and differentially expressed in the 1-5Y cohort (identified as “1-5Y only” or “Both” under the column “Cohort”).
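The cohort-based selection from Table 13 can be sketched in Python as a filter on the “Cohort” column. The rows below are placeholders and are not entries from Table 13; the function name and the row representation (one dictionary per table row) are assumptions made for illustration.

```python
def select_by_cohort(rows, cohort_label):
    """Return gene names marked as '<cohort_label> only' or 'Both'."""
    accepted = {f"{cohort_label} only", "Both"}
    return [row["Gene Name"] for row in rows if row["Cohort"] in accepted]


# Placeholder rows; these are NOT entries from Table 13.
rows = [
    {"Gene Name": "GENE_A", "Cohort": "1-5Y only"},
    {"Gene Name": "GENE_B", "Cohort": "1-3Y only"},
    {"Gene Name": "GENE_C", "Cohort": "Both"},
]
selected_1_5y = select_by_cohort(rows, "1-5Y")
print(selected_1_5y)
```

Calling the same function with "1-3Y" would yield the selection for the 1-3Y cohort.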

[00248] In particular embodiments, a panel of the prediction model (such as the panel of the prediction model shown in any of FIG. 3) includes one or more, two or more, three or more, four or more, or each protein biomarker selected from TSPAN1, CD28, SCN3B, ADGRB3, and IGFBP6.
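The “one or more, two or more, ... or each” language amounts to a minimum-count condition over a listed set of biomarkers. A minimal Python sketch of that check follows; the function name and the example panel are hypothetical, and only the five listed symbols come from the text.

```python
# The five biomarkers listed in paragraph [00248].
CORE_MARKERS = frozenset({"TSPAN1", "CD28", "SCN3B", "ADGRB3", "IGFBP6"})


def includes_at_least(panel, minimum, listed=CORE_MARKERS):
    """True if `panel` contains at least `minimum` of the listed biomarkers."""
    return len(set(panel) & listed) >= minimum


# Hypothetical panel composition, for illustration only.
panel = ["TSPAN1", "CD28", "NRTN"]
has_two = includes_at_least(panel, 2)
has_three = includes_at_least(panel, 3)
print(has_two, has_three)
```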

[00249] In particular embodiments, a panel of the prediction model (such as the panel of the prediction model shown in any of FIG. 3) includes one or more, two or more, three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, ten or more, eleven or more, twelve or more, thirteen or more, fourteen or more, or each protein biomarker selected from NRTN, AIF1L, HSPB6, MB, TNFRSF19, IL5RA, TNR, CDNF, CST1, FGFBP2, S100A16, CD248, GFRA3, LMOD1, and POF1B.

[00250] In particular embodiments, a panel of the prediction model (such as the panel of the prediction model shown in any of FIG. 3) includes one or more, two or more, three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, ten or more, eleven or more, twelve or more, thirteen or more, fourteen or more, fifteen or more, sixteen or more, seventeen or more, eighteen or more, nineteen or more, twenty or more, twenty one or more, twenty two or more, twenty three or more, twenty four or more, twenty five or more, twenty six or more, twenty seven or more, twenty eight or more, twenty nine or more, or each protein biomarker selected from DENND2B, COMP, CNTN2, SCARA5, CSPG4, ITGAV, SOST, SERPINA4, LILRA4, SPINK5, PINLYP, ACTN2, JAM2, FAP, TMOD4, GUCA2A, MFAP3L, DKK4, LAMA1, BAG3, SNCG, SEPTIN3, VWC2, KLRC1, ATRAID, ART3, SLITRK2, SIGLEC6, TMED4, and SLAMF7.

[00251] In particular embodiments, a panel of the prediction model (such as the panel of the prediction model shown in any of FIG. 3) includes one or more, two or more, three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, ten or more, eleven or more, twelve or more, thirteen or more, fourteen or more, fifteen or more, sixteen or more, seventeen or more, eighteen or more, nineteen or more, twenty or more, twenty one or more, twenty two or more, twenty three or more, twenty four or more, twenty five or more, twenty six or more, twenty seven or more, twenty eight or more, twenty nine or more, thirty or more, thirty one or more, thirty two or more, thirty three or more, thirty four or more, thirty five or more, thirty six or more, thirty seven or more, thirty eight or more, thirty nine or more, forty or more, forty one or more, forty two or more, forty three or more, forty four or more, forty five or more, forty six or more, forty seven or more, forty eight or more, forty nine or more, or each protein biomarker selected from CKMT1A, SEMA6C, CD2, CST5, PBXIP1, LECT2, PYY, AGRN, INSL5, CD38, PI16, CCN5, TNFRSF17, LY9, GPC1, CLMP, MEP1B, CCN1, PCDH7, SPARCL1, CRNN, PM20D1, TNFRSF12A, DSCAM, PALM, CX3CL1, MEP1A, SLURP1, APOA4, ADAMTSL5, MEPE, WFDC1, RPS10, CD300C, RIPK4, CALCB, RTBDN, ENO3, NTF3, PTPRZ1, LRP2BP, CPE, MCAM, BGN, PLB1, YAP1, TGFBI, CYB5A, EDDM3B, and SELENOP.

[00252] In particular embodiments, a panel of the prediction model (such as the panel of the prediction model shown in any of FIG. 3) includes one or more protein markers selected from ENPP6, TMEM25, GIP, CSPG5, SCGN, TMPRSS15, LAIR2, KIRREL1, NTF4, TSPAN7, ENDOU, KLK10, CCL24, GPR37, CD3D, TJP3, DKKL1, CFC1, LRRC38, GCG, AGBL2, FASLG, AHNAK2, WFIKKN2, ANXA10, HS6ST1, DUSP29, CA14, CLEC7A, PHLDB2, SCRG1, RSPO3, TOP1, TINAGL1, NCAM1, FAM3D, FLT3LG, ZP3, AGRP, ASAH2, PDGFRB, AFM, NPY, PPY, XG, MFGE8, PROS1, MEGF11, CTSO, CTLA4, CSF3R, FCAR, CTAG1A, SCPEP1, PRSS53, CRELD2, PILRA, PROC, VASH1, NOS3, BPIFB2, UPK3BL1, NOP56, JAM3, HLA-DRA, SIL1, TRPV3, EDEM2, POLR2A, CBLN1, FKBP7, CCL20, PILRB, SIRPB1, VSTM1, BST2, DLL4, C1RL, RNASET2, KCNH2, IL12RB2, FZD10, OXCT1, TREML2, GRIN2B, GFRAL, RGS8, LRPAP1, LRP2, IGSF21, DPT, HEPACAM2, MATN3, UXS1, PTTG1, BTN1A1, IL17C, SCIN, TK1, FKBP14, VWA5A, PRKG1, SV2A, PMCH, NEXN, CDCP1, DDX53, THSD1, PAK4, MMP12, FCN1, UMOD, PDIA4, IL6, BRK1, LILRA2, RBPMS2, SERPIND1, TPSG1, CEACAM5, FGF9, PPIF, RNF43, SIGLEC9, TOMM20, PDE5A, NELL1, GBA, PAEP, ERN1, PCSK7, CHCHD6, MARCO, SFTPA1, IL9, KYNU, SPINT1, LRFN2, NECTIN1, OSCAR, PZP, BPIFB1, LILRA5, CALY, RRAS, GADD45GIP1, ISM2, SCGB3A2, CEACAM6, LPP, GKN1, LRIG1, CLSPN, CXCL13, SFTPA2, COX6B1, PTGR1, RBPMS, PPT1, AOC1, PDLIM5, L3HYPDH, LONP1, APOL1, CEACAM18, FGF7, and KRT14.

[00253] In various embodiments, the panel of biomarkers includes one or more proteins identified in Table 13 under the column “Gene Name”. In various embodiments, the panel of biomarkers includes one or more proteins identified in Table 13 under the column “Gene Name” and differentially expressed in the 1-3Y cohort (identified as “1-3Y only” or “Both” under the column “Cohort”). In various embodiments, the panel of biomarkers includes two or more, five or more, ten or more... two hundred or more proteins identified in Table 13 under the column “Gene Name” and differentially expressed in the 1-3Y cohort (identified as “1-3Y only” or “Both” under the column “Cohort”).

[00254] In particular embodiments, a panel of the prediction model (such as the panel of the prediction model shown in any of FIG. 3) includes one or more, two or more, three or more, four or more, or each protein biomarker selected from GAST, ENPP2, FZD8, FGF23, and TFF1.

[00255] In particular embodiments, a panel of the prediction model (such as the panel of the prediction model shown in any of FIG. 3) includes one or more, two or more, three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, ten or more, eleven or more, twelve or more, thirteen or more, fourteen or more, or each protein biomarker selected from MAPT, FGF16, OXT, BRD1, MFAP4, WNT9A, FLRT2, CRTAC1, PAPPA, POMC, NGF, IDI2, TPT1, EPHA10, and MFAP3.

[00256] In particular embodiments, a panel of the prediction model (such as the panel of the prediction model shown in any of FIG. 3) includes one or more, two or more, three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, ten or more, eleven or more, twelve or more, thirteen or more, fourteen or more, fifteen or more, sixteen or more, seventeen or more, eighteen or more, nineteen or more, twenty or more, twenty one or more, twenty two or more, twenty three or more, twenty four or more, twenty five or more, twenty six or more, twenty seven or more, twenty eight or more, twenty nine or more, or each protein biomarker selected from SOWAHA, RARRES1, DUSP3, SEMA3F, CNTN3, LPA, KLK11, RPGR, EPO, TDGF1, IL17A, CD160, TNPO1, GAMT, ENPP6, TMEM25, GIP, CSPG5, SCGN, TMPRSS15, LAIR2, KIRREL1, NTF4, TSPAN7, ENDOU, KLK10, CCL24, GPR37, CD3D, and TJP3.

[00257] In particular embodiments, a panel of the prediction model (such as the panel of the prediction model shown in any of FIG. 3) includes one or more, two or more, three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, ten or more, eleven or more, twelve or more, thirteen or more, fourteen or more, fifteen or more, sixteen or more, seventeen or more, eighteen or more, nineteen or more, twenty or more, twenty one or more, twenty two or more, twenty three or more, twenty four or more, twenty five or more, twenty six or more, twenty seven or more, twenty eight or more, twenty nine or more, thirty or more, thirty one or more, thirty two or more, thirty three or more, thirty four or more, thirty five or more, thirty six or more, thirty seven or more, thirty eight or more, thirty nine or more, forty or more, forty one or more, forty two or more, forty three or more, forty four or more, forty five or more, forty six or more, forty seven or more, forty eight or more, forty nine or more, or each protein biomarker selected from DKKL1, CFC1, LRRC38, GCG, AGBL2, FASLG, AHNAK2, WFIKKN2, ANXA10, HS6ST1, DUSP29, CA14, CLEC7A, PHLDB2, SCRG1, RSPO3, TOP1, TINAGL1, NCAM1, FAM3D, FLT3LG, ZP3, AGRP, ASAH2, PDGFRB, AFM, NPY, PPY, XG, MFGE8, PROS1, MEGF11, SCT, CFB, F11, ANK2, ENOPH1, UGDH, ASAH1, ERBB4, IL36A, FGA, C5, OSMR, SSBP1, RICTOR, LRG1, C4BPB, AIDA, and SSC4D.

[00258] In particular embodiments, a panel of the prediction model (such as the panel of the prediction model shown in any of FIG. 3) includes one or more protein markers selected from GRN, IFNAR1, ENPEP, ACADSB, MAN1A2, GBP4, SERPING1, COL4A4, SOX2, GRSF1, PRAME, KIR2DS4, ADAMTS1, ITPRIP, CRISP3, DSG4, ITIH4, MRC1, GABRA4, SERPINA3, MILR1, PLIN1, SHH, KLKB1, IL17RA, MMP10, LBP, SMAD5, ADRA2A, SESTD1, CFI, AKR7L, CTSH, LYPD3, CBLIF, SMTN, CFH, SERPINC1, GDF15, PDZD2, ALDH2, IZUMO1, DNM3, CCL19, CSF2, MCEE, FDX1, SDC1, POSTN, GP2, CST7, CD14, NEK7, SHC1, CRELD1, TCN2, CMIP, CRHBP, C9, PXDNL, NRCAM, DLG4, TRAF3IP2, SULT2A1, GSTT2B, ITIH1, MRPL24, MUC16, IL3, CLU, FHIP2A, TK1, FKBP14, VWA5A, PRKG1, SV2A, PMCH, NEXN, CDCP1, DDX53, THSD1, PAK4, MMP12, FCN1, UMOD, PDIA4, IL6, BRK1, LILRA2, RBPMS2, SERPIND1, TPSG1, CEACAM5, FGF9, PPIF, RNF43, SIGLEC9, TOMM20, PDE5A, NELL1, GBA, PAEP, ERN1, PCSK7, CHCHD6, MARCO, SFTPA1, IL9, KYNU, SPINT1, LRFN2, NECTIN1, OSCAR, PZP, BPIFB1, LILRA5, CALY, RRAS, GADD45GIP1, ISM2, SCGB3A2, CEACAM6, LPP, GKN1, LRIG1, CLSPN, CXCL13, SFTPA2, COX6B1, PTGR1, RBPMS, PPT1, AOC1, PDLIM5, L3HYPDH, LONP1, APOL1, CEACAM18, FGF7, and KRT14.

V. Assays

[00259] As shown in FIG. 1A, the system environment 100 involves implementing a marker quantification assay 120 for evaluating quantitative values of one or more biomarkers. Examples of an assay (e.g., marker quantification assay 120) for one or more markers include DNA assays, microarrays, polymerase chain reaction (PCR), RT-PCR, Southern blots, Northern blots, antibody-binding assays, enzyme-linked immunosorbent assays (ELISAs), flow cytometry, protein assays, Western blots, nephelometry, turbidimetry, chromatography, mass spectrometry, immunoassays, including, by way of example, but not limitation, RIA, immunofluorescence, immunochemiluminescence, immunoelectrochemiluminescence, or competitive immunoassays, immunoprecipitation, and the assays described in the Examples section below. The information from the assay can be quantitative and sent to a computer system of the invention. The information can also be qualitative, such as observing patterns or fluorescence, which can be translated into a quantitative measure by a user or automatically by a reader or computer system.

[00260] Various immunoassays designed to quantitate markers can be used in screening including multiplex assays. Measuring the concentration of a target marker in a sample or fraction thereof can be accomplished by a variety of specific assays. For example, a conventional sandwich type assay can be used in an array, ELISA, RIA, etc. format. Other immunoassays include Ouchterlony plates that provide a simple determination of antibody binding. Additionally, Western blots can be performed on protein gels or protein spots on filters, using a detection system specific for the markers as desired, conveniently using a labeling method.

[00261] Protein based analysis, using an antibody that specifically binds to a polypeptide (e.g. marker), can be used to quantify the marker level in a test sample obtained from a subject. In various embodiments, an antibody that binds to a marker can be a monoclonal antibody. In various embodiments, an antibody that binds to a marker can be a polyclonal antibody. For multiplex analysis of markers, arrays containing one or more marker affinity reagents, e.g. antibodies can be generated. Such an array can be constructed comprising antibodies against markers. Detection can utilize one or a panel of marker affinity reagents, e.g. a panel or cocktail of affinity reagents specific for one, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty, twenty one, or more markers.

[00262] In various embodiments, the multiplex assay involves the use of oligonucleotide labeled antibody probes that bind to target biomarkers and allow for subsequent quantification of biomarkers. One example of a multiplex assay that involves oligonucleotide labeled antibody probes is the Proximity Extension Assay (PEA) technology (Olink® Proteomics). Briefly, a pair of oligonucleotide labeled antibodies bind to a biomarker, wherein the two oligonucleotide sequences are complementary to one another. Thus, only when both antibodies bind to the target biomarker will the oligonucleotide sequences hybridize with one another. Mismatched oligonucleotide sequences (which occur due to non-specific binding of antibodies or cross-reactivity of antibodies) will not hybridize and therefore will not result in a readout. Hybridized oligonucleotide sequences undergo nucleic acid extension and amplification, followed by quantification using microfluidic qPCR. The quantified levels correlate to the quantitative expression values of the respective biomarkers.
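
The qPCR readout described above can be converted into a relative level in several ways. The following is a minimal sketch of one generic convention (a delta-Ct against a reference reaction, assuming ideal amplification efficiency). It is offered as an assumption only: the disclosure does not specify a normalization, and the function names and cycle-threshold (Ct) values are hypothetical.

```python
def relative_log2_level(ct_target, ct_reference):
    """Log2 relative abundance of a target versus a reference reaction.

    Assumption: each cycle by which the target crosses the threshold earlier
    than the reference corresponds to one doubling of starting template.
    """
    return ct_reference - ct_target


def relative_fold(ct_target, ct_reference):
    """Linear-scale fold difference implied by the log2 relative level."""
    return 2.0 ** relative_log2_level(ct_target, ct_reference)


# Illustrative Ct values only, not measured data.
print(relative_log2_level(20.0, 23.0))  # 3.0
print(relative_fold(20.0, 23.0))        # 8.0
```

A lower Ct for the target than for the reference yields a positive log2 level, which matches the statement that the quantified levels track the abundance of the respective biomarker.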

[00263] In various embodiments, the multiplex assay involves the use of bead conjugated antibodies (e.g., capture antibodies) that enable the binding and detection of biomarkers. One example of a multiplex assay involving bead conjugated antibodies is Luminex’s xMAP® Technology. Here, bead conjugated antibodies are added to the sample along with biotinylated detection antibodies. Both antibodies are specific to the biomarkers of interest and therefore form an antibody-antigen sandwich. Streptavidin is further added, which binds to the biotinylated detection antibodies and enables detection of the complex. The Luminex 200™ or FlexMap® analyzer is employed to identify and quantify the amount of the biomarker in the sample. In various embodiments, the multiplex assay represents an improvement over Luminex’s xMAP® technology, such as the Multi-Analyte Profile (MAP) technology by Myriad Rules Based Medicine (RBM), Inc.

[00264] The information from the assay can be quantitative and sent to a computer system of the invention. The information can also be qualitative, such as observing patterns or fluorescence, which can be translated into a quantitative measure by a user or automatically by a reader or computer system.

[00265] In various embodiments, prior to implementation of a marker quantification assay 120, a sample obtained from a subject can be processed. In various embodiments, processing the sample enables the implementation of the marker quantification assay 120 to more accurately evaluate quantitative values of one or more biomarkers in the sample.

[00266] In various embodiments, the sample from a subject can be processed to extract biomarkers from the sample. In one embodiment, the sample can undergo phase separation to separate the biomarkers from other portions of the sample. For example, the sample can undergo centrifugation (e.g., pelleting or density gradient centrifugation) to separate larger and/or more dense entities in the sample (e.g., cells and other macromolecules) from the biomarkers. Other examples include filtration (e.g., ultrafiltration) to phase separate the biomarkers from other portions of the sample.

[00267] In various embodiments, the sample from a subject can be processed to produce a subsample with a fraction of biomarkers that were in the sample. In various embodiments, producing a fraction of biomarkers can involve performing a fractionation procedure. One example of a fractionation procedure is chromatography (e.g., gel filtration, ion exchange, hydrophobic chromatography, liquid chromatography or affinity chromatography). In particular embodiments, the protein fractionation procedure involves affinity purification or immunoprecipitation where biomarkers are bound by specific antibodies. Such antibodies can be immobilized on a support, such as a magnetic particle or nanoparticle or a plate.

VI. Therapeutic Agents and Compositions for Therapeutic Agents

[00268] In various embodiments, a therapeutic agent can be provided to a subject subsequent to obtaining the sample from the subject and determining quantitative values of one or more markers in the obtained sample. As one example, a prediction model that analyzes predictors including quantitative values of one or more markers predicts that an individual is likely to develop cancer within a time period. In various embodiments, the prediction model may generate a prediction that is informative for selecting a therapeutic agent to be provided to the subject, the therapeutic agent likely to delay or prevent the onset of the cancer within the time period. For example, if the prediction model predicts that the subject has a presence of cancer, the prediction from the prediction model can be used to select a therapeutic agent for treating the currently present cancer. As another example, if the prediction model predicts that the subject is likely to develop cancer within a future timeframe, the prediction from the prediction model can be used to select a therapeutic agent that can be administered prophylactically (e.g., to prevent or to slow the onset of the future development of the cancer).

[00269] In various embodiments the therapeutic agent is a biologic, e.g. a cytokine, antibody, soluble cytokine receptor, anti-sense oligonucleotide, siRNA, RNA/DNA based vaccine, immune cell based therapies (e.g., adoptive cell therapy), and the like. Such biologic agents encompass muteins and derivatives of the biological agent, which derivatives can include, for example, fusion proteins, PEGylated derivatives, cholesterol conjugated derivatives, and the like as known in the art. Also included are antagonists of cytokines and cytokine receptors, e.g. traps and monoclonal antagonists. Also included are biosimilar or bioequivalent drugs to the active agents set forth herein. In various embodiments, the therapeutic agent can be radiotherapy or a surgical intervention.

[00270] Therapeutic agents for lung cancer can include chemotherapeutics such as docetaxel, doxorubicin hydrochloride, methotrexate, cisplatin, carboplatin, gemcitabine, Nab-paclitaxel, paclitaxel, pemetrexed, gefitinib, erlotinib, brigatinib (Alunbrig®), capmatinib (Tabrecta®), selpercatinib (Retevmo®), entrectinib (Rozlytrek®), lorlatinib (Lorbrena®), larotrectinib (Vitrakvi®), dacomitinib (Vizimpro®), everolimus (Afinitor®), vinorelbine, pralsetinib (Gavreto®), dabrafenib (Tafinlar®), trametinib (Mekinist®), crizotinib (Xalkori®), alectinib (Alecensa®), ceritinib (Zykadia®), osimertinib (Tagrisso®), afatinib (Gilotrif®), and nintedanib (Vargatef®). Therapeutic agents for lung cancer can include antibody therapies such as durvalumab (Imfinzi®), nivolumab (Opdivo®), pembrolizumab (Keytruda®), atezolizumab (Tecentriq®), ramucirumab, bevacizumab (Avastin®, Mvasi®, Zirabev®), necitumumab (Portrazza®), and ipilimumab (Yervoy®).

[00271] A pharmaceutical composition administered to an individual includes an active agent such as the therapeutic agent described above. The active ingredient is present in a therapeutically effective amount, i.e., an amount sufficient when administered to treat a disease or medical condition mediated thereby. The compositions can also include various other agents to enhance delivery and efficacy, e.g. to enhance delivery and stability of the active ingredients. Thus, for example, the compositions can also include, depending on the formulation desired, pharmaceutically-acceptable, non-toxic carriers or diluents, which are defined as vehicles commonly used to formulate pharmaceutical compositions for animal or human administration. The diluent is selected so as not to affect the biological activity of the combination. Examples of such diluents are distilled water, buffered water, physiological saline, PBS, Ringer’s solution, dextrose solution, and Hank’s solution. In addition, the pharmaceutical composition or formulation can include other carriers, adjuvants, or non-toxic, nontherapeutic, nonimmunogenic stabilizers, excipients and the like. The compositions can also include additional substances to approximate physiological conditions, such as pH adjusting and buffering agents, toxicity adjusting agents, wetting agents and detergents. The composition can also include any of a variety of stabilizing agents, such as an antioxidant.

[00272] The pharmaceutical compositions described herein can be administered in a variety of different ways. Examples include administering a composition containing a pharmaceutically acceptable carrier via oral, intranasal, rectal, topical, intraperitoneal, intravenous, intramuscular, subcutaneous, subdermal, transdermal, intrathecal, or intracranial method.

[00273] Such a pharmaceutical composition may be administered for treatment (e.g., after diagnosis of a patient with lung cancer) purposes. Preventing, prophylaxis or prevention of a disease or disorder as used in the context of this invention refers to the administration of a composition to prevent the occurrence, onset, progression, or recurrence of lung cancer or of some or all of the symptoms of lung cancer, or to lessen the likelihood of the onset of lung cancer. Treating, treatment, or therapy of lung cancer shall mean slowing, stopping or reversing the cancer’s progression by administration of treatment according to the present invention. In the preferred embodiment, treating lung cancer means reversing the cancer’s progression, ideally to the point of eliminating the cancer itself.

VII. Cancers

[00274] Methods described herein involve diagnosing a cancer in a subject. In various embodiments, the cancer in the subject can include one or more of: lymphoma, B cell lymphoma, T cell lymphoma, mycosis fungoides, Hodgkin's Disease, myeloid leukemia, bladder cancer, brain cancer, nervous system cancer, head and neck cancer, squamous cell carcinoma of head and neck, kidney cancer, lung cancer, neuroblastoma/glioblastoma, ovarian cancer, pancreatic cancer, prostate cancer, skin cancer, liver cancer, melanoma, squamous cell carcinomas of the mouth, throat, larynx, and lung, colon cancer, cervical cancer, cervical carcinoma, breast cancer, and epithelial cancer, renal cancer, genitourinary cancer, pulmonary cancer, esophageal carcinoma, head and neck carcinoma, large bowel cancer, hematopoietic cancer, testicular cancer, colon and/or rectal cancer, prostatic cancer, or pancreatic cancer.

[00275] In various embodiments, the cancer in the subject can be a particular subtype of a lung cancer. Example lung cancer subtypes include, but are not limited to: small cell lung cancer, non-small cell lung cancer, adenocarcinoma, squamous cell cancer, large cell carcinoma, small cell carcinoma, combined small cell carcinoma, lung sarcoma, lung lymphoma, bronchial carcinoids, and a stage of lung cancer (e.g., stage 1, stage 2, stage 3, or stage 4).

[00276] In various embodiments, the methods disclosed herein involve predicting a future risk of cancer, such as lung cancer, in a subject. In various embodiments, the methods disclosed herein involve predicting a future risk of a subtype of lung cancer, such as one of adenocarcinoma, squamous cell cancer, or large cell carcinoma.

[00277]

VIII. Computer Implementation

[00278] The methods of the invention, including the methods of predicting risk of cancer in an individual, are, in some embodiments, performed on one or more computers.

[00279] For example, the building and deployment of a prediction model and database storage can be implemented in hardware or software, or a combination of both. In one embodiment of the invention, a machine-readable storage medium is provided, the medium comprising a data storage material encoded with machine readable data which, when using a machine programmed with instructions for using said data, is capable of displaying any of the datasets and execution and results of a prediction model. Such data can be used for a variety of purposes, such as patient monitoring, treatment considerations, and the like. The invention can be implemented in computer programs executing on programmable computers, comprising a processor, a data storage system (including volatile and non-volatile memory and/or storage elements), a graphics adapter, a pointing device, a network adapter, at least one input device, and at least one output device. A display is coupled to the graphics adapter. Program code is applied to input data to perform the functions described above and generate output information. The output information is applied to one or more output devices, in known fashion. The computer can be, for example, a personal computer, microcomputer, or workstation of conventional design.

[00280] Each program can be implemented in a high level procedural or object oriented programming language to communicate with a computer system. However, the programs can be implemented in assembly or machine language, if desired. In any case, the language can be a compiled or interpreted language. Each such computer program is preferably stored on a storage medium or device (e.g., ROM or magnetic diskette) readable by a general or special purpose programmable computer, for configuring and operating the computer when the storage medium or device is read by the computer to perform the procedures described herein. The system can also be considered to be implemented as a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner to perform the functions described herein.

[00281] The signature patterns and databases thereof can be provided in a variety of media to facilitate their use. “Media” refers to a manufacture that contains the signature pattern information of the present invention. The databases of the present invention can be recorded on computer readable media, e.g. any medium that can be read and accessed directly by a computer. Such media include, but are not limited to: magnetic storage media, such as floppy discs, hard disc storage medium, and magnetic tape; optical storage media such as CD-ROM; electrical storage media such as RAM and ROM; and hybrids of these categories such as magnetic/optical storage media. One of skill in the art can readily appreciate how any of the presently known computer readable mediums can be used to create a manufacture comprising a recording of the present database information. "Recorded" refers to a process for storing information on computer readable medium, using any such methods as known in the art. Any convenient data storage structure can be chosen, based on the means used to access the stored information. A variety of data processor programs and formats can be used for storage, e.g. word processing text file, database format, etc.

[00282] In some embodiments, the methods of the invention, including the methods of predicting risk of cancer in an individual, are performed on one or more computers in a distributed computing system environment (e.g., in a cloud computing environment). In this description, “cloud computing” is defined as a model for enabling on-demand network access to a shared set of configurable computing resources. Cloud computing can be employed to offer on-demand access to the shared set of configurable computing resources. The shared set of configurable computing resources can be rapidly provisioned via virtualization and released with low management effort or service provider interaction, and then scaled accordingly. A cloud-computing model can be composed of various characteristics such as, for example, on-demand self-service, broad network access, resource pooling, rapid elasticity, measured service, and so forth. A cloud-computing model can also expose various service models, such as, for example, Software as a Service (“SaaS”), Platform as a Service (“PaaS”), and Infrastructure as a Service (“IaaS”). A cloud-computing model can also be deployed using different deployment models such as private cloud, community cloud, public cloud, hybrid cloud, and so forth. In this description and in the claims, a “cloud-computing environment” is an environment in which cloud computing is employed.

VIII.A. Example Computer

[00283] FIG. 4 illustrates an example computer for implementing the entities shown in FIG. 1A, 1B, 2, and 3. The computer 400 includes at least one processor 402 coupled to a chipset 404. The chipset 404 includes a memory controller hub 420 and an input/output (I/O) controller hub 422. A memory 406 and a graphics adapter 412 are coupled to the memory controller hub 420, and a display 418 is coupled to the graphics adapter 412. A storage device 408, an input interface 414, and network adapter 416 are coupled to the I/O controller hub 422. Other embodiments of the computer 400 have different architectures.

[00284] The storage device 408 is a non-transitory computer-readable storage medium such as a hard drive, compact disk read-only memory (CD-ROM), DVD, or a solid-state memory device. The memory 406 holds instructions and data used by the processor 402. The input interface 414 is a touch-screen interface, a mouse, track ball, or other type of pointing device, a keyboard 410, or some combination thereof, and is used to input data into the computer 400. In some embodiments, the computer 400 may be configured to receive input (e.g., commands) from the input interface 414 via gestures from the user. The graphics adapter 412 displays images and other information on the display 418. The network adapter 416 couples the computer 400 to one or more computer networks.

[00285] The computer 400 is adapted to execute computer program modules for providing functionality described herein. As used herein, the term “module” refers to computer program logic used to provide the specified functionality. Thus, a module can be implemented in hardware, firmware, and/or software. In one embodiment, program modules are stored on the storage device 408, loaded into the memory 406, and executed by the processor 402.

[00286] The types of computers 400 used by the entities of FIG. 1A, 1B, and 2 can vary depending upon the embodiment and the processing power required by the entity. For example, the cancer prediction system 130 can run in a single computer 400 or multiple computers 400 communicating with each other through a network such as in a server farm. The computers 400 can lack some of the components described above, such as graphics adapters 412, and displays 418.

IX. Kit Implementation

[00287] Also disclosed herein are kits for predicting risk of a cancer in an individual. Such kits can include reagents for detecting quantitative values of one or more biomarkers and instructions for predicting risk of cancer based on at least the detected quantitative values of the biomarkers.

[00288] The detection reagents can be provided as part of a kit. Thus, the invention further provides kits for detecting the presence of a panel of biomarkers of interest in a biological test sample. A kit can comprise one or more sets of reagents for generating a dataset via at least one detection assay that analyzes the test sample from the subject. In various embodiments, the set of reagents enables detection of quantitative values of protein biomarkers, such as any of the protein biomarkers described herein and in particular, any of the protein biomarkers identified in Tables 1-3.

[00289] A kit can include instructions for use of one or more sets of reagents. For example, a kit can include instructions for performing at least one marker quantification assay, examples of which are described herein. In various embodiments, the kits include instructions for practicing the methods disclosed herein (e.g., methods for training or deploying a prediction model to predict risk of cancer). These instructions can be present in the subject kits in a variety of forms, one or more of which can be present in the kit. One form in which these instructions can be present is as printed information on a suitable medium or substrate, e.g., a piece or pieces of paper on which the information is printed, in the packaging of the kit, in a package insert, etc. Yet another means would be a computer readable medium, e.g., diskette, CD, hard-drive, network data storage, etc., on which the information has been recorded. Yet another means that can be present is a website address which can be used via the internet to access the information at a removed site. Any convenient means can be present in the kits.

X. Systems

[00290] Further disclosed herein are systems for predicting risk of cancer in a subject. In various embodiments, such a system can include one or more sets of reagents for detecting quantitative values of biomarkers in one or more panels of a prediction model, an apparatus configured to receive a mixture of the one or more sets of reagents and a test sample obtained from a subject to measure the quantitative values of the biomarkers, and a computer system communicatively coupled to the apparatus to obtain the measured quantitative values and to implement the prediction model to predict risk of cancer in a subject.

[00291] The one or more sets of reagents enable the detection of quantitative levels of the biomarkers in the biomarker panel. In various embodiments, the one or more sets of reagents involve reagents used to perform one or more assays for measuring levels of protein biomarkers. For example, the reagents include one or more antibodies that bind to one or more of the biomarkers. The antibodies may be monoclonal antibodies or polyclonal antibodies. As another example, the reagents can include reagents for performing ELISA including buffers and detection agents.

[00292] The apparatus is configured to detect quantitative levels of biomarkers in a mixture of a reagent and test sample. As an example, the apparatus can determine quantitative levels of biomarkers through a protein detection assay (e.g., a protein detection assay that uses one of NMR spectroscopy or LC-MS).

[00293] The mixture of the reagent and test sample may be presented to the apparatus through various conduits, examples of which include wells of a well plate (e.g., 96 well plate), a vial, a tube, and integrated fluidic circuits. As such, the apparatus may have an opening (e.g., a slot, a cavity, an opening, a sliding tray) that can receive the container including the reagent test sample mixture and perform a reading to generate quantitative values of biomarkers. Examples of an apparatus include a plate reader (e.g., a luminescent plate reader, absorbance plate reader, fluorescence plate reader), a spectrometer, and a spectrophotometer. Further examples of an apparatus include an NMR spectroscopy system or a LC-MS system.

[00294] The computer system, such as example computer 400 described in FIG. 4, communicates with the apparatus to receive the quantitative values of biomarkers. The computer system implements, in silico, a prediction model to analyze the quantitative values of the biomarkers and predict risk of cancer for the subject.
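
As a minimal sketch of the in silico step described above, the following assumes the prediction model is a logistic model applied to quantitative values received from the apparatus. The intercept, the coefficients, the function name `predict_risk`, and the input values are hypothetical illustrations and are not taken from the disclosure; the gene symbols are borrowed from panels named elsewhere in this document.

```python
import math

# Hypothetical parameters for illustration only; the disclosure gives no such values.
INTERCEPT = -1.0
COEFFICIENTS = {"TGFA": 0.8, "MMP12": 0.4, "MASP1": -0.5}


def predict_risk(levels):
    """Return a risk score in (0, 1) from a mapping of gene symbol to level."""
    missing = [gene for gene in COEFFICIENTS if gene not in levels]
    if missing:
        raise KeyError(f"missing biomarker levels: {missing}")
    # Linear predictor: intercept plus weighted sum of biomarker levels.
    z = INTERCEPT + sum(w * levels[gene] for gene, w in COEFFICIENTS.items())
    # Logistic link maps the linear predictor to a probability-like score.
    return 1.0 / (1.0 + math.exp(-z))


# Illustrative input only, not measured data.
baseline = {"TGFA": 0.0, "MMP12": 0.0, "MASP1": 0.0}
print(round(predict_risk(baseline), 4))  # 0.2689
```

Raising an error on a missing biomarker, rather than silently imputing a value, is a design choice made here for clarity; a deployed system might handle missing values differently.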

ADDITIONAL EMBODIMENTS

[00295] Disclosed herein are methods for predicting risk of cancer in a subject, the method comprising: obtaining or having obtained a dataset derived from the subject comprising quantitative levels of a plurality of biomarkers, wherein the plurality of biomarkers comprises protein biomarkers comprising two or more of TGFA, MMP12, TNFRSF13B, TNFSF14, and MASP1, and generating a prediction of risk of cancer for the subject by applying a predictive model to the quantitative values of the plurality of biomarkers.

[00296] In various embodiments, the protein biomarkers comprise three or more of TGFA, MMP12, TNFRSF13B, TNFSF14, and MASP1.

[00297] In various embodiments, the protein biomarkers comprise four or more of TGFA, MMP12, TNFRSF13B, TNFSF14, and MASP1.

[00298] In various embodiments, the protein biomarkers comprise each of TGFA, MMP12, TNFRSF13B, TNFSF14, and MASP1.
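
The "two or more", "three or more", "four or more", and "each of" conditions above can be checked against a subject's dataset with a simple count. The following is a minimal sketch, assuming the dataset is a mapping from gene symbol to a quantitative level; the helper names and the sample values are hypothetical and are not part of the disclosure.

```python
# Core panel named in paragraphs [00295] to [00298].
CORE_PANEL = ("TGFA", "MMP12", "TNFRSF13B", "TNFSF14", "MASP1")


def count_panel_hits(dataset, panel=CORE_PANEL):
    """Count panel biomarkers that have a non-missing level in the dataset."""
    return sum(1 for gene in panel if dataset.get(gene) is not None)


def satisfies_minimum(dataset, minimum, panel=CORE_PANEL):
    """True if at least `minimum` panel biomarkers are measured in the dataset."""
    return count_panel_hits(dataset, panel) >= minimum


# Illustrative values only (arbitrary units), not measured data.
example_dataset = {"TGFA": 3.2, "MMP12": 5.1, "MASP1": None, "IL6": 1.4}
print(count_panel_hits(example_dataset))      # 2
print(satisfies_minimum(example_dataset, 3))  # False
```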

[00299] In various embodiments, the protein biomarkers further comprise one or more of THBS2, GDNF, FLT1, FXYD5, CST5, ARNT, CDCP1, CCL20, FLT3LG, CLEC7A, PRKCQ, SCGN, IL5, NPY, and S100A16.

[00300] In various embodiments, the protein biomarkers further comprise five or more of THBS2, GDNF, FLT1, FXYD5, CST5, ARNT, CDCP1, CCL20, FLT3LG, CLEC7A, PRKCQ, SCGN, IL5, NPY, and S100A16.

[00301] In various embodiments, the protein biomarkers further comprise ten or more of THBS2, GDNF, FLT1, FXYD5, CST5, ARNT, CDCP1, CCL20, FLT3LG, CLEC7A, PRKCQ, SCGN, IL5, NPY, and S100A16.

[00302] In various embodiments, the protein biomarkers further comprise each of THBS2, GDNF, FLT1, FXYD5, CST5, ARNT, CDCP1, CCL20, FLT3LG, CLEC7A, PRKCQ, SCGN, IL5, NPY, and S100A16.

[00303] In various embodiments, the protein biomarkers further comprise one or more, five or more, or each of IL1B, CD84, STC1, PRDX3, LAP3, GAMT, CASP2, ITGA6, DECR1, and YTHDF3.

[00304] In various embodiments, the protein biomarkers further comprise one or more of IL1B, CD84, STC1, PRDX3, LAP3, GAMT, CASP2, ITGA6, DECR1, and YTHDF3.

[00305] In various embodiments, the protein biomarkers further comprise five or more of IL1B, CD84, STC1, PRDX3, LAP3, GAMT, CASP2, ITGA6, DECR1, and YTHDF3.

[00306] In various embodiments, the protein biomarkers further comprise each of IL1B, CD84, STC1, PRDX3, LAP3, GAMT, CASP2, ITGA6, DECR1, and YTHDF3.

[00307] In various embodiments, the predictive model comprises an elastic net regression model, and the predictive model achieves an area under a curve (AUC) value of at least 0.65. In various embodiments, the predictive model comprises a support vector machine, and the predictive model achieves an area under a curve (AUC) value of at least 0.70. In various embodiments, the predictive model comprises a random forest model, and the predictive model achieves an area under a curve (AUC) value of at least 0.67. In various embodiments, the predictive model comprises an XGBoost model, and the predictive model achieves an area under a curve (AUC) value of at least 0.68.
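
As a minimal sketch of how the AUC thresholds above could be checked, the following computes the area under the receiver operating characteristic curve as the fraction of case/control pairs in which the case receives the higher score (ties counted as one half). This assumes the AUC refers to the ROC curve, which the paragraph does not state explicitly. The function names, dictionary keys, labels, and scores are hypothetical; only the four threshold values come from the paragraph above.

```python
# Minimum AUC per model type, as stated in paragraph [00307].
MIN_AUC = {
    "elastic_net": 0.65,
    "svm": 0.70,
    "random_forest": 0.67,
    "xgboost": 0.68,
}


def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison of case (1) and control (0) scores."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def meets_threshold(model_type, labels, scores):
    """True if the model's AUC reaches the minimum for its model type."""
    return roc_auc(labels, scores) >= MIN_AUC[model_type]


# Illustrative labels and scores only, not study data.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.5, 0.3, 0.2]
print(round(roc_auc(labels, scores), 3))  # 0.889
```

The pairwise form is quadratic in the number of subjects, which is acceptable for a small illustration; a rank-based computation would be preferable for large cohorts.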

[00308] Additionally disclosed herein is a method for predicting risk of cancer in a subject, the method comprising: obtaining or having obtained a dataset derived from the subject comprising quantitative levels of a plurality of biomarkers, wherein the plurality of biomarkers comprises protein biomarkers comprising two or more of CEACAM5, TOP1, NCAM1, SCGB3A2, and CALY, and generating a prediction of risk of cancer for the subject by applying a predictive model to the quantitative values of the plurality of biomarkers.

[00309] In various embodiments, the protein biomarkers comprise three or more of CEACAM5, TOP1, NCAM1, SCGB3A2, and CALY.

[00310] In various embodiments, the protein biomarkers comprise four or more of CEACAM5, TOP1, NCAM1, SCGB3A2, and CALY.

[00311] In various embodiments, the protein biomarkers comprise each of CEACAM5, TOP1, NCAM1, SCGB3A2, and CALY.

[00312] In various embodiments, the protein biomarkers further comprise one or more of TGFBI, CABP2, ENPP6, KRT14, HEPACAM2, TMEM25, SGSH, MFAP3L, TNFSF14, CD3D, TMED4, ZP3, MMP12, GCG, and AFM.

[00313] In various embodiments, the protein biomarkers further comprise five or more of TGFBI, CABP2, ENPP6, KRT14, HEPACAM2, TMEM25, SGSH, MFAP3L, TNFSF14, CD3D, TMED4, ZP3, MMP12, GCG, and AFM.

[00314] In various embodiments, the protein biomarkers further comprise ten or more of TGFBI, CABP2, ENPP6, KRT14, HEPACAM2, TMEM25, SGSH, MFAP3L, TNFSF14, CD3D, TMED4, ZP3, MMP12, GCG, and AFM.

[00315] In various embodiments, the protein biomarkers further comprise each of TGFBI, CABP2, ENPP6, KRT14, HEPACAM2, TMEM25, SGSH, MFAP3L, TNFSF14, CD3D, TMED4, ZP3, MMP12, GCG, and AFM.
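As a non-limiting sketch, the tiered language used above ("one or more", "five or more", "ten or more", "each of") can be evaluated for a given dataset by counting how many markers of the fifteen-marker list are present. The function name `coverage_tiers` and the example set of measured markers are hypothetical.

```python
from typing import Dict, Iterable

# The fifteen-marker list recited in the preceding paragraphs.
SECONDARY_PANEL = frozenset({
    "TGFBI", "CABP2", "ENPP6", "KRT14", "HEPACAM2", "TMEM25", "SGSH",
    "MFAP3L", "TNFSF14", "CD3D", "TMED4", "ZP3", "MMP12", "GCG", "AFM",
})


def coverage_tiers(measured: Iterable[str]) -> Dict[str, bool]:
    """Report which tiers a set of measured markers satisfies."""
    n = len(SECONDARY_PANEL & set(measured))
    return {
        "one or more": n >= 1,
        "five or more": n >= 5,
        "ten or more": n >= 10,
        "each": n == len(SECONDARY_PANEL),
    }


# Hypothetical dataset: six listed markers plus one marker outside the list.
measured = ["TGFBI", "KRT14", "SGSH", "CD3D", "ZP3", "GCG", "IL6"]
tiers = coverage_tiers(measured)
```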

[00316] In various embodiments, the protein biomarkers further comprise one or more of SPINT1, LILRA4, FLT3LG, AGBL2, PAEP, SCGB3A1, LRFN2, TJP3, FGF7, LRIG1, CA14, CEACAM18, CST1, ANXA10, CDCP1, GPC5, OSCAR, CEACAM6, CD2, SNCG, GPR37, SEPTIN3, RAB10, DKK4, DKKL1, SOST, CSF3, VWA5A, TSPAN7, and PAK4.

[00317] In various embodiments, the protein biomarkers further comprise five or more of SPINT1, LILRA4, FLT3LG, AGBL2, PAEP, SCGB3A1, LRFN2, TJP3, FGF7, LRIG1, CA14, CEACAM18, CST1, ANXA10, CDCP1, GPC5, OSCAR, CEACAM6, CD2, SNCG, GPR37, SEPTIN3, RAB10, DKK4, DKKL1, SOST, CSF3, VWA5A, TSPAN7, and PAK4.

[00318] In various embodiments, the protein biomarkers further comprise ten or more of SPINT1, LILRA4, FLT3LG, AGBL2, PAEP, SCGB3A1, LRFN2, TJP3, FGF7, LRIG1, CA14, CEACAM18, CST1, ANXA10, CDCP1, GPC5, OSCAR, CEACAM6, CD2, SNCG, GPR37, SEPTIN3, RAB10, DKK4, DKKL1, SOST, CSF3, VWA5A, TSPAN7, and PAK4.

[00319] In various embodiments, the protein biomarkers further comprise twenty or more of SPINT1, LILRA4, FLT3LG, AGBL2, PAEP, SCGB3A1, LRFN2, TJP3, FGF7, LRIG1, CA14, CEACAM18, CST1, ANXA10, CDCP1, GPC5, OSCAR, CEACAM6, CD2, SNCG, GPR37, SEPTIN3, RAB10, DKK4, DKKL1, SOST, CSF3, VWA5A, TSPAN7, and PAK4.

[00320] In various embodiments, the protein biomarkers further comprise each of SPINT1, LILRA4, FLT3LG, AGBL2, PAEP, SCGB3A1, LRFN2, TJP3, FGF7, LRIG1, CA14, CEACAM18, CST1, ANXA10, CDCP1, GPC5, OSCAR, CEACAM6, CD2, SNCG, GPR37, SEPTIN3, RAB10, DKK4, DKKL1, SOST, CSF3, VWA5A, TSPAN7, and PAK4.

[00321] In various embodiments, the protein biomarkers further comprise one or more of BPIFB1, SIGLEC9, ZNRD2, PM20D1, TK1, RPS10, PMCH, RNF43, MEP1B, BGN, NELL1, CD101, LRP2BP, PRSS53, MFGE8, THSD1, CKMT1A, MEPE, APOL1, RBPMS, MARCO, KLRC1, FGFBP2, TPSG1, SELENOP, CLEC7A, UPK3BL1, HS6ST1, ENDOU, IL12RB2, CYB5A, GKN1, NRTN, CCL26, CRNN, PINLYP, LAIR2, BAG3, SCPEP1, RIPK4, CTSE, TMOD4, SFTPA1, SEMA4D, IL17C, GFRA3, DPEP2, EDEM2, CD84, and KIRREL2.

[00322] In various embodiments, the protein biomarkers further comprise five or more of BPIFB1, SIGLEC9, ZNRD2, PM20D1, TK1, RPS10, PMCH, RNF43, MEP1B, BGN, NELL1, CD101, LRP2BP, PRSS53, MFGE8, THSD1, CKMT1A, MEPE, APOL1, RBPMS, MARCO, KLRC1, FGFBP2, TPSG1, SELENOP, CLEC7A, UPK3BL1, HS6ST1, ENDOU, IL12RB2, CYB5A, GKN1, NRTN, CCL26, CRNN, PINLYP, LAIR2, BAG3, SCPEP1, RIPK4, CTSE, TMOD4, SFTPA1, SEMA4D, IL17C, GFRA3, DPEP2, EDEM2, CD84, and KIRREL2.

[00323] In various embodiments, the protein biomarkers further comprise ten or more of BPIFB1, SIGLEC9, ZNRD2, PM20D1, TK1, RPS10, PMCH, RNF43, MEP1B, BGN, NELL1, CD101, LRP2BP, PRSS53, MFGE8, THSD1, CKMT1A, MEPE, APOL1, RBPMS, MARCO, KLRC1, FGFBP2, TPSG1, SELENOP, CLEC7A, UPK3BL1, HS6ST1, ENDOU, IL12RB2, CYB5A, GKN1, NRTN, CCL26, CRNN, PINLYP, LAIR2, BAG3, SCPEP1, RIPK4, CTSE, TMOD4, SFTPA1, SEMA4D, IL17C, GFRA3, DPEP2, EDEM2, CD84, and KIRREL2.

[00324] In various embodiments, the protein biomarkers further comprise twenty or more of BPIFB1, SIGLEC9, ZNRD2, PM20D1, TK1, RPS10, PMCH, RNF43, MEP1B, BGN, NELL1, CD101, LRP2BP, PRSS53, MFGE8, THSD1, CKMT1A, MEPE, APOL1, RBPMS, MARCO, KLRC1, FGFBP2, TPSG1, SELENOP, CLEC7A, UPK3BL1, HS6ST1, ENDOU, IL12RB2, CYB5A, GKN1, NRTN, CCL26, CRNN, PINLYP, LAIR2, BAG3, SCPEP1, RIPK4, CTSE, TMOD4, SFTPA1, SEMA4D, IL17C, GFRA3, DPEP2, EDEM2, CD84, and KIRREL2.

[00325] In various embodiments, the protein biomarkers further comprise thirty or more of BPIFB1, SIGLEC9, ZNRD2, PM20D1, TK1, RPS10, PMCH, RNF43, MEP1B, BGN, NELL1, CD101, LRP2BP, PRSS53, MFGE8, THSD1, CKMT1A, MEPE, APOL1, RBPMS, MARCO, KLRC1, FGFBP2, TPSG1, SELENOP, CLEC7A, UPK3BL1, HS6ST1, ENDOU, IL12RB2, CYB5A, GKN1, NRTN, CCL26, CRNN, PINLYP, LAIR2, BAG3, SCPEP1, RIPK4, CTSE, TMOD4, SFTPA1, SEMA4D, IL17C, GFRA3, DPEP2, EDEM2, CD84, and KIRREL2.

[00326] In various embodiments, the protein biomarkers further comprise forty or more of BPIFB1, SIGLEC9, ZNRD2, PM20D1, TK1, RPS10, PMCH, RNF43, MEP1B, BGN, NELL1, CD101, LRP2BP, PRSS53, MFGE8, THSD1, CKMT1A, MEPE, APOL1, RBPMS, MARCO, KLRC1, FGFBP2, TPSG1, SELENOP, CLEC7A, UPK3BL1, HS6ST1, ENDOU, IL12RB2, CYB5A, GKN1, NRTN, CCL26, CRNN, PINLYP, LAIR2, BAG3, SCPEP1, RIPK4, CTSE, TMOD4, SFTPA1, SEMA4D, IL17C, GFRA3, DPEP2, EDEM2, CD84, and KIRREL2.

[00327] In various embodiments, the protein biomarkers further comprise each of BPIFB1, SIGLEC9, ZNRD2, PM20D1, TK1, RPS10, PMCH, RNF43, MEP1B, BGN, NELL1, CD101, LRP2BP, PRSS53, MFGE8, THSD1, CKMT1A, MEPE, APOL1, RBPMS, MARCO, KLRC1, FGFBP2, TPSG1, SELENOP, CLEC7A, UPK3BL1, HS6ST1, ENDOU, IL12RB2, CYB5A, GKN1, NRTN, CCL26, CRNN, PINLYP, LAIR2, BAG3, SCPEP1, RIPK4, CTSE, TMOD4, SFTPA1, SEMA4D, IL17C, GFRA3, DPEP2, EDEM2, CD84, and KIRREL2.

[00328] In various embodiments, the protein biomarkers further comprise one or more of NECTIN1, CBLN1, NTF3, PYY, XG, NPY, CCL20, SIL1, PLB1, DUSP29, UMOD, ATXN2L, LEO1, PROS1, EDDM3B, ENO3, DCBLD2, MMP9, KIF22, DENND2B, C1RL, PVALB, CXCL8, PPY, CCN1, KLK10, RRAS, SCN3B, BPIFB2, ITGAL, DDX1, MEGF11, NOP56, NTF4, HNMT, IL9, SCRIB, UXS1, MEP1A, ACTN2, NECAP2, CLEC10A, DDX53, SV2A, ATXN10, PI16, KCNH2, TNR, PDGFRB, SERPINA4, CDC27, MICALL2, CD28, BRK1, SLC16A1, DSCAM, PBXIP1, MATN3, SFTPA2, PTTG1, ASAH2, SCG2, PTGR1, GBA, PTPRZ1, ERN1, LECT2, SCGN, HLA-DRA, IL5RA, LRPAP1, CXCL13, NEXN, CD248, KYNU, ADAMTS15, WFIKKN2, CLEC14A, FZD10, PROC, LY9, LRP2, CX3CL1, RNASET2, CTSS, MCEMP1, COMP, SIGLEC6, CCL24, AOC1, PLXNB3, TMPRSS15, FCAR, SCIN, IFI30, KIRREL1, FXYD5, S100A16, LILRA5, CLSPN, AHNAK2, CTLA4, INSL5, WDR46, CST5, PHLDB2, TREML2, GUCA2A, PFDN2, PDIA4, LAMA1, SLAMF7, RGS8, IL6, PSG1, PZP, RRM2, GFRAL, AIF1L, LGMN, C1QTNF9, TSPAN1, DLL4, CRELD2, SCARF1, FGF9, JAM3, LPP, HSPB1, PPT1, PPIF, TRPV3, APOA4, LYSMD3, TGFA, ATP6V1D, LRRC38, CTAG1A, TINAGL1, POLR2A, EDIL3, LAP3, SORD, ARHGAP30, CSPG4, ART3, GADD45GIP1, SLURP1, LILRA2, GZMH, FKBP7, SLC27A4, CALCB, GIT1, CTSO, PCBD1, CSF3R, EIF1AX, CSPG5, CD93, ADAMTSL5, ISM2, CPE, WFDC1, VWC2, SPINK5, BTN1A1, DPT, FCN1, AIF1, GPC1, FAP, CLNS1A, CFC1, FASLG, NCS1, PRKAR1A, RCOR1, SLITRK2, SPARCL1, HSPB6, TNFRSF12A, IL6, SERPIND1, CEBPB, CASC3, AMPD3, YTHDF3, AAMDC, STX7, AGRP, ICA1, CHCHD6, IGSF21, VSTM1, PCDH7, VNN2, GP6, ITGAV, CD40LG, GIP, MB, TPD52L2, HPSE, GRIN2B, TREML1, C3, TNFRSF17, IL6, CD226, PALM, FKBP14, RBPMS2, CLEC6A, DAAM1, FAM3D, WASF1, HS1BP3, NOS3, POF1B, PLXNA4, MITD1, ERMAP, SYAP1, LRRC59, CNTN2, RAB2B, PENK, MCAM, EIF2S2, EGF, PTPN6, NID2, EHD3, IGFBP6, LMOD1, PAGR1, CD300C, SKAP2, PRKG1, SYTL4, GYS1, CASP3, PILRA, CD69, CCN5, PCBP2, LMOD1, PDIA5, PCSK7, SCARA5, METAP1D, ADGRB3, MPIG6B, NUMB, L3HYPDH, DENR, AGRN, COX6B1, JAM2, TIA1, CACYBP, SEMA6C, VAT1, SUSD1, RSPO3, TWF2, BOLA1, OXCT1, ITGA6, BST2, F2R, PILRB, RTBDN, ENOX2, DOK1, VASH1, DTD1, DDHD2, TBC1D23, GLRX5, CDNF, SIRPB1, NMT1, STK11, RPL14, PSTPIP2, FHIT, CLMP, LMOD1, ERP29, BECN1, CD38, YAP1, CAB, CRKL, PPP1R9B, FLU, CMC1, CDC37, ARHGAP45, PDAP1, NUDC, CLEC1B, USO1, SNAP23, HGS, FUS, PIK3AP1, FUR, TBC1D17, ITPA, IL1B, ENO1, THTPA, SAFB2, JPT2, GIMAP7, NIT2, RILPL2, PRTFDC1, TADA3, TOMM20, HPCAL1, LONP1, CALCOCO1, ATRAID, TYMP, TNFRSF19, DNPEP, NRGN, STK4, SSNA1, CRYGD, LZTFL1, SNAP29, PDLIM5, CASP2, MANF, BACH1, DAPP1, AKR1B1, EREG, DAG1, HSBP1, DUT, AKT2, PLA2G4A, TXLNA, PIKFYVE, FYB1, CSDE1, RHOC, HNRNPK, DCTD, SCRG1, LACTB2, RGCC, GIMAP8, GRHPR, SNX5, NCK2, EIF4G1, BNIP3L, ACOT13, MECR, MAP2K6, SEC31A, MGLL, MESD, NUDT16, SULT1A1, GOPC, VTA1, PDLIM7, ANXA2, GGACT, PMVK, USP8, SNCA, CAMSAP1, HEXIM1, SHMT1, LGALS8, APPL2, MAP2K1, EHBP1, MAP4K5, PDE5A, HARS1, SRC, TACC3, and RAB27B.

[00329] In various embodiments, the predictive model comprises an elastic net regression model, and the predictive model achieves an area under a curve (AUC) value of at least 0.85. In various embodiments, the predictive model comprises a support vector machine, and the predictive model achieves an area under a curve (AUC) value of at least 0.84. In various embodiments, the predictive model comprises a random forest model, and the predictive model achieves an area under a curve (AUC) value of at least 0.72. In various embodiments, the predictive model comprises an XGBoost model, and the predictive model achieves an area under a curve (AUC) value of at least 0.73.
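For reference, one common formulation of an elastic net model for a binary outcome is given below. The disclosure does not specify a parameterization; the symbols used here ($n$ subjects, biomarker vector $x_i$, label $y_i \in \{0, 1\}$, intercept $\beta_0$, coefficients $\beta$, mixing parameter $\alpha \in [0, 1]$, and regularization strength $\lambda \ge 0$) are assumptions introduced for illustration only.

```latex
\hat{\beta}_0, \hat{\beta}
  = \arg\min_{\beta_0,\, \beta}\;
    -\frac{1}{n} \sum_{i=1}^{n}
      \Bigl[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \Bigr]
    + \lambda \Bigl( \alpha \lVert \beta \rVert_1
      + \frac{1 - \alpha}{2} \lVert \beta \rVert_2^2 \Bigr),
\qquad
p_i = \frac{1}{1 + \exp\bigl( -(\beta_0 + x_i^{\top} \beta) \bigr)}
```

Setting $\alpha = 1$ recovers the lasso penalty and $\alpha = 0$ the ridge penalty; intermediate values can drive some biomarker coefficients to exactly zero while shrinking the remainder.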

[00330] Additionally disclosed herein is a method for predicting risk of cancer in a subject, the method comprising: obtaining or having obtained a dataset derived from the subject comprising quantitative levels of a plurality of biomarkers, wherein the plurality of biomarkers comprises protein biomarkers comprising two or more of VWA5A, ENPP6, TMEM25, ALDH2, and LEO1, and generating a prediction of risk of cancer for the subject by applying a predictive model to the quantitative values of the plurality of biomarkers.
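The method recited above can be sketched, in a non-limiting manner, as a check that the dataset contains at least two markers of the five-marker panel followed by application of a model to the measured levels. A logistic model is assumed here purely for illustration; the coefficients, the intercept, the function name `predict_risk`, and the sample values are hypothetical placeholders and are not taken from the disclosure.

```python
import math
from typing import Mapping

# Five-marker panel recited in the preceding paragraph.
CORE_PANEL = ("VWA5A", "ENPP6", "TMEM25", "ALDH2", "LEO1")

# Hypothetical coefficients and intercept (placeholders, not disclosed values).
WEIGHTS = {"VWA5A": 0.8, "ENPP6": -0.5, "TMEM25": -0.4, "ALDH2": 0.3, "LEO1": 0.6}
INTERCEPT = -1.0


def predict_risk(levels: Mapping[str, float], min_markers: int = 2) -> float:
    """Return a risk score in (0, 1) from quantitative biomarker levels.

    levels maps marker symbol to a quantitative level; markers outside the
    panel are ignored, and at least `min_markers` panel markers are required.
    """
    measured = [m for m in CORE_PANEL if m in levels]
    if len(measured) < min_markers:
        raise ValueError(
            f"need at least {min_markers} panel markers, got {len(measured)}"
        )
    # Linear predictor over the measured panel markers, then logistic link.
    z = INTERCEPT + sum(WEIGHTS[m] * levels[m] for m in measured)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical subject with three of the five panel markers measured.
sample = {"VWA5A": 1.2, "ENPP6": 0.4, "ALDH2": -0.3}
risk = predict_risk(sample)
```

Raising an error when fewer than two panel markers are present mirrors the "two or more of" language; in practice, unmeasured markers would more likely be imputed or the model refit on the available subset.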

[00331] In various embodiments, the protein biomarkers comprise three or more of VWA5A, ENPP6, TMEM25, ALDH2, and LEO1.

[00332] In various embodiments, the protein biomarkers comprise four or more of VWA5A, ENPP6, TMEM25, ALDH2, and LEO1.

[00333] In various embodiments, the protein biomarkers comprise each of VWA5A, ENPP6, TMEM25, ALDH2, and LEO1.

[00334] In various embodiments, the protein biomarkers further comprise one or more of GAMT, TPSG1, ANK2, SCT, TSPAN7, GPC5, PGLYRP1, PAK4, TNFSF14, CLEC6A, TMPRSS15, PMCH, KRT14, SFTPA1, and LRFN2.

[00335] In various embodiments, the protein biomarkers further comprise five or more of GAMT, TPSG1, ANK2, SCT, TSPAN7, GPC5, PGLYRP1, PAK4, TNFSF14, CLEC6A, TMPRSS15, PMCH, KRT14, SFTPA1, and LRFN2.

[00336] In various embodiments, the protein biomarkers further comprise ten or more of GAMT, TPSG1, ANK2, SCT, TSPAN7, GPC5, PGLYRP1, PAK4, TNFSF14, CLEC6A, TMPRSS15, PMCH, KRT14, SFTPA1, and LRFN2.

[00337] In various embodiments, the protein biomarkers further comprise each of GAMT, TPSG1, ANK2, SCT, TSPAN7, GPC5, PGLYRP1, PAK4, TNFSF14, CLEC6A, TMPRSS15, PMCH, KRT14, SFTPA1, and LRFN2.

[00338] In various embodiments, the protein biomarkers further comprise one or more of MMP12, TNPO1, GAST, CD3D, TK1, DLGAP5, SCGN, CCL24, PSG1, CLU, CFB, LBP, CRYM, LAIR2, TCN2, SV2A, CRHBP, C5, SCGB3A2, ANXA10, GCG, RPGR, PAPPA, FZD8, CSPG5, BRK1, OXT, FDX1, ENPEP, and LRG1.

[00339] In various embodiments, the protein biomarkers further comprise five or more of MMP12, TNPO1, GAST, CD3D, TK1, DLGAP5, SCGN, CCL24, PSG1, CLU, CFB, LBP, CRYM, LAIR2, TCN2, SV2A, CRHBP, C5, SCGB3A2, ANXA10, GCG, RPGR, PAPPA, FZD8, CSPG5, BRK1, OXT, FDX1, ENPEP, and LRG1.

[00340] In various embodiments, the protein biomarkers further comprise ten or more of MMP12, TNPO1, GAST, CD3D, TK1, DLGAP5, SCGN, CCL24, PSG1, CLU, CFB, LBP, CRYM, LAIR2, TCN2, SV2A, CRHBP, C5, SCGB3A2, ANXA10, GCG, RPGR, PAPPA, FZD8, CSPG5, BRK1, OXT, FDX1, ENPEP, and LRG1.

[00341] In various embodiments, the protein biomarkers further comprise twenty or more of MMP12, TNPO1, GAST, CD3D, TK1, DLGAP5, SCGN, CCL24, PSG1, CLU, CFB, LBP, CRYM, LAIR2, TCN2, SV2A, CRHBP, C5, SCGB3A2, ANXA10, GCG, RPGR, PAPPA, FZD8, CSPG5, BRK1, OXT, FDX1, ENPEP, and LRG1.

[00342] In various embodiments, the protein biomarkers further comprise each of MMP12, TNPO1, GAST, CD3D, TK1, DLGAP5, SCGN, CCL24, PSG1, CLU, CFB, LBP, CRYM, LAIR2, TCN2, SV2A, CRHBP, C5, SCGB3A2, ANXA10, GCG, RPGR, PAPPA, FZD8, CSPG5, BRK1, OXT, FDX1, ENPEP, and LRG1.

[00343] In various embodiments, the protein biomarkers further comprise one or more of PRAME, KIRREL1, KIF22, SPINT1, FGA, C1QTNF9, KIR2DS4, MMP9, NEXN, FCN1, MFGE8, ZNRD2, PDGFRB, HS6ST1, DUSP3, CABP2, DNM3, FGL1, TOP1, CDCP1, RAB10, THSD1, FASLG, MCEMP1, COL4A4, ENO1, BRD1, GP5, ZP3, SERPIND1, NCAM1, ATXN10, MUC16, GABRA4, POSTN, MAEA, SHH, DDX53, PRKG1, PAEP, RICTOR, IL6, FKBP14, CCL26, AIDA, GIP, TGFA, ITIH4, PCSK7, and RARRES1.

[00344] In various embodiments, the protein biomarkers further comprise five or more of PRAME, KIRREL1, KIF22, SPINT1, FGA, C1QTNF9, KIR2DS4, MMP9, NEXN, FCN1, MFGE8, ZNRD2, PDGFRB, HS6ST1, DUSP3, CABP2, DNM3, FGL1, TOP1, CDCP1, RAB10, THSD1, FASLG, MCEMP1, COL4A4, ENO1, BRD1, GP5, ZP3, SERPIND1, NCAM1, ATXN10, MUC16, GABRA4, POSTN, MAEA, SHH, DDX53, PRKG1, PAEP, RICTOR, IL6, FKBP14, CCL26, AIDA, GIP, TGFA, ITIH4, PCSK7, and RARRES1.

[00345] In various embodiments, the protein biomarkers further comprise ten or more of PRAME, KIRREL1, KIF22, SPINT1, FGA, C1QTNF9, KIR2DS4, MMP9, NEXN, FCN1, MFGE8, ZNRD2, PDGFRB, HS6ST1, DUSP3, CABP2, DNM3, FGL1, TOP1, CDCP1, RAB10, THSD1, FASLG, MCEMP1, COL4A4, ENO1, BRD1, GP5, ZP3, SERPIND1, NCAM1, ATXN10, MUC16, GABRA4, POSTN, MAEA, SHH, DDX53, PRKG1, PAEP, RICTOR, IL6, FKBP14, CCL26, AIDA, GIP, TGFA, ITIH4, PCSK7, and RARRES1.

[00346] In various embodiments, the protein biomarkers further comprise twenty or more of PRAME, KIRREL1, KIF22, SPINT1, FGA, C1QTNF9, KIR2DS4, MMP9, NEXN, FCN1, MFGE8, ZNRD2, PDGFRB, HS6ST1, DUSP3, CABP2, DNM3, FGL1, TOP1, CDCP1, RAB10, THSD1, FASLG, MCEMP1, COL4A4, ENO1, BRD1, GP5, ZP3, SERPIND1, NCAM1, ATXN10, MUC16, GABRA4, POSTN, MAEA, SHH, DDX53, PRKG1, PAEP, RICTOR, IL6, FKBP14, CCL26, AIDA, GIP, TGFA, ITIH4, PCSK7, and RARRES1.

[00347] In various embodiments, the protein biomarkers further comprise thirty or more of PRAME, KIRREL1, KIF22, SPINT1, FGA, C1QTNF9, KIR2DS4, MMP9, NEXN, FCN1, MFGE8, ZNRD2, PDGFRB, HS6ST1, DUSP3, CABP2, DNM3, FGL1, TOP1, CDCP1, RAB10, THSD1, FASLG, MCEMP1, COL4A4, ENO1, BRD1, GP5, ZP3, SERPIND1, NCAM1, ATXN10, MUC16, GABRA4, POSTN, MAEA, SHH, DDX53, PRKG1, PAEP, RICTOR, IL6, FKBP14, CCL26, AIDA, GIP, TGFA, ITIH4, PCSK7, and RARRES1.

[00348] In various embodiments, the protein biomarkers further comprise forty or more of PRAME, KIRREL1, KIF22, SPINT1, FGA, C1QTNF9, KIR2DS4, MMP9, NEXN, FCN1, MFGE8, ZNRD2, PDGFRB, HS6ST1, DUSP3, CABP2, DNM3, FGL1, TOP1, CDCP1, RAB10, THSD1, FASLG, MCEMP1, COL4A4, ENO1, BRD1, GP5, ZP3, SERPIND1, NCAM1, ATXN10, MUC16, GABRA4, POSTN, MAEA, SHH, DDX53, PRKG1, PAEP, RICTOR, IL6, FKBP14, CCL26, AIDA, GIP, TGFA, ITIH4, PCSK7, and RARRES1.

[00349] In various embodiments, the protein biomarkers further comprise each of PRAME, KIRREL1, KIF22, SPINT1, FGA, C1QTNF9, KIR2DS4, MMP9, NEXN, FCN1, MFGE8, ZNRD2, PDGFRB, HS6ST1, DUSP3, CABP2, DNM3, FGL1, TOP1, CDCP1, RAB10, THSD1, FASLG, MCEMP1, COL4A4, ENO1, BRD1, GP5, ZP3, SERPIND1, NCAM1, ATXN10, MUC16, GABRA4, POSTN, MAEA, SHH, DDX53, PRKG1, PAEP, RICTOR, IL6, FKBP14, CCL26, AIDA, GIP, TGFA, ITIH4, PCSK7, and RARRES1.

[00350] In various embodiments, the protein biomarkers further comprise one or more of SLC27A4, IL6, DKKL1, MFAP3, STX7, SSBP1, AKR7L, UGDH, IGHMBP2, GBP4, RBPMS, ST6GAL1, LILRA5, LILRA2, SOWAHA, ACADSB, CAMLG, CRTAC1, SUSD1, IL6, KLK10, GRSF1, MFAP4, NMT1, CNTN3, IL36A, EHD3, MAPT, AGBL2, ERN1, POMC, PDIA4, LGMN, EPHA10, PCBP2, PTGR1, GIT1, TREML1, GALNT2, TDGF1, INSR, OSCAR, MMP10, MRPL24, EIF1AX, AHNAK2, TP53, GBA, LRRC38, CLEC12A, TPT1, PPP1CC, BPIFB1, CFC1, SIGLEC9, CALY, OSM, ADAMTS1, OSMR, TYMP, GPR37, CLEC7A, SMAD5, SFTPA2, CTSS, HNMT, BATF, CCL19, SHC1, CST7, S100A12, ASAH2, PPIB, LYPD3, APOL1, AFM, SSC4D, FGF7, TDRKH, SCG2, ENPP2, PRKAR1A, FAM3D, GADD45GIP1, SEMA4D, PPP1R14A, EGF, NTF4, SERPING1, COX6B1, NECAP2, TFF1, IDI2, TJP3, CA14, PZP, PLIN1, ERBB4, TBC1D23, CRISP3, IFI30, ITIH1, C9, LAP3, PDIA5, ENDOU, FLT3LG, VNN2, MILR1, SDC1, CEACAM18, FHIP2A, CEACAM5, F11, WFIKKN2, USO1, CD40LG, GSTT2B, DUSP29, ATXN2L, IL6, RRM2, FGF23, ARHGAP30, SERPINA3, CXCL13, MMP8, NUDC, ENOPH1, NEK7, MAN1A2, ASAH1, STX5, IZUMO1, SERPINC1, IL9, PVALB, GZMH, FGF16, TFF2, WASF1, TMEM106A, GP2, PLXNA4, GNE, LGALS8, AOC1, FLRT2, CHCHD6, RNF43, TPD52L2, CSDE1, GPD1, PLA2G4A, LRIG1, NGF, RAB27B, VAT1, NUDT16, TRAF3IP2, MARCO, UMOD, PIK3AP1, MEGF11, NEDD4L, PKD2, CEBPB, RILPL2, IL3, RGCC, SARG, SMAD2, CTSH, KLKB1, ERP44, SULT2A1, SORD, IFNAR1, KLK11, TOMM20, C3, ADRA2A, NCK2, KIRREL2, CACNB3, SKAP2, CEACAM6, DNAJC21, PROS1, NRCAM, NPY, FYB1, RAB2B, MANF, MECR, LPA, DAAM1, DCTD, FXYD5, CRELD1, PLEKHO1, TINAGL1, ZBTB16, PROK1, MAP2K1, DAPP1, DSG4, PPP1R9B, RILP, EIF4G1, SESTD1, KIFBP, HGS, CD14, ANKMY2, WNT9A, CAB, GP1BB, CLIP2, BANK1, WDR46, HSPB1, CSF2, SNCA, RRAS, PRTFDC1, RBPMS2, LARP1, KAZN, CLSPN, RHOC, PPT1, DPEP2, METAP1D, STK11, CFH, PDE5A, MRC1, BIN2, IL17A, PXDNL, GP6, EPO, MAP3K5, MCEE, DDHD2, PHLDB2, NECTIN1, CCDC50, GKN1, MPIG6B, CBLIF, SYTL4, SSH3, PDZD2, SULT1A1, DLG4, HPCAL1, ICA1, GDF15, CD160, APPL2, GRN, IL17RA, CDC42BPB, C4BPB, DAG1, CMIP, KYNU, NUMB, PPY, PPIF, CFI, DTD1, LDLRAP1, FGF9, STXBP1, CMC1, GOPC, SMTN, PTPN6, L3HYPDH, PDAP1, LPP, THTPA, XG, AGRP, RAB11FIP3, FUR, BCR, LONP1, BNIP3L, SELP, GYS1, MGLL, PDLIM5, MESD, DNPEP, SRC, PMVK, ITPRIP, CD69, CALCOCO1, PAFAH2, GIPC3, SNAP23, STAT5B, RSPO3, AKT1S1, SNAP29, CASP2, AKT2, NELL1, MCTS1, TIA1, SCRG1, CIRBP, SEMA3F, SOX2, NRGN, PSTPIP2, ISM2, EHBP1, VTA1, and DUT.

[00351] In various embodiments, the predictive model comprises an elastic net regression model, and the predictive model achieves an area under a curve (AUC) value of at least 0.79. In various embodiments, the predictive model comprises a support vector machine, and the predictive model achieves an area under a curve (AUC) value of at least 0.81. In various embodiments, the predictive model comprises a random forest model, and the predictive model achieves an area under a curve (AUC) value of at least 0.71. In various embodiments, the predictive model comprises an XGBoost model, and the predictive model achieves an area under a curve (AUC) value of at least 0.70.

[00352] In various embodiments, the cancer is lung cancer. In various embodiments, the risk of cancer is a level of risk of the subject developing cancer within 1 year, within 2 years, within 3 years, within 4 years, within 5 years, within 6 years, within 7 years, within 8 years, within 9 years, or within 10 years. In various embodiments, the risk of cancer is a presence or absence of cancer. In various embodiments, the dataset is derived from a test sample obtained from the subject. In various embodiments, the test sample is a blood, serum or plasma sample. In various embodiments, obtaining or having obtained the dataset comprises performing one or more assays. In various embodiments, performing the one or more assays comprises performing an immunoassay to determine the expression levels of the plurality of biomarkers. In various embodiments, the immunoassay is a Proximity Extension Assay (PEA) or LUMINEX xMAP Multiplex Assay. In various embodiments, the dataset comprises plasma proteomics data. In various embodiments, methods disclosed herein further comprise: selecting a therapy for providing to the subject based on the prediction of cancer.

[00353] Additionally disclosed herein is a non-transitory computer readable medium comprising instructions that, when executed by a processor, cause the processor to: obtain or have obtained a dataset derived from the subject comprising quantitative levels of a plurality of biomarkers, wherein the plurality of biomarkers comprises protein biomarkers comprising two or more of TGFA, MMP12, TNFRSF13B, TNFSF14, and MASP1, and generate a prediction of risk of cancer for the subject by applying a predictive model to the quantitative values of the plurality of biomarkers.

[00354] In various embodiments, the protein biomarkers comprise three or more of TGFA, MMP12, TNFRSF13B, TNFSF14, and MASP1.

[00355] In various embodiments, the protein biomarkers comprise four or more of TGFA, MMP12, TNFRSF13B, TNFSF14, and MASP1.

[00356] In various embodiments, the protein biomarkers comprise each of TGFA, MMP12, TNFRSF13B, TNFSF14, and MASP1.

[00357] In various embodiments, the protein biomarkers further comprise one or more of THBS2, GDNF, FLT1, FXYD5, CST5, ARNT, CDCP1, CCL20, FLT3LG, CLEC7A, PRKCQ, SCGN, IL5, NPY, and S100A16.

[00358] In various embodiments, the protein biomarkers further comprise five or more of THBS2, GDNF, FLT1, FXYD5, CST5, ARNT, CDCP1, CCL20, FLT3LG, CLEC7A, PRKCQ, SCGN, IL5, NPY, and S100A16.

[00359] In various embodiments, the protein biomarkers further comprise ten or more of THBS2, GDNF, FLT1, FXYD5, CST5, ARNT, CDCP1, CCL20, FLT3LG, CLEC7A, PRKCQ, SCGN, IL5, NPY, and S100A16.

[00360] In various embodiments, the protein biomarkers further comprise each of THBS2, GDNF, FLT1, FXYD5, CST5, ARNT, CDCP1, CCL20, FLT3LG, CLEC7A, PRKCQ, SCGN, IL5, NPY, and S100A16.

[00361] In various embodiments, the protein biomarkers further comprise one or more, five or more, or each of IL1B, CD84, STC1, PRDX3, LAP3, GAMT, CASP2, ITGA6, DECR1, and YTHDF3.

[00362] In various embodiments, the protein biomarkers further comprise one or more of IL1B, CD84, STC1, PRDX3, LAP3, GAMT, CASP2, ITGA6, DECR1, and YTHDF3.

[00363] In various embodiments, the protein biomarkers further comprise five or more of IL1B, CD84, STC1, PRDX3, LAP3, GAMT, CASP2, ITGA6, DECR1, and YTHDF3.

[00364] In various embodiments, the protein biomarkers further comprise each of IL1B, CD84, STC1, PRDX3, LAP3, GAMT, CASP2, ITGA6, DECR1, and YTHDF3.

[00365] In various embodiments, the predictive model comprises an elastic net regression model, and the predictive model achieves an area under a curve (AUC) value of at least 0.65. In various embodiments, the predictive model comprises a support vector machine, and the predictive model achieves an area under a curve (AUC) value of at least 0.70. In various embodiments, the predictive model comprises a random forest model, and the predictive model achieves an area under a curve (AUC) value of at least 0.67. In various embodiments, the predictive model comprises an XGBoost model, and the predictive model achieves an area under a curve (AUC) value of at least 0.68.

[00366] Additionally disclosed herein is a non-transitory computer readable medium comprising instructions that, when executed by a processor, cause the processor to: obtain or have obtained a dataset derived from the subject comprising quantitative levels of a plurality of biomarkers, wherein the plurality of biomarkers comprises protein biomarkers comprising two or more of CEACAM5, TOP1, NCAM1, SCGB3A2, and CALY, and generate a prediction of risk of cancer for the subject by applying a predictive model to the quantitative values of the plurality of biomarkers.

[00367] In various embodiments, the protein biomarkers comprise three or more of CEACAM5, TOP1, NCAM1, SCGB3A2, and CALY.

[00368] In various embodiments, the protein biomarkers comprise four or more of CEACAM5, TOP1, NCAM1, SCGB3A2, and CALY.

[00369] In various embodiments, the protein biomarkers comprise each of CEACAM5, TOP1, NCAM1, SCGB3A2, and CALY.

[00370] In various embodiments, the protein biomarkers further comprise one or more of TGFBI, CABP2, ENPP6, KRT14, HEPACAM2, TMEM25, SGSH, MFAP3L, TNFSF14, CD3D, TMED4, ZP3, MMP12, GCG, and AFM.

[00371] In various embodiments, the protein biomarkers further comprise five or more of TGFBI, CABP2, ENPP6, KRT14, HEPACAM2, TMEM25, SGSH, MFAP3L, TNFSF14, CD3D, TMED4, ZP3, MMP12, GCG, and AFM.

[00372] In various embodiments, the protein biomarkers further comprise ten or more of TGFBI, CABP2, ENPP6, KRT14, HEPACAM2, TMEM25, SGSH, MFAP3L, TNFSF14, CD3D, TMED4, ZP3, MMP12, GCG, and AFM.

[00373] In various embodiments, the protein biomarkers further comprise each of TGFBI, CABP2, ENPP6, KRT14, HEPACAM2, TMEM25, SGSH, MFAP3L, TNFSF14, CD3D, TMED4, ZP3, MMP12, GCG, and AFM.

[00374] In various embodiments, the protein biomarkers further comprise one or more of SPINT1, LILRA4, FLT3LG, AGBL2, PAEP, SCGB3A1, LRFN2, TJP3, FGF7, LRIG1, CA14, CEACAM18, CST1, ANXA10, CDCP1, GPC5, OSCAR, CEACAM6, CD2, SNCG, GPR37, SEPTIN3, RAB10, DKK4, DKKL1, SOST, CSF3, VWA5A, TSPAN7, and PAK4.

[00375] In various embodiments, the protein biomarkers further comprise five or more of SPINT1, LILRA4, FLT3LG, AGBL2, PAEP, SCGB3A1, LRFN2, TJP3, FGF7, LRIG1, CA14, CEACAM18, CST1, ANXA10, CDCP1, GPC5, OSCAR, CEACAM6, CD2, SNCG, GPR37, SEPTIN3, RAB10, DKK4, DKKL1, SOST, CSF3, VWA5A, TSPAN7, and PAK4.

[00376] In various embodiments, the protein biomarkers further comprise ten or more of SPINT1, LILRA4, FLT3LG, AGBL2, PAEP, SCGB3A1, LRFN2, TJP3, FGF7, LRIG1, CA14, CEACAM18, CST1, ANXA10, CDCP1, GPC5, OSCAR, CEACAM6, CD2, SNCG, GPR37, SEPTIN3, RAB10, DKK4, DKKL1, SOST, CSF3, VWA5A, TSPAN7, and PAK4.

[00377] In various embodiments, the protein biomarkers further comprise twenty or more of SPINT1, LILRA4, FLT3LG, AGBL2, PAEP, SCGB3A1, LRFN2, TJP3, FGF7, LRIG1, CA14, CEACAM18, CST1, ANXA10, CDCP1, GPC5, OSCAR, CEACAM6, CD2, SNCG, GPR37, SEPTIN3, RAB10, DKK4, DKKL1, SOST, CSF3, VWA5A, TSPAN7, and PAK4.

[00378] In various embodiments, the protein biomarkers further comprise each of SPINT1, LILRA4, FLT3LG, AGBL2, PAEP, SCGB3A1, LRFN2, TJP3, FGF7, LRIG1, CA14, CEACAM18, CST1, ANXA10, CDCP1, GPC5, OSCAR, CEACAM6, CD2, SNCG, GPR37, SEPTIN3, RAB10, DKK4, DKKL1, SOST, CSF3, VWA5A, TSPAN7, and PAK4.

[00379] In various embodiments, the protein biomarkers further comprise one or more of BPIFB1, SIGLEC9, ZNRD2, PM20D1, TK1, RPS10, PMCH, RNF43, MEP1B, BGN, NELL1, CD101, LRP2BP, PRSS53, MFGE8, THSD1, CKMT1A, MEPE, APOL1, RBPMS, MARCO, KLRC1, FGFBP2, TPSG1, SELENOP, CLEC7A, UPK3BL1, HS6ST1, ENDOU, IL12RB2, CYB5A, GKN1, NRTN, CCL26, CRNN, PINLYP, LAIR2, BAG3, SCPEP1, RIPK4, CTSE, TMOD4, SFTPA1, SEMA4D, IL17C, GFRA3, DPEP2, EDEM2, CD84, and KIRREL2.

[00380] In various embodiments, the protein biomarkers further comprise five or more of BPIFB1, SIGLEC9, ZNRD2, PM20D1, TK1, RPS10, PMCH, RNF43, MEP1B, BGN, NELL1, CD101, LRP2BP, PRSS53, MFGE8, THSD1, CKMT1A, MEPE, APOL1, RBPMS, MARCO, KLRC1, FGFBP2, TPSG1, SELENOP, CLEC7A, UPK3BL1, HS6ST1, ENDOU, IL12RB2, CYB5A, GKN1, NRTN, CCL26, CRNN, PINLYP, LAIR2, BAG3, SCPEP1, RIPK4, CTSE, TMOD4, SFTPA1, SEMA4D, IL17C, GFRA3, DPEP2, EDEM2, CD84, and KIRREL2.

[00381] In various embodiments, the protein biomarkers further comprise ten or more of BPIFB1, SIGLEC9, ZNRD2, PM20D1, TK1, RPS10, PMCH, RNF43, MEP1B, BGN, NELL1, CD101, LRP2BP, PRSS53, MFGE8, THSD1, CKMT1A, MEPE, APOL1, RBPMS, MARCO, KLRC1, FGFBP2, TPSG1, SELENOP, CLEC7A, UPK3BL1, HS6ST1, ENDOU, IL12RB2, CYB5A, GKN1, NRTN, CCL26, CRNN, PINLYP, LAIR2, BAG3, SCPEP1, RIPK4, CTSE, TMOD4, SFTPA1, SEMA4D, IL17C, GFRA3, DPEP2, EDEM2, CD84, and KIRREL2.

[00382] In various embodiments, the protein biomarkers further comprise twenty or more of BPIFB1, SIGLEC9, ZNRD2, PM20D1, TK1, RPS10, PMCH, RNF43, MEP1B, BGN, NELL1, CD101, LRP2BP, PRSS53, MFGE8, THSD1, CKMT1A, MEPE, APOL1, RBPMS, MARCO, KLRC1, FGFBP2, TPSG1, SELENOP, CLEC7A, UPK3BL1, HS6ST1, ENDOU, IL12RB2, CYB5A, GKN1, NRTN, CCL26, CRNN, PINLYP, LAIR2, BAG3, SCPEP1, RIPK4, CTSE, TMOD4, SFTPA1, SEMA4D, IL17C, GFRA3, DPEP2, EDEM2, CD84, and KIRREL2.

[00383] In various embodiments, the protein biomarkers further comprise thirty or more of BPIFB1, SIGLEC9, ZNRD2, PM20D1, TK1, RPS10, PMCH, RNF43, MEP1B, BGN, NELL1, CD101, LRP2BP, PRSS53, MFGE8, THSD1, CKMT1A, MEPE, APOL1, RBPMS, MARCO, KLRC1, FGFBP2, TPSG1, SELENOP, CLEC7A, UPK3BL1, HS6ST1, ENDOU, IL12RB2, CYB5A, GKN1, NRTN, CCL26, CRNN, PINLYP, LAIR2, BAG3, SCPEP1, RIPK4, CTSE, TMOD4, SFTPA1, SEMA4D, IL17C, GFRA3, DPEP2, EDEM2, CD84, and KIRREL2.

[00384] In various embodiments, the protein biomarkers further comprise forty or more of BPIFB1, SIGLEC9, ZNRD2, PM20D1, TK1, RPS10, PMCH, RNF43, MEP1B, BGN, NELL1, CD101, LRP2BP, PRSS53, MFGE8, THSD1, CKMT1A, MEPE, APOL1, RBPMS, MARCO, KLRC1, FGFBP2, TPSG1, SELENOP, CLEC7A, UPK3BL1, HS6ST1, ENDOU, IL12RB2, CYB5A, GKN1, NRTN, CCL26, CRNN, PINLYP, LAIR2, BAG3, SCPEP1, RIPK4, CTSE, TMOD4, SFTPA1, SEMA4D, IL17C, GFRA3, DPEP2, EDEM2, CD84, and KIRREL2.

[00385] In various embodiments, the protein biomarkers further comprise each of BPIFB1, SIGLEC9, ZNRD2, PM20D1, TK1, RPS10, PMCH, RNF43, MEP1B, BGN, NELL1, CD101, LRP2BP, PRSS53, MFGE8, THSD1, CKMT1A, MEPE, APOL1, RBPMS, MARCO, KLRC1, FGFBP2, TPSG1, SELENOP, CLEC7A, UPK3BL1, HS6ST1, ENDOU, IL12RB2, CYB5A, GKN1, NRTN, CCL26, CRNN, PINLYP, LAIR2, BAG3, SCPEP1, RIPK4, CTSE, TMOD4, SFTPA1, SEMA4D, IL17C, GFRA3, DPEP2, EDEM2, CD84, and KIRREL2.

[00386] In various embodiments, the protein biomarkers further comprise one or more of NECTIN1, CBLN1, NTF3, PYY, XG, NPY, CCL20, SIL1, PLB1, DUSP29, UMOD, ATXN2L, LEO1, PROS1, EDDM3B, ENO3, DCBLD2, MMP9, KIF22, DENND2B, C1RL, PVALB, CXCL8, PPY, CCN1, KLK10, RRAS, SCN3B, BPIFB2, ITGAL, DDX1, MEGF11, NOP56, NTF4, HNMT, IL9, SCRIB, UXS1, MEP1A, ACTN2, NECAP2, CLEC10A, DDX53, SV2A, ATXN10, PI16, KCNH2, TNR, PDGFRB, SERPINA4, CDC27, MICALL2, CD28, BRK1, SLC16A1, DSCAM, PBXIP1, MATN3, SFTPA2, PTTG1, ASAH2, SCG2, PTGR1, GBA, PTPRZ1, ERN1, LECT2, SCGN, HLA-DRA, IL5RA, LRPAP1, CXCL13, NEXN, CD248, KYNU, ADAMTS15, WFIKKN2, CLEC14A, FZD10, PROC, LY9, LRP2, CX3CL1, RNASET2, CTSS, MCEMP1, COMP, SIGLEC6, CCL24, AOC1, PLXNB3, TMPRSS15, FCAR, SCIN, IFI30, KIRREL1, FXYD5, S100A16, LILRA5, CLSPN, AHNAK2, CTLA4, INSL5, WDR46, CST5, PHLDB2, TREML2, GUCA2A, PFDN2, PDIA4, LAMA1, SLAMF7, RGS8, IL6, PSG1, PZP, RRM2, GFRAL, AIF1L, LGMN, C1QTNF9, TSPAN1, DLL4, CRELD2, SCARF1, FGF9, JAM3, LPP, HSPB1, PPT1, PPIF, TRPV3, APOA4, LYSMD3, TGFA, ATP6V1D, LRRC38, CTAG1A, TINAGL1, POLR2A, EDIL3, LAP3, SORD, ARHGAP30, CSPG4, ART3, GADD45GIP1, SLURP1, LILRA2, GZMH, FKBP7, SLC27A4, CALCB, GIT1, CTSO, PCBD1, CSF3R, EIF1AX, CSPG5, CD93, ADAMTSL5, ISM2, CPE, WFDC1, VWC2, SPINK5, BTN1A1, DPT, FCN1, AIF1, GPC1, FAP, CLNS1A, CFC1, FASLG, NCS1, PRKAR1A, RCOR1, SLITRK2, SPARCL1, HSPB6, TNFRSF12A, IL6, SERPIND1, CEBPB, CASC3, AMPD3, YTHDF3, AAMDC, STX7, AGRP, ICA1, CHCHD6, IGSF21, VSTM1, PCDH7, VNN2, GP6, ITGAV, CD40LG, GIP, MB, TPD52L2, HPSE, GRIN2B, TREML1, C3, TNFRSF17, IL6, CD226, PALM, FKBP14, RBPMS2, CLEC6A, DAAM1, FAM3D, WASF1, HS1BP3, NOS3, POF1B, PLXNA4, MITD1, ERMAP, SYAP1, LRRC59, CNTN2, RAB2B, PENK, MCAM, EIF2S2, EGF, PTPN6, NID2, EHD3, IGFBP6, LMOD1, PAGR1, CD300C, SKAP2, PRKG1, SYTL4, GYS1, CASP3, PILRA, CD69, CCN5, PCBP2, LMOD1, PDIA5, PCSK7, SCARA5, METAP1D, ADGRB3, MPIG6B, NUMB, L3HYPDH, DENR, AGRN, COX6B1, JAM2, TIA1, CACYBP, SEMA6C, VAT1, SUSD1, RSPO3, TWF2, BOLA1, OXCT1, ITGA6, BST2, F2R, PILRB, RTBDN, ENOX2, DOK1, VASH1, DTD1, DDHD2, TBC1D23, GLRX5, CDNF, SIRPB1, NMT1, STK11, RPL14, PSTPIP2, FHIT, CLMP, LMOD1, ERP29, BECN1, CD38, YAP1, CAB, CRKL, PPP1R9B, FLU, CMC1, CDC37, ARHGAP45, PDAP1, NUDC, CLEC1B, USO1, SNAP23, HGS, FUS, PIK3AP1, FUR, TBC1D17, ITPA, IL1B, ENO1, THTPA, SAFB2, JPT2, GIMAP7, NIT2, RILPL2, PRTFDC1, TADA3, TOMM20, HPCAL1, LONP1, CALCOCO1, ATRAID, TYMP, TNFRSF19, DNPEP, NRGN, STK4, SSNA1, CRYGD, LZTFL1, SNAP29, PDLIM5, CASP2, MANF, BACH1, DAPP1, AKR1B1, EREG, DAG1, HSBP1, DUT, AKT2, PLA2G4A, TXLNA, PIKFYVE, FYB1, CSDE1, RHOC, HNRNPK, DCTD, SCRG1, LACTB2, RGCC, GIMAP8, GRHPR, SNX5, NCK2, EIF4G1, BNIP3L, ACOT13, MECR, MAP2K6, SEC31A, MGLL, MESD, NUDT16, SULT1A1, GOPC, VTA1, PDLIM7, ANXA2, GGACT, PMVK, USP8, SNCA, CAMSAP1, HEXIM1, SHMT1, LGALS8, APPL2, MAP2K1, EHBP1, MAP4K5, PDE5A, HARS1, SRC, TACC3, and RAB27B.

[00387] In various embodiments, the predictive model comprises an elastic net regression model, and the predictive model achieves an area under a curve (AUC) value of at least 0.85. In various embodiments, the predictive model comprises a support vector machine, and the predictive model achieves an area under a curve (AUC) value of at least 0.84. In various embodiments, the predictive model comprises a random forest model, and the predictive model achieves an area under a curve (AUC) value of at least 0.72. In various embodiments, the predictive model comprises an XGBoost model, and the predictive model achieves an area under a curve (AUC) value of at least 0.73.

[00388] Additionally disclosed herein is a non-transitory computer readable medium comprising instructions that, when executed by a processor, cause the processor to: obtain or have obtained a dataset derived from the subject comprising quantitative levels of a plurality of biomarkers, wherein the plurality of biomarkers comprises protein biomarkers comprising two or more of VWA5A, ENPP6, TMEM25, ALDH2, and LEO1, and generate a prediction of risk of cancer for the subject by applying a predictive model to the quantitative values of the plurality of biomarkers.

[00389] In various embodiments, the protein biomarkers comprise three or more of VWA5A, ENPP6, TMEM25, ALDH2, and LEO1.

[00390] In various embodiments, the protein biomarkers comprise four or more of VWA5A, ENPP6, TMEM25, ALDH2, and LEO1.

[00391] In various embodiments, the protein biomarkers comprise each of VWA5A, ENPP6, TMEM25, ALDH2, and LEO1.

[00392] In various embodiments, the protein biomarkers further comprise one or more of GAMT, TPSG1, ANK2, SCT, TSPAN7, GPC5, PGLYRP1, PAK4, TNFSF14, CLEC6A, TMPRSS15, PMCH, KRT14, SFTPA1, and LRFN2.

[00393] In various embodiments, the protein biomarkers further comprise five or more of GAMT, TPSG1, ANK2, SCT, TSPAN7, GPC5, PGLYRP1, PAK4, TNFSF14, CLEC6A, TMPRSS15, PMCH, KRT14, SFTPA1, and LRFN2.

[00394] In various embodiments, the protein biomarkers further comprise ten or more of GAMT, TPSG1, ANK2, SCT, TSPAN7, GPC5, PGLYRP1, PAK4, TNFSF14, CLEC6A, TMPRSS15, PMCH, KRT14, SFTPA1, and LRFN2.

[00395] In various embodiments, the protein biomarkers further comprise each of GAMT, TPSG1, ANK2, SCT, TSPAN7, GPC5, PGLYRP1, PAK4, TNFSF14, CLEC6A, TMPRSS15, PMCH, KRT14, SFTPA1, and LRFN2.

[00396] In various embodiments, the protein biomarkers further comprise one or more of MMP12, TNPO1, GAST, CD3D, TK1, DLGAP5, SCGN, CCL24, PSG1, CLU, CFB, LBP, CRYM, LAIR2, TCN2, SV2A, CRHBP, C5, SCGB3A2, ANXA10, GCG, RPGR, PAPPA, FZD8, CSPG5, BRK1, OXT, FDX1, ENPEP, and LRG1.

[00397] In various embodiments, the protein biomarkers further comprise five or more of MMP12, TNPO1, GAST, CD3D, TK1, DLGAP5, SCGN, CCL24, PSG1, CLU, CFB, LBP, CRYM, LAIR2, TCN2, SV2A, CRHBP, C5, SCGB3A2, ANXA10, GCG, RPGR, PAPPA, FZD8, CSPG5, BRK1, OXT, FDX1, ENPEP, and LRG1.

[00398] In various embodiments, the protein biomarkers further comprise ten or more of MMP12, TNPO1, GAST, CD3D, TK1, DLGAP5, SCGN, CCL24, PSG1, CLU, CFB, LBP, CRYM, LAIR2, TCN2, SV2A, CRHBP, C5, SCGB3A2, ANXA10, GCG, RPGR, PAPPA, FZD8, CSPG5, BRK1, OXT, FDX1, ENPEP, and LRG1.

[00399] In various embodiments, the protein biomarkers further comprise twenty or more of MMP12, TNPO1, GAST, CD3D, TK1, DLGAP5, SCGN, CCL24, PSG1, CLU, CFB, LBP, CRYM, LAIR2, TCN2, SV2A, CRHBP, C5, SCGB3A2, ANXA10, GCG, RPGR, PAPPA, FZD8, CSPG5, BRK1, OXT, FDX1, ENPEP, and LRG1.

[00400] In various embodiments, the protein biomarkers further comprise each of MMP12, TNPO1, GAST, CD3D, TK1, DLGAP5, SCGN, CCL24, PSG1, CLU, CFB, LBP, CRYM, LAIR2, TCN2, SV2A, CRHBP, C5, SCGB3A2, ANXA10, GCG, RPGR, PAPPA, FZD8, CSPG5, BRK1, OXT, FDX1, ENPEP, and LRG1.

[00401] In various embodiments, the protein biomarkers further comprise one or more of PRAME, KIRREL1, KIF22, SPINT1, FGA, C1QTNF9, KIR2DS4, MMP9, NEXN, FCN1, MFGE8, ZNRD2, PDGFRB, HS6ST1, DUSP3, CABP2, DNM3, FGL1, TOP1, CDCP1, RAB10, THSD1, FASLG, MCEMP1, COL4A4, ENO1, BRD1, GP5, ZP3, SERPIND1, NCAM1, ATXN10, MUC16, GABRA4, POSTN, MAEA, SHH, DDX53, PRKG1, PAEP, RICTOR, IL6, FKBP14, CCL26, AIDA, GIP, TGFA, ITIH4, PCSK7, and RARRES1.

[00402] In various embodiments, the protein biomarkers further comprise five or more of PRAME, KIRREL1, KIF22, SPINT1, FGA, C1QTNF9, KIR2DS4, MMP9, NEXN, FCN1, MFGE8, ZNRD2, PDGFRB, HS6ST1, DUSP3, CABP2, DNM3, FGL1, TOP1, CDCP1, RAB10, THSD1, FASLG, MCEMP1, COL4A4, ENO1, BRD1, GP5, ZP3, SERPIND1, NCAM1, ATXN10, MUC16, GABRA4, POSTN, MAEA, SHH, DDX53, PRKG1, PAEP, RICTOR, IL6, FKBP14, CCL26, AIDA, GIP, TGFA, ITIH4, PCSK7, and RARRES1.

[00403] In various embodiments, the protein biomarkers further comprise ten or more of PRAME, KIRREL1, KIF22, SPINT1, FGA, C1QTNF9, KIR2DS4, MMP9, NEXN, FCN1, MFGE8, ZNRD2, PDGFRB, HS6ST1, DUSP3, CABP2, DNM3, FGL1, TOP1, CDCP1, RAB10, THSD1, FASLG, MCEMP1, COL4A4, ENO1, BRD1, GP5, ZP3, SERPIND1, NCAM1, ATXN10, MUC16, GABRA4, POSTN, MAEA, SHH, DDX53, PRKG1, PAEP, RICTOR, IL6, FKBP14, CCL26, AIDA, GIP, TGFA, ITIH4, PCSK7, and RARRES1.

[00404] In various embodiments, the protein biomarkers further comprise twenty or more of PRAME, KIRREL1, KIF22, SPINT1, FGA, C1QTNF9, KIR2DS4, MMP9, NEXN, FCN1, MFGE8, ZNRD2, PDGFRB, HS6ST1, DUSP3, CABP2, DNM3, FGL1, TOP1, CDCP1, RAB10, THSD1, FASLG, MCEMP1, COL4A4, ENO1, BRD1, GP5, ZP3, SERPIND1, NCAM1, ATXN10, MUC16, GABRA4, POSTN, MAEA, SHH, DDX53, PRKG1, PAEP, RICTOR, IL6, FKBP14, CCL26, AIDA, GIP, TGFA, ITIH4, PCSK7, and RARRES1.

[00405] In various embodiments, the protein biomarkers further comprise thirty or more of PRAME, KIRREL1, KIF22, SPINT1, FGA, C1QTNF9, KIR2DS4, MMP9, NEXN, FCN1, MFGE8, ZNRD2, PDGFRB, HS6ST1, DUSP3, CABP2, DNM3, FGL1, TOP1, CDCP1, RAB10, THSD1, FASLG, MCEMP1, COL4A4, ENO1, BRD1, GP5, ZP3, SERPIND1, NCAM1, ATXN10, MUC16, GABRA4, POSTN, MAEA, SHH, DDX53, PRKG1, PAEP, RICTOR, IL6, FKBP14, CCL26, AIDA, GIP, TGFA, ITIH4, PCSK7, and RARRES1.

[00406] In various embodiments, the protein biomarkers further comprise forty or more of PRAME, KIRREL1, KIF22, SPINT1, FGA, C1QTNF9, KIR2DS4, MMP9, NEXN, FCN1, MFGE8, ZNRD2, PDGFRB, HS6ST1, DUSP3, CABP2, DNM3, FGL1, TOP1, CDCP1, RAB10, THSD1, FASLG, MCEMP1, COL4A4, ENO1, BRD1, GP5, ZP3, SERPIND1, NCAM1, ATXN10, MUC16, GABRA4, POSTN, MAEA, SHH, DDX53, PRKG1, PAEP, RICTOR, IL6, FKBP14, CCL26, AIDA, GIP, TGFA, ITIH4, PCSK7, and RARRES1.

[00407] In various embodiments, the protein biomarkers further comprise each of PRAME, KIRREL1, KIF22, SPINT1, FGA, C1QTNF9, KIR2DS4, MMP9, NEXN, FCN1, MFGE8, ZNRD2, PDGFRB, HS6ST1, DUSP3, CABP2, DNM3, FGL1, TOP1, CDCP1, RAB10, THSD1, FASLG, MCEMP1, COL4A4, ENO1, BRD1, GP5, ZP3, SERPIND1, NCAM1, ATXN10, MUC16, GABRA4, POSTN, MAEA, SHH, DDX53, PRKG1, PAEP, RICTOR, IL6, FKBP14, CCL26, AIDA, GIP, TGFA, ITIH4, PCSK7, and RARRES1.

[00408] In various embodiments, the protein biomarkers further comprise one or more of SLC27A4, IL6, DKKL1, MFAP3, STX7, SSBP1, AKR7L, UGDH, IGHMBP2, GBP4, RBPMS, ST6GAL1, LILRA5, LILRA2, SOWAHA, ACADSB, CAMLG, CRTAC1, SUSD1, IL6, KLK10, GRSF1, MFAP4, NMT1, CNTN3, IL36A, EHD3, MAPT, AGBL2, ERN1, POMC, PDIA4, LGMN, EPHA10, PCBP2, PTGR1, GIT1, TREML1, GALNT2, TDGF1, INSR, OSCAR, MMP10, MRPL24, EIF1AX, AHNAK2, TP53, GBA, LRRC38, CLEC12A, TPT1, PPP1CC, BPIFB1, CFC1, SIGLEC9, CALY, OSM, ADAMTS1, OSMR, TYMP, GPR37, CLEC7A, SMAD5, SFTPA2, CTSS, HNMT, BATF, CCL19, SHC1, CST7, S100A12, ASAH2, PPIB, LYPD3, APOL1, AFM, SSC4D, FGF7, TDRKH, SCG2, ENPP2, PRKAR1A, FAM3D, GADD45GIP1, SEMA4D, PPP1R14A, EGF, NTF4, SERPING1, COX6B1, NECAP2, TFF1, IDI2, TJP3, CA14, PZP, PLIN1, ERBB4, TBC1D23, CRISP3, IFI30, ITIH1, C9, LAP3, PDIA5, ENDOU, FLT3LG, VNN2, MILR1, SDC1, CEACAM18, FHIP2A, CEACAM5, F11, WFIKKN2, USO1, CD40LG, GSTT2B, DUSP29, ATXN2L, IL6, RRM2, FGF23, ARHGAP30, SERPINA3, CXCL13, MMP8, NUDC, ENOPH1, NEK7, MAN1A2, ASAH1, STX5, IZUMO1, SERPINC1, IL9, PVALB, GZMH, FGF16, TFF2, WASF1, TMEM106A, GP2, PLXNA4, GNE, LGALS8, AOC1, FLRT2, CHCHD6, RNF43, TPD52L2, CSDE1, GPD1, PLA2G4A, LRIG1, NGF, RAB27B, VAT1, NUDT16, TRAF3IP2, MARCO, UMOD, PIK3AP1, MEGF11, NEDD4L, PKD2, CEBPB, RILPL2, IL3, RGCC, SARG, SMAD2, CTSH, KLKB1, ERP44, SULT2A1, SORD, IFNAR1, KLK11, TOMM20, C3, ADRA2A, NCK2, KIRREL2, CACNB3, SKAP2, CEACAM6, DNAJC21, PROS1, NRCAM, NPY, FYB1, RAB2B, MANF, MECR, LPA, DAAM1, DCTD, FXYD5, CRELD1, PLEKHO1, TINAGL1, ZBTB16, PROK1, MAP2K1, DAPP1, DSG4, PPP1R9B, RILP, EIF4G1, SESTD1, KIFBP, HGS, CD14, ANKMY2, WNT9A, CAB, GP1BB, CLIP2, BANK1, WDR46, HSPB1, CSF2, SNCA, RRAS, PRTFDC1, RBPMS2, LARP1, KAZN, CLSPN, RHOC, PPT1, DPEP2, METAP1D, STK11, CFH, PDE5A, MRC1, BIN2, IL17A, PXDNL, GP6, EPO, MAP3K5, MCEE, DDHD2, PHLDB2, NECTIN1, CCDC50, GKN1, MPIG6B, CBLIF, SYTL4, SSH3, PDZD2, SULT1A1, DLG4, HPCAL1, ICA1, GDF15, CD160, APPL2, GRN, IL17RA, CDC42BPB, C4BPB, DAG1, CMIP, KYNU, NUMB, PPY, PPIF, CFI, DTD1, LDLRAP1, FGF9, STXBP1, CMC1, GOPC, SMTN, PTPN6, L3HYPDH, PDAP1, LPP, THTPA, XG, AGRP, RAB11FIP3, F11R, BCR, LONP1, BNIP3L, SELP, GYS1, MGLL, PDLIM5, MESD, DNPEP, SRC, PMVK, ITPRIP, CD69, CALCOCO1, PAFAH2, GIPC3, SNAP23, STAT5B, RSPO3, AKT1S1, SNAP29, CASP2, AKT2, NELL1, MCTS1, TIA1, SCRG1, CIRBP, SEMA3F, SOX2, NRGN, PSTPIP2, ISM2, EHBP1, VTA1, and DUT.

[00409] In various embodiments, the predictive model comprises an elastic net regression model, and the predictive model achieves an area under a curve (AUC) value of at least 0.79. In various embodiments, the predictive model comprises a support vector machine, and the predictive model achieves an area under a curve (AUC) value of at least 0.81. In various embodiments, the predictive model comprises a random forest model, and the predictive model achieves an area under a curve (AUC) value of at least 0.71. In various embodiments, the predictive model comprises an XGBoost model, and the predictive model achieves an area under a curve (AUC) value of at least 0.70.

[00410] In various embodiments, the cancer is lung cancer. In various embodiments, the risk of cancer is a level of risk of the subject developing cancer within 1 year, within 2 years, within 3 years, within 4 years, within 5 years, within 6 years, within 7 years, within 8 years, within 9 years, or within 10 years. In various embodiments, the risk of cancer is a presence or absence of cancer. In various embodiments, the dataset is derived from a test sample obtained from the subject. In various embodiments, the test sample is a blood, serum or plasma sample. In various embodiments, the dataset is obtained from having performed one or more assays. In various embodiments, the one or more assays comprises an immunoassay to determine the expression levels of the plurality of biomarkers. In various embodiments, the immunoassay is a Proximity Extension Assay (PEA) or LUMINEX xMAP Multiplex Assay. In various embodiments, the dataset comprises plasma proteomics data. In various embodiments, a therapy is selected for providing to the subject based on the prediction of cancer.

EXAMPLES

[00411] Below are examples of specific embodiments for carrying out the present invention. The examples are offered for illustrative purposes only and are not intended to limit the scope of the present invention in any way. Efforts have been made to ensure accuracy with respect to numbers used (e.g., amounts, temperatures, etc.), but some experimental error and deviation should be allowed for.

[00412] In some scenarios as described herein, the proteins in Example 4 can be subsets of proteins described in Example 1 and/or identified in Tables 1-3 (e.g., 425 proteins for 1-3Y and 493 proteins for 1-5Y).

Example 1: Study Methods

[00413] This study was performed using data and biospecimens collected as part of the Liverpool Lung Project (LLP) cohort. The data and biospecimens were obtained following institutional review board approval, and patients provided written informed consent. Leveraging the LLP, a unique 10-year observational cohort that followed subjects from healthy to lung cancer diagnoses, pre-diagnosis plasma proteomics data were generated in a cross-sectional sub-cohort including 292 subjects (e.g., with samples taken 1-5 years before their diagnosis) and a longitudinal sub-cohort including 246 samples from 144 subjects (e.g., taken 5-10 years before their diagnosis, 2-5 years before their diagnosis, and/or at time of their diagnosis).

[00414] In the study methods, plasma proteomics data were generated using two separate workflows or approaches. In one workflow (Example 2), 366 proteins were analyzed to develop predictive models incorporating 30 biomarkers (hereafter referred to as predictive models using the Olink® Target 96 platform). In another workflow (Examples 3 and 4), 2941 proteins were analyzed to develop predictive models for predicting future lung cancer development within 1-3 years and within 1-5 years. Such predictive models are hereafter referred to as predictive models using the Olink® Explore 3072 platform. Receiver operating characteristic (ROC) curves, areas under the curve (AUCs) (e.g., median AUC) from the models, and recursive feature elimination (RFE) using 5-fold cross validation repeated 5 times were reported.

[00415] For each approach or workflow, four machine learning algorithms (e.g., Elastic Net (“en”), Random Forest (“rf”), Support Vector Machine (“svm”), XGBoost (“xgb”)) were implemented to develop prediction models to predict cancer vs. healthy based on different biomarkers. Biomarkers for the Olink® Target 96 platform were selected based on differential expression between healthy and cancer subjects in the “WP2” step (linear model, p<0.05). Biomarkers for the Olink® Explore 3072 platform were selected by performing differential expression on a random set of 50% of the dataset 1000 times; significant proteins were defined as those differentially expressed (p<0.05) in at least 100 of the 1000 runs.
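The repeated-subsampling selection rule described for the Olink® Explore 3072 platform can be sketched as follows. This is a minimal illustration only: the function names, the data layout, and the use of a normal-approximation test on the difference in means (in place of the linear model named in the text) are assumptions made to keep the sketch dependency-free.

```python
import random
from statistics import NormalDist, mean, variance


def welch_p_value(cases, controls):
    """Two-sided p-value for a difference in means (normal approximation).

    Stand-in for the linear-model test in the text; an assumption of this sketch.
    """
    se = (variance(cases) / len(cases) + variance(controls) / len(controls)) ** 0.5
    if se == 0:
        return 1.0
    z = (mean(cases) - mean(controls)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def stable_features(levels, labels, n_rounds=1000, fraction=0.5,
                    alpha=0.05, min_hits=100, seed=0):
    """Return proteins called significant in at least `min_hits` subsamples.

    levels: dict mapping protein name -> list of values, one per sample
    labels: list of 1 (case) / 0 (control), one per sample
    """
    rng = random.Random(seed)
    n = len(labels)
    hits = {protein: 0 for protein in levels}
    for _ in range(n_rounds):
        # Draw a random 50% subset of samples (without replacement).
        subset = rng.sample(range(n), int(n * fraction))
        case_idx = [i for i in subset if labels[i] == 1]
        ctrl_idx = [i for i in subset if labels[i] == 0]
        if len(case_idx) < 2 or len(ctrl_idx) < 2:
            continue  # variance needs at least two samples per group
        for protein, values in levels.items():
            p = welch_p_value([values[i] for i in case_idx],
                              [values[i] for i in ctrl_idx])
            if p < alpha:
                hits[protein] += 1
    return sorted(name for name, count in hits.items() if count >= min_hits)
```

With the defaults (`n_rounds=1000`, `min_hits=100`), a protein is retained only if it reaches p<0.05 in at least 10% of the random half-datasets, which filters out proteins whose significance depends on a few particular samples.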

[00416] Tables 1-3 show the predictors that were included in the prediction models. Tables 1-3 further identify the rank of each protein biomarker in the corresponding workflow or model (e.g., “Olink Target 96 WP2 rank,” “1-5Y Rank,” or “1-3Y Rank”). Tables 1-3 further identify the biomarker name, pathway information, Biomarker symbol, Uniprot number, and/or protein name of each protein biomarker.

[00417] The proteins in Example 4 can be subsets of proteins described in Tables 1-3 (e.g., 425 proteins for 1-3Y and 493 proteins for 1-5Y).

Example 2: Example Results from Prediction Models using Olink® Target 96 Platform

[00418] In this example, a prediction model including 30 protein biomarkers was constructed from the cross-sectional sub-cohort as described in Example 1 for predicting future lung cancer development within 1-5 years. Here, the prediction model was constructed using four separate machine learning algorithms (Elastic Net (“en”), Random Forest (“rf”), Support Vector Machine (“svm”), XGBoost (“xgb”)), followed by recursive feature elimination (RFE) with 5-fold cross-validation (CV) repeated 5 times to reduce the total number of predictors in the model.

[00419] Here, the prediction model was constructed in accordance with the embodiment shown in FIG. 3. Thus, the prediction model analyzes biomarker levels and generates a cancer score that is informative for the overall prediction (e.g., presence or absence of cancer).

[00420] As shown in FIG. 5A, the four different prediction models successfully predicted future lung cancer development from 1-5 years before diagnosis with AUCs ranging from 0.68 to 0.74.

[00421] As shown in FIG. 5B and Table 4, in an independent validation set (longitudinal sub-cohort), the model predicted cancer development 2-5 years prior to diagnosis with AUCs ranging from 0.68 to 0.71.

[00422] FIG. 5C shows the performance of the predictive model (e.g., Random Forest) as a function of the number of predictors in the model, in accordance with the embodiment of the prediction model shown in FIG. 3. Beginning with the 30 initial protein biomarkers (30 biomarkers shown in Table 1), the performance of the predictive model was evaluated as protein biomarkers were iteratively removed via recursive feature elimination (RFE). For example, with the 30 initial protein biomarkers (indicated on the x-axis of FIG. 5C as “variables”), the predictive model achieved an AUC performance metric of nearly 0.7. As the number of protein biomarkers decreased, the model remained predictive. For example, at 20 protein biomarkers (which includes the biomarkers in Table 1 with corresponding “Olink Target 96 WP2 rank” between 1-20), the predictive model exhibited an AUC of ~0.67. At 10 protein biomarkers (which includes the biomarkers in Table 1 with corresponding “Olink Target 96 WP2 rank” between 1-10), the predictive model exhibited an AUC of ~0.63. At 5 protein biomarkers (which includes the biomarkers in Table 1 with corresponding “Olink Target 96 WP2 rank” between 1-5), the predictive model exhibited an AUC of ~0.62.

Example 3: Example Results from Prediction Models using Olink® Explore 3072 Platform

[00423] In this example, patient samples from the cross-sectional and longitudinal sub-cohorts were incorporated to construct a prediction model for predicting future lung cancer development within 1-5 years (“1-5Y”) (FIGS. 6A and 6B, and Table 5) and 1-3 years (“1-3Y”) (FIGS. 7A and 7B, and Table 6) before diagnosis. For 1-5Y before diagnosis, 493 protein biomarkers were derived. For 1-3Y before diagnosis, 425 protein biomarkers were derived.

[00424] Here, the prediction model was constructed using four separate machine learning algorithms (Elastic Net Regression (“en”), Random Forest (“rf”), Support Vector Machine (“svm”), XGBoost (“xgb”)), followed by recursive feature elimination (RFE) with 5-fold cross-validation (CV) repeated 5 times to reduce the total number of predictors in the model.

[00425] Here, prediction models were constructed in accordance with the embodiment shown in FIG. 3. Thus, prediction models analyze biomarker levels and generate a cancer score that is informative for the overall prediction (e.g., future risk of cancer, or presence or absence of cancer).

[00426] As shown in FIG. 6A and Table 5, the four different prediction models successfully predicted future lung cancer development from 1-5 years before diagnosis with AUCs (e.g., median AUCs) ranging from 0.73 to 0.84.

[00427] Table 5 shows various AUC performance metrics, such as “Min.,” “1st. Qu.,” “Median,” “Mean,” “3rd. Qu,” “Max.” AUC from various “models” (e.g., logistic, svm, rv, xgb) or machine learning algorithms (e.g., “en,” “svm,” “rf,” or “xgb”) ranging from 0.60 to 0.93.

[00428] FIG. 6B shows the performance of the predictive model (e.g., Random Forest) as a function of the number of predictors in the model, in accordance with the embodiment of the prediction model shown in FIG. 3. Beginning with the 493 initial protein biomarkers (493 biomarkers shown in Table 2), the performance of the predictive model was evaluated as protein biomarkers were iteratively removed via RFE. For example, with the 493 initial protein biomarkers (indicated on the x-axis of FIG. 6B as “variables”), the predictive model achieved an AUC performance metric of nearly 0.73. As the number of protein biomarkers decreased, the model remained predictive. For example, at 100 protein biomarkers (which includes the biomarkers in Table 2 with corresponding “1-5Y rank” between 1-100), the predictive model exhibited an AUC of ~0.70. At 10 protein biomarkers (which includes the biomarkers in Table 2 with corresponding “1-5Y rank” between 1-10), the predictive model exhibited an AUC of ~0.57. At 5 protein biomarkers (which includes the biomarkers in Table 2 with corresponding “1-5Y rank” between 1-5), the predictive model exhibited an AUC of ~0.53.

[00429] Table 6 shows various AUC model performance metrics, such as “Min.,” “1st. Qu.,” “Median,” “Mean,” “3rd. Qu,” “Max.” AUC from four different “models” (e.g., logistic, svm, rv, xgb) or machine learning algorithms (e.g., en, svm, rf, xgb) ranging from 0.58 to 0.99.

[00430] As shown in FIG. 7A and Table 6, the prediction models successfully predicted future lung cancer development from 1-3 years before diagnosis with AUCs (e.g., median AUCs) ranging from 0.74 to 0.87.

[00431] FIG. 7B shows the performance of the predictive model (e.g., Random Forest) as a function of the number of predictors in the model, in accordance with the embodiment of the prediction model shown in FIG. 3. Beginning with the 425 initial protein biomarkers (425 biomarkers shown in Table 3), the performance of the predictive model was evaluated as protein biomarkers were iteratively removed via RFE. For example, with the 425 initial protein biomarkers (indicated on the x-axis of FIG. 7B as “variables”), the predictive model achieved an AUC performance metric of nearly 0.75. As the number of protein biomarkers decreased, the model remained predictive. For example, at 100 protein biomarkers (which includes the biomarkers in Table 3 with corresponding “1-3Y rank” between 1-100), the predictive model exhibited an AUC of ~0.68. At 10 protein biomarkers (which includes the biomarkers in Table 3 with corresponding “1-3Y rank” between 1-10), the predictive model exhibited an AUC of ~0.55. At 5 protein biomarkers (which includes the biomarkers in Table 3 with corresponding “1-3Y rank” between 1-5), the predictive model exhibited an AUC of ~0.53.

Example 4: Example Early Prediction of Lung Cancer using Plasma Protein Biomarkers from Prediction Models using Olink® Explore 3072 and Target 96 Platforms

[00432] Individual plasma proteins have been identified as minimally invasive biomarkers for lung cancer diagnosis with potential utility in early detection. Differences in specific plasma protein levels have been previously shown to be indicative for lung cancer diagnosis, or related to imminent lung cancer. However, more comprehensive plasma protein profiling over longer time periods pre-diagnosis has not been studied.

[00433] In this example, the Olink® Explore-3072 platform quantitated 2941 proteins in 496 Liverpool Lung Project (LLP) plasma samples, including 131 cases taken 1-10 years prior to diagnosis, 237 controls, and 90 subjects at multiple times. 1112 proteins associated with haemolysis were excluded. Feature selection with bootstrapping identified differentially expressed proteins, which were subsequently modelled for lung cancer prediction and validated in UK Biobank data.

Methods

[00434] EDTA plasma samples from LLP subjects were collected by standardized protocols (between 1998 and 2016), with a single cell depletion centrifugation (2200g, 15 minutes) prior to storing at -80°C and a further cell depletion spin after thawing, before being aliquoted for Olink studies and refrozen for shipment.

[00435] The cases and controls in this example were selected retrospectively as a nested case-control cohort from the LLP population cohort, as shown in FIG. 12.

[00436] As illustrated in Table 7, LLP population cohort subjects who did not have lung cancer at the time of recruitment, but who were subsequently diagnosed with primary lung cancer within 5 years, were identified for the primary discovery cohort.

[00437] As illustrated in Table 9, non-small cell lung cancer cases included almost equal numbers of adenocarcinoma (n= 53) and squamous cell carcinoma (n= 49) and were either early stage (45%) or late stage (52%) at the time of diagnosis.

[00438] As illustrated in Table 10, samples at diagnosis (n=23), 1-3 years prior to diagnosis (n=21), 3-5 years prior to diagnosis (n=30) or 5-10 years prior to diagnosis (n=33), were identified for longitudinal studies from 42 cases, along with 110 longitudinal samples at the same time points from 48 controls.

[00439] For each case, sex (e.g., self-reported as sex assigned at birth) and age at plasma sample were used to match control subjects (2 per case for discovery cohort and 1 per case for longitudinal studies). Controls were selected to have substantially the same smoking status (e.g., current, former, or never) at the time of sampling and similar lifetime smoking duration (based on all forms of tobacco). Where multiple longitudinal bio-specimens were available from cases, controls were identified with multiple samples at approximately the same intervals. Most subjects were smokers at the time of initial blood collection, with 10 never smokers, and 24 had quit smoking at the time of the last sample used.

[00440] Pre-diagnosis plasma proteomics was assessed in a cross-sectional sub-cohort (292 subjects, 1-5 years before diagnosis), and a longitudinal sub-cohort (246 samples from 144 subjects, 5-10 years before diagnosis, 2-5 years before diagnosis, and at time of diagnosis). Plasma proteomics data were generated using the Olink Explore 3072 platform (2941 proteins), which consists of 8 separate panels: Oncology, Oncology II, Cardiometabolic, Cardiometabolic II, Inflammation, Inflammation II, Neurology, and Neurology II. PCA plots with all proteins and samples were generated, and 6 samples lying > 5 standard deviations from the mean were filtered out. PCA plots for each panel were then generated separately, and an additional 5 samples lying > 5 standard deviations from the mean were filtered out. Data were also generated using the Olink® Target 96 platform (panels: Cardiometabolic, Cardiovascular II, Cardiovascular III, Cell Regulation, Development, Immune Response, Inflammation, Metabolism, Neuro Exploratory, Neurology, Oncology II, Oncology III, Organ Damage).
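The 5-standard-deviation outlier filter can be illustrated with a short sketch that starts from principal component scores that have already been computed. The text does not state which components were inspected or how the standard deviation was calculated, so the input layout, the function name, and the per-component population standard deviation used here are all assumptions.

```python
from statistics import mean, pstdev


def outlier_samples(pc_scores, n_sd=5.0):
    """Flag samples lying more than `n_sd` SDs from the mean on any component.

    pc_scores: dict mapping sample id -> tuple of principal component scores
               (e.g., (PC1, PC2)); computing the PCA itself is out of scope.
    """
    ids = list(pc_scores)
    n_components = len(pc_scores[ids[0]])
    flagged = set()
    for c in range(n_components):
        column = [pc_scores[s][c] for s in ids]
        mu, sd = mean(column), pstdev(column)
        if sd == 0:
            continue  # a constant component cannot have outliers
        flagged.update(s for s in ids if abs(pc_scores[s][c] - mu) > n_sd * sd)
    return sorted(flagged)
```

Note that a 5 SD threshold can only be exceeded when the number of samples is reasonably large (at least 27 for a single extreme point), which is comfortably the case for the sample counts reported here.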

[00441] Haemolysis is known to contribute to increased levels of some proteins in plasma. As shown in Table 11, to avoid potential false-positive results due to haemolysis-associated signals, proteins that were found to be significantly associated with haemolysis were systematically removed. Each sample in the LLP cohort was assigned a haemolysis score ranging from 0 to 4. A linear model was generated to identify proteins significantly associated with haemolysis, with 1112 proteins out of the 2941 proteins measured by Olink Explore identified based on FDR < 0.01. These proteins were filtered out from further analysis.
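The FDR-based filter can be sketched as below, starting from one p-value per protein for its association with the haemolysis score. The text does not name the FDR procedure; the Benjamini-Hochberg adjustment used here is an assumption, as are the function names and the input layout.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def drop_haemolysis_associated(p_by_protein, fdr=0.01):
    """Keep proteins whose association with haemolysis is NOT significant.

    p_by_protein: dict mapping protein name -> p-value from the per-protein
                  linear model against the haemolysis score (assumed input).
    """
    names = list(p_by_protein)
    q_values = benjamini_hochberg([p_by_protein[n] for n in names])
    return [n for n, q in zip(names, q_values) if q >= fdr]
```

For example, with four hypothetical proteins having p-values 1e-6, 0.004, 0.2 and 0.9, the adjusted values are 4e-6, 0.008, 0.267 and 0.9, so the first two would be removed at FDR < 0.01.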

[00442] Olink data were also generated in the UK Biobank (UKB). The UK Biobank population includes ages from 40 to 69 years, and the LLP population includes ages from 48 to 84 years. The analysis involved the initial batch of data, which was generated using the Olink Explore 1536 platform (1472 proteins) on 54,306 UKB participants. Future cancer cases were extracted from the UK Biobank cancer registry. Lung cancer cases were defined using the ICD10 code C34. Cancer cases were restricted to those that were the first occurrence, occurred after the baseline blood draw, and had Olink data. After applying the selection criteria, the total number of cases was 392, as shown in FIG. 13 and Table 12.

[00443] Controls were defined as individuals with no record of cancer, who did not self-report any previous cancer incidents, and, if deceased, for whom cancer was not the cause of death. Controls were matched to cancer cases by age, sex, smoking status and race using the K-nearest neighbor method. Two patient-to-control ratios were implemented: one is a balanced ratio where the ratio of cancer to control is 1:1, and another represents the risk of getting lung cancer as 1 cancer : 14 controls (392 cases and 5500 controls).
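A simplified version of the control-matching step is sketched below. The text states only that a K-nearest neighbor method was used on age, sex, smoking status and race; the choice here to require exact agreement on the three categorical variables, to rank by age difference, and to match without replacement is an assumption, as are the field names.

```python
def match_controls(cases, controls, k=1):
    """Greedy nearest-neighbour matching without replacement.

    Each subject is a dict with keys "id", "age", "sex", "smoking", "race".
    Controls must agree exactly on sex, smoking status and race; among those,
    the k closest in age are chosen (ties broken by id for determinism).
    """
    available = list(controls)
    matches = {}
    for case in cases:
        key = (case["sex"], case["smoking"], case["race"])
        eligible = [c for c in available
                    if (c["sex"], c["smoking"], c["race"]) == key]
        eligible.sort(key=lambda c: (abs(c["age"] - case["age"]), c["id"]))
        chosen = eligible[:k]
        matches[case["id"]] = [c["id"] for c in chosen]
        # Remove chosen controls so they cannot be reused for another case.
        chosen_ids = {c["id"] for c in chosen}
        available = [c for c in available if c["id"] not in chosen_ids]
    return matches
```

Calling `match_controls(cases, controls, k=1)` corresponds to the balanced 1:1 design, and `k=14` to the 1:14 design; a case with too few eligible controls simply receives a shorter list.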

[00444] For pan-cancer analysis, the above process for each cancer type was repeated, followed by combining control samples from different cancer types into one pooled control sample; ICD 10 cancer codes: Prostate, C61; Breast, C50; Colorectal, C18 & C19; Uterine Cancer, C44; Kidney Cancer, C64; Pancreatic, C25; Bladder, C67; Stomach, C16; Liver, C22.

Machine Learning

[00445] Feature selection was performed on the discovery cohorts shown in Table 7 by bootstrapping differential expression on a random set of 50% of the dataset 1000 times using a linear model with age, sex, and pack years as covariates; proteins were retained if they were differentially expressed between cases and controls (P < 0.05, linear model ANOVA) at least 100 times. Proteins significantly associated with haemolysis were then filtered out. Four different machine learning algorithms (e.g., Elastic Net, Random Forest, Support Vector Machine, XGBoost) were trained as binary models to predict cancer vs. control either at 1-3 years before diagnosis or 1-5 years before diagnosis of lung cancer. Receiver operating characteristic area under the curve values (AUCs) from the models are reported as the median AUC from 5-fold cross validation repeated 5 times. To predict future cancer in UKB individuals, the selected proteins were intersected with the proteins available in the UKB data, and Support Vector Machine (SVM) classifiers were trained using this set of proteins.
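The reporting convention used here (median AUC over 5-fold cross validation repeated 5 times) can be sketched independently of the particular classifier. In the sketch below, `fit` and `predict` are hypothetical callables standing in for any of the four algorithms; the stratified fold assignment and all function names are assumptions for illustration.

```python
import random
from statistics import median


def auc(scores, labels):
    """ROC AUC: probability that a case outscores a control (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k, rng):
    """Split sample indices into k folds with cases and controls spread evenly."""
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def repeated_cv_median_auc(X, y, fit, predict, k=5, repeats=5, seed=0):
    """Median held-out AUC over `repeats` runs of k-fold cross validation."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(repeats):
        for test_idx in stratified_folds(y, k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(y)) if i not in held_out]
            model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
            scores = predict(model, [X[i] for i in test_idx])
            aucs.append(auc(scores, [y[i] for i in test_idx]))
    return median(aucs)  # k * repeats = 25 AUC values with the defaults
```

Each fold must contain at least one case and one control for the AUC to be defined, which the stratified split ensures whenever each class has at least `k` samples.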

[00446] For GO biological process pathways gene set enrichment, 7658 gene sets were obtained from msigdb (www.gsea-msigdb.org), and the list was filtered to only include proteins measured by the Olink Explore platform (2941 proteins). Hypergeometric tests were performed separately on proteins higher or lower in lung cancer cases from the 1-3 years and 1-5 years models, with the background as the 2941 proteins measured by Olink.
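The hypergeometric enrichment test against a restricted background can be written out directly. This is a minimal sketch with hypothetical function and argument names; it computes the upper-tail probability of seeing at least the observed overlap between a gene set and the selected proteins, with both first restricted to the measured background.

```python
from math import comb


def hypergeom_enrichment_p(background, gene_set, hits):
    """P(X >= observed overlap) when drawing len(hits) proteins from the background.

    background: all measured proteins (e.g., the 2941 Olink Explore proteins)
    gene_set:   members of one pathway gene set
    hits:       selected proteins (e.g., those higher in cases)
    """
    background = set(background)
    gene_set = set(gene_set) & background  # keep only measured members
    hits = set(hits) & background
    N, K, n = len(background), len(gene_set), len(hits)
    overlap = len(gene_set & hits)
    total = comb(N, n)
    tail = sum(comb(K, x) * comb(N - K, n - x)
               for x in range(overlap, min(K, n) + 1))
    return tail / total
```

As a small worked case, with a background of 10 proteins, a gene set of 3, and 3 hits of which 2 fall in the gene set, the p-value is (C(3,2)·C(7,1) + C(3,3)·C(7,0)) / C(10,3) = 22/120 ≈ 0.183.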

Results

[00447] Patient samples taken 1-3 years before diagnosis (1-3Y) from the cross-sectional and longitudinal sub-cohorts were combined to build models to predict development of future lung cancer. 422 proteins that were differentially expressed between healthy subjects and future lung cancer cases 1-3Y prior to diagnosis were identified. 240 / 422 proteins were kept for further analysis (e.g., 158 up in cases and 82 down) after filtering out proteins that were significantly associated with haemolysis (as shown in Table 11). A subset of these proteins was measured on the Olink® Target 96 platform and these correlated well with the Olink® Explore platform. 262 / 265 of the overlapping proteins had a significant correlation with FDR < 0.05 (FIG. 14 and Table 14).

[00448] As shown in FIG. 8A, median AUCs from the cross validation ranging from 0.76 to 0.90 were generated by training four different machine learning algorithms on the LLP cohort (e.g., Elastic Net, Random Forest, Support Vector Machine (SVM), XGBoost, 5-fold cross validation repeated 5 times) using the 240 proteins in the 1-3Y cohort.

[00449] Combined z scores were generated from the differentially expressed proteins at 1-3Y before diagnosis and were plotted over time, including additional longitudinal samples (FIG. 8B). The difference between cases and controls was greater closer to diagnosis. The 1-3Y combined z score differentiated between controls and cases at 1-3 years before diagnosis, but not at 3-5 years or 5-10 years before diagnosis. Individual patient trajectories of the combined z scores indicate that patients that developed cancer were more likely to have an upward trajectory of their z score over time, as shown in FIG. 15.
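The text does not define how the combined z score is calculated. One plausible construction, shown purely as an assumption, standardises each protein against the control samples, flips the sign for proteins that are lower in cases, and averages across proteins so that a higher score always points toward the case-like profile.

```python
from statistics import mean, stdev


def combined_z_scores(levels, is_control, direction):
    """Average signed z-score per sample (hypothetical definition).

    levels:     dict mapping protein -> list of values, one per sample
    is_control: list of bools, one per sample, marking the reference group
    direction:  dict mapping protein -> +1 (higher in cases) or -1 (lower in cases)
    """
    n_samples = len(is_control)
    totals = [0.0] * n_samples
    for protein, values in levels.items():
        # Standardise against the control group only.
        ref = [v for v, c in zip(values, is_control) if c]
        mu, sd = mean(ref), stdev(ref)
        for i, v in enumerate(values):
            totals[i] += direction[protein] * (v - mu) / sd
    return [t / len(levels) for t in totals]
```

Under this definition, controls centre on zero by construction, and the score for a sample can be tracked across time points to give the kind of per-patient trajectory described above.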

[00450] The combined z scores did not differ between stages of cancer at time of diagnosis, as shown in FIG. 9A. A difference between stages was observed at 5-10 years before diagnosis, where the score was higher for stage I than stage IV. However, at this time point the healthy and lung cancer z-scores did not demonstrate a difference overall. The combined z scores also did not correlate with pack years regardless of time before diagnosis, whether looking at healthy or lung cancer subjects, as shown in FIG. 9B. The z score had a stronger signal in squamous cell carcinoma 3-5 years before diagnosis, had no correlation with age in pre-diagnostic samples, and had no association with diagnosis of COPD, as shown in FIG. 16.

[00451] These 1-3Y trained models were tested on samples in the UK Biobank using SVM, which was the model that had a superior performance in the training cohort. Proteins that were measured in both LLP and UKB were used in the models since the UKB cohort measured a smaller panel of proteins using the Olink Explore platform: 107/240 for the 1-3Y model. A UK Biobank cohort that includes 392 future lung cancer cases and 5500 cancer-free controls was constructed. The 1-3Y model proteins gave an AUC from the cross validation of 0.75 for predicting cancer 1-3Y before diagnosis, as shown in FIG. 8C. An AUC of ~0.7 was retained for predicting cohorts that included patients 12 years prior to diagnosis, as shown in FIG. 8D. FIG. 8E demonstrates that the model in this example is highly specific to lung cancer in comparison to other types of cancer.

[00452] As shown in Table 9, sub-cohort analysis indicated that the model retained performance in non-smokers, patients younger than the age from the recommended screening guidelines and both sexes. As shown in Table 15, the model also retained performance for different histological subtypes.

Longer term prediction

[00453] Further, the ability of plasma proteins to predict lung cancer was studied by repeating the analysis using samples taken 1-5 years (1-5Y) prior to diagnosis and matched controls. 489 proteins that were differentially expressed 1-5Y before diagnosis between future lung cancer and healthy subjects were identified. After filtering out proteins that were significantly associated with haemolysis, 267/493 proteins were kept for further analysis (e.g., 119 up in cases and 148 down), 117 of which were also identified for the 1-3Y analysis (e.g., 69 up in cases and 48 down in cases), as shown in Table 13. Hence, over half of those plasma proteins significantly altered in the future lung cancer cases 1-5Y before diagnosis were not identified as significantly altered 1-3Y before diagnosis (n = 150, 50 up in cases and 100 down in cases).

[00454] The combined z score for the 1-5Y proteins had the same relationship to histology, COPD (FIG. 16) and smoking pack-year history as the 1-3Y proteins. However, in contrast to the 1-3Y proteins (FIG. 8B), the 1-5Y combined z score differentiated between controls and cases at both 1-3Y and 3-5Y before diagnosis, as shown in FIG. 10B, had no relationship to stage (FIG. 16F) and had a negative correlation with age in pre-diagnostic cancer cases and healthy controls (FIG. 10C).

[00455] Training four different machine learning algorithms (with 5-fold cross validation repeated 5 times) using the 267 1-5Y proteins (Table 13) generated median AUCs from the cross validation ranging from 0.73 to 0.83, as shown in FIG. 10A. During external validation, the model based on 129 1-5Y proteins measured in the UKB data gave an AUC of 0.69 for predicting lung cancer 1-5Y before diagnosis, which was not significantly different to the 1-3Y model. As with the 1-3Y model, AUC remained around 0.7 even for samples up to 12 years prior to diagnosis.
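The evaluation scheme described here (5-fold cross validation repeated 5 times, summarised by the median AUC) can be sketched as follows. This is an illustration only: the four machine learning algorithms are replaced by a deliberately trivial single-feature stand-in scorer, the data are synthetic, and every function name is hypothetical.

```python
import random
from statistics import mean, median

def pairwise_auc(case_scores, control_scores):
    """AUC as the share of case/control pairs ranked correctly (ties count 0.5)."""
    wins = sum(1.0 if c > h else 0.5 if c == h else 0.0
               for c in case_scores for h in control_scores)
    return wins / (len(case_scores) * len(control_scores))

def stratified_folds(labels, k, rng):
    """Split sample indices into k folds, keeping the case/control ratio per fold."""
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for position, i in enumerate(idx):
            folds[position % k].append(i)
    return folds

def repeated_cv_aucs(values, labels, k=5, repeats=5, seed=0):
    """Return one held-out AUC per fold per repeat (k * repeats values)."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(repeats):
        for test_idx in stratified_folds(labels, k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(labels)) if i not in held_out]
            # Stand-in "model": learn only the direction of effect on the training folds
            case_mean = mean(values[i] for i in train_idx if labels[i] == 1)
            control_mean = mean(values[i] for i in train_idx if labels[i] == 0)
            sign = 1.0 if case_mean >= control_mean else -1.0
            cases = [sign * values[i] for i in test_idx if labels[i] == 1]
            controls = [sign * values[i] for i in test_idx if labels[i] == 0]
            aucs.append(pairwise_auc(cases, controls))
    return aucs

# Synthetic data: 20 "cases" shifted upward relative to 40 "controls"
data_rng = random.Random(1)
labels = [1] * 20 + [0] * 40
values = [data_rng.gauss(1.0 if y == 1 else 0.0, 1.0) for y in labels]

aucs = repeated_cv_aucs(values, labels)
median_auc = median(aucs)
```

Each repeat reshuffles the fold assignment, so the 25 held-out AUCs give a spread from which a median can be reported, as in the ranges quoted above.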

Biological pathways

[00456] Gene enrichment analysis was performed to investigate potential biological pathways implicated in the risk of future lung cancer, being either increased in plasma (over-represented in cases) or decreased in plasma (under-represented in cases). For the top 20 pathways enriched for proteins either higher or lower in cases, there was limited overlap between 1-3Y and 1-5Y cohorts (FIG. 11); only 3 pathways over-represented in cases and 3 pathways under-represented in cases were shared between the 1-3Y and 1-5Y proteins. For proteins with higher levels in cases, of the 152 pathways with P<0.05 for either cohort, 57 were significant for 1-5Y only, 83 for 1-3Y only and only 12 for both (Table 16). For proteins with lower levels in cases, of the 138 pathways with P<0.05 for either cohort, 55 were significant for 1-5Y only, 74 for 1-3Y only and only 9 for both (Table 17).

[00457] That individual proteins may be associated with different aspects of lung cancer risk and/or presence of undetected lung cancer is exemplified by looking at how levels change over time (FIG. 14) in those cases and controls with longitudinal samples (Table 10). Some increase (e.g. PDIA4, RBPMS2) or decrease (e.g. ENPP6) the closer the sample is taken to diagnosis; others are consistently higher (e.g. CEACAM5) or lower (e.g. MFGE8) varying less over time, but many exhibit a combination of both traits.

[00458] Comprehensive plasma protein discovery was performed in this example, using the Olink® Explore 3072 platform, on plasma samples from the Liverpool Lung Project (LLP) taken at various times prior to lung cancer diagnosis. The methods and results in this example provided insight into early predictive biomarkers and how they change over time. The plasma proteome provided protein biomarkers which may be used to identify those at greatest risk of lung cancer, 5 or more years prior to diagnosis. This approach may provide an opportunity to identify patients who would benefit from novel preventative approaches (for pharmaceutical or vaccination interventions) or who would be eligible for lung cancer screening despite not conforming to current smoking-related selection criteria.

[00459] Selecting proteins by bootstrapping differential expression, 425 and 493 proteins were identified in the 1-3Y and 1-5Y cohorts respectively, and many of these proteins were associated with haemolysis. Haemolysis-associated proteins would give potential false positive signals if any healthy samples were haemolysed, and it is possible that haemolysis is more often seen in lung cancer patients than in healthy individuals. Any proteins associated with haemolysis were therefore removed, leaving 240 (1-3Y) and 267 (1-5Y) proteins (as identified in Table 13), with each panel combined in a z score to investigate relationships with clinical and epidemiological factors. No association was found with smoking (pack years or duration) or with a history of COPD; a negative association with age was seen for pre-diagnostic samples and controls for the 1-5Y z score only. Hence, the plasma proteins are not directly related to known risk factors for cancer, meaning they are more likely to provide additional useful information when used in conjunction with lung cancer risk scores and to be unrelated to smoking-induced inflammation. Furthermore, there was no association with stage of disease at diagnosis (apart from the 1-3Y z score association with early stage, albeit at 5-10 years pre-diagnosis, when not significantly different to control samples) and only a weak association with histological type specifically at 3-5 years before diagnosis. These results indicate that the identified proteomic signals are likely to be useful for prediction of any sub-type of non-small cell lung cancer, regardless of stage.

[00460] 240 plasma proteins differentially expressed 1-3 years prior to diagnosis and 267 proteins 1-5 years prior to diagnosis were identified, and 117 of the total 390 proteins (30%) were identified in both analyses. This result has significance as the plasma proteome can reflect not just the presence of an occult, pre-diagnosis tumour (with signals most likely closer to diagnosis), but also the immune response to pre-malignant disease and the biological response to inflammation associated with smoking and environmental factors (risk factors that are not necessarily higher at time of diagnosis). Furthermore, when mapped on to pathways by gene set enrichment analysis, there was limited overlap between the top pathways from 1-3Y and 1-5Y (only 21 pathways of 290 with significant enrichment), indicating that different biological pathways drive the signal for long-term and short-term risk. Pathway analysis provides valuable insight into potential biological mechanisms underpinning the differential expression, potentially providing insights into targets for preventative treatment for those at high risk of lung cancer. The Olink panels were curated to reflect specific pathways.
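The overlap figures quoted in this example can be cross-checked with simple arithmetic. The sketch below uses only counts stated in the text; the variable names are illustrative.

```python
# Protein panels after haemolysis filtering (counts from the text)
n_1_3y = 240      # proteins in the 1-3Y panel
n_1_5y = 267      # proteins in the 1-5Y panel
n_shared = 117    # proteins identified in both analyses

only_1_5y = n_1_5y - n_shared              # 150 proteins unique to 1-5Y
only_1_3y = n_1_3y - n_shared              # 123 proteins unique to 1-3Y
total_unique = n_1_3y + n_1_5y - n_shared  # 390 distinct proteins overall
shared_fraction = n_shared / total_unique  # 0.30, i.e. 30%

# Pathways with P<0.05: (1-5Y only, 1-3Y only, both)
up_pathways = 57 + 83 + 12     # 152 pathways for proteins higher in cases
down_pathways = 55 + 74 + 9    # 138 pathways for proteins lower in cases
shared_pathways = 12 + 9       # 21 pathways significant in both cohorts
total_pathways = up_pathways + down_pathways  # 290 pathways in total
```

The 150 proteins unique to the 1-5Y panel make up more than half of its 267 proteins, which matches the statement that over half of the 1-5Y proteins were not identified in the 1-3Y analysis.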

[00461] The z score based on the proteins selected from 1-5Y samples showed a greater differential expression at 3-5 years prior to diagnosis than that based on the 1-3Y proteins. Nevertheless, four different machine learning algorithms demonstrated that both the 1-3Y and 1-5Y proteins were able to predict lung cancer up to 5 years prior to diagnosis (AUCs of 0.76-0.90 for the 1-3Y models and 0.73-0.83 for the 1-5Y models). Remarkably, in the UK Biobank validation it was shown that either set of proteins was able to predict lung cancer to the same extent (AUC = 0.7) up to 12 years prior to diagnosis. It is important to note that this cancer prediction was exclusive to lung cancers, with other future cancers in the UK Biobank cohort not predicted, indicating that both the predisposing factors and the tumour-released proteome are likely distinctive for different tumours. Furthermore, in the UK Biobank validation, the predictive power was maintained to some extent in never smokers (AUC = 0.62) compared to smokers (AUC = 0.69) and was also predictive in those aged 40-55 years (AUC 0.78), who would not usually be eligible for LDCT lung cancer screening; there was also some evidence that it performed better in males (AUC 0.72) than females (AUC 0.66). It is therefore possible that plasma proteome biomarkers might help to expand lung cancer prediction risk scores for better utility within groups currently excluded from the benefit of LDCT screening. However, this would need to be tested in larger populations of younger subjects and never smokers, as these groups are under-represented in most lung cancer cohorts.

[00462] Looking at longitudinal samples, the combined z score for the 1-3Y proteins rises significantly towards diagnosis. However, for the 1-5Y proteins, differences extend to earlier in disease progression and the levels of some proteins were not increased to as great an extent closer to diagnosis. This indicates that they may represent markers of risk, being indicative of either genetic predisposition or smoking-related damage, rather than being tumour-released or tumour-reactive proteins. Risk biomarkers, rather than being used for early diagnosis, may allow one to identify those who would benefit most from preventative measures, including therapeutic prevention. For example, inflammation has been shown to be a potential target: post-hoc analysis of the CANTOS trial of Canakinumab (an anti-interleukin-1β monoclonal antibody), for prevention of recurrent vascular events in patients with a persistent pro-inflammatory response, demonstrated a protective effect on lung cancer incidence and mortality, although subsequent trials in treatment of existing cancers have so far proved inconclusive.

[00463] Plasma proteins have been shown to provide a means to predict those most at risk of future lung cancer. Similarly, the models could be considered as candidates for inclusion in risk profiling for LDCT screening, or for expedited referral of symptomatic patients.

[00464] This example demonstrated that some proteins are associated with longer-term risks, rather than increasing closer to diagnosis (and presumably either being tumour-released or indirectly associated with tumour burden).

[00465] In conclusion, the plasma proteome analysis, performed on pre-diagnostic samples from lung cancer patients and lung cancer free controls, identified two partially overlapping panels of proteins from samples 1-3 years or 1-5 years prior to cancer. These panels mapped to predominantly different pathways, but both were predictive for lung cancer on internal and external validation. That samples further from diagnosis displayed different patterns of predictive plasma proteins may indicate that they reflect biological risk, rather than tumour-associated changes. The latter are nevertheless significant in both panels, the combined z scores of which are highest at diagnosis.

[00466] The results show that for samples 1-3 years pre-diagnosis, 240 proteins were significantly different in cases; for 1-5 year samples, 117 of these and 150 further proteins were identified, mapping to significantly different pathways. Four machine learning algorithms gave median AUCs of 0.76-0.90 and 0.73-0.83 for the 1-3 year and 1-5 year proteins respectively. External validation gave AUCs of 0.75 (1-3 year) and 0.69 (1-5 year), with AUC 0.7 up to 12 years prior to diagnosis. The models were independent of age, smoking duration, cancer histology and the presence of COPD.

[00467] The findings in this example confirmed the predictive power of plasma protein profiling for prediction of future lung cancer diagnosis, identifying potential protein biomarkers for early detection. That biomarker proteins selected using longer pre-diagnostic time points partially overlap those selected using samples from later time points, and represent different molecular pathways, suggests that both biomarkers for inherent cancer risk and occult tumor detection can be identified. This is further supported by the differing longitudinal levels across multiple time points, including at diagnosis.

Table 1: Identification of biomarkers in Olink® Target 96 WP2 platform

Table 2: Identification of biomarkers in “1-5Y” prediction models in Olink® Explore 3072 Platform

Table 3: Identification of biomarkers in “1-3Y” prediction models in Olink® Explore 3072 Platform

Table 4: Model performance using Olink® Target 96 platform

Table 5: Model performance for “1-5Y” prediction models in Olink® Explore 3072 platform

Table 6: Model performance for “1-3Y” prediction models in Olink® Explore 3072 platform

Table 7: LLP Cohorts used for 1-3 year and 1-5 year discovery

Columns headed 1-3Y refer to cases 1-3 years prior to diagnosis; columns headed 1-5Y refer to cases 1-5 years prior to diagnosis.

| Characteristic | 1-3Y Cancer | 1-3Y Control | 1-3Y Total | 1-3Y P value (test)* | 1-5Y Cancer | 1-5Y Control | 1-5Y Total | 1-5Y P value (test)* |
|---|---|---|---|---|---|---|---|---|
| Sex n (%): Female | 14 (35.0) | 39 (38.2) | 53 (37.3) | χ² 0.13 | 27 (36.0) | 77 (41.4) | 104 (39.8) | χ² 0.65 |
| Sex n (%): Male | 26 (65.0) | 63 (61.8) | 89 (62.7) | P=0.72 (CS) | 48 (64.0) | 109 (58.6) | 157 (60.2) | 0.42 (CS) |
| Age (years), median (IQR) | 69.5 (62.3-74.2) | 70.1 (62.0-74.3) | 69.8 (62.0-74.2) | 0.96 (MW) | 68.3 (62.0-73.3) | 68.2 (61.9-73.2) | 68.1 (62.0-73.2) | 0.88 |
| Smoking status n (%): current | 11 (27.5) | 38 (37.3) | 49 (34.5) | χ² 1.08 | 27 (36.0) | 74 (39.8) | 101 (38.7) | |
| Smoking status n (%): former | 27 (67.5) | 61 (59.8) | 88 (62.0) | P=0.58 | 43 (57.3) | 104 (55.9) | 147 (56.3) | |
| Smoking status n (%): never | 1 (2.5) | 3 (2.9) | 4 (2.8) | (CS) | 2 (2.7) | 8 (4.3) | 10 (3.8) | |
| Smoking status n (%): unknown | 1 (2.5) | 0 (0) | 1 (0.7) | | 3 (4.0) | 0 (0) | 3 (1.1) | |
| Smoking duration (years), median (IQR) | 44 (33-48) | 43 (35-50) | 43 (34-49) | 0.47 (MW) | 44 (34-49) | 44 (35-49) | 44 (35-49) | 0.76 (MW) |
| Smoking pack years, median (IQR) | 43.5 (25.0-51.5) | 39.8 (22.7-53.8) | 39.9 (24.6-52.8) | 0.68 (MW) | 41.3 (25.5-51.8) | 37.5 (21.8-49.2) | 38.4 (23.3-50.4) | 0.19 |
| Smoking quit years, median (IQR) | 0 (0-10) | 2 (0-12.3) | 0 (1-11.5) | 0.75 (MW) | 0 (0-10) | 0 (0-9) | 0 (0-8) | |
| COPD n (%): Yes | 9 (22.5) | 18 (17.6) | 27 (19.0) | χ² 0.44 | 16 (21.3) | 33 (17.7) | 49 (18.8) | |
| COPD n (%): No | 31 (77.5) | 84 (82.4) | 115 (81.0) | P=0.51 (CS) | 59 (78.7) | 153 (82.3) | 212 (81.2) | |
| Body Mass Index, median (IQR) | 26.6 (26.2-29.3) | 26.5 (24.3-28.1) | 26.6 (24.6-28.2) | 0.47 (MW) | 26.6 (24.8-27.4) | 26.6 (24.5-28.1) | 26.6 (24.5-28.1) | 0.86 (MW) |
| Total subjects | 40 | 102 | 142 | | 75 | 186 | 261 | |
| Plasma samples | 58 | 117 | 175 | | 114 | 220 | 334 | |

IQR = Inter-quartile range; * CS = Chi-square; MW = Mann-Whitney (tests only performed for known values)

Table 8: Validation of 1-5Y lung cancer prediction model in UK Biobank data

| Subgroup | PPV at sensitivity of 0.05 | PPV at sensitivity of 0.10 | PPV at sensitivity of 0.25 | Enrichment at 0.05 | AUC | Population size | Cases | Prevalence in subgroup |
|---|---|---|---|---|---|---|---|---|
| Smoker | 47.4 | 37.1 | 21.7 | 5.6 | 0.693 | 4235 | 356 | 8.41 |
| Non-smoker | 7.7 | 8.1 | 6.6 | 3.9 | 0.615 | 1654 | 33 | 2 |
| Age 40-55 y | 100 | 62.5 | 27.9 | 39 | 0.775 | 1913 | 49 | 2.56 |
| Age 55-70 y | 30.4 | 31.5 | 21.3 | 3.5 | 0.683 | 3979 | 343 | 8.62 |
| Male | 55.6 | 29.9 | 20.2 | 7.8 | 0.721 | 2878 | 204 | 7.09 |
| Female | 31 | 31.7 | 17.6 | 5.0 | 0.663 | 3014 | 188 | 6.24 |
| Total | 40.8 | 30 | 19.1 | 6.1 | 0.694 | 5892 | 392 | 6.65 |

PPV = positive predictive value; AUC = Area under Curve ROC value
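The enrichment values in Table 8 appear consistent with the positive predictive value at 0.05 sensitivity divided by the prevalence in the subgroup. The text does not define enrichment, so that reading is an inference from the numbers. The sketch below checks it against each row; the dictionary layout and names are illustrative.

```python
# Per subgroup: (PPV % at 0.05 sensitivity, prevalence %, reported enrichment),
# copied from Table 8
table_8 = {
    "Smoker":      (47.4, 8.41, 5.6),
    "Non-smoker":  (7.7, 2.0, 3.9),
    "Age 40-55 y": (100.0, 2.56, 39.0),
    "Age 55-70 y": (30.4, 8.62, 3.5),
    "Male":        (55.6, 7.09, 7.8),
    "Female":      (31.0, 6.24, 5.0),
    "Total":       (40.8, 6.65, 6.1),
}

# Assumed definition: enrichment = PPV / prevalence
computed = {name: ppv / prevalence
            for name, (ppv, prevalence, _) in table_8.items()}

# Largest gap between the computed ratio and the reported value
max_gap = max(abs(computed[name] - reported)
              for name, (_, _, reported) in table_8.items())
```

On this reading, an enrichment of 39 in the 40-55 year age group means that subjects flagged at that operating point were about 39 times more likely to be future cases than the 2.56% baseline for the subgroup.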

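Table 9: Stage and histology distribution of discovery cohort and all lung cancer cases (including longitudinal)

| Cohort | Stage | AdC | NSCLC NOS | SqC | Total |
|---|---|---|---|---|---|
| Discovery cohort | IA | 8 | 0 | 4 | 13 |
| Discovery cohort | IB | 3 | 0 | 5 | 6 |
| Discovery cohort | IIA | 4 | 0 | 3 | 7 |
| Discovery cohort | IIB | 1 | 0 | 3 | 3 |
| Discovery cohort | Early NOS | 0 | 0 | 4 | 4 |
| Discovery cohort | Early total | | | | 33 (46%) |
| Discovery cohort | IIIA | 4 | 1 | 4 | 9 |
| Discovery cohort | IIIB | 3 | 0 | 1 | 4 |
| Discovery cohort | IV | 8 | 4 | 4 | 17 |
| Discovery cohort | Late NOS | 5 | 1 | 3 | 9 |
| Discovery cohort | Late total | | | | 39 (54%) |
| Discovery cohort | no stage | 2 | 0 | 1 | 3 |
| Discovery cohort | Total | 39 | 6 | 30 | 75 |
| All lung cancer cases | IA | 10 | 0 | 7 | 17 |
| All lung cancer cases | IB | 5 | 0 | 5 | 10 |
| All lung cancer cases | IIA | 6 | 0 | 7 | 13 |
| All lung cancer cases | IIB | 1 | 0 | 4 | 5 |
| All lung cancer cases | Early NOS | 0 | 1 | 5 | 6 |
| All lung cancer cases | Early total | | | | 51 (42%) |
| All lung cancer cases | IIIA | 8 | 2 | 6 | 16 |
| All lung cancer cases | IIIB | 3 | 1 | 4 | 8 |
| All lung cancer cases | IV | 16 | 5 | 7 | 28 |
| All lung cancer cases | Late NOS | 7 | 2 | 10 | 19 |
| All lung cancer cases | Late total | | | | 71 (58%) |
| All lung cancer cases | no stage | 2 | 1 | 2 | 5 |
| All lung cancer cases | Total | 58 | 12 | 57 | 127 |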

Table 10: Longitudinal sample distribution, by number of samples analysed for cases and by stage at diagnosis; matched sample at each time point from 1 control per case were also analysed.

Time of sample relative to diagnosis:

| | 5-10 years | 3-5 years | 1-3 years | At diagnosis | Total |
|---|---|---|---|---|---|
| 4 samples | 4 | 4 | 7 | 5 | 20 |
| 3 samples | 10 | 12 | 10 | 7 | 39 |
| 2 samples | 19 | 14 | 4 | 11 | 48 |
| Total samples | 33 | 30 | 21 | 23 | 107 |
| Early stage cases | 16 | 8 | 8 | 13 | 19 |
| Late stage cases | 16 | 15 | 7 | 10 | 22 |
| Unknown stage cases | 1 | 0 | 1 | 0 | 1 |
| Total cases | 33 | 23 | 16 | 23 | 42 |

Table 11: Linear model correlation of Olink Explore 3072 with haemolysis. 1112 haemolysis associated proteins

Table 12: UK Biobank demographics for lung cancer cases and selected cancer-free controls

| Characteristic | Cancer | Controls | Overall | P value (test)* |
|---|---|---|---|---|
| Sex n (%) | | | | χ² 1.3 |
| Female | 188 (48.0) | 2826 (51.4) | 3014 (51.2) | 0.25 |
| Male | 204 (52.0) | 2674 (48.6) | 2878 (48.8) | (CS) |
| Age (years) | | | | |
| Mean (SD) | 62.2 (6.09) | 57.6 (7.80) | 57.9 (7.78) | <0.00001 |
| Median [IQR] | 64.0 [59 - 67] | 58 [52 - 64] | 59 [52 - 65] | (MW) |
| Smoking Status n (%) | | | | |
| Never | 33 (8.4) | 1621 (29.5) | 1654 (28.1) | χ² 76.1 |
| Current or Former | 356 (90.8) | 3879 (70.5) | 4235 (71.9) | <0.00001 |
| Missing | 3 (0.8) | 0 (0) | 3 (0.1) | (CS) |
| Smoking pack years* | | | | |
| Mean (SD) | 38.9 (25.7) | 22.3 (17.9) | 24.3 (19.8) | <0.00001 |
| Median [IQR] | 34.5 [21.0 - 48.6] | 18.0 [9.4 - 30.5] | 19.5 [10, 64] | (MW) |
| Total | 392 | 5500 | 5892 | |

* Pack-year data only given for known non-zero values

Table 13: Plasma proteins differentially expressed in 1-3Y and 1-5Y samples, with direction of change P value and FDR

Table 14: Correlations between protein relative levels from Olink Target 96 platform and the Olink Explore platform

Table 15: Validation of 1-5Y lung cancer prediction model in UK Biobank data

| Histological subtype | PPV (%) at sensitivity of 0.05 | PPV (%) at sensitivity of 0.10 | PPV (%) at sensitivity of 0.25 | Enrichment at 0.05 sensitivity | AUC | Population size | Cases | Prevalence in subgroup |
|---|---|---|---|---|---|---|---|---|
| Adenocarcinoma | 18.6 | 9.6 | 4.6 | 7.7 | 0.652 | 6440 | 157 | 2.43 |
| Non-small cell carcinoma | 12.5 | 4 | 1.4 | 19.8 | 0.713 | 6323 | 40 | 0.63 |
| Small cell carcinoma | 14.3 | 10.3 | 4 | 13.8 | 0.692 | 6349 | 66 | 1.04 |
| Squamous cell carcinoma | 11.6 | 8.7 | 4.2 | 7.5 | 0.697 | 6382 | 99 | 1.55 |
| Carcinoid | 6.7 | 2.2 | 0.7 | 18.6 | 0.621 | 6306 | 23 | 0.36 |
| Unspecified | 17.6 | 5.5 | 3.3 | 17.8 | 0.718 | 6346 | 63 | 0.99 |
| Large cell carcinoma | 6.7 | 6.7 | 1 | 30.5 | 0.683 | 6297 | 14 | 0.22 |

Table 16: Pathway enrichment for 1-3Y and 1-5Y proteins - upregulated proteins

Table 17: Pathway enrichment for 1-3Y and 1-5Y proteins - downregulated proteins