Title:
METHODS OF DIAGNOSIS AND THERAPEUTIC TARGETING OF CLINICALLY INTRACTABLE MALIGNANT TUMORS
Document Type and Number:
WIPO Patent Application WO/2017/201497
Kind Code:
A1
Abstract:
The present disclosure is directed to methodologies or technologies for generating a predictor of a disease state (e.g., cancer-therapy efficacy status, cancer therapy progress, cancer prognosis, cancer diagnosis, therapy failure, relapse, recurrence, and the like) based on genomic and proteomic signatures, gene expression, and pathway and network activation of endogenous human stem cell-associated retroviruses (SCAR). This disclosure is also directed to methods of targeting, designing, and using treatments for clinically intractable malignant tumors.

Inventors:
GLINSKII GUENNADI V (US)
KELTNER LLEW (US)
Application Number:
PCT/US2017/033678
Publication Date:
November 23, 2017
Filing Date:
May 19, 2017
Assignee:
ONCOSCAR LLC (US)
International Classes:
C12Q1/68; C12Q1/70; G16B25/10; G16B40/00
Domestic Patent References:
WO2007114896A2    2007-10-11
WO2016011558A1    2016-01-28
Foreign References:
EP1365034A2    2003-11-26
US20150088432A1    2015-03-26
US8349555B2    2013-01-08
US7890267B2    2011-02-15
Attorney, Agent or Firm:
AOKI, Margie, B. (US)
Claims:

What is claimed is:

1. A method for diagnosing cancer or predicting cancer-therapy outcome in a subject, comprising: generating target marker information responsive to one or more inputs indicative of a genomic signature pathway and one or more inputs indicative of a proteomic signature pathway of endogenous human Stem Cell-Associated Retroviruses (SCAR); and

generating aberrant object information responsive to comparing detected expression levels and sequence information of a biological sample with target marker information.

2. The method of claim 1, wherein generating the target marker information includes generating target marker information responsive to one or more inputs indicative of a SCARs pathway.

3. The method of claim 1, wherein generating the target marker information includes generating target marker information responsive to one or more inputs indicative of a SCARs pathway target gene.

4. The method of claim 1, wherein generating the target marker information includes generating target marker information associated with one or more of ELF3; PCDH15; MALAT1; PTPN11; RB1; CHST6; NF1; VEZF1; TP53; SMAD4; KEAP1; STK11; PRX; ZNF28; IDH1; FEZ2; DPPA2; LPHN3; KIAA1244; EPHA7; EGFR; TLR4; DAB2IP; NOTCH1; GLUD2; DMD; KDM6A; KRAS; CDKN2A; DNMT3A; FLT3; NFE2L2; NPM1; MIR142; FOXL2; H3F3A; H3F3B; KMT2D; RNF43; TERT; ERBB2; PLCG1.

5. The method of claim 1, wherein generating the target marker information includes generating target marker information associated with one or more of mRNA, RNA, DNA, peptide or protein.

6. The method of claim 1, wherein generating the target marker information includes generating target marker information associated with one or more of PLCXD1, HKR1, ZNF283, ADA, AMACR+p63, ANK3, BCL2L1, BIRC5, BMI-1, BUB1, CCNB1, CCND1, CES1, CHAF1A, CRIP1, CRYAB, ESM1, EZH2, FGFR2, FOS, Gbx2, HCFC1, IER3, ITPR1, JUNB, KLF6, Ki67, KNTC2, MGC5466, Phc1, RNF2, Suz12, TCF2, TRAP100, USP22, Wnt5A and ZFP36.

7. The method of claim 1, wherein generating the aberrant object information includes generating aberrant sequence information when a quality of a sequence associated with the biological sample is distinct as compared with one or more reference sequences.

8. The method of claim 1, wherein generating the aberrant object information includes generating aberrant sequence information responsive to one or more inputs indicative of a distinct positioning of a plurality of bases within an entire sequence associated with the biological sample, as compared with one or more reference sequences.

9. The method of claim 1, wherein generating the aberrant object information includes generating aberrant sequence information responsive to one or more inputs indicative of a distinct fragment of a sequence associated with the biological sample, as compared with one or more reference sequences.

10. The method of claim 1, wherein generating the aberrant object information includes generating aberrant expression level information responsive to one or more inputs indicative of when an expression level exceeds a target threshold.

11. The method of claim 1, wherein generating the aberrant object information includes determining an expression level aberrant score when a detected expression level is above a target threshold.

12. The method of claim 1, wherein generating the aberrant object information includes determining a sequence aberrant score when a detected positioning of a plurality of bases associated with the biological sample is distinct compared with one or more reference sequences.
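The scoring logic recited in claims 10 to 12 can be illustrated with a minimal sketch. It is not the applicant's implementation: the function names, the per-marker threshold table, and the reading of "distinct compared with one or more reference sequences" as "identical to none of the references" are assumptions made for illustration, and the numeric values are invented.

```python
def score_expression(levels: dict[str, float],
                     thresholds: dict[str, float]) -> dict[str, bool]:
    """Flag each marker whose detected level is above its target threshold.

    Markers without a configured threshold are skipped (an assumption).
    """
    return {
        marker: level > thresholds[marker]
        for marker, level in levels.items()
        if marker in thresholds
    }


def score_sequence(sample_seq: str, reference_seqs: list[str]) -> bool:
    """Flag a sample sequence as aberrant when it matches no reference.

    Assumption: "distinct" is read as exact inequality of base positions.
    """
    return all(sample_seq != ref for ref in reference_seqs)


# Hypothetical inputs; EZH2 and BUB1 are marker names listed in claim 6.
levels = {"EZH2": 3.2, "BUB1": 0.4}
thresholds = {"EZH2": 2.0, "BUB1": 1.0}
expression_flags = score_expression(levels, thresholds)
sequence_flag = score_sequence("ACTT", ["ACGT", "ACGA"])
```

A real pipeline would compare aligned reads rather than raw strings; exact string comparison is used here only to keep the sketch self-contained.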

13. The method of claim 1, wherein generating the aberrant object information includes determining a sequence aberrant score responsive to one or more inputs from next generation sequencing, multicolor quantitative immunofluorescence co-localization analysis, fluorescence in situ hybridization, and quantitative RT-PCR analysis.

14. The method of claim 1, wherein generating the aberrant object information includes determining a threshold level by comparing reference information derived from samples obtained from biological subjects with known diagnosis or known clinical outcome after therapies.

15. The method of claim 14, further comprising:

generating a cancer-therapy efficacy status, a cancer therapy progress, a cancer prognosis, or a cancer diagnosis responsive to one or more inputs indicative of an aberrant expression and an expression level above a target threshold coefficient of at least two markers.

16. The method of claim 1, wherein generating the aberrant object information includes generating aberrant sequence information and marker co-expression level information.

17. The method of claim 1, further comprising:

generating a cancer-therapy efficacy status responsive to one or more inputs indicative of an aberrant sequence and a threshold marker co-expression level.

18. The method of claim 1, further comprising:

generating information indicative of the presence or absence of cancer in a biological subject responsive to one or more inputs indicative of an aberrant sequence and a threshold marker co-expression level.

19. A system for diagnosing cancer or predicting cancer-therapy outcome in a subject, comprising:

circuitry configured to generate target marker information responsive to one or more inputs indicative of a genomic signature pathway and one or more inputs indicative of a proteomic signature pathway of endogenous human Stem Cell-Associated Retroviruses (SCAR); and

circuitry configured to generate aberrant object information responsive to comparing at least one input indicative of an expression level and at least one input indicative of a sequence of a biological sample with target marker information.

20. The system of claim 19, further comprising:

circuitry configured to generate information indicative of the presence or absence of cancer in a biological subject responsive to one or more inputs indicative of an aberrant sequence and a threshold marker co-expression level.

21. The system of claim 19, further comprising:

circuitry configured to generate a cancer-therapy efficacy status, a cancer therapy progress, a cancer prognosis, or a cancer diagnosis responsive to one or more inputs indicative of an aberrant expression and an expression level above a target threshold coefficient of at least two markers.

22. The system of claim 19, further comprising:

circuitry configured to generate a cancer-therapy efficacy status responsive to one or more inputs indicative of an aberrant sequence and a threshold marker co-expression level.

23. A system for treating cancer, comprising:

circuitry configured to acquire information associated with a Stem Cell-Associated Retroviruses (SCAR) pathway activation in a subject diagnosed with cancer; and

circuitry configured to identify a single therapeutic agent or a combination of therapeutic agents and to generate a user-specific treatment protocol responsive to one or more inputs associated with a Stem Cell-Associated Retroviruses (SCAR) pathway activation in a subject diagnosed with cancer.

24. A method for diagnosing cancer or predicting cancer-therapy outcome in a subject, comprising:

concurrently screening a biological sample for a presence of an aberrant sequence and an aberrant expression level of one or more target markers associated with a pathway involving genomic and proteomic signatures of endogenous human Stem Cell-Associated Retroviruses (SCAR);

scoring a sequence associated with the biological sample as aberrant when the quality of the sequence is distinct compared with a reference sequence; and

scoring an expression level associated with the biological sample as being aberrant when a detected expression level is above a target threshold coefficient.

25. The method of claim 24, wherein concurrently screening a biological sample for a presence of an aberrant sequence and an aberrant expression level of one or more target markers associated with a pathway involving genomic and proteomic signatures of endogenous SCAR includes concurrently screening a biological sample for a presence of an aberrant sequence and an aberrant expression level of one or more target markers indicative of a cancer diagnosis or a prognosis for cancer-therapy failure in a biological subject.

26. The method of claim 25, further comprising:

generating a user-specific cancer therapy protocol responsive to one or more inputs indicative of an aberrant sequence or an aberrant expression level associated with a cancer diagnosis or a prognosis for cancer-therapy failure in a biological subject.

27. The method of claim 24, wherein concurrently screening a biological sample for a presence of an aberrant sequence and an aberrant expression level of one or more target markers associated with a pathway involving genomic and proteomic signatures of endogenous SCAR includes concurrently screening a biological sample for a presence of an aberrant sequence and an aberrant expression level of one or more target markers indicative of a progress of cancer therapy in a biological subject.

28. The method of claim 27, further comprising:

generating a user-specific cancer therapy protocol responsive to one or more inputs indicative of an aberrant sequence or an aberrant expression level associated with a progress of cancer therapy in a biological subject.

29. The method of claim 24, wherein the detection threshold is determined by comparison with the values in a reference database of samples obtained from subjects with known diagnosis or known clinical outcome after therapies, wherein the presence of an aberrant expression level of at least one, but preferably two or more, markers in the test sample and the presence of aberrant expression of two or more such markers is indicative of a cancer diagnosis or a prognosis for cancer-therapy failure, or of the progress of cancer therapy in the subject.

30. The method of claim 24, wherein the detection threshold is continuously refined by adding the outcome data of each patient tested to the reference database of samples, in an automated and/or recursive manner, either manually or using computational methods using data stored locally, in remote server(s), or in the cloud, thereby continuously improving the accuracy of diagnosis, prognosis, or specification of future cancer therapy.

31. The method of claim 24, wherein said sample phenotype is selected from the group consisting of cancer, non-cancer, recurrence, non-recurrence, relapse, non-relapse, invasiveness, non-invasiveness, metastatic, non-metastatic, localized, tumor size, tumor grade, Gleason score, survival prognosis, lymph node status, tumor stage, degree of differentiation, age, hormone receptor status, tumor antigen level (including but not limited to PSA level, PSMA level, survivin level, oncofetal protein level, testis antigen level), histologic type, level of, phenotype and genotype of, and activation status of immune cells, and disease free survival.

32. The method of claim 24, wherein said threshold coefficient has an absolute value > 0.5.

33. The method of claim 24, wherein said threshold coefficient has an absolute value > 0.6.

34. The method of claim 24, wherein said threshold coefficient has an absolute value > 0.7.

35. The method of claim 24, wherein said threshold coefficient has an absolute value > 0.8.

36. The method of claim 24, wherein said threshold coefficient has an absolute value > 0.9.

37. The method of claim 24, wherein said threshold coefficient has an absolute value > 0.95.

38. The method of claim 24, wherein said threshold coefficient has an absolute value ≥ 0.99.

39. The method of claim 24, wherein said threshold coefficient has an absolute value ≥ 0.995.

40. The method of claim 24, wherein said threshold coefficient has an absolute value > 0.999.

41. A method of determining detection threshold for classifying a sample phenotype, comprising: identifying a subset of markers and scoring marker expression in cells according to the method of claim 24; and

determining the sample classification accuracy at different detection thresholds using a reference database of samples from subjects with known phenotypes.

42. The method of claim 41, further comprising determining the sample classification accuracy in an automated and/or recursive manner either manually or using computational methods using data stored either locally, in remote server(s), or in the cloud.

43. The method of claim 41, further comprising determining the best performing magnitude of said detection threshold and using said magnitude to assess the reliability of said established detection threshold in classifying a sample phenotype.

44. The method of claim 41, further comprising determining the best performing magnitude of said detection threshold and using said magnitude to assess the reliability of said established detection threshold in classifying a sample phenotype in an automated and/or recursive manner either manually or using computational methods using data stored either locally, in remote server(s), or in the cloud.

45. The method of claim 41, further comprising using the best performing magnitude of said detection threshold to score an unclassified sample and assign a sample phenotype to said sample.

46. The method of claim 41, further comprising using the best performing magnitude of said detection threshold to score an unclassified sample and assign a sample phenotype to said sample either manually or using computational methods using data stored either locally, in remote server(s), or in the cloud.

47. The method of claim 41, wherein said subset of markers consists essentially of the genes, genetic loci, and sequences identified in Table 1A, Table 1, Table 2, Table 3, Table S1, Table S3, Table S4, Table S5, Table S6, Data Set S1, Data Set S2, Data Set S3.

48. The method of claim 41, wherein said subset of markers consists essentially of 90% of the genes, genetic loci, and sequences identified in Table 1A, Table 1, Table 2, Table 3, Table S1, Table S3, Table S4, Table S5, Table S6, Data Set S1, Data Set S2, Data Set S3.

49. The method of claim 41, wherein said subset of markers consists essentially of 80% of the genes, genetic loci, and sequences identified in Table 1A, Table 1, Table 2, Table 3, Table S1, Table S3, Table S4, Table S5, Table S6, Data Set S1, Data Set S2, Data Set S3.

50. The method of claim 41, wherein said subset of markers consists essentially of 70% of the genes, genetic loci, and sequences identified in Table 1A, Table 1, Table 2, Table 3, Table S1, Table S3, Table S4, Table S5, Table S6, Data Set S1, Data Set S2, Data Set S3.

51. The method of claim 41, wherein said subset of markers consists essentially of 60% of the genes, genetic loci, and sequences identified in Table 1A, Table 1, Table 2, Table 3, Table S1, Table S3, Table S4, Table S5, Table S6, Data Set S1, Data Set S2, Data Set S3.

52. The method of claim 41, wherein said subset of markers consists essentially of 50% of the genes, genetic loci, and sequences identified in Table 1A, Table 1, Table 2, Table 3, Table S1, Table S3, Table S4, Table S5, Table S6, Data Set S1, Data Set S2, Data Set S3.

53. A method of treating cancer, comprising:

detecting a molecular signal(s) of SCAR's pathway activation in a subject diagnosed with cancer; and

generating a user-specific therapeutic treatment targeted to activated SCAR's loci and/or downstream SCARs-regulated genetic loci based on detecting the molecular signal(s) of SCAR's pathway activation.

54. The method of claim 53, wherein the user-specific therapeutic treatment is based on genome editing, including but not limited to CRISPR/Cas9 complex-mediated genome editing, to silence the defined genomic elements of the activated SCARs pathway.

55. The method of claim 53, wherein the user-specific therapeutic treatment is based on genome editing, including but not limited to CRISPR/Cas9 complex-mediated genome editing, to activate the defined genomic elements of the activated SCARs pathway.

56. The method of claim 53, wherein the user-specific therapeutic treatment is based on the application of Highly Active Anti-Retroviral Therapy (HAART).

57. The method of claim 53, wherein the user-specific therapeutic treatment is based on administration of the antiretroviral drug, Raltegravir (RAL, Isentress, formerly MK-0518).

58. The method of claim 53, wherein the user-specific therapeutic treatment is based on application of anti-sense therapy directed against transcriptionally active SCAR's loci and/or defined genomic elements of the activated SCARs pathway.

59. The method of claim 53, wherein the user-specific therapeutic treatment is based on the application of targeted immunotherapy, including at least one of antagonist antibodies or fragments thereof, agonist antibodies or fragments thereof, autologous cells, allogeneic cells, peptides, small molecules, signaling proteins or fragments thereof, or compositions containing two or more of the above and compositions containing in a single molecule or cellular therapy all or part of two or more of the above, directed against the proteins and/or peptides encoded by the activated SCARs sequences.

60. A method of treating cancer where the methods of claims 39 - 45 are used to enhance tumor infiltrating lymphocytes in tumors of treated subjects, either as a sole function or to augment the activity of anti-cancer modulators of the immune system.

Description:
METHODS OF DIAGNOSIS AND THERAPEUTIC TARGETING OF CLINICALLY INTRACTABLE MALIGNANT TUMORS

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Patent Application No. 62/339007, filed May 19, 2016, which is incorporated herein by reference in its entirety.

All of the following related applications are also incorporated by reference in their entireties: U.S. Provisional Application No. 60/875,061, filed on December 15, 2006; U.S. Provisional Application No. 60/823,577, filed on August 25, 2006; U.S. Provisional Application No. 60/822,705, filed on August 17, 2006; and U.S. Provisional Application No. 60/787,818, filed on March 31, 2006.

SUMMARY

In an aspect, the present disclosure is directed to, among other things, novel methods and kits for diagnosing the presence of cancer within a patient, for determining whether a subject who has cancer is susceptible to different types of treatment regimens, for monitoring the treatment of cancer within a patient, and provides novel methods of delivering cancer therapies, including individualized targeted cancer therapies. The cancers to be tested, monitored and treated include, but are not limited to, prostate, breast, lung, gastric, ovarian, bladder, lymphoma, mesothelioma, brain, liver, metastases of any of the above, and hematological cancers including but not limited to ALL, AML, and CCL. Identification of patients likely to be therapy-resistant early in their treatment regimen can lead to a change in therapy in order to achieve a more successful outcome.

In an aspect, the present disclosure is directed to, among other things, a method for diagnosing cancer or predicting cancer-therapy outcome by detecting the sequences and/or expression levels of multiple markers in the same cell at the same time, in a population of cells, or in a liquid biopsy specimen and scoring their sequences and/or expression as being qualitatively distinct or quantitatively different (above or below) in regard to a certain threshold, wherein the markers are from a particular pathway related to cancer, with the score being indicative of a cancer diagnosis or a prognosis for cancer-therapy failure. This method can be used to diagnose cancer or predict cancer-therapy outcomes for a variety of cancers. In an embodiment, the method includes determining whether an individual is experiencing SCAR's networks activation by using genetic signature information and protein signature information.

In an aspect, the present disclosure is directed to, among other things, novel methods of diagnosis and therapeutic targeting of clinically intractable malignant tumors based on identification and monitoring of genomic and proteomic signatures of endogenous human Stem Cell-Associated Retroviruses (SCAR), including early detection of cancer precursor lesions. The markers can come from any pathway involved in the regulation of cancer, including specifically the SCAR's pathway and the "stemness" pathway(s). The markers can be mRNA, RNA, DNA, protein, or peptide. In an aspect, the present disclosure is directed to, among other things, novel methods of designing and using treatments for clinically intractable malignant tumors based on genomic and proteomic signatures of endogenous human stem cell-associated retroviruses (SCAR). Non-limiting examples of technologies and methodologies for detection of nucleic acids, DNA, RNA, etc., with single base mismatch specificity include those described in J.S. Gootenberg et al., "Nucleic acid detection with CRISPR-Cas13a/C2c2," Science, doi:10.1126/science.aam9321, 2017; which is incorporated herein by reference in its entirety.

In an aspect, the present disclosure is directed to, among other things, methods and kits for diagnosing the presence of cancer within a patient, for determining whether a subject who has cancer is susceptible to different types of treatment regimens, for monitoring the treatment of cancer within a patient, and provides novel methods of delivering cancer therapies, including individualized targeted cancer therapies. The cancers to be tested, monitored and treated include, but are not limited to, prostate, breast, lung, gastric, ovarian, bladder, lymphoma, mesothelioma, brain, liver, metastases of any of the above, and hematological cancers including but not limited to ALL, AML, and CCL. In total, the potential practical utilities of the methods have been demonstrated for 29 distinct types of human cancer.

In an embodiment, a method includes concurrently or sequentially detecting a sequence of multiple markers, the expression levels of multiple markers in the same cell at the same time, in a population of cells, or in a liquid biopsy specimen, and scoring their sequence and/or expression as being aberrant, wherein the markers are from a particular pathway related to cancer, with the score being indicative of a cancer diagnosis or a prognosis for a likelihood of cancer-therapy failure. This method can be used to diagnose cancer or predict cancer-therapy outcomes for a variety of cancers. The simultaneous co-expression of at least one, but preferably two or more markers in the same cell, population of cells, or a liquid biopsy specimen from a subject is a diagnostic for cancer and a predictor for the subject to be resistant to standard cancer therapy. The markers can come from any pathway involved in the regulation of cancer, including specifically the SCAR's pathway, PcG pathway and the "stemness" pathway(s). The markers can be mRNA, RNA, DNA, protein, or peptide.

In an aspect, the present disclosure is directed to, among other things, a novel finding that the expression of multiple markers from the SCAR's pathway above a threshold level in the same cell at the same time, wherein the markers are found within pathways related to cancer, can be used as an assay to diagnose cancer and to predict whether a patient already diagnosed with cancer will be therapy-responsive or therapy-resistant. An element of the assay is that at least one, but preferably two or more markers are detected concurrently within the same cell, population of cells, or in a liquid biopsy specimen. Marker detection can be made through a variety of detection means, including next generation sequencing and bar-coding through immunofluorescence. The markers detected can be a variety of products, including mRNA, RNA, DNA, protein, and peptide. For mRNA, RNA, and DNA based markers, next generation sequencing and/or PCR can be used as a detection means. Additionally, nucleic acid sequence, protein sequence, protein products or gene copy number can be identified through detection means known in the art. The markers detected can be from a variety of pathways related to cancer. Suitable pathways for markers include any pathways related to oncogenesis and metastasis, and more specifically include the SCAR's pathway, Polycomb group (PcG) chromatin silencing pathway and the "stemness" pathway(s).

In an aspect, the present disclosure is directed to, among other things, a method for diagnosing cancer or predicting cancer-therapy outcome in a biological subject.

In an embodiment, the method includes obtaining a biological sample (e.g., tissue, a cell, a specimen of bodily fluid, biological fluid, biomarker composition, and the like) from the subject.

In an embodiment, the method includes selecting a marker from a pathway related to cancer.

In an embodiment, the method includes screening for simultaneous aberrant sequences and/or expression level of at least one but preferably, two or more markers.

In an embodiment, the method includes scoring their sequence(s) as being aberrant when the quality of the sequence (the defined sequence of the positions of the bases within an entire sequence or its fragment) is distinct compared with the reference sequences.

In an embodiment, the method includes scoring their expression level as being aberrant when the expression level detected is above a certain threshold.

In an embodiment, the presence of an aberrant sequence and/or an aberrant expression level of at least one but preferably, two or more such markers is indicative of a cancer diagnosis or a prognosis for cancer-therapy failure in the subject.
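The scoring steps above can be illustrated with a minimal Python sketch. It is not the applicants' implementation: the `MarkerReading` record, the function names, the single shared threshold, the `min_markers=2` default (mirroring "preferably, two or more"), and all numeric values are assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class MarkerReading:
    """Hypothetical record for one marker measured in one sample."""
    name: str
    expression: float      # measured expression level (arbitrary units)
    sequence: str = ""     # observed sequence, if the marker was sequenced
    reference: str = ""    # reference sequence to compare against, if any


def is_aberrant(reading, threshold):
    """Aberrant if expression exceeds the threshold, or if a reference
    sequence is supplied and the observed sequence differs from it."""
    over_threshold = reading.expression > threshold
    sequence_differs = bool(reading.reference) and reading.sequence != reading.reference
    return over_threshold or sequence_differs


def score_sample(readings, threshold, min_markers=2):
    """Return the aberrant markers and whether enough of them were found
    to flag the sample (min_markers is an illustrative assumption)."""
    aberrant = [r.name for r in readings if is_aberrant(r, threshold)]
    return {"aberrant_markers": aberrant, "flagged": len(aberrant) >= min_markers}


# Illustrative values only; these are not measurements from the disclosure.
readings = [
    MarkerReading("BMI-1", 2.4),
    MarkerReading("EZH2", 1.9),
    MarkerReading("TP53", 0.3, sequence="ACGT", reference="ACGA"),
]
result = score_sample(readings, threshold=1.5)
```

In practice each marker would likely need its own threshold derived from a reference database, which this sketch deliberately omits.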

In an embodiment, an aberrant sequence and/or co-expression level of the markers can be indicative of the presence of cancer in the subject, or predictive of cancer-therapy failure in the subject. The markers can be selected from any suitable cancer pathway, including in preferred embodiments markers from the SCAR's or "stemness" pathway(s). For aberrant sequences detection, these markers can be genes selected from the group consisting of ELF3; PCDH15; MALAT1; PTPN11; RB1; CHST6; NF1; VEZF1; TP53; SMAD4; KEAP1; STK11; PRX; ZNF28; IDH1; FEZ2; DPPA2; LPHN3; KIAA1244; EPHA7; EGFR; TLR4; DAB2IP; NOTCH1; GLUD2; DMD; KDM6A; KRAS; CDKN2A; DNMT3A; FLT3; NFE2L2; NPM1; MIR142; FOXL2; H3F3A; H3F3B; KMT2D; RNF43; TERT; ERBB2; PLCG1. For aberrant expression detection, these markers can be genes selected from the group consisting of PLCXD1, HKR1, ZNF283, ADA, AMACR÷p63, ANK3, BCL2L1, BIRC5, BMI-1, BUB1, CCNB1, CCND1, CES1, CHAF1A, CRIP1, CRYAB, ESM1, EZH2, FGFR2, FOS, Gbx2, HCFC1, IER3, ITPR1, JUNB, KLF6, KI67, KNTC2, MGC5466, Phc1, RNF2, Suz12, TCF2, TRAP100, USP22, Wnt5A and ZFP36. In preferred embodiments, the markers are selected from the group consisting of regulatory and down-stream genetic elements of the SCAR's pathway(s), transcription factors, and methylation patterns. In one preferred embodiment, the aberrant sequence(s) being detected and in another preferred embodiment the aberrant co-expression level being detected is of regulatory and downstream genetic elements of the SCAR's pathway(s), transcription factors, and methylation patterns. The markers being detected are in the form of either mRNA, RNA, DNA, protein, or peptide.

In an embodiment, the aberrant expression level of at least one but preferably, two or more markers can be detected by any detection means known in the art, including, but not limited to, subjecting the cells to an analysis selected from the group consisting of next generation sequencing, multicolor quantitative immunofluorescence co-localization analysis, fluorescence in situ hybridization, and quantitative RT-PCR analysis.

In an aspect, the present disclosure is directed to, among other things, a method for concurrently detecting an aberrant sequence(s) and/or co-expression level of at least one but preferably, two or more markers in a single cell, population of cells, or liquid biopsy samples. In an embodiment, the method includes obtaining a sample of tissue, a cell, or a specimen of bodily fluid. In an embodiment, the method includes selecting a marker defined by a pathway. In an embodiment, the method includes screening for simultaneous aberrant sequences and/or expression level of at least one but preferably, two or more markers. In an embodiment, the method includes scoring their sequence(s) as being aberrant when the quality of the sequence (the sequence of the positions of the bases within an entire sequence or its fragment) is distinct compared with the reference sequences. In an embodiment, the method includes scoring their expression level as being aberrant when the expression level detected is above a certain threshold.

In an aspect, the present disclosure is directed to, among other things, a method for detecting at least one of an aberrant sequence(s) and/or co-expression level of at least one but preferably, two or more markers in a single cell, population of cells, or liquid biopsy samples. In an embodiment, the method includes obtaining a sample of tissue, a cell, or a specimen of bodily fluid. In an embodiment, the method includes selecting a marker defined by a pathway. In an embodiment, the method includes screening for simultaneous aberrant sequences and/or expression level of at least one but preferably, two or more markers. In an embodiment, the method includes scoring their sequence(s) as being aberrant when the quality of the sequence (the sequence of the positions of the bases within an entire sequence or its fragment) is distinct compared with the reference sequences. In an embodiment, the method includes scoring their expression level as being aberrant when the expression level detected is above a certain threshold.

In an aspect, the present disclosure is directed to, among other things, kits useful in detecting concurrently aberrant sequences or co-expression levels of two or more markers in a single cell, population of cells, or liquid biopsy samples. In an aspect, the present disclosure is directed to, among other things, kits useful in detecting at least one of aberrant sequences or co-expression levels of two or more markers in a single cell, population of cells, or liquid biopsy samples.

In an aspect, the present disclosure is directed to, among other things, a method of targeted therapy of malignant tumors which harbor the molecular markers selected from any suitable cancer pathway, including in preferred embodiments markers from the SCAR's or "stemness" pathway(s). Therapeutic targeting of said malignant tumors is guided by the markers being detected in the form of either mRNA, RNA, DNA, protein, or peptide. In preferred embodiments, therapeutic modalities are designed toward molecular targets selected from the group consisting of regulatory SCARs loci and down-stream genetic elements of the SCAR's pathway(s).

The present disclosure details one or more methodologies or technologies for diagnosing cancer, predicting cancer-therapy outcome, determining whether a subject who has cancer is susceptible to different types of treatment regimens, monitoring the efficacy of a cancer treatment, determining a cancer diagnosis or a prognosis for cancer-therapy failure, and the like by detecting the sequences, expression levels, gene levels, transcription levels, and the like for multiple markers.

In an embodiment, one or more methodologies or technologies for diagnosing untreatable cancer (e.g., one with activated endogenous human Stem Cell-Associated Retroviruses (SCAR) network) include one or more of detecting mutations of the sequences of 42 genes (listed in Table S1); analyzing transcription levels of specific SCAR sequences; analyzing levels of protein sequences; analyzing expression levels in signatures, determining gene expression levels and determining gene copy numbers of Data Set S1, Data Set S2, and Data Set S3.

For example, in an embodiment, methodologies or technologies include generating a user-specific cancer therapy protocol, or a user-specific cancer diagnosis, responsive to receiving one or more inputs indicative of an aberrant sequence or an aberrant expression level associated with the expression levels of one or more locus or loci listed in Table 3.3. Non-limiting examples of genomic signature pathways, signature evaluation method, and the like can be found in U.S. Patent Nos. 8,349,555 and 7,890,267; each of which is incorporated herein by reference in its entirety.

In an embodiment, methodologies or technologies include generating a predictor of a disease state (e.g., a cancer-therapy efficacy status, cancer therapy progress, a cancer prognosis, a cancer diagnosis, therapy failure, relapse, recurrence, and the like) responsive to receiving one or more inputs indicative of an aberrant expression level associated with the expression levels of one or more peptides listed in Supplemental Table S3.

In an embodiment, methodologies or technologies include generating a predictor of a disease state (e.g., a cancer-therapy efficacy status, cancer therapy progress, a cancer prognosis, a cancer diagnosis, therapy failure, relapse, recurrence, and the like) responsive to receiving one or more inputs indicative of the SCAR's pathway activation signatures for genes listed in Supplemental Table S4.

In an embodiment, methodologies or technologies include generating a SCARs activation status responsive to receiving one or more inputs indicative of an aberrant expression level associated with the expression levels of one or more locus or loci listed in Supplemental Table S5.

In an embodiment, methodologies or technologies include generating a predictor of a disease state (e.g., a cancer-therapy efficacy status, cancer therapy progress, a cancer prognosis, a cancer diagnosis, therapy failure, relapse, recurrence, and the like) responsive to receiving one or more inputs indicative of an aberrant expression level associated with the expression levels of one or more locus or loci listed in Supplemental Table S6.

In an embodiment, methodologies or technologies include generating a predictor of a disease state (e.g., a cancer-therapy efficacy status, cancer therapy progress, a cancer prognosis, a cancer diagnosis, therapy failure, relapse, recurrence, and the like) responsive to receiving one or more inputs indicative of an aberrant expression level or a gene copy number associated with the expression levels or the copy number of one or more locus or loci listed in Data Set S1.

In an embodiment, methodologies or technologies include generating a predictor of a disease state (e.g., a cancer-therapy efficacy status, cancer therapy progress, a cancer prognosis, a cancer diagnosis, therapy failure, relapse, recurrence, and the like) responsive to receiving one or more inputs indicative of an aberrant expression level associated with the expression levels of one or more sequences listed in Data Set S2.

In an aspect, the present disclosure is directed to, among other things, a method of identification of common peptide sequences encoded by the genomic loci derived from SCAR sequences. In an embodiment, the method includes retrieving nucleic acid sequences of the SCARs-derived genomic loci which are located at distinct genomic coordinates; and identifying all open reading frames (ORFs) within said nucleic acid sequences. In an embodiment, the method further includes identifying all peptide sequences encoded by and potentially transcribed from said nucleic acid sequences; and identifying peptide sequences common for distinct SCAR-derived genomic loci which are located at distinct genomic coordinates.
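As a rough illustration of the ORF and common-peptide steps, the following Python sketch scans all six reading frames of each locus and intersects the resulting peptide sets. It is a simplified sketch under stated assumptions, not the disclosed method: it uses the standard genetic code, reports only ORFs that run from the first ATG in a frame to an in-frame stop codon, ignores ORFs lacking a stop codon, and the locus names and toy sequences are invented for the example.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in _BASES for b in _BASES for c in _BASES), _AMINO)
)


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def orf_peptides(seq, min_len=2):
    """Collect peptides from ATG-to-stop ORFs in all six reading frames.
    min_len (in amino acids) is an illustrative filter, not a source value."""
    peptides = set()
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            current = None  # peptide being built, or None outside an ORF
            for i in range(frame, len(strand) - 2, 3):
                aa = CODON_TABLE.get(strand[i:i + 3], "X")  # X = unknown codon
                if current is None:
                    if aa == "M":
                        current = "M"
                elif aa == "*":
                    if len(current) >= min_len:
                        peptides.add(current)
                    current = None
                else:
                    current += aa
    return peptides


def common_peptides(loci):
    """Peptides encoded by every locus in a {locus_name: sequence} mapping."""
    peptide_sets = [orf_peptides(seq) for seq in loci.values()]
    return set.intersection(*peptide_sets) if peptide_sets else set()


# Toy sequences (hypothetical, not SCAR loci): both encode Met-Ala then stop.
loci = {"locusA": "CCATGGCTTGATT", "locusB": "ATGGCTTAAGG"}
shared = common_peptides(loci)
```

A production pipeline would more likely rely on an established sequence-analysis library and on shared subsequences rather than exact whole-peptide matches; exact matching is used here only to keep the sketch short.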

In an embodiment, methodologies or technologies include determining SCAR's networks activation using genetic signature information and protein signature information. In an embodiment, SCAR's networks activation information is used to generate a cancer outcome prognosis. For example, activated SCAR's networks are indicative of a poor cancer therapy outcome or a poor prognosis.

In an embodiment, methodologies or technologies include generating a cancer related outcome based on one or more inputs indicative of an aberrant sequence and one or more inputs indicative of an expression level of SCARs networks markers.

Non-limiting examples of SCAR's networks include a genome-wide compendium of: i) transcriptionally-active SCAR's loci defined based on detection of the expression of corresponding RNA molecules; and ii) expression signatures of down-stream SCARs-regulated coding genes, including protein-coding genes, genes encoding non-coding RNA molecules, micro-RNAs, and other regulatory & structural molecules affected by SCARs activity.

Non-limiting examples of a SCAR pathway include a sub-set of SCAR's loci that are transcriptionally active in specific cells and/or specific biological samples, including single cells as well as populations of cells.

SCAR's pathways: a sub-set of genomic loci defined by the genome-wide SCAR's networks analyses in specific cells and/or specific biological samples, including single cells as well as populations of cells.

Non-limiting examples of signatures include a 74-gene signature (referring to Table S4 for example), a 55-gene signature (referring to Table S4 for example), the SCAR's pathway signatures defined by the single cell analysis of human oocytes in which expression changes of these genes appear associated with activated transcription of HERV-H-derived retroviral sequences. The gene symbols are listed in the first column. These are coding genes whose expression is altered in a specific manner (up- and down-regulated) using an shRNA-interference protocol targeting HERV-H-encoded regulatory transcripts (the log-transformed fold expression changes are listed in the second column). Expression changes of these genes in human oocytes (the log-transformed fold-expression changes are listed in the third column) are consistent with the HERV-H-pathway activation (r = -0.74043); that is, genes whose expression is up-regulated following the shHERVH interference appear down-regulated in oocytes; conversely, genes whose expression is down-regulated following the shHERVH interference appear up-regulated in oocytes. The utility of these signatures has been demonstrated by the analyses of samples of normal and pathological human prostates, including prostate cancer samples and prostatic intraepithelial neoplasia samples (Figs. 1C & 2D). The fold expression changes of each of the individual genes listed in Table S4 would be determined using the technologies and methods known to individuals skilled in the art. The values for corresponding genes will be listed in the order defined in Table S4, as shown for the oocyte values listed in the third column. Next, the correlation coefficient is computed for the values listed in the second and the third columns. Negative values of the correlation coefficient should be interpreted as an indication of the SCAR's pathway activation. Positive values of the correlation coefficient would indicate no evidence of SCAR's pathway activation.
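The correlation-based readout described above can be sketched in a few lines of Python. The sketch assumes a Pearson correlation (the text says only "correlation coefficient"), and the function names, the optional `min_abs_r` cutoff, and the fold-change values are hypothetical; real inputs would be the second-column reference values from Table S4 and the test sample's values listed in the same gene order.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length value lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def scar_activation_call(reference_fc, sample_fc, min_abs_r=0.0):
    """Return (r, activated). A negative r is read as pathway activation;
    min_abs_r is an assumed optional cutoff on |r|, not a source value."""
    r = pearson_r(reference_fc, sample_fc)
    return r, (r < 0 and abs(r) >= min_abs_r)


# Hypothetical log fold changes for four genes, in the same gene order.
sh_hervh_fc = [1.0, 2.0, -1.0, -2.0]   # reference (shHERVH interference)
sample_fc = [-0.9, -2.1, 1.2, 1.8]     # test sample
r, activated = scar_activation_call(sh_hervh_fc, sample_fc)
```

Here the sample moves opposite to the interference profile, so the coefficient is negative and the sketch reports activation; feeding the reference profile in as the sample gives a coefficient of 1 and no activation call.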

In an embodiment, genetic signatures and protein signatures are used as predictors of a disease state independently. In an embodiment, some specific gene/protein targets listed in current signatures are likely relevant to cancer. In an embodiment, some specific gene/protein targets listed in current signatures are utilized to detect the SCAR's pathways & networks activation.

BRIEF DESCRIPTION OF THE FIGURES

Figures 1A-1K (collectively referred to as "Figure 1", with Figures 1A-1D sometimes referred to as "A", with Figures 1E-1H sometimes referred to as "B", and with Figures 1I-1K sometimes referred to as "C"). Distinct expression patterns of HERVH-regulated genes in euploid and aneuploid human embryos at 1-cell versus 8-cell stages (Figures 1A-1D), developmentally viable versus non-viable zygotes (A, Figure 1D), and in vivo matured human oocytes (Figures 1E-1H).

A (Figures 1A-1D): A total of 36 statistically significant genes that are differentially expressed in human zygotes vs 8-cell human embryos are regulated by the HERVH/LBP9 in hESC. Expression of 14 of these genes is significantly different in euploid versus aneuploid human embryos (Figures 1A and 1C), whereas expression of 22 of these genes is not significantly different in euploid versus aneuploid human embryos (Figure 1B). Similarly, expression signatures of 174 HERVH-regulated genes are distinct in developmentally viable and non-viable human zygotes (q<0.0005; A, Figure 1D). Genes up-regulated in developmentally non-viable human zygotes are highlighted.

B (Figures 1E-1H): Microarray analysis identifies gene expression signatures of HERVH-regulated genes in matured human oocytes.

Figures 2A-2M (collectively referred to as "Figure 2", with Figures 2A-2D sometimes referred to as "A", with Figures 2E-2H sometimes referred to as "B", with Figures 2I and 2J sometimes referred to as "C", and with Figures 2K-2M sometimes referred to as "D"). Single-cell next generation sequencing (A-C) and microarray gene expression analysis (D) of the individual SCARs loci (A, B), SCARs-regulatory sequences of the lncRNA HPAT3 (C), and SCARs-regulated protein-coding genes (D) at various stages of the human preimplantation embryonic development (A-C) and in clinical samples of normal prostate epithelia, normal prostate stroma, benign prostatic hyperplasia, atrophic lesions in the prostate, putative prostate cancer precursor lesions of the prostatic intraepithelial neoplasia (PIN), morphologically normal prostate epithelia adjacent to prostate cancer lesions, localized prostate cancer, and metastatic prostate cancer (D).

A-C, Single-cell next generation RNA sequencing analysis of human preimplantation embryos reveals activation of expression of selected HERVH and HERVK loci in human oocytes and zygotes. Expression patterns of individual HERV loci at each stage of human preimplantation embryos are shown. Plotted expression values were defined either by the mean expression values normalized to the expression levels in oocytes (A) or the actual measurements in every individual cell of the corresponding stage of embryonic development (B, C).

D, Microarray gene expression profiling of clinical samples representing the key stages of a hypothetical sequence of malignant progression from normal prostate epithelia to metastatic prostate tumors, comprising cells resected from normal prostate epithelia, normal prostate stroma, benign prostatic hyperplasia, atrophic lesions in the prostate, putative prostate cancer precursor lesions of the prostatic intraepithelial neoplasia (PIN), morphologically normal prostate epithelia adjacent to prostate cancer lesions, localized prostate cancer, and metastatic prostate cancer.

Figures 3A-3D (collectively referred to as "Figure 3", wherein Figures 3A-3D are also referred to as A-D, respectively, in the following description). Changes of gene expression and gene copy numbers of SCARs-targeted protein-coding genes manifest significant associations with the long-term survival of cancer patients. Gene copy numbers and mRNA expression levels of protein-coding genes comprising structural components of the host/virus chimeric transcripts were evaluated for associations with long-term survival probabilities of cancer patients defined by the Kaplan-Meier survival analysis in TCGA Pan-cancer databases comprising 5,158 clinical samples across 12 TCGA cohorts (PANCAN12 study of 12 distinct cancer types) and 12,093 clinical samples across all TCGA cohorts. Examples of SCARs-targeted genes manifesting significant associations of gene expression changes (A-C) and gene copy number alterations (D) with the long-term survival of cancer patients of TCGA PANCAN12 study are shown (A; C; D). Representative examples of these associations for TCGA cohorts of three individual types of cancer [prostate cancer (n=568), breast cancer (n=1,241), and rectal cancer (n=187)] are shown in (B). Gene expression heatmaps and corresponding Kaplan-Meier survival curves are shown in (A). Heatmaps of gene expression (left images) and copy numbers (right images) and associated Kaplan-Meier survival curves are shown in (D). Vertical dashed lines depict the ten-year survival data points. Corresponding p values are reported in the Supplemental Data Set S1.

Figure 4A-4D (collectively referred to as "Figure 4", wherein Figures 4A-4D are also referred to as A-D, respectively, in the following description). Protein alignments of translated amino acid sequences of the human-specific virus/host chimeric transcripts identify distinct patterns of conserved protein domains encoded by different SCARs loci. Nucleotide sequences of human-specific chimeric transcripts were translated into amino acid sequences and subjected to the BLAST protein alignment analyses as described in the Materials and Methods. Note that the most frequently represented conserved protein domain within translated amino acid sequences encoded by human-specific SCARs-derived host/virus chimeric transcripts is the GVQW amino acid sequence (A; C; D).
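The translate-then-search step described above can be illustrated with a minimal Python sketch. It is not the BLAST-based procedure of the Materials and Methods: it only translates a nucleotide sequence in the three forward reading frames using the standard genetic code and reports where the GVQW motif occurs. The function names `translate` and `find_motif` and the toy input sequence are hypothetical and are not taken from any SCARs locus.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq, frame=0):
    """Translate a DNA/RNA string in one forward frame; unknown codons become 'X'."""
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def find_motif(seq, motif="GVQW"):
    """Return (frame, residue_index) for every motif hit in the three forward frames."""
    hits = []
    for frame in range(3):
        protein = translate(seq, frame)
        start = protein.find(motif)
        while start != -1:
            hits.append((frame, start))
            start = protein.find(motif, start + 1)
    return hits


# Toy sequence (an assumption for illustration only): ATG GGT GTT CAA TGG TAA
toy = "ATGGGTGTTCAATGGTAA"
print(translate(toy))
print(find_motif(toy))
```

A production workflow would also scan the reverse-complement strand and would rely on a domain database search rather than an exact string match; the sketch only makes the basic idea concrete.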

Figure 5A-5D (collectively referred to as "Figure 5", wherein Figures 5A-5D are also referred to as A-D, respectively, in the following description). Evolutionary tracing of human-specific expansion of the GVQW conserved protein domain originated from the identical nucleic acid sequences of human-specific chimeric virus/host transcripts of SCARs on chrX:278899-284216 and chrY:278899-284216. Nucleotide sequences encoding the GVQW conserved domain were expanded to include a few adjacent amino acids, which was sufficient to obtain the SCARs' locus-specific nucleotide sequences. The genomic origin of the GVQW-encoding sequences was inferred based on the 100% nucleotide sequence identities of a given genomic sequence and the corresponding locus-specific SCARs-derived sequence. The BLAT algorithm was utilized to determine the numbers of GVQW-encoding nucleotide sequences in genomes of humans and non-human primates, which are 100% identical to the sequences of chimeric virus/host transcripts encoded by the specific SCARs' loci. Note that no GVQW conserved protein domain-encoding sequences were detected in the mouse and rat genomes. Only GVQW-encoding sequences originated from SCARs transcripts on chrX:278899-284216 and/or chrY:278899-284216 appear markedly expanded in the human genome (red colored bar in Fig. 3C) and this expansion is associated with marked enrichment in the human proteome compared with other Great Apes of the number of proteins harboring conserved GVQW domains (Fig. 3D).

Figure 6A-6B (collectively referred to as "Figure 6", wherein Figures 6A-6B are also referred to as A-B, respectively, in the following description). Changes of gene-level copy numbers of 21 zinc finger proteins harboring GVQW conserved protein domains manifest significant associations with the long-term survival of cancer patients diagnosed with 29 distinct types of malignancies. Gene copy numbers of all zinc finger proteins harboring GVQW conserved protein domains identified to date were evaluated for associations with long-term survival probabilities of cancer patients defined by the Kaplan-Meier survival analysis of TCGA Pan-cancer databases comprising 12,093 clinical samples across all TCGA cohorts representing 29 cancer types. Heatmaps of gene copy number changes (A) and associated Kaplan-Meier survival curves (B) are shown. Results of the Kaplan-Meier survival analyses are shown for 21 zinc finger proteins harboring GVQW conserved protein domains and three SCARs-targeted zinc finger proteins (ZNF443; ZNF587; ZNF814). The reported p values are from the Kaplan-Meier survival curves generated by the Xena Cancer Genome Browser data visualization tools (http://xena.ucsc.edu/).

Figure 7A-7D (collectively referred to as "Figure 7", wherein Figures 7A-7D are also referred to as A-D, respectively, in the following description). Somatic non-silent mutations' signatures of the clinical intractability of malignant tumors defined by the decreased survival and increased likelihood of death from cancer.

A, Identification of the eighteen genes harboring somatic non-silent mutation signatures of death from cancer phenotypes. The eighteen top-scoring human genes were identified in which the largest numbers of somatic non-silent mutations (SNMs) were detected in 12,093 tumor samples across all TCGA cohorts, provided a requirement is met that the presence of these mutations in tumors is associated with significantly increased likelihood of death from cancer defined by the Kaplan-Meier survival analysis. Top panel shows distributions of SNMs of the 18 genes among patients' tumor samples aligned to the SNMs' profile of the TP53 gene. The numbers of cancer patients with SNMs of each of the 18 genes are reported as the percent of events. Shaded area highlights the relative number of cancer patients without SNMs. Note that Kaplan-Meier survival curves for each of these 18 genes identify patients with significantly decreased survival probability and increased likelihood of death from cancer. Therefore, detection of SNMs in each of these eighteen genes isolated from tumor samples is associated with poor long-term prognosis of cancer patients compared with patients whose tumors do not have SNMs of these genes (Fig. 5A).

Underlined gene symbols identify genes whose expression is regulated by SCARs in the hESC. Red-colored gene symbols depict SCARs-targeted genes, whereas black-colored gene symbols identify previously reported candidate cancer driver genes.

B, Comparisons of the Kaplan-Meier survival analyses of 7,509 cancer patients with and without SNMs in their tumors for the TP53 gene only (Figure 7A, top left figure below); the 18-gene SNMs' signature (Figure 7B, top right figure below); the 26-gene SNMs' signature without TP53 (Figure 7C, bottom left figure below); the 27-gene SNMs' signature including the TP53 gene (Figure 7D, bottom right figure below).

C, D, Linear regression analyses of the clinical intractability of malignant tumors in patients diagnosed with 28 (C) and 19 (D) cancer types. C, Cancer patients' survival data from TCGA Pan-cancer cohort of 28 cancer types were utilized to calculate the percent of death events for each cancer type; the resulting values were aligned with the percent of patients with the SNMs death from cancer signatures in the corresponding groups of cancer patients and subjected to the linear regression analysis. D, Age-adjusted cancer incidence and death rates (per 100,000 people) in the United States for 19 cancer types were obtained from the Centers for Disease Control and Prevention (CDC) United States Cancer Statistics (USCS) report; the estimated death rates for each cancer type were calculated by multiplying the corresponding values of incidence rates and percentages of patients with the SNMs death from cancer signatures; the resulting values were aligned with the actual death rates for the corresponding cancer types and subjected to the regression analysis.
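The calculation described for panel D can be sketched as follows. This is an illustrative Python sketch only: the function names are hypothetical, the input numbers are placeholders chosen for arithmetic convenience and are not CDC or TCGA values, and an ordinary least-squares fit is assumed because the text does not specify the regression software used.

```python
def estimated_death_rates(incidence_rates, signature_percentages):
    """Estimated death rate = incidence rate x fraction of patients with the SNMs signature."""
    return [inc * pct / 100.0
            for inc, pct in zip(incidence_rates, signature_percentages)]


def linear_regression(x, y):
    """Ordinary least-squares fit y = slope * x + intercept; also returns R squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy ** 2) / (sxx * syy)
    return slope, intercept, r_squared


# Placeholder inputs (NOT real data): rates per 100,000 and percentages.
incidence = [100.0, 50.0, 20.0]
signature_pct = [50.0, 20.0, 80.0]
actual_death = [48.0, 12.0, 15.0]

estimated = estimated_death_rates(incidence, signature_pct)
slope, intercept, r2 = linear_regression(estimated, actual_death)
print(estimated, slope, intercept, r2)
```

Here the estimated rates play the role of the independent variable and the reported death rates the dependent variable, mirroring the alignment of estimated and actual values described in the legend.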

Figures 8A-8B (collectively referred to as "Figure 8", wherein Figures 8A-8B are also referred to as A-B, respectively, in the following description). Protein expression changes of the SCARs stemness networks' genes manifest statistically significant associations with decreased long-term survival and increased likelihood of death from cancer.

Protein expression changes of 38 SCARs stemness networks' genes were evaluated for associations with long-term survival probabilities of cancer patients defined by the Kaplan-Meier survival analysis in TCGA Pan-cancer database comprising 5,158 clinical samples across 12 TCGA cohorts. In total, changes in the protein expression levels of 23 SCARs-regulated genes (60.5%) manifested significant associations with the long-term survival probability of cancer patients (Supplemental Data Set S1). Heatmaps of protein expression and associated Kaplan-Meier survival curves are shown. Corresponding p values are reported in the Supplemental Data Set S1.
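For readers unfamiliar with the Kaplan-Meier survival analysis referred to throughout these legends, the product-limit estimate can be sketched in a few lines of Python. This is a generic textbook sketch and not the Xena Cancer Genome Browser implementation; the function name `kaplan_meier` and the four-patient example are assumptions made purely for illustration.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  -- follow-up time for each patient
    events -- 1 if death was observed at that time, 0 if the patient was censored
    Returns a list of (time, survival_probability) at each time with a death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical follow-up data: the second patient is censored at time 2.
print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```

Comparing curves of this kind between patient groups (for example, tumors with and without a given signature) and testing the difference, typically with a log-rank test, yields the p values of the type reported in the Supplemental Data Set.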

Figure 9. Transcriptionally active LTR7/HERVH SCARs contribute to repair of double-stranded breaks (lightning bolt) of host DNA (blue lines) by coopting the alternative non-homologous end joining (NHEJ) DNA repair pathway. Reverse transcription of SCARs RNA (dashed black line) with partial homology regions to host DNA creates DNA molecules (solid black lines) filling the gap at the site of double-stranded breaks of host DNA. A hallmark of this mechanism of SCARs-associated repair of double-stranded DNA breaks is the evidence of deletions of ancestral DNA segments (solid red lines) at the sites of insertions of the LTR7/HERVH sequences in the human genome (see Table 3 and text for further details). This process creates human-specific integration sites of SCARs and may facilitate generation of host/virus chimeric transcripts (blue/black dashed lines). DSB, double-stranded break; NHEJ, non-homologous end joining; RT, reverse transcription; SCARs, stem cell-associated retroviruses.

Figure 10. Flow chart of a decision-making process in clinical management of cancer patients on the basis of continuing sequential sampling for monitoring of the SCAR's networks activity status in blood, serum, and plasma samples; circulating tumor cells; primary and metastatic tumor samples.

Identification of genetic and/or molecular evidence of the activated SCAR's networks at any stage of this sequence would favor the diagnosis of therapy-resistant clinically-lethal disease phenotype and trigger the requirement for the immediate consideration of the following therapy selection choices: the "next-in-line" aggressive treatment protocols; novel therapies specifically targeting SCAR's pathways and/or therapeutic interventions considered suitable for patients with malignant tumors manifesting the active status of SCAR's networks. CTC, circulating tumor cell; FFPE, formalin-fixed paraffin embedded. Adapted from: Glinsky, GV. 2008. "Stemness" genomics law governs clinical behavior of human cancer: implications for decision making in disease management. Journal of Clinical Oncology, 26: 2846-53.

Supplemental Figures S1A-S1K (collectively referred to as "Supplemental Figure S1"). (Related to the Fig. 4). Additional examples of distinct and common patterns of the conserved protein domain expression within translated amino acid sequences of the host/virus chimeric transcripts encoded by endogenous human SCARs in the hESC. Nucleotide sequences of human-specific chimeric transcripts were translated into amino acid sequences and subjected to the protein alignment analyses using the protein BLAST algorithm

(http://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastp&BLAST_PROGRAMS=blastp&PAGE_TYPE=BlastSearch&SHOW_DEFAULTS=on&LINK_LOC=blasthome) and associated web-based tools for identification and visualization of conserved protein domains

( http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?RID=3HZ5 Blv1ES01 R&mode=all ), which were described in detail elsewhere [80, 81].

Protein alignments of translated amino acid sequences of the human-specific virus/host chimeric transcripts identify distinct patterns of conserved protein domains encoded by different SCARs loci. Nucleotide sequences of human-specific chimeric transcripts were translated into amino acid sequences and subjected to the BLAST protein alignment analyses as described in the Materials and Methods. Note that the most frequently represented conserved protein domain within translated amino acid sequences encoded by human-specific SCARs-derived host/virus chimeric transcripts is the GVQW amino acid sequence.

Supplemental Figures S2A-S2E (collectively referred to as "Supplemental Figure S2", wherein Supplemental Figures S2A-S2D are also referred to as A-D, respectively, in the following description, wherein Supplemental Figure S2E is a compilation of Supplemental Figures S2A-S2E). (Related to the Fig. 6). Changes of gene expression and gene copy numbers of zinc finger proteins harboring GVQW conserved protein domains manifest significant associations with the long-term survival of cancer patients. Gene copy numbers (D) and mRNA expression levels (A-C) of zinc finger proteins harboring GVQW conserved protein domains were evaluated for associations with long-term survival probabilities of cancer patients defined by the Kaplan-Meier survival analysis of cancer patients diagnosed with prostate cancer (n=568); breast cancer (n=1,241); colon cancer (n=550); rectal cancer (n=187); pancreatic cancer (n=196); and TCGA Pan-cancer databases comprising 5,158 clinical samples across 12 TCGA cohorts (PANCAN12 study of 12 distinct cancer types).

Representative examples of zinc finger proteins with GVQW conserved protein domains that manifest significant associations of gene expression changes (A-C) in TCGA cohorts of five individual types of cancer [prostate cancer (A); breast cancer (B; C, bottom left panel); colon cancer (C; top left panel); rectal cancer (C; top right panel); and pancreatic cancer (C, bottom right panel)] are shown. Examples of zinc finger proteins with GVQW conserved protein domains manifesting significant associations of gene copy number alterations with the long-term survival of cancer patients of TCGA PANCAN12 study are shown in Fig. 4D. Gene expression heatmaps and corresponding Kaplan-Meier survival curves are shown in (A-C). Heatmaps of gene expression (left images) and exon expression (right images) and associated Kaplan-Meier survival curves are shown in (C). Heatmaps of gene expression (left images) and copy numbers (right images) and associated Kaplan-Meier survival curves are shown in (D). Corresponding p values are reported in the Supplemental Data Set S1.

Supplemental Figure S3A-S3B (collectively referred to as "Supplemental Figure S3", wherein Supplemental Figures S3A-S3B are also referred to as A-B, respectively, in the following description). (Related to the Fig. 7). Additional Kaplan-Meier survival analyses of the classification performance of SNMs genes including only patients with the complete clinical records of the follow-up survival data.

A, Comparisons of the Kaplan-Meier survival analyses of 7,258 cancer patients with and without SNMs in their tumors (top and bottom left figures) and cancer patients stratified into subgroups of identical size (n=2,419) after sorting in the ascending order of their survival time (top and bottom left figures). In these analyses, only patients with the complete clinical records of the follow-up survival data were included.

B, Visualization of mutations' fingerprints of genes harboring the SNMs signatures of death from cancer phenotypes. Note that these genes isolated from clinical tumor samples appear

survival analysis in two TCGA Pan-cancer databases comprising 5,158 clinical samples across 12 TCGA cohorts (A; C) and 12,093 clinical samples across 29 TCGA cohorts (B; D). Note that strikingly similar results were observed for the copy number changes of the BMI1 (bottom left figures in C & D) and EZH2 (bottom right figures in C & D) genes, associations of which with the activation of the Polycomb chromatin silencing pathway and stemness gene expression signatures in tumors from cancer patients with increased likelihood of death from cancer were previously documented (37-51). Corresponding p values are reported in the Supplemental Data Set S1.

Supplemental Figure S5. Kaplan-Meier survival analyses of therapy outcomes in prostate cancer patients stratified into distinct sub-groups based on expression profiles of the 11-gene death from cancer signature and expression signatures of three SCARs network genes (PLCXD1, HKR1, ZNF283).

Table S1A (also referred to as "Table S1"). Panel of 42 genes for the analysis of the somatic non-silent mutations which were identified based on significant associations with the increased likelihood of therapy failure and death from cancer in multiple pan-cancer databases.

Table S2A-S2C (collectively referred to as "Table S2").

Table S2A: Two-tailed p value: 0.00090474; p = 0.0009; related to Figure 7C.

Table S2B: Two-tailed p value; related to Figure 7D.

Table S2C: Related to Figure 7.

Table S3A-S3B (collectively referred to as "Table S3").

Table S3A. ChrY_ChrX

Table S3B. chr3_chr11

Table S4A-S4B (collectively referred to as "Table S4").

Table S4A. 74 genes.

Table S4B. 55 genes.

Table S5A-S5C (collectively referred to as "Table S5").

Table S5A. HERVH-loci manifesting the most significant activation at the zygote stage of human embryogenesis. Related to FIGURE 2.

Table S5B. HERVK-; HERVH-; and other SCARs loci manifesting the most significant activation at the zygote stage of human embryogenesis. Related to FIGURE 2.

Table S5C (shown partially in the Figures). SCARs sequences implicated in the human embryogenesis and development of pathological conditions in human subjects.

Table S6A-S6C (collectively referred to as "Table S6").

Table S6A. 64 HERV1 human-specific chimeric transcripts (Bonobo & Chimp alignment failures).

Table S6B.

Table S6C.

A wide variety of cancer treatment protocols have been developed in recent years, including novel methods of personalized, target-tailored cancer therapies. Often, very aggressive cancer therapy is reserved for late stage cancers due to unwanted side effects produced by such therapy. However, even such aggressive therapy commonly fails at such a late stage. The ability to identify cancers responsive only to the most aggressive therapies at an earlier stage could greatly improve the prognosis for patients having such cancers.

In recent years, potentially useful markers predictive of such outcomes have been identified. Glinsky, G.V. et al., J. Clin. Invest. 113: 913-923 (2004) teaches that gene expression profiling predicts clinical outcomes of prostate cancer. Van 't Veer et al., Nature 415: 530-536 (2002) teaches that

carried out in clinical laboratories, and should accurately predict the likelihood of resistance of various cancers to be applied to standard therapeutic regimens.

A very large number of attempts have been made to discover, define, and design treatments, develop treatments, and to treat metastatic and intractable cancers, principally by either attacking basic mechanisms of rapid cell growth or aberrant cancer cell metabolic pathways, with little success. Recently, some methods of enabling or re-enabling the immune system in its attack on tumors and micro-metastases have shown much more promising data in trials and commercial use, but the majority of patients with metastatic and intractable disease have proven refractory to even these immune-modulating therapies. There is, therefore, a need for new cancer therapies which, either used as sole therapeutic agents or in combination with other modalities - particularly immune-modulation - are designed to fundamentally attack the cellular mechanisms allowing the metastatic phenotype. Such new therapies should be derived from an understanding of the critical gene signatures responsible for metastasis and survival of cancer cells.

Somatic mutations and chromosome instability are hallmarks of genomic aberrations in cancer cells. Aneuploidies represent common manifestations of chromosome instability, which is frequently observed in human embryos and malignant solid tumors. Activation of human endogenous retroviruses (HERV)-derived loci is documented in preimplantation human embryos, hESC, and multiple types of human malignancies. It remains unknown whether the HERV activation may highlight a common molecular pathway contributing to the frequent occurrence of chromosome instability in the early stages of human embryonic development and the emergence of genomic aberrations in cancer.

Single cell RNA sequencing analysis of human preimplantation embryos reveals activation of specific LTR7/HERVH loci during the transition from the oocytes to zygotes and identifies HERVH network signatures associated with the aneuploidy in human embryos. The correlation patterns' analysis links transcriptome signatures of the HERVH network activation of the in vivo matured human oocytes with gene expression profiles of clinical samples of prostate tumors supporting the existence of a cancer progression pathway from putative precursor lesions (prostatic intraepithelial neoplasia) to localized and metastatic prostate cancers. Tracking signatures of HERVH networks' activation in tumor samples from cancer patients with known long-term therapy outcomes enabled patients' stratification into sub-groups with markedly distinct likelihoods of therapy failure and death from cancer.

Genome-wide analyses of human-specific genetic elements of stem cell-associated retroviruses (SCARs)-regulated networks in 12,093 clinical tumor samples across 29 cancer types revealed pan-cancer genomic signatures of clinically-lethal therapy resistant disease defined by the presence of somatic non-silent mutations (SNMs), gene-level copy number changes, transcripts' and proteins' expression of SCARs-regulated host genes. More than 73% of all cancer deaths occurred in patients whose tumors harbor the SNMs' signatures. Linear regression analysis of cancer intractability in the United States population demonstrated that organ-specific cancer death rates are directly correlated with the percentages of patients whose tumors harbor the SNMs' signatures.

SCARs-encoded RNA molecules possess intrinsic protein-coding potentials including amino acid sequences defined as conserved protein domains (CPD). Mapping of SCARs-encoded CPDs revealed thousands of locus-specific fingerprints of CPDs scattered genome-wide. The evolutionary expansion of SCARs' sequences encoding specific CPDs resulted in a marked enrichment in the human proteome of the unique protein sequences on which the CPD is found. These results indicate that diseased cells with high expression levels of SCARs RNA are likely to carry a markedly increased load of SCARs RNA-encoded peptides providing attractive and highly specific molecular targets for immunotherapeutic interventions.

A systematic analysis of molecular structures of human-specific virus/host chimeric transcripts demonstrates that a hallmark feature of SCARs' integration in the human genome is a multispecies deletion pattern of ancestral DNA. The cross-species tracing of SCARs' loci with human-specific insertions and deletions suggests a potential role in the repair of double-stranded DNA breaks, highlighting a putative biological function of SCARs that may enhance the immediate survival and fitness of host cells. On the evolutionary scale, in addition to seeding thousands of human-specific regulatory sequences, the SCARs' activity appears involved in DNA repair and spreading sequences of specific CPDs throughout the human genome.

Examples presented herein demonstrate that awakening of SCARs-regulated stemness networks in differentiated cells is associated with development of a diverse spectrum of genomic aberrations subsequently readily detectable in multiple types of clinically lethal malignant tumors and likely contributing to emergence of therapy-resistant phenotypes.

Key words: human endogenous stem cell-associated retroviruses (SCARs); human-specific regulatory sequences; human ESC; human embryos; pluripotent state regulators; NANOG; POU5F1 (OCT4); CTCF; LTR7 RNAs; long terminal repeats, LTR; LTR7/HERVH; LTR5HS/HERVK; therapy-resistant cancers; cancer stem cells

List of abbreviations

HERV, human endogenous retroviruses

hESC, human embryonic stem cells

LINE, long interspersed nuclear element

lncRNA, long non-coding RNA

lincRNA, long intergenic non-coding RNA

LTR, long terminal repeat

NANOG, Nanog homeobox

POU5F1, POU class 5 homeobox 1

SCARs, stem cell associated retroviruses

TCGA, The Cancer Genome Atlas

TE, transposable elements

TF, transcription factor

TFBS, transcription factor-binding sites

sncRNA, small non coding RNA

STEM CELL-ASSOCIATED RETROVIRUSES (SCARS)

Activity of endogenous retroviruses is suppressed in human cells to restrict the potentially harmful effects of mutations on functional genome integrity and to ensure the maintenance of genomic stability. Human embryonic stem cells (hESCs) and early-stage human embryos seem markedly different in this regard. Expression of human endogenous retroviruses (HERV), in particular, HERVH and HERVK subfamilies, is markedly activated in hESCs [1-3]. An enhanced rate of insertion of LTR7/HERVH sequences in the human genome appears to be associated with binding sites for pluripotency core transcription factors [1; 3; 4], including human-specific transcription binding sites [3], and long noncoding RNAs [5]. Analysis of transcription factor binding sites in hESC suggests that expression of HERVH is regulated by the pluripotency regulatory circuitry, since 80% of long terminal repeats (LTRs) of the 50 most highly expressed HERVH loci are occupied by pluripotency core transcription factors, including NANOG and POU5F1 [1]. Furthermore, transposable elements (TE)-derived sequences, most notably LTR7/HERVH, LTR5_Hs/HERVK, and L1HS, harbor 99.8% of the candidate human-specific regulatory sequences (HSRS) with putative transcription factor-binding sites (TFBS) in the genome of hESC [3]. Based on the common functional features of these specific families of HERVs, which are mediated by their active expression in the human embryos and hESC [6-9], they were designated as the endogenous human stem cell-associated retroviruses (SCARs).

Recent studies highlighted mechanisms of activation and putative biological functions of SCARs in human preimplantation embryos and embryonic stem cells. The LTR7/HERVH subfamily is rapidly demethylated and upregulated in the blastocyst of human embryos and remains highly expressed in hESC [10]. Sequences of LTR7, LTR7B, and LTR7Y, which typically harbor the promoters for the downstream full-length HERVH-int elements, were found expressed at the highest levels and were the most statistically significantly up-regulated retrotransposons in human ESC and induced pluripotent stem cells, iPSC [11]. It has been demonstrated that LTRs of HERVH subfamily, in particular, LTR7, function in hESC as enhancers and HERVH sequences encode nuclear non-coding RNAs, which are required for maintenance of pluripotency and identity of hESC [12]. Transient spatiotemporally controlled hyper-activation of HERVH is required for reprogramming of differentiated human cells toward induced pluripotent stem cells (iPSC), maintenance of pluripotency and reestablishment of differentiation potential [13]. Failure to control and silence the LTR7/HERVH activity leads to the differentiation-defective phenotype in neural lineage [13, 14]. Activation of L1 retrotransposons may also contribute to these processes because significant activities of both L1 transcription and transposition were recently reported in iPSC of humans and other great apes [15]. Single-cell RNA sequencing of human preimplantation embryos and embryonic stem cells [16, 17] enabled identification of specific distinct populations of early human embryonic stem cells defined by marked activation of specific retroviral elements [18].

Discovery of endogenous human SCARs and compelling evidence of their essential role in human embryogenesis may have some immediate practical implications. Heterogeneous populations of human ESCs and iPSC contain naive-state stem cells that have the most broad and robust multi-lineage developmental potentials and, therefore, hold great promise for a multitude of life-saving therapeutic applications in regenerative medicine. Consistent with definition of increased LTR7/HERVH expression as a hallmark of naive-like hESCs, a sub-population of hESCs and human induced pluripotent stem cells (hiPSCs) with markedly elevated LTR7/HERVH expression manifests key properties of naive-like pluripotent stem cells [19]. Furthermore, human naive-like pluripotent stem cells can be genetically tagged, successfully isolated and maintained in vitro based on markers of elevated transcription of LTR7/HERVH [19]. Embryonic stem cell-specific transcription factors NANOG, POU5F1, KLF4, and LBP9 drive LTR7/HERVH transcription in human pluripotent stem cells [19]. Targeted interference with HERVH activity and HERVH-derived transcripts severely compromises self-renewal functions of human pluripotent stem cells [19].

Similar to the LTR7/HERVH subfamily, transactivation of LTR5_Hs/HERVK by pluripotency master transcription factor POU5F1 (OCT4) at hypomethylated LTRs, which represent the most evolutionarily recent genomic integration sites of HERVK retroviruses, induces HERVK expression during normal human embryogenesis [20]. It coincides with embryonic genome activation at the eight-cell stage, continuing through the stage of epiblast cells in preimplantation blastocysts, and ceasing during hESC derivation from blastocyst outgrowths [20]. The unequivocal experimental evidence of HERVK activation during human embryogenesis has been reported by Grow et al. [20]. They demonstrated the presence of HERVK viral-like particles and Gag proteins in human blastocysts, supporting the idea that endogenous human retroviruses are active and functional during early human embryonic development. Consistent with this hypothesis, overexpression of HERVK virus-accessory protein Rec in pluripotent cells was sufficient to increase the host protein IFITM1 level and inhibit viral infection [20], suggesting that this anti-viral defense mechanism in human early-stage embryos may be triggered by HERVK activation. Detailed analysis of how activation of retrotransposons orchestrates species-specific gene expression in embryonic stem cells is presented in the recent review [21], highlighting the fine regulatory balance established during evolution between activation and repression of specific retrotransposons in human cells.

Recent experiments identified key effector molecules mediating critical biological activities of SCARs in hESC. SCARs-derived long noncoding RNAs have been described as the essential regulatory molecules for maintaining pluripotency, functional identity, and integrity of hESC [12]. Collectively, these experiments conclusively established the essential role of the sustained yet tightly spatiotemporally controlled activity of specific endogenous retroviruses for pluripotency maintenance and functional identity of human pluripotent stem cells, including hESC and iPSC. It has been hypothesized that awakening of SCARs may be associated with activation of stemness genomic networks in cancer cells and the emergence of clinically-lethal death from cancer phenotypes in patients diagnosed with multiple types of malignant tumors [6-9].

In summary, the emerging consensus view is that spatiotemporally controlled activation of endogenous stem cell-associated retroviruses (SCARs) in human preimplantation embryos, specifically LTR7/HERVH and LTR5_Hs/HERVK subfamilies, is required for the pluripotency maintenance, functional identity and integrity of the naive-state ESC, and anti-viral resistance of the early-stage human embryos. Expression of SCARs is epigenetically silenced in differentiated human cells and failure to control and efficiently silence the SCARs activity leads to differentiation-defective phenotypes. Reversal of epigenetic silencing of SCARs loci in cancer cells appears associated with activation of SCARs expression in multiple types of human tumors (reviewed in 9 and references therein).

In this contribution, single-cell RNA sequencing analysis of human preimplantation embryos reveals activation of specific LTR7/HERVH loci during the transition from oocytes to zygotes and identifies HERVH network signatures associated with aneuploidy in human embryos. Analysis of correlation patterns links transcriptome signatures of HERVH network activation in the in vivo matured human oocytes with gene expression profiles of clinical samples of prostate tumors, supporting the existence of a cancer progression pathway from prostatic intraepithelial neoplasia to localized and metastatic prostate cancers. Manifestation of a diverse spectrum of genomic aberrations in malignant tumors from cancer patients with clinically lethal disease has been associated with the activation of SCARs networks in cancer cells. The Cancer Genome Atlas (TCGA)-guided analyses of SCARs networks in 12,093 clinical samples across all TCGA cohorts representing 29 cancer types revealed pan-cancer genomic signatures of clinically lethal therapy-resistant disease defined by the gene expression, gene-level copy number changes, protein expression, and somatic non-silent mutations of SCARs-associated protein-coding genes and non-coding RNA loci.

DESCRIPTION OF EXPERIMENTAL EXAMPLES

Single-cell transcriptome analysis reveals active transcription from selected LTR7/HERVH loci and altered expression of LTR7/HERVH-regulated genes in aneuploidy-prone and developmentally non-viable human zygotes

Chromosome instability is common in early-stage human embryonic development, and aneuploidies are observed in 50-80% of cleavage-stage human embryos [Vanneste E, Voet T, Le Caignec C, Ampe M, Konings P, Melotte C, Debrock S, Amyere M, Vikkula M, Schuit F, Fryns JP, Verbeke G, D'Hooghe T, Moreau Y, Vermeesch JR. Chromosome instability is common in human cleavage-stage embryos. Nat Med. 2009; 15: 577-83; Johnson DS, Gemelos G, Baner J, Ryan A, Cinnioglu C, Banjevic M, Ross R, Alper M, Barrett B, Frederick J, Potter D, Behr B, Rabinowitz M. Preclinical validation of a microarray method for full molecular karyotyping of blastomeres in a 24-h protocol. Hum Reprod. 2010; 25: 1066-75; Chavez SL, Loewke KE, Han J, Moussavi F, Colls P, Munne S, Behr B, Reijo Pera RA. Dynamic blastomere behaviour reflects human embryo ploidy by the four-cell stage. Nat Commun. 2012; 3: 1251; Vera-Rodriguez M, Chavez SL, Rubio C, Reijo Pera RA, Simon C. Prediction model for aneuploidy in early human embryo development revealed by single-cell analysis. Nat Commun. 2015; 6: 7601; Yanez LZ, Han J, Behr BB, Pera RA, Camarillo DB. Human oocyte developmental potential is predicted by mechanical properties within hours after fertilization. Nat Commun. 2016; 7: 10809].

Aneuploidies in human embryos impair proper development, leading to cell cycle arrest, loss of cell viability, and developmental failures. Single-cell transcriptome analyses demonstrated that gene expression signatures of zygotes could reliably predict the development of euploid and aneuploid human embryos as well as distinguish between developmentally viable and non-viable zygotes [Vera-Rodriguez M, Chavez SL, Rubio C, Reijo Pera RA, Simon C. Prediction model for aneuploidy in early human embryo development revealed by single-cell analysis. Nat Commun. 2015; 6: 7601; Yanez LZ, Han J, Behr BB, Pera RA, Camarillo DB. Human oocyte developmental potential is predicted by mechanical properties within hours after fertilization. Nat Commun. 2016; 7: 10809].

The validity test of the hypothesis that activation of specific LTR7/HERVH loci is associated with development of aneuploidies in human embryos must conform to these experimental paradigms and comply with the following postulates:

● Increased LTR7/HERVH expression should be readily detectable in human zygotes;

● Cells with activated LTR7/HERVH loci at the zygote stage should not persist during the subsequent stages of human embryogenesis; and

● Gene expression signatures of aneuploidy-prone human embryos should harbor a significant number of LTR7/HERVH-regulated genes.

Analysis of human embryonic development-associated genes demonstrates that LTR7/HERVH-regulated genes are significantly enriched among genes that are differentially expressed in aneuploid compared with euploid embryos (Table 1A). In contrast, no significant enrichment of the LTR7/HERVH-regulated genes was documented in other gene sets representing six distinct gene expression categories of human embryonic development-associated genes (Table 1A). Consistent with the hypothesis that activation of LTR7/HERVH loci is associated with development of aneuploidies in human embryos, a significant correlation was observed between the gene expression signature of shHERVH-treated hESC and the gene expression profile of zygotes versus 8-cell embryos comprising genes that are differentially expressed in aneuploid versus euploid embryos (Figure 1). In contrast, no significant correlation was documented between the expression signature of shHERVH-treated hESC and the gene expression profile of zygotes versus 8-cell stage embryos comprising genes that are not differentially expressed between aneuploid and euploid embryos (Figure 1). Consistent with the idea that the expression of HERVH-regulated genes distinguishes human zygotes with distinct developmental potentials, it has been observed that fifty percent of all genes differentially expressed in developmentally viable versus non-viable zygotes are genes regulated by LBP9/HERVH in hESC (Figure 1).

Next, the validity of the prediction was tested that activation of LTR7/HERVH expression occurs early in embryogenesis following the fertilization of oocytes and, therefore, could be readily observed in human zygotes during the single-cell transcriptome analysis of human preimplantation embryos. In agreement with this idea, significant activation of several defined LTR7/HERVH loci was observed during the transition of fertilized human oocytes to zygotes (Figure 2). Notably, the increased LTR7/HERVH expression in zygotes was restricted to only a limited number of specific LTR7/HERVH loci and failed to persist beyond the 8-cell stage (Figure 2). As expected, most of the LTR7/HERVH loci remain silent during early-stage embryogenesis and undergo massive activation during the late blastocyst stage, the epiblast formation, and at the onset of hESC creation [1-14; 16-21]. In agreement with the hypothesis, a vast majority of cells with activated LTR7/HERVH loci in zygotes did not persist during the subsequent stages of human embryogenesis (Figure 2), with the exception of the pattern 4 cells manifesting markedly increased LTR7/HERVH expression at the epiblast and hESC creation stages of embryogenesis. Activation of the LTR7/HERVH loci manifesting pattern 4 of expression profiles during human embryogenesis is likely related to the creation of ground-state pluripotency and naive hESC. This hypothesis is further corroborated by the single-cell transcriptome analyses of expression profiles of the LTR7/HERVH sequences of the HPAT3 lincRNA, which plays an important role in pluripotency regulation and maintenance networks of hESC (Figure 2).

Gene expression signature of the LTR7/HERVH network activation in human oocytes distinguishes prostate cancer precursor lesions, localized and metastatic prostate cancers from normal prostate epithelia and benign prostatic hyperplasia.

During embryogenesis no transcription occurs before embryonic genome activation, indicating that the early stages of embryogenesis are controlled exclusively by the maternal genetic information inherited from the oocytes. The major wave of transcriptional activation of the embryonic genome was observed at the four- to eight-cell stage of human embryogenesis [Dobson AT, Raja R, Abeyta MJ, Taylor T, Shen S, Haqq C, Pera RA. The unique transcriptome through day 3 of human preimplantation development. Hum. Mol. Genet. 2004; 13: 1461-1470]. These considerations suggest that the increased expression of the HERVH loci observed in human zygotes may be related to their active transcriptional status in oocytes. Consistent with this idea, analysis of the transcriptome of human metaphase II oocytes obtained within minutes after their removal from the ovary [Kocabas AM, Crosby J, Ross PJ, Otu HH, Beyhan Z, Can H, Tam WL, Rosa GJ, Halgren RG, Lim B, Fernandez E, Cibelli JB. The transcriptome of human oocytes. Proc Natl Acad Sci U S A. 2006; 103: 14027-32] identified a large set of differentially expressed HERVH-regulated genes (Figure 1).

Furthermore, single-cell transcriptome analysis of human preimplantation embryos revealed direct experimental evidence of the expression of selected LTR7/HERVH loci in human oocytes (Figure 2). Identification of the gene expression signature of LTR7/HERVH network activation in human oocytes provides the opportunity to determine whether this gene signature may be useful for detection of the LTR7/HERVH transcriptome activation in clinical samples of malignant tumors. Remarkably, this analysis reveals that the gene expression signature of the LTR7/HERVH network activation in human oocytes appears to distinguish prostate cancer precursor lesions, localized and metastatic prostate cancers from clinical samples of normal prostate epithelia, stroma, and benign prostatic hyperplasia (Figure 3).

These observations strongly indicate that activation of the LTR7/HERVH transcriptome occurs in large sub-sets of clinical samples of prostatic intraepithelial neoplasia constituting prostate cancer precursor lesions (31-46% of samples), localized prostate adenocarcinomas (22-28% of samples), and metastatic prostate cancers (45-60% of samples). Collectively, these results argue that activation of the LTR7/HERVH regulatory network occurs early during development of clinically significant prostate cancer and persists during prostate cancer progression from putative precursor lesions (prostatic intraepithelial neoplasia) to localized and metastatic prostate cancers.

Differential expression of human-specific chimeric host/virus transcripts segregates cancer patients into subgroups with markedly distinct long-term survival probabilities

It has been hypothesized that awakening of SCARs is associated with activation of stemness genomic networks in cancer cells and the emergence of clinically lethal death from cancer phenotypes in patients diagnosed with multiple types of malignant tumors [6-9]. Insertions of SCARs in defined regions of the hESC genome appear to markedly affect the expression of host genes and chimeric host/virus transcripts by creating alternative promoters, exonization, and alternative splicing [18-20]. These data suggest that genomic signatures of the activation of SCARs networks may consist of different classes of genetic elements, including SCARs-derived transcripts, SCARs-regulated protein-coding genes, chimeric host/virus transcripts, and non-coding RNAs. Interestingly, while ~75% of the full-length LTR7/HERVH loci appear highly conserved in humans and non-human primates (Table 1), more than 300 loci represent candidate human-specific regulatory elements, thus underscoring the need for exploration of biological roles of both conserved primate-specific and unique to human regulatory SCARs-derived sequences. Of note, full-length human-specific LTR7/HERVH sequences are significantly enriched among the transcriptionally active loci compared with the inactive LTR7/HERVH loci (Table 1). Therefore, mRNA expression profiles of protein-coding genes comprising structural components of the host/virus chimeric transcripts may be useful for the assessment of the potential clinical relevance of the locus-specific SCARs activation in human tumors. To assess the potential clinical relevance of SCARs activation, the patterns of changes of mRNA expression levels of protein-coding genes comprising structural components of the host/virus chimeric transcripts were evaluated in association with long-term survival probabilities of cancer patients defined by the Kaplan-Meier survival analysis (Fig. 1). The primary focus of this analysis was on the host/virus chimeric transcripts which harbor human-specific SCARs insertions and, therefore, were defined as candidate human-specific regulatory sequences (Tables 1-3).

Interrogation of two TCGA Pan-Cancer databases, comprising 5,158 clinical samples across 12 TCGA cohorts (PANCAN12 study of 12 distinct cancer types) and 12,093 clinical samples across all TCGA cohorts (https://genome-cancer.soe.ucsc.edu/proj/site/xena/datapages/), demonstrates that changes of gene expression and gene copy numbers of SCARs-targeted protein-coding genes manifest two distinct association patterns with the long-term survival of cancer patients (Fig. 1).

One of the association patterns is defined by the observations that increased gene expression levels of the SCARs-targeted genes appear associated with decreased likelihood of cancer patients' survival. This pattern was observed for the PLCXD1 and CCL26 genes (Fig. 1). In contrast, the second association pattern is illustrated by the evidence that decreased gene expression levels of the SCARs-targeted genes are associated with decreased probabilities of cancer patients' survival. This pattern was observed for the ZNF443, LRBA, TPT1, ABHD12B, and LIN7A mRNAs (Fig. 1).
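The survival comparisons described above rest on Kaplan-Meier estimates computed separately for patient groups with higher and lower expression of a given gene. The following is a minimal Python sketch of the product-limit estimator only; the function name `kaplan_meier`, the toy follow-up times, and the idea of comparing two pre-defined groups are illustrative assumptions and do not reproduce the actual TCGA analysis pipeline or its grouping thresholds.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return the Kaplan-Meier curve as (time, survival probability) steps.

    times  : follow-up time of each patient (e.g. days after diagnosis)
    events : 1 if death was observed at that time, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set.
        n_at_risk -= removed
    return curve


# Hypothetical toy data: two expression groups for one gene.
high_expression = kaplan_meier([100, 200, 300, 400], [1, 1, 0, 1])
low_expression = kaplan_meier([500, 900, 1500, 2000], [0, 1, 0, 0])
```

In practice the two curves would be compared with a log-rank test; that step is omitted here to keep the sketch short.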

Association patterns similar to those seen in the TCGA Pan-Cancer datasets were observed during the analyses of the cancer type-specific patients' survival profiles (Fig. 1B), including the TCGA Breast Cancer cohort (1,241 clinical samples); the TCGA Prostate Cancer cohort (568 clinical samples); and the TCGA Rectal Cancer cohort (187 clinical samples). Notably, among patients diagnosed with prostate and rectal cancers, it appears possible to identify a good-prognosis sub-group of patients comprising individuals with ~100% survival probability more than 10 years after diagnosis and therapy (Fig. 1 and Supplemental Fig. S2). Therefore, changes of mRNA expression levels and gene copy numbers of SCARs-targeted protein-coding genes with human-specific retroviral insertions comprising structural elements of host/virus chimeric transcripts seem consistent with the hypothesis that different SCARs activation patterns observed in malignant tumors are associated with clinically distinct outcomes in cancer patients.

Somatic non-silent mutations' fingerprints associated with increased likelihood of death from cancer

For efficient evidence-based, individualized management of cancer patients and development of novel diagnostic, prognostic, and therapeutic applications, it would be particularly useful to identify the genetic signatures of somatic non-silent mutations of clinical intractability of malignant tumors, which is defined by the increased probabilities of therapy failure, disease recurrence, metastatic progression, and ultimately death from cancer. To this end, the SCARs' genomic networks and cancer driver genes were systematically searched for genes that acquired somatic non-silent mutations, detection of which in tumor samples is associated with increased likelihood of death from cancer. Multiple statistically significant instances of this type of association were observed: that is, genes of the SCARs-associated genomic networks acquired somatic non-silent mutations (SNMs) in malignant tumors, and cancer patients having tumors with these mutations manifested a significantly decreased long-term survival probability and increased likelihood of death from cancer (Fig. 5). These observations implied that there are genes within SCARs-associated genomic networks that may function as genetic drivers of clinically lethal death from cancer phenotypes. Conversely, it was reasonable to expect that some of the genes previously defined as cancer drivers may constitute a category of candidate SCARs-regulated genes.

This hypothesis has been tested by determining how many previously reported candidate cancer driver genes were also identified in independent experiments as candidate SCARs-regulated genes, which were recently discovered using shRNA approaches [19]. A total of 183 of 291 genes (63%) reported as the high-confidence cancer driver genes [22] were identified as candidate HERVH/LBP9-regulated genes in the hESC. Similarly, 75 of 127 genes (59%) previously identified as significantly mutated genes in human tumors [23] were reported among the candidate HERVH/LBP9-regulated genes. Lastly, 325 of 572 genes (57%) of the latest release of the Cancer Gene Census (http://cancer.sanger.ac.uk/census) were identified as candidate HERVH/LBP9-regulated genes in the hESC. Collectively, these observations indicate that a majority of genes that exhibit signals of positive selection across multiple cohorts of tumor samples and were defined as candidate cancer driver genes appear to be regulated by the HERVH/LBP9 stemness pathway in the hESC.

Based on these considerations, an 18-gene death from cancer SNMs' signature has been identified that segregates patients with decreased survival probability and increased likelihood of death from cancer (Fig. 5). Detection of somatic non-silent mutations in each of these eighteen genes isolated from tumor samples appears associated with poor long-term prognosis of cancer patients compared with patients whose tumors do not have somatic non-silent mutations of these genes (Fig. 5). Significantly, it has been observed that ~70% of all cancer death events occurred in the poor prognosis patients' sub-group defined by the 18-gene death from cancer mutations' signature, whereas the TP53 mutations signature alone captured less than 50% of death events (Fig. 5). The eighteen genes comprising the death from cancer SNMs' signature represent human genes in which the presence of somatic non-silent mutations was detected in a single pan-cancer dataset of 7,509 tumor samples across all TCGA cohorts and confirmed during the follow-up analyses of 9 pan-cancer datasets ranging from 1,934 to 8,272 tumor samples, provided that a requirement is met that the presence of these mutations in tumors is associated with significantly increased likelihood of death from cancer defined by the Kaplan-Meier survival analysis (see below). Notably, when the additional nine significant SNMs genes were included in the Kaplan-Meier survival analyses, the classification power of the SNMs signature appears to increase only marginally (Fig. 5).

Cancer survival likelihood classification performance of the SNMs genes was confirmed using several additional analyses (Supplemental Fig. S3). In these analyses only patients with the complete clinical records of the follow-up survival data were included. Comparisons of the Kaplan-Meier survival analyses of 7,258 cancer patients with and without SNMs in their tumors demonstrate that cancer patients whose tumors harbor at least three SNMs genes manifested the shortest median survival (1,438 days), compared with patients with two SNMs genes (median survival 1,725 days) or patients with just one SNMs gene (median survival 1,944 days). Cancer patients without SNMs genes in their tumors had the longest median survival time (4,068 days). When 7,258 cancer patients were stratified into three sub-groups of identical size (n=2,419) after sorting in the ascending order of their survival time, 63.4% of patients with the median survival of 360 days had the SNMs genes in their tumors, whereas 58.5% and 51.8% of cancer patients with the median survival of 869 days and 4,222 days had the SNMs genes in their tumors, respectively (Supplemental Figure S3A).
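The tertile comparison described above (sorting patients by survival time, splitting them into three equal-sized sub-groups, and reporting the share of patients with SNMs in each) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `tertile_summary` and the toy inputs are hypothetical, and dropping the remainder when the cohort size is not divisible by three is an assumption made here for simplicity.

```python
from statistics import median
from typing import List, Sequence, Tuple


def tertile_summary(survival_days: Sequence[float],
                    has_snm: Sequence[bool]) -> List[Tuple[float, float]]:
    """Sort patients by survival time, split them into three equal-sized
    groups, and return (median survival, percent of patients with an SNM)
    for each group, from shortest to longest survival."""
    if len(survival_days) != len(has_snm):
        raise ValueError("inputs must have the same length")
    if len(survival_days) < 3:
        raise ValueError("need at least three patients")
    paired = sorted(zip(survival_days, has_snm))
    size = len(paired) // 3  # assumption: any remainder is dropped
    summary = []
    for g in range(3):
        group = paired[g * size:(g + 1) * size]
        days = [d for d, _ in group]
        pct_mutated = 100.0 * sum(1 for _, m in group if m) / len(group)
        summary.append((median(days), pct_mutated))
    return summary


# Hypothetical toy cohort of six patients.
print(tertile_summary([10, 20, 30, 40, 50, 60],
                      [True, True, True, False, False, False]))
```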

Visualization of mutations' fingerprints of genes harboring the SNMs signatures of death from cancer phenotypes revealed that these genes isolated from clinical tumor samples appear "littered" with mutations, a vast majority of which is represented by the SNMs (Supplemental Figure S3B).

Interestingly, 11 of 18 (61%) death from cancer SNMs' signature genes are located near fifteen human-specific NANOG-binding sites [3], suggesting that these genes may represent genetic elements of the NANOG-regulatory network in the hESC. The placement of 15 human-specific NANOG-binding sites near 11 death from cancer SNMs' signature genes is significantly higher than could be expected by chance alone (p = 9.95E-05; hypergeometric distribution test). This is in contrast to other human-specific transcription factor binding sites (CTCF; POU5F1; RNAPII), none of which manifest significant placement enrichment near death from cancer SNMs' signature genes (data not shown). Notably, the changes of gene copy numbers of all of these 18 genes seem associated with poor long-term survival of cancer patients (Supplemental Fig. S4), thus confirming the potential diagnostic and prognostic values of this gene panel using independent analytical end points for detection of gene-specific genetic alterations.
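For orientation, a hypergeometric enrichment test of the kind named above can be sketched in a few lines of Python. Only the counts 18 (signature genes) and 11 (signature genes near human-specific NANOG-binding sites) come from the text; the background size and the number of background genes near such sites are hypothetical placeholders, so the printed value is not expected to reproduce the reported p-value.

```python
from math import comb


def hypergeom_upper_tail(population: int, hits_in_population: int,
                         draws: int, observed_hits: int) -> float:
    """P(X >= observed_hits) when `draws` items are sampled without
    replacement from `population` items, `hits_in_population` of which
    are hits."""
    total = comb(population, draws)
    upper = min(hits_in_population, draws)
    favourable = sum(
        comb(hits_in_population, i) * comb(population - hits_in_population, draws - i)
        for i in range(observed_hits, upper + 1)
    )
    return favourable / total


# Hypothetical background: 20,000 genes, 4,000 of them near a binding site.
# 18 and 11 are the counts given in the text.
p_value = hypergeom_upper_tail(20000, 4000, 18, 11)
print(p_value)
```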

Next, a search for genes in which detection of SNMs is associated with increased likelihood of death from cancer was conducted, employing multiple pan-cancer datasets (see below) to interrogate 127 genes significantly mutated in human cancer [23] and 177 genes listed in the catalogue of somatic mutations in cancer, COSMIC (http://cancer.sanger.ac.uk/cosmic/census). In total, 42 genes have been identified which acquired somatic non-silent mutations in clinical samples of malignant tumors and in which the presence of these mutations is associated with significantly increased likelihood of poor therapy outcomes and death from cancer (Supplemental Data Set S3). Notably, 33 of 42 (78.6%) genes harboring mutations' fingerprints of death from cancer phenotypes constitute members of SCARs-associated genomic networks (Supplemental Table S1 and Supplemental Data Set S3).

Validation analyses of SNMs' signatures associated with increased likelihood of death from cancer

Detection of somatic non-silent mutations (SNMs) in genome-wide high-throughput experiments represents a significant experimental and analytical challenge. SNMs' calls are affected by numerous factors even during the processing of the same DNA samples. In addition to technical factors, such as library preparation and sequencing platforms, differences in analytical and computational methodologies, such as mapping of sequencing reads and calling algorithms, the choice of the reference genome database, genome annotation, and target selection regions all contribute to the identification of SNMs. Finally, differences in ad-hoc pre/post data processing, such as black lists of genes and samples, may be a confounding factor. To account for these potential sources of variability, the significance of the associations between cancer patients' survival and SNMs calls was examined using the databases of somatic non-silent mutations calls reported by different research teams for pan-cancer datasets available at the UCSC Xena browser. In total, ten pan-cancer datasets comprising from 1,934 to 8,272 tumor samples were evaluated in this analysis (Supplemental Data Set S3). All eighteen genes of the SNMs' death from cancer phenotype signature (Fig. 5) were scored as statistically significant genes in at least two pan-cancer datasets (Supplemental Data Set S3). Seventeen of eighteen SNMs' signature genes (94.4%) were identified in at least three datasets as statistically significant genes, SNMs in which were associated with the increased likelihood of death from cancer defined by the Kaplan-Meier analysis (Supplemental Data Set S3). Similarly, detection of SNMs in 39 of 42 genes (92.9%) was associated with the significantly increased likelihood of death from cancer in at least two pan-cancer datasets (Supplemental Data Set S3). Taken together, these observations seem to argue that the genes identified herein represent promising candidate genetic markers that are sufficiently robust to justify definitive mutation target site-specific validation experiments and follow-up structural-functional and mechanistic studies.

Linear regression analyses of the clinical intractability of malignant tumors in patients diagnosed with multiple types of malignant tumors revealed striking evidence of associations between the likelihood of dying from cancer, cancer types, and the presence of SNMs' death from cancer signatures in tumors (Fig. 5). In one analysis, cancer patients' survival data from the TCGA Pan-cancer cohort of 28 cancer types were utilized to calculate the percent of death events for each cancer type. The resulting values were aligned with the percent of patients with the SNMs' death from cancer signatures in the corresponding groups of cancer patients and subjected to the linear regression analysis (Fig. 5C). In another analysis, age-adjusted cancer incidence and death rates (per 100,000 people) in the United States for 19 cancer types were obtained from the Centers for Disease Control and Prevention (CDC) United States Cancer Statistics (USCS) report. The estimated death rates for each cancer type were calculated by multiplying the corresponding values of incidence rates and percentages of patients with the SNMs death from cancer signatures. The estimated death rate values were aligned with the actual death rates for the corresponding cancer types and subjected to the regression analysis (Fig. 5D). In both instances, strikingly significant correlations were observed, strongly supporting the hypothesis that the presence of SNMs' signatures in tumors may represent a molecular signal of the increased likelihood of developing clinically lethal disease.
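The second regression analysis described above can be illustrated with a short Python sketch: the estimated death rate of a cancer type is its incidence rate multiplied by the fraction of patients carrying the SNMs signature, and the estimates are then regressed against the actual death rates. All numeric inputs below are hypothetical toy values, not CDC or TCGA figures, and the function names are illustrative assumptions.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def estimated_death_rates(incidence_per_100k: Sequence[float],
                          pct_with_signature: Sequence[float]) -> List[float]:
    """Estimated death rate = incidence rate x percent of patients whose
    tumors carry the SNMs signature (percent expressed on a 0-100 scale)."""
    return [inc * pct / 100.0
            for inc, pct in zip(incidence_per_100k, pct_with_signature)]


def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares fit of y on x.

    Returns (slope, intercept, Pearson correlation coefficient)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


# Hypothetical toy values for three cancer types.
incidence = [120.0, 60.0, 40.0]        # per 100,000 people
pct_signature = [20.0, 50.0, 70.0]     # percent of patients with the signature
actual_deaths = [22.0, 31.0, 27.0]     # per 100,000 people

estimates = estimated_death_rates(incidence, pct_signature)
slope, intercept, r = linear_fit(estimates, actual_deaths)
```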

Collectively, the present analyses indicate that molecular evidence of activation of defined genetic elements of SCARs-associated genomic networks in clinical tumor samples appears linked with the increased likelihood of manifestation of clinically lethal death from cancer phenotypes defined by the poor long-term survival of cancer patients after diagnosis and therapy of malignant tumors. The observed significant correlation of poor survival of cancer patients and copy number changes of genes constituting the master transcriptional regulators of SCARs activity and maintenance of the stemness networks in hESC, namely KLF4, LBP9, POU5F1, and NANOG, strongly supports this hypothesis (Supplemental Fig. S4). These data suggest that activation of SCARs-associated genomic networks in cancer cells may provide selective growth and/or survival advantages and represent genetic signals of positive selection during malignant progression.

This conclusion is further supported by the analysis of the expression of proteins encoded by the SCARs-regulated genes in the clinical samples of the TCGA PANCAN12 cohort (Fig. 6). All available protein expression data associated with the Kaplan-Meier survival curves were evaluated for 38 HERVH/LBP9-regulated genes. Notably, changes in the protein expression levels of 23 SCARs-regulated genes (60.5%) manifested significant associations with the long-term survival probability of cancer patients (Supplemental Data Set 1). Examples of these highly significant associations are shown in Fig. 6, supporting the hypothesis that functional alterations of the SCARs-associated stemness genomic networks may play a role in clinically lethal disease progression in cancer patients.

Based on the results of the present analyses, it has been concluded that TCGA-guided surveys of SCARs networks in 12,093 clinical samples across all TCGA cohorts representing twenty-nine distinct types of human cancer revealed pan-cancer genomic signatures of clinically lethal therapy-resistant disease defined by the presence of somatic non-silent mutations (SNMs), gene-level copy number changes, and transcripts' and proteins' expression of SCARs-regulated host genes. The genes reported in this communication represent promising candidate genetic markers of clinically lethal forms of human cancer that are sufficiently robust to justify definitive mutation target site-specific validation experiments and follow-up structural-functional and mechanistic studies.

Genome-wide mapping of defined genetic signatures of distinct SCARs' loci revealed marked expansion in the human genome of conserved protein domains encoded by the human-specific chimeric transcripts.

Analysis of conserved protein domains within translated amino acid sequences encoded by human-specific SCARs-derived host/virus chimeric transcripts demonstrates that different SCARs' loci manifest distinct protein-coding signatures defined by the combinatorial patterns of conserved protein domains (Fig. 2 and Supplemental Fig. S1). Systematic BLAST analyses of individual SCARs' sequences demonstrate that mutations of viral sequences degraded the full coding potentials of functional viral proteins and only residual structures of certain conserved protein domains remain preserved (Fig. 2 and Supplemental Fig. S1). Notably, one of the most frequently represented conserved protein domains within translated amino acid sequences encoded by human-specific SCARs-derived host/virus chimeric transcripts is the GVQW amino acid sequence (Figs. 2 & 3).

Because nucleotide sequences of distinct SCARs' loci encoding the GVQW amino acid sequence are readily distinguishable, it was possible to ascertain the numbers of the GVQW-encoding sequences in the human genome that were seeded by different SCARs loci. It has been hypothesized that this analysis may be useful for evaluation of the relative impact of expansion of different SCARs loci on spreading the GVQW domain across the human genome.

Genome-wide mapping of specific genetic signatures of distinct SCARs' loci encoding the conserved GVQW protein domain identified thousands of locus-specific genetic fingerprints scattered across the human genome, which were defined as nucleotide sequences having 100% sequence identity with no gaps or insertions compared with the parental SCARs' sequence (Fig. 3). Remarkably, this analysis revealed that the majority of DNA sequences encoding the GVQW conserved protein domain sequences in the human genome seem to originate from the human-specific chimeric transcripts derived from DNA sequences on chrY:278899-284215 & chrX:278899-284215 (Fig. 3). This expansion of specific SCARs-derived nucleotide sequences may have contributed to the marked enrichment of the GVQW conserved protein domains within the human proteome compared with other Great Apes (Fig. 3).
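The matching criterion stated above (100% sequence identity with no gaps or insertions relative to the parental sequence) amounts to an exact-substring search on both DNA strands. The following Python sketch illustrates that criterion on toy strings; the function names and sequences are hypothetical, and a genome-scale search would in practice rely on an indexed aligner rather than a linear scan.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_exact_matches(genome: str, query: str) -> List[Tuple[int, str]]:
    """Return sorted (0-based start, strand) pairs for every position where
    `query` occurs in `genome` with 100% identity and no gaps, on either
    strand. Minus-strand hits are reported by their plus-strand start."""
    genome = genome.upper()
    query = query.upper()
    hits = set()
    for strand, pattern in (("+", query), ("-", reverse_complement(query))):
        start = genome.find(pattern)
        while start != -1:
            hits.add((start, strand))
            # Advance by one base so overlapping occurrences are also found.
            start = genome.find(pattern, start + 1)
    return sorted(hits)


# Hypothetical toy sequences, not real SCARs-derived fingerprints.
print(find_exact_matches("AAGGCTTGCCAA", "GGC"))
```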

Further analysis revealed that zinc finger proteins represent one of the largest protein families in the human genome that harbor the GVQW domains. Therefore, it was of interest to determine whether expression of the zinc finger proteins harboring the GVQW domains is altered in malignant tumors from cancer patients with distinct long-term survival after therapy. Remarkably, this analysis demonstrates that changes of mRNA expression levels and gene copy numbers of zinc finger proteins harboring the GVQW domains appear to segregate cancer patients into sub-groups with markedly distinct treatment outcomes (Supplemental Fig. S2). The observed patterns of changes in gene expression and gene copy numbers seem useful for identification of individuals with increased likelihood of therapy failure and death from cancer among patients diagnosed with prostate, breast, colon, rectal, and pancreatic cancers (Supplemental Fig. S2). It will be of interest to determine experimentally what the function of the GVQW domain is and how the insertion of this domain into specific protein sequences affects the structural-functional properties of host proteins.

Remarkably, the gene-level copy number changes of all 21 zinc finger proteins with GVQW conserved protein domains and three SCARs network zinc finger protein genes (ZNF443; ZNF587; ZNF814) manifest highly significant associations with poor prognosis and increased likelihood of death from cancer, as defined by the Kaplan-Meier survival analyses of the 12,093 clinical samples comprising the TCGA Pan-cancer cohort (Fig. 4). These data strengthen the conclusion regarding the potential diagnostic and prognostic value of the zinc finger proteins containing the conserved GVQW domains for the clinical management of cancer patients and identification of individuals with increased risk of therapy failure and disease progression.
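For readers unfamiliar with the Kaplan-Meier approach referred to above, the following is a minimal sketch of the standard product-limit estimator. The function name and the toy inputs are hypothetical illustrations; they are not taken from the TCGA data, and this is not the implementation used by the UCSC Xena tools.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each time with >= 1 death.

    times  : follow-up time for each patient
    events : 1 if death was observed at that time, 0 if the patient was censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)          # deaths plus censored cases leaving at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving this time point.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve
```

Two such curves, one per patient sub-group (for example, with and without a given copy number change), would then be compared to judge whether the sub-groups differ in long-term survival.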

Putative role of DNA repair pathways in creation of human-specific regulatory sequences encoded by endogenous human SCARs.

Mammalian cells have evolved to employ highly effective DNA repair pathways capable of patching DNA double-stranded breaks (DSBs) with almost any DNA molecules available in the vicinity of the lesions [24, 25]. Insertions of transposable element (TE)-derived DNA sequences (including DNA transposons and both LTR and non-LTR retrotransposons) at the site of DNA lesions appear to be utilized by eukaryotic cells to repair DSBs [26-31]. An alternative model of TE-derived DNA capture, an endonuclease-independent L1 insertion mechanism at DNA DSB repair sites, has been proposed [27, 28, 30]. This pathway was initially observed in DNA repair-deficient rodent cell lines [27]. Subsequent reports indicated that this mechanism is likely to function in the human genome as well [28, 30-32]. It has been suggested that non-classical mechanisms of TE insertions may be associated with DSB repair mediated by Alu elements [31] and HERV-K retroviruses [32]. It was of interest to ascertain whether SCARs activity may have contributed to DNA repair in human cells.

A consensus signature feature of the non-classical TE-insertion mechanisms observed for various classes of retrotransposons is deletions of ancestral DNA sequences within the sites of insertions of TE-derived sequences. Human-specific deletions associated with TE-mediated DSBs often extend for thousands of base pairs of ancestral DNA sequences [31, 32]. To ascertain whether SCARs may have contributed to the DSB repair pathways, candidate human-specific regulatory sequences (HSRS) encoded by endogenous human SCARs were identified and analyzed for the presence of human-specific gains (insertions) and losses (deletions) of regulatory DNA (Tables 1, 2). As expected, a majority (75.0%-79.5%) of HSRS that are transcriptionally active in human pluripotent stem cells contain human-specific insertions (Table 2). Remarkably, the DNA sequence conservation analysis employing the LiftOver algorithm and Multiz Alignments of 20 mammals (17 primates) of the UCSC Genome Browser on Human Dec. 2013 (GRCh38/hg38) Assembly (http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&position=chr1%3A90820922-90821071&hgsid=441235989_eelAivpkubSY2AxzLhSXKL5ut7TN) revealed that 74.4%-88.6% of SCARs-encoded HSRS contain deletions of ancestral DNA sequences defined by the comparisons with the chimpanzee and bonobo genomes (Table 2). Notably, 40.0%-59.1% of SCARs-encoded HSRS contain large continuous human-specific losses of DNA segments exceeding 1,000 bp in length. Some of the most extreme examples include the human-specific deletion of 27,843 bp (hg38 coordinates: chr4:132,117,632-132,124,853) compared with the chimpanzee genome and the human-specific deletion of 81,108 bp (hg38 coordinates: chr4:3,927,445-3,933,080) compared with the bonobo genome. Similarly, large human-specific deletions of 75,171 bp (chr12:8,279,022-8,294,090), 35,326 bp (chr4:3,927,445-3,933,080), and 71,036 bp (chr1:112,809,666-112,826,054) were detected at different loci of SCAR's insertions compared with gorilla, orangutan and gibbon genomes, respectively.

The present analysis identified 101 SCARs-encoded human-specific regulatory loci, transcriptionally active in human pluripotent stem cells, that underwent multiple independent events of distinct human-specific DNA losses during primate evolution (Table 2). Genomic coordinates of these 101 loci manifesting human-specific deletions' cascade patterns were identified by comparisons of human DNA sequences with the orthologous sequences of non-human primates using the UCSC Genome Browser tracks of the Multiz Alignments of 20 mammals (17 primates). In this analysis, HSRS were defined as genomic loci with human-specific deletions' cascade patterns when a continuous human-specific DNA sequence in the human genome manifests at least 2 distinct events of human-specific deletions compared to genomes of at least 2 different species of non-human primates, which were selected from the group consisting of chimpanzee, bonobo, gorilla, orangutan, and gibbon. Therefore, genomic loci manifesting human-specific deletions' cascade patterns appear to experience repeated losses of distinct continuous DNA segments over extended time periods during primate evolution, which would be consistent with the mechanism of repetitive cycles of occurrence of DSBs and repair of DNA molecules mediated by the insertions of SCARs sequences at these genomic locations.
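The deletions' cascade criterion stated above (at least 2 distinct deletion events relative to at least 2 different non-human primate species) can be sketched as a simple test. This is an illustration only: the function name, the tuple layout, and the choice to treat two events as "distinct" when their coordinates differ are assumptions, not part of the disclosed pipeline.

```python
# Species named in the definition of the deletions' cascade pattern.
PRIMATES = {"chimpanzee", "bonobo", "gorilla", "orangutan", "gibbon"}


def has_deletion_cascade(deletions):
    """Check one locus against the cascade criterion.

    deletions: iterable of (species, start, end) tuples, one per
    human-specific deletion observed at the locus (hypothetical layout).
    """
    valid = [(sp, s, e) for sp, s, e in deletions if sp in PRIMATES]
    # Assumption: events are distinct when their coordinates differ.
    distinct_events = {(s, e) for _, s, e in valid}
    species = {sp for sp, _, _ in valid}
    return len(distinct_events) >= 2 and len(species) >= 2
```

For example, a locus with one deletion seen against chimpanzee and a different deletion seen against gorilla passes, whereas the same deletion seen against both chimpanzee and bonobo counts as a single event and does not.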

These distinctive structural features of human-specific SCAR's integration sites suggest that molecular mechanisms of the SCARs-associated DSB repair may be similar to a backup DNA repair pathway known as alternative non-homologous end-joining (Alt-NHEJ), because the hallmark features of the repair junctions built by the Alt-NHEJ pathway are large DNA deletions, insertions, and tracts of microhomology [33, 34]. Collectively, these data support the hypothesis that the Alt-NHEJ pathway of DSB repair may have contributed to the insertions of SCARs at specific genomic locations, which resulted in creation of HSRS transcriptionally active in human pluripotent stem cells (Fig. 7).

DESCRIPTION OF POTENTIAL BIOLOGICAL, PATHOPHYSIOLOGICAL, DIAGNOSTIC, AND THERAPEUTIC IMPLICATIONS

Implications for the liquid biopsy applications

Observations that malignant tumors shed cell-free fragments of DNA into the bloodstream as a result of apoptotic and/or necrotic death of cancer cells pave the way for the disclosure and rapid introduction into experimental and clinical cancer research of the concept of a liquid biopsy based on the analysis of circulating cell-free DNA (cfDNA) derived from cancer cells. The consensus view emerged that the load of cfDNA derived from cancer cells appears to correlate with tumor staging and prognosis [Diaz LA Jr, Bardelli A. Liquid Biopsies: Genotyping Circulating Tumor DNA. J Clin Oncol. 2014; 32: 579-86; Haber DA, Velculescu VE. Blood-Based Analyses of Cancer: Circulating Tumor Cells and Circulating Tumor DNA. Cancer Discov. 2014; 4: 650-661; Bettegowda C, et al. Detection of circulating tumor DNA in early- and late-stage human malignancies. Sci Transl Med. 2014; 6: 224ra24; Newman AM, Bratman SV, To J, Wynne JF, Eclov NC, Modlin LA, Liu CL, Neal JW, Wakelee HA, Merritt RE, Shrager JB, Loo BW Jr, Alizadeh AA, Diehn M. An ultrasensitive method for quantitating circulating tumor DNA with broad patient coverage. Nat Med. 2014; 20: 548-54; Dawson SJ, Tsui DW, Murtaza M, Biggs H, Rueda OM, Chin SF, Dunning MJ, Gale D, Forshew T, Mahler-Araujo B, Rajan S, Humphray S, Becq J, Halsall D, Wallis M, Bentley D, Caldas C, Rosenfeld N. Analysis of circulating tumor DNA to monitor metastatic breast cancer. N Engl J Med. 2013; 368: 1199-209; Garcia-Murillas I, Schiavon G, Weigelt B, Ng C, Hrebien S, Cutts RJ, Cheang M, Osin P, Nerurkar A, Kozarewa I, Garrido JA, Dowsett M, Reis-Filho JS, Smith IE, Turner NC. Mutation tracking in circulating tumor DNA predicts relapse in early breast cancer. Sci Transl Med. 2015; 7: 302ra133]. The most recent advances in next generation sequencing technology markedly improved the sensitivity, specificity, and accuracy of the analysis of tumor-derived DNA.
In principle, state of the art next generation sequencing techniques have allowed genotyping of tumor-derived cfDNA for somatic genomic alterations that previously could be documented only by direct analysis of cancer cells. The ability to readily detect and reliably quantify a highly heterogeneous spectrum of mutations in individual tumors using cfDNA-based assays has proven highly efficient in tracking the dynamics of tumor evolution in real time, which can be used for a variety of translational applications facilitating the clinical implementation of the concept of personalized disease management in cancer patients.

Despite the perceived great promise for multiple translational applications, the liquid biopsy technology in its current form has significant limitations. These limitations are particularly apparent when the intended uses of the liquid biopsy for diagnosis of early-stage solid tumors or prospective identification of therapeutically actionable mutations of cancer driver genes are carefully considered. In its current form, the liquid biopsy is primarily utilized for in-depth high-resolution sequencing of cfDNA extracted from blood samples (plasma or serum) with the primary intent to reliably detect somatic mutations in pre-selected sets of cancer driver genes. It seems reasonable to expect that tumor vascularization would be required for cancer cell-derived cfDNA to appear in blood. However, it is well established that the early stages of development of essentially all solid tumors in cancer patients are characterized by the lack of the need for vascularization and, indeed, represent the avascular stage of tumor development and progression for many years with sufficient nutrient supply by diffusion. In this context, the appearance of tumor-derived cfDNA in blood should be regarded as evidence of tumor vascularization and a molecular signal of increased likelihood of malignant progression toward metastatic disease. Consistent with this line of reasoning, tumor-derived cfDNA is reliably and reproducibly detected in blood of >90% of cancer patients with advanced solid tumors, whereas the detection rate drops to ~50% (or less) in blood from patients diagnosed with early-stage cancers. Importantly, it is almost certain that further improvements in the analytical performance of next generation sequencing technology would not dramatically change these realities. It appears that the consensus view is that the primary origin of the cancer cell-derived cfDNA is tumor cells undergoing apoptotic and/or necrotic death.
There is no credible evidence consistently demonstrating that tumor-derived cfDNA extracted from blood samples originates from viable actively dividing cancer cells or from tumor growth-sustaining minority sub-populations of cancer cells such as cells of cancer origin, tumor-initiating cells, or cancer stem cells. Therefore, it is reasonable to believe that mutational signatures of tumor-derived cfDNA extracted from blood of cancer patients represent the past history of tumor evolution, and there is no credible way to discern the real time mutational status or to predict the future of tumor evolution based on the genetic information extracted from dead cancer cells.

The most recent analysis of genome-wide mutational dynamics during tumor evolution at single-nucleus resolution revealed that somatic point mutations, in contrast to aneuploidies, evolved gradually and generated extensive clonal diversity [Wang Y, Waters J, Leung ML, Unruh A, Roh W, Shi X, Chen K, Scheet P, Vattathil S, Liang H, Multani A, Zhang H, Zhao R, Michor F, Meric-Bernstam F, Navin NE. Clonal evolution in breast cancer revealed by single nucleus genome sequencing. Nature. 2014; 512: 155-160]. Targeted single-molecule sequencing conclusively demonstrated that many of the diverse point mutations detected in tumors occur at frequencies <10% of tumor cell populations. In striking contrast, aneuploid rearrangements appeared early in tumor evolution and remained highly stable during the clonal expansion [Wang Y, et al. Clonal evolution in breast cancer revealed by single nucleus genome sequencing. Nature. 2014; 512: 155-160]. This contribution links development of aneuploidies with aberrant activity of SCARs networks and demonstrates that gene expression signatures of activated SCAR's pathway(s) can be detected in clinical samples of cancer precursor lesions, localized tumors, and metastatic cancers. Collectively, these observations strongly argue that activation of SCARs networks and associated genomic aberrations are likely to occur in the cancer precursor cells and continually persist throughout tumor evolution and progression toward metastatic disease. Therefore, detection of the SCARs sequences, SCAR/host gene hybrid sequences, SCARs-regulated protein coding genes, and non-coding RNA sequences identified herein will open remarkable opportunities for diagnostic, prognostic, therapy selection, and disease management applications utilizing the liquid biopsy technology.

Cell-free macromolecules, including nucleic acids and proteins, often reside in nanoscale particles called exosomes. Packaging of DNA and RNA molecules in the exosomes appears to protect them from degradation by extracellular nucleases, and biologically active nucleic acid molecules such as microRNAs and lincRNAs appear to remain stable. Therefore, the sample preparation protocols for liquid biopsy analyses would likely benefit from the inclusion of an exosome enrichment and purification step.

Putative role of SCAR's sequences in DNA repair and increased survival of metastatic cancer cells

Present analyses suggest a plausible biological role for SCARs in DNA repair that may override the potentially harmful effects of retrotransposon-driven mutations by providing immediate survival and fitness advantages to host cells, which would be particularly beneficial for immortal cancer cells. Despite relatively high activity of DNA repair pathways, hESCs exhibit increased sensitivity to radiation-induced DNA damage and apoptosis [35, 36]. It has been suggested that increased sensitivity to apoptosis of hESC is due to a low apoptotic threshold in response to DNA damage [36]. In striking contrast, previously reported experimental and clinical evidence of activation of stemness pathways in therapy resistant malignant tumors, highly metastatic cancer cells, and circulating tumor cells consistently demonstrated genetic and phenotypic associations with manifestations of markedly increased resistance to apoptosis induced by various biologically-relevant micro-environmental changes and different chemical perturbations [37-51]. These important biological distinctions, which are defined by the underlying differences of genomic architectures between normal human pluripotent stem cells and highly malignant populations of tumor cells with activated stemness genetic networks, are likely responsible for relentless growth, self-renewal, survival, and tumor-initiating abilities of cancer stem cells.

Continuing transcriptional activity of SCARs in tumor cells may represent a constant, potentially deadly threat despite their apparent structural deficiencies to encode functional viral genomes. There are many thousands of variants of SCARs' sequences integrated in the human genome, suggesting that many mutations of SCARs' genes can be repaired by recombination with endogenous copies of SCARs' sequences. Consistent with this hypothesis, it has been demonstrated that introduction of mutant retroviruses carrying a lethal deletion in an essential viral gene can result in spread of revertant viruses that repaired the mutation by homologous recombination with endogenous DNA sequences [52].

Genomic networks of stem cell-associated retroviruses harbor signatures of clinically intractable malignant tumors

The present analysis of SCARs and associated stemness genomic networks was focused on genetic loci harboring human-specific insertions and/or deletions that may have contributed to development of human-specific regulatory networks and pathways. One of the primary lines of reasoning for the choice of this strategy is based on the apparent major differences in cancer incidence between humans and nonhuman primates that have been documented extensively.

Prostate carcinoma is essentially nonexistent and lung cancer is very rare in nonhuman primates (53-58). Overall, the incidence rate of common cancers, including breast, prostate, lung, colon, ovary, pancreas, and stomach, is estimated in the range of ~2% to 4% (53-57). Unique to human phenotypic effects of human-specific regulatory loci and pathways operating within the circuitry of stemness genomic networks may have contributed to these dramatic species-specific differences in the cancer incidence.

Based on this idea, the initial analysis was focused on the host/virus chimeric transcripts which harbor human-specific SCARs insertions (Tables 1-3; Fig. 1). Observed changes of mRNA expression levels and gene copy numbers of SCARs-targeted protein-coding genes with human-specific retroviral insertions comprising structural elements of host/virus chimeric transcripts support the hypothesis that different SCAR's activation patterns are associated with significantly distinct long-term survival of cancer patients.

Next, the analysis of conserved protein domains within translated amino acid sequences encoded by human-specific SCARs-derived host/virus chimeric transcripts was carried out. It demonstrates that different SCARs' loci manifest distinct protein-coding signatures defined by the combinatorial patterns of conserved protein domains (Fig. 2 and Supplemental Fig. S1). It has been observed that one of the most frequently represented conserved protein domains within translated amino acid sequences encoded by human-specific SCARs-derived host/virus chimeric transcripts is the GVQW amino acid sequence (Figs. 2 & 3). Using defined SCARs-locus-specific signatures of nucleotide sequences encoding GVQW domains, it has been determined that the origin of a majority of DNA sequences encoding the GVQW amino acid sequences in the human genome is from the human-specific chimeric transcripts encoded by DNA sequences on chrY:278899-284215 and chrX:278899-284215 (Fig. 3). The spreading of SCARs-derived nucleotide sequences appears to result in the marked expansion of the specific GVQW-encoding DNA sequences and ~10-fold enrichment of the GVQW conserved protein domains within the human proteome compared with other Great Apes (Fig. 3). These data strongly argue that one of the biologically-significant consequences of the continuing SCARs activity is the seeding of nucleotide sequences encoding specific conserved protein domains throughout the human genome.

Remarkably, subsequent analysis demonstrates that changes of mRNA expression levels and gene copy numbers of zinc finger proteins harboring the GVQW domains segregate cancer patients into sub-groups with markedly distinct treatment outcomes (Fig. 4 and Supplemental Fig. 2). The observed patterns of changes in gene expression and copy numbers seem to segregate individuals with increased likelihood of therapy failure and death from cancer among patients diagnosed with prostate, breast, colon, rectal, and pancreatic cancers (Supplemental Fig. 2). Among patients diagnosed with prostate and rectal cancers, it appears possible to identify a good prognosis subgroup consisting of individuals with ~100% survival probability more than 10 years after diagnosis and therapy (Supplemental Fig. 2), which may have highly significant clinical implications for the individualized, evidence-based disease management decision making process.

To determine whether genetic signatures of SCARs activity may be potentially useful for diagnostic and prognostic applications, the SCAR's genomic networks were systematically searched for genes that acquired somatic non-silent mutations, detection of which in tumor samples is associated with increased likelihood of death from cancer. A total of 42 human genes have been identified in this contribution that acquired somatic non-silent mutations in clinical tumor samples across all TCGA cohorts, and the presence of these mutations in malignant tumors seems associated with significantly increased likelihood of death from cancer (Fig. 5; Supplemental Table S1; Supplemental Data Set S3). A significant majority of genes (33 of 42; 78.6%) harboring mutations' fingerprints of death from cancer phenotypes constitute members of SCARs-associated genomic networks (Supplemental Table S1 and Supplemental Data Set S3), thus confirming that molecular evidence of activation of defined genetic elements of SCARs-associated stemness genomic networks in clinical tumor samples appears linked with the increased likelihood of manifestation of clinically lethal death from cancer phenotypes defined by the Kaplan-Meier survival analysis. Significantly, it has been observed that more than 70% of all cancer death events occurred in the poor prognosis patients' sub-group defined by the death from cancer SNMs' signature (Fig. 5). One of the significant conclusions reported in this contribution is based on the observations that detection of molecular evidence of altered activities of defined genetic elements of SCARs-associated stemness genomic networks in clinical tumor samples appears associated with the increased likelihood of clinical manifestation of disease progression defined by the poor long-term survival of cancer patients after diagnosis and therapy of malignant tumors.
Observations of engagements of specific genes within SCARs networks in tumors are based on detection of somatic non-silent mutations and changes of gene copy numbers, suggesting that altered activities of SCARs-associated genomic networks in cancer cells may provide selective growth and/or survival advantages and represent genetic signals of positive selection during malignant progression.

Significantly, the clinical intractability of malignant disease, which was ascertained based on the long-term survival of patients diagnosed with twenty-eight cancer types, is directly correlated with the percentage of cancer patients whose tumors harbor somatic non-silent mutations' signatures. Therefore, the genetic correlates of death from cancer phenotypes reported herein may represent highly attractive targets for development of novel diagnostic, prognostic, and therapeutic applications directed against intractable human malignancies.

Consistent with the idea that the human-specific structural-functional features of SCAR's genomic networks may play unique roles in both physiology and pathology of H. sapiens, it has been reported that the HERV-H transcriptome has recently evolved in humans under the influence of directional selection and is likely to exert detectable fitness effects on the host since the chimp-human split (59). Explorations of biologically significant functions of SCARs in pathological and physiological conditions should not focus exclusively on the detection and isolation of infectious viral particles. Like many other HERV families, the majority of SCAR's sequences accumulated multiple mutations and deletions during evolution, and no HERV sequence has been shown to be replication-competent and infectious.

In the human genome, the HERV-K family comprises 91 proviruses with full or partial coding capacity for retroviral proteins and 944 solo LTRs (60). Collectively, HERV-K proviruses maintain open reading frames for all retroviral genes needed for infectivity, and potential recombination among only three HERV-K proviruses could facilitate the production of an infectious retrovirus (61).

However, new conclusive evidence of a significant impact of SCARs-derived retroviral sequences on development of cancer in humans may not necessarily require the isolation of infectious virus and the establishment of a correlation between viral infection and cancer incidence. The pathologically significant effects of retroviral sequences may arise from many different mechanisms of their biological activities and can be demonstrated by the following experimental evidence (62):

Presence of new, cancer-specific integration sites of retroviruses;

Consistent regulatory targeting of one or a few host genes in many different tumors;

Oncogenic actions of protein products of retroviral genes (env; rec; np9);

Targeted regulatory effects on expression of host genes due to contributions of new splice donor or acceptor sites, alternative promoters, and transcription regulatory sites.

In addition, presence of multiple SCAR's sequences on the same and/or different chromosomes is likely to facilitate the chromosomal rearrangements due to recombination events between the genomic loci within the permissive chromatin context.

Present analyses suggest that epigenetic activation of silenced SCAR's loci in differentiated cells may establish a cancer susceptibility state in a cell by engaging stemness regulatory networks. It seems plausible to argue that subsequent mutagenesis and selection of cancer driver genes occur in cells with SCARs-activated stemness networks, which would explain why nearly two-thirds of high confidence cancer drivers and COSMIC genes appear regulated by SCARs in hESC (see above). The central postulate of this hypothesis predicts the presence of pre-cancerous differentiated cells with SCARs-activated stemness networks that may serve as precursors of cancer stem cells, emergence of which would subsequently fuel tumor growth, cancer progression, metastasis, and development of clinically intractable malignancies.

MATERIALS AND METHODS

Data Sources and Analytical Protocols

Solely publicly available datasets and resources were used for this analysis, as well as methodological approaches and a computational pipeline validated for discovery of primate-specific genes and human-specific regulatory loci [3; 63-68]. The individual genetic elements comprising the SCARs-associated stemness genomic networks, including HERVH/LBP9-regulated genes identified in the hESC using shRNA experiments [19], were obtained from the recently published contributions reporting transcriptionally active SCARs loci [12; 16-20], host/virus chimeric transcripts [18-20], and human-specific transcription factor binding sites (TFBS) seeded in the hESC genome by SCARs [3]. The most recent beta release of web-based tools of The Cancer Genome Atlas (TCGA) project, the UCSC Xena (http://xena.ucsc.edu/), associated clinical data, and multiple functional cancer genomics end points identified in thousands of tumor samples were utilized to explore, analyze, and visualize the clinically-relevant patterns of gene expression, somatic non-silent mutations, and gene copy numbers of individual genetic elements of the SCARs-associated stemness genomic networks by interrogating the comprehensive functional cancer genomics datasets of more than twelve thousand annotated clinical tumor samples (https://genomecancer.soe.ucsc.edu/proj/site/xena/datapages/). Pan-cancer signatures of gene expression, somatic non-silent mutations, and copy number changes associated with increased likelihood of death from cancer were identified by interrogation of two TCGA Pan-Cancer databases, comprising 5,158 clinical samples across 12 TCGA cohorts (PANCAN12 study of 12 distinct cancer types) and 12,088 clinical samples across all TCGA cohorts (https://genomecancer.soe.ucsc.edu/proj/site/xena/datapages/).

The sequence conservation analysis is based on the University of California Santa Cruz (UCSC) LiftOver algorithm for conversion of the coordinates of human blocks to corresponding non-human genomes using chain files of pre-computed whole-genome BLASTZ alignments with a MinMatch of 0.95 and other search parameters in default setting (http://genome.ucsc.edu/cgi-bin/hgLiftOver). Extraction of BLASTZ alignments by the LiftOver algorithm for a human query generates a LiftOver output "Deleted in new", which indicates that a human sequence does not intersect with any chains in a given non-human genome. This indicates the absence of the query sequence in the subject genome and was used to infer the presence or absence of the human sequence in the non-human reference genome. Human-specific regulatory sequences were manually curated to validate their identities and genomic features using a BLAST algorithm and the latest releases of the corresponding reference genome databases for time periods between April, 2013 and October, 2015.
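As an illustration of how the "Deleted in new" output described above might be collected, the following minimal sketch parses the lines of a LiftOver unmapped-regions report. It assumes the report layout in which each failed BED record is preceded by a comment line such as "#Deleted in new"; the function name and the example record name are hypothetical, and the sketch is not the curated procedure used in this disclosure.

```python
def deleted_in_new(unmapped_lines):
    """Collect BED intervals reported by LiftOver as 'Deleted in new'.

    unmapped_lines: iterable of text lines from an unmapped-regions report,
    assumed to alternate between a '#reason' comment and a BED record.
    """
    records = []
    reason = None
    for raw in unmapped_lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            reason = line.lstrip("#").strip()
            continue
        if reason == "Deleted in new":
            chrom, start, end = line.split()[:3]
            records.append((chrom, int(start), int(end)))
        reason = None  # assumption: each comment applies to one record only
    return records
```

Intervals returned for a given non-human genome are candidates for sequences absent from that genome, which would then be subject to the manual BLAST-based curation described above.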

Considerations of the putative functionally-significant regulatory effects of SCARs on host genes were based, in part, on the results of the genome-wide proximity placement analyses of the corresponding candidate regulatory elements and target genes. The quantitative limits of proximity during the proximity placement analyses were defined based on several metrics. One of the metrics was defined using the genomic coordinates placing human-specific regulatory sequences closer to putative target protein-coding or lncRNA genes than experimentally defined distances to the nearest targets of 50% of the regulatory proteins analyzed in hESCs [69]. For each gene of interest, specific HSGRL were identified and tabulated with a genomic distance between HSGRL and a putative target gene that is smaller than the mean value of distances to the nearest target genes regulated by the protein-coding TFs in hESCs. The corresponding mean values for protein-coding and lncRNA target genes were calculated based on distances to the nearest target genes for TFs in hESC reported by Guttman et al. [69]. In addition, the proximity placement metrics were defined based on co-localization within the boundaries of the same topologically associating domains (TADs) and the placement enrichment pattern of human-specific NANOG-binding sites (HSNBS) located near the 251 neocortex/prefrontal cortex-associated genes [70]. The placement enrichment analysis of HSNBS identified the most significant enrichment at the genomic distances less than 1.5 Mb with a sharp peak of the enrichment p value at the genomic distance of 1.5 Mb [70].
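The distance-based part of the proximity placement analysis can be sketched as follows. This is a hedged illustration: the function name, the tuple layout, and the single-coordinate representation of elements and genes are assumptions, and the threshold is passed in as a parameter because the mean distance values derived from Guttman et al. [69] are not restated here.

```python
def proximal_pairs(elements, genes, max_distance):
    """List (element, gene, distance) pairs closer than max_distance.

    elements, genes: iterables of (name, chromosome, position) tuples
    (hypothetical layout; one representative coordinate per feature).
    max_distance: proximity limit in base pairs, e.g. a mean distance to
    the nearest TF target gene taken from an external source.
    """
    pairs = []
    for e_name, e_chrom, e_pos in elements:
        for g_name, g_chrom, g_pos in genes:
            if e_chrom != g_chrom:
                continue  # only features on the same chromosome are compared
            distance = abs(e_pos - g_pos)
            if distance < max_distance:
                pairs.append((e_name, g_name, distance))
    return pairs
```

The nested loop is adequate for a sketch; a genome-wide run over thousands of loci would more plausibly sort features per chromosome and use a binary search.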

Comprehensive databases of individual regulatory elements and chromatin regulatory domains identified in the hESC genome were considered in this study. Genomic coordinates of 3,127 topologically-associating domains (TADs) in hESC; 6,823 hESC-enriched enhancers; 6,322 conventional and 684 super-enhancers (SEs) in hESC; 231 SEs and 197 super-enhancer domains (SEDs) in mESC were reported in the previously published contributions [2; 71-74]. Species-specific datasets of NANOG-, POU5F1-, and CTCF-binding sites and human-specific TFBS in hESCs were reported previously [3; 4] and are publicly available. RNA-Seq datasets were retrieved from the UCSC data repository site (http://genome.ucsc.edu/; [75]) for visualization and analysis of cell type-specific transcriptional activity of defined genomic regions. A genome-wide map of the human methylome at single-base resolution was reported previously [76; 77] and is publicly available (http://neomorph.salk.edu/human_methylome). The histone modification and transcription factor chromatin immunoprecipitation sequence (ChIP-Seq) datasets for visualization and analysis were obtained from the UCSC data repository site (http://genome.ucsc.edu/; [78]). Genomic coordinates of the RNA polymerase II (Pol II)-binding sites, determined by the chromatin interaction analysis with paired end-tag sequencing (ChIA-PET) method, were obtained from the saturated libraries constructed for the MCF7 and K562 human cell lines [79]. The density of TF-binding to a given segment of chromosomes was estimated by quantifying the number of protein-specific binding events per 1-Mb and 1-kb consecutive segments of selected human chromosomes and plotting the resulting binding site density distributions for visualization. Visualization of multiple sequence alignments was performed using the WebLogo algorithm (http://weblogo.berkeley.edu/logo.cgi). Consensus TF-binding site motif logos were previously reported [4; 80; 81].
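The binding site density estimate described above (binding events counted per consecutive 1-Mb or 1-kb segment) can be sketched as a simple binning step. The function name and the input layout are hypothetical and serve only to illustrate the counting; plotting is left out.

```python
from collections import Counter


def binding_density(sites, bin_size=1_000_000):
    """Count binding events per consecutive, non-overlapping segment.

    sites: iterable of (chromosome, position) tuples, one per binding event.
    Returns {(chromosome, segment_start): count}, ordered by chromosome name
    and segment start.
    """
    counts = Counter((chrom, pos // bin_size) for chrom, pos in sites)
    return {(chrom, idx * bin_size): n for (chrom, idx), n in sorted(counts.items())}
```

Calling the function with `bin_size=1_000` gives the 1-kb resolution; the resulting dictionary can be passed to any plotting library to draw the density distribution.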

The assessment of conservation of HSGRL in individual genomes of 3 Neanderthals, 12 Modern Humans, and the 41,000-year-old Denisovan genome [82; 83] was carried out by direct comparisons of corresponding sequences retrieved from individual genomes and the human genome reference database (http://genome.ucsc.edu/Neandertal/).

Nucleotide sequences of human-specific chimeric transcripts were translated into amino acid sequences and subjected to the protein alignment analyses using the protein BLAST algorithm (http://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastp&BLAST_PROGRAMS=blastp&PAGE_TYPE=BlastSearch&SHOW_DEFAULTS=on&LINK_LOC=blasthome) and associated web-based tools for identification and visualization of conserved protein domains (http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?RID=3HZ5BMES01R&mode=all), which were described in detail elsewhere [84, 85].

Age-adjusted cancer incidence and death rates in the United States were obtained from the Centers for Disease Control and Prevention (CDC) United States Cancer Statistics (USCS) report: U.S. Cancer Statistics Working Group. United States Cancer Statistics: 1999-2012 Incidence and Mortality Web-based Report. Atlanta: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention and National Cancer Institute; 2015. Available at: www.cdc.gov/uscs.

Statistical Analyses of the Publicly Available Datasets

All statistical analyses of the publicly available genomic datasets, including error rate estimates, background and technical noise measurements and filtering, feature peak calling, feature selection, assignments of genomic coordinates to the corresponding builds of the reference human genome, and data visualization, were performed exactly as reported in the original publications and associated references linked to the corresponding data visualization tracks (http://genome.ucsc.edu/ and http://xena.ucsc.edu/). Any modifications or new elements of statistical analyses are described in the corresponding sections of the Results. Statistical significance of the Pearson correlation coefficients was determined using GraphPad Prism version 6.00 software. The significance of the differences in the numbers of events between the groups was calculated using two-sided Fisher's exact and Chi-square tests, and the significance of the overlap between the events was determined using the hypergeometric distribution test [86].
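For illustration only, the hypergeometric distribution test for the significance of an overlap could be computed along the following lines. The disclosure does not specify the software used for this test, so the function name, its parameters, and the example counts below are assumptions; the sketch computes the upper-tail probability of observing at least the given overlap when one set is drawn at random from the population.

```python
from math import comb


def hypergeom_overlap_pvalue(population: int, successes: int,
                             draws: int, overlap: int) -> float:
    """P(X >= overlap) for X ~ Hypergeometric(population, successes, draws).

    population: total number of items (e.g., all genes considered)
    successes:  size of the first set (e.g., genes in a signature)
    draws:      size of the second set (e.g., regulated genes)
    overlap:    observed number of items shared by both sets
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts chosen only to exercise the function.
p_full_overlap = hypergeom_overlap_pvalue(10, 5, 5, 5)
p_any_overlap = hypergeom_overlap_pvalue(10, 5, 5, 0)
```

Exact integer binomial coefficients are used rather than floating-point approximations, which keeps the tail sum accurate for the small p values typical of enrichment analyses.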

REFERENCES

1. Santoni, F.A., Guerra, J., and Luban, J. HERV-H RNA is abundant in human embryonic stem cells and a precise marker for pluripotency. Retrovirology 2012; 9: 111.

2. Xie W, Schultz MD, Lister R, Hou Z, Rajagopal N, Ray P, Whitaker JW, Tian S, Hawkins RD, Leung D, Yang H, Wang T, Lee AY, Swanson SA, Zhang J, Zhu Y, Kim A, Nery JR, Urich MA, Kuan S, Yen CA, Klugman S, Yu P, Suknuntha K, Propson HE, Chen H, Edsall LE, Wagner U, Li Y, Ye Z, Kulkarni A, Xuan Z, Chung WY, Chi NC, Antosiewicz-Bourget JE, Slukvin I, Stewart R, Zhang MQ, Wang W, Thomson JA, Ecker JR, Ren B. Epigenomic analysis of multilineage differentiation of human embryonic stem cells. Cell 2013; 153: 1134-1148.

3. Glinsky, GV. Transposable Elements and DNA Methylation Create in Embryonic Stem Cells Human-Specific Regulatory Sequences Associated with Distal Enhancers and Noncoding RNAs. Genome Biol Evol. 2015; 7: 1432-54.

4. Kunarso, G, Chia, NY, Jeyakani, J, Hwang, C, Lu, X, Chan, YS, Ng, HH, and Bourque, G. Transposable elements have rewired the core regulatory network of human embryonic stem cells. Nat Genet. 2010; 42: 631-634.

5. Kelley, D, and Rinn, J. Transposable elements reveal a stem cell-specific class of long noncoding RNAs. Genome Biol. 2012; 13: R107.

6. Glinsky GV. Endogenous human stem cell-associated retroviruses. BioRxiv 2015; doi: http://dx.doi.org/10.1101/024273

7. Glinsky GV. SCARs: endogenous human stem cell-associated retroviruses and therapy-resistant malignant tumors. arXiv preprint 2015; arXiv: 1508.02022 http://arxiv.org/abs/1508.02022

8. Glinsky GV. Viruses, stemness, embryogenesis, and cancer: a miracle leap toward molecular definition of novel oncotargets for therapy-resistant malignant tumors? Oncoscience 2015; 2: 751-754.

9. Glinsky GV. Activation of endogenous human Stem Cell-Associated Retroviruses and therapy-resistant phenotypes of malignant tumors. 2018. In revision.

10. Smith ZD, Chan MM, Humm KG, Karnik R, Mekhoubad S, Regev A, Eggan K, Meissner A. DNA methylation dynamics of the human preimplantation embryo. Nature 2014; 511: 611-615.

11. Fort A, Hashimoto K, Yamada D, Salimullah M, Keya CA, Saxena A, Bonetti A, Voineagu I, Bertin N, Kratz A, Noro Y, Wong CH, de Hoon M, Andersson R, Sandelin A, Suzuki H, Wei CL, Koseki H; FANTOM Consortium, Hasegawa Y, Forrest AR, Carninci P. Deep transcriptome profiling of mammalian stem cells supports a regulatory role for retrotransposons in pluripotency maintenance. Nature Genet. 2014; 46: 558-566.

12. Lu X, Sachs F, Ramsay L, Jacques PE, Goke J, Bourque G, Ng HH. The retrovirus HERVH is a long noncoding RNA required for human embryonic stem cell identity. Nat Struct Mol Biol. 2014; 21: 423-425.

13. Ohnuki M, Tanabe K, Sutou K, Teramoto I, Sawamura Y, Narita M, Nakamura M, Tokunaga Y, Nakamura M, Watanabe A, Yamanaka S, Takahashi K. Dynamic regulation of human endogenous retroviruses mediates factor-induced reprogramming and differentiation potential. Proc Natl Acad Sci USA. 2014; 111: 12426-31.

14. Koyanagi-Aoi M, Ohnuki M, Takahashi K, Okita K, Noma H, Sawamura Y, Teramoto I, Narita M, Sato Y, Ichisaka T, Amano N, Watanabe A, Morizane A, Yamada Y, Sato T, Takahashi J, Yamanaka S. Differentiation-defective phenotypes revealed by large-scale analyses of human pluripotent stem cells. Proc Natl Acad Sci USA. 2013; 110: 20569-74.

15. Marchetto MC, Narvaiza I, Denli AM, Benner C, Lazzarini TA, Nathanson JL, Paquola AC, Desai KN, Herai RH, Weitzman MD, Yeo GW, Muotri AR, Gage FH. Differential LINE-1 regulation in pluripotent stem cells of humans and other great apes. Nature 2013; 503: 525-529.

16. Xue Z, Huang K, Cai C, Cai L, Jiang CY, Feng Y, Liu Z, Zeng Q, Cheng L, Sun YE, Liu JY, Horvath S, Fan G. Genetic programs in human and mouse early embryos revealed by single-cell RNA sequencing. Nature 2013; 500: 593-597.

17. Yan L, Yang M, Guo H, Yang L, Wu J, Li R, Liu P, Lian Y, Zheng X, Yan J, Huang J, Li M, Wu X, Wen L, Lao K, Li R, Qiao J, Tang F. Single-cell RNA-Seq profiling of human preimplantation embryos and embryonic stem cells. Nat Struct Mol Biol 2013; 20: 1131-1139.

18. Goke J, Lu X, Chan YS, Ng HH, Ly LH, Sachs F, Szczerbinska I. Dynamic transcription of distinct classes of endogenous retroviral elements marks specific populations of early human embryonic cells. Cell Stem Cell 2015; 16: 135-141.

19. Wang J, Xie G, Singh M, Ghanbarian AT, Rasko T, Szvetnik A, Cai H, Besser D, Prigione A, Fuchs NV, Schumann GG, Chen W, Lorincz MC, Ivics Z, Hurst LD, Izsvak Z. Primate-specific endogenous retrovirus-driven transcription defines naive-like stem cells. Nature 2014; 516: 405-9.

20. Grow EJ, Flynn RA, Chavez SL, Bayless NL, Wossidlo M, Wesche DJ, Martin L, Ware CB, Blish CA, Chang HY, Pera RA, Wysocka J. Intrinsic retroviral reactivation in human preimplantation embryos and pluripotent cells. Nature 2015; 522: 221-5.

21. Robbez Masson L, Rowe HM. Retrotransposons shape species-specific embryonic stem cell gene expression. Retrovirology 2015; 12: 45.

22. Tamborero D, Gonzalez-Perez A, Perez-Llamas C, Deu-Pons J, Kandoth C, Reimand J, Lawrence MS, Getz G, Bader GD, Ding L, Lopez-Bigas N. Comprehensive identification of mutational cancer driver genes across 12 tumor types. Sci Rep. 2013; 3: 2850.

23. Hoadley KA, Yau C, Wolf DM, Cherniack AD, Tamborero D, Ng S, Leiserson MD, Niu B, McLellan MD, Uzunangelov V, Zhang J, Kandoth C, Akbani R, Shen H, Omberg L, Chu A, Margolin AA, Van't Veer LJ, Lopez-Bigas N, Laird PW, Raphael BJ, Ding L, Robertson AG, Byers LA, Mills GB, Weinstein JN, Van Waes C, Chen Z, Collisson EA; Cancer Genome Atlas Research Network, Benz CC, Perou CM, Stuart JM. Multiplatform analysis of 12 cancer types reveals molecular classification within and across tissues of origin. Cell 2014; 158: 929-44.

24. Yu, X. and Gabriel, A. Patching broken chromosomes with extranuclear cellular DNA. Mol. Cell 1999; 4: 873-881.

25. Lin, Y. and Waldman, A.S. Promiscuous patching of broken chromosomes in mammalian cells with extrachromosomal DNA. Nucleic Acids Res. 2001; 29: 3975-3981.

26. Teng, S.C., Kim, B. and Gabriel, A. Retrotransposon reverse transcriptase-mediated repair of chromosomal breaks. Nature 1996; 383: 641-644.

27. Morrish, T.A., Gilbert, N., Myers, J.S., Vincent, B.J., Stamato, T.D., Taccioli, G.E., Batzer, M.A. and Moran, J.V. DNA repair mediated by endonuclease-independent LINE-1 retrotransposition. Nat. Genet. 2002; 31: 159-165.

28. Morrish TA, Garcia-Perez JL, Stamato TD, Taccioli GE, Sekiguchi J, Moran JV. Endonuclease-independent LINE-1 retrotransposition at mammalian telomeres. Nature. 2007; 446: 208-12.

29. Ichiyanagi, K., Nakajima, R., Kajikawa, M. and Okada, N. Novel retrotransposon analysis reveals multiple mobility pathways dictated by hosts. Genome Res. 2007; 17: 33-41.

30. Sen, S.K., Huang, C.T., Han, K., Batzer, M.A. Endonuclease-independent insertion provides an alternative pathway for L1 retrotransposition in the human genome. Nucleic Acids Res. 2007; 35: 3741-3751.

31. Srikanta D, Sen SK, Huang CT, Conlin EM, Rhodes RM, et al. An alternative pathway for Alu retrotransposition suggests a role in DNA double strand break repair. Genomics 2009; 93: 205-212.

32. Shin W, Lee J, Son S-Y, Ahn K, Kim H-S, Han, K. Human-specific HERVK insertion causes genomic variations in the human genome. PLoS ONE 2013; 8: e60605.

33. Nussenzweig A, Nussenzweig MC. A backup DNA repair pathway moves to the forefront. Cell. 2007; 131: 223-225.

34. Iliakis G. Backup pathways of NHEJ in cells of higher eukaryotes: cell cycle dependence. Radiother Oncol. 2009; 92: 310-315.

35. Bogomazova AN, Lagarkova MA, Tskhovrebova LV, Shutova MV, Kiselev SL. Error-prone nonhomologous end joining repair operates in human pluripotent stem cells during late G2. Aging (Albany NY). 2011; 3: 584-98.

36. Fan J, Robert C, Jang YY, Liu H, Sharkis S, Baylin SB, Rassool FV. Human induced pluripotent cells resemble embryonic stem cells demonstrating enhanced levels of DNA repair and efficacy of nonhomologous end-joining. Mutat Res. 2011; 713: 8-17.

37. Glinsky GV, Glinskii AB, Berezovskaya O. Microarray analysis identifies a death-from-cancer signature predicting therapy failure in patients with multiple types of cancer. Journal of Clinical Investigation 2005; 115: 1503-21.

38. Glinsky GV. Death-from-cancer signatures and stem cell contribution to metastatic cancer. Cell Cycle 2005; 4: 1171-5.

39. Glinsky, GV. Genomic models of metastatic cancer: Functional analysis of death-from-cancer signature genes reveals aneuploid, anoikis-resistant, metastasis-enabling phenotype with altered cell cycle control and activated Polycomb Group (PcG) protein chromatin silencing pathway. Cell Cycle 2006; 5: 1208-1216.

40. Berezovska, OP, Glinskii, AB, Yang, Z, Li, X-M, Hoffman, RM, Glinsky, GV. Essential role of the Polycomb Group (PcG) protein chromatin silencing pathway in metastatic prostate cancer. Cell Cycle 2006; 5: 1886-1901.

41. Glinskii AB, Smith BA, Jiang P, Li XM, Yang M, Hoffman RM, Glinsky GV. Viable circulating metastatic cells produced in orthotopic but not ectopic prostate cancer models. Cancer Res. 2003; 63: 4239-43.

42. Berezovskaya O, Schimmer AD, Glinskii AB, Pinilla C, Hoffman RM, Reed JC, Glinsky GV. Increased expression of apoptosis inhibitor protein XIAP contributes to anoikis resistance of circulating human prostate cancer metastasis precursor cells. Cancer Res. 2005; 65: 2378-86.

43. Glinsky GV, Glinskii AB, Berezovskaya O, Smith BA, Jiang P, Li XM, Yang M, Hoffman RM. Dual-color-coded imaging of viable circulating prostate carcinoma cells reveals genetic exchange between tumor cells in vivo, contributing to highly metastatic phenotypes. Cell Cycle. 2006; 5: 191-7.

44. Holt, S., Glinsky, V.V., Ivanova, A.B., Glinsky, G.V. Resistance to apoptosis in human cells conferred by telomerase function and telomere stability. Molecular Carcinogenesis 1999; 25: 241-248.

45. Glinsky, G.V., Glinsky, V.V., Ivanova, A.B., Hueser, C.N. Apoptosis and metastasis: Increased apoptosis resistance of metastatic cancer cells is associated with the profound deficiency of apoptosis execution mechanisms. Cancer Letters 1997; 115: 185-193.

46. Glinsky, G.V. Apoptosis in metastatic cancer cells. Crit. Rev. Oncol/Hemat. 1997; 25: 175-186.

47. Glinsky, GV, Glinsky, VV. Apoptosis and metastasis: A superior resistance of metastatic cancer cells to programmed cell death. Cancer Letters 1996; 101: 43-51.

48. Glinsky GV. Stem cell origin of death-from-cancer phenotypes of human prostate and breast cancers. Stem Cells Reviews 2007; 3: 79-93.

49. Glinsky GV. "Stemness" genomics law governs clinical behavior of human cancer: Implications for decision making in disease management. Journal of Clinical Oncology 2008; 26: 2846-53.

50. Glinsky GV, Berezovska O, Glinskii A. Genetic signatures of regulatory circuitry of embryonic stem cells (ESC) identify therapy-resistant phenotypes in cancer patients diagnosed with multiple types of epithelial malignancies. Cancer Research 2007; 67 (9 Supplement): 1272.

51. Glinskii A, Berezovskaya O, Sidorenko A, Glinsky G. Stemness pathways define therapy-resistant phenotypes of human cancers. Clinical Cancer Research 2008; 14 (15 Supplement): B38.

52. Schwartzberg P, Colicelli J, Goff SP. Recombination between a defective retrovirus and homologous sequences in host DNA: reversion by patch repair. J Virol. 1985; 53: 719-26.

53. McClure HM. Tumors in nonhuman primates: observations during a six-year period in the Yerkes primate center colony. Am J Phys Anthropol. 1973; 38: 425-429.

54. Seibold HR, Wolf RH. Neoplasms and proliferative lesions in 1065 nonhuman primate necropsies. Lab Anim Sci. 1973; 23: 533-539.

55. Beniashvili DS. An overview of the world literature on spontaneous tumors in nonhuman primates. J Med Primatol. 1989; 18: 423-437.

56. Scott, G.B.D. 1992. Comparative primate pathology. Oxford University Press, New York, NY.

57. Waters DJ, Sakr WA, Hayden DW, Lang CM, McKinney L, Murphy GP, Radinsky R, Ramoner R, Richardson RC, Tindall DJ. Workgroup 4: spontaneous prostate carcinoma in dogs and nonhuman primates. Prostate. 1998; 36: 64-67.

58. Simmons HA, Mattison JA. The incidence of spontaneous neoplasia in two populations of captive rhesus macaques (Macaca mulatta). Antioxid Redox Signal. 2011; 14: 221-7.

59. Gemmell, P., Hein, J., Katzourakis, A. Orthologous endogenous retroviruses exhibit directional selection since the chimp-human split. Retrovirology 2015; 12: 52.

60. Subramanian, R.P., Wildschutte, J.H., Russo, C, Coffin, J.M. Identification, characterization, and comparative genomic distribution of the HERV-K (HML-2) group of human endogenous retroviruses. Retrovirology 2011; 8: 90.

61. Hohn, O., Hanke, K., Bannert, N. HERV-K (HML-2), the best preserved family of HERVs: Endogenization, expression, and implications in health and disease. Front Oncol 2013; 3: 246.

62. Bhardwaj, N., Coffin, J.M. Endogenous Retroviruses and Human Cancer: Is There Anything to the Rumors? Cell Host & Microbe 2014; 15: 255-259.

63. Kent, WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002; 12: 656-664.

64. Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R.C., Haussler, D., and Miller, W. Human-mouse alignments with BLASTZ. Genome Res. 2003; 13: 103-107.

65. Tay, S.K., Blythe, J., and Lipovich, L. Global discovery of primate-specific genes in the human genome. Proc. Natl. Acad. Sci. USA 2009; 106: 12019-12024.

66. Capra, J.A., Erwin, G.D., McKinsey, G., Rubenstein, J.L., Pollard, K.S. Many human accelerated regions are developmental enhancers. Philos Trans R Soc Lond B Biol Sci. 2013; 368 (1632): 20130025.

67. Marnetto D, Molineris I, Grassi E, Provero P. Genome-wide identification and characterization of fixed human-specific regulatory regions. Am J Hum Genet 2014; 95: 39-48.

68. Gittelman RM, Hun E, Ay F, Madeoy J, Pennacchio L, Noble WS, Hawkins RD, Akey JM. Comprehensive identification and analysis of human accelerated regulatory DNA. Genome Res 2015; 25: 1245-55.

69. Guttman, M., Donaghey, J., Carey, B.W., Garber, M., Grenier, J.K., Munson, G., Young, G., Lucas, A.B., Ach, R., Bruhn, L., Yang, X., Amit, I., Meissner, A., Regev, A., Rinn, J.L., Root, D.E., and Lander, E.S. lincRNAs act in the circuitry controlling pluripotency and differentiation. Nature 2011; 477: 295-300.

70. Glinsky, GV. Rapidly evolving in humans topologically associating domains. 2015. arXiv: 1507.05368.

71. Dixon, J.R., Selvaraj, S., Yue, F., Kim, A., Li, Y., Shen, Y., Hu, M., Liu, J.S., and Ren, B. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 2012; 485: 378-380.

72. Dowen J.M., Fan Z.P., Hnisz D., Ren G., Abraham B.J., Zhang L.N., Weintraub A.S., Schuijers J., Lee T.I., Zhao K., Young RA. Control of cell identity genes occurs in insulated neighborhoods in mammalian chromosomes. Cell 2014; 159: 374-387.

73. Hnisz, D., Abraham, B.J., Lee, T.I., Lau, A., Saint-André, V., Sigova, A.A., Hoke, H.A., and Young, RA. Super-enhancers in the control of cell identity and disease. Cell 2013; 155: 934-947.

74. Whyte, W.A., Orlando, D.A., Hnisz, D., Abraham, B.J., Lin, C.Y., Kagey, M.H., Rahl, P.B., Lee, T.I., and Young, RA. Master transcription factors and mediator establish super-enhancers at key cell identity genes. Cell 2013; 153: 307-319.

75. Meyer, L.R., Zweig, A.S., Hinrichs, A.S., Karolchik, D., Kuhn, R.M., Wong, M., Sloan, C.A., Rosenbloom, K.R., Roe, G., Rhead, B., Raney, B.J., Pohl, A., Malladi, V.S., Li, C.H., Lee, B.T., Learned, K., Kirkup, V., Hsu, F., Heitner, S., Harte, R.A., Haeussler, M., Guruvadoo, L., Goldman, M., Giardine, B.M., Fujita, P.A., Dreszer, T.R., Diekhans, M., Cline, M.S., Clawson, H., Barber, G.P., Haussler, D., and Kent, W.J. The UCSC Genome Browser database: extensions and updates 2013. Nucleic Acids Res. 2013; 41: D64-69.

76. Lister, R., Pelizzola, M., Dowen, R.H., Hawkins, R.D., Hon, G., Tonti-Filippini, J., Nery, J.R., Lee, L., Ye, Z., Ngo, Q.M., Edsall, L., Antosiewicz-Bourget, J., Stewart, R., Ruotti, V., Millar, A.H., Thomson, J.A., Ren, B., and Ecker, JR. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 2009; 462: 315-322.

77. Lister R, Mukamel EA, Nery JR, Urich M, Puddifoot CA, Johnson ND, Lucero J, Huang Y, Dwork AJ, Schultz MD, Yu M, Tonti-Filippini J, Heyn H, Hu S, Wu JC, Rao A, Esteller M, He C, Haghighi FG, Sejnowski TJ, Behrens MM, Ecker JR. Global epigenomic reconfiguration during mammalian brain development. Science 2013; 341: 1237905.

78. Rosenbloom, K.R., Sloan, C.A., Malladi, V.S., Dreszer, T.R., Learned, K., Kirkup, V.M., Wong, M.C., Maddren, M., Fang, R., Heitner, S.G., Lee, B.T., Barber, G.P., Harte, R.A., Diekhans, M., Long, J.C., Wilder, S.P., Zweig, A.S., Karolchik, D., Kuhn, R.M., Haussler, D., and Kent, WJ. ENCODE data in the UCSC Genome Browser: year 5 update. Nucleic Acids Res 2013; 41: D56-63.

79. Li, G., Ruan, X., Auerbach, R.K., Sandhu, K.S., Zheng, M., Wang, P., Poh, H.M., Goh, Y., Lim, J., Zhang, J., Sim, H.S., Peh, S.Q., Mulawadi, F.H., Ong, C.T., Orlov, Y.L., Hong, S., Zhang, Z., Landt, S., Raha, D., Euskirchen, G., Wei, C.L., Ge, W., Wang, H., Davis, C., Fisher-Aylor, K.I., Mortazavi, A., Gerstein, M., Gingeras, T., Wold, B., Sun, Y., Fullwood, M.J., Cheung, E., Liu, E., Sung, W.K., Snyder, M., and Ruan, Y. Extensive promoter-centered chromatin interactions provide a topological basis for transcription regulation. Cell 2012; 148: 84-98.

80. Wang, J., Zhuang, J., Iyer, S., Lin, X., Whitfield, T.W., Greven, M.C., Pierce, B.G., Dong, X., Kundaje, A., Cheng, Y., Rando, O.J., Birney, E., Myers, R.M., Noble, W.S., Snyder, M., and Weng, Z. Sequence features and chromatin structure around the genomic regions bound by 119 human transcription factors. Genome Res. 2012; 22: 1798-1812.

81. Ernst, J., and Kellis, M. Interplay between chromatin state, regulator binding, and regulatory motifs in six human cell types. Genome Res. 2013; 23: 1142-1154.

82. Reich, D., Green, R.E., Kircher, M., Krause, J., Patterson, N., Durand, E.Y., Viola, B., Briggs, A.W., Stenzel, U., Johnson, P.L., Maricic, T., Good, J.M., Marques-Bonet, T., Alkan, C., Fu, Q., Mallick, S., Li, H., Meyer, M., Eichler, E.E., Stoneking, M., Richards, M., Talamo, S., Shunkov, M.V., Derevianko, A.P., Hublin, J.J., Kelso, J., Slatkin, M., Paabo, S. Genetic history of an archaic hominin group from Denisova Cave in Siberia. Nature 2010; 468: 1053-1060.

83. Meyer, M., Kircher, M., Gansauge, M.T., Li, H., Racimo, F., Mallick, S., Schraiber, J.G., Jay, F., Prufer, K., de Filippo, C., Sudmant, P.H., Alkan, C., Fu, Q., Do, R., Rohland, N., Tandon, A., Siebauer, M., Green, R.E., Bryc, K., Briggs, A.W., Stenzel, U., Dabney, J., Shendure, J., Kitzman, J., Hammer, M.F., Shunkov, M.V., Derevianko, A.P., Patterson, N., Andres, A.M., Eichler, E.E., Slatkin, M., Reich, D., Kelso, J., Paabo, S. A high-coverage genome sequence from an archaic Denisovan individual. Science 2012; 338: 222-226.

84. Marchler-Bauer A, Lu S, Anderson JB, Chitsaz F, Derbyshire MK, DeWeese-Scott C, Fong JH, Geer LY, Geer RC, Gonzales NR, Gwadz M, Hurwitz DI, Jackson JD, Ke Z, Lanczycki CJ, Lu F, Marchler GH, Mullokandov M, Omelchenko MV, Robertson CL, Song JS, Thanki N, Yamashita RA, Zhang D, Zhang N, Zheng C, Bryant SH. CDD: a Conserved Domain Database for the functional annotation of proteins. Nucleic Acids Res. 2011; 39: D225-9.

85. Marchler-Bauer A, Derbyshire MK, Gonzales NR, Lu S, Chitsaz F, Geer LY, Geer RC, He J, Gwadz M, Hurwitz DI, Lanczycki CJ, Lu F, Marchler GH, Song JS, Thanki N, Wang Z, Yamashita RA, Zhang D, Zheng C, Bryant SH. CDD: NCBI's conserved domain database. Nucleic Acids Res. 2015; 43: D222-6.

86. Tavazoie, S., Hughes, J.D., Campbell, M.J., Cho, R.J., and Church, GM. Systematic determination of genetic network architecture. Nat. Genet. 1999; 22: 281-285.

Legends: shHERVH or shLBP9, small hairpin RNAs against HERVH or LBP9; NA, not applicable;

*Number of genes with significant expression changes in both shHERVH and shLBP9 experiments;

**Ratio of HERVH/LBP9 regulated genes to genes the expression of which was not significantly changed;

***Fold enrichment of HERVH/LBP9 regulated genes was calculated compared to the entire set of 87 genes associated with the human embryo development;

****P values were estimated using the hypergeometric distribution test;

Legends: *Sequences conserved in non-human primates were defined based on successful direct and reciprocal conversions between human, bonobo, and chimpanzee reference genome databases using the LiftOver algorithm (MinMatch threshold setting of 0.95) as described in [3]; **HSRS, human-specific regulatory sequences; ***Sequences of 1,222 full-length LTR7/HERVH were successfully converted between hg19 and hg38 database releases of the human reference genome; # Two-sided Fisher's exact test versus inactive LTR7/HERVH elements.

Tables 10-14 (Data Set S2) contain descriptions of human-specific SCARs loci defined based on the direct and reciprocal sequence alignment conversion failures during the comparisons of the human genome sequences to the sequences of the genomes of 17 primates, including genomes of Chimpanzee, Bonobo, Gorilla, Orangutan, Gibbon, and Rhesus. Tables 10-X also denote, for each SCARs locus, the size of human-specific deletions of ancestral DNA defined by the sequence alignments to the genomes of 17 primates.

Table 10, 251 b.c. failures

Note: Tables 4-9 are "Data Set S1", Tables 10-14 are "Data Set S2", and Tables 15-17 are "Data Set S3".

PARAGRAPH 1: A method for diagnosing cancer or predicting cancer-therapy outcome in a subject, comprising: generating target marker information responsive to one or more inputs indicative of a genomic signature pathway and one or more inputs indicative of a proteomic signature pathway of endogenous human Stem Cell-Associated Retroviruses (SCAR); and generating aberrant object information responsive to comparing detected expression levels and sequence information of a biological sample with target marker information.

In an embodiment, generating aberrant object information includes displaying the aberrant object information on a client device, a user interface, and the like. In an embodiment, generating aberrant object information includes exchanging the aberrant object information with a remote network. Non-limiting examples of aberrant object information include aberrant sequence information, aberrant expression level information, information that an expression level is above a target threshold, a detected positioning of a plurality of bases, a sequence aberrant score, and the like.

Further non-limiting examples of aberrant object information include information indicative of a threshold level derived by comparing reference information derived from samples obtained from biological subjects; information indicative of a comparison of at least one input indicative of an expression level and at least one input indicative of a sequence of a biological sample with target marker information; and the like.

PARAGRAPH 2: The method according to PARAGRAPH 1, wherein generating the target marker information includes generating target marker information responsive to one or more inputs indicative of a SCARs pathway.

PARAGRAPH 3: The method according to PARAGRAPH 1, wherein generating the target marker information includes generating target marker information responsive to one or more inputs indicative of a SCARs pathway target gene.

PARAGRAPH 4: The method according to PARAGRAPH 1, wherein generating the target marker information includes generating target marker information associated with one or more of ELF3; PCDH15; MALAT1; PTPN11; RB1; CHST6; NF1; VEZF1; TP53; SMAD4; KEAP1; STK11; PRX; ZNF28; IDH1; FEZ2; DPPA2; LPHN3; KIAA1244; EPHA7; EGFR; TLR4; DAB2IP; NOTCH1; GLUD2; DMD; KDM6A; KRAS; CDKN2A; DNMT3A; FLT3; NFE2L2; NPM1; MIR142; FOXL2; H3F3A; H3F3B; KMT2D; RNF43; TERT; ERBB2; PLCG1.

PARAGRAPH 5: The method according to PARAGRAPH 1, wherein generating the target marker information includes generating target marker information associated with one or more of mRNA, RNA, DNA, peptide or protein.

PARAGRAPH 6: The method according to PARAGRAPH 1, wherein generating the target marker information includes generating target marker information associated with one or more of PLCXD1, HKR1, ZNF283, ADA, AMACR+p63, ANK3, BCL2L1, BIRC5, BMI-1, BUB1, CCNB1, CCND1, CES1, CHAF1A, CRIP1, CRYAB, ESM1, EZH2, FGFR2, FOS, Gbx2, HCFC1, IER3, ITPR1, JUNB, KLF6, KI67, KNTC2, MGC5466, Phc1, RNF2, Suz12, TCF2, TRAP100, USP22, Wnt5A and ZFP36.

PARAGRAPH 7: The method according to PARAGRAPH 1, wherein generating the aberrant object information includes generating aberrant sequence information when a quality of a sequence associated with the biological sample is distinct as compared with one or more reference sequences.

PARAGRAPH 8: The method according to PARAGRAPH 1, wherein generating the aberrant object information includes generating aberrant sequence information responsive to one or more inputs indicative of a distinct positioning of a plurality of bases within an entire sequence associated with the biological sample, as compared with one or more reference sequences.

PARAGRAPH 9: The method according to PARAGRAPH 1, wherein generating the aberrant object information includes generating aberrant sequence information responsive to one or more inputs indicative of a distinct fragment of a sequence associated with the biological sample, as compared with one or more reference sequences.

PARAGRAPH 10: The method according to PARAGRAPH 1, wherein generating the aberrant object information includes generating aberrant expression level information responsive to one or more inputs indicative of when an expression level exceeds a target threshold.

PARAGRAPH 11: The method according to PARAGRAPH 1, wherein generating the aberrant object information includes determining an expression level aberrant score when a detected expression level is above a target threshold.

PARAGRAPH 12: The method according to PARAGRAPH 1, wherein generating the aberrant object information includes determining a sequence aberrant score when a detected positioning of a plurality of bases associated with the biological sample is distinct compared with one or more reference sequences.

PARAGRAPH 13: The method according to PARAGRAPH 1, wherein generating the aberrant object information includes determining a sequence aberrant score responsive to one or more inputs from a next generation sequencing, multicolor quantitative immunofluorescence co-localization analysis, fluorescence in situ hybridization, and quantitative RT-PCR analysis.

PARAGRAPH 14: The method according to PARAGRAPH 1, wherein generating the aberrant object information includes determining a threshold level by comparing reference information derived from samples obtained from biological subjects with known diagnosis or known clinical outcome after therapies.

PARAGRAPH 15: The method according to PARAGRAPH 14, further comprising: generating a cancer-therapy efficacy status, cancer therapy progress, a cancer prognosis, or a cancer diagnosis responsive to one or more inputs indicative of an aberrant expression and an expression level above a target threshold coefficient of at least two markers.

PARAGRAPH 16: The method according to PARAGRAPH 1, wherein generating the aberrant object information includes generating aberrant sequence information and marker co-expression level information.

PARAGRAPH 17: The method according to PARAGRAPH 1, further comprising: generating a cancer-therapy efficacy status responsive to one or more inputs indicative of an aberrant sequence and a threshold marker co-expression level.

PARAGRAPH 18: The method according to PARAGRAPH 1, further comprising: generating information indicative of the presence or absence of cancer in a biological subject responsive to one or more inputs indicative of an aberrant sequence and a threshold marker co-expression level.

PARAGRAPH 19: A system for diagnosing cancer or predicting cancer-therapy outcome in a subject, comprising: circuitry configured to generate target marker information responsive to one or more inputs indicative of a genomic signature pathway and one or more inputs indicative of a proteomic signature pathway of endogenous human Stem Cell-Associated Retroviruses (SCAR); and circuitry configured to generate aberrant object information responsive to comparing at least one input indicative of an expression level and at least one input indicative of a sequence of a biological sample with target marker information.

PARAGRAPH 20: The system according to PARAGRAPH 19, further comprising: circuitry configured to generate information indicative of the presence or absence of cancer in a biological subject responsive to one or more inputs indicative of an aberrant sequence and a threshold marker co-expression level.

PARAGRAPH 21: The system according to PARAGRAPH 19, further comprising: circuitry configured to generate a cancer-therapy efficacy status, cancer therapy progress, a cancer prognosis, or a cancer diagnosis responsive to one or more inputs indicative of an aberrant expression and an expression level above a target threshold coefficient of at least two markers.

PARAGRAPH 22: The system according to PARAGRAPH 19, further comprising: circuitry configured to generate a cancer-therapy efficacy status responsive to one or more inputs indicative of an aberrant sequence and a threshold marker co-expression level.

PARAGRAPH 23: A system for treating cancer, comprising: circuitry configured to acquire information associated with a Stem Cell-Associated Retroviruses (SCAR) pathway activation in a subject diagnosed with cancer; and circuitry configured to identify single therapeutic agent or combination of therapeutic agents and to generate user-specific treatment protocol responsive to one or more inputs associated with a Stem Celi-Associated Retroviruses (SCAR) pathway activation in a subject diagnosed with cancer.

PARAGRAPH 24: A method for diagnosing cancer or predicting cancer-therapy outcome in a subject, comprising: concurrently screening a biological sample for a presence of an aberrant sequence and an aberrant expression level of one or more target markers associated with a pathway involving genomic and proteomic signatures of endogenous human Stem Cell-Associated Retroviruses (SCAR); scoring a sequence associated with the biological sample as aberrant when the quality of the sequence is distinct compared with a reference sequence; and scoring an expression level associated with the biological sample as being aberrant when a detected expression level is above a target threshold coefficient. In an embodiment, a method for diagnosing cancer or predicting cancer-therapy outcome in a subject comprises: screening a biological sample for at least one of a presence of an aberrant sequence and an aberrant expression level of one or more target markers associated with a pathway involving genomic and proteomic signatures of endogenous human Stem Cell-Associated Retroviruses (SCAR); scoring a sequence associated with the biological sample as aberrant when the quality of the sequence is distinct compared with a reference sequence; and scoring an expression level associated with the biological sample as being aberrant when a detected expression level is above a target threshold coefficient.
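The two scoring steps recited in PARAGRAPH 24 can be illustrated with a minimal Python sketch. The function names, the marker names, and the choice to model a "distinct" sequence as any difference from the reference are illustrative assumptions and are not specified by the disclosure.

```python
from typing import Dict


def score_sequence_aberrant(sample_seq: str, reference_seq: str) -> bool:
    """Score a sequence as aberrant when it differs from the reference.

    Assumption: "distinct" is modeled as simple case-insensitive inequality;
    a real assay would likely use an alignment-based comparison.
    """
    return sample_seq.upper() != reference_seq.upper()


def score_expression_aberrant(
    levels: Dict[str, float], threshold: float
) -> Dict[str, bool]:
    """Flag each marker whose detected level is above the target threshold."""
    return {marker: level > threshold for marker, level in levels.items()}


if __name__ == "__main__":
    # Hypothetical marker names and values, for illustration only.
    detected = {"marker_a": 0.92, "marker_b": 0.31}
    print(score_sequence_aberrant("ACGA", "ACGT"))
    print(score_expression_aberrant(detected, threshold=0.5))
```

Both functions return plain booleans so that the sequence result and the expression result can be combined in either the concurrent or the "at least one of" embodiment.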

PARAGRAPH 25: The method according to PARAGRAPH 24, wherein concurrently screening a biological sample for a presence of an aberrant sequence and an aberrant expression level of one or more target markers associated with a pathway involving genomic and proteomic signatures of endogenous SCAR includes concurrently screening a biological sample for a presence of an aberrant sequence and an aberrant expression level of one or more target markers indicative of a cancer diagnosis or a prognosis for cancer-therapy failure in a biological subject.

PARAGRAPH 26: The method according to PARAGRAPH 25, further comprising: generating a user-specific cancer therapy protocol responsive to one or more inputs indicative of an aberrant sequence or an aberrant expression level associated with a cancer diagnosis or a prognosis for cancer-therapy failure in a biological subject.

PARAGRAPH 27: The method according to PARAGRAPH 24, wherein concurrently screening a biological sample for a presence of an aberrant sequence and an aberrant expression level of one or more target markers associated with a pathway involving genomic and proteomic signatures of endogenous SCAR includes concurrently screening a biological sample for a presence of an aberrant sequence and an aberrant expression level of one or more target markers indicative of a progress of cancer therapy in a biological subject.

PARAGRAPH 28: The method according to PARAGRAPH 27, further comprising: generating a user-specific cancer therapy protocol responsive to one or more inputs indicative of an aberrant sequence or an aberrant expression level associated with a progress of cancer therapy in a biological subject.

PARAGRAPH 29: The method according to PARAGRAPH 24, wherein the detection threshold is determined by comparing to the values in a reference database of samples obtained from subjects with known diagnosis or known clinical outcome after therapies, wherein the presence of an aberrant expression level of at least one but, preferably, two or more markers in the test sample and presence of aberrant expression of two or more such markers is indicative of a cancer diagnosis or a prognosis for cancer-therapy failure, or of the progress of cancer therapy in the subject.
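The "at least one but, preferably, two or more markers" rule of PARAGRAPH 29 can be sketched as a simple count of markers whose level exceeds a per-marker detection threshold. The per-marker threshold dictionary, the marker names, and the `min_markers` parameter are assumptions introduced for illustration; the disclosure does not prescribe a data layout.

```python
from typing import Dict


def count_aberrant_markers(
    levels: Dict[str, float], thresholds: Dict[str, float]
) -> int:
    """Count markers whose detected level exceeds its detection threshold.

    Markers without a threshold in the (hypothetical) reference-derived
    table are ignored rather than guessed at.
    """
    return sum(
        1
        for marker, level in levels.items()
        if marker in thresholds and level > thresholds[marker]
    )


def is_indicative(
    levels: Dict[str, float],
    thresholds: Dict[str, float],
    min_markers: int = 2,
) -> bool:
    """Return True when at least `min_markers` markers are aberrant."""
    return count_aberrant_markers(levels, thresholds) >= min_markers
```

Setting `min_markers=1` corresponds to the "at least one" reading, while the default of 2 corresponds to the preferred "two or more" reading.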

PARAGRAPH 30: The method according to PARAGRAPH 24, wherein the detection threshold is continuously refined by adding the outcome data of each patient tested to the reference database of samples, and in an automated and/or recursive manner either manually or using computational methods using data stored either locally, in remote server(s), or in the cloud, continuously improving the accuracy of diagnosis, prognosis, or specification of future cancer therapy.
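The continuous refinement described in PARAGRAPH 30 can be pictured as a reference database that accepts each new patient outcome and re-selects the detection threshold. The sketch below is one possible reading under stated assumptions: the `ReferenceDatabase` class, its in-memory list storage, and the use of plain classification accuracy as the selection criterion are all hypothetical and not taken from the disclosure.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple


@dataclass
class ReferenceDatabase:
    """In-memory stand-in for a local, remote, or cloud reference database."""

    # Each record is (sample score, known positive outcome).
    records: List[Tuple[float, bool]] = field(default_factory=list)

    def add_outcome(self, score: float, positive: bool) -> None:
        """Add the outcome data of a newly tested patient."""
        self.records.append((score, positive))

    def accuracy(self, threshold: float) -> float:
        """Fraction of reference samples classified correctly at a threshold."""
        if not self.records:
            raise ValueError("reference database is empty")
        correct = sum((score > threshold) == positive
                      for score, positive in self.records)
        return correct / len(self.records)

    def refine_threshold(self, candidates: Iterable[float]) -> float:
        """Return the candidate with the highest accuracy (first one on ties)."""
        return max(candidates, key=self.accuracy)
```

Calling `refine_threshold` after every `add_outcome` gives the automated, repeated update; the same two calls could equally be run in batches.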

PARAGRAPH 31: The method according to PARAGRAPH 24, wherein said sample phenotype is selected from the group consisting of cancer, non-cancer, recurrence, non-recurrence, relapse, non-relapse, invasiveness, non-invasiveness, metastatic, non-metastatic, localized, tumor size, tumor grade, Gleason score, survival prognosis, lymph node status, tumor stage, degree of differentiation, age, hormone receptor status, tumor antigen level (including but not limited to PSA level, PSMA level, survivin level, oncofetal protein level, testis antigen level), histologic type, level, phenotype, genotype, and activation status of immune cells, and disease free survival.

PARAGRAPH 32: The method according to PARAGRAPH 24, wherein said threshold coefficient has an absolute value > 0.5.

PARAGRAPH 33: The method according to PARAGRAPH 24, wherein said threshold coefficient has an absolute value > 0.6.

PARAGRAPH 34: The method according to PARAGRAPH 24, wherein said threshold coefficient has an absolute value > 0.7.

PARAGRAPH 35: The method according to PARAGRAPH 24, wherein said threshold coefficient has an absolute value > 0.8.

PARAGRAPH 36: The method according to PARAGRAPH 24, wherein said threshold coefficient has an absolute value > 0.9.

PARAGRAPH 37: The method according to PARAGRAPH 24, wherein said threshold coefficient has an absolute value > 0.95.

PARAGRAPH 38: The method according to PARAGRAPH 24, wherein said threshold coefficient has an absolute value > 0.99.

PARAGRAPH 39: The method according to PARAGRAPH 24, wherein said threshold coefficient has an absolute value > 0.995.

PARAGRAPH 40: The method according to PARAGRAPH 24, wherein said threshold coefficient has an absolute value > 0.999.

PARAGRAPH 41: A method of determining detection threshold for classifying a sample phenotype, comprising: identifying a subset of markers and scoring marker expression in cells according to the method according to PARAGRAPH 24; and determining the sample classification accuracy at different detection thresholds using a reference database of samples from subjects with known phenotypes.

PARAGRAPH 42: The method according to PARAGRAPH 41, comprising determining the sample classification accuracy in an automated and/or recursive manner either manually or using computational methods using data stored either locally, in remote server(s), or in the cloud.

PARAGRAPH 43: The method according to PARAGRAPH 41, further comprising determining the best performing magnitude of said detection threshold and using said magnitude to assess the reliability of said established detection threshold in classifying a sample phenotype.
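The threshold determination of PARAGRAPHS 41 to 43 amounts to measuring classification accuracy at several candidate thresholds and keeping the best performing one. The following sketch assumes that each reference sample has already been reduced to a single signed score and a known binary phenotype, and it reuses the magnitudes listed in PARAGRAPHS 32 to 40 as the candidate grid; the function names and the accuracy criterion are illustrative assumptions.

```python
from typing import Dict, Sequence

# Candidate magnitudes taken from PARAGRAPHS 32 to 40.
CANDIDATE_THRESHOLDS = (0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99, 0.995, 0.999)


def accuracy_by_threshold(
    scores: Sequence[float],
    known_positive: Sequence[bool],
    thresholds: Sequence[float] = CANDIDATE_THRESHOLDS,
) -> Dict[float, float]:
    """Classification accuracy on the reference samples at each threshold.

    A sample is called positive when the absolute value of its score
    exceeds the threshold.
    """
    if not scores or len(scores) != len(known_positive):
        raise ValueError("scores and labels must be non-empty and equal length")
    result: Dict[float, float] = {}
    for t in thresholds:
        correct = sum((abs(s) > t) == y for s, y in zip(scores, known_positive))
        result[t] = correct / len(scores)
    return result


def best_threshold(accuracy: Dict[float, float]) -> float:
    """Best performing magnitude (the first, i.e. lowest listed, on ties)."""
    return max(accuracy, key=accuracy.get)
```

The accuracy value attached to the selected magnitude is what a user would inspect to assess how reliable the established threshold is.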

PARAGRAPH 44: The method according to PARAGRAPH 41, further comprising determining the best performing magnitude of said detection threshold and using said magnitude to assess the reliability of said established detection threshold in classifying a sample phenotype in an automated and/or recursive manner either manually or using computational methods using data stored either locally, in remote server(s), or in the cloud.

PARAGRAPH 45: The method according to PARAGRAPH 41, further comprising using the best performing magnitude of said detection threshold to score an unclassified sample and assign a sample phenotype to said sample.

PARAGRAPH 46: The method according to PARAGRAPH 41, further comprising using the best performing magnitude of said detection threshold to score an unclassified sample and assign a sample phenotype to said sample either manually or using computational methods using data stored either locally, in remote server(s), or in the cloud.
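Assigning a phenotype to an unclassified sample, as in PARAGRAPHS 45 and 46, can be sketched as a single comparison against the best performing threshold magnitude. The function name, the default phenotype labels (borrowed from the list in PARAGRAPH 31), and the use of the absolute value of a single score are assumptions made for illustration.

```python
def assign_phenotype(
    score: float,
    threshold: float,
    positive_label: str = "recurrence",
    negative_label: str = "non-recurrence",
) -> str:
    """Assign a phenotype label to an unclassified sample.

    The sample receives the positive label when the absolute value of its
    score exceeds the threshold, and the negative label otherwise.
    """
    if threshold < 0:
        raise ValueError("threshold magnitude must be non-negative")
    return positive_label if abs(score) > threshold else negative_label


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    for s in (-0.93, 0.40):
        print(s, assign_phenotype(s, threshold=0.9))
```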

PARAGRAPH 47: The method according to PARAGRAPH 41, wherein said subset of markers consists essentially of the genes, genetic loci, and sequences identified in Table 1A, Table 1, Table 2, Table 3, Table S1, Table S3, Table S4, Table S5, Table S6, Data Set S1, Data Set S2, Data Set S3.

PARAGRAPH 48: The method according to PARAGRAPH 41, wherein said subset of markers consists essentially of 90% of the genes, genetic loci, and sequences identified in Table 1A, Table 1, Table 2, Table 3, Table S1, Table S3, Table S4, Table S5, Table S6, Data Set S1, Data Set S2, Data Set S3.

PARAGRAPH 49: The method according to PARAGRAPH 41, wherein said subset of markers consists essentially of 80% of the genes, genetic loci, and sequences identified in Table 1A, Table 1, Table 2, Table 3, Table S1, Table S3, Table S4, Table S5, Table S6, Data Set S1, Data Set S2, Data Set S3.

PARAGRAPH 50: The method according to PARAGRAPH 41, wherein said subset of markers consists essentially of 70% of the genes, genetic loci, and sequences identified in Table 1A, Table 1, Table 2, Table 3, Table S1, Table S3, Table S4, Table S5, Table S6, Data Set S1, Data Set S2, Data Set S3.

PARAGRAPH 51: The method according to PARAGRAPH 41, wherein said subset of markers consists essentially of 60% of the genes, genetic loci, and sequences identified in Table 1A, Table 1, Table 2, Table 3, Table S1, Table S3, Table S4, Table S5, Table S6, Data Set S1, Data Set S2, Data Set S3.

PARAGRAPH 52: The method according to PARAGRAPH 41, wherein said subset of markers consists essentially of 50% of the genes, genetic loci, and sequences identified in Table 1A, Table 1, Table 2, Table 3, Table S1, Table S3, Table S4, Table S5, Table S6, Data Set S2, Data Set S3.

PARAGRAPH 53: A method of treating cancer, comprising: detecting a molecular signal(s) of SCAR's pathway activation in a subject diagnosed with cancer; and generating a user-specific therapeutic treatment targeted to activated SCAR's loci and/or down-stream SCARs-regulated genetic loci based on detecting the molecular signal(s) of SCAR's pathway activation.

PARAGRAPH 54: The method according to PARAGRAPH 53, wherein the user-specific therapeutic treatment is based on genome editing, including but not limited to CRISPR/Cas9 complex-mediated genome editing, to silence the defined genomic elements of the activated SCARs pathway.

PARAGRAPH 55: The method according to PARAGRAPH 53, wherein the user-specific therapeutic treatment is based on genome editing, including but not limited to CRISPR/Cas9 complex-mediated genome editing, to activate the defined genomic elements of the activated SCARs pathway.

PARAGRAPH 56: The method according to PARAGRAPH 53, wherein the user-specific therapeutic treatment is based on the application of Highly Active Anti-Retroviral Therapy (HAART).

PARAGRAPH 57: The method according to PARAGRAPH 53, wherein the user-specific therapeutic treatment is based on administration of the antiretroviral drug, Raltegravir (RAL, Isentress, formerly MK-0518).

PARAGRAPH 58: The method according to PARAGRAPH 53, wherein the user-specific therapeutic treatment is based on application of anti-sense therapy directed against transcriptionally active SCAR's loci and/or defined genomic elements of the activated SCARs pathway.

PARAGRAPH 59: The method according to PARAGRAPH 53, wherein the user-specific therapeutic treatment is based on the application of targeted immunotherapy, including but not limited to antagonist antibodies or fragments thereof, agonist antibodies or fragments thereof, autologous cells, allogeneic cells, peptides, small molecules, signaling proteins or fragments thereof, or compositions containing two or more of the above and compositions containing in a single molecule or cellular therapy all or part of two or more of the above, directed against the proteins and/or peptides encoded by the activated SCARs sequences.

PARAGRAPH 60: A method of treating cancer where the methods according to PARAGRAPHS 39 - 45 are used to enhance tumor infiltrating lymphocytes in tumors of treated subjects, either as a sole function or to augment the activity of anti-cancer modulators of the immune system.