Title:
NEW METHOD FOR IDENTIFYING HERV-DERIVED EPITOPES
Document Type and Number:
WIPO Patent Application WO/2023/144231
Kind Code:
A1
Abstract:
The present invention relates to methods for identifying HERV-derived T cell epitopes associated with cancer, and peptides comprising or consisting of epitopes identified by said method, expression vectors encoding said peptides, cytotoxic T lymphocytes (CTLs) of a subject treated with said peptides or vectors and engineered T cells expressing T-cell receptors recognizing said peptides. The present invention also relates to the use of said peptides, expression vectors, CTLs or engineered T cells as a vaccine or a medicament, and in particular the use of said peptides, expression vectors, CTLs or engineered T cells for use in preventing or treating cancer in a subject in need thereof.

Inventors:
DEPIL STÉPHANE (FR)
ALCAZER VINCENT (FR)
Application Number:
PCT/EP2023/051839
Publication Date:
August 03, 2023
Filing Date:
January 25, 2023
Assignee:
ERVACCINE TECH (FR)
CENTRE LEON BERARD (FR)
UNIV CLAUDE BERNARD LYON (FR)
CENTRE NAT RECH SCIENT (FR)
INST NAT SANTE RECH MED (FR)
International Classes:
A61P35/00; A61K38/08; A61K38/16; A61K39/00; A61K39/12; C07K7/06; C07K14/15; G06F17/00
Domestic Patent References:
WO2021150713A2 2021-07-29
WO2014004385A2 2014-01-03
WO2020049169A1 2020-03-12
WO2019162110A1 2019-08-29
Other References:
BONAVENTURA PAOLA ET AL: "Identification of shared tumor epitopes from endogenous retroviruses inducing high-avidity cytotoxic T cells for cancer immunotherapy", SCIENCE ADVANCES, vol. 8, no. 4, 26 January 2022 (2022-01-26), pages 3671, XP055935432, DOI: 10.1126/sciadv.abj3671
BONAVENTURA PAOLA ET AL: "IDENTIFICATION OF SHARED TUMOR EPITOPES FROM ENDOGENOUS RETROVIRUSES INDUCING HIGH AVIDITY CYTOTOXIC T CELLS FOR CANCER IMMUNOTHERAPY 1", J IMMUNOTHER CANCER, 10 November 2021 (2021-11-10), pages 1 - 1054, XP055935448, Retrieved from the Internet [retrieved on 20220625]
TU XIAONING ET AL: "Human leukemia antigen-A*0201-restricted epitopes of human endogenous retrovirus W family envelope (HERV-W env) induce strong cytotoxic T lymphocyte responses", VIROLOGICA SINICA, SPRINGER, DE, vol. 32, no. 4, 1 August 2017 (2017-08-01), pages 280 - 289, XP036944518, ISSN: 1674-0769, [retrieved on 20170822], DOI: 10.1007/S12250-017-3984-9
SAINI SUNIL KUMAR ET AL: "Human endogenous retroviruses form a reservoir of T cell targets in hematological cancers", NATURE COMMUNICATIONS, vol. 11, no. 1, 1 December 2020 (2020-12-01), XP055935439, Retrieved from the Internet DOI: 10.1038/s41467-020-19464-8
RAGONE CONCETTA ET AL: "Identification and validation of viral antigens sharing sequence and structural homology with tumor-associated antigens (TAAs).", JOURNAL FOR IMMUNOTHERAPY OF CANCER, vol. 9, no. 5, 1 May 2021 (2021-05-01), pages e002694, XP055935447, Retrieved from the Internet DOI: 10.1136/jitc-2021-002694
SMITH CHRISTOF C. ET AL: "Endogenous retroviral signatures predict immunotherapy response in clear cell renal cell carcinoma", JOURNAL OF CLINICAL INVESTIGATION, vol. 128, no. 11, 1 November 2018 (2018-11-01), GB, pages 4804 - 4820, XP055800962, ISSN: 0021-9738, Retrieved from the Internet DOI: 10.1172/JCI121476
KESSLER J H ET AL: "Identification of T-cell epitopes for cancer immunotherapy", LEUKEMIA, NATURE PUBLISHING GROUP UK, LONDON, vol. 21, no. 9, 5 July 2007 (2007-07-05), pages 1859 - 1874, XP037785409, ISSN: 0887-6924, [retrieved on 20070705], DOI: 10.1038/SJ.LEU.2404787
C. C. SMITH ET AL.: "Endogenous retroviral signatures predict immunotherapy response in clear cell renal cell carcinoma", JOURNAL OF CLINICAL INVESTIGATION, vol. 128, 2018, pages 4804 - 4820, XP055800962, DOI: 10.1172/JCI121476
BENDALL ET AL.: "Telescope: Characterization of the retrotranscriptome by accurate estimation of transposable element expression", PLOS COMPUT BIOL, vol. 15, no. 9, 30 September 2019 (2019-09-30), pages e1006453
L. VARGIU ET AL.: "Classification and characterization of human endogenous retroviruses; mosaic forms are common", RETROVIROLOGY, vol. 13, no. 7, 2016, pages 7
D. ARAN ET AL.: "xCell: digitally portraying the tissue cellular heterogeneity landscape", GENOME BIOL, vol. 18, 2017, pages 220, XP055816715, DOI: 10.1186/s13059-017-1349-1
T. J. O'DONNELL ET AL.: "MHCflurry: Open-Source Class I MHC Binding Affinity Prediction", CELL SYSTEMS, vol. 7, 2018, pages 129 - 132
B. WEN ET AL.: "PepQuery enables fast, accurate, and convenient proteomic validation of novel genomic alterations", GENOME RES, vol. 29, 2019, pages 485 - 493
K. E. VARLEY ET AL.: "Recurrent read-through fusion transcripts in breast cancer", BREAST CANCER RES. TREAT., vol. 146, 2014, pages 287 - 297, XP055457370, DOI: 10.1007/s10549-014-3019-2
M. R. CORCES ET AL.: "Lineage-specific and single-cell chromatin accessibility charts human hematopoiesis and leukemia evolution", NATURE GENETICS, vol. 48, 2016, pages 1193 - 1203, XP055651307, DOI: 10.1038/ng.3646
J.-D. LAROUCHE ET AL.: "Widespread and tissue-specific expression of endogenous retroelements in human somatic tissues", GENOME MED, vol. 12, 2020, pages 1 - 16
A. DOBIN ET AL.: "STAR: ultrafast universal RNA-seq aligner", BIOINFORMATICS, vol. 29, 2013, pages 15 - 21, XP055500895, DOI: 10.1093/bioinformatics/bts635
H. LI ET AL. (1000 GENOME PROJECT DATA PROCESSING SUBGROUP): "The Sequence Alignment/Map format and SAMtools", BIOINFORMATICS, vol. 25, 2009, pages 2078 - 2079
R. PATRO ET AL.: "Salmon: fast and bias-aware quantification of transcript expression using dual-phase inference", NAT METHODS, vol. 14, 2017, pages 417 - 419
S. HANZELMANN ET AL.: "GSVA: gene set variation analysis for microarray and RNA-Seq data", BMC BIOINFORMATICS, vol. 14, 2013, pages 7, XP021146329, DOI: 10.1186/1471-2105-14-7
M. S. ROONEY ET AL.: "Molecular and genetic properties of tumors associated with local immune cytolytic activity", CELL, vol. 160, 2015, pages 48 - 61, XP002782862, DOI: 10.1016/j.cell.2014.12.033
V. THORSSON ET AL.: "The Immune Landscape of Cancer", IMMUNITY, vol. 48, 2018, pages 812 - 830
J. H. FRIEDMAN ET AL.: "Regularization Paths for Generalized Linear Models via Coordinate Descent", JOURNAL OF STATISTICAL SOFTWARE, vol. 33, 2010, pages 1 - 22, XP055480579, DOI: 10.18637/jss.v033.i01
M. SILL ET AL.: "c060: Extended Inference with Lasso and Elastic-Net Regularized Cox and Generalized Linear Models", JOURNAL OF STATISTICAL SOFTWARE, vol. 62, 2014, pages 1 - 22
F. MADEIRA ET AL.: "The EMBL-EBI search and sequence analysis tools APIs in 2019", NUCLEIC ACIDS RES, vol. 47, 2019, pages W636 - W641
"UniProt: a worldwide hub of protein knowledge", NUCLEIC ACIDS RES, vol. 47, 2019, pages D506 - D515
Y. XIAO ET AL.: "A novel significance score for gene selection and ranking", BIOINFORMATICS, vol. 30, 2014, pages 801 - 807
N. J. EDWARDS ET AL.: "The CPTAC Data Portal: A Resource for Cancer Proteomics Research", J PROTEOME RES, vol. 14, 2015, pages 2707 - 2713
"Comprehensive molecular portraits of human breast tumours", NATURE, vol. 490, 2012, pages 61 - 70
K. KRUG ET AL.: "Proteogenomic Landscape of Breast Cancer Tumorigenesis and Targeted Therapy", CELL, vol. 183, 2020, pages 1436 - 1456
R. ADUSUMILLI, P. MALLICK: "Data Conversion with ProteoWizard msConvert", METHODS MOL BIOL, vol. 1550, 2017, pages 339 - 368
F. LOAYZA-PUCH ET AL.: "Tumour-specific proline vulnerability uncovered by differential ribosome codon reading", NATURE, vol. 530, 2016, pages 490 - 494, XP055504716, DOI: 10.1038/nature16982
M. I. LOVE ET AL.: "Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2", GENOME BIOL, vol. 15, 2014, pages 550, XP021210395, DOI: 10.1186/s13059-014-0550-8
A. ZHU ET AL.: "Heavy-tailed prior distributions for sequence count data: removing the noise and preserving large differences", BIOINFORMATICS, vol. 35, 2019, pages 2084 - 2092
A. GROS ET AL.: "Recognition of human gastrointestinal cancer neoantigens by circulating PD-1+ lymphocytes", J CLIN INVEST, vol. 129, 2019, pages 4992 - 5004, XP055944981, DOI: 10.1172/JCI127967
K. K. JENSEN ET AL.: "TCRpMHCmodels: Structural modelling of TCR-pMHC class I complexes", SCI REP, vol. 9, 2019, pages 14530
K. R. ABHINANDAN, A. C. R. MARTIN: "Analysis and improvements to Kabat and structurally correct numbering of antibody variable domains", MOL IMMUNOL, vol. 45, 2008, pages 3832 - 3839, XP023437109, DOI: 10.1016/j.molimm.2008.05.022
M. S. KLAUSEN ET AL.: "LYRA, a webserver for lymphocyte receptor structural modeling", NUCLEIC ACIDS RES, vol. 43, 2015, pages W349 - 355
S. BOBISSE ET AL.: "Sensitive and frequent identification of high avidity neo-epitope specific CD8 + T cells in immunotherapy-naive ovarian cancer", NAT COMMUN, vol. 9, 2018, pages 1 - 10
A. FISER, A. SALI: "Modeller: generation and refinement of homology-based protein structure models", METHODS ENZYMOL, vol. 374, 2003, pages 461 - 491
A. SALI, T. L. BLUNDELL: "Comparative protein modelling by satisfaction of spatial restraints", J MOL BIOL, vol. 234, 1993, pages 779 - 815
G. LAUNAY ET AL.: "Evaluation of CONSRANK-Like Scoring Functions for Rescoring Ensembles of Protein-Protein Docking Poses", FRONT MOL BIOSCI, vol. 7, 2020, pages 559005
R. OLIVA ET AL.: "Ranking multiple docking solutions based on the conservation of inter-residue contacts", PROTEINS, vol. 81, 2013, pages 1571 - 1584
E. F. PETTERSEN ET AL.: "UCSF Chimera--a visualization system for exploratory research and analysis", J COMPUT CHEM, vol. 25, 2004, pages 1605 - 1612
J. DOUGLASS ET AL.: "Bispecific antibodies targeting mutant RAS neoantigens", SCI IMMUNOL, vol. 6, 2021, pages eabd5515
E. DRIEHUIS ET AL.: "Establishment of patient-derived cancer organoids for drug-screening applications", NATURE PROTOCOLS, vol. 15, 2020, pages 3380 - 3409, XP037256909, DOI: 10.1038/s41596-020-0379-4
R. TIBSHIRANI: "Regression Shrinkage and Selection via the Lasso", JOURNAL OF THE ROYAL STATISTICAL SOCIETY, vol. 58, 1996, pages 267 - 288
VARLEY ET AL.: "Broad Institute Cancer Cell Line Encyclopedia"
Attorney, Agent or Firm:
ICOSA (FR)
Claims:
1. A method for identifying Human endogenous retrovirus (HERV)-derived T cell epitopes associated with at least one cancer, wherein said method comprises the following steps:

(a) Identifying HERVs associated with at least one cancer, and

(b) Selecting T cell epitopes among the HERVs identified in the previous step, and wherein said method further comprises at least one, preferably two, of the following steps:

(i). Selecting HERVs associated with a cytotoxic T cell response among the cancer-associated HERVs identified in step (a), said step being between the step (a) and the step (b), and/or

(ii). Assessing the expression at the protein or peptide level of the HERV-derived T cell epitopes identified in step (b) in tumor samples, said step being after the step (b), preferably wherein steps (a), (b), (i) and (ii) are in silico steps, or wherein steps (a), (b) and (i) are in silico steps and step (ii) is an in vitro step.

2. The method according to claim 1, wherein the step (a) comprises the step of comparing HERV expression in tumor and in normal samples.

3. The method according to claim 1 or 2, wherein the step (b) comprises the step of aligning the sequences of the HERVs identified in the previous step with HERV proteins.

4. The method according to claim 3, wherein the step (b) further comprises the step of predicting the binding to MHC class I molecules of the sequences sharing at least 70, 75, 80, 85, 90, 95, 96, 97, 98, 99% or more identity with HERV proteins.

5. The method according to any one of claims 1 to 4, wherein the association of the cancer-associated HERVs with a cytotoxic T cell response in the step (i) is assessed by the association of each HERV with at least one CD4 or CD8 T cell signature, the association of each HERV with a functional signature being either an interferon (IFN)-γ signature or cytolytic activity, and the absence of expression of each HERV in normal purified T or NK cells, preferably wherein said association is assessed by a machine learning-based approach.

6. The method according to any one of claims 1 to 5, wherein said method further comprises, after the step (b) or before the step (ii), a step of selecting epitopes among the most shared epitopes in the cancer-associated HERVs identified in step (a).

7. The method according to any one of claims 1 to 6, wherein said method further comprises, after the step (b) or after the step (ii), a step of aligning the HERV-derived T cell epitopes with the human proteome.

8. A peptide comprising or consisting of an epitope identified by the method according to any one of claims 1 to 7.

9. A peptide comprising or consisting of an epitope having a sequence selected from the group comprising or consisting of RMLTDLRAV (SEQ ID NO: 3), LMAQAITGV (SEQ ID NO: 11), VLQDFDQPI (SEQ ID NO: 13), MLLAALMIV (SEQ ID NO: 15) and YIDDILCAA (SEQ ID NO: 16).

10. An expression vector inducing expression of one or more peptide(s) according to claim 8 or claim 9.

11. A cytotoxic T-lymphocyte of a subject treated with one or more peptide(s) according to claim 8 or claim 9, or one or more expression vector(s) according to claim 10.

12. An engineered T cell expressing a T-cell receptor recognizing a peptide according to claim 8 or claim 9.

13. One or more peptide(s) according to claim 8 or claim 9, one or more expression vector(s) according to claim 10, one or more cytotoxic T-lymphocyte(s) according to claim 11, or one or more engineered T cell(s) according to claim 12 for use as a vaccine or medicament.

14. One or more peptide(s) according to claim 8 or claim 9, one or more expression vector(s) according to claim 10, one or more cytotoxic T-lymphocyte(s) according to claim 11, or one or more engineered T cell(s) according to claim 12 for use in preventing or treating a cancer in a subject in need thereof.

15. The one or more peptide(s), the one or more expression vector(s), the one or more cytotoxic T-lymphocyte(s) or the one or more engineered T cell(s) for use according to claim 14, wherein said cancer is selected from the group comprising or consisting of breast cancer, including triple negative breast cancer, ovarian cancer, melanoma, sarcoma, teratocarcinoma, bladder cancer, lung cancer, including non-small cell lung carcinoma and small cell lung carcinoma, head and neck cancer, colorectal cancer, glioblastoma, leukemias, lymphomas and other solid tumors and hematological malignancies.

Description:
NEW METHOD FOR IDENTIFYING HERV-DERIVED EPITOPES

FIELD OF INVENTION

[1] The present invention relates to methods for identifying HERV-derived T cell epitopes associated with cancer, peptides comprising or consisting of the epitopes identified by said method, expression vectors encoding said peptides, cytotoxic T lymphocytes (CTLs) of a subject treated with said peptides or vectors and engineered T cells expressing T-cell receptors recognizing said peptides. The present invention also relates to the use of said peptides, expression vectors, CTLs or engineered T cells as a vaccine or a medicament, and in particular, the use of said peptides, expression vectors, CTLs, or engineered T cells for preventing or treating a cancer in a subject in need thereof.

BACKGROUND OF INVENTION

[2] The adaptive T cell immune response in cancer relies on the recognition of tumor epitopes specifically expressed by tumor cells. The role of neoantigens, generated by non-synonymous mutations specific to the tumor genome, has been extensively studied in the last decade and many clinical trials testing combinations of neoantigens in personalized cancer vaccines have been initiated, with encouraging preliminary results. However, determining the optimal combination of neoepitopes for each patient remains challenging. Furthermore, many tumors are characterized by a low or moderate tumor mutational burden. Therefore, unveiling other families of tumor antigens, such as those derived from splice variants, fusion proteins or endogenous retroelements, possibly shared among different cancer subtypes, is of utmost importance for the development of off-the-shelf therapies in solid tumors.

[3] Human endogenous retroviruses (HERVs) represent 8% of the human genome. Although most HERV genes are non-functional due to DNA recombination, mutations, and deletions, some produce functional proteins including the group-specific antigen (Gag), polymerase (Pol) with reverse transcriptase, and the envelope (Env) surface unit. Most HERVs are silenced by epigenetic mechanisms in normal cells. However, HERVs have been reported to be possible pathogenic agents in carcinogenesis, through their involvement in insertional mutagenesis, chromosomal aberrations or LTR-induced oncogene activation. Thus, HERVs may represent an interesting source of shared tumor antigens.

[4] Among the prior art, patent application WO2020/049169 describes a method that enabled the identification of epitopes derived from HERVs and able to elicit a specific CD8+ T cell response. However, the method developed in that patent application was applied to a single cancer type, i.e. triple-negative breast cancer (TNBC), with a limited number of HERVs. Thus, there is still a need to develop new approaches able to identify HERV-derived antigens that efficiently induce CD8+ T cell responses and that are shared among multiple cancer subtypes.

[5] In the present application, the Inventors have developed a new approach that addresses this need. In particular, the Inventors have developed a method relying on successive selection steps that makes it possible to identify, among a large number of HERV candidates, a limited number of epitopes that are efficient and shared by multiple cancer subtypes.

SUMMARY

[6] The present invention relates to a method for identifying Human endogenous retroviruses (HERVs)-derived T cell epitopes associated with at least one cancer, wherein said method comprises the following steps:

(a). Identifying HERVs associated with at least one cancer, and

(b). Selecting T cell epitopes among the HERVs identified in the previous step, and wherein said method further comprises at least one, preferably two, of the following steps:

(i). Selecting HERVs associated with a cytotoxic T cell response among the cancer-associated HERVs identified in step (a), said step being between step (a) and step (b), and/or

(ii). Assessing the expression at the protein or peptide level of the HERV-derived T cell epitopes identified in step (b) in tumor samples, said step being after step (b).

[7] In one embodiment, steps (a) and (b) and steps (i) and/or (ii) are in silico steps. In one embodiment, step (ii) is an in vitro step.

[8] The present invention relates to an in silico method for identifying Human endogenous retroviruses (HERVs)-derived T cell epitopes associated with a cancer, wherein said method comprises the following steps:

(a) Identifying and selecting HERVs associated with a cancer, and

(b) Selecting shared T cell epitopes among the HERVs identified in the previous step, and wherein said method further comprises at least one, preferably two, of the following steps:

(i). Selecting HERVs associated with a cytotoxic T cell response among the cancer-associated HERVs identified in step (a), said step being between the step (a) and the step (b), and/or

(ii). Assessing the expression at the protein or peptide level of the HERV-derived T cell epitopes identified in step (b) in tumor samples, said step being after the step (b).

[9] The present invention relates to an in silico method for identifying Human endogenous retroviruses (HERVs)-derived T cell epitopes associated with a cancer, preferably associated with at least one cancer, wherein said method comprises the following steps:

(a) Identifying HERVs associated with a cancer, preferably associated with at least one cancer, and

(b) Selecting T cell epitopes among the HERVs identified in the previous step, and wherein said method further comprises at least one, preferably two, of the following steps:

(i). Selecting HERVs associated with a cytotoxic T cell response among the cancer-associated HERVs identified in step (a), said step being between the step (a) and the step (b), and/or

(ii). Assessing the expression at the protein or peptide level of the HERV-derived T cell epitopes identified in step (b) in tumor samples, said step being after the step (b).

[10] In one embodiment, the step (a) comprises the step of comparing HERV expression in tumor and in normal samples.

[11] In one embodiment, the step (b) comprises the step of aligning the sequences of the HERVs identified in the previous step with HERV proteins.

[12] In one embodiment, the step (b) further comprises the step of predicting the binding to MHC class I molecules of the sequences sharing at least 70, 75, 80, 85, 90, 95, 96, 97, 98, 99% or more identity with HERV proteins.
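As an illustration of the identity criterion above, the following Python sketch computes percent identity between two sequences that have already been aligned (gaps written as '-') and applies a threshold such as 70%. It assumes the alignment itself is produced by an external tool; ignoring columns where both sequences carry a gap is one convention among several, and the function names are hypothetical rather than taken from the Inventors' pipeline.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Columns where both sequences carry a gap are ignored; every other
    column counts in the denominator, and only identical residues count
    as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def passes_identity_threshold(aligned_a: str, aligned_b: str,
                              threshold: float = 70.0) -> bool:
    """True if the aligned pair reaches the identity threshold (in %)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical aligned pair: one substitution over 9 columns.
print(round(percent_identity("RMLTDLRAV", "RMLTDLRAI"), 1))  # 88.9
print(passes_identity_threshold("RMLTDLRAV", "RMLTDLRAI"))   # True
```

Only sequences passing such a filter would then go on to MHC class I binding prediction, which this sketch does not cover.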

[13] In one embodiment, the association of the cancer-associated HERVs with a cytotoxic T cell response in the step (i) is assessed by the association of each HERV with at least one CD4 or CD8 T cell signature, the association of each HERV with a functional signature being either an interferon (IFN)-γ signature or cytolytic activity, and the absence of expression of each HERV in normal purified T or NK cells; preferably, said association is assessed by a machine learning-based approach.
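The three criteria of this embodiment can be combined as a simple Boolean filter. In the sketch below, each HERV carries precomputed flags; how those flags are obtained (for example with the machine learning-based approach mentioned above) is outside the sketch, and the class, field and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HervAssociation:
    """Precomputed association flags for one cancer-associated HERV (hypothetical fields)."""
    name: str
    t_cell_signature: bool          # associated with >= 1 CD4 or CD8 T cell signature
    ifn_gamma_signature: bool       # associated with an IFN-gamma signature
    cytolytic_activity: bool        # associated with cytolytic activity
    expressed_in_normal_t_nk: bool  # detected in normal purified T or NK cells


def associated_with_cytotoxic_response(h: HervAssociation) -> bool:
    """T cell signature AND a functional signature AND no normal T/NK expression."""
    functional = h.ifn_gamma_signature or h.cytolytic_activity
    return h.t_cell_signature and functional and not h.expressed_in_normal_t_nk


def select_step_i(candidates: list[HervAssociation]) -> list[str]:
    """Names of the HERVs retained by the step (i) filter."""
    return [h.name for h in candidates if associated_with_cytotoxic_response(h)]


candidates = [
    HervAssociation("HERV_1", True, True, False, False),   # kept
    HervAssociation("HERV_2", True, False, False, False),  # no functional signature
    HervAssociation("HERV_3", True, False, True, True),    # expressed in normal T/NK cells
]
print(select_step_i(candidates))  # ['HERV_1']
```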

[14] In one embodiment, said method further comprises, after the step (b) or before the step (ii), a step of selecting epitopes among the most shared epitopes in the cancer-associated HERVs identified in step (a).
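One way to rank epitopes by how widely they are shared is to enumerate fixed-length peptides (here 9-mers, the length of the epitopes listed in this application) across the cancer-associated HERV proteins and count how many distinct HERVs carry each one. This sketch assumes exact matches only and uses invented toy sequences and hypothetical HERV names; it is not the Inventors' implementation.

```python
from collections import defaultdict


def kmers(sequence: str, k: int) -> set[str]:
    """All distinct substrings of length k (empty if the sequence is shorter)."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def most_shared_epitopes(herv_proteins: dict[str, str], k: int = 9,
                         top_n: int = 5) -> list[tuple[str, int]]:
    """Rank k-mers by the number of distinct HERVs whose protein contains them."""
    carriers: dict[str, set[str]] = defaultdict(set)
    for herv, protein in herv_proteins.items():
        for peptide in kmers(protein, k):
            carriers[peptide].add(herv)
    # Most shared first; ties broken alphabetically so the output is deterministic.
    ranked = sorted(carriers.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(peptide, len(hervs)) for peptide, hervs in ranked[:top_n]]


# Invented protein fragments keyed by hypothetical HERV names.
toy = {
    "HERV_A": "GGKLMNPQRSTV",
    "HERV_B": "KLMNPQRSTAA",
    "HERV_C": "WWWYYYHHHEE",
}
print(most_shared_epitopes(toy, top_n=1))  # [('KLMNPQRST', 2)]
```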

[15] In one embodiment, said method further comprises, after the step (b) or after the step (ii), a step of aligning the HERV-derived T cell epitopes with the human proteome.
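For short epitopes, the alignment with the human proteome can be approximated by an ungapped scan that reports the smallest number of mismatches between an epitope and any same-length window of a protein. The sketch below assumes that the purpose is to flag epitopes closely matching other human proteins and that the mismatch tolerance (here 0, i.e. exact matches only) is a free parameter; neither is specified in this paragraph, and the toy proteome entry is invented.

```python
def min_mismatches(epitope: str, proteome: dict[str, str]) -> int:
    """Smallest Hamming distance between the epitope and any same-length
    window of any protein (ungapped). Returns len(epitope) if no window fits."""
    k = len(epitope)
    best = k
    for protein in proteome.values():
        for i in range(len(protein) - k + 1):
            window = protein[i:i + k]
            distance = sum(1 for a, b in zip(epitope, window) if a != b)
            if distance < best:
                best = distance
                if best == 0:
                    return 0  # exact match found; no need to scan further
    return best


def filter_against_proteome(epitopes: list[str], proteome: dict[str, str],
                            max_mismatches: int = 0) -> list[str]:
    """Keep epitopes differing by more than max_mismatches from every window."""
    return [e for e in epitopes if min_mismatches(e, proteome) > max_mismatches]


toy_proteome = {"toy_protein": "MSTQGWVTFISLLPEE"}  # invented example entry
print(filter_against_proteome(["GWVTFISLL", "KLMNPQRST"], toy_proteome))
# ['KLMNPQRST']
```

A real proteome scan would normally use an indexed search or an alignment tool rather than this quadratic loop.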

[16] The present invention also relates to a peptide comprising or consisting of an epitope identified by the method as described previously.

[17] The present invention also relates to a peptide comprising or consisting of an epitope having a sequence selected from the group comprising or consisting of RMLTDLRAV (SEQ ID NO: 3), LMAQAITGV (SEQ ID NO: 11), VLQDFDQPI (SEQ ID NO: 13), ALMIVSMVV (SEQ ID NO: 14), MLLAALMIV (SEQ ID NO: 15) and YIDDILCAA (SEQ ID NO: 16). The present invention also relates to a peptide comprising or consisting of a sequence selected from the group comprising or consisting of RMLTDLRAV (SEQ ID NO: 3), LMAQAITGV (SEQ ID NO: 11), VLQDFDQPI (SEQ ID NO: 13), ALMIVSMVV (SEQ ID NO: 14), MLLAALMIV (SEQ ID NO: 15) and YIDDILCAA (SEQ ID NO: 16).

[18] The present invention also relates to a peptide comprising or consisting of an epitope having a sequence selected from the group comprising or consisting of RMLTDLRAV (SEQ ID NO: 3), LMAQAITGV (SEQ ID NO: 11), VLQDFDQPI (SEQ ID NO: 13), MLLAALMIV (SEQ ID NO: 15) and YIDDILCAA (SEQ ID NO: 16). The present invention also relates to a peptide comprising or consisting of a sequence selected from the group comprising or consisting of RMLTDLRAV (SEQ ID NO: 3), LMAQAITGV (SEQ ID NO: 11), VLQDFDQPI (SEQ ID NO: 13), MLLAALMIV (SEQ ID NO: 15) and YIDDILCAA (SEQ ID NO: 16).

[19] The present invention also relates to an expression vector inducing expression of one or more peptide(s) as described hereinabove.

[20] The present invention also relates to a cytotoxic T-lymphocyte of a subject treated with one or more peptide(s), or one or more expression vector(s) as described hereinabove.

[21] The present invention also relates to an engineered T cell expressing a T-cell receptor recognizing a peptide as described hereinabove.

[22] The present invention also relates to one or more peptide(s), one or more expression vector(s), one or more cytotoxic T-lymphocyte(s), or one or more engineered T cell(s) as described hereinabove for use as a vaccine or medicament.

[23] The present invention also relates to one or more peptide(s), one or more expression vector(s), one or more cytotoxic T-lymphocyte(s), or one or more engineered T cell(s) as described hereinabove for use in preventing or treating a cancer in a subject in need thereof.

[24] In one embodiment, said cancer is selected from the group comprising or consisting of breast cancer, including triple negative breast cancer, ovarian cancer, melanoma, sarcoma, teratocarcinoma, bladder cancer, lung cancer, including non-small cell lung carcinoma and small cell lung carcinoma, head and neck cancer, colorectal cancer, glioblastoma, leukemias, lymphomas and other solid tumors and hematological malignancies.

DEFINITIONS

[25] In the present invention, the following terms have the following meanings:

[26] “Epitope” refers to a portion of an antigen that is capable of stimulating an immune response.

[27] “Human endogenous retroviruses (HERV)” refers to retroviruses that have integrated into the germline and have lost their infectious capability but retained the capability to transpose.

[28] “Peptide” refers to a linear polymer of at least 2 amino acids linked together by peptide bonds. Amino acid residues in peptides are abbreviated as follows: Phenylalanine is Phe or F; Leucine is Leu or L; Isoleucine is Ile or I; Methionine is Met or M; Valine is Val or V; Serine is Ser or S; Proline is Pro or P; Threonine is Thr or T; Alanine is Ala or A; Tyrosine is Tyr or Y; Histidine is His or H; Glutamine is Gln or Q; Asparagine is Asn or N; Lysine is Lys or K; Aspartic Acid is Asp or D; Glutamic Acid is Glu or E; Cysteine is Cys or C; Tryptophan is Trp or W; Arginine is Arg or R; and Glycine is Gly or G.
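The three-letter codes listed in this definition map one-to-one onto the one-letter codes used for the epitope sequences in this application. The following minimal Python lookup is an illustrative helper, not part of the described method; it converts between hyphen-separated three-letter form and one-letter form.

```python
THREE_TO_ONE = {
    "Phe": "F", "Leu": "L", "Ile": "I", "Met": "M", "Val": "V",
    "Ser": "S", "Pro": "P", "Thr": "T", "Ala": "A", "Tyr": "Y",
    "His": "H", "Gln": "Q", "Asn": "N", "Lys": "K", "Asp": "D",
    "Glu": "E", "Cys": "C", "Trp": "W", "Arg": "R", "Gly": "G",
}
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def to_one_letter(sequence: str, sep: str = "-") -> str:
    """Convert e.g. 'Arg-Met-Leu' to 'RML'; raise ValueError on an unknown code."""
    try:
        return "".join(THREE_TO_ONE[code] for code in sequence.split(sep))
    except KeyError as err:
        raise ValueError(f"unknown residue code: {err.args[0]!r}") from None


def to_three_letter(sequence: str, sep: str = "-") -> str:
    """Convert e.g. 'RML' to 'Arg-Met-Leu'."""
    return sep.join(ONE_TO_THREE[letter] for letter in sequence)


print(to_one_letter("Arg-Met-Leu-Thr-Asp-Leu-Arg-Ala-Val"))  # RMLTDLRAV
print(to_three_letter("YIDDILCAA"))  # Tyr-Ile-Asp-Asp-Ile-Leu-Cys-Ala-Ala
```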

[29] “Prevent”, “preventing” and “prevention” refer to prophylactic and preventative measures, wherein the object is to reduce the chances that a subject will develop the pathologic condition or disorder over a given period of time. Such a reduction may be reflected, e.g., in a delayed onset of at least one symptom of the pathologic condition or disorder in the subject.

[30] “Protective immune response” refers to a cytotoxic T lymphocyte (CTL) and/or a helper T lymphocyte (HTL) response to an antigen derived from an infectious agent or a tumor antigen, which prevents or at least partially arrests disease symptoms or progression. The immune response may also include an antibody response that has been facilitated by the stimulation of helper T cells.

[31] “Subject” refers to a mammal, preferably a human. In one embodiment, a subject may be a “patient”, i.e., a warm-blooded animal, more preferably a human, who/which is awaiting the receipt of, or is receiving medical care or was/is/will be the object of a medical procedure, or is monitored for the development of a disease. The term “mammal” refers here to any mammal, including humans, domestic and farm animals, and zoo, sports, or pet animals, such as dogs, cats, cattle, horses, sheep, pigs, goats, rabbits, etc. Preferably, the mammal is a primate, more preferably a human.

[32] “Therapeutically effective amount” refers to the level or amount of one or more peptide(s), one or more expression vector(s), one or more CTL(s), or one or more engineered T cell(s) as described herein that is aimed at, without causing significant negative or adverse side effects to the target, (1) delaying or preventing the onset of a disease, disorder, or condition; (2) slowing down or stopping the progression, aggravation, or deterioration of one or more symptoms of the disease, disorder, or condition; (3) bringing about ameliorations of the symptoms of the disease, disorder, or condition; (4) reducing the severity or incidence of the disease, disorder, or condition; or (5) curing the disease, disorder, or condition. A therapeutically effective amount may be administered prior to the onset of the disease, disorder, or condition, for a prophylactic or preventive action. Alternatively or additionally, the therapeutically effective amount may be administered after initiation of the disease, disorder, or condition, for a therapeutic action.

[33] “Treating” or “treatment” or “alleviation” refers to therapeutic treatment, wherein the object is to slow down (lessen) the targeted pathologic condition or disorder. A subject or mammal is successfully "treated" for a cancer if, after receiving a therapeutic amount of the one or more peptide(s), one or more expression vector(s), one or more cytotoxic T lymphocyte(s) or one or more engineered T cell(s) according to the present invention, the patient shows observable and/or measurable reduction in or absence of one or more of the following: reduction in the number of cancer cells (or tumor size); reduction in the percent of total cells that are cancerous; and/or relief to some extent of one or more of the symptoms associated with the specific disease or condition; reduced morbidity and mortality; and improvement in quality of life. The above parameters for assessing successful treatment and improvement in the disease are readily measurable by routine procedures familiar to a physician.

[34] “Vaccine” refers to a compound that, once administered to a patient, may induce a humoral and/or cellular immune response that is protective.

[35] “Vector”, or “expression vector” means the vehicle by which a DNA or RNA sequence (e.g. a foreign gene) can be introduced into a host cell, so as to transform the host and promote expression (e.g. transcription and translation) of the introduced sequence.

DETAILED DESCRIPTION

[36] The present invention relates to a method for identifying Human endogenous retroviruses (HERVs)-derived T cell epitopes associated with a cancer, preferably associated with at least one cancer.

[37] In one embodiment, all the steps of the method are performed in silico, and the method of the invention is an in silico method.

[38] The method of the present invention notably allows the selection, among a large number of HERV candidates, of a limited number of HERV-derived T cell epitopes that are specifically overexpressed by tumor cells and most likely to be immunogenic.

[39] In one embodiment, the method enables the selection of shared epitopes, i.e. epitopes shared by several patients with the same cancer, and/or epitopes shared among different cancer types, and/or epitopes shared among cancer-associated HERVs. In one embodiment, the epitope(s) identified by the method of the present invention is/are shared by several patients with the same cancer. In one embodiment, the epitope(s) identified by the method of the present invention is/are shared among different cancer types. In one embodiment, the epitope(s) identified by the method of the present invention is/are shared among cancer-associated HERVs.

[40] In one embodiment, said method comprises at least 2, 3, 4, 5 or more steps described hereinbelow.

[41] In one embodiment, the method described hereinabove comprises the step of identifying HERVs in tumor and normal samples. Said HERVs may be identified using a HERV database.

[42] Methods for identifying HERVs in samples are well known to the skilled artisan, and include, without limitation, the use of reference tools such as hervQuant (C. C. Smith, et al., Endogenous retroviral signatures predict immunotherapy response in clear cell renal cell carcinoma. Journal of Clinical Investigation. 128, 4804-4820 (2018)) or Telescope (Bendall et al., Telescope: Characterization of the retrotranscriptome by accurate estimation of transposable element expression, PLoS Comput Biol. 2019 Sep 30;15(9):e1006453).

[43] In addition, HERV databases are described in the literature; see, for example, L. Vargiu et al., Classification and characterization of human endogenous retroviruses; mosaic forms are common. Retrovirology. 13, 7 (2016). HERV sequences may also be retrieved from the GenBank database.

[44] In one embodiment, a normal sample refers to a sample obtained from a subject not affected with a cancer or to a sample obtained in a peri-tumorous location from a subject affected with cancer.

[45] In one embodiment, a tumor sample refers to a sample of tumor tissue from a subject affected with a cancer.

[46] In one embodiment, the method described hereinabove comprises the step of selecting HERVs associated with cancer, preferably associated with at least one cancer. This step notably enables the selection of the HERVs that are differentially expressed between tumor and normal samples.

[47] In one embodiment, the selection of HERVs associated with cancer comprises the step of comparing HERV expression in tumor and in normal samples. Said comparison may be done at the RNA level, meaning that, for each HERV, the expression of its RNA sequences is compared between tumor and normal samples.

[48] Said RNA sequences may be obtained from RNA sequencing database and/or by extraction from fresh tissues.

[49] In one embodiment, a HERV is associated with a cancer, preferably associated with at least one cancer, if the expression of said HERV in tumor samples is more than 2, 2.5, 3, 3.5, 4, 4.5 or 5-fold higher than in normal samples. In one embodiment, a HERV is associated with a cancer, preferably associated with at least one cancer, if said HERV is expressed more than 2-fold higher in tumor samples than in normal samples and is not expressed more than 2-fold higher in any normal tissue than in its matched tumor.
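The two-fold rule of the second embodiment can be illustrated with a minimal Python sketch. It assumes that per-sample HERV expression values are already normalised and comparable between tumor and normal samples; comparing group means, the pseudocount of 1 and the function names are illustrative assumptions rather than details of the method.

```python
from statistics import mean


def fold_change(a, b, pseudocount=1.0):
    """Ratio of mean expression in group a to mean expression in group b.

    The pseudocount (an assumption) avoids division by zero for HERVs that
    are silent in one of the groups.
    """
    return (mean(a) + pseudocount) / (mean(b) + pseudocount)


def is_cancer_associated(tumor, normal, matched_pairs, min_fold=2.0, pseudocount=1.0):
    """Apply the two-part two-fold rule to one HERV.

    tumor, normal: expression values of the HERV across samples.
    matched_pairs: list of (normal_tissue_values, matched_tumor_values) tuples.
    """
    # Criterion 1: more than min_fold higher in tumor than in normal samples.
    if fold_change(tumor, normal, pseudocount) <= min_fold:
        return False
    # Criterion 2: in no normal tissue is the HERV more than min_fold higher
    # than in its matched tumor.
    return all(
        fold_change(normal_tissue, matched_tumor, pseudocount) <= min_fold
        for normal_tissue, matched_tumor in matched_pairs
    )


print(is_cancer_associated([10, 12, 14], [2, 3, 4], [([1, 2], [9, 11])]))  # True
```

In the usage line the tumor/normal ratio is (12 + 1) / (3 + 1) = 3.25, and the only normal tissue stays well below its matched tumor, so both criteria are met.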

[50] In one embodiment, the method described hereinabove comprises the step of selecting HERVs associated with a cytotoxic T cell response. This step notably enables the selection of HERVs that can induce an immune response.

[51] In one embodiment, the association of the HERVs with a cytotoxic T cell response is assessed by a phenotype criterion. In one embodiment, the association of the HERVs with a cytotoxic T cell response is assessed by the association of each HERV with at least one CD4 or CD8 T cell signature.

[52] In one embodiment, the association of the HERVs with a cytotoxic T cell response is assessed by determining the association of each HERV with a transcriptomic signature associated with at least one of a CD4 or CD8 T cell phenotype. In one embodiment, this step enables the selection of HERVs associated with transcripts suggestive of the presence of cells with a CD4 or CD8 T cell phenotype in the tumor sample.

[53] Examples of methods to assign CD4 or CD8 T cell phenotypes are well known to the skilled artisan and include, for example, the xCell method (D. Aran et al., xCell: digitally portraying the tissue cellular heterogeneity landscape. Genome Biol. 18, 220 (2017)) and MCP-counter.

[54] As used herein, xCell is a gene signature-based method that performs cell type enrichment analysis from gene expression data for 64 immune and stroma cell types. xCell signatures were validated using extensive in silico simulations and also cytometry immunophenotyping. The xCell R package for generating the cell type scores and the R scripts used to develop xCell are available at https://github.com/dviraran/xCell and deposited to Zenodo (assigned DOI http://doi.org/10.5281/zenodo.1004662).

[55] In one embodiment, the signature to assign a CD4 or CD8 T cell phenotype is based on the xCell signatures.

[56] In one embodiment, the signature to assign CD4 T cell phenotype, in particular CD4 memory T cell phenotype, comprises or consists of the following genes: CD6, CD28, GPR183, EIF3E, ITK, PKP4, PREPL, DNAJA2, PTGES3, CD2AP and ICOS.

[57] In one embodiment, the signature to assign CD4 T cell phenotype, in particular CD4 naive T cell phenotype, comprises or consists of the following genes: CD27, CLC, CTSW, DNAJB1, RBL2, HAUS3, ANKRD55, ZNF394 and CHMP7.

[58] In one embodiment, the signature to assign CD4 T cell phenotype comprises or consists of the following genes: APBB1, CD28, CTLA4, ITK, PLCL1, SNPH, PPWD1, PHF3, ICOS, TRAT1, RAPGEF6, NOL9 and CHMP7.

[59] In one embodiment, the signature to assign CD4 T cell phenotype, in particular CD4 central memory T cell phenotype (CD4 Tcm), comprises or consists of the following genes: CCR4, CCR8, CTLA4, DAB1, ERN1, GPR15, DNAJB1, KRT1, POU6F1, TPO, TRADD, DLEC1, ICOS, TRAT1, FXYD7, FBXL8, SIRPG, ANKRD55 and OBSCN.

[60] In one embodiment, the signature to assign CD4 T cell phenotype, in particular CD4 effector memory T cell phenotype (CD4 Tem), comprises or consists of the following genes: RPN2, SLAMF1, SPTAN1, TRADD, MYO16, MCF2L2, ESYT1, TRAPPC2L, ARHGAP15 and RIC8A.

[61] In one embodiment, the signature to assign CD8 T cell phenotype, in particular CD8 naive T cell phenotype, comprises or consists of the following genes: CD8A, CD8B, EEF1D, GPR15, MYL1, NDUFA4, NDUFS5, PSG11, SKI, SON, HIST1H3A, BUD31, GPR52, ZNHIT3, CGRRF1, RRP8, NGDN, C19orf53, SETD2, MED31, SS18L2, CDK5RAP1, DDX24, PCIF1, MS4A5, HAUS3, LIN28A and TNKS2.

[62] In one embodiment, the signature to assign CD8 T cell phenotype comprises or consists of the following genes: CASP8, CD8A, CD8B, GZMK, PTGDR, SLC1A7, TSPAN32, KLRG1, NPRL2, GIMAP4, CRTAM and ZNF611.

[63] In one embodiment, the signature to assign CD8 T cell phenotype, in particular CD8 central memory T cell phenotype (CD8 Tcm), comprises or consists of the following genes: CASP8, CD27, TNFSF8, GZMK, NCK1, GPR171, CRTAM, PARP11 and TMEM30B.

[64] In one embodiment, the signature to assign CD8 T cell phenotype, in particular CD8 effector memory T cell phenotype (CD8 Tem), comprises or consists of the following genes: DHX8, GZMH, GZMK, LAG3, ZAP70, COLQ, RGS9, CXCR6, PVRIG and PYHIN1.

[65] In one embodiment, the signature to assign a CD4 or CD8 T cell phenotype is based on the MCP-counter signatures.

[66] In one embodiment, the signature to assign a CD4 or CD8 T cell phenotype comprises or consists of the following genes: CD28, CD3D, CD3G, CD5, CD6, CHRM3-AS2, CTLA4, FLT3LG, ICOS, MAL, MGC40069, PBX4, SIRPG, THEMIS, TNFRSF25, TRAT1, CD8B, CD8A, EOMES, FGFBP2, GNLY, KLRC3, KLRC4, KLRD1, BANK1, CD19, CD22, CD79A, CR2, FCRL2, IGKC, MS4A1, PAX5, CD160, KIR2DL1, KIR2DL3, KIR2DL4, KIR3DL1, KIR3DS1, NCR1, PTGDR, SH2D1B, ADAP2, CSF1R, FPR3, KYNU, PLA2G7, RASSF4, TFEC, CD1A, CD1B, CD1E, CLEC10A, CLIC2, WFDC21P, CA4, CEACAM3, CXCR1, CXCR2, CYP4F3, FCGR3B, HAL, KCNJ15, MEGF9, SLC25A37, STEAP4, TECPR2, TLE3, TNFRSF10C, VNN3, ACVRL1, APLN, BCL6B, BMP6, BMX, CDH5, CLEC14A, CXorf36, EDN1, ELTD1, EMCN, ESAM, ESM1, FAM124B, HECW2, HHIP, KDR, MMRN1, MMRN2, MYCT1, PALMD, PEAR1, PGF, PLXNA2, PTPRB, ROBO4, SDPR, SHANK3, SHE, TEK, TIE1, VEPH1, VWF, COL1A1, COL3A1, COL6A1, COL6A2, DCN, GREM1, PAMR1 and TAGLN.

[67] In one embodiment, the association of the HERVs with a cytotoxic T cell response is assessed by determining the association of each HERV with a function signature. In one embodiment, said function signature is an interferon (IFN)-γ signature or cytolytic activity.

[68] In one embodiment, the IFN-γ signature is based on genes up-regulated in response to IFN-γ. In one embodiment, the association of the HERVs with a cytotoxic T cell response is assessed by determining the association of each HERV with a transcriptomic signature associated with an IFN-γ response.

[69] In one embodiment, the IFN-γ signature is determined on the basis of a database. As an example, an IFN-γ signature may be accessed through the Molecular Signatures Database (MSigDB).

[70] In one embodiment, the IFN-γ signature comprises at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 40, 60, 80, 100, 120, 140, 160, 180 or 200, or all, of the following genes: ADAR, APOL6, ARID5B, ARL4A, AUTS2, B2M, BANK1, BATF2, BPGM, BST2, BTG1, C1R, C1S, CASP1, CASP3, CASP4, CASP7, CASP8, CCL2, CCL5, CCL7, CD274, CD38, CD40, CD69, CD74, CD86, CDKN1A, CFB, CFH, CIITA, CMKLR1, CMPK2, CSF2RB, CXCL10, CXCL11, CXCL9, DDX58, DDX60, DHX58, EIF2AK2, EIF4E3, EPSTI1, FAS, FCGR1A, FGL2, FPR1, FTSJD2, GBP4, GBP6, GCH1, GPR18, GZMA, HERC6, HIF1A, HLA-A, HLA-B, HLA-DMA, HLA-DQA1, HLA-DRB1, HLA-G, ICAM1, IDO1, IFI27, IFI30, IFI35, IFI44, IFI44L, IFIH1, IFIT1, IFIT2, IFIT3, IFITM2, IFITM3, IFNAR2, IL10RA, IL15, IL15RA, IL18BP, IL2RB, IL4R, IL6, IL7, IRF1, IRF2, IRF4, IRF5, IRF7, IRF8, IRF9, ISG15, ISG20, ISOC1, ITGB7, JAK2, KLRK1, LAP3, LATS2, LCP2, LGALS3BP, LY6E, LYSMD2, MARCH1, METTL7B, MT2A, MTHFD2, MVP, MX1, MX2, MYD88, NAMPT, NCOA3, NFKB1, NFKBIA, NLRC5, NMI, NOD1, NUP93, OAS2, OAS3, OASL, OGFR, P2RY14, PARP12, PARP14, PDE4B, PELI1, PFKP, PIM1, PLA2G4A, PLSCR1, PML, PNP, PNPT1, PRIC285, PSMA2, PSMA3, PSMB10, PSMB2, PSMB8, PSMB9, PSME1, PSME2, PTGS2, PTPN1, PTPN2, PTPN6, RAPGEF6, RBCK1, RIPK1, RIPK2, RNF213, RNF31, RSAD2, RTP4, SAMD9L, SAMHD1, SECTM1, SELP, SERPING1, SLAMF7, SLC25A28, SOCS1, SOCS3, SOD2, SP110, SPPL2A, SRI, SSPN, ST3GAL5, ST8SIA4, STAT1, STAT2, STAT3, STAT4, TAP1, TAPBP, TDRD7, TNFAIP2, TNFAIP3, TNFAIP6, TNFSF10, TOR1B, TRAFD1, TRIM14, TRIM21, TRIM25, TRIM26, TXNIP, UBE2L6, UPP1, USP18, VAMP5, VAMP8, VCAM1, WARS, XAF1, XCL1, ZBP1 and ZNFX1.

[71] In one embodiment, the IFN-γ signature comprises all the genes mentioned hereinabove. In one embodiment, the IFN-γ signature is HALLMARK_INTERFERON_GAMMA_RESPONSE (https://www.gsea-msigdb.org/gsea/msigdb/cards/HALLMARK_INTERFERON_GAMMA_RESPONSE.html).

[72] In one embodiment, the IFN-γ signature is evaluated by calculating enrichment scores based on the IFN-γ signature for each sample, per cancer type.
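The enrichment method itself is not specified here; single-sample tools such as ssGSEA or GSVA are common choices. The minimal sketch below swaps in a much simpler score, the mean z-score of the signature genes across the samples of one cancer type, only to show how a per-sample signature score can be derived from an expression table. The function name and the input layout are assumptions.

```python
from statistics import mean, pstdev


def signature_scores(expr, signature):
    """Score each sample of ONE cancer type for a gene signature.

    expr: {gene: [expression per sample]}, all lists in the same sample order.
    signature: iterable of gene symbols (e.g. the IFN-γ gene list).
    Returns one score per sample: the mean z-score of the signature genes
    present in expr. This is a simple stand-in, not ssGSEA or GSVA.
    """
    genes = [g for g in signature if g in expr]
    if not genes:
        raise ValueError("no signature genes found in the expression data")
    n_samples = len(expr[genes[0]])
    z_rows = []
    for gene in genes:
        values = expr[gene]
        mu, sd = mean(values), pstdev(values)
        # Genes with no variation across samples contribute 0 to every score.
        z_rows.append([(v - mu) / sd if sd > 0 else 0.0 for v in values])
    return [mean(row[i] for row in z_rows) for i in range(n_samples)]


expr = {"STAT1": [1, 3], "IRF1": [2, 4], "GAPDH": [5, 5]}
print(signature_scores(expr, ["STAT1", "IRF1", "CXCL9"]))  # [-1.0, 1.0]
```

Genes absent from the expression table (CXCL9 in the toy example) are skipped, and genes outside the signature (GAPDH) are ignored.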

[73] In one embodiment, the cytolytic activity is evaluated by assessing the expression levels of granzyme-A (GZMA) and perforin (PRF1). In one embodiment, the cytolytic activity is evaluated by calculating the geometric mean of the expression levels of granzyme-A (GZMA) and perforin (PRF1).
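The geometric-mean definition of cytolytic activity reduces to the square root of the product of the two expression levels, as in this sketch. The optional pseudocount is an assumption added for undetected genes and is not part of the text above.

```python
import math


def cytolytic_activity(gzma, prf1, pseudocount=0.0):
    """Geometric mean of GZMA and PRF1 expression for one sample.

    pseudocount is an optional offset (an assumption, e.g. to avoid a zero
    score when one gene is not detected); it defaults to 0.
    """
    if gzma < 0 or prf1 < 0:
        raise ValueError("expression levels must be non-negative")
    return math.sqrt((gzma + pseudocount) * (prf1 + pseudocount))


print(cytolytic_activity(4.0, 9.0))  # 6.0
```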

[74] In one embodiment, the expression level of granzyme-A (GZMA) and perforin (PRF1) is evaluated at the transcriptomic level. In one embodiment, the expression level of granzyme-A (GZMA) and perforin (PRF1) is evaluated by RNA-seq.

[75] As used herein, RNA-seq (short for RNA sequencing) is a sequencing technique that uses next-generation sequencing (NGS) to reveal the presence and quantity of specific RNAs in a biological sample at a given moment.

[76] RNA-seq data may be found in public databases, such as the NCBI Gene Expression Omnibus (GEO) portal.

[77] In one embodiment, the expression level of granzyme-A (GZMA) and perforin (PRF1) is evaluated at the proteomic level.

[78] In one embodiment, the association of the HERVs with a cytotoxic T cell response is assessed by the absence of expression of said HERVs in normal purified T or NK cells. This step further enables the selection of HERVs that are expressed in tumor cells and not in T or NK cells.

[79] Examples of HERVs expressed in T or NK cells are known to the skilled artisan and include, without limitation, SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 35, SEQ ID NO: 36, SEQ ID NO: 37, SEQ ID NO: 38, SEQ ID NO: 39, SEQ ID NO: 40 and SEQ ID NO: 41.

[80] In one embodiment, the association of the HERVs with a cytotoxic T cell response is assessed by evaluating one, two or all three of the parameters described hereinabove.

[81] In one embodiment, the association of the HERVs with a cytotoxic T cell response is assessed by (i) the association of each HERV with at least one CD4 or CD8 T cell signature, (ii) the association of each HERV with a function signature being either an IFN-γ signature or cytolytic activity, and/or (iii) the absence of expression of each HERV in normal purified T or NK cells.
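Once each criterion has produced a set of HERV identifiers, the combinations allowed by the "and/or" wording can be expressed as set operations. This is a sketch under the assumption that criteria (i) and (ii) have already been evaluated (for example by regression) and that criterion (iii) is given as the set of HERVs detected in normal purified T or NK cells; the function and parameter names are hypothetical.

```python
def ctl_associated_hervs(phenotype_hits, function_hits, tnk_expressed, require_both=True):
    """Combine criteria (i) to (iii) using sets of HERV identifiers.

    phenotype_hits: HERVs associated with a CD4 or CD8 T cell signature (i)
    function_hits:  HERVs associated with the IFN-γ signature or cytolytic activity (ii)
    tnk_expressed:  HERVs expressed in normal purified T or NK cells; criterion (iii)
                    keeps only HERVs absent from this set
    require_both:   True keeps HERVs meeting (i) and (ii); False keeps (i) or (ii)
    """
    hits = phenotype_hits & function_hits if require_both else phenotype_hits | function_hits
    return hits - tnk_expressed


phen, func, tnk = {"H1", "H2", "H3"}, {"H2", "H3", "H4"}, {"H3"}
print(sorted(ctl_associated_hervs(phen, func, tnk)))         # ['H2']
print(sorted(ctl_associated_hervs(phen, func, tnk, False)))  # ['H1', 'H2', 'H4']
```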

[82] In one embodiment, the association of the HERVs with one, two or all three of the parameters described hereinabove is assessed by a machine learning-based approach. In one embodiment, a machine learning-based approach is used to test the associations independently for each cancer type. As used herein, machine learning refers to a field of artificial intelligence in which algorithms build models based on training data.

[83] In one embodiment, the association of the HERVs with one of the parameters described hereinabove is evaluated by a regression, preferably an L1-penalized regression (LASSO). In particular, a model may be built for each cancer type, wherein each HERV with a positive coefficient in the model is considered to be associated with the parameter.
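A minimal sketch of this regression, assuming one matrix of HERV expression (samples × HERVs) and one response vector per cancer type, for example an IFN-γ score or cytolytic activity per sample. To stay dependency-free it implements coordinate-descent LASSO in pure Python; in practice a library implementation (for example scikit-learn's `Lasso`, which minimises the same penalised objective) with a cross-validated choice of `alpha` would be more usual. The value `alpha=0.1`, the standardisation step and the helper names are illustrative assumptions.

```python
from statistics import mean, pstdev


def soft_threshold(z, t):
    """Soft-thresholding operator used by coordinate descent for the L1 penalty."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso(X, y, alpha, n_iter=200):
    """Coordinate-descent LASSO on standardised features and centred response.

    Minimises (1 / 2n) * ||y - X b||^2 + alpha * ||b||_1.
    X: list of samples, each a list of p HERV expression values.
    y: list of n response values (e.g. an IFN-γ score per sample).
    """
    n, p = len(X), len(X[0])
    cols = []
    for j in range(p):
        col = [row[j] for row in X]
        mu, sd = mean(col), pstdev(col)
        cols.append([(v - mu) / sd if sd > 0 else 0.0 for v in col])
    y_mean = mean(y)
    resid = [v - y_mean for v in y]  # residual for b = 0
    b = [0.0] * p
    for _ in range(n_iter):
        for j, xj in enumerate(cols):
            denom = sum(v * v for v in xj) / n
            if denom == 0:
                continue  # constant feature: coefficient stays 0
            rho = sum(xj[i] * (resid[i] + xj[i] * b[j]) for i in range(n)) / n
            new = soft_threshold(rho, alpha) / denom
            delta = new - b[j]
            if delta:
                for i in range(n):
                    resid[i] -= delta * xj[i]  # keep residual in sync with b
                b[j] = new
    return b


def positively_associated(herv_ids, X, y, alpha=0.1):
    """HERVs with a strictly positive LASSO coefficient (one model per cancer type)."""
    coefs = lasso(X, y, alpha)
    return [h for h, c in zip(herv_ids, coefs) if c > 0]


X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [1, 3, -3, -1]  # y = 2 * H0 - H1, H2 is irrelevant
print(positively_associated(["H0", "H1", "H2"], X, y))  # ['H0']
```

In the toy data H1 receives a negative coefficient and H2 is shrunk to zero, so only H0 would be retained as associated with the parameter.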

[84] In one embodiment, the method described hereinabove comprises a step of selecting T cell epitopes, in particular selecting shared T cell epitopes.

[85] This step notably enables the selection of epitopes (i) having regions in common with known HERV proteins, thereby reducing the risk of selecting non-translated sequences, and (ii) being strong binders of MHC class I molecules, allowing their presentation by MHC class I molecules. This step may further enable the selection of epitopes (iii) being among the most shared epitopes in HERVs, such as the HERVs associated with cancer.

[86] In one embodiment, the method comprises a step of selecting T cell epitopes. In one embodiment, the step of selecting T cell epitopes comprises a step of determining putative epitopes.

[87] In one embodiment, the step of selecting T cell epitopes comprises a step of translating HERV sequences in 1, 2, 3, 4, 5 or 6 possible reading frames and identifying open reading frames (ORFs) of at least 10, 11, 12, 13, 14, 15 or more amino acids to obtain putative epitopes.
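The translation and ORF step can be sketched in a few lines of Python using the standard genetic code. Here an ORF is taken to be any stretch between stop codons (no start codon required), which is an assumption: the text above only fixes the minimum length. Function names are illustrative.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def translate(seq):
    """Translate codon by codon; codons with ambiguous bases become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq):
    """Three forward frames followed by three reverse-complement frames."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    return [translate(strand[frame:]) for strand in (seq, rc) for frame in range(3)]


def putative_orfs(seq, min_len=10):
    """Stop-to-stop stretches of at least min_len amino acids in all six frames."""
    return [
        fragment
        for protein in six_frame_translations(seq)
        for fragment in protein.split("*")
        if len(fragment) >= min_len
    ]


dna = "ATG" + "GCT" * 12 + "TAA"
print("M" + "A" * 12 in putative_orfs(dna))  # True
```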

[88] In one embodiment, the step of selecting T cell epitopes comprises a step of predicting the binding of the putative epitopes with MHC class I molecules.

[89] Analysis tools for predicting the binding of sequences to MHC molecules are well known to the skilled artisan and include, for example, MHCflurry (T. J. O’Donnell et al., MHCflurry: Open-Source Class I MHC Binding Affinity Prediction. Cell Systems. 7, 129-132.e4 (2018)) or NetMHC I & II.

[90] In one embodiment, said MHC class I molecule is an HLA molecule. In one embodiment, the MHC class I molecule is selected from the group comprising or consisting of HLA-A, HLA-B and HLA-C molecules. In one embodiment, the MHC class I molecule is an HLA-A molecule, such as an HLA-A2 molecule.

[91] In one embodiment, only T cell epitopes predicted to be capable of binding to MHC class I molecules are selected. In one embodiment, the epitope is selected if its sequence is predicted to be a strong binder of an MHC class I molecule as defined hereinabove.
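A sketch of this filtering step, assuming candidate peptides are all 9-mers cut from the ORFs (a frequent length for HLA-A*02:01 ligands) and that a strong binder is a peptide with a predicted IC50 below 50 nM, a commonly used convention. `predict_ic50_nm` is a hypothetical placeholder for a wrapper around a real predictor such as MHCflurry or NetMHC; the toy affinities below are invented for the example and say nothing about real binding.

```python
def peptides_from_orf(orf, lengths=(9,)):
    """All sub-peptides of the given lengths (9-mers by default) in one ORF."""
    return [orf[i:i + k] for k in lengths for i in range(len(orf) - k + 1)]


def strong_binders(peptides, predict_ic50_nm, allele="HLA-A*02:01", max_ic50_nm=50.0):
    """Keep peptides whose predicted affinity for `allele` is below max_ic50_nm.

    predict_ic50_nm(peptide, allele) is a placeholder for a real predictor,
    e.g. a wrapper around MHCflurry or NetMHC output.
    """
    return sorted({p for p in peptides if predict_ic50_nm(p, allele) < max_ic50_nm})


# Toy predictor standing in for a real tool: one invented "strong binder".
TOY_AFFINITIES = {"CDEFGHIKL": 20.0}


def toy_predictor(peptide, allele):
    return TOY_AFFINITIES.get(peptide, 5000.0)


print(strong_binders(peptides_from_orf("ACDEFGHIKLM"), toy_predictor))  # ['CDEFGHIKL']
```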

[92] In one embodiment, the step of selecting T cell epitopes comprises a step of aligning the epitopes with sequences of known HERV proteins, to reduce the risk of selecting non-translated sequences.

[93] In one embodiment, the HERV protein is an envelope (Env) protein. In one embodiment, the HERV protein is a group-specific antigen (Gag) protein. In one embodiment, the HERV protein is a polymerase (Pol) protein. In one embodiment, the HERV protein is a protease (Pro) protein. In one embodiment, the HERV protein is a Rec protein. In one embodiment, the HERV protein is an accessory protein.

[94] In one embodiment, the HERV protein is a HERV-K/HML-2 protein. In one embodiment, the HERV protein is a HERV-K/HML-2 Gag protein. In one embodiment, the HERV protein is a HERV-K/HML-2 Pol protein.

[95] Examples of HERV proteins include, without limitation, HERV-K10 Gag protein (SEQ ID NO: 20), HERV-K113 Gag protein (SEQ ID NO: 21), HERV-K21 Gag protein (SEQ ID NO: 22), HERV-K113 Pol protein (SEQ ID NO: 23), HERV-K9 Pol protein (SEQ ID NO: 24), HERV-K6 Pol protein (SEQ ID NO: 25), HERV-K113 Env protein (SEQ ID NO: 26).

SEQ ID NO: 20

MGQTKSKIKSKYASYLSFIKILLKRGGVKVSTKNLIKLFQIIEQFCPWFPEQGTSD LKDWKRIGKELKQAGRKGNIIPLTVWNDWAIIKAALEPFQTEEDSISVSDAPGS CLIDCNENTRKKSQKETESLHCEYVAEPVMAQSTQNVDYNQLQEVIYPETLKL EGKGPELMGPSESKPRGTSPLPAGQVLVRLQPQKQVKENKTQPQVAYQYWPL AELQYRPPPESQYGYPGMPPAPQGRAPYHQPPTRRLNPMAPPSRQGSELHEIID KSRKEGDTEAWQFPVTLEPMPPGEGAQEGEPPTVEARYKSFSIKMLKDMKEGV KQYGPNSPYMRTLLDSIAYGHRLIPYDWEILAKSSLSPSQFLQFKTWWIDGVQE QVRRNRAANPPVNIDADQLLGIGQNWSTISQQALMQNEAIEQVRAICLRAWEKI QDPGSTCPSFNTVRQGSKEPYPDFVARLQDVAQKSIADEKAGKVIVELMAYEN ANPECQSAIKPLKGKVPAGSDVISEYVKACDGIGGAMHKAMLMAQAITGVVL GGQVRTFGGKCYNCGQIGHLKKNCPVLNKQNITIQATTTGREPPDLCPRCKKG KHWASQCRSKFDKNGQPLSGNEQRGQPQAPQQTGAFPIQPFVPQGFQGQQPPL SQVFQGISQLPQYNNCPSPQAAVQQ

SEQ ID NO: 21

MGQTKSKIKSKYASYLSFIKILLKRGGVKVSTKNLIKLFQIIEQFCPWFPEQGTLD LKDWKRIGKELKQAGRKGNIIPLTVWNDWAIIKAALEPFQTEEDSVSVSDAPGS CIIDCNEKTRKKSQKETESLHCEYVAEPVMAQSTQNADYNQLQEVIYPETLKLE GKGPELMGPSESKPRGTSPLPAGQVPVTLQPQKQVKENKTQPPVAYQYWPPAE LQYQPPPESQYGYPGMPPAPQGRAPYPQPPTRRLNPTAPPSRQGSELHEIIDKSR KEGDTEAWQFPVTLELMPPGEGAQEGEPPTVEARYKSFSIKMLKDMKEGVKQ YGPNSPYMRTLLDSIAHGHRLIPYDWEILAKSSLSPSQFLQFKTWWIDGVQEQV RRNRAANPPVNIDADQLLGIGQNWSTISQQALMQNEAIEQVRAICLRAWEKIQD PGSTCPSFNTVRQGSKEPYPDFVARLQDVAQKSIADEKARKVIVELMAYENAN PECQSAIKPLKGKVPAGSDVISEYVKACDGMGGAMHKAMLMAQAITGVVLGG QVRTFGGKCYNCGQIGHLKKNCPVLNKQNITIQATTTGREPPDLCPRCKKGKH WASQCRSKFDKNGQPLSGNEQRGQPQAPQQTGAFPIQPFVPQGFQGQQPPLSQ VFQGISQLPQYNNCPPPQAAVQQ

SEQ ID NO: 22

MGQTKSKIKSKYASYLSFIKILLKRGGVKVSTKNLIKLFQIIEQFCPWFPEQGTLD LKDWKRIGKELKQAGRKGNIIPLTVWNDWAIIKAALEPFQTEEDSISVSDAPGS CIIDCNENTRKKSQKETEGLHCEYAAEPVMAQSTQNVDYNQLQEVIYPETLKLE GKGPELVGPSESKPRGTSPLPAGQVPVTLQPQTQVKENKTQPPVAYQYWPPAE LQYRPPPESQYGYPGMPPAPQGRAPYPQPPTRRLNPTAPPSRQGSELHEIIDKSK EGDTEAWQFPVMLEPMPPGEGAQEGEPPTVEARYKSFSIKMLKDMKEGVKQY GPNSPYMRTLLDSIAHGHRLIPYDWEILAKSSLLPSQFLQFKTWWIDGVQEQVQ RNRAANPPVNIDADQLLGIGQNWSTISQQALMQNEAIEQVRAICLRAWEKIQDP GSTCPSFNTVRQSSKEPYPDFVARLQDVAQKSIADEKARKVIVELMAYENANPE CQSAIKPLKGKVPAGSDVISEYVKACDGIGGAMHKAMLMAQAITGVVLGGQV RTFGGKCYNCGQIGHLKKNCPVLNKQNITIQATTTGREPPDLCPRCKKGKHWA SQCRSKFDKNGQPLSGNEQRGQPQAPQQTGAFPIQPFVPQGFQGQQPPLSQVFQ GISQLPQYNNCPPPQAAVQQ

SEQ ID NO: 23

NKSRKRRNRVSFLGAATVEPPKPIPLTWKTEKPVWVNQWPLPKQKLEALHLLA NEQLEKGHIEPSFSPWNSPVFVIQKKSGKWRMLTDLRAVNAVIQPMGPLQPGLP SPAMIPKDWPLIIIDLKDCFFTIPLAEQDCEKFAFTIPAINNKEPATRFQWKVLPQ GMLNSPTICQTFVGRALQPVRDKFSDCYIIHYIDDILCAAETKDKLIDCYTFLQA EVANAGLAIASDKIQTSTPFHYLGMQIENRKIKPQKIEIRKDTLKTLNDFQKLLG DINWIRPTLGIPTYVMSNLFSILRGDSDLNSKRMLTPETTKEIKLVEEKIQSAQIN RIDPLAPLRLLIFATAHSPIGIIIQNTDLVEWSFLPHSTVKTFTLYLDQIATLIGQTR

LRIIKLCGNDPDKIVVPLTKEQVRQAFINSGAWQIGLANFVGIIDNHYPKTKIFQF LKLTTWILPKITRREPLENALTVFTDGSSNGKAAYTGLKERVIKTPYQSAQRAEL VAVITVLQDFDQPINIISDSAYVVQATRDVETALIKYSMDDQLNQLFNLLQQTV RKRNFPFYITHIRAHTNLPGPLTKANEQADLLVSSALIKAQELHALTHVNAAGL KNKFDVTWKQAKDIVQHCTQCQVLHLPTQEAGVNPRGLCPNALWQMDVTHV PSFGRLSYVHVTVDTYSHFIWATCQTGESTSHVKKHLLSCFAVMGVPEKIKTDN GPGYCSKAFQKFLSQWKISHTTGIPYNSQGQAIVERTNRTLKTQLVKQKEGGDS

KECTTPQMQLNLAPYTLNFLNIYRNQTTTSAEQHLTGKKNSPHEGKLIWWKDN KNKTWEIGKVITWGRGFACVSPGENQLPVWMPTRHLKFYNEPIGDAKKSTSAE TETPQSSTVDSQDEQNGDVRRTDEVAIHQEGRAADLGTTKEADAVSYKISREH KGDTNPREYAACSLDDCINGGKSPYACRSSCS

SEQ ID NO: 24

MGQTKSKIKSKYASYLSFIKILLKRGGVKVSTKNLIKLFQIIEQFCPWFPEQGTLD LKDWKRIGKELKQAGRKGNIIPLTVWNDWAIIKAALEPFQTEEDSISVSDAPGS GIIDCNEKTRKKSQKETESLHCEYVAEPVMAQSTQNVDYNQLQEVIYPETLKLE GKGPELVGPSESKPRGTSPLPAGQVPVTLQPQKQVKENKTQPPVAYQYWPPAE LQYRPPPESQYGYPGMPPAPQGRAPYPQPPTRRLNPTAPPSRQGSELHEIIDKSR KEGDTEAWQFPVTLEPMPPGEGAQEGEPPTVEARYKSFSIKILKDMKEGVKQY GPNSPYMRTLLDSIAHGHRLIPYDWEILAKSSLSPSQFLQFKTWWIDGVQEQVR

RNRAANPPVNIDADQLLGIGQNWSTISQQALMQNEAIEQVRAICLRAWEKIQDP GSTCPSFNTVRQGSKEPYPDFVARLQDVAQKSIADEKARKVIVELMAYENANP ECQSAIKPLKGKVPAGSDVISEYVKACDGIGGAMHKAMLMAQAITGVVLGGQ VRTFGGKCYNCGQIGHLKKNCPVLNKQNITIQATTTGREPPDLCPRCKKGKHW ASQCRSKFDKNGQPLSGNEQRGQPQAPQQTGAFPIQPFVPQGFQGQQPPLSQVF QGISQLPQYNNCPPPQVAVQQVDLCTIQAVSLLPGEPPQKIPTGVYGPLPEGTVG LILGRSSLNLKGVQIHTSVVDSDYKGEIQLVISSSVPWSASPGDRIAQLLLLPYIK GGNSEIKRIGGLGSTDPTGKAAYWASQVSENRPVCKAIIQGKQFEGLVDTGAD VSIIALNQWPKNWPKQKAVTGLVGIGTASEVYQSMEILHCLGPDNQESTVQPMI TSIPLNLWGRDLLQQWGAEITMPAPLYSPTSQKIMTKRGYIPGKGLGKNEDGIKI PFEAKINQKREGIGYPFLGAATIEPPKPIPLTWKTEKPVWVNQWPLPKQKLEAL HLLANEQLEKGHIEPSFSPWNSPVFVIQKKSGKWRMLTDLRAVNAVIQPMGPL QPGLPSPAMIPKDWPLIIIDLKDCFFTIPLAEQDCEKFAFTIPAINNKEPATRFQW KVLPQGMLNSPTICQTFVGRALQPVKVFRLLYYSLY

SEQ ID NO: 25

NKSRKRRNRESLLGAATVEPPKPIPLTWKTEKPVWVNQWPLPKQKLEALHLLA NEQLEKGHIEPSFSPWNSPVFVIQKKSGKWRMLTDLRAVNAVIQPMGPLQPGLP SPAMIPKDWPLIIIDLKDCFFTIPLAEQDCEKFAFTIPAINNKEPATRFQWKVLPQ GMLNSPTICQTFVGRALQPVREKFSDCYIIHCIDDILCAAETKDKLIDCYTFLQAE VANAGLAIASDKIQTSTPFHYLGMQIENRKIKPQKIEIRKDTLKTLNDFQKLLGD INWIRPTLGIPTYAMSNLFSILRGDSDLNSKRMLTPEATKEIKLVEEKIQSAQINRI DPLAPLQLLIFATAHSPTGIIIQNTDLVEWSFLPHSTVKTFTLYLDQIATLIGQTRL

RIIKLCGNDPDKIVVPLTKEQVRQAFINSGAWKIGLANFVGIIDNHYPKTKIFQFL KLTTWILPKITRREPLENALTVFTDGSSNGKAAYTGPKERVIKTPYQSAQRAEL VAVITVLQDFDQPINIISDSAYVVQATRDVETALIKYSMDDQLNQLFNLLQQTV RKRNFPFYITHIRAHTNLPGPLTKANEQADLLVSSALIKAQELHALTHVNAAGL KNKFDVTWKQAKDIVQHCTQCQVLHLPTQEAGVNPRGLCPNALWQMDVTHV PSFGRLSYVHVTVDTYSHFIWATCQTGESTSHVKKHLLSCFAVMGVPEKIKTDN GPGYCSKAFQKFLSQWKISHTTGIPYNSQGQAIVERTNRTLKTQLVKQKEGGDS

KECTTPQMQLNLALYTLNFLNIYRNQTTTSAEQHLTGKKNSPHEGKLIWWKDN KNKTWEIGKVITWGRGFACVSPGENQLPVWIPTRHLKFYNEPIRDAKKSTSAET ETSQSSTVDSQDEQNGDVRRTDEVAIHQEGRAANLGTTKEADAVSYKISREHK GDTNPREYAACSLDDCINGGKSPYACRSSCS

SEQ ID NO: 26

MNPSEMQRKAPPRRRRHRNRAPLTHKMNKMVTSEEQMKLPSTKKAEPPTWA QLKKLTQLATKYLENTKVTQTPESMLLAALMIVSMVVSLPMPAGAAAANYTY WAYVPFPPLIRAVTWMDNPIEIYVNDSVWVPGPTDDCCPAKPEEEGMMINISIG

YRYPPICLGRAPGCLMPAVQNWLVEVPTVSPISRFTYHMVSGMSLRPRVNYLQ DFSYQRSLKFRPKGKPCPKEIPKESKNTEVLVWEECVANSAVILQNNEFGTLID WAPRGQFYHNCSGQTQSCPSAQVSPAVDSDLTESLDKHKHKKLQSFYPWEWG EKGISTARPKIISPVSGPEHPELWRLTVASHHIRIWSGNQTLETRDRKPFYTIDLN

SSLTVPLQSCVKPPYMLVVGNIVIKPDSQTITCENCRLLTCIDSTFNWQHRILLVR AREGVWIPVSMDRPWEASPSVHILTEVLKGVLNRSKRFIFTLIAVIMGLIAVTAT

AAVAGVALHSSVQSVNFVNDWQNNSTRLWNSQSSIDQKLANQINDLRQTVIW MGDRLMSLEHRFQLQCDWNTSDFCITPQIYNESEHHWDMVRCHLQGREDNLT LDISKLKEQIFEASKAHLNLVPGTEAIAGVADGLANLNTVTWVKTIGSTTIINLIL ILVCLFCLLLVYRCTQQLRRDSDHRERAMMTMVVLSKRKGGNVGKSKRDQIV

TVSV

[96] In one embodiment, the epitope is selected if its sequence shares at least 70, 75, 80, 85, 90, 95, 96, 97, 98, 99% or more identity with at least one HERV protein as defined hereinabove. In one embodiment, the epitope is selected if its sequence shares at least 90% identity with at least one HERV protein as defined hereinabove.

[97] In one embodiment, an epitope is selected if its sequence shares at least 70, 75, 80, 85, 90, 95, 96, 97, 98, 99% or more identity with a HERV protein as defined hereinabove and/or if said epitope is predicted to bind with MHC class I molecules as described hereinabove.
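Because epitopes are short, identity with a known HERV protein can be approximated by sliding the epitope along the protein without gaps, as in this sketch (an assumption; a real pipeline might use BLAST or a gapped alignment instead). Note that for a 9-mer, a single mismatch already gives 8/9 ≈ 88.9% identity, so a 90% threshold amounts to requiring an exact match. Function names are illustrative.

```python
def best_window_identity(epitope, protein):
    """Highest percent identity between the epitope and any ungapped window
    of the same length in the protein (alignment gaps are not considered)."""
    k = len(epitope)
    if k == 0 or k > len(protein):
        return 0.0
    best = max(
        sum(a == b for a, b in zip(epitope, protein[i:i + k]))
        for i in range(len(protein) - k + 1)
    )
    return 100.0 * best / k


def matches_known_herv_protein(epitope, proteins, min_identity=90.0):
    """True if the epitope reaches min_identity against at least one protein."""
    return any(best_window_identity(epitope, p) >= min_identity for p in proteins)


gag_fragment = "KCYNCGQIGHLKKNCPV"  # short stretch taken from SEQ ID NO: 20
print(best_window_identity("GQIGHLKKN", gag_fragment))                   # 100.0
print(round(best_window_identity("GQIGHLKKA", gag_fragment), 1))         # 88.9
print(matches_known_herv_protein("GQIGHLKKA", [gag_fragment]))           # False
```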

[98] In one embodiment, the method described hereinabove comprises a step of selecting epitopes among the most shared epitopes in the HERVs identified in one of the steps described hereinabove. This step notably enables the selection of the epitopes that are among the most shared epitopes in the HERVs, such as the HERVs associated with cancer.

[99] Said step may lead to the selection of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more epitopes among the most shared epitopes in the HERVs identified in one of the steps described previously, such as the HERVs associated with cancer.
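Sharing can be quantified by counting, for each epitope, the number of distinct HERVs (for example cancer-associated HERVs) in which it occurs. The sketch below assumes the epitopes of each HERV are already available as a dictionary; its names and toy identifiers are hypothetical.

```python
from collections import Counter


def most_shared_epitopes(epitopes_by_herv, top_n=10):
    """Rank epitopes by the number of distinct HERVs in which they occur.

    epitopes_by_herv: {herv_id: iterable of epitope sequences}.
    Returns up to top_n (epitope, number_of_hervs) pairs, most shared first.
    """
    counts = Counter()
    for epitopes in epitopes_by_herv.values():
        counts.update(set(epitopes))  # count each epitope once per HERV
    return counts.most_common(top_n)


example = {
    "HERV_1": ["EP_A", "EP_B"],
    "HERV_2": ["EP_A", "EP_C"],
    "HERV_3": ["EP_A", "EP_B", "EP_B"],
}
print(most_shared_epitopes(example, top_n=2))  # [('EP_A', 3), ('EP_B', 2)]
```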

[100] In one embodiment, the method described hereinabove comprises the step of assessing the expression of the epitopes identified by one of the steps described hereinabove at the protein or peptide level in tumor samples. Said step notably makes it possible to confirm that the selected epitopes are translated or expressed in tumor samples.

[101] In one embodiment, said expression of epitopes at the protein or peptide level is evaluated in silico by using one or more proteomic database(s).

[102] The proteomic database may be obtained from tandem mass spectrometry (MS/MS) data. As used herein, tandem mass spectrometry (tandem MS), also named MS/MS, refers to a mass spectrometry technique using two or more mass analyzers. With two analyzers in tandem, the precursor ions are mass-selected by a first mass analyzer and focused into a collision region, where they are fragmented into product ions that are then characterized by a second mass analyzer.

[103] Analysis tools for evaluating the expression of proteins or peptides from databases are well known to the skilled artisan and include, for example, PepQuery (B. Wen et al., PepQuery enables fast, accurate, and convenient proteomic validation of novel genomic alterations. Genome Res. 29, 485-493 (2019)).

[104] In one embodiment, said expression of epitopes at the protein or peptide level is evaluated in vitro by immunopeptidomics analysis. Immunopeptidomics analysis makes it possible to assess the presence of the epitopes associated with HLA molecules from tumors.

[105] In one embodiment, an epitope is selected if said epitope shows evidence of translation in tumor samples, and/or of translation and association with HLA molecules in tumor samples.
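The final matching of candidate epitopes against peptides identified by MS/MS can be sketched as below. It assumes the peptide identifications have already been produced, with false discovery rate control, by a search tool; it is not a substitute for a targeted search such as PepQuery. Because leucine and isoleucine have the same mass, the sketch treats them as equivalent. Names and the toy peptides are illustrative.

```python
def _normalise(peptide):
    # Leucine and isoleucine have identical masses, so MS cannot tell them apart.
    return peptide.upper().replace("I", "L")


def epitopes_with_ms_evidence(epitopes, ms_peptides, exact=True):
    """Keep epitopes supported by peptides identified by MS/MS.

    exact=True  : the epitope must itself be an identified peptide
                  (typical for HLA ligands from immunopeptidomics).
    exact=False : the epitope may be contained in a longer identified peptide
                  (e.g. peptides from whole-proteome data).
    """
    detected = {_normalise(p) for p in ms_peptides}
    kept = []
    for epitope in epitopes:
        e = _normalise(epitope)
        found = e in detected if exact else any(e in p for p in detected)
        if found:
            kept.append(epitope)
    return kept


print(epitopes_with_ms_evidence(["GHIKLMNPQ", "AAAAAAAAA"], ["GHLKLMNPQ"]))  # ['GHIKLMNPQ']
```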

[106] In one embodiment, the method as described hereinabove comprises the following steps:

(a). Identifying HERVs associated with a cancer, preferably associated with at least one cancer, and

(b). Selecting T cell epitopes among the HERVs identified in the previous step, and wherein said method further comprises at least one, preferably two, of the following steps:

(i). Selecting HERVs associated with a cytotoxic T cell response among the cancer-associated HERVs identified in step (a), said step being between step (a) and step (b), and/or

(ii). Assessing the expression at the protein or peptide level of the HERV-derived T cell epitopes identified in step (b) in tumor samples, said step being after step (b).

[107] In one embodiment, steps (a) and (b) and steps (i) and/or (ii) are in silico steps. In one embodiment, step (ii) is an in vitro step.

[108] In one embodiment, step (a) comprises the step of identifying and selecting HERVs associated with a cancer, preferably associated with at least one cancer.

[109] In one embodiment, step (b) comprises the step of selecting shared T cell epitopes.

[110] In one embodiment, the method as described hereinabove comprises the following steps:

(a). Identifying HERVs associated with a cancer, preferably associated with at least one cancer, and

(b). Selecting T cell epitopes among the HERVs identified in the previous step, and wherein said method further comprises at least one, preferably two, of the following steps:

(i). Selecting HERVs associated with a cytotoxic T cell response among the cancer-associated HERVs identified in step (a), said step being between the step (a) and the step (b), and/or

(ii). Assessing the expression at the protein or peptide level of the HERV-derived T cell epitopes identified in step (b) in tumor samples, said step being after the step (b), wherein the steps (a), (b), (i) and (ii) are performed in silico.

[111] Thus, in one embodiment, the in silico method as described hereinabove comprises the following steps:

(a). Identifying HERVs associated with a cancer, preferably associated with at least one cancer, and

(b). Selecting T cell epitopes among the HERVs identified in the previous step, and wherein said method further comprises at least one, preferably two, of the following steps:

(i). Selecting HERVs associated with a cytotoxic T cell response among the cancer-associated HERVs identified in step (a), said step being between the step (a) and the step (b), and/or

(ii). Assessing the expression at the protein or peptide level of the HERV-derived T cell epitopes identified in step (b) in tumor samples, said step being after the step (b).

[112] In one embodiment, said method further comprises, after the step (b) or before the step (ii), a step of selecting epitopes among the most shared epitopes in the HERVs associated with cancer identified in step (a).

[113] In one embodiment, the in silico method as described hereinabove comprises the following steps:

(a) Identifying HERVs associated with a cancer, preferably associated with at least one cancer,

(b) Selecting T cell epitopes among the cancer-associated HERVs identified in step (a), and

(c) Assessing the expression at the protein or peptide level of the HERV-derived T cell epitopes identified in step (b) in tumor samples.

[114] In one embodiment, the in silico method as described hereinabove comprises the following steps:

(a) Identifying HERVs associated with a cancer, preferably associated with at least one cancer,

(b) Selecting HERVs associated with a cytotoxic T cell response among the cancer-associated HERVs identified in step (a),

(c) Selecting T cell epitopes among the HERVs identified in step (b), and

(d) Assessing the expression at the protein or peptide level of the HERV-derived T cell epitopes identified in step (c) in tumor samples.

[115] In one embodiment, the in silico method as described hereinabove comprises the following steps:

(a) Identifying HERVs associated with a cancer, preferably associated with at least one cancer,

(b) Selecting T cell epitopes among the cancer-associated HERVs identified in step (a),

(c) Selecting epitopes among the most shared epitopes in the HERVs associated with cancer identified in step (a), and

(d) Assessing the expression at the protein or peptide level of the HERV-derived T cell epitopes identified in step (c) in tumor samples.

[116] In one embodiment, the in silico method described hereinabove comprises the following steps:

(a) Identifying HERVs associated with a cancer, preferably associated with at least one cancer,

(b) Selecting HERVs associated with a cytotoxic T cell response among the cancer-associated HERVs identified in step (a),

(c) Selecting T cell epitopes among the HERVs identified in step (b),

(d) Selecting epitopes among the most shared epitopes in the cancer-associated HERVs identified in step (a), and

(e) Assessing the expression at the protein or peptide level of the HERV-derived T cell epitopes identified in step (d) in tumor samples.

[117] In one embodiment, the in silico method as described hereinabove comprises the following steps:

(a) Identifying HERVs associated with a cancer, preferably associated with at least one cancer,

(b) Selecting HERVs associated with a cytotoxic T cell response among the cancer-associated HERVs identified in step (a), and

(c) Selecting T cell epitopes among the cancer-associated HERVs identified in step (b).

[118] In one embodiment, the in silico method as described hereinabove comprises the following steps:

(a) Identifying HERVs associated with a cancer, preferably associated with at least one cancer,

(b) Selecting HERVs associated with a cytotoxic T cell response among the cancer-associated HERVs identified in step (a),

(c) Selecting T cell epitopes among the cancer-associated HERVs identified in step (b), and

(d) Selecting epitopes among the most shared epitopes in the cancer-associated HERVs identified in step (a).

[119] In one embodiment, the steps are as described hereinabove.

[120] In one embodiment, the in silico method described hereinabove comprises the following steps:

(a) Identifying HERVs associated with a cancer, preferably associated with at least one cancer,

(b) Selecting HERVs associated with a cytotoxic T cell response among the cancer-associated HERVs identified in step (a), wherein the association of the HERVs with a cytotoxic T cell response is assessed by (i) the association of each HERV with at least one CD4 or CD8 T cell signature, (ii) the association of each HERV with a function signature being either an interferon (IFN)-γ signature or cytolytic activity, and/or (iii) the absence of expression of each HERV in normal purified T or NK cells,

(c) Selecting T cell epitopes among the HERVs identified in step (b), wherein the selection of the T cell epitopes comprises the steps of:

- determining putative epitopes,

- predicting the binding of the putative epitopes with MHC class I molecules,

- optionally, aligning the epitopes with sequences of known HERV proteins,

(d) Optionally, selecting epitopes among the most shared epitopes in the cancer-associated HERVs, and

(e) Optionally, assessing the expression at the protein or peptide level of the HERV-derived T cell epitopes identified in step (c) or (d) in tumor samples.
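The ordering of steps (a) to (e) can be summarised as a small driver function in which each step is passed in as a callable, so that the optional steps (d) and (e) are simply skipped when not supplied. This is a structural sketch only; all names are hypothetical and each callable would wrap the corresponding analysis described above.

```python
def select_epitopes(hervs, is_cancer_associated, is_ctl_associated, epitopes_of,
                    select_most_shared=None, has_protein_evidence=None):
    """Chain steps (a) to (e); each step is supplied as a callable.

    select_most_shared(epitopes, cancer_hervs) and has_protein_evidence(epitope)
    are optional, mirroring optional steps (d) and (e).
    """
    cancer_hervs = [h for h in hervs if is_cancer_associated(h)]   # step (a)
    ctl_hervs = [h for h in cancer_hervs if is_ctl_associated(h)]  # step (b)
    epitopes = {e for h in ctl_hervs for e in epitopes_of(h)}      # step (c)
    if select_most_shared is not None:                             # step (d)
        epitopes = set(select_most_shared(epitopes, cancer_hervs))
    if has_protein_evidence is not None:                           # step (e)
        epitopes = {e for e in epitopes if has_protein_evidence(e)}
    return sorted(epitopes)


cancer, ctl = {"H1", "H2"}, {"H2", "H3"}
toy_epitopes = {"H1": ["EP1"], "H2": ["EP2", "EP3"], "H3": ["EP4"]}
print(select_epitopes(["H1", "H2", "H3"],
                      lambda h: h in cancer,
                      lambda h: h in ctl,
                      lambda h: toy_epitopes[h],
                      has_protein_evidence=lambda e: e == "EP3"))  # ['EP3']
```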

[121] In one embodiment, the method as described hereinabove comprises the following steps:

(a). Identifying HERVs associated with a cancer, preferably associated with at least one cancer, and

(b). Selecting T cell epitopes among the HERVs identified in the previous step, and wherein said method further comprises at least one, preferably two, of the following steps:

(i). Selecting HERVs associated with a cytotoxic T cells response among the cancer-associated HERVs identified in step (a), said step being between the step (a) and the step (b), and/or

(ii). Assessing the expression at the protein or peptide level of the HERV-derived T cell epitopes identified in step (b) in tumor samples, said step being performed after step (b), wherein the steps (a), (b) and (i) are performed in silico, and the step (ii) is performed in vitro.

[122] In one embodiment, step (a) comprises the step of identifying and selecting HERVs associated with a cancer, preferably associated with at least one cancer.

[123] In one embodiment, step (b) comprises the step of selecting shared T cell epitopes.

[124] In one embodiment, said method further comprises, after step (b) or before step (ii), a step of selecting epitopes among the most shared epitopes in the HERVs associated with cancer identified in step (a).

[125] In one embodiment, the method as described hereinabove comprises the following steps:

(a) Identifying HERVs associated with a cancer, preferably associated with at least one cancer,

(b) Selecting T cell epitopes among the cancer-associated HERVs identified in step (a), and

(c) Assessing the expression at the protein or peptide level of the HERV-derived T cell epitopes identified in step (b) in tumor samples, wherein the steps (a), (b) are performed in silico and the step (c) is performed in vitro.

[126] In one embodiment, the method as described hereinabove comprises the following steps:

(a) Identifying HERVs associated with a cancer, preferably associated with at least one cancer,

(b) Selecting HERVs associated with a cytotoxic T cell response among the cancer-associated HERVs identified in step (a),

(c) Selecting T cell epitopes among the HERVs identified in step (b), and

(d) Assessing the expression at the protein or peptide level of the HERV-derived T cell epitopes identified in step (c) in tumor samples, wherein the steps (a), (b), (c) are performed in silico and the step (d) is performed in vitro.

[127] In one embodiment, the method as described hereinabove comprises the following steps:

(a) Identifying HERVs associated with a cancer, preferably associated with at least one cancer,

(b) Selecting T cell epitopes among the cancer-associated HERVs identified in step (a),

(c) Selecting epitopes among the most shared epitopes in the HERVs associated with cancer identified in step (a), and

(d) Assessing the expression at the protein or peptide level of the HERV-derived T cell epitopes identified in step (c) in tumor samples, wherein the steps (a), (b), (c) are performed in silico and the step (d) is performed in vitro.

[128] In one embodiment, the method described hereinabove comprises the following steps:

(a) Identifying HERVs associated with a cancer, preferably associated with at least one cancer,

(b) Selecting HERVs associated with a cytotoxic T cell response among the cancer-associated HERVs identified in step (a),

(c) Selecting T cell epitopes among the HERVs identified in step (b),

(d) Selecting epitopes among the most shared epitopes in the cancer-associated HERVs identified in step (a), and

(e) Assessing the expression at the protein or peptide level of the HERV-derived T cell epitopes identified in step (d) in tumor samples, wherein the steps (a), (b), (c), (d) are performed in silico and the step (e) is performed in vitro.

[129] In one embodiment, the steps are as described hereinabove.

[130] In one embodiment, the method described hereinabove comprises the following steps:

(a) Identifying HERVs associated with a cancer, preferably associated with at least one cancer,

(b) Selecting HERVs associated with a cytotoxic T cell response among the cancer-associated HERVs identified in step (a), wherein the association of the HERVs with a cytotoxic T cell response is assessed by (i) the association of each HERV with at least one CD4 or CD8 T cell signature, (ii) the association of each HERV with a function signature, being either an interferon (IFN)-γ signature or cytolytic activity, and/or (iii) the absence of expression of each HERV in normal purified T or NK cells,

(c) Selecting T cell epitopes among the HERVs identified in step (b), wherein the selection of the T cell epitopes comprises the steps of:

- determining putative epitopes,

- predicting the binding of the putative epitopes with MHC class I molecules,

- optionally, aligning the epitopes with sequences of known HERV proteins,

(d) Optionally, selecting epitopes among the most shared epitopes in the cancer-associated HERVs, and

(e) Optionally, assessing the expression at the protein or peptide level of the HERV-derived T cell epitopes identified in step (c) or (d) in tumor samples, wherein the steps (a), (b), (c), (d) are performed in silico and the step (e) is performed in vitro.
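The combination of criteria in step (b) can be written as a small Boolean rule. The Examples apply it as a phenotype criterion AND a function criterion AND NOT expression in normal T/NK cells. This sketch assumes each association has already been called; how the underlying significance or effect-size thresholds are chosen is not specified here. The field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HervAssociations:
    """Pre-computed association calls for one cancer-associated HERV (hypothetical fields)."""
    cd4_signature: bool  # phenotype criterion (A)
    cd8_signature: bool  # phenotype criterion (A)
    ifng_signature: bool  # function criterion (B)
    cytolytic_activity: bool  # function criterion (B)
    expressed_in_normal_t_or_nk: bool  # exclusion criterion (C)


def is_cyt_herv(a: HervAssociations) -> bool:
    """Return True when (A AND B) AND NOT C holds."""
    phenotype = a.cd4_signature or a.cd8_signature
    function = a.ifng_signature or a.cytolytic_activity
    return phenotype and function and not a.expressed_in_normal_t_or_nk


print(is_cyt_herv(HervAssociations(False, True, False, True, False)))  # True
```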

[131] In one embodiment, the steps (d) and (e) are performed in parallel, when present. In one embodiment, the step of aligning the epitopes with sequences of known HERV proteins and the steps (d) and/or (e) are performed in parallel, when present.

[132] In one embodiment, the methods described hereinabove further comprise the step of aligning the selected epitopes with the human proteome. This step notably makes it possible to confirm that the selected epitopes do not match any self-protein.
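The proteome check can be pictured as an exact-match filter: any candidate 9-mer that occurs verbatim in a self-protein is discarded. This sketch assumes exact identity is the criterion; an alignment tool that tolerates mismatches could be used instead. The proteome entry below is an invented placeholder.

```python
from typing import Dict, Iterable, List, Set


def proteome_kmers(proteome: Dict[str, str], k: int = 9) -> Set[str]:
    """Collect every k-mer present in the reference (self) proteome."""
    return {seq[i:i + k] for seq in proteome.values() for i in range(len(seq) - k + 1)}


def exclude_self(epitopes: Iterable[str], proteome: Dict[str, str], k: int = 9) -> List[str]:
    """Keep only epitopes that do not occur verbatim in any self-protein."""
    self_kmers = proteome_kmers(proteome, k)
    return [ep for ep in epitopes if ep not in self_kmers]


# Toy usage: "SELF_1" is a made-up sequence, not a real human protein.
toy_proteome = {"SELF_1": "MSACDEFGHIKQ"}
print(exclude_self(["ACDEFGHIK", "WWWWWWWWW"], toy_proteome))  # ['WWWWWWWWW']
```

Pre-computing the set of proteome k-mers turns each epitope lookup into a constant-time set membership test instead of a scan over every protein.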

[133] In one embodiment, the step of aligning the selected epitopes with the human proteome is performed in parallel with the step of aligning the epitopes with sequences of known HERV proteins, the step (d) and/or the step (e), when present.

[134] In one embodiment, the methods described hereinabove are combined with an in vitro validation of the selected epitopes. In particular, the methods described hereinabove may be combined with one or several step(s) described hereinbelow.

[135] In one embodiment, said in vitro validation comprises the step of evaluating the induction of T cell responses.

[136] In one embodiment, the induction of T cell responses is assessed by measuring the induction of CD8+ T cells specific for the selected epitopes. Examples of methods to assess the induction of CD8+ T cells include in vitro priming assays with the selected epitopes (see, for example, the Examples).

[137] In one embodiment, the induction of T cell responses is assessed by measuring IFN-γ or granzyme B production in the presence of epitope-stimulated cells (see, for example, the Examples). Examples of methods for measuring IFN-γ or granzyme B production include flow cytometry and fluorospot assays.

[138] In one embodiment, the induction of T cell responses is assessed by measuring degranulation markers, such as CD107a (see, for example, the Examples).

[139] In one embodiment, said in vitro validation comprises a step of evaluating the affinity of the CD8+ T cells specific for the selected epitopes for MHC molecules. Examples of methods for evaluating such affinity include, without limitation, 3D modeling (see, for example, the Examples).

[140] In one embodiment, said in vitro validation comprises a step of evaluating the functionality of the CD8+ T cells specific for the selected epitopes. Examples of methods to measure the functionality of CD8+ T cells include, without limitation, immune-cell killing assays and microscopy analyses of morphological signs of T cell activation (see, for example, the Examples).

[141] In one embodiment, said in vitro validation comprises a step of assessing the presence of T cells specific for the selected epitopes in tumor tissues from a subject affected with cancer (see, for example, the Examples). Said tumor tissues may be obtained by biopsy.

[142] The present invention also relates to a peptide comprising or consisting of an epitope identified by the method described hereinabove.

[143] The present invention relates to a peptide comprising or consisting of an epitope having a sequence selected in the group comprising or consisting of KLLGDINWI (SEQ ID NO: 1), FIFTLIAVI (SEQ ID NO: 2), RMLTDLRAV (SEQ ID NO: 3), FLSLYFVSV (SEQ ID NO: 4), TLIAVIMGL (SEQ ID NO: 5), YLDQIATLI (SEQ ID NO: 6), FLQAEVANA (SEQ ID NO: 7), WMGDRLMSL (SEQ ID NO: 8), ALHSSVQSV (SEQ ID NO: 9), ILTEVLKGV (SEQ ID NO: 10), LMAQAITGV (SEQ ID NO: 11), RLSYVHVTV (SEQ ID NO: 12), VLQDFDQPI (SEQ ID NO: 13), ALMIVSMVV (SEQ ID NO: 14), MLLAALMIV (SEQ ID NO: 15), YIDDILCAA (SEQ ID NO: 16), YIIHYIDDI (SEQ ID NO: 17), YIWCPTWRL (SEQ ID NO: 18), and YIWCPTWSL (SEQ ID NO: 19). The present invention also relates to a peptide comprising or consisting of a sequence selected in the group comprising or consisting of KLLGDINWI (SEQ ID NO: 1), FIFTLIAVI (SEQ ID NO: 2), RMLTDLRAV (SEQ ID NO: 3), FLSLYFVSV (SEQ ID NO: 4), TLIAVIMGL (SEQ ID NO: 5), YLDQIATLI (SEQ ID NO: 6), FLQAEVANA (SEQ ID NO: 7), WMGDRLMSL (SEQ ID NO: 8), ALHSSVQSV (SEQ ID NO: 9), ILTEVLKGV (SEQ ID NO: 10), LMAQAITGV (SEQ ID NO: 11), RLSYVHVTV (SEQ ID NO: 12), VLQDFDQPI (SEQ ID NO: 13), ALMIVSMVV (SEQ ID NO: 14), MLLAALMIV (SEQ ID NO: 15), YIDDILCAA (SEQ ID NO: 16), YIIHYIDDI (SEQ ID NO: 17), YIWCPTWRL (SEQ ID NO: 18), and YIWCPTWSL (SEQ ID NO: 19).

[144] In one embodiment, the peptide comprises or consists of an epitope having a sequence selected in the group comprising or consisting of RMLTDLRAV (SEQ ID NO: 3), ALHSSVQSV (SEQ ID NO: 9), LMAQAITGV (SEQ ID NO: 11), VLQDFDQPI (SEQ ID NO: 13), ALMIVSMVV (SEQ ID NO: 14), MLLAALMIV (SEQ ID NO: 15), YIDDILCAA (SEQ ID NO: 16). In one embodiment, the peptide comprises or consists of a sequence selected in the group comprising or consisting of RMLTDLRAV (SEQ ID NO: 3), ALHSSVQSV (SEQ ID NO: 9), LMAQAITGV (SEQ ID NO: 11), VLQDFDQPI (SEQ ID NO: 13), ALMIVSMVV (SEQ ID NO: 14), MLLAALMIV (SEQ ID NO: 15), YIDDILCAA (SEQ ID NO: 16).

[145] In one embodiment, the peptide comprises or consists of an epitope having a sequence selected in the group comprising or consisting of RMLTDLRAV (SEQ ID NO: 3), LMAQAITGV (SEQ ID NO: 11), VLQDFDQPI (SEQ ID NO: 13), ALMIVSMVV (SEQ ID NO: 14), MLLAALMIV (SEQ ID NO: 15), YIDDILCAA (SEQ ID NO: 16). In one embodiment, the peptide comprises or consists of a sequence selected in the group comprising or consisting of RMLTDLRAV (SEQ ID NO: 3), LMAQAITGV (SEQ ID NO: 11), VLQDFDQPI (SEQ ID NO: 13), ALMIVSMVV (SEQ ID NO: 14), MLLAALMIV (SEQ ID NO: 15), YIDDILCAA (SEQ ID NO: 16).

[146] In one embodiment, the peptide comprises or consists of an epitope having a sequence selected in the group comprising or consisting of RMLTDLRAV (SEQ ID NO: 3), LMAQAITGV (SEQ ID NO: 11), VLQDFDQPI (SEQ ID NO: 13), ALMIVSMVV (SEQ ID NO: 14), YIDDILCAA (SEQ ID NO: 16). In one embodiment, the peptide comprises or consists of a sequence selected in the group comprising or consisting of RMLTDLRAV (SEQ ID NO: 3), LMAQAITGV (SEQ ID NO: 11), VLQDFDQPI (SEQ ID NO: 13), ALMIVSMVV (SEQ ID NO: 14), YIDDILCAA (SEQ ID NO: 16).

[147] In one embodiment, the peptide comprises or consists of an epitope of sequence KLLGDINWI (SEQ ID NO: 1). In one embodiment, the peptide comprises or consists of an epitope of sequence FIFTLIAVI (SEQ ID NO: 2). In one embodiment, the peptide comprises or consists of an epitope of sequence RMLTDLRAV (SEQ ID NO: 3). In one embodiment, the peptide comprises or consists of an epitope of sequence FLSLYFVSV (SEQ ID NO: 4). In one embodiment, the peptide comprises or consists of an epitope of sequence TLIAVIMGL (SEQ ID NO: 5). In one embodiment, the peptide comprises or consists of an epitope of sequence YLDQIATLI (SEQ ID NO: 6). In one embodiment, the peptide comprises or consists of an epitope of sequence FLQAEVANA (SEQ ID NO: 7). In one embodiment, the peptide comprises or consists of an epitope of sequence WMGDRLMSL (SEQ ID NO: 8). In one embodiment, the peptide comprises or consists of an epitope of sequence ALHSSVQSV (SEQ ID NO: 9). In one embodiment, the peptide comprises or consists of an epitope of sequence ILTEVLKGV (SEQ ID NO: 10). In one embodiment, the peptide comprises or consists of an epitope of sequence LMAQAITGV (SEQ ID NO: 11). In one embodiment, the peptide comprises or consists of an epitope of sequence RLSYVHVTV (SEQ ID NO: 12). In one embodiment, the peptide comprises or consists of an epitope of sequence VLQDFDQPI (SEQ ID NO: 13). In one embodiment, the peptide comprises or consists of an epitope of sequence ALMIVSMVV (SEQ ID NO: 14). In one embodiment, the peptide comprises or consists of an epitope of sequence MLLAALMIV (SEQ ID NO: 15). In one embodiment, the peptide comprises or consists of an epitope of sequence YIDDILCAA (SEQ ID NO: 16). In one embodiment, the peptide comprises or consists of an epitope of sequence YIIHYIDDI (SEQ ID NO: 17). In one embodiment, the peptide comprises or consists of an epitope of sequence YIWCPTWRL (SEQ ID NO: 18).
In one embodiment, the peptide comprises or consists of an epitope of sequence YIWCPTWSL (SEQ ID NO: 19).

[148] In one embodiment, the peptide comprises or consists of KLLGDINWI (SEQ ID NO: 1). In one embodiment, the peptide comprises or consists of FIFTLIAVI (SEQ ID NO: 2). In one embodiment, the peptide comprises or consists of RMLTDLRAV (SEQ ID NO: 3). In one embodiment, the peptide comprises or consists of FLSLYFVSV (SEQ ID NO: 4). In one embodiment, the peptide comprises or consists of TLIAVIMGL (SEQ ID NO: 5). In one embodiment, the peptide comprises or consists of YLDQIATLI (SEQ ID NO: 6). In one embodiment, the peptide comprises or consists of FLQAEVANA (SEQ ID NO: 7). In one embodiment, the peptide comprises or consists of WMGDRLMSL (SEQ ID NO: 8). In one embodiment, the peptide comprises or consists of ALHSSVQSV (SEQ ID NO: 9). In one embodiment, the peptide comprises or consists of ILTEVLKGV (SEQ ID NO: 10). In one embodiment, the peptide comprises or consists of LMAQAITGV (SEQ ID NO: 11). In one embodiment, the peptide comprises or consists of RLSYVHVTV (SEQ ID NO: 12). In one embodiment, the peptide comprises or consists of VLQDFDQPI (SEQ ID NO: 13). In one embodiment, the peptide comprises or consists of ALMIVSMVV (SEQ ID NO: 14). In one embodiment, the peptide comprises or consists of MLLAALMIV (SEQ ID NO: 15). In one embodiment, the peptide comprises or consists of YIDDILCAA (SEQ ID NO: 16). In one embodiment, the peptide comprises or consists of YIIHYIDDI (SEQ ID NO: 17). In one embodiment, the peptide comprises or consists of YIWCPTWRL (SEQ ID NO: 18). In one embodiment, the peptide comprises or consists of YIWCPTWSL (SEQ ID NO: 19).

[149] Said peptides may be made by any technique known to those of skill in the art, including the expression of proteins, polypeptides or peptides through standard molecular biology techniques, the isolation of proteins or peptides from natural sources, or the chemical synthesis of proteins or peptides. Synthetic peptides will generally be up to about 35 residues long, which is the approximate upper length limit of automated peptide synthesis machines, such as those available from Applied Biosystems (Foster City, Calif.). Longer peptides may also be prepared, e.g., by recombinant means.

[150] The present invention also relates to an expression vector inducing expression of one or more peptide(s) as described hereinabove. The present invention also relates to an expression vector comprising a nucleic acid sequence encoding one or more peptide(s) as described hereinabove.

[151] Said vector may notably be an RNA vector, a DNA vector or plasmid, a viral vector or a bacterial vector. Depending on the nature of the vector, and as is well known to the skilled person, an expression cassette may or may not be integrated into the host cell genome. The expression vector or the expression cassette may further comprise elements necessary for the in vivo expression of the nucleic acid (polynucleotide) in a subject. For example, these may consist of an initiation codon (ATG), a stop codon and a promoter, as well as a polyadenylation sequence for certain vectors such as plasmids and viral vectors other than poxviruses. The ATG may be placed at the 5' end of the reading frame and the stop codon at the 3' end. As is well known, other elements making it possible to control expression may be present, such as enhancer sequences, stabilizing sequences and signal sequences permitting secretion of the peptide.

[152] Regarding RNA vectors, said vectors may use, for example, non-replicating mRNA or virally derived, self-amplifying RNA. Conventional mRNA-based vectors may encode the peptide of interest and may contain 5' and 3' untranslated regions (UTRs). Self-amplifying RNAs may encode not only the peptide of interest but also the viral replication machinery that enables intracellular RNA amplification and abundant protein expression.

[153] Examples of viral vectors include, without limitation, lentiviruses and retroviruses.

[154] In one embodiment, the method to obtain the peptides as described hereinabove comprises: introducing in vitro or ex vivo a vector as described hereinabove into a competent host cell; culturing in vitro or ex vivo host cells transformed with the expression vector as described hereinabove, under conditions suitable for expression of the peptides; optionally, selecting the cells which express and/or secrete said peptides; and recovering the expressed peptides.

[155] The present invention also relates to a cytotoxic T lymphocyte (CTL) of a subject treated with one or more peptide(s) as described hereinabove.

[156] The present invention also relates to a cytotoxic T lymphocyte (CTL) of a subject treated with one or more expression vector(s) as described hereinabove.

[157] The present invention also relates to a T-cell receptor (TCR) recognizing a peptide as described hereinabove.

[158] The present invention also relates to an engineered T cell expressing a TCR recognizing a peptide as described hereinabove.

[159] The process of preparing these T cells is known to the skilled person. It may be as follows: (i) TCR α and β chains are isolated from T cells recognizing peptides as described hereinabove and inserted into a vector; (ii) T cells isolated from the peripheral blood of a patient or a donor are modified with such a vector to encode the desired TCRαβ sequences; (iii) these modified T cells are then expanded in vitro to obtain sufficient numbers for treatment and administered to the patient. Of note, TCR sequences can be modified to optimize TCR affinity.

[160] The present invention also relates to one or more peptide(s), one or more expression vector(s), one or more CTL(s), or one or more engineered T cell(s) as described hereinabove for use as a vaccine.

[161] The present invention also relates to one or more peptide(s), one or more expression vector(s), one or more CTL(s), or one or more engineered T cell(s) as described hereinabove for use as a medicament.

[162] The present invention also relates to one or more peptide(s), one or more expression vector(s), one or more CTL(s), or one or more engineered T cell(s) as described hereinabove for use in treating or preventing a cancer in a subject in need thereof.

[163] The present invention also relates to the use of one or more peptide(s), one or more expression vector(s), one or more CTL(s), or one or more engineered T cell(s) as described hereinabove in the manufacture of a medicament for treating or preventing a cancer in a subject in need thereof.

[164] The present invention also relates to a method for treating or preventing a cancer in a subject in need thereof, wherein said method comprises the administration of one or more peptide(s), one or more expression vector(s), one or more CTL(s), or one or more engineered T cell(s) as described hereinabove in said subject.

[165] In one embodiment, said cancer is selected from the group comprising or consisting of breast cancer, including triple negative breast cancer (TNBC), ovarian cancer, melanoma, sarcoma, teratocarcinoma, bladder cancer, lung cancer, including non-small cell lung carcinoma and small cell lung carcinoma, head and neck cancer, colorectal cancer, glioblastoma, leukemias, lymphomas and other solid tumors and hematological malignancies.

[166] In one embodiment, said cancer is selected from the group comprising or consisting of breast cancer, including triple negative breast cancer (TNBC), ovarian cancer, melanoma, sarcoma, lung cancer, including non-small cell lung carcinoma and small cell lung carcinoma, head and neck cancer, glioblastoma, leukemias, lymphomas and other solid tumors and hematological malignancies.

[167] In one embodiment, said cancer is a breast cancer, preferably TNBC.

[168] In one embodiment, the one or more peptide(s), one or more expression vector(s), one or more CTL(s) or one or more engineered T cell(s) as described hereinabove induce an immune response, such as a T cell response, preferably an immune response against the tumor-associated epitope.

[169] One of ordinary skill would know various assays to determine whether an immune response against a tumor-associated epitope has been generated. Various B lymphocyte and T lymphocyte assays are well known, such as ELISAs, cytotoxic T lymphocyte (CTL) assays such as chromium release assays, proliferation assays using peripheral blood lymphocytes (PBL), tetramer assays, and cytokine production assays.

[170] Thus, the present invention also relates to a method for inducing an immune response in a subject in need thereof, wherein said method comprises the administration of one or more peptide(s), one or more expression vector(s), one or more CTL(s), or one or more engineered T cell(s) as described hereinabove in said subject.

[171] In one embodiment, the one or more peptide(s), one or more expression vector(s), one or more CTL(s), or one or more engineered T cell(s) as described hereinabove is/are administered at a therapeutically effective amount.

[172] It will be understood, however, that the total daily usage of one or more peptide(s), one or more expression vector(s), one or more CTL(s), or one or more engineered T cell(s) as described hereinabove will be decided by the attending physician within the scope of sound medical judgment.

[173] The specific dose for any particular subject will depend upon a variety of factors including the symptom being treated and the severity of the symptom; activity of the specific compound employed; the specific composition employed, the age, body weight, general health, sex and diet of the patient; the time of administration, route of administration, and rate of excretion of the specific compound employed; the duration of the treatment; drugs used in combination or coincidental with the specific compounds employed; and like factors well known in the medical arts.

[174] For use in administration to a subject, the one or more peptide(s), one or more expression vector(s), one or more CTL(s) or one or more engineered T cell(s) is/are to be formulated for administration to the subject. The one or more peptide(s), one or more expression vector(s), one or more CTL(s) or one or more engineered T cell(s) as described hereinabove may be administered parenterally, by inhalation spray, rectally, nasally, or via an implanted reservoir. The term administration used herein includes subcutaneous, intravenous, intramuscular, intra-articular, intra-synovial, intrasternal, intrathecal, intrahepatic, intralesional and intracranial injection or infusion techniques.

BRIEF DESCRIPTION OF THE DRAWINGS

[175] Figure 1 is a combination of diagrams and one histogram showing the pancancer identification of HERVs associated with CTL responses. Figure 1A: Venn diagram representing the total number of HERVs overexpressed in a tumor versus its normal counterpart (peritumoral tissue), and the total number of HERVs overexpressed in a normal peritumoral tissue versus its tumoral counterpart. HERVs overexpressed in at least 1 tumor and never overexpressed in any peritumoral tissue are considered cancer-associated. Figure 1B: Venn diagram of the selection criteria for an HERV to be annotated as associated with CTL response (cyt-HERV). Each HERV had to be associated with both a phenotype criterion (CD8 or CD4 T cell signatures) and a function criterion (cytolytic activity (Granzyme B & Perforin 1) or IFN-γ signature) (A and B) and not be overexpressed in normal purified T/NK cells (A AND B NOT C). Figure 1C: Venn diagram of the association of cancer-associated HERVs with the CTL response criteria defined in A. A total of 192 HERVs are annotated as cyt-HERVs. Figure 1D: Proportion of cancer-associated HERVs annotated as cyt-HERVs per cancer subtype. Cyt-HERVs are represented in light grey. CTL: cytotoxic T cell response; cyt-HERVs: HERVs associated with CTL response in cancer; TNBC: triple-negative breast cancer.

[176] Figure 2 is a combination of one histogram, one schema and one table showing the selection of shared HLA-A2 epitopes derived from Gag and Pol HERV-K/HML-2. Figure 2A: Flow chart of peptide selection from cyt-HERV sequences. Figure 2B: Bar chart of the top 25 most shared peptides predicted as strong HLA-A*02 binders among the 192 cyt-HERVs. Selected peptides (P1 to P6) are marked with a star. Figure 2C: Characteristics of the 6 selected HLA-A2 epitopes. 9-mer peptides were selected according to their predicted HLA-A*02 affinity (strong binders being those with percentile ranks < 0.5) and the number of HERVs containing their sequences.

[177] Figure 3 is a combination of a diagram and two histograms showing that the shared CD8+ T cell epitopes derived from conserved Gag and Pol HERV-K/HML-2 motifs are expressed in TNBC. Figure 3A: Venn diagram of the total number of cyt-HERVs overexpressed in each subtype of breast cancer in the TCGA database. Figure 3B: Mean expression of the 54 cyt-HERVs overexpressed in the TCGA basal subtype, in the independent database of Varley et al. and in medullary thymic epithelial cells (mTECs). Figure 3C: Expression of the 18 peptide-containing CAHs in the basal (Hs578t and MDA-MB-231) and luminal A (MCF7 and T47D) breast cancer cell lines analyzed by Riboseq.

[178] Figure 4 is a combination of schemas and plots showing that HERV-derived epitopes induce polyfunctional CD8+ T cell responses. Figure 4A: Schematic representation of the in vitro priming protocol. Figure 4B: Summary of the results obtained with PBMCs from 11 HLA-A2-positive healthy donors (HD1 to HD11, one donor per line). Figure 4C: Plots of IFN-γ (left panels), IFN-γ and TNF-α (center panels) or IFN-γ and CD107a (right panels) staining gated on CD8+ T cells. PBMCs were stimulated with peptide (here P6, upper line), no peptide (central line) or CMV pp65 peptide (bottom line).

[179] Figure 5 is a combination of tables and a diagram showing the visualization of CDR loops and CDR loop interactions with peptides. Figure 5A: Productive frequency of the TCRα and TCRβ CDR3 sequences for the top clones specific to each peptide (P1, P2, P4, P6) and the corresponding resolved V, D and J alleles. Figure 5B: Predicted binding affinities (Predict. ΔG) are expressed in kcal/mol. Average values are reported. Stars indicate a significant statistical test (Welch two-sample t-test) at the 5% level. Figure 5C: Diagram ranking modeled HERV-specific TCR-pMHC complexes and reference TCR-pMHC complexes available in the Protein Data Bank and obtained from crystallography data, according to their predicted binding affinity. CDR: complementarity-determining region; TCR: T cell receptor; MHC: major histocompatibility complex; pMHC: peptide-MHC; TRA: alpha chain of TCR; TRB: beta chain of TCR.
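Figure 5B relies on Welch's two-sample t-test, which does not assume equal variances in the two groups. The statistic and the Welch-Satterthwaite degrees of freedom can be computed with the Python standard library alone, as sketched below. A p-value additionally requires the Student t distribution, for example via `scipy.stats.ttest_ind(x, y, equal_var=False)`. The input values are placeholders, not data from the figure.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    vx = variance(x) / len(x)  # squared standard error of mean(x), using the sample variance
    vy = variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


t, df = welch_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(round(t, 3), round(df, 3))  # -3.674 4.0
```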

[180] Figure 6 is a combination of graphs showing that HERV-specific T cell clones are functional and recognize and kill tumor cells. Figure 6A: Functional avidity of CD8+ T cell clones calculated as a nonlinear fit of normalized IFN-γ production. N9-V1 and -2: CMV-specific T cell clones (see Methods). EC50 values are represented for each clone by the intersection of the dashed lines with the X-axis. Figure 6B: Cell death quantification represented as the increase in fluorescence intensity from baseline (Y-axis) as a function of time (hours, X-axis). Figure 6C: Specific tumor cell lysis at 48 h. The mean percentage of technical triplicates is plotted for each condition (data representative of at least 2 independent experiments).
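The EC50 readout of Figure 6A comes from a nonlinear fit. A simpler approximation, shown here only for illustration, normalizes responses to their maximum and interpolates linearly on a log10 concentration scale between the two points that bracket 50%. This is a swapped-in technique, not the fitting procedure used for the figure; a sigmoidal fit could instead be done with, for example, `scipy.optimize.curve_fit`. The concentrations and responses below are invented.

```python
import math
from typing import Optional, Sequence


def ec50_interpolated(concentrations: Sequence[float], responses: Sequence[float]) -> Optional[float]:
    """Estimate EC50 by log-linear interpolation of max-normalized responses.

    Concentrations must be positive and in ascending order. Returns None when
    the normalized curve never crosses 50% from below.
    """
    top = max(responses)
    norm = [r / top for r in responses]  # fraction of the maximal response
    for i in range(len(norm) - 1):
        r0, r1 = norm[i], norm[i + 1]
        if r0 < 0.5 <= r1:
            lc0 = math.log10(concentrations[i])
            lc1 = math.log10(concentrations[i + 1])
            frac = (0.5 - r0) / (r1 - r0)  # position of 50% between the two points
            return 10 ** (lc0 + frac * (lc1 - lc0))
    return None


# Invented dose-response values (peptide concentration, IFN-gamma readout).
print(round(ec50_interpolated([1.0, 10.0, 100.0, 1000.0], [10.0, 30.0, 70.0, 100.0]), 2))  # 31.62
```

Interpolating on the log scale matches how dose-response curves are usually plotted. Unlike a fitted curve, however, this estimate is sensitive to noise in the two bracketing points.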

[181] Figure 7 is a combination of a graph and pictures showing that HERV-specific T cells are present among tumor-infiltrating T cells. Figure 7A: Overall survival according to the 18-HERV score in TCGA HLA-A2 TNBC patients (n=65). Patients were divided into three groups according to the score terciles: high expression (n=22); intermediate expression (n=21); low expression (n=22). Figure 7B: 60X pictures of TNBC organoids co-cultured with CMV-, P1- or P6-specific CD8+ T cell clones (top to bottom) acquired at different time points using Nanolive technology. T cells are shown by white arrows.

EXAMPLES

[182] The present invention is further illustrated by the following examples.

Materials and Methods

Datasets

[183] For RNA-seq data, raw fastq files were accessed from the NCBI Gene Expression Omnibus (GEO) portal, under accession number GSE58135 for the independent breast cancer dataset of Varley et al. (K. E. Varley et al., Recurrent read-through fusion transcripts in breast cancer. Breast Cancer Res. Treat. 146, 287-297 (2014)), GSE74246 for the sorted PBMC dataset (M. R. Corces et al., Lineage-specific and single-cell chromatin accessibility charts human hematopoiesis and leukemia evolution. Nature Genetics. 48, 1193-1203 (2016)), and GSE127825 and GSE127826 for the six mTEC samples (J.-D. Larouche et al., Widespread and tissue-specific expression of endogenous retroelements in human somatic tissues. Genome Med. 12, 1-16 (2020)). TCGA pancancer raw fastq files were accessed from the Genomic Data Commons (GDC) portal (https://portal.gdc.cancer.gov/). Cell line data were accessed from the Broad Institute Cancer Cell Line Encyclopedia (CCLE) portal (https://portals.broadinstitute.org/ccle).

HERV expression quantification

[184] HERV expression was assessed using the HervQuant pipeline (C. C. Smith et al., Endogenous retroviral signatures predict immunotherapy response in clear cell renal cell carcinoma. Journal of Clinical Investigation. 128, 4804-4820 (2018)). Briefly, RNA-seq reads were mapped with STAR v2.7.3a (A. Dobin et al., STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 29, 15-21 (2013)) to the hg19 reference transcriptome compiled with the annotation of 3,173 HERV sequences (L. Vargiu et al., Classification and characterization of human endogenous retroviruses; mosaic forms are common. Retrovirology. 13, 7 (2016)). Multimaps < 10 and mismatches < 7 were allowed, as in the original publication. BAM outputs were filtered for reads mapping to HERV sequences using SAMtools v1.4 (H. Li et al., 1000 Genome Project Data Processing Subgroup, The Sequence Alignment/Map format and SAMtools. Bioinformatics. 25, 2078-2079 (2009)) and then quantified using Salmon v0.7.2 (R. Patro et al., Salmon: fast and bias-aware quantification of transcript expression using dual-phase inference. Nat Methods. 14, 417-419 (2017)). Raw counts were normalized to counts per million total reads and then log2(x + 1) transformed.
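The final normalization amounts to log2(count / total reads × 10^6 + 1). Below is a minimal sketch of that formula. It assumes that raw counts per HERV and the sample's total read count are already available. The HERV identifiers are placeholders.

```python
import math
from typing import Dict


def log2_cpm(counts: Dict[str, float], total_reads: float) -> Dict[str, float]:
    """Normalize raw counts to counts per million total reads, then apply log2(x + 1)."""
    return {herv: math.log2(c / total_reads * 1e6 + 1) for herv, c in counts.items()}


# Toy sample: 30 reads out of 10 million total gives 3 CPM, i.e. log2(4).
print(log2_cpm({"HERV_X": 30.0, "HERV_Y": 0.0}, total_reads=10_000_000))
```

The +1 pseudocount keeps unexpressed HERVs at 0 instead of an undefined log value.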

Quality check / sample filtering

Only primary solid tumor samples (TCGA sample type code 01) were included, comprising 9,718 samples from 32 different cancer types, of which 9,492 were analyzable for HERV expression. Quality control resulted in the complete removal of ESCA and STAD samples due to a markedly skewed HERV distribution, leading to the final analysis of 8,893 samples from 29 different cancer types.
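In TCGA aliquot barcodes, the sample type is encoded by the first two digits of the fourth dash-separated field. For example, 01 denotes a primary solid tumor and 11 a solid tissue normal sample. The inclusion criterion above can therefore be sketched as a barcode filter. The barcodes below are illustrative, not real samples.

```python
from typing import Iterable, List


def is_primary_solid_tumor(barcode: str) -> bool:
    """True if the TCGA barcode's sample-type code (field 4, first two digits) is 01."""
    fields = barcode.split("-")
    return len(fields) >= 4 and fields[3][:2] == "01"


def keep_primary_tumors(barcodes: Iterable[str]) -> List[str]:
    """Filter a list of barcodes down to primary solid tumor samples."""
    return [b for b in barcodes if is_primary_solid_tumor(b)]


print(keep_primary_tumors(["TCGA-AB-1234-01A-11R-A123-07", "TCGA-AB-1234-11A-01R-A123-07"]))
# ['TCGA-AB-1234-01A-11R-A123-07']
```

The same information is also exposed as sample-type metadata on the GDC portal, so a metadata-based filter would be an equally valid alternative.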

Immune signatures and genetic alterations

[185] Phenotypic immune signatures were calculated with the xCell method (D. Aran et al., xCell: digitally portraying the tissue cellular heterogeneity landscape. Genome Biol. 18, 220 (2017)). For the TCGA Pan-Cancer samples, xCell signatures were downloaded directly from the xCell website (https://xcell.ucsf.edu/xCell_TCGA_RSEM.txt). For the GSM 1401648 dataset, signatures were calculated for the whole dataset and immune signatures were selected afterwards. The interferon-gamma (IFN-γ) signature was calculated by single-sample gene set variation analysis (GSVA) (S. Hänzelmann et al., GSVA: gene set variation analysis for microarray and RNA-Seq data. BMC Bioinformatics. 14, 7 (2013)) based on the HALLMARK_INTERFERON_GAMMA_RESPONSE signature from the Molecular Signatures Database (http://software.broadinstitute.org/gsea/msigdb/index.jsp). Enrichment scores were calculated for each sample per cancer type. The cytolytic activity (CYT score) was calculated as the geometric mean of granzyme A (GZMA) and perforin (PRF1) expression, as previously described (M. S. Rooney et al., Molecular and genetic properties of tumors associated with local immune cytolytic activity. Cell. 160, 48-61 (2015)). TCGA Pan-Cancer genetic alterations were retrieved from Thorsson et al. (V. Thorsson et al., The Immune Landscape of Cancer. Immunity. 48, 812-830.e14 (2018)).
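The CYT score reduces to a two-value geometric mean per sample. This sketch assumes both genes are on the same non-negative expression scale (the source does not state which scale); the optional `offset` pseudocount is a hypothetical parameter, left at 0 by default.

```python
import math


def cyt_score(gzma, prf1, offset=0.0):
    """Cytolytic activity: geometric mean of GZMA and PRF1 expression for one sample.

    offset: optional pseudocount added to both values (hypothetical; 0 by default).
    """
    a, b = gzma + offset, prf1 + offset
    if a < 0 or b < 0:
        raise ValueError("expression values must be non-negative")
    return math.sqrt(a * b)


print(cyt_score(4.0, 9.0))
```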

Cancer-associated and cyt-HERV annotation

[186] To define cancer specificity, differential HERV expression analysis was performed between tumor samples and their matched normal peritumoral tissues. Only TCGA studies with at least 10 peritumoral samples were included (n = 14 cancer types), and the analysis was performed independently for each TCGA cancer type. After filtering out any HERV expressed more than 2-fold higher in any normal tissue than in its matched tumor, the remaining HERVs overexpressed more than 2-fold in at least one cancer compared with its normal counterpart (peritumoral tissue) were considered cancer-associated.
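The two 2-fold rules can be sketched as a filter over per-cancer log2 fold changes (tumor versus matched peritumoral tissue), where a 2-fold change corresponds to 1.0 on the log2 scale. The input layout is an assumption, and no significance threshold is applied because none is stated in this paragraph.

```python
import math


def cancer_associated_hervs(log2fc):
    """Apply the exclusion and inclusion 2-fold rules described above.

    log2fc: dict cancer_type -> dict herv_id -> log2 fold change (tumor vs
            matched peritumoral tissue), one entry per analyzed cancer type.
    Returns the set of HERVs kept as cancer-associated (CAHs).
    """
    threshold = math.log2(2.0)  # 2-fold == 1.0 on the log2 scale
    all_hervs = {h for per_cancer in log2fc.values() for h in per_cancer}
    cahs = set()
    for herv in all_hervs:
        changes = [pc[herv] for pc in log2fc.values() if herv in pc]
        # Exclusion: more than 2-fold higher in a normal tissue than in its tumor.
        if any(lfc < -threshold for lfc in changes):
            continue
        # Inclusion: more than 2-fold higher in at least one cancer type.
        if any(lfc > threshold for lfc in changes):
            cahs.add(herv)
    return cahs
```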

[187] To be annotated as potentially immunogenic, each cancer-associated HERV had to be associated with at least one phenotype criterion (A) and one functionality criterion (B), and must not be overexpressed by T/NK cells (C). Phenotype criteria included association with either CD4+ or CD8+ T cell signatures as defined by the xCell method. Functionality criteria included association with either the IFN-γ signature or the cytolytic activity, defined by the geometric mean of granzyme A (GZMA) and perforin (PRF1) expression. Normal PBMC expression was assessed in an independent dataset of sorted PBMCs from healthy donors (M. R. Corces et al., Lineage-specific and single-cell chromatin accessibility charts human hematopoiesis and leukemia evolution. Nature Genetics. 48, 1193-1203 (2016)). HERV expression was compared independently in T cells and in NK cells with the rest of the PBMCs.

L1-penalized regression (LASSO)

[188] Associations were calculated by lasso regression using the glmnet and c060 packages (J. H. Friedman et al., Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software. 33, 1-22 (2010); M. Sill et al., c060: Extended Inference with Lasso and Elastic-Net Regularized Cox and Generalized Linear Models. Journal of Statistical Software. 62, 1-22 (2014)). A Gaussian distribution was assumed for the CYT score and the IFN-γ signature, and a Poisson distribution for the xCell signatures. HERV expression was analyzed as log2(CPM+1), requiring no further standardization. For each cancer type, a model was built using the optimal parameters found by 10-fold cross-validation. Each HERV with a positive coefficient in the final model (based on the lambda value minimizing the cross-validated mean-squared error) was considered to be associated with the variable.
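glmnet is an R package; the sketch below is instead a plain-Python coordinate-descent lasso for the Gaussian family only (as used for the CYT and IFN-γ scores), with a fixed, hypothetical `lam` in place of 10-fold cross-validation and without the Poisson family used for the xCell signatures. It then applies the positive-coefficient rule and the (A and B) and not C annotation of [187]; all function names are illustrative.

```python
def soft_threshold(z, gamma):
    """Lasso proximal step: shrink z toward zero by gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_gaussian(X, y, lam, max_iter=1000, tol=1e-10):
    """Coordinate descent for (1/2n) * ||y - b0 - X b||^2 + lam * ||b||_1.

    X: one row per sample, one column per HERV (log2(CPM+1), not standardized).
    y: one signature score per sample (e.g. CYT score or IFN-gamma GSVA score).
    The intercept b0 is left unpenalized by centering X and y.
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residuals while all coefficients are 0
    col_sq = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(max_iter):
        max_step = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant HERV carries no information; beta_j stays 0
            # correlation of HERV j with the partial residual that excludes HERV j
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            step = new - beta[j]
            if step != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * step
                beta[j] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return beta


def associated(beta):
    """Indices of HERVs with a positive coefficient, i.e. associated with the score."""
    return {j for j, b in enumerate(beta) if b > 0}


def is_cyt_herv(herv, phenotype_sets, function_sets, t_nk_overexpressed):
    """(A and B) and not C, as in [187]."""
    a = any(herv in s for s in phenotype_sets)  # CD4+ or CD8+ T cell signatures
    b = any(herv in s for s in function_sets)   # IFN-gamma signature or CYT score
    c = herv in t_nk_overexpressed              # overexpressed by T or NK cells
    return a and b and not c
```

Because the L1 penalty sets weak or redundant coefficients exactly to zero, correlated HERVs compete for the same signal, which is the collinearity control the text relies on.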

Epitope screening

[189] Open reading frame (ORF) detection was performed using sixpack from EMBOSS v6.6.0.0 (F. Madeira et al., The EMBL-EBI search and sequence analysis tools APIs in 2019. Nucleic Acids Res. 47, W636-W641 (2019)). Detected ORFs of more than 10 amino acids were then aligned to known HML-2 (HERV-K) Gag, Pro, Pol, Env, Rec and Np9 proteins referenced in UniProt (UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 47, D506-D515 (2019)). BLAST was run with parameters optimal for retroviruses (word size = 3, composition-based statistics, no low-complexity region filter). Conserved sequences aligned with Gag and Pol proteins with more than 90% identity and an e-value < 0.05 were then screened for predicted HLA-A*02 strong binders using MHCflurry v1.3 (T. J. O'Donnell et al., MHCflurry: Open-Source Class I MHC Binding Affinity Prediction. Cell Systems. 7, 129-132.e4 (2018)). Peptides with a percentile rank ≤ 0.5 were considered strong binders. The human proteome was downloaded from UniProt (ID: UP000005640) to confirm the absence of any match before peptide synthesis and in vitro validation.
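The ORF screening can be sketched in plain Python: translate all six frames with the standard genetic code, keep stop-to-stop stretches longer than 10 residues, enumerate candidate peptides and keep those with a percentile rank ≤ 0.5. Several points are assumptions rather than details from the source: the stop-to-stop ORF definition (sixpack offers other options), the restriction to 9-mers, and the rank dictionary, which would come from a predictor such as MHCflurry. The BLAST filter is not reproduced.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (first, second, third base).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(seq):
    """Translate DNA codon by codon; codons with unknown bases become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_orfs(seq, min_len=11):
    """Stop-to-stop stretches of at least min_len residues (> 10 aa) in six frames."""
    seq = seq.upper()
    rev = seq.translate(COMPLEMENT)[::-1]  # reverse complement
    orfs = []
    for strand in (seq, rev):
        for frame in range(3):
            for segment in translate(strand[frame:]).split("*"):
                if len(segment) >= min_len:
                    orfs.append(segment)
    return orfs


def nine_mers(orfs):
    """All 9-residue windows (assumed peptide length) from the retained ORFs."""
    return {orf[i:i + 9] for orf in orfs for i in range(len(orf) - 8)}


def strong_binders(percentile_ranks, cutoff=0.5):
    """Keep peptides whose predicted percentile rank is <= cutoff."""
    return {pep for pep, rank in percentile_ranks.items() if rank <= cutoff}
```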

Cumulative expression score

[190] The π-value score was defined for each HERV and each tissue comparison (TNBC versus peritumoral tissue) as the product of the log2 fold change of expression and the log10 of the inverse p-value, according to the method proposed by Xiao et al. (Y. Xiao et al., A novel significance score for gene selection and ranking. Bioinformatics. 30, 801-807 (2014)). The cumulative expression score was calculated by summing the π-values of all the HERVs containing the epitope sequence (including CAHs and other HERVs).
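A minimal sketch of the π-value and the cumulative score, assuming each HERV carries a (log2 fold change, p-value) pair from the TNBC versus peritumoral comparison and a list of translated ORF sequences. The dictionary layout and the example values in the test are hypothetical.

```python
import math


def pi_value(log2_fold_change, p_value):
    """pi = log2 fold change x log10(1 / p)."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p-value must be in (0, 1]")
    return log2_fold_change * math.log10(1.0 / p_value)


def cumulative_expression_score(epitope, herv_orfs, herv_stats):
    """Sum the pi-values of every HERV whose translated ORFs contain the epitope.

    herv_orfs:  dict herv_id -> list of ORF amino-acid sequences
    herv_stats: dict herv_id -> (log2 fold change, p-value)
    """
    return sum(pi_value(*herv_stats[h])
               for h, orfs in herv_orfs.items()
               if h in herv_stats and any(epitope in orf for orf in orfs))
```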

Analysis of peptidome proteomic datasets

[191] Raw MS/MS datasets were downloaded from CPTAC (N. J. Edwards et al., The CPTAC Data Portal: A Resource for Cancer Proteomics Research. J Proteome Res. 14, 2707-2713 (2015)) for breast cancer studies (Cancer Genome Atlas Network, Comprehensive molecular portraits of human breast tumours. Nature. 490, 61-70 (2012); K. Krug et al., Proteogenomic Landscape of Breast Cancer Tumorigenesis and Targeted Therapy. Cell. 183, 1436-1456.e31 (2020)). Retrieved MS/MS spectra were converted to MGF format using msconvert from ProteoWizard (R. Adusumilli, P. Mallick, Data Conversion with ProteoWizard msConvert. Methods Mol Biol. 1550, 339-368 (2017)). The list of peptides was then analyzed using the standalone version of PepQuery (v1.6.2.0) (B. Wen et al., PepQuery enables fast, accurate, and convenient proteomic validation of novel genomic alterations. Genome Res. 29, 485-493 (2019)). The command line used was: java -Xmx10G -jar pepquery.jar -fixMod 6,62,108 -varMod 117 -maxVar 3 -c 1 -tol 10 -tolu ppm -minScore 12 -e 1 -um -he TRUE -n 1000 -itol 0.05 -m 1 -cpu 12 -pep ${peptides_list} -db ${Reference_database} -ms ${MS_database} -o ${output_directory}. For the second dataset, the -fixMod and -varMod options were adapted as follows: -fixMod 6,103,157 -varMod 101,117.

Riboseq analysis

[192] Ribosome profiling data were retrieved from a previously published study (F. Loayza-Puch et al., Tumour-specific proline vulnerability uncovered by differential ribosome codon reading. Nature. 530, 490-494 (2016)). Raw FASTQ files were preprocessed as described in the initial publication. Briefly, adapter sequences were trimmed from the raw data using cutadapt 1.1 with parameters (--quality-base=33 -O 12 -m 20 -q 5), and reads were mapped to our hg19-HERV reference transcriptome.

Statistical analysis

[193] All analyses were performed using R statistical software version 3.6.0. Differential HERV expression analysis was performed using DESeq2 v1.24.0 (M. I. Love et al., Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014)), and logarithmic fold changes were shrunk with the apeglm package (A. Zhu et al., Heavy-tailed prior distributions for sequence count data: removing the noise and preserving large differences. Bioinformatics. 35, 2084-2092 (2019)).

Biological samples

[194] Blood from healthy donors was obtained from the “Établissement Français du Sang” (Lyon). Fresh TNBC samples (n = 11) were provided by the tissue bank of Centre Léon Bérard (CLB) (BB-0033-00050, CRB-CLB, Lyon, France; French agreement number: AC-2013-1871), after approval from the institutional review board and ethics committee (L-06-36 and L-11-26) and with patients' written informed consent, in accordance with the Declaration of Helsinki.

Peptides synthesis

[195] Peptides were synthesized by JPT Peptide Technologies (GE, EU) to specification with a purity > 90%. The lyophilized powder was resuspended in distilled water containing 1% DMSO.

Cell lines

[196] MDA-MB-231 basal breast cancer epithelial cells were obtained from the American Type Culture Collection (ATCC catalog name: HTB-26) and cultured in DMEM with 10% FBS (Gibco, FR, EU), 1% penicillin/streptomycin and 1% L-glutamine. HMEC primary cells were obtained from PromoCell (GE, EU) and cultured in mammary epithelial growth medium (PromoCell, GE, EU).

In vitro priming assays

[197] PBMCs were obtained by Ficoll density gradient centrifugation (Eurobio, FR, EU). They were rapidly thawed at 37°C, extensively washed, and left at room temperature or overnight at 37°C before viability assessment. PBMCs (0.15 × 10^6 per well) were cultured in 96-well plates in AIM V medium (Gibco, FR, EU) supplemented at day 0 with 5 µg/mL R-848 (resiquimod), 10 µg/mL HMW poly-IC (both InvivoGen, FR, EU), 20 IU/mL IL-2 (PROLEUKIN aldesleukin, Novartis Pharma, CH, EU) and 10 µg/mL of the peptide of interest. On days 3, 6 and 10, 100 µL of medium was replaced with fresh enriched medium (IL-2 and peptide only at day 6, IL-2 only at day 10), and cells were split if necessary. On day 12, cells were collected and counted for analysis.

Feeding protocol

[198] Dextramer single-cell-sorted CD8+ T cells were expanded on a feeder composed of 35 Gy-irradiated allogeneic PBMCs and B-lymphoblastoid cell lines at a 10:1 ratio. Feeder cells were plated in 96-well round-bottom plates at 0.10 × 10^6 cells per well in RPMI with 5% human serum, 1.5 µg/mL PHA-L (Merck KGaA, GE, EU) and 150 IU/mL IL-2 (Novartis Pharma, CH, EU), and up to 5 × 10^3 sorted cells were added per well. Cells were cultured for 14 days, and the medium was replaced when needed with fresh IL-2-enriched RPMI containing 5% human serum. This process was repeated if needed.

TCR immunosequencing

[199] DNA from specific CD8+ T cells and the corresponding bulk PBMCs was extracted using the QIAGEN QIAamp® DNA Blood Micro kit (QIAGEN, GE, EU) and sent to Adaptive Biotechnologies (WA, US) for TCR survey and deep analysis.

Generation and refinement of 3D models

[200] The full TCR sequences of both alpha and beta chains were reconstructed for each T cell clone from the immunosequencing results as previously published (A. Gros et al., Recognition of human gastrointestinal cancer neoantigens by circulating PD-1+ lymphocytes. J Clin Invest. 129, 4992-5004 (2019)). For variable domains, TRA and TRB CDR3 nucleotide sequences were obtained from immunosequencing (Fig. 5A), and the 5' and 3' ends of the TRAV and TRBV regions were obtained from the International Immunogenetics Information System (IMGT) online database. Human constant domains of TRA and TRB were added 3' of the variable domains to reconstitute the full-length TCR. These full-length TCR sequences, together with the MHC and peptide sequences, were submitted to the CBS TCRpMHCmodels-1.0 web server, specifically developed for the automatic structural modeling of TCR-pMHC complexes (K. K. Jensen et al., TCRpMHCmodels: Structural modelling of TCR-pMHC class I complexes. Sci Rep. 9, 14530 (2019)) using template-based modeling. TCR residues were renumbered using a standardized procedure (K. R. Abhinandan, A. C. R. Martin, Analysis and improvements to Kabat and structurally correct numbering of antibody variable domains. Mol Immunol. 45, 3832-3839 (2008); M. S. Klausen et al., LYRA, a webserver for lymphocyte receptor structural modeling. Nucleic Acids Res. 43, W349-355 (2015)). The initial models generated by the web server were further refined in four rounds using a protocol adapted from Bobisse et al. (S. Bobisse et al., Sensitive and frequent identification of high avidity neo-epitope specific CD8+ T cells in immunotherapy-naive ovarian cancer. Nat Commun. 9, 1-10 (2018)). Briefly, the CDR loops were refined in pairs using Modeller software version 9.25 (A. Fiser, A. Sali, Modeller: generation and refinement of homology-based protein structure models. Methods Enzymol. 374, 461-491 (2003); A. Sali, T. L. Blundell, Comparative protein modelling by satisfaction of spatial restraints. J Mol Biol. 234, 779-815 (1993)): CDR loops α1/α2 in round 1, α1/α3 in round 2, β1/β2 in round 3 and β1/β3 in round 4. Only residues in coil conformation in the initial models were refined. In each round, 500 models were generated, and the best model according to the Modeller internal DOPE score was selected and used as input for the next round.

[201] At the end of the four rounds, a representative model was chosen based on the consensus of unweighted contacts as follows: contacts between residues at the TCR-pMHC interface (defined by a distance of less than 5 Å between heavy atoms) were counted in the set of 4 × 25 models with the best DOPE scores from each refinement round. Then, among the 25 models with the best DOPE scores in the final round, the model with the highest number of recurrent contacts, referred to as the un-normalized CONSRANK score (G. Launay et al., Evaluation of CONSRANK-Like Scoring Functions for Rescoring Ensembles of Protein-Protein Docking Poses. Front Mol Biosci. 7, 559005 (2020); R. Oliva et al., Ranking multiple docking solutions based on the conservation of interresidue contacts. Proteins. 81, 1571-1584 (2013)), was selected as the representative model.
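The contact-consensus selection can be sketched as follows, assuming interface contacts (residue pairs within 5 Å between heavy atoms) have already been extracted for each model as sets of hashable residue identifiers. The score is the un-normalized sum of contact frequencies over the 4 × 25 pool; the names are illustrative and the contact detection itself is not shown.

```python
from collections import Counter


def contact_frequencies(models):
    """Count in how many models each interface contact occurs.

    models: list of sets; each contact is a hashable pair such as
            (TCR residue id, pMHC residue id), computed upstream.
    """
    counts = Counter()
    for contacts in models:
        counts.update(contacts)  # a set counts each contact at most once per model
    return counts


def representative_model(pool, final_round):
    """Pick the final-round model whose contacts are most recurrent in the pool.

    pool:        contact sets of the 4 x 25 best-DOPE models of all rounds
    final_round: dict model name -> contact set, for the 25 best models of round 4
    """
    freq = contact_frequencies(pool)
    # un-normalized CONSRANK-like score: sum of pool frequencies of the model's contacts
    return max(final_round, key=lambda name: sum(freq[c] for c in final_round[name]))
```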

[202] A quantitative view of potential stabilizing interactions between TCR and pMHC is provided by the frequencies of inter-residue contacts observed in the set of 4 × 25 models with the best DOPE scores from each refinement round. Whereas the CDR1 and CDR3 loops of both TCRα and TCRβ chains interact with both the MHC and the peptide, the CDR2 loops interact mostly with the MHC molecule.

3D structure analysis

[203] The structural similarity between representative models of different complexes was assessed by the root-mean-square deviation (RMSD) between CDR loop backbones, computed with UCSF Chimera (E. F. Pettersen et al., UCSF Chimera--a visualization system for exploratory research and analysis. J Comput Chem. 25, 1605-1612 (2004)). As expected, structural variability was highest within CDRβ3 loops. UCSF Chimera was also used for hydrogen bond and hydrophobic contact detection and for structure visualization.
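For reference, the RMSD over paired backbone atoms is sketched below. It assumes the two structures are already superimposed and that atoms are matched in order (for example N, CA, C and O of the same CDR loop residues); Chimera performs the superposition itself, which this sketch does not reproduce.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty, equally long atom lists")
    sq = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
             for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))
```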

Binding affinity prediction

[204] The binding affinity was predicted using the PRODIGY method (A. Vangone, A. M. Bonvin, Contacts-based prediction of binding affinity in protein-protein complexes. Elife. 4, e07454 (2015)), which uses a linear model based on the number and types of contacts at the interface. For each complex, instead of running a single prediction on the representative model, we averaged the predictions obtained for the 25 models with the best DOPE scores at round 4 of the refinement protocol.

Fluorospot

[205] After co-culture of PBMCs or CD8+ T cells with T2 cells, pulsed or not with the peptide, at a 10:1 ratio in AIM V medium (Gibco, FR, EU) at 37°C, 9% CO2 for 24 hours, a double-color Fluorospot (CTL GmbH, CA, US) with IFN-γ AF488 and granzyme B CTL-Red was performed according to the manufacturer's instructions. The developed plate was read on an ImmunoSpot® S6 ULTIMATE UV Image Analyzer and analyzed with the ImmunoSpot® analysis software.

Functional avidity

[206] Dextramer-isolated CD8+ T cells specific for the selected peptides (P1, P2, P6) were used in a functional avidity assay measuring IFN-γ production by enzyme-linked immunosorbent assay (IFN-γ ELISA, Thermo Fisher Scientific, FR, EU). Two CMV-specific T cell clones recognizing the immunodominant epitope N9V (NLVPMVATV) were also used: N9V-1 corresponds to pp65 dextramer-selected CD8+ T cells, and N9V-2 is a CD8+ T cell clone kindly provided by Dr Henri Vie. Functional avidity of specific CD8+ T cell responses was assessed by limiting peptide dilutions from 10^-4 to 10^-9 M (log steps) loaded onto T2 cells pulsed for 5 hours. After washing, peptide-pulsed T2 cells were co-cultured with specific CD8+ T cells at a 1:1 ratio in AIM-V medium (Gibco, FR, EU) supplemented with 5% human serum. After 18 hours, supernatants were collected and ELISA was performed. The peptide concentration required to achieve a half-maximal cytokine response (EC50) was determined using GraphPad Prism version 6.0 for Windows (R > 0.98).
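GraphPad Prism determines EC50 by nonlinear curve fitting. As a simpler, swapped-in alternative for a quick check, EC50 can be approximated by linear interpolation on the log10 concentration axis between the two dilutions that bracket the half-maximal response. This sketch assumes a response that increases with concentration; the function name and the example readouts are hypothetical.

```python
import math


def ec50_interpolated(concentrations, responses):
    """Approximate EC50 by linear interpolation on log10(concentration).

    concentrations: peptide concentrations in M (e.g. 1e-9 ... 1e-4)
    responses:      IFN-gamma readout at each concentration, same order
    The half-maximal level is taken midway between the lowest and highest response.
    Returns None if no pair of neighboring dilutions brackets that level.
    """
    pts = sorted(zip(concentrations, responses))  # increasing concentration
    lo, hi = min(responses), max(responses)
    half = lo + (hi - lo) / 2.0
    for (c1, r1), (c2, r2) in zip(pts, pts[1:]):
        if r1 < half <= r2:  # the curve crosses the half-maximal level here
            x1, x2 = math.log10(c1), math.log10(c2)
            x = x1 + (half - r1) * (x2 - x1) / (r2 - r1)
            return 10 ** x
    return None


print(ec50_interpolated([1e-9, 1e-8, 1e-7, 1e-6, 1e-5, 1e-4], [0, 0, 10, 90, 100, 100]))
```

Interpolating on the log scale matches the log-spaced dilution series; a full four-parameter logistic fit would use all points rather than only the bracketing pair.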

Epitope Validation and Quantification by MS

[207] Epitope validation and quantification by MS were performed by Complete Omics Inc. (Maryland, USA) according to a previously described method (J. Douglass et al., Bispecific antibodies targeting mutant RAS neoantigens. Sci Immunol. 6, eabd5515 (2021)) with further modifications. In brief, a total of 300 million cells were lysed, and peptide-HLA complexes were immunoprecipitated using a self-packed Valid-NEO neoantigen enrichment column preloaded with the anti-human HLA-A, B, C antibody clone W6/32 (Bio X Cell). After elution, dissociation, filtration and cleanup, peptides were lyophilized before further analysis. Transition parameters for each epitope peptide were examined and curated through the Valid-NEO method builder bioinformatics pipeline to exclude ions with excessive noise due to co-elution with impurities and to improve detectability through recursive optimization of significant ions. Absolute copy numbers of peptides presented on the cell surface were calculated based on quantification using heavy isotope-labeled peptides. The MS data have been deposited via ProteomeXchange and can be accessed through identifier PASS01698.
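The absolute quantification step follows the general isotope-dilution logic: the light-to-heavy signal ratio, times the known amount of spiked heavy peptide, gives the amount of endogenous peptide, which is converted to copies per cell with Avogadro's number. This is a generic sketch, not the Valid-NEO pipeline; it assumes a known heavy spike in moles and ignores losses during immunoprecipitation and cleanup.

```python
AVOGADRO = 6.02214076e23  # particles per mole (exact SI value)


def copies_per_cell(light_area, heavy_area, heavy_spike_mol, n_cells):
    """Endogenous peptide copies per cell from a light/heavy peak-area ratio.

    light_area:      signal of the endogenous (light) peptide
    heavy_area:      signal of the spiked heavy isotope-labeled standard
    heavy_spike_mol: amount of heavy standard added, in mol
    n_cells:         number of cells lysed (300 million in the text)
    """
    if heavy_area <= 0 or n_cells <= 0:
        raise ValueError("heavy_area and n_cells must be positive")
    endogenous_mol = (light_area / heavy_area) * heavy_spike_mol
    return endogenous_mol * AVOGADRO / n_cells


# Hypothetical readout: equal light and heavy signals with a 1 fmol spike.
print(copies_per_cell(2.0, 2.0, 1e-15, 3e8))
```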

Live imaging

[208] Cells were plated in DMEM (Gibco, FR, EU) with 10% FBS and 1% penicillin/streptomycin. For IncuCyte analysis, medium was removed from the 96-well plate after overnight cell adhesion. A blocking HLA-A2 antibody (GeneTex, clone BB7.2, GTX75806, CA, US) was added in AIM-V medium (Gibco, FR, EU) for 1 hour, depending on the condition. T cells were then added at an effector-to-target (E:T) ratio of 2:1 in the presence of IncuCyte Cytotox dye (Essen BioScience, UK, EU) for cell death quantification. Live imaging was performed for 48 hours at 37°C, 5% CO2 with an IncuCyte ZOOM. Cell death was calculated as the total number of stained cells counted, corrected by the number of stained cells counted at baseline. Maximum killing was established using DMSO. Specific lysis was calculated according to the following formula:

% specific lysis = (((HERV-specific T cells induced target cell death - spontaneous target cell death) - (non-specific dextramer-negative T cells induced target cell death - spontaneous target cell death)) / (DMSO induced target cell death - spontaneous target cell death)) x 100
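The specific-lysis formula translates directly into code, assuming each input is a baseline-corrected dead-cell count from the IncuCyte readout for the same target cells; the argument names are illustrative.

```python
def specific_lysis(herv_t_death, spontaneous_death, nonspecific_t_death, dmso_death):
    """% specific lysis from baseline-corrected dead-cell counts.

    herv_t_death:        death induced by HERV-specific T cells
    spontaneous_death:   death of targets without T cells
    nonspecific_t_death: death induced by non-specific dextramer-negative T cells
    dmso_death:          DMSO-induced (maximum) death
    """
    max_window = dmso_death - spontaneous_death
    if max_window <= 0:
        raise ValueError("DMSO-induced death must exceed spontaneous death")
    specific = (herv_t_death - spontaneous_death) - (nonspecific_t_death - spontaneous_death)
    return specific * 100.0 / max_window


print(specific_lysis(600, 100, 200, 1100))
```

The spontaneous-death terms cancel in the numerator, which therefore reduces to HERV-specific minus non-specific T cell-induced death; the spontaneous term still matters in the DMSO denominator.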

[209] For Nanolive imaging, T cells were added at an E:T ratio of 10:1, and phase imaging was performed every minute using a Nanolive 3D Cell Explorer microscope.

Tumor dilaceration: organoid and TIL expansion

[210] Tumor tissues were dissected into fragments of approximately 1 mm³ and dilacerated with collagenase IV and DNase for 45 minutes in RPMI enriched with 20% FBS. The tumor lysate was centrifuged at 1,500 rpm for 5 minutes and resuspended in RPMI enriched with 5% human serum. Cells were counted and plated at a density of 5 × 10^4 cells per well in a flat-bottom 96-well plate with anti-CD3/anti-CD28 Dynabeads (Gibco, EU) at a bead-to-cell ratio of 1:4 and IL-2 at 100 IU/mL.

[211] For organoids, part of the tumor lysate (3 to 10 million cells) was resuspended in 10 mL of Advanced DMEM/F12 medium. Cells were centrifuged at 500 rcf for 10 seconds and then resuspended in full medium. This step was repeated 3 to 5 times, depending on the initial cell number, to enrich the suspension in epithelial cells. These cells were then cultured according to the protocol previously described by Driehuis et al. (E. Driehuis et al., Establishment of patient-derived cancer organoids for drug-screening applications. Nature Protocols. 15, 3380-3409 (2020)).

Multi-parametric Flow Cytometry

[212] T cells were counted and co-cultured with T2 cells, loaded or not with the cognate peptide, at a 5:1 ratio. After one hour, CD107a antibody (BD, clone H4A3) was added to each well together with GolgiPlug (1/1000) (10 µg/mL, BD, FR, EU). After 5 hours, viability, surface and intracellular staining were performed. To assess cytokine expression in CD8+ T cells, intracellular staining was performed with the FoxP3 Fixation/Permeabilization kit (Thermo Fisher Scientific, Life Technologies, CA, US), according to the manufacturer's instructions.

[213] Dextramer staining was performed on PBMCs after a 12-day culture (priming protocol) or on TILs expanded for 14 days after tumor dilaceration. Cells were washed in 2 mL washing buffer (PBS + 2% FBS + 2 mM EDTA (Sigma-Aldrich, MI, US)) and stained with dextramers (Immudex ApS, DK, EU) for 10 minutes at room temperature before viability and surface marker staining. Cells were washed twice to avoid non-specific dextramer staining. A CMV pp65 NLVPMVATV dextramer was used as a positive control. For TIL analysis, a dextramer complexed with a non-natural irrelevant peptide (ALIAPVHAV) was used as a negative control.

[214] All samples were analyzed on an LSRFortessa (BD Biosciences, FR, EU) with identical settings throughout the study. Data were analyzed using FlowJo software (Tree Star v10.4, NJ, USA).

Results

A machine learning-based approach allows the identification of HERVs associated with CTL response

[215] To optimize epitope detection, we developed a new pipeline for annotating HERVs. We reviewed multiple HERV databases and selected a recent and complete reference of 3,173 HERVs, mostly composed of complete proviral sequences and therefore more likely to contain translated peptides. We assessed HERV expression in 8,893 primary tumor samples from 29 different cancer types from The Cancer Genome Atlas (TCGA) Pan-Cancer RNAseq database using HervQuant. We selected cancers with at least 10 available matched peritumoral samples (n = 14) to identify HERVs highly expressed in tumors but not in normal tissue (cancer-associated HERVs, or CAHs). Differential HERV expression analysis unveiled 1,134 CAH candidates (Fig. 1A). To reduce the number of candidates to test, we then applied a second filter to retain only the CAHs associated with a CTL response (cyt-HERVs). Cyt-HERV annotation was based on two inclusion criteria, namely the association of each HERV with at least one CD4 or CD8 T cell phenotype signature (A) and one function signature (B), and one exclusion criterion, namely its expression by purified T or NK cells (C) (A and B, not C) (Fig. 1B). To reduce the risk of false-positive associations and to control for the high collinearity of HERV expression, these associations were evaluated by L1-penalized regression (R. Tibshirani, Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society. 58, 267-288 (1996)), retaining only HERVs highly associated with CTL responses while controlling for cancer subtype. This machine learning-based approach was used to test the associations independently for each cancer type (see Methods for full details), leading to the final identification of 192 cyt-HERVs (Fig. 1C). Sub-cancer analysis revealed that colon adenocarcinoma (COAD), lung squamous cell carcinoma (LUSC), head and neck squamous cell carcinoma (HNSC), bladder urothelial carcinoma (BLCA) and lung adenocarcinoma (LUAD) were the top 5 cancers with the highest total number of cyt-HERVs (Fig. 1D).

[216] Overall, cyt-HERVs constituted around 15% of total CAHs, greatly reducing the number of potential candidates. Among the most shared cyt-HERVs, 11 were overexpressed in more than 10 different cancer types, including 3 HERVs (herv_2256, herv_6069 and herv_4700) previously reported to induce CD8+ T cell responses. Analysis of the mean beta-value of the 10 nearest surrounding probes in TCGA Illumina 450k methylation data revealed that more cyt-HERVs were significantly correlated with local demethylation (n = 37) than with methylation (n = 15), suggesting partial epigenetic control of these HERVs.

Selection of conserved Gag and Pol HERV-K/HML-2 motifs among cyt-HERVs leads to the identification of shared CD8+ T cell epitopes

[217] We next assessed the presence of shared T cell epitopes among these cyt-HERVs, focusing on HLA-A2, the most common HLA class I allele. We translated our 192 cyt-HERV sequences in all 6 possible frames and retained predicted open reading frames (ORFs) of at least 10 amino acids. To reduce the number of false positives (non-translated sequences), we aligned these ORFs against known HERV-K/HML-2 Gag and Pol proteins referenced in UniProt and kept only ORFs with more than 90% identity to known HML-2 proteins. This conservative approach led to the identification of 57 HML-2 HLA-A*02:01 epitope candidates from 27 distinct ORFs (Fig. 2A), with herv_2410 and herv_6069 showing the highest number of conserved HML-2-derived ORFs. To better assess the distribution of these epitopes, we mapped each peptide back onto all the CAHs. The 25 most shared epitopes are shown in Figure 2B. Thirteen unique epitopes were present in at least 10 different HERVs (Fig. 2B). For further biological validation and immunological assays, we selected 6 of the most shared epitope candidates: 3 from Gag (P1, P2 and P4) and 3 from Pol (P3, P5 and P6) (Fig. 2C). Analysis of mass spectrometry (MS) data from TCGA and the Clinical Proteomic Tumor Analysis Consortium (CPTAC) showed evidence of translation for the P1, P2, P3, P5 and P6 peptides. P4 was also selected because it had been described among HLA-I-eluted peptides from tumors (patent WO2019/162110A1). Importantly, alignment against the human proteome revealed that the sequences of these epitope candidates did not match any self-protein sequence. Of note, these HLA-A2 epitopes were not predicted to be strong binders for the other most common HLA-A and HLA-B alleles.

Triple negative breast cancer (TNBC) is characterized by many cyt-HERVs containing shared HERV epitopes

[218] Owing to the well-characterized expression of HERVs in TNBC and the availability of an RNA sequencing (RNAseq) database comprising normal samples, we then focused on breast cancer. Differential HERV expression analysis uncovered a total of 497 CAHs expressed across different breast cancer subtypes, of which 91 were annotated as cyt-HERVs. Fifty-four of these 91 cyt-HERVs were expressed in the basal subtype (Fig. 3A). The mean expression of these 54 cyt-HERVs was significantly higher in TNBC and ER+ samples than in peritumoral or normal breast tissues from an independent dataset (Fig. 3B). We confirmed the high expression of these 54 cyt-HERVs in the breast cancer cell lines sequenced in Varley et al.'s study and in cell lines from the Broad Institute Cancer Cell Line Encyclopedia. The 25 most shared epitope candidates among all CAHs expressed in the basal subtype included the 6 previously identified peptides P1-P6.

[219] We next selected HERVs containing the sequences of the 6 previously identified epitope candidates P1-P6 among the CAHs expressed in the basal subtype. Eighteen different CAHs contained at least one of the 6 peptides in their ORFs (Table 1).

Table 1

[220] Genomic mapping of the corresponding loci showed that these 18 HERVs are scattered across the chromosomes. To quantify the differential expression of the epitope-containing HERVs in TNBC versus normal tissues (represented here by each available peritumoral sample), we used the π-value score, which takes into account both statistical significance, given by the p-value, and biological significance, expressed by the fold change. A cumulative expression score was then calculated for each epitope by summing the π-values of all the HERVs containing its sequence. This score was between 10 and 200 in most cases, confirming the significant overexpression of the epitope-containing HERVs in TNBC versus each evaluated normal sample. Finally, analysis of ribosome profiling (riboseq) data from a previously published study revealed evidence of translation for the 18 peptide-containing CAHs in 4 different breast tumor cell lines, including 2 basal and 2 luminal A subtypes (Fig. 3C).

[221] Overall, our bioinformatics approach allowed us to select, from a large number of HERV candidates, a limited number of T cell epitopes derived from HERVs specifically overexpressed by tumor cells and most likely to be immunogenic.

HERV-derived epitopes induce strong and polyfunctional T cell responses

[222] We then evaluated the capacity of the selected epitope candidates to induce efficient T cell responses. The HLA-A2 affinity of the 6 selected peptides was first confirmed using an in vitro binding assay on purified HLA-A*02:01 molecules. To assess the immunogenicity of these 6 peptides, we developed an optimized in vitro priming assay performed on peripheral blood mononuclear cells (PBMCs) from HLA-A2-positive donors (Fig. 4A and Methods for details). Dextramer-based quantification of peptide-specific CD8+ T cells revealed the presence of specific T cells for all peptides, with variations among donors (Fig. 4B). P1 appeared to be the most immunogenic peptide, with significant T cell responses in 9/11 donors, followed by P4 (7/10), P6 (4/9) and P2 (3/11) (Fig. 4B). The immunogenicity of these peptides was further confirmed by a classical assay using monocyte-derived dendritic cells (MoDCs) prepared from 5 HLA-A2-positive healthy donors. Flow cytometry analysis showed IFN-γ production by CD8+ T cells when peptide-stimulated PBMCs were co-cultured with T2 cells pulsed with the cognate epitopes. Of note, P1 also induced the highest IFN-γ response among the peptides. In agreement with the bioinformatics prediction, no specific T cell induction was observed using PBMCs from HLA-A2-negative donors (n = 5).

[223] Based on these results, we selected P1, P2, P4 and P6 for further experiments. A polyfunctional IFN-γ+ TNF-α+ specific CD8+ T cell response was observed upon co-culture of stimulated PBMCs with peptide-pulsed T2 cells, associated with the presence of the degranulation marker CD107a (Fig. 4C). A Fluorospot assay in the same co-culture conditions confirmed the secretion of IFN-γ and granzyme B, with the presence of double-positive cells.

Epitope-specific CD8+ T cell clones are characterized by T cell receptors (TCR) of high predicted affinity

[224] P1-, P2-, P4- and P6-specific CD8+ T cells were sorted by flow cytometry using dextramer staining and expanded on feeder cells (see Methods). More than 90% (90-99%) of the CD8+ T cells were dextramer-positive after one (P1) or two (P2, P4 and P6) rounds of selection-expansion. TCRβ immunosequencing confirmed the presence of dominant clones with a unique Vβ rearrangement representing 90.8%, 90.7%, 99.6% and 76% of the expanded T cells for P1, P2, P4 and P6, respectively (Fig. 5A). Of note, the TCRβ V(D)J recombination sequences characterizing these clones were not found in the bulk T cells before peptide stimulation (sensitivity threshold: 3 × 10^-6). TCRα chains were also sequenced and confirmed the presence of a unique major clone for P1, P4 and P6, enabling TCR pairing and modeling. Because 2 major Vα rearrangements were obtained for P2, the predominant rearrangement, occurring at a 60% frequency, was used for TCR modeling.

[225] The affinity of the T cell clones specific for peptides P1, P2, P4 and P6 was then characterized using 3D models of the TCR-peptide-MHC (pMHC) complexes (see Methods). The stability of macromolecular complexes results from favorable interactions at the interface, such as hydrogen bonds, salt bridges and hydrophobic interactions. These interactions involve specific peptide side chains exposed at the TCR-pMHC interface. In the TCR P1 complex, the Phe1, Phe4 and Trp8 side chains of the peptide form several hydrophobic interactions, and the backbone atoms of Phe4 and Ile9 are involved in H-bonds. In the TCR P2 complex, several hydrophobic interactions are mediated by peptide residues Pro4, Tyr5 and Trp7. In TCR P4, peptide residues Ile5, Ile7 and Leu8 form several hydrophobic interactions, while the Tyr1 and Lys6 side chains, as well as the Phe4 and Leu8 backbone atoms, are involved in H-bonds. In TCR P6, Tyr1, Ser4, Asn5, Leu6 and Phe7 form several hydrophobic interactions, while the Ser4/Leu6/Ser8 backbones and the Tyr1/Ser4/Asn5/Ser8 side chains form 8 H-bonds.

[226] Overall, this analysis of the predicted 3D models suggests that the TCR-pMHC complexes are stabilized by several favorable non-covalent interactions. This supports the conclusion that the TCRs identified after clonal expansion of HERV-specific T cells form stable complexes with the peptides presented by HLA-A2 molecules. To gain further insight, we submitted the 3D models to binding affinity prediction (Fig. 5B). We compared these predictions with reference TCR-pMHC complexes available in the Protein Data Bank and obtained from crystallography data. The predicted affinities of the identified TCRs match those of clinically relevant TCRs, such as TCRs targeting MAGE-A3, NY-ESO-1, MART-1, HTLV or CMV (Fig. 5C). Hence, the identified HERV-specific TCRs are predicted to interact stably with their respective pMHC complexes, reminiscent of high-affinity TCRs.

High avidity HERV-specific T cell clones recognize and kill tumor cells

[227] The functionality of the sorted and expanded epitope-specific CD8+ T cells was confirmed by FluoroSpot using peptide-pulsed T2 cells. Functional avidity was then assessed by loading T2 cells with decreasing concentrations of the cognate peptide (from 10⁻⁴ to 10⁻⁹ M). We then measured the lowest peptide concentration necessary to elicit IFN-γ responses in 50% of cells, defined as the half maximal effective concentration (EC50). The EC50 values were estimated at 6.6 × 10⁻⁷ M, 1.9 × 10⁻⁶ M and 6.8 × 10⁻⁶ M for P1-, P2- and P6-specific T cells, respectively. These values are of the same order of magnitude as those of neoepitope-specific T cell clones (28) and CMV-specific T cells (1.2 × 10⁻⁶ M and 1.9 × 10⁻⁶ M for N9V-1 and N9V-2, respectively) (Fig. 6A).
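As a rough illustration of how an EC50 can be read off a peptide dilution series, the sketch below interpolates linearly on log10(concentration). It interpolates between the two tested concentrations that bracket half of the maximal IFN-γ response. This is a simplified stand-in, not the method used here: dose-response data are more often fitted with a four-parameter logistic (Hill) curve. The sketch also assumes a zero baseline, a roughly monotonic response and hypothetical spot counts.

```python
import math

def ec50_log_interpolation(concentrations, responses):
    """Estimate EC50 (M) by linear interpolation on log10(concentration)
    between the two points bracketing 50% of the maximal response.

    concentrations: peptide concentrations in M (any order)
    responses: IFN-gamma readout at each concentration (e.g. spot counts);
               a zero baseline is assumed.
    Returns None if the half-maximal level is not bracketed.
    """
    points = sorted(zip(concentrations, responses))  # ascending concentration
    half = max(r for _, r in points) / 2
    for (c_lo, r_lo), (c_hi, r_hi) in zip(points, points[1:]):
        if r_lo < half <= r_hi:
            frac = (half - r_lo) / (r_hi - r_lo)
            log_lo, log_hi = math.log10(c_lo), math.log10(c_hi)
            return 10 ** (log_lo + frac * (log_hi - log_lo))
    return None

# Hypothetical dilution series (M) and IFN-gamma spot counts.
conc = [1e-4, 1e-5, 1e-6, 1e-7, 1e-8, 1e-9]
resp = [100, 98, 80, 20, 5, 0]
print(f"{ec50_log_interpolation(conc, resp):.2e} M")  # 3.16e-07 M
```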

[228] We next assessed the capacity of these HERV epitope-specific CD8+ T cells to recognize and kill tumor cells. As a target, we selected the HLA-A2-positive basal breast cancer (BRCA) cell line MDA-MB-231, previously shown to express HERVs containing the epitope sequences (Fig. 3C). To provide evidence that the epitopes are actually presented on the cell surface, peptides eluted from HLA molecules were analyzed using a mass spectrometry (MS)-based method. The P1 and P6 epitopes were clearly detected by MS. Based on comparison with the heavy isotope-labeled control, we estimated that there were 1.8 copies of P1-HLA complexes on the MDA-MB-231 cell surface.
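The copy-number estimate rests on a ratio. The endogenous ("light") peptide signal is divided by the signal of a known amount of heavy isotope-labeled peptide, then converted to molecules and divided by the number of input cells. A minimal sketch of that arithmetic is shown below. The function name, parameters and recovery correction are hypothetical. It also assumes that light and heavy peptides ionize identically and that the spike-in amount is known exactly. The actual quantification workflow is described in Methods and may differ.

```python
AVOGADRO = 6.02214076e23  # mol^-1 (exact SI value)

def copies_per_cell(light_signal, heavy_signal, heavy_spike_mol, n_cells, recovery=1.0):
    """Estimate peptide-HLA copies per cell from a heavy-isotope spike-in.

    light_signal / heavy_signal: MS signal of endogenous vs labeled peptide
    heavy_spike_mol: amount of labeled peptide added (mol)
    n_cells: number of cells used for HLA elution
    recovery: assumed fraction of complexes recovered (1.0 = no loss correction)
    """
    if heavy_signal <= 0 or n_cells <= 0 or not 0 < recovery <= 1:
        raise ValueError("invalid input")
    endogenous_mol = (light_signal / heavy_signal) * heavy_spike_mol
    return endogenous_mol * AVOGADRO / (n_cells * recovery)

# Hypothetical inputs: equal light/heavy signal, 1 fmol spike, ~6.0e8 cells.
print(copies_per_cell(1.0, 1.0, 1e-15, 6.02214076e8))
```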

[229] Tumor cells were co-cultured with the epitope-specific T cells or, as negative controls, with the dextramer-negative CD8+ T cell fraction sorted and expanded under the same conditions. Flow cytometry analysis showed IFN-γ production by approximately 25% of epitope-specific T cells in contact with MDA-MB-231 cells, a significant increase (> 6-fold) over the background observed with non-specific T cells. This IFN-γ production was inhibited by an HLA-A2-blocking monoclonal antibody, demonstrating that the T cell clones specifically recognized the tumor cells in an HLA-A2-restricted manner.

[230] To monitor tumor cell death in real time, we performed an immune cell killing assay using the IncuCyte technology. The T cell clones induced significant, HLA-A2-restricted killing of MDA-MB-231 cells, as shown by the time-dependent increase in Cytotox fluorescent reagent signal in the target cells. In contrast, the dextramer-negative fraction of T cells did not induce significant death of MDA-MB-231 cells, whether or not these cells were pulsed with the peptide (Fig. 6B). Specific lysis at an effector-to-target ratio of E:T = 2:1 was calculated from the quantification of target cell death at 48 hours (see Methods). The alloreactive background was subtracted from this value; it was assessed as the target cell death induced by the corresponding dextramer-negative T cell fraction. Particularly high specific lysis of the tumor cells was achieved with P1- and P2-specific T cells (35% and 44%, respectively), with more moderate lysis (15%) by P6-specific T cells. Specific lysis increased further when the target tumor cells were pulsed with the cognate epitope, reaching 55%, 80% and even 95% for P1-, P2- and P6-specific T cells, respectively. Of note, the epitope-specific T cell clones did not kill HLA-A2-positive human mammary epithelial cells (HMECs), used here as a normal-cell negative control (Fig. 6C). These results were further validated using Nanolive 3D microscopy technology. After 4.5 hours (E:T = 10:1), this imaging showed morphological signs of activation of the specific T cells associated with killing of most tumor cells. Again, no cell death was observed when the specific T cells were co-cultured with HMECs or when MDA-MB-231 cells were co-cultured with non-specific T cells. Similar results were obtained with the HLA-A2-positive TNBC cell line HCC1599 as the target. Altogether, these data show that the selected epitopes elicit high-avidity CD8+ T cell clones that selectively recognize and kill HERV-expressing tumor cells.
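The background subtraction used for specific lysis can be sketched as a simple difference of two percentages: target death with epitope-specific T cells minus target death with the matched dextramer-negative fraction. Both values are taken at the same E:T ratio and time point. The sketch below is a simplification with several assumptions. It assumes that death is expressed as the percentage of Cytotox-positive target cells and that negative differences are clipped to zero. The function names and counts are hypothetical. The exact calculation is given in Methods.

```python
def percent_dead(cytotox_positive, total_targets):
    """Percentage of target cells labeled by the Cytotox dye
    (assumes an object-count readout)."""
    if total_targets <= 0:
        raise ValueError("total_targets must be positive")
    return 100.0 * cytotox_positive / total_targets

def specific_lysis(dead_with_specific, dead_with_control):
    """Background-subtracted lysis (%): death with epitope-specific T cells
    minus death with the dextramer-negative fraction (alloreactive
    background), clipped at 0."""
    return max(0.0, dead_with_specific - dead_with_control)

# Hypothetical 48 h counts at the same E:T ratio.
specific = percent_dead(45, 100)  # epitope-specific T cells
control = percent_dead(10, 100)   # dextramer-negative T cells
print(specific_lysis(specific, control))  # 35.0
```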

HERV-specific T cells are present among tumor infiltrating T cells

[231] To test our hypothesis that an adaptive immune response against HERVs may exist in cancer patients, we used dextramer staining to assess the presence of HERV epitope-specific T cells among tumor-infiltrating lymphocytes (TILs). These TILs came from HLA-A2-positive TNBC patients and were polyclonally expanded without any peptide-specific stimulation. HERV-specific TILs were observed for at least one epitope in 7 of the 11 analyzed tumor samples, with epitope specificity and frequency varying from one patient to another. P1, P4 and P6 were the most frequently recognized peptides, identified by dextramer staining in 4/11, 4/11 and 5/11 cases, respectively.

[232] These results prompted us to investigate the potential link between the outcome of TNBC patients and the expression of the 18 CAHs containing these HLA-A2 epitopes. We established a score based on the mean expression of these 18 HERVs in HLA-A2-positive patients with basal breast cancer from the TCGA cohort. Interestingly, HLA-A2-positive patients with a high or intermediate 18-HERV score had significantly better overall survival than those with a low score (P = 0.0066) (Fig. 7A). This prognostic impact was not observed in the overall population.
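The score described above can be sketched as the per-patient mean of the 18 HERV expression values, followed by a split into three groups. The source does not state how the high, intermediate and low groups were defined. The tertile cut-offs, the expression units, the `herv_score`/`stratify` names and the toy data below are assumptions for illustration. The survival comparison itself (for example, Kaplan-Meier curves with a log-rank test) is not reproduced here.

```python
from statistics import mean, quantiles

def herv_score(expression, herv_ids):
    """Mean expression of the selected HERVs for one patient.
    expression: dict mapping HERV id -> normalized expression (units assumed)."""
    return mean(expression[h] for h in herv_ids)

def stratify(scores):
    """Assign 'low' / 'intermediate' / 'high' by tertiles of the score
    distribution (tertile cut-offs are an assumption)."""
    t1, t2 = quantiles(scores.values(), n=3)
    return {
        patient: "low" if s <= t1 else ("intermediate" if s <= t2 else "high")
        for patient, s in scores.items()
    }

# Hypothetical scores for six patients.
scores = {"p1": 1.0, "p2": 2.0, "p3": 3.0, "p4": 4.0, "p5": 5.0, "p6": 6.0}
print(stratify(scores))
```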

[233] Finally, we evaluated the antitumor activity of HERV-specific T cells against primary tumor cells using organoids derived from the tumor of patient 8 (see Methods). RNA-seq analysis confirmed expression of the 18 epitope-containing CAHs at early and late passages. Tumor organoids were co-cultured with P1-, P6- or CMV-specific CD8+ T cell clones in a Nanolive 3D microscopy experiment (E:T = 10:1). Whereas no T cell activation was observed with CMV-specific T cells, P1- and P6-specific T cells showed signs of active proliferation associated with lysis of the organoids (Fig. 7B).

[234] Taken together, these results suggest that HERV-specific T cells are induced during tumor development and may participate in the antitumor immune response.