Title:
BIOMARKERS OF SQUAMOUS CELL CARCINOMA OF HEAD AND NECK, PROGNOSTIC MARKERS OF RECURRENCE IN SQUAMOUS CELL CARCINOMA OF HEAD AND NECK, AND METHODS THEREOF
Document Type and Number:
WIPO Patent Application WO/2017/077499
Kind Code:
A1
Abstract:
The present disclosure relates to biomarkers of cancer and prognostic markers of recurrence in cancer, particularly head and neck squamous cell carcinomas (HNSCC), method of analysing role of said biomarkers/prognostic markers, corresponding methods of detecting cancer and determining/predicting recurrence, and kits thereof. In particular, the present disclosure relates to analyzing aberrations and providing biomarkers of HNSCC and prognostic markers of recurrence in HNSCC and associated methods/applications. Further, said prognostic markers differentiate non-recurring, loco-regionally recurring and distant metastatic tumors.

Inventors:
PANDA BINAY (IN)
KRISHNAN NEERAJA M (IN)
GUPTA SAURABH (IN)
Application Number:
PCT/IB2016/056652
Publication Date:
May 11, 2017
Filing Date:
November 04, 2016
Assignee:
GENOMICS APPLICATIONS AND INFORMATICS TECH (GANIT) LABS (IN)
International Classes:
G01N33/574; C12Q1/00
Domestic Patent References:
WO2009039341A2 2009-03-26
Other References:
KRISHNAN NM ET AL.: "Integrated analysis of oral tongue squamous cell carcinoma identifies key variants and pathways linked to risk habits, HPV, clinical parameters and tumor recurrence", vol. 4, no. 1215, 11 October 2015 (2015-10-11), pages 1 - 18, XP055381915, Retrieved from the Internet
KRISHNAN NM ET AL.: "Integrated analysis of oral tongue squamous cell carcinoma identifies key variants and pathways linked to risk habits, HPV, clinical parameters and tumor recurrence", 11 October 2015 (2015-10-11), pages 1 - 18, XP055381918, Retrieved from the Internet
Attorney, Agent or Firm:
MUKHARYA, Durgesh et al. (IN)
Claims:
We Claim:

1. A method of predicting recurrence of malignancy in a subject having or suspected of having head and neck squamous cell carcinoma (HNSCC), said method comprising step of detecting aberration in at least one gene selected from a group comprising WASH4P, SLCO1A2, LILRB3, RPS9, ADM2, AKR1C2, ARHGEF4, CBR4, HOXB2, KDM4C, KNDC1, LILRA6, MARCH3, NANOG, POLR1C, PPFIA1, RP3-323P13.2, STC1, TPCN2, TTC39A, BVES, EVI5, COL4A4, COL5A3, DNM1P47, DUX4L9, HNF1A, MED24, MGME1, MUC3A, OVOL2, TAP2, AQR, FBN1, MST1, NOTCH3, TATDN2, TRBV11-1, ABCD1P2, ETV7, FHDC1, RP11-438J1.1, SLC9B1P1 and ZNF618.

2. The method as claimed in claim 1, wherein detecting aberration in the gene comprises steps of:

collecting tumor sample and matched control sample from the subject;

performing clinical, pathological and epidemiological examination of the sample, wherein the clinical and the epidemiological parameter is selected from a group comprising detection of HPV status in the tumor and staging, or a combination thereof;

isolating nucleic acid from the samples by methods selected from a group comprising PureLink RNA mini kit, Qiagen RNA isolation kit or any other total RNA isolation kit;

generating data on gene sequencing, QC alignment, variant identification, post-processing filters, or any combination thereof, using tools selected from a group comprising Agilent SureSelect, Illumina TruSeq, Nextera exome capture kits, HiSeq 2500, GAIIx, Illumina Base caller, NovoAlign, Samtools, GATK, Picard, Dindel and any other tool used to generate said data, or any combination thereof;

detecting cross-contamination in the sample, identifying significant somatic variants, annotating, analyzing the variants, or any combination thereof, using tools selected from a group comprising ContEst, CRAVAT, CHASM analysis, IntOGen, MutSigCV, MutSiC2 and any other tool used to detect cross-contamination in the sample, followed by identifying significant somatic variants, annotating and analyzing the variants, or any combination thereof;

performing SNP genotyping, variant re-validation, or a combination thereof, using tools selected from a group comprising whole-genome SNP genotyping arrays, Qubit, Illumina Human Omni and any other tool used for SNP genotyping and variant re-validation, or any combination thereof;

determining Copy number variations (CNVs), Loss of Heterozygosity (LOH) or a combination thereof, using tools selected from a group comprising cnvPartition 3.1.6 plugin in Illumina Genome Studio v2011.1 and CNV annotator, and any other tool used for determining CNVs and Loss of Heterozygosity, or any combination thereof;

carrying out gene expression profiling using tools selected from a group comprising Illumina HumanHT-12-v4 expression Bead chip, PureLink RNA kit, RNeasy (Qiagen) Mini kit, Agilent Bioanalyzer, RNA Nano6000 chip, Illumina WG-DASL assay, Illumina Total Prep RNA Amplification kit (Ambion), Illumina Hi Scan, Genome Studio, VST (Variance stabilizing transformation), LOESS and R package Lumi, ComBat and any other tool used for gene expression profiling, or any combination thereof;

predicting recurrence using random forest analysis, error correction, recomputing, or any combination thereof using tools selected from a group comprising varSelRF package, leave-one-out bootstrap method, .632+ method, Benjamini-Hochberg test and any other tool used for predicting recurrence, error correction, recomputing, or any combination thereof;

analyzing pathways using tools selected from a group comprising Graphite Web, KEGG, Reactome databases, CytoScape and any other pathway analysis tool, or any combination thereof;

visualizing data using tools selected from a group comprising Circos, Mutation Mapper, GIMP, IGV and any other data visualizing tool, or any combination thereof;

validating somatic variants using Sanger sequencing or any other tool known for validating somatic variant, to analyse the role of genetic aberration(s) in HNSCC; and

performing statistical analyses to determine biomarkers for recurrence of the malignancy using any statistical analyses tool capable of determining biomarkers for the recurrence.

3. Aberration of WASH4P, SLCO1A2, LILRB3, RPS9, ADM2, AKR1C2, ARHGEF4, CBR4, HOXB2, KDM4C, KNDC1, LILRA6, MARCH3, NANOG, POLR1C, PPFIA1, RP3-323P13.2, STC1, TPCN2, TTC39A, BVES, EVI5, COL4A4, COL5A3, DNM1P47, DUX4L9, HNF1A, MED24, MGME1, MUC3A, OVOL2, TAP2, AQR, FBN1, MST1, NOTCH3, TATDN2, TRBV11-1, ABCD1P2, ETV7, FHDC1, RP11-438J1.1, SLC9B1P1 or ZNF618, or any combination thereof for predicting recurrence of malignancy in a subject having or suspected of having HNSCC.

4. Use of aberration of WASH4P, SLCO1A2, LILRB3, RPS9, ADM2, AKR1C2, ARHGEF4, CBR4, HOXB2, KDM4C, KNDC1, LILRA6, MARCH3, NANOG, POLR1C, PPFIA1, RP3-323P13.2, STC1, TPCN2, TTC39A, BVES, EVI5, COL4A4, COL5A3, DNM1P47, DUX4L9, HNF1A, MED24, MGME1, MUC3A, OVOL2, TAP2, AQR, FBN1, MST1, NOTCH3, TATDN2, TRBV11-1, ABCD1P2, ETV7, FHDC1, RP11-438J1.1, SLC9B1P1 or ZNF618, or any combination thereof for predicting recurrence of malignancy in a subject having or suspected of having HNSCC.

5. A kit for predicting recurrence of malignancy in a subject having or suspected of having HNSCC, said kit comprising agent for detecting aberration of WASH4P, SLCO1A2, LILRB3, RPS9, ADM2, AKR1C2, ARHGEF4, CBR4, HOXB2, KDM4C, KNDC1, LILRA6, MARCH3, NANOG, POLR1C, PPFIA1, RP3-323P13.2, STC1, TPCN2, TTC39A, BVES, EVI5, COL4A4, COL5A3, DNM1P47, DUX4L9, HNF1A, MED24, MGME1, MUC3A, OVOL2, TAP2, AQR, FBN1, MST1, NOTCH3, TATDN2, TRBV11-1, ABCD1P2, ETV7, FHDC1, RP11-438J1.1, SLC9B1P1 or ZNF618, or any combination thereof, for predicting recurrence of malignancy in a subject having or suspected of having HNSCC.

6. The method as claimed in claim 1, the aberration as claimed in claim 3, use as claimed in claim 4 and the kit as claimed in claim 5, wherein the recurrence is loco-regional recurrence or distant metastasis or both.

7. The method as claimed in claim 1, the aberration as claimed in claim 3, use as claimed in claim 4 and the kit as claimed in claim 5, wherein the aberration in gene selected from a group comprising RPS9, MARCH3, LILRB3, LILRA6, ADM2, TTC39A, TPCN2, STC1, RP3-323P13.2, PPFIA1, POLR1C, NANOG, KNDC1, KDM4C, HOXB2, CBR4, ARHGEF4 and AKR1C2, or any combination thereof, predicts loco-regional recurrence of the malignancy in the subject.

8. The method or the aberration or the use or the kit as claimed in claim 7, wherein the aberration in at least 4 genes selected from a group comprising RPS9, MARCH3, LILRB3, LILRA6, ADM2, TTC39A, TPCN2, STC1, RP3-323P13.2, PPFIA1, POLR1C, NANOG, KNDC1, KDM4C, HOXB2, CBR4, ARHGEF4 and AKR1C2, predicts the loco-regional recurrence and wherein the at least 4 genes mandatorily comprise at least one gene selected from RPS9, LILRA6, LILRB3, MARCH3 or ADM2, to predict the loco-regional recurrence in the subject.

9. The method as claimed in claim 1, the aberration as claimed in claim 3, use as claimed in claim 4 and the kit as claimed in claim 5, wherein the aberration in gene selected from a group comprising AQR, FBN1, MST1, NOTCH3, TATDN2, TRBV11-1, ABCD1P2, ETV7, FHDC1, RP11-438J1.1, SLC9B1P1 and ZNF618, or any combination thereof, predicts distant metastasis of the malignancy in the subject.

10. The method or the aberration or the use or the kit as claimed in claim 9, wherein the aberration in

AQR, FBN1, MST1, NOTCH3, TATDN2 and TRBV11-1;

AQR, FBN1, MST1, NOTCH3 and TATDN2;

AQR, FBN1, MST1, NOTCH3 and TRBV11-1;

AQR, FBN1, MST1, TATDN2 and TRBV11-1;

AQR, FBN1, NOTCH3, TATDN2 and TRBV11-1;

AQR, MST1, NOTCH3, TATDN2 and TRBV11-1; or

FBN1, MST1, NOTCH3, TATDN2 and TRBV11-1 predicts distant metastasis of the malignancy in the subject.
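The seven gene sets listed in claim 10 are the full six-gene set plus every five-gene subset of it. As an illustration only, the following Python sketch enumerates those sets and checks whether a list of aberrant genes covers one of them. The function names and the input format are hypothetical and not part of the disclosure, the last symbol is written here as TRBV11-1, and the check is plain set membership, not the prediction method itself.

```python
from itertools import combinations

# Six genes named in claim 10 (symbols as read from the claim text).
METASTASIS_CORE = ("AQR", "FBN1", "MST1", "NOTCH3", "TATDN2", "TRBV11-1")


def claimed_combinations():
    """Return the full six-gene set followed by each of its five-gene subsets."""
    combos = [frozenset(METASTASIS_CORE)]
    combos += [frozenset(c) for c in combinations(METASTASIS_CORE, 5)]
    return combos


def matches_claimed_combination(aberrant_genes):
    """True if the observed aberrant genes include at least one listed set."""
    observed = set(aberrant_genes)
    return any(combo <= observed for combo in claimed_combinations())
```

Under this reading, a sample with aberrations in any five of the six genes satisfies the rule, while a sample with only four does not.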

11. The method as claimed in claim 1, the aberration as claimed in claim 3, use as claimed in claim 4 and the kit as claimed in claim 5, wherein the aberration is selected from a group comprising up-regulation, down-regulation, amplification, mutation, loss of heterozygosity, copy number variations, structural variations, somatic mutations, gene fusion events, allelic expression, chromosomal aberrations, epigenetic changes, DNA methylation, histone modification and non-coding RNA (ncRNA)-associated gene silencing, or any combination thereof.

12. The method as claimed in claim 1, the aberration as claimed in claim 3, use as claimed in claim 4 and the kit as claimed in claim 5, wherein the HNSCC is selected from a group comprising cancer of hypopharynx, laryngeal cancer, cancer of oral cavity, nasopharyngeal cancer, oropharyngeal squamous cell carcinomas and cancer of trachea.

13. A method of detecting head and neck squamous cell carcinoma (HNSCC) in a sample having or suspected of having the HNSCC, said method comprising a step of detecting aberration in TP53, CASP8, OBSCN, ING1, TTK, U2AF1, RASA1, CDKN2A, NOTCH1, NOTCH2, DMD, PIK3CA, AJUBA, ANK3, FAT1, HLA-A, KEAP1, KMT2D and NFE2L2, in the sample to detect the HNSCC.

14. The method as claimed in claim 13, wherein the aberration is selected from a group comprising up-regulation, down-regulation, amplification, mutation, loss of heterozygosity, copy number variations, structural variations, somatic mutations, gene fusion events, allelic expression, chromosomal aberrations, epigenetic changes, DNA methylation, histone modification and non-coding RNA (ncRNA)-associated gene silencing, or any combination thereof and wherein the HNSCC is selected from a group comprising cancer of hypopharynx, laryngeal cancer, cancer of oral cavity, nasopharyngeal cancer, oropharyngeal squamous cell carcinomas and cancer of trachea.

15. The method as claimed in claim 13, wherein the aberration in CASP8 serves as an indicator in an HPV-negative HNSCC subject.

16. The method as claimed in claim 13, wherein detecting aberration in the gene comprises steps of:

collecting tumor sample and matched control sample from the subject;

performing clinical, pathological and epidemiological examination of the sample, wherein the clinical and the epidemiological parameter is selected from a group comprising detection of HPV status in the tumor and staging, or a combination thereof;

isolating nucleic acid from the samples by methods selected from a group comprising PureLink RNA mini kit, Qiagen RNA isolation kit or any other total RNA isolation kit;

generating data on gene sequencing, QC alignment, variant identification, post-processing filters, or any combination thereof, using tools selected from a group comprising Agilent SureSelect, Illumina TruSeq, Nextera exome capture kits, HiSeq 2500, GAIIx, Illumina Base caller, NovoAlign, Samtools, GATK, Picard, Dindel and any other tool used to generate said data, or any combination thereof;

detecting cross-contamination in the sample, identifying significant somatic variants, annotating, analyzing the variants, or any combination thereof, using tools selected from a group comprising ContEst, CRAVAT, CHASM analysis, IntOGen, MutSigCV, MutSiC2 and any other tool used to detect cross-contamination in the sample, followed by identifying significant somatic variants, annotating and analyzing the variants, or any combination thereof;

performing SNP genotyping, variant re-validation, or a combination thereof, using tools selected from a group comprising whole-genome SNP genotyping arrays, Qubit, Illumina Human Omni and any other tool used for SNP genotyping and variant re-validation, or any combination thereof;

determining Copy number variations (CNVs), Loss of Heterozygosity (LOH) or a combination thereof, using tools selected from a group comprising cnvPartition 3.1.6 plugin in Illumina Genome Studio v2011.1 and CNV annotator, and any other tool used for determining CNVs and Loss of Heterozygosity, or any combination thereof;

carrying out gene expression profiling using tools selected from a group comprising Illumina HumanHT-12-v4 expression Bead chip, PureLink RNA kit, RNeasy (Qiagen) Mini kit, Agilent Bioanalyzer, RNA Nano6000 chip, Illumina WG-DASL assay, Illumina Total Prep RNA Amplification kit (Ambion), Illumina Hi Scan, Genome Studio, VST (Variance stabilizing transformation), LOESS and R package Lumi, ComBat and any other tool used for gene expression profiling, or any combination thereof;

predicting recurrence using random forest analysis, error correction, recomputing, or any combination thereof using tools selected from a group comprising varSelRF package, leave-one-out bootstrap method, .632+ method, Benjamini-Hochberg test and any other tool used for predicting recurrence, error correction, recomputing, or any combination thereof;

analyzing pathways using tools selected from a group comprising Graphite Web, KEGG, Reactome databases, CytoScape and any other pathway analysis tool, or any combination thereof;

visualizing data using tools selected from a group comprising Circos, Mutation Mapper, GIMP, IGV and any other data visualizing tool, or any combination thereof; and

validating somatic variants using Sanger sequencing or any other tool known for validating somatic variant, to analyse the role of genetic aberration(s) in HNSCC.
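The claims name the leave-one-out bootstrap and the .632+ method among the tools for error correction without describing them. As a rough illustration only, the following Python sketch combines a training (resubstitution) error and a leave-one-out bootstrap error into a .632+ estimate, following the commonly cited formulation of Efron and Tibshirani. The function name, argument names and any numeric inputs are hypothetical; the disclosure does not state how its error estimates were computed beyond naming the method.

```python
def err_632_plus(train_err, loo_boot_err, no_info_err):
    """Sketch of a .632+ prediction-error estimate.

    train_err    -- error of the classifier on its own training data
    loo_boot_err -- leave-one-out bootstrap error
    no_info_err  -- error expected if predictors and labels were independent
    """
    # The bootstrap error is capped at the no-information error rate.
    err1 = min(loo_boot_err, no_info_err)

    # Relative overfitting rate, set to 0 when there is no sign of overfitting.
    if err1 > train_err and no_info_err > train_err:
        r = (err1 - train_err) / (no_info_err - train_err)
    else:
        r = 0.0

    # The weight moves from 0.632 (no overfitting) towards 1 (severe overfitting).
    w = 0.632 / (1.0 - 0.368 * r)
    return (1.0 - w) * train_err + w * err1
```

When the overfitting rate is zero, the weight stays at 0.632 and the estimate reduces to the plain .632 bootstrap estimate.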

Description:
"BIOMARKERS OF SQUAMOUS CELL CARCINOMA OF HEAD AND NECK, PROGNOSTIC MARKERS OF RECURRENCE IN SQUAMOUS CELL CARCINOMA OF HEAD AND NECK, AND METHODS THEREOF"

TECHNICAL FIELD

The present disclosure relates to the field of Oncology, Molecular Biology, Genomics and Bioinformatics. The present disclosure relates to biomarkers of cancer and prognostic markers of recurrence in cancer, particularly head and neck squamous cell carcinomas (HNSCC), method of analysing role of said biomarkers/prognostic markers, corresponding methods of detecting cancer and determining/predicting recurrence, and kits thereof. In particular, the present disclosure relates to analyzing aberrations and providing biomarkers of identification of HNSCC and prognostic markers of recurrence in HNSCC and associated methods/applications. Further, said prognostic markers differentiate non-recurring, loco-regionally recurring and distant metastatic tumors.

BACKGROUND OF THE DISCLOSURE

Squamous cell carcinomas of head and neck (HNSCC) are the sixth leading cause of cancer worldwide. Tumors of the head and neck region are heterogeneous in nature, with different incidences, mortalities and prognoses for different subsites, and account for almost 30% of all cancer cases in India. Oral cancer is the most common subtype of head and neck cancers in humans, with a worldwide incidence of >300,000 cases. The disease is an important cause of death and morbidity, with a 5-year survival of less than 50%. Recent studies have identified various genetic changes in many subsites of head and neck using high-throughput sequencing assays and computational methods. Such multi-tiered approaches using the exomes, genomes, transcriptomes and methylomes from different squamous cell carcinomas have generated data on key variants and, in some cases, their biological significance, aiding the understanding of disease progression. Some of the above sequencing studies have identified key somatic variants and linked them with patient stratification and prognostication. This, along with the associated epidemiology, enables one to look beyond the identification of driver mutations, and identify predictive signatures in HNSCC.

A previous study from The Cancer Genome Atlas (TCGA) consortium with HNSCC patients (N = 279) identified somatic mutations in TP53, CDKN2A, FAT1, PIK3CA, NOTCH1, KMT2D and NSD1 at a frequency greater than 10%. Additionally, the TCGA study identified loss of the TRAF3 gene and amplification of E2F1 in human papillomavirus (HPV)-positive oropharyngeal tumors, along with mutations in PIK3CA, CASP8 and HRAS, and co-amplifications of the regions 11q13 (harboring CCND1, FADD and CTTN) and 11q22 (harboring BIRC2 and YAP1), in HPV-negative tumors, described to play an important role in pathogenesis and tumor development. Chromosomal losses at 3p and 8p, and gains at 3q, 5p and 8q were also observed in HNSCC. Tumors originating in the anterior/oral part of the tongue, or oral tongue squamous cell carcinoma (OTSCC), tend to be different from those at other subsites, as oral tongue tumors are associated more with younger patients and spread early to lymph nodes. Additionally, oral tongue tumors have a higher regional failure compared to gingivo-buccal cases in the oral cavity. Tobacco (both chewing and smoking) and alcohol are common risk factors for this group of tumors among older patients.

Previous sequencing studies grouped oral tongue tumors with tumors from the oral cavity, but a rise in the incidence of oral tongue tumors, especially among younger people who never smoked, drank alcohol or chewed tobacco, warrants further investigation of this subgroup of oral tumors. Additionally, the role of HPV in oral tongue tumors, unlike in oropharyngeal cases, is not well understood both in terms of incidence and prognosis. A meta-analysis of HPV-positive HNSCC tumors from multiple studies conducted at multiple locations concluded that HPV-positive patients, especially in oropharynx, have improved overall and disease-specific survival. A past study has presented data that the HPV incidence in oral tongue is low, and some argue against any link between HPV infection and aggressive oral tongue tumors. Although there is no consensus on the rate of HPV incidence among oral tongue patients, it is generally believed that it is low compared to oropharyngeal tumors. Further, some studies in the past, albeit from a different geography, established a much higher rate of HPV infection in oral tongue tumors.

However, the aforesaid prior art study/methods do not provide specific reliable indicators/diagnostic biomarkers of HNSCC and/or predictors/prognostic markers of recurrence and distant metastasis in HNSCC based on holistic genome analysis and determination of genetic variations/aberrations.

Hence, there exists a need for providing improved and reliable indicators/diagnostic biomarkers of squamous cell carcinomas of head and neck (HNSCC) including but not limiting to oral tongue squamous cell carcinoma (OTSCC), and predictors/prognostic markers of recurrence and distant metastasis in HNSCC including but not limiting to OTSCC, and employing such indicators/biomarkers/predictors for understanding and practical management of HNSCC. The present disclosure tries to address the above mentioned drawbacks of prior art.

SUMMARY OF THE DISCLOSURE

Accordingly, the present disclosure relates to a method of predicting recurrence of malignancy in a subject having or suspected of having head and neck squamous cell carcinoma (HNSCC), said method comprising step of detecting aberration in at least one gene selected from a group comprising WASH4P, SLCO1A2, LILRB3, RPS9, ADM2, AKR1C2, ARHGEF4, CBR4, HOXB2, KDM4C, KNDC1, LILRA6, MARCH3, NANOG, POLR1C, PPFIA1, RP3-323P13.2, STC1, TPCN2, TTC39A, BVES, EVI5, COL4A4, COL5A3, DNM1P47, DUX4L9, HNF1A, MED24, MGME1, MUC3A, OVOL2, TAP2, AQR, FBN1, MST1, NOTCH3, TATDN2, TRBV11-1, ABCD1P2, ETV7, FHDC1, RP11-438J1.1, SLC9B1P1 and ZNF618.

The disclosure further relates to aberration of WASH4P, SLCO1A2, LILRB3, RPS9, ADM2, AKR1C2, ARHGEF4, CBR4, HOXB2, KDM4C, KNDC1, LILRA6, MARCH3, NANOG, POLR1C, PPFIA1, RP3-323P13.2, STC1, TPCN2, TTC39A, BVES, EVI5, COL4A4, COL5A3, DNM1P47, DUX4L9, HNF1A, MED24, MGME1, MUC3A, OVOL2, TAP2, AQR, FBN1, MST1, NOTCH3, TATDN2, TRBV11-1, ABCD1P2, ETV7, FHDC1, RP11-438J1.1, SLC9B1P1 or ZNF618, or any combination thereof for predicting recurrence of malignancy and distant metastasis in a subject having or suspected of having HNSCC.

The disclosure further relates to use of aberration of WASH4P, SLCO1A2, LILRB3, RPS9, ADM2, AKR1C2, ARHGEF4, CBR4, HOXB2, KDM4C, KNDC1, LILRA6, MARCH3, NANOG, POLR1C, PPFIA1, RP3-323P13.2, STC1, TPCN2, TTC39A, BVES, EVI5, COL4A4, COL5A3, DNM1P47, DUX4L9, HNF1A, MED24, MGME1, MUC3A, OVOL2, TAP2, AQR, FBN1, MST1, NOTCH3, TATDN2, TRBV11-1, ABCD1P2, ETV7, FHDC1, RP11-438J1.1, SLC9B1P1 or ZNF618, or any combination thereof for predicting recurrence and distant metastasis of malignancy in a subject having or suspected of having HNSCC.

The disclosure further relates to a kit for predicting recurrence of malignancy in a subject having or suspected of having HNSCC, said kit comprising agent for detecting aberration of WASH4P, SLCO1A2, LILRB3, RPS9, ADM2, AKR1C2, ARHGEF4, CBR4, HOXB2, KDM4C, KNDC1, LILRA6, MARCH3, NANOG, POLR1C, PPFIA1, RP3-323P13.2, STC1, TPCN2, TTC39A, BVES, EVI5, COL4A4, COL5A3, DNM1P47, DUX4L9, HNF1A, MED24, MGME1, MUC3A, OVOL2, TAP2, AQR, FBN1, MST1, NOTCH3, TATDN2, TRBV11-1, ABCD1P2, ETV7, FHDC1, RP11-438J1.1, SLC9B1P1 or ZNF618, or any combination thereof, for predicting recurrence of malignancy in a subject having or suspected of having HNSCC.

BRIEF DESCRIPTION OF THE ACCOMPANYING FIGURES

In order that the disclosure may be readily understood and put into practical effect, reference will now be made to exemplary embodiments as illustrated with reference to the accompanying figures. The figures, together with a detailed description and tables below, are incorporated in and form part of the specification, and serve to further illustrate the embodiments and explain various principles and advantages, in accordance with the present disclosure where:

Figure 1 shows key variants in OTSCC and their relationship with habits, clinical and epidemiological parameters.

A. The OTSCC samples are represented in color-codes with their corresponding status on; node (Y: positive, N: negative); stage (E: early, L: late), recurrence (Y: loco-regionally recurrent, N: non-recurrent and M: distant metastatic); grade (WD: well-differentiated, MD: moderately-differentiated and PD: poorly-differentiated); disease-free survival or DFS (L: low, M: mid and H: high); HPV DNA (Y: positive and N: negative); and habits (chewing, alcohol and smoking, Y: positive and N: negative). B. Somatic mutation frequency per megabase (MB) is represented as scatterplot with the median point as a fine dotted line (only somatic non-synonymous mutations/mb is around 1.75/mb). C. Genes with significant somatic variants. D. Frequency histogram of nineteen cancer-associated genes bearing somatic missense and nonsense variants (mutations and indels). E. Columns representing mutually exclusive sets of genes. F. Significant copy number insertions and deletions (CNVs), alongside the chromosome cytoband. The numbers of cancer-associated genes within each cytoband are listed on the right.

Figure 2 shows relationship between genes harboring somatic variants with clinical, epidemiological parameters and signaling pathways.

A. Histograms showing relationship between genes with significant somatic variants (>10% frequency) and various clinical and epidemiological parameters. B. Stack net charts of relative patient fraction (%) for each of the eight cancer-associated signaling pathways and their relationship with various clinical and epidemiological parameters.

Figure 3 shows differentially expressed genes, affected pathways and their relationship with clinical and epidemiological parameters.

A. Expression changes (green - up-regulation, red - down-regulation) representing significantly differentially expressed genes in tumors. B. Stacked histograms representing relative patient fraction (%) for each of the 19 cancer-associated pathways and their relationship with clinical and epidemiological parameters.

Figure 4 shows role of CASP8 in HPV-positive and HPV-negative OTSCC cell lines.

Results from the A. Matrigel cell invasion assay (plotted with respect to the control cells), B. Wound healing assay, and C. MTT cell survival assay (plotted with respect to the control cells) in UPCI:SCC040 (HPV-negative) and UMSCC-47 (HPV-positive) cell lines.

Figure 5 shows a minimal gene signature for tumor recurrence (loco-regional recurrence) and distant metastasis.

Genes harboring somatic variants (in color) that are a part of the minimal signature set for tumor recurrence and distant metastasis derived from random forest analyses.

Figure 6 shows cytoband-wise representation of CNVs found in all 48 samples along with clinical parameters and patient epidemiology.

Figure 7 shows circular genomic representation using Circos (v0.66) of LOHs with > 10% frequency of patients bearing them, somatic indels and mutations, and genes with significant expression changes (|log2FC|>0.6).

Figure 8 shows gene signature for locoregional recurrence of tumor/malignancy in HNSCC.

Figure 9 shows gene signature for distant metastasis of tumor/malignancy in HNSCC.

DETAILED DESCRIPTION OF THE DISCLOSURE

The present disclosure relates to indicators/biomarkers of squamous cell carcinoma of head and neck (HNSCC) and predictors/prognostic markers of recurrence in squamous cell carcinoma of head and neck (HNSCC). The present disclosure relates to a method of predicting recurrence of malignancy in a subject having or suspected of having head and neck squamous cell carcinoma (HNSCC), said method comprising step of detecting aberration in at least one gene selected from a group comprising WASH4P, SLCO1A2, LILRB3, RPS9, ADM2, AKR1C2, ARHGEF4, CBR4, HOXB2, KDM4C, KNDC1, LILRA6, MARCH3, NANOG, POLR1C, PPFIA1, RP3-323P13.2, STC1, TPCN2, TTC39A, BVES, EVI5, COL4A4, COL5A3, DNM1P47, DUX4L9, HNF1A, MED24, MGME1, MUC3A, OVOL2, TAP2, AQR, FBN1, MST1, NOTCH3, TATDN2, TRBV11-1, ABCD1P2, ETV7, FHDC1, RP11-438J1.1, SLC9B1P1 and ZNF618.

In an embodiment, the disclosure relates to a method for detecting aberration in the gene, said method comprises steps of:

collecting tumor sample and matched control sample from the subject;

performing clinical, pathological and epidemiological examination of the sample, wherein the clinical and the epidemiological parameter is selected from a group comprising detection of HPV status in the tumor and staging, or a combination thereof;

isolating nucleic acid from the samples by methods selected from a group comprising PureLink RNA mini kit, Qiagen RNA isolation kit or any other total RNA isolation kit;

generating data on gene sequencing, QC alignment, variant identification, post-processing filters, or any combination thereof, using tools selected from a group comprising Agilent SureSelect, Illumina TruSeq, Nextera exome capture kits, HiSeq 2500, GAIIx, Illumina Base caller, NovoAlign, Samtools, GATK, Picard, Dindel and any other tool used to generate said data, or any combination thereof;

detecting cross-contamination in the sample, identifying significant somatic variants, annotating, analyzing the variants, or any combination thereof, using tools selected from a group comprising ContEst, CRAVAT, CHASM analysis, IntOGen, MutSigCV, MutSiC2 and any other tool used to detect cross-contamination in the sample, followed by identifying significant somatic variants, annotating and analyzing the variants, or any combination thereof;

performing SNP genotyping, variant re-validation, or a combination thereof, using tools selected from a group comprising whole-genome SNP genotyping arrays, Qubit, Illumina Human Omni and any other tool used for SNP genotyping and variant re-validation, or any combination thereof;

determining Copy number variations (CNVs), Loss of Heterozygosity (LOH) or a combination thereof, using tools selected from a group comprising cnvPartition 3.1.6 plugin in Illumina Genome Studio v2011.1 and CNV annotator, and any other tool used for determining CNVs and Loss of Heterozygosity, or any combination thereof;

carrying out gene expression profiling using tools selected from a group comprising Illumina HumanHT-12-v4 expression Bead chip, PureLink RNA kit, RNeasy (Qiagen) Mini kit, Agilent Bioanalyzer, RNA Nano6000 chip, Illumina WG-DASL assay, Illumina Total Prep RNA Amplification kit (Ambion), Illumina Hi Scan, Genome Studio, VST (Variance stabilizing transformation), LOESS and R package Lumi, ComBat and any other tool used for gene expression profiling, or any combination thereof;

predicting recurrence using random forest analysis, error correction, recomputing, or any combination thereof using tools selected from a group comprising varSelRF package, leave-one-out bootstrap method, .632+ method, Benjamini-Hochberg test and any other tool used for predicting recurrence, error correction, recomputing, or any combination thereof;

analyzing pathways using tools selected from a group comprising Graphite Web, KEGG, Reactome databases, CytoScape and any other pathway analysis tool, or any combination thereof;

visualizing data using tools selected from a group comprising Circos, Mutation Mapper, GIMP, IGV and any other data visualizing tool, or any combination thereof;

validating somatic variants using Sanger sequencing or any other tool known for validating somatic variant, to analyse the role of genetic aberration(s) in HNSCC; and

performing statistical analyses to determine biomarkers for recurrence of the malignancy using any statistical analyses tool capable of determining biomarkers for the recurrence.
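Among the listed tools, the Hochberg test is read here as the Benjamini-Hochberg false discovery rate procedure. As an illustration only, the following Python sketch shows how that procedure adjusts a list of p-values. The function name is hypothetical, any p-values passed to it would be the user's own, and the disclosure does not specify at which step or on which statistics the correction was applied.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A test is then called significant at a chosen false discovery rate, for example 0.05, when its adjusted value falls at or below that rate.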

The disclosure further relates to aberration of WASH4P, SLCO1A2, LILRB3, RPS9, ADM2, AKR1C2, ARHGEF4, CBR4, HOXB2, KDM4C, KNDC1, LILRA6, MARCH3, NANOG, POLR1C, PPFIA1, RP3-323P13.2, STC1, TPCN2, TTC39A, BVES, EVI5, COL4A4, COL5A3, DNM1P47, DUX4L9, HNF1A, MED24, MGME1, MUC3A, OVOL2, TAP2, AQR, FBN1, MST1, NOTCH3, TATDN2, TRBV11-1, ABCD1P2, ETV7, FHDC1, RP11-438J1.1, SLC9B1P1 or ZNF618, or any combination thereof for predicting recurrence and distant metastasis of malignancy in a subject having or suspected of having HNSCC.

The disclosure furthermore relates to use of aberration of WASH4P, SLCO1A2, LILRB3, RPS9, ADM2, AKR1C2, ARHGEF4, CBR4, HOXB2, KDM4C, KNDC1, LILRA6, MARCH3, NANOG, POLR1C, PPFIA1, RP3-323P13.2, STC1, TPCN2, TTC39A, BVES, EVI5, COL4A4, COL5A3, DNM1P47, DUX4L9, HNF1A, MED24, MGME1, MUC3A, OVOL2, TAP2, AQR, FBN1, MST1, NOTCH3, TATDN2, TRBV11-1, ABCD1P2, ETV7, FHDC1, RP11-438J1.1, SLC9B1P1 or ZNF618, or any combination thereof for predicting recurrence and distant metastasis of malignancy in a subject having or suspected of having HNSCC.

The disclosure furthermore relates to a kit for predicting recurrence of malignancy in a subject having or suspected of having HNSCC, said kit comprising an agent for detecting aberration of WASH4P, SLCO1A2, LILRB3, RPS9, ADM2, AKR1C2, ARHGEF4, CBR4, HOXB2, KDM4C, KNDC1, LILRA6, MARCH3, NANOG, POLR1C, PPFIA1, RP3-323P13.2, STC1, TPCN2, TTC39A, BVES, EVI5, COL4A4, COL5A3, DNM1P47, DUX4L9, HNF1A, MED24, MGME1, MUC3A, OVOL2, TAP2, AQR, FBN1, MST1, NOTCH3, TATDN2, TRBV11-1, ABCD1P2, ETV7, FHDC1, RP11-438J1.1, SLC9B1P1 or ZNF618, or any combination thereof, for predicting recurrence of malignancy in a subject having or suspected of having HNSCC. In an embodiment of the present disclosure, the recurrence is loco-regional recurrence or distant metastasis or both.

In another embodiment of the present disclosure, the aberration in a gene selected from a group comprising RPS9, MARCH3, LILRB3, LILRA6, ADM2, TTC39A, TPCN2, STC1, RP3-323P13.2, PPFIA1, POLR1C, NANOG, KNDC1, KDM4C, HOXB2, CBR4, ARHGEF4 and AKR1C2, or any combination thereof, predicts loco-regional recurrence of the malignancy in the subject. In another embodiment of the present disclosure, the aberration in at least 4 genes selected from a group comprising RPS9, MARCH3, LILRB3, LILRA6, ADM2, TTC39A, TPCN2, STC1, RP3-323P13.2, PPFIA1, POLR1C, NANOG, KNDC1, KDM4C, HOXB2, CBR4, ARHGEF4 and AKR1C2 predicts the loco-regional recurrence, wherein the at least 4 genes mandatorily comprise at least one gene selected from RPS9, LILRA6, LILRB3, MARCH3 or ADM2, to predict the loco-regional recurrence in the subject.

In another embodiment of the present disclosure, the aberration in a gene selected from a group comprising AQR, FBN1, MST1, NOTCH3, TATDN2, TRBV11-1, ABCD1P2, ETV7, FHDC1, RP11-438J1.1, SLC9B1P1 and ZNF618, or any combination thereof, predicts distant metastasis of the malignancy in the subject.

In another embodiment of the present disclosure, the aberration in:

AQR, FBN1, MST1, NOTCH3, TATDN2 and TRBV11-1;

AQR, FBN1, MST1, NOTCH3 and TATDN2;

AQR, FBN1, MST1, NOTCH3 and TRBV11-1;

AQR, FBN1, MST1, TATDN2 and TRBV11-1;

AQR, FBN1, NOTCH3, TATDN2 and TRBV11-1;

AQR, MST1, NOTCH3, TATDN2 and TRBV11-1; or

FBN1, MST1, NOTCH3, TATDN2 and TRBV11-1

predicts distant metastasis of the malignancy in the subject.

In another embodiment of the present disclosure, the aberration is selected from a group comprising up-regulation, down-regulation, amplification, mutation, loss of heterozygosity, copy number variations, structural variations, somatic mutations, gene fusion events, allelic expression, chromosomal aberrations, epigenetic changes, DNA methylation, histone modification and non-coding RNA (ncRNA)-associated gene silencing, or any combination thereof.

In another embodiment of the present disclosure, the HNSCC is selected from a group comprising cancer of hypopharynx, laryngeal cancer, cancer of oral cavity, nasopharyngeal cancer, oropharyngeal squamous cell carcinomas and cancer of trachea.

The present disclosure furthermore relates to a method of detecting head and neck squamous cell carcinoma (HNSCC) in a sample having or suspected of having the HNSCC, said method comprising a step of detecting aberration in TP53, CASP8, OBSCN, ING1, TTK, U2AF1, RASA1, CDKN2A, NOTCH1, NOTCH2, DMD, PIK3CA, AJUBA, ANK3, FAT1, HLAA, KEAP1, KMT2D and NFE2L2, in the sample to detect the HNSCC.

In another embodiment of the present disclosure, the aberration is selected from a group comprising up-regulation, down-regulation, amplification, mutation, loss of heterozygosity, copy number variations, structural variations, somatic mutations, gene fusion events, allelic expression, chromosomal aberrations, epigenetic changes, DNA methylation, histone modification and non-coding RNA (ncRNA)-associated gene silencing, or any combination thereof and wherein the HNSCC is selected from a group comprising cancer of hypopharynx, laryngeal cancer, cancer of oral cavity, nasopharyngeal cancer, oropharyngeal squamous cell carcinomas and cancer of trachea.

In another embodiment of the present disclosure, the aberration in CASP8 serves as an indicator in an HPV-negative HNSCC subject.

In another embodiment of the present disclosure, the method of detecting aberration in the gene comprises steps of:

collecting tumor sample and matched control sample from the subject; performing clinical, pathological and epidemiological examination of the sample, wherein the clinical and the epidemiological parameter is selected from a group comprising detection of HPV status in the tumor and staging, or a combination thereof; isolating nucleic acid from the samples by methods selected from a group comprising PureLink RNA mini kit, Qiagen RNA isolation kit or any other total RNA isolation kit; generating data on gene sequencing, QC alignment, variant identification, post-processing filters, or any combination thereof, using tools selected from a group comprising Agilent SureSelect, Illumina TruSeq, Nextera exome capture kits, HiSeq 2500, GAIIx, Illumina Base caller, NovoAlign, Samtools, GATK, Picard, Dindel and any other tool used to generate said data, or any combination thereof; detecting cross-contamination in the sample, identifying significant somatic variants, annotating, analyzing the variants, or any combination thereof, using tools selected from a group comprising ContEst, CRAVAT, CHASM analysis, IntOGen, MutSigCV, MutSiC2 and any other tool used to detect cross-contamination in the sample, identify significant somatic variants, annotate and analyze the variants, or any combination thereof; performing SNP genotyping, variant re-validation, or a combination thereof, using tools selected from a group comprising whole-genome SNP genotyping arrays, Qubit, Illumina Human Omni and any other tool used for SNP genotyping and variant re-validation, or any combination thereof; determining Copy number variations (CNVs), Loss of Heterozygosity (LOH) or a combination thereof, using tools selected from a group comprising cnvPartition 3.1.6 plugin in Illumina Genome Studio v2011.1 and CNV annotator, and any other tool used for determining CNVs and Loss of Heterozygosity, or any combination thereof; carrying out gene expression profiling using tools selected from a group comprising Illumina HumanHT-12-v4 expression BeadChip, PureLink RNA kit, RNeasy (Qiagen) Mini kit, Agilent Bioanalyzer, RNA Nano6000 chip, Illumina WGDASL assay, Illumina Total Prep RNA Amplification kit (Ambion), Illumina HiScan, Genome Studio, VST (Variance stabilizing transformation), LOESS and R package Lumi, ComBat and any other tool used for gene expression profiling, or any combination thereof; predicting recurrence using random forest analysis, error correction, recomputing, or any combination thereof using tools selected from a group comprising varSelRF package, leave-one-out bootstrap method, .632+ method, Benjamini-Hochberg test and any other tool used for predicting recurrence, error correction, recomputing, or any combination thereof; analyzing pathways using tools selected from a group comprising Graphite Web, KEGG, Reactome databases, CytoScape and any other pathway analysis tool, or any combination thereof; visualizing data using tools selected from a group comprising Circos, Mutation Mapper, GIMP, IGV and any other data visualizing tool, or any combination thereof; and validating somatic variants using Sanger sequencing or any other tool known for validating somatic variants, to analyse the role of genetic aberration(s) in HNSCC.

As used herein, the expressions "recurrence" or "cancer recurrence" include 'loco-regional recurrence' and 'distant metastasis' or 'distant metastatic tumor'.

Since there is a need for improved molecular biomarkers of head and neck squamous cell carcinoma (HNSCC) which are reliable and predictors/prognostic markers of recurrence in head and neck squamous cell carcinoma (HNSCC) patients, the present disclosure studies genetic aberrations in HNSCC to arrive at said indicators/biomarkers for determining HNSCC and predictors/prognostic markers of recurrence in HNSCC. In particular, the present disclosure studies the role of genetic aberrations including somatic variations/mutations in genes from exome sequencing, immediate upstream and downstream flanking nucleotides of the somatic mutations, DNA methylation, loss of heterozygosity (LOH), copy number variations (CNVs), SNVs and gene expression changes, along with status of HPV infection, tumor nodal status and altered cellular pathways, in HNSCC. The disclosure also identifies the correlation of said genetic aberrations with alteration in cellular pathways of cancer and linking of habits, HPV infection, nodal status, tumor grade and recurrence. Thus, the present disclosure exploits said aspects to analyze the role of genetic changes in head and neck squamous cell carcinoma (HNSCC) and arrive at biomarkers of HNSCC and predictors/prognostic markers of recurrence in HNSCC patients. Accordingly, the present disclosure relates to a method of analysing the role of genetic aberration(s) in HNSCC, said method comprising steps of:

a) collecting patient samples - normal and tumor pairs;

b) performing clinical, epidemiological and pathological examination of the samples;

c) isolating nucleic acid from the samples;

d) generating data with sequencing, QC of sequencing reads, read alignment, variant identification and post-processing filters;

e) detecting cross-contamination in the samples, identifying significant somatic variants, annotating and analyzing the variants;

f) SNP genotyping using a different platform and re-validation of variants called by high-throughput sequencing;

g) determining Copy number variations (CNVs) and Loss of Heterozygosity (LOH);

h) carrying out gene expression profiling and identifying genes with significantly altered expression in tumors;

i) predicting recurrence using random forests;

j) analyzing pathways;

k) visualizing data;

l) validating somatic variants to analyse the role of genetic aberration(s) in HNSCC; and

m) performing statistical analyses to determine biomarkers for tumor recurrence.

In an embodiment of the present disclosure, the analysing of the role of genetic aberrations in HNSCC involves both qualitative and quantitative analysis. In another embodiment of the present disclosure, the above method determines significantly reliable or improved indicators/biomarkers of HNSCC and predictors/prognostic markers of recurrence and distant metastasis in HNSCC, more particularly, oral tongue squamous cell carcinoma (OTSCC). In a preferred embodiment, the above method analyses aberration/alteration in genes to determine their role and arrive at indicators/biomarkers of HNSCC and predictors/prognostic markers of recurrence and distant metastasis in HNSCC.

In an exemplary embodiment, the above method of analysing genetic aberrations to arrive at indicators/biomarkers of HNSCC comprises act of performing steps (a) to (l). In another exemplary embodiment, the above method of analysing genetic aberrations to arrive at predictors/prognostic markers of recurrence and distant metastasis in HNSCC comprises act of performing steps (a) to (m).

In another embodiment, the above method links habits, clinical, pathological and epidemiological parameters including but not limiting to chewing tobacco, smoking and alcohol consumption, HPV infection, nodal status and tumor grade, with genetic aberrations including but not limiting to somatic variants and the associated pathways affected in HNSCC, more particularly, OTSCC. In yet another embodiment, the correlation/linkage of aforesaid habits and/or parameters provide for improved indicators/biomarkers of HNSCC as described in the present disclosure.

Thus, the above method provides for a holistic analysis of genetic aberrations in HNSCC and identifies a group of genes, which are significantly altered/bear aberrations and serve as indicators/biomarkers of HNSCC and predictors/prognostic markers of recurrence in HNSCC. More particularly, the present disclosure provides a set of 19 genes to be aberrated in HNSCC, particularly OTSCC wherein one or more genes of said set serve as indicators/biomarkers in HNSCC.

In an exemplary embodiment, the above method of the present disclosure provides genetic aberrations in one or more genes from a group comprising 19 genes as indicators/biomarkers of HNSCC.

The said 19 genes of the group are TP53, CASP8, OBSCN, ING1, TTK, U2AF1, RASA1, CDKN2A, NOTCH1, NOTCH2, DMD, PIK3CA, AJUBA, ANK3, FAT1, HLAA, KEAP1, KMT2D and NFE2L2. In another embodiment, one or more of said 19 genes harbor genetic aberrations, more particularly somatic variations, and act as indicators/biomarkers of HNSCC. Stringent filtering steps are applied and multiple annotation tools are used to come up with the said list of 19 cancer-associated genes that harbor significant somatic variations in HNSCC, more particularly OTSCC.

Although the somatic variants identified from the present study are distributed uniformly across the genome, the significant copy number variation (CNV) events are more concentrated in chromosomes 6-9 and 11 (Figure 1F, Figure 6 and Figure 7). In another embodiment of the present disclosure, genetic variants (somatic mutations, indels, CNVs and LOHs) are cataloged and transcriptomic (significantly up- and down-regulated genes) changes in oral tongue squamous cell carcinoma (OTSCC) are observed and these are used in an integrated approach linking genetic aberrations, more particularly genes harboring somatic variants with common risk factors like tobacco and alcohol, clinical, epidemiological factors like tumor grade and HPV; and gene expression changes with tumor recurrence.

In another embodiment of the present disclosure, one of the most important genes harboring somatic mutations identified in the study is CASP8, the product of which is derived from the precursor Procaspase-8. Caspase-8 is an important protein implicated in both apoptotic and non-apoptotic pathways. Recent analysis from the TCGA study suggests that mutations in CASP8 co-occur with mutations in HRAS, and are mutually exclusive with amplifications in the FADD gene. In the functional study/method of the present disclosure, it is concluded that caspase-8 shows different effects in HPV-positive and HPV-negative cells, the effect being more pronounced in HPV-negative cells (Figure 4). Therefore, it is possible that HPV-negative tumors activate a completely different set(s) of pathways and/or may have different chemosensitivity towards drugs than the HPV-positive tumors. It was shown previously that HPV-positive HNSCC cell lines are resistant to TRAIL (tumor necrosis factor-related apoptosis-inducing ligand) and that treatment of cells with the proteasome inhibitor bortezomib sensitizes HPV-positive cells towards TRAIL-induced cell death mediated by caspase-8. The E6 protein of HPV interacts with the DED domain of caspase-8 and induces its activation by recruiting it to the nucleus. The present results on the role of caspase-8-mediated apoptosis being more pronounced in the HPV-negative OTSCC cell line are similar to the observation on the role of CASP8 in HPV-negative patients made earlier in the TCGA study. Taken together, genes including CASP8 regulate key pathways that play an important role in the development of tumors in oral tongue.

Accordingly, the present disclosure also provides that the CASP8 gene is significantly altered and plays an important role in apoptosis-mediated cell death in an HPV-negative OTSCC cell line. Thus, genetic aberration in the CASP8 gene serves as an indicator/biomarker in an HPV-negative HNSCC subject, particularly a subject having HPV-negative OTSCC.

Further, the above method of present disclosure also provides a set of 38 genes to bear aberrations in HNSCC, particularly OTSCC wherein any combination of genes of said set serve as predictors/prognostic markers of recurrence in HNSCC. In an embodiment, aberration in any combination of said 38 genes serve as predictors/prognostic markers of recurrence in HNSCC, wherein said combinations distinguish non-recurrence, loco-regional recurrence and distant metastasis in HNSCC, particularly OTSCC. In an exemplary embodiment, aberrations in plurality of genes of the 38-gene set serve as minimal signature for predicting recurrence and distant metastasis in HNSCC.

In an exemplary embodiment, any combination of 38 genes serve as minimal signature for predicting recurrence in HNSCC wherein said 38 genes are WASH4P, SLCO1A2, LILRB3, RPS9, ADM2, AKR1C2, ARHGEF4, CBR4, HOXB2, KDM4C, KNDC1, LILRA6, MARCH3, NANOG, POLR1C, PPFIA1, RP3-323P13.2, STC1, TPCN2, TTC39A, BVES, EVI5, COL4A4, COL5A3, DNM1P47, DUX4L9, HNF1A, MED24, MGME1, MUC3A, OVOL2, TAP2, AQR, FBN1, MST1, NOTCH3, TATDN2 and TRBV11-1. In another exemplary embodiment, aberration in all 38 genes serve as a signature for predicting recurrence in HNSCC. In another embodiment, the recurrence is loco-regional recurrence or distant metastasis.

In another exemplary embodiment, gene signature such as RPS9, MARCH3, LILRB3, LILRA6, ADM2, TTC39A, TPCN2, STC1, RP3-323P13.2, PPFIA1, POLR1C, NANOG, KNDC1, KDM4C, HOXB2, CBR4, ARHGEF4 or AKR1C2, or any combination thereof selected from the said 38 gene signature, predicts loco-regional recurrence of malignancy in a subject having or suspected of having HNSCC with highest sensitivity and specificity (illustrated in Figure 8).

In another exemplary embodiment, at least 4 genes from the gene signature such as RPS9, MARCH3, LILRB3, LILRA6, ADM2, TTC39A, TPCN2, STC1, RP3-323P13.2, PPFIA1, POLR1C, NANOG, KNDC1, KDM4C, HOXB2, CBR4, ARHGEF4 or AKR1C2 predict loco-regional recurrence of malignancy in a subject having or suspected of having HNSCC, wherein the said at least 4 genes must mandatorily include at least one gene selected from RPS9, LILRA6, LILRB3, MARCH3 or ADM2, to predict the loco-regional recurrence of malignancy in a subject having or suspected of having HNSCC.
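The selection rule in this embodiment (at least 4 aberrant genes from the 18-gene loco-regional group, at least one of which is RPS9, LILRA6, LILRB3, MARCH3 or ADM2) can be illustrated with a minimal Python sketch. The function and constant names are hypothetical, gene symbols are written in their standard form, and the sketch only checks set membership; it does not reproduce the classifier used in the disclosure.

```python
# Hypothetical illustration of the "at least 4 genes, at least one mandatory gene" rule.
LOCOREGIONAL_GENES = frozenset({
    "RPS9", "MARCH3", "LILRB3", "LILRA6", "ADM2", "TTC39A", "TPCN2", "STC1",
    "RP3-323P13.2", "PPFIA1", "POLR1C", "NANOG", "KNDC1", "KDM4C", "HOXB2",
    "CBR4", "ARHGEF4", "AKR1C2",
})
MANDATORY_GENES = frozenset({"RPS9", "LILRA6", "LILRB3", "MARCH3", "ADM2"})


def meets_locoregional_rule(aberrant_genes):
    """Return True if the aberrant genes satisfy the rule stated in the text."""
    # Only genes belonging to the 18-gene group count towards the rule.
    hits = LOCOREGIONAL_GENES & set(aberrant_genes)
    return len(hits) >= 4 and bool(hits & MANDATORY_GENES)
```

For example, `meets_locoregional_rule(["RPS9", "TTC39A", "TPCN2", "STC1"])` satisfies the rule, whereas four genes drawn only from the non-mandatory part of the group do not.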

In another exemplary embodiment, gene signature such as AQR, FBN1, MST1, NOTCH3, TATDN2 or TRBV11-1, or any combination thereof selected from the said 38 gene signature, predicts distant metastasis of malignancy in a subject having or suspected of having HNSCC. In another exemplary embodiment, the gene combinations from the said 38 gene signature that predict distant metastasis of malignancy with highest sensitivity and specificity (illustrated in Figure 9) in a subject having or suspected of having HNSCC are:

AQR, FBN1, MST1, NOTCH3, TATDN2 and TRBV11-1;

AQR, FBN1, MST1, NOTCH3 and TATDN2;

AQR, FBN1, MST1, NOTCH3 and TRBV11-1;

AQR, FBN1, MST1, TATDN2 and TRBV11-1;

AQR, FBN1, NOTCH3, TATDN2 and TRBV11-1;

AQR, MST1, NOTCH3, TATDN2 and TRBV11-1; or

FBN1, MST1, NOTCH3, TATDN2 and TRBV11-1.
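The seven combinations listed above are the full six-gene set together with every five-gene subset of it. A short Python sketch, using hypothetical names and standard gene symbols, enumerates them in the same order as the list:

```python
from itertools import combinations

# The six distant-metastasis genes named in the text.
METASTASIS_GENES = ("AQR", "FBN1", "MST1", "NOTCH3", "TATDN2", "TRBV11-1")


def metastasis_signatures(genes=METASTASIS_GENES):
    """Return the full gene set followed by every subset that omits exactly one gene."""
    full_set = [tuple(genes)]
    leave_one_out = list(combinations(genes, len(genes) - 1))
    return full_set + leave_one_out


for signature in metastasis_signatures():
    print(", ".join(signature))
```

Because `itertools.combinations` preserves input order, the first five-gene subset omits TRBV11-1 and the last omits AQR.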

In an alternate embodiment, the gene signature ABCD1P2, ETV7, FHDC1, RP11-438J1.1, SLC9B1P1 or ZNF618, or any combination thereof, predicts distant metastasis of malignancy in a subject having or suspected of having HNSCC with highest sensitivity and specificity (illustrated in Figure 9).

Identifying a signature for loco-regional tumor recurrence and distant metastasis prospectively in primary tumors adds significant advantage to disease management. In order to do this, a machine learning method is employed in the present disclosure using the molecular changes identified in the study, in three batches of primary tumors: non-recurring, loco-regionally recurring and tumors with distant metastasis. A 38-gene signature is identified to be significantly distinguishing the three groups. The bootstrapping errors for the non-recurring and the loco-regionally recurring groups are low (N = 34, .632 error = 0.03 and N = 10, .632 error = 0.3 respectively), but the error for the metastatic tumor group is not (N = 4, .632 error = 1). This is due to the small sample number (N = 4) in the metastatic category, and further studies can be carried out using a larger sample set to additionally validate the signature.
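For orientation, the standard .632 bootstrap estimate combines the resubstitution (training) error with the leave-one-out bootstrap error as 0.368 x resubstitution error + 0.632 x leave-one-out bootstrap error. The Python sketch below illustrates that general estimator under stated assumptions: the function names, the `fit` callback and the toy majority-class model are all hypothetical, and it is not the varSelRF procedure or the .632+ variant (which adds a further correction) used in the disclosure.

```python
import random
from collections import Counter


def loo_bootstrap_error(samples, labels, fit, n_boot=200, seed=0):
    """Leave-one-out bootstrap error: each sample is scored only by models
    trained on bootstrap resamples that did not contain it."""
    rng = random.Random(seed)
    n = len(samples)
    wrong = [0] * n
    scored = [0] * n
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]      # resample with replacement
        held_out = set(range(n)) - set(idx)
        if not held_out:
            continue
        predict = fit([samples[i] for i in idx], [labels[i] for i in idx])
        for i in held_out:
            scored[i] += 1
            wrong[i] += int(predict(samples[i]) != labels[i])
    rates = [w / s for w, s in zip(wrong, scored) if s]
    if not rates:
        raise ValueError("no sample was ever held out; increase n_boot")
    return sum(rates) / len(rates)


def err_632(resub_error, loo_error):
    """Plain .632 estimate: weighted mix of optimistic and pessimistic errors."""
    return 0.368 * resub_error + 0.632 * loo_error


def majority_fit(train_x, train_y):
    """Toy stand-in model (assumption): always predicts the most common training label."""
    top_label = Counter(train_y).most_common(1)[0][0]
    return lambda x: top_label
```

In practice `fit` would wrap whichever classifier is being evaluated; the toy `majority_fit` is included only so the sketch runs on its own.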

In an embodiment of the present disclosure, the present method helps in finding novel drug candidates for OTSCC based on the genes that are altered in harboring genetic variants and/or having altered expression. The genome-wide study of genetic aberrations in HNSCC, particularly somatic variant identification and gene expression changes in tumors in the present method give rise to possibilities of finding novel drug targets/candidates and/or lead to using existing drugs prescribed/under trial for other indications. In another embodiment, the present disclosure also identifies the role of genetic aberrations as potential drug targets. In a preferred embodiment, the same is carried out by identifying the significantly altered genes, preferably somatic variant identification and/or gene expression changes and screening for available drugs/drugs under trial against them.

As used in the present disclosure, head and neck squamous cell carcinomas (HNSCC) refers to cancers including but not limiting to cancers of oral cavity including the inner lip, tongue, floor of mouth, gingivae, and hard palate, nasopharyngeal cancer, oropharyngeal squamous cell carcinomas (OSCC), cancer of hypopharynx, laryngeal cancer and cancer of trachea. The said terms/phrases are used interchangeably in the present disclosure and should be construed accordingly. In an exemplary embodiment of the present disclosure, the HNSCC is oral tongue squamous cell carcinoma (OTSCC).

As used herein, the expressions related to "prediction/determination of recurrence" or "predicting/determining recurrence" may be used interchangeably with conventional terms such as prognosis which refers to predicting the likely outcome of recurrence in subject(s) having HNSCC.

In an exemplary embodiment of the present disclosure, the aforementioned method of analysing the role of genetic aberration(s) to arrive at indicators/biomarkers of HNSCC and predictors/prognostic markers of recurrence in HNSCC specifically involves the following steps:

a) collecting tumor and matched control samples from OTSCC patients;

b) performing screening by collecting details on habits of the patients, clinical, pathological and epidemiological examination of the samples wherein the habit is selected from a group comprising smoking, alcohol consumption and chewing tobacco, or any combination thereof, and the clinical and epidemiological parameter is selected from a group comprising detection of HPV status in the tumors and staging, or a combination thereof;

c) isolating nucleic acid from the samples by methods selected from a group comprising PureLink RNA mini kit, Qiagen RNA isolation kit or any other total RNA isolation kit;

d) generating data on gene sequencing, QC alignment, variant identification, post-processing filters, or any combination thereof, using tools selected from a group comprising Agilent SureSelect, Illumina TruSeq, Nextera exome capture kits, HiSeq 2500, GAIIx, Illumina Base caller, NovoAlign, Samtools, GATK, Picard, Dindel and any other tool used to generate said data, or any combination thereof;

e) detecting cross-contamination in the samples, identifying significant somatic variants, annotating, analyzing the variants, or any combination thereof, using tools selected from a group comprising ContEst, CRAVAT, CHASM analysis, IntOGen, MutSigCV, MutSiC2 and any other tool used to detect cross-contamination in the samples, identify significant somatic variants, annotate and analyze the variants, or any combination thereof;

f) SNP genotyping, variant re-validation, or a combination thereof, using tools selected from a group comprising whole-genome SNP genotyping arrays, Qubit, Illumina Human Omni and any other tool used for SNP genotyping and variant re-validation, or any combination thereof;

g) determining Copy number variations (CNVs), Loss of Heterozygosity (LOH) or a combination thereof, using tools selected from a group comprising cnvPartition 3.1.6 plugin in Illumina Genome Studio v2011.1 and CNV annotator, and any other tool used for determining CNVs and Loss of Heterozygosity, or any combination thereof;

h) carrying out gene expression profiling using tools selected from a group comprising Illumina HumanHT-12-v4 expression BeadChip, PureLink RNA kit, RNeasy (Qiagen) Mini kit, Agilent Bioanalyzer, RNA Nano6000 chip, Illumina WGDASL assay, Illumina Total Prep RNA Amplification kit (Ambion), Illumina HiScan, Genome Studio, VST (Variance stabilizing transformation), LOESS and R package Lumi, ComBat and any other tool used for gene expression profiling, or any combination thereof;

i) predicting recurrence using random forest analysis, error correction, re-computing, or any combination thereof using tools selected from a group comprising varSelRF package, leave-one-out bootstrap method, .632+ method, Benjamini-Hochberg test and any other tool used for predicting recurrence, error correction, re-computing, or any combination thereof;

j) analyzing pathways using tools selected from a group comprising Graphite Web, KEGG, Reactome databases, CytoScape and any other pathway analysis tool, or any combination thereof;

k) visualizing data using tools selected from a group comprising Circos, Mutation Mapper, GIMP, IGV and any other data visualizing tool, or any combination thereof;

l) validating somatic variants using Sanger sequencing or any other tool known for validating somatic variants, to analyse the role of genetic aberration(s) in HNSCC; and

m) performing statistical analyses to determine biomarkers for tumor recurrence using any statistical analyses tool capable of determining biomarkers for tumor recurrence.
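Step (i) above names the Benjamini-Hochberg test among the tools for error correction. As a hedged illustration of that general procedure (not of the specific settings used in the disclosure), the following Python sketch computes Benjamini-Hochberg adjusted p-values; the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the
    # adjusted values with a running minimum.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A gene would then be called significant when its adjusted value falls below a chosen false discovery rate; the threshold itself is an analysis choice not stated in this passage.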

Accordingly, the present disclosure specifically relates to a group of 19 genes as indicators/biomarkers of HNSCC, particularly OTSCC wherein one or more genes of said set serve as indicators/biomarkers in HNSCC. In an exemplary embodiment, genetic aberrations in one or more genes from a group comprising 19 genes act as indicators/biomarkers of HNSCC. The said 19 genes of the group are TP53, CASP8, OBSCN, ING1, TTK, U2AF1, RASA1, CDKN2A, NOTCH1, NOTCH2, DMD, PIK3CA, AJUBA, ANK3, FAT1, HLAA, KEAP1, KMT2D and NFE2L2. In another embodiment, one or more of said 19 genes harbor genetic aberrations, more particularly somatic variations, and serve as indicators/biomarkers of HNSCC.

Further, the present disclosure also provides a set of 38 genes as predictors/prognostic markers of recurrence in HNSCC, particularly OTSCC wherein any combination of said 38 genes can be employed as predictors/prognostic markers of recurrence in HNSCC. In an embodiment, aberration in any combination of said 38 genes serve as predictors/prognostic markers of recurrence in HNSCC, wherein said combinations distinguish non-recurrence, loco-regional recurrence and distant metastasis in HNSCC, particularly OTSCC. The said 38 genes are WASH4P, SLCO1A2, LILRB3, RPS9, ADM2, AKR1C2, ARHGEF4, CBR4, HOXB2, KDM4C, KNDC1, LILRA6, MARCH3, NANOG, POLR1C, PPFIA1, RP3-323P13.2, STC1, TPCN2, TTC39A, BVES, EVI5, COL4A4, COL5A3, DNM1P47, DUX4L9, HNF1A, MED24, MGME1, MUC3A, OVOL2, TAP2, AQR, FBN1, MST1, NOTCH3, TATDN2 and TRBV11-1. In an exemplary embodiment, aberrations in all 38 genes serve as a signature for predicting recurrence and distant metastasis in HNSCC. As used in the present disclosure, the term "aberration" includes but is not limiting to alteration in expression including up-regulation/over expression or down-regulation/under expression, amplification, mutation, loss of heterozygosity, copy number variations, structural variations, somatic mutations, gene fusion events, allelic expression, chromosomal aberrations, epigenetic changes including DNA methylation, histone modification and non-coding RNA (ncRNA)-associated gene silencing or any combination of aberrations thereof. In some embodiments of the present disclosure, "mutations" include but are not limiting to epigenetic mutation, transgenetic mutation, deletion, substitution and insertion or any combination thereof. In specific embodiments of the present disclosure, "aberrations" include up-regulation and down-regulation of plurality of the genes from the set of 38 genes and/or somatic variation or mutation in one or more genes from the group of 19 genes. The 38 genes are WASH4P, SLCO1A2, LILRB3, RPS9, ADM2, AKR1C2, ARHGEF4, CBR4, HOXB2, KDM4C, KNDC1, LILRA6, MARCH3, NANOG, POLR1C, PPFIA1, RP3-323P13.2, STC1, TPCN2, TTC39A, BVES, EVI5, COL4A4, COL5A3, DNM1P47, DUX4L9, HNF1A, MED24, MGME1, MUC3A, OVOL2, TAP2, AQR, FBN1, MST1, NOTCH3, TATDN2 and TRBV11-1, and the 19 genes are TP53, CASP8, OBSCN, ING1, TTK, U2AF1, RASA1, CDKN2A, NOTCH1, NOTCH2, DMD, PIK3CA, AJUBA, ANK3, FAT1, HLAA, KEAP1, KMT2D and NFE2L2.

The present disclosure relates to a method of detecting HNSCC in a subject having or suspected of having HNSCC, wherein said method comprises determining aberration(s) in one or more genes from the group comprising 19 genes, which are identified by the present disclosure to be critical biomarkers of HNSCC.

In an embodiment of the present disclosure, determination of aberration(s) in one or more genes from the group of 19 genes includes analysing somatic variation/somatic mutations in said genes of the present disclosure. In an exemplary embodiment, somatic variations in one or more genes of the group comprising 19 genes are determined to detect HNSCC in a subject. In another embodiment, the 19 genes are TP53, CASP8, OBSCN, ING1, TTK, U2AF1, RASA1, CDKN2A, NOTCH1, NOTCH2, DMD, PIK3CA, AJUBA, ANK3, FAT1, HLAA, KEAP1, KMT2D and NFE2L2.

In an exemplary embodiment of the above method, the HNSCC is OTSCC.

In a specific embodiment of the present disclosure, the method of detecting HNSCC in a subject having or suspected of having HNSCC comprises acts of:

a) contacting sample obtained from the HNSCC subject with an agent or performing steps of a biomarker detection technique to determine aberration in one or more genes selected from the group comprising TP53, CASP8, OBSCN, ING1, TTK, U2AF1, RASA1, CDKN2A, NOTCH1, NOTCH2, DMD, PIK3CA, AJUBA, ANK3, FAT1, HLAA, KEAP1, KMT2D and NFE2L2; and

b) detecting HNSCC based on step (a) wherein aberration(s) in one or more genes correlates to the presence of HNSCC in said subject or vice-versa.
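The decision logic of steps (a) and (b) reduces to checking whether any gene reported as aberrant belongs to the 19-gene group. The Python sketch below illustrates only that set-membership step under stated assumptions: the names are hypothetical, gene symbols follow the list above with NFE2L2 written in its standard form, and the upstream assay that decides which genes are aberrant is outside its scope. It is not a diagnostic implementation.

```python
# The 19-gene group named in the text, used here as a lookup set.
HNSCC_BIOMARKER_GENES = frozenset({
    "TP53", "CASP8", "OBSCN", "ING1", "TTK", "U2AF1", "RASA1", "CDKN2A",
    "NOTCH1", "NOTCH2", "DMD", "PIK3CA", "AJUBA", "ANK3", "FAT1", "HLAA",
    "KEAP1", "KMT2D", "NFE2L2",
})


def flag_hnscc_biomarkers(aberrant_genes):
    """Return (flagged, hits): whether any aberrant gene is in the 19-gene
    group, and the sorted list of matching genes."""
    hits = sorted(HNSCC_BIOMARKER_GENES & set(aberrant_genes))
    return bool(hits), hits
```

For instance, a sample reported with aberrations in CASP8 and an unrelated gene would be flagged with `["CASP8"]` as the only hit.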

In another embodiment of the present disclosure, aberration in one or more genes from the group comprising 19 genes is determined with the help of an agent selected from a group comprising primer, probe, antibody, nanoparticles and a suitable interacting protein/biological agent capable of interacting with one or more genes of the group, in order to detect presence or absence of aberrations. In a preferred embodiment, said agent is employed for determining aberration(s) in one or more genes selected from the group TP53, CASP8, OBSCN, ING1, TTK, U2AF1, RASA1, CDKN2A, NOTCH1, NOTCH2, DMD, PIK3CA, AJUBA, ANK3, FAT1, HLAA, KEAP1, KMT2D and NFE2L2.

In another embodiment of the present disclosure, aberration(s) in one or more genes from the group of 19 genes is identified by employing techniques selected from a group comprising but not limiting to solution-based assays and solid-support based assays, or a combination thereof. In yet another embodiment of the present disclosure, gene aberration is determined by employing techniques selected from a group comprising but not limiting to Sequencing, Reporter gene technique, PCR, Northern Blotting, Western blotting, ELISA, fluorescence-based assays, luminescence/chemiluminescence-based assays, in-situ hybridization, Serial analysis of gene expression (SAGE), microarrays, tiling array, RNA Sequencing/Whole Transcriptome Shotgun Sequencing (WTSS) and electrochemical assays, or any combination of techniques thereof.

In yet another embodiment, the solution-based assay to detect aberration(s) in one or more genes from the group of 19 genes is selected from a group comprising but not limiting to Solution hybridization, PCR and luminescence-based assay, or any combination thereof.

In still another embodiment, the solid support based assays employed to detect aberration(s) in one or more genes from the group of 19 genes are selected from a group comprising but not limited to Northern Blot, fluorescence-based assays, ELISA and Microarray, or any combination thereof.

As used herein, the term 'sample' used in the method of detecting HNSCC refers to any biological material/fluid/cell having or suspected of having tumor/cancer. Further, the sample may be derived from subject including humans and/or mammals, or the sample may be any biological fluid prepared/obtained in a laboratory.

The present disclosure also provides a kit for detecting HNSCC in a subject having or suspected of having HNSCC. In an embodiment, said kit comprises suitable agent(s) to determine aberration in one or more genes of the group comprising 19 genes, wherein said 19 genes are TP53, CASP8, OBSCN, ING1, TTK, U2AF1, RASA1, CDKN2A, NOTCH1, NOTCH2, DMD, PIK3CA, AJUBA, ANK3, FAT1, HLAA, KEAP1, KMT2D and NFE2L2; and an instruction manual thereof which provides step-wise protocol of determining said aberration and correlating the same with HNSCC detection. In another embodiment, the agent is selected from a group comprising primer, probe, antibody, nanoparticle, suitable interacting protein/biological agent capable of interacting with one or more genes of the group comprising 19 genes.

The present disclosure also relates to a method of predicting or prognosing recurrence of HNSCC in a sample having HNSCC, wherein said method comprises determining aberration(s) in the set consisting of 38 genes, which are identified by the present disclosure to be critical in HNSCC. In an exemplary embodiment, aberration in any combination of the 38 genes can predict recurrence of HNSCC in a sample.

In an embodiment of the present disclosure, determination of aberration(s) in the set consisting of 38 genes includes analysing expression levels of said genes. In an exemplary embodiment, up-regulation and down-regulation of said 38 genes is determined to predict recurrence and distant metastasis of HNSCC in a sample. In an embodiment, the 38 genes are WASH4P, SLCO1A2, LILRB3, RPS9, ADM2, AKR1C2, ARHGEF4, CBR4, HOXB2, KDM4C, KNDC1, LILRA6, MARCH3, NANOG, POLR1C, PPFIA1, RP3-323P13.2, STC1, TPCN2, TTC39A, BVES, EVI5, COL4A4, COL5A3, DNM1P47, DUX4L9, HNF1A, MED24, MGME1, MUC3A, OVOL2, TAP2, AQR, FBN1, MST1, NOTCH3, TATDN2 and TRBV11-1.

In another embodiment of the present disclosure, determination of aberration(s) in the set of 38 genes is performed wherein aberration in any combination of the 38 genes can predict recurrence. In a specific embodiment, aberration in any combination of the 38 genes distinguishes non-recurrent, loco-regionally recurrent and distant metastatic tumors. In another specific embodiment of the present disclosure, the method of predicting the recurrence in a subject having HNSCC comprises acts of:

a) contacting sample obtained from the HNSCC subject with an agent or performing steps of a biomarker detection technique to determine aberration in gene set consisting of 38 genes; wherein the 38 genes are WASH4P, SLCO1A2, LILRB3, RPS9, ADM2, AKR1C2, ARHGEF4, CBR4, HOXB2, KDM4C, KNDC1, LILRA6, MARCH3, NANOG, POLR1C, PPFIA1, RP3-323P13.2, STC1, TPCN2, TTC39A, BVES, EVI5, COL4A4, COL5A3, DNM1P47, DUX4L9, HNF1A, MED24, MGME1, MUC3A, OVOL2, TAP2, AQR, FBN1, MST1, NOTCH3, TATDN2 and TRBV11-1; and

b) predicting recurrence and distant metastasis in HNSCC based on step (a) wherein aberration(s) in any combination of genes within the set of 38 genes correlates to the recurrence of HNSCC in said sample or vice-versa. In a specific embodiment, the above method of predicting recurrence distinguishes non-recurrence, loco-regional recurrence and distant metastasis in HNSCC subjects.

In another embodiment of the present disclosure, aberration in the 38 gene set is determined with the help of an agent selected from a group comprising primer, probe, antibody, nanoparticles and a suitable interacting protein/biological agent capable of interacting with said genes, in order to detect presence or absence of aberrations. In a preferred embodiment, said agent is employed for determining aberration(s) in the set of 38 genes, wherein genes are WASH4P, SLCO1A2, LILRB3, RPS9, ADM2, AKR1C2, ARHGEF4, CBR4, HOXB2, KDM4C, KNDC1, LILRA6, MARCH3, NANOG, POLR1C, PPFIA1, RP3-323P13.2, STC1, TPCN2, TTC39A, BVES, EVI5, COL4A4, COL5A3, DNM1P47, DUX4L9, HNF1A, MED24, MGME1, MUC3A, OVOL2, TAP2, AQR, FBN1, MST1, NOTCH3, TATDN2 and TRBV11-1.

In another embodiment of the present disclosure, aberration(s) in the 38 gene set is identified by employing techniques selected from a group comprising but not limited to solution-based assays and solid-support based assays, or a combination thereof. In yet another embodiment of the present disclosure, gene aberration is determined by employing techniques selected from a group comprising but not limited to Sequencing, Reporter gene technique, PCR, Northern Blotting, Western blotting, ELISA, fluorescence-based assays, luminescence/chemiluminescence-based assays, in-situ hybridization, Serial analysis of gene expression (SAGE), microarrays, tiling array, RNA Sequencing/Whole Transcriptome Shotgun Sequencing (WTSS) and electrochemical assays, or any combination of techniques thereof. In yet another embodiment, the solution-based assays to detect aberration(s) in the 38 gene set are selected from a group comprising but not limited to Solution hybridization, PCR and luminescence-based assay, or any combination thereof.

In still another embodiment, the solid support based assays employed to detect aberration(s) in the 38 gene set are selected from a group comprising but not limited to Northern Blot, fluorescence-based assays, ELISA and Microarray, or any combination thereof.

In still another embodiment of the present disclosure, the HNSCC of the aforementioned methods is selected from group comprising cancers of oral cavity including the inner lip, tongue, floor of mouth, gingivae, and hard palate, nasopharyngeal cancer, oropharyngeal squamous cell carcinomas (OPSCC), cancer of hypopharynx, laryngeal cancer and cancer of trachea. In an exemplary embodiment of the present disclosure, the cancer is oral tongue squamous cell carcinoma (OTSCC).

As used herein, the term 'sample' used in the method of predicting or prognosing recurrence refers to any biological material/fluid/cell having tumor/cancer. Further, the sample may be derived from subject including humans and/or mammals, or the sample may be any biological fluid prepared/obtained in a laboratory.

The present disclosure also provides a kit for prognosis or predicting recurrence or distant metastasis in a subject having HNSCC. In an embodiment, said kit comprises suitable agent(s) to determine aberration in genes comprising the set of 38 genes, wherein said 38 genes are WASH4P, SLCO1A2, LILRB3, RPS9, ADM2, AKR1C2, ARHGEF4, CBR4, HOXB2, KDM4C, KNDC1, LILRA6, MARCH3, NANOG, POLR1C, PPFIA1, RP3-323P13.2, STC1, TPCN2, TTC39A, BVES, EVI5, COL4A4, COL5A3, DNM1P47, DUX4L9, HNF1A, MED24, MGME1, MUC3A, OVOL2, TAP2, AQR, FBN1, MST1, NOTCH3, TATDN2 and TRBV11-1; and an instruction manual thereof which provides step-wise protocol of determining the aberration and correlating said aberration with recurrence in HNSCC. In another embodiment, the agent is selected from a group comprising primer, probe, antibody, nanoparticle, suitable interacting protein/biological agent capable of interacting with genes comprising the set of 38 genes.

The present disclosure also provides a method of detecting HNSCC, particularly OTSCC, in an HPV-negative subject, the method comprising act of determining aberration in CASP8 gene in the sample of said subject.

The technology of the instant application is further elaborated with the help of following examples, tables and figures. However, the examples, tables and figures should not be construed to limit the scope of the present disclosure.

EXAMPLES:

MATERIALS AND METHODS

Informed consent and Ethics approval

Informed consent is obtained voluntarily from each patient enrolled in the study and ethics approval is obtained from the Institutional Ethics Committees of the Mazumdar Shaw Cancer Centre.

Patient samples used in the study

Informed consent is obtained voluntarily from each patient enrolled in the study. Ethics approval is obtained from the Institutional Ethics Committees of the Mazumdar Shaw Medical Centre. Matched control (blood and/or adjacent normal tissue) and tumor specimens are collected and used in the study. Only patients whose histological sections confirmed the presence of squamous cell carcinoma with at least 70% tumor cells in the specimen are included in the current study. At the time of admission, patients are asked about their habits (chewing, smoking and/or alcohol consumption). Fifty patients who underwent staging according to AJCC criteria, and curative intent treatment as per NCCN guidelines involving surgery with or without post-operative adjuvant radiation or chemo-radiation at the Mazumdar Shaw Medical Centre, are accrued for the study (Table 1). Post-treatment surveillance is carried out by clinical and radiographic examinations as per the NCCN guidelines.

EXAMPLE 1:

Studying HPV status

HPV is detected in primary tumor samples using any of four different assays: immunohistochemistry (IHC) using antibodies against p16 (AM540-5M, Biogenex, CA, USA) and the viral E6 and E7 antigens (sc-460 and sc-58661 respectively, Santa Cruz Biotechnology, TX, USA); HPV DNA PCR using either type-specific (HPV16 L1 and E6, and HPV18 L1 and E7) or consensus primers (PGMY09/11, MY09/11, GP5+/6+ and CPI/II); q-PCR using HPV16- and HPV18-specific TaqMan probes and primers; and digital PCR using TaqMan probes and primers.

RESULTS:

Habits, clinical parameters and epidemiology

Tumor and matched control (adjacent normal and/or lymphocytes) samples are collected from 50 patients diagnosed with OTSCC, with informed consent. Data from patient habits, epidemiology and clinical parameters are presented in Figure 1A and Table 1.

Table 1: Patient details used in the study (habits, epidemiology, clinical parameters)

OT36 40 M C T1N0M0 E II MDSCC Nil Nil Rec Y Alive 43.77 30.17 H
OT37 65 F C E 1 Nil Nil No Rec Y Alive 49.40 42.53 H
OT38 37 M N T2N0M0 E 1 WDSCC Nil Nil Rec N Alive 51.73 14.10 M
OT40 40 M S+A T4aN2bM0 L IVA PDSCC Nil Nil Rec Y Dead 2.33 1.07 L
OT41 38 F N T1N0M0 WDSCC Nil Nil No Rec N Alive 85.60 85.47 H
OT42 57 M N T3N0M0 L III WDSCC Nil Nil No Rec Y Alive 12.93 12.90 M
OT43 50 M N T2N2bicMx L IVA MDSCC Nil Nil No Rec N Alive 28.53 28.43 H
OT44 74 M A+S T3N1M0 L III WDSCC Nil Nil Rec Y Alive 32.50 27.90 H
OT45 62 F N T2N1MX L III MDSCC Nil RT No Rec N Alive 20.63 20.63 M
OT46 48 M S+A T4N2cM0 L IVA MDSCC Nil Nil No Rec Y Dead 2.33 0.97 L
OT47 60 M S T3N0M0 L III MDSCC Nil CT+RT No Rec N Alive 17.07 16.60 M
OT48 38 M N T2N2bM0 L IVA PDSCC Nil CT+RT Rec Y Dead 2.00 0.77 L
OT49 40 M S+A+C T2N2bM0 L IVA PDSCC Nil Nil Rec NA Dead 19.93 9.80 L
OT50 35 M NA T2N1M0 L III MDSCC Nil Nil Dist Met NA Dead 13.10 12.20 M
OT51 79 F C T1N0M0 1 WDSCC Nil Nil Dist Met N Dead 18.87 11.77 L
OT52 48 F C T2N2cM0 L IVA MDSCC Nil Nil No Rec Y Alive 7.73 7.67 L
OT53 45 M A+S T4N0M0 L IVA MDSCC CT Nil No Rec Y Dead 15.83 15.80 M
OT54 42 F NA T2N2bMx L IVA MDSCC Nil Nil No Rec Y Alive 76.90 76.83 H
OT55 68 M N T4N2bM0 L IVA MDSCC Nil Nil Rec N Dead 6.03 3.97 L
OT5 45 M N T4N0M0 L MDSCC Nil Nil Rec NA Alive 33.43 12.80 M
OT35 58 M N pT1N0Mx E WDSCC Nil RT No Rec NA Alive 97.07 NA NA
OT39 40 M S T3N0M0 L III MDSCC Nil Nil No Rec NA Dead 33.47 32.77 H
OT15 66 M N T1N0M0 E PDSCC Nil Nil Rec NA Alive 33.47 33.23 H

About two-thirds of the patients (N = 31) included in the study were in the younger age group (<50 yrs), with 20% female patients in the total pool. Approximately 70% of the patients were positive for at least one risk habit, namely smoking, alcohol consumption or chewing tobacco (33% of patients smoked tobacco, 40% consumed alcohol and 42% chewed tobacco). HPV infection status in the primary tumors is established with at least one of the assays (p16 IHC, consensus or type (HPV16/18)-specific PCR, qPCR or digital PCR) as described in Palve et al., 2016; http://biorxiv.org/content/early/2016/10/24/082651. Thirty-three percent of the patients were deceased at the time of completing the analysis. About 60% of the tumors were moderately differentiated, 25% well differentiated and the rest were poorly differentiated. Among the patients recruited, 60% were node-positive, 70% had no recurrence, 9% had distant metastasis and 24% had loco-regional recurrence at the time of completing the analysis. The mean and median follow-up durations for patients were nearly 30 months and 21 months respectively. About 27% of the tumors were early stage tumors (T1N0M0 and T2N0M0) and the remaining 73% were late stage tumors (tumors belonging to the rest of the TNM stages).

EXAMPLE 2:

Exome Sequencing, read QC, alignment, variant identification and post-processing filters

Exome libraries are prepared using Agilent SureSelect, Illumina TruSeq and Nextera exome capture kits following manufacturers' specifications. Paired-end sequencing is performed using HiSeq 2500 or GAIIx and raw reads are generated using the standard Illumina base caller. Read pairs are filtered in house, and only reads in which more than 75% of the bases have a Phred score above 20 and which contain fewer than 15 Ns are used for sequence alignment against the human hg19 reference genome using NovoAlign (v3.00.05). The aligned files (*.sam) are processed using Samtools (v0.1.12a) and only uniquely mapped reads from NovoAlign are considered for variant calling. The alignments are pre-processed using GATK (v1.2-62) in three steps before variant calling. First, the indels are realigned using the known indels from 1000G (phase1) data. Second, duplicates are removed using Picard (v1.39). Third, base quality recalibration is done using CountCovariates and TableRecalibration from GATK (v1.2-62), taking into account known SNPs and indels from dbSNP (build 138). Finally, UnifiedGenotyper from GATK (v2.5-2) is used for variant calling, using known SNPs and indels from dbSNP (build 138). Raw variants from GATK are filtered to include only the PASS variants (standard call confidence > 50) within the merged exomic bait boundaries. Two out of 50 tumor samples did not conform to the QC standards and are excluded from all further analyses; all the downstream analyses are therefore restricted to 48 primary tumors. The variants are further flagged as novel or present in either the dbSNP138 or COSMIC (v67) databases, based on their overlap. In addition to GATK, Dindel is also used to call indels. Both GATK and Dindel calls are filtered for microsatellite repeats (flagged as STR). The raw variant calls are used to estimate frequencies of nucleotide changes and transition:transversion (ti/tv) ratios.

Exome-filtered PASS variants specific to the tumor samples, with respect to both location and actual call, are retained as somatic variants. These are further filtered to exclude variants where the region bearing the variant is not callable in the matched control sample, and those where the matched control sample had even one read covering the variant allele.
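The read-level quality filter described above can be sketched as follows. This is an illustrative sketch only, not the in-house filter itself: the function names, the Phred+33 quality encoding and the rule that both mates of a pair must pass are assumptions, while the thresholds (more than 75% of bases with Phred score above 20, fewer than 15 Ns) are taken from the text.

```python
def decode_phred33(quality_string):
    """Convert a FASTQ quality string to Phred scores (assumes Phred+33 encoding)."""
    return [ord(ch) - 33 for ch in quality_string]


def read_passes_qc(sequence, quality_string,
                   min_phred=20, min_fraction=0.75, max_n=15):
    """Return True if the read meets the thresholds stated in the text."""
    quals = decode_phred33(quality_string)
    if not sequence or len(quals) != len(sequence):
        return False
    # Fraction of bases whose Phred score is above the cutoff.
    good = sum(1 for q in quals if q > min_phred)
    fraction_ok = good / len(quals) > min_fraction
    # Number of uncalled bases must stay below the cap.
    n_ok = sequence.upper().count("N") < max_n
    return fraction_ok and n_ok


def pair_passes_qc(read1, read2):
    """Assumption: a read pair is kept only when both mates pass."""
    return read_passes_qc(*read1) and read_passes_qc(*read2)
```

Each read is passed as a `(sequence, quality_string)` tuple; the thresholds are keyword arguments so that other cutoffs can be tried without changing the logic.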

EXAMPLE 3:

Detection of cross-contamination and identification of significant somatic variants

Cross-contamination in the tumor samples is estimated using ContEst. Locus-wise and gene-wise driver scores are estimated by CRAVAT using the head and neck cancer database with the CHASM analysis option. Genes with a CHASM score of at least 0.35 are considered significant for comparison with other functional analyses. Somatic mutations are normalized with respect to the exome bait size (MB) to calculate the somatic mutation frequency per MB.

Annotation and functional analyses of variants

Annotation and functional analyses of somatic variants are performed using IntOGen (web version 2.4), MutSigCV and MuSiC2. Somatic variants, filtered to contain only those callable in the matched normal but not covered by any read in the control samples (VCF), are used for IntOGen with the 'cohort analyses' option. MutSigCV1.3 is also run with these variants, using coverage from un-filtered variants of all tumor samples. Pooled alignments for all normal and all tumor samples (bam), along with pooled variants for all normal samples (MAF), are analyzed using MuSiC2 to calculate the background mutation rates (bmrs) for all genes, and a list of significantly mutated genes is identified (p-value of convolution test < 0.05). A condensed list of 19 genes, common between at least two analyses, is compiled (Figure 1D).
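The rule of keeping genes reported by at least two of the analyses can be illustrated with a short sketch. The function name and the placeholder gene identifiers in the usage note are hypothetical and do not reproduce the actual tool outputs.

```python
from collections import Counter


def consensus_genes(gene_lists, min_support=2):
    """Return genes present in at least `min_support` of the supplied lists."""
    counts = Counter()
    for genes in gene_lists:
        # Count each gene once per analysis, even if a list repeats it.
        counts.update(set(genes))
    return sorted(g for g, c in counts.items() if c >= min_support)
```

For example, with three placeholder lists `["GENE_A", "GENE_B"]`, `["GENE_B", "GENE_C"]` and `["GENE_A", "GENE_B", "GENE_D"]`, the function keeps `GENE_A` and `GENE_B`, which are each supported by at least two lists.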

EXAMPLE 4:

SNP genotyping and validation using Illumina whole-genome Omni LCG arrays

High quality DNA (200 ng), quantified by Qubit (Invitrogen), is used as the starting material for whole-genome genotyping experiments following the manufacturer's specifications. Briefly, the genomic DNA is denatured at room temperature (RT) for 10 mins using 0.1N NaOH, neutralized and used for whole genome amplification (WGA) under isothermal conditions, at 37°C for 20 hrs. Post WGA, the DNA is enzymatically fragmented at 37°C for 1 hr. The fragmented DNA is precipitated with isopropanol at 4°C and resuspended in hybridization buffer. The samples are then denatured at 95°C for 20 mins, cooled at RT for 30 mins and 35 μl of each sample is loaded onto the Illumina HumanOmni 2.5-8 beadchip for hybridization (20 hrs at 48°C) in a hybridization chamber. The unhybridized probes are washed away and the chips (HumanOmni2.5-8 v1.0 and v1.1) are prepared for staining, single base extension and scanning using Illumina's HiScan system.

The SNP locations are filtered to retain only those that are called without any error, are contained within the exome boundaries as per the sequencing baits, and are callable (covered by at least five sequencing reads). At these locations, the overlap between the sequencing and array platforms is estimated for individual SNP calls, i.e., chr/pos/ref/alt, and for no calls, i.e., chr/pos/ref/ref.
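A minimal sketch of the platform comparison is given below. It assumes each platform's calls are held in a dictionary keyed by (chromosome, position) with a (ref, alt) value, where alt equals ref for a no-call in the sense used above; these data structures and the function name are illustrative assumptions, and only the minimum depth of five reads comes from the text.

```python
def platform_concordance(seq_calls, array_calls, depth, min_depth=5):
    """Fraction of shared, callable sites where both platforms agree.

    seq_calls / array_calls: {(chrom, pos): (ref, alt)}
    depth: {(chrom, pos): number of sequencing reads covering the site}
    Returns None when no site can be compared.
    """
    shared = [
        site for site in seq_calls
        if site in array_calls and depth.get(site, 0) >= min_depth
    ]
    if not shared:
        return None
    agree = sum(1 for site in shared if seq_calls[site] == array_calls[site])
    return agree / len(shared)
```

Sites below the depth cutoff are ignored rather than counted as disagreements, mirroring the restriction to callable locations.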

EXAMPLE 5:

Determining Copy Number Variations (CNVs) and Loss of Heterozygosity (LOH)

CNVs and LOHs are identified using the cnvPartition 3.1.6 plugin in Illumina GenomeStudio v2011.1, with default settings except for a minimum coverage of at least 10 probes per CNV/LOH and a confidence score threshold of at least 100. Somatic CNVs and LOHs are extracted by filtering out any region common to CNVs and LOHs detected in the matched control. Somatic CNVs and LOHs are further filtered with respect to common and disease-related CNVs and LOHs using CNV Annotator. Overlaps with common CNVs and LOHs are discarded, reporting only the overlaps with disease-related CNVs and LOHs, and novel CNVs and LOHs. The CNVs and LOHs are categorized within each cytoband and those with an occurrence in at least 10% of the patient samples are reported.
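The thresholds and the matched-control subtraction described above can be sketched as follows. The `Segment` type and function names are hypothetical, and dropping a tumor segment entirely when it overlaps any control segment is a simplifying assumption; the cutoffs (at least 10 probes, confidence of at least 100, occurrence in at least 10% of samples) are from the text.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    chrom: str
    start: int
    end: int
    n_probes: int
    confidence: float


def overlaps(a, b):
    """True when two segments share at least one position on the same chromosome."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def somatic_segments(tumor, control, min_probes=10, min_confidence=100):
    """Keep confident tumor segments that do not overlap any control segment."""
    confident = [s for s in tumor
                 if s.n_probes >= min_probes and s.confidence >= min_confidence]
    return [s for s in confident if not any(overlaps(s, c) for c in control)]


def recurrent_cytobands(bands_by_sample, min_fraction=0.10):
    """Return {cytoband: fraction of samples} for bands seen often enough."""
    n_samples = len(bands_by_sample)
    counts = Counter(b for bands in bands_by_sample.values() for b in set(bands))
    return {b: c / n_samples for b, c in counts.items()
            if c / n_samples >= min_fraction}
```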

RESULTS FROM EXOME SEQUENCING, READ QC, ALIGNMENT, VARIANT IDENTIFICATION, POST-PROCESSING FILTERS, IDENTIFICATION OF SIGNIFICANT SOMATIC VARIANTS, SNP GENOTYPING AND DETERMINATION OF CNVs and LOH (Examples 2-5)

Identification and validation of significant somatic variants and their relationship with other parameters

Variants are re-identified as described previously using whole-genome arrays, to validate the variant call accuracy as obtained from the exome sequencing data. Approximately 99% of the SNPs identified from Illumina sequencing are validated in both the tumor and matched control samples. After filtering and annotation, 19 cancer-associated genes are identified bearing significantly altered somatic variants in OTSCC (Figure 1D). These are validated using Sanger sequencing in two sets of samples: first, the same tumor-control pairs used in the exome sequencing (the discovery set, Table 1) and second, an additional 36-60 primary tumors (validation set) for genes altered in > 5% of the tumor samples. All the TP53 variants are validated in the discovery set. Three out of the four variants are validated for CASP8. The mutant alleles for the heterozygous variants in HLA-A, OBSCN, ING1, TTK and U2AF1 identified by exome sequencing are difficult to interpret from the results of the validation using Sanger sequencing as they are present at a very low frequency. On combining data from the validation set, the mutation frequencies for RASA1 and CDKN2A rose significantly to 10.71% and 16.47% respectively in primary tumors, but those for TP53 and CASP8 remain largely unchanged.

The somatic mutation frequency per MB ranges from 10-45 with a median around 25 (Figure 1B). The median value for the transition to transversion (ti/tv) ratio for both the tumor and its matched control samples is ~2.5. Overall, T->C changes are most frequent, followed by G->A and then T->G. Habits (smoking and alcohol consumption), nodal status, HPV infection, tumor grade and stage have no significant impact on the distribution of these nucleotide changes. The workflow described in the Methods section is used to identify somatic mutations and indels in tumor samples, following which three functional tools, IntOGen, MutSigCV and MuSiC2, are used for variant interpretation. In order to identify genes harboring significant variants, the intersection of the outputs of these tools is used, following the criteria that the somatic variants be callable in the matched control sample and not be present in even a single sequencing read in the control sample. This results in a final list of 19 cancer-associated genes (Figure 1C), which are divided into three categories with varying mutation frequencies (Figure 1D). The three frequency tiers are > 30% (TP53), 6-30% (RASA1, CASP8 and CDKN2A) and 2-5% (NOTCH1, NOTCH2, DMD and PIK3CA are prominent among them).
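For reference, the ti/tv ratio mentioned above is the number of transitions (purine to purine or pyrimidine to pyrimidine changes) divided by the number of transversions. The sketch below shows the calculation on a list of (ref, alt) single-base changes; the function names are illustrative and the input is assumed to contain only single-nucleotide substitutions.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """A transition swaps a purine for a purine or a pyrimidine for a pyrimidine."""
    ref, alt = ref.upper(), alt.upper()
    return ({ref, alt} <= PURINES) or ({ref, alt} <= PYRIMIDINES)


def titv_ratio(changes):
    """Compute transitions / transversions for (ref, alt) single-base changes."""
    snvs = [(r, a) for r, a in changes if r.upper() != a.upper()]
    transitions = sum(1 for r, a in snvs if is_transition(r, a))
    transversions = len(snvs) - transitions
    if transversions == 0:
        return float("inf")
    return transitions / transversions
```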

Next, mutual exclusivity of finding somatic variants in the genes is examined and it is found that many of these genes harbor variants in a mutually exclusive manner across samples (Figure 1E), suggesting the possibility that there might be some common pathway(s) involved in the development of OTSCC. Mutual exclusivity is observed among somatic variants in NOTCH1 and NOTCH2 genes, and this finding is expanded to identifying 15 such mutually exclusive sets (Figure 1E). Among them, CDKN2A, HLA-A and TTK form a mutually exclusive set with TP53; RASA1, OBSCN, HLA-A, AJUBA and TTK are mutually exclusive with either NOTCH1 alone, or NOTCH2 and ANK3 together; NOTCH1, NOTCH2, HLA-A, AJUBA, ANK3, TTK, MLL2, ING1 or KEAP1, are mutually exclusive with CASP8 alone, or FAT1 and DMD together; FAT1, HLA-A, AJUBA, ANK3, TTK, MLL2, ING1 or KEAP1, are mutually exclusive with PIK3CA or DMD or NOTCH1 and OBSCN, or CDKN2A and OBSCN, U2AF1, MLL2 and TTK form a small mutually exclusive set. The positions of the somatic variants from the final list of all 19 genes detected in OTSCC are juxtaposed against those found in the TCGA data using the cBioPortal. It is found that, in many of the genes, the somatic variants in OTSCC are in the same domains where mutations have been observed earlier. Copy Number Variation (CNV) analyses using data from the whole-genome SNP genotyping arrays reveal a large chunk of chromosome 9, bearing cancer-associated genes like CDKN2A, NF1 and MRPL4, to be affected in about 17% of the tumors (Figure 1F). Several CNVs of short stretches (in low kb range) are found within chromosomes 6-8, 11, 17 and X in many tumors.
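One simple way to screen for mutual exclusivity, sketched below, is to record the set of samples carrying a somatic variant in each gene and report gene pairs whose sample sets do not intersect. This is only a pairwise illustration under that assumption; it applies no statistical test and is not claimed to be the procedure used to derive the sets in the figure. The function name and the placeholder identifiers in the usage note are hypothetical.

```python
from itertools import combinations


def mutually_exclusive_pairs(samples_by_gene):
    """Return gene pairs that are each mutated but never in the same sample.

    samples_by_gene: {gene: set of sample identifiers with a somatic variant}
    """
    pairs = []
    for gene_a, gene_b in combinations(sorted(samples_by_gene), 2):
        set_a, set_b = samples_by_gene[gene_a], samples_by_gene[gene_b]
        if set_a and set_b and not (set_a & set_b):
            pairs.append((gene_a, gene_b))
    return pairs
```

With placeholder input `{"GENE_A": {"S1", "S2"}, "GENE_B": {"S3"}, "GENE_C": {"S2", "S3"}}`, only the pair `("GENE_A", "GENE_B")` is reported, because `GENE_C` shares a sample with each of the other two.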

EXAMPLE 6:

Gene Expression Assay

Gene expression profiling is carried out using Illumina HumanHT-12 v4 expression BeadChip (Illumina, San Diego, CA) in tumor and matched normal tissues following manufacturer's specifications. Total RNA is extracted from 20 mg of tissue using PureLink RNA (Invitrogen) and RNeasy (Qiagen) Mini kits. RNA quality is checked using Agilent Bioanalyzer using RNA Nano6000 chip. Samples with poor RIN numbers, indicating partial degradation of RNA, are processed using Illumina WGDASL assay as per manufacturer's recommendations. The RNA samples with no degradation are labelled using Illumina TotalPrep RNA Amplification kit (Ambion) and processed according to the array manufacturer's recommendations. Gene expression data is collected using Illumina's HiScan and analyzed with GenomeStudio (v2011.1 Gene Expression module 1.9.0), and all assay controls are checked to ensure quality of the assay and chip scanning. Raw signal intensities are exported from GenomeStudio for pre-processing and further analyzed using R.

Gene-wise expression intensities for tumor and matched control samples from GenomeStudio are transformed and normalized using VST (Variance Stabilizing Transformation) and LOESS methods, respectively, using the R package lumi. The data is further batch-corrected using ComBat. The pre-processed intensities for tumor and matched control samples are subjected to differential expression analyses using the R package limma. Genes with significant expression changes (adjusted P value <= 0.05) and fold change of at least 1.5 are followed up with further functional analyses.
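The selection criteria stated above (adjusted P value of at most 0.05 and a fold change of at least 1.5) can be expressed as a small filter. The sketch assumes the fold change is supplied on the log2 scale, so that both up- and down-regulation are covered by the absolute value; the function names and the input layout are illustrative assumptions.

```python
import math


def is_differential(adjusted_p, log2_fold_change,
                    max_adjusted_p=0.05, min_fold_change=1.5):
    """Apply the significance and effect-size cutoffs to one gene."""
    significant = adjusted_p <= max_adjusted_p
    # A 1.5-fold change in either direction is |log2 FC| >= log2(1.5).
    large_enough = abs(log2_fold_change) >= math.log2(min_fold_change)
    return significant and large_enough


def select_genes(results):
    """results: iterable of (gene, adjusted_p, log2_fold_change) tuples."""
    return [gene for gene, p, lfc in results if is_differential(p, lfc)]
```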

RESULTS:

Differentially expressed genes in OTSCC

Significantly (q value < 0.05) differentially expressed genes with a fold change of at least 1.5 reveal a consistent pattern of differential expression across the tumor samples (21 up- and 23 down-regulated genes, Figure 3A). Genes in PPAR signaling (e.g., MMP1) and ECM-receptor interaction pathways (LAMC2 and SPP1) are up-regulated and CRNN, APOD, SCARA5 and RERGL are down-regulated in a majority of tumors (Figure 3A). Next, the pathways involving genes with aberrant expression and their link with HPV infection and other clinical parameters are studied. Genes in the arachidonic acid metabolism and Toll-like receptor pathways are differentially expressed in patients with no smoking history (never smokers or past smokers) and alcohol habits (Figure 3B). SERPINE1 (a gene in the HIF-1 signaling pathway) is differentially expressed in patients who are negative for risk habits and are HPV-negative. The NF-κB and p53 signaling pathways are differentially expressed only in late stage tumors.

EXAMPLE 7:

Recurrence prediction using random forests

The presence or absence of somatic mutations/indels in the entire set of genes for all the OTSCC patients, along with their recurrence patterns, is used as the training set for the random forest analyses using the varSelRF package in R. This method performs both backward elimination of variables and selection based on their importance spectrum, and predicts recurrence patterns in the same set by iteratively eliminating 2% of the least important predictive variables until the current OOB (out-of-bag) error rate becomes larger than the initial or previous OOB error rates. In order to understand the specificity of the best minimalistic predictors of tumor recurrence, the .632+ error rate is estimated over 50 bootstrap replicates. The varSelRFBoot function from the varSelRF Bioconductor package is used to perform bootstrapping. The .632+ method is described by the following formula:

$$\widehat{Err}^{(.632+)} = \widehat{Err}^{(.632)} + \left(\widehat{Err}^{(1)} - \overline{err}\right)\frac{0.368 \cdot 0.632 \cdot R'}{1 - 0.368 \cdot R'}$$

where $\widehat{Err}^{(.632+)}$, $\widehat{Err}^{(.632)}$ and $\widehat{Err}^{(1)}$ are the errors estimated by the .632+ method, the original .632 method and the leave-one-out bootstrap method respectively, and $\overline{err}$ represents the apparent (resubstitution) error. R' represents a value between 0 and 1. Another popular error correction method used is the leave-one-out bootstrap method. The .632+ method is designed to correct the upward bias in the leave-one-out and the downward bias in the original .632 bootstrap methods.
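The .632+ correction can be illustrated numerically. The sketch below follows the standard definition of the estimator, in which the original .632 estimate is 0.368 times the apparent error plus 0.632 times the leave-one-out bootstrap error, and R' is the relative overfitting rate computed from a no-information error rate. The function name and the `gamma` argument (the no-information error rate, supplied by the caller) are assumptions made for illustration; this is not the varSelRFBoot implementation.

```python
def err632_plus(apparent_err, loo_boot_err, gamma):
    """Compute the .632+ bootstrap error estimate.

    apparent_err: error on the training data (resubstitution error)
    loo_boot_err: leave-one-out bootstrap error
    gamma:        no-information error rate (assumed supplied by the caller)
    """
    # The leave-one-out error is capped at the no-information rate.
    capped = min(loo_boot_err, gamma)
    # Relative overfitting rate R', constrained to lie between 0 and 1.
    if capped > apparent_err and gamma > apparent_err:
        r = (capped - apparent_err) / (gamma - apparent_err)
    else:
        r = 0.0
    err632 = 0.368 * apparent_err + 0.632 * loo_boot_err
    correction = (capped - apparent_err) * (0.368 * 0.632 * r) / (1 - 0.368 * r)
    return err632 + correction
```

When there is no overfitting (R' = 0) the result equals the original .632 estimate, and when overfitting is complete (R' = 1) it moves up to the leave-one-out bootstrap error, which is the behaviour the correction is meant to provide.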

For all iterations of all random forest analyses, it is confirmed that the variable importance remains the same before and after correcting for multiple hypotheses comparisons, using pre- and post-Benjamini-Hochberg FDR-corrected P values.
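For readers unfamiliar with the correction, a minimal sketch of the Benjamini-Hochberg adjustment is shown below. The function name is illustrative. Because the adjustment is monotone in the raw P values, the ordering of variables by P value is not reversed by the correction, although ties can appear.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted
```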

RESULTS:

Tumor recurrence prediction using random forests

After cataloging the significantly altered genes in OTSCC, a study is conducted to find out whether there is a relationship between the altered genes and loco-regional recurrence of tumors and metastasis. In order to do this, an ensemble machine learning method, implemented by variable elimination using random forests, is used (Figure 5). Multiple testing correction and the .632 bootstrapping method are used to estimate false positives. A 38-gene minimal signature is identified that discriminates between the non-recurring, loco-regionally recurring and distant metastatic tumors (Figure 5). The .632+ bootstrap error, indicative of prediction specificity, varies across non-recurrent, recurrent and distant metastatic tumors. The median error is low (0.03) and intermediate (0.3) for the non-recurrent and the loco-regionally recurrent categories respectively but is relatively higher (1.0) for the metastatic tumors. The errors are inversely related to the number of representative samples within each category.

EXAMPLE 8:

Pathway analyses

Consensus lists of genes from the analysis, filtering and annotation of variant calls, and from the differential expression analysis using whole genome micro-arrays, are mapped to pathways using the web version of Graphite Web employing the KEGG and Reactome databases. The network of interactions between genes is drawn using Cytoscape (v3.1.1) from the .sif file created by Graphite Web.

EXAMPLE 9:

Data Visualization

Circos (v0.66) is used for multi-dimensional data visualization. Additionally, the cBioPortal (http://www.cbioportal.org/) is used to visualize variants within the 19 genes harboring significant variants. All of the mandatory fields accepted by Mutation Mapper are provided for select genes from the study to create structural representations for each gene including domains. Such diagrams from the study, the HNSCC study and all cancer studies from TCGA are collated using the image-editing tool, GIMP (www.gimp.org). SNPs and indels are visualized for each individual tumor sample using IGV (v1.5.54), along with the reads supporting variants.

EXAMPLE 10:

Validation of somatic variants using Sanger sequencing

Primers used in Sanger sequencing for validation are designed using the NCBI primer designing tool (http://www.ncbi.nlm.nih.gov/tools/primer-blast/index.cgi?LINK_LOC=BlastHome).

The specificity of the designed primers is tested using UCSC's tool, In Silico PCR. The variant-bearing region is amplified by using specific primers and used in Sanger sequencing. The somatic variants are confirmed by sequencing in the entire tumor and matched control DNA set used for the exome sequencing followed by further validation in 60 additional tumor samples.

RESULTS OF PATHWAY ANALYSIS, DATA VISUALIZATION AND VALIDATION OF SOMATIC VARIANTS (Examples 8, 9 and 10)

Linking habits, HPV infection, nodal status, tumor grade and recurrence, with genes harboring somatic variants and the associated affected pathways

The 19 cancer-associated genes classified in the previous analyses are linked with habits, clinical parameters and HPV infection. Among the genes harboring significant somatic variants, CDKN2A is found to be mutated only in patients in the no-smoking category (never smokers and past smokers), and NOTCH1 only in those that consumed alcohol (Figure 2A). HPV-negative patients harbor fewer TP53 somatic variants, while HPV-positive patients alone have somatic variants in RASA1. DMD and PIK3CA are mutated only in the HPV-negative patients. Only the poorly differentiated tumor samples harbor variants in AJUBA, DMD and U2AF1, while NOTCH2 is mutated only in the well-differentiated tumors. Node-positive tumors alone have CASP8 variants, while node-negative tumors have a greater occurrence of TP53 variants. Somatic variants in RASA1, NOTCH2, DMD and PIK3CA occur exclusively in the late stage tumors while those in OBSCN and MLL2 occur exclusively in the early stage tumors (Figure 2A). Further, the association of cancer-related signaling pathways with habits and clinical parameters is studied, and it is found that nodal status and HPV infection are the two that have the highest impact (Figure 2B). The Procaspase-8 activation, Notch, p53 and Wnt signaling pathways are linked with the largest number of clinical parameters, HPV infection and habits (Figure 2B). In the study, primary tumors with loco-regional recurrence have higher occurrences of altered Wnt signaling pathway (Figure 2B).

Major signaling pathways implicated in OTSCC

Significant pathways which are altered in OTSCC are examined, taking into account all the molecular changes in tumors and it is found that apoptosis, HIF, Notch, mTOR, p53, PI3K/Akt, Wnt and Ras are some of the key signaling pathways affected in OTSCC. In addition, histone methylation, cell cycle/immunity and mRNA splicing processes are also affected.

A summary of all changes in OTSCC is given in Figure 7.

EXAMPLE 11:

Cell culture and knockdown of CASP8 gene

The human OTSCC cell lines UPCI:SCC040 (gift from Dr. Susan Gollin, University of Pittsburgh, PA, USA, DSMZ no: ACC 660) and UM-SCC47 (gift from Dr. Thomas Carey, University of Michigan, MI, USA) are used in the study. All the cells are maintained in Dulbecco's Modified Eagle's Medium (DMEM) supplemented with 10% FBS, 1% MEM nonessential amino acids solution and 1% penicillin/streptomycin mixture (Gibco) at 37°C in a 5% CO2 incubator. The siRNA-based knockdown assays are performed using the UPCI:SCC040 and UM-SCC47 cell lines for the CASP8 gene. The expression of Caspase-8 is transiently knocked down using ON-TARGETplus Human CASP8 SMARTpool siRNA (L-003466-00-0010; Dharmacon) along with an ON-TARGETplus Non-targeting siRNA (D-001810-01-20; Dharmacon). The transfection efficiency for the two cell lines (UPCI:SCC040 and UM-SCC47) is optimized using siGLO Red Transfection Indicator (D-001630; Dharmacon). The siRNA duplexes are transfected using Lipofectamine-2000 according to the manufacturer's instructions (Invitrogen). The medium containing the siRNA-oligo complexes is changed 8 hrs post transfection. The efficiency of transfection along with the mRNA expression is analyzed at 24 and 48 hrs post transfection by qRT-PCR. The specific down-regulation of CASP8 is confirmed by three independent experiments as given below:

RNA isolation and quantitative real-time PCR

RNA is extracted from cell pellets and tissues using RNeasy Mini kit spin columns (Qiagen) following the manufacturer's protocol. Genomic DNA contamination is removed using the RNase-Free DNase Set (Qiagen) and the total RNA is eluted in nuclease-free water (Ambion). The RNA samples are quantified using a Qubit 2.0 fluorometer (Invitrogen) and their integrity is checked by gel electrophoresis. The RNA samples are stored at -80°C until further use. The cDNA is synthesized from 400 ng total RNA using a SuperScript-III first strand cDNA synthesis kit, following the manufacturer's instructions (Invitrogen). The cDNA is then subjected to quantitative real-time PCR (q-RT-PCR) using KAPA SYBR FAST qPCR Master Mix (KK4601, KAPA). The primer pair used for testing the expression of caspase-8 in q-RT-PCR is: forward 5'-ATGATGACATGAACCTGCTGGA-3' and reverse 5'-CAGGCTCTTGTTGATTTGGGC-3'. The amplification is done on a Stratagene MX300P real time machine. To correct for inter-sample variation in RNA input, the expression values are normalized to GAPDH. All amplification reactions are done in triplicate, using nuclease-free water as the negative control. The differential gene expression is calculated using the comparative CT method of relative quantification.

Assessment of cell viability

The MTT cell proliferation assay is performed as per the manufacturer's instructions (Sigma) to assess cell viability. Briefly, cells are seeded on 96-well plates containing DMEM with 10% FBS and incubated overnight. After treatment with 0.1% DMSO (vehicle control) or cisplatin for 48 hrs, the medium is changed and 100 μl of MTT solution (1 mg/ml) is added to each well. The cells are further incubated for 4 hrs at 37°C. The formazan crystals in viable cells are dissolved by adding 100 μl of dimethyl sulfoxide (DMSO) (Merck). The absorbance is recorded at 540 nm using a reference wavelength of 690 nm on a microplate reader (Tecan Systems). Data are normalized to vehicle treatment, and the cell viability is calculated using GraphPad Prism software (version 4.03; La Jolla, CA). All the experiments are performed in triplicate.
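The normalization to vehicle treatment described above can be illustrated with a short calculation. The following is a minimal sketch, not the GraphPad Prism workflow used in the study. It assumes that the 690 nm reference reading is subtracted from the 540 nm reading for each well, and that viability is expressed as a percentage of the mean vehicle-control signal. The function names and the absorbance values are hypothetical.

```python
from statistics import mean


def corrected_absorbance(a540, a690):
    """Subtract the 690 nm reference reading from the 540 nm reading
    (assumed background correction)."""
    return a540 - a690


def percent_viability(treated_wells, vehicle_wells):
    """Each argument is a list of (A540, A690) tuples from replicate wells.
    Returns the treated signal as a percentage of the vehicle-control mean."""
    vehicle_mean = mean(corrected_absorbance(a, r) for a, r in vehicle_wells)
    if vehicle_mean <= 0:
        raise ValueError("vehicle control signal must be positive")
    treated_mean = mean(corrected_absorbance(a, r) for a, r in treated_wells)
    return 100.0 * treated_mean / vehicle_mean


# Hypothetical triplicate readings (not measured values from the study).
vehicle_wells = [(1.05, 0.05), (0.95, 0.05), (1.00, 0.05)]
cisplatin_wells = [(0.55, 0.05), (0.50, 0.05), (0.60, 0.05)]

viability = percent_viability(cisplatin_wells, vehicle_wells)
print(f"Viability relative to vehicle: {viability:.1f}%")
```

Repeating this calculation over a range of cisplatin concentrations gives the dose-response points from which an IC50 can be fitted; the fitting step itself is not shown here.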

Wound healing assay

Cells are cultured up to 80% confluency in 12-well plates, serum-starved for 24 hrs and then wounded using a 200 μl pipette tip. The wound is washed with 1x PBS and the cells are grown in DMEM containing 10% FBS. Cells are imaged at 10x magnification at 0 hr, 15 hrs, 23 hrs and 42 hrs. For each well, three wounds are made and the migration distance is photographed and measured using Carl Zeiss software (Zeiss). Each experiment is performed in triplicate.
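Wound closure at a given time point is commonly expressed as the percentage reduction of the gap relative to its size at 0 hr. The sketch below shows that arithmetic for the three wounds made in one well. It assumes the gap is measured as a width; the same formula applies to a wound area. The function names and the widths are hypothetical.

```python
def percent_closure(width_0h, width_t):
    """Percentage of the initial gap that has closed at time t."""
    if width_0h <= 0:
        raise ValueError("initial wound width must be positive")
    return 100.0 * (width_0h - width_t) / width_0h


def mean_closure(widths_0h, widths_t):
    """Average percent closure over the wounds made in one well."""
    values = [percent_closure(w0, wt) for w0, wt in zip(widths_0h, widths_t)]
    return sum(values) / len(values)


# Hypothetical gap widths in micrometres for three wounds in one well.
widths_0h = [800.0, 820.0, 780.0]
widths_15h = [400.0, 410.0, 390.0]

closure_15h = mean_closure(widths_0h, widths_15h)
print(f"Mean closure at 15 hrs: {closure_15h:.1f}%")
```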

Matrigel invasion assay

The ECM gel (E1270, Sigma) is thawed overnight at 4°C, plated at the requisite concentration (UPCI:SCC040: 1.5 mg/ml; UM-SCC47: 2 mg/ml) onto the transwell inserts and incubated overnight in the incubator at 37°C with 5% CO2. Cells are serum-starved overnight, harvested, counted and seeded (UPCI:SCC040: 50,000 cells per well; UM-SCC47: 20,000 cells per well) on top of the matrigel transwell inserts (2 mg/ml) in serum-free medium as per the manufacturer's specifications (Sigma). DMEM containing 10% FBS and 1% NEAA is added to the lower chamber. The 24-well plates containing matrigel inserts with cells are incubated in a 37°C incubator for 48 hrs. At the end of the incubation time, cells in the upper chamber are removed with cotton swabs, and cells that have invaded the Matrigel to the lower surface of the insert are fixed with 4% paraformaldehyde (Merck Millipore), permeabilized with 100% methanol, stained with Giemsa (Sigma), mounted on glass slides with DPX mounting agent and counted under a light microscope (Zeiss). Each experiment is performed in triplicate.

RESULTS:

Functional studies with CASP8 in OTSCC cell lines

CASP8 is mutated in a significant number of oral tongue tumors. Caspase-8 is an important and versatile protein that plays a role in both apoptotic (extrinsic or death receptor-mediated) and non-apoptotic processes. The functional consequences of siRNA-mediated CASP8 knockdown are studied in an HPV-positive (UM-SCC47) and an HPV-negative (UPCI:SCC040) OTSCC cell line. Prior to the functional assays, the concentration of siRNA required for silencing, the extent of CASP8 knockdown and the cisplatin sensitivity (IC50) are tested in both cell lines (Figure 4). The invasion of cells is greater in both the UM-SCC47 and UPCI:SCC040 cell lines when CASP8 is knocked down (Figure 4A). To analyze the effect of caspase-8 on the migration property of cells, scratches are made on the confluent monolayer of cells and the wound closure area is measured at different time points (0 hr, 15 hrs, 23 hrs and 42 hrs; Figure 4B). The wound closure is faster in CASP8 knockdown HPV-negative cells than in the HPV-positive cells. At 15 hrs, 23 hrs and 48 hrs, about 65%, 90% and 100% of the wound is closed, respectively, in the HPV-negative cell line, compared to 50%, 70% and 85%, respectively, over the same time period in the HPV-positive cell line (Figure 4B). siRNA knockdown of CASP8 rescues the chemo-sensitivity caused by cisplatin treatment, as evident from the MTT (3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide) survival assay (Figure 4C). It is found that the extent of rescue is greater in the HPV-negative cell line than in the HPV16-positive one.
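The extent of CASP8 knockdown referred to above is derived from the q-RT-PCR data by the comparative CT method with GAPDH as the normalizer. A minimal sketch of that calculation follows. It assumes roughly 100% amplification efficiency for both amplicons (hence the base of 2) and uses the non-targeting siRNA sample as the calibrator. The function name and all CT values are hypothetical, not results from the study.

```python
from statistics import mean


def relative_expression(ct_target_test, ct_ref_test, ct_target_cal, ct_ref_cal):
    """Comparative CT (2^-ddCT) relative quantification.

    Each argument is a list of replicate CT values:
    target gene and reference gene in the test sample, then in the calibrator.
    """
    delta_ct_test = mean(ct_target_test) - mean(ct_ref_test)
    delta_ct_cal = mean(ct_target_cal) - mean(ct_ref_cal)
    delta_delta_ct = delta_ct_test - delta_ct_cal
    # Assumes a doubling of product per cycle for both amplicons.
    return 2.0 ** (-delta_delta_ct)


# Hypothetical triplicate CT values.
casp8_sirna = [26.0, 26.1, 25.9]   # CASP8, CASP8-siRNA treated cells
gapdh_sirna = [18.0, 18.1, 17.9]   # GAPDH, CASP8-siRNA treated cells
casp8_nt = [24.0, 24.1, 23.9]      # CASP8, non-targeting siRNA cells
gapdh_nt = [18.0, 18.1, 17.9]      # GAPDH, non-targeting siRNA cells

fold = relative_expression(casp8_sirna, gapdh_sirna, casp8_nt, gapdh_nt)
knockdown_percent = 100.0 * (1.0 - fold)
print(f"Relative CASP8 expression: {fold:.2f}")
print(f"Knockdown: {knockdown_percent:.0f}%")
```

With these toy values the CASP8 signal appears two cycles later in the knockdown sample at equal GAPDH levels, which corresponds to about a quarter of the calibrator expression.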

The present disclosure thus presents a comprehensive study on HNSCC, particularly oral tongue tumors, through molecular characterization of OTSCC, and determines variants linked with habits, nodal status, tumor recurrence and HPV infection. Exome sequencing, whole-genome gene expression and genotyping arrays are performed using fifty primary tumors along with their matched control samples, towards identification of somatic variants (mutations and indels), significantly up- and down-regulated genes, loss of heterozygosity (LOH) and copy number variations (CNVs). All the molecular data, along with the clinical parameters and epidemiology such as tumor stage, nodal status, HPV infection, risk habits and tumor recurrence, are integrated to interpret the effect of these changes on the process of cancer development in the oral tongue. Significant somatic variations are identified in TP53 (38%), RASA1 (8%), CASP8 (8%), CDKN2A (6%), NOTCH1 (4%), NOTCH2 (4%) and PIK3CA (4%) from the exome sequencing study in OTSCC. The key variants are validated using an additional set of primary tumor samples. Variants in TP53 and NOTCH1 are found in mutually exclusive sets of tumors. Additionally, frequent aberrations are found in chromosomes 6-9 and 11 in tumor samples. A strong association is observed between somatic variations in some key genes and one or more risk habits or clinical parameters; for example, CDKN2A and NOTCH1 with smoking and alcohol consumption, respectively; RASA1 with HPV infection; CASP8 with nodal status; NOTCH2 with clinical grade of tumor; and RASA1, CASP8, AJUBA, ING1 and KEAP1 with tumor recurrence. From the gene expression analysis, it is found that matrix metalloproteases (MMPs) are highly expressed in OTSCC. Pathway analysis identifies Procaspase-8, Notch, Wnt, arachidonic acid, extracellular matrix (ECM)-receptor interaction, JAK-STAT and PPAR to be some of the significantly altered pathways in OTSCC.
An ensemble machine learning method is employed to determine a 38-gene minimal signature that distinguishes the group of tumors with loco-regional recurrence from the non-recurrent set and thereby predicts tumor recurrence. Finally, functional analysis of the CASP8 gene is performed in HPV-negative and HPV-positive OTSCC cell lines to establish its role in the process of tumor development.