
Title:
LUNG CANCER DIAGNOSTIC
Document Type and Number:
WIPO Patent Application WO/2022/090422
Kind Code:
A1
Abstract:
The invention concerns a method for determining lung cancer in a human subject with a high risk or likelihood of developing lung cancer comprising performing FTIR spectral analysis of a sputum sample obtained from the subject and, optionally, comparing the spectrum from the sample with that of a control; use of said method to further select a course of treatment for lung cancer and a method of treatment comprising same; and a kit of parts for use in said method.

Inventors:
LEWIS PAUL (GB)
BRILLIANT CHARLES (GB)
Application Number:
PCT/EP2021/080042
Publication Date:
May 05, 2022
Filing Date:
October 28, 2021
Assignee:
VINDICO ICS LTD (GB)
International Classes:
G01N21/3577; G01N33/487; G01N21/35
Domestic Patent References:
WO2012001370A1 2012-01-05
Other References:
LEWIS PAUL D ET AL: "Evaluation of FTIR Spectroscopy as a diagnostic tool for lung cancer using sputum", BMC CANCER, BIOMED CENTRAL, LONDON, GB, vol. 10, no. 1, 23 November 2010 (2010-11-23), pages 640, XP021087157, ISSN: 1471-2407, DOI: 10.1186/1471-2407-10-640
Attorney, Agent or Firm:
SYMBIOSIS IP LIMITED (GB)
Claims:
CLAIMS

1. A method for determining lung cancer in a human subject with a high risk or likelihood of developing lung cancer, the method comprising: i) performing Fourier transform infrared (FTIR) spectral analysis of a sputum sample obtained from the subject to produce a sample spectrum; ii) processing the spectrum obtained in step i) to provide one or more second derivative spectra with respect to wavenumber (or frequency); iii) measuring the spectral absorbance at any one or more of the following combinations of wavenumbers: 984cm-1 and 967cm-1; 1024cm-1 and 967cm-1; 1055cm-1 and 967cm-1; 1079cm-1 and 967cm-1; 1411cm-1 and 967cm-1; 1577cm-1 and 967cm-1; 1656cm-1 and 967cm-1; 1079cm-1 and 1034cm-1; 1079cm-1 and 1034cm-1; 1079cm-1 and 1168cm-1; 1079cm-1 and 1388cm-1; and/or 1079cm-1 and 1440cm-1; and iv) comparing the spectral absorbance(s) with respect to wavenumber according to one or more of the following equations or a mathematically rearranged derivative thereof, wherein x represents the second derivative absorbance at a first wavenumber, y represents the second derivative absorbance at a second wavenumber and wherein cancer is indicated if said equation is satisfied:

2. A method for determining lung cancer in a human subject with a high risk or likelihood of developing lung cancer, the method comprising: i) performing Fourier transform infrared (FTIR) spectral analysis of a sputum sample obtained from the subject to produce a sample spectrum; ii) comparing the sample spectrum obtained in step i) with an FTIR spectrum from a control subject, iii) wherein a difference between the spectrum produced from the sample and the control at one or more of the following combinations of wavenumbers: 984cm-1 and 967cm-1; 1024cm-1 and 967cm-1; 1055cm-1 and 967cm-1; 1079cm-1 and 967cm-1; 1411cm-1 and 967cm-1; 1577cm-1 and 967cm-1; 1656cm-1 and 967cm-1; 1079cm-1 and 1034cm-1; 1079cm-1 and 1034cm-1; 1079cm-1 and 1168cm-1; 1079cm-1 and 1388cm-1; and/or 1079cm-1 and 1440cm-1 is indicative of a subject suffering from lung cancer.

3. The method according to claim 2 wherein second derivative absorbances with respect to wavenumber are compared according to one or more of the following equations or a mathematically rearranged derivative thereof, wherein x represents the second derivative absorbance at a first wavenumber, y represents the second derivative absorbance at a second wavenumber and wherein cancer is indicated if said equation is satisfied:

4. The method according to any one of claims 2-3 wherein step iii) comprises or consists of observing a difference wherein an increase or decrease in absorbance at the same wavenumber or a shift in position of absorbance maxima or minima between wavenumbers between the spectrum produced from the sample and the control is indicative of a subject suffering from lung cancer.

5. The method according to any preceding claim wherein said sputum sample is whole sputum.

6. The method according to any preceding claim wherein said control sample is from a subject with a high risk or likelihood of developing lung cancer.

7. The method according to any preceding claim wherein FTIR analysis is undertaken in transmission mode.

8. The method according to any preceding claim wherein the, or each, spectra undergoes one or more conventional pre-processing step(s) prior to or following the comparison step to reduce the noise associated with the one or more spectra to provide the, or each, processed spectra.

9. The method according to claim 8 wherein the one or more pre-processing step(s) are selected from the group comprising: background subtraction and/or normalisation such as vector normalisation and/or baseline correction, or combination thereof.

10. The method according to claim 9 wherein the, or each, pre-processed spectra is then further processed to provide one or more dimensionally reduced spectrum.

11. The method according to claim 10 wherein spectra is processed to provide second derivative spectra with respect to wavenumber (or frequency).

12. A method for monitoring the progression of lung cancer in a human subject comprising repeating one or more of the methods according to any one of claims 1-11, and ideally the same, method(s) periodically.

13. A method for treating lung cancer comprising performing the method according to any one or more of claims 1-12 and depending upon the outcome of the method, where cancer is indicated undertaking a suitable or selected course of treatment.

14. A kit for use in determining lung cancer in a sputum sample from a human subject with a high risk or likelihood of developing lung cancer, said kit comprising: a) an FTIR machine for performing spectral analysis of a sputum sample obtained from the subject; b) a processing unit, wherein said processing unit processes the spectrum from the sample according to the methods according to any one or more of claims 1-11; and c) an output unit arranged to provide an output indicative of a subject suffering from lung cancer in the subject according to a determination made by the processing unit in step b).

15. The kit according to claim 14 further comprising a collection means for collecting the sputum samples.

16. The kit according to claim 15 wherein said collection means is provided as an infrared transparent slide.

17. The kit according to claim 16 wherein said kit comprises a sample or slide holder.

Description:
Lung Cancer Diagnostic

Field of the Invention

The invention concerns a method for determining lung cancer in a human subject with a high risk or likelihood of developing lung cancer, said method comprising performing FTIR spectral analysis of a sputum sample obtained from the subject and, optionally, comparing a portion of the spectrum from the sample with that of a control; use of said method to further select a course of treatment for lung cancer and a method of treatment comprising same; and a kit of parts for use in said method.

Background of the Invention

Worldwide, lung cancer represents a huge burden on healthcare systems and is a major cause of mortality. It is the most common cancer in adult males (16.7%) and the third most common in adult females (8.7%), and is responsible for 23.6% and 13.6% of all cancer deaths in adult males and adult females, respectively. Lung cancer patients also have a very poor 5-year survival rate of <10%, primarily because the majority of patients are diagnosed only after the disease has progressed to a stage that can no longer be easily treated.

The currently available methods for detection of lung cancer include flexible bronchoscopy, computed tomography (CT)-scan and X-ray. However, these techniques are not effective for early detection of the disease, as evidenced by the extremely poor rate of diagnosis of early stage disease. Flexible bronchoscopy has been shown to have an overall sensitivity for lung cancer diagnosis of 88%; however, this sensitivity drops markedly, to 34%, for peripheral lesions of <2cm in diameter. Almost 1 in 4 (23%) of diagnostic X-rays have been shown to provide false negative results, whilst CT-scan has been shown to have 88.9% sensitivity and 92.6% specificity for diagnosis of lung cancer in a study comparing X-ray to CT-scanning. However, CT-scanning is limited by the potential for over-diagnosis and for causing radiation-related harm to the patient. Evidently, there is a clear unmet need for a highly sensitive and specific diagnostic tool, capable of diagnosing both centrally located and peripherally located lesions, whilst causing minimal harm and distress to the patient.

Fourier transform infrared (FTIR) spectroscopy is a highly sensitive analytical method which is capable of rapidly analysing structural changes in molecules. Due to its inherent ease of use, high reproducibility and non-invasiveness, FTIR has previously been applied with success to a range of biofluids and tissue samples. The technique is capable of analysing microlitre volumes of sample, with minimal sample preparation required. FTIR has shown promise as a sensitive diagnostic tool to distinguish neoplastic from normal cells in cancers such as colon, prostate, breast, cervical, gastric, oral and oesophageal cancer. Briefly, FTIR measures chemical bond vibrations by measuring infrared (IR) absorbance by a sample, or transmission through a sample, and then produces an infrared spectrum based on the absorptive or transmissive properties of that sample. Depending on the analysis, changes in IR spectra, including readings at specific wavenumbers or wavenumber regions, can be used to infer changes in sample composition that may correlate with a disease condition.

Indeed, it has previously been shown that FTIR can be used as a method for identifying biochemical changes in processed pelleted sputum as biomarkers for detection of lung cancer. Sputum was collected from lung cancer patients and healthy controls who showed no previous history of cancer or lung disease. FTIR spectra were generated from sputum cell pellets using infrared wavenumbers within the 1800 to 950 cm-1 "fingerprint" region, identifying certain regions of importance in diagnosis.

This study was limited in that the identified regions could differentiate lung cancer from non-cancer with only 80% sensitivity and specificity, and only when comparing to healthy individuals. Further, and more significantly, there are many major contributors to developing lung cancer, including important environmental and occupational risks, as well as a multitude of genetic factors, which can skew diagnostic testing. Patients with chronic obstructive pulmonary disease (COPD), a common smoking-related obstructive respiratory disease, have a higher risk of developing lung cancer as their forced expiratory volume in one second (FEV1) declines. Diagnosis of lung cancer in such individuals is influenced by the patients’ background, with a late-stage diagnosis more likely in the presence of comorbidities and disability, and with COPD as a comorbid condition being strongly associated with stage-independent poor survival. Indeed, some COPD patients, especially those who have frequent exacerbations, can be accustomed to frequent changes in their condition, including in their tissue samples of relevance for diagnosis (i.e. sputum in this case), and this may contribute further to a late-stage diagnosis of cancer as persistent changes to symptoms could be attributed to an exacerbation. This can thus lead to erroneous or incorrect diagnosis of lung cancer in such individuals when considering such diagnostic methodologies.

There is therefore also clearly an unmet need for a diagnostic tool and refined FTIR analysis that is able to finely resolve and identify those patients with lung cancer at an early stage, especially amongst those individuals with an increased risk or likelihood of developing lung cancer due to the occurrence of other co-morbidities such as COPD.

Accordingly, we herein disclose a refined FTIR spectral waveform signature and analysis that can be used to diagnose and predict lung cancer and, significantly and superiorly, even amongst those individuals typically predicted to be at high risk of lung cancer. As disclosed herein, such high risk individuals can often otherwise exhibit changes to sputum that may lead to erroneous measure of sputum molecular changes, thereby skewing analysis and leading to inaccurate or inconclusive diagnosis when undertaking FTIR spectral analysis. Therefore, such signatures provide a robust and superior diagnostic analysis that pushes the diagnostic power of FTIR analysis in lung cancer to a level of increased sensitivity and specificity that is a pre-requisite in any clinical diagnostic test.

Statements of Invention

According to a first aspect of the invention there is provided a method for determining lung cancer in a human subject with a high risk or likelihood of developing lung cancer, the method comprising: i) performing Fourier transform infrared (FTIR) spectral analysis of a sputum sample obtained from the subject to produce a sample spectrum; ii) comparing the sample spectrum obtained in step i) with an FTIR spectrum from a control subject, iii) wherein a difference between the spectrum produced from the sample and the control at one or more wavenumbers within one or more ranges selected from the group comprising: about 967 cm-1, about 984 cm-1, about 1024 cm-1, about 1034 cm-1, about 1055 cm-1, about 1079 cm-1, about 1168 cm-1, about 1388 cm-1, about 1411 cm-1, about 1440 cm-1, about 1577 cm-1, about 1656 cm-1 is indicative of a subject suffering from lung cancer.

Remarkably, when analysing a sputum sample it has been found that these specific wavenumbers are able to accurately predict and diagnose lung cancer even in those individuals with high likelihood of developing lung cancer for example, but not limited to, COPD patients, smokers, persons previously exposed to asbestos, and persons who live in areas of high air pollution, and thus provide a robust and highly accurate method with heretofore undisclosed levels of sensitivity and specificity.

Reference herein to lung cancer refers to any cancer or tumour originating from lung tissue including, but not limited to: non-small-cell lung carcinomas such as adenocarcinoma, squamous-cell carcinoma, and large-cell carcinoma; small-cell lung carcinoma; adenosquamous carcinoma; mesothelioma; carcinoid tumours; bronchial gland carcinomas; sarcomatoid carcinomas.

As is known in the art, sputum refers to the coughed-up material (phlegm), typically secreted by goblet cells from the lower airways (trachea and bronchi). Sputum can be any colour including clear, white, yellow, green, pink or red and blood tinged which can result from different medical conditions. In addition to containing dead cells, foreign debris that is inhaled into the lung, and at times, bacteria, sputum contains white blood cells and other immune cells that protect the airway from infections. Contrary to FTIR analysis of the prior art, wherein a substantial degree of processing of the sputum is required e.g. preparation of cell pellets and heating of the sample, in a preferred method whole sputum, ideally dried, is utilised and subject to FTIR analysis. As will be appreciated by those skilled in the art, in this manner the sputum samples are advantageously tested without substantial sample pre-processing or preparation, other than sampling from the subject to be tested and optionally freezing (for storage purposes), thawing and/or drying (either by active process or merely by atmospheric drying). In addition to simplifying sample handling and processing times, it has been found that this preparation minimises presence of saliva in the sputum sample which can lead to spectral artefacts that may adversely affect diagnostic result.

Reference herein to a control sample refers to a sample from a subject who has been shown not to have lung cancer using any one or more conventional techniques for identifying same such as, but not limited to, flexible bronchoscopy, CT-scan, X-ray, ultrasound, MRI or the like. As is disclosed herein, it has been found that, through rigorous analysis, a refined FTIR spectral waveform signature has been determined that can be used to diagnose and predict lung cancer and, significantly and superiorly, even amongst those individuals typically predicted to be at high risk of lung cancer. As will be appreciated by those skilled in the art, such high risk individuals can often otherwise exhibit changes to sputum that may lead to erroneous measure of sputum molecular changes that would skew analysis and so lead to inaccurate or inconclusive diagnosis when undertaking FTIR spectral analysis using wavenumber signatures of the prior art. Accordingly, and more preferably, said control sample is from a subject with a high risk or likelihood of developing lung cancer, such as but not limited to, persons with COPD, smokers, persons previously exposed to asbestos, and/or persons who live in areas of high air pollution, but shown not to have lung cancer as determined using any one or more conventional techniques for identifying same.

In a preferred method of the invention, FTIR analysis is undertaken in transmission mode. In contrast to reflectance modes (such as Attenuated Total Reflectance FTIR (ATR-FTIR)), wherein incident light is reflected and measured to provide spectral readings, in transmission mode the sample is placed directly into the infrared (IR) beam. As the IR beam passes through the sample, the transmitted energy is measured and a spectrum is generated. Whilst the quality of the data produced is comparable in different modes, readings can be affected at certain wavenumbers by water, which, if present, can obscure the protein absorbance bands. In the present context, water vapour is easily trapped in sputum at the sputum/substrate interface as it dries onto the substrate. However, the effect of water vapour is decreased using transmission FTIR, due to the IR beam passing directly through the whole sample.

Reference herein to the term ‘about’ means plus or minus 5% and most preferably plus or minus 2%. For example, given the nature of the art it will be appreciated by those skilled in the field that there may be variation around recited wavenumbers owing to sample variability, for example, ±5 cm-1, ±4 cm-1, ±3 cm-1, ±2 cm-1, ±1 cm-1.
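As a minimal sketch of how such a tolerance might be applied in practice, the following Python helper looks up the measured point closest to a recited wavenumber and accepts it only if it lies within a chosen window. The function name `absorbance_near`, the default tolerance of 2 cm-1 and the sample values are illustrative assumptions and are not taken from the specification.

```python
def absorbance_near(wavenumbers, absorbances, target, tol=2.0):
    """Return the absorbance at the measured point closest to `target`,
    or None if no point lies within `tol` cm-1 of it.
    (Hypothetical helper; the tolerance value is an assumption.)"""
    best = None
    for wn, a in zip(wavenumbers, absorbances):
        dist = abs(wn - target)
        if dist <= tol and (best is None or dist < best[0]):
            best = (dist, a)
    return None if best is None else best[1]


# Illustrative values only.
wn = [1075.0, 1077.0, 1079.5, 1082.0]
ab = [0.10, 0.12, 0.15, 0.11]
print(absorbance_near(wn, ab, 1079))         # 0.15
print(absorbance_near(wn, ab, 1090, tol=5))  # None
```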

As will be appreciated by those skilled in the art, in a preferred method of the invention the, or each, spectra preferably undergoes one or more conventional pre-processing steps prior to or following the comparison step to reduce the noise associated with the one or more spectra to provide the, or each, processed spectra. The pre-processing step(s) may comprise one or more of: background subtraction, and/or normalisation such as vector normalisation and/or baseline correction, or other method known to those skilled in the art. Preferably multiple output spectra are obtained and each spectrum is preferably subjected to one or more, preferably two or more, of wavenumber correction, baseline correction and vector normalisation. In preferred embodiments, the, or each, processed spectra is then further processed to provide one or more dimensionally reduced spectrum such as, but not limited to, second derivative spectra with respect to wavenumber (or frequency). The or each dimensionally reduced spectrum/spectra is/are then compared to similarly dimensionally reduced control spectra.
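The pre-processing chain described above (baseline correction, vector normalisation, then a second derivative with respect to wavenumber) can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the two-point linear baseline and the central finite-difference derivative are simple stand-ins chosen for this sketch (the specification names neither algorithm), and the toy spectrum and all function names are hypothetical.

```python
import math


def baseline_correct(values):
    """Subtract a straight line joining the first and last points
    (a simple two-point baseline; an assumption, not the source's method)."""
    n = len(values)
    first, last = values[0], values[-1]
    return [v - (first + (last - first) * i / (n - 1)) for i, v in enumerate(values)]


def vector_normalise(values):
    """Scale the spectrum so that its Euclidean (L2) norm is 1."""
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values]


def second_derivative(values, step):
    """Central finite-difference second derivative on an evenly spaced grid.
    The two end points are dropped because they lack a neighbour."""
    return [
        (values[i - 1] - 2 * values[i] + values[i + 1]) / step ** 2
        for i in range(1, len(values) - 1)
    ]


# Toy spectrum: one Gaussian band centred at 1000 cm-1 on a sloping baseline.
step = 2.0  # cm-1 between points (hypothetical)
wavenumbers = [950 + step * i for i in range(51)]
raw = [0.001 * i + math.exp(-((w - 1000) / 10) ** 2) for i, w in enumerate(wavenumbers)]

processed = vector_normalise(baseline_correct(raw))
d2 = second_derivative(processed, step)

# An absorbance maximum shows up as a minimum in the second derivative.
peak_index = min(range(len(d2)), key=d2.__getitem__)
print(wavenumbers[peak_index + 1])  # 1000.0
```

The index offset of one in the last line accounts for the end point dropped by the derivative.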

In a preferred method, step iii) comprises or consists of observing a difference wherein an increase or decrease in absorbance at the same wavenumber or a shift in position of absorbance maxima or minima between wavenumbers between the spectrum produced from the sample and the control is indicative of a subject suffering from lung cancer. Most preferably, an increase or decrease in absorbance between wavenumbers between the spectrum produced from the sample and the control is observed. More preferably still, the method comprises or consists of observing a difference wherein an increased absorbance at a wavenumber selected from one or more of about 984cm-1, about 1034cm-1, 1055cm-1 and 1440cm-1 and/or wherein a decreased absorbance at a wavenumber selected from one or more of about 967cm-1, about 1024cm-1, about 1079cm-1, about 1168cm-1, about 1388cm-1, about 1411cm-1, about 1577cm-1, and 1656cm-1 in the sample when compared with the control is indicative of a subject suffering from lung cancer.
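The direction-of-change comparison in the preceding paragraph can be illustrated with a short Python sketch. The table of expected directions is copied from that paragraph; the function name, the dictionary-based input format and the absorbance values are hypothetical, and the sketch makes no assumption about how many matching wavenumbers would be needed to indicate cancer.

```python
# Direction of change expected in a cancer sample relative to control,
# as listed in the paragraph above (wavenumbers in cm-1).
EXPECTED_DIRECTION = {
    984: "increase", 1034: "increase", 1055: "increase", 1440: "increase",
    967: "decrease", 1024: "decrease", 1079: "decrease", 1168: "decrease",
    1388: "decrease", 1411: "decrease", 1577: "decrease", 1656: "decrease",
}


def matching_wavenumbers(sample, control):
    """Return the wavenumbers at which the sample differs from the control
    in the listed direction. `sample` and `control` map wavenumber to
    absorbance; wavenumbers missing from either are skipped."""
    matches = []
    for wn, direction in sorted(EXPECTED_DIRECTION.items()):
        if wn not in sample or wn not in control:
            continue
        diff = sample[wn] - control[wn]
        if (direction == "increase" and diff > 0) or (
            direction == "decrease" and diff < 0
        ):
            matches.append(wn)
    return matches


# Hypothetical absorbance values, for illustration only.
control = {967: 0.30, 984: 0.20, 1079: 0.50, 1440: 0.10}
sample = {967: 0.25, 984: 0.26, 1079: 0.55, 1440: 0.12}
print(matching_wavenumbers(sample, control))  # [967, 984, 1440]
```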

More preferably still, the said method comprises or consists of comparing the spectral absorbance at wavenumbers in any combination selected from the group comprising or consisting of: about 967 cm-1, about 984 cm-1, about 1024 cm-1, about 1034 cm-1, about 1055 cm-1, about 1079 cm-1, about 1168 cm-1, about 1388 cm-1, about 1411 cm-1, about 1440 cm-1, about 1577 cm-1, about 1656 cm-1, and more preferably a combination of at least two wavenumbers.

Yet more preferably still, said method comprises or consists of comparing the spectral absorbance at any one or more of the following combinations of wavenumbers: 984cm-1 and 967cm-1; 1024cm-1 and 967cm-1; 1055cm-1 and 967cm-1; 1079cm-1 and 967cm-1; 1411cm-1 and 967cm-1; 1577cm-1 and 967cm-1; 1656cm-1 and 967cm-1; 1079cm-1 and 1034cm-1; 1079cm-1 and 1034cm-1; 1079cm-1 and 1168cm-1; 1079cm-1 and 1388cm-1; and/or 1079cm-1 and 1440cm-1. In exemplary embodiments, spectral absorbance at 1079cm-1 and 1168cm-1 or at 1079cm-1 and 967cm-1 is compared.

In preferred embodiments, second derivative absorbances with respect to wavenumber are compared according to one or more of the following equations, wherein x represents the second derivative absorbance at a first wavenumber, y represents the second derivative absorbance at a second wavenumber and wherein cancer is indicated if said equation is satisfied.

More preferably, second derivative absorbances with respect to wavenumber are compared according to one or more of the following equations, wherein x represents the second derivative absorbance at a first wavenumber, y represents the second derivative absorbance at a second wavenumber and wherein cancer is indicated if said equation is satisfied.

Alternatively, as will be readily appreciated by those skilled in the art, the comparison of the second derivative absorbance at a first wavenumber with the second derivative absorbance at a second wavenumber can be considered relative to one another in the alternative arrangement using a mathematically rearranged derivative of one or more of the equations identified above, and used to indicate cancer to equal effect.
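To illustrate what a "mathematically rearranged" form of such a comparison means, the sketch below uses a generic linear rule of the form y > m·x + c. The equations themselves are not reproduced in this text, so the slope, intercept, inequality direction and readings are all placeholders chosen for illustration; only the roles of x and y (second derivative absorbances at the first and second wavenumber) come from the description.

```python
def cancer_indicated(x, y, slope, intercept):
    """Decision rule of the form y > slope * x + intercept.
    x, y: second derivative absorbances at the first and second wavenumber.
    The inequality direction and coefficients are placeholders."""
    return y > slope * x + intercept


def cancer_indicated_rearranged(x, y, slope, intercept):
    """The same rule solved for x instead of y.
    Dividing by the slope flips the inequality when the slope is negative."""
    if slope == 0:
        return y > intercept
    threshold = (y - intercept) / slope
    return x < threshold if slope > 0 else x > threshold


# Hypothetical coefficients and readings, for illustration only.
SLOPE, INTERCEPT = 0.5, 0.001
readings = [(0.004, 0.0045), (0.004, 0.002), (-0.002, 0.0005)]
results = []
for x, y in readings:
    a = cancer_indicated(x, y, SLOPE, INTERCEPT)
    b = cancer_indicated_rearranged(x, y, SLOPE, INTERCEPT)
    results.append((a, b))
    print(x, y, a, b)
```

Both functions return the same answer for every reading, which is the sense in which a rearranged equation can be used "to equal effect".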

According to a further aspect of the invention, there is provided a method for determining lung cancer in a human subject with a high risk or likelihood of developing lung cancer, the method comprising: i) performing Fourier transform infrared (FTIR) spectral analysis of a sputum sample obtained from the subject to produce a sample spectrum; ii) processing the spectrum obtained in step i) to provide one or more second derivative reduced spectra with respect to wavenumber (or frequency); iii) measuring the spectral absorbance at any one or more of the following combinations of wavenumbers: 984cm-1 and 967cm-1; 1024cm-1 and 967cm-1; 1055cm-1 and 967cm-1; 1079cm-1 and 967cm-1; 1411cm-1 and 967cm-1; 1577cm-1 and 967cm-1; 1656cm-1 and 967cm-1; 1079cm-1 and 1034cm-1; 1079cm-1 and 1034cm-1; 1079cm-1 and 1168cm-1; 1079cm-1 and 1388cm-1; and/or 1079cm-1 and 1440cm-1; and iv) comparing the spectral absorbance(s) with respect to wavenumber according to one or more of the following equations, wherein x represents the second derivative absorbance at a first wavenumber, y represents the second derivative absorbance at a second wavenumber and wherein cancer is indicated if said equation is satisfied.

More preferably still, step iv) of said method comprises comparing absorbances with respect to wavenumber according to one or more of the following equations, wherein x represents the second derivative absorbance at a first wavenumber, y represents the second derivative absorbance at a second wavenumber and wherein cancer is indicated if said equation is satisfied. Alternatively, as will be readily appreciated by those skilled in the art, the comparison of the second derivative absorbance at a first wavenumber with the second derivative absorbance at a second wavenumber can be considered relative to one another in the alternative arrangement using a mathematically rearranged derivative of one or more of the equations identified above, and used to indicate cancer to equal effect.

According to a further aspect of the invention, there is provided a method for monitoring the progression of lung cancer in a human subject comprising repeating one or more of the afore, and ideally the same, method(s) periodically.

Ideally, FTIR analysis is correlated with known lung cancer staging techniques, such as a simple in vitro assay or biopsy, so that it can reliably inform a clinician about not only the existence of a lung cancer but also its stage or progression.

As will be appreciated by those skilled in the art, in the above method of the invention comparing spectral absorbance with respect to wavenumber from the sample and/or analysing relative spectral difference(s) between the spectrum produced from the sample and the control sample at one or more wavenumbers within one or more ranges disclosed herein can be used to assess how effectively a treatment regimen is working, for example, by assaying the absorbance levels during the course of a given therapy to determine if there is a change in the absorbance in response to said treatment.

Additionally, or alternatively, there is provided a method for treating lung cancer comprising performing any one of the afore methods and then, depending upon the outcome of the method, undertaking a suitable or selected course of treatment.

According to a further aspect of the invention there is provided a kit for use in determining lung cancer in a sputum sample from a human subject with a high risk or likelihood of developing lung cancer, said kit comprising: a) an FTIR machine for performing spectral analysis of a sputum sample obtained from the subject; b) a processing unit, wherein said processing unit processes the spectrum from the sample according to a method as defined herein; and c) an output unit arranged to provide an output indicative of a subject suffering from lung cancer in the subject according to a determination made by the processing unit in step b).

In a preferred kit of the invention, FTIR analysis is undertaken in transmission mode.

In yet a further preferred kit of the invention, the kit further comprises a collection means for collecting the sputum samples, ideally provided as an infrared transparent slide.

Yet more preferably still, said kit further comprises a sample or slide holder to keep the slide in place during measurement of absorbance.

Throughout the description and claims of this specification, the words “comprise” and “contain” and variations of the words, for example “comprising” and “comprises”, mean “including but not limited to” and do not exclude other moieties, additives, components, integers or steps. Throughout the description and claims of this specification, the singular encompasses the plural unless the context otherwise requires. In particular, where the indefinite article is used, the specification is to be understood as contemplating plurality as well as singularity, unless the context requires otherwise.

All references, including any patent or patent application, cited in this specification are hereby incorporated by reference. No admission is made that any reference constitutes prior art. Further, no admission is made that any of the prior art constitutes part of the common general knowledge in the art. Preferred features of each aspect of the invention may be as described in connection with any of the other aspects.

Other features of the present invention will become apparent from the following examples. Generally speaking, the invention extends to any novel one, or any novel combination, of the features disclosed in this specification (including the accompanying claims and drawings). Thus, features, integers, characteristics, compounds or chemical moieties described in conjunction with a particular aspect, embodiment or example of the invention are to be understood to be applicable to any other aspect, embodiment or example described herein, unless incompatible therewith.

Moreover, unless stated otherwise, any feature disclosed herein may be replaced by an alternative feature serving the same or a similar purpose.

The invention will now be described by way of example only with reference to the Examples below and to the following Figures wherein:

Figure 1. Vector-normalised, baseline-corrected absorbance spectra from 1800-950cm-1 of cancer (black dashed) and COPD (grey solid) cohorts. Peaks and troughs where differences between cancer and non-cancer are visible are easily identifiable at approximately 1740, 1650, 1590, 1410 and 1075cm-1;

Figure 2. A) QQ normality plots and B) frequency histograms of distribution of absorbances at 1653cm-1 and 1076cm-1. Plots of distribution at 1653cm-1 show a heavy-tailed distribution. Distribution at 1076cm-1 is closer to normality but the cancer cohort shows a light-tailed distribution and the control cohorts show a light skewing;

Figure 3. Vector-normalised baseline-corrected absorbances at 1076cm-1 and 1653cm-1 of all cancer (red), and COPD (green);

Figure 4. Second derivative spectra from (a) 1800-950cm-1, (b) 1400-1300cm-1, and (c) 1250-950cm-1 of cancer and COPD cohorts (colours as in Figure 1);

Figure 5. Comparison of second derivative intensities at 1168cm-1 and 1079cm-1 in cancer (red) and COPD (green) patient spectra. The linear regressor separates the patient clusters and the associated equation is used as the predictor;

Figure 6. Two-dimensional heatmap of predictions by each regression model, with sensitivities and specificities for each model shown. A prediction of cancer is shown as red, and a prediction of COPD is shown as green.

Table 1. Results from Shapiro-Wilk test for normality of distribution of absorbances at 1740, 1653, 1589, 1410 and 1076cm-1 in cancer and non-cancer control cohorts. P<0.05 suggests that the null hypothesis of normally distributed data can be rejected and the data are non-normally distributed. *P>0.05, the null hypothesis cannot be rejected, the data are normally distributed;

Table 2. Results of significance testing, comparing the normalised absorbances at each wavenumber between the cancer and non-cancer cohorts with a Mann-Whitney U test; all wavenumbers tested were shown to be highly significantly different between the patient groups;

Table 3. Sensitivity and specificity scores for determining lung cancer from non-cancer control groups based on the equations of the three lines shown in Figure 3;

Table 4. Average major peak positions within the glycogen-rich region from a subset of 60 randomly selected lung cancer patient second-derivative spectra. Standard deviation and variance for each peak have been calculated and the lowest values are highlighted;

Table 5. Average major peak positions within the glycogen-rich region from all COPD patient second-derivative spectra. Standard deviation and variance for each peak have been calculated and the lowest values are highlighted;

Table 6. Summary of two-dimensional linear model performances, with equations, and model performance characteristics shown;

Table 7. Mean second derivative absorbances for cancer and COPD groups.

Methods

Patient Recruitment

All patients provided informed consent for their samples to be used in future research.

The Medlung observational study (REC No: 05/MWM01/75) recruited patients who attended bronchoscopy clinics across the UK under suspicion of lung cancer and were subsequently given a final clinical diagnosis of either lung cancer or non-cancer. Patients were referred to the clinic and gave informed consent before providing a sample of spontaneous sputum. The patients' final clinical diagnosis and histology were recorded. Confirmed cancer cases and confirmed COPD cases make up the "Cancer" and "COPD control" cohorts respectively.

In total, raw sputum samples collected from 214 lung cancer patients with fully confirmed histologies, and 108 COPD patients as a higher-risk control group were used in this study.

Each sputum sample was stored at -80°C until time of spectrum generation.

Spectrum Acquisition and Sample Processing

Transmission-FTIR (t-FTIR) was performed on raw sputum samples using a Bruker Vertex 70 with high throughput attachment (HTS-XT), KBr beamsplitter, and DTGS detector. Prior to all measurements, 96-well silicon plates (Bruker) were cleaned in 70% ethanol, rinsed with dH2O three times and air dried. The sputum samples were allowed to thaw in their sealed containers and reach room temperature for at least one hour prior to analysis. Raw (i.e. without further labelling or sample purification), thawed sputum samples were pipetted (2 µl) directly onto the plates in triplicate and allowed to dry under atmospheric conditions. Once dry, spectra were generated at 32 scans per spectrum, with a fresh background spectrum taken between each sample spectrum. Each 96-well plate was scanned in triplicate, to give a total of 9 replicate spectra per sample.

Spectrum Processing

All spectra underwent quality analysis prior to processing. All sample replicates were averaged before vector-normalisation and baseline-correction using the in-built baseline-correction and vector-normalisation algorithms of OPUS 7.5 (Bruker). Second derivative spectra were generated using the Savitzky-Golay method with 9 smoothing points. This approach allowed us to resolve broad, overlapping bands into individual bands, thus increasing the accuracy of analysis. Peak picking analysis was carried out using the in-built peak picking algorithm in OPUS, set to a 5% threshold.
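The processing chain described above (replicate averaging, vector-normalisation, 9-point Savitzky-Golay second derivative) can be sketched roughly as follows. This is an illustrative Python sketch, not the OPUS implementation: the function names are hypothetical, the vector-normalisation is assumed to be mean-centring followed by scaling to unit Euclidean norm, the Savitzky-Golay fit is assumed to be quadratic (the source states only the 9 smoothing points), and baseline-correction is omitted. Edge points without a full 9-point window are dropped.

```python
import math


def average_replicates(replicates):
    """Point-wise mean of replicate spectra (equal-length lists of absorbances)."""
    n = len(replicates)
    return [sum(values) / n for values in zip(*replicates)]


def vector_normalise(spectrum):
    """Mean-centre, then scale to unit Euclidean norm (assumed definition)."""
    mean = sum(spectrum) / len(spectrum)
    centred = [a - mean for a in spectrum]
    norm = math.sqrt(sum(a * a for a in centred))
    return [a / norm for a in centred]


def second_derivative_weights(points=9):
    """Savitzky-Golay second-derivative weights for a quadratic fit
    over an odd window with unit spacing (polynomial order is an assumption)."""
    half = points // 2
    offsets = range(-half, half + 1)
    s2 = sum(i ** 2 for i in offsets)
    s4 = sum(i ** 4 for i in offsets)
    denom = points * s4 - s2 ** 2
    # Second derivative of the fitted quadratic is twice its x^2 coefficient.
    return [2 * (points * i ** 2 - s2) / denom for i in offsets]


def second_derivative(spectrum, spacing=1.0, points=9):
    """Second derivative with respect to wavenumber; `spacing` is the
    (uniform) wavenumber step. Returns len(spectrum) - (points - 1) values."""
    weights = second_derivative_weights(points)
    half = points // 2
    result = []
    for k in range(half, len(spectrum) - half):
        window = spectrum[k - half:k + half + 1]
        result.append(sum(w * a for w, a in zip(weights, window)) / spacing ** 2)
    return result
```

For a 9-point window the weights reduce to the tabulated values (28, 7, -8, -17, -20, -17, -8, 7, 28)/462. A library routine such as `scipy.signal.savgol_filter` with `deriv=2` could be used instead of the hand-written convolution.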

Statistical Analysis

Statistical tests were carried out using the programming environment R. Statistical significance was calculated using the non-parametric Mann-Whitney U test, at a level of 0.05. Correction for multiple hypothesis testing was carried out using the Bonferroni method. Normality of data was assessed using a Shapiro-Wilk test and visualised through QQ-plots and histograms.
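As a rough illustration of the significance testing described above, the following Python sketch computes a Mann-Whitney U statistic and applies a Bonferroni-adjusted threshold. It is not the R code used in the study: the function names are hypothetical, the two-sided p-value uses the large-sample normal approximation without tie or continuity correction (an assumption made to keep the sketch short), and the Shapiro-Wilk step is not reproduced.

```python
import math


def mann_whitney_u(sample_a, sample_b):
    """Return (U statistic for sample_a, two-sided p-value).

    The p-value uses the normal approximation with no tie or continuity
    correction, so it is only indicative for small or heavily tied samples.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0      # a outranks b
            elif a == b:
                u += 0.5      # ties count as half
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u, p_value


def bonferroni_significant(p_values, alpha=0.05):
    """Flag each test as significant at alpha divided by the number of tests."""
    threshold = alpha / len(p_values)
    return {name: p < threshold for name, p in p_values.items()}
```

With five wavenumbers tested at a level of 0.05, for instance, the Bonferroni threshold for each individual test would be 0.01.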

Model Building

Two-dimensional linear models were produced comparing second derivative absorbencies at specific wavenumbers. Model performance was quantified using sensitivity and specificity.

Results

1. FTIR Spectrum of Cancer Sputum

Shown in Figure 1 are vector-normalised, baseline-corrected average absorbance spectra of cancer (black dashed) and COPD (solid grey) cohorts. Whilst the average spectra appear to be highly similar, with little variation in peak position or relative absorbance, subtle differences can be identified.

The relative intensity of multiple peaks and troughs differs between the cancer and non-cancer average spectra. For example, the relative absorbance at amide I (~1653cm-1) is higher in the cancer average spectrum, and the major glycogen peak (~1076cm-1) is lower, compared to the non-cancer average spectrum. The proposed vibrational mode at 1653cm-1 is C=O stretching from a protein source. The proposed vibrational mode of the glycogen peak at ~1076cm-1 is C-O stretching, from the alcohol groups found within individual monosaccharide moieties throughout the sputum. This suggests that overall glycosylation relative to protein content could be reduced in lung cancer sputum compared to non-cancer sputum. Additionally, the trough between amide I and amide II at around 1589cm-1 in the cancer spectrum appears to be lower than in all of the non-cancer spectra, whilst the amide I peak shows a greater relative intensity than the non-cancer amide I peaks. This may suggest a reduction in the levels of amino-sugars such as sialic acid, N-acetylgalactosamine (GalNAc) or N-acetylglucosamine (GlcNAc) relative to the levels of protein present in lung cancer sputum.

A series of wavenumbers (1740, 1653, 1589, 1410 and 1076cm-1) were identified as potential markers that could be used to discriminate between cancer and non-cancer sputum, due to clear visible differences in the average spectra (Figure 1). Normality and significance testing were subsequently performed.

2. Normality and Significance Testing

Normality testing was carried out to ascertain how the absorbencies at each wavenumber were distributed across the cancer and control cohorts. A Shapiro-Wilk (SW) test for normality on five wavenumbers which correspond with positions of major peaks and troughs was initially carried out, and the results are summarised in Table 1. The results suggested that the null hypothesis that the data were drawn from a normally distributed population can be rejected, therefore indicating the data are non-normally distributed.

QQ-plots and histograms were drawn to visualise distribution (Figure 2). The distribution of absorbencies at 1653cm-1 was shown to be heavy-tailed, and the distribution at 1076cm-1 can be said to be closer to normality, but still demonstrated a light skewing in both cancer and control cohorts. Therefore, combining these results with the Shapiro-Wilk normality results, the data can be said to be non-normally distributed. Thus, non-parametric statistical testing was carried out to determine significance of differences between absorbencies at wavenumbers of interest.

As all wavenumbers tested were shown to be drawn from non-normally distributed data, the non-parametric Mann-Whitney U test was carried out to assess the statistical significance of any differences between the mean absorbencies (Table 2).

Figure 3 demonstrates how the different patient cohorts cluster and separate from each other based on the absorbance at 1076 and 1653cm-1, as well as showing the overall trend within each patient cohort. There appears to be a small trend for cancer sputum spectra (red) to show a slight increase in relative absorption at 1653cm-1 alongside a slight decrease in relative absorption at 1076cm-1, compared to the non-cancer sputum spectra (green).

Despite these trends, the discriminatory power of these wavenumbers is poor. The calculated sensitivity and specificity scores for a model based on normalized absorbencies at 1653cm-1 and 1076cm-1 are shown in Table 3. The cancer and non-cancer patient cohorts exhibit a large overlap, so the intercept of the linear separator was modified to optimise sensitivity and specificity scores. The most accurate equation was determined to be y2, with sensitivity and specificity of 61.11% and 71.65% respectively. The other models demonstrated stronger specificity but poor sensitivity (y1) or vice versa (y3).
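The scoring of a linear separator and the adjustment of its intercept can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names are hypothetical, cancer is assumed to be predicted when a point lies above the line y = slope * x + intercept (the source does not state the direction), and the intercept is chosen by maximising sensitivity + specificity - 1 (Youden's index), which is one possible reading of "optimise" and not necessarily the criterion used in the study.

```python
def predict_cancer(x, y, slope, intercept):
    """Predict cancer when the point lies above the line (assumed direction)."""
    return y > slope * x + intercept


def sensitivity_specificity(points, labels, slope, intercept):
    """points: list of (x, y) absorbance pairs; labels: True for cancer."""
    tp = fn = tn = fp = 0
    for (x, y), is_cancer in zip(points, labels):
        predicted = predict_cancer(x, y, slope, intercept)
        if is_cancer and predicted:
            tp += 1           # cancer correctly identified
        elif is_cancer:
            fn += 1           # cancer missed
        elif predicted:
            fp += 1           # non-cancer wrongly flagged
        else:
            tn += 1           # non-cancer correctly cleared
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


def best_intercept(points, labels, slope, candidates):
    """Pick the candidate intercept with the highest Youden's index."""
    def youden(intercept):
        sens, spec = sensitivity_specificity(points, labels, slope, intercept)
        return sens + spec - 1
    return max(candidates, key=youden)
```

Shifting the intercept in one direction trades sensitivity for specificity and vice versa, which matches the behaviour reported for the three lines y1, y2 and y3.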

3. Second Derivative Absorbance Models

Second derivative spectra were calculated from the vector-normalised, baseline-corrected average spectra (Figure 4). A second-derivative spectrum is used to increase the sensitivity for peak finding by calculating the rate of change of the spectral slope across a small window of the spectrum. In this way, the resolution of small peaks and shoulders within larger peaks is increased.

The average second-derivative spectra were closely examined to identify regions of the spectra that could be used to distinguish cancer from non-cancer. Wavenumbers with good discriminatory potential for lung cancer sputum were identified for analysis.

Two-dimensional linear models were generated to examine how the second-derivative absorbencies of interest at specific wavenumbers could separate the cohorts within two dimensions. The second-derivative absorbance at 967cm-1 was initially chosen as a standard for plotting against other wavenumbers. This was because it was readily identifiable in all spectra and was calculated to have the lowest standard deviation and variance compared to all other major peaks detected in the cancer (Table 4) and COPD (Table 5) data sets.
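The selection of a reference peak by lowest positional variability can be illustrated with a short Python sketch. The function name and the input layout (a mapping from a peak label to the list of positions at which that peak was detected across spectra) are hypothetical, and any values passed in are for illustration only and are not taken from Tables 4 or 5.

```python
import statistics


def most_stable_peak(peak_positions):
    """Return (label of the peak with the lowest standard deviation, summary).

    peak_positions maps a peak label to the positions (cm-1) at which that
    peak was picked in each spectrum. The summary maps each label to
    (mean, sample standard deviation, sample variance).
    """
    summary = {}
    for label, positions in peak_positions.items():
        summary[label] = (
            statistics.mean(positions),
            statistics.stdev(positions),
            statistics.variance(positions),
        )
    stable = min(summary, key=lambda label: summary[label][1])
    return stable, summary
```

Because the variance is the square of the standard deviation, ranking by either quantity selects the same peak.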

A series of two-dimensional linear models were built and tested for sensitivity and specificity for determining lung cancer from COPD. These are summarised in Table 6 and the best performing model (1168 vs 1079cm-1) is shown in Figure 5. The predictions for each model are shown in the heatmap in Figure 6.

Table-1.

Wavenumber (cm-1)    Cancer       Non-Cancer
1653                 < 2.2e-16    2.06E-14
1410                 3.11E-08     8.18E-06

Table-2.

Wavenumber (cm-1)    Mean Cancer Spectrum Absorbance    Mean Non-Cancer Spectrum Absorbance    p-Value
1653                 0.152049                           0.146985                               1.02e-12
1411                 0.048932                           0.055263                               1.958e-09

Table-3.

Table-4.

Table-7.