Title:
GENE MARKERS FOR LUNG CANCER
Document Type and Number:
WIPO Patent Application WO/2001/098539
Kind Code:
A2
Abstract:
A method for the diagnosis and identification of new or residual lung cancer is disclosed which uses newly identified markers for lung cancer including syndecan 1, collagen 1 alpha 2, and two novel proteins, 7013 and 7018. The method involves identification of the lung cancer markers in blood from a patient. It is envisioned that at least one marker may be used or any mixture of the four. The method may also include the identification of cytokeratin-19.

Inventors:
MITSUHASHI MASATO (US)
KAMBARA HIDEKI (JP)
MATSUNAGA HIROKO (US)
KAWAMURA MASAFUMI (JP)
Application Number:
PCT/US2001/019980
Publication Date:
December 27, 2001
Filing Date:
June 21, 2001
Assignee:
HITACHI CHEMICAL CO LTD (JP)
HITACHI CHEMICAL RES CT INC (US)
HITACHI LTD (JP)
MITSUHASHI MASATO (US)
KAMBARA HIDEKI (JP)
MATSUNAGA HIROKO (US)
KAWAMURA MASAFUMI (JP)
International Classes:
C07H21/04; G01N33/574; C07K16/30; C12N15/09; C12Q1/68; (IPC1-7): C12Q1/68
Domestic Patent References:
WO1999047674A2 1999-09-23
Other References:
BORSET MAGNE ET AL: "Hepatocyte growth factor and its receptor c-met in multiple myeloma." BLOOD, vol. 88, no. 10, 1996, pages 3998-4004, XP002223750 ISSN: 0006-4971
PECK KONAN ET AL: "Detection and quantitation of circulating cancer cells in the peripheral blood of lung cancer patients." CANCER RESEARCH, vol. 58, no. 13, 1 July 1998 (1998-07-01), pages 2761-2765, XP002223751 ISSN: 0008-5472
BESSHO AKIHIRO ET AL: "Detection of occult tumor cells in peripheral blood from patients with small cell lung cancer by reverse transcriptase-polymerase chain reaction." ANTICANCER RESEARCH, vol. 20, no. 2B, March 2000 (2000-03), pages 1149-1154, XP001128162 ISSN: 0250-7005
Attorney, Agent or Firm:
Altman, Daniel E. (Martens Olson & Bea, LLP 16th Floor 620 Newport Center Drive Newport Beach CA, US)
Claims:
WHAT IS CLAIMED IS:
1. A method for the identification of lung cancer comprising: isolating blood or non-lung tissue from a patient, identifying the presence of at least one marker selected from the group consisting of syndecan 1, collagen 1 alpha 2, 7013, and 7018.
2. The method of Claim 1, further comprising identifying the presence of the marker cytokeratin-19.
3. The method of Claim 1, wherein at least two markers are identified.
4. The method of Claim 2, wherein more than two markers are identified.
5. The method of Claim 1 wherein said identification is by RT-PCR.
6. The method of Claim 1 wherein said identification is by antibody binding.
7. The method of Claim 1 wherein said patient is a mammal.
8. The method of Claim 7 wherein said mammal is a human, dog, or cat.
9. A method for the isolation or removal of metastatic cancer cells, comprising: treating cells or a non-lung tissue containing cancer cells with antibodies specific for at least one marker selected from the group consisting of: syndecan 1, collagen 1 alpha 2, 7013, and 7018.
10. The method of Claim 9 further comprising antibodies specific for the marker cytokeratin-18.
11. The method of Claim 9 wherein said antibodies are bound to a moiety selected from the group consisting of metallic particles, fluorescent particles, chromatography beads, a chromatography gel and a solid support.
12. The method of Claim 9, wherein two markers are used.
13. The method of Claim 9, wherein more than two markers are used.
14. A polynucleotide comprising at least 17 nucleotides of SEQ ID NO: 16 (p7013, ATCC accession No., June 21, 2001).
15. A polynucleotide comprising at least 17 nucleotides of SEQ ID NO: 17 (p7018, ATCC accession No., June 21, 2001).
16. A polynucleotide comprising at least 17 nucleotides of p7013112, ATCC accession No., June 21, 2001.
17. A method for the identification of metastases of a solid tumor in a patient, comprising: isolating blood or bone marrow from said patient; and identifying the presence of at least one marker selected from the group consisting of: syndecan 1, collagen 1 alpha 2, 7013, and 7018.
18. The method of Claim 16, wherein said solid tumor is selected from the group consisting of bile duct, colon, breast, uterus, esophagus, and larynx.
19. A method for the identification of a carcinoma, comprising: obtaining a cancer cell; and identifying the presence of the markers selected from the group consisting of: 7013, 7018, and both.
Description:
GENE MARKERS FOR LUNG CANCER

FIELD OF THE INVENTION

The invention relates to a method for the diagnosis and identification of residual lung cancer. The invention further relates to the use of newly identified cellular markers for lung cancer. These markers include syndecan 1, collagen 1 alpha 2, and two novel proteins, 7013 and 7018.

BACKGROUND OF THE INVENTION

Lung cancer is one of the most common cancers in industrial nations and has an extremely high mortality rate. Early diagnosis and effective treatments are not available at this time. A chest X-ray is frequently used for lung cancer screening; however, it is not useful for the detection of early stage cancer. CT scans may also be used and may allow detection of the cancer at an earlier stage; however, this procedure takes time and has a risk of exposure.

Cancer cells or micrometastases are frequently detected in the blood stream of patients with melanomas, thyroid cancers, and prostate cancers. Currently, reverse transcription-based polymerase chain reaction (RT-PCR) is a powerful method capable of detecting a single cancer cell within millions of normal blood cells by amplifying a cancer specific gene or marker. This makes RT-PCR detection of micrometastases a promising diagnostic procedure for the prognosis, choice of appropriate treatments, and monitoring of the efficacy of each treatment. Furthermore, blood tests do not induce any health hazards, whereas methods such as X-ray or CT scan do. In addition, blood tests cause very minor physical discomfort as compared to endoscopic examination and biopsy. Lung cancer frequently induces blood-borne metastasis even in the early stages, before symptomatic disease, and many lung cancers relapse as distant metastases, such as in brain, bone, and liver. This is due to the lung cancer induced blood-borne metastasis. This knowledge can be used, however, because it suggests that the diagnosis and detection of relapse could be made on the basis of these blood-borne metastases at a very early stage. However, there are no good markers available for the identification of the metastatic lung cancer cells in the blood. Currently, lung cancer markers such as cytokeratin-19 and CEA are used for the diagnosis of non-small cell lung cancer by RT-PCR, but they lack specificity and result in a high number of false positives and negatives.

RT-PCR of micrometastases, then, is especially advantageous for the detection of lung cancer due to the large patient population, high incidence of blood-borne metastasis, poor prognosis, and high medical cost for advanced cancer treatments. In addition, specific antibodies are not available as of yet for lung cancer.

Therefore, specific markers and a method of diagnosis of lung cancer by detecting blood-borne metastasis are needed.

SUMMARY OF THE INVENTION

One embodiment is a method for the identification of lung cancer by isolating blood or non-lung tissue from a patient, and identifying the presence of at least one marker from the following: syndecan 1, collagen 1 alpha 2, 7013, and 7018. The method may also include identifying the presence of the marker cytokeratin 19. In a further embodiment at least two markers are identified. In a further embodiment more than two markers are identified. The method of identification may be any known to one of skill in the art, but may also include RT-PCR and/or antibody binding. The patient may be any living thing, but in one embodiment is a mammal, particularly a human, dog, or cat.

A further embodiment is a method for the isolation or removal of metastatic cancer cells, by treating cells or a non-lung tissue containing cancer cells with antibodies specific for at least one marker selected from the group consisting of: syndecan 1, collagen 1 alpha 2, 7013, and 7018. The method may also include antibodies specific for the marker cytokeratin-18. In one embodiment, the antibodies are bound to a moiety selected from the group consisting of metallic particles, fluorescent particles, chromatography beads, a chromatography gel and a solid support. In a further embodiment, two markers are used. In a further embodiment more than two markers are used.

A further embodiment is a polynucleotide comprising at least 17 nucleotides of SEQ ID NO: 16 (deposited as ATCC, June 21, 2001) also identified herein as marker 7013. A further embodiment is at least 17 nucleotides of the polynucleotide deposited as ATCC, June 21, 2001.

A further embodiment is a polynucleotide comprising at least 17 nucleotides of SEQ ID NO: 17 (deposited as ATCC, June 21, 2001) also identified herein as marker 7018.

A further embodiment is a method for the identification of metastases of a solid tumor in a patient by isolating blood or bone marrow from said patient; and identifying the presence of at least one marker selected from the group consisting of: syndecan 1, collagen 1 alpha 2, 7013, and 7018. The method may also include identifying cytokeratin-18. In one embodiment, the solid tumor is selected from the group consisting of bile duct, colon, breast, uterus, esophagus, and larynx.

A further embodiment is a method for the identification of a carcinoma, by obtaining a cancer cell; and identifying the presence of the markers selected from the group consisting of: 7013, 7018, and both.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a representation of the modified differential display procedure, selective-Amplified Fragment Length Polymorphism method (s-AFLP) used in the preferred embodiment.

Figure 2 is a polyacrylamide gel showing 6 candidate bands which were more abundant in the lung cancer RNA than in either normal blood lane. H: healthy people, L: Lung cancer tissue.

Figure 3 is a gel showing RT-PCR of lung cells with each candidate marker as well as cytokeratin-19 and β-actin.

Figure 4 is a gel showing RT-PCR of lung cancer cells with each candidate marker as well as cytokeratin-19 and β-actin.

Figure 5 is an example of a positive PCR and a negative PCR from blood RNA.

Figure 6 is the sequence information for Syndecan 1 as well as the probe sequence.

Figure 7 is the sequence information for Collagen-1 Pro alpha 2 as well as the probe sequence.

Figure 8 is the sequence information for the 7013 gene as well as the probe sequence.

Figure 9 is the sequence information for 7018 gene as well as the probe sequence.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

Differential display was used to identify mRNAs specifically expressed by lung cancer cells which circulate in the blood and are not expressed by normal blood cells. The differential display method was modified from the classical procedure to allow a more comprehensive representation of the genes expressed in a cell.

In order to identify candidate genes, three major technologies are presently available: subtraction libraries, differential display, and DNA microchip arrays. However, subtraction libraries, though potentially useful, may also identify genes which exhibit partial similarity in gene sequences, and the technique is very laborious and time intensive. DNA microchip arrays, though quick and easy to do, are not sensitive enough to detect rare genes in blood samples. This is particularly important because the method needs to be able to detect rare micrometastases which express specific markers within a large background (millions of white blood cells) of normal cells which do not.

Differential display (dd-PCR) allows for the amplification of all existing genes using combinations of multiple degenerate primers, thus increasing the likelihood that a rare marker or gene will be detected.

However, in order to improve the differential display technology to represent as many genes as possible, selective Amplified Fragment Length Polymorphism (s-AFLP) was used in Example 1. This approach has the advantages of 1) amplifying only the 3' end of the gene, 2) producing more reproducible gel patterns, 3) identifying fewer redundant genes, and 4) using more selective PCR conditions. The method resulted in the identification of 4 markers: Synd, Col, 7013 and 7018.

These markers may be used to identify lung cancer cells in the blood of a patient at any stage of disease.

Any method known to one of skill in the art may be used to identify the markers in the blood or in any metastatic tissue. For example, the mRNA expressed by the cells which is specific to the markers may be identified.

Alternatively, the proteins themselves may be identified using immuno-techniques, such as Western blot, FACS technology, ELISA, and other methods known to one of skill in the art. The antibodies or functional antibody parts may be purchased, isolated, or produced using known methods.

In one embodiment, the gene expression of the marker is identified. Any method which allows for identification of expression of the gene associated with the marker can be used. Typically, the method amplifies the mRNA resulting from expression of the gene. In one embodiment, RT-PCR of the mRNA from blood or tissue is used.

In a further embodiment, antibodies are used to identify cells in the blood containing these markers. For example, cell sorting can be used to identify cells which have been fluorescently labeled with antibodies to these markers.

In one embodiment, the method is used to identify the presence of lung cancer cells in the blood or bone marrow. However, it can be envisioned that mRNA from any tissue which does not normally produce these markers may be used. For example, the mRNA from an organ which is typically the site of metastases can be used. Therefore, in a further embodiment, the method is used to identify lung cancer cells in an organ such as liver or brain.

In one embodiment, the method is used to identify the presence of lung cancer cells at a very early stage in the disease. In a further embodiment, the lung cancer cells are identified after remission or to identify a relapse.

In a further embodiment, the markers are used to target the lung cancer cells in vivo or in vitro. In one embodiment, the body is cleared of cancer cells using affinity techniques which allow the cancer cells to be targeted and removed from the blood.

In a further embodiment, the markers are used to identify metastatic cells in the blood or bone marrow or a metastatic organ or tissue which are produced by such cancers as bile duct, colon, breast, uterus, esophagus, and larynx. The markers may alternatively be used to remove the metastatic cells from blood.

In a further embodiment, the markers 7013 and 7018 are used to identify whether a cancer cell or other cell is of epithelial origin. For example, if the cell expresses one or both markers, it is likely of epithelial origin.

The patient may be any animal which is capable of having cancer. In one embodiment, the patient is a mammal. In a further embodiment, the mammal is a pet, such as a dog or cat. In a further embodiment, the patient is a human.

The invention will now be explained with reference to the Examples below. However, it is understood that these examples merely illustrate, but do not restrict the invention.

In Examples 1 and 2, the markers are identified using a variant of the differential display technique. The markers which are identified are explained in Example 2.

EXAMPLE 1

s-AFLP Differential Display

s-AFLP is based on the selective PCR amplification of restriction fragments from a cDNA library. Two sets of selective primers were used. The first one, selective primer A in Figure 1, consisted of three parts: a core sequence, an enzyme specific sequence and two selective nucleotides at the 3' termini. This provided for 4^2 = 16 kinds of primers because of the variation of selective sequences. The second one, selective primer B in Figure 1, also consisted of three parts: an anchor sequence, a poly T sequence and two selective nucleotides at the 3' termini, and was fluorophore labelled. This then provided for 3 x 4 = 12 kinds of primers, because of the 3 selective sequences following the poly T (A, G, and C). Thus, all of the 3' terminal cDNA restriction fragments including a poly A region were included in the 192 (16 x 12) groups produced by this method of selective PCR. Each signal on the gel display of amplified fragments indicated a non-overlapping gene in the cDNA library. Using this technique, RNA isolated from lung cancer specimens was compared with RNA from the blood of healthy individuals. Differentially expressed genes, which were overexpressed in the cancer specimens, were considered candidates for general genetic markers for tumor cell dissemination.

For successful s-AFLP, excellent quality RNA was important. However, lung specimens are one of the most difficult tissues to prepare RNA from due to the presence of abundant RNases from alveolar macrophages. Therefore, snap frozen lung cancer specimens were purchased from NCCRI (Chuou-ku, Tokyo, Japan) and total RNA was purified by the AGPC method as previously described (Tominaga, K., Miura, Y., Arakawa, T., Koboyashi, K. and Mitsuhashi, M., Clin. Chem., 42, 1750-1783, 1998). Agarose gel electrophoresis or a micro-capillary chip was used to assess the quality of the RNA and to confirm the presence of two rRNA bands. Acceptable specimens were then used for s-AFLP.

Six lung cancer specimens were used and each sample was derived from different patients. Four of six were adenocarcinoma and two were squamous cell carcinoma. Twenty control bloods were derived from healthy volunteers with no history or present diagnosis of malignancy or other diseases.

Total RNA preparation of tumor and blood specimens

Fresh frozen tumor specimens were broken into small pieces in liquid nitrogen. Total RNA was extracted from 100 mg specimen by the guanidine method (Chomczynski, P. and Sacchi, N. 1987. Single-step method of RNA isolation by acid guanidine thiocyanate-phenol-chloroform extraction. Anal. Biochem. 162: 156-159). Peripheral blood samples were collected from the veins of the patients and healthy volunteers in heparin anticoagulant containing tubes. Two samples were collected from each subject, with 1-2 ml of peripheral blood in the first tube and 10 ml in the second tube. The first tube was discarded because it could have been contaminated with epithelial cells picked up by the needle when it pierced the skin, and only the second tube was assayed. Total RNA was extracted from peripheral blood samples with RiboCap™ (RNature, Irvine, CA) according to the manufacturer's instructions. The quality of the purified total RNA was checked by agarose gel electrophoresis for the 18S and 28S ribosomal RNA bands, and its quantity was measured by UV spectrometer.

Differential Display s-AFLP Analysis

A modified differential display, "s-AFLP", was carried out using a selective primer technique.

Six lung cancer specimens (lung cancer tissue) and two pooled healthy blood samples that contained 5 healthy individuals each were used for s-AFLP analysis. All oligomers used in the whole assay were obtained from Sawady Technology (Tokyo, Japan). Double stranded cDNAs were synthesized from 30 µg of total RNA. Following denaturation for 5 min at 65°C, RNA was reverse-transcribed for 90 min at 37°C in a 50 µl reaction mix which contained the following: 50 µg/ml of oligo(dT)12-18 primer (Life Technologies, USA), 500 µM of each deoxynucleotide triphosphate (dNTP) (Life Technologies), 50 mM of Tris-HCl (pH 8.3), 75 mM of KCl, 3 mM of MgCl2, 10 mM of dithiothreitol (DTT), 20 units of RNAsin (Life Technologies), and 10,000 units/ml of MMLV Reverse Transcriptase (Life Technologies). Second strand reactions were performed for 120 min at 16°C by adding the following components to the final concentrations: 25 mM Tris-HCl (pH 8.3), 100 mM KCl, 10 mM (NH4)2SO4, 5 mM MgCl2, 250 µM of each dNTP, 0.15 mM NAD, 5 mM DTT, 250 units/ml of DNA polymerase (Life Technologies), and 30 units/ml of DNA ligase (Life Technologies). After the reaction, the tube was placed on ice and 12.5 µl of 0.25 M ethylenediaminetetraacetic acid (EDTA; pH 7.5) was added for enzyme inactivation. The product was extracted with phenol/chloroform once and ethanol precipitated. Double stranded cDNA was digested with X units of the 4-base recognition restriction enzyme MboI (↓GATC; New England Biolabs, USA) for 60 minutes at 37°C. The phosphate residue at the 5' termini of digested fragments was removed with calf intestinal alkaline phosphatase (CIAP; Takara, Japan) to avoid self-ligation during the ligation process. 20 units of CIAP and 10 µl of 10x alkaline phosphatase buffer (500 mM Tris-HCl (pH 9.0), 10 mM MgCl2) were added and incubated for 30 minutes at 37°C. The product was extracted with phenol/chloroform twice and ethanol precipitated. Fragments were ligated to the oligomers to introduce priming sites.

The following oligomers were used for ligation: (P1) 5'-P-GATCCCCTATAGTGAGTC-3' (linker oligomer) [SEQ ID NO: 1]; (P2) 5'-GACTCACTATAGGG-P-3' (helper oligomer) [SEQ ID NO: 2]. The helper oligomer was phosphorylated at the 3' terminal end to prevent the production of oligomer dimers. The ligation reaction was performed with a 1:100:200 molar ratio of the digested fragments to helper oligomer to linker oligomer.

The reaction was performed by adding 15 µl of Ligation High (Toyobo) in a 45 µl reaction at 16°C overnight.
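The relationship between the two ligation oligomers can be checked directly from their printed sequences. The following is a minimal sketch, assuming the sequences of SEQ ID NO: 1 and SEQ ID NO: 2 as given above; the function names are illustrative only. It suggests that the helper oligomer pairs with the 3' portion of the linker oligomer, leaving a 5' GATC overhang of the kind produced by MboI digestion.

```python
# Sequences copied from the text (SEQ ID NO: 1 and 2), written 5'->3'.
LINKER = "GATCCCCTATAGTGAGTC"  # P1, linker oligomer
HELPER = "GACTCACTATAGGG"      # P2, helper oligomer

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def overhang_and_duplex(linker: str, helper: str) -> tuple[str, str]:
    """Split the linker into its unpaired 5' overhang and the helper-paired part.

    Illustrative helper, not part of the described protocol.
    """
    paired = reverse_complement(helper)
    if not linker.endswith(paired):
        raise ValueError("helper does not pair with the 3' end of the linker")
    return linker[: len(linker) - len(paired)], paired


overhang, duplex = overhang_and_duplex(LINKER, HELPER)
print("5' overhang:", overhang)
print("paired region:", duplex)
```

If the printed sequences are accurate, the overhang is the four bases GATC and the remaining 14 bases of the linker pair with the helper.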

After removal of unligated oligomer by QIAquick Nucleotide Removal Kit (Qiagen, Germany), the fragments of the 3' terminus including the poly A tail were amplified and detected using two types of selective primers. One of them (P3) consisted of three parts: a complementary sequence of the anchor oligomer, an MboI recognition sequence (GATC) and two degenerate nucleotides at the 3' termini (5'-GACTCACTATAGGGATCNN-3') [SEQ ID NO: 3]. P3 has 4^2 = 16 types because of the variation of selective sequences (NN). A second selective primer (P4) also consisted of three parts: an anchor sequence, a poly T sequence and two degenerate nucleotides at the 3' termini, and was labelled with a sulforhodamine 101 label (5'-TCTCCTTTTTTTTTTTTTTTTTTVN-3') [SEQ ID NO: 4]. P4 has 3 x 4 = 12 types, because the selective sequence "V" is any nucleotide except T. Thus, all of the 3' terminal cDNA restriction fragments including a poly A region are classified into 192 (= 16 x 12) groups by 192 separate PCR reactions. PCR was carried out with Ex Taq DNA polymerase (Takara, Japan) and cycling parameters were 30 s at 94°C, 1 min at 56°C, and 1 min at 72°C (30 cycles). The amplified products were separated by polyacrylamide gel (4% T, 5% C) electrophoresis containing 1X TBE (0.09 M Tris-borate and 0.02 M EDTA) and 7 M urea on a Hitachi SQ-3000 fluorescent DNA sequencer.
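The primer counts in the paragraph above follow from expanding the degenerate positions. The sketch below is an illustration only, assuming the standard IUPAC meanings of N (any base) and V (any base except T) and the P3 and P4 sequences as printed; the helper name `expand` is hypothetical.

```python
from itertools import product

# Degenerate primer templates copied from the text (SEQ ID NO: 3 and 4).
P3_TEMPLATE = "GACTCACTATAGGGATCNN"
P4_TEMPLATE = "TCTCCTTTTTTTTTTTTTTTTTTVN"

# IUPAC degenerate codes used in the two templates.
IUPAC = {"N": "ACGT", "V": "ACG"}


def expand(template: str) -> list[str]:
    """List every concrete primer encoded by a degenerate template."""
    choices = [IUPAC.get(base, base) for base in template]
    return ["".join(combo) for combo in product(*choices)]


p3_primers = expand(P3_TEMPLATE)
p4_primers = expand(P4_TEMPLATE)
reactions = list(product(p3_primers, p4_primers))

print(len(p3_primers), len(p4_primers), len(reactions))  # 16 12 192
```

Each element of `reactions` corresponds to one of the separate PCR reactions, pairing one P3 variant with one P4 variant.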

The cDNAs of interest were cut from the gel and purified by the crush and soak method. Figure 2 shows a sample gel yielding six candidates. The bands represent a particular fragment that is more abundant in the lung cancer RNA than the normal blood. Bands were isolated if they were present in at least 2 of the 6 lung cancer tissue lanes and not in either normal blood lane. Because differential display frequently produces many false positives, the selection criterion was important. The isolated fragments were subcloned using pGEM-T Easy Vector Systems (Promega, USA). The products that were purified by mini-preparation were sequenced with T7 primer (5'-TAATACGACTCACTATAG-3') [SEQ ID NO: 15] using Big Dye terminator cycle sequencing kit (Applied Biosystems, CA, USA) and an ABI Prism 377 DNA sequencing apparatus (Applied Biosystems).
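The band selection rule stated above (present in at least 2 of the 6 lung cancer lanes and in neither normal blood lane) can be written as a simple predicate. This is a sketch under the assumption that each lane has already been scored as band present or absent; the function name and the example scores are hypothetical and are not data from this document.

```python
def is_candidate_band(cancer_lanes, blood_lanes, min_cancer_lanes=2):
    """Apply the stated gel criterion to presence/absence calls for one band.

    cancer_lanes: booleans, one per lung cancer tissue lane.
    blood_lanes:  booleans, one per pooled normal blood lane.
    """
    return sum(cancer_lanes) >= min_cancer_lanes and not any(blood_lanes)


# Hypothetical scores for three bands (6 cancer lanes, 2 blood lanes each).
example_bands = {
    "band_1": ([True, True, False, False, False, False], [False, False]),
    "band_2": ([True, False, False, False, False, False], [False, False]),
    "band_3": ([True, True, True, True, True, True], [True, False]),
}

selected_bands = [
    name
    for name, (cancer, blood) in example_bands.items()
    if is_candidate_band(cancer, blood)
]
print(selected_bands)
```

Only the first hypothetical band satisfies both conditions; the second appears in too few cancer lanes and the third is also seen in normal blood.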

Northern Blot Analysis

Total RNA was separated by formaldehyde gel electrophoresis and transferred to nylon membrane (+) (Amersham Pharmacia, England). Candidates were cloned with pCR2.1. Clones were verified by sequence analysis.

Plasmid DNA was restricted with the appropriate enzyme and used for in vitro transcription.

Selection Method

Initially, 121 candidates which were negative in healthy blood and positive in lung cancers were identified.

After sequencing these 121 candidates, the sequences were analyzed for homology with known sequences in GenBank as well as Expressed Sequence Tag (EST) databases.

The procedure for the selection of real versus false signals was as follows: Bands of interest were excised from the display gel and the DNA was cloned. The isolates of interest were then sequenced. If the sequence data indicated that the gene was normally present in blood cells, it was discarded. Then specific primers were designed and PCR was carried out for each clone. First, normal blood RNA was amplified by PCR and if a PCR product was produced, the candidate was discarded. The remaining 21 candidates were tested against blood from lung cancer patients, and if no signal was found in any patient, then the candidate was discarded. This procedure resulted in the isolation of 4 candidates (see Example 2).
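The selection procedure is a sequence of exclusion steps, which can be sketched as a filter. The example below is illustrative only: the `Candidate` record, its field names and the four sample entries are hypothetical and do not represent the actual candidates or results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    known_in_blood_cells: bool         # sequence data indicate normal blood expression
    amplified_from_normal_blood: bool  # PCR product obtained from normal blood RNA
    detected_in_patient_blood: bool    # signal found in blood of at least one patient


def passes_selection(candidate: Candidate) -> bool:
    """Apply the three exclusion steps in the order described in the text."""
    if candidate.known_in_blood_cells:
        return False
    if candidate.amplified_from_normal_blood:
        return False
    return candidate.detected_in_patient_blood


# Hypothetical candidates, one failing at each step and one passing.
candidates = [
    Candidate("band_A", False, False, True),
    Candidate("band_B", True, False, True),
    Candidate("band_C", False, True, True),
    Candidate("band_D", False, False, False),
]

selected = [c.name for c in candidates if passes_selection(c)]
print(selected)
```

Ordering the checks from the cheapest (database comparison) to the most demanding (patient blood testing) mirrors the order in which the text applies them.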

TABLE 1: CANDIDATE MARKERS FOR LUNG CANCER BLOOD DETECTION

Candidate   Database   Description
Synd        nr         Syndecan-1 gene (exon 2-5)
Col         nr         Pro-α2(I) collagen COL1A2 gene (exon 1)
7013        EST        Genomic clone, location 1q32.2
7018        EST        none

EXAMPLE 2

Candidate Genes

The candidate markers are shown in Table 1. The first candidate marker found was identified as the syndecan 1 gene (nucleotide sequence PubMed accession number BC008765, protein sequence accession number AAH08765) (SEQ ID NOS: 18 and 19). Figure 6 provides the sequence of the fragment provided by s-AFLP (SEQ ID NO: 22). A BLAST search of this sequence produced a match for exons 2-6 of the Syndecan 1 gene (GenBank accession No. Z48199). Syndecan 1 is a cell surface transmembrane heparan sulfate proteoglycan from the family of proteoglycans that binds to extracellular matrix and growth factors. Loss of regulation of this gene has been identified in several cancers.

The next candidate was identified as collagen 1 alpha 2 (nucleotide sequence PubMed accession number J00114, protein sequence accession number AAA51996) (SEQ ID NOS: 20 and 21). Figure 7 provides the sequence of the old fragment provided by s-AFLP (SEQ ID NO: 24) as well as the new fragment (SEQ ID NO: 23). A BLAST search of this sequence provided a match for exon 1 of the collagen pro-alpha 2(I) gene (GenBank accession No. J03464). This is a widely expressed gene, especially in lung. It is interesting that the gene is involved as a fusion protein with PLAG1 (pleomorphic adenoma gene 1) in lipoblastoma.

The third candidate was termed 7013 (SEQ ID NO: 16) and was identified as a novel gene when searched against EST databases. Figure 8 provides the sequence of the fragment obtained from s-AFLP (SEQ ID NO: 25). This fragment was used to identify a larger fragment using primer extension (SEQ ID NO: 16). This larger fragment was cloned into pCRII and the resulting plasmid p7013 has been deposited under the Budapest Treaty with the ATCC in Bethesda, Maryland, USA on June 21, 2001 by FEDEX shipment with label No. 822080437778 and assigned ATCC accession No.. A larger clone was isolated from a cDNA library, and the plasmid p7013112 has been deposited under the Budapest Treaty with the ATCC in Bethesda, Maryland, USA on June 21, 2001 by FEDEX shipment with label No. 822080437778 and assigned ATCC accession No.. The plasmid has an insert of approximately 2200 bases in pCMV6-XL4. A BLAST search identified an EST (GenBank Accession number AK002208), which showed the highest homology, and a genomic clone (GenBank Accession number AL035408). The sequence does not exhibit any homology with any known gene, but has been localized to chromosome 1 region q32, a region which has been reported to undergo amplification in several epithelial cancers.

The next candidate was termed 7018 (SEQ ID NO: 17) and is also a novel gene. Figure 9 provides the sequence of the fragment obtained from s-AFLP (SEQ ID NO: 26). This sequence was used to primer extend into the cDNA to get a larger fragment (SEQ ID NO: 17). This larger fragment was cloned in plasmid pCRII and the resulting plasmid p7018 has been deposited under the Budapest Treaty with the ATCC in Bethesda, Maryland, USA on June 21, 2001 by FEDEX shipment with label No. 822080437778 and assigned ATCC accession No.. A BLAST search identified two ESTs (GenBank Accession numbers AW956727 and AW452795) which showed the highest homology, but the sequence does not exhibit any homology to known genes. In addition, it has not yet been localized to any chromosomal region.

Examples 3 and 4 identify whether the markers are expressed by normal lung cells and lung cancer cell lines.

EXAMPLE 3

Expression of the markers in normal lung cells and lung cell lines

To identify whether these markers were expressed in normal human lung cells, RT-PCR was performed on normal human lung RNA using identical amounts of RNA from human lung small airway epithelial cells, micro-vascular endothelial cells from the lung, bronchial epithelial cells and normal human lung fibroblasts.

Total RNA Preparation From Normal Lung or Lung Cancer Cell Lines

Total RNA from cells was isolated by the acid-guanidine thiocyanate/phenol/chloroform extraction using the TRI REAGENT protocol (Molecular Research Center, Inc., Cincinnati, OH, USA). It is not necessary for the RNA to be purified completely; in fact, some DNA contamination is acceptable if the primer is designed at an intron-spanning region.

RT-PCR

Before reverse transcription, all of the RNA samples were treated with DNase I to ensure that there was no genomic DNA contamination. One unit of DNase I (Life Technologies) was used for every one µg of total RNA. The products were extracted with phenol/chloroform twice and ethanol precipitated. The cDNA was synthesized with 200 units of MMLV transcriptase for each 0.5 µg of RNA (42°C for 50 min.) from cell lines and with 100 units of transcriptase ReverTra Ace for each one µg from blood total RNA (37°C for 60 min.).

First, cDNA synthesis was performed with 100 units of reverse transcriptase ReverTra Ace™ (TOYOBO, Japan) for each one µg of total RNA, 50 µg/ml of oligo(dT)12-18 primer (Life Technologies), and 500 µM of each deoxynucleotide triphosphate (dNTP) (Life Technologies) for 60 min at 37°C. For every PCR, 50 ng of total RNA, 10 pmol of primer pairs and 0.125 units of EX Taq DNA polymerase were used in a total reaction volume of 25 µl. Primer pairs and cycling conditions for each genetic marker were as follows: SYND marker was amplified with primer P5 (5'-TCATGTGTGCAACAGGGTAT-3') [SEQ ID NO: 5] and primer P6 (5'-AATATTCCTGATTCCAGCCC-3') [SEQ ID NO: 6].

Cycling conditions were 30 sec at 94°C, 1 min at 65°C, and 1 min at 72°C (30 cycles). COL was amplified with primer P7 (5'-AGAGCATTGTGCAATACAGTTTCATTAACTCCT-3') [SEQ ID NO: 7] and primer P8 (5'-GGTTTTCTTACAAAGGTTGACATTTTCCTAACAG-3') [SEQ ID NO: 8]. Cycling parameters were 30 sec at 94°C, 1 min at 58°C, and 1 min at 72°C (20 cycles). For the second round of the PCR reaction, the reaction mix contained 1 µl of the first round PCR product as a template and the primer pair P3 and P4. The second step of PCR was cycled at 94°C for 30 s, at 60°C for 1 min and 72°C for 1 min for 20 rounds. 7013 was amplified with primer P9 (5'-AATGAAGGAGACATCTGGAGTGTGCG-3') [SEQ ID NO: 9] and primer P10 (5'-AGAAAAGAAAGATTAAGGTTCCCATCTGCG-3') [SEQ ID NO: 10]. Cycling conditions were 30 sec at 94°C, 1 min at 63°C, and 1 min at 72°C (38 cycles). 7018 was amplified with primer P11 (5'-ATCCATGCACGTCACTTTCCTTTCC-3') [SEQ ID NO: 11] and primer P12 (5'-TCAAGTAGGCACAACCCAGTCCT-3') [SEQ ID NO: 12]. Cycling conditions were 30 sec at 94°C, 1 min at 63°C, and 1 min at 72°C (38 cycles). CK-19 was amplified with primer P13 (5'-CAAGATCCTGAGTGACATGCGAAG-3') [SEQ ID NO: 13] and primer P14 (5'-CGCTGATCAGCGCCTGGATATG-3') [SEQ ID NO: 14]. Cycling parameters were 30 sec at 94°C, 1 min at 60°C and 1 min at 72°C (20 cycles). For the second PCR reaction, one µl of the first round PCR product was used as a template with the same primer pair, P13 and P14, and identical cycle parameters. Following PCR, one µl of PCR products was analyzed by electrophoresis on a 4% NuSieve GTG agarose gel (Takara, Japan).
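Primer length and GC fraction are common quick checks when comparing primer pairs run at different annealing temperatures. The following sketch is not part of the described method; it simply tabulates those two properties for the SYND and CK-19 primers as printed above (SEQ ID NO: 5, 6, 13 and 14). The function name is illustrative.

```python
# Primer sequences copied from the text, written 5'->3'.
PRIMERS = {
    "P5 (SYND)": "TCATGTGTGCAACAGGGTAT",
    "P6 (SYND)": "AATATTCCTGATTCCAGCCC",
    "P13 (CK-19)": "CAAGATCCTGAGTGACATGCGAAG",
    "P14 (CK-19)": "CGCTGATCAGCGCCTGGATATG",
}


def gc_fraction(seq: str) -> float:
    """Fraction of bases in the sequence that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


summary = {
    name: (len(seq), round(gc_fraction(seq), 2)) for name, seq in PRIMERS.items()
}

for name, (length, gc) in summary.items():
    print(f"{name}: {length} nt, GC fraction {gc}")
```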

As positive controls, all samples were subjected to PCR for β-actin. Primers for the four candidates, 7018, 7013, Syndecan 1 (Synd), and Collagen 1 A2 (Col), and for a positive control, Cytokeratin-19 (Ck-19), were used for RT-PCR.

Cytokeratin-19 is well known as an epithelial cell marker. PCR amplifications were performed with the five primer pairs for these markers. For cell lines, 0.1 volume of RT reaction, 10 µM of each primer and 1.24 units of Taq DNA polymerase were used in a total reaction volume of 50 µl. For whole blood, 0.1 volume of RT reaction, 10 µM of each primer and 0.125 units of EX Taq DNA polymerase were used in a total reaction volume of 25 µl. The PCR conditions for each primer pair are shown in Table 2. PCR products were analyzed by agarose gel electrophoresis, and some products were confirmed by sequencing.

TABLE 2: PCR conditions for each primer pair

Marker    Initial denaturation   PCR cycle (denaturation, annealing, extension)            Final extension
7018      94°C for 1 min         (94°C for 30 sec, 53°C for 30 sec, 72°C for 1 min) x35    72°C for 5 min
7013      94°C for 1 min         (94°C for 30 sec, 60°C for 30 sec, 72°C for 1 min) x35    72°C for 5 min
Synd      94°C for 1 min         (94°C for 30 sec, 55°C for 30 sec, 72°C for 1 min) x35    72°C for 5 min
Col       94°C for 1 min         (94°C for 30 sec, 52°C for 30 sec, 72°C for 1 min) x35    72°C for 5 min
Ck-19     94°C for 1 min         (94°C for 30 sec, 65°C for 30 sec, 72°C for 1 min) x35    72°C for 5 min
β-actin   94°C for 1 min         (94°C for 30 sec, 53°C for 30 sec, 72°C for 1 min) x35    72°C for 5 min
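The conditions in Table 2 differ only in annealing temperature, so they can be captured as one parameterized program. The sketch below is illustrative only: the class and field names are hypothetical, and the computed time covers programmed hold steps alone, ignoring the ramp time of any particular thermal cycler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CyclingProgram:
    """One Table 2 row; only the annealing temperature varies by marker."""
    annealing_c: int
    initial_denaturation_s: int = 60   # 94°C for 1 min
    denaturation_s: int = 30           # 94°C for 30 sec
    annealing_s: int = 30              # marker-specific temperature, 30 sec
    extension_s: int = 60              # 72°C for 1 min
    cycles: int = 35
    final_extension_s: int = 300       # 72°C for 5 min

    def hold_time_s(self) -> int:
        """Sum of programmed hold steps in seconds (ramp time excluded)."""
        per_cycle = self.denaturation_s + self.annealing_s + self.extension_s
        return (self.initial_denaturation_s
                + self.cycles * per_cycle
                + self.final_extension_s)


# Annealing temperatures as listed in Table 2.
TABLE_2 = {
    "7018": CyclingProgram(annealing_c=53),
    "7013": CyclingProgram(annealing_c=60),
    "Synd": CyclingProgram(annealing_c=55),
    "Col": CyclingProgram(annealing_c=52),
    "Ck-19": CyclingProgram(annealing_c=65),
    "beta-actin": CyclingProgram(annealing_c=53),
}

if __name__ == "__main__":
    for marker, program in TABLE_2.items():
        minutes = program.hold_time_s() / 60
        print(f"{marker}: anneal {program.annealing_c}°C, {minutes:.0f} min of holds")
```

Keeping the shared steps as defaults makes it easy to see at a glance that the markers share one program apart from the annealing step.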

The results of the PCR of normal cells are shown in Table 3. The four newly isolated markers were frequently expressed in the normal lung cell lines examined. In addition, the new markers 7013 and 7018 are specific for lung epithelial cells (Table 3; see also Figure 3). Syndecan 1 was expressed in all five RNAs, as was the collagen gene. The 7013 gene was found only in total lung RNA and the epithelial cells, as was 7018. Cytokeratin-19, the marker for lung cancer, was also found only in total lung and epithelial cells.

TABLE 3: RT-PCR amplification of mRNA markers in lung RNA and lung cell line RNA

RNA source                                         7018  7013  Synd  Col  Ck-19
Total lung RNA                                     +     +     +     +    +
SAEC (small airway epithelial cell)
NHBE (bronchial epithelial cell)
HMVEC L (micro vascular endothelial cell in lung)
NHLF (lung fibroblast)                             -     -     +     +    -

EXAMPLE 4 Expression of the markers in lung cancer cell lines

To identify whether these markers were expressed in lung cancer cell lines, RT-PCR was performed on RNA from 12 lung cancer cell lines using identical amounts of RNA (see Table 4). All PCR products were analyzed by agarose gel electrophoresis, and some products were confirmed by sequencing. All RNAs amplified the positive control β-actin equally well. Cancer cell lines Lu99 (Yamada, et al. Giant cell carcinomas of the lung producing colony-stimulating factor in vitro and in vivo. Jpn. J. Cancer Res., 76: 967-976, 1985), PC13 (large cell carcinoma), A549 (Imanishi, et al. Inhibition of growth of human lung adenocarcinoma cell lines by anti-transforming growth factor-alpha monoclonal antibody. J. Natl. Cancer Inst., 81: 220-223, 1989), PC14, NCI-H441 (adenocarcinoma), PC1, and QG56 (squamous cell carcinoma) were used.

The presence of the markers in the RNA in 6 lung cancer cell lines was also examined by RT-PCR as shown in Figure 4. Four of the lines were adenocarcinomas and the ones in lanes 5 and 6 were squamous carcinoma.

Syndecan 1 was found expressed in all six. The collagen gene was found strongly in four and weakly in two of the six.

Interestingly, 7013 and 7018 displayed different expression patterns: 7013 was found in four of the six lines and 7018 in five of the six. By comparison, cytokeratin-19 was found in five of six. The squamous line 5082 did not express cytokeratin-19, but did express 7018.

TABLE 4: RT-PCR expression markers from lung cancer cell lines

Cell line   Cell type        7018 7013 Synd Col Ck-19
A549        adenocarcinoma   + + + + +
PC14        adenocarcinoma   - + - - -
NCI-H23     adenocarcinoma   - + + + +
NCI-H358    adenocarcinoma   + + + + +
NCI-H441    adenocarcinoma   + + +.
SW 1573     adenocarcinoma   + - + + +
QG 56       squamous cell    + + + + +
PC 1        squamous cell    + +. +
NCI-H157    squamous cell    + _ + +
NCI-H520    squamous cell    + + + + +
Lu 99       large cell       ....
PC 13       large cell       + +..

Example 5 provides a method for the identification of lung cancer cells in blood using these markers. In Example 5, RT-PCR is used to identify the presence of mRNA for these markers in a blood sample.

EXAMPLE 5 Expression of the markers in patient blood

Sixty-eight patients with lung cancer who were diagnosed and treated at Keio University Hospital between November 1998 and April 2000 were studied, as well as 7 patients with metastatic lung cancer at Keio University Hospital in the same period. The characteristics of the patients are shown in Table 5.

The RNA from patient blood samples was tested using RT-PCR with the four candidates, 7018, 7013, Syndecan 1 (Synd), and Collagen 1A2 (Col), and a positive control, Cytokeratin-19 (Ck-19). In order to qualify as a positive result, the sample had to show successful amplification of the marker as well as successful amplification of β-actin.
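The positivity criterion stated above is a simple conjunction: a marker counts as positive only when both the marker and the β-actin control amplify. A minimal sketch of that rule follows; the function name and the dictionary layout are hypothetical, and how a sample with failed β-actin amplification is otherwise handled (for example, repeated or excluded) is not specified in the text, so this sketch simply reports every marker as not positive in that case.

```python
from typing import Dict

MARKERS = ("7018", "7013", "Synd", "Col", "Ck-19")


def call_markers(bands: Dict[str, bool], actin_band: bool) -> Dict[str, bool]:
    """Return a positive/negative call per marker for one blood sample.

    bands maps a marker name to whether its RT-PCR product was observed;
    markers missing from the mapping are treated as not amplified.
    A call is positive only if the beta-actin control also amplified.
    """
    return {m: bool(bands.get(m, False)) and actin_band for m in MARKERS}


if __name__ == "__main__":
    # Hypothetical gel readings for illustration, not patient data.
    print(call_markers({"7013": True, "Synd": True}, actin_band=True))
    print(call_markers({"7013": True, "Synd": True}, actin_band=False))
```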

Total RNA Preparation From Whole Blood

Peripheral blood samples were taken from the antecubital vein of patients and healthy volunteers into tubes containing heparin anticoagulant. Red blood cells (RBCs) were lysed with standard hypotonic solutions, and the whole blood cell population was collected onto a RiboCap syringe filter (RNAture, Irvine, CA). RNA was eluted from the syringe filter by applying a guanidine solution, followed by a standard AGPC method. The quality of the purified total RNA was assessed by agarose gel electrophoresis of the 18S and 28S ribosomal RNA bands, and its quantity was measured with a UV spectrometer. After preparation, the RNA pellet was resuspended in 20 µl of diethylpyrocarbonate (DEPC)-treated water and stored at -80°C.

RT-PCR

Before reverse transcription, all RNA samples were treated with DNase I to remove any possible genomic DNA contamination. The cDNA was synthesized with 200 units of MMLV reverse transcriptase for each 0. zug of RNA (42°C for 50 min) from cell lines, and with 100 units of reverse transcriptase ReverTra Ace™ for each 1 µg of total blood RNA (37°C for 60 min).

All samples were subjected to PCR for β-actin as a positive control. Figure 5 gives an example of a positive test from blood RNA. Primers for the four candidates, 7018, 7013, Syndecan 1 (Synd), and Collagen 1A2 (Col), as well as Cytokeratin-19 (Ck-19), which is well known as an epithelial cell marker, were used for RT-PCR. For RT-PCR of cell lines, 0.1 volume of RT reaction, 10 µM of each primer and 1.24 units of Taq DNA polymerase were used in a total reaction volume of 50 µl. For RT-PCR of whole blood, 0.1 volume of RT reaction, 10 µM of each primer and 0.125 units of EX Taq DNA polymerase were used in a total reaction volume of 25 µl. The PCR conditions were as in Table 2.

TABLE 5: Characteristics of Lung Cancer Patients

Group       Characteristic                 Number
Gender      Male                           48
            Female                         20
Age         -39
            40-49                          14
            50-59                          8
            60-69                          18
            70-79                          25
Stage       IA                             9
            IB                             6
            IIA                            0
            IIB                            2
            IIIA                           12
            IIIB                           15
            IV                             15
            Recurrence                     8
            Unknown                        1
Histology   Adenocarcinoma                 35
            Squamous Cell Carcinoma        19
            Large Cell Carcinoma           3
            Small Cell Carcinoma           2
            Adenosquamous Cell Carcinoma   1
            Unknown                        8

Table 6 shows the results of RT-PCR testing of blood from lung cancer patients and healthy control volunteers for the presence of the identified markers. Stage V patients were tested for expression of the markers, as were 40 healthy volunteers. In summary, the results showed that the four genes, 7018, 7013, Synd and Col, were not expressed in the healthy control blood samples but were expressed in cancer cell lines (see Table 6).

TABLE 6: Results of RT-PCR Analysis in Lung Tumor Patients

Patients      N    Synd  Col  7013  7018  %
Stage I-Iil   68   13    9    25    3     54
Healthy       20   0     0    0     1     5

More specifically, as shown in Table 7, in lung cancer patients' blood samples the individual genes were expressed in 3%, 41%, 19%, and 16% of samples, respectively, while at least one of these four genes was expressed in 57% of patients' blood samples. When cytokeratin-19 is included, at least one of the five genes is expressed in 71% of patients. One or more of these genes are expressed in 80.5% of adenocarcinoma cases and in 68.4% of squamous cell carcinoma cases.

Syndecan was found in 13 lung cancer patients and no control (healthy) patients. Collagen was only found in lung cancer patients. The 7013 marker was widely expressed in lung cancer patients and not a single control. Only 7018 was found to be expressed in a control sample as well as the lung cancer patients.

TABLE 7: Frequency of positive RT-PCR expression markers in lung cancer patients' blood

Cell type   No. of Pt.   7018  7013  Synd  Col  Combination of 4   Ck-19  Combination of 5
Adeno       34           2     14    8     5    22                 10     27
Squamous    21           1     7     3     2    10                 7      13
Large       3            0     1     1     0    2                  0      2
Adenosq.    1            0     0     0     0    0                  0      0
Small       2            0     1     0     1    1                  1
Unknown     8            0     3     1     1    4                  3      6
Total       69           3     26    13    9    39                 21     49

Therefore, 54% of the patients displayed at least one of the four markers.
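The frequencies discussed for Table 7 are ratios of positive samples to patients tested. As a worked illustration, the sketch below recomputes percentages from the Total row of Table 7 as printed (69 patients); the variable and function names are hypothetical, and the output reflects only the counts in that row, not any other figure quoted in the text.

```python
TOTAL_PATIENTS = 69

# Positive counts from the "Total" row of Table 7, as printed.
TABLE_7_TOTALS = {
    "7018": 3,
    "7013": 26,
    "Synd": 13,
    "Col": 9,
    "combination of 4": 39,
    "Ck-19": 21,
    "combination of 5": 49,
}


def percent_positive(positives: int, n: int) -> float:
    """Percentage of samples that tested positive."""
    if n <= 0:
        raise ValueError("number of samples must be positive")
    if not 0 <= positives <= n:
        raise ValueError("positives must lie between 0 and n")
    return 100.0 * positives / n


if __name__ == "__main__":
    for name, count in TABLE_7_TOTALS.items():
        pct = percent_positive(count, TOTAL_PATIENTS)
        print(f"{name}: {count}/{TOTAL_PATIENTS} = {pct:.1f}%")
```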

In summary, four new potential markers have been identified which are present in lung cancer patients' blood and absent from normal blood. Two of the markers are novel genes.

The new markers in combination were observed in 54% of the lung cancer patients' blood samples examined (Table 5), and combining these markers with cytokeratin-19 raised detection to 68% of the lung cancer patients' blood samples examined. Therefore, the combination would be ideal for the diagnosis of micrometastasis of lung cancer in the early stage or for relapse.

In Example 6, the ability of the assay using the 5 markers to identify a lung cancer cell in any type or stage of lung cancer is explored.

EXAMPLE 6 Analysis by Type and Stage of Lung Cancer

Patients with different types and stages of lung cancer were tested for expression of the four markers (see Table 8).

TABLE 8: Characteristics of metastatic lung cancer patients

Origin (cell type)                    Age  Gender
Bile duct (adenocarcinoma)            59   F
Colon (adenocarcinoma)                60   F
Breast (adenocarcinoma)               68   F
Uterus (squamous cell carcinoma)      68   F
Uterus (squamous cell carcinoma)      55   F
Esophagus (squamous cell carcinoma)   56   M
Larynx (squamous cell carcinoma)      59   M

As in Example 5, sixty-nine lung cancer patient blood samples were obtained from Keio University Hospital.

The staging classification was performed according to tumor-node-metastasis (TNM) score (Mountain CE: A new international staging system for lung cancer. Chest 89: 225S-233S, 1986). The RNA was purified and RT-PCR performed as in Example 5. Expression of each marker in the blood samples was analyzed and the results are shown in Table 9.

Table 9 shows that although the markers were somewhat more likely to be expressed in later stages, the combination of all 5 allows them to be identified in all stages of the disease. Thus, using these five genes in combination, at least one gene is expressed in 56% of stage IA, 33% of IB, 50% of II, 92% of IIIA, 60% of IIIB and 87% of IV.

TABLE 9: Frequency of positive RT-PCR expression markers in lung cancer patients' blood by stage

Stage   7018  7013  Synd  Col  Combination of 4*   Ck-19  Combination of 5**
IA      1     2     0     3    4 (44%)             2      5 (56%)
IB      0     1     1     1    2 (33%)             1      3 (50%)
II      0     0     1     0    1 (50%)             1      2 (100%)
IIIA    0     9     4     0    10 (83%)            2      11 (92%)
IIIB    2     2     3     3    7 (44%)             5      9 (56%)
IV      0     9     3     2    11 (69%)            8      13 (81%)
Rec.    0     3     1     0    4 (50%)             2      6 (75%)

*combination of 4 markers: 7013, 7018, Synd, and Col
**combination of 5 markers: 7013, 7018, Synd, Col, and Ck-19

When blood from patients with metastatic disease was tested, it can be seen that at least one marker was detected in all types of metastatic disease (see Table 10). This suggests that the markers can be used to detect disease in all stages of lung cancer as well as detecting recurrent metastases.

TABLE 10: Marker detection in metastatic lung cancer patients' blood

Origin (cell type)     7018 7013 Synd Col Ck-19
Bile duct (adeno)      - + - - -
Colon (adeno)          + + +
Breast (adeno)         + + - - +
Uterus (squamous)
Uterus (squamous)      - - - + +
Esophagus (squamous)   - + + +
Larynx (squamous)      - + - - -

EXAMPLE 11 Diagnosis and analysis of the presence of lung cancer using mRNA from one or more of the identified markers

The blood from a patient is isolated and the RNA from the blood sample is purified. RT-PCR is performed using at least one of the following markers: syndecan 1, collagen 1 alpha 2, 7013, and 7018. Controls such as β-actin are also performed. Alternatively, the additional marker cytokeratin-19 is also tested by RT-PCR. The results are pooled and analyzed for expression of each marker.
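Pooling the results amounts to asking, per sample, whether at least one marker in a chosen panel is positive, and then counting such samples. The sketch below shows one way this could be done; the panel names, function names, and the three example samples are hypothetical illustrations and are not patient data from this study.

```python
from typing import Dict, Sequence

FOUR_MARKERS = ("7018", "7013", "Synd", "Col")
FIVE_MARKERS = FOUR_MARKERS + ("Ck-19",)


def any_positive(calls: Dict[str, bool], panel: Sequence[str]) -> bool:
    """True if at least one marker in the panel is positive for a sample."""
    return any(calls.get(marker, False) for marker in panel)


def pooled_rate(samples: Sequence[Dict[str, bool]], panel: Sequence[str]) -> float:
    """Fraction of samples positive for at least one marker in the panel."""
    if not samples:
        raise ValueError("at least one sample is required")
    hits = sum(1 for calls in samples if any_positive(calls, panel))
    return hits / len(samples)


if __name__ == "__main__":
    # Hypothetical per-sample calls for illustration only.
    samples = [
        {"7013": True},    # positive on one of the four candidate markers
        {"Ck-19": True},   # positive only on cytokeratin-19
        {},                # no marker detected
    ]
    print(f"4-marker panel: {pooled_rate(samples, FOUR_MARKERS):.2f}")
    print(f"5-marker panel: {pooled_rate(samples, FIVE_MARKERS):.2f}")
```

Representing the panel as a parameter lets the same function cover both the four-marker combination and the five-marker combination that includes cytokeratin-19.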

EXAMPLE 12 Diagnosis and analysis of the presence of lung cancer using antibodies to one or more of the identified markers

Monoclonal or polyclonal antibodies specific for the markers syndecan 1, collagen 1 alpha 2, 7013, 7018, and cytokeratin-19 are purchased, prepared, or isolated from patient blood. The antibodies are prepared using methods known to one of skill in the art, including hybridoma technology, injection of a fusion protein into rabbits, production of humanized antibodies, library technology, etc. In addition, whole antibodies or functional parts may be used. The blood from a patient is isolated and the cells are treated with antibodies to one or more of the identified markers. The antibodies are fluorescently labeled or, alternatively, the secondary antibodies are fluorescently labeled.

Cells bearing these markers are identified by the presence of the fluorescent label using FACS technology. Each marker is identified using a different fluorescently labelled antibody, allowing the identification of more than one marker in a single blood sample.

EXAMPLE 13 Isolation of Cancer Cells from Blood or Bone Marrow

The metastatic cancer cells are isolated from the blood or bone marrow by treating the cells, or a non-lung tissue containing the cells, with antibodies specific for at least one marker selected from the group consisting of: syndecan 1, collagen 1 alpha 2, 7013, 7018 and cytokeratin-18.

The antibodies are bound to a moiety selected from the group consisting of metallic particles, fluorescent particles, chromatography beads, a chromatography gel and a solid support.

The isolated cells can then be used for identification of the site from which the cancer cells metastasized or production of activated immune cells specific for the cancer. Alternatively, the method can be used to purify the cells from the blood.

EXAMPLE 14 Isolation of a full-length cDNA clone

The clone obtained from the s-AFLP technique is used to design primers for detection of a full-length cDNA clone. A human cDNA library is purchased from Stratagene (Torrey Pines, CA). The library is screened using a probe specific to 7013 (SEQ ID NO: 16) or 7018 (SEQ ID NO: 17). After positive clones are obtained, the clones are tested using PCR and sequenced.

EXAMPLE 15 Identification of a Cellular Epithelial Origin/Carcinoma

Because the markers 7013 and 7018 specifically identified cells of epithelial origin, they are used to determine whether cells have an epithelial origin or, in the case of cancer, whether the cancer is a carcinoma. The cells are treated as in Example 12 for FACS analysis, using only the markers 7013 and 7018. If the cells are fluorescently labelled by one or both markers, the cell is determined to be of epithelial origin, or a carcinoma.