CORE 1 β3-GALACTOSYL TRANSFERASES AND METHODS OF USE THEREOF

Document Type and Number:

WIPO Patent Application WO/2001/044478

Kind Code:

A2

Abstract:

Core 1 β3-galactosyl transferases and nucleic acids encoding the core 1 β3-galactosyl transferases are described. The enzymes and the nucleic acids encoding them have been identified in human, rat, mouse, D. melanogaster and C. elegans. The polypeptides exhibit varying degrees of homology to one another. The polynucleotides can be used to transform or transfect host cells for producing substantially pure forms of the enzyme, or for use in an expression system for post-translational core 1 O-linked glycosylation of proteins or peptides produced within the expression system. The enzymes can be used to galactosylate, via a β3-linkage, an N-acetylgalactosamine linked to a serine, threonine or other O-linking amino acid on peptides or proteins requiring O-linked glycosylation.

Inventors:

CANFIELD WILLIAM M
CUMMINGS RICHARD D
JU TONGZHONG

Application Number:

PCT/US2000/033945

Publication Date:

June 21, 2001

Filing Date:

December 14, 2000

Assignee:

UNIV OKLAHOMA (US)

International Classes:

C12N15/09; C12N1/15; C12N1/19; C12N1/21; C12N5/10; C12N9/10; C12N15/54; C12P19/18; (IPC1-7): C12N15/54; C12N9/10; C12P19/18; C12P21/02

Domestic Patent References:

WO1999051185A2	1999-10-14
WO1999065712A2	1999-12-23

Foreign References:

EP0679716A1	1995-11-02

Other References:

JU, T.-Z. ET AL: "Purification, cloning and expression of core 1 beta3-galactosyltransferase. " GLYCOBIOLOGY, vol. 9, no. 10, October 1999 (1999-10), page 86 XP001023067
LEPPANEN ANNE ET AL: "A novel glycosulfopeptide binds to P-selectin and inhibits leukocyte adhesion to P-selectin." JOURNAL OF BIOLOGICAL CHEMISTRY, vol. 274, no. 35, 27 August 1999 (1999-08-27), pages 24838-24848, XP002176950 ISSN: 0021-9258
AMADO M ET AL: "IDENTIFICATION AND CHARACTERIZATION OF LARGE GALACTOSYLTRANSFERASE GENE FAMILIES: GALACTOSYLTRANSFERASES FOR ALL FUNCTIONS" BIOCHIMICA ET BIOPHYSICA ACTA, AMSTERDAM, NL, vol. 1473, no. 1, 6 December 1999 (1999-12-06), pages 35-53, XP001015944 ISSN: 0006-3002
WILSON R ET AL: "2.2 MB OF CONTIGUOUS NUCLEOTIDE SEQUENCE FROM CHROMOSOME III OF C. ELEGANS" NATURE, MACMILLAN JOURNALS LTD. LONDON, GB, vol. 368, no. 6466, 3 March 1994 (1994-03-03), pages 32-38, XP002029739 ISSN: 0028-0836
SHERWOOD A L ET AL: "STABLE EXPRESSION OF A CDNA ENCODING A HUMAN BETA1-->3 GALACTOSYLTRANSFERASE RESPONSIBLE FOR LACTO-SERIES TYPE 1 CORE CHAIN SYNTHESIS IN NON-EXPRESSING CELLS: VARIATION IN THE NATURE OF CELL SURFACE ANTIGENS EXPRESSED" JOURNAL OF CELLULAR BIOCHEMISTRY, LISS, NEW YORK, NY, US, vol. 50, no. 2, 1 October 1992 (1992-10-01), pages 165-177, XP000676088 ISSN: 0730-2312
AMADO M ET AL: "A Family of human beta3-Galactosyltransferases. Characterization of four members of a UDP-galactose:beta-N-acetyl-glucosamine/beta-N-acetyl-galactosamine beta-1,3-galactosyltransferase family" JOURNAL OF BIOLOGICAL CHEMISTRY, AMERICAN SOCIETY OF BIOLOGICAL CHEMISTS, BALTIMORE, MD, US, vol. 273, no. 21, 22 May 1998 (1998-05-22), pages 12770-12778, XP002109786 ISSN: 0021-9258
ZENG STEFFEN ET AL: "Complete enzymic synthesis of the mucin-type sialyl Lewis x epitope, involved in the interaction between PSGL-1 and P-selectin." GLYCOCONJUGATE JOURNAL, vol. 16, no. 9, 1999, pages 487-497, XP001016300 ISSN: 0282-0080
ADAMS M D ET AL: "The genome sequence of Drosophila melanogaster" SCIENCE, AMERICAN ASSOCIATION FOR THE ADVANCEMENT OF SCIENCE, US, vol. 287, 24 March 2000 (2000-03-24), pages 2185-2195, XP002144875 ISSN: 0036-8075

Attorney, Agent or Firm:

Palmer, John (CA, US)

Claims:

What is claimed is:

1.	An isolated polynucleotide which encodes a protein having β3-galactosyl transferase activity and which is selected from the group consisting of: (A) a polynucleotide selected from the group consisting of SEQ ID NO: 2, SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 8, SEQ ID NO: 10, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15 and SEQ ID NO: 18; (B) a polynucleotide which hybridizes under relaxed or stringent conditions to a polynucleotide of (A); (C) a polynucleotide which has at least about 41% identity with SEQ ID NO: 12; (D) a polynucleotide which differs in nucleotide sequence from the isolated polynucleotides of (A) to (C) above due to degeneracy of the genetic code and which encodes a protein having β3-galactosyl transferase activity; and (E) a polynucleotide which differs in nucleotide sequence from the polynucleotides of (A) to (D) in that said polynucleotide lacks a nucleotide sequence which encodes a transmembrane domain, wherein the β3-galactosyl transferase encoded is soluble.

2.	The polynucleotide of claim 1 wherein the polynucleotide is DNA.

3.	A vector containing the polynucleotide of claim 1.

4.	A host cell transformed or transfected with the vector of claim 3.

5.	A process for producing substantially purified β3-galactosyl transferase comprising the steps of: culturing the host cell of claim 4; using the cultured host cell to express the β3-galactosyl transferase; and purifying the β3-galactosyl transferase from the cultured host cell.

6.	The process of claim 5 wherein the β3-galactosyl transferase is soluble.

7.	The host cell of claim 4 wherein the polynucleotide is operatively associated with an expression control sequence contained in said vector.

8.	The host cell of claim 4 transformed or transfected with an expressible polynucleotide encoding a peptide or polypeptide requiring post-translational O-linked glycosylation to have a core 1 structure.

9.	The host cell of claim 8 wherein the peptide or polypeptide requiring post-translational O-linked glycosylation to have a core 1 structure comprises P-selectin glycoprotein ligand-1 or a portion thereof which has P-selectin binding activity.

10.	A process for producing a substantially purified protein or peptide requiring post-translational O-linked glycosylation having a core 1 structure, comprising the steps of: culturing a host cell having an expressible polynucleotide encoding a peptide or polypeptide requiring post-translational O-linked glycosylation to have a core 1 structure and transformed or transfected with the vector of claim 3; expressing in the cultured host cell the β3-galactosyl transferase and the protein or peptide requiring post-translational O-linked glycosylation to have a core 1 structure; and purifying the protein or peptide which required post-translational O-linked glycosylation to have a core 1 structure from the cultured host cell.

11.	A process for producing a substantially pure β3-galactosyl transferase comprising the steps of: providing a host cell transformed or transfected with a vector comprising a polynucleotide encoding a β3-galactosyl transferase; culturing the host cell in a manner which causes the expression of the β3-galactosyl transferase; and purifying the β3-galactosyl transferase from the cultured host cell.

12.	The process for producing a substantially pure β3-galactosyl transferase of claim 11 wherein, in the step of providing a host cell transformed or transfected with a vector comprising a polynucleotide encoding a β3-galactosyl transferase, the polynucleotide is selected from the group consisting of SEQ ID NO: 2, SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 8, SEQ ID NO: 10, SEQ ID NO: 18, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14 and SEQ ID NO: 15, and a polynucleotide which hybridizes under relaxed or stringent conditions to a polynucleotide selected from the group consisting of SEQ ID NO: 2, SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 8, SEQ ID NO: 10, SEQ ID NO: 18, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14 and SEQ ID NO: 15.

13.	The process for producing a substantially pure β3-galactosyl transferase of claim 11 wherein the β3-galactosyl transferase is soluble.

14.	A purified β3-galactosyl transferase which is substantially free of other proteins.

15.	The purified β3-galactosyl transferase of claim 14 wherein the β3-galactosyl transferase is a mammalian β3-galactosyl transferase.

16.	The purified β3-galactosyl transferase of claim 15 wherein the β3-galactosyl transferase is a human β3-galactosyl transferase.

17.	The purified β3-galactosyl transferase of claim 14 wherein the β3-galactosyl transferase is an insect β3-galactosyl transferase.

18.	The purified β3-galactosyl transferase of claim 14 wherein the β3-galactosyl transferase is a nematode β3-galactosyl transferase.

19.	The purified β3-galactosyl transferase of claim 14 lacking a transmembrane domain, wherein the β3-galactosyl transferase is soluble.

20.	The purified human β3-galactosyl transferase of claim 16 lacking a transmembrane domain, wherein the human β3-galactosyl transferase is soluble.

21.	A purified β3-galactosyl transferase which is substantially free of other proteins, comprising an amino acid sequence selected from the group consisting of SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID NO: 5, SEQ ID NO: 7, SEQ ID NO: 9, SEQ ID NO: 17 and an amino acid sequence which has at least about 41% identity with SEQ ID NO: 1, and which has enzymatic activity of a β3-galactosyl transferase enzyme.

22.	A β3-galactosyl transferase, comprising: an amino acid sequence which has at least about 60% identity with amino acid residues 97-126 of SEQ ID NO: 1; an amino acid sequence which has at least about 67% identity with amino acid residues 143-224 of SEQ ID NO: 1; an amino acid sequence which has at least about 63% identity with amino acid residues 239-330 of SEQ ID NO: 1; and which has the activity of a β3-galactosyl transferase enzyme.

23.	An in vitro method of galactosylating an N-acetylgalactosamine linked to a serine, threonine or other O-linking amino acid on a protein, polypeptide or peptide, via a β3-linkage, the method comprising the steps of: providing a protein, polypeptide or peptide comprising at least one serine, threonine or other O-linking amino acid residue having an N-acetylgalactosamine linked thereto; providing a purified β3-galactosyl transferase; providing a galactose donor; and combining the protein, polypeptide or peptide with the β3-galactosyl transferase and the galactose donor under conditions suitable for causing transfer of the galactose from the galactose donor to the N-acetylgalactosamine linked to the protein, polypeptide or peptide.

24.	The in vitro method of claim 23 wherein, in the step of providing a purified β3-galactosyl transferase, the β3-galactosyl transferase comprises at least one of the sequences selected from the group consisting of: (A) SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID NO: 5, SEQ ID NO: 7, SEQ ID NO: 9, SEQ ID NO: 11 and SEQ ID NO: 17; (B) an amino acid sequence which has at least about 41% identity with SEQ ID NO: 1 and which has enzymatic activity of a β3-galactosyl transferase; (C) an amino acid sequence which has at least about 60% identity with amino acids 97-126 of SEQ ID NO: 1, at least about 67% identity with amino acids 143-224 of SEQ ID NO: 1 and at least about 63% identity with amino acids 239-330 of SEQ ID NO: 1, and which has enzymatic activity of a β3-galactosyl transferase; and (D) a human β3-galactosyl transferase.

25.	A polynucleotide which encodes a protein having β3-galactosyl transferase activity and which is selected from the group consisting of: (A) a polynucleotide selected from the group consisting of SEQ ID NO: 2, SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 8, SEQ ID NO: 10, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15 and SEQ ID NO: 18; (B) a polynucleotide which hybridizes to a polynucleotide of (A); (C) a polynucleotide which has at least about 41% identity with SEQ ID NO: 12; (D) a polynucleotide which differs in nucleotide sequence from the polynucleotides of (A) to (C) above due to degeneracy of the genetic code and which encodes a protein having β3-galactosyl transferase activity; and (E) a polynucleotide which differs in nucleotide sequence from the polynucleotides of (A) to (D) in that said polynucleotide lacks a nucleotide sequence which encodes a transmembrane domain, wherein the β3-galactosyl transferase encoded is soluble.

26.	A vector containing the polynucleotide of claim 25.

27.	A host cell transformed or transfected with the vector of claim 26.

28.	A process for producing substantially purified β3-galactosyl transferase comprising the steps of: culturing the host cell of claim 27; and using the cultured host cell to express the β3-galactosyl transferase.

29.	A process for producing a protein or peptide, comprising the steps of: culturing a host cell having an expressible polynucleotide encoding a peptide or a polypeptide requiring post-translational O-linked glycosylation to have a core 1 structure and transformed or transfected with the vector of claim 3; and expressing in the cultured host cell the β3-galactosyl transferase and the protein or peptide requiring post-translational O-linked glycosylation to have a core 1 structure.

30.	A process for producing β3-galactosyl transferase comprising the steps of: providing a host cell transformed or transfected with a vector comprising a polynucleotide encoding a β3-galactosyl transferase; and culturing the host cell in a manner which causes the expression of the β3-galactosyl transferase.

31.	A β3-galactosyl transferase comprising an amino acid sequence selected from the group consisting of SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID NO: 5, SEQ ID NO: 7, SEQ ID NO: 9, SEQ ID NO: 17 and an amino acid sequence which has at least about 41% identity with SEQ ID NO: 1, and which has enzymatic activity of a β3-galactosyl transferase enzyme.

32.	A β3-galactosyl transferase, comprising: an amino acid sequence which has at least about 60% identity with amino acid residues 97-126 of SEQ ID NO: 1; an amino acid sequence which has at least about 67% identity with amino acid residues 143-224 of SEQ ID NO: 1; and an amino acid sequence which has at least about 63% identity with amino acid residues 239-330 of SEQ ID NO: 1.

33.	A method of galactosylating an N-acetylgalactosamine, the method comprising the steps of: providing a protein, polypeptide or peptide comprising at least one serine, threonine or other O-linking amino acid residue having an N-acetylgalactosamine linked thereto; providing a β3-galactosyl transferase; providing a galactose donor; and combining the protein, polypeptide or peptide with the β3-galactosyl transferase and the galactose donor under conditions suitable for causing transfer of the galactose from the galactose donor to the N-acetylgalactosamine.

Description:

CORE 1 β3-GALACTOSYL TRANSFERASES AND METHODS OF USE THEREOF

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application is based on copending U.S. Patent Application No. 09/334,013, entitled "SYNTHETIC GLYCOSULFOPEPTIDES AND METHODS OF SYNTHESIS THEREOF," filed June 15, 1999, the entire specification of which is hereby incorporated herein by this reference.

BACKGROUND

The present invention is related to core 1 β3-galactosyl transferases, polynucleotides which encode said galactosyl transferases and methods of use thereof.

The core 1 O-linked glycan structure, consisting of galactose linked β1,3 to N-acetylgalactosamine linked to a threonine or serine on a protein, peptide or polypeptide, is a critical intermediate in the biosynthesis of most extended O-linked glycans (Glycoproteins and Human Disease (Brockhausen, I., and Kuhns, W., eds), (1997), pp. 13-31, R. G. Landes Company, Austin). The core 1 structure is found on a number of mucins and adhesion molecules. Core 1 β3-galactosyl transferase is the only enzyme capable of synthesizing the core 1 O-linked glycan structure Gal β3-GalNAc-Ser/Thr. Attempts have previously been made to measure core 1 β3-galactosyl transferase activity in vitro and to purify the enzyme. However, the protein has not previously been purified sufficiently to determine its amino acid sequence or to generate antibodies against it, and attempts to identify cDNAs encoding the enzyme have been unsuccessful. As a result, there has remained a need in the field for complete identification of core 1 β3-galactosyl transferase and of cDNAs encoding core 1 β3-galactosyl transferase.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a table summarizing the purification of core 1 β3-galactosyl transferase from 500 grams of rat liver.

Figure 2 is an SDS-PAGE gel characterizing the purification of core 1 β3-galactosyl transferase from rat liver.

Figure 3 shows the cDNA (SEQ ID NO: 2) and protein sequence (SEQ ID NO: 1) of human core 1 β3-galactosyl transferase.

Figure 4 is an alignment comparison of the human (SEQ ID NO: 1), rat (SEQ ID NO: 3), mouse (SEQ ID NO: 5), two D. melanogaster (SEQ ID NO: 9 and SEQ ID NO: 17) and C. elegans (SEQ ID NO: 7) core 1 β3-galactosyl transferase protein sequences.

SUMMARY OF THE INVENTION

According to the present invention, core 1 β3-galactosyl transferase and polynucleotides which encode said galactosyl transferase, as well as methods for using same, are provided. Broadly, core 1 β3-galactosyl transferase purified from rat liver is provided, as well as the cloned Homo sapiens, Rattus norvegicus, Mus musculus, Drosophila melanogaster and Caenorhabditis elegans cDNAs that encode this enzyme, designated herein as β3-GalT. The invention further comprises a soluble form of the enzyme.

In one aspect, the invention comprises homologous versions of β3-GalT proteins encoded by homologous cDNAs, the homologous cDNAs themselves, vectors and host cells which express the cDNAs, and methods of using the β3-GalT proteins and cDNAs.

In further aspects, the present invention contemplates cloning vectors which comprise the nucleic acid of the invention, and prokaryotic or eukaryotic expression vectors which comprise the nucleic acid molecule of the invention operatively associated with an expression control sequence. Accordingly, the invention further relates to a bacterial or eukaryotic cell transfected or transformed with an appropriate expression vector.

An object of the present invention is to provide a nucleic acid, in particular a DNA, that encodes a core 1 β3-galactosyl transferase or a fragment thereof, or homologous derivatives or analogs thereof.

A further object of the present invention, while achieving the before-stated object, is to provide a cloning vector and an expression vector for such a nucleic acid molecule.

Yet another object of the present invention, while achieving the before-stated objects, is to provide a recombinant cell line that contains such an expression vector.

Yet a further object of the present invention, while achieving the before-stated objects, is to produce core 1 β3-galactosyl transferase and fragments thereof.

A still further object of the present invention, while achieving the before-stated objects, is to provide methods for using core 1 β3-galactosyl transferase and fragments thereof.

Other objects, features and advantages of the present invention will become apparent from the following detailed description when read in conjunction with the appended claims.

DETAILED DESCRIPTION OF THE INVENTION

The core 1 O-linked glycan structure, consisting of galactose in β1,3 linkage to N-acetylgalactosamine linked to a threonine or serine on a protein, peptide or polypeptide, is a critical intermediate in the biosynthesis of most extended O-linked glycans. The core 1 structure is found on a number of mucins and adhesion molecules. Core 1 β3-galactosyl transferase, which is capable of synthesizing the core 1 O-linked glycan structure Gal β3-GalNAc-Thr/Ser, has been purified herein from rat liver. N-terminal and internal protein sequences of the purified enzyme were obtained and used to identify human EST clones, and a full-length cDNA for the human core 1 β3-galactosyl transferase was isolated using standard molecular biology techniques. The rat core 1 β3-galactosyl transferase cDNA has also been identified. The mouse, C. elegans and two Drosophila melanogaster core 1 β3-galactosyl transferase genes are also described herein. An alignment of the human, rat, mouse, C. elegans and two D. melanogaster core 1 β3-galactosyl transferases is also provided, demonstrating that these are highly homologous proteins; in particular, the C. elegans protein is 41% identical to the human protein, with 7 of 9 cysteines conserved. Also provided herein is a soluble, epitope-tagged version of the human core 1 β3-galactosyl transferase, which has been expressed and recovered from culture media.
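By way of illustration only, a percent identity figure such as the 41% C. elegans/human identity noted above can be computed from a pairwise alignment, and the same calculation can be restricted to a residue range of the reference (such as residues 97-126, 143-224 or 239-330 of SEQ ID NO: 1 recited in the claims). The Python sketch below assumes the two sequences have already been aligned, with gaps written as "-", and counts identical columns over the residues of the reference sequence. This is one of several identity conventions; the alignment program and convention used to obtain the figures herein are not specified here, and the toy sequences are hypothetical.

```python
def percent_identity(ref_aln: str, query_aln: str) -> float:
    """Percent of reference residues matched by an identical query residue.

    Assumes ref_aln and query_aln are equal-length rows of a pairwise
    alignment with '-' marking gaps. Columns where the reference has a gap
    are skipped, so the denominator is the number of reference residues.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    ref_residues = 0
    identical = 0
    for r, q in zip(ref_aln, query_aln):
        if r == "-":
            continue
        ref_residues += 1
        if r == q:
            identical += 1
    return 100.0 * identical / ref_residues if ref_residues else 0.0


def region_identity(ref_aln: str, query_aln: str, start: int, end: int) -> float:
    """Percent identity over reference residues start..end (1-based, inclusive)."""
    ref_pos = 0
    cols = []
    for i, r in enumerate(ref_aln):
        if r != "-":
            ref_pos += 1
            if start <= ref_pos <= end:
                cols.append(i)
    sub_ref = "".join(ref_aln[i] for i in cols)
    sub_query = "".join(query_aln[i] for i in cols)
    return percent_identity(sub_ref, sub_query)
```

For example, percent_identity("MKT-LLV", "MKSALLV") finds 5 identical positions among the 6 reference residues (about 83.3%), and region_identity over reference residues 1-3 of the same pair gives 2 of 3 (about 66.7%).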

The polynucleotides of the present invention may be in the form of RNA or in the form of DNA, wherein the term "DNA" includes cDNA, genomic DNA and synthetic DNA. The DNA may be double-stranded or single-stranded, and if single-stranded, may be the coding strand or the non-coding (anti-sense) strand.
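As a brief illustration of the relationship between the two strands, the non-coding (anti-sense) strand is the reverse complement of the coding strand. A minimal Python sketch, assuming a DNA sequence written 5' to 3' using A, C, G, T (and N for an unknown base); the example sequence is arbitrary:

```python
# Watson-Crick complements for DNA; 'N' (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(dna: str) -> str:
    """Return the complementary strand, also read 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]
```

For example, the anti-sense strand of the coding sequence ATGGCC is GGCCAT, and applying the function twice returns the original strand.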

The coding sequence which encodes the mature polypeptide may be identical to the coding sequence shown herein or may be a different coding sequence which, as a result of the redundancy or degeneracy of the genetic code, encodes the same mature polypeptide as the DNA coding sequences shown herein.
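To illustrate the degeneracy referred to above, different coding sequences can encode the same polypeptide. The sketch below assumes the standard genetic code (NCBI translation table 1) and uses short arbitrary sequences rather than the SEQ ID NO sequences: both ATGCTGTCT and ATGTTAAGC encode Met-Leu-Ser, and since Met has 1 codon and Leu and Ser have 6 each, 1 × 6 × 6 = 36 distinct codon combinations encode that tripeptide.

```python
from itertools import product
from math import prod

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated in TCAG order,
# first base varying slowest. '*' marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # TNN codons
    "LLLLPPPPHHQQRRRR"  # CNN codons
    "IIIMTTTTNNKKSSRR"  # ANN codons
    "VVVVAAAADDEEGGGG"  # GNN codons
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence codon by codon ('*' = stop)."""
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def count_encodings(protein: str) -> int:
    """Number of distinct coding sequences that translate to `protein`."""
    codons_per_aa = {}
    for aa in CODON_TABLE.values():
        codons_per_aa[aa] = codons_per_aa.get(aa, 0) + 1
    return prod(codons_per_aa[aa] for aa in protein)
```

translate("ATGCTGTCT") and translate("ATGTTAAGC") both return "MLS", which is the sense in which two polynucleotides can differ in nucleotide sequence yet encode the same protein.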

The polynucleotides which encode the mature polypeptides may include: only the coding sequence for the mature polypeptide; the coding sequence for the mature polypeptide and additional coding sequence, such as a leader or secretory sequence or a proprotein sequence; or the coding sequence for the mature polypeptide (and optionally additional coding sequence) and non-coding sequence, such as introns or non-coding sequence 5' and/or 3' of the coding sequence for the mature polypeptide.

Thus, the term "polynucleotide encoding a polypeptide" encompasses a polynucleotide which includes only coding sequence for the polypeptide as well as a polynucleotide which includes additional coding and/or non-coding sequence.

The present invention further relates to variants of the hereinabove described polynucleotides which encode fragments, analogs and derivatives of the polypeptides having the amino acid sequences of SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID NO: 5, SEQ ID NO: 7, SEQ ID NO: 9 and SEQ ID NO: 17. The variants of the polynucleotides may be naturally occurring allelic variants or non-naturally occurring variants.

Thus, the present invention includes polynucleotides encoding the same mature polypeptides as shown in SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID NO: 5, SEQ ID NO: 7, SEQ ID NO: 9 and SEQ ID NO: 17, as well as variants of such polynucleotides which encode active fragments, derivatives or analogs of said polypeptides. Such nucleotide variants include deletion variants, substitution variants and addition or insertion variants.

As hereinabove indicated, the polynucleotide may have a coding sequence which is a naturally occurring allelic variant of the coding sequences of SEQ ID NO: 2, SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 8, SEQ ID NO: 10 and SEQ ID NO: 18. The portions of SEQ ID NO: 2, SEQ ID NO: 4, SEQ ID NO: 6 and SEQ ID NO: 8 which encode the protein sequences of SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID NO: 5 and SEQ ID NO: 7, respectively, are provided as SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14 and SEQ ID NO: 15, respectively (SEQ ID NO: 10 and SEQ ID NO: 18 contain only the open reading frames of the core 1 β3-GalT genes and no non-coding sequences). As is known in the art, an allelic variant is an alternate form of a polynucleotide sequence which may have a substitution, deletion or addition of one or more nucleotides which does not substantially adversely alter the function of the encoded polypeptide.
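The relationship between a full cDNA and its protein-encoding portion can be illustrated by locating an open reading frame. The following is a minimal Python sketch, assuming the standard stop codons and assuming that the frame of interest is the longest ATG-initiated open reading frame on the strand given; in practice, cDNA annotation also weighs start-codon context and homology, and the example sequence is arbitrary rather than one of the SEQ ID NO sequences.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(dna: str) -> str:
    """Longest ATG-initiated open reading frame on the given strand.

    Scans the three forward reading frames and returns the ORF including its
    stop codon. A reading frame that runs off the end without a stop codon
    is not reported.
    """
    dna = dna.upper()
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens the frame
            elif start is not None and codon in STOP_CODONS:
                orf = dna[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None  # look for the next ATG in this frame
    return best
```

For the toy sequence CCATGAAACCCTAGGG, the function returns ATGAAACCCTAG, dropping the flanking 5' and 3' non-coding bases.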

The present invention further relates to a β3-GalT polypeptide which has the amino acid sequence of SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID NO: 5, SEQ ID NO: 7, SEQ ID NO: 9 or SEQ ID NO: 17, as well as to fragments, analogs and derivatives of such polypeptides.

The terms "fragment," "derivative" and "analog," when referring to the polypeptides of SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID NO: 5, SEQ ID NO: 7, SEQ ID NO: 9 and SEQ ID NO: 17, refer to a β3-GalT which retains essentially the same, or increased, biological function or activity as the native β3-GalT. Thus, an analog includes a proprotein which can be activated by cleavage of a proprotein portion to produce an active mature polypeptide. Fragments of core 1 β3-GalT include soluble, active proteins from which the N-terminal transmembrane region has been removed.
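The N-terminal transmembrane region removed in such soluble fragments is typically a stretch of hydrophobic residues. As an illustration only, one widely used way to flag such a stretch is a Kyte-Doolittle hydropathy scan with a window of about 19 residues, in which a window mean above roughly 1.6 is commonly taken as suggestive of a membrane-spanning segment. This technique, its parameters and the toy sequence below are assumptions for the sketch and are not taken from this document.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def most_hydrophobic_window(protein: str, window: int = 19):
    """Return (0-based start, mean hydropathy) of the most hydrophobic window.

    Ties are resolved in favour of the earliest (most N-terminal) window.
    """
    if len(protein) < window:
        raise ValueError("sequence shorter than window")
    scores = [KD[aa] for aa in protein.upper()]
    best_start, best_mean = 0, float("-inf")
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if mean > best_mean:
            best_start, best_mean = start, mean
    return best_start, best_mean
```

For a toy sequence made of a short charged N-terminus (MKKRR), a run of 20 leucines and a charged tail, the scan returns the window starting at index 5 with a mean of about 3.8; a soluble construct in this hypothetical case would begin downstream of that segment.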

The polypeptide of the present invention may be a natural polypeptide or a synthetic polypeptide, or preferably a recombinant polypeptide.

The fragment, derivative or analog of the polypeptide of SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID NO: 5, SEQ ID NO: 7, SEQ ID NO: 9 and SEQ ID NO: 17 may be (i) one in which one or more of the amino acid residues are substituted with a conserved or non-conserved amino acid residue (preferably a conserved amino acid residue), and such substituted amino acid residue may or may not be one encoded by the genetic code; (ii) one in which one or more of the amino acid residues includes a substituent group; (iii) one in which the mature polypeptide is fused with another compound, such as a compound to increase the half-life of the polypeptide (for example, polyethylene glycol); or (iv) one in which additional amino acids are fused to the mature polypeptide, such as a leader or secretory sequence, a sequence which is employed for purification of the mature polypeptide, or a proprotein sequence.

Such fragments, derivatives and analogs are deemed to be within the scope of one of ordinary skill in the art given the teachings herein.

The polypeptides and polynucleotides of the present invention are preferably provided in an isolated form, and preferably are purified substantially to homogeneity.

The term "isolated" means that the material is removed from its original environment (e.g., the natural environment if it is naturally occurring) in a form sufficient to be useful in performing its inherent enzymatic function. For example, a naturally occurring polynucleotide or polypeptide present in a living animal is not isolated, but the same polynucleotide or polypeptide separated from some or all of the coexisting materials in the natural system is isolated.

Such polynucleotides could be part of a vector, and/or such polynucleotides or polypeptides could be part of a composition, and still be isolated in that such vector or composition is not part of its natural environment.

The present invention also relates to vectors which include polynucleotides of the present invention, host cells which are genetically engineered with vectors of the invention, and the production of polypeptides of the invention by recombinant techniques.

Host cells are genetically engineered (transduced or transformed or transfected) with the vectors of this invention which may be, for example, a cloning vector or an expression vector. The vector may be, for example, in the form of a plasmid, a viral particle, or a phage or other vectors known in the art.

The engineered host cells can be cultured in conventional nutrient media modified as appropriate for activating promoters, selecting transformants or amplifying the β3-GalT genes. The culture conditions, such as temperature, pH and the like, are those previously used with the host cell selected for expression, and will be apparent to the ordinarily skilled artisan.

The β3-GalT-encoding polynucleotides of the present invention may be employed for producing Gal β3-GalNAc by recombinant techniques or synthetic in vitro techniques. Thus, for example, the β3-GalT polynucleotides may be included, along with a gene encoding a protein requiring O-linked glycosylation, in any one of a variety of expression vectors for expressing the β3-GalT and the protein requiring O-linked glycosylation. Such vectors include chromosomal, nonchromosomal and synthetic DNA sequences, e.g., derivatives of SV40; bacterial plasmids; phage DNA; baculovirus; yeast plasmids; vectors derived from combinations of plasmids and phage DNA; and viral DNA such as vaccinia, adenovirus, fowl pox virus and pseudorabies. However, any other vector may be used as long as it is replicable in the host. In one embodiment, the protein requiring O-linked glycosylation is P-selectin glycoprotein ligand-1, a portion thereof, or a synthetic peptide which has P-selectin binding activity.

The appropriate DNA sequence (or sequences) may be inserted into the vector by a variety of procedures. For example, the DNA sequence may be inserted into appropriate restriction endonuclease site(s) by procedures known in the art. Such procedures and others are deemed to be within the scope of a person of ordinary skill in the art.
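Before a coding sequence is inserted at chosen restriction endonuclease sites, it is routinely checked for internal copies of those sites, which would otherwise also be cut. A minimal Python sketch; the four enzymes listed are common examples chosen for illustration and are not enzymes named in this document, and the example sequence is arbitrary.

```python
# Recognition sequences of a few common type II restriction enzymes.
SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "NotI": "GCGGCCGC",
}


def find_sites(dna: str, sites: dict = SITES) -> dict:
    """Map each enzyme name to the 0-based positions of its recognition sequence."""
    dna = dna.upper()
    hits = {}
    for enzyme, motif in sites.items():
        hits[enzyme] = [
            i for i in range(len(dna) - len(motif) + 1) if dna.startswith(motif, i)
        ]
    return hits
```

Because each of these recognition sequences is palindromic, scanning one strand also finds the sites on the complementary strand; a non-palindromic site would additionally need to be searched for in the reverse complement.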

The DNA sequence in the expression vector is operatively linked to an appropriate expression control sequence(s) (promoter) to direct mRNA synthesis. As representative examples of such promoters, there may be mentioned: the LTR or SV40 promoter, the E. coli lac or trp promoter, the phage lambda PL promoter and other promoters known to control expression of genes in prokaryotic or eukaryotic cells or their viruses. The expression vector also contains a ribosome binding site for translation initiation and a transcription terminator. The vector may also include appropriate sequences for amplifying expression.

In addition, the expression vectors preferably contain one or more selectable marker genes to provide a phenotypic trait for selection of transformed host cells, such as dihydrofolate reductase or neomycin resistance for eukaryotic cell culture, or such as tetracycline or ampicillin resistance in E. coli.

The vector containing the appropriate DNA sequence as hereinabove described, as well as an appropriate promoter or control sequence, may be employed to transform an appropriate host to permit the host to express the protein as described elsewhere herein.

As representative examples of appropriate hosts, there may be mentioned: bacterial cells, such as E. coli, Streptomyces and Salmonella typhimurium; fungal cells, such as yeast; insect cells, such as Drosophila and Sf9; animal cells, such as CHO, COS, 293T or Bowes melanoma; plant cells; etc.

The selection of an appropriate host is deemed to be within the scope of a person of ordinary skill in the art given the teachings herein.

More particularly, the present invention also includes recombinant constructs comprising one or more of the sequences as broadly described above. The constructs comprise a vector, such as a plasmid or viral vector, into which a sequence of the invention has been inserted, in a forward or reverse orientation. In a preferred aspect of this embodiment, the construct further comprises regulatory sequences, including, for example, a promoter, operably linked to the sequence. Large numbers of suitable vectors and promoters are known to those of skill in the art, and are commercially available. Bacterial: pQE70, pQE60, pQE-9 (Qiagen); pBS, pD10, phagescript, psiX174, pBluescript SK, pBSKS, pNH8A, pNH16a, pNH18A, pNH46A (Stratagene); pTRC99a, pKK223-3, pKK233-3, pDR540, pRIT5 (Pharmacia). Eukaryotic: pWLNEO, pSV2CAT, pOG44, pXT1, pSG (Stratagene); pSVK3, pBPV, pMSG, pSVL (Pharmacia).

However, any other plasmids or vectors may be used as long as they are replicable in the host.

Promoter regions can be selected from any desired gene using CAT (chloramphenicol acetyltransferase) vectors or other vectors with selectable markers.

Two appropriate vectors are pKK232-8 and pCM7. Particular named bacterial promoters include lacI, lacZ, T3, T7, gpt, lambda PR, PL and trp. Eukaryotic promoters include CMV immediate early, HSV thymidine kinase, early and late SV40, LTRs from retrovirus, and mouse metallothionein-I. Selection of the appropriate vector and promoter is well within the level of ordinary skill in the art.

In a further embodiment, the present invention relates to host cells containing the above-described constructs. The host cells may be obtained using techniques known in the art. Suitable host cells include prokaryotic or lower or higher eukaryotic organisms or cell lines, for example bacterial, mammalian, yeast or other fungal, viral, plant or insect cells. Methods for transforming or transfecting cells to express foreign DNA are well known in the art (see, for example, Itakura et al., U.S. Pat. No. 4,704,362; Hinnen et al., PNAS USA 75: 1929-1933, 1978; Murray et al., U.S. Pat. No. 4,801,542; Upshall et al., U.S. Pat. No. 4,766,075; and Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory Press, 1989), all of which are incorporated herein by reference.

Introduction of the construct into the host cell can be effected by methods well known in the art, such as calcium phosphate transfection, DEAE-dextran-mediated transfection or electroporation (Davis, L., Dibner, M., Battey, I., Basic Methods in Molecular Biology, 1986).

The constructs in host cells can be used in a conventional manner to produce the gene product encoded by the recombinant sequence. Alternatively, the polypeptides of the invention can be synthetically produced by conventional peptide synthesizers.

Mature proteins can be expressed in mammalian cells, yeast, bacteria, or other cells under the control of appropriate promoters. Cell-free translation systems can also be employed to produce such proteins using RNAs derived from the DNA constructs of the present invention. Appropriate cloning and expression vectors for use with prokaryotic and eukaryotic hosts are described by Sambrook et al., Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor, N.Y. (1989), the disclosure of which is hereby incorporated herein by reference.

Transcription of the DNA encoding the polypeptides of the present invention by higher eukaryotes may be increased by inserting an enhancer sequence into the vector. Enhancers are cis-acting elements of DNA, usually about 10 to 300 bp long, that act on a promoter to increase its transcription.

Examples include the SV40 enhancer, a cytomegalovirus early promoter enhancer, the polyoma enhancer, and adenovirus enhancers.

Generally, recombinant expression vectors will include origins of replication and selectable markers permitting transformation of the host cell, e.g., the ampicillin resistance gene of E. coli and the S. cerevisiae TRP1 gene, and a promoter derived from a highly expressed gene to direct transcription of a downstream structural sequence. Such promoters can be derived from operons encoding glycolytic enzymes such as 3-phosphoglycerate kinase (PGK), α-factor, acid phosphatase, or heat shock proteins, among others. The heterologous structural sequence is assembled in appropriate phase with translation initiation and termination sequences and, preferably, a leader sequence capable of directing secretion of translated protein into the periplasmic space or extracellular medium. Optionally, the heterologous sequence can encode a fusion protein including an N-terminal or C-terminal identification peptide imparting desired characteristics, e.g., stabilization or simplified purification of the expressed recombinant product.

Useful expression vectors for bacterial use are constructed by inserting one or more structural DNA sequences encoding one or more desired proteins, together with suitable translation initiation and termination signals, in operable reading phase with a functional promoter. The vector will comprise one or more phenotypic selectable markers and an origin of replication to ensure maintenance of the vector and, if desirable, to provide amplification within the host. Suitable prokaryotic hosts for transformation include E. coli, Bacillus subtilis, Salmonella typhimurium and various species within the genera Pseudomonas, Streptomyces and Staphylococcus, although others may also be employed as a matter of choice.

As a representative but nonlimiting example, useful expression vectors for bacterial use can comprise a selectable marker and bacterial origin of replication derived from commercially available plasmids comprising genetic elements of the well-known cloning vector pBR322 (ATCC 37017). These pBR322 "backbone" sections are combined with an appropriate promoter and the structural sequence to be expressed.

Following transformation of a suitable host strain and growth of the host strain to an appropriate cell density, the selected promoter is induced by appropriate methods (e.g., temperature shift or chemical induction) and the cells are cultured for an additional period.

Cells are typically harvested by centrifugation, disrupted by physical or chemical methods, and the resulting crude extract retained for further purification. Microbial cells employed in expression of proteins can be disrupted by any convenient method, including freeze-thaw cycling, sonication, mechanical disruption, or use of cell-lysing agents. Such methods are well known to a person of ordinary skill in the art.

Various mammalian cell culture systems can also be employed to express recombinant protein. Examples of mammalian expression systems include the COS-7 line of monkey kidney fibroblasts, described by Gluzman (Cell 23: 175 (1981)), and other cell lines capable of transcribing compatible vectors, for example, the C127, 293T, 3T3, CHO, HeLa and BHK cell lines. Mammalian expression vectors will comprise an origin of replication, a suitable promoter and enhancer, and also any necessary ribosome binding sites, polyadenylation site, splice donor and acceptor sites, transcriptional termination sequences, and 5' flanking nontranscribed sequences. DNA sequences derived from the SV40 splice and polyadenylation sites may be used to provide the required nontranscribed genetic elements.

The β3-GalT polypeptides or portions thereof can be recovered and purified from recombinant cell cultures by methods including, but not limited to, ammonium sulfate or ethanol precipitation, acid extraction, anion or cation exchange chromatography, phosphocellulose chromatography, hydrophobic interaction chromatography, affinity chromatography, hydroxylapatite chromatography and lectin chromatography, alone or in combination. Protein refolding steps can be used as necessary in completing the configuration of the mature protein. Finally, high performance liquid chromatography (HPLC) can be employed for final purification steps.

The polypeptides of the present invention may be a naturally purified product, or a product of chemical synthetic procedures, or produced by recombinant techniques from a prokaryotic or eukaryotic host (for example, by bacterial, yeast, higher plant, insect and mammalian cells in culture).

Depending upon the host employed in a recombinant production procedure, the polypeptides of the present invention may be glycosylated or non-glycosylated. Polypeptides of the invention may also include an initial methionine amino acid residue.

A recombinant β3-GalT of the invention, or a functional fragment, derivative or analog thereof, may be expressed chromosomally after integration of the β3-GalT coding sequence by recombination. In this regard, any of a number of amplification systems may be used to achieve high levels of stable gene expression (see Sambrook et al., 1989, supra).

A cell into which a recombinant vector comprising the nucleic acid encoding the β3-GalT has been introduced is cultured in an appropriate cell culture medium under conditions that provide for expression of the β3-GalT by the cell. If full-length β3-GalT is expressed, the expressed protein will comprise an integral transmembrane portion. If a β3-GalT lacking the transmembrane domain is expressed, the expressed soluble β3-GalT can then be recovered from the culture according to methods well known to persons of ordinary skill in the art.

Such methods are described in detail, infra.

Any of the methods previously described for the insertion of DNA fragments into a cloning vector may be used to construct expression vectors containing a gene consisting of appropriate transcriptional/translational control signals and the protein coding sequences. These methods may include in vitro recombinant DNA and synthetic techniques and in vivo recombination.

The polypeptides, their fragments or other derivatives, or analogs thereof, or cells expressing them can be used as an immunogen to produce antibodies thereto. These antibodies can be, for example, polyclonal or monoclonal antibodies. The present invention also includes chimeric, single chain and humanized antibodies, as well as Fab or F(ab')2 fragments, or the product of an Fab expression library. Various procedures known in the art may be used for the production of such antibodies and fragments.

Antibodies generated against the polypeptides corresponding to a sequence of the present invention can be obtained by direct injection of the polypeptides into an animal or by other appropriate forms of administering the polypeptides to an animal, preferably a nonhuman. The antibody so obtained will then bind the polypeptide itself. In this manner, even a sequence encoding only a fragment of the polypeptide can be used to generate antibodies binding the whole native polypeptide. Such antibodies can then be used to isolate the polypeptide from tissue expressing that polypeptide.

For preparation of monoclonal antibodies, any technique which provides antibodies produced by continuous cell line cultures can be used. Examples include the hybridoma technique (Kohler and Milstein, 1975, Nature 256: 495-497), the trioma technique, the human B-cell hybridoma technique (Kozbor et al., 1983, Immunology Today 4: 72), and the EBV-hybridoma technique to produce human monoclonal antibodies (Cole et al., 1985, in Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 77-96).

Techniques described for the production of single chain antibodies (U.S. Pat. No. 4,946,778) can be adapted to produce single chain antibodies to immunogenic polypeptide products of this invention.

The polyclonal or monoclonal antibodies may be labeled with a detectable marker, including various enzymes, fluorescent materials, luminescent materials and radioactive materials. Examples of suitable enzymes include horseradish peroxidase, alkaline phosphatase, β-galactosidase and acetylcholinesterase; examples of suitable fluorescent materials include umbelliferone, fluorescein, fluorescein isothiocyanate, rhodamine, dichlorotriazinylamine fluorescein, dansyl chloride and phycoerythrin; examples of luminescent materials include luminol and aequorin; and examples of suitable radioactive materials include 35S, 64Cu, 67Ga, 89Zr, 97Ru, 99mTc, 105Rh, 109Pd, 111In, 123I, 125I, 131I, 186Re, 198Au, 199Au, 203Pb, 211At, 212Pb and 212Bi. The antibodies may also be labeled with or conjugated to one partner of a ligand binding pair. Representative examples include avidin-biotin and riboflavin-riboflavin binding protein.

The antibodies discussed above may be conjugated to or labeled with the representative labels set forth above using conventional techniques, such as those described in U.S. Pat. No. 4,744,981 (Trichothecene Antibody); U.S. Pat. No. 5,106,951 (Antibody Conjugate); U.S. Pat. No. 4,018,884 (Fluorogenic Materials and Labeling Techniques); U.S. Pat. No. 4,897,255 (Metal Radionuclide Labeled Proteins for Diagnosis and Therapy); U.S. Pat. No. 4,988,496 (Metal Radionuclide Chelating Compounds for Improved Chelation Kinetics); Inman, Methods in Enzymology, Vol. 34, Affinity Techniques, Enzyme Purification: Part B, Jakoby and Wilchek (eds.), Academic Press, New York, p. 30, 1974; and Wilchek and Bayer, The Avidin-Biotin Complex in Bioanalytical Applications, Anal. Biochem. 171: 1-32, 1988.

Due to the degeneracy of nucleotide coding sequences, other DNA sequences which encode substantially the same amino acid sequence as a β3-GalT gene described herein may be used in the practice of the present invention. These include, but are not limited to, nucleotide sequences comprising all or portions of β3-GalT genes that are altered by the substitution of different codons encoding the same amino acid residue within the sequence, thus producing a silent change. Likewise, the β3-GalT derivatives of the invention include, but are not limited to, those containing, as a primary amino acid sequence, all or part of the amino acid sequence of the β3-GalT protein, including altered sequences in which functionally equivalent amino acid residues are substituted for residues within the sequence, resulting in a conservative amino acid substitution. For example, one or more amino acid residues within the sequence can be replaced by another amino acid of similar polarity, which acts as a functional equivalent. Substitutes for an amino acid within the sequence may be selected from, but are not limited to, other members of the class to which the amino acid belongs. For example, the nonpolar (hydrophobic) amino acids include alanine, leucine, isoleucine, valine, proline, phenylalanine, tryptophan and methionine. The polar (neutral) amino acids include glycine, serine, threonine, cysteine, tyrosine, asparagine and glutamine. The positively charged (basic) amino acids include arginine, lysine and histidine. The negatively charged (acidic) amino acids include aspartic acid and glutamic acid.
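A short sketch can make the codon-degeneracy and conservative-substitution rules in the preceding paragraph concrete. It assumes the standard genetic code and uses exactly the four residue classes listed above. The function names are illustrative only and are not part of the invention, and the test sequences are arbitrary.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Residue classes exactly as grouped in the text.
RESIDUE_CLASSES = {
    "nonpolar": set("ALIVPFWM"),
    "polar_neutral": set("GSTCYNQ"),
    "basic": set("RKH"),
    "acidic": set("DE"),
}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon ('*' marks a stop codon)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def is_silent(original: str, variant: str) -> bool:
    """True if two equal-length coding sequences encode the same protein."""
    return len(original) == len(variant) and translate(original) == translate(variant)


def residue_class(aa: str) -> str:
    for name, members in RESIDUE_CLASSES.items():
        if aa in members:
            return name
    raise ValueError(f"unknown amino acid: {aa!r}")


def is_conservative(aa_from: str, aa_to: str) -> bool:
    """True if the substitution stays within one of the listed classes."""
    return residue_class(aa_from) == residue_class(aa_to)


print(translate("ATGGCCTCTAAATCCTGGCTG"))  # MASKSWL
print(is_silent("CTG", "TTA"))             # True: both codons encode leucine
print(is_conservative("D", "E"))           # True: both acidic
```

Grouping residues by class mirrors the text. Other substitution schemes, such as scoring matrices, would give different answers for borderline pairs.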

The genes encoding β3-GalT derivatives and analogs of the invention can be produced by various methods known in the art. The manipulations which result in their production can occur at the gene or protein level. For example, the cloned β3-GalT gene sequence can be modified by any of numerous strategies known in the art (Sambrook et al., 1989, supra). The sequence can be cleaved at appropriate sites with restriction endonuclease(s), followed by further enzymatic modification if desired, isolated, and ligated in vitro. In producing a gene encoding a derivative or analog of β3-GalT, care should be taken to ensure that the modified gene remains within the same translational reading frame as the β3-GalT coding sequence, uninterrupted by translation stop signals, in the gene region where the desired activity is encoded.

Within the context of the present invention, β3-GalT may include various structural forms of the primary protein which retain biological activity. For example, β3-GalT polypeptide may be in the form of acidic or basic salts or in neutral form. In addition, individual amino acid residues may be modified by oxidation or reduction. Furthermore, various substitutions, deletions or additions may be made to the amino acid or nucleic acid sequences, the net effect being that the biological activity of β3-GalT is retained. Because of the degeneracy of the genetic code, for example, there may be considerable variation in the nucleotide sequences encoding the same amino acid sequence.

Mutations in nucleotide sequences constructed for expression of derivatives of β3-GalT polypeptide must preserve the reading frame of the coding sequences. Furthermore, the mutations will preferably not create complementary regions that could hybridize to produce secondary mRNA structures, such as loops or hairpins, which could adversely affect translation of the mRNA.
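The reading-frame requirement can be checked with a minimal sketch. An insertion, deletion or substitution keeps the downstream frame only if the net length change is a multiple of three, and it should not place a stop codon ahead of the natural one. The example sequence and helper names are hypothetical. The check does not address the mRNA secondary-structure criterion, which would need an RNA folding tool.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def edit_sequence(cds: str, start: int, end: int, replacement: str) -> str:
    """Replace cds[start:end] with `replacement` (0-based, end exclusive)."""
    return cds[:start] + replacement + cds[end:]


def keeps_reading_frame(cds: str, start: int, end: int, replacement: str) -> bool:
    """Check that an edit preserves the frame and adds no premature stop codon.

    Assumes `cds` begins at its ATG and ends with its natural stop codon.
    """
    net_change = len(replacement) - (end - start)
    if net_change % 3 != 0:
        return False  # frameshift: every downstream codon would change
    edited = edit_sequence(cds, start, end, replacement).upper()
    codons = [edited[i:i + 3] for i in range(0, len(edited) - 2, 3)]
    # Every codon before the final one must be a sense codon.
    return not any(codon in STOP_CODONS for codon in codons[:-1])


example = "ATGGCCTCTAAATCCTGGCTGTAA"  # hypothetical 8-codon ORF
print(keeps_reading_frame(example, 6, 9, ""))      # True: one whole codon removed
print(keeps_reading_frame(example, 6, 7, ""))      # False: frameshift
print(keeps_reading_frame(example, 3, 6, "TAA"))   # False: premature stop
```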

Mutations may be introduced at particular loci by synthesizing oligonucleotides containing a mutant sequence, flanked by restriction sites enabling ligation to fragments of the native sequence. Following ligation, the resulting reconstructed sequence encodes a derivative having the desired amino acid insertion, substitution, or deletion.

Alternatively, oligonucleotide-directed site-specific mutagenesis procedures may be employed to provide an altered gene having particular codons altered according to the substitution, deletion, or insertion required.

Deletions or truncations of β3-GalTs may also be constructed by utilizing convenient restriction endonuclease sites adjacent to the desired deletion. Subsequent to restriction, overhangs may be filled in and the DNA religated.

Exemplary methods of making the alterations set forth above are disclosed by Sambrook et al. (Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory Press, 1989).

As noted above, a nucleic acid sequence encoding a β3-GalT can be mutated in vitro or in vivo to create and/or destroy translation, initiation and/or termination sequences, to create variations in coding regions, and/or to form new restriction endonuclease sites or destroy preexisting ones, to facilitate further in vitro modification. Preferably, such mutations enhance the functional activity of the mutated β3-GalT gene product. Any technique for mutagenesis known in the art can be used, including, but not limited to, in vitro site-directed mutagenesis (Hutchinson, C., et al., 1978, J. Biol. Chem. 253: 6551; Zoller and Smith, 1984, DNA 3: 479-488; Oliphant et al., 1986, Gene 44: 177; Hutchinson et al., 1986, Proc. Natl. Acad. Sci. U.S.A. 83: 710), the use of TAB® linkers (Pharmacia), etc. PCR techniques are preferred for site-directed mutagenesis (see Higuchi, 1989, "Using PCR to Engineer DNA", in PCR Technology: Principles and Applications for DNA Amplification, H. Erlich, ed., Stockton Press, Chapter 6, pp. 61-70).

It is well known in the art that some DNA sequences within a larger stretch of sequence are more important than others in determining functionality. A skilled artisan can test allowable variations in sequence, without undue experimentation, by well-known mutagenic techniques which include, but are not limited to, those discussed by D. Shortle et al. (1981) Ann. Rev. Genet. 15: 265; M. Smith (1985) ibid. 19: 423; D. Botstein and D. Shortle (1985) Science 229: 1193; by linker scanning mutagenesis (S. McKnight and R. Kingsbury (1982) Science 217: 316); or by saturation mutagenesis (R. Myers et al. (1986) Science 232: 613). These variations may be determined by standard techniques in combination with assay methods described herein to enable those in the art to manipulate and bring into utility the functional units of upstream transcription activating sequence, promoter elements, structural genes, and polyadenylation signals. Using the methods described herein, the skilled artisan can, without undue experimentation, test altered sequences within the upstream activator for retention of function. All such shortened or altered functional sequences of the activating element sequences described herein are within the scope of this invention.

The nucleic acid molecule of the invention also permits the identification and isolation, or synthesis, of nucleotide sequences which may be used as primers to amplify a nucleic acid molecule of the invention, for example in the polymerase chain reaction (PCR), which is discussed in more detail below. The primers may be used to amplify the genomic DNA of other species which possess β3-GalT activity. The PCR-amplified sequences can be examined to determine the relationship between the various β3-GalT genes.

The length and bases of the primers for use in the PCR are selected so that the primers hybridize to different strands of the desired sequence. Their relative positions along the sequence are chosen so that an extension product synthesized from one primer, once separated from its template, can serve as a template for extension of the other primer into a nucleic acid of defined length.
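The primer geometry described above can be sketched as an in-silico PCR. The forward primer is identical to a top-strand segment (it anneals to the bottom strand). The reverse primer anneals to the top strand, so its reverse complement appears downstream in the top-strand sequence. The defined product runs from the 5' end of one primer to the 5' end of the other. The template and primers below are made-up sequences, and the sketch assumes exact matches with no mismatched or degenerate bases.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def predicted_amplicon(template: str, forward: str, reverse: str) -> Optional[str]:
    """Return the product a primer pair would define on the top strand, or None."""
    template = template.upper()
    start = template.find(forward.upper())
    if start < 0:
        return None
    # The reverse primer's binding site reads as its reverse complement on the top strand.
    reverse_site = reverse_complement(reverse.upper())
    site = template.find(reverse_site, start + len(forward))
    if site < 0:
        return None
    return template[start:site + len(reverse_site)]


template = "GGG" + "ACGTACGTAC" + "TTTTTTTT" + "GGCCAAGGTT" + "AAA"
reverse_primer = reverse_complement("GGCCAAGGTT")  # AACCTTGGCC
product = predicted_amplicon(template, "ACGTACGTAC", reverse_primer)
print(product, len(product))  # ACGTACGTACTTTTTTTTGGCCAAGGTT 28
```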

Primers which may be used in the invention are oligonucleotides of the nucleic acid molecule of the invention. They may occur naturally, as in purified products of restriction endonuclease digests, or may be produced synthetically using techniques known in the art, such as the phosphotriester and phosphodiester methods (see Good et al., Nucl. Acids Res. 4: 2157, 1977) or automated techniques (see, for example, Connolly, B. A., Nucleic Acids Res. 15(7): 3131, 1987). The primers are capable of acting as a point of initiation of synthesis when placed under conditions which permit the synthesis of a primer extension product complementary to the DNA sequence of the invention, i.e., in the presence of nucleotide substrates and an agent for polymerization such as DNA polymerase, and at a suitable temperature and pH.

Preferably, the primers are sequences that neither base-pair with other copies of the primer nor fold into a hairpin configuration. The primer may be single- or double-stranded. When the primer is double-stranded, it may be treated to separate its strands before being used to prepare amplification products. The primer preferably contains between about 7 and 50 nucleotides.
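The length and hairpin preferences above can be screened with a crude heuristic. The 4-base stem and 3-base loop thresholds are assumptions chosen for illustration. Dedicated primer-design software uses thermodynamic models instead, and this sketch does not check for primer-dimer formation.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def has_hairpin(primer: str, min_stem: int = 4, min_loop: int = 3) -> bool:
    """Crude check for a stem of `min_stem` bases pairing with a downstream
    segment of the same primer, separated by a loop of at least `min_loop` bases."""
    primer = primer.upper()
    for i in range(len(primer) - min_stem + 1):
        partner = reverse_complement(primer[i:i + min_stem])
        # The pairing segment must start after the stem plus the loop.
        if primer.find(partner, i + min_stem + min_loop) != -1:
            return True
    return False


def acceptable_primer(primer: str, min_len: int = 7, max_len: int = 50) -> bool:
    """Length within the preferred range and no obvious hairpin."""
    return min_len <= len(primer) <= max_len and not has_hairpin(primer)


print(has_hairpin("GGGGAAAACCCC"))                  # True: GGGG pairs with CCCC
print(acceptable_primer("CTTTATGTTGGCTAGAATCTGC"))  # True
```

The second call screens the 22-nucleotide 5'-RACE primer of SEQ ID NO: 23 from the examples. Under these thresholds the heuristic finds no stem and accepts it.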

The primers may be labeled with detectable markers which allow for detection of the amplified products. Suitable detectable markers include:

- radioactive markers, such as 32P, 35S, 125I and 3H;
- luminescent markers, such as chemiluminescent markers, preferably luminol;
- fluorescent markers, preferably dansyl chloride, fluorescein-5-isothiocyanate and 4-fluoro-7-nitrobenz-2-oxa-1,3-diazole;
- enzyme markers, such as horseradish peroxidase, alkaline phosphatase, β-galactosidase or acetylcholinesterase;
- biotin.

It will be appreciated that the primers may contain non-complementary sequences provided that a sufficient amount of the primer contains a sequence which is complementary to a nucleic acid molecule of the invention or oligonucleotide sequence thereof which is to be amplified. Restriction site linkers may also be incorporated into the primers, allowing for digestion of the amplified products with the appropriate restriction enzymes facilitating cloning and sequencing of the amplified product.
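Incorporating a restriction site linker into a primer amounts to prepending the enzyme's recognition sequence, usually with a few extra bases so that the enzyme can cut near the end of the amplified product. The sketch below assumes a 2-base "GC" clamp, which is an illustrative choice. It also checks whether the same site occurs inside the fragment to be cloned, which would cause unwanted internal cleavage. The recognition sequences listed are the standard ones for these enzymes, and the helper names are hypothetical.

```python
# Recognition sequences (5'->3', top strand) for a few common enzymes.
SITES = {"BamHI": "GGATCC", "XhoI": "CTCGAG", "XbaI": "TCTAGA", "EcoRI": "GAATTC"}


def add_site_linker(primer: str, enzyme: str, clamp: str = "GC") -> str:
    """Prepend a restriction site, plus a short 5' clamp, to a primer.

    The clamp length is an assumption; many enzymes cut poorly at the very
    end of a DNA fragment, so a few extra bases are commonly added.
    """
    return clamp + SITES[enzyme] + primer.upper()


def internal_sites(insert: str, enzyme: str) -> list:
    """0-based positions where `enzyme` would also cut inside the fragment."""
    site, insert = SITES[enzyme], insert.upper()
    return [i for i in range(len(insert) - len(site) + 1)
            if insert[i:i + len(site)] == site]


print(add_site_linker("ACGTACGTAC", "BamHI"))        # GCGGATCCACGTACGTAC
print(internal_sites("AAGGATCCTTGGATCC", "BamHI"))   # [2, 10]
```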

In an embodiment of the invention, a method is provided for determining the presence in a sample of a nucleic acid molecule having a sequence encoding a β3-GalT, or of a predetermined oligonucleotide fragment thereof. The method comprises treating the sample with primers capable of amplifying the nucleic acid molecule or the predetermined oligonucleotide fragment in a polymerase chain reaction, under conditions which permit the formation of amplified sequences, and assaying for amplified sequences.

The polymerase chain reaction refers to a process for amplifying a target nucleic acid sequence as generally described in Innis et al., Academic Press, 1990; in Mullis et al., U.S. Pat. No. 4,683,195; and in Mullis, U.S. Pat. No. 4,683,202, which are incorporated herein by reference. Conditions for amplifying a nucleic acid template are described in M. A. Innis and D. H. Gelfand, PCR Protocols, A Guide to Methods and Applications, M. A. Innis, D. H. Gelfand, J. J. Sninsky and T. J. White, eds., pp. 3-12, Academic Press, 1989, which is also incorporated herein by reference.

It will be appreciated that other techniques, such as the ligase chain reaction (LCR) and NASBA, may be used to amplify a nucleic acid molecule of the invention. In LCR, two primers which hybridize adjacent to each other on the target strand are ligated in the presence of the target strand to produce a complementary strand (Barany, in "PCR Methods and Applications", Aug. 1991, Vol. 1(1), page 4; and European Published Application No. 0320308, published Jun. 14, 1989). NASBA is a continuous amplification method using two primers. One incorporates a promoter sequence recognized by an RNA polymerase, and the second is derived from the complementary sequence of the target sequence to the first primer (U.S. Pat. No. 5,130,238 to Malek).

The present invention also provides novel fusion proteins in which any of the enzymes of the present invention is fused to a polypeptide such as protein A, streptavidin, fragments of c-myc, maltose binding protein, IgG, IgM, an amino acid tag, etc. In addition, the polypeptide fused to the enzyme of the present invention is preferably chosen to facilitate release of the fusion protein from a prokaryotic or eukaryotic cell into the culture medium, and to enable its (affinity) purification and possibly its immobilization on a solid phase matrix.

In another embodiment, the present invention provides novel DNA sequences which encode a fusion protein according to the present invention.

The present invention also provides novel immunoassays for the detection and/or quantitation of the present enzymes in a sample. The present immunoassays utilize one or more of the present monoclonal or polyclonal antibodies which specifically bind to the present enzymes. Preferably, the present immunoassays utilize a monoclonal antibody. The present immunoassay may be a competitive assay, a sandwich assay, or a displacement assay, such as those described in Harlow, E. et al., Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY (1988), and may rely on the signal generated by a radiolabel, a chromophore, or an enzyme such as horseradish peroxidase.

Alterations in core 1 β3-galactosyl transferase activity have been described in Tn-syndrome (Vainchenker et al. (1985) J. Clin. Invest. 75: 541), an exceedingly rare hematologic disorder which has been described in probably fewer than 50 patients. In addition, an alteration in the synthesis of the core 1 structure has been proposed as a possible etiology for IgA nephropathy syndrome, although this remains to be proven (Kokubo et al. (1997) J. Am. Soc. Nephrol. 8: 915). Core 1 β3-galactosyl transferase has also been demonstrated to be useful in the synthesis of glycosulfopeptides which can function as inhibitors of P-selectin:PSGL-1 interactions.

Therefore, the core 1 β3-galactosyl transferase enzymes of the present invention can be used for in vitro synthesis of glycosulfopeptides to block selectin:ligand interactions. Other envisioned uses for the core 1 β3-galactosyl transferase enzymes of the present invention include diagnostic tests for the rare Tn-syndrome or IgA nephropathy, as well as therapy of these disorders.

The invention will be more fully understood by reference to the following examples. However, the examples are merely intended to illustrate embodiments of the invention and are not to be construed to limit the scope of the invention.

Examples

Assay of Core 1 β3-Galactosyl Transferase Activity. Core 1 β3-galactosyl transferase activity was assayed as previously described (Brockhausen, I. (1992) Biochem. Cell Biol. 70: 99), with the following modifications. The assay contained 100 mM MES pH 6.8, 0.2% Triton X-100, 20 mM MnCl2, 1 mM phenyl-α-GalNAc, 4 mM [3H]-UDP-Gal (100,000-150,000 dpm/nmol), 2 mM ATP and 5-25 µl of sample containing core 1 β3-GalT, in a total volume of 50 µl. Mixtures were incubated at 37°C for 30-60 minutes, and the reactions were stopped by adding 950 µl of cold H2O. The mixtures were loaded onto 500 mg Sep-Pak® C18 cartridges previously activated with 2 ml ethanol and equilibrated with 10 ml water. Following application of the diluted reaction mixture, the columns were washed with 10 ml water and eluted with 1 ml n-butanol. Radioactivity was determined by liquid scintillation counting in 10 ml Scintiverse-BD.
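The radioactivity in the butanol eluate can be converted into enzyme activity using the specific radioactivity of the [3H]-UDP-Gal donor. This sketch assumes that counts have already been converted to dpm, i.e. counting efficiency has been applied. It also assumes that a blank reaction, for example one lacking phenyl-α-GalNAc, is subtracted. The blank design and the example numbers are assumptions, not data from this application.

```python
from typing import Optional


def gal_transferred_nmol(sample_dpm: float, blank_dpm: float,
                         udp_gal_dpm_per_nmol: float) -> float:
    """nmol of [3H]Gal incorporated into the butanol-extractable product."""
    net_dpm = max(sample_dpm - blank_dpm, 0.0)
    return net_dpm / udp_gal_dpm_per_nmol


def activity(sample_dpm: float, blank_dpm: float, udp_gal_dpm_per_nmol: float,
             minutes: float, protein_mg: Optional[float] = None) -> float:
    """Activity in nmol/h, or nmol/h/mg when the sample's protein content is given."""
    nmol = gal_transferred_nmol(sample_dpm, blank_dpm, udp_gal_dpm_per_nmol)
    rate = nmol * 60.0 / minutes
    return rate if protein_mg is None else rate / protein_mg


# Hypothetical counts: 27,500 dpm in the sample and 2,500 dpm in the blank,
# a donor at 125,000 dpm/nmol (within the stated range), 30-minute incubation.
print(gal_transferred_nmol(27_500, 2_500, 125_000))  # 0.2 nmol
print(activity(27_500, 2_500, 125_000, 30))          # 0.4 nmol/h
```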

Purification of Core 1 β3-Galactosyl Transferase. The enzyme has been purified from rat liver using an affinity chromatographic step consisting of neuraminidase-treated bovine submaxillary mucin immobilized on UltraLink™. The enzyme binds tightly to this support and is eluted with high salt. Because of this tight binding, core 1 β3-GalT elutes late in the elution pattern, after most nonspecifically bound proteins have already eluted.

Final purification was achieved by gel filtration chromatography on Superose 12. Overall, the enzyme was purified 71,000-fold in 8% yield, as shown in Figures 2 and 3.
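Fold purification and yield are conventionally computed from the total activity and total protein at each step. Fold purification is the ratio of specific activities, and yield is the fraction of the starting total activity recovered. The sketch below follows that convention. The numbers in the usage lines are hypothetical and are not the values behind Figures 2 and 3.

```python
from dataclasses import dataclass


@dataclass
class PurificationStep:
    name: str
    total_activity: float      # e.g. nmol Gal transferred per hour, whole fraction
    total_protein_mg: float

    @property
    def specific_activity(self) -> float:
        return self.total_activity / self.total_protein_mg


def fold_and_yield(start: PurificationStep, step: PurificationStep) -> tuple:
    """Fold purification and percent yield of `step` relative to `start`."""
    fold = step.specific_activity / start.specific_activity
    yield_percent = 100.0 * step.total_activity / start.total_activity
    return fold, yield_percent


# Hypothetical values for illustration only.
extract = PurificationStep("solubilized membranes", 1000.0, 8000.0)
final = PurificationStep("Superose 12 pool", 80.0, 0.0625)
print(fold_and_yield(extract, final))  # (10240.0, 8.0)
```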

Step 1: Homogenization, Subcellular Fractionation, Isolation of Membranes and Solubilization. Fresh rat liver (500 grams) was washed with cold 150 mM NaCl in 25 mM Tris-HCl pH 7.4 and homogenized in a Waring commercial blender with 2,000 ml of buffer containing 25 mM Tris-HCl pH 7.5, 0.25 M Sucrose, 1 mM PMSF, 2 µg/ml Leupeptin, 1 mM Benzamidine and 0.7 µg/ml Pepstatin A. The homogenate was centrifuged at 20,000 × g for 20 minutes, and the supernatant was then decanted and centrifuged at 100,000 × g for 60 minutes. The pellets were suspended in 5 volumes of buffer containing 50 mM Tris-HCl pH 9.0, 0.25 M Sucrose, 1 mM PMSF, 1 mM Benzamidine, 0.7 µg/ml Pepstatin A and 2 µg/ml Leupeptin. The suspension was sonicated four times for 10 seconds each in an ice bath, extracted on ice for 1 hour, and then centrifuged at 100,000 × g for 60 minutes. The supernatant was collected and its pH adjusted using 1 mM MES. The approximate volume of the supernatant containing solubilized membrane proteins was 1,000 ml.
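Preparing the homogenization buffer from concentrated stocks is a C1V1 = C2V2 calculation. The stock concentrations below are hypothetical. Only the final concentrations and the 2,000 ml volume come from Step 1. The same function works for the µg/ml protease inhibitors, provided the stock and final concentrations use the same units.

```python
def stock_volume_ml(final_conc: float, final_volume_ml: float, stock_conc: float) -> float:
    """C1*V1 = C2*V2 solved for the stock volume V1 (concentrations in matching units)."""
    if stock_conc <= final_conc:
        raise ValueError("stock must be more concentrated than the target")
    return final_conc * final_volume_ml / stock_conc


# Final concentrations (mM) from the text; stock concentrations (mM) are hypothetical.
RECIPE = {
    "Tris-HCl pH 7.5": (25.0, 1000.0),
    "Sucrose": (250.0, 2000.0),
    "PMSF": (1.0, 100.0),
    "Benzamidine": (1.0, 1000.0),
}

for component, (final_mM, stock_mM) in RECIPE.items():
    volume = stock_volume_ml(final_mM, 2000.0, stock_mM)
    print(f"{component}: {volume:.1f} ml of stock per 2,000 ml")
```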

Step 2: SP-Sepharose FF Chromatography. The solubilized membrane proteins were applied onto a 6 x 20 cm SP-Sepharose FF column (Pharmacia BioTech) equilibrated with equilibration buffer containing 25 mM MES pH 6.5, 0.1% Triton X-100, 5 mM MnCl2, 1 mM PMSF, 1 mM Benzamidine, 0.7 µg/ml Pepstatin A and 2 µg/ml Leupeptin. The column was washed with the same buffer, and core 1 β3-GalT was then step-eluted using 1 M NaCl in equilibration buffer.

Step 3: Asialo-BSM UltraLink™ Chromatography. The SP-Sepharose eluate was dialyzed and concentrated into equilibration buffer in an Amicon concentrator using a YM30 membrane. The sample was loaded onto a 1 x 5 cm Asialo-BSM UltraLink™ column equilibrated with a second equilibration buffer containing 25 mM MES pH 6.8, 0.01% Triton X-100, 10 mM MnCl2, 150 mM NaCl, 1 mM PMSF, 1 mM Benzamidine, 2 µg/ml Leupeptin and 0.7 µg/ml Pepstatin A. After washing with the same buffer, core 1 β3-GalT was eluted with 1 M NaCl in the second equilibration buffer without MnCl2. Fractions were collected, and core 1 β3-GalT activity was assayed as described above. The fractions containing core 1 β3-GalT were then pooled.

Step 4: Superose 12 Chromatography. The pooled samples containing core 1 β3-GalT from Asialo-BSM UltraLink™ chromatography were concentrated to a final volume of 200 µl using Centriprep 30 and Centricon 30 concentrators. They were then loaded on a 1.5 x 35 cm Superose 12 column (Pharmacia BioTech) equilibrated with a third equilibration buffer containing 25 mM Tris-HCl pH 7.2, 0.005% Triton X-100 and 150 mM NaCl. Core 1 β3-GalT was eluted with the same buffer, and fractions were pooled and assayed as described above.

Using the purified enzyme, amino-terminal and internal protein sequences were obtained by standard molecular biology techniques. A BLAST search of the NCBI EST database using the rat core 1 β3-GalT N-terminal peptide sequence identified a rat EST, AI059600.

Identification of Human Core 1 β3-GalT and Expression of Recombinant Core 1 β3-GalT in Mammalian Cells. A BLAST search with the rat EST sequence (AI059600) identified a human EST (T10488). The human EST was sequenced and found to contain a 1.6 kb insert that was incomplete at the 5' end. The human core 1 β3-GalT cDNA was completed by 5'-RACE using primers AP1 and 5'-CTTTATGTTGGCTAGAATCTGC-3' (SEQ ID NO: 23), with human placenta Marathon-Ready cDNA as template. Amplification was carried out at 94°C for 1 minute, followed by 35 cycles of 94°C for 30 seconds and 68°C for 2 minutes, and the reaction was then held at 68°C for 10 minutes. The 450 bp product was purified using a QIAquick column, ligated into pCR2.1, and sequenced.
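The cycling program above can be written down as data, which makes it easy to transfer to a thermocycler or to estimate run time. The data structure is an illustrative choice. The computed total counts only the time spent at the set temperatures and excludes ramping between them.

```python
from dataclasses import dataclass


@dataclass
class Step:
    temp_c: float
    seconds: int


# 5'-RACE program as stated in the text.
INITIAL = [Step(94, 60)]
CYCLE = [Step(94, 30), Step(68, 120)]
CYCLES = 35
FINAL = [Step(68, 600)]


def hold_time_seconds(initial, cycle, cycles, final) -> int:
    """Total time at the programmed temperatures (ramping not included)."""
    return (sum(s.seconds for s in initial)
            + cycles * sum(s.seconds for s in cycle)
            + sum(s.seconds for s in final))


total = hold_time_seconds(INITIAL, CYCLE, CYCLES, FINAL)
print(total, total / 60)  # 5910 98.5
```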

The 1794 bp cDNA encoding human core 1 β3-GalT is shown in SEQ ID NO: 2. The cDNA (SEQ ID NO: 2) and protein sequence (SEQ ID NO: 1) of human core 1 β3-GalT are shown in Figure 4. An open reading frame (SEQ ID NO: 12) of SEQ ID NO: 2 encodes a 363 amino acid type II transmembrane protein. The predicted 28 amino acid transmembrane domain (SEQ ID NO: 16) is underlined in Figure 3.
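Locating an open reading frame such as SEQ ID NO: 12 within a cDNA can be sketched as a scan of the three forward frames for the longest ATG-initiated stretch that ends in a stop codon. The sketch assumes the standard genetic code and an unambiguous A/C/G/T sequence, and it ignores the reverse strand. It is not the method used in this application. A 363-residue protein corresponds to 1,089 coding nucleotides plus a 3-nucleotide stop codon.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def longest_orf(seq: str) -> tuple:
    """Return (start, end, protein) for the longest ATG...stop ORF on this strand.

    `start` is the 0-based index of the A of ATG; `end` is exclusive and includes
    the stop codon. Returns (-1, -1, "") if no complete ORF is found.
    """
    seq = seq.upper()
    best = (-1, -1, "")
    for frame in range(3):
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] != "ATG":
                continue
            residues = []
            for j in range(i, len(seq) - 2, 3):
                aa = CODON_TABLE[seq[j:j + 3]]
                if aa == "*":
                    if len(residues) > len(best[2]):
                        best = (i, j + 3, "".join(residues))
                    break
                residues.append(aa)
    return best


def orf_length_nt(n_residues: int) -> int:
    """Nucleotides from ATG through the stop codon for an n-residue protein."""
    return 3 * (n_residues + 1)


print(longest_orf("CCATGGCCTCTAAATAGGG"))  # (2, 17, 'MASK')
print(orf_length_nt(363))                   # 1092
```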

An expression vector encoding wild-type human core 1 β3-GalT was constructed by ligating three components into a BamHI/XhoI-digested pcDNA3.1(+) vector (Invitrogen): a 1.5 kb XbaI/XhoI fragment of EST T10488, a 155 bp ApoI/XbaI fragment from the cloned 5'-RACE product, and the annealed oligonucleotides 5'-GATCCACCATGGCCTCTAAATCCTGGCTG-3' (SEQ ID NO: 19) and 5'-AATTCAGCCAGGATTTAGAGGCCATGGTG-3' (SEQ ID NO: 20).
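SEQ ID NO: 19 and 20 are designed to anneal into a short adapter. The following is a minimal sketch that:

- checks that the two oligonucleotides pair perfectly apart from their first four bases;
- reports the single-stranded 5' ends, GATC (compatible with a BamHI-cut end) and AATT (compatible with an ApoI-cut end);
- translates the coding stretch that follows the CACC sequence.

The 4-nt overhang length is an assumption read off the sequences, not something the text states.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ... in TCAG order
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS))


def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def translate(dna):
    """Translate complete codons from the first base; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def five_prime_overhangs(top, bottom, overhang=4):
    """Overhangs left when two oligos anneal, assuming each carries a 5' overhang
    of the same length and their remaining bases are fully complementary."""
    if top[overhang:] != reverse_complement(bottom[overhang:]):
        raise ValueError("duplex regions are not complementary")
    return top[:overhang], bottom[:overhang]


TOP = "GATCCACCATGGCCTCTAAATCCTGGCTG"     # SEQ ID NO: 19, 5'->3'
BOTTOM = "AATTCAGCCAGGATTTAGAGGCCATGGTG"  # SEQ ID NO: 20, 5'->3'

overhangs = five_prime_overhangs(TOP, BOTTOM)
duplex = TOP[4:]
start = duplex.find("ATG")                 # first ATG in the double-stranded part
print(overhangs, duplex[:start], translate(duplex[start:]))
```

The first seven codons after CACC translate to M-A-S-K-S-W-L. In the ligated construct, these would supply the initiator methionine and the first residues of the coding sequence.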

Human embryonic kidney 293T cells in 100 mm dishes were transiently transfected with the wild-type expression vector using FuGENE 6 according to the manufacturer's protocol. The cells were cultured in low-glucose Dulbecco's modified Eagle's medium containing 10% fetal calf serum. Cells were harvested at 24, 48 and 72 h and washed twice with cold TBS (25 mM Tris-HCl pH 7.4, 150 mM NaCl). They were then sonicated on ice (10 seconds, 3 times, Branson Cell Disruptor model 185, setting 5) in 300-500 µl of TBS containing 1 mM PMSF, 1 mM benzamidine-HCl, and 2 µg/ml leupeptin. Membrane fragments were collected by centrifugation (14,000 rpm for 10 minutes), solubilized in 0.5% Triton X-100, and assayed for core 1 β3-GalT activity as described hereinbefore.
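Centrifugation speeds are given here in rpm, whereas the capture assay below specifies 14,000 x g. The relation between the two depends on rotor radius: RCF = 1.118 x 10^-5 x r(cm) x rpm². The following is a minimal conversion sketch. The 7 cm rotor radius is a hypothetical value because the text does not identify the rotor.

```python
import math


def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) from rotor speed and radius: 1.118e-5 * r * rpm^2."""
    return 1.118e-5 * radius_cm * rpm ** 2


def rpm_for_rcf(rcf, radius_cm):
    """Inverse of rcf_from_rpm: speed required to reach a given x g at that radius."""
    return math.sqrt(rcf / (1.118e-5 * radius_cm))


RADIUS_CM = 7.0  # HYPOTHETICAL microcentrifuge rotor radius; not given in the text
print(f"14,000 rpm at r = {RADIUS_CM} cm is about {rcf_from_rpm(14000, RADIUS_CM):,.0f} x g")
print(f"14,000 x g at r = {RADIUS_CM} cm needs about {rpm_for_rcf(14000, RADIUS_CM):,.0f} rpm")
```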

Expression of Recombinant Soluble, Epitope-Tagged Core 1 β3-GalT in Mammalian Cells. An expression vector encoding soluble, epitope-tagged core 1 β3-GalT was constructed by ligating three components:

- a 1584 bp BsmI/XhoI fragment from EST T10488;
- BamHI/XhoI-digested pcDNA3.1(+)-TH, a modified form of the pcDNA3.1(+) vector constructed for expression of fusion proteins containing an NH2-terminal epitope for HPC4, a Ca2+-dependent monoclonal antibody to Protein C (Rezaie et al. (1992) J. Biol. Chem. 267: 26104);
- the annealed oligonucleotides 5'-GATCCTCATGCAAGG-3' (SEQ ID NO: 21) and 5'-TTGCATGAG-3' (SEQ ID NO: 22).

Expression of this plasmid in eukaryotic cells results in the synthesis of core 1 β3-GalT with 31 additional amino acids fused to Asp45 of the human core 1 β3-GalT sequence (SEQ ID NO: 11). The first 19 additional amino acids (SEQ ID NO: 24) correspond to the human transferrin signal peptide. This signal peptide is recognized during protein sorting and directs the protein through the secretory pathway for secretion from the cell. Additional amino acids 20-31 (SEQ ID NO: 25) correspond to the HPC4 epitope tag. The soluble form of core 1 β3-GalT that is secreted from the cell has the signal peptide removed but still contains the HPC4 epitope tag.
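The residue counts for the soluble fusion follow from the numbers stated above: a 363 residue native enzyme, fusion at Asp45, a 19 residue signal peptide and a 12 residue HPC4 tag. The following is a minimal sketch. It assumes (hypothetically) that the construct keeps every native residue from Asp45 to the native C-terminus and adds no residues beyond the 31 described.

```python
NATIVE_LENGTH = 363      # residues in full-length human core 1 beta3-GalT (SEQ ID NO: 1)
FUSION_START = 45        # Asp45, the first native residue retained
SIGNAL_PEPTIDE = 19      # human transferrin signal peptide (added residues 1-19)
HPC4_TAG = 12            # HPC4 epitope (added residues 20-31)


def fusion_lengths(native_length, fusion_start, signal, tag):
    """Residue counts for a signal peptide + tag fused onto a truncated enzyme.

    Assumes the construct retains native residues fusion_start..native_length
    and contains no other residues.
    """
    retained = native_length - fusion_start + 1
    precursor = signal + tag + retained
    secreted = precursor - signal          # after signal peptide cleavage
    return {"removed_native": fusion_start - 1, "retained_native": retained,
            "precursor": precursor, "secreted": secreted}


print(fusion_lengths(NATIVE_LENGTH, FUSION_START, SIGNAL_PEPTIDE, HPC4_TAG))
```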

Capture Assay for Soluble Form of Core 1 β3-GalT. Human 293T cells were transfected with the construct encoding the soluble form of core 1 β3-GalT. The cells were cultured and harvested as described hereinbefore for expression of wild-type core 1 β3-GalT. Following harvesting at 24, 48 and 72 h, the cells were directly assayed as described hereinbelow.

HPC4-Affi-Gel 10 (15 µl) was first equilibrated with equilibration buffer (50 mM Tris-HCl pH 7.2, 100 mM NaCl and 1 mM CaCl2). It was then incubated with 500 µl of medium at 4°C on a rotator for 2 h and spun for 5 minutes in a microcentrifuge at 14,000 x g. Both beads and supernatant were collected and saved. The beads were washed three times with 500 µl of 50 mM Tris-HCl pH 7.4, 1 mM NaCl, 1 mM CaCl2 and once with equilibration buffer. The beads, medium and supernatant were then assayed in the absence of Triton X-100 for core 1 β3-GalT activity as described hereinbefore.

Identification of Core 1 β3-Galactosyl Transferase Gene and Protein Sequences in Other Species and Homology to Human Core 1 β3-GalT. Genes from M. musculus, C. elegans and D. melanogaster that encode core 1 β3-galactosyl transferases are described herein. The 1469 bp M. musculus cDNA is shown in SEQ ID NO: 6. An open reading frame corresponding to bases 180-1271 of SEQ ID NO: 6, shown in SEQ ID NO: 14, encodes the protein sequence of M. musculus core 1 β3-GalT (SEQ ID NO: 5).

The M. musculus core 1 β3-GalT has 89% identity and 94% similarity to the human enzyme (SEQ ID NO: 1). The 1172 bp C. elegans β3-GalT gene is shown in SEQ ID NO: 8. An open reading frame corresponding to bases 1-1170 of SEQ ID NO: 8, shown in SEQ ID NO: 15, encodes the protein sequence of C. elegans core 1 β3-GalT (SEQ ID NO: 7). The C. elegans core 1 β3-GalT has 41% identity and 58% similarity to the human enzyme (SEQ ID NO: 1). Two highly homologous sequences derived from D. melanogaster have been identified and are designated D. melanogaster core 1 β3-GalT #1 and #2, respectively. The 1167 bp open reading frame of the D. melanogaster core 1 β3-GalT #1 gene is shown in SEQ ID NO: 18 and encodes the protein sequence of D. melanogaster core 1 β3-GalT #1 (SEQ ID NO: 17). The 1104 bp open reading frame of the D. melanogaster core 1 β3-GalT #2 gene is shown in SEQ ID NO: 10 and encodes the protein sequence of D. melanogaster core 1 β3-GalT #2 (SEQ ID NO: 9). The D. melanogaster core 1 β3-GalT #2 has 41% identity and 55% similarity to the human enzyme (SEQ ID NO: 1).

In addition, a cDNA for rat (R. norvegicus) core 1 β3-GalT has been identified herein and is shown in SEQ ID NO: 4. Bases 154-1245 of SEQ ID NO: 4 correspond to the open reading frame, shown in SEQ ID NO: 13. This open reading frame encodes the protein sequence of R. norvegicus core 1 β3-GalT, shown in SEQ ID NO: 3. The rat core 1 β3-GalT has 89% identity and 93% similarity to the human enzyme (SEQ ID NO: 1).

It will be appreciated that the invention includes nucleotide or amino acid sequences which have substantial sequence homology (identity) with the nucleotide and amino acid sequences shown in the Sequence Listings. The term "sequences having substantial sequence homology" includes nucleotide and amino acid sequences which have slight or inconsequential sequence variations from the sequences disclosed in the Sequence Listings. That is, the homologous sequences function in substantially the same manner to produce substantially the same polypeptides as the actual sequences. The variations may be attributable to local mutations or structural modifications.

Substantially homologous (identical) sequences further include sequences having at least 41% sequence homology (identity) with the β3-GalT polynucleotide or polypeptide sequences shown herein, or other percentages as defined elsewhere herein.

As noted elsewhere herein, the present invention includes polynucleotides represented by SEQ ID NO: 2, SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 8, SEQ ID NO: 10 and SEQ ID NO: 18, and the coding sequences thereof (SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 10 and SEQ ID NO: 18, respectively). These encode the proteins of SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID NO: 5, SEQ ID NO: 7, SEQ ID NO: 9 and SEQ ID NO: 17, respectively.

Each polynucleotide comprises untranslated regions upstream and/or downstream of the coding sequence, together with a coding sequence (which by convention includes the stop codon). The coding sequences SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 10 and SEQ ID NO: 18 belong to polynucleotides SEQ ID NO: 2, SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 8, SEQ ID NO: 10 and SEQ ID NO: 18, respectively. They encode polypeptides of 363, 363, 363, 389, 367 and 388 amino acids, respectively (SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID NO: 5, SEQ ID NO: 7, SEQ ID NO: 9 and SEQ ID NO: 17).
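The stated polypeptide lengths can be cross-checked against the coding-sequence coordinates given in this section. A coding sequence of n bp that ends in a stop codon encodes n/3 - 1 amino acids. The following is a minimal sketch using only lengths taken from the text.

```python
# (coding-sequence length in bp including the stop codon, reported polypeptide length)
CODING = {
    "human, bases 63-1154 of SEQ ID NO: 2": (1154 - 63 + 1, 363),
    "rat, bases 154-1245 of SEQ ID NO: 4": (1245 - 154 + 1, 363),
    "mouse, bases 180-1271 of SEQ ID NO: 6": (1271 - 180 + 1, 363),
    "C. elegans, bases 1-1170 of SEQ ID NO: 8": (1170, 389),
    "D. melanogaster #1, SEQ ID NO: 18": (1167, 388),
    "D. melanogaster #2, SEQ ID NO: 10": (1104, 367),
}


def residues_encoded(cds_bp):
    """Amino acids encoded by a coding sequence whose last codon is the stop codon."""
    if cds_bp % 3 != 0:
        raise ValueError(f"{cds_bp} bp is not a whole number of codons")
    return cds_bp // 3 - 1


for name, (bp, reported) in CODING.items():
    predicted = residues_encoded(bp)
    print(f"{name}: {bp} bp -> {predicted} aa (reported {reported})")
```

All six coding sequences are whole numbers of codons, and each predicted length agrees with the reported one.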

A comparison of the β3-GalTs identified herein revealed considerable homology in specific portions of the amino acid sequences. Each β3-GalT of SEQ ID NO: 3, SEQ ID NO: 5, SEQ ID NO: 7, SEQ ID NO: 9 and SEQ ID NO: 17 had homologous loci with 100% identity to amino acid residues 120-123, 167-173, 208-213, 254-259, 271-275 and 307-311 of SEQ ID NO: 1 (hβ3-GalT). These β3-GalTs further had homologous loci with at least 60%, 67% and 63% identity to hβ3-GalT amino acid residues 97-126, 143-224 and 239-330, respectively.
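The loci above are numbered by residue position in hβ3-GalT (SEQ ID NO: 1). To score them in a multiple alignment, those residue numbers must first be mapped onto alignment columns, because gaps shift positions. The following is a minimal sketch of that mapping followed by a per-region identity count. The two aligned toy sequences are hypothetical and are not taken from the Sequence Listing.

```python
def reference_columns(aligned_ref, start, end):
    """Alignment columns holding reference residues start..end (1-based, inclusive)."""
    columns, position = [], 0
    for column, residue in enumerate(aligned_ref):
        if residue != "-":
            position += 1
            if start <= position <= end:
                columns.append(column)
    if len(columns) != end - start + 1:
        raise ValueError("range extends past the end of the reference sequence")
    return columns


def region_identity(aligned_ref, aligned_other, start, end):
    """Percent of reference residues start..end matched identically in the other row.
    A gap in the other sequence counts as a mismatch."""
    columns = reference_columns(aligned_ref, start, end)
    same = sum(aligned_ref[c] == aligned_other[c] for c in columns)
    return 100.0 * same / len(columns)


# HYPOTHETICAL toy alignment (not from the Sequence Listing); '-' marks a gap
REF = "MA-SKSWLN"
OTHER = "MAGSKTWLN"
print(region_identity(REF, OTHER, 3, 5))   # reference residues S3, K4, S5
print(region_identity(REF, OTHER, 6, 8))   # reference residues W6, L7, N8
```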

A comparison of the overall homology of the β3-GalTs identified herein further reveals a considerable range in homology, as indicated in the alignment in Figure 5.

Homologies provided herein were calculated by ClustalW, a program component of MacVector Version 6.5 by the Genetics Computer Group at University Research Park, 575 Science Dr., Madison, WI 53711.

The term "identity" or "homology" as used herein is defined by the output called "Percent Identity" of a computer alignment program called ClustalW.
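The text defines identity as ClustalW's "Percent Identity" output but does not state which denominator that figure uses, and alignment programs differ on this point. The following is a minimal sketch of percent identity for one aligned pair, offering two common denominators: columns where neither sequence has a gap, or all alignment columns. It illustrates the idea and is not a reproduction of MacVector's calculation. The aligned pair is hypothetical.

```python
def percent_identity(a, b, denominator="ungapped"):
    """Percent identity of two equal-length aligned sequences ('-' = gap)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    pairs = list(zip(a, b))
    identical = sum(x == y and x != "-" for x, y in pairs)
    if denominator == "ungapped":      # columns where neither sequence has a gap
        n = sum(x != "-" and y != "-" for x, y in pairs)
    elif denominator == "all":         # every alignment column, gaps included
        n = len(pairs)
    else:
        raise ValueError("denominator must be 'ungapped' or 'all'")
    return 100.0 * identical / n


A = "MASK-SWL"   # HYPOTHETICAL aligned pair
B = "MATKGSWL"
print(round(percent_identity(A, B), 1), percent_identity(A, B, "all"))
```

Here the same alignment yields about 85.7% or 75.0% depending on how gap columns are counted. This is why the program and its settings are specified alongside any identity figure.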

"Similarity"values provided herein are also provided as an output of the ClustalW program using the alignment values provided below. As noted, this program is a component of widely used package of sequence alignment and analysis programs called MacVector Version 6.5, Genetics Computer Group (GCG), Madison, Wisc. The ClustalW program has two alignment variables, the gap creation penalty and the gap extension penalty, which can be modified to alter the stringency of a nucleotide and/or amino acid alignment produced by the program. The settings for open gap penalty and extend gap penalty used herein to define identity for amino acid alignments were as follows : Open Gap penalty = 10.0 Extend Gap penalty = 0.05 Delay Divergent = 40% The program used the BLOSUM series scoring matrix. Other parameter values used in the percent identity determination were default values previously established for the 6.5 version of the ClustalW program. (see Thompson, J. D. et al (1994) Nucleic Acids Res 22: 4673).

In general, polynucleotides which encode core 1 β3-galactosyl transferases are contemplated by the present invention. In particular, the present invention contemplates DNA sequences having SEQ ID NO: 2, SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 8, SEQ ID NO: 10 and SEQ ID NO: 18. It also contemplates DNA sequences comprising bases 63-1154 of SEQ ID NO: 2 (SEQ ID NO: 12), bases 154-1245 of SEQ ID NO: 4 (SEQ ID NO: 13), bases 180-1271 of SEQ ID NO: 6 (SEQ ID NO: 14), and bases 1-1170 of SEQ ID NO: 8 (SEQ ID NO: 15). The invention further comprises portions of said sequences which encode soluble forms of β3-GalTs.

The invention further contemplates DNA sequences which comprise portions of the polynucleotides of SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 10 and SEQ ID NO: 18 which encode soluble proteins having β3-galactosyl transferase activity. That is, the portions of the above polynucleotides which encode the N-terminal transmembrane region have been removed, and the remaining portions encode soluble proteins having β3-galactosyl transferase activity.
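A soluble form is defined by the first retained residue, for example Asp45 in the secreted construct described earlier. The corresponding position in the DNA follows from codon arithmetic: codon n begins at base 3(n - 1) + 1 of the coding sequence. The following is a minimal sketch. It assumes that SEQ ID NO: 12 begins at the initiator ATG, corresponding to base 63 of SEQ ID NO: 2, as the coordinate ranges in the text imply. It is arithmetic only and does not reproduce the BsmI-fragment cloning actually used.

```python
def codon_start(residue, cds_start=1):
    """1-based position of the first base of codon number `residue` (codon 1 = ATG)."""
    if residue < 1:
        raise ValueError("residue numbers start at 1")
    return cds_start + 3 * (residue - 1)


CDS_START, CDS_END = 63, 1154                  # bases 63-1154 of SEQ ID NO: 2, per the text
asp45_in_cds = codon_start(45)                 # position within SEQ ID NO: 12
asp45_in_cdna = codon_start(45, CDS_START)     # position within SEQ ID NO: 2
retained_bp = CDS_END - asp45_in_cdna + 1      # from the Asp45 codon through the stop codon
print(asp45_in_cds, asp45_in_cdna, retained_bp, retained_bp // 3 - 1)
```

The 960 bp retained stretch encodes 319 residues plus the stop codon, which matches the 363 - 44 residues left after removing the first 44 residues.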

The invention further contemplates polynucleotides which are at least about 50% homologous, 60% homologous, 70% homologous, 80% homologous or 90% homologous to the coding sequence SEQ ID NO: 12, where homology is defined as strict base identity, wherein said polynucleotides encode proteins having β3-galactosyl transferase activity.

The present invention further contemplates nucleic acid sequences which differ in the codon sequence from the nucleic acids defined herein due to the degeneracy of the genetic code, which allows different nucleic acid sequences to code for the same protein as is further explained herein above and as is well known in the art. The polynucleotides contemplated herein may be DNA or RNA. The invention further comprises DNA or RNA nucleic acid sequences which are complementary to the sequences described above.
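The degeneracy of the genetic code can be illustrated directly. Two DNA sequences that differ at many positions can still encode the same peptide. The following is a minimal sketch with the standard genetic code. The first sequence is the start of the coding stretch in the annealed oligonucleotide of SEQ ID NO: 19. The second is a hypothetical synonymous recoding chosen for illustration.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ... in TCAG order
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS))


def translate(dna):
    """Translate complete codons from the first base; a trailing partial codon is ignored."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


ORIGINAL = "ATGGCCTCTAAATCC"     # first five codons encoded within SEQ ID NO: 19
SYNONYMOUS = "ATGGCTAGCAAGAGT"   # HYPOTHETICAL recoding using different codons
differences = sum(x != y for x, y in zip(ORIGINAL, SYNONYMOUS))
print(translate(ORIGINAL), translate(SYNONYMOUS), differences)
```

The two 15-base sequences differ at 8 positions yet both translate to M-A-S-K-S.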

The present invention further comprises polypeptides which are encoded by the polynucleotide sequences described above. In particular, the present invention contemplates polypeptides having β3-galactosyl transferase activity, including SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID NO: 5, SEQ ID NO: 7, SEQ ID NO: 9 and SEQ ID NO: 17, and versions thereof which lack the transmembrane domain and are therefore soluble. The invention further contemplates polypeptides which are at least about 41% homologous, 50% homologous, 60% homologous, 70% homologous, 80% homologous, or 90% homologous to the polypeptides represented herein by SEQ ID NO: 1 or SEQ ID NO: 3, wherein homology is defined as strict identity. The present invention further contemplates polypeptides which have β3-galactosyl transferase activity and which have loci in substantially homologous positions with the following identities to SEQ ID NO: 1: about 60% or greater with residues 97-126, about 67% or greater with residues 143-224, and about 63% or greater with residues 239-330. The present invention further contemplates polypeptides which differ in amino acid sequence from the polypeptides defined herein by substitution with functionally equivalent amino acids, resulting in what are known in the art as conservative substitutions, as discussed above herein.
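The identity thresholds recited above can be compared with the identities to the human enzyme reported earlier in this section. The following is a minimal sketch that treats each "about" threshold as a hard cutoff (an assumption) and reports the highest tier each reported sequence meets.

```python
# Identity thresholds recited for polypeptides, treated here as hard cutoffs
TIERS = (41, 50, 60, 70, 80, 90)

# Percent identities to the human enzyme (SEQ ID NO: 1) as reported in the text
REPORTED_IDENTITY = {
    "M. musculus (SEQ ID NO: 5)": 89,
    "R. norvegicus (SEQ ID NO: 3)": 89,
    "C. elegans (SEQ ID NO: 7)": 41,
    "D. melanogaster #2 (SEQ ID NO: 9)": 41,
}


def highest_tier(identity, tiers=TIERS):
    """Largest threshold met by `identity`, or None if it is below all of them."""
    met = [t for t in tiers if identity >= t]
    return max(met) if met else None


for name, identity in REPORTED_IDENTITY.items():
    print(f"{name}: {identity}% -> highest tier met: {highest_tier(identity)}%")
```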

Also included in the invention are isolated DNA sequences which encode polypeptides having β3-galactosyl transferase activity and which hybridize to the DNAs set forth in SEQ ID NO: 2, SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 8, SEQ ID NO: 10 or SEQ ID NO: 18 under stringent or relaxed conditions (as is well known to persons of ordinary skill in the art).
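Hybridization stringency is commonly reasoned about relative to the melting temperature (Tm) of the duplex. The following is a minimal sketch using a widely quoted empirical approximation for long DNA-DNA hybrids:

Tm = 81.5 + 16.6 log10[Na+] + 0.41(%GC) - 0.61(%formamide) - 500/L

The text itself specifies no hybridization conditions. The salt, GC content and formamide values below are hypothetical. The 1092 bp length is that of SEQ ID NO: 12.

```python
import math


def dna_hybrid_tm(na_molar, gc_percent, length_bp, formamide_percent=0.0):
    """Approximate Tm (deg C) of a long DNA-DNA duplex:
    81.5 + 16.6*log10[Na+] + 0.41*(%GC) - 0.61*(%formamide) - 500/L."""
    if na_molar <= 0 or length_bp <= 0:
        raise ValueError("Na+ concentration and length must be positive")
    return (81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent
            - 0.61 * formamide_percent - 500.0 / length_bp)


# HYPOTHETICAL inputs: a 1092 bp probe with an assumed 45% GC content
for na, formamide in [(0.1, 0.0), (0.1, 50.0)]:
    tm = dna_hybrid_tm(na, 45.0, 1092, formamide)
    print(f"[Na+] = {na} M, {formamide:.0f}% formamide: Tm ~ {tm:.1f} C")
```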

In summary, as shown herein, several β3-galactosyl transferases have been cloned and expressed: at least three mammalian core 1 β3-galactosyl transferases, a C. elegans β3-galactosyl transferase and two D. melanogaster β3-galactosyl transferases. These enzymes catalyze galactosylation of an N-acetylgalactosamine linked to a serine or threonine on a protein, polypeptide or peptide.

The present invention is not to be limited in scope by the specific embodiments described herein, since such embodiments are intended as but single illustrations of one aspect of the invention and any functionally equivalent embodiments are within the scope of this invention. Indeed, various modifications of the invention in addition to those shown and described herein will become apparent to those skilled in the art from the foregoing description and accompanying drawings. Such modifications are intended to fall within the scope of the appended claims.

It is also to be understood that all base pair sizes given for nucleotides are approximate and are used as examples for the purpose of description.

Changes may be made in the construction and the operation of the various compositions and elements described herein, or in the steps or the sequence of steps of the methods described herein, without departing from the spirit and scope of the invention as defined in the following claims.
