Title:
VON WILLEBRAND FACTOR
Document Type and Number:
WIPO Patent Application WO/1986/006096
Kind Code:
A1
Abstract:
Von Willebrand's Factor (VWF) is produced using an expression vector that includes: 1) a DNA sequence encoding a functional VWF protein; and 2) regulatory DNA capable of effecting expression of that DNA sequence in a host cell transformed with the vector. Restriction fragment length polymorphisms (RFLP's) associated with the VWF gene are identified and used in a probe for determining the source of a VWF gene in a DNA sample. The gene for VWF is localized to the short arm of human chromosome 12 (12p).

Inventors:
GINSBURG DAVID (US)
ORKIN STUART H (US)
KAUFMAN RANDAL J (US)
Application Number:
PCT/US1986/000760
Publication Date:
October 23, 1986
Filing Date:
April 10, 1986
Assignee:
CHILDRENS MEDICAL CENTER (US)
International Classes:
G01N33/50; C07K14/00; C07K14/745; C07K14/755; C12N5/00; C12N5/10; C12N15/00; C12N15/09; C12N15/85; C12P21/02; C12Q1/68; A61K38/00; C12R1/91; (IPC1-7): C12N15/00; C07H21/04; C07K15/06; C12Q1/68
Domestic Patent References:
WO1985001961A1 1985-05-09
Foreign References:
US4386068A 1983-05-31
EP0128018A2 1984-12-12
Other References:
Proceedings National Academy Sciences, USA, Volume 81, issued August 1984 (Washington D.C., USA), (S. TIMMONS et al), "ADP-Dependent Common Receptor Mechanism for Binding of von Willebrand Factor and Fibrinogen to Human Platelets", see pages 4935-4939, especially 4935 and 4936.
Science, Volume 228, issued 21 June 1985, (Washington, D.C., USA), (D. GINSBURG et al)., "Human von Willebrand Factor (vWF): Isolation of Complementary DNA (cDNA) Clones and Chromosomal Localization", see pages 1401-1406.
Proceedings National Academy Sciences, USA, Volume 82, issued October, 1985, (Washington, D.C. USA), (J.E. SADLER et al), "Cloning and Characterization of two cDNAs Coding for Human von Willebrand Factor", see pages 6394-6398.
Nucleic Acids Research, Volume 13, issued July 1985, (Oxford, England), (C.L. VERWEIJ et al), "Construction of cDNA Coding for Human von Willebrand Factor Using Antibody Probes for Colony-Screening and Mapping of the Chromosomal Gene", see pages 4699-4717.
Cell, Volume 41, issued May 1985, (Cambridge, Massachusetts, USA), (D.C. LYNCH et al), "Molecular Cloning of cDNA for Human von Willebrand Factor: Authentication by a New Method", see pages 49-56.
Thromb. Haemostas, Volume 50, issued 1983, (Stuttgart, West Germany), (V. CHAN et al), "Cell-Free Synthesis of Factor VIII Related Protein", see pages 835-837
Nucleic Acids Research, Volume 13, issued November 1985, (Oxford, England), (C.L. VERWEIJ et al), "RFLP for a Human von Willebrand Factor (vWF) cDNA Clone. pvWF1100", see page 8289.
Nature, Volume 312, issued 22 November 1984, (London, England), (W.I. WOOD et al), "Expression of Active Human Factor VIII from Recombinant DNA Clones", see pages 330-337, especially 334 & 335.
E.B. BROWN ed., Progress in Hematology, Volume XIII, 1983, GRUNE & STRATTON (New York, New York, USA), (T.S. ZIMMERMAN et al.), "Factor VIII/von Willebrand Factor", see pages 279-309, especially 279-285.
Federation Proceedings, Volume 44, 5 March 1985, (Bethesda Maryland, USA), (J.E. SADLER et al.), "Cloning and Characterization of a cDNA for Human von Willebrand Factor", see pages 1069, Abstract 3950.
See also references of EP 0218692A4
Claims:
What is claimed is:
1. A DNA sequence encoding human Von Willebrand Factor (VWF), said protein having the amino acid sequence substantially as depicted in Table 2, said DNA sequence being substantially free of other human genes.
2. A DNA sequence of claim 1 which is further characterized by a nucleotide sequence substantially as depicted in Table 2 or a sequence capable of hybridizing thereto.
3. An expression vector comprising a DNA sequence of claim 1 operatively linked to regulatory DNA capable of effecting expression of said DNA sequence in a host cell to produce active VWF.
4. A method for producing biologically active human VWF substantially free from other human proteins which comprises culturing a eukaryotic cell containing the expression vector of claim 3.
5. A method of claim 4 wherein the eukaryotic cell is a mammalian cell.
6. A method of claim 5 wherein the mammalian cell is a Chinese Hamster Ovary (CHO) cell.
7. A method of claim 5 wherein the mammalian cell is a COS cell.
8. VWF produced by the method of claim 4.
9. VWF produced by the method of claim 5.
10. VWF produced by the method of claim 6.
11. A DNA probe for analyzing a mammalian DNA sample, said probe comprising DNA encoding VWF or a fragment thereof, labeled with a detectable label.
12. The probe of claim 11 wherein said DNA encoding VWF is human DNA.
13. A method of analyzing a mammalian DNA sample comprising contacting said DNA with the probe of claim 12 and determining whether said probe hybridizes with said sample.
14. The method of claim 13 wherein said DNA sample is human DNA.
15. A method of analyzing a mammalian DNA sample comprising: fragmenting said DNA sample by restriction enzyme digestion, separating said DNA fragments on the basis of size, contacting said fragments with the probe of claim 12 and determining the relative lengths of said DNA fragments.
16. The method of claim 15 wherein said DNA sample is human DNA.
Description:
VON WILLEBRAND FACTOR

Background of the Invention

This invention relates to mammalian Von Willebrand Factor (VWF), which is also known as Factor VIIIR, and to methods for obtaining and using VWF and for analyzing a mammalian DNA sample for the presence of a VWF gene.

Blood coagulation in mammals involves the interaction of a number of protein factors and tissue components. One coagulation factor complex is called Factor VIII and is composed of at least two distinct proteins: Factor VIIIC (antihemophilic factor, the protein that corrects the coagulation disorder known as hemophilia A), and VWF. VWF is a protein that binds to platelets and is an essential component in platelet-vessel wall interaction during the clotting process. Zimmerman et al., Progress in Hematology Vol. XIII, "Factor VIII/Von Willebrand Factor" (Grune & Stratton 1983). Diminished or abnormal VWF activity can result in Von Willebrand's Disease (VWD), a relatively common and complex hereditary bleeding disorder. The Factor VIII complex, obtained as a cryoprecipitate from donor blood, is administered as a therapy for VWD. Mitra, U.S. Patent 4,386,068. cDNA's specific for other coagulation proteins, e.g., Factor VIIIC, have been cloned and expressed in host systems [Wood et al. (1984) Nature 312:330-337].

VWF is particularly difficult to analyze and produce because it is very large and very little, if any, sequence data has been available heretofore. Moreover, cells that produce substantial amounts of VWF, e.g., endothelial cells and megakaryocytes, are difficult to grow in culture.

Summary of the Invention

One aspect of the invention generally features producing VWF using an expression vector that includes: 1) a DNA sequence encoding a functional mammalian VWF protein; and 2) regulatory DNA capable of effecting expression of that DNA sequence in a host cell transformed with the vector. By a functional mammalian VWF protein we mean a protein that corresponds sufficiently to a naturally occurring mammalian VWF protein to have or to be processed to have the function of a naturally occurring Von Willebrand factor; processing may include glycosylation and/or assembly into multimers. By a host cell we mean any suitable host cell such as a mammalian cell. At least some of the expression vector is exogenous to the VWF-encoding DNA sequence, meaning that it does not naturally occur in the same molecule with that sequence.

In preferred embodiments, the VWF is human VWF, and the host cell for producing the functional VWF is a eukaryotic cell, most preferably a mammalian cell.

A second aspect of the invention generally features analyzing a mammalian DNA sample using a probe comprising DNA encoding a VWF protein, or a fragment thereof, labeled with a detectable label. The probe is contacted with the sample to determine whether it hybridizes with the sample.

In a third aspect, the mammalian DNA that is to be analyzed is fragmented using restriction enzyme digestion, the DNA fragments are separated on the basis of size, the fragments are contacted with the above described probe, and the relative lengths of the DNA fragments that hybridize to the probe are determined.
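The digestion and sizing steps of this third aspect can be illustrated with a minimal sketch. It is not taken from the application: the two allele sequences are hypothetical toy strings, the helper name `digest` is an assumption, and EcoRI (recognition site GAATTC, cutting after the first G) is used only as a familiar example enzyme.

```python
def digest(seq, site="GAATTC", cut_offset=1):
    """Return fragment lengths after cutting seq at every occurrence of site.

    cut_offset is the position inside the site where the enzyme cuts
    (EcoRI cuts G^AATTC, i.e. one base into the site).
    """
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Hypothetical alleles: allele_b has lost the second site through one base change.
allele_a = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "TTTT"
allele_b = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAGTTC" + "TTTT"

# Sorting by size mimics separation of the fragments on a gel.
frags_a = sorted(digest(allele_a), reverse=True)
frags_b = sorted(digest(allele_b), reverse=True)
print(frags_a, frags_b)
```

The two alleles give different fragment-length patterns, which is the property a labeled probe would reveal after hybridization.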

In preferred embodiments of both the second and third aspects, both the probe and the DNA being analyzed are human DNA.

The invention thus provides a relatively plentiful and pure source of VWF for treating bleeding disorders, for example, by administering: VWF to those with VWD; VWF in a stable complex with factor VIIIC to those with hemophilia A; or VWF to those with bleeding disorders associated with renal failure. The invention also provides diagnostic and research tools for evaluating VWF genes and defects in them using labeled probes comprising the VWF gene or fragments of it. For example, a mammalian DNA sample can be analyzed using a DNA probe to detect a restriction fragment length polymorphism (RFLP) specifically associated with a VWF-related gene. By a VWF-related gene, we mean a normal VWF gene, or DNA characterized by the absence of part or all of the VWF gene or by a mutant VWF gene. Restriction fragment length polymorphism means an identifiable DNA sequence that is associated with the VWF-related gene and is conserved together with the VWF-related gene. RFLP's can be used to determine the pattern of inheritance of VWF genes, e.g., in a fetus at risk for one of the various forms of VWD. Other features and advantages of the invention will be apparent from the following descriptions of the preferred embodiments and from the claims.

Description of the Preferred Embodiment

We now describe preferred embodiments of the invention, first briefly describing the drawings.

I. Drawings

Fig. 1A represents a restriction map of VWF DNA.

Fig. 1B represents a restriction map of the 3' portion of VWF cDNA derived from a plasmid designated pVWd and the 3' end of the overlapping clone pVWE6.

Fig. 2 is a diagram of the chromosomal localization for the human VWF gene as determined by in situ hybridization.

II. Structure and Use

Section A below describes how to make functional human VWF by first assembling cDNA fragments to yield a cDNA coding for a functional human VWF protein and then expressing that cDNA in a host cell.

Section B describes the use of cDNA encoding VWF protein to analyze sample DNA, e.g., by detecting restriction fragment length polymorphisms (RFLP's) that are associated with the human gene. An assay can then be performed to determine the source of a VWF gene in a human DNA sample, e.g., which allele of each parent was inherited.
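As a rough illustration of the parent-of-origin question, the sketch below matches a child's RFLP pattern against the parents' patterns. The fragment lengths are hypothetical, the function name `inherited_alleles` is an assumption, and each allele is simply labeled by the size in kb of the hybridizing fragment.

```python
def inherited_alleles(mother, father, child):
    """Return the (maternal, paternal) allele pairs consistent with the child.

    Each genotype is a pair of RFLP alleles, labeled here by the length
    in kb of the restriction fragment that hybridizes to the probe.
    """
    child_sorted = sorted(child)
    return [
        (m, f)
        for m in sorted(set(mother))
        for f in sorted(set(father))
        if sorted((m, f)) == child_sorted
    ]


# Hypothetical fragment lengths in kb; not data from the application.
mother = (5.2, 7.4)
father = (5.2, 9.0)
child = (7.4, 9.0)

result = inherited_alleles(mother, father, child)
print(result)
```

A single pair in the result means the polymorphism is informative for this family; several pairs would mean the pattern cannot distinguish the parental alleles.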

A. Functional VWF Protein

The process of making functional VWF protein involves: A) obtaining human mRNA; B) making a cDNA library from that mRNA and identifying cDNA fragments coding for VWF; C) assembling those fragments into a cDNA molecule encoding a functional VWF protein; and D) producing VWF by cloning that cDNA molecule into an expression vector that can be used to transform a eukaryotic host cell.

1. Obtaining the mRNA

As a source of VWF mRNA, a primary culture of human umbilical vein endothelial cells (HUVEC) is grown and passaged in cell culture in Medium 199 with 20% fetal bovine serum in the presence of bovine endothelial cell growth factor and fibronectin according to the method of Maciag et al. [J. Cell Biol. 91:420-426 (1981), J. Cell Biol. 94:511-520 (1982)]. Growth is markedly enhanced by the addition of heparin as described by Thornton et al. [Science 222:623-625 (1983)]. To verify the presence of VWF mRNA, both the cultured cells and the conditioned medium are tested for the presence of VWF using anti-VWF antibody obtained by standard techniques. Standard immunofluorescence and ELISA assays, respectively, can be used for this purpose. After four additional passages, cells are harvested and total RNA prepared in guanidine HCl by standard techniques.

2. Constructing a cDNA Library

Poly-A+ mRNA is isolated from total endothelial cell RNA by oligo-dT cellulose column chromatography. Two cDNA pools for the preparation of two different cDNA libraries are synthesized from the mRNA using standard techniques. For the first cDNA pool, oligo-dT is used as primer for the first strand synthesis, whereas for the second- cDNA pool strand. The cDNA pools are made blunt-ended by treatment with T4 DNA polymerase and ligated to EcoRI linkers with T4 DNA ligase after protection of internal EcoRI sites by treatment with E. coli methylase. The linker-ligated cDNA's are then digested with an excess of EcoRI restriction enzyme and separated from free linkers by passage over a Sepharose CL4B column.

The phage vector selected for carrying the VWF cDNA into a bacterial host is lambda gt11, a derivative of bacteriophage lambda which contains a bacterial gene for beta-galactosidase with a single EcoRI cloning site located near its 3' end, corresponding to the C-terminal portion of the beta-galactosidase protein. cDNA molecules are inserted into this site to construct a cDNA library.

By infecting an appropriate strain of bacteria with this phage using known techniques, a fusion protein will be produced containing most of beta-galactosidase at its amino-terminus and a peptide fragment of the protein of interest at the carboxy-terminus. If this cDNA-encoded peptide contains one of the antigenic determinants of VWF, it is detected by screening with anti-VWF antibody. Large numbers of phage particles can be grown on a bacterial plate and the protein products transferred to a nitrocellulose filter and screened with a specific antibody to identify the location of the recombinant plaque producing the protein of interest.

Specifically, the VWF cDNA is ligated into EcoRI digested, phosphatase treated lambda gt11 vector DNA and two libraries containing between 3-4 x 10^6 recombinant clones each are plated and amplified. Nonrecombinant background as assessed by growth on IPTG/XGal plates is approximately 30%. Recombinant clones are obtained having cDNA inserts ranging in size from approximately 1 to 3 kilobases (kb) in length.

Affinity purified rabbit heteroantiserum prepared against human factor VIII-VWF is obtained using standard methods. The antiserum is passed over gelatin-sepharose, and adsorbed and eluted from a column of VWF-sepharose.

Recombinant clones from the above lambda gt11 endothelial cell cDNA libraries are screened as phage plaques in E. coli host strain Y1090 with this antibody at a 1:1000 dilution. Potential positive plaques are purified, replated and rescreened. For example, one primary filter screened with anti-human VWF antibody showed a positive plaque designated LVWd. As a positive control, purified VWF protein can be spotted onto the filter and can be detected at amounts between 100 and 0.1 nanograms (ng) total protein.

Positive plaques from the rescreening are purified and phage DNA prepared by standard methods. Purification and characterization of the cDNA insert in the above-described LVWd plaque are described below.

The 553 bp cDNA insert of LVWd is purified by agarose gel electrophoresis following EcoRI digestion and used as a probe to examine Northern blots of total mRNA from endothelial cells prepared as described above. Northern blot analysis was performed on total cell RNA from HPB-ALL (a T-cell line), endothelial cells (HUVEC), fibroblasts, and HeLa cells, with the LVWd cDNA insert as the hybridization probe. The LVWd cDNA probe hybridized with a single mRNA band between 8 and 10 kb in length. This mRNA species is large enough to code for a protein on the order of 250K daltons in molecular weight. This mRNA species was detected only in endothelial cells; no hybridization was observed with RNA's from the controls, i.e., human fibroblasts, HeLa cells, or a human T-cell line (HPB-ALL). Thus the cDNA insert of clone LVWd corresponds to a segment of an mRNA molecule that is present only in endothelial cells and that is large enough to code for VWF. The clone contains a polypeptide epitope which reacts with affinity purified anti-VWF antibody.
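The statement that an 8 to 10 kb message is large enough for a protein of about 250K daltons can be checked with a back-of-envelope calculation. The sketch below assumes an average residue mass of roughly 110 daltons, a common rule of thumb that is not stated in the application, and ignores the contribution of glycosylation to apparent molecular weight.

```python
AVG_RESIDUE_DA = 110  # assumed average mass of one amino acid residue (rule of thumb)


def coding_length_kb(protein_kda):
    """Approximate coding-sequence length in kb needed for a protein of the given mass."""
    residues = protein_kda * 1000 / AVG_RESIDUE_DA
    return residues * 3 / 1000  # three nucleotides per residue


needed = coding_length_kb(250)
fits = needed <= 8  # 8 kb is the lower bound of the observed mRNA band
print(round(needed, 1), fits)
```

Under that assumption a 250K dalton polypeptide needs a little under 7 kb of coding sequence, which fits within the observed band.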

The cDNA insert of LVWd is subcloned into plasmid pUC-13 (P-L Biochemicals) yielding the plasmid pVWd. Fig. 1A is a restriction map of cDNA coding for a VWF protein. Fig. 1B represents a restriction fragment map and sequencing strategy for DNA from the 3' portion of VWF cDNA derived from pVWd and the 3' end of the overlapping clone pVWE6 (see Fig. 1A). In both Fig. 1A and Fig. 1B, the DNA segment designated pVWD is the above-described cDNA fragment of the plasmid pVWD. Solid lines and arrows in Fig. 1B indicate regions sequenced by the method of Maxam and Gilbert [Meth. Enzymol. 65:499-560 (1980)], with the solid circles indicating the end labeled restriction site. Dotted lines and arrows indicate regions sequenced by the method of Sanger et al. [Proc. Natl. Acad. Sci. U.S.A. 74:5463-5467 (1977)] using PstI fragments subcloned into M13 mp11.

Table 1 shows the DNA sequence of the cDNA fragment of Fig. 1B. The cDNA sequence contains a single large open reading frame encoding 193 amino acids followed by a single termination codon, in the same reading frame and orientation as the beta-galactosidase gene with which it was fused, consistent with expression of a fusion protein product. The predicted amino acid sequence for 193 amino acid residues at the carboxy-terminus of VWF is also shown using the standard single letter amino acid code. The six nucleotides at the beginning and at the end of the DNA sequence correspond to the synthetic EcoRI linker introduced by the cloning procedure. The single termination codon is marked by a diamond. Arrowheads indicate two potential N-glycosylation sites.
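Potential N-glycosylation sites of the kind marked in Table 1 are conventionally recognized by the sequon Asn-X-Ser/Thr, where X is any residue other than proline. The sketch below scans a peptide for that pattern; the peptide is a hypothetical toy string rather than the Table 1 sequence, and the function name `n_glyc_sites` is an assumption.

```python
import re


def n_glyc_sites(protein):
    """Return 1-based positions of Asn in N-X-S/T sequons (X is any residue except Pro)."""
    # The lookahead lets overlapping candidates such as "NNT" each be tested.
    return [m.start() + 1 for m in re.finditer(r"N(?=[^P][ST])", protein)]


peptide = "MKNNTGACNPSQNVT"  # hypothetical peptide, single letter code
sites = n_glyc_sites(peptide)
print(sites)
```

In the toy peptide the asparagine followed by proline is rejected, while the two asparagines followed by X-Thr are reported.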

The chromosomal assignment of the VWF gene is established by the use of cDNA obtained as described above. Fig. 2 represents the hybridization patterns which indicate the localization of the VWF gene to the short arm of chromosome 12 (12p) .

LVWd, pVWH5 and pVWH33 are on deposit with the American Type Culture Collection (Rockville, Maryland, U.S.A.), under the respective accession numbers 53088, 53089 and 53090.

3. Assembling cDNA Fragments

In order to build a cDNA segment corresponding to a VWF protein, an insert from a positive plaque such as the 553 base pair insert of LVWd is used as a probe to rescreen the above-described HUVEC libraries, and to produce the restriction map of VWF cDNA shown in Fig. 1A.

Specifically, the cDNA insert is purified in low melt agarose (Bethesda Research Labs) following EcoRI digestion.


TABLE 1

10 20 30 40 50 60
AA TTC CGG AAG ACC ACC TCC AAC C C TCC CCC CTG GGT TAG AAG GAA GAA AAT sfC ΛCA GGT
K T T C N P C P G Y K E S N N T G

70 80 90 100 110 120
GAA GT TGT GGG AGA GΓ TΓG ccr ACG GCT TCC ACC AT. CAG CTA ACA GGA GGA CAG ATC
E C C G R C P T A c T I Q _, R G G Q I

130 140 150 160 170 180
_ΓG ACA C G AAG CGT GAT GAG ACG CTC CAG GAT GGC TG GA ACT C TTC TGC AAG C-TC
M T L K R D Ξ Q D G C D T H F C K V

190 200 210 220 230 240
AAT GAG ΛGA GGA GAG TAG TTC TCC GAG AAG AGG GTC ACA GGC TGC CCA CCC GAT GAA
N E R G E Y ? W E K R V T G C P P D S

250 260 270 280 290 300
CAC AAG TGT CTG GCT GAG GGA GGT AM ATT ATG /W_ ATT CCA CCC ACC TGC TCT GAC ACA
H X C L A ε G G K I M K I P G T C C D

310 320 330 340 350 360
τσr GAG GAG CCT GAG GC AAC GAC ATC ACT GCC AGG CTG CAG TAT GTC AAG GTG GGA ACC
c N D I T R L Q Y V K V G S

370 380 390 400 410 420
TCT AAG TCT GAA GTA G VACG GGTTGG GGAATT AATTCC : CCA(C TAC TGC CAG GGC AAA TGT GCC AGC AAA GCC
C S S ε v Eε VV DD II HH Y C Q G K C A S S A

430 440 450 460 470 480
A__C TAC TCC ATT GAC ATC AAC GAT GTG CAG GAC CAG TGC TCC TCC TGC TCT CCG ACA CGG
M Y S I D I N D V Q D Q C S C C S P T R

490 500 510 520 530 540
ACG GAG CCC ATG CAG GTG GCC CTG CAC TGC ACC AAT GGC TCT GTT GTG TAC CAT GAG GTT
T E P H Q V A L H C T N G S V V Y H Ξ V

550 560 570 580 590 600
C- AAT GCC ATG GAG TGC A TGCTCCCK AO- AAG TGC AGC AAG TGAα.C T^
L N A M E C K C S P R K C S K

610 620 630
TGC ATG GGT GCC TGC TGC TGC CGG AAT T

The purified insert is subcloned into the EcoRI digested and phosphatased pUC-13 plasmid (P-L Biochemicals) to yield pVWD (see Fig. 1A). The EcoRI insert is radiolabeled and used as a probe to rescreen the HUVEC cDNA library. The positive recombinant phage are purified and subcloned into pUC-13. Two such recombinants, pVWE2 and pVWE6, are illustrated in Fig. 1A. The restriction map is deduced by standard methods. A 217 bp EcoRI/PstI fragment from the 5' end of pVWE2 is used to rescreen the HUVEC cDNA library and a third series of overlapping VWF cDNA clones is identified, one of which, pVWG8b, is shown in Fig. 1A. Using the same methods, a fourth series of overlapping clones is obtained, including pVWH33 and pVWH5 which, together with the sequence shown in Table 1, span DNA sufficient to encode an entire monomer of human VWF protein. By using the appropriate restriction enzymes, reagents, and procedures, one skilled in the art can ligate together cDNA fragments of pVWH33, pVWH5, and the sequence of Table 1, for example, to construct a cDNA encoding an entire monomer of VWF protein.
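The idea of spanning a long coding region with overlapping clones can be sketched in code. This is only an in-silico analogy: the application joins real clones by ligation at shared restriction sites, whereas the sketch merges strings on their longest suffix-prefix overlap. The three short fragments are toy strings cut from the first coding nucleotides shown in Table 2, and the function name `merge_overlapping` is an assumption.

```python
def merge_overlapping(left, right, min_overlap=4):
    """Join two sequences on the longest suffix of left that equals a prefix of right."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return left + right[size:]
    raise ValueError("no overlap of the required length")


# Toy fragments for illustration only; real clones are kilobases long.
clone_5prime = "ATGATTCCTGCCAGATTTGCC"
clone_middle = "AGATTTGCCGGGGTGCTGCTT"
clone_3prime = "GCTGCTTGCTCTGGCCCTC"

contig = merge_overlapping(merge_overlapping(clone_5prime, clone_middle), clone_3prime)
print(contig, len(contig))
```

Each merge keeps the shared region once, so the contig is shorter than the sum of the fragment lengths.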

Construction and Expression of Full-Length cDNA

More specifically, cDNA clones pVWH33, pVWH5 and pVWE6, which span 9 kb of DNA and encompass the entire protein coding region of VWF, were selected to construct full length cDNA. Nucleotide sequence of the ends of these clones confirmed that together they include the translational start, protein coding and translational stop sequences. The full length cDNA was constructed by standard techniques using fragments derived from individual clones. A fragment of pVWH33 from the left hand EcoRI site to the unique BamHI site was linked to a pVWH5 fragment from the BamHI site to the Sac I site. This was then linked to a fragment from pVWE6 from the Sac I site to the right hand EcoRI site. The full-length cDNA was then inserted as an EcoRI fragment into the EcoRI site of the expression vector pMT2 in which transcription occurs under the control of the adenovirus major late promoter.

pMT2 is a derivative of the mammalian cell expression vector p91023(B) (Wong et al., 1985, Science 228:810) in which the tetracycline resistance marker is replaced by the ampicillin resistance marker. The functional elements of the VWF expression plasmid have been previously described (Kaufman, 1985, Proc. Natl. Acad. Sci. USA 82:689). pMT2-VWF contains the SV40 origin and enhancer element; the adenovirus major late promoter with the first, second and two thirds of the third tripartite leader; an intron from the 5' splice site from the first late leader of adenovirus and 3' splice site from an immunoglobulin gene (Kaufman and Sharp, 1982); the VWF cDNA; DHFR coding region, and SV40 early polyadenylation site; the adenovirus VA genes in a derivative of pBR322 containing the ColE1 origin of replication and ampicillin resistance.

pMT2-VWF was grown in E. coli DH5 in order to prevent deletion of the VWF sequence. Plasmid DNA was prepared by twice banding to equilibrium on CsCl gradients.

4. Production and Use of VWF

The above-described cDNA encoding a VWF protein can be inserted into a suitable vector and expressed in any one of a number of mammalian expression systems known to the art, for example using the general method described by Wood et al. (1984) Nature 312:330-337. The resulting product, with any necessary post-translational processing, yields a mature Von Willebrand factor. Host systems can be selected for appropriate post-translational processing of the VWF gene product, and enable efficient recovery of VWF. Active VWF has thus been expressed in COS monkey cells and in CHO cells, for example, as described below. Pure VWF produced in this way will be useful in the treatment of VWD, and patients with chronic renal failure whose abnormal bleeding times are corrected by crude cryoprecipitate. Pure VWF can also be used to carry, stabilize, and improve the therapeutic efficacy of factor VIII:C.

Expression of VWF in Monkey COS Cells.

The SV40-transformed COS monkey cells (clone M6) have been described (Horowitz et al., 1983, J. Mol. Appl. Genet. 2:147-149). DNA transfections using pMT2 and pMT2-VWF were performed as described (Kaufman, 1985, Proc. Natl. Acad. Sci. USA 82:689) by the DEAE-dextran procedure with the addition of a chloroquine treatment (Sompayrac and Danna, 1981, Proc. Natl. Acad. Sci. 78:7575-7578; Luthman and Magnusson, 1983, Nuc. Acids Res. 11:1295-1308). Transfected cells were fed with DMEM (Dulbecco's Modified Eagle's Media) with 10% fetal bovine serum for 48 hr. Then the media was removed, the cells rinsed, and serum-free DMEM applied (4 ml per 3 x 10^6 cells) for measurement of VWF using an inhibition ELISA assay in which purified VWF was adsorbed onto the surface of microtiter wells followed by anti-VWF antibody. The ability of test material to displace anti-VWF antibody from the immobilized antigen was tested using peroxidase conjugated anti-rabbit IgG as the indicator substance. Media from COS cells transfected with expression vector pMT2-VWF containing VWF cDNA produced between 50 and 300 ng/ml VWF antigen in three separate transfections. COS cells transfected with vector pMT2 alone did not produce any protein reacting in the ELISA assay.

Processing of Recombinant VWF

A transfection of COS cells was performed as above, and 72 hours post-transfection, the media was replaced with fresh cysteine-free media containing 35S-cysteine. After an additional 1 to 5 hours of incubation, the media was removed and cell extracts were prepared as described (Kaufman and Sharp, 1982, J. Mol. Biol. 159:601). The cell extracts and media were then used for studies of VWF processing and multimer assembly. VWF was immunoprecipitated by incubation with rabbit anti-human VWF antibody, followed by protein-A sepharose. The immunoprecipitated material was washed in a buffer containing 0.1% SDS and Nonidet P40 to minimize non-specific adsorption of other proteins to the immune complex. The precipitated proteins were then analyzed by 4 to 6% SDS-PAGE in the presence and absence of 1 mM DTT. Recombinant cell lysates contained a band migrating with an apparent molecular weight of 260 kd. Cell media contained a mixture of 260 kd and 220 kd species. Immunoprecipitates derived from COS cells or media transfected with non-recombinant plasmid did not contain these two bands. Analysis of non-reduced species by 4% SDS-PAGE showed a series of four or five very high molecular weight proteins varying from 1 to 3 or 4 million daltons.

Biological Activity of Recombinant VWF

COS cells were transfected as above and 48 hr post-transfection were rinsed and fed with serum free media. Two hundred ml of serum free COS cell media was collected after incubation with COS cells for 24 hours. It contained between 50 and 200 ng/ml VWF by ELISA. The serum free media was concentrated by dialysis against 50% Ficoll to a concentration of 2.2 ug/ml VWF protein for use in the competitive binding assays.

This recombinant VWF was used in varying concentrations as a competing ligand against purified, radiolabelled human VWF multimers in assays as described by Loscalzo, J. and Handin, R.I., 1984, Biochem. 23:3880. A concentration of 1 ug/ml competed for 50% of the collagen binding (I.C. 50). This was ten fold less than the I.C. 50 when purified human VWF was used as competing ligand. COS media from cells transfected with pMT2 plasmid alone did not compete for collagen binding sites. Similarly, the I.C. 50 for recombinant VWF binding to platelet glycoprotein Ib was 2 ug/ml when using freshly isolated platelets, and 5 ug/ml when using formalin fixed platelets as the source of receptor. These values are identical to those obtained with purified human VWF. Again, media from COS cells transfected with nonrecombinant plasmid did not compete for binding. These results demonstrate that COS cell derived VWF is functional as determined by collagen and platelet binding.

VWF Expression in CHO Cells

Although a number of systems are available for the production of VWF in mammalian cells, one particularly useful approach to obtain high level expression is to select for cells that contain a high degree of amplification of the heterologous VWF gene. One amplifiable marker which is available for this approach is the dihydrofolate reductase gene for which cells harbouring increased gene copies can be selected by propagation in increasing concentrations of methotrexate (MTX) (Kaufman and Sharp, 1982, J. Mol. Biol. 159:601). This approach can be used to select and amplify the VWF gene in a variety of different cell types and has been used to obtain expression of active, full-length human VWF in Chinese hamster ovary cells.

Coamplification and Coexpression of VWF and DHFR in DHFR Deficient Chinese Hamster Ovary (CHO) Cells

The VWF expression plasmid pMT2-VWF and the DHFR expression plasmid pAdD26SV(A)3 (Kaufman and Sharp, 1982, Mol. Cell. Biol. 2:1304) were introduced into DHFR deficient CHO DUKX-BII cells by calcium-phosphate coprecipitation and transfection. DHFR+ transformants were selected for growth in alpha media with dialyzed fetal calf serum and subsequently selected for amplification by growth in increasing concentrations of MTX (sequential steps in 0.02, 0.2, 1.0, and 5.0 uM) as described (Kaufman et al., 1985, Mol. Cell. Biol. 5:1750). One transformant, designated XMTVWF-C1, was isolated in alpha media and propagated in MTX as described above. The expression of VWF, monitored by ELISA, as a function of increasing levels of MTX resistance is shown below.

uM MTX    ng/ml/day

0.2       56
1.0       91
5.0       278

VWF expression increased with increasing levels of MTX resistance.
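The size of the increase can be read off the three reported values. The short calculation below expresses each expression level as a multiple of the level at 0.2 uM MTX; the numbers are those reported above, and the variable names are arbitrary.

```python
# uM MTX -> VWF expression in ng/ml/day, as reported in the table above
expression = {0.2: 56, 1.0: 91, 5.0: 278}

baseline = expression[0.2]
fold = {mtx: level / baseline for mtx, level in expression.items()}

for mtx, ratio in sorted(fold.items()):
    print(f"{mtx} uM MTX: {ratio:.2f}x")
```

On these figures, stepping from 0.2 to 5.0 uM MTX raises measured VWF expression roughly five fold.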

The VWF derived from the CHO cells was assayed by a direct ELISA assay using a rabbit anti-human VWF antibody (Calbiochem) immobilized on microtiter plates and a secondary antibody conjugated to horseradish peroxidase (Dacco). There was minimal activity in the media from the original CHO cells (less than 5 ng/ml). Values were determined by comparison to a standard derived from normal human pooled plasma (1 unit/ml) which was assumed to contain 10 ug/ml of VWF. VWF expression has also been verified by 35S-cysteine labeling of the cells and analysis of the conditioned media and cell extracts by immunoprecipitation with rabbit anti-human-VWF antisera (Calbiochem) and electrophoresis on SDS-polyacrylamide gels as described above for the VWF derived from transfected COS cells.
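Because the ELISA values rest on the stated assumption that 1 unit/ml of pooled plasma contains 10 ug/ml of VWF, a mass concentration can be converted back to plasma units as sketched below. The function name is an assumption, and the 278 ng/ml input is used only as an example reading.

```python
STANDARD_UG_PER_UNIT = 10.0  # assumed VWF content of 1 unit of pooled plasma (from the text)


def ng_per_ml_to_units_per_ml(ng_per_ml):
    """Convert a VWF concentration in ng/ml to plasma units/ml under the stated assumption."""
    return ng_per_ml / 1000.0 / STANDARD_UG_PER_UNIT


units = ng_per_ml_to_units_per_ml(278)
print(units)
```

If the true VWF content of the plasma standard differs from 10 ug/ml, every mass value derived from it scales by the same factor.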

5. Nucleotide sequence of VWF rDNA

The VWF DNA sequence was derived from the same overlapping cDNA clones which were used in the construction of the full-length expressed clone pMT2-VWF. An additional 70 bp of 5' untranslated region, derived from the most 5' clone isolated, pVWK7, has been included. The entire sequence was determined on both strands using the Sanger dideoxy method on single-stranded M13 subclones (Sanger, F., Nicklen, S. and Coulson, A.R., 1977, Proc. Natl. Acad. Sci. U.S.A. 74:5463-5467; Messing, J., 1983, Methods Enzymol. 101:30-78). Subclones for sequencing were generated by exonuclease digestion of the inserts of cDNA clones pVWH33, pVWH5, and pVWE2, using nuclease Bal31 or T4 DNA polymerase. Gaps were completed by subcloning appropriate restriction fragments from the same clones into M13 mp10 and mp11.

The sequence of 8588 base pairs is shown in Table 2. It contains a continuous open reading frame encoding a polypeptide of 2815 amino acids.

There are three lines of evidence supporting the authenticity of the indicated translational start site. First, there is an upstream nonsense codon in the major open reading frame. Second, the only other upstream start codon is followed almost immediately by an in-frame stop codon. Finally, the presumptive initiator methionine is followed by a classical signal peptide sequence, as expected for this secreted glycoprotein.

In order to characterize the 5' untranslated region, several other clones containing the 5' segments of the VWF cDNA were partially sequenced, but no two independent clones were found to have the same 5' end. As shown here, pVWK7 extended the farthest 5'.

The apparent discrepancy between the length needed to encode an estimated 260 kd VWF precursor (about 7 kb) and the observed VWF message size of 8-9 kb has been previously noted, and has led some investigators to postulate the presence of an extremely long 5' untranslated region (Lynch et al., 1985, Cell 41:49). The presence of an 8.3 kb continuous open reading frame clearly shows that the primary VWF translation product is much larger than the 260 kd suggested by SDS-PAGE. The predicted molecular weight is approximately


TABLE 2. VWF cDNA sequence with deduced amino acid sequence

10 20 30 40 50 60 70

GAATTCCGCA GCTGAGAGCA TGGCCTAGGG TGGGCGGCAC CATTGTCCAG CAGCTGAGTT TCCCAGGGAC

80 90 100 121

CTTGGAGATA GCCGCAGCCC TCATTTGCAG GGGAAG ATG ATT CCT GCC AGA TTT GCC GGG

M I P A R F A G

136 151 166 181

GTG CTG CTT GCT CTG GCC CTC ATT TTG CCA GGG ACC CTT TGT GCA GAA GGA ACT

V L L A L A L I L P G T L C A E G T

CGC GGC AGG TCA TCC ACG GCC CGA TGC AGC CTT TTC GGA AGT GAC TTC GTC AAC R G R S S T A R C S L F G S D F V N

241 256 271 286

ACC TTT GAT GGG AGC ATG TAC AGC TTT GCG GGA TAC TGC AGT TAC CTC CTG GCA

T F D G S M Y S F A G Y C S Y L L A

301 316 331 346

GGG GGC TGC CAG AAA CGC TCC TTC TCG ATT ATT GGG GAC TTC CAG AAT GGC AAG

G G C Q K R S F S I I G D F Q N G K

361 376 391

AGA GTG AGC CTC TCC GTG TAT CTT GGG GAA TTT TTT GAC ATC CAT TTG TTT GTC

R V S L S V Y L G E F F D I H L F V

406 421 436 451

AAT GGT ACC GTG ACA CAG GGG GAC CAA AGA GTC TCC ATG CCC TAT GCC TCC AAA

N G T V T Q G D Q R V S M P Y A S K

TABLE 2 (cont'd)

466 481 496

GGG CTG TAT CTA GAA ACT GAG GCT GGG TAC TAC AAG CTG TCC GGT GAG GCC TAT

G L Y L E T E A G Y Y K L S G E A Y

511 526 541 556

GGC TTT GTG GCC AGG ATC GAT GGC AGC GGC AAC TTT CAA GTC CTG CTG TCA GAC

G F V A R I D G S G N F Q V L L S D

571 586 601 616

AGA TAC TTC AAC AAG ACC TGC GGG CTG TGT GGC AAC TTT AAC ATC TTT GCT GAA

R Y F N K T C G L C G N F N I F A E

631 646 661

GAT GAC TTT ATG ACC CAA GAA GGG ACC TTG ACC TCG GAC CCT TAT GAC TTT GCC

D D F M T Q E G T L T S D P Y D F A

676 691 706 721

AAC TCA TGG GCT CTG AGC AGT GGA GAA CAG TGG TGT GAA CGG GCA TCT CCT CCC N S W A L S S G E Q W C E R A S P P

736 751 766

AGC AGC TCA TGC AAC ATC TCC TCT GGG GAA ATG CAG AAG GGC CTG TGG GAG CAG

S S S C N I S S G E M Q K G L W E Q

781 796 811 826

TGC CAG CTT CTG AAG AGC ACC TCG GTG TTT GCC CGC TGC CAC CCT CTG GTG GAC

C Q L L K S T S V F A R C H P L V D

841 856 871 886

CCC GAG CCT TTT GTG GCC CTG TGT GAG AAG ACT TTG TGT GAG TGT GCT GGG GGG

P E P F V A L C E K T L C E C A G G

TABLE 2 (cont'd)

901 916 931

CTG GAG TGC GCC TGC CCT GCC CTC CTG GAG TAC GCC CGG ACC TGT GCC CAG GAG

L E C A C P A L L E Y A R T C A Q E

946 961 976 991

GGA ATG GTG CTG TAC GGC TGG ACC GAC CAC AGC GCG TGC AGC CCA GTG TGC CCT

G M V L Y G W T D H S A C S P V C P

1006 1021 1036

GCT GGT ATG GAG TAT AGG CAG TGT GTG TCC CCT TGC GCC AGG ACC TGC CAG AGC A G M E Y R Q C V S P C A R T C Q S

1051 1066 1081 1096

CTG CAC ATC AAT GAA ATG TGT CAG GAG CGA TGC GTG GAT GGC TGC AGC TGC CCT L H I N E M C Q E R C V D G C S C P

1111 1126 1141 1156

GAG GGA CAG CTC CTG GAT GAA GGC CTC TGC GTG GAG AGC ACC GAG TGT CCC TGC

E G Q L L D E G L C V E S T E C P C

1171 1186 1201

GTG CAT TCC GGA AAG CGC TAC CCT CCC GGC ACC TCC CTC TCT CGA GAC TGC AAC

V H S G K R Y P P G T S L S R D C N

1216 1231 1246 1261

ACC TGC ATT TGC CGA AAC AGC CAG TGG ATC TGC AGC AAT GAA GAA TGT CCA GGG

T C I C R N S Q W I C S N E E C P G

1276 1291 1306

GAG TGC CTT GTC ACA GGT CAA TCA CAC TTC AAG AGC TTT GAC AAC AGA TAC TTC

E C L V T G Q S H F K S F D N R Y F

TABLE 2 (cont'd)

1321 1336 1351 1366

ACC TTC AGT GGG ATC TGC CAG TAC CTG CTG GCC CGG GAT TGC CAG GAC CAC TCC

T F S G I C Q Y L L A R D C Q D H S

1381 1396 1411 1426

TTC TCC ATT GTC ATT GAG ACT GTC CAG TGT GCT GAT GAC CGC GAC GCT GTG TGC F S I V I E T V Q C A D D R D A V C

1441 1456 1471

ACC CGC TCC GTC ACC GTC CGG CTG CCT GGC CTG CAC AAC AGC CTT GTG AAA CTG

T R S V T V R L P G L H N S L V K L

1486 1501 1516 1531

AAG CAT GGG GCA GGA GTT GCC ATG GAT GGC CAG GAC GTC CAG CTC CCC CTC CTG

K H G A G V A M D G Q D V Q L P L L

1546 1561 1576

AAA GGT GAC CTC CGC ATC CAG CAT ACA GTG ACG GCC TCC GTG CGC CTC AGC TAC

K G D L R I Q H T V T A S V R L S Y

1591 1606 1621 1636

GGG GAG GAC CTG CAG ATG GAC TGG GAT GGC CGC GGG AGG CTG CTG GTG AAG CTG G E D L Q M D W D G R G R L L V K L

1651 1666 1681 1696

TCC CCC GTC TAT GCC GGG AAG ACC TGC GGC CTG TGT GGG AAT TAC AAT GGC AAC S P V Y A G K T C G L C G N Y N G N

1711 1726 1741

CAG GGC GAC GAC TTC CTT ACC CCC TCT GGG CTG GCG GAG CCC CGG GTG GAG GAC

Q G D D F L T P S G L A E P R V E D

TABLE 2 (cont'd)

1756 1771 1786 1801

TTC GGG AAC GCC TGG AAG CTG CAC GGG GAC TGC CAG GAC CTG CAG AAG CAG CAC

F G N A W K L H G D C Q D L Q K Q H

1816 1831 1846

AGC GAT CCC TGC GCC CTC AAC CCG CGC ATG ACC AGG TTC TCC GAG GAG GCG TGC

S D P C A L N P R M T R F S E E A C

1861 1876 1891 1906

GCG GTC CTG ACG TCC CCC ACA TTC GAG GCC TGC CAT CGT GCC GTC AGC CCG CTG

A V L T S P T F E A C H R A V S P L

1921 1936 1951 1966

CCC TAC CTG CGG AAC TGC CGC TAC GAC GTG TGC TCC TGC TCG GAC GGC CGC GAG

P Y L R N C R Y D V C S C S D G R E

1981 1996 2011

TGC CTG TGC GGC GCC CTG GCC AGC TAT GCC GCG GCC TGC GCG GGG AGA GGC GTG C L C G A L A S Y A A A C A G R G V

2026 2041 2056 2071

CGC GTC GCG TGG CGC GAG CCA GGC CGC TGT GAG CTG AAC TGC CCG AAA GGC CAG

R V A W R E P G R C E L N C P K G Q

2086 2101 2116

GTG TAC CTG CAG TGC GGG ACC CCC TGC AAC CTG ACC TGC CGC TCT CTC TCT TAC

V Y L Q C G T P C N L T C R S L S Y

2131 2146 2161 2176

CCG GAT GAG GAA TGC AAT GAG GCC TGC CTG GAG GGC TGC TTC TGC CCC CCA GGG

P D E E C N E A C L E G C F C P P G

TABLE 2 (cont'd)

2191 2206 2221 2236

CTC TAC ATG GAT GAG AGG GGG GAC TGC GTG CCC AAG GCC CAG TGC CCC TGT TAC

L Y M D E R G D C V P K A Q C P C Y

2251 2266 2281

TAT GAC GGT GAG ATC TTC CAG CCA GAA GAC ATC TTC TCA GAC CAT CAC ACC ATG Y D G E I F Q P E D I F S D H H T M

2296 2311 2326 2341

TGC TAC TGT GAG GAT GGC TTC ATG CAC TGT ACC ATG AGT GGA GTC CCC GGA AGC C Y C E D G F M H C T M S G V P G S

2356 2371 2386

TTG CTG CCT GAC GCT GTC CTC AGC AGT CCC CTG TCT CAT CGC AGC AAA AGG AGC

L L P D A V L S S P L S H R S K R S

2401 2416 2431 2446

CTA TCC TGT CGG CCC CCC ATG GTC AAG CTG GTG TGT CCC GCT GAC AAC CTG CGG

L S C R P P M V K L V C P A D N L R

2461 2476 2491 2506

GCT GAA GGG CTC GAG TGT ACC AAA ACG TGC CAG AAC TAT GAC CTG GAG TGC ATG

A E G L E C T K T C Q N Y D L E C M

2521 2536 2551

AGC ATG GGC TGT GTC TCT GGC TGC CTC TGC CCC CCG GGC ATG GTC CGG CAT GAG S M G C V S G C L C P P G M V R H E

2566 2581 2596 2611

AAC AGA TGT GTG GCC CTG GAA AGG TGT CCC TGC TTC CAT CAG GGC AAG GAG TAT N R C V A L E R C P C F H Q G K E Y


TABLE 2 (cont'd)

2626 2641 2656

GCC CCT GGA GAA ACA GTG AAG ATT GGC TGC AAC ACT TGT GTC TGT CGG GAC CGG A P G E T V K I G C N T C V C R D R

2671 2686 2701 2716

AAG TGG AAC TGC ACA GAC CAT GTG TGT GAT GCC ACG TGC TCC ACG ATC GGC ATG

K W N C T D H V C D A T C S T I G M

2731 2746 2761 2776

GCC CAC TAC CTC ACC TTC GAC GGG CTC AAA TAC CTG TTC CCC GGG GAG TGC CAG

A H Y L T F D G L K Y L F P G E C Q

2791 2806 2821

TAC GTT CTG GTG CAG GAT TAC TGC GGC AGT AAC CCT GGG ACC TTT CGG ATC CTA

Y V L V Q D Y C G S N P G T F R I L

2836 2851 2866 2881

GTG GGG AAT AAG GGA TGC AGC CAC CCC TCA GTG AAA TGC AAG AAA CGG GTC ACC V G N K G C S H P S V K C K K R V T

2896 2911 2926

ATC CTG GTG GAG GGA GGA GAG ATT GAG CTG TTT GAC GGG GAG GTG AAT GTG AAG

I L V E G G E I E L F D G E V N V K

2941 2956 2971 2986

AGG CCC ATG AAG GAT GAG ACT CAC TTT GAG GTG GTG GAG TCT GGC CGG TAC ATC

R P M K D E T H F E V V E S G R Y I

3001 3016 3031 3046

ATT CTG CTG CTG GGC AAA GCC CTC TCC GTG GTC TGG GAC CGC CAC CTG AGC ATC

I L L L G K A L S V V W D R H L S I

TABLE 2 (cont'd)

3061 3076 3091

TCC GTG GTC CTG AAG CAG ACA TAC CAG GAG AAA GTG TGT GGC CTG TGT GGG AAT

S V V L K Q T Y Q E K V C G L C G N

3106 3121 3136 3151

TTT GAT GGC ATC CAG AAC AAT GAC CTC ACC AGC AGC AAC CTC CAA GTG GAG GAA F D G I Q N N D L T S S N L Q V E E

3166 3181 3196

GAC CCT GTG GAC TTT GGG AAC TCC TGG AAA GTG AGC TCG CAG TGT GCT GAC ACC D P V D F G N S W K V S S Q C A D T

3211 3226 3241 3256

AGA AAA GTG CCT CTG GAC TCA TCC CCT GCC ACC TGC CAT AAC AAC ATC ATG AAG R K V P L D S S P A T C H N N I M K

3271 3286 3301 3316

CAG ACG ATG GTG GAT TCC TCC TGT AGA ATC CTT ACC AGT GAC GTC TTC CAG GAC

Q T M V D S S C R I L T S D V F Q D

3331 3346 3361

TGC AAC AAG CTG GTG GAC CCC GAG CCA TAT CTG GAT GTC TGC ATT TAC GAC ACC C N K L V D P E P Y L D V C I Y D T

3376 3391 3406 3421

TGC TCC TGT GAG TCC ATT GGG GAC TGC GCC TGC TTC TGC GAC ACC ATT GCT GCC C S C E S I G D C A C F C D T I A A

3436 3451 3466

TAT GCC CAC GTG TGT GCC CAG CAT GGC AAG GTG GTG ACC TGG AGG ACG GCC ACA Y A H V C A Q H G K V V T W R T A T

TABLE 2 (cont'd)

3481 3496 3511 3526

TTG TGC CCC CAG AGC TGC GAG GAG AGG AAT CTC CGG GAG AAC GGG TAT GAG TGT L C P Q S C E E R N L R E N G Y E C

3541 3556 3571 3586

GAG TGG CGC TAT AAC AGC TGT GCA CCT GCC TGT CAA GTC ACG TGT CAG CAC CCT

E W R Y N S C A P A C Q V T C Q H P

3601 3616 3631

GAG CCA CTG GCC TGC CCT GTG CAG TGT GTG GAG GGC TGC CAT GCC CAC TGC CCT

E P L A C P V Q C V E G C H A H C P

3646 3661 3676 3691

CCA GGG AAA ATC CTG GAT GAG CTT TTG CAG ACC TGC GTT GAC CCT GAA GAC TGT

P G K I L D E L L Q T C V D P E D C

3706 3721 3736

CCA GTG TGT GAG GTG GCT GGC CGG CGT TTT GCC TCA GGA AAG AAA GTC ACC TTG P V C E V A G R R F A S G K K V T L

3751 3766 3781 3796

AAT CCC AGT GAC CCT GAG CAC TGC CAG ATT TGC CAC TGT GAT GTT GTC AAC CTC N P S D P E H C Q I C H C D V V N L

3811 3826 3841 3856

ACC TGT GAA GCC TGC CAG GAG CCG GGA GGC CTG GTG GTG CCT CCC ACA GAT GCC

T C E A C Q E P G G L V V P P T D A

3871 3886 3901

CCG GTG AGC CCC ACC ACT CTG TAT GTG GAG GAC ATC TCG GAA CCG CCG TTG CAC

P V S P T T L Y V E D I S E P P L H

TABLE 2 (cont'd)

3916 3931 3946 3961

GAT TTC TAC TGC AGC AGG CTA CTG GAC CTG GTC TTC CTG CTG GAT GGC TCC TCC

D F Y C S R L L D L V F L L D G S S

3976 3991 4006

AGG CTG TCC GAG GCT GAG TTT GAA GTG CTG AAG GCC TTT GTG GTG GAC ATG ATG

R L S E A E F E V L K A F V V D M M

4021 4036 4051 4066

GAG CGG CTG CGC ATC TCC CAG AAG TGG GTC CGC GTG GCC GTG GTG GAG TAC CAC

E R L R I S Q K W V R V A V V E Y H

4081 4096 4111 4126

GAC GGC TCC CAC GCC TAC ATC GGG CTC AAG GAC CGG AAG CGA CCG TCA GAG CTG

D G S H A Y I G L K D R K R P S E L

4141 4156 4171

CGG CGC ATT GCC AGC CAG GTG AAG TAT GCG GGC AGC CAG GTG GCC TCC ACC AGC

R R I A S Q V K Y A G S Q V A S T S

4186 4201 4216 4231

GAG GTC TTG AAA TAC ACA CTG TTC CAA ATC TTC AGC AAG ATC GAC CGC CCT GAA E V L K Y T L F Q I F S K I D R P E

4246 4261 4276

GCC TCC CGC ATC GCC CTG CTC CTG ATG GCC AGC CAG GAG CCC CAA CGG ATG TCC

A S R I A L L L M A S Q E P Q R M S

4291 4306 4321 4336

CGG AAC TTT GTC CGC TAC GTC CAG GGC CTG AAG AAG AAG AAG GTC ATT GTG ATC

R N F V R Y V Q G L K K K K V I V I

TABLE 2 (cont'd)

4351 4366 4381 4396

CCG GTG GGC ATT GGG CCC CAT GCC AAC CTC AAG CAG ATC CGC CTC ATC GAG AAG

P V G I G P H A N L K Q I R L I E K

4411 4426 4441

CAG GCC CCT GAG AAC AAG GCC TTC GTG CTG AGC AGT GTG GAT GAG CTG GAG CAG

Q A P E N K A F V L S S V D E L E Q

4456 4471 4486 4501

CAA AGG GAC GAG ATC GTT AGC TAC CTC TGT GAC CTT GCC CCT GAA GCC CCT CCT Q R D E I V S Y L C D L A P E A P P

4516 4531 4546

CCT ACT CTG CCC CCC CAC ATG GCA CAA GTC ACT GTG GGC CCG GGG CTC TTG GGG

P T L P P H M A Q V T V G P G L L G

4561 4576 4591 4606

GTT TCG ACC CTG GGG CCC AAG AGG AAC TCC ATG GTT CTG GAT GTG GCG TTC GTC V S T L G P K R N S M V L D V A F V

4621 4636 4651 4666

CTG GAA GGA TCG GAC AAA ATT GGT GAA GCC GAC TTC AAC AGG AGC AAG GAG TTC L E G S D K I G E A D F N R S K E F

4681 4696 4711

ATG GAG GAG GTG ATT CAG CGG ATG GAT GTG GGC CAG GAC AGC ATC CAC GTC ACG

M E E V I Q R M D V G Q D S I H V T

4726 4741 4756 4771

GTG CTG CAG TAC TCC TAC ATG GTG ACC GTG GAG TAC CCC TTC AGC GAG GCA CAG

V L Q Y S Y M V T V E Y P F S E A Q

TABLE 2 (cont'd)

4786 4801 4816

TCC AAA GGG GAC ATC CTG CAG CGG GTG CGA GAG ATC CGC TAC CAG GGC GGC AAC S K G D I L Q R V R E I R Y Q G G N

4831 4846 4861 4876

AGG ACC AAC ACT GGG CTG GCC CTG CGG TAC CTC TCT GAC CAC AGC TTC TTG GTC

R T N T G L A L R Y L S D H S F L V

4891 4906 4921 4936

AGC CAG GGT GAC CGG GAG CAG GCG CCC AAC CTG GTC TAC ATG GTC ACC GGA AAT S Q G D R E Q A P N L V Y M V T G N

4951 4966 4981

CCT GCC TCT GAT GAG ATC AAG AGG CTG CCT GGA GAC ATC CAG GTG GTG CCC ATT

P A S D E I K R L P G D I Q V V P I

4996 5011 5026 5041

GGA GTG GGC CCT AAT GCC AAC GTG CAG GAG CTG GAG AGG ATT GGC TGG CCC AAT G V G P N A N V Q E L E R I G W P N

5056 5071 5086

GCC CCT ATC CTC ATC CAG GAC TTT GAG ACG CTC CCC CGA GAG GCT CCT GAC CTG

A P I L I Q D F E T L P R E A P D L

5101 5116 5131 5146

GTG CTG CAG AGG TGC TGC TCC GGA GAG GGG CTG CAG ATC CCC ACC CTC TCC CCT

V L Q R C C S G E G L Q I P T L S P

5161 5176 5191 5206

GCA CCT GAC TGC AGC CAG CCC CTG GAC GTG ATC CTT CTC CTG GAT GGC TCC TCC

A P D C S Q P L D V I L L L D G S S

TABLE 2 (cont'd)

5221 5236 5251

AGT TTC CCA GCT TCT TAT TTT GAT GAA ATG AAG AGT TTC GCC AAG GCT TTC ATT S F P A S Y F D E M K S F A K A F I

5266 5281 5296 5311

TCA AAA GCC AAT ATA GGG CCT CGT CTC ACT CAG GTG TCA GTG CTG CAG TAT GGA S K A N I G P R L T Q V S V L Q Y G

5326 5341 5356

AGC ATC ACC ACC ATT GAC GTG CCA TGG AAC GTG GTC CCG GAG AAA GCC CAT TTG S I T T I D V P W N V V P E K A H L

5371 5386 5401 5416

CTG AGC CTT GTG GAC GTC ATG CAG CGG GAG GGA GGC CCC AGC CAA ATC GGG GAT L S L V D V M Q R E G G P S Q I G D

5431 5446 5461 5476

GCC TTG GGC TTT GCT GTG CGA TAC TTG ACT TCA GAA ATG CAT GGT GCC AGG CCG A L G F A V R Y L T S E M H G A R P

5491 5506 5521

GGA GCC TCA AAG GCG GTG GTC ATC CTG GTC ACG GAC GTC TCT GTG GAT TCA GTG

G A S K A V V I L V T D V S V D S V

5536 5551 5566 5581

GAT GCA GCA GCT GAT GCC GCC AGG TCC AAC AGA GTG ACA GTG TTC CCT ATT GGA D A A A D A A R S N R V T V F P I G

5596 5611 5626

ATT GGA GAT CGC TAC GAT GCA GCC CAG CTA CGG ATC TTG GCA GGC CCA GCA GGC

I G D R Y D A A Q L R I L A G P A G

TABLE 2 (cont'd)

5641 5656 5671 5686

GAC TCC AAC GTG GTG AAG CTC CAG CGA ATC GAA GAC CTC CCT ACC ATG GTC ACC

D S N V V K L Q R I E D L P T M V T

5701 5716 5731 5746

TTG GGC AAT TCC TTC CTC CAC AAA CTG TGC TCT GGA TTT GTT AGG ATT TGC ATG

L G N S F L H K L C S G F V R I C M

5761 5776 5791

GAT GAG GAT GGG AAT GAG AAG AGG CCC GGG GAC GTC TGG ACC TTG CCA GAC CAG

D E D G N E K R P G D V W T L P D Q

5806 5821 5836 5851

TGC CAC ACC GTG ACT TGC CAG CCA GAT GGC CAG ACC TTG CTG AAG AGT CAT CGG

C H T V T C Q P D G Q T L L K S H R

5866 5881 5896

GTC AAC TGT GAC CGG GGG CTG AGG CCT TCG TGC CCT AAC AGC CAG TCC CCT GTT V N C D R G L R P S C P N S Q S P V

5911 5926 5941 5956

AAA GTG GAA GAG ACC TGT GGC TGC CGC TGG ACC TGC CCC TGC GTG TGC ACA GGC

K V E E T C G C R W T C P C V C T G

5971 5986 6001 6016

AGC TCC ACT CGG CAC ATC GTG ACC TTT GAT GGG CAG AAT TTC AAG CTG ACT GGC S S T R H I V T F D G Q N F K L T G

6031 6046 6061

AGC TGT TCT TAT GTC CTA TTT CAA AAC AAG GAG CAG GAC CTG GAG GTG ATT CTC

S C S Y V L F Q N K E Q D L E V I L

TABLE 2 (cont'd)

6076 6091 6106 6121

CAT AAT GGT GCC TGC AGC CCT GGA GCA AGG CAG GGC TGC ATG AAA TCC ATC GAG H N G A C S P G A R Q G C M K S I E

6136 6151 6166

GTG AAG CAC AGT GCC CTC TCC GTC GAG CTG CAC AGT GAC ATG GAG GTG ACG GTG V K H S A L S V E L H S D M E V T V

6181 6196 6211 6226

AAT GGG AGA CTG GTC TCT GTT CCT TAC GTG GGT GGG AAC ATG GAA GTC AAC GTT N G R L V S V P Y V G G N M E V N V

6241 6256 6271 6286

TAT GGT GCC ATC ATG CAT GAG GTC AGA TTC AAT CAC CTT GGT CAC ATC TTC ACA

Y G A I M H E V R F N H L G H I F T

6301 6316 6331

TTC ACT CCA CAA AAC AAT GAG TTC CAA CTG CAG CTC AGC CCC AAG ACT TTT GCT F T P Q N N E F Q L Q L S P K T F A

6346 6361 6376 6391

TCA AAG ACG TAT GGT CTG TGT GGG ATC TGT GAT GAG AAC GGA GCC AAT GAC TTC

S K T Y G L C G I C D E N G A N D F

6406 6421 6436

ATG CTG AGG GAT GGC ACA GTC ACC ACA GAC TGG AAA ACA CTT GTT CAG GAA TGG

M L R D G T V T T D W K T L V Q E W

6451 6466 6481 6496

ACT GTG CAG CGG CCA GGG CAG ACG TGC CAG CCC ATC CTG GAG GAG CAG TGT CTT

T V Q R P G Q T C Q P I L E E Q C L


TABLE 2 (cont'd)

6511 6526 6541 6556

GTC CCC GAC AGC TCC CAC TGC CAG GTC CTC CTC TTA CCA CTG TTT GCT GAA TGC

V P D S S H C Q V L L L P L F A E C

6571 6586 6601

CAC AAG GTC CTG GCT CCA GCC ACA TTC TAT GCC ATC TGC CAG CAG GAC AGT TGC H K V L A P A T F Y A I C Q Q D S C

6616 6631 6646 6661

CAC CAG GAG CAA GTG TGT GAG GTG ATC GCC TCT TAT GCC CAC CTC TGT CGG ACC

H Q E Q V C E V I A S Y A H L C R T

6676 6691 6706

AAC GGG GTC TGC GTT GAC TGG AGG ACA CCT GAT TTC TGT GCT ATG TCA TGC CCA N G V C V D W R T P D F C A M S C P

6721 6736 6751 6766

CCA TCT CTG GTC TAC AAC CAC TGT GAG CAT GGC TGT CCC CGG CAC TGT GAT GGC

P S L V Y N H C E H G C P R H C D G

6781 6796 6811 6826

AAC GTG AGC TCC TGT GGG GAC CAT CCC TCC GAA GGC TGT TTC TGC CCT CCA GAT N V S S C G D H P S E G C F C P P D

6841 6856 6871

AAA GTC ATG TTG GAA GGC AGC TGT GTC CCT GAA GAG GCC TGC ACT CAG TGC ATT

K V M L E G S C V P E E A C T Q C I

6886 6901 6916 6931

GGT GAG GAT GGA GTC CAG CAC CAG TTC CTG GAA GCC TGG GTC CCG GAC CAC CAG

G E D G V Q H Q F L E A W V P D H Q

TABLE 2 (cont'd)

6946 6961 6976

CCC TGT CAG ATC TGC ACA TGC CTC AGC GGG CGG AAG GTC AAC TGC ACA ACG CAG

P C Q I C T C L S G R K V N C T T Q

6991 7006 7021 7036

CCC TGC CCC ACG GCC AAA GCT CCC ACG TGT GGC CTG TGT GAA GTA GCC CGC CTC

P C P T A K A P T C G L C E V A R L

7051 7066 7081 7096

CGC CAG AAT GCA GAC CAG TGC TGC CCC GAG TAT GAG TGT GTG TGT GAC CCA GTG

R Q N A D Q C C P E Y E C V C D P V

7111 7126 7141

AGC TGT GAC CTG CCC CCA GTG CCT CAC TGT GAA CGT GGC CTC CAG CCC ACA CTG

S C D L P P V P H C E R G L Q P T L

7156 7171 7186 7201

ACC AAC CCT GGC GAG TGC AGA CCC AAC TTC ACC TGC GCC TGC AGG AAG GAG GAG T N P G E C R P N F T C A C R K E E

7216 7231 7246

TGC AAA AGA GTG TCC CCA CCC TCC TGC CCC CCG CAC CGT TTG CCC ACC CTT CGG

C K R V S P P S C P P H R L P T L R

7261 7276 7291 7306

AAG ACC CAG TGC TGT GAT GAG TAT GAG TGT GCC TGC AAC TGT GTC AAC TCC ACA K T Q C C D E Y E C A C N C V N S T

7321 7336 7351 7366

GTG AGC TGT CCC CTT GGG TAC TTG GCC TCA ACC GCC ACC AAT GAC TGT GGC TGT

V S C P L G Y L A S T A T N D C G C

TABLE 2 (cont'd)

7381 7396 7411

ACC ACA ACC ACC TGC CTT CCC GAC AAG GTG TGT GTC CAC CGA AGC ACC ATC TAC

T T T T C L P D K V C V H R S T I Y

7426 7441 7456 7471

CCT GTG GGC CAG TTC TGG GAG GAG GGC TGC GAT GTG TGC ACC TGC ACC GAC ATG

P V G Q F W E E G C D V C T C T D M

7486 7501 7516

GAG GAT GCC GTG ATG GGC CTC CGC GTG GCC CAG TGC TCC CAG AAG CCC TGT GAG

E D A V M G L R V A Q C S Q K P C E

7531 7546 7561 7576

GAC AGC TGT CGG TCG GGC TTC ACT TAC GTT CTG CAT GAA GGC GAG TGC TGT GGA

D S C R S G F T Y V L H E G E C C G

AGG TGC CTG CCA TCT GCC TGT GAG GTG GTG ACT GGC TCA CCG CGG GGG GAC TCC

R C L P S A C E V V T G S P R G D S

7651 7666 7681

CAG TCT TCC TGG AAG AGT GTC GGC TCC CAG TGG GCC TCC CCG GAG AAC CCC TGC

Q S S W K S V G S Q W A S P E N P C

7696 7711 7726 7741

CTC ATC AAT GAG TGT GTC CGA GTG AAG GAG GAG GTC TTT ATA CAA CAA AGG AAC

L I N E C V R V K E E V F I Q Q R N

7756 7771 7786

GTC TCC TGC CCC CAG CTG GAG GTC CCT GTC TGC CCC TCG GGC TTT CAG CTG AGC

V S C P Q L E V P V C P S G F Q L S

7801 7816 7831 7846

TGT AAG ACC TCA GCG TGC TGC CCA AGC TGT CGC TGT GAG CGC ATG GAG GCC TGC

C K T S A C C P S C R C E R M E A C

7861 7876 7891 7906

ATG CTC AAT GGC ACT GTC ATT GGG CCC GGG AAG ACT GTG ATG ATC GAT GTG TGC

M L N G T V I G P G K T V M I D V C

7921 7936 7951

ACG ACC TGC CGC TGC ATG GTG CAG GTG GGG GTC ATC TCT GGA TTC AAG CTG GAG

T T C R C M V Q V G V I S G F K L E

7966 7981 7996 8011

TGC AGG AAG ACC ACC TGC AAC CCC TGC CCC CTG GGT TAC AAG GAA GAA AAT AAC

C R K T T C N P C P L G Y K E E N N

8026 8041 8056

ACA GGT GAA TGT TGT GGG AGA TGT TTG CCT ACG GCT TGC ACC ATT CAG CTA AGA

T G E C C G R C L P T A C T I Q L R

8071 8086 8101 8116

GGA GGA CAG ATC ATG ACA CTG AAG CGT GAT GAG ACG CTC CAG GAT GGC TGT GAT

G G Q I M T L K R D E T L Q D G C D

8131 8146 8161 8176

ACT CAC TTC TGC AAG GTC AAT GAG AGA GGA GAG TAC TTC TGG GAG AAG AGG GTC

T H F C K V N E R G E Y F W E K R V

8191 8206 8221

ACA GGC TGC CCA CCC TTT GAT GAA CAC AAG TGT CTG GCT GAG GGA GGT AAA ATT

T G C P P F D E H K C L A E G G K I


ATG AAA ATT CCA GGC ACC TGC TGT GAC ACA TGT GAG GAG CCT GAG TGC AAC GAC

M K I P G T C C D T C E E P E C N D

8296 8311 8326

ATC ACT GCC AGG CTG CAG TAT GTC AAG GTG GGA AGC TGT AAG TCT GAA GTA GAG

I T A R L Q Y V K V G S C K S E V E

8341 8356 8371 8386

GTG GAT ATC CAC TAC TGC CAG GGC AAA TGT GCC AGC AAA GCC ATG TAC TCC ATT V D I H Y C Q G K C A S K A M Y S I

8401 8416 8431 8446

GAC ATC AAC GAT GTG CAG GAC CAG TGC TCC TGC TGC TCT CCG ACA CGG ACG GAG

D I N D V Q D Q C S C C S P T R T E

8461 8476 8491

CCC ATG CAG GTG GCC CTG CAC TGC ACC AAT GGC TCT GTT GTG TAC CAT GAG GTT

P M Q V A L H C T N G S V V Y H E V

8506 8521 8536 8558

CTC AAT GCC ATG GAG TGC AAA TGC TCC CCC AGG AAG TGC AGC AAG TGA GGCTGCT GCA

L N A M E C K C S P R K C S K

8568 8578 8588

GCTGCATGGG TGCCTGCTGC TGCCGGAATT


300 kd, even before the extra contribution from glycosylation is taken into account. SDS-polyacrylamide gels are known to be an inaccurate way to estimate molecular weights in this range, although it has not been formally excluded that multi-step processing occurs, with rapid formation of a relatively stable 260 kd intermediate. Since the VWF "pro-piece" can be found intact in the circulation as a 100 kd glycoprotein, multi-stage processing seems unlikely, and the pro-VWF is clearly larger than previously suspected and processed in a single step.
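The order of magnitude of the predicted molecular weight can be checked with a common rule of thumb. The sketch below assumes an average residue mass of about 110 daltons, which is an approximation and not a figure taken from this description; an exact value would require summing the masses of the individual residues in Table 2.

```python
# Assumption: an average amino acid residue contributes about 110 Da.
AVERAGE_RESIDUE_DA = 110.0


def estimate_mass_kd(n_residues, residue_da=AVERAGE_RESIDUE_DA):
    """Rough polypeptide mass in kilodaltons, ignoring glycosylation."""
    return n_residues * residue_da / 1000.0


# 2815 residues is the length of the open reading frame stated in the text.
print(f"{estimate_mass_kd(2815):.0f} kd")
```

This rough estimate lands near 310 kd, in line with the approximately 300 kd stated above and well above the 260 kd suggested by SDS-PAGE.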

The function, if any, of the large VWF pro-piece is currently unknown. Its fate after trans-membrane secretion is also not fully clear. However, the identity of the VWF pro-piece with a 100 kd plasma glycoprotein has recently been established, so that at least some of the intact pro-piece leaves the cell. The N-terminal sequence of the 100 kd glycoprotein, which corresponds to the predicted sequence of the propolypeptide, implies that signal peptide cleavage occurs at the position shown by the arrow in Table 2. This agrees with the consensus sequences usually involved in signal peptidase cleavage (Watson, 1984, Nuc. Acids Res. 12).

Sadler et al., 1985, Proc. Natl. Acad. Sci. USA 82:6394 have recently published a partial VWF sequence, and noted the presence of repeated elements. Analysis of the complete nucleotide sequence, however, reveals much more extensive repetition in the VWF structure than shown by previous data. We confirm the presence of three complete copies of a repeat labelled "domain A" by Sadler et al. We have retained the terminology for this repeat and have confirmed the homology of the 5' end of its first copy, missing in the clones of Sadler et al. A striking feature, not noted by those authors, is the paucity of Cys residues within this region, which occupies about 600 amino acids in the center of the VWF sequence. In contrast, the regions at each end of the molecule, from nucleotides 208 to 3833, and 5729 to 8582 in our sequence, are extremely rich in Cys, and account almost entirely for the high Cys content of VWF. Cys is in fact the most abundant amino acid in the prepro-VWF sequence, accounting for 8.3% of the residues. In the region outside "domain A", Cys accounts for 10.4% of the amino acids.
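The Cys percentages quoted above are simple residue counts over a chosen stretch of the deduced sequence. A minimal sketch of that count is given below; `residue_fraction` is a hypothetical helper, and the short fragment used is the C-terminal end of the sequence as read from the final codons of Table 2, chosen only because it is short enough to verify by eye.

```python
def residue_fraction(protein, residue="C"):
    """Fraction of positions in a one-letter protein string occupied by `residue`."""
    seq = protein.replace(" ", "").upper()
    return seq.count(residue) / len(seq)


# Last 15 residues of the deduced sequence (from the final codons of Table 2).
c_terminal = "L N A M E C K C S P R K C S K"
print(f"Cys in C-terminal fragment: {residue_fraction(c_terminal):.0%}")

# Approximate number of Cys residues implied by the figures in the text.
implied_cys = round(0.083 * 2815)
print(f"Cys residues implied by 8.3% of 2815: {implied_cys}")
```

Applying the same function to the full deduced sequence, and separately to the residues outside "domain A", would reproduce the two percentages given in the text.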

Further analysis of these Cys-rich regions indicates that they too can be arranged as a series of 6 repeats of a basic 400 amino acid unit. The first repeat block begins almost immediately after the signal peptide. The second repeat is truncated, and the region just preceding this duplication contains the cleavage site between the pro-piece and mature VWF. Following the third repeat, the sequence is interrupted by the triplicated Cys-poor "A domain" repeat, after which the Cys-rich repeats resume, encoding the C-terminal end of the molecule. The fifth and sixth repeats are incomplete, but include the region of the short "B domain" repeats of Sadler et al. Those authors' "C domain" repeats follow in two copies. Since no free sulfhydryl groups can be detected in multimeric VWF, all these Cys residues are involved in interchain and intrachain disulfide bonds which are important determinants of the tertiary and quaternary structure of the protein.

B. Analyzing Sample DNA

DNA sequence polymorphisms are neutral variations in DNA sequence present throughout the genome, which can often be detected by restriction enzyme digestion and blot hybridization analysis. By neutral we mean that the variations per se are not themselves responsible for any phenotypic trait. However, the value of polymorphisms is that they are linked to or associated with adjacent portions of the genome, and therefore they can be used as markers of those portions of the genome.

Two types of DNA sequence polymorphisms have been described. One type involves single nucleotide changes, or small insertions or deletions, which result in the presence or absence of a particular restriction enzyme recognition site. In another type of polymorphism, a large segment of DNA of unknown function varies widely in length among individuals. Both types of sequence differences are inherited in Mendelian fashion.
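The first type of polymorphism can be illustrated with a toy digest. The sketch below is an assumption-laden example: the two allele sequences are synthetic and do not come from the VWF gene, `digest` is a hypothetical helper for a linear molecule, and TaqI is modeled by its recognition site TCGA with the cut placed after the first T.

```python
def digest(seq, site, cut_offset):
    """Return fragment lengths after cutting a linear sequence at every `site`.

    `cut_offset` is the number of bases into the site at which the cut falls.
    """
    cuts = []
    start = 0
    while True:
        i = seq.find(site, start)
        if i == -1:
            break
        cuts.append(i + cut_offset)
        start = i + 1
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Synthetic alleles (hypothetical): allele B differs from allele A by a single
# base change (G -> A) that destroys the second TaqI site.
allele_a = "A" * 20 + "TCGA" + "G" * 30 + "TCGA" + "C" * 10
allele_b = "A" * 20 + "TCGA" + "G" * 30 + "TCAA" + "C" * 10

print(digest(allele_a, "TCGA", 1))  # [21, 34, 13]
print(digest(allele_b, "TCGA", 1))  # [21, 47]
```

A probe hybridizing to the region spanning the lost site would detect the 34-base and 13-base fragments from allele A but a single 47-base fragment from allele B, which is the kind of band difference that defines an RFLP.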

RFLP's linked to the VWF gene are identified in genomic DNA from individuals examined by cutting sample DNA with a series of restriction enzymes. The resulting restriction fragments are separated by molecular weight. Hybridization with a radiolabeled VWF cDNA probe, e.g., the cDNA insert from clone pVWE6, yields (e.g., using Southern blot techniques) a unique band pattern. For example, RFLP's are detected by the above procedure using restriction enzymes TaqI and SacI and the pVWE6 probe. Specifically, peripheral blood specimens are collected in 10% acid-citrate-dextrose. High molecular weight DNA is prepared by standard techniques either from dextran-sedimented leukocytes or from isolated nuclei separated by centrifugation following Triton X-100 solubilization. From 2 to 16 ug of DNA are digested to completion with the restriction enzymes TaqI and SacI. DNA fragments are then fractionated by electrophoresis in 0.6 to 1.0% agarose gels and transferred to nitrocellulose filters. The probes constructed from pVWE6 are labeled and hybridized with the DNA on the above-described nitrocellulose filters. The hybridized filters are washed and examined by autoradiography. Once identified, RFLP's can be used as described above to assay a sample of human DNA and determine the source of a VWF gene in that sample. For example, DNA of parents and other family members can be screened, e.g., by Southern blotting, with such a probe, and then fetal DNA can be screened to determine the inheritance pattern of its VWF alleles.
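The family screening described above relies on Mendelian segregation of the RFLP alleles. As an illustrative sketch only, with hypothetical allele labels "A1" and "A2" standing for two band patterns, the fetal genotypes compatible with a given pair of parents can be enumerated as follows.

```python
from itertools import product


def possible_offspring(mother, father):
    """All unordered genotypes obtainable by taking one allele from each parent."""
    return sorted({tuple(sorted(pair)) for pair in product(mother, father)})


def consistent(fetal, mother, father):
    """True if the fetal genotype can arise from the two parental genotypes."""
    return tuple(sorted(fetal)) in possible_offspring(mother, father)


# Hypothetical example: heterozygous mother, homozygous father.
mother = ("A1", "A2")
father = ("A1", "A1")

print(possible_offspring(mother, father))      # [('A1', 'A1'), ('A1', 'A2')]
print(consistent(("A2", "A1"), mother, father))  # True
print(consistent(("A2", "A2"), mother, father))  # False
```

In this example, an A2 band in the fetal sample can only have come from the mother, which is how a linked RFLP identifies the parental source of a VWF allele.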

Other Embodiments

Other embodiments are within the following claims. For example, the techniques described apply to other mammalian systems. Other RFLP's can be used. Hybridization probes can be used for other assays or research techniques.