Title:
APPLICATION OF HELIX-RANDOM CHAIN THERMODYNAMICS TO NUCLEIC ACID ANALYSIS
Document Type and Number:
WIPO Patent Application WO/1998/054363
Kind Code:
A1
Abstract:
The present invention relates generally to the field of genetics and nucleic acid research and, more particularly, to the detection of mismatches or differences between nucleotide sequences, to the detection of mutations, to the detection and evaluation of single and multiple nucleotide polymorphisms, and to the genotyping of cells, tissues or individuals. The methods are based on the effects of these mismatches or differences on helix-random chain thermodynamics.

Inventors:
LERMAN LEONARD S (US)
Application Number:
PCT/US1998/010917
Publication Date:
December 03, 1998
Filing Date:
May 29, 1998
Assignee:
MASSACHUSETTS INST TECHNOLOGY (US)
LERMAN LEONARD S (US)
International Classes:
C12Q1/68; C12Q1/6816; C12Q1/6827; (IPC1-7): C12Q1/68
Domestic Patent References:
WO1993016194A1 1993-08-19
WO1991002815A1 1991-03-07
WO1995021268A1 1995-08-10
WO1996024687A1 1996-08-15
WO1991000925A1 1991-01-24
WO1997016567A1 1997-05-09
Other References:
TALAVERA E M ET AL: "FLUORESCEIN-LABELED DNA PROBES FOR HOMOGENEOUS HYBRIDIZATION ASSAYS: APPLICATION TO DNA E. COLI RENATURATION", APPLIED SPECTROSCOPY, vol. 51, no. 3, March 1997 (1997-03-01), pages 401 - 406, XP000698677
STIMPSON D I ET AL: "THE UTILITY OF OPTICAL WAVEGUIDE DNA ARRAY HYBRIDIZATION AND MELTING FOR RAPID RESOLUTION OF MISMATCHES, AND FOR DETECTION OF MINOR MUTANT COMPONENTS IN THE PRESENCE OF A MAJORITY OF WILD TYPE SEQUENCE: STATISTICAL MODEL AND SUPPORTING DATA", GENETIC ANALYSIS: BIOMOLECULAR ENGINEERING, vol. 13, no. 3, September 1996 (1996-09-01), pages 73 - 80, XP000635154
FERNANDEZ ET AL.: "USE OF CHEMICAL CLAMPS IN DENATURING GRADIENT GEL ELECTROPHORESIS: APPLICATION IN THE DETECTION OF THE MOST FREQUENT MEDITERRANEAN BETA-THALASSEMIC MUTATIONS", PCR METHODS AND APPLICATIONS, vol. 3, 1993, pages 122 - 124, XP002079074
FISCHER S G ET AL: "DNA FRAGMENTS DIFFERING BY SINGLE BASE-PAIR SUBSTITUTIONS ARE SEPARATED IN DENATURING GRADIENT GELS: CORRESPONDENCE WITH MELTING THEORY", PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF USA, vol. 80, March 1983 (1983-03-01), pages 1579 - 1583, XP000673331
Attorney, Agent or Firm:
Twomey, Michael J. (Hurwitz & Thibeault LLP, High Street Tower, 125 High Street, Boston MA, US)
Claims:
CLAIMS What is claimed is:
1. A method of identifying nucleotide sequence variation between a sample nucleic acid and a reference nucleic acid comprising: mixing a sample nucleic acid, a reference nucleic acid, and an indicator of helix-formation to form a test solution, wherein at least one of said sample nucleic acid and said reference nucleic acid is bound to a helix-stabilizing element which stabilizes or promotes association of said sample and reference nucleic acids in a double helical conformation; maintaining said test solution under initial conditions which favor association of said sample and reference nucleic acids in double helical conformations; subjecting said test solution to a range of test temperatures over a period of time by gradually heating said test solution at a site on a substrate, said range of temperature including temperatures which favor disassociation of said sample and reference nucleic acids into random chain conformations; detecting changes in said indicator of helix-formation as an indication of temperature dependent transitions of said nucleic acids from double helical conformations to random chain conformations; and inferring from the number of said transitions and/or the temperature of said transitions whether there are nucleotide sequence mismatches between said sample and reference nucleic acids.
2. A method as in claim 1 wherein said helix-stabilizing element comprises a GC rich extension element bound to said reference nucleic acid; and wherein a complementary GC rich extension element is bound to said sample nucleic acid, said GC rich extension elements forming a GC rich extension when associated in double helical conformation.
3. A method as in claim 2 wherein said GC extension comprises a GC rich extension element bound to a 5' end of said reference nucleic acid.
4. A method as in claim 2 wherein said GC extension comprises a GC rich extension element bound to a 3' end of said reference nucleic acid.
5. A method as in claim 2 wherein said GC extension comprises at least 10 base pairs.
6. A method as in claim 2 wherein said GC extension comprises 10-20 base pairs.
7. A method as in claim 2 wherein said GC extension comprises 20-30 base pairs.
8. A method as in claim 2 wherein said GC extension comprises at least 30 base pairs.
9. A method as in claim 2 wherein at least 60% of said GC extension comprises GC nucleotide pairs.
10. A method as in claim 2 wherein 60-80% of said GC extension comprises GC nucleotide pairs.
11. A method as in claim 2 wherein 80-90% of said GC extension comprises GC nucleotide pairs.
12. A method as in claim 2 wherein 90-100% of said GC extension comprises GC nucleotide pairs.
13. A method as in claim 1 wherein said helix-stabilizing element comprises a helical cross-linking element.
14. A method as in claim 13 wherein said method further comprises subjecting said test solution to conditions which promote cross-linking of said nucleic acids prior to said heating step.
15. A method as in claim 13 wherein said helix-stabilizing element comprises a helical cross-linking element bound to a 5' end of at least one of said nucleic acids.
16. A method as in claim 13 wherein said helix-stabilizing element comprises a helical cross-linking element bound to a 3' end of at least one of said nucleic acids.
17. A method as in claim 13 wherein said helical cross-linking element is selected from the group consisting of the psoralens, 4-fluoro-3-nitrophenylazide and bis-platinum activated sulfur-substituted nucleotide analogs.
18. A method as in claim 1 wherein said indicator of helix-formation is a dye which differentially associates with nucleic acids in a double helical conformation as opposed to a random chain conformation, and which undergoes a change in an optical property when associated with nucleic acids in a double helical conformation.
19. A method as in claim 18 wherein said change in an optical property is a change in fluorescence.
20. A method as in claim 18 wherein said change in an optical property is a change in absorbance.
21. A method as in claim 18 wherein said dye is selected from the group consisting of acridine orange, ethidium bromide, and oxazole yellow.
22. A method as in claim 1 wherein said indicator of helix-formation is a multivalent double helix-binding agent, and wherein binding of said agent to a multiplicity of nucleic acids in double helical conformation causes a detectable change in said test solution.
23. A method as in claim 1 wherein said reference nucleic acid comprises a nucleotide sequence of at least 50 nucleotides or nucleotide analogs.
24. A method as in claim 1 wherein said reference nucleic acid comprises a nucleotide sequence of at least 75 nucleotides or nucleotide analogs.
25. A method as in claim 1 wherein said reference nucleic acid comprises a nucleotide sequence of at least 100, 150, 200, 250, 300, 350 or 400 nucleotides or nucleotide analogs.
26. A method as in claim 1 wherein said reference nucleic acid comprises a nucleotide sequence of 400-1000 nucleotides or nucleotide analogs.
27. A method as in claim 1 wherein said reference nucleic acid comprises a nucleotide sequence consisting essentially of a Poland-Fixman domain.
28. A method as in claim 1 wherein said sample nucleic acid comprises a nucleotide sequence consisting essentially of a Poland-Fixman domain.
29. A method as in claim 1 wherein said site is a microtiter well.
30. A method as in claim 1 wherein said site is a site on a biological assay chip.
31. A biological assay chip for use in identifying nucleotide sequence mismatches between a sample nucleic acid and a reference nucleic acid comprising a substrate; and a reference nucleic acid bound to a predetermined site on said substrate.
32. An assay chip for use in detecting one or more differences between a sample nucleic acid and a reference nucleic acid comprising a substrate; a reference nucleic acid bound on said substrate; wherein said nucleic acid includes a helix-stabilizing element.
33. An assay chip as in claim 32, wherein said helix-stabilizing element is a GC rich extension element.
34. An assay chip as in claim 32, wherein said helix-stabilizing element is a helical cross-linking element.
Description:
APPLICATION OF HELIX-RANDOM CHAIN THERMODYNAMICS TO NUCLEIC ACID ANALYSIS

Field of the Invention

The present invention relates generally to the field of genetics and nucleic acid research.

The present invention relates more particularly to the detection of mismatches or differences between nucleotide sequences, to the detection of mutations, to the detection and evaluation of single or multiple nucleotide polymorphisms, and to the genotyping of cells, tissues or individuals.

Background of the Invention

The transition in nucleic acid structures between double-stranded helical conformations and single-stranded random chain conformations may be exploited in the detection of sequence variation between nucleic acids in double helix conformation (e.g., between a reference strand and a sample strand heteroduplex or substantially similar reference and sample homoduplexes). The relevance of the transition in DNA structure between helix and random chain derives from the sensitivity of the temperature of the transition to details of base sequence, to helix continuity, and to the close coupling of many base pairs in maintaining helical structure.

All base pairs within a closely coupled segment of double helix DNA undergo the transition between helix and random chain, which is referred to as 'melting', within the same temperature interval. Where close coupling extends over a substantial length of helix, the thermodynamic interaction renders the temperature interval exceedingly narrow. Consequently, the midpoint of the transition is amenable to precise measurement. Differences of a single base pair in DNA segments designed to maintain full-length close coupling can be revealed as a shift upward or downward in the melting temperature. A larger shift usually accompanies the presence of a mismatch in the segment (e.g., a non-Watson-Crick pair, or a base excluded from the helix for lack of a partner in a heteroduplex). Thus, sufficient resolution in measurement of melting temperature can signal identity or a single base difference within a helix sample. It should be noted that these considerations apply only to the unimolecular melting equilibrium, in which complete separation of complementary strands does not occur.
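The narrowing of the melting interval with close coupling can be illustrated with a simple two-state (van't Hoff) model of a unimolecular transition. The following is a minimal sketch and not part of the disclosed method: the function names, the enthalpy values, and the 10%-90% width criterion are assumptions chosen for illustration, with a larger transition enthalpy standing in for a longer closely coupled segment.

```python
import math

R = 8.314  # gas constant, J/(mol*K)

def fraction_melted(temp_k, tm_k, delta_h):
    """Fraction of random chain at temp_k under an assumed two-state model.

    tm_k is the transition midpoint (K); delta_h is the transition
    enthalpy (J/mol), positive for melting.
    """
    exponent = -(delta_h / R) * (1.0 / temp_k - 1.0 / tm_k)
    exponent = max(-700.0, min(700.0, exponent))  # keep exp() in range
    k = math.exp(exponent)
    return k / (1.0 + k)

def transition_width(tm_k, delta_h, lo=0.1, hi=0.9):
    """Temperature interval (K) between the lo and hi melted fractions."""
    def temp_at(fraction):
        log_k = math.log(fraction / (1.0 - fraction))
        return 1.0 / (1.0 / tm_k - R * log_k / delta_h)
    return temp_at(hi) - temp_at(lo)

# Hypothetical enthalpies: strongly vs. weakly coupled segments, same midpoint
narrow = transition_width(350.0, 2.0e6)
broad = transition_width(350.0, 4.0e5)
```

Under these assumed values the higher-enthalpy transition spans a markedly smaller temperature interval, which is why the midpoint of a closely coupled segment can be located precisely.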

The effect of partial melting on the electrophoretic mobility of a double-stranded nucleic acid in a gel has been used as a sensitive means of comparing melting temperatures with very small samples of DNA, and as a convenient means of testing many samples. Because the mobility falls to a much lower value on partial melting, presumably because the open, single-stranded ends of the molecules tangle in the gel, and because complete melting or disassociation of the strands leads to greatly increased mobility, transitions in helicity may be detected by gel electrophoresis under denaturing conditions.

Helix-stabilizing elements (e.g., G-C rich extensions or helical cross-links) added to double stranded nucleic acids promote close-coupling, by permitting partial but not complete melting of strands. For example, G-C rich segments have the effect of stabilizing helical conformations of nucleic acids because they are more thermostable than random genomic sequences. This is due to the greater stability of G-C as opposed to A-T base pairs. Such G-C extensions have been used in both uniform denaturant concentration and denaturing gradient gel electrophoresis ("DGGE"). Using such systems, it has been shown that all of a large sample of base substitutions and mismatches are easily detected, and that characteristic patterns, usually a quadruplet of four bands, result from strand reassortment between two molecules if they are not identical.

Other improvements and methodological elaborations in using change of helicity to show sequence variants have included the replacement of G-C segments by cross-links, replacement of the gradient in concentration of denaturing solvent by a thermal gradient, replacement of the slab gel by a capillary containing a dense solution of a linear polymer, and chromatography under similar conditions.

Unfortunately, the time and/or manipulations and skill required by these capillary, electrophoretic and chromatographic methods effectively restrict their application to research laboratories. A need exists, therefore, for methods in which melting transitions can be monitored rapidly, in large numbers, with sufficient precision, and without skill requirements beyond that of the ordinary clinical laboratory.

Detailed Description of the Invention

I. Definitions

In order to more clearly and concisely point out the subject matter of the claimed invention, the following definitions are provided for specific terms used in the following written description and appended claims.

Helix-stabilizing element. As used herein, a "helix-stabilizing element" means a G-C rich extension element or a helical cross-linking element.

G-C rich extension. As used herein, a "G-C rich extension" means a double helical nucleic acid segment of at least 10, preferably 10-20, more preferably 20-30, and most preferably at least 30 consecutive base pairs in which at least 60%, preferably 60-80%, more preferably 80-90%, and most preferably 90-100% of the base pairs are G-C (i.e., guanine-cytosine) pairs. A G-C rich extension consists of two complementary nucleic acid strands, referred to herein as "G-C rich extension elements," which may associate to form a double helical segment consisting largely or entirely of G-C pairs, and which may serve to stabilize or promote the association of contiguous nucleotide sequences in a double helical conformation. Thus, G-C rich extension elements may be added to the 5' or 3' ends of sample and reference nucleic acids to stabilize or promote association of the sample and reference nucleic acids in a double helical conformation under appropriate conditions of buffer and temperature.
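The minimum thresholds in this definition (at least 10 base pairs, at least 60% G-C pairs) can be expressed as a simple check on one strand of a candidate extension, since in a fully paired duplex the fraction of G-C pairs equals the fraction of G and C bases on either strand. This is an illustrative sketch only; the function names and default parameters are hypothetical and are not taken from the specification.

```python
def gc_fraction(strand):
    """Fraction of G and C bases in one strand of a paired segment."""
    strand = strand.upper()
    if not strand:
        raise ValueError("strand must not be empty")
    return sum(base in "GC" for base in strand) / len(strand)

def is_gc_rich_extension(strand, min_length=10, min_gc=0.60):
    """True if the segment meets the minimum length and G-C content.

    Defaults mirror the broadest thresholds in the definition above;
    pass min_length=30, min_gc=0.90 for the most preferred ranges.
    """
    return len(strand) >= min_length and gc_fraction(strand) >= min_gc
```

For example, a 10-base alternating GC strand satisfies the default thresholds, while a 4-base GC strand fails on length alone.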

Helical cross-link. As used herein, a "helical cross-link" means a chemical bond or chemical moiety which covalently or non-covalently binds two nucleic acids so as to stabilize or promote association of the nucleic acids in a double helical conformation. A helical cross-link may be formed by (a) a single "helical cross-linking element" which is bound to a first nucleic acid and which may form a covalent or non-covalent bond to the second nucleic acid, or (b) two "helical cross-linking elements" which are bound to first and second nucleic acids and which may react with each other to form a covalent or non-covalent bond between the nucleic acids. A helical cross-link may serve to stabilize or promote the association of contiguous nucleotide sequences in a double helical conformation. Thus, helical cross-linking elements may be added to the 5' or 3' ends of sample and reference nucleic acids to stabilize or promote association of the sample and reference nucleic acids in a double helical conformation under appropriate conditions of buffer and temperature.

Indicator of helix-formation. As used herein, an "indicator of helix-formation" means a chemical agent or moiety which (1) undergoes or induces a detectable chemical or physical change when it interacts with nucleic acids as opposed to when it exists free in solution, and (2) which either (a) interacts with nucleic acids in a double helical conformation to a substantially greater degree than it interacts with nucleic acids in a random chain conformation, or (b) interacts with nucleic acids in a random chain conformation to a substantially greater degree than it interacts with nucleic acids in a double helical conformation. Thus, an indicator of helix-formation is a chemical agent or moiety which differentially interacts with helical or random chain nucleic acids, which undergoes or induces a detectable chemical or physical change as a result of that interaction and which, therefore, may be used as an indicator of the transition of nucleic acids between helical and random chain conformations.

Fixed-site. As used herein, "fixed-site" means an experimental context wherein nucleic acid is either bound to a substrate by chemical interaction or immersed in a gelatinous medium.

Solution system. As used herein, the term "solution system" means an experimental context wherein the helical transition temperature is monitored in a liquid medium as opposed to an electrophoretic gel, linear polymer, or chromatographic medium.

II. Description of Embodiments

A. General

The present invention is directed to methods and products for detecting sequence variants (i.e., substitutions, insertions, deletions, non-complementarities or non-Watson-Crick pairings) between nucleic acid strands based on the effects of these differences on helix-random chain thermodynamics. As is known in the art, mismatches between bases in double helical nucleic acids reduce the stability of the helical conformation and lower the temperature at which the double helix "melts," or transitions from a double-stranded, helical conformation to a single-stranded, random chain conformation. It is known also that even single base differences in sequence between fully paired double-helices almost invariably change the equilibrium melting temperature.

The temperature at which nucleic acid structures transition from double-stranded helical conformations to random chain conformations can be used to infer the presence of sequence differences or mismatches between the nucleic acid strands. This may be accomplished by comparing the actual melt temperature to a theoretical prediction, an internal standard, or an external standard. In a sample with a single species of double helix (homoduplex), the temperature (melting temperature) at which part of a double helical molecule melts into partly separated random chain can be used to infer the presence of variant sequences by comparing the melting temperature of the sample with the transition temperature of a nearly identical homoduplex reference sample. In a sample with more than one species of double helix (including heteroduplexes and/or homoduplexes), there may be several temperatures at which different double helical species undergo transition. Thus, the number of different melt temperatures, as well as the numerical values of the temperatures themselves, may be used to infer the presence of mismatches and the number of species present. Finally, when multiple samples including different component nucleic acids are tested, the relative pattern of transition temperatures may be used to infer the presence of sequence differences or mismatches, the number of species in each sample, as well as the relative identities of the components of each sample.
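Counting the melt temperatures in a mixed sample can be sketched as peak-finding on the negative derivative of a helix-indicator signal with respect to temperature. The code below is an assumption-laden illustration rather than a procedure given in the specification: the function names, the synthetic sigmoidal signal, the 0.25 degree sampling step, and the peak-height threshold are all hypothetical.

```python
import math

def melt_transitions(temps, signal, min_height):
    """Temperatures at which -d(signal)/dT has a local maximum.

    Uses central differences at interior points and ignores peaks
    lower than min_height (signal units per degree).
    """
    slopes = []
    for i in range(1, len(temps) - 1):
        rise = signal[i + 1] - signal[i - 1]
        run = temps[i + 1] - temps[i - 1]
        slopes.append((temps[i], -rise / run))
    peaks = []
    for j in range(1, len(slopes) - 1):
        temp, height = slopes[j]
        if (height >= min_height
                and height > slopes[j - 1][1]
                and height >= slopes[j + 1][1]):
            peaks.append(temp)
    return peaks

def helix_signal(temp, tms, width=0.5):
    """Synthetic signal: one sigmoidal helix fraction per duplex species."""
    return sum(1.0 / (1.0 + math.exp((temp - tm) / width)) for tm in tms)

# Hypothetical mixture of two duplex species melting at 70 and 76 degrees
temps = [60.0 + 0.25 * i for i in range(121)]
signal = [helix_signal(t, [70.0, 76.0]) for t in temps]
transitions = melt_transitions(temps, signal, min_height=0.1)
```

With real measurements the signal would typically need smoothing before differentiation, and the threshold would have to be set from the noise level of the detector.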

The present invention represents an improvement over the prior art in several respects: First, the present invention provides a method which may be performed at a fixed site or in a solution system (e.g., a biological assay chip or a microtiter assay plate) which does not require the use of electrophoretic gels, capillary systems or chromatography. Therefore, the present methods and products provide for rapid determinations of sequence differences or mismatches without the need for costly lab equipment or training, while reducing the number of sample manipulations and opportunity for human error. Second, by incorporating helix-stabilizing elements, never before used in non-electrophoretic methods, the present invention permits a more precise determination and clearer separation of melt temperatures than previously achieved in non-electrophoretic methods. Third, by choosing and preparing sample and reference nucleic acids to correspond to Poland-Fixman domains, the present invention provides predictability of detection, and more precise determination and clearer separation of melt temperatures. Finally, by using easily detectable indicators of helix-formation, the present methods and products allow for more rapid and simple quantification of results. All of these improvements provide for accurate and rapid determination of sequence variants, and are amenable to automation.

B. Methods of Nucleic Acid Analysis

In a first embodiment, the present invention provides a method of identifying nucleotide sequence variants (e.g., differences or mismatches) between a sample nucleic acid and a reference nucleic acid in which (1) a sample nucleic acid and a reference nucleic acid with compatible G-C rich elements are mixed, denatured and permitted to anneal, thereby forming a mixture of homoduplex and heteroduplex helical molecules in a test solution or a thin layer on a surface; (2) the test solution or the thin layer is initially maintained under conditions (e.g., buffer, temperature) which favor association of the sample and reference nucleic acids in double helical conformations; (3) the test solution or thin layer is gradually heated in the presence of an indicator of helix formation at a fixed-site on a substrate or in solution so that it passes through a range of temperatures, including temperatures which will cause the double helical nucleic acids to melt or unravel into random chain conformations; and (4) transitions of the nucleic acids from double helical conformations to random chain conformations are detected by means of the indicator of helix-formation. On the basis of the measurements of the indicator of helix-formation, the values and/or number of different melting or transition points in the mixed sample can be determined, and therefore the presence and/or identity of different nucleotide sequences can be inferred.

In alternative embodiments, the present invention provides a method of identifying nucleotide sequence variants (e.g., differences or mismatches) between a sample nucleic acid and a reference nucleic acid in which (1) a sample nucleic acid and a reference nucleic acid are mixed, denatured and permitted to anneal, thereby forming a mixture of homoduplex and heteroduplex helical molecules in a test solution or a thin layer on a surface; (2) a helical cross-link is introduced which acts as a helix-stabilizing element; (3) the test solution or thin layer is initially maintained under conditions (e.g., buffer, temperature) which favor association of the sample and reference nucleic acids in double helical conformations; (4) the test solution or thin layer is gradually heated in the presence of an indicator of helix formation at a fixed-site on a substrate or in solution so that it passes through a range of temperatures, including temperatures which will cause the double helical nucleic acids to melt or unravel into random chain conformations; and (5) transitions of the nucleic acids from double helical conformations to random chain conformations are detected by means of the indicator of helix-formation. On the basis of the measurements of the indicator of helix-formation, the values and/or number of different melting or transition points in the mixed sample can be determined, and therefore the presence and/or identity of different nucleotide sequences can be inferred.
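The final inference step, from the number and values of the observed transition points to the presence of a sequence variant, might be sketched as follows. The decision rule, the tolerance value, and all names here are assumptions made for illustration; the specification does not prescribe a particular rule or numerical tolerance.

```python
def infer_variation(observed_tms, reference_tm, tolerance=0.2):
    """Flag a sample as a suspected variant from its transition temperatures.

    observed_tms: transition temperatures measured for the mixed sample.
    reference_tm: transition temperature of the reference homoduplex.
    tolerance:    assumed measurement tolerance, in the same units.
    A variant is suspected if more than one transition is seen or if any
    transition lies further than tolerance from the reference value.
    """
    shifted = [tm for tm in observed_tms if abs(tm - reference_tm) > tolerance]
    return {
        "transitions": len(observed_tms),
        "shifted": shifted,
        "variant_suspected": len(observed_tms) > 1 or bool(shifted),
    }
```

A single transition at the reference temperature is reported as no suspected variant, whereas an additional lower-temperature transition, as expected for a mismatched heteroduplex, is flagged.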

Alternatively, as will be apparent to one of ordinary skill in the art, the same methods may be practiced in which the mixture of homoduplex and heteroduplex molecules formed by the sample and reference nucleic acids is initially maintained under conditions (e.g., buffer, temperature) which favor disassociation of the sample and reference nucleic acids into single-stranded molecules, and the test solution or thin layer is then gradually cooled in the presence of an indicator of helix formation at a fixed-site on a substrate or in solution so that it passes through a range of temperatures, including temperatures which will cause the single-stranded nucleic acids to anneal or associate into double helical conformations. As before, the transitions of the nucleic acids between double helical conformations and random chain conformations are detected by means of the indicator of helix-formation.

1. Sample and Reference Nucleic Acids

The sample nucleic acids and reference nucleic acids used in the present invention may be derived from any source, may be natural or synthetic, and may include various chemical modifications affecting stability or chemical reactivity. Thus, for example, the nucleic acids may be isolated from natural sources such as genomic DNA, or total or polyA mRNA; may be products of PCR or other nucleic acid amplification techniques; or may be synthesized using standard or modified nucleotide phosphate chemistries. The nucleic acids may be natural deoxyribonucleotide or ribonucleotide phosphodiester polymers, or may include one or more modified internucleotide linkages, such as phosphorothioates, alkylphosphonates, phosphorodithioates, phosphate esters, alkylphosphonothioates, phosphoramidates, carbamates, carbonates, phosphate triesters, acetamidate, and carboxymethyl esters. Alternatively, or in addition, the nucleic acids may include modified bases and/or sugars. For example, modified nucleic acids may include modified bases such as C-5 methyl pyrimidine nucleotides (e.g., 5-methyl-2'-deoxycytidine), or backbone sugars which are covalently attached to low molecular weight organic groups other than a hydroxyl group at the 3' position and other than a phosphate group at the 5' position. Thus, for example, modified nucleic acids may include a 2'-O-alkylated ribose group such as a 2'-O-methylated ribose. In addition, modified nucleic acids may include sugars such as arabinose instead of ribose. Modified nucleic acids with these linkages or other modifications can be prepared according to known methods (see, e.g., Agrawal and Goodchild (1987) Tetrahedron Lett. 28:3539-3542; Agrawal et al. (1988) Proc. Natl. Acad. Sci. (USA) 85:7079-7083; Uhlmann et al. (1990) Chem. Rev. 90:534-583; Agrawal et al. (1992) Trends Biotechnol. 10:152-158; Agrawal (1993) Meth. Mol. Biol., Vol. 20, Humana Press, Totowa, NJ; and U.S. Pat. No. 5,149,798).

For embodiments in which the reference nucleic acids are bound to a substrate, such as a biological assay chip, the nucleic acids may also be modified at either the 3' or 5' termini with a chemical moiety that covalently bonds the nucleic acid to the substrate surface. For example, the nucleic acids may be bound to biotin and the substrate surface may be coated with avidin, streptavidin or neutravidin. Other linking agents are described below, in the description of the biological assay chips of the invention.

The nucleic acids of the present invention may be, and preferably are, much longer than the nucleic acids typically used in hybridization assays or prior art biological assay chips in which the observation depends on the binding or disassociation of a short oligonucleotide with a longer representative of the genomic sequence. Preferably, the nucleic acids are at least 50 nucleotides in length, and more preferably 75 or 100 bases in length. Most typically, nucleic acids of 100-150, 150-200, 200-250, 250-300, 300-350, or 350-400 nucleotides are contemplated as useful in the present invention. In some cases, nucleic acids of 400-1000 nucleotides may also be usefully analyzed according to the methods. Such long sequences are typically ineffective in hybridization assays for the detection of only one or a few mismatches.

2. Helix-Stabilizing Elements

In the present invention, both strands of the double helical molecules, which consist of homoduplexes or heteroduplexes of the sample and reference nucleic acid strands, are covalently continuous with a helix-stabilizing element which is either a G-C rich extension element or a helical cross-linking element.

In a double helical conformation with Watson-Crick pairing of nucleotides, G-C base pairs contribute more to the stability of the helix than do A-T base pairs. Therefore, given segments of equal length, a G-C rich double helical segment will have a higher melting temperature than either an A-T rich segment, or a typical, randomly chosen genomic or cDNA segment.
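The dependence of melting temperature on G-C content can be illustrated with the empirical Marmur-Doty relation for long DNA in standard saline citrate, Tm = 69.3 + 0.41 x (%G+C) in degrees Celsius. This relation is not part of the present disclosure; it ignores segment length, sequence context, and coupling, and is used here only as an assumed, rough stand-in to show the direction of the effect. The function name is hypothetical.

```python
def marmur_doty_tm(strand):
    """Rough melting temperature (deg C) from G-C content alone.

    Applies the empirical Marmur-Doty relation, which was derived for
    long genomic DNA in standard saline citrate; treat the result as
    illustrative, not as a prediction for short or modified segments.
    """
    strand = strand.upper()
    if not strand:
        raise ValueError("strand must not be empty")
    gc_percent = 100.0 * sum(base in "GC" for base in strand) / len(strand)
    return 69.3 + 0.41 * gc_percent
```

Under this relation a segment composed entirely of G-C pairs is estimated to melt roughly 41 degrees above one composed entirely of A-T pairs, consistent with the ordering described above.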

When G-C rich extensions are employed to promote or stabilize double helical conformations, a G-C rich extension element is joined to one end of the reference nucleic acid and a complementary G-C rich extension element is joined to the opposite end of the sample nucleic acids. As a simple example, a polyG sequence may be covalently bound to the 5' end of the reference nucleic acids and a polyC sequence may be joined to the 3' end of the sample nucleic acids. Obviously, a mixed sequence of G's and C's may be employed in such extension elements, and the ends to which the G-C rich extension elements are attached may be reversed (i.e., the 3' end of the reference sequence and the 5' end of the sample sequence). In addition, the G-C rich extension need not consist entirely of G-C base pairs. Thus, for example, the G-C rich extension may comprise 60%, 60-80%, 80-90% or 90-100% G-C pairs, with the remainder being A-T pairs.

Obviously, 100% G-C pairings are preferred. Thus, a G-C rich extension comprises at least 10 base pairs, preferably 10-20 or 20-30 base pairs, and most preferably at least 30 base pairs, which are largely or entirely G-C base pairs. The length of a G-C rich extension must be sufficient to prevent or substantially reduce complete separation of the strands at the equilibrium melting temperature of the remainder of the molecule.
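The placement of complementary extension elements on opposite ends of the two strands can be checked with a short sketch. Strands are written 5' to 3'; the sequences, the 30-base polyG element, and the function names are hypothetical and serve only to show that an element on the 5' end of the reference strand pairs with its complement on the 3' end of the sample strand.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(strand):
    """Complementary strand, written 5' to 3'."""
    return strand.upper().translate(COMPLEMENT)[::-1]

def attach_gc_extension(reference, sample, element):
    """Join element to the 5' end of reference and its complement
    to the 3' end of sample."""
    return element + reference, sample + reverse_complement(element)

# Hypothetical reference strand and a fully complementary sample strand
reference = "ATGCTTAGCA"
sample = reverse_complement(reference)
ref_ext, sample_ext = attach_gc_extension(reference, sample, "G" * 30)
```

In this example the extended strands remain fully complementary over their whole length, so the polyG and polyC elements form a 30 base pair G-C extension adjacent to the sequence under test.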

When helical cross-links are employed to promote or stabilize double helical conformations, the cross-link may be formed by (a) a single helical cross-linking element which is bound to either the sample or reference nucleic acids, or (b) two helical cross-linking elements, one of which is bound to each of the sample and reference nucleic acids. As with G-C rich extensions, helical cross-links may be bound to either the 5' or 3' ends of the sample and reference nucleic acids. Examples of cross-linking agents include, but are not limited to, photoactivatable cross-linking agents including the psoralens (e.g., psoralen C2 phosphoramidite or psoralen C6 phosphoramidite, Glen Research, Inc., Sterling, VA), and 4-fluoro-3-nitrophenylazide (Dojindo Molecular Technologies, Inc., Bethesda, MD), as well as bis-platinum activated cross-linking reagents such as nucleoside analogs with sulfur-substituted phosphate groups (see, e.g., Gruff and Orgel (1991), Nucl. Acids Res. 19:6849-6854; Chu and Orgel (1990), Nucl. Acids Res. 18:5163-5171; Farrell et al. (1990), Biochem. 29:9522-9531).

When helical cross-linking agents are employed which require activation, the methods of the present invention further include such an activating step prior to the heating (or cooling) of the test sample for the transition in helicity. Thus, for example, when photoactivated cross-linking elements are employed (e.g., 4'-aminomethyl-4,5',8-trimethylpsoralen), the sample and reference nucleic acids (to at least one of which is bound the cross-linking element) are mixed, allowed to form double helical conformations, and exposed to light of the appropriate wavelength (e.g., 365 nm for most psoralens) to cause helical cross-linking.

3. Indicators of Helix Formation

In the present invention, indicators of helix-formation are employed to increase the ease of detection of the transitions of nucleic acids between double helix and random chain conformations. Indicators of helix-formation include any chemical agents or moieties which differentially interact with helical or random chain nucleic acids, which undergo or induce a detectable chemical or physical change as a result of that interaction and which, therefore, may be used as indicators of the transition of nucleic acids between helical and random chain conformations. The differential interaction and detectable change are preferably sufficient (e.g., a difference in absorption, fluorescence, fluorescence polarization, or turbidity of 2x-10x and, preferably, 10x-1000x) to provide a strong signal.

Preferred indicators of helix-formation are dyes which differentially associate with nucleic acids in double helical conformation and which undergo a change in fluorescence in their bound as opposed to free state. Thus, for example, acridine orange (see, e.g., Gustashaw (1991) in The ACT Cytogenetics Laboratory Manual 2d Ed., M.J. Barch, ed., Raven Press, Ltd., New York) fluoresces orange-red when bound to single-stranded nucleic acids and yellow-green when bound to double-stranded nucleic acids, with an approximately 3x increase in yellow-green fluorescence when bound to helical conformation nucleic acids. Similarly, ethidium bromide fluoresces approximately 50x more strongly in the ultraviolet spectrum when bound to double helical nucleic acids (Le Pecq (1971), Meth. Biochem. Anal. 20:41-86), and oxazole yellow fluoresces 1000x-2000x more strongly when bound to double helices (Glazer and Rye (1992), Nature 395:859-861). There are also dyes for which fluorescence is decreased or quenched by binding to double helical nucleic acids (e.g., 9-amino-acridine) and, although these may be used in the methods of the present invention, they are not preferred in embodiments with arrays of immobilized nucleic acids because unbound, more strongly fluorescing dye molecules can diffuse across the array, and because measurement of fluorescence will be more dependent upon dye concentration than nucleic acid concentration.

In presently most preferred embodiments, the indicator of helix formation is a cationic dye which intercalates and fluoresces more strongly in double helical conformations of nucleic acids.

The presently most preferred dye is oxazole yellow.

In other embodiments, the indicator of helix-formation may be a multivalent double helix-binding agent which causes a detectable change in the test solution, such as a multivalent antibody (e.g., IgM) which is specific for double-stranded nucleic acids. Multivalent antibodies, for example, can cause agglutination of double helical strands of nucleic acids, resulting in optically detectable changes in turbidity or other measures.

4. Assay formats

The methods of the present invention may be performed at a single or fixed site on a substrate, as opposed to prior art methods in which samples were caused to separate according, for example, to electrophoretic mobility in uniform denaturant or denaturing gradient gels. The fixed sites of the present invention may be on or in any surface or container, such as a slide, test tube, spectrophotometer vial, capillary tube, cuvette, or the like. In preferred embodiments, however, the site is a small well such as those of microtiter plates, or the surface of a small "chip" on which the reference nucleic acids may be immobilized.

Thus, in one preferred embodiment, the methods of the present invention are carried out on a standard laboratory microtiter plate, a modified version designed for the present purposes, or a substantially equivalent laboratory article. Reference nucleic acids may be placed in each of a multiplicity of wells in a microtiter plate, and sample nucleic acids and an indicator of helix-formation may be added to each well. Because the various wells can contain different reference nucleic acids, a number of assays may be performed simultaneously in a small space. In addition, because different reference nucleic acids may be placed in each well, the sample nucleic acids may be a mixed sample, containing a variety of different sample nucleic acids, each intended to be capable of forming a double helix with only one or a few of the reference nucleic acids in the multiplicity of wells. This approach is particularly convenient because the sample nucleic acids may be produced by "multiplex" amplification using, for example, a multiplicity of PCR primers to generate a mixed batch of sample nucleic acids, and this multiplex sample may be tested against a variety of different reference nucleic acids isolated in different wells. In this method, the sample nucleic acids which are unrelated to the reference nucleic acids will either fail to form duplexes under any conditions, or will form duplexes which melt at far lower temperatures than the sample nucleic acids which are intended to be tested against the reference nucleic acids. Thus, in each well, a reference nucleic acid is effectively tested against only one or a few sample sequences (e.g., two sequences corresponding to two alleles derived from a sample of genomic or cDNA).

In another preferred embodiment, the methods of the present invention employ "biological assay chips" bearing immobilized reference nucleic acids on the surface. The preferred structures for such chips are described in detail below. With respect to the method, however, the immobilization of different single-stranded reference nucleic acids or different double-stranded helical molecules at different sites on the surface of the chips provides some of the same advantages as a microtiter plate, even though the different sites are in fluid communication. Thus, the immobilization of molecules allows a number of assays to be performed simultaneously in a small space, and permits the use of a mixed or multiplex solution of sample nucleic acids. Again, this approach is particularly convenient because the sample nucleic acids may be produced by "multiplex" amplification using, for example, a multiplicity of PCR primers to generate a mixed batch of sample nucleic acids. Furthermore, as the various sites on the chip are in fluid communication, the various nucleic acids included in the sample may be expected to associate (if at all) with the most closely matching reference nucleic acids. As will be obvious to one of ordinary skill in the art, the patterning of the reference nucleic acids on the chip surface may be used to advantage; placing, for example, sites bearing a series of common polymorphisms or mutations on adjacent sites and determining the number and temperatures of transitions from helix to random chain at each site.

As an alternative to covalent attachment to a fixed surface, the samples can be embedded in a thermostable gel on a chip, slide or other controlled surface. A significant advantage of any embodiment using a fixed surface or solution system rather than gel electrophoresis is that all samples are exposed to essentially the same medium and indicator concentration, as well as the same rate of temperature change. In embodiments using a thermostable gel, reference and test strands will be annealed before introduction into the gel, and the rate of change of temperature will be a function of the thickness of the gel.

As described above, these methods, whether employing microtiter plates, biological assay chips, microscope slides, spectrophotometer vials, or gels, will include helix-stabilizing elements. These helix-stabilizing elements may be present in the sample nucleic acids, the reference nucleic acids, or both. Where the helix-stabilizing element is not a G-C rich extension, a separate cross-linking step may be introduced when necessary.

Finally, because these methods allow for the use of spatial arrays of fixed sites, they are particularly suitable to the use of robotics or automation in the filling of microtiter wells and/or the production of biological assay chips, as well as the detection or measurement steps.

5. Test Solution Heating and Temperature Measurement

Essentially any standard laboratory method may be used to heat the test solutions of the present invention. For example, microtiter plates are commonly heated by convection from a heat source such as a fluid bath or heating surface. If biological assay chips comprise a conductive material, they may be heated by their resistance to the passage of an electric current through the material. Preferably, however, biological assay chips or the immersion medium for a fluid bath will be heated by contact with a well-controlled surface. In all cases, the main consideration is the ability to control the temperature and its rates of change with precision. Close temperature uniformity is desirable, and inclusion of a temperature standard in each sample (a control DNA with well-defined melting temperature) may be helpful to calibrate each position. Temperature can be measured continuously close to the samples by means of a thermocouple, thermistor, or RTD.

Typically, the test medium is expected to be initially maintained at a temperature of about 60-65 °C, although initial temperatures as low as 40 °C may be employed. The temperature is increased at the maximum controlled rate consistent with temperature uniformity through the depth of each nucleic acid sample. For molecules anchored to a chip, where temperature uniformity is needed only across approximately 100 nm, the rate may be as high as 10 °C/sec, but will be limited by the speed of data collection. For solutions, the rate will fall between 0.3 and 3 °C/minute, depending on the dimensions.
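As a rough illustration of why the chip ramp is said to be limited by data collection, the arithmetic below relates ramp rate to sweep time and to the sampling rate a detector would need. The 20 °C sweep width and the 0.01 °C recording interval are assumed inputs chosen for illustration, and the function names are hypothetical; only the ramp rates come from the paragraph above.

```python
def ramp_seconds(span_c, rate_c_per_s):
    """Time needed to sweep a temperature span at a constant ramp rate."""
    return span_c / rate_c_per_s


def readings_per_second(rate_c_per_s, interval_c):
    """Sampling rate needed to record one reading per temperature interval."""
    return rate_c_per_s / interval_c


SPAN_C = 20.0      # assumed sweep width, for illustration only
INTERVAL_C = 0.01  # assumed recording interval, for illustration only

chip_rate = 10.0                 # degrees C per second, upper bound quoted for chips
slow_solution_rate = 0.3 / 60.0  # 0.3 degrees C per minute, expressed per second
fast_solution_rate = 3.0 / 60.0  # 3 degrees C per minute, expressed per second

chip_time_s = ramp_seconds(SPAN_C, chip_rate)
chip_sampling_hz = readings_per_second(chip_rate, INTERVAL_C)
solution_minutes = (
    ramp_seconds(SPAN_C, fast_solution_rate) / 60.0,
    ramp_seconds(SPAN_C, slow_solution_rate) / 60.0,
)

print(f"chip: {chip_time_s:.1f} s sweep, {chip_sampling_hz:.0f} readings/s needed")
print(f"solution: {solution_minutes[0]:.1f} to {solution_minutes[1]:.1f} min sweep")
```

Under these assumptions the fastest chip ramp would call for a detector reading on the order of a thousand times per second, which is one way to see why the detector rather than the sample sets the practical limit.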

6. Data analysis

In the preferred embodiment, the intensity of fluorescence and the temperature are recorded for each sample at very small temperature intervals, preferably smaller than 0.01 °C, by means of a suitable photometric device, for example, a CCD camera. The data are subjected to computational analysis to correct for thermal quenching and instrumental noise and are set into correspondence with a theoretical transition function. This permits accurate inference of the melting temperature and the number of components in each sample.
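A minimal sketch of how a melting temperature might be read from such a record is given below. It assumes an idealized two-state (logistic) transition in place of the theoretical transition function, which is not specified here, and it locates the midpoint as the peak of the negative first derivative of the signal. All function names and numerical values are hypothetical.

```python
import math


def two_state_fraction(temp_c, tm_c, width_c):
    """Helical fraction for an idealized two-state transition (logistic form)."""
    return 1.0 / (1.0 + math.exp((temp_c - tm_c) / width_c))


def negative_derivative(temps, signal):
    """Central-difference estimate of -dF/dT at the interior points."""
    mids, slopes = [], []
    for i in range(1, len(temps) - 1):
        slope = (signal[i + 1] - signal[i - 1]) / (temps[i + 1] - temps[i - 1])
        mids.append(temps[i])
        slopes.append(-slope)
    return mids, slopes


def melting_midpoint(temps, signal):
    """Temperature at which the signal falls most steeply."""
    mids, slopes = negative_derivative(temps, signal)
    peak = max(range(len(slopes)), key=slopes.__getitem__)
    return mids[peak]


# Simulated record: 60 to 80 degrees C at 0.01 degree spacing (assumed values).
temps = [60.0 + 0.01 * i for i in range(2001)]
signal = [two_state_fraction(t, 71.25, 0.3) for t in temps]

tm_estimate = melting_midpoint(temps, signal)
print(f"estimated midpoint: {tm_estimate:.2f} C")
```

Real records would need smoothing and quenching correction before differentiation; the sketch omits both to keep the derivative step visible.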

The most robust indicator of a sequence difference between reference and test DNA molecules is given by a sample consisting of randomly reassorted strands of the two molecules. Such a mixture contains four differing molecules (two heteroduplex and two homoduplex), and even if the melting temperatures are too close to be clearly differentiated, the transition zone will be distinctly broadened. The breadth of the zone indicates the presence of mismatches in the mixture without reliance on accurate temperature measurement.
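The broadening argument can be illustrated with a small simulation. The sketch below assumes that each of the four duplex species melts as an idealized logistic transition of equal width and equal abundance, and compares the full width at half maximum of the derivative profile for a single species against the four-species mixture. The melting temperatures and the width are invented for illustration and are not taken from any measurement.

```python
import math


def transition_derivative(temp_c, tm_c, width_c):
    """-d(helical fraction)/dT for one logistic two-state transition."""
    x = (temp_c - tm_c) / (2.0 * width_c)
    return 1.0 / (4.0 * width_c * math.cosh(x) ** 2)


def mixture_profile(temps, tms, width_c):
    """Average derivative profile for equal amounts of each duplex species."""
    return [
        sum(transition_derivative(t, tm, width_c) for tm in tms) / len(tms)
        for t in temps
    ]


def full_width_half_max(temps, profile):
    """Width of the region where the profile is at least half its maximum."""
    half = max(profile) / 2.0
    above = [t for t, p in zip(temps, profile) if p >= half]
    return above[-1] - above[0]


temps = [69.0 + 0.005 * i for i in range(601)]  # 69 to 72 degrees C
WIDTH_C = 0.1                                   # assumed transition width

homoduplex_only = [70.30]                       # hypothetical single species
reassorted = [70.30, 70.22, 70.05, 69.98]       # two homo- and two heteroduplexes

single_width = full_width_half_max(temps, mixture_profile(temps, homoduplex_only, WIDTH_C))
mixed_width = full_width_half_max(temps, mixture_profile(temps, reassorted, WIDTH_C))

print(f"single species FWHM: {single_width:.3f} C")
print(f"reassorted mixture FWHM: {mixed_width:.3f} C")
```

Because only the width of the profile is compared, the conclusion does not depend on where the temperature axis is anchored, which is the point made in the paragraph above.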

C. Products and Kits for Nucleic Acid Analysis

The present invention also provides for products and kits useful in the methods of the present invention. In particular, the present invention provides for biological chips and assay kits for analyzing nucleic acids for identities and mismatches with reference nucleic acids.

As the methods of the present invention are directed to the determination of helix-random chain transitions rather than mere hybridization, and because the helix-stabilizing elements of the invention permit a finer degree of resolution of small differences in nucleotide sequences, the sample and reference nucleic acids of the present invention are substantially longer than those employed in the prior art. Thus, in one set of embodiments, the present invention provides for biological assay chips in which a nucleic acid is bound to a substrate and the nucleic acid comprises at least 50, 75, 100, 150, 200, 250, 300, 350, 400, or 1000 nucleotides or nucleotide analogs. These are far longer than the oligonucleotides of 10-30 nucleotides typically employed in the prior art. The reference nucleic acids may be bound to the substrate at either the 5' or 3' end.

In the preferred embodiment, the present invention provides biological assay chips in which the reference nucleic acids of the invention are bound to predetermined sites, and in which a helix-stabilizing element is bound to the reference nucleic acid. As described above, the helix-stabilizing element may be a G-C rich extension element or a cross-linking element, and may be bound to the 5' or to the 3' end. Helix-stabilizing elements may be either a helix-stabilizing sequence, such as a G-C rich element, or a helical cross-linking element (i.e., a cross-linking reagent, such as a psoralen moiety, or a cross-linking site, such as that favored by psoralen).

In one embodiment of the present invention using fixed sites, biological assay chips are provided with a surface on which single- or double-stranded DNA molecules, which may carry a reactive end structure, are to be retained. In this embodiment, DNA molecules are to be applied by the user by one of numerous methods known in the art, for example, slot-blots or automated pipetting methods.

In an alternative embodiment, biological assay chips are provided carrying bound single strands of reference DNA sequences. The reference sequences can include a large number of sequence selections within genes or in multiple loci representing both normal and mutant forms or all polymorphic forms. Varied reference sequences may be distributed over the chip for intensive scrutiny of one or a few individuals. Alternatively, the chip can carry the same set of related strands bound to one or more sites over the entire array for a closely defined survey of a large number of individuals. In this alternative, microtiter or automated pipetting techniques may be used for the initial loading and/or cross-linking steps.

The substrates of the present invention may be any substantially rigid surface to which the reference nucleic acids may be directly or indirectly (i.e., through a linker molecule) bound. Examples may be found in, for example, U.S. Pat. No. 5,412,087, U.S. Pat. No. 5,482,867, U.S. Pat. No. 5,527,681, U.S. Pat. No. 5,539,097, and references cited therein. Currently preferred substrates include glass, indium oxide coated glass, and silicon, although it is expected that polymeric materials may provide advantages in cost and ease of manufacture.

The reference nucleic acids of the invention may be bound to the substrate by any standard chemistry used in the art. Examples may be found in, for example, U.S. Pat. No. 5,412,087, U.S. Pat. No. 5,482,867, U.S. Pat. No. 5,527,681, U.S. Pat. No. 5,539,097, and references cited therein. The currently preferred method is by coupling avidin, streptavidin or neutravidin to the substrate surface, and coupling biotin to the reference nucleic acids. The biotin may then be allowed to bind the avidin component to immobilize the reference nucleic acids.

In additional embodiments, the present invention provides for kits including any of the above-described biological assay chips in combination with one or more of (1) a cross-linking reagent, (2) an indicator of helix-formation, (3) a primer for the production of a G-C rich extension element substantially complementary to a G-C rich extension element bound to a reference nucleic acid, (4) a primer for the amplification of a sample nucleic acid substantially complementary to a reference nucleic acid, (5) a primer for the amplification of a sample nucleic acid substantially complementary to a reference nucleic acid and including a contiguous G-C rich extension element substantially complementary to a G-C rich extension element of a reference nucleic acid, and (6) a reactive surface to which appropriately terminated single or double strand DNA can be bound or retained.

In additional embodiments, the biological assay chip includes one or more of (1) connections for electrodes to generate heat by passage of a current through the chip substrate, (2) a thermometer, (3) a thermocouple, and (4) a thermistor.

Examples

The following non-limiting examples and theoretical discussion are presented to illustrate certain aspects of the present invention. As will be understood by one of ordinary skill in the art, these are but examples of particular embodiments and not indicative of the full scope of the invention disclosed herein.

Poland-Fixman Theory

The Poland-Fixman theory provides a theoretical thermodynamic model, suitable for calculation on computers, which incorporates the close coupling among neighboring and more remote base pairs and predicts the distribution of melting temperatures along any nucleic acid molecule from its sequence. Although the pattern of melting temperature in most arbitrary genomic sequences may not be sufficiently uniform to define a single, sharp melting temperature for the entire segment, the use of a G-C rich extension or cross-link exerts a profound, long-range effect.

Figure 1 shows the distribution of A-T and G-C density along a genomic segment, plotted by means of statistical smoothing of a discontinuous function in which an A-T pair is given the value 0, and a G-C pair the value 1. The corresponding calculated pattern of thermal stability is plotted in Figure 2. The sequence generates an irregular contour, despite close coupling. The corresponding patterns when a 30 bp G-C extension element is attached at the 5' end are shown in Figures 3 and 4. It can be seen that the entire segment becomes thermally uniform, now termed a Poland-Fixman domain, with the 5' G-C rich extension. The theoretical melting calculation can include the effect of substitution and mismatches.
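The density plot described for Figure 1 can be sketched as follows. The text does not say which smoothing was used, so a simple centered moving average is assumed here; the window length, the toy sequence, and the function name are all hypothetical. This reproduces only the composition plot, not the Poland-Fixman stability calculation.

```python
def gc_density(sequence, window=25):
    """Smoothed G-C density: A/T scored 0, G/C scored 1, then a moving average.

    The window is truncated at the ends of the sequence so that every
    position receives a value.
    """
    scores = [1.0 if base in "GC" else 0.0 for base in sequence.upper()]
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo = max(0, i - half)
        hi = min(len(scores), i + half + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed


# Toy segment (not a real genomic sequence) with and without a 30 bp
# G-C extension prepended at the 5' end.
segment = "ATTAGCATATTTGCGCATAATTAGGCATATATCGATTA"
extended = "GC" * 15 + segment

plain_profile = gc_density(segment, window=9)
extended_profile = gc_density(extended, window=9)

print(" ".join(f"{value:.2f}" for value in plain_profile[:10]))
print(" ".join(f"{value:.2f}" for value in extended_profile[:10]))
```

The extension shows up as a plateau of density 1.0 at the start of the second profile, which is the composition feature the stability calculation then propagates along the segment.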

More detailed discussions of the theory may be found in, for example, Poland (1974) Biopolymers 13:1859-1871; Fixman and Freire (1977) Biopolymers 16:2693-2704; Lerman et al. (1984) Ann. Rev. Biophys. Bioeng. 13:399-423.

Direct observation of the melting progression

To consider now the present invention, in which the progression of melting is followed at a fixed site by means of an indicator of helix-formation, it is useful to return to the theoretical properties. Figure 5 shows the calculated first derivative of the number of melted base pairs as a function of temperature for an indefinitely high concentration of a genomic beta globin fragment and a splice mutant. This curve is the counterpart of the derivative of hyperchromicity as a function of temperature as recorded in a spectrophotometer. It is seen that the slight shift in the genomic fragment would be obscured by the breadth of the complex melting progression expected from the stepwise meltmap. However, the Poland-Fixman calculation no longer applies when strand separation takes place, anticipated here near 73 °C. Rather, the theory pertains only to the unimolecular part of melting. Separation of long strands, a bimolecular equilibrium, is less sensitive to sequence, and there may be no observable melting difference.

Use of a helix-stabilizing element (i.e., a G-C rich extension or helical cross-link) assures unimolecular melting of the genomic segment, and the expected properties are shown for a 5' G-C rich extension in Figure 6. The flat domain results in a narrow transition, and the mutant is clearly shifted from wild-type.

Fluorescent dye as an indicator of helix formation

To measure the progress of equilibrium melting with rising temperature we have followed the fluorescence of non-covalently bound dyes for which the quantum yield differs substantially between the helix-bound form and the free dye. Although the dyes we have examined are cationic and raise the melting temperature, the use of a low concentration results in only a small perturbation, and internal standards permit precise comparison. Sufficient DNA molecules of about 800 bp joined to G-C extensions can be bound to a silicon substratum in a 1 mm spot to provide a strong fluorescence signal when dye is added to the medium, and the signal declines reversibly on heating below a limiting temperature, then declines irreversibly. We infer that the reversible decline corresponds to unimolecular, partial melting, and that the irreversible sequel implies final complete strand separation.

In one experiment in fluid media, we tested G-C rich extensions with wild-type and mutant Kras2 and wild-type and mutant human beta globin in the presence of oxazole yellow as an indicator of helix formation. The dye has a moderately strong affinity for the helix, and the quantum yield when bound is about 1000 times that of the free dye. Temperature was measured by means of a thermistor immersed in the DNA solution. Fluorescence and temperature were recorded at about 0.01 °C intervals by means of a data logger. The solid line in Figure 7 shows the raw data with a small level of statistical smoothing for a mixture of the ras and globin fragments. It is seen that fluorescence declines gradually with temperature, interrupted by two sharp drops. Since the gradual decline can be seen with the dye bound to pure poly (dCdG), it is unrelated to melting, and we have used the poly (dCdG) function to infer the fluorescence intensity if it were not suppressed by the temperature. The corrected function is indicated by the dashed line. The narrow melting transitions of both sequences stand out sharply. Temperature uncertainty has been accommodated by superposing the transition peaks of the globin fragment Z (identical in both experiments). The same data for the ras transition are replotted differently in Figure 8, where experimental curves can be compared with the results of the Poland-Fixman calculation. Aside from the scatter in experimental points, the experimental peaks are slightly but distinctly broader.

This may be largely attributable to incomplete temperature uniformity in the cuvette, a consequence of the rate of continuous temperature elevation and the thermal conductivity of the holder, fused silica and the solution itself.
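The poly (dCdG)-based correction described above can be sketched as a simple ratio. The exact form of the correction is not stated, so the sketch assumes that thermal quenching acts as a multiplicative factor and that the reference curve, measured at the same temperatures, captures that factor. The linear reference decline, the transition parameters, and the function name are hypothetical.

```python
import math


def correct_thermal_quenching(raw, reference):
    """Divide raw fluorescence by the reference curve normalized to its first point.

    Assumes quenching is multiplicative and that both series were sampled
    at the same temperatures.
    """
    if len(raw) != len(reference):
        raise ValueError("raw and reference must be sampled at the same temperatures")
    baseline = reference[0]
    return [value * baseline / ref for value, ref in zip(raw, reference)]


temps = [60.0 + 0.1 * i for i in range(201)]          # 60 to 80 degrees C

# Hypothetical reference: dye on a helix that does not melt in this range,
# losing 1 percent of its starting signal per degree.
reference = [1.0 - 0.01 * (t - 60.0) for t in temps]

# Hypothetical sample: one two-state transition near 70 degrees C,
# multiplied by the same quenching factor.
helix = [1.0 / (1.0 + math.exp((t - 70.0) / 0.3)) for t in temps]
raw = [h * q for h, q in zip(helix, reference)]

corrected = correct_thermal_quenching(raw, reference)
worst = max(abs(c - h) for c, h in zip(corrected, helix))
print(f"largest deviation from the underlying transition: {worst:.2e}")
```

In this idealized case the correction recovers the underlying transition exactly; with real data the reference and sample would differ in noise and in dye loading, so the recovery would only be approximate.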

Data analysis: inferring midpoints

It was next considered whether the peak shift shown in Figure 8 is adequate for discrimination of a mutant pattern when the data are derived from a mixture. The precision in defining the midpoint of the melting transition can be estimated by error analysis, using the least symmetrical globin peak as a worst case. The top half of the peak was fitted to a modified Gaussian function by means of the Marquardt algorithm, which arrives at the values of the midpoint (melting temperature), the breadth and amplitude of the peak, and parameters of a hypothetical linear baseline that give a minimum chi-squared value. The chi-squared as a function of arbitrary midpoint displacement was then computed, allowing all other parameters to reach optimum values. The results indicate that chi-squared very sharply defines the midpoint, and it doubles for an error of +/-0.01 °C. This implies that the expected peak shift for a G to A substitution or a mismatch is likely to be 10 to 60 times as large as the error in estimation of the midpoints. If mutation detection is to depend on quadruplets rather than merely peak shifts, the ease of identification is further enhanced.
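The idea of scanning chi-squared against midpoint displacement can be sketched in a much reduced form. The version below is not the Marquardt fit described above: it uses a plain Gaussian with no baseline, holds the breadth fixed, and re-optimizes only the amplitude (by closed-form linear least squares) at each trial midpoint. The simulated peak, noise level, and all names are hypothetical.

```python
import math
import random


def gaussian(temp_c, midpoint_c, width_c):
    """Unit-height Gaussian peak shape."""
    return math.exp(-0.5 * ((temp_c - midpoint_c) / width_c) ** 2)


def chi_squared_at(midpoint_c, temps, signal, width_c, sigma):
    """Chi-squared for a trial midpoint, with amplitude at its least-squares optimum."""
    shape = [gaussian(t, midpoint_c, width_c) for t in temps]
    amplitude = sum(s * y for s, y in zip(shape, signal)) / sum(s * s for s in shape)
    return sum(((y - amplitude * s) / sigma) ** 2 for s, y in zip(shape, signal))


# Simulated derivative peak with seeded noise (all values assumed).
rng = random.Random(0)
TRUE_MIDPOINT, WIDTH, SIGMA = 70.0, 0.3, 0.01
temps = [69.0 + 0.01 * i for i in range(201)]
signal = [gaussian(t, TRUE_MIDPOINT, WIDTH) + rng.gauss(0.0, SIGMA) for t in temps]

# Scan trial midpoints in 0.001 degree steps around the peak.
candidates = [69.9 + 0.001 * i for i in range(201)]
profile = [chi_squared_at(m, temps, signal, WIDTH, SIGMA) for m in candidates]
best_index = min(range(len(profile)), key=profile.__getitem__)
best_midpoint = candidates[best_index]

print(f"best midpoint: {best_midpoint:.3f} C")
print(f"chi-squared at best midpoint: {profile[best_index]:.1f}")
print(f"chi-squared 0.01 C away: {chi_squared_at(best_midpoint + 0.01, temps, signal, WIDTH, SIGMA):.1f}")
```

How quickly chi-squared rises away from the minimum depends on the noise level and sampling density chosen here, so the numbers printed are illustrative and are not a reproduction of the reported doubling at 0.01 °C.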

Quadruplets in the fluorescence pattern

The theoretical calculation should provide a reasonably realistic anticipation of data from a sample containing reassorted strands of a wild-type and a putative mutant (non wild type). Using the Marquardt algorithm we have deconvoluted the simulated complex profile, assuming that there are precisely four components. The melting midpoints inferred from the deconvolution agree with all of the values of the separate transitions used to construct the simulation within 0.01 °C. Although the sample and reference sequences (e.g., PCR products) mixed for reassortment will usually be at different concentrations, the calculation will operate if the minor component is visible.

Surface-bound arrays

Measurements in solution like these could be implemented with a microtiter array or the like for effectively simultaneous recording of a large number of samples. However, the possibility of a surface-bound array offers substantial advantages, both in the number of samples that can be accommodated within manageable dimensions, and in simplifying heat transport problems.

In the experiments above, the signal was derived from about 0.1 ng of DNA, somewhat more than can be anchored within a 1 mm square flat surface. With surface-bound DNA, temperature uniformity needs to extend less than 1/2 micron into the solution, such that the difference top to bottom can be reduced to 0.001 °C in less than 0.1 second with a temperature ramp of 1/3 °C/sec. Since the unraveling time for a 500 bp helix can be estimated as less than 1 msec, melting equilibrium is easily maintained. The time required to determine the melting profile would be limited by the detector system, not by the sample.
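An order-of-magnitude check on the heat transport claim can be made with the usual diffusion scaling, time ≈ depth² / diffusivity. The sketch below assumes a textbook thermal diffusivity for water of about 1.4e-7 m²/s and estimates the temperature lag across the layer as the ramp rate multiplied by the diffusion time. Both the diffusivity value and the lag estimate are assumptions introduced for illustration.

```python
def diffusion_time_s(depth_m, diffusivity_m2_per_s):
    """Characteristic time for heat to diffuse across a layer of the given depth."""
    return depth_m ** 2 / diffusivity_m2_per_s


DEPTH_M = 0.5e-6            # the 1/2 micron layer mentioned above
WATER_DIFFUSIVITY = 1.4e-7  # m^2/s, assumed textbook value for water
RAMP_C_PER_S = 1.0 / 3.0    # ramp rate mentioned above

equilibration_s = diffusion_time_s(DEPTH_M, WATER_DIFFUSIVITY)
lag_c = RAMP_C_PER_S * equilibration_s  # crude estimate of top-to-bottom lag

print(f"diffusion time across the layer: {equilibration_s:.2e} s")
print(f"estimated temperature lag during the ramp: {lag_c:.2e} C")
```

Under these assumptions both figures come out several orders of magnitude inside the bounds stated above (0.1 second and 0.001 °C), consistent with the conclusion that the sample itself is not the limiting factor.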

Projection to a multiplexed system

A further elaborated system may be developed to scan, for example, a 1 inch square, 15x15 array of 225 spots, providing scrutiny of 67,500 kb for single base variants in 20 minutes, including computing time, over a temperature ramp of 10 or 20 °C. The profile of each spot is informative, and the freedom to include a large number of independent sequences permits multiple probes.
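The array figures quoted above can be cross-checked with simple arithmetic. The sketch below divides the quoted total evenly among the spots and, as a purely hypothetical comparison, works out the time budget per spot if the 20 minutes were shared out one spot at a time. The even division and the sequential budget are assumptions; the per-spot figure is expressed in whatever unit the quoted total carries.

```python
ROWS, COLS = 15, 15
TOTAL_SCANNED = 67_500  # total quoted above, in the unit stated in the text
SCAN_MINUTES = 20       # total time quoted above, including computing time

spots = ROWS * COLS
per_spot = TOTAL_SCANNED / spots              # assumes equal sequence length per spot
seconds_per_spot = SCAN_MINUTES * 60 / spots  # hypothetical sequential budget

print(f"spots in the array: {spots}")
print(f"sequence per spot (same unit as the total): {per_spot:.0f}")
print(f"time budget per spot if read one at a time: {seconds_per_spot:.2f} s")
```

Since all spots share one temperature ramp and can be imaged together, the sequential budget is only a point of comparison and not a description of how the scan would be run.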

Of the several approaches to surface anchoring that have been tested, biotin binding to silicon-linked neutravidin appears the most promising. This method permits considerable flexibility in application in that it does not require elaborate, factory-only chemistry as necessitated by current oligonucleotide methodologies to prepare arrays. Any distribution of PCR products or single, biotin-ended strands, or reassortment of PCR products, would be accommodated. However, any means by which DNA samples can be retained in a small, defined locale as a close array can be regarded as equivalent.