MOLECULAR COMPUTER - UNIV SOUTHERN CALIFORNIA

Title:

MOLECULAR COMPUTER

Document Type and Number:

WIPO Patent Application WO/1997/007440

Kind Code:

A1

Abstract:

A molecular computer carries out computations. Solutions are represented by molecules. Procedures are carried out on the molecules to separate the molecules in a way to make it more likely that the molecules represent the proper result.

More Like This:

JPH04340157	ASSOCIATIVE LEARNING SYSTEM
JPH05197703	LEARNING SUPPORTING DEVICE
JP3417574	[Title of Invention] Neuron element for fuzzy logic operation

Inventors:

ADLEMAN LEONARD M

Application Number:

PCT/US1996/013532

Publication Date:

February 27, 1997

Filing Date:

August 21, 1996

Export Citation:

Click for automatic bibliography generation Help

Assignee:

UNIV SOUTHERN CALIFORNIA (US)

International Classes:

G06N99/00; G06N3/12; (IPC1-7): G05B11/01

Other References:

SCIENCE, 28 April 1995, Vol. 268, BAUM, E.B., "Building an Associative Memory Vastly Larger than the Brain", pages 583-585.

Download PDF:

View/Download PDF PDF Help

Claims:

What is claimed is:

1.

A method of performing a computation using molecular sequences, comprising: obtaining a plurality of different molecular sequences; associating said plurality of different molecular sequences to a set of possible solutions to the computation such that molecules represent solutions to the computation, said set of possible solutions representing possible solutions to the problem; and carrying out a chemical procedure on all of said plurality of molecular sequences, said chemical procedure making it more likely that the molecular sequences represent the desired results.

2.	A method as in claim 1 wherein said molecular sequences are DNA.

3.

A method as in claim 1 , wherein said computation has correct and incorrect solutions, further comprising determining at least one characteristic of said molecular sequences which represent said incorrect solutions , and said carrying out comprises removing said molecular sequences having said at least one characteristic.

4.	A method as in claim 1, wherein said molecular sequence is a strand of molecules with different molecules in different positions .

5.	A method as in claim 1, wherein said molecular sequence is a backbone sequence with specified chemically bonded species at specified positions .

6.

A method of computing using DNA as a computing medium, comprising: providing a plurality of first DNA molecules; assigning a plurality of first DNA molecules to a plurality of states, said states comprising a set of pieces which can be connected to solve a problem; adding a reaction medium to the DNA molecules , which selectively connects together the DNA molecules to randomly form a plurality of randomlyformed second DNA molecules , carrying out at least one procedure on the second DNA molecules to remove second DNA molecules sequences which do not represent the proper solution, according to a predetermined criteria; and establishing the DNA molecules which remain after said at least one operation as being more likely to represent a result to the problem. A method as in claim 1 , wherein a sufficient number of said first DNA molecular sequences being provided such that said second DNA molecules sequences comprise a universe which is statistically likely to include a complete set of all solutions to the problem.

7.	A method as in claim 6 wherein said at least one chemicallybased operation includes separating second DNA molecules having predetermined properties from the other DNA molecules which do not have the predetermined properties .

8.	A method as in claim 6 wherein said at least one chemicallybased operation includes amplifying only those DNA molecules which have certain properties and not amplifying DNA molecules which do not have the certain properties .

9.	A method as in claim 8 wherein said separating is done by attaching second DNA molecules having predetermined properties to magnetic beads .

10.	A method as in claim 8 wherein said separating is done by gel electrophoresis and subsequent separating of resulting bands.

11.	A method as in claim 6 further comprising attaching a sticker to a predetermined portion of the second DNA molecules, the sticker being indicative of some characteristic of the operation which has occurred.

12.	A method as in claim 9 further comprising separating DNA molecules which have predetermined stickers in a predetermined position from DNA molecules which do not have said predetermined sticker in said predetermined location.

13.

A method of solving a complex problem using molecules to carry out the computation, comprising: separating the problem into aspects, each aspect comprising a part of the solution to the problem; defining an alphabet of allowable substances , and a universe of substances formed from said alphabet, said universe of substances including a plurality of substances each having a different state, which states can be chemically distinguished from one another; combining the molecular sequences in random ways , to provide a plurality of combined substances, said substances representing possible connections between said substances and hence representing a possible solution to the problem, determining at least one chemical procedure for a proper solution to said problem, said at least one chemical procedure changing the substances in a way to make it more likely that the substances represent solutions to the problem which are correct; and processing the substances left after said testing, as likely solutions to the problem.

14.

A method as m claim 14 wherein said combined substances represent a universe of all combined substances statistically being likely to represent all possible states representing all possible solutions to the problem and wherein said processing comprises identifying substances left by said changing, and, carrying out an errorreducing operation to exclude at least some substances do not represent errors from the operation steps.

15.	A method as m claim 15 wherein said error reducing operation comprises repeating said changing operation to ensure that improper substances are removed from the solution by said changing.

16.	A method as in claim 15 wherein said error reducing operation comprises identifying a substance which represents said likely solution, identifying the solution it represents, and using an electronic computer to verify that the solution is correct.

17.	A method as in claim 14 wherein said substances include at least one start and end sequence thereon.

18.	A method as in claim 14 wherein the problem to be solved is NP complete .

19.

A method of computing using a molecular computer, comprising: obtaining a plurality of molecules, sa d plurality of molecules comprising a universe including possible solutions to a problem; carrying out at least one chemical reaction on said universe of molecules, said reaction removing molecules which do not represent a proper solution; and Iteratively carrying out another chemical reaction to continue to exclude molecules until a desired result is obtained. A method as in claim 20, further comprising an error reducing process to reduce errors in the chemical reaction.

20.	A method as in claim 21 wherein said error reducing comprises repeating said chemical reactions a plurality of times to minimize error in the chemical reactions.

21.	A method as in claim 22 wherein said error reducing comprises an initial step of equalizing a probability that negative results are obtained erroneously and that positive results are obtained erroneously.

22.	A method as in claim 23 wherein said error reducing comprises identifying the molecules which represent the proper answer, identifying the solution said proper answer molecule represents , and using an electronic computer to verify said solution.

23.

A molecular computer, comprising: a first container of molecules , said first container of molecules including a plurality of molecules with different chemical characteristics therein, and each of said plurality of molecules comprising a representation indicating some aspect of a possible solution to a problem; a first molecule processing apparatus , said first molecule processing apparatus including an element which removes molecules having a certain chemical characteristic from said first container, said certain chemical characteristic being a characteristic which indicates some aspect related to whether said molecule is a proper solution to the problem; and an analysis element which monitors molecules in the first container, to determine characteristics of the molecules remaining in the first container and using said characteristic to determine a solution to the problem.

24.	A computer as in claim 25, wherein said first molecule processing apparatus is an electrophoretic separating element.

25.	A computer as in claim 25, wherein said first molecule processing apparatus is an amplifying device, which amplifies desired molecules, thus effectively diluting undesired molecules .

26.	A computer as in claim 27, wherein said amplifying device is a PCR device.

27.

A computer as in claim 25, wherein said plurality of molecules is formed by: obtaining molecules indicating pieces of possible solutions; obtaining complement molecules , which bind to and hold together said pieces of the original solution to other pieces; and combining said pieces and said complements into a solution so that all pieces are statistically likely to combine with all complements to form all possible solutions .

28.

A method of producing a computational result comprising: determining a problem, and determining pieces of the problem which pieces can be combined to form a solution; obtaining a plurality of substances and assigning substances to specified pieces; obtaining coupling agents for the substances , said coupling agents having a capability of binding together any of the substances representing a piece with any other substances that represents an allowable piece which can be coupled to said one piece; and combining said substances with said coupling agents in a solution, to provide combinations of substances in the solution which represent a plurality of possible solutions .

29.	A method as in claim 30, further comprising carrying out an additional operation, based on characteristics of substances in the solution, to remove all substances from the solution which represent incorrect solutions.

30.	A method as in claim 31, wherein said substances are DNA molecules.

31.	A method as in claim 30, wherein said additional operation includes a polymerase chain reaction to reproduce desired molecules.

32.	A method as in claim 31, wherein said removing comprises electrophoretically grouping substances according to their size, and excising selected substances according to their size.

33.

A computer which carries out calculations based on molecular reactions, comprising: a plurality of different kinds of substances, arranged in respective containers; a plurality of different kinds of reaction mechanisms for the substances, each of said reaction mechanisms comprising a mechanism which allows selectivity in reacting with certain substances; a plurality of moving elements operating to move the containers of substances from a first position to a second position, an electronic computing element, said electronic computing element operating to control said moving elements, and to control positions of said moving elements according to a predetermined sequence of operations, said sequence of operations using a molecule processing element to carry our a first operation which separates substances which have a predetermined property from substances which do not have the predetermined property, the predetermined property being one which is related to a problem to be solved, and the separating being one which makes the separated substances more likely to be indicative of an answer to the problem.

34.	A computer as in claim 35, wherein said electronic computing element also includes structure operating to investigate substances in said tube after said first operation, to assign said substances with their answers, and to test said answers to determine that said answers are correct.

35.	A computer as in claim 35, wherein said substance processing element which includes magnetic beads which stick only to certain chemicals .

36.	A computer as in claim 35, wherein said substance separating elements include a gel electrophoresis and separating system.

37.

A computer as in claim 34, wherein said substances include first, marker receiving portions, said marker receiving portions either including a marker or not having a marker, and one of said substance processing elements is an element for adding a marker, another of said substance processing elements being an element for detecting the presence of a marker in a specified location.

38.	A computer as in claim 39, wherein said substance includes strongly bond portions defining a chemical makeup of the nonmarker portions, and weakly bond portions defining the chemical makeup of the marker portions .

39.	A computer as in claim 40, wherein said weakly bond portions of the substance can be removed from strongly bond portions of the substance.

40.

A method of performing a computation using substances, comprising: obtaining a plurality of substances; assigning substances to different solutions or sets of solutions to a problem; carrying out a molecular function on the substance set which makes the set more likely to correspond to an answer to the problem.

41.	A method as in claim 42 , further comprising repeating said reacting step until only solutions remain.

42.	A method as in claim 42 , wherein said substances are DNA substances.

43.	A method as in claim 42, further comprising marking said substances in a way to indicate some aspect of a reaction which it has taken.

44.	A method as in claim 45, wherein said marking is done by attaching a sticker to a specific position in the substance.

45.	A method as in claim 46, wherein said sticker is more weakly bond than some other reaction which the substance has made.

46.

A method of carrying out a computation using a molecule, comprising: obtaining a plurality of molecules which have different properties, associating each of said molecules with a different aspect of the problem; carrying out some operation on the plurality of molecules to change a ratio between the different molecules having the different properties in a way which makes it more likely to solve the problem; and marking some aspect of the thus segregated molecules to indicate said reaction. A method of performing a computation using substances, comprising: obtaining a plurality of different substances; associating said plurality of different substances to a set of possible results to the computation such that substances represent solutions to the computation; and carrying out a procedure on said plurality of substances to modify the substances, said procedure making it more likely that the substances represent the desired results.

Description:

MOLECULAR COMPUTER

Origin of Invention The present invention was supported by the National Science Foundation under grant CCR-9214671. The government may have certain rights in this invention

Field of the Invention The present invention describes computing techniques using chemical substances as the computational medium. More specifically, the present invention describes massively- parallel computing techniques using molecules , such as DNA molecules , the computing techniques being carried out using procedures such as cellular processes or experimental processes

Background and Summary Computing is a science of calculating some result based on some input information A computer usually carries out processing steps on input information that are used to calculate the result.

Computers today are based on the earliest architecture of the UNIVAC 1 which operated based on states of bistable electronic devices . The UN VAC 1 was made using vacuum tubes Solid state physics revolutionized the computer industry, with their smaller s ze, lower power dissipation, and long term stability. These devices made the electronic computer operate stably and reliably. However, it has been said that there have been no radical breakthroughs in the basic concepts of computing since the dawn of the computer industry. See for example, "An introduction to Microcomputers", Adam Osborne, 1976.

Today's computers are far advanced beyond those early computers. Speeds of computing have increased many orders of magnitude since the earliest computers. Computer architectures have become more and more complicated, and computers have shrunk to a smaller and smaller s ze The

size of the devices is imposing physical limits, e.g. from the wavelength of the radiation that is used to etch the materials

Moreover, in order to carry any operation, elements of the computer must change states and hence do work. The work s done in an extremely inefficient manner, although not so inefficient as the vacuum tube Each time the computer processes change state, they must produce some amount of wasted energy which is dissipated as heat There are two ways of increasing the speed of operation of an electronic computer: Either more switching elements are provided, or the switching elements are driven faster. In either case, the amount of heat produced per unit time increases as the computational power increases The heat produced by these computers becomes progressively more difficult to handle.

Many predict that the electronics-based computers will face a ceiling where the computing power becomes limited by physical size and cooling requirements . Alternative techniques have been suggested for computing. Many of these suggestions have been nothing short of science fiction Some have suggested that crystal faces be used to store and process information These and other suggestions have been largely prophetic, as no one has ever suggested a truly practical way of effecting a non- electromechanical technique of computing to obtain results. The inventor of the present invention noticed certain similarities between DNA or molecular interactions and the way that a computer operates He noticed that many of these DNA processes can be carried out in a selective way that is they will combine in a certain way to the exclusion of other combinations. The inventor had the totally unobvious insight of allowing this analogous operation to be used to steer molecular reactions to actually carry out a computing function.

The inventor noticed that the strength of this technique is in its parallelism. A drop of DNA can include

more than 10 ¹⁴ molecules. Each of those molecules has the capability of representing a specific state. Hence that drop can represent more than 10 ¹⁴ states.

The present invention describes how these procedures can be used to carry out multiple operations . Each substance, e . g. , a molecule, is capable of reacting. The proper reacting procedure is used which allows selective reacting of the di erent molecules . This selective reacting uses these molecules to effect complex operations in a massively parallel manner - where each operation can be carried out on all, or at least a large portion of the DNA molecules in parallel.

A first embodiment of the present invention describes how extremely complicated problems could be done in a few hours using DNA as a computing medium. More generally, however, any molecule or complex (i.e. collection of molecules which are held together by non-covalent bonds) which is capable of reacting in a specified way can be used according to the present invention. All of these features will be described in further detail throughout this specification.

Another aspect of the invention regards how the different characteristics of this kind of computer are improved. For example, the inventor realized that the error rate in such computers is many orders of magnitude higher than that in electronic computers . The present invention describes how this hurdle can be overcome.

As evident from the above, it is an object of the present invention to define a computer system, and a mechanism for operation of that computer system, which carries out computational operations in an unconventional manner using substances such as molecules . One preferred mode of operation of the present invention uses DNA processes which are similar to the DNA processes used by the human body. Massive parallelism of the computation is enabled by carrying out operations at a molecular level . Each step allows millions of parallel operations to occur.

The present specification shows in detail the impressive potential of molecular computation. However, the problems which are to be solved on molecular computers must be carefully chosen and defined A new kind of programming language may be necessary to define the steps that the DNA computer carries out.

Electronic computers can execute a variety of operations and there is great flexibility in executing these operations. Two 100-dÎ¹g t integers can be multiplied quite efficiently on an electronic computer It would be a daunting and very inefficient task to do such a calculation on a molecular computer using currently available protocols and enzymes. This might be considered analogous to the early days of electronic computing, where adding 2+2 on the UnÎ¹vac-1 might have been considered unreasonably difficult because of the precise process control that was necessary to get consistent results. Today, however, this is a trivial task.

Certain intrinsically complex problems such as the directed Hamiltonian path problem and others described herein may be efficiently solved by massively parallel searches. These searches can be organized to take advantage of the characteristics of molecular biology as described herein. The techniques of the present invention may be quite well suited for solution of many NP-complete problems.

An important advantage of this system was noticed by the inventor only after he actually conceived of the invention in its fullest form. This system is extremely energy efficient. In fact, the inventor has hypothesized that this system comes as close to the limit for energy efficiency as is possible. Natural cellular processes have had billions of years to evolve. One consequence of this evolution is extreme energy efficiency. The inventor hypothesizes that almost no further efficiency is possible, because any further efficiency would actually prevent the operation from occurring reliably. This system is many

orders of magnitude more efficient than any electronic computer.

It is another object of the present invention, therefore, to describe a computer which is extremely energy efficient.

All of this is accomplished by obtaining a plurality of substances, e . g , molecules or complexes, which are of a type that can combine with other substances. Assuming the substances to be molecules, states are assigned to each of the molecules. Enough molecules are provided to make it statistically almost certain that the set of molecules form a universe of many results. The molecules are selectively processed by procedures, e . g. , molecular reactions of a type that divide certain molecules, representing desired results, from the molecules that represent undesired results. The processes are operated to carry out various functions and obtain various results. Importantly, however, the processes are carried out in massively parallel systems, whereby each possible process is carried out many times , in parallel.

When the computer is operated based on DNA processes , these DNA processes mimic the operation of nature on a molecular level .

Brief description of the Drawings These and other aspects of the invention will now be described in detail with reference to the accompanying drawings , wherein:

Figure 1 shows a diagram of a simple Hamiltoman path problem; Figure 2 shows a flowchart of the technique used according to the first embodiment to determine the solution to the proble ;

Figure 3 shows a second embodiment of the Hamiltoman path problem; Figure 4 shows how a Hamiltoman can be encoded as DNA sequences;

Figures 5A - 5C show how DNA representing different states are processed in certain ways; and

Figure 6 shows a practical embodiment of the molecular computer, and Figure 7 shows a flowchart of operation of the practical molecular computer.

Description of the Preferred Embodiment

The present invention defines an entirely new way of carrying out operations to achieve results The operations are carried out using a number of parallel chemical reactions

Definitions The following definitions apply throughout this specification and claims.

A procedure is any process that alters properties of chemicals This includes, for example, a chemical reaction, a chemical process, a cellular reaction, a cellular process, an experimental procedure, an experimental process, a laboratory procedure or a laboratory process.

A substance is any molecule or a complex of molecules This includes DNA and other molecules

A sticker is a second substance which can be attached to and/or removed from a first substance to modify the information represented by the first substance The first substance might be referred to as a backbone

A problem is a question for which a "correct solution" needs to be determined. Problems can have a number of "possible solutions" of which a portion are "correct solutions" and the rest are "incorrect solutions."

First embodiment - The Hamiltoman Path Problem.

The first embodiment of the invention was directed to solving the so-called Hamiltoman path problem The

Hamiltoman Path problem can be posed as ollows. Assume a fixed number of nodes and possible paths between the nodes.

The correct solution to the problem requires determining whether a path can be made which starts at one node, visits all other nodes once and only once, and finally arrives at the last node. Stated more mathematically, given a directed graph G with designated vertices V _Î¹n and V _out. This graph G has a Hamiltonian Path only and only if there exists a sequence of compatible one way edges e _:, e ₂, e ₃... The path for a correct solution includes a number of one-way edges must begin at V _Î¹n, end at V _out, and enter every other vertex exactly once.

A four-node graph with a hamiltonian path will be described with reference to FIG. 1. The correct solution to the problem for the four-node graph is trivial . However, the correct solution becomes progressively and exponentially more difficult to solve as the number of nodes increases.

FIG. 1 shows a graph having V _Î¹n = A and V _out = D. Th s has the shown Hamiltonian path given by the edges A-B, B-C, C-D. With the small number of nodes shown, it can be easily seen that there is a hamiltonian path. The reader can easily work out that changing V _Î¹n to node B obviates there being any Hamiltonian path, because no edge will enter vertex A.

The Hamiltonian path problem is known to be NP complete. This means that mathematicians suspect that there is no efficient way to solve this problem and that essentially it must be solved by trial-and-error. As the number of nodes gets higher, this becomes more and more computationally extensive. All known algorithms have exponential worst case complexity. An important realization by the inventor of the present invention is that this problem, and other problems of this type, can be posed in a way that allows calculation using a number of parallel computations , each of which uses a small number of simple techniques . The number of these computational techniques is linear, rather than exponential, as a function of the size of the problem.

The path is analyzed according to the flowchart shown in FIG. 2.

Step 200 generates all possible paths through the graphs This is done by assigning each of the nodes to a label , which label is represented by a DNA molecule This embodiment is carried out using DNA as the preferred molecule. Hence the ith vertex in the graph is associated with a DNA molecule which is denoted as O _Î± .

This first step obtains the universe U including every possible path through the directed graph The remainder of the process consists of determining criteria that can be used to identify all incorrect paths representing the incorrect solutions, and discarding those paths until only the correct path representing the correct solution, if any, remains.

The solution to the hamiltoman path problem must begin at V _Î¹n and end at V _out. Hence, all paths which do not begin with V _Î¹n and end with V _out are discarded as incorrect paths at step 202 to form a modified universe U _x . Another important feature of the hamiltonian path problem is that it must enter each of the vertices only once. Hence, for N vertices, there must be only N - elements in the path. Step 204 discards all paths that have any number of vertices other than N. This forms yet another modified universe U ₂ Hence, U ₂ includes paths staring at V _Î¹n and ending at V _out, and which enter the proper number of nodes .

If any path in U ₂ enters each of the vertices, it is a Hamiltonian path and hence a correct solution Step 206 discards everything that does not enter every vertex of the graph. The result is the solution S. If S has any DNA in it, then the graph has a Hamiltonian path represented by the molecules which have survived the test. If not, there is no solution to the Hamiltonian path problem. As described above, one important feature of this embodiment is its ability to generate many, many molecules, each of which represents some part of the problem. Since

there are so many molecules generated, it becomes statistically almost a certainty that all possible solutions are represented by the molecules. The universe of all possible solutions is then selectively further processed, to remove all incorrect solutions. The process will now be described m more detail herein.

According to the present invention, each possible solution is assigned to a designated sequence of DNA. First, a brief primer on DNA and its combinatorial characteristics.

DNA is a double-stranded molecule in which one strand of DNA is annealed to a second strand of DNA (i.e. , the "complement") . The strands of DNA are polar in nature, with a phosphate group at the 5" end, and a hydroxyl group at the 3' end. There are four basic components (i.e., nucleotides) that make up each strand of DNA structures: A, G, T and C. Each nucleotide has a so-called Watson-Crick Complement, which determines whether two strands of DNA can anneal to each other. The nucleotide A in one strand of DNA will only anneal to the nucleotide T in a second strand of DNA. Similarly, the nucleotide C will only anneal to the nucleotide G. An example of a double-stranded molecule is shown here with its phosphate and hydroxyl groups.

5' -P-ATAGCGT-OH-3 ' 3 '-OH-TATTCGCA-P-5 '

For example, the Watson-Crick complement of ATTCG is TAAGC. DNA's chemical composition allows a first stand of DNA to anneal to a DNA molecule which is the "complement" of the original compound. DNA molecules usually only anneal to other DNA molecules which contain their complement.

The present embodiment first obtains a universe of all possible paths. This is done by obtaining a unique DNA molecule for each part of each path, and then combining those parts to get each possible path.

Each vertex in the graph is assigned to a particular DNA sequence O. We define the DNA sequence as including a forward portion O _f and a rear portion O _r. Now, for each

transition (edge) between two vertices, we assign a specific combination of the previously assigned parts of vertices. That is, for an edge from vertex A to B we assign the sequence A _rB _Â£. Figure 1 shows four vertices A, B, C and D. Vertex A is assigned for example to the 6-mer 5 'AGTCAT3 ' . The 6-mer sub-divides this DNA molecule into A _f and A _r, where A^ is AGT and Al is CAT. Similarly, B, C and D are assigned with DNA sequences: B is assigned 5 'TTGACG3 ' where TTG is B _f and ACG

Now, the edge from A to B is defined as the edge including the last part of the first vertex (AJ and the first part of the second vertex (B _f) Hence, A-B is encoded as A _rB _f; here 5 'CATTTG3 ' . To state this in a more complete way, for each edge I-J in the graph, an oligonucleotide O _{j Î±} is created. If the oligos are x mers , then the oligonucleotide O ₃ is the 3 ' x/2-mer of O _Ï followed by the 5' x/2-mer of O _j This construction preserves all edge orientation. For example, 0 _{2 3} will not be the same as 0 ₃₂.

The sequence that s the Watson Crick Complement of O _x is denoted as (Î¸ _! not) .

Now, recall that we are attempting to follow step 1 of the algorithm: generating all possible random allowable paths through the graph. In order to do that, we start with a mixture of DNA molecules representing possible allowable edges between vertices.

Each such DNA molecule representing every edge between vertices of the graph, represent every allowable transition and hence each possible solution. Moreover, there are multiple copies of each such DNA molecule . Every DNA molecule representing every edge, that is, for example, DNA molecule 1 representing A^B, DNA molecule 2 representing B-C, and so on is included in the mixture. Once all these parts are combined, the next step is to bind together all possible compatible edges. Accordingly, we also add the complement of each original vertex. For

example, (A not) will bind together A _f to A _Î», thus binding all paths that end with A _f to all paths that begin with A _x.

Returning to our example given above, therefore, A is 5ÎGTCAT3' , B is 5 'TTGACG3 ' , C is 5'GATTAC3' . (B not) is 3ÎACTGC5' . Hence the 3' part of (B not) will complement or bind to the B _f in A-B edge.

We add into the solution DNA molecules for each edge, where A-B is 5 *CATTTG3 ' (A _xB _f) ; B-C is 5 'ACGCAT3 ' (B^) . We also add all vertex nots . (B not) equals 3 'AACTGC5 ' . The 3' part of the (B not) molecule, (B _f not) , will anneal to the B _f in the A-B edge. The 5" part of the (B not) molecule, (Bj not) , will bind to the B _x in the B -C edge. The vertex nots effectively link or connect together the molecule representing A-B to the molecule representing B-*C . It will bind together these edges to form eventual larger, or combinatorial , molecules .

This reaction continues to form a plurality of DNA complexes that are longer in sequence. If there are sufficient molecules in the sample, these sequences will represent every possible path formed between the vertices. That is , all random allowable paths through the sequence will have been generated at step 200.

DNA ligase is used to create a covalent bond between the 5 ' end of one DNA molecule and the 3 ' end of another DNA molecule. Ligase is an enzyme that repairs breaks in DNA and can connect two DNA molecules end to end. Thus, the ligase ensures that the molecules representing A-B and B-C, etc. , are covalently connected end to end to become a single molecule. Likewise, the vertex nots making up the other strand of DNA are covalently/chemically connected end to end by ligase. The ligase thus creates two (one for each DNA strand) long continuous chains of DNA that can withstand the further processing that is necessary to eliminate the molecules representing the undesired paths Step 202 requires discarding everything that does not start with v,â and end with vâ,â..

The operation according to the present embodiment uses an amplification technique to narrow down the proper parts . In this embodiment, we know that the desired sequence will start with A _x and will end with D _f. A polymerase chain reaction ("PCR") is preferably used to selectively reproduce molecules in a particular tube . Here we only want to selectively reproduce those molecules which start with A _x and end with D _f Copying only those certain molecules effectively dilutes the remaining ones out of existence. This is done by adding a primer The primer is complementary to one of the desired ends In other words , we add the primer corresponding to A _Î± and the primer corresponding to (D _f not) This primer pair reproduce any molecules that have the proper ends for example 10 ⁶ times Step 204 discards all paths that do not enter exactly n vertices , where n is the number of vertices . This embodiment will discard sequences which are too long or too short. This operation is carried out using gel electrophoresis . Electrophoresis is a process of using electronic separation to cause migration of molecules through a matrix. Electrophoresis puts all the molecules at one end of a gel medium and exposes them to an electric field. The short DNA molecules move more quickly under the electrophoretic field, and long DNA molecules move more slowly. This forms an eventual product with bands of nucleotides, where the final positions bands in the gel are proportional to the number of nucleotides in the sequence.

The size discrimination according to the present invention includes viewing the bands, and cutting out and discarding all those bands which are too short or too long. If necessary, this process can be performed with a high degree of resolution, permitting discrimination between molecules that differ by only a single nucleotide. U ₂ as formed from step 204 has start and end portions which are correct, and all sequences which are the right length What is left includes the correct Hamiltoman path and incorrect paths which have skipped one or more vertices and hence

visited others more than once. The last step 206 is hence used to keep only those paths which enter each vertex of the graph once.

The preferred mode of this invention carries this out by doing affinity separations. Affinity separation is preferably carried out using lum beads with material attached thereto. These materials have processes which allow them to stick to certain other characterized molecules. For example, one micron diameter agarose beads are used which have a streptavidin binder thereon. Streptavidin has a high affinity for biotin, a ligand that can be attached to the end of a DNA molecule. In this example, biotin is attached to (B not) , and the biotin/ (B not) conjugate is linked to the streptavidin binder of the agarose beads. These prepared beads can then capture DNA molecules that contain B, the complement of (B not) .

If the solution of DNA is passed over the beads, and the beads have (B not) thereon, then DNA molecules containing B stick to the (B not) on the beads . After binding, the excess solution containing DNA molecules that lacked B is poured off. Only that DNA which had B therein sticks to the beads. The B-contaming DNA can then be separated from the (B not) on the beads by heating the beads in solution. That B-containing material is then progressively passed over similar beads carrying (A not) , (C not) and (D not) , with the excess removed. The material that sticks on the last trial will have all of A, B, C and D therein. This is the end result S.

Similar techniques to those described above can be used to process a larger vertex sample.

FIG. 3 shows a graph having V _Î¹n = 0 and V _out = 6. This has a Hamiltonian path given by the edges 0-1, 1-2, 2-3, 3- 4, 4-5 and 5-6. With the small number of nodes shown, it can be easily seen that there is a hamiltonian path. The reader can easily work out that if we change V _Î¹n to node 3 and V _out to node 5, there is no hamiltonian path because there will be no edges entering vertex 0.

The Figure 2 flowchart is used to solve this problem on a larger scale. Step 200 is first carried out by assigning a DNA molecule having a unique sequence to each verte . The length of the DNA molecule depends on how large the graph is . A 20-mer sequence of DNA could be sufficient to encode 4 ²⁰ possible combinations. The preferred mode of this embodiment therefore associates each vertex i in the graph randomly with a 20-mer sequence. For each vertex i in the graph and for each i *j edge in the graph, 50 pmol of (0 ₁ not) and 50 pmol of O , respectively, are mixed together in a single ligation reaction. Each oligonucleotide (50 pmol) with 5 '-terminal phosphate residue, 5 units of T4 DNA ligase (Boehringer-Mannheim, Germany) , ligase buff r, and ddH ₂0 to a total volume of 100 Î¼l was incubated for 4 hours at room temperature. The (O not) oligonucleotides serve as splints to bind oligonucleotides associated with compatible edges together for ligation. Figure 4 shows the assignment and ligation reaction.

A random 20-mer oligonucleotide is generated for each vertex. Figure 4 shows only 0 ₂, 0 ₃ and 0 ₄. Each edge in the graph is effected by an oligonucleotide O _{x D} from the 3' 10- mer of O _i and the 5' 10-mer of 0 ₃. Figure 4 shows the sequences 0 ₂_ ₃ for edge 2-3 and 0 ₃_ _< for edge 3-4. Figure 4 shows all oligonucleotides written as 5 ' to 3 ' except for (03 not) . This results in the formation of DNA molecules encoding random paths through the graph.

The scale of this ligation reaction far exceeds what is necessary for the graph under consideration. For each edge in the graph, approximately 3 X 10 ¹³ copies of the associated oligonucleotide are added to the ligation reaction. Hence it is likely that many DNA molecules encoding the Hamiltonian path are created. In theory, the creation of a single such molecule would be sufficient. Quantities of oligonucleotides less than an attomole would probably be sufficient for this graph. Alteratively, a much larger graph could have been processed with the picomole quantities used here.

Step 202 is carried out by amplifying using PCR with primers O ₀ and (O _e not) . Only those molecules that begin with vertex 0 and end with vertex 6 are amplified. All PCR amplifications were performed on a Perkin-Elmer (Norwalk, CT) 9600 thermal cycler. For amplification in Step 2, 50 pmol of each primer and 5 units of Taq DNA polymerase (Gibco-BRL, Grand Island, NY) in PCR buffer to a total volume of 50 Î¼l were processed for 35 cycles at 9 Â°C for 15 s and at 30Â°C for 60 s. For graduated PCR, 50 pmol of each primer and 2.5 units of Taq DNA polymerase in PCR buffer to a total volume of 50 Î¼l were processed for 25 cycles at 94Â°C for 15 s and at 40Â°C for 60 s.

Step 204 is run on an agarose gel, with a 140-base pair (bp) band, corresponding to double-stranded DNA having 7 vertices. The DNA corresponding to paths entering exactly seven vertices is excised and soaked in doubly distilled water and then gel-purified several times to enhance its purity. All gels were 3 to 5% agarose (NuSieve, FMC Bio- Products, Rockland, ME) in tris-borate-EDTA buffer with ethidium bromide staining

Step 206 finally affinity-purifies the newest sample U2 with a biotin-avidin magnetic beads system. This was accomplished by a two-step process. In step A, the magnetic beads system was used to generate single-stranded DNA from the double-stranded DNA product of Step 3. (0 ₆ not) oligonucleotides were 5 ' biotinylated with LC Biotin-ON Phosphoramidite (Clontech) . To obtain single-stranded DNA, the product from Step 3 was amplified by PCR with the use of primers 0 _o and biotinylated (0 ₆ not) . The biotin group on the amplified product allowed the double-stranded DNA to be annealed to streptavidin group on the paramagnetic particles (Promega, Madison, WI) by incubating in 100 Î¼l of 0.5 x saline sodium citrate (SSC) for 45 min at room temperature with constant shaking. Particles were washed three times in 200 Î¼l of 0.5 x SSC and then heated to 80Â°C in 100 Î¼l of ddH ₂0 for 5 min to denature the bound double-stranded DNA.

The aqueous phase containing single-stranded DNA corresponding to the product of step 3 was retained.

In step B, the magnetic beads system was used to affinity purify DNA containing 0 ₁ . For affinity purification, 1 nmol of biotinylated (0 _: not) was annealed to particles as above and washed three times in 400 Î¼l of 0.5 x SSC Single-stranded DNA was then incubated with these particles in 150 Î¼l of 0.5 x SSC for 45 min at room temperature with constant shaking. Particles were washed four times in 400 Î¼l of 0.5 x SSC to remove unbound single- stranded DNA and then heated to 80Â°C in 100 Î¼l of ddH ₂0 for 5 mm to release single-stranded DNA bound to [O _Ï not] . The aqueous phase containing single-stranded DNA was retained. Only those single-stranded DNA molecules that contained the sequence 0 ₁ (and hence encoded paths that entered vertex 1 at least once) anneal to the bound {0 ₁ not) and are retained. This process is repeated successively with (0 ₂ not) , (0 ₃ not) , (O _t not) and (O _s not)

From a graph theoretic point of view, the use of equal quantities of each oligonucleotide in the ligation reaction is not optimal and leads to the formation of large amounts of molecules encoding paths that do not start at vertex O nor end at vertex 6. Another way to proceed is to first calculate a flow on the graph and to use the results to determine the quantity of each oligonucleotide that is necessary.

The product is amplified by PCR and run on a gel .

Figures 5A-5C show the results of these procedures . In Fig. 5A, lane 1 is the result of the ligation reaction in Step 200. The smear with stÏations is consistent with the construction of molecules encoding random paths through the graph of Figure 3. Lanes 2 through 5 of Figure 5 show the results of the PCR reaction in Step 202. The dominant bands correspond to the amplification of molecules encoding paths that begin at vertex 0 and end at vertex 6.

Figure 3B shows the results of a "graduated PCR" performed on the single-stranded DNA molecules generated

from the band excised in Step 204 Graduated PCR forms a "print out" of the results and is performed by running six different PCR reactions with the use of O ₀ as the right primer and (O _^ not) as the left primer in the ith tube For example, for the molecules encoding the Hamiltonian path 0-1, 1-2, 2-3, 3-4, 4-5, 5-6, graduated PCR will produce bands of 40, 60, 80, 100, 120 and 140 bp in successive lanes. On the molecules encoding the path 0-1, 1-3, 3-4, 4-5, 5-6, graduated PCR will produce bands of 40 , x, 60, 80, 100, and 120 bp in successive lanes. The character x denotes the absence of a band in lane 2 corresponding to the omission of vertex 2 along this path Molecules which encode the path 0-3, 3-2, 2-3, 3-4, 4 -5, 5-6, will appear as bands of x, 60, 80-40, 100, 120, and 140 bp in successive lanes. 80-40 denotes that both a 40-bp and an 80-bp band will be produced in lane 3 (corresponding to the double passage of vertex 3 along this path) .

The most prominent bands in Fig. 5B are those that would arise from the supeÏmposition of the bands predicted for the three paths described above. The bands corresponding to path 0-1, 1-3, 3-4, 4-5, 5-6 were not expected and suggest that the band excised in Step 3 contained contamination from 120-bp molecules However, such low weight contamination is not a problem because it does not persist through Step 206.

Figure 5C shows the results of graduated PCR applied to the molecules in the final product of Step 206. These bands demonstrate that these molecules encode the Hamiltonian path 0 1, 1 2, 2-3, 3-4, 4-5, 5-6. This computation described for the Figure 3 graph, and for steps 200-206, required approximately 7 days of lab work. Step 206, the magnetic bead separation, was the most labor-intensive, requiring a full day at the bench. In general, with use of the algorithm above, the number of procedures required should grow linearly with the number of vertices in the graph. The labor required for large graphs can be reduced by use of alternative procedures , including

the automation technique described herein with reference to Figure 6. Less labor-intensive molecular algorithms can also be used.

The number of different oligonucleotides required also increases linearly with the number of edges . The quantity of each oligonucleotide needed is a rather subtle graph theoretic question. Roughly, the quantity used should be just sufficient to ensure that during the ligation step 200 , a molecule encoding a Hamiltonian path w ll be formed with high probability if such a path exists in the graph. This quantity should grow exponentially with the number of vertices in the graph.

Further, the consideration of the Hamiltonian path problem is only exemplary, and it should be understood that other, computational problems may be solved with similar m thods .

Any n vertex graph with V _Î¹n designated vertices and V _out designated vertices might have multiple Hamiltonian paths Step 206 leaves a solution with molecules representing all Hamiltonian paths for the problem. The graduated PCR at the end of step 206 will produce a supeÏmposition of the bands corresponding to all of these Hamiltonian paths in the n-1 successive lanes .

On an n vertex graph G with designated vertices v _Î¹n and ^v _ouf there may be multiple Hamiltonian paths. If it is desirable to have an explicit description of some Hamiltonian path, that can be accomplished by extending the algorithm as follows. At the end of Step 206, there is a solution in the chemistry sense containing molecules encoding all Hamiltonian paths for <G, v _Î¹n,v _out> The graduated PCR performed at the end of Step 206 will produce the supeÏmposition of the bands corresponding to all of these Hamiltonian paths in the n - 1 successive lanes . For some lane i, a band of least weight (40 bp) will appear. This indicates that some Hamiltonian path begins with v _in and proceeds directly to vertex i . By amplifying by PCR the solution with primers O _x and [O _n not] , running a gel, and

excising the 20 x (n - 1) bp band, one can ensure that only those molecules encoding such Hamiltonian paths will be retained. One now has a solution containing molecules encoding all Hamiltonian paths for <G' , i _ln,v _out>, where G' is the graph where vertex v _in has been removed. This procedure is now iterated.

Many errors can occur in this molecular computer. The occasional ligation of incompatible edge oligonucleotides in step 200, for example, may result in the formation of molecules encoding "pseudopaths" that do not actually occur in the graph. Although such molecules may be amplified during Step 202 and persist through Step 204, they seem unlikely to survive the separation in Step 206. However, the inventor has recognized the power of this computer is in the ability to identify the solution which has a high probability of being correct, even though that solution has some possibility of error.

One simple error correction technique for this embodiment would include using a conventional electronic computer to confirm that a putative Hamiltonian path actually occurs in the graph at the end of any calculation. Another possible source of errors is found during the separation step, where molecules encoding Hamiltonian paths may fail to bind adequately and be lost. Molecules encoding non-Hamiltonian paths may bind nonspecifically and be retained. The latter problem might be mitigated by more stringent or repeated separation procedures . The former problem can be solved by periodically applying PCR with primers designed to amplify Hamiltonian paths (in the example above, primer O ₀ and (0 ₆ not) .) The balanced use of these techniques may be adequate to control such errors .

The choice of random 20-mer oligonucleotides for encoding the graph was based on the ollowing rationale . First, because 4 ²⁰ 20-mer oligonucleotides exist, choosing randomly made it unlikely that oligonucleotides associated with different vertices would share long common subsequences that might result in "unintended" binding during the

ligation step 200. Second, it was guessed that potentially deleterious (and presumably rare) features such as severe hairpin loops would not be likely to arise. Finally, choosing 20-mers assured that binding between "splint" and "edge" oligonucleotides would involve 10 nucleotide pairs and would consequently be stable at room temperature.

This model can also be used to solve "satisf ability" . Satisfiability is a problem where a formula of Boolean elements is given. For example, the formula of the form: Ï = (A or B) and (C or D or not A) ...

This satisfiability problem includes 4 variables, A-D, and 4 connectives (ands and ors) .

The satisfiability problem consists of determining if there is a possible solution, e.g. , a truth - assignment, for which Ï comes out true Here, even using this small set, there are 2* =16 truth-assignments. Each one gives Ï as being true or false, and Ï is satisfiable if and only if at least one truth-assignment gives Ï true.

For the four variable case, the problem can be easily solved by diagramming out each and every possible truth assignment. If there are 70 variables, this leads to 2 ⁷⁰ assignments.

As above, one DNA molecular sequence is used to represent each truth assignment. Molecule j is used to represent bit number 1 being true, F _x is used for bit number 1 being false, T ₂ for bit number 2 being true, etc. These molecules are combined, as alone, to yield molecules encoding all possible 2 ⁷⁰ combinations of true and false sequences i.e. truth assignments. Here, we might use 700- mers for this.

The preferred 700-mer will be of the form for example 05'T ₁F _Z...03* where T^... is a 700-mer. The totality of 700-mers would require approximately 0.93 pounds of DNA, which is about the amount of DNA in the human body. Since there is so many possibilities, we can postulate that each one of the DNA molecules, encoding each of the possible solutions, is contained within that universe. This universe

is appropriately tested as above, to leave only those solutions which match the predetermined criteria, i.e encode truth assignments which satisfy the original formula. Speed and energy efficiency An important feature of the present invention is its ability to operate in a massively parallel way and with incredible energy efficiency. Cellular evolution has optimized energy efficiency processes in the cells.

A typical desktop computer can execute approximately 10 ⁶ operations per second. The fastest supercomputers currently available can execute approximately 10 ¹² operation per second. Call the ligation (concatenation) of two DNA molecules a single operation. If it is assumed that about half of the approximately 4 X 10 ¹⁴ edge oligonucleotides in Step 200 of the Hamiltonian path problem are going to be ligated, then Step 200 will execute approximately 10 ¹⁴ operations. This step could be scaled-up considerably. 10 ²⁰ or more operations could easily be carried out by using micromole rather than picomole quantities At this scale, the number of operations per second during the ligation step would exceed that of current supercomputers by more than three orders of magnitude. This massive number of operations is done with incredible energy efficiency. Hydrolysis of a single molecule of adenosine triphosphate to adenosine monophosphate plus pyrophosphate provides the

Gibbs free energy (ÎG = -8 kcal mol ^"1) for one ligation operation. Hence, in principle, 1 Joule is sufficient for approximately 2 x 10 ¹⁹ such operations.

This is remarkable energy efficiency, considering that the second law of thermodynamics dictates a theoretical maximum of 34 x 10 ¹⁹ (irreversible) operations per joule at 300 K. Existing supercomputers are far less energy- efficient, executing at most 10 ⁹ operations per joule. The energy consumed during other parts of the molecular computation, such as oligonucleotide synthesis and PCR, should also be small in comparison to that consumed by current supercomputers.

Finally, storing information in molecules of DNA allows for an information density of approximately 1 bit per cubic nanometer. Th s represents a dramatic improvement over existing storage media such as videotapes, which store information at a density of approximately 1 bit per 10 ¹² nm ³.

Second Embodiment. First subset

For automation purposes described herein, it is convenient to define a test tube as being a set of molecules of DNA or more formally a multi-set of finite strings over the alphabet A,C,G,T. It is of course possible that in real systems the time and accuracy of a separation may depend on the tube and symbol being considered. However, for convenience it will be assumed that a uniform time and accuracy can be given.

Any of the following functions can be performed within a tube.

1. Separate. Given a tube T and a string S over

{A,C,G,T}, where S is a written string of A,C,G,T and not necessarily an actual molecule . The separate function is used to produce two tubes: +(T,S) and -(T,S) , where-

+(T,S) is all of the molecules of DNA in T which do contain the consecutive subsequence S.

-(T,S) is all of the molecules of DNA in T which do not contain the consecutive subsequence S.

Step 202 in the first embodiment can be thought of a separating steps, since it requires determining which molecules do have specific beginning and end sequences.

2. Merge. Gi en n tubes T _Î±,T ₂... , T _n produce the union of _x to T _n, U(T ₁T ₂... ,T _n) where:

U(T T ₂... ,T _n) = T, U T, ...U T _n

3. Detect. Given a tube T, say 'yes' if T contains any DNA, otherwise say 'no' .

4. Amplify. Given a tube T produce two tubes T' (T) and T" (T) such that T = T' (T) = T" (T) .

This second embodiment uses a more structured language of programming. Tubes are received as input. The output is returned as yes, no or a new tube or set of tubes.

The following program shows inputting a tube, and returning "yes", if and only if the tube contains as sequence which is composed entirely of A. This program assumes a four element alphabet: A, G, T and C. Input (T) Tl = - (T,C) T2 = -(Tl, G) T3 = - (T2, T) Detect (T)

The last step can be changed to OUTPUT (T3) , to alternately return the tube with only components of A.

Second Subset

This second subset of the second embodiment preferably uses a restricted model, which has different characteristics which may make it more practical to implement.

The inventor recognized that the amplify operation used in the first embodiment may be one of the more difficult operations to actually carry out Amplification is a complicated and rare process . Currently ampli ication is most often applied to special biological molecules such as DNA and RNA, and certain living things. Amplification requires construction and processing of covalent bonds. One objective of this second embodiment is to avoid or restrict the use of amplification, and hence to allow increasing the size of the alphabet. One advantage of obviating the amplification is the ability to use substances other than DNA. We therefore define the alphabet for this restricted model of the second embodiment as Î£ which is not necessarily A,C,G,T.

Although DNA has a natural structure allowing the occurrences of certain symbols to be monitored, and to speak of sequences, this may not be true for other types of substances. For example, groups encoding the symbols in other types of substances may simply be placed anywhere on a standard and potentially non-linear molecular backbone. The elements of a "tube" are not, therefore, limited to DNA sequences. More generally, these elements might be called aggregates. These aggregates, in general, are subsets of Î£. Separation will be allowed with reference to symbols only.

The tube is a multiple set of aggregates over an alphabet Î£. Given a tube, one can perform the following operations.

1. Separate. Given a tube T and a symbol sÎ²â, produce two tubes +(T,s) and -(T,s) where +(T,s) is all of the aggregates of T which contain the symbol s and -(T,s) is all of the aggregates of T which do not contain the symbol s .

2. Merge. Give n tubes T _:, T ₂... ,T _n, produce the union, U(T ₁,T ₂, ... ,T _n) where

U(T T ₂, T = T, U T- U ... T _n

3. Detect. Given a tube T, say 'yes' if T contains at least one Aggregate and say 'no' if it contains none.

This more restricted model can be used to solve problems in a similar way to that described above. The second embodiment describes how this model can be used to solve the well-known "3 colorability" problem. 3 colorability problem requires deciding for an undirected graph G = <V,E> whether each vertex in the graph can be colored either red, green or blue in a way such that after coloring, no two vertices connected by an edge have the same color.

As above, the problem must first be posed in a way that allows the problem to be solved by selectivity in molecular operations .

Given an n vertex graph G with edges e _lr e ₂... ,e _z, let â = ( i tt _Î¹ Î¹ , r ₂,b ₂,g ₂, ... ,r _n,b _n,g _n} . and consider the following restricted program on input:

V = "OR".

The program can be written as:

(1) Input (T)

(2) for k = 1 to z:

(a) T _red = +(T,rJ and T _{blue or grÎ²en} = -(T,rJ .

\") -' ^â¢blue ^{â "}" ^" '^blue or green/ "Î¹) **'*'* ^â¢' ^âgreen ^{â â} ( -"-blue or

,,bj .

(c) _red Ï ^oÂ° ^od = -(T _rÎ²d,rJ .

( A _ue Ï'Â°Â° ^d = -(Ï _blue,b .

'Â® /green ^â¢* ^{â â â} ' ^â¢* ^â¢ green ' Î¶TJ *

(f) T' = uiÏ'Â°Â° ^d, _h^Â° ^d) .

(g) T = y _tÎ±; ^900d ,Ï-) where e _k=<i,j>. (3) Output(Detect(T) ) . where, i and j are neighboring edges.

The elements of the input tube T are in one to one correspondence with the universe of possible ways to assign colors to the vertices of G.

Step (2) of the program creates a new tube from which all aggregates Î± with c ₁ = c ₃ , i.e. which assign the same color to vertex i and j, have been removed. After 5k separations, 2k merges and 1 detect, the program will output 'yes' if G is 3-colorable and 'no' otherwise. Hence the

restricted program answers the 3-colorability problem for the graph G in 'linear' time based on the number of edges. An electronic computer, in contrast, would require exponential time for carrying out this function. This shows the value of parallelism, in which step acts on as many as 3" inputs simultaneously.

Third subset

This technique can be used to efficiently solve the well known (NP-complete) 'satisfiability problem' : given a propositional formula (not necessarily in conjunctive normal form) decide if it is satisfiable.

Given a propositional formula in n variables and m binary connectives (and's or or's) and any number of not's, m+1 separations and m merges can be used to produce two tubes, the first contains molecules encoding satisfying truth assignments , the second contains molecules encoding unsatisfying truth assignments. A single final detect on the first tube determines whether the problem is satisfiable or not.

Merging can be done by pouring the contents of the multiple input tubes into a single tube. This is much faster and less error prone than separating. For many of the problems attacked by these restricted computers , detect is done rarely.

Because of the amplify instruction is not used in the restricted model, there may no longer be a need to use biomolecules for computation as described above. Using biomolecules may have several advantages . First, these molecules could potentially be much smaller than those used in the DNA computer. In the DNA computer each symbol is associated with a 10-mer which has a molecular weight of approximately 3000 dalton. In an non biomolecular computer one might hope to associate each symbol with a much smaller molecule. Since mass is roughly inversely proportional to the amount of parallelism available, such small molecules could lead to restricted

molecular computers with much greater parallelism than the DNA computer.

Organic molecules might be constructed which are extremely stable. This could result in a 'catalytic' molecular computer . Merge and separate do not involve the making or breaking of covalent bonds. Detect can also be done without the construction or destruction of molecules . Then, once a computation is completed, all resulting tubes need only be merged into a single tube to recreate the input tube. Hence the molecular computer in a real sense catalyzes the computation. This may be important as a practical matter. For example, a molecular computer for satisfiability could create the input tube T consisting of the molecules encoding all possible truth assignments just once. Then given a formula Ï, T would be used to determine whether Ï was satisfiable. Once that computation was completed, the tube T would be recreated and the next formula processed.

Third, given the freedom to choose arbitrary molecules, it may also be possible that a set of inorganic molecules might be chosen with properties that allow for systems with small error rates .

An embodiment follows. Associate each symbol with a small functional group (e.g. methyl, amino) . Form an aggregate by bonding the appropriate functional groups to a fullerene skeleton. For the purposes of separation, use 'combinatorial' chemistry or rational design to develop enzymes which specifically bind to each incorporated functional group. Such a system would encode an aggregate in approximately one fiftieth of the mass used by the DNA computer. Hence it would provide approximately 50 times the parallelism per unit size. Further, fullerenes are quite stable.

Third Embodiment The restricted model of molecular computation given above in the first and second embodiments is memoryless in the sense that the molecules themselves do not change in the

course of a computation. The state of the computation is represented primarily by the collection of the molecules - i.e., which molecules exist in any tube. The present embodiment describes a system using a molecular computer with memory. In this embodiment, the molecules in the solution can be altered. This alteration causes the molecules to represent stored information.

This model has the same power as the unrestricted model , but also also allows storing results of interim computations, i.e. , storing results of different tests.

This model is used as a memory of a computer.

First subset

A tube is a multi-set of aggregates over an alphabet

Î£ = {aj,a ₂, ... ,a _n,bi,b ₂, ... ,b _n} . The following functions are defined:

1. Separate. Given a tube T and a symbol s e , produce two tubes +(T,s) and -(T,s) where +(T,s) is all of aggregates of T which contain the symbol s and -(T,s) is all of the aggregates of T which do not contain the symbol s.

2. Merge. Given tubes T^T;,, produce the union U(T _lrT ₂) where:

U(Ï _lrÏ ₂) = T, _. U Ï ₂

3. Detect. Given a tube T, say 'yes' if T contains at least one aggregate and say 'no' if it contains none.

4. Flip.

(a) Given a tube T and a symbol _^ e Î£, produce a new tube T ^aÎ¹ = {Î± lÎ± â¬ T} where for all Î± c â, if a ₁ Â£ Î± then Î± ^a ₂ = Î± and if a., e then Î± = (Î± - (a ) U {b .

(b) Given a tube T and a symbol b ₁ e Î£, produce a new tube T^ = { a ^b ₁ \ a â¬ T} where for all Î± c â, if b _x â¬ Î± then a ^b = a and if b _L Â£ Î± then Î±\ = (Î± - (b ) U {a . Hence the flip operations switches a ₁ to b _A (b _x to aj in all aggregates that contain an a ₁ (bj .

Let T = { |Î± c Î£ &Î± = { t _{l t} Â£ ₂,...,< & I = ^a*. ^or < ^=b i=l,2, ... ,n}

Each location f therefore acts like a memory element. Each aggregate in T is a memory with n locations, where the i ^th location contains a 1 if I, = a _^ and contains a 0 if ^ = b _^. This allows executing additional operations and additional instructions that might depend on where the process has been.

Consider the procedure, 'For all aggregates in the tube T, if location 3 has a 1 and location 5 has a 0, then set location 51 to 1'

The following molecular program can be used, for example, to carry this out: (1) Input (T)

(3) T ₂ = + (T _{l t} b

(4) T ₃ = T ₂b ⁵¹

(5) T' = U(-(T,aJ ,-(T.b _s)) An inorganic incarnation is also possible. As above, each a ₁ is associated with some small functional group and b _x is obtained from a ₁ by the non-covalent attachment of a small molecule which binds to a, _^. Second subset Certain processes might require a history to be recorded on the molecule. One example problem - assume a problem which can be solved by solving a series of subproblems, each of which has a unique solution. Test molecules to see if they satisfy the subproblem. Mark those molecules which have been tested. Hence, by testing for the presence of the marking, the molecules which have been tested can be ascertained.

This above-described operation essentially records a history in the working molecules. The present embodiment suggests a technique of using molecular stickers to record this history. The existence of a sticker at each sticker position may indicate whether a material has been through a

particular computation, or has not been through that computation More generally, however, the presence or absence of the sticker is used to determine something about the computation. Assume a limited universe of stickers s _lr s ₂...s ₁₀.

Each sticker can be embodied as a DNA sequence. Each sticker, of course, has an inherent complement {s ₁ not) , (s ₂ not) ... (s ₁₀ not) These complements will be called "locations. " A sticker evaluating molecule can include a complex of locations Each of the locations is either complemented by a sticker or not. The presence of the sticker indicates that the molecule has been tested for the particular subproblem. According to this aspect of the present invention, a plurality of sticker locations are placed at the end of the molecule. If a sticker location is complemented with the sticker, the condition associated with that location is evaluated as a "1". If a sticker location s not complemented, it is evaluated as a "0" While the above description has described one use for the stickers, it should be understood that these memory positions can be used to determine many additional things about the molecule.

For example, an operation can be marked by attaching sticker S ₅ to its appropriate locaton. Later, all molecules can be run across bead-bound S ₅. Only those molecules which are not "marked" with S ₅ will stick.

The sticker evaluating molecule could be the genome of a virus, e.g., the M13 virus. The molecule could alternatively be non DNA, e.g. fullerenes .

The stickers can be any substance which reacts with the sticker evaluating molecule in a predictable way that can be later detected.

The detection can use separations, for example. The seperations can be done based on whether stickers are present. A tube including stickers evaluation molecules with location 1 for sticker s . Separate the tube based on

whether s is contained therein. This provides two different tubes, including approximately 2 ⁶⁹ sequences with S attached and approximately another 2 ⁶⁹ sequences with no s The resulting two tubes respectively have different characteristics.

The generalized molecular computer can also be used to embody the memory model. For each 'location' i, a, is associated with an oligonucleotide 0 ₁ and b ₁ is associated with a methylated version of O _x. Enzymes are used to methylate and demethylate. For example, if O _x =

5 ' ...GAATTC...3' then EcoR I methylase will catalyze the transfer of a methyl group (from S- adenosylmethion ne) resulting in 5' ...GA"ATCC...3' . EcoR I methylase will not methylate oligos without the subsequence GAATTC Other techniques can be used to demethylate and hence clear the stickers. Third subset

This third subset provides renewable and reusable workspaces. This embodiment uses two kinds of stickers - weakly bound stickers and strongly bound stickers. The weakly bound stickers can be removed without affecting anything about the strongly bound stickers. According to this feature, any of the techniques described above are used to provide a plurality of strong bits. The sticker evaluating molecule workspace w is formed of weaker sticker bits. These weak bits can be dissociated from their bonds by heating. According to this technique of the present invention, the strong bits are used to store information or to mark various paths through the operation as described in the previous embodiments. These weakly-bound stickers are used as working memory. The contents of this working memory can be cleared by heating the solution of the substances. The workspace is heated up sufficiently to dislodge these weak stickers but not to dissociate the strong stickers from the sticker evaluating molecule.

Fourth Subset

An alternative memory model facilitates the separation of various different particular elements . This memory model uses a sequence M of a length of, for example, 10,000. The active sequence for M is not so important as the requirement for many copies of M.

This embodiment suggests using the genome of a phage, e.g. a bacterial virus. The phage can be "fed" to form the copies of M. All these copies of M are the same. M includes a plurality of subparts which we call Bl ,

B2...B200. This means that the memory of this embodiment has 200 locations; i.e. it is a 200 bit memory. Each of the subparts is a block of length 50. Each block corresponds to a location in the memory. Each block is broken into three parts, Z, C and O. Z and O are 10 mers , while C is a 30 mer. Conceptually, the portion Z represents 0, O stands for 1, and C represents common.

The locations in memory are written as follows : the present embodiment uses stickers which are oligos. Each block B _x described above has two different sticker oligos. The sticker oligo (Z^ not) represents an indication that the block represents a 0 (that is Z is complemented) . The sticker (C ₁0 ₁ not) indicates that the block represents a 1. For the above example with 200 locations, we need to form 400 sticker oligos; two for each of the 200 blocks. However, since the sticker oligos are short, they can be easily synthesized in large quantities .

Location i can be set to 0 by annealing sticker (Z ₁C ₁ not) to M. Location i can be set to 1 by annealing (C ₁0 ₁ not) to M.

(Z ₁C ₁ not) and (C ₁0 ₁ not) will compete for the location C . Hence, one or the other, but not both, will bind to the location i . Once one location binds , the other location cannot bind.

An advantage of this system is the facility with which the differently-coded elements can be separated. Consider a

string with location 5 set to 0. This string has location B5 annealed to sticker (Z ₅C ₅ not) . The 40 base pairs of B5 include Z ₅ and C ₅ being paired. The 10 base O is unpaired. Hence, those that have B ₅ set to 0 will anneal to the bead- bound (O _s not) . In contrast, other strings that have location B ₅ set to 1 will have location B _s annealed to (C ₅0 ₅ not) . Hence, the O _s is not exposed. Those strings will not anneal to bead-bound (0 ₅ not) .

The molecules in this way can be run over bead-bound (0 ₅ not) . Only those molecules that have location 5 set to 0 will be captured on the beads .

The captured strings include those with location 5 set to 0. These can be released by eluting. Preferably this is done by heating to a eluting temperature. This temperature will break the 10-base pair bond between the O _s and the bead- bound (0 ₅ not) . This eluting temperature is not high enough to break the 40 base pair bond between B ₅ and (Z ₅C ₅ not) , or any other stickers elsewhere on the molecule.

One advantage of this system is that is allows unambiguous determination of l's, 0's and also molecules which are neither l's nor 0's. It also allows a relatively weak bond (only a 10 base pair bond) which can be easily eluted.

Fifth subset - Sticker oligos of XYZ. The fifth subset of this system is a subset of the above fourth subset. In this subset, we will label the parts of the blocks B as, X, Y and Z. Here X and Z are 20 mers and Y is a 10 mer. The sticker oligos which are used are therefore of length 50. Each block B _i consists of X _Î±Y _iZ ₁. We also construct a sticker oligo not) . This requires only 200 sticker oligos, one for each of the 200 blocks. Again, the sticker oligos are short and can be synthesized easily.

Any location i is set to 1 by annealing sticker (X ₁Y _iZ ₁ not) to M. The location i is set to "0" when no oligo is annealed thereto.

Now, bead-bound (Y _^ not) will catch all molecules M which have Y _^ exposed. In other words, those molecules with position 5 that are covered will not be caught by beads which have bead bound (Y ₅ not) . Those molecules can be captured by washing the beads at low temperatures. However, those molecules which do not have position 5 covered will be caught by those beads.

Now, the captured strings can be released by eluting at a temperature sufficient to break the 10-pair base bond, but not hot enough to break the 50 pair base bond.

This fifth subset is simpler, since it requires less stickers be synthesized. However, it is not possible to use this system to detect improper states, i e those states which do not fall into either positive or negative.

Fourth Embodiment

This embodiment describes a DNA-based implementation of the 'restricted model' as described in the third embodiment

The alphabet of symbols Î£ as used herein contains two special symbols s _s and s ₃ Each symbol S e Î£ is associated with an oligonucleotide ("oligo") , i.e. a short sequence of nucleotides. The resulting set of oligos preferably satisfies the following properties ^â¢ (1) Under easily achievable conditions each oligo reliably forms stable hybrids with its Watson-Crick complement. These conditions include, for example, control of temperature, pH, salt concentration or ion concentration

(2) Under easily achievable conditions, each oligo reliably dissociates from its Watson-Crick complement.

(3) Under neither of the conditions above does any oligo form hybrids with itself or another oligo, nor with another oligo ' s Watson-Crick complement.

Of course, some small deviation from these properties is expected, and is covered by the error correction as done herein. For the DNA computer we will require that all input tubes consist of molecules of DNA which begin (5')

with the oligo associates with s ₅, and end (3') with the oligo associates with s ₃ . This latter feature is optional , but allows identifying whether any marked DNA molecules remain in any solution. A Merge of tubes is accomplished by pouring the contents of all tubes to be merged into a single tube.

A Separate is carried out on a tube T with respect to a symbol s as follows. If s is associated with the oligo O, a separator which consists of molecules of the oligo (O not) , the Watson-Crick complement to O conjugated to solid supports are used. For example, the magnetic bead system described in the first embodiment can be used, or an affinity column

A Detect on a tube T uses a PCR with primers appropriate for the common 5' and 3' sequences, followed by gel electrophoresis.

This new model can be used, for example, to solve an instance of the 'satisfiability problem'.

Given a propositional formula Ï with 70 variables and 1000 binary connectives.

This can be defined as

Î£ = {T Î¹=l,2, ... ,70} U {F i = 1,2, ... ,70} U { s, , s ₃ } , with a 'true' symbol and a 'false' symbol for each of the variables and the required 'PCR symbols ' s ₅ and s ₃ . Each symbol of Î£ is associated with a randomly chosen

(or carefully designed) 10-mer to satisfy the three conditions listed above. Denote the oligo associated with T _A as O , the oligo associated with F _A as 0 and the oligos associated with s ₅ and s ₃ as O _s, and 0 ₃ respectively As described above, the input tube T includes the whole universe of all possible truth assignments for the problem. Hence, the input tube consist of the set of all DNA molecules of the form: where for l = 1 , 2 , . . . , 70 , M _X = O _lT or M _^ = O _lF . Thus the elements of T are in one to one correspondence with the set of all possible truth assignments for the 70 variables.

This universe includes 2 ⁷⁰ molecules, each of which is a 720-mer. Assuming that each nucleotide has molecular weight of 300 dal on, then the molecules in tube T have a mass of approximately 425 grams (0.93 lb.) T is preferably created as follows. First, all of the

0 _X are produced separately. Then approximately 2 ⁷⁰ O _s, are conjugated to solid support in a tube T. Then the following steps are taken for I = 1,2,...,70:

(1) pour half of the contents of tube T into T ₁ and

(2) To the molecules in T- _. ligate (a single) 0 ₁ ^Ï to the 3 ' end.

(3) To the molecules in T ₂ ligate (a single) O _i ^F to the 3' end. (4) Merge , into T.

Finally ligate a single o ₃ to the molecules in T.

Performing the required merges would be trivial . Performing detection by PCR would be easily accomplished by standard means . Recall that such a PCR is only performed once at the end of the computation. If the final tube contained many DNA molecules (i.e. if there were many truth assignments which satisfied Ï) then the detect operation could be performed with coarser techniques such as optical density measurement. Separation can use the magnetic bead system described in the first embodiment. Each bead is approximately 1 micron in diameter and has a mass of approximately 1.3 pg. Each bead can bind approximately 7.8 x 10 ⁵ biotinylated oligos . Assuming that one bead-associated biotinylated oligo will bind one target molecule, then binding 2 ⁶⁹ target molecules (the number of targets in the largest separation done during this computation) would require approximately 984 g (approximately 2.2 lb.) of beads with a volume of roughly 0.39 liters (approximately 0.49 qt.) These beads would have a volume of approximately 750 ml.

It is important to note that these beads are reusable After a separation, the beads and their oligos are intact

except for 'natural degradation' This operation rs hence consistent with an automated process using and reusing these beads as described with reference to the automated computer system embodiment. Assume that each separation could be done in 1000 sec, i = approximately 16.7 minutes and that the time for a merge is negligible. Also assume that there are no errors during separations (i.e. C ₊ = C. = 0) . This estimate is unreasonable, but the effect and handling of errors will be considered herein. Let an operation be taken as either having a molecule of DNA 'decide' to go into one tube or another during a separation, or having a molecule of DNA transferred from one tube to another during a merge. The number of separations is 1001 and the number of merges is 1000. For a typical Ï, each separation and each merge would be applied to a tube with approximately 2 ⁶⁹ DNA molecules .

Hence, the number of operations per second is approximately 1.2 x 10 ¹⁸. This is approximately 1,200,000 times faster than the fastest supercomputer. The total time for the calculation would be dominated by the 1001 separations and hence would be approximately 11.6 days. During the computation, approximately 10 ²⁴ operations would be performed, which is quite possibly greater than the total number of operations performed on man-made computing devices throughout history.

The molecular computer above would require 140 'separators' : a 'True' and 'False' for each proportional variable. Hence as many as 140 propositional formulas might be handled by the machine simultaneously. In this case, the machine might execute as many as 1.6 x 10 ^2Q operations per second.

Even this speed can be improved, possibly by another order of magnitude, using the automated embodiment which is described herein.

Fifth embodiment - error correction

Large amounts of error would be catastrophic in current electronic computers. Such errors , however, might be expected in the above-described embodiment. Consider for example the following: Each incarnation of a restricted computer is associated with the parameters T , time for operation; e+, the probability that the particle that should be a + ends up showing as a -, and e- the probability that the particle that should be a - shows up as a +. & + and ⁽- - could be very different. For example, the DNA computer described in the second embodiment could carry out the separation as follows .

(1) Anneal Incubate the contents of T with the beads and allow time for molecules with the correct consecutive subsequence to anneal . Pour off the liquid phase containing molecules that do not anneal.

(2) Wash. Add solvent to the beads and pour off. This will remove additionally any molecules which do not anneal to beads (for example molecules left on the walls of the tube or held weakly to the beads) . Repeat Once a molecule that should be in +(T,s) sticks to a bead, it will usually stay stuck (with very high probability) through many washings . c+ is determined by how well the molecules in +(T,s) stick during the anneal step. Molecules which weakly anneal to the beads or are stuck to the sides of the tube may only be removed on a second or third washing. e- may depend on how much washing is done. One can imagine a system with several washings which would give values like Î²+ = 1/10 and Î²_ = 1/10 ⁶. The analysis that follows simplifies the situation by maintaining e+ and F approximately equal. This is always possible to achieve with the use of repeated separations : (first) Input (T)

(1) T _x = +(T,s) and T\ - -(T,s) .

(2) T ₂ = -(-(T'^s) and T' ₂ = -(T\s)

(n) T _n = + (T ' _n-1 , s) and T ' _n = - <T ' _n_Î¹ , s>

(last) T ⁺ = U fT^ T, , TJ and T ^" = T ' _n .

After n steps, T ⁺ will have (1 - f ⁿ) of the elements which should be in +(T,s) and will have 1 - (1 - e ⁿ of the elements which should be in -(T,S) . Further, T = U(T ⁺,T ) .

When n is chosen so that Î² ⁿ approximately equals 1 - (1 - Â£_)" (which itself approximately equals 1 - ne , then we can think of the n repeated separations, yielding the tubes T ⁺ and T ^" from T, as a single new separation operation with e ₊ approximately equal to â¬. and i n times as large as the original Ï. For example, if the original separation had e ₊ = 1/20, Â£ = 1/10 ⁶ and n = 1000 sec, then the new separation (with n = 6 - actually n = 5.28 is better but we will use an integral number of repetitions) has _t = 1/10 ⁶, e- approximately 6/10 ⁶ and Ï = 6000 sec.

These techniques can be used to allow e_= 6 ₊. We will denote this value as . We will assume that for our satisfiability example e = 6/10 ⁶.

Now assume that a given molecule enters s separations in the course of a computation (e.g. in the satisfiability example s < 1000) . The probability P _good that this molecule goes through all separations without a mistake, i.e. without ever going in to the 'wrong' tube is:

(1 - e) ^s and the probability P _bad, that the given molecule fails to go through the s separations without a mistake is at most:

1 - (1 - e) ^s For the satisfiability example, for all molecules P _good â¥ (1- Î²/10 ⁶) ¹ ₀₀₀ which is approximately 0.994 and P _bad < 0.006. Hence at least 994 molecules out of each 1000 make it through the entire calculation with no error. This is pretty good, but not adequate, because that number of molecules (approximately 1.5 x 2 ⁶²) could have errors at some point and might end up in the wrong tube when the computation is done. To handle this problem, consider the following:

Assume a traditional sequential electronic computer. There is a predicate P (i.e. a function which on each input returns either 'yes' or 'no') and assume you write a program which exhaustively (and sequentially) searches all 2 ⁷⁰ binary strings of length 70 for the existence of a 'winner' string for which P gives a 'yes' answer. Further assume that for each string of length 70, P can be computed with 1000 operations. If during the search, the computer stops and prints out 'Î± is a winner' , should you believe it? Like molecular computers, electronic computers are physical systems which are subject to error. If you are told that your computer makes at most one error in 2 ^{1 ooc cco} operations, then, as a practical matter, you are likely to be justified in believing that Î± is indeed a winner If however, you are told that your computer makes at most one error every 166,666 operations, then the situation is not so clear You may want to check the computer's answer by retesting the string Î± several times. Each time you get the answer 'Î± is a winner' (without ever getting the answer ' is a loser') , you are justified in increasing your belief that Î± is indeed a winner - but of course you will never be absolutely sure.

These ideas will now be applied to the molecular setting. Assume that P _good is fairly large (as in the satisfiability example) and that the computation has been run for the propositional formula Ï to produce the output tube

T1W. This tube should ideally have the 'winners', i.e. those molecules which encode a satisfying truth assignment for Ï. The tube T1W is used as the input to an entire rerun of the whole computation to re-test all of the purported 'winners' .

The new tube T2W includes all of the molecules which went through the entire computation twice and both times ended in the winning tube. The probability of the materials in the tube T2W being incorrect is therefore greatly decreased.

That process can be repeated through r runs of the entire computation to make tube TrW, which will contain two types of molecules. TrW contains the 'true winners ' those that really do encode satisfying truth assignments and the 'false winners ' those which encode unsatisfying truth assignments - but somehow, through errors, made it into TrW

Of course it is much easier to get into TrW as a ' true winner' than as a ' alse winner' , and hence much more probable. A 'true winner' need only go through r computations without an error. Most molecules do make it through each pass of the computation without an error - in our example 994 of every 1000 at least. On the other hand a 'false winner' is unlikely to make it through R computations. A 'false winner' must have at least one error on each pass through the computation.

In the example given, the probability of wrong operations is approximately 6/1000. Making an error 10 times in a row has a probability of approximately 1/(13.5 * 2 ⁷⁰) Hence the expected number of 'false winners' in tube

TwlO is at most 1/13 5 (less than one) Hence, even if Ï is not satisfiable, it is unlikely some 'false winner' will end up in TwlO. Now P ¹⁰, _^, is approximately 0.94 So even if there is ust one satisfying truth assignment for cp, there is at least a 94% chance the molecule encoding that assignment will end up in TW10 - and hence we will be likely to give the correct answer that Q is satisf able.

This analysis shows that the errors can be decreased to a manageable number. This technique is rather heavy handed and more refined techniques can alternately be used. For example, materials can be chosen which have less probability of errors. The oligos used can be specifically chosen to diminish the possibility of non-specific binding. The tubes can be maintained sterile - eg free of extraneous molecules (e.g., enzymes) which might complicate the interactions. In addition, systems such as these have symmetry. For example, if a separation with respect to the symbol F _s is done then -

(T,FJ = +(T,TJ , since every molecule in the system has either T ₅ or F ₅ but not both. A 'double positive' system could be used to reduce error . Consider putting the contents of the tube into a chamber which contains beads for T ₅ on the left and beads for F ₅ on the right with a semipermeable membrane in between which allows oligos to pass but not beads (indeed with such a system, the beads might be replaced by large molecules which cannot pass the membrane) . Agitation of the chamber might yield very clean separations. Such a system could be very fast and could perform separations without the need to expand the volume of solvent used.

There are of course other possible sources of error in molecular computation; for example, 'natural degeneration' of molecules. In a DNA computer, it would of course be necessary that major sources of degeneration such as nuclease contamination be avoided and it would be sensible to have separations done under the mildest conditions possible. It might also be wise to use analogues of DNA such as those with peptide backbones which might be more robust.

Sixth Embodiment - The Practical DNA Computer

The DNA computer that has been described herein is well suited to certain operations. Molecular computers fill a computational niche that electronic computers do not practically fill . Some problems can be decomposed sufficiently such that many electronic computers could carry them out in a massively parallel way. For example, there could be a tradeoff between an electronic computer which executes a trillion instructions in one second and a biological computer which executes 10 ²⁰ computations every 1000 seconds. The better solution depends on the degree of decomposability of the problem. The embodiment of a practical computer carrying out these operations could be made, and the inventor herewith describes one embodiment of such a system with reference to Figure 6. The basic system

of the present invention keeps DNA samples in tubes to follow the tube model given above. Each tube includes a small sample of some working substance. Each tube is numbered, and may include an optically-detectable indicia thereon, such as a holographic indicia or a bar code. These codes can be read by scanner 601 to sense the number of the current tube. All of the tubes are mounted around carousel 610, which should preferably have locations for 6000 tubes. The carousel can be rotated by step motor 612. Alternately, however, the carousel can stay still, or be replaced by a rack of a different geometry and the robot arms can reach to any location in the carousel, and grab any tube.

Step motor is driven by pulses from processor 614 which controls the overall operation according to a pre-stored flowchart The processor also controls other operations as described herein.

Each of the tubes 600 , 602 includes a sample of substances â in this embodiment most samples are DNA. Different tubes may contain different samples. The processor stores information indicating the position of tubes in the carousel and the position of the carousel itself. For example, there is a specified position on the carousel shown as position 604. That position might represent the zero position. By knowing the current position of that location, the processor can always determine where any desired tube, say for example 600, is located. Tube 600 can be rotated to the position shown as Y. Alternately, or additionally, the indicia on the tube can be read to ensure that the correct tube is being accessed.

Each of a plurality of robots 620 includes a robot arm 624. The robot arm reaches in to the carousel at position Y, or some other programmable location, to grab the tube at the position Y. Each of the robot arms can grab a tube from a different group of locations.

Each of the robot arms also includes an associated work area 630 in which the grabbed tube can be further processed,

e.g. by sperating merging, adding stickers, eluting or other operations as described herein.

Figure 6 shows the robot arm in a second position with the tube S ₁₀ therein. The tube S ₁₀ has its contents poured over apparatus 632. Apparatus 632 can be, or example, magnetic beads which are marked with a complement of a particular element. The contents in tube 632 have been removed and held by robot arm 622. The contents of S ₁₀ are poured over the apparatus in 632. Every element which has a 0 in the tenth bit, for example, may be separated out by the element 632. Accordingly, the content of the tube 634 includes all materials that do not have a 0 in the tenth bit Tube 634 may then be grabbed by robot arm 628 of robot 626 and returned to the carousel to a specified location. The returned tube 634, therefore, includes only materials which do not have 0 in the tenth bit.

Other operations can also be carried out in the workspace 630. These operations include release of weakly bound sticker molecules in a workspace by heating the tube for example, or seperating based on the presence of the stickers.

More generally, each robot operation pulls a tube from any position within the reach of the robot. The tube is then further processed in the workspace 630 with other materials which may be taken from other tubes, to carry out some different operation as described above.

Each of these robot operations can be accomplished in approximately one minute, thus increasing the previous estimate of operations per unit time by another order of magnitude. Moreover, many robot operations can be done in parallel by different robots, to further decrease the necessary time of operation.

The carousel embodiment preferably uses a plurality of tubes around the carousel with a number of bead operators. Some tubes outside the carousel can be used as separators, merges and sticker manipulators. A carousel having 3142 - 1/4 inch tubes would be about 10 feet in radius.

While this embodiment describes operation using robot arms, it should be understood that the key point here is the ability to move the molecules from a first position to a second position. Other means of moving these molecules, including pumps, electrostatic movements, and other similar features could also be used.

The flowchart of operation of the apparatus of Figure 6 is now described with reference to the flowchart of Figure 7. The processor 614 carries out various operations. First, it controls the stepper motor to move to various locations. It also controls the robot arms to pick up various materials. Finally, the processor may contribute to determining whether the processed results include errors therein Figure 7 shows these functions being carried out in basic block diagram form.

At step 700, the system detects the next operation, determines the tube locations required for that next operation, and command the robot arms to grab those tubes This requires that the processor control the position of the carousel 610, and the movement of the robot arms between their di ferent positions . The tubes are then located into the workspace 630.

At step 702, the system carries out the desired operations on the tubes which have been obtained in this way.

Step 704 is optional, since it requires the processor to look ahead to try to define the appropriate locations for these tubes which will enable them to be most-efficiently obtained during a subsequent operation. Algorithms for such pipelining are well-known. For instance, if a problem can be decomposed as described above, different parts of the problem can be simultaneously computed.

At step 706 the materials are replaced in the carousel, and the process may continue. If the process is over at step 706, then the solution that is determined by the computer may be tested using an electronic computer at step 708.

The advantage of the electronic computer carrying out the testing is as follows. Say for example there are 10 ²² possible solutions to a problem, and each one might take even a microsecond for an electronic computer to calculate. This still yields 10 ¹⁶ seconds (3.1 x 10 ^s years) to solve the problem.

However, once the problem is solved by a molecular computer, the solution can still be tested by an electronic computer in one microsecond. Even if there are 10 ³ results, only one of which is correct, the molecular computer has still narrowed down the results by 23 orders of magnitude. An electronic computer can test these 10 ³ results in less than a second.

This test operation combines the advantages of electronic and molecular computers, and may in some cases obviate the need for error correction described above. Seventh Embodiment - designer molecules

The ability to create molecules with desired properties (e.g. enzymes, drugs) 'on demand' is of great importance. Historically, the process of creating such molecules has been difficult (if possible at all) and expensive. A new promising approach to this problem, called 'combinatorial' chemistry has arisen. See, Alper J. , Drug discover on the assembly line , Science, 264:1399-1401 (June 3, 1994) . Below is an example of its using the concepts useful in designing molecular computers in combinatorial chemistry. Following the example a potential application of these concepts is described.

Recently, Bartel and Szostak ["BS"] have used the methods of combinatorial chemistry to make what might be called a 'pseudo-enzyme' . We will describe their basic approach briefly. For the sake of clarity many simplifications will be made and many important details will be omitted. The goal of the experiment was to find a molecule of

RNA which would ligate two substrate molecules of RNA p. and p ₂ (p _j and Ï ₂ were base pair complementary in such a way that

hybridization would bring their 5 ' -triphosphate and 3 ' - hydroxyl groups into proximity in preparation for ligation) A pool of approximately 4 ²⁵ random sequences of RNA was developed. Each RNA in the pool was bound to a copy of Ï ₂ at the 5' end and a copy of a constant region C at the 3' end Hence, a tube containing approximately 4 ²⁵ molecules was created, and each molecule m the tube had the form p ₂RC, where R was a some molecule of RNA from the original pool . To this tube an excess of p _r was added. If a 'winner' W existed in the RNA pool which could ligate Ï ₁ to p ₂ then in the tube Pjp ₂ C would be formed. Next molecules which contained the sequence Ï ₁ were separated and retained (hence 'non winner ' RNA f om the original pool were removed) . Because the 5 ' sequence of Ï-_ was known and the sequence C was known, it was now possible to amplify (using reverse transcription, PCR, transcription) only those molecules of the form pjPjWC. Standard means could now be used to discover the sequences of the 'winners' .

The 'winner' (actually 'winners') of Bartel and Szostak experiment is not a true enzyme, since it is 'used up' m acting on its substrates The value of having the 'winner' RNA 'anchored' to one of the substrates is that once the winner acts , it becomes permanently anchored to the reaction product This distinguished it from the rest of the RNA pool and allows it to be retrieved for later identification.

If one wished to create a true enzyme, one might have to identify the 'winner' despite the fact that after acting on its substrate it reenters the pool. This is an example of the problem which we will now explore further: finding the 'winner' when all that is available is the ability to know that a 'winner' exists somewhere in a large pool Being able to solve this problem in a general setting would be particularly important when the pool molecule were not nucleotides (e.g. a pool of proteins, a pool consisting of multiple variants of an existing compound) . Or when anchoring was not physically possible or appropriate (for example when an endonuclease was sought or a drug which

crosses a cell membrane and then binds a particular host molecule) . Below we show that this problem can be solved in a wide variety of settings by an example . Our goal will be to find an RNA molecule which will ligate (as a true enzyme) two DNA molecules Î´ _x and Î´ ₂ (which as above, hybridize would bring their 5 ' -triphosphate and 3 ' -hydroxyl groups into proximity in preparation for ligation) .

Consider the following notation. Let P denote the pool of all 25-mers of RNA (there would be a total of 4 ²⁵) . If we have an RNA sequence Ï, let P ₀ be all 25-mer RNA which begin (5') with Ï. For example P _A would be all 25-mers which begin with A (there would be a total of A ²ⁱ ) , and P _ACCU would be all 25-mers which begin with ACCU (there would be a total of 4 ²¹) Consider two 50 nucleotide sequences of DNA, b ₁ and Î´ ₂ and seeking an RNA which will ligate them. One can begin as in { [BS] } by creating an RNA pool consisting of all 25-mers (i.e. begin with P) . One then incubates P, and excess b _Î» and Î´ ₂ in an appropriate bu fer. After an appropriate time , one runs a PCR with primers derived from the 5' end of Î´ _Î» and the 3 ' end of Î´ ₂. Thus neither b _x nor Î´ ₂ will be amplified but the product of their ligation would be. Hence if no PCR product is produced we will declare that the pool P contains no 'winner' and discontinue the experiment for that pool . If a PCR product is produced we will declare that the pool P contained a 'winner' . For the purpose of this illustration, we will ignore 'practical' problems such as the occasional uncatalyzed ligation of Î´ ₁ and Î´ ₂.How do we find the winner? We proceed as follows . In the next step we repeat the experiment above 4 times, once with each of P _A, P _c, P _G and P,j. Since

P= P _AUP _C[ P _GUP _U,

at least one (and perhaps more than one) of them will contain a 'winner' . If P _c contains a 'winner' , then we know

some 'winner' begins with C. Next we repeat the experiment with P _CA, P _cc, P _CG and P _cu. Again at least one must contain a 'winner' . Say P^, then we know some winner beings with CA Continuing in this fashion we will come to know the exact sequence of some 'winner' as desired.

The present inventor recognized the similarity of this technique to that used for demonstrating that if there exists a polynomial time algorithm for deciding satisfiability, then there exists a polynomial time algorithm for finding satisfying truth assignments.

Admittedly this is a considerable amount of repetitious work. It requires 101 steps of pool synthesis, incubation and PCR. Many of these steps could be run in parallel, however. However, in these 101 steps, 4 ²⁵ (over 1000 trillion) RNA strings are searched. This is of course the same sort of trade-off achieved for our molecular computers applied to problems like satisfiability: linear time versus exponential time. Further, by doing the incubation under 'unamiable' conditions where only the 'best' winners will have a chance to act, the need for 'evolving' the answer as done for example in [BS] is removed and this should save considerable time. Also, there is never a need to perform amplification, on pool elements which in the case of RNA is laborious and for other molecules may be impossible. It is worth remarking that this technique would work in the same way if the pool was composed of RNAs consisting of all length 25 sequences of elements from any set of four 'basic' RNA units (for example four 10-mers rather than the 1-mers used in the example) . The RNA in the pool could also have fixed constant regions if that was desired.

It should be clear that similar techniques would work in a wide variety of settings. Using pools made of RNA, DNA or some other form of aggregates. Whether a ligation or some other function was desirable - so long as the existence of a 'winner' could be detected.

So the method demonstrates that the ability to detect the existence of a winner leads efficiently to the identification of a winner.

Eighth embodiment

Previous embodiments have described operations which can be carried out on a tube which is a multiple set of substances over an alphabet ]T . This embodiment describes additional operations which can be carried out. A) Randomize - Given a tube T, of substances produce a tube with substances that are random variants of the original substances .

This can be done , for example , by heating a tube of primary strands with stickers to a dissociating temperature, and then cooling it. This dissociates all the stickers, and allows them to re-anneal randomly. This system can be used for random operation.

Consider the previous embodiments which require generating a plurality of truth assignments . Assuming 50 variable truth assignments requiring 2 ⁵⁰ states. This requires 2 ⁵⁰ primary strands . These strands are for example of length 500. We want to add 2 ⁴⁹ copies of each sticker (of length 10) . This ratio will maximize the possibility of all possible combinations of stickers being properly bonded. The system is heated then cooled. This randomizes the stickers. After the randomize operation, all desired representations are obtained, up to the limit of statistical variability.

It may be difficult to add exactly 2 ⁴⁹ stickers to 2 ⁵⁰ primary strands. One possible way is to start with the 2 ⁵⁰ primary strands. Take half that solution and mix it with excess stickers of all types . These stickers anneal to saturate the primary strands at all positions . The primary strands are then separated from the leftover stickers. The other half of the virgin primary strands are then added, and heated to apply the randomize operation.

This randomization operation can be used to prepare initial strands for the above discussed embodiments .

B) Point Mutation. Point mutation is an operation to allow some statistical variability in the process. The variability is carried out by allowing certain ones of the stickers to fall off from each primary strand.

A partial randomization operation would cause 1 in 50 stickers to fall off from each primary strand at random. These stickers could then stick to new primary strands at random. Each of the primary strands would therefore randomly loose a few stickers and randomly pick up a few new stickers This allows sticker mutation without enzymes or replication, and allows efficient implementation of heuristic algorithms. It is also possible to produce randomness by using enzymes or replication or other procedures .

One important feature of this system is that weakly bound stickers can be formed. These weakly bound stickers include stickers which differ from standard stickers in that they can be made to dissociate from the primary strand more easily. For example, these weakly-bound stickers dissociate at lower temperature or with different salt concentration. For example, weakly bound stickers can be created by incorporating mismatches, decreasing length or using backbones other than phosphodiester. Similarly DNA oligos can be used as weakly bound stickers in a system where standard stickers bind tighter. For example when standard stickers are longer or use other backbones, for example peptide backbones. This randomizes only the weak stickers, and allows the normal stickers to remain in place. It is also possible to remove weak stickers entirely while leaving standard or more strongly bound stickers in place.

Ninth embodiment - Data representations and translation between the data representations .

The above embodiments have described storing data, including computing states and memory states, and two different ways of representing these states using molecules .

The "strand" representation, such as described in the first embodiment, represents the states using strings of molecules which are held together. The kinds of the molecules in the solution represents the current state of the computation and problem. This representation has many advantages, including the ability to separate by various techniques, and to clone by PCR.

The sticker representation is described, for example, in the third embodiment. The sticker representation can be used for a memory, or to represent states. The information content of the unit can be easily changed, but there are no straightforward techniques to clone the units .

Each of these representations allows different operations to be carried out. It may be desirable to change information between one and another of these representations. The present system facilitates using both representations during a computation.

The preferred embodiment of this technique starts with a primary strand of length 500 , composed conceptually of 50- ers . Each 50 mer block B ₁ is represented as L^^, where L _x is a 20 mer, M _x is a 10 mer, and R is a 20 mer. Each block B ₁ can be associated with a sticker S = (L ₁M ₁R ₁ not)

A tube of molecules that remains after a computation includes the primary strand, and some subset of stickers attached. A different subset may be attached for different complexes. Each sticker S^ is now associated with a new substitute sticker, S ' _x = ( _jM' ^ not) . (M \ not) is a 10 mer that is unrelated to (M _x not) .

An excess of the substitute stickers are added to the tube. Each block in each complex has now attached to either an S _^ or an S . The condition of the block depends on whether the block originally had an S ₁ attached or not.

It is apparent that those blocks with S ₁ attached will not receive S ' ₁.

Ligase is added to covalently bond the sequence of stickers and substitute stickers to form a single secondary strand. The primary strands are then removed.

These secondary strands are now strand representations of the same information that was represented by the sticker strands from which they were derived. The correspondence shows that B _x in the secondary strand has M _x if the complex from which it was derived had a sticker at B _x . The block B _x on the secondary strand has M if the complex from which it was derived did not have a sticker at B _Î» .

These secondary strands may be cloned using PCR, and hence include many of the advantages of the strand representation.

An alternative embodiment immediately adds polymerase and an excess of universal nucleotide monomers to the original tube of complexes . These universal nucleotides complement all of the ATCGs , and hence fills in the gaps between stickers.

The resulting secondary strands now become the single strand representations. When cloned by PCR, the universal nucleotides are replaced by other nucleotides. Accordingly, there is garbage in these locations, but there is a high probability of finding the complement to B ₁ in the ith block that corresponds to the original complex having a sticker at position B _x.

Although only a few embodiments have been described in detail above, those having ordinary skill in the art will certainly understand that many modifications are possible in the preferred embodiment without departing from the teachings thereof.

All such modifications are intended to be encompassed within the following claims. For example, the molecules, DNA, and other substances described herein are merely examples of the possible substances which might find application with the present

invention. In the future, research in molecular biology may provide improved techniques for manipulating macromolecules . Research in chemistry may allow for the development of synthetic designer enzymes . One can imagine the eventual emergence of a general purpose computer consisting of nothing more than a single macromolecule conjugated to a ribosomelike collection of enzymes that act on it.

A single molecule of DNA can also be used to encode the "instantaneous description" of a Turing machine. Currently- available protocols and enzymes could be used to induce successive sequence modifications , which would correspond to the execution of the machine.

Building a computer with a small i time of separation is highly desirable. One of the problems with the designs described to this point is that 'separation' is a largely mechanical process applied from outside the tubes . A molecular computer could be designed, perhaps using different 'primitives' than those used here, which would accomplish its task by purely chemical means inside of a single tube.

Since the restricted model of computation requires only merges, separations and detections, other physical systems could provide an incarnation. For example, some form of all of these operations appear possible using atomic or subatomic particles traveling in accelerators with separation done according to mass or charge.

Single molecules can be detected in solution by labeling them with fluorescent ligands as described in Eigen et al, "Sorting single molecules...", PNAS 91:5740-5747 (June 21, 1994) . Physicists have also determined ways to detect a single subatomic event in a huge volume of space when detecting neutrinos . This could also be used for the detect operation.

A molecular computer could also use a Describe operation. Roughly, decode a molecule back into the set of symbols it encodes. For example, when determining whether a propositional formula Ï is satisfiable or not, a molecular

computer may produce a tube which contains the molecules that encode satisfying truth assignments, if such exist. One may wish to have an explicit description of one such truth assignment rather than to just know that at least one exists. Hence one would want to take a molecule from the tube and Describe it. This operation would be implementable in the DNA computer by well-known methods (e.g. sequencing) of molecular biology. It is worth noting however that it is in fact implementable in all incarnations of restricted molecular computers. Let the alphabet of the system be { s _t , . . . , S _; } then to Describe a single molecule which is contained in a tube T, one can use the following program:

(1) Input (T)

(2) Symbols=0 (begin with the empty set of symbols) (3) If Detect(T) = 0 then output Symbol.

(4) For l = 1 to z :

(a) If Detect(+ (T, s ) = 1 then:

(i) T = +(T,sJ

( n) Symbol=SymbolU{ sJ (add an s ₁ to the set of symbols)

(b) If Detect ( + (T , s ) Ã· l then ( i ) T= - (T , SJ

(5) Output Symbol.

Notice that even if T contained many different molecules each encoding a different set of symbols, the program above would give an explicit description of exactly one of them.

Previous Patent: WO/1997/007437

Next Patent: REGULATING SYSTEM WITH SMITH PREDICTOR