DECONVOLUTION METHOD FOR THE ANALYSIS OF DATA RESULTING FROM ANALYTICAL SEPARATION PROCESSES

Title:

DECONVOLUTION METHOD FOR THE ANALYSIS OF DATA RESULTING FROM ANALYTICAL SEPARATION PROCESSES

Document Type and Number:

WIPO Patent Application WO/1997/022942

Kind Code:

Abstract:

An improved signal processing method in a signal processor for performing the deconvolution of a signal resulting from an analytical separation process is disclosed. In a first aspect, a signal representing a plurality of partially separated sample zones is measured, a point-spread-function of the signal is determined, and the Fourier transform of the signal and the point-spread-function is taken. Next, a noise component n of the signal is determined and the value of a result signal A(f) is calculated using the following filter A(f) = (D(f)P*(f))/P*(f)P(f) + n , where D(f) is the Fourier transform of the signal, P(f) is the Fourier transform of the point-spread-function, and P*(f) is the complex conjugate of P(f). Finally, the inverse Fourier transform of the result A(f) is taken and reported as A(t). Preferably, the point-spread-function is a Gaussian function having a standard deviation 'sigma', where 'sigma' is determined using either an 'alpha''beta' tracker or the function .'sigma'=(a+bt2)1/2, where a and b are constants. In a second aspect of the invention, a plurality of possible point-spread-functions of the signal are determined and the above-described method is applied using each PSF. The value of the largest point-spread-function which provides a non-negative result is determined, and the associated value of A(t) is reported. In a third aspect of the invention, a program storage device readable by a machine, tangibly embodying a program of instructions executable by a machine to perform the method steps of the first or the second aspects of the invention is provided.

Inventors:

ALLISON DANIEL B
BOWLBY JAMES O JR

Application Number:

PCT/US1996/019242

Publication Date:

June 26, 1997

Filing Date:

December 02, 1996

Export Citation:

Click for automatic bibliography generation Help

Assignee:

PERKIN ELMER CORP (US)

International Classes:

G01N27/447; G01N30/86; G01N33/50; G06F17/00; G06F17/14; G06F17/15; H03H17/02; (IPC1-7): G06F19/00; G06F17/15

Domestic Patent References:

WO1994014131A1

1994-06-23

Foreign References:

US4532548A

1985-07-30

Other References:

ZHAO J ET AL: "INVESTIGATION OF BLOCK FILTERING AND DECONVOLUTION FOR THE IMPROVEMENT OF LATERAL RESOLUTION AND FLAW SIZING ACCURACY IN ULTRASONIC TESTING", ULTRASONICS, vol. 33, no. 3, 1 May 1995 (1995-05-01), pages 187 - 194, XP000536424
BAHRAM JAVIDI ET AL: "IMAGE DECONVOLUTION BY NONLINEAR SIGNAL PROCESSING", APPLIED OPTICS, vol. 28, no. 15, 1 August 1989 (1989-08-01), pages 3106 - 3111, XP000033762
KONDO H ET AL: "ADAPTIVE IMAGE RESTORATION USING CONSTRAINED DECONVOLUTION", TRANSACTIONS OF THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS OF JAPAN, vol. E72, no. 11, 1 November 1989 (1989-11-01), pages 1243 - 1250, XP000086632
MEHMET BILGEN ET AL: "RESTORATION OF NOISY IMAGES BLURRED BY A RANDOM POINT SPREAD FUNCTION", PROCEEDINGS OF THE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, NEW ORLEANS, MAY 1 - 3, 1990, vol. 1 OF 4, 1 May 1990 (1990-05-01), INSTITUTE OF ELECTRICAL AND ELECTRONICS ENGINEERS, pages 759 - 762, XP000166934
BILGEN M ET AL: "RESTORATION OF NOISY IMAGES BLURRED BY A RANDOM MEDIUM", PROCEEDINGS OF THE ANNUAL SYMPOSIUM ON COMPUTER BASED MEDICAL SYSTE, DURHAM, JUNE 14 - 17, 1992, no. SYMP. 5, 14 June 1992 (1992-06-14), INSTITUTE OF ELECTRICAL AND ELECTRONICS ENGINEERS, pages 250 - 256, XP000308281
S.DORE ET AL: "Experimental determination of CT point spread function", PROCEEDINGS OF THE ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY, IEEE PRESS NEW YORK USA, vol. 2, 9 November 1989 (1989-11-09) - 12 November 1989 (1989-11-12), SEATTLE, WA, USA, pages 620 - 621, XP002026986

Download PDF:

View/Download PDF PDF Help

Claims:

WE CLAIM

A signal processing method in a signal processor for identifying sample zones in a signal including multiple overlapping sample zones resulting from a separation process comprising the steps of receiving a signal representing a plurality of partially separated sample zones; determining a pointspreadfunction of the signal, converting the signal and the pointspread function from a time domain representation to a frequency domain representation using a Fourier transform, determining a noise component of the signal, calculating the value of a result signal A(f) using the formula A(f) 7 (—D—(f)—P7* ( ) , where D(f) is the Fourier transform of the signal, P(f) is the P * (f)P(f) + n Fourier transform of the pointspreadfunction, P*(f) is the complex conjugate of P(f), and n is the noise component of the signal, and converting the result signal A(f) from a frequency domain representation to a time domain representation by performing an inverse Fourier transform .

2.	The method of claim 1 wherein the pointspread function is a Gaussian function having a standard deviation σ.

3.	The method of claim 2 wherein σ is determined using an αβ tracker including a constant term α and a constant term β.

4.	The method of claim 3 wherein the value of α is between 0 2 and 0 8.

5.	The method of claim 3 wherein the value of α is between 0 4 and 0 6 a β = .

The method of claim 3 wherein (2 ) , \ 1 2 1 The method of claim 2 wherein σ [a + bt² ) 8 A signal processing method in a signal processor for identifying sample zones in a signal including multiple overlapping sample zones resulting from a separation process compπsing the steps of receiving a signal representing a distπbution of partially separated sample zones, providing a plurality of possible pointspreadfunctions of the signal, converting the signal and each of the pointspread functions from a time domain representation to a frequency domain representation by taking the Fouπer transform of each, providing an estimate of a noise component of the signal calculating a result signal A(f) for each pointspreadfunction, where _{Affl =} (D(f) P * (f)) P (f)P(f) ^{+ n} _t where D(f) is the Fourier transform of the signal, P(f) is the Fourier transform of the pointspreadfunction, P*(f) is the complex conjugate of P(f), and n is the noise component of the signal, converting the result signal from the frequency domain to the time domain by taking the inverse Founer transform of the result signal A(f) for each value of the pointspread function to give a result signal in the time domain A(t), reporting the result signal A(t) in the time αomain calculated using the largest value of the pointspreadfunction that provides a nonnegative value of the result signal A(t) 9 A program storage device readable by a machine, tangibly embodying a program of instructions executable by a machine to perform method steps to identify sample zones resulting from a signal representing a plurality of partially separated sample zones, said method steps compπsing receiving a signal representing a plurality of partially separated sample zones, determining a pointspreadfunction of the signal, converting the signal and the pointspread function from a time domain representation to a frequency domain representation by taking the Fourier transform of each, determining a noise component of the signal, calculating the value of a result signal A(f) using the formula (D(f) P * ( ) A(f) = ,~_n,_{Λ »} where D(f) is the Fouπer transform of the signal, P(f) is the P * (f P(f) + n Fouπer transform of the pointspreadfunction, P*(f) is the complex conjugate of P(f), and n is the noise component of the signal, and converting the result signal A(f) from a frequency domain representation to a time domain representation by taking the inverse Fourier transform of the result signal.

Description:

DECONVOLUTION METHOD FOR THE ANALYSIS OF DATA RESULTING () FROM ANALYTICAL SEPARATION PROCESSES

BACKGROUND

5 This invention relates to the analysis of data resulting from an analytical separation process using an improved deconvolution method

The ability to separate and detect closely related chemical species is critically important in modern chemical and biochemical science This is particularly true in the o life sciences which involve the study of molecules that exist as complex mixtures of closely related species

Analytical separation process separate individual components of a multicomponent mixture into individual sample zones based on the different migration 5 rate of each component Examples of such separation process include liquid chromatography, gas chromatography, electrophoresis, centrifugation, staged extraction, adsorption, and the like An application of such processes of particular

importance to modern biotechnology is DNA sequencing, which relies on the electrophoretic separation of polynucleotide fragments up to 1000 nucleotides in length which differ in size by only a single nucleotide

Data resulting from analytical separation process is typically in the form of sample quantity as a function of time and/or position For example, in the case of HPLC or reai-time electrophoresis detectors, the data is in the form of concentration as a function of time Alternatively, for autoradiographic electrophoresis detectors, the data is in the form of sample quantity as a function of position

A problem which aπses in the interpretation of such data is the identification of individual sample zones from among a plurality of zones which have not been completely resolved, i e , overlapping sample zones See FIG 1 Lack of resolution is caused by the finite width of a sample zone due to the dispersive effects of diffusion, interfacial mass transfer, thermal profiles, sample injection, and other like effects This problem is particularly acute when automatic data analysis is utilized, e , automatic base calling of DNA sequences, where hundreds of overlapping zones must be accurately resolved

One particularly powerful class of methods useful for identifying individual sample zones in data containing overlapping sample zones is deconvolution ln deconvolution, a measured signal is considered to be a convolution of two separate signal functions— a "true" signal function representing the location of a hypothetical zone having zero width, and a point-spread-function (PSF) representing the non-zero width of each sample zone In practice, the deconvolution process generally is implemented by taking the Fourier transform (FT) of the signal and a PSF, dividing the transformed signal by the transformed PSF, and performing an inverse Fourier transform of the result In addition, a filtering operation may applied prior to taking the inverse Fourier transform to reduce the effect of noise on the deconvolved data

Present deconvolution methods applied to data resulting from analytical separations use a "blind" deconvolution method with homomorphic filtering, e g , U S Patent No 5,273,632, incoφorated herein by reference in its entirety Blind deconvolution methods utilize a constant PSF which is chosen without any reference to a particular application This provides the advantage that the PSF need not be precisely know ahead of time However, blind deconvolution methods suffer from a number of important limitations Because the PSF function is not optimized with respect to the signal, a highly discriminatory filter can not be effectively employed because application of such a filter to a non-optimized PSF would create high frequency noise Furthermore, the transformed signal must be preconditioned prior to filtering to normalize the signal This extra step can introduce unwanted noise into the final result Finally, present methods do not account for the fact that the PSF can vary during the course of an analysis, thus making the fit between a constant PSF and the actual data necessaπly suboptimal

SUMMARY

The present invention is directed toward our discovery of an improved signal processing method in a signal processor for performing the deconvolution of a signal resulting from an analytical separation process, e g , electrophoresis, HPLC or other like processes The method of the invention uses priori knowledge of the nature of the signal and/or adaptive parameter estimation techniques to tune a PSF used to estimate the width of a sample zone The method has particular application in the area of automated DNA sequencing

An object of our invention is to provide an improved signal processing method for identifying individual sample zones in a signal containing multiple overlapping sample zones wherein a highly discriminatory filter can be applied to the transformed signal, e g , a Werner filter

Another object of our invention is to provide an improved signal processing method for identifying individual sample zones in a signal containing multiple

overlapping sample zones wherein a PSF is determined using an application-specific model of the PSF which takes into account the variation of the PSF as a function of time and/or position in the signal

Yet another object of our invention is to provide an improved signal processing method for identifying individual sample zones in a signal containing multiple overlapping sample zones wherein the value of the PSF is determined using adaptive parameter estimation techniques

Another object of our invention is to provide an improved signal processing method which facilitates the automatic base-calling of DNA sequences

In a first aspect, the foregoing and other objects of the invention are achieved by an improved signal processing method in a signal processor comprising the following steps A signal D(t) representing a plurality of partially separated sample zones is measured in a data window and received bv the processor A pomt-spread- function of the signal is then determined for that data window, P(t), and the Fourier transform of the signal and the point-spread-function is taken Next a noise component n of the signal is determined and the value of a result signal A(f) is calculated, where

(D(f) P * (f>) ^U P * (f) P(0 + n where D(f) is the Fourier transform of the signal, P(f) is the Fouπer transform of the point-spread-function, P*(f) is the complex conjugate of P(f), and n is the noise component of the signal Finally, the inverse Fouπer transform of the result signal A(f) is taken and reported as result signal A(t) in the time domain This process is repeated for successive data windows until all of the signal has been processed

In a first preferred embodiment, the point-spread function is a Gaussian function having a standard deviation σ, where σ is determined using an αβ tracker including constant terms α and β, where α is between 0 2 and 0.8 and

β =

(2 - a )

In a second preferred embodiment, the point-spread function is a Gaussian function having a standard deviation σ, where σ is determined using the function where a and b are constants

ln a second aspect of the invention, a plurality of possible point-spread- functions of the signal are determined Then, the Fourier transform of each of the point-spread-functions is taken along with the Fourier transform of the signal Next, an estimate of a noise component of the signal is determined and the value of a result signal A(f) is calculated for each point-spread-function, where

(D(f> P * ( )

A(f) = P * (0 P(f) + n

where the variables D(f), P(f), P*(f)\ and n are as defined above The inverse Fourier transform of the result signal A(f) is taken for each value of the point-spread function Then, the value of the largest point-spread-function which provides a non-negative result signal in the data window is determined, and the associated value of A(t) is reported.

In a third aspect of the invention, a program storage device readable by a machine, tangibly embodying a program of instructions executable by a machine to perform the method steps of the first or the second aspects of the invention is provided

In a fourth aspect of the invention, a signal processor adapted to carry out the method steps of the first or the second aspects of the invention is provided

These and other objects, features, and advantages of the present invention will become better understood with reference to the following description, drawings, and appended claims

BRIEF DESCRIPTION OF THE DRAWINGS

FIG 1 is a schematic representation of overlapping sample zones in a signal

FIG 2 is a flow chart of generally describing the signal processor of the present invention

FIG 3 depicts a normal or Gaussian distribution

FIG 4 is a flow chart of steps generally describing a preferred αβ-tracker utilized in the signal processor of the present invention

FIGS 5 A, 5B, and 5C show alternative methods for indexing a data window utilized in the signal processor of the invention

FIGS 6A and 6B show DNA sequencing data obtained early in a run both before (top) and after (bottom) processing by the signal processor of the invention

FIGS. 7 A and 7B show DNA sequencing data obtained in the middle of a run both before (top) and after (bottom) processing by the signal processor of the invention

FIGS. 8 A and 8B show DNA sequencing data obtained towards the end of a run both before (top) and after (bottom) processing by the signal processor of the invention

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Reference will now be made in detail to the preferred embodiments of the invention, examples of which are illustrated in the accompanying drawings While the invention will be described in conjunction with the preferred embodiments, it will be understood that they are not intended to limit the invention to those embodiments On the contrary, the invention is intended to cover alternatives, modifications, and equivalents, which may be included within the invention as defined by the appended claims

The present invention is directed towards an improved signal processing method in a signal processor useful for identifying individual sample zones, or "peaks", present in a signal compπsing a plurality of overlapping sample zones, such signal being generated by a detector monitoπng an analytical separation process, e g , chromatography, electrophoresis, centπfugation, staged extraction adsorption, and the like More specifically, the invention is directed toward an improved signal processing method in a signal processor for deconvolving such a signal The invention is particularly well adapted for detecting individual electrophoretic sample zones in a signal containing multiple overlapping sample zones resulting from an automated DNA sequencing process

Generally, the method of the present invention is earned out as follows (i) the Fourier transform of a signal in a data window is computed, (u) a point-spread function of the signal is determined, (in) the Fourier transform of the point-spread function is taken, (iv) a noise component of the signal is determined, (v) the value of a deconvolved result signal A(f) is determined by the formula

Air ) = — ( —D( —f ⁾ P * ( —f) —) P * (f) P(f + n where D(f) is the Fouπer transform of the signal, P(f) is the Fouπer transform of the point-spread-function, P*(f) is the complex conjugate of P(f), and n is a noise

component of the signal; (vi) the inverse Fourier transform of the result signal A(f) is taken; (vii) the result signal in the time domain A(t) for the window is reported; and (viii) the window is indexed. Steps i-viii are repeated until all of the signal has been processed. FIG. 2 shows a representation of the signal processing system of the present invention.

The Signal:

The signal processor of the invention receives a signal generated by a detector which monitors the results of an analytical separation process, e.g., an electrophoretic separation. Such detectors may be "real-time" detectors which monitor the output of a separation during the separation, e.g., a HPLC chromatograph or a real-time electrophoresis scanner. Preferably the signal is generated by an electrophoresis scanner useful for separating and detecting the products of a DNA sequencing reaction, e.g., by Sanger-type sequencing, e.g., U.S. Patent Nos. 4,81 1 ,218, 4,879,012, 4,832,815, 4,675,095, and 5,274,240, each of which is incorporated by reference herein in its entirety. Alternatively, the detector may be a "snap-shot" detector which analyzes the results of a separation after the separation is completed, e.g., an autoradiographic or fluorescent scanner, e.g., U.S. Patent No. 5,091 ,652, which is incoφorated by reference herein in its entirety.

Such detectors may rely on any detectable signal which can be associated with a sample component. Exemplary detectors include detectors based on changes in fluorescence, absorbance, index of refraction, radioactivity, or the like. Preferably, the signal-to-noise ratio (S/N) of the signal is greater than 10 dB. More preferably, the sampling frequency of the signal is chosen to satisfy the Nyquist sampling theorem, i.e., the sampling frequency is at least twice the frequency of the highest frequency in the signal. Preferably, the signals are converted to digital form prior to processing using standard A/D conversion apparatus.

The Point-Spread-Function:

The term "point-spread function" or "PSF" as used herein refers to any model which approximates the signal or portions thereof. The model can be in the form of a

function, or as a set of discrete values Exemplary PSFs include functions such as cos, sin, cos ², Lorentzean, second order parabolic , Gaussian, or any other like function or combination of functions Preferably, the PSF is a Gaussian function or a group of multiple Gaussian functions A Gaussian function is characterized by a standard deviation σ which is directly related to the width of the bell-shaped Gaussian curve, i e , w ^=2 35σ where wι ₂ is the full-width of a peak at half maximum height See FIG 3

The ability of a deconvolution method to identify overlapping sample zones is directly related to how well the PSF approximates the signal, or how well "tuned" the PSF is to a particular signal If the PSF is not properly tuned, the deconvolution process will not faithfully recover the "true" signal, and may result in the creation of spurious peaks and/or the loss of actual peaks Given a Gaussian PSF, the parameter s is used to tune the PSF to the signal

One method of choosing a σ which fits the signal is to assume that s is constant throughout the signal, e.g., Stockham et al , Proceedings of the IEEE, 63(4) 678-692 (1975) However, if the width of the sample zones change throughout the signal, the PSF will never be properly tuned to all of the measured signal In fact, this is the situation in signal derived from a DNA sequencing operation, where the width of the sample zones in the spatial dimension decreases with increasing base number

In one important aspect of the signal processor of the present invention, the value of σ at a particular location in a signal derived from a DNA sequencing electrophoresis experiment is computed using an empirical function which describes the change in s as a function of time or iocation By using a function to describe how σ changes with base number rather than a using a constant value for σ, the PSF is better tuned to the signal throughout the data One preferred expression which is used to describe the change in σ as a function of run time is, where a and b are constants which are chosen by fitting experimental data Values of a and b are dependent on the conditions used for the electrophoretic separation, e g.,

electrical field strength, gel concentration, buffer composition, temperature, and other like parameters which affect electrophoretic separations In the case of automated DNA sequencing, under conditions typically used in the ABI Model 373 DNA Sequencer (Applied Biosystems Division, The Perkin-Elmer Coφoration, Foster City, CA (ABI)), a is preferably about 8 61 s ² and b is preferably about 5.53* 10 ^"7

In another important aspect of the signal processor of the present invention, the value of σ is determined using adaptive parameter estimation, where, as used herein, the terms "adaptive parameter estimation" or "adaptive tuning" refer to a process whereby the value of a parameter is determined with reference to previous values of that parameter in the same signal Adaptive tuning has the advantage that the tuning algorithm can dynamically adapt to a signal in which the value of σ changes in an unpredictable manner.

A number of methods may be used to perform adaptive parameter estimation including neural networks, Kalman filters, and ab trackers Preferably, in the signal processor of the present invention, the adaptive tuning is accomplished using an αβ tracker, e.g , Multiple-Target Tracking with Radar Applications, S.S Blackman, Artech House, Inc (1986) The αβ tracker is preferred because it is computationally efficient and provides sufficiently accurate tracking As used herein, the term "αβ tracker" refers to a method for updating the value of a variable in a data window based on (i) an initial value of a dependent variable, (ii) a present value of a dependent variable; and (iii) a present value of the first derivative of the dependent variable with respect to a relevant independent variable, e.g , time or position As used herein, the term "data window" refers to a portion of the signal to which one cycle of the ab tracker is applied See FIGS 5A-C The size of the data window is chosen such that the value of σ within the window is essentially constant Preferably the value of σ varies less than 10% across the window More preferably, the value of σ varies less than 5% across the window In a preferred embodiment, the size of the data window is constant throughout an analysis

The αβ tracker as applied to the present invention operates as follows See FIG 4 An initial value for σι(k) and dσι(k)/dt is obtained, where k is a data window index, the subscπpt I indicates an initial value, and t is time, the independent vanable σι(k) and dσι(k)/dt can be obtained in any manner resulting in a close approximation to the actual values of σι(k) and dσι(k)/dt In one preferred method for obtaining initial values of σι(k) and dσι(k)/dt, estimates are based on previous experience with like systems

In a second preferred method for obtaining initial values for σι(k) and dσι(k)/dt, values are measured directly from a data window in which such values can be accurately measured For example, in the case of DNA sequencing, the earlv bases, l e , between 20 and 100 nucleotides, are typically completely resolved Thus values of s can be easily measured using any peak-identification method, e g , the 0-crossιng method (The 0-crossιng method will be descπbed hereinafter ) Given a measured value for σ, initial values of σι(k) and dσι(k)/dt may be readily calculated

Once initial values for σι(k) and dσι(k)/dt have been established, the data window is indexed from the initial k window to the k+1 window There are a number of different possible indexing alternatives The k+ 1 window can be (l) immediately adjacent to the k window, FIG 5A, (n) partially overlap the k window, FIG 5B, or (in) be distant from the k window with uninterrogated data located between the k and k+1 windows, FIG 5C The choice of indexing method depends on several factors including (I) the magnitude of dσ/dt, (u) the required accuracy, and (in) the required computational speed

If high accuracy is required, and slow computational speed can be tolerated, a high degree of overlap between the data windows is desirable, e g , overlap of over 50% As the requirement for accuracy is relaxed, computational speed may be increased by reducing the degree of overlap or by entirely eliminating any overlap If the k+1 window is distant from the k window, an mteφolation method is used to determine values for σι(k) and dσι(k)/dt in the uninterrogated inter-window regions

Next, a predicted value for σ in the k+ 1 window, σp(k+l ), is calculated σp(k+l) is calculated for the k+1 window using the relation,

where T is the distance between the center of data window k and the center of data window k+1

Once a predicted value for s in the k+1 data window is obtained, an observed value for σ in the k+1 window, σo(k+l), is measured Any number of well known methods may be used to measure σo(k+l) One preferred method is the "0-crossing" method wherein the value of σo(k+l ) is taken to be proportional to the distance between two points at which d ²σ/dt ==0

Next, given values for σp(k+l) and σo(k+l), the αβ tracker equations are used to calculate values of σ(k+ 1 ) and dσ/dt(k+ 1 ), where, σ(k + ]) = σ _p (k + \) + a [σ ₀(k + \) - σ _p (k + 1)] and, dσ , dσ β r ι

^~d7 ^{{ + 1} = ~}dt ^{~ { ) +} τ ° ^{ik +} ^ ^{~ σp}^ ^{k + λ}Λ where a and b are constants having values between 0 and 1

The values of α and β determine the relative weight assigned to σo(k+l ) and σp(k+ l) If α=0, σ(k+l) is determined solely based on the predicted value of σ, σp(k+l), while if α =1, σ(k+l) is determined solely based on the observed value of σ(k+l), σo(k+l) The value chosen for a depends on the (i) rate of change of σ, i e., magnitude of dσ/dt, and (ii) the accuracy of the method used to measure σo(k+l) When the rate of change of σ is high, and the accuracy of σo(k+l ) is high, the observed value of σ is weighted more heavily then the predicted value, thus α is chosen to be large, e.g , 0.8 Alternatively, when the rate of change of σ is low, and the accuracy of σo(k+l) is low, the predicted value of σ is weighted more heavily then the observed value, thus α is chosen to be small, e.g., 0 2 In the worst case, when

the rate of change of σ is high, and the accuracy of σo(k+l ) is low, one wants to weight the predicted value of σ and the observed value of σ equally, thus a is chosen to be approximately 0 5 Preferred values of α range between 0 2 to 0 8 More preferably, α is between 0 4 and 0 6. The optimum value for β is determined from the value of a according to the relation, a β =

(2 - a )

The above steps are repeated until a PSF has been obtained for each of the data windows making up the signal

In a third important aspect of the present invention, a plurality of possible values for the PSF are selected, and the largest value of the PSF which provides a non-negative result signal A(t) in a particular data window is used in the deconvolution method

Fourier Transform of the Signal and the Point Spread Function Prior to deconvolving the signal and the PSF, the FT of the signal and the PSF is taken, thus transforming the signal and the PSF from the time or position domain to the frequency domain Operating in the frequency domain makes the deconvolution process more computationally efficient because the deconvolution process becomes a multiplicative process in the frequency domain rather than a process requiring the solution of integral equations in the time or position domain Preferably the FT is taken using a fast Fourier transform (FFT) technique, e g , the Cooley-Tukey method Before the FFT is applied to the signal, the signal in the window is "zero-padded" at either the beginning or the end of the window in order to eliminate aliasing effects due to overlap of neighboring sample windows

Application of a Noise Filter and Deconvolution In order to reduce the effect of system noise on the deconvolved result signal

A(t), prior to deconvolving the transformed signal and PSF a Weiner filter is applied to the signal As used herein, the term "system noise" refers to small random

fluctuations in the signal resulting from fluctuations in components of the measurement device such as a radiation source, electronic circuitry, radiation detector, or any other like device-related source, or, from fluctuations in external environmental factors such as mechanical vibration, pick-up from 60-Hz electrical lines, temperature variations, or any other like environmental source

System noise, n, is characteπzed by a frequency of greater than 2/Np, where Np is the approximate number of data points making up a single sample zone or peak The value of n which is used in the Wiener filter is preferably a constant for a given data window Preferably, n is empirically estimated from prior knowledge of the application.

One preferred method to determine a value for n is as follows. The FT of the random fluctuations in the signal is collected in a given data window, and the frequency at which the FT of the fluctuations does not appreciably change as a function of frequency is determined and the value of the FT of the fluctuations at that frequency is taken as the value of n.

The preferred filter used to reduce the effects of the system noise on the deconvolved result is a Weiner filter , e.g., Fundamentals of Digital Image Processing, A K. Jain, pages 7 and 276, Prentice Hall, New Jersey ( 1989) A Weiner filter is defined by the function

D(f)P * (f)

A(f) =

P * (f)P(f) + n

where A(f) is the Fourier transform of the result signal, D(f) is the Fourier transform of the signal, P(f) is the Fourier transform of the point-spread-function, P*(f) is the complex conjugate of P(f), and n is the system noise

Computer

The steps of above-describe signal processing method are performed by a computer Such computer can take the form of a generic microprocessor driven by appropriate software, a dedicated microprocessor using embedded firmware, or a customized digital signal processing circuit (DSP) which is dedicated to the specific data acquisition, analog-to-digital conversion, Fourier transformation, or filtering operation required by the signal processing method

In one preferred embodiment, the computer comprises (i) a memory for storing a digitized representation of the signal, and (ii) a processor for carrying out the various steps of the signal processing method

In such a case, the above method steps are embodied in a program storage device readable by a machine, such program storage device including a computer readable medium Computer readable media include magnetic diskettes, magnetic tapes, optical disks, Read Only Memory, Direct Access Storage Devices, and any other like medium

Examples

The invention will be further clarified by a consideration of the following examples, which are intended to be purely exemplary of the invention

EXAMPLE 1

Deconvolution of DNA Sequencing Data using the Method of the Invention

This example describes the results of an experiment applying the deconvolution method of the present invention to DNA sequencing data collected on a automated DNA sequencer, the Model 373 DNA Sequencer supplied by the Applied Biosystems Division of The Perkin-Elmer Coφoration, Foster City, CA All of the data shown below was generated from the file named "Sample 21. 1 copy" The "Annotation

View" from the Sequencing Analysis software for this file is shown immediately below

Data Col lection

Fi le: Samp I e 21.1 copy

Samp I e : Sample 21.1 copy

Comment: PGErLR8,C9,G9,T10

Lane Number: 21

Channe I Number : 108

Number of Scans: T0736

Length: 1240

Run started at: 1/6/1995, 15:05

Run stopped at: 2/6/1995, 09:03

Gel : Gel

Dyeset Pr i mer : DP4SRc{-21M13}

Comb: 36-ωel I sharks-tooth

Instrument Name: Machine 1115

Col lect Uers. N/fl

Data Analysis

Base Cal I Start: 1033

Base Cal I End: 1073

Primer Peak Loc. : 1033

SignaI : C (156), fl (125), G <117>, T (105)

Matrix Name: Machine 1115

ChanneIs Rυe. : 3

Analysis Uers. Uersion 2.2.0d2

Base Spacing: 8.56 - Baseca I I er++PPC

The sequencing template was a pGEM 3Zf(+) plasmid The sampie was prepared according to standard protocols described in the document ABI Prism Dye Primer Cycle Sequencing Core Kit with AmpliTaq DNA Polymerase FS (ABD part number 402114, rev A, July 1995), with the exception that the nucleotide balance was slightly altered The altered nucleotide balance is shown in the table immediately below

(* denotes 7-deaza dGTP)

The gel was poured using the standard 4% acrylamide as described in the 373 Stretch Instrument Manual (ABI p/n 903204) The migration distance for the electrophoresis was 48cm The electrophoresis was performed using standard conditions

The deconvolution was performed on 3 regions of the data, early (bases 47- 73), middle (bases 424-445), and late (bases 760-786) In each region, a constant Gaussian point spread function (PSF) was assumed The parameters used for the deconvolution in each region are shown in the table immediately below

The results for each region are shown below For each region, the data is shown before and after the deconvolution Each window contains 300 data points FIGS 6A and 6B show data from the early region of the signal both before (top) and after (bottom) deconvolution, FIGS 7A and 7B show data from the middle region of the signal both before (top) and after (bottom) deconvolution, and FIGS 8A and 8B show data from the late region of the signal both before (top) and after (bottom) deconvolution

Previous Patent: SPECIFYING INDEXES FOR RELATIONAL DATABASES

Next Patent: ELECTRONIC ORDERING AND ROUTING SYSTEM