


Title:
ANALYSING A BIOLOGICAL SAMPLE AND OBTAINING RELATED BIOLOGICAL INFORMATION AND STORING AND SEARCHING BIOLOGICAL INFORMATION IN A DATABASE
Document Type and Number:
WIPO Patent Application WO/2017/064142
Kind Code:
A1
Abstract:
A system for obtaining biological information relating to a biological sample comprises a control unit; a phenotype data acquisition unit analysing the biological sample concerning a phenotype characteristic; and a biological sequence data acquisition unit determining a biological sequence of the biological sample. The control unit controls the phenotype data acquisition unit to obtain phenotype data representing said phenotype characteristic, controls the biological sequence data acquisition unit to obtain biological sequence data representing said biological sequence, and stores an indication of the phenotype data and the biological sequence data to a storage unit with an indication that both data are associated with the same biological sample. Thus the storage unit may comprise a database of records of biological information relating to a plurality of biological samples, with associated data for each. The database is searchable by selecting a biological sequence characteristic or a phenotype.

Inventors:
BURKE JULIAN FRANCIS (GB)
Application Number:
PCT/EP2016/074511
Publication Date:
April 20, 2017
Filing Date:
October 12, 2016
Assignee:
LEICA MICROSYSTEMS CAMBRIDGE LTD (GB)
International Classes:
G16B20/20; G16B50/30
Domestic Patent References:
WO2009086521A2 (2009-07-09)
Foreign References:
US20120102054A1 (2012-04-26)
US20030021017A1 (2003-01-30)
US7035004B2 (2006-04-25)
Other References:
A. T. OBERLIN ET AL: "Biological Database of Images and Genomes: tools for community annotations linking image and genomic information", DATABASE, vol. 2013, no. 0, 2 April 2013 (2013-04-02), pages bat016 - bat016, XP055342595, DOI: 10.1093/database/bat016
Attorney, Agent or Firm:
M PATENT GROUP (DE)
Claims:
CLAIMS

What is claimed is: 1. A system for obtaining biological information relating to a biological sample; the system comprising:

a control unit,

a phenotype data acquisition unit configured to analyse the biological sample with respect to at least one phenotype characteristic; and

a biological sequence data acquisition unit configured to determine at least partially a biological sequence of the biological sample,

wherein the control unit is configured to:

control the phenotype data acquisition unit to obtain phenotype data representing the at least one phenotype characteristic of the biological sample;

control the biological sequence data acquisition unit to obtain biological sequence data representing a biological sequence characteristic of the biological sample; and

convey an indication of the phenotype data and the biological sequence data to a storage unit for storing together with an indication the phenotype data and the biological sequence data are associated with one another.

2. The system of claim 1, wherein the phenotype data acquisition unit comprises an imaging module analysing the biological sample and the phenotype data comprise image data for the biological sample.

3. The system of claim 1 or 2, wherein the phenotype data acquisition unit comprises a mass spectrometer module analysing the biological sample and the phenotype data comprise mass spectrometer data for the biological sample.

4. The system of any preceding claim, wherein the phenotype data acquisition unit comprises a flow cytometer module analysing the biological sample and the phenotype data comprise flow cytometry data for the biological sample.

5. The system of any preceding claim, wherein the phenotype data acquisition unit comprises a physiology data acquisition module analysing the biological sample and the phenotype data comprise physiology data for the biological sample.

6. The system of claim 1, wherein the phenotype data acquisition unit and at least a part of the biological sequence data acquisition unit form an integrated apparatus.

7. The system of claim 6, wherein the integrated apparatus further comprises the control unit.

8. The system of claim 6 or 7, wherein the storage unit is separate from the integrated apparatus.

9. The system of claim 8, wherein the control unit is communicatively coupled to the storage unit through the Internet.

10. The system of any preceding claim, wherein the biological sequence data acquisition unit comprises a nucleic acid sequence data acquisition unit and the biological sequence data comprise nucleic acid sequence data.

11. The system of claim 10, wherein the nucleic acid sequence data acquisition unit comprises a nucleic acid extraction module configured to extract nucleic acid from the biological sample and a nucleic acid sequencing module configured to sequence the nucleic acid extracted from the biological sample by the nucleic acid extraction module.

12. The system of any preceding claim, wherein the biological sequence data acquisition unit comprises a protein sequence data acquisition unit and the biological sequence data comprises protein sequence data.

13. The system of any preceding claim, wherein the system further comprises a dissection unit configured to dissect a portion of the biological sample and to transfer the portion of the biological sample to the biological sequence data acquisition unit for processing by the biological sequence data acquisition unit.

14. A method for obtaining biological information relating to a biological sample in a system comprising a phenotype data acquisition unit and a biological sequence data acquisition unit, wherein the method comprises:

using the phenotype data acquisition unit to analyse the biological sample with respect to at least one phenotype characteristic and to obtain phenotype data representing said at least one phenotype characteristic of the biological sample;

using the biological sequence data acquisition unit to determine at least partially a biological sequence of the biological sample and to obtain biological sequence data representing a biological sequence characteristic of the biological sample; and

conveying an indication of the phenotype data and the biological sequence data to a storage unit for storing together with an indication the phenotype data and the biological sequence data are associated with one another.

15. A method for maintaining a database of biological information relating to biological samples, the method comprising:

analysing a biological sample with respect to at least one phenotype characteristic;

determining at least partially a biological sequence of the biological sample,

receiving phenotype data representing said phenotype characteristic of said biological sample and biological sequence data representing said biological sequence characteristic of said biological sample together with an indication the phenotype data and the biological sequence data are associated with one another; and

storing a record of the phenotype data for the biological sample in the database in association with a record of the biological sequence data for the biological sample.

16. An apparatus for maintaining a database of biological information relating to biological samples, the information derived by analysing a biological sample with respect to at least one phenotype characteristic and by determining at least partially a biological sequence of the biological sample, wherein the apparatus comprises:

a communications unit configured to receive phenotype data representing a phenotype characteristic of a biological sample and biological sequence data representing a biological sequence characteristic of the biological sample together with an indication the phenotype data and the biological sequence data are associated with one another; and

a storage unit configured to store a record of the phenotype data for the biological sample in the database in association with a record of the biological sequence data for the biological sample.

17. A method of retrieving data from a database of biological information relating to biological samples, wherein the database comprises records for a plurality of biological samples, and wherein the records for each biological sample comprise a record of phenotype data representing a phenotype characteristic of the biological sample and a record of biological sequence data representing a biological sequence characteristic of the biological sample stored in association with one another, and wherein the method comprises:

selecting a biological sequence characteristic of interest;

identifying a record of biological sequence data in the database which matches the biological sequence characteristic of interest; and

retrieving from the database the phenotype data stored in association with the biological sequence data which matches the biological sequence characteristic of interest.

18. An apparatus for retrieving data from a database of biological information relating to biological samples, wherein the database comprises records for a plurality of biological samples, and wherein the records for each biological sample comprise a record of phenotype data representing a phenotype characteristic of the biological sample and a record of biological sequence data representing a biological sequence characteristic of the biological sample stored in association with one another, and wherein the apparatus comprises a control unit configured to:

receive an indication of a biological sequence characteristic of interest;

identify a record of biological sequence data in the database which matches the biological sequence characteristic of interest; and

output an indication of the phenotype data stored in the database in association with the biological sequence data which matches the biological sequence characteristic of interest.

19. A storage unit comprising a database of records of biological information relating to a plurality of biological samples, wherein the records for each biological sample comprise a record of phenotype data representing a phenotype characteristic of the biological sample stored in association with a record of biological sequence data representing a biological sequence characteristic of the biological sample.

20. A computer program product comprising machine readable instructions for causing a computer to execute the method of any of claims 14, 15 and 17.

Description:
ANALYSING A BIOLOGICAL SAMPLE AND OBTAINING RELATED BIOLOGICAL INFORMATION AND STORING AND SEARCHING BIOLOGICAL INFORMATION IN A DATABASE

BACKGROUND OF THE INVENTION

The present invention relates generally to the field of biological information, and more particularly to methods and apparatus for obtaining biological information, storing biological information in a database, and searching a database to identify biological information of interest.

Large databases of biological information are becoming commonplace, for example to support many areas of biological research and diagnostic analyses. Such databases record biologically relevant information for large numbers of biological samples which may then be searched to identify information of interest. One example biological information database is associated with the BaseSpace (RTM) analysis environment provided by Illumina Inc, San Diego, California, United States.

BaseSpace (RTM) represents a cloud-computing based approach for providing a database of biological information. In broad summary, BaseSpace (RTM) provides a database of nucleic acid sequences which users may access to search for sequences of interest, for example sequences matching / correlating with a sequence obtained by the user for a biological sample of interest. Sequence-based biological information is particularly well-suited to these kinds of database applications because it can be readily searched, for example using well-established character string searching techniques. However, there is also a desire to use databases for biological information that is less amenable to searching. For example, databases comprising medical image data for a plurality of different biological samples can provide a valuable resource, but such a database can be harder to construct and search than a sequence-based database. This is because the characteristics which are typically of interest (i.e. the biologically relevant descriptors) for image data are often characteristics such as the size, shape, colour and distribution of structures in the images. These types of characteristics cannot be readily searched without significant amounts of automated image processing or manual intervention to, in effect, label / catalogue the respective images with some form of parameterisation for the relevant characteristics. For example, a study might be concerned with the shapes of cells in images, but searching a database of cell images for a particular cell shape would require the shapes of the cells in each image to be parameterised in a searchable manner, for example by establishing a measure of how circular the cells appear.
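To illustrate the kind of parameterisation referred to above, the following Python sketch computes one common circularity measure, 4πA/P² (A = enclosed area, P = perimeter), from a cell outline given as an ordered list of vertices. The choice of this particular measure, and the assumption that an outline has already been segmented from the image, are illustrative assumptions and are not taken from this document. The point is that a value like this has to be derived for every image before a shape-based search is possible, whereas a sequence can be searched directly with an expression such as `motif in sequence`.

```python
import math


def polygon_area(points):
    """Shoelace formula: area enclosed by an ordered list of (x, y) vertices."""
    n = len(points)
    s = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the outline
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def polygon_perimeter(points):
    """Sum of edge lengths of the closed outline."""
    n = len(points)
    return sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))


def circularity(points):
    """4*pi*A / P**2: 1.0 for a perfect circle, lower for elongated or irregular shapes."""
    perimeter = polygon_perimeter(points)
    if perimeter == 0:
        raise ValueError("degenerate outline")
    return 4 * math.pi * polygon_area(points) / perimeter ** 2
```

For a unit square this gives π/4 (about 0.785), while a regular polygon with many sides approaches 1.0, so the value could be stored as searchable metadata alongside each image.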

The need to label / tag biological information for use in databases with more readily searchable metadata in this way gives rise to a number of drawbacks. Not only can generating the metadata require a significant amount of effort, but in many cases there may also be a lack of consistency and reliability in how the metadata is generated. For example, this may be due to errors arising in automated numerical processing techniques for generating metadata of interest and / or the inherent subjectivity in manually generating metadata of interest.

In view of these drawbacks with existing techniques, there is a desire for improved methods and apparatus for obtaining biological information, storing biological information in a database, and searching a database to identify biological information of interest.

SUMMARY OF THE INVENTION

According to a first aspect of the disclosure there is provided a system for obtaining biological information relating to a biological sample; the system comprising: a control unit, a phenotype data acquisition unit configured to analyse the biological sample with respect to at least one phenotype characteristic; and a biological sequence data acquisition unit configured to determine at least partially a biological sequence of the biological sample, wherein the control unit is configured to: control the phenotype data acquisition unit to obtain phenotype data representing the at least one phenotype characteristic of the biological sample; control the biological sequence data acquisition unit to obtain biological sequence data representing a biological sequence characteristic of the biological sample; and convey an indication of the phenotype data and the biological sequence data to a storage unit for storing together with an indication the phenotype data and the biological sequence data are associated with one another.

In accordance with some examples, the phenotype data acquisition unit comprises an imaging module analysing the biological sample and the phenotype data comprise image data for the biological sample.

In accordance with some examples, the phenotype data acquisition unit comprises a mass spectrometer module analysing the biological sample and the phenotype data comprise mass spectrometer data for the biological sample.

In accordance with some examples, the phenotype data acquisition unit comprises a flow cytometer module analysing the biological sample and the phenotype data comprise flow cytometry data for the biological sample.

In accordance with some examples, the phenotype data acquisition unit comprises a physiology data acquisition module analysing the biological sample and the phenotype data comprise physiology data for the biological sample.

In accordance with some examples, the phenotype data acquisition unit and at least a part of the biological sequence data acquisition unit form an integrated apparatus.

In accordance with some examples, the integrated apparatus further comprises the control unit.

In accordance with some examples, the storage unit is separate from the integrated apparatus.

In accordance with some examples, the control unit is communicatively coupled to the storage unit through the Internet.

In accordance with some examples, the biological sequence data acquisition unit comprises a nucleic acid sequence data acquisition unit and the biological sequence data comprise nucleic acid sequence data.

In accordance with some examples, the nucleic acid sequence data acquisition unit comprises a nucleic acid extraction module configured to extract nucleic acid from the biological sample and a nucleic acid sequencing module configured to sequence the nucleic acid extracted from the biological sample by the nucleic acid extraction module.

In accordance with some examples, the biological sequence data acquisition unit comprises a protein sequence data acquisition unit and the biological sequence data comprises protein sequence data.

In accordance with some examples, the system further comprises a dissection unit configured to dissect a portion of the biological sample and to transfer the portion of the biological sample to the biological sequence data acquisition unit for processing by the biological sequence data acquisition unit.

According to a second aspect of the disclosure there is provided a method for obtaining biological information relating to a biological sample in a system comprising a phenotype data acquisition unit and a biological sequence data acquisition unit, wherein the method comprises: using the phenotype data acquisition unit to analyse the biological sample with respect to at least one phenotype characteristic and to obtain phenotype data representing said at least one phenotype characteristic of the biological sample; using the biological sequence data acquisition unit to determine at least partially a biological sequence of the biological sample and to obtain biological sequence data representing a biological sequence characteristic of the biological sample; and conveying an indication of the phenotype data and the biological sequence data to a storage unit for storing together with an indication the phenotype data and the biological sequence data are associated with one another.

According to a third aspect of the disclosure there is provided a method for maintaining a database of biological information relating to biological samples, the method comprising: receiving phenotype data representing a phenotype characteristic of a biological sample and biological sequence data representing a biological sequence characteristic of the biological sample together with an indication the phenotype data and the biological sequence data are associated with one another; and storing a record of the phenotype data for the biological sample in the database in association with a record of the biological sequence data for the biological sample.

According to a fourth aspect of the disclosure there is provided an apparatus for maintaining a database of biological information relating to biological samples, wherein the apparatus comprises: a communications unit configured to receive phenotype data representing a phenotype characteristic of a biological sample and biological sequence data representing a biological sequence characteristic of the biological sample together with an indication the phenotype data and the biological sequence data are associated with one another; and a storage unit configured to store a record of the phenotype data for the biological sample in the database in association with a record of the biological sequence data for the biological sample.
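The disclosure leaves the storage technology open. As one possible realisation of storing a record of the phenotype data in association with a record of the biological sequence data, the following sketch uses Python's built-in `sqlite3` module with a hypothetical three-table schema, in which a shared `sample_id` foreign key provides the association. The table names, column names and the `store_sample` helper are illustrative assumptions, not part of the described apparatus.

```python
import sqlite3

# Hypothetical schema: one row per sample, with phenotype and sequence
# records linked back to it through sample_id.
SCHEMA = """
CREATE TABLE sample (
    sample_id TEXT PRIMARY KEY
);
CREATE TABLE phenotype_record (
    id INTEGER PRIMARY KEY,
    sample_id TEXT NOT NULL REFERENCES sample(sample_id),
    kind TEXT NOT NULL,          -- e.g. 'image', 'mass_spec', 'flow_cytometry'
    payload BLOB NOT NULL
);
CREATE TABLE sequence_record (
    id INTEGER PRIMARY KEY,
    sample_id TEXT NOT NULL REFERENCES sample(sample_id),
    sequence TEXT NOT NULL
);
"""


def open_database(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # reject records for unknown samples
    conn.executescript(SCHEMA)
    return conn


def store_sample(conn, sample_id, phenotype_kind, phenotype_payload, sequence):
    """Store one phenotype record and one sequence record linked by sample_id."""
    with conn:  # commits on success, rolls back if any insert fails
        conn.execute("INSERT INTO sample (sample_id) VALUES (?)", (sample_id,))
        conn.execute(
            "INSERT INTO phenotype_record (sample_id, kind, payload) VALUES (?, ?, ?)",
            (sample_id, phenotype_kind, phenotype_payload),
        )
        conn.execute(
            "INSERT INTO sequence_record (sample_id, sequence) VALUES (?, ?)",
            (sample_id, sequence),
        )
```

Keeping the two record types in separate tables would let each grow independently (for example several images per sample), while a join on `sample_id` recovers the association.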

According to a fifth aspect of the disclosure there is provided a method of retrieving data from a database of biological information relating to biological samples, wherein the database comprises records for a plurality of biological samples, and wherein the records for each biological sample comprise a record of phenotype data representing a phenotype characteristic of the biological sample and a record of biological sequence data representing a biological sequence characteristic of the biological sample stored in association with one another, and wherein the method comprises: selecting a biological sequence characteristic of interest; identifying a record of biological sequence data in the database which matches the biological sequence characteristic of interest; and retrieving from the database the phenotype data stored in association with the biological sequence data which matches the biological sequence characteristic of interest.
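A minimal in-memory sketch of this retrieval method is given below. It assumes that "matches the biological sequence characteristic of interest" means the stored sequence contains a query motif as a substring; the document does not define the matching criterion, and a real system might instead use alignment-based or approximate matching. The `SampleRecord` class, the `retrieve_phenotypes` function and the toy records are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SampleRecord:
    sample_id: str
    sequence: str      # biological sequence data, e.g. a DNA read
    phenotype: dict    # phenotype data stored in association, e.g. an image reference


def retrieve_phenotypes(records, motif):
    """Return (sample_id, phenotype) for every record whose sequence contains motif."""
    motif = motif.upper()  # compare case-insensitively
    return [(r.sample_id, r.phenotype) for r in records if motif in r.sequence.upper()]


# Toy records for illustration only.
database = [
    SampleRecord("S1", "ACGTTGCAAGT", {"image": "s1.tif"}),
    SampleRecord("S2", "TTTTGGGCCCA", {"image": "s2.tif"}),
    SampleRecord("S3", "GGTTGCAACCC", {"image": "s3.tif"}),
]
hits = retrieve_phenotypes(database, "ttgcaa")
```

Here the query selects samples S1 and S3, and the result carries their associated phenotype data (the image references) without any image ever having been parameterised.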

According to a sixth aspect of the disclosure there is provided an apparatus for retrieving data from a database of biological information relating to biological samples, wherein the database comprises records for a plurality of biological samples, and wherein the records for each biological sample comprise a record of phenotype data representing a phenotype characteristic of the biological sample and a record of biological sequence data representing a biological sequence characteristic of the biological sample stored in association with one another, and wherein the apparatus comprises a control unit configured to: receive an indication of a biological sequence characteristic of interest; identify a record of biological sequence data in the database which matches the biological sequence characteristic of interest; and output an indication of the phenotype data stored in the database in association with the biological sequence data which matches the biological sequence characteristic of interest.

According to a seventh aspect of the disclosure there is provided a storage unit comprising a database of records of biological information relating to a plurality of biological samples, wherein the records for each biological sample comprise a record of phenotype data representing a phenotype characteristic of the biological sample stored in association with a record of biological sequence data representing a biological sequence characteristic of the biological sample.

According to an eighth aspect of the disclosure there is provided a computer program product comprising machine readable instructions for causing a computer to execute the method of any of the second, third and fifth aspects of the disclosure.

It will be appreciated that features and aspects of the disclosure described above in relation to the first and other aspects of the disclosure are equally applicable to, and may be combined with, embodiments of the disclosure according to other aspects of the disclosure as appropriate, and not just in the specific combinations described above.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention is now described by way of example only with reference to the following drawings in which:

Figure 1 schematically represents a system for obtaining biological information, storing the biological information in a database, and searching the database to identify biological information of interest in accordance with certain embodiments of the present disclosure;

Figure 2 is a flow diagram schematically representing a method of obtaining biological information and storing the biological information in a database in accordance with certain embodiments of the present disclosure;

Figure 3 is a schematic representation of a database comprising biological information in accordance with certain embodiments of the present disclosure;

Figure 4 is a flow diagram schematically representing a method of searching a database to identify biological information of interest in accordance with certain embodiments of the present disclosure; and

Figure 5 schematically represents a data acquisition unit / apparatus for obtaining biological information in accordance with certain embodiments of the present disclosure.

DETAILED DESCRIPTION

Aspects and features of certain examples and embodiments of the present invention are discussed / described herein. Some aspects and features of certain examples and embodiments may be implemented conventionally and these are not discussed / described in detail in the interests of brevity. It will thus be appreciated that aspects and features of apparatus and methods discussed herein which are not described in detail may be implemented in accordance with any conventional techniques for implementing such aspects and features.

Figure 1 represents in highly-schematic functional form a system 10 for obtaining biological information relating to a biological sample 25, storing the biological information in a database comprising biological information for a plurality of biological samples, and searching the database to identify biological information of interest in accordance with certain embodiments of the present disclosure.

The system 10 comprises three main components, namely a data acquisition component (data acquisition apparatus) 20, a database storage component (database storage apparatus) 30, and a database access component (database access apparatus) 40. In the example implementation of Figure 1 these three components are communicatively interconnected via the Internet 50 in accordance with conventional intercommunication techniques and protocols. However, it will be appreciated that in other examples these components may be interconnected in accordance with other communication techniques without using the Internet. For example, a dedicated communications network may be provided for the system, or indeed two or more of the different components may in some examples comprise a single apparatus rather than separate components which are communicatively connected.

The data acquisition component 20 comprises a data acquisition control unit 21 for controlling the operation of the data acquisition component 20 and its communications with the other components of the system 10, an image data acquisition unit 22 for obtaining images of the biological sample 25, a DNA sequence data acquisition unit 23 for obtaining DNA sequence data by analysing a portion 25A extracted from the biological sample 25, and a dissection unit (transfer unit) 24 for dissecting the biological sample 25 to obtain the portion 25A and passing the portion 25A to the DNA sequence data acquisition unit 23. It will be appreciated that the data acquisition component 20 will in general comprise various other elements associated with its operating functionality, for example a power supply and a user interface, but these are not directly relevant to the principles described herein and are not shown in Figure 1 or described further in the interests of simplicity and brevity.

The control unit 21 comprises a processor which is suitably configured / programmed to provide the desired functionality described herein using conventional programming / configuration techniques for controlling the operation of apparatus for acquiring biological information from biological samples. The functionality of the control unit 21 may be provided in various different ways, for example using a suitably programmed general purpose computer, or suitably configured application-specific integrated circuit(s) / circuitry or using a plurality of discrete circuitry / processing elements for providing different elements of the desired functionality. As one example, the data acquisition control unit 21 may comprise a processor running a software application based on the Leica Application Suite X (RTM) software platform from Leica Microsystems GmbH, Wetzlar, Germany with appropriate modifications to provide the functionality described herein.

The image data acquisition unit 22 may be based on a conventional microscope imaging apparatus configured to obtain image data representing an image of the sample 25 and to communicate the image data to the control unit 21 for further processing in accordance with conventional techniques. For example, the image data acquisition unit 22 may be based on the Leica DMi8 (RTM) inverted microscope platform from Leica Microsystems GmbH, Wetzlar, Germany. Similar principles to those used in the Leica DMi8 inverted microscope are also described in US 2003/0021017 [1].

The dissection unit (transfer unit) 24 may be based on a conventional dissection / transfer apparatus configured to extract a portion 25A from the biological sample 25. For example, the dissection unit 24 may be based on the principles adopted in the Leica LMD6 (RTM) and Leica LMD7 (RTM) laser micro-dissection platforms from Leica Microsystems GmbH, Wetzlar, Germany. Similar principles to those used in the Leica LMD6 and LMD7 laser micro-dissection platforms are also described in US 7,035,004 [2]. However, other dissection techniques may be used, for example using syringes or mechanical cutting.

The DNA sequence acquisition unit 23 may comprise a conventional DNA extraction and sequencer apparatus configured to process the portion 25A of the biological sample 25 obtained by the dissection unit 24 to extract DNA and sequence the DNA and to communicate the sequence data to the control unit 21 for further processing in accordance with conventional techniques. For example, the DNA sequence acquisition unit 23 may be based on the principles adopted in the MinION (RTM), PromethION (RTM) and / or GridION (RTM) devices from Oxford Nanopore Technologies Limited, Oxford, UK and / or on the principles adopted in the HiSeq 2500 (RTM) device from Illumina Inc, San Diego, California, United States and / or the Ion Torrent (RTM) device from ThermoFisher Scientific, Inc, Waltham, Massachusetts, USA.

Thus, to summarise, the data acquisition component 20 comprises an image data acquisition unit 22, a dissection unit 24 and a DNA sequence acquisition unit 23 formed as a single integrated apparatus. The image data acquisition unit 22 is configured to obtain image data representing an image of the biological sample 25 currently under study, and the dissection unit 24 is configured to extract a portion 25A of the biological sample 25 and transfer this to the DNA sequence acquisition unit 23. The DNA sequence acquisition unit 23 is configured to extract DNA from the received portion 25A of the biological sample 25 and to sequence the DNA to generate DNA sequence data. The control unit 21 is configured to control the image data acquisition unit 22, dissection unit 24 and DNA sequence acquisition unit 23 to perform their respective functions with appropriate coordination (e.g. first obtain image data for the biological sample 25 using the image data acquisition unit 22, then dissect and transfer the extracted portion 25A of the biological sample 25 to a receiving port of the DNA sequence acquisition unit 23 using the dissection unit 24, and then extract and sequence DNA from the extracted portion 25A of the biological sample 25 using the DNA sequence data acquisition unit 23). The control unit 21 is further configured to receive the image data from the image data acquisition unit 22 and the DNA sequence data from the DNA sequence acquisition unit 23. Control signalling and data transfer between the control unit 21 and the respective other units of the data acquisition component 20 may occur over a communications bus 26 in accordance with conventional communication protocols and techniques.
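The coordination performed by the control unit 21 can be outlined as follows. This is a hypothetical Python sketch, not the Leica Application Suite X implementation: the unit interfaces (`acquire_image`, `extract_and_transfer`, `extract_and_sequence`) and the stub classes are invented for illustration. What it shows is the ordering described above (image first, then dissect and transfer, then extract and sequence) and that both results are returned under the same sample identifier.

```python
from dataclasses import dataclass
from typing import Protocol


class ImagingUnit(Protocol):        # hypothetical interface for unit 22
    def acquire_image(self, sample_id: str) -> bytes: ...


class DissectionUnit(Protocol):     # hypothetical interface for unit 24
    def extract_and_transfer(self, sample_id: str) -> str: ...


class SequencingUnit(Protocol):     # hypothetical interface for unit 23
    def extract_and_sequence(self, portion_id: str) -> str: ...


@dataclass
class AcquisitionResult:
    sample_id: str
    image_data: bytes
    sequence_data: str


class ControlUnit:
    def __init__(self, imaging: ImagingUnit, dissection: DissectionUnit,
                 sequencing: SequencingUnit):
        self.imaging = imaging
        self.dissection = dissection
        self.sequencing = sequencing

    def process(self, sample_id: str) -> AcquisitionResult:
        # 1. Image the intact sample before any material is removed.
        image = self.imaging.acquire_image(sample_id)
        # 2. Cut out a portion (25A) and move it to the sequencer's receiving port.
        portion_id = self.dissection.extract_and_transfer(sample_id)
        # 3. Extract nucleic acid from the portion and sequence it.
        sequence = self.sequencing.extract_and_sequence(portion_id)
        # Both results are tied to the same sample identifier.
        return AcquisitionResult(sample_id, image, sequence)


# Stub units for a dry run; they only record the order in which they are called.
calls = []


class StubImaging:
    def acquire_image(self, sample_id):
        calls.append("image")
        return b"raw-image"


class StubDissection:
    def extract_and_transfer(self, sample_id):
        calls.append("dissect")
        return sample_id + "/25A"


class StubSequencing:
    def extract_and_sequence(self, portion_id):
        calls.append("sequence")
        return "ACGTTGCA"


result = ControlUnit(StubImaging(), StubDissection(), StubSequencing()).process("S-0001")
```

Defining the units through structural interfaces keeps the control logic independent of which imaging, dissection or sequencing technology is fitted, in line with the statement that the underlying technologies may be conventional.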

The operation of the data acquisition unit may be at least partially automated. For example, once initiated it may be operable to retrieve a selected biological sample from a supply of samples (for example a stack of previously prepared sample slides that the data acquisition unit is to work through in turn), load the retrieved sample to an imaging location for the image data acquisition unit, and acquire the image data for the sample in accordance with a desired imaging protocol for the study at hand (e.g. in terms of the number and type(s) of images to obtain for each sample and associated magnification(s), exposure time(s) etc.). Once the image data has been obtained, the dissection unit may automatically cut and remove a portion of the retrieved sample. For example, the dissection unit may be controlled to extract a portion of the sample from a predefined location in the sample, of a size suitable for processing by the DNA sequence acquisition unit, and to transfer the extracted portion of the sample to a loading port of the DNA sequence acquisition unit. The DNA sequence acquisition unit may then be triggered to process the portion of the sample to obtain the DNA sequence data. Thus at least some aspects of the data acquisition component 20 may operate robotically in accordance with the principles generally used for automating laboratory equipment.
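One way to express such an automated run is to expand the slide stack and the imaging protocol into an ordered list of instrument steps before anything is executed. The Python sketch below does this; the protocol fields (magnifications, exposure time, images per magnification) follow the examples given above, but their names, default values, the dissection settings and the step tuples are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImagingProtocol:
    magnifications: tuple = (10, 40)    # hypothetical objective magnifications
    exposure_ms: float = 50.0           # hypothetical exposure time per image
    images_per_magnification: int = 1


@dataclass(frozen=True)
class DissectionSettings:
    location: tuple = (0.5, 0.5)  # hypothetical predefined location (fraction of slide width, height)
    size_um: float = 200.0        # hypothetical portion size suited to the sequencer


def plan_run(slide_ids, protocol, dissection):
    """Expand a stack of slides into an ordered list of instrument steps."""
    steps = []
    for slide in slide_ids:
        steps.append(("load", slide))
        for mag in protocol.magnifications:
            for _ in range(protocol.images_per_magnification):
                steps.append(("image", slide, mag, protocol.exposure_ms))
        # Dissection happens only after all images of this slide are taken.
        steps.append(("dissect", slide, dissection.location, dissection.size_um))
        steps.append(("sequence", slide))
    return steps
```

With the default settings, `plan_run(["slide-01", "slide-02"], ImagingProtocol(), DissectionSettings())` yields five steps per slide. Generating the plan first would allow the whole run to be checked or logged before any sample is handled; the same steps could equally be issued one at a time.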

In another implementation, some or all of the operations of the data acquisition component 20 may be performed manually, for example by a clinician. For example, the clinician may be responsible for loading the biological sample into the data acquisition component and for manually manipulating the dissection / transfer aspects of the operation.

It will be appreciated that, in general, the specific technologies underlying the different operating aspects of the data acquisition component (i.e. the specific type of image data acquisition unit, dissection unit and DNA sequence data acquisition unit) are not significant to the principle described herein, and may themselves be based on conventional techniques.

Thus, the data acquisition component 20 is configured to analyse a biological sample to obtain image data for the biological sample in association with DNA sequence data for the sample which has been imaged. Significantly, in this example, these two types of data are acquired from the same sample using a single apparatus controlled by a single control unit (whose operating functionality will typically be provided by a software application running on a general-purpose programmable processor of the control unit).

In one example the data acquisition component 20 may be based on an existing data acquisition platform, such as a Leica DMi8 (RTM) platform. The DMi8 comprises an inverted microscope and numerous ports for other components, and may be provided with a dissection / transfer unit 24 and a DNA extraction and sequencing unit 23.

The data acquisition component 20 is further configured to transmit the image data and DNA sequence data for the sample 25 to the database storage component 30 via the Internet 50. The image data and DNA sequence data are conveyed to the database storage component 30 together with an indication that they are associated with one another, i.e. that they relate to the same biological sample 25. This indication may be provided explicitly, for example by associating a matching identifier with the two types of data, so that the database storage component 30 can identify the relationship between them by determining that they share the same identifier. Alternatively, the indication may be provided implicitly, for example by transmitting the two types of data together in a single message.
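The two ways of indicating that the data belong together (an explicit matching identifier versus implicit co-transmission in one message) can be illustrated with JSON messages. This is a hedged sketch: the message fields (`sample_id`, `kind`, `payload`), the use of a random UUID as the identifier, and text-encoded image data are all assumptions made for illustration, not a format defined by the source.

```python
import json
import uuid
from typing import Dict, List


def explicit_messages(image_b64: str, sequence: str) -> List[str]:
    """Two separate messages tied together by a matching sample identifier."""
    sample_id = str(uuid.uuid4())
    return [
        json.dumps({"sample_id": sample_id, "kind": "image", "payload": image_b64}),
        json.dumps({"sample_id": sample_id, "kind": "sequence", "payload": sequence}),
    ]


def implicit_message(image_b64: str, sequence: str) -> str:
    """One message whose co-transmission is itself the indication of a shared sample."""
    return json.dumps({"image": image_b64, "sequence": sequence})


def pair_by_identifier(messages: List[str]) -> Dict[str, Dict[str, str]]:
    """Receiver side: group explicit messages by their shared identifier."""
    paired: Dict[str, Dict[str, str]] = {}
    for raw in messages:
        msg = json.loads(raw)
        paired.setdefault(msg["sample_id"], {})[msg["kind"]] = msg["payload"]
    return paired
```

The explicit form tolerates messages arriving separately or out of order, whereas the implicit form relies on the transport keeping the single message intact.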

Turning now to the database storage component 30, this comprises a database control unit 31 and a database storage unit 32. The database storage component 30 is generally configured to receive, via the Internet 50, image data and associated DNA sequence data from the biological data acquisition component 20 and to store the data in a database comprising records stored in the database storage unit 32. It will be appreciated that the database storage component 30 will in general comprise various other elements associated with its operating functionality, for example a power supply, but these are not directly relevant to the principles described herein and are not shown in Figure 1 or described further in the interests of simplicity and brevity.

Thus, the database control unit 31 is configured to receive the biological data (image data and associated DNA sequence data) from the data acquisition component 20 and to control the database storage unit 32 to store a record of the image data for the biological sample in linked association with a record of the DNA sequence data for the biological sample. In some respects this may be seen as tagging the image data record with the DNA sequence data record, such that the DNA sequence data record may be considered to comprise metadata for the image data record. More generally, the format of the database and the manner in which the image data records and DNA sequence data records are indicated as being associated / linked with one another (because they relate to the same biological sample) may follow conventional database techniques. The database control unit 31 comprises a processor which is suitably configured / programmed to provide the functionality described herein using conventional programming / configuration techniques for controlling database generation, maintenance and access operations. The functionality of the control unit 31 may be provided in various different ways, for example using a suitably programmed general-purpose computer, suitably configured application-specific integrated circuit(s) / circuitry, or a plurality of discrete circuitry / processing elements each providing different elements of the desired functionality.
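One conventional way to store image records "in linked association" with DNA sequence records is a relational schema in which both record types reference a common sample key. The sketch below uses Python's built-in `sqlite3` module; the table and column names are hypothetical, and a production system would use whatever database technology the operator prefers.

```python
import sqlite3

# Hypothetical schema: one row per sample, with image and sequence records
# linked to it through sample_id (the "linked association").
SCHEMA = """
CREATE TABLE sample (
    sample_id TEXT PRIMARY KEY
);
CREATE TABLE image_record (
    image_id  INTEGER PRIMARY KEY,
    sample_id TEXT NOT NULL REFERENCES sample(sample_id),
    image     BLOB NOT NULL
);
CREATE TABLE sequence_record (
    sequence_id INTEGER PRIMARY KEY,
    sample_id   TEXT NOT NULL REFERENCES sample(sample_id),
    sequence    TEXT NOT NULL
);
"""


def open_database(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the sample links
    conn.executescript(SCHEMA)
    return conn


def store_linked(conn: sqlite3.Connection, sample_id: str, image: bytes, sequence: str) -> None:
    """Store both records for one sample in a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO sample (sample_id) VALUES (?)", (sample_id,))
        conn.execute(
            "INSERT INTO image_record (sample_id, image) VALUES (?, ?)", (sample_id, image)
        )
        conn.execute(
            "INSERT INTO sequence_record (sample_id, sequence) VALUES (?, ?)", (sample_id, sequence)
        )
```

Enabling foreign keys means a record cannot be stored against a sample that does not exist, which guards the linkage the approach depends on.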

The database storage unit 32 comprises a memory for the various records making up the database of biological data (i.e. in this example image data records and linked DNA sequence data records). The database storage unit may be based on any conventional memory technology, for example a solid-state memory or a disk-based memory, such as a magnetic disk or optical disc based memory. Furthermore, although the storage unit 32 may in principle comprise a single memory unit, it will typically comprise a distributed array of memory units providing redundancy for failure recovery in accordance with conventional techniques.

Thus, the biological data acquisition component 20 may be used to obtain biological data comprising image data representing one or more images of a biological sample and associated DNA sequence data for the biological sample, and these may be stored in association with one another in a database provided by the database storage component 30. Data associated with a plurality of different biological samples may be obtained by the biological data acquisition component 20, and other similar biological data acquisition units, so that the database may be used to store image data with associated DNA sequence data for a plurality of biological samples.

A significant aspect of this approach is that the image data can in effect be considered to be tagged by the associated DNA sequence data, which is much more amenable to searching than the image data itself. This provides a mechanism allowing a user, in effect, to search for and identify potentially relevant image data in the database by searching the associated DNA sequence data. A user may do this via the database access component 40 represented in Figure 1.

The database access component 40 represented in Figure 1 comprises, in this example, a conventional general-purpose computer running an appropriately configured software application to provide access to the database stored in the database storage unit 32 of the database storage component 30, via the Internet 50, in accordance with the principles described herein and discussed further below. It will be appreciated that in other examples, or in addition, the database access component 40 may comprise other types of device, for example a dedicated / bespoke device which is hardware / firmware configured to provide the functionality described herein, or another type of general purpose programmable device, such as a smartphone or tablet device, running an appropriate software application.

Figure 2 schematically represents a method of acquiring biological data and storing corresponding records in a database using the system 10 represented in Figure 1 in accordance with certain embodiments of the disclosure.

Processing begins in a first step S1. This may correspond with the point at which the data acquisition component 20 receives a new biological sample 25 to process.

In step S2 image data for the biological sample 25 is acquired using the image data acquisition unit 22. As noted above, the image data may be acquired in accordance with generally conventional techniques, for example using conventional microscope image capture techniques, and may be captured in any known image data format. It will be appreciated that the approaches described herein are not limited to images acquired from microscopes, but can be applied in respect of any image data. More generally, and as discussed further below, the same principles can also be applied in respect of other forms of data which do not represent an image of the biological sample but represent another aspect of it, in particular an aspect which is less amenable to searching than sequence data. In this regard, the image data captured in step S2 may be considered an example of a more general approach directed to obtaining phenotype data for the biological sample. That is to say, in accordance with certain embodiments, rather than acquiring image data (or in addition to acquiring image data), the data acquisition unit may be configured to acquire one or more other types of phenotype data relating to the biological sample, i.e. data relating to the phenotype of the cell / tissue / biological sample. Some examples of different types of phenotype data are discussed further below. The specific protocols adopted for obtaining the phenotype data will depend on the application at hand. For example, where the phenotype data comprises image data acquired from a microscope imager, the nature of the image(s) obtained for the biological sample (for example in terms of focus settings, exposure times, magnifications, etc.) will depend in the usual way on the nature of the study at hand and the characteristics of the biological sample which are sought to be captured.

In step S3 a portion 25A of the biological sample is dissected from the biological sample 25 by the dissection unit (transfer unit) 24. As noted above, this may be performed in accordance with generally conventional techniques, for example laser microdissection techniques, which may be automated (for example under control of the data acquisition control unit 21) or manually controlled. The specific protocols adopted for dissecting the portion 25A of the biological sample 25 for transfer to the DNA sequence acquisition unit 23 will depend on the application at hand. For example, the nature of the portion 25A selected for transfer to the DNA sequence acquisition unit 23 (for example in terms of the size of the portion and the location in the sample from which it is taken) will depend in the usual way on the nature of the study at hand and on the operating characteristics of the particular DNA sequence acquisition unit 23 being used (e.g. in terms of how large a portion of the sample is required for sequencing).

In step S4 DNA sequence data for the biological sample 25 is acquired using the DNA sequence data acquisition unit 23. As noted above, the DNA sequence data may be acquired in accordance with generally conventional techniques, for example using conventional DNA extraction and sequencing techniques, and may be captured in any known DNA sequence data format. It will be appreciated that the approaches described herein are not limited to DNA sequence data, but can be applied in respect of other forms of (linear) biological sequence data, such as RNA sequence data and protein sequence data. In this regard, the DNA sequence data captured in step S4 may be considered an example of a more general approach directed to obtaining any form of biological sequence data for the biological sample. That is to say, in accordance with certain embodiments, rather than acquiring DNA sequence data (or in addition to acquiring DNA sequence data), the data acquisition unit may be configured to acquire one or more other types of biological sequence data.
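Because DNA, RNA and protein sequences share a linear representation, a single record type can hold any of them if it also records which alphabet applies. The sketch below is illustrative only: `SequenceKind` and `BiologicalSequence` are hypothetical names, and the alphabets (with `N` as an unresolved nucleotide) are a simplifying assumption; real sequence formats allow further ambiguity codes.

```python
from dataclasses import dataclass
from enum import Enum


class SequenceKind(Enum):
    """Allowed residue alphabets (N marks an unresolved nucleotide)."""
    DNA = frozenset("ACGTN")
    RNA = frozenset("ACGUN")
    PROTEIN = frozenset("ACDEFGHIKLMNPQRSTVWY")  # 20 standard amino acids


@dataclass(frozen=True)
class BiologicalSequence:
    kind: SequenceKind
    residues: str

    def __post_init__(self) -> None:
        # Reject characters outside the alphabet for this kind of sequence.
        invalid = set(self.residues.upper()) - self.kind.value
        if invalid:
            raise ValueError(f"invalid {self.kind.name} residues: {sorted(invalid)}")
```

Tagging each stored sequence with its kind lets a later search compare like with like, for example RNA queries against RNA records.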

Thus, at the conclusion of step S4, the data acquisition component 20 has obtained image data (or more generally phenotype data) for the biological sample and DNA sequence data (or more generally biological sequence data) for the same biological sample. Furthermore, these two types of data are acquired together within a single instrument.

In step S5 the data acquisition control unit 21 is configured to communicate (upload) the phenotype data (in this example image data) and biological sequence data (in this example DNA sequence data) to the database storage component 30 via the Internet 50. As noted above, this may be performed in accordance with conventional communication protocols and techniques. The phenotype data and the biological sequence data are communicated to the database storage component 30 in a manner which indicates these two types of data are related to one another (i.e. they are associated with the same biological sample). This indicator may be provided in a number of ways as discussed above. For example, the two types of data may be associated with a unique identifier for the biological sample to which they relate, or they may simply be transmitted in association with one another to indicate they relate to the same biological sample and are to be stored in linked association in the database provided by the database storage component 30.

In step S6 the database storage control unit 31 receives the phenotype data and biological sequence data from the data acquisition component 20 and identifies that these two types of data relate to the same biological sample. The database storage control unit 31 then controls the database storage unit 32 to store a record of the phenotype data (in this case image data) in linked association with a record of the biological sequence data (in this case DNA sequence data) in the database.

This concludes the processing of the biological sample 25 in terms of acquiring and storing the relevant biological information for the sample. The biological sample 25 may thus be removed from the data acquisition component 20 and replaced with the next biological sample to be analysed, and the method of Figure 2 may repeat for the new biological sample. Thus the processing of Figure 2 may be performed repeatedly for different biological samples to build the database associated with the database storage component 30.

Figure 3 schematically represents a database of records generated in accordance with the principles underlying the method represented in Figure 2, using the system 10 represented in Figure 1. For ease of representation it is shown in highly schematic tabular form. The database may be considered to comprise a series of indexed entries, each entry index being associated with image data representing an image of a biological sample and the corresponding DNA sequence data for that sample, shown in the second and third columns of Figure 3 respectively. The size of the database (i.e. the number of rows in the tabular format represented in Figure 3) will depend on the number of different biological samples for which data has been obtained.

Thus, in accordance with the principles described herein, certain embodiments of the disclosure provide for the generation of a database of biological information comprising phenotype data which is not amenable to searching stored in association with (i.e. in effect tagged with) biological sequence data which is more amenable to searching. There are many applications for such a database.

For example, the linked biological sequence data may be used to search the database for sequences that correspond to the phenotype data, to provide information as to the physiological state of the biological sample or, working the other way, to indicate possible features of cells that may be observed if a given pattern of sequences were matched. As a first example, the biological sample may comprise brain tissue in a dish undergoing a calcium imaging experiment which shows that calcium waves are particularly rapid. Some tissue is taken, and the DNA and / or RNA is extracted and sequenced. This sequence data is tagged to (linked to) the image data and loaded into a database containing other sequences in accordance with the principles discussed above. A search may then be undertaken using the tagged sequence data, and this may show a match with an unusual, rapidly opening and closing calcium channel. It may therefore be hypothesized that the gene for that channel is turned on in the tissue being experimented upon. As sequencing the sample may provide multiple sequences, the profile of the genes being expressed can be used to match not just one gene but groups of genes in pathways, which could indicate, for example, that the tissue derives from a tumor.
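Matching "groups of genes in pathways" rather than single genes can be illustrated with a simple coverage score: for each known pathway, the fraction of its member genes that appear in the sample's expression profile. This is a deliberately simple stand-in, not the method of the source; real analyses would typically use established enrichment statistics. The function name `pathway_hits`, the threshold and any gene or pathway identifiers used with it are hypothetical.

```python
from typing import Dict, Set


def pathway_hits(
    expressed: Set[str],
    pathways: Dict[str, Set[str]],
    min_fraction: float = 0.5,
) -> Dict[str, float]:
    """Return pathways whose member genes are at least min_fraction expressed,
    ordered from highest to lowest coverage."""
    hits = {}
    for name, members in pathways.items():
        if not members:
            continue  # skip empty gene sets to avoid division by zero
        coverage = len(expressed & members) / len(members)
        if coverage >= min_fraction:
            hits[name] = coverage
    return dict(sorted(hits.items(), key=lambda kv: kv[1], reverse=True))
```

A pathway that is largely switched on in the imaged tissue then becomes a candidate explanation for the observed phenotype, to be confirmed by further study.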

As a second example, brain tissue may be observed by microscopy to have an unusual combination of proteins in the cell membrane, making those proteins a possible therapeutic target for a given drug "X". Some of the tissue is extracted, sequenced, linked to the image and loaded into the database in accordance with the principles discussed herein. These linked sequences may then be used to search the database for similar sequences expressed by other cells. A match between some of the linked sequences and sequences from other cell types could indicate that the "unusual combination of proteins in the membrane" could also occur in the cells in which the matching sequences were found, indicating that that class of cells could also be a target for drug "X".

Figure 4 schematically represents a method of searching and retrieving biological data from the database of biological information in the system 10 represented in Figure 1 in accordance with certain embodiments of the disclosure. In this regard, the method of Figure 4 may be performed by the database access component 40. That is to say, the database access component 40 is configured to perform the method of Figure 4, for example through appropriate programming of a general-purpose computing device.

Processing begins in a first step T1. This may correspond with the point at which a user of the database access component 40 decides they wish to access the database stored in the database storage unit 32 to search for data of interest. For the sake of providing a concrete example, it will be assumed here that a user of the database access component 40 is interested in a particular biological sample and wishes to identify other previously obtained images stored in the database that might be expected to show similar characteristics, for example morphology. As noted above, it is typically not easy to search on these types of phenotype characteristics because they can be hard to parameterise in a readily searchable format (e.g. in a character string format). Thus, in accordance with the principles described herein, instead of searching the image data stored in the database itself, the user may search the linked sequence data. Thus, in step T2 a biological sequence characteristic of interest is selected. In this example this corresponds with a portion of a DNA sequence obtained for the particular biological sample under study.

In step T3 the database access component 40 establishes, via the Internet 50, a link to the database storage component 30 to obtain access to the database linking records of image data with their corresponding DNA sequences obtained in accordance with the principles described above.

In step T4 the database access component 40 communicates an indication of the biological sequence characteristic of interest selected in step T2 to the database storage control unit 31 and requests the database storage control unit 31 to search the database in the database storage unit 32 for a match with the selected biological sequence characteristic. The database storage control unit 31 then undertakes the search, for example in accordance with conventional database searching techniques. It will be appreciated that what constitutes a match may be determined in accordance with conventional biological sequence database searching techniques. It will also be appreciated that in some cases there may be no matches, or there may be more than one match.
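Conventional sequence search tools use considerably more sophisticated alignment methods, but the idea that a match may be partial, and that a search may return zero, one or several hits, can be shown with a simple k-mer overlap score. This is a swapped-in illustrative technique, not the matching method of the source; `kmers`, `find_matches`, `k=4` and the 0.5 threshold are all hypothetical choices.

```python
from typing import Dict, List, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """All overlapping substrings of length k (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def find_matches(
    query: str,
    records: Dict[str, str],
    k: int = 4,
    min_shared: float = 0.5,
) -> List[Tuple[str, float]]:
    """Score each stored sequence by the fraction of the query's k-mers it contains.

    Returns zero, one or several (record_id, score) pairs, best first.
    """
    query_kmers = kmers(query, k)
    if not query_kmers:
        return []
    matches = []
    for record_id, stored in records.items():
        score = len(query_kmers & kmers(stored, k)) / len(query_kmers)
        if score >= min_shared:
            matches.append((record_id, score))
    matches.sort(key=lambda m: m[1], reverse=True)
    return matches
```

Lowering `min_shared` widens the search to looser matches at the cost of more candidates for the user to review.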

In step T5, having identified a match between the biological sequence characteristic provided to the database storage control unit 31 in step T4 and one or more biological sequences stored in the database, the database storage control unit 31 retrieves the image data (or other phenotype data) stored in the database in linked association with the matching biological sequence(s).

In step T6 the database storage control unit communicates the image data retrieved from the database in step T5 to the database access component 40 via the Internet. On receiving the image data from the database storage component 30, the database access component 40 may output an indication of the retrieved image data to the user, for example by displaying corresponding image(s) on a display of the database access component 40.
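Steps T4 to T6 can be condensed into one query against a table laid out like Figure 3 (entry index, image data, DNA sequence): find rows whose sequence contains the selected characteristic and return the linked image data. This sketch assumes an exact-substring test via SQLite's `instr()` function as a stand-in for a conventional sequence search; the table, column and function names and the demo rows are hypothetical placeholders.

```python
import sqlite3
from typing import List, Tuple


def build_demo_db() -> sqlite3.Connection:
    """In-memory table mirroring Figure 3: entry index, image data, DNA sequence."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE record ("
        " entry_index INTEGER PRIMARY KEY,"
        " image BLOB NOT NULL,"
        " dna_sequence TEXT NOT NULL)"
    )
    # Placeholder rows; not real experimental data.
    conn.executemany(
        "INSERT INTO record (image, dna_sequence) VALUES (?, ?)",
        [(b"img-1", "ACGTACGTTT"), (b"img-2", "GGGGCCCC"), (b"img-3", "TTACGTAA")],
    )
    conn.commit()
    return conn


def retrieve_phenotype(conn: sqlite3.Connection, characteristic: str) -> List[Tuple[int, bytes]]:
    """Steps T4-T5: find stored sequences containing the selected characteristic
    (exact substring match here) and return the linked image data."""
    return conn.execute(
        "SELECT entry_index, image FROM record"
        " WHERE instr(dna_sequence, ?) > 0"
        " ORDER BY entry_index",
        (characteristic.upper(),),
    ).fetchall()
```

The returned image data would then be sent to the database access component 40 for display (step T6). An empty list corresponds to the case where no stored sequence matches.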

This concludes the process of retrieving data from the database. Further processing / analysis of the retrieved data may be performed according to the application at hand, for example in terms of which characteristics represented by the image data the user may wish to correlate with a biological sample under study.

Thus, in accordance with the principles described herein, certain embodiments of the disclosure provide methods and apparatus for obtaining biological information, storing biological information in a database, and searching a database to identify biological information of interest. In particular, two different types of biological information are obtained for a biological sample, wherein a first type of biological information comprises phenotype information, for example image data, and a second type of biological information comprises biological sequence data, for example DNA sequence data. The two types of data are stored in association with one another in a database. The database may thus be conveniently searched on the basis of the biological sequence data, since biological sequence data is generally amenable to searching, and phenotype data, for example image data, corresponding to a biological sequence which matches the search criterion / criteria may be identified and retrieved for presentation to a user.

The above examples have primarily focused on linking DNA sequence data with image data for biological samples, but as already noted, the same principle can be applied for other types of phenotype data and biological sequence data.

In this regard, Figure 5 schematically represents a data acquisition component 50 which is similar to, and will in many respects be understood from, the data acquisition component 20 represented in Figure 1, with corresponding elements identified by corresponding reference numerals. The data acquisition component 50 represented in Figure 5 may be implemented in a system similar to that represented in Figure 1 and described above. However, whereas the data acquisition component 20 of the system represented in Figure 1 comprises an image data acquisition unit for obtaining image data and a DNA sequence acquisition unit for obtaining DNA sequence data for processing as described above, the data acquisition component 50 in Figure 5 comprises data acquisition units for acquiring different types of data.

Thus, in place of the image data acquisition unit 22 represented in the data acquisition component 20 of Figure 1, the data acquisition component 50 of Figure 5 is represented more generally as comprising a generic phenotype data acquisition unit 52. The phenotype data acquisition unit 52 may comprise an image data acquisition unit, as in the example of Figure 1, or another type of apparatus for acquiring biological phenotype data. Examples of other types of apparatus for acquiring biological phenotype data associated with a biological sample include a mass spectrometer configured to obtain mass spectrometry data for a biological sample, a flow cytometer configured to acquire flow cytometry data for a biological sample, and a physiology data acquisition unit configured to acquire (electro)physiology data for a biological sample.

Furthermore, in place of the DNA sequence acquisition unit 23 represented in the data acquisition component 20 of Figure 1, the data acquisition component 50 of Figure 5 is represented more generally as comprising a generic biological sequence data acquisition unit 53. The biological sequence data acquisition unit 53 may comprise a DNA sequence acquisition unit, as in the example of Figure 1, or another type of apparatus for acquiring biological sequence data. Examples of other types of apparatus for acquiring biological sequence data associated with a biological sample include an RNA sequence data acquisition unit for obtaining RNA sequence data and a protein sequence data acquisition unit for obtaining protein sequence data. In this regard, DNA sequence data and RNA sequence data may be collectively referred to as nucleic acid sequence data. It will be appreciated that in certain implementations DNA sequence data may be derived from messenger RNA sequence data.
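The generalisation to arbitrary phenotype and sequence acquisition units maps naturally onto structural interfaces: any unit that reports its data type and can acquire data for a sample may be plugged in. The sketch below uses `typing.Protocol`; the interface names, `acquire_linked`, and the two stub instruments are hypothetical, and a real deployment would wrap each instrument's own control software.

```python
from typing import Any, Dict, Protocol


class PhenotypeUnit(Protocol):
    """Any unit producing phenotype data, e.g. an imager or a flow cytometer."""
    data_type: str

    def acquire(self, sample_id: str) -> Any: ...


class SequenceUnit(Protocol):
    """Any unit producing biological sequence data: DNA, RNA or protein."""
    sequence_type: str

    def acquire(self, sample_id: str) -> str: ...


def acquire_linked(sample_id: str, phenotype: PhenotypeUnit, sequencer: SequenceUnit) -> Dict[str, Any]:
    """Obtain both data types for one sample and keep them in a single record."""
    return {
        "sample_id": sample_id,
        "phenotype": {"type": phenotype.data_type, "data": phenotype.acquire(sample_id)},
        "sequence": {"type": sequencer.sequence_type, "data": sequencer.acquire(sample_id)},
    }


# Hypothetical stubs returning placeholder values, not real instrument output.
class FlowCytometerStub:
    data_type = "flow_cytometry"

    def acquire(self, sample_id: str) -> Dict[str, str]:
        return {"sample": sample_id}


class RnaSequencerStub:
    sequence_type = "RNA"

    def acquire(self, sample_id: str) -> str:
        return "ACGUACGU"
```

Recording the data type alongside each payload keeps the stored records self-describing when different instrument combinations feed the same database.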

In general, the data acquisition units in any given implementation will be selected according to the study at hand, and furthermore, the data acquisition units, whether for the phenotype data or the sequence data, may generally operate in accordance with conventional data acquisition techniques in the field of biological information acquisition. What is significant in accordance with some embodiments of the disclosure is not how the different types of data are obtained in terms of the underlying technology, but how the different types of data are acquired and stored in association with one another so the sequence data can be used to in effect label the corresponding phenotype data, for example to facilitate database searching.

It will be appreciated the above description has focused on only some example implementations of approaches in accordance with certain embodiments of the disclosure, and various aspects of the above-described approaches may be modified in accordance with other implementations.

For example, whereas in the implementation represented in Figure 1 the phenotype data acquisition unit and the biological sequence data acquisition unit are comprised in a single integrated apparatus, in principle they may be provided as separate units. However, integrating these elements into a single apparatus may be expected to facilitate use, for example by avoiding the additional work that would otherwise be needed to ensure the relevant datasets are properly linked, work which could itself introduce errors in the associations. It will further be appreciated that certain elements of the system represented in Figure 1 may be combined rather than provided separately. For example, rather than the data acquisition component 20 communicating with a remote database storage component 30 via the Internet 50, the functionality of these elements may be provided in a single integrated apparatus. That is to say, the database storage unit may itself be a part of the data acquisition component. Similarly, the database may be accessed through the data acquisition component itself, rather than, or in addition to, through a separate component 40.

Furthermore, it will be appreciated that the underlying nature of any particular study which may adopt the principles described herein (for example in terms of what type of tissue the study is concerned with, and what the study is seeking to determine) is not significant. It will also be appreciated that once data has been obtained in accordance with the principles described herein, it may be used in many different ways. For example, in one implementation a user may obtain image data (or other phenotype data) and associated DNA sequence data (or other biological sequence data) for a particular biological sample of interest. The user may then search a conventional DNA database for a match to the DNA sequence data to identify previous work relating to biological samples having similar DNA sequence characteristics, and review that earlier work to identify any aspect(s) of the biological samples studied which are biologically relevant to the user's current study.

Thus there has been described a system for obtaining biological information relating to a biological sample, the system comprising: a control unit; one or more phenotype data acquisition unit(s); and one or more biological sequence data acquisition unit(s). The control unit is configured to control the phenotype data acquisition unit(s) to obtain phenotype data representing a phenotype characteristic of the biological sample, to control the biological sequence data acquisition unit(s) to obtain biological sequence data representing a biological sequence characteristic of the biological sample, and to convey an indication of the phenotype data and the biological sequence data to one or more storage unit(s) for storing with an indication that the phenotype data and the biological sequence data are associated with the same biological sample. Thus the storage unit(s) may comprise a database of records of biological information relating to a plurality of biological samples, wherein the records for each biological sample comprise phenotype data stored in association with biological sequence data. The database may be searched by selecting a biological sequence characteristic of interest and identifying a record of biological sequence data in the database which matches it, and the phenotype data stored in association with the matching biological sequence data may then be retrieved.

Further particular and preferred aspects of the present invention are set out in the accompanying independent and dependent claims. It will be appreciated that features of the dependent claims may be combined with features of the independent claims in combinations other than those explicitly set out in the claims.

REFERENCES

[1] US 2003/0021017

[2] US 7,035,004