Login| Sign Up| Help| Contact|

Patent Searching and Data


Title:
A METHOD AND APPARATUS FOR LAB TESTS
Document Type and Number:
WIPO Patent Application WO/2022/195582
Kind Code:
A1
Abstract:
A method, apparatus and computer program product, the method comprising: obtaining a number of samples to be tested for an attribute, each sample associated with a sample identifier, a number of pools of tests to be created, each pool associated with a pool identifier; determining a combinatorial testing scheme, comprising generating an association of sample identifiers with pool identifiers, wherein: one or more sample identifiers are associated with two or more pool identifiers, and one or more pool identifiers are associated with two or more sample identifiers; automatically providing the association to a testing facility; receiving from the facility a plurality of test results, each test result indicating for a pool identifier a degree to which a corresponding pool possesses the attribute; and based on the association and on the test results, determining for each sample identifier a probability that the sample possesses the attribute.

Inventors:
POLACHEK RUTH (IL)
Application Number:
PCT/IL2022/050285
Publication Date:
September 22, 2022
Filing Date:
March 14, 2022
Export Citation:
Click for automatic bibliography generation   Help
Assignee:
G T A I INNOVATION LTD (IL)
International Classes:
C12Q1/00
Foreign References:
US20190100769A12019-04-04
US20120185177A12012-07-19
US20120173160A12012-07-05
Attorney, Agent or Firm:
HELFAND, Naomi et al. (IL)
Download PDF:
Claims:
CLAIMS

1. A computer-implemented method comprising: obtaining a number of samples to be tested for possessing an attribute, wherein each sample is associated with a sample identifier, a number of pools of tests to be created, each pool associated with a pool identifier; determining a combinatorial testing scheme, comprising generating an association of each sample identifier with one or more pool identifiers, wherein: at least one sample identifier is associated with at least two pool identifiers, and at least one pool identifier is associated with at least two sample identifiers; automatically providing the association to a testing facility; receiving from the testing facility a plurality of test results, each test result indicating for a pool identifier a degree to which a pool corresponding to the pool identifier possesses the attribute; and based on the association and on the test results, determining for each sample identifier a probability that the sample corresponding to the sample identifier possesses the attribute.

2. The computer- implemented method of Claim 1, further comprising determining a result for each sample, the result based on the probability and on a certainty degree.

3. The computer- implemented method of Claim 2, further comprising providing the result to a user.

4. The computer-implemented method of Claim 2, wherein the result is determined by applying a rule based algorithm

5. The computer- implemented method of Claim 2, wherein the result is selected from the group consisting of: “positive” indicating that the sample comprises the attribute with a certainty exceeding a first threshold, “negative” indicating that the sample comprises the attribute with a certainty below a second threshold lower than the first threshold, and “undetermined” if the sample comprises the attribute with a certainty between the first threshold and the second threshold.

6. The computer-implemented method of Claim 2, wherein the result is a real number.

7. The computer- implemented method of Claim 1, further comprising further processing of the plurality of the test results for enhancing comprehension of the test result prior to determining the probability that the sample corresponding to the sample identifier possesses the attribute.

8. The computer- implemented method of Claim 1, wherein the plurality of indicators are received based on a liquid-based test performed for each of the pools.

9. The computer- implemented method of Claim 8, wherein the liquid-based test is a qRT-PCR test.

10. The computer- implemented method of Claim 1, wherein the testing facility comprises an actuator system adapted to: for each pool identifier, automatically mix at least a part from each sample associated with the pool identifier, to create a pool; test the pool to obtain the corresponding indicator; and report the test results.

11. The computer- implemented method of Claim 1, further comprising: receiving data related to a testing facility equipment and operation manner, including at least a number of testing devices; generating a test plan to optimize testing the pools in accordance with the testing facility equipment and operation manner; and providing the test plan to the testing facility.

12. The computer- implemented method of Claim 1, further comprising obtaining a predicted prevalence that an item possesses the attribute; and wherein generating the association is also based on the predicted prevalence.

13. The computer-implemented method of Claim 12, wherein said obtaining comprises computing the predicted prevalence based on at least one parameter selected from the group consisting of: number of tests that can be performed concurrently; testing device type; kit type; demographics, symptoms or geographic data of at least one person that at least one sample is associated with, demographic data related to samples previously tested in the testing facility, or demographic data related to previously tested samples collected from subjects in an environment of a subject that at least one sample is associated with.

14. The computer- implemented method of Claim 13 wherein higher predicted prevalence causes association of fewer sample identifiers with at least one pool identifier.

15. The computer-implemented method of Claim 1, wherein generating the association comprises determining at least a predetermined percentage of the samples to be tested individually.

16. The computer-implemented method of Claim 1, wherein generating the association comprises dividing the samples into at least two groups wherein a different association algorithm or a different decoding algorithm is applied for each group.

17. The computer-implemented method of Claim 1, wherein generating the association comprises determining on each iteration a corresponding pool for each sample, using a greedy algorithm.

18. The computer- implemented method of Claim 17, wherein the greedy algorithm is activated with a cost function to be minimized, the cost function being a probability of the sample being labelled as false positive.

19. The computer-implemented method of Claim 1, wherein calculating for a sample identifier a probability that a sample corresponding to the sample identifier possesses the attribute comprises: performing a random walk over a solution space comprising possible solutions indicating samples assumed to possess the attribute, including calculating a maximal likelihood for each visited point in the solution space; and assigning a probability that a sample possesses the attribute based on a proportion between a number of vertices visited in the random walk in which the sample possesses the attribute and a number of vertices in the random walk.

20. The computer-implemented method of Claim 19, wherein assigning the probability is based on a message passing algorithm.

21. The computer-implemented method of Claim 19, wherein assigning the probability is based on a binary algorithm.

22. A computerized apparatus having a processor and coupled memory, the processor being adapted to perform the steps of: obtaining a number of samples to be tested for possessing an attribute, wherein each sample is associated with a sample identifier, a number of pools of tests to be created, each pool associated with a pool identifier; determining a combinatorial testing scheme, comprising generating an association of each sample identifier with one or more pool identifiers, wherein: at least one sample identifier is associated with at least two pool identifiers, and at least one pool identifier is associated with at least two sample identifiers; automatically providing the association to a testing facility; automatically receiving from the testing facility a plurality of test results, each test result indicating for a pool identifier a degree to which a pool corresponding to the pool identifier possesses the attribute; and based on the association and on the test results, determining for each sample identifier a probability that the sample corresponding to the sample identifier possesses the attribute.

23. A computer program product comprising a non-transitory computer readable storage medium retaining program instructions, which program instructions when read by a processor, cause the processor to perform a method comprising: obtaining a number of samples to be tested for possessing an attribute, wherein each sample is associated with a sample identifier, a number of pools of tests to be created, each pool associated with a pool identifier; determining a combinatorial testing scheme, comprising generating an association of each sample identifier with one or more pool identifiers, wherein: at least one sample identifier is associated with at least two pool identifiers, and at least one pool identifier is associated with at least two sample identifiers; automatically providing the association to a testing facility; automatically receiving from the testing facility a plurality of test results, each test result indicating for a pool identifier a degree to which a pool corresponding to the pool identifier possesses the attribute; and based on the association and on the test results, determining for each sample identifier a probability that the sample corresponding to the sample identifier possesses the attribute.

Description:
A METHOD AND APPARATUS FOR LAB TESTS

TECHNICAL FIELD

[0001] The present disclosure relates to testing in general, and to a method and apparatus for efficient diagnostic and/or molecular tests, in particular.

BACKGROUND

[0002] Some types of testing relate to identifying from a plurality of items, whether and which one or more items have a specific attribute, usually referring to faulty items. For example, it may be required to identify faulty light bulbs from a plurality of light bulbs. In another example, it may be required to identify from a plurality of samples of individuals, whether and which individuals show presence of targets such as a pathogen, such as syphilis, HIV, SARS-CoV-2, or the like, antibodies, or other identifiable material.

[0003] In some situations, the probability of the attribute to be present in any of the items may be rather low, for example in a light bulb factory where most bulbs are believed to be functional, or when testing for a disease people that are generally healthy, such as routine survey testing for soldiers or for women of a certain age group.

[0004] While it is possible to test each item individually and determine whether it carries the attribute (“positive”) or not (“negative”), this may be a lengthy and cost-ineffective process, due to the large number of tested subjects. In order to improve the efficiency and cost, in some of these situations, group testing may be employed.

[0005] In group testing, also referred to as “pooling”, a plurality of items are tested together, such that a positive result is received if and only if at least one of the tested items within the tested group is positive. For example, group testing of light bulbs may be performed by connecting a group of them serially with one another and with a power supply. If all bulbs light up, they are all fine, but if any of them is broken, none will light up. In the case of liquid samples, the samples may be mixed and tested as a single sample. If any one or more of the individual samples has presence of the vims or bacteria, the mixture will return a positive result. However, if the result is negative, then all individuals whose samples were mixed may be assumed to be free of the disease. In such situations, a negative test result may save a plurality of individual tests that will not be necessary to test individually due to the group test results.

[0006] In some situations, group testing may return a binary, e.g., positive/negative result, wherein a negative result indicates that all items are fine, e.g., are functional, healthy and do not carry the vims, bacteria or disease, or the like, and a positive result indicates that at least one item is faulty. In other situations, the test result may be quantitative and can depend on the number and/or severity of positive samples in the test, for example the relative fluorescence over the course of the qRT-PCR reaction.

SUMMARY

[0007] One exemplary embodiment of the disclosed subject matter is a computer- implemented method comprising: obtaining a number of samples to be tested for possessing an attribute, wherein each sample is associated with a sample identifier, a number of pools of tests to be created, each pool associated with a pool identifier; determining a combinatorial testing scheme, comprising generating an association of each sample identifier with one or more pool identifiers, wherein: each of one or more sample identifiers are associated with two or more pool identifiers, and one or more pool identifiers are associated with two or more sample identifiers; automatically providing the association to a testing facility; receiving from the testing facility a plurality of test results, each test result indicating for a pool identifier a degree to which a pool corresponding to the pool identifier possesses the attribute; and based on the association and on the test results, determining for each sample identifier a probability that the sample corresponding to the sample identifier possesses the attribute. The method can further comprise determining a result for each sample, the result based on the probability and on a certainty degree. The method can further comprise providing the result to a user. Within the method, the result is optionally determined by applying a rule based algorithm. Within the method, the result is optionally selected from the group consisting of: “positive” indicating that the sample comprises the attribute with a certainty exceeding a first threshold, “negative” indicating that the sample comprises the attribute with a certainty below a second threshold lower than the first threshold, and “undetermined” if the sample comprises the attribute with a certainty between the first threshold and the second threshold. Within the method, the result is optionally a real number. The method can further comprise further processing of the plurality of the test results for enhancing comprehension of the test result prior to determining the probability that the sample corresponding to the sample identifier possesses the attribute. Within the method, the plurality of indicators are optionally received based on a liquid-based test performed for each of the pools. Within the method, the liquid-based test is optionally a qRT-PCR test. Within the method, the testing facility optionally comprises an actuator system adapted to: for each pool identifier, automatically mix at least a part from each sample associated with the pool identifier, to create a pool; test the pool to obtain the corresponding indicator; and report the test results. The method can further comprise: receiving data related to a testing facility equipment and operation manner, including at least a number of testing devices; generating a test plan to optimize testing the pools in accordance with the testing facility equipment and operation manner; and providing the test plan to the testing facility. The method can further comprise: obtaining a predicted prevalence that an item possesses the attribute; and wherein generating the association is also based on the predicted prevalence. Within the method, said obtaining optionally comprises computing the predicted prevalence based on one or more parameters selected from the group consisting of: number of tests that can be performed concurrently; testing device type; kit type; demographics, symptoms or geographic data of at least one person that at least one sample is associated with, demographic data related to samples previously tested in the testing facility, or demographic data related to previously tested samples collected from subjects in an environment of a subject that at least one sample is associated with. Within the method, higher predicted prevalence optionally causes association of fewer sample identifiers with at least one pool identifier. Within the method, generating the association optionally comprises determining at least a predetermined percentage of the samples to be tested individually. Within the method, generating the association optionally comprises dividing the samples into at least two groups wherein a different association algorithm or a different decoding algorithm is applied for each group. Within the method, generating the association optionally comprises determining on each iteration a corresponding pool for each sample, using a greedy algorithm. Within the method, the greedy algorithm is optionally activated with a cost function to be minimized, the cost function being a probability of the sample being labelled as false positive. Within the method, calculating for a sample identifier a probability that a sample corresponding to the sample identifier possesses the attribute optionally comprises: performing a random walk over a solution space comprising possible solutions indicating samples assumed to possess the attribute, including calculating a maximal likelihood for each visited point in the solution space; and assigning a probability that a sample possesses the attribute based on a proportion between a number of vertices visited in the random walk in which the sample possesses the attribute and a number of vertices in the random walk. Within the method, assigning the probability is optionally based on a message passing algorithm.

[0008] Another exemplary embodiment of the disclosed subject matter is a computerized apparatus having a processor, the processor being adapted to perform the steps of: obtaining a number of samples to be tested for possessing an attribute, wherein each sample is associated with a sample identifier, a number of pools of tests to be created, each pool associated with a pool identifier; determining a combinatorial testing scheme, comprising generating an association of each sample identifier with one or more pool identifiers, wherein: each of one or more sample identifiers are associated with two or more pool identifiers, and one or more pool identifiers are associated with two or more sample identifiers; automatically providing the association to a testing facility; receiving from the testing facility a plurality of test results, each test result indicating for a pool identifier a degree to which a pool corresponding to the pool identifier possesses the attribute; and based on the association and on the test results, determining for each sample identifier a probability that the sample corresponding to the sample identifier possesses the attribute.

[0009] Yet another exemplary embodiment of the disclosed subject matter is a computer program product comprising a computer readable storage medium retaining program instructions, which program instructions when read by a processor, cause the processor to perform a method comprising: obtaining a number of samples to be tested for possessing an attribute, wherein each sample is associated with a sample identifier, a number of pools of tests to be created, each pool associated with a pool identifier; determining a combinatorial testing scheme, comprising generating an association of each sample identifier with one or more pool identifiers, wherein: each of one or more sample identifiers are associated with two or more pool identifiers, and one or more pool identifiers are associated with two or more sample identifiers; automatically providing the association to a testing facility; receiving from the testing facility a plurality of test results, each test result indicating for a pool identifier a degree to which a pool corresponding to the pool identifier possesses the attribute; and based on the association and on the test results, determining for each sample identifier a probability that the sample corresponding to the sample identifier possesses the attribute.

THE BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

[0010] The present disclosed subject matter will be understood and appreciated more fully from the following detailed description taken in conjunction with the drawings in which corresponding or like numerals or characters indicate corresponding or like components. Unless indicated otherwise, the drawings provide exemplary embodiments or aspects of the disclosure and do not limit the scope of the disclosure. In the drawings:

[0011] Figs. 1A shows an illustration of an individual scheme;

[0012] Figs. IB shows an illustration of a group testing scheme;

[0013] Fig. 2 shows an illustration of a combinatorial group testing scheme, in accordance with some exemplary embodiments of the disclosed subject matter;

[0014] Fig. 3A shows a block diagram of an apparatus for combinatorial group testing scheme, in accordance with some exemplary embodiments of the disclosed subject matter;

[0015] Fig. 3B shows a schematic flowchart of the testing procedure, in accordance with some exemplary embodiments of the disclosed subject matter;

[0016] Fig. 4 shows a flowchart diagram of a method for combinatorial group testing scheme, in accordance with some exemplary embodiments of the disclosed subject matter;

[0017] Fig. 5 shows a flowchart diagram of a method for creating sample-test association, in accordance with some exemplary embodiments of the disclosed subject matter; and

[0018] Fig. 6 shows a flowchart diagram of a method for decoding continuous test results, in accordance with some exemplary embodiments of the disclosed subject matter.

DETAILED DESCRIPTION

[0019] One technical problem dealt with by the disclosed subject matter is to provide a method and apparatus for efficient testing of a plurality of items, also referred to as samples, and determining whether and which ones are faulty. The testing process is required to be more efficient than testing each item individually.

[0020] Another technical problem dealt with by the disclosure is that the testing should be non-adaptive, such that once planned, the testing scheme may be performed uninterrupted, and once all results are available, it may be determined for each item whether it is faulty or fine, or in other words positive or negative, with any required certainty degree, such that the need to retest some of the undetermined samples is significantly reduced or eliminated. The solution should also be useful for quantitative tests, in which the test result also indicates the “level” of faultiness, and wherein it is required to provide an estimate to the faultiness level of each positive sample.

[0021] Yet another technical problem dealt with by the disclosure is improvement over current methods of the entire testing procedure, such that the process as a whole, from the receipt of the items at the testing facility until results are available, is performed efficiently, and provides results in an efficient manner.

[0022] One technical solution comprises a combinatorial testing scheme, in which a plurality of items are tested in each single test. However, unlike widely used methods of test grouping, the solution provides for one or more of the items to participate in multiple tests, in different combinations with other items. The sample -test association may be calculated based on factors which may include the number of items or samples to be tested and the maximal number of tests to be performed. The combination may also depend on additional parameters, such as but not limited to any one or more of the following: the number of items that can be tested together in a single test, for example the number of light bulbs that can be connected serially to one another and to a power supply or a maximal number of samples that can be mixed in a tube; prevalence, e.g., the expected percent of faulty items in the population of tested items; knowledge about the probability of each individual item to be faulty; the required results for one or more of the samples, for example when the underlying test checks for several pathogens, but not all pathogens results are required for all samples; a maximal number of tests in which each item can participate, which may depend on additional parameters such as the number of pools in a testing device, the volume of each sample, the minimal volume that needs to be put in a pool (also taking into account the number of repeated tests that may be required per sample), or the like; false positive error result tolerance; false negative error result tolerance, or the like; physical properties of the testing equipment that can improve the time and accuracy of the testing procedure, or the like.

[0023] Further data that may he received and incorporated into the calculation may depend on the specific testing procedure within the testing facility, such as the test duration, the number of tests that can be performed concurrently and their division to equipment. For example, a laboratory' may have two machines, one capable of testing 100 tubes at a time wherein testing takes one hour, and the other capable of testing 200 tubes at a time wherein testing takes two hours.

[0024] Yet further data may include general historical data, including for example historical data of prevalence rate per the testing facility or the geographic area over time, age distribution in the population, other demographic data, season, location, symptoms, or the like.

[0025] The solution provides a calculation that associates each sample with one or more tests in which it needs to participate. The association may be represented as a matrix comprising N rows, each row associated with an item to be tested, and M columns wherein each column is associated with a test, wherein the (i,j) entry is 1 if the i th sample is assigned to participate in the j m test, and 0 if the i m sample is not assigned to participate in the j' h test.

[0026] In some embodiments, one or more samples may be excluded from the pooling and tested individually . The number or percentage of the tests to be individually tested, and the specific samples may be determined in accordance with the historical data. The sample-test association may then be determined for the rest of the samples.

[0027] The solution further provides a decoding calculation, for receiving a result vector V of length M, wherein the j 4 entry indicates whether the j 4 test was positive or negative, and outputting a probability for each sample to be positi ve.

[0028] Depending on the test type, the result vector j may be comprised of binary values, or numeric values. In the quantitative case, the j* entry may be 0 if the j 4 test was functional or healthy , or a real number indicating the degree of faultiness of the j rli test if the test was faulty. The decoding may output a binary ' · result or an estimated degree of faultiness of each sample. The decoding may further output a probability, certainty or confidence degree, indicating the confidence of the decoder in the result. [0029] Although the results may he provided as is to a user, they may be further processed by rule-based algorithms which may or may not he integrated with the other algorithms. For example, if the results are not binary but rather continuous (to the available resolution), the confidence degree may be compared to a threshold, such that only the samples for which the result is positive in the binary case, and exceeds a first threshold in the continuous case, and the confidence exceeds a second threshold are considered faulty and reported to the user. In further embodiments, samples within a predefined result range, or which are associated with another uncertainty factor may he declared as “undetermined” and recommended for retesting. Further division may be made among the tests declared “undetermined”, between those with a lower faultiness probability than can be retested in a group testing with a next batch, and those with a higher faultiness probability which should be tested individually.

[0030] The solution may further provide for automatically interfacing with external information sources for automatically retrieving updated data, such as the faultiness prevalence which may change drastically over time, for example in epidemic situations and local outbursts. The data may also include individual assessment of the probability of an item to be faulty, a correlation between the events of different items being faulty, or the like.

[0031] The solution may further provide for automatically interfacing with systems, such as a laboratory information management system, for obtaining information related to the testing procedure, for example the test duration, the prevalence rate of the samples, the number of tests than can be performed concurrently or other related data figures affecting the analysis in the sample-test association stage and in the decoding stage, such as the number of testing machines and the number of tubes each machine can hold, or the like.

[0032] Further interface with the testing facility may enable the automatic retrieval of results, such as but not limited to undetermined sample results, such that the information can be used in future test planning and execution.

[0033] The solution may also automatically provide the sample-test association to the laboratory equipment, and may automatically receive the test results from the testing equipment, e.g., whether each tube w'as positive or negative, or a specific value such as gene C t value. [0034] The solution may further provide for interfacing with a system that performs the actual preparation of the combined tubes, e.g., a lab automated workbench, also referred to as a “robot”, “actuator, “actuator system”, "automated liquid handier", or the like, A system in accordance with the solution may thus communicate with the robot and provide the association matrix, for example by providing a worklist that a script on the robot can read and implement, such that the robot can transfer a predetermined amount from each sample into the relevant tubes based on the test-sample association, and perform the testing. The testing results may subsequently he provided automatically to a system in accordance with the disclosure, for decoding, wherein the decoding is based on said association.

[0035] All interfaces may be performed using any agreeable protocol, thus making the solution suitable for any hardware and software used for testing.

[0036] The system may also use the inherent redundancy of combinatorial group testing to detect problems in the testing equipment. The information about such problems may be used for calibrating the decoding algorithm in real time, further information for calibrating the equipment and/or the decoding algorithm may be obtained from the limit of detection (LOD) of each specific testing kit. The LOD data may be used specifically for samples with low target load. Additionally or alternatively, the information may be notified to a user such as an operator of the equipment.

[0037] The system may provide for multiple simultaneous testing. In some situations the tested pathogens may be unrelated, for example when testing for multiple sexually transmitted diseases (STDs). In other situations the targets may be correlated, for example when testing for SARS-COV-2, and in further situations some tests may be related while others are not, for example when for COVID-19 with a separate target representing the existence of a specific variant.

[0038] One technical effect of the disclosure provides for efficient planning of the tests, which may significantly reduce the number of performed tests, in particular in situations where the expected prevalence of the tested attribute is relatively low. This is particularly useful in preventing situations in which the laboratories reach their top capacity and are unable to perform all required tests, for example in epidemic or pandemic situations and local outbursts of a disease. [0039] Another technical effect of the disclosure is the non-adaptiveness of the solution, and the solution providing for single-step testing, such that all test planning may be performed in advance, without the need to plan further tests based on the received results, wherein these tests need to be performed on a second or further stages. The single stage has further importance in lengthy tests, as it may reduce or eliminate the need for a second testing stage.

[0040] Yet another technical effect of the disclosure is the significant reduction in planning and execution time, human labor and equipment saved by the efficient process, relative to those required when performing individual tests or even traditional group testing, which necessitates second and possibly further testing stages. This may be directly translated to significant time and cost reduction in the testing procedure, and increase the laboratory efficiency and thus its throughput, capacity and revenue.

[0041] Yet another technical effect of the disclosure is the interfacing with the laboratory equipment, for complete automation of the testing procedure, including automatically planning the tests, preparing the test tubes, performing the actual testing procedure, receiving the results and decoding the results for determining whether and which samples are positive. The automation, in addition to the advantages mentioned above may also reduce vulnerability to human errors, especially in intensive periods, such as epidemic periods.

[0042] Yet another technical effect relates to the possibility of using redundancy in testing for detecting problems in the testing equipment. For example, inconsistent results when testing the same sample or sample combinations may point at a problem with the testing equipment.

[0043] It will be appreciated that although the disclosure focuses on laboratory tests such as testing liquid samples, it is equally applicable, mutatis mutandis , to any other type of items which can be tested using group testing.

[0044] Referring now to Fig. 1A, demonstrating an individual testing scheme. Given eight samples, 100, 104, 108, 112, 116, 120, 124 and 128, wherein only sample 120 is faulty, e.g. positive for an attribute. However, this is a priori unknown, and it needs to be checked whether and which of the samples are faulty. [0045] In the naive individual approach, each sample is tested individually, thus indeed yielding the result that sample 120 is faulty, and none of the other samples is faulty. This testing, however, requires a test for each item, i.e., eight individual tests in the case depicted in Fig. 1A.

[0046] Referring now to Fig. IB, demonstrating a traditional group testing scheme, also referred to as pooling. In this approach, the samples are divided into groups, for example by mixing parts of four samples into a tube. This scheme requires a test for each group as existing within a tube, and a subsequent test for each item within each faulty group. Thus, on a first stage, parts of samples 100, 104, 108 and 112 are mixed and tested in tube 132, and parts of samples 116, 120, 124 and 128 are mixed and tested in tube 136.

[0047] Since all samples 100, 104, 108 and 112 are negative, so is tube 132, such that samples 100,104, 108 and 112 are declared negative and no further tests are required regarding these samples.

[0048] The result of tube 136, however, is positive, therefore it is surmised that at least one of samples 116, 120, 124 and 128 is positive. Therefore, on stage 2 all four samples 116, 120, 124 and 128 need to be checked, thus yielding a total of two tests for stage 1 and four tests for stage 2, altogether six tests.

[0049] Referring now to Fig. 2, demonstrating an example of combinatorial group testing, in accordance with some embodiments of the disclosure.

[0050] In this approach, specimen from the contents of one or more of the samples are distributed between a plurality of testing tubes, and one or more test tubes receive parts from multiple samples. For example, sample 100 is distributed among tubes 200 and 204, sample 112 is distributed among tubes 204 and 216, and so on. It will be appreciated that some of the contents of a sample may be left and not mixed with other samples, for purposes such as retesting. It will also be appreciated that the number of tubes to which each sample is distributed is not limited to two, but can be larger or significantly larger. Moreover, not all samples need to be distributed to the same number of tubes, rather different samples may be distributed to different number of tubes. Furthermore, different tubes can contain specimen from a different number of samples. [0051] Given this scheme with samples 100, 104, 108, 112, 116, 120, 124 and 128 and test tubes 200, 204, 208, 212 and 216, if the only faulty sample is sample 120, the only test tubes that will have a positive result are 212 and 216.

[0052] Given this result vector {0, 0, 0, 1, 1], it is seen that the only possibility is that sample 120 is the only faulty sample. Each of the other samples is also mixed within at least one test tube other than 212 and 216, thus if it were faulty these tubes would have been positive as well.

[0053] The combinatorial group testing thus yields accurate results in this example with five tests at the first stage and no test at a second stage, yielding a total of five tests as compared to eight tests in individual testing and six tests in traditional group testing. Moreover, the tests may be performed concurrently, thus reducing the overall time until all results are available, which may be critical.

[0054] It will be appreciated that although the example of Fig. 2 is simplistic, the same principle may be applied, as detailed below, to real-life situations, in which hundreds, thousands or more samples are to be tested. For example, 100 samples may be determined by performing about 20 tests, thus saving time, money and labor, and increasing the laboratory capacity and throughput.

[0055] Referring now to Fig. 3A, showing a block diagram of a system for combinatorial group testing, in accordance with some exemplary embodiments of the subject matter.

[0056] The system may comprise one or more computing platforms 300, which may be for example one computing platform, multiple interconnected computing platforms, or the like. The system may be implemented as a stand-alone system, as a web service providing services to one or more clients, or the like. The system may be on-premise, for example located at a laboratory or medical center, remotely located, implemented as a hosted cloud solution, a combination thereof, or the like.

[0057] In some exemplary embodiments of the disclosed subject matter, computing platform 300 can comprise processor 304. Processor 304 may be any one or more processors such as a Central Processing Unit (CPU), a microprocessor, an electronic circuit, an Integrated Circuit (IC) or the like. Processor 304 may be utilized to perform computations required by the apparatus or any of its subcomponents. [0058] In some exemplary embodiments of the disclosed subject matter, computing platform 300 can comprise an Input/Output (I/O) device 308 such as a display, a pointing device, a keyboard, a touch screen, a microphone, a speaker, or the like. I/O device 308 can be utilized to receive input from a user, such as laboratory settings, number of samples, prevalence rate, or the like, and to provide output to a user, such as presenting results of a tested item, such as showing a probability that one or more samples are positive.

[0059] In some exemplary embodiments of the disclosed subject matter, computing platform 300 can comprise a communication device 312, for communicating with other computing platforms for obtaining data from another processor or computer, providing data to a laboratory information management system, providing commands to testing equipment such as a robot, or the like.

[0060] Computing platform 300 may comprise a storage device 316. Storage device 316 may be a hard disk drive, a Flash disk, a Random Access Memory (RAM), a memory chip, a cloud server, or the like. In some exemplary embodiments, storage device 316 can retain program code operative to cause processor 304 to perform acts associated with any of the subcomponents of computing platform 300.

[0061] Storage device 316 can store the modules detailed below. The modules may be arranged as one or more executable files, dynamic libraries, static libraries, methods, functions, services, or the like, programmed in any programming language and under any computing environment.

[0062] Storage device 316 may store sorting module 318, for determining based on historical data such as historical prevalence rate of the laboratory or of the population, symptoms, or the like, one or more samples to be tested individually, or otherwise receive a specialized treatment. These samples will thus not be part of the calculations below, although they may appear in the matrix, such that their results will be decoded similarly to all other samples.

[0063] Storage device 316 may store an association generation module 320, which may be configured to receive: a number of samples or a list of sample identifiers, a number of tests or a list of test identifiers, and possibly additional parameters, such as prevalence rate, probability for each individual sample to be faulty due for example to risk factors, threshold of prevalence rate results, a kit or device type, maximal number of samples to be mixed within each test, maximal number of concurrent tests, or the like. Additional data which can be received and handled by association generation module 320 may include historical data, including for example historical data related to prevalence rate per geographic area, age distribution in the population, other demographic data, or the like. Further data may include season, location, or the like. Further data may relate to whether and which symptoms are experienced by any one or more of the subjects associated with the samples. For example, a sample associated with a subject experiencing significant symptoms may be more likely to be tested individually, otherwise they may make multiple pools positive, which may require further testing.

[0064] Association generation module 320 may be further configured to generate an association of each sample identifier with at least one test. At least one sample, typically most samples, often all samples may be associated with at least two tests. It will be appreciated that each value may affect the generated association, thus each set of the parameters may cause the generation of a different association. In some embodiments, association generation module 320 may comprise two or more engines or modules for creating the association, for example one engine for generating an association for the binary test case and another for the continuous test case. In further embodiments, two or more alternative engines, for example two engines related to the continuous test case may exist, wherein the engine to be activated may be selected upon the input parameters. In some embodiments, one engine may output two or more associations, one of which may be selected in accordance with the current data, or with historical data as detailed above. In some embodiments, the association, or matrix, to be selected if more than one algorithm has been activated, or if at least one of the algorithms output two matrices, may be selected in accordance with the predicted prevalence rate which may be influenced by the historical data. In further embodiments, the selected matrix may be enhanced in accordance with the predicted prevalence rate. The association generation is further detailed in Fig. 5 below. In some cases the samples to be pooled may be divided into two or more groups, wherein for each group different pooling and/or decoding algorithms are used. When the actual results are available, the performance of the different algorithms may be compared.

[0065] Storage device 316 may store a binary decoding module 324 and/or a continuous decoding module 328, for receiving a vector of results of the tests, and based on the vector and the association determined by association generation module 320, provide a probability for each test sample to be positive or faulty, and optionally a certainty degree. Binary decoding module 324 may be activated when the test result is binary, i.e., the result of each test is positive or negative, in which case the algorithm computes the probability of each sample to be positive by maximizing the expectation for each sample independently, which converges to unknown parameters iteratively, wherein the binary algorithm is an implementation thereof, and may be referred to as a binary algorithm- a kind of expectation maximization for each sample individually.

[0066] Continuous decoding module 328 may be activated when the test result is continuous (to the available resolution) or may otherwise assume real values, and may provide an estimation of the degree of positiveness of each sample. As detailed below, decoding may relate to multiple results, such as results related to multiple genes.

[0067] In some embodiments, binary decoding module 324 and a continuous decoding module 328 may be used in conjunction with one another. For example, for a certain range, either predetermined or dynamically based on parameters inserted for the specific association, the binary decoder may be used, while for other ranges the continuous decoder may be used. In further situations both decoders may be used, for example on step 424 detailed below.

[0068] Storage device 316 may also store a rule-based algorithm application module 322 for processing initial data related to the sample, for example for sorting the samples in accordance with historical data, for processing the test results or the decoder output, or otherwise using rule-based algorithm in any stage of the process.

[0069] Storage device 316 may store testing environment communication module 332 for communicating with the testing environment, for example receiving through an Application Program Interface (API) the testing parameters, such as number of testing devices, the number of tests that each such device can perform, the test duration, or the like. Testing environment communication module 332 may also be configured for providing the sample-test association to the testing environment and to receive the test results from the testing environment. Testing environment communication module 332 may also be configured to output the computed probabilities and/or any other results of decoding of the samples, to the information management system. [0070] In some embodiments, for example where the testing is of diagnostics or biological samples such as testing of viral loads in a laboratory, testing environment communication module 332 may interface a Laboratory Information Management System (LIMS or LIS), which is a software-based solution with features that support the laboratory's operations. Key features of LIMS include, but are not limited to sample management and tracking, instrument and application integration, workflow and data tracking support, and data exchange interfaces, which support its use in regulated environments.

[0071] In some situations, LIMS may be used as an enterprise resource planning tool that manages multiple aspects of the laboratory informatics. In some embodiments, testing environment communication module 332 may connect to the IT or data information systems of an HMO or medical center facility. The connection may be direct, via the LIMS or via other means such as a separate drive that can be accessed by one or more of the above.

[0072] Storage device 316 may store testing equipment interface 336, for initiating the testing procedure, for example providing specific instructions to a testing robot for which samples to place and combine into which test tubes through a script and / or a worklist as an example. Testing equipment interface 336 may also provide instructions to the testing robot to perform the testing.

[0073] Storage device 316 may store user interface 340 for receiving instructions or parameters from a user, uploading files such as lists and IDs of samples, results of pools, downloading files such as worklists of association of tests to samples, results of samples following decoding, and providing results to the user, for example outputting the decoding results and the computed probability of each sample to be faulty, displaying the identifiers of the samples which are most likely to be faulty, or the like. However, it will be appreciated that a system can operate with communication to external systems, such as testing equipment, databases, etc. without a user interface.

[0074] Referring now to Fig. 3B, showing a schematic flowchart of the testing procedure, in accordance with some embodiments of the disclosure.

[0075] The flowchart comprises some important components and steps of the testing procedure, including testing environment 350 and analysis system 374. Testing environment 350 may be a laboratory including the testing devices, an actuator system, a system for determining the results and providing the results to a computing system or the like.

[0076] On step 354 the test environment may be prepared, for example preparing a clean room, preparing the available devices, inactivation and loading of the samples, preparation of tubes and kits, or the like.

[0077] On step 358 the testing environment may obtain the samples and an identifier for each sample, such as a barcode.

[0078] On step 362 the pools may be prepared, comprising testing environment 350 communicating (368) to association generation module 378 of analysis system 374 the sample IDs, as well as any potential additional parameters, such as characteristics of the testing equipment, volume, calculated expected prevalence rate, or the like. Analysis system 374 may also obtain additional data from other sources. Association generation module 378 may generate the sample-pool association, and may provide (372) the same to testing environment 350. The sample-pool association may be provided in the form of a matrix, a worklist, or the like.

[0079] Testing environment 350 may prepare the pools according to the received association and may perform the testing on step 366. In some embodiment, a second stage testing may be required. In some situations, the second stage may also involve pooling, in which case execution may return to pool preparation 332, which may again communicate with association generation module 378. In other situations, the second stage may involve individual testing, without pooling.

[0080] Once testing is done, testing results 386 may be provided to analysis system 374.

[0081] Analysis system 374 may then provide the sample-pool association and testing results 386 to one or more decoders, to obtain a numeric result for continuous data (such as a C t value), a ruling or result, and/or a probability for each sample to be faulty, and/or a certainty level of any of the above. One or more decoders, or another module in communication with the decoders may also use a rule-based algorithm for enhancing the testing results prior to, during or after the decoding. It will be appreciated that different rules may be applied by the rule-based algorithm. It will be appreciated that multiple decoders may exist as detailed above. [0082] Analysis system 374 may then provide the results, such as C t values, probabilities or ruling or combination thereof as a file, or to a system such as a drive or a LIMS system 394, for example through a driver, file or interface 390.

[0083] Referring now to Fig. 4 showing a flowchart diagram of a method for combinatorial group testing, in accordance with some exemplary embodiments of the disclosed subject matter. The method may be performed by computing platform 300 of Fig. 3A above, and specifically by processor 304.

[0084] On step 400, data related to the samples and testing and required for creating the sample-test association, such as sample IDs, testing tubes IDs, expected or current prevalence rate in the samples being tested, maximal number of specimens to be mixed within each tube, or the like, may be received. For example, the data may include obtaining a number of samples to be tested, wherein each sample is associated with a sample identifier, and a number of pools of tests to be created, each pool associated with a pool identifier.

[0085] On step 404, a sorting step may be performed, in which certain samples may be selected for individual testing, The percentage of the samples, or the specific samples to be tested individually, may be predetermined, or determined upon the data, for example historical data such as prevalence rate in the laboratory over time, symptoms or demographic data of the subjects of the samples, historical test results of the testing facility, symptoms of the samples or the like. The

[0086] On step 406, a combinatorial sample-test association may be created upon the sample data and lab data. The sample-test association may also be based on historic data. The sample-test association may be represented as a matrix. Within the association, at least one sample identifier is associated with at least two pool identifiers, and at least one pool identifier is associated with at least two sample identifiers.

[0087] Referring now to Fig. 5, showing a flowchart of steps in a method for determining the sample-test association.

[0088] The method focuses on a model in which the samples may be mathematically represented as independent and identically distributed (IID) Bernoulli random variables with a received prevalence. The model may be assumed to be a binary model in which the result of each pool is either positive or negative. A pool is positive if and only if at least one sample in it is positive (excluding cases of possible contamination, as disclosed below), and a sample is assumed to be positive if and only if all pools it participates in are positive. It will be appreciated that the model can be enhanced to accommodate false negative or false positive results given by one or more pools.

[0089] On step 500, input parameters related to the testing parameters, procedures and materials may be received such as but not limited to the following: the number of samples, also referred to as items, or a list of sample identifiers; the number of tests, also referred to as pools, or test identifiers; optionally the dilution level e.g. the maximal number of samples that can participate in each test; the repetition number, e.g., the maximal number of pools that each individual sample can be associated with; and the prevalence, e.g. the percentage of the samples that are expected to possess the attribute, e.g., declared as positive. The testing scheme may be determined based upon these parameters. For example, the repetition may be calculated by dividing the sample volume, if given, by a minimum amount of a sample that can be used for testing. The dilution default may be calculated by dividing the test tube volume, if given, by the minimal amount of a sample that can be used for testing. The parameters may be received automatically through a testing environment communication module 332, manually from a user, read from a file, or the like.

[0090] Additional data that may be received, and may influence the testing scheme may relate to historical, demographic and geographic information related to the testing facility or the samples, previous samples with the same demographics or geographical source or other similar attributes, samples tested in the lab, the faulty percentage of a manufacturing line, or the like. In some situations, at least some of the samples may have individual probabilities to be positive rather than a common prevalence.

[0091] In some embodiments the predicted prevalence rate, also referred to as positive attribute rate, may be obtained from an external source. However, in other embodiments the prevalence rate may be predicted based on previous results for similar populations as detailed above. However, individual prior probabilities, based for example on personal data, may also be used to enhance the prevalence prediction. If no values are provided to the optional parameters, default values may be calculated or assumed for the population. The data and the prevalence rate may be used, for example, for sorting some of the samples for individual testing, selecting an algorithm for creating the sample-test correspondence or selecting a particular matrix for the sample-test correspondence.

[0092] On step 502, a combinatorial test scheme may be obtained, comprising a sample- pool association, in which at least one sample identifier is associated with multiple pool identifiers, and at least one pool identifier is associated with multiple sample identifiers. [0093] In some embodiments, the method may construct a data structure of a bipartite graph representing the matrix. A bipartite graph is a graph whose vertices can be divided into two disjoint and independent sets U and V such that every edge connects a vertex in U to a vertex in V. U and V vertex sets may be referred to as the parts of the graph. In an equivalent definition, a bipartite graph is a graph that does not contain any odd- length cycles.

[0094] On step 504 the method may create an initial bipartite graph, comprising a first set of vertices each associated with a sample, and second set of vertices each associated with a pool, and no edges.

[0095] The steps detailed below describe an exemplary implementation of a greedy algorithm for generating the association.

[0096] On step 506, a new cycle may start for iterating through the samples.

[0097] On step 508, a sample ID from all samples, which has not been handled in the current cycle, may be selected. An edge may be created, wherein one end of which is the vertex associated with the sample.

[0098] On step 512, a pool may be selected for the sample, represented by determining the other end of the edge. The pool may be determined in such a way as to minimize the probability of false positive, i.e., reducing the probability of a negative sample wherein all pools it participates in are positive, for example by increasing the probability of each other sample not to meet the same sample again in another pool. In a non-limiting example, a greedy algorithm may be used for which the cost function to be minimized is the probability of the sample being labelled false positive, in the binary model. Formally presented, the cost function may be the probability that the suggested new pool is positive, given that the sample is negative and that existing pools of the sample are positive. Minimizing the cost implies that it is desired that if all existing pools are positive (due to other samples), the new pool will be negative, thus saving a potential false positive. [0099] Thus, assuming that sample s is already in pools rΐ,,,,,rh, it is required to select an additional pool pn+i, which minimizes C=Pr(P(Pn+l)j-iP(s)AP(pl)AP(p2).,,AP(pn)) wherein P(x) indicates that the sample x or pool x is positive. A negative pool indicates that the sample is negative, therefore if the existing pools are all positive, a new pool is required to help determination, by being negative. The formula thus provides that if a sample is negative and all existing pools are positive (because of other positive samples in them), then a new pool is selected which has a minimal probability to be positive, since it is desirable that at least one pool is negative if the sample is negative, to avoid the additional retesting.

[0100] On step 516 it may be determined whether there is another sample in the current cycle. If yes, execution returns to step 508 to continue with the next sample and generate another edge having one end at the vertex associated with the selected sample and its other end in a pool selected for the sample, thus until these steps are repeated through all samples.

[0101] If no further samples are available in the current cycle, on step 520 it may be determined whether an additional cycle is required If yes, execution returns to step 504 to start the next cycle. The method thus first determines a first pool for each sample, then a second pool for each sample, and so on. Further cycles may be performed until all test tubes have the required number of samples and each sample is represented in the required number of test tubes.

[0102] It will be appreciated that if it is required to assure constant dilution of the pools, i.e., the same number of samples being mixed within each pool, then the last cycle may stop before all samples have been considered.

[0103] Otherwise, on step 524 the method may finish.

[0104] In some embodiments, two or more algorithms may be used concurrently or sequentially for generating two matrices. For example, the algorithms may behave differently and output different matrices on any of steps 504, 508, 512 or others. The redundancy may be useful, for example, when handling the behavior of a pathogen in different ranges of target load, or another different behavior in a different area of values of the tested pathogen. In such cases the more appropriate matrix may be selected, for example in accordance with the estimated carrier percentage in the population. The matrix selection may be performed automatically or by a user.

[0105] In some cases, one matrix may be selected over the other in accordance with the historic data of the results of tests performed in the lab, historic data of demographic or other data associated with the environment or region with which at least some of the subjects are associated such as prevalence rate, or with specific data related to one or more patients, such as whether and which symptoms the subject is suffering from. In some embodiments, if an algorithm executed under a set of constraints related for example to volumes, capacity, prevalence rate, the number of allowed repetitions, testing sensitivity, dilution level, efficiency requirements or limitations, or the like, provides a matrix which is assessed to provide results with insufficient certainty, the same algorithm or another one may be executed again with different requirements, to produce a different matrix.

[0106] In some cases, a certain percentage of the samples, for example 10% or 15% of the samples may be selected not to be pooled with other samples, and tested individually, such that their decoding is straightforward. However, they may still be present in the matrix, as they still need to be handled by the lab equipment and machinery. In some embodiments, the percentage of the samples to be tested independently and the specific samples to be tested individually may be determined in accordance with a prediction of the positive rate, based on historical test data related to the environment or demography of the population from which the samples are taken, historical data of the prevalence rate of having the attribute/ of being faulty in the lab, symptoms, or the like.

[0107] In some situations, for example if the prevalence rate is extremely high, for example exceeds a predetermined threshold, yet another algorithm may be used, for example an algorithm based on negative results, i.e., on the probability not to be faulty, or probability of not having the tested attribute.

[0108] It will be appreciated that the algorithm above is exemplary only, and that other algorithms may be used for computing the association.

[0109] It will be appreciated that changing the percentage of the samples to be tested individually may change the matrix generated for the pooled samples.

[0110] Referring now back to Fig. 4, on step 408 the testing procedure may be adapted in accordance with the available testing equipment and operation manner of the testing facility. For example, the method may receive the parameters, procedures and specifications of each of the testing devices, the number of testing tubes each device in the process can handle, the pooling equipment, the testing duration each machine needs for testing, the working days and hours of the laboratory, the times and pace in which samples arrive at the lab, the existing prevalence rates of samples and sample groups arriving at the lab and their origins including geographic and demographic areas they arrive from, the expected prevalence rates of samples, or the like, the determined tests can be assigned to specific machines at specific testing times from specific origin locations and expected prevalence rates. As detailed below, the expected prevalence may be received or calculated.

[0111] It will be appreciated that the pooling time may be enhanced based on knowledge of the pooling equipment, for example by adjusting the algorithm that distributes the samples from the testing tubes into the pools in the testing scheme, thereby optimizing for time efficiency.

[0112] The method may adapt or improve the settings of the current session, and the throughput and operation costs of the testing facility on a more global or general level. The determined test plan can then be provided to the testing facility.

[0113] On step 412 the association results may be provided, directly or indirectly to the testing facility, for example automatically via a communication channel. The results may include the sample-test association, as well as the test plan, including which devices operate on which test, and at what time(s). The sample-test association may be output, for example, as a list of pools, each pool associated with a list of sample identifiers. In some embodiments, the association may be output in a form of a matrix, wherein each row (or column) is associated with a sample identifier and each column (or row) is associated with a test identifier, and an entry corresponding to sample i and test j is 1 if there is an edge from sample i to pool j in the bipartite graph, in other words if sample i participates in test j, e.g., at least a part of sample i is mixed (or tested individually) within test j. In some embodiments, for example, the association may be output as a worklist in a format readable by the device, for example by a script executed by the device, in which a sample is taken from a source location and set in a specific well or pool. For example, liquid- based samples may be aspirated and dispensed by an automated workbench robot. Other samples may be distributed between pools in any other appropriate manner. [0114] The testing facility may then receive the association. In some embodiments, for each pool identifier an actuator such as a robot within the testing facility may automatically create the pools, e.g., mix at least a part from each sample associated with the pool identifier, to create the pools, and subsequently test each pool to obtain the corresponding indicators.

[0115] The testing facility may then provide the test results to the system, for example output the results into a file or stream, send the result over a communication channel, or the like. In some embodiments the results may include a positive/negative result per each test tube, while in other embodiments a numerical value may be provided indicating a level of result.

[0116] On step 416, the test results of the performed tests, also referred to as indicators, may be received by computing platform 300, indicating for each pool identifier a degree to which a pool corresponding to the pool identifier possesses the required attribute e.g., is positive. The test results may be received automatically, for example via a communication channel. As disclosed above, a pool is positive if and only if at least one sample in it is positive (excluding cases of possible contamination, originating from contamination by PCR amplicons, contamination of reagents, sample crosscontamination, cross-reactions with other microorganisms or genetic material, or the like). In ideal situations, a sample is thought to be positive if all pools it participates in are positive. However, in some sensitive cases, such as in the biology field, and specifically in some molecular tests such as PCR, above a certain threshold, where there is non-linear behavior of the detected targets, there exist situations in which only some of the pools containing a positive sample may provide a positive result while one or more others may provide a negative result. This effect may be caused by some pools in which the required attribute does not show up, for example in biological sensitive tests such as PCR which has a non-linear part above a certain value, such that some of the pool tests may show the attribute and some do not. Another situation in which this effect may occur is when there is a false positive pool caused by contamination, reagents, faulty extraction, weak sample (i.e., a sample that causes only some of the pools to be positive), PCR problem, Ct value degradation, or others. Such a situation can be handled by a rule -based algorithms activating predetermined rules or by another algorithm. [0117] On step 418, a rule-based algorithm may be applied to enhance the comprehensiveness of the results prior to decoding. For example, in cases where the test results comprise a plurality of continuous results, such as where a plurality of genes are tested, results from one gene may be inferred to other genes, based on certain calculations such as correlation between genes or other assumptions based on prior simulations or statistics. In some embodiments, any computation that may help reduce the false positive or false negative rate may be employed.

[0118] On step 420, the results may be processed, for example decoded, and a result and optionally a certainty level may be determined for each sample, based on the association and the test results as received.

[0119] In the binary case, the decoding may be based on a Bayesian model, with IID Bernoulli random variables for the samples, and with the evidence being the set of positive pools. Similar to the example of Fig. 2 above, if a pool is negative, all samples participating therein may be declared negative with certainty, e.g., zero probability of being positive. However, a sample for which every pool it participates in is positive, may be assigned a probability lower than 1, if there is an alternative scenario, for example a combination of other positive samples, that explains all the positive test results. In some of these cases, it may not be possible to determine with sufficient certainty which sample has the attribute, therefore multiple samples may receive a ruling of “undetermined”, and may be candidates for a second stage testing, performed with or without pooling.

[0120] In some embodiments, the probabilities may be calculated directly, for example using an exhaustive search, also referred to as a brute-force manner. However, other methods, including local or global optimizations may be used in order to shorten the decoding process.

[0121] In a Quantitative Reverse Transcription Polymerase Chain Reaction (qRT-PCR)) assay, a positive reaction is detected by accumulation of a fluorescent signal. The cycle threshold (C t ) is defined as the number of cycles required for the fluorescent signal to cross a threshold, i.e., exceed a background level. C t levels are thus inversely proportional to the amount of target in the sample, i.e., the lower the C t level, the greater the number of target molecules in the sample. [0122] In the non-binary, e.g., continuous case, a real numeric value is received for each pool, representing for instance the C t value obtained by testing the pool with qRT-PCR. Also available is the sample-pool association as determined on step 404. It is required to determine a first real-valued vector in which each entry indicates a probability of a corresponding sample to be positive, and a second real-valued vector in which each entry indicates an estimate of the C t value of the corresponding sample, if it were to undergo individual qRT-PCR. In some situations, the results may include multiple C t values, each related to a different target In cases where the results are correlated with each other, the decoding can adjust accordingly and take it into account.

[0123] The Bayesian model may involve the usage of the prevalence, i.e., the initial likelihood of a sample to be positive, and a likelihood function. The prevalence may be given or estimated in accordance with received data. Given the prevalence, all samples are assumed to have the same initial probability to be positive, unless some additional knowledge is available, for example risk factors such as historical data related to the patient, age, race, pregnancy or another high risk group or factor, and the positive C t values of the samples are independent uniform or random variables in a fixed interval, such as between 15 and 40.

[0124] Given a vector pf possible C t values for the samples, the C t resulting values for each pool, the sample-pool association and the prevalence, the likelihood function provides a probability that the resulting pool C t values are consistent with the obtained values.

[0125] The likelihood may be calculated as follows: given a set of positive samples and their C t values, the number of target molecules of each sample may be calculated as an exponent of the C t values with some fixed base. The viral load of the pools may then be calculated as a linear combination of the viral loads of the samples participating in each pool, according to the sample-pool association. A logarithm of the viral load of the pools may then be taken to calculate the expected pool C t values. An IID Gaussian noise may be added to each pool C t value, to obtain the expected C t values of the pools, as a function of the C t values of the samples. The likelihood may then be defined in accordance with the calculated expected C t values of the pools and the actual C t values as obtained, for example in accordance with the sum of squares of differences. It will be appreciated that any other statistical model may be used for the likelihood function. [0126] The solution space comprises all possible combinations of positive C t samples in accordance with the prevalence, and a point in the solution space consists of the samples suspected to be positive (also referred to as active set), and a C t value for each sample in the active set, which is restricted to be in a given fixed interval.

[0127] Given an active set of samples, the C t values for these samples may be selected to maximize the likelihood. For each such point, finding the C t values that maximal likelihood estimate is a convex optimization problem, minimizing a quadratic function on a hypercube. In some embodiments, this problem may be solved by an iterative algorithm, based on the Non-Negative Orthogonal Matching Pursuit (NNOMP). Thus, given a group of samples assumed to be positive, and the obtained test C t values, the sample C t values may be determined for each, which maximize the likelihood.

[0128] An active set refers to the set of positive samples within a possible solution. Selecting the best point of the solution space, may be performed using a “random walk” algorithm, such as a Markovian random walk. Given an arbitrary starting point in the solution space, the possible ways to move to a neighboring point are to either add or remove one sample from the active set. For each such neighboring point, the ML estimate of the C t values is computed. The posterior density of the Bayesian model, indicating the probability of the point to be correct, may be calculated for each neighboring point. The next vertex in the random walk may be chosen by a probabilistic algorithm, such as annealing-like fashion, or another probabilistic algorithm, in order to prevent problems of a local maxima, and enable the selection of a point that may initially be less promising. The probabilistic algorithm may be based on the pool C t values and a given “temperature” parameter as referred to in the annealing algorithm, where vertices with a higher posterior density have a higher probability to be chosen.

[0129] Referring now to Fig. 6, showing a flowchart of steps in an exemplary method for estimating the C t values or probabilities of the samples to be positive, and optionally certainty levels.

[0130] On step 600, a number of positive samples may be estimated, based for example on the prevalence, or on a default value. [0131] On step 604, a random point may be selected from the solution space, wherein the solution space comprises all combinations of samples assumed to be positive, wherein the number of the samples assumed to be positive is as determined on step 600.

[0132] On step 608, a random walk may be performed for advancing over the solution space, for example with a temperature that cycles between high and low in order to scan the solution space and assign continuous values with maximum likelihood. The random walk may continue until no significant change has occurred for a predetermined number of cycles, when a timeout threshold is reached, or until another stopping criteria has been met.

[0133] The random walk may be replaced by a simulated annealing iterative process, wherein the temperature cools down gradually, attempting to find the global Maximum A-Posteriori (MAP) estimate of the solution, thus yielding the second output vector, comprising the estimated C t values of the samples.

[0134] On step 612, the probability that a sample is positive may be approximated as the proportion of vertices visited in the random walk, in which the sample is positive. For example, if a sample is positive in 30 out of 100 of the points visited in the random walk of the solution space, the probability that the sample is positive may be estimated at 30%.

[0135] It will be appreciated that the decoding can also handle situations such as noise stemming from the testing kit or the device labware, or biological faults and cause false positive or false negative results in certain pools. Additional potential noise factors may exist, that could be based on the kit, on the device labware, or the like, which the decoding algorithm may take into account. These factors may be used as parameters for threshold determination of algorithms as well as their sensitivity.

[0136] It will be appreciated that the decoding can use other algorithms and is not limited to the algorithm shown above, for example in some embodiments a message passing algorithm or a binary algorithm may be used.

[0137] In some embodiments, two or more decoding algorithms may be used. The results may be combined, or one set of results may be preferred over the other. For example, the ruling of which results to output may be in accordance with acceptable false positive or false negative rates, in accordance with the higher or lower positive or negative results, or the like.

[0138] It will be appreciated that in some cases definite results cannot be determined for a specific sample from combinatorial pooling. Such samples may then be recommended for retesting, either individually or as part of another pooling. Such a case may happen if a specific sample happens to be tested in the same pools with two or more positive samples. In this case, it may not be possible to determine for the specific sample whether it is indeed faulty, therefore it may be recommended for retesting.

[0139] Where each test provides multiple results, such as multiple target results, whether related or unrelated to one another; such as when testing multiple targets or pathogens, whether they are correlated, have some sort of association or are unrelated, decoding can be executed for each of the results. Decoding for different result sets may be performed with or without synchronization with each other, based on the case.

[0140] In some cases, for example when a higher than expected percentage of the pools are positive, or when the decoding process does not converge, another algorithm may be applied, such as the Bloom filter algorithm, which may provide lesser performance but is necessary. Such algorithm may indicate, for example, that all samples participating only in negative pools are negative, and provide no indication for the other samples.

[0141] Referring now back to Fig. 4, on step 424 further processing may be performed, which receives as input the results including the number and certainty levels as output by the decoder(s) and combining a plurality of results into a single result, or generating a ruling per each sample, prior to providing a ruling or another final result to a user or to another system. For example, if the certainty associated with a positive result is above a first threshold, for example 0.9, 0.95 or 0.99 then the ruling may be positive. If the certainty is below a second threshold, for example 0.1, 0.05 or 0.01, then the ruling may be negative. For certainties between the first and the second threshold, an undetermined ruling may be output. If multiple decoding sessions are performed, for example binary and continuous decoding, or multiple decoding sessions related to different genes are performed, the processing may combine the results to provide a more comprehensive assessment for each sample. In some embodiments, the processing may comprise applying a rule-based algorithm. [0142] On step 428 a result such as the ruling for each sample, i.e., whether it is positive or negative determined during decoding and possibly during post-processing, may be provided to a user, for example displayed over a display device, stored in a file, transmitted over a network, or the like. In other embodiments, different types of results may be provided, such as a ruling.

[0143] The disclosure provides for an advanced single-step testing procedure, which optimizes the testing plan, and also the overall testing procedure, thus improving the throughput and effectiveness of a testing facility. The disclosed method provides for high detection rate, i.e., true positive detection, with low false negatives and false positives. The method also comprises a built-in error correction to allow for greater stability of the solution, which cannot be achieved using known methods. For example, if a pool has a high probability of being faulty wherein no sample can explain it, it may be assumed that the pool is contaminated, wherein contamination might occur by PCR amplicons, contamination of reagents, sample cross-contamination, and cross-reactions with other microorganisms or genetic material or others. Since there are multiple iterations of each sample in multiple pools, this mechanism provides error correction of the overall result, without having to retest these samples as is typically currently done in labs in such cases. In some cases, a sample may be assigned an undetermined status. However, unlike the traditional pooling method wherein in such cases all samples in a pool need to be retested, in accordance with the current disclosure the status is per sample, therefore only the specific undetermined samples need to be retested, whether with pooling or individually.

[0144] It will be appreciated that the method and system is applicable also to other detection methods and techniques, such as but not limited to molecular tests such as LAMP, SHERLOCK, RT-PCR or other PCR-based methods, in addition to other detection tools such as CRISPR, biosensors and Mass Spectrometry (MS).

[0145] The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.

[0146] The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non- exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.

[0147] Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.

[0148] Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, such as "C", C#, C++, Java, Phyton, Smalltalk, or others. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.

[0149] Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.

[0150] These computer readable program instructions may be provided to a processor of a general-purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.

[0151] The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. [0152] The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.

[0153] The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

[0154] The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.