METHOD AND SYSTEM FOR SAMPLE IDENTITY ASSURANCE - RADY CHILDRENS HOSPITAL RES CENTER

Title:

METHOD AND SYSTEM FOR SAMPLE IDENTITY ASSURANCE

Document Type and Number:

WIPO Patent Application WO/2020/006431

Kind Code:

Abstract:

The present disclosure provides a method for genetic analysis including allelotyping as well as a system for implementing such analysis.

Inventors:

DING YAN (US)
BATALOV SERGEY (US)

Application Number:

PCT/US2019/039859

Publication Date:

January 02, 2020

Filing Date:

June 28, 2019

Assignee:

RADY CHILDRENS HOSPITAL RES CENTER (US)

International Classes:

G16B20/00; G16B20/10; G16B20/40; G16B20/50; G16B25/00; G16B25/10; G16B25/20

Domestic Patent References:

WO2014015084A2	2014-01-23
WO2017070497A1	2017-04-27

Foreign References:

US20110230358A1	2011-09-22
US20170016075A1	2017-01-19

Other References:

GYMREK ET AL.: "lobSTR: A Short Tandem Repeat Profiler for Personal Genomes", GENOME RESEARCH, vol. 22, 20 April 2012 (2012-04-20), pages 1154 - 1162, XP055560142, DOI: 10.1101/gr.135780.111
See also references of EP 3815091A4

Attorney, Agent or Firm:

HAILE, Lisa, A. et al. (US)

Claims:

What is claimed is:

1. A method comprising:

a) determining a first allelotype for a sample via short tandem repeat (STR) amplification;

b) determining a second allelotype for the sample via genetic sequencing; and

c) determining allele concordance between the first allelotype and the second allelotype.

2. The method of claim 1, wherein (c) comprises generating an allele profiling concordance table.

3. The method of claim 1, further comprising calculating a statistical probability to determine whether the first allelotype and the second allelotype are of a single subject.

4. The method of claim 3, wherein the subject is human.

5. The method of claim 1, wherein the first allelotype is generated via GeneMapper™.

6. The method of claim 1, wherein the second allelotype is generated via lobSTR™.

7. The method of claim 1, wherein the sample is a biological sample.

8. The method of claim 1, wherein the sample is whole blood, plasma, serum, ascites, cerebrospinal fluid, sweat, urine, tears, saliva, buccal sample, cavity rinse, feces, organ rinse, hair or skin.

9. The method of claim 1, wherein the sample is blood.

10. The method of claim 1, wherein genetic sequencing comprises whole genome sequencing (WGS), rapid whole genome sequencing (rWGS), whole exome sequencing (WES), next-generation sequencing (NGS), targeted gene panel sequencing, or a combination thereof.

11. The method of claim 10, wherein WES or targeted gene panel sequencing comprises a panel having one or more oligonucleotides selected from the group consisting of SEQ ID NOs: 1-41.

12. The method of claim 11, wherein each oligonucleotide is between about 50 to 120 nucleotides in length.

13. The method of claim 11, wherein each oligonucleotide is 50 nucleotides in length or greater.

14. The method of claim 11, wherein each oligonucleotide is 120 nucleotides in length or less.

15. The method of claim 1, wherein (a) and (b) are performed in parallel.

16. A panel comprising one or more oligonucleotides selected from the group consisting of SEQ ID NOs: 1-41.

17. The panel of claim 16, wherein each oligonucleotide is between about 50 to 120 nucleotides in length.

18. The panel of claim 16, wherein each oligonucleotide is 50 nucleotides in length or greater.

19. The panel of claim 16, wherein each oligonucleotide is 120 nucleotides in length or less.

20. A genetic analysis system comprising: a) at least one processor operatively connected to a memory; b) a receiver component configured to receive DNA analysis information including sequence information generated from PCR amplification of DNA in a DNA sample; and c) an analysis component, executed by the at least one processor, configured to: i) determine an allelotype from the sequence information; ii) generate an allele profiling concordance table; and iii) calculate a statistical probability to determine whether a first allelotype and a second allelotype are from a single subject.

21. A genetic analysis system comprising: a) at least one processor operatively connected to a memory; b) a receiver component configured to receive DNA analysis information including sequence information generated from PCR amplification of DNA in a DNA sample; and c) an analysis component, executed by the at least one processor, configured to perform (a)-(c) of claim 1.

Description:

METHOD AND SYSTEM FOR SAMPLE IDENTITY ASSURANCE

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims benefit of priority under 35 U.S.C. §119(e) of U.S. Serial No. 62/692,366, filed June 29, 2018, the entire contents of which are incorporated herein by reference.

INCORPORATION OF SEQUENCE LISTING

[0002] The material in the accompanying sequence listing is hereby incorporated by reference into this application. The accompanying sequence listing text file, named RADY_lWO_Sequence_Listing.txt, was created on June 25, 2019, and is 8 kb. The file can be accessed using Microsoft Word on a computer that uses Windows OS.

BACKGROUND OF THE INVENTION

FIELD OF THE INVENTION

[0003] The invention relates generally to genetic analysis and more specifically to a method and system for allelotyping to ensure sample identity.

BACKGROUND INFORMATION

[0004] Whole Genome Sequencing (WGS), Whole Exome Sequencing (WES) and Targeted Gene Panel Sequencing using the Next Generation Sequencing (NGS) platforms are complicated processes involving multiple procedural steps. Sample swap or contamination during the process in NGS may result in false positive variant detections and genotype misclassification. The assurance of sample identity throughout the process is a critical quality control component. The process to ensure correct sample identity is a challenge for sequencing facilities.

[0005] Currently, some NGS facilities perform array-based genotyping and use single nucleotide polymorphisms (SNPs) to obtain the concordance between the genotype profile called from NGS data and that from array-based genotype data (SNP microarrays). It is known that errors, whether or not related to specific processes, may occur in array-based genotyping and lead to discordant genotype calls between SNP array data and NGS data. Meanwhile, the depth of coverage of NGS affects SNP calls from NGS data, especially for SNPs with lower minor allele frequencies (MAF). For NGS panel sequencing, a custom-designed array workflow has to be created to optimize concordance between NGS panel data and SNP microarray data. The workflow requires 2-3 days to complete. Additionally, laboratories with relatively small sample volumes may need to acquire initial instrumentation and to modify their cost and staffing models.

[0006] Improved methods for assuring correct sample identity are needed when performing genetic analysis.

SUMMARY OF THE INVENTION

[0007] The present invention provides a method and system for conducting genetic analysis via allelotyping. The method utilizes a combination of different types of allelotyping techniques to ensure correct sample identity.

[0008] Accordingly, in one aspect, the invention provides a method for performing genetic analysis. The method includes:

a) determining a first allelotype for a sample via short tandem repeat (STR) amplification;

b) determining a second allelotype for the sample via genetic sequencing; and

c) determining allele concordance between the first allelotype and the second allelotype.

[0009] In embodiments, the method further includes generating an allele profiling concordance table. In one embodiment, the method includes calculating a statistical probability to determine whether the first allelotype and the second allelotype are of a single subject.

[00010] In various embodiments, genetic sequencing includes whole genome sequencing (WGS), rapid whole genome sequencing (rWGS), whole exome sequencing (WES), next-generation sequencing (NGS), targeted gene panel sequencing, or a combination thereof.

[00011] In embodiments where sequencing includes WES or targeted gene panel sequencing, a panel having one or more oligonucleotides selected from SEQ ID NOs: 1-41 is utilized which enables allelotyping in these applications.

[00012] Accordingly, the invention further provides a panel having one or more oligonucleotides selected from SEQ ID NOs: 1-41. In embodiments, each oligonucleotide is between about 50 and 120 nucleotides in length. In one embodiment, each oligonucleotide is about 50 nucleotides in length or greater. In one embodiment, each oligonucleotide is 120 nucleotides in length or less.

[00013] In an embodiment the invention provides a genetic analysis system configured to perform a method of the disclosure. The system includes: a) at least one processor operatively connected to a memory; b) a receiver component configured to receive DNA analysis information including sequence information generated from PCR amplification of DNA in a DNA sample; and c) an analysis component, executed by the at least one processor, configured to perform a method of the disclosure, such as determining an allelotype, generating an allele profiling concordance table and calculating a statistical probability to determine whether a first allelotype and a second allelotype are of a single subject.

[00014] In another embodiment, the invention provides a system for performing the method of the invention. The system includes a controller having at least one processor and non-transitory memory. The controller is configured to perform one or more of the processes of the method as described herein.

[00015] In still another embodiment, the invention provides a non-transitory computer readable storage medium encoded with a computer program. The program includes instructions that, when executed by one or more processors, cause the one or more processors to perform operations that implement a method of the disclosure.

[00016] In yet another embodiment, the invention provides a computing system. The system includes a memory, and one or more processors coupled to the memory, with the one or more processors being configured to perform operations that implement a method of the disclosure.

DETAILED DESCRIPTION OF THE INVENTION

[00017] The present invention is based on an innovative method for ensuring sample identity which includes a combination of multiple allelotyping techniques. The presently disclosed methodology includes comparing the STR (Short Tandem Repeat) allele profile generated by the GlobalFiler™ PCR Amplification kit with that generated by NGS using lobSTR™ software, in order to assure sample identity and to detect potential cross contamination among different samples.

[00018] The GlobalFiler™ panel allows determination of the allelic states of 24 positions in the human genome, as well as identification of a contamination event (a mix of more than one sample). A computational workflow on the WGS, WES or NGS panel data set (where the SEQ ID NOs: 1-41 oligonucleotides have been included in the pull-down probe design), using in silico STR inference software (such as lobSTR™), allows independent determination of the allelic states of the same 24 positions in the human genome. A statistical framework allows one to establish beyond reasonable doubt (probability of error less than 1 in 1,000,000,000,000,000) that the two samples came from the same individual if at least 18 of the 24 positions match.

[00019] Concordance between allelotype profiles called by STR and by WGS or WES is high and consistent. STR genotyping using GlobalFiler™ can generate consistent loci profiles with high accuracy and sensitivity. The workflow is simpler, and a laboratory technologist can complete it within 4-6 hours. Setting up STR reactions does not require as large a batch as a microarray. Additionally, reagents are not lost when a batch contains a smaller sample set.
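By way of non-limiting illustration, the matching rule of paragraph [00018] can be sketched as follows. Only the 18-of-24 threshold comes from the text above. The data layout (a mapping from locus name to an unordered pair of repeat counts, with None for an uncalled locus), the function names and the locus labels are assumptions made for this sketch and are not part of the disclosure.

```python
from typing import Dict, Optional, Tuple

# Hypothetical layout: locus name -> (allele 1, allele 2), or None if not called.
Allelotype = Dict[str, Optional[Tuple[float, float]]]

MIN_MATCHING_LOCI = 18  # threshold stated in paragraph [00018]


def count_matching_loci(first: Allelotype, second: Allelotype) -> int:
    """Count loci called in both profiles whose allele pairs agree."""
    matches = 0
    for locus, pair_a in first.items():
        pair_b = second.get(locus)
        if pair_a is None or pair_b is None:
            continue  # an uncalled locus cannot count as a match
        if sorted(pair_a) == sorted(pair_b):  # allele order is irrelevant
            matches += 1
    return matches


def same_individual(first: Allelotype, second: Allelotype,
                    min_matches: int = MIN_MATCHING_LOCI) -> bool:
    """Apply the at-least-18-matching-positions rule to two allelotypes."""
    return count_matching_loci(first, second) >= min_matches
```

In this sketch an uncalled locus simply does not contribute to the match count, so a sequencing-derived profile with several missing calls can still satisfy the rule.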

[00020] Before the present compositions and methods are described, it is to be understood that this invention is not limited to particular methods and experimental conditions described, as such compositions, methods, and conditions may vary. It is also to be understood that the terminology used herein is for purposes of describing particular embodiments only, and is not intended to be limiting, since the scope of the present invention will be limited only in the appended claims.

[00021] As used in this specification and the appended claims, the singular forms “a”, “an”, and “the” include plural references unless the context clearly dictates otherwise. Thus, for example, reference to “the method” includes one or more methods, and/or steps of the type described herein which will become apparent to those persons skilled in the art upon reading this disclosure and so forth.

[00022] Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the invention, the preferred methods and materials are now described.

[00023] METHODS

[00024] The present invention provides a method for conducting genetic analysis via allelotyping. The method utilizes a combination of different types of allelotyping techniques to ensure sample identity.

[00025] Accordingly, in one aspect, the invention provides a method for performing genetic analysis. The method includes:

a) determining a first allelotype for a sample via short tandem repeat (STR) amplification;

b) determining a second allelotype for the sample via genetic sequencing; and

c) determining allele concordance between the first allelotype and the second allelotype.

[00026] The method of the disclosure contemplates genetic sequencing to generate an allelotype.

[00027] Sequencing may be by any method known in the art. Sequencing methods include, but are not limited to, Maxam-Gilbert sequencing-based techniques, chain-termination-based techniques, shotgun sequencing, bridge PCR sequencing, single-molecule real-time sequencing, ion semiconductor sequencing (Ion Torrent™ sequencing), nanopore sequencing, pyrosequencing (454), sequencing by synthesis, sequencing by ligation (SOLiD™ sequencing), sequencing by electron microscopy, dideoxy sequencing reactions (Sanger method), massively parallel sequencing, polony sequencing, and DNA nanoball sequencing. In some embodiments, sequencing involves hybridizing a primer to the template to form a template/primer duplex, contacting the duplex with a polymerase enzyme in the presence of detectably labeled nucleotides under conditions that permit the polymerase to add nucleotides to the primer in a template-dependent manner, detecting a signal from the incorporated labeled nucleotide, and sequentially repeating the contacting and detecting steps at least once, wherein sequential detection of incorporated labeled nucleotide determines the sequence of the nucleic acid. In some embodiments, the sequencing comprises obtaining paired end reads.

[00028] In some embodiments, sequencing of nucleic acid is performed using whole genome sequencing (WGS), rapid WGS, whole exome sequencing (WES), targeted gene panel sequencing, next-generation sequencing (NGS), or any combination thereof. In some embodiments, targeted sequencing is performed and may be either DNA or RNA sequencing. The targeted sequencing may be to a subset of the whole genome. In some embodiments the targeted sequencing is to introns, exons, non-coding sequences or a combination thereof. The DNA is sequenced using an NGS platform, which is massively parallel sequencing. NGS technologies provide high throughput sequence information, and provide digital quantitative information, in that each sequence read that aligns to the sequence of interest is countable. In certain embodiments, clonally amplified DNA templates or single DNA molecules are sequenced in a massively parallel fashion within a flow cell (e.g., as described in WO 2014/015084). In addition to high-throughput sequence information, NGS provides quantitative information, in that each sequence read is countable and represents an individual clonal DNA template or a single DNA molecule. The sequencing technologies of NGS include pyrosequencing, sequencing-by-synthesis with reversible dye terminators, sequencing by oligonucleotide probe ligation and ion semiconductor sequencing. DNA from individual samples can be sequenced individually (i.e., singleplex sequencing) or DNA from multiple samples can be pooled and sequenced as indexed genomic molecules (i.e., multiplex sequencing) on a single sequencing run, to generate up to several hundred million reads of DNA sequences. Commercially available platforms include, e.g., platforms for sequencing-by-synthesis, ion semiconductor sequencing, pyrosequencing, reversible dye terminator sequencing, sequencing by ligation, single-molecule sequencing, sequencing by hybridization, and nanopore sequencing. In embodiments, the methodology of the disclosure utilizes systems such as those provided by Illumina, Inc. (HiSeq™ X10, HiSeq™ 1000, HiSeq™ 2000, HiSeq™ 2500, HiSeq™ 4000, NovaSeq™ 5000, NovaSeq™ 6000, Genome Analyzers™, MiSeq™ systems) and Applied Biosystems Life Technologies (ABI PRISM™ Sequence detection systems, SOLiD™ System, Ion PGM™ Sequencer, Ion Proton™ Sequencer).

[00029] The terms “polynucleotide,” “nucleotide sequence,” “nucleic acid,” and “oligonucleotide” are used interchangeably. They refer to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides, or analogs thereof. Polynucleotides may have any three-dimensional structure, and may perform any function, known or unknown. Polynucleotides may be single- or multi-stranded (e.g., single-stranded, double-stranded, and triple-helical) and contain deoxyribonucleotides, ribonucleotides, and/or analogs or modified forms of deoxyribonucleotides or ribonucleotides, including modified nucleotides or bases or their analogs. Because the genetic code is degenerate, more than one codon may be used to encode a particular amino acid, and the present invention encompasses polynucleotides which encode a particular amino acid sequence. Any type of modified nucleotide or nucleotide analog may be used, so long as the polynucleotide retains the desired functionality under conditions of use, including modifications that increase nuclease resistance (e.g., deoxy, 2'-O-Me, phosphorothioates, and the like). Labels may also be incorporated for purposes of detection or capture, for example, radioactive or nonradioactive labels or anchors, e.g., biotin. The term polynucleotide also includes peptide nucleic acids (PNA). Polynucleotides may be naturally occurring or non-naturally occurring. Polynucleotides may contain RNA, DNA, or both, and/or modified forms and/or analogs thereof. A sequence of nucleotides may be interrupted by non-nucleotide components. One or more phosphodiester linkages may be replaced by alternative linking groups. These alternative linking groups include, but are not limited to, embodiments wherein phosphate is replaced by P(O)S (“thioate”), P(S)S (“dithioate”), (O)NR₂ (“amidate”), P(O)R, P(O)OR, CO or CH2 (“formacetal”), in which each R or R' is independently H or substituted or unsubstituted alkyl (1-20 C) optionally containing an ether (-O-) linkage, aryl, alkenyl, cycloalkyl, cycloalkenyl or aralkyl. The following are non-limiting examples of polynucleotides: coding or non-coding regions of a gene or gene fragment, intergenic DNA, loci (locus) defined from linkage analysis, exons, introns, messenger RNA (mRNA), transfer RNA, ribosomal RNA, short interfering RNA (siRNA), short-hairpin RNA (shRNA), micro-RNA (miRNA), small nucleolar RNA, ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, adapters, and primers. A polynucleotide may include modified nucleotides, such as methylated nucleotides and nucleotide analogs. If present, modifications to the nucleotide structure may be imparted before or after assembly of the polymer. The sequence of nucleotides may be interrupted by non-nucleotide components. A polynucleotide may be further modified after polymerization, such as by conjugation with a labeling component, tag, reactive moiety, or binding partner. Polynucleotide sequences, when provided, are listed in the 5' to 3' direction, unless stated otherwise.

[00030] In embodiments, sequencing includes use of a panel of oligonucleotides. For example, a panel is useful where sequencing includes WES or targeted gene panel sequencing.

[00031] As such, the invention provides a panel having one or more oligonucleotides. In embodiments, the oligonucleotides include one or more oligonucleotides selected from SEQ ID NOs: 1-41 as shown in Table I.

Table I

[00032] Polynucleotides of the present invention, such as oligonucleotides of the panel of the invention, may be DNA or RNA molecules of any suitable length. For example, one of skill in the art would understand what lengths are suitable for oligonucleotides to be utilized in targeted gene panels. Such molecules are typically from about 50 to 150, 50 to 140, 50 to 130, 50 to 120, 50 to 110, 50 to 100, 50 to 90, 50 to 80, 50 to 70 or 50 to 60 nucleotides in length. For example, the molecule may be about 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115 or 120 nucleotides in length. Such polynucleotides may include from at least about 50 to about 120 nucleotides or more, including at least about 50 nucleotides, at least about 55 nucleotides, at least about 60 nucleotides, at least about 65 nucleotides, at least about 70 nucleotides, at least about 75 nucleotides, at least about 80 nucleotides, at least about 85 nucleotides, at least about 90 nucleotides, at least about 95 nucleotides, at least about 100 nucleotides, at least about 110 nucleotides, at least about 120 nucleotides or greater than 120 nucleotides.

[00033] As used herein, “polypeptide” refers to a composition comprised of amino acids and recognized as a protein by those of skill in the art. The conventional one-letter or three-letter code for amino acid residues is used herein. The terms “polypeptide” and “protein” are used interchangeably herein to refer to polymers of amino acids of any length. The polymer may be linear or branched, it may include modified amino acids, and it may be interrupted by non-amino acids. The terms also encompass an amino acid polymer that has been modified naturally or by intervention; for example, disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, or any other manipulation or modification, such as conjugation with a labeling component. Also included within the definition are, for example, polypeptides containing one or more analogs of an amino acid (including, for example, unnatural amino acids, synthetic amino acids and the like), as well as other modifications known in the art.

[00034] As used herein, the term “sample” refers to any substance containing or presumed to contain nucleic acid. The sample can be a biological sample obtained from a subject. The nucleic acids can be RNA, DNA, e.g., genomic DNA, mitochondrial DNA, viral DNA, synthetic DNA, or cDNA reverse transcribed from RNA. The nucleic acids in a nucleic acid sample generally serve as templates for extension of a hybridized primer. In some embodiments, the biological sample is a biological fluid sample. The fluid sample can be whole blood, plasma, serum, ascites, cerebrospinal fluid, sweat, urine, tears, saliva, buccal sample, cavity rinse, feces or organ rinse. The fluid sample can be an essentially cell-free liquid sample (e.g., plasma, serum, sweat, urine, and tears). In other embodiments, the biological sample is a solid biological sample, e.g., feces or tissue biopsy, e.g., a tumor biopsy. A sample can also comprise in vitro cell culture constituents (including but not limited to conditioned medium resulting from the growth of cells in cell culture medium, recombinant cells and cell components). In some embodiments, the sample is a biological sample that is a mixture of nucleic acids from multiple sources, i.e., there is more than one contributor to a biological sample, e.g., two or more individuals. In one embodiment the biological sample is a dried blood spot.

[00035] In the present invention, the subject is typically a human but also can be any species with methylation marks on its genome, including, but not limited to, a dog, cat, rabbit, cow, bird, rat, horse, pig, or monkey.

[00036] COMPUTER SYSTEMS

[00037] The present invention is described partly in terms of functional components and various processing steps. Such functional components and processing steps may be realized by any number of components, operations and techniques configured to perform the specified functions and achieve the various results. For example, the present invention may employ various biological samples, biomarkers, elements, materials, computers, data sources, storage systems and media, information gathering techniques and processes, data processing criteria, statistical analyses, regression analyses and the like, which may carry out a variety of functions. In addition, although the invention is described in relation to genetic analysis, the present invention may be practiced in conjunction with any number of applications, environments and data analyses; the systems described herein are merely exemplary applications for the invention.

[00038] Methods for genetic analysis according to various aspects of the present invention may be implemented in any suitable manner, for example using a computer program operating on the computer system. An exemplary genetic analysis system, according to various aspects of the present invention, may be implemented in conjunction with a computer system, for example a conventional computer system comprising a processor and a random access memory, such as a remotely-accessible application server, network server, personal computer or workstation. The computer system also suitably includes additional memory devices or information storage systems, such as a mass storage system and a user interface, for example a conventional monitor, keyboard and tracking device. The computer system may, however, comprise any suitable computer system and associated equipment and may be configured in any suitable manner. In one embodiment, the computer system comprises a stand-alone system. In another embodiment, the computer system is part of a network of computers including a server and a database.

[00039] The software required for receiving, processing, and analyzing genetic information may be implemented in a single device or implemented in a plurality of devices. The software may be accessible via a network such that storage and processing of information takes place remotely with respect to users. The genetic analysis system according to various aspects of the present invention and its various elements provide functions and operations to facilitate genetic analysis, such as data gathering, processing and/or analysis. The present genetic analysis system maintains information relating to samples and facilitates analysis. For example, in the present embodiment, the computer system executes the computer program, which may receive, store, search, analyze, and report information relating to the genome. The computer program may comprise multiple modules performing various functions or operations, such as a processing module for processing raw data and generating supplemental data and an analysis module for analyzing raw data and supplemental data to perform genetic analysis.

[00040] The procedures performed by the genetic analysis system may comprise any suitable processes to facilitate genetic analysis. In one embodiment, the genetic analysis system is configured to determine allele concordance.

[00041] The genetic analysis system may also provide various additional modules and/or individual functions. For example, the genetic analysis system may also include a reporting function, for example to provide information relating to the processing and analysis functions. The genetic analysis system may also provide various administrative and management functions, such as controlling access and performing other administrative functions.

[00042] The following example is provided to further illustrate the advantages and features of the present invention, but it is not intended to limit the scope of the invention. While this example is typical of those that might be used, other procedures, methodologies, or techniques known to those skilled in the art may alternatively be used.

EXAMPLE I

NGS WGS SAMPLE IDENTITY ASSURANCE

[00043] The following methodology was utilized to determine sample identity.

[00044] Step 1. STR amplification workflow.

1. Set up STR PCR amplification using genomic DNA or a blood spot (1.2 mm) with the ThermoFisher GlobalFiler™ PCR amplification kit.

2. Run the cycling on an AB Veriti™ PCR machine.

3. Set up electrophoresis on an AB Genetic Analyzer™.

4. Generate STR allele profiling using GeneMapper™ software.

5. Report STR allele profiling.

[00045] Result from the STR amplification workflow: 24 pairs of numbers, also known as an allelotype, which together comprise a digital fingerprint of the individual's DNA.

[00046] Additionally, warning flags may be raised if the sample contains no DNA, non-human DNA, or DNA from more than one individual.
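By way of non-limiting illustration, two of the warning flags of paragraph [00046] can be sketched as below. The disclosure does not state how the flags are derived; the heuristics used here (no alleles called at any locus suggests no DNA, and more than two distinct alleles at a locus suggests more than one contributor) are assumptions for this sketch, as are the function name and the input layout. Detection of non-human DNA is not attempted.

```python
from typing import Dict, List


def warning_flags(calls: Dict[str, List[float]],
                  max_mixed_loci: int = 0) -> List[str]:
    """Return warning strings for a hypothetical per-locus list of called alleles.

    Assumed heuristics, not taken from the disclosure:
      * no alleles at any locus          -> no DNA detected
      * more than 2 distinct alleles at
        more than max_mixed_loci loci    -> possible mixture
    """
    flags: List[str] = []
    if all(len(alleles) == 0 for alleles in calls.values()):
        flags.append("no DNA detected")
    mixed = [locus for locus, alleles in calls.items() if len(set(alleles)) > 2]
    if len(mixed) > max_mixed_loci:
        flags.append("possible mixture at: " + ", ".join(sorted(mixed)))
    return flags
```

The `max_mixed_loci` parameter is a hypothetical tolerance that would let a laboratory ignore an isolated three-allele call; its default of zero flags any such locus.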

[00047] Step 2. WGS or WES workflow (may be performed in parallel with Step 1).

1. Using the same sample (or another sample from the same individual), prepare a PCR-free Illumina WGS™ library (the WGS™ library naturally includes the biometric marker DNA fragments). Alternatively, or in addition, using the same sample (or another sample from the same individual) and a solution capture targeting approach, prepare the WES™ library using a commercially available KAPAHyper™ barcoded paired-end library coupled with the IDT xGEN™ WES probes, the mitochondrial panel and/or the custom biometric marker probes shown in Table I. The custom biometric marker probes of Table I capture sample DNA in the vicinity of the biomarkers for the WES™ library.

2. Load the sample on a HiSeq™ 2500 or 4000, NovaSeq™ 6000, or other high-throughput sequencer.

3. Perform post-sequencing analysis including: demultiplexing, mapping, and diagnostic variant call.

4. Computationally determine the STR allele profile (using specialized software, such as lobSTR™).

5. Report STR called by WGS data.

[00048] Result from the WGS workflow: at least 21 pairs of numbers (and additional N/D ("not determined") calls), an allelotype that comprises an independent digital fingerprint of the individual's DNA.

[00049] Additionally, warning flags may be raised if the sample contains no DNA, non-human DNA, or DNA from more than one individual.
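By way of non-limiting illustration, the handling of N/D ("not determined") calls in the sequencing-derived allelotype can be sketched as below. The text form of a call used here ("15,16" for a pair, "N/D" for no call) is a hypothetical report format chosen for this sketch; it is not the output format of lobSTR™ or any other tool, and the function names are likewise assumptions.

```python
from typing import Dict, Optional, Tuple

# A call is a pair of repeat counts, or None when the locus is not determined.
Call = Optional[Tuple[float, float]]


def parse_call(text: str) -> Call:
    """Parse a hypothetical report cell such as '15,16' or 'N/D'."""
    text = text.strip()
    if text.upper() == "N/D":
        return None
    first, second = (float(part) for part in text.split(","))
    return (first, second)


def called_loci(profile: Dict[str, Call]) -> int:
    """Number of loci with a determined allele pair."""
    return sum(1 for call in profile.values() if call is not None)
```

A profile with 24 loci of which three are N/D yields `called_loci(profile) == 21`, the minimum number of pairs mentioned in paragraph [00048].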

[00050] Step 3. Generate concordance using STR allele profiling called by GlobalFiler™ and called by WGS or WES or Panel sequencing.

[00051] Using a statistically derived inference from past samples (as well as extensive scientific validation of both aforementioned workflows), it is then possible to ascertain the level of probability that the two allelotypes come from the same individual and to assure that no accidental swaps, mixes or other interference occurred while processing the individual's DNA. This assures that the clinical genetic diagnosis produced in Step 2 is for the intended individual.
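By way of non-limiting illustration, one simple way to reason about such a probability is sketched below. This is not the statistical inference of the disclosure, which is derived from past samples and is not specified here. The sketch assumes that loci are independent, so that the chance of an unrelated individual matching at every compared locus is the product of per-locus chance-match probabilities; the per-locus values and function names are hypothetical.

```python
import math
from typing import Iterable


def random_match_probability(per_locus_match_prob: Iterable[float]) -> float:
    """Product of per-locus chance-match probabilities (independence assumed)."""
    return math.prod(per_locus_match_prob)


def max_mean_locus_probability(n_loci: int, bound: float) -> float:
    """Largest geometric-mean per-locus probability keeping the product <= bound."""
    return bound ** (1.0 / n_loci)
```

Under these assumptions, 18 matching loci each with a hypothetical chance-match probability of 0.1 give a combined probability of about 1e-18, and a combined bound of 1e-15 over 18 loci is met whenever the geometric mean of the per-locus probabilities is at most about 0.147.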

[00052] The allele profiling concordance table shown below was generated using the method described herein.

Table II

Sample      STR vs WGS    Sample 1 vs others
Sample01    97            50
Sample02    100           50
Sample03    97            38.2
Sample04    97            29.4
Sample05    97            35.3
Sample06    95            32.4
Sample07    97            29.4
Sample08    97            41.2
Sample09    97            38.2
Sample10    100           38.2
Sample11    91            32.4
Sample12    97            29.4
Sample13    94            29.4
Sample14    97            41.2
Sample15    94            38.2
Sample16    97            35.3
Sample17    93.1          29.4
Sample18    86.2          35.3
Sample19    85.7          38.2
Sample20    100           32.4
Sample21    100           38.2
Sample22    100           44.1
Sample23    100           26.5
Sample24    100           29.4
Sample25    93.8          44.1
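By way of non-limiting illustration, the separation between the two columns of Table II can be checked with the short sketch below. The values are transcribed from the table. Reading the first column as same-sample STR-versus-sequencing concordance and the second as concordance against Sample 1 follows the column headings; the table itself does not state units, and the function name is an assumption of this sketch.

```python
from typing import List, Tuple

# (sample, "STR vs WGS" value, "Sample 1 vs others" value) from Table II.
TABLE_II: List[Tuple[str, float, float]] = [
    ("Sample01", 97.0, 50.0), ("Sample02", 100.0, 50.0),
    ("Sample03", 97.0, 38.2), ("Sample04", 97.0, 29.4),
    ("Sample05", 97.0, 35.3), ("Sample06", 95.0, 32.4),
    ("Sample07", 97.0, 29.4), ("Sample08", 97.0, 41.2),
    ("Sample09", 97.0, 38.2), ("Sample10", 100.0, 38.2),
    ("Sample11", 91.0, 32.4), ("Sample12", 97.0, 29.4),
    ("Sample13", 94.0, 29.4), ("Sample14", 97.0, 41.2),
    ("Sample15", 94.0, 38.2), ("Sample16", 97.0, 35.3),
    ("Sample17", 93.1, 29.4), ("Sample18", 86.2, 35.3),
    ("Sample19", 85.7, 38.2), ("Sample20", 100.0, 32.4),
    ("Sample21", 100.0, 38.2), ("Sample22", 100.0, 44.1),
    ("Sample23", 100.0, 26.5), ("Sample24", 100.0, 29.4),
    ("Sample25", 93.8, 44.1),
]


def separation(rows: List[Tuple[str, float, float]]) -> Tuple[float, float]:
    """Return (lowest STR-vs-WGS value, highest Sample-1-vs-others value)."""
    lowest_same = min(row[1] for row in rows)
    highest_cross = max(row[2] for row in rows)
    return lowest_same, highest_cross
```

For the tabulated data, `separation(TABLE_II)` returns `(85.7, 50.0)`: the lowest value in the first column remains well above the highest value in the second.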

[00053] Core biometric DNA capture reagent sequences may be synthesized as shown in Table I and added to the gene targeting sequence panel (including WES) DNA capture reagents. In some embodiments longer DNA reagent sequences can be designed using the reference human genome sequence that surrounds the core DNA sequences shown above. In one embodiment, using the IDT xGEN™ Exome Research Panel v1.0 with the IDT xGEN™ Lockdown Custom Probes, oligonucleotides of length 120 are used whose sequences include the core biometric DNA capture reagent sequences shown above in Table I.
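By way of non-limiting illustration, extending a core sequence to a 120-nucleotide probe using the surrounding reference sequence can be sketched as below. The roughly symmetric placement of flanking sequence, the zero-based half-open coordinates and the function name are assumptions of this sketch; the disclosure states only that longer sequences can be designed from the reference sequence surrounding the core.

```python
def extend_probe(reference: str, core_start: int, core_end: int,
                 target_len: int = 120) -> str:
    """Return a target_len window of `reference` containing the core region.

    `core_start`/`core_end` are zero-based, half-open coordinates of the core
    sequence within `reference`. Flanks are split roughly evenly and shifted
    inward when the core lies near either end of the reference.
    """
    core_len = core_end - core_start
    if not 0 <= core_start < core_end <= len(reference):
        raise ValueError("core coordinates fall outside the reference")
    if core_len > target_len:
        raise ValueError("core is longer than the target probe length")
    if len(reference) < target_len:
        raise ValueError("reference is shorter than the target probe length")

    left_flank = (target_len - core_len) // 2
    start = max(0, core_start - left_flank)
    end = start + target_len
    if end > len(reference):  # shift the window back inside the reference
        end = len(reference)
        start = end - target_len
    return reference[start:end]
```

For a 50-nucleotide core in the middle of a longer reference, the sketch adds 35 nucleotides on each side; the result always has the target length and always contains the core.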

[00054] Although the invention has been described with reference to the above examples, it will be understood that modifications and variations are encompassed within the spirit and scope of the invention. Accordingly, the invention is limited only by the following claims.
