

Title:
AUTOMATED PARSING PIPELINE FOR LOCALIZATION AND CONDITION CLASSIFICATION SYSTEM AND METHOD
Document Type and Number:
WIPO Patent Application WO/2020/092417
Kind Code:
A1
Abstract:
An automated parsing pipeline system and method for anatomical localization and condition classification is disclosed. The system comprises an input event source, a memory unit and a processor including a volumetric image processor, a voxel parsing engine, a localization layer and a detection module. The volumetric image processor is configured to receive a volumetric image from the input source and parse the received volumetric image. The voxel parsing engine is configured to assign each voxel a distinct anatomical structure. The localization layer is configured to crop a defined anatomical structure with its surroundings. The detection module is configured to classify conditions for each defined anatomical structure within the cropped image. The disclosed system and method provide accurate localization of a tooth and detect several common conditions in each tooth.

Inventors:
EZHOV MATVEY DMITRIEVICH (RU)
ALEKSANDROVSKIY VLADIMIR LEONIDOVICH (RU)
SHUMILOV EVGENY SERGEEVICH (RU)
Application Number:
PCT/US2019/058637
Publication Date:
May 07, 2020
Filing Date:
October 29, 2019
Assignee:
DIAGNOCAT INC (US)
International Classes:
G06T1/20; G06N3/02; G06T15/08; G06T15/10; G06V10/764
Foreign References:
US20150161786A12015-06-11
US20180116620A12018-05-03
US20070127798A12007-06-07
US20170024373A12017-01-26
US20180259608A12018-09-13
Attorney, Agent or Firm:
GALLENSON, Mavis S. et al. (US)
Claims:
We Claim:

1. An automated parsing pipeline system for anatomical localization and condition classification, said system comprising:

a processor;

a non-transitory storage element coupled to the processor;

encoded instructions stored in the non-transitory storage element, wherein the encoded instructions, when implemented by the processor, configure the automated parsing pipeline system to:

receive at least one volumetric image;

parse the received volumetric image into at least a single image frame field of view;

pre-process the parsed volumetric image by at least controlling for image intensity value;

localize a present tooth inside the pre-processed and parsed volumetric image and identifying it by number;

extract the identified tooth and surrounding context within the localized volumetric image; and

classify a tooth’s conditions based on the extracted volumetric image.

2. The system of claim 1, wherein the at least one received volumetric image comprises a 3-D pixel array.

3. The system of claim 2, further configured to pre-process by converting the 3-D pixel array into an array of Hounsfield Unit (HU) radio intensity measurements.

4. The system of claim 1, further configured to pre-process at least one of the localization or classification steps by rescaling using linear interpolation.

5. The system of claim 1, wherein the pre-processing comprises using any one of a number of normalization schemes to account for variations in image value intensity depending on at least one of an input or output of volumetric image.

6. The system of claim 1, wherein the localization is achieved using a V-Net-based fully convolutional neural network.

7. The system of claim 1, further configured to extract anatomical structure by finding a minimum bounding rectangle around the localized and identified tooth.

8. The system of claim 7, wherein the bounding rectangle extends by at least 15 mm vertically and 8 mm horizontally (equally in all directions) to capture the tooth and surrounding context.

9. The system of claim 1, wherein the classification is achieved using a DenseNet 3-D convolutional neural network.

10. An automated parsing pipeline system for anatomical localization, said system comprising:

a volumetric image processor;

a voxel parsing engine;

a localization layer;

a processor;

a non-transitory storage element coupled to the processor;

encoded instructions stored in the non-transitory storage element, wherein the encoded instructions when implemented by the processor, configure the automated parsing pipeline system to:

receive at least one volumetric image data received from a radio-image gathering source by the volumetric image processor;

parse the received volumetric image data into at least a single image frame field of view by said volumetric image processor;

localize anatomical structures residing in the at least single field of view by assigning each voxel a distinct anatomical structure by the voxel parsing engine; and

select all voxels belonging to the localized anatomical structure by finding a minimal bounding rectangle around the voxels and the surrounding region for cropping as a defined anatomical structure by the localization layer.

11. The system of claim 10, wherein the at least one received volumetric image comprises a 3-D pixel array.

12. The system of claim 11, further configured to pre-process by converting the 3-D pixel array into an array of Hounsfield Unit (HU) radio intensity measurements.

13. The system of claim 10, further configured to pre-process at least one of the localization or classification steps by rescaling using linear interpolation.

14. The system of claim 10, wherein the pre-processing comprises using any one of a number of normalization schemes to account for variations in image value intensity depending on at least one of an input or output of volumetric image.

15. The system of claim 10, wherein the localization is achieved using a V-Net-based fully convolutional neural network.

16. The system of claim 10, wherein the extraction is achieved by finding a minimum bounding rectangle around the localized and identified tooth.

17. The system of claim 16, wherein the bounding rectangle extends by at least 15 mm vertically and 8 mm horizontally (equally in all directions) to capture the tooth and surrounding context.

18. An automated parsing pipeline system for anatomical localization and condition classification, said system comprising:

a volumetric image processor;

a voxel parsing engine;

a localization layer;

a classification layer;

a processor;

a non-transitory storage element coupled to the processor;

encoded instructions stored in the non-transitory storage element, wherein the encoded instructions, when implemented by the processor, configure the automated parsing pipeline system to:

receive at least one volumetric image data received from a radio-image gathering source by the volumetric image processor;

parse the received volumetric image data into at least a single image frame field of view by said volumetric image processor;

localize anatomical structures residing in the at least single field of view by assigning each voxel a distinct anatomical structure ID by the voxel parsing engine;

select all voxels belonging to the localized anatomical structure by finding a minimal bounding rectangle around the voxels and the surrounding region for cropping as a defined anatomical structure by the localization layer; and

detect conditions for each defined anatomical structure within the cropped image by the classification layer.

19. The system of claim 18, wherein the at least one received volumetric image comprises a 3-D pixel array.

20. The system of claim 18, further configured to pre-process by converting the 3-D pixel array into an array of Hounsfield Unit (HU) radio intensity measurements.

21. The system of claim 18, wherein the pre-processing for at least one of the localization or classification steps comprises rescaling using linear interpolation.

22. The system of claim 18, wherein the pre-processing comprises using any one of a number of normalization schemes to account for variations in image value intensity depending on at least one of an input or output of volumetric image.

23. The system of claim 18, wherein the localization is achieved using a V-Net- based fully convolutional neural network.

24. The system of claim 18, further configured to extract anatomical structure by finding a minimum bounding rectangle around the localized and identified tooth.

25. The system of claim 24, wherein the bounding rectangle extends by at least 15 mm vertically and 8 mm horizontally (equally in all directions) to capture the tooth and surrounding context.

26. The system of claim 18, wherein the classification is achieved using a DenseNet 3-D convolutional neural network.

27. An automated parsing pipeline system for anatomical localization and condition classification, said system comprising:

a processor;

a non-transitory storage element coupled to the processor;

encoded instructions stored in the non-transitory storage element, wherein the encoded instructions, when implemented by the processor, configure the automated parsing pipeline system to:

receive at least one 2D image;

parse the received image into at least a single image frame field of view;

pre-process the parsed image by at least controlling for image intensity value;

localize a present tooth inside the pre-processed and parsed image and identifying it by number;

extract the identified tooth and surrounding context within the localized image; and

classify a tooth’s conditions based on the extracted image.

28. A method for localizing a tooth and classifying a tooth condition, said method comprising the steps of:

receiving at least one volumetric image;

parsing the received volumetric image into at least a single image frame field of view;

pre-processing the parsed volumetric image by at least controlling for image intensity value;

localizing a present tooth inside the pre-processed and parsed volumetric image and identifying it by number;

extracting the identified tooth and surrounding context within the localized volumetric image; and

classifying a tooth’s conditions based on the extracted volumetric image.

29. The method of claim 28, wherein the at least one received volumetric image comprises a 3-D pixel array.

30. The method of claim 28, further including a step of: pre-processing by converting the 3-D pixel array into an array of Hounsfield Unit (HU) radio intensity measurements.

31. The method of claim 28, wherein the pre-processing for at least one of the localization or classification steps comprises rescaling using linear interpolation.

32. The method of claim 28, wherein the pre-processing comprises using any one of a number of normalization schemes to account for variations in image value intensity depending on at least one of an input or output of volumetric image.

33. The method of claim 28, wherein the localization is achieved using a V-Net-based fully convolutional neural network.

34. The method of claim 28, further comprising a step of: achieving extraction by finding a minimum bounding rectangle around the localized and identified tooth.

35. The method of claim 34, wherein the bounding rectangle extends by at least 15 mm vertically and 8 mm horizontally (equally in all directions) to capture the tooth and surrounding context.

36. The method of claim 28, wherein the classification is achieved using a DenseNet 3-D convolutional neural network.

37. A method for localizing a tooth and classifying a tooth condition, said method comprising the steps of:

receiving at least one volumetric image data received from a radio image gathering source by a volumetric image processor;

parsing the received volumetric image data into at least a single image frame field of view by said volumetric image processor;

pre-processing the at least single image frame field of view by controlling for image intensity value by the volumetric image processor;

localizing an anatomical structure residing in the at least single pre-processed field of view by assigning each voxel a distinct anatomical structure ID by the voxel parsing engine;

selecting all voxels belonging to the localized anatomical structure by finding a minimal bounding rectangle around the voxels and the surrounding region for cropping as a defined anatomical structure by the localization layer; and

classifying conditions for each defined anatomical structure within the cropped image by the classification layer.

Description:

CROSS-REFERENCE TO RELATED APPLICATION

This application claims the benefit of U.S. Non-Provisional Patent Application No. 16/175,067, titled "System and Method for an Automated Parsing Pipeline for Anatomical Localization and Condition Classification," filed October 30, 2018, the content of which is hereby incorporated by reference in its entirety for all purposes.

BACKGROUND

Field

[0001] This invention relates generally to diagnostics, and more specifically to an automated parsing pipeline system and method for anatomical localization and condition classification.

Related Art

[0002] Modern image generation systems play an important role in disease detection and treatment planning. A few existing systems and methods are discussed as follows. One common method is dental radiography, which provides dental radiographic images that enable the dental professional to identify many conditions that may otherwise go undetected and to see conditions that cannot be identified clinically. Another technology is cone beam computed tomography (CBCT), which allows structures in the oral-maxillofacial complex to be viewed in three dimensions. Hence, cone beam computed tomography is generally preferred over dental radiography.

[0003] However, CBCT has one or more limitations, such as the time consumed and the complexity involved for personnel to become fully acquainted with the imaging software and to correctly use Digital Imaging and Communications in Medicine (DICOM) data. The American Dental Association (ADA) also suggests that CBCT images should be evaluated by a dentist with appropriate training and education in CBCT interpretation. Further, many dental professionals who incorporate this technology into their practices have not had the training required to interpret data on anatomic areas beyond the maxilla and the mandible. To address the foregoing issues, deep learning has been applied to various medical imaging problems to interpret the generated images, but its use remains limited within the field of dental radiography. Further, most applications only work with 2D X-ray images.

[0004] In an existing article entitled "Teeth and jaw 3D reconstruction in stomatology," Proceedings of the International Conference on Medical Information Visualisation - BioMedical Visualisation, pp. 23-23, 2007, researcher Krsek et al. describe a method dealing with problems of 3D tissue reconstruction in stomatology. In this process, 3D geometry models of teeth and jaw bones were created based on input computed tomography (CT) image data. The input discrete CT data were segmented by a nearly automatic procedure, with manual correction and verification. Creation of segmented tissue 3D geometry models was based on vectorization of input discrete data extended by smoothing and decimation. The actual segmentation operation was primarily based on selecting a threshold of Hounsfield Unit values. However, this method fails to be sufficiently robust for practical use.

[0005] Another existing patent, No. US8849016, entitled "Panoramic image generation from CBCT dental images" to Shoupu Chen et al., discloses a method for forming a panoramic image from a computed tomography image volume. The method acquires image data elements for one or more computed tomographic volume images of a subject, identifies a subset of the acquired computed tomographic images that contain one or more features of interest and defines, from the subset of the acquired computed tomographic images, a sub-volume having a curved shape that includes one or more of the contained features of interest. The curved shape is unfolded by defining a set of unfold lines, wherein each unfold line extends at least between two curved surfaces of the curved shape sub-volume, and re-aligning the image data elements within the curved shape sub-volume according to a re-alignment of the unfold lines. One or more views of the unfolded sub-volume are displayed.

[0006] Another existing patent application, No. US20080232539, entitled "Method for the reconstruction of a panoramic image of an object, and a computed tomography scanner implementing said method" to Alessandro Pasi et al., discloses a method for the reconstruction of a panoramic image of the dental arches of a patient, a computer program product, and a computed tomography scanner implementing said method. The method involves acquiring volumetric tomographic data of the object; extracting, from the volumetric tomographic data, tomographic data corresponding to at least three sections of the object identified by respective mutually parallel planes; determining, on each section extracted, a respective trajectory that a profile of the object follows in an area corresponding to said section; determining a first surface transverse to said planes such as to comprise the trajectories; and generating the panoramic image on the basis of a part of the volumetric tomographic data identified as a function of said surface. However, the above references also fail to address the afore-discussed problems regarding cone beam computed tomography technology and image generation systems.

[0007] Therefore, there is a need for an automated parsing pipeline system and method for anatomical localization and condition classification.

SUMMARY

[0008] A system of one or more computers can be configured to perform particular operations or actions by virtue of having software, firmware, hardware, or a combination of them installed on the system that in operation causes or cause the system to perform the actions. One or more computer programs can be configured to perform particular operations or actions by virtue of including instructions that, when executed by data processing apparatus, cause the apparatus to perform the actions. Embodiments disclosed include an automated parsing pipeline system and method for anatomical localization and condition classification.

[0009] In an embodiment, the system comprises an input event source, a memory unit in communication with the input event source, a processor in communication with the memory unit, a volumetric image processor in communication with the processor, a voxel parsing engine in communication with the volumetric image processor, and a localizing layer in communication with the voxel parsing engine. In one embodiment, the memory unit is a non-transitory storage element storing encoded information. In one embodiment, at least one volumetric image data is received from the input event source by the volumetric image processor. In one embodiment, the input event source is a radio-image gathering source.

[0010] The processor is configured to parse the at least one received volumetric image data into at least a single image frame field of view by the volumetric image processor. The processor is further configured to localize anatomical structures residing in the at least single field of view by assigning each voxel a distinct anatomical structure by the voxel parsing engine. In one embodiment, the single image frame field of view is pre-processed for localization, which involves rescaling using linear interpolation. The pre-processing involves use of any one of a number of normalization schemes to account for variations in image value intensity depending on at least one of an input or output of volumetric image. In one embodiment, localization is achieved using a V-Net-based fully convolutional neural network.

[0011] The processor is further configured to select all voxels belonging to the localized anatomical structure by finding a minimal bounding rectangle around the voxels and the surrounding region for cropping as a defined anatomical structure by the localization layer. The bounding rectangle extends by at least 15 mm vertically and 8 mm horizontally (equally in all directions) to capture the tooth and surrounding context. In one embodiment, the automated parsing pipeline system further comprises a detection module. The processor is configured to detect or classify the condition for each defined anatomical structure within the cropped image by a detection module or classification layer. In one embodiment, the classification is achieved using a DenseNet 3-D convolutional neural network.

[0012] In another embodiment, an automated parsing pipeline method for anatomical localization and condition classification is disclosed. At one step, at least one volumetric image data is received from an input event source by a volumetric image processor. At another step, the received volumetric image data is parsed into at least a single image frame field of view by the volumetric image processor. At another step, the single image frame field of view is pre-processed by controlling image intensity value by the volumetric image processor. At another step, the anatomical structure residing in the single pre-processed field of view is localized by assigning each voxel a distinct anatomical structure ID by the voxel parsing engine. At another step, all voxels belonging to the localized anatomical structure are selected by finding a minimal bounding rectangle around the voxels and the surrounding region for cropping as a defined anatomical structure by the localization layer. In another embodiment, the method includes a step of classifying the conditions for each defined anatomical structure within the cropped image by the classification layer.
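The method steps above can be sketched as a simple chain of callables. This is an illustrative sketch only: the step functions are hypothetical placeholders for the components the summary names (volumetric image processor, voxel parsing engine, localization layer, classification layer), not the actual implementation.

```python
def parsing_pipeline(volume, parse, preprocess, localize, crop, classify):
    """Run the claimed pipeline: parse -> pre-process -> localize -> crop -> classify.

    Each argument after `volume` is a callable standing in for one pipeline stage.
    """
    frame = parse(volume)        # single image frame field of view
    frame = preprocess(frame)    # control for image intensity value
    mask = localize(frame)       # per-voxel anatomical structure IDs
    tooth = crop(frame, mask)    # bounding rectangle + surrounding context
    return classify(tooth)       # conditions for the cropped structure
```

Any stage can be swapped out independently, which mirrors the document's description of alternative embodiments for each component.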

[0013] Other embodiments of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods.

BRIEF DESCRIPTION OF THE DRAWINGS

[0014] FIG. 1A illustrates, in a block diagram, an automated parsing pipeline system for anatomical localization and condition classification, according to an embodiment.

[0015] FIG. 1B illustrates, in a block diagram, an automated parsing pipeline system for anatomical localization and condition classification, according to another embodiment.

[0016] FIG. 2A illustrates, in a block diagram, an automated parsing pipeline system for anatomical localization and condition classification, according to yet another embodiment.

[0017] FIG. 2B illustrates, in a block diagram, a processor system according to an embodiment.

[0018] FIG. 3A illustrates, in a flow diagram, an automated parsing pipeline method for anatomical localization and condition classification, according to an embodiment.

[0019] FIG. 3B illustrates, in a flow diagram, an automated parsing pipeline method for anatomical localization and condition classification, according to another embodiment.

[0001] FIG. 4 illustrates, in a block diagram, the automated parsing pipeline architecture according to an embodiment.

[0002] FIG. 5 illustrates, in a screenshot, an example of ground truth and predicted masks in an embodiment of the present invention.

[0003] FIGS. 6A, 6B & 6C illustrate, in screenshots, the extraction of anatomical structure by the localization model of the system in an embodiment of the present invention.

[0004] FIG. 7 illustrates, in a graph, a receiver operating characteristic (ROC) curve of a predicted tooth condition in an embodiment of the present invention.

DETAILED DESCRIPTION

[0005] Specific embodiments of the invention will now be described in detail with reference to the accompanying FIGS. 1A-7. In the following detailed description of embodiments of the invention, numerous specific details are set forth in order to provide a more thorough understanding of the invention. In other instances, well-known features have not been described in detail to avoid obscuring the invention. Embodiments disclosed include an automated parsing pipeline system and method for anatomical localization and condition classification.

[0006] FIG. 1A illustrates a block diagram 100 of the system comprising an input event source 101, a memory unit 102 in communication with the input event source 101, a processor 103 in communication with the memory unit 102, a volumetric image processor 103a in communication with the processor 103, a voxel parsing engine 104 in communication with the volumetric image processor 103a, and a localizing layer 105 in communication with the voxel parsing engine 104. In an embodiment, the memory unit 102 is a non-transitory storage element storing encoded information. The encoded instructions, when implemented by the processor 103, configure the automated pipeline system to localize an anatomical structure and classify the condition of the localized anatomical structure.

[0007] In one embodiment, input data is provided via the input event source 101. In one embodiment, the input data is volumetric image data and the input event source 101 is a radio-image gathering source. In one embodiment, the input data is 2D image data. The volumetric image data comprises a 3-D pixel array. The volumetric image processor 103a is configured to receive the volumetric image data from the radio-image gathering source. Initially, the volumetric image data is pre-processed, which involves conversion of the 3-D pixel array into an array of Hounsfield Unit (HU) radio intensity measurements.
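The HU conversion mentioned above is a linear rescaling of raw scanner values. A minimal sketch, assuming the standard DICOM rescale model; the default slope and intercept shown are typical CT values, not values taken from this disclosure (real values come from the scan's RescaleSlope/RescaleIntercept tags):

```python
def to_hounsfield(voxels, rescale_slope=1.0, rescale_intercept=-1024.0):
    """Map raw scanner intensity values to Hounsfield Units (HU).

    HU = raw * RescaleSlope + RescaleIntercept; with these defaults,
    water (raw 1024) maps to 0 HU and air (raw 0) to -1024 HU.
    """
    return [v * rescale_slope + rescale_intercept for v in voxels]
```

In practice the same affine map is applied element-wise to the whole 3-D array rather than a flat list.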

[0008] The processor 103 is further configured to parse the at least one received volumetric image data 103b into at least a single image frame field of view by the volumetric image processor.

[0009] The processor 103 is further configured to localize anatomical structures residing in the single image frame field of view by assigning each voxel a distinct anatomical structure by the voxel parsing engine 104. In one embodiment, the single image frame field of view is pre-processed for localization, which involves rescaling using linear interpolation. The pre-processing involves use of any one of a number of normalization schemes to account for variations in image value intensity depending on at least one of an input or output of volumetric image. In one embodiment, localization is achieved using a V-Net-based fully convolutional neural network. In one embodiment, the V-Net is a 3D generalization of UNet.
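The rescaling by linear interpolation described above can be illustrated along a single axis. A minimal pure-Python sketch (the function name is illustrative; a real pipeline interpolates all three volume axes at once):

```python
def rescale_linear(signal, new_len):
    """Resample a 1-D intensity profile to new_len samples by linear interpolation.

    Each output sample sits at a fractional position in the input and blends
    its two nearest neighbors, the same scheme used per-axis when rescaling a
    volume (e.g. from 0.1-0.2 mm voxels to 1.0 mm voxels).
    """
    old_len = len(signal)
    if new_len == 1:
        return [float(signal[0])]
    out = []
    for i in range(new_len):
        pos = i * (old_len - 1) / (new_len - 1)  # fractional source index
        lo = int(pos)
        hi = min(lo + 1, old_len - 1)
        frac = pos - lo
        out.append(signal[lo] * (1 - frac) + signal[hi] * frac)
    return out
```

Downscaling a volume applies this along each of the three axes (or equivalently one trilinear interpolation).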

[0010] The processor 103 is further configured to select all voxels belonging to the localized anatomical structure by finding a minimal bounding rectangle around the voxels and the surrounding region for cropping as a defined anatomical structure by the localization layer. The bounding rectangle extends by at least 15 mm vertically and 8 mm horizontally (equally in all directions) to capture the tooth and surrounding context.
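The minimal-bounding-rectangle-plus-margin crop described above can be sketched on a 2-D mask slice. This is an illustrative sketch assuming 1 voxel per millimeter after rescaling, so the 15 mm and 8 mm margins become voxel counts; the function name and margin defaults are only for demonstration:

```python
def crop_box(mask, margin_v=15, margin_h=8):
    """Minimal bounding rectangle around nonzero mask voxels, grown by margins.

    `mask` is a non-empty 2-D list with at least one nonzero entry.
    Returns (row_min, row_max, col_min, col_max), clipped to the image,
    with the margin applied equally in all directions.
    """
    rows = [r for r, row in enumerate(mask) for v in row if v]
    cols = [c for row in mask for c, v in enumerate(row) if v]
    h, w = len(mask), len(mask[0])
    r0 = max(min(rows) - margin_v, 0)
    r1 = min(max(rows) + margin_v, h - 1)
    c0 = max(min(cols) - margin_h, 0)
    c1 = min(max(cols) + margin_h, w - 1)
    return r0, r1, c0, c1
```

The 3-D case adds the same min/max-plus-margin computation along the depth axis.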

[0011] FIG. 1B illustrates, in a block diagram 110, an automated parsing pipeline system for anatomical localization and condition classification, according to another embodiment. The automated parsing pipeline system further comprises a detection module 106. The processor 103 is configured to detect or classify the conditions for each defined anatomical structure within the cropped image by a detection module or classification layer 106. In one embodiment, the classification is achieved using a DenseNet 3-D convolutional neural network.

[0012] In one embodiment, the localization layer 105 includes 33-class semantic segmentation in 3D. In one embodiment, the system is configured to classify each voxel as one of 32 teeth or background, and the resulting segmentation assigns each voxel to one of 33 classes. In another embodiment, the system is configured to classify each voxel as either tooth or another anatomical structure of interest. In the case of localizing only teeth, the classification includes, but is not limited to, 2 classes. Then individual instances of every class (teeth) could be split, e.g., by separately predicting a boundary between them. In some embodiments, the anatomical structure being localized includes, but is not limited to, teeth, upper and lower jaw bone, sinuses, lower jaw canal and joint.

[0013] In one embodiment, the system utilizes a fully convolutional network. In another embodiment, the system works on downscaled images (typically from 0.1-0.2 mm voxel resolution to 1.0 mm resolution) and grayscale (1-channel) images (say, a 1x100x100x100-dimensional tensor). In yet another embodiment, the system outputs a 33-channel image (say, a 33x100x100x100-dimensional tensor) that is interpreted as a probability distribution for non-tooth vs. each of 32 possible (for adult human) teeth, for every pixel.
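Interpreting the 33-channel output as a per-voxel probability distribution amounts to a softmax over the channel axis followed by an argmax. A sketch for a single voxel's 33 raw network outputs (function names are illustrative):

```python
import math

def voxel_probabilities(logits):
    """Softmax over one voxel's 33 raw outputs (background + 32 adult teeth).

    Subtracting the max first keeps exp() numerically stable.
    """
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def voxel_label(logits):
    """Assign the voxel its most probable class (0 = non-tooth, 1-32 = tooth)."""
    probs = voxel_probabilities(logits)
    return probs.index(max(probs))
```

A real implementation applies this across the whole 33x100x100x100 tensor in one vectorized operation rather than per voxel.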

[0014] In an alternative embodiment, the system provides 2-class segmentation, which includes labelling or classification of whether the localization comprises a tooth or not. The system additionally outputs an assignment of each tooth voxel to a separate "tooth instance".

[0015] In one embodiment, the system comprises a V-Net predicting multiple "energy levels", which are later used to find boundaries. In another embodiment, a recurrent neural network could be used for step-by-step prediction of teeth, keeping track of the teeth that were output a step before. In yet another embodiment, Mask-RCNN generalized to 3D could be used by the system. In yet another embodiment, the system could take multiple crops from the 3D image in original resolution, perform instance segmentation, and then join the crops to form a mask for the full original image. In another embodiment, the system could apply either segmentation or object detection in 2D, to segment axial slices. This would allow processing images in original resolution (albeit in 2D instead of 3D) and then inferring 3D shape from the 2D segmentations.

[0016] In one embodiment, the system could be implemented utilizing descriptor learning in the multitask learning framework, i.e., a single network learning to output predictions for multiple dental conditions. This could be achieved by balancing the loss between tasks to make sure every class of every task has approximately the same impact on the learning. The loss is balanced by maintaining a running average of the gradient that the network receives from every class/task and normalizing it. Alternatively, descriptor learning could be achieved by teaching the network on batches consisting of data about a single condition (task) and sampling examples into these batches in such a way that all classes have the same number of examples in a batch (which is generally not possible in a multitask setup). Further, standard data augmentation could be applied to 3D tooth images to perform scale, crop, rotation, and vertical flips, then combining all augmentations and the final image resize to target dimensions in a single affine transform and applying them all at once.

[0017] Advantageously, in some embodiments, to accumulate positive cases faster, a weak model could be trained and run on all of the unlabeled data. From the resulting predictions, teeth for which the model gives high scores on some rare pathology of interest are selected. Then, the teeth are sent to be labelled by humans or users and added to the dataset (both positive and negative human labels). This allows quickly and cost-efficiently building up a more balanced dataset for rare pathologies.

[0018] In some embodiments, the system uses the coarse segmentation mask from the localizer as an input instead of the tooth image. In some embodiments, the descriptor could be trained to output a fine segmentation mask from some of the intermediate layers. In some embodiments, the descriptor could be trained to predict the tooth number.

[0019] As an alternative to the multitask learning approach, "one network per condition" could be employed, i.e., models for different conditions are completely separate models that share no parameters. Another alternative is to have a small shared base network and use separate subnetworks connected to this base network, responsible for specific conditions/diagnoses.

[0020] FIG. 2A illustrates, in a block diagram 200, an automated parsing pipeline system for anatomical localization and condition classification according to yet another embodiment. In an embodiment, the system comprises an input system 204, an output system 202, a memory system or unit 206, a processor system 208, an input/output system 210, and an interface 212. Referring to FIG. 2B, the processor system 208 comprises a volumetric image processor 208a, a voxel parsing engine 208b in communication with the volumetric image processor 208a, a localization layer 208c in communication with the voxel parsing engine 208b, and a detection module 208d in communication with the localization module 208c. The processor 208 is configured to receive at least one volumetric image via an input system 202. The at least one received volumetric image comprises a 3-D pixel array. The 3-D pixel array is pre-processed to convert it into an array of Hounsfield Unit (HU) radio intensity measurements. Then, the processor 208 is configured to parse the received volumetric image data into at least a single image frame field of view by the said volumetric image processor.

[0021] The anatomical structures residing in the at least single field of view are localized by assigning each voxel a distinct anatomical structure by the voxel parsing engine 208b.

[0022] The processor 208 is configured to select all voxels belonging to the localized anatomical structure by finding a minimal bounding rectangle around the voxels and the surrounding region for cropping as a defined anatomical structure by the localization layer 208c. Then, the conditions for each defined anatomical structure within the cropped image are classified by a detection module or classification layer 208d.
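A minimal numpy sketch of this cropping step follows. The margin size, array shapes and the toy mask are hypothetical; the sketch only illustrates the bounding-box-plus-surroundings idea described above.

```python
import numpy as np

# Sketch of the localization-layer step: take the voxels of one localized
# structure, find a minimal bounding box around them, and crop it together
# with a surrounding margin. The margin of 2 voxels is illustrative.
def crop_structure(volume, mask, margin=2):
    coords = np.argwhere(mask)                        # voxel coordinates of the structure
    lo = np.maximum(coords.min(axis=0) - margin, 0)   # clamp to the volume bounds
    hi = np.minimum(coords.max(axis=0) + 1 + margin, volume.shape)
    return volume[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]

volume = np.zeros((10, 10, 10))
mask = np.zeros_like(volume, dtype=bool)
mask[4:6, 4:6, 4:6] = True                            # a toy 2x2x2 "tooth"
crop = crop_structure(volume, mask)                   # 2 voxels of structure + 2*margin per axis
```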

[0023] Below is an example of a hypothetical use case where the colors of the image need to be inverted, retaining the same transparency as the original image frame, for further analysis. The resulting pixel matrix is broken down into its individual pixels, tagged, and assigned an identity in the pixel matrix layer, before being passed onto the pixel aggregator for mapping into the pixel identity database (not shown). The following represents an illustrative algorithm:

Step 1: Load the image into memory.

Step 2: Retrieve a pixel from the pixel matrix.

Step 3: Retrieve the color of the pixel retrieved in Step 2.

Step 4: Invert the color of the pixel from Step 3 into its inverse for each component R, G & B.

Step 5: Store the color computed in Step 4 back to the pixel matrix.

Step 6: Repeat Steps 2-5 for all pixels.

Step 7: Update the image with the newly computed pixel matrix.

The following represents illustrative pseudo code:

Image image;
pixels = image.GetPixelMatrix();
for ( int h = 0; h < image.height; h++ )
{
    for ( int w = 0; w < image.width; w++ )
    {
        Color color = pixels[h][w];
        color = Color( 255 - color.R, 255 - color.G, 255 - color.B, color.A );
        pixels[h][w] = color;
    }
}
image.SetPixelMatrix( pixels );

[0024] The following example demonstrates a possible hashing of the pixel identities. It uses the color names as keys for each bucket.

Hashmap< String, Color, Int, Int > pixelIdentity
    = new Hashmap< String, Color, Int, Int >();
pixelIdentity.insert( "red", Color( 255, 0, 0, 255 ), 0, 0 );
pixelIdentity.insert( "red", Color( 224, 0, 0, 255 ), 0, 1 );
pixelIdentity.insert( "blue", Color( 0, 0, 255, 255 ), 250, 0 );
pixelIdentity.insert( "blue", Color( 0, 0, 230, 255 ), 350, 1 );

[0025] The above pseudo code snippet demonstrates the possibility of grouping similar colors. Alternatively, the data can be flattened and encoded into one value for hashing, for example:

Hashmap< String, Color, Int, Int > pixelIdentity
    = new Hashmap< String, Color, Int, Int >();
pixelIdentity.insert( "red", Color( 255, 0, 0, 255 ), 0, 0 );

can be written as:

Hashmap< String, String > pixelIdentity
    = new Hashmap< String, String >();
pixelIdentity.insert( "red", "255000000255:00000000000000000000" );

[0026] FIG. 3A illustrates in a flow diagram 300, an automated parsing pipeline method for anatomical localization and condition classification, according to an embodiment. At step 301, an input image data is received. In one embodiment, the image data is a volumetric image data. At step 302, the received volumetric image is parsed into at least a single image frame field of view. The parsed volumetric image is pre-processed by controlling the image intensity value.

[0027] At step 304, a tooth or anatomical structure inside the pre-processed and parsed volumetric image is localized and identified by tooth number. At step 306, the identified tooth and surrounding context within the localized volumetric image are extracted. At step 308, a visual report is reconstructed with the localized and defined anatomical structure. In some embodiments, the visual reports include, but are not limited to, an endodontic report (with focus on the tooth's root/canal system and its treatment state), an implantation report (with focus on the area where the tooth is missing), and a dystopic tooth report for tooth extraction (with focus on the area of dystopic/impacted teeth).

[0028] FIG. 3B illustrates in a flow diagram 310, an automated parsing pipeline method for anatomical localization and condition classification, according to another embodiment. At step 312, at least one volumetric image data is received from a radio-image gathering source by a volumetric image processor.

[0029] At step 314, the received volumetric image data is parsed into at least a single image frame field of view by the volumetric image processor. The at least single image frame field of view is pre-processed by controlling the image intensity value by the volumetric image processor. At step 316, an anatomical structure residing in the at least single pre-processed field of view is localized by assigning each voxel a distinct anatomical structure ID by the voxel parsing engine. At step 318, all voxels belonging to the localized anatomical structure are selected by finding a minimal bounding rectangle around the voxels and the surrounding region for cropping as a defined anatomical structure by the localization layer.

At step 320, a visual report is reconstructed with the defined and localized anatomical structure. At step 322, conditions for each defined anatomical structure are classified within the cropped image by the classification layer.

[0030] FIG. 4 illustrates in a block diagram 400, the automated parsing pipeline architecture according to an embodiment. According to an embodiment, the system is configured to receive input image data from a plurality of capturing devices, or input event sources 402. A processor 404 includes an image processor, a voxel parsing engine and a localization layer. The image processor is configured to parse the image into each image frame and pre-process the parsed image. The voxel parsing engine is configured to localize an anatomical structure residing in the at least single pre-processed field of view by assigning each voxel a distinct anatomical structure ID. The localization layer is configured to select all voxels belonging to the localized anatomical structure by finding a minimal bounding rectangle around the voxels and the surrounding region for cropping as a defined anatomical structure. The detection module 406 is configured to detect the condition of the defined anatomical structure. The detected condition could be sent to the cloud/remote server, for automation, to EMR and to proxy health provisioning 408. In another embodiment, the detected condition could be sent to controllers 410. The controllers 410 include reports and updates, dashboard alerts, an export option or a store option to save, search, print or email, and a sign-in/verification unit.

[0031] Referring to FIG. 5, an example screenshot 500 of tooth localization done by the present system is illustrated. This figure shows examples of teeth segmentation at axial slices of a 3D tensor.

[0032] Problem: The problem of tooth localization is formulated as a 33-class semantic segmentation. Therefore, each of the 32 teeth and the background are interpreted as separate classes.

[0033] Model: A V-Net-based fully convolutional network is used. The V-Net is 6 levels deep, with widths of 32, 64, 128, 256, 512, and 1024. The final layer has an output width of 33, interpreted as a softmax distribution over each voxel, assigning it to either the background or one of the 32 teeth. Each block contains 3x3x3 convolutions with padding of 1 and stride of 1, followed by ReLU non-linear activations and a dropout with 0.1 rate. Instance normalization before each convolution is used. Batch normalization is not suitable in this case, as there is only one example in a batch (GPU memory limits); therefore, batch statistics are not determined.
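Instance normalization, unlike batch normalization, computes its statistics per example and per channel over the spatial axes only, which is why a batch size of 1 poses no problem. A minimal numpy sketch (shapes and epsilon are illustrative, not the disclosed implementation):

```python
import numpy as np

# Instance normalization for a single volumetric example: normalize each
# channel independently over its spatial (D, H, W) axes, so no batch
# statistics are needed. eps guards against division by zero.
def instance_norm(x, eps=1e-5):
    """x: (channels, D, H, W) feature map for one example."""
    mean = x.mean(axis=(1, 2, 3), keepdims=True)
    var = x.var(axis=(1, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

x = np.random.default_rng(0).standard_normal((2, 4, 4, 4))
y = instance_norm(x)   # each channel now has near-zero mean and near-unit variance
```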

[0034] Different architecture modifications were tried during the research stage. For example, an architecture with 64, 64, 128, 128, 256, 256 units per layer leads to vanishing gradient flow and, thus, no training. On the other hand, reducing the architecture to the first three levels (three down and three up) gives a comparable result to the proposed model, though the final loss remains higher.

[0035] Loss function: Let R be the ground truth segmentation with voxel values ri (0 or 1 for each class), and P the predicted probabilistic map for each class with voxel values pi. As a loss function, the soft negative multi-class Jaccard similarity is used, which can be defined as:

L = -(1/N) * Σc [ (Σi pi·ri + b) / (Σi pi + Σi ri - Σi pi·ri + b) ]

where N is the number of classes, which in our case is 32, and b is a loss function stability coefficient that helps to avoid the numerical issue of dividing by zero. The model is then trained to convergence using an Adam optimizer with a learning rate of 1e-4 and weight decay of 1e-8. A batch size of 1 is used due to the large memory requirements of using volumetric data and models. The training is stopped after 200 epochs and the latest checkpoint is used (validation loss does not increase after reaching the convergence plateau).
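The loss described above can be written as a short numpy function. The fragment below reconstructs the formula from the surrounding description (intersection over union per class, smoothed by b, negated and averaged); shapes and the value of b are illustrative.

```python
import numpy as np

# Sketch of a soft negative multi-class Jaccard loss: R is the ground-truth
# mask, P the predicted probability map, b a stability coefficient against
# division by zero. A perfect prediction drives the loss toward -1.
def soft_jaccard_loss(P, R, b=1e-5):
    """P, R: arrays of shape (classes, voxels)."""
    inter = (P * R).sum(axis=1)
    union = P.sum(axis=1) + R.sum(axis=1) - inter
    return -np.mean((inter + b) / (union + b))

R = np.array([[1.0, 0.0, 1.0]])          # one class, three voxels
loss_perfect = soft_jaccard_loss(R, R)   # perfect prediction
```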

[0036] Results: The localization model is able to achieve a loss value of 0.28 on a test set. The background class loss is 0.0027, which means the model is a capable 2-way "tooth / not a tooth" segmentor. The localization intersection over union (IoU) between the tooth's ground truth volumetric bounding box and the model-predicted bounding box is also defined. In the case where a tooth is missing from the ground truth and the model predicted any positive voxels (i.e., the ground truth bounding box is not defined), the localization IoU is set to 0. In the case where a tooth is missing from the ground truth and the model did not predict any positive voxels for it, the localization IoU is set to 1. For a human-interpretable metric, tooth localization accuracy is used, defined as the percent of teeth that have a localization IoU greater than 0.3. The relatively low threshold value of 0.3 was decided from the manual observation that even low localization IoU values are enough to approximately localize teeth for the downstream processing. The localization model achieved a value of 0.963 on this metric on the test set, which, on average, equates to the incorrect localization of 1 of 32 teeth.
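The metric and its missing-tooth conventions can be sketched directly. The IoU values below are invented for illustration; only the special-case logic and the 0.3 threshold follow the description above.

```python
# Sketch of the tooth localization accuracy metric: a tooth counts as
# correctly localized when its bounding-box IoU with the ground truth
# exceeds 0.3. The missing-tooth conventions follow the text: missing in
# ground truth + positive prediction -> 0, missing + no prediction -> 1.

def tooth_iou(gt_present, predicted_positive, iou_if_both):
    if not gt_present:
        return 0.0 if predicted_positive else 1.0
    return iou_if_both

def localization_accuracy(ious, threshold=0.3):
    return sum(iou > threshold for iou in ious) / len(ious)

ious = [tooth_iou(True, True, 0.9),    # good localization
        tooth_iou(False, False, None), # correctly predicted absence -> 1.0
        tooth_iou(False, True, None),  # false positive on a missing tooth -> 0.0
        tooth_iou(True, True, 0.2)]    # below the 0.3 threshold
acc = localization_accuracy(ious)      # 2 of 4 teeth pass -> 0.5
```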

[0037] Referring to FIGs. 6A-6C, example screenshots (600A, 600B, 600C) of tooth sub-volume extraction done by the present system are illustrated.

[0038] In order to focus the downstream classification model on describing a specific tooth of interest, the tooth and its surroundings are extracted from the original study as a rectangular volumetric region, centered on the tooth. In order to get the coordinates of the tooth, the upstream segmentation mask is used. The predicted volumetric binary mask of each tooth is preprocessed by applying erosion, dilation, and then selecting the largest connected component. A minimum bounding rectangle is found around the predicted volumetric mask. Then, the bounding box is extended by 15 mm vertically and 8 mm horizontally (equally in all directions) to capture the tooth context and to correct possibly weak localizer performance. Finally, a corresponding sub-volume is extracted from the original clipped image, rescaled to 64³ and passed on to the classifier. An example of a sub-volume bounding box is presented in FIGs. 6A-6C.

[0039] Referring to FIG. 7, a receiver operating characteristic (ROC) curve 700 of a predicted tooth condition is illustrated.

[0040] Model: The classification model has a DenseNet architecture. The only difference between the original DenseNet and the implementation of the present invention is a replacement of the 2D convolution layers with 3D ones. 4 dense blocks of 6 layers are used, with a growth rate of 48 and a compression factor of 0.5. After passing the 64³ input through the 4 dense blocks followed by down-sampling transitions, the resulting feature map is 548 x 2 x 2 x 2. This feature map is flattened and passed through a final linear layer that outputs 6 logits, each for a type of abnormality.

[0041] Loss function: Since tooth conditions are not mutually exclusive, binary cross entropy is used as a loss. To handle class imbalance, each condition loss is weighted inversely proportional to its frequency (positive rate) in the training set. Suppose that Fi is the frequency of condition i, pi is its predicted probability (sigmoid on the output of the network) and ti is the ground truth. Then Li = (1 - Fi)·ti·log pi + Fi·(1 - ti)·log(1 - pi) is the loss function for condition i. The final example loss is taken as the average of the 6 condition losses.
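The weighted binary cross entropy above can be sketched in numpy. The sign convention below (negating the weighted log-likelihood so the loss is minimized) and all numeric values are assumptions for illustration; only the (1 - Fi) / Fi weighting scheme comes from the description.

```python
import numpy as np

# Sketch of class-imbalance-weighted binary cross entropy: p are predicted
# probabilities, t ground-truth labels, f per-condition training-set
# frequencies. Positive terms are weighted by (1 - f), negatives by f,
# and the example loss is the mean over conditions.
def weighted_bce(p, t, f, eps=1e-7):
    p = np.clip(p, eps, 1.0 - eps)   # guard log(0)
    li = -((1.0 - f) * t * np.log(p) + f * (1.0 - t) * np.log(1.0 - p))
    return float(li.mean())

p = np.array([0.9, 0.2, 0.8, 0.1, 0.7, 0.6])   # hypothetical predictions
t = np.array([1.0, 0.0, 1.0, 0.0, 1.0, 1.0])   # hypothetical ground truth
f = np.array([0.5, 0.1, 0.3, 0.05, 0.2, 0.4])  # hypothetical condition frequencies
loss = weighted_bce(p, t, f)
```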

[0042] Results: The classification model achieved an average area under the receiver operating characteristic curve (ROC AUC) of 0.94 across the 6 conditions. Per-condition scores are presented in the above table. Receiver operating characteristic (ROC) curves 700 of the 6 predicted conditions are illustrated in FIG. 7.

[0043] Advantageously, the present invention provides an end-to-end pipeline for detecting the state or condition of the teeth in dental 3D CBCT scans. The condition of the teeth is detected by localizing each present tooth inside an image volume and predicting the condition of the tooth from the volumetric image of the tooth and its surroundings. Further, the performance of the localization model allows building a high-quality 2D panoramic reconstruction, which provides a familiar and convenient way for a dentist to inspect a 3D CBCT image. The performance of the pipeline is improved by adding volumetric data augmentations during training; reformulating the localization task as instance segmentation instead of semantic segmentation; reformulating the localization task as object detection; and using different class imbalance handling approaches for the classification model. Alternatively, the jaw region of interest is localized and extracted as a first step in the pipeline. The jaw region typically takes around 30% of the image volume and has adequate visual distinction. Extracting it with a shallow/small model would allow for larger downstream models. Further, the diagnostic coverage of the present invention extends from basic tooth conditions to other diagnostically relevant conditions and pathologies.

[0044] The figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. It should also be noted that, in some alternative implementations, the functions noted/illustrated may occur out of the order noted. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.

[0045] Since various possible embodiments might be made of the above invention, and since various changes might be made in the embodiments above set forth, it is to be understood that all matter herein described or shown in the accompanying drawings is to be interpreted as illustrative and not to be considered in a limiting sense. Thus, it will be understood by those skilled in the art of creating independent multi-layered virtual workspace applications designed for use with independent multiple input systems that although the preferred and alternate embodiments have been shown and described in accordance with the Patent Statutes, the invention is not limited thereto or thereby.

[0046] The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

[0047] Some portions of the embodiments disclosed are implemented as a program product for use with an embedded processor. The program(s) of the program product defines functions of the embodiments (including the methods described herein) and can be contained on a variety of signal-bearing media. Illustrative signal-bearing media include, but are not limited to: (i) information permanently stored on non-writable storage media (e.g., read-only memory devices within a computer such as CD-ROM disks readable by a CD-ROM drive); (ii) alterable information stored on writable storage media (e.g., floppy disks within a diskette drive or hard-disk drive, solid state disk drive, etc.); and (iii) information conveyed to a computer by a communications medium, such as through a computer or telephone network, including wireless communications. The latter embodiment specifically includes information downloaded from the Internet and other networks. Such signal-bearing media, when carrying computer-readable instructions that direct the functions of the present invention, represent embodiments of the present invention.

[0048] In general, the routines executed to implement the embodiments of the invention may be part of an operating system or a specific application, component, program, module, object, or sequence of instructions. The computer program of the present invention typically is comprised of a multitude of instructions that will be translated by the native computer into a machine-accessible format and hence executable instructions. Also, programs are comprised of variables and data structures that either reside locally to the program or are found in memory or on storage devices. In addition, various programs described may be identified based upon the application for which they are implemented in a specific embodiment of the invention. However, it should be appreciated that any particular program nomenclature that follows is used merely for convenience, and thus the invention should not be limited to use solely in any specific application identified and/or implied by such nomenclature.

[0049] The present invention and some of its advantages have been described in detail for some embodiments. It should be understood that although the system and process is described with reference to Pixel Matrix Data Systems and Methods, the system and process may be used in other contexts as well. It should also be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims. An embodiment of the invention may achieve multiple objectives, but not every embodiment falling within the scope of the attached claims will achieve every objective. Moreover, the scope of the present application is not intended to be limited to the particular embodiments of the process, machine, manufacture, and composition of matter, means, methods and steps described in the specification. A person having ordinary skill in the art will readily appreciate from the disclosure of the present invention that processes, machines, manufacture, compositions of matter, means, methods, or steps, presently existing or later to be developed, are equivalent to, and fall within the scope of, what is claimed. Accordingly, the appended claims are intended to include within their scope such processes, machines, manufacture, compositions of matter, means, methods, or steps.

[0050] All elements, parts and steps described herein are preferably included. It is to be understood that any of these elements, parts and steps may be replaced by other elements, parts and steps or deleted altogether as will be obvious to those skilled in the art.

[0051] Broadly, this writing has disclosed at least the following: An automated parsing pipeline system and method for anatomical localization and condition classification is disclosed. The system comprises an input event source, a memory unit and a processor including a volumetric image processor, a voxel parsing engine, a localization layer and a detection module. The volumetric image processor is configured to receive a volumetric image from the input source and parse the received volumetric image. The voxel parsing engine is configured to assign each voxel a distinct anatomical structure. The localization layer is configured to crop a defined anatomical structure with surroundings. The detection module is configured to classify conditions for each defined anatomical structure within the cropped image. The disclosed system and method provide accurate localization of a tooth and detect several common conditions in each tooth.

[0052] This writing also presents at least the following concepts.

Concepts

1. An automated parsing pipeline system for anatomical localization and condition classification, said system comprising:

a processor;

a non-transitory storage element coupled to the processor;

encoded instructions stored in the non-transitory storage element, wherein the encoded instructions when implemented by the processor, configure the automated parsing pipeline system to:

receive at least one volumetric image;

parse the received volumetric image into at least a single image frame field of view;

pre-process the parsed volumetric image by at least controlling for image intensity value;

localize a present tooth inside the pre-processed and parsed volumetric image and identifying it by number;

extract the identified tooth and surrounding context within the localized volumetric image; and

classify a tooth's conditions based on the extracted volumetric image.

2. The system of concept 1, wherein the at least one received volumetric image comprises a 3-D pixel array.

3. The system of concept 2, further configured to pre-process by converting the 3-D pixel array into an array of Hounsfield Unit (HU) radio intensity measurements.

4. The system of concept 1, 2 or 3, further configured to pre-process at least one of the localization or classification steps by rescaling using linear interpolation.

5. The system of concept 1 or 4, wherein the pre-processing comprises using any one of a number of normalization schemes to account for variations in image value intensity depending on at least one of an input or output of the volumetric image.

6. The system of concept 1, wherein the localization is achieved using a V-Net-based fully convolutional neural network.

7. The system of concept 1, further configured to extract an anatomical structure by finding a minimum bounding rectangle around the localized and identified tooth.

8. The system of concept 1 or 7, wherein the bounding rectangle extends by at least 15 mm vertically and 8 mm horizontally (equally in all directions) to capture the tooth and surrounding context.

9. The system of concept 1, wherein the classification is achieved using a DenseNet 3-D convolutional neural network.

10. An automated parsing pipeline system for anatomical localization, said system comprising:

a volumetric image processor;

a voxel parsing engine;

a localization layer;

a processor;

a non-transitory storage element coupled to the processor;

encoded instructions stored in the non-transitory storage element, wherein the encoded instructions when implemented by the processor, configure the automated parsing pipeline system to:

receive at least one volumetric image data received from a radio-image gathering source by the volumetric image processor;

parse the received volumetric image data into at least a single image frame field of view by said volumetric image processor;

localize anatomical structures residing in the at least single field of view by assigning each voxel a distinct anatomical structure by the voxel parsing engine; and

select all voxels belonging to the localized anatomical structure by finding a minimal bounding rectangle around the voxels and the surrounding region for cropping as a defined anatomical structure by the localization layer.

11. The system of concept 10, wherein the at least one received volumetric image comprises a 3-D pixel array.

12. The system of concept 10 or 11, further configured to pre-process by converting the 3-D pixel array into an array of Hounsfield Unit (HU) radio intensity measurements.

13. The system of concept 10, further configured to pre-process at least one of the localization or classification steps by rescaling using linear interpolation.

14. The system of concept 10 or 13, wherein the pre-processing comprises using any one of a number of normalization schemes to account for variations in image value intensity depending on at least one of an input or output of the volumetric image.

15. The system of concept 10, wherein the localization is achieved using a V-Net-based fully convolutional neural network.

16. The system of concept 10, wherein the extraction is achieved by finding a minimum bounding rectangle around the localized and identified tooth.

17. The system of concept 10 or 16, wherein the bounding rectangle extends by at least 15 mm vertically and 8 mm horizontally (equally in all directions) to capture the tooth and surrounding context.

18. An automated parsing pipeline system for anatomical localization and condition classification, said system comprising:

a volumetric image processor;

a voxel parsing engine;

a localization layer;

a classification layer;

a processor;

a non-transitory storage element coupled to the processor;

encoded instructions stored in the non-transitory storage element, wherein the encoded instructions when implemented by the processor, configure the automated parsing pipeline system to:

receive at least one volumetric image data received from a radio-image gathering source by the volumetric image processor;

parse the received volumetric image data into at least a single image frame field of view by said volumetric image processor;

localize anatomical structures residing in the at least single field of view by assigning each voxel a distinct anatomical structure ID by the voxel parsing engine;

select all voxels belonging to the localized anatomical structure by finding a minimal bounding rectangle around the voxels and the surrounding region for cropping as a defined anatomical structure by the localization layer; and

detect conditions for each defined anatomical structure within the cropped image by the classification layer.

19. The system of concept 18, wherein the at least one received volumetric image comprises a 3-D pixel array.

20. The system of concept 18, further configured to pre-process by converting the 3-D pixel array into an array of Hounsfield Unit (HU) radio intensity measurements.

21. The system of concept 18 or 20, wherein the pre-processing of at least one of the localization or classification steps comprises rescaling using linear interpolation.

22. The system of concept 18, wherein the pre-processing comprises using any one of a number of normalization schemes to account for variations in image value intensity depending on at least one of an input or output of the volumetric image.

23. The system of concept 18, wherein the localization is achieved using a V-Net-based fully convolutional neural network.

24. The system of concept 18, further configured to extract an anatomical structure by finding a minimum bounding rectangle around the localized and identified tooth.

25. The system of concept 18 or 24, wherein the bounding rectangle extends by at least 15 mm vertically and 8 mm horizontally (equally in all directions) to capture the tooth and surrounding context.

26. The system of concept 18, wherein the classification is achieved using a DenseNet 3-D convolutional neural network.

27. An automated parsing pipeline system for anatomical localization and condition classification, said system comprising:

a processor;

a non-transitory storage element coupled to the processor;

encoded instructions stored in the non-transitory storage element, wherein the encoded instructions when implemented by the processor, configure the automated parsing pipeline system to:

receive at least one 2-D image;

parse the received image into at least a single image frame field of view;

pre-process the parsed image by at least controlling for image intensity value;

localize a present tooth inside the pre-processed and parsed image and identifying it by number;

extract the identified tooth and surrounding context within the localized image; and

classify a tooth's conditions based on the extracted image.

28. A method for localizing a tooth and classifying a tooth condition, said method comprising the steps of:

receiving at least one volumetric image;

parsing the received volumetric image into at least a single image frame field of view;

pre-processing the parsed volumetric image by at least controlling for image intensity value;

localizing a present tooth inside the pre-processed and parsed volumetric image and identifying it by number;

extracting the identified tooth and surrounding context within the localized volumetric image; and

classifying a tooth's conditions based on the extracted volumetric image.

29. The method of concept 28, wherein the at least one received volumetric image comprises a 3-D pixel array.

30. The method of concept 28, further including a step of pre-processing by converting the 3-D pixel array into an array of Hounsfield Unit (HU) radio intensity measurements.

31. The method of concept 28 or 30, wherein the pre-processing for at least one of the localization or classification steps comprises rescaling using linear interpolation.

32. The method of concept 28, wherein the pre-processing comprises using any one of a number of normalization schemes to account for variations in image value intensity depending on at least one of an input or output of the volumetric image.

33. The method of concept 28, wherein the localization is achieved using a V-Net-based fully convolutional neural network.

34. The method of concept 28, further comprising a step of achieving extraction by finding a minimum bounding rectangle around the localized and identified tooth.

35. The method of concept 34, wherein the bounding rectangle extends by at least 15 mm vertically and 8 mm horizontally (equally in all directions) to capture the tooth and surrounding context.

36. The method of concept 28, wherein the classification is achieved using a DenseNet 3-D convolutional neural network.

37. A method for localizing a tooth and classifying a tooth condition, said method comprising the steps of:

receiving at least one volumetric image data received from a radio-image gathering source by a volumetric image processor;

parsing the received volumetric image data into at least a single image frame field of view by said volumetric image processor;

pre-processing the at least single image frame field of view by controlling for image intensity value by the volumetric image processor;

localizing an anatomical structure residing in the at least single pre- processed field of view by assigning each voxel a distinct anatomical structure ID by the voxel parsing engine;

selecting all voxels belonging to the localized anatomical structure by finding a minimal bounding rectangle around the voxels and the surrounding region for cropping as a defined anatomical structure by the localization layer; and

classifying conditions for each defined anatomical structure within the cropped image by the classification layer.