Title:
METHOD FOR PROCESSING 3D IMAGING DATA AND ASSISTING WITH PROGNOSIS OF CANCER
Document Type and Number:
WIPO Patent Application WO/2023/222818
Kind Code:
A1
Abstract:
A method of processing imaging data of a patient having cancer, for instance lymphoma, is disclosed, comprising: providing three-dimensional imaging data of the patient; computing from said three-dimensional imaging data at least one two-dimensional Maximum Intensity Projection (MIP) image, corresponding to the projection of the maximum intensity of the three-dimensional imaging data along one direction onto one plane; and extracting a mask of the MIP image corresponding to cancerous lesions by application of a trained model. Using the extracted mask, one or more cancer prognosis indicators can be computed.

Inventors:
BUVAT IRÈNE (FR)
GIRUM KIBROM (FR)
Application Number:
PCT/EP2023/063366
Publication Date:
November 23, 2023
Filing Date:
May 17, 2023
Assignee:
INST NAT SANTE RECH MED (FR)
INST CURIE (FR)
UNIV PARIS SACLAY (FR)
International Classes:
G06N3/02; G06N20/00; G06T7/00; G06T7/10; G06T7/11; G06T15/08
Other References:
BLANC-DURAND P, et al.: "Fully automatic segmentation of diffuse large B cell lymphoma lesions on 3D FDG-PET/CT for total metabolic tumour volume prediction using a convolutional neural network", European Journal of Nuclear Medicine and Molecular Imaging, vol. 48, no. 5, 24 October 2020, pages 1362-1370, DOI: 10.1007/s00259-020-05080-7
BLANC-DURAND P: Supplemental fig. 1 to "Fully automatic segmentation of diffuse large B cell lymphoma lesions on 3D FDG-PET/CT for total metabolic tumour volume prediction using a convolutional neural network", European Journal of Nuclear Medicine and Molecular Imaging, 24 October 2020
WANG W, et al.: "Recurrent U-Net for Resource-Constrained Segmentation", 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 27 October 2019, pages 2142-2151, DOI: 10.1109/ICCV.2019.00223
JAFARI M, et al.: "FU-net: Multi-class Image Segmentation Using Feedback Weighted U-net", arXiv.org, Cornell University Library, 28 April 2020, DOI: 10.1007/978-3-030-34110-7_44
QIANG Z, et al.: "A k-Dense-UNet for Biomedical Image Segmentation", 2019, pages 552-562
WANG K, et al.: "Residual Feedback Network for Breast Lesion Segmentation in Ultrasound Image", 21 September 2021, pages 471-481
MIKHAEEL NG, SMITH D, DUNN JT, et al.: "Combination of baseline metabolic tumour volume and early response on PET/CT improves progression-free survival prediction in DLBCL", Eur J Nucl Med Mol Imaging, vol. 43, 2016, pages 1209-1219, DOI: 10.1007/s00259-016-3315-7
COTTEREAU A-S, NIOCHE C, DIRAND A-S, et al.: "18F-FDG PET Dissemination Features in Diffuse Large B-Cell Lymphoma Are Predictive of Outcome", J Nucl Med, vol. 61, 2020, pages 40-45
SIBILLE L, SEIFERT R, AVRAMOVIC N, et al.: "18F-FDG PET/CT Uptake Classification in Lymphoma and Lung Cancer by Using Deep Convolutional Neural Networks", Radiology, vol. 294, 2020, pages 445-452
BLANC-DURAND P, JEGOU S, KANOUN S, et al.: "Fully Automatic segmentation of Diffuse Large B-cell Lymphoma lesions on 3D FDG-PET/CT for total metabolic tumour volume prediction using a convolutional neural network", Eur J Nucl Med Mol Imaging, vol. 48, 2021, pages 1362-1370, DOI: 10.1007/s00259-020-05080-7
GIRUM KB, et al.: "Learning with Context Feedback Loop for Robust Medical Image Segmentation", IEEE Transactions on Medical Imaging, arXiv:2103.02844, 2021
VERCELLINO L, COTTEREAU AS, CASASNOVAS O, et al.: "High total metabolic tumor volume at baseline predicts survival independent of response to therapy", Blood, vol. 135, 2020, pages 1396-1405
CAPOBIANCO N, MEIGNAN M, COTTEREAU A-S, et al.: "Deep-Learning 18F-FDG Uptake Classification Enables Total Metabolic Tumor Volume Estimation in Diffuse Large B-Cell Lymphoma", J Nucl Med, vol. 62, 2021, pages 30-36
Attorney, Agent or Firm:
PLASSERAUD IP (FR)
Claims:
CLAIMS

1. A method of processing imaging data of a patient having cancer, comprising:

- providing (100) three-dimensional imaging data of the patient,

- computing (200) from said three-dimensional imaging data, at least one two-dimensional Maximum Intensity Projection image, corresponding to the projection of the maximum intensity of the three-dimensional imaging data along one direction onto one plane,

- extracting (300) a mask of the MIP image corresponding to cancerous lesions by application of a trained model.

2. The method according to claim 1, wherein the three-dimensional imaging data is PET scan data.

3. The method according to claim 1 or 2, comprising computing from the three-dimensional imaging data two Maximum Intensity Projection images corresponding to the projection of the maximum intensity of the three-dimensional imaging data onto two orthogonal planes.

4. The method according to claim 3, wherein the model has been previously trained by supervised learning on a database comprising a plurality of MIP images corresponding to projections of three-dimensional imaging data according to a first plane, and a plurality of MIP images corresponding to projections of three-dimensional imaging data according to a second plane, orthogonal to the first, and, for each MIP image, a corresponding mask of the image corresponding to cancerous lesions.

5. The method according to any of the preceding claims, wherein the trained model is a Convolutional Neural Network comprising a forward system comprising:

- an encoder region comprising a succession of layers of decreasing resolutions,

- a decoder region comprising a succession of layers of increasing resolutions, wherein a layer of the decoder region concatenates the output of the layer of the encoder region of the same resolution with the output of the layer of the decoder region of the next lower resolution,

- a bottle-neck region between the encoder and decoder regions,

and a feedback system, comprising an encoder part and a decoder part respectively identical to the encoder region and decoder region of the forward system, where the output of the encoder part is concatenated to the output of the layer of lowest resolution of the forward system for at least one training phase of the network.

6. A method according to the preceding claim, wherein the encoder, decoder and bottle-neck regions of the network comprise building blocks where each building block is a residual block comprising at least a convolutional layer and an activation layer, with a skip connection between the input of the block and the activation layer.

7. A method for assisting with cancer prognosis comprising:

- performing the method according to any of the preceding claims on three-dimensional imaging data of a patient to output a two-dimensional cancerous lesion mask of a MIP image computed from the three-dimensional imaging data, and

- processing (400) said cancerous lesion mask to compute at least one prognosis indicator.

8. The method according to claim 7, wherein at least one prognosis indicator comprises an indicator of the lesion dissemination.

9. The method according to claim 8, wherein processing the cancerous lesion mask comprises computing the distance between tumor pixels belonging to the mask along two orthogonal axes of the mask and summing said dimensions.

10. The method according to any of claims 7-9, wherein at least one prognosis indicator comprises an indicator of the lesion burden.

11. The method according to the preceding claim, wherein processing the cancerous lesion mask comprises computing a number of pixels belonging to the lesion multiplied by the area represented by each pixel.

12. The method according to any of claims 7-11, wherein the cancer is a lymphoma.

13. The method according to claim 12, wherein the lymphoma is Diffuse Large B-cell Lymphoma.

14. A computer-program product comprising code instructions for implementing the method according to any of the preceding claims, when it is executed by a processor.

15. A non-transitory computer readable storage having stored thereon code instructions for implementing the method according to any of claims 1-13, when they are executed by a processor.

Description:
METHOD FOR PROCESSING 3D IMAGING DATA AND ASSISTING WITH PROGNOSIS OF CANCER

TECHNICAL FIELD

The present disclosure relates to the field of medical imaging, more specifically to the processing of three-dimensional imaging data of patients having cancer.

PRIOR ART

Diffuse large B-cell lymphoma (DLBCL) is the most common type of non-Hodgkin lymphoma.

In clinical practice, acquiring an 18F-FDG PET/CT image is standard of care for staging and assessing response in DLBCL patients. Positron Emission Tomography (PET) is a technology which allows locating a radiotracer which has been previously injected into a patient. Typically chosen radiotracers, such as fluorodeoxyglucose (18F-FDG), accumulate in the regions of the body which include cells with a high metabolic activity. Such regions include the brain, the liver, and tumors. PET scan imaging thus allows mapping the tumors of a patient.

Moreover, once an 18F-FDG PET/CT image is acquired for the patient, this image can be processed to compute one or more biomarkers having prognostic value for the patient. It has been widely demonstrated that the total metabolically active tumor volume (TMTV) calculated from 18F-FDG PET images has prognostic value in lymphoma, and especially in DLBCL (Mikhaeel NG, Smith D, Dunn JT et al. "Combination of baseline metabolic tumour volume and early response on PET/CT improves progression-free survival prediction in DLBCL". Eur J Nucl Med Mol Imaging. 2016;43:1209-1219). The disease dissemination, reflected by the largest distance between two lesions in the baseline whole-body 18F-FDG PET/CT image (Dmax), has also been shown to be an early prognostic factor (Cottereau A-S, Nioche C, Dirand A-S et al. "18F-FDG PET Dissemination Features in Diffuse Large B-Cell Lymphoma Are Predictive of Outcome". J Nucl Med. 2020;61:40-45).

TMTV and Dmax calculations require tumor volume delineation over the whole-body three-dimensional (3D) 18F-FDG PET/CT images, which is time-consuming (up to 30 min per patient), prone to observer variability, and complicates the use of these quantitative features in clinical routine. To address this problem, automated lesion segmentation approaches using convolutional neural networks (CNNs) have been proposed in:

Sibille L, Seifert R, Avramovic N, et al. "18F-FDG PET/CT Uptake Classification in Lymphoma and Lung Cancer by Using Deep Convolutional Neural Networks". Radiology. 2020;294:445-452

Blanc-Durand P, Jegou S, Kanoun S, et al. "Fully Automatic segmentation of Diffuse Large B-cell Lymphoma lesions on 3D FDG-PET/CT for total metabolic tumour volume prediction using a convolutional neural network". Eur J Nucl Med Mol Imaging. 2021;48:1362-1370.

These methods have shown promising results, but they require high computational resources to be developed, and tend to miss small lesions. Further, results from CNNs still need to be validated and adjusted by an expert before they can be used for further analysis and subsequent biomarker calculation. This implies a thorough visual analysis of all 3D 18F-FDG PET/CT images and delineation of the lesions missed by the algorithm. Consequently, developing a pipeline that would fully automate the segmentation and/or speed up this checking/adjustment process is highly desirable in clinical practice.

SUMMARY OF THE INVENTION

The aim of the present disclosure is to address the limitations of the prior art. In particular, an aim of the invention is to provide a method for processing three-dimensional imaging data of a patient having cancer in order to delineate a lesion region that is more reliable and less computationally intensive than state-of-the-art methods, and reduces the time needed by an expert to perform post-processing validation.

Accordingly, the present disclosure relates to a method of processing imaging data of a patient having cancer, comprising:

- providing three-dimensional imaging data of the patient,

- computing from said three-dimensional imaging data, at least one two-dimensional Maximum Intensity Projection image, corresponding to the projection of the maximum intensity of the three-dimensional imaging data along one direction onto one plane,

- extracting a mask of the MIP image corresponding to cancerous lesions by application of a trained model.

In embodiments, the three-dimensional imaging data is PET scan data.

In embodiments, the method comprises computing from the three-dimensional imaging data two Maximum Intensity Projection images corresponding to the projection of the maximum intensity of the three-dimensional imaging data onto two orthogonal planes. In this case, the model may have been previously trained by supervised learning on a database comprising a plurality of MIP images corresponding to projections of three-dimensional imaging data according to a first plane, and a plurality of MIP images corresponding to projections of three-dimensional imaging data according to a second plane, orthogonal to the first, and, for each MIP image, a corresponding mask of the image corresponding to cancerous lesions.

In embodiments, the trained model is a Convolutional Neural Network comprising:

- an encoder region comprising a succession of layers of decreasing resolutions,

- a decoder region comprising a succession of layers of increasing resolutions, wherein a layer of the decoder region concatenates the output of the layer of the encoder region of the same resolution with the output of the layer of the decoder region of the next lower resolution,

- a bottle-neck region between the encoder and decoder regions, and

- a feedback system linking the output of the network and the bottle-neck region.

In embodiments, the encoder, decoder and bottle-neck regions of the network comprise building blocks where each building block is a residual block comprising at least a convolutional layer and an activation layer, with a skip connection between the input of the block and the activation layer.

Also disclosed is a method for assisting with cancer prognosis comprising:

- performing the method of processing imaging data according to the above description on three-dimensional imaging data of a patient to output a two-dimensional cancerous lesion mask of a MIP image computed from the three-dimensional imaging data, and

- processing said cancerous lesion mask to compute at least one prognosis indicator.

In embodiments, the at least one prognosis indicator comprises an indicator of the lesion dissemination.

In embodiments, processing the cancerous lesion mask comprises computing the distance between tumor pixels belonging to the mask along two orthogonal axes of the mask and summing said dimensions.

In embodiments, at least one prognosis indicator comprises an indicator of the lesion burden.

In embodiments, processing the cancerous lesion mask comprises computing a number of pixels belonging to the lesion multiplied by the area represented by each pixel.

In embodiments, the cancer is a lymphoma, for instance a Diffuse Large B-cell Lymphoma.

Also disclosed is a computer-program product comprising code instructions for implementing the methods of processing imaging data and for assisting with cancer prognosis according to the above description, when it is executed by a processor.

Also disclosed is a non-transitory computer readable storage having stored thereon code instructions for implementing the methods of processing imaging data and for assisting with cancer prognosis according to the above description, when they are executed by a processor.

The proposed method allows automatically segmenting cancerous lesion regions from 3D imaging data such as PET imaging data, by performing said segmentation on 2D Maximum Intensity Projection (MIP) images obtained from said 3D data, using a trained model. The computational resources needed to train and execute the trained model on a 2D MIP image are greatly reduced as compared to the training and execution of a model on 3D PET imaging data, and the checking/adjustment process performed by an expert is sped up since the expert does not need to analyze a whole 3D PET image, but only the 2D MIP image(s). Meanwhile, the lesion region that is extracted from the 2D MIP image can be processed to extract indicators reflecting the volume of the tumor and the tumor dissemination, which are prognosis indicators that can serve as a basis to estimate the chances of survival of the patient (overall survival, OS), or the chances of progression-free survival (PFS).

DESCRIPTION OF THE DRAWINGS

Other features and advantages of the invention will be apparent from the following detailed description given by way of non-limiting example, with reference to the accompanying drawings, in which:

- Figure 1 schematically represents the main steps of a method according to an embodiment,

- Figure 2 represents 18F-FDG PET MIP images and segmentation results (blue color overlaid on the PET MIP images) by experts (MIP masks) and by the CNN for four patients: (A, B) from the REMARC patient cohort, and (C, D) from the LNH073B patient cohort,

- Figure 3 schematically represents the structure of a Convolutional Neural Network that may be used for segmenting MIP images,

- Figure 4 illustrates the computation of the lesion dissemination feature from a MIP image,

- Figure 5 displays Kaplan-Meier estimates of overall survival (OS) and progression-free survival (PFS) on the REMARC cohort according to 3D 18F-FDG PET/CT image-based features TMTV (cm3) and Dmax (cm) (A, C), and according to PET MIP image-based features (I_B (cm2) and I_D (cm)) estimated using AI (B, D),

- Figure 6 displays Kaplan-Meier estimates of overall survival (OS) and progression-free survival (PFS) on the LNH073B cohort according to 3D 18F-FDG PET/CT image-based features TMTV (cm3) and Dmax (cm) (A, C), and according to PET MIP image-based features (I_B (cm2) and I_D (cm)) estimated using AI (B, D),

- Figure 7 displays confusion matrices for classification of patients using PET features derived from the expert-delineated 3D 18F-FDG PET regions (3D-expert) and from the 2D PET MIP regions delineated by the CNN (2D-AI) on the LNH073B cohort: A) two-risk-group classification using Dmax and I_D, B) two-risk-group classification using TMTV and I_B, and C) three-risk-group classification using TMTV and Dmax (3D-expert), and I_B and I_D (CNN).

DETAILED DESCRIPTION OF AT LEAST ONE EMBODIMENT

With reference to the drawings, a method for processing three-dimensional imaging data of a patient having cancer, and of extracting prognosis indicators therefrom, will now be described.

The method may be implemented by a computing system comprising at least one processor, which may include one or more Central Processing Unit(s) (CPU) and/or Graphics Processing Unit(s) (GPU), and a non-transitory computer-readable medium 11 storing program code that is executable by the processor to implement the method described below. The computing system 1 may also comprise at least one memory 12 storing a trained model configured for extracting a cancer lesion region or mask from a Maximum Intensity Projection (MIP) image obtained from three-dimensional imaging data of a patient.

In embodiments, the method disclosed below may be implemented as a software program by a PET/CT scanner incorporating said at least one processor, and which may also comprise the memory 12 storing the trained model. Alternatively, the memory may be remotely located and accessed via a data network, for instance a wireless network.

With reference to figure 1, the method comprises providing 100 three-dimensional PET imaging data of a patient having cancer.

The three-dimensional imaging data may be Positron Emission Tomography imaging data obtained with the 18F-FDG tracer. The three-dimensional imaging data may be acquired from the skull base to the upper thighs of a patient, and is later denoted as whole-body imaging data. In embodiments, step 100 does not include the actual acquisition of imaging data on a patient, but may comprise retrieving said data from a memory, a Picture Archiving and Communication System (PACS), or a network in which it is stored.

The cancer may be any type of cancer, metastatic or not, including colorectal cancer, breast cancer, lung cancer, lymphoma, in particular non-Hodgkin lymphoma, in particular Diffuse Large B-Cell Lymphoma (DLBCL).

The method then comprises computing 200 from said three-dimensional imaging data, at least one two-dimensional Maximum Intensity Projection (MIP) image, corresponding to the projection of the maximum intensity of the three-dimensional imaging data onto one plane. In other words, a MIP image is a 2D image in which each pixel value is equal to the maximum intensity of the 3D imaging data observed along a ray normal to the plane of projection.

In embodiments, the plane of projection of the MIP image may be the coronal plane, i.e. the vertical plane that partitions the body into front and back. The plane of projection of the MIP image may also be the sagittal plane, i.e. the vertical plane that partitions the body into left and right halves.

In embodiments, one, two or more MIP images are computed from the 3D imaging data, where the MIP images preferably correspond to projections of the maximum intensity of the 3D imaging data onto two orthogonal planes. According to an embodiment shown in figure 2, step 200 may comprise computing one MIP image in the sagittal plane, and one MIP image in the coronal plane.
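For illustration, such a projection reduces to a maximum operation over one axis of the volume. The following is a minimal sketch assuming the volume is stored as a NumPy array; the (z, y, x) axis ordering and the array shapes are assumptions for the example and are not specified by the present disclosure.

```python
import numpy as np

def compute_mip(volume: np.ndarray, axis: int) -> np.ndarray:
    """Maximum Intensity Projection: each output pixel is the maximum intensity
    of the 3D volume along the ray normal to the plane of projection."""
    return volume.max(axis=axis)

# Placeholder volume ordered (z, y, x); real data would come from the PET scan.
pet_volume = np.random.rand(256, 128, 128)
mip_coronal = compute_mip(pet_volume, axis=1)   # projection onto the coronal plane
mip_sagittal = compute_mip(pet_volume, axis=2)  # projection onto the sagittal plane
```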

The method then comprises extracting 300 from said at least one 2D MIP image a mask corresponding to cancerous lesions. The mask extracted from the MIP image is a two-dimensional image, which may have the same size as the MIP image, in which the pixels corresponding to cancer lesions are set to one, and the others are set to zero.

This extraction, or segmentation, is performed by a trained model that is configured to extract from 2D MIP images obtained from 3D imaging data, in particular 18F-FDG PET imaging data, a mask of the cancerous lesions. The trained model may be a Convolutional Neural Network (CNN), in particular having a U-Net architecture. In embodiments, the trained model may be the model disclosed by Kibrom Berihu Girum et al. "Learning with Context Feedback Loop for Robust Medical Image Segmentation", in IEEE Transactions on Medical Imaging, arXiv:2103.02844, 2021, having the structure shown in figure 3.

This CNN comprises a main, forward system, comprising an encoder region encoding the raw MIP input image into a feature space, a decoder region decoding the encoded features into target labels, and a bottle-neck region or processing region of the feature space. The CNN further comprises skip connections between the encoder and decoder regions. The encoder region comprises a succession of layers of decreasing resolution, where each layer comprises a convolutional building block discussed in more detail below, and each layer except the first performs a Max Pooling on the output of the building block of the preceding layer of higher resolution.

Each layer of the decoder region also comprises a convolutional building block, which receives as input the output of the encoder layer of same resolution through a skip connection, concatenated with the output of an up-convolutional layer applied to the output of the building block of the preceding layer of lower resolution.

The bottle-neck region is a residual block with a skip connection between the output of the last layer of the encoder region and the input of the first layer of the decoder region.

The building block in all components of the model is a residual CNN, comprising convolutional layers and an activation layer, with a skip connection between the input of the block and the activation layer. This can ease training and facilitate information propagation from input to the output of the network architecture. In particular in the case of lymphoma, lesions can be scattered over the whole body and the choice of this building block prevents losing information in the successive convolution and pooling operations.
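As a purely illustrative sketch, the forward system described above (encoder, bottle-neck and decoder built from residual convolutional blocks, with skip connections between encoder and decoder layers of the same resolution) could be written in Keras as follows. The filter counts, the depth, and the 1x1 convolution used to match channel counts on the residual path are assumptions made for the example; the feedback system is omitted here.

```python
from tensorflow.keras import layers, Model

def residual_block(x, filters):
    """Conv2D(3x3) + batch normalization blocks, with a skip connection
    between the block input and the final activation."""
    shortcut = layers.Conv2D(filters, 1, padding="same")(x)  # match channel count
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("elu")(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([shortcut, y])                          # skip connection
    return layers.Activation("elu")(y)

def build_forward_system(input_shape=(128, 256, 1), base_filters=32, depth=4):
    """Encoder, bottle-neck and decoder of the forward system (feedback omitted)."""
    inputs = layers.Input(input_shape)
    skips, x = [], inputs
    # Encoder: residual blocks followed by 2x2 max pooling (stride 2)
    for d in range(depth):
        x = residual_block(x, base_filters * 2 ** d)
        skips.append(x)
        x = layers.MaxPooling2D(pool_size=2, strides=2)(x)
    # Bottle-neck
    x = residual_block(x, base_filters * 2 ** depth)
    # Decoder: 2x2 up-convolution, then concatenation with the same-resolution encoder output
    for d in reversed(range(depth)):
        x = layers.Conv2DTranspose(base_filters * 2 ** d, 2, strides=2, padding="same")(x)
        x = layers.Concatenate()([skips[d], x])
        x = residual_block(x, base_filters * 2 ** d)
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)   # binary lesion mask
    return Model(inputs, outputs)

model = build_forward_system()
```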

As shown in the left-hand part of figure 3, such a network further comprises an external fully-connected network-based feedback system. The feedback system links the output of the CNN, i.e. the segmentation map or segmented region of the image, to the bottle-neck region. As shown in the right-hand part of figure 3, the feedback system also has an encoder-decoder structure, with the encoder and decoder parts being respectively identical to the encoder and decoder parts of the main forward system represented in the left-hand part of figure 3, but with the output of the last convolutional building block of the encoder being fed directly to the first up-convolutional layer of the decoder block. The output of the CNN is thus encoded by the feedback system into the same high-level feature space as the bottle-neck region of the main forward system represented in the left-hand part of figure 3.

The output h_f of the last convolutional building block of the encoder can be concatenated with the output of the building block of the layer of lowest resolution of the main forward system for at least one training phase of the network.

The training of such a model may comprise a series of steps including:

- Training the network weights of the forward system, considering raw input images and zero feedback (denoted h_0 in figure 3) as inputs, and the ground truth labels as outputs,

- Training the network weights of the feedback system, taking as input the predicted output of the forward system's decoder network, and the ground truth labels as outputs, and

- Training the network weights of the forward system's decoder part only, taking as inputs the previously extracted high-level features from the raw input image and the feedback h_f from the feedback system. Here, the forward system's encoder and the feedback system's encoder are designed to predict from the weights learned and updated during the previous steps.

These steps are repeated until convergence is reached.

The model has been preliminarily trained on a learning database comprising a plurality of MIP images calculated from 3D image data and, for each MIP image, a mask of the cancerous lesions derived from the tumor delineation of the 3D images by experts. The model can in particular be trained on a learning database comprising MIP images corresponding to sagittal and coronal maximum intensity projections of 3D imaging data and their corresponding lesion masks. In this case, the sagittal and coronal MIP images are treated independently, meaning that a single model is trained to transform either a coronal or a sagittal MIP image as input into its corresponding mask.

Once a cancer lesion mask is extracted from a MIP image, said mask can be further processed or analyzed in order to compute at least one biomarker, for instance a prognosis indicator of overall survival or of progression-free survival of the patient.

In embodiments, the further processing 400 of the lesion mask may comprise computing an indicator of lesion dissemination I_D. Said indicator may be computed by estimating the largest distance between the lesion pixels belonging to the lesion mask, which may be implemented by computing the distance between pixels belonging to the lesion mask that are the farthest away according to two orthogonal axes and summing said distances.

According to an embodiment schematically shown in figure 4, the computation of lesion dissemination may comprise calculating the sum of the pixel values (i.e. the sum of the pixels corresponding to the lesions, since they are set to 1 and the others are set to 0) along the rows and columns of the lesion mask, yielding x and y profiles, where the value of the profile for a line (y profile) or a column (x profile) is the number of pixels belonging to a lesion along the considered line or column.

In each profile, the largest distance is computed between a column, respectively line, corresponding to a given percentile a and a column, respectively line, corresponding to the percentile equal to 100-a, with a preferably between 0 and 10, preferably less than 5, for instance a=2. Pixel positions with a zero total number of tumor pixels (often at the beginning and end of the pixel positions) are not considered for the percentile calculation.

The indicator of lesion dissemination may thus be computed, for a given MIP image and when setting a to 2, as I_D = (x_98% - x_2%) + (y_98% - y_2%).

When, for a patient, a MIP coronal image and a MIP sagittal image are calculated and corresponding lesion masks are obtained, the indicator of lesion dissemination is the sum of the indicators computed on each image:

I_D = I_D,coronal + I_D,sagittal

Figure 4 shows an example displaying the distances between the 2% percentile and the 98% percentile in x and y.
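A minimal sketch of this computation is given below, assuming the lesion mask is a binary NumPy array and taking the percentiles over the positions of non-empty columns and rows; the exact percentile convention, the mask orientation, and the 4 mm pixel size used in the usage comment are assumptions for illustration.

```python
import numpy as np

def dissemination_indicator(mask: np.ndarray, pixel_size_cm: float, a: float = 2.0) -> float:
    """Lesion dissemination I_D of one binary 2D MIP lesion mask.

    The x profile counts tumor pixels per column and the y profile per row;
    positions with no tumor pixel are ignored, and the extent is measured
    between the a-th and (100-a)-th percentiles of the remaining positions.
    """
    def extent(profile):
        positions = np.flatnonzero(profile)          # ignore empty rows/columns
        if positions.size == 0:
            return 0.0
        low, high = np.percentile(positions, [a, 100.0 - a])
        return (high - low) * pixel_size_cm

    x_profile = mask.sum(axis=0)                     # tumor pixels per column
    y_profile = mask.sum(axis=1)                     # tumor pixels per row
    return extent(x_profile) + extent(y_profile)

# Per patient, the dissemination is summed over the two views (0.4 cm pixels assumed):
# I_D = dissemination_indicator(mask_coronal, 0.4) + dissemination_indicator(mask_sagittal, 0.4)
```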

The further processing of the lesion mask may also, or alternatively, comprise the computation of an indicator of tumor burden, I_B, by computing the number of pixels belonging to the lesion, multiplied by the area represented by each pixel.
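A corresponding sketch for the burden indicator, under the same assumptions on the mask representation and pixel size, could be:

```python
import numpy as np

def burden_indicator(mask: np.ndarray, pixel_area_cm2: float) -> float:
    """Lesion burden I_B: number of lesion pixels times the area of one pixel."""
    return float(np.count_nonzero(mask)) * pixel_area_cm2

# Per patient, the burden is summed over the two views (0.4 cm x 0.4 cm pixels assumed):
# I_B = burden_indicator(mask_coronal, 0.16) + burden_indicator(mask_sagittal, 0.16)
```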

When, for a patient, a MIP coronal image and a MIP sagittal image are calculated and corresponding lesion masks are obtained, the indicator of lesion burden is the sum of the indicators computed on each image: I_B = I_B,coronal + I_B,sagittal.

Patients

The study population included DLBCL patients who had a baseline (before treatment initiation) PET/CT scan from two independent trials: REMARC (NCT01122472) and LNH073B (NCT00498043). PFS and OS as defined following the revised National Cancer Institute criteria were recorded. All data were anonymized before analysis. The institutional review board approval, including ancillary studies, was obtained for the two trials, and all patients provided written informed consent. The demographics and staging of the patients used for the survival analysis are summarized in Table 1.

Table 1:

Measurement of Reference TMTV and Dmax

For the REMARC cohort, the lymphoma regions were identified in the 3D PET images as described in the following publications:

- Vercellino L, Cottereau AS, Casasnovas O, et al. "High total metabolic tumor volume at baseline predicts survival independent of response to therapy". Blood. 2020;135:1396-1405.

- Capobianco N, Meignan M, Cottereau A-S, et al. "Deep-Learning 18F-FDG Uptake Classification Enables Total Metabolic Tumor Volume Estimation in Diffuse Large B-Cell Lymphoma". J Nucl Med. 2021;62:30-36.

A SUVmax 41% threshold segmentation was then applied to these regions, corresponding to including in the final region all voxels whose intensity was greater than or equal to 41% of the maximum intensity in the region.

The LNH073B lesions were segmented by first automatically detecting hypermetabolic regions, by selecting all voxels with an SUV greater than 2 belonging to a region larger than 2 mL; a 41% SUVmax thresholding of the resulting regions was then applied, corresponding to including in the final region all voxels whose intensity was greater than or equal to 41% of the maximum intensity in the region.
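A schematic implementation of this region selection and thresholding, assuming an SUV volume stored as a NumPy array and the default connectivity of scipy.ndimage.label, might look as follows; the connectivity choice and boundary handling are assumptions.

```python
import numpy as np
from scipy import ndimage

def segment_lesions(suv: np.ndarray, voxel_volume_ml: float,
                    suv_cutoff: float = 2.0, min_volume_ml: float = 2.0,
                    relative_threshold: float = 0.41) -> np.ndarray:
    """Detect hypermetabolic regions (SUV > 2 within regions larger than 2 mL),
    then keep, in each region, the voxels at or above 41% of the region's SUVmax."""
    candidate = suv > suv_cutoff
    labels, n_regions = ndimage.label(candidate)        # connected components
    final_mask = np.zeros(suv.shape, dtype=bool)
    for label in range(1, n_regions + 1):
        region = labels == label
        if region.sum() * voxel_volume_ml <= min_volume_ml:
            continue                                     # discard regions of 2 mL or less
        threshold = relative_threshold * suv[region].max()
        final_mask |= region & (suv >= threshold)
    return final_mask
```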

In all cohorts, physicians removed the regions corresponding to physiological uptakes and added pathological regions missed by the algorithm. The physicians were blinded to the patient outcomes. Expert-validated 3D lymphoma regions were used to compute the reference TMTV and Dmax (based on the centroid of the lymphoma regions).

Calculation of the PET MIP Images and 2D Reference Lymphoma Regions

For each patient's whole-body 3D 18F-FDG PET image and associated 3D lymphoma regions, two 2D MIP views and associated 2D lymphoma regions were calculated (Figure 2). The 3D PET image was projected in the coronal and sagittal directions, 90° apart (Figure 2), setting each pixel value of the projection to the maximum intensity observed along the ray normal to the plane of projection. Similarly, MIPs of the expert-validated 3D lymphoma regions were calculated, resulting in binary images of 2D lymphoma regions (Figure 2), hereafter called MIP masks. These MIP masks were then used as a reference output to train a CNN-based fully automatic lymphoma segmentation method.

Fully Automatic Lymphoma Segmentation on PET MIP Images

To automatically segment the lymphoma lesions from the sagittal and coronal PET MIP images, a deep learning model was implemented.

The model consists of an encoder and a decoder network with skip connections between the two paths and an external fully connected network-based feedback, with a residual CNN as building block (Figure 3). The input and output dimensions of the network were 128x256x1.

The building block is the convolutional building block of the deep learning model. Each 2D convolution (Conv2D), with a kernel size of 3x3, was followed by batch normalization and an activation function. The exponential linear unit (ELU) activation function was used, except at the output layers, where a sigmoid activation function was used. After the convolutional building block in the encoder, a 2x2 max pooling operation with stride 2 was applied for downsampling. In the decoder, a 2x2 up-convolutional layer was used before the convolutional building block.

All available 3D PET images and the corresponding expert-validated 3D lymphoma segmented regions were resized to a 4 x 4 x 4 mm3 voxel size. The resized 3D images were then padded or cropped to fit into a 128x128x256 matrix. The resized and cropped images were projected into sagittal and coronal views. The input and output image dimensions of the network were 128x256x1.
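This resampling and padding/cropping step can be sketched as below; the interpolation order, the corner-anchored cropping and the axis conventions are simplifying assumptions not specified in the text.

```python
import numpy as np
from scipy.ndimage import zoom

def preprocess(volume: np.ndarray, spacing_mm, target_spacing_mm: float = 4.0,
               target_shape=(128, 128, 256)) -> np.ndarray:
    """Resample a PET volume to 4 x 4 x 4 mm3 voxels, then pad or crop to 128x128x256."""
    factors = [s / target_spacing_mm for s in spacing_mm]
    resampled = zoom(volume, factors, order=1)           # linear interpolation
    out = np.zeros(target_shape, dtype=resampled.dtype)  # zero-padding
    crop = [min(a, b) for a, b in zip(resampled.shape, target_shape)]
    out[:crop[0], :crop[1], :crop[2]] = resampled[:crop[0], :crop[1], :crop[2]]
    return out

vol = preprocess(np.random.rand(150, 150, 300), spacing_mm=(3.0, 3.0, 2.0))
mip_view_1 = vol.max(axis=0)   # first of the two orthogonal 128x256 MIP views
mip_view_2 = vol.max(axis=1)   # second orthogonal 128x256 MIP view
```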

The sagittal and coronal PET MIPs were independent input images during training.

The corresponding MIP mask was the output image. The deep learning model was trained to transform a given sagittal or coronal PET MIP image into the corresponding MIP mask, with pixels of lymphoma regions set to one and pixels of the non-lymphoma regions set to zero.

First, using the REMARC cohort (298 patients), a five-fold cross-validation technique was used to train and evaluate the model. Patients were randomly split into five groups, and five models were trained, each on 80% of the population, with the remaining 20% used for validation.
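A patient-wise five-fold split of this kind can be sketched with scikit-learn; the random seed and the patient indexing are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import KFold

patient_ids = np.arange(298)                       # placeholder REMARC patient indices
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, val_idx) in enumerate(kfold.split(patient_ids)):
    train_patients = patient_ids[train_idx]        # ~80% of patients for training
    val_patients = patient_ids[val_idx]            # ~20% of patients for validation
    # One model would be trained per fold on the MIP images of train_patients
    # (both views of a given patient stay in the same fold) and evaluated on
    # the MIP images of val_patients.
```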

The model was trained with a batch size of 32 for up to 1000 epochs, with an early stopping criterion of 300 epochs. The neural network weights were updated using a stochastic gradient descent algorithm, the Adam optimizer, with a learning rate of 1e-4. All other parameters were Keras default values. A sigmoid output activation function was used to binarize the image into the lymphoma region and non-lymphoma region. The average of the Dice similarity coefficient loss (Loss_Dice) and the binary cross-entropy loss (Loss_binary cross-entropy) was used as a loss function, defined by:

Loss = (Loss_Dice + Loss_binary cross-entropy) / 2
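A sketch of this loss and training configuration in Keras is given below; the exact Dice formulation (smoothing term, per-batch reduction) is an assumption, only the averaging of the two terms follows from the text, and the variable names in the commented fit call are placeholders.

```python
import tensorflow as tf

def dice_loss(y_true, y_pred, eps=1e-6):
    """Soft Dice loss computed over the batch."""
    intersection = tf.reduce_sum(y_true * y_pred)
    union = tf.reduce_sum(y_true) + tf.reduce_sum(y_pred)
    return 1.0 - (2.0 * intersection + eps) / (union + eps)

def combined_loss(y_true, y_pred):
    """Average of the Dice loss and the binary cross-entropy loss."""
    bce = tf.reduce_mean(tf.keras.losses.binary_crossentropy(y_true, y_pred))
    return 0.5 * (dice_loss(y_true, y_pred) + bce)

# model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4), loss=combined_loss)
# early_stop = tf.keras.callbacks.EarlyStopping(patience=300, restore_best_weights=True)
# model.fit(train_mips, train_masks, validation_data=(val_mips, val_masks),
#           batch_size=32, epochs=1000, callbacks=[early_stop])
```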

The model was implemented with Python, the Keras API, and a TensorFlow backend. The data were processed using Python 3.8.5 packages, including NumPy, SciPy, Pandas, and Matplotlib. No post-processing method was applied for the segmentation metrics. To compute the surrogate biomarkers from the AI-based segmented images, regions smaller than 4.8 cm2 were removed.

Secondly, the model trained on the REMARC cohort (298 patients) was tested on the independent LNH073B cohort (174 patients) to characterize its generalizability and robustness. The REMARC and LNH073B cohorts were acquired from two different trials. The REMARC (training-validation) data came from a double-blind, international, multicenter, randomized phase III study, which started inclusion in 2010. In contrast, the LNH073B data came from a prospective multicenter, randomized phase II study, which started including patients in 2007.

Calculation of I_B and I_D

The burden indicator I_B and the dissemination indicator I_D, interpreted respectively as surrogate indicators for TMTV and Dmax, were defined and computed from the MIP masks automatically segmented from the coronal and sagittal PET MIP images using the deep learning method.

To characterize tumor burden I_B, the number of pixels belonging to the tumor regions in the MIP mask multiplied by the pixel area was computed. For a given patient, I_B was calculated from the coronal and the sagittal MIP masks as I_B = I_B,coronal + I_B,sagittal.

The dissemination of the disease I_D was analyzed by estimating the largest distance between the tumor pixels belonging to the MIP mask. First, the sums of pixels along the columns and the rows of the MIP mask were calculated, yielding x and y profiles (Figure 4). Second, in each of these two profiles, the distances between the 2% percentile and the 98% percentile (x_2% and x_98% in the x profile, y_2% and y_98% in the y profile) were calculated, yielding (x_98% - x_2%) and (y_98% - y_2%), respectively. These percentiles were chosen to improve the robustness of the calculation to outliers. The largest distance was defined as

I_D = (x_98% - x_2%) + (y_98% - y_2%)

For a given patient, the tumor dissemination I_D was the sum of the coronal and sagittal disseminations:

I_D = I_D,coronal + I_D,sagittal

Statistical Analysis

Using the MIP masks obtained from the expert-delineated 3D lymphoma regions (Figure 2) as a reference, the CNN's segmentation performance was evaluated using the Dice score, sensitivity, and specificity. The differences between the CNN-based segmentation results and the expert-delineated 3D lymphoma regions were quantified using Wilcoxon statistical tests. Univariate and multivariate survival analyses were performed. For all biomarkers, a time-dependent area under the receiver operating characteristic curve (AUC) was calculated. Bootstrap resampling analysis was performed to associate confidence intervals with the Cox model hazard ratio and the time-dependent AUC. Test results were considered statistically significant if the two-sided P-value was <0.05.
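For reference, the segmentation metrics mentioned above can be computed from a predicted and a reference binary mask as in the following sketch (pixel-wise definitions assumed).

```python
import numpy as np

def segmentation_metrics(pred: np.ndarray, ref: np.ndarray) -> dict:
    """Dice score, sensitivity and specificity between two binary masks."""
    pred, ref = pred.astype(bool), ref.astype(bool)
    tp = np.count_nonzero(pred & ref)
    fp = np.count_nonzero(pred & ~ref)
    fn = np.count_nonzero(~pred & ref)
    tn = np.count_nonzero(~pred & ~ref)
    return {
        "dice": 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 1.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 1.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 1.0,
    }
```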

RESULTS

A total of 475 patients from two different cohorts were included in this study, of which 93 patients were excluded from the biomarker and survival analysis because the provided baseline 18F-FDG PET/CT images were not suitable for analyzing all biomarkers (no PET segmentation by an expert or fewer than 2 lesions).

The performance of the proposed segmentation method was evaluated patient-wise. The CNN segmentation method achieved a 0.80 median Dice score (interquartile range [IQR]: 0.63-0.89), 80.7% (IQR: 64.5%-91.3%) sensitivity, and 99.7% (IQR: 99.4%-99.9%) specificity on the REMARC cohort. On the 174 LNH073B test patients, the CNN yielded a 0.86 median Dice score (IQR: 0.77-0.92), 87.9% (IQR: 74.9%-94.4%) sensitivity, and 99.7% (IQR: 99.4%-99.8%) specificity. In the LNH073B data, the CNN yielded a mean Dice score of 0.80 ± 0.17 (mean ± SD) on the coronal view and 0.79 ± 0.17 on the sagittal view. Figure 2 shows segmentation result examples from experts (MIP masks) and the CNN. The Dice score was not significantly different (p>0.05) between the coronal and sagittal views, for both the REMARC and LNH073B cohorts.

In both cohorts, there was a significant correlation between ranked TMTV and Dmax values and the associated surrogate values I_B and I_D obtained using the CNN. For REMARC, TMTV was correlated with I_B (Spearman r = 0.878, p<0.001), and Dmax was correlated with I_D (r = 0.709, p<0.001). Out of 144 patients who had TMTV greater than the median TMTV (242 cm3), 121 (84.02%) also had I_B greater than the median I_B (174.24 cm2). 144 patients had Dmax greater than the median Dmax (44.8 cm), and 113 (78.5%) of these patients also had I_D greater than the median I_D (98.0 cm).

For LNH073B, TMTV was correlated with I_B (r = 0.752, p<0.001), and Dmax was correlated with I_D (r = 0.714, p<0.001). Out of 48 patients who had TMTV greater than the median TMTV (375 cm3), 42 (87.5%) also had I_B greater than the median I_B (307.2 cm2). 48 patients had Dmax greater than the median Dmax (44.1 cm), and 39 (81.3%) of these patients also had I_D greater than the median I_D (116.4 cm). Table 2 shows the descriptive statistics for I_B and I_D.

Table 2

Survival Analysis

The time-dependent AUC and hazard ratios (HR) with 95% confidence intervals of the metabolic tumor volume and tumor spread features are shown in Table 3 for the REMARC and LNH073B data. All PET features extracted from the baseline 3D 18F-FDG PET/CT images and those obtained using AI (I_B and I_D) were significant prognosticators of PFS and OS.

Combining TMTV and Dmax (or their surrogates), three risk categories could be differentiated in the REMARC data (Figure 5): using the 3D features, category 1 corresponded to low TMTV (< 222 cm3) and low Dmax (< 59 cm) (low risk, n=108); category 2 corresponded to either high Dmax or high TMTV (intermediate risk, n=112); category 3 corresponded to both high Dmax and high TMTV (high risk, n=67). This stratification was similar when using the MIP-feature-based categories obtained using AI (Figure 5). The accuracy of the CNN-based classification into three categories with respect to the 3D-biomarker-based classification was 71.4%.
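For illustration, the three-risk-group rule described above can be written as a small helper; the cut-off values shown are those reported for the REMARC 3D features, and the handling of values exactly equal to a cut-off is an assumption.

```python
def risk_category(burden: float, dissemination: float,
                  burden_cutoff: float = 222.0, dissemination_cutoff: float = 59.0) -> int:
    """Three-risk-group stratification: 1 = both features low, 2 = one high, 3 = both high."""
    n_high = int(burden >= burden_cutoff) + int(dissemination >= dissemination_cutoff)
    return 1 + n_high

# Example: TMTV = 300 cm3 and Dmax = 40 cm fall into category 2 (intermediate risk).
# The same rule applies to the surrogates I_B and I_D with their own cut-off values.
```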

In the LNH073B cohort, combining TMTV and Dmax (or their surrogates), three risk categories could be differentiated (Figure 6). Using the 3D features, category 1 was defined as low TMTV (< 468 cm3) and low Dmax (< 60 cm) (n=45); category 2 corresponded to either high Dmax or high TMTV (n=37); category 3 corresponded to both high Dmax and high TMTV (n=13). Out of the 13 patients classified as high risk, 9 (69.2%) patients had less than 4 years of OS, and 10 (76.9%) patients had less than 4 years of PFS. This stratification was similar when using the CNN-based results. The I_B cut-off value was 376 cm2, and the I_D cut-off value was 122 cm. There were 38 patients in category 1, 35 in category 2, and 22 in category 3. Out of the 22 patients classified as high risk, 19 (77.3%) patients had less than 4 years of OS, and 19 (86.4%) patients had less than 4 years of PFS. The accuracy of the AI-based classification into three categories with respect to the 3D-biomarker-based classification was 64.2%. All patients classified as high risk using the 3D biomarkers were also classified as high risk using the CNN, except one patient who had an OS of 36.6 months. Out of the nine patients classified as high risk when using the CNN but not when using the 3D biomarkers, 8 (88.9%) patients had less than 4 years of OS, and the remaining one (11.1%) patient had 21.95 and 57.99 months of PFS and OS, respectively.

In Figure 7, the confusion matrices show the agreement between the 3D-based biomarkers and the surrogate MIP biomarkers in the LNH073B data. The percentage of the data classified into high, low, and intermediate risk is also shown. Considering a single-biomarker-based classification, the AI-based classification into two groups (high and low risk) had an accuracy of 79% with respect to the 3D-based classification (using either the tumor burden or the dissemination biomarker).

Thus, the automated segmentation of lesion masks on Maximum Intensity Projection images obtained from 3D imaging data provides accurate and less computationally intensive segmentation, and the obtained lesion masks yield prognostic indicators reflecting tumor burden and tumor dissemination.