


Title:
A METHOD FOR DETERMINING THE TEMPORAL PROGRESSION OF A BIOLOGICAL PHENOMENON AND ASSOCIATED METHODS AND DEVICES
Document Type and Number:
WIPO Patent Application WO/2017/195126
Kind Code:
A1
Abstract:
The invention relates to age-related brain diseases, such as Parkinson's or Alzheimer's disease. Statistical models based on the regression of measurements with age are inadequate to model the progression of such diseases. The inventors therefore worked on a numerical model to determine the temporal progression of such a biological phenomenon, the numerical model being a function in a Riemann manifold. Such a model makes it possible to obtain a method for determining the temporal progression of a biological phenomenon which can be implemented on a computer and provides better results than statistical models based on the regression of measurements. This determining method may be applied to predicting that a subject is at risk of suffering from such a disease, diagnosing a disease, identifying a therapeutic target or a biomarker, and screening compounds useful as a medicine.

Inventors:
DURRLEMAN STANLEY (FR)
SCHIRATTI JEAN-BAPTISTE (FR)
ALLASSONNIERE STÉPHANIE (FR)
COLLIOT OLIVIER (FR)
Application Number:
PCT/IB2017/052722
Publication Date:
November 16, 2017
Filing Date:
May 10, 2017
Assignee:
INSTITUT NATIONAL DE LA SANTE ET DE LA RECH MEDICALE (INSERM) (FR)
INST DU CERVEAU ET DE LA MOELLE EPINIERE (FR)
ECOLE POLYTECH (FR)
INRIA INST NAT RECH INFORMATIQUE & AUTOMATIQUE (FR)
CENTRE NAT RECH SCIENT (FR)
UNIVERSITÉ PIERRE ET MARIE CURIE (PARIS 6) (FR)
ASSIST PUBLIQUE - HOPITAUX DE PARIS (FR)
International Classes:
G06K9/62
Foreign References:
US20040242972A1 (2004-12-02)
Other References:
J.-B. SCHIRATTI ET AL: "Mixed-effects model for the spatiotemporal analysis of longitudinal manifold-valued data", MATHEMATICAL FOUNDATIONS OF COMPUTATIONAL ANATOMY, 9 October 2015 (2015-10-09), pages 48 - 59, XP055333442
BILGEL MURAT ET AL: "A multivariate nonlinear mixed effects model for longitudinal image analysis: Application to amyloid imaging", NEUROIMAGE, ELSEVIER, AMSTERDAM, NL, vol. 134, 16 April 2016 (2016-04-16), pages 658 - 670, XP029608533, ISSN: 1053-8119, DOI: 10.1016/J.NEUROIMAGE.2016.04.001
Attorney, Agent or Firm:
BLOT, Philippe et al. (2 place d'Estienne d'Orves, Paris Cedex 09, FR)
Claims:
CLAIMS

1. - A method for determining the temporal progression of a biological phenomenon which may affect a studied subject, the method comprising the steps of:

- providing first data, the first data being data relative to biomarkers for the studied subject, the biomarkers being relative to the progression of the biological phenomenon,

- providing a numerical model (NM), the numerical model (NM) being a function in a Riemann manifold (RM), the numerical model (NM) associating to values of biomarkers a temporal progression trajectory for the biological phenomenon and data relative to the dispersion of the progression trajectory for the biological phenomenon among a plurality of subjects, the numerical model (NM) being obtained by using a stochastic approximation in an expectation-maximization technique on data relative to biomarkers taken at different time points for a plurality of subjects,

- converting the first data into at least one point on the same Riemann manifold (RM), and

- using the numerical model (NM) to determine a temporal progression for the biological phenomenon for the studied subject.

2. - The method according to claim 1, wherein the biological phenomenon is a biological phenomenon whose temporal progression extends over more than three years.

3. - The method according to claim 1 or 2, wherein the biological phenomenon is a neurodegenerative disease.

4. - The method according to any one of claims 1 to 3, wherein, at the step of providing first data, the first data are data relative to neuropsychological biomarkers.

5. - The method according to any one of claims 1 to 4, wherein the studied subject is an animal, preferably a mammal, more preferably a human being.

6. - The method according to any one of claims 1 to 5, wherein at the step of providing the numerical model (NM), the stochastic approximation in an expectation-maximization technique is a Monte-Carlo Markov Chain Stochastic Approximation Expectation-Maximization technique.

7. - The method according to any one of claims 1 to 6, wherein at the step of providing the numerical model (NM), the number of subjects is greater than or equal to 100.

8. - The method according to any one of claims 1 to 7, wherein, in the numerical model (NM), the data relative to the dispersion of the progression trajectory for the biological phenomenon among a plurality of subjects are provided as standard deviations of time for a plurality of values of biomarkers.

9. - A method for predicting that a subject is at risk of suffering from a disease, the method for predicting at least comprising the step of:

- carrying out the steps of the method for determining the temporal progression of a biological phenomenon which may affect a studied subject according to any one of claims 1 to 8, the biological phenomenon being the disease, to obtain a first temporal progression, and

- predicting that the subject is at risk of suffering from the disease based on the first temporal progression.

10.- A method for diagnosing a disease, the method for diagnosing at least comprising the step of:

- carrying out the steps of the method for determining the temporal progression of a biological phenomenon which may affect a studied subject according to any one of claims 1 to 8, the biological phenomenon being the disease, to obtain a first temporal progression, and

- diagnosing the disease based on the first temporal progression.

11. - A method for identifying a therapeutic target for preventing and/or treating a pathology, the method comprising the steps of:

- carrying out the method for determining the temporal progression of a biological phenomenon which may affect a studied subject according to any one of claims 1 to 8, the first data being data relative to a subject suffering from the pathology, to obtain a first temporal progression,

- carrying out the method for determining the temporal progression of a biological phenomenon which may affect a studied subject according to any one of claims 1 to 8, the first data being data relative to a subject not suffering from the pathology, to obtain a second temporal progression,

- selecting a therapeutic target based on the comparison of the first and second temporal progressions.

12. - Method for identifying a biomarker, the biomarker being a diagnostic biomarker of a pathology, a susceptibility biomarker of a pathology, a prognostic biomarker of a pathology or a predictive biomarker in response to the treatment of a pathology, the method comprising the steps of:

- carrying out the method for determining the temporal progression of a biological phenomenon which may affect a studied subject according to any one of claims 1 to 8, the first data being data relative to a subject suffering from the pathology, to obtain a first temporal progression,

- carrying out the method for determining the temporal progression of a biological phenomenon which may affect a studied subject according to any one of claims 1 to 8, the first data being data relative to a subject not suffering from the pathology, to obtain a second temporal progression,

- selecting a biomarker based on the comparison of the first and second temporal progressions.

13. - Method for screening a compound useful as a medicine, the compound having an effect on a known therapeutic target, for preventing and/or treating a pathology, the method comprising the steps of:

- carrying out the method for determining the temporal progression of a biological phenomenon which may affect a studied subject according to any one of claims 1 to 8, the first data being data relative to a subject suffering from the pathology and having received the compound, to obtain a first temporal progression,

- carrying out the method for determining the temporal progression of a biological phenomenon which may affect a studied subject according to any one of claims 1 to 8, the first data being data relative to a subject suffering from the pathology and not having received the compound, to obtain a second temporal progression,

- selecting a compound based on the comparison of the first and second temporal progressions.

14.- A computer program product comprising instructions for carrying out the steps of a method according to any one of claims 1 to 13 when said computer program product is executed on a suitable computer device.

15.- A computer readable medium having encoded thereon a computer program product according to claim 14.

Description:
A METHOD FOR DETERMINING THE TEMPORAL PROGRESSION OF A BIOLOGICAL PHENOMENON AND ASSOCIATED METHODS AND DEVICES

TECHNICAL FIELD OF THE INVENTION

The present invention concerns a method for determining the temporal progression of a biological phenomenon. The present invention also relates to an associated method for predicting, an associated method for diagnosing, an associated method for identifying a therapeutic target, an associated method for identifying a biomarker and an associated method for screening. The present invention also concerns a computer program product and a computer readable medium adapted to carry out one of these methods.

BACKGROUND OF THE INVENTION

Age-related brain diseases, such as Parkinson's or Alzheimer's disease, are complex diseases which have multiple effects on the metabolism, structure and function of the brain. Models of disease progression showing the sequence and timing of these effects during the course of the disease remain largely hypothetical. Large multimodal databases have been collected in recent years in the hope of providing experimental evidence of the patterns of disease progression, based on the estimation of data-driven models. These databases are longitudinal, in the sense that they contain repeated measurements of several subjects at multiple time points which do not necessarily correspond across subjects. In practice, learning models of disease progression from such databases raises great methodological challenges.

The main difficulty lies in the fact that the age of a given individual gives no information about the stage of disease progression of this individual. The first clinical symptoms of Alzheimer's disease may appear at forty years of age for one patient and at eighty for another. The duration of the disease may also vary across patients from a few years to decades. Moreover, the onset of the disease does not coincide with the onset of the symptoms: according to recent studies, symptoms are likely to be preceded by a silent phase of the disease, about which little is known. As a consequence, statistical models based on the regression of measurements with age are inadequate to model disease progression.

SUMMARY OF THE INVENTION

The invention aims at determining the temporal progression of a biological phenomenon, notably a neurodegenerative disease, based on data taken from a subject. To this end, the invention concerns a method for determining the temporal progression of a biological phenomenon which may affect a studied subject, the method comprising the step of providing first data, the first data being data relative to biomarkers for the studied subject, the biomarkers being relative to the progression of the biological phenomenon, the step of providing a numerical model, the numerical model being a function in a Riemann manifold, the numerical model associating to values of biomarkers a temporal progression trajectory for the biological phenomenon and data relative to the dispersion of the progression trajectory for the biological phenomenon among a plurality of subjects, the numerical model being obtained by using a stochastic approximation in an expectation-maximization technique on data relative to biomarkers taken at different time points for a plurality of subjects, the step of converting the first data into at least one point on the same Riemann manifold, and the step of using the numerical model to determine a temporal progression for the biological phenomenon for the studied subject.

The present invention makes it possible to determine the temporal progression of a biological phenomenon based on data taken from a subject.

The data taken from the subject are data relative to biomarkers for the studied subject, the biomarkers being relative to the progression of the biological phenomenon.

There is little constraint on the data taken from the subject, which makes it possible to carry out the method with data different from the data used to build the numerical model. The only constraint is that the data taken from the subject and the data used to build the numerical model relate to the same biomarker(s).

The converting step ensures such freedom.

In addition, the numerical model yields useful results by providing a generic statistical framework for the definition and estimation of mixed-effects models for longitudinal manifold-valued data. Using the tools of geometry makes it possible to derive a method that makes few assumptions about the data and the problem at hand. Modeling choices boil down to the definition of the metric on the manifold. This geometrical modeling also makes it possible to introduce the concept of parallel curves on a manifold, which is key to decomposing the differences seen in the data, in a unique manner, into a spatial and a temporal component. Because of the non-linearity of the model, the parameters must be estimated by maximizing the observed likelihood, which cannot be done in closed form. To address this issue, a stochastic version of the Expectation-Maximization algorithm is used.

According to further aspects of the invention which are advantageous but not compulsory, the method for determining the temporal progression of a biological phenomenon might incorporate one or several of the following features, taken in any technically admissible combination:

- the biological phenomenon is a biological phenomenon whose temporal progression extends over more than three years.

- the biological phenomenon is a neurodegenerative disease.

- at the step of providing first data, the first data are data relative to neuropsychological biomarkers.

- the studied subject is an animal, preferably a mammal, more preferably a human being.

- at the step of providing the numerical model, the stochastic approximation in an expectation-maximization technique is a Monte-Carlo Markov Chain Stochastic Approximation Expectation-Maximization technique.

- at the step of providing the numerical model, the number of subjects is greater than or equal to 100.

- in the numerical model, the data relative to the dispersion of the progression trajectory for the biological phenomenon among a plurality of subjects are provided as standard deviations of time for a plurality of values of biomarkers.

The invention also concerns a method for predicting that a subject is at risk of suffering from a disease, the method for predicting at least comprising the step of carrying out the steps of the method for determining the temporal progression of a biological phenomenon which may affect a studied subject as previously described, the biological phenomenon being the disease, to obtain a first temporal progression, and the step of predicting that the subject is at risk of suffering from the disease based on the first temporal progression.

The invention also relates to a method for diagnosing a disease, the method for diagnosing at least comprising the step of carrying out the steps of the method for determining the temporal progression of a biological phenomenon which may affect a studied subject as previously described, the biological phenomenon being the disease, to obtain a first temporal progression, and the step of diagnosing the disease based on the first temporal progression.

The invention also concerns a method for identifying a therapeutic target for preventing and/or treating a pathology, the method comprising the steps of carrying out the method for determining the temporal progression of a biological phenomenon which may affect a studied subject as previously described, the first data being data relative to a subject suffering from the pathology, to obtain a first temporal progression, carrying out the method for determining the temporal progression of a biological phenomenon which may affect a studied subject as previously described, the first data being data relative to a subject not suffering from the pathology, to obtain a second temporal progression, and selecting a therapeutic target based on the comparison of the first and second temporal progressions.

The invention also relates to a method for identifying a biomarker, the biomarker being a diagnostic biomarker of a pathology, a susceptibility biomarker of a pathology, a prognostic biomarker of a pathology or a predictive biomarker in response to the treatment of a pathology, the method comprising the steps of carrying out the method for determining the temporal progression of a biological phenomenon which may affect a studied subject as previously described, the first data being data relative to a subject suffering from the pathology, to obtain a first temporal progression, carrying out the method for determining the temporal progression of a biological phenomenon which may affect a studied subject as previously described, the first data being data relative to a subject not suffering from the pathology, to obtain a second temporal progression, and selecting a biomarker based on the comparison of the first and second temporal progressions.

The invention also relates to a method for screening a compound useful as a medicine, the compound having an effect on a known therapeutic target, for preventing and/or treating a pathology, the method comprising the steps of carrying out the method for determining the temporal progression of a biological phenomenon which may affect a studied subject as previously described, the first data being data relative to a subject suffering from the pathology and having received the compound, to obtain a first temporal progression, carrying out the method for determining the temporal progression of a biological phenomenon which may affect a studied subject as previously described, the first data being data relative to a subject suffering from the pathology and not having received the compound, to obtain a second temporal progression, and selecting a compound based on the comparison of the first and second temporal progressions.

The invention also relates to a computer program product comprising instructions for carrying out the steps of a method as previously described, when said computer program product is executed on a suitable computer device. The invention also relates to a computer readable medium having encoded thereon a computer program product as previously described.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention will be better understood on the basis of the following description, which is given in correspondence with the annexed figures and as an illustrative example, without restricting the object of the invention. In the annexed figures:

- Figure 1 shows schematically a system and a computer program product whose interaction makes it possible to carry out a method for determining the temporal progression of a biological phenomenon;

- Figure 2 shows a flowchart of an example of carrying out a method for determining the temporal progression of a biological phenomenon;

- Figure 3 shows schematically an example of longitudinal manifold-valued data;

- Figure 4 shows schematically an example of numerical model in a Riemann manifold;

- Figures 5 to 7 show schematically calculations in the Riemann manifold so as to obtain another example of numerical model, and

- Figures 8 to 16 show schematically experimental results obtained by using the numerical model of Figures 5 to 7.

DETAILED DESCRIPTION OF SOME EMBODIMENTS

A system 10 and a computer program product 12 are represented in Figure 1. The interaction between the computer program product 12 and the system 10 makes it possible to carry out a method for determining the temporal progression of a biological phenomenon.

System 10 is a computer. In the present case, system 10 is a laptop.

More generally, system 10 is a computer or computing system, or similar electronic computing device adapted to manipulate and/or transform data represented as physical, such as electronic, quantities within the computing system's registers and/or memories into other data similarly represented as physical quantities within the computing system's memories, registers or other such information storage, transmission or display devices.

System 10 comprises a processor 14, a keyboard 22 and a display unit 24.

The processor 14 comprises a data-processing unit 16, memories 18 and a reader 20. The reader 20 is adapted to read a computer readable medium.

The computer program product 12 comprises a computer readable medium.

The computer readable medium is a medium that can be read by the reader of the processor. The computer readable medium is a medium suitable for storing electronic instructions, and capable of being coupled to a computer system bus.

Such a computer readable storage medium is, for instance, a disk, a floppy disk, an optical disk, a CD-ROM, a magneto-optical disk, a read-only memory (ROM), a random access memory (RAM), an electrically programmable read-only memory (EPROM), an electrically erasable and programmable read-only memory (EEPROM), a magnetic or optical card, or any other type of media suitable for storing electronic instructions and capable of being coupled to a computer system bus. A computer program is stored in the computer readable storage medium. The computer program comprises one or more stored sequences of program instructions.

The computer program is loadable into the data-processing unit 16 and adapted to cause execution of the method for determining the temporal progression of a biological phenomenon when the computer program is run by the data-processing unit 16.

Operation of the system 10 is now described with reference to an example of carrying out a method for determining the temporal progression of a biological phenomenon which may affect a studied subject.

The studied subject is an animal.

Preferably, the studied subject is a mammal.

More preferably, the studied subject is a human being.

For the sake of exemplification, it is assumed, in the remainder of the specification, that the studied subject is a human being.

In the most generic definition, the biological phenomenon is a biological phenomenon for which a temporal progression can be defined.

For instance, diseases and ageing are biological phenomena for which a temporal progression can be defined.

According to a specific embodiment, the biological phenomenon is a biological phenomenon whose temporal progression extends over more than three years.

According to a more specific embodiment, the biological phenomenon is a biological phenomenon whose temporal progression extends over more than ten years.

In addition, the onset of the biological phenomenon, its duration, or both vary from one subject to another.

A typical example of such a biological phenomenon is a neurodegenerative disease. Neurodegenerative disease designates a set of diseases which primarily affect the neurons in the human brain.

Alzheimer's disease, Parkinson's disease, prion disease, motor neurone disease, Huntington's disease, spinocerebellar ataxia and spinal muscular atrophy are examples of neurodegenerative diseases.

For the sake of exemplification, it is assumed, in the remainder of the specification, that the biological phenomenon is Alzheimer's disease.

The method for determining comprises four steps: a step S30 of providing first data, a step S32 of providing a numerical model, a step S34 of converting and a step S36 of using.
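The four steps can be illustrated with a deliberately simplified sketch. It is not the method of the specification. It rests on these assumptions:

- each biomarker is a score normalized to the open interval (0, 1);
- the numerical model is a single logistic average trajectory with hypothetical parameters `p0`, `t0`, `v0` and per-biomarker delays `delta`;
- step S34 only clips scores onto the manifold;
- step S36 reduces to estimating one subject-specific time shift by grid search.

All function names and values are hypothetical.

```python
import numpy as np

# Hypothetical sketch of steps S30 to S36; every name and value is illustrative.

def logistic(t, p0, t0, v0):
    """Assumed average trajectory: equals p0 at time t0 with slope v0."""
    return 1.0 / (1.0 + (1.0 / p0 - 1.0) * np.exp(-v0 * (t - t0) / (p0 * (1.0 - p0))))

def step_s30_provide_first_data():
    # Hypothetical visits: ages in years and two normalized scores per visit.
    ages = np.array([70.0, 71.0, 72.5])
    scores = np.array([[0.20, 0.10], [0.26, 0.14], [0.35, 0.21]])
    return ages, scores

def step_s32_provide_model():
    # Hypothetical population parameters; delta[k] is the delay of biomarker k.
    return {"p0": 0.5, "t0": 75.0, "v0": 0.05, "delta": np.array([0.0, -3.0])}

def step_s34_convert(scores, eps=1e-6):
    # Map raw scores onto the manifold (0, 1)^N by clipping away the boundary.
    return np.clip(scores, eps, 1.0 - eps)

def step_s36_use(ages, points, model, shifts=np.linspace(-20.0, 20.0, 4001)):
    """Return the time shift tau that best aligns the subject with the model."""
    def predict(tau):
        # Biomarker k of a subject delayed by tau: logistic(age - tau + delta[k]).
        t = ages[:, None] - tau + model["delta"][None, :]
        return logistic(t, model["p0"], model["t0"], model["v0"])

    errors = [np.sum((predict(tau) - points) ** 2) for tau in shifts]
    return shifts[int(np.argmin(errors))]
```

For example, `step_s36_use(ages, step_s34_convert(scores), step_s32_provide_model())` returns the estimated shift, in years, of this hypothetical subject relative to the average trajectory. A positive value means the subject lags behind the average.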

At step S30, first data are provided.

The first data are data relative to biomarkers for the studied subject. The biomarkers are relative to the progression of the biological phenomenon.

For instance, the first data are data collected via cognitive tests designed to detect Alzheimer's disease.

According to another embodiment, the first data are data obtained by medical imaging.

According to still another embodiment, the first data are a combination of data collected in different ways. As an example, the first data are data collected via cognitive tests and data obtained by medical imaging.

Preferably, the first data are data collected at several time points for the same subject.

In the specific described example, the first data are data relative to neuropsychological biomarkers, for instance assessing cognitive or motor functions such as memory or praxis.

At step S32, a numerical model labeled NM is provided.

The numerical model NM is a function in a Riemann manifold named RM.

The numerical model NM associates to values of biomarkers a temporal progression trajectory for the biological phenomenon and data relative to the dispersion of the progression trajectory for the biological phenomenon among a plurality of subjects.

According to a preferred embodiment, the data relative to the dispersion of the progression trajectory for the biological phenomenon among a plurality of subjects are provided as standard deviations of time for a plurality of values of biomarkers.

The numerical model NM is therefore construable as a statistical model.

The numerical model NM is obtained by using a stochastic approximation in an expectation-maximization technique on data relative to biomarkers taken at different time points for a plurality of subjects.

This means that the numerical model NM concerns longitudinal data.

Figure 3 illustrates a specific example of longitudinal data.

For each subject, several images of their brain are taken at different instants.

More precisely, for subject 1, two images I40 and I42 are provided.

For subject 2, three images I44, I46 and I48 are provided.

For subject 3, two images I50 and I52 are provided.

Except for images I48 and I52 (as schematically illustrated by the arrow on Figure 3), the images are taken at different time points.

The case of Figure 3 is purely illustrative, the longitudinal data being usually more numerous.

Indeed, the number of subjects is preferably greater than or equal to 100. More formally, the data consist of repeated multivariate measurements of $p$ individuals. For the i-th individual, the measurements are obtained at time points $t_{i,1} < \dots < t_{i,n_i}$.

The j-th measurement of the i-th individual is denoted by $y_{i,j}$. In the remainder of the specification, it is assumed that each observation $y_{i,j}$ is a point on an N-dimensional Riemannian manifold $\mathbb{M}$ embedded in $\mathbb{R}^N$ and equipped with a Riemannian metric $g^{\mathbb{M}}$.

The generic spatiotemporal model belongs to a class of statistical models for which maximum likelihood estimates cannot be obtained in closed form. This issue is addressed by using a stochastic approximation in an expectation-maximization technique.

Preferably, the stochastic approximation in an expectation-maximization technique is a Monte-Carlo Markov Chain Stochastic Approximation Expectation-Maximization technique.
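The principle of a Monte-Carlo Markov Chain Stochastic Approximation Expectation-Maximization loop can be shown on a deliberately simple toy model. The sketch below is not the Riemannian model of the specification. It makes these assumptions:

- there is one scalar Gaussian random effect per individual;
- the simulation step is a random-walk Metropolis-Hastings move;
- the sufficient statistics are updated by stochastic approximation;
- the maximization step is in closed form.

The function name `mcmc_saem`, the step-size schedule and `n_burn` are illustrative choices.

```python
import numpy as np

def mcmc_saem(y, sigma, n_iter=300, n_burn=100, prop_sd=0.5, seed=0):
    """Toy MCMC-SAEM for y[i, j] = z_i + noise, z_i ~ N(mu, s2), noise ~ N(0, sigma^2).

    y has shape (p, n): p individuals with n observations each.
    Returns the estimates (mu, s2).
    """
    rng = np.random.default_rng(seed)
    p = y.shape[0]
    z = y.mean(axis=1)            # initial values of the latent random effects
    mu, s2 = 0.0, 1.0             # initial population parameters
    s1_stat, s2_stat = 0.0, 0.0   # stochastic approximations of sufficient statistics

    def log_post(zv):
        # log p(y_i | z_i) + log p(z_i | mu, s2) up to constants (reads current mu, s2)
        return (-0.5 * np.sum((y - zv[:, None]) ** 2, axis=1) / sigma**2
                - 0.5 * (zv - mu) ** 2 / s2)

    for k in range(n_iter):
        # Simulation step: one Metropolis-Hastings random-walk move per individual.
        proposal = z + prop_sd * rng.standard_normal(p)
        accept = np.log(rng.uniform(size=p)) < log_post(proposal) - log_post(z)
        z = np.where(accept, proposal, z)

        # Stochastic approximation step: step size 1 during burn-in, then 1/k.
        step = 1.0 if k < n_burn else 1.0 / (k - n_burn + 1)
        s1_stat += step * (z.mean() - s1_stat)
        s2_stat += step * ((z**2).mean() - s2_stat)

        # Maximization step: closed form given the sufficient statistics.
        mu = s1_stat
        s2 = max(s2_stat - s1_stat**2, 1e-8)
    return mu, s2
```

The burn-in phase with unit step size lets the chains move freely. The decreasing steps afterwards average the simulated statistics so that the parameter estimates stabilize.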

An example of the numerical model NM provided at the step S32 is schematically represented on Figure 4.

The numerical model NM can be provided by providing three curves, C_moy, C_1 and C_2, which correspond to three kinds of temporal progression for Alzheimer's disease. These curves may represent the average disease progression trajectory (C_moy) and the dispersion of this trajectory at plus or minus one standard deviation in the values of biomarkers for a plurality of subjects.

Two curves L1 and L2 are represented on Figure 4. These curves indicate a temporal correspondence between the mean curve C_moy and the second curve C_2.
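The curves C_moy, C_1 and C_2 can be sketched for illustration only. The sketch assumes that C_moy is a logistic curve with hypothetical parameters and that the dispersion is expressed as a standard deviation of time `sigma_tau` (also hypothetical). C_1 and C_2 are then the mean curve for subjects one standard deviation ahead of and behind the average. With these assumptions, the horizontal time gap between C_moy and C_2 at any biomarker value equals `sigma_tau`.

```python
import numpy as np

def logistic(t, p0=0.5, t0=75.0, v0=0.05):
    # Hypothetical mean trajectory C_moy: equals p0 at time t0 with slope v0.
    return 1.0 / (1.0 + (1.0 / p0 - 1.0) * np.exp(-v0 * (t - t0) / (p0 * (1.0 - p0))))

def dispersion_curves(t, sigma_tau):
    """Return (C_moy, C_1, C_2) sampled at times t (assumed construction)."""
    c_moy = logistic(t)
    c_1 = logistic(t + sigma_tau)  # subjects one standard deviation ahead
    c_2 = logistic(t - sigma_tau)  # subjects one standard deviation behind
    return c_moy, c_1, c_2

def time_to_reach(value, p0=0.5, t0=75.0, v0=0.05):
    # Inverse of the logistic curve: time at which C_moy reaches a given value.
    return t0 - p0 * (1.0 - p0) / v0 * np.log((1.0 / value - 1.0) / (1.0 / p0 - 1.0))
```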

As a specific example, in relation to Figures 5 to 7, it is now illustrated how to obtain a specific numerical model NM relevant to Alzheimer's disease.

Let $\mathbb{M}$ be a Riemannian manifold of dimension $N$ equipped with a Riemannian metric $g^{\mathbb{M}}$, which is assumed to be geodesically complete.

A Riemannian metric is geodesically complete if the geodesics of $\mathbb{M}$ are defined on $\mathbb{R}$, this notation standing for the set of real numbers. Recall that a geodesic is a curve drawn on the manifold $\mathbb{M}$ which has no acceleration.

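The Riemannian metric $g^{\mathbb{M}}$ defines a unique affine connection on $\mathbb{M}$, namely the Levi-Civita connection, denoted by $\nabla^{\mathbb{M}}$. Let $\gamma$ denote a geodesic of $\mathbb{M}$. Recall that, given a tangent vector $\xi \in T_{\gamma(t_0)}\mathbb{M}$, the parallel transport of $\xi$ along $\gamma$, denoted by $P_{\gamma, t_0, t}(\xi)$, is a vector field along $\gamma$ which satisfies the following equations:

$$P_{\gamma, t_0, t_0}(\xi) = \xi \quad \text{and} \quad \nabla^{\mathbb{M}}_{\dot\gamma(t)} P_{\gamma, t_0, t}(\xi) = 0 \quad \text{for all } t \in \mathbb{R}.$$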

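Let $p \in \mathbb{M}$.

The Riemannian exponential in $\mathbb{M}$ at $p$ is denoted by $\mathrm{Exp}^{\mathbb{M}}_p : T_p\mathbb{M} \to \mathbb{M}$. For $v \in T_p\mathbb{M}$, $\mathrm{Exp}^{\mathbb{M}}_p(v)$ denotes the value at time 1 of the geodesic in $\mathbb{M}$ issued from $p$ with initial velocity $v$.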
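The notions of geodesic, completeness and Riemannian exponential can be made concrete on a hypothetical instance that this passage does not impose: the one-dimensional manifold (0, 1) with the metric $g_p(u, v) = u v / (p^2 (1 - p)^2)$. Its geodesics are logistic curves, finite for every real time, so this metric is geodesically complete. They also have constant Riemannian speed, which is what "no acceleration" means here. The function names below are illustrative.

```python
import math

def metric(p, u, v):
    """Inner product g_p(u, v) on the manifold (0, 1) (assumed logistic metric)."""
    return u * v / (p**2 * (1.0 - p) ** 2)

def exp_map(p, v, t=1.0):
    """Geodesic issued from p with initial velocity v, evaluated at time t.

    With t = 1 this is the Riemannian exponential Exp_p(v). The expression is
    finite for every real t, which reflects geodesic completeness.
    """
    return 1.0 / (1.0 + (1.0 / p - 1.0) * math.exp(-v * t / (p * (1.0 - p))))

def speed(p, v, t, h=1e-6):
    # Riemannian norm of the geodesic velocity, by central finite differences.
    x = exp_map(p, v, t)
    dx = (exp_map(p, v, t + h) - exp_map(p, v, t - h)) / (2 * h)
    return math.sqrt(metric(x, dx, dx))
```

The speed returned by `speed` is, up to finite-difference error, the same at every time and equals $|v| / (p(1 - p))$.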

The temporal progression of a family of $N$ ($N \geq 2$) scalar biomarkers is studied. A longitudinal dataset of the form $(y_{i,j}, t_{i,j})_{1 \le i \le p,\ 1 \le j \le n_i}$ is considered. This longitudinal dataset is obtained by observing $p$ individuals at repeated time points, the i-th individual being observed $n_i$ times. The vector $y_{i,j} \in \mathbb{R}^N$ denotes the j-th observation of the i-th individual. The k-th coordinate of $y_{i,j}$, denoted by $y_{i,j,k}$, corresponds to the measurement of the k-th biomarker at time $t_{i,j}$.
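Because the number of visits $n_i$ differs across individuals, such a dataset can be stored as one array of time points and one array of measurements per individual. The container below is a hypothetical sketch. Its validation rules are illustrative assumptions: strictly increasing time points, at least two biomarkers, and values inside an assumed biomarker range (0, 1).

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class LongitudinalDataset:
    """Observations (y_{i,j}, t_{i,j}) for p individuals and N biomarkers.

    times[i] has shape (n_i,) and values[i] has shape (n_i, N), so the
    measurement y_{i,j,k} is values[i][j, k].
    """
    times: list
    values: list

    def __post_init__(self):
        if len(self.times) != len(self.values):
            raise ValueError("times and values must describe the same individuals")
        for t, y in zip(self.times, self.values):
            if y.ndim != 2 or y.shape[0] != t.shape[0]:
                raise ValueError("values[i] must have one row per time point")
            if y.shape[1] < 2:
                raise ValueError("at least N = 2 biomarkers are expected")
            if np.any(np.diff(t) <= 0):
                raise ValueError("time points must be strictly increasing")
            if np.any((y <= 0.0) | (y >= 1.0)):
                raise ValueError("measurements must lie in the open interval (0, 1)")

    @property
    def n_individuals(self):
        return len(self.times)

    @property
    def n_biomarkers(self):
        return self.values[0].shape[1]
```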

It is assumed that each measurement $y_{i,j,k}$ belongs to a one-dimensional Riemannian manifold $(M, g)$ which is geodesically complete. In this setting, the observations $y_{i,j} = (y_{i,j,1}, \dots, y_{i,j,N})$ can be considered as points in the product manifold $\mathbb{M} = M^N$. The average progression of this family of biomarkers is modeled by a geodesic trajectory on the manifold $\mathbb{M}$, which is equipped with the product metric, denoted by $g^{\mathbb{M}} = g \oplus \dots \oplus g$.

The numerical model NM is described for observations on a manifold which is a product of one-dimensional manifolds. This framework is particularly convenient to analyze the temporal progression of a family of biomarkers.

In order to determine the relative progression of the biomarkers among themselves, the average trajectory is chosen among the parametric family of geodesics:

$$\gamma_\delta : t \mapsto \big(\gamma_0(t + \delta_1), \gamma_0(t + \delta_2), \dots, \gamma_0(t + \delta_N)\big),$$

where:

• $\delta = (\delta_1, \dots, \delta_N) \in \mathbb{R}^N$, with $\delta_1 = 0$, and

• $\gamma_0$ is a geodesic of the one-dimensional manifold $M$, parametrized by a point $p_0 \in M$, a time $t_0 \in \mathbb{R}$ and a velocity $v_0 \in T_{p_0}M$.

This parametrization of the geodesic $\gamma_0$ is the natural parametrization such that:

$$\gamma_0(t_0) = p_0 \quad \text{and} \quad \dot\gamma_0(t_0) = v_0.$$

By choosing the average trajectory among this parametrized family of geodesics, it is assumed that, on average, the biomarkers follow the same trajectory but shifted in time. The delay between the progression of the different biomarkers is measured by the vector $\delta$: the differences $\delta_{k+1} - \delta_k$ measure the relative delay between two consecutive biomarkers. The parameter $t_0$ plays the role of reference time, as the trajectory of the first biomarker reaches the value $p_0$ at time $t_0$, whereas the other trajectories reach the same value $p_0$ at different points in time, shifted with respect to $t_0$.
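This delayed family of geodesics can be sketched under two assumptions. First, the family is taken to be $\gamma_\delta(t) = (\gamma_0(t + \delta_1), \dots, \gamma_0(t + \delta_N))$ with $\delta_1 = 0$. Second, $\gamma_0$ is a logistic geodesic of (0, 1), a hypothetical choice of $M$. Each component k then reaches $p_0$ at time $t_0 - \delta_k$, so only the first biomarker reaches it exactly at $t_0$. Function names and parameter values are illustrative.

```python
import numpy as np

def gamma0(t, p0, t0, v0):
    # Assumed logistic geodesic of (0, 1): gamma0(t0) = p0 and gamma0'(t0) = v0.
    return 1.0 / (1.0 + (1.0 / p0 - 1.0) * np.exp(-v0 * (t - t0) / (p0 * (1.0 - p0))))

def gamma_delta(t, p0, t0, v0, delta):
    """Average trajectory: component k is gamma0(t + delta[k]), with delta[0] = 0."""
    delta = np.asarray(delta, dtype=float)
    if delta[0] != 0.0:
        raise ValueError("the first delay must be zero (reference biomarker)")
    return gamma0(t + delta, p0, t0, v0)

def crossing_times(p0, t0, delta):
    # Time at which each component reaches p0: solve t + delta[k] = t0.
    return t0 - np.asarray(delta, dtype=float)
```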

The numerical model NM is a hierarchical model: data points are assumed to be sampled from subject-specific trajectories of progression. These individual trajectories are derived from the average trajectory $\gamma_\delta$. The subject-specific trajectory of the i-th individual is constructed by considering a non-zero tangent vector $w_i \in T_{\gamma_\delta(t_0)}\mathbb{M}$, orthogonal to $\dot\gamma_\delta(t_0)$ for the inner product $\langle \cdot, \cdot \rangle_{\gamma_\delta(t_0)}$ defined by the metric $g^{\mathbb{M}}$. This tangent vector $w_i$ is a space shift which makes it possible to register the individual trajectories in the space of measurements. The tangent vector $w_i$ is transported along the geodesic $\gamma_\delta$ from time $t_0$ to time $s$ using parallel transport. This transported tangent vector is denoted by $P_{\gamma_\delta, t_0, s}(w_i)$.

At the point $\gamma_\delta(s)$, a new point in $\mathbb{M}$ is obtained by taking the Riemannian exponential of $P_{\gamma_\delta, t_0, s}(w_i)$. This new point is denoted by

$$\eta^{w_i}(\gamma_\delta, s) = \mathrm{Exp}^{\mathbb{M}}_{\gamma_\delta(s)}\big(P_{\gamma_\delta, t_0, s}(w_i)\big).$$

As $s$ varies, this point describes the curve $s \mapsto \eta^{w_i}(\gamma_\delta, s)$, which is considered as a "parallel" to the curve $\gamma_\delta$ (see Figure 5). The orthogonality condition on the tangent vectors $w_i$ is an important hypothesis which ensures that a point on a parallel moves at the same pace on this parallel as on the average trajectory. This hypothesis ensures the uniqueness of the decomposition between spatial and temporal components.

The trajectory $\gamma_i$ of the i-th individual is obtained by reparametrizing the parallel: $\gamma_i(t) = \eta^{w_i}\big(\gamma_\delta, \psi_i(t)\big)$, where the mapping $\psi_i : t \mapsto \alpha_i (t - t_0 - \tau_i) + t_0$ is a subject-specific affine reparametrization which allows registering in time the different individual trajectories of progression. Such an affine reparametrization corresponds to a time-warp, whereas the tangent vector $w_i$ can be considered, in the light of the univariate model, as a random effect associated with the point $p_0$.

The parameter $\alpha_i$ is an acceleration factor which encodes whether the i-th individual is progressing faster or slower than the average, $\tau_i$ is a time-shift which characterizes the advance or delay of the i-th individual with respect to the average, and $w_i$ is a space-shift which encodes the variability in the measurements across individuals at a given stage, once the paces at which the individual trajectories are followed have been normalized. Each of these parameters is assumed to be an unobserved random variable.
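The respective roles of $\alpha_i$ and $\tau_i$ can be checked on a minimal sketch of the affine time-warp. It assumes the form $\psi_i(t) = \alpha_i(t - t_0 - \tau_i) + t_0$ and, for the acceleration factor, the log-parametrization $\alpha_i = \exp(\xi_i)$; the numerical values and function names are hypothetical.

```python
import math

def psi(t, alpha, tau, t0):
    """Affine time-warp psi_i(t) = alpha_i * (t - t0 - tau_i) + t0 (assumed form)."""
    return alpha * (t - t0 - tau) + t0

def psi_inv(u, alpha, tau, t0):
    """Inverse warp: age of the subject at which the average timeline reads u."""
    return (u - t0) / alpha + t0 + tau

t0 = 72.0
alpha_i = math.exp(0.3)  # hypothetical log-acceleration xi_i = 0.3: faster than average
tau_i = 4.0              # hypothetical time-shift: reference stage reached 4 years after t0
```

With this form, $\psi_i(t_0 + \tau_i) = t_0$: a positive $\tau_i$ means that the subject reaches the reference stage $\tau_i$ years after $t_0$ (a delay), and $\alpha_i > 1$ means that one year of the subject's life covers $\alpha_i$ years of the average timeline (faster progression).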

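Because $\mathbb{M}$ is equipped with the product metric, the parallel transport $P_{\gamma_\delta, t_0, s}(w_i)$ of the tangent vector $w_i = (w_{i,1}, \ldots, w_{i,N})$ is an N-dimensional vector whose k-th component is equal to the parallel transport of the tangent vector $w_{i,k}$ along the curve $t \mapsto \gamma_0(t + \delta_k)$ in the one-dimensional manifold $M$. In dimension one, parallel transport along a geodesic rescales a tangent vector by the ratio of the velocities. It follows that:

$P_{\gamma_\delta, t_0, s}(w_i) = \left( w_{i,k} \, \dfrac{\dot\gamma_0(s + \delta_k)}{\dot\gamma_0(t_0 + \delta_k)} \right)_{1 \le k \le N}$

Taking the Riemannian exponential, in $\mathbb{M}$, of the tangent vector $P_{\gamma_\delta, t_0, s}(w_i)$ boils down to taking the Riemannian exponential, in $M$, of each component of the vector. If $\mathrm{Exp}^M$ denotes the Riemannian exponential map in $M$, then, since $\mathrm{Exp}^M_{\gamma_0(u)}(v) = \gamma_0\big(u + v / \dot\gamma_0(u)\big)$ for any time $u$ and tangent vector $v$, the k-th component of $\eta^{w_i}(\gamma_\delta, s)$ is:

$\mathrm{Exp}^M_{\gamma_0(s + \delta_k)}\left( w_{i,k} \, \dfrac{\dot\gamma_0(s + \delta_k)}{\dot\gamma_0(t_0 + \delta_k)} \right) = \gamma_0\left( \dfrac{w_{i,k}}{\dot\gamma_0(t_0 + \delta_k)} + s + \delta_k \right)$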
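The closed-form expression of the parallel can be checked numerically. The sketch below assumes M = ]0, 1[ with the logistic metric, for which logit(p) is an arc-length coordinate and the exponential map has the explicit form used in the hypothetical helper `exp_map`. It compares the closed-form component $\gamma_0\big(w_{i,k}/\dot\gamma_0(t_0+\delta_k) + s + \delta_k\big)$ with the exponential map applied to the parallel-transported vector. All names and values are illustrative.

```python
import numpy as np

def gamma0(t, p0, t0, v0):
    """Logistic geodesic of ]0,1[ (assumed metric): gamma0(t0) = p0, gamma0'(t0) = v0."""
    rate = v0 / (p0 * (1.0 - p0))
    return 1.0 / (1.0 + (1.0 / p0 - 1.0) * np.exp(-rate * (np.asarray(t, float) - t0)))

def gamma0_dot(t, p0, t0, v0):
    """Time derivative of gamma0: rate * gamma0 * (1 - gamma0)."""
    g = gamma0(t, p0, t0, v0)
    return v0 / (p0 * (1.0 - p0)) * g * (1.0 - g)

def exp_map(p, v):
    """Riemannian exponential on ]0,1[ for the assumed logistic metric (logit = arc length)."""
    logit = np.log(p / (1.0 - p)) + v / (p * (1.0 - p))
    return 1.0 / (1.0 + np.exp(-logit))

def parallel(s, w, p0, t0, v0, delta):
    """Closed-form parallel eta^w(gamma_delta, s); component k is
    gamma0(w_k / gamma0_dot(t0 + delta_k) + s + delta_k)."""
    w = np.asarray(w, float)
    delta = np.asarray(delta, float)
    return gamma0(w / gamma0_dot(t0 + delta, p0, t0, v0) + s + delta, p0, t0, v0)
```

The vector `w` used to exercise this sketch need not satisfy the orthogonality condition: the closed form holds for any tangent vector, while orthogonality only matters for identifiability.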

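For the longitudinal dataset $(y_{i,j})_{i,j}$, where $y_{i,j} \in \mathbb{M}$ denotes the j-th observation of the i-th individual, acquired at time $t_{i,j}$, the numerical model NM writes:

$y_{i,j} = \eta^{w_i}\big(\gamma_\delta, \psi_i(t_{i,j})\big) + \varepsilon_{i,j}$

In particular, for the k-th biomarker, this numerical model NM writes:

$y_{i,j,k} = \gamma_0\left( \dfrac{w_{i,k}}{\dot\gamma_0(t_0 + \delta_k)} + \alpha_i (t_{i,j} - t_0 - \tau_i) + t_0 + \delta_k \right) + \varepsilon_{i,j,k}$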

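Wherein:

• $\alpha_i = \exp(\xi_i)$, the log-acceleration factor $\xi_i$ being assumed to follow the probability distribution $\mathcal{N}(0, \sigma_\xi^2)$, $\mathcal{N}(0, \sigma_\xi^2)$ being the Gaussian distribution with zero mean and variance $\sigma_\xi^2$,

• $\tau_i$ is assumed to follow the probability distribution $\mathcal{N}(0, \sigma_\tau^2)$, $\sigma_\tau^2$ being the variance of the Gaussian distribution,

• $\varepsilon_{i,j}$ is assumed to follow the probability distribution $\mathcal{N}(0, \sigma^2 I_N)$, $\sigma^2$ being the variance of the Gaussian distribution and $I_N$ the identity matrix of order N, and

• the tangent vectors $w_i$ are assumed to be a linear combination of $N_s < N$ statistically independent components. This writes $w_i = A s_i$, where A is an $N \times N_s$ matrix of rank $N_s$ whose columns are vectors of $T_{\gamma_\delta(t_0)}\mathbb{M}$ orthogonal to $\dot\gamma_\delta(t_0)$, and $s_i$ is a vector of $N_s$ independent sources following a Laplace distribution with parameter 1/2.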
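A minimal simulation of the generative model shows how fixed and random effects combine. This sketch assumes the logistic metric on ]0, 1[, reads "Laplace distribution with parameter 1/2" as a Laplace distribution with scale 1/2, and builds a hypothetical matrix A from small random columns projected so that they are orthogonal to $\dot\gamma_\delta(t_0)$ for the product metric (whose diagonal at $\gamma_\delta(t_0)$ is computed as `g`). Names and values are illustrative; this is not the Applicant's implementation.

```python
import numpy as np

def gamma0(t, p0, t0, v0):
    rate = v0 / (p0 * (1.0 - p0))
    return 1.0 / (1.0 + (1.0 / p0 - 1.0) * np.exp(-rate * (np.asarray(t, float) - t0)))

def gamma0_dot(t, p0, t0, v0):
    g = gamma0(t, p0, t0, v0)
    return v0 / (p0 * (1.0 - p0)) * g * (1.0 - g)

def simulate(times, p0, t0, v0, delta, n_sources, sigma_xi, sigma_tau, sigma, rng):
    """Draw one synthetic longitudinal dataset from the model (illustration only).

    times[i] holds the visit ages of subject i. Returns the noisy observations
    (one (n_i, N) array per subject), the random effects, A, u and g.
    """
    delta = np.asarray(delta, float)
    u = gamma0_dot(t0 + delta, p0, t0, v0)      # velocity of gamma_delta at t0
    p = gamma0(t0 + delta, p0, t0, v0)
    g = 1.0 / (p * (1.0 - p)) ** 2              # diagonal of the product metric at gamma_delta(t0)
    # Hypothetical A: small random columns, projected to be g-orthogonal to u.
    A = 0.01 * rng.standard_normal((delta.size, n_sources))
    A -= np.outer(u, (g * u) @ A) / ((g * u) @ u)
    data, effects = [], []
    for t_i in times:
        xi = rng.normal(0.0, sigma_xi)                   # log-acceleration factor
        tau = rng.normal(0.0, sigma_tau)                 # time-shift
        s_i = rng.laplace(0.0, 0.5, size=n_sources)      # sources; scale 1/2 assumed
        w = A @ s_i                                      # space-shift
        warped = np.exp(xi) * (np.asarray(t_i, float)[:, None] - t0 - tau) + t0
        y = gamma0(w / u + warped + delta, p0, t0, v0)   # shape: visits x biomarkers
        data.append(y + rng.normal(0.0, sigma, size=y.shape))
        effects.append((xi, tau, w))
    return data, effects, A, u, g
```

For instance, `simulate([np.array([70.0, 71.0, 72.5])], 0.3, 72.0, 0.04, [0.0, -15.0, -13.0, -5.0], 2, 0.8, 7.5, 0.01, np.random.default_rng(0))` returns one subject with three visits of four normalized scores.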

As a consequence, the fixed effects of the model are the parameters of the average geodesic: the point $p_0$ on the manifold, the time-point $t_0$ and the velocity $v_0$. The random effects are the acceleration factors $\alpha_i$, the time-shifts $\tau_i$ and the space-shifts $w_i$. The random effects are considered as hidden variables; together with the observed data $(y_{i,j})_{i,j}$, they form the complete data of the model. In this context, the Expectation-Maximization (EM) algorithm is well suited to compute the maximum likelihood estimate of the parameters of the model, denoted Θ.

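In other words, the numerical model NM depends on the vector parameter Θ, which writes:

$\Theta = \big(p_0, t_0, v_0, (\delta_k)_{2 \le k \le N}, \mathrm{vec}(A), \sigma_\xi, \sigma_\tau, \sigma\big)$

where vec(A) stands for the entries of the matrix A concatenated into a row vector. In addition, the random effects of the model are described by $(\xi_i, \tau_i, s_i)_i$.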

Because of the nonlinearity of the model, the E step of the EM algorithm is intractable. A stochastic version of the EM algorithm, namely the Monte-Carlo Markov Chain Stochastic Approximation Expectation-Maximization (MCMC-SAEM) algorithm, is therefore used to estimate the vector parameter Θ of the numerical model NM.

In order to ensure the theoretical convergence of the MCMC-SAEM algorithm, the parameters of the model are considered as realizations of independent Gaussian random variables, which ensures that the model belongs to the curved exponential family.

This approach leads to the following hypotheses:

• $p_0$ follows the probability distribution $\mathcal{N}(\overline{p_0}, \sigma_{p_0}^2)$, $\overline{p_0}$ and $\sigma_{p_0}^2$ being the mean and variance of the Gaussian distribution respectively,

• $t_0$ follows the probability distribution $\mathcal{N}(\overline{t_0}, \sigma_{t_0}^2)$, $\overline{t_0}$ and $\sigma_{t_0}^2$ being the mean and variance of the Gaussian distribution respectively,

• $v_0$ follows the probability distribution $\mathcal{N}(\overline{v_0}, \sigma_{v_0}^2)$, $\overline{v_0}$ and $\sigma_{v_0}^2$ being the mean and variance of the Gaussian distribution respectively,

• for all k, $\delta_k$ follows the probability distribution $\mathcal{N}(\overline{\delta_k}, \sigma_\delta^2)$, $\overline{\delta_k}$ and $\sigma_\delta^2$ being the mean and variance of the Gaussian distribution respectively, and

• the matrix A is assumed to be writable as $A = \sum_k c_k A_k$ where, for all k, $c_k$ follows the probability distribution $\mathcal{N}(\overline{c_k}, \sigma_c^2)$, $\overline{c_k}$ and $\sigma_c^2$ being the mean and variance of the Gaussian distribution respectively, $(A_k)_k$ being an orthonormal basis, obtained using a Gram-Schmidt process, of the subspace of $N \times N_s$ matrices whose columns are orthogonal to $\dot\gamma_\delta(t_0)$.

Under this hypothesis, the random variables $p_0$, $t_0$, $v_0$, $(\delta_k)_k$ and $(c_k)_k$ are considered hidden variables of the model. The previous hypothesis on the matrix A ensures the orthogonality condition on the columns of A. The orthogonality condition ensures that a point on the parallel curve moves at the same pace as on the average trajectory, so that only the parameters of the time-reparametrization function account for the difference in the dynamics of the progression across the subjects. This property is crucial to ensure a unique decomposition between the spatial and temporal components, and thus the identifiability of the model.
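The decomposition $A = \sum_k c_k A_k$ requires an orthonormal basis of the matrices whose columns are orthogonal to $\dot\gamma_\delta(t_0)$. One possible construction, sketched below under assumptions, applies a modified Gram-Schmidt process for the diagonal inner product $\langle a, b\rangle_g = \sum_k g_k a_k b_k$ (g standing for the metric at $\gamma_\delta(t_0)$), then places each resulting vector in one column of an otherwise zero $N \times N_s$ matrix. The weights, the vector u and the helper names are hypothetical.

```python
import numpy as np

def complement_basis(u, g):
    """Modified Gram-Schmidt for <a, b> = sum(g * a * b): orthonormal basis of u's orthogonal complement."""
    inner = lambda a, b: float(np.sum(g * a * b))
    basis = [u / np.sqrt(inner(u, u))]
    for e in np.eye(u.size):
        v = e.copy()
        for b in basis:
            v -= inner(v, b) * b
        norm = np.sqrt(inner(v, v))
        if norm > 1e-10 * np.sqrt(inner(e, e)):   # skip the vector that became dependent
            basis.append(v / norm)
    return basis[1:]                              # drop the normalized u itself

def matrix_basis(u, g, n_sources):
    """(N - 1) * N_s matrices, each with one non-zero column taken from the complement basis."""
    mats = []
    for j in range(n_sources):
        for v in complement_basis(u, g):
            B = np.zeros((u.size, n_sources))
            B[:, j] = v
            mats.append(B)
    return mats
```

This yields $(N-1)N_s$ basis matrices, so that any choice of the coefficients $c_k$ produces a matrix A whose columns automatically satisfy the orthogonality condition.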

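Therefore, under the previous hypotheses, the vector parameter Θ of the model is:

$\Theta = \big(\overline{p_0}, \overline{t_0}, \overline{v_0}, (\overline{\delta_k})_k, (\overline{c_k})_k, \sigma_\xi, \sigma_\tau, \sigma\big)$

whereas the hidden variables z of the model are:

$z = \big(p_0, t_0, v_0, (\delta_k)_k, (c_k)_k, (\xi_i)_i, (\tau_i)_i, (s_i)_i\big)$

To obtain the vector parameter Θ and the hidden variables z, the MCMC-SAEM algorithm is iterated, until convergence, over three sub-steps: a first sub-step of simulation, a second sub-step of stochastic approximation and a third sub-step of maximization.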

Let k be a positive integer, and let $\Theta^{(k)}$ (respectively $z^{(k)}$) denote the parameters (respectively the hidden variables) at the k-th iteration of the algorithm.

The k-th iteration can be described as follows.

At the sub-step of simulation, $z^{(k)}$ is sampled from the transition kernel of an ergodic Markov chain whose stationary distribution is the conditional distribution of the hidden variables knowing the observations and the current estimates of the parameters $\Theta^{(k-1)}$. This sampling is done by using a Metropolis-Hastings technique within a Gibbs sampler scheme.
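The simulation sub-step can be illustrated with a Metropolis-Hastings update that, as in the experiments reported in this document, uses the prior distribution as proposal. With this choice the prior densities cancel in the acceptance ratio, which reduces to the likelihood ratio. The sketch below is generic and hypothetical: `log_lik` stands for the log-likelihood of the observations given all hidden variables, and each hidden variable is updated in turn (Gibbs scheme).

```python
import numpy as np

def mh_prior_proposal(current, log_lik, sample_prior, rng):
    """One Metropolis-Hastings update of a single hidden variable.

    With the prior as proposal, prior terms cancel and the acceptance
    probability reduces to min(1, L(proposal) / L(current)).
    """
    proposal = sample_prior(rng)
    log_ratio = log_lik(proposal) - log_lik(current)
    if np.log(rng.uniform()) < log_ratio:
        return proposal, True
    return current, False

def gibbs_sweep(z, log_lik_coord, priors, rng):
    """Update each coordinate of z in turn, the other coordinates being held fixed."""
    z = dict(z)
    for name, sample_prior in priors.items():
        def ll(value, name=name):
            return log_lik_coord({**z, name: value})
        z[name], _ = mh_prior_proposal(z[name], ll, sample_prior, rng)
    return z
```

For example, with one observation y = 3, unit Gaussian noise and a $\mathcal{N}(0, 2^2)$ prior on a time-shift, repeated sweeps produce samples whose mean approaches the exact posterior mean 3 × 4/5 = 2.4.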

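At the sub-step of stochastic approximation, the stochastic approximation is done by calculating sufficient statistics as follows:

$S^{(k)} = S^{(k-1)} + \varepsilon_k \big( S(y, z^{(k)}) - S^{(k-1)} \big)$

where $(\varepsilon_k)_k$ is a decreasing sequence of positive step sizes and $S(y, z)$ denotes the sufficient statistics of the complete data.

In other words, the stochastic approximation sub-step consists in a stochastic approximation on the complete log-likelihood $\log q(y, z \mid \Theta)$, summarized as follows:

$Q_k(\Theta) = Q_{k-1}(\Theta) + \varepsilon_k \big( \log q(y, z^{(k)} \mid \Theta) - Q_{k-1}(\Theta) \big)$

where $(\varepsilon_k)_k$ is a decreasing sequence of positive step sizes in ]0, 1] which satisfies $\sum_k \varepsilon_k = +\infty$ and $\sum_k \varepsilon_k^2 < +\infty$.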
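The stochastic approximation and the conditions on $(\varepsilon_k)$ can be sketched as follows. The schedule is an assumption, not taken from the text: $\varepsilon_k = 1$ during a burn-in phase (so that the statistics simply track the latest simulation), then $\varepsilon_k = (k - K_{\text{burn-in}})^{-a}$ with $1/2 < a \le 1$, which guarantees $\sum_k \varepsilon_k = +\infty$ and $\sum_k \varepsilon_k^2 < +\infty$.

```python
import numpy as np

def step_size(k, n_burn_in, exponent=0.65):
    """eps_k = 1 during burn-in, then (k - n_burn_in)^(-exponent).

    Any 1/2 < exponent <= 1 gives sum eps_k = +inf and sum eps_k^2 < +inf.
    """
    if k <= n_burn_in:
        return 1.0
    return float((k - n_burn_in) ** (-exponent))

def sa_update(S_prev, S_new, eps):
    """S_k = S_{k-1} + eps_k * (S(y, z_k) - S_{k-1}): a convex combination when 0 < eps <= 1."""
    S_prev = np.asarray(S_prev, float)
    return S_prev + eps * (np.asarray(S_new, float) - S_prev)
```

Calling `step_size(k, 3000)` for k up to 5000 reproduces the burn-in setting reported for the second case of the first experiment; the exponent 0.65 is hypothetical.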

At the sub-step of maximization, parameter updates are obtained in closed form from the stochastic approximation on the sufficient statistics.
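As an illustration of such closed-form updates, assume Gaussian noise $\varepsilon_{i,j} \sim \mathcal{N}(0, \sigma^2 I_N)$. The complete log-likelihood then depends on $\sigma^2$ only through $-\frac{nN}{2}\log\sigma^2 - \frac{S_{\mathrm{res}}}{2\sigma^2}$, where n is the total number of visits and $S_{\mathrm{res}}$ the sum of squared residuals; setting the derivative to zero gives $\sigma^2 = S_{\mathrm{res}}/(nN)$. In the algorithm, $S_{\mathrm{res}}$ would be replaced by its stochastic approximation. The helper names below are hypothetical.

```python
import numpy as np

def residual_statistic(data, predictions):
    """Sufficient statistic for sigma^2: sum of squared residuals over all visits and biomarkers."""
    return sum(float(np.sum((np.asarray(y) - np.asarray(f)) ** 2)) for y, f in zip(data, predictions))

def m_step_sigma2(S_res, n_visits, n_biomarkers):
    """Maximizer of -(n N / 2) log(s2) - S_res / (2 s2), with n visits in total and N biomarkers."""
    return S_res / (n_visits * n_biomarkers)

def m_step_sigma_tau2(S_tau, n_subjects):
    """Same closed form for the time-shift variance, S_tau approximating sum_i tau_i^2."""
    return S_tau / n_subjects
```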

In other words, the parameter estimates are updated in the maximization step according to $\Theta^{(k)} = \arg\max_\Theta Q_k(\Theta)$.

In summary, a generic hierarchical spatiotemporal model for longitudinal manifold-valued data has been disclosed. The data consist of repeated measurements over time for a group of individuals. This numerical model NM enables estimating a group-average trajectory of progression, considered as a geodesic of a given Riemannian manifold. Individual trajectories of progression are obtained as random variations of the average trajectory, which consist in parallel shifting and time reparametrization. These spatiotemporal transformations allow the Applicant to characterize changes in the direction and in the pace at which trajectories are followed. The parameters of the model are estimated using a stochastic approximation of the expectation-maximization (EM) algorithm, the Monte Carlo Markov Chain Stochastic Approximation EM (MCMC-SAEM) algorithm. Experimental results obtained with the numerical model NM are illustrated in the experimental section.

Thus, at the end of the step of providing S32, both the first data and the numerical model NM are obtained.

At the step of converting S34, the first data are converted into at least one point on the same Riemann manifold RM.

At the step of using S36, the numerical model NM is used to determine a temporal progression of Alzheimer's disease for the studied subject.

Steps S34 and S36 are difficult to carry out insofar as they require adapting (personalizing) the parameters of the NM so that the trajectory of progression passes through, or as close as possible to, the first data converted into points on the Riemann manifold RM.

For instance, one may use the dispersion parameters of the NM to generate a series of trajectories on the Riemann manifold RM, which derive from the average trajectory by spatiotemporal transformations, such as the curves C1 and C2 in the example of Figure 4. The point(s) of the Riemann manifold RM to which the first data correspond are used to select the curve which minimizes a distance to these points.

If the distance to the first curve C1 is the smallest, it is determined that the expected temporal progression of Alzheimer's disease for the studied subject is the first curve C1.
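The selection among candidate curves such as C1 and C2 can be sketched as follows. The text only requires "a distance"; the sketch assumes the sum of squared Euclidean distances between each observation and the point of the candidate curve at the same time, and represents curves as hypothetical Python callables.

```python
import numpy as np

def select_curve(curves, times, observations):
    """Index of the candidate curve closest to the subject's data, and all costs.

    curves: callables t -> point of R^N (hypothetical stand-ins for C1, C2, ...).
    Cost: sum over visits of the squared Euclidean distance between the
    observation and the curve point at the same time (assumed distance).
    """
    obs = np.asarray(observations, float)
    costs = []
    for curve in curves:
        points = np.array([curve(t) for t in times], float)
        costs.append(float(np.sum((points - obs) ** 2)))
    return int(np.argmin(costs)), costs
```

A Riemannian distance on RM could replace the Euclidean one without changing the structure of the selection.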

The present invention therefore enables determining the temporal progression of a biological phenomenon based on data taken from a subject.

The present method provides good versatility insofar as the Riemannian manifold RM and its metric are chosen a priori, which allows anatomical and physiological constraints to be introduced into the model. The definition of the generic spatiotemporal model requires no other choice. The models introduced herein are based on the concept of parallel curves on a manifold. The random effects of the model allow spatial and temporal registration of the individual trajectories of progression.

The absence of any other required choice distinguishes the present invention from the prior art, in which the model is constrained to fit the data. In other words, in the prior art a reduced model (with hypotheses) is used to fit the data, which means that the hypotheses are already known prior to any fitting. On the contrary, in the present method the model is adapted to the data in step S36, and the obtained model enables determining the hypotheses. In such a context, step S34 can be construed as a step of rescaling the parameters of the model so that the model lies close to the data. This rescaling is notably explained in the second and third experiments. In addition, it should be stressed that the proposed method for determining is particularly easy to implement insofar as no constraints are imposed on the first data.

Given the complexity of the problem to address, it could indeed have been expected that constraints be imposed on the data to be provided. In particular, there is no need for data taken at specific time points. As a specific example, there is no need to obtain data at a specific reference time, such as the date at which the disease starts.

Furthermore, the present method for determining a temporal progression is usable in multiple applications.

For instance, such method for determining a temporal progression is used in a method for predicting that a subject is at risk of suffering from a disease.

The method for predicting comprises carrying out the steps of the method for determining the temporal progression of a biological phenomenon which may affect a studied subject, the biological phenomenon being the disease, to obtain a first temporal progression. The method for predicting also comprises predicting that the subject is at risk of suffering from the disease based on the first temporal progression.

According to a specific embodiment, the method for predicting further provides when specific symptoms are expected to occur for the subject.

According to another example, the method for determining a temporal progression is used in a method for diagnosing a disease.

The method for diagnosing comprises carrying out the steps of the method for determining the temporal progression of a biological phenomenon which may affect a studied subject, the biological phenomenon being the disease, to obtain a first temporal progression. The method for diagnosing also comprises a step for diagnosing the disease based on the first temporal progression.

According to another example, the method for determining a temporal progression is used in a method for identifying a therapeutic target for preventing and/or treating a pathology.

The method for identifying a therapeutic target comprises carrying out the steps of the method for determining the temporal progression of a biological phenomenon which may affect a studied subject, the first data being data relative to a subject suffering from the pathology, to obtain a first temporal progression.

The method for identifying a therapeutic target also comprises carrying out the steps of the method for determining a temporal progression which may affect a studied subject, the first data being data relative to a subject not suffering from the pathology, to obtain a second temporal progression. The method for identifying a therapeutic target also comprises a step of selecting a therapeutic target based on the comparison of the first and second temporal progressions.

In such a context, the term « therapeutic target » should be construed broadly as also encompassing the selection of specific kinds of patients.

According to yet another example, the method for determining is used in a method for identifying a biomarker.

The biomarker may vary according to the specific example considered. For instance, the biomarker is a diagnostic biomarker of a pathology. As a variant, the biomarker is a susceptibility biomarker of a pathology, a prognostic biomarker of a pathology or a predictive biomarker of the response to the treatment of a pathology.

The method for identifying a biomarker comprises carrying out the steps of the method for determining the temporal progression of a biological phenomenon which may affect a studied subject, the first data being data relative to a subject suffering from the pathology, to obtain a first temporal progression.

The method for identifying a biomarker also comprises carrying out the steps of the method for determining the temporal progression of a biological phenomenon which may affect a studied subject, the first data being data relative to a subject not suffering from the pathology, to obtain a second temporal progression.

The method for identifying a biomarker also comprises a step of selecting a biomarker based on the comparison of the first and second temporal progressions.

According to another example, the method for determining is used in a method for screening a compound useful as a medicine.

The compound has an effect on a known therapeutic target, for preventing and/or modifying and/or treating a pathology.

The method for screening comprises carrying out the steps of the method for determining the temporal progression of a biological phenomenon which may affect a studied subject, the first data being data relative to a subject suffering from the pathology and having received the compound, to obtain a first temporal progression.

The method for screening also comprises carrying out the steps of the method for determining the temporal progression of a biological phenomenon which may affect a studied subject, the first data being data relative to a subject suffering from the pathology and not having received the compound, to obtain a second temporal progression.

The method for screening also comprises a step of selecting the compound based on the comparison of the first and second temporal progressions.

Each previously described application illustrates the possibilities offered by the proposed method for determining a temporal progression. The embodiments and alternative embodiments considered hereabove can be combined to generate further embodiments of the invention.

EXPERIMENTAL SECTION: FIRST EXPERIMENT

The numerical model NM is used to analyze the temporal progression of a family of biomarkers. The estimated average trajectory of this progression model provides a normative scenario of the progressive impairments of several cognitive functions, considered here as biomarkers, during the course of Alzheimer's disease. Random effects provide unique insights into the variations in the ordering and timing of the succession of cognitive impairments across different individuals.

Data considered

The results of the neuropsychological assessment test "ADAS-Cog 13" from the ADNI1, ADNIGO or ADNI2 cohorts of the Alzheimer's Disease Neuroimaging Initiative (ADNI) were used. These data are notably available at the internet address https://ida.loni.usc.edu/.

The "ADAS-Cog 13" consists of 13 questions, which allow testing the impairment of several cognitive functions. For the purpose of our analysis, these items are grouped into four categories: memory (5 items, namely items 1, 4, 7, 8 and 9), language (5 items, namely items 2, 5, 10, 11 and 12), praxis (2 items, namely items 3 and 6) and concentration (1 item, namely item 13).

248 individuals were included in the study. These 248 individuals were diagnosed with mild cognitive impairment at their first visit, and their diagnosis changed to Alzheimer's disease before their last visit. There is an average of 6 visits per subject, with an average duration of 6 or 12 months between consecutive visits. The minimum number of visits was 3 and the maximum number of visits was 11.

For the experiments of Figures 8 to 12

According to a first case, the score of each item was normalized by the maximum possible score. Consequently, each data point of each individual consists of thirteen normalized scores, which can be seen as a point on the manifold $M = \,]0, 1[^{13}$.

In the case where $M = \,]0, 1[^{13}$, the number of independent sources $N_s$ can be any integer between 1 and 12. The choice of the number of independent sources influences the number of parameters to be estimated, which equals $9 + 12 N_s$. In order to keep a reasonable runtime, three experiments were conducted with $N_s$ equal to 1, 2 and 3. For each experiment, the MCMC-SAEM algorithm was run five times with different initial parameters, and only the run which returned the smallest residual noise variance was kept. Increasing the number of sources decreased the residual noise: $\sigma^2 = 0.02$ for $N_s = 1$, $\sigma^2 = 0.0162$ for $N_s = 2$ and $\sigma^2 = 0.0159$ for $N_s = 3$. Because the residual noise was almost the same for $N_s = 2$ and $N_s = 3$ sources, the results obtained with the less complex model, with two independent sources, are further developed below.
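The parameter count and the choice of the number of sources can be reproduced with simple arithmetic. The formula $9 + (N - 1) N_s$ is an inference: it matches both counts quoted in this section ($9 + 12 N_s$ for the 13 items and $9 + 3 N_s$ for the 4 category scores). The selection rule, keeping the smallest $N_s$ whose residual variance lies within a relative tolerance of the best one, is a hypothetical formalization of "almost the same"; with a 5% tolerance it selects two sources for the reported variances.

```python
def n_parameters(n_sources, n_biomarkers):
    """9 + (N - 1) * N_s: matches both counts quoted in the text (N = 13 and N = 4)."""
    return 9 + (n_biomarkers - 1) * n_sources

def pick_simplest(residual_variance, rel_tol=0.05):
    """Smallest N_s whose residual variance is within rel_tol of the best one (rel_tol is assumed)."""
    best = min(residual_variance.values())
    return min(ns for ns, v in residual_variance.items() if v <= best * (1.0 + rel_tol))

first_case = {1: 0.02, 2: 0.0162, 3: 0.0159}   # residual variances reported for N = 13
```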

The average trajectory $\gamma_\delta$ is given in Figure 8, where each curve represents the temporal progression of one specific item of the ADAS-Cog test. The estimated fixed effects are $p_0 = 0.74$, $t_0 = 79.88$ years, $v_0 = 0.047$ unit per year, and, in years:

δ = [0; -14; -11; 4.6; -13; -14; -7.7; -0.9; -14.4; -14.05; -11.80; -15.3292]

This means that, on average, the memory-related items (items 1, 4, 7, 8 and 9) reach the value $p_0 = 0.74$ at respectively $t_0$, $t_0 - \delta_4$, $t_0 - \delta_7$, $t_0 - \delta_8$ and $t_0 - \delta_9$ years, which correspond to respectively 79.88, 75.2, 87.6, 80.7 and 94.3 years. The concentration item reaches the same value at $t_0 - \delta_{13} = 86.1$ years. The progression of the concentration item is followed by that of the praxis and language items.
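The ages at which each item reaches $p_0$ follow from the estimated fixed effects as $t_0 - \delta_k$. The short check below uses the values listed in the text for both cases (for the first case, the δ list is taken as printed and only the memory items are checked); it reproduces the quoted ages within 0.1 year.

```python
def crossing_age(k, t0, delta):
    """Age at which item k (1-based) reaches p0 on the average trajectory: t0 - delta_k."""
    return t0 - delta[k - 1]

# First case (13 items), values as listed in the text.
t0_items = 79.88
delta_items = [0, -14, -11, 4.6, -13, -14, -7.7, -0.9, -14.4, -14.05, -11.80, -15.3292]
memory = {k: round(crossing_age(k, t0_items, delta_items), 2) for k in (1, 4, 7, 8, 9)}

# Second case (memory, language, praxis, concentration).
t0_scores = 72.0
delta_scores = [0, -15, -13, -5]
scores = [crossing_age(k, t0_scores, delta_scores) for k in range(1, 5)]
```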

Random effects show the variability of this average trajectory within the studied population. The standard deviation of the time-shift equals $\sigma_\tau = 8.3$ years, meaning that the disease progression model in Figure 8 is shifted in time by at most ±8.3 years (one standard deviation) for about 68% of the population, and by at most about ±16.3 years (1.96 standard deviations) for 95% of the population. This accounts for the variability in the age at disease onset among the population. The effects of the variance of the acceleration factors and of the two independent components of the space-shifts are illustrated in Figure 10.
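For a zero-mean Gaussian time-shift, the share of the population shifted by at most $k\sigma_\tau$ is $\mathrm{erf}(k/\sqrt{2})$. The computation below, for $\sigma_\tau = 8.3$ years, relates one standard deviation to the 95% interval.

```python
import math

def coverage(k):
    """Share of a zero-mean Gaussian population within k standard deviations."""
    return math.erf(k / math.sqrt(2.0))

sigma_tau = 8.3                    # years, first case of the first experiment
one_sd_share = coverage(1.0)       # share of subjects shifted by at most +/- sigma_tau
half_width_95 = 1.96 * sigma_tau   # half-width (years) of the interval holding 95% of subjects
```

One standard deviation covers about 68% of the population; covering 95% requires a half-width of about 16.3 years.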

The first column of Figure 10 illustrates the variability in the pace of disease progression (the time-shifts are set to zero in order to illustrate the effect of the acceleration factor only). This variability is encoded by the parameter $\sigma_\xi = 0.8$ of the Gaussian distribution of the log-acceleration factor.

The first and second independent components illustrate the variability in the relative timing of the cognitive impairments.

The first independent component shows that some memory items and language items are shifted in time with respect to the other ones, especially memory items 4 and 7 (°). The ordering of memory item 7 (°) and of the concentration item is inverted for individuals with a space shift along this component. For those individuals, the praxis items are impaired later, after language items 2 (*), 12 (Δ) and 5.

The second independent component shows a greater variability for the memory-related items than the first independent component, in particular for memory items 9 (Δ) and 4. For individuals with a space shift along this component, language-related items might be impaired later than for the average individual, especially language item 12 (Δ).

The subject-specific random effects estimated for each individual are obtained from the sampling step of the last iteration of the MCMC-SAEM and are plotted in Figure 9. Figure 9 shows that the individuals who have a positive (respectively negative) time shift, i.e. who are evolving behind (respectively ahead of) the average trajectory, are the individuals who converted late (respectively early) to Alzheimer's disease. This means that the individual time-shifts correspond well to the age at which a given individual was diagnosed with Alzheimer's disease. It is also to be noted that there is a negative correlation, equal to -0.4, between the estimated log-acceleration factors and time shifts: there is a tendency for early-onset patients to be fast progressors.

Through its subject-specific affine reparametrization, the age of a given individual is registered to the common timeline of the average scenario.

Figure 11 represents the evolution with time $t$ of the sum of the absolute errors $\sum_i |\psi_i(t_i^{\mathrm{diag}}) - t|$, where $t_i^{\mathrm{diag}}$ corresponds to the age at which the i-th individual converted to Alzheimer's disease. As apparent in Figure 11, the sum of the absolute errors shows a unique minimum at $t^* = 77.45$ years. This age can be understood as the age of symptom onset in the timeline of the normative scenario of disease progression.

Figure 12 is a histogram giving the number of individuals as a function of the absolute error on the age of conversion to Alzheimer's disease, in years, that is $|\psi_i^{-1}(t^*) - t_i^{\mathrm{diag}}|$. The analysis of the histogram of Figure 12 shows that the age $\psi_i^{-1}(t^*)$ is a good prediction of the true age of conversion $t_i^{\mathrm{diag}}$: the prediction error is less than 5 years for 50% of the population. This prediction is all the more remarkable as it is obtained by analyzing cognitive scores, which are inherently noisy and of limited reproducibility.
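The onset age $t^*$ and the predicted conversion ages can be sketched under two assumptions: the time-warp has the affine form $\psi_i(t) = \alpha_i(t - t_0 - \tau_i) + t_0$, and the curve of Figure 11 is the sum over individuals of $|\psi_i(t_i^{\mathrm{diag}}) - t|$. Under these assumptions, $t^*$ is a median of the normalized conversion ages (a median minimizes a sum of absolute deviations), and $\psi_i^{-1}(t^*)$ is the predicted age of conversion of subject i. All names are hypothetical.

```python
import numpy as np

def psi(t, alpha, tau, t0):
    """Assumed affine time-warp alpha_i (t - t0 - tau_i) + t0 (vectorized over subjects)."""
    return alpha * (t - t0 - tau) + t0

def psi_inv(u, alpha, tau, t0):
    """Inverse time-warp: subject age corresponding to time u of the average timeline."""
    return (u - t0) / alpha + t0 + tau

def onset_and_predictions(t_diag, alpha, tau, t0):
    """t* = median of the normalized conversion ages, which minimizes sum_i |psi_i(t_diag_i) - t|.

    Returns t*, the predicted conversion ages psi_i^{-1}(t*) and their absolute errors.
    """
    t_diag = np.asarray(t_diag, float)
    alpha = np.asarray(alpha, float)
    tau = np.asarray(tau, float)
    normalized = psi(t_diag, alpha, tau, t0)
    t_star = float(np.median(normalized))
    predicted = psi_inv(t_star, alpha, tau, t0)
    return t_star, predicted, np.abs(predicted - t_diag)
```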

For the experiments of Figures 13 to 16

According to a second case, the scores within each category are added and normalized by the maximum possible score. Consequently, each data point consists of four normalized scores, which can be seen as a point on the manifold $M = \,]0, 1[^4$.

The model was applied with $N_s$ = 1, 2 or 3 independent sources. In each experiment, the MCMC-SAEM was run five times with different initial parameter values, and the run which returned the smallest residual variance $\sigma^2$ was kept. The maximum number of iterations was arbitrarily set to 5000 and the number of burn-in iterations was set to 3000. The limit of 5000 iterations is enough to observe the convergence of the sequences of parameter estimates. As a result, two and three sources allowed decreasing the residual variance better than one source ($\sigma^2 = 0.012$ for one source, $\sigma^2 = 0.08$ for two sources and $\sigma^2 = 0.084$ for three sources). The algorithm was implemented in MATLAB® without any particular optimization scheme. The 5000 iterations required approximately one day.

The number of parameters to be estimated was equal to $9 + 3 N_s$. Therefore, the number of sources did not dramatically impact the runtime. Simulation was the most computationally expensive part of the algorithm. For each run of the Metropolis-Hastings algorithm, the proposal distribution was the prior distribution.

For the sake of clarity, and because the results obtained with three sources were similar to those obtained with two sources, the experimental results obtained with two independent sources are further detailed.

The average model of disease progression $\gamma_\delta$ is plotted in Figure 13. Figure 13 is a graph showing the evolution with age of the memory, language, praxis and concentration normalized cognitive scores. The four curves represented correspond to the estimated average trajectory. A vertical line is drawn at $t_0 = 72$ years old and a horizontal line is drawn at $p_0 = 0.3$.

When analyzing Figure 13, it appears that the estimated fixed effects are $p_0 = 0.3$, $t_0 = 72$ years, $v_0 = 0.04$ unit per year, and δ = [0; -15; -13; -5] years. This means that, on average, the memory score (first coordinate) reaches the value $p_0$ at $t_0 = 72$ years, followed by concentration, which reaches the same value at $t_0 + 5 = 77$ years, and then by praxis and language at the ages of 85 and 87 years respectively.

Random effects show the variability of this average trajectory within the studied population. The standard deviation of the time-shift equals $\sigma_\tau = 7.5$ years, meaning that the disease progression model in Figure 13 is shifted by ±7.5 years (one standard deviation) to account for the variability in the age of disease onset. The effects of the variance of the acceleration factor $\alpha_i$ and of the two independent components $A_1$ and $A_2$ of the space-shifts are illustrated in Figure 15.

Figure 15 is a series of six graphs illustrating the variability in disease progression superimposed with the average trajectory γ δ (dotted lines).

On the first column, the effects of the acceleration factor $\alpha_i$ are represented with plots of $\gamma_\delta\big(\exp(\pm\sigma_\xi)(t - t_0) + t_0\big)$. The acceleration factor $\alpha_i$ shows the variability in the pace of disease progression, which ranges between 7 times faster and 7 times slower than the average. On the second column, the effects of the first independent component $A_1$ of the space-shift are illustrated. The first independent component shows variability in the relative timing of the cognitive impairments: in one direction, memory and concentration are impaired nearly at the same time, followed by language and praxis; in the other direction, memory is followed by concentration, and then language and praxis are nearly superimposed.

On the third column, the effects of the second independent component $A_2$ of the space-shift are represented. The second independent component $A_2$ keeps the timing of memory and concentration almost fixed, and shows a great variability in the relative timing of praxis and language impairments. It shows that the ordering of the last two may be inverted in different individuals. Overall, these space-shift components show that the onset of cognitive impairment tends to occur in pairs: memory and concentration, followed by language and praxis.

Estimates of the random effects for each individual are obtained from the simulation step of the last iteration of the algorithm and are plotted in Figure 16. Similarly to Figure 9, Figure 16 shows that the estimated individual time-shifts correspond well to the age at which individuals were diagnosed with Alzheimer's disease. This means that the value $p_0$ estimated by the model is a good threshold to determine diagnosis (a fact that occurred by chance) and, more importantly, that the time-warp correctly registers the dynamics of the individual trajectories, so that the normalized age corresponds to the same stage of disease progression across individuals. This fact is corroborated by Figure 14, which shows that the normalized age of conversion to Alzheimer's disease peaks at 77 years old, with a small variance compared to the real distribution of the age of conversion. Figure 14 is a histogram of the ages of conversion to Alzheimer's disease (grey bars) and of the normalized ages of conversion obtained with the time-warp (black bars).

EXPERIMENTAL SECTION: SECOND EXPERIMENT

Data

We used this model to highlight typical spatiotemporal patterns of cortical atrophy during the course of Alzheimer's disease from longitudinal MRI of MCI converters from the ADNI database. The data comprise 154 MCI converters corresponding to 787 observations, each subject being observed 5 times on average. To get a common fixed graph, we aligned the measurements on a common atlas with FreeSurfer (which is publicly available, notably at http://surfer.nmr.mgh.harvard.edu) in order to distribute the measurement maps on the same underlying graph G, made of 1827 nodes corresponding to areas uniformly distributed over the brain surface. Out of these vertices, the Applicant selected 258 control nodes that encode the spatial interpolation of the propagation. The distance matrix D used is defined by a geodesic distance on the graph G.
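The distance matrix D is defined by a geodesic distance on the graph G, i.e. the length of the shortest path between two nodes. A minimal sketch, assuming that the edge lengths are known (they are not given in the text), computes all-pairs shortest paths with the Floyd-Warshall algorithm, vectorized with NumPy. The function name is hypothetical.

```python
import numpy as np

def graph_distance_matrix(n_nodes, edges):
    """All-pairs shortest-path (geodesic) distances on a weighted undirected graph.

    edges: iterable of (i, j, length). Floyd-Warshall: after step k, D[a, b] is
    the shortest length of a path from a to b using only intermediate nodes <= k.
    """
    D = np.full((n_nodes, n_nodes), np.inf)
    np.fill_diagonal(D, 0.0)
    for i, j, length in edges:
        D[i, j] = D[j, i] = min(D[i, j], length)
    for k in range(n_nodes):
        D = np.minimum(D, D[:, k:k + 1] + D[k:k + 1, :])
    return D
```

Floyd-Warshall costs $O(n^3)$ operations; for a large mesh, running Dijkstra's algorithm from each node (for instance with `scipy.sparse.csgraph.shortest_path`) is a common alternative.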

Cortical thickness measurements

The Applicant used the model instantiation defined below to characterize the cortical thickness decrease. Multiple runs of 30,000 iterations (about 4 hours) of this 1827-dimensional MCMC-SAEM led to a noise standard deviation σ equal to 0.27, with 90% of the data included in the segment [1.5, 3.6].

To represent this propagation, the Applicant computed the thickness difference over 5-year periods. The mean spatiotemporal propagation, computed as the cortical thickness decrease between 80 and 90 years old, shows that the most affected area is the medial-temporal lobe, followed by the temporal neocortex. The parietal association cortex and the frontal lobe are also subject to important alterations. On the other hand, the sensory-motor cortex and the visual cortex are less involved in the lesion propagation. These results are highly consistent with previous knowledge of the effects of Alzheimer's disease on brain structure. As the model is able to exhibit individual spatiotemporal patterns with their associated pace of progression, it can also describe individuals with a faster and a slower propagation of the disease, for whom the thickness decrease is respectively more and less substantial than in the mean scenario.

Model instantiation

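As many measurements correspond to positive values (e.g. the cortical thickness, volume ratios), the Applicant considers in the following the open interval $M = \,]0, +\infty[$ as a one-dimensional Riemannian manifold equipped with a Riemannian metric g such that, for all $p \in M$ and $u, v \in T_pM$, $g_p(u, v) = uv/p^2$. With this metric, $(M, g)$ is a geodesically complete Riemannian manifold and, given $p_0 \in M$, $t_0 \in \mathbb{R}$ and $v_0 \in T_{p_0}M$, its geodesics are of the form:

$\gamma(t) = p_0 \exp\left( \dfrac{v_0}{p_0} (t - t_0) \right)$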
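The geodesics of ]0, +∞[ for the metric $g_p(u, v) = uv/p^2$ can be verified numerically. For this metric the Christoffel symbol is $\Gamma(p) = g'(p) / (2 g(p)) = -1/p$, so a geodesic satisfies $\ddot\gamma = \dot\gamma^2 / \gamma$; the curve $p_0 \exp\big(v_0 (t - t_0) / p_0\big)$ solves this equation with $\gamma(t_0) = p_0$ and $\dot\gamma(t_0) = v_0$, and stays positive for all t. The function name and test values are hypothetical.

```python
import math

def geodesic_pos(t, p0, t0, v0):
    """Geodesic of ]0, +inf[ for g_p(u, v) = u v / p^2: log(gamma) is affine in t."""
    return p0 * math.exp(v0 / p0 * (t - t0))
```

Because the logarithm is an arc-length coordinate for this metric, the boundary point 0 lies at infinite distance, which is why the manifold is geodesically complete.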

For identifiability reasons, we choose to fix the parameter $p_k$ to a common value across the nodes, leading to a shared parameter $p_0$. Considering the interpolation functions introduced previously and the fact that the parameters it leads to define:

The model can be rewritten as:

Discussion and perspectives

A mixed-effects model has been proposed which is able to estimate a group-average spatiotemporal propagation of a signal at the nodes of a mesh from longitudinal neuroimaging data distributed on a common network. The network nodes carry the evolution of the signal, whereas its edges encode a distance between the nodes via a distance matrix. The high dimensionality of the problem is tackled by the introduction of control nodes: they allow estimating a smaller number of parameters while ensuring the smoothness of the signal propagation through neighbouring nodes. Moreover, individual parameters characterize personalized patterns of propagation as variations of the mean scenario.
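The role of the control nodes can be illustrated with a kernel interpolation that extends values given at a few control nodes to all nodes, using the geodesic distance matrix D. The interpolation functions of the model are introduced elsewhere in the document; the Gaussian kernel $\exp(-(d/\ell)^2)$ and the bandwidth ℓ used here are hypothetical choices. The interpolation is exact at the control nodes and varies smoothly with the distance elsewhere.

```python
import numpy as np

def interpolate_from_controls(D, control_idx, control_values, bandwidth):
    """Extend values given at control nodes to all nodes of the graph.

    D: (n, n) geodesic distance matrix; control_idx: indices of the control nodes.
    Gaussian kernel of the graph distance (assumed). On a graph this kernel is not
    guaranteed positive definite, so the control system may be ill-conditioned.
    """
    K = np.exp(-(D / bandwidth) ** 2)
    K_cc = K[np.ix_(control_idx, control_idx)]
    beta = np.linalg.solve(K_cc, np.asarray(control_values, float))
    return K[:, control_idx] @ beta
```

Only the values at the control nodes (258 in the second experiment) need to be estimated, while the signal is defined at every node of the mesh.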

This non-linear, high-dimensional model is estimated with the MCMC-SAEM algorithm, which leads to convincing results: we were able to highlight areas affected by considerable neuronal loss, such as the medial temporal lobe or the temporal neocortex.
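MCMC-SAEM alternates a simulation step (an MCMC move on the latent individual parameters), a stochastic approximation of the complete-data sufficient statistics, and a closed-form maximization. The Applicant's model is non-linear and high-dimensional; the sketch below only illustrates these mechanics on a toy linear random-intercept model, and the random-walk proposal, burn-in length and step-size schedule are assumptions.

```python
import numpy as np

def mcmc_saem(y, n_iter=500, n_burn=200, proposal_sd=0.5, seed=0):
    """Toy MCMC-SAEM for y[i, j] = b_i + e_ij, b_i ~ N(mu, omega2), e_ij ~ N(0, sigma2).

    y has shape (n_subjects, n_visits); the b_i are the latent individual effects.
    Returns the estimated population parameters (mu, omega2, sigma2).
    """
    rng = np.random.default_rng(seed)
    n_sub = y.shape[0]
    mu, omega2, sigma2 = 0.0, 1.0, 1.0     # initial population parameters
    b = y.mean(axis=1)                     # initial latent effects
    s1 = s2 = s3 = 0.0                     # running sufficient statistics

    def log_complete(b_vals):
        # Complete-data log-likelihood per subject, up to an additive constant.
        return (-((y - b_vals[:, None]) ** 2).sum(axis=1) / (2.0 * sigma2)
                - (b_vals - mu) ** 2 / (2.0 * omega2))

    for k in range(n_iter):
        # Simulation step: one random-walk Metropolis-Hastings move per subject.
        proposal = b + proposal_sd * rng.standard_normal(n_sub)
        accept = np.log(rng.uniform(size=n_sub)) < log_complete(proposal) - log_complete(b)
        b = np.where(accept, proposal, b)
        # Stochastic approximation: step 1 during burn-in, then 1 / (k - n_burn + 1).
        eps = 1.0 if k < n_burn else 1.0 / (k - n_burn + 1)
        s1 += eps * (b.mean() - s1)
        s2 += eps * ((b ** 2).mean() - s2)
        s3 += eps * (((y - b[:, None]) ** 2).mean() - s3)
        # Maximization: closed form given the sufficient statistics.
        mu, omega2, sigma2 = s1, max(s2 - s1 ** 2, 1e-8), max(s3, 1e-8)
    return mu, omega2, sigma2

# Synthetic data: 200 subjects, 5 visits each, mu = 2, omega2 = 1, sigma2 = 0.25.
rng = np.random.default_rng(1)
b_true = rng.normal(2.0, 1.0, size=200)
y = b_true[:, None] + rng.normal(0.0, 0.5, size=(200, 5))
print(mcmc_saem(y))
```

The decreasing step size after burn-in averages the Monte Carlo noise of successive draws, which is what lets the estimates settle.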

The distance matrix, which here encodes the geodesic distance on the cortical mesh, may be changed to account for structural or functional connectivity information. In this case, signal changes may propagate not only across neighbouring locations, but also to nodes that are far apart in space yet close to each other in the connectome.
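A simple way to build such a distance matrix is to approximate geodesic distances on the mesh by shortest paths along its edges (Dijkstra's algorithm), which slightly overestimates the true surface distance. The same routine accepts a connectome-derived graph by changing the edge lengths, for instance to a decreasing function of connectivity strength (an assumption). Function names and the toy graph are illustrative.

```python
import heapq

def shortest_path_distances(n_vertices, edges, source):
    """Dijkstra distances from `source`; edges is an iterable of (i, j, length)."""
    adjacency = [[] for _ in range(n_vertices)]
    for i, j, length in edges:
        adjacency[i].append((j, length))
        adjacency[j].append((i, length))
    dist = [float("inf")] * n_vertices
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                      # stale heap entry
        for v, length in adjacency[u]:
            candidate = d + length
            if candidate < dist[v]:
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return dist

def distance_matrix(n_vertices, edges):
    """Full node-to-node distance matrix (one Dijkstra run per node)."""
    return [shortest_path_distances(n_vertices, edges, s) for s in range(n_vertices)]

# Toy mesh: a square 0-1-2-3 with one diagonal edge 0-2.
mesh_edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 0, 1.0), (0, 2, 1.5)]
D = distance_matrix(4, mesh_edges)
# A short "connectome" edge between 1 and 3 makes them close despite the mesh geometry.
D_connectome = distance_matrix(4, mesh_edges + [(1, 3, 0.2)])
print(D[0], D_connectome[1][3])
```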

This model could also be used with multimodal data, such as PET scans, paving the way for numerical models of neurodegenerative diseases. Such models could first inform about the evolution of a disease at the population level while being customizable to fit individual data, thus predicting the stage of the disease or the time to symptom onset.

EXPERIMENTAL SECTION: THIRD EXPERIMENT

In this experiment, the Applicant proposes a method to predict the subject-specific longitudinal evolution of brain structures extracted from baseline MRI, and evaluates its performance on Alzheimer's disease data. The disease progression is modeled as a trajectory on a group of diffeomorphisms in the context of large deformation diffeomorphic metric mapping. The Applicant first exhibits the limited predictive abilities of geodesic regression extrapolation on this group. Building on the recent concept of parallel curves in shape manifolds, the Applicant then introduces a second predictive protocol which personalizes previously learned trajectories to new subjects, and investigates the relative performance of two parallel shifting paradigms. This design only requires the baseline imaging data. Finally, coefficients encoding the disease dynamics are obtained for each subject from longitudinal cognitive measurements, and exploited to refine the methodology, which is shown to successfully predict the follow-up visits.

Introduction

The primary pathological developments of a neurodegenerative disease such as Alzheimer's are believed to begin long before the first symptoms of cognitive decline. Subtle, gradual structural alterations of the brain arise and develop along the disease course, in particular in the hippocampal regions, whose volumes are classical biomarkers in clinical trials. Among other factors, these transformations ultimately result in the decline of cognitive functions, which can be assessed through standardized tests. Being able to track and predict future structural changes in the brain is therefore key to estimating the individual stage of disease progression, selecting patients for clinical trials, and assessing treatment efficacy.

To this end, the Applicant's work aims at predicting the future shape of brain structures segmented from MRIs. The Applicant proposes a methodology based on three building blocks: extrapolating from the past of a subject; transferring to new subjects the progression of a reference subject observed over a longer time period; and refining this transfer with information about the relative disease dynamics extracted from cognitive evaluations. Instead of limiting ourselves to specific features such as volumes, we propose to consider each observation of a patient at a given time point as an image or a segmented surface mesh in a shape space.

In computational anatomy, shape spaces are usually defined via the action of a group of diffeomorphisms. In this framework, one may estimate a flow of diffeomorphisms such that a shape continuously deformed by this flow best fits repeated observations of the same subject over time, thus leading to a subject-specific spatiotemporal trajectory of shape changes. If the flow is geodesic, in the sense of a shortest path in the group of diffeomorphisms, this problem is called geodesic regression and may be thought of as the extension of the linear regression concept to Riemannian manifolds. It is then tempting to use such a regression to infer the future evolution of the shape given several past observations. To the best of our knowledge, the predictive power of such a method has not yet been extensively assessed. The Applicant will demonstrate that satisfactory results can only be obtained when large numbers of data points over extensive periods of time are available, and that poor results should be expected in the more interesting use case where only a couple of observations are available.

In such situations, an appealing workaround would be to transfer previously acquired knowledge from another patient observed over a longer period of time. This idea requires the definition of a spatiotemporal matching method to transport the trajectory of shape changes into a different subject space. Several techniques have been proposed to register image time series of different subjects. They often require the time series to have the same number of images, or correspondences between images across time series, and are therefore unfit for prognosis purposes. Parallel transport in groups of diffeomorphisms has recently been introduced to infer the deformation of follow-up images from baseline matching. Such paradigms have been used mostly to transport spatiotemporal trajectories to the same anatomical space for hypothesis testing. We build on this concept to infer the continuation of the subject trajectory into the future. Two main methodologies have emerged: either transporting the time series parallel to the baseline matching, or transporting the baseline matching parallel to the time series, as in the present invention. Both methods are evaluated in this section.

In any case, these approaches require matching the baseline shape with one shape of the reference time series. Ideally, observations corresponding to the same disease stage should be matched, but this stage is unknown. The Applicant proposes to complement such approaches with estimates of the patient's stage and pace of progression obtained from repeated neuropsychological assessments of the subjects, in the spirit of this invention. These estimates may be used to adjust the dynamics of shape changes of the reference subject to the test subject, according to the dynamical differences observed in the cognitive tests.

Method

Consider a time series of segmented surface meshes for a given subject, obtained at successive ages. The Applicant builds a group of diffeomorphisms of the ambient space which act on the segmented meshes, following the procedure described in the prior art. Flows of diffeomorphisms of this ambient space are generated by integrating time-varying vector fields of the form

$$v_t(x) = \sum_{k=1}^{N} K\big(x, c_k(t)\big)\, m_k(t),$$

where:

• K is a Gaussian kernel,

• $c_1(t), \dots, c_N(t)$ are the control points of the deformation, and

• $m_1(t), \dots, m_N(t)$ are the momenta of the deformation.
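Such a velocity field can be evaluated directly from a set of control points c_k and momenta m_k, as the kernel-weighted sum of the momenta. The sketch below assumes K(x, y) = exp(−|x − y|²/σ²) (normalization conventions vary) and uses σ = 5 mm, the deformation kernel width reported in the experiments; the function names are illustrative and are not Deformetrica's API.

```python
import numpy as np

def gaussian_kernel(x, y, sigma):
    """Matrix of K(x_i, y_j) = exp(-|x_i - y_j|^2 / sigma^2) for the rows of x and y."""
    sq_dist = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / sigma ** 2)

def velocity_field(points, control_points, momenta, sigma):
    """v(x) = sum_k K(x, c_k) m_k, evaluated at every row of `points`."""
    return gaussian_kernel(points, control_points, sigma) @ momenta

control_points = np.array([[0.0, 0.0, 0.0]])
momenta = np.array([[1.0, 0.0, 0.0]])
points = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [50.0, 0.0, 0.0]])
v = velocity_field(points, control_points, momenta, sigma=5.0)
# v[0] equals the momentum, v[1] is damped by exp(-1), v[2] is essentially zero.
print(v)
```

The kernel width therefore sets the spatial scale over which a single momentum moves the surrounding anatomy.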

The space of diffeomorphisms is endowed with a norm which measures the cost of the deformation. In the following, only geodesic flows of diffeomorphisms are considered, i.e. flows of minimal norm connecting the identity to a given diffeomorphism. Such flows are uniquely parametrized by their initial control points and momenta. Under the action of the flow of diffeomorphisms, an initial template shape T is continuously deformed and describes a trajectory in the shape space.
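Geodesic flows parametrized by initial control points and momenta are classically obtained by integrating Hamiltonian equations: the control points move with the velocity field, and the momenta evolve according to minus the gradient of the kinetic energy H = ½ Σ_{k,l} ⟨m_k, m_l⟩ K(c_k, c_l). The sketch assumes the Gaussian kernel exp(−|x − y|²/σ²) and a plain RK4 integrator on [0, 1]; it is not Deformetrica's implementation. Because the flow is geodesic, H should be conserved along it, which gives a simple sanity check.

```python
import numpy as np

def kernel_and_differences(c, sigma):
    """Kernel matrix K[k, l] = exp(-|c_k - c_l|^2 / sigma^2) and differences c_k - c_l."""
    diff = c[:, None, :] - c[None, :, :]
    return np.exp(-(diff ** 2).sum(axis=-1) / sigma ** 2), diff

def hamiltonian(c, m, sigma):
    """H = 1/2 sum_{k,l} <m_k, m_l> K(c_k, c_l): kinetic energy of the deformation."""
    K, _ = kernel_and_differences(c, sigma)
    return 0.5 * np.einsum("kd,kl,ld->", m, K, m)

def hamiltonian_rhs(c, m, sigma):
    """Time derivatives (dH/dm, -dH/dc) of the control points and momenta."""
    K, diff = kernel_and_differences(c, sigma)
    dc = K @ m
    dm = (2.0 / sigma ** 2) * np.einsum("kl,kl,kld->kd", m @ m.T, K, diff)
    return dc, dm

def shoot(c0, m0, sigma, n_steps=50):
    """Integrate the geodesic (Hamiltonian) equations on [0, 1] with classical RK4."""
    h = 1.0 / n_steps
    c, m = np.array(c0, dtype=float), np.array(m0, dtype=float)
    for _ in range(n_steps):
        dc1, dm1 = hamiltonian_rhs(c, m, sigma)
        dc2, dm2 = hamiltonian_rhs(c + 0.5 * h * dc1, m + 0.5 * h * dm1, sigma)
        dc3, dm3 = hamiltonian_rhs(c + 0.5 * h * dc2, m + 0.5 * h * dm2, sigma)
        dc4, dm4 = hamiltonian_rhs(c + h * dc3, m + h * dm3, sigma)
        c = c + (h / 6.0) * (dc1 + 2 * dc2 + 2 * dc3 + dc4)
        m = m + (h / 6.0) * (dm1 + 2 * dm2 + 2 * dm3 + dm4)
    return c, m

c0 = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 4.0, 0.0]])
m0 = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.0]])
c1, m1 = shoot(c0, m0, sigma=5.0)
print(hamiltonian(c0, m0, 5.0), hamiltonian(c1, m1, 5.0))  # conserved along the geodesic
```

With a single control point the momentum stays constant and the point moves in a straight line, which is another quick check of the equations.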

In addition, the surface meshes are endowed with a varifold norm ||·||, which makes it possible to measure a data attachment term between meshes without point correspondence.
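A varifold norm compares meshes through kernel-weighted sums over their triangles, so no point correspondence is needed. The sketch assumes one common construction: each triangle is represented by its centre and its area-weighted normal, with a Gaussian spatial kernel and a squared-cosine orientation kernel, which makes the norm insensitive to the orientation of the normals. Whether Deformetrica uses exactly this normalization is an assumption; σ = 3 mm mirrors the varifold kernel width reported in the experiments.

```python
import numpy as np

def centers_and_normals(vertices, triangles):
    """Triangle centres and normals whose norm equals the triangle area."""
    a, b, c = (vertices[triangles[:, i]] for i in range(3))
    return (a + b + c) / 3.0, 0.5 * np.cross(b - a, c - a)

def varifold_inner(mesh1, mesh2, sigma):
    """<mesh1, mesh2> = sum_ij exp(-|x_i - y_j|^2 / sigma^2) <n_i, n_j>^2 / (|n_i| |n_j|)."""
    x, nx = centers_and_normals(*mesh1)
    y, ny = centers_and_normals(*mesh2)
    spatial = np.exp(-((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1) / sigma ** 2)
    norms = np.linalg.norm(nx, axis=1)[:, None] * np.linalg.norm(ny, axis=1)[None, :]
    orientation = (nx @ ny.T) ** 2 / norms      # squared cosine: ignores normal orientation
    return float((spatial * orientation).sum())

def varifold_distance_sq(mesh1, mesh2, sigma):
    """Squared varifold distance between two triangle meshes."""
    return (varifold_inner(mesh1, mesh1, sigma) - 2.0 * varifold_inner(mesh1, mesh2, sigma)
            + varifold_inner(mesh2, mesh2, sigma))

# A unit square made of two triangles, and a copy shifted by 2 mm along z.
square = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 0.0]])
tris = np.array([[0, 1, 2], [0, 2, 3]])
shifted = (square + np.array([0.0, 0.0, 2.0]), tris)
print(varifold_distance_sq((square, tris), shifted, sigma=3.0))
```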

Geodesic regression

In the spirit of linear regression, one can perform geodesic regression in the shape space by estimating the intercept T and the slope, namely the initial control points and momenta of the geodesic flow, that minimize the following functional:

where R is a regularization term which penalizes the kinetic energy of the deformation. A solution of equation (1) is estimated with a Nesterov gradient descent as implemented in the software Deformetrica (www.deformetrica.org), where the gradient with respect to the control points, the momenta and the template is computed by a backward integration of the data attachment term along the geodesic. Solving this optimisation problem gives a description of the progression of the brain structures which lies in the tangent space at the identity of the group of diffeomorphisms.

Once an optimum is found, the Applicant can extrapolate the geodesic to later ages and attempt to predict the future evolution of the brain structures.
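In flat space, geodesics are straight lines and geodesic regression reduces to linear regression, so the fit-then-extrapolate procedure can be illustrated on a toy scalar problem. The sketch below is an analogy, not Deformetrica's algorithm: the intercept plays the role of the template T, the slope that of the initial momenta, and a penalty λb² stands in for the kinetic-energy regularization R. It uses Nesterov's accelerated gradient with the constant momentum suited to strongly convex problems.

```python
import numpy as np

def fit_line_nesterov(t, y, lam=0.0, n_iter=500):
    """Minimize sum_j (y_j - a - b (t_j - t0))^2 + lam * b^2 with Nesterov's method.

    Returns (t0, a, b): intercept a at the reference time t0 = mean(t), slope b.
    Requires at least two distinct times.
    """
    t, y = np.asarray(t, float), np.asarray(y, float)
    t0 = t.mean()
    s = t - t0                                 # centred times keep the problem well conditioned
    # Hessian of the quadratic energy: its extreme eigenvalues give step and momentum.
    hess = 2.0 * np.array([[len(s), s.sum()], [s.sum(), (s ** 2).sum() + lam]])
    mu, lip = np.linalg.eigvalsh(hess)         # ascending order
    root_kappa = np.sqrt(lip / mu)
    momentum = (root_kappa - 1.0) / (root_kappa + 1.0)

    def grad(theta):
        a, b = theta
        r = a + b * s - y                      # residuals of the current straight line
        return np.array([2.0 * r.sum(), 2.0 * (r * s).sum() + 2.0 * lam * b])

    x_prev = x = np.zeros(2)
    for _ in range(n_iter):
        z = x + momentum * (x - x_prev)        # look-ahead point
        x_prev, x = x, z - grad(z) / lip       # gradient step taken from the look-ahead point
    return t0, x[0], x[1]

def extrapolate(fit, t_new):
    """Evaluate the fitted straight line (the flat-space 'geodesic') at later times."""
    t0, a, b = fit
    return a + b * (np.asarray(t_new, float) - t0)

ages = [70.0, 71.0, 72.0, 73.0, 74.0]
values = [3.0 - 0.2 * (age - 70.0) for age in ages]
fit = fit_line_nesterov(ages, values)
print(extrapolate(fit, 78.0))                  # about 1.4 on this noiseless line
```

Extrapolating from such a fit amplifies any error on the slope by the extrapolation horizon, which is one intuition for why short observation windows give poor predictions.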

Two methods to transport spatiotemporal trajectories of shapes

As will be demonstrated later, geodesic regression extrapolation produces accurate predictions only if data over a long time span are available for the subject, which is incompatible with the ambition of early prognosis.

Given a reference geodesic, Riemannian parallel transport is used to generate a new trajectory. A baseline matching between the reference subject and the new subject is first performed; it can be described as a vector in the tangent space of the group of diffeomorphisms. Two paradigms are available to obtain a parallel trajectory. The first, known in the prior art, transports the reference regression along the matching and then shoots. In the shape space, this generates a geodesic starting at the baseline shape; for this reason, this solution is called geodesic parallelization. The present invention, on the other hand, suggests transporting the matching vector along the reference geodesic and then building a trajectory by shooting this transported vector from every point of the reference geodesic. This procedure is called exp-parallelization.
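The two paradigms can be contrasted on a toy manifold where everything is explicit: the unit sphere S², with the equator as reference geodesic and a matching vector pointing north. This is an illustration on an assumed toy manifold, not on the group of diffeomorphisms. Both curves start at the same new baseline, but the exp-parallel curve stays at a constant distance from the reference (a latitude circle, which is not a geodesic), whereas the geodesically parallelized curve is a great circle that meets the reference at t = π/2.

```python
import numpy as np

E_Y = np.array([0.0, 1.0, 0.0])
E_Z = np.array([0.0, 0.0, 1.0])

def sphere_exp(p, v):
    """Riemannian exponential map of the unit sphere S^2 embedded in R^3."""
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-15:
        return p.copy()
    return np.cos(norm_v) * p + np.sin(norm_v) * v / norm_v

def reference(t):
    """Reference geodesic: the equator travelled at unit speed from (1, 0, 0)."""
    return np.array([np.cos(t), np.sin(t), 0.0])

def transport_along_reference(w0, t):
    """Exact parallel transport of w0 = a e_y + b e_z from reference(0) to reference(t):
    the component along the velocity follows the velocity, while e_z stays parallel."""
    velocity = np.array([-np.sin(t), np.cos(t), 0.0])
    return w0[1] * velocity + w0[2] * E_Z

def exp_parallel(w0, t):
    """Exp-parallelization: transport the matching vector, then shoot from reference(t)."""
    return sphere_exp(reference(t), transport_along_reference(w0, t))

def geodesic_parallel(b, t):
    """Geodesic parallelization for a matching vector b e_z: the reference velocity e_y
    is orthogonal to the meridian plane, so transport along the matching leaves it
    unchanged and the new curve is the geodesic shot from the new baseline along e_y."""
    baseline = sphere_exp(reference(0.0), b * E_Z)
    return sphere_exp(baseline, t * E_Y)

def sphere_distance(p, q):
    return float(np.arccos(np.clip(p @ q, -1.0, 1.0)))

w0 = 0.3 * E_Z
for t in (0.0, 0.5, 1.0, np.pi / 2):
    d_exp = sphere_distance(exp_parallel(w0, t), reference(t))
    d_geo = sphere_distance(geodesic_parallel(0.3, t), reference(t))
    print(f"t={t:.2f}  exp-parallel: {d_exp:.3f}  geodesic-parallel: {d_geo:.3f}")
```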

In such a high-dimensional setting, the computation of parallel transport very often relies on the Schild's ladder scheme. However, in the Applicant's case the Riemannian logarithm can only be computed by solving a shape matching problem, which results not only in an intractable algorithm but also in an uncontrolled approximation of the scheme. To implement these parallel shifting methods, an algorithm was used which approximates the transport to nearby points by a well-chosen Jacobi field, with sharp control of the computational complexity. It achieves the same rate of convergence as Schild's ladder at a reduced cost.
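The idea of transporting a vector step by step with Jacobi fields, which only requires the exponential map and no logarithm, can be sketched on the unit sphere, where the exponential map is explicit. At each step the Jacobi field J with J(0) = 0 and J'(0) = w is computed by finite differences of the exponential map, and J(h)/h approximates the transported vector. The sphere setting, the finite-difference step ε and the step counts are assumptions; this is not the Applicant's implementation. The error decreases roughly linearly with the number of steps.

```python
import numpy as np

def sphere_exp(p, v):
    """Riemannian exponential map of the unit sphere S^2 embedded in R^3."""
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-15:
        return p.copy()
    return np.cos(norm_v) * p + np.sin(norm_v) * v / norm_v

def jacobi_transport(p0, v0, w0, n_steps, eps=1e-6):
    """Approximate parallel transport of w0 along t -> Exp_{p0}(t v0), t in [0, 1]."""
    h = 1.0 / n_steps
    p, v, w = (np.array(a, dtype=float) for a in (p0, v0, w0))
    for _ in range(n_steps):
        p_next = sphere_exp(p, h * v)
        # Jacobi field at time h, by perturbing the initial velocity in the direction w.
        jacobi = (sphere_exp(p, h * (v + eps * w)) - p_next) / eps
        # Velocity of the geodesic at p_next (closed form on the sphere, uses the old p).
        speed = np.linalg.norm(v)
        v = -speed * np.sin(h * speed) * p + np.cos(h * speed) * v
        # Rescale by 1 / h and keep the component tangent at the new point.
        w = jacobi / h
        w = w - np.dot(w, p_next) * p_next
        p = p_next
    return p, w

# Quarter of the equator; the exact transport of (0, 0.5, 1) is (-0.5, 0, 1).
exact = np.array([-0.5, 0.0, 1.0])
for n in (50, 400):
    p_end, w_end = jacobi_transport([1.0, 0.0, 0.0], [0.0, np.pi / 2, 0.0], [0.0, 0.5, 1.0], n)
    print(n, np.linalg.norm(w_end - exact))
```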

Cognitive scores dynamics

The protocol described in the previous section has two main drawbacks. First, the choice of the matching time in the reference trajectory is arbitrary: the baseline is chosen purely for convenience, whereas the matching should ideally be performed at similar stages of the disease. Second, the protocol does not take into account the pace of progression of the subject. In the present invention, the Applicant proposes a statistical model that learns, in an unsupervised manner, dynamical parameters of the subjects from their results on the ADAS-Cog, a standardized cognitive test designed to track disease progression. More specifically, each patient is assumed to follow a trajectory parallel to a mean trajectory, with a time reparametrization:

which maps the subject time to a normalized time frame, where α > 0 and τ are scalar parameters. A high (respectively low) value of α hence corresponds to a fast (respectively slow) progression of the scores. With these dynamical parameters, the shape evolution can be adjusted by reparametrizing the parallel trajectory in the same way.

Results

Data, preprocessing, and global parameters

MRIs are extracted from the ADNI database, keeping only MCI converters with 7 or more visits, for a total of N = 74 subjects and 634 visits. Subjects are observed over a period ranging from 4 to 9 years (5.9 on average), with at most 12 visits. The 634 MRIs are segmented using the FreeSurfer software. The extracted brain masks are then affinely registered to the Colin 27 Average Brain using the FSL software. The estimated transformations are finally applied to the pairs (left and right) of caudates, hippocampi and putamina subcortical structures.

All diffeomorphic operations, i.e. matching, geodesic regression estimation, shooting, exp-parallelization and geodesic parallelization, are performed with the Deformetrica software mentioned above. A varifold distance with a Gaussian kernel width of 3 mm for each structure and a deformation kernel width of 5 mm are chosen. The time discretization resolution is set to 2 months.

Geodesic regression extrapolation

The acceleration factor α in equation (2) encodes the rate of progression of each patient. Multiplying this coefficient by the actual observation window gives a notion of absolute observation window length, expressed in the disease time frame. Only the 22 highest-ranking subjects according to this measure were considered in this section: they are expected to feature large structural alterations, which makes the geodesic regression procedure more accurate.
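The selection step above can be sketched as follows. The affine form ψ(t) = α(t − t0 − τ) + t0 used for the time reparametrization is an assumption (a form common in related mixed-effects models of disease progression), t0 is a hypothetical population reference time, and the subjects below are hypothetical toy values, not ADNI data.

```python
def time_reparametrization(t, alpha, tau, t0):
    """Assumed form psi(t) = alpha * (t - t0 - tau) + t0 mapping subject time to disease time."""
    return alpha * (t - t0 - tau) + t0

def absolute_window(ages, alpha):
    """Observation window length expressed in the disease time frame."""
    return alpha * (max(ages) - min(ages))

# Hypothetical toy subjects: (visit ages, estimated acceleration factor alpha).
subjects = {
    "s1": ([70.0, 71.0, 72.0, 74.0], 2.0),   # 4 years, fast progressor
    "s2": ([65.0, 67.0, 71.0], 0.5),         # 6 years, slow progressor
    "s3": ([75.0, 76.0, 80.0], 1.0),         # 5 years, average pace
}
ranked = sorted(subjects, key=lambda s: absolute_window(*subjects[s]), reverse=True)
selected = ranked[:2]   # the experiment keeps the top 22; 2 here for the toy data
print(ranked, selected)
```

The same map can reparametrize a transported shape trajectory: the predicted shape at subject age t is the reference trajectory evaluated at ψ(t), so a fast progressor (α > 1) runs through the reference changes faster.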

Table 1 presents the results obtained for varying extents of the learning dataset and of the extrapolation.

The performance metric between two sets of meshes is the Dice coefficient, that is, twice the sum of the volumes of the intersections of the corresponding meshes, divided by the sum of all their volumes. It lies between 0 and 1: it equals 1 for a perfect match and 0 for disjoint structures. The predictive performance of geodesic regression is compared to a reference prediction consisting of the last observed brain structures in the learning dataset. Statistical significance levels were obtained with a Mann-Whitney test, under the null hypothesis that the distributions of the observed Dice coefficients are the same.
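The metric and the test can be sketched as below, assuming the meshes have first been voxelized into boolean masks (one per structure; the voxelization itself is not shown). The Mann-Whitney p-value here uses the plain normal approximation without tie or continuity correction, which is rough for small samples; in practice a library routine such as scipy.stats.mannwhitneyu is preferable.

```python
import math
import numpy as np

def dice(masks_a, masks_b):
    """Dice coefficient over several structures given as boolean voxel masks:
    2 * sum_i |A_i ∩ B_i| / sum_i (|A_i| + |B_i|)."""
    intersection = sum(np.logical_and(a, b).sum() for a, b in zip(masks_a, masks_b))
    total = sum(a.sum() + b.sum() for a, b in zip(masks_a, masks_b))
    return 2.0 * intersection / total

def mann_whitney(x, y):
    """Mann-Whitney U statistic of x against y, with a two-sided p-value from the
    normal approximation (no tie or continuity correction)."""
    u = sum(float(xi > yj) + 0.5 * float(xi == yj) for xi in x for yj in y)
    n1, n2 = len(x), len(y)
    z = (u - n1 * n2 / 2.0) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return u, math.erfc(abs(z) / math.sqrt(2.0))

a = [np.array([1, 1, 0, 0], dtype=bool)]
b = [np.array([0, 1, 1, 0], dtype=bool)]
print(dice(a, b))   # 0.5: one shared voxel, two voxels in each mask
print(mann_whitney([0.90, 0.91, 0.92], [0.50, 0.60, 0.70]))
```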

The extrapolated meshes are satisfactory only when all but one of the data points are used to perform the geodesic regression: the prediction then achieves a high Dice index and outperforms the reference, though only by a small margin that fails to reach the significance level (p = 0.25). When the observation window becomes narrower, the prediction accuracy decreases and falls below that of the reference.

Non-reparametrized transport

Among the 22 subjects whose regression-based predictive power was evaluated in the previous section, the two which performed best are chosen as references for the rest of this experiment. Their progressions are transported onto the 73 other subjects with the two different parallel shifting methods. A total of 288 predicted spatiotemporal trajectories is obtained; a few visits were excluded because they failed at some stage of the processing pipeline, for incidental technical reasons. In more detail, for each pair of reference and target subjects, the baseline target shape is first registered to the reference. The reference geodesic regression is then either exp-parallelized or geodesically parallelized. Prediction performance is then assessed: the Dice index between the prediction and the actual observation is computed for the two modes of transport and compared to the Dice index between the baseline meshes and the actual observation, which is the only available information in the absence of a predictive paradigm.

The upper part of table 2 presents the results. In most cases, the meshes obtained with the proposed protocol are of lower quality than the reference ones, according to the Dice performance metric. The two transport methods are essentially equally predictive, although geodesic parallelization slightly outperforms exp-parallelization for the M12 prediction.

When the transported trajectories are reparametrized with the dynamical parameters estimated from the cognitive scores, both protocols consistently outperform the reference, with the exception of the M12 prediction. The M36, M48, M72 and M96 predictions are the strongest, with p-values always below 1%. This shows that the pace of cognitive score evolution is well correlated with the pace of structural brain changes, and therefore allows an enhanced prediction of follow-up shapes.

No conclusion can be drawn regarding the relative merits of the two parallel shifting methodologies, as only a single, weakly significant result was obtained, for the M12 prediction.

Conclusion

The Applicant conducted a quantitative study of geodesic regression extrapolation, exhibiting its limited predictive abilities. The Applicant then proposed a method to transport a spatiotemporal trajectory into a different subject space with a time reparametrization derived from cognitive decline, and demonstrated its potential for prognosis. The results show how crucial the dynamics are in disease modeling, and how cross-modality data can be exploited to improve a learning algorithm. The two main paradigms that have emerged for the transport of parallel trajectories were shown to perform equally well in this prediction task. Nonetheless, given the importance of the time reparametrization, exp-parallelization offers a methodological advantage in that it can be more easily combined with such a reparametrization in generative statistical models for longitudinal data.

The robustness of the proposed protocol to the choice of reference subject has not been assessed. Such a choice could be avoided by constructing an average disease model. This framework may also be used to estimate a joint image and cognitive model to better estimate individual dynamical parameters of disease progression.