

Title:
PARALLEL SIMULATION OF LARGE-SCALE DYNAMICAL SYSTEMS USING TENSOR NETWORK
Document Type and Number:
WIPO Patent Application WO/2023/139525
Kind Code:
A1
Abstract:
A system includes a memory storing computer-readable instructions and at least one processor to execute the instructions to perform at least one tensor network method to numerically solve at least one differential equation.

Inventors:
ELEFANTE STEFANO (AT)
IERVOLINO RAFFAELE (IT)
Application Number:
PCT/IB2023/050486
Publication Date:
July 27, 2023
Filing Date:
January 20, 2023
Assignee:
EFSOLUTIONS GBR (DE)
International Classes:
G06F17/13; G06N10/00
Other References:
JOHNSON T H ET AL: "Dynamical simulations of classical stochastic systems using matrix product states", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 14 June 2010 (2010-06-14), XP080483338, DOI: 10.1103/PHYSREVE.82.036702
PIETRO SILVI ET AL: "The Tensor Networks Anthology: Simulation techniques for many-body quantum lattice systems", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 10 October 2017 (2017-10-10), XP080827719
GEDON DANIEL ET AL: "Tensor Network Kalman Filter for LTI Systems", 2019 27TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), EURASIP, 2 September 2019 (2019-09-02), pages 1 - 5, XP033660397, DOI: 10.23919/EUSIPCO.2019.8902976
DOMINIC W BERRY: "High-order quantum algorithm for solving linear differential equations", JOURNAL OF PHYSICS A: MATHEMATICAL AND THEORETICAL, INSTITUTE OF PHYSICS PUBLISHING, BRISTOL, GB, vol. 47, no. 10, 19 February 2014 (2014-02-19), pages 105301, XP020259146, ISSN: 1751-8121, [retrieved on 20140219], DOI: 10.1088/1751-8113/47/10/105301
ALEXANDER OSTERMANN ET AL: "Convergence of a low-rank Lie--Trotter splitting for stiff matrix differential equations", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 28 March 2018 (2018-03-28), XP081364599
SONE AKIRA: "Quantum System Identification by Local Measurements", DISSERTATION, 10 May 2019 (2019-05-10), XP093042674, Retrieved from the Internet [retrieved on 20230427]
JOHANNES HAUSCHILD ET AL: "Efficient numerical simulations with Tensor Networks: Tensor Network Python (TeNPy)", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 30 April 2018 (2018-04-30), XP081425382, DOI: 10.21468/SCIPOSTPHYSLECTNOTES.5
Attorney, Agent or Firm:
FRESH IP (GB)
Claims:
CLAIMS

What is claimed is:

1. A system comprising: a memory storing computer-readable instructions; and at least one processor to execute the instructions to: perform at least one tensor network method to numerically solve at least one differential equation; build a mathematical representation of one of a physical, economic, and engineering problem using the at least one differential equation; determine a graph that defines connections between states of the problem and determine an adjacency matrix; subdivide a matrix into n sets of matrices whose elements commute with each other while at least one element of a set does not commute with at least one element of another set; implement a Suzuki Trotter decomposition using singular value decomposition to reduce data transfer among cores of the at least one processor on the n sets of matrices with a given time interval δ and a predefined p expansion order; evaluate the problem at a time T=Nδ by iteratively performing the Suzuki Trotter decomposition N times; and generate simulation results for the problem within an error of order of N o(δ^(p+1)).

2. The system of claim 1, wherein the at least one differential equation comprises one of a linear differential equation and a non-linear differential equation.

3. The system of claim 1, wherein the at least one differential equation comprises one of a linear time variant differential equation and a non-linear time variant differential equation.

4. The system of claim 1, wherein the tensor network method comprises a time-evolving block decimation (TEBD) method.

5. The system of claim 1, further comprising at least one graphical processing unit (GPU) to execute the instructions.

6. The system of claim 1, wherein the at least one processor numerically solves the at least one differential equation sequentially.

7. The system of claim 1, wherein the at least one processor numerically solves the at least one differential equation in parallel.

8. A method, comprising: performing, by at least one processor, at least one tensor network method to numerically solve at least one differential equation; building, by the at least one processor, a mathematical representation of one of a physical, economic, and engineering problem using the at least one differential equation; determining, by the at least one processor, a graph that defines connections between states of the problem and determining an adjacency matrix; subdividing, by the at least one processor, a matrix into n sets of matrices whose elements commute with each other while at least one element of a set does not commute with at least one element of another set; implementing, by the at least one processor, a Suzuki Trotter decomposition using singular value decomposition to reduce data transfer among cores of the at least one processor on the n sets of matrices with a given time interval δ and a predefined p expansion order; evaluating, by the at least one processor, the problem at a time T=Nδ by iteratively performing the Suzuki Trotter decomposition N times; and generating, by the at least one processor, simulation results for the problem within an error of order of N o(δ^(p+1)).

9. The method of claim 8, wherein the at least one differential equation comprises one of a linear differential equation and a non-linear differential equation.

10. The method of claim 8, wherein the at least one differential equation comprises one of a linear time variant differential equation and a non-linear time variant differential equation.

11. The method of claim 8, wherein the tensor network method comprises a time-evolving block decimation (TEBD) method.

12. The method of claim 8, wherein the at least one processor comprises at least one graphical processing unit (GPU).

13. The method of claim 8, further comprising numerically solving the at least one differential equation sequentially.

14. The method of claim 8, further comprising numerically solving the at least one differential equation in parallel.

15. A non-transitory computer-readable storage medium comprising instructions stored thereon that, when executed by a computing device, cause the computing device to perform operations, the operations comprising: performing at least one tensor network method to numerically solve at least one differential equation; building a mathematical representation of one of a physical, economic, and engineering problem using the at least one differential equation; determining a graph that defines connections between states of the problem and determining an adjacency matrix; subdividing a matrix into n sets of matrices whose elements commute with each other while at least one element of a set does not commute with at least one element of another set; implementing a Suzuki Trotter decomposition using singular value decomposition to reduce data transfer among cores of the at least one processor on the n sets of matrices with a given time interval δ and a predefined p expansion order; evaluating the problem at a time T=Nδ by iteratively performing the Suzuki Trotter decomposition N times; and generating simulation results for the problem within an error of order of N o(δ^(p+1)).

16. The non-transitory computer-readable medium of claim 15, wherein the at least one differential equation comprises one of a linear differential equation and a non-linear differential equation.

17. The non-transitory computer-readable medium of claim 15, wherein the at least one differential equation comprises one of a linear time variant differential equation and a non-linear time variant differential equation.

18. The non-transitory computer-readable medium of claim 15, wherein the tensor network method comprises a time-evolving block decimation (TEBD) method.

19. The non-transitory computer-readable medium of claim 15, wherein the at least one computing device comprises at least one graphical processing unit (GPU).

20. The non-transitory computer-readable medium of claim 15, the operations further comprising numerically solving the at least one differential equation sequentially.

21. The non-transitory computer-readable medium of claim 15, the operations further comprising numerically solving the at least one differential equation in parallel.

Description:
Parallel Simulation of Large-Scale Dynamical Systems Using Tensor Network

CROSS-REFERENCE TO RELATED APPLICATION

[0001] This application is related to and claims priority to U.S. Patent Application No. 63/301,406, filed January 20, 2022, entitled “Parallel Simulation of Large-Scale Dynamical Systems Using Tensor Network,” the entire contents of which are incorporated herein by reference.

BACKGROUND

[0002] Over the last decade, tensor networks have come into widespread use in quantum mechanics and in other fields such as quantum chemistry and machine learning. Although tensor networks have been used to solve differential equations, they have not been used to investigate dynamical systems and the problems associated with them.

[0003] It is with these issues in mind, among others, that various aspects of the disclosure were conceived.

SUMMARY

[0004] The present disclosure is directed to parallel simulation of large-scale dynamical systems using a tensor network.

[0005] In one example, a system may include a memory storing computer-readable instructions and at least one processor to perform at least one tensor network method to numerically solve at least one differential equation, build a mathematical representation of one of a physical, economic, and engineering problem using the at least one differential equation, determine a graph that defines connections between states of the problem and determine an adjacency matrix, subdivide a matrix into n sets of matrices whose elements commute with each other while at least one element of a set does not commute with at least one element of another set, implement a Suzuki Trotter decomposition using singular value decomposition to reduce data transfer among cores of the at least one processor on the n sets of matrices with a given time interval δ and a predefined p expansion order, evaluate the problem at a time T=Nδ by iteratively performing the Suzuki Trotter decomposition N times, and generate simulation results for the problem within an error of order of N o(δ^(p+1)).

[0006] In another example, a method may include performing, by at least one processor, at least one tensor network method to numerically solve at least one differential equation, building, by the at least one processor, a mathematical representation of one of a physical, economic, and engineering problem using the at least one differential equation, determining, by the at least one processor, a graph that defines connections between states of the problem and determining an adjacency matrix, subdividing, by the at least one processor, a matrix into n sets of matrices whose elements commute with each other while at least one element of a set does not commute with at least one element of another set, implementing, by the at least one processor, a Suzuki Trotter decomposition using singular value decomposition to reduce data transfer among cores of the at least one processor on the n sets of matrices with a given time interval δ and a predefined p expansion order, evaluating, by the at least one processor, the problem at a time T=Nδ by iteratively performing the Suzuki Trotter decomposition N times, and generating, by the at least one processor, simulation results for the problem within an error of order of N o(δ^(p+1)).

[0007] According to an additional aspect, a non-transitory computer-readable storage medium includes instructions stored thereon that, when executed by a computing device, cause the computing device to perform operations, the operations including performing at least one tensor network method to numerically solve at least one differential equation, building a mathematical representation of one of a physical, economic, and engineering problem using the at least one differential equation, determining a graph that defines connections between states of the problem and determining an adjacency matrix, subdividing a matrix into n sets of matrices whose elements commute with each other while at least one element of a set does not commute with at least one element of another set, implementing a Suzuki Trotter decomposition using singular value decomposition to reduce data transfer among cores of the at least one processor on the n sets of matrices with a given time interval δ and a predefined p expansion order, evaluating the problem at a time T=Nδ by iteratively performing the Suzuki Trotter decomposition N times, and generating simulation results for the problem within an error of order of N o(δ^(p+1)).

[0008] These and other aspects, features, and benefits of the present disclosure will become apparent from the following detailed written description of the preferred embodiments and aspects taken in conjunction with the following drawings, although variations and modifications thereto may be effected without departing from the spirit and scope of the novel concepts of the disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

[0009] The accompanying drawings illustrate embodiments and/or aspects of the disclosure and, together with the written description, serve to explain the principles of the disclosure. Wherever possible, the same reference numbers are used throughout the drawings to refer to the same or like elements of an embodiment, and wherein:

[0010] Figure 1 is a diagram of a method performed by a system according to an example of the instant disclosure.

[0011] Figure 2 is a one dimensional lattice with one neighboring site interaction according to an example of the instant disclosure.

[0012] Figure 3 is a one dimensional lattice with two neighboring sites interaction according to an example of the instant disclosure.

[0013] Figure 4 shows a two-dimensional lattice with one neighboring site interaction and partitioned using square elements according to an example of the instant disclosure.

[0014] Figure 5 shows a two-dimensional lattice with one neighboring site interaction and partitioned using triangle elements according to an example of the instant disclosure.

[0015] Figure 6 shows a two-dimensional lattice with one neighboring site interaction and partitioned using another pattern of triangle elements according to an example of the instant disclosure.

[0016] Figure 7 shows a three-dimensional lattice with one neighboring site interaction and partitioned using cubic elements according to an example of the instant disclosure.

[0017] Figure 8 shows a parallel implementation for a band matrix and three different computational layers for a second order Suzuki Trotter decomposition according to an example of the instant disclosure.

[0018] Figures 9A and 9B show another parallel implementation for a band matrix and seven different computational layers for a fourth order Suzuki Trotter decomposition according to an example of the instant disclosure.

[0019] Figure 10 shows a parallel implementation for any matrix A for the second order Suzuki Trotter decomposition according to an example of the instant disclosure.

[0020] Figure 11 shows an example of connections of a full matrix characterizing four sites according to an example of the instant disclosure.

[0021] Figure 12 shows an example of connections of a full matrix characterizing four sites according to an example of the instant disclosure.

[0022] Figure 13 shows a mass spring damper system according to an example of the instant disclosure.

[0023] Figure 14 shows communication between threads when the matrix A is a band matrix according to an example of the instant disclosure.

[0024] Figure 15 shows free evolution response showing all the state variables according to an example of the instant disclosure.

[0025] Figure 16 shows a number of singular values during the simulation greater than the threshold σ= 0.01 according to an example of the instant disclosure.

[0026] Figure 17 shows simulation error versus elapsed time in seconds with δ =0.1 and p=4 according to an example of the instant disclosure.

[0027] Figure 18 shows simulation error versus elapsed time in seconds with δ =0.01 and p=4 according to an example of the instant disclosure.

[0028] Figure 19 shows a mass spring damper system with a first and a last mass connected according to an example of the instant disclosure.

[0029] Figure 20 shows communication between threads when matrix A is not a band matrix and a Suzuki Trotter decomposition 2nd order approximant with ten masses and five threads considered according to an example of the instant disclosure.

[0030] Figure 21 shows parallel implementation of the Suzuki Trotter decomposition second order approximant when the first and last mass are connected according to an example of the instant disclosure.

[0031] Figure 22 shows a two-dimensional lattice model with masses and no damping according to an example of the instant disclosure.

[0032] Figure 23 shows a two-dimensional lattice with twelve masses connected through square elements according to an example of the instant disclosure.

[0033] Figure 24 shows a two-dimensional lattice with twelve masses connected through triangular elements according to an example of the instant disclosure.

[0034] Figure 25 shows a three-dimensional lattice model with twenty-seven masses connected through cube elements according to an example of the instant disclosure.

[0035] Figure 26 shows step force according to an example of the instant disclosure.

[0036] Figure 27 shows free evolution plus step response of the first state variable including all simulations according to an example of the instant disclosure.

[0037] Figure 28 shows step response of the first state variable including all the simulations according to an example of the instant disclosure.

[0038] Figure 29 shows free evolution plus sine response of the first state variable including all the simulations for m=1 according to an example of the instant disclosure.

[0039] Figure 30 shows a sine response of the first state variable including all the simulations for ω= 1 according to an example of the instant disclosure.

[0040] Figure 31 shows parallel implementation using three different computational layers for a second order Suzuki Trotter decomposition according to an example of the instant disclosure.

[0041] Figure 32 shows a cantilever beam according to an example of the instant disclosure.

[0042] Figure 33 shows a heat conductor according to an example of the instant disclosure.

[0043] Figure 34 shows a block diagram for non-linear systems according to an example of the instant disclosure.

[0044] Figure 35 shows a parallel implementation of a discrete time-invariant system for a band matrix A according to an example of the instant disclosure.

[0045] Figure 36 shows a parallel implementation of a discrete time-invariant system for any matrix A according to an example of the instant disclosure.

[0046] Figure 37 shows an example process of building a mathematical representation of one of a physical, economic, and engineering problem using at least one differential equation according to an example of the instant disclosure.

[0047] Figure 38 shows an example of a system for implementing certain aspects of the present technology.

DETAILED DESCRIPTION

[0048] The present invention is more fully described below with reference to the accompanying figures. The following description is exemplary in that several embodiments are described (e.g., by use of the terms “preferably,” “for example,” or “in one embodiment”); however, such should not be viewed as limiting or as setting forth the only embodiments of the present invention, as the invention encompasses other embodiments not specifically recited in this description, including alternatives, modifications, and equivalents within the spirit and scope of the invention. Further, the use of the terms “invention,” “present invention,” “embodiment,” and similar terms throughout the description are used broadly and not intended to mean that the invention requires, or is limited to, any particular aspect being described or that such description is the only manner in which the invention may be made or used. Additionally, the invention may be described in the context of specific applications; however, the invention may be used in a variety of applications not specifically described.

[0049] The embodiment(s) described, and references in the specification to “one embodiment”, “an embodiment”, “an example embodiment”, etc., indicate that the embodiment(s) described may include a particular feature, structure, or characteristic. Such phrases are not necessarily referring to the same embodiment. When a particular feature, structure, or characteristic is described in connection with an embodiment, persons skilled in the art may effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.

[0050] In the several figures, like reference numerals may be used for like elements having like functions even in different drawings. The embodiments described, and their detailed construction and elements, are merely provided to assist in a comprehensive understanding of the invention. Thus, it is apparent that the present invention can be carried out in a variety of ways, and does not require any of the specific features described herein. Also, well-known functions or constructions are not described in detail since they would obscure the invention with unnecessary detail. Any signal arrows in the drawings/figures should be considered only as exemplary, and not limiting, unless otherwise specifically noted. Further, the description is not to be taken in a limiting sense, but is made merely for the purpose of illustrating the general principles of the invention, since the scope of the invention is best defined by the appended claims.

[0051] It will be understood that, although the terms “first,” “second,” etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. Purely as a non-limiting example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of example embodiments. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items. As used herein, the singular forms "a", "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be noted that, in some alternative implementations, the functions and/or acts noted may occur out of the order as represented in at least one of the several figures. Purely as a non-limiting example, two figures shown in succession may in fact be executed substantially concurrently or may sometimes be executed in the reverse order, depending upon the functionality and/or acts described or depicted.


[0053] Conditional language, such as, among others, “can,” “could,” “might,” or “may,” unless specifically stated otherwise, or otherwise understood within the context as used, is generally intended to convey that certain embodiments include, while other embodiments do not include, certain features, elements and/or steps. Thus, such conditional language is not generally intended to imply that features, elements and/or steps are in any way required for one or more embodiments or that one or more embodiments necessarily include logic for deciding, with or without user input or prompting, whether these features, elements and/or steps are included or are to be performed in any particular embodiment.

[0054] It is highly desirable to be able to simulate and analyze complex real-world systems. For example, civil engineers model the stresses that may be experienced by a structure under various conditions, in order to confirm that a structure as designed is unlikely to fail. Possible failure in a simulation allows for re-design and re-simulation to quickly improve the design and avoid eventual real-world failures. Similarly, an existing real-world structure can be simulated and analyzed so that the structure may be repaired and/or reinforced if necessary to reduce the chances of failure. Many real-world systems are too complex to be simulated with perfect accuracy. For example, a real-world structure is composed of an essentially infinite number of discrete portions, each of which may be subject to a different stress under a given force. The stresses between one point and another are related, but in a complex way that cannot easily be reduced to an equation for simulation purposes. Therefore, in practice it is necessary to build a simplified model in order to simulate a complex real-world system.

[0055] In some such models, a system is modeled as a network of a finite number of connected, distributed nodes. For example, a structure may be modeled as a mesh of connected nodes in the general shape of the structure. Since all points in the system are interconnected, an assumption can be made that values at points between nodes can be interpolated from values at the nearest nodes. Typically, the greater the number of nodes used, the more accurate such an assumption is. Equations may then be developed that model characteristics at each node and their relation to each other. For example, for a structure, a system of equations may be developed that estimates the stress at each node depending on initial conditions at the nodes and some input, such as a force acting on one or more nodes, and that incorporates the interconnectedness of the nodes. Commonly, these equations are linear differential equations. The development of such models and equations is well known in the art for a wide variety of engineering (civil, mechanical, aerospace, chemical, telecommunications, electrical, etc.) and other applications, as is discussed in more detail below.

[0056] Systems and methods of the present disclosure may execute a highly scalable parallel algorithm based on tensor network methods to numerically solve large systems of ordinary linear differential equations as well as to iteratively resolve one or more linear difference equations. Linear differential equations may represent mathematical models related to industrial applications and may include a large number of variables. The system may use distributed computing devices running parallel algorithms to simulate high dimensional systems that can be recast in a state space representation. This allows simulations to be run in the same amount of time and with the same accuracy with less computing power or, equivalently, the simulations may be run faster and/or more accurately (e.g., with more nodes) with the same computing power.

[0057] The distributed parallel scheme of the present disclosure can be easily deployed on an HPC cluster, a grid, or a cloud environment and can be summarized as follows. First, the matrix A representing the system state space representation is partitioned into n smaller matrices. Each of the n submatrices is assigned to a job and executed by one or more threads, permitting fine-grained parallelism for large n. The aim of this preliminary step is to compute the matrix exponentials of the smaller matrices, which are therefore calculated only once, at the beginning of the processing. Second, starting from a set of initial conditions X(0), any time instant T=Nδ can be reached by iteratively applying N times a p order Suzuki Trotter expansion of time step δ. To gain a more efficient parallel execution, singular value decomposition can be applied to the output matrix products during the Suzuki Trotter decomposition execution to reduce the data transfer among cores (this feature can be very useful especially for cloud platforms in which network bandwidth is a major bottleneck). Last, simulation results within an error of order N o(δ^(p+1)) for many initial condition trials are obtained.
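As a rough single-process illustration of this scheme, the following minimal sketch assumes a toy tridiagonal matrix A, a second order (p = 2) splitting, and NumPy/SciPy in place of a real distributed runtime; the bond_blocks helper and all numeric values are illustrative assumptions, not part of the disclosure:

```python
# Minimal sketch: partition A, exponentiate the small blocks once, then
# iterate the second order Suzuki Trotter step N times.
import numpy as np
from scipy.linalg import expm

def bond_blocks(A):
    """Split a tridiagonal A into local 'bond' matrices acting on sites
    (i, i+1); in a real deployment each block is assigned to its own job."""
    n = A.shape[0]
    blocks = []
    for i in range(n - 1):
        B = np.zeros_like(A)
        B[i, i] = (1.0 if i == 0 else 0.5) * A[i, i]
        B[i + 1, i + 1] = (1.0 if i == n - 2 else 0.5) * A[i + 1, i + 1]
        B[i, i + 1], B[i + 1, i] = A[i, i + 1], A[i + 1, i]
        blocks.append(B)
    return blocks

n, delta, N = 8, 0.01, 500                     # horizon T = N * delta = 5.0
A = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
blocks = bond_blocks(A)
odd, even = blocks[0::2], blocks[1::2]         # two mutually commuting sets

# Preliminary step: the small matrix exponentials are computed only once.
factors = [expm(B * delta / 2) for B in odd] \
        + [expm(B * delta) for B in even] \
        + [expm(B * delta / 2) for B in odd]

G = np.eye(n)                                  # one full Trotter step
for F in factors:
    G = G @ F

X = np.eye(n)                                  # a basis of initial conditions X(0)
for _ in range(N):
    X = G @ X                                  # state at time T

print(np.max(np.abs(X - expm(A * N * delta)))) # small splitting error vs exact
```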

[0058] In one example, the system may provide a numerical solution by using a tensor network related to any system model dynamics that can be recast as a linear time invariant state space representation. The system may link quantum mechanics concepts of a tensor network to systems engineering and distributed computing by providing numerical integration of linear differential equations and iterative resolution of linear difference equations. As an example, the system can be used to provide analysis based on the lumped masses method or the finite element method where there may be a plurality of masses and nodes/elements. As a result, the system can provide information associated with scenarios in scientific and engineering design by providing advanced simulations.

[0059] Although finite element method (FEM) software packages such as MSC Nastran, COMSOL, and ABAQUS provide numerical methods to solve large-scale dynamic systems, they are not effective for very large systems. In particular, these packages rely on algorithms with limited scalability, which creates a number of drawbacks for system simulation. The systems and methods discussed herein provide almost linearly scalable solutions using the lumped masses method (LMM) and highly scalable solutions using the finite element method (FEM).

[0060] The systems and methods described herein concern the numerical solution, by using a tensor network, of any system model dynamics that can be recast as a linear time invariant state space representation. They link the quantum mechanics concepts of a tensor network to the branches of systems engineering and distributed computing, with reference to the numerical integration of linear differential equations as well as the iterative resolution of linear difference equations. The methodology allows a parallel algorithm implementation to simulate high dimensional continuous and discrete time systems and has the capability to exploit not only computational resources but also distributed infrastructures such as cluster, grid, and cloud environments.

[0061] Possible fields of application include, in particular, real-world system analysis based on the Lumped Masses Method (LMM) or the Finite Element Method (FEM), with a very large number of masses and nodes/elements respectively, without resorting to unnecessary and overly conservative simplifications.

[0062] A key feature of the system is that it is almost linearly scalable for LMM and highly scalable for FEM. The systems and methods thus open new technological scenarios in scientific and engineering design through large-scale system simulation.

[0063] In systems engineering, discrete and continuous time systems can be modelled through a state space representation. However, in many industrial applications, a sufficiently large number of state variables is required to gain a realistic description of the underlying phenomena. Therefore, the curse of high dimensionality can easily occur and can limit de facto a faithful computer simulation of the overall system. The development of techniques to numerically solve large systems that can exploit powerful processing capabilities is discussed herein. The paradigm of parallel computing offers an effective methodology to face big system simulations. To do so, however, it is desirable to use not only large multi-node computing infrastructures but also parallel algorithms that can efficiently distribute the calculation within the processing environment.

[0064] Nowadays, various commercial software packages on the market offer solutions to address big system simulation in a parallel manner. However, they have the major drawback of limited algorithm scalability, which inevitably hampers the simulation of large-scale systems. To overcome this limitation, current algorithms available in the literature employ the strategy of reducing system size by using model approximation methods, such as super element usage, eigenvector and eigenvalue reductions, as well as machine and deep learning algorithms.

[0065] The system instead offers a methodology that solves the above-mentioned scalability flaw by proposing an algorithm that is highly scalable across the processing units, thus removing the necessity to reduce the system dimensionality. In particular, the technique is very suitable to solve both LMM and FEM analysis problems characterised by many masses or nodes, respectively. Indeed, the system provides a parallel solution that is highly scalable; it can be applied in a distributed computational environment (e.g., cluster, grid, cloud) to solve systems with an enormous quantity of masses or nodes, since the only limitation is represented by the number of computational units.

[0066] The methodology has, therefore, a great impact on technology because many systems’ models can be recast in a state space representation. A non-exhaustive list of possible fields of application includes, for instance, engineering, economics, statistics, computer science, and neuroscience.

[0067] The novelty of the proposed system lies in exploiting methods based on a tensor network approach, also known in the literature as tensor-train network or decomposition. This is a fast-evolving scientific topic that is providing solutions to problems in a large range of disciplines. Techniques based on this field have become increasingly popular within the research community because they enable efficient distributed computational approaches. Tensor network methods can be described as algorithms that break very large tensors into a mathematically equivalent network of smaller tensors. Such flexible modelling, which corresponds to an algebraic matrix decomposition, offers efficient techniques to solve a complex mathematical calculation in a distributed manner.

[0068] As evidence of this, during the last decade tensor network methods have gradually become the standard in quantum mechanics to evaluate the time evolution of large and correlated quantum systems. They opened new scenarios in computational physics to simulate in a parallel manner the dynamical behavior of quantum many-body systems, including complex systems such as spin glasses.

[0069] Researchers extended their applicability to other fields such as quantum chemistry. There are, indeed, works that exploit tensor decomposition algorithms to train machine learning based methods to identify candidate drug-like molecules. In addition, other research studies show that tensor-train networks are also effective tools for big data processing and for developing forecasting and predictive financial models.

[0070] However, although tensor network theory has been demonstrated to be a powerful approach to numerically solve differential equations, such as the quantum mechanical Schrodinger equation, the methodology has not yet been applied to investigate dynamical systems problems. In this regard, the disclosure proposes to extend the usage of tensor networks to dynamic simulations of any high dimensional system with a given and sufficiently general model structure.

[0071] All the current techniques to solve large state space models have their own limitations and drawbacks. Indeed, existing commercial simulation software shows scalability weaknesses. A study on COMSOL version 4.2 reveals an average parallel efficiency around E_p = 0.8, and discussion concerning COMSOL scalability is still a topic of high interest, as debated in its user forum. The scalability downside is mainly because proprietary software usually implements solutions based on iterative methods.

[0072] MSC Software proposes two different approaches for the distributed solution: i) the direct transient response method, which is based on a numerical approximation of differential equations through the Newton-Raphson method and its related techniques; and ii) the modal transient response method, which uses either eigenvalue extraction through a highly tuned Lanczos solver or automated component mode synthesis, which is an approximation to the full Lanczos eigensolution.

[0073] However, those two approaches have their own disadvantages. The direct transient response method uses a very small value for the time step to guarantee sufficient accuracy and avoid numerical instability. Algorithms derived from the iterative Lanczos solution hold the severe drawbacks that they do not scale linearly and that they iteratively utilize the calculation of the system matrix eigenvalues. As a consequence, computation is a complex and burdensome task when large-scale systems are considered. As reported in NX Nastran, the number of calculations for real eigenvalue analysis (the number of calculations is known to be even higher for complex eigenvalues) is of order o(nb^2 E), where n is the number of equations, b is the semi-bandwidth or a similar decomposition parameter, and E is the number of extracted eigenvalues.

[0074] Taking into consideration the computational complexity, an effective and fast time dependent response processing can be attained by evaluating an approximated solution that considers only the dominant eigenvalues and their corresponding eigenvectors. Unfortunately, high order system modes are removed, and phenomena that could be of interest for an accurate system analysis are thus neglected. This is a significant limitation in industrial applications since system size reduction excludes high frequency system dynamics, strongly hampering the possibility of a more reliable and robust technological design.

[0075] Machine learning and deep learning methods can also solve large systems, but at the price of discarding possibly significant information. Indeed, these techniques rely on methods that can be recast as data compression algorithms, and they consequently neglect high order system modes.

[0076] Conversely, tensor network algorithms offer the possibility to design almost linearly scalable parallel solutions without the necessity of system size reduction. This remarkable feature is achieved because they break a large tensor into smaller entities, providing the possibility to distribute small dimensional and, therefore, manageable matrices among the computational units. In addition, the methods can be implemented and executed on a single computing device as well as on distributed computing infrastructures such as, for instance, cluster, grid, or cloud, laying the foundations to define a new roadmap in system simulation design for industrial applications.

[0077] The disclosure discusses a highly scalable parallel algorithm based on tensor network methods to numerically solve large systems of ordinary linear differential equations or of ordinary linear difference equations. Since the output response of a linear time invariant discrete time system can be computed through the solution of a suitable equivalent matrix exponential, as occurs in the continuous time case, the disclosure will refer to continuous time systems only and dedicate a specific section to the discrete time case.

[0078] Linear differential equations, which represent the system mathematical models of many industrial applications, are characterized by a large number of variables. The employment of the distributed computing paradigm is, therefore, a pragmatic and effective response to this challenge. Under this premise, tensor network techniques address the issue since they allow the development of parallel algorithms to simulate any high dimensional system that can be recast in a state space representation.

[0079] The core of the methodology is based on exploiting the theoretical and mathematical analogy between the Schrodinger equation, which is linear and governs quantum mechanics, and the state space representation of linear time-invariant systems. Indeed, if the Hamiltonian operator and the wave function ψ(x,t) are substituted by the system matrix A and the state matrix X(t) respectively, the tensor network algorithms that are applied to provide the numerical solution of the Schrodinger equation can also be employed to solve any system in a state space representation. This is achievable since, in both quantum many-body problems and state space systems, the free evolution time dependent analytical and, therefore, also numerical solution may utilize the computation of a matrix exponential.

[0080] For instance, the Time Evolution Block Decimation (TEBD) algorithm can effectively address big system time dependent solutions. The method has been developed by computational physicists to provide the numerical solution of the Schrodinger equation, e.g., to evaluate the time evolution of many-body quantum systems by efficiently calculating the exponential of large matrices. The key feature of the algorithm is breaking the Hamiltonian into groups of non-commutative matrices that are afterwards divided into smaller commutative matrices, each of which can be handled by one single processing unit. The Suzuki Trotter decomposition follows, to compute the time dependency in a very effective distributed manner.

[0081] The disclosure discusses, for example, using the TEBD algorithm to solve time dependent responses of any system which can be recast in a state space representation, and can include the following steps:

[0082] (1) building a mathematical representation of the physical, economic, engineering, etc., problem through differential equations;

[0083] (2) determining the graph that defines the connections between the system’s states and computing the adjacency matrix;

[0084] (3) identifying, through an optimization process, the sets of matrices such that all the elements within each set commute with each other while at least one element of a set does not commute with at least one element of another set;

[0085] (4) implementing, once at the outset, the Suzuki Trotter decomposition with a given time interval δ and a predefined p expansion order;

[0086] (5) evaluating the system response at the time T = Nδ by iteratively using N times the result obtained in step 4; and

[0087] (6) generating simulation results for the problem within an error of order of N o(δ^(p+1)).

[0088] Strengths of the system include: an algorithm that is almost linearly scalable for lumped masses and highly scalable for FEM, so that any big system can be simulated because the only limitation is the number of computational units; no necessity of system model reduction, e.g., all high order modes are included; simultaneous system response computation for different initial states X(0), e.g., the algorithm computes the system time evolution for a set of initial conditions X(0) in just one computational run; and a simulation error of order N o(δ^(p+1)), e.g., any accuracy can be reached since the error is strictly controlled by adjusting the time interval length δ or the Suzuki Trotter expansion order p.

[0089] The wave function behaviour of a quantum mechanical system is expressed by the well-known Schrodinger equation. For a given Hamiltonian operator H and wave function ψ(x,t), the time dependent Schrodinger equation is represented by a linear partial differential equation:

[0090] iħ ∂ψ(x,t)/∂t = H ψ(x,t)    (1)

[0091] where i is the imaginary unit and ħ is the reduced Planck constant.

[0092] If the Hamiltonian operator H is constant, the solution of equation (1) assumes the expression:

[0093] ψ(x,t) = e^(−iHt/ħ) ψ(x,0)    (2)

[0094] It is well known that, in the case of large dimensionality, equation (2) cannot be numerically solved, especially due to the presence of the matrix exponential term e^(−iHt/ħ).

[0095] Analogously, for any linear time invariant continuous time system expressed through the state space representation

ẋ(t) = Ax(t) + Bu(t)    (3)

[0096] the computation of the free evolution x_f(t) of the system may require solving the following simplified model

[0097] ẋ_f(t) = A x_f(t), x_f(0) = x(0)    (4)

[0098] whose solution is given by:

[0099] x_f(t) = e^(At) x(0)    (5)

[00100] where x(0) is a column vector containing the initial state conditions.

[00101] More generally, an initial state matrix X(0) can be considered, formed by n linearly independent column vectors, which are a vector basis whose linear combinations provide all the possible initial condition values:

[00102] X(0) = [x_1(0) x_2(0) … x_n(0)]

[00103] In this case, equation (5) becomes:

[00104] X_f(t) = e^(At) X(0)    (6)

[00105] where X_f(t) is a matrix whose columns are the free evolution state vectors for the initial conditions given by the corresponding columns of the matrix X(0).

[00106] If X(0) replaces the role of ψ(x,0), the similarity between equations (2) and (6) is established.

[00107] Moreover, in both cases (2) and (6), the system computes a matrix exponential. Such a mathematical analogy has a great impact on technology because many continuous time systems are modelled as in equation (3) (a non-exhaustive list of fields of application includes, for instance, engineering, economics, statistics, computer science, and neuroscience) and, consequently, analytical and numerical methods applied to compute (2) can also be employed for equation (6).
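For a small concrete instance of equation (6), the free evolution can be evaluated directly; the following is a minimal sketch with an assumed toy two-state system, where the matrix values are illustrative only:

```python
# Free evolution X_f(t) = e^(At) X(0), for several initial conditions at once.
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [-1.0, -0.2]])   # toy damped oscillator in state space form
X0 = np.eye(2)                 # columns form a basis of initial conditions
t = 2.0
Xf = expm(A * t) @ X0          # column j is the evolution from X0[:, j]
print(Xf)
```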

[00108] For instance, computational physics methods like the time-evolving block decimation (TEBD) and density matrix renormalization group (DMRG) algorithms are currently applied on distributed platforms to numerically solve equation (2), and they can therefore also be used for (6).

[00109] Any finite dimensional linear time-invariant (LTI) system can be expressed in a matrix state space representation as follows:

[00110] ẋ(t) = Ax(t) + Bu(t), y(t) = Cx(t) + Du(t)

[00111] This is a system of linear differential equations whose free evolution x_f(t) is given by:

[00112] x_f(t) = e^(At) x(0)

[00113] In many technological applications, the matrix A is extremely large and consequently cannot be handled by only one computational unit. In this context, the distributed computing paradigm offers a viable solution to compute the exponential of large matrices. However, in addition to distributed infrastructures (e.g., cluster, grid, or cloud platforms), suitable parallel algorithms are required. Indeed, the calculation is to be executed in a distributed manner, within a certain error, and designed in a way that requires low communication among the computational units.

[00114] In this context, the TEBD algorithm, for instance, has the potential to address this high dimensional problem because it provides the numerical solution of large quantum many-body systems.

[00115] To gain insight into how the TEBD methodology works, recall that the commutative property of multiplication does not hold for matrices: in general, AB ≠ BA. For this reason, the exponential of a sum of matrices cannot be embarrassingly parallelised.

[00116] Indeed, the exponential of the sum of two matrices is different from the product of the two matrix exponentials: e^(A+B) ≠ e^A e^B. The equality e^(A+B) = e^A e^B holds if and only if A and B have the algebraic property of commuting with each other: AB = BA ⟺ [A, B] = 0.
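This property is easy to check numerically; the following sketch uses assumed toy matrices (one commuting pair, one non-commuting pair):

```python
# e^(A+B) equals e^A e^B only when [A, B] = 0.
import numpy as np
from scipy.linalg import expm

A = np.diag([1.0, 2.0])
B = np.diag([3.0, -1.0])                              # diagonal matrices commute
print(np.allclose(expm(A + B), expm(A) @ expm(B)))    # True

C = np.array([[0.0, 1.0], [0.0, 0.0]])
D = np.array([[0.0, 0.0], [1.0, 0.0]])                # [C, D] != 0
print(np.allclose(expm(C + D), expm(C) @ expm(D)))    # False
```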

[00117] The TEBD algorithm succeeds in circumventing this mathematical restriction by exploiting the Suzuki Trotter decomposition, which relies on approximating the exponential of a sum of matrices with a function whose arguments are single matrix exponentials, e^(A+B) ≈ f(e^A, e^B). The core of the TEBD method is based on, firstly, subdividing the matrix A into n sets of matrices whose elements commute with each other while at least one element of a set does not commute with at least one element of another set and, secondly, performing the Suzuki Trotter decomposition on these n sets.

[00118] The tensor network technique can be applied based on TEBD to determine the free evolution of big dynamic systems. To do so, the disclosure first provides a mathematical representation of the problem (physical, economic, engineering, etc.) through differential equations. Secondly, the adjacency matrix is evaluated by determining the graph that defines the connections between the system’s states. Then, through an optimization process, the minimum number of sets is computed such that all the elements within each set commute with each other while at least one element of a set does not commute with one element of another set. Fourth, the Suzuki Trotter decomposition with time interval δ and order p is computed offline and then iteratively used to reach the time instant T=Nδ. Lastly, the final system response for a set of initial conditions X(0) is computed at the time T. The total error of the simulation will be of order N o(δ^(p+1)), as the Suzuki Trotter decomposition will be iteratively employed N times. Any accuracy can, therefore, be achieved by either reducing the time interval δ or increasing the Suzuki Trotter decomposition order p.
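The stated error behaviour can be checked empirically; this sketch uses an assumed random non-commuting pair of matrices, and for p = 2 the global error over a fixed horizon should shrink roughly fourfold each time δ is halved:

```python
# Empirical check of the N o(delta^(p+1)) error behaviour for p = 2.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
A1, A2 = rng.standard_normal((2, 4, 4)) * 0.5     # generic non-commuting pair
A, T = A1 + A2, 1.0
exact = expm(A * T)

for delta in (0.1, 0.05, 0.025):
    N = int(round(T / delta))
    S = expm(A1 * delta / 2) @ expm(A2 * delta) @ expm(A1 * delta / 2)
    err = np.linalg.norm(np.linalg.matrix_power(S, N) - exact)
    print(f"delta = {delta:6.3f}   error = {err:.2e}")   # ~quarters per halving
```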

[00119] As an example, the mathematical representation of the problem may be associated with thermal analysis of aircraft, automobiles, or other objects. In addition, the mathematical representation of the problem may be associated with structural analysis or thermal-structural analysis. The structural analysis or the thermal-structural analysis may be associated with one or more buildings, bridges, automobiles, or other objects or structures. Even further, the mathematical representation of the problem may be related to thermofluid dynamics or aerodynamics. As an example, the problem may be related to one or more pipes for fluids such as water or oil, or may be related to the aerodynamics of an aircraft, an automobile, or a train, among others. In addition, the mathematical representation may be related to electromagnetic modelling of one or more electric or magnetic fields. As another example, the mathematical representation of the problem may be related to computational finance, such as a problem related to financial derivatives (e.g., options) or other financial problems.

[00120] The system discussed herein is able to utilize one or more of high performance computing (HPC) devices, grid computing devices, or cloud computing infrastructure to provide a way to simulate large state space systems. As an example, the system is able to divide each problem into one or more tasks and assign each task to one or more processing devices, or cores of processing devices, or threads of processing devices such as CPUs or GPUs, to perform analysis of the problem in parallel. As a result, the tensor network associated with the system provides an almost linearly scalable solution using the lumped masses method (LMM) and is highly scalable using the finite element method (FEM). The system may eliminate system model reduction, e.g., all high order modes may be included. In addition, a user may adjust the simulation error of the system. In one example, the user may adjust accuracy by modifying the time interval δ or the Suzuki Trotter expansion order p. Even further, the system provides simultaneous response computation for different initial states X(0).

[00121] Figure 1 shows a diagram of a method 100 performed by the system according to an example of the instant disclosure.

[00122] Next, the disclosure includes determining the corresponding connections, graphs, and a minimum number of non-commuting sets of matrices. The first example is a one-dimensional lattice.

[00123] In a one-dimensional lattice, interactions occur only between neighboring sites.

[00124] In one example, the one-dimensional lattice may be made of six sites, although it may include more or fewer than six sites.

[00125] Figure 2 shows the one-dimensional lattice 200 with one neighboring site interaction according to an example of the instant disclosure.

[00126] As provided below, G can be the adjacency matrix, defined as a matrix whose element g_ij is equal to 1 if sites i and j are connected (i.e., there is an edge between them) and otherwise is equal to 0.

[00127] g_ij = 1 if sites i and j share an edge; g_ij = 0 otherwise.

[00128] The adjacency matrix G for the case in Figure 2 is given by:

[00129]
G =
| 0 1 0 0 0 0 |
| 1 0 1 0 0 0 |
| 0 1 0 1 0 0 |
| 0 0 1 0 1 0 |
| 0 0 0 1 0 1 |
| 0 0 0 0 1 0 |

[00130] The diagonal terms of the matrix G are equal to zero since there are no self-loops, while all off-diagonal terms next to the diagonal are equal to one.
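The adjacency matrices of these chain models are straightforward to build; a minimal sketch, where np.eye with the k offset fills one band at a time:

```python
# Adjacency matrix of the six-site chain of Figure 2 (nearest neighbours).
import numpy as np

n = 6
G = (np.eye(n, k=1) + np.eye(n, k=-1)).astype(int)
print(G)

# For the two-neighbour interaction of Figure 3, add the k = +/-2 bands.
G2 = (G + np.eye(n, k=2) + np.eye(n, k=-2)).astype(int)
```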

[00131] As provided below, A_i can be the matrix representative of site i; the matrices A_i are the algebraic formulation of the physical interaction of the sites. The commutation matrix C is defined as a matrix whose element c_ij is equal to the commutation term

[00132] c_ij = [A_i, A_j] = A_i A_j − A_j A_i

[00133] and, for the lattice of Figure 2, c_ij can therefore be nonzero only for pairs of neighbouring sites.

[00135] The term c_ij ≠ 0 implies that the sites i and j are connected, i.e.,

[00136] c_ij ≠ 0 ⟹ g_ij = 1;

[00137] however, the contrary is not in general true, because the commutator is an algebraic product, and therefore it is possible that c_ij = 0 even if sites i and j are connected.

[00138] For this kind of one-dimensional lattice model, it is possible to partition the interaction graph into two sets whose sites do not interact. Afterwards, it is possible to consider the sets constituted by the corresponding matrix representations.

[00139] E_1 can be the set containing the sites with which the first site commutes; initially E_1 = {A_1}. The first site commutes with the sites in {A_3, A_4, A_5, A_6}.

[00140] Next, it can be checked whether the second site can be included in E_1, giving the tentative set {A_1, A_2}. Taking into consideration that the 2nd site commutes with sites {A_4, A_5, A_6}, it can be observed that

[00141] [A_1, A_2] ≠ 0.

[00142] Because of this, it is possible to exclude the 2nd site, and it is still E_1 = {A_1}.

[00143] The third site is checked next. Now, the tentative set is {A_1, A_3}, and site 3 commutes with sites {A_1, A_5, A_6}.

[00144] [A_1, A_3] = 0.

[00145] Since [A_1, A_3] = 0, the 3rd site is accepted, and the set can be updated to E_1 = {A_1, A_3}.

[00146] The 4th site commutes with sites {A_1, A_2, A_6}.

[00147] [A_3, A_4] ≠ 0,

[00148] so the 4th site is excluded.

[00149] Checking the 5th site, it commutes with sites {A_1, A_2, A_3}.

[00150] [A_1, A_5] = [A_3, A_5] = 0,

[00151] so the 5th site is included and E_1 = {A_1, A_3, A_5}.

[00152] Checking the 6th site, it commutes with sites {A_1, A_2, A_3, A_4}, and the tentative set is now {A_1, A_3, A_5, A_6}.

[00153] [A_5, A_6] ≠ 0,

[00154] so the 6th site is excluded.

[00155] The process can be iterated by considering again the 2nd site that has been previously excluded, so that E_2 = {A_2}. The 4th site can then be considered and checked whether it can be included in E_2. The 3rd site can be skipped as it has already been incorporated in E_1.

[00156] [A_2, A_4] = 0,

[00157] so the 4th site is included and E_2 = {A_2, A_4}.

[00158] Skipping the 5th site, which has already been considered, the same procedure can be applied to the 6th site.

[00159] [A_2, A_6] = [A_4, A_6] = 0, so the 6th site is included.

[00160] In the end:

[00161] E_1 = {A_1, A_3, A_5}

[00162] E_2 = {A_2, A_4, A_6}

[00163] Note that the solution obtained is not unique, because other groups made of elements that commute with each other can be formed, for instance:

[00164] {A_1, A_4}

[00165] {A_2, A_5}

[00166] {A_3, A_6}

[00167] The algorithm described above is an example of a methodology for selecting commutative sets of matrices, with a non-unique result, in the sense that it may produce a different solution depending on the starting conditions.
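The greedy selection walked through above can be summarized in a few lines; in this sketch the commutes predicate is an assumption standing in for an explicit commutator test on the matrices A_i, and indices are 0-based:

```python
# Greedy construction of the sets E_k, following the walkthrough above.
def commutes(i, j):
    # Assumed predicate for the chain of Figure 2: only neighbours fail to commute.
    return abs(i - j) != 1

n = 6
sets, unassigned = [], list(range(n))
while unassigned:
    E = [unassigned.pop(0)]               # seed a new set with the first leftover site
    for s in unassigned[:]:               # iterate over a copy while removing
        if all(commutes(s, m) for m in E):
            E.append(s)
            unassigned.remove(s)
    sets.append(E)

print(sets)   # [[0, 2, 4], [1, 3, 5]] -> E_1 = {A_1, A_3, A_5}, E_2 = {A_2, A_4, A_6}
```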

[00168] Next, a one-dimensional lattice model 300 with two neighbouring sites interaction can be considered, as shown in Figure 3.

[00169] The adjacency matrix G is therefore:

[00170]
G =
| 0 1 1 0 0 0 |
| 1 0 1 1 0 0 |
| 1 1 0 1 1 0 |
| 0 1 1 0 1 1 |
| 0 0 1 1 0 1 |
| 0 0 0 1 1 0 |

[00171] For this kind of one-dimensional lattice model, it is possible to partition the interaction graph into three sets containing the sites that do not interact:

[00172] E_1 = {A_1, A_4}

[00173] E_2 = {A_2, A_5}

[00174] E_3 = {A_3, A_6}

[00175] A two-dimensional lattice 400 can be partitioned using a square element representation, as shown in Figure 4.

[00176] The adjacency matrix G is defined by the edges of the mesh in Figure 4, and the commutation matrix C is again given by the commutators of the corresponding site matrices.

[00180] The sites can in this case be partitioned into two sets E_1 and E_2 of mutually commuting matrices, and Φ_1 and Φ_2 denote the corresponding sets of site indices.

[00186] In case the mesh is constituted by triangle elements, as shown in Figure 5, three sets can be identified.

[00187] Figure 5 shows a two-dimensional lattice 500 with one neighbouring site interaction and partitioned using triangle elements.

[00188] The three sets E_1, E_2, and E_3 are in this case obtained with the same procedure, assigning mutually non-interacting sites to the same set.

[00192] The connections within the sites of the mesh through triangles could also be different, as shown in Figure 6.

[00193] Figure 6 shows a two-dimensional lattice 600 with one neighbouring site interaction and partitioned using another pattern of triangle elements.

[00194] Three sets of mutually commuting matrices can again be identified for this pattern.

[00198] Figure 7 shows a three-dimensional lattice 700 with cubic elements and two sets and . In addition, Figure 7 shows the three-dimensional lattice 700 with one neighboring site interaction and is partitioned using cubic elements.

[00199] In this case the two sets are easily identified by simply considering the even and odd indexes:

[00200]

[00201] [00202] The system relates to numerically solving by using a tensor network and the dynamics of any system that can be recast as a linear time invariant state space representation. In one example, a mathematical formulation is provided to build the optimal tensor network by identifying the sets , containing the sites and the corresponding representative matrices A k

[00203]

[00204] To build the optimal tensor network, it is desirable to solve a constrained minimization problem whose solution is the optimal sets with / minimum.

[00205] Minimum I subject to:

[00206]

[00207] This optimization problem can be solved by using linear programming, integer linear and non-linear programming as well as machine learning, deep learning, and neural network algorithms, among others.

[00208] Once the number / of non-commuting sets has been computed, a

Lie-Trotter-Suzuki decomposition can be performed.

[00209] A TEBD implementation in which the matrix A can be divided in only two sets and of non-commuting matrices is provided below. In case more sets of non-commuting matrices are considered the same rationale can be applied.

[00210] A can be a matrix such as it can be split as a sum of / addends denoted by

[00211]

[00212] and each matrix in the set which has an odd index commutes with all the matrices having odd indexes and does not commute with at least one matrix having an even index. Furthermore, each matrix in the set of even indices matrices commutes with all the matrices having even indices and does not commute with at least one matrix having odd indices. In synthesis:

[00213]

[00214]

[00215] [00216] The Suzuki Trotter decomposition of order p can be applied on the two sets of matrices , and this results in e As an example, setting p=2, the second order approximant is given by:

[00217]

[00218] where the term can be furthermore expressed as product of the exponential of the matrices that have odd indices (this equality holds since all those matrices do commute with each other by construction):

[00219]

[00220] By applying the same consideration to the term

[00221]

[00222] Thanks to this property, the Suzuki Trotter decomposition can be implemented in parallel.

[00223] A parallel solution for the second order approximant is provided because once the methodology has been identified, the same approach can also be applied for higher order approximants. In case of the second order approximant, the Suzuki Trotter decomposition is performed in only three different steps in the sense that there are three layers on which a parallel computation is performed. In the first step, the term can be calculated by distributing on different computing units each single term odd, of the overall product In the second layer, the term can be calculated by multiplying each even, with the results of the first step that will include only the terms that interact with Aj once they have been compressed through singular value decompositions (SVDs). It will then follow the final layer that will compute the products between each and the corresponding SVDs compressed outputs retrieved from the second step.

[00224] Although the Suzuki Trotter decomposition can be an iterative process to calculate the system evolution, since it computes the matrix exponential for a small-time step the terms ,j even, are calculated only once during the simulation since they are constant along the time evolution. This is an important feature that simplifies the calculation and makes the major computational bottleneck the SVDs evaluations. Indeed, except for the last layer, all the steps terminate by computing the SVDs of the corresponding threads output. This strategy aims at reducing the system size during the time evolving simulation process and is implemented by discarding the eigenvectors whose corresponding eigenvalues are below a given threshold. This ensures that low data transfer and communication messages will take place between layers and threads and the system size constantly decreases during the time evolution because the matrix A induces correlation among the state variables.

[00225] The TEBD algorithm can also be used to solve quantum states and, therefore, wave functions. This is achieved by ascertaining the mathematical analogy of the Schrodinger equation and the dynamical systems. The TEBD algorithm has been developed to solve the time evolution of quantum bodies and, therefore, the methodology has the task to evaluate the wave function ψ(x,t). If the wave function is replaced at the time t=Q, ψ(0) with an initial state matrix X(0) formed by n linearly independent column vectors, representing all the possible values of initial conditions X(0) = [x 1 (0), x 2 (0), ... x n (0)] the algorithm can solve all these trials in one computational run. In other words, instead of launching many Monte Carlo trials, the algorithm holds the capability to evaluate the entire solution set in only one computational run.

[00226] The methodology has therefore two major advantages that can be summarised as follows:

[00227] ( 1 ) singular value decompositions computed during the time evolution provide an efficient truncation technique of the large state space,

[00228] (2) all the simulation experiments corresponding to a set of Monte Carlo trials are processed in one single run.

[00229] As an example, the TEBD parallel algorithm can be used in which the second and fourth order approximant Suzuki Trotter decomposition is employed. A set of experimental trials can be computed in only one computational run.

[00230] linearly independent column vectors that represent the initial conditions of q different simulation trials and let X be a matrix whose columns are the vectors

[00231] A can be a band matrix or a different type of matrix.

[00232] On a given site a fraction, one or more threads can run for the task's execution.

As an example, that thread 1 is associated to site on. However, the task assignment within the distributed computational scheme can also be different. For instance, thread 2 can be associated to only thread 2 can be associated to etc. or also even dynamically allocated through the processing in the sense that it can change during the processing.

[00233] In one example, I is an even number, the matrix A 1 is related to matrix A 2 to n 2 , ... matrix A, to site and the matrices satisfy the following properties:

[00234] A is a band matrix and can be split as a sum of A 1 , A 2 , ... A,

[00235]

[00236] and each matrix Ay does not commute with any adjacent matrix while it commutes with all the others:

[00237]

[00238]

[00239] Under the assumption of thread 1 assigned to sites thread 2 to sites fl 3 and f ...thread n/2 to sites there can be a total of 1/2 threads.

[00240] In one example, the initial time t=0 and the site n where it is assumed thread 1 executes. In the first layer thread 2 holds the subtask to evaluates the SVD of the term of the Suzuki Trotter decomposition. Considering SVD mathematical properties, there will be three matrices such as:

[00241]

[00242]

[00243] The matrix is diagonal for SVD construction, with the (ordered) singular values σ 1 ≥ σ 2 > •• • > σ n on its diagonal. The singular values below a given threshold, say can be discarded to reduce the system size, e.g., if , then: [00244]

[00245] In the second computational layer, the site n 2 in which only the data from the sites can be conveyed as the other sites act on other positions since A is a band matrix, e.g.,

[00246]

[00247] The step number three follows the same rationale, and the different computational layers showing the distributed implementation are shown in Figure 1.

[00248] Figure 1 shows a parallel implementation for a band matrix and three different computational layers for the second order Suzuki Trotter decomposition 800 according to an example of the instant disclosure.

[00249] In one example, the fourth order approximant (p=4), and the Suzuki Trotter decomposition relies on solving the following exponentials:

[00250]

[00251]

[00252]

[00253] The methodology follows the same approach that has been employed to solve the second order approximant. The only difference is that in this case seven parallel processing layers instead of three can be used.

[00254] Figures 9A and 9B show a parallel implementation for a band matrix and seven different computational layers for the fourth order Suzuki Trotter decomposition 900 according to an example of the instant disclosure.

[00255] A does not have to be a band matrix. Rather, any matrix A can be used in the TEBD parallel implementation. However, there may be drawbacks associated with a possible increase of the number of connections between threads because data will be transferred between non-adjacent sites.

[00256] As an example, a matrix Aj does not have to commute with adjacent indices matrices p does not have to commute although The hypothesis can be formally stated as: [00257] let A be a matrix that can be split as a sum of n addends denoted by A 1 , A 2 , ... Aj

[00258]

[00259] and Aj does not commute with matrices having adjacent indices while it commutes with all the others except for the matrices having indexes a and P with a E odd index, P G even index

[00260]

[00261]

[00262] odd index, p G even index

[00263] In the first layer, thread a/2 holds the subtask to evaluate the SVD of the Suzuki

Trotter decomposition's term while and the SVDs of respectively.

[00264] In the second computational layer, focus is on the site in which data not only from the adjacent but also from the site are to be transferred to compute the following term:

[00265]

[00266] The third layer is described next.

[00267] Figure 10 shows a parallel implementation for any matrix A for the second order Suzuki Trotter decomposition 1000 according to an example of the instant disclosure.

[00268] As an example, the TEBD implementation can have A divided in any number of sets of non-commuting matrices.

[00269] In some cases, the state matrix A is a full matrix and therefore more than two sets of non-commuting matrices are to be determined.

[00270] A full matrix A

[00271] [00272] can be, for instance, partitioned in n matrices whose elements are zeros except the terms of a column:

[00273]

[00274]

[00275] An approximant of e Ax for a given order can be determined by using the fractal decomposition methods and other and optimal approximants can also be built by employing other techniques.

[00276] As an example, a second order approximant for only two sets A 1 and A 2 is given by:

[00277]

[00278] For three sets A 1, A 2 and A 3 becomes

[00279]

[00280] For any finite number of sets A 1, A 2 , A 3 , ... A, it is possible toobtain

[00281]

[00282]

[00283] As an example, a four-order approximant for two matrices and A 2 is given by:

[00284]

[00285]

[00286]

[00287]

[00288] For three terms A 1, A 2 and A 3 ,:

[00289]

[00290]

[00291]

[00292] [00293]

[00294] For an arbitrary finite number of sets A 1, A 2 , A 3 , ... A I ,

[00295]

[00296]

[00297]

[00298]

[00299]

[00300] Another example of four-order approximation is given by the following expression:

[00301]

[00302]

[00303]

[00304]

[00305]

[00306]

[00307] For example, if three terms A 1, A 2 and A 3 :

[00308]

[00309] .

[00310]

[00311]

[00312]

[00313] 1

[00314] For any finite number of sets A 1, A 2 , A 3 , ... A,

[00315] [00316]

[00317]

[00318]

[00319]

[00320]

[00321]

[00322] As an example, for any finite number of sets A 1, A 2 , ..., A I , the sixth order approximant is given by:

[00323]

[00324]

[00325]

[00326]

[00327]

[00328]

[00329]

[00330]

[00331]

[00332]

[00333]

[00334]

[00335]

[00336]

[00337]

[00338] [00339]

[00340]

[00341] The eight-order approximant is given by:

[00342]

[00343]

[00344] In one example, parallel implementation may be achieved using Suzuki Trotter expansion. The advantage concerning the parallel implementations is that the exponential approximant terms of the Suzuki Trotter expansion are constant with respect to the time and they can therefore be computed only once at the beginning of the processing, e.g., offline.

[00345] However, the disadvantage may be that all the sites are connected since the state matrix is full and consequently, network links between all the computing elements may be used to transfer data among sites.

[00346] Figure 11 shows an example of connections of a full matrix characterizing four sites 1100 according to an example of the instant disclosure.

[00347] In one example, the system may perform Suzuki Trotter expansion and use distributed algorithms for matrix exponentials. If the matrix A is very large, the calculation of the exponential approximant terms can be time intensive although they have to be computed only once at the beginning of the processing.

[00348] To address this issue, a hybrid methodology can be implemented to reduce the number of the exponential approximant terms and can rely on employing the Suzuki Trotter expansion together with any other algorithm that is able to compute the exponential of matrices in parallel such as the classical matrix diagonalization techniques based on eigenvectors.

[00349] For instance, a full matrix A (can be a 4x4 matrix):

[00350]

[00351] can be partitioned in the following four-square submatrices [00352]

[00353]

[00354]

[00355] Thus, as an example, there is a Suzuki Trotter expansion with only four terms. If a second order approximant is considered:

[00356]

[00357] Noting that , , the following equalities can be obtained:

[00358]

[00359]

[00360] where I is the identity matrix. The expression for S 2 (x) simplifies to:

[00361]

[00362] The terms can be computed on a distributed manner using algorithms such as for instance the Lanczos algorithm.

[00363] As an example, an 8x8 matrix can be:

[00364]

[00365] And it can be partitioned in sixteen matrices:

[00366] [00367]

[00368]

[00369] A can be equal to the sum of all the sixteen matrices . Noting that:

[00370]

[00371] The following equalities can be determined:

[00372]

[00373]

[00374]

[00375]

[00376] The second order approximant is given by the following product

[00377]

[00378]

[00379] casein one example, let a matrix of size p x k such as

[00380]

[00381] In general, a matrix A can be partitioned in many submatrices A 11 , -A 21 ,... A nm such as

[00382] [00383]

[00384] With this partition, the sum of the matrices gives the matrix A

[00385]

[00386] Analogously, the second order approximant is given by:

[00387]

[00388]

[00389] As an example, there can be n x m matrices and the matrix exponential of the Suzuki

Trotter decomposition can be evaluated through parallel algorithms.

[00390] There are many different ways to partition the matrix A, and the optimal strategy will depend on the structure of the matrix A (e.g., positions of zeros) and on the architecture of the available computational platform (e.g., RAM amount, number cores and GPUs, bandwidth speed etc ...).

[00391] Concerning the time evolution of any LTI system with a dynamic matrix A (either band, not band or full matrix), the Suzuki Trotter decomposition computes the term e AS . The algorithm can evaluate the time evolution for the period 8 whose length, for convergence reasons, can be sufficiently smaller than one, i.e., 5<<1.

[00392] Nevertheless, this implies an iteration scheme which can be implemented many times. This workflow can be achieved by firstly splitting the overall time in N small intervals of length 5<<1. Secondly, starting from an initial state X(0) and denoting by X(kδ) the state variables at the k-th interval of length 8, the overall time evolution can be obtained by iteratively computing the value until reaching the final value X(Nδ) as shown in Figure 12.

[00393] Figure 12 shows an iterative process 1200 to evaluate system time evolution according to example of the instant disclosure.

[00394] Recalling that the Suzuki Trotter decomposition with time interval 8 and order p th gives an error of order o(δ p+1 ), it is possible to compute the total simulation error.

[00395] Let e At be a matrix exponential and let e At the matrix obtained through the p th order Suzuki Trotter decomposition applied to the matrix A. For a given time step δ, it is e = , e.g., the Suzuki Trotter expansion gives an approximated value with an approximation error of order o(δ p+1 ).

[00396] The system free response at the time δ is then given by:

[00397]

[00398] At the time 28 and 3δ, the system free response can be computed as:

[00399]

[00400]

[00401]

[00402]

[00403]

[00404] By iteratively repeating the procedure, the final instant time can be determined T = Nδ.

[00405]

[00406]

[00407] As a result, the total error of the simulation will be of order N o (δ p+1 ) since a Suzuki Trotter decomposition will be iteratively performed N times.

[00408] In other words, the algorithm can improve accuracy by either reducing the time interval δ or increasing the Suzuki Trotter decomposition order p.

[00409] If the initial state X(0) is a matrix whose columns are initial vectors of a set of many trials, the system can evaluate many Monte Carlo experiments in only one computational run.

[00410] Nevertheless, the procedure can be trivially extended in the sense that all possible trials can be evaluated in a single run by exploiting the system linearity. Indeed, if the initial state X(0) is the identity matrix

T 0 O’

[00411]

[00412] all possible combinations can be calculated by simply scaling the solutions. [00413] Denoting by X(T) the solution at time T obtained by posing X(0) = I nxn , the response of the system at a given set of initial conditions K o are given by:

[00414] Y(T) = X(T)Y 0

[00415] In one example, the system can solve all possible combinations of initial states in one run. This can provide a great impact on technology and allows industry to design robust solutions in the sense that they are unfailing and reliable to a large set of initial conditions and forces.

[00416] In one example, the system can be used to determine the dynamics of a lumped mass spring mechanical model. In one example, the matrix A is a band matrix, or any matrix A.

[00417] In the first example, A is a band matrix. Let be I masses, let and l springs and dampers characteristic parameters, respectively. The first mass m 1 can be connected to a fixed point through the spring k 1 and damper b 1 and to the mass m 2 through k 2 and b 2 . In general, the mass is connected to the mass with and damping while to the last mass is instead connected to another fixed point through the spring k L+1 and damping b L+ 1 .

[00418] Mass-spring-damper models are classic and widely used simulation models. Such models are well-suited for modelling any kind of objects including those with complex material properties such as nonlinearity and viscoelasticity. As well as engineering simulation, these systems have applications in computer graphics and computer animation. As one example of an engineering application, a mass-spring-damper model can be used to design vehicle seats, where it is typically desirable to limit the amount of vertical displacement and velocity an occupant experiences as the vehicle traverses the road surface and obstacles such as speed bumps. A vehicle seat has mass, its materials have a springiness (e.g., cushioning) which can be modelled as a spring. It has resistance to motion (e.g., where the seat is secured to the floor of the vehicle) which can be modelled as a damper. Other simple examples of systems that may be modelled in such a way include windows experiencing forces from wind or physical contact (e.g., mass, resistance to motion (frame), springiness of the glass), a wine glass being tapped (e.g., the wine glass has more springiness than the window, as can be heard when the glass “pings” in response to an impact), and a bridge, which has mass, springiness, and resistance to motion and experiences forces from people or vehicles passing over it. [00419] By running such a simulation, one can determine, for example, for a given force and given spring and damping constants how the masses will move. In this way, spring and damping constants can be adjusted to avoid excessive movement when the masses experience expected forces. For example, in the design of a vehicle seat, stiffness of the seat materials might be increased if they were found to otherwise move too much in the simulation when the vehicle experiences typical forces from passing over a speed bump.

[00420] Figure 13 shows a mass spring damper system 1300 according to an example of the instant disclosure. The system can be modelled using a system of second order differential equations as shown below:

[00423] A second order differential equation can be transformed in two first order differential equations, and the mass-spring-damper model of Figure 13 in a system space representation can be expressed as follows:

[00427] The system can be formulated through the following matrix notation:

[00429] If the matrix A is split in I matrices A 1, A 2 , A 3 , ...A l constructed in the following manner [00431]

[00432]

[00433]

[00434] It can be determined not only that their sum is equal to the matrix A

[00435]

[00436] but also, that each matrix does not commute with its adjacent matrices while commuting with all the others:

[00437]

[00438]

[00439] For instance, the matrix A 1 does not commute with A 2

[00440]

[00441] while it commutes with all the others

[00442] .

[00443] It follows that the Suzuki Trotter decomposition can be therefore applied on two sets of matrices having even and odd indices. [00444] However, for computational reasons, in order to decrease the memory usage, instead of using A 1, A 2 , A 3 , ...A l , the following smaller matrices are considered:

[00445]

[00446]

[00447]

[00448]

[00449] The parallel version of the algorithm can be implemented by computing the Suzuki

Trotter decomposition on the matrices The description of the parallel coding is provided herein.

[00450] The simulation can be performed over the time interval [0, T], In this instance, it is possible to divide the overall interval in smaller time steps of length δ, and each single time interval will, therefore, begin at the time instants 0, δ, 2δ, .. , nδ, ..., T — δ.

[00451] A second order Suzuki Trotter decomposition approximant can be divided in three different parallel processing layers. Let be the value of the thread i at the layer j of the Suzuki Trotter decomposition.

[00452] The time evolution of the system at time δ with initial conditions X(0) can be computed by distributing over the threads the calculation. The first thread at the layer one will process a part of the overall computation and is given by:

[00453]

[00454] The second thread at the layer one will be:

[00455]

[00456] At the layer two, the following is calculated

[00457] [00458]

[00459]

[00460]

[00461] At the layer three of the Suzuki Trotter decomposition, the following is calculated:

[00462]

[00463]

[00464]

[00465] The values at the different threads provide a distributed solution at the time X(δ). The communication scheme concerns only adjacent threads delivering the possibility to implement a high scalable tool.

[00466] Figure 14 shows communication 1400 between threads when A is a band matrix according to an example of the instant disclosure.

[00467] In one example, it is possible to evaluate the time evolution at the time instant 28, by repeating the Suzuki Trotter decomposition by entering as initial condition the value X(δ) computed at the time 8. The initial conditions at the final time N8 is therefore the value X( (N — 1)δ) calculated at the previous step (N — 1)δ as shown in Figure .

[00468] As already discussed, the methodology offers the possibility to evaluate many Monte Carlo trials in only one computational run. Considering the linearity of the system, if the matrix of initial condition X(0) is the identity matrix, all possible combinations can be calculated afterwards:

[00469] [00470] Figure 15 shows the outcome of the simulation considering as initial matrix X(0) = / nxn the identity matrix according to example of the instant disclosure. Any set of initial conditions can be now easily computed by exploiting system linearity. Figure 16 shows how the number of the considered singular values along the simulation decreases, e.g., the matrix A induces correlation between the state variables and therefore the number of singular values greater than a given threshold decreases according to example of the instant disclosure.

[00471] Figure 15 shows free evolution response showing of all the state variables 1500 according to an example of the instant disclosure.

[00472] Figure 16 shows a number of the singular values during the simulation greater than the threshold 1600 according to an example of the instant disclosure.

[00473] The simulation error can be strictly controlled by changing the time interval δ and the Suzuki Trotter decomposition p order. Figure 17 shows an example of the simulation error with respect to the time, where a time step δ = 0.1 is chosen and a p = 4 order Suzuki Trotter decomposition is implemented.

[00474] The maximum simulation error is N o (δ p+1 ) and therefore:

[00475]

[00476] Figure 17 shows an example of simulation error versus elapsed time in seconds with δ = 0.1 and p=4 1700 according to an example of the instant disclosure.

[00477] Keeping the 4 th order approximant of the Suzuki Trotter decomposition and reducing by one order of magnitude the time interval it is possible to gain a reduction of the simulation error of p=4 order of magnitudes as shown in Figure .

[00478] error = N δ 4+1 = 50000 * 0. 01 5 = 5-10“ 6

[00479] Figure 18 shows an example of simulation error versus elapsed time in seconds with δ = 0.01 and p=4 1800 according to an example of the instant disclosure.

[00480] As an example, there can be a numerical example of a lumped mass spring mechanical model.

[00481] For instance, there can be a system with four masses.

[00484] As a further example:

[00485]

[00486]

[00487]

[00488] The state matrix formulation becomes: [00489]

[00490] And therefore:

[00491]

[00492] It is possible to consider the following four matrices:

[00493] [00494]

[00495]

[00496]

[00497] As a result:

[00498]

[00499]

[00500]

[00501] The following two non-commuting sets are identified:

[00502]

[00503]

[00504] It is possible to apply the second order Suzuki Trotter decomposition. [00505]

[00506] In addition, as an example, δ = 0.1.

[00507]

[00508]

[00509]

[00510]

[00511] [00512]

[00513] Such a computation can be implemented on a distributed platform.

[00514] As initial condition x(0):

[00515]

[00516] If there are two threads thread 1 and thread 2 , the first step in which

[00517] thread computes

[00518]

[00519]

[00520] thread 2 computes

[00521] ( ) [00522]

[00523] In the second step it is possible to obtain

[00524] thread 1 computes

[00525]

[00526]

[00527]

[00542] The solution is distributed among the threads, indeed the first four rows of thread 1 provides the solution for the first four state variables [x 1 ,x 2 ,x 3 ,x 4 ] while the last four rows of thread 2 gives the solution for the last four state variables [x 5 ,x 6 ,x 7 ,x 8 ]

[00543]

[00544] Note that the have been computed and therefore it is possible to proceed in computing the next time instant by multiplying those matrices for the previous states.

[00545] The solution at the time instant x(2 δ) = x(0.2) is given by

[00546]

[00547]

[00548]

Let m 4 , m 2 , be 21 masses. The system is equal to the model shown above with the only difference that the first mass is connected to the second mass but also with the last mass that has an even index as shown, for example, in Figure 19.

[00549] Figure 19 shows an example of a mass-spring-damper system with a first and last mass connected according to an example of the instant disclosure. [00550]

[00551]

[00552]

[00553] If the matrix A is split in I matrices A 1 , A 2 , A 3 , ...A i lonstructed in the following manner:

[00554]

[00555] [00556]

[00557]

[00558] Their sum is equal to the matrix A:

[00559]

[00560] And the matrices with odd indexes do not commute with the matrices having even indexes but all the matrices with odd indices commute each other as well as all the even matrices:

[00561]

[00562]

[00563]

[00564]

[00565] The parallel implementation of this example may use communication not only between adjacent threads but also between the first and last thread as shown in Figure 20 and Figure 21.

[00566] Figure 20 shows communication 2000 between threads when A is not a band matrix and a Suzuki Trotter decomposition 2 nd order approximant according to an example of the instant disclosure. Ten masses and five threads are considered.

[00567] Figure 21 shows parallel implementation of the Suzuki Trotter decomposition 2 nd order approximant when the first and last mass are connected 2100 according to an example of the instant disclosure. [00568] As an example, a system on a surface may have nine masses and there is no dumping. Furthermore, the masses form a mesh whose basic elements are square as shown in Figure 22.

[00569] Figure 22 shows an example of a two-dimensional lattice model with nine masses and no dumping 2200 according to an example of the instant disclosure.

[00570] Every mass with an even index can be connected to a mass with an odd index and vice versa. The system can be mathematically formulated as a system of nine second order differential equations:

[00571]

[00572]

[00573] In a state space representation, it becomes a system of eighteen first order differential equations and therefore:

[00574] [00575]

[00576] If the matrix A is split in nine matrices A 1, A 2 , A 3 , ... A 9 which are constructed in the following manner

[00577]

[00578] [00579]

[00580] Their sum can be equal to the matrix A.

[00581]

[00582] Furthermore, any matrix with an odd index does not commute with at least one matrix having an even index but all the matrices with odd indices may commute each other as well as all the even indexes matrices:

[00583]

[00584]

[00585]

[00586] These properties are a consequence that every mass with an odd index is connected to a mass with an even index and vice versa. For instance, noting that m 1 is connected to m 2 and m 4

[00587]

[00588] It is possible to build a commutation matrix C such as the element C(i,j) is equal to:

[00589]

[00590]

[00591] [00592] The two sets of non-commuting terms are therefore made of the matrices constituted by the following elements:

[00593]

[00594]

[00595] The model can be extended by considering twelve masses as shown, for example, in Figure .

[00596] Figure 23 shows an example of a two-dimensional lattice with twelve masses connected through square elements 2300 according to an example of the instant disclosure.

[00597] In this case even and odd indexes do not define the sets, for instance [A 1 , A 5 ] ≠ 0. A possibility of two non-commuting sets is the following:

[00598]

[00599]

[00600] Masses connected through triangles are shown in Figure 24. In this instance, there may be at least three sets of non-commuting terms.

[00601] Figure 24 shows an example of two-dimensional lattice with twelve masses connected through triangular elements 2400 according to an example of the instant disclosure.

[00602] For instance, a possibility of three non-commuting sets are those constituted by the following elements [00603]

[00604]

[00605]

[00606] As an example, there may be twenty-seven masses forming a mesh whose basic elements are cubes as depicted in Figure 25.

[00607] Figure 25 shows an example of a three-dimensional lattice model with twentyseven masses connected through cube elements 2500 according to an example of the instant disclosure.

[00608] Every mass with an even index can be connected to a mass with an odd index and vice versa. Thus, two groups of matrices can be built such as

[00609]

[00610] And also that: [00611]

[00612]

[00613]

[00614] Noting that m 1 is connected to m 4 , m 2 and m 10 :

[00615]

[00616] The two non-commuting sets are made of the elements characterised by even and odd indexes, respectively. Therefore:

[00617]

[00618]

[00619] Other basic shapes can be considered as connecting elements of the masses: e.g., when tetrahedrons are considered, at least three non-commuting sets can be considered.

[00620] As an example, a system can be expressed in a state space representation:

[00621]

[00622] The system can have a response to different canonical input signals, such as impulse, step, sine, and cosine. As an example, thanks to the system linearity, the system has single-input and single-output. The system response to any input signal, can be decomposed through Taylor or Fourier series expansions.

[00623] The response of the LTI system (7) to an impulse u of a given amplitude c is:

[00624]

[00625] Supposing that Bc is equal to x 0 , the impulse response can be regarded as a free evolution with initial conditions Be. The results also hold in case of real impulsive inputs, e.g., finite time duration signals, provided that this duration is sufficiently shorter than the dominant time constant of the recipient dynamic system.

[00626] The response of the system to a step input (as shown in Figure ) and from an initial condition x 0 , is given by:

[00627]

[00628] Figure 26 shows a step force 2600 according to an example of the instant disclosure. [00629] The expression can also be written as the sum of three different addends:

[00630]

[00631] Where:

[00632] is the free evolution of the system response starting from the initial condition x 0 ;

[00633] is the transient time evolution of the system response for the step of amplitude u 0 ;

[00634] s a constant steady-state term that the system will asymptotically reach in case the system is stable.

[00635] The disclosure provides an example of determining the free evolution term e At x 0 . The other two terms, instead, need a remark. They contain the inverse of A and if this matrix is large this calculation may become a burdensome task. Nevertheless, there are parallel algorithms that can efficiently perform this kind of computation and some of them also use GPUs.

[00636] Furthermore, the inversion of A can be evaluated only once and it can be done a priori so that this onerous computation is not involved in the time evolution iteration cycles. As a conclusion, matrix A inversion can be easily implemented and does not have a great impact on the overall code computational performance.

[00637] The results of many trials can be provided in one computational run, and it can be easily implemented. To do so, the vectors x 0 and B are to be substituted by matrices X(0) and Bu 0 because their columns’ vectors are the initial conditions and the step amplitude of the trials. It follows that this equation is to be solved:

[00638]

[00639] However, recalling system linearity, all possible combinations of initial conditions and step inputs can be calculated by solving a model in which X(0) and Bu 0 are identity matrices. Once this calculation has been formed all the possible combinations can be recast by considering the linear combinations of the free evolution and forced step input solutions.

[00640] [00641] Figure 27 and Figure 28 depict the results of the simulation of the one-dimensional mass-spring-damper model with A being a band matrix. Figure 27 shows system free evolution on which has been added the step response. Figure 28 depicts only the system’s step response.

[00642] Figure 27 shows free evolution plus step response of the first state variable including all the simulations 2700 according to an example of the instant disclosure.

[00643] Figure 28 shows step response of the first state variable including all the simulations 2800 according to an example of the instant disclosure.

[00644] Exponential input

[00645] Let the input signal be and assume that where λ i (A) is the z’-th eigenvalue of the matrix A. The complete (free and forced) state response of the LTI system is given by

[00646]

[00647]

[00648]

[00649] Note that, in case of asymptotically stable systems, the state response is composed of a transient term, which resembles a free evolution term, and a steady-state term, which is proportional to the exponential input (pure exponential response).

[00650] The response of the LTI system (7) to a sine input signal u(t) = Usin(ωt + α) starting from the initial state x(0) = x 0 is given by:

[00651]

[00652] With

[00653]

[00654]

[00655]

[00656] Considering a set of initial conditions X(0), the expression becomes:

[00657] .

[00658] When setting X(0) and B to be the identity matrices: [00659]

[00660] it is possible to compute all the possible states in one single run.

[00661] Figure 29 and Figure 30 show the results of the simulation of the one-dimensional mass-spring-damper model with A being a band matrix. Figure 29 shows system free evolution and an added a sine force, and Figure 30 depicts the system’s sine response.

[00662] Figure 29 shows free evolution plus sine response of the first state variable including all the simulations for ω = 1 2900 according to an example of the instant disclosure.

[00663] Figure 30 shows a sine response of the first state variable including all the simulations for ω = 1 3000 according to an example of the instant disclosure.

[00664] As an example, the system is considering linear systems and thus, the superposition principle holds. As a result, it is possible to decompose any input signal u(t) as a sum of basic signals u(t) = u t (t) + u 2 (t) + ••• u n (t) + ••• whose system’s response is rather easy to evaluate. The total system’s response will be therefore the sum of the system’s solution to each basic signal: [00665]

[00666]

[00667]

[00668] Although it is possible to consider signal decomposition in any basis, as an example, the system can decompose the input signal by using the Taylor or Fourier series.

[00669] The Taylor series of a signal u(t) with an error of order o(t n+1 ) is given by:

[00670] u [00671] The system response can be computed by summing the system’s response to each single addend of the Taylor series:

[00672]

[00673] [00674]

[00675] The total system response can be therefore decomposed as the sum of the step, ramp, ... etc ... responses.

[00676] Almost any periodical signal u(t) can be decomposed by a Fourier series

[00677]

[00678] The system’s response is therefore given by:

[00679]

[00680]

[00681]

[00682] The system’ s response to a periodical signal is the sum of different sine and cosine signals at different (multiple) frequencies. The error ε is given by:

[00683]

[00684] As an example, a finite element method (FEM) having a cantilever beam is modelled as shown in Figure 31. Let p, E and / be the density, Young elasticity module and the moment of inertia of the bar respectively.

[00685] The system can provide parallel implementation for a general matrix. As an example, the system may perform parallel implementation with a matrix partitioned into four parts. [00686] The following provides a description of implementation on a parallel manner the matrix exponential for any matrix exploiting the Suzuki Trotter decomposition.

[00687] It is possible to consider a matrix r partitioned by the following four submatrices A, B, C and D

[00688]

[00689] Without loss of generality matrix F can be considered as formed by the following matrices such as F

[00691] It is possible to consider the second order Suzuki Trotter decomposition [00692]

[00693]

[00694]

[00695]

[00696]

[00697]

[00698] The following jobs are allocated to the four threads

[00699]

[00700]

[00701]

[00702]

[00703] The following implementation can be applied to reduce the number of matrices product

[00704]

[00705]

[00706]

[00707]

[00708] The matrix exponential is transformed in matrices products, GPUs are therefore a viable hardware solution since they are very powerful to perform this operation. [00709] Cascade parallel implementation

[00710] In this implementation, the terms e xA and e xD are approximated by a Suzuki Trotter expansion creating a cascade solution. By denoting

[00711]

[00712] The matrix F is partitioned as follows:

[00713]

[00714] t

[00715] t

[00716] t

[00717] t

[00718] D

[00719] e

[00720] The term e xA can be approximated by using Suzuki Trotter decomposition and therefore:

[00721]

[00722] Four threads can be for instance be employed to perform the following computation:

[00723]

[00724]

[00725]

[00726]

[00727] [00728] )

[00729] The calculation can be recast in two computational layers, firstly thread 11 and thread 42 will be computed and secondly the thread 24 and thread 22 which will use the result previously calculated. In this way only two threads can be in total employed.

[00730]

[00731]

[00732]

[00733]

[00734] Following the same logic:

[00735]

[00736] [00737] [00738]

[00739]

[00740] Applying again Suzuki Trotter decomposition to the terms e xA11 , e xAzz , e xD11 , and e xD 22, the procedure can be iteratively repeated until reaching the desired size.

[00741] Parallel implementation for a FEM system with mass matrix formed by blocks is discussed below.

[00742] State system representation

[00743] Let M, K and C be the mass, spring, and dump matrices with the following block structure:

[00744] [00745]

[00746]

[00747] The state space representation is given by:

[00748]

[00749] Noting hat

[00750]

[00751] Without losing generality, suppose B=0, C=0, the state space representation is given by:

[00752]

[00753] Reordering rows and columns: [00754]

[00755] The representation obtained can be therefore reconducted to the previous cases in which lumped masses are substituted with block masses. No mass band matrices, bi-dimensional (e.g., plates) and three-dimensional cases (e.g., solids) can also be solved.

[00756] Time evolution

[00757] The partition of the dynamic matrix is described below although the same method can be implemented in also other cases (e.g., heat transfer, Black Scholes equation, FEM, etc ...)

[00761] Two sets of non-commuting terms can be identified:

[00762]

[00763]

[00764] Without loss of generality and for sake of simplicity a Suzuki Trotter decomposition 2 nd order approximant is considered:

[00765]

[00768] Let X(0) be a matrix whose columns are vectors of different initial conditions

[00769] X(0) = [x1,(0),x2(0), ... x n (0)]

[00770] Time evolution at instant time t = δ is given by:

[00771] X(δ) = S 2 (δ)X(0) =

[00773] It is possible to perform the Singular Value Decomposition (SVD) on each term of the product and neglect the singular values lower than a given threshold. As an example, considering the first matrix exponential

[00775] where the matrix is diagonal for SVD construction, with the (ordered) singular values on its diagonal. The singular values below a given threshold, say c can be discarded to reduce the system size, e.g., if , then obtaining a lower order diagonal matrix and hence a low order approximation of the matrix exponential. [00776] Analogously for the other terms we have

[00777]

[00778]

[00779]

[00780]

[00781]

[00782]

[00783]

[00784]

[00785]

[00786]

[00787]

[00788]

[00789]

[00790] When considering a multi-cores and multi-threads platform, the following solution can be implemented in three computational layers using three threads as shown in Figure 31.

[00791] At the instant time layer 1 will solve

[00792]

[00793]

[00794]

[00795]

[00796]

[00797]

[00798]

[00799]

[00800]

[00801]

[00802] Such a job assignment holds the property that each thread will deal with a reduced order matrix of mxn size saving memory but also reducing computational effort since the matrix product will be performed also between reduced matrices. Only at the end of the step, the product will be multiplied for the matrix returning to the original space dimension.

[00803] Figure 31 shows parallel implementation using three different computational layers for the second order Suzuki Trotter decomposition 3100 according to an example of the instant disclosure.

[00804] A cantilever beam is a basic example in FEM. The proper connection of many of these elements allows users to model also very complex structures. A very intuitive application is the bridge modelling that can be, indeed, considered as numerous connected beams. However, in FEM there are many basic elements. Two-dimensional elements are for instance modelled through triangle, square, etc ... while three-dimensional elements are modelled through tetrahedra, cubes, etc. All these basic elements when properly connected constitute the fundamentals to model all the structures such as buildings, autos, aircraft etc. For instance, a building, indeed, can be regarded as a connection of pillars and walls that are modelled as beams and planes. The numerical simulation can help designers to properly calculate stress and strain conditions avoiding dangerous events that could lead for instance the building or the bridge to collapse. Analogously, automobiles and aircraft can be modelled through rods, triangle and tetrahedra which are the one-, two- and three-dimensional elements, respectively.

[00805] Without loss of generality and for sake of simplicity, it is possible to divide the bar in four elements of equal length L.

[00806] Figure 32 shows a cantilever beam 3200 according to an example of the instant disclosure.

[00807] Supposing that there is no damping, the differential equation governing the motion of any element of the beam is given by:

[00808] dt dx

[00809] The equation of motion of the element ie{l,2,3,4} is:

[00810]

[00811] Where q is the vector that contains displacement and rotation of the element ie{l,2,3,4) and the input matrix B depends on where the input force is applied to the beam.

[00812] The mass matrix m l of the element ie{l,2,3,4) is given by: [00813]

[00814]

[00815]

[00816]

[00817] while the stiffness matrix is:

[00818]

[00819] The global mass matrix M is given by:

[00820]

[00821] The global stiffness matrix is:

[00822]

[00823] The global vector containing displacement and rotation of all elements is

[00824]

[00825] Considering the constraint, it becomes: [00826]

[00827] The damping matrix C and the state space representation is:

[00828]

[00829] In this case the inverse of the mass matrix M 1 is a full matrix and therefore the matrix A is, in general, almost a full matrix.

[00830] It is possible to divide the state matrix ' n an arbitrary number of non commuting terms A t , A 2 , A 3 , ... A, such that they are small enough to be handled by a processing unit.

[00831] As shown in the previous section, for instance, a second order approximant for a general number of sets of non commuting matrices A 1, A 2 , A 3 , ... A, it is possible to obtain

[00832]

[00833]

[00834] As an example, it is possible to consider the following three matrices decomposition:

[00835]

[00836]

[00837]

[00838] It is possible to obtain the following three sets

[00839]

[00840]

[00841]

[00842] As previously shown, the expression for the second order approximant S 2 (x) for three sets is given by:

[00843]

[00844]

[00845] since [00846]

[00847] e

[00848] where I is the identity matrix.

[00849] The free evolution of the system can be evaluated as extensively shown in the disclosure herein.

[00850] Considering a constant concentrated force of amplitude u 0 applied at the node P 4 , the matrix B becomes:

[00851]

[00852] Thus, there is a step input having the solution as previously discussed.

[00853] A numerical example of a cantilever beam with two nodes is provided below.

[00854] It is possible to consider an aluminium cylindric rod of L r =4 four meters length and divide it in two elements of equal length of meters.

[00855] The state vector in this case becomes:

[00856]

[00857] The physical parameters concerning an element of the rod are:

[00858] [00859] The mass matrix M becomes:

[00860]

[00861]

[00862]

[00863]

[00864]

[00865]

[00868] Under the hypothesis of absence of dumping C=0, the state matrix A is:

[00869] [00870]

[00875] The matrix A simplifies to: By posing

[00882] The following two non-commuting sets are identified:

[00883]

[00884]

[00885] The Suzuki Trotter second order approximant for two non-commuting sets concerning a time interval x = 8 = 10 -3 is given by

[00886]

[00887]

[00888]

[00889]

[00890]

[00891]

[00895]

1 1 5054 06 45160 06 6 7741 06 l OOOO 03 7 5268 10 2 2580 09 3 3870 09

[00896] The one-dimensional diffusion partial differential equation (PDE) describes many physical phenomena including heat transfer.

[00898] As an example, a wall with two faces at prescribed temperature T r , T 2 is considered. [00899] In numerical simulation, a wall with two faces at prescribed temperature T 1; T 2 is an example model to show how heat is transferred through materials. This example can be extended to model any physical and engineering problem in which heat transfer is involved such as automobiles and aircraft engines but also solar panels and even kitchen pots. The wall example can be extended and applied to other very complex fields which also include aerodynamics, gas dynamics, and more in general thermo-fluid dynamics. For instance, in some applications the aim could be to simulate conditions in which the fluid due to the heat exchange modifies its density and pressure triggering a change in the lift of an aircraft wing. This condition may put at risk the safety of the flight possibly causing a crash or flight problem for the aircraft.

[00900] The semi-discretization method is shown in Figure 33. In this example, it is discretized with respect to the spatial variable only.

[00901] Figure 33 shows a heat conductor 3300 according to an example of the instant disclosure.

[00902] The PDE can be recast as a system of linear ordinary differential equations (ODEs) by using the method of the lines. In state space representation,

[00903]

[00904] by posing:

[00905] Thus, it is possible to consider a one-dimensional lattice model with one neighboring site interaction and it is possible to reconduct the problem based on the model as shown in Figure 2. [00906]

[00907]

[00908]

[00909] As an example, it is possible to identify the following sets Φ

[00910]

[00911]

[00912] The system input matrix B is given by:

[00913] [00914] Two step inputs of amplitude and T 2 are applied to the first and last state variable, respectively. The solution of the system under a step input has been extensively previously discussed.

[00915] One-dimensional heat equation numerical example using the method of the lines is provided below.

[00916] As an example, five nodes N + 1 = 5, N = 4, α = 0.5 and L = 1.

[00917]

[00918] The state matrix A is therefore:

[00919]

[00920] The following matrices can be considered:

[00921]

[00922]

[00923] The following two non-commuting sets are identified:

[00924]

[00925]

[00926] A second order Suzuki Trotter decomposition with δ = .01 can be applied:

[00927]

[00928]

[00930] Boundary conditions can be given through the initial time t=0 and Dirichlet conditions. [00931] In this example boundary conditions are given through the initial condition x 0 and

Dirichlet conditions as:

[00932] ƒ(x, 0) = x 0 = sin(πx)

[00933] ƒ(0, t) = 0

[00934] f (1, t) = 0

[00935] As a result:

[00936] ƒ(x, 0) = sin(πx)=

[00937]

[00938] Matrix B is given by

[00939]

[00940]

[00941] The syst em is

[00942] x(.01) =

[00943] x(.02) =

[00944] [00945] As an example, boundary conditions can be given through the initial time t=0 and Dirichlet conditions.

[00946] In this example boundary conditions are given through the initial condition x 0 and Dirichlet conditions as:

[00947] ƒ(x, 0) = x 0 = sin(πx)

[00948] ƒ(0, t) = 0

[00949] ƒ(1, t) = 0

[00950] Therefore:

[00951] ƒ(x, 0) = sin(πx)=

[00952]

[00953] Matrix B is given by

[00954]

[00955]

[00956] The syst em is

[00957] x(.01) =

[00958] x(. 02) = [00959]

[00960] Boundary conditions can be given through an instant time t=T and Dirichlet conditions.

[00961] In this example, at the time T=1 boundary conditions are the following

[00962] f (x, T) = x(T) = sin(πx)

[00963] ƒ(0, t) = 0

[00964] f (1, t) = 0

[00965] Matrix B is given by

[00966]

[00967]

[00968] Considering tha

[00969]

[00970] The following equality is derived:

[00971]

[00972]

[00973] Pre multiplying for :

[00974]

[00975]

[00976] In this case the inverse of the matrix is to be computed.

[00977] Recalling that , the inverse of the matrix exponential can be computed by applying the Suzuki Trotter expansion on the matrix -A.

[00978] For instance, a second order Suzuki Trotter decomposition with δ = .01 will be:

[00979] [00980]

[00981] The partial differential equation (PDE) governing the multi-asset Black-Scholes model can be also solved by using tensor network methodology.

[00982] The Black-Scholes is a PDE which models the dynamics of financial derivatives. Its numerical solution provides derivative investment instruments with a price. Financial derivatives are important contracts that give to investors the possibility to diversify their portfolio and hedge against inflation and deflation. For instance, an automotive manufacturer company may be interested in purchasing a European call option for the steel commodity to reduce the risk that its price can increase worldwide for unforeseen circumstances. The type of the derivative depends on the boundary conditions assigned to the Black Scholes PDE. A non-comprehensive list of the most common derivatives includes Futures, Forward, Swaps, European, America, Exotic options as well as Covered Warrants. The multiasset Black Scholes allows to also evaluate derivatives whose underlying are more than one asset, for instance, different stocks but also stock indexes, commodities, interest rates, etc.

[00983] The Black-Scholes equation to evaluate the price of an option V(S, t) is:

[00985] where S is the price of an asset value, r the free-risk rate, t is the time since the option was issued.

[00986] By denoting:

[00987]

[00988]

[00989]

[00990]

[00991] The equation can be reduced as a diffusion partial differential equation:

[00992]

[00993] Next, consider a multi asset Black-Scholes model. Let S L be the price processes of the assets i= 1, N and the asset satisfies the dynamic [00994] [00995] Assets Si and Sj are correlated and let p be the correlation matrix of all the price processes of the assets:

[00996]

[00997] The matrix p is symmetric and therefore The differential second order product term of the Wiener processes and is given by:

[00998]

[00999] The price processes of the assets verify the following equality:

[001000]

[001001] The multi asset Black-Scholes equation becomes for the option

[001003] The previous equation can be transformed in a N dimensional diffusion equation.

Consider a variable x ( such as: r [n0m010n0n9n]-i

[001010] rnninni

[001011]

[001012] rnnimai

[001013]

[001014] This is a diffusion equation that can be treated and solved as shown herein. [001015] For instance, for a two asset Black Scholes model the equation becomes the following:

[001017] The European call option numerical example using the method of the lines is provided below.

[001018] As an example, consider a European call option with strike price K = 10€, expiring date T = 10, risk-free rate r = 0.02 and volatility δ 2 = 0.4 .

[001019] Denoting by:

[001020]

[001021]

[001022]

[001023]

[001024] The constant parameter has the value:

[001025]

[001026] The stock price interval [S min , S max ] = [0.4,1000] becomes in the new variable

[001027]

[001028]

[001029] Considering along X six points X = [X 0 ,X1, X2,X3,X4,X5 ] and therefore N=5 equally spaced intervals, it is possible to obtain

[001030]

[001031] Recalling that for a European call option, the following equality is derived at the maturity date t = T

[001032]

[001033] the initial conditions for are given by:

[001034]

[001035] [001036]

[001037]

[001038]

[001039]

[001040]

[001041] while the boundary conditions are:

[001042] u(X_0, \tau) = 0 (the call is worthless at the lower boundary S = S_{min}),

[001043] u(X_5, \tau) = e^{\alpha X_5 + \beta \tau}\left(S_{max} - K e^{-r(T-t)}\right) (at the upper boundary the call behaves as S - K e^{-r(T-t)}).

[001044] Considering that the second spatial derivative can be approximated by central finite differences,

[001045] \frac{\partial^2 u}{\partial X^2}\Big|_{X_i} \approx \frac{u_{i+1} - 2 u_i + u_{i-1}}{\Delta X^2},

[001046] the method of lines reduces the diffusion equation to a system of ordinary differential equations \dot{u} = A u + B u_{bc} on the interior points X_1, ..., X_4.

[001047] The state matrix A becomes:

[001048] A = \frac{1}{\Delta X^2}\begin{pmatrix} -2 & 1 & 0 & 0 \\ 1 & -2 & 1 & 0 \\ 0 & 1 & -2 & 1 \\ 0 & 0 & 1 & -2 \end{pmatrix}.

[001049] The matrix A can be decomposed as follows:

[001050] A = A_1 + A_2 + A_3,

[001051] where, for example, each addend A_k collects the terms coupling the pair of neighbouring grid points (X_k, X_{k+1}).

[001052] The following two non-commuting sets are identified:

[001053] \{A_1, A_3\}, whose elements commute with each other,

[001054] \{A_2\}, which does not commute with the elements of the first set.

[001055] A second order Suzuki Trotter decomposition with δ = 0.1 can be applied:

[001056] e^{A\delta} \approx e^{(A_1 + A_3)\delta/2}\, e^{A_2 \delta}\, e^{(A_1 + A_3)\delta/2},

[001057] [001058] which introduces, at each step, an error of order o(δ^{p+1}) with p = 2.

[001059] Recalling that at X_0 the boundary condition is:

[001060] u(X_0, \tau) = 0,

[001061] and denoting by u_5(τ) the boundary value at X_5,

[001062] it is possible to write the boundary condition at X_5 as:

[001063] an exponential function of τ, as given in [001043].

[001064] The inputs are u_0 and u_5. Taking into consideration that, although two inputs are applied to the system, the first one is a step of amplitude zero (u_0 = 0) and therefore cancels, the matrix B simplifies and becomes a single column acting on the interior point X_4 adjacent to the upper boundary, consistently with the discretisation above:

[001065] B = \frac{1}{\Delta X^2}\,(0, 0, 0, 1)^T.

[001066] As a result, only the exponential input response has to be computed by using the formulas previously discussed. The system response at the first time step can be calculated as:

[001067] u(\delta) = e^{A\delta}\, u(0) + A^{-1}\left(e^{A\delta} - I\right) B\, \bar{u}

[001068] for a constant input \bar{u}; for the exponential boundary input the analogous closed-form convolution integral is used, with the matrix exponential e^{A\delta} evaluated through the Suzuki Trotter decomposition above.

[001069] [001070] The subsequent time steps are obtained by iterating the same update.

[001071] The final time can be reached by iterating the time evolution. Once the function u(X, τ) has been computed, the call price V(S, t) can be evaluated through the inverse variable transformation.
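For illustration only, the following minimal Octave sketch iterates the step response with a Trotterised matrix exponential (the grid spacing, input vector and time step are assumptions consistent with the discretisation above, not the exact values of the example):

## Minimal sketch: step response x(k+1) = e^{A*d}(x(k)+inv(A)*B*u) - inv(A)*B*u,
## iterated with a Trotterised e^{A*d}; A, B, u5 and d are illustrative.
dX = 1.56;                     # grid spacing (assumed)
A  = (diag(-2*ones(4,1))+diag(ones(3,1),1)+diag(ones(3,1),-1))/dX^2;
A1 = zeros(4); A1(1:2,1:2)=A(1:2,1:2); A1(3:4,3:4)=A(3:4,3:4);
A2 = A - A1;                   # remaining coupling between the two blocks
B  = [0;0;0;1]/dX^2;  u5 = 1;  # single boundary input (assumed constant here)
d  = 0.1;  x = zeros(4,1);
S2 = expm(A1*d/2)*expm(A2*d)*expm(A1*d/2);   # second order Trotter step
for k=1:100
  x = S2*(x + A\(B*u5)) - A\(B*u5);          # step-input update over d
end
disp(x')                                     # response at t = 10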

[001072] The proposed methodology can also resolve non-linear systems in a state space representation, that is:

[001073] \dot{x}(t) = f\big(x(t), u(t)\big).

[001074] Non-linear systems can be linearised around an equilibrium point or along one of the system's trajectories.

[001075] As an example, consider in the following the case (a) of linearisation around an equilibrium point; the case (b) of linearisation along a trajectory proceeds analogously.

[001076] Let \bar{x} be an equilibrium point for the system \dot{x} = f(x), i.e., f(\bar{x}) = 0. The corresponding linearised system around \bar{x} is:

[001078] \delta\dot{x} = A\, \delta x, with \delta x = x - \bar{x},

[001079] where A is the so-called Jacobian matrix and is given by:

[001080] A = \frac{\partial f}{\partial x}\Big|_{x = \bar{x}}.

From the initial conditions it is possible to compute the free evolution of the linearised system at the time instant δ << 1 using the Suzuki Trotter expansion with parameter δ. Next, it is possible to compute the free evolution of the non-linear system by using the approximation x ≈ \bar{x} + \delta x. This procedure is iteratively repeated until reaching the final time.

[001081] Figure 34 shows a flowchart of a process 3400 for non-linear systems according to an example of the instant disclosure.
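As an illustration, the following minimal Octave sketch performs the repeated linearisation along the trajectory; the dynamics f and its Jacobian are illustrative assumptions, and for large sparse systems expm(A*dt) would be replaced by the Suzuki Trotter factorisation described herein:

## Minimal sketch: iterated linearisation of a non-linear system along its
## trajectory, advanced with an exponential (exponential-Euler) step.
## Assumes the Jacobian is invertible along the trajectory.
f  = @(x) [x(2); -sin(x(1)) - 0.1*x(2)];   # assumed pendulum-like dynamics
J  = @(x) [0, 1; -cos(x(1)), -0.1];        # its Jacobian matrix
dt = 0.01;  x = [0.5; 0];
for k = 1:1000
  A = J(x);                                 # linearise around the current state
  x = x + A \ ((expm(A*dt) - eye(2))*f(x)); # free evolution of the linearised system
end
disp(x')   # state after t = 10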

[001082] As an example, the system can be used to solve time-variant systems such as linear time-variant systems.

[001083] As an example, the state-space representation of a linear time-variant (LTV) system can be:

[001084] \dot{x}(t) = A(t)\, x(t) + B(t)\, u(t);

[001085] in this case the state and input matrices A(t) and B(t) are time dependent.

[001086] Considering a sufficiently small time interval δ such that the variation of the functions A(t) and B(t) is negligible with respect to the state response, the LTV system can be approximated, within each interval, with an LTI system as discussed above.
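For illustration, a minimal Octave sketch of this piecewise-LTI approximation may be the following (the time-varying matrix A(t) is an illustrative assumption; the frozen propagator expm(A(t)*dt) is Trotterisable as described herein):

## Minimal sketch: piecewise-LTI stepping of an LTV system; A is frozen on
## each small interval dt and the LTI propagator is applied.
A  = @(t) [0, 1; -(1 + 0.5*sin(t)), -0.2];   # assumed time-varying stiffness
dt = 0.01;  x = [1; 0];
for k = 0:999
  t = k*dt;
  x = expm(A(t)*dt) * x;    # LTI approximation on [t, t+dt]
end
disp(x')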

[001087] A non-linear time-variant (NTV) system can be written as:

[001088] \dot{x}(t) = f\big(x(t), u(t), t\big).

[001089] In every sufficiently small time interval, the NTV system can be approximated with a non-linear time-invariant (NTI) system, which can be analysed as described above.

[001090] A linear discrete time-invariant system can be expressed in a state space representation such as:

[001091] x(k+1) = A\, x(k) + B\, u(k).

[001092] The parallel implementation of a discrete time-invariant system can be achieved with two different approaches: (1) by calculating the equivalent continuous system of the discrete representation, or (2) by computing the discrete evolution in a direct manner.

[001093] Considering that the free evolution of a linear time-invariant discrete system is given by the term

[001094] x(k) = A^k\, x(0),

[001095] the following equality is known:

[001096] A^k = e^{k \ln A} (for a matrix A admitting a logarithm).

[001097] As a consequence, by posing \tilde{A} = \frac{\ln A}{\delta} and t = k\delta, the discrete time evolution has been recast as a continuous time evolution x(t) = e^{\tilde{A} t} x(0), which has been extensively described in the previous sections.
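A minimal Octave sketch of this recasting may be the following (the discrete state matrix is an illustrative assumption; logm requires a matrix that admits a logarithm):

## Minimal sketch: recasting a discrete evolution x(k+1)=Ad*x(k) as a
## continuous one via the matrix logarithm.
Ad = [0.9, 0.1; -0.1, 0.9];      # assumed discrete-time state matrix
dt = 1;                          # one sample per time unit
Ac = logm(Ad)/dt;                # equivalent continuous-time generator
k  = 20;
x0 = [1; 0];
disp(norm(Ad^k*x0 - expm(Ac*k*dt)*x0))   # ~0: the two evolutions coincide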

[001098] Let A be a matrix such that

[001099] it can be split as a sum of N addends denoted by A_1, A_2, ..., A_N:

[001100] A = A_1 + A_2 + \dots + A_N,

[001101] where the product of the matrices not having two consecutive indices is equal to zero:

[001102] A_l\, A_m = 0 \quad \text{for } |l - m| \geq 2.

[001103] A band matrix satisfies these properties.
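This property can be checked with a minimal Octave sketch (sizes and values are illustrative assumptions); splitting a tridiagonal matrix into row-block addends, only consecutive blocks have a non-vanishing product:

## Minimal sketch: row-block splitting of a band (tridiagonal) matrix and
## check that non-consecutive addends multiply to zero.
n = 8;
A = diag(2*ones(n,1)) + diag(-ones(n-1,1),1) + diag(-ones(n-1,1),-1);
A1 = zeros(n); A1(1:2,:) = A(1:2,:);     # rows 1-2
A2 = zeros(n); A2(3:4,:) = A(3:4,:);     # rows 3-4
A3 = zeros(n); A3(5:6,:) = A(5:6,:);     # rows 5-6
A4 = zeros(n); A4(7:8,:) = A(7:8,:);     # rows 7-8
disp(norm(A - (A1+A2+A3+A4)))            # 0: the addends sum to A
disp(norm(A1*A3))                        # 0: non-consecutive product vanishes
disp(norm(A1*A2))                        # nonzero: consecutive blocks interact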

[001104] Let thread_i(k) be the value of the thread i at the instant time k. At the value k = 1, it is possible to obtain

[001105] thread_i(1) = A_i\, x(0), \quad x(1) = \sum_i thread_i(1).

[001106] At the instant k = 2,

[001107] thread_i(2) = A_i\big(thread_{i-1}(1) + thread_i(1) + thread_{i+1}(1)\big), since A_i A_j = 0 whenever |i - j| \geq 2, so each thread only needs data from its neighbouring threads.

[001108] At a generic instant time k+1,

[001109] thread_i(k+1) = A_i\big(thread_{i-1}(k) + thread_i(k) + thread_{i+1}(k)\big).

[001110] Analogously to the continuous case, an SVD transformation can be applied to the terms

[001111] thread_{i\pm 1}(k) exchanged between neighbouring threads,

[001112] and the singular values below a given threshold can be discarded to reduce the data transfer through threads. A scheme of the parallel implementation of a discrete time-invariant system for a band matrix A is shown in Figure 35.

[001113] Figure 35 shows a parallel implementation scheme of a discrete time-invariant system for a band matrix A 3500 according to an example of the instant disclosure.

[001114] A does not have to be a band matrix. Let us now suppose the matrix A is such that it can be split as a sum of N addends denoted by A_1, A_2, ..., A_N:

[001115] A = A_1 + A_2 + \dots + A_N,

[001116] where the product of the matrices A_l and A_m is different from zero, with l and m not consecutive indices:

[001117] A_l\, A_m \neq 0.

[001118] Apart from the matrices A_l and A_m, the product of the matrices not having two consecutive indices is equal to zero.

[001119] Under this hypothesis, the parallel implementation may have the following modification. Let thread_i(k) be the value of the thread i at the instant time k. At the value k = 1, the following are obtained:

[001120] thread_i(1) = A_i\, x(0).

[001121] At the instant k = 2,

[001122] thread_i(2) = A_i\big(thread_{i-1}(1) + thread_i(1) + thread_{i+1}(1)\big), with the additional exchanges thread_l(2) also receiving A_l\, thread_m(1) and thread_m(2) also receiving A_m\, thread_l(1) for the non-consecutive pair (l, m).

[001123] At a generic instant time k+1, the same update is applied with k in place of 1.

[001125] Also in this case the terms

[001126] exchanged among threads

[001127] can be transformed through SVD to diminish the data transfer among cores.
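For illustration, the SVD truncation applied before each transfer may look as follows in Octave (the matrix and the threshold are illustrative assumptions; the same pattern, with the same "treshold" spelling, is used in the Appendix code):

## Minimal sketch: rank truncation via SVD of an inter-thread block before
## transfer; singular values below a threshold are discarded.
X = randn(32,8) * diag([1 1e-1 1e-2 1e-3 1e-9 1e-10 1e-11 1e-12]) * randn(8,8);
treshold = 1e-8;
[u, s, v] = svd(X);
r = sum(abs(diag(s)) > treshold);             # retained rank
Xred = u(:,1:r) * s(1:r,1:r) * v(:,1:r)';     # truncated representation
sent = numel(u(:,1:r)) + r + numel(v(:,1:r)); # numbers actually transferred
disp([r, sent, numel(X)])                     # rank, reduced vs full transfer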

[001128] Figure 36 shows a parallel implementation scheme of a discrete time-invariant system for any matrix A 3600 according to an example of the instant disclosure.

[001129] Figure 37 illustrates an example method 3700 of building a mathematical representation of one of a physical, economic, and engineering problem using at least one differential equation according to an example of the instant disclosure. Although the example method 3700 depicts a particular sequence of operations, the sequence may be altered without departing from the scope of the present disclosure. For example, some of the operations depicted may be performed in parallel or in a different sequence that does not materially affect the function of the method 3700. In other examples, different components of an example device or system that implements the method 3700 may perform functions at substantially the same time or in a specific sequence.

[001130] According to some examples, the method 3700 may include performing at least one tensor network method to numerically solve at least one differential equation at block 3710. The at least one differential equation may be one of a linear differential equation and a non-linear differential equation. Additionally, the at least one differential equation may be one of a linear time variant differential equation and a non-linear time variant differential equation.

[001131] As an example, the tensor network method can be a time-evolving block decimation (TEBD) method.

[001132] Next, according to some examples, the method 3700 may include building a mathematical representation of one of a physical, economic, and engineering problem using the at least one differential equation at block 3720.

[001133] According to some examples, the method 3700 may include determining a graph that defines connections between states of the problem and determining an adjacency matrix at block 3730.

[001134] According to some examples, the method 3700 may include subdividing a matrix into n sets of matrices whose elements commute with each other while at least one element of a set does not commute with at least one element of another set at block 3740.

[001135] According to some examples, the method 3700 may include implementing a Suzuki Trotter decomposition using singular value decomposition to reduce data transfer among cores of at least one processor on the n sets of matrices with a given time interval δ and a predefined p expansion order at block 3750.

[001136] According to some examples, the method 3700 may include evaluating the problem at a time T = Nδ by iteratively performing the Suzuki Trotter decomposition N times at block 3760.

[001137] Next, according to some examples, the method 3700 may include generating simulation results for the problem within an error of order of N·o(δ^{p+1}) at block 3770.

[001138] According to some examples, at least some of the method 3700 may be performed by at least one graphical processing unit (GPU).

[001139] According to some examples, the method 3700 may include numerically solving the at least one differential equation sequentially.

[001140] According to some examples, the method 3700 may include numerically solving the at least one differential equation in parallel.
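As a compact illustration of blocks 3710 to 3770, the following minimal Octave sketch builds a state matrix from a graph, splits the edge terms into two commuting sets and iterates a second order Suzuki Trotter step (the chain graph, sizes, δ and N are illustrative assumptions; the full parallel SVD machinery is given in the Appendix):

## Minimal sketch of method 3700 for a small chain system.
n  = 6;
Adj = diag(ones(n-1,1),1) + diag(ones(n-1,1),-1);   # graph / adjacency (3730)
[bi,bj] = find(triu(Adj));           # edges of the graph
A1 = zeros(n); A2 = zeros(n);
for e = 1:numel(bi)                  # one addend per edge (3740)
  Hb = zeros(n); Hb([bi(e) bj(e)],[bi(e) bj(e)]) = [-1 1; 1 -1];
  if mod(bi(e),2)==1, A1 = A1 + Hb; else, A2 = A2 + Hb; end
end                                  # odd/even edge sets commute for a chain
A = A1 + A2;                         # state matrix of the problem (3720)
delta = 0.01;  N = 1000;             # T = N*delta = 10
S2 = expm(A1*delta/2)*expm(A2*delta)*expm(A1*delta/2);  # p = 2 step (3750)
x = zeros(n,1); x(1) = 1;
for k = 1:N, x = S2*x; end           # evaluate at T = N*delta (3760)
disp(norm(x - expm(A*N*delta)*[1;zeros(n-1,1)]))  # error ~ N*o(delta^3) (3770)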

[001141] Figure 38 shows an example of computing system 3800, which can be for example any computing device making up one or more computing devices or computing units such as a distributed computing system, or any component thereof in which the components of the system are in communication with each other using connection 3805. Connection 3805 can be a physical connection via a bus, or a direct connection into processor 3810, such as in a chipset architecture. Connection 3805 can also be a virtual connection, networked connection, or logical connection.

[001142] In some embodiments, computing system 3800 is a distributed system in which the functions described in this disclosure can be distributed within a datacenter, multiple data centers, a peer network, etc. In some embodiments, one or more of the described system components represents many such components each performing some or all of the function for which the component is described. In some embodiments, the components can be physical or virtual devices. [001143] Example system 3800 includes at least one processing unit (CPU or processor) 3810 and connection 3805 that couples various system components including system memory 3815, such as read-only memory (ROM) 3820 and random access memory (RAM) 3825 to processor 3810. Computing system 3800 can include a cache of high-speed memory 3812 connected directly with, in close proximity to, or integrated as part of processor 3810.

[001144] Processor 3810 can include any general purpose processor and a hardware service or software service, such as services 3832, 3834, and 3836 stored in storage device 3830, configured to control processor 3810 as well as a special-purpose processor where software instructions are incorporated into the actual processor design. Processor 3810 may essentially be a completely self-contained computing system, containing multiple cores or processors, a bus, memory controller, cache, etc. A multi-core processor may be symmetric or asymmetric.

[001145] To enable user interaction, computing system 3800 includes an input device 3845, which can represent any number of input mechanisms, such as a microphone for speech, a touch-sensitive screen for gesture or graphical input, keyboard, mouse, motion input, speech, etc.

[001146] Computing system 3800 can also include output device 3835, which can be one or more of a number of output mechanisms known to those of skill in the art. In some instances, multimodal systems can enable a user to provide multiple types of input/output to communicate with computing system 3800. Computing system 3800 can include communications interface 3840, which can generally govern and manage the user input and system output. There is no restriction on operating on any particular hardware arrangement, and therefore the basic features here may easily be substituted for improved hardware or firmware arrangements as they are developed.

[001147] Storage device 3830 can be a non-volatile memory device and can be a hard disk or other types of computer readable media which can store data that are accessible by a computer, such as magnetic cassettes, flash memory cards, solid state memory devices, digital versatile disks, cartridges, random access memories (RAMs), read-only memory (ROM), and/or some combination of these devices.

[001148] The storage device 3830 can include software services, servers, services, etc., that when the code that defines such software is executed by the processor 3810, it causes the system to perform a function. In some embodiments, a hardware service that performs a particular function can include the software component stored in a computer-readable medium in connection with the necessary hardware components, such as processor 3810, connection 3805, output device 3835, etc., to carry out the function.

[001149] For clarity of explanation, in some instances, the present technology may be presented as including individual functional blocks including functional blocks comprising devices, device components, steps or routines in a method embodied in software, or combinations of hardware and software. [001150] Any of the steps, operations, functions, or processes described herein may be performed or implemented by a combination of hardware and software services, alone or in combination with other devices. In some embodiments, a service can be software that resides in memory of a client device and/or one or more servers of a content management system and performs one or more functions when a processor executes the software associated with the service. In some embodiments, a service is a program or a collection of programs that carry out a specific function. In some embodiments, a service can be considered a server. The memory can be a non-transitory computer-readable medium.

[001151] In some embodiments, the computer-readable storage devices, mediums, and memories can include a cable or wireless signal containing a bit stream and the like. However, when mentioned, non-transitory computer-readable storage media expressly exclude media such as energy, carrier signals, electromagnetic waves, and signals per se.

[001152] Methods according to the above-described examples can be implemented using computer-executable instructions that are stored or otherwise available from computer-readable media. Such instructions can comprise, for example, instructions and data which cause or otherwise configure a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. Portions of computer resources used can be accessible over a network. The executable computer instructions may be, for example, binaries, intermediate format instructions such as assembly language, firmware, or source code. Examples of computer-readable media that may be used to store instructions, information used, and/or information created during methods according to described examples include magnetic or optical disks, solid-state memory devices, flash memory, USB devices provided with non-volatile memory, networked storage devices, and so on.

[001153] Devices implementing methods according to these disclosures can comprise hardware, firmware and/or software, and can take any of a variety of form factors. Typical examples of such form factors include servers, laptops, smartphones, small form factor personal computers, personal digital assistants, and so on. The functionality described herein also can be embodied in peripherals or add-in cards. Such functionality can also be implemented on a circuit board among different chips or different processes executing in a single device, by way of further example. [001154] The instructions, media for conveying such instructions, computing resources for executing them, and other structures for supporting such computing resources are means for providing the functions described in these disclosures.

Appendix

Example of code to find sets of non-commuting matrices

Code has been developed in Octave.

###########################################################################
##
## Example of a program to find sets of non commuting matrices
##
## EFSolutions Gbr Corporation
##
###########################################################################
clear all
close all

## A is the adjacency matrix
## nA System size
nA=10;
A=zeros(nA,nA);

# Maximum number of sets that the program can identify
nterms=6;

## Please set the variable value according to the system to be solved
disp("Enter a value 1,2 or 3")
disp("1 if you want to solve a Monodimensional system such as mass spring model")
disp("2 if you want to solve a Bidimensional system (e.g plate) in which sites are connected through square elements")
disp("3 if you want to solve a Bidimensional system (e.g plate) in which sites are connected through triangle elements")
prompt="value= ";
value = input(prompt);

if value == 1
  # Build Monodimensional system: mass spring model
  disp("SOLVING Monodimensional system")
  for t=1:size(A,1)-1
    A(t,t+1)=1;
    A(t+1,t)=1;
  end
  ## Order sites (Solution may vary according to the order sites)
  values=[1,2,3,4,5,6,7,8,9,10];
elseif value == 2
  disp("SOLVING Bidimensional system: plate with square elements")
  ## PLATE with square elements, site layout:
  ##  1 2 3
  ##  4 5 6
  ##  7 8 9
  ##  10
  A=[0,1,0,1,0,0,0,0,0,0;
     1,0,1,0,1,0,0,0,0,0;
     0,1,0,0,0,1,0,0,0,0;
     1,0,0,0,1,0,1,0,0,0;
     0,1,0,1,0,1,0,1,0,0;
     0,0,1,0,1,0,0,0,1,0;
     0,0,0,1,0,0,0,1,0,1;
     0,0,0,0,1,0,1,0,1,0;
     0,0,0,0,0,1,0,1,0,0;
     0,0,0,0,0,0,1,0,0,0];
  ## Order sites (Solution may vary according to the order sites)
  values=[1,2,3,4,5,6,7,8,9,10];
elseif value == 3
  disp("SOLVING Bidimensional system: plate with triangle elements")
  ## PLATE with triangle elements, site layout:
  ##  1 2 3
  ##  4 5 6
  ##  7 8 9
  ##  10
  A=[0,1,0,1,1,0,0,0,0,0;
     1,0,1,0,1,1,0,0,0,0;
     0,1,0,0,0,1,0,0,0,0;
     1,0,0,0,1,0,1,1,0,0;
     1,1,0,1,0,1,0,1,1,0;
     0,1,1,0,1,0,0,0,1,0;
     0,0,0,1,0,0,0,1,0,1;
     0,0,0,1,1,0,1,0,1,0;
     0,0,0,0,1,1,0,1,0,0;
     0,0,0,0,0,0,1,0,0,0];
  ## Order sites (Solution may vary according to the order sites)
  values=[1,6,8,10,2,4,9,3,5,7];
end

disp("The adjacency matrix is:")
A

allt=1:nA;
## jjc(g,:) lists the sites NOT connected to site g (zeros of row g)
for g=1:nA
  [i,j]=find(A(g,:)==0);
  jjc(g,1:size(j,2))=j;
end

sol_tot=[];
start1=1;
for rr=1:nterms
  sol=[];
  index=1;
  sol(1)=start1;
  g_save=sol(1);
  ## Greedily grow a set of mutually non-connected (commuting) sites
  for kt=1:nA
    t=values(kt);
    sol_temp=sol;
    jj_1=intersect(jjc(start1,:),jjc(t,:));
    sol_temp(index+1)=t;
    jj_2=intersect(jj_1,sol_temp);
    if isempty(jj_2)==0 && size(jj_2,2)==(index+1)
      sol=sol_temp;
      index=index+1;
    end
  end
  sol;
  sol_TOT(rr,1:size(sol,2))=sol;
  sol_tot=union(sol_tot,sol);
  allt2=setxor(sol_tot,allt);
  if isempty(allt2)==0
    start1=allt2(1);
  endif
  ## Remove the sites already assigned to a set
  for t=1:size(sol,2)
    [ii,jj]=find(jjc==sol(t));
    for tr=1:size(ii,1)
      jjc(ii(tr),jj(tr))=0;
    end
  end
  if isempty(allt2)==1
    if sum(sum(abs(jjc))>0)
      disp("PROGRAM HAS FOUND AN ERROR")
      ERROR
    endif
    if sum(sum(sol_TOT))!=55   # 55 = 1+2+...+10: every site must be assigned
      disp("PROGRAM HAS FOUND AN ERROR")
      disp("Some values are missing")
      ERROR
    endif
    disp("PROGRAM SUCCESSFULLY TERMINATED")
    string2disp=["The number of set identified is ",num2str(rr)];
    disp(string2disp)
    disp("The sets are:")
    sol_TOT
    break
  endif
end

Example of code to solve lumped mass spring model with 16 masses

Code has been developed in Octave.

###############################################################################
##
## Example of a program to implement TEBD algorithm to solve
## systems which can be recast in a state space representation
##
## In this code we are simulating a
## lumped mass spring model with 16 masses
##
## Although the following code is sequential
## (it uses only one thread to make easier the understanding)
## the logic of the code is parallel in the sense that
## variables which contain thread1, thread2, thread3, thread4, .. etc ...
## are thought to be allocated to the
## corresponding threads performing a parallel implementation.
##
## EFSolutions Gbr Corporation
##
###############################################################################
clear all
close all

####################################################
############# PARAMETERS SETTING
# n_rows is the number of the rows
# We create exactly 16 matrices
# H1gen, H2gen, H3gen, H4gen, H5gen, H6gen, H7gen, H8gen ... H16gen
# in this way we need 4 threads
n_rows=2;
Nmass=(16*n_rows+2)/2;

## Time evolution
## time = Ndt_step * dt_step
## Simulation time step
disp("Simulation of a 16 lumped mass spring model")
prompt="Please, enter time step (default = .01) ";
dt_step = input(prompt);
if isempty(dt_step)==1
  dt_step=.01;
end
dt_step

## Number of time steps
prompt="Please, enter Number of time steps (default = 1000) ";
Ndt_step = input(prompt);
if isempty(Ndt_step)==1
  Ndt_step=1000;
end
Ndt_step

## SVD treshold
prompt="Please, enter SVD treshold (default = 1e-8) ";
treshold = input(prompt);
if isempty(treshold)==1
  treshold=1e-8;
end
treshold
############# END OF PARAMETERS SETTING
####################################################
kMTot3=zeros(2,2,Nmass);
cMTot3=zeros(2,2,Nmass);
kcMtot=zeros(2,4,Nmass);
kMTot2=zeros(Nmass+1,Nmass+1);
cMTot2=zeros(Nmass+1,Nmass+1);

###############################################################################
##
## We build the state matrix A for 16 masses
##
##  dx
##  -- = A*x
##  dt
##
###############################################################################
# The matrix TEBD_kM_cM is the state space matrix A
for t1=1:Nmass
  kMTot3(1,1,t1)=t1;
  kMTot3(1,2,t1)=-t1;
  kMTot3(2,1,t1)=kMTot3(1,2,t1);
  kMTot3(2,2,t1)=kMTot3(1,1,t1);
  cMTot3(1,1,t1)=t1/10;
  cMTot3(1,2,t1)=-t1/10;
  cMTot3(2,1,t1)=cMTot3(1,2,t1);
  cMTot3(2,2,t1)=cMTot3(1,1,t1);
end
#kcMtot
for t1=1:Nmass
  kMTot2(t1:t1+1,t1:t1+1)=kMTot2(t1:t1+1,t1:t1+1)+kMTot3(:,:,t1);
  cMTot2(t1:t1+1,t1:t1+1)=cMTot2(t1:t1+1,t1:t1+1)+cMTot3(:,:,t1);
end
kMTot2=-kMTot2;
cMTot2=-cMTot2;
for r1=1:Nmass-1
  for t1=1:Nmass-1
    TEBD_kM_cM(2*t1,2*r1-1)=kMTot2(t1+1,r1+1);
    TEBD_kM_cM(2*t1,2*r1)  =cMTot2(t1+1,r1+1);
  end
end
for t1=1:Nmass-1
  TEBD_kM_cM(2*t1-1,2*t1)=1;
end
# The matrix TEBD_kM_cM is the state space matrix A

###########################################################################
H1gen=zeros(n_rows,size(TEBD_kM_cM,1));
H2gen=zeros(n_rows,size(TEBD_kM_cM,1));
H3gen=zeros(n_rows,size(TEBD_kM_cM,1));
H4gen=zeros(n_rows,size(TEBD_kM_cM,1));
H5gen=zeros(n_rows,size(TEBD_kM_cM,1));
H6gen=zeros(n_rows,size(TEBD_kM_cM,1));
H7gen=zeros(n_rows,size(TEBD_kM_cM,1));
H8gen=zeros(n_rows,size(TEBD_kM_cM,1));
H9gen=zeros(n_rows,size(TEBD_kM_cM,1));
H10gen=zeros(n_rows,size(TEBD_kM_cM,1));
H11gen=zeros(n_rows,size(TEBD_kM_cM,1));
H12gen=zeros(n_rows,size(TEBD_kM_cM,1));
H13gen=zeros(n_rows,size(TEBD_kM_cM,1));
H14gen=zeros(n_rows,size(TEBD_kM_cM,1));
H15gen=zeros(n_rows,size(TEBD_kM_cM,1));
H16gen=zeros(n_rows,size(TEBD_kM_cM,1));

## Split the state matrix A into 16 row blocks, one per Hgen matrix
H1gen=TEBD_kM_cM(1:n_rows,:);
H2gen=TEBD_kM_cM(1*n_rows+1:2*n_rows,:);
H3gen=TEBD_kM_cM(2*n_rows+1:3*n_rows,:);
H4gen=TEBD_kM_cM(3*n_rows+1:4*n_rows,:);
H5gen=TEBD_kM_cM(4*n_rows+1:5*n_rows,:);
H6gen=TEBD_kM_cM(5*n_rows+1:6*n_rows,:);
H7gen=TEBD_kM_cM(6*n_rows+1:7*n_rows,:);
H8gen=TEBD_kM_cM(7*n_rows+1:8*n_rows,:);
H9gen=TEBD_kM_cM(8*n_rows+1:9*n_rows,:);
H10gen=TEBD_kM_cM(9*n_rows+1:10*n_rows,:);
H11gen=TEBD_kM_cM(10*n_rows+1:11*n_rows,:);
H12gen=TEBD_kM_cM(11*n_rows+1:12*n_rows,:);
H13gen=TEBD_kM_cM(12*n_rows+1:13*n_rows,:);
H14gen=TEBD_kM_cM(13*n_rows+1:14*n_rows,:);
H15gen=TEBD_kM_cM(14*n_rows+1:15*n_rows,:);
H16gen=TEBD_kM_cM(15*n_rows+1:16*n_rows,:);

## Error variable definition and initialization
error_X_t_Taylor=zeros(1,Ndt_step);
error_X_t_Sx=zeros(1,Ndt_step);
error_X_t_S4x=zeros(1,Ndt_step);
error_X_t_Sx_parallel=zeros(1,Ndt_step);
error_X_t_S4x_parallel=zeros(1,Ndt_step);
error_X_t_Sx_gradino=zeros(1,Ndt_step);
nL1_th1_save=zeros(1,Ndt_step);
nL1_th2_save=zeros(1,Ndt_step);
nL1_th3_save=zeros(1,Ndt_step);
nL1_th4_save=zeros(1,Ndt_step);
nL2_th1_save=zeros(1,Ndt_step);
nL2_th2_save=zeros(1,Ndt_step);
nL2_th3_save=zeros(1,Ndt_step);
nL2_th4_save=zeros(1,Ndt_step);
nL3_th1_save=zeros(1,Ndt_step);
nL3_th2_save=zeros(1,Ndt_step);
nL3_th3_save=zeros(1,Ndt_step);
nL3_th4_save=zeros(1,Ndt_step);
nL4_th1_save=zeros(1,Ndt_step);
nL4_th2_save=zeros(1,Ndt_step);
nL4_th3_save=zeros(1,Ndt_step);
nL4_th4_save=zeros(1,Ndt_step);

Htot=TEBD_kM_cM;
kk1=size(Htot,1);
###############################################################

## H1t, H3t, H5t ... odd indexes commute with each other
## H2t, H4t, H6t ... even indexes commute with each other
H1t=zeros(kk1,kk1);
H2t=zeros(kk1,kk1);
H3t=zeros(kk1,kk1);
H4t=zeros(kk1,kk1);
H5t=zeros(kk1,kk1);
H6t=zeros(kk1,kk1);
H7t=zeros(kk1,kk1);
H8t=zeros(kk1,kk1);
H9t=zeros(kk1,kk1);
H10t=zeros(kk1,kk1);
H11t=zeros(kk1,kk1);
H12t=zeros(kk1,kk1);
H13t=zeros(kk1,kk1);
H14t=zeros(kk1,kk1);
H15t=zeros(kk1,kk1);
H16t=zeros(kk1,kk1);

H1t(1:n_rows,:)=H1gen;
H2t(1*n_rows+1:2*n_rows,:)=H2gen;
H3t(2*n_rows+1:3*n_rows,:)=H3gen;
H4t(3*n_rows+1:4*n_rows,:)=H4gen;
H5t(4*n_rows+1:5*n_rows,:)=H5gen;
H6t(5*n_rows+1:6*n_rows,:)=H6gen;
H7t(6*n_rows+1:7*n_rows,:)=H7gen;
H8t(7*n_rows+1:8*n_rows,:)=H8gen;
H9t(8*n_rows+1:9*n_rows,:)=H9gen;
H10t(9*n_rows+1:10*n_rows,:)=H10gen;
H11t(10*n_rows+1:11*n_rows,:)=H11gen;
H12t(11*n_rows+1:12*n_rows,:)=H12gen;
H13t(12*n_rows+1:13*n_rows,:)=H13gen;
H14t(13*n_rows+1:14*n_rows,:)=H14gen;
H15t(14*n_rows+1:15*n_rows,:)=H15gen;
H16t(15*n_rows+1:16*n_rows,:)=H16gen;

H1t=H1t*dt_step;
H2t=H2t*dt_step;
H3t=H3t*dt_step;
H4t=H4t*dt_step;
H5t=H5t*dt_step;
H6t=H6t*dt_step;
H7t=H7t*dt_step;
H8t=H8t*dt_step;
H9t=H9t*dt_step;
H10t=H10t*dt_step;
H11t=H11t*dt_step;
H12t=H12t*dt_step;
H13t=H13t*dt_step;
H14t=H14t*dt_step;
H15t=H15t*dt_step;
H16t=H16t*dt_step;

### Setting variables
NMontecarlo=2*Nmass-2;
X_t=zeros(2*Nmass-2,NMontecarlo,Ndt_step);
X_t_gradino=zeros(2*Nmass-2,NMontecarlo,Ndt_step);
X_t_Taylor=zeros(2*Nmass-2,NMontecarlo,Ndt_step);
X_t_Sx=zeros(2*Nmass-2,NMontecarlo,Ndt_step);
X_t_Sx_gradino=zeros(2*Nmass-2,NMontecarlo,Ndt_step);
X_t_Sx_solo_gradino=zeros(2*Nmass-2,NMontecarlo,Ndt_step);
X_t_Sx_parallel=zeros(2*Nmass-2,NMontecarlo,Ndt_step);

# Set initial conditions
X_0=zeros(2*Nmass-2,NMontecarlo,1);
for tt=1:NMontecarlo
  X_0(tt,tt,1)=1;
end
indexn=1;
X_t(:,:,indexn)=X_0;
X_t_gradino(:,:,indexn)=X_0;
X_t_Taylor(:,:,indexn)=X_0;
X_t_Sx(:,:,indexn)=X_0;
X_t_S4x(:,:,indexn)=X_0;
X_t_Sx_parallel(:,:,indexn)=X_0;
X_t_Sx_SVD_parallel(:,:,indexn)=X_0;
X_t_S4x_parallel(:,:,indexn)=X_0;

###############################################################################
##
## Preparing data which can be computed offline
## for Suzuki-Trotter 4th order Approximant
## Equations 41 - 42
## https://arxiv.org/pdf/math-ph/0506007.pdf
#############################################################
### Coefficients definition
s=1/(2-2^(1/3));
s2=(1-s)/2;
s3=1-2*s;
######################################################
######################################################
Sx_level1=expm((s/2)*H1t)*expm((s/2)*H3t)*expm((s/2)*H5t)*expm((s/2)*H7t)* ...
          expm((s/2)*H9t)*expm((s/2)*H11t)*expm((s/2)*H13t)*expm((s/2)*H15t);
Sx_level2=expm(s*H2t)*expm(s*H4t)*expm(s*H6t)*expm(s*H8t)* ...
          expm(s*H10t)*expm(s*H12t)*expm(s*H14t)*expm(s*H16t)*Sx_level1;
Sx_level3=expm(s2*H1t)*expm(s2*H3t)*expm(s2*H5t)*expm(s2*H7t)* ...
          expm(s2*H9t)*expm(s2*H11t)*expm(s2*H13t)*expm(s2*H15t)*Sx_level2;
Sx_level4=expm(s3*H2t)*expm(s3*H4t)*expm(s3*H6t)*expm(s3*H8t)* ...
          expm(s3*H10t)*expm(s3*H12t)*expm(s3*H14t)*expm(s3*H16t)*Sx_level3;
Sx_level5=expm(s2*H1t)*expm(s2*H3t)*expm(s2*H5t)*expm(s2*H7t)* ...
          expm(s2*H9t)*expm(s2*H11t)*expm(s2*H13t)*expm(s2*H15t)*Sx_level4;
Sx_level6=expm(s*H2t)*expm(s*H4t)*expm(s*H6t)*expm(s*H8t)* ...
          expm(s*H10t)*expm(s*H12t)*expm(s*H14t)*expm(s*H16t)*Sx_level5;
Sx_level7=expm((s/2)*H1t)*expm((s/2)*H3t)*expm((s/2)*H5t)*expm((s/2)*H7t)* ...
          expm((s/2)*H9t)*expm((s/2)*H11t)*expm((s/2)*H13t)*expm((s/2)*H15t)*Sx_level6;
Sx=Sx_level7;

## Calculating matrices for Suzuki Trotter for Level 1
Sx_level1_thread1=expm((s/2)*H1t)*expm((s/2)*H3t);
Sx_level1_thread2=expm((s/2)*H5t)*expm((s/2)*H7t);
Sx_level1_thread3=expm((s/2)*H9t)*expm((s/2)*H11t);
Sx_level1_thread4=expm((s/2)*H13t)*expm((s/2)*H15t);
## Calculating matrices for Suzuki Trotter for Level 2
Sx_level2_thread1=expm(s*H2t)*expm(s*H4t);
Sx_level2_thread2=expm(s*H6t)*expm(s*H8t);
Sx_level2_thread3=expm(s*H10t)*expm(s*H12t);
Sx_level2_thread4=expm(s*H14t)*expm(s*H16t);
## Calculating matrices for Suzuki Trotter for Level 3
Sx_level3_thread1=expm(s2*H1t)*expm(s2*H3t);
Sx_level3_thread2=expm(s2*H5t)*expm(s2*H7t);
Sx_level3_thread3=expm(s2*H9t)*expm(s2*H11t);
Sx_level3_thread4=expm(s2*H13t)*expm(s2*H15t);
## Calculating matrices for Suzuki Trotter for Level 4
Sx_level4_thread1=expm(s3*H2t)*expm(s3*H4t);
Sx_level4_thread2=expm(s3*H6t)*expm(s3*H8t);
Sx_level4_thread3=expm(s3*H10t)*expm(s3*H12t);
Sx_level4_thread4=expm(s3*H14t)*expm(s3*H16t);
## Calculating matrices for Suzuki Trotter for Level 5
Sx_level5_thread1=expm(s2*H1t)*expm(s2*H3t);
Sx_level5_thread2=expm(s2*H5t)*expm(s2*H7t);
Sx_level5_thread3=expm(s2*H9t)*expm(s2*H11t);
Sx_level5_thread4=expm(s2*H13t)*expm(s2*H15t);
## Calculating matrices for Suzuki Trotter for Level 6
Sx_level6_thread1=expm(s*H2t)*expm(s*H4t);
Sx_level6_thread2=expm(s*H6t)*expm(s*H8t);
Sx_level6_thread3=expm(s*H10t)*expm(s*H12t);
Sx_level6_thread4=expm(s*H14t)*expm(s*H16t);
## Calculating matrices for Suzuki Trotter for Level 7
Sx_level7_thread1=expm((s/2)*H1t)*expm((s/2)*H3t);
Sx_level7_thread2=expm((s/2)*H5t)*expm((s/2)*H7t);
Sx_level7_thread3=expm((s/2)*H9t)*expm((s/2)*H11t);
Sx_level7_thread4=expm((s/2)*H13t)*expm((s/2)*H15t);

## Computation of matrix A inverse for step response
invA=inv(Htot);
# Setting matrix B
B=X_0;
Sx_2=eye(size(Htot));
save_neig=zeros(1,Ndt_step);

###############################################################################
##
## Time evolution
## Start For Cycle
##
###############################################################################
tic
for timevalue=dt_step:dt_step:Ndt_step*dt_step
timevalue
indexn=indexn+1

###############################################################################
## Computing the Free evolution EXACT SOLUTION
## It will be used for comparison with respect to the
## Suzuki Trotter expansion
expHtot=expm(Htot*timevalue);
#X_0=[1,0,0,0,0,0,0,0];
X_t(:,:,indexn)=expHtot*X_0;
## Computing the Step Response EXACT SOLUTION
X_t_gradino(:,:,indexn)=expHtot*(X_0-invA*B)-invA*B;

###############################################################################
## Taylor expansion (only for verification)
## Computation of the matrix exponential using a fourth order
## Taylor expansion
## It will be used for comparison, since the Suzuki Trotter provides
## an approximated solution of the 4th order
HtotTay=Htot*dt_step;
taylor1=eye(size(Htot))+HtotTay;
taylor2=taylor1+(1/2)*HtotTay*HtotTay;
taylor3=taylor2+(1/6)*HtotTay*HtotTay*HtotTay;
taylor4=taylor3+(1/24)*HtotTay*HtotTay*HtotTay*HtotTay;
#taylor5=taylor4+(1/120)*HtotTay*HtotTay*HtotTay*HtotTay*HtotTay;
#taylor6=taylor5+(1/(120*6))*HtotTay*HtotTay*HtotTay*HtotTay*HtotTay*HtotTay;
Taylor=taylor4;
X_t_Taylor(:,:,indexn)=Taylor*X_t_Taylor(:,:,indexn-1);
## error_X_t_Taylor contains the error of the Taylor expansion
## with respect to the exact solution
error_temp_Taylor=max(max(abs(X_t(:,:,indexn)-X_t_Taylor(:,:,indexn))))
error_X_t_Taylor(indexn)=error_temp_Taylor;
###############################################################################
## End Taylor expansion
###############################################################################

###############################################################################
###############################################################################
## Starting Parallel implementation
## Suzuki-Trotter decomposition 4th order Approximant
## Equation 42
## https://arxiv.org/pdf/math-ph/0506007.pdf

###### Step 1 ########################
X_t_Sx_SVD_level1_thread1=Sx_level1_thread1*X_t_Sx_parallel(:,:,indexn-1);
X_t_Sx_SVD_level1_thread2=Sx_level1_thread2*X_t_Sx_parallel(:,:,indexn-1);
X_t_Sx_SVD_level1_thread3=Sx_level1_thread3*X_t_Sx_parallel(:,:,indexn-1);
X_t_Sx_SVD_level1_thread4=Sx_level1_thread4*X_t_Sx_parallel(:,:,indexn-1);

###### Step 2 ########################
## Indices to reduce matrices sizes
## to transfer only a part of them among threads
## (each thread owns 8 rows; only the boundary rows are exchanged)
in_th1=9;    fi_th1=10;
in_th2_1=7;  fi_th2_1=8;
in_th2=17;   fi_th2=18;
in_th3_1=15; fi_th3_1=16;
in_th3=25;   fi_th3=26;
in_th4=23;   fi_th4=24;
######################################################
## Singular Value Decomposition
[u_th1,s_th1,v_th1]=svd(X_t_Sx_SVD_level1_thread1);
[ii_th1,jj_th1]=find(abs(diag(s_th1))>treshold);
nL1_th1=size(ii_th1,1);
X_t_Sx_SVD_level1_thread1_Red=u_th1(:,1:nL1_th1)*s_th1(1:nL1_th1,1:nL1_th1)*v_th1(:,1:nL1_th1)';
transfer_L1_th1=size(u_th1(:,1:nL1_th1),1)*nL1_th1+nL1_th1+size(v_th1(:,1:nL1_th1),1)*nL1_th1;
#data2tr_th1=size(u_th1(:,1:nL1_th1),1)*size(u_th1(:,1:nL1_th1),2)+nL1_th1+...
#size(v_th1(:,1:nL1_th1),1)*size(v_th1(:,1:nL1_th1),2)
[u_th2,s_th2,v_th2]=svd(X_t_Sx_SVD_level1_thread2);
[ii_th2,jj_th2]=find(abs(diag(s_th2))>treshold);
nL1_th2=size(ii_th2,1);
X_t_Sx_SVD_level1_thread2_Red=u_th2(:,1:nL1_th2)*s_th2(1:nL1_th2,1:nL1_th2)*v_th2(:,1:nL1_th2)';
transfer_L1_th2=size(u_th2(:,1:nL1_th2),1)*nL1_th2+nL1_th2+size(v_th2(:,1:nL1_th2),1)*nL1_th2;
[u_th3,s_th3,v_th3]=svd(X_t_Sx_SVD_level1_thread3);
[ii_th3,jj_th3]=find(abs(diag(s_th3))>treshold);
nL1_th3=size(ii_th3,1);
X_t_Sx_SVD_level1_thread3_Red=u_th3(:,1:nL1_th3)*s_th3(1:nL1_th3,1:nL1_th3)*v_th3(:,1:nL1_th3)';
transfer_L1_th3=size(u_th3(:,1:nL1_th3),1)*nL1_th3+nL1_th3+size(v_th3(:,1:nL1_th3),1)*nL1_th3;
[u_th4,s_th4,v_th4]=svd(X_t_Sx_SVD_level1_thread4);
[ii_th4,jj_th4]=find(abs(diag(s_th4))>treshold);
nL1_th4=size(ii_th4,1);
X_t_Sx_SVD_level1_thread4_Red=u_th4(:,1:nL1_th4)*s_th4(1:nL1_th4,1:nL1_th4)*v_th4(:,1:nL1_th4)';
transfer_L1_th4=size(u_th4(:,1:nL1_th4),1)*nL1_th4+nL1_th4+size(v_th4(:,1:nL1_th4),1)*nL1_th4;
nL1_th1_save(indexn)=nL1_th1;
nL1_th2_save(indexn)=nL1_th2;
nL1_th3_save(indexn)=nL1_th3;
nL1_th4_save(indexn)=nL1_th4;
#save_neig(indexn)=neig;
#matRed=ueig(:,1:neig)*seig(1:neig,1:neig)*veig(:,1:neig)';
## Exchange of the reduced boundary rows among neighbouring threads
X_t_Sx_SVD_level1_thread1_input=X_t_Sx_SVD_level1_thread1_Red;
X_t_Sx_SVD_level1_thread1_input(in_th1:fi_th1,:)=X_t_Sx_SVD_level1_thread2_Red(in_th1:fi_th1,:);
X_t_Sx_SVD_level1_thread2_input=X_t_Sx_SVD_level1_thread2_Red;
#X_t_Sx_SVD_level1_thread2_input(in_th2_1:fi_th2_1,:)=X_t_Sx_SVD_level1_thread1_Red(in_th2_1:fi_th2_1,:);
X_t_Sx_SVD_level1_thread2_input(in_th2:fi_th2,:)=X_t_Sx_SVD_level1_thread3_Red(in_th2:fi_th2,:);
X_t_Sx_SVD_level1_thread3_input=X_t_Sx_SVD_level1_thread3_Red;
#X_t_Sx_SVD_level1_thread3_input(in_th3_1:fi_th3_1,:)=X_t_Sx_SVD_level1_thread2_Red(in_th3_1:fi_th3_1,:);
X_t_Sx_SVD_level1_thread3_input(in_th3:fi_th3,:)=X_t_Sx_SVD_level1_thread4_Red(in_th3:fi_th3,:);
X_t_Sx_SVD_level1_thread4_input=X_t_Sx_SVD_level1_thread4_Red;
#X_t_Sx_SVD_level1_thread4_input(in_th4:fi_th4,:)=X_t_Sx_SVD_level1_thread3_Red(in_th4:fi_th4,:);
X_t_Sx_SVD_level2_thread1=Sx_level2_thread1*X_t_Sx_SVD_level1_thread1_input;
X_t_Sx_SVD_level2_thread2=Sx_level2_thread2*X_t_Sx_SVD_level1_thread2_input;
X_t_Sx_SVD_level2_thread3=Sx_level2_thread3*X_t_Sx_SVD_level1_thread3_input;
X_t_Sx_SVD_level2_thread4=Sx_level2_thread4*X_t_Sx_SVD_level1_thread4_input;

###### Step 3 ########################
[u_th1,s_th1,v_th1]=svd(X_t_Sx_SVD_level2_thread1);
[ii_th1,jj_th1]=find(abs(diag(s_th1))>treshold);
nL2_th1=size(ii_th1,1);
X_t_Sx_SVD_level2_thread1_Red=u_th1(:,1:nL2_th1)*s_th1(1:nL2_th1,1:nL2_th1)*v_th1(:,1:nL2_th1)';
transfer_L2_th1=size(u_th1(:,1:nL2_th1),1)*nL2_th1+nL2_th1+size(v_th1(:,1:nL2_th1),1)*nL2_th1;
[u_th2,s_th2,v_th2]=svd(X_t_Sx_SVD_level2_thread2);
[ii_th2,jj_th2]=find(abs(diag(s_th2))>treshold);
nL2_th2=size(ii_th2,1);
X_t_Sx_SVD_level2_thread2_Red=u_th2(:,1:nL2_th2)*s_th2(1:nL2_th2,1:nL2_th2)*v_th2(:,1:nL2_th2)';
transfer_L2_th2=size(u_th2(:,1:nL2_th2),1)*nL2_th2+nL2_th2+size(v_th2(:,1:nL2_th2),1)*nL2_th2;
[u_th3,s_th3,v_th3]=svd(X_t_Sx_SVD_level2_thread3);
[ii_th3,jj_th3]=find(abs(diag(s_th3))>treshold);
nL2_th3=size(ii_th3,1);
X_t_Sx_SVD_level2_thread3_Red=u_th3(:,1:nL2_th3)*s_th3(1:nL2_th3,1:nL2_th3)*v_th3(:,1:nL2_th3)';
transfer_L2_th3=size(u_th3(:,1:nL2_th3),1)*nL2_th3+nL2_th3+size(v_th3(:,1:nL2_th3),1)*nL2_th3;
[u_th4,s_th4,v_th4]=svd(X_t_Sx_SVD_level2_thread4);
[ii_th4,jj_th4]=find(abs(diag(s_th4))>treshold);
nL2_th4=size(ii_th4,1);
X_t_Sx_SVD_level2_thread4_Red=u_th4(:,1:nL2_th4)*s_th4(1:nL2_th4,1:nL2_th4)*v_th4(:,1:nL2_th4)';
transfer_L2_th4=size(u_th4(:,1:nL2_th4),1)*nL2_th4+nL2_th4+size(v_th4(:,1:nL2_th4),1)*nL2_th4;
nL2_th1_save(indexn)=nL2_th1;
nL2_th2_save(indexn)=nL2_th2;
nL2_th3_save(indexn)=nL2_th3;
nL2_th4_save(indexn)=nL2_th4;
X_t_Sx_SVD_level2_thread1_input=X_t_Sx_SVD_level2_thread1_Red;
#X_t_Sx_SVD_level2_thread1_input(in_th1:fi_th1,:)=X_t_Sx_SVD_level2_thread2_Red(in_th1:fi_th1,:);
X_t_Sx_SVD_level2_thread2_input=X_t_Sx_SVD_level2_thread2_Red;
X_t_Sx_SVD_level2_thread2_input(in_th2_1:fi_th2_1,:)=X_t_Sx_SVD_level2_thread1_Red(in_th2_1:fi_th2_1,:);
#X_t_Sx_SVD_level2_thread2_input(in_th2:fi_th2,:)=X_t_Sx_SVD_level2_thread3_Red(in_th2:fi_th2,:);
X_t_Sx_SVD_level2_thread3_input=X_t_Sx_SVD_level2_thread3_Red;
X_t_Sx_SVD_level2_thread3_input(in_th3_1:fi_th3_1,:)=X_t_Sx_SVD_level2_thread2_Red(in_th3_1:fi_th3_1,:);
#X_t_Sx_SVD_level2_thread3_input(in_th3:fi_th3,:)=X_t_Sx_SVD_level2_thread4_Red(in_th3:fi_th3,:);
X_t_Sx_SVD_level2_thread4_input=X_t_Sx_SVD_level2_thread4_Red;
X_t_Sx_SVD_level2_thread4_input(in_th4:fi_th4,:)=X_t_Sx_SVD_level2_thread3_Red(in_th4:fi_th4,:);
X_t_Sx_SVD_level3_thread1=Sx_level3_thread1*X_t_Sx_SVD_level2_thread1_input;
X_t_Sx_SVD_level3_thread2=Sx_level3_thread2*X_t_Sx_SVD_level2_thread2_input;
X_t_Sx_SVD_level3_thread3=Sx_level3_thread3*X_t_Sx_SVD_level2_thread3_input;
X_t_Sx_SVD_level3_thread4=Sx_level3_thread4*X_t_Sx_SVD_level2_thread4_input;

######## Step 4 ########################
[u_th1,s_th1,v_th1]=svd(X_t_Sx_SVD_level3_thread1);
[ii_th1,jj_th1]=find(abs(diag(s_th1))>treshold);
nL3_th1=size(ii_th1,1);
X_t_Sx_SVD_level3_thread1_Red=u_th1(:,1:nL3_th1)*s_th1(1:nL3_th1,1:nL3_th1)*v_th1(:,1:nL3_th1)';
transfer_L3_th1=size(u_th1(:,1:nL3_th1),1)*nL3_th1+nL3_th1+size(v_th1(:,1:nL3_th1),1)*nL3_th1;
[u_th2,s_th2,v_th2]=svd(X_t_Sx_SVD_level3_thread2);
[ii_th2,jj_th2]=find(abs(diag(s_th2))>treshold);
nL3_th2=size(ii_th2,1);
X_t_Sx_SVD_level3_thread2_Red=u_th2(:,1:nL3_th2)*s_th2(1:nL3_th2,1:nL3_th2)*v_th2(:,1:nL3_th2)';
transfer_L3_th2=size(u_th2(:,1:nL3_th2),1)*nL3_th2+nL3_th2+size(v_th2(:,1:nL3_th2),1)*nL3_th2;
[u_th3,s_th3,v_th3]=svd(X_t_Sx_SVD_level3_thread3);
[ii_th3,jj_th3]=find(abs(diag(s_th3))>treshold);
nL3_th3=size(ii_th3,1);
X_t_Sx_SVD_level3_thread3_Red=u_th3(:,1:nL3_th3)*s_th3(1:nL3_th3,1:nL3_th3)*v_th3(:,1:nL3_th3)';
transfer_L3_th3=size(u_th3(:,1:nL3_th3),1)*nL3_th3+nL3_th3+size(v_th3(:,1:nL3_th3),1)*nL3_th3;
[u_th4,s_th4,v_th4]=svd(X_t_Sx_SVD_level3_thread4);
[ii_th4,jj_th4]=find(abs(diag(s_th4))>treshold);
nL3_th4=size(ii_th4,1);
X_t_Sx_SVD_level3_thread4_Red=u_th4(:,1:nL3_th4)*s_th4(1:nL3_th4,1:nL3_th4)*v_th4(:,1:nL3_th4)';
transfer_L3_th4=size(u_th4(:,1:nL3_th4),1)*nL3_th4+nL3_th4+size(v_th4(:,1:nL3_th4),1)*nL3_th4;
nL3_th1_save(indexn)=nL3_th1;
nL3_th2_save(indexn)=nL3_th2;
nL3_th3_save(indexn)=nL3_th3;
nL3_th4_save(indexn)=nL3_th4;
X_t_Sx_SVD_level3_thread1_input=X_t_Sx_SVD_level3_thread1_Red;
X_t_Sx_SVD_level3_thread1_input(in_th1:fi_th1,:)=X_t_Sx_SVD_level3_thread2_Red(in_th1:fi_th1,:);
X_t_Sx_SVD_level3_thread2_input=X_t_Sx_SVD_level3_thread2_Red;
#X_t_Sx_SVD_level3_thread2_input(in_th2_1:fi_th2_1,:)=X_t_Sx_SVD_level3_thread1_Red(in_th2_1:fi_th2_1,:);
X_t_Sx_SVD_level3_thread2_input(in_th2:fi_th2,:)=X_t_Sx_SVD_level3_thread3_Red(in_th2:fi_th2,:);
X_t_Sx_SVD_level3_thread3_input=X_t_Sx_SVD_level3_thread3_Red;
#X_t_Sx_SVD_level3_thread3_input(in_th3_1:fi_th3_1,:)=X_t_Sx_SVD_level3_thread2_Red(in_th3_1:fi_th3_1,:);
X_t_Sx_SVD_level3_thread3_input(in_th3:fi_th3,:)=X_t_Sx_SVD_level3_thread4_Red(in_th3:fi_th3,:);
X_t_Sx_SVD_level3_thread4_input=X_t_Sx_SVD_level3_thread4_Red;
#X_t_Sx_SVD_level3_thread4_input(in_th4:fi_th4,:)=X_t_Sx_SVD_level3_thread3_Red(in_th4:fi_th4,:);
X_t_Sx_SVD_level4_thread1=Sx_level4_thread1*X_t_Sx_SVD_level3_thread1_input;
X_t_Sx_SVD_level4_thread2=Sx_level4_thread2*X_t_Sx_SVD_level3_thread2_input;
X_t_Sx_SVD_level4_thread3=Sx_level4_thread3*X_t_Sx_SVD_level3_thread3_input;
X_t_Sx_SVD_level4_thread4=Sx_level4_thread4*X_t_Sx_SVD_level3_thread4_input;

######## Step 5 ########################
[u_th1,s_th1,v_th1]=svd(X_t_Sx_SVD_level4_thread1);
[ii_th1,jj_th1]=find(abs(diag(s_th1))>treshold);
nL4_th1=size(ii_th1,1);
X_t_Sx_SVD_level4_thread1_Red=u_th1(:,1:nL4_th1)*s_th1(1:nL4_th1,1:nL4_th1)*v_th1(:,1:nL4_th1)';
transfer_L4_th1=size(u_th1(:,1:nL4_th1),1)*nL4_th1+nL4_th1+size(v_th1(:,1:nL4_th1),1)*nL4_th1;
[u_th2,s_th2,v_th2]=svd(X_t_Sx_SVD_level4_thread2);
[ii_th2,jj_th2]=find(abs(diag(s_th2))>treshold);
nL4_th2=size(ii_th2,1);
X_t_Sx_SVD_level4_thread2_Red=u_th2(:,1:nL4_th2)*s_th2(1:nL4_th2,1:nL4_th2)*v_th2(:,1:nL4_th2)';
transfer_L4_th2=size(u_th2(:,1:nL4_th2),1)*nL4_th2+nL4_th2+size(v_th2(:,1:nL4_th2),1)*nL4_th2;
[u_th3,s_th3,v_th3]=svd(X_t_Sx_SVD_level4_thread3);
[ii_th3,jj_th3]=find(abs(diag(s_th3))>treshold);
nL4_th3=size(ii_th3,1);
X_t_Sx_SVD_level4_thread3_Red=u_th3(:,1:nL4_th3)*s_th3(1:nL4_th3,1:nL4_th3)*v_th3(:,1:nL4_th3)';
transfer_L4_th3=size(u_th3(:,1:nL4_th3),1)*nL4_th3+nL4_th3+size(v_th3(:,1:nL4_th3),1)*nL4_th3;
[u_th4,s_th4,v_th4]=svd(X_t_Sx_SVD_level4_thread4);
[ii_th4,jj_th4]=find(abs(diag(s_th4))>treshold);
nL4_th4=size(ii_th4,1);
X_t_Sx_SVD_level4_thread4_Red=u_th4(:,1:nL4_th4)*s_th4(1:nL4_th4,1:nL4_th4)*v_th4(:,1:nL4_th4)';
transfer_L4_th4=size(u_th4(:,1:nL4_th4),1)*nL4_th4+nL4_th4+size(v_th4(:,1:nL4_th4),1)*nL4_th4;
nL4_th1_save(indexn)=nL4_th1;
nL4_th2_save(indexn)=nL4_th2;
nL4_th3_save(indexn)=nL4_th3;
nL4_th4_save(indexn)=nL4_th4;
X_t_Sx_SVD_level4_thread1_input=X_t_Sx_SVD_level4_thread1_Red;
#X_t_Sx_SVD_level4_thread1_input(in_th1:fi_th1,:)=X_t_Sx_SVD_level4_thread2_Red(in_th1:fi_th1,:);
X_t_Sx_SVD_level4_thread2_input=X_t_Sx_SVD_level4_thread2_Red;
X_t_Sx_SVD_level4_thread2_input(in_th2_1:fi_th2_1,:)=X_t_Sx_SVD_level4_thread1_Red(in_th2_1:fi_th2_1,:);
#X_t_Sx_SVD_level4_thread2_input(in_th2:fi_th2,:)=X_t_Sx_SVD_level4_thread3_Red(in_th2:fi_th2,:);
X_t_Sx_SVD_level4_thread3_input=X_t_Sx_SVD_level4_thread3_Red;
X_t_Sx_SVD_level4_thread3_input(in_th3_1:fi_th3_1,:)=X_t_Sx_SVD_level4_thread2_Red(in_th3_1:fi_th3_1,:);
#X_t_Sx_SVD_level4_thread3_input(in_th3:fi_th3,:)=X_t_Sx_SVD_level4_thread4_Red(in_th3:fi_th3,:);
X_t_Sx_SVD_level4_thread4_input=X_t_Sx_SVD_level4_thread4_Red;
X_t_Sx_SVD_level4_thread4_input(in_th4:fi_th4,:)=X_t_Sx_SVD_level4_thread3_Red(in_th4:fi_th4,:);
X_t_Sx_SVD_level5_thread1=Sx_level5_thread1*X_t_Sx_SVD_level4_thread1_input;
X_t_Sx_SVD_level5_thread2=Sx_level5_thread2*X_t_Sx_SVD_level4_thread2_input;
X_t_Sx_SVD_level5_thread3=Sx_level5_thread3*X_t_Sx_SVD_level4_thread3_input;
X_t_Sx_SVD_level5_thread4=Sx_level5_thread4*X_t_Sx_SVD_level4_thread4_input;

######## Step 6 ########################
[u_th1,s_th1,v_th1]=svd(X_t_Sx_SVD_level5_thread1);
[ii_th1,jj_th1]=find(abs(diag(s_th1))>treshold);
nL5_th1=size(ii_th1,1);
X_t_Sx_SVD_level5_thread1_Red=u_th1(:,1:nL5_th1)*s_th1(1:nL5_th1,1:nL5_th1)*v_th1(:,1:nL5_th1)';
transfer_L5_th1=size(u_th1(:,1:nL5_th1),1)*nL5_th1+nL5_th1+size(v_th1(:,1:nL5_th1),1)*nL5_th1;
[u_th2,s_th2,v_th2]=svd(X_t_Sx_SVD_level5_thread2);
[ii_th2,jj_th2]=find(abs(diag(s_th2))>treshold);
nL5_th2=size(ii_th2,1);
X_t_Sx_SVD_level5_thread2_Red=u_th2(:,1:nL5_th2)*s_th2(1:nL5_th2,1:nL5_th2)*v_th2(:,1:nL5_th2)';
transfer_L5_th2=size(u_th2(:,1:nL5_th2),1)*nL5_th2+nL5_th2+size(v_th2(:,1:nL5_th2),1)*nL5_th2;
[u_th3,s_th3,v_th3]=svd(X_t_Sx_SVD_level5_thread3);
[ii_th3,jj_th3]=find(abs(diag(s_th3))>treshold);
nL5_th3=size(ii_th3,1);
X_t_Sx_SVD_level5_thread3_Red=u_th3(:,1:nL5_th3)*s_th3(1:nL5_th3,1:nL5_th3)*v_th3(:,1:nL5_th3)';
transfer_L5_th3=size(u_th3(:,1:nL5_th3),1)*nL5_th3+nL5_th3+size(v_th3(:,1:nL5_th3),1)*nL5_th3;
[u_th4,s_th4,v_th4]=svd(X_t_Sx_SVD_level5_thread4);
[ii_th4,jj_th4]=find(abs(diag(s_th4))>treshold);
nL5_th4=size(ii_th4,1);
X_t_Sx_SVD_level5_thread4_Red=u_th4(:,1:nL5_th4)*s_th4(1:nL5_th4,1:nL5_th4)*v_th4(:,1:nL5_th4)';
transfer_L5_th4=size(u_th4(:,1:nL5_th4),1)*nL5_th4+nL5_th4+size(v_th4(:,1:nL5_th4),1)*nL5_th4;
X_t_Sx_SVD_level5_thread1_input=X_t_Sx_SVD_level5_thread1_Red;
X_t_Sx_SVD_level5_thread1_input(in_th1:fi_th1,:)=X_t_Sx_SVD_level5_thread2_Red(in_th1:fi_th1,:);
X_t_Sx_SVD_level5_thread2_input=X_t_Sx_SVD_level5_thread2_Red;
#X_t_Sx_SVD_level5_thread2_input(in_th2_1:fi_th2_1,:)=X_t_Sx_SVD_level5_thread1_Red(in_th2_1:fi_th2_1,:);
X_t_Sx_SVD_level5_thread2_input(in_th2:fi_th2,:)=X_t_Sx_SVD_level5_thread3_Red(in_th2:fi_th2,:);
X_t_Sx_SVD_level5_thread3_input=X_t_Sx_SVD_level5_thread3_Red;
#X_t_Sx_SVD_level5_thread3_input(in_th3_1:fi_th3_1,:)=X_t_Sx_SVD_level5_thread2_Red(in_th3_1:fi_th3_1,:);
X_t_Sx_SVD_level5_thread3_input(in_th3:fi_th3,:)=X_t_Sx_SVD_level5_thread4_Red(in_th3:fi_th3,:);
X_t_Sx_SVD_level5_thread4_input=X_t_Sx_SVD_level5_thread4_Red;
#X_t_Sx_SVD_level5_thread4_input(in_th4:fi_th4,:)=X_t_Sx_SVD_level5_thread3_Red(in_th4:fi_th4,:);
X_t_Sx_SVD_level6_thread1=Sx_level6_thread1*X_t_Sx_SVD_level5_thread1_input;
X_t_Sx_SVD_level6_thread2=Sx_level6_thread2*X_t_Sx_SVD_level5_thread2_input;
X_t_Sx_SVD_level6_thread3=Sx_level6_thread3*X_t_Sx_SVD_level5_thread3_input;
X_t_Sx_SVD_level6_thread4=Sx_level6_thread4*X_t_Sx_SVD_level5_thread4_input;

######## Step 7 ########################
[u_th1,s_th1,v_th1]=svd(X_t_Sx_SVD_level6_thread1);
[ii_th1,jj_th1]=find(abs(diag(s_th1))>treshold);
nL6_th1=size(ii_th1,1);
X_t_Sx_SVD_level6_thread1_Red=u_th1(:,1:nL6_th1)*s_th1(1:nL6_th1,1:nL6_th1)*v_th1(:,1:nL6_th1)';
transfer_L6_th1=size(u_th1(:,1:nL6_th1),1)*nL6_th1+nL6_th1+size(v_th1(:,1:nL6_th1),1)*nL6_th1;
[u_th2,s_th2,v_th2]=svd(X_t_Sx_SVD_level6_thread2);
[ii_th2,jj_th2]=find(abs(diag(s_th2))>treshold);
nL6_th2=size(ii_th2,1);
X_t_Sx_SVD_level6_thread2_Red=u_th2(:,1:nL6_th2)*s_th2(1:nL6_th2,1:nL6_th2)*v_th2(:,1:nL6_th2)';
transfer_L6_th2=size(u_th2(:,1:nL6_th2),1)*nL6_th2+nL6_th2+size(v_th2(:,1:nL6_th2),1)*nL6_th2;
[u_th3,s_th3,v_th3]=svd(X_t_Sx_SVD_level6_thread3);
[ii_th3,jj_th3]=find(abs(diag(s_th3))>treshold);
nL6_th3=size(ii_th3,1);
X_t_Sx_SVD_level6_thread3_Red=u_th3(:,1:nL6_th3)*s_th3(1:nL6_th3,1:nL6_th3)*v_th3(:,1:nL6_th3)';
transfer_L6_th3=size(u_th3(:,1:nL6_th3),1)*nL6_th3+nL6_th3+size(v_th3(:,1:nL6_th3),1)*nL6_th3;
[u_th4,s_th4,v_th4]=svd(X_t_Sx_SVD_level6_thread4);
[ii_th4,jj_th4]=find(abs(diag(s_th4))>treshold);
nL6_th4=size(ii_th4,1);
X_t_Sx_SVD_level6_thread4_Red=u_th4(:,1:nL6_th4)*s_th4(1:nL6_th4,1:nL6_th4)*v_th4(:,1:nL6_th4)';
transfer_L6_th4=size(u_th4(:,1:nL6_th4),1)*nL6_th4+nL6_th4+size(v_th4(:,1:nL6_th4),1)*nL6_th4;
X_t_Sx_SVD_level6_thread1_input=X_t_Sx_SVD_level6_thread1_Red;
#X_t_Sx_SVD_level6_thread1_input(in_th1:fi_th1,:)=X_t_Sx_SVD_level6_thread2_Red(in_th1:fi_th1,:);
X_t_Sx_SVD_level6_thread2_input=X_t_Sx_SVD_level6_thread2_Red;
X_t_Sx_SVD_level6_thread2_input(in_th2_1:fi_th2_1,:)=X_t_Sx_SVD_level6_thread1_Red(in_th2_1:fi_th2_1,:);
#X_t_Sx_SVD_level6_thread2_input(in_th2:fi_th2,:)=X_t_Sx_SVD_level6_thread3_Red(in_th2:fi_th2,:);
X_t_Sx_SVD_level6_thread3_input=X_t_Sx_SVD_level6_thread3_Red;
X_t_Sx_SVD_level6_thread3_input(in_th3_1:fi_th3_1,:)=X_t_Sx_SVD_level6_thread2_Red(in_th3_1:fi_th3_1,:);
#X_t_Sx_SVD_level6_thread3_input(in_th3:fi_th3,:)=X_t_Sx_SVD_level6_thread4_Red(in_th3:fi_th3,:);
X_t_Sx_SVD_level6_thread4_input=X_t_Sx_SVD_level6_thread4_Red;
X_t_Sx_SVD_level6_thread4_input(in_th4:fi_th4,:)=X_t_Sx_SVD_level6_thread3_Red(in_th4:fi_th4,:);
X_t_Sx_SVD_level7_thread1=Sx_level7_thread1*X_t_Sx_SVD_level6_thread1_input;
X_t_Sx_SVD_level7_thread2=Sx_level7_thread2*X_t_Sx_SVD_level6_thread2_input;
X_t_Sx_SVD_level7_thread3=Sx_level7_thread3*X_t_Sx_SVD_level6_thread3_input;
X_t_Sx_SVD_level7_thread4=Sx_level7_thread4*X_t_Sx_SVD_level6_thread4_input;

## We join the solution in one matrix to simplify this code but
## this is absolutely not necessary since we know
## the solution is distributed among threads
X_t_Sx_level7_thread_SVD_TOT(1:8,:)  =X_t_Sx_SVD_level7_thread1(1:8,:);
X_t_Sx_level7_thread_SVD_TOT(9:16,:) =X_t_Sx_SVD_level7_thread2(9:16,:);
X_t_Sx_level7_thread_SVD_TOT(17:24,:)=X_t_Sx_SVD_level7_thread3(17:24,:);
X_t_Sx_level7_thread_SVD_TOT(25:32,:)=X_t_Sx_SVD_level7_thread4(25:32,:);
## Ending Parallel
## Suzuki-Trotter decomposition 4th order Approximant
###############################################################################

###############################################################################
### Free evolution solution
X_t_Sx(:,:,indexn)=Sx*X_t_Sx(:,:,indexn-1);
Sx_2=Sx*Sx_2;
### Step Response plus free solution
X_t_Sx_gradino(:,:,indexn)=X_t_Sx(:,:,indexn)-Sx_2*invA*B-invA*B;
X_t_Sx_solo_gradino(:,:,indexn)=-Sx_2*invA*B-invA*B;
### Only Step Response
X_t_Sx_parallel(:,:,indexn)=X_t_Sx_level7_thread_SVD_TOT;
## Error exact solution with no parallel Suzuki Trotter expansion
error_parziale_Sx_temp=max(max(max(abs(X_t(:,:,indexn)-X_t_Sx(:,:,indexn)))))
error_X_t_Sx(indexn)=error_parziale_Sx_temp;
## Error exact solution with parallel Suzuki Trotter expansion
error_parziale_Sx_parallel_temp=max(max(max(abs(X_t(:,:,indexn)-X_t_Sx_parallel(:,:,indexn)))))
error_X_t_Sx_parallel(indexn)=error_parziale_Sx_parallel_temp;
## Error exact STEP solution with STEP Suzuki Trotter expansion
error_parziale_Sx_gradino_temp=max(max(max(abs(X_t_gradino(:,:,indexn)-X_t_Sx_gradino(:,:,indexn)))))
error_X_t_Sx_gradino(indexn)=error_parziale_Sx_gradino_temp;
end ## end FOR cycle for time evolution

disp("\nResults")
treshold
##error_X_t_Taylor;
##error_X_t_Sx;
##error_X_t_Sx_parallel;
error_X_t_Taylor_MAX=max(error_X_t_Taylor)
error_X_t_Sx_MAX=max(error_X_t_Sx)
error_X_t_Sx_parallel_MAX=max(error_X_t_Sx_parallel)
error_X_t_Sx_gradino_MAX=max(error_X_t_Sx_gradino)
n_rows
Nmass
toc

###############################################################################
##
## FIGURES - PLOT
##
###############################################################################
figure(1)
xx=1:size(X_t,3);
plot(xx,squeeze(X_t(1,:,:)),'r',xx,squeeze(X_t_Sx(1,:,:)),'b')
title("First state variable of all simulations - Solution vs. Sx")
grid
######################################
figure(2)
xx=1:size(X_t,3);
plot(xx,squeeze(X_t(1,:,:)),'r',xx,squeeze(X_t_Sx_parallel(1,:,:)),'k')
title("First state variable of all simulations - Solution vs. Sx parallel")
grid
######################################
figure(3)
plot(xx,error_X_t_Taylor,'r',xx,error_X_t_Sx,'b',xx,error_X_t_Sx_parallel,'k')
title("Error Taylor (red) vs. Sx (blue) vs. Sx parallel (black)")
grid
######################################
figure(4)
xx=1:size(X_t,3);
plot(xx,squeeze(X_t_gradino(1,:,:)),'r',xx,squeeze(X_t_Sx_gradino(1,:,:)),'k')
title("First state variable of all simulations - Sx step")
grid
######################################
figure(5)
xx=1:size(X_t,3);
plot(xx,squeeze(X_t_Sx_solo_gradino(1,:,:)),'k')
title("First state variable of all simulations - Sx only step")
grid
######################################
#### Plot number of retained singular values per level and thread
######################################
figure(6)
plot(xx,nL1_th1_save,'r',xx,nL1_th2_save,'b',xx,nL1_th3_save,'k',xx,nL1_th4_save,'g')
title("Level1 nL1 th1 th2 th3 th4 save")
grid
######################################
figure(7)
plot(xx,nL2_th1_save,'r',xx,nL2_th2_save,'b',xx,nL2_th3_save,'k',xx,nL2_th4_save,'g')
title("Level2 nL2 th1 th2 th3 th4 save")
grid
######################################
figure(8)
plot(xx,nL3_th1_save,'r',xx,nL3_th2_save,'b',xx,nL3_th3_save,'k',xx,nL3_th4_save,'g')
title("Level3 nL3 th1 th2 th3 th4 save")
grid
######################################
figure(9)
plot(xx,nL4_th1_save,'r',xx,nL4_th2_save,'b',xx,nL4_th3_save,'k',xx,nL4_th4_save,'g')
title("Level4 nL4 th1 th2 th3 th4 save")
grid