

Title:
SYSTEM AND METHOD FOR PROVIDING DECENTRALIZED COMPUTING RESOURCES
Document Type and Number:
WIPO Patent Application WO/2023/079551
Kind Code:
A1
Abstract:
A computer-based system and method for providing decentralized computing resources including translating a task of training of or inference with a machine learning model into a code in a programming language that is executable by a readymade runtime environment (RRE) that enables interface with the Internet; finding, among a plurality of computing devices, at least one computing device that has available computing power; transferring a portion of the translated task to the at least one computing device; and obtaining results of execution of the portions of the translated task from the at least one computing device. The RRE may be configured to limit interaction between execution of the portions of the translated task and processes running on each of the at least one computing devices and/or to limit interaction between execution of the portions of the translated task and resources of each of the at least one computing devices.

Inventors:
BOREN BEN (IL)
BOREN HAREL (IL)
LIN AMIT (IL)
Application Number:
PCT/IL2022/051162
Publication Date:
May 11, 2023
Filing Date:
November 03, 2022
Assignee:
R STEALTH LTD (IL)
International Classes:
G06N3/04; G06F9/38
Foreign References:
US20210019151A12021-01-21
CN112561078A2021-03-26
US20220292300A12022-09-15
US20220172044A12022-06-02
Attorney, Agent or Firm:
GEFEN, Nurit et al. (IL)
Claims:
What is claimed is:

1. A method for providing decentralized computing resources, the method comprising, using a main processor: selecting, from a plurality of computing devices, one or more computing devices, each of which having available computing power and comprising a readymade runtime environment that enables interface with a computer network; dividing a translated task into one or more portions, wherein the translated task comprises a task gist extracted from a task of training of a machine learning model or inference using the machine learning model; transferring one portion of the one or more portions to each of the selected computing devices to be executed by the readymade runtime environment of each of the respective computing devices; and obtaining results of the execution of the one or more portions of the translated task from the selected computing devices.

2. The method of claim 1, wherein the task is training the machine learning model, and wherein the task gist comprises a model structure, model weights, hyperparameters, and datasets.

3. The method of claim 1, wherein the task is inference using the machine learning model, and wherein the task gist comprises model structure, model weights, and input data.

4. The method of claim 1, wherein the readymade runtime environment is configured to limit interaction between execution of the portions of the translated task and processes running on each of the at least one computing devices.

5. The method of claim 1, wherein the readymade runtime environment is configured to limit interaction between execution of the portions of the translated task and resources of each of the at least one computing devices.


6. The method of claim 1, wherein the task of training of a machine learning model or inference using the machine learning model comprises the model structure, model weights, hyperparameters and datasets.

7. The method of claim 4, wherein each of the one or more portions comprise one of the list consisting of: a subset of the hyperparameters, the model structure and weights, and the dataset; a subset of the hyperparameters, the model structure and weights, and a portion of the dataset; a subset of the hyperparameters, a portion of the model structure and weights, and a portion of the dataset.

8. The method of claim 1, wherein the readymade runtime environment is an internet browser.

9. The method of claim 1, wherein the training of the model is done until metrics of performance are satisfied.

10. The method of claim 1, comprising: negotiating a service-level agreement with a client requesting the task; automatically committing to a service-level agreement with the client requesting the task; monitoring at least one performance indicators of each of the plurality of computing devices; and selecting the one or more computing devices based on the service-level agreement and the at least one performance indicators of the plurality of computing devices.

11. The method of claim 1, comprising: monitoring the available computing power of each of the one or more computing devices; and adjusting the computing power consumed from each of the one or more computing devices so that the computing power consumed is maintained lower than the available computing power of the computing device.

12. The method of claim 1, comprising: monitoring at least one performance indicator of each of the one or more computing devices; and adjusting the computing power consumed from each of the one or more computing devices based on the at least one performance indicator.

13. The method of claim 1, comprising training the machine learning model by the one or more computing devices within the readymade runtime environment, wherein transferring the portion of the translated task to a computing device of the one or more computing devices comprises: transferring a first shard of data comprising a single mini batch of the training data set to the computing device; and continuing transferring subsequent shards of data of the training dataset until an entire training dataset is transferred to the computing device.

14. The method of claim 13, wherein the subsequent shards of data comprise more than a single minibatch.

15. The method of claim 13, wherein the task is obtained from a client, and wherein a size of the subsequent shards of data or a number of mini batches included in each shard of data is adjusted based on a speed of communication from the client to the main processor.

16. The method of claim 13, wherein a size of the subsequent shards of data or a number of mini batches included in each shard of data is adjusted based on a speed of communication from the main processor to the computing device.

17. The method of claim 13 wherein a size of the subsequent shards of data or a number of mini batches included in each shard of data is adjusted based on progress of execution of the translated task.

18. The method of claim 1, comprising: training the machine learning model by the one or more computing devices within the readymade runtime environment; and sending, by the one or more computing devices, the results of execution of the portions of the translated task to the main processor.

19. The method of claim 1, wherein the portion of the translated task comprises a portion of the training dataset, the method comprising: integrating the results of execution of the portions of the translated task to obtain a trained model.

20. The method of claim 1, wherein the portion of the translated task comprises a portion of the training dataset, the method comprising: obtaining results of the portions of the translated task; integrating the results of execution of the portions of the translated task to obtain an updated version of the portions of the translated task; sending the updated version of the portions of the translated task to the one or more computing devices; and repeating sending, obtaining and integrating until a criterion is satisfied.

21. The method of claim 1, wherein the task is obtained from a client, the method comprising: integrating the results of execution of the portions of the translated task to obtain task results; converting the task results to a code readable by the client; and sending the converted task results to the client.

22. The method of claim 1, comprising: extracting the task gist from the task of training of the machine learning model or inference using the machine learning model, to generate the translated task.
23. A method for performing decentralized training of a machine learning model or inference using the machine learning model, the method comprising, using a main processor: extracting a task gist from a task of training of a machine learning model or inference using the machine learning model; obtaining available computing power of each of a plurality of computing devices, wherein each of the plurality of computing devices comprises a code execution environment that enables interface with a computer network; dividing the task gist into one or more portions, and assigning each portion to one of the plurality of computing devices such that each portion is executable by the code execution environment of the assigned computing device using the available computing power of the assigned computing device; transferring each portion to the assigned computing device to be executed by the code execution environment of the assigned computing device; and obtaining results of the execution of the one or more portions from the assigned computing devices.

24. The method of claim 23, wherein the task is training the machine learning model, and wherein the task gist comprises a model structure, model weights, hyperparameters, and datasets.

25. The method of claim 23, wherein the task is inference using the machine learning model, and wherein the task gist comprises model structure, model weights, and input data.

26. The method of claim 23, wherein the code execution environment is configured to limit interaction between execution of the portions of the task gist and processes running on each of the at least one computing devices.

27. The method of claim 23, wherein the code execution environment is configured to limit interaction between execution of the portions of the task gist and resources of each of the at least one computing devices.

28. The method of claim 23, wherein the task of training of a machine learning model or inference using the machine learning model comprises the model structure, model weights, hyperparameters and datasets.

29. The method of claim 28, wherein each of the one or more portions comprise one of the list consisting of: a subset of the hyperparameters, the model structure and weights, and the dataset; a subset of the hyperparameters, the model structure and weights, and a portion of the dataset; a subset of the hyperparameters, a portion of the model structure and weights, and a portion of the dataset.

30. The method of claim 23, wherein the code execution environment is an internet browser.

31. The method of claim 23, wherein the training of the model is done until metrics of performance are satisfied.

32. The method of claim 23, comprising: negotiating a service-level agreement with a client requesting the task; and automatically committing to a service-level agreement with the client requesting the task, wherein assigning each portion to one of the plurality of computing devices is performed based on the service-level agreement.

33. The method of claim 23, comprising: monitoring the available computing power of each of the assigned computing devices; and adjusting the computing power consumed from each of the assigned computing devices so that the computing power consumed is maintained lower than the available computing power of the assigned computing device.

34. The method of claim 23, comprising: monitoring at least one performance indicator of each of the assigned computing devices; and adjusting the computing power consumed from each of the assigned computing devices based on the at least one performance indicator.

35. The method of claim 23, comprising training the machine learning model by the assigned computing devices within the code execution environment, wherein transferring the portion of the task gist to the assigned computing device comprises: transferring a first shard of data comprising a single mini batch of the training data set to the assigned computing device; and continuing transferring subsequent shards of data of the training dataset until an entire training dataset is transferred to the assigned computing device.


36. The method of claim 35, wherein the subsequent shards of data comprise more than a single minibatch.

37. The method of claim 35, wherein the task is obtained from a client, and wherein a size of the subsequent shards of data or a number of mini batches included in each shard of data is adjusted based on a speed of communication from the client to the main processor.

38. The method of claim 35, wherein a size of the subsequent shards of data or a number of mini batches included in each shard of data is adjusted based on a speed of communication from the main processor to the assigned computing device.

39. The method of claim 35 wherein a size of the subsequent shards of data or a number of mini batches included in each shard of data is adjusted based on progress of execution of the task gist.

40. The method of claim 23, comprising: training the machine learning model by the one or more assigned computing devices within the code execution environment; and sending, by the one or more assigned computing devices, the results of execution of the portions of the task gist to the main processor.

41. The method of claim 23, wherein the portion of the task gist comprises a portion of the training dataset, the method comprising: integrating the results of execution of the portions of the task gist to obtain a trained model.

42. The method of claim 23, wherein the portion of the task gist comprises a portion of the training dataset, the method comprising: obtaining results of the portions of the task gist; integrating the results of execution of the portions of the task gist to obtain an updated version of the portions of the task gist; sending the updated version of the portions of the task gist to the one or more assigned computing devices; and repeating sending, obtaining and integrating until a criterion is satisfied.

43. The method of claim 23, wherein the task is obtained from a client, the method comprising: integrating the results of execution of the portions of the task gist to obtain task results; converting the task results to a code readable by the client; and sending the converted task results to the client.

44. A system for training of a machine learning model or inference using the machine learning model, the system comprising: a memory; and a processor configured to: select, from a plurality of computing devices, one or more computing devices, each of which having available computing power and comprising a readymade runtime environment that enables interface with a computer network; divide a translated task into one or more portions, wherein the translated task comprises a task gist extracted from a task of training of a machine learning model or inference using the machine learning model; transfer one portion of the one or more portions to each of the selected computing devices to be executed by the readymade runtime environment of each of the respective computing devices; and obtain results of the execution of the one or more portions of the translated task from the selected computing devices.

45. The system of claim 44, wherein the task is training the machine learning model, and wherein the task gist comprises a model structure, model weights, hyperparameters, and datasets.

46. The system of claim 44, wherein the task is inference using the machine learning model, and wherein the task gist comprises model structure, model weights, and input data.

47. The system of claim 44, wherein the readymade runtime environment is configured to limit interaction between execution of the portions of the translated task and processes running on each of the at least one computing devices.


48. The system of claim 44, wherein the readymade runtime environment is configured to limit interaction between execution of the portions of the translated task and resources of each of the at least one computing devices.

49. The system of claim 44, wherein the task of training of a machine learning model or inference using the machine learning model comprises the model structure, model weights, hyperparameters and datasets.

50. The system of claim 49, wherein each of the one or more portions comprise one of the list consisting of: a subset of the hyperparameters, the model structure and weights, and the dataset; a subset of the hyperparameters, the model structure and weights, and a portion of the dataset; a subset of the hyperparameters, a portion of the model structure and weights, and a portion of the dataset.

51. The system of claim 44, wherein the readymade runtime environment is an internet browser.

52. The system of claim 44, wherein the processor is configured to train the model until metrics of performance are satisfied.

53. The system of claim 44, wherein the processor is further configured to: negotiate a service-level agreement with a client requesting the task; automatically commit to a service-level agreement with the client requesting the task; monitor at least one performance indicators of each of the plurality of computing devices; and select the one or more computing devices based on the service-level agreement and the at least one performance indicators of the plurality of computing devices.

54. The system of claim 44, wherein the processor is further configured to: monitor the available computing power of each of the one or more computing devices; and adjust the computing power consumed from each of the one or more computing devices so that the computing power consumed is maintained lower than the available computing power of the computing device.

55. The system of claim 44, wherein the processor is further configured to: monitor at least one performance indicator of each of the one or more computing devices; and adjust the computing power consumed from each of the one or more computing devices based on the at least one performance indicator.

56. The system of claim 44, comprising training the machine learning model by the one or more computing devices within the readymade runtime environment, wherein the processor is further configured to: transfer the portion of the translated task to a computing device of the one or more computing devices by: transferring a first shard of data comprising a single mini batch of the training data set to the computing device; and continuing transferring subsequent shards of data of the training dataset until an entire training dataset is transferred to the computing device.

57. The system of claim 56, wherein the subsequent shards of data comprise more than a single minibatch.

58. The system of claim 56, wherein the processor is further configured to obtain the task from a client, and to adjust a size of the subsequent shards of data or a number of mini batches included in each shard of data based on a speed of communication from the client to the main processor.

59. The system of claim 56, wherein the processor is further configured to adjust a size of the subsequent shards of data or a number of mini batches included in each shard of data based on a speed of communication from the main processor to the computing device.

60. The system of claim 56, wherein the processor is further configured to adjust a size of the subsequent shards of data or a number of mini batches included in each shard of data based on progress of execution of the translated task.

61. The system of claim 44, wherein the portion of the translated task comprises a portion of the training dataset, wherein the processor is further configured to: integrate the results of execution of the portions of the translated task to obtain a trained model.

62. The system of claim 44, wherein the portion of the translated task comprises a portion of the training dataset, wherein the processor is further configured to: obtain results of the portions of the translated task; integrate the results of execution of the portions of the translated task to obtain an updated version of the portions of the translated task; send the updated version of the portions of the translated task to the one or more computing devices; and repeat sending, obtaining and integrating until a criterion is satisfied.

63. The system of claim 44, wherein the processor is further configured to: obtain the task from a client; integrate the results of execution of the portions of the translated task to obtain task results; convert the task results to a code readable by the client; and send the converted task results to the client.

64. The system of claim 44, wherein the processor is further configured to: extract the task gist from the task of training of the machine learning model or inference using the machine learning model, to generate the translated task.


Description:
SYSTEM AND METHOD FOR PROVIDING DECENTRALIZED COMPUTING RESOURCES

FIELD OF THE INVENTION

[0001] The invention relates generally to a method for automatically providing decentralized computing resources for performing highly intensive computing tasks related to machine learning models including training of deep learning models, neural architecture search processes, and hyperparameter tuning processes.

BACKGROUND OF THE INVENTION

[0002] Artificial intelligence (AI) and machine learning (ML) models, such as neural network (NN) models, are gaining increasing popularity. Applications of ML models are spread across nearly every professional field, including finance, automatic vehicles, linguistics, health services, digital media, advertising and many others. Developing an ML model to a production level of being implemented and used in a product or a service typically includes model creation, generation of datasets for training and validation, including in many cases data accumulation and labeling, model training, architecture search, and hyperparameter optimization of the model.

[0003] An ML lifespan may be generally divided into training and inference phases. Training may include creating the ML model and training it by running a training dataset through the ML model to adjust the model parameters and weights. Inference may refer to using the ML model on real-life input data to produce predictions based on the input data.

[0004] Training of ML models and hyperparameter optimization, as well as ML model inference, are computationally intensive tasks. Generating a satisfactory ML model typically requires building many test models, training each of the test models using training and validation datasets, improving the test models, and retraining them in an iterative process of optimization, to achieve a final model (e.g., the production or production-grade model) that provides predictions at satisfactory performance.

[0005] ML models may include millions, or even tens and hundreds of millions, of trainable parameters, and training an ML model may require hundreds of millions or billions of computations per each run of the model on the training dataset (epoch), where training typically includes tens to several hundred epochs. In addition, an ML model may require further maintenance after being put to production due to data drift, e.g., changes of the distribution of production data over time. Maintenance may include retraining, optimization of hyperparameters and/or adjustments of the model itself. Therefore, the amount of computing power used to train ML models constantly increases, making ML adoption in both the development and production phases environmentally unfriendly and expensive.

[0006] Training of ML models may be performed using on-premises computing power or using a cloud service. Cloud computing services may provide large and scalable computing power. However, cloud computing requires lengthy and complex setup procedures and is expensive. Therefore, there is a need for other ways of providing the computing power required for training, optimization, and production-phase inference and maintenance of ML models that are less complex, less lengthy, less expensive, more accessible, and generate a smaller carbon footprint.

SUMMARY OF THE INVENTION

[0007] According to embodiments of the invention, a computer-based system and method for providing decentralized computing resources may include, using a processor: selecting, from a plurality of computing devices, one or more computing devices, each of which having available computing power and comprising a readymade runtime environment that enables interface with a computer network; dividing a translated task into one or more portions, wherein the translated task comprises a task gist extracted from a task of training of a machine learning model or inference using the machine learning model; transferring one portion of the one or more portions to each of the selected computing devices to be executed by the readymade runtime environment of each of the respective computing devices; and obtaining results of the execution of the one or more portions of the translated task from the selected computing devices. Embodiments of the invention may further include extracting the task gist from the task of training of the machine learning model or inference using the machine learning model, to generate the translated task.
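By way of a purely illustrative, non-authoritative sketch (and not the claimed implementation), the select/divide/transfer/obtain flow of the preceding paragraph could be expressed in Python roughly as follows; the names Device, divide and run_decentralized_task, and the placeholder "work" performed on each portion, are hypothetical and introduced only for this example.

    # Minimal, self-contained sketch of the flow described in paragraph [0007]; the
    # class and function names below are hypothetical, not taken from the application.
    from dataclasses import dataclass

    @dataclass
    class Device:                       # a candidate trainer with a readymade runtime environment
        name: str
        available_power: float          # e.g., a fraction of an idle core

        def execute(self, portion):     # stands in for execution inside the RRE
            return sum(portion)         # placeholder "work" on the assigned portion

    def divide(task_gist, n):
        # Divide the translated task (here just a list of numbers) into n portions.
        return [task_gist[i::n] for i in range(n)]

    def run_decentralized_task(task_gist, candidates):
        # Select devices that report available computing power.
        selected = [d for d in candidates if d.available_power > 0]
        portions = divide(task_gist, len(selected))
        # Transfer one portion to each selected device and obtain the results.
        return [d.execute(p) for d, p in zip(selected, portions)]

    print(run_decentralized_task(list(range(10)),
                                 [Device("laptop", 0.5), Device("phone", 0.2)]))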

[0008] According to embodiments of the invention, a computer-based system and method for performing decentralized training of a machine learning model or inference using the machine learning model, may include, using a main processor: extracting a task gist from a task of training of a machine learning model or inference using the machine learning model; obtaining available computing power of each of a plurality of computing devices, wherein each of the plurality of computing devices comprises a code execution environment that enables interface with a computer network; dividing the task gist into one or more portions, and assigning each portion to one of the plurality of computing devices such that each portion is executable by the code execution environment of the assigned computing device using the available computing power of the assigned computing device; transferring each portion to the assigned computing device to be executed by the code execution environment of the assigned computing device; and obtaining results of the execution of the one or more portions from the assigned computing devices.

[0009] According to embodiments of the invention, the readymade runtime environment may be configured to limit interaction between execution of the portions of the translated task and processes running on each of the at least one computing devices.

[0010] According to embodiments of the invention, the readymade runtime environment may be configured to limit interaction between execution of the portions of the translated task and resources of each of the at least one computing devices.

[0011] According to embodiments of the invention, the task of training a machine learning model or inference using the machine learning model may include the model structure and weights, hyperparameters and datasets.

[0012] Embodiments of the invention may include dividing the task of training of a machine learning model or inference using the machine learning model into portions, wherein each portion may include one of the following: a subset of the hyperparameters, the model structure and weights, and the dataset; a subset of the hyperparameters, the model structure and weights, and a portion of the dataset; a subset of the hyperparameters, a portion of the model structure and weights, and a portion of the dataset.

[0013] According to embodiments of the invention, the readymade runtime environment may be an internet browser, a container such as Docker or Podman, or any other readymade runtime environment that is able to run the translated task.

[0014] According to embodiments of the invention, the training of the model may be done until metrics of performance are satisfied.

[0015] Embodiments of the invention may include negotiating a service-level agreement with a client requesting the task; automatically committing to a service-level agreement with a client requesting the task; monitoring at least one performance indicators of each of the plurality of computing devices; and selecting the at least one computing devices based on the service-level agreement and the at least one performance indicators of the plurality of computing devices.

[0016] Embodiments of the invention may include monitoring the available computing power of each of the at least one computing devices; and adjusting the computing power consumed from each of the at least one computing devices so that the computing power consumed is maintained lower than the available computing power of the computing device.

[0017] Embodiments of the invention may include monitoring at least one performance indicator of each of the at least one computing devices; and adjusting the computing power consumed from each of the at least one computing devices based on the at least one performance indicator.

[0018] According to embodiments of the invention, the task of training of the machine learning model may be a model fit function.

[0019] According to embodiments of the invention, the task of inference of the machine learning model may be a model predict or model evaluate function.

[0020] Embodiments of the invention may include training the machine learning model by the at least one computing devices within the readymade runtime environment or readymade runtime environments run on that computing device, where transferring the portion of the translated task to a computing device of the at least one computing device may include: transferring a structure of the machine learning model, model weights and the training hyperparameters to the remote computing device; transferring a first shard of data comprising a single mini batch of the training data set to the remote computing device; and continuing transferring subsequent shards of data of the training dataset until an entire training dataset is transferred to the remote computing device.
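As a non-authoritative illustration of the shard schedule described above (a first shard equal to a single mini batch, later shards possibly larger), a generator along the following lines could produce such shards; the helper name and the size chosen for later shards are assumptions made only for this example.

    # Hedged sketch of the shard schedule in paragraph [0020]; iter_shards and the
    # "four mini batches per later shard" choice are illustrative assumptions.
    def iter_shards(dataset, mini_batch_size, later_batches_per_shard=4):
        yield dataset[:mini_batch_size]              # first shard: exactly one mini batch
        pos = mini_batch_size
        shard_size = mini_batch_size * later_batches_per_shard
        while pos < len(dataset):                    # continue until the entire dataset is sent
            yield dataset[pos:pos + shard_size]
            pos += shard_size

    samples = list(range(100))
    for i, shard in enumerate(iter_shards(samples, mini_batch_size=10)):
        print(f"shard {i}: {len(shard)} samples")    # 10, then 40, 40, 10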

[0021] According to embodiments of the invention, the subsequent shards of data may include more than a single mini-batch.

[0022] According to embodiments of the invention, the task may be obtained from a client, and a size of the subsequent shards of data or a number of mini batches included in each shard of data may be adjusted based on a speed of communication from the client to the main processor.

[0023] According to embodiments of the invention, a size of the subsequent shards of data or a number of mini batches included in each shard of data may be adjusted based on a speed of communication from the main processor to the at least one computing devices.

[0024] According to embodiments of the invention, a size of the subsequent shards of data or a number of mini batches included in each shard of data may be adjusted based on progress of execution of the translated task.
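Purely as an illustration of how the adjustments of paragraphs [0022]-[0024] could look, the number of mini batches per shard might be scaled with measured link speed and training progress; the function name and the scaling constants below are assumptions, not a rule taken from the application.

    # Hedged sketch of shard-size adjustment based on communication speed and progress;
    # the scaling rule itself is an illustrative assumption.
    def mini_batches_per_shard(link_mbps, progress_fraction, base=1, cap=32):
        # Faster links and more completed work allow larger shards.
        scaled = base + int(link_mbps / 10) + int(progress_fraction * 8)
        return max(base, min(cap, scaled))

    print(mini_batches_per_shard(link_mbps=50, progress_fraction=0.25))   # prints 8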

[0025] Embodiments of the invention may include training the machine learning model included in the translated task by the at least one computing devices within the readymade runtime environment or readymade runtime environments on each computing device; and sending, by the at least one computing devices, the results of execution of the portions of the translated task to the main processor.

[0026] According to embodiments of the invention, the portion of the translated task may include a portion of the training dataset, and embodiments of the method may include integrating the results of execution of the portions of the translated task to obtain a trained model.

[0027] According to embodiments of the invention, the portion of the translated task may include a portion of the training dataset, and embodiments of the method may include obtaining results of the portions of the translated task; integrating the results of execution of the portions of the translated task to obtain an updated version of the portions of the translated task; sending the updated version of the portions of the translated task to the at least one computing devices; and repeating sending, obtaining and integrating until a criterion is satisfied.

[0028] Embodiments of the invention may include integrating the results of execution of the portions of the translated task to obtain task results; converting the task results to code readable in the original programming language of the task; and sending the converted task results to the client.

BRIEF DESCRIPTION OF THE DRAWINGS

[0029] Non-limiting examples of embodiments of the disclosure are described below with reference to figures attached hereto. Dimensions of features shown in the figures are chosen for convenience and clarity of presentation and are not necessarily shown to scale. The subject matter regarded as the invention is particularly pointed out and distinctly claimed in the concluding portion of the specification. The invention, however, both as to organization and method of operation, together with objects, features and advantages thereof, can be understood by reference to the following detailed description when read with the accompanied drawings. Embodiments of the invention are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like reference numerals indicate corresponding, analogous or similar elements, and in which:

[0030] Fig. 1 schematically illustrates a system, according to embodiments of the invention.

[0031] Fig. 2 schematically illustrates dataflow between a client, an orchestrator and a trainer, according to embodiments of the invention.

[0032] Fig. 3 schematically illustrates a system according to embodiments of the invention.

[0033] Fig. 4 schematically illustrates dataflow between a client, orchestrator, a trainer and a parameter server, according to embodiments of the invention.

[0034] Fig. 5 is a flowchart of a method providing decentralized computing resources for training an ML model, according to embodiments of the present invention.

[0035] Fig. 6 is a flowchart of a method providing decentralized computing resources for an ML model inference, according to embodiments of the present invention.

[0036] Fig. 7 illustrates an example computing device according to an embodiment of the invention.

[0037] It will be appreciated that for simplicity and clarity of illustration, elements shown in the figures have not necessarily been drawn accurately or to scale. For example, the dimensions of some of the elements can be exaggerated relative to other elements for clarity, or several physical components can be included in one functional block or element.

DETAILED DESCRIPTION

[0038] In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the invention. However, it will be understood by those skilled in the art that the present invention can be practiced without these specific details. In other instances, well-known methods, procedures, and components, modules, units and/or circuits have not been described in detail so as not to obscure the invention.

[0039] Although some embodiments of the invention are not limited in this regard, discussions utilizing terms such as, for example, "processing," "computing," "calculating," "determining," "establishing", "analyzing", "checking", "inferring" or the like, may refer to operation(s) and/or process(es) of a computer, a computing platform, a computing system, or other electronic computing device that manipulates and/or transforms data represented as physical (e.g., electronic) quantities within the computer's registers and/or memories into other data similarly represented as physical quantities within the computer's registers and/or memories or other information transitory or non-transitory or processor-readable storage medium that may store instructions, which when executed by the processor, cause the processor to execute operations and/or processes. Although embodiments of the invention are not limited in this regard, the terms "plurality" and "a plurality" as used herein may include, for example, "multiple" or "two or more". The terms "plurality" or "a plurality" may be used throughout the specification to describe two or more components, devices, elements, units, parameters, or the like. The term "set" when used herein may include one or more items unless otherwise stated. Unless explicitly stated, the method embodiments described herein are not constrained to a particular order or sequence. Additionally, some of the described method embodiments or elements thereof can occur or be performed in a different order from that described, simultaneously, at the same point in time, or concurrently.

[0040] Developing, running and maintaining ML models, and specifically NN models, have significant costs in terms of time, money, human resources, etc. Developing an ML model to a production level may take from several weeks to months or years. For example, developing an ML model may include determining a model structure from a plurality of prospective model structures, training each of these models using an annotated or labeled training dataset (supervised learning) as well as training each of these models using a non-annotated or unlabeled training dataset (unsupervised learning), until arriving at a final model that generates predictions with satisfactory performance. As known, performance of an ML model may be defined and measured in terms of any applicable metrics, including accuracy, precision and recall, etc. Each of these models may typically include millions, or even tens and hundreds of millions, of trainable parameters, and each iteration of its training may require hundreds of millions or billions of computations per epoch. Every model training run usually includes tens to several hundred epochs to arrive at a model with satisfactory performance.

[0041] Each task of training an ML model may also include hyperparameters that are used to define model and training characteristics, including, inter alia, a learning rate, a mini batch size, a loss function, a number of decision trees, max-depth, a number of epochs, etc. In some configurations, the specifications of the model architecture itself are hyperparameters, e.g., the number of convolution layers or the number of dense layers that make up a NN model may not be hard-coded, but rather a hyperparameter that is adjusted alongside other hyperparameters during iterative training sessions in search of the satisfactory model.
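For illustration only, a hyperparameter specification of the kind listed above might look like the following; the particular keys and values are examples chosen for this sketch, not values from the application.

    # Example hyperparameter set of the kind described in paragraph [0041]; values are
    # illustrative only.
    hyperparameters = {
        "learning_rate": 1e-3,
        "mini_batch_size": 32,
        "loss_function": "categorical_crossentropy",
        "epochs": 50,
        "max_depth": 6,            # e.g., for tree-based models
        "n_estimators": 200,       # number of decision trees
        "num_conv_layers": 3,      # architecture treated as a hyperparameter
        "num_dense_layers": 2,     # adjusted alongside the other hyperparameters
    }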

[0042] A NN may refer to an information processing paradigm that may include nodes, referred to as neurons, organized into layers, with links between the neurons. The links may transfer signals between neurons and may be associated with weights. A NN may be configured or trained for a specific task, e.g., pattern recognition or classification. Training a NN for the specific task may involve adjusting these weights based on examples (e.g., labeled data included in the training dataset). Each neuron of an intermediate or last layer may receive an input signal, e.g., a weighted sum of output signals from other neurons, and may process the input signal using a linear and/or nonlinear function (e.g., an activation function). The results of the input and intermediate layers may be transferred to other neurons and the results of the output layer may be provided as the output of the NN. For example, in a NN algorithm known as the gradient descent algorithm, the results of the output layer may be compared to the labels of the samples in the training dataset, and a loss or cost function (such as the root-mean-square error) may be used to calculate a difference between the results of the output layer and the labels. The weights of some of the neurons may be adjusted using the calculated differences, in a process that iteratively minimizes the loss or cost until satisfactory metrics are achieved or satisfied. A NN may be executed and represented as formulas or relationships among nodes or neurons, such that the neurons, nodes or links are “virtual”, represented by software and formulas and mathematical constructs, such as activation functions and multi-dimensional matrices of data elements and weights. A processor, e.g., central processing units (CPU), graphical processing units (GPU) or tensor processing units (TPU) or a dedicated hardware device may perform the relevant calculations on the mathematical constructs. As used herein a NN may include deep neural networks (DNN), convolutional neural networks (CNN), probabilistic neural networks (PNN), time delay neural network (TDNN), deep stacking network (DSN), generative adversarial networks (GAN), recurrent neural network (RNN), long short-term memory (LSTM), etc.
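As a minimal, non-authoritative sketch of the gradient-descent idea outlined above (weighted sums, an activation function, a loss computed against labels, and iterative weight adjustment), the following NumPy example trains a single-layer model on toy data; the data, model and learning rate are arbitrary placeholders illustrating only the general technique.

    # Toy gradient-descent loop illustrating paragraph [0042]; data, model and learning
    # rate are arbitrary placeholders.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(64, 3))                                  # toy training samples
    y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)        # toy labels

    w = np.zeros(3)                                               # trainable weights ("links")
    lr = 0.1                                                      # learning rate hyperparameter
    for epoch in range(100):
        z = X @ w                                                 # weighted sum of inputs
        p = 1.0 / (1.0 + np.exp(-z))                              # sigmoid activation
        loss = np.mean((p - y) ** 2)                              # loss between outputs and labels
        grad = 2 * X.T @ ((p - y) * p * (1 - p)) / len(X)         # gradient of the loss
        w -= lr * grad                                            # adjust weights to reduce the loss
    print("final loss:", round(float(loss), 4))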

[0043] The process of applying a large set of permutations upon the chosen hyperparameters is called hyperparameter tuning or hyperparameter optimization (HPO). During HPO, each permutation of hyperparameters requires executing at least one (and in most cases many more than one) full training process either upon the whole dataset, or upon portions of the dataset (data parallelism) or upon portions of the models (model parallelism), or combinations of the above, all performed to achieve a viable production-grade model that will satisfy the metrics desired for successful production-stage.
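A hedged sketch of applying a set of permutations to chosen hyperparameters, as described above, might look like the following; the grid contents and the train_and_score stand-in (which in practice would run a full training and validation cycle) are illustrative assumptions.

    # Illustrative hyperparameter-permutation loop for paragraph [0043]; each permutation
    # would normally trigger at least one full training run.
    from itertools import product

    grid = {
        "learning_rate": [1e-2, 1e-3],
        "mini_batch_size": [32, 64],
        "num_dense_layers": [1, 2, 3],
    }

    def train_and_score(params):
        # Stand-in for a full training run followed by evaluation against the desired metrics.
        return -abs(params["learning_rate"] - 1e-3) - 0.01 * params["num_dense_layers"]

    best = None
    for values in product(*grid.values()):            # every permutation of the grid
        params = dict(zip(grid.keys(), values))
        score = train_and_score(params)
        if best is None or score > best[0]:
            best = (score, params)
    print("best permutation:", best[1])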

[0044] In many applications, a production-grade model may require maintenance, e.g., the model building and HPO may be repeated at time intervals after the model has been put to production if there is a data drift, e.g., a growing variance between the real-life data and the data used for training and/or if enhanced model types are introduced. As a result, the number of computations associated with building and maintaining an ML model is huge. An attempt to reduce the number of computations by reducing training, reducing HPO or reducing maintenance during production may inevitably result in a model’s sub-optimal performance to start with, or its deterioration to ineffective usability over time.

[0045] Training of ML models, and their inference on live data during production, may be performed using on-premises computing power or using a cloud service. In both cases a complete runtime environment must be prepared for the training or inference to take place. For example, if the model being trained is a NN model, e.g., code or a program written in the backend framework TensorFlow Keras, using open-source Python libraries such as NumPy, pandas and matplotlib, then a complete preparation must be performed on the computing device that will be running the training or inference processes, such that it has ready a “runtime environment” that will run the program and may include, inter alia, an interface with hardware, e.g., compute unified device architecture (CUDA), the full stack of correct and updated software versions, libraries, etc. This preparation is mandatory to ensure successful code execution of the model training on that particular computing device.

[0046] For a training of a model to be executed by several computing devices, in a decentralized or distributed manner, this complex preparation task must be tailored for and installed on each variation of computing device. There is no one-fits-all runtime environment preparation script in existence, which would fit various computers, smartphones, tablets, car-infotainment systems (ICE) and the like. Each preparation process must thus be performed differently across the multitude of computing devices that are to be used in the execution of the distributed training task. For example, installation of TensorFlow on an Apple® laptop including an M1® chip and MacOS® operating system is different than installation of TensorFlow on a Dell® laptop including an Intel® CPU, Nvidia® GPU and Windows® 10 operating system, and so on with each different computing device. Those skilled in the art will recognize that this constitutes an essentially insurmountable hurdle to actual implementation of large-scale decentralized or distributed computing resources utilization including training and inference of ML models across a multitude of computing devices.

[0047] Training of ML models may be performed using on-premises computing power, or a cloud computing service. Cloud computing services may provide large and scalable computing power. However, cloud computing requires lengthy and complex setup procedures, which vary from one cloud services provider to another, each designed to ensure a standard runtime environment for the computing devices owned and operated by the respective cloud services provider. In addition, cloud computing services are very expensive, due to their mandatory cost structure that may include buying computers, holding them on especially prepared real estate, maintaining them, cooling them, staffing them, etc. Thus, the financial costs of training and inference of ML models impose additional, and in some cases prohibitive, difficulty for academics, students, researchers and commercially motivated scientific progress to engage in deep learning research and technology innovation and progression.

[0048] On top of the direct costs, using a cloud service for training an ML model may involve opening an account and determining training setup parameters, such as the numbers, brand types and model types of CPUs, GPUs or TPUs to be used in training and HPO, and whether training and HPO will be performed using these resources in parallel or consecutively. Those setup and preparation processes are usually site-specific, which makes the transition from one cloud service provider to another complex and expensive (referred to as vendor lock-in), and requires employees designated as “ML operations” or “MLOps” to acquire unique expertise pertaining to a particular cloud service provider. However, despite those drawbacks, ML developers use cloud computing services for developing ML models due to a lack of any alternative to the vast computing power that only cloud computing services can provide.

[0049] Embodiments of the invention may include a system and method for providing decentralized computing resources to clients. According to embodiments of the invention, computing devices in homes, vehicles and enterprises may provide computing power for training ML models by such clients. According to embodiments of the invention, to operate as a trainer of an ML model, owners of those computing devices may only be required to have them connected to a network, e.g., the Internet, and operate a readymade runtime environment (RRE). For example, in personal computers, gaming consoles and mobile phones the RRE may be or may include a web browser, and in data center servers the RRE may be or may include a container (e.g., Docker or Podman). In an example RRE, the RRE may include all the necessary infrastructure to train or partially train an ML model as disclosed herein, and no other installations of software, agents, executables, runtime environments or libraries, adjustments and setups may be required, e.g., except for the installation of the RRE itself. Clients (e.g., computing devices or applications that train, optimize, run or maintain one or more ML models) and trainers (e.g., computing devices owned by persons or firms that may have available computing power for training or inferring ML models) may sign up to a service of ML training according to embodiments of the invention, without performing any specific installations, adjustments and setups as above.

[0050] According to embodiments of the invention, a service-level agreement (SLA) or service-level objective (SLO) may be committed to by the service provider to the client for a task of training an ML model, which may be automatically assigned to be trained by any trainer that has available or free computing power (or available computing power that is above a threshold) and satisfies other considerations as sufficient for satisfying the SLA, e.g., by personal computers, servers, gaming consoles, rented virtual machines, smart televisions, smart phones, media streaming devices, in-car infotainment systems and other processors distributed all over the world, that are connected to the internet and operate a web browser, a container or other suitable RRE, also referred to herein as a sandboxed environment. According to some embodiments of the invention, the utilization of such available computing power may be performed without tangible hindrance to the ongoing normal utilization of those computing devices. As used herein, free or available computing power may refer to computing power of a trainer that is currently not used by the trainer or that is assigned by the trainer for performing a portion of the task of training an ML model.

[0051] Furthermore, embodiments of the invention may improve ML development technology by providing a system and method for implementing the utilization of the decentralized computing resources automatically. According to embodiments of the invention, utilizing the decentralized computing resources automatically may include sending translated tasks to a plurality of trainers running computing devices of various types, without requiring any special adjustments on the side of the trainer, regardless of the specific type and architecture of the trainer. This stands in stark contrast to prior art decentralized environments that are inherently not automatically implemented, e.g., require separate preparation on the side of each of the trainers for each ML training task which exceeds the mere installation of a readymade runtime environment (such as a web browser, a container or a container engine, Docker or Podman, etc.), and are not operable on a multitude of machines while being agnostic to each and every machine’s special environment including hardware, operating system, drivers, etc.

[0052] Embodiments of the invention may include a system and method for providing decentralized computing resources. Embodiments of the method may include, using a main processor: translating a task (e.g., obtained or received from a client) of training or inference of a machine learning or neural network model, into a task gist, e.g., a code including the model structure, model weights, hyperparameters, other parameters associated with the model, and datasets, in a programming language and/or format that are executable by an RRE that enables interface or communication with a computer network such as the Internet, to generate a translated task; finding, among a plurality of computing devices (e.g., candidate trainers), at least one computing devices that has available computing power; transferring (e.g., via a network such as the Internet or another network) a portion of the translated task to each of the at least one computing devices; and obtaining results of execution of the portions of the translated task from the at least one computing devices, where the RRE may be configured to limit, prevent or block interaction between execution of the portions of the translated task and processes running on each of the at least one computing devices and between execution of the portions of the translated task and resources of each of the at least one remote computing devices. The main processor may convert the results of execution of the portions of the translated task to be readable in a programming language as requested by the client.

[0053] For clarity, extracting the task gist from a task obtained from the client in a first programming language, and optionally transforming the task gist to the programming language and/or format that are executable by an RRE, may be referred to herein as translating, and transforming the results of the execution, as obtained from the trainers, back to be readable in the first programming language or another programming language and/or format that is required by the client may be referred to herein as converting. According to some embodiments, translating a task obtained from the client may include extracting from the task as obtained from the client, the model structure, model weights, hyperparameters, and datasets which together form the task gist. The task gist, e.g., the model structure, model weights, hyperparameters, and datasets, may then be transformed into a format and/or programming language that is readable by a trainer.

[0054] Thus, embodiments of the invention may provide huge computing resources that are decentralized. Therefore, embodiments of the invention may improve the technology of training, optimizing, running and maintaining ML models by providing computing resources to these ends. Benefits for the clients and ML developers may include accessible and less expensive computing power, a simpler setup process, availability of very large computing power at short notice, and automatic translation and distribution of training and inference assignments among the multitude of trainers. NN technology may be improved to more efficiently develop NNs. Benefits for trainers may include a new source of income paid for using their computing devices. Benefits for the environment may include an overall lower carbon footprint. Benefits to science and commerce may include lower barriers for the development, optimization, use and maintenance of useful ML models, for academics, students, researchers, and commercially motivated scientific innovation and progression, in both developed as well as emerging economies. Benefits for manufacturers of products containing computing devices, or service providers utilizing such computing devices (e.g., personal computer manufacturers, gaming console manufacturers, car and electric car manufacturers, set-top box service providers, mobile phone manufacturers, cellular network service providers), may be in providing their goods or services at lower prices, thus splitting the value generated by these devices’ joining in global decentralized computation between themselves and/or their clients.

[0055] Reference is made to Fig. 1, which schematically illustrates a system 100, according to embodiments of the invention. System 100 may include one or more clients 110, 112 and 114, an orchestrator 120, and one or more trainers 130, 132 and 134. Each of clients 110, 112 and 114, orchestrator 120 and trainers 130, 132 and 134 may be or may include a computing device such as computing device 700 depicted in Fig. 7.

[0056] Networks 140 may include any type of network or combination of networks available for supporting communication between clients 110, 112 and 114, orchestrator 120 and trainers 130, 132 and 134. Networks 140 may be or may include for example, the Internet and intranet networks, a part of a private IP network, an integrated services digital network (ISDN), a set of frame relay connections, a public or private data network, a local area network (LAN), a wide area network (WAN), a wireline or wireless network, a cellular or satellite network, a local, regional, or global communication network, an enterprise intranet, a mesh network, and any combination of the preceding and/or any other suitable communication infrastructure. It will be recognized that embodiments of the invention are not limited by the nature, type or other aspects of network 140.

[0057] Any of clients 110, 112 and 114 may initiate performing a computing task, referred to herein as a task or a job, e.g., training of an ML model or ML model inference. The ML model may include any type of learning model that requires training or inference such as, but not limited to, NN models, decision trees, gradient boosted trees, gradient boosting techniques, etc. For example, client 110 may provide a computer task, e.g., a training task, to orchestrator 120, as disclosed herein. The task may be provided through networks 140. The task may be provided as software code in any supported computer language or format, including Python, C, C++, C#, ML-specific languages or frameworks, etc. For example, a task for training an ML model (or model) may follow a call for training start (e.g., model.fit), and include model architecture or structure, model weights, training configuration and optimizer (e.g., in an H5 file), training or validation data (e.g., .npy files or .csv data frames), and separate training hyperparameters, including mini batch size, epochs, loss function, accuracy metrics, optimizer, learning rate, callbacks, etc. As another example, the training hyperparameters may include n_estimators, tree_method, eta, max_depth, learning_rate, min_child_weight, subsample, colsample_bytree, reg_alpha, reg_lambda, early_stopping_rounds, etc. A task for a NN model inference (also referred to herein as inference) may include a call for inference to start (e.g., model.predict), model architecture and weights, and input data. A model.fit or fit function may refer to commands or functions included in any relevant ML software framework such as Tensorflow® Keras®, Scikit-Learn®, PyTorch®, or coding frameworks, that when executed perform training of an ML model or specifically a NN model.
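For concreteness, a client-side training task of the kind described above might look like the following Keras sketch: a model.fit call with separate hyperparameters and a model saved to an H5 file. The architecture, data and values are toy placeholders rather than material from the application, and the sketch assumes a local TensorFlow installation on the client.

    # Illustrative client task (training start via model.fit) per paragraph [0057];
    # the toy data stands in for data loaded from .npy files or .csv data frames.
    import numpy as np
    from tensorflow import keras

    x_train = np.random.rand(256, 20).astype("float32")
    y_train = np.random.randint(0, 2, size=(256,))

    model = keras.Sequential([                                    # model architecture or structure
        keras.layers.Dense(16, activation="relu", input_shape=(20,)),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),   # optimizer and learning rate
                  loss="binary_crossentropy", metrics=["accuracy"])

    # Call for training start, with separate training hyperparameters.
    model.fit(x_train, y_train, batch_size=32, epochs=5, validation_split=0.2)
    model.save("model.h5")            # architecture, weights and optimizer in an H5 file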

[0058] Orchestrator 120 may obtain the task from client 110, automatically translate the task into a programming language, or programming languages in some embodiments, supported and executable by RREs 131, 133, and 135, thus generating a translated task. For example, orchestrator 120 may translate the task such that it is supported and executable in JavaScript by Internet browsers, or executable in Python or any other computer language supported by RREs 131, 133, and 135. In some embodiments, automatically translating the task may include extracting a task essence or gist from the task obtained from client 110. The task essence or gist may include the parameters and data required for training or inferring the model (depending on the type of the task). For example, in a task of training the machine learning model, the task essence or gist may include the model structure, the model weights, hyperparameters, and datasets. In a task of inferring using a trained ML model, the task essence or gist may include the model structure (e.g., the structure of the trained model), the model weights (e.g., the weights of the trained model) and the input data. In some embodiments, automatically translating the task may further include adjusting or changing the format of the task essence or gist to be readable, executable or supported by RREs 131, 133, and 135.
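By way of non-limiting illustration, the following Python sketch shows one possible way an orchestrator could extract such a task gist from a Keras model; the function name and the JSON/list serialization shown are assumptions for the sketch only, not a description of the actual translation mechanism.

# Hypothetical sketch of task-gist extraction for a training task.
import json
from tensorflow import keras

def extract_training_gist(model, hyperparameters, x_train, y_train):
    """Collect the model structure, weights, hyperparameters and datasets in a
    plain format that an RRE (e.g., a browser executing JavaScript) could
    consume without the client's ML framework or libraries installed."""
    return {
        "model_structure": json.loads(model.to_json()),              # architecture as JSON
        "model_weights": [w.tolist() for w in model.get_weights()],  # plain nested lists
        "hyperparameters": hyperparameters,                          # e.g., epochs, batch size
        "datasets": {
            "x_train": x_train.tolist(),
            "y_train": y_train.tolist(),
        },
    }

# The gist could then be serialized, e.g., to JSON, for transfer to trainers:
# payload = json.dumps(extract_training_gist(model, {"epochs": 5, "batch_size": 32}, x_train, y_train))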

[0059] A code of training of the machine learning model or inference using the machine learning model may include importing various libraries. Those libraries may include various files or modules that contain various functions and data values, including, inter-alia, custom-made functions and data values that cannot be forecast in advance, for training an ML model or performing inference. Thus, executing a code for training an ML model or performing inference (e.g., executing the task) may require using at least part of the functions and data included in those libraries. Therefore, according to prior art, such libraries would also have to be installed on or otherwise provided to the remote or decentralized execution environments in order for them to successfully execute the task or a portion of the task of training an ML model.

[0060] According to embodiments of the invention, installations of those libraries on the RRE’s installed on trainers 130, 132 and 134 may be eliminated by generating the task essence or gist, since the task essence or gist is readable and executable by RRE’s 131, 133 and 135 and may not require the functions and data included in those libraries.

[0061] In one example, the datasets of the original task may require preprocessing. For example, images (e.g., labeled images used for training the ML model) may require filtering, augmentations, size adjustments, etc., text may require transformations into feature vectors, audio data may require preprocessing, etc. In many applications, preprocessing of training datasets may include using functions provided in the libraries included with the code of training of the machine learning model or inference using the machine learning model. According to embodiments of the invention, as part of extracting the task gist, orchestrator 120 may extract the datasets after they have been preprocessed, and include in the task gist datasets after preprocessing, that are ready to be provided to the ML model. In some applications, the ML model itself may require preprocessing. For example, an ML model may include embedding layers with pre -prepared matrices that are included in libraries imported to the code of training of the machine learning model or inference using the machine learning model or of the ML software framework, or both. According to embodiments of the invention, as part of extracting the task gist, orchestrator 120 may perform the preprocessing of the ML model, e.g., prepare the required matrices and include them in the task gist. Thus, RREs 131, 133 and 135 and trainers 130, 132 and 134 may not require preinstallation of libraries, functions or data required to perform a portion of the task. Accordingly, an RRE 131, 133 and 135 including a standard Internet browser or a container may be used to perform a portion of the translated task without any specific adjustments.
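By way of non-limiting illustration, a minimal Python sketch of dataset preprocessing performed before building the task gist might look as follows; the target size and normalization are illustrative assumptions, and in practice the client's own preprocessing functions (from its imported libraries) would be applied before the resulting arrays are shipped to trainers.

# Hypothetical preprocessing step performed before building the task gist,
# so trainers receive data that is ready to be provided to the ML model.
import numpy as np

def preprocess_images(images, target_size=(32, 32)):
    """Crude size adjustment and normalization of labeled images; only the
    resulting arrays, not the preprocessing libraries, go into the gist."""
    processed = []
    for img in images:
        img = img[:target_size[0], :target_size[1]]   # size adjustment (illustrative)
        img = img.astype(np.float32) / 255.0          # normalization to [0, 1]
        processed.append(img)
    return np.stack(processed)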

[0062] Orchestrator 120 may distribute the translated task among one or more trainers 130, 132 and 134, each including an RRE 131, 133 and 135, respectively, e.g., Internet browsers and/or Docker containers. Trainers 130, 132 and 134 may execute the portion of the translated task using RREs 131, 133 and 135. For example, orchestrator 120 may divide or partition the translated task into one or more portions, and send each portion to one of trainers 130, 132 and 134. According to some embodiments, orchestrator 120 may operate a website to implement embodiments of the invention. For example, orchestrator 120 may operate a website to communicate with RREs 131, 133, and 135 of trainers 130, 132 and 134, and with clients 110, 112, 114, to register clients 110, 112, 114 and trainers 130, 132 and 134, to obtain tasks from clients 110, 112, 114, to assign or send portions of the translated task to trainers 130, 132 and 134, in accordance with the programming code executable in each of RREs 131, 133, and 135, to obtain results from trainers 130, 132 and 134, to send results to clients 110, 112, 114, etc.

[0063] For a training task, the portions of the translated task sent to trainers 130, 132 and 134 may include translated versions of the model structure, model weights, training data, and training hyperparameters (or parts of the model structure, model weights, training data, and training hyperparameters pertaining to the portion). For an inference task, the portions of the translated task sent to trainers 130, 132 and 134 may include translated versions of the model structure, model weights, and input data. While drawn as separate components, orchestrator 120, or parts thereof, may be implemented in client 110, e.g., as a software block or module. For example, in some embodiments, extraction of the task gist from the task may be implemented in client 110. In some embodiments, training of the ML model may continue until a criterion is satisfied, e.g., until metrics of performance, e.g., obtained from client 110, are achieved or until a number of epochs defined for example by client 110 is reached.

[0064] According to some embodiments, a portion of the translated task may include a part, a portion, or a subset of the hyperparameters, the model structure or architecture, the model weights and the dataset; a part, a portion, or a subset of the hyperparameters, the model structure or architecture, the model weights and a portion of the dataset; a part, a portion, or a subset of the hyperparameters, a portion of the model architecture and weights and a portion of the dataset; or a portion of the hyperparameters, a portion of the model structure or architecture, and weights, and a portion of the dataset.

[0065] Trainers 130, 132 and 134 may include any type of computing device (such as computing device 700) that may provide computing power and resources for executing or performing at least a part or a portion of the translated task, e.g., at least a part or a portion of the model training or inference. Each of trainers 130, 132 and 134 may be a local computing device, e.g., connected to orchestrator 120 through a local network such as an Intranet, or a remote computing device, e.g., connected to orchestrator 120 through a wide network such as the Internet. The computing power may include processing power as well as available memory and other resources required for performing the translated task or portions thereof.

[0066] According to embodiments of the invention, each of trainers 130, 132 and 134 may include an RRE 131, 133, and 135, respectively, also referred to as a sandbox, that may enable interfacing with network 140, to obtain supplementary code that may be used for controlling the execution or performance of the portion of the task, e.g. engaging with orchestrator 120, reporting back to orchestrator 120, and/or control of the training process at the trainer level, e.g. the dynamic monitoring of free or available computing power at the trainer, and the dynamic utilization of the free or available computing power for purposes of training, etc., and later obtaining at least a part or a portion of translated tasks. RREs 131, 133, and 135 may be or may include an environment for execution of code (e.g., software code), and may enable execution of the obtained supplementary code that may be used for controlling the execution or performance of the portion of the tasks, and later execution of the obtained portion of the translated task. RREs 131, 133, and 135 may support execution of software code in at least one programming language, e.g., the programming language that can execute the translated task and supplementary code that may be used for controlling the execution or performance of the portion of the task. RREs 131, 133, and 135 may be configured to reduce, limit, prevent or block interaction between execution of the portions of the translated task and processes running on trainer 130, 132, 134 and between execution of the portions of the translated task and resources of trainer 130, 132, 134. According to some embodiments, RREs 131, 133, and 135 may be or may include an Internet browser, e.g., Google's Chrome®, Microsoft's Edge®, Apple's Safari®, etc., that is capable of executing the JavaScript programming language, or a container, e.g., Docker or Podman.

[0067] RREs 131, 133, and 135 may be or may include software applications that are configured to run or execute on varying computing devices (e.g., computers, video streaming devices, gaming consoles, smart phones, or dedicated hardware, such as computing device 700). RREs 131, 133, and 135 may be at least partially implemented by dedicated hardware. RREs 131, 133, and 135 may enable interface with the Internet, for purposes of receiving, transmitting, presenting and/or computing data from the Internet. RREs 131, 133, and 135 may be agnostic to the hardware and software environment in which they operate, e.g., the specific hardware and software environment of trainers 130, 132 and 134, in the sense that RREs 131, 133, and 135 may operate similarly, and provide users with similar services, across various hardware and software combinations. Specifically, RREs 131, 133, and 135 installed on computing devices (e.g., on trainers 130, 132 and 134) are agnostic to, and therefore compatible with hardware specifications of the computing devices, the operating systems (e.g., operating system 715 depicted in Fig. 7) of the computing devices, access schemes to the various hardware or software of the computing devices, and other services, processes, drivers, widgets, that may be executed by the computing device. RREs 131, 133, and 135 may all be able to natively execute at least one programming language. For example, all contemporary Internet browsers are such RREs and may be readily capable of executing HTML and JavaScript codes. Similarly, container engines may support programming languages such as Python and other programming languages as known in the art. RREs 131, 133, and 135 may be accompanied by a graphical user interface (GUI). While specific programming languages and environments are discussed, other languages and environments may be used.

[0068] According to embodiments of the invention, executing portions of the translated task on RREs 131, 133, and 135 may enable the portions of the translated task to be executed by trainers 130, 132 and 134 regardless of the specific hardware and software configuration of each trainer 130, 132 and 134, and without any preparation process required of users of these trainers 130, 132 and 134, while restricting possible interactions between that executed translated task and other processes running on trainers 130, 132 and 134 and other resources of trainers 130, 132 and 134. Thus, once each of RREs 131, 133, and 135 are installed and connected to the system (e.g., connected to a website of orchestrator 120) and operated by trainers 130, 132 and 134, respectively, no other preparations, installations, configurations and adjustments are required from trainers 130, 132 and 134 to execute a portion of the translated task and the supplementary code that may be used for executing the portion of the task, as disclosed herein.

[0069] According to embodiments of the invention, RREs 131, 133, and 135 may be standard units of software that include all of the necessary elements to execute software code or applications in any computing environment, e.g., on any of trainers 130, 132 and 134. For example, a container may include a package of an application software code together with dependencies such as libraries required to run the software code.

[0070] In some embodiments, RREs 131, 133, and 135 may be provided by the following components:

• A plurality of preset processes, bundled with RREs 131, 133, and 135, where each of the preset processes may be associated with an aspect of functionality and user experience of RREs 131, 133, and 135. For example, the preset processes may include a process for playing video and audio, a process for reading files or writing files to the file system (e.g., memory 720 and storage 730 depicted in Fig. 7), such as caching websites, storing cookies, saving images and the like, a process for communicating with a GPU, and many others. These preset processes are standardized and either come packaged with RREs 131, 133, and 135 or are used by RREs 131, 133, and 135. Once RREs 131, 133, and 135 are respectively installed on trainers 130, 132 and 134, the respective users (e.g. owners or renters) of trainers 130, 132 and 134 are inherently limited to experiencing all processes run by RREs 131, 133, and 135 within the boundaries, e.g., constraints and limitations, that are preset within the processes bundled with RREs 131, 133, and 135. Any external code that requests privileges exceeding those that are available within the preset processes bundled with RREs 131, 133, and 135 will be denied those privileges.

• Inter-Process-Communication (IPC) - a set of well-defined, sanitized messages that initiate the execution of the preset processes bundled with RREs 131, 133, and 135 or that RREs 131, 133, and 135 are configured to use. For example, a video process may initiate a pre-installed CODEC process using an IPC message.

• A renderer process, which is in charge of executing external code, e.g., website HTML and JavaScript.

[0071] The end-result of the structure of example RREs 131, 133, and 135 is that all functionality that can be implemented by the processes bundled with RREs 131, 133, and 135 may successfully run on or be executed by any given trainer 130, 132 and 134, and vice versa, functionality that cannot be implemented by the processes bundled with RREs 131, 133, and 135 will not run on or be executed by trainers 130, 132 and 134.

[0072] Orchestrator 120 may divide the translated task into one or more portions, and send each portion to one of trainers 130, 132 and 134. For example, for a task or assignment of training a NN model, a portion of the translated task may include a structure of the NN model or a part of the NN model, weights and/or initial weights, and a particular combination of execution parameters and hyperparameters such as optimizer, loss function/s, metrics to be evaluated by the model during training, optimization, testing and inference, batch size (e.g., the number of samples in each mini batch), steps per epoch (the number of batch iterations per epoch), number of epochs, types of callbacks to apply, augmentation parameters, and also a training and/or validation dataset or a portion thereof. The training dataset may be provided in chunks or mini batches or any smaller segments, as disclosed herein.
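By way of non-limiting illustration, the following Python sketch shows one possible structure for such a portion of a translated training task; the field names and values are illustrative assumptions only and are not mandated by the description above.

# Illustrative (hypothetical) structure of one portion of a translated training task.
portion = {
    "model_structure": {"layers": []},       # structure of the NN model, or a part of it
    "initial_weights": [],                   # weights and/or initial weights (plain lists)
    "hyperparameters": {
        "optimizer": "adam",
        "loss": "sparse_categorical_crossentropy",
        "metrics": ["accuracy"],             # metrics evaluated during training and testing
        "batch_size": 32,                    # number of samples in each mini batch
        "steps_per_epoch": 100,              # number of batch iterations per epoch
        "epochs": 5,
        "callbacks": ["early_stopping"],
        "augmentation": {"horizontal_flip": True},
    },
    # Training/validation data may follow separately, in chunks, mini batches or shards.
    "dataset_shards": ["shard-000", "shard-001"],
}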

[0073] One or more of trainers 130, 132 and 134 may execute the portion of the translated task assigned to them by orchestrator 120, within RREs 131, 133 and 135. Once completed, one or more of trainers 130, 132 and 134 may automatically send execution results to orchestrator 120. Orchestrator 120 may automatically aggregate, unify or integrate execution results from the different trainers 130, 132 and 134 to obtain the task results. Orchestrator 120 may convert the task results (e.g., the trained model weights or model predictions) to be readable in the original programming language of the task, and send the converted task results (e.g., the converted trained model or the converted model predictions) to client 110. For a training task, orchestrator 120 may obtain a trained model associated with the portion of the translated task performed by each respective trainer 130, 132, 134, convert the trained model back to be readable in the original programming language in which the original task was provided by client 110, or in another programming language as required by client 110, and send the final results to client 110. For an inference task, orchestrator 120 may unify the results to obtain the model prediction.

[0074] According to some embodiments, orchestrator 120 may evaluate the task, and establish or estimate the amount of computing power that is required to execute the translated task within a predetermined time frame. Orchestrator 120 may determine a configuration for executing the translated task within the predetermined time frame or without any predetermined timeframe. For example, orchestrator 120 may evaluate or otherwise have knowledge of the amount of available computing power at trainers 130, 132 and 134; for example, orchestrator 120 may, periodically or whenever needed, query trainers 130, 132 and 134 for their available computing power or obtain the available computing power from trainers 130, 132 and 134. Orchestrator 120 may evaluate the amount of computing power required to perform the translated task. Orchestrator 120 may divide the translated task into portions and allocate or assign the portions to one or more of trainers 130, 132 and 134 according to the amount of computing power that is required to execute the translated task within a predetermined time frame and the available computing power of trainers 130, 132 and 134. Orchestrator 120 may divide the translated task into portions requiring computing power that is equal to or below the available computing power of the assigned trainer 130, 132 or 134. In some embodiments, orchestrator 120 may divide the translated task into portions so that each portion may require the available computing power of one of trainers 130, 132 and 134 to be executed (or less). Thus, orchestrator 120 may assign to a trainer 130, 132 or 134 a portion that can be executed by the available computing power of that trainer 130, 132 or 134.
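By way of non-limiting illustration, the following Python sketch shows one simple way portions could be sized so that no portion exceeds the available computing power reported by its assigned trainer; the abstract "work unit" model and the function name are assumptions for the sketch only.

# Hedged sketch: dividing a translated task into portions no larger than each
# trainer's available computing power (power expressed in arbitrary work units).
def assign_portions(total_work, trainers_available_power):
    """Split total_work units among trainers so that no portion exceeds the
    available power reported by its assigned trainer."""
    assignments = {}
    remaining = total_work
    for trainer_id, available in sorted(
        trainers_available_power.items(), key=lambda kv: kv[1], reverse=True
    ):
        if remaining <= 0:
            break
        share = min(available, remaining)   # portion <= trainer's available power
        assignments[trainer_id] = share
        remaining -= share
    return assignments, remaining           # remaining > 0 means more trainers are needed

# Example: assign_portions(100, {"trainer_130": 40, "trainer_132": 35, "trainer_134": 50})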

[0075] According to some embodiments, orchestrator 120 may negotiate with client 110 the terms and conditions of performing the task, to arrive at an agreed SLA. The SLA and/or SLO may define the level of service, e.g., time frame, and other parameters, expected by a client 110 from orchestrator 120 for completion of the task. Orchestrator 120 may also determine and provide to client 110 a price for performing the task under the selected SLA. According to some embodiments, orchestrator 120 may negotiate the terms and conditions of performing the portions of the translated task with trainers 130, 132 and 134, including time frame, process, prices paid to trainers 130, 132 and 134, and other parameters.

[0076] According to some embodiments, orchestrator 120 may determine the SLA or the SLO for a task automatically based on considerations including but not limited to the number of trainers 130, 132 and 134 available, the free or available computing power of the available trainers 130, 132 and 134, the communications bandwidth (e.g., over networks 140) with trainers 130, 132 and 134, the price client 110 (e.g., the user) is willing to pay (the bid), the price that trainers 130, 132 and 134 require for using their computing power (the ask), the number of compute operations (e.g., float operations) required to perform the task by trainers 130, 132 and 134, etc. In some embodiments, orchestrator 120 may determine the SLA or SLO or price provided to client 110, and/or prices offered to trainers 130, 132 and 134 by an auction process between trainers 130, 132 and 134 competing for the right to execute the translated task or a portion thereof.
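By way of non-limiting illustration, a minimal Python sketch of such an auction, assuming a simple per-work-unit bid/ask matching rule (an assumption for the sketch, not a description of the actual mechanism), might look as follows.

# Hedged sketch of a simple auction: trainers' asks (price per work unit) are
# matched against the client's bid; only asks at or below the bid are accepted.
def run_auction(client_bid_per_unit, trainer_asks):
    """Return trainers selected in order of ascending ask, up to the client's bid."""
    accepted = []
    for trainer_id, ask in sorted(trainer_asks.items(), key=lambda kv: kv[1]):
        if ask <= client_bid_per_unit:
            accepted.append((trainer_id, ask))
    return accepted

# Example: run_auction(0.05, {"trainer_130": 0.04, "trainer_132": 0.06, "trainer_134": 0.03})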

[0077] According to some embodiments, orchestrator 120 may automatically monitor the free or available computing power and other performance indicators such as temperatures, available memory, available battery power, satisfaction of manufacturer usage limitations, at trainers 130, 132 and 134 while executing the portion of the translated task provided to them, e.g., the computing power of trainers 130, 132 and 134 that is not used for executing the respective portions of the translated tasks. In some embodiments, orchestrator 120 may automatically monitor the free or available computing power and satisfactory performance indicators as above, at trainers 130, 132 and 134, while not engaged in executing a portion of a translated task and may use this data to determine the SLA or SLO. In some embodiments, orchestrator 120 may select trainers for executing a portion of a translated task only from trainers 130, 132 and 134 with free or available computing power and satisfactory performance indicators as above, at any given present or past time frame, that are above respective thresholds. In some embodiments, orchestrator 120 may monitor the free or available computing power and satisfactory performance indicators as above, at trainers 130, 132 and 134, while each is engaged in executing a portion of a translated task and may use this information to automatically adjust the amount of computing power used by trainers 130, 132 and 134 to execute the portion of the translated task. For example, orchestrator 120 may dynamically adjust the portion of the translated task that is assigned to a trainer, and/or adjust the amount of computing power consumed by trainer 130 for executing the portion of the translated task. For example, orchestrator 120 may increase the amount of computing power consumed by trainer 130 to execute the portion of the translated task if the free or available computing power of trainer 130 increases, and vice versa, decrease the amount of computing power used by trainer 130 to execute the portion of the translated task if the free or available computing power of trainer 130 decreases. Orchestrator 120 may adjust the amount of computing power used by trainers 130, 132 and 134 to execute the portion of the translated task based on other performance indicators as well. For example, orchestrator 120 may increase the amount of computing power consumed by trainer 130 to execute the portion of the translated task if the performance indicators of trainer 130 increase or improve (e.g., temperature decreases), and vice versa, decrease the amount of computing power used by trainer 130 to execute the portion of the translated task if the performance indicators decrease or degrade (e.g., temperature increases).
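By way of non-limiting illustration, the following Python sketch shows one possible adjustment rule of the kind described above, assuming illustrative thresholds and step sizes; it is not intended as the actual control logic.

# Hedged sketch of dynamic adjustment: the compute consumed for a portion
# tracks the trainer's free power and backs off as temperature rises.
def adjust_compute(current_usage, free_power, temperature_c,
                   max_temperature_c=80.0, step=0.1):
    """Return the next compute allocation (in arbitrary units) for a trainer."""
    if temperature_c >= max_temperature_c or free_power <= 0:
        return max(0.0, current_usage - step)   # indicators degrade: back off
    if free_power > current_usage:
        return current_usage + step             # more free power: use more of it
    return min(current_usage, free_power)       # otherwise stay within free power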

[0078] In some embodiments, at least one of trainers 130, 132 and 134 may periodically, continually or continuously monitor the free or available computing power that the trainer is consuming for executing the portion of a translated task assigned to that trainer by orchestrator 120, as well as the state of the performance indicators as above, and may use that information to automatically adjust the amount of computing power that the trainer is consuming to execute the portion of the translated task. For example, trainer 130 may increase the amount of computing power used by trainer 130 to execute the portion of the translated task if the free or available computing power of trainer 130 increases; and vice versa, decrease the amount of computing power consumed by trainer 130 to execute the portion of the translated task if the free or available computing power of trainer 130 decreases. Trainer 130 may adjust the amount of computing power it uses to execute the portion of the translated task based on other performance indicators as well. As another example, trainer 130 may increase the amount of computing power consumed by trainer 130 to execute the portion of the translated task if the performance indicators of trainer 130 increase or improve (e.g. battery power or available memory increases), and vice versa, decrease the amount of computing power used by trainer 130 to execute the portion of the translated task if the performance indicators decrease or degrade (e.g. battery power or available memory decreases).

[0079] Training an ML model may require providing or feeding a training dataset into the ML model. A training dataset may include a plurality of samples. A sample may also be referred to as an instance, an observation, an input vector, a feature vector, a dataframe, and the like. A sample may include inputs that are fed into the ML model and an output that is compared to the prediction of the model to calculate an error, incorporated into a loss or cost function. In many applications the number of samples in a training dataset may be large, reaching even many millions of samples. Training an ML model is typically performed in epochs, where each epoch includes feeding the ML model with the entire training dataset (or a portion of the training dataset in the case of data parallelism training methods), and updating or adjusting the model weights and parameters thereupon. Training an ML model may include a plurality of epochs, where the same training dataset is used over and over again.

[0080] For example, some algorithms for training a NN model such as mini batch gradient descent may enable training the NN model using mini batches of data taken from the training dataset (or the training dataset portion in case of data parallelism training methods), instead of using the entire training dataset at once. When using mini batch gradient descent to train a NN model, the training dataset may be divided into a plurality of mini batches. Then, each sample in the mini batch may be provided as input to the NN model and a prediction may be made. At the end of a training session using the mini batch, the resulting predictions are compared to the expected output variables and a mean error, objective function, loss or cost, and a respective gradient are calculated. The mean error, loss or cost, and respective gradient are then used to train the NN model, e.g., to adjust the model weights, for example using backpropagation and/or other training methods. A mini batch of data may include more than one sample and less than the whole dataset (or the portion of it as noted above). The size of the mini batch may also be a hyperparameter that defines the number of samples to work through before updating the internal model weights.
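By way of non-limiting illustration, a minimal numpy sketch of mini batch gradient descent for a simple linear model, showing the per-mini-batch prediction, error, gradient and weight-update cycle described above, might look as follows; the model and loss are illustrative assumptions only.

# Minimal sketch of mini batch gradient descent for a linear model y ~ x @ w.
import numpy as np

def minibatch_gd(x, y, batch_size=10, epochs=3, lr=0.01):
    rng = np.random.default_rng(0)
    w = np.zeros(x.shape[1])
    for _ in range(epochs):                              # one pass over the data = one epoch
        order = rng.permutation(len(x))
        for start in range(0, len(x), batch_size):
            idx = order[start:start + batch_size]        # one mini batch of samples
            pred = x[idx] @ w                            # predictions for the mini batch
            error = pred - y[idx]                        # compared to the expected outputs
            grad = x[idx].T @ error / len(idx)           # mean gradient of the squared loss
            w -= lr * grad                               # adjust the model weights
    return w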

[0081] According to some embodiments, orchestrator 120 may divide or partition the training data associated with or included in a portion of the translated task that is provided to a single trainer 130 into one or more mini batches, where each mini batch may include more than one sample and less than the number of samples included in the training dataset associated with the translated task. In some embodiments, client 110 may divide the training dataset into one or more mini batches. For example, the size or sizes of the mini batches may be provided to orchestrator 120 as part of the training parameters. According to some embodiments, orchestrator 120 may start transmission of a portion of the translated task to trainer 130 by first transferring the model structure, weights, and training parameters. Next, orchestrator 120 may send, transfer, or transmit, either directly or in a mesh architecture, the training data to the trainer 130 in data shares, chunks or shards, where each data shard may include a single mini batch, a plurality of mini batches, or less than one mini batch. A mini batch may refer to a segment of the training data that may be sufficient for initiation of training.

[0082] Trainer 130 may begin training the NN model once the first mini batch is obtained and may not have to wait for the transfer of the entire training dataset before starting to train the NN model. This may considerably reduce latencies in system 100, specifically the time taken from starting to transmit the translated task to trainer 130 to the time of starting to train the NN model by trainer 130 may be reduced. Thus, the overall time required for training the NN model may be decreased.

[0083] In some embodiments, a first shard of data may include a single mini batch, while subsequent shards of data may include more than one mini batch. In some embodiments, the size of the subsequent data shards, or the number of mini batches included in each data shard sent to a trainer 130, may be determined or adjusted based on the speed of communication between orchestrator 120 and trainer 130 (e.g., the size or number or both may increase as the speed of communication increases and decrease as the speed of communication decreases). In some embodiments, the size of the subsequent data shards, or the number of mini batches included in each data shard sent to a trainer 130, may be adjusted or determined based on the progress of the execution of the portion of the translated task by that trainer 130 (e.g., the size or number or both may increase if the progress of the execution is above a threshold and decrease otherwise).

[0084] In some embodiments, client 110 may transfer or transmit the training dataset to orchestrator 120 in data shards, where each data shard includes a single mini batch, a plurality of mini batches, or less than one mini batch, enabling orchestrator 120 to start transmission of mini batches of data to trainer 130, before receiving the entire training dataset from client 110, again reducing latencies in system 100. Specifically, the time from starting to transmit the task from client 110 to orchestrator 120 (or directly to trainer 130 in a mesh architecture), to the time of starting to transmit the translated task from orchestrator 120 to trainer 130 may be reduced, again decreasing the overall time required for training the ML model. In some embodiments, a first shard of data may include a single mini batch, while subsequent shards of data may be larger or include more than one mini batch or both. In some embodiments, the size of the subsequent shards of data or number of mini batches included in each data shard, or both, sent to orchestrator 120 from client 110 may be adjusted or determined based on the speed of communication between client 110 and orchestrator 120.

[0085] According to some embodiments, orchestrator 120 may send, transfer, or transmit the training dataset to the trainer 130 in data shards, where each shard includes less than a mini batch. In this case, trainer 130 may automatically examine if a received chunk of training data includes a complete mini batch. If yes, trainer 130 may automatically start training the NN model using the received mini batch. If not, trainer 130 may automatically request another chunk from orchestrator 120, or simply wait until another chunk is obtained from orchestrator 120 and start training the NN model once a full mini batch is obtained. Obtaining data shards of training data may continue until the entire training dataset required for completing the portion of the translated task is transmitted to trainer 130.
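By way of non-limiting illustration, the trainer-side check described above might be sketched in Python as follows; the class and method names are assumptions for the sketch only.

# Hedged sketch: shards may be smaller than a mini batch, so samples are
# accumulated until a full mini batch is available before training continues.
class ShardBuffer:
    def __init__(self, mini_batch_size):
        self.mini_batch_size = mini_batch_size
        self.samples = []

    def add_shard(self, shard_samples):
        """Append the samples contained in a newly received data shard."""
        self.samples.extend(shard_samples)

    def next_mini_batch(self):
        """Return one complete mini batch, or None if more shards are needed
        (in which case the trainer may request or wait for another chunk)."""
        if len(self.samples) < self.mini_batch_size:
            return None
        batch = self.samples[:self.mini_batch_size]
        del self.samples[:self.mini_batch_size]
        return batch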

[0086] For example, in a complete training dataset that includes 2,000,000 samples, where a mini batch may include 10 samples, orchestrator 120 may start transmission of data shards that may include mini batches of samples to trainer 130 after receiving only 10 samples, and before receiving the entire 2,000,000 samples. Similarly, trainer 130 may automatically start to train the NN model after receiving the first 10 samples only (e.g., a single mini batch), and before receiving the entire 2,000,000 samples included in the training dataset. The mini batches may be stored by trainer 130 for the next epochs as described below.

[0087] According to some embodiments, trainer 130, after having received the full dataset (or portion thereof) assigned to trainer 130, may automatically save the received dataset to a local storage of trainer 130, thus eliminating the need to transfer the dataset again as the translated task is being executed thereby.

[0088] Reference is now made to Fig. 2 which schematically illustrates the dataflow between client 110, orchestrator 120 and trainer 130, according to embodiments of the invention. As can be seen in Fig. 2, client 110 may generate or initiate a task 210. For example, task 210 may include training an ML model and thus may include input model 221, including a structure 222 of the model and weights of the model 223 (initial or other weights), training dataset 224, and training parameters (also referred to herein as hyperparameters) 225, etc., and may be provided in any programming language that is supported by orchestrator 120, e.g., Python or JavaScript. Orchestrator 120 may obtain the task 210 and automatically, by translation module 220, extract the task gist 228 including structure 222, weights of the model 223, training dataset 224, and hyperparameters 225 to generate a translated task 226. The translated task 226 may be generated by translating task 210 or task gist 228 into a code that is supported by and executable by RRE 131, e.g., Python or JavaScript. According to some embodiments, translating a task by translation module 220 may include extracting from the task as obtained from client 110, the task gist or essence 228 including the model structure 222, model weights 223, hyperparameters 225, and datasets 224. In some embodiments translation module 220 may be implemented in or executed by client 110. The task gist 228 may then be transformed into a format and/or programming language that is readable by a trainer 130. Translated task 226 or task gist 228 may be divided by orchestrator 120 into portions, and each portion may be provided by orchestrator 120 to each of RRE 131 of a trainer 130, RRE 133 of trainer 132, RRE 135 of trainer 134, each in code that is executable by the respective RRE 131, 133 and 135, etc. RRE 131 of trainer 130 may automatically download 232 the portion of the task as disclosed herein, train 233 the ML model using the model structure, model weights, training dataset and training parameters included in the respective portion of translated task 226 or task gist 228, and generate a trained model or partially trained models 234 (e.g., provide a structure 242 and weights 243 of the trained model or the partially trained models, and other relevant data such as training assignment information artifacts, e.g., total epoch time, loss values at each specific epoch or step, etc., if applicable, together forming the translated task results 235). According to embodiments of the invention, downloading 232, training 233 and uploading 236 may be implemented within RRE 131. For example, trainer 130 may access the website operated by orchestrator 120 by feeding a link to RRE 131, e.g., a browser, and the interaction may be similar to the interaction with a regular website.

[0089] RRE 131 of trainer 130 may automatically return (e.g., via network or Internet 140) translated task results 235 to orchestrator 120. Output conversion module 240 of orchestrator 120 may automatically convert the translated task results, obtained from the trainers 130, 132, 134, into any programming language or format as required by client 110 and supported by orchestrator 120, to obtain converted trained model 241. Orchestrator 120 may automatically provide or transmit the converted trained model 241 to client 110.

[0090] Reference is made to Fig. 3, which schematically illustrates a system 300, according to embodiments of the invention. Components in system 300 are similar to those of system 100 depicted in Fig. 1, with the addition of a parameter server 150. Parameter server 150 may be implemented in a separate server, as a block in orchestrator 120 or by a trainer, e.g., one of trainers 130, 132 and 134 may operate as parameter server 150. Parameter server 150 may help system 300 to support data parallelism, or model parallelism combined with data parallelism.

[0091] According to some embodiments, the translated task may be executed by system 300 using a data parallelism method. For implementing data parallelism, results of portions of the translated task may be sent by each of trainers 130, 132 and 134 to parameter server 150. For example, results of the portions of the translated task may be sent by each of trainers 130, 132 and 134 to parameter server 150 after each epoch. Parameter server 150 may unify, aggregate, integrate, or combine the results of the portions of the translated task and send updated versions of the portions of the translated task to trainers 130, 132 and 134. For example, parameter server 150 may combine results from a plurality of trainers 130, 132 and 134 to update weights of a NN model and provide the updated model weights to trainers 130, 132 and 134, for performing the next epoch with the updated model. After completing all epochs, e.g., when a criterion is satisfied, parameter server 150 may unify, aggregate, integrate, or combine the results from trainers 130, 132 and 134, to arrive at the final trained model (e.g., the final model structure and weights), and may provide the trained model back to orchestrator 120. The criterion may include one or more predefined metrics of performance or a number of epochs, and may be defined by client 110.
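By way of non-limiting illustration, one possible way a parameter server could unify per-epoch results is by averaging the weight tensors returned by the trainers (akin to federated averaging); the Python sketch below assumes this averaging scheme, which is only one of the possible unification methods.

# Hedged sketch of unifying per-epoch results by averaging the trainers' weights.
import numpy as np

def unify_weights(trainer_weight_lists):
    """trainer_weight_lists: a list of per-trainer weight lists, each a list of
    numpy arrays with matching shapes. Returns the element-wise average, which
    could then be sent back to the trainers for the next epoch."""
    n = len(trainer_weight_lists)
    return [
        sum(weights[i] for weights in trainer_weight_lists) / n
        for i in range(len(trainer_weight_lists[0]))
    ]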

[0092] Reference is now made to Fig. 4 which schematically illustrates an example dataflow between client 110, orchestrator 120, trainer 130 and parameter server 150, according to embodiments of the invention. Parts of the dataflow presented in Fig. 4 are similar to those presented in Fig. 2. In Fig. 4, however, trainer 132 implements parameter server 150 within its RRE 133. Thus, trainer 130 (and other trainers that are taking part in training the same ML or NN model to which the portion of the translated task sent to them belongs) may send training results 235, for example, including a partially trained model 434 and other relevant data, after completing each epoch, to parameter server 150. Parameter server 150 may unify the results obtained from the plurality of trainers (e.g., trainer 130 and others). For example, parameter server 150 may update the weights of an entire NN model, and may return the updated parameters (or parts thereof, as required) to trainer 130 (and other trainers that are taking part in training the same ML or NN model). After training is completed, parameter server 150 may unify, aggregate, integrate, or combine the results from trainers 130, 132 and 134, to arrive at the final trained model 235 (e.g., the final model structure and weights), and may provide or upload 236 the trained model back to orchestrator 120.

[0093] Fig. 5 is a flowchart of a method for providing decentralized computing resources for training an ML model, according to some embodiments of the present invention. While in some embodiments the operations of Fig. 5 are carried out using systems as shown in Figs. 1, 2 and 7, in other embodiments other systems and equipment can be used. In operation 300, a processor (e.g., processor 705 depicted in Fig. 7) may obtain a task, e.g., from a client such as client 110. The task may include training an ML or NN model and may be provided in any supported programming language or format. The task may include the input model 221, including a structure 222 of the model, and weights of the model 223 (initial or other weights), training dataset 224, and training hyperparameters 225, etc. The processor may agree or automatically commit to an SLA and/or SLO with the client. In operation 310, the processor may automatically translate the task such that it is supported and executable by the RREs of a plurality of trainers, e.g., executing Python or JavaScript, to generate a translated task. According to some embodiments, translating a task may include extracting, from the task as obtained from the client, the task gist or essence including the model structure, model weights, hyperparameters, and datasets. The structure, model weights, hyperparameters, and datasets may then be transformed into a format and/or programming language that is readable by a trainer.

[0094] In operation 320, the processor may automatically monitor or obtain (e.g., from the trainers) the free or available computing power and/or other performance indicators available at the plurality of trainers. In some embodiments, the processor may automatically determine the SLA based on the free or available computing power and/or other performance indicators available at the plurality of trainers. In operation 330, the processor may automatically divide the translated task into portions and assign each portion to one of the plurality of trainers. For example, division, and then assignment of portions to trainers, may be performed based on the available computing power and/or other performance indicators at the trainers, the SLA, the SLO, trainer's hardware, the communication speed and bandwidth available to each of the trainers, and other parameters. In operation 340, the processor may automatically transmit each portion to the selected trainer. The portion may be transmitted in whole or in chunks (data-shards). For example, the processor may transmit the model structure, weights and hyperparameters first, and only then chunks of training data, organized in data-shards that may include training mini-batches. In operation 350, the selected trainers may automatically execute the portion of the translated task assigned to them, in an RRE. In operation 360, the trainers may transmit the execution results back to the processor. In optional operation 370, the processor (e.g., a parameter server) may unify, combine or aggregate the execution results obtained from the plurality of trainers, for example to support data parallelism or model parallelism or both. In some embodiments, operation 370 may be repeated after each epoch, and the updated model, e.g., the updated model parameters, may be sent back to the trainers for further training, as indicated in operation 372, until arriving at the final trained ML model. In operation 380, the processor may automatically convert the trained model, including weights, model architecture, hyperparameters and any training task information artifacts (referred to as the converted trained model) such that it is readable in a programming language or format agreed upon with the client that has provided the task in operation 300. In operation 390, the processor may provide or automatically transmit the converted trained model to the client from which the task was obtained in operation 300. In some embodiments, operations 350, 360, 370 and 372 (if applicable) may be repeated until a criterion (e.g., one or more predefined metrics of performance or a number of epochs) is satisfied.

[0095] Fig. 6 is a flowchart of a method for providing decentralized computing resources for an ML model inference, according to some embodiments of the present invention. While in some embodiments the operations of Fig. 6 are carried out using systems as shown in Fig. 1, in other embodiments other systems and equipment can be used. In operation 600, a processor (e.g., processor 705 depicted in Fig. 7) may obtain a task, e.g., from a client such as client 110. The task may include an ML or NN model inference and may be provided in any supported programming language or format. The task may include the input model 221, including a structure 222 of the model, and weights of the model 223, input data, etc. The processor may agree or automatically commit to an SLA and SLO with the client.
In operation 610, the processor may automatically translate the task such that it is supported and executable in a programming language that is supported by an RRE of a plurality of trainers, e.g., in JavaScript or Python, to generate a translated task. According to some embodiments, translating a task by translation module 220 may include extracting from the task as obtained from client 110, the task gist or essence including the model structure, model weights, and input data. The structure, model weights and input data may then be transformed into a format and/or programming language that is readable by RRE 131 executed by trainer 130.

[0096] In operation 620, the processor may automatically monitor or obtain (e.g., from the trainers) the free or available computing power and/or other performance indicators available at the plurality of trainers. In some embodiments, the processor may automatically determine the SLA or SLO based on the free or available computing power and/or other performance indicators available at the plurality of trainers. In operation 630, the processor may automatically divide the translated task into portions and assign each portion to one of the plurality of trainers. For example, assignment of portions to trainers may be performed based on the available computing power and/or other performance indicators at the trainers, the SLA, the SLO, trainer's hardware, the communication speed and bandwidth available to each of the trainers, and other parameters. For example, portions may be assigned to trainers such that each portion is executable by the code execution environment or RRE of the assigned trainer (e.g., the assigned computing device) using the available computing power of the assigned trainer. In operation 640, the processor may automatically transmit each portion to the selected trainer. In operation 650, the selected trainers may automatically execute the portion of the translated task assigned to them, in an RRE. In operation 660, the trainers may transmit the execution results (e.g., model predictions) back to the processor. In optional operation 670, the processor may unify, combine or aggregate the execution results obtained from the plurality of trainers. In operation 680, the processor may automatically convert the model predictions to a code or format readable in a programming language agreed upon with the client that has provided the task in operation 600. In operation 690, the processor may provide or automatically transmit the converted model predictions to the client from which the task was obtained in operation 600.

[0097] Fig. 7 illustrates an example computing device according to an embodiment of the invention. Various components such as clients 110, 112, 114, orchestrator 120 and trainers 130, 132 and 134 may be or include computing device 700, or may include components such as shown in Fig. 7. For example, a first computing device 700 with a processor 705 may be used to translate a computing task and distribute the translated task among trainers 130, 132 and 134.

[0098] Computing device 700 may include a processor 705 that may be, for example, a CPU, a GPU, a TPU, a DSP, a chip, a field-programmable gate array (FPGA), an Application Specific Integrated Circuit (ASIC), a system on a chip (SoC), or any suitable computing or computational device, an operating system 715, a memory 720, a storage 730, input devices 735 and output devices 740. Processor 705 may be or include one or more processors, etc., co-located or distributed. Computing device 700 may be, for example, a workstation, personal computer, media streaming device, smart TV, smart phone, tablet, set top box, gaming console, car infotainment system or may be at least partially implemented by a remote server (e.g., in the "cloud").

[0099] Operating system 715 may be or may include any code segment designed and/or configured to perform tasks involving coordination, scheduling, arbitration, supervising, controlling or otherwise managing operation of computing device 700, for example. Operating system 715 may be a commercial operating system. Memory 720 may be or may include, for example, a Random Access Memory (RAM), a read only memory (ROM), a Dynamic RAM (DRAM), a Synchronous DRAM (SD-RAM), a double data rate (DDR) memory chip, a Flash memory, a volatile memory, a non-volatile memory, a cache memory, a buffer, a short term memory unit, a long term memory unit, or other suitable memory units or storage units. Memory 720 may be or may include a plurality of, possibly different memory units.

[00100] Executable code 725 may be any executable code, e.g., an application, a program, a process, task or script. Executable code 725 may be executed by processor 705 possibly under control of operating system 715. For example, executable code 725 may be or include code applicable for translating a computing task and distributing the translated task among trainers 130, 132 and 134. As another example executable code 725 may be or include code applicable for converting the translated task results received back from trainers 130, 132 and 134, or from parameter server 150, to a programming language or format required by client 110. In some embodiments, more than one computing device 700 may be used. For example, a plurality of computing devices that include components similar to those included in computing device 700 may be connected to a network and used as a system.

[00101] Storage 730 may be or may include, for example, a hard disk drive, a floppy disk drive, a Compact Disk (CD) drive, a CD-Recordable (CD-R) drive, a universal serial bus (USB) device or other suitable removable and/or fixed storage unit. In some embodiments, some of the components shown in Fig. 7 may be omitted. For example, memory 720 may be a non-volatile memory having the storage capacity of storage 730. Accordingly, although shown as a separate component, storage 730 may be embedded or included in memory 720.

[00102] Input devices 735 may be or may include a mouse, a keyboard, a touch screen or pad or any suitable input device. It will be recognized that any suitable number of input devices may be operatively connected to computing device 700 as shown by block 735. Output devices 740 may include one or more displays, speakers and/or any other suitable output devices. It will be recognized that any suitable number of output devices may be operatively connected to computing device 700 as shown by block 740. Any applicable input/output (I/O) devices may be connected to computing device 700 as shown by blocks 735 and 740. For example, a wired or wireless network interface card (NIC), a modem, printer or facsimile machine, a universal serial bus (USB) device or external hard drive may be included in input devices 735 and/or output devices 740. Network interface 750 may enable device 700 to communicate with one or more other computers or networks. For example, network interface 750 may include a WiFi or Bluetooth device or connection, a connection to an intranet or the internet, an antenna etc.

[00103] Embodiments described in this disclosure may include the use of a special purpose or general-purpose computer including various computer hardware or software modules, as discussed in greater detail below.

[00104] Embodiments within the scope of this disclosure also include computer-readable media, or non-transitory computer storage medium, for carrying or having computer-executable instructions or data structures stored thereon. The instructions when executed may cause the processor to carry out embodiments of the invention. Such computer-readable media, or computer storage medium, can be any available media that can be accessed by a general purpose or special purpose computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to carry or store desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer. When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination of hardwired or wireless) to a computer, the computer properly views the connection as a computer-readable medium. Thus, any such connection is properly termed a computer-readable medium. Combinations of the above should also be included within the scope of computer-readable media.

[00105] Computer-executable instructions comprise, for example, instructions and data which cause a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

[00106] As used herein, the term "module" or "component" can refer to software objects or routines that execute on the computing system. The different components, modules, engines, and services described herein may be implemented as objects or processes that execute on the computing system (e.g., as separate threads). While the system and methods described herein are preferably implemented in software, implementations in hardware or a combination of software and hardware are also possible and contemplated. In this description, a "computer" may be any computing system as previously defined herein, or any module or combination of modules running on a computing system or combining together to operate as a computer system.


[00108] For the processes and/or methods disclosed, the functions performed in the processes and methods may be implemented in differing order as may be indicated by context. Furthermore, the outlined steps and operations are only provided as examples, and some of the steps and operations may be optional, combined into fewer steps and operations, or expanded into additional steps and operations.

[00109] One skilled in the art will realize the invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The foregoing embodiments are therefore to be considered in all respects illustrative rather than limiting of the invention described herein. Scope of the invention is thus indicated by the appended claims, rather than by the foregoing description, and all changes that come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

[00110] In the foregoing detailed description, numerous specific details are set forth in order to provide an understanding of the invention. However, it will be understood by those skilled in the art that the invention can be practiced without these specific details. In other instances, well-known methods, procedures, and components, modules, units and/or circuits have not been described in detail so as not to obscure the invention. Some features or elements described with respect to one embodiment can be combined with features or elements described with respect to other embodiments.

[00111] Although embodiments of the invention are not limited in this regard, discussions utilizing terms such as, for example, "network", "processing," "computing," "calculating," "mesh", "determining," "transfer", "establish", "analyzing", "checking", or the like, can refer to operation(s) and/or process(es) of a computer, a computing platform, a computing system, or other electronic computing device, or related network thereof, that manipulate and/or transform data represented as physical (e.g., electronic) quantities within the computers' registers and/or memories into other data similarly represented as physical quantities within computers' registers and/or memories or other information non-transitory storage medium that can store instructions to perform operations and/or processes.

[00112] The term set when used herein can include one or more items. Unless explicitly stated, the method embodiments described herein are not constrained to a particular order or sequence. Additionally, some of the described method embodiments or elements thereof can occur or be performed simultaneously, at the same point in time, or concurrently.