KRISHNAN SUBRAMANIAM VENKATRAMAN (US)
KARANASOS KONSTANTINOS (US)
CURINO CARLO (US)
TARTE ISHA (US)
DARBHA SUDHIR (US)
CLAIMS
1. A method for automatically tuning a cloud environment comprising a plurality of servers, the method comprising: accessing telemetric data of the cloud environment; accessing a plurality of operational parameters for the cloud environment; modeling a group of the operational parameters for operating the cloud environment; testing the modeled group of the operational parameters in a subset of the servers; selecting the modeled group of the operational parameters for use in tuning the cloud environment based on said testing; and deploying the modeled group of the operational parameters into the cloud environment.
2. The method of claim 1, further comprising building one or more machine learning (ML) models from the accessed telemetric data.
3. The method of claim 2, further comprising applying the one or more ML models to a subset group of the plurality of servers to calculate the performance metrics of the cloud infrastructure.
4. The method of claim 3, wherein the telemetric data comprises total bytes read by the cloud infrastructure per a quantity of time.
5. The method of any of claims 1-4, wherein the telemetric data comprises a ratio based on a total amount of data read and a total execution time per machine of the cloud infrastructure.
6. The method of any of claims 1-5, wherein the telemetric data comprises a ratio based on a total amount of data read and a total central processing unit (CPU) time per machine of the cloud infrastructure.
7. The method of any of claims 1-6, wherein the telemetric data comprises an average time running containers in the cloud infrastructure.
8. The method of any of claims 1-7, wherein the cloud infrastructure is a large-scale cloud-computing environment that processes at least an exabyte of data on a daily basis.
9. A system for automatically tuning a cloud environment comprising a plurality of servers, the system comprising: memory embodied with executable instructions for performing said tuning of the cloud infrastructure; and one or more processors programmed for: calculating performance metrics of the cloud infrastructure based on the telemetric data, generating one or more configurations of cloud operational parameters for the cloud environment based on the performance metrics, testing the one or more configurations of the cloud operational parameters in a subset of the one or more servers, selecting an optimal set of the cloud parameters based on performance metrics of the subset of the one or more servers once the operational parameters are applied, and deploying the optimal set of the cloud operational parameters to one or more servers of the cloud environment.
10. The system of claim 9, wherein the telemetric data is collected daily.
11. The system of any of claims 9-10, wherein the one or more processors are programmed for building one or more machine learning (ML) models from the accessed telemetric data.
12. The system of any of claims 9-11, wherein the one or more processors are programmed for applying the one or more ML models to a subset group of the plurality of machines to calculate the performance metrics of the cloud infrastructure.
13. The system of any of claims 9-12, wherein the telemetric data comprises total bytes read by the cloud infrastructure per a quantity of time.
14. One or more computer-readable memory devices embodied with modules that are executable by one or more processors for automatically tuning a cloud environment comprising a plurality of servers, the modules comprising: a performance monitor configured to join telemetric data from various sources and calculate performance metrics of the cloud infrastructure based on the telemetric data; a modeler configured to generate one or more configurations of cloud operational parameters for the cloud environment based on the performance metrics; an experimenter configured to test the one or more configurations of the cloud operational parameters in a subset of the one or more servers and select an optimal set of the cloud parameters based on performance metrics of the subset of the one or more servers once the operational parameters are applied; and a deployment tool configured to deploy the optimal set of the cloud operational parameters to one or more servers of the cloud environment.
15. The one or more computer-readable memory devices of claim 14, wherein the modeler applies an artificial intelligence (AI) algorithm to the performance metrics for generating the one or more configurations of cloud operational parameters.
[0072] The above two observations are important building blocks to capture the relationship between configuration changes and the changes in the objective functions (or constraints) that embodiments hope to optimize. To do so, some implementations are configured to perform the following operations: (1) based on the set of parameters being tuned, identify the set(s) of metrics that will be directly impacted; (2) create (or build) ML models to understand how this set of metrics affects the others, especially the ones that relate to the objective functions/constraints; and (3) based on the resulting formulation, perform optimization to pick the optimal configuration.
[0073] For the development of the ML models, the dynamics between the different sets of performance metrics remain the same, even under different operational parameters 216. Those dynamics reflect the mechanics of the infrastructure, and capturing those relationships becomes key for modeling and prediction. For instance, in FIG. 5, even with different levels of CPU utilization or different workload levels, the relationship between the resulting throughput and the CPU utilization level may be expressed with the same formulation for each group of servers 201 with a particular software/hardware combination. This relationship is not affected by the external configuration, such as YARN configuration settings. Such fundamentals are used to predict the resulting performance under new configurations to avoid the need for experiments.
[0074] Based on observational data, sets of ML models are built by the modeler 222, such as g_k(·), h_k(·) and f_k(·) for each software configuration (SC) and hardware (SKU) combination k, to capture the relationships between the different sets of metrics, such as (1) the number of running containers versus the CPU utilization level; (2) the CPU utilization level versus the number of tasks finished; and/or (3) the CPU utilization level versus the task latency, respectively, using the following Equations (1)-(6):

x_k = g_k(m_k), ∀k = 1, 2, 3, …, K, (1)
x'_k = g_k(m'_k), ∀k = 1, 2, 3, …, K, (2)
l_k = h_k(x_k), ∀k = 1, 2, 3, …, K, (3)
l'_k = h_k(x'_k), ∀k = 1, 2, 3, …, K, (4)
w_k = f_k(x_k), ∀k = 1, 2, 3, …, K, (5)
w'_k = f_k(x'_k), ∀k = 1, 2, 3, …, K, (6)

where:
k: the index for the SC-SKU combination, k = 1, 2, 3, …, K;
m_k: the number of running containers (simultaneously) per machine with SC-SKU combination k;
m'_k: the original number of running containers per machine with SC-SKU combination k;
x_k and x'_k: the CPU utilization levels for machines with SC-SKU combination k, given the numbers of running containers m_k and m'_k respectively;
l_k and l'_k: the numbers of vertices finished on a machine with SC-SKU combination k, given the CPU utilization levels x_k and x'_k respectively; and
w_k and w'_k: the average vertex latencies for machines with SC-SKU combination k, given CPU utilization levels x_k and x'_k respectively.
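The mappings g_k, h_k and f_k can be approximated with simple per-combination regressions. The sketch below fits them with ordinary least squares on synthetic observations for a single SC-SKU combination; all numbers are illustrative, not measurements from the disclosed system.

```python
def fit_linear(xs, ys):
    """Closed-form simple linear regression; returns a predictor y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda v: a + b * v

# Synthetic daily observations for one SC-SKU combination k (illustrative):
m_k = [4, 6, 8, 10, 12]                  # running containers per machine
x_k = [30, 44, 58, 72, 86]               # CPU utilization level (%)
l_k = [100, 140, 180, 220, 260]          # vertices (tasks) finished
w_k = [10.0, 11.0, 12.5, 14.5, 17.0]     # average vertex latency (s)

g = fit_linear(m_k, x_k)   # Eqs. (1)-(2): containers -> CPU utilization
h = fit_linear(x_k, l_k)   # Eqs. (3)-(4): utilization -> vertices finished
f = fit_linear(x_k, w_k)   # Eqs. (5)-(6): utilization -> vertex latency

# Predict performance under a hypothetical new configuration of 9 containers:
print(g(9))   # → 65.0 (predicted CPU utilization)
```

Given such predictors, candidate configurations can be evaluated without deploying them.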
[0075] In some implementations, these mappings between sets of performance metrics remain the same as in Equations (1)-(6) regardless of configuration changes, different utilization levels x_k or x'_k, or running different amounts of workloads (measured by the number of running containers m_k or m'_k). In other implementations, the modeler 222 uses different ML models that involve a larger set of metrics of interest, such as the resource utilization of SSD, network bandwidth, or the like. In still other implementations, the modeler 222 uses regression models as the predictors, such as linear regression (LR), support vector machines (SVM), or deep neural nets (DNN). The modeler 222 may run different objective functions and constraints with respect to the goals of a given application, and the corresponding ML models for the directly impacted performance metrics and for the metrics related to the objective functions/constraints are used.
[0076] For the application of tuning the maximum running containers in YARN, embodiments maximize the total number of running containers subject to the same overall average task latency at the cluster level as the current situation. Therefore, the directly impacted performance metric is the number of running containers on the machine. Embodiments maintain the same level of task latency (cluster-wide average) as the constraint. Optimization may then be performed with a closed-form objective function according to the following:
maximize Σ_{k=1}^{K} n_k m_k (7)
subject to W ≤ W', together with Equations (1)-(6),

where:
n_k: the number of machines in the cluster for machine function-SKU combination k; and
W and W': the overall average task latency for the full cluster, given CPU utilization levels x_k and x'_k respectively, calculated as the weighted average of task latency running on different groups of machines.
The optimal solution of the optimization (m*_k, ∀k = 1, 2, 3, …, K) indicates the optimal workload distribution across the different groups of machines. Based on the changes of the workload distribution, embodiments modify the configuration for the maximum running containers accordingly, increasing or decreasing it for different software-hardware (SC-SKU) combinations.
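A brute-force sketch of this optimization: enumerate candidate per-group container counts and keep the assignment that maximizes the total number of running containers while the cluster-wide average task latency stays at or below the current level. The linear models, machine counts, and current settings below are illustrative stand-ins for the fitted g_k and f_k, and latency is weighted by machine count rather than by task count for brevity.

```python
from itertools import product

# Illustrative per-SC-SKU models: G[k] (containers -> CPU %) and
# F[k] (CPU % -> average task latency, s); group 2 is the slower hardware.
G = {1: lambda m: 7.0 * m + 2.0, 2: lambda m: 9.0 * m + 5.0}
F = {1: lambda x: 8.0 + 0.10 * x, 2: lambda x: 8.0 + 0.20 * x}
N = {1: 1000, 2: 800}        # n_k: machines per SC-SKU combination
M_CURRENT = {1: 8, 2: 8}     # m'_k: current containers per machine

def avg_latency(m):
    """Cluster-wide average task latency, weighted by machine counts."""
    return sum(N[k] * F[k](G[k](m[k])) for k in N) / sum(N.values())

baseline = avg_latency(M_CURRENT)
best, best_total = None, -1
for combo in product(range(4, 13), repeat=len(N)):   # candidate m_k values
    m = dict(zip(sorted(N), combo))
    total = sum(N[k] * m[k] for k in N)              # objective: total containers
    if avg_latency(m) <= baseline and total > best_total:
        best, best_total = m, total

print(best, best_total)   # → {1: 12, 2: 6} 16800
```

Consistent with the discussion above, the resulting assignment shifts containers from the slower group to the faster one.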
[0077] The tasks running on slower servers 201 are more likely to slow down a job. Re-balancing of the workloads suggested by the modeler 222 reduces the workload skew and shifts traffic from slower machines to faster servers 201 to improve the overall efficiency. With the increased utilization level on faster machines, mild performance degradation may be experienced; however, those tasks are less likely to be on the critical path of a job (the set of slowest tasks for each stage of the execution) that directly impacts the job-level latency. Therefore, even though the constraint for the optimization formulation ensures the same average task-level latency, the automated tuning improves the performance of the straggler tasks and is very likely to improve the job-level latency by reducing the variation of task-level latency.

[0078] FIGs. 6A-6D show a set of calibrated ML models depicting the running containers and the task execution time in seconds versus the CPU utilization level. Each small dot corresponds to an observation aggregated at the daily level for a machine. The line shows the model estimation. The large dot in the center of each figure indicates the median level of the variables across all observations. FIGs. 6A-6D also show the optimization results in terms of the suggested shift of current workloads (calculated as the number of containers running per machine). For slower machines, the ML models generated by the modeler 222 suggest decreasing the utilization by reducing the number of running containers, while for faster machines, the models suggest increasing it. The same optimization model was run focusing on a higher percentile of CPU utilization level, corresponding to the situation where the whole cluster is running with heavy workloads. The suggested configuration change is the same in terms of the direction of the gradients.

[0079] The flighting tool 224 is an important component of the disclosed tuning service 214.
Before fully deploying to a production cluster, several rounds of flighting were performed that validated the possibility of increasing the maximum running containers for different SKUs to increase utilization. The first pilot flighting was on 40 Gen 1.1 machines to confirm whether reducing the max_num_running_containers in the YARN configuration file affects the real observed maximum number of running containers. The second pilot flighting experiment was on Gen 4.1 machines to confirm whether increasing the max_num_running_containers in the YARN configuration is effective and allows the machines to run more workloads. The third pilot experiment was on two sub-clusters of machines (each with around 1700 machines) to validate whether the updated configuration changes the workload distribution. The fourth pilot flighting was for three sub-clusters of machines and validated the benefits of tuning, i.e., adding more containers to the sub-cluster with better performance.

[0080] The production roll-out process was quite conservative, where the operational parameters 216 of the configuration were only adjusted by a small margin, e.g., decreasing or increasing the maximum running containers for each group of machines by one. Performance data for the periods of one month before and one month after the roll-out were extracted, where the maximum running containers was increased/decreased by one unit. In the production environment, the configuration was changed conservatively to avoid any possible large-scale performance impact. Treatment effects were used to evaluate the performance changes during the two periods with significance tests. It was observed that, with the same level of latency (measured by average task latency), the throughput (measured by Total Data Read) was improved by 9%.
For this round of deployment, conservatively, a 2% gain in sellable capacity from the cluster was achieved (measured by the total number of containers with the same level of latency as before) by only modifying the maximum running containers for each SKU-SC by one.
[0081] The Prediction Engine may be extended to other performance metrics of interest. For the tuning of other parameters, one identifies the most relevant sets of operational parameters 216 and starts tracking the dynamics between them by developing the set of predictive ML models. The Optimizer may be formulated with various objective functions for the different tuning tasks.
[0082] In some implementations, low-priority containers are queued on each machine. The queuing length and latency vary significantly for machines with different SKUs and SCs. This is because the same maximum queuing length is set for all SKUs. As faster machines have faster de-queue rates, embodiments are able to allow more containers to be queued on them. In this sense, similar tuning methodologies may be used to learn the relationship between the tuned parameters (e.g., the maximum queuing length) and the objective performance metrics (e.g., the variance of queuing latency) to achieve a better queuing distribution.
[0083] In this application, the resource utilization metrics of the machines (as opposed to the throughput, latency, etc.) influence decisions around what hardware components to purchase in future machines. As discussed above, the dynamics between different sets of performance metrics may be quantitatively measured, such as the utilization levels of different hardware resources like SSD, RAM, or CPU. Once the CPU to use for the next-generation servers is determined, the configuration design problem reduces to a prediction problem for estimating the utilization of SSD and RAM given the number of CPU cores. The resource utilization pattern is the same as in the current fleet, as it reflects the characteristics of the workloads (CPU-intensive versus memory-intensive). Therefore, the predictive models capture the relationship between the number of cores used versus the amounts of SSD and RAM used in the observational data and project the SSD and RAM usage as a function of the number of cores on the server, as expressed in the following equations:

s = p(c) = α_s + β_s·c, (11)
r = q(c) = α_r + β_r·c, (12)

where:
c: the number of CPU cores used;
s: the amount of SSD used when using c CPU cores;
α_s, β_s: parameters to predict the SSD usage;
r: the amount of RAM used when using c CPU cores; and
α_r, β_r: parameters to predict the RAM usage.
[0084] In Equations (11) and (12), for p(c) and q(c), a simple linear regression model was used. Based on current data, the values α_r, α_s, β_r and β_s are calibrated.
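A self-contained sketch of that calibration, using ordinary least squares on synthetic per-second samples; the "true" coefficient pairs (200, 15) and (32, 4) are illustrative, not the values calibrated in the disclosure:

```python
import random

random.seed(0)

# Synthetic per-second observations: cores used, SSD (GB) and RAM (GB) used.
cores = [random.uniform(8, 64) for _ in range(10_000)]
ssd = [200.0 + 15.0 * c + random.gauss(0, 20) for c in cores]
ram = [32.0 + 4.0 * c + random.gauss(0, 5) for c in cores]

def ols(xs, ys):
    """Closed-form simple linear regression: returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope

alpha_s, beta_s = ols(cores, ssd)   # Eq. (11): s = alpha_s + beta_s * c
alpha_r, beta_r = ols(cores, ram)   # Eq. (12): r = alpha_r + beta_r * c

def p(c):  # predicted SSD usage for c cores
    return alpha_s + beta_s * c

def q(c):  # predicted RAM usage for c cores
    return alpha_r + beta_r * c

print(round(beta_s), round(beta_r))  # recovered per-core slopes, close to 15 and 4
```

With p and q in hand, the inverses p⁻¹ and q⁻¹ used in the optimization below follow directly from the linear form.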
[0085] FIG. 7 shows the current resource utilization for SSD and RAM with respect to different levels of CPU utilization for a particular SKU running with the production workload. The observations are for each second of a full day, with around 10.4 million records. The α_s and α_r are the intercepts of the projection, indicating the SSD and RAM usage levels when running with 0 cores. The β_s and β_r indicate the SSD usage per core and the RAM usage per core. A full distribution of α_s, α_r, β_s and β_r can be derived based on each observation to capture the natural variance and noise.
[0086] For the optimization, the objective is to determine the most cost-efficient sizes of SSD and RAM for the new machines that have 128 CPU cores. Instead of having a closed form as in Equation (7), a Monte-Carlo simulation is used to estimate the objective function: the expected total cost of each configuration. It was assumed that the maximum number of running containers on a machine is stranded by any of the three resources (CPU cores, SSD, and RAM). In some implementations, the cost of each configuration with different SSD and RAM sizes includes the penalty of idle CPU cores, SSD, and RAM based on the unit cost of each resource and the extra penalty of running out of SSD or RAM. Running out of CPU is handled more gracefully than running out of RAM or SSD.
[0087] For a design with S SSD and R RAM, let α_s and α_r be the calibrated baseline usage for SSD and RAM respectively; the corresponding objective function may be estimated in the following manner. Initially, random numbers β_s and β_r are drawn from the observational data. The maximum number of CPU cores that can be used, c, is calculated as: c = min{128, p⁻¹(S), q⁻¹(R)}. Then, the quantities of the idle resources are estimated. In some implementations and examples, the number of idle CPU cores is I_c = 128 − c; the amount of idle SSD is I_s = S − p(c); and the amount of idle RAM is I_r = R − q(c). The total cost based on the unit price may then be calculated. If there is no idle SSD (RAM), the machine is stranded by SSD (RAM), adding an extra penalty for running out of SSD (RAM).
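The sampling procedure above can be sketched as follows. All unit costs, penalty values, and the (β_s, β_r) sample pairs are illustrative placeholders for the calibrated distributions; p⁻¹ and q⁻¹ are inverted analytically since p and q are linear.

```python
import random

CORES = 128
COST = {"core": 40.0, "ssd_gb": 0.08, "ram_gb": 2.5}   # illustrative unit costs
PENALTY = {"ssd": 5000.0, "ram": 5000.0}               # out-of-resource penalties

def expected_cost(S, R, samples, alpha_s=200.0, alpha_r=32.0, n=1000, seed=0):
    """Monte-Carlo estimate of the expected cost of an (S GB SSD, R GB RAM) design."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        beta_s, beta_r = rng.choice(samples)  # draw usage-per-core from observations
        # c = min{128, p^-1(S), q^-1(R)}: cores usable before SSD or RAM strands.
        c = min(CORES, (S - alpha_s) / beta_s, (R - alpha_r) / beta_r)
        idle_ssd = S - (alpha_s + beta_s * c)   # I_s = S - p(c)
        idle_ram = R - (alpha_r + beta_r * c)   # I_r = R - q(c)
        cost = (CORES - c) * COST["core"]       # idle cores I_c = 128 - c
        cost += idle_ssd * COST["ssd_gb"] + idle_ram * COST["ram_gb"]
        if idle_ssd <= 1e-9:                    # stranded by SSD
            cost += PENALTY["ssd"]
        if idle_ram <= 1e-9:                    # stranded by RAM
            cost += PENALTY["ram"]
        total += cost
    return total / n

# (beta_s, beta_r) pairs standing in for the observational distribution:
obs = [(15.0, 4.0), (14.0, 3.8), (16.5, 4.3), (15.5, 4.1)]
undersized = expected_cost(2000, 300, obs)   # RAM strands the machine
roomy = expected_cost(6000, 700, obs)        # only idle-resource cost remains
print(undersized > roomy)   # → True
```

Sweeping expected_cost over a grid of (S, R) designs reproduces the kind of cost surface the Optimizer searches.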
[0088] By repeating the above process (e.g., 1000 times), the expected cost for each design configuration is estimated with different amounts of SSD and RAM. If the configuration is designed with insufficient SSD or RAM, the out-of-SSD or out-of-memory penalty dominates the cost. If the configuration is designed with too much SSD or RAM, the penalty of having idle resources increases. The expected cost with respect to different configurations is shown in FIG. 8. The Optimizer looks for an optimal spot where the expected penalty, based on the distribution of RAM and SSD usage per core, is minimized. This is shown in FIG. 8 in the top-left corner of the graph, around 6.0 TB, 250.0 GB.
[0089] The same methodology of Observational Tuning and Experimental Tuning is also applicable to different resource utilizations, such as network bandwidth, and many other performance metrics. The Optimizer can either take a closed-form formulation and use solvers to obtain the optimal solution, or use simple heuristics. In either case, given a predictor of the resulting performance (instead of building a complicated simulation platform), one can avoid the need for experiments to deploy new configurations in the production cluster. The set of machine learning models precisely captures the system dynamics in the complex production environment, tailored to the customer workloads.
[0090] Although Observational Tuning and Hypothetical Tuning cover a large number of applications, the performance impact of some configuration changes, such as changing a software configuration that affects the input/output (I/O) speed or introducing a new feature to improve the processor performance, is still unpredictable. In this case, Experimental Tuning is used. With the introduction of machine-level metrics, experiments may be performed by deploying configuration changes to groups of machines in production and conducting A/B testing at a smaller scale.
[0091] Different applications require experiments to deploy the configuration changes to a group of machines using the flighting tool, where in Phase II the statistical analysis is used (see FIG. 4). For those configuration parameters, the tuning process involves: (1) experiments, and (2) evaluation. Based on the performance metrics discussed above, the optimal configuration may be picked.

[0092] For this group of applications, the key is the design of the experiments and the determination of the performance metrics. To have a fair comparison between the different groups of machines with different configurations, variables that can potentially affect the performance are controlled to the best effort, such as the hardware configurations, the time frame of data collection, and even the physical location of the machines. To have statistical significance, a relatively large sample size is used. To this end, three possible experiment settings are used: the ideal setting, the time-variant setting, and the hybrid setting.

[0093] The ideal experiment setting is to have both the experiment group and the control group in the same physical location, for example, choosing every other machine in the same rack as the control/experiment group. In this case, half of the machines run with the old configuration and half with the new one in the same physical location. This setting is ideal as it ensures that the two groups of machines are receiving almost identical workloads throughout the experiment; as they are physically located close to each other, they are often purchased at the same time and store data for similar customers.

[0094] The time-variant setting is in general popular in A/B testing. For the same group of machines, this setting alternately deploys the new and old configurations back and forth with a particular frequency, such as every five hours (instead of 24 hours, to avoid time-of-day effects).
The evaluation of different configurations is done by measuring the performance during the different time intervals. However, this setting, even though it is popular in industry, has several limitations. In the production cluster, it is very difficult to frequently deploy new configurations in a short time frame, and there will potentially be variance in the workloads during the different time intervals; therefore, the selection of the re-deployment interval becomes tricky.

[0095] When both the ideal and time-variant settings are not feasible, the hybrid setting may be used, which collects performance metrics for different groups of machines with different configurations. In this sense, the aim is to ensure that the groups of machines are as similar as possible and to conduct the experiment for a relatively long time period. With respect to the workload variation, one uses performance metrics that are less sensitive to the workload level.

[0096] Next, two applications are discussed: selecting software configurations and power capping.

[0097] Embodiments achieve the ideal setting through selecting two rows (with approximately X machines each) and choosing every other machine in the same rack as the control/experiment group. Two different software configurations are compared that represent using either SSD or HDD for the D: logical drive. SC1 puts the D: drive on HDD, and SC2 puts the D: drive on SSD. The creation of the SC2 design was motivated by the high D: drive write latency for SC1 caused by contention for I/O on the HDD. This write latency created a bottleneck for resource localization in the tested cloud infrastructure.

[0098] The experiment was scheduled to run over five consecutive workdays. The following table shows the performance impact using metrics that directly reflect the latency and throughput of the system: the Total Data Read per day increased by 10.9% while the average task latency decreased by 5.2%, which is a very significant improvement.
In all aspects of the performance of interest, the SC2 machines dominate, and the result of Student's t-test shows that the changes are all significant.

[0099] Compared to the experiment in the previous section, in this application, the power capping is at a higher level of the control infrastructure and all machines in the same chassis will be capped at the same level. Moreover, multiple rounds of experiments were performed to test the performance at different capping levels, and the data was collected over different time periods. The hybrid setting was used in this application, focusing on normalized metrics, such as Bytes per CPU Time (the ratio of the Total Data Read and the CPU time) and Bytes per Second (the ratio of the Total Data Read and the task execution time), that are less sensitive to the workload level, and examining the differences between the experiment group(s) and the control group in different time periods.

[00100] The experiment capped the machines to different provision levels and evaluated their performance. The performance impact was evaluated for machines with a new feature at the processor level enabled. For each round of the experiments with a particular level of capping, data was collected for four groups of machines for each SKU tested during the same time period to ensure that those groups of machines were receiving similar levels of workloads (but not necessarily identical as in the previous section):
• Group A with no capping and Feature off,
• Group B with no capping and Feature on,
• Group C with capping and Feature off, and
• Group D with capping and Feature on.
Over 120 machines were selected for each group and capped at 10%, 15%, 20%, 25% and 30% below the original power provision level, respectively. Each round of experiments ran for more than 24 hours.

[00101] FIG. 9 shows the performance impact on the two metrics due to different power capping limits for machines with a particular SKU with/without the feature enabled.
The y-axis indicates the performance change benchmarked to the baseline, i.e., Group A with no capping and Feature off. With 10% capping and the Feature enabled (blue bars), the performance for Bytes per CPU Time improved by 5%, while without the Feature enabled (orange bars), the same capping results in the performance degrading by 1%. One can see that with an increasing power capping level, the impact of capping becomes more significant. In all cases, having the Feature enabled improves the performance.

[00102] Similar experiments were also conducted for other SKUs in different clusters to determine the optimal power provision limit. Eventually, a relatively conservative capping level was chosen. However, it is still much lower than the original level and leads to a considerable power reduction per year that may be harvested to add more machines to a cloud infrastructure.

[00103] For applications belonging to this category (Experimental Tuning), it is critical to properly design the experiment and choose among the different settings. This analysis is feasible because of the introduction of machine-level metrics that reflect the performance of the machines when running with a large amount of production traffic. It is impossible to isolate the impacts of configuration changes in the job-level metrics: each job runs on hundreds or thousands of machines, and each machine executes tasks from all different jobs. One cannot control for each job to be executed only in the experiment group or the control group. On the other hand, by evaluating the performance metrics at the machine-group level, the disclosed embodiments circumvent the need for extracting representative workload traces in the production environment. Data is collected for a relatively long time period to ensure that the machines received a sufficiently large amount of work and the performance is relatively stable for the evaluation of statistical tests.
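The group comparisons in these experiments rely on two-sample significance tests. A minimal Welch t-statistic (unequal variances, Welch-Satterthwaite degrees of freedom) can be written with the standard library alone; the per-machine throughput samples below are illustrative, not the experiment's data:

```python
from statistics import mean, variance

def welch_t(a, b):
    """Two-sample t-statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Illustrative per-machine "Bytes per CPU Time" samples, experiment vs. control:
experiment = [105.2, 108.1, 103.9, 110.4, 107.3, 106.8, 109.0, 104.5]
control    = [99.8, 101.2, 98.7, 100.9, 102.3, 99.1, 100.4, 101.7]

t, df = welch_t(experiment, control)
print(t > 2.0)   # → True: well past typical critical values, a significant difference
```

In practice a library routine (e.g., SciPy's ttest_ind with equal_var=False) would supply the p-value directly; the hand-rolled statistic here only illustrates the evaluation step.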
[00104] The tuning service disclosed herein may be used to evaluate many other features of the system and has become a standardized pipeline that leads to significant performance improvements with minimal extra effort. The disclosed embodiments make up an end-to-end tuning application service configured for tuning the framework at the cluster level at scale.

Example Cloud-Computing Environment

[00105] FIG. 10 illustrates a block diagram of one example of a cloud-computing environment 1000 of a cloud infrastructure, in accordance with some of the disclosed embodiments. Cloud-computing environment 1000 includes a public network 1002, a private network 1004, and a dedicated network 1006. Public network 1002 may be a public cloud-based network of computing resources, for example. Private network 1004 may be a private enterprise network or private cloud-based network of computing resources. And dedicated network 1006 may be a third-party network or dedicated cloud-based network of computing resources.

[00106] Hybrid cloud 1008 may include any combination of public network 1002, private network 1004, and dedicated network 1006. For example, dedicated network 1006 may be optional, with hybrid cloud 1008 comprised of public network 1002 and private network 1004.

[00107] Public network 1002 may include data centers configured to host and support operations, including tasks of a distributed application, according to the fabric controller 1018. It will be understood and appreciated that data center 1014 and data center 1016 shown in FIG. 10 are merely examples of suitable implementations for accommodating one or more distributed applications, and are not intended to suggest any limitation as to the scope of use or functionality of examples disclosed herein.
Neither should data center 1014 and data center 1016 be interpreted as having any dependency or requirement related to any single resource, combination of resources, combination of servers (e.g., servers 1020 and 1024), combination of nodes (e.g., nodes 1032 and 1034), or a set of application programming interfaces (APIs) to access the resources, servers, and/or nodes.

[00108] Data center 1014 illustrates a data center comprising a plurality of servers, such as servers 1020 and 1024. A fabric controller 1018 is responsible for automatically managing the servers 1020 and 1024 and distributing tasks and other resources within the data center 1014. By way of example, the fabric controller 1018 may rely on a service model (e.g., designed by a customer that owns the distributed application) to provide guidance on how, where, and when to configure server 1022 and how, where, and when to place application 1026 and application 1028 thereon. One or more role instances of a distributed application may be placed on one or more of the servers 1020 and 1024 of data center 1014, where the one or more role instances may represent the portions of software, component programs, or instances of roles that participate in the distributed application. In other examples, one or more of the role instances may represent stored data that is accessible to the distributed application.

[00109] Data center 1016 illustrates a data center comprising a plurality of nodes, such as node 1032 and node 1034. One or more virtual machines may run on nodes of data center 1016, such as virtual machine 1036 of node 1034, for example. Although FIG. 10 depicts a single virtual node on a single node of data center 1016, any number of virtual nodes may be implemented on any number of nodes of the data center in accordance with illustrative embodiments of the disclosure.
Generally, virtual machine 1036 is allocated to role instances of a distributed application, or service application, based on demands (e.g., the amount of processing load) placed on the distributed application. As used herein, the phrase “virtual machine,” or VM, is not meant to be limiting, and may refer to any software, application, operating system, or program that is executed by a processing unit to underlie the functionality of the role instances allocated thereto. Further, the VM 1036 may include processing capacity, storage locations, and other assets within the data center 1016 to properly support the allocated role instances.

[00110] In operation, the virtual machines are dynamically assigned resources on a first node and a second node of the data center, and endpoints (e.g., the role instances) are dynamically placed on the virtual machines to satisfy the current processing load. In one instance, a fabric controller 1030 is responsible for automatically managing the virtual machines running on the nodes of data center 1016 and for placing the role instances and other resources (e.g., software components) within the data center 1016. By way of example, the fabric controller 1030 may rely on a service model (e.g., designed by a customer that owns the service application) to provide guidance on how, where, and when to configure the virtual machines, such as VM 1036, and how, where, and when to place the role instances thereon.

[00111] As described above, the virtual machines may be dynamically established and configured within one or more nodes of a data center. As illustrated herein, node 1032 and node 1034 may be any form of computing device, such as, for example, a personal computer, a desktop computer, a laptop computer, a mobile device, a consumer electronic device, a server, and the like.
The nodes may host the virtual machine(s) 1036, while simultaneously hosting other virtual machines carved out for supporting other tenants of the data center 1016, such as internal services 1038, hosted services 1040, and storage 1042. Often, the role instances may include endpoints of distinct service applications owned by different customers. [00112] In some embodiments, the hosted services 1040 include a tuning service 214 configured to perform the various features discussed herein. In particular, the tuning service 214 may be implemented via executable instructions (code), middleware, hardware, or a combination thereof. In operation, the tuning service 214 causes one or more central processing units (CPUs), graphical processing units (GPUs), VMs, quantum processors, or other processing units to specifically and automatically tune the disclosed operational parameters of the cloud infrastructure. [00113] In operation, the tuning service 214 is configured to fully automate cluster configuration in a data- and model-driven manner. Tuning service 214 leverages a mix of domain knowledge and principled data science to capture the essence of the cluster's dynamic behavior in a collection of descriptive ML models. These models power automated optimization procedures for parameter tuning and inform users about some of the most tactical and strategic engineering/capacity decisions (such as hardware and data center design, software investments, etc.). Additionally, the tuning service 214 combines rich observational models (e.g., models built from data collected without modifying the system) with judicious use of flighting (testing in production). This allows the tuning service 214 to support the broad range of applications discussed herein. [00114] In some embodiments, the tuning service 214 includes five main components: the performance monitor 218, the experimenter 220, the modeler 222, the flighting tool 224, and the deployment tool 226. 
In some implementations, the performance monitor 218 joins the data from various sources and calculates the performance metrics of interest, providing a fundamental building block for all the analysis. An end-to-end data orchestration pipeline is developed and deployed in production to collect data on a daily basis. The modeler 222 proposes the optimal configurations. Depending on the application, different methods can be used, such as machine learning, optimization, statistical analysis, and econometric models. The flighting tool 224 facilitates the deployment of configuration changes to any machine in the production cluster, and the deployment tool 226 deploys the configuration settings. [00115] The tuning service 214, and its modules 218-226, may be partially or wholly operated in the public network 1002, private network 1004, and/or dedicated network 1006. For example, the performance monitor 218 may be a service run in the public network 1002, but the modeler 222 and flighting tool 224 may be run in the private network 1004. In another example, all of the modules 218-226 may operate in the public network 1002. [00116] Typically, each of the nodes includes, or is linked to, some form of a computing unit (e.g., CPU, GPU, VM, microprocessor, etc.) to support operations of the component(s) running thereon. As utilized herein, the phrase “computing unit” generally refers to a dedicated computing device with processing power and storage memory, which supports operating software that underlies the execution of software, applications, and computer programs thereon. In one instance, the computing unit is configured with tangible hardware elements, or machines, that are integral, or operably coupled, to the nodes to enable each device to perform a variety of processes and operations. 
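The flow of data through the modules described above (performance monitor joining telemetry and computing metrics, modeler proposing configurations, flighting tool testing, deployment tool rolling out) can be sketched as follows. This is a minimal illustrative sketch only: every class and function name here is an assumption for exposition, not an identifier from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class Config:
    """A candidate set of cloud operational parameters (illustrative)."""
    params: dict


def run_tuning_pass(telemetry_sources, compute_metrics, propose_configs,
                    flight, deploy):
    """Hypothetical end-to-end daily tuning pass.

    telemetry_sources: list of iterables of raw telemetry records
    compute_metrics:   joins the records and derives performance metrics
    propose_configs:   modeler step; returns candidate Config objects
    flight:            scores one Config on a production subset (higher is better)
    deploy:            pushes the winning Config to the cluster
    """
    # Performance monitor: join data from the various sources.
    records = [r for source in telemetry_sources for r in source]
    metrics = compute_metrics(records)
    # Modeler: propose candidate configurations from the metrics.
    candidates = propose_configs(metrics)
    # Flighting tool: score each candidate on a subset of machines.
    best = max(candidates, key=flight)
    # Deployment tool: roll out the selected configuration.
    deploy(best)
    return best
```

In practice each step would be a separate service in the orchestration pipeline; collapsing them into one function simply makes the data flow visible.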
In another instance, the computing unit may encompass a processor (not shown) coupled to the computer-readable medium (e.g., computer storage media and communication media) accommodated by each of the nodes. [00117] The role instances that reside on the nodes may be to support operation of service applications, and thus they may be interconnected via APIs. In one instance, one or more of these interconnections may be established via a network cloud, such as public network 1002. The network cloud serves to interconnect resources, such as the role instances, which may be distributed across various physical hosts, such as nodes 1032 and 1034. In addition, the network cloud facilitates communication over channels connecting the role instances of the service applications running in the data center 1016. By way of example, the network cloud may include, without limitation, one or more communication networks, such as local area networks (LANs) and/or wide area networks (WANs). Such communication networks are commonplace in offices, enterprise-wide computer networks, intranets, and the Internet, and therefore need not be discussed at length herein. [00118] FIG. 11 is a flowchart diagram of a workflow 1100 for automatically tuning a large-scale cloud infrastructure. As shown at 1102, telemetric data is accessed. The performance monitor may join the telemetric data from various sources (as shown at 1104) and calculate performance metrics of the cloud infrastructure based on the telemetric data (as shown at 1106). The modeler may identify one or more optimal configurations of cloud operational parameters for the cloud environment based on the performance metrics, as shown at 1108. The flighting tool and deployment tool are configured to pre-process and deploy, respectively, the one or more optimal configurations of the cloud operational parameters to one or more machines of the cloud infrastructure, as shown at 1110. [00119] FIG. 
12 is a flowchart diagram of a workflow 1200 for automatically tuning a large-scale cloud infrastructure. As shown at 1202, telemetric data is accessed. The performance monitor may join the telemetric data from various sources (as shown at 1204) and calculate performance metrics of the cloud infrastructure based on the telemetric data (as shown at 1206). The tuning service may initiate automatic tuning of the cloud environment, either on its own (e.g., periodically or conditionally based on a performance event, such as processing, memory, or networking resources exceeding certain performance thresholds); to accommodate a particular processing job (e.g., an application upgrade, deep learning job, redundancy backup, or the like); or upon developer initiation. As shown by decision box 1208, the tuning service waits until automatic tuning is initiated. [00120] Once automatic tuning is initiated, the modeler generates (or “models”) configurations of cloud operational parameters for the cloud environment based on the performance metrics, as shown at 1210. The experimenter tests the configurations of the cloud operational parameters in a subset of servers and selects an optimal set of the cloud parameters based on performance metrics of the subset of the servers once the operational parameters are applied, as shown at 1212. This optimal set may be pre-processed by the flighting tool and then deployed by the deployment tool to the cloud environment, as shown at 1214. 
Additional Examples 
[00121] Some examples are directed to a method for automatically tuning a cloud environment comprising a plurality of servers. 
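The experimenter step at 1212 (apply each candidate configuration to a subset of servers, measure the resulting performance, keep the best) could be sketched as below. All names and the averaging-based scoring are illustrative assumptions; the disclosure does not prescribe a particular selection criterion.

```python
def select_optimal(configs, subset_servers, apply_config, measure):
    """Hypothetical experimenter step: apply each candidate configuration
    to a small subset of servers, measure the resulting performance
    metric on that subset, and keep the best-scoring configuration.

    apply_config(server, config): applies the parameters to one server.
    measure(server): returns a performance score (higher is better).
    """
    best_config, best_score = None, float("-inf")
    for config in configs:
        # Flight the candidate on the subset only, not the whole cluster.
        for server in subset_servers:
            apply_config(server, config)
        # Score the candidate as the mean performance across the subset.
        score = sum(measure(s) for s in subset_servers) / len(subset_servers)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score
```

A real experimenter would additionally guard against regressions (e.g., roll back a candidate whose score falls below the baseline) before handing the winner to the deployment tool.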
The method comprises: accessing telemetric data of the cloud environment; accessing a plurality of operational parameters for the cloud environment; modeling a group of the operational parameters for operating the cloud environment; testing the modeled group of the operational parameters in a subset of the servers; selecting the modeled group of the operational parameters for use in tuning the cloud environment based on said testing; and deploying the modeled group of the operational parameters into the cloud environment. [00122] Some examples build one or more machine learning (ML) models from the accessed telemetric data. [00123] Some examples apply the one or more ML models to a subset group of the plurality of servers to calculate the performance metrics of the cloud infrastructure. [00124] In some examples, the telemetric data comprises total bytes read by the cloud infrastructure per a quantity of time. [00125] In some examples, the telemetric data comprises a ratio based on a total amount of data read and a total execution time per machine of the cloud infrastructure. [00126] In some examples, the telemetric data comprises a ratio based on a total amount of data read and a total CPU time per machine of the cloud infrastructure. [00127] In some examples, the telemetric data comprises an average time running containers in the cloud infrastructure. [00128] In some examples, the cloud infrastructure is a large-scale cloud-computing environment that processes at least an exabyte of data on a daily basis. [00129] Other examples are directed to a system for automatically tuning a cloud environment comprising a plurality of servers. 
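The per-machine ratios enumerated above (total data read over total execution time, and over total CPU time) could be derived from raw telemetry records as follows. This is a sketch under assumed record field names (`machine`, `bytes_read`, `exec_seconds`, `cpu_seconds`); the disclosure does not specify a telemetry schema.

```python
def per_machine_metrics(records):
    """Hypothetical derivation of the telemetric ratios described above.

    Each record is assumed to be a dict with keys:
      machine, bytes_read, exec_seconds, cpu_seconds
    Returns, per machine, bytes read per second of execution time and
    bytes read per second of CPU time.
    """
    totals = {}
    for r in records:
        t = totals.setdefault(r["machine"],
                              {"bytes": 0, "exec": 0.0, "cpu": 0.0})
        t["bytes"] += r["bytes_read"]
        t["exec"] += r["exec_seconds"]
        t["cpu"] += r["cpu_seconds"]
    return {
        machine: {
            "bytes_per_exec_sec": t["bytes"] / t["exec"],
            "bytes_per_cpu_sec": t["bytes"] / t["cpu"],
        }
        for machine, t in totals.items()
    }
```

The same aggregation pattern extends to the other metrics mentioned (tasks finished per unit time, average container running time) by accumulating different fields.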
The system comprises: memory embodied with executable instructions for performing said tuning of the cloud infrastructure; and one or more processors programmed for: calculating performance metrics of the cloud infrastructure based on the telemetric data, generating one or more configurations of cloud operational parameters for the cloud environment based on the performance metrics, testing the one or more configurations of the cloud operational parameters in a subset of the one or more servers, selecting an optimal set of the cloud parameters based on performance metrics of the subset of the one or more servers once the operational parameters are applied, and deploying the optimal set of the cloud operational parameters to one or more servers of the cloud environment. [00130] In some examples, the telemetric data is collected daily. [00131] In some examples, the one or more processors are programmed for building one or more machine learning (ML) models from the accessed telemetric data. [00132] In some examples, the one or more processors are programmed for applying the one or more ML models to a subset group of the plurality of machines to calculate the performance metrics of the cloud infrastructure. [00133] In some examples, the telemetric data comprises total bytes read by the cloud infrastructure per a quantity of time. [00134] In some examples, the quantity of time is an hour. [00135] In some examples, the telemetric data comprises total number of tasks finished per a quantity of time. [00136] Other examples are directed to one or more computer-readable memory devices embodied with modules that are executable by one or more processors for automatically tuning a cloud environment comprising a plurality of servers. 
The modules comprise: a performance monitor configured to join telemetric data from various sources and calculate performance metrics of the cloud infrastructure based on the telemetric data; a modeler configured to generate one or more configurations of cloud operational parameters for the cloud environment based on the performance metrics; an experimenter configured to test the one or more configurations of the cloud operational parameters in a subset of the one or more servers and select an optimal set of the cloud parameters based on performance metrics of the subset of the one or more servers once the operational parameters are applied; and a deployment tool configured to deploy the optimal set of the cloud operational parameters to one or more servers of the cloud environment. [00137] In some examples, the modeler applies an artificial intelligence (AI) algorithm to the performance metrics for generating the one or more configurations of cloud operational parameters. [00138] In some examples, the one or more configurations of cloud operational parameters comprise a quantity of processing resources to use for a processing job. [00139] In some examples, the one or more configurations of cloud operational parameters comprise a quantity of processing jobs to run on one CPU. [00140] In some examples, the telemetric data comprises total number of tasks finished per a quantity of time. [00141] The examples and embodiments disclosed herein may be described in the general context of computer code or machine-useable instructions, including computer-executable instructions such as program components, being executed by a computer or other machine, such as a personal data assistant or other handheld device. Generally, program components, including routines, programs, objects, components, data structures, and the like, refer to code that performs particular tasks or implements particular abstract data types. 
The disclosed examples may be practiced in a variety of system configurations, including personal computers, laptops, smart phones, mobile tablets, hand-held devices, consumer electronics, servers, VMs, specialty computing devices, etc. The disclosed examples may also be practiced in distributed computing environments where tasks are performed by remote-processing devices that are linked through a communications network. [00142] While the aspects of the disclosure have been described in terms of various examples with their associated operations, a person skilled in the art would appreciate that a combination of operations from any number of different examples is also within the scope of the aspects of the disclosure. [00143] The order of execution or performance of the operations in examples of the disclosure illustrated and described herein is not essential, and may be performed in different sequential manners in various examples. For example, it is contemplated that executing or performing a particular operation before, contemporaneously with, or after another operation is within the scope of aspects of the disclosure. [00144] In embodiments involving a general-purpose computer, aspects of the disclosure transform the general-purpose computer into a special-purpose computing device when configured to execute the instructions described herein. [00145] The embodiments illustrated and described herein as well as examples not specifically described herein but within the scope of aspects of the disclosure constitute exemplary means for authenticating a client to automatically tune a cloud. [00146] When introducing elements of aspects of the disclosure or the examples thereof, the articles “a,” “an,” “the,” and “said” are intended to mean that there are one or more of the elements. The terms "comprising," "including," and "having" are intended to be inclusive and mean that there may be additional elements other than the listed elements. 
The term “exemplary” is intended to mean “an example of.” The phrase “one or more of the following: A, B, and C” means “at least one of A and/or at least one of B and/or at least one of C." [00147] Having described aspects of the disclosure in detail, it will be apparent that modifications and variations are possible without departing from the scope of aspects of the disclosure as defined in the appended claims. As various changes could be made in the above constructions, products, and methods without departing from the scope of aspects of the disclosure, it is intended that all matter contained in the above description and shown in the accompanying drawings shall be interpreted as illustrative and not in a limiting sense.