


Title:
SCALE-UP VERSUS SCALE-OUT
Document Type and Number:
WIPO Patent Application WO/2019/162727
Kind Code:
A2
Inventors:
SHARMA, Pratik (Kailashpuri, Bunglow No 2 Govind Nagar, Malad East, Mumbai 7, 400097, IN)
Application Number:
IB2018/051170
Publication Date:
August 29, 2019
Filing Date:
February 24, 2018
Assignee:
SHARMA, Pratik (Kailashpuri, Bunglow No 2 Govind Nagar, Malad East, Mumbai 7, 400097, IN)
International Classes:
G06F9/44; G06F15/16
Claims:
Claims

The following is the claim for this invention:

1. Today we have huge deployments of clusters of cheap commodity hardware to run data analytics workloads. To handle the dynamic workloads of various data analytics frameworks we have the scale-out option, which is adding more virtual machines or servers to the cluster, and the scale-up option, which is adding more resources such as cores and memory to a single server. Data analytics infrastructure frameworks like Hadoop are designed to process petabytes of data or more, but for some jobs a single server with scale-up capability can do better than a scale-out cluster in terms of performance and server density. Here we classify any job (such as a data analytics Hadoop job) we want to run as either a CPU-intensive (Central Processing Unit intensive) job or a data-shuffling-intensive job. The scale-out option works well for CPU-intensive jobs or tasks, as there are more cores and more aggregate memory bandwidth. The scale-up option works well for data-shuffling-intensive jobs, as there is no network bottleneck and there is quick access to storage. Also, for jobs which are to be performed periodically, we analyse their Central Processing Unit time consumption against the input data set size and the number of data-shuffling tasks per job against the input data set size. It is expected that most of these kinds of jobs will follow a power-law distribution. The above novel technique of classifying different jobs and deciding between the scale-out and scale-up options for running them is the claim for this invention.

Description:
Scale-up versus Scale-out

Today we have huge deployments of clusters of cheap commodity hardware to run data analytics workloads. To handle the dynamic workloads of various data analytics frameworks we have the scale-out option, which is adding more virtual machines or servers to the cluster, and the scale-up option, which is adding more resources such as cores and memory to a single server. Data analytics infrastructure frameworks like Hadoop are designed to process petabytes of data or more, but for some jobs a single server with scale-up capability can do better than a scale-out cluster in terms of performance and server density. Here we classify any job (such as a data analytics Hadoop job) we want to run as either a CPU-intensive (Central Processing Unit intensive) job or a data-shuffling-intensive job. The scale-out option works well for CPU-intensive jobs or tasks, as there are more cores and more aggregate memory bandwidth. The scale-up option works well for data-shuffling-intensive jobs, as there is no network bottleneck and there is quick access to storage. Also, for jobs which are to be performed periodically, we analyse their Central Processing Unit time consumption against the input data set size and the number of data-shuffling tasks per job against the input data set size. It is expected that most of these kinds of jobs will follow a power-law distribution.
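
To illustrate the decision procedure described above, the following is a minimal Python sketch, not part of the application, assuming hypothetical per-job metrics (CPU seconds consumed, number of data-shuffling tasks, input data set size). It classifies a job as CPU-intensive or data-shuffling-intensive, chooses scale-out or scale-up accordingly, and fits a power law to a periodic job's historical CPU time against input size in log-log space; all field names, thresholds, and numbers are illustrative assumptions.

# Illustrative sketch only; metric names, the 0.5 cutoff, and the sample
# figures are assumptions, not taken from the application.
from dataclasses import dataclass
import math


@dataclass
class JobProfile:
    name: str
    input_bytes: int      # size of the input data set
    cpu_seconds: float    # total CPU time consumed by the job
    shuffle_tasks: int    # number of data-shuffling tasks in the job


def classify(job: JobProfile, shuffle_threshold: float = 0.5) -> str:
    """Label a job CPU-intensive or data-shuffling-intensive.

    Uses the ratio of shuffle tasks to CPU seconds as a crude proxy;
    the 0.5 cutoff is an illustrative assumption.
    """
    ratio = job.shuffle_tasks / max(job.cpu_seconds, 1e-9)
    return "shuffle-intensive" if ratio >= shuffle_threshold else "cpu-intensive"


def placement(job: JobProfile) -> str:
    """Scale-out for CPU-intensive jobs, scale-up for shuffle-intensive jobs."""
    return "scale-up" if classify(job) == "shuffle-intensive" else "scale-out"


def fit_power_law(sizes, values):
    """Least-squares fit of value ~ a * size^b in log-log space,
    mirroring the expectation that periodic jobs follow a power law."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = math.exp(my - b * mx)
    return a, b  # value is approximately a * size**b


if __name__ == "__main__":
    job = JobProfile("daily-report", input_bytes=10**12, cpu_seconds=3600, shuffle_tasks=5000)
    print(job.name, classify(job), "->", placement(job))

    # Historical CPU time of a periodic job against input size (made-up numbers).
    sizes = [1e9, 1e10, 1e11, 1e12]
    cpu = [120, 900, 7000, 52000]
    a, b = fit_power_law(sizes, cpu)
    print(f"cpu_seconds is roughly {a:.3g} * input_bytes^{b:.2f}")

In this sketch the fitted exponent from the power-law regression could then be used to predict the CPU time of the next periodic run from its input size before deciding on scale-up or scale-out; that use is likewise an assumption about how the analysis would be applied.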