Title:
A BIG DATA PROCESSING METHOD, DEVICE, AND SYSTEM, AND A MACHINE READABLE MEDIUM
Document Type and Number:
WIPO Patent Application WO/2018/215651
Kind Code:
A1
Abstract:
The present invention provides a big data processing method, device, and system, and a machine readable medium. The big data processing method comprises receiving a big data processing request, said big data processing request being used to request the processing of big data, acquiring said big data from the storage device of said big data according to said big data processing request, determining the processing logic which processes said big data according to said big data processing request, invoking at least one R service from one R service library according to said processing logic, and acquiring the processing results through the processing of said big data by said at least one R service. The big data processing device comprises a request receiving module, a data acquisition module, a request analysis module, a service invocation module, and a data processing module. The present solution can improve the accuracy of big data processing results.

Inventors:
ZANG DI (CN)
WU JIANG (CN)
QI WEI (CN)
LIU KANG (CN)
Application Number:
PCT/EP2018/063807
Publication Date:
November 29, 2018
Filing Date:
May 25, 2018
Assignee:
SIEMENS AG (DE)
International Classes:
G06F17/18; G06F9/50; G06F9/54; G06F17/30
Other References:
BERRAL JOSEP LL ET AL: "Database Integrated Analytics Using R: Initial Experiences with SQL-Server + R", 2016 IEEE 16TH INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW), IEEE, 12 December 2016 (2016-12-12), pages 1 - 7, XP033055790, DOI: 10.1109/ICDMW.2016.0009
Attorney, Agent or Firm:
ISARPATENT - PATENT- UND RECHTSANWÄLTE BEHNISCH BARTH CHARLES HASSA PECKMANN UND PARTNER MBB (DE)
Claims:

1. A big data processing method, wherein said method comprises:

receiving a big data processing request, said big data processing request being used to request the processing of big data,

acquiring said big data from the storage device (20) of said big data according to said big data processing request,

determining the processing logic which processes said big data according to said big data processing request,

invoking at least one R service from one R service library according to said processing logic, and

processing said big data through said at least one R service to acquire the processing results.

2. The method as claimed in claim 1, wherein invoking at least one R service from one R service library according to said processing logic comprises:

determining at least one processing step which is comprised in the processing of said big data according to said processing logic, and

invoking the R services which are used to realize each of said at least one processing step from said R service library in turn.

3. The method as claimed in claim 2, wherein

before invoking the R services which are used to realize each of said at least one processing step from said R service library in turn, the method further comprises:

acquiring at least one development language interface, each of said development language interfaces being used for a corresponding development language to invoke an R service from said R service library,

invoking the R services which are used to realize each of said at least one processing step from said R service library in turn comprises:

performing the following operations for each of said at least one processing step in turn:

determining the target R service which is used to realize said processing step in said R service library, and

through a preset development language corresponding to said target R service, utilizing the development language interface corresponding to said development language to invoke said target R service from said R service library.

4. The method as claimed in any of claims 1 to 3, wherein acquiring said big data from the storage device (20) of said big data comprises:

sending an acquisition request to the distributed file system (202) in said storage device (20), said acquisition request being used to acquire said big data from the database (201) in said storage device (20) through said distributed file system (202), and

receiving said big data from said distributed file system (202).

5. A big data processing device (10), wherein said device comprises:

a request receiving module (101), used to receive a big data processing request, said big data processing request being used to request the processing of big data,

a data acquisition module (102), used to acquire said big data from the storage device (20) of said big data according to said big data processing request received by said request receiving module (101),

a request analysis module (103), used to determine the processing logic which processes said big data according to said big data processing request received by said request receiving module (101),

a service invocation module (104), used to invoke at least one R service from one R service library according to said processing logic determined by said request analysis module (103), and

a data processing module (105), used to process said big data acquired by said data acquisition module (102) through said at least one R service invoked by said service invocation module (104) to obtain the processing results.

6. The device as claimed in claim 5, wherein said service invocation module (104) comprises:

a procedure decomposition unit (1041), used to determine at least one processing step which is comprised in the processing of said big data according to said processing logic, and

a service invocation unit (1042), used to invoke the R services which are used to realize each of said at least one processing step determined by said procedure decomposition unit (1041) from said R service library in turn.

7. The device as claimed in claim 6, wherein said device further comprises an interface acquisition module (106), which is used to acquire at least one development language interface, each development language interface is used for a corresponding development language to invoke R services from said R service library, and

said service invocation unit (1042) determines the target R service which is used to realize said processing step in said R service library for each of said at least one processing step in turn, and through a preset development language corresponding to said target R service, utilizes the development language interface which is acquired by said interface acquisition module (106) and corresponds to said development language to invoke said target R service from said R service library.

8. The device as claimed in any of claims 5 to 7, wherein:

said data acquisition module (102) is used to send an acquisition request to the distributed file system (202) in said storage device (20), said acquisition request being used to acquire said big data from the database (201) in said storage device (20) through said distributed file system (202), and receive said big data from said distributed file system.

9. A big data processing unit (10), wherein said unit comprises at least one storage (801) and at least one processor (802), wherein said at least one storage (801) is used to store machine readable programs, and

said at least one processor (802) is used to invoke said machine readable programs to execute the method as claimed in any of claims 1 to 4.

10. A big data processing system, wherein said system comprises a storage device (20) and a big data processing device (10) as claimed in any of claims 5 to 9, wherein

said storage device (20) is used to store said big data to be processed.

11. The system as claimed in claim 10, wherein said storage device (20) comprises:

a database (201), used to store said big data, and

a distributed file system (202), used to acquire said big data from said database (201) according to an acquisition request from said big data processing device (10) and send said big data to said big data processing device (10).

12. A machine readable medium, wherein a computer instruction is stored on said machine readable medium and a processor executes the method as claimed in any of claims 1 to 4 when said computer instruction is executed by said processor.

Description:

A Big Data Processing Method, Device, and System, and a Machine Readable Medium

Technical Field

The present invention relates to the technical field of big data, and in particular relates to a big data processing method, device, system, and a machine readable medium.

Background Art

Big data is a set of data whose size is usually very large. Big data can be used to analyze development trends and behavior patterns in a specific field. Through the analysis of big data, intelligent, in-depth, and valuable information can be obtained. Therefore, big data analyses are widely applied in fields such as artificial intelligence, market analysis, and product optimization.

The R language is a tool used for statistical computation and statistical charting and can be used for big data analyses.

Currently, when the R language is used to analyze big data, the R language invokes big data, selects limited data samples from big data, analyzes the selected data samples to obtain a processing model, and uses the obtained processing model to further analyze big data to obtain the analysis result. In this big data processing method, the processing model is obtained from the analysis of a limited quantity of data samples. However, a limited quantity of data samples cannot completely reflect the data rules of the big data to be processed. Therefore, the processing model obtained from the limited quantity of data samples has a certain error, and the accuracy of the big data processing results obtained by using a processing model having an error will be low.

Summary of the Invention

In view of this, the big data processing method, device, and system, and the machine readable medium provided by the present invention can improve the accuracy of processing results of big data.

In a first aspect, the embodiments of the present invention provide a big data processing method, which comprises:

after receiving a big data processing request used to request the processing of big data, acquiring big data from the storage device of big data according to the big data processing request, determining the processing logic which processes big data according to the big data processing request, invoking at least one R service from one R service library according to the processing logic, and acquiring the processing results through the processing of big data by at least one invoked R service.

At least one R service is invoked according to the processing logic of big data, and at least one invoked R service directly processes big data to acquire the processing results. It is unnecessary to select data samples from big data to form a processing model and then use the obtained processing model to further analyze big data to obtain the processing results. Thus, a lower accuracy of processing results of big data because of the error of the processing model formed from a limited quantity of data samples can be avoided. Therefore, the accuracy of the processing results of big data can be improved by invoking R services to directly process big data.

Optionally, when at least one R service is invoked from one R service library according to the processing logic, at least one processing step which is comprised in the processing of big data is determined according to the processing logic, and the R services which are used to realize each of at least one processing step are invoked from the R service library in turn. A plurality of R services are included in the R service library, and each R service can realize a corresponding processing step. In the processing of big data, after at least one processing step required for the processing of big data is determined according to the processing logic which processes big data, the corresponding R services are invoked from the R service library to complete each processing step and realize the processing of big data. Different processing types can be realized for big data through the combination and sorting of different R services. Thus, the types of big data which can be processed by use of the present big data processing method are broadened and the applicability of the present data processing method is improved.

Optionally, before the R services which are used to realize each of at least one processing step are invoked from the R service library in turn, at least one development language interface is acquired and each development language interface is used for a corresponding development language to invoke R services from the R service library. When the R services which are used to realize each of at least one processing step are invoked from the R service library in turn, for each of at least one processing step, the target R service which is used to realize the processing step is determined in the R service library, and through a preset development language corresponding to the target R service, the development language interface corresponding to the development language is utilized to invoke the target R service from the R service library.

The development language interfaces corresponding to different development languages are acquired, and a corresponding development language is predefined for each R service in the R service library. After the target R service which can realize a processing step is determined for the processing step, through a development language corresponding to the target R service, the development language interface corresponding to the development language is utilized to invoke the target R service from the R service library. After development language interfaces are acquired and a corresponding development language is predefined for each R service, the R services in the R service library can be invoked through different development languages. The advantages of different development languages are fully utilized to invoke the R services to process big data, and the processing efficiency of big data is improved.
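The dispatch described above can be illustrated with a minimal Python sketch. It is not the implementation of the invention: the R services are modelled as ordinary Python functions, and the names R_SERVICE_LIBRARY, PRESET_LANGUAGE, INTERFACES, make_interface, and invoke_target_service, as well as the language labels "java" and "python", are assumptions introduced only for illustration.

```python
from typing import Callable, Dict, List

Data = List[float]
# A development language interface takes a service name and data, and returns data.
LanguageInterface = Callable[[str, Data], Data]

# Hypothetical stand-in for the R service library (plain callables, not real R services).
R_SERVICE_LIBRARY: Dict[str, Callable[[Data], Data]] = {
    "scale": lambda xs: [2 * x for x in xs],
    "total": lambda xs: [sum(xs)],
}

# Assumed preset: one development language is predefined for each R service.
PRESET_LANGUAGE: Dict[str, str] = {"scale": "java", "total": "python"}

# Records which interface was used for which service, for inspection only.
call_log: List[str] = []


def make_interface(language: str) -> LanguageInterface:
    """Build a stand-in development language interface for one language."""
    def invoke(service_name: str, data: Data) -> Data:
        call_log.append(f"{language}:{service_name}")
        return R_SERVICE_LIBRARY[service_name](data)
    return invoke


# Acquire one interface per development language before any invocation.
INTERFACES: Dict[str, LanguageInterface] = {
    language: make_interface(language) for language in ("java", "python")
}


def invoke_target_service(service_name: str, data: Data) -> Data:
    """Look up the preset language of the target service and use its interface."""
    language = PRESET_LANGUAGE[service_name]
    return INTERFACES[language](service_name, data)


data: Data = [1.0, 2.0, 3.0]
for step in ("scale", "total"):
    data = invoke_target_service(step, data)
print(data, call_log)
```

In this sketch the mapping from service to language is a fixed table; how the preset language is actually chosen for each R service is not specified in the text above.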

Optionally, acquiring big data from the storage device of big data comprises:

sending an acquisition request to the distributed file system in the storage device, said acquisition request being used to acquire big data from the database in the storage device through the distributed file system, and receiving said big data from said distributed file system.

The distributed file system is a general-purpose big data management system. The acquisition of big data from the database through the distributed file system improves the applicability of the present big data processing method.

In a second aspect, the embodiments of the present invention further provide a big data processing device, which comprises:

a request receiving module, used to receive a big data processing request, the big data processing request being used to request the processing of big data,

a data acquisition module, used to acquire big data from the storage device of big data according to the big data processing request received by the request receiving module,

a request analysis module, used to determine the processing logic which processes big data according to the big data processing request received by the request receiving module,

a service invocation module, used to invoke at least one R service from one R service library according to the processing logic determined by the request analysis module, and

a data processing module, used to process big data acquired by the data acquisition module through at least one R service invoked by the service invocation module to obtain the processing results.

According to at least one R service invoked by the service invocation module, the data processing module processes big data acquired by the data acquisition module directly through at least one invoked R service to obtain the processing results. It is unnecessary to select data samples from big data to form a processing model and then use the obtained processing model to further analyze big data to obtain the processing results. Thus, a lower accuracy of processing results of big data because of the error of the processing model formed from a limited quantity of data samples can be avoided. The accuracy of the processing results of big data can be improved because the service invocation module invokes R services according to big data and the data processing module directly processes big data through R services.

Optionally, the service invocation module comprises:

a procedure decomposition unit, used to determine at least one processing step which is comprised in the processing of big data according to the processing logic, and

a service invocation unit, used to invoke the R services which are used to realize each of at least one processing step determined by the procedure decomposition unit from the R service library in turn.

The procedure decomposition unit determines at least one processing step required for the processing of big data according to the processing logic determined by the request analysis module, and the service invocation unit invokes the R services which are used to realize each processing step from the R service library in turn. A plurality of R services are included in the R service library, and each R service can realize a corresponding processing step. Therefore, the service invocation unit can invoke different R services from the R service library according to the processing steps determined by the procedure decomposition unit to process different types of big data. Thus, the types of big data which can be processed by use of the present big data processing method are broadened and the applicability of the present data processing method is improved.

Optionally, the big data processing device can further comprise an interface acquisition module, which is used to acquire at least one development language interface, each development language interface being used for a corresponding development language to invoke R services from the R service library, and

said service invocation unit determines the target R service which is used to realize each processing step in the R service library for each of at least one processing step in turn, and through a preset development language corresponding to the target R service, utilizes the development language interface corresponding to the development language to invoke a target R service from the R service library.

The R service library itself provides development language interfaces for some development languages. The interface acquisition module acquires the development language interfaces provided by the R service library and creates the development language interfaces which the R service library does not provide. Through a development language corresponding to each target R service, the service invocation unit utilizes the development language interface corresponding to the development language to invoke an R service from the R service library. Because the interface acquisition module acquires development language interfaces, the service invocation unit can invoke the R services in the R service library through different development languages. The advantages of different development languages are fully utilized to invoke the R services to process big data, and the processing efficiency of big data is improved.

Optionally, the data acquisition module is used to send an acquisition request to the distributed file system in the storage device, the acquisition request being used to acquire big data from the database in the storage device through the distributed file system, and receive big data from the distributed file system. Since the distributed file system is a general-purpose big data management system and the data acquisition module can acquire big data from the database through the distributed file system, the applicability of the present big data processing device is improved.

In a third aspect, the embodiments of the present invention further provide a big data processing device, which comprises at least one storage and at least one processor, wherein

at least one storage is used to store machine readable programs, and

at least one processor is used to invoke machine readable programs to execute the method provided in the first aspect or provided by any possible realization in the first aspect.

Machine readable programs are stored in the storage, and the processor can execute the method provided in the first aspect or any possible realization in the first aspect by invoking the machine readable programs stored in the storage. Through the R service invocation for big data, data samples do not need to be selected to form a processing model, and big data are directly processed through R services to obtain the processing results. Thus, inaccurate big data processing results caused by the error of the processing model are avoided, and the processing accuracy of big data can be improved.

In a fourth aspect, the embodiments of the present invention further provide a big data processing system, which comprises a storage device and a big data processing device provided in the second aspect or provided by any possible realization in the second aspect or any possible realization in the third aspect.

The storage device is used to store big data to be processed.

Big data are stored in the storage device. The big data processing device acquires big data to be processed from the storage device according to a big data processing request. Through the R service invocation for big data, data samples do not need to be selected to form a processing model, and big data are directly processed through R services to obtain the processing results. Thus, inaccurate big data processing results caused by the error of the processing model are avoided, and the processing accuracy of big data can be improved.

Optionally, the storage device comprises:

a database, used to store big data, and

a distributed file system, used to acquire big data from the database according to an acquisition request from the big data processing device and send big data to the big data processing device.

The storage device can comprise the database and the distributed file system. The database is used to store big data, and the distributed file system acquires big data from the database and sends big data to the big data processing device. The distributed file system is a general-purpose big data management system. The acquisition of big data from the database through the distributed file system further improves the applicability of the present big data processing system.

In a fifth aspect, the embodiments of the present invention further provide a machine readable medium. Computer instructions are stored on the machine readable medium. When computer instructions are executed by the processor, the processor executes the method provided in the first aspect or provided by any possible realization in the first aspect.

Computer instructions are stored on the machine readable medium. When computer instructions are executed by the processor, the processor executes the big data processing method provided in the first aspect or provided by any possible realization in the first aspect. Through the R service invocation for big data, data samples do not need to be selected to form a processing model, and big data are directly processed through R services to obtain the processing results. Thus, inaccurate big data processing results caused by the error of the processing model are avoided and the purpose of improving the processing accuracy of big data is achieved.

Brief Description of the Drawings

Figure 1 is a schematic diagram for a big data processing system provided in one embodiment of the present invention.

Figure 2 is a schematic diagram for another big data processing system provided in one embodiment of the present invention.

Figure 3 is a flowchart of a big data processing method provided in one embodiment of the present invention.

Figure 4 is a flowchart of a method for invoking R services in one embodiment of the present invention.

Figure 5 is a flowchart of another method for invoking R services in one embodiment of the present invention.

Figure 6 is a schematic diagram for a big data processing device provided in one embodiment of the present invention.

Figure 7 is a schematic diagram for another big data processing device provided in one embodiment of the present invention.

Figure 8 is a schematic diagram for a third big data processing device provided in one embodiment of the present invention.

Figure 9 is a schematic diagram for a fourth big data processing device provided in one embodiment of the present invention.

Figure 10 is a flowchart of another big data processing method provided in one embodiment of the present invention.

Description of reference numbers in the drawings:

10: Big data processing device

20: Storage device

101: Request receiving module

102: Data acquisition module

103: Request analysis module

104: Service invocation module

105: Data processing module

106: Interface acquisition module

1041: Procedure decomposition unit

1042: Service invocation unit

201: Database

202: Distributed file system

901: Storage

902: Processor

301: Receive a big data processing request

302: Acquire big data according to the big data processing request

303: Determine a processing logic according to the big data processing request

304: Invoke at least one R service according to the processing logic

401: Determine at least one processing step in the processing of big data according to the processing logic

402: Invoke the R services which are used to realize each processing step in turn

501: Determine the target R service used to realize each processing step, respectively

502: Determine the development language used to realize the target R service

503: Utilize the corresponding interface of a development language to invoke the target R service

1001: Acquire at least one development language interface

1002: Receive a big data processing request

1003: Acquire big data from the storage device according to the big data processing request

1004: Determine a processing logic according to the big data processing request

1005: Determine at least one processing step according to the big data processing request

1006: Determine the target R services used to realize each processing step in turn

1007: Determine the corresponding development language of each target R service

1008: Utilize the corresponding development language interface to invoke each target R service through a corresponding development language

1009: Process big data through the target R services in turn

1010: Output the processing results of big data

Detailed Description of the Invention

As described above, when the R programming language is used to analyze big data, the processing model is obtained from the analysis of a limited quantity of data samples. However, the limited quantity of data samples cannot completely reflect the data rules of big data to be processed. Therefore, the processing model obtained from limited data samples has a certain error. The accuracy of the processing results of big data by use of the processing model having an error will be low.

In the embodiments of the present invention, an R service library is constructed by use of the R programming language, and the R services used for the analysis, mining, calculation, statistics, and charting of big data, and the development language interfaces used to acquire the corresponding development languages are contained in the R service library. When big data is processed, according to the processing logic which processes big data, the development language interface corresponding to a development language is utilized to invoke at least one R service from the R service library to process big data and obtain the processing results. On the one hand, through the R service invocation for big data, data samples do not need to be selected to form a processing model, and big data are directly processed through R services to obtain the processing results. Thus, inaccurate big data processing results caused by the error of the processing model are avoided and the accuracy of the processing results of big data is improved. On the other hand, through the acquisition of the development language interfaces corresponding to various development languages, the R services in the R service library can be invoked to process big data through various development languages, and the advantages of various development languages are fully utilized during the processing of big data. Thus, the processing efficiency of big data is improved and the types of big data which can be processed are broadened.

The following will describe in detail the method and device provided in the embodiments of the present invention in combination with the drawings.

As shown in Figure 1, the embodiments of the present invention provide a big data processing system, which comprises a big data processing device 10 and a storage device 20, wherein

the storage device 20 is used to store big data, and the big data processing device 10, after acquiring big data from the storage device 20, invokes at least one R service from one R service library to process the acquired big data so as to obtain the processing results.

Optionally, as shown in Figure 2,

the storage device 20 comprises a database 201 and a distributed file system 202, wherein,

the database 201 is used to store big data, and the distributed file system 202 is used to acquire big data from the database 201 according to the acquisition request from the big data processing device 10 and send the big data to the big data processing device 10.

Optionally, the storage device 20 can be in a form other than that shown in Figure 2. For example, the storage device 20 can comprise only the database. In this case, the big data processing device 10 can directly read big data to be processed from the database according to a big data processing request.
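The two forms of the storage device 20 described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the classes Database, DistributedFileSystem, and StorageDevice, the function acquire, and the data set name "orders" are hypothetical stand-ins for the database 201, the distributed file system 202, and the storage device 20, and do not correspond to any real storage API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


class Database:
    """Hypothetical stand-in for the database 201, which stores big data."""

    def __init__(self, tables: Dict[str, List[int]]) -> None:
        self._tables = tables

    def read(self, name: str) -> List[int]:
        return list(self._tables[name])


class DistributedFileSystem:
    """Hypothetical stand-in for the distributed file system 202."""

    def __init__(self, database: Database) -> None:
        self._database = database

    def handle_acquisition_request(self, name: str) -> List[int]:
        # Acquire the data from the database and send it back to the caller.
        return self._database.read(name)


@dataclass
class StorageDevice:
    """Storage device 20: a database, optionally fronted by a file system."""
    database: Database
    dfs: Optional[DistributedFileSystem] = None


def acquire(storage: StorageDevice, name: str) -> List[int]:
    """Acquire data through the file system if present, else read the database."""
    if storage.dfs is not None:
        return storage.dfs.handle_acquisition_request(name)
    return storage.database.read(name)


db = Database({"orders": [10, 20, 30]})
with_dfs = StorageDevice(database=db, dfs=DistributedFileSystem(db))  # as in Figure 2
db_only = StorageDevice(database=db)                                  # database only

print(acquire(with_dfs, "orders"), acquire(db_only, "orders"))
```

In both forms the big data processing device receives the same data; only the acquisition path differs.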

The following will describe the big data processing method provided in the embodiments of the present invention. The big data processing method can be executed by the above-mentioned big data processing device 10. As shown in Figure 3, the method comprises the following steps:

Step 301: Receive a big data processing request, wherein the big data processing request is used to request the processing of big data.

Step 302: Acquire big data from the storage device of big data according to the big data processing request.

Step 303: Determine the processing logic which processes big data according to the big data processing request.

Step 304: Invoke at least one R service from one R service library according to the processing logic.

Step 305: Process big data through at least one invoked R service to acquire the processing results.
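Steps 301 to 305 can be sketched as a short Python pipeline. The sketch is illustrative only and rests on assumptions: ProcessingRequest, STORAGE, SERVICE_LIBRARY, LOGIC_TO_SERVICES, and handle_request are hypothetical names, the R services are modelled as plain Python functions, and the processing logic "mean_of_valid" is an invented example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Service = Callable[[List[float]], List[float]]


@dataclass
class ProcessingRequest:
    dataset: str  # which data to acquire from the storage device
    logic: str    # name of the requested processing logic


# Hypothetical stand-in for the storage device 20.
STORAGE: Dict[str, List[float]] = {
    "sensor_readings": [3.0, -1.0, 4.0, -1.5, 5.0],
}

# Hypothetical stand-in for the R service library (not real R services).
SERVICE_LIBRARY: Dict[str, Service] = {
    "drop_negative": lambda xs: [x for x in xs if x >= 0],
    "mean": lambda xs: [sum(xs) / len(xs)],
}

# Assumed mapping from a processing logic to the services that realize it.
LOGIC_TO_SERVICES: Dict[str, List[str]] = {
    "mean_of_valid": ["drop_negative", "mean"],
}


def handle_request(request: ProcessingRequest) -> List[float]:
    """Step 301 is the call itself; steps 302 to 305 follow in order."""
    data = STORAGE[request.dataset]                             # Step 302
    service_names = LOGIC_TO_SERVICES[request.logic]            # Step 303
    services = [SERVICE_LIBRARY[name] for name in service_names]  # Step 304
    for service in services:                                    # Step 305
        data = service(data)
    return data


result = handle_request(ProcessingRequest("sensor_readings", "mean_of_valid"))
print(result)
```

Note that the invoked services operate on the complete data set returned in step 302; no sample is drawn and no intermediate processing model is fitted.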

In a big data processing method provided in the embodiments of the present invention, after abigdataprocessing request used to request the processing of big data is received, big data is acquired from the storage device of big data according to the big data processing request , the processing logic which processes big data is determined according to the big data processing request, at least one R service is invoked from the R service library according to the processing logic which processes big data, big data is processed through at least one invoked R service to obtain the processing results. Thus, it can be seen that when big data is processed, at least one R service is invoked from the R service library according to the processing logic of big data to directly process big data. Big data processing is realized through R service invocations, without any necessity to select data samples from big data to form a processing model, and big data are directly processed as samples. Thus, the influence of the error of the processing model on the processing results of big data can be avoided and the accuracy of the processing results of big data is improved. Optionally, Step 304 in which at least one R service is invoked from one R service library canbe realized through the following sub-steps, as shown in Figure 4.

Sub-step 401: Determine at least one processing step which is comprised in the processing of big data according to the processing logic, and

Sub-step 402: Invoke the R services which are used to realize each of at least one processing step from the R service library in turn.

After the processing logic which processes big data is determined, the processing steps of big data can be determined according to the processing logic. Usually, the processing of big data requires a plurality of processing steps. For example, when big data X is processed, three processing steps, namely processing step A, processing step B, and processing step C, which need to be executed in turn to process big data X, are determined according to the processing logic which processes big data X. The R services which can realize a plurality of processing steps of different processing logics are included in the R service library, and after at least one processing step of big data is determined, the R services used to realize each processing step are invoked from the R service library in turn. For example, 100 R services are included in the R service library, wherein R service A can achieve the purpose of data processing in processing step A, R service B can achieve the purpose of data processing in processing step B, and R service C can achieve the purpose of data processing in processing step C. After it is determined that the processing of big data X requires processing step A, processing step B, and processing step C in turn, R service A, R service B, and R service C are invoked from the R service library in turn so that R service A realizes processing step A, R service B realizes processing step B, and R service C realizes processing step C.

A plurality of R services are included in the R service library and each R service can realize a processing step. The sequential combination of different processing steps can realize different processing logics to perform different types of processing for big data. Therefore, different types of processing can be performed for big data by changing the types of invoked R services and the invocation sequence so as to meet the requirements of different users and improve the applicability of the present big data processing method.
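By way of illustration only, the sequential invocation of sub-steps 401 and 402 can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the present invention: the R services are stood in for by plain Python functions, what each processing step does is invented for the example, and the names `R_SERVICE_LIBRARY`, `STEPS_FOR_LOGIC`, and `process` are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for R services A, B, and C. Real R services would
# execute R code; the operations below are assumptions for illustration.
def r_service_a(data: List[float]) -> List[float]:
    """Processing step A (assumed): drop negative values."""
    return [x for x in data if x >= 0]

def r_service_b(data: List[float]) -> List[float]:
    """Processing step B (assumed): scale so that the maximum becomes 1."""
    peak = max(data)
    return [x / peak for x in data]

def r_service_c(data: List[float]) -> List[float]:
    """Processing step C (assumed): round to two decimal places."""
    return [round(x, 2) for x in data]

# The R service library maps each processing step to the service realizing it.
R_SERVICE_LIBRARY: Dict[str, Callable[[List[float]], List[float]]] = {
    "A": r_service_a,
    "B": r_service_b,
    "C": r_service_c,
}

# Sub-step 401: a processing logic decomposes into ordered processing steps.
STEPS_FOR_LOGIC: Dict[str, List[str]] = {"logic_x": ["A", "B", "C"]}

def process(data: List[float], logic: str) -> List[float]:
    """Sub-step 402: invoke the service for each processing step in turn."""
    for step in STEPS_FOR_LOGIC[logic]:
        service = R_SERVICE_LIBRARY[step]
        data = service(data)
    return data
```

In this sketch, a different type of processing is obtained by registering a different step sequence in `STEPS_FOR_LOGIC`, without changing `process` itself.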
Optionally, in order to fully utilize the advantages of different development languages, at least one development language interface of the R service library can be acquired before sub-step 402. Each development language interface is used for a corresponding development language to invoke R services from the R service library. For example, a C programming language interface, a C++ interface, and a Java interface are acquired. The R services in the R service library can then be invoked through the C programming language by use of the C programming language interface, through C++ by use of the C++ interface, and through Java by use of the Java interface.

Accordingly, the R services which are used to realize each of at least one processing step are invoked from the R service library in turn in sub-step 402. As shown in Figure 5, sub-step 402 can specifically be realized through the following sub-steps:

Sub-step 501: For each of at least one processing step, determine the target R service used to realize the processing step in the R service library.

Sub-step 502: Through a preset development language corresponding to said target R service, utilize the development language interface corresponding to said development language to invoke said target R service from the R service library.

Different development languages have their own advantages. For example, the C++ programming language has a high computing speed, the C programming language can address memory overflow problems, and Java can realize cross-platform operations. Through the acquisition of the development language interfaces which the R service library provides for different development languages, a corresponding development language is defined for each R service according to the characteristics of that R service. For example, the invocation speed of R service 1 is low and the corresponding development language of R service 1 can be set to C++; R service 2 occupies a large memory size when it runs and the corresponding development language of R service 2 can be set to the C programming language. After R service 1 is determined as a target R service, the C++ interface is utilized to invoke R service 1 from the R service library through the development language C++; after R service 2 is determined as a target R service, the C language interface is utilized to invoke R service 2 from the R service library through the development language C. The advantages of different development languages are utilized and a plurality of R services are invoked from the R service library through a plurality of development languages to process big data. Thus, the processing speed of big data is increased and the types of big data which can be processed are broadened, that is to say, the applicability of the present big data processing method is improved.
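By way of illustration only, the lookup performed in sub-steps 501 and 502 can be sketched as two tables: one giving the preset development language of each R service, and one giving the interface for each language. This is a sketch under stated assumptions: `make_interface`, `INTERFACES`, `PRESET_LANGUAGE`, and `invoke_target` are hypothetical names, and the interfaces only record the call instead of actually crossing into C++, C, or R.

```python
from typing import Callable, Dict

Interface = Callable[[str, list], dict]

def make_interface(language: str) -> Interface:
    """Build a placeholder interface for one development language."""
    def invoke(service_name: str, data: list) -> dict:
        # A real interface would invoke the R service from `language`;
        # this sketch only reports which interface handled the call.
        return {"service": service_name, "via": language, "rows": len(data)}
    return invoke

# Development language interfaces acquired from the R service library.
INTERFACES: Dict[str, Interface] = {
    lang: make_interface(lang) for lang in ("C++", "C", "Java")
}

# Preset development language per R service, following the example above:
# R service 1 is slow to invoke (C++), R service 2 is memory-hungry (C).
PRESET_LANGUAGE: Dict[str, str] = {"R service 1": "C++", "R service 2": "C"}

def invoke_target(service_name: str, data: list) -> dict:
    """Sub-step 502: invoke the target service through its preset language."""
    language = PRESET_LANGUAGE[service_name]
    interface = INTERFACES[language]
    return interface(service_name, data)
```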

Specifically, the R service library itself provides development language interfaces corresponding to some development languages, for example, interfaces corresponding to C++, C, Java, PHP, BASIC, C#, and Python. The development language interfaces which the R service library does not provide can be expanded for the R service library through the corresponding development languages so that the R services in the R service library can be invoked through more types of development languages. For example, a Perl interface is created for the R service library through the development language Perl so that the R services in the R service library can be invoked through the development language Perl. In this way, R services can be invoked from the R service library through various development languages such as C++, C, Java, PHP, BASIC, C#, Python, and Perl. On the one hand, the requirements of different users can be satisfied and the applicability of the present big data processing method is improved. On the other hand, a plurality of R services are invoked through a plurality of development languages to process big data, the advantages of different development languages are fully utilized to execute each processing step of big data, and the processing speed and quality of big data are improved.
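By way of illustration only, expanding the R service library with an interface it does not provide can be sketched as a registry that accepts new entries. The class `InterfaceRegistry`, the placeholder `passthrough` interface, and the `BUILT_IN` table are hypothetical; a real Perl interface would have to be written in Perl against the R service library, which this sketch does not attempt.

```python
from typing import Callable, Dict, List

Interface = Callable[[str, list], list]

class InterfaceRegistry:
    """Maps a development language to the interface used to invoke R services."""

    def __init__(self, built_in: Dict[str, Interface]) -> None:
        self._interfaces = dict(built_in)

    def supports(self, language: str) -> bool:
        return language in self._interfaces

    def register(self, language: str, interface: Interface) -> None:
        """Expand the library with an interface it does not provide itself."""
        if self.supports(language):
            raise ValueError(f"{language} interface already exists")
        self._interfaces[language] = interface

    def languages(self) -> List[str]:
        return sorted(self._interfaces)

def passthrough(service_name: str, data: list) -> list:
    # Placeholder interface: a real one would call into the R service library.
    return data

# Interfaces which the R service library provides itself, as listed above.
BUILT_IN: Dict[str, Interface] = {
    lang: passthrough
    for lang in ("C++", "C", "Java", "PHP", "BASIC", "C#", "Python")
}

registry = InterfaceRegistry(BUILT_IN)
registry.register("Perl", passthrough)  # the expanded Perl interface
```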

Optionally, when big data is acquired from the storage device of big data in step 302, specifically, an acquisition request can be sent to the distributed file system in the storage device, and the distributed file system reads big data from the database in the storage device and returns the read big data after receiving the acquisition request.

Since big data usually has a large data size and the distributed file system is a general-purpose big data management system, the acquisition of big data to be processed through the distributed file system improves the applicability of the present big data processing method. Of course, big data can also directly be acquired from the database.
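By way of illustration only, the two acquisition paths described above (through the distributed file system, or directly from the database) can be sketched as follows. The classes `Database`, `DistributedFileSystem`, and `AcquisitionRequest`, the function `acquire`, and the `"dataset"` request field are all hypothetical, and in-memory dictionaries stand in for real storage.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class AcquisitionRequest:
    dataset: str  # name of the big data to be read (assumed request field)

class Database:
    """In-memory stand-in for the database in the storage device."""
    def __init__(self, tables: Dict[str, List[int]]) -> None:
        self._tables = tables

    def read(self, dataset: str) -> List[int]:
        return list(self._tables[dataset])

class DistributedFileSystem:
    """Stand-in for the distributed file system managing the database."""
    def __init__(self, database: Database) -> None:
        self._database = database

    def handle(self, request: AcquisitionRequest) -> List[int]:
        # Read from the database and return the data to the requester.
        return self._database.read(request.dataset)

def acquire(processing_request: dict,
            dfs: Optional[DistributedFileSystem] = None,
            database: Optional[Database] = None) -> List[int]:
    """Step 302: acquire big data through the DFS, else directly from the database."""
    name = processing_request["dataset"]
    if dfs is not None:
        return dfs.handle(AcquisitionRequest(name))
    if database is None:
        raise ValueError("either a distributed file system or a database is required")
    return database.read(name)
```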

As shown in Figure 6, one embodiment of the present invention provides a big data processing device 10, which comprises:

a request receiving module 101, used to receive a big data processing request, wherein the big data processing request is used to request the processing of big data,

a data acquisition module 102, used to acquire big data from the storage device of big data according to the big data processing request received by the request receiving module 101,

a request analysis module 103, used to determine the processing logic which processes big data according to the big data processing request received by the request receiving module 101,

a service invocation module 104, used to invoke at least one R service from one R service library according to the processing logic determined by the request analysis module 103, and

a data processing module 105, used to process big data acquired by the data acquisition module 102 through at least one R service invoked by the service invocation module 104 to obtain the processing results.

In one embodiment of the present invention, the request receiving module 101 can be used to execute step 301 in the embodiment of the above-mentioned method, the data acquisition module 102 can be used to execute step 302 in the embodiment of the above-mentioned method, the request analysis module 103 can be used to execute step 303 in the embodiment of the above-mentioned method, the service invocation module 104 can be used to execute step 304 in the embodiment of the above-mentioned method, and the data processing module 105 can be used to execute step 305 in the embodiment of the above-mentioned method.
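By way of illustration only, the data flow between modules 101 to 105 can be sketched as one class whose methods correspond to the modules. The class name `BigDataProcessingDevice`, the request fields `"dataset"` and `"logic"`, and the use of Python built-ins as stand-ins for R services are assumptions made for this sketch.

```python
from typing import Callable, Dict, List

class BigDataProcessingDevice:
    """Each method mirrors one module of the device shown in Figure 6."""

    def __init__(self, storage: Dict[str, list],
                 service_library: Dict[str, Callable],
                 logic_table: Dict[str, List[str]]) -> None:
        self.storage = storage                  # stand-in for storage device 20
        self.service_library = service_library  # stand-in for the R service library
        self.logic_table = logic_table          # processing logic -> service names

    def receive_request(self, request: dict) -> dict:             # module 101
        if "dataset" not in request or "logic" not in request:
            raise ValueError("request must name a dataset and a logic")
        return request

    def acquire_data(self, request: dict) -> list:                # module 102
        return list(self.storage[request["dataset"]])

    def analyse_request(self, request: dict) -> List[str]:        # module 103
        return self.logic_table[request["logic"]]

    def invoke_services(self, logic: List[str]) -> List[Callable]:  # module 104
        return [self.service_library[name] for name in logic]

    def process_data(self, data, services: List[Callable]):       # module 105
        for service in services:
            data = service(data)
        return data

    def handle(self, request: dict):
        """Run steps 301 to 305 in order."""
        request = self.receive_request(request)
        data = self.acquire_data(request)
        logic = self.analyse_request(request)
        services = self.invoke_services(logic)
        return self.process_data(data, services)
```

Keeping each module behind its own method means that, in this sketch, any one of them could be replaced (for example, a different acquisition path) without touching the others.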

Optionally, as shown in Figure 7,

the service invocation module 104 can comprise a procedure decomposition unit 1041 and a service invocation unit 1042, wherein the procedure decomposition unit 1041 is used to determine at least one processing step which is comprised in the processing of big data according to the processing logic, and

the service invocation unit 1042 is used to invoke the R services which are used to realize each of at least one processing step determined by the procedure decomposition unit 1041 from the R service library in turn.

In one embodiment of the present invention, the procedure decomposition unit 1041 can be used to execute sub-step 401 in the embodiment of the above-mentioned method, and the service invocation unit 1042 can be used to execute sub-step 402 in the embodiment of the above-mentioned method.

Optionally, as shown in Figure 8, the big data processing device further comprises an interface acquisition module 106. The interface acquisition module 106 is used to acquire at least one development language interface, and each development language interface is used for a corresponding development language to invoke an R service from the R service library.

The service invocation unit 1042 determines the target R service which is used to realize each processing step in the R service library for each of at least one processing step in turn, and through a preset development language corresponding to the target R service, utilizes the development language interface acquired by the interface acquisition module 106 and corresponding to a development language to invoke a target R service from the R service library.

In one embodiment of the present invention, the service invocation unit 1042 can be used to execute sub-steps 501 and 502 in the embodiment of the above-mentioned method.

Optionally, as shown in any of Figures 6 to 8,

the data acquisition module 102 is used to send an acquisition request to the distributed file system (for example, distributed file system 202) in the storage device (for example, storage device 20) and receive big data from the distributed file system, and the acquisition request is used to acquire big data from the database (for example, database 201) in the storage device through the distributed file system.

As shown in Figure 9, one embodiment of the present invention provides a big data processing device 10, which comprises at least one storage 901 and at least one processor 902, wherein

at least one storage 901 is used to store machine readable programs, and

at least one processor 902 is used to invoke machine readable programs stored in at least one storage 901 to execute the steps in the embodiment of the above-mentioned method.

The following will further describe the big data processing method provided in one embodiment of the present invention in detail in combination with the big data processing device and the big data processing system provided in the above-mentioned embodiments. As shown in Figure 10, the method can comprise the following steps:

Step 1001: Acquire at least one development language interface.

In one embodiment of the present invention, at least one development language interface provided by the R service library is acquired and each development language interface is used for a corresponding development language to invoke R services from the R service library. The development language interfaces which the R service library does not provide can be expanded for the R service library through corresponding development languages. For example, the R service library itself provides a C++ interface, a C programming language interface, a Java interface, a PHP interface, a BASIC interface, a C# interface, and a Python interface, wherein the C++ interface can be used for C++ to invoke R services in the R service library, the C programming language interface can be used for the development language C to invoke R services in the R service library, the Java interface can be used for Java to invoke R services in the R service library, the PHP interface can be used for PHP to invoke R services in the R service library, the BASIC interface can be used for BASIC to invoke R services in the R service library, the C# interface can be used for C# to invoke R services in the R service library, and the Python interface can be used for Python to invoke R services in the R service library. If the R service library does not provide an interface for the development language Perl, a Perl interface can be expanded for the R service library through the development language Perl so that the R services in the R service library can be invoked through the development language Perl.

Step 1002: Receive a big data processing request.

In one embodiment of the present invention, the big data processing device receives a big data processing request from a user terminal or storage device, wherein the big data processing request is used to request the processing of big data. For example, when a user needs to process big data, a big data processing request is sent to the big data processing device from the user terminal in use, or a timer is set in the storage device and a data processing request is sent to the big data processing device at certain intervals.

Step 1003: Acquire big data from the storage device according to the big data processing request.

In one embodiment of the present invention, after a big data processing request is received, big data to be processed is acquired from the storage device according to the big data processing request. For example, the big data processing device sends an acquisition request to the distributed file system according to the big data processing request, and the distributed file system acquires big data from the database under its management according to the received acquisition request and sends the acquired big data to the big data processing device.

Step 1004: Determine a processing logic according to the big data processing request.

In one embodiment of the present invention, after a big data processing request is received, the processing logic which processes big data is determined according to the big data processing request. For example, if a user needs to perform a mining analysis for big data, the processing logic which processes big data is determined to be mining analysis according to the received big data processing request.

Step 1005: Determine at least one processing step according to the processing logic.

In one embodiment of the present invention, after the processing logic which processes big data is determined, at least one processing step which is comprised in the processing of the big data is determined according to the processing logic. For example, after the processing logic which processes big data is determined to be mining analysis, three processing steps which are required for the mining analysis of big data in turn are determined according to the processing logic, and the three processing steps are processing step A, processing step B, and processing step C, in turn.

Step 1006: Determine the target R services used to realize each processing step in turn.

In one embodiment of the present invention, after at least one processing step of big data is determined, the target R service which can realize the processing step in the R service library is determined for each of the at least one processing step. For example, R service A which can realize processing step A is determined as the target R service of processing step A from the R service library, R service B which can realize processing step B is determined as the target R service of processing step B from the R service library, and R service C which can realize processing step C is determined as the target R service of processing step C from the R service library.

Step 1007: Determine the corresponding development language of each target R service.

In one embodiment of the present invention, the corresponding development language of each target R service is determined according to the triggering of a user, that is to say, the development language which invokes the target R service is designated by the user. Alternatively, a corresponding development language is predefined for each R service in the R service library according to the big data processing characteristics of each R service, and then the corresponding development language of each target R service is determined according to the predefined development language. The big data processing characteristics of R services include the invocation speed at which an R service is invoked to process big data, the memory occupied by an R service, and whether a cross-platform operation is required. For example, the corresponding development language of R service A is determined to be C++, the development language of R service B is determined to be C, and the development language of R service C is determined to be Java.
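By way of illustration only, the two ways of determining the development language in step 1007 (designation by the user, or a predefined choice based on the characteristics of the R service) can be sketched as follows. The names `ServiceProfile`, `predefined_language`, and `choose_language` are hypothetical. The order in which the characteristics are checked and the fallback to Python are assumptions of this sketch and are not specified in the description.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ServiceProfile:
    """Big data processing characteristics of one R service."""
    slow_invocation: bool = False   # low invocation speed
    high_memory: bool = False       # occupies a large memory size
    cross_platform: bool = False    # cross-platform operation required

def predefined_language(profile: ServiceProfile) -> str:
    """Pick a language from the characteristics (check order is an assumption)."""
    if profile.slow_invocation:
        return "C++"    # high computing speed
    if profile.high_memory:
        return "C"      # addresses memory overflow problems
    if profile.cross_platform:
        return "Java"   # cross-platform operation
    return "Python"     # assumed default, not taken from the description

def choose_language(profile: ServiceProfile,
                    user_choice: Optional[str] = None) -> str:
    """Step 1007: a user designation, if present, overrides the predefined choice."""
    return user_choice or predefined_language(profile)
```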

Step 1008: Utilize the corresponding development language interface to invoke each target R service through a corresponding development language.

In one embodiment of the present invention, after the corresponding development language of each target R service is determined, the corresponding development language interface of the corresponding development language of each target R service is utilized to invoke each target R service from the R service library in turn. For example, service A is first invoked from the R service library by utilizing the C++ interface of the development language C++, then service B is invoked from the R service library by utilizing the C programming language interface of the development language C, and finally service C is invoked from the R service library by utilizing the Java interface of the development language Java.

Step 1009: Process big data through the target R services in turn.

In one embodiment of the present invention, after each target R service is invoked from the R service library, big data is processed through the target R services in turn to obtain the processing results. For example, service A is first invoked through the development language C++ to process big data to obtain the primary processing results, then service B is invoked through the development language C to process the primary processing results to obtain intermediate processing results, and finally service C is invoked through the development language Java to process the intermediate processing results to obtain the final processing results of big data.

Step 1010: Output the processing results of big data.

In one embodiment of the present invention, after the processing of big data is completed, if the big data processing request is sent from the user terminal, the processing results will be sent to the user terminal; if the big data processing request is sent from the storage device, the processing results will be stored in the storage device, for example, the processing results will be stored in the distributed file system.
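By way of illustration only, steps 1009 and 1010 can be sketched as a chain in which each service consumes the output of the previous one, followed by routing of the final results according to where the request came from. The function names `run_chain` and `route_results`, the source labels `"user_terminal"` and `"storage_device"`, and the use of a list and a dictionary as stand-ins for the user terminal and the storage device are assumptions of this sketch.

```python
from typing import Any, Callable, Dict, List

def run_chain(data: Any, services: List[Callable[[Any], Any]]) -> List[Any]:
    """Step 1009: return the results after every service, in order.

    The first entry corresponds to the primary processing results, the
    last entry to the final processing results.
    """
    stages = []
    for service in services:
        data = service(data)
        stages.append(data)
    return stages

def route_results(results: Any, request_source: str,
                  user_terminal: List[Any], storage: Dict[str, Any]) -> None:
    """Step 1010: send results back to the party that issued the request."""
    if request_source == "user_terminal":
        user_terminal.append(results)     # stand-in for sending to the terminal
    elif request_source == "storage_device":
        storage["results"] = results      # stand-in for storing in the DFS
    else:
        raise ValueError(f"unknown request source: {request_source}")
```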

It should be noted that the steps in the embodiment shown in Figure 10 are used to describe the realization process of the big data processing method more clearly, and there is no absolute sequence between these steps in the practical service realization process. For example, step 1004 can be executed before step 1003 or can be executed together with step 1003.
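By way of illustration only, executing step 1003 and step 1004 together can be sketched with a thread pool, since neither step depends on the output of the other. The functions `acquire_data`, `determine_logic`, and `prepare`, and the request fields `"rows"` and `"logic"`, are hypothetical placeholders; `ThreadPoolExecutor` is from the Python standard library.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

def acquire_data(request: dict) -> List[int]:
    """Step 1003 placeholder: pretend to read the requested rows from storage."""
    return list(range(request["rows"]))

def determine_logic(request: dict) -> str:
    """Step 1004 placeholder: read the requested processing logic."""
    return request["logic"]

def prepare(request: dict) -> Tuple[List[int], str]:
    """Run steps 1003 and 1004 concurrently and wait for both to finish."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        data_future = pool.submit(acquire_data, request)
        logic_future = pool.submit(determine_logic, request)
        return data_future.result(), logic_future.result()
```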

The present invention further provides a machine readable medium, which stores instructions used to allow a machine to execute the big data processing method described in this document. Specifically, a system or device equipped with a storage medium can be provided. Software program codes which can realize the function in any of the above-mentioned embodiments are stored in the storage medium and the computer (or CPU or MPU) of the system or device can read and execute the program codes stored in the storage medium.

In this case, program codes read from the storage medium themselves can realize the function in any of the above-mentioned embodiments. Therefore, program codes and the storage medium where program codes are stored constitute a part of the present invention.

Embodiments of storage media used to provide program codes include floppy disk, hard disk, magneto-optical disk, compact disk (for example, compact disk read-only memory (CD-ROM), compact disk - recordable (CD-R), compact disk - rewritable (CD-RW), digital versatile disk - read-only memory (DVD-ROM), digital versatile disk - random access memory (DVD-RAM), digital versatile disk - rewritable (DVD-RW), and DVD+RW), magnetic tape, non-volatile memory card, and read-only memory (ROM). Optionally, program codes can be downloaded from the server computer over a communication network.

In addition, it should clearly be understood that the function of any of the above-mentioned embodiments can be realized not only by executing the program codes read out by a computer, but also by letting the operating system running on the computer complete a part or all of the practical operations through a program code-based instruction.

In addition, it should be understood that the program codes read out from a storage medium are written into the storage in the expansion board in a computer or are written into a storage in an expansion unit connected to the computer, and then the instruction based on program codes lets the CPU installed on the expansion board or expansion unit execute a part or all of the practical operations in order to realize the function of any of the above-mentioned embodiments.

It should be noted that not all steps or modules in the above-mentioned flowcharts and system structure diagrams are required, and some steps or modules can be omitted, depending on the actual requirements. The execution sequence of the steps is not fixed and can be adjusted as required. The system structures described in the above-mentioned embodiments can be physical structures or logical structures. That is to say, some modules may be realized by a single physical entity, or some modules may be realized by a plurality of physical entities or may jointly be realized by some components in a plurality of self-contained devices.

In the above-mentioned embodiments, hardware units can be realized mechanically or electrically. For example, a hardware unit can comprise a permanent dedicated circuit or logic (for example, special processor, FPGA, or ASIC) to complete the corresponding operations. A hardware unit can further comprise a programmable logic or circuit (for example, a general processor or other programmable processor) and can complete the corresponding operations through temporary software settings. The specific realizations (mechanical, or dedicated permanent circuit, or circuit which is temporarily set) can be determined on the basis of a consideration of cost and time.

The present invention is revealed and described in detail above in combination with the drawings and preferred embodiments. However, the present invention is not limited to these disclosed embodiments. On the basis of a plurality of the above-mentioned embodiments, those skilled in the art know that more embodiments of the present invention can be obtained by combining the technical means in the above-mentioned different embodiments, and all these embodiments should also fall within the scope of protection of the present invention.