Title:
SYSTEM AND METHOD FOR FACILITATING BEHAVIORAL ANALYSIS OF MALWARES
Document Type and Number:
WIPO Patent Application WO/2023/223352
Kind Code:
A1
Abstract:
The present invention discloses a system (500) and a method of facilitating behavioral analysis of malwares. The system (500) comprises a testbed (110), and a processing device (502). The testbed (110) comprises a plurality of devices (110a – 110n) of different configurations. Each device of the plurality of devices (110a – 110n) is configured to execute samples of the malwares. The processing device (502) is configured to receive one or more samples of a malware and one or more conditions for execution of the malware. The processing device (502) selects a device (110a) from the plurality of devices (110a – 110n) and executes the one or more samples of the malware on the device (110a), based on the one or more conditions. The processing device (502) stores results of execution of the one or more samples of the malware, including run-time activity of the malware observable across network, Operating System (OS), and hardware.

Inventors:
KARAPOOLA SAREENA (IN)
REBEIRO CHESTER DOMINIC (IN)
VEEZHINATHAN KAMAKOTI (IN)
Application Number:
PCT/IN2023/050462
Publication Date:
November 23, 2023
Filing Date:
May 17, 2023
Assignee:
INDIAN INST TECH MADRAS (IN)
International Classes:
G06F21/56; G06F11/00
Foreign References:
US20210117544A1 (2021-04-22)
US20150295945A1 (2015-10-15)
Attorney, Agent or Firm:
MAHUA RAY (IN)
Claims:
1. A method for facilitating behavioral analysis of a malware, the method comprising: receiving one or more samples of the malware and one or more conditions for execution of the malware; executing, based on the one or more conditions, the one or more samples of the malware on a real-world testbed (110) provided with internet connectivity, wherein the testbed comprises a heterogeneous hardware setup including multiple processing devices (110a - 110n) of different configurations for providing conducive conditions for malware execution; and collecting results of execution of the one or more samples of the malware, wherein the results of execution include precise and holistic run-time activity of the malware observable across network, Operating System (OS), and hardware, thereby providing an unbiased comprehensive view of real-world malware behavior and enabling researchers to quickly explore and compare detection mechanisms to fast-track malware research.

2. The method as claimed in claim 1, wherein the one or more conditions include a software platform on which the malware is to be executed and a time duration (t) for which the execution of the malware is to be observed.

3. The method as claimed in claim 1, wherein before executing the one or more samples of the malware, one or more processing devices (110a - 110n) capable of providing a real-world software and hardware platform for execution of the one or more samples of the malware are reset to a clean baseline state.

4. The method as claimed in claim 1, wherein the testbed (110) is connected to the internet through a multi-level firewall that allows the malware to communicate with the remote command and control servers associated with the malware, similar to real-world scenarios, while blocking the damaging effects of the attacks, if any, from permeating outside the testbed (110).

5. The method as claimed in claim 1, wherein the testbed (110) has a multi-level reset mechanism.

6. The method as claimed in claim 5, wherein a first level of the multi-level reset mechanism is a software based baseline-reset for restoring a physical machine of the heterogeneous hardware setup to its clean baseline state in a faster manner, and a second level of the multi-level reset mechanism is an image-reset for reloading a required OS from an image server which ensures a clean baseline state in scenarios where the malware may impair the machine from booting.

7. The method as claimed in claim 1, wherein the results of execution of the one or more samples of the malware are stored in a repository (504) storing details of a plurality of malwares.

8. A system (500) for facilitating behavioral analysis of a malware, the system (500) comprising: a testbed (110) comprising a plurality of devices (110a - 110n) of different configurations connected in a heterogeneous hardware setup for providing conducive conditions for malware execution, wherein each device of the plurality of devices (110a - 110n) is configured to execute samples of the malware; characterized in that, a processing device (502) configured to: receive one or more samples of a malware and one or more conditions for execution of the malware; select a device (110a) from the plurality of devices (110a - 110n) based on the one or more conditions; reset the selected device to a clean-baseline state to initiate analysis of malware; execute, based on the one or more conditions, the one or more samples of the malware on the device (110a); and store results of execution of the one or more samples of the malware, wherein the results of execution include a precise and holistic run-time activity of the malware observable across network, Operating System (OS), and hardware; two firewall devices configured in two levels to: connect the testbed to the Internet; allow malware to communicate with its remote command and control servers; and block all malware communications that may be detrimental to the infrastructure and the Internet; and a processing device configured to update the malware corpus, wherein the processing device scans the Internet to search for newly reported malware samples; downloads the newly reported malware samples to a malware corpus; and executes the newly reported malware samples on the testbed, thus ensuring timely analysis of live malware, wherein malware are analysed soon, before their remote command and control servers are blocked.

9. The system (500) as claimed in claim 8, wherein the one or more conditions include a software platform on which the malware is to be executed and a time duration (t) for which the execution of the malware is to be observed.

10. The system (500) as claimed in claim 8, wherein the plurality of devices (110a - 110n) are off-the-shelf devices.

11. The system (500) as claimed in claim 10, wherein the off-the-shelf devices are one or more of desktop computers, single-board computers, and embedded platforms with different operating systems.

12. The system (500) as claimed in claim 8, wherein a multi-level firewall is installed in the system (500) to manage a connection with internet.

13. The system (500) as claimed in claim 8, wherein the testbed (110) has a multi-level reset mechanism.

14. The system (500) as claimed in claim 13, wherein a first level of the multi-level reset mechanism is a software based baseline-reset for restoring a device of the heterogeneous hardware setup, and a second level of the multi-level reset mechanism is an image-reset for reloading a required OS from an image server.

15. The system (500) as claimed in claim 8, wherein the memory (504) is configured to store details of behavior of a plurality of malwares and the memory (504) is accessible by a user for retrieval of the results of the execution and the details of the behavior of the plurality of malwares.

Description:
SYSTEM AND METHOD FOR FACILITATING BEHAVIORAL ANALYSIS OF MALWARES

FIELD OF INVENTION

The present invention generally relates to malware detection. More specifically, the present invention facilitates behavioral analysis of samples of malware.

BACKGROUND

A program having malicious content is known as a malware. The malware poses varying levels of risk to system users. The ramifications of these attacks range from data breaches to business disruptions, reputation damage, financial loss, and sabotage of critical infrastructures. Malware analysis may be broadly classified into static analysis and dynamic analysis. In static analysis, contents of a malware may be examined to extract signatures and detect maliciousness of the malware. However, static signatures can be easily thwarted by techniques such as packing and obfuscation.

In dynamic analysis, maliciousness is detected using the run-time behavior of the malware. Dynamic analysis adopts either an active technique or a passive technique for analysis of the malware. The active technique repeatedly instruments the malware before execution to explore all execution paths in the malware. However, the instrumentation done by the active technique may be detected by some evasive malware, which may then choose not to execute. The passive technique merely executes the malware and observes behavioral trails of the malware. Thus, the passive technique of behavioral analysis is immune to the evasive malware. Artificial Intelligence (AI) driven run-time behavioral analysis is generally used in defence against evasive malwares. Data models developed using AI techniques offer a suitable mechanism to detect anomalies. AI techniques require availability of ground-truth of malware behavior. However, collecting a precise representation of real-world malware behavior in a laboratory setting is challenging.

Currently, research in malware detection adopts two approaches to address the demand for live samples of malware. In a first approach, analysis done by Anti-virus (AV) engines provides an outcome that includes an inference on the maliciousness of the samples, along with signatures and reports obtained from the analysis. However, the signatures and reports are limited by the capabilities of the available AV engines. In a second approach, live malware samples are provided to researchers for execution, and subsequent analysis is performed. However, the second approach has multiple limitations. One limitation is that distribution of live samples of a malware is highly vulnerable to accidental execution. Any leakage of the live samples of the malware can lead to potential misuse, warranting policies for ensuring accountability. Another limitation is that services providing live malware samples are restricted and monopolized by private enterprises, thus incurring an excessive cost for a regular supply of new samples of malwares. Another limitation is that executing malwares and detecting real-world behavior in a laboratory setting is challenging. Researchers may prioritize safety when analysing the malwares. However, evasive malware looks for real-world conditions before revealing offensive behavior. Thus, the evasive malware can easily identify artifacts of test environments and choose not to execute. Consequently, data collected after execution of the evasive malware in a virtual test environment does not represent offensive behavior.

Further, a large-scale evaluation of malwares is also challenging. Specifically, malware execution impacts a system state in an analysis framework.

Therefore, there is a need for an efficient method for safe and timely execution and behavioral analysis of samples of malwares.

OBJECTS OF THE INVENTION

An object of the present invention is to provide a method and a system to facilitate precise and comprehensive behavioral analysis of samples of malwares.

Another object of the present invention is to provide a model or framework for safe execution of the samples of the malwares to facilitate behavior-as-a-service.

Another object of the present invention is to provide a model that enables a user to submit hashes of the samples of the malware for analysis and retrieve real-world run-time behavioral trails obtained by execution of the samples of the malwares.

Still another object of the present invention is to provide a testbed ensuring close-to-real-world configuration with network and internet connectivity for execution of the samples of the malwares.

Yet another object of the present invention is to provide the framework for timely and large-scale execution of the samples of the malwares.

SUMMARY OF THE INVENTION

The summary is provided to introduce aspects related to a system and method for safely and precisely facilitating behavioral analysis of samples of malwares, and the aspects are further described below in the detailed description. This summary is not intended to identify essential features of the claimed subject matter nor is it intended for use in determining or limiting the scope of the claimed subject matter.

In an embodiment, a method for facilitating behavioral analysis of samples of a malware may comprise receiving one or more samples of malware and one or more conditions for execution of the malware. The one or more samples of malware may be executed on a testbed provided with internet connectivity based on the one or more conditions. The testbed comprises a heterogeneous hardware setup including multiple processing devices of different configurations for providing conducive conditions for malware execution. Further, results of execution of the one or more samples of malware may be collected. The results of execution include run-time activity of the malware observable across network, Operating System (OS), and hardware. The results of execution of the one or more samples of the malware are stored in a repository storing details of a plurality of malwares.

In one aspect, the one or more conditions include a software platform on which the malware is to be executed and a time duration (t) for which the execution of the malware is to be observed. Before executing the one or more samples of the malware, one or more processing devices capable of providing the software platform for execution of the one or more samples of the malware are reset to a clean baseline state. The testbed is connected to the internet through a multi-level firewall that allows the malware to communicate with a server associated with the malware, while blocking attacks from permeating outside the testbed. The testbed has a multi-level reset mechanism. A first level of the multi-level reset mechanism is a software based baseline-reset for restoring a physical machine of the heterogeneous hardware setup and a second level of the multi-level reset mechanism is an image-reset for reloading a required OS from an image server.

In another aspect, the present invention discloses a system for facilitating behavioral analysis of a malware. The system comprises a testbed, a processing device, and a memory. The testbed comprises a plurality of devices of different configurations connected in a heterogeneous hardware setup for providing conducive conditions for malware execution. Each device of the plurality of devices is configured to execute samples of the malwares. The processing device is configured to receive one or more samples of a malware and one or more conditions for execution of the malware. Further, the processing device selects a device from the plurality of devices and executes the one or more samples of the malware on the device, based on the one or more conditions. Furthermore, the processing device stores results of execution of the one or more samples of the malware. The results of execution include run-time activity of the malware observable across network, Operating System (OS), and hardware.

In one aspect, the one or more conditions include a software platform on which the malware is to be executed and a time duration (t) for which the execution of the malware is to be observed.

The plurality of devices are off-the-shelf devices. The off-the-shelf devices are one or more of desktop computers, single-board computers, and embedded platforms with different operating systems. A multi-level firewall is installed in the system to manage a connection with the internet.

The testbed has a multi-level reset mechanism. A first level of the multi-level reset mechanism is a software based baseline-reset for restoring a device of the heterogeneous hardware setup, and a second level of the multi-level reset mechanism is an image-reset for reloading a required OS from an image server. The memory is configured to store details of behavior of a plurality of malwares and the memory is accessible by a user for retrieval of the results of the execution and the details of the behavior of the plurality of malwares.

Other aspects and advantages of the invention will become apparent from the following description, taken in conjunction with the accompanying drawings, illustrating by way of example the principles of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings constitute a part of the description and are used to provide further understanding of the present invention. Such accompanying drawings illustrate the embodiments of the present invention which are used to describe the principles of the present invention. The embodiments are illustrated by way of example and not by way of limitation in the figures of the accompanying drawings in which like references indicate similar elements. It should be noted that references to “an” or “one” embodiment in this invention are not necessarily made to the same embodiment, and they mean at least one. In the drawings:

Fig. 1 illustrates a framework for facilitating behavioral analysis of malwares, in accordance with an embodiment of the present invention;

Fig. 2 illustrates a process flow of interaction with a user interface provided by a front-end of a framework to facilitate behavioral analysis of samples of malwares, in accordance with an embodiment of the present invention;

Fig. 3 illustrates a process flow of interaction with a back-end of a framework to facilitate behavioral analysis of samples of malwares, in accordance with an embodiment of the present invention;

Fig. 4 illustrates a process flow of execution of samples of malwares and collection of results of behavioral data, in accordance with an embodiment of the present invention;

Fig. 5 illustrates a block diagram of a system for facilitating behavioral analysis of malwares, in accordance with an embodiment of the present invention; and

Fig. 6 illustrates a process flow of a multi-level reset mechanism in a real-world testbed, in accordance with an embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

The detailed description set forth below in connection with the appended drawings is intended as a description of various embodiments of the present invention and is not intended to represent the only embodiments in which the present invention may be practiced. Each embodiment described in this invention is provided merely as an example or illustration of the present invention, and should not necessarily be construed as preferred or advantageous over other embodiments. The detailed description includes specific details for the purpose of providing a thorough understanding of the present invention. However, it will be apparent to those skilled in the art that the present invention may be practiced without these specific details.

The present invention proposes a system and a method for safely and precisely facilitating behavioral analysis of malwares. Fig. 1 illustrates a framework 100 for facilitating behavioral analysis of malwares. The framework 100 comprises a front-end 102, a dataset corpus 104, and a back-end 106. The front-end 102 may be configured to receive a request from a user and provide results corresponding to the request of the user. For example, the front-end 102 may provide an Application Programming Interface (API) for users to submit program hashes or files and to deliver results corresponding to the program hashes or files, retrieved from the dataset corpus 104. The dataset corpus 104 may be a collection of behavioral data of a plurality of malwares. In a case where the behavioral data corresponding to the program hashes or files requested by the user is not present in the dataset corpus 104, the front-end 102 submits the request to the back-end 106. For example, the front-end 102 provides the request along with data received from the user. The back-end 106 may comprise a buffer 108 to supply the samples of the malware to a real-world testbed 110. The back-end 106 executes the samples of the malware on the real-world testbed 110 and stores the behavioral data of the malware in the dataset corpus 104. In one implementation, the behavioral data of the malware may be transmitted to the user.

In one implementation, the framework may provide simultaneous capture of three artifacts including network, Operating System (OS), and hardware behaviour, for analysis of behavior of malwares. These artifacts may be predominantly used for their effectiveness and decreased overheads in dynamic malware detection. A network trail may capture malware communications, an OS trail may present system calls made by the malware, and a hardware trail may include the micro-architectural events triggered during malware execution.
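The three trails can be pictured as one record per executed sample. The following is a minimal sketch, assuming a simple in-memory representation; the class name `BehaviorRecord`, its field names, and the example event name are illustrative and do not appear in the specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BehaviorRecord:
    """Hypothetical container for the three artifacts captured for one sample."""

    sample_hash: str
    # Network trail: captured malware communications (here, opaque summaries).
    network_trail: List[str] = field(default_factory=list)
    # OS trail: system calls made by the malware.
    os_trail: List[str] = field(default_factory=list)
    # Hardware trail: micro-architectural event counts (event name -> count).
    hardware_trail: Dict[str, int] = field(default_factory=dict)

    def is_empty(self) -> bool:
        """True when none of the three artifacts recorded any activity."""
        return not (self.network_trail or self.os_trail or self.hardware_trail)


# Example: a record with one system call and one (made-up) hardware event.
record = BehaviorRecord(sample_hash="<hash>")
record.os_trail.append("open")
record.hardware_trail["cache-misses"] = 3
```

Keeping the three trails in one record reflects the simultaneous capture described above; a real deployment would likely store far richer data per trail.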

Fig. 2 illustrates a process flow of interaction with a user interface provided by the front-end 102 of the framework 100. In an aspect, the API (i.e. GetDataForHash) may be used to submit a program hash and a request for obtaining corresponding behavioral data. In such case, the user input may comprise a hash h of the program. At step 202, data related to the program hash h requested by the user may be checked in the dataset corpus 104. At step 204, it is determined whether the data related to the program hash h is present in the dataset corpus 104 or not. If the data is not found to be present in the dataset corpus 104, an error message may be provided to the user, at step 206. Alternatively, when the data is found to be present in the dataset corpus 104, behavioral data corresponding to the program hash h may be extracted from the dataset corpus 104, at step 208. For example, when it is determined that the behavioral data corresponding to the program hash h is present in the dataset corpus 104, the front-end extracts the behavioral data from the dataset corpus 104. Successively, the behavioral data may be provided to the user through the interface of the front-end 102, at step 210.
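The hash lookup of steps 202 to 210 may be sketched as follows. This is an illustration only, assuming the dataset corpus 104 behaves like a dictionary keyed by program hash; the function name `get_data_for_hash` and the exception `HashNotFoundError` are hypothetical and are not the API's actual signature.

```python
from typing import Dict


class HashNotFoundError(LookupError):
    """Signals the error path of step 206."""


def get_data_for_hash(corpus: Dict[str, dict], h: str) -> dict:
    """Return behavioral data for program hash h, or raise if it is absent."""
    # Steps 202-204: check whether the corpus holds data for h.
    if h not in corpus:
        # Step 206: report an error to the user.
        raise HashNotFoundError(f"no behavioral data for hash {h!r}")
    # Steps 208-210: extract the data and hand it back to the caller.
    return corpus[h]
```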

In another aspect, the API (i.e. GetDataForProgram) may be used to submit a program executable and request for corresponding behavioral data. An input for requesting the behavioral data of the program executable may include samples of malware executable p, a platform f (e.g. Linux, Windows, Android, etc.) on which the malware needs to be executed, and optionally, a time duration t for which the execution of the malware is to be observed. A format of the input may be present in a form of (program p, platform f, time t). At step 212, the back-end 106 may be invoked for execution of the samples of malware. For example, after receiving the input from the user, the front-end 102 raises a request to the back-end to execute the samples of malware requested by the user. The back-end 106 executes the samples of malware requested by the user on the platform f for the time duration t. In one implementation, a default time duration of 2 minutes may be set, which is considered to be sufficient to elicit most of the malicious behaviors of the malware. Thus, if the time duration t is not defined by the user, the back-end 106 may execute the samples of malware for the default time duration i.e. 2 minutes. The back-end 106 collects the behavioral data based on the execution of the samples of malware. The behavioral data of the malware may be saved in the dataset corpus 104. Successively, the behavioral data of the malware may be provided to the user, at step 214.
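A minimal sketch of the (program p, platform f, time t) request with the 2-minute default is given below. The callable `backend_execute` stands in for the back-end 106 and is hypothetical, as is the choice to key the corpus by the SHA-256 digest of the program; the specification does not state which hash is used.

```python
import hashlib
from typing import Callable, Dict, Optional

DEFAULT_DURATION_S = 120  # the 2-minute default stated in the description

# Hypothetical back-end hook: (program bytes, platform, seconds) -> behavioral data.
BackendExecute = Callable[[bytes, str, int], dict]


def get_data_for_program(
    program: bytes,
    platform: str,
    backend_execute: BackendExecute,
    corpus: Dict[str, dict],
    duration_s: Optional[int] = None,
) -> dict:
    """Execute program p on platform f for time t and store the result."""
    # Fall back to the default when the user gives no time duration t.
    t = DEFAULT_DURATION_S if duration_s is None else duration_s
    # Step 212: invoke the back-end to execute the sample.
    data = backend_execute(program, platform, t)
    # Save the behavioral data in the corpus (keying by SHA-256 is an assumption).
    corpus[hashlib.sha256(program).hexdigest()] = data
    # Step 214: provide the behavioral data to the user.
    return data
```

Passing the back-end as a callable keeps the sketch independent of how the testbed is actually driven.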

In one aspect, the user may upload multiple files for collection of the behavioral data. In such scenario, the API (i.e. GetDataForFolder) may be used to submit a folder of malwares, along with the platform f and time t for executing each sample present in the folder. A format of the input may be present in a form of (program folder, platform f, time t). The front-end 102 may invoke the back-end to execute and collect behavioral data of each sample of the folder and return the behavioral data to the user.
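The folder variant amounts to applying the same per-sample request to every file. The sketch below assumes a local directory and a hypothetical `run_sample` callable that executes one sample; neither the function name `get_data_for_folder` nor the result layout is taken from the specification.

```python
from pathlib import Path
from typing import Callable, Dict, Optional

# Hypothetical per-sample hook: (program bytes, platform, seconds or None) -> data.
RunSample = Callable[[bytes, str, Optional[int]], dict]


def get_data_for_folder(
    folder: Path,
    platform: str,
    run_sample: RunSample,
    duration_s: Optional[int] = None,
) -> Dict[str, dict]:
    """Run every regular file in folder and return results keyed by file name."""
    results: Dict[str, dict] = {}
    # Sort for a deterministic order; the specification does not prescribe one.
    for path in sorted(folder.iterdir()):
        if path.is_file():
            results[path.name] = run_sample(path.read_bytes(), platform, duration_s)
    return results
```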

Fig. 3 illustrates a process flow of interaction with the back-end 106 of a framework to facilitate the behavioral analysis of the samples of malware. The back-end 106 may comprise an update engine 304, the buffer 108, a test engine 302, and the real-world testbed 110. Algorithm 1 as provided below describes working of the back-end 106 of the framework 100.

1.  begin
2.    while true do
        /* Update Engine */
3.      Crawl online repositories for newly reported samples
4.      if updates are available then
5.        NewHashList ← Hashes of newly reported malware samples
6.        for h ∈ NewHashList do
7.          p ← Download hash h
8.          Supply-of-Samples ← p
            /* Test Engine */
9.          Data-Corpus ← ExecuteCollect(p)
10.     Check for requests from front-end
11.     if requests queued from front-end then
12.       ListOfPrograms ← List of programs submitted by user
13.       for p ∈ ListOfPrograms do
14.         Supply-of-Samples ← p
            // Test Engine
15.         Data-Corpus ← ExecuteCollect(p)

Algorithm 1
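One pass of the loop in Algorithm 1 may be sketched in Python as follows. This is a non-authoritative illustration: the crawler, downloader, and test engine are passed in as hypothetical callables, the buffer 108 is modelled as a `deque`, and the corpus is a plain dictionary.

```python
from collections import deque
from typing import Callable, Deque, Dict, Iterable, Tuple


def backend_iteration(
    crawl_new_hashes: Callable[[], Iterable[str]],
    download: Callable[[str], bytes],
    execute_collect: Callable[[bytes], dict],
    frontend_queue: Deque[Tuple[str, bytes]],
    buffer: Deque[bytes],
    corpus: Dict[str, dict],
) -> None:
    """One pass of the 'while true' body of Algorithm 1."""
    # Update engine, steps 3-8: fetch newly reported samples into the buffer.
    for h in crawl_new_hashes():
        buffer.append(download(h))
        # Test engine, step 9: execute the buffered sample and store its data.
        corpus[h] = execute_collect(buffer.popleft())

    # Steps 10-15: serve programs queued by the front-end.
    while frontend_queue:
        key, program = frontend_queue.popleft()
        buffer.append(program)
        corpus[key] = execute_collect(buffer.popleft())
```

A long-running service would call `backend_iteration` repeatedly, which corresponds to the outer `while true` of the algorithm.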

As evident from Algorithm 1, the update engine 304 may periodically search for a newly reported malware in public malware repositories and may download the newly reported malware in the buffer 108 (indicated through steps 3-8 of the Algorithm 1). Further, the test engine 302 may execute samples of the newly reported malware on the real-world testbed 110. The test engine 302 may collect behavioral data of the newly reported malware on artifacts such as network, operating system (OS), and hardware. The test engine 302, by default, may execute and collect the behavioral data for a pre-defined time duration. In one implementation, the default time duration may be 2 minutes, which is considered to be sufficient to obtain most of the malicious behaviors of malware. Further, the behavioral data may be stored in the dataset corpus 104 (indicated through step 9 of the Algorithm 1).

In an implementation, the back-end 106 may receive a request to execute samples of a malware from the front-end 102 of the framework 100. The back-end 106 may extract the samples of the malware requested by a user. The samples of the malware may be temporarily stored in the buffer 108 (indicated through step 14 of the Algorithm 1). The buffer 108 may provide the samples of the malware to the test engine 302 for execution of the samples of the malware. The test engine 302 may execute the samples of the malware on the real-world testbed 110 and may collect the behavioral data of the malware. Execution of the samples of the malware and collection of the behavioral data of the malware is described successively in detail, with reference to Fig. 4. The samples of the malware may be executed for a time duration specified by the user. When the time duration is not specified by the user, the samples of the malware may be executed for the default time duration. Further, the behavioral data may be provided to the front-end 102 of the framework 100 for presenting the behavioral data of the malware to the user. In one implementation, the behavioral data may be stored in the dataset corpus 104 (indicated through step 15 of Algorithm 1). Thus, the back-end 106 ensures timely execution of a regular feed of the newly reported malware, for updating the dataset corpus 104.

Fig. 4 illustrates a process flow of execution of the samples of the malware and collection of results of the behavioral data. At step 402, all devices of the real-world testbed 110 may be reset to clean baseline states. At step 404, an appropriate device of the real-world testbed 110 may be selected as a profiler to execute the samples of the malware. The profiler may be software installed on devices of the real-world testbed 110 to collect behavioral data on artifacts such as network, operating system (OS), and hardware. Further, the samples of the malware may be provided to the profiler for execution of the samples of the malware. At step 406, collection of the behavioral data from the real-world testbed 110 may be initiated. For example, a corresponding tool to capture each artifact such as network, OS, and hardware may be started in the profiler. In an implementation, a process-monitoring tool may be started in the profiler to capture OS behavioral data. At step 408, the samples of the malware may be executed on the profiler for a specific time period. The specific time period may be provided by the user or may be pre-defined. At step 410, execution of the samples of the malware and collection of results of the execution may be stopped after completion of the specific time period. At step 412, the test engine 302 may extract the results of the execution and store the results into the dataset corpus 104.
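The ordering of steps 402 to 412 may be sketched as below. The class `RecordingDevice` is a hypothetical stand-in that only logs the calls it receives; it does not execute anything. The method names (`reset`, `start_capture`, `launch`, `stop`, `extract`) and the injectable `sleep` parameter are assumptions made for illustration.

```python
import time
from typing import Callable, Dict, List, Sequence

ARTIFACTS = ("network", "os", "hardware")


class RecordingDevice:
    """Hypothetical stand-in for a testbed device; it only logs the calls made."""

    def __init__(self, platform: str) -> None:
        self.platform = platform
        self.log: List[str] = []

    def reset(self) -> None:
        self.log.append("reset")

    def start_capture(self, artifact: str) -> None:
        self.log.append(f"capture:{artifact}")

    def launch(self, sample: bytes) -> None:
        self.log.append("launch")

    def stop(self) -> None:
        self.log.append("stop")

    def extract(self) -> dict:
        return {"events": list(self.log)}


def execute_collect(
    devices: Sequence[RecordingDevice],
    platform: str,
    sample: bytes,
    duration_s: float,
    corpus: Dict[str, dict],
    key: str,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Run one sample following the order of steps 402-412."""
    for device in devices:          # step 402: reset all devices
        device.reset()
    # Step 404: select a device matching the requested platform as the profiler.
    profiler = next(d for d in devices if d.platform == platform)
    for artifact in ARTIFACTS:      # step 406: start one capture tool per artifact
        profiler.start_capture(artifact)
    profiler.launch(sample)         # step 408: execute the sample
    sleep(duration_s)               # observe for the specified time period
    profiler.stop()                 # step 410: stop execution and collection
    results = profiler.extract()    # step 412: extract and store the results
    corpus[key] = results
    return results
```

In this sketch, `next(...)` raises `StopIteration` when no device matches the platform; a production version would presumably report that case to the caller explicitly.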

Fig. 5 illustrates a block diagram of a system 500 for facilitating behavioral analysis of malwares. The system comprises a processing device 502, a memory 504, and the real-world testbed 110. The processing device 502, the memory 504, and the real-world testbed 110 may be communicatively coupled with each other.

The real-world testbed 110 may consist of a plurality of devices of different configurations such as a first device 110a, a second device 110b, and so on till an Nth device 110n. Each device of the plurality of devices may execute samples of malwares. The plurality of devices 110a - 110n may be off-the-shelf devices such as desktop computers, single-board computers, and embedded platforms. Each device may operate using an operating system (OS) such as Linux, Mac, Windows, and others. In an implementation, the real-world testbed 110 may comprise a heterogeneous hardware setup such as Raspberry Pi, Intel x86 Atom, Quark, i5, and i7 machines configured with different OS. The heterogeneous hardware setup ensures real-world conditions for execution of the samples of the malwares. Some evasive malwares may search for real-world conditions before revealing malicious behavior. In such cases, the real-world testbed 110 provides a heterogeneous network of physical machines that may be employed as a profiler to execute malware. Table 1 provides details of the hardware and the OS used in the real-world testbed 110.

Table 1

The processing device 502 may manage the plurality of devices 110a - 110n of the real-world testbed 110. For example, the processing device 502 may control a device of the plurality of devices 110a - 110n to execute the samples of the malware. The memory 504 may store results of execution of the samples of the malware and details of behavior of the samples of the malware.

In operation, the processing device 502 may receive one or more samples of a malware and one or more conditions for execution of the malware. The one or more samples of the malware may be received from either public malware repositories or from a user. The one or more conditions may include a software platform on which the malware is to be executed and a time duration (t) for which the execution of the malware is to be observed. The time duration (t) may be an optional condition received from the user. In case the time duration is not provided by the user, the samples of the malware may be executed for a pre-defined time duration. The pre-defined time duration may be defined based on a type of a device on which the samples of the malware are to be executed.

The processing device 502 may select a first device 110a from the plurality of devices 110a - 110n based on one or more conditions. For example, the processing device 502 may extract information regarding the software platform on which the malware is to be observed, from the one or more conditions. Further, the processing device 502 may select the first device 110a from the plurality of devices 110a - 110n by mapping the software platform with details of the plurality of devices 110a - 110n mentioned in Table 1.
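Mapping the requested software platform onto the device inventory may be sketched as a simple lookup. The inventory below is purely illustrative, since the rows of Table 1 are not reproduced in this text; the names `DeviceEntry`, `INVENTORY`, and `select_device`, and the placeholder hardware labels, are assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class DeviceEntry:
    device_id: str
    hardware: str
    os: str


# Illustrative inventory only; the actual rows of Table 1 are not reproduced here.
INVENTORY = (
    DeviceEntry("110a", "example-board", "Linux"),
    DeviceEntry("110b", "example-desktop", "Windows"),
)


def select_device(devices: Sequence[DeviceEntry], platform: str) -> Optional[DeviceEntry]:
    """Return the first device whose OS matches the requested software platform."""
    wanted = platform.strip().lower()
    for device in devices:
        if device.os.lower() == wanted:
            return device
    return None
```

For example, `select_device(INVENTORY, "windows")` returns the entry with `device_id` "110b", while a platform with no matching device yields `None`.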

The processing device 502 may store results of execution of the samples in the memory 504. The results of execution may include run-time activity of the malware observable across network, Operating System (OS), and hardware. Thus, the memory 504 may be continuously updated by storing behavioral data of newly identified malware observed across network, OS, and hardware. The user may retrieve the behavioral data of the plurality of malwares from the memory 504.

The system 500 may use a dedicated network connection for internet connectivity. The dedicated network connection may be managed by a multi-level firewall. The multi-level firewall may allow the malware to communicate with a server for execution of one or more samples of the malware. In one implementation, a two-level firewall may be used for managing the dedicated network connection between the system 500 and the Internet. The malware may need to compromise multiple firewalls to escape the system 500. Thus, devices present outside the real-world testbed 110 may be protected from the malwares executed on the real-world testbed 110. Similarly, external malware may need to compromise multiple firewalls to attack the real-world testbed 110.

Fig. 6 illustrates a process flow of a multi-level reset mechanism in a real-world testbed. For execution of each sample of malwares, the real-world testbed may be reset to a clean initial state or a baseline state. In one implementation, the real-world testbed may employ a multi-level reset mechanism (such as the two-level reset mechanism described with reference to Fig. 6). Whenever samples of a malware are received by the real-world testbed, all devices of the real-world testbed may be reset to the clean initial state or the baseline state, at steps 602 and 604. After resetting the devices of the real-world testbed, the samples of the malwares may be executed on the real-world testbed, at step 606. After execution, all the devices of the real-world testbed may be reset at a first level of the multi-level reset mechanism, at step 608. The first level of the multi-level reset mechanism may be a software-based baseline-reset for restoring a device of the real-world testbed. The first level of the multi-level reset mechanism provides a quick, low-overhead baseline-reset by restarting all devices of the real-world testbed using remote commands. When execution of the samples of the malwares makes the real-world testbed inaccessible remotely, the real-world testbed may be reset by using smart power switches. In some cases, the execution of the samples of the malwares may cause critical faults in the real-world testbed such that the real-world testbed cannot be reset to the clean initial state. In such cases, all devices of the real-world testbed may be reset at a second level of the multi-level reset mechanism, at steps 610 and 612. The second level of the multi-level reset mechanism is an image-reset for reloading a required OS from an image server 614.
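The escalation order of the reset mechanism may be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation of Fig. 6: the function reset_device and its four callables are hypothetical, and the way a device is judged to be back in the baseline state is left abstract.

```python
from typing import Callable


def reset_device(
    soft_reset: Callable[[], bool],
    power_cycle: Callable[[], bool],
    image_reset: Callable[[], bool],
    is_clean: Callable[[], bool],
) -> str:
    """Try progressively heavier resets and report which one restored the baseline.

    Each reset callable returns True if the reset could be issued;
    is_clean() reports whether the device is back in the baseline state.
    """
    # Software-based baseline-reset: restart the device using remote commands.
    if soft_reset() and is_clean():
        return "remote-restart"
    # Device not reachable remotely: restart it through the smart power switch.
    if power_cycle() and is_clean():
        return "power-switch"
    # Critical fault: reload the required OS from the image server.
    if image_reset() and is_clean():
        return "image-reset"
    raise RuntimeError("device could not be restored to the baseline state")
```

Trying the cheapest reset first reflects the stated aim of a quick, low-overhead baseline-reset, with the slower image-reset reserved for cases where the lighter options fail.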

Modern malwares are evasive and look for real-world conditions before revealing offensive behavior, thus remaining dormant in virtualized analysis environments. Such evasive malwares can easily identify artifacts of test environments and choose not to execute. Consequently, data collected from execution of the evasive malware in such environments does not represent its offensive behavior. The present invention proposes a system that provides real-world conditions and Internet connectivity to ensure that the malware continues execution beyond the conditional checks for evasion in its code. The present invention also proposes a system that allows simultaneous capture of three artifacts, i.e., network, OS, and hardware behavior of a malware.

Table 2 describes analysis time taken by the system to execute and collect behavioral data for the samples of the malware.

Table 2

As described in Table 2, the system proposed by the present invention is compared with a public testbed such as DETER. It is observed from Table 2 that the multi-level reset mechanism used in the present system enables 58.6% faster reloads compared to DETER. The shorter time taken for state resets enables a greater number of sample analyses per day in the present system (255) than in DETER (154).
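The daily throughput figures may be related to an average time per sample with simple arithmetic. The following sketch assumes, purely for illustration, that samples are analysed back to back over a 24-hour day on a single pipeline; this assumption is not stated in the specification, and Table 2 remains the authoritative source for the measured times.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def seconds_per_sample(samples_per_day: int) -> float:
    """Average wall-clock time per sample, assuming back-to-back analysis."""
    return SECONDS_PER_DAY / samples_per_day


present_system = seconds_per_sample(255)   # figure quoted for the present system
deter = seconds_per_sample(154)            # figure quoted for DETER
throughput_gain = 255 / 154 - 1            # relative increase in samples per day

print(f"present system: {present_system:.1f} s per sample")
print(f"DETER: {deter:.1f} s per sample")
print(f"throughput gain: {throughput_gain:.1%}")
```

Under this assumption, the gain in samples per day is a separate quantity from the 58.6% figure quoted for reloads, since each analysis also includes the observation period itself.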

In one implementation, the memory of the system presented in the invention holds 2.7 TB of data and 22M behavioral snapshots of 10,432 samples of the malwares. The memory further includes 7M network packets, 11.3M operating system call traces, and 3.3M micro-architectural events from hardware for 8 classes of malware.

Table 3 describes distribution of samples of malware collected in a growing dataset of a memory of the system.

Table 3

The present invention provides a system and a method for safe execution of samples of malwares by providing a real-world testbed for execution of the samples. Further, the system continuously updates behavioral analysis data in a memory associated with the system. Thus, the system provides an unbiased comprehensive view of real-world behavior of the malwares, which enables researchers to quickly explore and compare detection mechanisms to counter the evolving malware landscape. Furthermore, the system spares researchers the time and effort of setting up a real-world evaluation infrastructure for comprehensive behavioral data collection, while alleviating the high risks involved in handling and executing evasive malwares.

The terms “or” and “and/or” as used herein are to be interpreted as inclusive or meaning any one or any combination. Therefore, “A, B or C” or “A, B and/or C” mean “any of the following: A; B; C; A and B; A and C; B and C; A, B and C.” An exception to this definition will occur only when a combination of elements, functions, steps or acts are in some way inherently mutually exclusive.

Any combination of the above features and functionalities may be used in accordance with one or more embodiments. In the foregoing specification, embodiments have been described with reference to numerous specific details that may vary from implementation to implementation. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. The sole and exclusive indicator of the scope of the invention, and what is intended by the applicants to be the scope of the invention, is the literal and equivalent scope of the set as claimed in claims that issue from this application, in the specific form in which such claims issue, including any subsequent correction.