

Title:
METHOD FOR AUTOMATICALLY EXPLORING STATES AND TRANSITIONS OF A HUMAN MACHINE INTERFACE (HMI) DEVICE
Document Type and Number:
WIPO Patent Application WO/2022/258740
Kind Code:
A1
Abstract:
In order to improve effectiveness and efficiency of pathfinding methods, the invention provides a computer implemented method for automatically exploring HMI states and HMI transitions of an HMI device, wherein each HMI transition includes an HMI state and an HMI action that caused the HMI device to change into that HMI state, the method comprising: a) detecting image data displayed by the HMI device, the image data being indicative of the HMI state; b) hashing the image data in order to obtain a hash representation of the HMI state; c) continuing with step d), if a previously unencountered hash representation is encountered within a predetermined time interval or a predetermined number of HMI actions, otherwise continuing with step e); d) generating an HMI action that is determined by a curiosity-based reinforcement learning method that includes at least one curiosity measure that is defined for each pair of HMI state and HMI action; e) compiling a sequence of HMI actions that is determined by a DFA; f) sending the HMI action of step d) or the sequence of HMI actions of step e) to the HMI device, so as to cause a change of the HMI state.

Inventors:
ZHENG AN (SG)
CAO YUSHI (SG)
TEO YON SHIN (SG)
TOH YUXUAN (SG)
LIN PROF SHANGWEI (SG)
LIU YANG (SG)
ADIGA VINAY VISHNUMURTHY (SG)
Application Number:
PCT/EP2022/065656
Publication Date:
December 15, 2022
Filing Date:
June 09, 2022
Assignee:
CONTINENTAL AUTOMOTIVE TECH GMBH (DE)
UNIV NANYANG TECH (SG)
International Classes:
G06N3/08; G06F11/36; G06N20/00
Other References:
ZHENG YAN ET AL: "Automatic Web Testing Using Curiosity-Driven Reinforcement Learning", SOFTWARE ENGINEERING, IEEE PRESS, 445 HOES LANE, PO BOX 1331, PISCATAWAY, NJ 08855-1331 USA, 22 May 2021 (2021-05-22), pages 423 - 435, XP033930078, ISSN: 1558-1225, ISBN: 978-1-5386-3868-2, [retrieved on 20210412], DOI: 10.1109/ICSE43902.2021.00048
TANG HAORAN ET AL: "Exploration : a study of count-based exploration for deep reinforcement learning", 1 January 2017 (2017-01-01), XP055960832, Retrieved from the Internet [retrieved on 20220914]
MUHAMMAD USAMA ET AL: "Learning-Driven Exploration for Reinforcement Learning", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 16 October 2020 (2020-10-16), XP081787596
DAVID ADAMO; MD KHORROM KHAN; SREEDEVI KOPPULA; RENEE BRYCE: "Reinforcement learning for android gui testing", PROCEEDINGS OF THE 9TH ACM SIGSOFT INTERNATIONAL WORKSHOP ON AUTOMATING TEST CASE DESIGN, SELECTION, AND EVALUATION, 2018, pages 2 - 8
VOLODYMYR MNIH; KORAY KAVUKCUOGLU; DAVID SILVER; ANDREI A RUSU; JOEL VENESS; MARC G BELLEMARE; ALEX GRAVES; MARTIN RIEDMILLER; ANDREAS K FIDJELAND: "Human-level control through deep reinforcement learning", NATURE, vol. 518, no. 7540, 2015, pages 529 - 533, XP037437579, DOI: 10.1038/nature14236
DEEPAK PATHAK; PULKIT AGRAWAL; ALEXEI A EFROS; TREVOR DARRELL: "Curiosity-driven exploration by self-supervised prediction", PROCEEDINGS OF THE IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, 2017, pages 16 - 17
SUSANNE STILL; DOINA PRECUP: "An information-theoretic approach to curiosity-driven reinforcement learning", THEORY IN BIOSCIENCES, vol. 131, no. 3, 2012, pages 139 - 148, XP035096381, DOI: 10.1007/s12064-011-0142-z
THI ANH TUYET VUONG; SHINGO TAKADA: "A reinforcement learning based approach to automated testing of android applications", PROCEEDINGS OF THE 9TH ACM SIGSOFT INTERNATIONAL WORKSHOP ON AUTOMATING TEST CASE DESIGN, SELECTION, AND EVALUATION, 2018, pages 31 - 37, XP055688509, DOI: 10.1145/3278186.3278191
HAORAN TANG; REIN HOUTHOOFT; DAVIS FOOTE; ADAM STOOKE; OPENAI XI CHEN; YAN DUAN; JOHN SCHULMAN; FILIP DETURCK; PIETER ABBEEL: "A study of count-based exploration for deep reinforcement learning", ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS, 2017, pages 2753 - 2762
YAN ZHENG; XIAOFEI XIE; TING SU; LEI MA; JIANYE HAO; ZHAOPENG MENG; YANG LIU; RUIMIN SHEN; YINGFENG CHEN; CHANGJIE FAN: "2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE)", 2019, IEEE, article "Wuji: Automatic online combat game testing using evolutionary deep reinforcement learning", pages: 772 - 784
Attorney, Agent or Firm:
KASTEL PATENTANWÄLTE PARTG MBB (DE)
Claims:
CLAIMS

1. A computer implemented method for automatically exploring HMI states and HMI transitions of an HMI device (12), wherein each HMI transition includes an HMI state and an HMI action that caused the HMI device (12) to change into that HMI state, the method comprising: a) detecting image data displayed by the HMI device (12), the image data being indicative of the HMI state; b) hashing the image data in order to obtain a hash representation of the HMI state; c) continuing with step d), if a previously unencountered hash representation is encountered within a predetermined time interval or a predetermined number of HMI actions, otherwise continuing with step e); d) generating an HMI action that is determined by a curiosity-based reinforcement learning method that includes at least one curiosity measure that is defined for each pair of HMI state and HMI action; e) compiling a sequence of HMI actions that is determined by a DFA; f) sending the HMI action of step d) or the sequence of HMI actions of step e) to the HMI device (12), so as to cause a change of the HMI state.

2. The method according to claim 1, characterized by a step g) of repeating the steps a) to g) until a predetermined condition is met and/or stopping execution of the method when the condition is met.

3. The method according to any of the preceding claims, characterized in that in step b) a transition function of the DFA is updated to include a previously unencountered HMI transition from a first HMI state to a second HMI state, if the second HMI state was previously unencountered.

4. The method according to any of the preceding claims, characterized in that in step d) the reinforcement learning method is a Q-learning method, wherein each pair of HMI state and HMI action has associated with it a Q-value, that is defined to memorize and capture temporal relations among HMI states and HMI actions from which the HMI action to be sent is generated.

5. The method according to any of the preceding claims, characterized in that in step d), upon performing a particular HMI action, a corresponding curiosity measure is decreased.

6. The method according to claims 4 and 5, characterized in that the Q-value is updated according to the following equation: Qnew(s, a) = (1 − α) · Qcurrent(s, a) + α · (β · curiosity(s, a) + γ · max_a' Qcurrent(s', a')), wherein Qnew is the updated Q-value, Qcurrent is the current Q-value, α is the learning rate, β is the curiosity coefficient, γ is the discount factor, curiosity(s, a) is the curiosity measure associated with HMI state, s, and HMI action, a, and s' denotes a newly reached HMI state.

7. The method according to any of the claims 5 or 6, characterized in that the HMI action is generated using an ε-greedy method, wherein the ε-greedy method chooses, with a predetermined probability of 1-ε, the HMI action that has the maximum Q-value, or chooses, with a predetermined probability of ε, the HMI action that has the maximum curiosity measure.

8. The method according to any of the preceding claims, characterized in that in step e) the DFA determines the HMI transition that has the highest curiosity measure, wherein the DFA further identifies the shortest sequence of HMI actions that result in the HMI transition with the highest curiosity measure and outputs said sequence of HMI actions for sending in step f).

9. The method according to any of the preceding claims, characterized by a step h) of storing encountered HMI transitions and encountered HMI states for further processing.

10. A diagnostic arrangement (10) comprising an HMI device (12) to be tested and a test device (14) that is connected to the HMI device (12) and is configured to perform a method according to any of the preceding claims, so as to explore the HMI states of the HMI device (12).

11. A computer program, a machine readable storage medium, or a data signal that comprises instructions that, upon execution on a data processing device, cause the device to perform one, some, or all of the steps of a method according to any of the preceding claims 1 to 10.

Description:
DESCRIPTION

Method for automatically exploring states and transitions of a human machine interface (HMI) device

TECHNICAL FIELD

The invention relates to a computer implemented method for automatically exploring HMI states and HMI transitions of an HMI device.

BACKGROUND

Cars have become more and more intelligent by integrating multiple built-in applications such as audio and video playback, GPS, Bluetooth control, etc. To easily manipulate these functions, the applications have been integrated into the car dashboard for convenient human-machine interaction. However, before deployment, it is difficult and resource-consuming to test whether the dashboard applications meet the requirements, due to the large number of applications and the complex inner logic within each application.

In addition, given no prior knowledge regarding the logic of the dashboard, its applications and states need to be explored during the testing process, making the test even harder. For a given Human Machine Interface (HMI) dashboard to be tested, one important task is to explore all its states and map out the paths to ensure that it meets business requirements; this also helps in identifying any unintended bugs.

Intuitively, the more states and paths discovered, the higher the possibility of discovering any bugs or defects. Various kinds of approaches have been proposed to achieve a sufficient exploration. Manual exploration is a common and useful way to find all the applications (states) and their corresponding paths. The tester is required to simulate user operations (e.g. clicking buttons) via a graphical user interface to explore the states and paths. However, such manual work is labor-intensive and costly, and the exploration effectiveness heavily depends on the human testers' domain knowledge. Besides, the dashboards may frequently evolve, and the human tester may have to completely redo the test even after minor changes to the HMI dashboard.

Reference is made to the following prior art:

[1] David Adamo, Md Khorrom Khan, Sreedevi Koppula, and Renee Bryce. Reinforcement learning for android gui testing. In Proceedings of the 9th ACM SIGSOFT International Workshop on Automating Test Case Design, Selection, and Evaluation, pages 2-8, 2018.

[2] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529-533, 2015.

[3] Deepak Pathak, Pulkit Agrawal, Alexei A Efros, and Trevor Darrell. Curiosity-driven exploration by self-supervised prediction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 16-17, 2017.

[4] Susanne Still and Doina Precup. An information-theoretic approach to curiosity-driven reinforcement learning. Theory in Biosciences, 131(3):139-148, 2012.

[5] Michael Sutton, Adam Greene, and Pedram Amini. Fuzzing: brute force vulnerability discovery. Pearson Education, 2007.

[6] Richard S Sutton and Andrew G Barto. Reinforcement learning: An introduction. MIT press, 2018.

[7] Ari Takanen, Jared D Demott, Charles Miller, and Atte Kettunen. Fuzzing for software security testing and quality assurance. Artech House, 2018.

[8] Haoran Tang, Rein Houthooft, Davis Foote, Adam Stooke, OpenAI Xi Chen, Yan Duan, John Schulman, Filip DeTurck, and Pieter Abbeel. #Exploration: A study of count-based exploration for deep reinforcement learning. In Advances in Neural Information Processing Systems, pages 2753-2762, 2017.

[9] Thi Anh Tuyet Vuong and Shingo Takada. A reinforcement learning based approach to automated testing of android applications. In Proceedings of the 9th ACM SIGSOFT International Workshop on Automating Test Case Design, Selection, and Evaluation, pages 31-37, 2018.

[10] Yan Zheng, Xiaofei Xie, Ting Su, Lei Ma, Jianye Hao, Zhaopeng Meng, Yang Liu, Ruimin Shen, Yingfeng Chen, and Changjie Fan. Wuji: Automatic online combat game testing using evolutionary deep reinforcement learning. In 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 772-784. IEEE, 2019.

Random-based methods are another approach for software testing [7, 5], which can be applied to explore states and paths. They generate pseudo-random operations to explore the HMI dashboard. Despite their wide adoption in practical development, the shortcomings of such an approach are obvious: it often creates meaningless actions, such as repeatedly sending an unproductive key action to a certain state, and it performs an unbalanced exploration, as it may generate repeated actions while some hard-to-reach states may never be explored.

There is also a problem with consistency: due to the random nature of this approach, given the same HMI dashboard and parameters, it may yield different results.

To address the aforementioned challenges, an effective, end-to-end automatic path exploration method is needed. Recently, reinforcement learning (RL) has demonstrated its ability for testing and interacting with games [2, 10] or Android applications [1, 9]. However, it is hard to implement an RL model to explore the dashboards directly. To begin with, the target domains are different, which makes the RL modelling entirely different. For example, in game playing, the state representations are easier to acquire and well-structured, which is not the case in HMI dashboard exploration. In addition, another challenge is how to perform efficient exploration given complicated application logic and numerous states. Thus, a more effective exploration strategy is needed for RL-based exploration of the HMI dashboard.

SUMMARY OF THE INVENTION

It is the object of the invention to improve path exploration in HMI devices, such as vehicle dashboards.

The invention provides a computer implemented method for automatically exploring HMI states and HMI transitions of an HMI device, wherein each HMI transition includes an HMI state and an HMI action that caused the HMI device to change into that HMI state, the method comprising: a) detecting image data displayed by the HMI device, the image data being indicative of the HMI state; b) hashing the image data in order to obtain a hash representation of the HMI state; c) continuing with step d), if a previously unencountered hash representation is encountered within a predetermined time interval or a predetermined number of HMI actions, otherwise continuing with step e); d) generating an HMI action that is determined by a curiosity-based reinforcement learning method that includes at least one curiosity measure that is defined for each pair of HMI state and HMI action; e) compiling a sequence of HMI actions that is determined by a DFA; f) sending the HMI action of step d) or the sequence of HMI actions of step e) to the HMI device, so as to cause a change of the HMI state.

Preferably, the method comprises a step g) of repeating the steps a) to g) until a predetermined condition is met and/or stopping execution of the method, when the condition is met.

Preferably, in step b) a transition function of the DFA is updated to include a previously unencountered HMI transition from a first HMI state to a second HMI state, if the second HMI state was previously unencountered.

Preferably, in step d) the reinforcement learning method is a Q-learning method, wherein each pair of HMI state and HMI action has associated with it a Q-value, that is defined to memorize and capture temporal relations among HMI states and HMI actions from which the HMI action to be sent is generated.

Preferably, in step d), upon performing a particular HMI action, a corresponding curiosity measure is decreased.

Preferably, the Q-value is updated according to the following equation: Qnew(s, a) = (1 − α) · Qcurrent(s, a) + α · (β · curiosity(s, a) + γ · max_a' Qcurrent(s', a')), wherein Qnew is the updated Q-value, Qcurrent is the current Q-value, α is the learning rate, β is the curiosity coefficient, γ is the discount factor, curiosity(s, a) is the curiosity measure associated with HMI state, s, and HMI action, a, and s' denotes a newly reached HMI state.

Preferably, the HMI action is generated using an ε-greedy method, wherein the ε-greedy method chooses, with a predetermined probability of 1-ε, the HMI action that has the maximum Q-value, or chooses, with a predetermined probability of ε, the HMI action that has the maximum curiosity measure.

Preferably, in step e) the DFA determines the HMI transition that has the highest curiosity measure, wherein the DFA further identifies the shortest sequence of HMI actions that result in the HMI transition with the highest curiosity measure and outputs said sequence of HMI actions for sending in step f).

Preferably the method comprises a step h) of storing encountered HMI transitions and encountered HMI states for further processing.

The invention provides a diagnostic arrangement comprising an HMI device to be tested and a test device that is connected to the HMI device and is configured to perform a preferred method, so as to explore the HMI states of the HMI device.

The invention provides a computer program, a machine readable storage medium, or a data signal that comprises instructions that, upon execution on a data processing device, cause the device to perform one, some, or all of the steps of a preferred method.

In this invention, we carefully designed a curiosity-based RL with deterministic finite automaton (DFA) to achieve an efficient exploration approach. In particular, this approach is designed for the RL agent to explore all the states and related paths of car HMI dashboards.

In this invention, we propose a new testing framework which utilizes a reinforcement learning method to perform automated exploration of HMI dashboards. The idea is to leverage reinforcement learning and DFA for automated HMI dashboard exploration for software testing.

We propose a curiosity-based RL to provide low-level guidance for the exploration. Since the HMI dashboard has no extrinsic rewards and there is no end goal, we use curiosity as an intrinsic reward that motivates the RL agent to explore unknown states. Guided by curiosity, the RL agent tends to explore less visited or unknown states and their corresponding behaviors automatically. The assumption here is that less visited states will have more new paths to explore.

As not all states are equally connected, to further explore hard-to-reach states, we propose a deterministic finite automaton (DFA) guided exploration strategy that provides high-level guidance for the RL agent to efficiently explore the HMI dashboard. In particular, the DFA records all the states and paths taken during the exploration. When the RL agent is trapped (i.e., cannot discover new states within a given time budget or after a fixed number of operations), a path is chosen from the DFA based on curiosity to continue the agent's exploration.

The high-level (DFA) and low-level (curiosity) exploration schemes complement each other to provide an effective exploration. By integrating curiosity and the DFA into RL, the method can efficiently discover the states and the corresponding logic for path finding. Compared with a manual testing approach, the testing method according to the invention is nearly fully automatic, only requiring the user to define the state and action space at the very beginning. The method is then able to explore on its own and generate a mapping of the state space efficiently. In particular, with the guidance of both curiosity and the DFA, as the software evolves, the method can easily explore modified parts by utilizing previous explorations stored by the curiosity and DFA approaches.

Compared with other approaches such as fuzzing, the approach according to the invention is improved by providing two levels of guidance to explore more states within a limited time or a limited number of operations. Instead of choosing actions randomly at each state without any consideration of past operations, the low-level guidance (curiosity) makes the method explore less visited or unseen states to a larger degree. The fuzzing approach, however, may spend most of the time oscillating among known states repeatedly. In addition, when new states cannot be visited within a given time frame, the high-level guidance (DFA) is able to pick a path based on curiosity, thereby increasing the likelihood of discovering new states, i.e. previously unencountered states.

This invention aims to automatically generate actions to explore states for the purpose of path finding in the HMI device. To achieve that, our approach comprises three components. First, the image responses from the HMI dashboard are mapped to distinguishable hash representations by adopting SimHash (or ImageHash). Then, the curiosity-based method is designed for exploring more states of the HMI device.

Finally, the DFA-guided exploration further improves the efficiency and coverage of the curiosity strategy by maintaining a DFA during the test to reduce the number of repeated actions and paths taken.
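As a minimal illustration of the first component (not part of the original disclosure), a screenshot of the HMI dashboard can be mapped to a compact, distinguishable state key with a perceptual hash. The Pillow and imagehash packages and the file path below are illustrative assumptions; the patent only names SimHash/ImageHash as example techniques.

```python
# Sketch: map an HMI screenshot to a hashable state identifier.
# Assumes the Pillow and imagehash packages (illustrative choice).
from PIL import Image
import imagehash

def state_id_from_screenshot(path: str) -> str:
    """Return a short, distinguishable hash representation of an HMI state."""
    image = Image.open(path)
    # A perceptual hash collapses visually identical screens to the same key
    # while keeping different screens distinguishable.
    return str(imagehash.phash(image))

# Usage: two captures of the same menu screen map to the same state key.
# s = state_id_from_screenshot("dashboard_capture_0001.png")
```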

For the curiosity-based part, an effective reward function needs to be defined for determining an optimal policy. To express the goal of discovering as many different states as possible in the form of a reward function, the invention leverages the concept of curiosity (for example known in general from documents [4, 8, 3]), which is an intrinsic reward that motivates the policy to explore unknown areas. Specifically, we have defined a curiosity measure to guide the exploration towards hard-to-reach states. During the testing, a count table is maintained that records the number of times each HMI transition has been taken: curiosity(s, a) = 1 / √N(s, a), where N(s, a) is initialized to 1, and each time action a is performed, the corresponding N(s, a) is increased by 1.
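The count table and curiosity measure can be sketched as follows; this is a minimal illustration assuming the inverse-square-root form stated above, with illustrative identifiers that are not part of the original disclosure.

```python
# Sketch of the count table N(s, a) and the curiosity measure derived from it.
from collections import defaultdict
from math import sqrt

visit_count = defaultdict(lambda: 1)   # N(s, a), initialized to 1

def curiosity(state: str, action: str) -> float:
    # Less executed (state, action) pairs have higher curiosity.
    return 1.0 / sqrt(visit_count[(state, action)])

def record_execution(state: str, action: str) -> None:
    # Each time action a is performed in state s, N(s, a) grows and the
    # curiosity of that pair shrinks, steering the agent towards other pairs.
    visit_count[(state, action)] += 1
```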

For this automated testing method, the invention uses a model-free RL algorithm, namely Q-learning, generally known from document [6], to optimize the exploration policy. Q-learning has a function that calculates the Q-value for each state-action pair. Thus the Q-value is a real number associated with each state-action pair. The Q-value memorizes and captures temporal relations among states and actions, from which new actions are chosen based on the Q-value to maximize the reward. For each HMI state s, when a new state s' is reached via choosing action a, the Q-value is updated by: Q(s, a) ← (1 − α) · Q(s, a) + α · (β · curiosity(s, a) + γ · max_a' Q(s', a')), where α is the learning rate to control the learning speed, β is the curiosity coefficient, and γ is the discount factor to control the impact of the history.
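A sketch of this update, building on the curiosity helper above, is shown below. The equation in the original filing is not reproduced in this text version, so the exact form used here is an assumption consistent with the variables defined (α, β, γ, curiosity(s, a), s'); the action set is illustrative.

```python
# Hedged sketch of the curiosity-augmented Q-learning update; the formula is an
# assumption matching the variables defined in the text, not a verbatim copy.
from collections import defaultdict

ACTIONS = ["up", "down", "left", "right", "ok", "back"]   # illustrative action space
Q = defaultdict(float)                                     # Q-value per (state, action) pair

def update_q(s, a, s_next, alpha=0.1, beta=1.0, gamma=0.9):
    best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
    intrinsic_reward = beta * curiosity(s, a)      # curiosity is the only reward signal
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (intrinsic_reward + gamma * best_next)
```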

Based on the Q-function, the policy is gradually optimized via a curiosity-based ε-greedy strategy:

a = argmax_a Q(s, a) with probability 1 − ε
a = argmax_a curiosity(s, a) with probability ε
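A minimal sketch of this selection rule, reusing the Q table and curiosity helper from the sketches above, could look as follows; the ε value is an illustrative default.

```python
# Sketch of the curiosity-based ε-greedy policy.
import random

def select_action(s, epsilon=0.2):
    # Exploit: with probability 1 - ε, take the action with the highest Q-value.
    if random.random() < 1 - epsilon:
        return max(ACTIONS, key=lambda a: Q[(s, a)])
    # Explore: with probability ε, take the most "curious" (least tried) action.
    return max(ACTIONS, key=lambda a: curiosity(s, a))
```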

For each state representation, actions that have been executed less often have a higher curiosity value and are thus more likely to be selected. In the meantime, the curiosity decreases as the chosen action is executed, making other, less selected actions more likely to be chosen next. During the exploration, it becomes challenging as the sequences of actions get longer. Though curiosity provides guidance for the action selection to discover more new paths, it remains hard for RL to explore states that take a long sequence of actions to reach. For example, an RL agent may choose a specific HMI action with probability 0.8 at each step on the way from an initial HMI state s_0 to a particular HMI state s_t. However, even if the probability of each choice is relatively high, the final probability of reaching s_t after four steps is only 0.8^4 ≈ 0.41, and this path can easily be disrupted depending on the action chosen at each state. So, the longer a path is, the easier it is to be interrupted, making the deeper states harder to reach.

To address this problem, the invention uses an on-the-fly deterministic finite automaton (DFA), which provides high-level guidance to boost the RL agent's navigation capability. When no new states are found for some time, the HMI transition (s, a) with the highest curiosity is determined. Then, the shortest path that can reach this HMI transition is identified by the DFA, so that the RL agent can reach the HMI transition directly and continue the exploration efficiently from there.

Specifically, a deterministic finite automaton can be described as a 5-tuple (S, A, δ, s_0, F), where S is a finite set of states, A is a finite set of actions, δ is a transition function that maps a current HMI state s and an HMI action a to a new HMI state s', s_0 is the initial state, and F is a finite set of states that cannot transit to other states.

During the exploration, once a new HMI transition (s, a) is explored, the transition function of the DFA is updated. After selecting the HMI transition (s_t, a_t) with the highest curiosity, the method performs Dijkstra's algorithm to identify the shortest path that can reach (s_t, a_t). With the guidance of the DFA, the curiosity-based RL can directly reach states that have a higher potential to discover new states, which enhances the efficiency and coverage. The method might arrive at a dead-end state; however, after a few more operations that do not help to discover another state, the curiosity for this state is reduced, so that the DFA becomes more and more likely to explore other states.
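A sketch of the on-the-fly DFA and the shortest-path lookup is given below. Since all transitions have unit cost, Dijkstra's algorithm reduces to a breadth-first search here; the data layout and function names are illustrative, not the patent's exact structure.

```python
# Sketch of the on-the-fly DFA (transition table) and the shortest action
# sequence towards the most curious transition, used when the agent is trapped.
from collections import deque

delta = {}          # DFA transition function δ: (state, action) -> next state

def record_transition(s, a, s_next):
    delta[(s, a)] = s_next          # update δ whenever a new transition appears

def shortest_action_sequence(start_state, target_pair):
    """Return the shortest action sequence from start_state that ends by taking
    target_pair = (s_t, a_t), i.e. the most curious recorded transition."""
    s_t, a_t = target_pair
    queue, seen = deque([(start_state, [])]), {start_state}
    while queue:
        s, path = queue.popleft()
        if s == s_t:
            return path + [a_t]
        for (state, action), nxt in delta.items():
            if state == s and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [action]))
    return None                     # target not reachable in the current DFA
```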

Originally, we proposed this approach for the HMI software pathfinder to automatically explore its states and generate corresponding paths. This invention can react according to the image response from the HMI device with only pre-defined actions and state definitions. After exploration, a model containing the HMI transitions and states of the HMI device and its software can be constructed, from which multiple paths can be generated for further testing.

For other software having similar state and action definitions, this invention can be applied directly or with minor changes.

Other than applications in the context of exploring HMI devices, our proposed approach is versatile, as it can also be applied in various settings that involve path finding between two vertices in a graph with well-defined discrete state transitions. Examples are auto-routing in the design of circuit boards, wiring and interconnection of signaling networks or power grids, navigation of strategic games, and graph-based object searching in geographical information systems (GIS).

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the invention are described in more detail with reference to the accompanying schematic drawings.

Fig. 1 depicts an embodiment of a diagnostic arrangement according to the invention; Fig. 2 depicts experimental results; and Fig. 3 depicts experimental results.

DETAILED DESCRIPTION OF EMBODIMENT

Referring to Fig. 1, a diagnostic arrangement 10 is depicted that comprises a human machine interface device 12 (subsequently called HMI device 12) and a test device 14. The test device 14 is operatively coupled to the HMI device 12 that is under test.

The test device 14 is configured to perform a method for automatically exploring HMI states s and HMI transitions (s, a), that include an HMI state s and an HMI action a that caused the HMI state s. The method employs a combination of a deterministic finite automaton (subsequently called DFA) and a curiosity-based reinforcement learning method as explained below.

The method comprises a step S1 of detecting image data of the HMI state s that is currently displayed on the HMI device 12. The image data can be gathered by taking an image or by directly accessing memory of the HMI device 12.

In a step S2, the image data is hashed in order to obtain a hash representation of the image data and thereby of the HMI state s. Furthermore, the hash representation includes the possible HMI actions a that can be accessed by a potential user.

In a step S3, the transition function of the DFA is updated to include a previously unencountered pair of HMI state and HMI action.

In a step S4, the test device 14 determines whether to continue the method with the curiosity-based approach or the DFA-based approach. Which approach is used is determined by a predetermined parameter, such as a predetermined time interval or a predetermined number of operations/actions. If the hash representation determined in step S2 is previously unencountered, then the curiosity-based approach is used. The same is true if the hash representation determined in step S2 was already encountered and the predetermined parameter, e.g. time interval or number of operations, is not exceeded. Otherwise, the DFA-based approach is used.
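The S4 decision can be sketched as a simple branch; the budget variable and its default value are illustrative assumptions.

```python
# Sketch of the S4 decision: keep using curiosity-based actions while new hash
# representations keep appearing within the budget, otherwise fall back to the
# DFA-guided path.
def choose_approach(hash_is_new, actions_since_new_state, action_budget=50):
    if hash_is_new or actions_since_new_state < action_budget:
        return "curiosity"   # continue with steps S5/S6
    return "dfa"             # fall back to step S7
```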

If the curiosity-based approach is chosen, then, in a step S5, the number N(s, a) that corresponds to the HMI action a is increased, such that it tracks the number of times a specific pair of HMI state s and HMI action a has been encountered. To this end, the test device 14 preferably includes a count table 16 of all pairs (s, a) of HMI states s and HMI actions a and the corresponding numbers N(s, a). The curiosity measure is defined such that it decreases with increasing N(s, a). A typical configuration for the curiosity measure curiosity(s, a) is the inverse square root of N(s, a). It should be noted that the values N(s, a) are typically initialized to N(s, a) = 1. Thus, the initial value of the curiosity measure is also curiosity(s, a) = 1.

In a step S6, for each HMI state s, when a new HMI state s' was reached, as indicated by the hash representation, by choosing an HMI action a, the Q-value is updated by: Q(s, a) ← (1 − α) · Q(s, a) + α · (β · curiosity(s, a) + γ · max_a' Q(s', a')), where α is the learning rate to control the learning speed, β is the curiosity coefficient, and γ is the discount factor to control the impact of the history.

Furthermore, in step S6 an HMI action a to be sent to the HMI device 12 is chosen using an ε-greedy strategy, which chooses either the HMI action with the highest Q-value among all HMI actions a, with a probability of 1 − ε, or the HMI action with the highest curiosity measure among all HMI actions a, with a probability of ε. It should be noted that the curiosity measure associated with the chosen HMI action a decreases, thereby lowering the probability of the same HMI action a being chosen next.

If the DFA-based approach is chosen, then, in a step S7, the DFA determines the pair of HMI state s and HMI action a that has the highest curiosity measure associated with it, based on its internal state that was previously updated in step S3. After the pair is determined, a pathfinding method is performed to find a sequence of HMI actions a that leads from an initial pair of HMI state s_0 and HMI action a_0 to the pair just determined by the DFA. The pathfinding method is preferably Dijkstra's algorithm, so as to identify the shortest path between the pairs. The sequence of HMI actions a is then selected as the result of the DFA-based approach.

In a step S8, depending on which approach was used, either a single HMI action a that was determined by the curiosity-based approach or a sequence of HMI actions a that was determined by the DFA-based approach is sent by the test device 14 to the HMI device 12, so as to cause the HMI device 12 to change to another HMI state s.

The steps are repeated until a predetermined condition is met, e.g. a certain number of HMI states s has been discovered, the method has run for a certain time, the user aborts the method, etc.
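Tying the sketches above together, the overall exploration loop (steps S1 to S8) could look roughly as follows. capture_screenshot() and send_to_hmi() stand in for device-specific I/O and are assumptions, as are the step budget and loop bookkeeping; this is an illustrative outline, not the patented implementation itself.

```python
# End-to-end sketch of the exploration loop (steps S1-S8), reusing the helpers
# sketched earlier. capture_screenshot() and send_to_hmi() are placeholders for
# device-specific I/O and are not part of the patent text.
def explore(max_steps=10_000, budget=50):
    s = state_id_from_screenshot(capture_screenshot())         # S1 + S2
    known = {s}
    since_new = 0
    for _ in range(max_steps):                                  # repeat until the stop condition
        if choose_approach(since_new == 0, since_new, budget) == "curiosity":
            a = select_action(s)                                # S6: ε-greedy choice
            record_execution(s, a)                              # S5: bump N(s, a)
            send_to_hmi([a])                                    # S8: single action
            taken = a
        else:
            target = max(delta, key=lambda sa: curiosity(*sa))  # S7: most curious transition
            seq = shortest_action_sequence(s, target) or []
            send_to_hmi(seq)                                    # S8: action sequence
            taken = seq[-1] if seq else None
        s_next = state_id_from_screenshot(capture_screenshot()) # back to S1 + S2
        if taken is not None:
            record_transition(s, taken, s_next)                 # S3: update the DFA
            update_q(s, taken, s_next)                          # S6: Q-value update
        since_new = since_new + 1 if s_next in known else 0
        known.add(s_next)
        s = s_next
```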

During the method or after the method has stopped, DFA data 18 memorized by the DFA can be stored for further processing or analysis. The combination of the curiosity approach and the DFA, which memorizes all the states and actions during the testing, makes the method more effective and efficient in exploring new HMI states s by utilizing the cumulative data acquired during path exploration. To validate the models conveniently, applicants conducted experiments on a self-built simulation with the number of states ranging from 60 to 1000. The fuzzing approach, a curiosity-only approach, and the curiosity-RL approach according to the invention were compared. For each method, a fixed number of 10,000 operations was conducted to see how many states and paths can be discovered. The results are shown in Fig. 2 and Fig. 3.

Fig. 2, upper diagram, shows the ratio of discovered HMI states s with respect to the total number of HMI states using the curiosity-driven Q-learning and DFA methods according to the invention. As can be seen, the ratio of discovered HMI states s is greater than 90% for almost all numbers of total HMI states. Specifically, the DFA-guided variant excels, with ratios of discovered HMI states consistently around 95% and about 3 or 4 percentage points of variation.

Fig. 2, lower diagram, shows the ratio of discovered HMI states s with respect to the total number of HMI states using the conventional curiosity and random methods. As can be seen, the ratio of discovered HMI states s is below 90% for most numbers of total HMI states. The known methods never reach a ratio above 95%, except in some cases around 100 total HMI states.

Fig. 3, upper diagram, shows the ratio of discovered transitions with respect to the total number of HMI states using the curiosity-driven Q-learning and DFA methods according to the invention. As can be seen, the ratio of discovered transitions is generally greater than 80% for almost all numbers of total HMI states. In any case, the ratio of discovered transitions varies consistently around 85%.

Fig. 3, lower diagram, shows the ratio of discovered transitions with respect to the total number of HMI states using the conventional curiosity and random methods. As can be seen, the ratio of discovered transitions is below 80% for most numbers of total HMI states. The known methods never reach a ratio above 85%, except in some cases around 100 total HMI states.

In conclusion, the curiosity-based RL methods are consistently capable of discovering more HMI states s and more transitions between HMI states s than the conventional approaches, by a large margin of about 10 percentage points.

REFERENCE SIGNS

10 diagnostic arrangement
12 HMI device
14 test device
16 count table
18 DFA data