

Title:
METHOD FOR AUTOMATIC TESTING OF ALGORITHM
Document Type and Number:
WIPO Patent Application WO/2010/126387
Kind Code:
A2
Abstract:
The subject invention provides a method for automatic testing of at least one algorithm, comprising establishing a database, actuating a testing procedure with the use of resources of the database, and obtaining test results. According to the inventive method, in the successively performed steps, a database is established, said database comprising a repository of resources; a testing specification is composed, said testing specification comprising a set of three identifiers E, A, T, where E is an identifier for a resource that includes an evaluation procedure, A is an identifier for a resource that includes an algorithm, and T is an identifier for a resource that includes an algorithmic task; the testing procedure is then actuated to read out the testing specification comprising the identifiers E, A, T. In a further step the resource with the identifier E is retrieved from the repository of resources and optionally a check is performed whether the resource E includes an evaluation procedure. The evaluation procedure is actuated to retrieve from the repository of resources the resource with the identifier A and the resource with the identifier T, where optionally a check is performed whether the resource A includes an algorithm of the kind supported by the actuated evaluation procedure, and optionally a check is performed whether the resource T includes an algorithmic task of the kind supported by the actuated evaluation procedure; then, in an operation mode of the evaluation procedure, the algorithm included in the resource A is actuated to read out the input data according to the algorithmic task included in the resource T and to resolve the algorithmic task so as to obtain output data, which are subjected to analysis in the evaluation procedure to obtain at least one result value that constitutes a test result. Optionally, the steps of actuating the testing procedure, retrieving the resource with the identifier E from the repository of resources, and actuating the evaluation procedure are repeated to obtain at least one further result value that constitutes a further test result.

Inventors:
WOJNARSKI MARCIN (PL)
LOBODZINSKI JANUSZ (PL)
Application Number:
PCT/PL2010/000032
Publication Date:
November 04, 2010
Filing Date:
April 27, 2010
Assignee:
WOJNARSKI MARCIN (PL)
LOBODZINSKI JANUSZ (PL)
Other References:
RADFORD NEAL: "Assessing Relevance Determination Methods Using DELVE" IN PROCEEDINGS OF NEURAL NETWORKS AND MACHINE LEARNING, 1998, pages 97-129, XP002607948
LE BONHOMME B ET AL: "On-Line Benchmarking for Multimedia Applications with www.MyMultimediaWorld.com" IMAGE ANALYSIS FOR MULTIMEDIA INTERACTIVE SERVICES, 2008. WIAMIS '08. NINTH INTERNATIONAL WORKSHOP ON, IEEE, PISCATAWAY, NJ, USA, 7 May 2008 (2008-05-07), pages 235-238, XP031281865 ISBN: 978-0-7695-3344-5
BERNHARD SENDHOFF ET AL: "Evolutionary computation benchmarking repository [Developmental Tools]" IEEE COMPUTATIONAL INTELLIGENCE MAGAZINE, IEEE, US, vol. 1, no. 4, 1 November 2006 (2006-11-01), pages 50-60, XP011174932 ISSN: 1556-603X
Attorney, Agent or Firm:
PANKOWSKI, Jacek (02-640 Warszawa, PL)
Claims:
Patent claims

1. A method for automatic testing of at least one algorithm, comprising establishing a database, actuating the testing procedure with the use of resources from the database, and obtaining a test result, characterized by the following consecutively performed steps: i) establishing a database comprising a repository of resources, ii) composing a test specification comprising three identifiers E, A, T, where E is an identifier of a resource that includes an evaluation procedure, A is an identifier of a resource that includes an algorithm, T is an identifier of a resource that includes an algorithmic task, iii) actuating the testing procedure to read out the test specification comprising the identifiers E, A, T, iv) retrieving, from the repository of resources, the resource with the identifier E and optionally checking whether the resource E includes an evaluation procedure, v) actuating the evaluation procedure to retrieve from the repository of resources the resource with the identifier A and the resource with the identifier T, optionally checking whether the resource A includes an algorithm of the kind supported by the actuated evaluation procedure, and optionally checking whether the resource T includes an algorithmic task of the kind supported by the actuated evaluation procedure, and then, in an operation mode of the evaluation procedure, actuating the algorithm included in the resource A to read out the input data according to the algorithmic task included in the resource T and to resolve the algorithmic task to obtain output data which are subjected to analysis in the evaluation procedure to obtain at least one result value constituting a test result, vi) optionally repeating steps (iii)-(v) to obtain at least one further result value that constitutes a further test result.

2. A method according to claim 1, characterized in that the repository of resources includes at least one evaluation procedure, at least one algorithm, at least one algorithmic task, preferably included in at least one file.

3. A method according to claim 2, characterized in that the at least one evaluation procedure, algorithm, algorithmic task are included in separate files.

4. A method according to claim 2, characterized in that the at least one evaluation procedure and algorithm are included in one file.

5. A method according to claim 2, characterized in that the at least one evaluation procedure, algorithm, algorithmic task are included in one file.

6. A method according to claims 2-5, characterized in that the files include classes.

7. A method according to claims 2-6, characterized in that the repository of resources contains information concerning relationship between a dependent resource and at least one further resource.

8. A method according to claim 7, characterized in that upon actuation of the testing procedure, a dependent resource and at least one further resource the dependent resource depends on are retrieved from the repository of resources.

9. A method according to claims 1-8, characterized in that the repository of resources is modified by addition of a new resource, updating a resource, deleting a resource or change of a resource identifier.

10. A method according to claims 1-9, characterized by composing the testing specification, said specification including, in addition to a set of identifiers, a number that indicates the maximum duration time for evaluation procedure operation and/or the maximum size of memory used by the evaluation procedure.

11. A method according to claims 1 or 10, characterized in that the algorithm output data are subjected to analysis in the evaluation procedure, in comparison to reference data provided by the algorithmic task.

12. A method according to claims 1, or 10, or 11, characterized by sending the test result along with the testing specification to the database of results.

13. A method according to claims 1, or 10, or 11, characterized by sending, to the database of results, the test result along with the test specification and with a numerical value selected from the set of integers larger than 0, where the numerical value 1 is assigned to the test result obtained in step (v), and the consecutive integers beginning with the number 2 are assigned in an ascending manner to the successive results obtained in step (vi).

14. A method according to claims 12 or 13, characterized in that the test result is stored in the database of results.

15. A method according to claims 11-13, characterized in that the test result along with information to identify the user who actuates the testing procedure are sent to the database of results.

16. A method according to claims 11-13, characterized in that the test result along with information to characterize the hardware/software environment where the testing procedure is performed, are sent to the database of results.

17. A method according to claims 15 or 16, characterized in that the information along with the test result are stored in the database of results.

18. A method according to claims 1-17, characterized in that the repository of resources and the database of results are created as one database.

19. A method according to claims 1-18, characterized in that as a consequence of modification of the repository of resources the contents of the database of results is automatically modified.

20. A method according to claim 19, characterized in that deletion or update of a resource in the repository of resources automatically causes deletion of the related results from the database of results.

21. A method according to claim 19, characterized in that as a consequence of change of a resource identifier in the repository of resources, the testing specifications assigned to the results are automatically updated in the database of results.

22. A method according to claims 14 or 17, characterized in that when storing a result, a check is performed whether the resource subjected to automatic testing remains valid in the repository of resources.

23. A method according to claim 22, characterized in that as a consequence of updating or deleting a resource subjected to automatic testing, the result, not valid anymore, is not stored in the database of results.

24. A method according to claim 1, or 10-23, characterized in that the result value constitutes a numerical value that carries information on the operation of the algorithm.

25. A method according to claim 24, characterized in that at least one result value is obtained by measuring the level of the CPU usage by the algorithm, measuring the internal memory and/or disk memory used by the algorithm and/or measuring the size of data sent by the algorithm in the course of resolving the algorithmic task.

26. A method according to claims 24 or 25, characterized in that at least two numerical values are subjected to further analysis, preferably statistical analysis.

27. A method according to claims 24 or 25, characterized in that at least two numerical values are subjected to sorting.

28. A method according to claims 24 or 25, characterized by comparing at least two numerical values obtained from the evaluation procedure for different algorithms actuated on the same algorithmic task.

29. A method according to claims 24 or 25, characterized by comparing at least two numerical values obtained from the evaluation procedure for one algorithm actuated on different algorithmic tasks.

30. A method according to claims 24 or 25, characterized in that the comparison is performed to obtain a ranking list of the algorithms.

31. A method according to claims 1-30, characterized in that security means are introduced into the testing procedure so as to prevent or constrain performing undesired operations in the course of operation of the evaluation procedure.

32. A method according to claim 31, characterized in that the undesired operations are deletion of data on the computer disk, installation of viruses and/or worms, contacting websites without authorization and/or awareness of the user.

Description:

Method for automatic testing of algorithm

This invention provides a method for automatic testing of an algorithm, comprising establishing a database, actuating a testing procedure with the use of files retrieved from the database, and obtaining the test result.

US application 2003/0142819 discloses a method for evaluation of algorithms, comprising determining an algorithm specification and the algorithm's integrity with the specification, comprising expressing the algorithm as a concatenation of basic functions, checking parameters involved in performing these basic functions, and indicating any error detected in the course of the check.

International application WO 2008/110002 discloses a method for automatic evaluation of data files that are subject to analysis, comprising establishing a database with reference files, establishing for each analyzed file a training set comprising files from the database of reference files, establishing a test set of attributes of the analyzed file, dynamically generating a learning model based on the training set, and applying the learning model to the test set to obtain a predicted value corresponding to the analyzed file.

The aim of the invention is to provide a method for automatic testing of an algorithm which enables comparison of the operation of different algorithms and evaluation of their quality level.

A method for automatic testing of at least one algorithm, comprising establishing a database, actuating a testing procedure with the use of resources from the database, and obtaining a test result, according to the invention, is characterized in that the consecutively performed steps comprise: (i) establishing a database comprising a repository of resources,

(ii) composing a test specification comprising a group of three identifiers E, A, T, where E is an identifier for a resource that includes an evaluation procedure, A is an identifier for a resource that includes an algorithm, T is an identifier for a resource that includes an algorithmic task, (iii) actuating the testing procedure to read out the test specification comprising the identifiers E, A, T, (iv) retrieving, from the repository of resources, the resource with the identifier E and optionally checking whether the resource E includes the evaluation procedure, (v) actuating the evaluation procedure to retrieve from the repository of resources the resource with the identifier A and the resource with the identifier T, optionally checking whether the resource A includes an algorithm of the kind supported by the actuated evaluation procedure, and optionally checking whether the resource T includes an algorithmic task of the kind supported by the actuated evaluation procedure, and next in an operation mode of the evaluation procedure, actuating the algorithm included in the resource A to read out the input data according to the algorithmic task in the resource T and to resolve the algorithmic task to obtain output data that are analyzed in the evaluation procedure to obtain at least one result value constituting the test result, (vi) optionally repeating steps (iii)-(v) to obtain a further at least one result value that constitutes a further test result. Preferably, the repository of resources comprises at least one evaluation procedure, at least one algorithm, at least one algorithmic task, preferably included in at least one data file. In particular, the at least one evaluation procedure, algorithm, algorithmic task are included in separate files. Optionally, the at least one evaluation procedure and algorithm are included in one file. In particular, one evaluation procedure, algorithm, algorithmic task are included in one file. The files especially comprise classes. Preferably, the repository of resources comprises information concerning the relationship between a dependent resource and at least one further resource.

In particular, upon actuation of the testing procedure, a dependent resource is retrieved from the repository of resources, along with at least one further resource the dependent resource depends on. Preferably, the repository of resources is modified by addition of a new resource, updating a resource, deletion of a resource, or change of the identifier of a resource. Preferably, a test specification is composed, said specification comprising, in addition to the group of identifiers, a number that indicates the maximum duration time of operation of the evaluation procedure and/or the maximal size of memory used by the evaluation procedure. Optionally, the algorithm output data are subjected to analysis during the evaluation procedure in comparison with reference data provided by the algorithmic task. In particular, the test result along with the test specification is sent to the database of results. Optionally, the test result is sent to the database of results, along with the test specification and a numerical value selected from the set of integers above 0, where the numeral 1 is assigned to the test result obtained in step (v), and the subsequent integers, starting from the number 2, are successively assigned to the subsequent results obtained in step (vi). The test result is in particular stored in the database of results.

Optionally, the result is sent to the database of results, along with information that identifies the user who actuated the test procedure. In particular, the test result is transmitted to the database of results, along with information that characterizes the hardware/software environment in which the test procedure is actuated. In particular, the information together with the test result is recorded in the database of results.

Preferably, the repository of resources and the database of results are established as a single database. In particular, as a consequence of modification of the repository of resources, the contents of the database of results are also modified. Optionally, as a consequence of deleting or updating a resource in the repository of resources, all the corresponding results are also deleted from the database of results. In particular, as a consequence of change of the identifier of a resource in the repository of resources, the test specifications assigned to the results are automatically updated in the database of results. Preferably, in the course of storing a result, a check is performed whether the resource subjected to automatic testing remains valid in the repository of resources. In particular, as a consequence of updating or deleting a resource subjected to automatic testing, the result being no longer valid is not stored in the database of results.

Preferably, the result value constitutes a numerical value that carries information concerning operation of the algorithm. In particular, at least one result value is obtained from measurement of the level of CPU usage by the algorithm, measurement of the internal memory and/or disk memory used by the algorithm and/or measurement of the size of data transmitted by the algorithm in the course of resolving the algorithmic task.

Optionally, at least two numerical values are subjected to further analysis, preferably statistical analysis. At least two numerical values are especially subjected to sorting.

In particular, at least two numerical values are compared, said values being obtained from the evaluation procedure for different algorithms actuated for the same algorithmic task.

Optionally, at least two numerical values are compared, said values being obtained from the evaluation procedure for one algorithm actuated for different algorithmic tasks. The comparison is performed especially to obtain a ranking list of algorithms.

Preferably, into the testing procedure, security means are introduced that prevent or obstruct performing undesired operations in the course of operation of the evaluation procedure. In particular, such undesired operations are data deletion on the computer disc, installing viruses and/or worms, contacting websites without consent and/or awareness of the user.

The computer software developed nowadays is usually based on advanced algorithms, the quality of which is the key factor that determines the usefulness and commercial value of a software system. Algorithm quality may be defined in various ways, depending on the kind of algorithm and its use. For example, for a data compression algorithm the most important quality criterion is the obtained compression rate, i.e. the quotient of the file sizes: the original one and the compressed one. On the other hand, for a classification algorithm, such as an anti-spam filter, the most commonly used quality measure is classification accuracy, which is the number of objects assigned by the algorithm to the correct class, divided by the number of all the objects subjected to analysis. For each algorithm type, evaluation criteria such as the operation rate of the algorithm and the memory capacity required are also important. Due to the increasing complexity of the problems being resolved, advanced algorithms are applied more and more frequently, whose quality cannot be established by theoretical analysis, such as e.g. formal analysis of computational complexity. Therefore, evaluation of algorithm quality by experimental methods becomes increasingly necessary - this is accomplished by operating the algorithm on some testing algorithmic tasks and by measuring the quality of its operation. Experimental quality evaluation is necessary e.g. for all data exploration, machine learning or data compression algorithms.

Different algorithm and task types may require different evaluation methods. Selection of the appropriate evaluation method is critical for correctly establishing the usefulness of a tested algorithm for a specific application and for the correct selection of the best algorithm. Moreover, development of an effective algorithm for a specific application requires analyzing dozens or even hundreds of algorithms, e.g. differing solely in the values of certain controlling parameters. In the course of the experiments, the algorithmic task and evaluation procedure are also subjected to changes. As a consequence, it may prove difficult to organize the results properly and to evaluate the algorithm quality unambiguously. Therefore, there exists a high risk that, due to a simple error or omission, an improper algorithm is selected or an excessively high quality level is assigned to an algorithm.

If the tests are performed by a plurality of entities, a problem of result comparability arises. In practice, it is as a rule not possible to control the performance of tests in a manner that ensures observation of all the assumed prerequisites. Therefore, there exists a need to provide a method for automatic performance of tests and for collecting results in a manner that enables their further smooth analysis and interpretation. The method has to provide the capability to conveniently add new algorithms, algorithmic tasks and evaluation methods, without any necessity to modify the method steps.

The subject invention provides a method for automatic algorithm testing that also enables collection of test results. The solution of this invention meets the above-discussed requirements.

The object of the invention defined in the embodiments is illustrated in the drawing, where Fig. 1 shows a diagram illustrating the data flow, Fig. 2 shows an operation diagram of logic operations in a test procedure operation mode, and Fig. 3 shows a diagram of logic operations in an evaluation procedure operation mode.

A method for automatic algorithm testing according to the invention is performed in a testing procedure operation mode with the use of the contents of the repository of resources. In the course of the testing procedure, automatic tests of algorithms retrieved from the repository of resources are performed. The test results can be stored in the database of results, for possible use by the user for further analysis and interpretation. The repository of resources also includes algorithmic tasks and evaluation procedures which, along with the algorithms, are used in testing procedures. An algorithm is a mathematical system which performs a finite number of actions on the input data. In other words, an algorithm is a set of data processing instructions to read out the input data (provided by the user) and to generate the output data (which become accessible to the user). The algorithm may be implemented in any programming language in the form of any programming structure. The algorithmic task comprises a set of input data which are to be fed to, and read out by, the algorithm upon actuation of the evaluation procedure. The algorithmic task can include a set of expected output data which are to be compared to the data actually generated by the algorithm. This set of data is intended for interpretation by the evaluation procedure used in the testing procedure and it can assume varying forms, e.g. a file with input/output data or an executable instruction that generates such data upon actuation. The evaluation procedure comprises an executable instruction that implements the method for performing the test. The evaluation procedure specifies: (i) in which way the retrieved algorithm is going to be used for the retrieved algorithmic task; (ii) which numerical value, carrying the information on the operation of the algorithm, is to be generated as a consequence of testing, and how it is going to be calculated.

Different algorithm/task types may require different evaluation procedures. Moreover, the use of numerous diverse evaluation procedures for the same algorithm/task types is envisaged, which differ in some details of the way in which testing is carried out and of the results generated thereby. The evaluation procedure includes instructions according to which the interpretation of the algorithm and the algorithmic task is performed. The same task or algorithm can be interpreted differently by different evaluation methods, which can lead to obtaining different test results.

The result of the evaluation procedure is at least one result value, i.e. one or more numerical values that characterize the manner of operation of the tested algorithm when used for resolving the retrieved algorithmic task. The number of these values and their meaning are specific for a given evaluation procedure. The manner of operation of an algorithm, measured in the course of operation of the evaluation procedure and characterized by the result values, can be estimated in relation to the features assigned to one of the two types (1 or 2) indicated below.

1. The quality level of the output data generated by the algorithm, e.g. the degree of similarity relative to the expected data. The output data quality level can be defined suitably to the kind of algorithm and the field of application, e.g.: (a) for a data compression algorithm, the output data quality level can be calculated as a quotient of the input (non-compressed) data size and the output (compressed) data size; (b) for a machine learning algorithm for statistical classification, such as an anti-spam filter, the most frequently used measure of the output data quality level is classification accuracy - the number of objects assigned by the algorithm to the proper class divided by the number of all objects in the input data.

2. The level at which the system is used by the algorithm during its operation, e.g. the level of usage of the computing capability of the computer (the time during which the CPU is used by the algorithm) and/or of specific subassemblies thereof, e.g. the size of memory used, the extent of hard disc capacity used, the size of data transmitted over the network, etc.

Some quality level evaluation criteria may also be used jointly, e.g. classification accuracy and operation rate of the algorithm. In such a case, the result of the quality level measurement will not be a single result value but a set of several result values.

For example, in the data compression process, an algorithm, an algorithmic task, as well as an evaluation procedure may be defined in the following manner.

The algorithm is implemented as two functions: one (Compress) receives some data at the input and generates compressed data at the output, which can be fed to the input of a second function (Decompress), so as to obtain at the output the original data:

data D1 → Compress → compressed data D2 → Decompress → data D3 = D1

The algorithmic task is a file containing data D1.

The evaluation procedure calculates the compression rate obtained by a specific algorithm in the compression of data D1. For this purpose the following steps are performed:

1. actuating the Compress function to load data D1 at the input and obtain data D2 at the output;

2. actuating the Decompress function to load data D2 at the input and obtain data D3 at the output;

3. checking whether D3 = D1 (whether D3 and D1 are identical); if not, operation is discontinued and an error message is generated (the algorithm operates incorrectly);

4. calculating the size S1 of data D1 and the size S2 of data D2;

5. calculating the compression rate r according to the formula: r = S1 / S2;

6. obtaining the value r as a result value.
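Purely as an illustrative sketch of the steps listed above, such a compression evaluation procedure could be expressed in Java as follows; the Compressor interface, its compress/decompress methods and all names used here are assumptions introduced for this example and are not part of the invention as claimed.

    import java.util.Arrays;

    // Hypothetical interface of a compression algorithm (an assumption for this sketch).
    interface Compressor {
        byte[] compress(byte[] data);
        byte[] decompress(byte[] data);
    }

    // Sketch of the compression evaluation procedure: returns the compression rate r = S1 / S2.
    class CompressionEvaluation {
        public double evaluate(Compressor algorithm, byte[] d1) {
            byte[] d2 = algorithm.compress(d1);       // step 1: D1 -> Compress -> D2
            byte[] d3 = algorithm.decompress(d2);     // step 2: D2 -> Decompress -> D3
            if (!Arrays.equals(d1, d3)) {             // step 3: check whether D3 = D1
                throw new IllegalStateException("algorithm operates incorrectly: D3 != D1");
            }
            double s1 = d1.length;                    // step 4: sizes S1 and S2
            double s2 = d2.length;
            return s1 / s2;                           // steps 5-6: result value r = S1 / S2
        }
    }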

In another exemplary embodiment, for machine learning of statistical classification, algorithm, algorithmic task and evaluation procedure can be defined in the following manner.

The algorithm for learning of statistical classification can be implemented as two functions: (i) a function Learn receives training data T = ((x1, y1), (x2, y2), ..., (xn, yn)) and generates a specification of the learnt classifier h at the output; (ii) a function Apply uses the specification of the classifier h and test data without labels U = (u1, u2, ..., uk), ui ∈ X, and generates predicted labels L' = (l'1, l'2, ..., l'k), l'i ∈ Y, at the output for the individual objects tested:

training data T → Learn → classifier h

(classifier h, test data U) → Apply → labels L'

A statistical classification learning algorithmic task is a file including a data sequence D = ((v1, l1), (v2, l2), ..., (vm, lm)), vi ∈ X, li ∈ Y.

The evaluation procedure calculates the classification accuracy obtained by the classifier learnt by the learning algorithm used. For this purpose, the following steps are performed:

1. dividing the data sequence D into two disjoint sub-sequences D1 and D2, so that their sizes meet a predetermined ratio, e.g. 70% of the data D is allocated to D1 and 30% to D2;

2. actuating the function Learn, loading data D1 (T = D1 according to the above designations) at the input, and obtaining a specification of a classifier h at the output;

3. separating the pairs (v, l) of the sequence D2 to form two sequences: U = (v1, v2, ..., vk) and L = (l1, l2, ..., lk), where vi, li are the elements of the i-th pair (v, l) of the sequence D2, and k is the size of D2;

4. actuating the function Apply, loading the specification of the classifier h and the sequence U at the input, and obtaining a sequence of labels L' = (l'1, l'2, ..., l'k) predicted by the classifier h at the output;

5. counting the number c of labels in the sequence L' that are correct, i.e. the same as the corresponding labels in the sequence L: c = |{i : l'i = li}|;

6. calculating the classification accuracy A according to the formula: A = c / k;

7. obtaining A as the result value.
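As a minimal sketch only, the classification-accuracy evaluation described above might be written in Java along the following lines; the Classifier and LearningAlgorithm interfaces and all names are assumptions made for this illustration, not a prescribed implementation.

    import java.util.List;

    // Hypothetical interfaces for a statistical classification learning algorithm (assumptions).
    interface Classifier {
        String apply(double[] object);                                   // predicts a label for one object
    }

    interface LearningAlgorithm {
        Classifier learn(List<double[]> objects, List<String> labels);   // the Learn function
    }

    // Sketch of the evaluation procedure: splits D into D1 (70%) and D2 (30%) and computes A = c / k.
    class ClassificationEvaluation {
        public double evaluate(LearningAlgorithm algorithm,
                               List<double[]> objects, List<String> labels) {
            int split = (int) (objects.size() * 0.7);                    // step 1: split point
            Classifier h = algorithm.learn(objects.subList(0, split),    // step 2: learn on D1
                                           labels.subList(0, split));
            int k = objects.size() - split;                              // size of D2
            int c = 0;                                                   // step 5: number of correct labels
            for (int i = split; i < objects.size(); i++) {               // steps 3-4: apply to U
                if (h.apply(objects.get(i)).equals(labels.get(i))) {
                    c++;
                }
            }
            return (double) c / k;                                       // steps 6-7: A = c / k
        }
    }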

The method for automatic algorithm testing of the invention is carried out in a testing procedure operation mode with the use of the contents of the repository of resources, based on the specification provided by the user.

The testing procedure is actuated for a specifically indicated testing specification. The testing specification defines: (i) which evaluation procedure is to be actuated in order to perform the testing, (ii) which algorithm is going to be tested, and (iii) which algorithmic task is going to be presented to the algorithm to be resolved. The testing specification thus includes a set of three identifiers (E, A, T), where the symbol E means an identifier for a resource that includes an evaluation procedure, the symbol A means an identifier for a resource that includes the algorithm, and the symbol T is an identifier of a resource that comprises an algorithmic task. In addition to the set of identifiers, the testing specification can include additional instructions, such as a numeral that determines the maximum duration time for performing the evaluation procedure and/or a numeral that determines the maximum memory size used by the evaluation procedure.
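As an illustration only, a testing specification of this kind could be represented by a simple data structure such as the Java class sketched below; the class and field names, including the optional limits, are assumptions made for this example.

    // Hypothetical representation of a testing specification (E, A, T) with optional limits.
    class TestSpecification {
        final String evaluationId;   // E - identifier of the resource with the evaluation procedure
        final String algorithmId;    // A - identifier of the resource with the algorithm
        final String taskId;         // T - identifier of the resource with the algorithmic task
        final long maxDurationMs;    // optional: maximum duration of the evaluation procedure
        final long maxMemoryBytes;   // optional: maximum memory used by the evaluation procedure

        TestSpecification(String evaluationId, String algorithmId, String taskId,
                          long maxDurationMs, long maxMemoryBytes) {
            this.evaluationId = evaluationId;
            this.algorithmId = algorithmId;
            this.taskId = taskId;
            this.maxDurationMs = maxDurationMs;
            this.maxMemoryBytes = maxMemoryBytes;
        }
    }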

Resources that include evaluation procedures, algorithms and algorithmic tasks are stored in the repository of resources. Resources can be in the form of files, and embodiments are possible where the evaluation procedure, algorithm or algorithmic task are included in separate files, as well as embodiments where a file jointly includes an algorithm and an evaluation procedure, or jointly an algorithm and an evaluation procedure as well as an algorithmic task, or several different algorithms, or several different evaluation procedures, etc. Moreover, the contents of a resource are subjected to interpretation in the course of the testing procedure, as a consequence of which, in a first exemplary interpretation, the contents of a resource can be interpreted as an algorithm, while in another interpretation as an algorithmic task, etc.

The repository of resources is a database that contains resources with algorithms, evaluation procedures and algorithmic tasks, used in the testing procedure. Each resource has a unique identifier.

A resource in the repository of resources can depend on at least one further resource. Then performing testing concerning this resource also requires retrieval of the resources it depends on. In particular, an algorithm during operation may use other algorithms contained in the repository of resources. In an exemplary embodiment, if a resource with the identifier X depends on a resource with the identifier Y, then during testing with the specification (E, A, T), where E, or A, or T means X, not only the resources E, A and T are retrieved, but also the resource Y. The repository of resources contains complete information concerning dependencies between the resources.
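A possible, purely illustrative way of retrieving a resource together with all the resources it depends on is sketched below in Java; the ResourceRepository interface and its methods are assumptions and do not reflect any concrete implementation described here.

    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    // Hypothetical repository interface (an assumption for this sketch).
    interface ResourceRepository {
        byte[] getResource(String identifier);            // contents of a resource
        List<String> getDependencies(String identifier);  // identifiers the resource depends on
    }

    class ResourceRetriever {
        // Retrieves the resource with the given identifier and, recursively, every resource it depends on.
        static Map<String, byte[]> retrieveWithDependencies(ResourceRepository repo, String id) {
            Map<String, byte[]> retrieved = new LinkedHashMap<>();
            collect(repo, id, retrieved);
            return retrieved;
        }

        private static void collect(ResourceRepository repo, String id, Map<String, byte[]> out) {
            if (out.containsKey(id)) {
                return;                                    // already retrieved; also avoids cycles
            }
            out.put(id, repo.getResource(id));
            for (String dep : repo.getDependencies(id)) {
                collect(repo, dep, out);
            }
        }
    }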

For example, the resources contained in the repository of resources are files. Moreover, the resources may be classes or functions with executable instructions (such as Java language classes or Python language functions) included in files and designated with unique identifiers.

In an exemplary embodiment, the repository of resources comprises files of two kinds: (i) a JAR-type (Java ARchive) file that comprises the bytecode of Java language classes and that can be executed by the Java Virtual Machine (JVM); a JAR file can include a class that implements an evaluation procedure and/or a class that implements a certain algorithm, or optionally classes that implement several algorithms; (ii) a file with data that represent a certain algorithmic task, e.g. an ARFF-type file that represents a statistical classification task.

These two kinds of files can be distinguished based on the file extension: .jar for JAR-type files, or any other extension in the case of data files. Resource identifiers can have a textual form of a series of characters unique for a specific resource. For example, identifiers can have the following form:

1. TextCompression.jar - identifier for a file with Java code;

2. spam.arff - identifier for a file with data in the ARFF format that may represent e.g. a task for recognition of 'spam' type e-mail messages (i.e. classification of a message as spam or non-spam);

3. someDocument.txt - identifier for a file of any contents and format that can be used e.g. in data compression algorithms;

4. UCI/data/glass.arff, UCI/data/audiology.arff - identifiers can have a multi-part structure so as to make them easier for the user to handle, e.g. to memorize; this structure is similar to the structure of directory names on a computer disc in conventional operating systems (e.g. Windows, Linux), where the files may be positioned in nested directories and the complete unique file name (UCI/data/glass.arff) is composed as a concatenation of the names of the successive nested directories (UCI, data) and the file name (glass.arff), separated by a slash "/".

In another exemplary embodiment, evaluation procedures and algorithms may be implemented in any programming language and then built into standalone software contained in a DLL (Dynamic-Link Library) file for Windows or an SO (Shared Object) file for Unix/Linux operating systems. Exemplary resource file names:

1. johnsmith/alg/compression/TextCompression.dll;

2. johnsmith/alg/compression/TextCompression.so.

Class identifier. In an exemplary embodiment, the resources are Java language classes included in JAR files, which have assigned identifiers that define the file and the class therein. The class identifier is composed of two parts: a JAR file identifier from the repository of resources and the name of a class included therein, separated by a colon ":", which is considered a special symbol and cannot appear in the class name nor in the file identifier (thus a given class identifier can be unambiguously divided into a file identifier and a class name; moreover, based on analysis of the identifier itself, it can be determined whether it is a file or a class identifier).

For example:

TextCompression.jar:org.johnsmith.NewPPM_Algorithm

is an identifier of a Java class named org.johnsmith.NewPPM_Algorithm, included in a JAR file resource having the identifier TextCompression.jar.

If the test specification input to the testing procedure provides a class identifier, then retrieving the corresponding class consists in retrieving the file the class is contained in. For example, if the evaluation procedure identifier E is a class identifier in the form E = J:C (J - JAR file identifier, C - class name), the testing procedure retrieves the file J and then actuates a fixed method (e.g. a method named run) of the class C. If instead of J:C an identifier J (E = J) were input, the same file J would be retrieved, but another class (with some pre-established name) could be executed. Due to the possibility of inputting a class identifier in the testing specification, the author of the class that implements the evaluation procedure can assign any name to the class.
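Splitting a class identifier of the form J:C into the file identifier and the class name could be done, for example, as in the hedged sketch below; the helper class name and the behaviour when no colon is present are assumptions made for illustration.

    // Hypothetical parser for identifiers of the form J:C (JAR file identifier : class name).
    class ClassIdentifier {
        final String fileId;    // J - identifier of the JAR file in the repository of resources
        final String className; // C - name of the class inside the JAR file, or null for E = J

        ClassIdentifier(String identifier) {
            int colon = identifier.indexOf(':');   // ':' is a special symbol, appearing at most once
            if (colon < 0) {
                this.fileId = identifier;          // plain file identifier: E = J
                this.className = null;             // a class with a pre-established name would be used
            } else {
                this.fileId = identifier.substring(0, colon);
                this.className = identifier.substring(colon + 1);
            }
        }
    }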

Similarly, the identifier of an algorithm A or an algorithmic task T in the testing specification may be a class identifier. Therefore, the evaluation procedure used in a given test actuates the corresponding method/methods of the class specified by the input identifier and not of a class with a preset name.

Parameterized identifier. A parameterized identifier is a concatenation, i.e. a combination, of a resource identifier and a parameter specification. A parameter specification is a series of characters that defines values of selected parameters that control operation of the executable code included in a given resource. A parameterized identifier can be present in the testing specification in lieu of a (non-parameterized) resource identifier. Then, upon actuation by the testing procedure or the evaluation procedure of the resource code indicated by a parameterized identifier, the suitable parameters are assigned the values indicated by the parameterized identifier. In an exemplary embodiment, the parameter specification is a concatenation of one or more parameter specifiers separated by a vertical bar "|". The parameterized identifier is a resource (file, class, function, etc.) identifier immediately followed by a vertical bar "|" and the parameter specification. The mark "|" is considered a special one and cannot be present in a resource identifier or in a parameter specifier.

The parameter specifier is a sequence of characters in the form N=W, where N is the parameter name and W is its value. In another embodiment, the parameter specifier comprises solely the value, and then its location within the parameter specification (the position of the specifier in the series of specifiers) determines to which parameter a given value is to be assigned. For example:

Regression.jar:org.js.PolynomialRegression|degree=3|alpha=0.2

is a parameterized identifier that specifies two parameters: degree and alpha. These values are to be assigned upon actuation of the instructions of the class org.js.PolynomialRegression from the file Regression.jar. The method of assigning parameter values to the code being actuated depends on the particulars of the testing procedure implementation (for parameterized evaluation procedures) or of a specific evaluation procedure (for algorithms and algorithmic tasks).
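A parameterized identifier of the above form could be decomposed, for example, as in the following sketch; the class and method names are assumptions, and only the N=W form of parameter specifiers is handled here.

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Hypothetical parser for parameterized identifiers: resourceId|name1=value1|name2=value2|...
    class ParameterizedIdentifier {
        final String resourceId;
        final Map<String, String> parameters = new LinkedHashMap<>();

        ParameterizedIdentifier(String identifier) {
            String[] parts = identifier.split("\\|");          // '|' is a special character
            this.resourceId = parts[0];                         // plain (non-parameterized) identifier
            for (int i = 1; i < parts.length; i++) {
                String[] nameValue = parts[i].split("=", 2);    // parameter specifier N=W
                parameters.put(nameValue[0], nameValue[1]);
            }
        }
    }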

As indicated above, a given resource may be used for testing in several different roles: as an algorithm, an algorithmic task and/or an evaluation procedure. E.g. a JAR file with the identifier JAR_Compression.jar can comprise a data compression algorithm that will be subjected to evaluation, and at the same time, for further testing, it can be used as a set of data that describes a data compression algorithmic task, i.e. the contents of the file are to be fed to some data compression algorithm for compression. Therefore, it is possible to perform testing with the specification (E, A, T), where A = T = JAR_Compression.jar.

In other words, the same resource can be used to perform two different functions: that of an algorithm and that of an algorithmic task.

In another embodiment, different tests can be performed with the specifications (E1, A1, T1) and (E2, A2, T2), where A1 = T2 = JAR_Compression.jar, so that the resource fulfills two different functions in the different tests. Another example: the resource Compression.jar can simultaneously include a data compression algorithm subjected to evaluation and an evaluation procedure as such. Therefore, it is possible to perform testing with the specification (E, A, T), where A = E = Compression.jar.

The method for automatic algorithm testing according to the invention is executed in a testing procedure operation mode with the use of the contents of the repository of resources, based on the specification provided by the user. The results of operation of the testing procedure can be stored in the database of results, so as to allow their further analysis and comparison to the results of other tests. In an exemplary embodiment, a database of resources is created, said database comprising at least one resource that includes an algorithm, an algorithmic task and an evaluation procedure. The resources can be files that can optionally include classes. Retrieval of a resource amounts to retrieval of a file or retrieval of a class-containing file.

In the automatic testing method, a testing procedure is actuated which executes the operations presented in Fig. 2. The testing procedure actuates an evaluation procedure which executes the operations presented in Fig. 3. The subsequent method steps are described below in more detail.

i) The testing procedure is actuated, which testing procedure retrieves a testing specification (E, A, T) set by the user.

ii) Then the testing procedure retrieves from the repository of resources the resource with the identifier E and checks whether the resource E includes an evaluation procedure; if "YES", the evaluation procedure is actuated and the identifiers of the resources A and T are sent to it; if "NO", operation is discontinued (with optional generation of an error message).

iii) The actuated evaluation procedure retrieves from the repository of resources the resources with the identifiers A and T; it checks whether the resource A includes an algorithm of a type supported by the evaluation procedure and whether T represents an algorithmic task of a type supported by the evaluation procedure; if any of these checks outputs "NO", the evaluation procedure ceases to operate; if both checks generate "YES", the procedure passes to step (iv). The evaluation procedure does not have to accept as arguments all the algorithms or algorithmic tasks stored in the repository of resources. Usually, the evaluation procedure will be tailored to a specific type of algorithms and tasks (e.g. solely algorithms and tasks of statistical classification). If, upon retrieval of the resources A and T, it turns out that they do not represent a suitable kind of algorithm or task, the evaluation procedure stops operation, and it can also generate an error message.

iv) The evaluation procedure actuates the algorithm included in the resource A.

The evaluation procedure provides the input data to be read out by the algorithm, specific for the algorithmic task included in the resource T and specific for the actuated evaluation procedure. In the course of resolving the algorithmic task by the algorithm, output data are generated which are subjected to analysis by the evaluation procedure. In parallel with the solving of the algorithmic task by the algorithm, or after the resolving of the task is completed, the evaluation procedure may measure the level of usage of the system, e.g. the level of usage of the computing capacity of the computer and/or of some of its subassemblies, such as RAM memory, hard disk, etc. As a consequence of the operation of the evaluation procedure, at least one result value is obtained, e.g. in the form of a numerical value, or a set of result values, e.g. in the form of a set of numerical values.

The result value may be obtained in an evaluation procedure operation mode, through analysis of algorithm output data when compared to the reference data provided by the algorithmic task. The result of the analysis generates one or more result values that characterize accuracy, efficacy and/or effectiveness of data processing performed by the algorithm A when used for data specified by the task T. The result values constitute jointly the test result. Number, meaning and calculation method of the result values are specific for a given evaluation procedure.

The testing procedure may be executed again or repeatedly for a given test specification (i.e. according to a preset evaluation procedure, on a preset algorithm, with the use of the same algorithmic task). The calculated result values constitute jointly a further test result.

v) The testing procedure optionally transmits the test result to the database of results, where the result is stored. Along with the result, the test specification is transmitted and stored.

If at least two test results are assigned to the same testing specification (in a preferable embodiment, when the testing procedure is executed at least twice for a given test specification), upon transmittal of the test result to the database of results along with the test specification, a numerical designation is assigned to the result, said designation along with the testing specification unequivocally characterizing the result. In an exemplary embodiment, the first test result with a preset testing specification is assigned the number 1, the subsequent result is assigned the number 2, and any possible further test results are assigned the subsequent integers in ascending order, i.e. numbers 3, 4, 5, etc.

vi) The test results stored in the database of results (along with the testing specification and optionally with information identifying the user who actuates the testing procedure and/or with information characterizing the hardware/software environment in which the testing procedure is actuated) can then be used for analysis of the collected results. Preferably, the results can be subjected to the following analyses: i) performing statistical analysis (calculation of a mean value, standard deviation, variance, etc.) of different test results characterized by the same specification (E, A, T); ii) performing result selection for testing of a selected type, e.g. directed to one pre-established algorithm or one pre-established data set; iii) sorting the selected test results according to a selected component value of the result (e.g. usage of memory); iv) generating a comparative plot that visualizes the selected test results; v) comparing the results of different algorithms actuated on the same task, in the same evaluation procedure; vi) comparing results obtained from the evaluation procedure in the course of resolving different algorithmic tasks by a single selected algorithm.
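Solely as an illustrative sketch of the control flow of steps (i)-(v) above, the testing procedure might be written in Java as follows; the EvaluationProcedure, ResourceLoader and ResultsDatabase interfaces, and the way an evaluation procedure is obtained from the resource E, are assumptions made for this example.

    // Hypothetical interfaces (assumptions for this sketch).
    interface EvaluationProcedure {
        Double[] run(String algorithmId, String taskId);    // returns the result values
    }

    interface ResourceLoader {
        // Interprets the contents of the resource E as an evaluation procedure, or returns null.
        EvaluationProcedure loadEvaluationProcedure(String evaluationId);
    }

    interface ResultsDatabase {
        // Stores a test result along with its specification (E, A, T) and a sequence number.
        void store(String e, String a, String t, int sequenceNumber, Double[] resultValues);
    }

    class TestingProcedure {
        // Executes steps (iii)-(v) for one testing specification (E, A, T).
        static void runTest(ResourceLoader loader, ResultsDatabase results,
                            String e, String a, String t, int sequenceNumber) {
            // (iv) retrieve the resource E and check that it includes an evaluation procedure
            EvaluationProcedure evaluation = loader.loadEvaluationProcedure(e);
            if (evaluation == null) {
                throw new IllegalArgumentException("resource E is not an evaluation procedure");
            }
            // (v) actuate the evaluation procedure on the resources A and T
            Double[] resultValues = evaluation.run(a, t);
            // optionally transmit the test result to the database of results
            results.store(e, a, t, sequenceNumber, resultValues);
        }
    }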

In one exemplary embodiment, the testing procedure is implemented in the Java language and executed on the Java Virtual Machine (JVM). Similarly, evaluation procedures and algorithms are implemented as Java language classes. Actuation of an evaluation procedure E or an algorithm A may then consist in executing the following sequence of steps:

1. retrieving from the repository of resources the resource with the identifier E or the resource with the identifier A and checking whether it is a JAR-type file;

2. loading the bytecode of the corresponding class from the JAR file into the JVM, by calling the method defineClass of the standard class java.lang.ClassLoader;

3. calling the suitable method(s) of the loaded class. The name of the loaded class and the name, arguments and return type of the called method are determined by the testing procedure (when the class implements the evaluation procedure) or by the evaluation procedure used in the given test (when the class implements the algorithm).

For example, the testing procedure may require each evaluation procedure to be implemented in a method run of the class org.test.Evaluation that takes two String-type arguments (the identifiers A and T) and returns a Double[]-type object (an array of test result values). The declaration of the class in the Java language might be as follows:

package org.test;

public class Evaluation {
    public Double[] run(String algorithm, String task) {
        // ... body of the method ...
    }
}
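The loading and invocation steps described above could look roughly like the following sketch; URLClassLoader is used here instead of calling defineClass directly, and the file handling and error behaviour are assumptions made purely for illustration.

    import java.io.File;
    import java.lang.reflect.Method;
    import java.net.URL;
    import java.net.URLClassLoader;

    class EvaluationRunner {
        // Loads the class org.test.Evaluation from a retrieved JAR file and calls its run(A, T) method.
        static Double[] runEvaluation(File jarFile, String algorithmId, String taskId) throws Exception {
            URL[] urls = { jarFile.toURI().toURL() };
            try (URLClassLoader loader = new URLClassLoader(urls)) {
                Class<?> evaluationClass = loader.loadClass("org.test.Evaluation");
                Object evaluation = evaluationClass.getDeclaredConstructor().newInstance();
                Method run = evaluationClass.getMethod("run", String.class, String.class);
                return (Double[]) run.invoke(evaluation, algorithmId, taskId);
            }
        }
    }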

Similarly, an evaluation procedure for data compression algorithms may require each such algorithm to be implemented in two related methods, compress and decompress, of the class org.test.Compression, each of which takes one byte[]-type argument (a sequence of bytes) and also returns a byte[]-type object. The method compress compresses the provided sequence of bytes and returns the compression result, i.e. a sequence of bytes - usually smaller than the original one - that can be passed to the method decompress to obtain, as the returned object, the original series of bytes that was earlier fed to the method compress.
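A corresponding declaration of such a compression algorithm class might, as a sketch with placeholder method bodies, look as follows:

package org.test;

public class Compression {
    // Compresses the provided sequence of bytes and returns the (usually smaller) result.
    public byte[] compress(byte[] data) {
        // ... body of the method ...
        return data;   // placeholder only: a real algorithm would return the compressed bytes
    }

    // Restores the original sequence of bytes from the result of compress.
    public byte[] decompress(byte[] data) {
        // ... body of the method ...
        return data;   // placeholder only
    }
}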

In another embodiment, the testing procedure is compiled to the machine code of a selected machine, e.g. one based on the Intel processor architecture and operating under control of the Windows operating system. Evaluation procedures and algorithms are implemented as functions compiled into the machine code of this machine. The functions are placed in files of DLL (Dynamic-Link Library) libraries. Actuation of an evaluation procedure or algorithm may then consist in executing the following sequence of steps:

1. retrieving from the repository of resources the resource with the identifier E or A, and checking whether it is a DLL-type file;

2. loading the DLL library into the address space of the testing procedure, e.g. by calling the LoadLibrary function defined by the API (Application Programming Interface) of Windows;

3. calling the function that implements the evaluation procedure or algorithm; the name, arguments and return type of the function are determined by the testing procedure (if the function implements an evaluation procedure) or by the evaluation procedure used for the given testing (when the function implements an algorithm).

As indicated above, the testing procedure can be executed repeatedly for the same testing specification. Since a successive numerical value is assigned to each result, it is also possible to store a plurality of results for the same specification in the database of results. The results can differ, since they usually depend on possibly varying characteristics of the environment where testing is performed or on elements of non-determinism in the objects tested (algorithms, tasks, evaluation procedures). Even if the results are not identical, they still have some specific mutual features that can be subjected to statistical analysis once some number of the results has been collected in the database of results. For example, all the results for some testing specification can derive from a common probability distribution, the parameters of which - such as a mean, standard deviation, etc. - can be estimated based on the contents of the database of results. For example, the consecutive test results will vary if one of the test elements is the operation time of the algorithm, which depends not only on the testing specification but also on external factors such as the technical parameters of the computer where it is executed or the execution of other applications running concurrently with the testing procedure. Differentiation of the results may also arise from internal features of the algorithms, tasks or evaluation procedures being tested. For example, a neural network learning algorithm may randomly initialize network connection weights; the algorithmic task may also specify input data for the algorithm that are generated by adding random disturbances to some established set of data - then upon each actuation of testing these disturbances will vary slightly in value but they will be the same in their general nature (e.g. always derived from a normal distribution of a fixed variance); the evaluation procedure for algorithms of statistical classification machine learning may divide the input data specified by the task into two disjoint parts - a training one and a testing one - on a random basis.

As indicated above, the obtained test results can be transferred to and stored in the database of results. The database of results is a database that comprises results of tests obtained by the automatic testing method according to the invention and stored in this database. In a preferred embodiment, the database of results and the repository of resources can be physically created as one database that combines both functionalities.

When the test result is sent to the database of results, user-identifying information can be added, said information concerning the user who actuated the testing procedure. The information is stored in the database of results along with the corresponding result, which enables further selection of results sent solely by a specific user or comparison of results sent by different users.

Additionally or alternatively, when sending the test result to the database of results, information can be enclosed that characterizes the hardware/software environment in which testing was performed, e.g. the size of the available internal memory of the computer, the CPU operation rate, the name and version of the operating system, the results of certain system efficiency measurements performed by the testing procedure, etc. The information is stored in the database of results along with the test result, which enables further analysis of the relationship of the results to the hardware/software environment where testing is performed.
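For illustration only, such environment information could be gathered in Java roughly as follows; the EnvironmentInfo name and the particular keys collected are assumptions and not a prescribed set of measurements.

    import java.util.LinkedHashMap;
    import java.util.Map;

    class EnvironmentInfo {
        // Collects simple hardware/software characteristics of the testing environment.
        static Map<String, String> collect() {
            Map<String, String> info = new LinkedHashMap<>();
            info.put("os.name", System.getProperty("os.name"));             // operating system name
            info.put("os.version", System.getProperty("os.version"));       // operating system version
            info.put("java.version", System.getProperty("java.version"));   // runtime version
            info.put("availableProcessors",
                     String.valueOf(Runtime.getRuntime().availableProcessors()));
            info.put("maxMemoryBytes",
                     String.valueOf(Runtime.getRuntime().maxMemory()));     // memory available to the JVM
            return info;
        }
    }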

Before the test result is introduced into the database of results, it is checked whether, during testing, any of the resources subjected to testing has been updated or deleted. If so, the result is not entered into the database of results, as it is no longer a valid result.

The number and meaning of the components of the result are determined by the evaluation procedure E used for the testing. For example, if the resource ClassificationEval.jar comprises an evaluation procedure for statistical classification algorithms, then the results of all tests where E = ClassificationEval.jar may be composed of a pair of result values (t, a), where t is the total operation time of the algorithm being tested and a is the classification accuracy rate obtained by the algorithm.

In another exemplary embodiment, if the resource TextCompression.jar comprises an evaluation procedure for compression algorithms, then the results of all tests where E = TextCompression.jar may be composed of three result values (t1, t2, r), where t1 is the operation time of data compression, t2 is the operation time of decompression, and r is the data compression rate obtained by the algorithm being tested.

In the subject invention, possibilities of modification of the repository of resources and automatic adjustment of the contents of the database of results are also contemplated. For example, it is possible to delete a resource from the repository of resources, to add a new resource, to update a resource (i.e. to change the contents of the resource), or to change a resource identifier. Upon a modification in the repository of resources, the contents of the database of results are automatically adjusted correspondingly. Preferably, the modifications indicated below are performed in a synchronous manner.

Deletion or update of a resource automatically causes deletion of all the related results from the database of results (they become invalid).

A change of a resource identifier automatically causes an update of the testing specifications stored in the database of results. The update of a testing specification consists in replacing any occurrence of the old resource identifier with the new one.

Since a resource in the repository of resources may depend on at least one further resource, execution of testing related to that resource also requires retrieving the resources it depends on. E.g., an algorithm in the course of operation may use other algorithms included in the repository of resources. In an exemplary embodiment, if a resource with an identifier X depends on a resource with an identifier Y, then during testing with the specification (E, A, T), where E, or A, or T is X, not only the resources E, A and T are retrieved, but also the resource Y. Since the repository of resources contains complete information on the relationships between the resources, in the case of a resource X that depends on a resource Y, as a consequence of deletion or update of Y, not only the test results for the specifications in which Y is present are deleted from the database of results, but also the test results for the specifications in which X is present.

In the automatic algorithm testing method of the invention, a concept of preventing or constraining execution of undesired operations during operation of the evaluation procedure is also envisaged. For example, the testing procedure may include security means that constrain or prevent performing operations of certain (potentially harmful) kinds by the executable code retrieved from the repository of resources and executed in the course of testing. Examples of harmful operations are the deletion of data from the user's disk, installation of viruses, contacting websites without the user's awareness, etc. These security means allow users to test resources generated by other users without risk of damage to the user's computer system.

The method for automatically performing quality tests and collecting results, according to the subject invention, provides opportunities for analysis and interpretation of the results.

Moreover, the method provides the possibility to easily add new algorithms, algorithmic tasks and evaluation methods, without the necessity to modify the method steps.

The inventive method ensures experimental evaluation of an algorithm's quality level, necessary inter alia for all data mining, machine learning or data compression algorithms, while maintaining comparability of the obtained results. The use of the method of the invention limits the risk of selecting an improper algorithm or of assigning an excessively high quality level to an algorithm.