
Patent Searching and Data


Title:
DATA AGGREGATION AND PERFORMANCE ASSESSMENT
Document Type and Number:
WIPO Patent Application WO/2018/002664
Kind Code:
A1
Abstract:
Implementations of the current subject-matter provide a method of satisfying a data query in a data processing system having access to a database defining, for each of one or more data objects, a plurality of data fields having data values allocated thereto. The method may comprise receiving a request for data relating to a specific data object; determining whether the requested data relating to the specified data object is present in the database; and, if the requested data relating to the specified data object is not present in the database, attempting to acquire the data from an external source. Attempting to acquire the data from an external source may comprise fetching unstructured data relating to the specific data object; analysing the unstructured data to identify data elements therein; and processing each data element to assess its relation to respective data fields of the data object and, on a relation being estimated between a data element and a data field, populating the data field for the specific data object with content data from the data element. A computer program product and a computer system are also provided.

Inventors:
OSBORNE JOANNE (GB)
ZATUCHIN DMITRIJ (PL)
Application Number:
PCT/GB2017/051942
Publication Date:
January 04, 2018
Filing Date:
June 30, 2017
Export Citation:
Assignee:
OSBORNE JOANNE (GB)
ZATUCHIN DMITRIJ (PL)
International Classes:
G06F17/30; G06F11/30
Domestic Patent References:
WO2012125166A12012-09-20
WO2012057728A12012-05-03
Foreign References:
US20080177770A12008-07-24
Other References:
None
Attorney, Agent or Firm:
SLINGSBY PARTNERS LLP (GB)
Claims:
CLAIMS

1. A method of satisfying a data query in a data processing system having access to a database defining, for each of one or more data objects, a plurality of data fields having data values allocated thereto, the method comprising:

receiving a request for data relating to a specific data object;

determining whether the requested data relating to the specified data object is present in the database; and

if the requested data relating to the specified data object is not present in the database, attempting to acquire the data from an external source,

wherein attempting to acquire the data from an external source comprises:

fetching unstructured data relating to the specific data object;

analysing the unstructured data to identify data elements therein; and

processing each data element to assess its relation to respective data fields of the data object and, on a relation being estimated between a data element and a data field, populating the data field for the specific data object with content data from the data element.

2. A method in accordance with claim 1, wherein the one or more data objects are a set of data objects each relating to a same type of a real object.

3. A method in accordance with claim 2, wherein each data object of the set has at least one same data field of the plurality of data fields.

4. A method in accordance with claim 3, wherein receiving a request comprises receiving a request for data relating to a same data field of at least some data objects of the set and the determining and attempting to acquire are carried out for each object of the set.

5. A method in accordance with any preceding claim, wherein the unstructured data is gathered from a search on the specified data object.

6. A method in accordance with claim 5, wherein the search is a search on a communication network which may be one or more of a shared network; a public network; a private network; a wide area network; a local area network; a wireless communication network; a wired communication network; the internet; and an enterprise network.

7. A method in accordance with any preceding claim, wherein the unstructured data comprises one or more of an image; a photograph; a video clip; a frame of a video; an icon; a logo; a sound file; and a wave file.

8. A method in accordance with claim 7, wherein the data elements comprise blocks of the unstructured data.

9. A method in accordance with any preceding claim, wherein analysing the unstructured data comprises assessing temporal data associated with the unstructured data.

10. A method in accordance with claim 9, wherein the analysing is carried out before processing each data element and only processing the data element if the temporal data indicates that the unstructured data is newer than current data in the database pertaining to the specific object.

11. A method in accordance with any preceding claim, wherein a relation between a data element and its respective data field of the data object comprises estimation by one or more of: semantic recognition; optical character recognition; referring to a database storing multiple names for a data value; and querying an external data source about multiple names for a data value.

12. A method in accordance with any preceding claim, further comprising, if it is estimated that there is not or probably not a relation between a data element and its respective data field, discarding the data element.

13. A method in accordance with any preceding claim, further comprising determining whether a data frame schema exists for the data fields having a relation with a respective data element and if not, creating a data frame schema for the data object which includes those data fields.

14. A method in accordance with any preceding claim, further comprising determining whether a data frame schema exists for the data fields having a relation with a respective data element and if one exists which includes only some of those data fields, expanding the data frame schema to include data fields not currently existing.

15. A method in accordance with any preceding claim, further comprising determining whether existing data aggregation rules apply to the data fields having a relation with a respective data element and if not, constructing one or more aggregation rules for the data fields.

16. A method in accordance with any preceding claim, wherein attempting to acquire the data comprises fetching various data and determining whether some of the data is unstructured.

17. A method in accordance with claim 16, further comprising fetching structured data relating to the specific data object and populating one or more data fields with content data from the structured data.

18. A method in accordance with claim 17, further comprising aggregating the data elements having a relation with a respective data field together with the fetched structured data and any existing data in the database.

19. A method in accordance with claim 18, further comprising generating a view of the aggregated data.

20. A computer program product comprising a computer-readable medium storing instructions which, when executed by at least one programmable processor, cause the at least one programmable processor to perform operations comprising the method of any preceding claim.

20. A computer system for aggregating data, comprising a computer-readable medium in accordance with claim 19 and one or more servers and processors.

21. A method for analysing the performance of an application running on a data processing system, the method comprising, in a data processing system, implementing the steps of: repeatedly interacting with the application to generate a first map of the user interface structure of the application;

receiving performance data indicative of the performance of the application over a period of time;

reducing the complexity of the first map in dependence on the performance data to form a second map;

receiving one or more success criteria indicative of successful performance of the application; and

comparing the performance of the application as represented by the second map with the success criteria.

22. A method as claimed in claim 21, wherein nodes of the first map represent user interface elements of the application.

23. A method as claimed in claim 22, wherein the application is a web application and nodes of the first map represent web pages capable of being served by the application.

24. A method as claimed in claim 22, wherein the application is a mobile application.

25. A method as claimed in any of claims 21 to 24, wherein the step of reducing the complexity of the first map comprises reducing the first map so as to form the second map.

26. A method as claimed in claim 25 as dependent on any of claims 22 to 24, wherein the step of reducing the first map comprises forming the second map such that it excludes nodes of the first map whose usage during the said period of time is below a predetermined threshold.

27. A method as claimed in any of claims 22 to 26, wherein the step of reducing the complexity of the first map comprises hierarchically arranging content of the first map to form the second map.

28. A method as claimed in any of claims 21 to 27, wherein the step of reducing the complexity of the first map comprises forming the second map such that it incorporates for nodes of the second map a representation of a subset of the performance data relating to those nodes.

29. A method as claimed in any of claims 21 to 28, comprising:

generating a modified map of the interface structure of the application;

comparing the performance of the application as represented by the modified map with the success criteria; and

if the performance of the application as represented by the modified map is greater than the performance of the application as represented by the second map, storing a representation of the modified map as a candidate alternative structure for the application.

30. A method as claimed in claim 29, wherein the modified map incorporates for nodes of the modified map a representation of a subset of the performance data relating to those nodes.

31. A method as claimed in claim 29 or 30, wherein the step of comparing the performance of the application as represented by the modified map with the success criteria comprises performing that comparison in dependence on environmental information for the said period of time.

32. A method as claimed in claim 31, wherein the environmental information represents one or more of: weather; the state of one or more natural forces; legal and/or political factors; current affairs; geographic factors; health information; demographic factors; social media content; statistical data; financial and economic data; financial market information; cultural factors; and market and/or industry competition.

33. A method as claimed in claim 31 or 32, comprising receiving the environmental information by downloading it from an external source.

34. A method as claimed in claim 33, comprising receiving the environmental information by downloading it from a publicly available external source.

35. A method as claimed in any of claims 29 to 34, comprising automatically modifying the application in accordance with the modified map.

36. A method as claimed in any of claims 21 to 35, wherein the application is an e-commerce application.

37. A computer readable data carrier storing non-transient instructions for causing a data processing system to perform the method of any of claims 1 to 19 or 21 to 36.

38. A computer program product comprising a computer-readable medium storing instructions which, when executed by at least one programmable processor, cause the at least one programmable processor to perform operations comprising the method of any of claims 1 to 19 or 21 to 36.

39. A data processing system configured to perform the method of any of claims 1 to 19 or 21 to 36.

Description:
DATA AGGREGATION AND PERFORMANCE ASSESSMENT

The portions of this specification headed "Data Aggregation" and "Performance Assessment" are to be construed independently from each other. Terms used in one portion may bear a different meaning in the other portion. General or limiting statements made in one portion should not be understood to apply in respect of the concepts discussed in the other portion.

DATA AGGREGATION

Technical field

This invention relates to a method of satisfying a data query in a data processing system, in particular for aggregating data, a computer program product for implementing such a method and a computer system for aggregating data.

Background

One task which it may be useful for data management systems to be able to handle is data aggregation. One example is an on-line retailer, who may have customers presenting queries regarding particular products or what products having one or more specified criteria are available to purchase. Another example is an enterprise management system in which a query for a particular set of data may be made, such as a list of employees possessing a given professional qualification and who may have suitable experience for a particular business task. Other similar situations in which data from multiple sources is required to be obtained, aggregated and analysed to find a particular piece of information or to obtain a set of data satisfying one or more criteria will occur to those skilled in the art.

One limitation of the above-described systems is that they are limited as to what data they are able to obtain and aggregate. In particular, known systems may be able to aggregate structured data, for example by retrieving and analysing a data table, but do not have any capability to retrieve or analyse unstructured data, nor to use it when performing data aggregation. This inability presents a limitation in many instances where data is not available in a structured form. For example, for a given product query to an on-line retailer, it may be that an optimum product for a customer exists but data about that product is not currently stored and/or not available in a structured format.

It would be desirable to provide an improved method and system which can retrieve, analyse and incorporate unstructured data in order to satisfy queries requiring data aggregation.

Summary

According to one aspect there is provided a method of satisfying a data query in a data processing system having access to a database defining, for each of one or more data objects, a plurality of data fields having data values allocated thereto. The method may comprise receiving a request for data relating to a specific data object; determining whether the requested data relating to the specified data object is present in the database; and, if the requested data relating to the specified data object is not present in the database, attempting to acquire the data from an external source. Attempting to acquire the data from an external source may comprise fetching unstructured data relating to the specific data object; analysing the unstructured data to identify data elements therein; and processing each data element to assess its relation to respective data fields of the data object and, on a relation being estimated between a data element and a data field, populating the data field for the specific data object with content data from the data element.

The one or more data objects may be a set of data objects each relating to a same type of a real object. Each data object of the set may have at least one same data field of the plurality of data fields. For example, whilst the data objects may all share a common field for an ingredient such as water, they may have different numbers of ingredient fields depending on the number of ingredients in the specific real product they relate to. Receiving a request may include receiving a request for data relating to a same data field of at least some data objects of the set, and the determining and attempting to acquire may be carried out for each object of the set.

The unstructured data may be gathered from a search on the specified data object. Such a search may be a search on a communication network which may be one or more of a shared network; a public network; a private network; a wide area network; a local area network; a wireless communication network; a wired communication network; the internet; and an enterprise network.

The unstructured data may comprise one or more of an image; a photograph; a video clip; a frame of a video; an icon; a logo; a sound file; and a wave file. The data elements may comprise blocks of the unstructured data. In implementations where the unstructured data is an image, this may include a mixture of text and graphics and the data elements may comprise portions or blocks of the image.

Analysing the unstructured data can comprise assessing temporal data associated with the unstructured data. The analysing may be carried out before processing each data element and the data element may only be processed if the temporal data indicates that the unstructured data is newer than current data in the database pertaining to the specific object.

A relation between a data element and its respective data field of the data object may comprise estimation by one or more of: semantic recognition; optical character recognition; referring to a database storing multiple names for a data value; and querying an external data source about multiple names for a data value. If it is estimated that there may not be or is probably not a relation between a data element and its respective data field, the data element may be discarded.

The method may further include determining whether a data frame schema exists for the data fields having a relation with a respective data element and if not, creating a data frame schema for the data object which includes those data fields. The method may also include determining whether a data frame schema exists for the data fields having a relation with a respective data element and if one exists which includes only some of those data fields, expanding the data frame schema to include data fields not currently existing.

The method may further comprise determining whether existing data aggregation rules apply to the data fields having a relation with a respective data element and if not, constructing one or more aggregation rules for the data fields. Attempting to acquire the data may comprise fetching various data and determining whether some of the data is unstructured. The method can further include fetching structured data relating to the specific data object and populating one or more data fields with content data from the structured data. The method may further comprise aggregating the data elements having a relation with a respective data field together with the fetched structured data and any existing data in the database.

A view of the aggregated data may be generated. Such a view can be returned to satisfy the data query.

According to another aspect of the current subject-matter, there is provided a computer program product comprising a computer-readable medium storing instructions which, when executed by at least one programmable processor, cause the at least one programmable processor to perform operations comprising any of the methods described above in any feasible combination.

According to yet another aspect of the current subject-matter, there is provided a computer system for aggregating data, comprising a computer-readable medium such as the one described above and one or more servers and processors. The servers and processors may be ones such as those described herein.

Brief description of the drawings

The present invention will now be described by way of example with reference to the accompanying drawings. In the drawings:

Fig. 1 shows an exemplary system environment suitable for use in implementations of the current subject-matter;

Fig. 2 shows a logical structure for a data processing and aggregation system in which illustrative examples of the current subject-matter may be implemented;

Fig. 3 shows a flowchart of data aggregation consistent with implementations of the current subject-matter;

Fig. 4 shows a part of the flowchart of Fig. 3 in more detail;

Fig. 5A shows an exemplary data frame schema and Fig. 5B shows an exemplary data frame created from the schema of Fig. 5A; and

Fig. 6 shows a part of the flowchart of Fig. 4 in more detail.

In the figures, like reference numerals indicate like parts.

Detailed description

The current subject-matter relates to a computer implemented method and a computer system for data aggregation. Such systems may include local components and use a data network across which data can be fetched from external sources. The data to be retrieved can be data for use in responding to a client query, where the client may be a computing device set up to make such requests or a human interacting with a device that is capable of transmitting the query. The query may relate to data representing a particular real object or individual. It may be a query which could relate to multiple data items representing multiple real objects or individuals. Thus in many cases, multiple data sources, which may be a mixture of sources internal and external to the system, are investigated in order to process the query. Some implementations are capable of retrieving unstructured data items, in many cases in addition to structured data items, and can use real time processing and data pattern recognition to handle the unstructured data. Thus unstructured data may be analysed and data formats created such that the unstructured data can be aggregated and stored in a useable format, in many cases together with structured data that already exists or that has also been fetched and analysed. In this way, a subset of data satisfying the query can be provided to the requesting client.

Figure 1 shows a system environment 100 in which an application or host server 200 can provide functions to a client device 201 over a network 202. The application server 200 may, for example, be a web server or a server device for another protocol. It may reside either physically or effectively as a cloud device in a host site. For example, the server 200 may run an on-line retailer's website. Alternatively or additionally it may be an enterprise's server system. It may comprise multiple servers. The client device 201 could, for example, be a personal computer or a mobile device such as a smartphone or tablet and may be operable to interact with the server 200 via the network 202. One client device is shown by way of example and to improve viewability of the figure, but it will be appreciated that multiple client devices may be part of the system. The network 202 could be a publicly accessible network, such as the internet, or a private network, such as an enterprise's intranet.

The application server 200 comprises a processor 210, a program and data store 211, a working memory 212 and a communication interface 213. The data store 211 stores in a transient or non-transient way computer program instructions that are executable by the processor 210 to perform the functions described herein, for example to provide the functionality of data retrieval, analysis and aggregation, as will be described in more detail with respect to subsequent figures. The data store 211 also stores working data for use by the processor in executing the instructions. The memory 212 is usable by the processor as a temporary store for data. The communication interface 213 is communicatively coupled to the processor 210 and can communicate with other devices over the network 202. Such communication may be wired or wireless.

The client device comprises a processor 220, a program and data store 221, a user interface 222 and a communication interface 223. The program and data store 221 stores in a transient or non-transient way computer program instructions that are executable by the processor 220 to allow it to request and interpret data from the application server 200. The user interface 222 can be controlled by the processor 220 to present to a user data received from the application server 200 and to gather input from the user which can be transmitted back to the application server. The user interface could, for example, be a touch screen. The communication interface 223 is communicatively coupled to the processor 220 to permit it to communicate with other devices over the network 202 and provide received information to the processor 220.

The system also includes other entities 203, 204 which are accessible over the network 202 and which may be termed external sources. Those entities could be web servers or devices using other protocols to provide external and/or third-party and/or environmental data. For example, those entities could store and be responsive to queries over the network 202 to provide information regarding current or historic data on any one or more of: weather, financial data, current affairs, travel events, user locations, service status, internet searches, product information and the like. In the context of the system 100 being an enterprise management system, each of these entities could store data for a particular department of the business.

In operation, a client device 201 may present a data query to the application server 200 across the network 202. The query may be a request for a report relating to data about a specific data object representing a real object or individual, or it may relate to, for example, a characteristic of a set of data objects. The application server can determine whether it has any data stored in the memory 212 which may assist in satisfying the query and it can also fetch data via the network 202 from one or more of the external data sources 203, 204. Thus a search for relevant data can be performed in a number of locations. The data can be analysed and aggregated to produce a view of the data in accordance with the requested report. Processes for doing this will be discussed in more detail below.

Figure 2 shows schematically a method structure 230, which may be a logical structure of a server such as the application server 200, for implementing examples of the current subject-matter. The method starts at 232 and at 234 a data query request is received into a request monitoring service. The request monitoring service may be implemented by the communications interface 213, which is capable of receiving data queries and passing them on to a processor such as the processor 210. Thus at 236 a request is passed to a processor to process the data query request.

At 238 a data matching service is implemented. This service is for matching data to the data request as will be explained in the following. The processor may determine at 240 whether a data frame schema exists for the data query and feed the result into the data matching service 238. A data frame schema is a structure which can be populated with data pertaining to a data object, as will be discussed in more detail below with respect to Fig. 5.

The data matching service 238 proceeds at 240, where it is determined whether cached results are available. This may involve reading a memory such as the data memory 211 or 212, to determine whether any data relating to the query is available. As a first example, if the query is "Does topical pain relief cream X made by manufacturer Y contain salicylic acid?", the determination looks at stored data to determine whether a data record pertaining to cream X is stored and whether it contains ingredient information. As a second example, the query could be "Provide a list of topical pain relief creams containing salicylic acid", in which case the determination looks at stored data, which may comprise multiple data records each relating to a topical relief cream, and then looks into the content of those data records to determine which include salicylic acid in the ingredient list. In this latter case, the query would eventually be satisfied by returning a subset of the multiple data records which satisfy the query.
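By way of illustration only, the cache check of step 240 might be sketched as follows. The function name and record layout below are assumptions made for this sketch and are not taken from the application; a real implementation would query the data store 211 or 212.

```python
# Illustrative sketch of the cached-results check (step 240).
# The record layout and function name are assumptions, not part of the disclosure.

def find_cached_records(database, product_name=None, ingredient=None):
    """Return the data records matching the query; an empty list is a cache miss."""
    matches = []
    for record in database:
        if product_name is not None and record.get("name") != product_name:
            continue
        if ingredient is not None and ingredient not in record.get("ingredients", []):
            continue
        matches.append(record)
    return matches

database = [
    {"name": "Cream X", "manufacturer": "Y", "ingredients": ["water", "salicylic acid"]},
    {"name": "Cream Z", "manufacturer": "W", "ingredients": ["water", "menthol"]},
]

# First example: does cream X made by manufacturer Y contain salicylic acid?
hit = find_cached_records(database, product_name="Cream X", ingredient="salicylic acid")
# Second example: list all stored creams containing salicylic acid.
subset = find_cached_records(database, ingredient="salicylic acid")
```

In the first example a non-empty result answers the yes/no question; in the second, the returned subset of records itself satisfies the query.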

If the answer to 240 is in the affirmative, the process proceeds to 242, where a data frame schema for the returned data is retrieved from a database such as the memory 211 or 212 of Fig. 1. In the case of the first example, the data frame schema may be a product information sheet, having multiple data fields, one of which is populated by an ingredient. Other data fields may be populated by data items such as the product name, the manufacturer Y and the volume(s) in which the product is sold. In the case of the second example, the data frame schema would be a report populated with a list of creams, each identified by name or manufacturer etc.
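Purely as an illustration, a data frame schema of the kind retrieved at 242 could be represented as an ordered list of field names from which an empty data frame is created and then populated. The field names below are assumptions of this sketch; Fig. 5A shows the application's own exemplary schema.

```python
# Hypothetical product-information data frame schema; the field names are
# assumptions for illustration, not taken from Fig. 5A.

PRODUCT_SCHEMA = ["name", "manufacturer", "volume", "ingredients"]

def new_data_frame(schema):
    """Create an empty data frame from a schema: every field starts unpopulated."""
    return {field: None for field in schema}

frame = new_data_frame(PRODUCT_SCHEMA)
frame["name"] = "Cream X"
frame["manufacturer"] = "Y"
frame["ingredients"] = ["water", "salicylic acid"]
```

Fields for which no data has yet been acquired remain unpopulated, which is what later triggers the attempt to acquire data from an external source.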

If the answer to 240 is in the negative, the process proceeds to 244, where sourcing of data, preferably in real time, is implemented. An input to data sourcing 244 is the fetching or retrieval of data from external sources at 246. Such external sources may be implemented by external devices such as devices 203, 204 in Fig. 1. The fetched data can be in real time, meaning that it may be more up to date than similar data already stored in the server 200. Having fetched the data, at 248 it is determined whether the data is structured or not. If the data is structured, the process can proceed with data aggregation at 250. If the data is determined not to be structured, a data pattern recognition service is implemented at 252. This will be described in more detail below with reference to Fig. 3. Data sourced at 246 may be loaded and transiently stored in temporary storage at 254, which may be implemented by a temporary data store such as the store 212 of Fig. 1. The temporary storage 254 may also be loaded with data from the memory 211 and it may also receive data from and transmit data to each of the data pattern recognition service 252 and the data aggregation service 250. Thus the data aggregation service can, depending on the determined nature of the available data, receive input in the form of structured data (at 248) and unstructured data which has been processed by the data pattern recognition service 252. The data aggregation process will be explained in more detail with reference to Fig. 6 below. Following data aggregation, the results of the external data sourcing and aggregation are saved at 256. This can include saving in the memory 211. The results may be saved by populating appropriate data frame schemas. Such data frame schemas may be ones that were retrieved at 240 or may be newly-created ones, as will be explained below.
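The routing of fetched data between the aggregation service 250 and the pattern recognition service 252 might be sketched as follows. The `recognise` and `aggregate` callables stand in for services 252 and 250 respectively and, like the other names here, are assumptions of this sketch rather than part of the disclosure.

```python
# Sketch of the data-sourcing flow (steps 244-250): structured items go
# straight to aggregation; unstructured items pass through pattern
# recognition (service 252) first. All names are illustrative assumptions.

def source_and_aggregate(fetched_items, recognise, aggregate):
    processed = []
    for item in fetched_items:
        if item["structured"]:                     # step 248: already structured
            processed.append(item["data"])
        else:                                      # step 252: pattern recognition
            processed.append(recognise(item["data"]))
    return aggregate(processed)                    # step 250: aggregation

items = [
    {"structured": True, "data": {"name": "Cream X"}},
    {"structured": False, "data": "photo-of-label"},
]
result = source_and_aggregate(
    items,
    recognise=lambda raw: {"ingredients_from": raw},
    aggregate=lambda parts: {k: v for d in parts for k, v in d.items()},
)
```

The merged `result` corresponds to the aggregated data saved at 256 into a data frame schema.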

At 258, a response to the query is generated. This can be done using the saved data, together with data frames retrieved from cached data, if any were found. The response is prepared as a view of the data, for example in the form of a report of the data which satisfies the data query. At 260 the response is transmitted to the client device 201 that requested it, over the network 202. The process ends at 262. As and when a further data query is received, the process may be repeated.

Turning now to Fig. 3, this shows schematically a detail of a process of the data pattern recognition service 252 of Fig. 2. The process starts at 300, as a result of data being fetched from external data sources at 246. A determination as to whether the data is semi-structured is made at 302. It will be appreciated that data sourced from external sources may be in a variety of formats. One possibility is that the data is structured. For example, it may take the form of a manufacturer's product listing, which is formatted in accordance with a data frame schema that can be stored in the present system and returned to a user. This might be possible with a query such as example 1 discussed above. However, it may be that the product about which the query is made does not have a convenient manufacturer's product listing, perhaps because the product is manufactured outside the country in which a user is wanting to purchase it. In this case, the present system, unlike many prior art systems, attempts to retrieve the necessary data from a less formal source, such as a photograph of the product. Such a photograph may include a view of a label on the product packaging listing the ingredients that are required to satisfy the query. However, such data is not in an immediately useable format and hence is termed unstructured. The present applicants have realized the limitations of current systems which neither look for nor include such data and have recognized the need to provide a way of including such data in processing of client queries.

Thus if the answer at 302 is in the negative, at 304 the unstructured data is decomposed into individual data units or elements. For example, in the case of a photograph, each element may be a part or block of the image, such as a part of a picture or a letter of text, such as may be found on a label. At 306 a semantic analysis is carried out on each data element. This will be described in more detail below with respect to Fig. 4. If the answer at 302 is yes, and/or following the analysis at 306, candidate structural models are explored at 308. This part of the process may involve looking up data in the database 211, and at 310 it is determined whether a suitable data frame schema for the data exists in the database 211. The "data" at this point may be a mixture of structured data that was either already present in the database 211 or was retrieved from external sources 203, 204, as well as analysed unstructured data emanating from 306. Its composition depends on which of these types of data was found or retrieved.
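
The decomposition at 304 can be illustrated with a short sketch. This is not code from the specification: it assumes the unstructured data has already been reduced to raw label text (a real system would first apply optical character recognition to the image blocks), and simply splits that text into candidate data elements.

```python
# Hypothetical sketch of step 304: decomposing unstructured data into
# individual elements. The "unstructured data" is modelled here as raw
# text recovered from a product-label photograph.

def decompose(raw_text):
    """Split raw label text into candidate data elements."""
    elements = []
    for line in raw_text.splitlines():
        for token in line.replace(",", " ").split():
            token = token.strip()
            if token:  # discard empty fragments
                elements.append(token)
    return elements

label = "Ingredients: Aqua, Sodium Laureth Sulfate,\nSalicylic Acid"
print(decompose(label))
```

Each returned token would then be passed individually to the semantic analysis at 306.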

If the answer at 310 is positive, at 312 an existing data frame schema is populated with the data. If the answer at 310 is negative, a new data frame schema to hold the data is created at 314. This is a feature of implementations of the current subject matter that is not present in many prior art systems, i.e. the ability to format data that is not in a previously-used form. It may not be necessary to create a new data frame schema from scratch; rather, if some of the data can be fitted into an existing schema, that existing schema can be expanded to accommodate the data which does not fit.

Thus at 316 a data frame schema is populated with the data, each field of the data frame schema being populated with an element of the data. As indicated above, the data frame schema may be an existing data frame schema, an expansion of an existing data frame schema or a new data frame schema. The process ends at 318.
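
The populate-or-expand behaviour of steps 312-316 might be sketched as follows. The dictionary-based schema representation is an assumption made for illustration, not the specification's own data format.

```python
# Illustrative sketch of steps 312-316: populating an existing data
# frame schema with data, and expanding the schema (step 314) when an
# element does not fit any existing field.

def populate(schema_fields, data):
    """Fill a frame from schema_fields; expand the schema for leftovers."""
    frame = {field: data.get(field) for field in schema_fields}
    extra = {k: v for k, v in data.items() if k not in schema_fields}
    if extra:  # expand the existing schema rather than discard the data
        schema_fields = list(schema_fields) + sorted(extra)
        frame.update(extra)
    return schema_fields, frame

fields = ["product_name", "ingredients"]
data = {"product_name": "Shampoo", "ingredients": ["aqua"], "ph": 5.5}
fields, frame = populate(fields, data)
```

Here the unforeseen `ph` element causes the schema to be expanded rather than the data being dropped, mirroring the behaviour described above.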

Figure 4 shows in more detail a process for the semantic data analysis 306 of Fig. 3, to determine a relation between a retrieved data object and a stored data object. The process starts at 400. Such analysis may include some form of data recognition, for example optical character recognition, possibly by an image capture device within the server 200. This may be useful if the captured data is a photograph, for example. An image can be broken down into individual data units or elements, which at 402 are each checked. Some parts of the image may be merely pictures that do not provide useful data, but other parts may be text, and these parts of the image can be converted to a text value. At 404, it is determined whether each data element has a similarity with existing data stored in a database such as the memory 212. Determination of similarity will be discussed below. If the answer is yes, at 406 temporal data for that data unit is assessed. For example, a timestamp of the data could be retrieved. Thus at 408, it is determined whether the retrieved data is consistent or inconsistent with stored data, which can be retrieved from storage such as the store 211 of Fig. 1. For example, if it is determined that the retrieved data is the same age as or older than the stored data (identical or earlier timestamp), the process proceeds to 410, not using the retrieved data. If the retrieved data is older, it may relate to a discontinued product or a product whose best-before date has passed, and thus this data is not returned in response to the query. If it is determined that the retrieved data is newer than the stored data, the process proceeds to 412. Likewise, if it is determined at 404 that there is no similarity with existing data, the process proceeds to 412.
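
The temporal check at 406-408 reduces to a simple timestamp comparison, which can be sketched as below. This is a minimal illustration, assuming timestamps are available for both copies of the data.

```python
# Minimal sketch of the temporal check at 406-408: retrieved data is
# used only when its timestamp is strictly newer than the stored copy.
from datetime import datetime

def use_retrieved(retrieved_ts, stored_ts):
    """True only if the retrieved data is newer than the stored data."""
    return retrieved_ts > stored_ts

stored = datetime(2017, 1, 1)
retrieved = datetime(2017, 6, 30)
assert use_retrieved(retrieved, stored)   # newer: proceed to 412
assert not use_retrieved(stored, stored)  # identical age: proceed to 410
```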

At 412 a similarity level is generated for each data element. Details of this process are implemented by stages 414 through 424 of Fig. 4. At 414, a base data unit of a data element is determined. For example, it may be a text character, a number or a Boolean operator. At 416, a referral unit for the query is retrieved from the database 211. At 418, it is determined whether the referral unit and the base data unit match. If the answer is in the affirmative, at 420 it is determined whether a similarity of the base unit meets a set threshold.

The similarity can be determined using a variety of analyses. Semantic recognition may be used to determine the meaning of a symbol, possibly in the context of surrounding symbols. Thus a meaning can be attached to every symbol, and subsets of multiple symbols can be recognized as forming words. Having identified symbols and, in some cases, concatenated them into words, the semantic analysis may determine that a particular query is satisfied. With respect to example query 2, it may determine that multiple data elements together form the words "salicylic acid" and that this is placed in the context of an ingredient list, and thus that the product whose image is being analysed fulfils the criteria of the query. However, the semantic analysis 306 may also need to refer to a database or query an external data source to determine whether an identified word which appears not to be identical in fact satisfies the query. For example, it may be determined that the words "2-Hydroxybenzoic acid" are present in the image. The semantic analysis would in this case need to determine whether this is equivalent to "salicylic acid", which in this instance would be the case, since it is the IUPAC name for salicylic acid. Thus the product whose image is being analysed would satisfy the query. Furthermore, the semantic analysis can take account of the context of a term, for example by determining whether the identified word is in an ingredient list as opposed to in a statement such as "Does not contain any salicylic acid". In other words, the analysis takes account of surrounding symbols. An exact match with the base unit may be found, or it may be decided that a threshold level of similarity is met. This latter case may be, for example, the equivalent-name situation discussed above.
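
The equivalent-name and negation reasoning just described might look like the following sketch. The synonym table and the negation check are invented stand-ins for the database or external-source lookups the specification refers to.

```python
# Hedged illustration of the equivalence and context checks. SYNONYMS
# is a placeholder for a database/external lookup of equivalent names.
SYNONYMS = {"2-hydroxybenzoic acid": "salicylic acid"}  # IUPAC mapping

def satisfies_query(text, ingredient):
    """Does the text positively list the (possibly renamed) ingredient?"""
    t = text.lower()
    # normalise any known equivalent name to the queried term
    for alt, canonical in SYNONYMS.items():
        t = t.replace(alt, canonical)
    if ingredient not in t:
        return False
    # crude context check: reject negated mentions of the term
    return "does not contain" not in t

assert satisfies_query("Ingredients: 2-Hydroxybenzoic Acid",
                       "salicylic acid")
assert not satisfies_query("Does not contain any salicylic acid",
                           "salicylic acid")
```

A production analysis would of course use richer context handling than a single negation phrase; the sketch only shows the shape of the decision.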

If it is determined at 420 that similarity is sufficient, the data element is stored in the database 211. As previously discussed, this may be by populating a data frame schema. If it is determined at 420 that the similarity level is insufficient, and thus that there is probably not a relation between the data element and the referral unit, the data element is discarded and the process returns to 416 to analyse the next data element. Thus data elements which relate to, for example, a portion of an image which does not provide any useful information in respect of the data query are eliminated from the process.

If it is determined at 418 that the referral unit and the base data unit do not match, the data is also discarded and the process proceeds to 424. Equally, following a save at 420, the process proceeds to 424. At 424, it is determined whether the data element just analysed is the last one and if not, the process returns to 416. Thus the process loops round all data elements until they have all been analysed.
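
The loop over data elements at stages 414-424 can be sketched as below. The similarity scorer and the threshold value are placeholders; the specification does not prescribe a particular similarity measure.

```python
# Sketch of stages 414-424: each element is compared with the referral
# unit and kept only when a similarity threshold is met. SequenceMatcher
# is an illustrative scorer, not the specification's own measure.
from difflib import SequenceMatcher

THRESHOLD = 0.8  # assumed value for illustration

def filter_elements(elements, referral_unit):
    """Keep elements sufficiently similar to the referral unit."""
    kept = []
    for element in elements:  # loop until the last element (424)
        score = SequenceMatcher(None, element.lower(),
                                referral_unit.lower()).ratio()
        if score >= THRESHOLD:     # 420: similarity meets the threshold
            kept.append(element)   # save the element
        # otherwise the element is discarded and the loop continues
    return kept

print(filter_elements(["Salicylic", "Aqua", "salicylic"], "salicylic"))
```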

If it is determined at 424 that the data element just analysed is the last one and thus that all the data elements have been analysed, the process proceeds to 410, where a similarity level for each data unit is returned to 308 of Fig. 3 (exploration of candidate structural models). Thus the process ends at 426.

The concepts of base data unit and matching may be further understood with reference to Fig. 5. Fig. 5A shows a data frame schema 500 for a product and Fig. 5B shows the data frame schema populated with the data for a particular product (denoted by 500'). It can be understood from Fig. 5A that the data frame schema 500 contains a number of data fields, for example product name, volume as sold, ingredients, smell, etc. Each of these fields is defined by a data type, such as a character string, a floating point number, an integer or a Boolean operator. Thus it can be understood that a base data unit may refer to a particular one of these types of data, and determining the data type of a data element of unstructured data may assist in the semantic analysis described above.
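
A typed schema of this kind might be rendered as a dataclass, as in the sketch below. The field names and units are assumptions for illustration; the drawing itself defines the actual fields.

```python
# Illustrative rendering of a Fig. 5A-style schema: each field carries a
# declared data type against which a base data unit can be checked.
from dataclasses import dataclass
from typing import List

@dataclass
class ProductSchema:
    product_name: str          # character string field
    volume_as_sold_ml: float   # floating point field (ml assumed)
    ingredients: List[str]     # list of character strings
    vegan: bool                # Boolean field

# Populating the schema, in the manner of Fig. 5B (values illustrative):
p = ProductSchema("Head & Shoulders Shampoo for Men", 250.0,
                  ["aqua", "salicylic acid"], False)
```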

In Fig. 5B, the data frame schema is shown as a data frame 500' populated with data for a particular product, in this example Head & Shoulders ® Shampoo for Men. A populated schema 500' such as this may be usable as a response to a data query. For example, if the query is of the type of example 1 above, where the ingredients of a product are requested, this schema could be used to satisfy that request. For a query of the type of example 2, a different populated schema listing a number of products could be used. Thus, if a query asked to return all shampoos costing less than £2, for example, a subset of all the sets of data pertaining to shampoos would be returned.

Figure 6 shows schematically in more detail a process for the data aggregation service 250 of Fig. 2. The process starts at 600, where data from the database 211 may be input. At 602, it is determined whether any existing aggregation rules are applicable to aggregating the data that has been provided to the data aggregation service 250. If the answer is affirmative, at 604 aggregating logical rules are applied to aggregate the data. If the answer is in the negative, at 606 one or more aggregation rules are constructed. This process may use data that is temporarily stored in temporary storage 254 (which may be implemented by temporary storage 212 of Fig. 1). The process can then proceed to 604, where the newly-created aggregation rules are applied to aggregate the data. It should be noted that for some sets of data provided to the data aggregation service 250, it may be possible to apply some existing rules whilst some new rules are created. Both types of rule can be used to aggregate the data. The process then ends.
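
The apply-or-construct behaviour of steps 602-606 can be sketched as follows. The rule representation and the fallback rule are invented for illustration; the specification does not fix a rule format.

```python
# Hypothetical sketch of the aggregation service (steps 602-606):
# existing rules are applied when they fit the data; when none applies,
# a fallback rule is constructed over a field common to all records.

def applicable(rule, records):
    return all(rule["field"] in r for r in records)

def aggregate(records, rules):
    results = {}
    usable = [r for r in rules if applicable(r, records)]
    if not usable:  # step 606: construct a new rule for a common field
        common = set.intersection(*(set(r) for r in records))
        usable = [{"name": "auto_count", "field": sorted(common)[0],
                   "op": len}]
    for rule in usable:  # step 604: apply the rules
        values = [r[rule["field"]] for r in records]
        results[rule["name"]] = rule["op"](values)
    return results

records = [{"price": 1.5}, {"price": 1.8}]
rules = [{"name": "total_price", "field": "price", "op": sum}]
print(aggregate(records, rules))
```

As the passage notes, a real data set may be aggregated by a mixture of pre-existing and newly constructed rules; the sketch picks one or the other for simplicity.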

Thus it will be appreciated that the present systems and methods provide an improvement over the prior art in that unstructured data can be used to improve the response returned to the user, rather than such data being ignored. Moreover, in the event that no schema exists for the data, implementations of the current subject-matter can create a schema or add to an existing one, such that a comprehensive report can be returned to a user in a digestible format. Thus the current system is not bound by any need for fixed relationship tables or data dimension catalogues, since appropriate data structures can be created in dependence on the data fetched. Additionally, the system and method have the flexibility to create new rules for aggregating the data.

Implementations of the present system and method may satisfy data queries in real time (taking account of data age) or in an on-demand mode, from a client's or other external system's request, to match the requested need for a report based on a given subset of hierarchical data. A server or servers may analyse the request and match it with available data frame schemas in a database; if the requested data is not completely available in a database or does not match a data frame in a database, it can be determined that the data is semi-structured or unstructured. Data pattern recognition techniques can be applied to such data to design a new data frame schema or expand an existing one, and then a semantic analysis of every primary data unit can be performed such that a data schema can be applied to the data. New or existing aggregation logic can then be applied and the result can be saved to a database, such as by populating fields in a schema. An output, as the result of such processing, is a view of the data which is generated and transmitted to a client. Such output may be in the form of a report, and may include raw data output with or without applied sorting techniques. It can be delivered through a service network.

The system may be capable of incorporating a variety of different types of unstructured data, including but not limited to photographs, video clips, a frame of a video, icons, logos, images captured from e.g. a website and sound files such as wave files.

User inputs at the user interface 222 may be received in any form, including, but not limited to, acoustic, speech, or tactile input. Possible input devices include, but are not limited to, touch screens or other touch-sensitive devices such as single or multipoint resistive or capacitive trackpads, voice recognition hardware and software, optical scanners, optical pointers, digital image capture devices and associated interpretation software, and the like.

The functions of the processors 210, 220 described herein can be realized in digital electronic circuitry, integrated circuitry, specially designed application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), computer hardware, firmware, software, and/or combinations thereof. These various aspects or features can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor such as a microprocessor, which can be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, storage devices such as the storage devices discussed herein, which may include clients and servers. Such computer programs, which can be software, software applications, applications, components, or code, include machine instructions for a programmable processor such as a microprocessor, and can be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the term "machine-readable medium" refers to any computer program product, apparatus and/or device, such as for example magnetic discs, optical disks, memory, and Programmable Logic Devices (PLDs), used to provide machine instructions and/or data to a programmable processor and which can receive instructions as a machine-readable signal. Such a machine-readable medium can store instructions transitorily or non-transitorily, such as for example as would a non-transient solid-state memory or a magnetic hard drive or any equivalent storage medium. The machine-readable medium can alternatively or additionally store such machine instructions in a transient manner, such as for example as would a processor cache or other random access memory associated with one or more physical processor cores.

The subject matter described herein can be embodied in systems, apparatus, methods, and/or articles depending on the desired configuration. The implementations set forth in the foregoing description do not represent all implementations consistent with the subject matter described herein. Instead, they are merely some examples consistent with aspects related to the described subject matter. Although a few variations have been described in detail above, other modifications or additions are possible. In particular, further features and/or variations can be provided in addition to those set forth herein. For example, the implementations described above can be directed to various combinations and subcombinations of the disclosed features and/or combinations and subcombinations of several further features disclosed above. In addition, the logic flows depicted in the accompanying figures and/or described herein do not necessarily require the particular order shown, or sequential order, to achieve desirable results.

The applicant hereby discloses in isolation each individual feature described herein and any combination of two or more such features, to the extent that such features or combinations are capable of being carried out based on the present specification as a whole in the light of the common general knowledge of a person skilled in the art, irrespective of whether such features or combinations of features solve any problems disclosed herein, and without limitation to the scope of the claims. The applicant indicates that aspects of the present invention may consist of any such individual feature or combination of features. In view of the foregoing description it will be evident to a person skilled in the art that various modifications may be made within the scope of the invention.

PERFORMANCE ASSESSMENT

Technical field

This invention relates to computer software. Embodiments of the invention may relate to apparatus and methods for performing performance assessment of applications, optimising application performance and/or reporting on application performance.

The present invention is applicable to analytics systems which optimise the structure of an application and/or the data within an application with respect to defined success criteria. The application may be a web application, a mobile application or another form of application.

Background

Computing devices include, without limitation, mobile PCs, desktop PCs, server computers, smartphones and tablets. When a user interacts with a computing device to perform a function the user may do so by employing an application. The application that provides the essence of the function may be installed locally on the computing device. Alternatively, an interface application such as a web browser or a thin client may be installed locally on the computing device and that application may communicate over a network with another computing device on which an application is installed that provides the essence of the function. In the case of a smartphone, for example, in order to display a map the user may either install a mapping application on the phone or use a web browser on the phone to communicate with a server which implements a mapping application at its web server. In the latter case the computing load of providing the mapping function is divided between the web server, which may determine which maps to display and provide the appropriate image data to the web browser, and the web browser, which may provide the user interface, for example by executing instructions in a scripting language.

Users increasingly require the web and/or mobile applications that they use to be more personalised and relevant to their individual needs. In order to achieve this, the data within these applications needs to be as dynamic as possible. Accordingly, the structures of applications may be very complex. One notable class of applications that can be very complex is e-commerce applications, which present to a user a range of products, provide an interface whereby the products can be purchased, and may automatically implement promotions based on behavioural information gathered about the user.

One way to analyse the structure of applications is to perform a manual structural analysis. This may be done by an information architect. This structural analysis is time consuming.

One way to improve the structure of an application is to implement human expert advice based on experience or knowledge. An alternative is to perform automatic application optimisation methods. Application optimisation methods can be used to improve the functioning of data within an application. The objective of this may be to improve performance or to provide users with a more personalised, accurate and/or relevant user experience. If the owner or operator of an application can better tailor a user experience to a particular user then it becomes more likely that the application may provide a competitive advantage to its operator.

Typical current application optimisation methods involve defining one or more success criteria for an application. The part of the application optimisation system in which this is done may be referred to as the structure optimisation module. A success criterion may include factors such as revenue generation, reduction in operating costs, increase in application user time, increase in conversions, increase in clicks and/or increase in audience share. Once success criteria are defined the application optimisation method may then analyse aspects of the application against those criteria so as to provide recommendations for alternative application structures that can achieve the success criteria. Typically the aspects of the application that can be analysed are:

1. Web application traffic and usage data. These may include each user's behaviour on the web application as measured by metrics such as hits, product views by a user, page views, session times, conversions, landing pages associated with user purchases and repeat visits.

2. Third party analytics data generated automatically by systems external to the application. Those systems may for example include Google Analytics, KISSmetrics and Mint.

3. Manually generated data from external data sources. This data may be compiled, aggregated and then interpreted through human intervention. Examples of data of this type include user product reviews, trends (e.g. from Google Trends Platform) and reports (such as Mintel Consumer Trends).

One problem with such application optimisation methods is that they do not have the ability to automatically interpret or take into account real-time external data or variables derived from third-party data sources such as weather, news and financial data. This is a problem because these external factors can heavily affect the performance of the application against the defined success criteria.

Another problem with such application optimisation methods is that they only provide a limited range of advice on optimisation. They might be restricted to recommending optimising the structure of the application using combinations of the same components (e.g. nodes and edges) with the same data, or changing the arrangements of a user interface (e.g. through the order in which items are presented, the highlighting or sizing of items, or introducing shortcuts to a frequently used path).

Another problem with such application optimisation methods is that they must be fed with the success criteria. They cannot automatically suggest criteria that might result in advantageous changes to an application and/or data, or estimate what the impact of changed success criteria might be.

Research into improved application optimisation methods has focused on improving the use of graph modelling to represent application structures.

Summary

According to one aspect there is provided a method for analysing the performance of an application running on a data processing system, the method comprising, in a data processing system, implementing the steps of: repeatedly interacting with the application to generate a first map of the user interface structure of the application; receiving performance data indicative of the performance of the application over a period of time; reducing the complexity of the first map in dependence on the performance data to form a second map; receiving one or more success criteria indicative of successful performance of the application; and comparing the performance of the application as represented by the second map with the success criteria.

Nodes of the first map may represent user interface elements of the application.

The application may be a web application. Nodes of the first map may represent web pages capable of being served by the application.

The application may be a mobile application.

The step of reducing the complexity of the first map may comprise reducing the first map so as to form the second map.

The step of reducing the first map may comprise forming the second map such that it excludes nodes of the first map whose usage during the said period of time is below a predetermined threshold.
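
The reduction step just described can be sketched as follows. The graph representation (node ids, edge pairs, a usage count per node) is an assumption made for illustration; the specification does not prescribe a map data structure.

```python
# Minimal sketch, under assumed data structures, of reducing the first
# map to the second map by dropping nodes whose usage over the
# measurement period falls below a threshold (edges touching a dropped
# node are dropped with it).

def reduce_map(nodes, edges, usage, threshold):
    """nodes: set of ids; edges: set of (a, b); usage: id -> hit count."""
    kept = {n for n in nodes if usage.get(n, 0) >= threshold}
    kept_edges = {(a, b) for a, b in edges if a in kept and b in kept}
    return kept, kept_edges

nodes = {"home", "catalog", "legal"}
edges = {("home", "catalog"), ("home", "legal")}
usage = {"home": 900, "catalog": 450, "legal": 3}
second_map = reduce_map(nodes, edges, usage, threshold=10)
```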

The step of reducing the complexity of the first map may comprise hierarchically arranging content of the first map to form the second map.

The step of reducing the complexity of the first map may comprise forming the second map such that it incorporates for nodes of the second map a representation of a subset of the performance data relating to those nodes.

The method may further comprise the steps of: generating a modified map of the interface structure of the application; comparing the performance of the application as represented by the modified map with the success criteria; and if the performance of the application as represented by the modified map is greater than the performance of the application as represented by the second map storing a representation of the modified map as a candidate alternative structure for the application.
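
The compare-and-store step above might be sketched as below. The weighted-sum scoring of maps against success criteria is an invented placeholder; the specification leaves the comparison method open.

```python
# Hedged sketch of evaluating a modified map: it is scored against the
# success criteria and stored as a candidate alternative structure only
# when it outperforms the current second map. Scoring is illustrative.

def score(map_metrics, criteria):
    """Weighted sum of map metrics against success-criterion weights."""
    return sum(criteria.get(k, 0) * v for k, v in map_metrics.items())

def consider(candidates, modified, second, criteria):
    if score(modified, criteria) > score(second, criteria):
        candidates.append(modified)  # store as a candidate structure
    return candidates

criteria = {"conversions": 2.0, "session_time": 0.5}
second = {"conversions": 100, "session_time": 30}
modified = {"conversions": 120, "session_time": 25}
candidates = consider([], modified, second, criteria)
```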

The modified map may incorporate for nodes of the modified map a representation of a subset of the performance data relating to those nodes.

The step of comparing the performance of the application as represented by the modified map with the success criteria may comprise performing that comparison in dependence on environmental information for the said period of time.

The environmental information may represent one or more of: the state of one or more natural forces (such as weather), legal and/or political factors, current affairs, geographic factors, health information, demographic factors, social media data, statistical data, financial and economic data (global, segmented, individual and/or corporate), cultural factors, and market and/or industry competition.

The method may comprise receiving the environmental information by downloading it from an external source.

The method may comprise receiving the environmental information by downloading it from a publicly available external source.

The method may comprise automatically modifying the application in accordance with the modified map.

The application may be an e-commerce application.

According to a second aspect of the invention there is provided a computer readable data carrier storing non-transient instructions for causing a data processing system to perform a method as set out above.

According to a third aspect of the present invention there is provided a computer program product comprising a computer-readable medium storing instructions which, when executed by at least one programmable processor, cause the at least one programmable processor to perform operations comprising a method as set out above.

According to a fourth aspect of the present invention there is provided a data processing system configured to perform a method as set out above.

Brief description of the drawings

The present invention will be described below by way of example with reference to the accompanying drawings.

In the drawings:

Figure 7 shows the physical architecture of a system for performing application optimisation.

Figure 8 shows the logical architecture of a system for performing application optimisation.

Figure 9 is a flowchart illustrating a method for performing application optimisation.

Figure 10 is a diagram illustrating functional components of the system of figure 9.

Detailed description

The system to be described below is arranged to perform automatic application optimisation. By "optimisation" is meant the identification of modifications to an application that may lead, or alternatively have been identified as being likely to lead, to an enhancement in the operation of the application. The enhancement may be improved performance as measured against one or more success criteria. "Optimised", "optimise" and related words should be construed accordingly. The system described below is arranged to automatically gather environmental variables and to take account of those variables in determining an optimised alternative structure for an application.

Figure 7 shows an environment in which an application server 200 provides functions to a client device 201 over a network 202. The application server may, for example, be a web server or a server device for another protocol. The client device 201 could, for example, be a personal computer or a mobile device such as a smartphone or tablet. The network 202 could be a publicly accessible network, such as the internet, or a private network.

The application server 200 comprises a processor 210, a program and data store 211, a working memory 212 and a communication interface 213. The data store 211 stores in a non-transient way computer program instructions that are executable by the processor 210 to perform the functions described herein: for example to provide the functionality of an application 250 and an optimisation engine 251, as will be described in more detail with reference to figure 8. The data store also stores working data for use by the processor in executing the instructions. The memory 212 is usable by the processor as a temporary store for data. The communication interface 213 is communicatively coupled to the processor 210 to permit it to communicate with other devices over the network 202.

The client device comprises a processor 220, a program and data store 221, a user interface 222 and a communication interface 223. The program and data store 221 stores in a non-transient way computer program instructions that are executable by the processor 220 to allow it to request and interpret data from the application server 200. The user interface 222 can be controlled by the processor 220 to present to a user data received from the application server 200 and to gather input from the user which can be transmitted back to the application server. The user interface could, for example, be a touch screen. The communication interface 223 is communicatively coupled to the processor 220 to permit it to communicate with other devices over the network 202.

The system also includes other entities 203, 204 which are accessible over the network 202. Those entities could be web servers or devices using other protocols to provide external and/or third-party and/or environmental data. For example, those entities could store and be responsive to queries over the network 202 to provide information regarding current or historic data on any one or more of: weather, financial data, current affairs, travel events, user locations, service status, internet searches and the like.

Figure 8 shows the logical structure of the application server 200. The server has an operating system 252 which supervises the operation of the server. Running on the operating system are the application 250 and the optimisation engine 251. The application 250 can communicate over the network 202 to send and receive data from clients. The optimisation engine can receive data from the application regarding the operation of the application. That data may be received directly from the application, it may be derived from inspection by the optimisation engine of the application's data as stored in data store 211, or it may be derived from monitoring the operation of the application on the server (e.g. the amount of processor load invoked by the application, the amount of memory used by the application or the amount of data traffic or communication requests processed by the application). That data regarding the operation of the application may correspond to the performance data 103 of figure 9. The optimisation engine can transmit and receive data over the network 202 so as to gather external data from the entities 203, 204. That external data may correspond to the external environmental variables 106 of figure 9.

The optimisation engine has an interface 253 to a command terminal 254. The command terminal 254 could be the computer of a user who is controlling the optimisation of the application 250. That user can transmit commands over interface 253 to initialise the engine 251 to perform optimisation of the application 250. For example, the user may indicate to the engine 251 how to gather data from the application 250, and the user may define to the engine 251 one or more criteria against which the operation of the application is to be optimised. Then the user can start the engine performing its optimisation. The results of that optimisation operation can be provided to the user over link 253 as a report, for example a list of one or more recommended changes to the application 250. Alternatively, or in addition, the engine 251 can communicate directly with the application 250 to make one or more of those changes. That may be done by the engine 251 changing the program code and/or the working data of the application 250 as stored in data store 211.

In one example, the application 250 could be an e-commerce application. The working data of the application could define a list of products that are to be offered for sale by the application, associated prices and product details, and presentational and hyperlink information defining the presentation of the e-commerce application to clients. In another example, the application 250 could be a mapping application. The working data of the application could define a set of map tiles together with a list of objects and their locations. In another example, the application 250 could control the operation of a transport network. The working data of the application could define transport routes and timetables, and the location of vehicles in the network. Where the application is a web application it could be supported by a web server service running on the server 200. The web server service could be part of the operating system 252 or separate from it.

In figures 7 and 8 the application 250 and the optimisation engine 251 are shown as running on the same computer. They could run on different computers. The optimisation engine could communicate with the application over the network 202 or another network. In figures 7 and 8 the application 250 is shown as providing functions by way of the user interface 222. The application could provide functions in other ways, for example by operating machinery or analysing data. In figures 7 and 8 the application is shown as being remote from the end-user device 201 on which the results of the application are consumed. The application could be running on an end-user device, in which case the optimisation engine could run on the same end-user device or on another device.

The operation of the optimisation engine will now be described in more detail with reference to figures 9 and 10. In figures 9 and 10 the following numbers denote the following elements:

100: Optimisation engine

101: Application

102: Application structure

103: Performance data

104: Success criteria

106: External environmental variables

107: Advised application structure

111: Server infrastructure

112: Structure composer

113: Storage

114: Performance assessment module

115: Service bus

116: Monitoring agent

118: Advice module

121: User device

122: Input device

123: Processor

124: Storage

125: Display

131: Connector

199: User

The optimisation engine 100 of figures 9 and 10 is configured to apply one or more algorithms to optimise an application 101 in accordance with one or more success criteria 104. In this example the application 101 is an application for serving web pages to remote users. In a first step performed by module 112 the optimisation engine analyses the application 101 and automatically composes an application structure 102 in dependence on that analysis. The application structure 102 represents the structure of the application 101 in relevant part, simplified to facilitate the subsequent steps. Step 112 may employ any one or more of fuzzy logic, graph or network optimisation algorithms, clustering algorithms, probabilistic algorithms and machine learning techniques to derive the application structure 102. The application structure can then be assessed against the success or performance criteria 104 by performance assessment module 114. The monitoring agent 116 monitors the application structure. Then the advice module can automatically select and apply one or more algorithms to achieve an advised application structure 107 and therefore optimise application 101 in accordance with the success criteria 104.

The system of figures 9 and 10 provides a performance assessment of a current application structure using a machine-learning system incorporating reasoning services. The output of the system may include optimisation advice and/or information defining steps to implement that advice. Following that implementation, further analysis data can be provided to the server, and the server may perform a further pass of structure recognition in dependence on the modified application structure to identify the new static structure.

The system includes:

A. A structure composing method which takes as input information concerning the application and performance data from analytics and provides an up-to-date application structure for further assessment.

B. A performance assessment method based upon inputs of the determined application structure and success criteria to provide an objective assessment of the current application structure with respect to performance data and success criteria defined by (e.g.) the application owner. Success criteria are understood as goals and may include objective parameters to be improved upon in performance of the application (e.g. page visits, revenue increase, volume of sales increase of selected product).

C. An optimisation method of the application structure based on known algorithms for application structure optimization.

D. An external-factors based machine-learning recommending system to provide an application operator with advice on the improvement of an application structure based on set success criteria.

E. An external-factors based monitoring agent module with applied knowledge reasoning algorithms to provide advice on improvement of the application structure and/or data: e.g. service offerings or characteristics of offered services through the application.

The method may take into account external variables. Those variables may be processed by a suitable machine learning technique or knowledge-based reasoning. In this way the system can result in better application performance.

Structure composer module

In order to optimise the application 101, first the application structure 102 is generated by the structure composer module 112. The application structure 102 may be a data structure of any suitable format. Examples of possible formats include but are not limited to networks, graphs, trees, maps, ontological paradigms and Markov chains. Performance data 103, or data derived or extracted from the performance data, may be attached to or otherwise incorporated in the application structure 102. The performance data may be incorporated by being included as a list, labels or values, or by means of tags or other indicative descriptors assigned to the selected data structure.

The structure composer module operates according to a series of sub-steps.

In the first sub-step the structure composer 112 detects the type of the application 101. In dependence on the detected type the structure composer 112 selects one of a set of predetermined navigation structure scanning algorithms. Examples of the structure scanning algorithms include one for web applications whose output is provided via one or more uniform resource locators (URLs) and one for locally running applications such as native mobile applications or desktop applications. In the second sub-step the application 101 is scanned by the selected algorithm. For web-based applications, including extranet and intranet-based applications, the selected algorithm may implement a traversing crawling methodology. For example, this may be done using BFS or DFS-based graph algorithms, a web crawler or a search engine such as DataparkSearch. For locally running applications such as mobile or desktop applications, the selected algorithm may run as a service on the device running the application. It may monitor data sent to the user interface of the device. It may provide inputs via, or in substitution for, the user interface. In that way it can interact with the application as if through the user interface. This technique is known from the Monkey-Runner tool which is currently used in test automation. Using this technique the system can infer the navigation structure of the application. The scanning algorithm results in a static data structure representing the navigation structure of the application. The data structure may be represented in a directed graph-based structure, with nodes and edges connected and with labels and values assigned to the nodes and edges to represent the scanned application.
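The crawling traversal described above can be sketched as a breadth-first scan that yields the nodes and directed edges of the static navigation structure. This is a minimal illustration only: `get_links` stands in for whatever link-extraction mechanism (crawler, search engine or UI monitor) is actually used, and the toy site map is hypothetical.

```python
from collections import deque

def scan_structure(root, get_links):
    """Breadth-first scan of an application's navigation structure.

    root      -- identifier of the entry node (e.g. a start URL)
    get_links -- callable returning the outgoing links of a node
    Returns (nodes, edges): the discovered nodes and the directed
    edges (source, target) forming the static structure.
    """
    nodes, edges = {root}, set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for target in get_links(node):
            edges.add((node, target))
            if target not in nodes:
                nodes.add(target)
                queue.append(target)
    return nodes, edges

# Toy navigation map standing in for a crawled web application
site = {"/": ["/shop", "/about"], "/shop": ["/cart"], "/about": [], "/cart": ["/"]}
nodes, edges = scan_structure("/", lambda n: site.get(n, []))
```

A depth-first variant would simply pop from the right of the queue; either traversal discovers the same node set for a connected structure.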

In a third sub-step the data structure may be reduced in complexity. By reducing the complexity of the data structure the subsequent analysis steps can be simplified. The reduction in complexity may be done by reducing the data structure (reducing the number of nodes and/or edges) or by simplifying the data structure, for example by arranging it hierarchically. In the case of a graph-based data structure denoted GS, reduction may be done through the deletion of selected nodes NS of the graph structure from the set of all nodes and the deletion of selected edges E of the graph structure from the set of all edges. The deleted nodes NS may be those that meet some defined reduction condition, for instance that the number of times the node is used in the time range t-τ to t, where t is the current time and τ is a window length, is less than a defined value b. τ and b may be manually defined or automatically calculated. The changed set NU without the deleted nodes NS may be defined by:

NU = NU \ {nu_i : NOcc(nu_i) < b}, i ∈ {1, 2, ..., N}

where NOcc(nu_i) is the number of occurrences of the i-th node nu_i and N is the number of nodes in the graph structure. This deletion of nodes may result in some directed edges E of the graph structure being orphaned. Those edges can be deleted from the graph structure. The deleted edges E may be defined by:

E = E \ {e_ij, e_ji : nu_i ∉ NU}

where e_ij and e_ji are the edges which connect a deleted node nu_i with the other j-th nodes in NU.
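The two deletion rules above can be sketched as follows, with `occurrences` playing the role of NOcc over the window t-τ to t and `b` the reduction threshold; the node names and counts are illustrative.

```python
def reduce_structure(nodes, edges, occurrences, b):
    """Delete rarely used nodes (NOcc < b) and the edges they orphan,
    mirroring the NU and E reduction rules."""
    kept = {n for n in nodes if occurrences.get(n, 0) >= b}
    kept_edges = {(i, j) for (i, j) in edges if i in kept and j in kept}
    return kept, kept_edges

nodes = {"home", "shop", "legal"}
edges = {("home", "shop"), ("home", "legal"), ("legal", "home")}
occurrences = {"home": 120, "shop": 80, "legal": 2}  # usage counts in the window
nu, e = reduce_structure(nodes, edges, occurrences, b=10)
```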

In a fourth sub-step the data structure (e.g. GS) may be enhanced by the incorporation of performance data 103. The performance data may be collected for the predetermined timeframe t-τ to t. The performance data 103 may be any data relating to the usage of application 101. It may be generated by users of the application, owners of the application, logs of a server on which the application runs or other applications that interact with application 101, which may be third-party and/or external applications. This performance data may include without limitation user-specific data (e.g. as to demographic, gender, age or geo-location), historic data (e.g. order history, written content or interaction with other users), social media data (e.g. posts, reactions, shared content or profile data), links, banking or financial reports and any other data which may be related or deemed to be related to the operation or usage of application 101.

In a fifth sub-step a representation of the navigation structure of the application is generated. The navigation structure is denoted NNS(t, τ)(GS). NNS may be defined as follows:

NNS(t, τ) = (GS, {NAcc, NPop, NImp, AT, Nζ}, {ECon, EPop, EImp, Eζ}, δ, ω)

in which case in order to determine the NNS the following are calculated:

i. a parameter of bounce rate δ(t, τ) for every node in the application structure (the parameter δ(t, τ) may be defined arbitrarily by the user or calculated dynamically based on a statistics or probability approach);

ii. a minimum acceptable time ω of usage of every remaining node in GS (the parameter ω may be defined arbitrarily by the user or calculated dynamically based on a statistics or probability approach employing a highest success rate for nodes with acceptable time higher than the value ω);

iii. data Νζ related to every node or Εζ related to every edge, representing any kind of information attached to the node or presented on the node: this may be for instance articles on a website implemented by the application, any images in the application, any products, or prices; the Εζ may further represent any data related to connections between nodes: for example links between nodes at a website implemented by the application, or NUI (natural-user-interface) metaphors;

iv. for every node of the graph structure nu_u ∈ NU, u ∈ {1, 2, ..., N}:

a. a node accessibility NAcc(nu_u) representing how easily the node in the structure is accessible. This may be defined by the length from the root element to the node, by the number of paths available to the node, whether from external applications or internal, or by other parameters which influence the objective time a user needs to spend to get to the node;

b. a node popularity NPop(nu_u, t, τ) representing how popular the node is, where popularity may be defined by usage of that node by users in the time window, time spent on the node, times of request to the node and other data related to the usage;

c. a node impression NImp(nu_u, t, τ, δ) representing how much of an impression the node makes on the whole structure: that is, if it were deleted, whether the value of the application structure and success criteria such as revenue would drop or rise, taking into account the data Νζ presented on that node. Impression may be related to an informative, usable, or financial function if defined;

d. for every user (i.e. every unique end-user that used that node during their session with the application within the given time window) of that node an Average Node Usage Time AT(nu_u, t, τ) indicating how long the node was used by the user; if the AT of the u-th node is less than the minimum acceptable time ω then such a node is marked as a "transit" node. The AT of the u-th node may be defined for each user or as an aggregated function over all users. An operator may define this time with reference to a single test user, or the system may calculate this parameter automatically for all users, or the system may take into account user faceting and calculate the parameter per user facet, or per user.

v. for every edge of the graph structure e_v ∈ E, v ∈ {1, 2, ..., M}, where N represents the total number of nodes of GS and M represents the total number of edges of GS:

a. an edge connectivity ECon(e_v) representing how many connections the edge has to other nodes in the structure;

b. an edge popularity EPop(e_v, t, τ) representing how popular the edge was: e.g. how many times the edge was used or requested in the given time window;

c. an edge impression EImp(e_v, t, τ) representing how much of an impression the edge makes on the whole structure: for example, if it were deleted, how much the whole structure would lose or gain, i.e. how many nodes would lose connections and how many paths would be changed or deleted;

vi. a Local Application Performance (LAP) using a measure of energy of the navigation structure NNS:

En(NNS(t, τ)) = α ∑(u=1..N) (γNu NImp(nu_u, t, τ)) + β ∑(v=1..M) (γEv EImp(e_v, t, τ))

The energy measure indicates how the application 101 performs against the performance data 103 for its usage by a sub-set of users in the range of time t-τ to t, with data confidence γNu for every node and γEv for every edge.
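A sketch of this confidence-weighted energy sum follows; the impression values, the confidences and the weights α and β are illustrative figures, not taken from the description.

```python
def local_app_performance(node_imp, edge_imp, gamma_n, gamma_e, alpha=1.0, beta=1.0):
    """Energy of the navigation structure: a confidence-weighted sum of
    node impressions and edge impressions."""
    node_term = sum(gamma_n[u] * imp for u, imp in node_imp.items())
    edge_term = sum(gamma_e[v] * imp for v, imp in edge_imp.items())
    return alpha * node_term + beta * edge_term

node_imp = {"home": 0.9, "shop": 0.6}            # NImp per node
edge_imp = {("home", "shop"): 0.4}               # EImp per edge
gamma_n = {"home": 1.0, "shop": 0.5}             # data confidence per node
gamma_e = {("home", "shop"): 1.0}                # data confidence per edge
lap = local_app_performance(node_imp, edge_imp, gamma_n, gamma_e)
```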

LAP is understood as an objective measure of application performance without taking into consideration success criteria 104. LAP indicates how the initial design, flow, structure or any other factor performs in terms of the efficiency, effectiveness or satisfaction of an end-user during interactions of users with the Application 101 as indicated by data registered as the performance data 103.

The value of LAP is at a maximum when the energy of NNS is at a maximum, which can be understood as:

LAP_EnMAX(NNS(t, τ)) = α ∑(u=1..N) (NImp_MAX(nu_u, t, τ)) + β ∑(v=1..M) (EImp_MAX(e_v, t, τ))

In this way, when the structure composer 112 is presented with a specific application it outputs the application structure 102, which can be represented as a network of nodes connected together with edges, indicating how the relevant application works. The application structure 102 can be used in the assessment of the current performance of the application.

It is notable that the structure composer module 112 is arranged to compose the application structure 102 in dependence on both the application 101 and the performance data 103.

It is also notable that the present embodiments involve - in effect - reverse engineering the application 101 to form the application structure 102. This can improve processing efficiency. The engine 100 may apply network theory with an asymmetric approach to form the application structure 102. As a result the application structure may interpret only discrete elements of the performance data 103. This means that only those parts of the application's structure that are actually relevant and/or used by the users fall to be assessed in the performance assessment module 114. This may differ from current approaches to determining an application structure for analysis, such as Best-First, PageRank, Shark-Search, and InfoSpiders in that the current approach may: (i) build the application structure 102 through web-crawling to represent the application 101 at a current point in time as a directed graph or tree; (ii) repetitively calculate weights for each node in the structure based on the number of connections to and from each found node; and/or (iii) segregate the performance data 103 (i.e. classify the performance data to groups based on, for example, the consuming computer, consuming browser, consuming user, content, server, interface, or financials (e.g. volume of purchases or revenue)). As a result, the set of discovered nodes may be classified (e.g. sorted or bucketed) by their weights, with the weights being derived from the strength and position of each node in the hierarchy. This means that every page or "element", being any single part of the application 101, can be classified to some bigger group with similar weights. In this context the term "weight" signifies how a node performs in the structure. The weight may conveniently be based on the depth in the hierarchy of all nodes and the number of connections to other nodes: either internal to the application or external to other applications.
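One possible realisation of such weighting and bucketing is sketched below; the weight formula (connection count divided by depth) and the bucket boundaries are illustrative design choices, not prescribed by the description.

```python
def node_weight(depth, in_links, out_links):
    """Illustrative weight: shallower nodes with more connections
    (internal or external) score higher."""
    return (in_links + out_links) / (1 + depth)

def bucket_by_weight(weights, boundaries):
    """Classify nodes into groups of similar weight by thresholding."""
    buckets = {i: [] for i in range(len(boundaries) + 1)}
    for node, w in weights.items():
        buckets[sum(w >= b for b in boundaries)].append(node)
    return buckets

weights = {
    "/": node_weight(depth=0, in_links=5, out_links=3),       # 8.0
    "/shop": node_weight(depth=1, in_links=2, out_links=2),   # 2.0
    "/legal": node_weight(depth=3, in_links=1, out_links=0),  # 0.25
}
groups = bucket_by_weight(weights, boundaries=[1.0, 4.0])
```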

The performance data 103 may, for example, be obtained through a service interface between the engine 100 and the application 101 and/or through any suitable external or internal tracking libraries or integrated software (e.g. AppAnnie, Flurry Analytics, Google Analytics, DeskMetrics, a web logs collection mechanism or click or action trackers) and/or from other data provided by the operator of the application 101.

The approach described above can reduce the complexity of generating the application structure 102 from, in a pessimistic case, a complexity of O(N!) (i.e. an NP-hard problem) to a complexity of O(N² + aM), where N is the number of nodes discovered in the application, M is the number of edges (connections) between those nodes and a is a complexity parameter dependent on the applied complexity reduction strategy. For example, a sample reduction strategy can mean that at the step where paths from one node to another are being identified from the performance data 103, the program will accept only data that fulfils one or more predetermined boundary conditions. Those boundary conditions may, for example, be a selected interval of time spent on a node, a minimum level of usage or a minimum sale price of an item. These simplifications improve the performance and reduce the complexity of the overall system, and are reflected in the advice module 118 to be described below. In the advice module a preferred strategy for reducing the complexity of the application structure can be selected depending on the classification of the structure graph (e.g. degree, number of cycles, isomorphisms, levels or paths) and by benchmarking sets of structure graphs for the application 101 from prior periods of time. The adjustment of complexity of the processed structure done by the engine system 100 improves the efficiency of the interpretation of the performance data 103. The adjustment of complexity may be switched on in a manual, rule-based way, or in an intelligent manner with the application of elements including but not limited to probabilistic, self-learning, genetic and heuristic algorithms. For example, in the case of the application 101 being an e-commerce application with more than a given quantity of performance data 103, graph simplification methods (e.g. Gansner's heuristic algorithm to eliminate cycles, or Minimum Spanning Tree (MST) algorithms like Fredman & Tarjan's algorithm for network reduction) may be applied. This approach results in reduced processing time to assess the quality of the application structure 102 and/or to generate the advised application structure 107.
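The boundary-condition filtering of the sample reduction strategy might look like the following; the record fields and thresholds are hypothetical.

```python
def filter_performance_data(records, min_time=None, min_usage=None):
    """Keep only performance-data records that satisfy the chosen
    boundary conditions, discarding the rest before path identification."""
    kept = []
    for r in records:
        if min_time is not None and r["time_on_node"] < min_time:
            continue
        if min_usage is not None and r["usage"] < min_usage:
            continue
        kept.append(r)
    return kept

records = [
    {"node": "/shop", "time_on_node": 42, "usage": 10},
    {"node": "/legal", "time_on_node": 1, "usage": 1},
]
kept = filter_performance_data(records, min_time=5, min_usage=2)
```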

Performance assessment module

The next step is implemented by the performance assessment module 114. In this module the application structure 102 is analysed in dependence on the success criteria so as to assess the performance of the application against the success criteria. For different groups of users of the application the assessment might be done differently. That can allow the quality of the application structure 102 to be assessed for a particular group of users. Those groups of users might be indicated manually or determined automatically using suitable clustering or pattern discovery algorithms. Those algorithms may, for example, take into account one or more of: demographic variables concerning the users, historical usage data and referral market data relevant to the application being analysed. User bucketing of this nature may be performed in the performance assessment module 114.

External environmental data 106 can be input to the advice module 118. The advice module can use that external data to increase the performance of the application 101 and thereby derive improvements in the performance of the application structure 102 against the success criteria 104.

The external environmental variables 106 are imported from external sources. The external sources may be network-based data sources or environmental sensors: for example a temperature sensor or a humidity sensor. Examples of the data that may be derived from network-based sources include but are not limited to factors such as weather conditions of the regions where the application operates, marketing reports related to the data stored for use with the application, statistical data on finance, prognosis, income data for user groups of the application, historical data and chat history on topics related to the products offered through the application, social media data and health data (including data relating to cosmetics and wellbeing). Data from those sources can be correlated with the application 101. The term "correlated" is to be understood in a statistical sense, as meaning statistical relationships between (i) any aspect of the application structure 102 and/or the performance data 103 and (ii) any of the environmental variables 106. The external environmental variables 106 may be stored in any data structure compatible with the application structure 102.
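As a minimal illustration of such a statistical relationship, the sketch below computes a Pearson correlation between one external variable and one performance series; the series are toy data, and any real deployment would draw them from the sources listed above.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between an external variable
    (e.g. daily temperature) and a performance series (e.g. daily visits)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

temperature = [10, 15, 20, 25, 30]   # external environmental variable
visits = [100, 150, 200, 250, 300]   # performance series over the same days
r = pearson(temperature, visits)
```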

The comparison of the application structure with the success criteria may be performed using a suitable metric, for example a network energy metric.

Monitoring agent

The monitoring agent 116 is set to assess either constantly or on an intermittent basis (e.g. every 60 seconds) how well the application structure 102 complies with the success criteria 104. This may for example be done using rule-based or alarm-detection techniques (e.g. with a Shewhart control chart or the nine Nelson rules). If the application structure 102 complies with the success criteria 104 to a level greater than a predetermined threshold then the monitoring agent may cause no further action to be taken. Otherwise, the monitoring agent may invoke the advice module to begin attempting to find a more optimised structure for the application.
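The threshold logic of a single monitoring pass can be sketched as follows; the compliance score stands in for whatever rule-based or control-chart measure is actually used, and the values are illustrative.

```python
def monitor(compliance, threshold, invoke_advice):
    """One monitoring pass: invoke the advice module only when
    compliance with the success criteria falls below the threshold."""
    if compliance < threshold:
        invoke_advice()
        return "advice-invoked"
    return "ok"

invocations = []
status_ok = monitor(0.92, threshold=0.8, invoke_advice=lambda: invocations.append(1))
status_bad = monitor(0.55, threshold=0.8, invoke_advice=lambda: invocations.append(1))
```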

The decision on whether to invoke the advice module may also be dependent on the external environmental variables 106. The advice module may develop over time a correlation map between the external environment variables, the performance data 103 and the application structure 102. If that correlation map indicates that an aspect of the external environment variables has changed in correlation with a deterioration of greater than a given threshold in the performance data then the advice module may be triggered.

Advice module

The advice module 118 is responsible for processing the output of the monitoring agent 116 to form one or more recommended optimisations for the application. Those optimisations may be provided in the form of a report (advised application structure 107) or in the form of feedback to the performance assessment module which can be used to model the effect of the optimisation(s).

In order to deliver the output in the advised application structure 107 the advice module 118 applies statistical analysis, data mining and artificial intelligence methods to inputs comprising (i) the performance data 103, (ii) the external environmental variables 106 and (iii) the result of a quality assessment of the application structure 102 performed by the performance assessment module 114 in dependence on the success criteria 104.

The advice module analyses the application structure 102, the external environmental variables 106, and the defined time-frame (which may be historical and future), to determine one or more modified application structures. For each of those structures it estimates performance against the success criteria 104. Where the performance of a modified structure is better than the performance of the present application structure as indicated at 102 it can be provided as part of the advised application structure 107. A user of the engine 100 can then select an advised application structure 107, or part of it, to apply to the application 101 . If this is done, a new application structure 102 can be generated.

The modified structures may be identified by systematically or randomly varying the application structure, e.g. by one or more of reducing its components (e.g. by omitting pages, form elements, data containers or multimedia elements) or re-ordering its components. These may be identified by algorithms including the Adaptive Website algorithm, LogRank, the Connecting Components of Navigation Structure (CCNS) algorithm, the Improvement of Navigation Structure (INS) algorithm, the Network Balancing algorithm, the ant-colony meta-heuristic algorithm, the algorithm of promotion, stepwise adaptation and the orphan-pages reduction algorithm. Finding the truly best application structure (i.e. the one for which performance against the success criteria is maximised) is possible only by using a naive approach such as a complete search, which involves checking all possible combinations of components of the application structure for given data and usage data. That is computationally complex. For example, a method of complete review may involve checking all possible combinations of edge connections for a given set of vertices of the graph structure GS. This is a difficult and time-consuming task because the computational complexity of a complete review for a directed graph is O(N!). Instead, approximated approaches such as referring to an acceptable amount of energy breadcrumbs (En~) or a stepwise improvement of the application structure might be preferred. An acceptable energy value (En~) can be arbitrarily predetermined (e.g. by an experienced analyst) or calculated by reference to the network navigation structures of applications comparable to the application 101, e.g. using statistical methods for quality control such as a Shewhart control chart. A value (En~) calculated on the basis of the energy networks of other applications may be determined as follows:

En~ may be determined from the values En(NNS(t, τ)(GS_i)), i = 1, ..., k, i.e. the values of the energy of the i-th network structure from the set of k applications whose En value is higher than that of the compared network.
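The stepwise-improvement alternative to a complete search can be sketched as a simple hill-climbing loop; the variant generator and scoring function below are toy stand-ins for the structure-variation and performance-estimation steps described above.

```python
import random

def stepwise_improve(structure, variants, score, steps=20, rng=None):
    """Stepwise adaptation: repeatedly try a random variant of the current
    best structure and keep it only if its estimated score improves,
    avoiding the O(N!) cost of a complete search."""
    rng = rng or random.Random(0)
    best, best_score = structure, score(structure)
    for _ in range(steps):
        candidate = rng.choice(variants(best))
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
    return best, best_score

# Toy model: a "structure" is a tuple of components; the score rewards
# keeping components "a" and "c" and penalises the others.
def variants(s):
    return [tuple(x for x in s if x != c) for c in s] or [s]

def score(s):
    return sum(1 if c in ("a", "c") else -1 for c in s)

best, best_score = stepwise_improve(("a", "b", "c", "d"), variants, score)
```

Because candidates are only accepted when they improve the score, the returned structure is never worse than the starting one.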

In the advice module, program code defining clustering algorithms or other machine learning techniques may learn to identify patterns from the performance data (or parts of it) and thereby automatically select and apply optimisation algorithms in the advice module so as to develop modifications of the application that achieve improved performance.

Preferably the advised structures do not include external environmental variables. This can increase the performance of the application 101 while it is being used and better fit the iterated application structure to the success criteria.

The advice module may use not only the performance data 103 to find an improved application structure, but it may expand this data set with new portions of information obtained from the external environmental variables 106 as the result of knowledge reasoning algorithms acting on an initial learning data set of performance data. Referring to figure 10, the engine 100 runs on server infrastructure 111. The server comprises a service bus 115 and storage 113. The result 107 may be output by the engine 100 and stored in a database on temporary or persistent storage 113, transmitted through a service interface connector 131 to a user device 121 and displayed on a display 125 of that device. The result may be presented in any suitable form, for example a downloadable file, visualization, text instructions or printable form. The user device has a processor 123, storage 124 and an input device 122. Server 111 may correspond to server 200 of figures 7 and 8. User device 121 may correspond to device 254 of figure 8.

The engine 100 may be implemented in any suitable form. Conveniently it may be implemented as a collection of software services implemented with any suitable software language(s), such as ASP.NET, Python, PHP, Java or Scala. It may be organised in modules (encapsulated portions of a system), with database storage 113 implemented with any data management solution (e.g. file system data structures, solid database, SQL, noSQL, distributed database systems, hash-maps). Modules may be organised as shown in figure 9 or in another way. Each module may have a service interface through which end users of the engine 100 may interact with the module through a suitable protocol (e.g. HTTP over TCP/IP). For example, the communication endpoints may communicate through SOAP/WebService/RestAPI between modules and the application 101, the performance data 103 and/or sources of external environmental variables 106.

The applicant hereby discloses in isolation each individual feature described herein and any combination of two or more such features, to the extent that such features or combinations are capable of being carried out based on the present specification as a whole in the light of the common general knowledge of a person skilled in the art, irrespective of whether such features or combinations of features solve any problems disclosed herein, and without limitation to the scope of the claims. The applicant indicates that aspects of the present invention may consist of any such individual feature or combination of features. In view of the foregoing description it will be evident to a person skilled in the art that various modifications may be made within the scope of the invention.