Title:
FAULT CLASSIFICATION SYSTEM AND METHOD
Document Type and Number:
WIPO Patent Application WO/2002/086722
Kind Code:
A1
Abstract:
The present invention is concerned with a method of selecting an instance of a second parameter on the basis of data identifying a relationship between a first parameter and instances of the second parameter, where the data is organised as a distribution across the instances of the second parameter. The method includes the steps of: receiving data identifying an instance of the first parameter; retrieving a distribution corresponding to the received instance of the first parameter; for each instance in the retrieved distribution, calculating a cost representative of an incorrect selection; and selecting one of the instances on the basis of the calculated cost such that the cost of incorrect selection is minimised.

Inventors:
AZVINE BEHNAM (GB)
NAUCK DETLEF DANIEL (GB)
Application Number:
PCT/GB2002/001821
Publication Date:
October 31, 2002
Filing Date:
April 19, 2002
Assignee:
BRITISH TELECOMM (GB)
AZVINE BEHNAM (GB)
NAUCK DETLEF DANIEL (GB)
International Classes:
G06F11/25; (IPC1-7): G06F11/25
Foreign References:
US 5214653 A, 1993-05-25
Other References:
FELKE T: "AN OPTIMIZING SYSTEM FOR TROUBLESHOOTING", PROCEEDINGS OF THE NATIONAL AEROSPACE AND ELECTRONICS CONFERENCE. (NAECON). DAYTON, MAY 23 - 27, 1988, NEW YORK, IEEE, US, vol. 4 CONF. 40, 23 May 1988 (1988-05-23), pages 1232 - 1236, XP000012392
GRABEL D: "DATA VERIFICATION - A PREREQUISITE FOR HEURISTIC DIAGNOSTICS", MEETING THE TESTS OF TIME. WASHINGTON, AUG. 29 - 31, 1989, PROCEEDINGS OF THE INTERNATIONAL TEST CONFERENCE, WASHINGTON, IEEE COMP. SOC. PRESS, US, vol. CONF. 20, 29 August 1989 (1989-08-29), pages 519 - 526, XP000089981, ISBN: 0-8186-8962-5
TORASSO P: "SUPERVISING THE HEURISTIC LEARNING IN A DIAGNOSTIC EXPERT SYSTEM", FUZZY SETS AND SYSTEMS, ELSEVIER SCIENCE PUBLISHERS, AMSTERDAM, NL, vol. 44, no. 3, 16 December 1991 (1991-12-16), pages 357 - 372, XP000258437, ISSN: 0165-0114
ROUSE G W: "LOCATOR - AN APPLICATION OF KNOWLEDGE ENGINEERING TO ICL'S CUSTOMER SERVICE", ICL TECHNICAL JOURNAL, PETER PEREGRINUS LTD. HITCHIN, GB, vol. 7, no. 3, 1 May 1991 (1991-05-01), pages 546 - 553, XP000229080
Attorney, Agent or Firm:
Lloyd, Barry George William (BT Group Legal Intellectual Property Department Holborn Centre 8th Floor 120 Holborn London EC1N 2TE, GB)
Claims:
CLAIMS
1. A method of selecting an instance of a second parameter on the basis of data identifying a relationship between a first parameter and instances of the second parameter, wherein the data is organised as a distribution across the instances of the second parameter, the method including the steps of: receiving data identifying an instance of the first parameter; retrieving a distribution corresponding to the received instance of the first parameter; for each instance in the retrieved distribution, calculating a cost that is representative of an incorrect selection; and selecting one of the instances on the basis of the calculated cost such that the cost of incorrect selection is minimised.
2. A method according to claim 1, in which the step of calculating a cost that is representative of an incorrect selection comprises quantifying the proportions of instances of the second parameter and multiplying each quantified proportion by a predetermined cost associated therewith.
3. A method according to claim 1 or claim 2, in which the data is organised as a fuzzy logic distribution across the instances of the second parameter.
4. A method according to any one of the preceding claims, wherein the data includes data representative of a relationship between instances of a third parameter and instances of the second parameter.
5. A method according to claim 4, in which the data is collected, at least in part, from events wherein data representative of instances of the second and third parameters are generated.
6. A method according to any one of the preceding claims, further including displaying the distribution of instances of the second parameter.
7. A method of modifying a relationship between a first parameter and instances of a second parameter, wherein said relationship is represented as a distribution across said instances and an instance of the second parameter is to be selected on the basis of the distribution, the method including the steps of: modifying the distribution across said instances of the second parameter, calculating a cost associated with incorrect selection of an instance in the modified distribution, evaluating the cost with respect to at least one cost condition, and modifying the relationship between said first and second parameters in the event that the cost condition is satisfied.
8. A method according to claim 7, in which the evaluating step includes comparing said cost with a cost associated with one or more previous distributions.
9. A method according to claim 7 or claim 8, further including calculating a confidence value associated with the modified distribution, and said evaluating step further includes evaluating the confidence value with respect to a predetermined confidence value, such that said relationship is modified if the confidence value is additionally satisfied.
10. A method according to any one of claims 7 to 9, further including calculating an accuracy associated with the modified distribution, evaluating the accuracy with respect to at least one accuracy condition, and modifying said relationship if the accuracy condition is satisfied.
11. A method according to any one of the preceding claims, in which the first parameter is indicative of a fault identification code.
12. A method according to any one of the preceding claims, in which the second parameter is indicative of a fault location.
13. A method according to any one of claims 4 to 11, in which the third parameter is a code indicative of technical features of the fault.
14. Apparatus for selecting an instance of a second parameter on the basis of data identifying a relationship between a first parameter and instances of the second parameter, wherein the data is organised into a distribution across the instances of the second parameter, the apparatus including: storage arranged to store the data; means arranged to receive data identifying an instance of the first parameter; means arranged to retrieve a distribution of instances of the second parameter corresponding to the received instance of the first parameter; means arranged to calculate, for each instance in the retrieved distribution, a cost representative of an incorrectly selected instance, and to select an instance associated with a minimum cost.
Description:
FAULT CLASSIFICATION SYSTEM AND METHOD

The present invention relates to methods of, and apparatus for, reviewing relationships between parameters, and has particular application in the diagnosis of faults.

In service industries, a significant proportion of time and resource is spent attending to interruptions to the provision of services. Such interruptions are often due to equipment failure and, depending on the nature and location of the problem, can affect a few, or a large number of, customers. There is considerable pressure on the service provider to diagnose and attend to faults promptly; given the increasing levels of competition in the marketplace, customers can be lost very quickly if a service provider is perceived to provide an inefficient level of service. In addition to the concerns relating to the marketplace, accurate diagnosis of fault locations has a significant impact on the internal efficiency and productivity of the service provider's workforce.

Managing faults typically comprises a number of processes: firstly the nature of the fault is identified, secondly a record is assigned to the fault (the record includes details derived from the fault diagnosis stage), thirdly an engineer having the appropriate skills is allocated to a job corresponding to the record, fourthly the job is scheduled into the engineer's workload, fifthly the engineer attends to the fault, and sixthly the engineer reports on the job. For service industries such as communications providers, fault identification is extremely difficult due to the complexity of products and services, the characteristics of the network and the vast number of customers. In addition, fault identification is partly dependent on information supplied by third parties (e.g. customers), which unavoidably includes uncertainty. Thus a combination of a complex matrix of fault sources, together with unreliable processing of the fault symptoms, means that fault diagnosis is a non-trivial task. However, it is crucial that faults are diagnosed correctly, because the subsequent steps, which lead to actual attendance of a fault, are wholly dependent on accurate fault identification.

In an attempt to improve the accuracy of fault classification, several workers have developed systems that model parts of a fault-processing system. These systems use traditional knowledge-based techniques, which include expert systems and search strategies, and operate in accordance with first-order logic (for further details of these techniques refer to "Artificial Intelligence: A Modern Approach", by Russell S. and Norvig P., published by Prentice Hall, Englewood Cliffs, 1995). However, these methods rely on a detailed logical model of the system, which is difficult to obtain and is unsuitable for fast reactive diagnosis. Other known techniques are based on statistical methods, e.g. frequency analysis. These methods are all restricted to evaluating fixed solutions, so that it is extremely difficult to evaluate and compare the effects of various modifications to fault identification.

Tim Felke, in "An optimising system for troubleshooting", presented at the IEEE 1988 National Aerospace and Electronics Conference (NAECON), May 23-27 1988, describes an expert system that uses rules and facts to reduce the costs associated with troubleshooting. Essentially, when a part/system develops a fault, a so-called symptom (S) vector is determined for the part/system. In order to determine the S vector a number of tests are performed on the part/system, and the results from each test (pass/fail) are used to populate an instance of the S vector.

This S vector instance is compared with a two-dimensional D matrix, an entry of which shows how often a test j (columns) has failed when a part i (rows) was faulty ("was the failure mode"), in order to identify a part/system that has a similar S vector signature to the determined S vector instance. Those parts that are both most similar to the determined S vector instance and that incur least cost are selected for repair; in the paper, selection is based on a similarity measure (N_i), which represents the similarity between the determined S vector instance and a row (i) of the D matrix, divided by the sum of all similarities between the determined S vector instance and the rows of the D matrix.

Thus the symptom (S) vector describes a multitude of (possibly) faulty parts, and those parts having a small ratio of cost to probability are replaced first.

However, this strategy biases selection of parts for repair towards those that are very unlikely to fail but are cheap to replace (Equation 8).

According to a first aspect of the present invention there is provided a method of selecting an instance of a second parameter on the basis of data identifying a relationship between a first parameter and instances of the second parameter, wherein the data is organised into a distribution across the instances of the second parameter, the method including the steps of: receiving data identifying an instance of the first parameter; retrieving a distribution corresponding to the received instance of the first parameter; for each instance in the distribution, calculating a cost representative of an incorrect selection; and selecting one of the instances on the basis of the calculated cost such that the cost of incorrect selection is minimised.
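By way of illustration only, this selection step might be sketched in Java (the language later said to be used for the embodiments); the class, method and variable names below, the instance labels and the cost figures are all hypothetical and do not form part of the invention as claimed.

import java.util.Map;

public class MinCostSelector {

    /**
     * Selects the instance of the second parameter whose expected cost of an
     * incorrect selection is lowest.  'distribution' maps each candidate
     * instance to its proportion in the retrieved distribution; 'cost' gives
     * cost.get(actual).get(assigned), the cost of selecting 'assigned' when
     * the true instance is 'actual'.
     */
    static String selectMinimumCost(Map<String, Double> distribution,
                                    Map<String, Map<String, Double>> cost) {
        String best = null;
        double bestCost = Double.MAX_VALUE;
        for (String assigned : distribution.keySet()) {
            double expected = 0.0;
            for (Map.Entry<String, Double> actual : distribution.entrySet()) {
                if (!actual.getKey().equals(assigned)) {
                    // proportion of cases that are really 'actual', multiplied by
                    // the cost of having selected 'assigned' instead
                    expected += actual.getValue() * cost.get(actual.getKey()).get(assigned);
                }
            }
            if (expected < bestCost) {
                bestCost = expected;
                best = assigned;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Illustrative distribution over three instances of the second parameter
        Map<String, Double> distribution = Map.of("A", 0.6, "B", 0.3, "C", 0.1);
        // Illustrative misclassification costs, indexed as cost.get(actual).get(assigned)
        Map<String, Map<String, Double>> cost = Map.of(
            "A", Map.of("A", 0.0, "B", 50.0, "C", 80.0),
            "B", Map.of("A", 120.0, "B", 0.0, "C", 60.0),
            "C", Map.of("A", 90.0, "B", 40.0, "C", 0.0));
        System.out.println(selectMinimumCost(distribution, cost)); // B: lowest expected cost here
    }
}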

This is different to the system described in Felke, where an instance of a second parameter (failure mode, or part to repair) is selected on the basis of data identifying a relationship between a distribution over instances of a first parameter (symptom vector S) and instances of the second parameter (a particular failure mode), wherein the data is organised into a distribution across the instances of the first parameter (elements of the symptom vector S). In contrast, embodiments of the present invention describe selecting an instance of a second parameter on the basis of data identifying a relationship between a first parameter and instances of the second parameter, where the data is organised into a distribution across the instances of the second parameter.

Moreover, in Felke, the cost associated with the distribution of the first parameter (symptom vector) is calculated for each failure mode, and the failure mode that incurs a minimal cost is selected. In contrast, in embodiments of the present invention, the cost of incorrect selection is calculated, and selection of an instance of the second parameter is based on a minimum incorrect selection cost.

In embodiments of the invention, the first parameter can conveniently be a fault identification code, and the second parameter can be a fault location.

Advantageously the data is organised as a fuzzy logic distribution across the instances of the second parameter, thereby allowing the distribution to be described in pseudo-natural language.

Advantageously the method further includes displaying the distribution, thereby enabling users to review the relationship between the first and second parameters graphically.

In embodiments of the invention the data also includes data representative of a relationship between instances of a third parameter and instances of the second parameter, and the method further includes the steps of tagging at least some of the collected data with an associated instance of the first parameter, and organising the collected data into a distribution across instances of the second parameter. In these embodiments, the third parameter is a code indicative of technical features of the fault.

Preferably the data is collected, at least in part, from events wherein data representative of instances of the second and third parameters are generated. Thus at least part of the data is representative of fault event data.

According to a second aspect of the present invention there is provided a method of modifying a relationship between a first parameter and instances of a second parameter, where said relationship is represented as a distribution across said instances and an instance of the second parameter is to be selected on the basis of the distribution. The method includes the steps of: modifying the distribution across said instances of the second parameter, calculating a cost associated with incorrect selection of an instance in the modified distribution, evaluating the cost with respect to at least one cost condition, and modifying the relationship between said first and second parameters in the event that the cost condition is satisfied.

Preferably the evaluating step includes comparing said cost with a cost associated with one or more previous distributions of instances of the second parameter. Additionally, the method may include calculating an accuracy associated with the modified distribution, evaluating the accuracy with respect to at least one accuracy condition, and modifying said relationship if the accuracy condition is satisfied. Preferably the relationship can additionally be modified on the basis of a confidence value associated with the distribution and/or a number of occurrences of respective instances of the second parameter within the distribution.

This method allows distributions to be evaluated and modified on a "what-if" basis, in accordance with various modification conditions.

According to a further aspect of the invention there is provided apparatus corresponding to the method of the first and second aspects.

In the following description, the term "user" is used and is defined as follows:

"user": an entity that makes use of services supplied by a service provider.

A user may be a human, such as a customer or a manager, or it may be a piece of software that monitors for the occurrence of predetermined events. The service may be, for example, any form of utility service, where receipt of a service is, at least in part, dependent on the operational status of the corresponding utility equipment.

Such utility services include those for the provision of communications, gas, electricity, water, etc. In particular, when the entity is human, the user could be a customer, a call centre operator, a call centre manager, a field manager, an operations manager, an engineer, etc.

Further aspects, features and advantages of the present invention will be apparent from the following description of preferred embodiments of the invention, which are given by way of example only and with reference to the accompanying drawings, in which:

Figure 1 is a schematic diagram of a workflow that utilises an embodiment of the invention;

Figure 2 is a schematic block diagram showing an embodiment of apparatus for optimising decision tables;

Figure 3 is a flow diagram showing operation of validating means comprising part of the apparatus of Figure 2;

Figure 4 is a graph showing application of some of the validation conditions processed according to Figure 3;

Figure 5 is an output of display means forming part of the apparatus of Figure 2, showing assigned and actual main fault location for a particular customer report code;

Figure 6a is an output of display means forming part of the apparatus of Figure 2, showing an overall view of the accuracy of main fault location assignment for 3 different configurations of suggested customer report code-to-main fault location mappings;

Figure 6b is a further output of display means forming part of the apparatus of Figure 2, showing a row view of the accuracy of main fault location assignment for 3 different configurations of suggested customer report code-to-main fault location mappings;

Figure 6c is another output of display means forming part of the apparatus of Figure 2, showing a column view of the accuracy of main fault location assignment for 3 different configurations of suggested customer report code-to-main fault location mappings;

Figure 6d is a yet further output of display means forming part of the apparatus of Figure 2, pictorially showing accuracy of main fault location assignment for a selected main fault location;

Figure 7 is an output of display means forming part of the apparatus of Figure 2, showing accuracy of main fault location assignment for 5 different configurations of suggested customer report code-to-main fault location mappings;

Figure 8 is an output of display means forming part of the apparatus of Figure 2, showing compared costs for the five different configurations of suggested customer report code-to-main fault location mappings shown in Figure 7; and

Figure 9 is an extract from a decision table, showing part of a mapping between customer report codes (CRC) and main fault locations (MFL).

General overview of fault handling

Figure 1 shows the typical processes involved in reporting and fixing faults in communication lines and/or equipment. The process is initiated by a call 101 from a user reporting a fault with their communications equipment, e.g. a loss of dialling tone on their phone line. This call 101 is received by an operator 103 at a call centre, who asks the user a series of questions relating to the fault and attempts to assign a customer report code (CRC) 105 thereto. A customer report code (CRC) classifies the fault reported by the user, based on the information gathered by the operator 103 speaking to the user, into one of a plurality of fault types.

In the first instance, the operator 103 can attempt to fix the fault on-line.

However, if this is not possible (e.g. the operator 103 has insufficient information), the operator 103 records the job type, together with the CRC and various details relating to the user, and this record is sent on to the next stage in the process: a customer support system 107. Alternatively, faults can be reported to the customer support system 107 directly, bypassing the operator 103.

The customer support system 107 receives the record from the operator 103, extracts the CRC, and identifies a Main Fault Location (MFL) 106 from the CRC. A Main Fault Location (MFL) identifies a particular part of, or location in, the communications infrastructure where the fault is likely to lie. Identification of the MFL may involve performing a series of tests on the line, exchange or communications apparatus and/or reviewing a decision table 111, such as that shown in Figure 9, which is an empirical table that maps CRC to MFL. Some fault conditions do not involve performing any tests, so that the decision table 111 is the sole source of MFL identification.

Once an MFL has been assigned, the customer support system 107 then passes the MFL onto a work manager system 108, which schedules repair of the fault on the basis of the type of the MFL. This involves identifying a technician 109 who is qualified to fix this kind of job, allocating the identified technician 109 to the job and allocating a time to the job. In general, the identified technician 109 travels to the fault location and, once he has completed the job, assigns a Clear Code (CC) 113 thereto. Each clear code (CC) is representative of a particular set of fault characteristics, and the technician 109 assigns whichever CC most closely resembles the characteristics of the job he has just completed. Each of the clear codes (CC) is mapped to a Main fault location (MFL) via a second table (not shown).

In the following description, reference is made to the following types of MFL: CA (fault at the customer's premises); LN (fault in the line); EX (fault at the exchange); DT (fault sent to an engineer of a diagnostic test unit); CE (fault either in the line or at the customer's premises, thus MFL either LN or CA). These specific MFL types are used to illustrate features of embodiments of the invention, and are not an exhaustive list.

Misclassification of fault locations by ascribing the wrong MFL can be costly: in addition to the unnecessary expenditure of resources and time, customers can be inconvenienced. For example, if the assigned MFL is CA (customer's premises), but the fault is actually in the line (MFL is LN) and a technician 109 is sent to the customer's premises, the technician 109 will not be able to clear the fault. As a result the customer will have been inconvenienced unnecessarily, resources will have been unnecessarily allocated, and the fault will not have been fixed. Thus, embodiments of the invention are concerned with improving the accuracy of the CRC-to-MFL mappings, quantifying the cost of these misclassifications and factoring the cost into any changes to CRC-to-MFL mappings, as is described in more detail below.

The process of assigning a CRC to a fault often involves the operator making a decision based on information gathered from the customer and/or basic line testing. There are several sources of uncertainty involved in MFL assignment: firstly, either (or both) of the customer and the operator may provide incomplete information, so that the CRC can be assigned to an incorrect MFL, either accidentally or because the fault details do not clearly fall within a single CRC; secondly, a CRC could in fact cover more than one possible MFL (e.g. if the CRC is "no dial tone", for which there can be at least 2 MFLs), so that the actual mapping assigned in the decision table 111 will only be correct some of the time; and thirdly, the mapping in the decision table 111 is de facto incorrect.

In addition to the uncertainty described in respect of the initial MFL assignment, assignment of the Clear Code (CC) is itself subject to error: e.g. if a technician incorrectly assigned CC_1 to a job instead of CC_2, and if CC_1 maps to CA and CC_2 maps to LN, then the MFL allocated to this job will be CA rather than LN. If that data is then used to verify CRC-to-MFL mappings (e.g. the operator assigned a CRC that maps to LN), then the data could incorrectly invalidate the mapping.

In view of these many and disparate sources of uncertainty, embodiments use distributions, instead of crisp rules, for validation of CRC-to-MFL mappings and/or for populating the decision tables 111. These distributions are built from the Clear Code (CC) information entered by the technicians 109 and the mapping between CC and MFL. In particular, embodiments use this data to build a "history" of mappings between individual Clear Codes, denoted CC_i, where i identifies an individual CC, and Main Fault Locations, denoted MFL_i, where i identifies an individual MFL:

o Firstly, the CC-to-MFL historical data is mapped back to an originally assigned CRC, and this CC-to-MFL data is used to generate an MFL distribution for that CRC. As a large amount of data is collected and used to create the distribution, the distribution is representative of the true relationship between CRC and MFL;

o Secondly, by using a distribution, the method accounts for the fact that a CRC can rarely be assigned to a single MFL, due to operator errors or the fact that the fault symptoms overlap with at least 2 CRCs. Thus embodiments of the invention have the advantage of implicitly including these uncertainties when validating the CRC-to-MFL mappings using historical, or real, data.

Populating the decision table 111 with distributions (which, in preferred embodiments, include fuzzy sets), instead of with crisp mappings between CRC and MFL, has many additional advantages, such as allowing a technician 109, on receipt of MFL data, to review the fault details more closely before going to the job in order to correctly determine the type of the MFL. This is described in detail below.

Overview of an embodiment

Figure 2 is a schematic block diagram showing an embodiment of the invention for optimising decision tables, generally referred to as the decision table optimiser 200. The arrows between blocks indicate the direction of data transfer between components. The decision table optimiser 200 essentially comprises analysing means 201 for analysing historical data 202, a store DB1 for storing default conditions 204, an input 206 for receiving validation conditions 205 and validating means 207 for applying the validation conditions 205 to the historical data 202 and to the default conditions 204 in order to decide whether to make a change to the decision table 111.

The analysing means 201 stores each of the CC-to-MFL mappings and uses the mappings to extract information about the corresponding CRC-to-MFL mapping.

For example, consider Fault Y, which was classified by the operator 103 as CRC_13. Suppose that CRC_13 currently maps to a single MFL, CA, and that, when the technician 109 eventually fixed Fault Y, he entered a CC of 54 for the fault. Further suppose that CC 54 currently maps to MFL LN, so that the historical data 202 for this example is 54 -> LN. Thus, for a single data point (i.e. Fault Y), CRC_13 is classified as LN on the basis of the CC-to-MFL historical data 202. In practice, for each CRC, at least several hundred data are collated in this manner and used to create a distribution, and the distribution is accompanied by a confidence value, which is based on the number of historical data 202 used to generate the distribution.

In addition to the historical data 202, default conditions 204 are used.

These default conditions 204 can generally be referred to as "expert opinions", and are recommended CRC-to-MFL mappings. The default conditions 204 include a confidence associated with the mappings, and in one embodiment they can be expressed as a probability distribution across all types of MFL (described in more detail below).

The validating means 207 fuses the analysed historical data 202 and the default conditions 204 together in accordance with the validation conditions 205 to provide a recommendation for making a change to the CRC-to-MFL mappings.

As described above, an important aspect of the validation process is the cost incurred due to incorrect MFL assignment. Embodiments of the invention can include costing means 209 for factoring in costs associated with each type of MFL, and then use the cost information to review changes to CRC-to-MFL mappings. For example, consider a derived distribution for CRC_1 of 60% CA, 35% LN and 5% EX. If the cost involved in assigning CRC_1 to CA instead of CRC_1 to LN is sufficiently high (e.g. for instances when the MFL should have been LN), the user may choose to assign CRC_1 to LN, even though on the face of it (statistically) the historical data 202 indicates that CRC_1 should map to CA.

Thus further advantages of particular embodiments of the invention include being able to evaluate the cost of making a change to the CRC-to-MFL mapping, and being able to set the validation conditions 205 as dependent, at least in part, on certain cost conditions.

Other embodiments of the invention additionally include a parameter modifier 211 for modifying parameters comprising the validation conditions 205, for inputting the modified parameters to the validating means 207 and for reviewing the effects thereof. In one embodiment the effects can be viewed by display means 213, as described in greater detail below. These embodiments thus move away from fixed solutions, providing the user with a "scenario modelling" tool, and have the advantage of being usable by a variety of different users.

Referring to the components of the decision table optimiser 200 in more detail, the default CRC-to-MFL conditions 204 are typically presented as a distribution; e.g. for CRC_1 the distribution may be 75% CA, 20% LN and 5% EX, and the confidence in this distribution may be 80%. When received by the validating means 207 this distribution is then converted into a fuzzy set, by normalising the distribution (e.g. by the percentage for the dominant MFL, which for this example is CA). For this example, the default, or expert, fuzzy set would be CA: 1.0, LN: 0.27, EX: 0.07, with a confidence value of 80%.

The analysing means 201 receives historical data 202 from the customer support system 107 (Figure 1), which assigns CC-to-MFL mappings in response to a technician 109 entering a Clear code (CC) for a job. As stated above, the historical data 202 comprises actual CC-to-MFL mappings, each of which can be traced back to a CRC. The analysing means 201 stores each of the CC-to-MFL mappings, categorizing them as a function of CRC. For each CRC, the analysing means 201 evaluates the number of data that have been collected, and, if the number exceeds a predetermined threshold, the analysing means 201 evaluates the distribution of CRC-to-MFL for that CRC.

As an example, suppose that from all of the CC-to-MFL data collected, 700 data can be mapped back to CRC_1. Suppose also that, on the basis of this collected data, the distribution of MFL corresponding to this CRC_1 is spread between CA and LN as 50% CA and 50% LN. A confidence value for the historical data 202 is then calculated by inspecting the actual number of data used to generate that distribution. For example, the analysing means 201 may set a (configurable) certainty threshold of 1000 data, meaning that if 1000 data have been received for CRC_1 (say), the confidence value is 100%. Thus the confidence value associated with this data is 70%.

In summary, the analysing means 201 receives CC-to-MFL data as input from the customer support system 107, classifies the input data as a function of the initially assigned CRC, analyses the MFL distribution for each CRC and outputs this as a CRC-to-MFL distribution. The distribution is input to the validating means 207, which converts the distribution into a fuzzy set, again by normalising the absolute values by the dominant (CA) value. For this example the historical fuzzy set would be CA: 1.0, LN: 1.0, EX: 0.0, with a confidence value of 70%.
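A minimal Java sketch of this analysis step follows, assuming the histogram of MFL counts for one CRC and the certainty threshold are supplied by the caller; the class and method names are invented for this sketch and are not part of the specification.

import java.util.LinkedHashMap;
import java.util.Map;

public class DistributionAnalyser {

    /** Normalises an MFL histogram by its dominant (largest) entry to give a fuzzy set. */
    static Map<String, Double> toFuzzySet(Map<String, Integer> histogram) {
        int max = Math.max(1, histogram.values().stream().mapToInt(Integer::intValue).max().orElse(1));
        Map<String, Double> fuzzy = new LinkedHashMap<>();
        histogram.forEach((mfl, count) -> fuzzy.put(mfl, (double) count / max));
        return fuzzy;
    }

    /** Confidence grows linearly with the amount of data, capped at 1.0. */
    static double confidence(int numberOfData, int certaintyThreshold) {
        return Math.min(1.0, (double) numberOfData / certaintyThreshold);
    }

    public static void main(String[] args) {
        // Illustrative historical counts for one CRC: 350 CA, 350 LN, 0 EX (700 data in total)
        Map<String, Integer> histogram = new LinkedHashMap<>();
        histogram.put("CA", 350);
        histogram.put("LN", 350);
        histogram.put("EX", 0);

        System.out.println(toFuzzySet(histogram));   // {CA=1.0, LN=1.0, EX=0.0}
        System.out.println(confidence(700, 1000));   // 0.7
    }
}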

The validating means 207 also receives validation conditions 205, which may either be manually entered by a user, or may be available to the validating means 207 in the form of precompiled computer code. The validation conditions 205 determine how to fuse the default conditions 204 with the analysed historical data 202. In preferred embodiments of the invention, at least some of the validation conditions 205 are expressed using fuzzy logic, for example using a standard equation from fuzzy logic theory (which can be found in "Fuzzy Set Theory and its Applications", 3rd Edition, Kluwer Academic Publishers, Boston, 1996):

c = λ·a·b + (1 - λ)·(a + b - a·b)     Equation 1

Such an equation can be implemented in computer code, and can be made accessible to the validating means 207. In Equation 1, a represents a membership degree of the expert fuzzy set, which, for the example given above, is (CA: 1.0, LN: 0.27, EX: 0.07) with confidence 0.8; b represents a membership degree of the historical data fuzzy set (CA: 1.0, LN: 1.0, EX: 0.0), with confidence 0.7; and c represents a membership degree of the fused fuzzy set (to be calculated).

λ is derived from the difference between the confidence in the default conditions 204 (confidence_expert) and the confidence in the historical data 202 (confidence_data), and is given by:

λ = 1 - abs(confidence_expert - confidence_data)     Equation 2

Thus if the value of confidence_expert is similar to the value of confidence_data, λ will tend to 1.0; if the value of confidence_expert is very different from the value of confidence_data, λ will tend to 0.0.

In other words, if λ is 1.0 the confidence values are related by an "and" connection, which enhances the differences in the membership degrees of the fused fuzzy set (i.e. c = a·b, indicating that both the default conditions 204 and the historical data 202 are equally confident that their respective MFL distribution is correct). If there are significant differences between the 2 sets, the fused fuzzy set c will reflect those differences.

If λ is 0.0 the confidence values are related by an "or" connection, and the membership degrees in the fused fuzzy set are more similar to each other (c = a + b - a·b). When λ is between 0 and 1 there is a smooth transition between these two extremes.

In this example, λ = 1 - (0.8 - 0.7) = 0.9. The fused fuzzy set, c, is therefore (CA = 0.425, LN = 0.175, EX = 0.01), which, when normalised, becomes (CA = 1.0, LN = 0.41, EX = 0.02). The confidence value for the fused fuzzy set is then calculated, and may be given by the average of confidence_expert and confidence_data, which is 0.75 for this example.

Other ways of combining the default conditions 204, historical data 202 and confidence values may be used, such as a weighted average, where the weighting between confidence types is predetermined, for example:

c = p·a + (1 - p)·b     Equation 3

would give a weighted average of the membership degrees, where a, b and c are defined as per Equation 1 and p is a weight attached to confidence_expert, e.g. p = confidence_expert / (confidence_expert + confidence_data), using fuzzy arithmetic (for further details refer to the Kluwer reference detailed above).
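The fusion operators of Equations 1 to 3 can be sketched in Java as follows. This is one reading of the equations, applied element-wise to matching membership degrees; the exact membership values it produces illustrate the shape of the computation rather than reproducing the worked figures above, and all names are invented for the sketch.

import java.util.LinkedHashMap;
import java.util.Map;

public class FuzzyFusion {

    /** Equation 1: compensatory fusion of a pair of membership degrees a and b. */
    static double fuse(double a, double b, double lambda) {
        return lambda * a * b + (1 - lambda) * (a + b - a * b);
    }

    /** Equation 2: lambda tends to 1 when the two confidence values agree. */
    static double lambda(double confidenceExpert, double confidenceData) {
        return 1.0 - Math.abs(confidenceExpert - confidenceData);
    }

    /** Equation 3: confidence-weighted average as an alternative fusion. */
    static double weightedAverage(double a, double b, double p) {
        return p * a + (1 - p) * b;
    }

    /** Applies the fusion element-wise over the MFL labels of two fuzzy sets. */
    static Map<String, Double> fuseSets(Map<String, Double> expert,
                                        Map<String, Double> data,
                                        double lambda) {
        Map<String, Double> fused = new LinkedHashMap<>();
        expert.forEach((mfl, a) -> fused.put(mfl, fuse(a, data.getOrDefault(mfl, 0.0), lambda)));
        return fused;
    }

    public static void main(String[] args) {
        // Expert and historical fuzzy sets from the running example
        Map<String, Double> expert = Map.of("CA", 1.0, "LN", 0.27, "EX", 0.07);
        Map<String, Double> data   = Map.of("CA", 1.0, "LN", 1.0,  "EX", 0.0);
        double lam = lambda(0.8, 0.7);                // 0.9, as in the example above
        System.out.println(fuseSets(expert, data, lam));
    }
}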

In at least one embodiment, the fuzzy set c can be presented to the user for review, whereupon the user can decide how to use this new information. For example, the fuzzy set can be utilised to make changes to the CRC-to-MFL decision table 111.

Such changes are subject to the fuzzy set satisfying a variety of conditions, which may be specified by the user and comprise a combination of confidence levels and features of the fused fuzzy set c (referred to as further validation conditions 205). Advantageously, non-mathematical users can specify conditions in natural language, because the validating means 207 uses fuzzy logic. Optionally, the validating means 207 can include a parsing means, to extract the content of the condition and reproduce it as a fuzzy logic statement (suitable parsing means are common general knowledge).

Alternatively, upon creation of the fuzzy set, the validating means 207 can automatically apply such conditions to the fuzzy set, without review of the fuzzy set c by the user. Examples of further validation conditions 205 include:

Condition 1. Do not use information from the historical data 202 if the confidence value associated with the historical data 202 is smaller than 1.0.

Condition 2. Change the mapping in the decision table 111 so that CRC_i maps to the most frequently occurring MFL in the fused fuzzy set (winner-takes-all);

Condition 3. Change the mapping in the decision table 111 if the most frequently occurring MFL in the fused fuzzy set appears in at least 50% of all cases, or if it appears 30% more frequently than all other MFLs.

Condition 4. Change the mapping in the decision table 111 based on a fused fuzzy set if the difference between the largest membership degree and the second largest membership degree in the fused fuzzy set is at least 0.3.

Condition 5. Never change the mapping in the decision table 111 if the value of confidence_expert is greater than 0.9.

Condition 6. Change the mapping in the decision table 111 based on a fused fuzzy set (c_i) if the absolute value of the largest membership degree of the fused fuzzy set exceeds 0.5, and if the difference between the largest membership degree and the second largest membership degree in the normalised fused fuzzy set is at least 0.3, and if the difference between the second largest membership degree and the third largest membership degree in the normalised fused fuzzy set is less than the second largest membership degree.
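As an illustration, a condition of this kind might be checked in Java along the following lines; for brevity the sketch applies all three tests to the normalised fused set (whereas Condition 6 as stated applies the 0.5 threshold to the absolute value of the largest degree), and the class and method names are invented.

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ValidationConditions {

    /**
     * Condition 6 (simplified): the largest membership degree must exceed 0.5,
     * the gap between the largest and second largest degrees must be at least
     * 0.3, and the gap between the second and third largest degrees must be
     * smaller than the second largest degree itself.
     */
    static boolean condition6(Map<String, Double> normalisedFusedSet) {
        List<Double> sorted = normalisedFusedSet.values().stream()
                .sorted((x, y) -> Double.compare(y, x))
                .collect(Collectors.toList());
        double first = sorted.get(0);
        double second = sorted.size() > 1 ? sorted.get(1) : 0.0;
        double third = sorted.size() > 2 ? sorted.get(2) : 0.0;
        return first > 0.5
            && (first - second) >= 0.3
            && (second - third) < second;
    }

    public static void main(String[] args) {
        // Normalised fused fuzzy set from the running example
        Map<String, Double> fused = Map.of("CA", 1.0, "LN", 0.41, "EX", 0.02);
        // Under this reading: 1.0 > 0.5, gap 1.0 - 0.41 = 0.59 >= 0.3, and 0.41 - 0.02 = 0.39 < 0.41
        System.out.println(condition6(fused));
    }
}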

In one embodiment of the invention, in addition to using a distribution for evaluating CRC-to-MFL relationships, the decision table 111 is updated with a CRC-to-fuzzy-set MFL mapping, replacing the conventional one-to-one CRC-to-MFL mapping shown in Figure 9. Thus each CRC has a fuzzy set associated therewith, which, as described above, effectively defines the distribution of MFL for a particular CRC_i.

When the decision table 111 is populated with mappings to fuzzy sets, the technicians 109 can be provided with information about the distribution between MFLs. They can then apply their experience and judgment in deciding which of the MFLs is most likely to be correct. For example, if a fuzzy set, c, for CRC_1 has converged to CA = 1.0, LN = 1.0, EX = 0.0 after a few months of collecting data and fusing default conditions 204 with historical data 202 as described above, then it is clearly impossible to classify CRC_1 into either of CA or LN, as both are equally possible. Thus in this embodiment, when an operator 103 classifies the fault as CRC_1, the customer support system 107 sends the technician 109 the following statement: MFL: "CA or LN equally likely". The technician 109 can then request details of the job, and/or contact the customer to ask further questions, etc., in order to identify the correct MFL for this particular fault.

A further advantage of having a CRC-to-fuzzy-set MFL mapping is that the distribution in the fuzzy sets can be used to identify whether new CRCs should be created; i.e. in the example given above, it could be that some of the features comprising CRC_1 should be taken out and allocated to one or more other CRCs.

This would improve the accuracy of fault identification at source.

The process by which the decision table 111 is updated is shown in Figure 3, which is a flow diagram showing the steps carried out by the validating means 207 whenever data is received from the analysing means 201 (i.e. without review of the fuzzy set by the user). The figure makes use of the following nomenclature:

a_i: current fuzzy set for CRC_i. The initial MFL for CRC_i is given by the largest membership degree of a_i. Initially an expert specifies a_i.
b_i: fuzzy set obtained from the distribution of historical data 202 for CRC_i.
c_i: fuzzy set obtained by fusing a_i and b_i.
c_i': normalised c_i.
dc: confidence threshold for analysed historical data (data confidence).
dc_i: confidence_data, the confidence in the analysed historical data for CRC_i.
ec: confidence threshold for the expert fuzzy sets (expert confidence).
ec_i: confidence_expert, the confidence in a_i.
fuse: fusion operator for fuzzy sets.
n: maximum CRC number.
i: instance of CRC.
normalise: normalisation operator for fuzzy sets.

At step S3.1 new data is received, which is a distribution mapping of CRC_i to MFL. At step S3.2 the validating means 207 performs some default checks, which include checking whether i is less than the maximum number of CRCs (n) (essentially a sanity check that the CRC is a valid code), and whether confidence_data (dc_i) for this CRC_i is less than the confidence threshold (dc) for the CRC_i AND confidence_expert (ec_i) is less than the confidence threshold (ec) for the default conditions 204 (these are sanity checks that the confidence values are valid).

If the outcome of the default checks is positive, the validating means 207 moves to step S3.3; if the outcome is negative the validating means 207 moves to the next data sent in from the analysing means 201, which is data relating to the next CRC_i, and repeats this step.

At step S3.3 fuzzy set b_i is computed, which generates a distribution of MFL for the current CRC_i. At step S3.4 the current MFL distribution (a_i) is fused with the fuzzy set (b_i) in accordance with Equation 1, and the fused distribution (c_i) is normalised.

At step S3.5 the validating means 207 checks whether c_i satisfies the further validation conditions 205 relating to the updating of the decision table 111. If the outcome is positive the validating means 207 moves to step S3.6; if the outcome is negative the validating means 207 moves to the next new data and returns to step S3.2.

At step S3.6 the validating means 207 updates the current MFL distribution with c_i, so that a_i = c_i.
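Putting steps S3.2 to S3.6 together, a compact Java sketch (reusing the fusion and normalisation operations sketched earlier, with the default checks and the further validation conditions 205 reduced to simple placeholders) might look as follows; none of the names or threshold values below come from the patent itself.

import java.util.LinkedHashMap;
import java.util.Map;

public class DecisionTableValidator {

    static final double DC_THRESHOLD = 1.0;  // placeholder for threshold dc
    static final double EC_THRESHOLD = 1.0;  // placeholder for threshold ec

    /** Equation 1 applied to a single pair of membership degrees. */
    static double fuse(double a, double b, double lambda) {
        return lambda * a * b + (1 - lambda) * (a + b - a * b);
    }

    /** Normalises a fuzzy set by its largest membership degree. */
    static Map<String, Double> normalise(Map<String, Double> set) {
        double max = set.values().stream().mapToDouble(Double::doubleValue).max().orElse(1.0);
        final double divisor = max == 0.0 ? 1.0 : max;
        Map<String, Double> out = new LinkedHashMap<>();
        set.forEach((k, v) -> out.put(k, v / divisor));
        return out;
    }

    /** Placeholder for the further validation conditions 205 (e.g. Condition 6). */
    static boolean satisfiesValidationConditions(Map<String, Double> normalisedFusedSet) {
        return normalisedFusedSet.values().stream().max(Double::compare).orElse(0.0) > 0.5;
    }

    /** Steps S3.2-S3.6 for one CRC: returns the (possibly updated) current set a_i. */
    static Map<String, Double> process(Map<String, Double> a, double expertConf,
                                       Map<String, Double> b, double dataConf) {
        // S3.2: default checks on the confidence values, as described above
        if (!(dataConf < DC_THRESHOLD && expertConf < EC_THRESHOLD)) {
            return a;                                        // skip this CRC
        }
        // S3.3/S3.4: fuse a_i with b_i using lambda from Equation 2, then normalise c_i
        double lambda = 1.0 - Math.abs(expertConf - dataConf);
        Map<String, Double> c = new LinkedHashMap<>();
        a.forEach((mfl, degree) -> c.put(mfl, fuse(degree, b.getOrDefault(mfl, 0.0), lambda)));
        Map<String, Double> cNorm = normalise(c);
        // S3.5/S3.6: adopt c_i as the new a_i only if the validation conditions hold
        return satisfiesValidationConditions(cNorm) ? cNorm : a;
    }

    public static void main(String[] args) {
        Map<String, Double> a = Map.of("CA", 1.0, "LN", 0.27, "EX", 0.07);
        Map<String, Double> b = Map.of("CA", 1.0, "LN", 1.0, "EX", 0.0);
        System.out.println(process(a, 0.8, b, 0.7));
    }
}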

As stated above, the validation conditions checked at step S3.5 are typically given by the user, and include conditions such as Conditions 1 to 6 listed above. Figure 4 shows application of Condition 6, which can be expressed graphically, with the actual values superimposed upon the conditions. In Figure 4, the first condition is denoted by the threshold 0.5, the second condition by the threshold 0.3, and the third condition by the second largest membership degree.

Referring to Figure 4, the largest membership degree 401 in the data set is compared against the first threshold (0.5); the difference between the largest and second largest membership degrees 401, 403 in the data set is compared with the second threshold (0.3); and the difference between the second largest and third largest membership degrees 403, 405 in the data set is compared against the second largest membership degree, in order to determine whether the fused fuzzy set (c_i) satisfies the above condition. It will be appreciated that evaluation of other conditions can be performed in a similar manner.

Conditions 2 and 3 above relate to a situation where the decision table is updated by a one-to-one CRC-to-MFL mapping rather than by a one-to-fuzzy set CRC-to-MFL mapping. In this situation the historical data 202 and default conditions 204 can still be evaluated as fuzzy sets, but a single MFL ("winner") is identified in the fuzzy set and, if that winner satisfies the relevant conditions, the MFL is placed in the decision table 111.

The conditions and the changes made in respect of those conditions described above essentially cause changes to be made to entries in the decision table 111 on the basis of accuracy and/or confidence. For example, applying Condition 2 to the fused fuzzy set derived in the example above, where CA = 1.0, LN = 0.41, EX = 0.02, the mapping from CRC_1 would be CA. However, these conditions do not take account of costs, and it may be that the cost associated with assigning CRC_1 to CA, in cases when the correct MFL is LN, is very high. In the case described briefly in the Overview above, where a customer is inconvenienced unnecessarily because the fault is in the line and not in the customer's premises as initially diagnosed, the cost includes delay to the customer, poor publicity, a negative service image, technician work and travel costs, and the assignment of a new technician.

Embodiments of the invention preferably take account of costs in the further validation conditions 205 by utilizing costing means 209, which operates in association with the validating means 207 to evaluate the cost of making a change to a CRC-to-MFL assignment. The costing means 209 receives predetermined estimated costs in the form of a cost matrix (these costs encompass the contributions described above).

Table 1 shows one form of such a cost matrix, where "assigned MFL" is the proposed MFL, i.e. the MFL that the customer support system 107 assigns (e.g. using the decision table 111) and that the technician 109 receives, and "actual MFL" is the MFL that the job actually related to, as determined from the CC-to-MFL historical data 202:

                            Assigned MFL
                    CA     CE     DT     EX     LN
  Actual MFL   CA    0      0    350    200    155
               EX  150    200    300      0     55
               LN  300      0    350    100      0

Table 1

This shows that for a case that has an actual MFL of CA, if the CRC had been assigned to DT the cost would be 350 units (units could be, for example, pounds sterling), if the CRC had been assigned to EX then the cost would be 200 units, and if the CRC had been assigned to LN the cost would be 155 units.

The costing means 209 uses the data in Table 1 to evaluate how expensive each assignment would be. For example, say that on the basis of the analysed historical data 202 there is an MFL distribution of 60% CA and 40% LN for CRC_1. Based on statistical data alone, the user would assume that CRC_1 should map to CA. However, referring to Table 1 it can be seen that the cost of assigning CA when the fault is actually LN, i.e. unnecessary costs incurred in 40% of cases, is 0.4 x 300 = 120, whereas the cost of assigning LN when the fault is actually CA, i.e. unnecessary costs incurred in 60% of cases, is 0.6 x 155 = 93.

Thus, assuming that one were interested in one-to-one CRC-to-MFL mappings, it would be more costly to map CRC_1 to CA than to LN, because the cost of incorrect assignment (120 vs. 93) is higher in the former case. This information can be presented to a user in the form of a "what-if" analysis (described in more detail below), in the form of cost savings, or as a comparison between MFL assignments.
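This expected-cost comparison, instantiated with the CA/LN figures from Table 1, can be sketched in Java as follows; the helper for the average cost over the historical data mirrors the calculation described in the next paragraph, and all class and method names are invented for the sketch.

import java.util.Map;

public class CostingMeans {

    /** Expected cost of assigning 'assigned' when the true MFLs follow 'distribution'. */
    static double misclassificationCost(String assigned, Map<String, Double> distribution,
                                        Map<String, Map<String, Double>> costMatrix) {
        double total = 0.0;
        for (Map.Entry<String, Double> actual : distribution.entrySet()) {
            if (!actual.getKey().equals(assigned)) {
                total += actual.getValue() * costMatrix.get(actual.getKey()).get(assigned);
            }
        }
        return total;
    }

    /** Average cost contribution of one CRC over the whole body of historical data. */
    static double averageCost(double cost, int casesForThisCrc, int totalCases) {
        return cost * casesForThisCrc / totalCases;
    }

    public static void main(String[] args) {
        // CA/LN entries of Table 1, indexed as costMatrix.get(actualMFL).get(assignedMFL)
        Map<String, Map<String, Double>> costMatrix = Map.of(
                "CA", Map.of("CA", 0.0, "LN", 155.0),
                "LN", Map.of("CA", 300.0, "LN", 0.0));
        // Derived MFL distribution for CRC_1: 60% CA, 40% LN
        Map<String, Double> distribution = Map.of("CA", 0.6, "LN", 0.4);

        // Assigning CA costs 0.4 x 300 = 120 units; assigning LN costs 0.6 x 155 = 93 units
        System.out.println(misclassificationCost("CA", distribution, costMatrix));
        System.out.println(misclassificationCost("LN", distribution, costMatrix));
    }
}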

Such a costing analysis can be performed for each CRC_i entry in the decision table 111. An average cost value can then be calculated by multiplying the cost for each CRC_i by the number of times this CRC can be traced back in the historical data 202 and then dividing this by the overall number of cases comprising the historical data 202.

As an alternative to the single cost matrix, several cost matrices could be used to express different costs (e.g. for labour, travel, customer satisfaction, company image, etc.), which would then be combined to give an overall cost value.

For example, a cost matrix for "customer satisfaction" could be used, and fuzzy statements could be entered instead of crisp cost values (e.g. highly satisfied, totally fed up, etc.). In general, cost matrices could be used for either crisp or fuzzy statements; the user has the option of entering either a crisp amount, say £100, or a relative term such as "very expensive", if the user does not want to specify exact values.

Advantages of using cost matrices in this manner include increased flexibility. Each cost can be weighted or disregarded, using fuzzy terms such as "I want a CRC table that gives me very low labour costs and very high customer satisfaction".

In essence the further validation conditions 205 can be based on accuracy alone, on accuracy and confidence (e.g. Condition 2), on cost alone (e.g. the cheapest solution, minimising all costs), on cost and confidence, on confidence alone, or on a trade-off between all of the condition types.

Further embodiments of the invention include display means 213, which, as shown in Figure 2, is coupled to the validating means 207. The display means 213 provides a graphical display of CRC-to-MFL distribution based on the historical data 202. An example of the output of display means 213 is given in Figure 5 of the accompanying drawings, which shows the total number of Clear Codes (CC) that have been assigned by a technician 109 to faults that were initially classified as CRC ANCA 501, together with assigned and actual MFL for ANCA 501. Figure 5 shows that an MFL of CA 503 was assigned to ANCA, and that the distribution of actual MFL (in Figure 5 this is referred to as "Derived MFL"), as determined from the historical data 202, is balanced between CA 503 and LN 505.

Embodiments of the invention can be used to evaluate suggested CRC-to-MFL mappings on a "what-if" basis via the parameter modifier 211. A user can manually (or automatically, via a suitable computer program) input various different MFLs for any given CRC, for example via the parameter modifier 211, whereupon the costing means 209 evaluates the cost and accuracy associated with each MFL.

In these embodiments, the MFL distributions derived by the validating means 207 (derived from the historical data 202 as described above) provide the basis for reviewing each of the suggested CRC-to-MFL mappings. For example, say that for CRC_1 the MFL distribution is calculated to be 45% CA, 50% LN, 5% EX. If a user inputs a suggested MFL of CA for CRC_1, the accuracy associated with this MFL would be 45%, and the cost of incorrect MFL assignment would be 0.5 x 300 + 0.05 x 150 = 157.5. This cost is conveniently referred to as an "overhead" cost.

Similarly, if the suggested MFL for CRC_1 were changed to LN, the MFL would be correctly assigned 50% of the time, and the overhead would be 0.45 x 155 + 0.05 x 55 = 72.5; while if the suggested MFL for CRC_1 were changed to EX, the MFL would be correctly assigned 5% of the time, and the overhead would be 0.45 x 200 + 0.5 x 100 = 140.

For each suggested change to the CRC mapping, an average overhead can be computed by multiplying the overhead by the number of times the CC-to-MFL historical data 202 can be mapped to the CRC in question, CRC_i, and dividing the product by the total number of cases in the historical data 202. This information can be combined to describe the cost and accuracy associated with each of the CRC-to-MFL mappings, as is shown in Table 2, enabling the user to evaluate the cost of making changes.

User input MFL                           CA                   LN                  EX
Accuracy (%)                             45                   50                   5
Cost (units)                          157.5                 72.5                 140
Number of data for CRC_1                450                  450                 450
Average cost for total number
of historical data (10000)    157.5 x 450/10000    72.5 x 450/10000    140 x 450/10000
                                     = 70.8               = 32.5                = 63

Table 2

In practice a user may choose to input suggested MFLs for a plurality of different CRCs, whereupon the costing means 209 will evaluate accuracy and/or cost for each CRC, in the manner described above, and combine the results as a function of MFL. The accuracy and/or cost associated with the different permutations of mappings may be presented to the user, via display means 213, as "assigned" versus "actual" MFL, where the assigned MFL represents the mappings suggested by the user, and the actual MFL represents the mappings calculated on the basis of the historical data 202.

The accuracy of the suggestions can be presented in a number of different ways: firstly as a percentage of cases assigned to a particular MFL_i (Figure 6a, "total view"); secondly as a percentage of all actual MFL_i that were correctly assigned to MFL_i (Figure 6b, "row view"); and thirdly as a percentage of all cases assigned to MFL_i that actually turned out to be MFL_i (Figure 6c, "column view").

As stated above, the basis for evaluating correct/incorrect assignment is the historical data 202.

Thus tables 601, 602 and 603 in Figure 6a can be used to determine the accuracy of suggested MFL assignments: for example, the value 64.63% 605 at column CA and row CA represents the percentage of all cases that were correctly assigned as CA, and the value 0.78% 607 at column EX and row LN represents the percentage of all cases that were incorrectly assigned to EX when their actual MFL was LN. The "total" row 604 represents the total number of cases that were assigned to MFL_i, so that, for example, the total number of cases that were assigned to CA (correctly or not) was 67.02%.

Tables 601, 602, 603 in Figure 6b show the percentage of all actual MFL_i that were correctly assigned to MFL_i. Thus, in table 601, the value 91.16% 609 at column CA and row CA represents the percentage of all actual CA cases that were correctly assigned CA.

In Figure 6c, each entry in tables 601, 602, 603 represents the percentage of all cases assigned to MFL_i that actually turned out to be MFL_i. Thus, in table 601, the value 96.43% 611 at column CA and row CA represents the percentage of all cases assigned to CA that had an actual MFL of CA.
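The three views can be derived from a single matrix of case counts, indexed by actual and assigned MFL, roughly as sketched below; the counts used in the example are invented and are not the data behind Figures 6a to 6c.

import java.util.Arrays;

public class AccuracyViews {

    /** Total view: each cell as a percentage of all cases. */
    static double[][] totalView(int[][] counts) {
        double all = 0;
        for (int[] row : counts) {
            for (int c : row) {
                all += c;
            }
        }
        final double total = all;
        return scale(counts, (r, c) -> total);
    }

    /** Row view: each cell as a percentage of all cases having that actual MFL. */
    static double[][] rowView(int[][] counts) {
        return scale(counts, (r, c) -> Arrays.stream(counts[r]).sum());
    }

    /** Column view: each cell as a percentage of all cases assigned to that MFL. */
    static double[][] columnView(int[][] counts) {
        return scale(counts, (r, c) -> Arrays.stream(counts).mapToInt(row -> row[c]).sum());
    }

    interface Denominator {
        double of(int row, int col);
    }

    static double[][] scale(int[][] counts, Denominator denom) {
        double[][] out = new double[counts.length][counts[0].length];
        for (int r = 0; r < counts.length; r++) {
            for (int c = 0; c < counts[0].length; c++) {
                double d = denom.of(r, c);
                out[r][c] = d == 0 ? 0 : 100.0 * counts[r][c] / d;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Rows are actual MFL (CA, EX, LN); columns are assigned MFL (CA, EX, LN) - invented counts
        int[][] counts = {{620, 30, 10}, {20, 250, 15}, {25, 5, 25}};
        System.out.println(Arrays.deepToString(rowView(counts)));
    }
}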

For each type of view, the values can be compared for the three tables 601, 602, 603, allowing a user to review which of the CRC-to-MFL configurations gives the most accurate results. Such comparisons can also be presented in the form of an overlapping circle chart, as shown in Figure 6d, which displays the CRC-to-MFL assignment accuracy for a selected MFL. Figure 6d shows that 83.61% of the actual CA cases were correctly classified to be CA by the original CRC table 603.

The second variation of the CRC table 602 is better, because it classifies 98.46% of the actual CA cases correctly. The goal is for the dashed circle to overlap with the darkly shaded circle, and for the dashed circle to be independent of (i.e. not overlap with) the other circles. The size of a circle represents the percentage of the actual MFLs in the data, and therefore shows that there are more CA cases than EX cases, and more EX cases than LN cases. The overlapping areas between the dashed circle and the EX and LN circles represent the percentages of cases that are actually EX and LN respectively, but were incorrectly assigned to CA. Additionally, comparisons between the tables 601-603 can be presented via bar graphs (not shown), which show direct visualisations of the percentages in the corresponding tables 601-603.

Figure 7 shows the "total view" for 5 different configurations of CRC-to-MFL mappings 701, 702, 703, 704, 705 (two more tables, 704 and 705, added to those displayed in Figures 6a-6c). As stated above with respect to Figures 6a-6d, the content of the tables 701-705 can be used to assess the accuracy of assigned MFL against actual MFL. Figure 8 shows average costs corresponding to the tables 701-705 of Figure 7, where the costs have been calculated as described above with respect to the example in Table 2. From Figure 8 it can be seen that the fourth table 704 is the least expensive variation (partly because, in the fourth table 704, at least some of the CRCs have been assigned to MFL CE. As stated above, CE includes both CA and LN; although the cost involved in hiring a technician who is qualified to attend to both types of MFL is higher than hiring a CA or an LN technician, the fault can be fixed in "one shot", so that the overall costs are lower).

As an alternative to presenting the costs and accuracies of the various CRC-to-MFL combinations to the user for review via display means 213, the accuracy and cost results shown in Figures 6 to 8 could be used to automatically update the decision table 111. In this case the costs and accuracies are compared against the further validation conditions 205 and the decision table 111 is updated (or not, as the case may be) as described with reference to step S3.5 of Figure 3.

As will be understood by those skilled in the art, the invention described above may be embodied in one or more computer programs. These programs can be contained on various transmission and/or storage media such as a floppy disc, CD-ROM or other optically readable medium, or magnetic tape, so that the programs can be loaded onto one or more general purpose computers, or could be downloaded over a computer network using a suitable transmission medium. Embodiments of the present invention are conveniently written using the Java programming language, but it is to be understood that this is inessential to the invention.

The repository DB1 is initially populated using Microsoft Access '97. The manipulation of data described above is performed using Java code running inside a software application program on a Personal Computer running MS-Windows.

The functionality of the display means 213 is provided by Java code running within the same software application program.