Title:
NETWORK FAULT DISCOVERY
Document Type and Number:
WIPO Patent Application WO/2018/184748
Kind Code:
A1
Abstract:
Network addresses which are frequently accessed by user terminals are monitored by a network access point to which the terminals are connected, and the network access point then makes attempts (50) to access the network addresses from time to time and makes reports to a network management system (53) of any such network addresses that fail to respond. This increases the likelihood of detection and reporting of failures of active addresses before the user has need of them.

Inventors:
ABOUELMAATI DALIA (GB)
Application Number:
PCT/EP2018/052281
Publication Date:
October 11, 2018
Filing Date:
January 30, 2018
Assignee:
BRITISH TELECOMM (GB)
International Classes:
H04W24/00; H04W88/08
Domestic Patent References:
WO2015079195A1 2015-06-04
WO2016118899A1 2016-07-28
Foreign References:
US20090164625A1 2009-06-25
US20130322234A1 2013-12-05
US20110208992A1 2011-08-25
Attorney, Agent or Firm:
BRITISH TELECOMMUNICATIONS PUBLIC LIMITED COMPANY, INTELLECTUAL PROPERTY DEPARTMENT (GB)
Claims:
CLAIMS

A method of monitoring a data communications network to identify outages of resources associated with network addresses by recording access attempts to network addresses made by user terminals through an access point, wherein test attempt requests are generated and the test attempt requests are transmitted over the data communications network to network addresses for which successful access attempts have been recorded in a predetermined period, and any such network addresses that fail to respond are reported to a network management system.

A process according to Claim 1, in which the network management system is responsive to multiple failure reports from different access points relating to the same network address by recording occurrence of a potential fault condition associated with the network address.

A process according to Claim 1 or Claim 2, in which the network management system is responsive to multiple failures of access attempts from a given access point by recording occurrence of a potential fault condition associated with the access point.

A process according to Claim 1, Claim 2 or Claim 3, in which access attempts by user terminals to each address are counted during a predetermined period, and in which the access point attempts access periodically to addresses which have been recorded as having been accessed through the access point more than a predetermined number of times.

A process according to Claim 1, Claim 2, Claim 3 or Claim 4, in which access attempts are made at different times of day.

A process according to Claim 1, Claim 2, Claim 3, Claim 4 or Claim 5, in which, following a first access attempt, a subsequent access attempt is made after an interval which is selected according to whether the first access attempt fails or succeeds.

A process according to Claim 1, Claim 2, Claim 3, Claim 4, Claim 5 or Claim 6, in which an alert is transmitted to the remote management system only if a response received by the response monitor in response to a test access request to a network address is different from a preceding response received by the response monitor in response to a previous test access request to the same network address.

A communications access point for connecting one or more user terminals to a data communications network, having an access request monitor for detecting and recording access requests made by user terminals connected to the access point to network addresses for which successful access attempts have been recorded in a predetermined period, a test generation processor for generating test access requests for transmission over the data communications network to the target addresses, a response monitor for detecting responses to the test messages received from the target addresses through the data communications network, and an alerting processor for generating reports of failed responses, for transmission to a remote management system.

A communications access point according to Claim 8, comprising a counter for counting access attempts to a network address, and for controlling the test generation processor to generate access requests in response to the counter identifying a predetermined number of requests in a predetermined time.

A communications access point according to Claim 8 or Claim 9, wherein the test generation processor is configured to generate test access requests at staggered times of day.

A communications access point according to Claim 8, Claim 9 or Claim 10, wherein the test generation processor is controlled by the response monitor to generate test access requests to a network address at a first interval if the response monitor records a successful response to the previous request, and at a second, shorter interval if the response monitor records a failed response to the previous request.

A communications access point according to Claim 8, Claim 9, or Claim 10, wherein the alerting processor is configured to transmit an alert to the remote management system only if a response received by the response monitor in response to a test access request to a network address is different from a preceding response received by the response monitor in response to a previous test access request to the same network address.

A process for remote configuration of a programmable device associated with a communications access point by transmission of program data to the communications access point over a data communications connection, wherein the programmable device, when executing the program data, causes the access point to operate according to Claim 8, Claim 9, Claim 10, Claim 11, Claim 12 or Claim 13.

A computer system associated with a communications access point including a processor and memory storing computer program code for performing the steps of the process of Claim 1, Claim 2, Claim 3, Claim 4, Claim 5, Claim 6 or Claim 7 when the computer program code is executed by the processor.

A computer program element comprising computer program code to, when loaded into a computer system and executed thereon, cause the computer to perform the steps of a process as claimed in any of Claim 1, Claim 2, Claim 3, Claim 4, Claim 5, Claim 6 or Claim 7.

Description:
Network Fault Discovery

The invention relates to monitoring of a network to identify outages of resources associated with network addresses.

It is known to monitor individual network addresses to detect outages, as described for example in WO2016/118899 and US2011/208992. In these examples, addresses are monitored periodically by network gateways to identify any which are failing to respond, indicating a possible failure of the server at that address or the communications links connecting it to the rest of the network. However, this involves an additional communications overhead in transmitting the test messages and responses. Moreover, it does not take account of how significant such a failure may be, as the failed address may relate to a server which has fallen into disuse and is rarely accessed by access requests from real users, as distinct from the test messages.

It is therefore desirable to concentrate the test overhead on elements of the network on which a fault is more likely to affect the end-users.

Individual user terminals could report access failures, but this would only identify problems retrospectively, and only when a user terminal is connected and a request for access is made. It is desirable to identify outages of resources before those resources are requested, so that they can be remedied before the resource is required.

According to the invention, there is provided a method of monitoring a data communications network to identify outages of resources associated with network addresses by recording access attempts to network addresses made by user terminals through an access point, wherein test attempt requests are generated and the test attempt requests are transmitted over the data communications network to network addresses for which successful access attempts have been recorded in a predetermined period, and any such network addresses that fail to respond are reported to a network management system.

Preferably, the network management system is responsive to multiple failure reports from different access points relating to the same network address by recording occurrence of a potential fault condition associated with the network address. It may also be responsive to multiple failures of access attempts from a given access point by recording occurrence of a potential fault condition associated with the access point.

In a preferred embodiment, access attempts by user terminals to each address are counted during a predetermined period, and the access point attempts access periodically to addresses which have been recorded as having been accessed through the access point more than a predetermined number of times. Access attempts may be made at different times of day, in order to identify failure modes which have a diurnal pattern, for example because of overloads at times of peak demand.

Following a first access attempt, the intervals between subsequent access attempts may be selected according to whether the first access attempt fails or succeeds.

To minimise communication overhead, alerts may be transmitted to the remote management system only if a response received by the response monitor in response to a test access request to a network address is different from a preceding response received by the response monitor in response to a previous test access request to the same network address.

The invention also provides a communications access point for connecting one or more user terminals to a data communications network, having an access request monitor for detecting and recording access requests made by user terminals connected to the access point to network addresses for which successful access attempts have been recorded in a predetermined period, a test generation processor for generating test access requests for transmission over the data communications network to the target addresses, a response monitor for detecting responses to the test messages received from the target addresses through the data communications network, and an alerting processor for generating reports of failed responses, for transmission to a remote management system.

The invention makes use of individual users' network access points to monitor network addresses in regular use. Each access point records network addresses regularly accessed from the access point, and periodically checks those addresses to see if they are still active. If any fail to respond, this is reported to a network management entity which collates the reports from multiple access points so that they can be used to identify problems (e.g. denial of service attacks, system outages etc.). This allows the network operator to identify potential problems before the customer is inconvenienced by them. Thus a distributed mechanism can be provided for monitoring access requests, which notifies the network only when needed and so avoids overloading the network.

Reporting access request history only when an individual access point detects a problem reduces network overhead and allows monitoring to be concentrated on websites that are attracting the most interest, as compared with others that are dormant. It can also identify if an access problem is specific to an individual website/access point pair.

The co-ordination of data at network level also allows problems specific to an individual access point to be identified - in particular if access requests from an individual access point to multiple addresses are resulting in errors, this may be indicative of a problem with the backhaul connection to that access point, or with a user terminal connected to that access point, rather than with the addresses to which the requests are directed.

Another benefit is that, with the huge increase in the number of network access devices, it is more feasible to check only the most frequently-used websites rather than checking everything all the time, thereby lessening the communication overhead.

Embodiments of the invention will now be described, by way of example, with reference to the drawings, in which:

- Figure 1 depicts the network entities which co-operate to perform the invention

- Figure 2 depicts a wireless access point configured to operate according to the invention

- Figure 3 depicts a network management entity configured to operate according to the invention

- Figure 4 depicts a first stage in a process according to the invention

- Figure 5 depicts a second stage in a process according to the invention

Figure 1 depicts in schematic form a simplified network 6 connected to a network management system 3, an access point 2 and a target website server 5. A user terminal 1 can connect to the network 6 through the access point 2, and thereby communicate with the target website 5 and the management system 3. It will be recognised that in any practical system there will be many access points 2 and website servers 5 interconnected through the network 6, and each access point 2 may be connected to multiple user terminals 1.

The access point 2 may be a domestic wireless router, femtocell or enterprise femtocell connected wirelessly to the user terminal 1, or the two may have a wired connection (e.g. Ethernet). A wireless access point is depicted schematically in more detail in Figure 2. The functional elements depicted in Figure 2 are typically embodied in software or firmware. The access point 2 has a wireless interface 20 for communication with user terminals 1, and a network interface 22 for connection to a data communications network 6 such as the Internet. Data packets are translated from one medium to the other by a modem 23, and routing processes such as reading and writing address packets are controlled by a routing function 21.

In addition to these conventional functions, the access point operates a number of additional functions in accordance with an embodiment of the invention. A monitoring system 24 intercepts access requests generated by user terminals connected by the access point, and stores a record of such requests in a memory store 26. A counter 25 is used to determine the number of access requests made to each individual address, and this is used to update the store.

A test generation system 27 is arranged to transmit access requests periodically to the addresses stored in the data store, by way of the modem 23 and network interface 22. A response monitor system 28 intercepts responses to these access requests, and controls an alert generation system 29 which is configured to process messages received over the network interface 22 in response to such requests by transmitting reports by way of the modem 23 and network 6 to the management entity 3.

Figure 3 depicts a network management entity 3, which may be embodied in software, which co-operates with a number of access points 2 of the kind depicted in Figure 2. The functional elements include a report reception function 30 which is configured to receive reports from the various access points about possible outages of network based server platforms such as the one depicted at 5 in Figure 1. Such reports are stored in a database 31 for retrieval by a retrieval unit 32 which analyses the reports to identify patterns in the failure reports which may indicate a fault with a server 5 or with a user terminal 1, and reports to an appropriate fault management system 33, 34 accordingly.

The process by which the wireless access point 2 operates is depicted in Figure 4 and Figure 5, which illustrate two stages in the process. Figure 4 depicts a method for selecting which network addresses are to be monitored, and Figure 5 depicts the actual monitoring process. It should be noted that these processes can run concurrently, and in particular, the list of addresses to be monitored is continuously updated.

As shown in Figure 4, the request monitor unit 24 in wireless access point 2 detects access requests made by the users and records a list of URLs (Internet Protocol addresses) that are regularly used by the customer. To do this it first stores the address identities in a temporary counting store 25 (step 40). At each such successful access attempt, a comparison is made with addresses already in the store 25 (step 41) and any address which occurs more than a predetermined number of times within a specified period t (for example five times in seven days) is forwarded to the main memory store 26. Each successful access attempt is removed from the temporary store (step 42) once the time window t has expired for that access attempt.

Addresses may be removed from the permanent store 26 if they have not been accessed for a longer predetermined period.
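By way of illustration only, the selection stage of Figure 4 might be sketched as follows. This is a minimal sketch under stated assumptions rather than the claimed implementation: the class and method names are hypothetical, the threshold is treated as "at least five accesses in seven days", and the 30-day eviction period for the main store is an assumed value, since the description gives no figure for it.

```python
from collections import defaultdict, deque

DAY = 86400  # seconds


class FrequentAddressTracker:
    """Hypothetical sketch of the temporary store 25 and main store 26."""

    def __init__(self, threshold=5, window=7 * DAY, idle_limit=30 * DAY):
        self.threshold = threshold    # accesses required within the window
        self.window = window          # period t, in seconds
        self.idle_limit = idle_limit  # assumed eviction period (not from the source)
        self.recent = defaultdict(deque)  # temporary store: address -> access times
        self.monitored = {}               # main store: address -> last access time

    def record_access(self, address, now):
        """Record one successful access attempt seen at time `now` (seconds)."""
        times = self.recent[address]
        times.append(now)  # step 40: store the access in the temporary store
        # Step 42: discard accesses whose time window has expired.
        while times and now - times[0] > self.window:
            times.popleft()
        # Step 41: promote the address once it has been seen often enough,
        # or refresh its last-access time if it is already monitored.
        if len(times) >= self.threshold or address in self.monitored:
            self.monitored[address] = now

    def evict_idle(self, now):
        """Drop monitored addresses not accessed for longer than idle_limit."""
        idle = [a for a, last in self.monitored.items()
                if now - last > self.idle_limit]
        for address in idle:
            del self.monitored[address]
```

An address accessed once a day is promoted on its fifth access, whereas one accessed every three days never accumulates enough accesses inside a seven-day window to be promoted.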

As shown in Figure 5, the test generation system 27 identifies the addresses currently in the store 26 and tests each one from time to time to determine if they are still active, by sending an access request to each one (step 50) by way of the modem 23 and network interface 22. It is preferable that this is done when traffic is otherwise quiet, but it may be desirable to make successive tests on a particular website at different times of day as there may be a diurnal pattern of availability of certain websites which would not be detected if the test were made at the same time each day. The requests are flagged with an address corresponding to the response monitor unit 28 so that the responses are not forwarded to any of the user terminals.
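One possible way of making successive tests fall at different times of day is sketched below. It is an illustration only: the function name, the 24-hour base interval and the 5-hour stagger are assumptions chosen for the example, and the sketch does not attempt to model the preference for testing when traffic is quiet.

```python
from datetime import datetime, timedelta


def staggered_test_times(first, count,
                         base_interval=timedelta(hours=24),
                         stagger=timedelta(hours=5)):
    """Yield `count` test times whose time of day shifts by `stagger` each round.

    All parameter values are illustrative assumptions, not taken from the source.
    """
    current = first
    for _ in range(count):
        yield current
        # Adding the stagger to the base interval moves the next test
        # to a different time of day from the previous one.
        current += base_interval + stagger


# Example: five tests starting at 02:00 on 30 January 2018.
times = list(staggered_test_times(datetime(2018, 1, 30, 2, 0), 5))
```

With these assumed values the five tests fall at 02:00, 07:00, 12:00, 17:00 and 22:00 on successive rounds, so a fault that only appears at peak times would eventually be sampled.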

The response monitor unit 28 is alerted to the requests and responds accordingly when a response is received, according to the process depicted in Figure 5. If any of these URLs 5 is not responding, or responds with an error message (step 51), a report is generated by the alert generation unit 29 and sent by way of the modem 23, network interface 22, and Internet 6, to the management entity 3. The input 30 of the management entity 3 receives reports from multiple access points and stores them in a store 31 for analysis. The reports are analysed in a retrieval unit 32. If several access points report a failure of the same target network address 5, this is flagged as a potential fault with the target address and reported to a server fault management system 33, for example as a possible denial of service (DoS) issue. However, if more than one, or all, of the target addresses tested by an individual access point are not responding, the report analyser 32 may identify this as a potential problem with the access point, for example with its security settings or backhaul connection, and report to the backhaul fault management system 34. (It will be appreciated that only faults short of complete failure of the backhaul connection can be reported in this way.)
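The analysis performed by the retrieval unit 32 might, purely as an illustration, be sketched as a grouping of failure reports by address and by access point. The function name, the report format (pairs of access point identifier and address) and the thresholds of two are assumptions made for the example; the description does not specify them.

```python
from collections import defaultdict


def classify_reports(reports, min_access_points=2, min_addresses=2):
    """Group failure reports to find suspect addresses and suspect access points.

    `reports` is an iterable of (access_point_id, address) pairs, a
    hypothetical format chosen for this sketch.
    """
    aps_by_address = defaultdict(set)
    addresses_by_ap = defaultdict(set)
    for access_point, address in reports:
        aps_by_address[address].add(access_point)
        addresses_by_ap[access_point].add(address)

    # Same address failing from several access points: potential server fault.
    suspect_addresses = sorted(
        a for a, aps in aps_by_address.items() if len(aps) >= min_access_points)
    # Several addresses failing from one access point: potential access
    # point or backhaul fault.
    suspect_access_points = sorted(
        ap for ap, addrs in addresses_by_ap.items() if len(addrs) >= min_addresses)
    return suspect_addresses, suspect_access_points


reports = [("ap1", "a.example"), ("ap2", "a.example"),
           ("ap3", "b.example"), ("ap3", "c.example")]
suspect_addresses, suspect_access_points = classify_reports(reports)
```

In this example the address a.example would be referred to the server fault management system 33 and access point ap3 to the backhaul fault management system 34. A practical analyser would probably also need to discount addresses already flagged as faulty before blaming an access point, which this sketch does not do.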

The management entity 3 can process such reports to identify clusters or patterns to help identify the cause of the issue. This allows the network operator to be more proactive, knowing about the issue and fixing it even before the customer notices. As the access point reports potential faults, fault detection can be determined even if no user terminal is currently connected to the access point, so that the problem can be reported to the network management system 3 before the user needs to use the address.

The response monitor system 28 stores the status of the URL, and when the next check is performed by the test generation system 27 the response is again analysed by the monitor unit. After a certain time t (steps 54, 55) the test generation system performs another check. The process depicted in Figure 5 is arranged such that only a change of status is reported to the management entity 3. If the address is still returning a fault report (step 52) the access point does not inform the management entity 3 again, but if it has returned to activity the management entity is informed (step 53), so that the management entity stops taking any further action.

The interval t' between tests may be shorter when a URL is on record as faulty (step 55) than the time t when it is operating normally (step 54), so that updates are received more frequently.
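The change-of-status reporting and the choice between the intervals t and t' might be sketched as follows. This is a hedged illustration: the class name, the report labels and the interval values (one hour and ten minutes) are assumptions, as is the choice to treat an address with no stored status as previously healthy, on the basis that it entered the store after successful accesses.

```python
class StatusChangeReporter:
    """Hypothetical sketch of the response monitor 28 and alert logic 29."""

    def __init__(self, normal_interval=3600, faulty_interval=600):
        self.normal_interval = normal_interval  # t, seconds (assumed value)
        self.faulty_interval = faulty_interval  # t', seconds (assumed value)
        self.last_ok = {}                       # address -> last known status

    def handle_result(self, address, ok):
        """Return (report, next_interval) for one test result.

        `report` is "fault", "recovered" or None when the status is unchanged.
        """
        previous_ok = self.last_ok.get(address, True)  # assume healthy at first
        self.last_ok[address] = ok
        if previous_ok and not ok:
            report = "fault"        # step 51: new failure, alert the manager
        elif ok and not previous_ok:
            report = "recovered"    # step 53: address is active again
        else:
            report = None           # step 52: no change, send nothing
        # Steps 54 and 55: test faulty addresses more often than healthy ones.
        interval = self.normal_interval if ok else self.faulty_interval
        return report, interval
```

For a sequence of results fail, fail, success, success, only the first failure and the first success generate a report, and the shorter interval applies only while the address is recorded as faulty.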