Title:
AERONAUTICAL INFORMATION PROCESSING
Document Type and Number:
WIPO Patent Application WO/2011/036499
Kind Code:
A1
Abstract:
A computer-implemented method of extracting aeronautical information from a file (424) comprising computer-readable text comprises identifying (308) a section title (112) in the file and identifying (310) a table (102) in the file based on the section title (112). Aeronautical information is then extracted from the table (102). The content and layout of text on a page of the file (424) is substantially the same as that of a corresponding page of an aeronautical information publication (420, 426). The content of the aeronautical information publication (420, 426) is preferably defined by Annex 15 to the Convention on International Civil Aviation.

Inventors:
BRAY CHRISTOPHER (GB)
MALAN STEFAN (GB)
Application Number:
PCT/GB2010/051617
Publication Date:
March 31, 2011
Filing Date:
September 28, 2010
Assignee:
NATS SERVICES LTD (GB)
BRAY CHRISTOPHER (GB)
MALAN STEFAN (GB)
International Classes:
G06F40/143
Foreign References:
EP1657649A2 2006-05-17
GB2459200A 2009-10-21
US20060271843A1 2006-11-30
EP1280075A2 2003-01-29
US20060104511A1 2006-05-18
Other References:
JEAN-PIERRE CHANOD ET AL: "From Legacy Documents to XML: A Conversion Framework", 1 January 2005, RESEARCH AND ADVANCED TECHNOLOGY FOR DIGITAL LIBRARIES LECTURE NOTES IN COMPUTER SCIENCE;;LNCS, SPRINGER, BERLIN, DE, PAGE(S) 92 - 103, ISBN: 978-3-540-28767-4, XP019018385
DAVID PINTO ET AL: "Table Extraction Using Conditional Random Fields", SIGIR 2003. PROCEEDINGS OF THE 26TH. ANNUAL INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL. TORONTO, CANADA, JULY 28 - AUG. 1, 2003; [ANNUAL INTERNATIONAL ACM-SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN IN, vol. CONF. 26, 28 July 2003 (2003-07-28), pages 235 - 242, XP002441378, ISBN: 978-1-58113-646-3
ISHITANI Y: "Document transformation system from papers to XML data based on pivot XML document method", DOCUMENT ANALYSIS AND RECOGNITION, 2003. PROCEEDINGS. SEVENTH INTERNATIONAL CONFERENCE ON AUG. 3-6, 2003, PISCATAWAY, NJ, USA, IEEE, 3 August 2003 (2003-08-03), pages 250 - 255, XP010656617, ISBN: 978-0-7695-1960-9, DOI: 10.1109/ICDAR.2003.1227668
BORIS CHIDLOVSKII ET AL: "Supervised learning for the legacy document conversion", PROCEEDINGS OF THE 2004 ACM SYMPOSIUM ON DOCUMENT ENGINEERING. (DOCENG 2004). MILWAUKEE, WI, OCT. 28 - 30, 2004; [ACM SYMPOSIUM ON DOCUMENT ENGINEERING], NEW YORK, NY : ACM, US, 28 October 2004 (2004-10-28), pages 220 - 228, XP002442300, ISBN: 978-1-58113-938-9
ALTAMURA O ET AL: "Transforming Paper documents into XML Format with WISDOM++", INTERNET CITATION, 2000, XP002317735, Retrieved from the Internet [retrieved on 20050215]
Attorney, Agent or Firm:
CUPITT, Philip (26 Caxton Street, London, Greater London SW1H 0RJ, GB)
Claims:
CLAIMS

1. A computer-implemented method of extracting aeronautical information from a file, the file comprising computer-readable text wherein the content and layout of text on a page of the file is substantially the same as that of a corresponding page of an aeronautical information publication, the method comprising:

identifying a section title in the file;

identifying a table in the file based on the section title; and

extracting aeronautical information from the table.

2. A method in accordance with claim 1, wherein identifying a section title comprises automatically comparing a text string in the file with a text string specified by a standard defining the content of the aeronautical information publication.

3. A method in accordance with claim 1 or claim 2, wherein identifying a section title comprises receiving a first user input to associate a text string in the file with a text string specified by a standard defining the content of the aeronautical information publication.

4. A method in accordance with any one of the preceding claims, wherein identifying a table comprises automatically identifying one or more columns of the table by analysing positions of two or more section titles in a page of the file.

5. A method in accordance with any one of the preceding claims, wherein identifying a table comprises receiving a second user input to identify a portion of a table comprising the aeronautical information.

6. A method in accordance with claim 5, further comprising storing coordinates defining the position in a page of the file of the portion of the table identified by the second user input.

7. A method in accordance with any one of the preceding claims, wherein identifying a table comprises automatically identifying a portion of the table comprising the aeronautical information based on the position of a portion of a previously-identified table that comprises aeronautical information.

8. A method in accordance with claim 7, wherein the position of the portion of the previously-identified table is defined by its coordinates in a page of a file.

9. A method in accordance with any one of the preceding claims, wherein extracting aeronautical information comprises automatically identifying a text string in the table based on the position of the text string in a page of the file.

10. A method in accordance with any one of the preceding claims, wherein extracting aeronautical information comprises receiving a third user input to identify a text string in the table and to identify an element of a schema.

11. A method in accordance with claim 10, further comprising validating the text string identified by the third user input against the schema.

12. A method in accordance with claim 10 or claim 11, wherein extracting aeronautical information further comprises:

writing the element identified by the third user input to an output file; and

writing the text string identified by the third user input to the output file as a value of an attribute of the element.

13. A method in accordance with claim 12, further comprising transforming the output file to a second output file, wherein the structure and content of the second output file is defined by a second schema.

14. A method in accordance with any one of claims 3 to 13, further comprising storing auditing information on the first, second and/or third user input in an output file.

15. A method in accordance with any one of the preceding claims, further comprising creating the file by scanning and performing optical character recognition on one or more pages of a paper aeronautical information publication.

16. A computer readable medium comprising instructions which, when executed by a suitable computer, cause the computer to perform a method in accordance with any one of the preceding claims.

17. An apparatus for extracting aeronautical information from a file, the file comprising computer-readable text wherein the content and layout of text on a page of the file is substantially the same as that of a corresponding page of an aeronautical information publication, the apparatus comprising:

means for identifying a section title in the file;

means for identifying a table in the file based on the section title; and

means for extracting aeronautical information from the table.

18. An apparatus in accordance with claim 17, wherein the means for identifying a section title comprises means for automatically comparing a text string in the file with a text string specified by a standard defining the content of the aeronautical information publication.

19. An apparatus in accordance with claim 17 or claim 18, wherein the means for identifying a section title comprises means for receiving a first user input to associate a text string in the file with a text string specified by a standard defining the content of the aeronautical information publication.

20. An apparatus in accordance with any one of claims 17 to 19, wherein the means for identifying a table comprises means for automatically identifying one or more columns of the table by analysing positions of two or more section titles in a page of the file.

21. An apparatus in accordance with any one of claims 17 to 20, wherein the means for identifying a table comprises means for receiving a second user input to identify a portion of a table comprising the aeronautical information.

22. An apparatus in accordance with claim 21, further comprising means for storing coordinates defining the position in a page of the file of the portion of the table identified by the second user input.

23. An apparatus in accordance with any one of claims 17 to 22, wherein the means for identifying a table comprises means for automatically identifying a portion of the table comprising the aeronautical information based on the position of a portion of a previously-identified table that comprises aeronautical information.

24. An apparatus in accordance with claim 23, wherein the position of the portion of the previously-identified table is defined by its coordinates in a page of a file.

25. An apparatus in accordance with any one of claims 17 to 24, wherein the means for extracting aeronautical information comprises means for automatically identifying a text string in the table based on the position of the text string in a page of the file.

26. An apparatus in accordance with any one of claims 17 to 25, wherein the means for extracting aeronautical information comprises means for receiving a third user input to identify a text string in the table and to identify an element of a schema.

27. An apparatus in accordance with claim 26, further comprising means for validating the text string identified by the third user input against the schema.

28. An apparatus in accordance with claim 26 or claim 27, wherein the means for extracting aeronautical information further comprises:

means for writing the element identified by the third user input to an output file; and

means for writing the text string identified by the third user input to the output file as a value of an attribute of the element.

29. An apparatus in accordance with claim 28, further comprising means for transforming the output file to a second output file, wherein the structure and content of the second output file is defined by a second schema.

30. An apparatus in accordance with any one of claims 19 to 29, further comprising means for storing auditing information on the first, second and/or third user input in an output file.

31. An apparatus in accordance with any one of claims 17 to 30, further comprising means for creating the file by scanning and performing optical character recognition on one or more pages of a paper aeronautical information publication.

32. A computer-implemented method of extracting information from a file, the file comprising computer-readable text and wherein the file defines the layout of the computer-readable text on a page of the file, the method comprising:

identifying a title in the file;

identifying a table in the file based on the title; and

extracting information from the table.

33. A method substantially as herein described and/or as illustrated in any of the accompanying drawings.

34. An apparatus substantially as herein described and/or as illustrated in any of the accompanying drawings.

Description:
AERONAUTICAL INFORMATION PROCESSING

FIELD OF THE INVENTION

The invention relates to the extraction of aeronautical information from aeronautical information publications.

BACKGROUND OF THE INVENTION

Aeronautical Information Services (AIS) maintain and publish information necessary for the safety, regularity and efficiency of air navigation. Aeronautical information published by AISs includes, for example, information on:

• Air Traffic Services (ATS), including their frequency channel assignments, callsigns and geographical coverage;

• ATS routes, including route designators that identify a particular route and the coordinates of points on the route;

• radio navigational aids, including their identifiers, frequencies and coordinates;

• navigation warnings, including the coordinates of areas that might present a danger to aircraft; and

• aerodromes, including their names, coordinates, elevation, operational hours and runway information (such as the designation, length, bearing, width and strength of each runway at an aerodrome).

Generally speaking, each country has its own AIS, which is responsible for maintaining and publishing aeronautical information for that country. The AIS in the United Kingdom (UK) is NATS, which is associated with the applicant for the present patent application. The aeronautical information published by an AIS is known as an Aeronautical Information Publication (AIP).

Annex 15 to the Convention on International Civil Aviation (which is commonly known as the Chicago Convention) defines how an AIS shall receive and/or originate, collate or assemble, edit, format, publish/store and distribute specified aeronautical information. Thus, Annex 15 is an international standard that governs, amongst other things, the content of an AIP. Whilst Annex 15 defines certain minimum requirements for the content of AIP, it does not require the content to be formatted or presented in any particular way. For example, Annex 15 does not require that a particular wording be used to identify each item of information, that certain information be presented in a tabular or non-tabular form, or even that particular units of measurement be used for numerical quantities.

For example, Fig. 11 is an excerpt from Section AD 2 of Annex 15, which contains information on aerodromes. Section AD 2.2 (illustrated by reference sign 1100) will be used as an example at various places in the following description. Section AD 2.2 1100 comprises a section title 1110, which is identified by the text string "AD 2.2 Aerodrome Geographical and Administrative Data" preceded by a four character location indicator that uniquely identifies a particular aerodrome. Section AD 2.2 1100 also lists a number of categories of information 1101, 1102, 1103 that must be provided in an AIP. Note, however, that Section AD 2.2 does not require that a particular wording be used to identify each category of information or specify how this information must be formatted.

The absence of any requirements for the format or presentation of information means that different countries' AIPs can vary greatly in their manner of presentation whilst still complying with Annex 15. Indeed, it is even possible that comparable information within a single AIP may be presented in a different way, whilst still complying with Annex 15. For example, an AIP may present information on two aerodromes in subtly different tabular formats.

Figs 1 and 2 are two examples of pages from the UK's AIP, which provide information on Guernsey and Biggin Hill aerodromes respectively. The AIP page shown in Fig. 1 contains three tables 102, 104, 106, which respectively contain information on Guernsey aerodrome's geographical and administrative data, operational hours, and handling services and facilities, as indicated by the section titles 112, 114 and 116. Table 102 contains the information required by Section AD 2 of Annex 15, as shown in Fig. 11. The AIP page shown in Fig. 2 also, coincidentally, contains three tables 200, 220, 230. Table 200 contains information on Biggin Hill aerodrome's approach and runway lighting, as indicated by the section title 210. Table 200 presents this information using nine columns, the first three of which are indicated with reference signs 201, 202 and 203. It is emphasised that Annex 15 does not require that this sort of information be presented in tabular form, let alone require tables to have a particular format. It can also be seen from the first four characters of each of the section titles 112, 114, 116, 210 that the location indicator for Guernsey aerodrome is 'EGJB' and the location indicator for Biggin Hill aerodrome is 'EGKB'.
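The observation that a section title begins with a four character location indicator can be illustrated with a short sketch. It is purely an illustration and not part of the method described later: the regular expression, the function name `parse_section_title` and the sample title string are assumptions, and real AIPs may word or space their titles differently.

```python
import re

# Assumed shape of an aerodrome section title: a four-letter location
# indicator, the section number "AD 2.<n>", then free text. This pattern is
# an illustration only; neither Annex 15 nor any AIP guarantees it.
SECTION_TITLE = re.compile(
    r"^(?P<indicator>[A-Z]{4})\s+AD\s+2\.(?P<sub>\d+)\s+(?P<text>.+)$"
)


def parse_section_title(line):
    """Return (indicator, section, text), or None if the line does not match."""
    match = SECTION_TITLE.match(line.strip())
    if match is None:
        return None
    return match["indicator"], "AD 2." + match["sub"], match["text"]


# Hypothetical title string modelled on the examples discussed above.
print(parse_section_title("EGJB AD 2.2 AERODROME GEOGRAPHICAL AND ADMINISTRATIVE DATA"))
```

A line that does not follow the assumed shape simply yields `None`, which reflects the point made above that the wording of such titles cannot be relied upon.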

Differences in the manner of presentation do not cause a problem for human readers of aeronautical information. The typical user of aeronautical information has sufficient intelligence to be able to account for variations in the format and presentation of AIPs in order to extract pertinent information. However, differences in format and presentation mean that it is very difficult for a computer to extract pertinent information from AIPs, because there is no known way of defining rules that allow a computer to identify a specific piece of information with a reasonable degree of accuracy. Annex 15 was developed in 1952, long before computers were in widespread use, and so it is understandable that Annex 15 was never intended to facilitate automated extraction of information from AIPs.

Annex 15 requires AISs to publish AIPs in a paper-based form. AIPs typically contain a very large amount of information and, for example, the paper-based AIP for the UK is approximately 1,800 pages long, whilst the AIP for the USA is considerably longer. AIPs are also updated on a regular basis. For example, the AIP for the UK is updated every twenty-eight days. Different AISs use different techniques to produce their AIPs. Some AISs produce AIPs using word processing software or desktop publishing (DTP) software, whilst other AISs produce AIPs using typewriters. Due to the longstanding requirement to publish AIPs in paper-based form and the different techniques used to produce AIPs, there is no standard computer file format for the publication of AIPs.

There is nevertheless a desire to process aeronautical information using computers. Thus, the Aeronautical Information Exchange Model (AIXM) has been developed by the US Federal Aviation Administration (FAA) and the European Organisation for the Safety of Air Navigation (EUROCONTROL) to define a structured way of storing aeronautical information in a computer. Detailed information on AIXM can be found at the following website: www.aixm.aero. AIXM has two components, namely the Aeronautical Information Conceptual Model (AICM) and the AIXM XML Schema.

AICM is a conceptual/logical model that describes aeronautical information. AICM defines all the important features, properties (i.e. attributes and associations) and business rules that make up aeronautical information. The AICM is expressed in Unified Modelling Language (UML), which is an object-oriented language. Features describe important aeronautical entities, such as aerodromes, runways and routes. Features are represented as classes in UML. Attributes characterise the features. For example, an aerodrome feature is a class that has attributes for specifying the name, location and elevation of an aerodrome, whilst a runway feature is a class that has attributes for specifying the designation, location, length and width of a runway. Associations define relationships between features. For example, an association can specify that each aerodrome comprises zero or more runways. Finally, business rules check the validity of data. For example, a business rule may specify that a runway should be located within a particular distance of the aerodrome with which it is associated; if the runway is located further away than this distance, then this may indicate errors in the attributes specifying the locations of the aerodrome and/or runway. AICM can be used to define the structure of a database in which to store aeronautical information.
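The notions of feature, attribute, association and business rule can be illustrated with a minimal sketch. It does not reproduce AICM, which is expressed in UML: the class names, field names, the 10 km threshold and the use of the haversine formula are all assumptions chosen only to mirror the aerodrome and runway example given above.

```python
from dataclasses import dataclass, field
from math import asin, cos, radians, sin, sqrt


@dataclass
class Runway:
    """A feature with attributes (names are hypothetical, not AICM names)."""
    designation: str
    latitude: float   # decimal degrees
    longitude: float  # decimal degrees
    length_m: float
    width_m: float


@dataclass
class Aerodrome:
    name: str
    latitude: float
    longitude: float
    elevation_m: float
    # Association: each aerodrome comprises zero or more runways.
    runways: list = field(default_factory=list)


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance (haversine) on a sphere of radius 6371 km."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def runways_too_far(aerodrome, max_km=10.0):
    """Business rule: list runways lying further than max_km from the aerodrome."""
    return [
        r.designation
        for r in aerodrome.runways
        if distance_km(aerodrome.latitude, aerodrome.longitude, r.latitude, r.longitude) > max_km
    ]


# Made-up values, used only to exercise the rule.
example = Aerodrome("Example", 0.0, 0.0, 100.0, [
    Runway("09/27", 0.0, 0.01, 1500.0, 45.0),
    Runway("18/36", 1.0, 1.0, 1200.0, 30.0),
])
print(runways_too_far(example))
```

A runway flagged by such a rule would not necessarily be wrong; as noted above, it merely indicates that the location attributes of the aerodrome and/or runway may contain an error.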

The AIXM XML Schema implements AICM as an Extensible Markup Language (XML) Schema. XML is well known in the field of computing as a language that describes a class of data objects called XML documents. An XML document is a file containing Unicode text that defines one or more data items known as elements. An XML schema defines the structure and content of a particular type of XML document. An XML schema is itself an XML document. The AIXM XML Schema is an XML document in the World Wide Web Consortium's (W3C) XML Schema Language, and is stored in a plurality of files having a .XSD (XML Schema Definition) file extension. Hence, the AIXM XML Schema is an XML document that defines the structure and content of XML documents that are suitable for storing aeronautical data in accordance with the AICM.

The aeronautical information contained in an AIP can be represented in an XML document that complies with the AIXM XML Schema. However, it is a very difficult task to extract information from existing AIPs and to place that information into an AIXM-compliant XML document. This is because, as noted above, unpredictability in the format and presentation of AIPs means that there are no known automated techniques for extracting information from existing AIPs, even though those AIPs comply with Annex 15. Hence, current techniques for converting Annex 15-compliant AIPs to AIXM-compliant XML documents require a human to read and interpret the AIP, and then to type information into a computer so that it can be inserted into the appropriate part of the XML document.

Current practice in the AIS industry is to use Standard Input Forms (SIFs) to facilitate the typing of aeronautical information into the computer. SIFs provide a graphical user interface whereby the user types aeronautical information into an appropriate location in the SIF, such that the user does not need to edit an XML document directly. SIFs can also perform basic checks to ensure that the typed information is valid. However, even with the use of SIFs, typing aeronautical information is still a time-consuming and error-prone task. It is particularly important that conversion of AIPs to XML documents does not introduce errors, since errors may prejudice the safety of aircraft.

Thus, there is a need for a way to extract information from Annex 15-compliant AIPs quickly and with few errors. There is also a need to convert information from Annex 15-compliant AIPs to AIXM-compliant XML documents quickly and with few errors.

SUMMARY OF THE INVENTION

A first aspect of the invention provides a computer-implemented method of extracting aeronautical information from a file, the file comprising computer-readable text wherein the content and layout of text on a page of the file is substantially the same as that of a corresponding page of an aeronautical information publication, the method comprising: identifying a section title in the file; identifying a table in the file based on the section title; and extracting aeronautical information from the table. A further aspect of the invention provides a computer readable medium comprising instructions which, when executed by a suitable computer, cause the computer to perform said method. A further aspect of the invention provides an apparatus for extracting aeronautical information from a file, the file comprising computer-readable text wherein the content and layout of text on a page of the file is substantially the same as that of a corresponding page of an aeronautical information publication, the apparatus comprising: means for identifying a section title in the file; means for identifying a table in the file based on the section title; and means for extracting aeronautical information from the table.

The invention described herein is primarily intended to extract aeronautical information from an AIP and, as explained above, unpredictability in the format and presentation of AIPs gives rise to particular problems in the automated extraction of aeronautical data that are overcome by the invention. However, the invention can also be used to extract information from other types of document. In particular, the invention can be used to extract information from documents that do not contain aeronautical information, but which comprise one or more titles and in which information is contained in one or more tables. Thus, a further aspect of the invention provides a computer-implemented method of extracting information from a file, the file comprising computer-readable text and wherein the file defines the layout of the computer-readable text on a page of the file, the method comprising: identifying a title in the file; identifying a table in the file based on the title; and extracting information from the table. A further aspect of the invention provides a computer-readable medium comprising instructions which, when executed by a suitable computer, cause the computer to perform said method. A further aspect of the invention provides an apparatus comprising means for performing said method.

Preferred features of the aspects of the invention that extract aeronautical information can also be applied to the aspects of the invention that extract other types of information in any appropriate combination.

BRIEF DESCRIPTION OF THE DRAWINGS

Preferred features of the invention will now be described, purely by way of example, with reference to the accompanying drawings, wherein like elements will be indicated using like reference signs, and in which:

Fig. 1 is a first example of a page of an AIP;

Fig. 2 is a second example of a page of an AIP;

Fig. 3 is a flow chart of a method of extracting aeronautical information from an AIP;

Fig. 4 is a flow chart illustrating the conversion between file formats performed by the method of Fig. 3;

Fig. 5 is an example of Annex 15 section titles;

Fig. 6 is an excerpt from an AIP definition file;

Fig. 7 is a flow chart of a method for identifying Annex 15 section titles;

Fig. 8 is a flow chart of a method for identifying tables in an AIP;

Fig. 9 is a flow chart of a method for receiving user input to extract aeronautical information from a file;

Fig. 10 is a screenshot of a user interface for resolving Annex 15 section titles;

Fig. 11 is an excerpt from Annex 15 to the Convention on International Civil Aviation;

Fig. 12 is an example of XML elements that are created by resolving a section title;

Fig. 13 is an example of an auditing element;

Fig. 14 is a screenshot of a user interface for resolving tables;

Figs. 15(a), 15(b) and 15(c) collectively show an excerpt from an Annex 15 XML schema;

Fig. 16 is a screenshot of a graphical user interface for receiving user input to extract aeronautical information;

Fig. 17 is a screenshot of a form used by the user interface of Fig. 16; and

Figs. 18(a) and 18(b) collectively show an example of XML elements that are created by resolving a table.

DETAILED DESCRIPTION

A method 300 of extracting aeronautical information from an AIP will now be described with reference to Figs. 3 and 4. Fig. 3 shows the steps of the method, whilst Fig. 4 shows how aeronautical information is converted from one file format to another by the method. As will be clear from the following description, steps 302 to 306 and 314 of the method 300 are preferably performed using off-the-shelf software. However, steps 308 to 312 of the method 300 are preferably performed by one or more novel computer programs, which will be referred to in the following description as a Document Transform System (DTS). The DTS may optionally control the operation of certain off-the-shelf computer programs during steps 302 to 306 and 314 (using appropriate calls to an application programming interface, for example), so as to provide a seamless conversion process in which the user interacts only with the DTS and does not interact with the off-the-shelf computer programs. However, it will be appreciated that the DTS could also comprise computer program code for performing any of steps 302, 304, 306 and/or 314 itself, without the use of off-the-shelf software.

The AIP that is to be converted may be a paper AIP 420 or an electronic AIP 426. In the following description, the term 'paper AIP' is preferably to be understood to mean one or more pieces of paper having aeronautical information printed, typed or written thereon, whilst the term 'electronic AIP' is preferably to be understood to mean one or more computer-readable files having aeronautical information stored therein. As noted previously, some AISs produce AIPs using computer-based technologies such as word processing, in which case an electronic AIP 426 may be available. Other AISs produce AIPs using older technologies such as typewriters, in which case an electronic AIP 426 will not be available. However, since Annex 15 requires AISs to publish AIPs in paper-based form, it is guaranteed that a paper AIP 420 will be available. Whether the AIP is a paper AIP 420 or an electronic AIP 426, it is assumed that the AIP complies with Annex 15. A paper AIP 420 and an electronic AIP 426 each comprise one or more pages. The preferred starting point for the method 300 is an electronic AIP 426, since this avoids the risk of introducing errors during the steps of scanning 302 and optical character recognition 304 (which are described below). If an electronic AIP 426 is not available or if the electronic AIP 426 is in a file format that cannot easily be interpreted, then the starting point for the method 300 is a paper AIP 420.

If the starting point for the method 300 is a paper AIP 420, the first step 302 in the method 300 is to scan the paper AIP 420. As is well known in the art, scanning is a process for taking a digital image of a physical document, so as to generate an electronic representation of the physical document. The output of the scanning step 302 is preferably a file 422 in Portable Document Format (PDF), which is a well known file format. In this case, the AIP 420 will be represented as one or more raster images in the PDF file, rather than as computer-readable text (such as text in American Standard Code for Information Interchange (ASCII) or Unicode formats). The output of the scanning step 302 may alternatively be a file in any other suitable format, such as Tagged Image File Format (TIFF).

The second step 304 in the method 300 is to perform optical character recognition (OCR) on the file generated by the scanning step 302. OCR is a process for extracting alphanumeric characters (including letters, numbers, white space and punctuation) from an image, so as to generate one or more files containing computer-readable text corresponding to the alphanumeric characters that are present in the image. The term 'computer-readable text' as used herein is preferably understood to mean a string of one or more alphanumeric characters (including, where appropriate, letters, numbers, white space and punctuation), wherein each character is represented in accordance with a particular character encoding scheme. Suitable character encoding schemes may be ASCII or Unicode. The output of the OCR step 304 is preferably a file 422 in PDF format. The PDF file 422 generated by the scanning step 302 may be overwritten by the PDF file generated by the OCR step 304. Each page of the PDF file 422 will correspond to a page of the paper AIP 420, and each page of the PDF file 422 will contain strings of computer-readable alphanumeric text positioned at approximately the same position on the page as that at which the same text was printed on the corresponding page of the paper AIP 420. The PDF file 422 generated by the OCR step 304 preferably also contains any text formatting (such as underlining or italicisation of text) and/or non-textual graphical elements (such as lines that represent the borders around tables in the AIP, or drawings in the AIP) that were present in the paper AIP 420. Hence, the net result of the scanning step 302 and OCR step 304 is to generate a file 422 containing computer-readable text, wherein the content and layout of the text in each page of the file 422 is substantially the same as that of the corresponding page of the paper AIP 420. The term 'layout' as used herein is preferably to be understood to mean the positioning of items (such as text, tables and diagrams) on the page. It will be appreciated that the file generated by the OCR step 304 need not necessarily be in PDF format, and that any other suitable format may be used, such as Scalable Vector Graphics (SVG) or PostScript. The OCR step 304 can be performed using any suitable software; for example, it is possible to use off-the-shelf software such as Acrobat™ Professional, which is marketed by Adobe Systems Incorporated.

Assuming that the file generated by the OCR step 304 is not already in SVG format, the third step 306 in the method is to convert that file to SVG format. SVG is an XML-based image file format, developed by the W3C, that represents images as a combination of computer-readable text, vector graphics and raster graphics. In the example shown in Figs. 1 and 2, the converting step 306 converts the PDF file 422 to an SVG file 424. The converting step 306 can be performed using any suitable software; for example, off-the-shelf software such as the Mars plug-in for Acrobat™ Professional can be used. The Mars plug-in is marketed by Adobe Systems Incorporated.
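Because SVG is XML-based, the text strings of a page and their positions can be read with an ordinary XML parser. The following is a minimal sketch only. It assumes that each string is held in a plain `<text>` element carrying single numeric `x` and `y` attributes; files produced by a real converter may instead use `<tspan>` children, coordinate lists or transforms, which this sketch does not handle. The function name `text_items` and the sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of SVG elements, in the {uri}tag form that ElementTree uses.
SVG_NS = "{http://www.w3.org/2000/svg}"


def text_items(svg_source):
    """Return (x, y, string) for every <text> element, ordered top-to-bottom
    and then left-to-right.

    Assumes single numeric x and y attributes on each <text> element.
    """
    root = ET.fromstring(svg_source)
    items = []
    for node in root.iter(SVG_NS + "text"):
        x = float(node.get("x", "0"))
        y = float(node.get("y", "0"))
        string = "".join(node.itertext()).strip()
        if string:
            items.append((x, y, string))
    return sorted(items, key=lambda item: (item[1], item[0]))


# Hypothetical page fragment, not taken from any real AIP.
sample = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<text x="40" y="100">ARP coordinates</text>'
    '<text x="20" y="50">EGJB AD 2.2</text>'
    '<text x="20" y="100">1</text>'
    '</svg>'
)
for item in text_items(sample):
    print(item)
```

Keeping the coordinates alongside each string preserves the layout of the page, which is the property of the file that the remainder of the method relies upon.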

If the starting point for the method 300 was an electronic AIP 426, then different steps may be performed in order to generate an SVG file 424. For example, if the electronic AIP 426 is a PDF file that does not contain computer-readable text, then the method 300 will commence at step 304. If the electronic AIP 426 is a PDF file containing computer-readable text (which may be the case if the AIP has been produced using word processing software or DTP software), then the method 300 will commence at step 306. If the electronic AIP 426 is in SVG format, then the method will commence at step 308. If the electronic AIP 426 is in some other format, then one or more steps of file format conversion (not shown in Fig. 1) must be performed to convert the file to PDF or SVG format, such that the method 300 can commence from any of steps 304, 306 or 308. Whilst the examples of the invention described herein refer to an SVG file 424, it will be appreciated that other suitable file formats may be used instead of SVG. Whichever file format is used, the file format should be capable of representing pages of computer-readable text, wherein the content and layout of the text in each page of the file in the file format is substantially the same as that of the corresponding page of the paper AIP 420 or electronic AIP 426. Preferably, the file format should be XML-based.

Once an SVG file 424 has been created, the next step 308 in the method 300 is for the DTS to identify Annex 15 section titles in the SVG file 424. The term 'Annex 15 section title' as used herein is preferably to be understood to mean a string of alphanumeric characters which identifies a particular section or subsection of an AIP that is required by Annex 15 to the Convention on International Civil Aviation. The term 'Annex 15 section title' does not, however, necessarily mean that the alphanumeric characters that identify the section or subsection are specified by Annex 15. For the sake of brevity, the term 'section title' is used interchangeably with the term 'Annex 15 section title' herein.

Fig. 5 shows an example of Annex 15 section titles. Annex 15 section titles define several hierarchies of information within the AIP. Fig. 5 illustrates that Annex 15 defines a hierarchy that contains all of the information on aerodromes, whose root (parent) level is identified in an AIP by the section title "AD". Annex 15 further defines that the next lowest level in the hierarchy (i.e. the child level) contains up to four items, which are identified in an AIP by the section titles "AD 0.", "AD 1. AERODROMES AND HELIPORTS - INTRODUCTION", "AD 2. AERODROMES" and "AD 3. HELIPORTS". These section titles each identify subsections of the "AD" section. Annex 15 further defines that the next lowest level of the hierarchy (i.e. the grandchild level) contains further members, which are sub-sections of the "AD 2. AERODROMES" and "AD 3. HELIPORTS" sub-sections. For example, Annex 15 defines the section titles "AD 2.1 Aerodrome Location Indicator and Name", "AD 2.2 Aerodrome Geographical and Administrative Data" and "AD 2.3 Operational Hours", which identify sub-sections of an AIP that contain specific categories of information on aerodromes.

Annex 15 does not specify the precise wording that must be used to identify sections or subsections in the aerodrome "AD" hierarchy. However, Annex 15 does require each of the sections and subsections to contain certain information. For example, as shown in Fig. 11, Annex 15 requires the "AD 2.2 Aerodrome Geographical and Administrative Data" section of the AIP relating to a particular aerodrome to contain information including: the geographical coordinates and site of an aerodrome reference point (illustrated by reference numeral 1101); the direction and distance of the aerodrome reference point from the centre of the city or town which the aerodrome serves (illustrated by reference numeral 1102); and the aerodrome elevation and reference temperature (illustrated by reference numeral 1103). These items of information respectively correspond to the following rows in table 102 of Fig. 1: "ARP Co-ordinates and site at AD" (denoted by reference numeral 120); "Direction and distance from city" (denoted by reference numeral 122); and "Elevation/Reference temperature" (denoted by reference numeral 122). Whilst the text strings in the left column of table 102 (i.e. "ARP Co-ordinates and site at AD", "Direction and distance from city" and "Elevation/Reference temperature" etc.) identify particular information in the AIP that is required by Annex 15, the strings themselves are not defined in Annex 15. Hence, the alphanumeric strings used to identify a particular piece of information can vary between different AIPs (and can even vary between different pages of the same AIP), which makes it difficult to identify these strings and extract the information that they identify.

The section title identification step 308 uses an AIP definition file 429 to identify Annex 15 section titles in the SVG file 424. The AIP definition file 429 is illustrated by an ellipse in Fig. 4 to denote that it is an input to the method, rather than a file to which the AIP is converted by the method. The AIP definition file 429 is an XML document that describes the content of an AIP. The AIP definition file 429 is preferably specific to a particular AIP, such that differences in the format and presentation of different countries' AIPs can be accommodated by providing a respective AIP definition file 429 for each AIP. The AIP definition file 429 contains a plurality of XML elements that correspond to each of the Annex 15 section titles. For example, the AIP definition file 429 contains one or more elements corresponding to the "AD 2. AERODROMES" section title, one or more elements corresponding to the "AD 2.1 Aerodrome Location Indicator and Name" section title, and one or more elements corresponding to the "AD 2.2.1 Aerodrome Reference Point and Site" section title. Fig. 6 illustrates the elements of the AIP definition file 429 that correspond to the "AD 2.1 Aerodrome Location Indicator and Name" section title, as illustrated by reference numeral 1101 in Fig. 11. Element 602 (which is indicated by the <annex15Short> tag) contains a short description of the item of information. The short description is preferably identical to the wording used to denote that item of information in the AIP (as illustrated by reference numeral 120 in Fig. 1). Element 604 (which is indicated by the <annex15Long> tag) contains a long description of the item of information. The long description is preferably identical to the wording used to describe that item of information in Annex 15 (as illustrated by reference numeral 1101 in Fig. 11). Element 606 (which is indicated by the <annex15Keywds> tag) contains a number of keywords that can be used to assist in identifying that item of information within the AIP, in the event that the AIP does not contain text identical to either the short description or the long description. Each of the keywords is a word that has a high probability of being used to indicate that particular item of aeronautical information in the AIP. Each XML element in the AIP definition file 429 has a structure similar to that of the example shown in Fig. 6, although the content of each XML element will clearly be different.

An Annex 15 XML schema 431 is an XML document that defines the structure and content of XML documents that are suitable for storing aeronautical data, wherein the aeronautical data complies with the requirements of Annex 15 to the Convention on International Civil Aviation. The Annex 15 XML schema 431 is illustrated by an ellipse in Fig. 4 to denote that it is an input to the method, rather than a file to which the AIP is converted by the method. The Annex 15 XML schema is preferably in the W3C's XML Schema Language (and, more preferably, in W3C XML Schema Definition Language (XSD) 1.0), but it will be appreciated that any suitable XML schema language can be used. A significant advantage of the Annex 15 XML schema 431 is that it can be used to validate information that is extracted from the AIP or entered manually, which reduces the risk of errors being introduced during the method 300 and thereby improves the accuracy of the extracted information. Fig. 15 is an excerpt from the portion of the Annex 15 XML schema 431 that defines the structure and content of an XML document that is suitable for storing the information contained under the Annex 15 section title "AD 2. AERODROMES".

Fig. 7 illustrates the operation of the section title identification step 308. Firstly, at step 702, a new XML document 428 is created.
This is the file that will eventually contain all of the aeronautical information that was contained in the paper AIP 420 or electronic AIP 426. The structure and content of the XML document 428 will comply with the Annex 15 XML schema 431. Starting with the first text string in the SVG file 424, at step 704 a text string in the SVG file 424 is compared with a text string associated with each element of the AIP definition file 429. For example, the text string can be compared with the short description element 602, the long description element 604 or any of the keyword elements 606. If a text string in the SVG file 424 is found to be equal to an element in the AIP definition file 429 at decision step 706 then the text string is assumed to correspond to a section title. In this case, the element in the AIP definition file 429 that is associated with that text string is added to the XML document 428 at step 708. Step 704 is then repeated for the next text string in the SVG file 424. On the other hand, if a text string in the SVG file 424 is found not to be equal to any of the elements in the AIP definition file 429 at decision step 706, nothing is added to the XML document 428, and step 704 is then repeated for the next text string in the SVG file 424. In the event that the text string did not match any elements in the AIP definition file 429, user input is needed (at steps 312 and 920, described below) to identify the text string. Steps 704, 706, 708 and 710 are repeated until all of the strings in the SVG file 424 have been compared with elements in the AIP definition file 429. Text in the SVG file 424 can be normalised prior to step 704 in order to improve the likelihood of that text being correctly identified as a section title. 
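The exact-match loop of steps 704 to 708 can be sketched as follows. This is a minimal illustration rather than the DTS implementation: the dictionary standing in for the AIP definition file 429, the element identifiers (`ad2.2_1`, `ad2.2_2`), the sample data string and the particular normalisation (collapsing white space and ignoring case) are all assumptions made for the example.

```python
import re


def normalise(text: str) -> str:
    """One possible normalisation: collapse white space and ignore case."""
    return re.sub(r"\s+", " ", text).strip().casefold()


def identify_section_titles(svg_strings, definition):
    """Compare each text string with every wording in the definition.

    svg_strings: text strings from the SVG file, in document order.
    definition:  hypothetical stand-in for the AIP definition file, mapping
                 an element identifier to its candidate wordings.
    Returns (matched, unmatched); unmatched strings are left for user input.
    """
    lookup = {}
    for element_id, wordings in definition.items():
        for wording in wordings:
            lookup.setdefault(normalise(wording), element_id)

    matched, unmatched = [], []
    for text in svg_strings:
        element_id = lookup.get(normalise(text))
        if element_id is not None:
            matched.append((text, element_id))   # corresponds to step 708
        else:
            unmatched.append(text)               # nothing added to the output
    return matched, unmatched


# Illustrative data only.
definition = {
    "ad2.2_1": [
        "ARP Co-ordinates and site at AD",
        "aerodrome reference point (geographical coordinates in degrees, "
        "minutes and seconds) and its site",
    ],
    "ad2.2_2": ["Direction and distance from city"],
}
strings = [
    "ARP  co-ordinates and site at AD",   # extra space and different case
    "Lat: 511951N",                       # a data field, not a section title
    "Direction and distance from city",
]
matched, unmatched = identify_section_titles(strings, definition)
```

In this sketch the first string matches only because of the normalisation step, which is the effect the preceding paragraph attributes to normalising text before step 704.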
In a preferred embodiment, decision step 706's comparison of a text string in the SVG file 424 with a text string associated with each element of the AIP definition file 429 does not require the two text strings to be strictly equal to each other in order to determine that a particular text string in the SVG file 424 corresponds to a particular section title. In this embodiment, decision step 706 may instead involve an approximate string matching algorithm that determines whether a text string in the SVG file 424 matches a section title based upon one or more criteria. For example, any one or more of the following criteria can be used:

• Whether the text string contains more than a predetermined threshold number of words that are keywords 606 in the element of the AIP definition file 429 that corresponds to that section title.

• Whether the text string contains fewer than a predetermined threshold number of words that are not keywords 606 in the element of the AIP definition file 429 that corresponds to that section title.

• The position of the text string relative to other text strings in the SVG file. For example, the text string "ARP Co-ordinates and site at AD" (illustrated by reference numeral 120 in Fig. 1) may be deemed to match the section title AD 2.2.1 (indicated by the reference numeral 1101 in Fig. 11) because: (i) the text string "ARP Co-ordinates and site at AD" appears immediately to the right of the text string "1"; (ii) the text string "1" is located between the AD 2.2 section title 112 and the AD 2.3 section title 114; and/or (iii) the text string "ARP Co-ordinates and site at AD" has the same alignment in the horizontal direction as one or more other text strings (such as "Direction and distance from city") that are potentially section titles or that have already been positively identified as being section titles.

• The font properties of the text string. For example, the text string may be determined to be a section title if it has the same font, font size and/or format (e.g. bold, italics and/or underline) as one or more previously-identified section titles.

It will be appreciated that other suitable criteria may be used to enable approximate string matching of a text string in the SVG file 424 to a section title.
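The two keyword-based criteria listed above can be illustrated with a short sketch. The function name, the threshold values and the keyword list are assumptions chosen for the example; the positional and font-based criteria are not modelled here.

```python
def keyword_match(text, keywords, keyword_threshold=2, other_threshold=4):
    """Approximate match using only the two keyword criteria.

    Returns True when the string contains MORE than keyword_threshold words
    that are keywords and FEWER than other_threshold words that are not.
    Both thresholds are illustrative values, not taken from the source.
    """
    words = [w.strip(".,:;()").casefold() for w in text.split()]
    words = [w for w in words if w]
    keyword_set = {k.casefold() for k in keywords}
    hits = sum(1 for w in words if w in keyword_set)
    misses = len(words) - hits
    return hits > keyword_threshold and misses < other_threshold


# Hypothetical keyword list for the aerodrome reference point item.
arp_keywords = ["ARP", "co-ordinates", "coordinates", "site"]

title_variant_a = keyword_match("ARP Co-ordinates and site at AD", arp_keywords)
title_variant_b = keyword_match("ARP Coordinates and site at AD", arp_keywords)
other_title = keyword_match("Direction and distance from city", arp_keywords)
```

Both spellings of the title contain three keywords and three other words, so both satisfy the thresholds, whereas "Direction and distance from city" contains no keywords and is rejected. In practice the criteria would presumably be combined with the positional and font checks rather than used alone.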

As noted previously, Annex 15 does not require that aeronautical information be presented in tabular form, but most AIPs contain a large number of tables in order to present aeronautical information in a concise and easily-comprehensible form. The success of an automated method for extracting aeronautical information from an AIP is, therefore, largely dependent upon its abilities to identify tables having an arbitrary format and to extract information reliably from such tables. The method described herein identifies tables and extracts information therefrom by first identifying and extracting as much information as possible using fully-automated techniques (see step 310, described below) and then receiving user input to identify and extract any information that could not be obtained by the fully-automated techniques (see steps 930 and 940, described below). The method described herein preferably uses the properties of previously-identified tables (particularly the spatial positions (e.g. coordinates) of various features of those tables) to improve its ability to identify newly-encountered tables automatically and to extract information from the newly-encountered tables automatically. The properties of previously-identified tables are obtained from one or more table definition files 432. A table definition file 432 is generated for one or more pages of an AIP by the DTS with the assistance of user input, as described in more detail below. The table definition file 432 is then used to identify tables in other pages of the same AIP. This takes advantage of the fact that, within a given AIP, tables that contain similar information tend to have similar properties, and so a table definition file 432 that describes the properties of the tables in one page of an AIP can be used to identify similar tables in other pages of the same AIP.
For example, the table definition file 432 that describes the properties of table 200 (which contains information on Biggin Hill aerodrome's approach and runway lighting) can also be used to identify other tables in the UK's AIP that contain information on other aerodromes' approach and runway lighting.

Hence, the next step 310 in the method 300 is for the DTS to identify tables in the SVG file 424. Tables are identified based on the structure of the SVG file 424. Fig. 8 illustrates the operation of the table identification step 310. At step 802, the titles of each column of a table are identified. The column titles are preferably identified by analysing the coordinates in the SVG file 424 of the computer-readable text that makes up the section titles that were identified in step 308. The coordinates are analysed in order to determine whether two or more section titles have substantially the same position along the vertical axis on a particular page of the SVG file 424. If the two or more section titles have substantially the same position along the vertical axis, but different positions along the horizontal axis, then those section titles are assumed to be in the header row of the table, and each section title is assumed to be the column title of a respective column of the table. Section titles sometimes occupy more than one line in the header row of the table. For example, the titles of columns 202 and 203 in table 200 each occupy four lines of text. To allow columns to be identified in cases where section titles occupy multiple lines of text, the coordinates of section titles are analysed to determine whether two or more section titles have overlapping positions along the vertical axis on a particular page of the SVG file 424. If the two or more section titles have overlapping positions along the vertical axis, but different positions along the horizontal axis, then those section titles are assumed to be in the header row of the table, and each section title is assumed to be the column title of a respective column of the table. Thus, by analysing the coordinates of section titles in step 802, the columns of a table and the column titles can be identified.
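The vertical-overlap test of step 802 can be sketched as below. The sketch assumes that a bounding box has already been extracted for each identified section title, and that the y axis increases down the page (as in SVG). The `Title` tuple, the function names and the sample titles and coordinates are hypothetical.

```python
from typing import NamedTuple


class Title(NamedTuple):
    text: str
    x: float       # horizontal position of the title
    top: float     # y coordinate of the top edge
    bottom: float  # y coordinate of the bottom edge (greater than top)


def overlaps(a: Title, b: Title) -> bool:
    """True when the vertical extents of two titles overlap."""
    return a.top < b.bottom and b.top < a.bottom


def find_header_rows(titles):
    """Group titles whose vertical extents overlap; keep groups that span
    at least two distinct horizontal positions (assumed header rows)."""
    rows = []
    for title in sorted(titles, key=lambda t: t.top):
        for row in rows:
            if any(overlaps(title, other) for other in row):
                row.append(title)
                break
        else:
            rows.append([title])
    return [
        sorted(row, key=lambda t: t.x)
        for row in rows
        if len({t.x for t in row}) >= 2
    ]


# Illustrative titles: three share a header row (one spans several lines),
# and the fourth sits alone further down the page.
titles = [
    Title("Runway", 50, 100, 112),
    Title("Approach lighting type, length, intensity", 120, 100, 148),
    Title("Threshold lighting", 220, 100, 124),
    Title("Remarks", 50, 300, 312),
]
header_rows = find_header_rows(titles)
column_titles = [t.text for t in header_rows[0]]
```

Using overlap rather than equality of the vertical position is what allows the multi-line title in the example to be grouped with its single-line neighbours.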

At step 804, horizontal lines and vertical lines in the SVG file are identified. Horizontal lines and vertical lines are identified by searching the SVG file 424 to locate each vector graphic element that represents a line having a substantially horizontal orientation or a substantially vertical orientation. At step 806, the vertices (i.e. the points of intersection) of the horizontal lines and vertical lines identified at step 804 are identified. The vertices are identified by analysing the geometry of the vector graphic elements identified in step 804 to determine the coordinates of any points at which a horizontal line intersects with a vertical line. At step 808, cells of the table are identified. Any rectangular or square region that is defined by the intersection of the horizontal lines and vertical lines is potentially a cell of the table. In order to determine whether such a rectangular or square region is actually a cell of the table, the coordinates of each region (as defined by the coordinates of the vertices that were identified at step 806) are compared with the coordinates of the column titles that were identified at step 802. If any of the column titles is found to be positioned within one of the rectangular or square regions, that rectangular or square region is deemed to be a cell within the header row of the table. Cells in the header row of the table can thereby be identified. The remaining cells in the table (i.e. cells that are not part of the header row of the table) can then be identified by analysing the coordinates of the other rectangular or square regions (as defined by the coordinates of their vertices) in order to identify a contiguous extent of adjoining rectangular or square regions. The contiguous extent of adjoining regions is deemed to represent the table, and each of its constituent regions is deemed to be a respective cell of the table. Hence, a table is thereby identified.

At step 810, data describing the identified table is stored in the XML document 428. For example, XML elements describing the coordinates of each of the column titles identified at step 802 and the coordinates of each of the cells identified at step 808 can be added to the XML document 428. Similar XML elements can also be added to a table definition file 432 in order to facilitate the identification of other similar tables within the same AIP. One or more elements that describe the position along the horizontal axis of each vertical line that forms part of a table, together with one or more elements that identify the section or sub-section of the AIP to which that table belongs, may also be added to the table definition file 432 at step 810.
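Steps 806 and 808 can be sketched in simplified form as follows. The sketch assumes that horizontal and vertical lines have already been extracted from the SVG file as coordinate tuples, treats every grid rectangle whose four corners are vertices as a candidate cell (so it does not handle merged cells), and uses a hypothetical tolerance value. All function names are illustrative.

```python
def find_vertices(h_lines, v_lines, tol=0.5):
    """Step 806 (sketch): points where a horizontal line meets a vertical one.

    h_lines: iterable of (y, x_start, x_end)
    v_lines: iterable of (x, y_start, y_end)
    tol:     assumed tolerance for lines that stop just short of each other
    """
    vertices = set()
    for y, x_start, x_end in h_lines:
        for x, y_start, y_end in v_lines:
            if x_start - tol <= x <= x_end + tol and y_start - tol <= y <= y_end + tol:
                vertices.add((x, y))
    return vertices


def find_cells(vertices):
    """Step 808 (sketch): rectangles whose four corners are all vertices."""
    xs = sorted({x for x, _ in vertices})
    ys = sorted({y for _, y in vertices})
    cells = []
    for top, bottom in zip(ys, ys[1:]):
        for left, right in zip(xs, xs[1:]):
            corners = {(left, top), (right, top), (left, bottom), (right, bottom)}
            if corners <= vertices:
                cells.append((left, top, right, bottom))
    return cells


def cell_containing(cells, x, y):
    """Return the cell that contains the point (x, y), or None."""
    for left, top, right, bottom in cells:
        if left <= x <= right and top <= y <= bottom:
            return (left, top, right, bottom)
    return None


# A 2 x 2 grid: three horizontal and three vertical lines.
h_lines = [(0, 0, 100), (20, 0, 100), (40, 0, 100)]
v_lines = [(0, 0, 40), (50, 0, 40), (100, 0, 40)]
vertices = find_vertices(h_lines, v_lines)
cells = find_cells(vertices)
# A column title positioned at (10, 5) falls in the top-left cell, which
# would therefore be treated as a header-row cell.
header_cell = cell_containing(cells, 10, 5)
```

A fuller implementation would also need to confirm that the candidate cells form a contiguous extent, as described for the remaining cells of the table.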

In the event that the DTS is unable to identify the table itself or to identify all of its rows or columns, user input is needed (at steps 312 and 930, described below) to resolve the tables.

The next step 312 in the method 300 is for the DTS to receive user input in order to complete the extraction of aeronautical information from the SVG file 424. Fig. 9 illustrates the operation of the receiving user input step 312. In step 910, a graphical user interface (GUI) is displayed. The GUI is operable to receive user input in order that the XML document 428 can be edited to include information from the SVG file 424. In step 920, the GUI is used to resolve section titles, i.e. to insert an XML element into the XML document 428 for each section title in the SVG file 424 that was not matched at step 710. Step 920 need not be performed if all section titles in the SVG file 424 were correctly identified at step 308 or, alternatively, step 920 may simply comprise allowing a user to use the GUI to verify that all of the section titles in the SVG file 424 were correctly identified at step 308. Step 920 can also involve modifying XML elements in the XML document 428 in the event that any text in the SVG file 424 has been incorrectly identified as being a section title.

Whilst Fig. 3 shows the step 312 of receiving user input as being subsequent to the step 310 of identifying tables, it is preferable for steps 910 and 920 to occur prior to the step 310 of identifying tables. Allowing a user to resolve section titles before the DTS automatically identifies tables can improve the speed and accuracy of automatic table identification, by providing more reliable information regarding the position of section titles on the page of the SVG file.

Fig. 10 shows an example of a GUI window 1000 for resolving section titles in the SVG file 424. The GUI window 1000 comprises a display pane 1002 and a control pane 1004. A particular page of the SVG file 424 is displayed in the display pane 1002. In the example shown in Fig. 10, the page of the SVG file 424 that is displayed corresponds to the AIP page that is illustrated in Fig. 1. Section titles that were identified in step 308 are highlighted in the display pane 1002, to allow a user to recognise them easily. Data fields are shown in the display pane 1002 without any highlighting.

The control pane 1004 contains buttons 1014 and an information panel 1018 to assist the user in resolving the strings. The buttons 1014 comprise a plurality of 'section' buttons 1016, a plurality of 'add' buttons 1020, a plurality of 'delete' buttons 1022 and a 'non-Annex 15' button 1024. Each 'section' button 1016 and the 'non-Annex 15' button 1024 is associated with a respective 'add' button 1020 and a respective 'delete' button 1022. Each of the 'section' buttons 1016 corresponds to a respective element of the AIP definition file 429, and each 'section' button 1016 is marked with a brief text description (such as "ad2", "ad2.1", "ad2.2 1" and "ad2.2_6_name") to allow easy identification of the element of the AIP definition file 429 to which it corresponds. When a user activates a 'section' button 1016 (for example, by moving a mouse pointer over the 'section' button 1016 and activating a mouse button), the information panel 1018 displays a description of the data that is identified by the corresponding element of the AIP definition file 429. This assists a user in the process of resolving section titles, because the information in the information panel 1018 helps the user to determine which 'section' button 1016 corresponds to a particular unresolved string in the display pane 1002 (and, hence, allows the unresolved string in the display pane 1002 to be associated with the correct element of the AIP definition file 429). Preferably, activation of a 'section' button 1016 causes the information panel 1018 to display a description that uses wording taken directly from the corresponding section of Annex 15. For example, as shown in Fig. 10, activation of the 'section' button 1016 marked with the text "ad2.2 1" causes the information panel 1018 to display the words "aerodrome reference point (geographical coordinates in degrees, minutes and seconds) and its site", which are taken directly from Annex 15 as indicated by reference sign 1101 in Fig. 11.
In order to resolve an unresolved section title, a user first selects the unresolved section title 1012. The unresolved section title 1012 is preferably selected by a user moving a mouse pointer to one corner of the section title 1012 (for example, the bottom right corner) in the display pane 1002, and activating a mouse button whilst moving the mouse pointer to the diagonally opposite corner of the section title 1012 (for example, the top left corner). This causes a rectangular or square box to be drawn in the display pane 1002, such that the box encompasses the section title 1012. The user then activates the 'section' button 1016 corresponding to the unresolved section title 1012. Finally, the user activates the 'add' button 1020 that is associated with the 'section' button 1016. This causes the DTS to add an appropriate XML element to the XML document 428. The XML element includes the x and y coordinates of the box that was drawn by the user. Fig. 12 is an example of the XML elements that are added when section title 1012 has been resolved.

Whenever a user edits the XML document 428, an XML auditing element is added as a sub-element of each XML element that is affected by the user's editing. Fig. 13 is an example of an auditing element. The auditing element in this example is denoted by the <transaction> tag. The auditing element includes two attributes, known as a globally unique identifier (GUID) and a session identifier. The GUID (which has the attribute name "guid") is a text string that uniquely identifies a particular edit to the XML document 428. For example, the GUID is a 512-bit number expressed in hexadecimal. The session identifier (which has the attribute name "session-id") is a text string that identifies the time and date at which the XML document 428 was edited, together with the name of the user who performed the edit.
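A minimal sketch of adding such an auditing element is given below, using Python's standard `xml.etree.ElementTree` module. The tag name and the two attribute names follow the description above; the function name, the parent element name, the use of random bytes for the GUID and the layout of the session identifier string are assumptions, since the content of Fig. 13 is not reproduced here.

```python
import secrets
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def add_audit_element(element, user):
    """Append a <transaction> sub-element recording one edit.

    The GUID is generated here as 64 random bytes (512 bits) written in
    hexadecimal; the session identifier format (timestamp then user name)
    is an assumed layout.
    """
    guid = secrets.token_hex(64)  # 64 bytes -> 128 hexadecimal digits
    session_id = f"{datetime.now(timezone.utc).isoformat()} {user}"
    return ET.SubElement(
        element, "transaction", {"guid": guid, "session-id": session_id}
    )


# Hypothetical edited element and user name.
section = ET.Element("section_title")
audit = add_audit_element(section, user="example_user")
serialised = ET.tostring(section, encoding="unicode")
```

Because each edit appends its own element, searching the document for a given user name or date in the "session-id" attribute would list the corresponding changes.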

By including an auditing element whenever the XML document 428 is edited, the XML document 428 contains information documenting all of the changes that have been made to it. This can allow errors in the XML document 428 to be identified and corrected. For example, the XML document 428 can easily be searched to list all changes made by a particular user and/or made on a particular date, in order that those changes can be checked. This can help to ensure that the contents of the XML document 428 are accurate, thereby reducing the risk of errors that may prejudice the safety of an aircraft. The DTS preferably contains a software module to display the auditing information in a user-friendly format, in order to make it easier for a human to check the changes made to the XML document 428, so as to improve the likelihood of any errors being detected and thereby to improve the accuracy of data contained in the XML document 428.

If a user wishes to delete an XML element from the XML document 428 (which may be desirable, for example, if the user makes a mistake or if a text string was incorrectly identified as being a section title at step 308), the user can activate the 'delete' button 1022 associated with the 'section' button 1016 corresponding to the XML element to be deleted. This causes the DTS to edit the XML document 428 by adding an attribute to the XML element to indicate that the XML element is to be considered as being deleted. For example, the XML element may be edited to include the following attribute: "edit_status=deleted". Importantly, the XML element is not actually deleted from the XML document 428, which allows the 'deleted' element to be reinstated (which may be desirable if the element was incorrectly deleted). Activating the 'delete' button 1022 also causes an auditing element to be marked as deleted.

Step 920 can optionally also involve identifying information that is present in the SVG file 424 but that is not defined by Annex 15. This sort of data is referred to herein as 'non-Annex 15 content'. For example, many AIPs include an email address in Section AD 2.2.6, which contains contact details for an aerodrome. Annex 15 defines what these contact details should contain (e.g. a postal address and telephone number), but does not specify that the contact details may include an email address. This information should be marked as non-Annex 15 content since, otherwise, it would be incorrectly appended to the text associated with another section. For example, an aerodrome's email address might otherwise be incorrectly treated as being part of the postal address or telephone number for the aerodrome. In order to identify non-Annex 15 content, a user activates the 'add' button 1020 associated with the 'non-Annex 15' button 1024 and then selects the non-Annex 15 content (in the manner previously described for selecting a section title). This causes an XML element representing the non-Annex 15 content to be created. The XML element representing non-Annex 15 content is preferably stored in a separate XML document (not shown in Fig. 4) from the Annex 15 XML document 428, so as to simplify any later process of converting the Annex 15 XML document 428 to another format. In the GUI 1000, non-Annex 15 content may be highlighted in a different colour from the colour used to highlight section titles, to allow a user to recognise non-Annex 15 content easily.

When the user has finished resolving section titles (which usually occurs when the user has resolved all of the section titles in the SVG file 424), the method proceeds to step 930. In step 930, the GUI is used to resolve tables, i.e. to identify any tables in the SVG file 424 that were not identified at step 310. Step 930 need not be performed if all tables in the SVG file 424 were correctly identified at step 310. Fig. 14 shows an example of a GUI window 1400 for resolving tables in the SVG file 424. The GUI window 1400 comprises a display pane 1402 and a control pane 1404. A particular page of the SVG file 424 is displayed in the display pane 1402. In the example shown in Fig. 14, the page of the SVG file 424 that is displayed corresponds to the AIP page that is illustrated in Fig. 2. The control pane 1404 comprises an information panel 1418, an 'add table' button 1428, an 'OK' button 1429, a plurality of 'section' buttons 1416, a plurality of 'set' buttons 1420 and a plurality of 'delete' buttons 1422. Each 'section' button 1416 is associated with a respective 'set' button 1420 and a respective 'delete' button 1422. Each of the 'section' buttons 1416 corresponds to a respective element of the AIP definition file 429, and each 'section' button 1416 is marked with a brief text description (such as "ad2.14 1" and "ad2.4_2") to allow easy identification of the element of the AIP definition file 429 to which it corresponds.
When a user activates a 'section' button 1416, the information panel 1418 displays a description of the data that is identified by the corresponding element of the AIP definition file 429. The functionality of the information panel 1418 is the same as that previously described in respect of the information panel 1018 shown in Fig. 10.

The DTS highlights the header row 1412 of the table 1430 (i.e. the first row of the table 1430) to indicate to a user that the table 1430 was not identified at step 310 and that user input is required to resolve the table. In order to resolve a table, a user first selects the portion of the table 1430 that contains aeronautical information of interest. This is achieved by the user activating the 'add table' button 1428, and then moving a mouse pointer to one corner of the portion of the table 1430 that is of interest (for example, the bottom right corner 1426) in the display pane 1402. The user then activates a mouse button whilst moving the mouse pointer to the diagonally opposite corner of the portion of interest (for example, the top left corner 1425). This causes a rectangular or square box to be drawn in the display pane 1402, such that the box encompasses the portion of the table that contains the aeronautical information of interest. The user need not select the entire table 1430: for example, in the screenshot shown in Fig. 14, the user does not select the header row 1412, the second row or the bottom row of the table 1430, since these do not contain any aeronautical information of interest. When the portion of the table that contains the aeronautical information of interest has been selected, the user activates the 'OK' button 1429.

Activation of the 'OK' button 1429 causes the DTS to store, in a table definition file, the x and y coordinates of the corners 1425, 1426 of the box that was drawn by the user. The table definition file 432 is an XML document that is separate from the Annex 15 XML document 428. Coordinates that are stored in the table definition file 432 can be used to assist the DTS in automatically identifying similar tables at step 310 when the DTS is used to convert other pages in the same AIP. The x and y coordinates of the corners 1425, 1426 of the box are preferably expressed as an offset from a particular Annex 15 section title; that is, the table definition file 432 stores the distances along the x and y axes from a defined point in the SVG representation of the Annex 15 section title that immediately preceded the box to each of the corners 1425, 1426 of the box. For example, the table definition file 432 may store the distances along the x and y axes from the top left hand corner of the SVG representation of a particular Annex 15 section title to each of the corners 1425, 1426. Thus, when the DTS is used to convert other pages in the same AIP, the Annex 15 section title is identified at step 308 or step 920, and the coordinates of a defined point in the Annex 15 section title are extracted from the SVG file 424 for the similar AIP. The offsets in the table definition file 432 that are associated with that section title are added to the coordinates of the defined point in the section title in order to estimate the coordinates of a box containing all of the aeronautical information of interest in a table that is assumed to follow the section title. This can avoid the need for a user to select the portion of the table that contains the aeronautical information of interest, or can at least reduce the amount of effort required by the user by providing a good first approximation of the location of the portion of the table that contains the aeronautical information of interest. 
A set of coordinates is added to the table definition file 432 each time that a box containing aeronautical information in a table is identified by a user. Thus, the table definition file 432 eventually contains a large number of coordinates that can be used to estimate the location of the portion of the table that contains aeronautical information. The DTS is able to select from amongst a plurality of sets of coordinates in the table definition file 432 in order to generate a best estimate of the location of a portion of the table that contains aeronautical information.
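The offset arithmetic described above can be illustrated with a minimal Python sketch. It assumes that the defined point of the section title is its top left hand corner and that the offsets of both box corners are stored relative to that point; the names Point, BoxOffsets and estimate_box, and all numeric values, are hypothetical and are not taken from the table definition file 432.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: float
    y: float


@dataclass(frozen=True)
class BoxOffsets:
    """Offsets of two diagonally opposite box corners from a section-title anchor point."""
    top_left: Point
    bottom_right: Point


def estimate_box(title_anchor: Point, offsets: BoxOffsets) -> tuple[Point, Point]:
    """Add the stored offsets to the anchor point to estimate the box corners."""
    top_left = Point(title_anchor.x + offsets.top_left.x,
                     title_anchor.y + offsets.top_left.y)
    bottom_right = Point(title_anchor.x + offsets.bottom_right.x,
                         title_anchor.y + offsets.bottom_right.y)
    return top_left, bottom_right


# Offsets recorded when a user drew the box on an earlier page (illustrative values).
stored = BoxOffsets(top_left=Point(0.0, 40.0), bottom_right=Point(480.0, 300.0))
# Top left hand corner of the same section title as found on another page (illustrative).
anchor = Point(56.0, 120.0)
corners = estimate_box(anchor, stored)
```

The estimated corners could then be presented to the user as a first approximation of the box, to be accepted or adjusted.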

As mentioned above, the table definition file 432 may contain one or more elements that describe the position along the horizontal axis of each vertical line that forms part of a table, together with one or more elements that identify the section or sub-section of the AIP to which that table relates. A table definition file 432 that contains such elements can be used to identify similar tables in the same AIP by comparing the positions of vector graphic elements that define vertical lines in the SVG file 424 with the positions of vertical lines for other tables that are defined in the table definition file. This allows the DTS to determine whether a table that has yet to be identified has columns with positions that are substantially the same as those of a table that has previously been identified and, if so, the table that has yet to be identified is deemed to belong to the same section of the AIP as the previously-identified table.
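One possible way to perform the comparison of vertical line positions is sketched below in Python. The tolerance value, the function names columns_match and find_section, and the section labels and coordinates are assumptions made for illustration only; the description above does not specify how "substantially the same" is decided.

```python
def columns_match(candidate, reference, tolerance=2.0):
    """Return True if both tables have the same number of vertical lines and
    each pair of corresponding x positions differs by at most `tolerance`."""
    if len(candidate) != len(reference):
        return False
    pairs = zip(sorted(candidate), sorted(reference))
    return all(abs(c - r) <= tolerance for c, r in pairs)


def find_section(candidate_lines, definitions, tolerance=2.0):
    """Return the section of the first previously identified table whose
    vertical lines match the candidate table, or None if there is no match."""
    for section, reference_lines in definitions.items():
        if columns_match(candidate_lines, reference_lines, tolerance):
            return section
    return None


# Hypothetical contents of a table definition file: section -> x positions of vertical lines.
definitions = {
    "AD 2.12": [50.0, 120.0, 300.0, 540.0],
    "AD 2.14": [50.0, 200.0, 540.0],
}
matched = find_section([50.5, 119.2, 301.0, 540.0], definitions)
```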

The next step in resolving the table is for the user to identify columns in the table 1430. A user first selects any cell in the column 1432 to be identified, by moving a mouse pointer over the cell and activating the mouse button. This causes the cell to be highlighted. The user then activates the 'section' button 1416 corresponding to the item of information in Annex 15 to which the column 1432 containing the selected cell relates. This causes a portion of the column 1432 that both contains the selected cell and that is within the selected portion of the table 1430 to be highlighted, as shown by the shading and dark borders around the portions of each column 1432 in Fig. 14. Finally, the user activates the 'set' button 1420 that is associated with the activated 'section' button 1416. This causes the DTS to add an appropriate XML element to the XML document 428.

Fig. 18 is an example of an XML element that is added to the table definition file 432 by the step of resolving tables 930. In this example, the <definition> element added to the table definition file 432 relates to table 200 that is shown in Fig. 2. The <definition> element contains a <column> child element for each of the columns of the table. The <column> elements contain the following child elements: <x>, which defines the horizontal position of a particular point (such as the left-hand edge) of a column; <width>, which defines the width of a column; and <associated-data-element>, which identifies the section title associated with a column. The <top> element defines the position of the top of the table. For example, the <top> element can define the position of the top of the table as an offset from the position of a section title. The <bottom> element defines the position of the bottom of the table. For example, the <bottom> element can define the position of the bottom of the table as an offset from the start of the next section in the AIP, the end of a page of the AIP or a line nominated when the table was set up.
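By way of illustration only, a <definition> element of the kind described above might be read as in the following Python sketch. The element names are those listed above, but the nesting, the numeric values and the identifiers "ad2.12_1" and "ad2.12_2" are assumptions, since Fig. 18 is not reproduced here; the helper names parse_definition and column_for_x are likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical <definition> element; structure and values are assumed for illustration.
SAMPLE = """
<definition>
  <top>40</top>
  <bottom>300</bottom>
  <column>
    <x>56</x>
    <width>120</width>
    <associated-data-element>ad2.12_1</associated-data-element>
  </column>
  <column>
    <x>176</x>
    <width>90</width>
    <associated-data-element>ad2.12_2</associated-data-element>
  </column>
</definition>
"""


def parse_definition(xml_text):
    """Read the table extent and the column geometry from a <definition> element."""
    root = ET.fromstring(xml_text)
    columns = [
        {
            "x": float(col.findtext("x")),
            "width": float(col.findtext("width")),
            "element": col.findtext("associated-data-element"),
        }
        for col in root.findall("column")
    ]
    return {
        "top": float(root.findtext("top")),
        "bottom": float(root.findtext("bottom")),
        "columns": columns,
    }


def column_for_x(definition, x):
    """Return the data element of the column whose horizontal span contains x."""
    for col in definition["columns"]:
        if col["x"] <= x < col["x"] + col["width"]:
            return col["element"]
    return None


table = parse_definition(SAMPLE)
```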

When the user has finished resolving tables (which usually occurs when the user has resolved all of the tables in the SVG file 424), the method proceeds to step 940. In step 940, the GUI is used to extract aeronautical information, i.e. to add values to XML elements in the XML document 428 based on information in the SVG file 424. Fig. 16 shows an example of a GUI window 1600 for extracting aeronautical information. The GUI window 1600 comprises a display pane 1602, a data tree 1604 and an information panel 1618. A particular page of the SVG file 424 is displayed in the display pane 1602. In the example shown in Fig. 16, the page of the SVG file 424 that is displayed corresponds to the AIP page that is illustrated in Fig. 1. The DTS highlights each item of aeronautical information 1606 in the display pane 1602. Preferably, an item of aeronautical information 1606 is highlighted in a first colour (such as red, for example) before it has been extracted, and highlighted in a second colour (such as blue, for example) after it has been extracted, so as to draw a user's attention to the items of aeronautical information that need to be extracted.

The data tree 1604 is generated by the DTS based on the content of the AIP definition file 429. The data tree 1604 contains a plurality of elements 1614, wherein each element 1614 corresponds to an element of the AIP definition file 429. Thus, the data tree 1604 contains an element corresponding to each Annex 15 section title, and also an element corresponding to each piece of information that Annex 15 requires to be present in each section of an AIP. The data tree 1604 reflects the hierarchical structure of an AIP, as discussed above with reference to Fig. 5. An 'expand/collapse' button 1612 is provided at each level of the tree that comprises one or more child elements. Activating the 'expand/collapse' button 1612 toggles the display of the child elements in the data tree 1604, and thereby controls which elements of the AIP definition file 429 are displayed in the data tree 1604. Each element 1614 in the data tree 1604 is also associated with a status icon 1608, 1610, which indicates whether the element (and its child elements, if applicable) has been populated with valid aeronautical information. For example, status icon 1608 takes the form of an exclamation mark to draw a user's attention to the fact that this element (or one or more of its child elements) has not been populated with valid aeronautical information, whereas status icon 1610 takes the form of a tick to indicate that this element (and all of its child elements) has been populated with valid aeronautical information. Additionally or alternatively, elements 1614 may be displayed in different colours to indicate whether they have been populated with valid aeronautical information. When a user selects any of the elements 1614 in the data tree 1604, the information panel 1618 displays a description of the data that is identified by the corresponding element of the AIP definition file 429.
The functionality of the information panel 1618 is the same as that previously described in respect of the information panel 1018 shown in Fig. 10.

In order to extract aeronautical information from the SVG file 424, a user selects an element 1614 from the data tree 1604. The element is selected by moving a mouse pointer over the name of the element 1614 in the data tree 1604 and activating a mouse button, for example. When the user selects an element 1614 from the data tree 1604, the DTS causes a form 1700 to be displayed. Fig. 17 illustrates an example of a suitable form 1700. The form 1700 comprises a 'new value' box 1702, a 'current value' box 1704, a 'commit' button 1706 and a 'validate' button 1708. The form 1700 also displays the name 1701 of the selected element 1614.

The 'current value' box 1704 contains the value of the selected element 1614 that is currently recorded in the XML document 428. In the event that the value of the selected element 1614 is not currently recorded in the XML document 428 (which will be the case before that particular item of aeronautical data has been extracted from the SVG file 424), the 'current value' box 1704 is preferably initially populated by automatically extracting computer-readable text from the SVG file 424. The computer-readable text that is extracted from the SVG file 424 in order to populate the 'current value' box 1704 is determined based upon one or more criteria, such as:

• The position of text relative to a previously-identified section title. The coordinates of computer-readable text in the SVG file 424 are compared with the coordinates of previously-identified section titles in the SVG file 424, to determine whether the text is likely to be associated with a particular section title. If the position of the text relative to a section title suggests that the text and section title are associated with one another, that text is deemed to be the value of an element associated with that section title. For example, text that is immediately to the right of, or immediately below, a previously-identified section title may be assumed to represent the value of an element associated with that section title.

• The position of text within a previously-identified table. The coordinates of computer-readable text in the SVG file 424 are compared with the coordinates of table cells that were identified in step 858, to determine whether the text is likely to be the value of an element. If the text is positioned within a cell of a table, the coordinates of the text and the coordinates of previously-identified section titles and/or column titles are used to determine which element the text relates to.
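The first criterion above could be approximated as in the following Python sketch. It assumes that each piece of text and each section title is described by a bounding box whose origin is its top left corner, with the y axis increasing downwards as in SVG; the gap threshold, the Box type and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Bounding box with its origin at the top left corner; y increases downwards."""
    x: float
    y: float
    width: float
    height: float


def is_right_of(title, text, max_gap=20.0):
    """True if the text starts on roughly the same line, just after the title ends."""
    same_line = abs(text.y - title.y) <= title.height / 2
    gap = text.x - (title.x + title.width)
    return same_line and 0 <= gap <= max_gap


def is_below(title, text, max_gap=20.0):
    """True if the text overlaps the title horizontally and starts just beneath it."""
    overlaps = text.x < title.x + title.width and title.x < text.x + text.width
    gap = text.y - (title.y + title.height)
    return overlaps and 0 <= gap <= max_gap


def associated_title(text, titles):
    """Return the name of the first title that the text sits beside or below, else None."""
    for name, box in titles.items():
        if is_right_of(box, text) or is_below(box, text):
            return name
    return None


# Illustrative section title and text positions.
titles = {"AD 2.2": Box(50.0, 100.0, 100.0, 12.0)}
beside = associated_title(Box(160.0, 101.0, 80.0, 12.0), titles)
beneath = associated_title(Box(60.0, 115.0, 80.0, 12.0), titles)
elsewhere = associated_title(Box(400.0, 400.0, 80.0, 12.0), titles)
```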

It will be appreciated that other suitable criteria may be used. If the value displayed in the 'current value' box 1704 is incorrect (which may be the case when the value is based upon computer-readable text that has been automatically extracted from the SVG file 424), the user can correct the value by adding the correct value to the 'new value' box 1702.

In order to reduce the risk of errors being introduced when the value of an element is being corrected, the form 1700 provides drag-and-drop functionality to reduce or eliminate the need for a user to type the correct value of an element. More particularly, a user can highlight some or all of the text in the 'current value' box 1704 by placing a mouse pointer at the start of the text to be highlighted and then activating a mouse button whilst moving the mouse pointer to the end of the text to be highlighted. The user can then copy the highlighted text from the 'current value' box 1704 to the 'new value' box 1702 by placing the mouse pointer over the highlighted text in the 'current value' box 1704, activating the mouse button, and moving the mouse pointer to the 'new value' box 1702. This causes the highlighted text to be dragged-and-dropped to the 'new value' box 1702. There is preferably no need for a user to activate any menu options or keyboard shortcuts in order to make use of this drag-and-drop functionality. A user can edit the text in the 'new value' box 1702 using a keyboard, but it is preferable for the 'new value' box 1702 to be populated using only the drag-and-drop functionality in order to avoid typing errors. Once the 'new value' box 1702 contains the correct value for the selected element 1614, the user activates the 'validate' button 1708. When the 'validate' button 1708 is activated, the DTS compares the value in the 'new value' box 1702 with the Annex 15 XML schema 431 to determine whether the value is valid. For example, the DTS may check that the value is numeric if the element of the Annex 15 schema requires a numeric value, or the DTS may check that the value is text if the element of the Annex 15 schema requires a text value. Any other suitable checks may be performed to determine the validity of values.
Hence, the validation process can identify errors in the information entered using the form 1700 and can prevent those errors being introduced into the aeronautical information. If the DTS determines that the value is valid, then the appearance of the 'validate' button 1708 changes to indicate to the user that the value is valid; otherwise, the appearance of the 'validate' button 1708 changes to indicate to the user that the value is invalid, in order that the user may correct the value. Alternatively or additionally, a standalone program can be used to validate elements in the Annex 15 XML file 428 against the Annex 15 XML Schema. Once the value in the 'new value' box 1702 has been validated by the DTS, the user activates the 'commit' button 1706. When the 'commit' button 1706 is activated, the DTS copies the current contents of the 'new value' box 1702 to the appropriate element in the XML document 428, in order that the value of the element in the XML document 428 is equal to the value currently in the 'new value' box 1702. Activation of the 'commit' button 1706 also causes an auditing element to be added to the XML document 428, as described above. The auditing element can include, for example, data indicating the identity of the user that changed the value of the element, the values of the element before and/or after the change, and the time at which the value of the element was changed. When the auditing element has been added, the DTS closes the form 1700 in order that the user can extract the next item of aeronautical information using the GUI window 1600. The form 1700 shown in Fig. 17 is capable of populating only one element of the XML document 428, but the DTS can also display forms that are capable of populating several elements at once. 
For example, if a user selects an element 1614 from the data tree 1604 that has a plurality of child elements, the form 1700 can display a 'new value' box 1702 and a 'current value' box 1704 for each of the child elements. In this case, activating the 'commit' button 1706 causes the XML document 428 to be edited to reflect all of the new values that were entered into the 'new value' boxes 1702.
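The validate-then-commit sequence described above can be sketched in Python as follows. The sketch assumes a simple lookup of expected value types in place of the Annex 15 XML schema 431, and an <audit> element with user, old, new and time attributes; the element names, the attribute names and the function names is_valid and commit are all hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Stand-in for the type information that a schema would provide (hypothetical names).
EXPECTED_TYPES = {"runway-length": "numeric", "aerodrome-name": "text"}


def is_valid(element_name, value):
    """Check that a value has the type expected for the named element."""
    kind = EXPECTED_TYPES.get(element_name)
    if kind == "numeric":
        try:
            float(value)
        except ValueError:
            return False
        return True
    if kind == "text":
        # Assumption: any non-empty string is accepted as a text value.
        return bool(value.strip())
    return False  # unknown elements are rejected in this sketch


def commit(document, element_name, new_value, user):
    """Write a validated value to the document and append an audit element.

    Returns False, leaving the document unchanged, if the value is invalid."""
    if not is_valid(element_name, new_value):
        return False
    target = document.find(element_name)
    if target is None:
        target = ET.SubElement(document, element_name)
    old_value = target.text or ""
    target.text = new_value
    audit = ET.SubElement(document, "audit")
    audit.set("element", element_name)
    audit.set("user", user)
    audit.set("old", old_value)
    audit.set("new", new_value)
    audit.set("time", datetime.now(timezone.utc).isoformat())
    return True


doc = ET.Element("aip")
accepted = commit(doc, "runway-length", "2500", "editor1")
rejected = commit(doc, "runway-length", "long", "editor1")
```

In this sketch a rejected value leaves both the element and the audit trail untouched, which mirrors the requirement that the value be validated before the 'commit' button 1706 is activated.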

The DTS can also display forms that comprise one or more drop-down boxes (not shown in Fig. 17) in order to allow a user to select a unit of measurement from amongst a limited selection of valid units of measurement. For example, the form can comprise a drop-down box to allow a user to either select "metres" or "kilometres" as the unit of measurement for a "runway length" element. The valid units of measurement that are used to provide the options available for selection from the drop-down box are preferably contained in the Annex 15 XML schema 431, but may alternatively be provided in a separate configuration file. The use of such a drop-down box can prevent a user from specifying invalid units of measurement, and can also speed up the process of extracting aeronautical information from the SVG file 424. It will be appreciated that any other suitable sort of GUI feature that allows a user to select a unit of measurement from amongst a limited selection of valid units of measurement may also be used; for example, radio buttons may be used.

Whilst the form 1700 is illustrated in Fig. 17 as a separate window that is overlaid on the GUI window 1600 (a so-called "pop-up window"), it will be appreciated that the functionality of the form could be provided in a pane of the GUI window 1600 such that the form 1700 is persistently displayed.

When the user has extracted all of the aeronautical information from the SVG file 424, the XML document 428 contains all of the aeronautical information that was present in the paper AIP 420 or electronic AIP 426 in a format that complies with the Annex 15 XML schema 431.

It is possible to extract aeronautical information completely automatically, without requiring user input. In this case, rather than populating the 'current value' box 1704 with a text string that has been automatically extracted, the automatically-extracted text string is written to the XML document 428 without displaying the form 1700. For each item of automatically-extracted aeronautical information, the data written to the XML document 428 comprises: the element of the Annex 15 XML schema 431 that corresponds to the automatically-extracted text string; and the automatically-extracted text string itself, which is written as a value of the element. The automatically-extracted information may be validated against the Annex 15 XML schema 431 before it is written to the XML document 428. It is faster to extract aeronautical information completely automatically than to require user input. On the other hand, user input can allow errors to be detected and corrected, thereby improving the accuracy of the extracted information.

Once the aeronautical information contained in the paper AIP 420 or electronic AIP 426 is contained in an XML document 428 that complies with the Annex 15 XML schema 431, it is relatively simple to convert the aeronautical information to an XML document 430 that complies with the AIXM XML Schema. In particular, since the contents of the Annex 15 XML schema 431 and AIXM XML Schema are both known, it is a relatively simple task to program a computer to perform an automated conversion of each element of an XML document that complies with one schema to the corresponding elements of an XML document that complies with the other schema. For example, an appropriate XSL Transformation (XSLT) can be devised to convert an XML document 428 that complies with the Annex 15 XML schema 431 to an XML document 430 that complies with the AIXM XML Schema. XSLT is known to those skilled in the art of XML document processing and need not be described in detail here. Once the aeronautical information has been converted to an AIXM-compliant document, known techniques can be used to store the aeronautical information in a database whose structure is defined by AICM.
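The element-by-element conversion can be illustrated without XSLT by a simple mapping of element names, as in the Python sketch below. This is a stand-in for an XSL Transformation rather than an example of one, and every element name in it is hypothetical; none is taken from the Annex 15 XML schema 431 or from the AIXM XML Schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from source element names to target element names.
ELEMENT_MAP = {
    "aerodrome-name": "name",
    "runway-length": "length",
}


def convert(source_root, element_map, target_root_tag="converted"):
    """Copy each mapped child of the source document into a new document,
    renaming it according to the mapping. Unmapped children are skipped."""
    target = ET.Element(target_root_tag)
    for child in source_root:
        new_tag = element_map.get(child.tag)
        if new_tag is None:
            continue
        ET.SubElement(target, new_tag).text = child.text
    return target


source = ET.fromstring(
    "<aip>"
    "<aerodrome-name>Example Aerodrome</aerodrome-name>"
    "<runway-length>2500</runway-length>"
    "<audit/>"
    "</aip>"
)
converted = convert(source, ELEMENT_MAP)
```

A real conversion would also need to handle nested elements, attributes and units, which is where a declarative transformation language such as XSLT is likely to be more convenient than a flat mapping.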

In the foregoing description, steps 708, 810, 920, 930 and 940 all add XML elements to the same XML document 428. Alternatively, each of these steps can add XML elements to a respective intermediate XML document (not shown in Fig. 4), and the intermediate documents may be combined and/or transformed (using appropriate XSL Transformations, for example) in order to create a single Annex 15 XML document 428.

As noted previously, AIPs are also updated on a regular basis. The method described herein is preferably performed only once on a given AIP, so as to generate an XML representation of all the aeronautical information contained in a paper AIP 420 or electronic AIP 426 at a particular date. Since some of the aeronautical information extracted from this AIP will be superseded when an updated AIP is published, a computer program for updating the extracted information can be provided. This computer program preferably updates entries in the AICM database, but may additionally or alternatively update elements and values in the Annex 15-compliant XML document 428 or the AIXM-compliant XML document 430. A further computer program can also be provided to convert aeronautical data from the AICM database, Annex 15-compliant XML document 428 or the AIXM-compliant XML document 430 to an electronic AIP 426, which can then be printed to produce a paper AIP 420.

The method described herein can be performed by an apparatus. Preferably the apparatus comprises a general purpose computer, or a plurality of general purpose computers that are arranged to communicate with one another. Alternatively, the apparatus may be implemented by dedicated hardware, such as one or more application-specific integrated circuits (ASICs) or appropriately connected discrete logic gates. A suitable hardware description language can be used to implement the method described herein with dedicated hardware. In the event that the apparatus comprises a general purpose computer or a microprocessor, a processor readable medium comprising instructions for causing a suitable processor to perform the method described herein is also provided. The processor readable medium may be: a read-only memory (including a PROM, EPROM or EEPROM); random access memory; a flash memory; an electrical, electromagnetic or optical signal; a magnetic, optical or magneto-optical storage medium; one or more registers of a processor; or any other type of processor readable medium. The processor may be a microprocessor, a microcontroller or any other suitable device for executing stored instructions.

It will be understood that the invention has been described above purely by way of example, and that modifications of detail can be made within the scope of the invention.