

Title:
METHOD OF CODING VIDEO AND AUDIO TRANSMISSIONS
Document Type and Number:
WIPO Patent Application WO/2013/186543
Kind Code:
A2
Abstract:
The present invention provides a number of novel codecs for encoding still and moving images in vector form. In particular, aspects of the invention provide an adaptive codec for vectorisation of photographic images (VPI) and a motion codec which provides for encoding and decoding moving images making use of VPI.

Inventors:
WILLIS PHILIP (GB)
PATTERSON JOHN (GB)
Application Number:
PCT/GB2013/051522
Publication Date:
December 19, 2013
Filing Date:
June 11, 2013
Assignee:
UNIV BATH (GB)
Domestic Patent References:
WO2008003944A22008-01-10
Foreign References:
GB0613199A2006-07-03
Other References:
ABBASI; MOKHTARIAN; KITTLER: "Curvature scale space image in shape similarity retrieval", MULTIMEDIA SYSTEMS, vol. 7, 1999, pages 467 - 476
"Postscript Language Tutorial and Cookbook", 1985, ADDISON- WESLEY READING
AGUI, T. ET AL.: "An Automatic Interpolation Method of Gray-Valued Images Utilising Density Contour Lines", PROC. EUROGRAPHICS'87, pages 405 - 421
"Graphical Kernel System", February 1984, AMERICAN NATIONAL STANDARD INSTITUTE
"Programmer's Hierarchical Interactive Graphics Standard (PHIGS", February 1984, AMERICAN NATIONAL STANDARD INSTITUTE
ARMSTRONG, M. A.: "Basic Topology", 1983, SPRINGER SCIENCE
BARTELS, R.: "An Introduction to Splines for Use in Computer Graphics and Geometric Modelling", 1987, MORGAN KAUFMANN
BLINN, J.: "What is a Pixel?", IEEE CG&A, September 2005 (2005-09-01), pages 82 - 87
CARRATO S.: "A Simple Edge-Sensitive Image Interpolation Filter", PROC. INT. CONF. IMAGE PROCESSING, 1996, pages 311 - 314
CATMULL E.; SMITH A.R.: "Computer Graphics (Proc SIGGRAPH'80)", July 1980, ACM, article "3D Transformations of Images in Scanline Order", pages: 279 - 285
CRIMINISI A.: "Object Removal by Exemplar-Based In-Painting", PROC. CVPR'03, June 2003 (2003-06-01)
COCKTON G: "Architecture and Abstraction in Interactive Systems", PHD THESIS, 1993
CUISENAIRE, O.; MACQ, B.: "Fast and Exact Signed Euclidean Distance Transformation with Linear Complexity", IEEE PROC ACOUSTICS, SPEECH, AND SIGNAL PROCESSING 99, vol. 6, 15 March 1999 (1999-03-15)
DUCE, D. ET AL.: "SVG Tutorial", W3C, 2000
STANDARD FOR CHARACTERIZATION OF IMAGE SENSORS AND CAMERAS, RELEASE 3.0, 29 November 2010 (2010-11-29)
FABBRI, R.: "2D Euclidean Distance Transform Algorithms: A Comparative Survey", ACM COMPUTING SURVEYS, vol. 40, no. 1, February 2008 (2008-02-01), pages 2.1 - 2.44
FARIN G.: "Curves and Surfaces for Computer-Aided Geometric Design(4th edition)", 1996, ACADEMIC PRESS
FERRAIOLO J.: "Scalable Vector Graphics (SVG) 1.0 Specification", WORLD WIDE WEB CONSORTIUM, 2000
FINCH C.: "The Art of Walt Disney", 1973, HARRY N. ABRAMS, INC., pages: 180
GLASSNER, A.: "An Introduction to Ray Tracing", 1989, MORGAN KAUFMANN
ITOH K; OHNO Y: "A curve fitting algorithm for character fonts", vol. 6, September 1993, ELECTRONIC PUBLISHING, pages: 195 - 205
KIM, T-H ET AL.: "Image Dequantisation: Restoration of Quantized Colours", PROC. EUROGRAPHICS 2007 IN COMPUTER GRAPHICS FORUM, vol. 26, no. 3, 2007, pages 619 - 626
KUNII, T. L; MAEDA, T.: "Proceedings of Computer Animation '96", 1996, IEEE COMPUTER SOCIETY PRESS, article "On the Silhouette Cartoon Animation", pages: 110 - 117
KUNII,T.L. ET AL.: "Proceedings of Third Asian Technology Conference in Mathematics", 1998, SPRINGER, article "Technological Impact of Modern Abstract Mathematics"
LECOT, G; LEVY, B: "ARDECO: Automatic Region Detection and Conversion", PROC. EUROGRAPHICS SYMPOSIUM ON RENDERING, 2006, pages 1604 - 1616
LINDEBERG, T.: "Edge Detection and Ridge Detection with Automatic Scale Selection", INT. J. COMP. VIS., vol. 30, no. 2, pages 117 - 154
LORENSEN W.E.; CLINE, H.E.: "Marching Cubes: a High Resolution 3D Surface Reconstruction Algorithm", COMPUTER GRAPHICS, 1987 (PROC. SIGGRAPH'87), vol. 21, no. 4, pages 163 - 169
MATHERON G.: "Random Sets and Integral Geometry", 1975, WILEY
MEIJSTER, A.; ROERDINK, J.; HESSELINK, W.: "A general algorithm for computing distance transforms in linear time", PROCEEDINGS OF THE 5TH INTERNATIONAL CONFERENCE ON MATHEMATICAL MORPHOLOGY AND ITS APPLICATIONS TO IMAGE AND SIGNAL PROCESSING, 2000
NAKAJIMA, M: "Coding Method of Gray-Valued Image by Density Contour Lines", TECH. REP. I.E.C.E. JAPAN IE83-74, 1983, pages 1 - 6
NEBEL J.C.; COCKSHOTT W.P.; YARMOLENKO V.; BORLAND E.; MACVICAR D.: "Pre- commercial 3-D digital TV studio", IEE PROCEEDINGS - VISION, IMAGE, AND SIGNAL PROCESSING, 2005
ODDY R.; WILLIS P.: "A Physically Based Colour Model", COMPUTER GRAPHICS FORUM, vol. 10, no. 2, June 1991 (1991-06-01), pages 121 - 127
ORZAN A.: "Structure Preserving Manipulation of Photographs", PROC. NPAR, 2007, pages 103 - 110
ORZAN A.: "Diffusion Curves: A Vector Representation for Smooth-Shade Images", PROC. SIGGRAPH, 2008
PATTERSON J.; COCKTON G.: "Compositing Hierarchically Structured Images", COMPUTER GRAPHICS FORUM, vol. 11, no. 3, September 1992 (1992-09-01)
PORTER T.; DUFF T.: "Compositing Digital Images", PROC SIGGRAPH '84, July 1984 (1984-07-01)
PRICE, B.; BARRETT, W.: "Object-Based Vectorisation for Interactive Image Editing", VIS. COMP., vol. 22, no. 9-11, 2006, pages 661 - 670
REEVES, W.T.: "In-betweening for Computer Animation Utilising Moving Point Constraints", COMPUTER GRAPHICS (PROC SIGGRAPH '81), vol. 15, no. 3, pages 269 - 276
SCHNEIDER P.: "An Algorithm for Automatically Fitting Digitised Curves", 1990, ACADEMIC PRESS
SERRA J.: "Image Analysis and Mathematical Morphology", 1982, ACADEMIC PRESS
SETHIAN J.A.: "Level Set methods and Fast Marching Methods, Second Edition,", 1999
SHANTZIS M.: "A Model for Efficient and Flexible Image Computing", PROC SIGGRAPH’85, August 1985 (1985-08-01)
SMITH, A. R.: "A Pixel is Not a Little Square, A Pixel is Not a Little Square, A Pixel is Not a Little Square! (And a Voxel is Not a Little Cube)", TECHNICAL MEMO 6, 1995
SUN, J ET AL.: "Image Vectorisation using Optimized Gradient Meshes", ACM TOG, vol. 26, no. 3, 2007, Article 11
TAYLOR, C.D PRIVATE COMMUNICATION
VANSICHEM G.; WAUTERS E.; VAN REETH F.: "Real-Time Modelled Drawing and Manipulation of Stylised Characters in a Cartoon Animation Context", PROC. IASTED INTERNATIONAL CONFERENCE ON COMPUTER GRAPHICS AND IMAGING (CGIM 2001), 2001, pages 44 - 49
VINCENT L.: "Morphological Grayscale Reconstruction in Image Analysis: Applications and Efficient Algorithms", IEEE TRANS. IP, vol. 2, no. 2, April 1993 (1993-04-01), pages 176 - 201
WARNTZ W.; WOLDENBERG, M.: "Concepts and Applications -Spatial order", HARVARD PAPERS IN THEORETICAL GEOGRAPHY, May 1967 (1967-05-01)
WEISS, B: "Fast Median and Bilateral Filtering", PROC SIGGRAPH, 2006
YU, J-H: "Polar coordinates linear interpolation (PCLI)", DEPT OF COMPUTING SCIENCE, 1990
YU X.: "Image Reconstruction Using Data-Dependent Triangulation", IEEE CG&A, vol. 21, 2001, pages 62 - 68
YUKSEL C.: "On the parameterisation of Catmull-Rom Curves", PROC SIGGRAPH, 2010
ZHANG L; SNAVELY N; CURLESS B; SEITZ S.M: "Spacetime faces: high resolution capture for modeling and animation", PROC SIGGRAPH 04
Attorney, Agent or Firm:
HODSDON, Stephen J. et al. (33 Gutter Lane, London, Greater London EC2V 8AS, GB)
Claims:
CLAIMS

1. A method of encoding an image in vector form, the method including the following steps:

a) reading the image in pixel form;

b) computing the errors in pixel values;

c) determining a plurality of levels to subdivide the pixel values;

d) finding the intersections of each of said levels with the pixels;

e) identifying extrema candidates outside said levels; and

f) compiling said intersections into one or more chains defining a contour for each of said levels using the errors in the pixel values as tolerances for the fitting of said chains, wherein said step f) further includes the steps of:

for each of said intersections, creating a contour path in the form of a polygon whose vertices are the intersection of the level line of the contour with the interpolation of the adjacent pixel values together with the interpolated errors in those pixel values;

resampling the image along the contour path at a fixed chord length which is determined as an exact fraction of the total border length of said polygon sufficient to suppress pixel-pitch noise;

forming forward differences up to the fourth difference for each coordinate on said resampled contour path;

determining the longest cubic curve that passes a plurality of said coordinates on the resampled contour path within the error bounds for each of said coordinates and which has the smallest fourth differences at its end points, and repeating until all the coordinates on said resampled contour path are associated with at least one such cubic curve; and

reconciling any overlapping end points between the determined cubic curves to form the complete Bezier chain defining the contour; and

calculating all interior control points defining the cubic curves in said chain.

2. A method of encoding an image according to claim 1, wherein the step of determining the cubic curve includes the sub-steps of:

creating a list of said fourth differences summed in x and y and sorted by absolute magnitude;

for each potential Bezier segment in said chain, repeatedly:

selecting the smallest unused fourth difference and discarding it from the list and retaining the difference table index; computing a stack of central differences consistent with a cubic curve passing through the curve point of said index;

using said difference stack to generate cubic values which lie within the error bounds of the points used to form said differences and restarting the process when said cubic values exceed starting point error bounds by reorienting the curve or changing the shape of the curve, or terminating the process when restarting is no longer possible;

until all the resampled points in the difference table are associated with at least one cubic segment.

3. A method of decoding a vector encoded image, the method including the steps of: reading the encoded image data; and

selecting each enclosing contour and repeatedly, for each enclosing contour:

reconstructing a footprint using tiles which are a square array of sample values of equal or higher resolution than the original image sampling, initially at a predetermined rendering resolution;

rendering contours into enclosing contour footprint tiles at said predetermined rendering resolution for border and enclosed contours;

if a contour is already present in a particular tile, subdividing the tile using a rule with progressively smaller bias coefficients in each subdivision level until the tiles each contain only one piece of a single contour;

filling tiles within footprints with quasi-Euclidean distances i from the enclosing contour boundary, and propagating border slopes synthesised by scaling by distance from contributing sample points;

filling enclosing contour footprint tiles minus enclosed contour footprints with quasi-Euclidean distances j from enclosed contour boundaries, again synthesising interior slopes from contributing pixels; and

interpolating contour levels using i/(i+j) as an index into tiles.

4. A method according to claim 3, wherein the quasi-Euclidean distances i and j are the Euclidean distances from the enclosing contour boundary modified to provide intracontour smoothing.

5. A method of encoding a series of moving images in vector form, the method including the steps of:

setting up a package structure; repeatedly, until said package structure is full:

obtaining a pixel array representing the next image in the series; encoding said pixel array in vector form and placing the result of said encoding in a predetermined data structure;

identifying common substructures within said encoded array compared to one or more previous arrays and placing said identified common substructures in a group list; and adding said group list of common substructures and said encoded pixel array to said package structure, and

sending said full package,

wherein the step of identifying common substructures identifies common contours within said array and further includes the steps of:

modifying the contour structure to form a contour scale space in which successive versions of the contour are fitted with Bezier chains together with petal numbers, petal maps and curvature maps;

using the contour scale space to populate a contour similarity hierarchy which has three discriminants: a level number which is the number of levels needed to achieve a hierarchy, a maximum side length for the bounding box for the contour and an aspect ratio for the adjacent bounding box sides;

for each contour in said array, if a match is found within said hierarchy:

scaling, rotating or translating the input contour to match the bounding box;

working systematically through the levels in said hierarchy in a descending manner, matching petal counts, petal maps and curvature maps using tree-based sorting of indices which determine if the discriminant is fine enough, and discarded at that level otherwise;

at each level:

if a successful match is found, setting a flag indicating this and proceeding to matching at the next lower level, if present, and forwarding the flag if not present;

if a match is found within one or more regions of the contour in the hierarchy setting a flag indicating this and proceeding to matching at the next lower level, if present, for the regions of the contour that match, and forwarding of the unmatching portions with the flag if not present;

unsuccessful matching causes entry of the remaining levels of the contour into the hierarchy and the forwarding of the contour as novel:

if a match is not found within the hierarchy, then the characteristics of the contour are entered into the hierarchy as new initial discriminants along with the hierarchy descending from them as an alternative sub-hierarchy.

6. A method according to claim 5 wherein the scale factors used in scaling the input contour are retained to provide independent scaling in x and y directions in order to discriminate between contours simplified using the contour scale space.

7. A method of decoding a series of vector encoded images, the method including the steps of:

receiving image data comprising: data describing a plurality of contours and data describing a plurality of transformations to be performed on said contours;

storing, in a cache, a plurality of said contours;

reconstructing each image in the series by applying the specified transformations to one or more of the stored contours and rendering the image using the transformed contours.

8. A method according to claim 7 in which the series of images are encoded by a method according to claim 5 or claim 6.

9. A computer program which, when run on a computer, performs the method of any one of claims 1 to 8.

10. A computer data carrier containing a computer program according to claim 9.

11. A read-only memory device arranged to perform the method of any one of claims 1 to 8.

12. A programmable logic array arranged to perform the method of any one of claims 1 to 8.

Description:
METHOD OF CODING VIDEO AND AUDIO TRANSMISSIONS

Field of the Invention

The present invention relates to methods of encoding and decoding pictures and video transmissions.

Background of the Invention

Over the past few years, increasing amounts of video data have been produced and are being transmitted across computer networks. This is particularly true with the advent of services which offer individual customers "on demand" access to television programmes and films. However, the availability of such services can often be hampered by the size of the data files storing the appropriate video or audio data whilst retaining the desired quality of picture or sound. The size of the data files is not only relevant for the storage of these files on the servers of service providers and in back up locations, but also results in increased demand on bandwidth of the computer networks used to distribute this data to the end user at an acceptable speed and resolution.

In order to capture video images or audio recordings at high resolution, it is often necessary to apply some form of encoding or compression at the stage of recording those images or sounds, in order to allow the high resolution images or data to be extracted from the recording device at a sufficient rate to match the quantity of data being recorded. A common problem in distributing digital images and movies is that of catering for varying image or film formats. For example a short sequence of a feature film may be shown on standard TV (768x576), HDTV (1920x1024), internet video (various), or even mobile phones (anything from 384 x 256 upwards). If shown for publicity reasons the producers will want this to be shown at the best quality possible. In effects-houses which concentrate on advertising a significant proportion of time is spent just converting between the various digital formats on which the advertisement is to be shown. The problem arises because all digital images have to be sampled in order to be seen at all, but different kinds of display device have, of necessity, to show the images at differing resolutions. One possible solution to this problem is to use a resolution-independent image format which provides a rule for determining colours in the image anywhere within the image boundaries, independently of any sampling grid. Vector formats, historically associated with drawn images rather than with photographs, can provide such a resolution-independent rule but none of the existing fully automatic fill rules for vector formats work well on photographic images.

There are a number of systems and plug-ins available to turn images into vector form (e.g. Adobe Live Trace™) but they are all to an extent compromised by the absence of a good rule for determining varying colour values in a continuous field. Tools like Live Trace™ extract isochromic contours from sampled images but have to extract a sufficiently large number of contours to preserve the illusion of a smooth surface on a sampled display. This is because the usual rule for automatically filling between contours is to provide a constant colour (here called flat fill) in the viewable rasterised version of the image so the results can be thought of as providing a series of step changes in colour values rather than a continuous variation in those values.

Figure 1 (i) shows the original rendering of a photograph of Baku Underground station. In Figure 1 (ii) we have used a flat fill regime on an image which was vectorised with the intention of using a diffusion-based fill regime of the kind described in the present application, for comparison. An aim of the diffusion-based approach is to allow for the use of fewer contours without compromise to the appearance of smoother parts of the image, and this has shown up a particular weakness of flat fill in the form of visible bands of colour changes (Mach bands). Indeed the usual application of such systems is to produce an artistic effect rather than a realistic outcome, so less care is taken about finding a truly accurate contour. It is therefore desirable to provide a method that produces high-quality (high-accuracy) contours to achieve synthetic images which are intended to be visually indistinguishable from the originals in an Adaptive codec.

It is an aim of the present invention to provide methods and systems for rendering contoured images and sequences to give high-fidelity and other photorealistic outputs without Mach bands. Encoding/compression techniques for images (still or video) and sound generally comprise two complementary portions: an encoder for producing the encoded/compressed signal from the raw input data and a decoder for producing the output data (sound or image) from the encoded data. Collectively these coding/decoding methods are normally referred to as "codecs".

Vectorising (contourising) as applied to an array of sample points is a technique whose origins go back to geographical information systems [War&67], where contour maps were to be produced from spot height surveys, and was first applied directly to photographic images by Matheron [1975] and later by Serra [1982] (these are the early references to the morphological approach). At about the same time Nakajima et al. [1983] (a more accessible reference is Agui et al. [1987]) proposed an approach more closely allied to the techniques of computer graphics. The fill method used in image reconstruction in all these references is flat fill. More recently Price & Barrett [2006] and Sun et al. [2006] have proposed methods for generalising from flat fill while keeping the same number of contours [Sun&06] or sample points [Pric&06] by building an adaptively subdivided mesh where colours are associated with mesh intersection points, although some interaction is required to determine the starting shape of the mesh. Both the method of contour finding and mesh generation (Live Trace™ [Pric&06][Sun&06] and the gradient mesh tool [Sun&06] respectively) are available in commercial drawing packages but the papers focus on smooth colour interpolation and mesh optimisation for minimal dataset size. However this approach simply swaps one set of sample points for another more feature-oriented set, as befits the type of calculation they want to do, and offers no help to calculations like histogram equalisation or processes involving preserving features of the isosurface topology (the isosurface is defined by a given set of isochromic contours selected from the set of quantised colours used). Another approach is to use a data-dependent triangulation (DDT) based on the range of techniques described by Dyn et al. [1990], for example the approach used by Yu et al. [2001]. Here the idea is to adaptively triangulate the image so that each vertex is associated with a colour and the image is reconstructed by a barycentric interpolation between the colour values associated with each vertex. The main part of the algorithm involves an initial convex hull triangulation based on image sample points so that triangle edges approximately align themselves with isochromic contours while optimising an overall smoothness function, and then improves this triangulation by edge reconnections and deletions. While the contours are not found explicitly, the alignment of some edges will correspond to image features and the optimisation steps are aimed at enhancing these correspondences. This in effect replaces a sampled form of the isochromic surface with an approximate triangulation of the continuous isosurface, as is commonly done in topology [Arms83]. However, once again there is no support for the kinds of image operation we consider here as, for example, there is no guarantee that a given isochromic contour will be matched everywhere by a connected set of triangle edges. There is also the question as to whether a planar triangulation will survive an arbitrary (2D) image warping operation.

There are, however, other approaches to the problem of vectorising images than by using isochromic contours, for example the ARDECO approach [Leco&06] where a segmentation, filled by the various fills available to SVG, is fitted to an image to within a given error bound, here a threshold. The resulting structure can be made to support statistical image manipulation operations like histogram equalisation in a similar way to our diffusion approach. In fact it is possible to extend 'flat fill' for a uniformly quantised colour space to a diffusion-like fill using the technique of Kim et al. [2006] and that would include the simple contourisation model we have used for the images presented here, but not its generalisations. As Kim et al.'s [2006] algorithm attempts to model the isosurface as a smooth continuous surface everywhere other than at edges (so is non-linear anywhere) this complicates the working of an algorithm like histogram equalisation, but does provide some degree of support. Another approach which does not use contours, but rather chooses edges as the key feature, is that due to Orzan et al. [2007][2008]. Here the idea is to use edge-lines as the vectors and to decorate the lines with colour data. This is then propagated away from the edge using a Poisson equation. When used for images the edge lines correspond to discontinuities in otherwise smooth shading, and reconstructed images look a lot like their original forms, although the unconstrained use of Poisson equation diffusion results in quite inaccurate diffusion boundaries. It is the insight of Lindeberg [1998] (and others) that this inaccuracy tends not to be noticed which Orzan et al. are exploiting here. They also note ([Orza&08]) that a similar decorated-edge representation can be used to produce smooth-shaded images of a kind difficult to generate by other means. Similarly the diffusion method described here could be used to generate different forms of smooth-shaded synthetic vector image, although the method of control would be quite different.

In the end any accurate method of vectorising a photographic image needs to have some kind of interpretation of just what a pixel is, and in essence we make a different interpretation of an image pixel on input from the interpretation made on output. Two papers which discuss this problem in ways we pay particular attention to are those due to Blinn [2005] and to Smith [1995]. Blinn [2005] discusses eleven distinct ways in which to consider what a pixel actually is, including a discussion of the relationship between a sample and a sensor which generates samples, while Smith [1995] makes a strong argument for not considering a pixel as being a square over which some simple form of integration is done (e.g. a tent filter).

Codecs for producing vectorised photographic images (VPI) are known which have no adaptive capability (i.e. take no account of the characteristics of the input data in order to improve compression and/or speed of coding/decoding). Such codecs are sometimes referred to as "blind" codecs. Figure 1 is all generated from a first, incomplete, implementation of the vectorising image (Blind VPI) codec which was nevertheless a proof-of-concept for many of the key ideas revealed here, and that codec is also detailed below.

An image vectorising codec starts from a sampled image, typically one obtained from a digital camera, encodes it into an annotated contour set (derived as and collected into level sets) and subsequently decodes it back into a sampled image after any required image transformation processes have been applied to it. A diagram of the main encoder stages of a blind VPI encoder is shown in Figure 2 and detailed below.

An example of a method which carries out blind VPI encoding has previously been described by Peter Balch. The method of encoding set out in that document can be conveniently summarised by the steps set out in Table 1 below.

Table 1

1.2 Until end of last component

2 Until last level processed:

Format data structure and write to file as grouped YUV contours in VPI format

Explanatory notes to Table 1 (indicated by [x] in table)

[1] The Carrato algorithm in the x-direction for a given y is:

$$\phi_{k+\frac{1}{2}} = \frac{k_1\,\phi_k + k_2\,\phi_{k+1}}{k_1 + k_2}, \quad \text{where } k_1 = p + (\Delta\phi_{k-1})^2 \text{ and } k_2 = p + (\Delta\phi_{k+1})^2$$

The Delta notation is explained in note 7.3 to Table 4 below. The value p=0.5 is used as default but may be adjusted.

Error terms are calculated in the usual way assuming:

and φ is the pixel value. Similar formulae apply in the y-direction.

[2] The factor of 2 increase in size (see also note [5] below) was chosen to substitute for adaptive subdivision around the edges to get accurate edge values.

[3] Although U, V are sampled at 5 quantisation level intervals the amount of computation required was far lower than for Y due to lack of detail. U, V file sizes were also much smaller.
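By way of illustration, note [1]'s edge-sensitive interpolation can be sketched as follows (Python with NumPy). The weight formulae follow the reconstruction of the Carrato formula given above and should be treated as an assumption; the function name and boundary handling are illustrative only.

```python
import numpy as np

def carrato_interpolate_row(row, p=0.5):
    """Edge-sensitive interpolation of the midpoints along one image row.

    Midpoint values are a weighted mix of the two neighbouring pixels,
    with weights k1, k2 built from the squared local gradients so the
    result leans towards the flatter side and edges are not blurred.
    The weight formulae are an assumption reconstructed from note [1].
    """
    row = np.asarray(row, dtype=float)
    d = np.diff(row)                               # local gradients (the Delta values)
    mid = np.empty(len(row) - 1)
    for k in range(len(row) - 1):
        dl = d[k - 1] if k > 0 else d[k]           # gradient to the left
        dr = d[k + 1] if k + 1 < len(d) else d[k]  # gradient to the right
        k1 = p + dl * dl
        k2 = p + dr * dr
        # weight each neighbour by the opposite side's roughness, so the
        # flatter side dominates the interpolated value
        mid[k] = (k2 * row[k] + k1 * row[k + 1]) / (k1 + k2)
    return mid
```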

The resulting algorithm is set out in Table 2 below:

Table 2

Notes to Table 2 above:

General description (as set out in Balch P. "VPI Algorithm: Simplify Contour Map" VPI papers, University of Glasgow, 2006): The algorithm which finds the contours provides a list of intersections (referred to here as the 'raw' contour) with integral x and y values, which is basically a polyline format and essentially overdetermined. Algorithms are known to accomplish this, e.g. Schneider [1990] and Vansichem [2001], but they usually assume a fixed error around each point to be fitted, whereas we have a variable error bound for each point. Both the Balch algorithm and the present invention deal with this case, albeit in different ways. Each 'raw' contour therefore has to be simplified by reducing the number of points which specify it. The simplified contour line is made of cubic curves specified by their Bezier knots and control points. The curves for a contour form a continuous closed loop. The joins between curves are not smooth. The curves are such that every 'raw' data point has a curve passing through its error circle. The algorithm described so far generates a joined-up chain of cubic curves but the lines do not join smoothly. It is possible to use a modified form of Schneider's algorithm [Schn90] to make smoothly joined curves. However, generally speaking this does not appear to be worth it: it makes the curves slower to calculate, more curves are needed (twice as many on average) and they do not noticeably improve the quality of the resulting reconstructed image.

Continuity between Bezier chain segments was ignored unless tangent angles differed by less than a threshold amount when they were 'snapped together' (averaged). This gave good results.

The total encoding time for this encoder was 7.5 seconds for a 256 x 257 image on a 1 GHz machine. The Bezier chain finding (step 2.1.1.3.1) took 'negligible time' within this. (Note that the factor of 2 in [2] above is alone responsible for an increase of 4x in time taken compared to the methods of the embodiments below.)

Similarly, the same publication disclosed a corresponding blind VPI decoder which can be conveniently summarised by the steps set out in Table 3 below and is illustrated in outline in Figure 3.

Table 3

Explanatory notes to Table 3 (indicated by [x] in table)

[1] Manhattan distance is used as an acceptable simplification of Euclidean distance but this is not ideal in extreme situations. It can also exaggerate creases in unsmoothed surfaces.

[2] The correct formulae (see note [4]) are given by the diffusion (Eikonal) equations:

and similarly for the other direction of diffusion (κ = −1 × the value shown here). If φ and ψ are the diffusion border coordinates at time t they also form (Euclidean) distances, so interpolation indices can substitute for i, j above. These did not assume smoothing, which exceptional cases show needs to be taken into account. Here 'dist' is the geodesic distance measure given in Patterson, Taylor & Willis, 'Constructing and Rendering Vectorised Photographic Images', Proc CVMP 09, also JVRB (in press), and R is the 'upwind' boundary.

[3] Rendering was carried out at 4 x resolution to avoid the need for adaptive tiling and explicit sampling of the image space. Given the resulting compute times (10 seconds for a 256 x 257 image on a 1 GHz machine) it might have been possible to avoid this, with a potential gain of 16x speed (more likely 10x), but this would have needed Euclidean distances, not Manhattan distances, as a base with associated additional time cost.

[4] Explicit Euclidean Distance algorithms which do not use level sets and run in linear time are known, for example as disclosed in Fabbri, R. et al., "2D Euclidean Distance Transform Algorithms: A Comparative Survey", ACM Computing Surveys, 40, 1 (Article 2), February 2008.

The present inventors have also proposed an improved method of vector representation of images which is described in detail in WO2008/003944.

It is therefore an object of the present invention to provide one or more codecs (or more specifically methods of encoding and decoding) for audio and video which provide improvements in efficiency of processing and signal bandwidth.

It is a further object of the present invention to provide one or more codecs for video which have adaptive characteristics to improve their efficiency.

A further objective of aspects of the present invention is to provide codecs that are resolution independent not only spatially but also in time. Such time-independence means that not only can image aspect ratios and a number of single-image characteristics be altered, but also the output frame rate.

Accordingly, at its broadest, a first aspect of the present invention provides a method for vector encoding of photographic (i.e. still) images which is adaptive and makes use of a novel approach to contour fitting.

A first aspect of the present invention preferably provides a method of encoding an image in vector form, the method including the following steps:

a) reading the image in pixel form;

b) computing the errors in pixel values;

c) determining a plurality of levels to subdivide the pixel values;

d) finding the intersections of each of said levels with the pixels;

e) identifying extrema candidates outside said levels; and

f) compiling said intersections into one or more chains defining a contour for each of said levels using the errors in the pixel values as tolerances for the fitting of said chains, wherein said step f) further includes the steps of:

for each of said intersections, creating a contour path in the form of a polygon whose vertices are the intersection of the level line of the contour with the interpolation of the adjacent pixel values together with the interpolated errors in those pixel values;

resampling the image along the contour path at a fixed chord length which is determined as an exact fraction of the total border length of said polygon sufficient to suppress pixel-pitch noise;

forming forward differences up to the fourth difference for each coordinate on said resampled contour path;

determining the longest cubic curve that passes a plurality of said coordinates on the resampled contour path within the error bounds for each of said coordinates and which has the smallest fourth differences at its end points, and repeating until all the coordinates on said resampled contour path are associated with at least one such cubic curve; and

reconciling any overlapping end points between the determined cubic curves to form the complete Bezier chain defining the contour; and

calculating all interior control points defining the cubic curves in said chain.

Preferably the step of determining the cubic curve includes the sub-steps of:

creating a list of said fourth differences summed in x and y and sorted by absolute magnitude;

for each potential Bezier segment in said chain, repeatedly:

selecting the smallest unused fourth difference and discarding it from the list and retaining the difference table index;

computing a stack of central differences consistent with a cubic curve passing through the curve point of said index;

using said difference stack to generate cubic values which lie within the error bounds of the points used to form said differences and restarting the process when said cubic values exceed starting point error bounds by reorienting the curve or changing the shape of the curve, or terminating the process when restarting is no longer possible;

until all the resampled points in the difference table are associated with at least one cubic segment.

This is a completely novel algorithm which works using principles substantially different to any other curve-fitting algorithm. Tests show it is the fastest algorithm known for this purpose and yields the longest continuous curves, and it is therefore optimal for this application. The use of this algorithm also makes gathering petal maps etc. easy and is highly efficient when mapping out curvature scale-spaces.
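The difference bookkeeping at the heart of the algorithm can be sketched as follows (Python with NumPy). This shows only the seed-ordering step, under the assumption of points resampled at a fixed chord length, for which a truly cubic span has vanishing fourth differences; the function names are illustrative, and the full fitter (error-bound tracking, restarts and end-point reconciliation) is not shown.

```python
import numpy as np

def fourth_differences(samples):
    """Forward differences up to the fourth order for one coordinate."""
    d = np.asarray(samples, dtype=float)
    for _ in range(4):
        d = np.diff(d)
    return d

def seed_order(xs, ys):
    """Order in which cubic seeds are tried: indices into the difference
    table sorted by |Delta^4 x| + |Delta^4 y|, smallest first.  A span
    sampled at fixed chord length that is exactly cubic has vanishing
    fourth differences, so small values mark promising seed points."""
    score = np.abs(fourth_differences(xs)) + np.abs(fourth_differences(ys))
    return np.argsort(score)
```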

At its broadest a second aspect of the present invention provides a method for decoding images encoded in vector format which makes use of the manipulation of the distance values to produce smoothed interpolants. Preferably the method of this aspect is suitable for decoding images encoded by the method of the above first aspect including some, all or none of the optional or preferred features of that aspect.

Preferably the method of the second aspect provides a method of decoding a vector encoded image, the method including the steps of:

reading the encoded image data; and

selecting each enclosing contour and repeatedly, for each enclosing contour:

reconstructing a footprint using tiles which are a square array of sample values of equal or higher resolution than the original image sampling, initially at a predetermined rendering resolution;

rendering contours into enclosing contour footprint tiles at said predetermined rendering resolution for border and enclosed contours;

if a contour is already present in a particular tile, subdividing the tile using a rule with progressively smaller bias coefficients in each subdivision level until the tiles each contain only one piece of a single contour;

filling tiles within footprints with quasi-Euclidean distances i from the enclosing contour boundary, and propagating border slopes synthesised by scaling by distance from contributing sample points;

filling enclosing contour footprint tiles minus enclosed contour footprints with quasi-Euclidean distances j from enclosed contour boundaries, again synthesising interior slopes from contributing pixels; and

interpolating contour levels using i/(i+j) as an index into tiles.

The above method steps may also be used in a method of encoding an image in vector form as part of the test render(s) in that method and this forms a further aspect of the present invention.

Preferably the quasi-Euclidean distances i and j are the Euclidean distances from the enclosing contour boundary modified to provide intracontour smoothing.

This method aims to produce smoothed interpolants which do not produce the artefacts in the rendered image the way that unsmoothed interpolants do.
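The core of the fill rule can be sketched as follows (Python with NumPy; the function name is illustrative, and the tiling and border-slope propagation described above are omitted):

```python
import numpy as np

def blend_levels(i_dist, j_dist, outer_level, inner_level):
    """Blend two contour levels from the distance fields described above.

    i_dist: quasi-Euclidean distance from the enclosing contour
    j_dist: quasi-Euclidean distance from the enclosed contour
    The index i/(i+j) runs from 0 on the enclosing contour to 1 on the
    enclosed one, giving a linear interpolation between their levels.
    """
    i = np.asarray(i_dist, dtype=float)
    j = np.asarray(j_dist, dtype=float)
    t = i / np.maximum(i + j, 1e-12)   # guard the degenerate case i = j = 0
    return (1.0 - t) * outer_level + t * inner_level
```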

The above aspects of the invention relate to a single image codec, the so-called "VPI codec", which may then form the core of a further codec, the so-called "Motion-VPI" codec, which forms further aspects of the present invention.

Motion-VPI in essence shows the original images after encoding them as single images and then decoding them again from their internal formats. This is analogous to motion-JPEG, a particular form of MPEG which constitutes a stream of JPEG images restricted in their compression options for consistency. However, unlike motion-JPEG, motion-VPI can take advantage of the similarities between contours, to the point that no data (contours or parts of contours) ever needs to be sent twice. This is preferably effected by a sieve algorithm which classifies contours so as to reject dissimilar ones very quickly, with its most expensive stages being reserved for the most similar contours. The decoder maintains a cache (which mirrors a cache maintained by the encoder) and may be instructed to take a given contour from its cache rather than receive it from the encoder. In this sense it is expected that data reduction advantages would be comparable to true MPEG even though motion vectors are not explicitly employed.

At its broadest a third aspect of the present invention provides a method for vector encoding of moving images which is an adaptation from the encoding of still images in which images are encoded, preferably using adaptive VPI, and common substructures between successive images identified. Preferably the method of this aspect uses a sifting contour comparator which exploits a multistage "fast reject" process which sorts the contours into categories and sub-categories.

Preferably the third aspect of the present invention provides a method of encoding a series of moving images in vector form, the method including the steps of:

setting up a package structure;

repeatedly, until said package structure is full:

obtaining a pixel array representing the next image in the series; encoding said pixel array in vector form and placing the result of said encoding in a predetermined data structure;

identifying common substructures within said encoded array compared to one or more previous arrays and placing said identified common substructures in a group list; and adding said group list of common substructures and said encoded pixel array to said package structure, and

sending said full package,

wherein the step of identifying common substructures identifies common contours within said array and further includes the steps of:

modifying the contour structure to form a contour scale space in which successive versions of the contour are fitted with Bezier chains together with petal numbers, petal maps and curvature maps;

using the contour scale space to populate a contour similarity hierarchy which has three discriminants: a level number which is the number of levels needed to achieve a hierarchy, a maximum side length for the bounding box for the contour and an aspect ratio for the adjacent bounding box sides;

for each contour in said array, if a match is found within said hierarchy:

scaling, rotating or translating the input contour to match the bounding box;

working systematically through the levels in said hierarchy in a descending manner, matching petal counts, petal maps and curvature maps using tree-based sorting of indices which determine if the discriminant is fine enough, and discarded at that level otherwise;

at each level:

if a successful match is found, setting a flag indicating this and proceeding to matching at the next lower level, if present, and forwarding the flag if not present;

if a match is found within one or more regions of the contour in the hierarchy setting a flag indicating this and proceeding to matching at the next lower level, if present, for the regions of the contour that match, and forwarding of the unmatching portions with the flag if not present;

unsuccessful matching causes entry of the remaining levels of the contour into the hierarchy and the forwarding of the contour as novel:

if a match is not found within the hierarchy, then the characteristics of the contour are entered into the hierarchy as new initial discriminants along with the hierarchy descending from them as an alternative sub-hierarchy.

This method uses a novel form of curvature scale space (or contour scale space), and is preferably used in conjunction with the 'fountains of difference' method described in the above first aspect. In combination these methods can allow a much more accurate implementation than has been achieved previously because the scales can be found quickly in terms of complete contours held as Bezier chains. The whole contour can be considered, then simplified into maps for efficient matching at higher levels using a tree-based discriminator which can also act as a test for discriminability (i.e. whether the test offers good enough discrimination, or not). The use of the various petal and curvature maps in curve orientation and their properties is also previously unknown, and using these methods in combination, the corners can be found easily and included in maps.
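The sieve's fast-reject staging can be sketched as follows (Python; the record layout, tolerances and field names are illustrative assumptions rather than the patent's own data structures):

```python
from dataclasses import dataclass

@dataclass
class ContourKey:
    max_side: float        # longest bounding-box side
    aspect: float          # ratio of adjacent bounding-box sides
    petals: int            # petal count at this scale-space level
    curvature_map: tuple = ()

def sieve_match(key, hierarchy, tol=0.1):
    """Multistage fast-reject matching of one contour against the
    similarity hierarchy: cheap discriminants first, so dissimilar
    contours are rejected quickly and the expensive map comparison
    runs only on near matches."""
    # stage 1: bounding-box discriminants (cheapest, rejects most)
    cands = [c for c in hierarchy
             if abs(c.max_side - key.max_side) <= tol * key.max_side
             and abs(c.aspect - key.aspect) <= tol * key.aspect]
    # stage 2: petal counts must agree
    cands = [c for c in cands if c.petals == key.petals]
    # stage 3: full curvature-map comparison, only for the survivors
    for c in cands:
        if c.curvature_map == key.curvature_map:
            return c           # match: the contour can be reused from cache
    return None                # novel contour: caller enters it into the hierarchy
```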

The curvature scale-space is preferably indexed by petal counts and those petal counts are preferably derived using the method of the above first aspect and from the nature of Bezier chains generally. Preferably the scale factors used in scaling the input contour are retained to provide independent scaling in x and y directions in order to discriminate between contours simplified using the contour scale space.

At its broadest a fourth aspect of the present invention provides a method of decoding a series of vector encoded images which uses a cache to store common contours between renders with the object of minimising bandwidth by minimising the number of times we send the same contour data. Preferably the fourth aspect of the present invention provides a method of decoding a series of vector encoded images, the method including the steps of:

receiving image data comprising: data describing a plurality of contours and data describing a plurality of transformations to be performed on said contours;

storing, in a cache, a plurality of said contours;

reconstructing each image in the series by applying the specified transformations to one or more of the stored contours and rendering the image using the transformed contours.
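A minimal sketch of such a decoder-side cache follows (Python; the message layout and helper names are illustrative assumptions, not the codec's wire format):

```python
import math

class ContourCache:
    """Decoder-side cache mirroring the one kept by the encoder: each
    contour is received at most once under an id, and later frames name
    the id plus a transformation instead of resending the data."""

    def __init__(self):
        self._store = {}

    def handle(self, message):
        if message["kind"] == "new":       # contour data sent once, then cached
            self._store[message["id"]] = message["points"]
            return self._store[message["id"]]
        if message["kind"] == "reuse":     # cache hit: only id + transform sent
            return transform(self._store[message["id"]], message["transform"])
        raise ValueError(message["kind"])

def transform(points, t):
    """Scale, rotate and translate every control point of a Bezier chain."""
    (sx, sy), theta, (tx, ty) = t["scale"], t["rotate"], t["translate"]
    c, s = math.cos(theta), math.sin(theta)
    return [(sx * x * c - sy * y * s + tx,
             sx * x * s + sy * y * c + ty) for (x, y) in points]
```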

Preferably the cache is used in conjunction with the sifting process of the above described third aspect, with the contours or part contours identified as common contours in that process being stored in the cache.

Further aspects of the present invention provide a computer program which, when run on a computer, performs the method of any one of the above described aspects, including some, all or none of the preferred and optional features described, and a computer data carrier containing such a computer program.

Further aspects of the present invention provide a read-only memory device or a programmable logic array arranged to perform the method of any one of the above described aspects, including some, all or none of the preferred and optional features described.

Brief Description of the Drawings

Embodiments of the invention will now be described by way of example with reference to the accompanying drawings in which:

Figure 1 (i) shows the original rendering of a photograph of Baku Underground station;

Figure 1 (ii) shows a re-rendered version of the photograph which used a flat fill regime on an image which was vectorised with the intention of using a diffusion-based fill regime of the kind described in the present application; and

Figure 1 (iii) shows the re-rendering of the same photograph using the VPI embodiment of the present invention;

Figure 2 shows in schematic form the main encoder stages of a known blind VPI encoder of the type described above;

Figure 3 shows in schematic form the main decoder stages of a known blind VPI decoder of the type described above;

Figure 4 shows an outline of the structure of an adaptive encoder according to an embodiment of the present invention;

Figure 5 shows an example of an enclosure tree which specifies which contour encloses which other contours;

Figure 6(i) shows an out-of-focus image; Figures 6(ii)-(iv) show, respectively, (ii) the Y- component, (iii) the U-component in grey-scale and (iv) the V-component in grey-scale;

Figure 7 shows the contours for the YUV components of Figure 6(i): Figure 7(i) shows the Y- component in Figure 6(ii) contourised, Figure 7(ii) shows the U-component in Figure 6(iii) contourised and Figure 7(iii) shows the V-component in Figure 6(iv) contourised;

Figure 8 shows the outline of the approach to contour finding: Figure 8(i) shows a

representative image patch as pixels, Figure 8(ii) shows the same image patch as addressed point samples and Figure 8(iii) shows it as a pixel map;

Figure 9 shows an example of how a contour is determined;

Figure 10 illustrates a process for finding intersection points: Figure 10.1 covers the cases in which the next intersection point is unique and Figure 10.2 covers the remaining cases where the contour trajectory is ambiguous because of multiple choices for the next intersection point on the contour's trajectory;

Figure 11 shows the values of x and y against uniform parameters for the curve in Figure 12;

Figure 12 shows a sample Bezier curve and relationships between parameters using chord length and "natural" parameters; Figure 13 shows an inflected Bezier and the same parameter relationships as shown in Figure 12;

Figure 14 shows the effect of Gaussian kernel values on noisy data with the underlying signal; Figure 15 illustrates an example of the process of subdivision of tiles;

Figure 16 illustrates how pixels cut by a single contour are considered;

Figure 17 illustrates the process of rendering between nested contours;

Figure 18(i) shows an example of a fill region and Figure 18(ii) shows a section of the map reduced to two-dimensional slice; Figure 19 illustrates the smoothing of an iso-surface slice;

Figure 20 is a further illustration of smoothing;

Figures 21 and 22 illustrate curvature concepts and petal mapping; and

Figure 23 shows, in schematic form, two different implementations of a motion VPI codec according to embodiments of the invention.

Detailed Description

Figure 4 shows an outline of the structure of an adaptive encoder according to an embodiment of the present invention.

In general terms, an adaptive encoder according to embodiments of the present invention encodes a single 'scaffolding' level followed by repeated subdivision within footprints until local maxima/minima are isolated in order to provide the rest of the scaffolding.

Subsequently the codec proceeds to conduct test renders interspersed with contour-finding in the nearest power-of-two interval between individual contours until the (local) render corresponds to the original pixel values within their error bounds. A final (optional) stage 'stress-tests' contours by removing them to see if they are still needed for a correct render. The idea here is that if an interval is split and then split again only on one side then possibly the first splitting contour is no longer needed. A more sophisticated optimiser might replace the last two splitting contours by one with a level number which subdivides the difference between them using the overall level descent rule, but that would require an additional call on the contour finder (not shown here).
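The stress-test stage lends itself to a short sketch (Python; the render and bounds-check callables are placeholders standing in for the codec's own test-render machinery, which keeps the sketch self-contained):

```python
def stress_test(contours, render, within_bounds):
    """Sketch of the optional 'stress-test' stage described above.

    Each contour is removed in turn; if a test render without it still
    lies inside the pixel error bounds, the contour was redundant and
    stays out.
    """
    kept = list(contours)
    for c in list(kept):
        trial = [k for k in kept if k is not c]
        if within_bounds(render(trial)):    # render is still correct without c
            kept = trial                    # so the contour is not needed
    return kept
```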

The first embodiments described relate to a single image codec, the VPI codec, which forms the core of the subsequent Motion-VPI codec.

Colour regimes and 'blind' contour choice

At the heart of all these codecs is the internal format of a single vectorised image as discussed below.

Where possible SVG or SVG-compatible formats are used. As SVG is an instruction-based format, the form of an SVG-friendly instruction set is properly the subject of the next section. This section describes the data structure assumed for a VPI image in design work to date. It should be pointed out that this data structure could change in detail depending on the implementation and the following is provided purely as an example of a suitable structure for implementation. Suitable variations will be apparent to the skilled person seeking their own implementation. The structure is largely that set out by Peter Balch in earlier work and discussed in outline above.

The principal VPI data structure is a representation of the contour map referred to as the 'Enclosure Tree'. The Enclosure Tree specifies which contour encloses which other contours as set out in Figure 5. This is in effect the implementation of the relation encloses(,) and is a hierarchy of contours (including contours R and S) defined in terms of R encloses S, in turn defined as follows in terms of closed connected regions such as A. To define encloses() we start with the inside test inside():

$$p \text{ inside } A \equiv w_A(p) = s$$

where $w_A(p)$ is the winding number for $p$ in the closed region $A$ and $s$ is consistently +1 or −1 for every point $p$ in the region. Now we need footprint() defined as:

$$\text{footprint}(A) = \{\, p : p \text{ inside } A \,\}$$

and hence:

$$A \text{ encloses } B \equiv B \subset \text{footprint}(A) \wedge \neg\exists C,\ A \text{ encloses } C \wedge C \text{ encloses } B$$

Also

$$A \text{ enclosed by } B \equiv B \text{ encloses } A$$

defines enclosed by(,). So A encloses B is an ordering relation between individual closed contours in each level and determines all the contours B which are enclosed by A and by no other contour. Rendering the vector format consists of instructions for drawing and filling these contours in order.

Each node of the Enclosure Tree contains:

• a specification of the contour segments as Bezier curves

• the height of the contour (the colour value associated with it)

• pointers to the contours that it encloses

• the bounding rectangle of the contour

• the area of the contour - positive for clockwise contours, negative for anticlockwise (positive or negative is a convention for whether the area inside the contour is higher or lower than the contour)

The Enclosure Tree is an N-ary tree, not just a binary tree as shown above. Each contour node has a daughter pointer and a sister pointer. Node P uses the daughter pointer to point at one of the contours that it encloses. That daughter then points at other sisters which are also directly enclosed by node P.
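One way to realise this node structure is sketched below (Python; field names and types are illustrative, not the patent's own definitions):

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class ContourNode:
    """One node of the Enclosure Tree, following the list above.  The
    N-ary tree is held with daughter/sister pointers: `daughter` names
    one directly enclosed contour, which chains to the others enclosed
    by the same parent through `sister`."""
    beziers: List[Tuple]                         # contour segments as cubic Beziers
    height: float                                # colour value of the contour
    bbox: Tuple[float, float, float, float]      # bounding rectangle
    area: float                                  # signed: + clockwise, - anticlockwise
    daughter: Optional["ContourNode"] = None
    sister: Optional["ContourNode"] = None

    def daughters(self):
        """Iterate over all contours directly enclosed by this node."""
        d = self.daughter
        while d is not None:
            yield d
            d = d.sister
```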

The finished contours must be sorted into a properly arranged Enclosure tree. Sorting is slow and costly - slower than generating the contours for large bitmaps.

A variation of insertion sort is used. Each finished contour is inserted in turn into the Enclosure tree. It has been found that sorting the finished contours into order (biggest area first) significantly speeds up sorting the tree.

A contour is inserted by descending the tree while comparing the new contour with each node:

• the root is always the outer bounding contour

• start by comparing the new contour with each of the root's daughters in turn

• if it is enclosed inside the daughter then descend further

• if it encloses the daughter then

• the new contour becomes a daughter

• the enclosed daughter is moved down a level to inside the new contour

• other daughters might also be moved down a level to inside the new contour

This is set out in Table 4 below:

Table 4

2.4 until tree traversed

The relation A enclosed by B (line 2.2.1) is tested often while sorting, so it is worth making it efficient. The following tests are performed until one fails:

• the bounding rectangles of the contours are compared: A must be entirely enclosed in B

• the areas are compared: A must be smaller than B (actually, this test fails so often that comparing areas slows down the sort)

• one of the coordinates of the 'A' contour is compared with the polygon of the "B" contour

as set out in Table 5 below:

Table 5 Step Activity

2.2.1 enclosed by(A, B) = enclosed by(bounding box(A), bounding box(B)) ∧ area(A) < area(B) ∧ coordinate(A) inside B

The standard 'point inside a polygon' (inside()) test is used, which is slow. An imaginary line is drawn from the point, Northwards (+y direction) to infinity. If the line crosses an odd number of edges of the polygon, then the point is inside. You have to watch out for passing through the corners of the polygon and running up a vertical polygon edge. This is a classical method for implementing inside().
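A sketch of these two tests follows (Python). The contour record layout is an illustrative assumption, and the area comparison uses absolute values since the stored areas are signed:

```python
def inside(point, polygon):
    """Classical ray-casting inside() test as described above.

    A ray is cast Northwards (+y) from the point and edge crossings
    counted; the half-open comparison on x avoids double-counting when
    the ray passes through a vertex or runs up a vertical edge.
    `polygon` is a list of (x, y) vertices.
    """
    px, py = point
    crossings = 0
    for k in range(len(polygon)):
        (x1, y1), (x2, y2) = polygon[k], polygon[(k + 1) % len(polygon)]
        if (x1 > px) != (x2 > px):          # edge spans the vertical line x = px
            if y1 + (px - x1) * (y2 - y1) / (x2 - x1) > py:
                crossings += 1              # crossing lies north of the point
    return crossings % 2 == 1

def enclosed_by(a, b):
    """Fast-reject chain of Table 5: cheapest tests first, inside() last.
    `a` and `b` are dicts with 'bbox' (x0, y0, x1, y1), signed 'area'
    and 'polygon' entries -- an illustrative layout."""
    ax0, ay0, ax1, ay1 = a["bbox"]
    bx0, by0, bx1, by1 = b["bbox"]
    return (bx0 <= ax0 and by0 <= ay0 and ax1 <= bx1 and ay1 <= by1
            and abs(a["area"]) < abs(b["area"])
            and inside(a["polygon"][0], b["polygon"]))
```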

A major obstacle in determining this vector form is that of determining how many contours to use. A naive approach - encouraged by the flat-fill example - would be to use one contour for every unit of colour quantisation but this not only produces far more contours than are needed for a visually faithful reproduction of the image but also far more detail in defining each contour than is needed. If flat fill is used within contour boundaries the inefficient naive approach is the only technique which will assure an artefact-free result. Techniques based on morphological principles (e.g. [Serr82]) tend to do this and, while we reference these techniques as part of the historical record, we would like to draw a distinction between the morphological approach and ours, notably in terms of the number of levels required for a visually satisfactory rendering. In a morphological approach all that would be needed for a visually faithful rendering would be to fill the contour with an 'average' colour which quantisation would force to be either the border colour or the next unit above or below it. In fact far fewer contours are needed in practice as we show here. To illustrate the contours without too much visual confusion we chose the out-of-focus image shown in Figure 6(i). Figures 6(ii)-(iv) show, respectively, (ii) the Y-component, (iii) the U-component in grey-scale and (iv) the V-component in grey-scale.

The images have been rendered from contour sets representing the YUV components of a colour image using the blind decoder described above. As can be seen in Figure 6 (iii)-(iv) and Figure 7 (ii)-(iii) (described below) there is very little detail in the U, V components so the size of the dataset is dominated by the Y-component. This remains true even with a sharp, high-definition image. The Y component (grey-scale) is, here, held in levels-of-10 of the 256-level quantisation, i.e. the colours 127, 127+10, etc. and 127-10, etc., of which there are 25 altogether; while the U, V components are held in levels-of-5 (so 127, 127-5, 127-10, etc.) of which there are 49 altogether.

This selection of contour levels is adequate for the images we have used here (although it is possible to find counter-examples), but an adaptive codec will make its own choices (see below). The contours for the YUV components of Figure 6(i) are shown in Figure 7. Figure 7(i) shows the Y-component in Figure 6(ii) contourised, Figure 7(ii) shows the U-component in Figure 6(iii) contourised and Figure 7(iii) shows the V-component in Figure 6(iv) contourised. Also shown are the contour segments whose Bezier end-point tangents have been 'snapped together' because their trajectories lie inside a threshold angle.

Bandwidth

The Blind VPI codec discussed in the background above was quite wasteful in contours. In particular it generated every contour required for all the 25 or 49 levels actually present in an image. For the U, V components this wasn't an issue, as there were very few levels in an image anyway (as can be seen in Figures 7(ii) and (iii)) and little detail to capture, but in the Y component most, if not all, of the levels would usually be present. We estimate that there is a 98-99% redundancy in contours in such an image, and in fact on the small images we were looking at, (unoptimised) file sizes were more than 10 x {size in pixels}. So our contouriser was spending a lot of time finding un-needed contours (and in fact working from an inefficient image format in doing so). Accordingly our adaptive contouriser carries out a test render after vectorising just a few levels (3 within boundaries) and uses that to guide the selection of new contours to encode. Thus not only does the adaptive codec produce an image guaranteed to within an assumable accuracy, but it also finds many fewer contours than before: as we say, maybe 50 to 100 times fewer.

These large file sizes also concealed a key fact, namely that as images scale, the pixel forms scale with the square of resolution, while contour data scales at a much lower rate (between n log n and n√n, where n is the resolution). This can be seen when comparing the border of a square or a circle with its area. The border corresponds to a contour and the area to the number of pixels required to fill it. So the border of a circle is 2πr while its area is πr², and the border of a square is 4r while its area is r². Each of the lengths r measures resolution, so borders (contours) scale as O(r) while areas (pixels) scale as O(r²). However, as images scale it is likely that additional contours will be needed at higher resolutions to capture uninferrable detail not seen at lower resolution. The precise correction for this is not clear, but is likely to be log(r) or possibly log²(r). We would expect the exact amount of this correction to be image-dependent; some images are obviously O(r) while others may scale as O(r·log(r)) or worse. We will assume here that the usual scaling is O(r·log(r)·log₂(c)), where c is the colour depth (the possible number of contours) and can usually be left as a constant across comparisons, so is not represented in a formula of this kind. Technically pixel arrays also scale with colour depth, as O(r²·log(c)), but in our comparisons we will assume c is always the same, an assumption which also favours pixel arrays.
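Purely to illustrate the shape of this argument (the constant below is invented for the illustration, not measured), one can tabulate the assumed scaling laws and watch the pixel/contour cost ratio grow with resolution:

import math

# Illustrative only: pixels assumed to scale as r**2, contour data as
# r*log(r); k = 600 is a made-up per-contour overhead factor.
def pixel_cost(r):
    return r * r

def contour_cost(r, k=600.0):
    return k * r * math.log(r)

for r in (1024, 2048, 4096, 6144, 8192):
    print(r, round(pixel_cost(r) / contour_cost(r), 2))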

From the foregoing and other research, it is known that VPI images will scale better than images held as pixel arrays or other forms of image encoding based directly on the sampled representation, such as JPEG, so there will be a point at which apparatus for managing V-formats, as covered herein, will supplant sampled formats: at resolutions of about 6K (4x3 aspect ratio, the 6K referring to the long side-length). Motion-VPI will scale similarly, with scale factors which supplant MPEG at about 4K. The above calculations assume only that known scale factors apply and so are almost certainly conservative. For example, if there is significant commonality between contour shapes in the same images, e.g. around textures, then the 'tipping point' between sampled images and vector images may be significantly lower; similarly, if there are large numbers of identical contours between frames (e.g. in static backgrounds; the 'sieve' automatically compensates for panning and similar movements at the contour level) then the data reduction ratio could be comparable to the difference between MPEG and Motion-JPEG. Although these gains are unquantified, they are expected to be considerable; they may well be scene-dependent and are certainly dependent on the extent of the support provided to exploit these aspects of repeated images. Mitigation of scene dependence is also an objective of aspects of this patent.

This analysis assumes that none of the options for compression in V technology are exploited. In fact V technology is not a compression technology per se, just a different way of encoding images, but it does have as yet unexplored capabilities for lossy compression. What we have been doing here is to compare lossless formats (V) against a lossy format (MPEG) and to see the compressed format overtaken above 4K. So bandwidth improvements, and what they facilitate, would appear to be a key motivation for developing and adopting V technology. In the end there is no alternative to V if film and broadcast are going to benefit from higher resolutions and not stall at 8K.

Contour Representation

We will now set out the principal features of the approach used in the codecs of embodiments of the present invention. In these codecs individual contours are usually represented by closed Bezier chains; contours clipped by the image border are completed by the shape of the border segment which falls within the contour footprint. This has certain implications for constructing the intermediate vector format and for the diffusion process involved in decoding (see the section entitled "Smoothing diffusion" below).

Figure 8 shows the outline of the approach to contour finding. Figure 8(i) shows a representative image patch as pixels. Figure 8(ii) shows the same image patch as addressed point samples and Figure 8(iii) shows it as a pixel map. Contours are found by first finding where they intersect lines between sample points derived from the original pixels. A key aspect of the input process is the explicit assumption that the pixel is contaminated by noise which can arise from any source (quantisation, sensor noise, even numerical inaccuracy) and so is essentially of unknown origin. For example, when considering the degree of accuracy to which the isosurface is modelled, the strictest requirement we can safely make is that the isosurface model lies everywhere inside the error bounds of the pixels.

Pixel error can be modelled in a number of ways, essentially either globally or locally. The accuracy of the value derived from the model is not critical, although too crude a model could result in retaining image noise in the final result, or a result which loses detail. While more accurate local approaches are covered here, we should say that all the images in this introduction were generated assuming a simple global noise value (± a constant around each pixel value) without apparent loss of detail due to that assumption. If we are to attempt to preserve noise statistics in the final image, as suggested earlier, more accurate, local, methods will be needed. For example, in EXIF the noise estimate is given in terms of the CIPA DC-008 standard, expressed as an SNR: here a standard deviation derived from a weighted mean of luminance and colour values for individual pixels, which may be available from metadata. In the absence of quantified data we would prefer the bilateral median approach of Weiss [2006].

Some care is nonetheless needed in interpreting the results of filtering and comparison. We really need to treat the original pixel values in some sense as 'ground truth'. In the event of an unsigned error being available (e.g. SNR as above) this isn't an issue: the pixel value is surrounded by a ± value which could go either way. However, the results of filtering give us a signed error in one direction only, so one might treat the pair (pixel value, filter value) as constituting the errors around a synthetic value which is their mean; it is this synthetic value which replaces the pixel value thereafter. This is particularly important when using bilateral filtering which, while preserving edges, tends to suppress non-edge high frequencies to an unacceptable degree. Weiss [2006] and others refer to this as a 'cartoonish look'.
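As a sketch (our naming, assuming a scalar value per pixel), the pairing just described can be coded as:

def reconcile(pixel_value, filter_value):
    # treat (pixel value, filter value) as bracketing a synthetic value:
    # their mean replaces the pixel, half the gap is its unsigned error
    mean = 0.5 * (pixel_value + filter_value)
    err = 0.5 * abs(pixel_value - filter_value)
    return mean, err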

Error terms ε, however derived, can be converted into spatial errors δx, δy in the x or y directions by applying the formulae:

δx ≈ ε · (∂φ/∂x)⁻¹, δy ≈ ε · (∂φ/∂y)⁻¹ (1)

Here φ = φ(x,y) is the piecewise continuous approximation of the 'true' isosurface. We now need to introduce some notation, as set out in the next section.

The spatial errors in equation (1) above are quoted in terms of continuous derivatives, but once again we use first differences to generate the needed values, i.e. we use the formula δx ≈ ε · (Δxφ)⁻¹, where Δxφ is the first difference of φ in the x direction (with corresponding formulae for the y direction). In the following discussion we will take ε = ±0.5, the quantisation error, which is the smallest value of ε we can assume, but when encoding the image examples we have used here we essentially scaled ε by a constant amount chosen subjectively (essentially an SNR) and intended to reflect our sense of how noisy the image was. Most digital images available to us have been through a process of JPEG encoding. It would be possible to trace through the decoding process to determine the per-pixel quantisation error, which is typically image-dependent and spatially variable within that image, although we didn't actually do this. For us the correct way of handling the effects of JPEG encoding is simply to use these quantisation errors as a lower bound for ε if other methods of estimating it are in play. The same applies to MPEG and other lossy filters: if an instrumented decoder returns a quantisation image (in effect pursuing lower and upper bounds throughout the decode and deriving thereby an unsigned quantisation error for each pixel) as well as a pixel array, then this can be used as the basis for an error estimation on a pixel-by-pixel basis. A V process (encode/decode) could thus mitigate compression artefacts from frequency-based compression: another potential early application.
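A sketch of this conversion, assuming phi is a 2-D array of sample values indexed [y][x] at an interior coordinate (our code, using the first-difference form above):

def spatial_error_x(phi, x, y, eps):
    # central first difference stands in for the continuous derivative in (1)
    d_phi_dx = (phi[y][x + 1] - phi[y][x - 1]) / 2.0
    if d_phi_dx == 0.0:
        return float('inf')   # locally flat: the level gives no x-constraint
    return abs(eps / d_phi_dx)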

Constructing Contours

To show how a contour is determined we will use an example in which the error term is taken to be the quantisation error (±0.5). This is shown in Figure 9. If we determine the path of a contour in terms of its intersections with the borders of a box whose corners are decorated with sample values, we can interpolate between sample values to get an 'exact' solution, thus allowing the 'exact' computation of the lower bound of the contour trajectory (Figure 9(i)) and the upper bound (Figure 9(ii)). When these are transferred to the pixel tile with no error bound (ε = 0) they delimit the region we refer to as the ribbon of error, within which any contour trajectory is equally valid. This is true for any realistic calculation of the pixel error. In fact we associate these errors with points on the contour line when fitting the Bezier chain, but any fitting algorithm (e.g. the method outlined by Schneider [Schn90] or by Vansichem et al. [VWV01]) has to take into account different values for that error around each sample point. One way to relax this condition is to multiply the derived error values by a constant; the consequence will be first to increase the width of the ribbon of error (whose width will scale with the multiplier) and then to reduce the number of segments in the chain and thereby increase its smoothness.

It turns out that failing to take into account the presence of noise results in unnecessarily large numbers of Bezier segments in every contour as it twists and turns around single pixel-sized 'features' which are no more than noise-induced deviations from local correlation.

If instead we account for noise adequately we get much smoother curves (with many fewer segments) whose smoothness has, up to a point, no perceptible effect on the resultant render. As indicated earlier the interpolation formula used is that for linear interpolation, that is, by solving in the appropriate direction for the point where the linear interpolant along x or y is equal to the sought contour value, and then interpolating between the sample errors by the same amount. This is entirely equivalent to a process of solving for two isosurfaces and is, in fact, safer numerically. If we had used a local error measure then we could have determined which degree of interpolation was appropriate on a solution-by-solution basis by finding which difference approximated to zero within the calculated error bounds for that difference [PW08]. However, past experiments with a pixel-level noise estimator have shown that for the majority of cases (approximately two-thirds in typical images) only linear interpolation could be justified, and the rather basic assumptions about noise made when forming the images here would not justify higher-order interpolation anywhere. The resulting polyline approximation to the 'true' contour is simplified by finding the Bezier chain with the fewest segments which fits within the error ribbon the polyline approximation defines.

In an encoder the contour values can be determined by a blind strategy or an adaptive strategy. For the examples used in this introduction we have used a simple but (reasonably) effective blind strategy of pre-selecting the values to be found. As a consequence all the images in this paper are represented by the same choice of contour levels and all the contours are found at the same time by a single scan of the image from top to bottom. Here we examine each pixel to see which contours pass through a bounding box around its centre and then join up the contours by matching adjacent bounding box edges. The classical form is referred to as 'marching cubes' but Balch [2006] refers to it as 'domino contouring'.

This scan-based approach is only really possible with a blind strategy, as an adaptive strategy will of necessity contain a stopping condition based on testing the need for the contour loop under consideration. In fact both processes normally invoke a highly localised search process to find the next intersection point or points following a solution for the previous one. This process is sometimes referred to as the marching squares algorithm because of its similarity to marching cubes [Lore&87], but the search process is modified in a scan-conversion context to discover all solutions along a given scan-line before proceeding to the next one. The process is illustrated in Figures 10.1 and 10.2. Figure 10.1 covers the cases in which the next intersection point is unique and Figure 10.2 covers the remaining cases, where the contour trajectory is ambiguous because of multiple choices for the next intersection point on the contour's trajectory. In Figures 10.1 and 10.2 the actual pixel values are not shown but the outcome of the test p < φ(x,y) is marked as < (true) or > (false, p > φ(x,y)).

If the pixel centres are marked as shown and the initial intersection of the contour with the square joining the pixel centres is shown as a black dot on the left side, then the next unique point on the contour path will normally be found as shown in cases (i)-(vi) in Figure 10.1. However, Figure 10.2 shows as cases (i) and (iv) the two ambiguous cases which have been left out of Figure 10.1. The two possible interpretations for case (i) are shown as (ii) and (iii), and the two possible interpretations for case (iv) are shown as (v) and (vi). The resolution of these ambiguities is done by determining the isosurface value at the centre point, marked in the Figure as an empty circle. Outcomes (ii) and (vi) apply if p > φ(x,y) at the centre, and (iii) and (v) if p < φ(x,y) at the centre.
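A compact sketch of one such cell step is given below. The edge naming is our illustrative choice, and using the corner average as the centre estimate is an assumption (the text determines the isosurface value at the centre, which an implementation may compute more carefully):

def cell_strands(corners, p):
    # corners = (tl, tr, br, bl) sample values at the cell corners;
    # returns the contour strands crossing the cell as (edge, edge) pairs,
    # resolving the ambiguous saddle with a centre sample (Figure 10.2)
    tl, tr, br, bl = corners
    edges = []
    for name, (a, b) in zip(('top', 'right', 'bottom', 'left'),
                            ((tl, tr), (tr, br), (br, bl), (bl, tl))):
        if (a < p) != (b < p):
            edges.append(name)
    if len(edges) == 0:
        return []
    if len(edges) == 2:                   # the unambiguous cases
        return [tuple(edges)]
    # saddle: tl/br on one side of the level, tr/bl on the other
    centre = sum(corners) / 4.0           # assumed centre estimate
    if (centre < p) == (tl < p):          # centre joins tl and br
        return [('top', 'right'), ('bottom', 'left')]
    return [('top', 'left'), ('bottom', 'right')]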

It should be noted that this process marks where the contours intersect a square boundary around the pixel, in the x or the y directions, or both.

Bezier Curve Fitting

Embodiments of the present invention make use of a new Bezier curve-finding (or 'curve-slinging') algorithm intended for finding the longest Bezier curve segments to fit a series of sample points, themselves found as the intersections of chromatic level-lines (isochromic contours) in an image; we refer to this as a 'fountains of difference' approach. These intersections are found against integral-valued x and y lines, i.e. lines parallel to the x and y axes, so they come in series of points with successive x or y values as one of the coordinates. Which coordinate this is usually alternates between successive points, but one coordinate or the other will predominate in regions where the curve tangents are similar in slope to one or other axis. An error bound for the unconstrained coordinate is available at the point where the intersection solution is found.

Historically this problem has been solved by iterative least-squares fitting (e.g. [Itoh&93], [Schn90], [Vans&01]), which is an inherently expensive calculation, even though it can be quite extensively optimised by avoiding unnecessary recomputation of values [Patt&11]. Further, these methods involve repeatedly attempting to fit smaller and smaller curves after failing to fit the larger curves. This approach is highly likely to find curves with lengths just under half the longest segments which could be found, because of an inherent iterative or recursive 'split the problem in two' approach. Finally, any attempt at re-slinging through the same curve points with increased error bounds involves starting from scratch, for the same reason. Curves are found from the outside in.

The method described here finds parametric Bezier curves by extending out from 'suitable' start points. With parametric curves the x, y values simply follow the parameter value as it scales from 0 to 1; that is, well-behaved values in both x and y are always accessible in terms of a progressively scaled parameter, as shown in Figure 11. In Figure 11, the lower line (marked Series 2) shows the value of x against uniform parameters for the curve in Figure 12, whilst the upper line (marked Series 1) shows the value of y against uniform parameters for the curve in Figure 12.

The problem is, what are the parametric values for the given curve points? Worse, the given points aren't even on the curve but differ by a 'fitting error' which has to be smaller than its input or given error. There is, however, a quantity which is available to be calculated as a parametric difference, and that is chord length as measured along the polyline (or more accurately, polygon) connecting the given points. Chord-length is a highly-desirable form of parameterisation and is often used explicitly in non-uniform B-splines [Bart&87], and indeed in modelling the Bezier parameter in curve-fitting algorithms [Schn90]. In fact the unachievable arc-length parameterisation would be even more desirable, and the reduction which we require for our sample points (in effect replacing them by sample-points at a lower pitch) will mean that our parametric lengths will be synthesised from much shorter chord lengths, so they really will approximate arc length, certainly more closely so after normalisation than straightforward chord length between the final samples.
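A minimal sketch of chord-length parameterisation over a point sequence (our code; for a closed contour the polygon would be closed by appending the first point):

import math

def chord_length_params(points):
    # normalised cumulative chord-length parameters, the quantity used
    # here to stand in for the Bezier parameter
    t = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        t.append(t[-1] + math.hypot(x1 - x0, y1 - y0))
    total = t[-1] or 1.0
    return [v / total for v in t]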

In this case we take the differential chord length as being given by the sample points themselves, here the distance between successive sample points, regardless of which integral value they are solved against. So 'successive' is taken as successive in trajectory, and the sum of these distances forms a parameter which can be normalised. In fact the parameter used in Bezier curves does approximate chord (or arc) length to a degree (see Figures 12 and 13: Figure 12 shows the Bezier curve and the relationships between parameters; the upper line, marked Series 2 in Figure 12, uses chord length, while the lower, straight line is 'natural'. Figure 13 shows an inflected Bezier and the same parameter relationships), but also varies from linearity slowly enough not to affect the stability of the difference approximation. Finally, implicit differentials can be found from parametric ones by simply dividing the y parametric differentials by the x ones. Infinity problems will appear when this last is attempted, but it is not actually needed.

Pre-filtering and re-sampling

In essence the method involves first finding local sample point approximations in terms of convolution using uniform taps (convolution weights remain the same each time). In fact we need to filter the sample points anyway because they will be contaminated by noise. If you imagine a colour ramp, with values varying uniformly from one colour to another, and then capturing an image of the ramp, the image will not contain uniformly varying pixel values, but approximations to them, as each will be contaminated by noise arising from various sources [EMVA10]. Accordingly, a carefully-calculated contour determined directly from the pixel values will not show straight lines normal to the colour slope, but a wiggly line which approximates that straight line, as the individual pixel values stray from the viewpoint values in either direction. The contour, measuring a uniform level, is thus circuiting around the tiny peaks and troughs which noise has put into the image at pixel-pitch wavelengths. Since the pixel-pitch frequencies in the contour will be image or frame-dependent, successive frames in an image sequence will show different wiggles, so these need to be eliminated from the beginning.

Figure 14 shows the effect of Gaussian kernel values (upper figure) on noisy data (the highly variant line marked as Series 2 in the lower figure) with the underlying signal (the less variant line marked as Series 1 in the lower figure).

The 'best' solution is to filter the contour itself, but at this stage it is only available as sample-points, which are themselves not wholly accurately placed if we take the image as definitive, rather than the image capture. Accordingly the solution is to first synthesise sample points, using 1-dimensional filters on the x and y values separately, moving step-by-step along the sample-point trajectory. Because this is intended to eliminate pixel-pitch wavelengths it is recommended that a 5-, 7- or 9-tap Gaussian filter is used, scaled to eliminate just pixel-pitch noise. An example showing how pixel-pitch noise is removed to expose the 'signal' that is actually carried is shown in Figure 14 using a 5-tap convolution filter whose un-normalised values are graphed (on the whole a 7-tap filter, i.e. one which uses the weighted averages of 7 successive values, is considered safer here [Nebe&05], [Zhan&04]). This is actually the costliest part of the whole algorithm, and a 5-tap filter is significantly cheaper than a 7-tap filter (3-tap filters are inadequate for these purposes). The precise filter parameters can be decided by testing one or more small image patches, where an abrupt fall-off in high-frequency variations (alternate positive and negative curvatures in the contour) due to noise frequencies disappearing, as above, is sought. In a sequence the required kernel and number of taps may be expected to remain the same, as the capture device will usually be the same device for each image.
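A sketch of the tap generation and the wrap-around filtering of one coordinate track follows (our code; sigma and the names are our choices, to be tuned as described above):

import math

def gaussian_taps(n, sigma):
    # normalised n-tap Gaussian kernel (n odd: 5, 7 or 9 as suggested)
    half = n // 2
    w = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-half, half + 1)]
    s = sum(w)
    return [v / s for v in w]

def filter_track(values, taps):
    # convolve one coordinate track (x or y) of a closed contour;
    # the trajectory is closed, so indices wrap around
    half = len(taps) // 2
    n = len(values)
    return [sum(taps[k + half] * values[(i + k) % n] for k in range(-half, half + 1))
            for i in range(n)]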

The next stage is to determine chord-length-separated values, as approximated by the polyline (actually, here, a polygon, because it is closed) made up from the filtered sample-points. Filtering doesn't by itself guarantee uniform separation, and this has to be approximated by 'walking' along the polyline in uniform path-lengths, here an exact sub-multiple of the polyline's total chord length (the summed individual chord lengths). If that sub-multiple (the baseline path-length) is chosen to be at least two pixels in length (i.e. of the order of the wavelength of pixel-pitch noise) then this will give separations which approximate uniform arc-length better than interpolating with lengths comparable to the original chord-length samples. Although the longer this baseline length is, the better it is from the point of view of approximating arc-length, there is a compromise to be struck, which is to make the baseline (technically a long baseline because it will always pass over at least two samples, and sometimes 4 or 5), or stride as it is referred to later, nearly equal to, but always bigger than, twice the pixel pitch.
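The 'walking' step might look like the following sketch, which rounds the stride to an exact sub-multiple of the total chord length as described (our code, not the patent's):

import math

def resample_uniform(points, stride):
    # walk a closed polyline in uniform path-lengths; stride should be
    # chosen just above twice the pixel pitch, as discussed above
    pts = points + [points[0]]                       # close the polygon
    seg = [math.hypot(b[0] - a[0], b[1] - a[1]) for a, b in zip(pts, pts[1:])]
    total = sum(seg)
    n = max(1, int(total // stride))                 # number of strides
    step = total / n                                 # exact sub-multiple
    out, i, carried = [], 0, 0.0
    for k in range(n):
        target = k * step
        while carried + seg[i] < target:             # advance to the segment
            carried += seg[i]
            i += 1
        f = (target - carried) / seg[i] if seg[i] else 0.0
        (x0, y0), (x1, y1) = pts[i], pts[i + 1]
        out.append((x0 + f * (x1 - x0), y0 + f * (y1 - y0)))
    return out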

So a sequence of synthesised, filtered sample points of approximately equal parametric distance, here modelled by something like arc-length (because multiple chord-length segments make up the path length, so it is nearer to arc-length than chord length, albeit not exactly equal to the unachievable arc-length, which is good enough for present purposes), is presented to the main part of the algorithm, which proceeds to use them to build (separate) difference tables for both x and y values. Inherent errors in each are determined in terms of the error bounds around the two adjacent points in the polyline actually used to determine the uniformly sampled parameterised sample point by interpolation. Since these points were themselves filtered, their errors can be determined by filtering the original lower bounds and upper bounds. Error bounds should always be determined for both x and y directions as follows. If P(x,i-1), P(x,i), P(x,i+1) are successive values in the x-direction, the x-gradient is approximated as

Δx,i ≈ (P(x,i+1) - P(x,i-1)) / 2

If the SNR error in the value of P is ΔP (with appropriate indices), then the x-error in P(x,i) is δx,i ≈ ΔP(x,i) / Δx,i. If a contour has an intersection point at index i + j, where 0 ≤ j ≤ 1, then the x-error at x(i+j) is given by j·δx,i+1 + (1 - j)·δx,i, and similarly for the y-error (which is also interpolated by j). The actual value is obtained by splitting the difference (so the filter needs to be applied twice and not 3 times). Thus this preliminary filtering needs to be considered to be a part of the algorithm itself.

The Fountains of Difference

The 'fountains' algorithm essentially proceeds in the following steps. First, differences up to degree 4 are formed separately for the presented x-values and y-values respectively. Then the longest cubic curve (that is, in terms of cubic equations for both x and y) which fits the 'most promising' set of points is found. This is the curve which passes near the coordinate pairs within the error bounds (δx,t, δy,t) for each coordinate (x_t, y_t), and the most promising candidate set will be those showing the smallest fourth differences (which contain fitting discrepancies only). There will come a stage when the curve cannot be extended further while fitting within the error bounds for each coordinate. The result is then the coordinate points for the end-points of the Bezier curve. Further, because the algorithm uncovers cubically-related coordinates (the algorithm enforces a cubic relationship via the nature of the higher differences) the points form a parametric Bezier curve, as the individual coordinates of the points on a uniformly parameterised cubic Bezier are themselves cubic (and form correct differences, the same ones the algorithm enforces). So both the points and the Bezier parameterisation become available, as each point implies a uniform increment between the end-points. The (two) interior points may ultimately be found by putting (normalised) parameter values into the difference stack and solving two linear simultaneous equations (the cubic equation coefficients could be directly retained and used to determine curve points at the desired pitch, but finding interior points appears easy), although this has to be done twice, once to define points on the discovered curve and once to find the interior points for the reconciled (i.e. clipped) curve.
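The core extension test can be sketched as follows; the tolerance rule (scaling a per-sample bound by 16, the sum of the fourth-difference binomial magnitudes) is our illustrative stand-in for the patent's error-bound bookkeeping, applied to one coordinate track:

def differences(vals, depth=4):
    # forward difference tables up to the given degree
    tables = [list(vals)]
    for _ in range(depth):
        prev = tables[-1]
        tables.append([b - a for a, b in zip(prev, prev[1:])])
    return tables

def longest_cubic_run(vals, errs, start):
    # extend a candidate cubic from `start` while 4th differences stay
    # within an error-derived tolerance; a sketch of the 'fountains' idea
    best = start + 3                     # a cubic always fits 4 points
    end = start + 4                      # 5 points give one 4th difference
    while end < len(vals):
        d4 = differences(vals[start:end + 1], 4)[4]
        tol = 16.0 * max(errs[start:end + 1])   # illustrative bound
        if any(abs(v) > tol for v in d4):
            break
        best = end
        end += 1
    return start, best                   # inclusive index range found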

The algorithm is applied repeatedly until all the sample points are included in one curve or another. The precise segments used are then determined in a reconciliation step, where curves are edited and joined if their tangents are close enough, so 'snapped together', or just joined with C⁰ continuity if there is no such common point. It is at this stage that the actual part of the curve which is going to be used in the contour is identified, and hence its ultimate normalisation and parameterisation. The pseudocode for the algorithm is set out in Table 6 below.

Table 6

Step Activity

1 For all x intersection solutions do {

1.1 Reconcile subdivided tile to give synthetic lowest resolution level tile (subdivisions need to be retained at this stage)

1.2 Calculate εx,i [as described in note 2 for the adaptive encoder below]

1.3 Add xi, εx,i to initial polyline path

1.4 }

2 For all x intersection solutions do {

2.1 filter x intersections using xi + εx,i

2.2 filter x intersections using xi - εx,i

2.3 calculate average ½((xi + εx,i) + (xi - εx,i))

2.4 }

3 For all y intersection solutions do {

Adaptive contour finding

The next stage is to place the contours in context, which is unique to the adaptive algorithm. At this stage the blind algorithm has all the contours it is going to find, so places them in the data structure described previously. By contrast the adaptive algorithm has only the contours from one level (in 8-bit colour depth, level 128). This is augmented by a simple, quick process for providing additional scaffolding. The level contourisation subdivides the entire image into contour footprints (contours which run to the edges of an image include the image boundary to close them). Within each footprint there may be some smaller contours which will also mark boundaries at the same level. These are subtracted from the enclosing contour's footprint. Each footprint is now scanned for the relevant extrema. If two or more extrema with different values are found, then the interval (here 128 to 0 or 255) is split in the appropriate direction, and this is repeated for successively split intervals until only extrema with the same value remain in the immediately enclosing footprint.

The process is not yet complete: it now proceeds with repeated test-rendering and comparison with the original pixels. If a test render results in sample values which are wholly within the corresponding pixel error bounds as computed earlier, then the process is complete and no more contours need to be found. Accordingly samples are initially taken at output image resolution only. Also, if the pixel errors are scaled, either locally or globally, this gives a further control on image quality. If, on the other hand, the test rendering of an annulus between two contours results in one or more samples falling outside pixel error bounds, then the render is abandoned and a new contour found to split the level interval between the outer and inner contours in the annulus. The entire area of the annulus needs to be taken into account when finding these contours, so more than one contour may be found. Accordingly test rendering is performed from the outside in, starting with the contour defining the image boundary. This ordering is just the same as in blind rendering.
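The interval-splitting 'scaffolding' can be sketched as below (our code; `extrema` stands for the extremal values found inside a footprint, and the returned list is the new levels to contourise):

def split_levels(lo, hi, extrema):
    # split [lo, hi) recursively until each sub-interval's footprint
    # contains extrema of one value only, as described above
    if len(set(extrema)) <= 1 or hi - lo <= 1:
        return []
    mid = (lo + hi) // 2
    lower = [e for e in extrema if e < mid]
    upper = [e for e in extrema if e >= mid]
    return [mid] + split_levels(lo, mid, lower) + split_levels(mid, hi, upper)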

Pixel extension (pixel mapping)

The first stage in the encoder is described as 'pixel mapping'. Note that in the methods of the invention pixel maps are not automatically produced: they are typically only constructed where more than one contour falls on a pixel. In some implementations 4 x 4 pixel maps may be constructed for each pixel crossed by a contour, but this depends on the method of calculating the pixel colour. There may be significant areas where no pixel maps are formed. This is a significant part of the adaptive algorithm and will give a potential speed-up of 3x or more in the encoder relative to the blind encoder from this provision alone.

Here we double the image samples in each direction so that each set of 2 x 2 samples provided to the contour finder corresponds to a 'map' of the original pixel. However, the new samples are all calculated by a non-linear process due to Carrato et al. [Carr&96] which emphasises edge detail, as more conventional interpolation (we use linear interpolation elsewhere, for reasons which will be discussed shortly) does not introduce any new information.

The re-sizing step is in part intended to allow some distance between contours we take as adjacent, although inevitably reconstructed edge values are often not identical with their original values (this is not noticeable to the naked eye), and the 'Carrato' sharpening is intended to defeat an ambiguity problem which arises in interpreting contour trajectories with the 'marching squares' algorithm because of uncertainty over the centre sample in a pixel map. 'Carrato' will bias the solution one way or another depending on surface topology and, although it would seem as though we have merely pushed the problem down to the next (second) level, in fact linear interpolation is quite adequate for the second-order case. (Surfaces tend to flatten out locally with more interpolants.)

Figure 8(i) shows what is meant by modelling an image in terms of pixels, which may have many interpretations [Bli05], but commonly (despite Smith [Smi95]) as little squares. For our purposes a more appropriate model would be point samples addressed from the pixel centre, as in Figure 8(ii). Figure 8(iii) shows one of the interpolated pixel values in dark. Using the 'Carrato' formula this value would be calculated from the indices of the neighbouring samples. For the images in this paper we used the value p = 0.5. Similar equations prevail in the y direction. The errors in the interpolated values are determined by subtracting the maximum and minimum interpolant values, using the limiting errors in the pixel values contributing to the result. For these differences the index map for the error calculation is analogous, where the error εx,y is the (designated) error in φx,y. The resultant error is then associated with the interpolated pixel, which participates in all subsequent calculations indistinguishably from the originals.
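The Carrato formula itself is not reproduced here; the following sketch shows only the principle of edge-sensitive weighting, with weights that are our invention for illustration rather than Carrato's published operator:

def edge_sensitive_midpoint(a, b, left, right, p=0.5):
    # NOT the Carrato formula: an illustrative edge-sensitive interpolant
    # between samples a and b (with further neighbours `left` of a and
    # `right` of b), biasing the midpoint toward the flatter side so that
    # edges stay sharp; p controls the strength (p = 0 gives linear)
    ga = abs(a - left)            # activity on a's side
    gb = abs(right - b)           # activity on b's side
    wa = 1.0 + p * gb * gb        # lean toward a when b's side is busy
    wb = 1.0 + p * ga * ga
    return (wa * a + wb * b) / (wa + wb)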

Pixels which do not contain contours are coloured already, but pixels cut by one or more contours need to be subdivided until each subdivision contains no more than one contour segment. An example of the process is shown in Figure 15. Here a pixel carrying two contours is subdivided until each of its subdivisions carries only one, for which case a different procedure applies.

Pixels cut by a single contour need to be considered as two potentially planar segments meeting at the contour edge, which itself can be taken as a straight line between the points where the contour enters and exits the pixel square.

Figure 16 illustrates how this can be treated. Figure 16(i) shows the pixel sample points with no contour present. Figure 16(ii) shows the situation where the top left corner of the pixel is intersected. Figure 16(iii) shows the situation where opposite (left and right) sides are intersected above centre and Figure 16(iv) shows the situation where the opposite (left and right) sides are intersected below centre. The contour trajectory is treated as a line through an output pixel which, as in Figure 16(ii)-(iv), defines three or four extra points (border intersections and the intervening diagonal) and thus divides up the pixel into 4 (no contour intersection), 8 (contour only intersects one diagonal), or 10 (contour intersects both diagonals) triangles whose vertex colours (C₁ etc.) are known. The contour colour is C.

The area rule would be the normal way of determining pixel colour, although the colour values are varying continuously across the pixel (because of sample value smoothing, see below), which complicates matters. Rather than applying the rule directly, the pixel area is supersampled and the diffusion process (see below) used to determine the colour values for each supersample [Glass89]. For any contour-free pixel or subpixel there will be five points, four corner points and a centre point (as in (i) above), and the (sub)pixel area is deemed coplanar if these five points satisfy the flatness relation.

If the condition fails then corner samples are dropped until the condition is satisfied. The pixel quadrants (subpixels) containing dropped samples are then each subdivided according to the same rule, producing a similar quadtree structure as before. The process terminates when the five points are 'coplanar enough' according to the relation, in which case one of the situations in (i)-(iv), or their 90° rotations, obtains inside the subpixel. The (sub)pixel is then coloured using the centre value.

If the contour coincides with a ridge or edge (which can be pre-determined, as edges suppress smoothing) then this condition is unlikely ever to settle down completely. This is dealt with by using barycentric coordinates, as is well known. One then has to proceed carefully by area rule, as each side of the contour will be oriented differently. Away from the contour flatness can be expected, but on subpixels riding the ridge the problem can be resolved by area rule directly:

C = (Σᵢ Cᵢ·Aᵢ) / (Σᵢ Aᵢ), i = 1 … 4, 8 or 10

Here the Cᵢ are the colours of each triangle, calculated as the barycentric average (C₁ + C₂ + C₃)/3 of the colours C₁, C₂, C₃ attached to the vertices, and the Aᵢ are the areas of each triangle; these may be calculated in terms of barycentric coordinates λ.

The colours of individual sample points within a triangle whose vertices are coloured C₁, C₂, C₃ can also be calculated as:

C(x, y) = λ₁(x, y)·C₁ + λ₂(x, y)·C₂ + λ₃(x, y)·C₃

and the inside test carried out using winding number (as below).
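A sketch of the barycentric evaluation (standard formulae, our naming; the triangle is assumed non-degenerate):

def barycentric(p, t1, t2, t3):
    # barycentric coordinates of point p in triangle (t1, t2, t3)
    (x, y), (x1, y1), (x2, y2), (x3, y3) = p, t1, t2, t3
    d = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    l1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / d
    l2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / d
    return l1, l2, 1.0 - l1 - l2

def sample_colour(p, tri, colours):
    # C = l1*C1 + l2*C2 + l3*C3; all li >= 0 doubles as the inside test
    l1, l2, l3 = barycentric(p, *tri)
    c1, c2, c3 = colours
    return l1 * c1 + l2 * c2 + l3 * c3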

Although the 'correct' way of comparing synthetic values with pixels is to use the foregoing formulae, another moderately less tedious approach is to subsample the original pixels themselves using the Carrato algorithm to give the subsample values (and error bounds), then to compare the subsamples against each other. Pixels then match if all the synthesised subsamples fall within the Carrato subsample error bounds. This limits the amount of area rule calculation needed, as some will still be needed along unsmoothed edges. If Carrato is used (it is not strictly speaking needed if pixels are fully re-synthesised and compared, but that involves more steps than direct comparison) then the parameter needs to be reduced (halved?) in magnitude at each sub-level. Using Carrato like this may produce a different result to pixel reintegration, requiring more contours at edges and capturing more high-frequency detail, but this has been shown to be advantageous in zooms. On the other hand it may also increase file sizes and overall computational cost, but in turn this can be mitigated by reducing the initial value of the Carrato parameter (0.5 is recommended as a default), thus exposing the performance advantages in using direct subsample comparison in the first place. This is one of several 'tuning' issues which will need investigating during implementation.

Contours are mapped out in terms of their intersections with pixel edges, bearing in mind possible pixel subdivision. It should be noted that pixel subdivision is considered a relatively rare event and more than compensates for the pixel mapping approach in the blind algorithm. In fact pixel subdivision is another form of pixel mapping, one which is only carried out when needed, unlike pixel mapping in blind encoding, which is carried out on every pixel. Here, also, we use contour proximity as a measure of edge proximity, using normalised pixel tree depths as the actual measure. This avoids using a separate and potentially expensive edge detection process, and pixel trees are reconstructed as part of the decoding stage anyway, as is inevitable in an adaptive system. Accordingly we can describe the semantics of this kind of contour drawing as corresponding as closely as possible to the drawing of an infinitely thin line of the contour colour, with edges modelled as the density of the contours deemed necessary for an exact reconstruction.

'Blind' encoders don't guarantee exact reconstruction (as shown in several examples here) and are in particular inaccurate on edges, but can you tell? Experts, however, worry about such things, so will be happier with adaptive encoding. The larger point is that blind encoders produce far too much data in comparison with adaptive encoding, and take too much time finding it, because of the sample-weighted overhead of producing maps for every pixel.

Decoding and rendering

The outcome of the encoding process is a hierarchy of contours (including contours R and S) defined in terms of the relation R encloses S, defined as follows in terms of closed connected regions such as A. To define encloses(·,·) we start with the well-known inside test inside():

p inside A ⟺ w_A(p) = s

where w_A(p) is the winding number for p in the closed region A and s is consistently +1 or -1 for every point p in the region. Now we need footprint(), defined as:

footprint(A) = {p | p inside A}

and hence:

A encloses B ⟺ B ⊂ footprint(A) ∧ ¬∃C : (A encloses C ∧ C encloses B)

So A encloses B is an ordering relation between individual closed contours in each level and determines all the contours B which are enclosed by A and by no other contour. Rendering the vector format consists of rules for drawing and filling these contours in order. The decoder, shown in outline in Figure 3, 'simply' applies the fill rule for the footprint of each pair of contours in the hierarchy, as defined by the footprint of the enclosing contour subtracted from the footprint of all the contours it encloses directly.
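The winding-number form of inside() can be sketched by summing signed turn angles (a standard method; the code and names are ours):

import math

def winding_number(p, poly):
    # winding number of the closed polygon around p: sum of signed
    # turn angles divided by 2*pi (+/-1 for simple contours, 0 outside)
    x, y = p
    total = 0.0
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i][0] - x, poly[i][1] - y
        x2, y2 = poly[(i + 1) % n][0] - x, poly[(i + 1) % n][1] - y
        total += math.atan2(x1 * y2 - x2 * y1, x1 * x2 + y1 * y2)
    return round(total / (2.0 * math.pi))

def inside_wn(p, poly, s):
    # p inside A iff w_A(p) = s, with s = +1 or -1 by orientation
    return winding_number(p, poly) == s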

So although contours are generated in levels (i.e. all the contours at a given level) they are ordered in terms of individual pairs of contours.


Rendering between nested contours

The principal issue in rendering is to use a process which mimics to some degree the fall-off in values of pixels from a higher level to an adjacent lower one (or vice versa). The intention is to develop a simple diffusion-based fill algorithm for the annulus between levels, as defined by level lines (isochromic contours). We will first develop the idea in familiar level set terms and then show how to implement it without having to solve the differential equations which the invocation of level sets implies. The reason for doing this is that Level Set theory makes the issues clear in a direct and easily visualisable manner, but the complexities of solving the equations led us to use known fast, scanline-based methods to give us quick renders. Figure 17 illustrates the derivation of this. Figure 17(i) shows the dilation of a contour Ψ to include a contour Φ. Figure 17(ii) shows the contraction of a contour Φ to inside Ψ. Figure 17(iii) shows the effect of matching the indices of Φ and Ψ. Figure 17(iv) shows the pinching of an isochromic contour by shape. Figure 17(v) shows the effect of two interior contours on intermediates before joining and Figure 17(vi) shows the effect after the joining of those interior contours.

We derive the formulae in terms of a simple case (Figure 17(i)-(iii)) of a single outer contour with level value R surrounding a single inner contour level S, and then generalise. We want to arrange for the inner contour S = ψ(0) to first expand (in the terminology of Vincent [Vinc93], 'to dilate') at a uniform (unit) speed until it wholly contains R (Figure 17(i)). At all times the level line for ψ(t), over the interval 0 ≤ t ≤ 1, gives the shape of the intermediate dilation at instant t, which we note is wholly dependent on S and has no connection with the shape of R. If, at the same time, the corresponding level line for R is contracted ('eroded' [Vinc93], as in Figure 17(ii)) at uniform speed until it falls wholly within S, then the level lines for intermediate erosion at a time point t₂, say, will intersect the level lines for dilation in a range around another time point t₁. These time points t₁ and t₂ now correspond to the times taken to reach the nearest points on R and S respectively, so define linear distances which can be used to interpolate colours associated with R and S respectively at the points of intersection of these two curves. What we are saying is that for any point inside the region R - S (R with S removed from within it) the shortest distances to the boundaries of R and S respectively determine the interpolation of the colours associated with the bounding levels R, S at that point. These interior colours will usually vary linearly from the values associated with one contour to the other, but this only happens if the geodesics running through the points in question are straight lines, i.e. the boundaries are not occluded from the point. In more complicated situations the interior colours will vary as though affected by surface tension, which is likely to fit what is actually found. Taking the outwards direction as positive (as shown in Figure 17(i)), the level set equation for the expansion (dilation) of ψ is:

∂ψ/∂t = F·|∇ψ|, where F = 1 if φ > 0, and 0 otherwise (2)

Here ψ(t) is the expansion of ψ(0) clipped by R = φ(0). The morphological distances of points in R, say R(i), from S are given by the value of t at the points at which ψ(t) = R(i). We can obtain the morphological distance fields for -R and +S by evaluating equation (2) and its matching partner for the erosion [Vinc93] of R, equation (3), as in Figure 17(ii):

∂φ/∂t = -F·|∇φ| (3)

This will give two Euclidean distances t⁺ and t⁻, where φ(t⁻) = ψ(t⁺) at zero or more points inside R - S, so we can calculate the colour values C of those points from the colours C_j, C_{j+1} associated with level lines R and S respectively, weighting each colour by the distance to the opposite level line:

C = (C_j·t⁺ + C_{j+1}·t⁻) / (t⁺ + t⁻)

We have used diffusion twice to give us morphological distances from each point in space in terms of the time to reach each of the two levels (here the speed of travel is unity so time = distance). When normalised by the sum of the distances this gives us an interpolation ratio between the two contour values which is linear for simple geometries but quadratic with a positive curvature - quite similar to the effects of surface tension - when the contour geometry becomes complicated. We refer to this process as double diffusion and note that isochromic lines are in effect interpolants between the shapes of the inner and outer contours, so it should properly be called double diffusion interpolation.

For the examples here we used a computationally simpler measure than Euclidean distance [Fabb&08], namely Manhattan distance, calculated outwards (inwards) from a border defined in terms of those pixels which contained the border contour. The Manhattan distance can be calculated like a fill process in which successive erosions or dilations define an ascending index starting at 1. Although the Manhattan distance is always an overestimate this tends to get normalised by the division of indices calculated in the same way. If the calculation of dilation (or erosion) is carried out in a quantised manner this naturally supports Manhattan distances, but if it is carried out continuously (e.g. by equations 2 and 3) this naturally supports Euclidean distances and the precise calculation of interpolants which are smooth everywhere.
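The Manhattan variant can be sketched as two breadth-first 'fills' followed by the interpolation above; the set-based representation and names below are our simplification, not the patent's:

from collections import deque

def manhattan_distance_field(seeds, mask):
    # BFS 'fill' giving Manhattan-style distance from the seed border
    # pixels to every pixel of the annulus (mask = pixels in R - S)
    dist = {p: 0 for p in seeds if p in mask}
    q = deque(dist)
    while q:
        x, y = q.popleft()
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nb in mask and nb not in dist:
                dist[nb] = dist[(x, y)] + 1
                q.append(nb)
    return dist

def double_diffusion(mask, border_R, border_S, C_R, C_S):
    # C = (C_R*t_plus + C_S*t_minus)/(t_plus + t_minus): each colour is
    # weighted by the distance to the opposite level line
    t_minus = manhattan_distance_field(border_R, mask)   # distance from R
    t_plus = manhattan_distance_field(border_S, mask)    # distance from S
    out = {}
    for p in mask:
        a, b = t_plus.get(p, 0), t_minus.get(p, 0)
        out[p] = (C_R * a + C_S * b) / (a + b) if (a + b) else C_R
    return out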

Smoothing diffusion

One of the aims of the invention is to minimise the intermediate file sizes produced. Gains here depend on the noise-estimation method used, as well as on the extent to which the diffusion process mimics the expected variation of pixel values within a contour footprint. The noise estimation options have been discussed already, but there remain a significant number of possible improvements to be made in respect of smoothing.

One improvement is to use a standard level set approach [Seth81] of adding to or subtracting from (depending on the sign convention used) the traversal speed of the level line a (usually) small amount, calculated as (say) 0.01 x curvature, where curvature is calculated as ∇·(∇ψ/|∇ψ|). This has the effect of smoothing out 'shocks' or discontinuities in the evolving line (as shown in Figures 17(ii) and 17(iii)), which is also a common problem in interpolating systems [Reev81] and which we will have to consider again later. (It is noted also that loops are another problem with 2D interpolators, but the rules for interpreting Level Set solutions explicitly preclude these under the 'weakest solution' rule; instead the loop is cut off at a point which often leaves a visible shock, to which the technique being described can be applied.) Shocks also arise when two advancing fronts intersect one another, but again the 'weakest solution' [Seth81] applies to determine a single front. Again the foregoing modification smoothes away the discontinuities, but the calculated indices are no longer computed from wholly linear (Euclidean) distance values. It is also possible to manipulate these indices further so that tangents in the trajectories of index values are matched across contour boundaries to achieve G¹ continuity. While we would expect an improvement in image quality (and matching improvements in file sizes) from these measures, such improvements are usually visually unnoticeable in an unwarped image.

These smoothing processes can be bundled together in the double diffusion process by replacing equations (2, 'upwind') and (3, 'downwind') by extended equations of the same form in which the propagation speed acquires additional terms. The result is a somewhat intimidating equation at first sight, so it is perhaps best to see how it arises from equation (2) by adding two smoothing terms: an intra-annular smoothing term and an inter-annular smoothing term.

We note that (C₁ - C₃)·|∇ψ| = (C₁ - C₂)·|∇ψ|₊ + (C₂ - C₃)·|∇ψ|₋, the average slope (on the Catmull-Rom convention) on either side of the point where t = 0.

In Figure 18(i) we show an example of a fill region, here 'upwind' of the contour level C₂. Figure 18(ii) shows a section of the map reduced to a two-dimensional slice. As the slice is normal to the contours C₂ and C₃ we can assume propagation will lie in the plane of the slice, and we use this view to gain insight not only into how the equation has been put together, but also into how to develop alternative methods to actually solving the differential equations directly on this basis.

What we want to do is to derive a tangent direction with which to line up the tangents with which the diffusion processes from the central contour (here at level C₂) begin. Unconstrained, these will not be in equal and opposite directions and will result in a crease in the surface; this has actually been seen in resizing experiments, so is real rather than theoretical. From Figure 18(ii) we can see that we can take the overall gradient

T = (C₁ - C₃) / (m + n)

as the slope to which diffusion surfaces away from level C₂ are initially constrained. Here m, n are the Euclidean distances from level C₂ to the adjacent levels as seen on the isosurface plan view, Figure 18(i), and T₋, T₊ are the downwind and upwind tangents respectively. Similarly the magnitudes of the normals on the isosurface at level C₂ are

|∇ψ|₋ = (C₁ - C₂)/m, the downwind normal, and |∇ψ|₊ = (C₂ - C₃)/n, the upwind normal.

So to scale the upwind tangent to make it match the intended tangent (i.e. divide by the original tangent, then multiply by the one we want) we can use the inverted normals to provide this scaling, giving the scaled tangent (and similarly for the downwind direction). So we replace the normalisation term in (2) by an extended term which scales the tangent as desired, albeit differently for the upwind and downwind directions.

The most common context in which smoothing is required is that of splitting the interval. The resulting equations refer to the 'sheets' of Euclidean distances sent out from C₂ to determine intermediate colours, and different ms and ns are used on either side of an interval when computing the (modified) Euclidean distances used in interpolation. If m₁ and n₁ apply downwind and m₂ and n₂ apply upwind, then the interpolation formula upwind from the C₂ level for colour C follows the same pattern, where t⁺ and t⁻ refer to the upwind and downwind Euclidean distance measures respectively.

The effect of this formula is to give a modest smoothing with G¹ continuity at the relevant level.

The following example gives a sense of the degree of smoothing achieved. An isosurface 'slice', essentially an extension of the surface shown in Figure 18(ii), is tabled by levels as set out in Table 7 below.

Table 7

The values of t, Z and the resulting colours are set out in Table 8 below and illustrated in Figure 19:

Table 8

Splitting the interval isn't the only context in which subdivision occurs. As can be seen, adjacent intervals are also significant, and they may not necessarily carry the same interval, so the equal interval case is not general; in this (general) case the formulae are generalised accordingly.

A similar example to before (level 5 = 32 has been removed, and the appropriate Euclidean distances adjusted) is shown in Table 9 below:

Table 9

This results in the table of coefficients (Table 10), which is also illustrated in Figure 20:

Table 10

6 19 21 23 24 25 26 28 29 31 32 34 36 38 40 42 44 45 47 48

This does not cover all of the issues relevant to getting valid solutions. The two-dimensional case is not general in one other respect, namely as to what values n and m actually have in the interior of the diffusion. The values can be unambiguously identified at the start: they are the results of previous propagation from the other boundary, and decorate the contour ahead of completing the smoothing process. However, the m and n values have to be propagated as well, and the values corresponding to the path with the minimum Euclidean distance to the boundary (the contour) taken each time. Because of this and other considerations, notably wanting to include intra-annular smoothing in the process and wanting fractional accuracy, which the usual fast algorithms don't provide [Fabb&08], it looks as though a progressive Euclidean distance algorithm should be preferred.

The intra-annular component has been covered already. Here the smoothing coefficient ε correlates inversely with the depth of the pixel tree, being a small constant (~0.01) when the pixel depth is zero, and zero when it is more than a set limit. A similar arrangement is needed with inter-annular smoothing, and the default at the point when smoothing is switched off is the original interpolation function, i.e. Z⁺ and Z⁻ default to 1 at edges. This can be accomplished by using the same function to scale the Z values from their calculated values to 1 using the same criterion and threshold values. Here ε₀ is the (set) maximum value of ε(ψ) [Seth99].

Video Encoding: Motion VPI

The description above has focused solely on single images. However, it will be evident that vectorised images can also be used in video sequences. As with motion-JPEG (a version of MPEG where only I-frames are transmitted), images could be vectorised, the contour coordinates and decorations (e.g. colour) transmitted, and the image reconstructed in a vector decoder. Such a codec would be scarcely different from a single-image codec, differing only in the handling of images in a sequence. As we have seen, such a form of motion VPI is unlikely to see bandwidth advantages over motion JPEG, let alone more highly compressed forms, much below 8K image resolution. Image quality would be better, as V formats can ensure 'sensor quality': not identical to what the sensor sees, but with every detail, except noise, present; whereas the systematic discarding of frequencies in JPEG compromises images for effects work, let alone introducing artefacts like the 'shower door', 'blocking', 'mosquito noise' and 'ringing' effects (Gibbs' phenomenon) at high levels of compression.

In fact simply transmitting full contour sets all the time is quite wasteful and inefficient, for two reasons. First, many elements of a moving image do not change from frame to frame apart from orientation, and even here the changes are small; re-sending the local contours that make up unchanging objects is wasteful, as they have been sent already. Second, some contours, particularly silhouette contours, resemble each other closely enough that they do not need to be re-sent in their entirety: only the edits needed to make a previously sent contour the same as the one which would otherwise be sent. Accordingly there is a possible bandwidth economy in sending only the edits and having the decoder perform them. The gains in bandwidth resulting from these measures are potentially great, possibly as much as 90% or more for identical contours and a somewhat subject-dependent amount in respect of similar ones.

At first sight matching every contour in one image against every contour in the next seems like a Herculean task, but there is a quite straightforward approach to performing this task efficiently, sufficiently so to allow for some exploration of 'similarity' over and above 'identity'. Another point is the number of contours to be considered, which is far smaller than the number of pixels. Indeed contour-based operations are unlikely to take much more than 10% of the time required by their pixel-based equivalents (boundaries versus areas again). The cost comes when encoding and decoding, which needs to be kept to a minimum. In essence sampled image sequences need to be translated on capture into a V-form, for example VPI, then handled and manipulated in V-form, and finally reconstructed into sampled form at the moment of viewing. All intermediate processes will then be computationally far cheaper than before, and equally quite extensive transformations can be included without compromising performance. Comparing contours to minimise bandwidth is one such transformation. These are all advantages over sampled forms which increase progressively with resolution.

Curvature

The criterion we use for contour shape measurement is curvature, a property of freeform shapes which Kunii and others [Kuni&97][Kuni98] claim is a usable invariant. By curvature we mean the signed inverse of the radius of curvature. In fact what Kunii et al really mean is that curvature tends to be preserved in natural shape changes, although this is not always true, as can be seen in the example in Figure 21.

Here the curvature measured along the black lines (most of the drawing) is unaltered despite orientation changes, but curvature changes quite considerably for the short blue sections around the "elbow". However, it is really the magnitudes of the local curvatures that change, and in the example the signs do not change anywhere outside the fold in the sleeve. If this extra feature had been omitted (and indeed it would not be present under unrevised Level Set interpolation rules), we could say that the sign of the local curvature did not change anywhere. In fact curvature (value plus sign) is invariant under translation and rotation, and its sign under more general affine transformations and when purely local shape changes take place within a contour. For example, one might expect that the curvature of most of a contour will not be affected, either because most of it remains in the same place or because it only changes orientation, i.e. is subject to affine transformations. Over and above this one might also reasonably expect the signs of local curvature to be preserved, to the extent of approximating to an invariant. It will not be necessary to match whole contours using curvature identity when shape changes are clearly taking place, just long sections. In fact relative lengths of segment matches and preservation of sign give confidence measures for use in deciding correspondences between contours (also positional predictability across sequences, which will involve finding motion vectors). Later embodiments of the present invention exploit this.

The use of cubic Bezier chains as contour models is especially helpful in determining curvature. A cubic Bezier segment can take only three forms: a curve with positive curvature everywhere, one with negative curvature everywhere, and one with a point of inflection, which can be split (for the purposes of classification) between adjacent curves each carrying a single sign of curvature. We take all consecutive segments with the same curvature sign and group them together, classifying them as a single section of the contour with the curvature sign all its components carry. Accordingly a contour can be classed in terms of sections each of constant curvature sign, starting with positive, e.g. +-+-+-+-.

We can therefore classify contours in terms of their section sign-pair or 'petal' count, starting with + (zero), +- (one), +-+- (two) etc. Our previous example was a four-petal contour and could look like the example in Figure 22(i) below.

Notationally we refer to a petal or whole petal as the shape forming a +- pair, the exception being a single + which is ambiguously a whole or a half-petal. A half-petal is either the + section or the - section of a whole petal and will have its own bounding box.

Any four-petal contour like the one in Figure 22(i) is equivalent, from the point of view of overall curvature, to its petal diagram (Figure 22(ii)) where the curvatures are carried by uniform curves of the same size for each sign. As can be seen this gives us a quite effective partitioning of possible contour shapes, as the petal count for contours evolving in time is likely to be preserved. The necessary data (curvature sign, half-petal bounding boxes, points of inflection, corners) are all readily identified during the process of identifying the Bezier chain segments, and can be passed on from there.
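By way of illustration, the following sketch derives the sign signature and petal count from a list of per-segment curvature signs (+1, -1, with 0 for straight lines). The function name and the handling of zero-sign runs are assumptions, and wrap-around merging for closed contours is omitted for brevity:

    def petal_signature(segment_signs):
        """Collapse runs of equal curvature sign and count +- ('petal') pairs."""
        runs = []
        for s in segment_signs:
            if s == 0:                      # straight line: absorbed into current run
                continue
            if not runs or runs[-1] != s:   # runs of the same sign collapse to one entry
                runs.append(s)
        if runs and runs[0] < 0:            # convention: always start with '+'
            runs = runs[1:] + runs[:1]
        signature = ''.join('+' if s > 0 else '-' for s in runs)
        petals = signature.count('+-') if len(runs) > 1 else 0.5
        return signature, petals

    print(petal_signature([+1, +1, -1, +1, -1, +1, -1, +1, -1]))  # ('+-+-+-+-', 4)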

The Sieve

An overview of the algorithm

Here we give a brief overview of the algorithm, followed by the mathematical considerations. The pseudocode of an embodiment implementing this is set out in more detail below. The first stage is performed when the contour intersection set is first reinterpreted as a Bezier chain. At the time this is begun the contour model is that of a polyline joining the intersections of the contour line with the x±0.5, y±0.5 lines as a single loop. Each x or y intersection comes with a (linear) error bound. The 'fountains of difference' algorithm is then applied and this results in a series of end-points and tangents for the Bezier curves (an alternative would have been to use Catmull-Rom splines mixed in with the Bezier segments, but they, unlike Beziers, turn out not to be affine-invariant, and this property becomes important), which are the longest cubic curves which can be found within the given error bounds. If these error bounds are loosened then the usual consequence will be that the individual Bezier segments get longer, with some no longer being needed, so the number of segments in the chain will also reduce.

Eventually the error bounds will be so loose that the petal count for the contour will drop to ½, its lowest possible value, and possibly soon after only one Bezier segment will be needed in the chain. However, at present the most sensible stopping point is when the petal count first becomes ½. The number of iterations needed to achieve this is also significant, and is retained. At the same time as determining the petal number for each iteration, petal maps and curvature maps for each half-petal are determined, along with bounding boxes with error bounds for the resulting shape. The italicised terms will be explained in more detail later, but it is arguable whether curvature maps, which show how curvature varies within a single half-petal, will provide sufficient extra discrimination for initially high petal count contours which have been degraded to low petal numbers. This, however, is something which could be tested in practice (we will assume for simplicity that they are useful at all levels). So a contour presents itself as something we could describe as a scale-space form, although that isn't an exactly correct usage of the term.

The contour thus presents at the sieve as a hierarchy of data with size parameters at the top and the 'correct' Bezier chain form at the bottom. Size parameters consist of (1) the length of the 'long' axis of the final bounding box, (2) the corresponding length of the 'short' axis, and (3) the number of levels in the hierarchy needed to reduce the petal count to ½. Each is a discriminant. Lower levels all consist of the petal number at that level, the petal map at that level, and the curvature maps for that level, with error bounds for each curvature value. The bottom-most level also carries the Bezier chain control points.

Contours are first discriminated by size parameters, in order. Bounding box sizes are allowed a range of ×2 to ×½; level numbers are exact. They are subsequently discriminated by, in order, petal count, petal map and curvature map. This is done repeatedly down the tree, matching in each case within error bounds where appropriate. Petal maps indicate the number of segments needed to make up each half-petal and include corners and straight lines. Petals with corners must match (there may be options on this in the presence of multiple corners in the contour, but multiple corners in a single half-petal must match within a single Bezier segment); otherwise it is only the segment count that matters, and then only in terms of the order it imposes on petals within a contour. There are similarities here with the previous 'sieve'. It is only at the bottom of the hierarchy, if discrimination hasn't happened by then, that individual Bezier segments are compared by looking at their control points. In each case the current contour being compared with the tree contents is re-oriented, initially by scaling (which does not preserve curvature, but this can be corrected), then by rotation. Bounding boxes are present at each level for alignment (translation) and carry error bounds.
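The sketch below illustrates the discrimination order only; the data layout, the tolerance value and the simplified equality tests (the text describes sorted, error-bounded matching with re-orientation) are all assumptions:

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Level:
        petal_count: float        # 0.5, 1, 2, ...
        petal_map: tuple          # e.g. ('+3', '^', '+2', '-4'): segment counts, corners
        curvatures: List[float]   # representative curvature per half-petal

    @dataclass
    class SieveEntry:
        long_axis: float
        short_axis: float
        levels: List[Level]       # coarsest level first

    def sieve_match(a: SieveEntry, b: SieveEntry, curv_tol=0.25) -> bool:
        """Discriminate in order: size parameters, then per-level petal data."""
        for p, q in ((a.long_axis, b.long_axis), (a.short_axis, b.short_axis)):
            if not (0.5 * q <= p <= 2.0 * q):   # bounding boxes within x2 .. x1/2
                return False
        if len(a.levels) != len(b.levels):      # level numbers are exact
            return False
        for la, lb in zip(a.levels, b.levels):  # repeatedly down the tree
            if la.petal_count != lb.petal_count:
                return False
            if la.petal_map != lb.petal_map:    # corners must match
                return False
            if any(abs(x - y) > curv_tol
                   for x, y in zip(la.curvatures, lb.curvatures)):
                return False                    # curvature match within error bounds
        return True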

When a contour is discriminated as being 'novel', i.e. failing to match at any level with a pre-existing entry, it forms a new entry in the discrimination tree at that level. All higher-level hierarchy data has been discarded by then, so the scale-space list gets shorter as the structure proceeds down the tree. If it matches, but hasn't reached the bottom level, then both matches descend the tree, discarding the previous level data, and a re-match is attempted.

On conclusion, all matched contours will remain unseparated at the bottom of the tree and will correspond exactly in shape within the original error bounds, though segments may not align.

Partial matches will be tested for at lower levels; partial matches will point to one another if later discriminated, and testing continues down to the bottom level. Similar but not identical shapes, for example in-betweens, will be discriminated only at low levels and will reveal themselves by multiple partial matches between each other separated by regions of identically-signed curvature (straight lines have a sign of zero, so a partial sign change, plus to zero or minus to zero, is acceptable here, albeit at a lower confidence value). So matches are scored 100% (perfect matches within error bounds), partials 99-1% depending on the proportion of similarity, and no match 0%. Since partials are sent on as updates, these are specified in a command list and both forms of the partial are retained against possible full matches later.

Curvature

The key aspect to all of this is curvature, which, fortunately, can be calculated quite easily. For a curve given explicitly as y = y(x) the key formula is

    κ = y″ / (1 + (y′)²)^(3/2)

Using the implicit cubic Bezier form

Now, if we have differences for y = A·x³ + B·x² + C·x + D then

    Δ¹_y = 3·A·x² + 2·B·x + C and
    Δ²_y = 6·A·x + 2·B, so:

    κ ≈ Δ²_y / (1 + (Δ¹_y)²)^(3/2)

This assumes that the differences are taken directly from the cubic and not, say, from the differences in the 'fountain' tableau, as these are contaminated with the original errors E_v. If we were to use those we would get a quite approximate value for the curvature.

If we now treat all the E_v as E_v ≈ E_mean ± E_sd, i.e. a mean with up to ± one standard deviation added, then the mean will be zero and all the E_v will be standard deviations. So (using a truncated Maclaurin expansion on the denominator),

    κ_approx ≤ κ_exact ± A·E_y

In other words, |κ_exact − κ_approx| < A·E, so we could use the difference between the two as an error measure which scales with E.

Using the parametric Bezier form

If we have a parametric equation it is of the form φ = (x(t), y(t)) in 2D. Using dot notation the curvature equation then reads:

    κ = (ẋ·ÿ − ẏ·ẍ) / (ẋ² + ẏ²)^(3/2)    (1)

For a cubic Bezier the (vector) equation and its derivatives are given in the following Table 11:

Table 11

    φ(t) = (1−t)³·P0 + 3·t·(1−t)²·P1 + 3·t²·(1−t)·P2 + t³·P3
    φ̇(t) = 3·(1−t)²·(P1−P0) + 6·t·(1−t)·(P2−P1) + 3·t²·(P3−P2)
    φ̈(t) = 6·(1−t)·(P2−2·P1+P0) + 6·t·(P3−2·P2+P1)

Deriving the curvature is simply a matter of 'plugging in' the formulae to equation (1).
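As an illustration of this 'plugging in', the sketch below evaluates the derivatives of Table 11 and applies equation (1); the function name is an assumption, and the check value uses the standard circle-approximation control points:

    def bezier_curvature(p0, p1, p2, p3, t):
        """Signed curvature of a cubic Bezier at parameter t, via equation (1)."""
        def d1(a, b, c, d):   # first derivative, one coordinate
            return 3*(1-t)**2*(b-a) + 6*t*(1-t)*(c-b) + 3*t**2*(d-c)
        def d2(a, b, c, d):   # second derivative, one coordinate
            return 6*(1-t)*(c - 2*b + a) + 6*t*(d - 2*c + b)
        xd,  yd  = d1(p0[0], p1[0], p2[0], p3[0]), d1(p0[1], p1[1], p2[1], p3[1])
        xdd, ydd = d2(p0[0], p1[0], p2[0], p3[0]), d2(p0[1], p1[1], p2[1], p3[1])
        return (xd*ydd - yd*xdd) / (xd*xd + yd*yd) ** 1.5

    # quarter-circle approximation of radius 1: curvature should be close to 1
    k = 0.5523
    print(bezier_curvature((1, 0), (1, k), (k, 1), (0, 1), 0.5))  # ~0.994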

The point to note here is that, although the Bezier curve is affine invariant, its curvature isn't. It is preserved in translation and rotation, but not in scaling. If we assume a global scalefactor s is applied to the control points, then φ̇_s = s·φ̇, as each of the control points is scaled by s, and φ̈_s = s·φ̈, for the same reason (the s subscript indicates the scaled version). Accordingly, substituting into (1),

    κ_s = (s·ẋ·s·ÿ − s·ẏ·s·ẍ) / (s²·(ẋ² + ẏ²))^(3/2) = κ / s

So curvatures need to be scaled by 1/s if scaling takes place (right at the beginning of the sieve discriminations). Axial scaling (differential scaling in x and y) results in a more complicated correction factor.

The only possibility here is to carry the derivative terms directly and form curvatures as required at the time of comparison.

Uses of Curvature

The algorithm uses several kinds of petal map, each prepared during the process of repeated generation of progressively simplified versions of the contour. The first 'map' is the petal count. We reclassify straight lines and corners as belonging to runs of one sign or another (allowing for straight lines to 'change sign' if ended by curves of opposite sign) and replace runs of curvatures of the same sign by a single sign. In this way we treat all contours in terms of these signs, as follows:

One half-petal: +

One petal: +-

Two petals: +-+- etc. Each +- pair is a whole 'petal'. By convention we always start with a '+', so the sequence (for a closed contour) always ends in a '-', with the sole exception of the (degenerate) single half-petal. We don't need to retain the petal pattern; the number (petal count) is definitive, although we would typically use zero as the indication of a half-petal in implementations.

A more discriminatory measure is a petal map. Here we count the numbers of +s and -s in sequence and note the position of corners and straight lines in the map. So for a single petal with no straight lines or corners we would have +(n) where n is the number of Bezier segments in the chain. If there was a corner or a straight line, then this would be marked, here ^ and | for corner and line respectively; e.g. +(3)^+(2) indicates a + half-petal with a single corner with three Bezier segments 'upwind' and two 'downwind'. So a complete petal map shows how each half-petal in order is made up. For comparison purposes the half-petals are sorted by size (number of segments in each petal) after each half-petal is given a sequence number. In fact the petal maps' sole purpose is to ensure that strong clues like corners are prioritised when determining orientation, so petal maps are only invoked if corners are actually present. Straight lines are also discriminated in the map solely to ensure that the segment counts are as accurate as possible.

However, when two petal maps are compared the petals containing the corners may not line up exactly, and in fact the minimum distance between petal sort order and petals containing corresponding contours (i.e. of the same sign) is an indication of error bounds. The largest of these errors is now assigned to each sort entry and a tree of entries (as discussed below in connection with curvature) can now be built. The correspondence condition is thus similar: that there is at least one path from root to leaf matching the comparator's order, and this determines the reorientation of the candidate contour with respect to its comparator. In this case the error bounds are only known at the point of comparison, so tree construction has to be done at decision time, not at map construction time. From the point of view of making timing estimates this will be a comparatively rare event, so not significant. The point is that the initial orientation (rotation) prioritises corner matching, if and only if this is available, and this is because the number of segments, unlike curvature measures themselves, is not necessarily a reliable indicator of shape correspondence.

So we come to the third kind of map, which is the curvature map, showing the curvature changes in each half-petal. A single Bezier segment will typically have its curvature measured near its interior points (t = 1/3, t = 2/3). The error bound around each curvature measurement is also quoted, and if the error bounds are wide enough to include each value then one value is used to indicate the average curvature for the segment. Similarly if adjacent curvature values fall within each other's error bounds then only one is quoted. All that is needed is a 'good enough' estimation of curvature until this becomes precise enough to make a significant discriminator. Once again these could initially just be used as relative values, to show regions of high and low curvature, by assigning them sequence numbers and sorting round the entire contour. In fact the sequence would be organised as a tree, whose root is an anonymous start point and whose immediate descendants are the legal largest curvature candidates taking error bounds into account, followed by the next largest, depending on choice of largest, etc. Two trees would correspond if each tree had a common path from root to leaf, and discrimination on the basis of petal maps would fail if all possible alternatives were present in the tree, i.e. as many leaves as permutations of list entries. Unlike with petal maps, error bounds are available for each contour and can be precalculated (and indeed allow for the map to be withdrawn from making comparisons if not discriminatory enough at any level).
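A minimal sketch of forming one curvature-map entry, using the bezier_curvature sketch above; the tolerance stands in for the quoted error bound and is an assumption:

    def curvature_map_entry(p0, p1, p2, p3, tol=0.1):
        """One value if the two interior samples agree within tol, else both in order."""
        k1 = bezier_curvature(p0, p1, p2, p3, 1/3)
        k2 = bezier_curvature(p0, p1, p2, p3, 2/3)
        if abs(k1 - k2) <= tol:
            return [(k1 + k2) / 2]   # single value stands for the whole segment
        return [k1, k2]              # quoted in order along the segment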

Where the interior point curvatures don't fall within their respective bounds the curve is subdivided further (t = 1/6, t = 1/2, t = 5/6 etc.) until the curvature measurement is fine enough on the same criterion. In the map, multiple curvatures for a single segment are quoted in order.

Textures and generalisations

Textures have always seemed to be a potential problem in V technology, mainly because it has always been assumed that large numbers of contours would be needed to handle high-frequency textures. In fact even the 'blind' VPI codec seems to have handled such textures quite well, although the time cost is unknown. So far in our discussions we have assumed that animated textures would not be treated as any kind of special case, but in practice this could mean having to send a lot of texture detail, thus possibly compromising bandwidth. Static textures, or textures which change only according to affine transformations (changes of viewpoint), will be handled mostly by sending the affine transformation and not the contour data. The receiver then reconstructs the contour view by applying the transformation to it. In essence we would have a subtree of locally neighbouring contours each sending the same affine transformation, and this could be simplified by editing out the subtree and propagating the transformation towards the subtree root. Thus affine transformations may sit remote from the contour to which they apply, so they need to be propagated according to the usual Hierarchical Display Model rules (see discussion below). Given that affine transformations are propagated, it opens the possibility of making systematic changes to these transformations as they are propagated, so the subtree itself may be decorated by these transformations. In this way the behaviour of a region of similar texture may be managed from subtree nodes which don't actually carry contours. The simplest way to build such a subtree is by inspection, starting from the leaves and working upwards. Once the subtree is identified and built it can be traversed to find non-local commonalities. It may be possible to identify non-linear, similar parameter values up to degree three by working on an assumption of local similarity and attempting to relate parameters cubically. This is another example of where the motion-VPI model can be extended, as discussed further below.

Codec Structures

In motion VPI we might think of two different codec pairings, as shown in Figure 23. In the back-to-back codec all that is involved is to transfer the contour tree hierarchy from the encoder block to the decoder block. While straightforward, it is also inefficient, as most of the contours, typically at least 90%, will either be the same or similar. On our curvature matching model this means that most parts of the contour will be the same while one or a few parts will be changing. Since contour matching takes place regardless of size or orientation (within limits), contours making up moving backgrounds (say) will be matched, along with many parts of foregrounds. In back-to-back all this unnecessary data is transferred anyway. From the file sizes given above, this arrangement won't even match motion JPEG below 8K image sizes, and Motion JPEG files are typically some factor of 2-5 times larger than MPEG files, which places the crossover boundary between 8 and 16K.

If, instead, we try to avoid sending unnecessary contours or even unnecessary contour segments then we can achieve much better comparisons, with Motion VPI passing MPEG between 2 and 4K, which is pretty much what we want to achieve as a minimum. In this case Motion VPI is the preferable format above HDTV and for all present candidates for digital film formats. To achieve this we need to insert a module in the data path which functions like the sieve algorithm detailed earlier. In essence all novel contours are directed into a cache which contains all the novel contours encountered so far in the sequence, together with the edits to those contours which change them into similar novel contours. The code generator then generates SVG-like instructions to draw the contours and manage the fill operations. These either refer to cache addresses or to the novel contour data directly, depending on whether the contour has been matched, even partially, or not. The cache address is actually that of a mirror cache which is maintained by the decoder in lock step with the encoder's cache. The code generator thus needs to generate additional cache manipulation commands whenever the encoder cache is altered.

The decoder block now executes the commands sent to it by the encoder, including cache manipulation orders intended to keep the caches the same. So when a command refers to a cache address it takes its contour data from the mirror, or receives the contour data and places it in the mirror first. One might imagine that if 90% or more of the contours are essentially the same from frame to frame the caches would fill up very quickly, or else have to be very large. If, instead, a least-recently-used regime is used (the contour most recently accessed goes to the head of a queue, so the least recently used contour is at the tail of the queue) then a cache not much bigger than, say, two typical image data structures would be quite sufficient, as least recently used contours may be removed altogether. If, after removal, a would-have-been match comes along it gets treated as a novel contour, and the idea is to make the size of the cache big enough for this to be a rare event. There are a number of ways this cache might be organised, but the semantic model reflected in the order code is that the most recently accessed contour goes to the head of the queue in its entirety. If it is a partial match then its edited form is matched first; again the most recent form is tested first.
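A minimal sketch of such a least-recently-used contour cache (class and method names are assumptions); the encoder and the decoder would each hold one, driven by the same command stream so that the two stay in lock step:

    from collections import OrderedDict

    class ContourCache:
        """Most recently accessed contour moves to the head; evict from the tail."""
        def __init__(self, capacity):
            self.capacity = capacity
            self.entries = OrderedDict()                 # contour id -> contour data

        def access(self, cid):
            """Look up a contour; a hit refreshes its queue position."""
            if cid not in self.entries:
                return None                              # treated as a novel contour
            self.entries.move_to_end(cid, last=False)    # head of the queue
            return self.entries[cid]

        def insert(self, cid, contour):
            """Add a novel (or edited) contour, evicting the least recently used."""
            self.entries[cid] = contour
            self.entries.move_to_end(cid, last=False)
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=True)          # drop the tail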

There is one final aspect to this discussion about structure. If the decoder simply obeys instructions it may be structured as an interpreter: it simply receives commands and executes them. This is true for VPI as well as the others, so the VPI decoder forms the basis for all decoders.
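A sketch of that interpreter structure; the command vocabulary, the canvas object and the edit representation are assumptions for illustration only:

    def run_decoder(commands, cache, canvas):
        """Execute encoder commands, keeping the mirror cache in lock step."""
        for op, *args in commands:
            if op == 'DRAW_CACHED':          # contour already in the mirror cache
                canvas.draw(cache.access(args[0]))
            elif op == 'DRAW_NOVEL':         # novel contour: mirror it, then draw
                cid, contour = args
                cache.insert(cid, contour)
                canvas.draw(contour)
            elif op == 'EDIT':               # partial match: edit a cached contour
                cid, edit = args             # 'edit' maps old contour to new contour
                cache.insert(cid, edit(cache.access(cid)))
            elif op == 'FILL':               # manage the fill operations
                canvas.fill(*args)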

Thus a rendering process, entirely analogous to rendering processes used in 3D graphics, is provided. The problem addressed by this process is: if flat fill is insufficient, how are the intermediate pixel values between contours to be determined when rendering back to the sampled form? This problem, and the present invention's solution to it, is illustrated in Figures 1(i)-(iii) above. In Figure 1(ii) the straightforward approach of flat fill has been taken. More explicitly, in this version of the image, each contour footprint, here defined as the image region uniquely enclosed by a contour, has been filled with a constant 'average' colour. In Figure 1(iii) a diffusion-based interpolation technique (as described herein) has been used instead. Each image was rendered from the same contour set. The different outcomes are quite noticeable: flat fill in Figure 1(ii) shows Mach-banding artefacts which are wholly absent with the interpolated fill of Figure 1(iii). As can be seen in Figure 1(iii) the interpolating fill technique produces satisfactory results even though a quite simple rule has been used to determine which contour levels to use in the vector form (the 'blind' algorithm).

Accordingly, a first embodiment of the present invention provides a method of encoding images for transmission. This embodiment is the encoding part of a codec which will be termed the "fully adaptive VPI codec" as it seeks to take account of the characteristics of the input data in order to improve compression and/or speed of coding/decoding. Table 12 shows the pseudocode (the natural language overview of the steps involved) of the encoding method, along with the time taken to complete the relevant steps. Programming of this pseudocode into computer-readable code is a matter of routine implementation for a programmer skilled in this field.

Table 12

Step Activity

.2.1.2 stack enclosing footprint

.2.1.3 restart scan for contour solutions inside current footprint [9]

.2.1.4 }

.2.2 test render inside footprint [8][10]

.2.3 until footprint scan completed

.3 test removal of enclosing contour (and remove if pixels still in range)

.4 unstack footprint

.5 until end of last enclosing contour

until last component completed

Format data structure and write to file as grouped YUV contours in VPI format

Explanatory notes to Table 12 (indicated by [x] in table)

[1] The standard noise estimation uses signal-to-noise ratio; see EMVA standard 1288. We will use the formulae given there as a first estimate of pixel error via its SNR.

[2] There are at least three possible strategies for determining pixel errors: 1) by modelling (arguably the blind encoder uses a very simple noise model), 2) by using externally computed values, e.g. from a JPEG decoder, or 3) as here, by filtering the image and using the filtered values thereafter. We use SNR initially and scale it if additional contour smoothing is required. If additional smoothing increases the error bound then this is taken as the new bound and a scalefactor calculated. Further details of filtering are discussed in the section on Contour Representation above.

[3] This provision always ensures enough colour samples to give a correct rendering for the pixel value. This is particularly important at edges and the tile data structure should be deepest at edges, making this an implicit edge detector. The bits of the structure which matter to the rendered image will tend to be rebuilt in the decoder (see below), so edge data is available in both encoding and decoding without making explicit provision for it in either case.

[4] A depth limit of 2 + <number of bits per colour primary> is set to prevent uncontrolled recursion where two contours are on top of each other and cannot be separated, e.g. top and bottom of a vertical cliff. At the set limit no change in the recovered colour primary will be seen if the recursion goes deeper.

[5] Only a single level needs to be quantised in its entirety and by convention it is the level which splits the colour quantisation space. While this isn't critical for VPI, consistency in finding levels is crucial to motion VPI applications. So the first contours sought in an 8-bit quantisation are 128, then 64 (192), then the interval splits between them, etc. These should result in half intervals being targeted in normal cases. The rest of the 'scaffolding' ahead of a test render is found inside the footprints into which the original image has now been subdivided. If a footprint contains two or more extrema with different values (if they are the same that is OK) then the interval is split until they are in different footprints. Neighbourhoods can be rendered (unsmoothed) with flat fill (they will be within ± half the error bound anyway) but can be smoothed by interpolating the index field with a horizontal flat surface at the spot height and applying the usual fill and smoothing rules. The spot height is thus a degenerate small contour and the fill is a degenerate annular fill. Contours inside footprints left over from the initial level contourisation are contours at the same level as the footprint itself, so lead to regions which are opposite to the region enclosed by the annulus (i.e. descending if ascending and vice-versa). The result should be a considerable number of expressed contours which are needed for level rule consistency (their necessity may be checked later) but are obtained without test renders. This offers the possibility of a considerably faster and more space-economical encode (expected to be 2x better) than with, say, a 3-levels regime. The remainder of course will be revealed in a succession of test renders.

[6] Management of which is the 'current' footprint can be done by looking at the tree data structure into which contours are placed. Discussion of 'stacking' footprints really implies simulating this via data-structure relationships and requires a 'history' structure to support this.

[7] The 'fountains of difference' algorithm. The pseudocode for this algorithm is set out in Table 13 below.

Table 13

Step Activity

1 For all x intersection solutions do {

1.1 Reconcile subdivided tile to give synthetic lowest resolution level tile (subdivisions need to be retained at this stage)

1.2 Calculate x_t [as in note 1.1.1 for the blind encoder, note 2.1.2 for the adaptive encoder]

1.3 Add x_t, ε_x,t to the initial polyline path

Explanatory notes to Table 13 (indicated by [x] in table)

[7.1] This assumes a 5 or 7-tap 1-dimensional Gaussian filter set to exclude pixel-pitch wavelengths. The formula for calculating convolution weights is:

    w(x) = exp(−x² / (2·σ²))

Here σ is the standard deviation and x is the normalised index (taken as a distance) from the kernel centre. The distance between each 'tap' (i.e. the increments of x) used in a 5-tap filter was also 1. A 7-tap filter is nonetheless recommended in the literature.
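A minimal sketch of computing such a kernel; the normalisation to unit sum is an assumption:

    import math

    def gaussian_kernel(taps=7, sigma=1.0):
        """Normalised 1-D Gaussian weights; tap spacing 1, centred on zero."""
        half = taps // 2
        w = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-half, half + 1)]
        s = sum(w)
        return [v / s for v in w]

    print(gaussian_kernel(5))  # five weights summing to 1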

[7.2] Chord-length is being used here as an approximation to the 'true' Bezier parameter value. As a uniform parameterisation it allows the formation of differences but the algorithm refines cubically-related values near to the given values, so returns cubic coefficients for x and y, thus restoring the Bezier parameterisation as the increments between coordinate pairs (points).

[7.3] The final points here must be baseline-length-equidistant everywhere, including between first and last points, which are arbitrary. Normalisation to recover a 'true' parameter is not necessary.

[7.4] It is essential that polyline lengths are measured along polyline segments, i.e. a given distance corresponds to the sum of the lengths of the polyline segments so far. The actual position in a polyline is found by first finding the segment in which the path falls, then interpolating between the previous, reached, vertex and the next, unreached, vertex, using the difference between the previous vertex path length (assigned chord length) and the current distance as the interpolation parameter. Chord-lengths are measured along the polyline, so chord lengths determine the number of steps taken from start to finish, and this needs to be integral, to assist the formation of continuous difference tables.

[7.5] (i) The differences are first formed as forward differences, and central differences are only formed as required. As central third differences are always required, these should be formed immediately. The table index 'wraps around', so the last value entry also precedes the first, and differences must be filled in for the wrap-around also. It may be the case that a curve will straddle the ends of the table, but this won't be a problem with this provision. Differences are formed according to the rule:

    Δⁿ_x,t = Δⁿ⁻¹_x,t+1 − Δⁿ⁻¹_x,t

where we take Δ⁰_x,t = x_t to start the process. This is in terms of the notation in 7.5(ii) below.

(ii) Interpreting the terms as in the previous section, we develop differences as follows:

The difference tableau is built on the values x_t−2, x_t−1, x_t, x_t+1, x_t+2 (and, for 7 values, x_t±3), with forward differences Δ¹ to Δ⁴ formed between them as above. In this notation the first central difference for x_t will be Δ¹C_x,t and the third central difference will be Δ³C_x,t. Here t is the parameterisation (initially set at chord-length but finally Bezier). These are calculated as:

    Δ¹C_x,t = (Δ¹_x,t−1 + Δ¹_x,t) / 2 and
    Δ³C_x,t = (Δ³_x,t−2 + Δ³_x,t−1) / 2

Also, the second difference is already central: the forward second difference Δ²_x,t−1 = x_t+1 − 2·x_t + x_t−1 is centred on t.

(iii) The points provided are not in themselves cubically related, nor even parameterised just like Bezier curve coordinates. However, they are approximations to both, and the discrepancies are treated as fitting discrepancies. The algorithm assumes there is a cubic there to be fitted. A cubic will always pass through any four points, but the attempt is made to make it 'close enough' to at least five points, so 5 or 7 points are used (5 is better in the sense that it is nearer to 4, but 7 is better in terms of forming a good starting guess). 'Close enough' is defined as within the error bounds of the original (filtered and interpolated) points. For a cubic the fourth differences will always be zero, so solutions constrained to a cubic will also always have zero fourth differences, but the given fourth differences here are highly unlikely to be zero at first.

If we take x_t = P_x,t + E_x,t (the Es are all signed fitting discrepancies), where P_x,t is the x-coordinate of a point which lies on the cubic Bezier curve and E_x,t is the x-distance by which x_t misses being on the cubic curve, we will only have approximations to the differences of the underlying cubic curve. These fitting discrepancies, or errors, however, propagate through the difference stack in a precise way and are fully exposed at the level of fourth differences. At this level:

    Δ⁴_x,t = P_x,t−2 − 4·P_x,t−1 + 6·P_x,t − 4·P_x,t+1 + P_x,t+2
             + E_x,t−2 − 4·E_x,t−1 + 6·E_x,t − 4·E_x,t+1 + E_x,t+2

and the P terms sum to 0 (fourth differences for a cubic are zero). So:

    Δ⁴_x,t = E_x,t−2 − 4·E_x,t−1 + 6·E_x,t − 4·E_x,t+1 + E_x,t+2

and we will see non-zero fourth differences at first.

[7.6] (i) x-coordinates and y-coordinates are processed distinctly, but progress through the tableaux must be synchronised by index/parameter.

(ii) Fourth differences contain a (weighted) sum of fitting errors only, so adding these for x and y gives an estimate for the closeness of fit of the point (i.e. small value implies small errors locally). So the summed fourth differences are placed in a single stack in sort order. Top of the stack always contains the next index to start, so is consulted for each curve search. Whenever a tableau index is visited the stack entry is deleted, so indexes need to point back to tableau entries. To avoid extensive pointer readjustment the stack should just grow and have entries marked as deleted rather than actually deleted (until stack is needed again for a new contour). Optionally 5 (or 7 depending on the original number of calculated values) successive fourth differences in each coordinate can be taken and added together and this is more exact, but here we are recommending using a single value as proxy for the sum of its neighbours, on the assumption that they are similar enough not to justify the additional discrimination. [7.7] (i) To arrive at an initial set of fitting errors we first need to form a corresponding set of cubically-related values, which means setting up a central difference stack and generating those values. So, to generate the 'best fit' cubic difference stack for the 5 (or 7) nearest points we need to recompute third, second and first differences so they determine 'best guess' values for the cubic.

Third differences: The notation assumes that t is the difference stack index and all formulae assume that x values are being calculated (formulae for y correspond, with y replacing x throughout). So, we first form the third difference Δ̄³_x,t as the average of the central third differences indexed. The overbar indicates a quantity which will nominally replace another in the difference stack, but in practice it will be usual to re-use the difference tableaux, so the tableaux themselves should be retained and a copy built up of the required entries. Averagings which include generalisations of this sum will always be part of a restart (when the continue condition has failed), so keeping the relevant sum will save unnecessary computation. This is the calculation for Δ̄³_x,t assuming 5 initial values (with 7, the 5 is replaced by 7 and the bounds are t ± 3):

    Δ̄³_x,t = mean of the Δ³C_x,i over the indexed range (i = t − 2 .. t + 2)

Second differences: Second differences, again for 5 initial values, will form (here) a linear sequence of the form:

    Δ̄²_x,t = mean(Δ²_x) + (t − t_central) · Δ̄³_x

where the mean is taken over the indexed second differences. Here t runs from 0 to 4, so 'central' t is 2 (or, for 7 initial values, t runs from 0 to 6 and 'central' t is 3).

First differences: These will be central differences calculated in stages as follows:

New x-values: The formation of differences up to the first central difference gives us a fountain-of-difference stack (or 'fountain') which can be used to generate cubic values for a given assumption about the cubic itself (which the stack incorporates). The central value (the 'fountainhead') is initially unaltered but is included for completeness. Application of Stirling's Central Difference formula can then give us, for x:

    P_x,z = x_t + z·Δ¹C_x,t + (z²/2)·Δ²_x,t + (z·(z²−1)/6)·Δ³C_x,t

Again, 'central' z is nominally 0. These new x-values (the P_x values) are then compared with (subtracted from) the given values to determine the E_x,t. This gives us the initial fitting discrepancies which make up Δ⁴_x,t as in note 7.5(iii). For 7 values, the values for t ± 3 can be found from the general Stirling formula, or by using its recurrences, as in notes 7.8(iii) & (iv).
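A worked sketch of this formula, checked against an exact cubic (all names are assumptions):

    def stirling_cubic(x0, d1c, d2, d3c, z):
        """Stirling's central difference formula truncated at third differences;
        exact for a cubic. z is the signed offset from the fountainhead index."""
        return x0 + z*d1c + (z*z/2.0)*d2 + (z*(z*z - 1)/6.0)*d3c

    # check against f(t) = t^3 sampled at t = 8..12, fountainhead at t = 10
    vals = [t**3 for t in range(8, 13)]
    d1 = [b - a for a, b in zip(vals, vals[1:])]
    d2 = [b - a for a, b in zip(d1, d1[1:])]
    d3 = [b - a for a, b in zip(d2, d2[1:])]
    x0, d1c, d2c, d3c = vals[2], (d1[1] + d1[2]) / 2, d2[1], (d3[0] + d3[1]) / 2
    print([stirling_cubic(x0, d1c, d2c, d3c, z) for z in (-2, -1, 0, 1, 2)])
    # [512.0, 729.0, 1000.0, 1331.0, 1728.0]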

(ii) There are now two choices on how to proceed. One is by finding the signed 'outlier' errors directly by using fourth differences and then determining the actual values by subtracting their fitting errors; the other is by calculating the values directly using Stirling's formula or its recurrences (notes 7.8(iii) & (iv)). The remainder of this section discusses the fourth difference method. We know:

    Δ⁴_x,t = E_x,t−2 − 4·E_x,t−1 + 6·E_x,t − 4·E_x,t+1 + E_x,t+2

Similarly for the fourth differences centred on neighbouring indices, so each new outlier error can be solved for from the four already-known errors and the measured fourth difference, and hence we can get P_x,t+3 as x_t+3 − E_x,t+3 if we want it. In fact the stop condition is expressed entirely in terms of errors, so usually we don't need P_x,t+3 or its adjacent outliers. This process can be carried out in both directions simultaneously, thus avoiding anything other than minimal use of the Stirling formula itself (to generate the initial values around the fountainhead and in restarts). We will see later (note 7.8(iii)) that the Stirling formula requires 11 + 1 arithmetic operations (the additional 1 being to extract the error from the original value), of which 5 are multiply-weight, while the fourth difference method requires 8 operations, of which just 1 is multiply-weight, and the error is determined directly. There are relative merits in each approach, which are discussed in note 7.8(iii) in connection with stopping conditions and restarts.

[7.8] (i) Restarts. The algorithm stops searching in any given direction if the errors E_x (or E_y; again formulae in x can be translated into formulae for y) exceed the original error bounds. Both x and y forms are tested simultaneously, and if even one fails that yields a stop in that direction. The algorithm only stalls when it is stopped in both directions. It is at that point that a restart is tried, as there are several possible reasons that the stall is not terminal. The first possibility is that the curve is not oriented correctly, so an attempt is made to move the curve around without changing its shape. If the same constant is added to all the x values then this has no effect on the first or higher differences, but the curve is raised or lowered (within error bounds). If a similar, unique constant is added to the first differences this has no effect on higher differences, which again retain their significance, but the curve will rotate around its 'fountain' point (again within error bounds). Since the algorithm starts by adjusting the higher differences it is appropriate to try restarts by alternately trying to re-orient the curve within its error strip and altering its shape, starting with reorientation.

(ii) Reorientation. (Again notation assumes the x-coordinate.) This depends on the signs of the Es, taking E_x,i as the failing error near i and E_x,j as the failing error near j. (The index t is for the fountainhead.) Describing this as (sgn(E_x,i), sgn(E_x,j)), there are four possibilities: (++), (+-), (-+) and (--). If we have (+-) or (-+) we have to 'lift' the values of the Δ¹C_x,t by a small constant big enough to establish the restart condition. This corresponds to rotating the whole curve in the appropriate direction around a central point. Alternatively if we have (++) or (--) then we need to 'lift' the values of the x_t by a similar small constant, again big enough to restore the normal restart condition. This corresponds to translating the whole x-t curve in the appropriate direction. However in this case we need to verify that the 'lift' doesn't cause the central E_x to exceed its error bound. Each of the four cases requires slightly different formulae to be applied to function values and first differences as appropriate, as follows. If first differences are incremented, the function values in contention need to be re-evaluated also, e.g. from each other. Table 14 below sets out these actions.

Table 14

Case  Effective condition  Action  M  D

Here x_upper and x_lower are the points with the smallest discrepancies from the upper and lower error bounds respectively for the input sample points of the same index. M is the minimum size of lift the restart requires to clear the stop condition and D is the largest value which can be allowed without causing an error bound breach elsewhere, so if the condition fails then the action cannot be applied. After carrying out a lowering or raising, the candidates for the upper and lower indices need to be revised, as set out in Table 15 below:

Table 15

The values themselves (x_upper and x_lower) need to be corrected by the correction amount before this revision is carried out. The initial values are found when calculating the x values, when the discrepancies from the original error bounds are determined (the smallest is retained), and this is updated whenever a new value is calculated.

For rotations the situation is more complicated. Points to the left of the centre of rotation will move in the opposite direction to points to the right of it. We therefore need to keep two pairs of limit values around the fountainhead (not the rotation centre, which varies), indexed by lowerL, lowerR, upperL and upperR respectively (Ls and Rs correspond to left and right in the sense of smaller than the fountainhead index (L) and bigger than the fountainhead index (R)). These have to be kept going at the same time as the common lower and upper indices, which now contain the indices for the smallest of either L and R pair. The maximum amount of rotation permissible is now given by D as set out in Table 16 below.

Table 16

    D = min(D_L, D_R)

The raising or lowering of x_t has similarly to be checked against its error bounds, but this should just be a precaution. There is a similarly differentiated set of rules for updating the four indices, as shown in Table 17 below.

Table 17

There is an 'applicable' test in the algorithm which is essentially the test given in the 'condition' column in the previous table. The actual modification of already-calculated values is not necessary if it is not required to look at them again, which is normally the case before the algorithm stalls for the last time (see note 7.12).

An alternative approach is to calculate all the difference corrections (7.8(ii),(iii), & (iv)) at the same time and not to alternate them, which simplifies both the logic and the terminating condition test. This means that new values need to be calculated after the initial difference correction, and these are used to determine the values in the foregoing tables in this note. New values for the same indices need to be calculated again at the restart, including sentinel values, so simpler logic (and lower cost) has to be balanced against the extra cost of the recalculations.

(iii) Fountain restart (shape changes). This is very similar to starting, but now the central fountain 'sprays' the vicinity of the stalls with solutions for the current cubic, using Stirling's formula. With the fourth difference method (see note 7.7(ii)) four revisions of existing solutions are needed at each stall site, so Stirling's formula has to be invoked 8 times. This is why it is more transparent to use Stirling's formula throughout, although the actual cost per coordinate is 12 operations (one to compute the error). Any review of earlier coordinate values will require subsequently accumulated lifts and rotations to be taken into consideration, unless that earlier value is at the head of the fountain (which is always kept up-to-date), and this needs to be taken into account when finishing up with the curve (see note 7.12). Because it is tedious to track this, any review of previous values should use new values from the fountain itself, as they are, indeed, up-to-date. As it happens this is only very rarely necessary. Fountain restarts go through the same stages as the first start-up and formation of a fountain, but the existing fountain is only modified to reflect a new, revised model for the cubic. So we first reconfigure the fountain difference stack at index t, then generate new values around the stalling indices, which we will take here as i (< t), the lower boundary index, and j (> t), the upper boundary index. First the stack reconstruction:

Third and second differences: Since, for a cubic, third differences should all be the same and second differences should all be linearly related, we really want to fit the line

    Δ²_x,t = p1·t + p2

to the second differences. Applying linear least squares we can derive (linear) formulae for p1 and p2 as follows:

    p1 = (n·Σ(t·Δ²_x,t) − Σt · ΣΔ²_x,t) / (n·Σt² − (Σt)²)
    p2 = (ΣΔ²_x,t − p1·Σt) / n

where the sums run over the n indexed entries. If partial results are kept for each sum, the costs of calculating these sums can be minimised.

First differences: All that is needed now to build the Stirling central difference stack is Δ¹C_x,t, whose calculation follows from the existing values and the fitted higher differences (the formulae need to be calculated in bottom-up order: Δ³C_x,t, then Δ²_x,t, then Δ¹C_x,t).

The next stage is to apply Stirling's Central Difference formula repeatedly, using the available differences. This is given as:

    x_z = x_t + z·Δ¹C_x,t + (z²/2)·Δ²_x,t + (z·(z²−1)/6)·Δ³C_x,t

When restarting using the fourth difference approach we need to (re-)calculate the last four valid x-values around x_i (z = i − t) and the four around x_j (z = j − t). This involves applying the Stirling formula four times from indices i to i+3 and another four times from j−3 to j. In a direct comparison with the Stirling calculation this means that a minimum run of 8 successive error calculations is needed on average to justify using the fourth difference approach. In this case we have also assumed the availability of recurrences for the Stirling formula, as will be discussed next.

(iv) Recurrences. Where successive values of x are generated on an index, this can be handled by setting up recurrence relations. Writing K_z and L_z for the running first and second differences, these are:

    K_z = Δ¹C_x,t + (z + ½)·Δ²_x,t + (z·(z+1)/2)·Δ³C_x,t
    L_z = Δ²_x,t + (z + 1)·Δ³C_x,t

To implement recurrences using the above equations, starting values for K and L have to be formed as they are required. So a recurrence is set up as follows:

    x_z = {Stirling formula}
    x_z+1 = x_z + K_z
    K_z+1 = K_z + L_z
    L_z+1 = L_z + Δ³C_x,t

and the last 3 steps are repeated for successive indices as many times as needed. This process needs to be started again whenever the Δs or xs change, i.e. in a restart.

A similar arrangement can be set up for the negative direction (the direction of reducing indices, or 'downwind'), with the updates applied in reverse:

    L_z−1 = L_z − Δ³C_x,t
    K_z−1 = K_z − L_z−1
    x_z−1 = x_z − K_z−1

A typical curve value now only requires 3 + 1 add-weight operations to produce a new value, but there is a start-up cost in calculating the initial values for K and L. This means that the first recurrence costs 11 + 1 again (and 5 multiply-weight operations), the second 5 + 1, with 1 multiply-weight operation, and then all the others are 3 + 1. Using recurrences is even more efficient than using fourth differences, but again less transparent, even, than fourth differences. Moreover the overhead of setting up recurrences is fully amortised against the straightforward Stirling calculation in just 3 iterations, and against the fourth difference method in 4 iterations.
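A small sketch of the positive-direction recurrence (names and the sample cubic are arbitrary); each new value costs three additions:

    def step_cubic(x, k, l, d3, n):
        """Yield n successive cubic values from x using running first (k) and
        second (l) differences and the constant third difference d3."""
        out = []
        for _ in range(n):
            x += k      # new value
            k += l      # update first difference
            l += d3     # update second difference
            out.append(x)
        return out

    # f(t) = t^3 stepped from t = 2: the differences there are 19, 18 and 6
    print(step_cubic(8, 19, 18, 6, 3))   # [27, 64, 125] = 3^3, 4^3, 5^3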

[7.9] Restart failure condition. The process of finding more candidate points can continue so long as either the 'applicable' condition in 7.8(ii) is true (suggesting that the curve may be manipulated to break the stopping conditions at either end) or at least one of the new points generated by the fountain extends the curve. If neither is true then the restart fails.

[7.10] Preparation for repeated runs of the fountains algorithm. Once the fountains algorithm has been used to generate a complete Bezier chain, the fourth difference sum stack will have been emptied. However, the lengths of the segments are now known, so the segments should now be sorted by length and a new stack built of the entries in length sort order. A re-run of the algorithm on the same data but with scaled (increased) error bounds will bypass the data preparation stage and start by attempting to extend the longest curve, then the next longest, etc. Other preparations for the 'sieve' which lie outside this algorithm (i.e. curvature maps) need to take place here.

[7.11] Reconciliation of ends (end-points). It is possible that the refinement process extends the curve until it overlaps with a previous solution, in which case the ideal would be if the curves actually cross naturally. That would be the end point for both curves. If they overlap without crossing and there has been no interior point clamping then there is scope for trying to choose a solution where the tangents are more-or-less parallel and corresponding points on the curves can be clamped together. Clamping will happen to one or other of the curves only, and then a check needs to be carried out to propagate the consequences back to interior points. Assuming a 'snap-together' tolerance value, this could be invoked here to define 'parallel'. This can be summarised as set out in Table 18 below:

Table 18

Intersection Angle Action

It is useful (in the 'sieve') to note corner points in curves as these tend to be preserved between frames. Straight lines (second and third differences all 0) and inflection points (second difference zero crossings) should be so marked to speed up reinterpretation as curve segments, and also for the 'petal' algorithm.

[7.12] Calculation of interior points. This is now straightforward, although care has to be taken over which end points and which interior points are used. The tableau contains cubic points with the correct difference values (the Δ³C all the same constant, second differences linear, first differences quadratic etc.), and is indexed uniformly, so the indexes are Bezier curve indices. However only the curve points at either end of the curve, together with the 'fountain' head point, are completely representative, because of reorientations; other points have not been updated, as their values haven't been needed. The end-points yield the control points P0 and P3, but the corresponding interior points must now be solved for, by substituting suitable index numbers into the x and y curve point parametric equations. If two such points are chosen, say the points x1 and x2 corresponding to parameters t1 and t2 (which are normalised indices, so column indices need to be normalised now), then the interior points P1, P2 can be determined as the unknowns in the simultaneous equations for the x-values involving the t-values:

    α·P1 + β·P2 = k1
    γ·P1 + δ·P2 = k2

Solving these gives us:

    P1 = (k1·δ − k2·β) / (α·δ − β·γ)
    P2 = (α·k2 − γ·k1) / (α·δ − β·γ)

Evaluating the determinant and multiplying everything out gives us the values for P1, P2. The quantities in the equations are:

    α = 3·t1·(1 − t1)²
    β = 3·t1²·(1 − t1)
    γ = 3·t2·(1 − t2)²
    δ = 3·t2²·(1 − t2)
    k1 = x1 − t1³·P3 − (1 − t1)³·P0
    k2 = x2 − t2³·P3 − (1 − t2)³·P0

Similar equations enable the recovery of the y-coordinates of the interior points. The only 'suitable' x and t values come from the fountainhead point, and another curve point now needs to be synthesised. The fountainhead is the only interior curve value to have fully corrected values at the end of the segment fitting process. Only one other interior point is needed, and a suitable such point could be found by splitting the index for the longer of the parameter sequences between the fountainhead and the ends, and then calculating the corresponding x value using the fountain. These two points may be reused in the next stage.
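A round-trip sketch of this solve for one coordinate (function names are assumptions):

    def interior_controls(p0, p3, t1, x1, t2, x2):
        """Recover the interior control points P1, P2 from two on-curve values."""
        a, b = 3*t1*(1 - t1)**2, 3*t1*t1*(1 - t1)
        c, d = 3*t2*(1 - t2)**2, 3*t2*t2*(1 - t2)
        k1 = x1 - (1 - t1)**3 * p0 - t1**3 * p3
        k2 = x2 - (1 - t2)**3 * p0 - t2**3 * p3
        det = a*d - b*c
        return (k1*d - k2*b) / det, (a*k2 - c*k1) / det

    def bez(p0, p1, p2, p3, t):
        return (1-t)**3*p0 + 3*t*(1-t)**2*p1 + 3*t*t*(1-t)*p2 + t**3*p3

    p0, p1, p2, p3 = 0.0, 1.0, 4.0, 2.0
    x1, x2 = bez(p0, p1, p2, p3, 1/3), bez(p0, p1, p2, p3, 2/3)
    print(interior_controls(p0, p3, 1/3, x1, 2/3, x2))  # ~(1.0, 4.0)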

Now, these four control points refer to the complete curve, not the curve with reconciled end-points, and indeed the interior points are needed to determine tangents (which may now be 'snapped together'). So we now need to isolate the segment which will actually form part of the chain, renormalise the parameters, and solve again for the same x values as before (the parameters will change under renormalisation and the end-points under reconciliation). The equations are the same as previously.

We now, finally, have the four control points for the relevant Bezier segment and this completes the process for the segment. Sometimes reconciliation will be left to the end but it can be carried out as soon as two segments overlap, so the second control point calculation can be done as soon as both end-points are reconciled.

Returning to the notes to the encoder pseudocode:

[8] The test render is to determine if an extra subdivision of the interval between contours (or contour and extremum) is needed. This is conducted just as in the decoder and will involve further tile subdivisions on the same conditions.

[9] Once the process begins, repeated failing test renders will not be expensive because distance maps will be available from previous attempts. One distance map will need to be 'cut' (i.e. not the entire solution area is needed), the other (half sized) synthesised. An extra 25% compute time is anticipated from this source.

[10] Even at this stage it is possible that unnecessary contours have been introduced because of the quasi-half-subdivision, for example in trying to find intermediate contours along the edge of letters or rendered vector artwork.

Generally, the performance without exploiting parallelism is largely determined by the cost of computing all the contours in a single level, needed only once, regardless of colour depth. Speed-ups of 80× compared to the blind encoder discussed previously are probable, equivalent to 14× frame rate for a 256 × 256 image on a 1 GHz machine, or, for a 2K image, 2× frame rate on a quad-core unaccelerated 2 GHz machine and 10× frame rate accelerated.

There may be further opportunities for compression. The main way in which compression might be achieved is by relaxing tolerances, e.g. by scaling error values. Scaling the pixel error value reduces the number of contours generated because the subdivision condition is invoked less often in the contour finding process. Scaling the spatial error in the contour path allows for fewer segments in Bezier chains. This scaling may be non-linear, e.g. it may be relaxed by local slope (scaled additionally by the inverse of the slope magnitude). It is known that the human visual system (HVS) is particularly insensitive to this type of error, and its consequence will be to reduce the segment count even further. Finally the critical angle for the 'snap-together' condition will have an effect on image appearance but not on Bezier segment numbers, because the generation algorithm takes maximal advantage of whatever freedoms it has to maximise smoothing (another feature the HVS is insensitive to). It is believed that image degradation will be 'graceful', unlike JPEG/MPEG where it is more like falling off a cliff at a certain level.

A further embodiment of the present invention provides a method of decoding video signals encoded by the above coding method. Table 19 sets out the steps involved in the decoding method (the pseudocode).

Table 19 - Pseudocode for adaptive VPI decoder

Notes to Table 19 (indicated by [x] in table).

[1] This stage is present to ensure that pixels in dense contour fields are correctly rendered. It also reconstructs edge information provisionally lost in the encoder. The level set formula for this stage is equivalent to the original level set formulation, in which ε is a small constant (which may decrease in size near an edge, as determined by the depth of local tile subdivision, and actually be zero at an edge, so ε = ε(ψ)), ∇·(∇ψ/|∇ψ|) is the curvature term, and s is the (starting) colour gradient at the contour border itself. Here we are first taking s = H, where

    H = ((C1 − C2) / (C1 − C3)) · |∇ψ_i−1|

is the synthetic colour gradient at the edges of level i with associated colour C2, when the scalar derivative |∇ψ_i−1| and the colour difference C1 − C3 are known from the previous level of recursion (level i−1). H is in essence the Catmull-Rom tangent assumption. The '+' subscript indicates the 'upwind' direction at the contour boundary, the '−' subscript the 'downwind' direction, and these now need to be included as appropriate. The complete formulation of the synthetic 'upwind' equation (that is, an equation which assumes a synthetic tangent at the middle contour) starts from the boundary at S (R is the other 'upwind' boundary), with |∇ψ_i−1| without sign being the scalar derivative in the interior of the contour footprint. The term for |∇ψ_i−1| may be biased Carrato-style when used for test renders to ensure edge preservation, using tile depth as a proxy for slope. There is a corresponding form of the 'downwind' equation, essentially with the external signs reversed and (C2 − C3) replacing the (C1 − C2) term, giving the quasi-Euclidean distance segment from the same contour (i.e. the other side of ψ). Similarly the quantities ε and H need to be modified in accordance with tile depth and similarly scaled to a limiting value at maximum permitted depth. For ε this is 0 (ε→0); for H it is a non-zero constant,

    H → (C1 − C2) / (C1 − C3)

As before, interpreting the equation involves 'dropping off' velocity at each sample point as it is passed. This time the velocity varies depending on local conditions rather than being a unit velocity (which would be obtained if and H =— ), as was the case with the non- adaptive decoder, so the distance travelled does not increase in unit (time-stamp sized) steps but at rates to match the colour gradients on either side of the contour.

In fact s, the colour gradient, can be determined directly during encoding. It will typically vary around the contour border, but if it lies outside the synthetic value H by more than the error bounds permit it should be used instead. These bounds are determined as the difference between the tangents defined on the local isosurfaces defined by the upper and lower bounds respectively. Options for carrying the variations include modelling the values as a spline curve, in terms of harmonic frequencies, or in terms of a scale factor for H modelled by either process, i.e. s = Δh·H, where we model or approximate Δh.

Only so many control points or frequency levels will be allowable for future bandwidth considerations, so the representation which most accurately tracks the variations of s should be chosen. The decisive criterion is file size, i.e. which choice(s) minimise(s) file size? This assumes that accurate modelling of colour gradients will reduce the number of contours an adaptive encoder will generate.

Using curvature scaled with a small constant for front-smoothing is well-known (e.g. J.A. Sethian "Level Set Methods and Fast Marching Methods" CUP Cambridge 1999). The full 'smoothing' form of the equation is a key part of the whole process.

In a development of this embodiment, alternative algorithms for edge- and crease-smoothed Euclidean distances may be used, which may be quicker than solving these Level Set equations directly using the well-known Fast-Marching Narrow-Band solution strategies as in Sethian (above). This formulation of the problem may suit acceleration particularly well. A simplified form of such a solver is sketched below.
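For orientation, the 'dropping off velocity' interpretation corresponds to computing arrival times for a front whose local speed is the colour-gradient-dependent velocity. The sketch below uses a Dijkstra-style shortest-path approximation on a 4-connected grid rather than the true upwind fast-marching stencil; it is illustrative only:

import heapq
import numpy as np

def arrival_times(speed, seeds):
    # 'speed' is a 2-D array of local propagation speeds; 'seeds' are
    # (i, j) contour samples at time zero. Dijkstra on the grid graph
    # approximates the narrow-band fast-marching solution.
    T = np.full(speed.shape, np.inf)
    heap = []
    for ij in seeds:
        T[ij] = 0.0
        heapq.heappush(heap, (0.0, ij))
    while heap:
        t, (i, j) = heapq.heappop(heap)
        if t > T[i, j]:
            continue  # stale heap entry
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < speed.shape[0] and 0 <= nj < speed.shape[1]:
                nt = t + 1.0 / max(speed[ni, nj], 1e-9)
                if nt < T[ni, nj]:
                    T[ni, nj] = nt
                    heapq.heappush(heap, (nt, (ni, nj)))
    return T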

Decoder performance will be significantly faster than the blind decoder discussed earlier because it is rendering from a typically 1/16 sample set, scaling at n x log(c). This translates to 20x prototype speeds, or 0.5 seconds/frame at 1 GHz at 256 x 256, which translates to 80 2K frames/sec (or 1-3x frame rate) on a 4-core 2 GHz machine with an accelerator card. These speeds are somewhat less certain, due to the uncertainties that the smoothing calculations bring, but smoothing could be switched off during test renders, although this might engender more contours than strictly needed in the encoder. This may in turn be partly recovered by switching smoothing back on for the test renders used to decide whether or not to throw out a contour at the end, when the cost of this kind of search is much lower. Further optimisation can be provided.

A third embodiment of the present invention provides a method of encoding video images which represents an enhancement of the method described in the first embodiment above by extending it to moving images. This encoder will be referred to as the "Motion VPI encoder". Table 20 shows an overview of the steps involved in the encoding method (the pseudocode).

Table 20 - Pseudocode for motion VPI encoder

Notes to Table 20 (indicated by [x] in table)

[1] The package in which a sequence of data structures (DSs) is sent needs to distinguish such groups for the decoder and these need to be presented first. It is assumed that the package will be broken up into packets as required by IP.

[2] The pixel array may be accompanied by a quantisation error array at the same resolution. The VPI encoder will accept one or two arrays as presented.

[3] The optimisations are optional. The VPI DS is a hierarchy enforced by the relation 'encloses' as mentioned in the descriptions of the VPI codecs above. In this version each contour is cached on an LRU (least recently used) basis and the caches are mirrored in encoder and decoder. A precise definition of 'encloses' in these terms is given in J. Patterson, C.D. Taylor, P.J. Willis "Constructing and Rendering Vectorised Photographic Images" Proc CVMP 09, also JVRB (to appear).

Any well-formed hierarchy conforms to the Hierarchical Display Model (HDM), and offers possibilities for space-reduction optimisations under the functions inheritance, grouping, constraining and composition, which have not been considered so far in VPI encoding. The example of a common sub-component (common subtree) was identified earlier in [1]. Now that the focus is on transmitting many similar images, the issue of optimisation even in terms of common groups (intra- or inter-frame) becomes important and is easily incorporated in the codec at this level. An API allowing user optimisations may be useful here. Useful intraframe optimisations could be incorporated into the single-image VPI codec. HDMs are often managed in a command-driven model (e.g. pdf) and the building of a new HDM could and should be based on interpreting a sequence of commands.

A typical single-frame DS now consists of a pair (common subcomponents, pruned HDM tree). Common subcomponents may be recognised by caching subtree roots and checking for correspondences, as in the sketch below. Specific options for partially or wholly matching subtree roots and subtrees are discussed in Note [2] for the Motion VPI decoder below.
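One plausible mechanisation of subtree-root caching (illustrative only; 'summary' is assumed to be a canonical, quantised description of a contour, such as level number, bounding box and petal signature, so that identical subtrees produce identical keys):

def subtree_key(node):
    # Structural key for a (pruned) HDM subtree: hash the node's own
    # summary together with its children's keys, so identical subtrees
    # collide and can be shared.
    return hash((node.summary, tuple(subtree_key(c) for c in node.children)))

def find_common_subtrees(roots, cache):
    # Cache subtree roots by key; a hit yields a candidate common
    # subcomponent, to be confirmed by direct contour comparison.
    matches = []
    for r in roots:
        k = subtree_key(r)
        if k in cache:
            matches.append((cache[k], r))
        else:
            cache[k] = r
    return matches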

[4] Whenever a common substructure is identified it needs to be promoted immediately and sent in the first fields of the next package to the receiver. Since this number is unknown a priori, package sizes will need to be adaptable.

[5] The 'package full' condition is similarly dependent on codec-independent criteria. One possibility is to package up a complete scene at a time, so that the package specifies its own inter-sequence transition, but that involves automatic detection of scene changes, beyond those clues provided by editing or redirection metadata. A metadata stream is therefore required in the codec for coder-decoder communication.

[6] Finding correspondences. Certain aspects of this algorithm, notably the concept of curvature scale space, the use of aspect ratio as a discriminator, and matching mirror images, have already been described in [Abas99]. However, these are included here to provide a complete overview of the method, including the novel parts which form part of aspects of the present invention: the fountain, the use of scaled error bounds, petal counts, partial-order sorts, discriminability tests, curvature averaging and scaling. The algorithm itself is also novel and forms part of aspects of the invention.

The following approach will find matching or partially matching contours quickly. This can be used intraframe, for common subcomponents, or interframe, between frames, to reduce the amount of data which needs to be transferred. These optimisations are carried out at the contour level. It is assumed that each contour acquires a principal-axis-aligned bounding box with the long axis appearing first. The pseudocode for this approach is set out in Table 21 below.

Table 21

7.7.4 if not loop termination then {

7.7.4.1: verify contour comparison by direct inspection [6.7]

7.7.4.2 act on novel contours [6.9]

7.7.4.3

7.7.5 }

8 optimise data structure

Explanatory notes to Table 21 (denoted by [6.x] in table)

[6.1] As for the 'fountains of difference' tables described in relation to Table 13 above.

[6.2] (i) The general condition for error bounds overlap is that if, at a given level of subdivision (of petal or segment), three adjacent curvature error bounds overlap each other individually, the first with the middle and the middle with the last, then the middle curvature can be dispensed with. This is repeated until there are no overlapping errors (see the sketch following this note). The calculation of curvature is discussed elsewhere. (ii) Petal maps are only invoked if there are corners in the petals. Petals are sorted on number of Bezier segments, and the error in doing so is determined by the longest of the shortest distances between corresponding corners, here defined as corners of the same sign but nearest in the sort order. This distance is thus the error bound for the sorting of segment counts, and trees, as in part (iii), may be constructed of valid orders of the indices, taking possible swaps due to overlapping bounds into account. Because the error value is not available before a comparison, this calculation has to be done at the moment of comparison and again, if the degree of discrimination is insufficient, the comparator has to be withdrawn. This is highly unlikely in the presence of corners.

(iii) The curvature index sequence would also be organised as a tree, whose root is an anonymous start point and whose immediate descendants are the legal largest-curvature candidates taking error bounds into account, followed by the next largest, depending on the choice of largest, etc. This can be determined from an initial sort ignoring error bounds; partial orders then take error bounds into consideration. Two trees would correspond if each tree had a common path from root to leaf, and discrimination on the basis of petal maps would fail if all possible alternatives were present in the tree, i.e. as many descendants of the start point as indices. If the contour is to be scaled globally the curvatures only need to be corrected for the final comparison, as sort orders are conserved when all the entries are scaled equally. This is not the case for non-global scaling.

Curvatures are correlated initially as one per chain segment, then one per petal by chord-length integration. Each curvature is deemed to apply, as the same value within errors, between chord-length determined half-way boundaries between measurement points. The average curvature is thus the chord-length weighted mean:

κ̄ = (Σ_j κ_j Δs_j) / (Σ_j Δs_j)

where Δs_j is the chord length of the interval over which κ_j applies.
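A minimal sketch of the middle-curvature elimination of [6.2](i) and the chord-length weighted average above (the data layout, each sample as a (curvature, error-bound) pair, is an assumption):

def thin_curvatures(samples):
    # Drop the middle of any three adjacent curvature samples whose
    # error bounds overlap pairwise (first with middle, middle with
    # last); repeat until no such triple remains. Mutates 'samples'.
    def overlap(a, b):
        return abs(a[0] - b[0]) <= a[1] + b[1]
    changed = True
    while changed:
        changed = False
        for i in range(1, len(samples) - 1):
            if overlap(samples[i - 1], samples[i]) and overlap(samples[i], samples[i + 1]):
                del samples[i]
                changed = True
                break
    return samples

def average_curvature(kappas, chord_lengths):
    # Chord-length weighted mean: each curvature applies over the
    # interval between half-way boundaries, so the chord lengths
    # are the weights.
    return sum(k * s for k, s in zip(kappas, chord_lengths)) / sum(chord_lengths)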

[6.3] Initially levels of relaxation are counted from the 'bottom' (the image resolution level) of the hierarchy, which propagates the number of iterations to the top. This then becomes part of the geometry data. The levels are now counted in inverse order from 1 to the iteration number.

[6.4] Scaled and unscaled curvatures (or parameters) all need to be retained until the end. This applies generally to affine transformations: they are only applied at the point of comparison and then the data reverts. These tests involve multiple actions.

[6.5] If several parts of a contour partially match the comparison contour they are retained as a subtree of partial matches, with affine transformations indicated for each (these may be different, but flipping (mirror image) must be consistent). The subtree shows how to rebuild the contour, i.e. non-matching portions are indicated and retained in order. Splits, joins and in-betweens will all be indicated like this. Obviously matched in-betweens have a high confidence level as matches, subject to a (later) egomotion test.

[6.6] What this means is that the current affine transformation is a refinement of the previous one within error bounds.

[6.7] This is the stage at which partial or total matches have to fit within their original error bounds. On-curve control points and tangent directions are taken for this.

[6.8] The formula

should be used as a safer alternative here. The xs and ys are taken from different tables but apply to proximate points [6.1 above]. Petal maps determine the number of petals, (+,−) curvature pairs, which will start with zero (positive curvature only) then 1, 2, 3, etc., with alternating positive '+' and negative '−' curvatures. Inside each petal will be a curvature magnitude, transformed by outstanding affine transformations. Petals must match each other in order around the underlying 'circle', e.g. the nth largest curvatures are in the same order around both petal maps, for all n, starting with 0. For confirmation, n = 0 corresponds to a contour with positive curvature only (+), n = 1 to a contour labelled (+−), n = 2 to a contour labelled (+−+−), etc. That is, for n > 0, the contour is labelled (+−)^n, meaning n repetitions of '+−'. In addition to marking curvatures, straight-line segments (which will appear as separate entities) will be marked with the same curvature as the curves they join onto, i.e. (+ straightline +) is (+++) or simply (+), (+ straightline −) is (++−−) or (+−), (− straightline +) is (−−++) or (−+), and (− straightline −) is (−−−) or (−). Corners are just extreme versions of + or − depending on which is the interior angle. Runs of +s and −s are concatenated to a single + or −, and the overall curvature and size relative to the bounding box recorded. Petal fine structure involves showing straight lines and corners in context, so a corner embedded in a positive-curvature half-petal will have a preceding Bezier segment count, a marker indicating the corner and a following segment count. A sketch of the run-collapsing step follows.
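The run-concatenation rule can be sketched as follows (illustrative only; straight segments are denoted '0' here and inherit the sign of the curve they join onto, which collapses to the same label as the pairings listed above):

def petal_signature(signs):
    # Collapse a per-segment curvature-sign sequence into a petal
    # label, e.g. '+-+-' for n = 2.
    resolved = []
    for i, s in enumerate(signs):
        if s == '0':
            # Straight line: take the preceding sign, or the next
            # non-straight sign at the start of the sequence.
            s = resolved[-1] if resolved else next((t for t in signs[i:] if t != '0'), '+')
        resolved.append(s)
    # Concatenate runs of +s and -s to a single + or -.
    label = []
    for s in resolved:
        if not label or label[-1] != s:
            label.append(s)
    return ''.join(label)

For example, petal_signature(['+', '0', '-', '-', '+']) yields '+-+'.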

[6.9] The 'send' function communicates with the context in which the matcher is called. If it is for intraframe operations then only the caches are involved (though this requires encoder-decoder communication to keep the caches identical). If for interframe operations, it determines what is sent for that contour: a stub ('you have it already'), an edit ('edit your version with this') or a full contour. Cache entries are initially optimised for space usage, e.g. the modification of a contour is retained as a back reference to the original plus the modification only. If the original contour is to be purged then it is reconstructed using the edit and retained in that form in place of the edit. Other references to the same contour need to be similarly updated. The cache operates on a least-recently-used double-ended queue mechanism, which is most simply implemented by placing the most recently used entry at the head of the queue and consolidating. This is inefficient (but mirrored in command semantics), so a mirror queue with pointers should be used and reordered instead. Entries are in the form of equally sized linked-list elements, so a second list of empty cells is maintained and used for data inserts; the main structure (with all the data in it) then only participates in taking and replacing entries on the empty-cell list. If the sieve is being used to match in-betweens or textures then a mirror cache is used and sends are to the mirror cache. This is then traversed at the end of a frame matching and outstanding data sent to the decoder. A skeleton of this cache follows.
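A skeleton of such a cache, using Python's OrderedDict in place of the mirror queue with pointers (apply_edit is an assumed helper for expanding an edit against its base contour; the free-list of equally sized cells is not modelled):

from collections import OrderedDict

def apply_edit(base, edit):
    # Assumed helper: reconstruct a full contour from a base contour
    # plus an edit; the edit format is codec-specific and not shown.
    raise NotImplementedError

class ContourCache:
    # LRU contour cache, mirrored in encoder and decoder. An entry is
    # either a full contour or a (base_id, edit) pair holding only a
    # back reference plus the modification.
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def touch(self, cid):
        self.entries.move_to_end(cid)   # most recently used at the tail

    def insert(self, cid, entry):
        self.entries[cid] = entry
        self.touch(cid)
        while len(self.entries) > self.capacity:
            victim_id, victim = self.entries.popitem(last=False)
            # Expand any edits that referenced the purged original,
            # so back references never dangle.
            for other_id, other in list(self.entries.items()):
                if isinstance(other, tuple) and other[0] == victim_id:
                    self.entries[other_id] = apply_edit(victim, other[1])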

[6.10] The decision tree data structure is used for accessing cache entries. They are classified first. All petal structure and bounding box data comes from the analysis for Bezier chains.

A fourth embodiment of the present invention provides a method of decoding images encoded according to the method of the third embodiment above. Table 22 shows an overview of the steps involved in the decoding method (the pseudocode).

Table 22 - Pseudocode for motion VPI decoder

Notes to Table 22 (indicated by [x] in table)

[1] Composition operations include union (A over B), intersect, set difference and x-or (see Sethian as referenced above). Contours from groups are cut and re-joined as would be expected. There are no soft edges or compensation for point-spread, hence 'hard' operations.

If an API is provided at the encoder for assisting optimisations, the functionality provided in the decoder clearly needs to be able to cope with the consequences, e.g. allowing some contours to be introduced first. This could be intrinsic or command-driven.

[2] Options for common subcomponents are set out in Table 23 below.

Table 23

Root relationship   Affine relationship         Additional data
identical           no                          no
identical           (scale/rotate/translate)*   no
partial             no                          non-corresponding contour segment
partial             (scale/rotate/translate)*   non-corresponding contour segment
pre-loaded          no                          no
pre-loaded          (scale/rotate/translate)*   no
pre-loaded          (scale/rotate/translate)*   non-corresponding contour segment

Commands specifying these options will carry the relevant qualifiers and parameters. Other possibilities no doubt also exist.

[3] Rendering is carried out at input frame rate or multiples of frame rate; otherwise a different decoder structure is needed, with independently timed expansion and decoding/rendering stages. No advantage is gained in 'pure' video VPI from exceeding input frame rate unless interframe motion vectors are supported.

Codec Instruction Sets

VPI Decoder

As stated above, the preference in the above embodiments is to use SVG or SVG-compatible formats. Most parts of the VPI image data structure can be configured to an SVG-friendly form quite readily, with the exception of the intercontour fill regime, for which a new instruction set will be needed and offered up to W3C for incorporation in the standard. The term for 'instruction' used in SVG terminology is 'command', which we will now use in the rest of this document. As of September 2011 we understand that the SVG subcommittee is looking for suggestions to improve the SVG standard and we will offer our forms for inclusion in the standard. The command sets identified here are as close to the matching SVG command sets as possible, some the same, some 'mirrors' with supersets of existing command semantics. As all the VPI decoders obey the same command set (VPI being a proper subset of motion VPI), this section extends simply to the more advanced decoders, and in fact all the decoders can have essentially the same structures.

By comparison with the compact forms desired for the VPI internal format, SVG is extremely verbose and completely dependent on text forms. Compact structures making efficient use of fields and encodings are unreadable, but can be read if decoded into a textual form, so long as there is a 1:1 relationship between the compact and textual forms. There is an obvious direct (engineered) relationship between internal forms and SVG for the most part, the exception being the inter-contour fill command, which has no counterpart in SVG. For the unexceptional parts the relationships are SVG → XML ↔ VPIDS, where VPIDS is the internal data structure. Of these XML is the most verbose, as it specifies everything SVG treats declaratively, in essence allowing alternative ways of achieving the necessary semantics. Software to translate between SVG and XML and back exists; in the case of XML to SVG the acceptable XML forms are restricted.

The basic command set comprises those commands required to define the path of a contour. The contour specifies a fixed colour but is of infinitesimal size. Contours are defined as in SVG in terms of paths, with path attributes as given in the previous section. These attributes are bound together with the current path definition using context braces (locality is used in the VPI DS). These paths are drawn using the commands which specify a drawing process. All points are nominally in absolute (output) coordinates. Drawing is defined in terms of the current path, which has two pointer values associated with it: the current path (a pointer to the start-point of the curve) and the current point (which points to the last point drawn). To start a path a moveto command ('m' in textual VPIDS) is issued in the form m <coordinate>. This sets the current path and current point pointers and closes off any outstanding alternative paths hitherto treated as current. Drawing proceeds with the issuing of lineto (l <last coordinate>) or curveto (c <coordinate 1> <coordinate 2> <last coordinate>) commands. In each case the current point takes on the value of the last coordinate, so contours are drawn using straight-line segments or cubic Bezier segments joined together in a chain. The start point of each line or curve segment is always the starting value of current point, which is always reset as a result of performing a drawing command. For Bezier segments, coordinates 1 and 2 are the interior control points whose values will determine continuity at the end point. No explicit provision is made for supporting any particular degree of continuity (this is managed inside VPI). A closepath (z, no parameter) command completes the path with a straight line to current path, and current point is reset to current path as a result. If a moveto is issued without a previous closepath this implies a sub-path is being started. Here this is interpreted to mean a separate closed path which is to be treated like the path currently being drawn (same attributes, chirality etc.). The sub-paths of a contour will always lie inside the contour itself. Contours at the same level are not normally sub-paths of one another and the concept is only invoked for defining diffusion. A minimal interpreter for this command set is sketched below.
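The following sketch interprets the m/l/c/z subset, matching the pointer semantics just described (illustrative only; coordinates are plain tuples):

def run_path_commands(cmds):
    # currentpath points at the start of the path, currentpoint at the
    # last point drawn, exactly as in the prose above. Returns the
    # segments drawn.
    segments, currentpath, currentpoint = [], None, None
    for op, *args in cmds:
        if op == 'm':                 # moveto: set both pointers
            currentpath = currentpoint = args[0]
        elif op == 'l':               # lineto
            segments.append(('line', currentpoint, args[0]))
            currentpoint = args[0]
        elif op == 'c':               # curveto: two interior control points
            c1, c2, end = args
            segments.append(('cubic', currentpoint, c1, c2, end))
            currentpoint = end
        elif op == 'z':               # closepath: straight line back to start
            if currentpoint != currentpath:
                segments.append(('line', currentpoint, currentpath))
            currentpoint = currentpath
    return segments

# Example: a unit square.
# run_path_commands([('m', (0, 0)), ('l', (1, 0)), ('l', (1, 1)), ('l', (0, 1)), ('z',)])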

Coordinates

Coordinates are of two kinds, absolute and relative. A relative coordinate is the difference between the absolute value of the current coordinate and the absolute value of the previous coordinate issued. These are kept in two values, last x and last y. Coordinates are thus specified as absolute or relative (a, r). Relative coordinates may be signed but absolute coordinates are always positive. x0, y0 = (0, 0) is always the top left-hand corner of the output region. The signing of relative coordinates is a path attribute, and signs may often be avoided if the top left-hand corner of the axis-aligned bounding box is set by a suitable moveto command (successive movetos may be issued). The first bit in a relative address field is the sign bit. The quantisation of relative addresses is also a path attribute, whose default is either 256 or the last quantisation set. Commands may be packed (densely aligned) or unpacked (byte aligned). In memory, commands will be aligned as is most advantageous to performance, e.g. with relative addresses converted to absolute, but in intermediate files compactness will be the over-riding consideration. A sketch of the relative-address packing follows.
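The field layout below, sign bit first followed by the quantised magnitude, follows the description above; the derived field widths are otherwise assumptions:

def encode_relative(dx, quant=256):
    # Pack one signed relative offset: sign bit first, then the
    # magnitude quantised to 'quant' levels (the path-attribute
    # default), e.g. quant = 256 gives 8 magnitude bits.
    bits = quant.bit_length() - 1
    sign = 1 if dx < 0 else 0
    mag = min(abs(int(dx)), quant - 1)
    return (sign << bits) | mag

def decode_relative(field, quant=256):
    bits = quant.bit_length() - 1
    mag = field & (quant - 1)
    return -mag if (field >> bits) & 1 else mag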

Commands

A summary of the standard SVG commands and their meanings as described in [Duce&00] is given in Table 24 below.

Table 24

Command   Meaning                   Parameters (1, 2 etc.)
m         move to coordinate        currentpath = currentpoint = parameter1.
          without drawing           If preceded by drawing on currentpath
                                    (currentpath ≠ currentpoint) then closes the
                                    path and starts a subpath (same as path
                                    with subpath attribute).
c         draw cubic Bezier         uses (currentpoint, parameter1, parameter2,
          segment                   parameter3) as endpoint, interior point,
                                    interior point, endpoint respectively.
                                    currentpoint = parameter3.
l         draw straight line        line drawn between currentpoint and
                                    parameter1. currentpoint = parameter1.
z         close path                if currentpoint ≠ currentpath then issue
                                    l (currentpoint, currentpath);
                                    currentpoint = currentpath.

Accommodating double diffusion, as in the VPI codecs, is not straightforward, as SVG has no concept of persistence in regard to paths outside explicit naming, and will only normally consider paths one at a time. Here we need to identify an outer contour and one or more inner contours for each diffusion operation. So one approach is to define the contours from the top of the hierarchy downwards and name them as they are defined. The names in effect define the nodes of a tree, i.e. the implementation of the relation enclosed by. SVG does allow grouping, and grouping is the support for naming. Accordingly the tree may be defined in terms of the children of each parent, and each node identified by level_number+parent_name+child_name, where '+' implies concatenation and '_' inclusion. <level_number − 1>+parent_name identifies the enclosing contour and level_number+parent_name the group name for the contours enclosed by it. This gives a unique rule for identifying individual contours within a single image, which may also be used in video formats (a sketch follows). However, by itself, this is not enough, as the fill regime will need to be aware of all of the contours bounding the region to be filled.
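One reading of the naming rule, with the separators as assumptions (the text specifies '+' for concatenation and '_' for inclusion; the exact rendering of these in a name string is not fixed):

def contour_name(level_number, parent_name, child_name):
    # Unique node identifier: level_number + parent_name + child_name.
    return f"{level_number}_{parent_name}_{child_name}"

def enclosing_and_group(level_number, parent_name):
    # <level_number - 1> + parent_name identifies the enclosing
    # contour; level_number + parent_name names the group it encloses.
    return f"{level_number - 1}_{parent_name}", f"{level_number}_{parent_name}"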

SVG fills are specified by associating a fill attribute for the fill regime. The normal regime is flat colour, but other regimes, imported from a paint server, may be used. We will need a new attribute for fill, namely one which introduces a diffusion regime. Like conventional painting, the basic semantics call for a value to be associated with each pixel, but we need to provide distances (which may be distinguished as 'Euclidean' or 'Manhattan' by an additional parameter) from outer contour to inner and vice versa on the postfix encounters of a right-root-left (or left-root-right, which gives more obvious reverse Polish) traverse. In each case diffusion of the specified form in the specified direction, as defined by the relations encloses (single diffusion) and enclosed by (together forming double diffusion), is carried out and the results placed in a two-field record associated with each pixel sample. This includes diffusion from sub-paths in the enclosing contour but not in the enclosed contour. By issuing fill commands, following path drawing, in the reverse Polish order defined by the tree hierarchy traverse, the entire area defined by the contour will be filled with the required type and numbers of distances. If the contour colours are also propagated with the distances, a simple pixel-by-pixel operation will now determine sample point colours (depending on whether the single or double attribute is selected each time) and the image may be rendered in stages. However the order in which the contour draw commands are issued will also need to be the same as the order in which fills are issued, namely reverse Polish (postfix) order. This is because inner contours will identify themselves here as being filled already, as a filled sample point will stop the fill locally. The traversal order is sketched below.
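The traversal order can be sketched as follows (illustrative only; children are assumed to be stored left-to-right, and 'emit' stands in for issuing the draw and fill commands for one contour):

def postfix_fill_order(node, emit):
    # Right-root-left traversal, emitting each contour on its postfix
    # encounter so inner contours are drawn and filled before the
    # regions that enclose them (inside-out rendering).
    for child in reversed(node.children):   # right subtree first
        postfix_fill_order(child, emit)
    emit('draw', node)
    emit('fill', node)   # an already-filled sample stops the outer fill locally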

Additional attributes which may be associated with individual contours or groups include smoothing on or off (smoothing-on will carry a small numerical parameter) and edge-matching, which can be off, calculated, natural or differenced. Each of these will be associated with a single group, which may include other subgroups, but will typically apply to one tree address only. In the case of the natural attribute, a list of unit normal vectors, which will be a set of samples uniform in distance around the contour, will follow; in the case of a differenced attribute, another attribute, frequency or curve, followed by a closed cubic curve or a list of frequencies and phase angles modelling normal variation. This particular set of associations will require the ability to qualify the behaviour of the regime, because these effects are localised as opposed to applying to the entire tree traverse. Locality will be managed by explicit grouping or by making use of the existing grouping patterns. While all of these last parameters can improve the render quality, they can be omitted (quite good results have been obtained even then). Reselection of the diffusion attribute to set these parameters can be done individually, and the relevant paint server will assume the previous parameter settings as defaults.

Since tree structures and tree traversals are not explicit components of SVG, despite its hierarchical nature, the contour drawing aspects of the traversal are laid out explicitly, with the contour acquiring its name independently of its position in the XML/SVG code. Subsequent references to the contour are thus only to the contour (group) name and made in the usual way. Thus the order in which contour drawing is issued is the postfix order of encounter in a right-root-left tree traversal of the original data structure, i.e. from the inside out, while the original tree order is from the outside in.

SVG command set extensions

Even with the most compact of representations this approach is quite convoluted, but it does require an absolutely minimal change to SVG, in essence no more than that required for any other novel paint regime. However, as the tree is descended contour names will become longer and longer, again growing as log(n), although this could probably be mitigated using the same trick as with coordinate addresses. Fortunately these names can be avoided, but at the cost of reintroducing subpaths explicitly. These in turn are not common, but usually arise when the prevailing direction of levels changes.

What is really needed is a considerably more efficient solution which could, if necessary, be pre-processed to generate the foregoing outcome. This involves introducing new commands which mirror the existing ones while building and managing the tree data structures behind the scenes. This can be done reasonably straightforwardly, as well as making explicit contour names unnecessary. Additionally, the previous regime requires that the entire tree be reordered to reverse Polish form, which will also turn out to be unnecessary. A different kind of traverse (subtree traverse, or traverse by tree levels) will be needed to capitalise on this fully. The necessary preprocess to generate 'minimal' SVG can still restore the postfix order, although it would have to simulate the tree-build itself, at least after a fashion. This solution now follows.

Our list will consist of commands which introduce new semantics. As they often need to be merged with the commands that usually follow or precede them, because they need access to the same parameters, we will introduce them in their merged forms. These are root (r), new record (s), add new entry (a), leaf (x, y), fill (f), level number (h) and terminate (t). They are summarised in Table 25 below:

Table 25

Command Meaning Parameters (1 ,2 etc) r (root) move to open queue of queues (QoQ)structure, initialising 'previous' coordinate and 'current' entries in QoQ structure.

without drawing,

initialise queue of load frame contour with start point parameter 1 , level number queues and : parameter 2 size parameter 3 and parameter 4 as a single- insert frame i entry queue onto QoQ structure, then as for moveto m contour <parameter1 > s start new queue load new queue record onto QoQ for contour which follows, record 'current' now points to this, and 'previous' to its predecessor, then as for mo veto m <parameter> Set previous. PtQ and ; current. PtQ to point at start of their queues. PtQ is 'pointer to ; queue entry'

a add new entry to as for z then

current record

add to queue last entered in QoQ with linkage to contour which follows, then as for mo veto m <parameter>

f fill defined region as for z then

mark contour with f (last of inner contour list)

fill in space between selected contours as defined in QoQ. 'previous' entry now pointed to is outer contour (encloses), 'current' entries (contours enclosed by) between fill markers, specifically the one just set and the last one set before it. If 'previous' contour in queue as pointed to by previous. PtQ has a y marker, fill as specified for y then delete and move previous. PtQ pointer to next entry. Repeat as above if this entry also has a y marker.

If previous. PtQ now does not point beyond the end of the queue, process this next contour (now pointed to by previous. PtQ) as Outer' and all oldest entries before and including next f contour as 'inner' contours (the list starts with entry pointed to by current. PtQ). Delete contour entry in 'previous' queue (outer contour, pointed to by previous. PtQ), move previous. PtQ on from last entry pointed to and move current. PtQ to point to entry after f contour in 'current' list just processed.

If the 'previous' queue is now empty, (previous. PtQ points i beyond end of queue) delete queue record from QoQ and reset 'previous' from 'current'.

X leaf (start) as for s then

mark entry in next contour starting at <parameter> as being sole fill region (nothing enclosed in it) y leaf (loose end) as for z then

mark entry in next contour starting at <parameter> as being sole fill region (nothing enclosed in it)

h set level number put level number into contour record which is the currentpaih, then as for z

if no parameter the level number is the same as the last setting. If h is omitted ahead of z or a z equivalent, then again the level number for the current contour will be as last set. t terminate as for f, then verify terminal state (no 'current' entry), then fill entries in 'previous' as specified for y (they will all contain y markers) then delete. Delete QoQ and tidy up.

So Balch's example would translate (aka SVG syntactic sugaring) to:

r (x0, y0) l0 n1 n2 s (x1, y1) c (.., ..) (.., ..) (.., ..) ... c (.., ..) (.., ..) (.., ..) h y (x2, y2) c (...) f s (x3, y3) c (...) h l2 (x4, y4) c (...) f x (x5, y5) c (...) t

Here the (x, y) coordinate pairs form a single parameter and the l's are level numbers (or colour values or other colour identification). This is itself just a syntactically sugared version of the data structure tree described above, in which each tree level is specified and the filling process is an outside-in process rather than the inside-out process given previously.

These new commands build a queue-of-queues (QoQ) data structure behind the scenes. Initialisation is carried out by 'r'. Each entry in the QoQ will itself be a double-ended queue of entries identifying and managing the identification of matching outer and inner contours. Typically there are one or two entries in the QoQ, and when there are two they are marked 'previous' (the first entry) and 'current' (the last entry). The command 's' starts the process of building a new 'current' entry and 'a' adds to the current entry. The last entry which references a given previous entry is marked by an 'f' if a double diffusion is required, and any entry which requires only single diffusion (the top of a hill or the bottom of a dip) is marked with a 'y'. The actual diffusions are managed by an 'f' command, which also manages the QoQ and its entries so that 'leaf' entries on the tree are filled (single diffusion) and single regions between contour levels are filled (double diffusion). A skeleton of this structure follows.
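The command semantics reduce to a small amount of bookkeeping, sketched below (illustrative only; fill_region is an assumed helper standing in for the diffusion fill, and the x/y leaf-marker handling is omitted):

from collections import deque

def fill_region(outer, inners):
    # Assumed helper: perform the (double) diffusion between the
    # outer contour and the contours it encloses. Not shown.
    pass

class QoQ:
    # Queue-of-queues skeleton behind the r/s/a/f commands. Each inner
    # deque holds the contours of one tree level: 'previous' is the
    # front queue (outer contours), 'current' the back queue (the
    # contours they enclose).
    def __init__(self):
        self.queues = deque()

    def root(self, frame_contour):        # 'r': initialise with frame contour
        self.queues.append(deque([frame_contour]))

    def start_record(self, contour):      # 's': new 'current' queue record
        self.queues.append(deque([contour]))

    def add_entry(self, contour):         # 'a': extend the 'current' record
        self.queues[-1].append(contour)

    def fill(self):                       # 'f': diffuse between levels
        previous, current = self.queues[0], self.queues[-1]
        outer = previous.popleft()        # outer contour ('encloses')
        fill_region(outer, list(current)) # entries 'enclosed by' it
        if not previous:                  # 'previous' exhausted:
            self.queues.popleft()         # 'current' becomes 'previous'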

This form of representation, unlike the previous one, does not require the use of sub-paths to identify holes in the outer contour footprint. If these appear first in a current queue then the translation to the sub-path form is straightforward, should it be required. This also simplifies level specifications. The root command specifies the borders of the image as a rectangle, and optionally the SVG version might use this, but all contours will be clipped to the border and closed anyway. If it were desired that the image extend beyond its borders, e.g. as in a panorama, then this extension might be useful. In any case, translation to a fully syntactically sugared SVG is straightforward if the new commands are allowed, and still straightforward if only conventional fills (as overloaded by the requirements of diffusion via a new paint specification) are used, following a simulation of the actions above which builds the full tree, followed by a traverse aimed at extracting the reverse Polish order of contour specifications needed for inner-outer rendering.

As a direct representation of the outside-in version which is originally built, this last form, which uses both prefix and postfix encounters in its implicit traverse, has the potential to form the most compact image representation possible. Common subtrees can be named and referenced, but where these are not used contour names are not required (this may also be the case with the inside-out version) and are implicit. This basic command set can be used to structure a VPI decoder to be more like a motion-VPI decoder, which will execute supersets of the command set given here, but these have still to be defined.

Coding

The foregoing codec algorithms would normally be reduced to practice using ULAs (uncommitted logic arrays) or FPGAs (field-programmable gate arrays) driven off ROM (read-only memory) or PROM (programmable read-only memory). Alternatively the codecs could be implemented in software, again using ROMs or PROMs to drive accelerators.

The methods of the above embodiments may be implemented in a computer system (in particular in computer hardware or in computer software) in addition to any structural components and user interactions described. The term "computer system" includes the hardware, software and data storage devices for embodying a system or carrying out a method according to the above-described embodiments. For example, a computer system may comprise a central processing unit (CPU), input means, output means and data storage. Preferably the computer system has a monitor to provide a visual output display. The data storage may comprise RAM, disk drives or other computer readable media. The computer system may include a plurality of computing devices connected by a network and able to communicate with each other over that network.

The methods of the above embodiments may be provided as computer programs or as computer program products or computer readable media carrying a computer program which is arranged, when run on a computer, to perform the method(s) described above.

The term "computer readable media" includes, without limitation, any medium or media which can be read and accessed directly by a computer or computer system. The media can include, but are not limited to, magnetic storage media such as floppy discs, hard disc storage media and magnetic tape; optical storage media such as optical discs or CD-ROMs; electrical storage media such as memory, including RAM, ROM and flash memory; and hybrids and combinations of the above such as magnetic/optical storage media.

While the invention has been described in conjunction with the exemplary embodiments described above, many equivalent modifications and variations will be apparent to those skilled in the art when given this disclosure. Accordingly, the exemplary embodiments of the invention set forth above are considered to be illustrative and not limiting. Various changes to the described embodiments may be made without departing from the spirit and scope of the invention.

REFERENCES

[Abas99] Abassi, Mokhtarian & Kittler (1999) "Curvature scale space image in shape similarity retrieval" Multimedia Systems 7: 467-476

[Adob85] Adobe Systems Inc. "Postscript Language Tutorial and Cookbook" Addison-Wesley, Reading, Mass. 1985

[Agu&87] Agui, T. et al. "An Automatic Interpolation Method of Gray-Valued Images Utilising Density Contour Lines" Proc. EUROGRAPHICS '87 (August 1987) pp 405-421

[ANSI83] ANSI "Graphical Kernel System" American National Standards Institute X3H3-25r3, Computer Graphics Special Issue, February 1984

[ANSI84] ANSI "Programmer's Hierarchical Interactive Graphics Standard (PHIGS)" American National Standards Institute X3H3/84-40, February 1984

[Arms83] Armstrong, M.A. "Basic Topology" Springer Science, New York 1983

[Bart&87] Bartels, R. et al. "An Introduction to Splines for Use in Computer Graphics and Geometric Modelling" Morgan Kauffman, Los Altos, Calif. 1987

[Blin05] Blinn, J. "What is a Pixel?" IEEE CG&A (September/October 2005) pp 82-87 (Jim Blinn's Corner)

[Carr&96] Carrato, S. et al. "A Simple Edge-Sensitive Image Interpolation Filter" Proc. Int. Conf. Image Processing, 1996 pp 311-314

[Catm&80] Catmull, E. & Smith, A.R. "3D Transformations of Images in Scanline Order" Computer Graphics (Proc. SIGGRAPH '80) ACM, July 1980 pp 279-285

[Crim&03] Criminisi, A. et al. "Object Removal by Exemplar-Based In-Painting" Proc. CVPR '03, June 2003

[Cock93] Cockton, G. "Architecture and Abstraction in Interactive Systems" PhD Thesis, Heriot-Watt University 1993

[Cuis&99] Cuisenaire, O. & Macq, B. "Fast and Exact Signed Euclidean Distance Transformation with Linear Complexity" IEEE Proc. Acoustics, Speech, and Signal Processing '99, Vol 6, 15 March 1999 ISBN 0-7803-5041-3

[Duce&00] Duce, D. et al. "SVG Tutorial" W3C 2000 & 2002

[EMVA10] EMVA (European Machine Vision Association) "Standard for Characterization of Image Sensors and Cameras, Release 3.0" (Standard 1288) November 29, 2010

[Fabb&08] Fabbri, R. et al. "2D Euclidean Distance Transform Algorithms: A Comparative Survey" ACM Computing Surveys 40 (1) February 2008 pp 2:1-2:44

[Fari96] Farin, G. "Curves and Surfaces for Computer-Aided Geometric Design (4th edition)" Academic Press 1996

[Ferr&00] Ferraiolo, J. et al. "Scalable Vector Graphics (SVG) 1.0 Specification" World Wide Web Consortium, 2000

[Finc73] Finch, C. "The Art of Walt Disney" Harry N. Abrams, Inc., New York 1973 p 180

[Glass89] Glassner, A. "An Introduction to Ray Tracing" Morgan Kaufmann, New York 1989

[Itoh&93] Itoh, K. & Ohno, Y. "A curve fitting algorithm for character fonts" Electronic Publishing 6 (3) September 1993 pp 195-205

[Kim&07] Kim, T-H. et al. "Image Dequantisation: Restoration of Quantized Colours" Proc. EUROGRAPHICS 2007, Computer Graphics Forum 26 (3) 2007 pp 619-626

[Kuni&97] Kunii, T.L. & Maeda, T. "On the Silhouette Cartoon Animation" Proceedings of Computer Animation '96 (June 3-4, 1996, Geneva, Switzerland), Thalmann, N.M. & Thalmann, D. (eds.), pp 110-117, IEEE Computer Society Press, Los Alamitos, California, 1996

[Kuni98] Kunii, T.L. "Technological Impact of Modern Abstract Mathematics" Proceedings of Third Asian Technology Conference in Mathematics (August), ed. Yang, W-C. et al., Springer 1998

[Leco&06] Lecot, G. & Levy, B. "ARDECO: Automatic Region Detection and Conversion" Proc. Eurographics Symposium on Rendering 2006 pp 1604-1616

[Lind98] Lindeberg, T. "Edge Detection and Ridge Detection with Automatic Scale Selection" Int. J. Comp. Vis. 30 (2) pp 117-154

[Lore&87] Lorensen, W.E. & Cline, H.E. "Marching Cubes: a High Resolution 3D Surface Reconstruction Algorithm" Computer Graphics (Proc. SIGGRAPH '87) 21 (4) 1987 pp 163-169

[Math75] Matheron, G. "Random Sets and Integral Geometry" Wiley, New York 1975

[Meij&00] Meijster, A., Roerdink, J. & Hesselink, W. "A general algorithm for computing distance transforms in linear time" Proceedings of the 5th International Conference on Mathematical Morphology and its Applications to Image and Signal Processing, 2000

[Naka&83] Nakajima, M. et al. "Coding Method of Gray-Valued Image by Density Contour Lines" Tech. Rep. I.E.C.E. Japan IE83-74 (1983) pp 1-6

[Nebe&05] Nebel, J.C., Cockshott, W.P., Yarmolenko, V., Borland, E. & MacVicar, D. "Pre-commercial 3-D digital TV studio" IEE Proceedings - Vision, Image, and Signal Processing 2005

[Oddy&90] Oddy, R. & Willis, P. "A Physically Based Colour Model" Computer Graphics Forum 10 (2) June 1991 pp 121-127

[Orza&07] Orzan, A. et al. "Structure Preserving Manipulation of Photographs" Proc. NPAR 2007 pp 103-110

[Orza&08] Orzan, A. et al. "Diffusion Curves: A Vector Representation for Smooth-Shaded Images" Proc. SIGGRAPH 2008

[Patt&92] Patterson, J. & Cockton, G. "Compositing Hierarchically Structured Images" Computer Graphics Forum 11 (3) September 1992

[Patt&06] Patterson, J. & Willis, P.J. "Method for Image Processing and Vectorisation" Patent Application 0613199.9, May 2006, University of Glasgow

[Port&84] Porter, T. & Duff, T. "Compositing Digital Images" Proc. SIGGRAPH '84, July 1984

[Pric&06] Price, B. & Barrett, W. "Object-Based Vectorisation for Interactive Image Editing" Vis. Comp. 22 (9-11) 2006 pp 661-670

[Reev81] Reeves, W.T. "In-betweening for Computer Animation Utilising Moving Point Constraints" Computer Graphics (Proc. SIGGRAPH '81) 15 (3) pp 269-276

[Schn90] Schneider, P. "An Algorithm for Automatically Fitting Digitised Curves" in Graphics Gems, Glassner, A. (ed.), Academic Press, San Diego & London 1990

[Serr82] Serra, J. "Image Analysis and Mathematical Morphology" Academic Press, London 1982

[Seth99] Sethian, J.A. "Level Set Methods and Fast Marching Methods" Second Edition, CUP, Cambridge 1999

[Shan85] Shantzis, M. "A Model for Efficient and Flexible Image Computing" Proc. SIGGRAPH '85, August 1985

[Smit95] Smith, A.R. "A Pixel is Not a Little Square, A Pixel is Not a Little Square, A Pixel is Not a Little Square! (And a Voxel is Not a Little Cube)" Technical Memo 6, Microsoft Research, 1995

[Sun&06] Sun, J. et al. "Image Vectorisation using Optimized Gradient Meshes" ACM TOG 26 (3) 2007 pp 11-1-11-7

[Tayl09] Taylor, C.D. Private communication

[Van&01] Vansichem, G., Wauters, E. & Van Reeth, F. "Real-Time Modelled Drawing and Manipulation of Stylised Characters in a Cartoon Animation Context" Proc. IASTED International Conference on Computer Graphics and Imaging (CGIM 2001), Honolulu, Hawaii, USA, 2001 pp 44-49

[Vinc93] Vincent, L. "Morphological Grayscale Reconstruction in Image Analysis: Applications and Efficient Algorithms" IEEE Trans. IP 2 (2) April 1993 pp 176-201

[War&67] Warntz, W. & Woldenberg, M. "Concepts and Applications - Spatial Order" Harvard Papers in Theoretical Geography No 1, May 1967

[Weis06] Weiss, B. "Fast Median and Bilateral Filtering" Proc. SIGGRAPH 2006

[Yu90] Yu, J-H. "Polar coordinates linear interpolation (PCLI)" Dept of Computing Science, University of Glasgow, 1990

[Yu&01] Yu, X. et al. "Image Reconstruction Using Data-Dependent Triangulation" IEEE CG&A 21 (2001) pp 62-68

[Yuks&10] Yuksel, C. et al. "On the parameterisation of Catmull-Rom Curves" Proc. SIGGRAPH (?) 2010

[Zhan&04] Zhang, L., Snavely, N., Curless, B. & Seitz, S.M. "Spacetime faces: high resolution capture for modeling and animation" Proc. SIGGRAPH '04

All references referred to above are hereby incorporated by reference.