

Title:
3D MODEL CAPTURE SYSTEM
Document Type and Number:
WIPO Patent Application WO/2020/240210
Kind Code:
A1
Abstract:
The present invention relates to capturing data for modelling objects in three dimensions. More particularly, the present invention relates to a method and apparatus for capturing images and dimensions of objects in order to create three-dimensional models of these objects. The present invention seeks to provide a three-dimensional (3D) model of an object generated from multiple photos of that object.

Inventors:
EASTHAM ROBERT (GB)
Application Number:
PCT/GB2020/051318
Publication Date:
December 03, 2020
Filing Date:
June 01, 2020
Assignee:
REALITY ZERO ONE LTD (GB)
International Classes:
G06T7/11; G06T7/194; G06T7/564
Foreign References:
US20190088004A12019-03-21
Other References:
YADOLLAHPOUR PAYMAN ET AL: "Discriminative Re-ranking of Diverse Segmentations", IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION. PROCEEDINGS, IEEE COMPUTER SOCIETY, US, 23 June 2013 (2013-06-23), pages 1923 - 1930, XP032492964, ISSN: 1063-6919, [retrieved on 20131002], DOI: 10.1109/CVPR.2013.251
WOLFF L B: "Using polarization to separate reflection components", COMPUTER VISION AND PATTERN RECOGNITION, 1989. PROCEEDINGS CVPR '89., IEEE COMPUTER SOCIETY CONFERENCE ON SAN DIEGO, CA, USA 4-8 JUNE 1989, WASHINGTON, DC, USA,IEEE COMPUT. SOC. PR, US, 4 June 1989 (1989-06-04), pages 363 - 369, XP010016473, ISBN: 978-0-8186-1952-6, DOI: 10.1109/CVPR.1989.37873
Attorney, Agent or Firm:
BARNES, Philip Michael et al. (GB)
Claims:
CLAIMS:

1. A method of capturing data for modelling an object in three dimensions, comprising:

receiving a plurality of images of the object, wherein the plurality of images of the object comprises a plurality of images of the object each having a different perspective view of the object;

generating a plurality of masks for each of the plurality of images of the object;

determining a final mask for each of the plurality of images, wherein the final mask is determined from one or more of the plurality of masks generated for each of the plurality of images of the object; and

outputting a plurality of final masks, comprising a final mask for each of the plurality of images of the object.

2. The method of any preceding claim, further comprising determining a confidence score for each of the generated plurality of masks for each of the plurality of images of the object.

3. The method of any preceding claim, wherein determining a final mask for each of the plurality of images further comprises selecting a primary mask from the plurality of masks.

4. The method of claim 3 when dependent on claim 2, wherein selecting the primary mask comprises selecting the mask having the highest confidence score of the plurality of masks.

5. The method of any preceding claim, wherein each of the plurality of images is segmented into masked and unmasked segments using one or more of the plurality of masks for each of the plurality of images.

6. The method of claim 5 when dependent on claims 3 or 4, further comprising the step of determining the largest unmasked segment of each of the plurality of images when masked with the primary mask and segmenting all smaller unmasked segments of each of the plurality of images when masked with the primary mask as masked segments.

7. The method of claim 6, further comprising determining one or more overlapping portions of (a) any masked segments in any of the images when masked with any of the other masks other than the primary mask and (b) the determined largest unmasked segment of the images when masked with the primary mask; and segmenting the overlapping portions of each largest unmasked segment as masked.

8. The method of any preceding claim, further comprising applying each final mask to each of the respective plurality of images of the object.

9. The method of any preceding claim, further comprising outputting each of the plurality of images of the object, wherein the final mask for each of the plurality of images of the object has been applied to each of the plurality of images of the object.

10. The method of any preceding claim wherein the final mask of the plurality of masks comprises a combination of two or more of the plurality of masks.

11. The method of any preceding claim, wherein the plurality of masks comprises any or any combination of: a difference mask; a grab cut mask; an edge detection mask; a mask determined by iterative edge detection; a mask determined by Sobel edge detection; a mask determined by Canny edge detection; a mask to remove holes detected in the object; a mask determined by boundary aware salient object detection (BASNet); a mask determined by the U²-Net technique; a mask to remove hands and/or fingers; a mask determined using the Hand-CNN technique; a mask determined by comparing polarised and non-polarised images; a mask trained using machine learning techniques to detect certain types of objects in a scene based on previous known good masks of a similar object/subject; a mask to remove pixels of a certain colour, pattern or range of colours and/or patterns; a mask to detect one or more elements of a scanning apparatus.

12. The method of any preceding claim wherein determining the final mask for each of the plurality of images comprises using any or any combination of: a decision tree; a learned approach; a machine learned approach; a weighted approach; a weighted average ensemble approach; a weighted average ensemble prediction approach; an approach trained on a plurality of images from earlier scans.

13. The method of any preceding claim, wherein the plurality of images comprises polarised and unpolarised images; and further comprising a step of determining surface properties of the object using the determined differences between the polarised and unpolarised images, optionally using a learned model to detect materials and/or surface properties of the object, optionally the surface properties comprising specularity.

14. The method of any preceding claim, wherein the plurality of images is obtained from and/or using any or any combination of: one or more image sensors; one or more cameras; one or more video cameras; one or more stereo cameras; one or more 3D object capture apparatus; a rotating platform on which to place the object; one or more background assemblies having known colours and/or patterns; substantially optimised lighting; one or more polarisers; a user holding the object; a head mounted display system; an augmented reality headset; one or more unmanned aerial vehicles provided with image sensors.

15. A 3D object capture apparatus for capturing data for modelling objects in three dimensions, comprising:

a rotatable platform;

a lighting assembly;

one or more digital cameras operable to generate one or more digital images; and

a processor in electronic communication with the rotatable platform and the one or more digital cameras; wherein the processor is operable to:

instruct one or more rotational movements of the rotatable platform; instruct the one or more digital cameras to generate one or more digital images for each of the one or more rotational movements; receive one or more digital images generated by the one or more digital cameras; and transmit the one or more digital images to a server.

16. The apparatus of claim 15, further comprising a backdrop.

17. The apparatus of claim 16, wherein the backdrop comprises a curved lamina surface, optionally wherein the curved lamina surface substantially forms an “L” shape in cross-section.

18. The apparatus of any one of claims 16 or 17, wherein the backdrop is removable and/or replaceable.

19. The apparatus of any one of claims 16 to 18, wherein the backdrop is provided in a first configuration and a second configuration, optionally wherein the first configuration is narrower than the second configuration.

20. The apparatus of any of claims 15 to 19, wherein the one or more digital images comprise one or more sets of raw image data and/or any other suitable image data format.

21. The apparatus of any of claims 15 to 20, wherein the lighting assembly is integrated into the apparatus.

22. The apparatus of any of claims 15 to 21, wherein the lighting assembly comprises one or more light emitting diodes (LEDs), optionally wherein the one or more LEDs are provided in the form of one or more lighting panels, optionally wherein the lighting panels comprise non-LED flash panels or units.

23. The apparatus of any of claims 15 to 22, wherein the lighting assembly further comprises a polarising light filter positioned between the lighting assembly and the rotatable platform.

24. The apparatus of claim 23, wherein the one or more digital cameras comprise a polarising camera filter positioned between the digital camera and the rotatable platform.

25. The apparatus of claim 24, wherein the polarising camera filter and the polarising light filter are orientated orthogonally with respect to each other.

26. The apparatus of claim 25, wherein the processor is further operable to:

orientate the polarising light filter and/or the polarising camera filter according to the one or more digital images received.

27. The apparatus of any preceding claim, wherein the one or more rotational movements are made with reference to a predefined angle.

28. The apparatus of any preceding claim, wherein the server is remote from the processor.

29. The apparatus of claim 28, wherein the server comprises a cloud storage database.

30. The apparatus of any preceding claim, wherein the server is operable to process the one or more digital images into a three-dimensional (3D) model.

31. The apparatus of claim 30, wherein the processor is further operable to:

modify one or more of the digital images such that computational expense is reduced when the server processes the one or more digital images into a 3D model.

32. The apparatus of any preceding claim, wherein the one or more digital images are in the form of digital video data.

33. The apparatus of claim 32, wherein the processor is further operable to:

extract one or more digital images from the digital video data.

34. The apparatus of any preceding claim, wherein the rotatable platform is integrated into the apparatus.

35. The apparatus of any preceding claim, wherein the rotatable platform comprises one or more object holding posts.

36. The apparatus of claim 35, wherein the one or more object holding posts comprise one or more magnets.

37. The apparatus of any preceding claim, wherein the processor is in electronic communication with the lighting assembly, and further wherein the processor is operable to instruct the lighting assembly to illuminate the rotatable platform for each of the one or more rotational movements.

38. A rigid frame comprising the apparatus of any preceding claim.

39. The rigid frame of claim 38, comprising one or more retractable and/or removeable elements.

40. A capture apparatus for generating a digital replica of a painting, comprising:

a frame;

a moveable mount, wherein the moveable mount is provided on the frame; one or more digital cameras operable to generate one or more digital images; and

a processor in electronic communication with the moveable mount and the one or more digital cameras; wherein the processor is operable to:

instruct one or more movements of the moveable mount;

instruct the one or more digital cameras to capture one or more digital images for each of the one or more movements;

receive one or more digital images generated by the one or more digital cameras; and

transmit the one or more digital images to a server.

41. The apparatus according to any of claims 15 to 40 operable to perform the method of any of claims 1 to 14.

Description:
3D MODEL CAPTURE SYSTEM

Field

The present invention relates to capturing data for modelling objects in three dimensions. More particularly, the present invention relates to a method and apparatus for capturing images and dimensions of objects in order to create three-dimensional models of these objects.

Background

Photogrammetry is a technique used to obtain information about physical objects and the environment from photographs, for example by making measurements.

Photogrammetry can be used, for example, with a single photograph to determine the distance between two points on a plane that has been captured by the photograph (where the plane is parallel to the image plane of a photograph and if the scale of the image is known).

Photogrammetry can also be applied to multiple photographs, for example, to determine the positions of surface points on objects where there are multiple photographs of an object. Specifically, a technique called stereophotogrammetry involves estimating the three-dimensional co-ordinates of points on an object using measurements made in a plurality of photographs using common points identified on each of the photographs.

Summary of Invention

The present invention seeks to provide a three-dimensional (3D) model of an object generated from multiple photos of that object. Modelling an object in three dimensions can also be referred to as representing an object in three dimensions, and further it is understood in the art that a “three-dimensional model” can be used interchangeably with the term “three-dimensional representation”.

According to a first aspect, there is provided a method of capturing data for modelling an object in three dimensions, comprising: receiving a plurality of images of the object, wherein the plurality of images of the object comprises a plurality of images of the object each having a different perspective view of the object; generating a plurality of masks for each of the plurality of images of the object; determining a final mask for each of the plurality of images, wherein the final mask is determined from one or more of the plurality of masks generated for each of the plurality of images of the object; and outputting a plurality of final masks, comprising a final mask for each of the plurality of images of the object.

Applying multiple masks to each image of an object and determining a final mask from one or more of those masks can allow a substantially optimal final mask to be determined for each image of the object: for example, a single mask produced by one masking technique that provides a substantially accurate mask of the object in the image, or a combination of multiple masks that together provide a substantially accurate mask of the object in the image. The final masks determined for each of the plurality of images of the object provide image data for creating a three-dimensional computer model of the object.

Optionally, the method further comprises determining a confidence score for each of the generated plurality of masks for each of the plurality of images of the object.

Determining a confidence score for each mask can allow for more accurate or more efficient selection of one or more of the masks for each image as the final mask for that image, by using masks with a high, or relatively high, confidence score, or with confidence scores above a pre-determined or dynamic threshold value.

Optionally, determining a final mask for each of the plurality of images further comprises selecting a primary mask from the plurality of masks. Optionally, selecting the primary mask comprises selecting the mask having the highest confidence score of the plurality of masks.

Selecting a primary mask, for example by using the confidence values determined for the multiple masks generated for each image of the object, can allow the final mask to be determined primarily based on the determined primary mask.

Optionally, each of the plurality of images is segmented into masked and unmasked segments using one or more of the plurality of masks for each of the plurality of images. Optionally, the method further comprises the step of determining the largest unmasked segment of each of the plurality of images when masked with the primary mask and segmenting all smaller unmasked segments of each of the plurality of images when masked with the primary mask as masked segments. Optionally, the method further comprises determining one or more overlapping portions of (a) any masked segments in any of the images when masked with any of the other masks other than the primary mask and (b) the determined largest unmasked segment of the images when masked with the primary mask; and segmenting the overlapping portions of each largest unmasked segment as masked.

Determining the largest contiguous portion of unmasked pixels in an image using the primary mask can identify the object being imaged and isolating just this largest contiguous portion of pixels by masking any other unmasked portions of the image when the primary mask is applied can remove portions of the image that are unlikely to be the object being scanned for three-dimensional modelling. Determining the overlap between the largest contiguous portion of unmasked pixels in the primary mask and the masked portions of the other masks generated for the image can identify sections of the mask that are potentially not the object, for example holes through an object, and masking these overlapping portions in the largest contiguous portion of the primary mask allows for only the highest confidence portion of the image to remain as representative of the object being scanned and thus can provide only the highest confidence image data for generating the three-dimensional model of the object that has been imaged.
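
By way of a non-limiting illustration only (the description does not prescribe a particular implementation), the largest contiguous unmasked region of a primary mask can be isolated with a connected-component analysis. The sketch below assumes binary masks in which 255 marks unmasked (object) pixels and 0 marks masked pixels, and uses OpenCV, which the text does not mandate.

import cv2
import numpy as np

def largest_unmasked_region(primary_mask):
    # primary_mask: uint8 array, 255 = unmasked (object), 0 = masked (background)
    num_labels, labels, stats, _ = cv2.connectedComponentsWithStats(
        (primary_mask > 0).astype(np.uint8))
    if num_labels <= 1:
        return np.zeros_like(primary_mask)   # nothing unmasked at all
    # Label 0 is the background; keep only the largest remaining component.
    largest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))
    return np.where(labels == largest, 255, 0).astype(np.uint8)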

Optionally, the method further comprises applying each final mask to each of the respective plurality of images of the object. Optionally, the method further comprises outputting each of the plurality of images of the object, wherein the final mask for each of the plurality of images of the object has been applied to each of the plurality of images of the object.

By applying the final mask to the image, only the image data pertaining to the object can be retained for use in generating the three-dimensional model of the object.

Optionally, the final mask of the plurality of masks comprises a combination of two or more of the plurality of masks. Optionally, the plurality of masks comprises any or any combination of: a difference mask; a grab cut mask; an edge detection mask; a mask determined by iterative edge detection; a mask determined by Sobel edge detection; a mask determined by Canny edge detection; a mask to remove holes detected in the object; a mask determined by boundary aware salient object detection (BASNet); a mask determined by the U²-Net technique; a mask to remove hands and/or fingers; a mask determined using the Hand-CNN technique; a mask determined by comparing polarised and non-polarised images; a mask to remove pixels of a certain colour, pattern or range of colours and/or patterns. Optionally, determining the final mask for each of the plurality of images comprises using any or any combination of: a decision tree; a learned approach; a machine learned approach; a weighted approach; a weighted average ensemble approach; a weighted average ensemble prediction approach; an approach trained on a plurality of images from earlier scans. The final mask can be one or more of a plurality of masks, each mask optionally determined using one or a combination of techniques, and the final mask optionally determined using one or a combination of techniques/approaches.

Optionally, the plurality of images comprises polarised and unpolarised images; and further comprises a step of determining surface properties of the object using the determined differences between the polarised and unpolarised images. Optionally, the surface properties are determined using a machine learned approach.

Using a combination of polarised and unpolarised images from the same viewpoint of the object being imaged allows a comparison of the polarised and unpolarised data to determine properties of the object that can be integrated into the three-dimensional model of that object.

Optionally, the plurality of images is obtained from and/or using any or any combination of: one or more image sensors; one or more cameras; one or more video cameras; one or more stereo cameras; one or more 3D object capture apparatus; a rotating platform on which to place the object; one or more background assemblies having known colours and/or patterns; substantially optimised lighting; one or more polarisers; a user holding the object; a head mounted display system; an augmented reality headset; one or more unmanned aerial vehicles provided with image sensors.

The images of the object can be acquired using any or a combination of apparatus and/or techniques which can allow flexibility in how the image data is captured of the object to be modelled.

According to a further aspect of the present invention, there is provided an apparatus for capturing data for modelling objects in three dimensions, comprising: a rotatable platform; a lighting assembly; one or more digital cameras operable to generate one or more digital images; and a processor in electronic communication with the rotatable platform and the one or more digital cameras; wherein the processor is operable to: instruct one or more rotational movements of the rotatable platform; instruct the one or more digital cameras to generate one or more digital images for each of the one or more rotational movements; receive one or more digital images generated by the one or more digital cameras; and transmit the one or more digital images to a server.

Photogrammetry enables the recreation of a 3D model from one or more digital images. The quality of the 3D model created is conventionally heavily reliant on the quality of the digital images provided as an input to create the digital model. Aspects can provide a means for capturing digital images of a sufficient quality to produce a 3D model which is satisfactory for an end user. By synchronising one or more digital cameras, a rotatable platform, and/or a lighting assembly, consistent photos of a subject may be taken at a range of different angles. This may make the process of generating a 3D model significantly less computationally expensive and more accurate than when using images taken in a range of lighting conditions and at varying distances. Further, computational expense may be reduced through the use of automatic masking of the one or more digital images. Since the masked area may be discarded during a feature matching and/or depth map reconstruction procedure, the speed of processing may be increased.

Optionally, the apparatus for capturing data further comprises a backdrop.

In some embodiments, the use of a backdrop can increase the accuracy of a representation formed from one or more digital images of a subject by reducing the difficulty of removing a background from the representation. However, in some embodiments a backdrop may not be required, owing to more efficient and/or advanced masking algorithms operable to remove backgrounds from images.

Optionally, the backdrop comprises a curved lamina surface. Optionally, the curved lamina surface substantially forms an “L” shape in cross-section, optionally wherein the backdrop is removable and/or replaceable. Optionally, the backdrop is provided in a first configuration and a second configuration, optionally wherein the first configuration is narrower than the second configuration.

The use of a backdrop formed from a single lamina sheet, for example a sheet of white paper curved into an approximation of an “L” shape, may be referred to as an “Infinity screen backdrop”. This can provide a neutral single-colour background (or a pattern, for example a checkerboard with grey and black squares) to the images of the subject such that they can be digitally cropped for further processing (and the patterned background can allow for scaling of the object and detection of holes in objects). In some embodiments the backdrop comprises a single paper and/or fabric sheet, and so may be easily and economically replaceable. It is appreciated that multiple backdrop sizes may be used in order to accommodate different sized subjects to be scanned.

Optionally, the one or more digital images comprise one or more sets of raw image data.

Image data may be stored as a “raw” image file, which is a term in the art used to describe an image file which has undergone minimal processing and hence comprises a greater quantity of data in relation to the image when compared to a “developed” digital image which has already had certain changes made. Therefore, there is greater scope to amend the settings and data in relation to the raw image file, as no data has yet been changed or removed. Optionally, image data can be stored as any or any combination of jpeg, png, tiff, exr or any other image data file type.

Optionally, the lighting assembly is integrated into the apparatus. Optionally, the lighting assembly comprises one or more light emitting diodes (LEDs), optionally wherein the one or more LEDs are provided in the form of one or more lighting panels.

LEDs are conventionally much more efficient at producing light for a specific power usage compared to many other forms of light generation. A light panel may be used comprising multiple, optionally hundreds, of individual LEDs, which provide a naturally diffuse light to illuminate the scan subject. Using an integrated lighting assembly allows for greater control over the illumination of a scan subject. The light may be adjusted to allow for ambient conditions and/or a user preference, and the orientation between the lighting assembly and the one or more digital cameras may remain constant to produce a more accurate 3D representation. Alternatively, a ring flash or flash unit can be used (optionally a ring flash or flash unit with LED or non-LED bulbs).

Optionally, the lighting assembly further comprises a polarising light filter positioned between the lighting assembly and the rotatable platform. Optionally, one or more of the digital cameras comprise a polarising camera filter positioned between the digital camera and the rotatable platform. Optionally, the polarising camera filter and the polarising light filter are orientated orthogonally with respect to each other. Optionally, the processor is further operable to orientate the polarising light filter and/or the polarising camera filter according to the one or more digital images received.

When the polarising camera filter and the polarising light filter are orientated orthogonally with respect to each other, the resultant image captured is referred to as “cross-polarised”. Cross polarisation is conventionally used to filter specular reflectance from an image. Reflections captured on a digital image can lower the quality of a 3D model. Therefore, following a reduction in reflectance from a subject being scanned, only a diffuse colour remains, which provides a higher quality starting point for generating a 3D model. In some embodiments, both polarised and unpolarised digital images may be used to produce more accurate representations of a subject.
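
As a rough illustration of this principle only (the text does not prescribe a particular computation), the specular contribution can be approximated by subtracting a cross-polarised capture from an unpolarised capture of the same viewpoint; the filenames below are placeholders.

import cv2

# Aligned captures of the same viewpoint (placeholder filenames).
unpolarised = cv2.imread("view_012_unpolarised.png")
cross_polarised = cv2.imread("view_012_cross_polarised.png")

# The unpolarised image contains diffuse + specular light; the cross-polarised
# image suppresses most specular reflection, so the difference approximates a
# per-pixel specular map.
specular = cv2.subtract(unpolarised, cross_polarised)   # saturating subtraction
specular_map = cv2.cvtColor(specular, cv2.COLOR_BGR2GRAY)
cv2.imwrite("view_012_specular_map.png", specular_map)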

The orientation of the polarising camera filter and the polarising light filter may be adjusted through the processor, or manually by a user. The polarising camera filter may be built into a digital camera itself.

Optionally, the one or more rotational movements are made with reference to a predefined angle.

Depending on the number of digital images desired, which may be driven by the eventual quality required of the 3D model, the one or more rotational movements may be set at a range of different angles.

Optionally, the server is remote from the processor. Optionally, the server comprises a cloud storage database. Optionally, the server is operable to process the one or more digital images into a three-dimensional (3D) model.

A remote server may have greater computational power than the local processor, and so provide a faster and/or more accurate means for generating the 3D model.

Optionally, the processor is further operable to modify one or more of the digital images such that computational expense is reduced when the server processes the one or more digital images into a 3D model.

By pre-processing one or more digital images within the local processor, for example by performing less computationally expensive processes involving the removal of background imagery, less work is left for the server where the 3D model is generated. Therefore, the 3D model may be generated faster and/or with greater accuracy.

Optionally, the one or more digital images are in the form of digital video data. Optionally, the processor is further operable to extract one or more digital images from the digital video data.

Digital video data can be a cost-effective means of generating a large number of digital images, and so it may be advantageous to use digital video data as an input to create a 3D model.

Optionally, the rotatable platform is integrated into the apparatus. Optionally, the rotatable platform comprises one or more object holding posts. Optionally, the one or more object holding posts comprise one or more magnets. Optionally, the rotatable platform comprises scales. Integrated scales in the platform can allow the apparatus to measure the weight of the object being scanned.

An integrated rotatable platform can more accurately control the subject to be scanned in relation to the one or more digital cameras and lighting assembly used. The subject may require supporting, for example if it cannot assume a required posture unaided. A series of posts, optionally in the same colour and/or pattern and/or material as the backdrop, can provide a required support system while remaining computationally inexpensive to remove from the eventually generated digital images. However, in some embodiments, one or more posts may be used in a contrasting colour and/or material, as some particular masking algorithms (for example grab-cut) may require a multi-stage approach to masking. In such a case, it may be more efficient to mask and remove the posts based on a colour/pattern and/or threshold masking approach, and then use a grab-cut or ML-trained pattern recognition algorithm to separate the remaining image region-of-interest from the backdrop. If just the grab-cut algorithm were used alone, one or more posts in the foreground may be erroneously included in the final representation owing to having harsher edges that grab-cut may perceive to be part of the foreground.
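
A simplified sketch of this two-stage idea is given below, assuming the posts have a known, roughly uniform colour; the colour range, seeding rectangle and filename are illustrative assumptions, and OpenCV's grabCut is used as one possible grab-cut implementation.

import cv2
import numpy as np

image = cv2.imread("capture.png")                       # placeholder filename

# Stage 1: mask the holding posts by a colour threshold (assumed BGR range).
post_mask = cv2.inRange(image, (10, 10, 170), (70, 70, 230))

# Stage 2: grab-cut the remaining region of interest away from the backdrop,
# seeded with a rough bounding rectangle around the platform (placeholder values).
gc_mask = np.zeros(image.shape[:2], np.uint8)
bgd_model = np.zeros((1, 65), np.float64)
fgd_model = np.zeros((1, 65), np.float64)
rect = (100, 100, image.shape[1] - 200, image.shape[0] - 200)
cv2.grabCut(image, gc_mask, rect, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_RECT)

foreground = np.where((gc_mask == cv2.GC_FGD) | (gc_mask == cv2.GC_PR_FGD),
                      255, 0).astype(np.uint8)
foreground[post_mask > 0] = 0                           # remove the posts explicitly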

Optionally, the processor is in electronic communication with the lighting assembly, and further wherein the processor is operable to instruct the lighting assembly to illuminate the rotatable platform for each of the one or more rotational movements.

By controlling the lighting assembly for each rotational movement, the lighting conditions may be modified and/or optimised for each of the one or more digital images which are captured. Therefore, a more accurate 3D model may be obtained using said one or more digital images.

Optionally there is provided a rigid frame comprising the apparatus as described herein. Optionally, the rigid frame comprises one or more retractable and/or removeable elements.

The use of a scanning apparatus may be required at a range of different locations. Therefore, a single rigid frame comprising the required tools may provide a convenient means of generating 3D models regardless of the location in which the subjects to be scanned are kept. If parts of the rigid frame are retractable and/or removeable then the ease of transportation of the rigid frame may be increased.

According to a further aspect, there is provided a capture apparatus for generating a digital replica of a painting, comprising: a frame; a moveable mount, wherein the moveable mount is provided on the frame; one or more digital cameras operable to generate one or more digital images; and a processor in electronic communication with the moveable mount and the one or more digital cameras; wherein the processor is operable to: instruct one or more movements of the moveable mount; instruct the one or more digital cameras to capture one or more digital images for each of the one or more movements; receive one or more digital images generated by the one or more digital cameras; and transmit the one or more digital images to a server.

According to another aspect, there is provided an apparatus according to any aspect or embodiment described operable to perform the method of any aspect or embodiment herein described.

Brief Description of Drawings

Embodiments of the present invention will now be described, by way of example only and with reference to the accompanying drawings having like-reference numerals, in which:

Figure 1 shows a perspective view of a scanning/capture apparatus according to an embodiment, the apparatus to be used with an object for which a three-dimensional model is to be generated from multiple images of the object captured from different viewpoints of the object;

Figure 2 shows a cross-sectional view of the scanning apparatus of Figure 1;

Figure 3 shows a plan view of the scanning apparatus of Figure 1;

Figure 4 shows a further perspective view of the scanning apparatus of Figure 1;

Figure 5 shows a further perspective view of the scanning apparatus of Figure 1 showing an object to be scanned and showing the image capture portion of the apparatus removed from the main apparatus;

Figure 6 shows an example backdrop for use with the scanning apparatus of Figure 1;

Figure 7 shows the scanning apparatus of Figure 1 in both a retracted form and an extended form with the object in position ready to be scanned;

Figure 8 shows a process flowchart outlining the steps in an embodiment of a capture process that can be used with, for example, the apparatus of Figures 1 to 7 (or other apparatus);

Figure 9 shows a process flow-chart detailing the steps in an embodiment of the masking process that can be used with the capture process of Figure 8;

Figure 10 shows the generation of a difference mask in order to isolate just the image portion of an object to be modelled;

Figure 11 shows the generation of a BASNet technique mask in order to isolate just the image portion of an object to be modelled;

Figure 12 shows an overview of the BASNet masking approach used to create the mask of Figure 11;

Figure 13 shows the generation of a U²-Net technique mask in order to isolate just the image portion of an object to be modelled;

Figure 14 shows the generation of a Hand-CNN technique mask in order to isolate just the image portion of an object to be modelled;

Figure 15 shows an apparatus embodiment where the object to be modelled is placed on a turntable in order to capture multiple images from different viewpoints of the object;

Figure 16 shows an apparatus where the object to be modelled is held in the hands of a human user in order for the camera system to capture multiple images from different viewpoints of the object;

Figure 17 shows another view of the apparatus and object being held by a user of Figure 16;

Figure 18 shows an alternative apparatus allowing for images to be captured of an object being held by a user;

Figure 19 shows a further alternative embodiment using an augmented reality headset being worn by a user who is holding an object for which images are captured using the augmented reality headset;

Figures 20 to 23 show another embodiment showing an arrangement of image sensors mounted on an apparatus for scanning a painting;

Figure 24 shows the use of an augmented reality headset by a user to monitor and/or control the model generation process.

Referring to Figures 1 to 7, an example embodiment of a capture apparatus will now be described.

Referring initially to Figure 1, there is shown a scanning/capture apparatus 100 comprising a number of components which will now be described in more detail. The apparatus 100 is for use in scanning an object in order to generate a three-dimensional digital model of the object.

At a first end of the scanning apparatus 100 is a capture plate 102. The capture plate 102 is arranged to surround one or more digital cameras (camera not shown in this Figure) and provides an aperture 104 for use as a secure and fixed mounting point for the one or more digital cameras. The capture plate 102 can allow one or more digital cameras to be placed or replaced in exactly the same position relative to the rest of the apparatus, for example if there is a hardware fault or if routine maintenance needs to be performed or the hardware exchanged with another piece of imaging apparatus, or if the imaging process is interrupted in some way. The capture plate 102 comprises an aperture 104, through which the digital camera can be at least partially inserted, for example in order to allow a lens to fit through the aperture 104.

In at least some embodiments, any digital cameras used may be placed through and/or around the capture plate 102, or arranged remotely from the capture plate 102. In other embodiments, one or more of any digital cameras used may be embedded within the capture plate 102 as part of a single unit. The aperture 104 is arranged such that when a digital camera is mounted on the capture plate 102 the lens and/or photosensitive apparatus of the digital camera will be operable to capture an image of an object on a rotatable platform 108. It is appreciated that multiple sizes of rotatable platforms may be used to accommodate differently sized objects.

The rotatable platform 108 is operable to rotate a predetermined amount according to one or more instructions received from a processor. Such instructions may be in the form of a wired or wireless electronic communication. The rotatable platform 108 may be in electronic communication with one or more processors and/or one or more of the digital cameras, such that the rotatable platform 108 is arranged to rotate a predetermined amount for each digital image to be captured. For example, if 24 digital images are to be taken, then after every image is taken the rotatable platform 108 can be set to rotate 15 degrees. Therefore after 24 images an entire 360-degree view of an object on the rotatable platform 108 may be captured.

In one example, the rotatable platform 108 rotates such that multiple (typically 48-192) images are taken in different orientations by the digital camera mounted in the aperture 104. The subject (i.e. the object placed on the rotatable platform 108) may be positioned on one or other of its different sides (“flipped”) such that all aspects of the subject can be captured by the camera over several rotations of the rotatable platform 108, each rotation being for one orientation of the object positioned on one or other of its different sides/faces. The rotatable platform 108 may be covered by a similarly coloured material (for example a white, black, grey or a blue background or a black and grey checkerboard background) as a backdrop to the subject/object.

A backdrop support 110 is mounted behind the rotatable platform 108, at a second end of the capture apparatus 100, such that when the digital camera is mounted in the aperture 104, the backdrop support 110 is operable to hold a backdrop (not shown) as a background to an object on the rotatable platform 108.

The scanning apparatus 100 of this embodiment is mounted on one or more rails 106. The rails 106 allow the length of the scanning apparatus 100 to be varied according to the needs of the user. For example, in a compact state the rails 106 could allow the capture plate 102 to be brought closer to the rotatable platform 108. Alternatively, the rails 106 could be extended to increase the distance between the capture plate 102 and the rotatable platform 108. In some embodiments, the imaging device is mounted on a tripod with geared crankshaft to allow it to be moved up and down (optionally with a ring flash mounted thereupon, or alternatively a customised lighting and/or flash array). A rack unit 112 is located within the scanning apparatus 100, and in this embodiment is stored adjacent the rotatable platform 108 and, in use, underneath the platform 108. The rack unit 112 houses a computer system which comprises a processor, which processor is in electronic communication with the rotatable platform 108, the digital camera, and a lighting assembly (not shown). The computer system/processor may be considered as a control system for the digital camera, rotatable platform 108, and a lighting assembly, as well as for storing and/or pre-processing and/or transmitting/distributing captured images to a cloud platform for processing.

Figure 2 and Figure 3 show different views of the scanning apparatus of Figure 1, including, in the embodiment shown in Figures 2 and 3, a single digital camera 114 mounted through aperture 104.

The digital camera 114 is mounted on the capture plate 102 through the aperture 104. The digital camera 114 is arranged so as to be operable to capture one or more images of an object placed on the rotatable platform 108 and with a backdrop which may be supported by the backdrop support 110. The digital camera 114 may further comprise a camera polariser (not shown) positioned in front of the digital camera 114 at 90 degrees to the orientation of a lighting polariser (not shown). Images captured with such an arrangement of polarisers are referred to as “cross polarised”.

Other embodiments may comprise a plurality of digital cameras, both local and remote to the capture plate 102.

Figure 4 shows a further representation of the scanning apparatus 100 of Figure 1, showing some details of the internal structure of the apparatus 100 and the use of a backdrop 116.

In Figure 4, the backdrop support 110 is shown equipped with a backdrop 116. The backdrop 116 of this embodiment is formed from a single lamina sheet, for example a sheet of white paper, curved into an approximation of an “L” shape. Such an arrangement may be referred to as an “Infinity screen backdrop”, providing a neutral single colour background to the images of the subject such that they can be digitally cropped for further processing.

In some embodiments the backdrop 116 comprises a single fabric sheet or other substantially flat material having a known colour and/or pattern. Different choices of material may be used in different embodiments, and so the backdrop may be easily and economically replaceable or may provide other desirable properties depending on the object being scanned or the environment being used for scanning.

In other embodiments the backdrop 116 may be of a non-uniform colour, for example a chessboard pattern, as such a backdrop 116 may be more suitable for use with certain approaches to masking. It is appreciated that multiple backdrop 116 sizes may be used in order to accommodate different sized subjects to be scanned.

The use of a backdrop 116 of a single uniform colour, or having a regularly repeating pattern, can assist with the construction of a three-dimensional (3D) model from the images captured by the digital camera 114. The computational complexity of processing the images into a 3D model can be reduced when the background from the images may be more easily detected and hence removed.

Figure 5 shows a further representation of the scanning apparatus 100, together with a more detailed view of the capture plate 102. The rotatable platform 108 is shown with a subject 118 to be used to form a 3D model. In this embodiment the subject 118 is a toy car.

The capture plate 102 comprises the aperture 104 in the form of a circular hole, through which the digital camera 114 is positioned. The capture plate 102 of this embodiment comprises two capture plate receiving housings 124, which are designed to fit over the two capture plate supports 122. In such a way the capture plate 102 may be securely fitted to the main body of the scanning apparatus 100.

The capture plate 102 further comprises a lighting assembly 120. The lighting assembly 120 of this embodiment comprises four panels, each equipped with a plurality of light emitting diodes (LEDs). The lighting assembly 120 is one of the components, alongside the digital camera 114 and the rotatable platform 108, in electronic communication with the processor which is housed in the rack unit 112. In one embodiment, a light panel may be used comprising multiple, optionally hundreds, of individual LEDs. The light panels are orientated around the digital camera 114 to minimise shadowing from the camera viewpoint.

Although the embodiment of this example comprises four panels, it is appreciated that any number of panels may be used, or no panels at all, depending on the lighting requirements. In some embodiments, fewer but larger panels may be used to provide the desired illumination. Further, in some embodiments, the panels can provide a constant light source and/or provide a flash configuration to allow the scene to be lit more brightly.

In one embodiment a polarising filter (not shown) is positioned in front of the lighting panels such that the subject is illuminated with polarised light.

In one embodiment, when a 3D model is to be generated, the processor sends a signal to the digital camera 114 to capture an image. While the image is being captured, the lighting assembly 120 is sent a signal from the processor to illuminate the subject 118 according to one or more predefined settings. Once the subject 118 is illuminated and the image is captured, the rotatable platform 108 is sent a signal from the processor to rotate by a predetermined angle. A further image is then captured of the subject 118 from a different angle. This process may be repeated numerous times, until enough digital images are captured to form a 3D model of sufficient quality as required by an end user. The process may optionally be repeated with the object positioned on the rotatable platform 108 in different orientations. The process may also optionally be repeated to capture sets of images illuminated with polarised light and also sets of images illuminated with unpolarised light. Optionally, the user can be provided with real time feedback, for example via a computer screen or augmented reality headset, which allows user input to increase the quality of the scan.
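
Purely as an illustration of this capture sequence (the actual hardware control interfaces are not described here), the loop below uses stand-in stub drivers for the camera, turntable and lighting assembly; the class and preset names are hypothetical.

import time

class StubDriver:
    # Placeholder for a real hardware interface; just logs each instruction.
    def __init__(self, name):
        self.name = name
    def command(self, action, **kwargs):
        print(f"{self.name}: {action} {kwargs}")

def capture_rotation(num_images=24, settle_seconds=0.5):
    camera = StubDriver("camera")
    turntable = StubDriver("turntable")
    lights = StubDriver("lighting")
    step = 360.0 / num_images                    # e.g. 24 images -> 15 degrees each
    for index in range(num_images):
        lights.command("illuminate", preset="predefined")
        camera.command("capture", frame=index)
        turntable.command("rotate", degrees=step)
        time.sleep(settle_seconds)               # allow the platform to settle
    lights.command("off")

capture_rotation()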

Figure 6 shows a further embodiment of a backdrop 116. In this embodiment, the backdrop 116 is wider than in the previously described embodiment, such that there is provided an auxiliary backdrop support 126 operable to support the backdrop 116.

Figures 7a and 7b show the scanning apparatus 100 in a retracted form and an extended form.

In Figure 7a, the rails 106 are arranged so as to bring the backdrop 116 closer to the capture plate 102. In this retracted form the scanning apparatus 100 may be more conveniently transported or, if the subject 118 is smaller, then it may be brought closer to the digital camera 114.

In Figure 7b, an extended form is shown wherein the rails 106 are extended so as to increase the distance between the backdrop 116 and the capture plate 102 when compared with the retracted form. This extended form may be more suitable for larger models, or for specific digital cameras requiring a greater focal distance to produce an accurate image.

Figure 8 shows a process flow chart for the capture process, which will now be described in more detail.

In step 802, a digital camera (for example, the digital camera 114 in the capture apparatus of Figure 1) captures one or more images of the subject (or object, typically an object to be modelled in three dimensions) 118. In this example embodiment, one or more raw digital images of the subject 118 are captured through a polarised camera filter under cross-polarised lighting. In other embodiments, other image or video capture devices can be used to capture images and/or video of the subject/object, either under polarised or unpolarised light or both.

A device control system (DCS) 804 receives the captured images from a digital camera and then transmits the images to a cloud-based system 806. In the example embodiment, the DCS 804 is configured to automatically synchronise data from the local system 802, 804 to the cloud system 806, for example copying or moving the data from the DCS 804, such as images, video, and/or frames extracted from video. In some embodiments, this image or video data may be pre-processed and in further embodiments the pre-processed and/or raw data may be provided to the remote computer system 806.

In some embodiments the digital camera 114 and DCS 804 may be combined into a single device, for example using the onboard processing capabilities of the digital camera 114 or in an alternative example because one or more imaging sensors are integrated into the computing system functioning as the DCS 804 (for example, but not limited to, an Augmented Reality (AR) headset or unmanned aerial vehicle).

Within the cloud 806, there is provided a storage database 808. Within the storage database 808 the images may be stored. Other cloud or remote computer system configurations can be provided in alternative embodiments, including single remote computers or servers, remotely hosted virtual machines, remote services, distributed services and/or distributed private or public cloud systems, or public or private blockchain or distributed ledgers. The storage database 808 can be provided in a storage arrangement separate from but in communication with the processing portion of the cloud 806, or as an integral part of the cloud processing portion.

Following transmittal of the image/video data to the storage database 808, a first pre-processing stage 810 is performed on these images/videos once stored in the storage database 808. This first pre-processing step 810 reads the one or more image/video files from the storage database 808 and decodes the image files into a local and/or remote memory. In doing so, the first pre-processing stage 810 may apply one or more transformations to the image, for example adjusting the brightness, applying noise removal and/or performing a gamma correction.
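
By way of example only, such transformations might be expressed as follows; the brightness, denoising and gamma parameter values are illustrative assumptions rather than values taken from the description.

import cv2
import numpy as np

def preprocess(path, brightness=10, gamma=1.2):
    image = cv2.imread(path)                                        # decode the stored file
    image = cv2.convertScaleAbs(image, alpha=1.0, beta=brightness)  # brightness adjustment
    image = cv2.fastNlMeansDenoisingColored(image, None, 5, 5, 7, 21)  # noise removal
    # Gamma correction via a lookup table.
    table = np.array([((i / 255.0) ** (1.0 / gamma)) * 255 for i in range(256)],
                     dtype=np.uint8)
    return cv2.LUT(image, table)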

The first pre-processing stage 810 then inputs the image data into a second pre-processing stage 812. The second pre-processing stage 812 comprises a combination of two sub-functions: masking and image optimisation.

Image masks are created to enable background removal in order to process just image data pertaining to the object being scanned for generation as a three-dimensional model.

In a parallel process, the images without masks or any background removal are optimised for photogrammetry.

In an alternative embodiment, the first pre-processing stage 810 and/or the second pre-processing stage 812 are performed locally to the scanning apparatus 100 by the processor. The processor may be operable to perform any of the processes of the first pre-processing stage 810 and/or the second pre-processing stage 812, including one or more of: edge detection, difference masks; BASNet masks; U²-Net masks; and Hand-CNN masks, machine vision, and/or grab-cut removal of a background or masking algorithms designed to detect and remove elements of the scanning apparatus visible in the image.

The data from each of these parallel processes, which together at least partially make up the second pre-processing stage 812, is then passed to a 3D reconstruction stage 814. In the 3D reconstruction stage 814, images and masks may be used within a photogrammetry reconstruction function. This stage may constitute the most computationally expensive and complex stage of the process described herein, and may take place within a separate, remote, processor.

Following the 3D reconstruction stage 814, there is provided a stage 816 for generating automated level of detail (LOD) and texture and material map variants. In this embodiment, the output generated by the 3D reconstruction stage 814 comprises one or more 3D outputs which themselves comprise a combination of polygon meshes relating to the geometry of the subject 118 and texture or material maps relating to the colour or surface properties of the subject 118. Both the polygon meshes and the texture or material maps have a different impact on the usability of the output. Multiple levels of detail of polygon mesh may be automatically generated. Multiple resolution texture and material maps may also be generated. It is further appreciated that output data representing a 3D model may be provided according to many different data formats, for example point clouds or voxels. The output may include one or more stylised versions of a 3D scan from one or more input images. This would allow for a model to be created beyond a photographic replication of a subject and allow for specific stylisation of the object. In addition to texture and material maps, the automated output may include generation of additional material maps useful in a PBR (Physically Based Rendering) pipeline - these maps may include, but are not limited to, automatic generation of normal, roughness, shininess, gloss, occlusion, parallax occlusion, transparency maps and/or any other map used in a PBR three-dimensional rendering workflow or real-time three-dimensional environment or non-real-time three-dimensional execution.

The next stage is to store the output data in a content management system (CMS) 818. An end user may be able to access their scanned data files in the CMS 818. Data can be accessed with a user configured range of LOD and texture and material map resolutions. An end user can add additional metadata to describe the object or properties of the object. This metadata may be useful for improving pre-processing steps for future scan subjects using a machine learning based approach.

Data files within the CMS may then be extracted as part of the extraction stage 820. Within the extraction stage 820 there are two main options from which a user may select their choice:

1) “Exported”: specific model files are manually downloaded to a user’s device for further use such as embedding in a third-party application; and/or

2) “Streamed”: upon receiving a request through an integrated third-party application, a model file is downloaded temporarily for use such as temporary viewing. Searching or identification of models can be accomplished by the use of tag metadata applied to or associated with the models. The tag metadata can be user supplied or inferred by a machine learning based object classification/recognition process.

The user device 822 may comprise a device such as a computer, tablet, mobile phone, or other digital access tool upon which the user is accessing the relevant information. In one embodiment, a user interface for the scanner control and/or camera operation may be provided in the form of a virtual or physical interface (as described in further detail in the embodiment described in relation to Figure 24 below). In the case of a virtual interface no physical contact between a user and the scanning apparatus would be required. In one embodiment, the user interface may be provided through a companion tablet and/or web application (“app”). One or more of said interfaces may comprise an augmented reality (AR) and/or virtual reality (VR) interface.

In alternative embodiments, some or all of the pre-processing steps can be performed locally. In one such alternative embodiment, for example, the pre-process step 810 can be performed at a local computer or by the camera 802 or DCS 804. In another such alternative embodiment, the pre-process step 812 can also be performed at a local computer or by the camera 802 or DCS 804. The local computer, the camera 802 or the DCS 804 may have the storage database 808 provided locally, or the storage database 808 may be provided as both a local storage database and a cloud storage database for receiving the pre-processed images after performance of one or both pre-process steps 810, 812.

Referring to Figure 9, a more detailed embodiment of the mask generation process will now be described, which can be used with the process of Figure 8 (or as a standalone process for masking image data pertaining to a scanned object).

The auto-masking approach 900 can allow for substantially optimal masking in a number of scenarios, for example ranging from use in very controlled environments where the object is placed on a turntable (for example using the apparatus of Figure 1 in a studio environment) to use with an object being held by a user in front of a camera in uncontrolled lighting.

The auto-masking process 900 involves the creation of multiple masks and the determination of a final mask from one or more of these multiple masks, for each image. Each input image 902 has a series of masks created 904, 906, 908, 910. The number of masks created can vary by embodiment. Example masking techniques include, but are not limited to, any or any combination of: difference masks; BASNet masks; U^2-Net masks; and Hand-CNN masks. The skilled reader will appreciate that a selection of masking techniques can be used in addition to, or instead of, any or all of these suggested techniques within the teaching of this aspect.

From the multiple masks created 904, 906, 908, 910, a primary mask 912 is selected. In the example embodiment, a decision tree is used to determine how to select or combine one or more of the masks 904, 906, 908, 910 together to result in one substantially optimal final mask 916. Optionally, techniques such as ensemble learning and weighted average ensemble prediction can be used to train automatic masking algorithms and/or train how to weight or combine the outputs of the masking models and/or train to combine the resulting mask models/masks generated into a final mask. The final mask 916 can comprise some or all elements from some or all of the masks 904, 906, 908, 910 and is used in the reconstruction step when the model of the object is generated.
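As a non-limiting illustration of the weighted-average ensemble prediction mentioned above, the following sketch blends a set of candidate masks using per-technique weights; the weight values shown are arbitrary placeholders standing in for values that would, in the described approach, be learned.

```python
# Illustrative sketch only: a weighted-average ensemble of candidate masks.
# The weights would in practice be learned; the values here are placeholders.
import numpy as np

def combine_masks(masks, weights, threshold=0.5):
    """masks: list of float arrays in [0, 1], one per masking technique.
    Returns a single binary mask from the weighted average."""
    weights = np.asarray(weights, dtype=np.float64)
    weights = weights / weights.sum()                   # normalise ensemble weights
    stacked = np.stack(masks).astype(np.float64)        # shape (n_masks, H, W)
    blended = np.tensordot(weights, stacked, axes=1)    # weighted average per pixel
    return (blended >= threshold).astype(np.uint8)

# e.g. final = combine_masks([difference_mask, basnet_mask, u2net_mask],
#                            weights=[0.2, 0.4, 0.4])
```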

The process of determining a primary mask 912 and final mask 916 can, in some embodiments, be determined in part based on user input provided at the start of the scan; otherwise a set of default or predetermined assumptions or settings can be used regarding the object properties and/or the imaging equipment.

The first step of the auto-masking process 900 is to generate masks 904, 906, 908, 910 using a number of different masking techniques (also known as multi-method masking) to derive a set of masks to be used as input masks for the next step.

The second step is to score each mask 904, 906, 908, 910 using a confidence score, and to select the mask with the highest confidence score as the primary mask 912. Optionally, the second step may comprise multiple iterative sub-steps where some masks influence the generation of intermediate masks until a primary mask is generated or selected. For example, an iterative sub-step or set of sub-steps may include the use of masking algorithms designed to detect elements of the scanning apparatus visible in an image and remove these from all images, or to add the detected elements to existing masks, optionally irrespective of any additional combined masking approach. The third step is to use the primary mask 912 as the master comparator mask and to compare it to all of the remaining masks 904, 906, 908, 910 not selected as the primary mask 912.
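A minimal, non-limiting sketch of the scoring and primary-mask selection in the second step is given below; the confidence heuristic used (the solidity of the largest connected region of each mask) is an assumption for the purposes of the example, and in practice the score could instead be reported by the masking model or learned.

```python
# Illustrative sketch only: scoring candidate masks and selecting the highest
# scoring mask as the primary mask. The heuristic below is an assumption.
import cv2
import numpy as np

def confidence_score(mask):
    mask = (mask > 0).astype(np.uint8)
    num, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    if num < 2:
        return 0.0                                   # no foreground at all
    areas = stats[1:, cv2.CC_STAT_AREA]              # label 0 is background
    largest = int(areas.max())
    # Fraction of all unmasked pixels that lie in one connected blob.
    return largest / float(mask.sum() + 1e-9)

def select_primary(masks):
    scores = [confidence_score(m) for m in masks]
    best = int(np.argmax(scores))
    return masks[best], best, scores
```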

The fourth step is to, for all masks 904, 906, 908, 910 including the primary mask 912, segment all elements in each mask and classify each segment as masked or non-masked.

In some embodiments, the classification is based on a binary input mask (i.e. black or white), but in other embodiments an alpha masking approach is used (e.g. shades of grey in the 4th channel of an RGBA image) to classify and provide weighting to the masking applied during three-dimensional reconstruction, allowing for a more granular approach to any masking step used in the three-dimensional reconstruction step. Optionally, an estimated depth map is used to derive an alpha mask.
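As a non-limiting illustration of the optional depth-derived alpha mask, the following sketch normalises an estimated depth map into the fourth (alpha) channel of an RGBA image; the near/far limits are assumed values.

```python
# Illustrative sketch only: deriving an alpha (weighting) mask from an
# estimated depth map and writing it into the 4th channel of an RGBA image.
# The near/far limits are assumptions for the example.
import numpy as np

def depth_to_alpha(depth, near=0.3, far=1.5):
    """Pixels nearer than `near` get full weight, beyond `far` zero weight."""
    alpha = np.clip((far - depth) / (far - near), 0.0, 1.0)
    return (alpha * 255).astype(np.uint8)

def attach_alpha(rgb, depth):
    alpha = depth_to_alpha(depth)
    return np.dstack([rgb, alpha])   # H x W x 4 RGBA image
```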

The fifth step is to determine the final mask 916 from the primary mask 912 and the remaining masks 904, 906, 908, 910 using a series of sub-steps that determine how to combine the masks 904, 906, 908, 910. The first sub-step is to identify the largest unmasked portion of the primary mask 912 (i.e. the portion indicating the largest area of pixels representing the object in the image of the object). The second sub-step is to remove from the primary mask 912 any unmasked portions that are not the largest unmasked portion. Optionally, the second sub-step may also remove from the primary mask 912 unmasked portions that have been detected and classified or identified as being a magnetic holding post or some other part of the scanning apparatus (e.g. using a machine-learned approach to detect elements of a known scanner or scanning apparatus within the masked area of the image). This optional sub-step can be performed at other points in the process, as an earlier or later step or sub-step. The third sub-step is to identify any masked portions (i.e. portions representing pixels that are not the object in the image of the object) in the other masks 904, 906, 908, 910, other than the primary mask 912, that fall within the largest unmasked portion of the primary mask 912, and to remove these portions from the primary mask 912 such that what remains of the primary mask 912 is the largest unmasked portion with those overlapping masked portions applied to it. Optionally, the fourth sub-step is to shrink the remaining unmasked portion at its boundaries, to increase the likelihood that the remaining unmasked portion represents the object in the image.
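A simplified, non-limiting sketch of these sub-steps is given below using OpenCV connected-component analysis; the convention that unmasked (object) pixels are non-zero, and the erosion kernel size, are assumptions for the example.

```python
# Illustrative sketch only of the mask-combination sub-steps described above:
# keep only the largest unmasked (object) region of the primary mask, re-apply
# any masked regions from the other masks that fall inside it, and optionally
# shrink the boundary. Unmasked (object) pixels are 1, masked pixels are 0.
import cv2
import numpy as np

def largest_component(binary):
    num, labels, stats, _ = cv2.connectedComponentsWithStats(binary.astype(np.uint8))
    if num < 2:
        return np.zeros_like(binary, dtype=np.uint8)
    biggest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))
    return (labels == biggest).astype(np.uint8)

def determine_final_mask(primary, others, erode_px=2):
    # Sub-steps 1 and 2: keep only the largest unmasked portion of the primary mask.
    final = largest_component(primary > 0)
    # Sub-step 3: where any other mask marks a pixel as masked (background),
    # remove it from the remaining unmasked portion.
    for other in others:
        final[other == 0] = 0
    # Sub-step 4 (optional): contract the boundary to cut slightly into the object.
    if erode_px > 0:
        kernel = np.ones((2 * erode_px + 1, 2 * erode_px + 1), np.uint8)
        final = cv2.erode(final, kernel)
    return final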

In some embodiments, the masks can be configured to remove part of the object in addition to any holes and background in order to avoid any three-dimensional reconstruction confusion or errors at the boundary of the object.

The auto-masking process 900 can be used with a variety of masking techniques, in a variety of combinations, and a few example techniques will now be presented with reference to Figures 10 to 14.

Referring now to Figure 10, there is shown the application of a difference mask to an image of an object.

A difference mask 1000 requires a background image to be known, for example by capturing an image from a fixed position without a subject or object in the frame. This background image is then subtracted from the frames that are subsequently captured with the subject/object in the frame, to create a difference mask. This technique can result in reconstructions that contain many points that are not of interest, i.e. contain lots of noise or elements from the scene that are not required (e.g. parts of the scanning apparatus or the surface that the object is placed on, such as a floor or a turntable). Thus, difference masks usually require manual intervention to remove noise from the mask/image generated when the mask is applied. Alternatively, user intervention can be made to change the tolerance of the difference mask to account for factors such as contact shadows. This technique can also be sensitive to changes in the background, as the background image may need to be re-taken in order to reduce errors from changed background conditions (e.g. the focal length of the imaging device changes, or the imaging device is moved relative to the background), and thus this technique typically works better in tightly controlled conditions. An example background image 1002 for an image 1004 is shown in Figure 10, and this generates the example mask 1006, which aims to identify the pixels in the image pertaining to the object that is the subject of the image. To correct for the noise visible in the mask 1006, a user would need to manually adjust settings such as tolerances in order to obtain a correct mask - this adjustment tends to be made manually and assessed visually by the user.
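By way of a non-limiting illustration, a basic difference mask of this kind might be computed as follows; the threshold value is an assumed parameter and, as noted above, would typically require manual tuning.

```python
# Illustrative sketch only: a simple difference mask obtained by subtracting a
# known background frame from a frame containing the subject. The threshold
# value is an assumption and usually needs manual adjustment.
import cv2

def difference_mask(background_path, frame_path, threshold=25):
    background = cv2.imread(background_path, cv2.IMREAD_GRAYSCALE)
    frame = cv2.imread(frame_path, cv2.IMREAD_GRAYSCALE)
    diff = cv2.absdiff(frame, background)
    _, mask = cv2.threshold(diff, threshold, 255, cv2.THRESH_BINARY)
    return mask
```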

In alternative embodiments, Sobel edge detection and/or Canny edge detection are used in an iterative manner. Then segmentation can be used to classify small islands (i.e. unmasked or masked portions) and remove these. Finally, the mask can be contracted to cut into the subject itself.
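A non-limiting sketch of such an edge-based approach is shown below, using Canny edge detection, removal of small islands by connected-component area, and a final contraction of the mask; all numeric parameters are assumptions.

```python
# Illustrative sketch only: Canny edge detection, removal of small "islands",
# and a final contraction of the mask. All numeric parameters are assumptions.
import cv2
import numpy as np

def edge_based_mask(image_gray, low=50, high=150, min_area=500, erode_px=2):
    edges = cv2.Canny(image_gray, low, high)
    # Close the detected edges into filled regions.
    closed = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, np.ones((7, 7), np.uint8))
    num, labels, stats, _ = cv2.connectedComponentsWithStats(closed)
    mask = np.zeros_like(closed)
    for label in range(1, num):
        if stats[label, cv2.CC_STAT_AREA] >= min_area:   # drop small islands
            mask[labels == label] = 255
    # Contract the mask so it cuts slightly into the subject.
    return cv2.erode(mask, np.ones((2 * erode_px + 1, 2 * erode_px + 1), np.uint8))
```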

Referring now to Figures 11 and 12, there is shown the application of the BASNet masking technique. Details of this technique are available at https://github.com/NathanUA/BASNet and this document/information is herein incorporated by reference. The BASNet technique is shown in Figure 11 to be applied to an image 1102 and produces mask 1104. However, the BASNet technique does not always entirely accurately identify a mask for the object, and it can be seen in the mask 1104 that a portion of the mask that should be identified as a hole in the object is instead masked to indicate that it is part of the object.

The network 1200 used by the BASNet technique is presented in Figure 12. The network comprises two modules, the predict module 1202 and the residual refinement module 1204. The predict module 1202 is a sequence of layers that take an input image and output a coarse mask. The coarse mask is input into the residual refinement module 1204 which refines the mask to produce the refined mask that is output by the network 1200.

Referring now to Figure 13, there is shown the input image 1302 instead fed into a network performing the U^2-Net masking technique, which outputs a mask 1304 that correctly identifies the visible hole in the object that is not identified in Figure 11 by the BASNet technique. The U^2-Net technique is documented at https://github.com/NathanUA/U-2-Net and this document/information is incorporated herein by reference.

Referring now to Figure 14, there is shown an illustration of a user holding an object 1402 and the mask 1404 created using the Hand-CNN masking technique 1400. Further details of this technique are documented at https://www3.cs.stonybrook.edu/~cvl/projects/hand_det_attention/ and this information/documentation is incorporated herein by reference. This technique can allow for the removal of any fingers, hands and/or arms from the output mask 1404.

In alternative embodiments, users handling objects being imaged can wear gloves of a specific colour that cover their hands and forearms, in an evenly-lit environment with a matching background (i.e. a background having the same colour under the even lighting), in order to allow the hands and arms to be masked out of the scene using other techniques such as difference techniques or chroma keying. In still further embodiments, this can be used in conjunction with augmented reality headsets and/or a green/black screen for scanning (e.g. portable scanning).
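As a non-limiting illustration, hands and forearms covered by gloves of a known colour might be keyed out as follows; the HSV range shown corresponds to an assumed green glove/background colour.

```python
# Illustrative sketch only: masking out gloved hands and forearms of a known
# colour using chroma keying in HSV space. The hue range is an assumption.
import cv2
import numpy as np

def chroma_key_mask(image_bgr, lower_hsv=(35, 60, 60), upper_hsv=(85, 255, 255)):
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    keyed = cv2.inRange(hsv, np.array(lower_hsv), np.array(upper_hsv))
    # Pixels matching the glove/background colour are masked out (0);
    # everything else is treated as potential object (255).
    return cv2.bitwise_not(keyed)
```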

In other embodiments, an automatic grab-cut method can be used as one of the masking techniques. Initially, segmentation is performed on the image to automatically identify the background and the object/foreground and to paint lines onto the image. A grab-cut mask approach is then used to identify pixels within the painted lines on the image that pertain to the object.
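A non-limiting sketch of such an automatic grab-cut step is shown below using OpenCV's grabCut function seeded from an initial segmentation; the seed mask and iteration count are assumptions for the example.

```python
# Illustrative sketch only: an automatic GrabCut step seeded by an initial
# segmentation (the "painted lines"). The iteration count is an assumption.
import cv2
import numpy as np

def auto_grab_cut(image_bgr, seed_foreground, iterations=5):
    """seed_foreground: binary array, non-zero where the initial segmentation
    indicates the object."""
    mask = np.full(image_bgr.shape[:2], cv2.GC_PR_BGD, np.uint8)
    mask[seed_foreground > 0] = cv2.GC_PR_FGD
    bgd_model = np.zeros((1, 65), np.float64)
    fgd_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(image_bgr, mask, None, bgd_model, fgd_model,
                iterations, cv2.GC_INIT_WITH_MASK)
    # Keep pixels classified as definite or probable foreground.
    keep = (mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD)
    return np.where(keep, 255, 0).astype(np.uint8)
```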

In embodiments using augmented reality headsets, user feedback can be acquired in real time as the model of the object is generated from images captured using the augmented reality headset imaging devices or by another imaging device. For example, as an initial batch of images is captured and processed, the model of the object can be generated iteratively and provided to the augmented reality headset for the user to review progress in the generation of the model in real time. The user can then use the augmented reality controls to re-orientate the model and review progress, and can select portions of the model to provide user input - for example to identify holes through an object, or to provide refinement to any classification or segmentation being performed on the input images as this computation is being performed and incorporated into the model. The same approach can be used with traditional display devices, for example touchscreens, tablet computers or computers with input devices such as a mouse - the model as it is being generated, perhaps alongside the raw images as they are taken or processed/classified/segmented, can be shown to the user to allow for user input on the processing performed thus far in the model generation. This is shown and described in more detail in respect of the embodiment shown in Figure 24 as described below.

In embodiments, the auto-masking process 900 can be used with a variety of apparatus to capture the images of the object for which a three-dimensional model is to be generated, including: hand-held cameras; cameras mounted on frames or stands; augmented reality headsets; head-mounted cameras; unmanned aerial vehicles equipped with imaging devices; or any imaging device, howsoever mounted, that is able to capture multiple images of an object from different viewpoints of the object. For example, the process can be used with images captured of an object on the turntable apparatus 1500 shown in Figure 15, or with images captured of an object being held by a user 1600, 1700, 1800 as shown in Figures 16, 17 and 18, with different options for the imaging device ranging from apparatus similar to that described in relation to Figure 1 to a camera mounted on a tripod as shown in Figure 18. Alternatively, the user may hold the object while wearing an augmented reality headset 1900 as shown in Figure 19.

Referring now to Figures 20 to 23, a further embodiment showing apparatus for capturing scans of paintings will now be described.

Referring first to Figure 20, in this embodiment the painting 2010 to be captured is placed on a flat surface and the scanning apparatus 2000 is positioned around and/or above the painting 2010. The apparatus 2000 has a frame 2006 and an imaging sensor 2008 mounted upon a moveable track 2012 upon the frame 2006, and the imaging sensor 2008 is in communication, via a connector 2005 such as a standard USB or similar computer cable, with a capture system 2002. In other embodiments, the imaging sensor 2008 may connect wirelessly to the capture system 2002 using, for example, wireless networking, cellular networking or some other wireless data exchange protocol, and optionally the capture system 2002 may be remote from the imaging sensor 2008 (for example hosted in a remote computer system or cloud distributed computer system). The frame 2006 has four legs 2014 keeping the imaging sensor 2008, when mounted upon the track 2012, at a predetermined distance from the painting 2010. The moveable track 2012 allows the imaging sensor 2008 to be moved relative to the painting 2010 in two dimensions, and also allows the imaging sensor 2008 to change angle in order to capture the painting 2010 at different angles from the predetermined vertical distance away from the painting 2010. All of the painting 2010 can be scanned either by providing a frame 2006 that is larger than the painting 2010 and moving the moveable track 2012 such that the imaging sensor 2008 iteratively takes images of all of the surface of the painting (and of the edges and back by flipping the painting over, as required), or by moving the frame 2006 as required and repeating the iterative image capture process.

As shown in Figure 21, different imaging sensors 2108 can be provided within the same apparatus 2106, also on a moveable track 2112 and having multiple legs 2114 to keep the imaging sensor 2108 at a predetermined distance from the painting 2110.

As shown in Figure 22, in some embodiments the apparatus 2200 is provided with mounting points 2212 for one or more cameras 2208 and with motors to allow the cameras 2208 to move relative to the painting 2210, so as to capture images of the painting 2210 from multiple positions and/or angles simultaneously, including the edges and/or back of the painting depending on the desired comprehensiveness of the scan.

The degrees of freedom of the apparatus 2200 of Figure 22 are shown in Figure 23. The multiple image sensors 2308 are again mounted on the moveable track 2312, and the track 2312 can cause the sensors 2308 to move across the surface 2312a of the painting or at an angle 2312b relative to the surface of the painting.

In addition, a diffuse non-polarised light source and then a flash-based polarised light source are used to capture duplicate images of each viewpoint of the painting. The two images captured, one with the polarised light source and the other with the non-polarised light source, allow for the creation of a roughness map of the painting in order to reproduce the visual characteristics of the painting under simulated light when viewing the scanned painting (for example when displayed in a real-time PBR three-dimensional environment such as the EPIC Unreal Engine). Directional light sources may be used to infer other material properties and/or morphological features of the object/subject.
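Purely as a heavily simplified, non-limiting illustration, a crude specular response map might be estimated from the difference between the non-polarised and polarised captures as follows; a production PBR pipeline would be considerably more involved, and every step here is an assumption.

```python
# Illustrative, heavily simplified sketch only: estimating a crude per-pixel
# specular response as the difference between the non-polarised capture
# (diffuse + specular) and the polarised capture (specular largely removed),
# normalised for use as a roughness-style map. All steps are assumptions.
import cv2
import numpy as np

def specular_map(non_polarised_path, polarised_path):
    unpol = cv2.imread(non_polarised_path, cv2.IMREAD_GRAYSCALE).astype(np.float32)
    pol = cv2.imread(polarised_path, cv2.IMREAD_GRAYSCALE).astype(np.float32)
    specular = np.clip(unpol - pol, 0, None)
    # Normalise to an 8-bit map; brighter = stronger specular response.
    specular /= (specular.max() + 1e-6)
    return (specular * 255).astype(np.uint8)
```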

Referring now to Figure 24, an augmented reality interface 2406 is shown and will now be described in more detail.

A user 2402 is provided with an AR headset 2406 that is in communication with the system 2412 while a model is being generated of an object 2414. The user 2402 can review the progress of the model generation by reviewing both the images captured 2410 and the model during generation 2408a and the model once finished 2408b. As described above, in this embodiment the user can identify or correct segmentations or classifications on the images 2410 and can also review and interact with the model 2408a, 2408b to view it from multiple angles, locate the relevant photo used to generate portions of the model, and provide user input to improve or change the model as it is generated or once it has been generated.

Any system feature as described herein may also be provided as a method feature, and vice versa. As used herein, means plus function features may be expressed alternatively in terms of their corresponding structure.

Any feature in one aspect of the invention may be applied to other aspects of the invention, in any appropriate combination. In particular, method aspects may be applied to system aspects, and vice versa. Furthermore, any, some and/or all features in one aspect can be applied to any, some and/or all features in any other aspect, in any appropriate combination.

It should also be appreciated that particular combinations of the various features described and defined in any aspects of the invention can be implemented and/or supplied and/or used independently.