
Title:
METHOD FOR AUTOMATIC OBJECT REMOVAL FROM A PHOTO, PROCESSING SYSTEM AND ASSOCIATED COMPUTER PROGRAM PRODUCT
Document Type and Number:
WIPO Patent Application WO/2022/260544
Kind Code:
A1
Abstract:
The invention provides unmoving-object removal as a new feature of a camera, or of any device with an embedded camera, that allows a fence or other unmoving objects to be removed automatically from a photo. The computer-implemented method for automatic object removal from a photo comprises providing a reference photo containing at least one object to be removed from the reference photo, and providing at least one consecutive photo, captured from a different angle, containing at least one object to be removed from the reference photo; for the reference photo and each at least one consecutive photo, performing separately: a) detection of at least one object to be removed from the reference photo, wherein the photo is divided into two planes, b) division of the photo into tiles, c) calculation of the cost function for each tile; next, based on all tiles from all photos, searching for the best new combination of tiles by performing optimization of the global cost function; and outputting a photo comprising the best combination of tiles, for which the global function has the minimum value, said outputted photo being the reference photo with at least one object removed and replaced by background from at least one consecutive photo.

Inventors:
ŁUKASZEWICZ JAKUB (PL)
Application Number:
PCT/PL2021/050038
Publication Date:
December 15, 2022
Filing Date:
June 07, 2021
Assignee:
TCL CORPORATE RES EUROPE SP Z O O (PL)
International Classes:
G06T5/00
Domestic Patent References:
WO2021035228A2, 2021-02-25
Foreign References:
US20170359523A1, 2017-12-14
US20180330470A1, 2018-11-15
US20160323505A1, 2016-11-03
Other References:
CHRIS HARRIS, MIKE STEPHENS: "A Combined Corner and Edge Detector", ALVEY VISION CONFERENCE, 1988
M. A. FISCHLER, R. C. BOLLES: "Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography", COMM. OF THE ACM, 1981
Attorney, Agent or Firm:
STENZEL, Anna (PL)
Claims:

1. A computer-implemented method for automatic removal of an unmoving object from a photo, comprising:

- providing a reference photo containing at least one unmoving object to be removed from the reference photo;

- providing at least one consecutive photo taken from another angle than the reference photo and containing at least one said unmoving object to be removed from the reference photo;

for the reference photo and each at least one consecutive photo, performing separately:

a) detection of at least one said unmoving object to be removed from the reference photo, where the photo is divided into two planes, wherein at least one object to be removed is assigned to the first plane and the second plane comprises elements to be outputted in a final photo;

b) division of all captured photos into tiles;

c) calculation of a cost function for each tile, wherein only tiles of the second plane are processed;

next, based on the tiles from the captured photos, searching for the best selection of tiles by performing optimization of the global cost function;

outputting the final photo comprising the selected tiles that are the best combination of tiles for which the global function has the minimum value, said outputted final photo being the reference photo with at least one unmoving object removed and replaced by background from at least one consecutive photo.

2. A method according to claim 1, wherein the calculation of the cost function is performed according to the formula:

$C(x_p) = O(x_p) + \sum_{q \in N(p)} V_{pq}(x_p, x_q)$

where $C(x_p)$ is the cost of assigning a tile $x_p$ to a given position $p$,

$O(x_p)$ is a component defining whether the tile contains an object to be removed, and

$V_{pq}(x_p, x_q)$ is the sum of squared differences (SSD) between adjacent pixels of the tile $x_p$ and a neighboring tile $x_q$, summed over all neighbor positions $q$ of the position $p$.

3. A method according to claim 1, wherein steps from a) to c) are performed on the fly after providing each consecutive photo.

4. A method according to claim 1, wherein detection of at least one unmoving object to be removed from the reference photo is a semantic object detection using a neural network.

5. A method according to claim 1, wherein detection of at least one unmoving object results in production of an object mask, where the object mask comprises at least one object to be removed and the object mask is the first plane, separate from the second plane, where the second plane comprises elements to be outputted in the final photo.

6. A method according to claim 1, wherein before step a), alignment of each of at least one consecutive photo with the reference photo is performed.

7. A method according to claim 1, wherein each photo is divided into at least 2×2 tiles.

8. A method according to claim 7, wherein the size of the tile is adaptive.

9. A method according to claim 1, wherein in the optimization step a loopy belief propagation algorithm is used.

10. A method according to claim 1, wherein the step of outputting a new photo comprises blending.

11. A method according to claim 1, wherein the step of providing at least one consecutive photo ends automatically.

12. A method according to claim 6, wherein the step of alignment comprises key points determination using a combined corner and edge detector.

13. A method according to claim 1, wherein in each consecutive iteration of the global cost function optimization step a new value of the cost function is calculated for all candidate tiles that are neighboring tiles of the tile replaced in the reference photo in the previous iteration.

14. A data processing system, comprising: a processor; and a memory coupled to the processor to store instructions, which when executed by the processor, cause the processor to perform operations, the operations comprising:

- providing a reference photo containing at least one unmoving object to be removed from the reference photo;

- providing at least one consecutive photo taken from another angle than the reference photo and containing said unmoving object to be removed from the reference photo;

for the reference photo and each at least one consecutive photo, performing separately:

a) detection of said unmoving object to be removed from the reference photo, where the photo is divided into two planes, wherein at least one object to be removed is assigned to the first plane and the second plane comprises elements to be outputted in a final photo;

b) division of all captured photos into tiles;

c) calculation of a cost function for each tile, wherein only tiles of the second plane are processed;

next, based on the tiles from the captured photos, searching for the best selection of tiles by performing optimization of the global cost function;

outputting the final photo comprising the selected tiles that are the best combination of tiles for which the global function has the minimum value, said outputted final photo being the reference photo with at least one unmoving object removed and replaced by background from at least one consecutive photo.

15. The data processing system according to claim 14, wherein the calculation of the cost function is performed according to the formula:

$C(x_p) = O(x_p) + \sum_{q \in N(p)} V_{pq}(x_p, x_q)$

where $C(x_p)$ is the cost of assigning a tile $x_p$ to a given position $p$,

$O(x_p)$ is a component defining whether the tile contains an object to be removed, and

$V_{pq}(x_p, x_q)$ is the sum of squared differences (SSD) between adjacent pixels of the tile $x_p$ and a neighboring tile $x_q$, summed over all neighbor positions $q$ of the position $p$.

16. The data processing system according to claim 14, wherein in each consecutive iteration of the global cost function optimization step a new value of the cost function is calculated for all candidate tiles that are neighboring tiles of the tile replaced in the reference photo in the previous iteration.

17. The data processing system according to claim 14, wherein before step a), alignment of each of at least one consecutive photo with the reference photo is performed.

18. A mobile phone comprising a data processing system according to claim 14.

19. A computer program product comprising instructions, which when executed by a processor, cause the processor to perform operations, the operations comprising:

- providing a reference photo containing at least one unmoving object to be removed from the reference photo;

- providing at least one consecutive photo taken from another angle than the reference photo and containing said unmoving object to be removed from the reference photo;

for the reference photo and each at least one consecutive photo, performing separately:

a) detection of said unmoving object to be removed from the reference photo, where the photo is divided into two planes, wherein at least one object to be removed is assigned to the first plane and the second plane comprises elements to be outputted in a final photo;

b) division of all captured photos into tiles;

c) calculation of a cost function for each tile, wherein only tiles of the second plane are processed;

next, based on all the tiles from the captured photos, searching for the best selection of tiles by performing optimization of the global cost function;

outputting the final photo comprising the selected tiles that are the best combination of tiles for which the global function has the minimum value, said outputted final photo being the reference photo with at least one unmoving object removed and replaced by background from at least one consecutive photo.

20. The computer program product according to claim 19, wherein before step a), alignment of each of at least one consecutive photo with the reference photo is performed.

Description:
Method for automatic object removal from a photo, processing system and associated computer program product

[0001] The present invention relates to a method for automatic object removal from an image such as a photo or a live camera preview, particularly for mobile devices provided with a camera. The invention also relates to a processing system able to implement the method and a computer program product associated with the method.

BACKGROUND

[0002] Removing crowds and unwanted objects to improve the quality of a photo is one of the situations in which photo editing or retouching is highly desirable. In particular, many photos are taken in places where other objects, apart from the object to be photographed, are present. Often the access to the object to be photographed is difficult and said object is occluded by other objects such as fences or trees. The result is that, for example, the bars of the fence or the branches of the tree are present in the photos and hide some important parts of the scene, which makes the photos less attractive for the author. Such situations often occur in zoological gardens, where animals are separated from the visitors by bars or wire, or in museums, where the display is behind some kind of barrier. In such cases it is often impossible to take a picture of the object without some other object, for example a fence, partially obscuring the view of the object, for example an animal.

[0003] For this purpose, editing of a single scene and retouching a single photo by inpainting is known from the prior art. Editing and retouching multiple scenes by inpainting is also known from the prior art. Video inpainting refers to a field of computer vision that aims to remove objects or restore missing or tainted regions present in a video sequence by utilizing spatial and temporal information from neighboring scenes. The overriding objective is to generate an inpainted area that is merged seamlessly into the video. In that way, when the video is played as a sequence, visual coherence is maintained throughout and no distortion in the affected area is visible to the human eye.

[0004] For single-photo editing there are also other known solutions, usually used by professional graphic designers, which provide high-quality tools (e.g. Photoshop) but require a lot of manual work and high-level expertise. There are also apps made for the mass market (e.g. TouchRetouch) which are much simpler but offer significantly lower quality and also require manual selection of the objects that need to be removed.

[0005] In general, all known solutions for object or crowd removal from a single photo require manual operation by the user.

[0006] Thus, there is still a need to provide a good-quality automated method for crowd or object removal from a single photo.

DISCLOSURE OF THE INVENTION

[0007] Starting from the depicted prior art, the object of the invention is to develop a method for automatic object removal from a photo which would be suitable, firstly, for ensuring automatic removal of unwanted objects, such as crowds, cars or fences, from a photo without additional manual operation by the user and, secondly, for obtaining a good-quality output photo.

[0008] This object is achieved by a method for automatic object removal from a photo, a processing system comprising an image processing pipeline according to the invention, and an associated computer program product.

[0009] The invention provides crowd removal or object removal as a new feature of a camera, or of any device with an embedded camera, that allows a crowd or an object to be removed automatically from a photo using deep neural networks and semantic methods (object detection, semantic segmentation, etc.).

[0010] According to the first aspect of the invention, a computer-implemented method for automatic removal of an unmoving object from a photo is provided. The method according to the invention comprises providing a reference photo containing at least one unmoving object to be removed from the reference photo; providing at least one consecutive photo taken from another angle than the reference photo and containing at least one said unmoving object to be removed from the reference photo; for the reference photo and each at least one consecutive photo, performing separately: a) detection of at least one said unmoving object to be removed from the reference photo, where the photo is divided into two planes, wherein at least one object to be removed is assigned to the first plane and the second plane comprises elements to be outputted in a final photo, b) division of all captured photos into tiles, c) calculation of a cost function for each tile, wherein only tiles of the second plane are processed; next, based on the tiles from all photos, searching for the best selection of tiles by performing optimization of the global cost function; and outputting the final photo comprising the selected tiles that are the best combination of tiles for which the global function has the minimum value, said outputted final photo being the reference photo with at least one unmoving object removed and replaced by background from at least one consecutive photo.

[0011] Advantageous developments of the method for automatic object removal from a photo according to the invention are specified in the dependent claims.

[0012] The method in accordance with the invention may also have one or more of the following features, separately or in combination:

- steps from a) to c) are performed on the fly after providing each consecutive photo,

- detection of at least one unmoving object to be removed from the reference photo is a semantic object detection using a neural network,

- detection of at least one unmoving object results in production of an object mask, where the object mask comprises at least one object to be removed and the object mask is the first plane, separate from the second plane, where the second plane comprises elements to be outputted in the final photo,

- before step a) an alignment of each of at least one consecutive photo with the reference photo is performed,

- each photo is divided into at least 2×2 tiles,

- size of the tiles is adaptive,

- in the optimization step a loopy belief propagation algorithm is used,

- the step of outputting a new photo comprises blending,

- the step of providing at least one consecutive photo ends automatically,

- the step of alignment comprises key points determination using a combined corner and edge detector,

- in each consecutive iteration of the global cost function optimization step a new value of the cost function is calculated for all candidate tiles that are neighboring tiles of the tile replaced in the reference photo in the previous iteration.

[0013] According to a second aspect, a processing system for automatic object removal from a photo is provided. The system comprises at least a memory and a processor which is configured to implement the steps of the method according to the invention.

[0014] The invention also concerns a computer program product comprising instructions which, when executed by a processor, cause the processor to perform the steps of the method according to the invention.

[0013] The proposed invention allows for automatic removal of a crowd, a fence or other unmoving objects from a photo. Thanks to the acquisition of at least two photos, information regarding said objects and the background can be gathered over time.

[0014] Thanks to a specific cost function calculated for each predefined subregion of each consecutive image, it is possible to evaluate how well each distinct subregion taken from the other photos fits into the reference photo which undergoes editing.

[0015] Thanks to the context-based global optimization of the cost function, the proposed invention allows photos to be edited so as to ignore regions which should not change or whose change is not important in the context of a better photo.

BRIEF DESCRIPTION OF THE DRAWINGS

[0016] Other advantages and features will become apparent from reading the description of the invention and from the appended drawings, in which:

FIG. 1 shows an input photo and an output photo of the method according to the invention;

FIG. 2 shows key points found in each consecutive photo which are used for alignment of each consecutive photo with the reference photo;

FIG. 3 shows a flowchart of the method in accordance with one embodiment of the invention.

DETAILED DESCRIPTION

[0017] Aspects of the present invention are described below with reference to the drawings. In particular, figure 3 shows a flowchart of the method, apparatus (system) and computer program product according to the invention. It will be understood that each block of the flowchart and combinations of blocks in the flowchart, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special purpose computer, or other programmable data processing apparatus, such that the instructions, executed via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart.

[0018] The general idea of the invention is to allow users of a digital camera to take not only a single photo, as in the traditional camera mode, but to continuously capture consecutive frames for as long as required by the user in order to acquire enough information for further processing. In the preferred solution, the user of a digital camera may slightly change position while taking different frames, obtaining different angles of the image.

[0019] The invention is based on the assumption that, when obtaining different frames from different angles, different parts of the object of interest will be hidden on different frames by the object or objects to be removed, allowing deletion of the unwanted objects and creation of an unobstructed image of the object of interest by combining elements from different frames. In a way, each photograph is treated as if it were divided into two planes, the first plane comprising the object to be removed and the second comprising the rest of the scene, object of interest included. Preferably, the distance between the camera and the object to be removed is relatively small. Preferably, the distance between the object to be removed and the object of interest is bigger than the distance between the camera and the object to be removed, and therefore the second plane can be treated as planar.

[0020] The captured data is processed online to detect people and/or objects using deep neural networks and to divide the photo into planes, wherein said people and/or objects to be removed are assigned to the first plane and the second plane comprises elements to be outputted in a final photo. This information makes it possible to remove unwanted people and/or unmoving objects from the reference photo.

[0021] Figure 1 shows, in general, the purpose and the results of the method of automatic object removal from a photo. It can be observed that the input image contained a fence obscuring the building that was the object of interest. The output image received at the end of the method according to the invention contains only said building, while the fence is replaced by background of good quality (the second-plane elements that were behind the fence and occluded by it on the left picture of Fig. 1).

[0022] Now the method for automatic object removal from a photo according to the invention will be described with reference to Figure 3, which shows the associated flowchart of steps. The proposed method for automatic object removal from a photo is composed of several steps. The method comprises a step 100 of providing multiple photos; an image alignment step 200 performed in the alignment module; an object detection step 300 performed in the object detection module; a division step 400 performed in the background modeling module; a tile cost function calculation step 500 performed in the calculation module; a global cost optimization step 600 performed in the optimization module; a blending step 700 performed in the blending module; and a final result step 800. All the steps will be described in greater detail below.

[0023] The method begins with step 100 of providing multiple photos. This step 100 includes two substeps.

[0024] The first substep 101 is a substep of providing 'a reference photo' by means of a camera. The 'reference photo' is the first photo from a series of photos taken continuously by the user of the camera. The reference photo is the one from which the at least one unmoving object will be removed and replaced by a background obtained from another, consecutive photo.

[0025] In the preferred embodiment, the method according to the invention requires multiple photos to be taken. Apart from the reference photo provided in substep 101, a set of images is taken in a sequence by the user of the camera from slightly different positions, and therefore angles, during a predetermined period in substep 102, providing at least one consecutive photo. This set or series of images can comprise images registered by a standalone digital camera or by a digital camera embedded into another digital device such as a mobile phone. The images can be written into a separate memory before processing or can be captured directly from the camera.

[0026] The substep 102 of providing at least one consecutive input image preferably includes capturing consecutive photos and their parameters directly from the camera in on-the-fly mode. Alternatively, the reference photo and at least one consecutive photo can be provided by reading them from a memory, acquiring the image parameters, in particular size and resolution, and optionally displaying them on a display for user perception.

[0027] In one embodiment, if the method is performed on the fly, it is the user who decides when to stop acquiring consecutive photos. In practice the period of acquisition can last from 5 to 20 seconds. In another embodiment, the decision how many consecutive images should be processed is taken automatically based on the amount of acquired information; namely, the method stops automatically if the required amount of information has been gathered for removal of all detected objects.

[0028] Several steps used for processing each acquired photo will now be described below. The person skilled in the art will understand that those steps can be performed on the fly for each photo as it arrives, or can be performed in another technically reasonable sequence once all photos are registered.

[0029] The next step of the method of automated object removal from a photo is an image alignment step 200 performed in the alignment module. In this alignment step, each incoming image is aligned onto the reference frame of the first image, namely the reference photo, by use of key points and an ORB key point detector, described in more detail below. A 'reference frame' here means an initial position of the reference photo.

[0030] First, a substep 201 of key points determination is performed. For example, a Shi-Tomasi feature detector, related to the detector described in Chris Harris and Mike Stephens, "A Combined Corner and Edge Detector", Alvey Vision Conference, 1988, is used for the purpose of finding key points. By using this approach, key points can be found very quickly even on a mobile device. Key points are used to detect the relative position between consecutive input images. The result is shown in Fig. 2.
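
By way of illustration only, the snippet below shows one way such key points could be found with OpenCV, whose goodFeaturesToTrack function implements the Shi-Tomasi corner detector; the file name and parameter values are assumptions for this sketch, not taken from the patent.

```python
import cv2

# Illustrative sketch: detect Shi-Tomasi corners on one frame.
# The file name and parameter values below are assumptions.
frame = cv2.imread("consecutive_photo.jpg")
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Up to 500 corners, quality threshold at 1% of the strongest corner
# response, minimum 10 px spacing; fast enough for a mobile device.
corners = cv2.goodFeaturesToTrack(gray, maxCorners=500,
                                  qualityLevel=0.01, minDistance=10)
print(f"found {0 if corners is None else len(corners)} key points")
```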

[0031] Next, the image alignment step 200 comprises a matching substep 202. Within this step, each input image from the set of consecutive images is matched using, for example, the RANSAC algorithm as described in M. A. Fischler and R. C. Bolles, "Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography", Comm. of the ACM, 1981. Knowing the position of those key points in each consecutive input image, each incoming image can be warped into the reference frame of the first image, and such an aligned image can then be used in further processing. Performing this alignment step 200 guarantees that the content of each incoming input image can be objectively compared with the reference photo, namely the first image. This is required since the user is moving in time between taking two consecutive images and the same object is captured with different coordinates on each photo. It will be obvious to the person skilled in the art that the alignment step 200 will cause cutting of some parts of the consecutive images for further processing.

[0032] Each of the frames fed into the alignment module during the image alignment step 200 has its position adjusted to the previously collected frames. Namely, the first image from the set of images is adjusted to the reference image, the second image is adjusted to the first image, and so on. This is done by matching the key points of the image to be aligned, detected by the ORB key point detector, against the key points found on the reference frame (the frame of the previous image) using the GMS matcher. The detected key points must be distinctive and numerous enough to allow robust matches, so that finding a homography between the frames is possible.
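
A minimal sketch of substeps 201-202, under the assumption that OpenCV's ORB detector, the GMS matcher from the opencv-contrib package (cv2.xfeatures2d.matchGMS) and a RANSAC homography stand in for the modules described above; parameter values are illustrative.

```python
import cv2
import numpy as np

def align_to_reference(image, reference):
    """Warp `image` into the reference frame of `reference`.
    A sketch of substeps 201-202 with illustrative parameters."""
    orb = cv2.ORB_create(nfeatures=2000)
    kp1, des1 = orb.detectAndCompute(image, None)
    kp2, des2 = orb.detectAndCompute(reference, None)

    # Brute-force Hamming matching, then GMS filtering to keep only
    # robust matches (matchGMS ships with opencv-contrib).
    matches = cv2.BFMatcher(cv2.NORM_HAMMING).match(des1, des2)
    good = cv2.xfeatures2d.matchGMS((image.shape[1], image.shape[0]),
                                    (reference.shape[1], reference.shape[0]),
                                    kp1, kp2, matches,
                                    withRotation=False, withScale=False)

    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

    # RANSAC rejects outlier correspondences while estimating the homography.
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    h, w = reference.shape[:2]
    return cv2.warpPerspective(image, H, (w, h))
```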

[0033] The next step of the method of automated object removal from a photo is a semantic object detection step 300 performed in the object detection module. In this detection step 300, an EfficientDet-architecture-based neural network is used for object detection on the reference photo and each consecutive photo, to produce an object mask with the same resolution as the reference photo. In particular, in one embodiment, the object to be removed is a fence and the produced mask is a fence mask. This fence is detected on the reference photo and each consecutive photo. In another embodiment, other types of objects, such as tree branches, can be detected. In this step 300 of semantic object detection, the reference image and the other consecutive photos are processed so as to detect objects, in particular parts of the fence. In this step, regular rectangular regions of the photo are outputted by the object detection algorithm implemented with the above-mentioned neural network. The training data set comprises photos with labeled parts of the fence. Of course, the model can be trained for other types of objects, based also on a labeled data set. Here in this description, such a detected region is called 'a region of interest'. All captured images and detected regions of interest are temporarily registered. The information about which parts of a photo contain a detected object is stored in a specific data structure and later on assigned to each corresponding image subregion extracted in the next step of the method according to the invention. During the detection step 300, the detected object to be removed is assigned to the first plane, while the elements to be outputted in the final photo, which are not to be removed, are assigned to the second plane.
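
The network itself is not disclosed in detail, but the mask production can be sketched as follows, assuming a hypothetical detector that returns the rectangular regions of interest as (x0, y0, x1, y1) boxes:

```python
import numpy as np

def boxes_to_mask(photo_shape, boxes):
    """Rasterize detected regions of interest into a binary object mask
    with the same resolution as the photo. `boxes` is assumed to come
    from a detector such as the EfficientDet-based network mentioned
    above; the detector call itself is not shown."""
    mask = np.zeros(photo_shape[:2], dtype=np.uint8)
    for x0, y0, x1, y1 in boxes:
        mask[y0:y1, x0:x1] = 1   # first plane: object to be removed
    return mask                  # zero entries form the second plane
```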

[0034] Then both the aligned image and the object mask, for example the fence mask, are forwarded to the background reconstruction module in order to reconstruct the parts of the background of the reference image that are occluded by the fence.

[0035] The method then proceeds to the division step 400 of dividing all photos captured in substeps 101 and 102 into tiles, some of which comprise the object to be removed. Here, 'a tile' means a 2D rectangular subregion of an image, received by partitioning said image with the use of a net having a mesh of predetermined size. Viewed another way, a tile is a 3D element resulting from the partition of a 3D structure, the 3D structure consisting of the set of consecutive images. The first two dimensions of said 3D structure are the physical dimensions of each consecutive image, while the third dimension is time. Basically, each spatial position in the grid has several tiles collected at different time points and from different angles. This multidimensional grid can be represented as a Markov Random Field, where each tile is a node. After collecting many tiles from different time points, it should be possible to have at least one object-free (fence-free) tile for each position in the grid, which will make it possible to create the non-occluded background image from the tiles of the second plane, without the use of the tiles comprising the fence, that is, the tiles of the first plane. If there are not enough fence-free tiles (tiles of the second plane) for each position in the grid, the method goes back to step 102 of feeding a new consecutive frame, which is then forwarded to the alignment module.

[0036] In general, the size of a tile in 2D results from a predetermined number of meshes into which an image should be split in 2D, while said number of tiles depends on the computation power of the hardware and can be adaptive. In one embodiment, the decision how to partition a photo into tiles can be taken automatically, for example based on the hardware capacity of a specific mobile phone type. Preferably, the size and number of tiles are predetermined. Each photo is divided into at least 2×2 tiles. For example, photos are partitioned into 21×11 tiles. In the case of photos having a resolution of 1920×1080 and being partitioned into 21×11 tiles, the size of a tile is (1920/21) × (1080/11). Each tile of the partitioned image has adjacent tiles, also called neighboring tiles or neighbors. The particulars of the relation between the tiles will be described in more detail further in the description.
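
For instance, taking the figures above:

$1920 / 21 \approx 91.4, \qquad 1080 / 11 \approx 98.2,$

so each tile measures roughly 91 × 98 pixels after rounding down.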

[0037] Each photo is partitioned using the same net and contains a predetermined number of rectangular subregions; namely, each photo is divided into exactly the same number of tiles. The person skilled in the art will understand that the smaller the size of a tile, the more detailed the possible processing. For example, more detailed processing is desired if the fence has a denser mesh or comprises more bars close together. On the other hand, the processing cannot be time-consuming, and preferably it is done in real time; thus the number of tiles can be determined dynamically in each case separately.

[0038] In consequence, the step 400 of dividing each photo into tiles results in a photo consisting of regular rectangular puzzle pieces. Tiles of the same sizes and the same coordinates are present on the reference photo and the other consecutive photos. Tiles from the consecutive photos are called candidate tiles. Theoretically, each candidate tile can replace the tile with the same coordinates on the reference photo. However, only chosen tiles from the reference photo should be replaced by a candidate tile from one of the consecutive photos, namely those comprising parts of the object to be removed. Moreover, each tile is assigned information about the presence of a detected object within its area. As a consequence, the output information from the trained neural network model, which is further kept in a tile, can also be used in checking whether the acquisition of photos can be stopped early. Namely, if for each tile in the reference photo for which an object to be removed has been detected there is at least one candidate tile in at least one consecutive photo which does not contain said detected object, then enough information has been acquired for processing.
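
A sketch of the tiling and of the early-stopping check described above, assuming the 21×11 grid from the example; the dictionary-based tile representation is an illustrative choice, not the patent's actual data structure.

```python
import numpy as np

def split_into_tiles(photo, mask, rows=11, cols=21):
    """Split a photo into a rows x cols grid and flag every tile that
    overlaps the object mask. Illustrative sketch; the dictionary
    representation is an assumption."""
    h, w = photo.shape[:2]
    th, tw = h // rows, w // cols
    tiles = []
    for r in range(rows):
        for c in range(cols):
            region = (slice(r * th, (r + 1) * th),
                      slice(c * tw, (c + 1) * tw))
            tiles.append({"pos": (r, c),
                          "pixels": photo[region],
                          "has_object": bool(mask[region].any())})
    return tiles

def enough_information(reference_tiles, candidate_photos_tiles):
    """Early-stop check: every reference tile containing the object must
    have at least one object-free candidate at the same grid position."""
    for ref in reference_tiles:
        if ref["has_object"] and not any(
                not t["has_object"]
                for tiles in candidate_photos_tiles
                for t in tiles if t["pos"] == ref["pos"]):
            return False
    return True
```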

[0039] Once the image is split into tiles, based on the neural network output, information about the presence or absence of a detected object within its area is assigned to each tile. This will later make it possible to decide which tile should be replaced, and whether it should be replaced at all, in further steps of the method.

[0040] For each photo, including the reference photo, the method according to the invention proceeds to a tile cost function calculation step 500, wherein the cost function for each tile from the second plane is calculated. Each tile of each image is considered as a separate node of the Markov Random Field (MRF). In practice, this step results in receiving a set of specific values $C(x_p)$ for each tile of the second plane, each value representing the degree to which the tile matches a specific position of the reference photo. Such a set of values is also calculated for the tiles in the reference photo.

[0041] The cost function of assigning a candidate tile can be represented by the equation:

[0042]

$C(x_p) = O(x_p) + \sum_{q \in N(p)} V_{pq}(x_p, x_q)$

The equation defines the cost of assigning a candidate tile $x_p$ to position $p$. The cost comprises:

- a cost $O(x_p)$ indicating whether the tile $x_p$ contains an unmoving object,

- a sum of costs $V_{pq}(x_p, x_q)$, each computed as the sum of squared differences (SSD) between the pixels of the tile $x_p$ and the adjacent pixels of a neighboring tile $x_q$; the sum is computed over all neighbor positions $q \in N(p)$ for the given position $p$.

[0043] The calculated cost function thus takes into consideration the following: whether a tile includes a specific part of an unmoving object, detected in the semantic detection step 300, and how well the tile fits its neighboring tiles (it favors tiles whose borders match each other).
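
Under the same illustrative tile representation as in the earlier sketches, the per-tile cost could be computed as below; the numeric object penalty is an assumption, since the patent gives no weight for the O component.

```python
import numpy as np

OBJECT_PENALTY = 1e9  # assumed weight: effectively forbids first-plane tiles

def border_ssd(tile, neighbor, side):
    """V_pq: sum of squared differences between the adjacent pixel row or
    column of a tile and of its neighbor on the given side."""
    if side == "left":
        a, b = tile[:, 0], neighbor[:, -1]
    elif side == "right":
        a, b = tile[:, -1], neighbor[:, 0]
    elif side == "top":
        a, b = tile[0, :], neighbor[-1, :]
    else:  # "bottom"
        a, b = tile[-1, :], neighbor[0, :]
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float(np.sum(diff ** 2))

def tile_cost(tile, neighbors):
    """C(x_p) = O(x_p) + sum over neighbor positions q of V_pq(x_p, x_q).
    `neighbors` maps a side name to the currently selected tile at the
    adjacent grid position."""
    cost = OBJECT_PENALTY if tile["has_object"] else 0.0
    for side, neighbor in neighbors.items():
        cost += border_ssd(tile["pixels"], neighbor["pixels"], side)
    return cost
```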

[0044] The Applicant found that in on-the-fly mode the processing of one incoming photo (including steps 100, 200, 300, 400 and 500) takes about 500 ms, which allows for real-time calculation and is satisfactory for commercial use.

[0045] The next step of the method according to the invention is a step 600 of global cost optimization, which enables choosing the optimal combination of tiles, forming a new image from tiles selected from all captured images. In practice, this step is iterative and enables choosing the combination of tiles which gives the smallest possible cost. By iteratively optimizing the global cost function, a decision is taken for each tile in the reference photo whether it should be replaced by one of the tiles having the same coordinates from one of the consecutive photos.

[0046] If the tiles' combination is optimal, then:

- unmoving objects are replaced with the background that would be visible in the absence of said unmoving objects,

- there are no sharp transitions between tiles,

- there are no traces of the unmoving object,

- there are no incomplete objects, like a section of the fence hanging "in the air".

[0047] In the step 600 of global cost optimization, any algorithm that is typically used for optimizing MRFs, e.g. Loopy Belief Propagation, can be utilized. The choice of the algorithm determines the trade-off between the computation speed and the possibility of convergence to the optimal configuration. However, in one embodiment the following greedy optimization algorithm can be used.

[0048] The calculation is initialized with tiles taken from the reference frame (of the reference photo). Then each tile is checked by iterating in random order. During said checking, all available versions of a specific tile (namely, all other tiles having the same coordinates within the net) are compared with the tile from the reference photo and their costs are calculated. The tile with the lower cost replaces the tile with the higher cost. After a replacement, the neighborhood of the replaced tile is recomputed; this means that new values of the cost function are calculated for all four neighboring tiles of the tile that has just been replaced.

[0049] In other words, in the first iteration of the optimization algorithm the global cost function value for all tiles from the reference photo is known. Moreover, a single cost function value is known for each candidate tile having a position which corresponds to a considered tile in the reference image. During one iteration, theoretically all single tiles from the reference photo are virtually replaced by the corresponding candidate tile from each consecutive photo. This process is random in terms of the tile position in the reference photo for which the calculation is performed. However, said iteration ends if a replacing tile for which the global cost function value is lower is found. Then another iteration starts, and early stopping of the optimization step is not possible until the tile with the lowest value of the global cost function is found.

[0050] In each consecutive iteration, all calculations regarding the global cost function are performed in the same way. However, before each consecutive iteration, the cost function is recalculated for the four neighboring candidate tiles of the tile that was replaced in the previous iteration. This is because the cost function for a specific tile takes into account a factor resulting from the neighboring tiles; the cost function therefore becomes out of date for a candidate tile once one of its neighboring tiles in the reference photo has been replaced. The value of the global cost function can be determined correctly in the consecutive iteration only for updated input data. If the calculated value of the global cost function is lower, then again the replacement takes place and the iteration stops; otherwise the calculation is performed randomly for another tile in the reference photo.

[0051] The optimization algorithm stops if the predetermined number of iterations has been performed or if there are no more tiles in the reference photo for which a replacement would result in a lower value of the global cost function (no more replacements take place).
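
A minimal sketch of this greedy loop, reusing the tile dictionaries and the tile_cost helper from the earlier sketches; the iteration cap and the full-pass stopping test are illustrative simplifications of paragraphs [0048]-[0051].

```python
import random

def greedy_optimize(grid, candidates, max_iters=100):
    """grid: dict mapping (row, col) -> currently selected tile,
    initialized from the reference photo. candidates: dict mapping
    (row, col) -> list of tiles with those coordinates from all photos."""
    def neighbors_of(pos):
        r, c = pos
        sides = {"top": (r - 1, c), "bottom": (r + 1, c),
                 "left": (r, c - 1), "right": (r, c + 1)}
        return {s: grid[p] for s, p in sides.items() if p in grid}

    for _ in range(max_iters):
        replaced = False
        positions = list(grid)
        random.shuffle(positions)          # visit tiles in random order
        for pos in positions:
            nbrs = neighbors_of(pos)
            best = min(candidates[pos], key=lambda t: tile_cost(t, nbrs))
            if tile_cost(best, nbrs) < tile_cost(grid[pos], nbrs):
                grid[pos] = best           # neighbors' costs are now stale
                replaced = True            # and are re-evaluated next visit
        if not replaced:                   # a full pass with no replacement:
            break                          # the configuration is stable
    return grid
```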

[0052] Given the possibility of converging to a sub-optimal solution, restarting the algorithm multiple times with random initialization can be considered. The number of restarts should scale with the number of computations that can be performed according to the specification of the final device in which the method will be implemented.

[0053] If during optimization there is no progress and the loss is not diminishing, the global cost optimization stops, which can be called early stopping.

[0054] Then the method proceeds to the step 700 of blending. This step is performed in order to output a final photo having at least one object removed and replaced by background. In one embodiment, once a region (tile) to be reconstructed is calculated in a specific iteration of the optimization step, it is blended into the reference photo, for example if on-the-fly displaying is required. In another embodiment, blending is performed after the end of the optimization step. Blending can be done by any suitable algorithm, for example using Poisson blending as described in Patrick Pérez, Michel Gangnet, Andrew Blake, "Poisson Image Editing", 2003. The purpose of this step is to receive a photo of very good quality. The Poisson blending algorithm alleviates color differences and reduces the number of artifacts. The regions (tiles) are blended onto the reference image to remove the crowd and unwanted objects and produce the final result.
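
For illustration, OpenCV's seamlessClone function implements Poisson blending and could be used to blend a replaced tile into the reference photo; the whole-tile mask and the center computation below are assumptions for a rectangular tile.

```python
import cv2
import numpy as np

def blend_tile(reference, tile_pixels, top_left):
    """Poisson-blend a replacement tile into the reference photo.
    `top_left` is the (x, y) corner of the tile in reference coordinates;
    a sketch using OpenCV's implementation of Poisson image editing."""
    h, w = tile_pixels.shape[:2]
    mask = 255 * np.ones((h, w), dtype=np.uint8)   # blend the whole tile
    center = (top_left[0] + w // 2, top_left[1] + h // 2)
    return cv2.seamlessClone(tile_pixels, reference, mask,
                             center, cv2.NORMAL_CLONE)
```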

[0055] The processing system according to the invention (not shown) comprises a memory and a processor which is configured (by comprising several software modules) to implement the steps of the method as described above. In one embodiment, the processing system is embedded into a mobile phone. The mobile phone according to the invention comprises display means (not shown) for displaying the output of the software modules. From the user's perspective, the processing system allows the following actions: the user starts object removal and the first photo is taken; the user continues object removal to collect data and more photos are collected; the user stops object removal and the final result is computed. The processing system uses the collected data to remove objects, for example a fence, from the first photo.

[0056] In another embodiment, the processing system can comprise a module for supporting steady holding of the camera. Said module outputs information on the position of the camera to be displayed by the display means. This increases the chances of acquiring non-blurred photos of the same quality. In one embodiment, the display means are configured to show the reference photo with frames surrounding the detected objects and a camera position indicator. The user can see when enough information has been acquired to remove a specific object, for example based on the frame color. In another embodiment, the method stops automatically if the required amount of information has been gathered for removal of all detected objects to be removed.

[0057] Aspects of the present invention can be realized by a computer of a system or apparatus (or devices such as a CPU or MPU) that reads out and executes a computer program product recorded on a memory device to perform the functions of the above-described embodiment(s), and by a method, the steps of which are performed by a computer of a system or apparatus by, for example, reading out and executing a program recorded on a memory device to perform the functions of the above-described embodiment(s). For this purpose, the claimed computer program product is provided to the computer for example via a network or from a recording medium of various types serving as the memory device. The computer program product according to the invention comprises also a non-transitory machine-readable medium.

[0058] It should be understood that the present invention is not limited to the above examples. For those of ordinary skill in the art, improvements or changes can be made according to the above description, and all these improvements and changes should fall within the protection scope of the appended claims of the present invention.