Title:
SYSTEM AND METHOD FOR AUTOMATICALLY LABELLING MEDIA
Document Type and Number:
WIPO Patent Application WO/2024/057124
Kind Code:
A1
Abstract:
The improved system and method disclosed herein includes training an auto-labelling engine iteratively until the performance of the auto-labelling engine is satisfactory. The auto-labelling engine is trained based on a combination of a plurality of manually annotated first media and a plurality of second media without annotation. At least one media for labelling is received and utilized by the trained auto-labelling engine to label and to draw at least one bounding box around an object in the received media.

Inventors:
MANI CHITHRAI (US)
SANJAY NITHESH (US)
GHOSH RIA (US)
Application Number:
PCT/IB2023/058500
Publication Date:
March 21, 2024
Filing Date:
August 28, 2023
Assignee:
DIGIT7 INDIA PRIVATE LTD (US)
International Classes:
G06F18/21; G06F18/23; G06N20/00
Domestic Patent References:
WO2018157746A12018-09-07
Foreign References:
US20160034512A12016-02-04
US20210357303A12021-11-18
CN110276369A2019-09-24
US9652460B12017-05-16
Attorney, Agent or Firm:
WASHAM, Steven (US)
Claims:
CLAIMS

We claim:

Claim 01 A system for automatically labelling electronic media, the system comprising: a first electronic device (300) including: a memory (302); an auto-labelling engine (310); an interface (304); a display (306); and a programmable controller (308) in communication with the memory, interface, and display, the controller adapted to execute stored program instructions for performing the processing steps comprising: receiving a first training dataset (502) including a plurality of annotated media first objects, each first object including a corresponding label (406, 702); receiving a second training dataset (504) including a plurality of unannotated media second objects; training (402, 602, 704) the auto-labelling engine (310) to label each second object based upon comparisons with the labels of the first objects, wherein the auto-labelling engine assigns a label to each of the second objects; presenting (706), with the display (404), the second object labels and the first object labels to a user for manual review, and accepting from the user a performance indication corresponding to a satisfactory or an unsatisfactory rating concerning the manual review; if the performance indication is unsatisfactory, iterating (708) these previous training steps with a reduced number of first objects until a satisfactory performance indication is achieved; and labelling (604, 710), with the auto-labelling engine, any unlabelled media objects (506).

Claim 02 The system of Claim 01, the system further comprising: a second electronic device communicatively coupled via the interface (304) to the first electronic device (300) to provide the plurality of media and training datasets upon which the first electronic device operates.

Claim 03 The system of Claim 01, the processing steps further comprising: drawing, with the auto-labelling engine (310), a bounding box around at least one labelled second object, the bounding box visible to the user with the display.

Claim 04 The system of Claim 01, the processing steps further comprising: receiving a third media dataset including a plurality of unannotated third objects; labelling (710) each third object with the trained auto-labelling engine (310); and drawing a bounding box around at least one labelled third object.

Claim 05 The system of Claim 01, wherein the trained auto-labelling engine (310) is a neural network model, the training iterations (704; 706; 708) using a ratio of a number of manually annotated first media to a number of training iterations.

Claim 06 A method for automatically labelling electronic media, the method steps comprising: training (602, 704), by an electronic device controller (308), an auto-labelling engine (310) using a combination of a plurality of manually annotated first media (502) and a plurality of unannotated second media (504); and using (604) the trained auto-labelling engine (310) to label (710) at least one object in at least one unlabelled media received for labelling (506).

Claim 07 The method of Claim 06, the method steps comprising: drawing, with the auto-labelling engine (310), at least one bounding box around the at least one second object in the at least one second media using labels of objects present in the plurality of manually annotated first media.

Claim 08 The method of Claim 06, the method steps comprising: initiating, during training, an iteration of training (704; 706; 708) the auto-labelling engine (310) to label at least one second object in at least one second media using the plurality of manually annotated first media; determining (706), by a user, a satisfactory or unsatisfactory performance rating of the auto-labelling engine by reviewing the label of the at least one second object in the at least one second media, and assigning a satisfactory or unsatisfactory rating; reducing a number of manually annotated media from that number used in a previous iteration if the performance of the auto-labelling engine is determined to be unsatisfactory (708); and further training (704) the auto-labelling engine (310) until the performance of the auto-labelling engine is determined to be satisfactory.

Claim 09 The method of Claim 08, the method steps comprising: drawing, with the auto-labelling engine (310), at least one bounding box around the at least one second object in the at least one second media using labels of objects present in the plurality of manually annotated first media.

Claim 10 The method of Claim 08 wherein the auto-labelling engine comprises a neural network model, the training iterations (704; 706; 708) using a ratio of a number of manually annotated first media to a number of training iterations.

Description:
System and Method for Automatically Labelling Media

TECHNICAL FIELD

[0001] Embodiments herein relate to computer-automated electronic media labelling and annotation for use with AI learning models.

BACKGROUND ART

[0002] Bounding boxes are a conventional image annotation method utilized in machine learning, deep learning, and computer vision image identification applications, among others. A depiction of the results of this approach is presented in Figures 1 and 2, where these bounding boxes are used to outline, annotate, and/or label the objects, classify them, and localize them within the image according to application-specific requirements.

[0003] For background, Figure 1 presents a scene with three objects of interest, namely, a dog (100) , a bicycle (102) , and an automobile (104) . Shown are rectangles drawn around each with a corresponding label ("dog," "bicycle," and "car") assigned to the upper left corner of the rectangle. In Figure 2, a multitude of objects of interest, namely, automobiles (200) and pedestrians (202) , are present. Each auto and each pedestrian is identified and a frame is drawn around each with a corresponding label attached.

[0004] This conventional approach involves an operator manually drawing the bounding boxes around the objects present in the image. However, such efforts can become quite time-consuming depending on the number of objects present in the image, and may even require months to years for very large image datasets comprising thousands or millions of images. Thus, this conventional drawing and annotating method tends to be extremely time-consuming and stressful, and is highly inefficient with regard to time, effort, and utilization. What is needed is a system and method for object-of-interest identification, framing, and annotation that is capable of processing very large datasets quickly, surmounting the above-stated shortcomings.

SUMMARY OF INVENTION

[0005] A system for automatically labelling media is provided as set out in claim 1 and a method for automatically labelling media is set out in claim 6. Dependent claims present additional alternative elements.

[0006] The invention presents a novel system and method for automatically labelling objects present in electronic media. The invention addresses the inefficiencies and inaccuracies of traditional manual and other object identification through utilization of a specially and uniquely trained auto-labelling engine as part of an electronic device including memory, interface connectivity, a user display, and a programmable controller for receiving the training and other datasets upon which operation is desired.

[0007] In a first embodiment, training begins with a first dataset containing a plurality of objects that are pre-annotated, and a second dataset containing a plurality of unannotated objects. The auto-labelling engine assigns a label to each unannotated object based upon comparisons with the annotated objects and presents this to a user for review. If the electronic device receives a satisfactory performance review indication from the user, the auto-labelling engine is considered trained and is used for subsequent object annotation. However, if the electronic device receives an unsatisfactory performance review, the training continues on an iterative basis until satisfactory performance is achieved. On each iteration the size of the dataset containing the initial annotated first objects is reduced from the previous iteration until satisfactory performance is indicated. Additional embodiments include additional devices for supplying datasets and the added capability of labelling and drawing bounding boxes around each object.

[0008] The novel features that are considered as characteristic for the invention are set forth particularly in the appended claims. The invention itself, however, both as to its construction and methods of operation, together with additional objects and advantages thereof, will be best understood from the following description of specific embodiments when read in conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF DRAWINGS

[0009] FIGURE 1 depicts a conventional approach to drawing bounding boxes and labelling images using a manual process with a small number of objects, namely a dog, bicycle, and car.

[0010] FIGURE 2 depicts another conventional approach to drawing bounding boxes and labelling images using a manual process with a large number of objects, namely, automobiles and pedestrians in a congested urban setting.

[0011] FIGURE 3 depicts a block diagram representation of an electronic device for automatically labelling objects in media, according to embodiments as disclosed herein.

[0012] FIGURE 4 depicts a block diagram representation of a controller of the electronic device for automatically labelling the media, according to embodiments as disclosed herein .

[0013] FIGURE 5 presents a diagram depicting automatic labelling of the media, according to embodiments as disclosed herein.

[0014] FIGURE 6 presents a flow diagram depicting a method for automatically labelling the media, according to embodiments as disclosed herein.

[0015] FIGURE 7 presents a flow diagram depicting a method for training an auto-labelling engine to automatically label the objects in the media, according to embodiments as disclosed herein.

[0016] The above figures are provided for illustration and description only, and are not intended to limit the disclosed invention. Use of the same reference number in multiple figures is intended to designate the same or similar parts.

DESCRIPTION OF EMBODIMENTS

[0017] The embodiments herein and the various features and advantageous details thereof are explained more fully with reference to the non-limiting embodiments that are illustrated in the accompanying drawings and detailed in the following description. Descriptions of well-known components and processing techniques are omitted so as to not unnecessarily obscure the embodiments herein. The examples used herein are intended merely to facilitate an understanding of ways in which the embodiments herein may be practiced and to further enable those of skill in the art to practice the embodiments herein. Accordingly, the examples should not be construed as limiting the scope of the embodiments.

[0018] When used herein, "computer readable medium" means any tangible portable or fixed RAM or ROM device, such as, for example but not limitation, portable flash memory, a CD-ROM, a DVD-ROM, embedded RAM or ROM integrated circuit devices, or the like, or some combination thereof.

[0019] Referring now to the drawings, and more particularly to FIGS. 3 through 7, where similar reference characters denote corresponding features consistently throughout the figures, there are shown embodiments.

[0020] Figure 3 depicts a programmable electronic device (300) for automatically labelling objects in electronic media, according to embodiments as disclosed herein. This electronic device (300) is adapted to automatically label/annotate the one or more objects present in the media. Examples of the electronic device (300) structure may be, but are not limited to, a computer server, a personal computing device, a cloud computing device (which may be a part of a public cloud or a private cloud), a multiprocessor system, a microprocessor-based programmable computer, a minicomputer, a mainframe computer, a database, or any other device capable of executing program instructions to achieve labelling the objects present in the media. For example, the server may be at least one of, but is not limited to, a standalone server, a server on a cloud, a bank of servers, or some combination thereof. In another example, the computing device may be at least one of, but is not limited to, a personal computer, a notebook, a tablet, a desktop computer, a laptop, a handheld device, a mobile device, a medical device, and so on. Also, the electronic device (300) may be at least one of a microcontroller, a processor, a System on Chip (SoC), an integrated chip (IC), a microprocessor-based programmable consumer electronic device, or the like.

[0021] Examples of the electronic media objects ("objects") may be, but are not limited to, people, animals, items, retail products, vehicles, and the like, or some combination thereof. Examples of the electronic media ("media") may be, but are not limited to, digital images, videos, digital animations, Graphic Interchange Format (GIF) files or the like, direct camera output video feeds, buffered video feeds, or some combination thereof.

[0022] The electronic device (300) automatically labels/annotates the one or more objects in the media by drawing virtual bounding boxes around the one or more objects in the media. The bounding boxes may outline the objects in boxes in accordance with requirements of the applications in which the objects have to be labelled. Examples of the applications may be, but are not limited to, autonomous vehicle driving, retail clothing, furniture detection and satellite imagery, analysis of drone and robotics imagery, indoor object detection, retail stores (for example, tracking customers and objects bought by customers), traffic analysis, weather pattern analysis, object detection in medical labs, farming (crop identification and sorting), and the like. In an embodiment, phrases such as "labelling the media", "drawing the bounding boxes around the objects in the media", "media data labelling", and so on, are used interchangeably to refer to labelling of the objects in the media.

[0023] The electronic device (300) embodiment includes a memory (302), an interface (304), a display (306), and a controller (308). The electronic device (300) may also be communicatively coupled with one or more, or a combination of, external electronic devices (for example, a server, a database, one or more cameras or imaging devices, or the like, or some combination thereof) using a communication network interface to receive a plurality of media and training datasets. Examples of the communication network may be, but are not limited to, the Internet, a wired network (a Wide Area Network (WAN), Local Area Network (LAN), Ethernet, and the like), a wireless network (a Wi-Fi network, a cellular network, a Wi-Fi Hotspot, Bluetooth, ZigBee, and so on), and the like, or some combination thereof. The plurality of media may be received over this network for labelling.

[0024] Training datasets comprise a first training dataset and a second training dataset. The first training dataset may include a plurality of manually annotated media. Each of the plurality of manually annotated media may include one or more objects which have been labelled manually by a user. The second training dataset may include training media without labelled/annotated objects. The first training dataset and the second training dataset may be captured in various backgrounds, lighting conditions, environments, angles, and so on. Embodiments herein use the terms "first training dataset", "manually annotated media", "manually annotated first media", "first media", and so on, interchangeably to refer to media/datasets which include the manually labelled objects. Embodiments herein use the terms "second training dataset", "unannotated second media", "second media", and so on, interchangeably to refer to media/datasets which include objects that are not labelled/annotated.

[0025] The memory (302) referred to herein may include at least one of, but is not limited to, NAND, embedded Multimedia Card (eMMC), Secure Digital (SD) cards, micro-SD cards, Compact Flash (CF) cards, Universal Serial Bus (USB), Serial Advanced Technology Attachment (SATA), solid-state drive (SSD), and so on. The memory (302) may also include one or more computer-readable storage media. The memory (302) may also include non-volatile storage elements. Examples of such non-volatile storage elements may include magnetic hard discs, optical discs, floppy discs, flash memories, or forms of electrically programmable memories (EPROM) or electrically erasable and programmable (EEPROM) memories. In addition, the memory (302) may, in some examples, be considered a non-transitory storage medium. The term "non-transitory" may indicate that the storage medium is not embodied in a carrier wave or a propagated signal. However, the term "non-transitory" should not be interpreted to mean that the memory (302) is non-movable. In certain examples, a non-transitory storage medium may store data that can, over time, change (e.g., in Random Access Memory (RAM) or cache).

[0026] The memory (302) may store at least one of, but is not limited to, the plurality of media, the training datasets, labels of the one or more objects, an automatic (auto)-labelling engine (310), controller (308) program instructions, and the like. The label may correspond to a tag/identifier that uniquely identifies the associated object.

[0027] The auto-labelling engine (310) may be a neural network module algorithm, which may be trained by the controller (308) to automatically label the one or more objects in the plurality of media or in a very large dataset. Examples of the auto-labelling engine (310) include, but are not limited to, a convolutional neural network (CNN) model, a machine learning model, an Artificial Intelligence (AI) model, a deep neural network (DNN) model, a recurrent neural network (RNN) model, a restricted Boltzmann Machine (RBM) model, a deep belief network (DBN) model, a bidirectional recurrent deep neural network (BRDNN) model, generative adversarial networks (GAN), a regression-based neural network, a deep reinforcement model (with ReLU activation), a deep Q-network, a You Only Look Once (YOLO) model, and so on. The auto-labelling engine (310) may include a plurality of layers. Examples of the layers may be, but are not limited to, a convolutional layer, an activation layer, an average pool layer, a max pool layer, a concatenated layer, a dropout layer, a fully connected layer, a SoftMax layer, and so on. Each layer has a plurality of weight values and performs a layer operation through calculation of a previous layer and an operation of a plurality of weights/coefficients.

[0028] A topology of the layers of the auto-labelling engine may vary based on a type of the auto-labelling engine (310). In an example, the auto-labelling engine (310) may include an input layer, an output layer, and a hidden layer. The input layer receives an input (for example: the training datasets) and forwards the received input to the hidden layer. The hidden layer transforms the input received from the input layer into a representation, which can be used for generating the output in the output layer. The hidden layers extract useful/low-level features from the input, introduce non-linearity in the network, and reduce a feature dimension to make the features equivariant to scale and translation. The nodes of the layers can be fully connected via edges to the nodes in adjacent layers. The input received at the nodes of the input layer can be propagated to the nodes of the output layer via an activation function that calculates the states of the nodes of each successive layer in the network based on coefficients/weights respectively associated with each of the edges connecting the layers. Embodiments herein use the terms such as "neural network model", "learning model", "media labelling and management tool", and so on, interchangeably to refer to a model which has been trained and used for automatically labelling the media.
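
As a purely illustrative aid (not part of the claimed subject matter), the layer-by-layer propagation described above can be sketched in a few lines of Python. The layer sizes, the ReLU activation, and the random weights below are assumptions chosen only to make the sketch runnable.

```python
import numpy as np

def forward(x, weights, biases):
    """Propagate an input through fully connected layers: each layer
    computes activation(W @ x + b), so the edge weights/coefficients
    determine how node states flow to the next layer."""
    for W, b in zip(weights, biases):
        x = np.maximum(0.0, W @ x + b)  # ReLU activation (an assumed choice)
    return x

# Example topology: 8-dim input -> 16-node hidden layer -> 4-dim output.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(16, 8)), rng.normal(size=(4, 16))]
biases = [np.zeros(16), np.zeros(4)]
print(forward(rng.normal(size=8), weights, biases))
```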

[0029] The interface (304) may be configured to enable the electronic device (300) to communicate with one or more external electronic device entities using an interface supported by the communication network. Examples of the interface may be, but are not limited to, a wired interface, a wireless interface, any structure supporting communications over a wired or wireless connection, or some combination thereof.

[0030] The display (306) is configured to enable a user to interact with the electronic device (300). The display (306) is adapted to receive from the controller (308) the annotated media/first training dataset representation, wherein the annotated media includes the one or more objects labelled manually by the user, and to visually represent to the user the automatically labelled objects in the media.

[0031] The controller (308) is adapted to automatically label/annotate the objects in the given media. The controller (308) referred to herein may include one or a plurality of processors. One or a plurality of processors may be a general-purpose processor, such as a central processing unit (CPU), an application processor (AP), or the like, a graphics-only processing unit such as a graphics processing unit (GPU), a visual processing unit (VPU), and/or an Artificial Intelligence (AI)-dedicated processor such as a neural processing unit (NPU), or some combination thereof.

[0032] The controller (308) trains the auto-labelling engine (310) and then uses the trained auto-labelling engine (310) to automatically draw the one or more virtual bounding boxes around the one or more objects detected in the media, thereby automatically labelling the one or more detected objects in the media.
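
For illustration only, a minimal way to overlay such labelled bounding boxes on an image is sketched below in Python using Pillow; the detection tuple format and file name are assumptions of this sketch, not formats defined by the disclosure.

```python
from PIL import Image, ImageDraw

def overlay_boxes(image_path, detections):
    """Overlay labelled bounding boxes on an image. 'detections' is an
    assumed format: a list of (label, x_min, y_min, x_max, y_max)."""
    img = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    for label, x1, y1, x2, y2 in detections:
        draw.rectangle([x1, y1, x2, y2], outline="red", width=2)  # the box
        draw.text((x1, max(0, y1 - 12)), label, fill="red")       # its label
    return img

# Example usage (hypothetical file and box):
# overlay_boxes("scene.jpg", [("dog", 40, 60, 220, 300)]).show()
```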

[0033] In this embodiment the controller (308) trains the auto-labelling engine (310) in a plurality of iterations, until a performance of the auto-labelling engine (310) is satisfactory. In each iteration the controller (308) trains the auto-labelling engine (310) based on the first training dataset/plurality of manually annotated media objects, and a second training dataset/training media object set.

[0034] For training the auto-labelling engine (310), the controller (308) receives the first training dataset and the second training dataset from the memory (302) or from an external entity. The controller (308) initiates a first iteration of training the auto-labelling engine (310) and then feeds the received first training dataset and second training dataset to the auto-labelling engine (310). The auto-labelling engine (310) automatically draws the one or more virtual bounding boxes around the one or more objects detected in each media of the second training dataset to label the corresponding one or more objects. The auto-labelling engine (310) takes one image at a time and generates output values (bounding box coordinate values and object class confidence score). This output, along with the respective image, is sent to a user interface display (306), which uses a visualizer function to overlay the bounding boxes on the image.

[0035] The auto-labelling engine (310) automatically draws the bounding boxes around the objects present in each media of the second training dataset by learning the labels of the objects present in the plurality of manually annotated media of the first training dataset. The auto-labelling engine (310) can utilize a supervised learning approach to learn labels of the objects. The auto-labelling engine (310) takes an image as an input and outputs bounding box coordinates and an object class probability score for detected objects. The auto-labelling engine (310) compares these bounding box coordinates with the manually labelled ground truth bounding box coordinates to determine a loss value. The learning process attempts to minimize the loss value to a number range between 0 and 0.5, which can be done through a machine learning approach called back propagation.
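
To make the supervised flow of paragraph [0035] concrete, the sketch below shows one loss-and-backpropagation step in Python using PyTorch. The toy model, the smooth-L1 box loss, and the cross-entropy class loss are assumptions of this sketch; the disclosure does not prescribe a specific model architecture or loss formulation.

```python
import torch
import torch.nn.functional as F

class TinyDetector(torch.nn.Module):
    """Toy stand-in for the auto-labelling engine: maps an image to one
    set of bounding box coordinates plus class logits."""
    def __init__(self, num_classes=3):
        super().__init__()
        self.backbone = torch.nn.Linear(3 * 32 * 32, 64)
        self.box_head = torch.nn.Linear(64, 4)   # x_min, y_min, x_max, y_max
        self.cls_head = torch.nn.Linear(64, num_classes)

    def forward(self, image):
        feat = torch.relu(self.backbone(image.flatten(1)))
        return self.box_head(feat), self.cls_head(feat)

model = TinyDetector()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
image = torch.rand(1, 3, 32, 32)                # one training image
gt_box = torch.tensor([[0.2, 0.2, 0.8, 0.8]])   # ground-truth box coordinates
gt_label = torch.tensor([1])                    # ground-truth object class

pred_box, logits = model(image)
loss = F.smooth_l1_loss(pred_box, gt_box) + F.cross_entropy(logits, gt_label)
optimizer.zero_grad()
loss.backward()    # back propagation, driving the loss value down
optimizer.step()   # one gradient update
```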

[0036] On labelling the objects in each media of the second training dataset, the controller (308) checks the performance of the auto-labelling engine (310). Checking the performance of the auto-labelling engine (310) refers to determining whether each media object of the second training dataset has been labelled correctly or not. In an example, the controller (308) may allow the user to check the performance of the auto-labelling engine (310). If the performance of the auto-labelling engine (310) is satisfactory (i.e., the plurality of media objects of the second training dataset have been labelled correctly), the electronic device (300) uses the auto-labelling engine (310) to label the objects in the media received for labelling or to label the remaining training datasets. A sample set of auto-labelled objects is sent to the user interface display (306) to be manually inspected by a user for bounding box accuracy. Accuracy is determined to be satisfactory if the object lies inside the bounding box and the bounding box fits the object without large gaps between its edges and the object perimeter. The gap threshold depends on user preference. In general, a lower gap improves the accuracy of the bounding box.
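
The two review criteria just described (containment and small edge gaps) lend themselves to a simple check. The sketch below is an illustrative Python formulation under assumed (x_min, y_min, x_max, y_max) box coordinates; in the disclosure the review itself is performed manually by the user.

```python
def box_is_satisfactory(object_box, predicted_box, gap_threshold):
    """Apply the two review criteria: the object must lie inside the
    predicted bounding box, and the gap between each box edge and the
    object perimeter must not exceed the user-chosen threshold."""
    ox1, oy1, ox2, oy2 = object_box
    px1, py1, px2, py2 = predicted_box
    # Criterion 1: the object lies entirely inside the bounding box.
    inside = px1 <= ox1 and py1 <= oy1 and px2 >= ox2 and py2 >= oy2
    # Criterion 2: no large gaps between box edges and object perimeter.
    gaps = (ox1 - px1, oy1 - py1, px2 - ox2, py2 - oy2)
    return inside and all(g <= gap_threshold for g in gaps)

print(box_is_satisfactory((10, 10, 90, 90), (8, 8, 94, 93), gap_threshold=5))  # True
```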

[0037] If the performance of the auto-labelling engine (310) is deemed not satisfactory based upon established criteria (e.g., the plurality of media objects of the second training dataset have not been labelled correctly), the controller (308) further trains the auto-labelling engine (310) in the successive iteration(s) to label the objects. In each successive iteration, the controller (308) decreases the number of manually annotated media used in the previous iteration for training the auto-labelling engine (310). The number of total iterations is determined by the level of precision desired.

[0038] In an example embodiment, the controller (308) uses up to four iterations/stages to train the auto-labelling engine (310), thereby increasing a precision of the auto-labelling of the objects. In the first iteration/stage the controller (308) may use a minimum of 200 manually annotated media per object for training the auto-labelling engine (310). During each of the successive iterations/stages the controller (308) uses 50% of the number of media used in the previous iteration/stage. Once the auto-labelling engine (310) is trained, the controller (308) may use the trained auto-labelling engine (310) to label the objects in the given media automatically, and draw a bounding box around each object if desired. The trained auto-labelling engine (310) may be a trained model in which a number of layers, a sequence for processing the layers, and parameters related to each layer may be known and fixed for labelling the objects in the media. Examples of the parameters may be, but are not limited to, activation functions, biases, input weights, output weights, height, width, and so on.

[0039] In an embodiment, on training the auto-labelling engine (310), the controller (308) may deploy the trained auto-labelling engine (310) onto a target device (for example, at least one computing device, a device used for training the auto-labelling engine, or the like). The target device may use the trained auto-labelling engine (310) to automatically label the objects in the given media.
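
A compact way to express this halving schedule is sketched below in Python. The engine and review interfaces are assumptions of the sketch; only the schedule itself (200 annotated media per object, halved after each unsatisfactory review, up to four iterations) comes from the example embodiment above.

```python
def train_iteratively(engine, annotated_media, unannotated_media, review,
                      start_count=200, max_iterations=4):
    """Iterative training under the halving schedule described above.
    'engine.train', 'engine.label', and 'review' are assumed interfaces:
    review returns True when the user rates the labelling satisfactory."""
    count = start_count
    for iteration in range(1, max_iterations + 1):
        subset = annotated_media[:count]     # iteration 1: 200, then 100, 50, 25
        engine.train(subset, unannotated_media)
        labels = engine.label(unannotated_media)
        if review(labels):                   # user reviews the labelling
            break                            # satisfactory: training complete
        count //= 2                          # halve annotated media next round
    return engine
```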

[0040] The controller (308) may also be configured to receive the media from the user or from at least one external entity for labelling. The controller (308) uses the trained auto-labelling engine (310) to draw the bounding boxes around the one or more objects present in the received media for labelling the corresponding one or more objects.

[0041] Thus, the media may be automatically labelled and bounded using the auto-labelling engine (310) with minimal manual effort.

[0042] Figure 4 depicts the controller (308) of the electronic device (300) for automatically labelling the media objects, according to embodiments as disclosed herein. The controller (308) includes a training module (402) and a performance checking module (404) for training the auto-labelling engine (310) in the iterative approach to automatically label the given media objects. The controller (308) also includes a labelling module (406) to automatically label the given media using the trained auto-labelling engine (310).

[0043] The training module (402) may be configured to train the auto-labelling engine (310) based on the first training dataset and the second training dataset. The first training dataset includes the plurality of manually annotated media. The second training dataset includes the plurality of media without annotation. The training module (402) feeds the first training dataset and the second training dataset to the auto-labelling engine (310). The training module (402) further enables the auto-labelling engine (310) to label each media of the second training dataset by drawing the bounding boxes around the objects/second objects in each media of the second training dataset. The auto-labelling engine (310) draws the bounding boxes around the second objects based on the labels associated with the objects/first objects present in the plurality of manually annotated media.

[0044] The performance checking module (404) may be configured to check the performance of the auto-labelling engine (310) on labelling the media of the second training dataset based on the plurality of manually annotated media, and accordingly determine whether to further train the auto-labelling engine (310) in the iterations. The performance checking module (404) allows the user to check the performance of the auto-labelling engine (310) by reviewing the labelling of each media of the second training dataset. If the performance of the auto-labelling engine (310) is satisfactory, the performance checking module (404) may provide the instructions to the labelling module (406) or the training module (402) to use the trained auto-labelling engine (310) for labelling the given media or the remaining training datasets, respectively. If the performance of the auto-labelling engine (310) is not satisfactory, the performance checking module (404) provides instructions to the training module (402) for further training of the auto-labelling engine (310) using a reduced number of the manually annotated media used in the previous iteration for training the auto-labelling engine (310).

[0045] The labelling module (406) may be configured to receive the media and label the received media using the auto-labelling engine (310) . The labelling module (406) feeds the received media to the trained auto-labelling engine (310) , which labels the objects present in the media by drawing the bounding boxes around the objects present in the media.
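
The division of labour among the three modules of Figure 4 can be summarised in a skeletal Python class; the method names and the engine interface below are assumptions made for illustration, not structures defined by the disclosure.

```python
class Controller:
    """Skeleton of the Figure 4 module layout: training module (402),
    performance checking module (404), and labelling module (406)."""

    def __init__(self, engine):
        self.engine = engine  # the auto-labelling engine (310)

    def train(self, first_dataset, second_dataset):
        # Training module (402): feed both datasets to the engine and
        # have it label the second (unannotated) dataset.
        self.engine.train(first_dataset, second_dataset)
        return self.engine.label(second_dataset)

    def check_performance(self, labelled_media, user_review):
        # Performance checking module (404): the user reviews the labels
        # and returns a satisfactory/unsatisfactory rating.
        return user_review(labelled_media)

    def label(self, media):
        # Labelling module (406): label newly received media with the
        # trained engine.
        return self.engine.label(media)
```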

[0046] Figures 3 and 4 show exemplary embodiments of the electronic device (300), but it is to be understood that other embodiments are not limited thereto. In other embodiments, the electronic device (300) may include a lesser or a greater number of units. Further, the labels or names of the units are used only for illustrative purposes and do not limit the scope of the embodiments herein. One or more units can be combined together to perform the same or a substantially similar function in the electronic device (300).

[0047] The electronic device (300) trains an auto-labelling engine (310) and uses the trained auto-labelling engine (310) to automatically draw the one or more bounding boxes around the one or more objects present in the media, thereby automatically labelling the one or more objects in the media. The electronic device (300) iteratively trains the auto-labelling engine (310) based on a combination of the first training dataset and the second training dataset until the performance of the auto-labelling engine (310) is satisfactory. The training of the auto-labelling engine (310) involves enabling the auto-labelling engine (310) to label each media in the second training dataset based on the plurality of manually annotated media of the first training dataset.

[0048] Consider an example scenario wherein the electronic device (300) receives 200 manually annotated images/media per object (which include the objects that have been labelled manually by the user) as the first training dataset from the user. On receiving the 200 manually annotated images, the electronic device (300) initiates a first iteration of training the auto-labelling engine (310). In the first iteration, the electronic device (300) provides the 200 manually annotated images (iteration 1: 200 images) and the plurality of unannotated/unlabelled images/media of the second training dataset to the auto-labelling engine (310) for training. The auto-labelling engine (310) draws bounding boxes around the objects present in each image of the second training dataset by learning the labels associated with the objects present in the 200 manually annotated images. The bounding boxes that have been drawn around the objects present in each image may be used to label the corresponding objects. On labelling each image of the second training dataset in the first iteration, the electronic device (300) checks the performance of the auto-labelling engine (310) by reviewing the labelling of each image of the second training dataset. In this example, consider that the performance of the auto-labelling engine (310) is not satisfactory. In such a scenario, the electronic device (300) initiates a second iteration of training the auto-labelling engine (310). In the second iteration, the electronic device (300) provides only 100 manually annotated images (i.e., 50% of the manually annotated images used in the first iteration; iteration 2: 100 images) and the second training dataset including the unannotated images to the auto-labelling engine (310). The auto-labelling engine (310) labels the images of the second training dataset based on the 100 manually annotated images.

[0049] On labelling each image of the second training dataset in the second iteration, the electronic device (300) checks the performance of the auto-labelling engine (310) by reviewing the labelling of each image of the second training dataset and determines that the performance of the auto-labelling engine (310) is not satisfactory. In such a scenario, the electronic device (300) initiates a third iteration of training the auto-labelling engine (310). In the third iteration, the electronic device (300) provides only 50 manually annotated images (i.e., 50% of the manually annotated images used in the second iteration; iteration 3: 50 images) and the second training dataset including the unannotated images to the auto-labelling engine (310). The auto-labelling engine (310) labels the images of the second training dataset based on the 50 manually annotated images.

[0050] On labelling each image of the second training dataset in the third iteration, the electronic device (300) checks the performance of the auto-labelling engine (310) by reviewing the labelling of each image of the second training dataset and determines that the performance of the auto-labelling engine (310) is not satisfactory. In such a scenario, the electronic device (300) initiates a fourth iteration of training the auto-labelling engine (310). In the fourth iteration, the electronic device (300) provides only 25 manually annotated images (i.e., 50% of the manually annotated images used in the third iteration; iteration 4: 25 images) and the second training dataset including the unannotated images to the auto-labelling engine (310). The auto-labelling engine (310) labels the images of the second training dataset based on the 25 manually annotated images.

[0051] On labelling each image of the second training dataset in the fourth iteration, the electronic device (300) determines that the auto-labelling engine (310) has been trained successfully. The electronic device (300) uses the trained auto-labelling engine (310) to label the given media. The trained auto-labelling engine (310) may be a neural network model in which the parameters of each layer may be fine-tuned to achieve the best auto-labelling performance.

[0052] In an embodiment herein, a ratio of the number of images to the iteration number may be used in combination to fine-tune the parameters of the auto-labelling engine (310), which enables the best auto-labelling performance to be achieved.

[0053] In another example scenario, the electronic device (300) trains the auto-labelling engine (310) by: tuning a batch size of the auto-labelling engine to 64; changing subdivisions of the auto-labelling engine to 16; determining maximum (max) batches depending on a number of classes (for example, if the classes are fewer than 3, the max batches may be determined as 6000, and if the classes are more than 3, the max batches may be determined as 2000 * number of classes); changing a number of steps to 80% and 90% of the max batches, wherein a step refers to one gradient update (i.e., one update to neural network values) during the training process, with the neural network processing a batch of images (batch size 64) as mentioned above during each step; determining the width and height of layers of the auto-labelling engine (310) in multiples of 32; changing a number of classes to the current number of classes in each first layer (for example, a YOLO layer) of the auto-labelling engine (310); and changing a number of filters = (number of classes + 5) * 3 (in the second layer (for example, a convolutional layer) right before the first layer only); wherein the batch size, subdivisions, the classes, the width, the height, the number of filters, and so on, may be the parameters/configuration parameters of the auto-labelling engine (310). With the above tuning of the parameters of the auto-labelling engine (310), an accuracy of 85 mean average precision (mAP) and a loss between 0 and 0.5 may be achieved. It is understood that embodiments herein are not limited to the above-described examples of tuning the parameters of the auto-labelling engine, and may include two or more various ways (including the above-described example) of tuning the parameters of the auto-labelling engine at the same time.

[0054] Figure 5 is an example conceptual diagram depicting the automatic labelling of the media (500), according to embodiments as disclosed herein. A first labelled media (502) is provided to the auto-labelling engine (310) for training, so that a second provided media (504) may be processed and ultimately auto-labelled (506).
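
For illustration, the configuration rules listed in paragraph [0053] can be collected into a small Python helper. The dictionary layout and the default input size are assumptions of this sketch; only the numeric rules themselves come from the paragraph above.

```python
def yolo_style_config(num_classes, width=416, height=416):
    """Collect the tuning rules from paragraph [0053]: batch 64,
    subdivisions 16, max batches from the class count, steps at 80% and
    90% of max batches, dimensions in multiples of 32, and filters
    (classes + 5) * 3 in the convolutional layer before each YOLO layer."""
    assert width % 32 == 0 and height % 32 == 0
    max_batches = 6000 if num_classes < 3 else 2000 * num_classes
    return {
        "batch": 64,
        "subdivisions": 16,
        "max_batches": max_batches,
        "steps": (int(0.8 * max_batches), int(0.9 * max_batches)),
        "width": width,
        "height": height,
        "classes": num_classes,
        "filters": (num_classes + 5) * 3,
    }

print(yolo_style_config(5))  # max_batches=10000, steps=(8000, 9000), filters=30
```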

[0055] Figure 6 is an example flow diagram (600) depicting a method for automatically labelling the media, according to embodiments as disclosed herein.

[0056] At step (602), the method includes training, by the electronic device (300), the auto-labelling engine (310) based on the combination of the first training dataset and the second training dataset. The first training dataset includes the plurality of manually annotated first media. The second training dataset includes the plurality of second media without annotation.

[0057] At step (604), the method includes using, by the electronic device, the trained auto-labelling engine to label at least one object in the at least one media received for labelling. The various actions in method (600) may be performed in the order presented, in a different order, or simultaneously. Moreover, additional media datasets of unannotated objects may be presented for automatic labelling.

[0058] Figure 7 is an example flow diagram (700) depicting a method for training the auto-labelling engine (310) to automatically label the objects in the media, according to embodiments as disclosed herein.

[0059] At step (702), the electronic device (300) receives the manually annotated media as a first training dataset for the first iteration of training the auto-labelling engine (310).

[0060] At step (704), the electronic device (300) trains the auto-labelling engine (310) to label each media object of a second training dataset (second objects) based on the labels of the objects present in the manually annotated media of the first training dataset (first objects). Training includes second object boundary detection, framing, and corresponding labelling following automated direct comparisons made with the annotated first objects.

[0061] At step (706), the electronic device (300) checks the performance of the auto-labelling engine (310) by allowing the operator/user to manually review the labelling of each media second object of the second training dataset in view of the first object labels of the first training dataset. Upon review, the user inputs through the user interface a rating to the electronic device (300) concerning the determined satisfactory/unsatisfactory nature of the auto-labelling efforts.

[0062] If the performance of the auto-labelling engine (310) is deemed to be unsatisfactory, at step (708) the electronic device (300) initiates a subsequent iteration of training the auto-labelling engine (310) using a reduced number of manually annotated media relative to the previous iteration (for example, 50% fewer), and the auto-labelling engine (310) is once again trained and its second-object labelling output evaluated by the user. This iterative process continues until the second training dataset object labels are determined by the user to be satisfactory. In another embodiment, the number of iterations of training is restricted.

[0063] If the performance of the auto-labelling engine (310) is deemed satisfactory, at step (710) the electronic device (300) uses the trained auto-labelling engine (310) to automatically label the remaining dataset media objects if additional objects remain, or an additional dataset containing unlabelled media objects.

[0064] At step (712) the electronic device (300) stores the labels associated with the objects in the labelled media in the memory (302) for subsequent use, including use as a training dataset. The metadata labels may be retained separate from the labelled object data so long as a database index or key associates the label with the associated object. Or, the original object data may be stored with the label data, for example, in a single database record. The various actions in method (700) may be performed in the order presented, in a different order, or simultaneously.
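
As one illustration of the keyed-storage option described above, the sketch below persists labels and bounding boxes to a small SQLite table in Python. The schema, field names, and file name are assumptions of this sketch, not structures specified by the disclosure.

```python
import sqlite3

# Keyed storage of labels: media_id/object_id act as the index that
# associates each label and bounding box with its media object.
conn = sqlite3.connect("labels.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS object_labels (
        media_id  TEXT,
        object_id INTEGER,
        label     TEXT,
        x_min REAL, y_min REAL, x_max REAL, y_max REAL
    )
""")
conn.execute(
    "INSERT INTO object_labels VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("img_0001", 1, "dog", 12.0, 34.0, 120.0, 200.0),
)
conn.commit()
conn.close()
```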

[0065] In another embodiment, in addition to labelling a second training dataset object, it is also possible for the auto-labelling engine to detect the object edges and draw an appropriately sized bounding box (frame) around the object.

[0066] The embodiments disclosed herein can be implemented through at least one software program running on at least one hardware device and performing network management functions to control the elements. The elements shown in Figure 3 and Figure 4 can be at least one of the hardware devices, or a combination of hardware devices and software modules.

[0067] The embodiments disclosed herein describe methods and systems for automatically labelling media. Therefore, it is understood that the scope of the protection is extended to such a program and, in addition to a computer readable medium having a message therein, such computer readable storage means contain program code means for implementation of one or more steps of the method when the program runs on a server or mobile device or any suitable programmable device. The method is implemented in a preferred embodiment through or together with a software program written in, e.g., Very high-speed integrated circuit Hardware Description Language (VHDL) or another programming language, or implemented by one or more VHDL or software modules being executed on at least one hardware device. The hardware device can be any kind of portable device that can be programmed. The device may also include hardware means such as an ASIC, or a combination of hardware and software means, e.g., an ASIC and an FPGA, or at least one microprocessor and at least one memory with software modules located therein. The method embodiments described herein could be implemented partly in hardware and partly in software. Alternatively, the invention may be implemented on different hardware devices, e.g., using a plurality of CPUs.

[0068] As indicated above, aspects of this invention pertain to specific "method functions" implementable through various computer systems. In an alternate embodiment, the invention may be implemented as a computer program product for use with a computer system. Those skilled in the art will readily appreciate that programs defining the functions of the present invention can be delivered to a computer in many forms, which include, but are not limited to: (a) information permanently stored on non-writeable storage media (e.g., read-only memory devices within a computer such as ROMs or CD-ROM disks readable only by a computer I/O attachment); (b) information alterably stored on writeable storage media (e.g., floppy disks and hard drives); or (c) information conveyed to a computer through communication media, such as a local area network, a telephone network, or a public network like the Internet. It should be understood, therefore, that such media, when carrying computer readable instructions that direct the method functions of the present invention, represent embodiments of the present invention.

[0069] The scope of the invention is established by the appended claims rather than by the foregoing description. Further, the recitation of method steps does not denote a limiting sequence for execution of the steps. Such method steps may therefore be performed in a sequence other than that recited unless the claim expressly states otherwise.

INDUSTRIAL APPLICABILITY

[0070] The embodiments herein have industrial applicability in that the datasets created may be used for training AI learning models and the like.