Title:
SAFETY MONITORING SYSTEM FOR OPERATORS OF AGRICULTURAL VEHICLES AND OTHER HEAVY MACHINERY
Document Type and Number:
WIPO Patent Application WO/2024/102960
Kind Code:
A1
Abstract:
A safety monitoring system for operators of agricultural vehicles and other heavy machinery includes a camera configured to be mounted within an operating cabin and a controller that is configured to: receive one or more images or video frames of an operator at an entrance of an operating cabin from the camera; detect a predefined set of body joints and features within the one or more images or video frames using a skeleton model; estimate the operator's posture with respect to the entrance of the operating cabin based on the detected body joints and features; and assign a risk level to the one or more images or video frames based on an estimate of the operator's posture with respect to the entrance of the operating cabin using a trained classifier, wherein the trained classifier is based on a machine learning model.

Inventors:
LOWNDES BETHANY (US)
SIU KA-CHUN (US)
PITLA SANTOSH (US)
IRUMVA TERENCE (US)
MWUNGUZI HERVE (US)
YODER AARON (US)
Application Number:
PCT/US2023/079311
Publication Date:
May 16, 2024
Filing Date:
November 10, 2023
Assignee:
UNIV NEBRASKA (US)
International Classes:
B60R1/00; B60K35/28; B62D33/06; G06N3/08; G06N20/00; G08B21/02
Attorney, Agent or Firm:
ABOU-NASR, Faisal, K. (US)
Claims:
CLAIMS

What is claimed is:

1. A safety monitoring system for operators of agricultural vehicles and other heavy machinery, comprising: a camera configured to be mounted within an operating cabin; a controller communicatively coupled to the camera, the controller being configured to: receive one or more images or video frames of an operator at an entrance of the operating cabin from the camera; detect a predefined set of body joints and features within the one or more images or video frames using a skeleton model; estimate the operator’s posture with respect to the entrance of the operating cabin based on the detected body joints and features; and assign a risk level to the one or more images or video frames based on an estimate of the operator’s posture with respect to the entrance of the operating cabin using a trained classifier, wherein the assigned risk level is based on an orientation of the operator’s head with respect to the operating cabin and a number of contact points between the operator and ingress/egress structures at the entrance of the operating cabin; and an alert system communicatively coupled to the controller, the alert system being configured to provide an alert based on the assigned risk level.

2. The safety monitoring system of claim 1, wherein the controller is configured to assign a low risk level to the one or more images or video frames when the estimate of the operator’s posture with respect to the entrance of the operating cabin indicates that the operator is looking into the operating cabin while entering the operating cabin and that there are at least three contact points between the operator and the ingress/egress structures at the entrance of the operating cabin.

3. The safety monitoring system of claim 1, wherein the controller is configured to assign a medium risk level to the one or more images or video frames when the estimate of the operator’s posture with respect to the entrance of the operating cabin indicates that the operator is looking into the operating cabin while entering the operating cabin and that there are only two contact points between the operator and the ingress/egress structures at the entrance of the operating cabin.

4. The safety monitoring system of claim 1, wherein the controller is configured to assign a high risk level to the one or more images or video frames when the estimate of the operator’s posture with respect to the entrance of the operating cabin indicates that the operator is not looking into the operating cabin while entering the operating cabin or that there are less than two contact points between the operator and the ingress/egress structures at the entrance of the operating cabin.

5. The safety monitoring system of claim 1, wherein the alert system is configured to provide a heightened alert when the assigned risk level is a high risk level in comparison to when the assigned risk level is a medium risk level.

6. The safety monitoring system of claim 1, wherein the alert system includes at least one of a visual output device or an audible output device.

7. The safety monitoring system of claim 1, wherein the alert system includes a separate alert system controller that is configured to receive information about the assigned risk level from the controller and provide the alert based on the assigned risk level.

8. The safety monitoring system of claim 1, wherein the controller is configured to initiate the receiving of the one or more images or video frames of the operator at the entrance of the operating cabin from the camera in response to first detecting motion at the entrance of the operating cabin with the camera.

9. The safety monitoring system of claim 1, further comprising: a motion detector communicatively coupled to the controller, wherein the motion detector is separate from the camera, and wherein the controller is configured to initiate the receiving of the one or more images or video frames of the operator at the entrance of the operating cabin from the camera in response to first detecting motion at the entrance of the operating cabin with the motion detector.

10. The safety monitoring system of claim 1, wherein the trained classifier is based on a machine learning (ML) model comprising: a first script that causes the controller to detect skeletons in images from a training data set, wherein the images are assigned a predetermined risk level; a second script that causes the controller to convert raw skeleton data into organized skeleton data by computing a number of individual skeletons detected in each of the images and discarding any of the images in which no skeletons were detected; a third script that causes the controller to extract body joints and features of the detected skeletons in the images by processing the organized skeleton data; and a fourth script that causes the controller to train the ML model based on the extracted body joints and features of the detected skeletons in the images and the predetermined risk level corresponding to each of the images, wherein the trained classifier comprises the ML model after it has been trained.

11. A method of monitoring and alerting operators of agricultural vehicles and other heavy machinery, comprising: recording one or more images or video frames of an operator at an entrance of the operating cabin using a camera that is mounted in the operating cabin; detecting a predefined set of body joints and features within the one or more images or video frames using a computerized skeleton model; estimating the operator’s posture with respect to the entrance of the operating cabin based on the detected body joints and features; assigning a risk level to the one or more images or video frames based on an estimate of the operator’s posture with respect to the entrance of the operating cabin using a computerized trained classifier, wherein the assigned risk level is based on an orientation of the operator’s head with respect to the operating cabin and a number of contact points between the operator and ingress/egress structures at the entrance of the operating cabin; and providing an alert based on the assigned risk level.

12. The method of claim 11, wherein a low risk level is assigned to the one or more images or video frames when the estimate of the operator’s posture with respect to the entrance of the operating cabin indicates that the operator is looking into the operating cabin while entering the operating cabin and that there are at least three contact points between the operator and the ingress/egress structures at the entrance of the operating cabin.

13. The method of claim 11, wherein a medium risk level is assigned to the one or more images or video frames when the estimate of the operator’s posture with respect to the entrance of the operating cabin indicates that the operator is looking into the operating cabin while entering the operating cabin and that there are only two contact points between the operator and the ingress/egress structures at the entrance of the operating cabin.

14. The method of claim 11, wherein a high risk level is assigned to the one or more images or video frames when the estimate of the operator’s posture with respect to the entrance of the operating cabin indicates that the operator is not looking into the operating cabin while entering the operating cabin or that there are less than two contact points between the operator and the ingress/egress structures at the entrance of the operating cabin.

15. The method of claim 11, wherein a heightened alert is provided when the assigned risk level is a high risk level in comparison to when the assigned risk level is a medium risk level.

16. The method of claim 11, wherein the alert comprises at least one of a visual alert or an audible alert.

17. The method of claim 11, wherein the recording of the one or more images or video frames of the operator at the entrance of the operating cabin from the camera is initiated in response to first detecting motion at the entrance of the operating cabin with the camera.

18. The method of claim 11, wherein the recording of the one or more images or video frames of the operator at the entrance of the operating cabin from the camera is initiated in response to first detecting motion at the entrance of the operating cabin with a motion detector that is separate from the camera.

19. The method of claim 11, wherein the computerized trained classifier is based on a machine learning (ML) model comprising: a first script that causes a controller to detect skeletons in images from a training data set, wherein the images are assigned a predetermined risk level; a second script that causes the controller to convert raw skeleton data into organized skeleton data by computing a number of individual skeletons detected in each of the images and discarding any of the images in which no skeletons were detected; a third script that causes the controller to extract body joints and features of the detected skeletons in the images by processing the organized skeleton data; and a fourth script that causes the controller to train the ML model based on the extracted body joints and features of the detected skeletons in the images and the predetermined risk level corresponding to each of the images, wherein the computerized trained classifier comprises the ML model after it has been trained.

20. A safety monitoring system for operators of agricultural vehicles and other heavy machinery, comprising: a camera configured to be mounted within an operating cabin; and a controller communicatively coupled to the camera, the controller being configured to: receive one or more images or video frames of an operator at an entrance of the operating cabin from the camera; detect a predefined set of body joints and features within the one or more images or video frames using a skeleton model; estimate the operator’s posture with respect to the entrance of the operating cabin based on the detected body joints and features; and assign a risk level to the one or more images or video frames based on an estimate of the operator’s posture with respect to the entrance of the operating cabin using a trained classifier, wherein the trained classifier is based on a machine learning (ML) model comprising: a first script that causes the controller to detect skeletons in images from a training data set, wherein the images are assigned a predetermined risk level; a second script that causes the controller to convert raw skeleton data into organized skeleton data by computing a number of individual skeletons detected in each of the images and discarding any of the images in which no skeletons were detected; a third script that causes the controller to extract body joints and features of the detected skeletons in the images by processing the organized skeleton data; and a fourth script that causes the controller to train the ML model based on the extracted body joints and features of the detected skeletons in the images and the predetermined risk level corresponding to each of the images, wherein the trained classifier comprises the ML model after it has been trained.

Description:
TITLE OF THE INVENTION

SAFETY MONITORING SYSTEM FOR OPERATORS OF AGRICULTURAL VEHICLES AND OTHER HEAVY MACHINERY

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] The present application claims priority under 35 U.S.C. §119(e) to U.S. Provisional Application Serial No. 63/424,261, filed November 10, 2022, and titled “Agricultural Machinery Operators Safety Monitoring System,” which is incorporated herein by reference in its entirety.

STATEMENT OF GOVERNMENT SUPPORT

[0002] This invention was made with U.S. government support under grant number U54 OH010162 awarded by the Centers for Disease Control and Prevention (CDC). The U.S. government has certain rights in the invention.

TECHNICAL FIELD

[0003] The present disclosure generally relates to safety systems for protecting operators of heavy machinery.

BACKGROUND

[0004] Agricultural machinery and tractor operators’ injuries account for 23% of all agricultural producers’ injuries, and for older farmers, the risk and severity of these injuries increase. To address this safety issue, machinery-related injuries should receive much more attention, especially in midwestern row-crop agriculture, where large equipment is commonly used by most farmers. In particular, tractor operators’ safety should be of paramount concern, as tractors are among the most used pieces of agricultural equipment due to their ability to support multiple implements like planters and fertilizer applicators. Ideally, these injuries would be prevented before they occur, which is why monitoring tractor operators’ safety behaviors is a potentially effective approach.

[0005] “Human-centered approaches” are lacking on the issue of tractor safety. Even the research that focuses on tractor operator behaviors typically concentrates on data collection and analysis without more.

[0006] There is a need for systems that can monitor operators of agricultural vehicles (e.g., tractors, harvesters, planters, sprayers, etc.) and other heavy machinery (e.g., forklifts, cranes, bulldozers, etc.) and provide feedback to the machinery operators about their safety practices. This should be done at the most preliminary stages of operating heavy machinery, while the operator is entering an operating cabin (e.g., driver and/or control cabin) of the machinery.

SUMMARY

[0007] A safety monitoring system for operators of agricultural vehicles and other heavy machinery is disclosed herein. The safety monitoring system includes a camera configured to be mounted within an operating cabin (e.g., driver and/or control cabin) of the machinery and a controller that is communicatively coupled to the camera. In embodiments, the controller is configured to: receive one or more images or video frames of an operator at an entrance of an operating cabin from the camera; detect a predefined set of body joints and features within the one or more images or video frames using a skeleton model; estimate the operator’s posture with respect to the entrance of the operating cabin based on the detected body joints and features; and assign a risk level to the one or more images or video frames based on an estimate of the operator’s posture with respect to the entrance of the operating cabin using a trained classifier.

[0008] In some embodiments, the assigned risk level is based on an orientation of the operator’s head with respect to the operating cabin and a number of contact points between the operator and ingress/egress structures at the entrance of the operating cabin. For example, the controller may be configured to assign: (1) a low risk level to the one or more images or video frames when the estimate of the operator’s posture with respect to the entrance of the operating cabin indicates that the operator is looking into the operating cabin while entering the operating cabin and that there are at least three contact points between the operator and the ingress/egress structures at the entrance of the operating cabin; (2) a medium risk level to the one or more images or video frames when the estimate of the operator’s posture with respect to the entrance of the operating cabin indicates that the operator is looking into the operating cabin while entering the operating cabin and that there are only two contact points between the operator and the ingress/egress structures at the entrance of the operating cabin; and (3) a high risk level to the one or more images or video frames when the estimate of the operator’s posture with respect to the entrance of the operating cabin indicates that the operator is not looking into the operating cabin while entering the operating cabin or that there are less than two contact points between the operator and the ingress/egress structures at the entrance of the operating cabin.

[0009] The trained classifier is based on a machine learning (ML) model which may include, but is not limited to: a first script that causes the controller to detect skeletons in images from a training data set, wherein the images are assigned a predetermined risk level; a second script that causes the controller to convert raw skeleton data into organized skeleton data by computing a number of individual skeletons detected in each of the images and discarding any of the images in which no skeletons were detected; a third script that causes the controller to extract body joints and features of the detected skeletons in the images by processing the organized skeleton data; and a fourth script that causes the controller to train the ML model based on the extracted body joints and features of the detected skeletons in the images and the predetermined risk level corresponding to each of the images, wherein the trained classifier comprises the ML model after it has been trained.

[0010] The safety monitoring system may further include an alert system communicatively coupled to the controller. The alert system is configured to provide an alert (e.g., a visual alert and/or an audible alert) based on the assigned risk level. For example, the alert system may be configured to provide a heightened alert when the assigned risk level is a high risk level in comparison to when the assigned risk level is a medium risk level. The alert system may be configured to provide no alerts when the assigned risk level is a low risk level, or alternatively the alert system may be configured to provide a reduced alert when the assigned risk level is a low risk level in comparison to when the assigned risk level is a medium risk level.

[0011] This Summary is provided solely as an introduction to subject matter that is fully described in the Detailed Description and Drawings. The Summary should not be considered to describe essential features nor be used to determine the scope of the Claims. Moreover, it is to be understood that both the foregoing Summary and the following Detailed Description are example and explanatory only and are not necessarily restrictive of the subject matter claimed.

BRIEF DESCRIPTION OF THE DRAWINGS

[0012] The Detailed Description is provided with reference to the accompanying Drawings. The use of the same reference numbers in different instances in the Detailed Description and the Drawings may indicate similar or identical items. The Drawings are not necessarily to scale, and any disclosed processes may be performed in an arbitrary order, unless a certain order of steps/operations is inherent or specified in the Detailed Description or in the Claims.

[0013] FIG. 1 is a block diagram of a safety monitoring system for operators of agricultural vehicles and other heavy machinery, in accordance with an example embodiment of this disclosure.

[0014] FIG. 2 is a flow diagram of a process implemented by the safety monitoring system, in accordance with an example embodiment of this disclosure.

[0015] FIG. 3 is a flow diagram of a process implemented by an alert system controller of the safety monitoring system, in accordance with an example embodiment of this disclosure.

[0016] FIG. 4A is a skeleton model used by the safety monitoring system to estimate a person’s posture by detecting a predefined set of body joints and features within an image of the person, in accordance with an example embodiment of this disclosure.

[0017] FIG. 4B is an image of a person with the predefined set of body joints and features of the skeleton model detected within the image, in accordance with an example embodiment of this disclosure.

[0018] FIG. 5A is an image of an operator entering an operating cabin of an agricultural vehicle or other heavy machinery, wherein the operator’s posture is associated with a low risk level because the operator is stepping into the cabin with three points of contact (i.e., two hands and one foot) and looking into the cabin as they enter, in accordance with an example embodiment of this disclosure.

[0019] FIG. 5B is an image of an operator entering an operating cabin of an agricultural vehicle or other heavy machinery, wherein the operator’s posture is associated with a medium risk level because the operator is stepping into the cabin with only two points of contact (i.e., one hand and one foot) and looking into the cabin as they enter, in accordance with an example embodiment of this disclosure.

[0020] FIG. 5C is an image of an operator entering an operating cabin of an agricultural vehicle or other heavy machinery, wherein the operator’s posture is associated with a high risk level because the operator is stepping into the cabin with only one point of contact (i.e., one foot) and/or is not looking into the cabin as they enter, in accordance with an example embodiment of this disclosure.

[0021] FIG. 6 is a flow diagram including a series of scripts executed by a main controller of the safety monitoring system to implement a machine learning (ML) model, in accordance with an example embodiment of this disclosure.

[0022] FIG. 7A is a low risk level training image for the ML model, in accordance with an example embodiment of this disclosure.

[0023] FIG. 7B is a medium risk level training image for the ML model, in accordance with an example embodiment of this disclosure.

[0024] FIG. 7C is a high risk level training image for the ML model, in accordance with an example embodiment of this disclosure.

[0025] FIG. 8A is a low risk level test image for the ML model, in accordance with an example embodiment of this disclosure.

[0026] FIG. 8B is a medium risk level test image for the ML model, in accordance with an example embodiment of this disclosure.

[0027] FIG. 8C is a high risk level test image for the ML model, in accordance with an example embodiment of this disclosure.

[0028] FIG. 9 is a chart of the total number of training images for each risk level and the number of training images that were accepted by the ML model (i.e., images that were detected to have skeletons and labels corresponding to the predefined risk level), in accordance with an example embodiment of this disclosure.

[0029] FIG. 10 is a table of correct and incorrect labels assigned to test images by the ML model after it was trained (i.e., sometimes referred to herein as the “trained classifier” or simply as the “classifier”), wherein boxes with an asterisk contain the number of correct labels for each risk level, in accordance with an example embodiment of this disclosure.

[0030] FIG. 11 is a table of precision, recall, and F1-score metrics for the classifier’s accuracy evaluation, in accordance with an example embodiment of this disclosure.

DETAILED DESCRIPTION

[0031] This disclosure presents a safety monitoring system for operators of agricultural vehicles (e.g., tractors, harvesters, planters, sprayers, etc.) and other heavy machinery (e.g., forklifts, cranes, bulldozers, etc.). The system monitors the safety behaviors of operators as they interact with an operating cabin (e.g., driver and/or control cabin) of an agricultural vehicle or other heavy machinery, and notifies the operators when an unsafe (risky) operating behavior is detected. The safety monitoring system was designed with a Deep Neural Network (DNN) classifier machine learning (ML) model, trained to identify the operator’s safety practice in an image or a video frame and categorize the detected behavior (e.g., as a low-, medium-, or high-risk safety behavior).

[0032] The safety monitoring system can be a standalone system, or it can be integrated within existing systems that monitor operators’ safety behaviors while driving or otherwise controlling heavy machinery (e.g., systems that can use cameras to detect the operator safety behaviors, or systems that use data from a Controller Area Network Bus (CANbus) system of the agricultural vehicle or other heavy machinery). By providing real-time alerts to operators when risky safety behaviors are detected, the safety monitoring system enables operators to adjust their behavior accordingly and prevent unnecessary injury.

[0033] There is a limited amount of research on designing operator monitoring systems for heavy machinery, especially monitoring systems that are implemented at early stages of operating machinery. Most of the research concentrates on data collection and analysis without providing the feedback aspect to notify and alert operators to adjust their behaviors accordingly. The presently disclosed safety monitoring system adds this aspect of providing feedback to operators of agricultural vehicles and other heavy machinery about their safety practices and optionally relaying the feedback to other operator safety monitoring systems. Importantly, the safety monitoring system enables safety behavior monitoring at the most preliminary stages of operation, when the operator is first entering the operating cabin of an agricultural vehicle or other heavy machinery.

[0034] Various embodiments of the safety monitoring system and experimental findings associated with example embodiments of the safety monitoring system are described below.

[0035] Example Embodiment of the Safety Monitoring System

[0036] FIG. 1 is a block diagram of a safety monitoring system 100 for operators of agricultural vehicles and other heavy machinery, in accordance with an example embodiment of this disclosure. The safety monitoring system 100 includes a main controller 102 configured to perform various image retrieval, image processing, machine learning, analysis, communication, and control functions of the system 100 that are described herein. The main controller 102 may include at least one processor 104, memory 106, and communication interface 108. As used herein, the term “controller” may include a microcontroller, programmable logic device (PLD), application specific integrated circuit (ASIC), personal computer (PC), embedded computer, industrial computer/controller, notebook computer, mobile device (e.g., smartphone, tablet, etc.), or the like.

[0037] The processor 104 provides processing functionality for the main controller 102/system 100 and can include any number of processors, microprocessors, microcontrollers, circuitry, field programmable gate array (FPGA) or other processing systems and resident or external memory for storing data, executable code and other information accessed or generated by the main controller 102. The processor 104 can execute one or more software programs embodied in a non-transitory computer readable medium (e.g., memory) that implement techniques/operations described herein. The processor 104 is not limited by the materials from which it is formed, or the processing mechanisms employed therein and, as such, can be implemented via semiconductor(s) and/or transistors (e.g., using electronic integrated circuit (IC) components), and so forth.

[0038] The memory 106 may include any tangible, computer-readable storage medium that provides storage functionality to store various data and/or program code associated with operation of the main controller 102/processor 104, such as software programs and/or code segments, or other data to instruct the processor 104, and possibly other components of the main controller 102, to perform the functionality described herein. Thus, the memory 106 can store data, such as a program of instructions for operating the main controller 102, including its components (e.g., processor 104, communication interface 108, etc.), and so forth. It should be noted that while a single memory is described, a wide variety of types and combinations of memory (e.g., tangible, non-transitory memory) can be employed. The memory 106 can be integrated within the processor 104, can comprise stand-alone memory, or can be a combination of both. Some examples of the memory 106 can include removable and non-removable memory components, such as random-access memory (RAM), read-only memory (ROM), flash memory (e.g., a secure digital (SD) memory card, a mini-SD memory card and/or a micro-SD memory card), solid-state drive (SSD) memory, magnetic memory, optical memory, universal serial bus (USB) memory devices, hard disk memory, external memory, or the like.

[0039] The communication interface 108 can be operatively configured to communicate with components of the main controller 102. For example, the communication interface 108 can be configured to retrieve data from the processor 104 or other devices, transmit data for storage in the memory 106, retrieve data from storage in the memory 106, and so forth. The communication interface 108 can also be communicatively coupled with the processor 104 to facilitate data transfer between components of the main controller 102 and the processor 104. It should be noted that while the communication interface 108 is described as a component of the main controller 102, one or more components of the communication interface 108 can be implemented as external components communicatively coupled to the main controller 102 via a wired and/or wireless connection. The main controller 102 can also include and/or connect to one or more input/output (I/O) devices (e.g., camera 110, motion detector 112, alert system 114 components, etc.) via the communication interface 108. In embodiments, the communication interface 108 may also include or may be coupled with a transmitter, receiver, transceiver, physical connection interface, or any combination thereof.

[0040] It is noted that the functions, steps or operations described herein are not necessarily all performed by one controller. Instead, the “main controller 102” may comprise a plurality of interconnected controllers. For example, one or more operations and/or sub-operations may be performed by a first controller, additional operations and/or sub-operations may be performed by a second controller, and so forth. Furthermore, some of the operations and/or sub-operations may be performed in parallel and not necessarily in the order that they are disclosed herein.

[0041] The safety monitoring system 100 further includes a camera 110 configured to be mounted within an operating cabin (e.g., driver and/or control cabin) of the machinery. For example, the camera 110 can be a webcam, dashcam, surveillance camera, mobile device camera, or any other type of camera. The camera 110 may be mounted to a dashboard, ceiling, inner wall, doorframe, window frame, or any structural wall, beam, or platform inside the operating cabin such that the camera’s field of view (FOV) includes lower extremity ingress/egress structures (e.g., steps) and both left-side and right-side upper extremity ingress/egress structures (e.g., railings and/or grab bars). The images in FIGS. 5A through 5C illustrate an example of the camera’s FOV when the camera 110 is mounted within the operating cabin of an agricultural vehicle or other type of heavy machinery.

[0042] The main controller 102 is communicatively coupled to the camera 110. The main controller 102 is configured to receive one or more images or video frames of an operator at an entrance of the operating cabin from the camera 110. The main controller 102 may receive images or video frames from the camera 110 continuously. Alternatively, the camera function may be event-triggered for improved energy efficiency, memory performance, and/or system longevity. For example, the main controller 102 may be configured to initiate the receiving of the one or more images or video frames of the operator at the entrance of the operating cabin from the camera 110 in response to first detecting motion at the entrance of the operating cabin with the camera 110. In this type of configuration, to preserve system power and/or memory, the main controller 102 may be configured to record images or video frames at a lower resolution and/or lower frame rate until motion at the entrance of the operating cabin is detected.

[0043] In other embodiments, the camera function may be event-triggered with the use of a separate motion detector. For example, the safety monitoring system 100 may include a motion detector 112 (e.g., a sonar motion sensor, radar motion sensor, lidar motion sensor, infrared motion sensor, laser beam motion sensor, pressure/switch based motion sensor, or any other type of motion detector) that is configured to detect motion at the entrance of the operating cabin. In this type of configuration, the main controller 102 can be configured to initiate the receiving of the one or more images or video frames of the operator at the entrance of the operating cabin from the camera 110 in response to first detecting motion at the entrance of the operating cabin with the motion detector 112.
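By way of illustration only, the event-triggered capture described in the preceding paragraphs might be approximated with simple frame differencing in python/OpenCV. This sketch is not part of the disclosed implementation; the device index, the 25-level pixel threshold, and the 5000-pixel trigger area are illustrative assumptions.

    import cv2

    cap = cv2.VideoCapture(0)  # camera 110; device index 0 is an assumption
    ok, prev = cap.read()
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        diff = cv2.absdiff(gray, prev_gray)
        prev_gray = gray
        # Count pixels that changed noticeably between consecutive frames.
        _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
        if cv2.countNonZero(mask) > 5000:
            print("Motion at cabin entrance; switch to full-rate capture")

In practice, a separate motion detector 112 (as in paragraph [0043]) would replace the frame-differencing test with a hardware interrupt or sensor poll.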

[0044] After receiving one or more images or video frames of the operator at the entrance of the operating cabin from the camera 110, the main controller 102 is configured to detect a predefined set of body joints and features within the one or more images or video frames using a skeleton model. FIG. 4A shows an example of the skeleton model used by the main controller 102 to detect the predefined set of body joints and features within an image, and FIG. 4B is an image of a person with the predefined set of body joints and features of the skeleton model having been detected within the image. The main controller 102 is configured to use a human pose detection library, such as OpenPose, to detect the predefined set of body joints and features. OpenPose is an open-source python library that uses a “two branch multi-stage” convolutional neural network (CNN) to detect joints and important features on the human body in an image or a video frame. The set of detected joints and body features is commonly referred to as a skeleton. In an example embodiment, the controller 102 is configured to use OpenPose version 1.7.0, which can detect up to 18 different joints and features (i.e., the predefined set of body joints and features illustrated in FIG. 4A) to form a full skeleton.

[0045] After detecting the predefined set of body joints and features within the one or more images or video frames using the skeleton model, the main controller 102 is configured to estimate the operator’s posture with respect to the entrance of the operating cabin based on the detected body joints and features. The main controller 102 then assigns a risk level to the one or more images or video frames based on an estimate of the operator’s posture with respect to the entrance of the operating cabin using a trained classifier. The trained classifier is based on an ML model. FIG. 6 is a flow diagram showing a series of scripts that can be used to implement the ML model. For example, the scripts (programmed instruction sets) may be stored in memory 106 of the main controller 102 or in an auxiliary memory (e.g., external drive or cloud storage) that is accessed by the main controller 102 using its communication interface 108. In embodiments, the ML model includes, but is not limited to: a first script (S1) that causes the main controller 102 to detect skeletons in images from a training data set, wherein the images are assigned a predetermined risk level; a second script (S2) that causes the main controller 102 to convert raw skeleton data into organized skeleton data by computing a number of individual skeletons detected in each of the images and discarding any of the images in which no skeletons were detected; a third script (S3) that causes the main controller 102 to extract body joints and features of the detected skeletons in the images by processing the organized skeleton data; and a fourth script (S4) that causes the main controller 102 to train the ML model based on the extracted body joints and features of the detected skeletons in the images and the predetermined risk level corresponding to each of the images, wherein the trained classifier comprises the ML model after it has been trained. The ML model may further include a fifth script (S5) for testing and/or applying the ML model (i.e., the trained classifier) to real-world data.
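As a minimal sketch of the skeleton detection step described in paragraph [0044], the OpenPose python bindings (pyopenpose) might be invoked roughly as follows. The model folder path, the image filename, and the selection of the 18-keypoint COCO model are assumptions for this sketch, and the exact wrapper API differs slightly between OpenPose builds.

    import cv2
    import pyopenpose as op  # OpenPose python bindings; API varies by build

    params = {
        "model_folder": "openpose/models/",  # assumed install path
        "model_pose": "COCO",                # 18 joints/features, as in FIG. 4A
    }
    opWrapper = op.WrapperPython()
    opWrapper.configure(params)
    opWrapper.start()

    datum = op.Datum()
    datum.cvInputData = cv2.imread("operator_at_entrance.jpg")  # hypothetical image
    opWrapper.emplaceAndPop(op.VectorDatum([datum]))

    # poseKeypoints has shape (num_people, 18, 3): (x, y, confidence) per joint.
    skeletons = datum.poseKeypoints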

[0046] As shown in FIG. 1, safety monitoring system 100 may further include an alert system 114 communicatively coupled to the main controller 102. The alert system 114 is configured to provide an alert (e.g., a visual alert and/or an audible alert) based on the assigned risk level. For example, the alert system 114 may be configured to provide a heightened alert when the assigned risk level is a high risk level in comparison to when the assigned risk level is a medium risk level. The alert system 114 may be configured to provide no alerts when the assigned risk level is a low risk level, or alternatively the alert system may be configured to provide a reduced alert when the assigned risk level is a low risk level in comparison to when the assigned risk level is a medium risk level.

[0047] The alert system 114 may include a visual output device 116, such as a light emitting diode (LED), LED array, rotating warning light, strobe light, or any other type of light source or display. The alert system 114 may additionally, or alternatively, include an audible output device 118, such as a speaker, buzzer, siren, chime, bell, or the like. The alert system 114 may include a separate alert system controller 120 (e.g., a separate microcontroller or PLD) that is configured to receive information about the assigned risk level (e.g., labeled images, isolated image labels, or simply an indication of the risk level) from the main controller 102 and provide an alert based on the assigned risk level. For example, the alert system controller 120 may be configured to transmit signals to the visual output device 116 and/or audible output device 118 in order to provide at least a first type of alert for detected medium risk behavior (e.g., a warning light) and a heightened second type of alert for detected high risk behavior (e.g., flashing the warning light, providing the warning light in a different color or at a higher light intensity, and/or providing a combination of a visual alert and an audible alert together, such as static/flashing warning light while also sounding a buzzer or other type of alarm). In other embodiments, the main controller 102 is configured to directly control the visual output device 116 and/or audible output device 118 based on the assigned risk level (without a separate alert system controller 120). For example, the main controller 102 may be configured to transmit signals directly to the visual output device 116 and/or audible output device 118 in order to provide at least a first type of alert for detected medium risk behavior (e.g., a warning light) and a heightened second type of alert for detected high risk behavior (e.g., flashing the warning light, providing the warning light in a different color or at a higher light intensity, and/or providing a combination of a visual alert and an audible alert together, such as static/flashing warning light while also sounding a buzzer or other type of alarm).

[0048] In some embodiments, safety monitoring system 100 may be communicatively coupled to or integrated within a broader monitoring system (e.g., a more encompassing monitoring system for the machinery, or a fleet monitoring system). In such embodiments, the main controller 102 may be further configured to transmit information about the assigned risk level to the broader monitoring system. For example, the information about the assigned risk level may be sent to the broader monitoring system via CANbus, intranet, and/or internet connectivity facilitated by the communication interface 108 of the controller 102.

[0049] The assigned risk level may be based on an orientation of the operator’s head with respect to the operating cabin and a number of contact points between the operator and ingress/egress structures at the entrance of the operating cabin. For example, the main controller 102 may be configured to assign a low risk level to the one or more images or video frames when the estimate of the operator’s posture with respect to the entrance of the operating cabin indicates that the operator is looking into the operating cabin while entering the operating cabin and that there are at least three contact points between the operator and the ingress/egress structures at the entrance of the operating cabin. FIG. 5A is an image of an operator entering an operating cabin of an agricultural vehicle or other heavy machinery, wherein the operator’s posture is associated with a low risk level because the operator is stepping into the cabin with three points of contact (i.e., two hands on railings/grab bars and one foot on step) and is looking into the cabin as they enter. For further example, the main controller 102 may be configured to assign a medium risk level to the one or more images or video frames when the estimate of the operator’s posture with respect to the entrance of the operating cabin indicates that the operator is looking into the operating cabin while entering the operating cabin and that there are only two contact points between the operator and the ingress/egress structures at the entrance of the operating cabin. FIG. 5B is an image of an operator entering an operating cabin of an agricultural vehicle or other heavy machinery, wherein the operator’s posture is associated with a medium risk level because the operator is stepping into the cabin with only two points of contact (i.e., one hand on a railing/grab bar and one foot on step) and is looking into the cabin as they enter. For further example, the main controller 102 may be configured to assign a high risk level to the one or more images or video frames when the estimate of the operator’s posture with respect to the entrance of the operating cabin indicates that the operator is not looking into the operating cabin while entering the operating cabin or that there are less than two contact points between the operator and the ingress/egress structures at the entrance of the operating cabin. FIG. 5C is an image of an operator entering an operating cabin of an agricultural vehicle or other heavy machinery, wherein the operator’s posture is associated with a high risk level because the operator is stepping into the cabin with only one point of contact (i.e., one foot on step) and/or is not looking into the cabin as they enter.
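In the disclosed system this mapping is learned by the trained classifier; purely for illustration, the low/medium/high criteria of paragraph [0049] can be restated as a plain rule in python. The function name and boolean/integer inputs are hypothetical.

    def assign_risk(looking_into_cabin: bool, contact_points: int) -> str:
        # High risk: facing away from the cabin, or fewer than two contact points.
        if not looking_into_cabin or contact_points < 2:
            return "high"
        # Medium risk: looking into the cabin with only two contact points.
        if contact_points == 2:
            return "medium"
        # Low risk: looking into the cabin with at least three contact points.
        return "low"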

[0050] Examples of monitoring ingress safety behaviors are described above and illustrated in the drawings; however, egress safety behaviors can also be monitored in a similar fashion. For example, the main controller 102 may be configured to assign: (1) a low risk level to the one or more images or video frames when the estimate of the operator’s posture with respect to the exit of the operating cabin indicates that the operator is looking out from the operating cabin while exiting the operating cabin and that there are at least three contact points between the operator and the ingress/egress structures at the exit of the operating cabin; (2) a medium risk level to the one or more images or video frames when the estimate of the operator’s posture with respect to the exit of the operating cabin indicates that the operator is looking out from the operating cabin while exiting the operating cabin and that there are only two contact points between the operator and the ingress/egress structures at the exit of the operating cabin; and (3) a high risk level to the one or more images or video frames when the estimate of the operator’s posture with respect to the exit of the operating cabin indicates that the operator is looking into the operating cabin while exiting the operating cabin or that there are less than two contact points between the operator and the ingress/egress structures at the exit of the operating cabin.

[0051] FIG. 2 is a flow diagram of a process 200 implemented by the safety monitoring system 100, in accordance with an example embodiment of this disclosure. For example, the main controller 102 may be configured to execute program instructions that cause the main controller 102 to carry out the process 200. In an example embodiment, the process 200 may include at least the following steps: (step 202) recording one or more images or video frames of an operator at an entrance of the operating cabin using a camera 110 that is mounted in the operating cabin; (step 204) detecting a predefined set of body joints and features within the one or more images or video frames using a computerized skeleton model and estimating the operator’s posture with respect to the entrance of the operating cabin based on the detected body joints and features; (step 206) assigning a risk level to the one or more images or video frames based on an estimate of the operator’s posture with respect to the entrance of the operating cabin using a computerized trained classifier; and (step 208) providing an alert based on the assigned risk level by transmitting information about the assigned risk level to the alert system controller 120 or by directly controlling the visual output device 116 and/or audible output device 118 of the alert system 114.

[0052] At step 208, the alert system controller 120, or the main controller 102 in some embodiments, is configured to perform a process 300 illustrated in FIG. 3 to provide an alert based on the assigned risk level. In an example embodiment, the process 300 may include at least the following steps: (step 302) receiving an image labeled with an assigned risk level or receiving information about the assigned risk level without image data; (step 304) returning to step 302 if the image is assigned a low risk level or proceeding to step 306 otherwise; (step 306) providing a first type of alert (e.g., visual alert) if the image is assigned a medium risk level or proceeding to step 308 otherwise; and (step 308) providing a heightened second type of alert (e.g., combination of visual alert and audible alert together) if the image is assigned a high risk level or returning to step 302 otherwise.
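A compact sketch of the process 300 decision logic follows, assuming hypothetical alert_visual() and alert_audible() helpers standing in for the visual output device 116 and audible output device 118; the print statements are placeholders only.

    def alert_visual() -> None:
        print("LED on")      # stand-in for driving visual output device 116

    def alert_audible() -> None:
        print("buzzer on")   # stand-in for driving audible output device 118

    def process_label(risk_level: str) -> None:
        if risk_level == "low":
            return               # step 304: no alert; wait for the next image
        if risk_level == "medium":
            alert_visual()       # step 306: first type of alert
        elif risk_level == "high":
            alert_visual()       # step 308: heightened alert combining
            alert_audible()      # visual and audible outputs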

[0053] In addition to the steps described above, processes 200 and 300 may also include any other steps or operations that are described or implied by embodiments of the safety monitoring system 100 or its components. Furthermore, the steps or operations described above can be performed in a different order, or in parallel with one another, unless otherwise specified in this disclosure.

[0054] Additional details pertaining to safety monitoring system 100 and processes 200 and 300 are further provided below with experimental data obtained through development and testing of example embodiments.

[0055] Software Frameworks of the ML Model

[0056] In an example embodiment, the ML model for the safety monitoring system 100 is designed using two main open-source software frameworks to code scripts for training and testing the model. One of them is the TensorFlow open-source library, which is used in ML and artificial intelligence for the computation of multiple complex analytical and mathematical tasks. For additional information regarding TensorFlow, see Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., ... & Zheng, X. (2016). Tensorflow: A system for large-scale machine learning. In 12th {USENIX} symposium on operating systems design and implementation ({OSDI} 16) (pp. 265-283), which is incorporated herein by reference in its entirety. TensorFlow has multiple functions and modules that enable ML model developers to design training and testing scripts for neural networks (NNs), and to calculate metrics that measure model efficiency. Apart from TensorFlow, OpenPose is the other important software framework used to design the ML model. OpenPose is an open-source python library that uses a “two branch multi-stage” convolutional neural network (CNN) to detect joints and important features on the human body in an image or a video frame. The set of detected joints and body features is commonly referred to as a skeleton, and OpenPose version 1.7.0, used for this project, can detect up to 18 different joints and features. FIGS. 4A and 4B show body joints and features that are detected by OpenPose to form a full skeleton.

[0057] Training Data Set

[0058] Besides designing the ML model, creating a well-organized and relevant training dataset is among the crucial stages of creating an ML model, and possibly the most important one; therefore, understanding how it was created is necessary for a better comprehension of the model. This is because, regardless of how the model’s feature processing and training procedures were developed, a good training dataset that addresses a big enough portion of the feature space is imperative to create a well-trained and accurate ML model. For an example embodiment of the safety monitoring system 100, the training image dataset was collected by installing a webcam (camera 110) in the operating cabin and recording the ingress/egress behaviors of operators. The recorded videos were then transformed into images by using a python script to convert each video frame into separate images at a speed of 10 fps (frames per second). Only videos of the operators’ behaviors when entering and exiting the operating cabin were collected because those are the behaviors that the safety monitoring system 100 is designed to monitor.

[0059] To design the training dataset, the three points of contact mechanism for climbing suggested by the Infrastructure Health and Safety Association (IHSA) was utilized, which recommends having two hands and one foot, or two feet and one hand, in contact with the equipment (including ladders) that is being climbed (e.g., see Infrastructure Health and Safety Association. (2019, May). 3-point contact — Vehicles and equipment). The operators’ ingress/egress behaviors were divided into three distinct categories: entering/exiting the machinery while facing away from the operating cabin (high risk), entering/exiting the machinery while facing into the operating cabin but without all three points of contact (medium risk), and entering/exiting the machinery while facing into the operating cabin and with all three points of contact (low risk). Multiple videos for each category were captured, and the training dataset was created by deriving images from the recorded videos of each category (using the aforementioned python script) and storing them in separate folders based on their respective safety risk category. These categories were used as the distinct classes for training and testing the ML model. FIGS. 7A through 7C illustrate examples of low, medium, and high risk level training images, respectively.
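The frame-conversion step mentioned in paragraph [0058] can be sketched with OpenCV as shown below. The video filename, output folder, and fallback frame rate are illustrative assumptions, not the script used in the experiments.

    import os
    import cv2

    os.makedirs("low_risk", exist_ok=True)
    video = cv2.VideoCapture("ingress_low_risk.mp4")   # hypothetical recording
    native_fps = video.get(cv2.CAP_PROP_FPS) or 30.0   # fall back if unknown
    step = max(int(round(native_fps / 10.0)), 1)       # keep roughly 10 fps

    index = saved = 0
    while True:
        ok, frame = video.read()
        if not ok:
            break
        if index % step == 0:
            cv2.imwrite("low_risk/frame_%05d.jpg" % saved, frame)
            saved += 1
        index += 1
    video.release()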

[0060] To develop a good training dataset, it is crucial that the dataset contains training samples for a big enough part of the feature space so that accurate predictions can be made regardless of the part of the feature space that is used for predictions. To address this, a training dataset that includes multiple possible situations when entering an operating cabin was collected. Some of the situations that were considered are when the operator is wearing glasses, when the operator uses a different combination of hands and feet for the three-point contact mechanism, when the operator is facing down when climbing, etc. In some embodiments, the FSPT (Feature Space Partitioning Tree) approach can be used to further optimize the “training space” (i.e., a feature space with sufficient training samples).

[0061] ML Model Design

[0062] To create the ML model, a Deep Neural Network (DNN) with three layers of 50*50*50 was used to design the classifier algorithm (an ML algorithm that assigns a data input to a specific class among a predetermined set of classes), and the risk level categories were used as the classes to predict. Also, to design the customized ML model, Felix Chenfy’s real-time action recognition model format was used, which uses OpenPose to detect everyday actions like jumping and running. For more information regarding Felix Chenfy’s real-time action recognition model format, see Felix Chenfy. (2019). Multi-Person Real Time Action Recognition Based-On Human Skeleton [Machine learning], which is incorporated herein by reference in its entirety. The ML model for the safety monitoring system 100 comprises five main python scripts executable by the main controller 102, each with its own designated task aligned with the primary model goals of processing features, training, and testing. Those five main scripts are discussed in detail below, and FIG. 6 shows the relationships between the scripts, their inputs, and their outputs.

[0063] The first script (S1) gets skeletons from the images in the training dataset and identifies how many people are in each training image (e.g., each video frame). Tracker and Skeleton_detector are the two main classes used to accomplish this task. Skeleton_detector is an OpenPose class that detects a person’s skeleton in an image or a video frame. The OpenPose module uses the CMU (Carnegie Mellon University) panoptic dataset, which contains around 1.5 million skeletons (e.g., see Cao, Z., Simon, T., Wei, S. E., & Sheikh, Y. (2017). Realtime multi-person 2d pose estimation using part affinity fields. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 7291-7299), which is incorporated herein by reference in its entirety), and it is the dataset that the Skeleton_detector class uses to detect skeletons. On the other hand, Tracker detects the number of people in each video frame and tracks the skeleton of each detected person throughout the whole video (i.e., the video frames used to create the training dataset). The Tracker accomplishes this by saving each detected skeleton in the first video frame as Sk1[i], and if it detects a skeleton Sk2[j] in the next frame, it decides whether Sk1[i] and Sk2[j] are skeletons of the same person by comparing the distance between them. For example, say that a function dist() calculates the distance between skeletons detected in different video frames (which can be calculated because all video frames have similar dimensions), and the constant dist_thresh is the threshold distance to determine whether Sk1[i] and Sk2[j] are skeletons of the same person. Then if

[0064] dist(Sk1[i], Sk2[j]) < dist_thresh (Equation 1),

[0065] Sk1[i] and Sk2[j] are skeletons of the same person, and if

[0066] dist(Sk1[i], Sk2[j]) > dist_thresh (Equation 2),

[0067] Sk1[i] and Sk2[j] are skeletons of different people.

[0068] This analysis is very important to keep track of each individual’s actions when using the model on a livestreaming video. This is also achievable because there is not much change in a person’s posture from frame to frame, since the training dataset was created from the recorded videos at a speed of 10 frames per second (fps).
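A minimal sketch of the Tracker matching rule of Equations 1 and 2 is shown below. The choice of the mean joint position as the distance measure and the 0.1 threshold value are assumptions for illustration; the disclosure does not specify either.

    import numpy as np

    dist_thresh = 0.1  # threshold in normalized image coordinates (assumed value)

    def dist(sk1: np.ndarray, sk2: np.ndarray) -> float:
        # Compare the mean (x, y) position of each skeleton's detected joints;
        # each skeleton is an array of shape (num_joints, 3): (x, y, confidence).
        return float(np.linalg.norm(sk1[:, :2].mean(axis=0) - sk2[:, :2].mean(axis=0)))

    def same_person(sk1: np.ndarray, sk2: np.ndarray) -> bool:
        # Equation 1 holds when True; Equation 2 holds when False.
        return dist(sk1, sk2) < dist_thresh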

[0069] As discussed in the previous paragraphs, the input for S1 is the images in the training dataset, and its output is the raw skeletons data of the people detected in the training images. The second script (S2) takes in the raw skeletons data as the input and outputs a well-organized skeletons dataset, to make feature extraction easier. This includes computing the exact number of individual skeletons detected in the training images and saving their dataset in a designated folder. S2 also discards all images that do not have skeletons, as well as images with labels different from the defined training classes. In experimental development and testing, all training images fed to S1 had only one person in them, to make the analysis of the accepted images easier. The size of the whole training dataset fed to S1 was 3027 images, and after running S1 and S2, 2370 training images remained (i.e., images that S1 and S2 detected to have skeletons and labels (pre-assigned risk levels) similar to the defined training classes). As shown in FIG. 9, there is a high rejection rate for the medium risk class training images, which might be a result of poor camera placement during the training dataset collection. This is because, when collecting the training dataset, the camera 110 was mounted in a different position before recording training videos for each class, to later evaluate how each camera position affected the model accuracy. Thus, the camera placement when collecting the medium risk safety class training video was the least ideal, and better camera positions can be identified by finding the camera placements that lead to capturing videos with smaller rejection rates (this can be achieved by making sure that all the operator’s limbs are in the camera’s FOV).

[0070] The third script (S3) takes in the well-organized skeletons dataset (S2 output) as its input and processes it to extract features and labels for training, and then saves them in a .csv file. S3 achieves this by extracting features of normalized joint positions using a defined function. As the name suggests, the features of normalized joint positions are scaled using the normalization feature scaling mechanism, which scales features down to between zero and one (for simplicity) before feeding them to the classifier algorithm. For image classification, feature normalization is recommended and very useful, as it allows “comparable effects in distance computations” (e.g., see Xie, L., Tian, Q., & Zhang, B. (2013, September). Feature normalization for part-based image classification. In 2013 IEEE International Conference on Image Processing (pp. 2607-2611), which is incorporated herein by reference in its entirety). After extracting the features, a defined feature processing function converts them from raw features of individual images to time-series features (i.e., features at regular time intervals).
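For illustration, the normalization step described in paragraph [0070] might look like the sketch below, assuming the image width and height serve as the scaling bounds; the function name and array layout are hypothetical.

    import numpy as np

    def normalize_joints(joints: np.ndarray, width: int, height: int) -> np.ndarray:
        # joints: array of shape (num_joints, 2) holding pixel (x, y) positions.
        scaled = joints.astype(np.float32).copy()
        scaled[:, 0] /= float(width)    # x coordinates scaled to [0, 1]
        scaled[:, 1] /= float(height)   # y coordinates scaled to [0, 1]
        return scaled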

[0071] The last two programs/scripts, S4 and S5, are designated for training and testing the model, respectively. S4 takes in the processed features and labels from S3, trains the model using the risk level category classes, and then saves the trained classifier. Depending on the label (class) of a processed skeleton feature, S4 trains the model such that when a similar skeleton is detected in a different image or video, it will be labeled similarly to that processed skeleton feature in the .csv file. This means that the classifier is trained to label each skeleton it encounters as one of the three risk level categories based on the skeleton features used to train it, which makes a good training dataset imperative for designing an accurate model. S5 is the script used to apply the trained and saved classifier to real-world data. By using the OpenCV Python module (an open-source library used for multiple computer vision and ML tasks), S5 outputs a window with an image or a video containing the predicted class for each detected skeleton (e.g., see FIGS. 8A through 8C). For more information regarding OpenCV, see Culjak, I., Abram, D., Pribanic, T., Dzapo, H., & Cifrek, M. (2012, May). A brief introduction to OpenCV. In 2012 Proceedings of the 35th International Convention MIPRO (pp. 1725-1730), which is incorporated herein by reference in its entirety. The numbers on the left side of the images in FIGS. 8A through 8C show the model’s confidence level in its predictions. S5 was designed to use the classifier on images, pre-recorded videos, or livestreaming videos.
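As a non-limiting illustration of the S5 inference step described above, the following minimal sketch loads a saved classifier and overlays the predicted class and confidence on each video frame using OpenCV. The file names, the extract_skeleton_features placeholder, and the on-screen display format are assumptions for illustration, not the actual script.

```python
# Minimal sketch of the S5 inference step: load the classifier saved by S4
# and overlay the predicted risk class on each frame with OpenCV. File names
# and the extract_skeleton_features placeholder are illustrative assumptions.
import cv2
import joblib


def extract_skeleton_features(frame):
    """Hypothetical placeholder for the skeleton detection (OpenPose) and
    feature extraction (S3) steps; returns None when no skeleton is found."""
    return None


clf = joblib.load("trained_classifier.pkl")   # classifier saved by S4 (assumed name)
cap = cv2.VideoCapture("test_video.mp4")      # or 0 for a livestreaming camera

while True:
    ok, frame = cap.read()
    if not ok:
        break
    features = extract_skeleton_features(frame)
    if features is not None:
        label = clf.predict([features])[0]            # predicted risk class
        conf = clf.predict_proba([features]).max()    # confidence of the prediction
        cv2.putText(frame, f"{label} ({conf:.2f})", (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 0, 255), 2)
    cv2.imshow("Predicted safety class", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):             # press 'q' to quit
        break

cap.release()
cv2.destroyAllWindows()
```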

[0072] The Alert/Feedback System

[0073] After the ML model causes the main controller 102 to detect and identify a machinery operator’s safety behavior from livestreaming video, the alert/feedback system 114 notifies the operator when their behavior is classified as medium or high risk by the model. In an example embodiment, the alert system 114 was created using an Arduino Uno embedded microcontroller as the alert system controller 120 (hereinafter “Arduino”), an LED as the visual output device 116, and a 2 kHz 5 V Adafruit buzzer as the audible output device 118. To establish communication between the ML model and the Arduino, the serial Python module was used to allow serial communication between the Python integrated development environment (IDE) and the Arduino. When the ML model identifies the detected tractor operator’s behavior, it sends a specified character to the Arduino through serial communication at a baud rate of 9600 (i.e., 9600 bits per second), with a different character transmitted for each class. The Arduino’s serial monitor receives the transmitted character, and an Arduino script uploaded to the microcontroller issues alerting instructions accordingly.
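For illustration, the following is a minimal sketch of the Python side of this serial link, using the pySerial module at the stated 9600 baud rate. The port name and the class-to-character mapping are assumptions, as the actual characters are not specified in the text.

```python
# Minimal sketch of the Python side of the serial link to the Arduino,
# using the pySerial module at 9600 baud. The port name and the
# class-to-character mapping are illustrative assumptions.
import serial

CLASS_TO_CHAR = {"low": b"L", "medium": b"M", "high": b"H"}  # assumed mapping

ser = serial.Serial("/dev/ttyACM0", 9600)  # port name varies by platform


def send_alert(predicted_class: str) -> None:
    """Transmit the character corresponding to the detected risk class."""
    ser.write(CLASS_TO_CHAR[predicted_class])
```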

[0074] The Arduino outputs instructions to continuously blink the LED, sound the buzzer, or do both, based on the received character. In an example embodiment, when the received character indicates that a medium risk safety behavior was detected, the Arduino gives instructions to continuously blink the LED, and when the received character indicates that a high risk safety behavior was detected, the Arduino gives instructions to both blink the LED and sound the buzzer (e.g., see process 300 in FIG. 3, which was previously discussed herein). When the received character indicates that a low risk safety behavior was detected, the alert system 114 does nothing; however, in other embodiments, the alert system 114 may be further configured to provide feedback on low risk behavior. Also, to create a complete stand-alone system bundle, a Jetson Nano 2GB embedded computer (or a similarly specced device) can be used as the main controller 102 to run the ML model and the alert system 114, because of its small, convenient size and processing ability adequate to run the safety monitoring system 100.

[0075] Results and Discussion

[0076] Before training, S4 split the training dataset into a training set and a testing set (known as the train-test split). For more information regarding the train-test split, see Tan, J., Yang, J., Wu, S., Chen, G., & Zhao, J. (2021). A critical look at the current train/test split in machine learning. arXiv preprint arXiv:2106.04525, which is incorporated herein by reference in its entirety. After the train-test split, out of the 2370 images in the training dataset, 1659 images were designated for training (elements of the training set) and 711 images were designated for testing (elements of the testing set). This was done to test the classifier after training, and it provides a good estimate of the model’s accuracy because it uses testing data derived from a processed dataset containing only labeled images with detectable skeletons. This means that the classifier trained only on the data remaining after the train-test split (i.e., the 1659 images in the training set), because the remaining data (i.e., the testing set derived from the train-test split) was used to evaluate the model’s accuracy. The model accuracy on the testing set was 97% (or 0.97), which was the ratio of the number of accurate predictions to the size of the testing set. This is a very good accuracy rating, though it might be somewhat higher than the actual accuracy because, in real-time applications, the model will not only be tested on processed datasets with detectable skeletons. In some real-world situations, skeletons may not be easily detectable in images, and differences in lighting conditions might also affect the model’s ability to detect skeletons. However, this accuracy rating was a good estimate of the model’s prediction ability, because the training dataset was adequately diverse, accounting for multiple situations (e.g., outdoors and indoors) and different camera placements.
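As a non-limiting sketch of this step, the following reproduces the described split with scikit-learn’s train_test_split, assuming the features and labels have been loaded from the S3 .csv file. The file name and classifier type are illustrative assumptions, as they are not specified here.

```python
# Minimal sketch of the train-test split and accuracy evaluation, using
# scikit-learn. test_size=0.3 reproduces the reported 1659/711 split of the
# 2370 processed images; the file name and classifier type are assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

data = pd.read_csv("features_and_labels.csv")   # S3 output (assumed name)
X = data.drop(columns=["label"]).to_numpy()
y = data["label"].to_numpy()

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500)
clf.fit(X_train, y_train)               # train only on the training set
accuracy = clf.score(X_test, y_test)    # correct predictions / testing set size
print(f"Testing set accuracy: {accuracy:.2f}")   # 0.97 reported in the text
```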

[0077] The results obtained from evaluating the model accuracy using the testing set are presented in Table 1 of FIG. 10 using a confusion matrix, which is a method of visualizing a classifier’s prediction results by showing the number of both correct and incorrect predictions in each class. The values in the asterisk-marked cells (in Table 1) are the numbers of images in the testing set for which the classifier correctly predicted the operator’s safety behavior they depict, and the values in the other cells are the numbers of images that the classifier labeled incorrectly in each class. Other metrics used to evaluate the model’s accuracy are the precision, recall, and F1-score metrics. To explain these metrics, let the number of correct predictions in each class be C_p, the number of images predicted by the classifier to be in the same class (for each class) be I, and the number of images belonging to the same class from the testing set (for each class) be T. Then, precision for a specific class is the ratio of the correctly predicted images in that class (C_p) to the total number of images predicted by the classifier as part of that class (I). A high precision rate implies a low rate of incorrect predictions.

[0078] Precision (P) = C_p / I (Equation 3)

[0079] The recall metric for each class is the ratio of the correctly predicted images in that class (C_p) to the actual number of images that belong in that class (T).

[0080] Recall (R) = C_p / T (Equation 4)

[0081] Lastly, the F1-score is the harmonic mean of precision and recall, and it is given by Equation 5.

[0082] F1 = 2 × (P × R) / (P + R) (Equation 5)

[0083] The values of the precision, recall, and F1-score metrics for each class are given in Table 2 of FIG. 11, and they show the classifier to be adequately accurate, although, for the aforementioned reasons, the model might not score as high when used on unprocessed data.
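To make Equations 3 through 5 concrete, the following minimal sketch computes per-class precision, recall, and F1-score from a confusion matrix. The matrix values are placeholders for illustration, not the figures reported in Table 1.

```python
# Minimal sketch of computing precision, recall, and F1-score per class
# from a confusion matrix (Equations 3 through 5). The matrix values are
# placeholders, not the actual figures from Table 1.
import numpy as np

# rows = true class, columns = predicted class (low, medium, high)
cm = np.array([
    [230,   5,   2],
    [  6, 220,   8],
    [  1,   4, 235],
])

for k, name in enumerate(["low", "medium", "high"]):
    c_p = cm[k, k]            # correct predictions in class k (C_p)
    i_k = cm[:, k].sum()      # images predicted as class k (I in Equation 3)
    t_k = cm[k, :].sum()      # images actually in class k (T in Equation 4)
    precision = c_p / i_k                                # Equation 3
    recall = c_p / t_k                                   # Equation 4
    f1 = 2 * precision * recall / (precision + recall)   # Equation 5
    print(f"{name}: P={precision:.2f}, R={recall:.2f}, F1={f1:.2f}")
```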

[0084] Conclusions

[0085] Although a few adjustments are needed to make the safety monitoring system 100 more robust, it functions sufficiently well to be viewed as a promising approach for monitoring agricultural machinery operators’ safety behaviors. The ML model has very good accuracy, and by training it with a more diverse dataset that addresses different lighting conditions, operator attire such as hats and glasses, and different camera positions, the model can be made more suitable for real-world operating environments. Apart from the ML model, the alert system 114 is also very effective, since it engages both the sight and sound senses. In further embodiments, the system can be designed to be less dependent on visualizing the operator’s whole body to detect and identify their safety behaviors. Also, ML model optimization approaches like FSPT can be used to further improve the training dataset by creating it such that it represents larger portions of the feature space.

[0086] Although the technology has been described with reference to the embodiments illustrated in the attached drawing figures, equivalents may be employed, and substitutions may be made herein without departing from the scope of the technology as recited in the claims. Components illustrated and described herein are examples of devices and components that may be used to implement the embodiments of the present invention and may be replaced with other devices and components without departing from the scope of the invention. Furthermore, any dimensions, degrees, and/or numerical ranges provided herein are to be understood as non-limiting examples unless otherwise specified in the claims.