PEREZ LANCE C (US)
SCHMIDT TY (US)
MOTE BENNY (US)
NUTECH VENTURES (US)
US20190138801A1 | 2019-05-09
US20200226360A1 | 2020-07-16
US20200120899A1 | 2020-04-23
US20160026895A1 | 2016-01-28 |
WHAT IS CLAIMED IS:
1. A computer-implemented method of tracking animals, the method comprising: recognizing, by using at least one data processor, individual animals in images of a plurality of the animals; and tracking the animals using a probabilistic tracking-by-detection process.
2. The method of claim 1 in which recognizing individual animals comprises using a convolutional detector to recognize the individual animals to provide visible key points of the individual animals, and the probabilistic tracking-by-detection process comprises using, as input, the visible key points of the individual animals provided by the convolutional detector to track the animals over a period of time.
3. The method of claim 1 or 2 in which the convolutional detector comprises a convolutional neural network.
4. The method of any of claims 1 to 3 in which tracking the animals over a period of time comprises tracking the animals over a period that ranges from an hour to a year.
5. The method of claim 4 in which tracking the animals over a period of time comprises tracking the animals over a period that ranges from an hour to a day.
6. The method of claim 4 in which tracking the animals over a period of time comprises tracking the animals over a period that ranges from a day to a week.
7. The method of claim 4 in which tracking the animals over a period of time comprises tracking the animals over a period that ranges from a week to a month.
8. The method of claim 4 in which tracking the animals over a period of time comprises tracking the animals over a period that ranges from a month to 3 months.
9. The method of claim 4 in which tracking the animals over a period of time comprises tracking the animals over a period that ranges from 3 months to 6 months.
10. The method of claim 4 in which tracking the animals over a period of time comprises tracking the animals over a period that ranges from 6 months to a year.
11. The method of any of claims 1 to 3 in which tracking the animals over a period of time comprises tracking the animals over a period that ranges from 1 to 10 years.
12. The method of any of claims 1 to 11 in which individual animals have ear tags, and the method comprises using a classification network to assign unique identification to the individual animals.
13. The method of any of claims 1 to 12, comprising using fixed cardinality of the individual animals to generate a continuous set of tracks of the animals, and using a forward-backward algorithm to assign ear-tag identification probabilities to each detected animal.
14. The method of any of claims 1 to 13, comprising using consumer-grade hardware to achieve real-time multi-animal tracking.
15. The method of any of claims 1 to 14 in which the animals comprise at least one of pigs, cows, horses, sheep, lambs, llamas, or alpacas.
16. The method of any of claims 1 to 15 in which the images comprise images of animals in an enclosed environment, and the images are captured by at least one camera positioned above the animals.
17. The method of any of claims 1 to 16, comprising using a deep, fully-convolutional network to detect each animal as a collection of anatomical features.
18. The method of any of claims 1 to 17, comprising tracking individual animals across frames of a video using a tracking algorithm that relies on the number of targets being fixed and known to the tracking algorithm.
19. The method of any of claims 1 to 18, comprising using ear tags for visual identification of each animal, and using a maximum a posteriori (MAP) forward-backward process to assign ear tag identities to individual animals by merging ear tag classification probabilities with frame-to-frame movements.
20. The method of any of claims 1 to 19, comprising representing and tracking animals as a collection of body parts, and inferring activities of the animals and interactions among the animals.
21. The method of any of claims 1 to 20 in which the images show animals living in a fixed group-house environment.
22. The method of any of claims 1 to 21 in which the images comprise video frames obtained from a static camera mounted above the group-house environment.
23. The method of claim 22 in which the field of view of the camera encompasses the entire living space of the animals in the group-house environment.
24. The method of claim 22 or 23 in which the number of animals remains constant during a period of time when the animals are being tracked, and each animal is equipped with a unique visual marker.
25. The method of any of claims 1 to 24, comprising processing frames of a video using an instance detection and part localization module to detect target animals and determine image coordinates of each instance of an animal.
26. The method of claim 25 in which, for the t-th frame, the set of N_t instances detected by the instance detection and part localization module is denoted:
27. The method of claim 26, comprising, after detecting instances of the animals, using a sequence of detected target locations to construct continuous tracks for N target animals.
28. The method of claim 27, comprising tracking the animals by removing high-cost detections whenever the cost exceeds a threshold, in which the cost is defined for each instance using the two-dimensional shoulder and tail coordinates that define the location of the instance, the estimated shoulder coordinates taken from the tail coordinate, the estimated tail coordinates taken from the shoulder coordinate, and the scores output in the shoulder and tail detection channels of the network output.
29. The method of claim 28 in which the shoulder and tail location estimates are accurate such that the estimated coordinates approximate the corresponding detected coordinates.
30. The method of claim 28 or 29 in which the cost of an instance increases as the scores of the shoulder and tail detections decrease.
31. The method of any of claims 28 to 30 in which the minimum values of the shoulder and tail scores are lower bounded at 0.25, so that the most these terms can increase the cost is by a factor of 2.
32. The method of any of claims 28 to 31 in which, when the values of the shoulder and tail scores are below 0.25, the corresponding parts are not detected and do not contribute to an instance.
33. The method of any of claims 28 to 32 in which, when the values of the shoulder and tail scores are both equal to one, the cost is decreased by a factor of 2.
34. The method of any of claims 1 to 33, comprising limiting the number of detections per frame and approximating a set of N continuous tracks using a fixed-cardinality track interpolation algorithm.
35. The method of claim 34 in which the fixed-cardinality track interpolation algorithm comprises:
36. The method of any of claims 1 to 35 in which recognizing individual animals comprises, by using the at least one data processor, recognizing a plurality of body parts of a plurality of animals based on images of the animals, in which the plurality of body parts include a plurality of types of body parts, including determining first estimated positions of the recognized body parts in the images.
37. The method of claim 36 in which recognizing individual animals comprises, by using the at least one data processor, recognizing a plurality of first associations of body parts based on the images of the animals, in which each first association of body parts associates a body part of an animal with at least one other body part of the same animal, including determining relative positions of the body parts in each recognized first association of body parts in the images.
38. The method of claim 37 in which recognizing individual animals comprises, by using the at least one data processor, determining, based on the first estimated positions of the recognized body parts and the relative positions of the body parts in the recognized first associations of body parts, second associations of body parts in which each second association of body parts associates a recognized body part of an animal with at least one other recognized body part of the same animal.
39. The method of claim 37 in which recognizing individual animals comprises, by using the at least one data processor, recognizing the individual animals in the images based on the second associations of body parts of the animals.
40. The method of any of claims 1 to 39 in which the images show at least one of (i) animals that have various ages, (ii) animals that have various sizes, (iii) animals that have various activity levels, or (iv) animals in various lighting conditions.
41. The method of any of claims 1 to 40 in which the images show at least one of (i) pigs that congregate or pile up together at night, (ii) pigs that chase each other around a pen, or (iii) pigs moving throughout a pen during daytime.
42. The method of any of claims 1 to 41 in which the animals are tracked with an average precision and recall greater than 50%.
43. The method of claim 42 in which the animals are tracked with an average precision and recall greater than 80%.
44. The method of claim 43 in which the animals are tracked with an average precision and recall greater than 90%.
45. The method of claim 44 in which the animals are tracked with an average precision and recall greater than 95%.
46. The method of claim 45 in which the animals are tracked with an average precision and recall greater than 98%.
47. The method of any of claims 1 to 46 in which the images of the animals comprise a video captured at N frames per second, and N is an integer between 1 and 60.
48. The method of claim 47 in which N is equal to 5.
49. The method of any of claims 1 to 48, comprising providing a probabilistic framework for merging classification likelihoods with detections.
50. The method of any of claims 1 to 49, comprising using association vectors to evaluate the probability that an ear tag belongs to an animal instance.
51. The method of claim 50, comprising initializing the probability of assigning an instance to a specific identity with a uniform probability, and, for each tag and each detected instance, modifying the probability using a weighted summation of the network output and the uniform probability.
52. The method of claim 51, comprising determining a first probability of the observation given a specific identity for the target, determining a second probability of a target transitioning between frames from one location to another, and using the first and second probabilities to calculate a maximum a posteriori (MAP) estimate of each target's identity.
53. The method of any of claims 1 to 52, comprising: applying at least one recognition module to at least one image of animals to recognize body parts of the animals, in which the body parts include a plurality of types of body parts, and the at least one recognition module outputs first estimated positions of the recognized body parts in the at least one image; applying the at least one recognition module to the at least one image of animals to recognize first associations of body parts of the animals, in which each first association of body parts associates a body part of an animal with at least one other body part of the same animal, and the at least one recognition module outputs relative positions of the body parts in each recognized first association of body parts; determining, based on the first estimated positions of the recognized body parts and the relative positions of the body parts in the recognized first associations of body parts, second associations of body parts in which each second association of body parts associates a recognized body part of an animal with at least one other recognized body part of the same animal; and recognizing individual animals in the at least one image based on the second associations of body parts of the animals.
54. The method of any of claims 1 to 53, comprising: applying at least one recognition module to at least one image of animals to recognize individual body parts of the animals, wherein the at least one recognition module outputs first estimated locations of the recognized individual body parts in the at least one image; applying the at least one recognition module to the at least one image of animals to recognize groups of body parts of the animals, wherein the at least one recognition module outputs relative positions of the body parts in each recognized group of body parts; determining associations of recognized individual body parts based on (i) the first estimated locations of the recognized individual body parts of the animals and (ii) the relative positions of the body parts in the recognized groups of body parts; and recognizing individual animals in the at least one image based on the associations of recognized individual body parts of the animals.
55. The method of any of claims 1 to 54, comprising: applying at least one recognition module to at least one image of pigs to recognize body parts of the pigs, in which the body parts include shoulder portions, tail portions, left ears, and right ears of the pigs, wherein the at least one recognition module outputs first estimated locations of the recognized shoulder portions, the recognized tail portions, the recognized left ears, and the recognized right ears in the at least one image; applying the at least one recognition module to the at least one image of pigs to recognize pairs of body parts of the pigs, including recognizing a pair of shoulder portion and tail portion of each of at least some of the pigs, recognizing a pair of shoulder portion and left ear of each of at least some of the pigs, and recognizing a pair of shoulder portion and right ear of each of at least some of the pigs, and wherein the at least one recognition module outputs a position of the tail portion relative to the corresponding shoulder portion in each recognized pair of shoulder portion and tail portion, a position of the left ear relative to the corresponding shoulder portion in each recognized pair of shoulder portion and left ear, and a position of the right ear relative to the corresponding shoulder portion in each recognized pair of shoulder portion and right ear; determining, for each of at least some of the recognized shoulder portions, an association with a recognized tail portion, a recognized left ear, and a recognized right ear of the same pig based on (i) the first estimated positions of the recognized shoulder portions, tail portions, left ears, and right ears, and (ii) the relative positions of the tail portion and the corresponding shoulder portion in each recognized pair of shoulder portion and tail portion, the relative positions of the left ear and the corresponding shoulder portion in each recognized pair of shoulder portion and left ear, and the relative position of the right ear and the corresponding shoulder portion in each recognized pair of shoulder portion and right ear; and recognizing individual pigs in the at least one image of pigs based on the associations of recognized shoulder portions with recognized tail portions.
56. A system for tracking animals, comprising: at least one data processor; and at least one storage device storing instructions that, when executed by the at least one data processor, perform the method of any of claims 1 to 55.
57. The system of claim 56, further comprising at least one image capturing device for obtaining the at least one image of the animals.
58. A system for recognizing animals, comprising: an instance detection and part localization module; a visual marker classification module; a fixed-cardinality track interpolation module; and a maximum a posteriori estimation of animal identity module.
59. The system of claim 58, comprising: at least one body-part recognition module that is configured to recognize body parts of animals in at least one image of the animals, in which the body parts include a plurality of types of body parts, and the at least one recognition module outputs first estimated positions of the recognized body parts in the at least one image, the at least one body-part recognition module being further configured to recognize first associations of body parts of the animals, in which each first association of body parts associates a body part of an animal with at least one other body part of the same animal, and the at least one recognition module outputs relative positions of the body parts in each recognized first association of body parts; an association module configured to determine, based on the first estimated positions of the recognized body parts and the relative positions of the body parts in the recognized first associations of body parts, second associations of body parts in which each second association of body parts associates a recognized body part of an animal with at least one other recognized body part of the same animal; and an animal recognition module configured to recognize individual animals in the at least one image based on the second associations of body parts of the animals.
60. The system of claim 58 or 59, comprising: at least one body-part recognition module configured to process at least one image of animals to recognize individual body parts of the animals, wherein the at least one body-part recognition module outputs first estimated locations of the recognized individual body parts in the at least one image, the at least one body-part recognition module being further configured to process the at least one image of animals to recognize groups of body parts of the animals, wherein the at least one body-part recognition module outputs relative positions of the body parts in each recognized group of body parts; an association module configured to associate each of at least some of the recognized individual body parts with at least one other recognized individual body part of the same animal based on (i) the first estimated locations of the recognized individual body parts of the animals and (ii) the relative positions of the body parts in the recognized groups of body parts; and an animal recognition module configured to recognize individual animals in the at least one image based on the associations of recognized individual body parts of the animals.
61. The system of any of claims 58 to 60, comprising: at least one pig-part recognition module configured to process at least one image of pigs to recognize body parts of the pigs, in which the body parts include shoulder portions, tail portions, left ears, and right ears of the pigs, wherein the at least one pig-part recognition module is configured to output first estimated locations of the recognized shoulder portions, the recognized tail portions, the recognized left ears, and the recognized right ears in the at least one image of pigs, the at least one pig-part recognition module being further configured to process the at least one image of pigs to recognize pairs of body parts of the pigs, including recognizing a pair of shoulder portion and tail portion of each of at least some of the pigs, recognizing a pair of shoulder portion and left ear of each of at least some of the pigs, and recognizing a pair of shoulder portion and right ear of each of at least some of the pigs, and wherein the at least one pig-part recognition module is configured to output a position of the tail portion relative to the corresponding shoulder portion in each recognized pair of shoulder portion and tail portion, a position of the left ear relative to the corresponding shoulder portion in each recognized pair of shoulder portion and left ear, and a position of the right ear relative to the corresponding shoulder portion in each recognized pair of shoulder portion and right ear; a pig-part association module configured to determine, for each of at least some of the recognized shoulder portions, an association with a recognized tail portion, a recognized left ear, and a recognized right ear of the same pig based on (i) the first estimated positions of the recognized shoulder portions, tail portions, left ears, and right ears, and (ii) the relative positions of the tail portion and the corresponding shoulder portion in each recognized pair of shoulder portion and tail portion, the relative positions of the left ear and the corresponding shoulder portion in each recognized pair of shoulder portion and left ear, and the relative position of the right ear and the corresponding shoulder portion in each recognized pair of shoulder portion and right ear; and a pig-recognition module configured to recognize individual pigs in the at least one image of pigs based on the associations of recognized shoulder portions with recognized tail portions.
62. A machine-readable medium storing instructions that, when executed by a machine, cause the machine to perform the method of any of claims 1 to 55.
3.3. Visual Marker Classification

In applications where unique visual identification of animals is important, it is common for livestock to be issued permanent ear tags. Serial numbers are common; however, they are not ideal for visual identification. Therefore, a different set of tags was designed and used in this work. The set of 16 tags, illustrated in Figure 5, includes a variety of different color/alphanumeric character combinations. The specific combinations chosen here were intended to be easily recognizable for people, even in difficult viewing conditions. For the proposed tracking system, the tags serve as an absolute way to identify each animal and recover from tracking errors.

When an ear is located in the image, the section of the image centered at the ear is cropped to a 65 × 65 image. The cropped image is then processed by the convolutional neural network shown in Figure 6 to provide a likelihood that the observed ear is equipped with one of the known tags. The network was designed using the DenseNet architecture [68] (with k = 8). At each time step t, an observation I_t is made regarding the specific identity of each left or right ear location, denoted r_t^i or l_t^i, respectively. The ear location will be denoted e_t^i to simplify notation, and any operation that applies to e_t^i applies to both r_t^i and l_t^i. In this case, the observation is confined to a 65 × 65 window around the animal's ear. The trained network uses this observation to derive the probability of ear tag e_t^i having identity n in {1, ..., N}, given an observation I_t.

Target instances are defined by pairs of shoulder and tail locations. The network provides association vectors to predict the locations of shoulders from both the right and left ear. Thus, instead of making hard decisions regarding which ear belongs to which instance, the association vectors are used to evaluate the probability that an ear tag belongs to an instance. Specifically, the average back-and-forth distance between ears and shoulders is computed. As this distance increases, the probability that the ear is linked to the shoulder is decreased with a decaying exponential, where a lower bound of 10^-6 prevents network over-confidence from creating instability. Finally, the probability of assigning a specific identity to an instance is initialized with a uniform probability of 1/N and, for each tag and each detected instance, the probability is modified using a weighted summation of the network output and the uniform probability. In the extremes, this reduces to the uniform probability when none of the tag locations are strongly linked to the instance location, and it follows the network output when ear tag e_t^j is a highly confident match to instance location x_t^i. It should also be noted that, when all tags are equally likely to be observed, the probability of the observation does not affect probability maximization and can be ignored for the purposes of optimization.

3.4. Maximum A-Posteriori (MAP) Estimation of Animal Identity

In livestock tracking applications with frame rates exceeding 4 fps, targets move very little between frames. Therefore, a "stay put" motion model is adopted here. Let p(x_t^i | x_{t-1}^j) be the probability of transitioning to state x_t^i given that the tracked target was previously in state x_{t-1}^j, and let a distance be defined between x_t^i and x_{t-1}^j. Using a labeled dataset, described in detail in Section 5, a set of 1.73 million samples was collected; its distribution is given by the blue dots in Figure 7.
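To make the linkage and identity computation described above concrete, the following Python sketch shows one possible realization; the decay constant lambda_px, the helper names, and the array layout are illustrative assumptions rather than the implementation described in this document.

```python
import numpy as np


def ear_to_instance_probability(ear_xy, shoulder_xy,
                                shoulder_from_ear_xy, ear_from_shoulder_xy,
                                lambda_px=50.0):
    """Probability that a detected ear is linked to an instance's shoulder.

    The average back-and-forth distance between the ear and the shoulder
    (shoulder predicted from the ear vs. actual shoulder, and ear predicted
    from the shoulder vs. actual ear) is passed through a decaying
    exponential, lower-bounded at 1e-6 so an over-confident network cannot
    zero out a hypothesis. lambda_px is an assumed decay constant.
    """
    d = 0.5 * (np.linalg.norm(np.asarray(shoulder_xy) - np.asarray(shoulder_from_ear_xy))
               + np.linalg.norm(np.asarray(ear_xy) - np.asarray(ear_from_shoulder_xy)))
    return max(np.exp(-d / lambda_px), 1e-6)


def identity_probability(tag_likelihoods, link_prob, num_ids):
    """Weighted summation of the classifier output and a uniform prior.

    A weak link keeps the identity vector at the uniform 1/N prior; a strong
    link hands the decision to the ear tag classifier's likelihood vector.
    """
    uniform = np.full(num_ids, 1.0 / num_ids)
    return link_prob * np.asarray(tag_likelihoods) + (1.0 - link_prob) * uniform
```

In the two extremes, a link probability near zero leaves the identity vector at the uniform prior, while a link probability near one follows the ear tag classifier output, mirroring the behavior described above.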
The empirical distance distribution shown in Figure 7 cannot be closely approximated by a single exponential distribution. Instead, it requires a weighted sum of three exponential distributions to achieve the approximation illustrated by the orange line in Figure 7.

Equation (4) provides the likelihood of the observation given a specific identity for the target, and Equation (6) provides the probability of a target transitioning between frames from one location to another. Together, these two probabilities make it possible to calculate the Maximum A-Posteriori (MAP) estimate of each target's identity. The proposed method aims to evaluate the probability that target n exists in state x_t^i given the entire sequence of observations {I_1, ..., I_T}. This probability will be written in shortened form to simplify notation, and it is assumed that the following operations are performed separately for all n = 1, ..., N. If conditional independence between past and future observations given the current state is assumed, the probability can be represented as the product of a term that depends on the past observations I_{1:t} and a term that depends on the future observations I_{t+1:T}, where I_{a:b} = {I_a, ..., I_b} is used to simplify notation. The probability of the observations themselves does not affect maximization, so the expression can be further reduced. This set of posterior marginals can be found using the forward-backward algorithm, which operates by sequentially computing the forward probabilities and backward probabilities at each time step t = 1, ..., T. The forward probabilities are updated sequentially over the N states, the backward probabilities are updated with a corresponding backward recursion, and the posterior marginal probability at each time step is then obtained by combining the forward and backward terms.

In theory, the standard form of the forward-backward algorithm is suitable for evaluating and comparing the probabilities of target memberships. In practice, however, when implemented in software with floating-point precision variables, underflow becomes an unavoidable problem. Essentially, the magnitudes of the probabilities become so low that they reach the lower limit of the variable type and are either forced to zero or set to a fixed lower bound. In either case, the value of the probabilities is no longer accurate, creating instability in the system. To avoid underflow, the forward-backward algorithm can be implemented using the log-sum-exp method [70]. This approach operates by adding the logarithms of the probabilities instead of multiplying them, creating a much wider dynamic range. However, because the original expressions for the forward and backward terms include summations, it is necessary to add an additional exponent and logarithm. In the resulting expression for the logarithm of the forward term, there remains a significant risk of underflow when the values of a_{x_{t-1}} become large-magnitude negative numbers. For this reason, the value a_max = max_{x_{t-1}} a_{x_{t-1}} is computed and subtracted from each term within the summation. The revised expression sets the largest argument within the exponent to zero and then adds back the value of a_max outside of the summation. The expressions for the logarithm of the backward term perform an equivalent set of tricks to avoid underflow. Finally, the logarithm of the marginal probability is obtained by combining the logarithms of the forward and backward terms and, as discussed earlier, this probability is calculated for each n = 1, ..., N.
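As a hedged sketch of the log-domain forward-backward computation described above (not the MATLAB implementation referenced later in this document), the recursion can be written as follows; the uniform initial distribution and the array layout are assumptions.

```python
import numpy as np
from scipy.special import logsumexp


def log_forward_backward(log_obs, log_trans):
    """Un-normalized log posterior marginals for one identity, in log space.

    log_obs:   (T, S) array, log_obs[t, x] = log p(I_t | x_t = x).
    log_trans: (S, S) array, log_trans[x_prev, x] = log p(x_t = x | x_{t-1} = x_prev),
               e.g. from the "stay put" motion model fitted to Figure 7.
    """
    T, S = log_obs.shape
    log_alpha = np.empty((T, S))
    log_beta = np.empty((T, S))

    # Forward pass: log a_t(x) = log p(I_t|x) + LSE_{x'} [log a_{t-1}(x') + log p(x|x')]
    log_alpha[0] = log_obs[0] - np.log(S)  # assumed uniform initial distribution
    for t in range(1, T):
        log_alpha[t] = log_obs[t] + logsumexp(
            log_alpha[t - 1][:, None] + log_trans, axis=0)

    # Backward pass: log b_t(x') = LSE_x [log p(x|x') + log p(I_{t+1}|x) + log b_{t+1}(x)]
    log_beta[T - 1] = 0.0
    for t in range(T - 2, -1, -1):
        log_beta[t] = logsumexp(
            log_trans + (log_obs[t + 1] + log_beta[t + 1])[None, :], axis=1)

    return log_alpha + log_beta  # un-normalized log posterior marginals
```

SciPy's logsumexp applies the same max-subtraction trick described above, which keeps the summations numerically stable even when the log probabilities are large-magnitude negative numbers.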
An optimal bipartite assignment for each frame t is then achieved by applying the Hungarian algorithm to minimize an N × N matrix of costs constructed from these per-identity probabilities. The output of the assignment is an ordered set of detections, one per identity.

4. Training Details and Evaluation Methodology

Tracking performance is evaluated on a collection of videos by comparing the system outputs to human annotations, where both the shoulder-tail location and the ear tag ID are provided for each animal in each frame. The following three scenarios are considered in the evaluation.

1. Location: The user is only interested in the location/orientation of each animal and the specific ID can be ignored. This scenario applies when only pen-level metrics are desired, such as average distance traveled per animal or pen space utilization.
2. Location and ID (Initialized): Both the location/orientation and the ID of each animal are desired, and human annotations are provided for the first frame. This scenario assumes that several videos are being processed in sequence and that tracking results from the previous video are available. Location/orientation with ID are important for individualized metrics, such as monitoring health and identifying aggressors.
3. Location and ID (Uninitialized): This scenario is the same as Location and ID (Initialized), except that human annotations are not provided for the first frame. This is the most challenging scenario because it forces the method to visually ID each animal from intermittent views of the ear tags within the time span of the video.

The method described in Section 3 is evaluated according to each of these scenarios in Section 5. In the following, the network training used to convert ear tag views into likelihood vectors is described in Section 4.1. Then, the dataset used for evaluation is described in detail in Section 4.2, and the metrics used for tracking success and failure are defined in Section 4.3.

4.1. Ear Tag Classification

The proposed method identifies both the location and the ID of each pig via separate networks. The dataset used to train the detector was introduced and provided by [28]. A set of 13,612 cropped color images of ear tag locations was used to train a classification network. A separate network was trained for grayscale (infrared) images using 6819 cropped images. The crops were labeled via human annotation as either belonging to one of the 16 known ear tags or to a category of "unknown tag ID." When a tag image is classified as unknown tag ID, its target likelihood vector for training is set to a uniform value across all categories. Figure 8 provides eight samples of each tag category along with 32 examples of unknown tag ID for both color and grayscale images. Ear tag classification training was done using stochastic gradient descent with momentum (0.9). It is important to note that, while the output is passed through a softmax layer to ensure a valid probability vector, training is done with MSE regression on the outputs. This allows the network to target both one-hot vectors and uniform probabilities.

4.2. Dataset Description

To evaluate the proposed tracking method, a human-annotated dataset was created. The data, along with cropped ear tag images and their corresponding categorizations, is available for download at psrg.unl.edu/Projects/Details/12-Animal-Tracking. It contains a total of 15 videos, each of which is 30 min in duration. The resolution of the videos is 2688 × 1520, and each was captured and annotated at 5 frames per second (fps).
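A minimal sketch of the per-frame identity assignment described at the start of this section: the log posteriors for all identity/detection pairs are negated to form an N × N cost matrix, and a bipartite assignment selects one detection per identity. scipy.optimize.linear_sum_assignment is used here as a stand-in for the Hungarian solver; it is an assumption, not the implementation referenced in this document.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def assign_identities(log_posteriors):
    """Assign each of N identities to one of N detections in a single frame.

    log_posteriors: (N, N) array where entry [n, i] is the (un-normalized)
    log posterior probability that identity n occupies detection i.
    Returns the detection index chosen for each identity, ordered by identity.
    """
    cost = -log_posteriors                      # maximizing probability = minimizing cost
    row_ind, col_ind = linear_sum_assignment(cost)
    order = np.empty(log_posteriors.shape[0], dtype=int)
    order[row_ind] = col_ind                    # detection chosen for each identity
    return order
```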
The 5 fps frame rate was chosen empirically because it was deemed the minimum rate at which a human observer could comfortably interpret and annotate the video while keeping up with nearly all kinds of movement in the pen environment. Higher frame rates are nearly always better for tracking, but they come at the expense of increased processing times and, after a certain point, the improvements to tracking become negligible. The videos depict different environments, numbers of pigs, ages of pigs, and lighting conditions. Table 1 summarizes the videos and their properties. Figure 9 shows the first frame of each video with each pig's shoulder, tail, and ID illustrated via annotation. Note that annotations are provided for every frame of the video, but only the first frame is shown here.

Table 1. Properties of the fifteen videos captured and annotated for tracking performance analysis. For each age range (nursery, early finisher, and late finisher), three videos were captured during the day with the lights on and two videos were captured at night using IR video capture and IR flood lights to illuminate the scene. The activity level of the pigs in each video was subjectively categorized, ranging from High (H) to Low (L).

4.3. Performance and Analysis

To analyze tracking performance, a matched detection and a missed detection must be defined. Unlike many tracking applications, the number of targets in the field of view remains constant in group-housing livestock facilities, and the ground-truth position of the head and tail of each target is provided in each frame. Furthermore, it is assumed that the tracker knows how many targets are in the environment, so the number of detections provided by the tracker and the number of targets in the scene are always equal. Let the collection of N shoulder-tail pixel coordinates for T frames of a video sequence provided by a tracking algorithm be given, together with the corresponding ground-truth human annotations. The distance between predicted target i's position and the actual position of target i in frame t is defined as the sum of the shoulder-to-shoulder and tail-to-tail distances, and the length of the ground-truth target is measured from shoulder to tail. Given these two definitions, successful matching events are defined as follows. The first condition states that detection i must be closest to ground truth i and vice versa, while the sum of the shoulder-to-shoulder and tail-to-tail distances must not exceed the shoulder-to-tail distance of the ground truth. This threshold, while heuristic, adapts to pigs of any size and ensures that the detected and ground-truth locations are a plausible match. The second condition is less strict than the first. It imposes a back-and-forth matching criterion that requires the minimum-distance match for the detection to also be the minimum-distance match for the ground truth, but their indices (tag IDs) do not need to coincide.

5. Results

The results of evaluating the proposed tracking method on the dataset are provided in Table 2. It is worth noting that, because the number of targets is known to the detector and each target's location is approximated in each frame, the number of false positives and false negatives is equal. Thus, precision and recall are the same.

Table 2. Precision/recall results for all 15 videos in the human-annotated dataset.

The precision/recall results in "Location" do not require the tracker to provide the correct ID for animals. Instead, it is only required that each animal's location is matched with a detection.
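For illustration, a hedged sketch of the first (identity-matched) matching condition just described; the dictionary representation of a detection and the helper names are assumptions, not the evaluation code used for Table 2.

```python
import numpy as np


def pair_distance(det, gt):
    """Sum of shoulder-to-shoulder and tail-to-tail pixel distances."""
    return (np.linalg.norm(np.asarray(det["shoulder"]) - np.asarray(gt["shoulder"]))
            + np.linalg.norm(np.asarray(det["tail"]) - np.asarray(gt["tail"])))


def is_matched(dets, gts, i):
    """Identity-matched success for index i: detection i and ground truth i
    must be mutual nearest neighbours, and their combined shoulder/tail
    distance must not exceed the ground-truth shoulder-to-tail length, which
    serves as a size-adaptive threshold."""
    nearest_gt = int(np.argmin([pair_distance(dets[i], g) for g in gts]))
    nearest_det = int(np.argmin([pair_distance(d, gts[i]) for d in dets]))
    body_len = np.linalg.norm(np.asarray(gts[i]["shoulder"]) - np.asarray(gts[i]["tail"]))
    return (nearest_gt == i and nearest_det == i
            and pair_distance(dets[i], gts[i]) <= body_len)
```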
The "Location and ID" results in Table 2 require the tracker to correctly identify both the location and the ID of a pig in order for the detection to be counted as a true positive. The "(Uninitialized)" variant does not provide the location and ID of each pig in the first frame, whereas the "(Initialized)" variant does. As anticipated, the worst performance occurs when the locations and IDs of each pig are uninitialized, with an average precision/recall of 0.8251. This situation forces the method to infer the ID of each animal from glimpses of their ear tags within the 30-min duration of the video. The "Late Finisher: Low (Night)" video has the worst performance, at 0.5252 precision/recall. Figure 10 illustrates the ground truth and network output for several error examples, and the top row shows the first frame of the "Late Finisher: Low (Night)" video. Only seven of the 13 pigs are labeled with the correct ID, even though all 13 are detected and oriented correctly. This video is particularly challenging for ear tag classification because, in addition to being captured at night when ear tags are already more difficult to discern, half of the pigs do not significantly change position during the 30-min recording time. Therefore, ear tag presentations are not varied enough to confidently identify each individual pig. It is worth noting that, in an actual deployment of the system where multiple 30-min segments are processed in sequence, there is a good chance that the ear tags will be viewed and classified in preceding videos. The "uninitialized" assumption is really a worst-case scenario that ignores prior observations.

The second row of Figure 10 illustrates a different kind of error. The pig labeled '66' is sitting in the corner of the pen and its tail area is occluded by pig 'II'. Pig 'II' also has its head occluded, and the method, at some point earlier, detected a pig with reversed shoulder and tail at the same location as 'II'. This detection likely occurred when '66' was partially occluded, and the method assigned the erroneous detection to the '66' ID. In general, occlusions cause missed detections (false negatives), and the method is susceptible to mistaking the shoulders for the tail area when the pig's head is down toward the ground and not visible to the camera.

Errors in the third row of Figure 10 illustrate a situation where multiple targets go undetected for long enough that the method holds their last observed locations until they are re-identified. This occurred for two reasons. First, pigs viewed from the side are more prone to occlusion than pigs viewed from a top-down perspective. Second, targets are smaller in this view, so the detection network has fewer pixels and, correspondingly, fewer features per target. This could be at least partially corrected by processing larger images, but this would come at the expense of longer processing times.

Hardware and Processing Times

The method was implemented in MATLAB using the Deep Learning Toolbox. The desktop computer used to process the videos has an Intel i9-9900K 8-core CPU, 32 GB of DDR4 RAM, a 512 GB m.2 SSD, and an NVIDIA RTX 2080 Ti GPU. Before processing frames with the fully-convolutional detector, they are downsampled to a resolution of 576 × 1024 × 3 (rows × columns × channels), and 24 frames are stacked together before processing on the GPU. It takes the computer approximately 0.5 s to process the batch of 24 images.
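As a small illustration of the preprocessing just described (down-sampling each frame to 576 × 1024 × 3 and stacking 24 frames into one GPU batch), a Python/OpenCV sketch is shown below; OpenCV and the normalization step are assumptions, since the reference implementation is in MATLAB.

```python
import cv2
import numpy as np


def make_detector_batch(frames, width=1024, height=576):
    """Down-sample BGR frames to height x width and stack them into a single
    (len(frames), height, width, 3) float32 batch for one detector forward
    pass. A batch size of 24 frames follows the timing discussion above."""
    resized = [cv2.resize(f, (width, height), interpolation=cv2.INTER_AREA)
               for f in frames]
    return np.stack(resized).astype(np.float32) / 255.0
```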
To classify ear tags, all ear tag windows are gathered into a large batch of 64 × 64 × 3 images and processed all at once by the classification network. Classification takes, on average, 0.2 s for 24 images. All other processes involved in detection, including reading video frames and down-sampling, consume an additional 0.7 s per batch of 24 images. Thus, detection and ear tag classification take approximately 0.054 s per frame (18.5 fps). The proposed multi-object tracking method using fixed-cardinality interpolation and forward-backward inference takes 20 s to process a 30-min video with 16 pigs, and this time drops to 6 s with 7 pigs. Fixed-cardinality interpolation consumes approximately 75% of that time, and forward-backward inference uses the remaining 25%. The computational complexity of fixed-cardinality interpolation is O(TN^3), where T is the number of frames and N is the number of targets. This is due to the fact that the Hungarian algorithm, with complexity O(N^3), is used to associate every pair of neighboring frames. In practice, with 16 targets, this adds 0.01 s per frame and brings the total to 0.064 s per frame (15.6 fps). The videos used to analyze the method were recorded at 5 fps, so this performance demonstrates that video can comfortably be processed in real time.

References

1. PIC North America. Standard Animal Care: Daily Routines; Wean to finish manual; PIC North America: Hendersonville, TN, USA, 2014; pp.23–24. 2. Jack, K.M.; Lenz, B.B.; Healan, E.; Rudman, S.; Schoof, V.A.; Fedigan, L. The effects of observer presence on the behavior of Cebus capucinus in Costa Rica. Am. J. Primatol.2008, 70, 490–494. [CrossRef] [PubMed] 3. Iredale, S.K.; Nevill, C.H.; Lutz, C.K. The influence of observer presence on baboon (Papio spp.) and rhesus macaque (Macaca mulatta) behavior. Appl. Anim. Behav. Sci.2010, 122, 53–57. [CrossRef] [PubMed] 4. Leruste, H.; Bokkers, E.; Sergent, O.; Wolthuis-Fillerup, M.; Van Reenen, C.; Lensink, B. Effects of the observation method (direct v. from video) and of the presence of an observer on behavioural results in veal calves. Animal 2013, 7, 1858–1864. [CrossRef] [PubMed] 5. Matthews, S.G.; Miller, A.L.; Clapp, J.; Plötz, T.; Kyriazakis, I. Early detection of health and welfare compromises through automated detection of behavioural changes in pigs. Vet. J.2016, 217, 43–51. [CrossRef] 6. Wedin, M.; Baxter, E.M.; Jack, M.; Futro, A.; D'Eath, R.B. Early indicators of tail biting outbreaks in pigs. Appl. Anim. Behav. Sci.2018, 208, 7–13. [CrossRef] 7. Burgunder, J.; Petrželková, K.J.; Modrý, D.; Kato, A.; MacIntosh, A.J. Fractal measures in activity patterns: Do gastrointestinal parasites affect the complexity of sheep behaviour? Appl. Anim. Behav. Sci.2018, 205, 44–53. [CrossRef] 8. Tuyttens, F.; de Graaf, S.; Heerkens, J.L.; Jacobs, L.; Nalon, E.; Ott, S.; Stadig, L.; Van Laer, E.; Ampe, B. Observer bias in animal behaviour research: Can we believe what we score, if we score what we believe? Anim. Behav.2014, 90, 273–280. [CrossRef] 9. Wathes, C.M.; Kristensen, H.H.; Aerts, J.M.; Berckmans, D. Is precision livestock farming an engineer's daydream or nightmare, an animal's friend or foe, and a farmer's panacea or pitfall? Comput. Electron. Agric.2008, 64, 2–10. [CrossRef] 10. Banhazi, T.M.; Lehr, H.; Black, J.; Crabtree, H.; Schofield, P.; Tscharke, M.; Berckmans, D. Precision livestock farming: An international review of scientific and commercial aspects. Int. J. Agric. Biol. Eng.2012, 5, 1–9. 11. Tullo, E.; Fontana, I.; Guarino, M.
Precision livestock farming: An overview of image and sound labelling. In Proceedings of the European Conference on Precision Livestock Farming 2013:(PLF) EC-PLF, KU Leuven, Belgium, 10–12 September 2013; pp.30–38. 12. Taylor, K. Cattle health monitoring using wireless sensor networks. In Proceedings of the Communication and Computer Networks Conference, Cambridge, MA, USA, 8–10 November 2004. 13. Giancola, G.; Blazevic, L.; Bucaille, I.; De Nardis, L.; Di Benedetto, M.G.; Durand, Y.; Froc, G.; Cuezva, B.M.; Pierrot, J.B.; Pirinen, P.; et al. UWB MAC and network solutions for low data rate with location and tracking applications. In Proceedings of the 2005 IEEE International Conference on Ultra-Wideband, Zurich, Switzerland, 5–8 September 2005; pp.758–763. 14. Clark, P.E.; Johnson, D.E.; Kniep, M.A.; Jermann, P.; Huttash, B.; Wood, A.; Johnson, M.; McGillivan, C.; Titus, K. An advanced, low-cost, GPS-based animal tracking system. Rangeland Ecol. Manag.2006, 59, 334–340. [CrossRef] 15. Schwager, M.; Anderson, D.M.; Butler, Z.; Rus, D. Robust classification of animal tracking data. Comput. Electron. Agric.2007, 56, 46–59. [CrossRef] 16. Ruiz-Garcia, L.; Lunadei, L.; Barreiro, P.; Robla, I. A Review of Wireless Sensor Technologies and Applications in Agriculture and Food Industry: State of the Art and Current Trends. Sensors 2009, 9, 4728–4750. [CrossRef] 17. Kim, S.H.; Kim, D.H.; Park, H.D. Animal situation tracking service using RFID, GPS, and sensors. In Proceedings of the 2010 Second International Conference on Computer and Network Technology (ICCNT), Bangkok, Thailand, 23–25 April 2010; pp.153–156. 18. Escalante, H.J.; Rodriguez, S.V.; Cordero, J.; Kristensen, A.R.; Cornou, C. Sow- activity classification from acceleration patterns: A machine learning approach. Comput. Electron. Agric.2013, 93, 17–26. [CrossRef] 19. Porto, S.; Arcidiacono, C.; Giummarra, A.; Anguzza, U.; Cascone, G. Localisation and identification performances of a real-time location system based on ultra wide band technology for monitoring and tracking dairy cow behaviour in a semi- open free-stall barn. Comput. Electron. Agric.2014, 108, 221–229. [CrossRef] 20. Alvarenga, F.A.P.; Borges, I.; Palkovicˇ, L.; Rodina, J.; Oddy, V.H.; Dobos, R.C. Using a three-axis accelerometer to identify and classify sheep behaviour at pasture. Appl. Anim. Behav. Sci.2016, 181, 91–99. [CrossRef] 21. Voulodimos, A.S.; Patrikakis, C.Z.; Sideridis, A.B.; Ntafis, V.A.; Xylouri, E.M. A complete farm management system based on animal identification using RFID technology. Comput. Electron. Agric.2010, 70, 380–388. [CrossRef] 22. Feng, J.; Fu, Z.; Wang, Z.; Xu, M.; Zhang, X. Development and evaluation on a RFID-based traceability system for cattle/beef quality safety in China. Food Control 2013, 31, 314–325. [CrossRef] 23. Floyd, R.E. RFID in animal-tracking applications. IEEE Potentials 2015, 34, 32– 33. [CrossRef] 24. Neethirajan, S. Recent advances in wearable sensors for animal health management. Sens. Bio-Sens. Res.2017, 12, 15–29. [CrossRef] 25. Schleppe, J.B.; Lachapelle, G.; Booker, C.W.; Pittman, T. Challenges in the design of a GNSS ear tag for feedlot cattle. Comput. Electron. Agric.2010, 70, 84–95. [CrossRef] 26. Ardö, H.; Guzhva, O.; Nilsson, M.; Herlin, A.H. Convolutional neural network- based cow interaction watchdog. IET Comput. Vision 2017, 12, 171–177. [CrossRef] 27. Ju, M.; Choi, Y.; Seo, J.; Sa, J.; Lee, S.; Chung, Y.; Park, D. A Kinect-Based Segmentation of Touching-Pigs for Real-Time Monitoring. Sensors 2018, 18, 1746. 
[CrossRef] [PubMed] 28. Psota, E.T.; Mittek, M.; Pérez, L.C.; Schmidt, T.; Mote, B. Multi-Pig Part Detection and Association with a Fully-Convolutional Network. Sensors 2019, 19, 852. [CrossRef] [PubMed] 29. Zhang, L.; Gray, H.; Ye, X.; Collins, L.; Allinson, N. Automatic individual pig detection and tracking in pig farms. Sensors 2019, 19, 1188. [CrossRef] [PubMed] 30. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. In Proceedings of the Advances in neural information processing systems, Lake Tahoe, NV, USA, 3–8 December 2012; pp.1097–1105. 31. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [CrossRef] 32. Kirk, D. NVIDIA CUDA software and GPU parallel computing architecture. In Proceedings of the ISMM, New York, NY, USA, 19–25 May 2007; pp.103–104. 33. Jia, Y.; Shelhamer, E.; Donahue, J.; Karayev, S.; Long, J.; Girshick, R.; Guadarrama, S.; Darrell, T. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the 22nd ACM international conference on Multimedia. ACM, Orlando, FL, USA, 3–7 November 2014; pp.675–678. 34. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp.770–778. 35. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask r-cnn. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp.2980–2988. 36. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. Imagenet: A large- scale hierarchical image database. In Proceedings of the 2009 IEEE conference on computer vision and pattern recognition, Miami Beach, FL, USA, 25–29 June 2009; pp. 248–255. 37. Everingham, M.; Eslami, S.A.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The pascal visual object classes challenge: A retrospective. Int. J. Comput. Vision 2015, 111, 98–136. [CrossRef] 38. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft coco: Common objects in context. In Proceedings of the European Conference on Computer Vision, New York, NY, USA, 6–12 September 2014; pp.740–755. 39. Andriluka, M.; Pishchulin, L.; Gehler, P.; Schiele, B.2d human pose estimation: New benchmark and state of the art analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 24–27 June 2014; pp. 3686–3693. 40. Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp.3213–3223. 41. Dehghan, A.; Modiri Assari, S.; Shah, M. Gmmcp tracker: Globally optimal generalized maximum multi clique problem for multiple object tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2016; pp.4091–4099. 42. Milan, A.; Leal-Taixé, L.; Reid, I.; Roth, S.; Schindler, K. MOT16: A benchmark for multi-object tracking. arXiv Preprint 2016, arXiv:1603.00831. 43. Zhong, Z.; Zheng, L.; Cao, D.; Li, S. Re-ranking person re-identification with k- reciprocal encoding. 
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Venice, Italy, 22–29 October 2017; pp.1318–1327. 44. Ristani, E.; Tomasi, C. Features for multi-target multi-camera tracking and re- identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp.6036–6046. 45. Nasirahmadi, A.; Richter, U.; Hensel, O.; Edwards, S.; Sturm, B. Using machine vision for investigation of changes in pig group lying patterns. Comput. Electron. Agric. 2015, 119, 184–190. [CrossRef] 46. Kashiha, M.A.; Bahr, C.; Ott, S.; Moons, C.P.; Niewold, T.A.; Tuyttens, F.; Berckmans, D. Automatic monitoring of pig locomotion using image analysis. Livest. Sci.2014, 159, 141–148. [CrossRef] 47. Nilsson, M.; Ardö, H.; Åström, K.; Herlin, A.; Bergsten, C.; Guzhva, O. Learning based image segmentation of pigs in a pen. In Proceedings of the Visual observation and analysis of Vertebrate And Insect Behavior –Workshop at the 22nd International Conference on Pattern Recognition (ICPR 2014), Stockholm, Sweden, 24 August 2014; pp.24–28. 48. Zhang, Z. Microsoft kinect sensor and its effect. IEEE Multimedia 2012, 19, 4– 10. [CrossRef] 49. Kongsro, J. Estimation of pig weight using a Microsoft Kinect prototype imaging system. Comput. Electron. Agric.2014, 109, 32–35. [CrossRef] 50. Zhu, Q.; Ren, J.; Barclay, D.; McCormack, S.; Thomson, W. Automatic Animal Detection from Kinect Sensed Images for Livestock Monitoring and Assessment. In Proceedings of the 2015 IEEE International Conference on Computer and Information Technology, Liverpool, UK, 26–28 October 2015; pp.1154–1157. 51. Stavrakakis, S.; Li, W.; Guy, J.H.; Morgan, G.; Ushaw, G.; Johnson, G.R.; Edwards, S.A. Validity of the Microsoft Kinect sensor for assessment of normal walking patterns in pigs. Comput. Electron. Agric.2015, 117, 1–7. [CrossRef] 52. Lee, J.; Jin, L.; Park, D.; Chung, Y. Automatic Recognition of Aggressive Behavior in Pigs Using a Kinect Depth Sensor. Sensors 2016, 16, 631. [CrossRef] [PubMed] 53. Lao, F.; Brown-Brandl, T.; Stinn, J.; Liu, K.; Teng, G.; Xin, H. Automatic recognition of lactating sow behaviors through depth image processing. Comput. Electron. Agric.2016, 125, 56–62. [CrossRef] 54. Choi, J.; Lee, L.; Chung, Y.; Park, D. Individual Pig Detection Using Kinect Depth Information. KIPS Trans. Comput. Commun. Syst.2016, 5, 319–326. [CrossRef] 55. Mittek, M.; Psota, E.T.; Pérez, L.C.; Schmidt, T.; Mote, B. Health Monitoring of Group-Housed Pigs using Depth-Enabled Multi-Object Tracking. In Proceedings of the Visual observation and analysis of Vertebrate And Insect Behavior, Cancun, Mexico, 4 December 2016; pp.9–12. 56. Kim, J.; Chung, Y.; Choi, Y.; Sa, J.; Kim, H.; Chung, Y.; Park, D.; Kim, H. Depth-Based Detection of Standing-Pigs in Moving Noise Environments. Sensors 2017, 17, 2757. [CrossRef] 57. Matthews, S.G.; Miller, A.L.; PlÖtz, T.; Kyriazakis, I. Automated tracking to measure behavioural changes in pigs for health and welfare monitoring. Sci. Rep.2017, 7, 17582. [CrossRef] 58. Pezzuolo, A.; Guarino, M.; Sartori, L.; González, L.A.; Marinello, F. On-barn pig weight estimation based on body measurements by a Kinect v1 depth camera. Comput. Electron. Agric.2018, 148, 29–36. [CrossRef] 59. Fernandes, A.; Dórea, J.; Fitzgerald, R.; Herring, W.; Rosa, G. A novel automated system to acquire biometric and morphological measurements, and predict body weight of pigs via 3D computer vision. J. Anim. Sci., 2018, 97, 496–508. [CrossRef] 60. 
Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp.7263–7271. 61. Mittek, M.; Psota, E.T.; Carlson, J.D.; Pérez, L.C.; Schmidt, T.; Mote, B. Tracking of group-housed pigs using multi-ellipsoid expectation maximisation. IET Comput. Vision 2017, 12, 121–128. [CrossRef] 62. Bochinski, E.; Eiselein, V.; Sikora, T. High-speed tracking-by-detection without using image information. In Proceedings of the 201714th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Lecce, Italy, 29 August–1 September 2017; pp.1–6. 63. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, Columbus, OH, USA, 24–27 June 2014; pp.580–587. 64. Cao, Z.; Simon, T.; Wei, S.E.; Sheikh, Y. Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7291–7299. 65. Papandreou, G.; Zhu, T.; Chen, L.C.; Gidaris, S.; Tompson, J.; Murphy, K. PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part- Based, Geometric Embedding Model. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp.269–286. 66. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. Ssd: Single shot multibox detector. In Proceedings of the European conference on computer vision, Amsterdam, The Netherlands, 11–14 October 2016; pp.21–37. 67. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical image computing and computer-assisted intervention, Munich, Germany, 5–9 October 2015; pp.21–37. 68. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, Venice, Italy, 22–29 October 2017; pp.4700–4708. 69. Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European conference on computer vision (ECCV), Munich, Germany, 8–14 September 2018; pp.801–818. 70. Chen, M.; Liew, S.C.; Shao, Z.; Kai, C. Markov Approximation for Combinatorial Network Optimization. IEEE Trans. Inf. Theory 2013, 59, 6301–6327. [CrossRef] 71. Hansen, M.F.; Smith, M.L.; Smith, L.N.; Salter, M.G.; Baxter, E.M.; Farish, M.; Grieve, B. Towards on-farm pig face recognition using convolutional neural networks. Comput. Ind.2018, 98, 145–152. It is to be understood that, while the methods and compositions of matter have been described herein in conjunction with a number of different aspects, the foregoing description of the various aspects is intended to illustrate and not limit the scope of the methods and compositions of matter. Other aspects, advantages, and modifications are within the scope of the following claims. Disclosed are methods and compositions that can be used for, can be used in conjunction with, can be used in preparation for, or are products of the disclosed methods and compositions. These and other materials are disclosed herein, and it is understood that combinations, subsets, interactions, groups, etc. 
of these methods and compositions are disclosed. That is, while specific reference to every individual and collective combination and permutation of these compositions and methods may not be explicitly disclosed, each is specifically contemplated and described herein. For example, if a particular composition of matter or a particular method is disclosed and discussed, and a number of compositions or methods are discussed, each and every combination and permutation of the compositions and the methods is specifically contemplated unless specifically indicated to the contrary. Likewise, any subset or combination of these is also specifically contemplated and disclosed.