

Title:
SYSTEM AND METHOD OF PREDICTION OF PRESENCE/ABSENCE FOR THE THREE GENERA OF LARVAE (CULEX, AEDES, ANOPHELES) IN BREEDING SITES
Document Type and Number:
WIPO Patent Application WO/2024/100428
Kind Code:
A1
Abstract:
System and method for determining the presence/absence of larvae for the three genera (Culex, Aedes, Anopheles) per breeding site, using a cloud server for processing the collected inspection and geospatial data and applying Machine Learning (RF/ XGBoost) algorithms.

Inventors:
MOURELATOS SPYRIDON (GR)
Application Number:
PCT/GR2023/000060
Publication Date:
May 16, 2024
Filing Date:
November 09, 2023
Assignee:
OIKOANAPTYXI ANINYMI ETAIRIA (GR)
International Classes:
G16H50/70; G06N20/00; G06Q10/04; G06Q10/06; G06Q50/02; G16H50/80
Other References:
TSANTALIDOU ARGYRO ET AL: "MAMOTH: An Earth Observational Data-Driven Model for Mosquitoes Abundance Prediction", REMOTE SENSING, vol. 13, no. 13, 30 June 2021 (2021-06-30), CH, pages 2557, XP093149725, ISSN: 2072-4292, DOI: 10.3390/rs13132557
JOSHI ANANYA ET AL: "Review of machine learning techniques for mosquito control in urban environments", ECOLOGICAL INFORMATICS, vol. 61, 1 March 2021 (2021-03-01), NL, pages 101241, XP093149419, ISSN: 1574-9541, DOI: 10.1016/j.ecoinf.2021.101241
WIMBERLY MICHAEL C. ET AL: "Satellite Observations and Malaria: New Opportunities for Research and Applications", TRENDS IN PARASITOLOGY, vol. 37, no. 6, 1 June 2021 (2021-06-01), GB, pages 525 - 537, XP093149418, ISSN: 1471-4922, DOI: 10.1016/j.pt.2021.03.003
ALDOSERY AISHA ET AL: "Mosquito Ovitraps IoT Sensing System (MOISS): Internet of Things-based System for Continuous, Real-Time and Autonomous Environment Monitoring", 2022 IEEE 8TH WORLD FORUM ON INTERNET OF THINGS (WF-IOT), IEEE, 26 October 2022 (2022-10-26), pages 1 - 8, XP034363615, DOI: 10.1109/WF-IOT54382.2022.10152111
LOW RUSSANNE D. ET AL: "Building International Capacity for Citizen Scientist Engagement in Mosquito Surveillance and Mitigation: The GLOBE Program's GLOBE Observer Mosquito Habitat Mapper", INSECTS, vol. 13, no. 7, 13 July 2022 (2022-07-13), CH, pages 624, XP093150308, ISSN: 2075-4450, DOI: 10.3390/insects13070624
Attorney, Agent or Firm:
KOSTI, Iliana (GR)
Claims:
CLAIMS

1. Method (500) implemented by a computer system (400) for the prediction of the presence/absence of larvae for the three genera (Culex, Aedes, Anopheles) per breeding site, characterized by the fact that it includes: collection of inspection data (501) sent in real time from portable tablet computing devices operated by technical staff carrying out mass sampling at designated sampling sites; Earth observation data collection (502) consisting of remote sensing data, meteorological-climatological data, topographic, environmental and demographic data; pre-processing of the collected data (503) by the cloud processor to perform at least one of clean-up, correction of extreme values, management of missing data, augmentation of data and balancing of class distribution, transformation of data and selection of the most important characteristics to predict the presence/absence of larvae per breeding site; and application of Machine Learning algorithms (504) for the implementation of three distinct prediction models (506) of the presence/absence of larvae for the three genera (Culex, Aedes, Anopheles) per breeding site.

2. Method of claim 1, where during data transformation two variables are derived: breeding site = dry, which indicates whether the breeding site is dry or not, and dry site over time, which verifies the presence of surface water in the breeding sites.

3. Method of claim 1, where the Machine Learning algorithms (504) applied are Random Forest and/or Extreme Gradient Boosting (XGBoost).

4. Method of claim 1, where the three distinct prediction models (506) are retrained with the best selected hyperparameters (505) to improve the predictive performance of the presence/absence of larvae for the three genera (Culex, Aedes, Anopheles) per breeding site.

5. Method of claim 4, where the three distinct predictions of larval presence/absence for the three genera (Culex, Aedes, Anopheles) per breeding site are used by decision-making algorithms (507) to perform qualitative and/or quantitative predictions.

6. Method of claim 1, where Earth observation data (502) is characterized by containing satellite data, land cover data, climate forecasts, geophysical and socio-economic data sets.

7. Method of claim 1, where inspection data (501) is characterized by containing, in encoded form, the location of the sampling site, the level of larval abundance per breeding site, their distribution, their stage of development and their genus.

8. System (400) that implements three distinct predictions of the presence/absence of larvae for the three genera (Culex, Aedes, Anopheles) per breeding site according to the method of claim 1, characterized by including: portable tablet computing devices for recording and sending inspection data (401) via the internet (405) to the cloud server (403); a cloud server (403) for the collection of inspection (401) and Earth observation (402) data, their pre-processing, the application of Machine Learning algorithms and the realization of three distinct prediction models (403-5) of the presence/absence of larvae for the three genera (Culex, Aedes, Anopheles) per breeding site; and databases and computer networks for the storage of Earth observation data (402).

9. System of claim 8, where the cloud server (403) includes a direct connection database (403-2), a processor (403-1) and memory (403-3), and is characterized by the fact that the Machine Learning algorithms for the realization of the three distinct prediction models (403-5) of the presence/absence of larvae for the three genera (Culex, Aedes, Anopheles) per breeding site are Random Forest and/or Extreme Gradient Boosting (XGBoost).

10. System of claim 9, where the cloud server (403) includes a digital web application (403-4) for real-time registration of inspection data (401) in the online database (403-2).

11. System of claim 9, where the cloud server (403) is used for larval control and for performing the schedule of planned actions, management of field data reports during larval monitoring in wetlands, rural, urban and peri-urban systems, visualization of the map with larval breeding sites, monitoring of the movement of the vehicle fleet in real time, forecasting mosquito nuisance, displaying statistics of actions carried out and visualizing maps of surface waters detected by remote sensing.

12. System of claim 11, where the database (403-2) contains stored digitized and georeferenced potential mosquito breeding sites in wetlands, rural, peri-urban and urban systems.

13. System of claim 8, where the three distinct predictions of the presence/absence of larvae for the three genera (Culex, Aedes, Anopheles) per breeding site are used by decision-making algorithms (507) to perform qualitative and/or quantitative predictions.

Description:
SYSTEM AND METHOD OF PREDICTION OF PRESENCE/ABSENCE FOR THE THREE GENERA OF LARVAE (Culex, Aedes, Anopheles) IN BREEDING SITES

DESCRIPTION

[0001] Mosquitoes are potential vectors of serious communicable diseases such as malaria, dengue, West Nile virus and others, and cause millions of deaths worldwide every year. The reduction of mosquito populations in the aquatic larval stage with biological and chemical preparations is the main method for their control and for preventing outbreaks of the diseases they transmit. Alternatively, control actions are also carried out at the adult stage (in the air) using chemical biocides.

[0002] Mosquitoes develop in four stages before becoming adults: the laying of eggs (embryo phase), the hatching of the egg into a larva, the transformation of the larva into a pupa (nymph) and finally maturation into an adult mosquito. A female mosquito, after taking a blood meal, matures her eggs and places them on the surface of the water. The eggs then hatch into larvae, which are aquatic and equipped with two respiratory systems that allow them to breathe both under water and above it. In Europe there are three dominant mosquito genera: Culex, Aedes and Anopheles. In the region of Central Macedonia, Greece (total area 275 km²), known for its periodically high production of mosquito larvae due to the existence of wetlands (51.4 km²) and rice fields (223.5 km²), the dominant species is Ochlerotatus caspius. It has also been recorded for this area that Culex pipiens is a vector of West Nile Virus transmission to humans, while Anopheles sacharovi is probably a vector of malaria transmission in Greece.

[0003] The standard mosquito population control practice adopted by most countries is systematic spraying. However, the use of biocidal products (larvicides and adulticides) poses risks to the environment (accumulation in water resources, land and air, adverse effects on non-target organisms and/or beneficial insects) and to humans (headaches, skin irritations and even cancer in case of long-term non-rational use). It has also been observed that repeated use of biocidal products over a long period of time can lead to the development of mosquito resistance to certain active substances and make control actions ineffective.

[0004] Monitoring of the adult mosquito population has to date been carried out with conventional light or pheromone bait traps placed in areas of interest to extract useful information about the dynamics of the targeted mosquito population. One of the biggest drawbacks of this monitoring method is that it requires the systematic presence of personnel (e.g. technical personnel) in the study areas at predetermined intervals to count the trapped mosquito population. In addition, recent technological developments in telecommunications and big data management have opened up new horizons for processing field data in real time for decision making and risk assessment. However, the available software focuses mainly on monitoring data of adult mosquitoes, monitoring their resistance to insecticides, and modeling to determine their level of abundance or the risk of disease transmission. In general, the exploratory analysis of data and the determination of larval productivity in breeding sites is a very arduous process and is usually performed retrospectively rather than in real time in order to prioritize breeding sites for future control actions. Also, with the progress of Machine Learning techniques and Artificial Intelligence, new innovative tools have been developed. The reported techniques mainly focus on predicting the population of adult mosquitoes and predicting disease transmission (epidemiological risk), as well as on counting and predicting mosquito larvae within smart traps. However, none of the above techniques predicts the presence/absence of mosquito larvae in natural surface water habitats (wetlands, irrigation canals, flooded plots and others).

[0005] It has been shown that a key element for data recording and decision support is the mapping of all potential larval breeding sites by means of Geographic Information Systems (GIS), in order to create operational maps that clearly delimit the sampling stations used by technical personnel to carry out monitoring and control actions. In addition, remote sensing provides effective means of mapping the area, extent and changes of surface water over time. The progress of remote sensing and the increasing availability of Earth Observation Data therefore provide a wealth of data that can be exploited using Machine Learning algorithms. By Machine Learning we mean the ability of computers to learn without being explicitly programmed. More specifically, during exploratory data analysis Machine Learning algorithms can learn and establish behavior patterns within the dataset and automatically make predictions that can be used to draw conclusions. Supervised Machine Learning constructs functions that map given inputs to known desired outputs, with the ultimate goal of generalizing these functions to inputs with unknown outputs.

[0006] A prerequisite for mosquito control in the larval stage (larvicide) is the identification and recording of all their potential breeding sites. In the implementation of this invention, 26 different types were defined as breeding grounds (landfills, biological treatment of sewage, cesspools, reservoirs, tires, wetlands, drainage channels, lakes, potholes, cemeteries, sewage, wells, swimming pools, flooded areas, agricultural water supply points, watercourses, point sources, garbage, livestock farms, water tanks, fountains, taps, sewer basins, damaged olive oils, rain channels and animal watering cans).

DISCLOSURE OF THE INVENTION

[0007] Therefore, in order to address all the above problems, the need arose for the development of a system and an automated method for recording the presence/absence of larvae in breeding sites in real time. The system and method for which protection is requested were applied in the areas of Central Macedonia, Western Greece and Crete in Greece, but this is not a restrictive factor, as the proposed system and method can be appropriately modified and implemented for any other area. In particular, specialized technical personnel, using a tablet application that includes operational maps recording sampling stations and larval breeding sites, carry out mass sampling of larvae, especially of the genera Culex, Aedes and Anopheles, during their physical presence at the designated breeding sites. The construction of detailed operational maps aims both to better monitor the existence and evolution of larvae and to accurately indicate the surfaces to be sprayed. These maps show the water receivers and various topographic features and define the mosquito larvae sampling stations, after these have been marked in the field. All mosquito breeding sites in all systems (rural, peri-urban, natural and urban) are depicted, the mosquito larva sampling stations are precisely defined and the codified nomenclature of all sampling stations is presented (a unique code number for each breeding site). Larval sampling is carried out at accessible sampling stations containing water on all or part of their surface (drainage channels, waterways, point springs, wetlands and temporarily flooded areas in peri-urban and urban areas).

[0008] During periodic sampling (every 7-15 days) at the sampling stations designated on the digital operational maps, the technical personnel enter the following 12 inspection data, in encoded form, in a dedicated digital application on the tablet: date of visit; region, regional unit, municipality and settlement; sampling control station code; number of samples collected; number and developmental stage of mosquito larvae, separately for the three genera (Culex, Aedes, Anopheles); extent of the sampling station, extent of flooded surface and area of sprayed surface; area of the sampling station with presence of mosquito larvae; type of preparation, quantity of preparation and amount of spraying solution; observations concerning the nature of the breeding site (absence of water (dry site), water flow level in the breeding site, presence of sewage, presence of adult mosquitoes, presence of dense vegetation, inability to access the breeding site, inability to sample, inability to spray, preventive spraying); observations and problems; GPS positioning; photo capture. In addition, to determine the number of larvae, a 5-point scale is used (absence, abundance level 1 = 1-7 larvae, abundance level 2 = 8-15 larvae, abundance level 3 = 15-25 larvae, abundance level 4 = > 25 larvae) for each of the three dominant genera (Culex, Aedes, Anopheles).
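By way of illustration only, the 5-point abundance scale of paragraph [0008] could be encoded as in the following Python sketch. The function names are hypothetical and do not come from the application. Because the published ranges for levels 2 and 3 both include 15 larvae, the sketch assumes that a count of exactly 15 is assigned to level 2.

```python
def abundance_level(larvae_count: int) -> int:
    """Map a larval count to the 5-point scale (0 = absence, 1-4 = abundance level)."""
    if larvae_count < 0:
        raise ValueError("larvae_count must be non-negative")
    if larvae_count == 0:
        return 0
    if larvae_count <= 7:
        return 1
    if larvae_count <= 15:  # assumption: a count of exactly 15 falls in level 2
        return 2
    if larvae_count <= 25:
        return 3
    return 4


def presence(larvae_count: int) -> int:
    """Binary presence/absence label (1 = larvae present) derived from the scale."""
    return int(abundance_level(larvae_count) > 0)
```

The binary label returned by `presence` corresponds to the presence/absence target that the three prediction models are trained on, one such label per genus.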

[0009] The tablet computing device (not shown) includes a processor for executing program instructions stored in memory, a screen for displaying information to the user of the device, a speaker, a user interface containing user input media (keyboard, mouse, touch pad, etc.) as well as audio input media, a camera, sensors (gyroscope, accelerometer, etc.) and GPS. The software of the computing device covers the coordination of device functions, network browsing applications, the graphical environment, the development and display of digital operational maps and any other possible application on the device. The communication unit of the computing device allows the device to communicate with other computer systems over a mobile network or any other network. The recorded encoded data is sent in real time over the internet to be stored in an online database of the cloud server.

[0010] The web platform of the cloud server is the main component of the implementation of this invention, as it is the tool used for larval control and for planning individual activities. It consists of data monitoring software and the digital application of the tablet (the application in which field data is recorded via tablet by technical staff). The web applications developed (Figure 1) include the following eight functions: 1) the timetable of the planned actions 2) the field data reports during larval monitoring in wetlands, rural, urban and peri-urban systems 3) the map with the larval breeding sites 4) the real-time monitoring of the movement of the fleet of vehicles participating in the whole operation 6) the prediction of mosquito nuisance 7) the statistics on the actions carried out in the field 8) maps of the water surfaces detected by remote sensing. A key building block of the web platform is the geodatabase with more than 260,000 digitized and georeferenced potential mosquito breeding sites in wetlands, rural, peri-urban and urban systems. Geographic Information Systems are used to visualize the 26 different types of larval breeding sites.

[0011] In an implementation of the invention, the cloud servers used include one or more processing units to execute commands on a number of different operating systems such as UNIX, LINUX and Microsoft Windows. The processing units are connected to an integrated storage unit for storing and retrieving the collected data, while the memory unit can be static or dynamic random-access memory (SRAM/DRAM), read-only memory (ROM) or erasable programmable read-only memory (EPROM). The memory types mentioned are given as examples and are not a limiting factor, as other types of memory can be used to store computer programs. The main cloud server used for larval control is connected to a database containing the 12 inspection data transmitted in real time by staff during their physical presence at the selected larval monitoring sites. Access to the database by accredited personnel takes place following a verification process of their personal data. In addition, other data of relevant interest, such as cartographic data of the area of the monitoring sites, may be stored in the database for exploitation.

[0012] For the realization of three distinct models for predicting the presence/absence of the three larval genera (Culex, Aedes, Anopheles), Earth observation data (remote sensing data, meteorological-climatological data, topographic data, environmental data and demographic data (clustering of villages)) are used in addition to the inspection data recorded and sent by the technical staff during their physical presence at the larval sampling stations. The Google Earth Engine (GEE) is a cloud server that allows remote sensing data to be accessed and processed on a global scale. The list of data it contains mainly includes satellite data and observations (Landsat, Sentinel-1, -2 and -3, MODIS, ASTER), land cover data, environmental data, climate forecasts, and geophysical and socio-economic datasets. Earth observation data are processed to extract characteristics from satellite images of the sites sampled by technical staff on the specified inspection dates.

[0013] Remote sensing data is obtained by processing satellite images (Sentinel data) on the cloud server web platform (GEE). In order to limit the satellite images collected, the location of interest (Central Macedonia) and the time period (1/1/2018-31/12/2020) are defined, and the cloud cover percentage must be less than 20% in the area of interest. Then 4 remote sensing indices are calculated for each image: the normalized difference vegetation index (NDVI), the normalized difference water index (NDWI), the normalized difference moisture index (NDMI) and the normalized difference built-up index (NDBI). Time series are then obtained within reference zones of 50m, 200m, 1000m and 5000m around the sampling stations. For overlapping areas, all images are merged, giving a single output per day with the maximum values of the remote sensing indices (NDVI, NDWI, NDMI, NDBI). Finally, the time series are stored in a csv file in Google Drive.
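As a minimal sketch of the index calculation described in paragraph [0013], the four indices can be written as normalized differences of reflectance bands. The band pairings below are the commonly used definitions and are an assumption here, since the application does not state its formulas (in particular, NDWI is taken in its green/near-infrared form). The function names are hypothetical, and the sketch uses NumPy arrays rather than the GEE platform named in the text.

```python
import numpy as np


def normalized_difference(a, b):
    """Return (a - b) / (a + b) per pixel, with NaN where the denominator is zero."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    out = np.full(denom.shape, np.nan)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out


def spectral_indices(green, red, nir, swir1):
    """Compute the four indices from reflectance bands (assumed band pairings)."""
    return {
        "NDVI": normalized_difference(nir, red),
        "NDWI": normalized_difference(green, nir),  # assumed green/NIR form
        "NDMI": normalized_difference(nir, swir1),
        "NDBI": normalized_difference(swir1, nir),
    }


def daily_max_composite(index_images):
    """Merge overlapping same-day images by taking the per-pixel maximum."""
    return np.nanmax(np.stack(index_images), axis=0)
```

The `daily_max_composite` helper mirrors the merging step in the text, where overlapping images yield a single output per day holding the maximum index values.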

[0014] Meteorological-climatological data include land surface temperature, precipitation events and air temperature. Temperature affects the mosquito population by prolonging the breeding season. MODIS provides the daytime and night-time land surface temperature. Rainfall generally favors the rapid reproduction of larvae, while air temperature is decisive for the growth and survival of larvae. The processing of satellite images for the retrieval of the above climate data is carried out on the cloud server web platform (GEE). As mentioned above, remote sensing data play an important role in mapping the location and extent of surface water as potential larval breeding sites. The processing of Synthetic Aperture Radar images allows mapping and monitoring of the presence and extent of surface water in wetlands and rice fields. This process was performed on the main cloud server based on the Otsu Valley method and yields the environmental variable (water_100).
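The surface water mapping step of paragraph [0014] can be illustrated with a thresholding sketch. The code below implements the plain Otsu threshold (the cut that maximizes between-class variance), not the specific "Otsu Valley" variant named in the text, whose details the application does not give. The function names, the use of backscatter in dB and the rule that low backscatter means water are assumptions made for illustration.

```python
import numpy as np


def otsu_threshold(values, bins: int = 256) -> float:
    """Plain Otsu threshold: the bin edge that maximises between-class variance."""
    values = np.asarray(values, dtype=float).ravel()
    hist, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    weights = hist / hist.sum()

    w0 = np.cumsum(weights)               # weight of the class below each cut
    w1 = 1.0 - w0                         # weight of the class above each cut
    cum_mean = np.cumsum(weights * centers)
    total_mean = cum_mean[-1]

    valid = (w0 > 0) & (w1 > 0)           # both classes must be non-empty
    between = np.zeros(bins)
    mu0 = cum_mean[valid] / w0[valid]
    mu1 = (total_mean - cum_mean[valid]) / w1[valid]
    between[valid] = w0[valid] * w1[valid] * (mu0 - mu1) ** 2

    # Upper edge of the best bin separates the two classes.
    return float(edges[np.argmax(between) + 1])


def water_mask(backscatter_db):
    """Assumed rule: open water reflects little radar energy, so low values are water."""
    backscatter_db = np.asarray(backscatter_db, dtype=float)
    return backscatter_db < otsu_threshold(backscatter_db)
```

A per-site variable such as water_100 could then be derived by summarizing the mask within the relevant reference zone around each breeding site, although the exact definition of that variable is not stated in the text.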

[0015] A plethora of other data is used by the cloud server to implement three distinct predictions for the three larval genera (Culex, Aedes, Anopheles). Topographic data are widely known to play a decisive role in the level of mosquito abundance and distribution, because they provide information on the geographical background in which mosquito genera develop their habitat. Parameters such as the average values of altitude, aspect and accumulated flow are retrieved in reference zones of 50m, 200m, 500m and 1000m around the breeding sites. The distance of all breeding sites from the coastline is also calculated.
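The averaging of a parameter within reference zones around a breeding site, as described in paragraph [0015], could be sketched as follows. The function name, the representation of the terrain as sampled points and the use of projected coordinates in metres are assumptions for illustration; a production workflow would more likely operate on raster data in a GIS.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

# (x_m, y_m, value): a sampled point in a projected coordinate system, in metres.
Sample = Tuple[float, float, float]


def zonal_means(
    site_xy: Tuple[float, float],
    samples: Sequence[Sample],
    radii_m: Sequence[float] = (50, 200, 500, 1000),
) -> Dict[float, Optional[float]]:
    """Mean of the sampled values inside each circular reference zone around a site."""
    sx, sy = site_xy
    result: Dict[float, Optional[float]] = {}
    for radius in radii_m:
        inside = [v for (x, y, v) in samples if math.hypot(x - sx, y - sy) <= radius]
        # None marks a zone that contains no samples at all.
        result[radius] = sum(inside) / len(inside) if inside else None
    return result
```

For example, calling `zonal_means((0.0, 0.0), altitude_samples)` with hypothetical altitude samples returns one mean altitude per zone radius, which can then be used as four separate model features.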

[0016] Environmental data are essential factors influencing the ecology of larval species. Changes over time in land use and land cover affect the level of abundance and distribution of mosquitoes. The mean value of surface water and moisture in reference zones of 500m, 1000m and 5000m around the breeding sites is calculated, as is the total length of watercourses, obtained by combining the length of mapped breeding sites with the length of watercourses in national hydrological data. Also, clustering of villages in the region of Central Macedonia was carried out in order to optimize the scheduling of larval sampling and spraying interventions. For the clustering of villages, weekly socio-economic, environmental and meteorological data were collected for the period 1/1/2010 to 31/12/2019 for 1022 villages with a population of fewer than 10,000 residents, creating a dataset of 549,936 records. When processing the resulting dataset, 25 optimal characteristics for the study area were retrieved:

Climatic characteristics (10) - accumulated rain in the last week, in the last month and since the beginning of the year, sample days, daytime and night-time land surface temperature, maximum/minimum/average temperature, average wind speed.

Topographic features (3) - altitude, slope, elevation.

Hydrographic features (4) - accumulated flow, distance from nearest water surface, NDVI, distance from coast.

Land use features (3 nominal and 3 numerical) - rice or sugar beet or cotton or vegetables or meadows or artificial, discontinuous urban fabric or permanently irrigated land, length of water stream, presence of septic tank (yes/no), NDBI, NDVI.

Census characteristics (2) - total population, elderly population (>60).
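The three accumulated-rain features listed among the climatic characteristics can be sketched as simple window sums over daily rainfall records. The function and key names are hypothetical, and the window lengths (7 days for "last week" and 30 days for "last month", both including the reference day) are assumptions, since the text does not define them.

```python
from datetime import date, timedelta
from typing import Dict, Mapping


def accumulated_rain(daily_mm: Mapping[date, float], on: date) -> Dict[str, float]:
    """Rain totals (mm) for the last week, last month and year to date, up to `on`."""

    def total(start: date) -> float:
        # Sum every daily record that falls inside [start, on].
        return sum(mm for day, mm in daily_mm.items() if start <= day <= on)

    return {
        "rain_last_week": total(on - timedelta(days=6)),    # assumed 7-day window
        "rain_last_month": total(on - timedelta(days=29)),  # assumed 30-day window
        "rain_year_to_date": total(date(on.year, 1, 1)),
    }
```

Each inspection record would receive its own triple of values, computed relative to the date of the visit.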

[0017] The collection of the above data on the cloud server resulted in a dataset of 230,955 observations for three consecutive years (2018-2020) with a total of 268 variables. The resulting variables are of different forms, so pre-processing (cleaning, normalization, coding, harmonization and quality control) is required before they can be used to implement the three distinct models predicting the presence/absence of the genera (Culex, Aedes, Anopheles) in breeding sites. More specifically, because the resulting dataset is unbalanced (14.3% Culex, 4.1% Aedes, 2.6% Anopheles), the Synthetic Minority Oversampling Technique (SMOTE) and Edited Nearest Neighbors (ENN) are applied, creating new synthetic examples (data augmentation) for the minority class until a balanced class distribution is achieved. Also, during the processing of inspection data sent by technical personnel and registered in the direct connection database of the main cloud server, data transformation takes place and the variable breeding site = dry is obtained, which indicates whether the breeding site is dry or not. This information is only taken into account when in the last 15 days there has been no increase (as estimated by the Wamos algorithm) of surface water in a 1000m reference zone around the breeding site (wamos_1000). In order to reduce the uncertainty caused by rain events or other water inflow incidents between the site inspection dates, a new variable was created: dry site over time. The information of this variable is collected by the technical personnel during their last on-site visit to the breeding sites by recording whether there is water in the breeding site. The presence of water in the monitored sites is verified by the analysis of Synthetic Aperture Radar images that allow mapping and monitoring of the presence and extent of surface water in the breeding sites. Dry site over time is a very important variable that affects the three prediction models of the three mosquito genera, as it transfers real-time information from the field, allowing the prediction models to more accurately attribute the presence of larvae in breeding sites. In addition, because this information is continuously available from the server's web platform to the three distinct prediction models, the operational dynamics of the models are ensured, while fewer visits to breeding sites are needed for mosquito control actions. The final dataset resulting from data cleaning and pre-processing contains a total of 230,627 records from the monitored breeding sites.
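The rule for using the breeding site = dry variable in paragraph [0017] can be sketched as a small function. The function name and the interpretation of "no increase" (no rise between any two consecutive surface water estimates in the last 15 days) are assumptions; the Wamos algorithm itself is not described in the text and is represented here only by its output series.

```python
from typing import Optional, Sequence


def breeding_site_dry(
    recorded_dry: bool, wamos_1000_last_15_days: Sequence[float]
) -> Optional[bool]:
    """Return the field-recorded dry flag, or None when it should not be relied on.

    The flag is kept only if the surface water estimates for the 1000 m zone
    show no increase over the last 15 days (assumed: no rise between
    consecutive estimates).
    """
    series = list(wamos_1000_last_15_days)
    increased = any(later > earlier for earlier, later in zip(series, series[1:]))
    return None if increased else recorded_dry
```

Returning `None` rather than a boolean makes it explicit that the flag is being ignored, which a downstream step could treat as missing data.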

[0018] The most relevant features are then selected from the pre-processed data (feature selection) using the technique of Recursive Feature Elimination (RFE). Figure 2a illustrates the table with the 32 most important characteristics used by the three prediction models (Culex, Aedes, Anopheles). The three distinct models predicting the presence/absence of the three genera (binary classification) in breeding sites were implemented with the Random Forest (RF) and Extreme Gradient Boosting (XGBoost) algorithms. In Random Forest, decision trees are generally constructed by sampling the original training set with replacement, which means that individual data points can be selected more than once. After multiple samples have been drawn, these weak models are trained independently, in a way that reduces the correlation between the individual trees, and their predictions are then averaged. In the case of binary classification (presence/absence of larvae per breeding site), the value predicted by the Random Forest algorithm is the most common class (majority). In the XGBoost algorithm, each tree learns from and improves on the previously grown tree, i.e. the newly created models predict the errors of the previous models, correct them and are combined with them to make the final prediction. Finally, tuning of the two Machine Learning algorithms was performed using the GridSearch method of the Scikit-Learn library. The optimal hyperparameters of the RF and XGBoost algorithms for the three prediction models of presence/absence (Culex, Aedes, Anopheles) per breeding site are illustrated in the table of Figure 2b.
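The feature selection and tuning steps of paragraph [0018] can be sketched with Scikit-Learn as follows. This is an illustrative sketch on synthetic data, not the configuration used in the application: the dataset, the number of selected features (8 of 20 here, against 32 in the description) and the hyperparameter grid are all assumptions, and only Random Forest is shown. An XGBoost classifier with a Scikit-Learn compatible interface could be substituted for the final model step.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic, imbalanced stand-in for the pre-processed dataset (assumption).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5,
    weights=[0.85, 0.15], random_state=0,
)

pipeline = Pipeline([
    # Recursive Feature Elimination: drop the 4 least important features per round.
    ("select", RFE(
        RandomForestClassifier(n_estimators=25, random_state=0),
        n_features_to_select=8, step=4,
    )),
    ("model", RandomForestClassifier(random_state=0)),
])

# Hypothetical hyperparameter grid; the real grid is given only in Figure 2b.
param_grid = {
    "model__n_estimators": [25, 50],
    "model__max_depth": [3, None],
}

search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=3)
search.fit(X, y)

selected = search.best_estimator_.named_steps["select"].support_
print(search.best_params_)
print("selected feature count:", int(selected.sum()))
```

Placing RFE inside the pipeline means the features are re-selected within each cross-validation fold, which avoids leaking information from the validation folds into the selection step.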

[0019] Figure 3 presents the diagram of the prediction system architecture implemented by the cloud server: after the processed dataset (3.1) is split into 80% training data (3.2) and 20% test data (3.3), the 32 most relevant characteristics (3.4) are selected, as shown in the table in Figure 2a, and the RF and XGBoost algorithms (3.5) are applied to produce three distinct prediction models of presence/absence of (Culex, Aedes, Anopheles) per breeding site (3.6). The generated prediction information can be used by the cloud server to trigger alarm procedures and valid notification of the bodies concerned, as well as to initiate actions to combat the phenomenon. Early detection of the presence of larvae in an area thus reduces the likelihood of nuisance in nearby areas.
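The architecture of paragraph [0019], an 80/20 split followed by one binary model per genus, could look like the sketch below. The data are synthetic and the label rules are invented for illustration. Stratifying the split by class is an assumption that the text does not mention, and only Random Forest is shown.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 6))  # synthetic stand-in for the selected features

# Hypothetical binary labels (1 = larvae present), one per genus.
labels = {
    "Culex": (X[:, 0] + 0.5 * rng.normal(size=n) > 0.8).astype(int),
    "Aedes": (X[:, 1] + 0.5 * rng.normal(size=n) > 1.0).astype(int),
    "Anopheles": (X[:, 2] + 0.5 * rng.normal(size=n) > 1.0).astype(int),
}

models, test_accuracy = {}, {}
for genus, y in labels.items():
    # 80% training data, 20% test data; stratification is an assumption.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0
    )
    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    clf.fit(X_train, y_train)
    models[genus] = clf
    test_accuracy[genus] = clf.score(X_test, y_test)

# One independent presence/absence prediction per genus for a single site.
new_site = X[:1]
prediction = {genus: int(m.predict(new_site)[0]) for genus, m in models.items()}
print(prediction)
```

Keeping three separate classifiers, rather than one multi-class model, matches the description's statement that the three genera are predicted by distinct models, since several genera can be present at the same site.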

[0020] An important advantage of this invention is the adoption of improved mosquito control practices: the larvae are monitored in real time at their breeding sites, which reduces the need for adult mosquito control and insecticide use, reduces the nuisance from mosquitoes and therefore improves the quality of life of residents and visitors of an area. Immediate access to the collected data at any time is valuable because it is a critical factor for the proper functioning of the three prediction models on the cloud server. The use of resources (technical means, human resources) is also optimized, with a simultaneous economic benefit, as fuel consumption is reduced due to the rational management of the fleet of vehicles participating in mosquito control programs in the wider area. The most important advantage, however, is that by mitigating the epidemiological risk of West Nile Virus transmission, a significant improvement in public health is achieved. The following is the description of the drawings depicting the realization of the invention by way of example and not by way of limitation.

BRIEF DESCRIPTION OF THE DRAWINGS

[0021] Figure 1 is the web platform of the cloud server displaying the eight web applications.

Figure 2a illustrates the table with the 32 most important characteristics used by the three prediction models (Culex, Aedes, Anopheles) when applying the technique of Recursive Feature Elimination.

Figure 2b illustrates the table with the optimal hyperparameters of the applied RF and XGBoost algorithms for the three distinct prediction models of presence/absence of (Culex, Aedes, Anopheles) per breeding site.

Figure 3 is the flow diagram of the system architecture for the three distinct prediction models of presence/absence (Culex, Aedes, Anopheles) per breeding site.

Figure 4 is an example of an implementation model of the system of three distinct models predicting the presence/absence of larvae for the three genera (Culex, Aedes, Anopheles) per breeding site.

Figure 5 is the flow diagram of the method of the three distinct models predicting the presence/absence of larvae for the three genera (Culex, Aedes, Anopheles) per breeding site.

[0022] The term larvae refers to mosquito larvae whose presence is not desirable on a site. The term prediction of the presence/absence of larvae per breeding site refers to the qualitative and quantitative assessment of the level of abundance of larvae found in a particular breeding site.

[0023] Figure 4 illustrates a system (400) predicting the presence/absence of larvae per breeding site. According to the system, inspection data (401) are sent in real time via the internet (405) by the technical staff, who conduct mass sampling at the sampling sites designated on the digital operational maps, to a cloud server (403), where they are recorded in the online database (403-2). These data are encoded and contain information on the location of the sampling station, the level of larval abundance per breeding site, their distribution, stage of development, genus, etc. At the same time, Earth observation data (402) such as remote sensing, meteorological-climatic, topographic, environmental and demographic data, either sent from other cloud servers or obtained by accessing other databases via the Internet (405), are received for processing by the cloud server (403). Data is transmitted wirelessly or by wire over the Internet (405). Internet transmission is not restrictive, as any other type of network, such as a communication network or a combination of networks, can be used.

[0024] The system (400) of Figure 4 consists of a cloud server (403) containing a processor (403-1), memory (403-3) and a directly connected database (403-2) holding the recorded data (inspection and Earth observation) on larval outbreaks, so that the collected data can be processed and three distinct prediction models (403-5) of the presence/absence of larvae for the three genera (Culex, Aedes, Anopheles) per breeding site can be applied. In particular, the software of the three prediction models is implemented with the Random Forest (RF) and Extreme Gradient Boosting (XGBoost) algorithms. These algorithms are trained with the available set of collected data (230,627 observations after pre-processing of the data) and learn rules and correlation patterns between the data being processed and the presence/absence of larvae per breeding site (binary classification). In addition, the digital application (403-4) records in real time the inspection data collected by the technical staff during the on-site inspection at the designated sampling stations.

[0025] Figure 5 depicts the flow diagram (500) of the method of the three distinct models predicting the presence/absence of larvae for the three genera (Culex, Aedes, Anopheles) per breeding site. The collected inspection (501) and Earth observation (502) data are pre-processed by the cloud server. Initially, because the variables of the data set take different forms, cleaning, normalization, detection and correction of outliers, and handling of missing data are carried out. Because the dataset is unbalanced, techniques for oversampling the minority classes are then applied to augment the data until a balanced class distribution is achieved.
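The oversampling step can be sketched as follows. The application does not name the oversampling technique, so plain random duplication of minority-class rows is used here as a stand-in. The function name `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [row for row, lab in zip(rows, labels) if lab == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))  # sample with replacement
            out_labels.append(cls)
    return out_rows, out_labels


# Toy unbalanced data: four "absence" rows (0) and one "presence" row (1).
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = [0, 0, 0, 0, 1]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

In practice oversampling is usually applied to the training split only, so that duplicated rows do not leak into the evaluation data.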
The data are then processed and transformed by creating two additional variables: the breeding site dry variable, which indicates whether the breeding site is dry or not, and the dry site over time variable, which verifies the presence of surface water in the breeding sites. Using Recursive Feature Elimination, the most important features are selected (32 in total) and entered as inputs to the Machine Learning algorithms (RF/XGBoost) (504) for the implementation of three distinct prediction models (506) of the presence/absence of larvae for the three genera (Culex, Aedes, Anopheles) per breeding site. The performance of the two Machine Learning algorithms is improved (tuning) by selecting the optimal hyperparameters (505) for the three models predicting the presence/absence of larvae for the three genera (Culex, Aedes, Anopheles) per breeding site. In a preferred implementation of this invention, the prediction of the presence/absence of larvae for the three genera per breeding site may be fed into decision-making algorithms (507), which may be artificial intelligence algorithms, to perform additional actions to control larval reproduction. Also, the data with which the three distinct prediction models are trained can be continuously updated with new classified/corrected data. The retraining of the prediction models or their improvement (tuning) may or may not be performed periodically, depending on the specifics regarding the larvae and the prediction requirements. A person skilled in the art will understand that the implementation details of the machine learning algorithms (504) may differ in the alternative models of the present invention.

[0026] The technical advantages provided by the applied models are the accurate future prediction of the presence/absence of larvae for the three genera (Culex, Aedes, Anopheles) per breeding site and full control of the system and equipment used to make the prediction. The invention is obviously not limited to the specific models described here, but also includes any variation that a person skilled in the art may consider within the general purpose of the invention as defined in the claims.
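The feature-selection step of paragraph [0025], read here as recursive feature elimination (RFE), can be sketched as a loop that refits a model, drops the least important feature, and repeats. To stay dependency-free, this sketch swaps in a small logistic regression and ranks features by absolute weight, whereas the application uses RF/XGBoost. The data are synthetic and the target of 2 features out of 5 is illustrative, not the 32 features of the application.

```python
import math
import random


def fit_logistic(X, y, epochs=200, lr=0.5):
    """Fit logistic regression by batch gradient descent; return feature weights."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            z = max(-30.0, min(30.0, z))  # keep exp() in a safe range
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w


def recursive_feature_elimination(X, y, n_keep):
    """Refit, drop the feature with the smallest |weight|, repeat until n_keep remain."""
    kept = list(range(len(X[0])))  # original column indices still in play
    while len(kept) > n_keep:
        X_sub = [[row[j] for j in kept] for row in X]
        w = fit_logistic(X_sub, y)
        weakest = min(range(len(kept)), key=lambda k: abs(w[k]))
        del kept[weakest]
    return kept


# Synthetic, standardised data: only columns 0 and 3 drive the label.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(300)]
y = [1 if row[0] - row[3] > 0 else 0 for row in X]

selected = recursive_feature_elimination(X, y, n_keep=2)
print(selected)
```

Ranking by absolute weight is only meaningful when the features are on comparable scales, which is one reason the normalization step precedes feature selection. With a tree-based estimator, the model's feature importances would play the role of the weights.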