Modelling

Weather Forecasting

ECMWF produces operational global numerical weather forecasts of the highest quality, based on an Earth system model that is initialised with a vast number of observations of the atmosphere and oceans. These forecasts are probabilistic predictions based on an ensemble approach and therefore include intrinsic estimates of their uncertainty. Forecasts are provided for a range of timescales, from days (medium range) to weeks (extended range) and months (seasonal range). Here, we will start with the shorter time ranges and extend these to multiple weeks later in the project.
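As a minimal illustration of how an ensemble carries its own uncertainty estimate, the sketch below summarises the members for one location and lead time as an ensemble mean, a spread, and the probability of exceeding a threshold (the fraction of members above it). The function name and the threshold are hypothetical and are not part of any ECMWF product interface.

```python
import numpy as np


def ensemble_summary(members, threshold):
    """Summarise one ensemble forecast (one location, one lead time).

    members: values from each ensemble member, e.g. 24-h precipitation in mm
    threshold: hypothetical event threshold in the same units
    """
    x = np.asarray(members, dtype=float)
    return {
        "mean": float(x.mean()),                    # ensemble-mean forecast
        "spread": float(x.std(ddof=1)),             # sample standard deviation across members
        "p_exceed": float((x > threshold).mean()),  # fraction of members above the threshold
    }


if __name__ == "__main__":
    # Five hypothetical members; an operational ensemble would have 51.
    print(ensemble_summary([0.0, 2.0, 5.0, 12.0, 30.0], threshold=10.0))
```

Here two of the five members exceed 10 mm, so the event probability is 0.4.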

Forecasts of precipitation, temperature and humidity will be provided for the target regions of Hanoi and Ho Chi Minh City. The skill of these forecasts will be evaluated to establish whether they could be used directly as input to the integrated modelling framework. However, downscaling and combination with higher-resolution regional forecast data from the National Met Service will most likely improve forecast performance. Suitable statistical downscaling techniques will therefore be developed and applied to derive city-level forecasts for Hanoi and Ho Chi Minh City.
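One widely used statistical downscaling and bias-correction technique is empirical quantile mapping: each forecast value is assigned its quantile within the historical forecast distribution and replaced by the same quantile of the station observations. The sketch below assumes paired past forecasts and station observations for one city; it illustrates the idea and is not necessarily the technique the project will adopt. It also does not give special treatment to the many dry days typical of precipitation records.

```python
import numpy as np


def quantile_map(fc_train, obs_train, fc_new, n_quantiles=101):
    """Empirical quantile mapping of forecasts onto the observed distribution.

    fc_train: past forecast values at the grid point nearest the city
    obs_train: station observations for the same dates
    fc_new: new forecast values to correct (e.g. every ensemble member)
    """
    levels = np.linspace(0.0, 1.0, n_quantiles)
    fc_q = np.quantile(np.asarray(fc_train, dtype=float), levels)    # forecast climatology
    obs_q = np.quantile(np.asarray(obs_train, dtype=float), levels)  # observed climatology
    # Locate each new value among the forecast quantiles and read off the matching
    # observed quantile; values outside the training range are clamped to the end quantiles.
    return np.interp(np.asarray(fc_new, dtype=float), fc_q, obs_q)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    obs = rng.normal(28.0, 2.0, size=500)  # hypothetical station temperatures (deg C)
    fc = obs + 1.5                         # forecasts with a hypothetical warm bias
    print(quantile_map(fc, obs, [29.5, 31.5]))  # approximately [28.0, 30.0]
```

With a purely constant bias the mapping simply removes the offset; with real data it also corrects errors in spread and in the shape of the distribution.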

The focus will be on the rainy season (July-December). The weather forecast products will be developed and tested on freely available past forecast data from ECMWF. These retrospective forecasts (hindcasts) cover the last 20 years, are produced with the same forecast model, and will be our main database during the development phase of the dengue forecasting tool.
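Because the hindcasts serve as both development and test data, one way to avoid optimistic skill estimates is leave-one-year-out evaluation restricted to the July-December season. The sketch below assumes the hindcast start dates are available as `datetime.date` objects; the helper names and the monthly start dates in the example are hypothetical.

```python
from datetime import date

RAINY_SEASON_MONTHS = range(7, 13)  # July-December, the focus season of the plan


def in_season(start_dates):
    """Keep only hindcast start dates that fall in July-December."""
    return [d for d in start_dates if d.month in RAINY_SEASON_MONTHS]


def leave_one_year_out(start_dates):
    """Yield (held_out_year, train_dates, test_dates) so that every hindcast year
    is evaluated once by a method fitted only on the other years."""
    dates = in_season(start_dates)
    for year in sorted({d.year for d in dates}):
        train = [d for d in dates if d.year != year]
        test = [d for d in dates if d.year == year]
        yield year, train, test


if __name__ == "__main__":
    # Hypothetical archive: one start date per month over 20 years.
    archive = [date(y, m, 1) for y in range(2003, 2023) for m in range(1, 13)]
    for year, train, test in leave_one_year_out(archive):
        print(year, len(train), len(test))  # each year: 114 training and 6 test start dates
```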

While the operational forecasts currently have an ensemble size of 51 members, the hindcasts use smaller ensembles (approx. 15 members). However, depending on the exact start date of this project, larger hindcast ensembles may also become available. In our testing we will be able to use both the reduced hindcast ensemble and the full 51-member forecast ensemble. The horizontal resolution of ECMWF's ensemble forecasts is currently 18 km, with the prospect of an upgrade to 9 km globally when the next model cycle is implemented in early 2023. For verification we will use both ECMWF's latest global reanalysis, ERA5, and station data from Hanoi and Ho Chi Minh City.
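The plan does not name a verification score; one standard choice for ensembles is the continuous ranked probability score (CRPS). Its "fair" estimator corrects for finite ensemble size, which matters when comparing roughly 15-member hindcasts with 51-member forecasts. A minimal sketch, assuming a single verifying value (an ERA5 grid-point value or a station observation) per forecast:

```python
import numpy as np


def ensemble_crps(members, obs, fair=True):
    """CRPS of one ensemble forecast against one verifying value (lower is better).

    members: ensemble member values for one date and location
    obs: verifying value, e.g. an ERA5 grid-point value or a station observation
    fair: use the fair estimator, which corrects for finite ensemble size so that
          ensembles of different sizes can be compared
    """
    x = np.asarray(members, dtype=float)
    m = x.size
    if fair and m < 2:
        raise ValueError("the fair CRPS needs at least two members")
    accuracy = np.abs(x - obs).mean()                 # mean distance to the observation
    pair_sum = np.abs(x[:, None] - x[None, :]).sum()  # sum of distances over all member pairs
    denom = 2.0 * m * (m - 1) if fair else 2.0 * m * m
    return float(accuracy - pair_sum / denom)


if __name__ == "__main__":
    print(ensemble_crps([1.0, 3.0], 2.0), ensemble_crps([1.0, 3.0], 2.0, fair=False))
```

Averaging this score over all hindcast dates gives one number per configuration (for example 15 versus 51 members, or raw versus downscaled forecasts).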

Epidemiological modelling

The purpose of the epidemiological modelling is to provide not only a forecasting engine for public health officials, clinicians and the general public, but also a tool for scientists to explore and test mechanistic relationships between the multiple components and interactions that contribute to dengue transmission (entomological, human behavioural and virological). Furthermore, we will extend classical forecasting models by incorporating mechanistic models into the framework (as opposed to the agnostic approaches classically used in machine learning or statistics) and test whether this improves forecasting performance. This becomes increasingly relevant when the model is trained on data collected in locations different from those where forecasts are needed. This will typically be the case for the entomological compartment, as good-quality adult mosquito data are extremely difficult to collect in practice and have so far been available only from local-scale research projects (e.g. the World Mosquito Programme), and certainly not from longitudinal surveillance systems.

To address these multiple purposes flexibly, depending on the data available, we will develop a plug-and-play meta-model pipeline comprising three main components, as outlined below:

  • A mosquito population dynamics model that will translate meteorological conditions into transmission capacity for dengue virus. This model will account for local environmental conditions (some parts of the urban landscape are more suitable for mosquito breeding than others) and the socio-economic status of the population (which also determines mosquito breeding and biting success). The modelling will use the framework developed by Tran and collaborators (Ezanno et al 2015, Tran et al 2013, Cailly et al 2012). We will validate the model on adult mosquito data collected at the experimental sites of the World Mosquito Programme in Vietnam (see Table 1).

  • A human demographic and behavioural model that will include data such as birth rates (which determine the rate at which susceptibles are replenished in the population), socio-economic status (which determines the risk of exposure to mosquito bites), school and public holidays (Tết, 1st of May, 2nd of September, etc.), as well as connections between neighbourhoods within the city and between the city and other locations (in particular between Hanoi and Ho Chi Minh City). The ideal data to document this spatial coupling are viral genomes, which will be run through a phylogenetic pipeline developed in Oxford. High-resolution mobility data are available through a collaboration with the Facebook Data for Good programme. These analyses will help us understand the risk of emergence and the start of the dengue season in places with low or no transmission during the winter months.

  • A dengue transmission model built on top of the two previous components. This model will account for the immunological status of the local population, reconstructed from past incidence and serotype data (from 15 years of studies by the dengue group at OUCRU). The transmission model will also include interventions in the form of insecticide spraying and public information campaigns. The model will be updated as new data come in, and its results will feed directly back to policymakers.
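The plug-and-play idea can be sketched as three interchangeable functions chained day by day. This is a deliberately simplified sketch under stated assumptions. The mosquito component uses the classical vectorial capacity formula (Macdonald/Garrett-Jones), not the Tran et al. framework. Its daily inputs (mosquitoes per human, biting rate, daily survival, extrinsic incubation period) are assumed to be derived upstream from weather data. The transmission component is a single-serotype SIR model whose force of infection is proportional to vectorial capacity. `control` is a hypothetical fractional reduction standing in for interventions. All function names and default parameter values are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class HumanState:
    S: float  # susceptible
    I: float  # infectious
    R: float  # recovered / immune

    @property
    def N(self) -> float:
        return self.S + self.I + self.R


def mosquito_component(m, a, p, n):
    """Vectorial capacity m * a^2 * p^n / (-ln p).

    m: mosquitoes per human, a: human bites per mosquito per day,
    p: daily mosquito survival probability, n: extrinsic incubation period (days).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("daily survival p must lie strictly between 0 and 1")
    return m * a**2 * p**n / -math.log(p)


def human_component(state, birth_rate, death_rate):
    """Births replenish susceptibles; deaths remove people from every class (rates per day)."""
    births = birth_rate * state.N
    return HumanState(
        S=state.S + births - death_rate * state.S,
        I=state.I * (1.0 - death_rate),
        R=state.R * (1.0 - death_rate),
    )


def transmission_component(state, vc, b, c, recovery_rate, control=0.0):
    """One daily SIR step with a force of infection proportional to vectorial capacity.

    b, c: mosquito-to-human and human-to-mosquito transmission probabilities.
    control: hypothetical fractional reduction from interventions (0 = none, 1 = complete).
    Returns the new state and the number of new infections.
    """
    foi = (1.0 - control) * vc * b * c * state.I / state.N
    new_inf = min(foi * state.S, state.S)  # cannot infect more people than are susceptible
    recovered = recovery_rate * state.I
    new_state = HumanState(state.S - new_inf, state.I + new_inf - recovered, state.R + recovered)
    return new_state, new_inf


def run_pipeline(daily_vector_params, state, b=0.5, c=0.5, recovery_rate=1 / 7,
                 birth_rate=0.0, death_rate=0.0, control=0.0):
    """Chain the three components day by day; return daily incidence and the final state.

    daily_vector_params: one (m, a, p, n) tuple per day, assumed to come from a
    weather-driven mosquito model upstream.
    """
    incidence = []
    for m, a, p, n in daily_vector_params:
        vc = mosquito_component(m, a, p, n)
        state = human_component(state, birth_rate, death_rate)
        state, new_inf = transmission_component(state, vc, b, c, recovery_rate, control)
        incidence.append(new_inf)
    return incidence, state


if __name__ == "__main__":
    days = [(2.0, 0.3, 0.9, 10.0)] * 90  # hypothetical constant conditions over 90 days
    for control in (0.0, 0.5):
        inc, final = run_pipeline(days, HumanState(S=9990.0, I=10.0, R=0.0), control=control)
        print(control, round(sum(inc), 1))
```

Because the components exchange only plain numbers and a `HumanState`, any one of them could be replaced (for example by a stage-structured mosquito model or a multi-serotype immunity model) without changing the others.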

Inference and forecasting targets: The modelling components may vary significantly depending on the forecast target (size and duration of the dengue season, maximum incidence, timing of outbreaks, risk of exposure, and hospital and ICU bed forecasts). We will therefore build a flexible epidemiological model that can be adapted to each of these targets. It will be trained on historical data using machine learning methods such as boosted regression trees (e.g. XGBoost), support vector machines and random forests. Training will use several cross-validation schemes (k-fold, leave-p-out). Mean squared error and the Weighted Interval Score (WIS) will be used as accuracy metrics (Bracher et al 2021, Paireau et al 2021).
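A minimal sketch of both accuracy metrics, following the WIS definition in Bracher et al. (2021). The absolute error of the predictive median gets weight 1/2, each central (1 - alpha) interval score gets weight alpha/2, and the sum is divided by K + 1/2 for K intervals. Representing a forecast as a dict mapping alpha to an interval is an assumption made for illustration. In a k-fold or leave-p-out loop, these scores would be averaged over the held-out folds.

```python
def mean_squared_error(predictions, observations):
    """Mean squared error between point forecasts and observed values."""
    if len(predictions) != len(observations) or len(observations) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((p - o) ** 2 for p, o in zip(predictions, observations)) / len(observations)


def interval_score(lower, upper, y, alpha):
    """Interval score of the central (1 - alpha) prediction interval [lower, upper]:
    its width plus 2/alpha times the distance by which y falls outside it."""
    score = upper - lower
    if y < lower:
        score += 2.0 / alpha * (lower - y)
    elif y > upper:
        score += 2.0 / alpha * (y - upper)
    return score


def weighted_interval_score(median, intervals, y):
    """Weighted Interval Score as defined by Bracher et al. (2021).

    median: predictive median m
    intervals: dict mapping alpha_k to (lower, upper) of the central (1 - alpha_k) interval
    y: observed value, e.g. a weekly dengue case count
    WIS = (0.5 * |y - m| + sum_k alpha_k / 2 * IS_alpha_k) / (K + 0.5)
    """
    total = 0.5 * abs(y - median)
    for alpha, (lower, upper) in intervals.items():
        total += alpha / 2.0 * interval_score(lower, upper, y, alpha)
    return total / (len(intervals) + 0.5)


if __name__ == "__main__":
    # Hypothetical forecast: median 10 cases, 50% interval [8, 12], 80% interval [5, 15].
    forecast = {0.5: (8.0, 12.0), 0.2: (5.0, 15.0)}
    print(weighted_interval_score(10.0, forecast, 10.0))  # 0.8
    print(weighted_interval_score(10.0, forecast, 20.0))  # 8.0 (observation outside both intervals)
```

Like the mean squared error, WIS is negatively oriented (lower is better), but it also rewards well-calibrated, sharp prediction intervals rather than only accurate point forecasts.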