Geospatial · Jan 15, 2026 · 12 min read

Real-Time Flood Monitoring in Vietnam

Building an end-to-end satellite data pipeline for flood detection, ML-powered prediction, and automated alerting using Sentinel-1 SAR imagery, PostGIS, and GeoServer.

The Flooding Challenge in Vietnam

Vietnam ranks among the top five countries most affected by climate-related flooding. The Mekong Delta -- home to roughly 18 million people and responsible for over half of the nation's rice output -- experiences annual inundation cycles that are growing more severe and less predictable. In the central highlands, flash floods triggered by intense monsoon rainfall devastate infrastructure with little warning.

Traditional monitoring relies on sparse gauge networks and manual field reports, leaving large gaps in spatial coverage and response time. Satellite remote sensing offers a way to close those gaps: synthetic aperture radar (SAR) instruments penetrate cloud cover, operate day and night, and deliver continent-scale observations on a revisit cadence of six to twelve days.

This article walks through the architecture of a flood monitoring platform that combines Sentinel-1 SAR imagery, interactive web maps, LSTM-based water level prediction, and an automated alerting pipeline. Every component is open-source and deployable on commodity hardware, making the system accessible to regional disaster management agencies with limited budgets.

Scope -- The system covers the Mekong Delta provinces (An Giang, Dong Thap, Long An, Can Tho) and three central highland basins (Thu Bon, Vu Gia, Huong River). Total monitored area exceeds 45,000 km². All data referenced in this article is publicly available through the Copernicus Data Space Ecosystem and Vietnam's national hydro-meteorological service.

Satellite Data Pipeline

The foundation of the system is a continuous ingestion pipeline that discovers, downloads, and preprocesses Sentinel-1 Ground Range Detected (GRD) products from the Copernicus Data Space Ecosystem. We target IW (Interferometric Wide) swath mode in VV and VH polarisations, which provides a 10-metre pixel spacing ideally suited to regional flood mapping.

Discovery and Download

A scheduled Airflow DAG queries the OData catalogue API every six hours, filtering for products that intersect our areas of interest and were acquired within the last 48 hours. Matching scenes are downloaded in parallel using the Copernicus access token flow and stored in a staging bucket on MinIO.

import requests
from datetime import datetime, timedelta

ODATA_URL = "https://catalogue.dataspace.copernicus.eu/odata/v1/Products"

def discover_scenes(aoi_wkt: str, hours_back: int = 48) -> list[dict]:
    """Query Copernicus OData for recent Sentinel-1 GRD scenes."""
    cutoff = (datetime.utcnow() - timedelta(hours=hours_back)).strftime(
        "%Y-%m-%dT%H:%M:%S.000Z"
    )
    params = {
        "$filter": (
            f"Collection/Name eq 'SENTINEL-1' "
            f"and Attributes/OData.CSC.StringAttribute/any("
            f"att:att/Name eq 'productType' and att/OData.CSC.StringAttribute/"
            f"Value eq 'GRD') "
            f"and ContentDate/Start gt {cutoff} "
            f"and OData.CSC.Intersects(area=geography'SRID=4326;{aoi_wkt}')"
        ),
        "$top": 50,
    }
    resp = requests.get(ODATA_URL, params=params, timeout=30)
    resp.raise_for_status()
    return resp.json().get("value", [])
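
The download step itself is a token-authenticated streaming GET against each product's OData `$value` endpoint. A sketch, assuming the Data Space Ecosystem's zipper download host; obtaining the access token and the subsequent MinIO upload are handled elsewhere in the DAG:

```python
import requests

ZIPPER_URL = "https://zipper.dataspace.copernicus.eu/odata/v1/Products"

def product_download_url(product_id: str) -> str:
    """Build the OData $value download URL for a product ID."""
    return f"{ZIPPER_URL}({product_id})/$value"

def download_scene(product_id: str, access_token: str, dest_path: str) -> None:
    """Stream a scene to local disk before staging it in MinIO."""
    headers = {"Authorization": f"Bearer {access_token}"}
    with requests.get(
        product_download_url(product_id), headers=headers, stream=True, timeout=60
    ) as resp:
        resp.raise_for_status()
        with open(dest_path, "wb") as fh:
            # GRD products run to several hundred MB; stream in 8 MB chunks
            for chunk in resp.iter_content(chunk_size=8 * 1024 * 1024):
                fh.write(chunk)
```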

Preprocessing with GDAL and SNAP

Each downloaded GRD product passes through a four-step preprocessing chain: apply orbit file, thermal noise removal, radiometric calibration to sigma-nought, and Range-Doppler terrain correction using the SRTM 1-arc-second DEM. We orchestrate this through the ESA SNAP Graph Processing Tool (GPT) driven by a templated XML graph, but the calibration and terrain correction can also be accomplished entirely with GDAL for lighter-weight deployments.
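
A sketch of how the templated graph can be rendered and handed to GPT; the graph below is heavily abbreviated (a production graph carries operator parameters such as DEM selection, pixel spacing, and map projection):

```python
import subprocess
from string import Template

# Abbreviated GPT graph covering the four preprocessing steps; operator
# names are the standard SNAP ones, parameters are omitted for brevity
GRAPH_TEMPLATE = Template("""<graph id="s1-preprocess">
  <version>1.0</version>
  <node id="Read"><operator>Read</operator>
    <parameters><file>$input</file></parameters></node>
  <node id="Orbit"><operator>Apply-Orbit-File</operator>
    <sources><sourceProduct refid="Read"/></sources></node>
  <node id="Noise"><operator>ThermalNoiseRemoval</operator>
    <sources><sourceProduct refid="Orbit"/></sources></node>
  <node id="Cal"><operator>Calibration</operator>
    <sources><sourceProduct refid="Noise"/></sources></node>
  <node id="TC"><operator>Terrain-Correction</operator>
    <sources><sourceProduct refid="Cal"/></sources></node>
  <node id="Write"><operator>Write</operator>
    <sources><sourceProduct refid="TC"/></sources>
    <parameters><file>$output</file></parameters></node>
</graph>""")

def render_graph(input_path: str, output_path: str) -> str:
    """Fill the XML graph template with concrete scene paths."""
    return GRAPH_TEMPLATE.substitute(input=input_path, output=output_path)

def run_gpt(graph_path: str) -> None:
    """Invoke the SNAP Graph Processing Tool on a rendered graph file."""
    subprocess.run(["gpt", graph_path], check=True)
```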

from osgeo import gdal
import numpy as np

def calibrate_sigma0(input_path: str, output_path: str) -> None:
    """Convert DN to sigma-nought in dB for Sentinel-1 GRD."""
    ds = gdal.Open(input_path, gdal.GA_ReadOnly)
    band = ds.GetRasterBand(1)
    dn = band.ReadAsArray().astype(np.float64)

    # Sigma-nought (dB) = 10*log10(DN^2) - K, using a fixed calibration
    # offset here in place of the per-pixel annotation LUT for brevity
    sigma0 = np.where(dn > 0, 10.0 * np.log10(dn ** 2) - 83.0, np.nan)

    driver = gdal.GetDriverByName("GTiff")
    out_ds = driver.Create(
        output_path,
        ds.RasterXSize,
        ds.RasterYSize,
        1,
        gdal.GDT_Float32,
        options=["COMPRESS=DEFLATE", "TILED=YES"],
    )
    out_ds.SetGeoTransform(ds.GetGeoTransform())
    out_ds.SetProjection(ds.GetProjection())
    out_ds.GetRasterBand(1).WriteArray(sigma0)
    out_ds.FlushCache()
    out_ds = None
    ds = None

Terrain-corrected outputs are reprojected to EPSG:4326 and tiled into Cloud Optimised GeoTIFF (COG) format so they can be served directly via HTTP range requests without a dedicated tile server for raster exploration.

def convert_to_cog(input_path: str, cog_path: str) -> None:
    """Re-encode a GeoTIFF as a Cloud Optimised GeoTIFF."""
    gdal.Translate(
        cog_path,
        input_path,
        format="COG",
        creationOptions=[
            "COMPRESS=DEFLATE",
            "OVERVIEW_RESAMPLING=AVERAGE",
            "BLOCKSIZE=512",
        ],
    )

Flood Detection Algorithm

Water surfaces produce specular reflection of the SAR signal, returning very little energy to the sensor. This makes flooded areas appear dark in SAR imagery compared to surrounding land. The detection algorithm exploits this contrast through a combination of global thresholding and change detection.

Otsu Thresholding on Sigma-Nought

For each terrain-corrected VV-polarisation scene, we compute a bimodal histogram of backscatter values and apply Otsu's method to find the optimal threshold separating water from non-water pixels. This works reliably in flat deltaic terrain but requires refinement in hilly areas where radar shadow can mimic water signatures.

from skimage.filters import threshold_otsu
from osgeo import gdal
import numpy as np

def detect_flood_otsu(sigma0_path: str) -> np.ndarray:
    """Binary flood mask via Otsu threshold on VV sigma-nought (dB)."""
    ds = gdal.Open(sigma0_path, gdal.GA_ReadOnly)
    sigma0 = ds.GetRasterBand(1).ReadAsArray()
    ds = None

    valid = sigma0[np.isfinite(sigma0)]
    thresh = threshold_otsu(valid)

    flood_mask = np.where(
        np.isfinite(sigma0) & (sigma0 < thresh), 1, 0
    ).astype(np.uint8)
    return flood_mask

Change Detection

Thresholding a single scene cannot distinguish permanent water bodies from flood water. To isolate newly inundated areas, we compute the pixel-wise difference between a reference (dry-season composite) and the current (event) scene. Only pixels that transition from land to water are classified as flood extent.

def change_detection(
    reference_path: str,
    event_path: str,
    diff_threshold_db: float = -3.0,
) -> np.ndarray:
    """Detect new flood pixels by backscatter change vs dry-season ref."""
    ref_ds = gdal.Open(reference_path, gdal.GA_ReadOnly)
    evt_ds = gdal.Open(event_path, gdal.GA_ReadOnly)

    ref = ref_ds.GetRasterBand(1).ReadAsArray()
    evt = evt_ds.GetRasterBand(1).ReadAsArray()

    ref_ds = None
    evt_ds = None

    diff = evt - ref  # negative means signal decreased (more water-like)

    flood_change = np.where(
        np.isfinite(diff) & (diff < diff_threshold_db), 1, 0
    ).astype(np.uint8)
    return flood_change

Accuracy note -- Validation against manually delineated flood polygons from the Copernicus Emergency Management Service (CEMS) Rapid Mapping activations for events in 2023-2025 shows an overall accuracy of 89% and a kappa coefficient of 0.81. Most misclassifications occur under dense vegetation canopy, where double-bounce scattering masks the specular water response.
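
The reported accuracy figures can be reproduced from a pixel-wise comparison between the detected mask and the rasterised CEMS reference polygons; a minimal numpy sketch (function name is illustrative):

```python
import numpy as np

def accuracy_and_kappa(pred: np.ndarray, truth: np.ndarray) -> tuple[float, float]:
    """Overall accuracy and Cohen's kappa for binary flood masks."""
    pred = pred.ravel().astype(int)
    truth = truth.ravel().astype(int)
    n = len(truth)
    po = float(np.mean(pred == truth))  # observed agreement
    # expected chance agreement from the marginal class frequencies
    pe = sum(
        (np.sum(pred == c) / n) * (np.sum(truth == c) / n) for c in (0, 1)
    )
    kappa = (po - pe) / (1.0 - pe) if pe < 1.0 else 1.0
    return po, kappa
```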

The final flood extent polygons are vectorised, simplified with a Douglas-Peucker tolerance of 20 metres, and written to PostGIS for downstream consumption by the map platform and alerting engine.
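
In production the simplification runs inside PostGIS, but as a self-contained illustration of what the tolerance parameter does, here is a pure-Python Douglas-Peucker sketch (tolerance is in the same units as the input coordinates, so 20 m presupposes a projected CRS):

```python
import math

def _point_line_dist(p, a, b):
    """Perpendicular distance from p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len = math.hypot(dx, dy)
    if seg_len == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dx * (ay - py) - dy * (ax - px)) / seg_len

def douglas_peucker(points, tolerance):
    """Simplify a polyline, keeping points deviating more than tolerance."""
    if len(points) < 3:
        return list(points)
    # find the point furthest from the chord between the endpoints
    dmax, idx = 0.0, 0
    for i in range(1, len(points) - 1):
        d = _point_line_dist(points[i], points[0], points[-1])
        if d > dmax:
            dmax, idx = d, i
    if dmax <= tolerance:
        return [points[0], points[-1]]
    # recurse on both halves, splitting at the furthest point
    left = douglas_peucker(points[: idx + 1], tolerance)
    right = douglas_peucker(points[idx:], tolerance)
    return left[:-1] + right
```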

Interactive Map Platform

Operational users -- provincial disaster committees, DARD offices, search-and-rescue teams -- need a visual interface that loads fast and works on low-bandwidth connections common in rural Vietnam. We built the map client with MapLibre GL JS backed by vector tiles generated from PostGIS and raster tile layers served through GeoServer.

Vector Tile Generation

Flood extent polygons and gauge station point layers are served as Mapbox Vector Tiles (MVT) through pg_tileserv, which reads directly from PostGIS views. This avoids the need to pre-generate a tile cache and ensures that every map pan or zoom returns the latest data.

-- PostGIS view consumed by pg_tileserv as a vector tile source
CREATE OR REPLACE VIEW public.flood_extent_tiles AS
SELECT
    fe.id,
    fe.event_date,
    fe.area_km2,
    fe.severity,
    fe.province,
    ST_Transform(fe.geom, 3857) AS geom
FROM
    flood_events.flood_extents fe
WHERE
    fe.event_date >= CURRENT_DATE - INTERVAL '30 days'
    AND fe.is_validated = true;

COMMENT ON VIEW public.flood_extent_tiles IS
    'Recent validated flood extents for vector tile serving';

WMS Raster Layers via GeoServer

The terrain-corrected SAR backscatter imagery and derived flood probability surfaces are published as WMS layers through GeoServer. Each layer is configured with a colour ramp SLD that maps sigma-nought values to an intuitive blue gradient, letting users visually assess flood severity without interpreting raw radar values.
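
An abbreviated RasterSymbolizer fragment of the kind referenced; the dB breakpoints and colours here are illustrative, not the deployed ramp:

```xml
<RasterSymbolizer>
  <ColorMap>
    <!-- low backscatter (water-like) maps to dark blue -->
    <ColorMapEntry color="#1e3a8a" quantity="-25" label="-25 dB"/>
    <ColorMapEntry color="#2563eb" quantity="-18" label="-18 dB"/>
    <ColorMapEntry color="#93c5fd" quantity="-12" label="-12 dB"/>
    <ColorMapEntry color="#e2e8f0" quantity="-5"  label="-5 dB"/>
  </ColorMap>
</RasterSymbolizer>
```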

MapLibre GL Client

The front-end map component initialises a MapLibre GL map centred on the Mekong Delta at zoom level 8. Vector tile sources are added for flood extents and gauge stations, while the SAR raster is loaded as a raster source from the GeoServer WMS endpoint. A layer switcher lets users toggle between base maps (satellite imagery, OpenStreetMap) and overlay combinations.

import maplibregl from "maplibre-gl";

const map = new maplibregl.Map({
  container: "map",
  style: "https://tiles.example.com/styles/dark/style.json",
  center: [105.75, 10.03],   // Mekong Delta
  zoom: 8,
});

map.on("load", () => {
  // Vector tile source -- flood extents from pg_tileserv
  map.addSource("flood-extents", {
    type: "vector",
    tiles: [
      "https://tiles.example.com/public.flood_extent_tiles/{z}/{x}/{y}.pbf",
    ],
    minzoom: 5,
    maxzoom: 14,
  });

  map.addLayer({
    id: "flood-fill",
    type: "fill",
    source: "flood-extents",
    "source-layer": "public.flood_extent_tiles",
    paint: {
      "fill-color": [
        "interpolate", ["linear"], ["get", "severity"],
        1, "#60a5fa",   // low
        3, "#2563eb",   // moderate
        5, "#1e3a8a",   // severe
      ],
      "fill-opacity": 0.55,
    },
  });

  // WMS raster source -- SAR backscatter
  map.addSource("sar-backscatter", {
    type: "raster",
    tiles: [
      "https://geoserver.example.com/geoserver/flood/wms?"
      + "service=WMS&version=1.1.1&request=GetMap"
      + "&layers=flood:sigma0_latest&srs=EPSG:3857"
      + "&bbox={bbox-epsg-3857}&width=256&height=256"
      + "&format=image/png&transparent=true",
    ],
    tileSize: 256,
  });

  map.addLayer({
    id: "sar-overlay",
    type: "raster",
    source: "sar-backscatter",
    paint: { "raster-opacity": 0.7 },
    layout: { visibility: "none" },
  });
});

Performance -- Vector tiles from pg_tileserv render in under 200 ms for typical viewport queries. The GeoServer WMS endpoint is fronted by a Varnish cache with a 5-minute TTL, keeping tile response times below 80 ms for repeat requests. Total initial map load on a 3G connection averages 2.4 seconds.

ML Prediction Model

Detecting current flood extent is valuable, but decision-makers also need forecasts. Our prediction module uses a Long Short-Term Memory (LSTM) network to forecast water levels at 47 gauge stations across the monitored basins, providing 24-, 48-, and 72-hour lead times.

Feature Engineering

For each gauge, we assemble a multivariate time series containing hourly water level readings, upstream gauge levels (lagged), cumulative rainfall from the nearest four meteorological stations, soil moisture estimates from SMAP satellite data, and a satellite-derived flood fraction index computed from the most recent Sentinel-1 pass.

import pandas as pd
import numpy as np

# `engine` is a SQLAlchemy engine connected to the monitoring database

def build_feature_matrix(
    gauge_id: str,
    lookback_hours: int = 168,
) -> tuple[np.ndarray, np.ndarray]:
    """Assemble LSTM input features and multi-horizon targets for a gauge."""
    # Load hourly water levels; make_interval keeps the hour count a bind
    # parameter (a quoted INTERVAL literal cannot be parameterised)
    wl = pd.read_sql(
        """
        SELECT ts, water_level_m
        FROM hydro.gauge_readings
        WHERE gauge_id = %(gid)s
          AND ts >= NOW() - make_interval(hours => %(lb)s)
        ORDER BY ts
        """,
        con=engine,
        params={"gid": gauge_id, "lb": lookback_hours},
        parse_dates=["ts"],
    ).set_index("ts").resample("1h").mean().interpolate()

    # Load rainfall from the mapped meteorological stations
    rain = pd.read_sql(
        """
        SELECT ts, cumulative_mm
        FROM meteo.rainfall
        WHERE station_id IN (
            SELECT station_id FROM meteo.gauge_station_map
            WHERE gauge_id = %(gid)s
        )
          AND ts >= NOW() - make_interval(hours => %(lb)s)
        ORDER BY ts
        """,
        con=engine,
        params={"gid": gauge_id, "lb": lookback_hours},
        parse_dates=["ts"],
    ).set_index("ts").resample("1h").sum().fillna(0)

    # Load satellite-derived flood fraction
    flood_frac = pd.read_sql(
        """
        SELECT observation_ts AS ts, flood_fraction
        FROM satellite.flood_indices
        WHERE gauge_id = %(gid)s
          AND observation_ts >= NOW() - make_interval(hours => %(lb)s)
        ORDER BY observation_ts
        """,
        con=engine,
        params={"gid": gauge_id, "lb": lookback_hours},
        parse_dates=["ts"],
    ).set_index("ts").resample("1h").interpolate()

    features = pd.concat([wl, rain, flood_frac], axis=1).dropna()

    # Targets at the 24/48/72-hour horizons, aligned on the feature index
    targets = pd.concat(
        [features["water_level_m"].shift(-h) for h in (24, 48, 72)], axis=1
    )
    mask = targets.notna().all(axis=1)
    X = features.values[mask.values]  # shape (T, num_features)
    y = targets.values[mask.values]   # shape (T, 3)
    return X, y

Model Architecture

The network consists of two stacked LSTM layers (128 and 64 units) followed by a dense head that outputs water level predictions at three horizons simultaneously. We apply dropout of 0.2 between layers and train with the Adam optimiser on mean squared error loss. Training uses historical data from 2015 to 2023, with 2024 held out for validation.

import torch
import torch.nn as nn

class WaterLevelLSTM(nn.Module):
    """Multi-horizon water level forecasting network."""

    def __init__(
        self,
        input_dim: int = 4,
        hidden_dim: int = 128,
        num_layers: int = 2,
        horizons: int = 3,
        dropout: float = 0.2,
    ):
        super().__init__()
        self.lstm = nn.LSTM(
            input_size=input_dim,
            hidden_size=hidden_dim,
            num_layers=num_layers,
            batch_first=True,
            dropout=dropout,
        )
        self.fc = nn.Sequential(
            nn.Linear(hidden_dim, 64),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(64, horizons),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, input_dim)
        lstm_out, _ = self.lstm(x)
        last_hidden = lstm_out[:, -1, :]  # take last time step
        return self.fc(last_hidden)       # (batch, horizons)

Training Pipeline

Training is orchestrated by an MLflow experiment. Each gauge station gets its own model. The pipeline normalises features using a sliding 30-day z-score, creates rolling-window sequences of length 168 (7 days), and trains for 80 epochs with early stopping on validation RMSE. Best checkpoints are registered in the MLflow model registry and promoted to "Production" stage after passing automated accuracy gates.
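
The `create_sequences` helper referenced here is not shown elsewhere; a minimal sketch of it and of the rolling z-score normalisation (the 720-hour window is our reading of "sliding 30-day"):

```python
import numpy as np

def zscore_normalise(X: np.ndarray, window: int = 720) -> np.ndarray:
    """Rolling z-score over a trailing window (720 h ~ 30 days)."""
    out = np.empty_like(X, dtype=np.float64)
    for t in range(len(X)):
        lo = max(0, t - window + 1)
        mu = X[lo : t + 1].mean(axis=0)
        sd = X[lo : t + 1].std(axis=0) + 1e-8  # guard against zero variance
        out[t] = (X[t] - mu) / sd
    return out

def create_sequences(
    X: np.ndarray, y: np.ndarray, seq_len: int = 168
) -> tuple[np.ndarray, np.ndarray]:
    """Rolling windows of length seq_len; target aligned to each window end."""
    xs, ys = [], []
    for i in range(len(X) - seq_len + 1):
        xs.append(X[i : i + seq_len])
        ys.append(y[i + seq_len - 1])
    return np.asarray(xs), np.asarray(ys)
```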

import mlflow
import mlflow.pytorch

def train_gauge_model(gauge_id: str, epochs: int = 80) -> None:
    """Train and register an LSTM model for one gauge station."""
    X, y = build_feature_matrix(gauge_id)
    X_seq, y_seq = create_sequences(X, y, seq_len=168)

    split = int(len(X_seq) * 0.8)
    train_X, val_X = X_seq[:split], X_seq[split:]
    train_y, val_y = y_seq[:split], y_seq[split:]

    model = WaterLevelLSTM(input_dim=X.shape[1])
    optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
    criterion = nn.MSELoss()

    with mlflow.start_run(run_name=f"lstm_{gauge_id}"):
        mlflow.log_param("gauge_id", gauge_id)
        mlflow.log_param("seq_len", 168)
        mlflow.log_param("epochs", epochs)

        best_val_rmse = float("inf")
        patience, wait = 10, 0

        for epoch in range(epochs):
            model.train()
            pred = model(torch.tensor(train_X, dtype=torch.float32))
            loss = criterion(pred, torch.tensor(train_y, dtype=torch.float32))
            optimiser.zero_grad()
            loss.backward()
            optimiser.step()

            model.eval()
            with torch.no_grad():
                val_pred = model(torch.tensor(val_X, dtype=torch.float32))
                val_rmse = torch.sqrt(
                    criterion(val_pred, torch.tensor(val_y, dtype=torch.float32))
                ).item()

            mlflow.log_metric("val_rmse", val_rmse, step=epoch)

            if val_rmse < best_val_rmse:
                best_val_rmse = val_rmse
                wait = 0
                mlflow.pytorch.log_model(model, "best_model")
            else:
                wait += 1
                if wait >= patience:
                    break

        mlflow.log_metric("best_val_rmse", best_val_rmse)

Model performance -- Across all 47 gauges, the median RMSE for 24-hour forecasts is 0.18 m, for 48-hour forecasts 0.31 m, and for 72-hour forecasts 0.47 m. Stations in the lower Mekong with smoother hydrographs achieve sub-0.10 m accuracy at the 24-hour horizon.

Alert and Response System

Predictions and detected flood extents feed into a rule-based alerting engine. Each gauge station has three configurable thresholds: advisory (yellow), warning (orange), and emergency (red), aligned with Vietnam's national flood warning classification. When a predicted or observed water level crosses a threshold, the system triggers a notification cascade.

Threshold Evaluation

from enum import Enum
from dataclasses import dataclass

class AlertLevel(Enum):
    NORMAL = "normal"
    ADVISORY = "advisory"
    WARNING = "warning"
    EMERGENCY = "emergency"

@dataclass
class GaugeThresholds:
    gauge_id: str
    advisory_m: float
    warning_m: float
    emergency_m: float

def evaluate_alert(
    thresholds: GaugeThresholds,
    predicted_level: float,
) -> AlertLevel:
    """Determine alert level from predicted water level."""
    if predicted_level >= thresholds.emergency_m:
        return AlertLevel.EMERGENCY
    elif predicted_level >= thresholds.warning_m:
        return AlertLevel.WARNING
    elif predicted_level >= thresholds.advisory_m:
        return AlertLevel.ADVISORY
    return AlertLevel.NORMAL

Notification Channels

Alerts are dispatched through multiple channels to maximise reach. Provincial disaster committees receive structured webhook payloads compatible with their existing command-and-control dashboards. Field teams receive SMS summaries via a Twilio integration. A dedicated Telegram bot pushes real-time updates to community groups in affected districts.

import httpx
from datetime import datetime

async def send_webhook_alert(
    webhook_url: str,
    gauge_id: str,
    alert_level: AlertLevel,
    predicted_level: float,
    forecast_hour: int,
) -> None:
    """Push structured alert payload to external webhook."""
    payload = {
        "gauge_id": gauge_id,
        "alert_level": alert_level.value,
        "predicted_water_level_m": round(predicted_level, 2),
        "forecast_horizon_hours": forecast_hour,
        "source": "vietnam-flood-monitor",
        "timestamp": datetime.utcnow().isoformat() + "Z",
    }
    async with httpx.AsyncClient(timeout=10) as client:
        resp = await client.post(webhook_url, json=payload)
        resp.raise_for_status()


async def send_sms_alert(
    phone_numbers: list[str],
    gauge_name: str,
    alert_level: AlertLevel,
    predicted_level: float,
) -> None:
    """Dispatch SMS alerts via Twilio."""
    from twilio.rest import Client as TwilioClient

    client = TwilioClient()
    body = (
        f"FLOOD ALERT [{alert_level.value.upper()}] -- "
        f"{gauge_name}: predicted {predicted_level:.2f}m. "
        f"Take precautionary measures."
    )
    for number in phone_numbers:
        client.messages.create(
            body=body,
            from_="+84xxxxxxxxx",
            to=number,
        )

All alerts are logged in a PostgreSQL audit table with delivery status tracking. A de-duplication window of four hours prevents repeated notifications for the same gauge and alert level, reducing alert fatigue for field responders.
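
The production de-duplication check runs against the alerts audit table; an in-memory sketch of the four-hour window logic (class name is illustrative):

```python
from datetime import datetime, timedelta

class AlertDeduplicator:
    """Suppress repeat alerts for the same (gauge, level) within a window."""

    def __init__(self, window_hours: int = 4):
        self.window = timedelta(hours=window_hours)
        self._last_sent: dict[tuple[str, str], datetime] = {}

    def should_send(self, gauge_id: str, level: str, now: datetime) -> bool:
        key = (gauge_id, level)
        last = self._last_sent.get(key)
        if last is not None and now - last < self.window:
            return False  # same gauge and level alerted too recently
        self._last_sent[key] = now
        return True
```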

Infrastructure

The entire platform runs as a set of Docker containers orchestrated by Docker Compose on a single server with 32 GB RAM, 8 vCPUs, and 2 TB of NVMe storage. This keeps operational costs low while handling the current data volume comfortably. A migration path to Kubernetes is documented for future scale-out.

PostGIS Spatial Database

PostGIS is the central data store for all vector layers: flood extent polygons, gauge station metadata, administrative boundaries, and alert audit logs. Spatial indices on geometry columns enable sub-millisecond containment and intersection queries that power the map tile endpoints and alert evaluation.

-- Schema layout for flood monitoring database
CREATE SCHEMA IF NOT EXISTS flood_events;
CREATE SCHEMA IF NOT EXISTS hydro;
CREATE SCHEMA IF NOT EXISTS meteo;
CREATE SCHEMA IF NOT EXISTS satellite;
CREATE SCHEMA IF NOT EXISTS alerts;

-- Flood extents table with spatial index
CREATE TABLE flood_events.flood_extents (
    id            BIGSERIAL PRIMARY KEY,
    event_date    DATE NOT NULL,
    source_scene  TEXT NOT NULL,
    province      TEXT,
    area_km2      NUMERIC(10,2),
    severity      SMALLINT CHECK (severity BETWEEN 1 AND 5),
    is_validated  BOOLEAN DEFAULT false,
    created_at    TIMESTAMPTZ DEFAULT NOW(),
    geom          GEOMETRY(MultiPolygon, 4326) NOT NULL
);

CREATE INDEX idx_flood_extents_geom
    ON flood_events.flood_extents USING GIST (geom);
CREATE INDEX idx_flood_extents_date
    ON flood_events.flood_extents (event_date DESC);

-- Gauge stations
CREATE TABLE hydro.gauge_stations (
    gauge_id      TEXT PRIMARY KEY,
    name          TEXT NOT NULL,
    river_basin   TEXT,
    province      TEXT,
    elevation_m   NUMERIC(6,2),
    geom          GEOMETRY(Point, 4326) NOT NULL
);

CREATE INDEX idx_gauge_stations_geom
    ON hydro.gauge_stations USING GIST (geom);

-- Alert audit log
CREATE TABLE alerts.alert_log (
    id              BIGSERIAL PRIMARY KEY,
    gauge_id        TEXT REFERENCES hydro.gauge_stations(gauge_id),
    alert_level     TEXT NOT NULL,
    predicted_m     NUMERIC(6,2),
    forecast_hour   SMALLINT,
    channels        TEXT[],
    delivered_at    TIMESTAMPTZ,
    created_at      TIMESTAMPTZ DEFAULT NOW()
);

Docker Compose Services

The compose stack includes: PostGIS 16 with the TimescaleDB extension for time-series gauge data, GeoServer for WMS/WFS publishing, pg_tileserv for vector tiles, MinIO for object storage of raw and processed rasters, Airflow for scheduling the ingestion and prediction DAGs, and an Nginx reverse proxy with TLS termination.

# docker-compose.yml (abbreviated)
services:
  postgis:
    image: timescale/timescaledb-ha:pg16-ts2.14
    environment:
      POSTGRES_DB: flood_monitor
      POSTGRES_USER: flood_admin
    volumes:
      - pgdata:/var/lib/postgresql/data
    ports:
      - "5432:5432"

  geoserver:
    image: kartoza/geoserver:2.24.2
    environment:
      GEOSERVER_ADMIN_PASSWORD_FILE: /run/secrets/gs_pass
    volumes:
      - geoserver_data:/opt/geoserver/data_dir
    depends_on:
      - postgis

  pg-tileserv:
    image: pramsey/pg_tileserv:latest
    environment:
      DATABASE_URL: postgres://flood_admin@postgis:5432/flood_monitor
    depends_on:
      - postgis

  minio:
    image: minio/minio:latest
    command: server /data --console-address ":9001"
    volumes:
      - minio_data:/data

  airflow-webserver:
    image: apache/airflow:2.8.1-python3.11
    environment:
      AIRFLOW__CORE__EXECUTOR: LocalExecutor
      AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: >-
        postgresql+psycopg2://flood_admin@postgis:5432/flood_monitor
    volumes:
      - ./dags:/opt/airflow/dags
    ports:
      - "8080:8080"

volumes:
  pgdata:
  geoserver_data:
  minio_data:

Automated Refresh Schedules

Three Airflow DAGs drive the data lifecycle. The ingestion DAG runs every six hours, discovering and downloading new Sentinel-1 scenes. The processing DAG triggers on each new download, running the preprocessing chain and flood detection algorithm. The prediction DAG executes hourly, pulling the latest gauge readings, running inference through the deployed LSTM models, and feeding results into the alert evaluation engine.

Results

The platform has been operational since September 2025, covering two full monsoon cycles. Key performance metrics demonstrate meaningful improvements over the pre-existing manual monitoring workflow.

45,000 km² -- monitored area
47 stations -- gauge coverage
0.18 m RMSE -- 24-hour forecast accuracy
< 45 min -- scene-to-alert latency

Prior to the system, provincial committees relied on twice-daily manual gauge readings relayed by phone, providing a typical alert lead time of two to four hours. With LSTM-based 24-hour forecasts, effective lead time has increased to approximately 18 hours for advisory-level events and 12 hours for emergency events.

The satellite-derived flood extent mapping identified three inundation events in October 2025 that were not captured by the sparse gauge network, covering approximately 1,200 km² of agricultural land in Long An and Dong Thap provinces. Early detection enabled preemptive evacuation of livestock and mobilisation of sandbag reserves.

Response time improvement -- Average time from flood onset to first coordinated response action decreased from approximately 8 hours to under 3 hours during the 2025 monsoon season. This improvement is attributed to the combination of automated alerting, longer forecast lead times, and the visual situational awareness provided by the interactive map.

Future Work

Several enhancements are planned for the next development cycle to increase the system's spatial resolution, community engagement, and institutional integration.

Higher Resolution Imagery

Sentinel-1's 10-metre resolution is effective at the provincial scale but insufficient for neighbourhood-level mapping in urban areas like Can Tho and Da Nang. We are evaluating commercial SAR providers (ICEYE, Capella Space) that offer sub-metre resolution with tasking latencies under two hours. A hybrid pipeline would use Sentinel-1 for routine monitoring and task high-resolution collections during confirmed emergency events.

Crowd-Sourced Flood Reports

A lightweight mobile reporting interface will allow community members to submit geotagged flood observations -- water depth estimates, road passability, and evacuation status. These reports will be ingested as point features in PostGIS, cross-referenced with satellite-derived extents for validation, and displayed as an additional layer on the operational map. Natural language processing will extract structured fields from free-text Vietnamese descriptions.

National Disaster Management Integration

Vietnam's Central Committee for Flood and Storm Control (CCFSC) maintains a national-level dashboard. We are working with their technical team to expose our flood extent and prediction data through a standardised OGC API - Features endpoint, enabling seamless aggregation with monitoring systems from other provinces. The goal is a nationally consistent flood common operating picture that synthesises satellite, gauge, and crowd-sourced data streams.

Ensemble Forecasting

The current single-model LSTM architecture will be replaced with an ensemble of diverse architectures -- Transformer-based temporal fusion, gradient-boosted trees on hand-crafted features, and physics-informed neural networks constrained by Saint-Venant equations. Ensemble prediction intervals will provide calibrated uncertainty bounds alongside point forecasts, giving decision-makers a clearer picture of forecast confidence.

Timeline -- High-resolution imagery integration is targeted for Q2 2026. The crowd-sourced reporting module and CCFSC data exchange are planned for Q3 2026. Ensemble forecasting research is ongoing with expected operational deployment in Q4 2026.
