Health Detection

WGnext uses traffic-based health monitoring to detect when a tunnel connection has failed. This approach is more reliable than handshake-based monitoring and works regardless of persistent_keepalive settings.

The ConnectionHealthMonitor polls the WireGuard backend every healthCheckInterval seconds (default: 10s), reading tx_bytes and rx_bytes from the UAPI interface.
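
As a rough illustration of the read step, the sketch below parses a WireGuard UAPI `get=1` response, which is a sequence of `key=value` lines, and sums the counters across peers. The function name `read_traffic_counters` and the sample response are hypothetical. WGnext's own monitor is not shown here and may aggregate differently.

```python
def read_traffic_counters(uapi_response: str) -> tuple[int, int]:
    """Sum tx_bytes and rx_bytes over every peer in a UAPI get=1 response."""
    tx_total = 0
    rx_total = 0
    for line in uapi_response.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # blank terminator line or anything without "="
        if key == "tx_bytes":
            tx_total += int(value)
        elif key == "rx_bytes":
            rx_total += int(value)
    return tx_total, rx_total


# Made-up response with two peers (keys are placeholder hex strings).
sample_response = "\n".join([
    "public_key=" + "ab" * 32,
    "last_handshake_time_sec=1700000000",
    "tx_bytes=1500",
    "rx_bytes=900",
    "public_key=" + "cd" * 32,
    "tx_bytes=500",
    "rx_bytes=100",
    "errno=0",
    "",
])

tx, rx = read_traffic_counters(sample_response)  # (2000, 1000)
```

Subtracting the totals from the previous poll gives the deltas that each health check evaluates.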

Each health check evaluates the traffic deltas since the last check:

| tx_bytes delta | rx_bytes delta | Verdict | Meaning |
|---|---|---|---|
| 0 | 0 | Idle | No traffic at all; the tunnel is quiet, not broken |
| 0 | > 0 | Healthy | Receiving data (rare without sending, but healthy) |
| > 0 | > 0 | Healthy | Traffic flowing in both directions |
| > 0 | 0 | Suspect | Sending data but receiving nothing; start timer |
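
The verdict table can be expressed as a small pure function. This is a minimal sketch with hypothetical names (`Verdict`, `classify`), and it assumes the deltas are never negative (that is, the counters did not reset between checks).

```python
from enum import Enum


class Verdict(Enum):
    IDLE = "idle"
    HEALTHY = "healthy"
    SUSPECT = "suspect"


def classify(tx_delta: int, rx_delta: int) -> Verdict:
    """Map the traffic deltas since the last check to a verdict."""
    if rx_delta > 0:
        return Verdict.HEALTHY  # any received data shows the path works
    if tx_delta > 0:
        return Verdict.SUSPECT  # sent data, but nothing came back
    return Verdict.IDLE         # no traffic in either direction
```

Checking `rx_delta` first collapses both "Healthy" rows into one branch, since received data is sufficient evidence regardless of what was sent.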

When the “suspect” state (transmitting but not receiving) persists for trafficTimeout seconds (default: 30s), the connection is declared unhealthy and failover triggers.
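
A minimal sketch of that timer follows. The class name `SuspectTimer` is hypothetical, and the reset rule is an assumption: here any check that is not suspect (idle or healthy) clears the timer, which the text above does not spell out.

```python
from typing import Optional


class SuspectTimer:
    """Tracks how long the tunnel has been sending without receiving."""

    def __init__(self, traffic_timeout: float = 30.0) -> None:
        self.traffic_timeout = traffic_timeout
        self.suspect_since: Optional[float] = None

    def observe(self, tx_delta: int, rx_delta: int, now: float) -> bool:
        """Record one health check; return True once the tunnel is unhealthy."""
        if tx_delta > 0 and rx_delta == 0:
            if self.suspect_since is None:
                self.suspect_since = now  # first suspect check starts the timer
            return now - self.suspect_since >= self.traffic_timeout
        # Assumption: an idle or healthy check clears the suspect state.
        self.suspect_since = None
        return False
```

With the default 10-second check interval, a tunnel that starts failing at t=0 would be declared unhealthy by the check at t=30, and a single check with received data in between would restart the count.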

WGnext originally used handshake-based detection, monitoring last_handshake_time from the WireGuard UAPI. This was replaced because:

  • It required persistent_keepalive to be enabled on all tunnels
  • Users had to keep handshakeTimeout above twice the keepalive interval (handshakeTimeout > 2 * persistentKeepAlive)
  • Idle tunnels without keepalive showed stale handshakes even when the connection was fine
  • The timeout was confusingly coupled to the keepalive interval

Traffic-based detection doesn’t care about keepalive settings and only triggers when there’s actual evidence of a broken connection.

To prevent rapid cycling between tunnels (which can happen when all tunnels are experiencing intermittent issues), WGnext includes several safety mechanisms:

| Protection | Default | Purpose |
|---|---|---|
| Minimum hold time | 60 seconds | Won’t switch configs more frequently than once per minute |
| Max cycles before cooldown | 3 | After cycling through all configs 3 times… |
| Cooldown duration | 5 minutes | …enter a 5-minute cooldown before trying again |
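
One way these three protections could interact is sketched below. `FailoverGuard` and its methods are hypothetical names, and the sketch assumes that one "cycle" equals as many switches as there are configs. When the cycle counter resets outside of a cooldown (for example after a long healthy period) is not covered here.

```python
from typing import Optional


class FailoverGuard:
    """Decides whether a config switch is currently allowed."""

    def __init__(self, config_count: int, min_hold: float = 60.0,
                 max_cycles: int = 3, cooldown: float = 300.0) -> None:
        self.config_count = config_count
        self.min_hold = min_hold
        self.max_cycles = max_cycles
        self.cooldown = cooldown
        self.last_switch: Optional[float] = None
        self.switches = 0          # switches since the last cooldown
        self.cooldown_until = 0.0

    def may_switch(self, now: float) -> bool:
        if now < self.cooldown_until:
            return False           # still cooling down
        if self.last_switch is not None and now - self.last_switch < self.min_hold:
            return False           # minimum hold time not yet elapsed
        return True

    def record_switch(self, now: float) -> None:
        self.last_switch = now
        self.switches += 1
        # Assumption: a full cycle is one switch per configured tunnel.
        if self.switches >= self.config_count * self.max_cycles:
            self.switches = 0
            self.cooldown_until = now + self.cooldown
```

A caller would check `may_switch(now)` before every failover and call `record_switch(now)` after it. With two configs and the defaults, the sixth switch starts the 5-minute cooldown.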

When running on a fallback tunnel with autoFailback enabled, the monitor periodically probes the primary:

  1. Switch to primary config via adapter.update()
  2. Wait up to 15 seconds for a WireGuard handshake
  3. Check last_handshake_time — if recent, stay on primary
  4. Otherwise, revert to the fallback config

The probe interval is configurable via failbackProbeInterval (default: 300 seconds).
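
The four probe steps can be sketched as a single function. Everything here is illustrative: `probe_primary`, `switch_to`, and `last_handshake_time` are hypothetical stand-ins for the real adapter calls, and "recent" is interpreted as "a handshake completed at or after the moment the probe started", which is an assumption.

```python
import time
from typing import Callable, Optional


def probe_primary(
    switch_to: Callable[[str], None],
    last_handshake_time: Callable[[], Optional[float]],
    handshake_wait: float = 15.0,
    poll_interval: float = 1.0,
    clock: Callable[[], float] = time.time,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try the primary config; return True if we stay on it."""
    started = clock()
    switch_to("primary")                     # step 1: switch to primary
    deadline = started + handshake_wait
    while True:                              # step 2: wait for a handshake
        handshake = last_handshake_time()    # Unix time in seconds, or None
        if handshake is not None and handshake >= started:
            return True                      # step 3: recent, stay on primary
        if clock() >= deadline:
            break
        sleep(poll_interval)
    switch_to("fallback")                    # step 4: revert to fallback
    return False
```

Passing `clock` and `sleep` as parameters keeps the function testable without real waiting. The wall clock is used because handshake timestamps are reported as Unix time.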

Network path changes (detected via NWPathMonitor) trigger an immediate failback probe when on a fallback tunnel. A network change (e.g., switching Wi-Fi networks) is often a sign that the primary is reachable again.
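
The interplay between the periodic probe and a network-change trigger might look like the following sketch. `FailbackProbeScheduler` is a hypothetical name. In the real app the trigger would come from an NWPathMonitor callback, which is represented here by a plain method call.

```python
class FailbackProbeScheduler:
    """Decides when the next failback probe is due."""

    def __init__(self, probe_interval: float = 300.0, now: float = 0.0) -> None:
        self.probe_interval = probe_interval
        self.next_probe_at = now + probe_interval

    def on_network_change(self, now: float, on_fallback: bool) -> None:
        if on_fallback:
            self.next_probe_at = now  # make a probe due immediately

    def should_probe(self, now: float, on_fallback: bool) -> bool:
        if not on_fallback or now < self.next_probe_at:
            return False
        self.next_probe_at = now + self.probe_interval  # schedule the next one
        return True
```

A network change only pulls the next probe forward. The regular interval resumes from the time of that probe.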

  • NWPathMonitor detects network offline → triggers temporary shutdown of the WireGuard backend
  • When the network recovers, the backend restarts and health monitoring resumes
  • wgDisableSomeRoamingForBrokenMobileSemantics is called for iOS-specific roaming behavior
  • Default MTU: 1280
  • Network changes trigger socket bumping (wgBumpSockets) instead of full backend restart
  • Default MTU calculation uses tunnelOverheadBytes = 80
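
As a worked example of the overhead-based calculation, the sketch below subtracts the tunnel overhead from the physical interface MTU. The helper name `tunnel_mtu` is hypothetical, and treating the calculation as a plain subtraction is an assumption about how the setting is applied. The 80-byte figure matches WireGuard's worst-case per-packet overhead over IPv6.

```python
# 40 (IPv6 header) + 8 (UDP header) + 32 (WireGuard data header and auth tag)
TUNNEL_OVERHEAD_BYTES = 80


def tunnel_mtu(physical_mtu: int, overhead: int = TUNNEL_OVERHEAD_BYTES) -> int:
    """Largest packet that fits in the tunnel without fragmenting the outer packet."""
    return physical_mtu - overhead


ethernet_tunnel_mtu = tunnel_mtu(1500)  # 1420
```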