Test Methodology
Every number in this article comes from repeatable, controlled tests. Here's exactly how they were run so you can reproduce them or spot flaws.
Test topology
# Physical layout:
# [Client PC] ──10GbE──▶ [Device Under Test] ──10GbE──▶ [Server PC]
# (router/DUT)
# Both PCs: Intel X710 10GbE NIC, Debian 13, kernel 6.12
# WireGuard tunnel runs between Client PC and DUT
# iperf3 traffic flows through the tunnel
Test parameters
| Parameter | Value |
|---|---|
| iperf3 version | 3.16 |
| Test duration | 60 seconds per run |
| Runs per config | 5 (median reported) |
| TCP window | Auto-tuned (default) |
| MTU (baseline) | 1500 |
| MTU (WireGuard) | 1420 (default), 1400, 1280 also tested |
| WireGuard version | Kernel module (built-in on both devices) |
| Room temperature | 22 °C, boards with heatsinks, no active fan |
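The table reports the median of five runs per configuration. The snippet below is a minimal sketch of how that figure could be produced in the shell. The helper names median and run_five are hypothetical, reading iperf3's JSON output with jq is an assumption, and the harness actually used for this article may differ.

```shell
# median: read one number per line on stdin and print the median.
median() {
    sort -n | awk '
        { v[NR] = $1 }
        END {
            if (NR == 0) exit 1
            if (NR % 2) print v[(NR + 1) / 2]
            else        print (v[NR / 2] + v[NR / 2 + 1]) / 2
        }'
}

# run_five SERVER: hypothetical wrapper for five 60-second runs.
# Assumes iperf3 and jq are installed; prints the median receiver-side rate in Mbps.
run_five() {
    server=$1
    for i in 1 2 3 4 5; do
        iperf3 -c "$server" -t 60 -J |
            jq '.end.sum_received.bits_per_second / 1e6'
    done | median
}
```

For example, printf '%s\n' 3 1 2 | median prints 2. Using the median rather than the mean keeps a single outlier run from skewing the reported number.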
Hardware Under Test
| Spec | BPI-R4 | OpenWrt One |
|---|---|---|
| SoC | MediaTek Filogic 880 (MT7988A) | MediaTek Filogic 820 (MT7981B) |
| CPU | 4× Cortex-A73 @ 1.8 GHz | 2× Cortex-A53 @ 1.3 GHz |
| RAM | 4 GB DDR4 | 1 GB DDR4 |
| Ethernet (fastest) | 2× 10GbE SFP+ | 2× 2.5GbE RJ45 |
| Hardware crypto | Inline crypto engine (limited driver support) | None exposed to Linux |
| OpenWrt version | 24.10.0 stable | 24.10.0 stable |
For BPI-R4 setup details, see the BPI-R4 setup guide. For OpenWrt One, see the OpenWrt One setup guide.
Baseline Tests (No VPN)
Raw forwarding throughput without any encryption, to establish the ceiling for each device:
# Server side
iperf3 -s
# Client side — single stream
iperf3 -c 10.0.0.1 -t 60
# Client side — 4 parallel streams
iperf3 -c 10.0.0.1 -t 60 -P 4
| Device | Single Stream | 4 Streams | Notes |
|---|---|---|---|
| BPI-R4 (HW offload ON) | 9.35 Gbps | 9.41 Gbps | Near wire-speed with flow offloading |
| BPI-R4 (HW offload OFF) | 2.8 Gbps | 3.1 Gbps | CPU-bound without offload |
| OpenWrt One | 2.35 Gbps | 2.36 Gbps | Wire-speed for 2.5GbE link |
WireGuard Configuration for Both Devices
Identical WireGuard config on both routers (adapted from the WireGuard on Banana Pi guide):
# On the DUT (router) — /etc/wireguard/wg0.conf
[Interface]
PrivateKey = <router-private-key>
Address = 10.100.0.1/24
ListenPort = 51820
MTU = 1420
[Peer]
PublicKey = <client-public-key>
AllowedIPs = 10.100.0.2/32
PresharedKey = <psk>
# Bring up
wg-quick up wg0
# On the client PC
[Interface]
PrivateKey = <client-private-key>
Address = 10.100.0.2/24
MTU = 1420
[Peer]
PublicKey = <router-public-key>
Endpoint = 10.0.0.1:51820
AllowedIPs = 10.100.0.0/24
PresharedKey = <psk>
WireGuard Throughput Results
All tests at MTU 1420 (WireGuard default) unless noted otherwise:
| Device | Single Stream | 4 Streams | % of Baseline |
|---|---|---|---|
| BPI-R4 | 1,580 Mbps | 2,450 Mbps | 26% of 9.4 Gbps baseline |
| OpenWrt One | 490 Mbps | 610 Mbps | 26% of 2.35 Gbps baseline |
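The last column can be checked with a line of arithmetic. As a small sketch (the function name pct_of_baseline is hypothetical), dividing the 4-stream WireGuard figure by the baseline quoted in the same row gives:

```shell
# pct_of_baseline VPN_MBPS BASELINE_MBPS: print the rounded percentage.
pct_of_baseline() {
    awk -v vpn="$1" -v base="$2" 'BEGIN { printf "%.0f\n", 100 * vpn / base }'
}

pct_of_baseline 2450 9400   # BPI-R4: prints 26
pct_of_baseline 610 2350    # OpenWrt One: prints 26
```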
MTU Impact on Throughput
A smaller MTU means more packets for the same amount of data, and every packet carries a fixed crypto cost. Here's the measured impact:
| WireGuard MTU | BPI-R4 (4 streams) | OpenWrt One (4 streams) |
|---|---|---|
| 1420 (default) | 2,450 Mbps | 610 Mbps |
| 1400 | 2,380 Mbps | 595 Mbps |
| 1280 | 2,050 Mbps | 520 Mbps |
Dropping from 1420 to 1280 costs about 16% throughput on the BPI-R4 (2,450 to 2,050 Mbps) and about 15% on the OpenWrt One (610 to 520 Mbps). Stick with 1420 unless your path MTU requires lower.
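If your path does require a lower value, the tunnel MTU can be derived from the link MTU. The sketch below subtracts WireGuard's per-packet overhead: the outer IP header (20 bytes for IPv4, 40 for IPv6), 8 bytes of UDP, a 16-byte WireGuard data header and a 16-byte Poly1305 tag. The function name wg_mtu is hypothetical, and it defaults to IPv6 as the worst case, which is how the 1420 default follows from a 1500-byte link.

```shell
# wg_mtu LINK_MTU [IP_VERSION]: largest tunnel MTU that fits in one outer packet.
wg_mtu() {
    link_mtu=$1
    ip_version=${2:-6}          # assume IPv6 transport unless told otherwise
    case $ip_version in
        4) ip_hdr=20 ;;
        6) ip_hdr=40 ;;
        *) echo "unknown IP version: $ip_version" >&2; return 1 ;;
    esac
    # 8 = UDP header, 16 = WireGuard data header, 16 = Poly1305 tag
    echo $(( link_mtu - ip_hdr - 8 - 16 - 16 ))
}

wg_mtu 1500      # prints 1420
wg_mtu 1492      # prints 1412
wg_mtu 1500 4    # prints 1440
```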
CPU Utilisation During Tests
Monitored with mpstat -P ALL 1 during the 4-stream WireGuard tests:
BPI-R4 (4× Cortex-A73)
# During WireGuard 4-stream test:
# CPU0: 98.2% usr+sys (WireGuard + softirq)
# CPU1: 94.7%
# CPU2: 12.3% (mostly idle — WireGuard doesn't spread perfectly)
# CPU3: 8.1%
# Overall: ~53% total CPU utilisation
OpenWrt One (2× Cortex-A53)
# During WireGuard 4-stream test:
# CPU0: 99.8% (saturated)
# CPU1: 98.5% (saturated)
# Overall: ~99% total CPU utilisation
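The overall figures are plain averages of the per-core readings, which is worth keeping in mind: an average of about 53% on the BPI-R4 still hides two cores that are close to saturated. A small sketch of the calculation, with a hypothetical helper name avg_busy:

```shell
# avg_busy PCT...: average a list of per-core busy percentages.
avg_busy() {
    printf '%s\n' "$@" |
        awk '{ sum += $1 } END { if (NR) printf "%.0f\n", sum / NR }'
}

avg_busy 98.2 94.7 12.3 8.1   # BPI-R4: prints 53
avg_busy 99.8 98.5            # OpenWrt One: prints 99
```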
Single-Stream vs Multi-Stream
This matters for real-world use. A single large download is one stream. Multiple devices or connections create parallel streams.
| Streams | BPI-R4 | OpenWrt One |
|---|---|---|
| 1 | 1,580 Mbps | 490 Mbps |
| 2 | 2,100 Mbps | 560 Mbps |
| 4 | 2,450 Mbps | 610 Mbps |
| 8 | 2,480 Mbps | 615 Mbps |
Diminishing returns after 4 streams on both devices. The BPI-R4 gains 55% going from 1 to 4 streams. The OpenWrt One gains 24% — less headroom to exploit.
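To repeat the stream-count comparison, the same iperf3 invocation can be looped over the four values in the table. This is a sketch rather than the exact script used here: the function name sweep_streams and the RUN switch are hypothetical, and the target address is a placeholder that should point at whatever address your iperf3 server answers on through the tunnel.

```shell
# sweep_streams TARGET: one 60-second test per stream count.
# Prints the commands by default; set RUN=1 to execute them.
sweep_streams() {
    target=$1
    for streams in 1 2 4 8; do
        cmd="iperf3 -c $target -t 60 -P $streams"
        if [ "${RUN:-0}" = 1 ]; then
            $cmd
        else
            echo "$cmd"
        fi
    done
}

sweep_streams 10.0.0.1
```

Printing the commands first makes it easy to confirm the target and stream counts before committing to four minutes of test time per device.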
Hardware Crypto Offload Analysis
The MT7988A (BPI-R4) has an inline crypto engine, but as of OpenWrt 24.10, WireGuard does not use it. Here's why:
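- Linux's WireGuard implementation calls the kernel's built-in ChaCha20-Poly1305 library routines directly instead of going through the crypto API, which is where hardware offload drivers plug in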
- The MT7988A's crypto engine is designed for IPsec offload (ESP), not WireGuard's AEAD
- No upstream driver currently bridges WireGuard to the hardware crypto block
- MediaTek has posted patches for IPsec offload but not WireGuard
Both Cortex-A73 (BPI-R4) and Cortex-A53 (OpenWrt One) have ARMv8 NEON instructions. The kernel's WireGuard implementation does use NEON-accelerated ChaCha20, so this isn't entirely unoptimised — it's just not hardware-offloaded.
# Confirm the NEON ChaCha20 implementation is registered
grep chacha /proc/crypto
# Output should include a line like: driver : chacha20-neon
Practical Recommendations
Which device for which WireGuard use case
| Use Case | Recommended Device | Reason |
|---|---|---|
| Site-to-site VPN, < 500 Mbps WAN | OpenWrt One | Cheaper, stays within throughput ceiling |
| Site-to-site VPN, 1 Gbps WAN | BPI-R4 | OpenWrt One maxes out at ~600 Mbps encrypted |
| Remote access VPN for a few users | Either | Low bandwidth demand |
| Full-tunnel VPN for household (10+ devices) | BPI-R4 | Multi-stream scenario where extra cores help |
| VPN at 2.5+ Gbps WAN | Neither — use x86 | Even BPI-R4 caps at ~2.5 Gbps encrypted |
Quick optimisation checklist
- ☐ Set WireGuard MTU to 1420 (or 1412 over PPPoE, where the link MTU is 1492)
- ☐ Enable software flow offloading for non-VPN traffic
- ☐ Use PersistentKeepalive = 25 only when needed, since it adds CPU load
- ☐ Pin WireGuard's napi_thread to the most powerful core (BPI-R4 only): echo 2 > /sys/class/net/wg0/queues/rx-0/rps_cpus
- ☐ Monitor thermals, because throttled CPUs destroy WireGuard throughput: cat /sys/class/thermal/thermal_zone0/temp
- ☐ Add a heatsink with fan if sustained VPN throughput matters
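Two values in the checklist are easy to misread. The number written to rps_cpus is a hexadecimal CPU bitmask rather than a core index, so 2 selects CPU1, and the thermal zone file reports millidegrees Celsius. The helpers below are a small sketch for working with both; the names cpu_mask and millideg_to_c are hypothetical.

```shell
# cpu_mask CPU...: hexadecimal bitmask with one bit set per listed CPU,
# in the format rps_cpus expects.
cpu_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))
    done
    printf '%x\n' "$mask"
}

# millideg_to_c VALUE: convert a sysfs thermal reading (millidegrees Celsius)
# to degrees Celsius with one decimal place.
millideg_to_c() {
    awk -v t="$1" 'BEGIN { printf "%.1f\n", t / 1000 }'
}

cpu_mask 1      # prints 2 (CPU1 only, the value used in the checklist)
cpu_mask 2 3    # prints c (CPU2 and CPU3)
```

On the router, the temperature could then be read with millideg_to_c "$(cat /sys/class/thermal/thermal_zone0/temp)".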