Virtualization

Virtio Multiqueue Issues on Proxmox 9 with OPNsense

We were seeing large file downloads from our S3 backends die when passing through OPNsense on Proxmox 9. Downloads started fine at 4+ Gbps, then after around 20 seconds throughput dropped to zero and never recovered. Rate-limiting to 50 Mbps made downloads work. Hosts bypassing the firewall on the same switches, same path—no issues. We could reproduce the problem reliably using iperf3. The fix was removing queues=4 from the Proxmox VM config.

Environment

  • OPNsense 25.7.11_9 (FreeBSD 14.3-RELEASE-p7)
  • Proxmox VE 9.1.1, KVM, virtio NICs with queues=4
  • CARP/pfsync HA pair (fw211/fw212)
  • vtnet interfaces, all hardware offloading disabled

Symptoms

  • Large file downloads from S3 backends through the firewall die after around 20 seconds
  • Throughput drops to zero and never recovers
  • Rate-limiting to 50 Mbps makes downloads work
  • Hosts bypassing the firewall (same switches, same path) work fine
  • Reproducible with iperf3 in reverse mode

Eliminated Causes

We went through most of the commonly recommended OPNsense and FreeBSD troubleshooting steps. Suricata wasn't running. Hardware checksum offload, TSO, and LRO were already disabled (correct for virtio). NIC ring buffers showed zero iqdrops, mbufs had zero allocation failures, netisr had zero QDrops. pf state table was unlimited. MTU was standard 1500 throughout. We tried disabling scrub fragment reassemble globally and enabling sloppy state—no change. Same Proxmox host worked fine for hosts not passing through the firewall.

Key pf Counters

During failing downloads, pfctl -si shows:

$ sh -c 'while true; do pfctl -si | grep -E "short|state-mismatch"; sleep 2; done'

short (PFRES_SHORT) increments at ~6–10/s throughout the session. state-mismatch jumps by ~1,000+ when the session dies.

PFRES_SHORT means packets dropped before rule/state/scrub evaluation. This is why disabling scrub and sloppy state had no effect.
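The ~6–10/s figure comes from differencing successive counter totals. A minimal sketch of that calculation, using illustrative sample values (on a live firewall you would substitute the real pfctl output, extracted with something like `pfctl -si | awk '$1 == "short" { print $2 }'`):

```shell
# Illustrative "short" counter totals, sampled 2 s apart (not real data).
samples="1200
1213
1229
1247"
prev=""
for val in $samples; do
  if [ -n "$prev" ]; then
    # delta divided by the 2 s sampling interval gives the drop rate
    echo "short +$(( (val - prev) / 2 ))/s"
  fi
  prev=$val
done
```

With these sample values it prints rates in the 6–9/s range, the same order of magnitude we saw during failing sessions.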

Packet Captures

We captured on both sides of the firewall during a failing S3 download. The client-side capture tells the story. tshark IO stats show the throughput per 10-second interval (server→client only):

# tshark -r client.pcap -q -z io,stat,10,"tcp.port==36658 && ipv6.src==2001:db8:1::35"

| Interval   | Frames | Bytes      | Notes                |
| 0 <> 10    | 178320 | 2034063025 | ~2 GB per interval   |
| 10 <> 20   | 185708 | 2063566546 |                      |
| 20 <> 30   | 182508 | 2046835060 |                      |
| ...        |        |            |                      |
| 130 <> 140 | 115021 | 1205891768 | ~1.2 GB per interval |
| 140 <> 150 | 31459  | 348790276  | stalling...          |
| 150 <> 160 | 0      | 0          | dead                 |
| 160 <> 170 | 0      | 0          |                      |

...all zeros until RST at t=296s.

The last packets of the flow show exactly what happens. Server sends data, then silence. Client sends keepalive ACKs every ~15 seconds with an unchanged ACK number—no new data arriving. After 10 keepalives with no response, the client RSTs:

# Last data from server, then stall:
143.100  2001:db8:1::35 → client  len=4284   [ACK]                  ← last server data
143.100  2001:db8:1::35 → client  len=25704  [ACK]
143.100  client → 2001:db8:1::35  len=0  [ACK]      ack=2999360546
143.103  client → 2001:db8:1::35  len=0  [ACK]      ack=2999360546  ← last client ACK

# 15 second gap — no data arrives from server
158.196  client → 2001:db8:1::35  len=0  [ACK]      ack=2999360546  ← keepalive
173.556  client → 2001:db8:1::35  len=0  [ACK]      ack=2999360546  ← keepalive
188.916  client → 2001:db8:1::35  len=0  [ACK]      ack=2999360546  ← keepalive
204.276  client → 2001:db8:1::35  len=0  [ACK]      ack=2999360546
219.636  client → 2001:db8:1::35  len=0  [ACK]      ack=2999360546
234.996  client → 2001:db8:1::35  len=0  [ACK]      ack=2999360546
250.356  client → 2001:db8:1::35  len=0  [ACK]      ack=2999360546
265.716  client → 2001:db8:1::35  len=0  [ACK]      ack=2999360546
281.076  client → 2001:db8:1::35  len=0  [ACK]      ack=2999360546
296.436  client → 2001:db8:1::35  len=0  [ACK,RST]  ack=2999360546  ← gives up
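The probe spacing is internally consistent: averaging over the whole stall, from the last server data to the RST, gives about 15.3 s per probe, matching the ~15 s gaps between the keepalive ACKs. A quick arithmetic check on the capture timestamps:

```shell
# Timestamps taken from the capture above, in milliseconds.
last_data=143100   # last server data seen by the client
rst=296436         # client gives up and sends RST
probes=10          # unanswered keepalive probes in between
echo "probe interval: $(( (rst - last_data) / probes )) ms"
```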

On the WAN side of the firewall, we confirmed the server's retransmissions do arrive—pf receives them but drops them before forwarding to the client.

Disabling pfsync Helped

$ ifconfig pfsync0 down
# Downloads work immediately

$ ifconfig pfsync0 up
# Downloads break again

Repeatable every time. Not the root cause, but clearly involved.

Partial Mitigation: maxupd

# Default maxupd=128, max is 255 (uint8_t)
$ ifconfig pfsync0 maxupd 255

Helps at lower throughput. Not enough at 5+ Gbps.
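This matches the semantics of maxupd: pfsync sends a state-update message after at most maxupd updates to a state, so raising it from 128 to 255 roughly halves the update rate for a busy flow but does nothing about per-packet locking. A back-of-envelope sketch, assuming ~1500-byte packets and one dominant flow (both assumptions, not measurements):

```shell
# Rough pfsync update rate for a single flow at 5 Gbit/s.
# Assumes ~1500-byte packets; an update is emitted every maxupd packets.
pps=$(( 5 * 1000 * 1000 * 1000 / (1500 * 8) ))
echo "packets/s:             $pps"
echo "updates/s, maxupd=128: $(( pps / 128 ))"
echo "updates/s, maxupd=255: $(( pps / 255 ))"
```

Halving a few thousand updates per second buys some headroom at moderate rates, which fits what we observed: better at low throughput, still broken at 5+ Gbps.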

Data Direction Matters

Outbound through firewall—sustained for 120 seconds:

$ iperf3 -6 -c iperf.example.com -t 120
[ 5]   0.00-1.00   sec   561 MBytes  4.71 Gbits/sec
[ 5]   1.00-2.00   sec   494 MBytes  4.14 Gbits/sec
...sustained for full 120 seconds...

Inbound through firewall—dies after around 20 seconds:

$ iperf3 -6 -c iperf.example.com -R -t 120
[ 5]   0.00-1.00   sec   561 MBytes  4.71 Gbits/sec
...
[ 5]  14.00-15.00  sec   520 MBytes  4.36 Gbits/sec
[ 5]  15.00-25.00  sec   959 KBytes   785 Kbits/sec
[ 5]  25.00-26.00  sec  0.00 Bytes    0.00 bits/sec

Same IPv6 endpoints, same rule, same path. Only data direction changed.

The Fix

Remove queues=N from all virtio NIC configs on the OPNsense VM. Apply to all interfaces, not just pfsync.

Before (broken):

net0: virtio=XX:XX:XX:XX:XX:XX,bridge=vmbr0,queues=4

After (working):

net0: virtio=XX:XX:XX:XX:XX:XX,bridge=vmbr0

Single-queue virtio still achieves 4.7 Gbps through the firewall. pfsync remains fully functional. No additional tunables required.
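The change itself is a one-line edit per NIC in the VM config (under /etc/pve/qemu-server/). A sketch of the substitution, run here against a sample line rather than the live config file:

```shell
# Strip the queues= option from a Proxmox netN config line.
# On a real host the line lives in /etc/pve/qemu-server/<vmid>.conf.
line='net0: virtio=XX:XX:XX:XX:XX:XX,bridge=vmbr0,queues=4'
echo "$line" | sed 's/,queues=[0-9]*//'
```

The same result can be reached with `qm set <vmid> --net0 'virtio=...,bridge=vmbr0'`. Note that a full VM stop/start is typically required before the guest sees the new queue count.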

pfsync Configuration Reference

$ ifconfig pfsync0
pfsync0: flags=1000041<UP,RUNNING,LOWER_UP> metric 0 mtu 1500
        syncdev: vtnet8
        syncpeer: 198.51.100.2
        maxupd: 128
        defer: off
        version: 1400

$ sysctl net.pfsync
net.pfsync.defer_delay: 20
net.pfsync.pfsync_buckets: 8
net.pfsync.carp_demotion_factor: 240

Why This Happens

Multiqueue virtio is the right default for most VMs. But on a FreeBSD firewall doing pfsync, packets from the same TCP flow can land on different queues, and pfsync's per-state locking can't keep up. Single-queue avoids the problem entirely without a measurable performance hit: we're still pushing 4.7 Gbps through these boxes.

Running Virtualized Firewalls?

We run managed infrastructure on Proxmox across our own datacenters: OPNsense, CARP HA, and network automation on AS215197.