pcap_or_it_didnt_happen.sh

Some time ago, a potential new customer approached us with a project request: firewall migration, over 1,000 rules across 50 interfaces. Connections to customers, suppliers, support partners—but no documentation, no comments, no IPAM. The admin who built it left years ago. Another vendor had quoted 8 months.

We did it differently: mirrored the firewall interfaces, captured traffic for a week, wrote a script that parses PCAPs and generates access-lists. The output still needs review—we insist on human eyes on firewall rules—but you get to the first 90% much faster. The migration was done in 4 weeks.

The scripts stuck around and grew into a toolkit. Capture traffic with tcpdump, upload to S3, split ERSPAN captures by host, generate ACL rules. Currently outputs Cisco ASA syntax, but the analysis is vendor-agnostic—Palo Alto or Fortinet would just need different templates.

The Toolkit

Built on Scapy for packet parsing. Uses Python's ProcessPoolExecutor for true multiprocessing, so analysis saturates every CPU core. GNU Parallel for batch processing across hosts. S3/MinIO integration for storage. We processed 10TB of PCAPs on that first project—and that was packet headers only.
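The fan-out pattern is simple; a minimal sketch, with hypothetical function names, of how the parallel analysis might be structured (extract_flows stands in for the actual Scapy parsing, which is not shown):

```python
from concurrent.futures import ProcessPoolExecutor

def extract_flows(pcap_path):
    """Stand-in for the real Scapy-based parser: return the set of
    (src, dst, proto, dport) tuples seen in one capture file.
    Hypothetical placeholder -- the toolkit parses real PCAPs here."""
    return {("10.1.1.10", "10.2.1.10", "tcp", 8080)}

def analyze(pcap_paths, workers=None):
    """Fan PCAP files out across CPU cores and merge the per-file
    flow sets; the set union is already a first deduplication pass."""
    flows = set()
    if workers == 0:
        # Serial fallback, useful for debugging.
        for path in pcap_paths:
            flows |= extract_flows(path)
        return flows
    with ProcessPoolExecutor(max_workers=workers) as pool:
        for file_flows in pool.map(extract_flows, pcap_paths):
            flows |= file_flows
    return flows
```

Because each PCAP file is independent, this parallelizes cleanly: one worker process per file, merge at the end.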

Five scripts:

  • pcap_or_it_didnt_happen.sh — Captures traffic with tcpdump, rotates files, uploads to S3. Runs on the host or SPAN/ERSPAN collector.
  • split_erspan_pcaps.py — Splits multi-host ERSPAN/SPAN captures into per-host PCAPs. Handles GRE decapsulation. Parallel processing.
  • generate_pcap_csv.py — Creates inventory CSV from PCAP files. Maps hostnames to IPs using ASA object definitions.
  • process_pcaps.sh — Batch processor. Groups split files by hostname, runs analysis in parallel with GNU Parallel.
  • pcap_to_acl.py — Core engine. Parses PCAPs, detects connection direction, generates deduplicated access-lists.
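The hostname-to-IP mapping that generate_pcap_csv.py relies on can be pulled from ASA "object network" definitions. A rough sketch of that parsing step (function name hypothetical; only host objects handled):

```python
import re

def parse_asa_objects(config_text):
    """Map ASA network-object names to host IPs.

    Expects blocks of the form:
        object network o-webserver01-v4
         host 10.1.1.10
    Subnet and range objects would need extra branches.
    """
    objects = {}
    current = None
    for line in config_text.splitlines():
        m = re.match(r"object network (\S+)", line.strip())
        if m:
            current = m.group(1)
            continue
        m = re.match(r"host (\S+)", line.strip())
        if m and current:
            objects[current] = m.group(1)
            current = None
    return objects
```

Inverting this map (IP to object name) is what lets the generated rules reference meaningful names instead of raw addresses.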

Workflow

Standard (Per-Host Captures)

pcap_or_it_didnt_happen.sh → generate_pcap_csv.py → process_pcaps.sh

The simplest setup: run the capture script on each host you want to analyze, generate an inventory file that maps hostnames to IPs, then batch process everything into ACLs.

ERSPAN/SPAN (Multi-Host Captures)

pcap_or_it_didnt_happen.sh → split_erspan_pcaps.py → generate_pcap_csv.py → process_pcaps.sh

If you're mirroring traffic from multiple hosts to a single collector, there's an extra step. The split script separates the combined capture into per-host PCAPs before processing. It handles both GRE-encapsulated ERSPAN and regular SPAN traffic automatically.
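Once the outer GRE/ERSPAN encapsulation is stripped (the real script does that with Scapy), splitting by host reduces to reading the inner IPv4 addresses. A simplified sketch, assuming decapsulation has already happened:

```python
from collections import defaultdict

def split_by_host(inner_packets):
    """Bucket decapsulated IPv4 packets by their inner addresses.

    `inner_packets` is an iterable of raw IPv4 packet bytes, i.e.
    what remains after the GRE/ERSPAN headers are removed.
    """
    buckets = defaultdict(list)
    for pkt in inner_packets:
        src = ".".join(str(b) for b in pkt[12:16])  # IPv4 src: bytes 12-15
        dst = ".".join(str(b) for b in pkt[16:20])  # IPv4 dst: bytes 16-19
        # Attribute the packet to both endpoints so each per-host
        # PCAP sees its inbound and outbound traffic.
        buckets[src].append(pkt)
        buckets[dst].append(pkt)
    return buckets
```

Each bucket is then written out as its own per-host PCAP for the downstream analysis.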

Example Output

! ACL rules for webserver01 (10.1.1.10) - DMZ zone
! Generated from PCAP analysis

access-list acl-DMZ-in extended permit tcp object o-webserver01-v4 object o-appserver-v4 eq 8080
access-list acl-DMZ-in extended permit tcp object o-webserver01-v4 object o-appserver-v4 eq 8443

! ACL rules for appserver01 (10.2.1.10) - APPS zone

access-list acl-APPS-in extended permit tcp object o-appserver01-v4 object o-database-v4 eq 5432
access-list acl-APPS-in extended permit tcp object o-appserver01-v4 object o-redis-v4 eq 6379

The output uses your hostname and FQDN inventory to build meaningful object names instead of raw IPs. Rules are deduplicated across all capture files for each host, so you don't end up with thousands of duplicate entries.
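Deduplication falls out naturally if flows are collected as a set of tuples before rendering. A simplified sketch of the rendering step (format mirrors the ASA example above; a real generator would emit `host <ip>` for addresses with no object definition):

```python
def render_acl(zone, flows, objects):
    """Render deduplicated (src, dst, proto, dport) flows as ASA
    access-list lines, swapping raw IPs for object names from the
    inventory where one exists."""
    def name(ip):
        # Fall back to the raw IP -- a real generator would use
        # 'host <ip>' syntax here instead of 'object <ip>'.
        return objects.get(ip, ip)
    return sorted(
        f"access-list acl-{zone}-in extended permit {proto} "
        f"object {name(src)} object {name(dst)} eq {dport}"
        for src, dst, proto, dport in set(flows)
    )
```

Passing the flows through set() is the whole dedup mechanism: a connection seen ten thousand times across a week of captures still yields exactly one rule.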

The Pain Points

That first customer wasn't unique. We've seen the same pattern repeatedly:

  • Hardware EOL announcement comes in. Need to migrate to new firewall.
  • Nobody knows what half the rules do. The guy who built it is gone.
  • Documentation exists but it's from 2017 and the ruleset has been modified 300 times since.
  • Some rules say "temporary fix" in the comment. They've been there for four years.
  • There's a "permit any any" somewhere that nobody dares to remove.

The traditional approach: interview every application owner, review old tickets (if they still exist), make educated guesses. Takes months. Still misses things.

How We Help

We mirror firewall interfaces and capture headers only—no payload data. Captures go to your on-site S3 or one of our sites. Processing can happen on-site or within our infrastructure. Duration depends on your environment—a few days for simple setups, a week or more if you have batch jobs or monthly processes that need to be captured.

The toolkit processes the PCAPs and generates rules. But that's just the first 90%. The remaining 10% is where it gets interesting.

We audit all generated rules with your application teams. Go through them line by line. "This server talks to that database on port 5432—is that expected?" Sometimes yes. Sometimes "wait, that server was decommissioned last year." Sometimes "that's our backup system talking to the old storage array we forgot to disconnect."

Suspicious findings get flagged. Unknown traffic patterns get investigated. By the end, you don't just have a working ruleset—you have a documented ruleset where every rule has a reason, and you've cleaned up connections that shouldn't exist anymore.

Typical engagement: 2–4 weeks from kickoff to completed ruleset. Implementation timeline depends on your change management process.

FAQ

What about encrypted traffic?

We analyze connection metadata, not payload. Encrypted traffic still shows source, destination, and port. That's what firewall rules need.

How much data?

Depends on traffic volume. That first project was 10TB. We can filter to reduce size or capture headers only if storage is a concern.

What if traffic patterns change after migration?

New applications need new rules. The generated ruleset covers what existed during capture. Log denied traffic post-migration to catch anything missed.
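On ASA, for example, an explicit deny with logging at the end of each ACL makes those gaps visible after cutover (ACL name follows the example output above; adapt to your own):

```
access-list acl-DMZ-in extended deny ip any any log
```

Anything that hits this line is traffic the capture window missed—review the logs, add a rule if it's legitimate.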

Other firewalls besides Cisco ASA?

Analysis is vendor-agnostic. Currently outputs ASA syntax, but Palo Alto, Fortinet, Check Point are just different output templates. Same PCAP parsing.

Stuck with Undocumented Firewall Rules?

We help you get your ruleset on solid ground!