BLACKSHIELD

Public guide

Network Sensor Scaling and Performance

Capacity planning, performance tuning, and sizing guidance for high-volume network telemetry ingestion. Audience: Platform architects, operations engineers, security engineers. Typical setup time: 10 minutes.

reference

Use this if

You need capacity planning, performance tuning, or sizing guidance for high-volume network telemetry ingestion.

Audience
Platform architects, operations engineers, security engineers
Typical time
10 minutes

Before you begin

  • You have deployed at least one network sensor and are familiar with basic operations.
  • You understand your network's typical traffic volume and have monitoring in place.
  • You have access to CloudWatch (AWS), Cloud Monitoring (GCP), or Azure Monitor.

Guide walkthrough

Step 1

Determine traffic volume and sizing tier

Size the sensor infrastructure based on expected network traffic and alert volume.

  • Low volume (<10 Gbps, <1k alerts/min): t3.medium or g1-small, running a single sensor instance.
  • Medium volume (10–50 Gbps, 1k–10k alerts/min): m5.large or n1-standard-2, running a single sensor with more CPU and memory.
  • High volume (50–500 Gbps, 10k–100k alerts/min): c5.2xlarge or n1-standard-4, running multiple sensors in active-passive or active-active mode.
  • Very high volume (>500 Gbps): multiple sensors in active-active mode with load balancing and a dedicated backend.
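
The tier boundaries above can be expressed as a small lookup. The following is a minimal sketch, not part of the BlackShield product: the names Tier and sizing_tier are hypothetical, and it assumes that when traffic and alert volume fall into different tiers the larger tier wins, that a value exactly on a boundary belongs to the larger tier, and that more than 100k alerts/min is treated as very high volume (the guide gives no alert figure for that tier).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    max_gbps: float            # exclusive upper bound on traffic
    max_alerts_per_min: float  # exclusive upper bound on alert volume
    topology: str


# Bounds copied from the sizing list in this guide.
TIERS = [
    Tier("low", 10, 1_000, "single sensor instance"),
    Tier("medium", 50, 10_000, "single sensor with more CPU and memory"),
    Tier("high", 500, 100_000, "multi-sensor active-passive or active-active"),
]

# Assumption: anything beyond the "high" bounds lands here.
VERY_HIGH = Tier(
    "very high",
    float("inf"),
    float("inf"),
    "multi-sensor active-active with load balancing and dedicated backend",
)


def sizing_tier(gbps: float, alerts_per_min: float) -> Tier:
    """Return the smallest tier whose bounds cover both inputs."""
    for tier in TIERS:
        if gbps < tier.max_gbps and alerts_per_min < tier.max_alerts_per_min:
            return tier
    return VERY_HIGH
```

For example, sizing_tier(5, 5000) returns the medium tier: traffic alone would fit the low tier, but the alert volume does not.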

Step 2

Tune configuration for your workload

Adjust sensor parameters to match your priorities (real-time vs. accuracy vs. cost).

  • Real-time priority: set MIN_SEVERITY=medium, FLUSH_INTERVAL_SECONDS=10, PACKET_SAMPLING_RATE=1.0 (no sampling).
  • Cost-optimized: set MIN_SEVERITY=high, FLUSH_INTERVAL_SECONDS=300, PACKET_SAMPLING_RATE=0.1 (10% sampling).
  • High-volume ingestion: set MAX_EVENTS_PER_BATCH=5000, BATCH_TIMEOUT_SECONDS=30.
  • Reduced API load: set SCAN_INTERVAL_SECONDS=60 to ingest periodically rather than continuously.
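
One way to keep these combinations consistent across sensors is to store them as data and render the env file from them. This is a sketch under assumptions: PROFILES, HIGH_VOLUME, REDUCED_API_LOAD, and render_env are hypothetical names, and layering the high-volume and API-load settings on top of a base profile is an illustrative choice, not documented sensor behavior. Only the variable names and values come from the list above.

```python
# Base profiles: values taken from the tuning list in this guide.
PROFILES = {
    "real-time": {
        "MIN_SEVERITY": "medium",
        "FLUSH_INTERVAL_SECONDS": 10,
        "PACKET_SAMPLING_RATE": 1.0,
    },
    "cost-optimized": {
        "MIN_SEVERITY": "high",
        "FLUSH_INTERVAL_SECONDS": 300,
        "PACKET_SAMPLING_RATE": 0.1,
    },
}

# Optional overlays that can be layered on top of a base profile.
HIGH_VOLUME = {"MAX_EVENTS_PER_BATCH": 5000, "BATCH_TIMEOUT_SECONDS": 30}
REDUCED_API_LOAD = {"SCAN_INTERVAL_SECONDS": 60}


def render_env(profile: str, *overlays: dict) -> str:
    """Merge a base profile with overlays and render KEY=VALUE lines."""
    settings = dict(PROFILES[profile])
    for overlay in overlays:
        settings.update(overlay)  # later overlays override earlier values
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"


if __name__ == "__main__":
    print(render_env("cost-optimized", HIGH_VOLUME, REDUCED_API_LOAD), end="")
```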

Step 3

Sensor type comparison and selection

Choose between Suricata, Zeek, and eBPF based on use case and resource constraints.

  • Suricata: 15–20 Gbps per core, best for malware/IDS detection, highest memory (4–8 GB for 50 Gbps).
  • Zeek: 5–10 Gbps per core, best for protocol analysis and behavior profiling, moderate memory (2–4 GB).
  • eBPF: 50–100 Gbps per core, best for runtime events and system call monitoring, lowest memory (500 MB–1 GB).
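
The per-core figures above translate into a rough core count for a given traffic level. The sketch below is a back-of-the-envelope estimate only: cores_needed is a hypothetical helper, it plans against the lower end of each quoted range, and the 0.8 headroom factor is an assumption chosen to leave room under the 80% CPU alert threshold used later in this guide. Measure real throughput before committing to an instance size.

```python
import math

# (low, high) Gbps per core, as quoted in the comparison above.
GBPS_PER_CORE = {
    "suricata": (15, 20),
    "zeek": (5, 10),
    "ebpf": (50, 100),
}


def cores_needed(sensor_type: str, traffic_gbps: float, headroom: float = 0.8) -> int:
    """Estimate cores using the conservative (low) per-core figure.

    headroom is the fraction of each core planned for use (assumed 0.8).
    """
    low, _high = GBPS_PER_CORE[sensor_type]
    usable_per_core = low * headroom
    return max(1, math.ceil(traffic_gbps / usable_per_core))
```

With these assumptions, cores_needed("suricata", 50) gives 5, because each core is planned at 12 Gbps and 50 / 12 rounds up to 5.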

Step 4

Monitoring and alerting

Set up dashboards and alerts to track sensor health and performance.

  • Monitor: findings_ingested_total, capture_packets_dropped, cpu_usage, memory_usage, api_request_latency.
  • Alert on: cpu_usage > 80%, memory_usage > 85%, capture_packets_dropped > 1%, api_errors_5xx > 10/min.
  • Enable the CloudWatch (AWS), Cloud Monitoring (GCP), or Azure Monitor agent on the sensor VM.
  • Export metrics to your SIEM or observability platform for centralized alerting.
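
The alert thresholds above can be checked against a metrics sample with a few lines of code, which is useful for testing alert rules before wiring them into a monitoring service. This is a minimal sketch: THRESHOLDS and breached are hypothetical names, the sample is assumed to be a plain dictionary of already-collected values, and capture_packets_dropped is assumed to be reported as a percentage of captured packets.

```python
# Limits copied from the "Alert on" list in this guide.
THRESHOLDS = {
    "cpu_usage": 80.0,               # percent
    "memory_usage": 85.0,            # percent
    "capture_packets_dropped": 1.0,  # percent of captured packets (assumed unit)
    "api_errors_5xx": 10.0,          # errors per minute
}


def breached(sample: dict) -> list:
    """Return the sorted names of metrics that exceed their threshold.

    Metrics missing from the sample are treated as 0 and never alert.
    """
    return sorted(
        name
        for name, limit in THRESHOLDS.items()
        if sample.get(name, 0.0) > limit
    )


if __name__ == "__main__":
    sample = {"cpu_usage": 91.0, "memory_usage": 60.0, "capture_packets_dropped": 1.5}
    print(breached(sample))
```

A value equal to the threshold does not alert, which matches the strict ">" comparisons in the list.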

Run

sensor-environment-vars.env

bash
# Production real-time configuration
SENSOR_TYPE=suricata
MIN_SEVERITY=medium
SCAN_INTERVAL_SECONDS=30
PACKET_SAMPLING_RATE=1.0
FLUSH_INTERVAL_SECONDS=10
MAX_EVENTS_PER_BATCH=1000
BATCH_TIMEOUT_SECONDS=10
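
Before deploying a file like this, it can help to parse it and catch obviously invalid values. The following sketch assumes the file uses plain KEY=VALUE lines with # comments, as shown above. parse_env and check_settings are hypothetical helpers, and the validation rules (sampling rate between 0 and 1, positive integer intervals and batch sizes) are assumptions rather than documented sensor constraints.

```python
def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a KEY=VALUE line: {raw!r}")
        settings[key.strip()] = value.strip()
    return settings


# Keys from this guide that are assumed to require positive integers.
POSITIVE_INT_KEYS = (
    "SCAN_INTERVAL_SECONDS",
    "FLUSH_INTERVAL_SECONDS",
    "MAX_EVENTS_PER_BATCH",
    "BATCH_TIMEOUT_SECONDS",
)


def check_settings(settings: dict) -> list:
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    rate = float(settings.get("PACKET_SAMPLING_RATE", "1.0"))
    if not 0.0 < rate <= 1.0:
        problems.append(f"PACKET_SAMPLING_RATE={rate} is outside (0, 1]")
    for key in POSITIVE_INT_KEYS:
        if key in settings and int(settings[key]) <= 0:
            problems.append(f"{key}={settings[key]} must be positive")
    return problems
```

Running check_settings(parse_env(text)) on the file contents above returns an empty list under these assumed rules.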

What success looks like

  • Sensor CPU utilization stays below 80% during normal traffic patterns.
  • Memory usage is stable and does not exceed the allocated instance size.
  • Ingestion latency (from capture to platform) is less than 30 seconds.