iptables and Apache Hudi Integration

Powerful performance with an easy integration, powered by Telegraf, the open source data connector built by InfluxData.

Note: This is not the recommended configuration for real-time query at scale. For query and compression optimization, high-speed ingest, and high availability, you may want to consider iptables and InfluxDB.

5B+ Telegraf downloads

#1 Time series database (Source: DB Engines)

1B+ Downloads of InfluxDB

2,800+ Contributors

Powerful Performance, Limitless Scale

Collect, organize, and act on massive volumes of high-velocity data. Any data is more valuable when you think of it as time series data with InfluxDB, the #1 time series platform built to scale with Telegraf.

See Ways to Get Started

Input and output integration overview

The iptables plugin for Telegraf collects metrics on packet and byte counts for specified iptables rules, providing insights into firewall activity and performance.

The Parquet output plugin writes metrics to Parquet files, preparing them for ingestion into Apache Hudi’s lakehouse architecture.

Integration details

iptables

The iptables plugin gathers packet and byte counters for rules in a given table and set of chains from the Linux iptables firewall. The plugin only monitors rules that carry a comment; rules without comments are ignored. This gives each monitored rule a stable identifier, which matters because rule numbers can change dynamically as rules are added or removed. To use the plugin effectively, name each rule with a unique comment. The plugin also needs elevated permissions (CAP_NET_ADMIN and CAP_NET_RAW), which can be granted by running Telegraf as root (discouraged), by using systemd capabilities, or by configuring sudo appropriately. Finally, running multiple instances of the plugin can lead to conflicts, so enabling the locking option in the configuration is recommended to avoid errors from concurrent access.
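
As a minimal sketch of that setup, the snippet below tags a rule with a comment the plugin can pick up and grants the telegraf user passwordless access to the list command via sudoers; the chain, port, comment name, and iptables path are illustrative and should be adapted to your system.

# Tag a rule with a unique comment so the plugin can identify it
# (chain, port, and comment name are examples)
iptables -A INPUT -p tcp --dport 22 -m comment --comment "ssh_allow" -j ACCEPT

# Sudoers entry (add with visudo) letting the telegraf user run the list
# command without a password; adjust the iptables path for your distribution
Cmnd_Alias IPTABLESSHOW = /usr/sbin/iptables -nvL *
telegraf ALL=(root) NOPASSWD: IPTABLESSHOW
Defaults!IPTABLESSHOW !logfile, !syslog, !pam_session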

Apache Hudi

This configuration leverages Telegraf’s Parquet plugin to serialize metrics into columnar Parquet files suitable for downstream ingestion by Apache Hudi. The plugin writes metrics grouped by metric name into files in a specified directory, buffering writes for efficiency and optionally rotating files on timers. It considers schema compatibility—metrics with incompatible schemas are dropped—ensuring consistency. Apache Hudi can then consume these Parquet files via tools like DeltaStreamer or Spark jobs, enabling transactional ingestion, time-travel queries, and upserts on your time series data.
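
As a rough sketch of the downstream step, a Hudi DeltaStreamer (spark-submit) job can be pointed at the directory Telegraf writes to. The bucket, table name, and ordering field below are assumptions for illustration; a real job would also set Hudi write options such as the record key and partition path, and the utilities bundle version depends on your Spark and Hudi versions.

spark-submit \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  hudi-utilities-bundle.jar \
  --table-type COPY_ON_WRITE \
  --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
  --source-ordering-field timestamp \
  --target-base-path s3://example-bucket/hudi/telegraf_metrics \
  --target-table telegraf_metrics \
  --hoodie-conf hoodie.deltastreamer.source.dfs.root=/var/lib/telegraf/hudi_metrics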

Configuration

iptables

[[inputs.iptables]]
  ## iptables requires root access on most systems.
  ## Setting 'use_sudo' to true will make use of sudo to run iptables.
  ## Users must configure sudo to allow telegraf user to run iptables with
  ## no password.
  ## iptables can be restricted to only list command "iptables -nvL".
  use_sudo = false
  ## Setting 'use_lock' to true runs iptables with the "-w" option.
  ## Adjust your sudo settings appropriately if using this option
  ## ("iptables -w 5 -nvl")
  use_lock = false
  ## Define an alternate executable, such as "ip6tables". Default is "iptables".
  # binary = "ip6tables"
  ## defines the table to monitor:
  table = "filter"
  ## defines the chains to monitor.
  ## NOTE: iptables rules without a comment will not be monitored.
  ## Read the plugin documentation for more information.
  chains = [ "INPUT" ]
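
Assuming the filter table’s INPUT chain contains a rule commented "ssh_allow", the plugin emits one series per commented rule, roughly like the line-protocol sample below (the host, comment name, and values are illustrative):

iptables,host=gateway01,table=filter,chain=INPUT,ruleid=ssh_allow pkts=1024i,bytes=86400i 1714000000000000000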

Apache Hudi

[[outputs.parquet]]
  ## Directory to write parquet files in. If a file already exists the output
  ## will attempt to continue using the existing file.
  directory = "/var/lib/telegraf/hudi_metrics"

  ## File rotation interval (default is no rotation)
  # rotation_interval = "1h"

  ## Buffer size before writing (default is 1000 metrics)
  # buffer_size = 1000

  ## Optional: compression codec (snappy, gzip, etc.)
  # compression_codec = "snappy"

  ## When grouping metrics, each metric name goes to its own file
  ## If a metric’s schema doesn’t match the existing schema, it will be dropped
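
Before pointing Hudi at the output directory, it can be useful to sanity-check what Telegraf wrote; the sketch below assumes a Parquet CLI such as parquet-tools is available, and the file name shown is illustrative (the plugin groups files by metric name).

ls /var/lib/telegraf/hudi_metrics
# e.g. iptables-1714000000.parquet (one file per metric name; name format is illustrative)

parquet-tools schema /var/lib/telegraf/hudi_metrics/iptables-1714000000.parquet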

Input and output integration examples

iptables

  1. Monitoring Firewall Performance: Monitor the performance and efficiency of your firewall rules in real time. By tracking packet and byte counters, network administrators can identify which rules are most active and may require optimization. This enables proactive management of firewall configurations to enhance security and performance, especially in environments where dynamic adjustments are frequently made.

  2. Understanding Traffic Patterns: Analyze incoming and outgoing traffic patterns based on specific rules. By leveraging the metrics gathered by this plugin, system admins can gain insights into which services are receiving the most traffic, effectively identifying popular services and potential security threats from unusual traffic spikes.

  3. Automated Alerting on Traffic Anomalies: Integrate the iptables plugin with an alerting system to notify administrators of unusual activity detected by the firewall. By setting thresholds on the collected metrics, such as sudden increases in packets dropped or unexpected protocol use, teams can automate responses to potential security incidents, enabling swift remediation of threats to the network.

  4. Comparative Analysis of Firewall Rules: Conduct comparative analyses of different firewall rules over time. By collecting historical packet and byte metrics, organizations can evaluate the effectiveness of various rules, making data-driven decisions on which rules to modify, reinforce, or remove altogether, thus streamlining their firewall configurations.

Apache Hudi

  1. Transactional Lakehouse Metrics: Buffer and write web service metrics as Parquet files for DeltaStreamer to ingest into Hudi, enabling upserts, ACID compliance, and time-travel queries on historical performance data.

  2. Edge Device Batch Analytics: Telegraf running on IoT gateways writes metrics to Parquet locally, where periodic Spark jobs ingest them into Hudi for long-term analytics and traceability.

  3. Schema-Enforced Abnormal Metric Handling: Use the Parquet plugin’s strict schema handling, which drops metrics with incompatible schemas, to keep malformed or unexpectedly changed metrics out of the files. Hudi ingestion then guarantees a consistent schema and data quality in downstream datasets.

  4. Data Platform Integration: Store Telegraf metrics as Parquet files in an S3/ADLS landing zone. Hudi’s Spark-based ingestion pipeline then loads them into a unified, queryable lakehouse with business events and logs.

Feedback

Thank you for being part of our community! If you have any general feedback or found any bugs on these pages, we welcome and encourage your input. Please submit your feedback in the InfluxDB community Slack.


Related Integrations

HTTP and InfluxDB Integration

The HTTP plugin collects metrics from one or more HTTP(S) endpoints. It supports various authentication methods and configuration options for data formats.

View Integration

Kafka and InfluxDB Integration

This plugin reads messages from Kafka and allows the creation of metrics based on those messages. It supports various configurations including different Kafka settings and message processing options.

View Integration

Kinesis and InfluxDB Integration

The Kinesis plugin allows for reading metrics from AWS Kinesis streams. It supports multiple input data formats and offers checkpointing features with DynamoDB for reliable message processing.

View Integration