OpenShift's traffic analysis capabilities can be further enhanced by analyzing NetFlow data using ElastiFlow, a powerful open source tool for analyzing network traffic in real time. In this article, we will explore how ElastiFlow can be used to analyze and monitor traffic on OpenShift / OKD 4.8 and its later versions (including 4.11 and 4.12+).
We will discuss the benefits of using ElastiFlow and how it can be used to improve the performance of your OpenShift applications. We will also provide some tips on how to get started with ElastiFlow and how to monitor traffic in real time. Finally, we will discuss some of the potential challenges associated with using ElastiFlow on OpenShift clusters.
OpenShift network flow data is collected in one or more formats such as NetFlow, sFlow, or IPFIX; this article focuses on NetFlow, which can be visualized using tools such as ElastiFlow.
By the end of this article, you should have a working ElastiFlow instance receiving NetFlow data from an OpenShift (or OKD) cluster that uses the OVN-Kubernetes network plugin, and you should understand its capabilities and how it can improve the way you work with your cluster network.
NetFlow analysis is a powerful technique that network engineers and IT professionals use to examine network data. It can identify traffic patterns, diagnose network performance issues, detect security threats, and more. In short, it is an essential tool for any organization that relies on its network infrastructure.
The ElastiFlow Unified Collector ingests, transforms, normalizes, translates, and enriches data from sources such as IPFIX, NetFlow, sFlow, and AWS VPC flow logs. The data is then visualized using the Elastic Stack (Elasticsearch and Kibana).
As a cluster administrator, collecting information from the pod network can help you monitor ingress and egress traffic, troubleshoot performance issues, and gather data for capacity planning and security audits.
When you enable network flow collection, only the metadata about the traffic is collected, such as protocol, source address, destination address, port numbers, number of bytes, and other packet-level details.
A surge in traffic or a sudden influx of users can overwhelm an application even when the network infrastructure itself has capacity to spare. NetFlow analysis shows how your cluster traffic is behaving, so you can make sure the network is properly configured and ready to meet your application's needs.
NetFlow analysis helps network administrators and security engineers spot anomalies. Flow records show who is talking to whom, over which protocols and ports, and how much data is exchanged, so you can identify problems and decide how to fix them.
It is also a good way to monitor the network and the applications running on it, to understand traffic patterns and usage, and to support security audits by revealing unexpected or suspicious connections.
Finally, it supports capacity planning: the same data shows whether you need more bandwidth to handle increased traffic, or whether other changes to the network are required.
In this section, we will show you how to configure OpenShift (or OKD) to export NetFlow data. We will cover setting up the NetFlow collector, configuring OpenShift (or OKD) to export NetFlow data, and finally verifying that the data is collected correctly.
To produce network flow data in formats that can be consumed by ElastiFlow, OpenShift/OKD must be configured to export flow data to a NetFlow collector. The NetFlow exporter component is not enabled by default. To enable the NetFlow exporter, you must configure the Cluster Network Operator (CNO).
When you configure the Cluster Network Operator (CNO) with a collector IP address and port number, the operator instructs Open vSwitch (OVS) on each node to send network flow records to that collector. You can assign more than one type of network flow collector, such as both NetFlow and sFlow collectors. Collectors of the same type receive identical records; each collector type, however, has its own record format.
Capturing the network flow data and sending the records to the collectors can slow down the performance of the nodes. If the impact is too great, you can delete the destinations for the collectors to stop collecting the data and restore performance.
Let's assume that your target machine has the IP address 192.168.1.50 and listens on port 9995 (ElastiFlow's default port). Create the following configuration file and name it netflow.yaml:
spec:
  exportNetworkFlows:
    netFlow:
      collectors:
        - 192.168.1.50:9995
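If you have several collectors, or want to avoid indentation mistakes, the same file can be generated from the command line. The following is a minimal sketch, not part of the OpenShift tooling: `make_netflow_patch` is a hypothetical helper name, and it assumes IPv4 collector addresses in `IP:PORT` form.

```shell
# Sketch only: make_netflow_patch is a hypothetical helper, not an oc feature.
# It prints a CNO patch listing every collector passed as an argument.
make_netflow_patch() {
    local target
    if [ "$#" -eq 0 ]; then
        echo "usage: make_netflow_patch IP:PORT [IP:PORT...]" >&2
        return 1
    fi
    # Assumption: collectors are IPv4 addresses; reject anything else early.
    for target in "$@"; do
        if ! printf '%s\n' "$target" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}:[0-9]{1,5}$'; then
            echo "invalid collector address: $target" >&2
            return 1
        fi
    done
    printf 'spec:\n  exportNetworkFlows:\n    netFlow:\n      collectors:\n'
    for target in "$@"; do
        printf '        - %s\n' "$target"
    done
}
```

For the single collector used in this article, `make_netflow_patch 192.168.1.50:9995 > netflow.yaml` should produce the same file as above.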
Then apply it to your cluster's network by merging it:
oc patch network.operator cluster --type merge -p "$(cat netflow.yaml)"
The output of this command should read:
network.operator.openshift.io/cluster patched
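Before inspecting the individual nodes, you may want to confirm that the operator actually stored the collector. The sketch below reads the collector list back from the operator's spec with a JSONPath query; `flow_export_has_collector` is a hypothetical helper name, and the query path simply mirrors the structure of netflow.yaml.

```shell
# Sketch only: flow_export_has_collector is a hypothetical helper.
# Succeeds if the given IP:PORT is listed as a NetFlow collector in the CNO spec.
flow_export_has_collector() {
    # The JSONPath prints the collectors separated by spaces; put one per line
    # so that grep can require an exact, whole-line match.
    oc get network.operator cluster \
        -o jsonpath='{.spec.exportNetworkFlows.netFlow.collectors[*]}' \
        | tr ' ' '\n' \
        | grep -Fxq "$1"
}
```

A possible use is `flow_export_has_collector 192.168.1.50:9995 && echo "collector configured"`.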
You can verify that each OVS node has picked up this configuration with the following snippet:
#!/bin/bash
for pod in $(oc get pods -n openshift-ovn-kubernetes -l app=ovnkube-node -o jsonpath='{range@.items[*]}{.metadata.name}{"\n"}{end}')
do
    echo
    echo "$pod"
    oc -n openshift-ovn-kubernetes exec -c ovnkube-node "$pod" -- bash -c 'for type in ipfix sflow netflow ; do ovs-vsctl find $type ; done'
done
The output should look like this:
ovnkube-node-xrn4p
_uuid : a4d2aaca-5023-4f3d-9400-7275f92611f9
active_timeout : 60
add_id_to_interface : false
engine_id : []
engine_type : []
external_ids : {}
targets : ["192.168.1.50:9995"]
ovnkube-node-z4vq9
_uuid : 61d02fdb-9228-4993-8ff5-b27f01a29bd6
active_timeout : 60
add_id_to_interface : false
engine_id : []
engine_type : []
external_ids : {}
targets : ["192.168.1.50:9995"]
…
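On a cluster with many nodes, reading this output by eye gets tedious. As a rough sketch, the output of the loop above can be piped through a small filter that lists the pods whose OVS has no matching target. `missing_flow_targets` is a hypothetical helper name, and the filter assumes that pod names start with `ovnkube-node-`, as they do in the sample output.

```shell
# Sketch only: missing_flow_targets is a hypothetical helper.
# Reads the verification loop's output on stdin and prints every pod that has
# no "targets" line containing the expected collector. Exits 1 if any is found.
missing_flow_targets() {
    awk -v want="\"$1\"" '
        /^ovnkube-node-/ {                      # a new pod section starts
            if (pod != "" && !ok) { print pod; bad = 1 }
            pod = $1; ok = 0; next
        }
        /^targets/ && index($0, want) { ok = 1 }
        END {
            if (pod != "" && !ok) { print pod; bad = 1 }
            exit bad
        }
    '
}
```

Assuming the loop is saved as a script (the name check-flows.sh is made up here), `./check-flows.sh | missing_flow_targets 192.168.1.50:9995` prints nothing when every node has picked up the collector.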
To remove all network flow exporters, you can run the following command:
oc patch network.operator cluster --type='json' -p='[{"op":"remove", "path":"/spec/exportNetworkFlows"}]'
This should be confirmed by the following output:
network.operator.openshift.io/cluster patched
Now that the OpenShift cluster is producing NetFlow output, we need to set up a consumer. Once this step is complete, you will have a working ElastiFlow setup ready to monitor your network activity in real time.
In this setup, ElastiFlow collects the NetFlow data and forwards it as JSON to the Elasticsearch instance, where it is stored and indexed. We then use a Kibana dashboard to visualize the data. Here are the steps to deploy ElastiFlow in a minimal Docker environment.
Create the following files and their contents:
docker-compose.yml:

services:
  collector:
    image: elastiflow/flow-collector:6.1.3
    container_name: collector
    restart: 'unless-stopped'
    network_mode: 'host'
    volumes:
      - collector-data:/etc/elastiflow
    depends_on:
      - elasticsearch
    environment:
      # Extended config is in elastiflow.env for enrichments.
      EF_FLOW_SERVER_UDP_IP: '0.0.0.0'
      EF_FLOW_SERVER_UDP_PORT: 9995
      EF_FLOW_DECODER_NETFLOW5_ENABLE: 'true'
    env_file:
      - elastiflow.env
  elasticsearch:
    container_name: elasticsearch
    image: docker.elastic.co/elasticsearch/elasticsearch:7.15.2
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    healthcheck:
      test: curl -u elastic:elastic -s -f elasticsearch:9200/_cat/health >/dev/null || exit 1
      interval: 30s
      timeout: 10s
      retries: 5
    volumes:
      - elasticsearch-data:/usr/share/elasticsearch/data
    environment:
      # Extended config is in elasticsearch.env for enrichments.
      - xpack.security.enabled=false
      - "discovery.type=single-node"
    env_file:
      - elasticsearch.env
    networks:
      - es-net
    ports:
      - 9200:9200
  kibana:
    container_name: kibana
    image: docker.elastic.co/kibana/kibana:7.15.2
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    networks:
      - es-net
    depends_on:
      - elasticsearch
    ports:
      - 5601:5601
networks:
  es-net:
    driver: bridge
volumes:
  collector-data:
    driver: local
  elasticsearch-data:
    driver: local

elasticsearch.env:

indices.query.bool.max_clause_count=8192
search.max_buckets=250000
action.destructive_requires_name=true
reindex.remote.whitelist=*:*
xpack.monitoring.collection.enabled=true
xpack.monitoring.collection.interval=30s
xpack.security.enabled=false
ES_JAVA_OPTS=-Xms4g -Xmx4g

elastiflow.env:

# Set this to true to accept the ElastiFlow license (required).
EF_LICENSE_ACCEPTED=false
EF_FLOW_DECODER_ENRICH_MAXMIND_ASN_ENABLE=false
EF_FLOW_DECODER_ENRICH_MAXMIND_GEOIP_ENABLE=false
EF_FLOW_DECODER_ENRICH_MAXMIND_GEOIP_PATH=maxmind/GeoLite2-City.mmdb
EF_FLOW_DECODER_ENRICH_MAXMIND_ASN_PATH=maxmind/GeoLite2-ASN.mmdb
EF_FLOW_DECODER_ENRICH_DNS_ENABLE=true
EF_FLOW_OUTPUT_ELASTICSEARCH_ENABLE=true
EF_FLOW_OUTPUT_ELASTICSEARCH_ADDRESSES=127.0.0.1:9200
EF_FLOW_OUTPUT_ELASTICSEARCH_ECS_ENABLE=true
This setup runs the ElastiFlow collector on the host network, where it listens on UDP port 9995 for incoming flows. Elasticsearch listens on port 9200 and Kibana on port 5601.
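Because the license setting in elastiflow.env is marked as required, it can be worth checking it before starting anything. The following is a small sketch of such a check; `license_accepted` is a hypothetical helper name, and the default file name matches the `env_file` entry of the collector service.

```shell
# Sketch only: license_accepted is a hypothetical helper.
# Succeeds if the env file contains an uncommented EF_LICENSE_ACCEPTED=true.
license_accepted() {
    # The value must be exactly "true", optionally followed by whitespace
    # (for example before a trailing comment).
    grep -Eq '^EF_LICENSE_ACCEPTED=true([[:space:]]|$)' "${1:-elastiflow.env}"
}
```

For example, `license_accepted || echo "accept the license in elastiflow.env first"`.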
Once all the files are created, just run the command (Docker Compose v2) to start the stack:
docker compose up -d
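Elasticsearch can take a while to start, so the collector and Kibana may not be usable immediately. One way to wait for it is to poll the same `_cat/health` endpoint that the compose healthcheck uses. This is a sketch under assumptions: `wait_for_elasticsearch` is a hypothetical helper name, and the default URL, retry count, and delay are arbitrary choices.

```shell
# Sketch only: wait_for_elasticsearch is a hypothetical helper.
# Arguments (all optional): base URL, number of attempts, seconds between attempts.
wait_for_elasticsearch() {
    local url="${1:-http://127.0.0.1:9200}"
    local tries="${2:-30}"
    local delay="${3:-5}"
    local i=0
    while [ "$i" -lt "$tries" ]; do
        # -f makes curl fail on HTTP errors, -s keeps it quiet.
        if curl -s -f "$url/_cat/health" >/dev/null; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "Elasticsearch did not become ready at $url" >&2
    return 1
}
```

Running `wait_for_elasticsearch && echo "stack is up"` right after the compose command gives a simple readiness gate.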
Since Kibana will be our dashboard, we need to create the visualizations in it. ElastiFlow provides ready-made dashboards; download the one that suits our starter configuration:
kibana-7.14.x-ecs-light.ndjson
Then open Kibana's Saved Objects page and import the downloaded file:
http://192.168.1.50:5601/app/management/kibana/objects
(Adjust the IP address if yours is different.) Once the import is complete, open Kibana's main menu, select Analytics > Dashboards, and click ElastiFlow: Overview, which is the main view of all collected data.
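If you prefer the command line, the same import can be done through Kibana's saved objects import API. The sketch below assumes that the dashboard file has already been downloaded to the current directory and that security is disabled, as in the compose file; `import_dashboards` is a hypothetical helper name.

```shell
# Sketch only: import_dashboards is a hypothetical helper.
# Arguments: Kibana base URL, path to the .ndjson file.
import_dashboards() {
    local kibana_url="$1" file="$2"
    # Kibana requires the kbn-xsrf header on API writes; overwrite=true
    # replaces objects that already exist from an earlier import.
    curl -s -f -X POST "$kibana_url/api/saved_objects/_import?overwrite=true" \
        -H 'kbn-xsrf: true' \
        --form "file=@$file"
}
```

For the host used in this article: `import_dashboards http://192.168.1.50:5601 kibana-7.14.x-ecs-light.ndjson`.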
You can then choose from more specific views, such as Overview, Top-N, Core Services, Threats, Flows, Geo IP, AS Traffic, Exporters, Traffic Details, and Flow Records, depending on what you are looking for. In these views you can filter by OpenShift node (flow exporter), by pod IP as client or server, and by service or port.
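If the dashboards stay empty, a quick check is whether any flow documents have reached Elasticsearch at all. The sketch below sums the document counts reported by the `_cat/indices` API. Note the assumptions: `count_flow_docs` is a hypothetical helper name, and the index pattern `elastiflow-*` is a guess at how the collector names its indices, so adjust it to whatever `_cat/indices` shows on your instance.

```shell
# Sketch only: count_flow_docs is a hypothetical helper.
# Arguments (optional): Elasticsearch base URL, index pattern (assumed default).
count_flow_docs() {
    local es_url="${1:-http://127.0.0.1:9200}"
    local pattern="${2:-elastiflow-*}"
    # h=docs.count prints one document count per matching index; add them up.
    curl -s -f "$es_url/_cat/indices/$pattern?h=docs.count" \
        | awk '{ total += $1 } END { print total + 0 }'
}
```

A result that grows between two runs of `count_flow_docs` suggests that flows are being received and indexed.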
In this article, we discussed how to enable NetFlow data export on OpenShift/OKD and how to use ElastiFlow as a collector and processor of NetFlow data. We saw that by using ElastiFlow, we can quickly get a lot of useful information about the applications running on OpenShift/OKD, such as the traffic sent and received by the applications. This information can be used to detect anomalies and troubleshoot connectivity issues. Using ElastiFlow with OpenShift/OKD allows us to quickly and efficiently monitor the network traffic of our applications.