Information Theory and Elastic Pt 2: Intro to Kibana Graph

Nov 15, 2020

Introducing Graphs:

One of the fantastic features integrated into Kibana is the ability to visualize data as graphs. Note that we are not discussing things like bar graphs and pie charts, but rather the mathematical, tree-like structure above. Graph theory is a branch of mathematics that studies the relationships between entities. The circles in the figure above are called vertices or nodes (I will use "nodes" for the rest of this article), and the lines are edges. When two nodes are connected, it indicates some sort of relationship between them. In the figure above, the nodes are drawn from two fields: the light red nodes are ‘host.name’ values, while the orange nodes are ‘process.name’ values. The edges between them are weighted by the strength of relevance between the nodes, and the graph shows this weight as the width of the edge. The graph gives me insight into the activity of these hosts. The clearly apparent trend is that the hosts in the upper region are more similar to each other than to the host ‘StarGazer’ at the bottom. The reason behind this clustering is that StarGazer is my regular-use laptop, while the hosts at the top of the graph all belong to an adversary emulation lab. The two sets rarely interact with each other and serve very different use cases.
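To make the node and edge vocabulary concrete, here is a minimal sketch of a two-field graph built by hand. Everything in it is an assumption for illustration: the events are made up, the lab host names and process names are hypothetical (only ‘StarGazer’ comes from my data), and the edge weight is a plain co-occurrence count, which is the simplest possible stand-in and not the relevance weighting that Kibana Graph applies.

```python
from collections import Counter

# Made-up (host.name, process.name) pairs standing in for log events.
# Only "StarGazer" appears in the article; the lab hosts and the
# process names are hypothetical.
events = [
    ("lab-win-01", "powershell.exe"),
    ("lab-win-01", "powershell.exe"),
    ("lab-win-01", "cmd.exe"),
    ("lab-win-02", "powershell.exe"),
    ("lab-win-02", "cmd.exe"),
    ("StarGazer", "firefox"),
    ("StarGazer", "firefox"),
    ("StarGazer", "code"),
]


def build_edges(pairs):
    """Return {(host, process): weight}, where weight is the number of
    events in which the host and the process appear together."""
    return Counter(pairs)


def neighbors(edges, host):
    """Return the set of process nodes connected to a host node."""
    return {process for (h, process) in edges if h == host}


edges = build_edges(events)

# Hosts that share process nodes end up close together in the layout.
shared = neighbors(edges, "lab-win-01") & neighbors(edges, "lab-win-02")
isolated = neighbors(edges, "StarGazer") & neighbors(edges, "lab-win-01")

print("lab hosts share:", sorted(shared))
print("StarGazer shares with the lab:", sorted(isolated))
```

In this toy data the two lab hosts connect to the same process nodes, while StarGazer connects to none of them, which is the same kind of separation the figure shows.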

What Mark Harwood Taught Me:

One thing I strongly believe in is giving credit to those who taught you the most. In this case, my prime source of research was a deep dive on Kibana Graph given by Mark Harwood. After finishing this article, I highly suggest watching his video for more information. The main point Harwood makes is the importance of looking at relevance rather than pure count when creating the links between nodes. His best illustration of the point is the question of what links can be made between favorite music artists. The example dataset has a large number of users who have each given a list of their favorite artists or bands. Harwood sets the goal of using a graph to determine which artists are most related to each other, in order to suggest artists a listener might like next.

As with any problem of transmitting information, the obstacle is noise. In this case, the noise comes from certain artists who show up on just about everybody’s list, which does not mean they are musically similar to anyone else on it. In a purely count-based graph system, there would be no way to eliminate these sources of noise without removing them from the dataset, and that becomes problematic and can skew the results in the end. A better approach is to rely on the relevance algorithms that Elasticsearch implements and use those results to build the links. With this method the noise is filtered out, and the resulting graph clearly shows artists that actually share a common musical influence.
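The difference between count and relevance is easier to see with numbers. The sketch below is an illustration under stated assumptions, not Elasticsearch's actual implementation: it uses the JLH formula described for the significant terms aggregation (absolute change times relative change in frequency), and whether Graph applies exactly this heuristic in your version is something to verify in the documentation. The artist names and all of the counts are made up.

```python
def jlh_score(fg_count, fg_total, bg_count, bg_total):
    """JLH-style significance: (fg% - bg%) * (fg% / bg%).

    fg% is how often a term appears in the foreground set (here, fans of
    a seed artist); bg% is how often it appears across the whole dataset.
    Terms that are not more common in the foreground score 0.
    """
    fg = fg_count / fg_total
    bg = bg_count / bg_total
    if bg == 0 or fg <= bg:
        return 0.0
    return (fg - bg) * (fg / bg)


# Made-up numbers: 1000 users overall, 100 of whom like the seed artist.
TOTAL_USERS = 1000
SEED_FANS = 100

# Hypothetical candidate artists and how often each one is listed.
candidates = {
    "popular_band": {"among_seed_fans": 95, "among_all_users": 900},
    "niche_band_b": {"among_seed_fans": 40, "among_all_users": 50},
}

# Ranking by raw count puts the artist everybody likes on top.
by_count = sorted(
    candidates,
    key=lambda name: candidates[name]["among_seed_fans"],
    reverse=True,
)

# Ranking by significance rewards the artist that is unusually common
# among the seed artist's fans compared with the general population.
scores = {
    name: jlh_score(
        c["among_seed_fans"], SEED_FANS, c["among_all_users"], TOTAL_USERS
    )
    for name, c in candidates.items()
}
by_relevance = sorted(scores, key=scores.get, reverse=True)

print("by count:    ", by_count)
print("by relevance:", by_relevance)
```

The universally popular artist wins on count but barely moves relative to its background frequency, so it drops to the bottom once the links are scored by relevance.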

The types of relationships that can be communicated with the graph tool are important because the information they carry has significance. It can be easy to lose ourselves in giant data lakes and drown in the details behind terabytes of logs. Analysts need to be able to break through that and find the information that brings a new warning, understanding, or insight. Kibana Graph provides a way to present the nodes and links within our data and make them clear.

The Process:

The first step is navigating to the Graph option in the Kibana section of the main menu. Once in the Graph tool, you follow three steps: choose a dataset, choose one or more fields, and (optionally) enter a query.
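The same three choices can also be written down as a request to the Elasticsearch Graph explore API, which is handy when a graph needs to be reproduced outside the UI. The following is a sketch under assumptions: the index pattern and the helper function are hypothetical, and the body keys (`vertices`, `connections`, `controls.use_significance`) reflect the explore API as I understand it, so check them against the documentation for your version. The code only builds and prints the request; it does not contact a cluster.

```python
import json


def build_explore_request(source_field, target_field, query=None, size=10):
    """Build a Graph explore request body (hypothetical helper).

    source_field: field whose terms become the starting nodes
    target_field: field whose terms are connected to those nodes
    query:        optional Elasticsearch query that seeds the exploration
    size:         maximum number of terms to return per field
    """
    body = {
        "vertices": [{"field": source_field, "size": size}],
        "connections": {"vertices": [{"field": target_field, "size": size}]},
        # Ask for links based on relevance rather than raw popularity.
        "controls": {"use_significance": True},
    }
    if query is not None:
        body["query"] = query
    return body


# Hypothetical index pattern; substitute the dataset chosen in step one.
index = "my-logs-*"

# Steps two and three: two fields, no seed query.
body = build_explore_request("host.name", "process.name")

print(f"POST /{index}/_graph/explore")
print(json.dumps(body, indent=2))
```

Passing a query, for example `{"match": {"host.name": "StarGazer"}}`, would seed the exploration with only the documents that match it.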

Interesting Relationships:

This first graph is the host.name/process.name selection that I introduced at the beginning of the article. It turned out that the shape of the graph was dictated by the fact that the host ‘StarGazer’ was the only machine not participating in a lab activity. In the version below, I used the styling options provided to give this relationship a special color. This way, if I wanted to display the results to others, or use this as a jumping-off point for more queries into the relationships, it would be easy to spot where to start. You also have the ability to add or remove nodes to fit your needs.

This next graph focuses on a very important question in network traffic analysis. I have chosen to graph the relationships between host.name and network.protocol. Luckily for us, this is a simple test environment with only HTTP and DNS to worry about. Imagine the links that would be expected or, even more importantly, not expected in a full-sized environment. Notice that the widths of the edges show the weight between the nodes.

In this next graph, I have started to expand into threat intelligence and detection. Every signal alert in Elastic is tagged with the MITRE ATT&CK technique associated with the detection. Using this information, it was possible to create a host.name/technique.name graph to visualize which techniques were associated with which hosts. Understanding the threats faced in an environment is the start of an active defense, and this graph is the beginning of knowing more about that threat landscape. From here, further details about the hosts could be explored that might reveal why those techniques were present. With the editing tools, it is easy to keep drilling down.

A topic that has become very important at MITRE ATT&CK is the source of our data for particular events. Following this line of thought, I created a graph of event.category/process.name. This graph provides an interesting window into which activity on my devices is actually generating the data that falls within each category. Of course, since I was running emulation tests with PowerShell, it unfortunately ended up highly linked with malware and intrusion detection. The other data, though, provides a good map for analysts and detection engineers who want an idea of which processes tend to cause events of a given type. These are data and insights that could make the difference during an investigation or rule development.

A Big Part to Play:

It is my belief that Kibana Graph will play a big part in helping the Elastic Stack reach its true potential. The examples I showed were from a small lab sample, but they already show the key relationships and questions that can be answered using Graph. When applying the Stack beyond the SIEM workflow, it will become necessary for such analytical tools to be in the hands of the user inside Kibana. Threat intelligence and detection engineering are my two favorite examples of work areas that will prove quite fruitful, and there are many more. There are some barriers that I believe can be addressed the same way other toolsets have grown. I bring up machine learning jobs as an example: they are one of the best features that are usable out of the box without any user setup. The anomaly jobs that are available with a simple click are making a big difference in how quickly the idea of including ML jobs in your regular detection toolkit is catching on. I believe that Kibana Graph needs the same treatment to gain wide traction. Depending on the use case among the Search, Observability, and Security paradigms, there could be a key set of graphs included for users to start with right away. This would give them a better idea of how Graph fits in with other tools, and they could expand from there. The fact remains that Kibana Graph is a powerful tool for eliminating the noise that stops the messages your data is trying to send from reaching your analysts.


Written by ivan ninichuck
