How Threat Hunting Tackles Data Brokering Problems
The Nightmare of Today’s Data Brokering Environment
“Tune at the source” is a common approach security organizations take to mitigate data storage and throughput issues, and it continues to give security engineers nightmares. That is not necessarily because the concept is incorrect, but because in practice it ends up being one of the greatest limiters of today’s security organizations. If data is king, the king’s gold is being poorly invested.
Today’s data brokering environment is similar to the difficulties of putting a fitted sheet on a bed.
However:
- Instead of four corners, there are more than four dozen
- The corners of the bed keep changing in size and location
- Each corner is different, with its own proprietary interface
Let’s continue with this light-hearted analogy. If asked to “tune this problem at the source”, organizations would assign their best and brightest (and most expensive) people to understand the most important corners. Once they understand, say, six corners (12 if they are lucky), the team has only a makeshift environment, with most of the bed still uncovered.
What is needed is a solution that understands the complete picture: the various data sources, their volumes, and how tweaks to one source help the organization.
But the questions remain:
- What are the true data bottlenecks for security organizations?
- How does Threat Hunting enter the picture?
As it turns out, Threat Hunting is the key to curing nightmares and making this impossible bed.
Data Brokering: The Limitations of Storage
The two biggest bottlenecks in data brokering are:
- The amount of storage available where the data is going
- The ability to reduce (not eliminate) volume from data sources
Cybersecurity has entered a new kind of data problem. For years, organizations demanded data that wasn’t available; now they are overwhelmed by the volume and diversity of the data they have. The two limiting factors are how much storage they have for the data and how much they know about the proprietary data sources.
Taking a Look at Storage
Storage is the typical starting point for getting value out of data sources. Organizations determine how much storage is available and then work back to the data sources. The process is:
- Identify, forward, and store data from an important source
- Check impact on available storage
- Repeat with the next important data source until storage is exhausted
- Cut a check for more storage. And repeat.
The result is that key data sources are lost to storage limitations. Figure 1 visualizes the effect of this process.
Figure 1 Losing Data Due to Storage Limitations
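To make the storage-first process concrete, here is a minimal sketch of it in Python. The source names, daily volumes, priorities, and capacity are all made-up numbers for illustration, not measurements from any real environment. The sketch assumes a source that does not fit in the remaining storage is simply dropped.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    name: str        # hypothetical source name
    daily_gb: float  # assumed daily volume in GB
    priority: int    # lower number = more important


def onboard_until_full(sources, capacity_gb):
    """Onboard sources in priority order until storage runs out.

    Returns (stored, dropped) lists of source names.
    """
    stored, dropped = [], []
    remaining = capacity_gb
    for src in sorted(sources, key=lambda s: s.priority):
        if src.daily_gb <= remaining:
            # Identify, forward, and store; then check the impact on storage.
            stored.append(src.name)
            remaining -= src.daily_gb
        else:
            # No room left: this source is lost until more storage is bought.
            dropped.append(src.name)
    return stored, dropped


# Illustrative values only.
sources = [
    DataSource("firewall", 400.0, 1),
    DataSource("endpoint", 350.0, 2),
    DataSource("dns", 200.0, 3),
    DataSource("netflow", 300.0, 4),
]
stored, dropped = onboard_until_full(sources, capacity_gb=1000.0)
print("stored:", stored)
print("dropped:", dropped)
```

With these numbers, the first three sources fit and the fourth is dropped, even though nothing about the fourth source's value was ever evaluated. Storage alone made the decision.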
Tuning at the Source
Tuning at the source is essential. Every data source has elements an organization doesn’t want. So why do organizations waste precious storage on unwanted data? The answer falls into two main buckets:
- They didn’t know the data wasn’t predictive or useful
- They didn’t have the bandwidth to tune the data
This is where the concept of tuning at the source enters the discussion. To reduce the volume of unwanted data, the source sending it must be modified. For data sources already in storage, this takes a modest amount of work: it is simple to identify, from the storage location, which types of data aren’t wanted, and the responsible data source can then be adjusted.
Figure 2 illustrates this process.
Figure 2 Challenges of Tuning at the Source
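As a rough sketch of what "identify from the storage location" can look like, the Python below totals stored bytes by source and event type and reports how much would be reclaimed by tuning out the unwanted types. The event records, the field layout, and the `unwanted` set are hypothetical, chosen only to illustrate the idea.

```python
from collections import Counter

# Hypothetical stored events: (source, event_type, size_bytes).
stored_events = [
    ("firewall", "allow", 500),
    ("firewall", "allow", 500),
    ("firewall", "deny", 300),
    ("dns", "query", 200),
    ("dns", "debug", 900),
    ("dns", "debug", 900),
]

# Event types the team has judged not useful (an assumption for this sketch).
unwanted = {("firewall", "allow"), ("dns", "debug")}


def bytes_by_type(events):
    """Total stored bytes per (source, event_type) pair."""
    totals = Counter()
    for source, event_type, size in events:
        totals[(source, event_type)] += size
    return totals


def tuning_report(events, unwanted_types):
    """Return (reclaimable_bytes, total_bytes) for the stored events."""
    totals = bytes_by_type(events)
    total_bytes = sum(totals.values())
    reclaimable = sum(
        size for key, size in totals.items() if key in unwanted_types
    )
    return reclaimable, total_bytes


reclaimable, total_bytes = tuning_report(stored_events, unwanted)
print(f"reclaimable: {reclaimable} of {total_bytes} bytes")
```

Note that this report only works because the events are already in storage, and each entry in `unwanted` still has to be turned into a change on its own source.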
The obvious challenge shown is the repetitive tuning work required for each source. The not-so-obvious challenge, which is critical but often overlooked, is that the data must be stored before it can be tuned. What about tuning data that isn’t in storage yet?
This is a much more challenging problem. Because the data isn’t stored, there is no way to understand what impact tuning will have on storage. If the return on investment isn’t understood, why invest in the first place? This unknown causes organizations to lose most of the valuable data they are generating.
Is there a way to more efficiently tune multiple sources AND tune before you store?
Before offering a point of view on this critical question, let’s first talk about Threat Hunting.
Threat Hunting and the Challenge of Too Many Integrations
Threat hunting is about using as many data sources as possible to proactively determine the integrity of whatever you are trying to defend (e.g., a network, user, or device). The analytics and practice of threat hunting are increasingly complex, so getting accurate, easy-to-understand answers from the various piles of data is essential.
One answer is to increase the role of machine-based analytics on a live stream of data. This reduces reliance on human-built workflows, which depend heavily on the limited bandwidth and expertise of security organizations. Sounds like a win, right? Well, only kind of.
Dedicated threat hunting platforms are becoming more commonplace in security organizations. However, where the platform lives and how it gets fed data is up to the organization. A common implementation is to give the threat hunting platform its own feed of traffic, which can be redundant with what gets sent to storage. See Figure 3.
Figure 3 Common Threat Hunting Implementation
This implementation helps keep important data sources from dying on the vine due to storage limitations, but it increases the number of integrations in the environment and the resource load on the local sources. Unfortunately, having organizations stream multiple sources of data to yet another location puts the cart before the horse.
Conclusion: Why Not Both? Threat Hunting and Data Brokering Combined
The overlapping challenges between data brokering and threat hunting are clear. What is needed is a way to efficiently stream the myriad data sources in an environment and proactively analyze the data in real time to hunt for threats, without storage being a concern. Threat hunting platforms that also solve data brokering challenges will put the horse before the cart.
When security engineers are told to “tune at the source”, they should instead envision a source that sits between the data originator and storage and that (by the way) can also threat hunt. This maximizes the use of data sources and efficiently tunes data on its way to storage.
Figure 4 Combined Data Brokering and Threat Hunting Solution
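The combined idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any product's implementation: the event fields, the drop rules, and the toy `long_dns_name` detector are all hypothetical, and running the detectors before the drop rules is a design choice made here so that analytics see the untuned stream.

```python
def broker(events, drop_rules, detectors):
    """Inspect each event once, then forward only what passes the tuning rules.

    Returns (to_storage, alerts).
    """
    to_storage, alerts = [], []
    for event in events:
        # Hunt first, so the analytics see the full, untuned stream.
        for name, detect in detectors.items():
            if detect(event):
                alerts.append((name, event))
        # Then tune: drop events matching any rule before they reach storage.
        if not any(rule(event) for rule in drop_rules):
            to_storage.append(event)
    return to_storage, alerts


# Hypothetical events from two sources.
events = [
    {"source": "dns", "type": "debug", "query": "a.example"},
    {"source": "dns", "type": "query", "query": "x" * 60 + ".example"},
    {"source": "firewall", "type": "allow", "dst_port": 443},
    {"source": "firewall", "type": "deny", "dst_port": 23},
]

# Tuning rules live in one place instead of on every source (assumed rules).
drop_rules = [
    lambda e: e["type"] == "debug",
    lambda e: e["source"] == "firewall" and e["type"] == "allow",
]

# A toy heuristic standing in for real hunting analytics.
detectors = {
    "long_dns_name": lambda e: e["source"] == "dns"
    and len(e.get("query", "")) > 50,
}

to_storage, alerts = broker(events, drop_rules, detectors)
print("stored event types:", [e["type"] for e in to_storage])
print("alerts:", [name for name, _ in alerts])
```

In this sketch each source needs only one outbound feed, the tuning rules are applied before anything is written to storage, and the hunting logic never depends on what storage can hold.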
Would a solution like this be helpful to you? Let us know! BluVector is an innovator in the industry, working to best support the security needs of both government and commercial businesses. Schedule a meeting to talk with an expert today about your cybersecurity vision and how BluVector can augment your existing processes and bring your protection to the next level.