Big security data: What to keep, for how long?

Feb 25, 2014 · 4 mins
Network Security

Over the last 10 years investments made by Brazilian banks in cyber security have grown substantially. As has the data. Now, at least two camps are emerging in the debate about what data should be kept for various time intervals.

Brazil has always been a pioneer in financial services. It was one of the first countries in the world to offer Internet banking, which also means it was one of the first to have to mitigate the risks that come with it. However, Brazil has an extreme shortage of security professionals. This, coupled with sometimes-fragile infrastructure, means its banks are used to getting through issues with “white knuckle” security – no time, no resources, and results needed yesterday. While this model was never very good, it actually worked for some years; it didn’t work great, but it got the job somewhat done through sheer willpower. Today’s threats, however, operate with greater stealth, speed, and sometimes sophistication than their predecessors, and as such require a better and more exhaustive approach. One element of this new approach is big security data.

Raw Data and Metadata

Our conversations centered primarily on raw network packet data and the metadata derived from those packets. When it comes to log management and SIEM solutions, collection is generally measured in thousands of logs per second and retention in quarters or even years. Regulatory mandates can play an important role in just how long those logs and alerts are kept. This isn’t necessarily the case with packet data. Packet data, and the artifacts reconstructed from it – Microsoft Office documents, voice messages, videos, ISO images and the rest – can eat through storage quickly, especially when, instead of thousands of logs a second, there are several million packets a second to be read, indexed, classified and stored. So how much should be kept in order to have a sufficient volume of data to empower the security team?

The answer to packet data retention fluctuates dramatically based on factors such as budget, use cases, desired raw-packet versus metadata retention levels, existing storage capacity, utilization of network pipes (for example, a 10 Gb link may be only 20% utilized) and other variables specific to the organization’s needs. There is no “one-size-fits-all” approach to retention.
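To get a feel for how these variables interact, here is a minimal back-of-the-envelope sketch of raw packet-capture storage sizing. The figures (a 10 Gb link, 20% average utilization, a seven-day window) are hypothetical illustrations, not numbers from any of the banks, and the estimate ignores capture-file overhead and compression.

```python
def pcap_storage_tb(link_gbps: float, utilization: float, days: int) -> float:
    """Rough terabytes of storage needed to retain full packet captures."""
    bytes_per_sec = link_gbps * 1e9 / 8 * utilization   # link speed in bits -> average bytes/sec
    total_bytes = bytes_per_sec * 86_400 * days         # 86,400 seconds per day
    return total_bytes / 1e12                           # bytes -> TB

# A 10 Gb link at 20% average utilization, retained for seven days:
print(round(pcap_storage_tb(10, 0.20, 7), 1))  # ~151.2 TB
```

Even at modest utilization, a one-week raw-packet window lands in the hundreds of terabytes, which is why the metadata-versus-raw-packet trade-off below matters so much.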

More or Less

Across the banks in Brazil there seemed to be two primary camps on the topic of data retention.

Group one felt that packet data and related artifacts are most beneficial when used in a near-real-time capacity, with retention stretching back only about a week. In this group’s opinion, the data starts to lose operational value very quickly, and its value past a week doesn’t justify the storage cost. Generally, this group felt that the less storage-intensive metadata should be retained longer. For these banks, insider threats are always a core use case, and insider-threat analysis requires longer histories so that user trends can be analyzed. In this case, average retention was thought to be around six months for metadata and around seven days for packet data.

Group two’s approach was simply one of “more.” While they agreed that metadata retention should be longer than raw packet and artifact retention, they felt that group one’s approach was far too limiting for their use cases. Not only did they want to conduct analysis in near real time, they also wanted the ability to conduct longer-term forensic analysis against a much larger window. Many of the use cases driving this need involved potential 0-days, APTs and the like – for attacks they might not catch right away, they wanted to be able to go back forensically to determine how the attackers initially got in, when they got in, whether they are still in, and, in the case of data theft, what was taken. Most of these banks discussed packet retention in terms of quarters: they felt they needed at least two quarters of raw packet data and at least two years of metadata.

I’m curious to hear how other organizations are approaching the question of big security data storage – packets or otherwise. What are the use cases driving your retention requirements? Is it truly just a function of budget and the more dollars you have the more drives you buy, or are there more tangible, perhaps operational variables that are driving these decisions?

Image credit: Flickr/Roger Wollstadt (CC BY-SA 2.0)


Over the last two decades Brian Contos helped build some of the most successful and disruptive cybersecurity companies in the world. He is a published author and proven business leader.

After getting his start in security with the Defense Information Systems Agency (DISA) and later Bell Labs, Brian began the process of building security startups and taking multiple companies through successful IPOs and acquisitions including: Riptech, ArcSight, Imperva, McAfee and Solera Networks. Brian has worked in over 50 countries across six continents and is a fellow with the Ponemon Institute and ICIT.

The opinions expressed in this blog are those of Brian Contos and do not necessarily represent those of IDG Communications Inc. or its parent, subsidiary or affiliated companies.