Even mature organizations find it difficult to test their threat detection capabilities. Part of the reason is that every organization is unique. Many, though they wouldn't admit it, rely on test sample sets that offer false hope of truly exercising their unique environments. For example, an architectural firm might use specific datatypes for its drawings and plans, while a healthcare environment uses specialized image formats for x-rays or MRIs.

So, testing those environments with a generalized test sample set focused on a broad range of Office-related documents would likely give those security teams false confidence and, ultimately, increase their risk. This is why security teams should tailor their samples to their unique environment in addition to using any generalized test set. That means a better balance of PE32 files, Office documents, and the unique file types that match their current environment.
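As a rough sketch of what "tailoring" could look like in practice, one could profile the file-type mix of the production environment and compare it against a candidate test set. The function below is an illustrative assumption, not a prescribed tool; it simply counts file extensions under a directory tree.

```python
from collections import Counter
from pathlib import Path


def file_type_breakdown(root: str) -> dict:
    """Return the fraction of files per extension under a directory tree.

    Files with no extension are bucketed under '<none>'. The resulting
    breakdown can be compared against a test set's breakdown to check
    whether the test set resembles the real environment.
    """
    counts = Counter(
        p.suffix.lower() or "<none>"
        for p in Path(root).rglob("*")
        if p.is_file()
    )
    total = sum(counts.values())
    return {ext: n / total for ext, n in counts.items()} if total else {}
```

Running this against, say, a file server and against the test corpus would quickly show whether a test set heavy in `.docx` files matches an environment dominated by `.dwg` drawings or medical image formats.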

How many samples do I need for each file type?

Understand that a test with too small a sample size has a good chance of being inaccurate or inconclusive. A set of 2,000 samples at a 95% confidence level has a margin of error of roughly 2%. Therefore, it is recommended that a sample size on the order of 2,000 be used so you can have confidence in your results. You can read more about sample size and margin of error at https://www.isixsigma.com/tools-templates/sampling-data/margin-error-and-confidence-levels-made-simple/.
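The roughly-2% figure falls out of the standard worst-case margin-of-error formula for a proportion, z * sqrt(p(1-p)/n), with z = 1.96 for 95% confidence and p = 0.5 (the value that maximizes the margin). A quick sketch:

```python
import math


def margin_of_error(n: int, z: float = 1.96, p: float = 0.5) -> float:
    """Worst-case margin of error for a detection-rate estimate from n samples.

    z = 1.96 corresponds to 95% confidence; p = 0.5 gives the largest
    (most conservative) margin.
    """
    return z * math.sqrt(p * (1 - p) / n)
```

For n = 2,000 this gives about 0.022, i.e. a margin of error of roughly 2%; dropping to n = 500 roughly doubles it to about 4.4%, which is why small sample sets produce shaky conclusions.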

What’s next? Well, here is the crux of the problem: how do you get samples of unknown malware? Any malware you get from a repository or malware feed has already been seen by every vendor who subscribes to those services. This is all known malware.

There are two recognized methods for curating “unknown or zero-day malware”: time lag analysis and creating unique generated samples. Time lag analysis involves disconnecting your solution under test (SUT) and allowing no internet updates for a period of time. During that same period, you collect the samples and file types to be tested. With the SUT still disconnected from the internet so that it cannot update, you then run the collected samples against it. Essentially, you are testing known malware against a device that has never seen these new samples, because the device has been quarantined from updates. Remember, don’t allow the solution to update while you run your test either; some solutions might not let you prevent updates, which should raise questions of its own.
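The selection step of time lag analysis can be sketched as a simple filter: keep only samples first seen after the SUT's last signature update, since those are effectively "unknown" to the frozen device. The `first_seen` field name below is a hypothetical stand-in for whatever timestamp your sample feed provides.

```python
from datetime import datetime


def samples_unknown_to_sut(samples: list, sut_last_update: datetime) -> list:
    """Filter a sample list down to those the frozen SUT cannot know about.

    Each sample is a dict assumed to carry a 'first_seen' datetime
    (hypothetical field name); anything first seen after the SUT's last
    update is treated as unknown/zero-day for the purposes of the test.
    """
    return [s for s in samples if s["first_seen"] > sut_last_update]
```

The longer the lag window, the larger the pool of "unknown" samples, but the further the SUT drifts from its production configuration, so the window length is itself a test-design decision.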

The creation of unique samples, while the best option, is very difficult to manage. Therefore, the time lag method is recommended. For more information on the creation of malware or zero-day testing, we recommend reading the following from the Anti-Malware Testing Standards Organization (AMTSO) at https://www.amtso.org/documents/.

As you have probably gathered by now, this isn’t a walk in the park, but it is achievable. Ask your vendors questions. Understanding how they test and validate their own solutions is important in your quest to deploy the best solutions for your organization. So again, for known threat testing, define a file type breakdown for the test and filter in samples that you consider malicious, based either on industry consensus or on other validation methods. Gather plenty of samples: 2,000 or more. For false positive testing, create a set that includes both malicious and benign files; these could include PUPs or PUAs. And again, unless you are an expert in the field of malware creation, use the time lag analysis method for developing your unknown or zero-day malware set.

Nick Arraje has over 20 years of technical sales experience working with companies in telecom, network visibility, and cybersecurity. While working in the telecom market, he worked with Tier 1 carriers and network equipment manufacturers on network simulation tools and testing scenarios to validate product features and capabilities. Nick has a degree in Electrical Engineering from Northeastern University.