Friday, July 21, 2017

Something is very wrong with the Belfer bug rediscovery paper

This is what the paper says about its Chromium data:
Chrome: The Chrome dataset is scraped from bugs collected for Chromium, an open source software project whose code constitutes most the Chrome browser. 20 On top of Chromium, Google adds a few additional features, such as a PDF viewer, but there is substantial overlap, so we treat this as essentially identical to Chrome.21 Chrome presented a similar problem to Firefox, so to record only vulnerabilities with a reasonable likelihood of public discovery, we limited our collection to bugs labeled as high or critical severity from the Chromium bug tracker. This portion of the dataset comprises 3,397 vulnerability records of which there are 468 records with duplicates. For Chrome, we coded a vulnerability record as a duplicate if it had been merged with another, where merges were noted in the comments associated with each vulnerability record, or marked as a Duplicate in the Status field.

The problem with this methodology is simply that "merges" do not indicate rediscovery in the database. The vast majority of the findings relied upon for the paper are false positives.

To look at this, I went through their spreadsheet and collected out those mentioned 468 records. Then I examined them on the Chromium bug tracking system. The vast majority of them were "self-duplicates" from the automated fuzzing and crash detection systems.
I'm a Unix hacker so I converted it to CSV and wrote a Python script to look at the data. Happy to share scripts/data.

Looking at just the ones that have a CVE or got a reward makes more sense. There's probably only 45 true positives in this data set (i.e. the ones with CVE numbers). That's 1.3% which agrees with the numbers from the much cleaner OpenSSL Bug Database (2.4%) from this paper.

Example false positives in their data set:

  • This one has a CVE, but doesn't appear to be a true positive other than people noticed things crashed in many different ways from one root cause.
  • Here someone at Google manually found something that clusterfuzz also found.
  • Here is another clear false positive. Here's another. Literally I just take any one of them, and then look at it.
  • Interesting one from Skylined, but also a false positive I think.

No comments:

Post a Comment