Debugging in the (Very) Large: Ten Years of Implementation and Experience
A paper on the Windows Error Reporting (WER) system. It has interesting features such as:
1) Automatic bucketing of error reports using heuristics on both the client and the server; ideally, all reports for one bug are assigned to a single bucket.
2) Progressive data collection, from a minimal dump up to a full one; how much is collected depends on what is needed to diagnose the error.
3) Statistics-based debugging: comparing the conditions under which an error occurs, or forming a hypothesis and testing whether it is the real cause, using WER reports coming from a billion machines.
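To make the bucketing idea in item 1 concrete, here is a minimal sketch of grouping reports by a label derived from each report. The `CrashReport` fields and the `bucket_key` function are hypothetical, chosen only for illustration; the real system uses a richer set of client-side and server-side heuristics.

```python
from collections import defaultdict
from typing import NamedTuple


class CrashReport(NamedTuple):
    # Hypothetical report fields, assumed for this sketch.
    app: str
    app_version: str
    module: str
    module_version: str
    offset: int  # offset of the faulting instruction within the module


def bucket_key(report: CrashReport) -> tuple:
    """Derive a bucket label; reports with equal labels share a bucket."""
    return (
        report.app,
        report.app_version,
        report.module,
        report.module_version,
        hex(report.offset),
    )


def bucketize(reports):
    """Group reports by bucket label and return {label: [reports]}."""
    buckets = defaultdict(list)
    for report in reports:
        buckets[bucket_key(report)].append(report)
    return dict(buckets)
```

Sorting the resulting buckets by size gives a simple way to see which suspected bug is hit most often.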
#3 is the most interesting part of WER, though the paper doesn’t say much about it. It’s a great area for applying statistical thinking.
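As one illustration of that kind of statistical thinking, the sketch below tests a hypothesis such as "machines with configuration A hit this bucket more often than machines with configuration B" using a two-proportion z-test. This is a generic textbook test chosen for the example, not a method taken from the paper; the function name and all counts are made up.

```python
import math


def two_proportion_z_test(hits_a: int, n_a: int, hits_b: int, n_b: int):
    """Two-sided z-test for a difference between two proportions.

    hits_a / n_a: reports in the bucket among machines with configuration A.
    hits_b / n_b: the same for machines with configuration B.
    Returns (z, p_value).
    """
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    # Pooled proportion under the null hypothesis that both rates are equal.
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("pooled proportion is 0 or 1; the test is undefined")
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts, for illustration only.
z, p = two_proportion_z_test(hits_a=120, n_a=10_000, hits_b=45, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.3g}")
```

A small p-value only says the two rates differ; it does not show that the configuration causes the error, so a result like this is a lead to investigate rather than a conclusion.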