On telemetry spy scandals

Another day, another OH NOES, MICROSOFT ARE SPYING ON ME scandal.

I never understand this, particularly when it comes from developers, as in this case. Surely every piece of software you write has logging, metrics, error reporting, analytics and other telemetry features? If it doesn’t, then it should. As much as possible. You should want as much data as possible about how your application runs, what people are doing with it, and what environments it is running in.

Of course all the data collected should be anonymized; it should not contain personally identifiable information or any trace of the actual business data from the application. That goes without saying?

Why, then, should we be surprised or offended when the software that we use does the same thing? Who are these paranoid people who think that, of the millions or billions of data points collected by software from Microsoft or Apple, somebody somewhere cares specifically about them?

I wonder if any of these people ever actually look at metrics, through something like Splunk or Grafana or Kibana?

Screenshot from Grafana

When collecting and analyzing this kind of data, it’s really useful to create a non-identifying hash for users, so correlations can be identified: "people who do activity A tend to access the site from mobile devices", for example, or "slow build times are related to fewer test runs".
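As a rough illustration of what a non-identifying hash might look like, here is a minimal Python sketch. It assumes a keyed hash (HMAC-SHA256), which is my choice for the example and not necessarily what any particular product uses; `SECRET_KEY`, `pseudonymous_id` and the sample events are all hypothetical.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical secret key (an assumption for this sketch); in practice it
# would be stored separately from the telemetry data.
SECRET_KEY = b"example-telemetry-key"


def pseudonymous_id(user_id: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable token for a user without storing the identifier itself."""
    return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical raw events: (user identifier, activity, device)
raw_events = [
    ("alice@example.com", "activity_a", "mobile"),
    ("bob@example.com", "activity_a", "mobile"),
    ("alice@example.com", "activity_b", "desktop"),
    ("carol@example.com", "activity_a", "desktop"),
]

# Only the token is stored, so events from the same user can still be correlated.
events = [(pseudonymous_id(user), activity, device)
          for user, activity, device in raw_events]

# Aggregate question: which devices do people doing activity A use?
device_counts = Counter(device for _, activity, device in events
                        if activity == "activity_a")
print(device_counts)
```

The same input always yields the same token, which is what makes the correlation possible, while the stored events never contain the original identifier.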

Yes, in the case of a line-of-business (LoB) application with only dozens or hundreds of users, it would be simple to run the same hashing algorithm against the user table, match the results to the telemetry, and go over each individual person's activity stream. And yes, that’s also true for millions or billions of users. But why would people building software for millions or billions of users ever do that? What would be the point? The only way data collected at that scale has any kind of meaning is in aggregate, in charts and dashboards and reports and pattern-identification.
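To make the re-identification point concrete, here is a small Python sketch of hashing a user table and matching it against telemetry. It assumes a plain, unsalted SHA-256 hash of the user identifier; the `user_hash` function, the user table and the telemetry records are all invented for illustration.

```python
import hashlib


def user_hash(user_id: str) -> str:
    """Hypothetical hashing scheme: unsalted SHA-256 of the identifier."""
    return hashlib.sha256(user_id.encode("utf-8")).hexdigest()


# Hypothetical user table and telemetry records.
user_table = ["alice@example.com", "bob@example.com"]
telemetry = [
    {"user": user_hash("bob@example.com"), "event": "build_started"},
    {"user": user_hash("alice@example.com"), "event": "tests_run"},
    {"user": user_hash("bob@example.com"), "event": "build_finished"},
]

# Anyone with the user table and the algorithm can rebuild the mapping.
lookup = {user_hash(user): user for user in user_table}

# Group events per person, in the order they were recorded.
activity = {}
for record in telemetry:
    name = lookup.get(record["user"], "unknown")
    activity.setdefault(name, []).append(record["event"])

print(activity)
```

The lookup itself is trivial at any scale; the argument here is that nobody working with millions of users has a reason to run it.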

Whether it’s dotnet or Cortana or basically every video game you ever play, collecting masses of data feeds directly into making the product work better, and we should be welcoming that effort, not throwing our toys out of the pram and screaming into the social media echo chamber that our fellow engineers are part of an Evil Conspiracy, digging through our digital dustbins looking for ephemeral dirt.

