February 7, 2016

Heisenberg Developers

Finely grained management of software developers is compelling to a business. Any organization craves control. We want to know what we are getting in return for those expensive developer salaries. We want to be able to accurately estimate the time taken to deliver a system in order to do an effective cost-benefit analysis and to give the business an accurate forecast of delivery. There’s also the hope that by building an accurate database of estimates versus actual effort, we can fine tune our estimation, and by analysis find efficiencies in the software development process.

The problem with this approach is that it fundamentally misunderstands the nature of software development. That it is a creative and experimental process. Software development is a complex system of multiple poorly understood feedback loops and interactions. It is an organic process of trial and error, false starts, experiments and monumental cock-ups. …

Amen. I’ve never been in a development project where estimates actually work. I encourage you to read the entire article — it’s 100% on point.

February 5, 2016

Three and a half degrees of separation

How connected is the world? Playwrights [1], poets [2], and scientists [3] have proposed that everyone on the planet is connected to everyone else by six other people. In honor of Friends Day, we’ve crunched the Facebook friend graph and determined that the number is 3.57. Each person in the world (at least among the 1.59 billion people active on Facebook) is connected to every other person by an average of three and a half other people. The average distance we observe is 4.57, corresponding to 3.57 intermediaries or “degrees of separation.” Within the US, people are connected to each other by an average of 3.46 degrees.

Well, there we are. Not six but something closer to four. Or maybe even closer to three, given that the figure has shrunk each time Facebook—which continues to grow—has done one of these studies.

February 2, 2016

On The Witness and digital piracy

I’ve always taken a harsh stance against pirating digital goods, and reading Jonathan Blow’s recent tweets about how many people have pirated The Witness just made me more angry about the situation.

For this particular game, the chief excuse seems to be “$40 is too high”.

I understand that some people may not be able to afford The Witness. But even if the price is too high for someone, it doesn’t mean the value proposition isn’t there — quite the opposite. The Witness has over 50 hours of content. It has sublime graphics, great gameplay, and runs well on both platforms it’s currently on. In my estimation, it’s one of the best puzzle games ever made. Just because you can’t afford it doesn’t mean it’s okay not to pay for it.


When else would I get the chance to use a South Park episode for cover art?

The problem isn’t the $40 price tag. Price is just one in a long line of excuses pirates use to justify not paying for someone else’s work.

  • If a game has DRM: “I’m pirating it because I won’t tolerate DRM”
  • If a game is developed by a large company: “I’m pirating because company X doesn’t care about its customers”
  • If a game isn’t available on every platform under the sun: “I’m pirating because it’s not also available on X”
  • If a game isn’t available in every region yet: “I’m pirating because company X doesn’t give a shit about country Y”
  • If the game has DLC: “I’m pirating because they want us to pay for stuff that should have been in the base game”

The list goes on and on. Examining it any further is a futile exercise. Piracy will continue regardless of prevailing circumstances, because pirates are a bunch of self-entitled people who simply like free shit.

January 29, 2016

You know when someone gives a presentation and uses the word “we” when discussing something that could be done better? “We can improve here”, “we should not do this”.

In my experience, it always translates into “you, the audience”. Just because someone uses the collective “we” doesn’t mean the audience can’t see right through it.

January 24, 2016

Today’s purchases: Donkey Kong Jungle Beat for the GameCube (I own the bongos, so why not) and Mario’s Tennis for the Virtual Boy (I admit I haven’t got a good reason for this one).

That makes 549 games in my collection. I wonder if I’ll break 600 before year’s end.

January 21, 2016

SEGA just announced that the SEGA 3D Classics Collection is coming to retail in the US (it was only available in Japan previously).

I really wish they’d announce it for Europe, too. Mainly because M2 is behind the compilation, which ensures it’ll be absolutely top-notch.

How “Making a Murderer” Went Wrong

For those people, and for others close to the original case, “Making a Murderer” seems less like investigative journalism than like highbrow vigilante justice. “My initial reaction was that I shouldn’t be upset with the documentarians, because they can’t help that the public reacted the way that it did,” Penny Beerntsen said. “But the more I thought about it, the more I thought, Well, yeah, they do bear responsibility, because of the way they put together the footage. To me, the fact that the response was almost universally ‘Oh, my God, these two men are innocent’ speaks to the bias of the piece. A jury doesn’t deliberate twenty-some hours over three or four days if the evidence wasn’t more complex.”

The Avery case has been discussed ad nauseam in the press recently, but this is one of the more critical pieces I’ve read. Good stuff.

If it’s not statistically significant, it’s not worth your attention

Businesses love making surveys. They really do. Ask people a bunch of questions, aggregate the results, and you’ve got all the stuff your great PowerPoint presentation needs. Upper management adores this kind of shit.

But there’s a problem, and its name is statistical significance.

Too often I’ve seen results of surveys done with an extremely small sample size taken as gospel. Asking 5-10 people to answer questions on a Likert scale and then interpreting the data as something valid is absolutely laughable. “The average here is 1.2, so action must be taken”. With small sample sizes, such claims are a) outright lies and b) deserving of absolutely no attention whatsoever.

Let’s use an example to illustrate why such results shouldn’t be taken seriously.

Say you are a university professor and want to ask students of your course to give a simple rating from one to five (one = strongly disagree, five = strongly agree) on the following statement:

“The course material provided me with all the information I needed to do well in the exam.”

Further assume that 100 students have enrolled in the course, and although you send the survey to 50 randomly chosen students, you only get ten responses.

In such a scenario, the population is the set of 100 enrolled students. The response rate of the survey is 10/50 = 20%. The sample size is 10 (the number of students that responded).

Let’s say we get the following responses:

1, 5, 3, 3, 4, 5, 4, 3, 4, 4

Calculating the mean would give a rather respectable 3.6, enough for the professor in question to give himself or herself a pat on the back.

Or is it?

Let’s calculate some descriptive statistics. I cheated and used an online tool:

  • Mean: 3.60
  • Min: 1
  • Max: 5
  • Standard deviation: 1.17
  • Confidence interval (95%): 2.76 to 4.44
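
If you don’t want to trust an online tool (or me), here’s a minimal sketch in plain Python that reproduces those numbers. The only thing not computed is the t critical value for nine degrees of freedom, which is hardcoded from a standard t-table.

    # Reproduce the descriptive statistics above using only the standard library.
    import math
    import statistics

    responses = [1, 5, 3, 3, 4, 5, 4, 3, 4, 4]

    n = len(responses)
    mean = statistics.mean(responses)   # 3.60
    sd = statistics.stdev(responses)    # sample standard deviation, ~1.17
    sem = sd / math.sqrt(n)             # standard error of the mean

    t_crit = 2.262                      # two-sided 95% t critical value, df = 9
    margin = t_crit * sem

    print(f"mean: {mean:.2f}")
    print(f"standard deviation: {sd:.2f}")
    print(f"95% CI: {mean - margin:.2f} to {mean + margin:.2f}")   # 2.76 to 4.44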

It’s the last statistic that is the most interesting. Roughly speaking, it says that if we ran the survey over and over with different random samples, the intervals we calculated would contain the true mean about 19 times out of 20. Given our question, the only conclusion we can draw is that the course material may, at worst, be slightly poor (2.76/5.0) or, at best, very good (4.44/5.0). That may sound like a stupid conclusion, but that’s because it is. Intuitively, it tells us jack shit.

On second thought, it does tell us something valuable. It tells us to pay the survey results no mind whatsoever. They are useless. Our professor should know better.
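
To drive the point home, here’s a small, entirely made-up simulation (hypothetical population, arbitrary random seed): survey ten students at random from a population of 100 ratings, over and over, and watch how much the sample mean swings around.

    # Simulate many ten-person surveys drawn from an invented population of 100
    # ratings and look at how much the sample mean varies between surveys.
    import random
    import statistics

    random.seed(42)

    # A made-up population of 100 ratings on the one-to-five scale.
    population = [random.choice([1, 2, 3, 3, 3, 4, 4, 5]) for _ in range(100)]

    sample_means = [
        statistics.mean(random.sample(population, 10))   # one ten-person survey
        for _ in range(1000)
    ]

    print(f"population mean: {statistics.mean(population):.2f}")
    print(f"sample means ranged from {min(sample_means):.1f} to {max(sample_means):.1f}")

With only ten responses per survey, the sample mean lands all over the scale, which is exactly why a single ten-response survey deserves no weight on its own.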

So why are people so adamant about making surveys? I don’t know. Maybe it has to do with them being seemingly easy to make. Maybe it’s because people hope to get a large sample, thereby gleaning statistically significant information. Maybe it’s because people don’t know that it’s a) entirely possible and b) advisable to calculate the sample size you’ll need to obtain a statistically significant result before a survey is carried out.
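
As a rough sketch of what that up-front calculation could look like: assume we want the 95% interval to be no wider than ±0.5 points on the five-point scale, borrow the 1.17 standard deviation from the toy survey above as a guess, and apply a finite population correction for the 100 enrolled students.

    # Rough sample size estimate: how many responses for a 95% confidence
    # interval no wider than +/- 0.5 points on the one-to-five scale?
    import math

    z = 1.96       # two-sided 95% critical value (normal approximation)
    sigma = 1.17   # guessed standard deviation of the ratings
    margin = 0.5   # desired half-width of the confidence interval
    N = 100        # population size: the enrolled students

    n0 = (z * sigma / margin) ** 2    # estimate for a very large population
    n = n0 / (1 + (n0 - 1) / N)       # finite population correction

    print(math.ceil(n0), math.ceil(n))   # roughly 22 and 18 responses

Ten responses, in other words, was never going to cut it.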

Or maybe it’s because although the high-level statistical concepts are easy to grasp, statistical methods (both their inner workings and purpose) are much more complicated. I must confess that although I’ve studied statistics for years, much of it still baffles me. I’ve had the honour of working with many great data scientists and statisticians, and I’m still in awe of how concepts like confidence intervals, null hypotheses, t-tests, degrees of freedom and z-scores come naturally to them. I have to work tremendously hard at it, and the more I learn, the more like an imposter I feel.

My statistics professor once told me that statistics are everywhere, and once you realise that, you’ll understand just how important they are.

He wasn’t wrong.