Evolution of a Go program

About the development of Moyo Go Studio, software to (help) play the Oriental game of Go. Go is a two-player zero-sum game of perfect information. It is considered much harder than Chess. Currently, in spite of enormous effort expended, no computer program plays it above the level of a beginner.

Saturday, November 11, 2006

The Three Fucks

A good thing to know for aspiring programmers are a few coding style rules. I call them "the three fucks", and they are:

- What the fuck
- Why the fuck
- How the fuck

Almost no bad programmer practices the three fucks, and all good programmers practice them. (For those bad at logic: This does not imply that practicing those three rules by themselves make a good programmer!)

#1: What

It should be perfectly clear, two decades after code has been written, well after the original coder has been hit by a truck, for a non-domain expert to understand what the code is doing. With all comments stripped from the source.
The only thing to achieve this is meaningful method- and variable names. "I have heard this all before" you say and "I do give my identifiers meaningful names". No you don't. I dare you, prove to me that you do. Prove it to yourself at least. Most likely, you can't. Long names are not the same as descriptive names. Especially not for someone who has to maintain your code.

Every time you give a function a name, you have to think of something that explains exactly, in detail, what that function will be doing. That is a talent most often associated with folks that write headlines for newspaper articles. This talent is so rare that most good journalists don't even possess it. It's left to the pro's. What makes you think you can do it? I am sure you can't if you haven't done several years of hard effort and especially refactoring.

When maintaining legacy code, I often come across names like: "PrepareDatesForBubbleDiagram". I am sure the Original Programmer thought he'd done a good job in providing a descriptive function name, but in fact this name isn't worth the bytes it occupies. It says nothing about what the function does. That the function "prepares" dates can be learned at a glance, but WHAT does it do to those dates? Complicated stuff, that's for sure.

In Delphi and other languages, "DateTimes" are floating point numbers, with the part behind the decimal point being a fraction of a day. You can imagine that juggling around with fractions, doing all kinds of arithmetical operation on those numbers easily results in something that can't be easily understood merely by looking at the code. 791023.47104 simply does not look like August 12, 2005, 13:45, and adding some of those numbers and then rounding the result to two decimals is NOT going to tell the casual observer what's going on.

"PrepareDatesForBubbleDiagram" could, with the same clarity have been called: "DoSomethingSecretWithTheDatesForBubbleDiagram". Most code is full of those "descriptive" names.. Of course the right way to name such a function would be "NormalizeDatesToExcludeWeekends" or "ModifyDatesToFiveMinuteResolution". But that takes a bit more thinking and we are not prepared to do it, hyped-up on caffeine as we are or preferring to browse for funny video's on the net. To provide more clarity as to what the function does, also name variables that are not used as simple counters properly. Meaning that from their name should be exactly clear what they do. If that means a variable name takes 20 characters, so be it. Camel-capping is your friend.

#2: Why

"Why" is almost as important as "what"

If I don't know why something is done, I might just as well rip it out completely, see what happens and rewrite the whole thing, making it ten times faster in half as many lines. That's what I usually do when something doesn't work and I have no clue why because the OP abandoned the project.

Let's take "NormalizeDatesToExcludeWeekends" as an example. DateTimes are removed when they fall in a weekend, that I can deduce from the source and the name. But WHY?

Is it because an associated component can't handle them, or because the customer does not want to see weekends, or because the data on weekends is unreliable, or because weekends are not working days and 30% of space can be saved?

When people are working on your old code, it's often because it is buggy, or it's too slow, or features need to be added to it. It would help if it were clear what the code is is supposed to do in the first place. This is why comments have been invented. Some say "It's easy to abuse/overuse comments" but that's just as abusing/overusing money. There is no such thing as "too much money", only money ill-spent and ill-invested. The same goes for comments. There is not such thing as "too many" comments, only unclear, superfluous or misplaced comments. Comments are much less used for the "what" part as for the "why" part. "what" is taken care of by the code itself, the identifier names. "why" is of secondary importance and due to the verbosity involved, comments are better suitable for conveying "why". "Why" should cover the following:

A) Why the code does what it does
B) Why it is coded the way it is coded (including improvement suggestions)

B is most often ignored, even in the rare event of observing A.
With B, it becomes much easier to make refactoring decisions and discover bugs that pertain to algorithm and architecture.

#3 How

The deepest level of detail is the "how" of the code.
To be able to modify/bugfix code instead of replacing it completely, we need to know how it achieves a purpose, or, more accurately, how the author intended the code to work.

"How" is also covered by comments, and we again need to know two aspects:

A) How the algorithm is supposed to work (a description)
B) How the algorithm is implemented with sufficient detail for a non-domain expert to maintain its code.

An example of how the three fucks are covered in a piece of code:

This code is also an example of "coding hygiene". Assignments and declarations are logically grouped and column-aligned and logical groups of statements are separated by newlines.
Type declarations come in order of "complexity", pointers below their simple type.

Enforcing a 80-column width is counter-productive with today's hi-res screens. We don't live in the 80-column punch card era any more.

Using braces in expressions, even though operator precedence does not require it, makes clear what is intended, very important when bugs have to be fixed later on.
You can put statements on the same line to show they are related but remember that the debugger won't be able to step through the source line by line in that case.

The goal must always be writing the clearest code possible, not to save keystrokes. Most of your time should be spent designing a solution - on paper - anyway. Clean code saves time debugging but - most importantly - it saves a massive amount of time refactoring when it has become "legacy".