Wednesday, December 19, 2012

Commenting Code is Teaching

You have named your classes, functions and variables in the most expressive and descriptive way possible. All your classes have a single responsibility, functions are short enough to fit onto a single screen (relative as that is) and variables are never reused. The code is formatted according to the standard and it’s aesthetically impeccable (at least according to you). When all this is done, why would you put any real effort into writing comments? Your version control system will reject a check-in if you don’t write comments on your public, internal or protected members, but that’s just for show, as you can always simply put “TODO: Comment” and the system will happily accept it. There, policy satisfied, and who cares: it must have been some clueless apparatchik who came up with it. Your code is so obvious that it doesn't need any comments.

The above description is a caricature, but there is more than a grain of truth in it. The code is obvious at the time of coding only because your mental model at that time is clear and (after a lot of debugging) matches what the code is doing. What the code lacks is a way to communicate the actual intent behind it, and whoever follows you in reading it will lack this crucial information. More importantly, even if you had a way to perfectly communicate intent through code, what your code will never have, indeed cannot have, is what in the end wasn't coded. Coding is an art not only of what has been written but also of what hasn't been written. We constantly decide what to put in and what to leave out. We write, delete and rewrite all the time, searching for the best solution given our constraints. Then the constraints change and we do it all over again. In the end we settle on one solution to the exclusion of all others, and we hold the complete rationale for it. This rationale is one of the most important things we can teach to others, and this teaching is best done through comments, alongside the code they talk about. But a rationale should explain not only what was left in and why, but also why everything else was excluded.

Beman Dawes, the founder of boost.org, comments:

Failure to supply contemporaneous rationale for design decisions is a major defect in many software projects. Lack of accurate rationale causes issues to be revisited endlessly, causes maintenance bugs when a maintainer changes something without realizing it was done a certain way for some purpose, and shortens the useful lifetime of software.

Rationale is fairly easy to provide at the time decisions are made, but very hard to accurately recover even a short time later.

The tough part is that even when we know all this, our own mental model of what the code does is too complete for simpleminded commenting: we can no longer see what needs explaining. We have to step outside of ourselves, outside of our own thinking, and comment the code for others (including our future selves). And so to comment is to put oneself in others’ place and try to teach them what’s important and what’s not, what we decided to leave in and what we decided to leave out and, above all, why those decisions were made. This is why we should write comments in our code – not because of check-in policies but because our job isn't only what has been coded but so much more, and we should be, if nothing else, honor bound by our craft to try to teach those who follow us and make their work easier.
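To make it concrete, here is a small, made-up fragment (the type, names and numbers are invented, not from any real code base) of the kind of rationale comment I have in mind: it records not just what the code does but why the alternatives were rejected:

#include <vector>

struct Entry { int id; /* ... */ };

// Rationale: lookups happen only at startup and over at most a few dozen
// entries, so a plain linear search over the vector is enough. A std::map
// was tried first, but its allocations showed up in the startup profile and
// sorted order isn't needed anywhere else. Revisit this decision if the
// entry count ever grows past a few hundred.
const Entry* FindEntry(const std::vector<Entry>& entries, int id)
{
    for (auto it = entries.begin(); it != entries.end(); ++it)
    {
        if (it->id == id)
            return &(*it);
    }
    return nullptr;
}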

I treat all unclear, incomplete or (Turing forbid!) missing comments as defects during code reviews. And I expect to be called out on any lack of clarity and comprehensiveness in my own comments. It’s not a perfect check, since our coworkers are often steeped in the same issues and can thus understand more than a random future maintainer will. But it’s the only way to ensure that the code and the comments we write complete each other and that 10 years from now they will still be meaningful.

Wednesday, May 27, 2009

Gloating just a little bit: new meaning of auto keyword in C++0x

I know this blog has been kind of dead in the water for over a year now (strangely, I have made only two posts since I met my wife! makes me want to go hmmmm.... but of course I'll restrain myself ;) (yes, the big, huge, humongous personal news is that I got married to the WonderWoman... but I bet that everybody here, if there is anyone left... or, to be more precise, if there was anyone in the first place!... thinks that his/her better half is *the* WonderBetterHalf, so let's just leave it at that :)

Warning, unless you are a C++ developer this whole post won't mean much to you... Sorry about that. The next post will be about things we are doing in ApexSQL Log and ApexSQL Recover.

Anyway, since it's been such a long time, I thought I should blog about something that I have been waiting for... at least since 2002, when I first publicly suggested it in the now-defunct C/C++ Users Journal (a.k.a. CUJ). I wrote to the We Have Mail section of the September 2002 issue of CUJ... well, the letter is too long to reproduce here, but this is the meat of it:

"...my favorite pet idea for a core language extension: a keyword for an unknown type. It would work exactly like typeof extension, but without taking any parameters..."

After 7 years of waiting, today I installed Visual Studio 2010 Beta 1 and finally had a chance to run my first app with just such a language extension from the upcoming C++0x standard! :) I have known for quite some time that this extension has been officially introduced into the draft of the standard, but this is the first time I have had a chance to try it (more on the upcoming standard can be read here).

Anyway, here's the example I used (a bit uninspired, but I started from the incomplete example I used in my email to CUJ so...):

#include <iostream>
#include <vector>

template<class T>
void process(const T& v)
{
    // 'auto' deduces the iterator type from v.begin()
    for (auto it = v.begin(); it != v.end(); ++it)
    {
        std::cout << *it << std::endl;
    }
}

int main(int argc, char* argv[])
{
    std::vector<int> v;

    v.push_back(1);
    v.push_back(2);
    v.push_back(3);

    process(v);

    return 0;
}

This spits out:
1
2
3

Marvelous! For those who don't see much value in this, I will quote another example that I put forward in that original email:

"The previous example is rather simple, so advantage is not enough to warrant a core language extension. However, consider this example (adapted slightly from
The C++ Standard Library: A Tutorial and Reference by Nicolai M. Josuttis, page 306):

pos = find_if(coll.begin(), coll.end(), std::not1(std::bind2nd(std::modulus<int>
(),2)));

Imagine that we want to make a variable of the type returned by the std::not1(std::bind2nd(std::modulus<int>
(),2)) expression for further reuse..."

Then I went on to provide the exact type of this expression. Here it is for your reading pleasure:


std::unary_negate<std::binder2nd<std::modulus<int>>>


What a cute little type... NOT! Luckily I never had to use it in real-life development, and hopefully I never will, but I have used much worse and, I'm sorry to say, mostly of my own making. Even though BOOST_FOREACH removed most direct references to STL iterators from for loops, this will be a welcome relief from the tedium of not only writing things like:

std::map<SomeTraitsStructure::SomeKey, SomeTraitsStructure::SomeItem>::iterator it = m_objectsMap.begin();

but also from looking at them. Instead, the C++0x standard committee gave us:


auto it = m_objectsMap.begin();


I for one am very grateful for that! I can't wait for Visual Studio 2010 to be officially released so that I can get rid of all the verbose cruft.


Just to wrap up:
  1. Andrew Koenig answered that email on the pages of CUJ but at the time he had little hope that such an extension would be adopted. I'm very glad that the committee found a way to bring this forward.
  2. In no way do I think that I was the first to come up with this or any such nonsense. In fact, I believe it's a rather obvious extension (after all, the compiler of a strongly typed language such as C++ already knows everything it needs to know to make this work). But I'm glad that I made that suggestion and that it's the earliest public reference that I know of, so... I'm gloating today... just a little bit :)
  3. I'm quite aware of typedefs and have become equally tired of them in this context. Some would call it losing my religion, but it's just too much for me. I think the worst part is the naming: how do you name the type of an iterator over a map like the one I used above? SomeTraitsStructureKeyItemMapIterator? And all that for just one function... no thanks. I know I will love "auto" if nothing else for replacing all such typedefs (have I already mentioned that BOOST_FOREACH has really helped here?)
  4. The same "auto" feature has been introduced in C# in version 3.0: var keyword. I read here that MS warned about using "var" everywhere in the source code as it, purportedly, decreases readability of it and that it should be used only with anonymous types. I agree with this advice for C++ but to a degree. I wouldn't type:
auto i = 10;

I wouldn't use it even as a shorthand for something like a "boost::shared_ptr<>" instantiation. We make a typedef once for every type that needs it and then use just that, and I would keep that usage: seeing MyTypePtr is better than just "auto". But for types like the ones in the examples above, the "auto" keyword (or, to be precise, its new meaning) is really great. However, the real power of the "auto" keyword in C++ will show itself in generic programming, where today types have to follow naming conventions (in fact, IIRC, the original Koenig and Moo article was about type naming conventions). In C# generic programming is... damaged goods. It's useful, but it was lobotomized on purpose by its designers so it never approaches the usefulness of the C++ generic programming model (nor its complexity and trickiness; that was the whole point). So yeah, "var" in C# has to be treated differently than "auto" in C++ even if they are essentially the same.
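To make the generic programming point a bit more concrete, here is a small sketch of my own (the function and names are made up; it just reuses the Josuttis predicate from above inside a template): the exact types of both the predicate and the iterator depend on the template parameter and on library internals, and "auto" simply absorbs all of it:

#include <algorithm>
#include <functional>
#include <iostream>
#include <vector>

// Without 'auto', is_even would have to be declared as
// std::unary_negate<std::binder2nd<std::modulus<typename Container::value_type> > >.
template<class Container>
typename Container::const_iterator find_first_even(const Container& coll)
{
    auto is_even = std::not1(std::bind2nd(
        std::modulus<typename Container::value_type>(), 2));
    return std::find_if(coll.begin(), coll.end(), is_even);
}

int main()
{
    std::vector<int> coll;
    coll.push_back(3);
    coll.push_back(5);
    coll.push_back(4);

    // pos is deduced as std::vector<int>::const_iterator
    auto pos = find_first_even(coll);
    if (pos != coll.end())
        std::cout << *pos << std::endl; // prints 4

    return 0;
}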


Whew.... Ok, that's all for this year folks ;)

Friday, January 25, 2008

Expressiveness of languages

Jeff Atwood over at Coding Horror asks in his new post "What Can You Build in 600 Lines of Code?". It reminded me of Babel-17, a novel by Samuel R. Delany. I must have read that novel 10 to 15 years ago and by now I have forgotten most of the details in it (and I'm now again grateful to Wikipedia for having reminded me of some of them). However, among the several things that I do remember (some strong images, the transformation of one of the characters) there is one particular passage that stuck with me even now, after all these years, and readily sprang to mind when I read Jeff's post. Here it goes (the heroine Rydra talking to a character named Butcher about the expressiveness of languages):
"... Now: there is a huge solar-energy conversion plant that supplies all the electrical energy for the Court. The heat amplifying and reducing components take up an area a little bigger than Tarik. One (,'iribian can slither through that plant and then go describe it to another ^iribian who never saw it before so that the-second can build an exact duplicate, even to the color the walls are painted—and this actually happened, because they thought we'd done something ingenious with one of the circuits and wanted to try it themselves—where each piece is located, how big it is, in short completely describe the whole business, in nine words. Nine very small words, too,"

The Butcher shook his head. "No. A solar-heat conversion system is too complicated. These hands dismantle one, not too long ago. Too big. Not—"

"Yep, Butcher, nine words. In English it would take a couple of books full of schematics and electrical and architectural specifications. They have the proper nine words- We don't."

"Impossible."
I always remember that passage by its crucial part: "They have the proper nine words. We don't." I have always found "nine words" to be really exaggerated, but it's a compelling idea that has stuck with me all these years. It goes to the core of the expressiveness of languages in their proper context and is an example of domain-specific languages: for the Çiribians of the novel, "their whole culture is based on heat and changes in temperature", so they have nine very short words to describe a whole solar-energy conversion plant.

Going back to Jeff's example of a production application in less than 600 lines: Ruby leveraged by Ruby On Rails is really expressive for certain types of problems. It is a web application framework, after all, and it's really expressive for the problems of web applications. And it is this narrowing of context that allows it to be so expressive. It is built on a whole stack of software and hardware to get to the point where fewer than 600 lines of code can become a full-blown application:
1. Ruby On Rails is written in Ruby.
2. Ruby's interpreter is written in C and relies on the C run-time library.
3. The C run-time library is compiler- and OS-specific and is written in C itself, OS-specific calls and probably some assembly.
4. The C compiler needed to build the Ruby interpreter and the C run-time library was probably written in C itself and some assembly.
5. The OS was also most probably written in C and some assembly.
6. An assembler is needed to translate assembly into machine code.
7. The generated machine-specific code (either from the C compiler or the assembler) is executed on the CPU's microcode.
8. The CPU's microcode was designed in software tools which are themselves written in some computer language, running on an OS and on some previously designed hardware. And so on.

Not to mention that you also have a whole set of protocols (HTTP, TCP/IP), software, drivers, network cards, switches, networks and so on that allow you to transfer data to and from Ta-Da (the application in question). At the top of all this stack is the very specific context of web applications in which Ruby On Rails excels.

Now consider the following passage from the same novel (same conversation, immediately prior to the passage I quoted above):
Take the Çiribians, who have enough knowledge to sail their triple-yoked poached eggs from star to star: they have no word for 'house', 'home', or 'dwelling'. 'We must protect our families and our homes.' When we were preparing the treaty between the Çiribians and ourselves at the Court of Outer Worlds, I remember that sentence took forty-five minutes to say in Çiribian. Their whole culture is based on heat and changes in temperature. We're just lucky that they do know what a 'family' is, because they're the only ones besides humans who have them. But for 'house' you have to end up describing '. . . an enclosure that creates a temperature discrepancy with the outside environment of so many degrees, capable of keeping comfortable a creature with a uniform body temperature of ninety-eight-point-six, the same enclosure being able to lower the temperature during the months of the warm season and raise it during the cold season, providing a location where organic sustenance can be refrigerated in order to be preserved, or warmed well above the boiling point of water to pamper the taste mechanism of the indigenous inhabitants who, through customs that go back through millions of hot and cold seasons, have habitually sought out this temperature-changing device . . .' and so forth and so on. At the end you have given them some idea of what a 'home' is and why it is worth protecting. Give them a schematic of the air-conditioning and central heating system and things begin to get through.
So whereas the Çiribian language can express "solar-energy conversion plant" in "nine very small words", it took "forty-five minutes" to say "We must protect our families and our homes." in it. (Again, I find these examples exaggerated... but I haven't yet had to communicate with an extraterrestrial culture, so for all I know this could really turn out to be close to the truth.) So even though you can write a whole application in Ruby On Rails in under 600 lines of code, it only works in the context for which Ruby On Rails was designed. Change the context and it would probably become incredibly inefficient at expressing the solution, or outright incapable of it. How would you write an OS driver in Ruby On Rails? I would probably use Ruby On Rails to build a web-based C compiler and linker, then write the driver in C and compile and link it with that!

Disclaimer: this is most definitely not a critique of Ruby On Rails! We use different languages in different contexts and today I wouldn't try to write a web application in C. I'm just discussing the expressiveness of languages in their contexts and outside of them.

Update: fixed some minor typos and layout errors.

Wednesday, January 23, 2008

CAM world *intersecting* SQL Server world

I never thought that my current area of work (in general, auditing and recovery of SQL Server databases) would somehow *intersect* (pun intended - see below) with my previous efforts in 2D CAM (Computer Aided Manufacturing). But the SQL Server 2008: Spatial indexes article by Paul Randal proved me wrong. In the article Paul explains how SQL Server 2008 spatial indexes work, but what was most interesting to me was that the solution he describes matches a solution I applied in a 2.5D CAM application some 10 or more years ago. Don't get me wrong, that doesn't make me a rocket scientist, since anyone who has had to work with intensive 2D geometry calculations sooner or later figures out that calculating the intersection of curves is expensive (sometimes more, sometimes less: it depends on the curves), so avoiding it in advance by using "bounding boxes" is, in the general case, much better than blindly trying to calculate every intersection. I remember that one great advantage in my case was that we were able to use comparisons of integers instead of floating-point numbers, which further sped up the algorithm. Also, in our case there was just one "bounding box" per curve, which was more coarse but didn't use much memory and was easily calculated.
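Just to sketch the idea in code (my reconstruction in C++, with made-up names and an assumed integer grid; the original Impakt! code was Delphi): keep one axis-aligned bounding box per curve in integer units, and only run the expensive curve-curve intersection when the boxes overlap.

// Axis-aligned bounding box in integer coordinates (e.g. geometry snapped
// to micrometers), so the rejection test is just four integer comparisons.
struct BoundingBox
{
    int minX, minY, maxX, maxY;
};

// If the boxes don't overlap, the curves inside them cannot intersect,
// so the expensive curve-curve intersection math can be skipped entirely.
inline bool MayIntersect(const BoundingBox& a, const BoundingBox& b)
{
    return a.minX <= b.maxX && b.minX <= a.maxX &&
           a.minY <= b.maxY && b.minY <= a.maxY;
}

// Usage (ComputeIntersections standing in for the hypothetical expensive routine):
// if (MayIntersect(boxOfA, boxOfB))
//     ComputeIntersections(curveA, curveB);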

The 2.5D CAM application that I refer to above was called "Impakt!". It was used to generate G-code programs for 2.5D CNC (Computer Numerical Control) mills. The half "D" means that the mill could only position itself in height, without really orienting the tool in three dimensions. Thus the geometrical problems were reduced to creating optimal 2D cutting paths and then additionally repositioning the tool between the different 2D depth levels.

We started development of Impakt! in Delphi 2.0 sometime in 1996. Impakt! worked like this:
1. You would load 2D geometry from a file in Autodesk's DXF format.
2. You would define the depth of each contour by choosing each area and then setting the depth (this could then be saved in our own file format that kept depth and other information).
3. You would define your tool set for the job (we kept a DB of tools and users were able to define their own tools).
4. The application would then generate tool paths by going through the following process:
4.1 Take the tool with the largest diameter in the tool set and start with the contours on the highest level of the defined geometry.
4.2 Analyze the contours and identify "lakes" (outward contours) and "islands" (inner contours).
4.3 Grow "islands" outward and shrink "lakes" inward by the radius of the current tool.
4.4 Solve intersections of the grown/shrunk contours, thus defining the tool path. This really solves one particular path through the Voronoi diagram of the 2D contours available on the given depth level (I can't remember why we didn't just calculate Voronoi diagrams and then generate paths... I think there were cases we couldn't or didn't know how to handle with Voronoi diagrams, and I don't think we even had the math for it, at first anyway).
4.5 Continue with the process in 4.2 and 4.3 until all contours fold onto themselves.
4.6 Take the next smaller tool in the tool set, take the left-over contours from the last successful iteration of 4.5 and start again at 4.3.
4.7 Once there are no more available tools in the tool set, go to the next depth level as defined by the geometry and start again at 4.2.
4.8 Once you have generated geometry for all depth levels, you are finished.
5. You would then define cutting parameters for the job (these only influenced G-code generation).
6. G-code (or some other type of CNC code, depending on the loaded modules) would be generated from the tool paths and cutting parameters. Unlike geometry generation, where paths for all tools were calculated at once for each depth level, G-code was generated by following the paths of the largest tool on all depth levels, then changing the tool to the next smaller one and following all its paths on all depth levels, and so on down to the smallest tool in the tool set.
7. If I remember correctly, we had a separate tool for transferring the G-code (over RS-232!) to the CNC machines.

Heh, more and more stuff is starting to crawl out from the back of my brain. I just remembered that once a contour was identified as an "island" or a "lake", we would reorient it, if necessary, so that it went counter-clockwise if it was an "island" and clockwise if it was a "lake" (or vice versa, it doesn't really matter as long as they have different orientations). This allowed us to grow/shrink the contours with the same parameters while ignoring the orientation, as they would then "naturally" either grow or shrink depending on the orientation we gave them.
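For what it's worth, here is a minimal sketch of that orientation test (again my reconstruction, not the original Delphi code): the sign of the contour's area, computed with the shoelace formula, tells you whether it runs clockwise or counter-clockwise, and reversing the vertex order flips it.

#include <algorithm>
#include <cstddef>
#include <vector>

struct Point { double x, y; };

// Twice the signed area of a closed contour (shoelace formula):
// positive means counter-clockwise, negative means clockwise.
double SignedArea2(const std::vector<Point>& contour)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < contour.size(); ++i)
    {
        const Point& a = contour[i];
        const Point& b = contour[(i + 1) % contour.size()];
        sum += a.x * b.y - b.x * a.y;
    }
    return sum;
}

// Force "island" orientation (counter-clockwise here); a "lake" would be the reverse.
void MakeCounterClockwise(std::vector<Point>& contour)
{
    if (SignedArea2(contour) < 0.0)
        std::reverse(contour.begin(), contour.end());
}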

I hope that someday we will open-source the code of "Impakt!". I would probably be rather embarrassed by the quality of the code, but maybe somebody could find some use for it.

Monday, November 26, 2007

Phishing is not an "externality"

I'm no security expert, not even close (I just read about it), while Bruce Schneier is a truly world-renowned security expert. I'm an avid reader of his monthly newsletter and, far more importantly, Neal Stephenson thanked him in Cryptonomicon, which is ummmm... words fail me, but let's say awesome. However, there is one particular hypothesis of Bruce Schneier's that I never bought into, not even a little bit: the "our customers are victims of phishing but it isn't affecting us" hypothesis of phishing as an externality. In this article (and several other places) he claimed that "Financial institutions have no incentive to reduce those costs of identity theft because they don't bear them." Again, I'm no security expert, but I never agreed with that sentiment; it seems obvious to me that customers leaving financial institutions over phishing problems is a direct cost, even if financial institutions are unaware of it or are ignoring it (it's an entirely different problem if that's the case).

This new study indicates that financial institutions do indeed bear costs of phishing and, what's more, phishing seems to affect them at their core: by jeopardizing the trust people have in their brands. I don't know how many times I have bought an item from Amazon.com even when it was more expensive, just to avoid giving my data to an unknown merchant. That's the power of a brand. If the study is correct (and it does need to be confirmed by more studies) then I think the "phishing is an externality" hypothesis can be safely rejected (most importantly by companies that adhere to it through ignorance or bad management).

Tuesday, November 06, 2007

ApexSQL Log 2005.10 released + API

The big news this week is that we have released ApexSQL Log 2005.10 together with ApexSQL Log API 2005.10. Yup, the API is out there for all you people who have expressed interest in a programmable transaction log reading API over the past couple of years. But let's start with ApexSQL Log.

There are three major enhancements in this release of ApexSQL Log:
1. Support for ApexSQL Log API. The two applications have shared the same server-side components right from the start, so you can run them in parallel on the same server by design.
2. Improvements to the UPDATE reconstruction process. Due to the way SQL Server logs UPDATE operations, their auditing is the Achilles' heel of auditing with transaction logs. However, in this new version we have again improved the process, managing to extract more data than ever. It is still not infallible (and it never will be unless SQL Server's way of logging UPDATE operations changes), but it's *very* good indeed.
3. Support for online transaction log reading on Vista x64 and, much more importantly, on the upcoming Windows Server 2008 (x64 and IA64, but more on that below).

Here are two enhancements that we didn't deem major since they are experimental:
1. Experimental support for the Itanium (IA64) platform for SQL Server 2005 IA64 and SQL Server 2000 64-bit.
2. Experimental support for SQL Server 2008 on all platforms (x86, x64 and IA64). This includes support for the new data types (DATE, DATETIME2, DATETIMEOFFSET and TIME).

Yes, as you can see, we can add support for Itanium and SQL Server 2008 and not call it a major feature simply because it is experimental. For comparison, try finding another transaction log reading application that supports even SQL Server 2005 on x64.

What does "experimental support" means? It means that it works (and it all really does work) but that we don't support it officially which in turn means you get support *anyway* and as always we try to fix problems ASAP *anyway* but you understand that this support hasn't been as thoroughly tested as with our other platforms.

Now let's move on to ApexSQL Log API. The API exposes the DML auditing features of ApexSQL Log. Everything ApexSQL Log has in this regard (reading of online/detached/backup transaction logs, filtering, old/new table ID mapping, etc.) is exposed in the API and works just like it does in ApexSQL Log. So what's missing? Missing are:
1. The Recovery Wizard: if you need to recover from data loss (deleted data without a transaction log, truncated and dropped tables, corrupted MDF files) you will need to grab ApexSQL Log.
2. DDL auditing. In this initial version, at least, we are exposing only DML auditing.
3. Out-of-the-box exports into XML, CSV and so on. All of these can be built using the API, so we didn't include them. We are evaluating publishing export classes built on the API just to demo the technology.
4. Command Line Interface and GUI. You would need to build those yourself, but it can be done with the API.

I'll post more soon on the way the API is used. Regarding licensing and related matters (like distribution) I would recommend that you consult here.

From now on I'll hopefully be writing a bit more (would I bet on it, you say?! well... what odds are you giving me ;) There are several parallel projects that I'm involved with but can't discuss right now. Suffice it to say that ApexSQL Log (and the API) will be getting some pretty cool stuff in the ApexSQL Log 2008 release, and the same goes for some other products of ours (and one completely new one...)

Monday, July 30, 2007

Fast forward 2 months

I see that it's been more than 2 months since the last time I posted. But things have been moving in the fast lane, and just this last Friday we released ApexSQL Log API to QA - hope you see it in our offer soon. The release of ApexSQL Log API will be accompanied by the release of ApexSQL Log 2005.10, which has also been released to QA. Both share the same set of server-side components, but more on that in a dedicated post.

What have I missed blogging about? Well, apart from SQL Server/development stuff, there is the obvious 38th anniversary of the first moon landing (and the 8th anniversary of my arrival in Chile - they fall on the same day! how geeky is that??). Then the not so obvious: 100 years since the birth of Robert Anson Heinlein.

Oh, and I'm blogging this at the Miami airport, waiting for my connecting flight to Raleigh - and from there to our HQ in Chapel Hill. Among other things I have an unsettled debt there... No, I haven't prepared for it. I haven't played basketball since that fateful day last year... But I'm counting on inspiration... or something... anything!

More on Log stuff and development soon, I (kinda) promise.