Becoming agile

Agile, through the storms


Everything is Hunky Dory, Always, No Matter What

What might it mean when you ask someone “how are you?” and the answer is always “hunky dory”?
Gene Hughson, in his latest post, refers to The Daily WTF’s post on a system that never reports an error. This provokes several thoughts.

How it all went pear-shaped (photo by Ted and Dani Percival)

1. There is a fantasy of creating fail-safe systems. There is a similar fantasy of creating fail-safe organizations and teams. The truth is that we should much prefer systems (both software and organizational) that are safe-to-fail. Since most software systems and most organizations operate in complex environments that are impossible to predict, knowing what failed is paramount to the evolution of the system.
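The contrast can be sketched in a few lines of Python. This is my own illustration, not code from the Daily WTF post: the names `hunky_dory_save` and `db` are hypothetical stand-ins for whatever the real system did.

```python
import logging

def hunky_dory_save(record, db):
    """The fail-safe fantasy: no error is ever reported, so none 'exists'."""
    try:
        db.write(record)
    except Exception:
        pass  # everything is hunky dory, always, no matter what

def safe_to_fail_save(record, db):
    """Safe-to-fail: the failure is logged and re-raised, so the system
    (and the organization around it) can learn what went wrong."""
    try:
        db.write(record)
    except Exception:
        logging.exception("failed to save record %r", record)
        raise  # let the caller, and the operators, know
```

The first version is the one that always answers “hunky dory”; the second admits failure loudly enough for someone to act on it.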

By analogy, we would love our children to magically get top grades, be friends with everyone they wish, and so on. However, we become much better parents if we invest effort in helping them tell us when they need our help.

2. In the past, Windows on the x86 architecture had one magical error that covered everything: the General Protection Fault. Maybe the fantasy at Intel and/or Microsoft was that x86 with Windows 3.1 was such a high-quality system, failing so seldom, that a single GPF would suffice for those rare occasions (of course, this is a wild and unvalidated hypothesis, for the sake of the argument only).
In practice, even the infamous Dr. Watson, in “his” first versions, was not good enough to tell us what was wrong, and additional tools were required.
Luckily, today Windows combined with third-party tools is much better at telling us what went wrong.
Moreover, modern tools tell us what is going wrong now, and even what is about to go wrong.

3. Conway’s Law tells us that a product’s architecture is a reflection of the structure of the organization that builds it (and vice versa).
Regarding B.M.’s co-worker in the Daily WTF post, rather than putting the responsibility on him or her, I wonder whether and how their organization is structured to hide faults, and what it means there to admit having made an error.

In organizational life, when your team members keep telling you that everything is OK, good advice is to explore how, together, you contribute to their not telling you when they could use your help. What are you collectively avoiding in order not to address the real issues?
Similarly, if you are getting frequent customer complaints about problems that are undetectable before the product is released, good advice is to explore how your architecture contributes to hiding such errors away.

Don’t Worry, Be Happy… Until One Day

As with the two other posts I refer to here, a disclaimer: this is not a political post.

Gene Hughson has recently written on the US project, in response to Uncle Bob’s post from November 12th.

This is not the first time a software failure has caused severe damage to a mammoth project. Here’s a short quote from Wikipedia on the first launch of Ariane 5:

Ariane 5’s first test flight (Ariane 5 Flight 501) on 4 June 1996 failed, with the rocket self-destructing 37 seconds after launch because of a malfunction in the control software. A data conversion from 64-bit floating point value to 16-bit signed integer value to be stored in a variable representing horizontal bias caused a processor trap (operand error) because the floating point value was too large to be represented by a 16-bit signed integer.

The failed conversion points to a basic flaw in computer programming, one often experienced by novice engineers. One would expect a high-profile aerospace project to hire better engineers than that, wouldn’t you agree?
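The class of bug in the quote can be reproduced in a few lines. This is a hedged Python sketch of the narrowing conversion (the function name `to_int16` is mine, and the actual Ariane software was written in Ada), not the flight code itself:

```python
def to_int16(x: float) -> int:
    """Narrow a float to a signed 16-bit integer, checking the range first.
    An unchecked narrowing traps (or silently wraps) once the value
    leaves the representable range [-32768, 32767]."""
    n = int(x)
    if not -32768 <= n <= 32767:
        # The Ariane 501 software had no handler for this case:
        # the resulting operand error shut the inertial reference units down.
        raise OverflowError(f"{x!r} does not fit in a signed 16-bit integer")
    return n
```

On Ariane 4, the horizontal-bias value had always stayed within range, so the conversion had never failed; Ariane 5’s different flight profile pushed it out of range, and the unhandled trap followed.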

Uncle Bob Martin thinks so:

[…] So, if I were in government right now, I’d be thinking about laws to regulate the Software Industry. I’d be thinking about what languages and processes we should force them to use, what auditing should be done, what schooling is necessary, etc. etc. I’d be thinking about passing laws to get this unruly and chaotic industry under some kind of control.
If I were the President right now, I might even be thinking about creating a new Czar or Cabinet position: The Secretary of Software Quality. Someone who could regulate this misbehaving industry upon which so much of our future depends.

Moreover, Uncle Bob refers to another aerospace disaster – the Challenger explosion, and the engineers’ responsibility in not stopping the launch:

It’s easy to blame the managers. It’s appropriate to blame the managers. But it was the engineers who knew. On paper, the engineers did everything right. But they knew. They knew. And they failed to stop the launch. They failed to say: NO!, loud enough for the right people to hear.

In response, Gene Hughson writes:

Considering that all indications are that the laws and regulations around government purchasing and contracting contributed to this mess, I’m not sure how additional regulation is supposed to fix it.

Sadly for our industry, I agree with Gene. Yes, engineering practice has, on the whole, a long, long way to go before it is anywhere near excellent. I have a lot of respect for Uncle Bob for his huge contribution there.

But the Challenger disaster was first and foremost not an engineering failure. The disastrous potential of the problematic seal had been known for a long time before it actually materialized, to everyone’s shock.

“The Rogers Commission found NASA’s organizational culture and decision-making processes had been key contributing factors to the accident. NASA managers had known contractor Morton Thiokol’s design of the SRBs contained a potentially catastrophic flaw in the O-rings since 1977, but failed to address it properly. They also disregarded warnings (an example of “go fever”) from engineers about the dangers of launching posed by the low temperatures of that morning and had failed in adequately reporting these technical concerns to their superiors.”

At the end of the day, it boils down to the fact that NASA’s leadership operated under the false belief that with every successful launch of the shuttle, the risk of the seal failing decreased, the complete opposite of common sense.

Mr. Larry Hirschhorn has an excellent description of this in his book The Workplace Within.

In such an atmosphere, when my managers, and their managers, are so indifferent to life-threatening flaws, heck, why should I exercise excellence in my mundane tasks? Why should I risk my own livelihood? After all, this is the culture here, in this workplace.

It is heartbreaking that the loss of the Columbia can be attributed to management pitfalls similar to those of the Challenger:

In a risk-management scenario similar to the Challenger disaster, NASA management failed to recognize the relevance of engineering concerns for safety for imaging to inspect possible damage, and failed to respond to engineer requests about the status of astronaut inspection of the left wing. Engineers made three separate requests for Department of Defense (DOD) imaging of the shuttle in orbit to more precisely determine damage.

Coming back to Uncle Bob’s conclusions: in his talk How Schools Kill Creativity, Sir Ken Robinson points out that the school system, in its efforts to teach, kills creativity in favor of grades. We can only assume that legislating computer-engineering studies will, at best, not harm existing engineering quality. It will probably achieve worse: well-certified engineers with little ability or drive to excel.

This failure has little to do with teaching and certifications, and all too much to do with culture, professionalism, and plain simple awareness.

When managers practice such an “It will be OK” attitude, everyone does. By the sound of it, the failure discussed here is not that far off.

In 1992, Prime Minister Yitzhak Rabin spoke at the Staff and Command College to prospective senior officers. Here’s what he had to say about “It will be OK”:

One of our painful problems has a name. A given name and a surname. It is the combination of two words – ‘Yihyeh B’seder’ [“it will be OK”]. This combination of words, which many voice in the day to day life of the State of Israel, is unbearable.

Behind these two words is generally hidden everything which is not OK. The arrogance and sense of self confidence, strength and power which has no place.

The ‘Yihyeh B’seder’ has accompanied us already for a long time. For many years. And it is the hallmark of an atmosphere that borders on irresponsibility in many areas of our lives.

The ‘Yihyeh B’seder’, that same friendly slap on the shoulder, that wink, that ‘count on me’, is the hallmark of the lack of order; a lack of discipline and an absence of professionalism; the presence of negligence; an atmosphere of covering up; which to my great sorrow is the legacy of many public bodies in Israel – not just the IDF.

It is devouring us.

And we have already learned the hard and painful way that ‘Yihyeh B’seder’ means that very much is not OK.


No, Uncle Bob, engineers are not to blame for this. Management must take responsibility for nourishing a culture that allows such poor standards.
