Becoming agile

Agile, through the storms

Archive for the category “Quality”

Everything is Hunky Dory, Always, No Matter What

What might it mean when you ask someone “how are you?” and the answer is always hunky dory?
Gene Hughson, in his latest post, refers to The Daily WTF’s post on a system that never reports an error. This provokes several thoughts.

How it all went pear-shaped. Photo by Ted and Dani Percival, http://www.flickr.com/photos/tedpercival/

1. There is a fantasy of creating fail-safe systems. There is a similar fantasy of creating fail-safe organizations and teams. The truth is that we would much prefer systems (both software and organizational) that are safe-to-fail. Since most software systems and most organizations operate in complex environments that are impossible to predict, knowing what failed is paramount to the evolution of the system.

By analogy, we would love our children to develop magical qualities that get them top grades, make them friends with everyone they wish, etc. However, we become much better parents if we invest our efforts in helping them tell us when they need our help.

2. In the past, Windows, based on the x86 architecture, had a magical error that told us everything: General Protection Fault. Maybe the fantasy at Intel and/or Microsoft was that the x86 with Windows 3.1 was such a high-quality system, failing so rarely, that GPF was all that was required (of course, this is a wild and unvalidated hypothesis for the sake of the argument only).
In practice, even the infamous Dr. Watson, in “his” first versions, was not good enough to tell us what was wrong, and additional tools were required.
Luckily today Windows combined with 3rd party tools is much better at telling us what went wrong.
Moreover, modern tools tell us what is going wrong now, and even what is about to go wrong.

3. Conway’s Law tells us that the architecture of the organization is a reflection of the product architecture (and vice versa).
Relating to B.M’s co-worker in The Daily WTF’s post, rather than putting the responsibility on him/her, I wonder if and how their organization is structured to hide faults, and what it means there to admit having made an error.

In organizational life, when your team members keep telling you that everything is OK, good advice is to explore how you contribute together to their not telling you when they could use your help. What are you collectively avoiding instead of addressing the real problems?
Similarly, if you are getting frequent customer complaints about problems that go undetected before the product is released, good advice is to explore how your architecture contributes to hiding such errors.
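
To make the contrast concrete, here is a minimal Python sketch of the two attitudes. It is not the code from The Daily WTF’s post; the function names and the quantity-parsing task are hypothetical, chosen only for illustration.

```python
import logging

logger = logging.getLogger("orders")  # hypothetical logger name


def parse_quantity(raw):
    """Hypothetical operation that can fail on bad input."""
    return int(raw)


def hunky_dory(raw):
    """Anti-pattern: every failure is swallowed and reported as success."""
    try:
        parse_quantity(raw)
    except Exception:
        pass  # the error disappears here; nobody will ever know
    return "OK"


def safe_to_fail(raw):
    """The failure is logged and handed back to the caller, who can react."""
    try:
        return {"ok": True, "value": parse_quantity(raw)}
    except ValueError as exc:
        logger.warning("could not parse quantity %r: %s", raw, exc)
        return {"ok": False, "error": str(exc)}
```

With the input "abc", the first function still answers "OK", while the second reports what went wrong and leaves the decision to the caller.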

Don’t Worry, Be Happy… Until One Day

Continuing the disclaimer of two other posts I am referring to – this is not a political post.

Gene Hughson has recently written on the US healthcare.gov project, in response to Uncle Bob’s post from November 12th.



This is not the first time that a software failure has caused severe damage to a mammoth project. Here’s a short quote from Wikipedia on the first launch of Ariane 5:

Ariane 5’s first test flight (Ariane 5 Flight 501) on 4 June 1996 failed, with the rocket self-destructing 37 seconds after launch because of a malfunction in the control software. A data conversion from 64-bit floating point value to 16-bit signed integer value to be stored in a variable representing horizontal bias caused a processor trap (operand error) because the floating point value was too large to be represented by a 16-bit signed integer.
Source: http://en.wikipedia.org/wiki/Ariane_5#Notable_launches

The data conversion I have emphasized points to a basic flaw in computer programming, often experienced by novice engineers. One would expect a high-profile aerospace project to hire better engineers than that, don’t you agree?
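
For readers who have not met this class of bug, here is a small Python sketch of the conversion described in the quote. It is not the Ariane flight software; the function names are mine, and the two strategies shown (refuse or clamp) are just the common options, not a claim about what the right choice would have been on board.

```python
INT16_MIN, INT16_MAX = -(2**15), 2**15 - 1  # range of a 16-bit signed integer


def to_int16_checked(value: float) -> int:
    """Convert to a 16-bit signed integer, refusing values that do not fit."""
    truncated = int(value)  # int() itself raises on NaN or infinity
    if not INT16_MIN <= truncated <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a 16-bit signed integer")
    return truncated


def to_int16_saturating(value: float) -> int:
    """Alternative: clamp out-of-range values to the nearest representable one."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))
```

Either way, the out-of-range case is decided explicitly by the programmer instead of being left to whatever the processor does with it.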

Uncle Bob Martin thinks so:

[…] So, if I were in government right now, I’d be thinking about laws to regulate the Software Industry. I’d be thinking about what languages and processes we should force them to use, what auditing should be done, what schooling is necessary, etc. etc. I’d be thinking about passing laws to get this unruly and chaotic industry under some kind of control.
If I were the President right now, I might even be thinking about creating a new Czar or Cabinet position: The Secretary of Software Quality. Someone who could regulate this misbehaving industry upon which so much of our future depends.

Moreover, Uncle Bob refers to another aerospace disaster – the Challenger explosion, and the engineers’ responsibility in not stopping the launch:

It’s easy to blame the managers. It’s appropriate to blame the managers. But it was the engineers who knew. On paper, the engineers did everything right. But they knew. They knew. And they failed to stop the launch. They failed to say: NO!, loud enough for the right people to hear.

In response, Gene Hughson writes:

Considering that all indications are that the laws and regulations around government purchasing and contracting contributed to this mess, I’m not sure how additional regulation is supposed to fix it.

Sadly for our industry, I agree with Gene. Yes, engineering practice has, on the whole, a long, long way to go to become anywhere near excellent. I have a lot of respect for Uncle Bob for his huge contribution there.

But the Challenger disaster is first and foremost not an engineering failure. The disastrous potential of the problematic seal was known for a long time before it actually materialized, to everyone’s shock.

The Rogers Commission found NASA’s organizational culture and decision-making processes had been key contributing factors to the accident. NASA managers had known contractor Morton Thiokol’s design of the SRBs contained a potentially catastrophic flaw in the O-rings since 1977, but failed to address it properly. They also disregarded warnings (an example of “go fever”) from engineers about the dangers of launching posed by the low temperatures of that morning and had failed in adequately reporting these technical concerns to their superiors.
Source: http://en.wikipedia.org/wiki/Space_Shuttle_Challenger_disaster

At the end of the day, it boils down to the fact that NASA’s leadership was operating under the false belief that with every launch of the shuttle, the risk of the seal failing decreased, contrary to common sense.

Mr. Larry Hirschhorn has an excellent description of this in his book The Workplace Within.

In such an atmosphere, when my managers, and their managers, are so indifferent to life-threatening flaws, heck, why should I exercise excellence in my mundane tasks? Why should I risk my own livelihood? After all, this is the culture here, in this workplace.

It is heartbreaking that the loss of the Columbia can be attributed to management pitfalls similar to those behind the loss of the Challenger:

In a risk-management scenario similar to the Challenger disaster, NASA management failed to recognize the relevance of engineering concerns for safety for imaging to inspect possible damage, and failed to respond to engineer requests about the status of astronaut inspection of the left wing. Engineers made three separate requests for Department of Defense (DOD) imaging of the shuttle in orbit to more precisely determine damage.
Source: http://en.wikipedia.org/wiki/Space_Shuttle_Columbia_disaster

Coming back to Uncle Bob’s conclusions: in his talk, How schools kill creativity, Sir Ken Robinson points out that the school system, in its efforts to teach, is killing creativity in favor of grades. We can only assume that legislating computer engineering studies will, at best, not harm the existing engineering quality. It will probably achieve worse: well-certified engineers with little ability or drive to excel.

This failure has little to do with teaching and certifications, and all too much to do with culture, professionalism, and plain simple awareness.

When managers practice such an “It will be OK” attitude, everyone does. By the sound of it, the healthcare.gov failure discussed here is not that far off.

In 1992, Prime Minister Yitzhak Rabin was speaking at the Staff and Command school to prospective senior officers. Here’s what he had to say about “It will be OK”:

One of our painful problems has a name. A given name and a surname. It is the combination of two words – ‘Yihyeh B’seder’ [“it will be OK”]. This combination of words, which many voice in the day to day life of the State of Israel, is unbearable.

Behind these two words is generally hidden everything which is not OK. The arrogance and sense of self confidence, strength and power which has no place.

The ‘Yihyeh B’seder’ has accompanied us already for a long time. For many years. And it is the hallmark of an atmosphere that borders on irresponsibility in many areas of our lives.

The ‘Yihyeh B’seder’, that same friendly slap on the shoulder, that wink, that ‘count on me’, is the hallmark of the lack of order; a lack of discipline and an absence of professionalism; the presence of negligence; an atmosphere of covering up; which to my great sorrow is the legacy of many public bodies in Israel – not just the IDF.

It is devouring us.

And we have already learned the hard and painful way that ‘Yihyeh B’seder’ means that very much is not OK.

Source:
http://www.imra.org.il/story.php3?id=46224

No, Uncle Bob, engineers are not to blame for this. Management must take responsibility for nourishing a culture that allows such poor standards.

As a Courtesy for the Next Teammate

I have recently returned from a trip abroad, and, during the flight, I came across the following notice:

[Photo: 2013-06-11 11.24.36]

I saw this and thought about working in a team. Think of all the people on the airplane as a large group. Within it there is a subgroup whose role, among others, is to make sure the restrooms are tidy and clean. We know them as flight attendants or stewards. In order to ensure that everyone gets a good experience, they would need to enter the restrooms after each use and clean up after the travellers.

Compared to a software development team, this would mean someone, say testers, would need to refactor and clean the code each time it is committed, to ensure that the code is in a good-enough state for the next programmer who visits it.

A piece of cake. A huge piece of NIH cake

It is so tempting to start something afresh, something that only we will use, so it fits our needs like a glove.

It is so tempting because it seems so easy to develop exactly what we need – it must be a piece of cake to custom build it for us.

And of course what exists out there does not serve the purpose well enough, simply because it was NIH – Not Invented Here.

Image by http://www.flickr.com/photos/rabanito/

So here are ten reasons NOT to develop internally what exists off the shelf:

1. It is already tested by more customers

2. You enjoy features you didn’t think of; you will enjoy future features you cannot imagine today

3. You are good at what you do business wise. So why try to become proficient in something other organizations are already experts in?

4. Conversely, by investing in building tools that already exist, you produce less of what you are uniquely good at

5. Once you deploy your tool you will need to maintain it, making even less of what you are good at

6. When you or the market evolve, you will also need to expand the tool, which is what tool providers do anyway for their own survival

7. Changing the tool becomes much harder. For starters, the tool will be someone’s “baby”. You might also find that you need to lay off really good people to replace it, making the decision harder still.

8. People always blame tools. Always. It is easier to defend a Rolls Royce or a Chevy than it is to defend a home made cart

9. Some tools (not yours!) are crap for real. It is easier for someone to admit that they’ve bought a piece of crap than to admit that they are making crap.

10. Think twice and thrice before you reinvent the wheel. If you are already making the Rolls Royce of your business, why service it with second grade tools?

What is your view? Are you developing tools that are commercially available?

Spot the Differences

Everyone’s talking about agile. It is the hype in software development process – the fashion, the Mode. It is so much in the Mode that we sometimes forget that our purpose in life is to produce quality, durable, appealing software. Scrum, or any other agile framework for that matter, is a means, not a goal in its own right.

A friend gave me permission to share this image with you: Can you spot the differences?

Let’s hypothesize what has happened here:

  • Two error messages were displayed in a very different fashion. Why?
  • Because they were introduced by two separate developers. Why?
  • Because one is technical and the other is more business oriented. Why does this matter?
  • Because there was no one shared mechanism for messages for the user. Why?
  • Because it had to be finished within the same iteration. Why does this matter?
  • Because in agile there is no time to build such infrastructures.

Is that really so?

Scrum is no excuse for technical mediocrity. On the contrary – it is an opportunity to improve, based on two major features of agile:

  1. Scrum does not introduce problems. It exposes them. In this example, this is an opportunity to deal with such a scenario now rather than later. In a traditional project, such a failure can also occur. An iterative-incremental process helps here in two ways:
    – The feedback loop is much shorter
    – A sustainable solution, such as a framework to display messages, can be introduced in the next sprint, rather than the too-well-known “We will deal with it in the next version” (also known as “It will never happen”)
  2. When the team encounters similar tasks, they should recognize that these fall into the DRY (Don’t Repeat Yourself) category. When you find yourself doing something repeatedly, such as forming a message box time and again, it is a good time to make it happen in one place. It may cost an extra hour in the current iteration, but it will save dozens of hours in maintaining those messages, should you need to make general changes to them.
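
As a rough illustration of “making it happen in one place”, here is a minimal Python sketch of a shared mechanism for user messages. The names `UserMessage` and `format_message`, and the message layout, are hypothetical; the actual application in the image is not known to me.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserMessage:
    """One shared shape for every message shown to the user (hypothetical)."""
    title: str
    text: str
    severity: str = "error"


def format_message(msg: UserMessage) -> str:
    """The single place that decides how a message looks."""
    return f"[{msg.severity.upper()}] {msg.title}: {msg.text}"


# Both the technical and the business-oriented developer go through the same path.
technical = UserMessage("Save failed", "The disk is full.")
business = UserMessage("Order rejected", "The credit limit was exceeded.", "warning")
```

If the look of the messages has to change, it changes in `format_message` only, and both messages follow.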

If you are one of the agile-skeptics, I strongly recommend you take a course to learn what it really is about. Especially if you are working in an organization that practices agile – you might be doing damage not only to the organization, but also to yourself. So next time you encounter such an agile-makes-things-worse moment, consider that maybe it is not agile that is not up to scratch, but your knowledge of it.

Bread Scrums

There are many reasons to go SCRUM in particular or agile in general. Each organization will have its own reasons and motivations. According to the 2011 State of Agile survey, the top three reasons are to accelerate time to market, to cope with changing priorities and to increase productivity.

The question is, how do you know that you are getting there?

The default answer for most is KPIs. Measure it, and you will know that you are there. The problem with measurements is that they are very elusive. You want to measure one thing, and end up affecting another. Regardless of the measurement, you may impact something that you didn’t intend to.

Why is that so? This is because when you measure things, you leave a trail, and this trail is not merely the guide to go back, it is also the guide to go forward, a kind of a forecast to the next target.

In fact, a trail is more useful for going back in order to fix things than for predicting where to go next. Take breadcrumbs navigation: it helps you go back in the application to re-navigate; it also helps you navigate faster on successive uses of the application. But it doesn’t help you predict what you are going to do. It can help you measure what users are doing most, in order to improve the navigation. As in Hansel and Gretel, the kids wanted a trail back home, not a guide to go forward.


Source: http://www.flickr.com/photos/koiart66/3877752234/ by koiart71

Take an example. Many teams use velocity as their planning tool. They use the amount of done stuff, in relative size, in order to plan how much they can sensibly fit into the next iteration. It is a planning tool, not a measurement.
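
A minimal Python sketch of velocity used purely as a planning tool might look as follows. The three-iteration window and the rule of stopping at the first story that does not fit are my assumptions, not a prescribed method.

```python
def planning_velocity(completed_points, window=3):
    """Average the points done in the most recent iterations (window is an assumption)."""
    recent = completed_points[-window:]
    return sum(recent) / len(recent)


def fit_backlog(backlog, capacity):
    """Take stories in priority order until the next one no longer fits."""
    planned, used = [], 0
    for name, points in backlog:
        if used + points > capacity:
            break
        planned.append(name)
        used += points
    return planned


history = [20, 30, 25, 35]  # points done in past iterations (made-up numbers)
backlog = [("login", 13), ("search", 8), ("reports", 13), ("export", 3)]
next_iteration = fit_backlog(backlog, planning_velocity(history))
```

The number answers one question only: how much can we sensibly take on next?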

Yet, organizations try to use this velocity to do more than planning. After all, velocity can be a good measure for productivity or even predictability, can it not?

Yes, if your organization’s main business is velocity. When were you last asked by your customers: “For the next release we would like to order 54 points and 21 epics, please”? Is this the kind of predictability you require?

Check out Smith and Jones Predictable Lighthouse sketch – are you convinced that predictability is something you welcome that much anyway?!

What kind of trail can you use that will record something deliverable, and not an artificial, game-able, number that can have collateral impact that is potentially undesirable?

Let’s examine a few of the options:

  • Velocity: Dear team, please provide more story points.
    No problem, dear manager. We’ll just skip all those tests, and the number is set to increase. Given that we do not provide the support, escaped defects will be handled elsewhere anyway.
  • Predictability: Dear team, please provide a consistent number of story points.
    No problem, dear manager. We’ll just provide the same number of points. When something big comes in, we’ll just bloat the estimate, so we don’t have to deal with breaking it into smaller stories.
  • Cycle time: Dear team, please provide a standard ratio between a story’s size in points and the time it takes to develop it.
    Now here’s a good one. Cycle time can actually be a rather useful trail. But only if you consider it from a systemic point of view – which makes it much harder to measure. Otherwise, it may become similar to velocity.

The problem with all of the above is that it mixes several concepts into one number. The trail and the measurement are not one and the same. This is why using points or velocity as a measurement is risky. It becomes a goal in itself, rather than a means to a goal.

A much preferred trail is executable specifications, developed and described by Gojko Adzic in Specification by Example.

In such a trail, for every story you specify, in business context, WHAT the system is expected to do. This specification is later turned into an executable sequence that will verify that the developed code does WHAT it is expected to do. Note that specification by example is not about HOW, it is about WHAT.
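
To give a feel for it, here is a tiny Python sketch of a specification written as examples and then executed. The free-shipping rule, the amounts and the function names are all made up for illustration; real teams typically use a dedicated tool for this, which I am not reproducing here.

```python
def shipping_cost(order_total):
    """Hypothetical code under test: orders of 100 or more ship free."""
    return 0 if order_total >= 100 else 5


# The specification: business examples that state WHAT is expected, not HOW.
EXAMPLES = [
    # (order total, expected shipping cost)
    (99.99, 5),
    (100, 0),
    (250, 0),
]


def run_specification():
    """Return the examples the code fails; an empty list means the rule holds."""
    return [
        (total, expected, shipping_cost(total))
        for total, expected in EXAMPLES
        if shipping_cost(total) != expected
    ]
```

The examples stay readable for the business side, and the same list keeps running as part of the regression suite.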

This creates a trail of business rules that have been proven and are now part of our regression suite, and an upcoming trail of the business rules to be proven in the coming iteration. In an analogy to breadcrumbs navigation: where have we navigated so far, and where are we navigating to now? Note that over-specifying (forecasting several iterations ahead) is likely to become waste.

Recall from the top of the post, the trail helps you go back and make corrections. In this executable specification trail, it enables you to either fix business rules that got broken due to changes, or to remove redundant rules – and code, in order to navigate better in the future.

Coming back to measurements – this trail is not a measurement. It is a trail.

If you must measure, try to check, for example, how many routes you are navigating on concurrently. WIP can be a good measurement for that. Yuval Yeret has a good explanation of WIP and using CFD to measure it. Once in place, you can start measuring also cycle times, but as a supporting tool for your WIP, not as a guide to limit story duration.
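
As a rough sketch of reading WIP off a cumulative flow diagram, assume two daily series of cumulative counts: items started and items done. The cycle-time estimate uses Little’s Law, which only holds reasonably for a fairly stable flow; the numbers and function names are made up and this is not taken from Yuval’s explanation.

```python
def wip_per_day(cumulative_started, cumulative_done):
    """WIP is the vertical gap between the two CFD lines on each day."""
    return [s - d for s, d in zip(cumulative_started, cumulative_done)]


def estimated_cycle_time(cumulative_started, cumulative_done):
    """Little's Law estimate: average WIP divided by average daily throughput."""
    wip = wip_per_day(cumulative_started, cumulative_done)
    average_wip = sum(wip) / len(wip)
    days = len(cumulative_done) - 1
    throughput = (cumulative_done[-1] - cumulative_done[0]) / days
    if throughput == 0:
        raise ValueError("nothing was finished in this period")
    return average_wip / throughput


started = [3, 5, 6, 8, 9]  # cumulative items started, one entry per day
done = [0, 1, 3, 4, 6]     # cumulative items done, one entry per day
```

Here the WIP hovers between 3 and 4 items, which is the number to watch; the cycle-time estimate only supports that reading.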

Alternatively, try to use Agile Earned Value Management (EVM). You may read about it in Tamara Sulaiman’s article here. While Agile EVM is useful for planning against budget, it has some hidden assumptions, such as scoping for the entire release.
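
For orientation, here is a minimal Python sketch of the Agile EVM arithmetic as I understand the commonly described formulation: planned percent complete comes from sprints elapsed, actual percent complete from story points done. Treat the formulas as my reading and check them against the article; all numbers are made up.

```python
def agile_evm(budget, sprints_done, sprints_planned,
              points_done, points_planned, actual_cost):
    """Compute basic Agile EVM figures for a release (assumed formulation)."""
    planned_value = budget * sprints_done / sprints_planned
    earned_value = budget * points_done / points_planned
    return {
        "planned_value": planned_value,
        "earned_value": earned_value,
        "schedule_performance": earned_value / planned_value,  # SPI
        "cost_performance": earned_value / actual_cost,        # CPI
    }


# Made-up release: 10 sprints and 200 points planned on a budget of 100,000.
status = agile_evm(budget=100_000, sprints_done=2, sprints_planned=10,
                   points_done=30, points_planned=200, actual_cost=25_000)
```

Note how `points_planned` must cover the entire release, which is exactly the hidden assumption mentioned above.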

The merits of one measurement or the other are a subject for a separate post. I will just comment that I like these measurements more than others because a) they are good tools for decision making, and not merely for measuring; and b) they are based on the trail without measuring the trail itself.

As long as you remember that the objective of the trail is to correct yourself – to make the right decisions, you will be ok. Don’t force measurements if you cannot effectively use them for decision making.

Otherwise, you may find that you were hoping for breadcrumbs and, instead of SCRUM, you were left with, well, just the crumbs and no trail.

On the merits of rituals

One moment before Passover starts. This photo is probably meaningful to most of my readers. Come Passover, all food that contains, or is suspected to contain, flour and yeast is to be removed. At the workplace, on many occasions it is simply covered up to avoid accidental use of such food products. At home, the religious practitioner will meticulously clean the house until all crumbs are removed.


There are disputes as to the essence of this activity. Many secular people see this as a futile exercise. Atheists may see this as nonsense. But for the religious individual and family, this is part of the essence of being Jewish. Make no mistake, for the Jewish religious practitioner, Done is Not Done if a minute piece of Khametz, bread, cake, or even plain flour, is not taken out of the house well ahead of Passover.

As for these vending machines: Someone has to remember to take care of all the Khametz before Passover, and all the items that must be handled for this. The coffee jars, the cutlery, the bread baskets, the refrigerators, and even the vending machines.

It takes years of experience to remember to cover everything, and to develop the routine to recall and attend to everything.

And yet, when staff changes, nothing is left forgotten. Part of the craftsmanship of the routine is ensuring that it does not depend on one individual or another. Instead, it is the responsibility of the maintenance team to ensure that everything is taken care of, year after year.

Immediately after Passover, the regular coffee jars will be returned, the refrigerators will be filled with Khametz all over again, and the vending machines will not be forgotten either, and will be uncovered.

What are the rituals that you already do, and the ones that you wish you had, in delivering new software? What is not left forgotten even if you do it only once a year? What are the things that you tend to forget, but are important for you?

What will you do this year to get the rituals into the organizational rhythm, so you do not forget them next year?

Happy Passover, Happy Easter, Happy holiday season to all
