Thursday, 15 October 2015

Nuance and momentum

In most companies there's an understandable reluctance to commit significant budget to new ideas unless they're very well proven, and - critically - enough people in senior positions think they're good ideas.

So in order to get sufficient momentum for an idea, it needs to be repeated and communicated many times - and thereby loses any nuance of meaning. It becomes necessary to distill the essence of it into an "executive summary" that, especially for technical ventures, often misses the point by a small but important margin.

At the company where I currently work, various technical people, including myself, have been pitching to refactor a particularly complex area of our architecture. It doesn't perform as well as it needs to, and it is also the frequent source of insidious bugs in our live environment, which are hard to debug because the system is hard to reason about.
We have a proposal for how it could be designed differently, but in order for this proposal to gain momentum, the point of the work has been diluted through Chinese whispers to the basic premise of "Make it go faster". The enormous benefits in future technical enablement are glossed over. (Of course, in most companies, eliminating technical debt is considered a Good Thing, but when it comes to the crunch, no one wants to invest in it. Where I work now is a hundred times better than most at this, but it's still short-term wins that really get people going.)

Why does this matter? The message people have fixed on is the short-term performance benefit, which yields greater opportunity for sales. This benefit is much smaller than the more intangible longer-term advantages, and so the idea hasn't made it very high on the agenda.
What's more, ideas with no short term benefit will probably gain no momentum at all.
And when (if?) the idea comes to be engineered, the senior business stakeholders may be impatient for it to be completed, as they don't understand the nuances of how much long term benefit they will be getting. In some companies, this can lead to the project being canned before it's complete (see Death By a Thousand Incomplete Refactorings), or made more difficult by technical teams being asked to work on short term things in parallel.

If only business people could invest more time in understanding technical matters. Of course, one barrier is that we techies are often resistant to that at a subconscious level - we take pleasure in our area of expertise being impenetrable, in much the same way I suspected Polish people liked seeing me struggle with their pronunciation and grammar when I lived there. (Different endings for different numbers of things - seriously?)

But the other barrier is that we make our systems too complex. Perhaps a good rule of thumb should be: If non-technical people don't understand our systems, then they're too complex.

So, returning to the topic of gaining momentum and losing nuance, what's the answer? It seems to me it's the same answer I return to again and again in IT (and many other areas): 
Better to restructure your organisation to allow for small independent teams to make their own decisions and set their own direction. That way, less momentum needs to be built up and more nuanced communication can occur, using a rich shared vocabulary.

Is Eliminating Cross Team Handoffs Possible?


Well, not completely anyway. But just because you can't achieve something doesn't mean you shouldn't strive for it.

Lots of advice floating around in the Agile space tells you that handoffs from one team to another are expensive and inefficient: The DevOps movement has given way to the Anti-DevOps movement, suggesting that the development team should do their own Ops; I've written about The Vicious Cycle of Support whereby the development team should support their own output; separating business teams from technical teams can lead to an unpleasant client-supplier relationship where neither side can work effectively.
So do we just need one huge team with everyone in it?

Well, yes. Except for the "huge" bit.

It's been a core concept of most Agile doctrines that you keep your team small, and your immediate scope small. And the people in that team don't just stick to one role - they chip in and help out wherever it's needed.
I've said before that I think the optimum set up is two developers and a product person. That assumes, however, that all those people are very experienced, and are willing to cover off between them the jobs that we typically divide up into the separate roles of Business, UX, Design, Coding, Testing, Release, Support and so on. That may sound ambitious, but consider the benefits:

  • No communication overhead. Everyone knows exactly what's going on, and everyone is working from the same assumptions.
  • Focus. There's no way you're going to be able to juggle 5 projects at the same time like bigger teams are often expected to do.
  • A sense that you really are a team, and that you own this deliverable end to end. This is the same reason that startups are such exciting and dynamic places to work, and often take the market by storm. 
  • You wake up and feel motivated to come to work! It's fun. And when people have fun doing their jobs, all sorts of virtuous cycles develop. Talented people want to come and work for you. People feel inspired to come up with genuinely new and exciting ideas.
  • Everyone in that small group is forced to understand, end to end, all the domains that were previously kept separate. And when people understand things end to end, your quality shoots up because there are no unexpected implications from decisions made in isolation, and there are no lost opportunities ("if I'd known that was something we'd want to do, I'd have built it differently!").
  • The team can be autonomous. No more waiting. No more chopping and changing at the whim of far-removed senior management or marketing teams. With clear goals, and empowerment, it's pretty amazing what three people can achieve.

(In case you're wondering why handoffs are bad, see Pawel Brodzinski's blog post on the subject.)

The Vicious Cycle of Support

Developers who don't have to support their code in Production unsurprisingly don't consider all the implications of their changes.
Logging, monitoring and alerting are afterthoughts. Questions like "What happens when the network fails?", or "What if we run out of disk space?" aren't at the forefront of people's minds.
The difficult test scenarios get ignored because "We should probably test those things, but the test environments are rubbish and besides, this is how we've always done it."

But the result isn't just that a particular feature going live drags some poor guy out of bed at 3am (the traditional time for hypothetical production issues to take place). And it isn't just that the number of live incidents rises over time. It goes much deeper than that.

When Support teams are separate to Development teams, developers don't understand the domain of Support, and vice versa.

The problem with the former is that systems that weren't designed to be supported are hard to support. The cost of that support effort grows over time.

The problem with the latter is that if Support people make fixes, they do so without understanding the domain fully. This causes further instability and technical debt, and also creates a system which neither developers nor support people understand.
Of course, developers continue to build on these foundations, which compounds the problem still further.

Still, you shouldn't be downhearted as a developer in this kind of company. At least you don't have to support it!

Death By a Thousand Incomplete Refactorings

Start with a fairly new IT system. The lead developers built it with a Shining Vision of the perfect architecture, and that vision has been realised. The team has put most of the finishing touches on it; the initial waves of bugs have been exterminated. We're proud of it. We even spend some time refactoring a couple of bits we rushed.

The lead developers feel like heroes. They stick around for a couple of years basking in the glory of being the people who understand the system inside and out.
And then, gradually, the realisation dawns that there's some technical debt which needs addressing. Possibly it's been there all along, it's just that the system scaled or was extended in ways we didn't anticipate. Possibly it's caused by developers building stuff without understanding the Shining Vision. Possibly it's down to The Vicious Cycle of Support.

Whatever it is, the Make it Shiny Again project is born. One of the original veteran lead developers spends all day every day telling people about how the Shining Vision got lost, and now enough senior people have listened. Troops are mobilised.

The big refactoring goes well at first. Everyone is behind it. Update emails go out gleefully reporting the excellent progress. Demos happen. Meetings show high level progress and sing the praises of the Make it Shiny Again project. It will destroy disease in the third world. It will bring peace to the Middle East. It'll even increase our unit test coverage.

But, inevitably, as the project stretches on and the full scale of the task becomes apparent, people begin to lose energy. It looks like the timescales are moving out. Fewer and fewer people are extolling the virtues of it. Even the veteran lead developer has got distracted by a new even shinier way of doing things. The project joins the ranks of other tedious long term projects with no sense of urgency, gathering dust at the bottom of the project plans. (In some companies, "long term project" means years. In some it means weeks. Either way, energy attrition occurs).

Eventually something goes live. It may feel like a Pyrrhic victory, but it delivers value. One part of the system has been refactored, and is running successfully alongside the old system. No one has the energy to celebrate.

Some conscientious members of the team ask when we'll move on to the next phase of refactoring. Tumbleweeds drift across the office floor. Strangely, no one fancies descending back into that dark chapter of their lives.

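The intended end state, once all functionality had been refactored, was that the old system would be switched off.

However, what the Make it Shiny Again project actually achieved was a partly refactored system running alongside the old one, with neither ever fully replacing the other.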

Now drill deeper and look at the code. Many, many incomplete refactorings of different things have happened. Everyone touching the code has either not understood that particular area's Shining Vision, or has sought to impose their own.
Now look at it from 30,000 feet. Whole swathes of the architecture (and the organisation) have undergone incomplete refactorings.
If you squint a bit, the architecture is a fractal. It looks like spaghetti at any level.

No wonder it all feels like such a mess. The solution? 
Let me unveil my Shining Vision. We'll finish it this time, honestly we will.

Less bleak footnote
This can be avoided. Some technical debt is inevitable[1], but Death By a Thousand Incomplete Refactorings is not:
- Don't bite off more than you can chew.
- Deliver in small increments. If it's going to fail, make sure it fails early.
- Celebrate each small success.
- Swap people in and out - keep it fresh. Let everyone be involved.
- Recognise the risk of an incomplete refactoring!

[1] See Martin Fowler's TechnicalDebtQuadrant.