Ever wondered how to support your client’s service expectations, or provide enhancements to products you’ve built in the past, while trying to develop a market leading flagship product? I recently attended a presentation by Nik Silver of Guardian News and Media (GNM) at the Agile Business Conference in London (view, twitter: #agilebc09), where he mentioned that GNM does this under the guise of “Business as Usual”. Attending that presentation in some way encouraged me to write this as I’ve been assessing the value (ROI) of running such a concept for a while now.
This article does not propose a formula for measuring value. Instead, I want to provide an example of how the efficiency of maintaining legacy applications could be optimized in order to maximize their value.
I work for a large media organization, within an agile scrum digital department which includes two Product Managers, eight Software Engineers (including me, as Engineering Manager) and four Quality Assurers (QA), as well as Project, Creative Service and Client Service teams. Over the last few years (pre-agile) the department has developed several software applications that support various internal business divisions. Each of these applications, with the exception of one, was engineered by different individuals. The quality of the code varies and domain knowledge has declined over the years because team members have moved on; to access that domain knowledge, the department is now forced to rely either on me for most of the domains, or on another long-standing member of the team for the rest. Until recently these applications were neglected due to a lack of resource, as in 2009 our entire focus switched to the rebuilding of our flagship product (I’ll refer to this as the “main release” from now on).
Four months ago a colleague from Product and I established an “Ops Rota”. Its objectives were:
- Alleviate client pressure to support the different applications
- Expose the Product Managers, Engineers and QA to each of those applications
- Avoid costly new version releases in order to fulfill application roadmap expectations
In order to achieve these objectives the Ops Rota would require the diversion of resource away from the continued development of the main release. This would inevitably impact productivity; however, we agreed to allocate one software engineer and one quality assurer full time.
Last month we assessed the Ops Rota’s output against its three objectives and it was considered a success. Our clients have visibility on the development of their products, and in four months we have reduced the number of defects for each legacy application by 60-70%. Engineers and QA have been working on, and have therefore been exposed to, applications other than the main release, and even some new features are making it into production. Additionally, the Ops Rota was managed with our established agile scrum methods and those involved have really enjoyed working on it. However, during the last month, since I handed the management of it over to the project team (who now assume all scrum mastering responsibilities), it’s become clear how much it still depends on me.
On closer inspection things don’t look so good. The process of delivering bug fixes or features is inefficient because the iterations are often disrupted, something that was covered up by my “affectionate” determination to make the Ops Rota work. Unless we make some changes, achieving the original objectives may not be possible. Our problems are outlined below:
- Only occasionally are tasks (sprint backlog items) added to bugs or product backlog items; very rarely are acceptance specifications defined. The impact of this can be illustrated as follows: one feature of a particular domain is a revenue report. It constantly has bugs, or requires enhancements to parts that have no record of the work that’s been done before. Over time different engineers have made many assumptions about how revenue should be reported because acceptance specifications for the expected outcome have never been defined. The code is now in a very bad state: it’s extremely fragile and a nightmare for any engineer to work on.
- Our Engineers’ exposure to the different domains is not happening quickly enough. They rotate every two weeks to fit in with the main release’s sprint cycle; an Engineer can spend their whole two weeks within one domain fixing bugs or building new features and, when their turn comes around again, Product’s priorities mean they could be working on the same domain.
- Sprint iterations tend to focus on only one domain.
- Resource is often removed to compensate for issues arising in a main release’s sprint. This impacts the capacity of the Ops Rota iteration, compromising the ability to accomplish goals.
- Releases are infrequent.
- Feature bottlenecks build up with QA. After development within a domain, QA testing may not start for several weeks, mostly due to resource availability. When testing does start, in order to answer any questions, I’ve been picking up an item where the previous engineer left off, as they’ve moved back into the main release and disturbing them is disruptive [see Joel on Software, last three paragraphs]. This disjointed handover between development and QA testing, combined with QA’s preferred method of testing all bugs and features for any particular domain in one go, delays a production release. We are not using branching effectively, which leads to the “stacking up” of untested domain features on the trunk. If a problem gets identified in the production application that requires an emergency fix, you guessed it, we can’t release because of all the “stacked up” untested features on the trunk.
- Builds are still not automated.
It’s surprising how many principles and practices get ignored when the elements of a particular project management methodology break down. For example, we have invested a great deal of time with the Engineers on “Agile Design” in our main release: TDD and, before we got there, the SOLID principles, refactoring, pair programming and so on. We’re using scrum: iterations, daily stand ups, and a task board. All these principles and practices are part of daily life in the main release but, as you can see, not in the Ops Rota: the scrum starts are irregular, we ignore the burn down most of the time and the retrospectives and demos aren’t formalized.
To me, this has raised the question: for those problems to manifest in an organization where our implementation of the agile scrum methodology is pretty mature for the main release cycles, are we using the right kind of agile methodology for this particular environment? Should we invest more time and effort trying to make scrum work or is there something more suitable?
The “Lean Stream”
One of the Ops Rota’s greatest successes has been delivering to the client bug fixes or feature requests that enhance their product. Whether or not this has translated to an increase in revenue, something that we can use to measure ROI, it’s too early to say but it’s certainly easier to walk past people in the corridor now. By starting the Ops Rota, we are in a much better position than we were a year ago to react quickly to those software issues in applications that can hold the business back. How though, especially with the volatility of resource availability, can we become more efficient and have half a chance at fully achieving the original objectives?
Over the last few weeks I have been looking at the Kanban and Lean ways of practicing Agile. According to David Joyce, Lean Software Engineering’s processes can be described as work in progress limits, queues and control loops. My summarized interpretation of that is to limit the number of features under development and manage that limited number through to the done state, then release.
“As adding features becomes more straightforward we may find that we are in a state akin to production and can take advantage of a Kanban approach to further optimize for reduced cycle time, increased throughput and just in time scheduling” (David Draper, on agile & design)
By its very nature the Ops Rota is similar to a production line so let’s explore this further. First, let’s try to simplify the existing problems into areas:
- Documenting and Quality
- Product, Engineer and QA exposure to applications
I’ll slightly alter the Ops Rota’s scrum management process stages from the “Not Done”, “In Progress”, “Ready for Test” & “Done” statuses to a Kanban style “Ready for Engineering (3)”, “In Progress (2)”, “Ready for Test (3)”, “In Test (2)” & “Done”, which includes limiting work items on the task board. The number after each heading is the limit on how many items can be in that column, based on capacity.
Now let’s address those problem areas:
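To make the column limits concrete, here is a minimal sketch in Python of a board that refuses to accept an item into a column that is already full. The column names and limits are the ones listed above; the class and method names (`KanbanBoard`, `pull`, `WipLimitExceeded`) are invented for illustration and do not come from any particular task board tool.

```python
class WipLimitExceeded(Exception):
    """Raised when a move would push a column past its work-in-progress limit."""


# Columns in flow order with their limits; None means "no limit" (Done).
OPS_ROTA_LIMITS = {
    "Ready for Engineering": 3,
    "In Progress": 2,
    "Ready for Test": 3,
    "In Test": 2,
    "Done": None,
}


class KanbanBoard:
    def __init__(self, limits):
        self.limits = dict(limits)
        self.order = list(self.limits)            # dicts keep insertion order
        self.columns = {name: [] for name in self.order}

    def has_capacity(self, column):
        limit = self.limits[column]
        return limit is None or len(self.columns[column]) < limit

    def add(self, item):
        """Place a new item in the first column, if there is room."""
        first = self.order[0]
        if not self.has_capacity(first):
            raise WipLimitExceeded(f"'{first}' is full")
        self.columns[first].append(item)

    def column_of(self, item):
        for name in self.order:
            if item in self.columns[name]:
                return name
        raise KeyError(item)

    def pull(self, item):
        """Move an item into the next column, but only if that column has room."""
        current = self.column_of(item)
        index = self.order.index(current)
        if index == len(self.order) - 1:
            raise ValueError(f"'{item}' is already in the last column")
        target = self.order[index + 1]
        if not self.has_capacity(target):
            raise WipLimitExceeded(f"'{target}' is full")
        self.columns[current].remove(item)
        self.columns[target].append(item)
        return target
```

With these limits, a third item cannot be pulled into “In Progress” until one of the two already there moves on to “Ready for Test”, which is the behaviour the limits are meant to enforce.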
Documenting & Quality
Instead of operating under scrum methods, story pointing stories or bugs up front and assigning enough to fit into our two week cycle, feature requests and bugs should be assessed as soon as they are raised and prioritized by business value. If deemed valuable enough, they should then be story pointed and investigated fully to define acceptance specifications (including scenarios). ONLY then can that piece of work be put into a “Ready for Engineering” state. Sticking to the processes described above, only a limited number of features would be under development. The controlled flow, or focused management, of each feature through development would encourage greater scrutiny and the enforcement of the rule to add tasks (SBIs), helping to document by providing a historical record of the engineering work being done.
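The “ONLY then” rule can be sketched as a simple gate. The Python below is an illustration under stated assumptions: the `WorkItem` fields, the numeric `business_value` score and the `VALUE_THRESHOLD` cut-off are all hypothetical, since how Product actually scores value is not defined here.

```python
from dataclasses import dataclass, field
from typing import List, Optional

VALUE_THRESHOLD = 5  # hypothetical cut-off for "valuable enough"


@dataclass
class WorkItem:
    title: str
    business_value: int = 0                  # hypothetical score set by Product
    story_points: Optional[int] = None       # None until the item is estimated
    acceptance_specs: List[str] = field(default_factory=list)


def missing_for_engineering(item, threshold=VALUE_THRESHOLD):
    """Return the reasons an item is not ready; an empty list means it is ready."""
    reasons = []
    if item.business_value < threshold:
        reasons.append("business value below threshold")
    if item.story_points is None:
        reasons.append("not story pointed")
    if not item.acceptance_specs:
        reasons.append("no acceptance specifications")
    return reasons


def is_ready_for_engineering(item, threshold=VALUE_THRESHOLD):
    return not missing_for_engineering(item, threshold)


def prioritise(items):
    """Order items by business value, highest first."""
    return sorted(items, key=lambda i: i.business_value, reverse=True)
```

Returning the list of reasons, rather than a bare yes or no, gives whoever raised the item something concrete to act on before it can enter the “Ready for Engineering” column.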
Product, Engineer and QA exposure to applications
“A limited amount of features under development”. I’m starting to like this. Each feature has a business priority attached to it that will determine when work gets done. If you were to look at the domain work currently remaining in our Ops Rota system and at the nature of the issues raised from the different domains, I’m sure you’d agree that switching the focus from only working on one domain during a two week rota, to working on as many different features as the capacity allows, would expose everyone involved to multiple domains very quickly.
Again, by limiting the amount of work under development, which would also apply to how much is allowed to be tested, bottlenecks can be controlled. If they did occur, there would be much greater visibility on what resource would be needed to get something to the done state to release the bottleneck.
Kanban says an In Progress “place” should always be available for blockers / emergencies, which is fine too. Branches should be made when releasing versions from now on, so that emergencies can be managed. Feature or bug work should be engineered on the trunk and then the new release branched when the feature or bug is set to done. Emergency fixes get engineered on the branch and merged back down onto the trunk. If a bottleneck forms, there will only be a small number of features causing it, so resource can be assigned to clear it without having a significant effect on other projects in development. Features or bugs should be speedily managed through the system and confidently spot/hot fix/minor released into production; the high release rate should force the creation of build scripts to automate the builds.
It seems to me adopting this Lean approach would provide a controlled, feature driven environment for the Ops Rota. I would expect a rhythm to develop for creating stories and defining feature or bug acceptance specifications, and the high release rate would provide great domain exposure for Engineers and QA. With everyone focused on one feature it should promote the Engineers’ adherence to the agile design principles laid out by Robert C. Martin and Micah Martin. There is also a solution to our problem of bad quality legacy code suggested in the point about coding to interfaces on the Improving Your TDD Skills session notes from CITCON Paris 2009. The “pull” nature of the development stream combined with the work item limits means that big backlogs cannot build up.
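The branching rule is easier to see with a toy model. The Python sketch below is not a version control tool: it treats each line of development as an ordered list of change names, and the `Repo` class, its methods and the change names in the usage note are all hypothetical.

```python
class Repo:
    """Toy model of trunk plus release branches as lists of change names."""

    def __init__(self):
        self.trunk = []
        self.branches = {}

    def commit_to_trunk(self, change):
        # Normal feature and bug work happens on the trunk.
        self.trunk.append(change)

    def cut_release(self, name):
        # A release branch is a snapshot of the trunk at release time.
        self.branches[name] = list(self.trunk)

    def hotfix(self, branch, change):
        # Emergency fixes go onto the release branch first...
        self.branches[branch].append(change)
        # ...and are then merged back down onto the trunk.
        if change not in self.trunk:
            self.trunk.append(change)
```

For example, after `cut_release("1.1")`, a later `commit_to_trunk("untested feature")` followed by `hotfix("1.1", "emergency fix")` leaves the 1.1 branch releasable without the untested feature, while the trunk still picks up the emergency fix.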
If this worked I could see it really benefiting our clients by providing them a constant stream of value. A stream that could be switched off and back on with great ease during periods of resource constraint.
I haven’t mentioned the Kanban cards yet. To be honest, I haven’t given them as much thought as I have the management of work items. I could be in danger of becoming a Kanbut but I do not place as much importance in what colour the cards are at the moment as I do in solving the Ops Rota’s problems. We are trialing a digital task board at the moment and we haven’t investigated the Kanban templates yet. I also haven’t mentioned automated acceptance testing. We are not set up in our organization for this yet so that’s also a topic for another day. One step at a time.
I’ve convinced myself we should convert the current format of the Ops Rota into what I want to name our “Lean Stream”. Theoretically this would make the job of managing it more effective, or perhaps, as it should, become self managing. I also reckon everyone else involved will enjoy working on it even more and gain a greater sense of achievement. I can also see a similar, “sister” stream operating for our flagship product, to cater for bugs and urgent feature requests, enabling the main releases (“curtain lifts” as Nik Silver called them) to just focus on version objectives. We could call it “Lean Jeanie”! OK stop now.