Sunday 8 February 2015

About organizing a DevOps Days event

I've been lazy and not written anything here - "busy at work". Nevertheless, I wrote a text for Eficode's site on How to Organize DevOps Days. I headed the planning and organizing of the first ever DevOps Days event in Finland last November, and the text offers some help for those who intend to arrange an event of their own. In short, there's a lot to do and you should start early enough...

Thursday 17 April 2014

Why Scrum is not enough

Sorry about the long silence, I've been busy with work (lame excuse...). Well, to break the silence, I just published a new blog text on my employer's pages. Unfortunately, it's in Finnish - perhaps I'll rewrite it in English some day.

http://eficode.fi/blogi/miksi-scrum-ei-riita/

Anyway, the key message is that while Scrum helps SW teams to be more successful, it might not help the business, as the focus is often just on the performance of the team, not the whole business. And this is what the DevOps movement is trying to tackle.

Wednesday 9 October 2013

Towards DevOps with continuous processes and collaboration

If we already focus on being agile, doing continuous integration and keeping the quality of our SW constantly at a good level, the next step to look into is the distribution of the SW. When working with embedded systems, deployments of new SW are usually done rarely, and frequent deployments are not possible in practice: access to the devices might be difficult, and frequent updates are annoying and even risky for the users (e.g. mobile phones), or the update process is complex and the uptime requirements very high (e.g. core network servers). However, there are other areas where frequent updates are possible and even desirable, like getting bug fixes and new features deployed rapidly to an application in the cloud.

What is required to make deployments frequent is good co-operation between the team doing the development, the team verifying the content and the team deploying and maintaining the services, i.e. handling operations. This is called DevOps.

The foundation for functional DevOps is a continuous development process, meaning that high-quality changes to the SW are rapidly included in releases. This requires that:
  • Changes are small
  • SW builds are fast
  • Submission gate is fast and well-focused

Fast feedback on the changes encourages developers to keep delivering small changes, and filtering the bad changes out happens fluently. Developers shouldn't be afraid of occasionally sending bad changes, because too much verification on the developer's table slows down the sharing of the good changes. An automated submission gate should take care of checking the changes, with testing that covers key features that are often broken and/or widely used, and analysis that quickly provides good coverage of common issues.
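To make this concrete, below is a minimal sketch of such an automated submission gate. The step commands (run_analysis.sh, smoke_tests.sh) are hypothetical placeholders for whatever analysis and test tooling you have; the point is the shape of the gate: a short, ordered list of fast checks and a quick reject on the first failure.

    # Minimal submission gate sketch: run the fast, well-focused checks in
    # order and reject the change on the first failure.
    import subprocess
    import sys

    GATE_STEPS = [
        ["./run_analysis.sh"],   # static analysis: quick coverage of common issues
        ["./smoke_tests.sh"],    # key features that are widely used or often broken
    ]

    def gate(change_id: str) -> bool:
        for step in GATE_STEPS:
            result = subprocess.run(step, capture_output=True, text=True)
            if result.returncode != 0:
                print(f"Change {change_id} rejected at {step[0]}:")
                print(result.stdout + result.stderr)
                return False
        print(f"Change {change_id} accepted into the common codebase")
        return True

    if __name__ == "__main__":
        sys.exit(0 if gate(sys.argv[1]) else 1)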

Once the collecting of changes is running fluently, the next focus area is the creation of releases, which should be documented well and automated as much as possible. The goal should be at least one new release every day, to be used at minimum as the baseline for further development.
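As one way to automate this, here is a small sketch that tags the current state of the codebase as a daily release in git. The tag scheme (rel-YYYYMMDD-N, a short body plus a running number) is just an example I made up for illustration:

    # Sketch of automated daily release tagging: mark the newest state that
    # passed the gate so it can serve as the baseline for further development.
    import subprocess
    from datetime import date

    def next_release_tag() -> str:
        prefix = f"rel-{date.today():%Y%m%d}"
        existing = subprocess.run(["git", "tag", "--list", f"{prefix}*"],
                                  capture_output=True, text=True).stdout.split()
        return f"{prefix}-{len(existing) + 1}"   # e.g. rel-20131009-1

    def create_release() -> str:
        tag = next_release_tag()
        subprocess.run(["git", "tag", tag], check=True)  # the new baseline
        return tag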

With all this we have a fluent process for the development. When we start to focus on DevOps, we also need to take care of the needs of the operations team. The first thing the operations team expects is high quality in the SW release. While the primary aim of the development process is to do superficial testing to ensure that a release is basically OK, that kind of testing is not good enough for keeping the SW in good shape in the long run. We need deep testing done by the verification team - a proper quality assurance. Full verification is not suitable for a fast-paced development process, but the verification process needs to be served as the primary customer of the development. Verification may take days or even weeks, but it provides the development team with important insight into possible problems, and the team should fix any upcoming issues promptly in order to keep that insight meaningful in later verification cycles too.

Besides verification, the operations team will expect a fluent process for deployments and updates. For that, the development team needs to pay attention to creating a system which can be deployed and updated easily, and consultation with the operations team is very valuable here. It is also very important that the development and verification teams test on replicas of the final production environment, or at least on close copies of it. Proper configuration management of the environments is crucial.
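One simple guard for that configuration management is to diff the environment descriptions against each other regularly. The sketch below assumes, purely for illustration, that each environment is described by a version-controlled JSON manifest of key-value settings:

    # Sketch: detect configuration drift between production and a test replica,
    # assuming both are described by JSON key-value manifests (an assumption
    # made for this example; any structured format would do).
    import json

    def drift(prod_file: str, replica_file: str) -> dict:
        with open(prod_file) as p, open(replica_file) as r:
            prod, replica = json.load(p), json.load(r)
        return {key: (prod.get(key), replica.get(key))
                for key in prod.keys() | replica.keys()
                if prod.get(key) != replica.get(key)}

Any non-empty result means the replica no longer matches production and test results may be misleading.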

DevOps as a term and practice has been popular for a few years, but I believe that the majority of the organizations that could benefit from it are still struggling with the transformation from waterfall to agile development and continuous integration. Those that succeed in renewing their processes and keeping them in good shape have better capabilities for making their business successful.

For further reading on DevOps, the Wikipedia page is a good place to start.

Sunday 29 September 2013

Eliminate before automating

In my earlier texts I've encouraged pushing for more automation, because it typically helps to create products faster and with better quality. Therefore, it's beneficial for the business to encourage automation. However, there's one thing that has even more value for the business - eliminating the tasks that don't provide enough value to it in the first place.

Traditionally, in SW development we focus on improving the way we develop, test and distribute SW, and with automation we make the process faster. But with automation alone we are working harder, not smarter.

When we develop SW too hastily, we create technical debt in our system. To keep the system in good shape, we should clean it up every now and then, refactoring the code and dropping features that are no longer used. We need to do the same for our product development process: refactor the existing steps and, above all, eliminate steps, tasks and items that don't provide additional value to the business but slow down product development.

As an example, if we have been developing our products for years, our bug handling process has collected a lot of baggage. Typically, some stakeholder makes noise that a certain piece of information is vital to have in the bug report form, and that piece of information is added to the process without evaluating the real cost of adding it. This scheme is repeated over the years, and we end up with a bug reporting process nobody is happy about. Still, none of the parties is ready to give up any of the bits and pieces each has managed to include in the process.

Calculating the cost of each item in the process might be difficult. Therefore, a better approach to keeping the process lean is to start from a clean table and include only those steps and items that can be shown to have significant value. The team that succeeds in doing that well will have a foundation for making innovative products, saving time by eliminating meaningless tasks.

Saturday 21 September 2013

Creating a decent submission gate

In continuous integration (CI), a well-functioning submission gate is crucial (and it's important also in a non-CI process). The submission gate is the set of criteria that a change has to pass in order to be accepted into the common codebase. Note the difference: this is about single changes being accepted into a baseline, while the definition of done is about the acceptance of a feature. Basically, every feature is composed of several changes.

Key characteristics of a submission gate:

  1.  It should prevent the breakage of the common codebase
  2.  It should be fluent and swift to use
  3.  It should be reliable


For preventing breakage, we need good enough coverage. However, coverage is limited by the need to make the entries through the gate quick. Therefore, we need to pick an optimal set of activities (one way to pick the test subset is sketched after the list). These should include:

  • Static code analysis for detecting e.g. possible memory leaks
  • Tests for frequently used features, breakage of which would prevent the use of the baseline for many
  • Tests for areas where the breakage could prevent a major part of further and more expensive testing
  • Tests for easily broken features
  • Unit testing done beforehand
  • Code review
  • Creation of change metadata
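Picking the test subset is essentially an optimization against a time budget. As a rough illustration of the criteria above, the sketch below scores each test by how widely the feature is used and how often it has broken historically, relative to the test's runtime, and then greedily fills the budget. The fields and weighting are my own assumptions, not a standard formula:

    # Sketch: choose the gate's test set under a fixed time budget by ranking
    # tests on (usage of the feature * historical breakage rate) / runtime.
    from dataclasses import dataclass

    @dataclass
    class Test:
        name: str
        runtime_s: float      # how long the test takes in the gate
        usage_weight: float   # how widely the tested feature is used
        breakage_rate: float  # how often changes have broken it historically

    def select_gate_tests(tests: list[Test], budget_s: float) -> list[Test]:
        ranked = sorted(tests,
                        key=lambda t: t.usage_weight * t.breakage_rate / t.runtime_s,
                        reverse=True)
        chosen, used = [], 0.0
        for t in ranked:
            if used + t.runtime_s <= budget_s:
                chosen.append(t)
                used += t.runtime_s
        return chosen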


Quite a set of activities, and all of these need to be fast! For static code analysis there are good tools available, which usually provide reports that enable quick fixing of upcoming problems - so they are very useful! Code review is very important and, if organized properly, an inexpensive way to discover problems that testing typically can't find.

Change metadata refers to all the formalities related to change management, e.g. creating a ticket in the change management tool. This is often a heavy part of the process and should be optimized to make the creation of changes fluent while still serving management with enough information.
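One way to lighten it is to derive as much of the metadata as possible from data that already exists in the commit, so the developer fills in very little by hand. A small sketch, where the ticket fields are hypothetical examples:

    # Sketch: build the change-management ticket payload automatically from
    # the latest commit; only fields git already knows about are used here.
    import subprocess

    def ticket_payload() -> dict:
        log = subprocess.run(["git", "log", "-1", "--format=%H%n%an%n%s"],
                             capture_output=True, text=True).stdout.splitlines()
        commit, author, subject = log
        return {"change_id": commit, "author": author, "summary": subject}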

Tests need to be selected based on the above criteria, they need to be automated (for fluent use) and they need to be quick. But we must remember that the tests also need to be reliable! That's a major challenge, as there are many things that can fail, e.g. bugs in test scripts, or failures in the SW/HW environment. We will never have ~100% reliable tests (unless we test only very simple things), and therefore we need to be prepared for random failures. What should we do when a test fails? Some options (one possible retry-and-classify policy is sketched after the list):

  • Discard the change which seemed to break the test, if we rely on the test and our analysis of the results supports that view.
  • Run the test again a few times to check if it is a random failure. How many times is enough? Do we have time and resources for retesting? A random failure may also be caused by the change at hand, so we need to run further tests also on older SW stacks in our codebase. We may also classify the failure as random if it has already appeared in earlier test runs.
  • If the same failure has appeared occasionally before, let's report an error and get a high priority for fixing it. Perhaps we should even drop the test until we receive a fix? Running a failing test is not sensible; it just grows irritation in everybody. On the other hand, the problem should be fixed quickly, because while the test is out of use, or random failures are present, new errors causing additional failures in the test may enter our codebase without us noticing it.
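Here is a minimal sketch of such a policy, written as a pure function so the test runner itself is passed in (run_test is a placeholder for your own runner, and the retry count is an arbitrary choice):

    # Sketch: retry a failed gate test, and compare against the previous
    # known-good baseline to decide whom to blame.
    from typing import Callable

    def classify_failure(run_test: Callable[[str, str], bool],
                         test: str, change_build: str, baseline_build: str,
                         retries: int = 2) -> str:
        # A pass on retry points at a flaky test rather than a bad change.
        if any(run_test(test, change_build) for _ in range(retries)):
            return "random"
        # Fails on the known-good baseline too: not this change's fault.
        if not run_test(test, baseline_build):
            return "pre-existing"
        # Stable failure that appears only with the new change.
        return "caused-by-change"

A "random" or "pre-existing" result should still be reported and fixed promptly, for the reasons listed above.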


There are a lot of things we should be doing when designing and operating a submission gate. It will never be perfect: we will suffer in either speed, coverage or reliability, so we need to aim at making it decent. However, the most important aspect of a submission gate is always fast feedback, because good coverage is more a requirement for the further testing.

Friday 13 September 2013

Why continuous integration won't succeed

Continuous integration (CI) as a principle is a key part of agile SW development. In practice it is mandatory for keeping the asset in shape for a potentially shippable product at the end of the sprint. But there are many hurdles on CI's way to success. Here are some I have faced.

First, there might be cultural reasons why developers are not used to making small changes. This will happen if integration is provided to developers as a service. Then it's no problem for a developer to keep a long-living branch with no updates from other development while coding the change, and to hand it over to an integrator who will then feel the pain of merging it into the SW stack. In addition, in this scheme the developer avoids taking the "bad quality" changes of others into his/her development environment.

Second, there might also be practical reasons, not just culture. How easy is it to merge the latest changes into one's own branch? We need good tooling for merges, and new baselines provided at least daily. How much effort does it take to deliver a change? Unit testing, reviews, builds, integration testing and bureaucracy may all require such an effort that delivering small changes is not efficient SW development. What is the quality of the baselines? They need to be trustworthy: testing and analysis should be quick but have enough coverage to catch the most common failures, and baselines not fulfilling the tight release criteria shouldn't be published.

So we need to look carefully into our processes and think how well they support a CI culture. Building and testing of changes need to be automated and swift, reviews need to have enough priority within the team, and bureaucracy needs to be minimized. Ideally, the developer's effort before submitting a change shouldn't take more than half an hour, and the complete SW build portfolio and acceptance testing in integration another half an hour. This is not the optimum, but the bare minimum that keeps the CI process fluent. Tooling for the process needs to be intuitive to use. If our builds are done nightly or getting results from tests takes hours, we still have a lot to do before we are really doing continuous integration.
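Those half-hour figures are worth monitoring continuously, because pipelines tend to slow down little by little. A tiny sketch of such a watchdog, where the stage names and budgets are just the examples from above:

    # Sketch: compare measured CI stage durations (in minutes) against the
    # half-hour budgets discussed above and report the stages exceeding them.
    BUDGETS_MIN = {"pre-submit": 30, "build-and-acceptance": 30}

    def over_budget(measured_min: dict[str, float]) -> dict[str, float]:
        return {stage: measured_min[stage] - budget
                for stage, budget in BUDGETS_MIN.items()
                if measured_min.get(stage, 0) > budget}

    # over_budget({"pre-submit": 25, "build-and-acceptance": 41})
    # -> {"build-and-acceptance": 11}  (11 minutes over)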

Friday 6 September 2013

Version numbering means trouble for continuous integration

In continuous integration (CI), the aim is to push new changes quickly into the SW stack and to evaluate the changes before and after submission through analysis and testing, making each of the changes a release candidate. For a release, we need some simple identifiers. The most typical identifier is a version number. However, in CI incrementing the version number can be painful, especially if it is defined at compile time.

First, let's look at the reasons why we need releases. Ultimately, a release is needed to deliver the SW to the customer. However, releasing is beneficial also for internal purposes:

  • A release points out a baseline on which the next changes are built. This is important if testing in the CI process is not happening promptly and we want to avoid the situation where developers use a bad candidate as a base for their changes.
  • A release simplifies the build configuration, in the form of a baseline.
  • A release points out our latest and greatest SW to stakeholders outside SW development, e.g. verification.


Typical identifiers used for a release are:

  • baseline (label)
  • some identifier for the candidate
  • version number

The baseline identifier is defined when the baseline is created, and can thus be freely selected based on a scheme we have defined for the purpose. The scheme should be simple, using a short body and a running number. The candidate gets its identifier when it's submitted, also based on a predefined scheme, which should likewise be simple.
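For illustration, here is a sketch of such schemes; the exact formats (a short body plus a running number, and a submitter-based candidate id) are assumptions of mine, not a standard:

    # Sketch of simple identifier schemes for baselines and candidates.
    import itertools

    _baseline_counter = itertools.count(1)
    _candidate_counter = itertools.count(1)

    def new_baseline_label(body: str = "BL") -> str:
        return f"{body}-{next(_baseline_counter)}"             # e.g. BL-1, BL-2

    def new_candidate_id(submitter: str) -> str:
        return f"cand-{submitter}-{next(_candidate_counter)}"  # e.g. cand-jsmith-1

In a real system the counters would of course live in the CI tool or version control, not in process memory.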

But the version number is difficult if it is defined at compile time, i.e. when the candidate is submitted, because we would already need to know then which version number is allocated. Knowing it at that time is difficult if we release frequently (daily) and can't be sure which content ends up in the previous release.

If we redefine it when the baseline is created, we need to recompile the SW and test it again, which takes a lot of time if we have an extensive set of compilations and tests for baseline selection. Without testing we run the risk that something gets broken. Sometimes there's an alternative: hack the version number into the binaries and thus avoid recompilation, but the risk that the SW gets broken is still there.

OK, so could we then make a meaningful selection of the version number already when the candidate is built? Yes we can, at least sometimes or even most of the time, but not always. And every time we have a wrong version number, our CI process gets into trouble and SW distribution is delayed. Version numbering should enter the game only when we are getting near the time that a release will also be provided to the customer.

The best alternatives are to have version numbering which is not done at compile time but later, or to have meaningful identifiers for the candidates. The latter would mean a completely new SW philosophy for any mature organization that has been using version numbering as the primary means of identification for years, and the resistance from some people will be furious. However, open-minded developers familiar with agile ways of working will understand that by avoiding compile-time version numbering we keep our CI process fluent.
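The first alternative can be as simple as keeping a mapping from immutable candidate identifiers to version numbers outside the binaries, assigned at release time, so nothing needs to be recompiled. A minimal sketch, where the JSON file format is just an assumption for the example:

    # Sketch of late, release-time version assignment: builds carry only an
    # immutable candidate id, and the human-friendly version number is
    # attached afterwards in a mapping kept outside the binaries.
    import json

    def assign_version(mapping_file: str, candidate_id: str, version: str) -> None:
        try:
            with open(mapping_file) as f:
                mapping = json.load(f)
        except FileNotFoundError:
            mapping = {}
        mapping[version] = candidate_id   # e.g. "2.1.0" -> "cand-jsmith-42"
        with open(mapping_file, "w") as f:
            json.dump(mapping, f, indent=2)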