Delivering Software Quickly and Safely

What are the business reasons to invest in efficient software delivery?

Quick delivery leads to happier customers – quicker delivery means you get feedback from customers, internal or external, sooner, and you can delight them with what you ship.

Safer delivery avoids angry or frustrated customers – it doesn’t matter if you ship all of the time if your product breaks or is always down. You can’t make money on software that isn’t operational.

You won’t have to choose between speed and quality – the DevOps movement has shown you can have both. Speed and quality can be developed symbiotically in a virtuous cycle.

Efficient software delivery correlates with higher company performance – according to research in the 2019 State of DevOps report, companies that are good at software delivery are twice as likely to meet or exceed their organization’s performance goals. The book Accelerate, based on earlier years of the same research program, describes similar findings.

What are some warning signs that indicate that you need to invest in improving your delivery efficiency?

If you aren’t meeting the needs of the business – this is the ultimate test of whether you need investment: are you delivering on your promises? Are you meeting business objectives, or are there additions or fixes you could deliver that would make the business more money?

If you’re not shipping software frequently – if you’re not shipping regularly, you’re not doing your job. You should be able to ship every day, whether or not you actually do. Shipping more frequently is actually safer: each release is smaller, which limits the blast radius of any incident and makes the cause of a failure easier to identify.

If you have a lot of outages – that’s a problem and indicates you need to evaluate your delivery processes. Outages happen, but you need to do what you can to limit their length and fallout.

If there’s friction in the engineering org – if you can’t figure out why development is slow or low-quality, devs are blaming ops or vice versa, and there’s friction and frustration, you may need to invest in delivery efficiency. Engineer burnout can be another symptom. If you’ve been working engineers too hard, the answer may not be to ship less but to reexamine your processes and procedures.

What is DevOps? How is DevOps as a methodology different from DevOps as a function within a technology org?

DevOps is delivering software quickly and safely – it is a software delivery methodology that emphasizes both speed and quality. Efficient delivery cannot be delegated to specific DevOps employees; everyone involved in software delivery needs to participate in DevOps.

Consider a DevOps FTE (full-time employee) if there’s a dev spending a lot of time not writing code – there’s no definite point where you need to invest in a DevOps employee. But if someone on the team is spending most of their time not writing code, then it might be time to bring one in.

DevOps employees are boundary spanners – the purview of DevOps is wide. There are a million titles: DevOps engineers, SREs, platform engineers, tech ops, and site ops. Learn from these different disciplines and incorporate that learning into your software delivery and maintenance practices.

What processes contribute to strong software delivery?

At a high level, good delivery means you:

Keep your product up and running
Keep the developers moving as quickly as possible

Processes that contribute to better software delivery:

Continuous Integration – check code in and know very quickly whether it’s broken.
Continuous Delivery – get code out to production easily and speed up feedback.
Microservices – deploy code independently. You don’t have to wait on dependencies elsewhere.
Infrastructure as Code – define infrastructure in version-controlled code so you can deliver it quickly, safely, and repeatably.
Monitoring and Logging – if you don’t keep track of what’s going on, it’s hard to stay up and running. Google calls this the foundation of all resilient software services.
Communication and Collaboration – go to any teams that will use what you’re working on and talk to them about it.

How does successful software delivery change as your organization matures?

Smaller companies will have an easier time with DevOps – as your organization grows, its culture comes under more strain, because you can’t maintain the close relationships that make good communication and processes possible.

Cultural factors will determine what you’re able to implement – there isn’t a specific size at which you can or should implement specific software delivery practices. You need the proper culture to implement and maintain good processes and ensure fast and safe software delivery.

The goal isn’t to reach a specific point, it’s to improve all the time – DevOps requires continuous improvement. It’s a journey without a destination, so try to get better every day.

The idea of best practices is a problem – you want to be constantly looking at your processes and improving. You can hone today’s best practices into better practices for tomorrow.

What indicators should you look at to judge successful software delivery?

You have to tie performance to business outcomes – you’re not showing up at work to play with technology. You’re showing up to enable business outcomes. If you improve from shipping once a month to three times a day, you’re better able to respond to outages, customer needs, market conditions, and so on.

Important speed indicators:

Deployment frequency – how often do you ship (features, fixes, etc.)?
Lead time for changes – when you want to ship something, how long does it take to get it out into production? How long does it take to go from idea to deployment?

Important quality indicators:

Change failure rate – how often does your deployment break stuff? Do you have the right testing? Do you have the ability to deploy behind feature flags? Do you do dark launching as a practice?
Time to recover – when you do have a failure, how quickly do you notice it? Do customers have to call before you spot a failure? What happens when it is detected? What procedures do you use to respond to an incident? How do you learn from an incident and what improvement does that lead to?

Look at all of the indicators together – going all in on speed reduces your quality and going all in on quality reduces your speed.
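The four indicators above can be computed from ordinary deployment records, which makes it easier to look at them side by side. The following is a minimal Python sketch, not a standard implementation: the `Deployment` record and `delivery_metrics` function are hypothetical names, lead time is taken as the median commit-to-deploy time, and time to recover is approximated as the time from the failing deploy to restoration (many teams measure from when the incident started or was detected instead).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    deployed_at: datetime
    commit_times: list[datetime]  # commit time of each change shipped in this deploy
    failed: bool = False  # did this deploy cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def delivery_metrics(deploys: list[Deployment], period_days: float) -> dict[str, float]:
    """Approximate the two speed and two quality indicators for one period."""
    if not deploys or period_days <= 0:
        raise ValueError("need at least one deploy and a positive period")

    # Speed: how often we ship, and how long a commit waits to reach production.
    frequency_per_day = len(deploys) / period_days
    lead_times = [hours(d.deployed_at - c) for d in deploys for c in d.commit_times]

    # Quality: how often a deploy breaks things, and how long recovery takes.
    failures = [d for d in deploys if d.failed]
    restore_times = [
        hours(d.restored_at - d.deployed_at) for d in failures if d.restored_at is not None
    ]

    return {
        "deploys_per_day": frequency_per_day,
        "lead_time_hours": median(lead_times) if lead_times else 0.0,
        "change_failure_rate": len(failures) / len(deploys),
        "time_to_restore_hours": median(restore_times) if restore_times else 0.0,
    }


# Illustrative data only: three deploys in one week, one of which failed.
t0 = datetime(2024, 1, 1, 9, 0)
deploys = [
    Deployment(t0 + timedelta(hours=4), [t0]),
    Deployment(t0 + timedelta(days=1, hours=2), [t0 + timedelta(days=1)],
               failed=True, restored_at=t0 + timedelta(days=1, hours=3)),
    Deployment(t0 + timedelta(days=2, hours=6),
               [t0 + timedelta(days=2), t0 + timedelta(days=2, hours=3)]),
]
metrics = delivery_metrics(deploys, period_days=7)
print(metrics)
```

Reporting all four numbers from one function is deliberate: a jump in deploys per day that comes with a rising change failure rate is exactly the imbalance the paragraph above warns about.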

Other indicators to keep an eye on include:

Velocity/Agile metrics
Cycle times
Uptime metrics – these are going to be signs of the health of your service.
- SLI – service level indicators
- SLO – service level objectives
- SLA – service level agreements
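The three terms build on each other: an SLI is something you measure (for example, the fraction of requests served successfully), an SLO is the target you set for that SLI, and an SLA is a commitment to customers with consequences attached, usually set looser than the internal SLO. A common way to use them together is an error budget, the amount of failure an SLO still allows. The sketch below is a minimal Python illustration under those assumptions; the function names and request counts are hypothetical.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """SLI: the fraction of requests that were served successfully."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return good_requests / total_requests


def error_budget_remaining(sli: float, slo: float) -> float:
    """Share of the error budget left: 1.0 untouched, 0.0 used up, negative overspent."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    budget = 1.0 - slo  # failure fraction the SLO allows
    spent = 1.0 - sli  # failure fraction actually observed
    return (budget - spent) / budget


def allowed_downtime_minutes(slo: float, days: float) -> float:
    """Downtime that a time-based availability SLO permits over a window."""
    return (1.0 - slo) * days * 24 * 60


# Hypothetical numbers: 500 failed requests out of a million, against a 99.9% SLO.
sli = availability_sli(good_requests=999_500, total_requests=1_000_000)
remaining = error_budget_remaining(sli, slo=0.999)
downtime = allowed_downtime_minutes(0.999, days=30)
print(f"SLI {sli:.4%}, error budget left {remaining:.0%}")
print(f"A 99.9% SLO over 30 days allows {downtime:.1f} minutes of downtime")
```

When the remaining budget approaches zero, that is a signal to shift effort from features toward reliability work; while budget remains, the team has room to ship and experiment.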

How do you improve?

Don’t get wrapped up in the numbers – find what frustrates people and implement improvements. The indicators are just proxies for a good culture. Don’t optimize deployment frequency at the expense of everything else; there’s no meaningful difference between shipping 124 times and 126 times.

Don’t release 10,000 changes at once – release one change at a time. If you only deploy one change at a time, it doesn’t take very long to develop, deploy, and get feedback.

There are many processes you can implement that affect your indicators – to improve, you can look at the best practices, including:

Continuous Integration
Continuous Delivery
Microservices
Infrastructure as Code
Monitoring and Logging
Communication and Collaboration
Revision control
Build artifacts automatically
Emphasize unit tests over integration tests
Feature flagging
Dark launching
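Feature flagging and dark launching both separate deploying code from exposing it to users. The Python sketch below is one minimal way to do each, assuming an in-memory flag store; `FeatureFlags`, `dark_launch`, and the flag name are hypothetical, and real systems usually read flag settings from a shared service so they can change without a redeploy. The percentage rollout hashes the flag name and user ID so each user lands in a stable bucket, and the dark launch runs the new code path in the shadow of the old one, records any mismatch, and always returns the old result.

```python
import hashlib
from typing import Any, Callable


class FeatureFlags:
    """Hypothetical in-memory flag store with percentage rollouts."""

    def __init__(self) -> None:
        self._rollout: dict[str, float] = {}  # flag name -> percent of users (0-100)

    def set_rollout(self, flag: str, percent: float) -> None:
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self._rollout[flag] = percent

    def is_enabled(self, flag: str, user_id: str) -> bool:
        percent = self._rollout.get(flag, 0)  # unknown flags are off
        # Hash flag + user so a given user always gets the same bucket (0-99).
        digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
        bucket = int.from_bytes(digest[:4], "big") % 100
        return bucket < percent


def dark_launch(old_impl: Callable[[Any], Any], new_impl: Callable[[Any], Any],
                arg: Any, mismatches: list) -> Any:
    """Run new_impl in the shadow of old_impl; users only ever see old_impl's result."""
    result = old_impl(arg)
    try:
        shadow = new_impl(arg)
        if shadow != result:
            mismatches.append((arg, result, shadow))
    except Exception as exc:  # a crash in the new path must not reach users
        mismatches.append((arg, result, exc))
    return result


flags = FeatureFlags()
flags.set_rollout("new-checkout", 10)  # expose the new checkout to roughly 10% of users
print(flags.is_enabled("new-checkout", "user-42"))
```

Because the bucket is derived from a hash rather than a random draw, raising the rollout from 10% to 50% keeps the original 10% enabled, so users don’t flip back and forth between versions.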

Read Accelerate by Nicole Forsgren, Jez Humble, and Gene Kim – it goes into great detail on improving how you deliver software.

Assessments can help identify areas to improve – Google Cloud has a DORA self-assessment, or you can go to an advisor for a formal assessment from an expert.

Past a certain point, you see diminishing returns on investment – at that stage, bigger projects will be worth more to you than marginal improvements to these indicators.

What are strategies for balancing quick development with quality and stable software?

Don’t look at software as something to get done – if you spend a lot of time writing software just to check it into a repository, you won’t make money. Companies only make money when software is out in production, in the hands of the customers who like it and keep paying for it.

Write software with its operation and maintenance in mind – future operation and maintenance needs should affect your coding. If it’s a pain to operate, and you have to make changes to keep things running, it’s not quality software and will cost more to maintain and fix.

Test, test, test – you want to unit test and integration test. Test how the software reacts under various scenarios and potential conditions. You shouldn’t spend six years testing, because that isn’t smart business, but you haven’t finished just because the software is in a code repository.

What are the four “flow items” and how should your team prioritize where they’re investing time?

Make sure you’re investing across all four areas:

Features – features are important but not the only thing you should ship. The Product team is incentivized to ship features, so you have to work in partnership to prioritize other areas.
Defects – you have to carve out time to ship fixes to defects.
Risks – it isn’t easy to communicate the revenue value of shipping risk items until you end up on CNN because of a security breach.
Debt – every organization has tech debt that they need to work through.

The imperative to ship goes beyond features – we say ship often, but a FinTech company might respond that its customers won’t tolerate new features every day. That may be true, but you still need to ship all the different kinds of flow items. Don’t get tunnel vision on features.

At different times of the year, you can prioritize different items – this is from Mik Kersten’s book, Project to Product:

During the slow period, prioritize fixes, debt, and risk – this is a good time to ship items that clean up your back end.
During the busy season, prioritize features – when people return, they’ll be super excited about your product.

Find what works for you to accomplish everything you need – figure out how your team can earmark time for all four flow items. I’ve been on teams where every other Wednesday was dedicated entirely to debt. You need to get the full range of items done.

How should you incorporate experiments into your software delivery process?

Plan, do, study, act – W. Edwards Deming, whose teachings shaped Japanese manufacturers such as Toyota, popularized this process. Make a plan, run the experiment, study the results, then act on what you learned. Then run the whole cycle again. This is the essence of Agile.

Experiments mix Agile and Kaizen – the same type of experimentation that helps you improve software for your customers can also help improve your daily work processes. Try experiments everywhere; they shouldn’t be confined to any one part of the process. Don’t just use Agile, be agile. If you’re thinking about doing something, try it and see what happens.

Put a date on the calendar when you’ll review results – come back as a group and talk about how it went, what you liked, what you should change, etc. And if it made things worse, then don’t do it anymore.

Experiment length depends on how long you need to judge whether the results are promising – it might be a few days or it might be a month.

What kind of testing, and how much, needs to happen before deploying?

The more you test, the more confidence you’ll have when deploying – don’t test past the point of business sense, but be confident in what you’re deploying.

Unit tests
Integration tests
Continuous testing

Test even after deploying – Netflix created Chaos Monkey, a tool that wreaks havoc in production by randomly shutting down instances. If I test my microservice to death before shipping, but a microservice that works with mine ships a week later, then my testing wasn’t representative of the operating environment. You need continuous testing.

Testing doesn’t break things, it finds what’s broken – the argument against testing after deployment is that you might break something. But that’s wrong: if a test can break it, it’s already broken. You’ll either find out during business hours, or customers will find it for you at 3:30 am.
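Chaos Monkey works at the infrastructure level by terminating real instances. The same idea can be tried at a much smaller scale inside a test suite by injecting failures into calls to a dependency and checking that your code copes. The Python sketch below is that smaller, in-process technique, not how Chaos Monkey itself works; `FlakyDependency`, `call_with_retries`, and the 30% failure rate are hypothetical choices for illustration.

```python
import random
from typing import Any, Callable


class FlakyDependency:
    """Wraps a dependency and makes a fraction of calls fail on purpose."""

    def __init__(self, func: Callable[[Any], Any], failure_rate: float,
                 rng: random.Random) -> None:
        self.func = func
        self.failure_rate = failure_rate
        self.rng = rng

    def __call__(self, arg: Any) -> Any:
        if self.rng.random() < self.failure_rate:
            raise ConnectionError("injected failure")
        return self.func(arg)


def call_with_retries(dep: Callable[[Any], Any], arg: Any, attempts: int = 3) -> Any:
    """The resilience behavior under test: retry on connection errors."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_exc: Exception = ConnectionError("no attempts made")
    for _ in range(attempts):
        try:
            return dep(arg)
        except ConnectionError as exc:
            last_exc = exc
    raise last_exc


rng = random.Random(42)  # seeded so a failing run can be reproduced
service = FlakyDependency(lambda order_id: f"order {order_id} ok", failure_rate=0.3, rng=rng)
successes = 0
for order_id in range(1000):
    try:
        call_with_retries(service, order_id)
        successes += 1
    except ConnectionError:
        pass
print(f"{successes} of 1000 calls succeeded despite injected failures")
```

Seeding the random generator keeps the injected failures reproducible, so when a run exposes a weakness you can replay exactly the same sequence while you fix it.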

How should you structure and manage your team to get quick and safe development?

Four different types of teams – the book Team Topologies describes them as follows:

Value stream-aligned team – usually your dev team, aligned to a flow of work that delivers value in a business segment.
Platform teams – teams that provide an internal product to support and speed up the work of stream-aligned teams.
Enabling teams – help the stream-aligned team to handle obstacles.
Complicated subsystem teams – niche teams that require significant technical expertise.

Use the “two pizza team” model – at Amazon, they have a rule that no team should be bigger than you can feed with two pizzas. At some point, if the team is too big, half the team is going to tune out your daily standup and won’t care what’s being said. This isn’t good for team throughput or morale.

Create a Westrum generative culture – create a performance-oriented culture with good information flows. Your culture should be focused on your mission, where everything else is subordinated to good performance.

Create a culture of psychological safety – Google’s Project Aristotle study found that the highest-performing teams were those with the most psychological safety. This mattered more to performance than anything to do with the team’s pedigree. These teams could talk openly among themselves and come up with innovative ideas. Create that culture in your team and your organization.

Tolerate failure – you have to tolerate failure to run experiments and learn from them in order to improve.

What tech tools do you need to deliver software effectively?

Embrace the ecosystem that you’re working in – for example, if you use Amazon Web Services and it offers 35+ services around containers, deploying with containers will make your life a whole lot easier, because there’s a whole ecosystem supporting you.

Use the tool that fits best with the culture your company has or wants – I never recommend specific tools to anyone. The tools that fit well into how people already work are the ones that get adopted. There’s nothing worse than investing time and money into evaluating, installing, and subscribing to a tool that no one uses. Make the tools you pay for easy to use.

Relevant tool categories include:

CI/CD tools
Metrics and monitoring
Orchestration tools
Artifact repositories
Security scanners

Don’t run the tech yourself – just pay someone else to do it. So you run your own Jira? Great, your competitors plunk down a credit card every month and Jira gets run for them. If you’re so good at running Jira, you should just be a “Jira running company”.

What are the most important pieces to get right?

The goal isn’t to write software, it’s to solve business problems – collaborate to figure out what processes you can put in place, and what agreements you can reach, to ship software quickly and safely.

Remember your core responsibilities:

Keep the product up
Keep developers moving as fast as possible

What are the common pitfalls?

Organizations get stuck in their ways – they come up with all the reasons you can’t do something and are afraid to make changes. This leads to the fossilization of practices that aren’t really optimal.

People don’t realize they have permission to make changes – no leader will get mad about the business getting better, spending less money, or making more money. I tell all my clients, “I’m giving you permission to try this. I’m giving you permission to experiment to help improve the business.”