QA | 5 February 2016
The first rule of DevOps is that it's not about the tools.
The second rule of DevOps is... wait, that's not where this should be going...
DevOps is the latest idea about how to create and deploy software, and it's an entire cultural shift for companies, bringing an agile approach to the software development lifecycle. The tools are there to support that process. Without a culture change, to paraphrase a colleague of mine, all you have is a way of failing to deploy software even faster.
However, this post is all about the tools and how we can use some of them to build what is known as the pipeline.
In an ideal world there is very little human interaction in the pipeline. Code is fed in at one end and, after passing through several processes, is deployed at the other. Amazon have their pipeline configured to allow for over 50 million deployments a year. That is a lot of code being pushed out into production. Each company will have its own pipeline. There is no one true way of doing this, and new tools are being released all the time. XebiaLabs even produced the "Periodic Table of DevOps" to give you a clue about how many different options are out there.
The tools tend to fall into a series of categories:
- Source Control
- Build Tools
- Configuration Management
There are also several tools that cut across these categories, such as testing tools or tools for collaboration across teams. The term ChatOps is picking up pace; it refers to "conversation-driven development". Tools such as HipChat or Slack are used here to keep a team talking and working together, and there are numerous add-ons that keep the chat channel up to date with the build status and with tasks as they are completed.
Automation is key here; we do not want people involved in the middle stages of the pipeline if at all possible. People are rubbish. They will forget to do things, or decide it's too much hassle to run those tests at 6pm on a Friday and skip that step. We have computers now which are capable of doing boring, repetitive tasks for us, so why not let them?
The aim of the pipeline is to get something that is repeatable, trustworthy and reliable. It builds on the same idea as Test Driven Development: if we test everything and the tests pass, then the code is working as expected. When a developer submits some code, it should run through all the tests automatically; if it passes, it can be pushed through to deployment without worrying that it is about to bring down the entire system. The pipeline is only as good as the testing at each stage.
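The gating idea can be sketched in a few lines of Python. This is a minimal illustration, not any particular CI tool's API: each stage is assumed to be a plain function returning True or False, and the stage names are hypothetical. The first failing stage stops the run, so nothing after it (including deployment) happens.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a function that returns True (pass) or False (fail).
Stage = Tuple[str, Callable[[], bool]]

def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order, stopping at the first failure.

    Returns (succeeded, names of the stages that actually ran).
    """
    ran = []
    for name, stage in stages:
        ran.append(name)
        if not stage():
            print(f"Stage '{name}' failed; nothing further is deployed.")
            return False, ran
    return True, ran

if __name__ == "__main__":
    # Hypothetical stages; real ones would compile, run test suites, deploy, etc.
    stages = [
        ("compile", lambda: True),
        ("unit tests", lambda: True),
        ("integration tests", lambda: False),  # simulate a failing test run
        ("deploy", lambda: True),
    ]
    ok, ran = run_pipeline(stages)
    print(ok, ran)
```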
How do you test whether a server exists, though? The answer is a shift towards expressing everything as code: "Infrastructure as code" or "Security as code". If we can express how our infrastructure should be laid out in code, we have a record of what everything should look like for auditing. We can test that code. We can version control it. If something isn't working, the entire infrastructure can be rolled back to a previous version. "Security as code" means we can express security policies as checks that run against the code base, which reduces risk, as long as your tests are up to scratch.
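As a minimal sketch of "Security as code", assume the infrastructure is described in a version-controlled structure like the one below. The field names, rule names and list of restricted ports are all hypothetical, not the schema of any real tool. The check reports any rule that opens a restricted port to the whole internet, and could run in the pipeline like any other test.

```python
# Hypothetical infrastructure description, as it might be loaded from a
# version-controlled file. Field names are illustrative, not a real schema.
INFRASTRUCTURE = {
    "firewall_rules": [
        {"name": "web-https", "port": 443, "source": "0.0.0.0/0"},
        {"name": "admin-ssh", "port": 22, "source": "0.0.0.0/0"},
    ]
}

# Assumed policy: these ports must never be reachable from the whole internet.
RESTRICTED_PORTS = {22, 3389}

def policy_violations(infra):
    """Return the names of firewall rules that expose a restricted port publicly."""
    return [
        rule["name"]
        for rule in infra.get("firewall_rules", [])
        if rule["port"] in RESTRICTED_PORTS and rule["source"] == "0.0.0.0/0"
    ]

if __name__ == "__main__":
    # A pipeline step would fail the build if this list is non-empty.
    print("Security policy violations:", policy_violations(INFRASTRUCTURE))
```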
So about this pipeline...
Before any code is written we have the client and (usually) the Business Analyst (BA). They sit down, have a cup of tea, and chat about what the client is after. The aim of these meetings is to work out the requirements of the project and to manage the client's expectations. There is no point scoping out the Moon on a stick!
The BA will then go to the development team and hand over the requirements. It is important that they don't then walk away from the project. If the client does not want to be directly involved in talking to the developers, then the BA will need to act as a go-between. Even if this isn't needed, they should stay involved and help out with the project. A good set of requirements is one that can be tested automatically.
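For example, suppose (hypothetically) the client and BA agree that orders of £50 or more get free delivery and everything else pays £4.99. That requirement translates directly into checks that can run on every commit. The function and test names below are made up for illustration; a test runner such as pytest would also discover the `test_` functions automatically.

```python
from decimal import Decimal

# Hypothetical business rule agreed between the client and the BA.
FREE_DELIVERY_THRESHOLD = Decimal("50.00")
STANDARD_DELIVERY = Decimal("4.99")

def delivery_charge(order_total: Decimal) -> Decimal:
    """Delivery is free at or above the threshold, otherwise the standard charge."""
    if order_total >= FREE_DELIVERY_THRESHOLD:
        return Decimal("0.00")
    return STANDARD_DELIVERY

# The requirement, written as automated checks.
def test_orders_at_threshold_get_free_delivery():
    assert delivery_charge(Decimal("50.00")) == Decimal("0.00")

def test_orders_below_threshold_pay_standard_delivery():
    assert delivery_charge(Decimal("49.99")) == Decimal("4.99")

if __name__ == "__main__":
    test_orders_at_threshold_get_free_delivery()
    test_orders_below_threshold_pay_standard_delivery()
    print("Requirement checks passed")
```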
There are very few companies out there that can cope without some form of source control. Source control systems come in two types: Centralised or Distributed. The aim of source control is to have a single place where everyone can get a copy of the code for a project. Multiple people can work on it at the same time, adding their changes, and then all the changes are merged together in one place. Conflicts between different people's work are flagged at merge time so they can be sorted out straight away. Git is one of the most popular solutions for this. It uses a distributed model, meaning that there is no single point of failure, although this model can lead to more merge conflicts.
Source control is a key part of the pipeline, as this is where the developer submits code and tests into the pipeline. Everything required to build and test (and later, even deploy) a project should be stored in the source repositories.
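To make the idea of merging and conflicts concrete, here is a deliberately simplified three-way merge in Python. It assumes every version has the same number of lines, so it only handles in-place edits; real tools like Git use diff-based algorithms that also cope with insertions and deletions. A line changed on only one side is taken automatically, while a line changed differently on both sides becomes a conflict, marked in a Git-like style, for a person to resolve.

```python
from typing import List, Tuple

def three_way_merge(base: List[str], ours: List[str],
                    theirs: List[str]) -> Tuple[List[str], List[int]]:
    """Merge two edited copies of `base`, line by line.

    Simplifying assumption: all three versions have the same length, so only
    in-place line edits are handled. Returns (merged lines, conflict indices).
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles in-place line edits")
    merged, conflicts = [], []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:        # both sides agree, or neither changed the line
            merged.append(o)
        elif o == b:      # only "theirs" changed this line
            merged.append(t)
        elif t == b:      # only "ours" changed this line
            merged.append(o)
        else:             # both changed it differently: a human must decide
            merged.append(f"<<<<<<< ours\n{o}\n=======\n{t}\n>>>>>>> theirs")
            conflicts.append(i)
    return merged, conflicts

if __name__ == "__main__":
    print(three_way_merge(["a", "b", "c"], ["A", "b", "c"], ["a", "b", "C"]))
```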
When new code is submitted to the repository, this sets off the next step of the pipeline: the Build Process. Either the Git server sends a push notification (for example, a webhook) telling the build machine to start, or the build machine polls Git for changes.
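The polling style can be sketched as follows, assuming the build machine has the `git` command-line client available. `git ls-remote` prints one `<hash><TAB><ref>` line per matching ref, so the current head of a branch can be read without cloning. The `poll` helper and its parameters are hypothetical; a real build server would loop indefinitely. In the push style, the Git server would instead call the build server (for example via a webhook) whenever new commits arrive.

```python
import subprocess
import time
from typing import Callable, Optional

def parse_ls_remote(output: str, ref: str) -> Optional[str]:
    """Pick the commit hash for `ref` out of `git ls-remote` output."""
    for line in output.splitlines():
        sha, _, name = line.partition("\t")
        if name == ref:
            return sha
    return None

def remote_head(repo_url: str, ref: str = "refs/heads/master") -> Optional[str]:
    """Ask the remote which commit `ref` points to (requires the git CLI)."""
    out = subprocess.run(["git", "ls-remote", repo_url, ref],
                         capture_output=True, text=True, check=True).stdout
    return parse_ls_remote(out, ref)

def poll(get_head: Callable[[], Optional[str]],
         start_build: Callable[[str], None],
         last_built: Optional[str] = None,
         interval: float = 60.0, rounds: int = 1) -> Optional[str]:
    """Check the head `rounds` times, starting a build whenever it moves."""
    for i in range(rounds):
        head = get_head()
        if head is not None and head != last_built:
            start_build(head)
            last_built = head
        if i < rounds - 1:
            time.sleep(interval)
    return last_built
```

Passing `get_head` and `start_build` in as functions keeps the loop testable without a real repository; in use, `get_head` might be `lambda: remote_head("https://example.com/repo.git")`, where the URL is a placeholder.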
There are two types of tool grouped together in this category. The first are local build tools and dependency managers such as Maven, Gradle, sbt or npm. Then there are server-based solutions such as Jenkins, Travis CI or Bamboo.
Local build tools all work in a similar way. There is a file which describes how to build your project. Maven uses the project object model file, pom.xml, whereas sbt uses a build.sbt file. The contents are formatted differently, but you get the same effect. This build file describes how to compile the software, which dependencies are required and where to get them from. It can also hook in code style checks and report test coverage statistics. Importantly, these build tools always build your project in the same way, using the same building blocks. No more issues with different versions of libraries being used, as the build tool fetches the declared versions for you.
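To illustrate why pinned dependencies make builds repeatable, here is a small Python sketch of what a dependency manager does with a build description. The artifact names and versions are invented, and the sketch needs Python 3.9+ for `graphlib`. It rejects a build that pins two versions of the same library (stricter than Maven, which mediates by picking the nearest declaration), then orders artifacts so every dependency is built before anything that uses it.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical build description: every artifact is pinned to an exact
# version and lists the artifacts it depends on.
BUILD = {
    "my-app:1.0.0": ["web-framework:4.2.1", "json-lib:2.8.0"],
    "web-framework:4.2.1": ["logging-lib:1.7.3"],
    "json-lib:2.8.0": ["logging-lib:1.7.3"],
    "logging-lib:1.7.3": [],
}

def version_conflicts(build):
    """Return library names that are pinned to more than one version."""
    versions = defaultdict(set)
    for artifact, deps in build.items():
        for coordinate in [artifact, *deps]:
            name, version = coordinate.rsplit(":", 1)
            versions[name].add(version)
    return sorted(name for name, seen in versions.items() if len(seen) > 1)

def build_order(build):
    """Order artifacts so each dependency comes before anything that uses it."""
    conflicts = version_conflicts(build)
    if conflicts:
        raise ValueError(f"conflicting versions pinned for: {conflicts}")
    # TopologicalSorter takes {node: predecessors}; predecessors = dependencies.
    return list(TopologicalSorter(build).static_order())

if __name__ == "__main__":
    print(build_order(BUILD))
```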
Server-based build tools work alongside the local build tools. They notice when a change is made to the source repository, clone a copy of the code and run the Maven, Gradle or sbt jobs to build the project. They can also set off pre- or post-build steps, such as deploying to a server or informing another service that the built project is ready. Jenkins is one of the most popular options for build management; it was originally used mostly for Java projects, but it has a very active community behind it, so there is a plugin for just about any language, reporting system, test or deployment tool.
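A server-based build job can be reduced to a simple rule: run the pre-build steps and the build, and only run post-build steps (such as deployment or notifying another service) if everything before them succeeded. The sketch below is not Jenkins' API; it is a hypothetical Python runner that treats a zero exit code as success. In a real job the build command might be `mvn -B package`; Python one-liners stand in here so the example runs anywhere.

```python
import subprocess
import sys
from typing import List, Sequence

def run_step(cmd: Sequence[str]) -> bool:
    """Run one command; a zero exit code counts as success."""
    return subprocess.run(list(cmd)).returncode == 0

def run_job(pre: List[Sequence[str]], build: Sequence[str],
            post: List[Sequence[str]]) -> bool:
    """Run pre-build steps and the build; run post-build steps only if they all pass."""
    for cmd in [*pre, build]:
        if not run_step(cmd):
            return False  # post-build steps such as deployment are skipped
    # all() stops at the first failing post-build step.
    return all(run_step(cmd) for cmd in post)

if __name__ == "__main__":
    build = [sys.executable, "-c", "print('building...')"]
    notify = [sys.executable, "-c", "print('build ready, notifying deploy service')"]
    print("job passed:", run_job(pre=[], build=build, post=[notify]))
```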
After the project is built, the pipeline splits. Some companies deploy their code directly from the build server. Others send a notification to a config management tool to pick up the built code and deploy it. A new kid on the block is Containerisation.
Software containers are much like real shipping containers. A box does not care about its contents. If you have 10 boxes, all the same size, they will stack the same whether they contain the same thing or vastly different things. Containers for software are boxes which contain code and everything that code relies on to run correctly. They are small, self-contained units which can be loaded and unloaded on any machine that runs a compatible container engine. The most popular of these tools is Docker.
Docker builds a container image from a script known as the Dockerfile. This script is just a text file, so it can be stored along with the code in source control. Build systems like Jenkins can even set off the Docker build process as a post-build step. Containers are designed to run on any system with a compatible engine; they don't care about the infrastructure underneath or the content inside. This means that I can build a container on a Windows desktop machine (where Docker runs Linux containers inside a lightweight Linux VM), transfer it to a Linux machine in the Cloud, and it will run in the same way on both systems. All the files, libraries and settings the Container needs, including the operating system's userland, are inside it; only the kernel is shared with the host.
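A Jenkins post-build step that builds the image could be sketched as below. It assumes the `docker` CLI is installed on the build agent and that a Dockerfile (the one in the comment is a hypothetical example for a Java service) sits at the root of the checked-out repository. Tagging the image with the Git commit hash is one common convention for tracing a running container back to the code it came from; the image name is made up.

```python
import shutil
import subprocess
from typing import List

# Hypothetical Dockerfile kept in source control next to the code:
#
#   FROM openjdk:8-jre
#   COPY target/my-app.jar /app/my-app.jar
#   CMD ["java", "-jar", "/app/my-app.jar"]

def docker_build_command(image: str, commit: str, context: str = ".") -> List[str]:
    """Build the `docker build` invocation, tagging the image with a short commit hash."""
    return ["docker", "build", "-t", f"{image}:{commit[:12]}", context]

def post_build_docker_step(image: str, commit: str, context: str = ".") -> List[str]:
    """Run `docker build`; raises if Docker is missing or the build fails."""
    if shutil.which("docker") is None:
        raise RuntimeError("docker CLI not found on this build agent")
    cmd = docker_build_command(image, commit, context)
    subprocess.run(cmd, check=True)  # CalledProcessError on a non-zero exit
    return cmd
```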
The best practice with Containers is to make them as small as possible so they are quick to start or transfer between machines. Containers should be able to be started, stopped and removed at will, so no persistent data should be stored inside the Container; it belongs in a mounted volume or an external data store instead.
Docker and many other containerisation companies have started working on a standard for Containers through the Open Container Initiative, with runC as its reference runtime. This means that in the future you will be able to create Containers with one system and move them to another.
Then we have another split. Containers can be deployed directly (see Docker Machine, Swarm or Kubernetes) or they can be passed over to the config management tools.
Configuration Management is all about ensuring that servers (or other machines) are in the state they are expected to be in. Tools such as Chef, Puppet or Ansible are the key here (although, like everything, there are probably a dozen more being used!).
Config Management tools generally involve a master server of some variety, which holds the configuration for all the agents. There are two styles here: push and pull. Push-style systems send a notification to all the agents telling them to update, or to check that their current configuration is in line with what is required. Chef and Puppet operate a pull style, where the agent checks in every half an hour (by default) to see if there are any changes. Ansible, by contrast, pushes changes out over SSH and does not need an agent at all.
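The pull style boils down to an idempotent "converge" step: compare the desired state from the master with the machine's actual state and change only what differs. The sketch below is a simplified model, not how Chef or Puppet are implemented; state is assumed to be a flat dictionary of hypothetical resource names, and unlike real tools it never removes resources. A push-style master would apply the same `converge` logic to each agent when the configuration changes, rather than waiting for agents to check in.

```python
import time
from typing import Callable, Dict

State = Dict[str, str]

def converge(desired: State, actual: State) -> Dict[str, str]:
    """Bring `actual` in line with `desired`; return only what had to change.

    Running it again straight away changes nothing: this idempotence is what
    lets agents check in repeatedly without side effects.
    """
    changes = {key: value for key, value in desired.items() if actual.get(key) != value}
    actual.update(changes)
    return changes

def agent_loop(fetch_desired: Callable[[], State], actual: State,
               interval: float = 1800.0, runs: int = 1) -> None:
    """Pull-style agent: fetch the desired state every `interval` seconds."""
    for i in range(runs):
        changes = converge(fetch_desired(), actual)
        print(f"run {i + 1}: changed {sorted(changes) or 'nothing'}")
        if i < runs - 1:
            time.sleep(interval)

if __name__ == "__main__":
    desired = {"package:nginx": "installed", "service:nginx": "running"}
    agent_loop(lambda: desired, {"package:nginx": "absent"}, interval=0, runs=2)
```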
Again (and there is a pattern here), all the configuration is held in a series of text files. Chef recipes are written in a Ruby-based DSL, and Puppet manifests use a Ruby-like declarative language to express how a machine should be set up (see my previous blog post for how to get started with Puppet!). As these files are all just plain text, they can also be stored in source control, alongside the code, the tests and the Dockerfile.
There are modules for configuration management tools that also deal with the idea of Infrastructure as Code. This means even the number of servers, and how they are networked together, is expressed in files. There are different systems for different Clouds: AWS uses CloudFormation and Azure uses Azure Resource Manager templates. Both do the same job, describing the setup for each of the machines in the network. Other systems offer an abstraction layer on top of these, such as Terraform, which can link together systems in different clouds or even in your own data centres.
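As a small illustration of describing servers in files, the Python below generates a CloudFormation template declaring a chosen number of identical EC2 instances. The `AWSTemplateFormatVersion`, `Resources` and `AWS::EC2::Instance` keys are standard CloudFormation; the AMI ID is a placeholder, the logical names are hypothetical, and a real template would also describe networking and security groups. Changing `count` and committing the regenerated template makes the number of servers something you can version, review and roll back.

```python
import json

def web_tier_template(count: int, instance_type: str = "t2.micro",
                      image_id: str = "ami-12345678") -> dict:
    """Describe `count` identical EC2 instances as a CloudFormation template.

    `image_id` is a placeholder AMI ID; substitute a real one for your region.
    """
    resources = {
        f"WebServer{i}": {
            "Type": "AWS::EC2::Instance",
            "Properties": {"ImageId": image_id, "InstanceType": instance_type},
        }
        for i in range(1, count + 1)
    }
    return {"AWSTemplateFormatVersion": "2010-09-09", "Resources": resources}

if __name__ == "__main__":
    # The JSON output is what would be committed and handed to CloudFormation.
    print(json.dumps(web_tier_template(count=3), indent=2))
```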
There is a Docker module for Puppet which will pull and start up containers on a base system. The split tends to be between people running only microservices, who can use something like Docker throughout, and people running legacy systems or anything that stores data. Container and Config Management systems have a lot of crossover, but both are worth having in different situations.
The last stage of the pipeline is Monitoring. How do we know whether something is working without a system watching it and checking? Monitoring is often seen as a bit dull, something to leave until the end, but it is key to any successful pipeline. Changes can be made to the infrastructure via changes to text files, and the monitoring tools tell you whether a change was beneficial. They can also tell you when a machine is overloaded with requests, and then instruct the config management tools to create more machines and add them to the load balancer until the traffic calms down again.
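The scaling decision itself can be as simple as the sketch below, which assumes the monitoring system reports average CPU utilisation across the current machines. The thresholds and limits are hypothetical. Scaling one machine at a time, with a gap between the scale-out and scale-in thresholds, helps stop the cluster flapping back and forth; the returned number would be handed to the config management tool to act on.

```python
def desired_instances(current: int, avg_cpu: float,
                      scale_out_at: float = 75.0, scale_in_at: float = 25.0,
                      minimum: int = 2, maximum: int = 10) -> int:
    """Decide how many machines should sit behind the load balancer.

    `avg_cpu` is the average CPU utilisation (percent) across current machines.
    """
    if avg_cpu > scale_out_at:
        target = current + 1      # overloaded: add a machine
    elif avg_cpu < scale_in_at:
        target = current - 1      # mostly idle: remove a machine
    else:
        target = current          # within the comfortable band: do nothing
    return max(minimum, min(maximum, target))

if __name__ == "__main__":
    print(desired_instances(current=3, avg_cpu=90.0))
```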
Monitoring is also the first line of defence against something going wrong. In an ideal world, the monitoring system will pick up any problem before users have a chance to report that something is wrong. Then either automated systems can kick in to fix the problem, or a notification is sent to the people running the pipeline so they can take a look and resolve it.
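The "fix it automatically, otherwise tell a human" behaviour might look like this. The health check, remediation and notification are passed in as plain functions (in practice they might be an HTTP request to a health endpoint, a service restart and a chat or paging message); all names here are hypothetical.

```python
from typing import Callable

def check_and_respond(is_healthy: Callable[[], bool],
                      remediate: Callable[[], None],
                      notify: Callable[[str], None],
                      service: str) -> str:
    """Try automatic remediation first; alert a human only if that fails."""
    if is_healthy():
        return "ok"
    remediate()                      # e.g. restart the service
    if is_healthy():
        return "auto-fixed"
    notify(f"{service} is still unhealthy after automatic remediation")
    return "escalated"

if __name__ == "__main__":
    print(check_and_respond(lambda: False, lambda: None, print, "web-frontend"))
```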
The monitoring systems should be available for anyone to look at. The developer should be just as interested in why part of their program is taking a long time to complete as the database staff trying to speed up data retrieval and the ops staff trying to ensure 99.999% uptime on the system. These metrics can also feed back to the client and the Business Analyst as they sit down and talk about the next step of the pipeline. Maybe over another cup of tea.
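To put the 99.999% ("five nines") uptime target in perspective, assuming a 365.25-day year, the downtime it allows works out as:

```latex
\text{downtime per year} = (1 - 0.99999) \times 365.25 \times 24 \times 60\ \text{min}
                         = 10^{-5} \times 525\,960\ \text{min}
                         \approx 5.26\ \text{min}
```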
But there is more...
This pipeline doesn't cover everything. There are programs out there that will let you visualise and manage your pipeline. Repository management is a big deal for some companies, as they want to ensure that everything a project uses comes from a trusted, in-house source.
Even security can and should be part of the pipeline. Rather than waiting until the end of the process and having the security team sign off on a change, why not automate as many security checks as possible? There are tools available for some of the standard checks, and these should be integrated into the pipeline before anything is pushed through to deployment.
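As an example of an automated security check, the sketch below blocks the pipeline if any dependency matches a known advisory. The advisory data is hard-coded and invented purely for illustration; tools such as OWASP Dependency-Check do this job against real vulnerability databases.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical advisory data; in practice this comes from a vulnerability
# database or scanning tool rather than being hard-coded.
KNOWN_VULNERABLE: Dict[str, Set[str]] = {
    "json-lib": {"2.6.0", "2.7.1"},
    "logging-lib": {"1.2.0"},
}

def vulnerable_dependencies(deps: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (name, version) pairs that match a known advisory."""
    return sorted(
        (name, version)
        for name, version in deps.items()
        if version in KNOWN_VULNERABLE.get(name, set())
    )

def security_gate(deps: Dict[str, str]) -> bool:
    """Pipeline gate: True means the build may continue towards deployment."""
    findings = vulnerable_dependencies(deps)
    for name, version in findings:
        print(f"BLOCKED: {name} {version} has a known vulnerability")
    return not findings

if __name__ == "__main__":
    print(security_gate({"json-lib": "2.7.1", "web-framework": "4.2.1"}))
```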
The pipeline is just a tool, made up of other tools. It allows for faster, safer and more successful software delivery.