Stormtrooperguy: Automated Chef Cookbook Testing. How? Why? Help! Part 1 of many!

As part of the DevOps Promised Land, we decided that it was time to move Chef out of Operations and into every developer's hands.

When I started at my current job, we did pretty much everything manually. I spent most of my first year here building up a reasonable infrastructure management system based on Chef, and getting all 200+ hand-crafted EC2 instances rebuilt in an automated fashion.

Now that we've gotten pretty happy with where we are at, it's time to put the Dev in DevOps. No more opening tickets to add cookbook attributes! Self-service equals faster response time and less work for me. Let's do this!!

I use the term "fleet" throughout this post... you can replace that with stack, environment or whatever other term you use for a collection of servers that make up an instance of your application.

Before we get into details, let's talk about the high level goals:

All developers can modify cookbooks.
All cookbooks must have appropriate tests built in.
All cookbooks must have an automated "build" process that runs the tests at every commit.
No cookbook changes are pushed to the Chef server until they pass all of their tests.
Cookbook changes are applied to all fleets (including prod!) as soon as they pass their tests.

Here's where we were at the starting point of this project.

Our testing system was there, but it was manually run.

We were using the minitest-handler cookbook, and had reasonable test coverage. I won't lie to you and say that it was 100%, but it was close.

Every cookbook had a Vagrantfile, and used Berkshelf to maintain dependencies (via the vagrant-berkshelf plugin).

You made a change, and it was on you to vagrant up and make sure the tests passed. Once it was all good, you would commit the code.

Jenkins would get notified of the commit. It would check out the code, and do some REALLY basic tests:

knife cookbook test
foodcritic
knife testcoverage

That last one is a crappy little knife plugin I wrote that would look at each recipe and ensure that there was a _test.rb file associated with it. That doesn't mean there were any useful tests, just that you had at least taken the time to create the test file.

If everything passed, it would do a knife cookbook upload.
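
For the curious, here is a rough sketch of that kind of check in plain Ruby. This is not the actual plugin: the function name is made up for illustration, and the test directory is an assumption based on the usual minitest-handler layout, so adjust it to wherever your tests live.

```ruby
# Returns the names of recipes that have no matching <recipe>_test.rb file.
# Hypothetical helper, not the real knife plugin.
# test_dir is an assumption (a common minitest-handler location).
def recipes_missing_tests(cookbook_dir, test_dir: "files/default/tests/minitest")
  recipe_paths = Dir.glob(File.join(cookbook_dir, "recipes", "*.rb"))
  recipe_names = recipe_paths.map { |path| File.basename(path, ".rb") }

  # Keep only the recipes whose test file does not exist.
  missing = recipe_names.reject do |name|
    File.exist?(File.join(cookbook_dir, test_dir, "#{name}_test.rb"))
  end

  missing.sort
end
```

A Jenkins step could call this on the checked-out cookbook and fail the build if the returned list is not empty. Like the original, it only proves the file exists, not that the tests in it are any good.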

When I was the only developer writing code, this was a workable system. I knew what I changed, when I changed it, and what the impact would be. I could troubleshoot problems very quickly and push fixes within minutes of bug detection.

As the team grows, this falls apart:

The prod and qa environments are pinned to versions, but dev is running head. Broken code breaks all 5 dev fleets!
Cookbook versions aren't necessarily frozen. Who knows what "1.5.4" actually means? It's totally possible for prod to be "pinned" to a cookbook rev, then for someone to change the contents of that version!
What if I just upload directly from my workstation without running tests?

We started off by defining some rules that would help us get there:

Every cookbook change bumps the version number.
Every cookbook upload is frozen.
Every fleet has cookbook versions pinned (except for the chef test fleet).
Every cookbook should have a build script that encapsulates the logic for building and testing it. That way our Jenkins jobs are simply check out code / run build.sh

The knife-spork plugin by Etsy is a great way to manage all of the versioning parts of that list. I'll get into the details of how we use all of that later on.

Building a cookbook

Let's walk through a simple cookbook. We have one that we use internally for basic OS stuff: selinux, iptables, package management, etc...

It's a good one for the example because the tests are for the most part super easy:

package "wget" do
  action :install
end

Gets a test of:

it "Installs wget." do
  package("wget").must_be_installed
end

test-kitchen

At first we were using vagrant-berkshelf to tie things together. The 1.5.1 release of Vagrant broke the plugin, and a bit of reading made me feel that test-kitchen was a better choice anyway.

So first up was adding test-kitchen to the cookbook (and removing Vagrant):

rm Vagrantfile

kitchen init

And there you go! Now we need to customize the .kitchen.yml:

We use centos-6.4, so I took the ubuntu platforms entry out.
Add the runlist
Attributes.... uh-oh!!!

.kitchen.yml allows you to specify a list of Chef attributes to pass. But I don't want to list them in each cookbook! I have an attributes.json file on the Jenkins box that defines all of the credentials that would normally be in an encrypted data bag. Vagrant lets me simply do this:

dna = JSON.parse(File.read("/var/chef/attributes.json"))
chef.json.merge!(dna)

I looked at the test-kitchen docs, but didn't see anything specifically about including a file. I did catch though that the .kitchen.yml file is ERB parsed before being used. Perfect!

I converted attributes.json to YAML using one of the many tools out there, then added

attributes:
<%= File.read("/var/chef/attributes.yaml") %>

to my .kitchen.yml, and it promptly blew up! Since the attributes file is just getting inserted exactly at that point, the indentation has to match up. Shifting the data in attributes.yaml over by a couple of indentation levels did the trick (use spaces; YAML does not allow tabs for indentation).

kitchen test now works! It boots the VM and processes the runlist, including minitest-handler. We will eventually start using test-kitchen's built-in tests but for now we're OK with what we've got.
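
If you would rather not edit the attributes file by hand, one option is to indent it on the fly inside the ERB tag. The snippet below is a minimal sketch of that idea, not what we actually run: indent_block and render_kitchen_yaml are made-up helper names, the template is a trimmed-down stand-in for a real .kitchen.yml, and the sample attributes are placeholders for the contents of /var/chef/attributes.yaml.

```ruby
require "erb"
require "yaml"

# Indent every non-blank line by `spaces` spaces so the block can be
# nested under a YAML key. (Hypothetical helper; spaces only, since
# YAML does not allow tabs for indentation.)
def indent_block(text, spaces)
  pad = " " * spaces
  text.lines.map { |line| line.strip.empty? ? line : pad + line }.join
end

# Render an ERB template that can see attributes_yaml and indent_block.
def render_kitchen_yaml(template, attributes_yaml)
  ERB.new(template).result(binding)
end

# Placeholder for the contents of /var/chef/attributes.yaml.
ATTRIBUTES_YAML = <<~EOS
  mysql:
    root_password: example
EOS

# Trimmed-down, hypothetical .kitchen.yml fragment. The ERB tag sits at
# column 0, so the helper has to supply all of the indentation.
KITCHEN_TEMPLATE = <<~'EOS'
  suites:
    - name: default
      attributes:
  <%= indent_block(attributes_yaml, 6) %>
EOS
```

Rendering KITCHEN_TEMPLATE with render_kitchen_yaml and loading the result with YAML.safe_load should put the mysql hash under the suite's attributes key. In a real .kitchen.yml the same trick would mean defining the helper in an ERB block at the top of the file and calling it on File.read's output.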

build script

As I mentioned earlier, we want to have a standardized build script that would be used by all cookbooks. The content would change from cookbook to cookbook, but the contract would be the same:

Run whatever tests that cookbook gets
If the tests pass, upload the cookbook and exit zero
If the tests fail, exit non-zero.

Our standard set of tests includes:

knife cookbook test (basic syntax)
knife spork check --fail (make sure the version number has been bumped)
kitchen test
kitchen destroy
foodcritic
knife spork upload

To make sure that whoever is running this has a fair chance of success, we include a Gemfile in the cookbook with all of those tools in it. The build script starts off with a bundle install, so as dependencies change, the systems should stay up to date.

The shell script looks at the exit code of each command as it runs, and dies if any of them are non-zero.
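
Our real build script is shell, but the same fail-fast flow is easy to sketch in Ruby. Treat this as an illustration only: run_steps and BUILD_STEPS are names I made up, "base" is a hypothetical cookbook name, and the exact arguments each tool needs will depend on your setup.

```ruby
# Run each command in order and stop at the first failure.
# Returns 0 if every step succeeded, otherwise a non-zero exit code.
def run_steps(steps)
  steps.each do |command|
    puts "==> #{command.join(' ')}"
    ok = system(*command)
    next if ok

    # system returns nil when the command could not be started at all.
    return 127 if ok.nil?
    # exitstatus is nil if the process was killed by a signal.
    return $?.exitstatus || 1
  end
  0
end

COOKBOOK = "base" # hypothetical cookbook name

# The steps from the list above, preceded by the bundle install.
BUILD_STEPS = [
  %w[bundle install],
  %W[knife cookbook test #{COOKBOOK}],
  %W[knife spork check #{COOKBOOK} --fail],
  %w[kitchen test],
  %w[kitchen destroy],
  %w[foodcritic .],
  %W[knife spork upload #{COOKBOOK}]
].freeze

# A real script would end with:
#   exit run_steps(BUILD_STEPS)
```

Because the upload is the last step, it only happens if everything before it exited zero, which is the whole point of the contract.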

git-hooks

Every cookbook has a git-hooks directory with recommended scripts to either use or integrate into your existing tools.

The main one that all cookbooks have is a simple pre-commit hook to run knife spork check --fail. This makes sure that you don't try to upload a frozen cookbook version.

promoting the cookbooks to the fleets

Since we aren't yet running full integration tests, we don't have a really good way to prove that the newly published cookbook plays well with its other friends in the Chef playground.

For the moment, we are automatically promoting the fleets once a cookbook passes its tests and gets uploaded.

We have a Jenkins job that looks up the latest frozen version of each cookbook and promotes the fleets to that version. Since you don't get a frozen upload without passing your tests, that should be safe enough.
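
The selection logic in that job boils down to "highest version that is frozen". Here is a minimal sketch of just that part, with the Chef server lookup left out. The input shape (cookbook name mapped to a list of version/frozen hashes) and the function name are assumptions for the example, not what the Chef API hands back.

```ruby
require "rubygems" # for Gem::Version, which compares versions numerically

# Given { "name" => [{ version: "1.2.3", frozen: true }, ...] },
# return { "name" => "= <latest frozen version>" }.
# Cookbooks with no frozen version are left out entirely.
# (Hypothetical input shape, chosen for illustration.)
def latest_frozen_versions(cookbooks)
  cookbooks.each_with_object({}) do |(name, versions), pins|
    frozen = versions.select { |v| v[:frozen] }
    next if frozen.empty?

    newest = frozen.max_by { |v| Gem::Version.new(v[:version]) }
    pins[name] = "= #{newest[:version]}"
  end
end
```

Comparing with Gem::Version rather than plain strings matters here: as strings, "1.5.4" sorts after "1.10.0", which would pin the fleet to the wrong release.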

That also allows us to have in-flight cookbooks on the Chef server without fear: the fleets will only ever auto-promote to the latest frozen revision.

integration testing

Great! Our cookbook passed its tests and has been uploaded. All of the fleets are pinned to the older version though, so no one is seeing this change.

Now what?

Enter the test fleet.

We have one development environment that does not get anything pinned. It is used only by ops, just for testing new Chef code.

Go forth and deploy / run rubot / do whatever you need to do to feel happy on the test fleet.

Right now we are doing this by hand, but that will be next on the automation to-do list.

Stormtrooperguy

Thursday, March 20, 2014
