Monday, March 31, 2014

AWS, Chef and scaling a mixed Windows/Linux environment (Part 2)

The first pass



Our first pass was to get everything moved off of RightScale and into a system we managed. We had the technical skills to do this ourselves, so there was no need to give RightScale so much money!

After more than a month of evaluation and testing, we settled on Chef 10. Its cookbook system fit better into our overall development workflow, and its open source community was strong. Puppet was a serious contender too, and to this day I think either would have worked equally well.

But there can be only one, and the one was Chef.

Starting at the foundation



We began with a simple goal: we needed to be able to launch and minimally configure a base Windows or Linux system using the same set of tools.

By minimally configure, I mean:

System is online and accessible from our office or over our VPN
Users have their accounts with appropriate access
Tools that are used universally on all systems are present (sysstat on Linux, PowerShell on Windows, etc.)

Sounds easy, right?



It turns out there are a LOT of awesome open source tools for managing AWS infrastructure, but not many of them support Windows AND Linux equally. Narrow that down to tools that also understand Amazon Simple Workflow or other services outside of EC2 and S3, and the list gets even shorter. Even Chef required a different plugin (knife-windows) and a different bootstrapping syntax for Windows.

This led us to build a basic AWS management platform in-house. We used Python and Boto, and wrote our own tooling around security groups, S3 buckets, and the like.

At the end of this phase we had a config file for each "fleet" (our term for a stack or environment). This config file contained the core information needed to manage the fleet: AWS credentials, root SSH keys, things like that. From there you could initialize an entire fleet in a brand-new AWS account. All buckets, groups, instances, roles, etc. would be generated for you. Cool!
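The post doesn't show the fleet config format, so what follows is only a minimal sketch under stated assumptions: an INI file read with Python's configparser, invented section and key names, and a hypothetical "<fleet>-<thing>" naming convention for resources. `plan_fleet` only works out what should exist and skips anything already present; real tooling would hand each item to Boto calls that actually create it.

```python
import configparser

# Hypothetical fleet config; section and key names are illustrative only.
EXAMPLE_CONFIG = """
[fleet]
name = test
region = us-east-1
# Placeholder credentials and root SSH key pair name.
aws_access_key_id = AKIAEXAMPLE
aws_secret_access_key = example-secret
root_key_pair = test-root

[roles]
# role name = platform
portal = linux
worker = windows
"""


def load_fleet(text):
    """Parse a fleet config into a plain dict with a nested 'roles' mapping."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    fleet = dict(parser["fleet"])
    fleet["roles"] = dict(parser["roles"])
    return fleet


def plan_fleet(fleet, existing=frozenset()):
    """List (kind, name) resources the fleet needs, skipping ones that exist."""
    name = fleet["name"]
    wanted = [("s3_bucket", f"{name}-artifacts")]
    for role, platform in sorted(fleet["roles"].items()):
        wanted.append(("security_group", f"{name}-{role}"))
        wanted.append(("iam_role", f"{name}-{role}"))
        wanted.append(("instance", f"{name}-{role}-{platform}-1"))
    return [item for item in wanted if item not in existing]
```

Keeping the plan separate from the AWS calls makes it easy to dry-run against a new account and to rerun safely after a partial failure.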

But that just gets us instances. How did we make them do the work?



We set up a couple of initial cookbooks. Over time this has grown, but in the beginning we had company_system and company_mainproduct.

The system cookbook contained all the OS-level stuff: How do I install Java? Python? Which users should have access here?

The mainproduct cookbook contained everything about our app: How do I find the Grails WAR file? How should Tomcat be configured for this app? That level of thing.
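One way to picture how the two cookbooks layer is a Chef role whose run list applies the system cookbook before the product cookbook. This minimal sketch builds role definitions in Chef's JSON role format; the per-role recipe names (such as `company_mainproduct::portal`) are hypothetical.

```python
import json


def chef_role(name, product_recipe, description=""):
    """Build a Chef role: OS-level setup first, then the app's recipe."""
    return {
        "name": name,
        "description": description,
        "json_class": "Chef::Role",
        "chef_type": "role",
        "default_attributes": {},
        "override_attributes": {},
        # Chef applies the run list in order, so company_system runs first.
        "run_list": [
            "recipe[company_system]",
            # Hypothetical recipe name inside the product cookbook.
            f"recipe[company_mainproduct::{product_recipe}]",
        ],
    }


def role_json(role):
    """Serialize a role for saving to e.g. portal.json."""
    return json.dumps(role, indent=2)
```

A saved role file can then be uploaded with `knife role from file portal.json`, so Java, Python, and user accounts are in place before the app recipe goes looking for the WAR file.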

Deploying code was now as simple as a knife command:



knife ssh "role:portal" "sudo chef-client -o recipe[product::portal_deploy]"

Woo!

Oh... wait... Windows. Strange hostnames (long story, but it has to do with the lack of unique hostnames on the Windows boxes messing with Chef's node discovery) meant we couldn't rely on a role search, so we had to list the hosts manually with -m:

knife winrm -m hostname -x user -P password "chef-client -o recipe[product::worker_deploy]"

Hmm... remembering that will be tough.

Add deploy.py to the mix, which wraps the deployment logic for each app into one simple command:

./deploy.py --fleet test --artifact portal --branch master --build 12

And it sorts out the rest.
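The post doesn't include deploy.py itself, so here is a minimal sketch of the idea with everything it needs hard-coded. The `ARTIFACTS` table, the Windows host list, the per-fleet knife config path, and the `WINRM_USER`/`WINRM_PASSWORD` environment variables are all hypothetical. It picks `knife ssh` with a role search for Linux artifacts and `knife winrm` with a manual host list for Windows ones, mirroring the two commands above. How `--branch` and `--build` reach the deploy recipe isn't described, so this sketch only reports them.

```python
import argparse
import os
import shlex
import subprocess

# Hypothetical per-artifact settings: platform, Chef role, and deploy recipe.
ARTIFACTS = {
    "portal": {"platform": "linux", "role": "portal", "recipe": "product::portal_deploy"},
    "worker": {"platform": "windows", "role": "worker", "recipe": "product::worker_deploy"},
}

# Hypothetical Windows hosts per fleet; targeted by explicit list because
# Chef search could not reliably find the Windows nodes.
WINDOWS_HOSTS = {"test": ["10.0.1.15", "10.0.1.16"]}


def knife_command(fleet, artifact, user=None, password=None):
    """Build the knife command that runs the artifact's deploy recipe."""
    spec = ARTIFACTS[artifact]
    run = f"chef-client -o recipe[{spec['recipe']}]"
    # Hypothetical layout: one knife config per fleet, pointing at its Chef server.
    config = ["-c", f".chef/knife-{fleet}.rb"]
    if spec["platform"] == "linux":
        # Linux: find nodes with a Chef search query and run over SSH with sudo.
        return ["knife", "ssh", f"role:{spec['role']}", f"sudo {run}"] + config
    # Windows: pass a manual host list (-m) and run over WinRM.
    user = user or os.environ.get("WINRM_USER", "Administrator")
    password = password or os.environ.get("WINRM_PASSWORD", "")
    hosts = " ".join(WINDOWS_HOSTS[fleet])
    return ["knife", "winrm", "-m", hosts, run, "-x", user, "-P", password] + config


def main(argv=None):
    parser = argparse.ArgumentParser(description="Deploy an artifact to a fleet.")
    parser.add_argument("--fleet", required=True, help="fleet (stack/environment) name")
    parser.add_argument("--artifact", required=True, choices=sorted(ARTIFACTS))
    parser.add_argument("--branch", default="master")
    parser.add_argument("--build", type=int, required=True)
    parser.add_argument("--dry-run", action="store_true", help="print the command only")
    args = parser.parse_args(argv)

    cmd = knife_command(args.fleet, args.artifact)
    # Branch and build are only reported; the post doesn't say how the recipe gets them.
    print(f"Deploying {args.artifact} ({args.branch} #{args.build}) to fleet {args.fleet}")
    # Mask the WinRM password before echoing the command.
    shown = ["****" if i and cmd[i - 1] == "-P" else part for i, part in enumerate(cmd)]
    print(shlex.join(shown))
    if not args.dry_run:
        subprocess.run(cmd, check=True)
    return cmd
```

Calling `main(["--fleet", "test", "--artifact", "worker", "--build", "12", "--dry-run"])` prints the WinRM command without running it; putting `main()` behind an `if __name__ == "__main__":` guard turns it into the command-line script.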

Cool! Thus closes our first pass at Chef and automation.

Previous: Starting point
Next: Improvements
