May 28

The purpose of Continuous Deployment is to increase Quality and Efficiency,
see e.g. The Software Revolution behind LinkedIn’s Gushing Profits, or read on.

This posting presents an overview of Atbrox’ ongoing work on Automated Continuous Deployment. We develop in several languages depending on the project or product, e.g. C/C++ (typically with SWIG combined with Python, or combined with Objective-C), C#, Java (typically Hadoop/MapReduce-related) and Objective-C (iOS). But most of our code is in Python (together with HTML/JavaScript for frontends and APIs), so this posting primarily shows Python-centric continuous deployment with Jenkins (the total flow), plus some more detail on testing Tornado apps with Selenium.

Continuous Deployment of a Python-based Web Service / API

Many of the projects we develop involve creating an HTTP/REST or WebSocket API that, generically speaking, “does something with data” and has a corresponding UI in JavaScript/HTML. The typical building blocks of such a service are shown in the figure:

The flow is roughly as follows:

  1. An Atbrox developer submits code into a git repo (e.g. or repo)
  2. Jenkins picks up the change (by notification from git or by polling)
  3. Tests are run, e.g.
    py.test -v --junitxml=result.xml --cov-report html --cov-report xml --cov .
    1. Traditional Python unit tests
    2. Tornado web app asynchronous tests
    3. Selenium UI Tests (e.g. with PhantomJS or xvfb/pyvirtualdisplay)
    4. Various metrics, e.g. test coverage, lines of code (sloccount), code duplication (PMD) and static analysis (e.g. pylint or pychecker)
  4. If tests and metrics are ok:
    1. provision cloud virtual machines (currently AWS EC2) if needed with fabric and boto, e.g.
      fab service launch
    2. deploy to provisioned or existing machines with fabric and chef (solo), e.g.
      fab service deploy
  5. Fortunately, a happy customer (and Atbrox developer). Go to 1.
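
The gate in step 4 could be sketched as a small Python helper run by the Jenkins job: it executes the test stage, reads the overall line rate from the coverage.xml written by --cov-report xml (a Cobertura-style report whose root coverage element carries a line-rate attribute), and only runs the fab stages if both checks pass. This is a minimal sketch under assumptions, not our actual Jenkins job: the 80% threshold and the helper names are hypothetical, and the other metrics (sloccount, PMD, pylint) are left out.

```python
import subprocess
import xml.etree.ElementTree as ET

# Hypothetical threshold: deploy only if at least 80% of lines are covered.
MIN_LINE_RATE = 0.80

# Stages from the flow above, as (name, argv) pairs.
TEST_STAGES = [
    ("tests", ["py.test", "-v", "--junitxml=result.xml",
               "--cov-report", "html", "--cov-report", "xml", "--cov", "."]),
]
DEPLOY_STAGES = [
    # provisioning is only needed when new machines are required
    ("launch", ["fab", "service", "launch"]),
    ("deploy", ["fab", "service", "deploy"]),
]


def coverage_line_rate(path="coverage.xml"):
    """Return overall line coverage (0.0 to 1.0) from a Cobertura-style XML report."""
    return float(ET.parse(path).getroot().get("line-rate"))


def run_stages(stages):
    """Run (name, argv) stages in order; return the first failing name, or None."""
    for name, argv in stages:
        if subprocess.call(argv) != 0:
            return name
    return None


def gate_and_deploy(test_stages, deploy_stages, coverage_path="coverage.xml",
                    min_line_rate=MIN_LINE_RATE):
    """Run tests, check coverage, then provision/deploy; return a status string."""
    failed = run_stages(test_stages)
    if failed:
        return "failed: %s" % failed
    rate = coverage_line_rate(coverage_path)
    if rate < min_line_rate:
        return "coverage too low: %.2f" % rate
    failed = run_stages(deploy_stages)
    if failed:
        return "failed: %s" % failed
    return "deployed"
```

A Jenkins build step could call gate_and_deploy(TEST_STAGES, DEPLOY_STAGES) and fail the build unless it returns "deployed". Returning a status string instead of exiting keeps the helper easy to test.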

Example of a Selenium test of Tornado web apps with PhantomJS

Tornado is a Python-based app server that supports WebSocket and HTTP (it was originally developed by Bret Taylor while he was at FriendFeed). In addition to the Python-based Tornado apps, you typically write a mix of JavaScript code and HTML templates for the frontend. The following example shows how Selenium tests for Tornado can be run:

Utility methods for starting a Tornado application and picking a port for it

import os
import multiprocessing
import tornado.ioloop
import tornado.httpserver

def create_process(port, queue, boot_function, application, name,
                   instance_number, service):
    # wrap the server boot function in a child process (started by the caller)
    p = multiprocessing.Process(target=boot_function,
                                args=(queue, port,
                                      application, name,
                                      instance_number, service))
    return p

def start_application_server(queue, port, application, name,
                             instance_number, service):
    http_server = tornado.httpserver.HTTPServer(application)
    http_server.listen(port)
    actual_port = port
    if port == 0: # special case, an available port is picked automatically
        # only pick first! (for now)
        assert len(http_server._sockets) > 0
        for s in http_server._sockets:
            actual_port = http_server._sockets[s].getsockname()[1]
            break
    pid = os.getpid()
    ppid = os.getppid()
    print("INTERNAL: actual_port = %s" % actual_port)
    info = {"name": name, "instance_number": instance_number,
            "ppid": ppid, "pid": pid, "port": actual_port,
            "service": service}
    # report the actual port back to the parent process (e.g. the test)
    queue.put(info)
    tornado.ioloop.IOLoop.instance().start()

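The port-0 trick used in start_application_server is plain socket behaviour rather than anything Tornado-specific: binding to port 0 lets the operating system choose a free port, and getsockname() reveals which one it picked. A stdlib-only sketch (the helper name bind_free_port is our own):

```python
import socket


def bind_free_port(host="127.0.0.1"):
    """Bind a listening TCP socket to port 0 and return (socket, assigned_port).

    Port 0 asks the operating system for any currently free port;
    getsockname() then reports which port was actually assigned.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))
    sock.listen(5)
    return sock, sock.getsockname()[1]
```

Tornado ships a similar helper, tornado.testing.bind_unused_port(), whose socket can be handed to HTTPServer.add_sockets() instead of reading the private _sockets attribute.
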
Example Tornado HTTP Application Class with an HTML form

import tornado.web

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        html = """
<head><title>form title</title></head>
<form name="input" action="http://localhost" method="post" id="formid">
Query: <input type="text" name="query" id="myquery">
<input type="submit" value="Submit" id="mybutton">
</form>
"""
        self.write(html)

    def post(self):
        self.write("post returned")

Selenium unit test for the above Tornado class

import multiprocessing
import unittest
import tornado.web
from selenium import webdriver

class MainHandlerTest(unittest.TestCase):
    def setUp(self):
        self.application = tornado.web.Application([
            (r"/", MainHandler),
        ])
        self.queue = multiprocessing.Queue()
        self.server_process = create_process(0, self.queue, start_application_server,
                                             self.application, "mainapp", 123, "myservice")
        self.server_process.start()
        self.driver = webdriver.PhantomJS('/usr/local/bin/phantomjs')

    def testFormSubmit(self):
        # the server process reports its dynamically assigned port via the queue
        data = self.queue.get(timeout=10)
        url = "http://localhost:%s" % (data['port'])
        self.driver.get(url)
        assert "form title" in self.driver.title
        # since the port is dynamically assigned, the form action must be updated with it
        self.driver.execute_script(
            "document.getElementById('formid').action='%s'" % url)
        self.driver.find_element_by_id("myquery").send_keys("a random query")
        # submit the form and check the response from MainHandler.post
        self.driver.find_element_by_id("mybutton").click()
        assert 'post returned' in self.driver.page_source

    def tearDown(self):
        self.driver.quit()
        self.server_process.terminate()
        self.server_process.join()

if __name__ == "__main__":
    unittest.main()

This posting has given an overview of Atbrox’ (in-progress) Python-centric continuous deployment setup, with some more detail on how to test Tornado web apps with Selenium. There are lots of inspirational and relatively recent articles and presentations about continuous deployment; in particular we recommend checking out:

  1. Etsy’s slideshare about continuous deployment and delivery
  2. the Wired article about The Software Revolution Behind LinkedIn’s Gushing Profits
  3. Continuous Deployment at Quora

Please let us know if you have any comments or questions (comments to this blog post or mail to

Best regards,
The Atbrox Team

Side note: We’re proponents of (and bullish on) Python, and it is inspirational to observe that several major Internet/mobile startups and companies use it for their backend development, e.g. Instagram, Path, Quora, Pinterest, Reddit, Disqus, Mozilla and Dropbox. The largest Python-based backends probably serve more traffic than 99.9% of the world’s web and mobile sites, so Python’s capacity is usually sufficient for most projects.

Oct 01

A while back I wrote about How to combine Elastic Mapreduce/Hadoop with other Amazon Web Services. This posting is a small update to that, showing how to deploy extra packages with Boto for Python. Note that Boto can deploy mappers and reducers written in any language supported by Elastic MapReduce. The example below can also be found on GitHub (check it out with git clone).

Imports and connection to elastic mapreduce on AWS

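#!/usr/bin/env python
import os
import time
import boto
import boto.emr
from boto.emr.step import StreamingStep
from boto.emr.bootstrap_action import BootstrapAction

# set your aws keys and S3 bucket, e.g. from environment or .boto
# (the environment variable names below are assumptions)
AWSKEY = os.environ["AWS_ACCESS_KEY_ID"]
SECRETKEY = os.environ["AWS_SECRET_ACCESS_KEY"]
S3_BUCKET = os.environ["S3_BUCKET"]

conn = boto.connect_emr(AWSKEY, SECRETKEY)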

Creating the bootstrap step
In this case the bootstrap action is a shell script from S3. Note that such a script can contain sudo commands to do apt-get installs, e.g. to install classic programming language packages like gfortran or open-cobol, or more modern languages like ghc6 (Haskell). It can also run arbitrary code, e.g. check out the latest version of a programming language interpreter or compiler (e.g. Google Go with hg clone -r release $GOROOT) and compile it before you use it in your mappers or reducers.

bootstrap_step = BootstrapAction("download.tst", "s3://elasticmapreduce/bootstrap-actions/",None)

Creating the map and reduce processing step
Using cache_files also makes a Python library (here boto) available for import. An alternative is to run sudo easy_install boto in the bootstrap step, which would be easier, since the boto module then wouldn’t have to be unpacked manually in the Python code (see my previous posting for details about unpacking). Note that the mapper and reducer can be written in any language, as long as they are precompiled or a compiler or interpreter for that language has been installed with the bootstrap step.

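step = StreamingStep(
  name='wordcount',  # assumed step name
  # assumed mapper/reducer/input: the Elastic MapReduce word count sample
  mapper='s3n://elasticmapreduce/samples/wordcount/wordSplitter.py',
  reducer='aggregate',
  input='s3n://elasticmapreduce/samples/wordcount/input',
  cache_files = ["s3n://" + S3_BUCKET + "/boto.mod#boto.mod"],
  output='s3n://' + S3_BUCKET + '/output/wordcount_output')

jobid = conn.run_jobflow(
    name="wordcount with bootstrap action",  # assumed job flow name
    log_uri="s3://" + S3_BUCKET + "/logs",
    steps = [step],
    bootstrap_actions = [bootstrap_step])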

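On the mapper side, the part after # in a cache_files entry is the name of a symlink that Hadoop creates in the task’s working directory, so the cached file shows up as ./boto.mod. If boto.mod is a zip archive of the boto package (an assumption here; the previous posting describes the actual packaging), a mapper could make it importable with a sketch like the following. The helper name load_cached_zip and the cached_libs directory are hypothetical.

```python
import os
import sys
import zipfile


def load_cached_zip(archive="boto.mod", target_dir="cached_libs"):
    """Unpack a cached zip archive once and put the result on sys.path.

    Assumes `archive` is a zip file whose top level contains the package
    directory (e.g. boto/). Returns the directory added to sys.path.
    """
    target = os.path.abspath(target_dir)
    if not os.path.isdir(target):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
    if target not in sys.path:
        sys.path.insert(0, target)
    return target
```

A mapper would call load_cached_zip() before import boto. Installing boto in the bootstrap step instead avoids this step entirely.
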
Wait for the job to finish
This polls the Elastic MapReduce job flow and prints its status until it finishes; between STARTING and RUNNING, the job passes through the BOOTSTRAPPING state.

state = conn.describe_jobflow(jobid).state
print("job state = %s" % state)
print("job id = %s" % jobid)
# stop on any terminal state, not only on success
while state not in (u'COMPLETED', u'FAILED', u'TERMINATED'):
    time.sleep(30)  # poll interval, avoids hammering the EMR API
    print(time.localtime())
    state = conn.describe_jobflow(jobid).state
    print("job state = %s" % state)
    print("job id = %s" % jobid)

print("final output can be found in s3://" + S3_BUCKET + "/output/wordcount_output")
print("try: $ s3cmd sync s3://" + S3_BUCKET + "/output/wordcount_output .")

Validation of what really happened
One way to validate is to check that mappers and reducers written in a language whose compiler or interpreter you installed with the bootstrap action actually run, e.g. the classic MapReduce word count written in classic languages like Cobol or Fortran 95. Another way is to check the S3 logs; the log directory for an Elastic MapReduce job has the following subdirectories:

daemons  jobs  node  steps  task-attempts

In the node directory, each EC2 instance used in the job has a directory, and underneath each of them there is a bootstrap_actions directory with the master.log and the stderr, stdout and controller logs. For the job presented above, the bootstrap output is shown below.
stderr output

--2010-10-01 17:38:38--
Connecting to||:80... connected.
HTTP request sent, awaiting response... 
  HTTP/1.1 200 OK
  x-amz-id-2: NezTUU9MIzPwo72lJWPYIMo2wwlbDGi1IpDbV/mO07Nca4VarSV8l7j/2ArmclCB
  x-amz-request-id: 3E71CC3323EC1189
  Date: Fri, 01 Oct 2010 17:38:39 GMT
  Last-Modified: Thu, 03 Jun 2010 01:57:13 GMT
  ETag: "47a007dae0ff192c166764259246388c"
  Content-Type: application/octet-stream
  Content-Length: 153
  Connection: keep-alive
  Server: AmazonS3
Length: 153 [application/octet-stream]
Saving to: `file.tar.gz'

     0K                                                       100% 24.3M=0s

2010-10-01 17:38:38 (24.3 MB/s) - `file.tar.gz' saved [153/153]

controller output

2010-10-01T17:38:35.141Z INFO Fetching file 's3://elasticmapreduce/bootstrap-actions/'
2010-10-01T17:38:38.411Z INFO Working dir /mnt/var/lib/bootstrap-actions/1
2010-10-01T17:38:38.411Z INFO Executing /mnt/var/lib/bootstrap-actions/1/
2010-10-01T17:38:38.936Z INFO Execution ended with ret val 0
2010-10-01T17:38:38.938Z INFO Execution succeeded

This posting has shown how to programmatically install packages (e.g. programming languages) on EC2 nodes running Elastic MapReduce. Since Elastic MapReduce in streaming mode supports any programming language, this can make it easier to deploy and test mappers and reducers written in your favorite language, and even to automate it. (It also opens a few doors for parallelizing legacy code.)
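
As a baseline when comparing mappers written in other languages, the classic word count mapper for streaming mode can be sketched in Python. This version assumes the job uses Hadoop’s built-in aggregate reducer (in boto, reducer='aggregate'), which sums the values of keys prefixed with LongValueSum:. The tokenizer and function names are illustrative choices, not the code of any particular EMR sample.

```python
import re
import sys

# simple tokenizer (an assumption): runs of word characters and apostrophes
WORD = re.compile(r"[\w']+")


def map_line(line):
    """Yield one 'LongValueSum:<word>\t1' record per word in the line."""
    for word in WORD.findall(line.lower()):
        yield "LongValueSum:%s\t1" % word


def run_mapper(stdin=None, stdout=None):
    """Streaming mappers read records on stdin and write key<TAB>value lines to stdout."""
    stdin = stdin or sys.stdin
    stdout = stdout or sys.stdout
    for line in stdin:
        for record in map_line(line):
            stdout.write(record + "\n")
```

In an actual mapper script the last line would simply call run_mapper(); the same input/output contract applies to a Cobol or Fortran mapper.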

Atbrox on LinkedIn

Best regards,
Amund Tveit, co-founder of Atbrox

Sep 07

We are here to help you:

  • Understand if and how the cloud can be cost-efficient in your setting
  • Efficiently analyze large data sets using the cloud
  • Architect, develop and deploy scalable and reliable software for the cloud
  • Adapt and migrate your existing data and software to the cloud

Technologies and methods we (non-exclusively) use:

Our motto is Simplicity, Automation and Scalability

If you are considering using cloud computing, please drop us a line to info (at)
