Distributed Tracer Bullet Development

Tracer Bullet Development


Tracer Bullet Development means finding the major “moving parts” of a software system and starting by writing just enough code to make those parts interact in a realistic manner (e.g. with direct API calls, websockets, or REST APIs), and then, as the system grows (with actual functionality and not just interaction), keeping the “tracer ammunition” flowing through the system by changing the internal interaction APIs only if needed.

Motivation for Tracer Bullet Development

  1. integration is the hardest word (paraphrase of an old tune)
  2. prevent future integration problems (working internal APIs from the start)
  3. have a working system at all times (though limited in the beginning)
  4. create non-overlapping tasks for software engineers (good management)

(Check out the book: Ship it! A Practical Guide to Successful Software Projects for details about this method)

Examples of Distributed Tracer Bullet Development

Let us assume you have a team of 10 excellent software engineers who have never worked together before, and at the same time the task of creating a first, working version of a distributed (backend) system within a short period of time.

How would you solve the project and efficiently utilize all the developers? (i.e. there is no time for a meet-and-greet offsite)

Splitting the work properly with tracer bullet development could be a start; let’s look at how it could be done for a few examples:

Example 1. Massively Multiplayer Online Games
Massively Multiplayer Online Games – e.g. Zynga’s Farmville, Linden Lab’s Second Life, and BioWare/LucasArts’ Star Wars: The Old Republic – are complex distributed systems. So what could a high-level tracer bullet architecture for such a game look like? Services that might be needed are:

  1. GameWorldService – to deal with the game world; assume its basic function is returning a graphic tile for a position x, y, z
  2. GameArtifactService – to deal with the state of various “things” in the world (e.g. weapons/utilities), such as the growth of plants
  3. GameEconomyService – to deal with the overall in-game economy and trade
  4. AvatarService – to deal with player avatars and non-player characters (monsters/bots), i.e. active entities that operate in the GameWorldService and can alter the GameArtifactService
  5. LogService – to log what happens in the game
  6. StatService – calculates/monitors various statistics about the game
  7. AIService – e.g. used by non-player characters for reasoning
  8. UserService – to deal with users (profiles, login/passwords etc., metainfo++)
  9. GameStateService – overall game state
  10. ChatService – for interaction between players
  11. ClientService – to deal with the various software clients users use, e.g. ipad client, pc client
  12. CheatMalwareDetectionService – there is always someone looking to exploit a game

That is already more services (12) than software engineers (10), but let us create the beginning of a draft of a tracer bullet definition in a json-like manner:

tracerbullets = {
"GameWorldService":{
  "dependencies":["GameStateService","LogService"],
  "defaultresponse":{"tiledata_as_json_base_64": ".."},
  "loadbalancedserveraddress":"gameworld.gamecomp.com"},

"GameArtifactService":{
  "dependencies":["GameStateService","GameWorldService"],
  "defaultresponse":{"artifactinfo": ".."},
  "loadbalancedserveraddress":"gameartifacts.gamecomp.com"},

"AvatarService":{
},

"GameEconomyService":{
}
}

Atbrox’ (internal) Tracer Bullet Development Tool – BabelShark
The game example resembles RPC (e.g. Avro) and various deployment type definitions (e.g. Chef or Puppet), but it is focused on specifying just enough information (and not more) to get the entire system (empty, with default responses) up and running with its appropriate host names (it can be run on one machine for testing, with either minor /etc/hosts file changes or a local dns server). When the system is running, each service appends the default responses it receives from its dependencies to its own default response, so one can trace the path of e.g. REST/HTTP or websocket calls through the system (e.g. if a call to the GameWorldService uses both GameStateService and LogService as above, this will show up in the resulting json from GameWorldService). As the (mock-like) default responses are gradually replaced with real services, everything can be run as before, and when the real services are properly deployed you just remove the entries in /etc/hosts or the local dns server to get real data. Proxying external services (e.g. Amazon Web Services) can be done in a similar manner. Overall this can make it easier to bridge the development situation with the deployment situation.
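To make the response-appending idea concrete, here is a minimal sketch of one such service (in the spirit of the approach, not the actual tool), using Tornado’s plain HTTP support; the "service"/"defaultresponse"/"received" keys and the gamestate/log host names are assumptions for illustration:

import json
import tornado.ioloop
import tornado.web
from tornado.httpclient import AsyncHTTPClient

# hypothetical dependency endpoints (would come from the tracer bullet definition)
DEPENDENCIES = ["http://gamestate.gamecomp.com/", "http://log.gamecomp.com/"]
DEFAULT_RESPONSE = {"tiledata_as_json_base_64": ".."}

class TracerHandler(tornado.web.RequestHandler):
    async def get(self):
        response = {"service": "GameWorldService",
                    "defaultresponse": DEFAULT_RESPONSE,
                    "received": []}
        client = AsyncHTTPClient()
        for url in DEPENDENCIES:
            # append each dependency's (default) response to our own, so a
            # single call shows the whole call path through the system
            dependency_reply = await client.fetch(url)
            response["received"].append(json.loads(dependency_reply.body))
        self.write(response)  # Tornado serializes the dict to json

if __name__ == "__main__":
    tornado.web.Application([(r"/", TracerHandler)]).listen(8888)
    tornado.ioloop.IOLoop.current().start()

Calling GameWorldService then returns its own default response with the GameStateService and LogService defaults nested under "received" – which is exactly the trace.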

In Atbrox we have an internal tool called BabelShark that takes an input tracer bullet definition (json) and creates Python-based tracer bullet system code (using Bret Taylor’s Tornado websocket support), as well as a corresponding websocket command-line client and javascript/html clients, for ease of testing all components in the system. Technically it spawns one tornado process per service (or per instance of a service, if there is more than one), dynamically finds available port numbers and communicates them back, creates a new /etc/hosts file with the requested host names per service (all pointing to localhost), and creates a kill shell script (note: you quickly get a lot of processes this way, so even if the multicores are humming you can quickly overflow them; it is nice to be able to kill them all).
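A rough sketch of the spawning part could look like this (hypothetical code, not BabelShark itself; it assumes a generated service.py that starts one tornado service on a given port):

import socket
import subprocess

# abbreviated tracer bullet definition (see the full one above)
tracerbullets = {
    "GameWorldService": {"loadbalancedserveraddress": "gameworld.gamecomp.com"},
    "GameArtifactService": {"loadbalancedserveraddress": "gameartifacts.gamecomp.com"},
}

def free_port():
    # bind to port 0 and let the OS pick an available port
    s = socket.socket()
    s.bind(("", 0))
    port = s.getsockname()[1]
    s.close()
    return port

hosts_lines, kill_lines = [], []
for name, definition in tracerbullets.items():
    port = free_port()
    proc = subprocess.Popen(["python", "service.py", name, str(port)])
    # every service host name points to localhost while developing
    hosts_lines.append("127.0.0.1 %s" % definition["loadbalancedserveraddress"])
    kill_lines.append("kill %d" % proc.pid)

open("hosts.generated", "w").write("\n".join(hosts_lines) + "\n")
open("killall.sh", "w").write("#!/bin/sh\n" + "\n".join(kill_lines) + "\n")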

Example 2. Search Query Handling
The prerequisite for search is the query, and a key task is to (quickly) understand the user’s intention with the query (before actually doing anything with the query, such as looking up results in an index).

A few questions need to be answered about the query: 1) what is the language of the query? 2) is the query spelled correctly in the given language? 3) what is the meaning of the query? 4) does the query have an ambiguous meaning (either wrt. language or interpretation)? 5) what is the most likely meaning among the ambiguous ones? So what could a tracer bullet definition for this look like?

tracerbullets = {
"LanguageDeterminator":{
  "dependencies":["KlingonClassifier", "EnglishClassifier"],
  "defaultresponse":{"sortedlanguagesprobabilities":[{1.0:"English"}]}
},

"SpellingIsCorrect":{
   "dependencies":["LanguageDeterminator","KlingonSpellChecker"],
   "defaultresponse":{"isitspelledcorrectly":"yes"}
},

"MeaningDetermination":{
   "dependencies":["LanguageDeterminator", "NameEntityDeterminator"],
   "defaultresponse":{"meaning":"just a string with no entities"}
},

"Disambiguator": {
   "dependencies":["MeaningDetermination", ".."],
   # specialized for the query: Turkey - is it about the country
   # or about food (i.e. right before thanksgiving)?
   "defaultresponse":{
    "disambiguatedprobability":[{0.9:"country"},{0.1:"bird"}]
   }
}
}
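With the response-appending tracing described above, a single call to the Disambiguator could then return something like this (hypothetical output, using the same "service"/"defaultresponse"/"received" keys as in the earlier sketch):

{"service": "Disambiguator",
 "defaultresponse": {"disambiguatedprobability": [{0.9: "country"}, {0.1: "bird"}]},
 "received": [
   {"service": "MeaningDetermination",
    "defaultresponse": {"meaning": "just a string with no entities"},
    "received": [
      {"service": "LanguageDeterminator",
       "defaultresponse": {"sortedlanguagesprobabilities": [{1.0: "English"}]},
       "received": [".."]}]}]}

i.e. the nesting directly mirrors the dependency chain the query followed.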

Conclusion

I have given an overview of tracer bullet development for a couple of distributed system cases, and have also described how our internal tool (BabelShark) supports Distributed Tracer Bullet Development.

If you are keen to learn more and work with us here at Atbrox, please check out our jobs page. Atbrox is a bootstrapped startup working on big data (e.g. hadoop and mapreduce) and search (we also work with, and own parts of, a few other tech startups).

Best regards,

Amund Tveit (@atveit)
Atbrox (@atbrox)


Workshop: Mapreduce’12 – 3rd International Workshop on Mapreduce and its Applications

If you are interested in Mapreduce or Hadoop I recommend submitting to or attending the following workshop.

The Third International Workshop on MapReduce and its Applications (MAPREDUCE’12)
June 18-19, 2012 HPDC’2012, Delft, the Netherlands.
http://graal.ens-lyon.fr/mapreduce/

SCOPE
=====

Since its introduction in 2004 by Google, MapReduce has become the programming
model of choice for processing large data sets. MapReduce borrows from functional
programming, where a programmer can define both a Map task that maps a data set
into another data set, and a Reduce task that combines intermediate outputs into
a final result. Although MapReduce was originally developed for use by web
enterprises in large data-centers, this technique has gained a lot of attention
from the scientific community for its applicability in large parallel data
analysis (including geographic, high energy physics, genomics, etc.).

The purpose of the workshop is to provide a forum for discussing recent advances,
identifying open issues, introducing developments and tools, and presenting
applications and enhancements for MapReduce (or very similar) systems. We
therefore cordially invite contributions that investigate these issues, introduce
new execution environments, apply performance evaluations and show the
applicability to science and enterprise applications.

TOPICS OF INTEREST
==================

– MapReduce implementation issues and improvements
– Implementation optimization for GPU and multi-core systems
– Extensions to the programming model
– Large-scale MapReduce (Grid and Desktop Grid)
– Use of CDN and P2P techniques
– Heterogeneity and fault-tolerance
– Scientific data-sets analysis
– Data and compute-intensive applications
– Tools and environments for MapReduce
– Algorithms using the MapReduce paradigm

PAPER SUBMISSIONS
=================

Authors are invited to submit full papers of at most 8 pages, including all
figures and references. Papers should be formatted in the ACM proceedings style
(e.g., http://www.acm.org/sigs/publications/proceedings-templates). Submitted
papers must be original work that has not appeared in and is not under
consideration for another conference or a journal. Accepted papers will be
published by ACM in the conference workshops proceedings.

Papers should be submitted here: TO BE ANNOUNCED ON WEBSITE

IMPORTANT DATES
===============

– Manuscript submission deadline : February 25, 2012
– Acceptance notification : March 26, 2012
– Camera-ready paper deadline : April 16, 2012
– Workshop dates : June 18-19, 2012

ORGANIZATION COMMITTEE
======================

General Chairs
==============

– Gilles Fedak, INRIA/LIP (contact: gilles.fedak@inria.fr)
– Geoffrey Fox, Indiana University (contact: gcf@cs.indiana.edu)

Program Chair
=============

Simon Delamare, INRIA/LIP (contact: simon.delamare@inria.fr)

Publicity chair
===============

Haiwu He, INRIA/LIP (contact: haiwu.he@inria.fr)

PROGRAM COMMITTEE
=================

– Alexandre de Assis Bento Lima, Federal University of Rio de Janeiro
– Amund Tveit, Atbrox
– Carlo Mastroianni, ICAR-CNR
– Christian Engelmann, Oak Ridge National Laboratory
– Francisco V. Brasileiro, Federal University of Campina Grande
– Frédéric Suter, IN2P3/CNRS
– Gabriel Antoniu, INRIA
– Heithem Abbes, Faculty of Sciences of Tunis
– Heshan Lin, Virginia Polytechnic Institute and State University
– Hidemoto Nakada, AIST
– Jerry Zhao, Google (tech lead for Google’s Mapreduce team, sorting PetaBytes)
– José A.B. Fortes, University of Florida
– Judy Qiu, Indiana University
– Michael C. Schatz, Cold Spring Harbor Laboratory
– Oleg Lodygensky, CNRS
– Shantenu Jha, Louisiana State University
– Xuanhuan Shi, Huazhong University of Science and Technology
– Yang Yang, Netflix

Best regards,
Amund Tveit, Atbrox


Workshop: Searching for fun

If you are interested in search I recommend submitting a paper to or attending the Searching 4 fun workshop* (I just joined as a program committee member), which is going to be held in Barcelona in April 2012.

Call for Papers
The topics of the workshop will be evaluation-focused and include, but are not limited to:

  • Understanding information needs and search behaviour in casual-leisure situations.
  • How existing systems are used in casual-leisure searching scenarios.
  • Systems / Interfaces / Algorithmic approaches to supporting Search in Casual-leisure situations.
  • Use of Recommender Systems for Entertaining Content (books, movies, videos, music, websites).
  • Modelling of users’ interests and generation of accurate and appropriate user profiles.
  • Interfaces for exploratory search for casual-leisure situations.
  • Evaluation (methods, metrics) of Casual-leisure searching situations.

We are seeking short 2-4 page position papers in this area, or short papers reporting early or formative results in the area of searching for fun.

Reviewing and Publishing

Papers will be reviewed by an international program committee, and we intend to publish the accepted papers using the CEUR Workshop Proceedings service.

Submissions

Participants should submit anonymised 2-4 page ACM format PDFs to the easychair page. More details are in our submissions page.

Twitter
We’ll be using the #search4fun hashtag where possible
(ECIR Conference Twitter account is @ecir2012)

Link to web pages for the workshop: fitlab.eu/searching4fun/cfp.php

(* at the 34th European Conference on Information Retrieval – ECIR 2012)

Best regards,
Amund Tveit


Mapreduce & Hadoop Algorithms in Academic Papers (5th update – Nov 2011)

The prior update of this posting was in May, and a lot has happened related to Mapreduce and Hadoop since then, e.g.: 1) big software companies have started offering hadoop-based software (Microsoft and Oracle), 2) Hadoop startups have raised record amounts of funding, and 3) the nosql landscape is becoming increasingly datawarehouse’ish and sql’ish, with the focus on high-level data processing platforms and query languages.

Personally I have rediscovered Hadoop Pig, and combining it with UDFs and streaming is now my primary way to implement mapreduce algorithms here at Atbrox.
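As an illustration of the streaming part, the classic word-count mapper and reducer for Hadoop Streaming might look like this in Python (a minimal sketch; Pig can route tuples through similar scripts with its STREAM operator):

#!/usr/bin/env python
# mapper.py - emit "word<TAB>1" for every word on stdin
import sys
for line in sys.stdin:
    for word in line.strip().split():
        print("%s\t1" % word.lower())

#!/usr/bin/env python
# reducer.py - sum the counts per word (Hadoop Streaming delivers input sorted by key)
import sys
current_word, count = None, 0
for line in sys.stdin:
    word, value = line.rstrip("\n").split("\t", 1)
    if word != current_word:
        if current_word is not None:
            print("%s\t%d" % (current_word, count))
        current_word, count = word, 0
    count += int(value)
if current_word is not None:
    print("%s\t%d" % (current_word, count))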

Best regards,
Amund Tveit (twitter.com/atveit)

The change from the prior postings is that this posting only includes _new_ papers (2011):

Artificial Intelligence/Machine Learning/Data Mining

Bioinformatics/Medical Informatics

Image and Video Processing

Statistics and Numerical Mathematics

Search and Information Retrieval

Sets & Graphs

Simulation

Social Networks

Spatial Data Processing

Text Processing


Mapreduce & Hadoop Algorithms in Academic Papers (4th update – May 2011)


It’s been a year since I last updated the mapreduce algorithms posting, and it has truly been an excellent year for mapreduce and hadoop – the number of commercial vendors supporting it has multiplied, e.g. with 5 announcements at EMC World last week alone (Greenplum, Mellanox, Datastax, NetApp, and Snaplogic) and today’s Datameer funding announcement, which benefits the mapreduce and hadoop ecosystem as a whole (even for small fish like us here at Atbrox). The work-horse in mapreduce is the algorithm; this update adds 35 new papers compared to the prior posting (new ones are marked with *). I’ve also added 2 new categories since the last update – astronomy and social networking.

Motivation
Learn from academic literature about how the mapreduce parallel model and hadoop implementation is used to solve algorithmic problems.

Which areas do the papers cover?

Author organizations and companies?
Companies: China Mobile, eBay, Google, Hewlett Packard and Intel, Microsoft, Wikipedia, Yahoo and Yandex.
Government Institutions and Universities: US National Security Agency (NSA), Carnegie Mellon University, TU Dresden, University of Pennsylvania, University of Central Florida, National University of Ireland, University of Missouri, University of Arizona, University of Glasgow, Berkeley University and National Tsing Hua University, University of California, Poznan University, Florida International University, Zhejiang University, Texas A&M University, University of California at Irvine, University of Illinois, Chinese Academy of Sciences, Vrije Universiteit, Engenharia University, State University of New York, Palacky University, University of Texas at Dallas

Atbrox on LinkedIn

Btw: I would like to recommend:

  1. Mapreduce bibliography maintained by (Cloudera co-founder) Jeff Hammerbacher
  2. (the excellent) book – Data-Intensive Text Processing with Mapreduce by (UMD’s/Twitter’s) Jimmy Lin and Christopher Dyer.

Let me know if you have input/corrections/feedback to this posting – amund @\h@ atbrox.com – or @atveit or @atbrox on twitter.

Best regards,
Amund Tveit (Atbrox co-founder)
