Apr 09

Wrote about mapreduce in search in a presentation for next week.

(more up-to-date pdf version of the presentation)

Best regards,
Amund
Atbrox

Feb 08

Atbrox is a startup providing technology and services for Search and Mapreduce/Hadoop. Our background is from Google, IBM and research.

Update 2010-June-17: Code for this posting is now on github – http://github.com/atbrox/Snabler

This posting gives an example of how to use Mapreduce, Python and Numpy to parallelize a linear machine learning classifier algorithm for Hadoop Streaming. It also discusses various hadoop/mapreduce-specific approaches to potentially improving or extending the example.

1. Background

Classification is an everyday task: it is about selecting one out of several outcomes based on their features, e.g.

  • When recycling garbage you select the bin based on the material, e.g. plastic, metal or organic.
  • When purchasing you select the store based e.g. on its reputation, prior experience, service, inventory and prices.

Computational Classification – Supervised Machine Learning – is quite similar, but requires (relatively) well-formed input data combined with classification algorithms.

1.1 Examples of classification problems

  • Finance/Insurance
    • Classify investment opportunities as good or not e.g. based on industry/company metrics, portfolio diversity and currency risk.
    • Classify credit card transactions as valid or invalid based e.g. on the location of the transaction and of the credit card holder, date, amount, purchased item or service, history of transactions and similar transactions
  • Biology/Medicine
  • Internet
  • Production Systems (e.g. in energy or petrochemical industries)
    • Classify and detect situations (e.g. sweet spots or risk situations) based on realtime and historic data from sensors

1.2 Classification Algorithms

Classification algorithms come in various types (e.g. linear, nonlinear, discriminative, etc.); see my prior postings Pragmatic Classification: The Very Basics and Pragmatic Classification of Classifiers.

Key takeaways about classifiers:

  1. There is no silver bullet classifier algorithm or feature extraction method.
  2. Classification algorithms tend to be computationally hard to train; this encourages a parallel approach, in this case with Hadoop/Mapreduce.

2. Parallel Classification for Hadoop Streaming

The classifier described below belongs to a family of classifiers which have in common that they can mathematically be described as Tikhonov regularization with a square loss function; this family includes Proximal SVM, Ridge Regression, Shrinkage Regression and Regularized Least-Squares Classification. (Note: if you replace the square loss function with a hinge loss function you get Support Vector Machine classification.) The implemented classifier – Proximal SVM – is from the paper Incremental Support Vector Machine Classification, referred to as "the paper" below.

2.1 Training data

The classifier assumes numerical training data, where each class is either -1.0 or +1.0 (negative or positive class), and features are represented as vectors of positive floating point numbers. In the algorithm below:

D - a diagonal matrix with the training classes (-1.0 or +1.0) on the diagonal, e.g. diag([-1.0, 1.0, 1.0, .. ])
A - a matrix with feature vectors, e.g. [[2.9, 3.3, 11.1, 2.4], .. ]
e - a vector filled with ones, e.g. [1.0, 1.0, .., 1.0]
E = [A -e]
mu - a scalar constant used to tune the classifier

2.2 The classifier algorithm

Training the classifier can be done with the right side of equation (13) from the paper:

(omega, gamma) = (I/mu + E.T*E).I*(E.T*D*e)

Classification of an incoming feature vector x can then be done by calculating:

x.T*omega - gamma

which returns a number, and the sign of the number corresponds to the class, i.e. positive or negative.
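
To make the two expressions above concrete, here is a minimal single-machine numpy sketch of training and classifying (the toy data and the mu value are made up for illustration; the real, parallel version follows below):

import numpy

# toy training data: 3 examples, 2 features each (made-up numbers)
A = numpy.matrix([[2.0, 3.0],
                  [1.0, 1.0],
                  [4.0, 5.0]])
classes = [1.0, -1.0, 1.0]
D = numpy.matrix(numpy.diag(classes))
e = numpy.matrix(numpy.ones((A.shape[0], 1)))
E = numpy.matrix(numpy.append(A, -e, axis=1))   # E = [A -e]
mu = 0.1                                        # tuning constant

# training: (omega, gamma) = (I/mu + E.T*E).I*(E.T*D*e)
I = numpy.matrix(numpy.eye(E.shape[1]))
result = (I/mu + E.T*E).I*(E.T*D*e)
omega, gamma = result[:-1], result[-1]

# classification of a new feature vector x: sign of x.T*omega - gamma
x = numpy.matrix([[3.0], [4.0]])
score = x.T*omega - gamma
print "class:", 1.0 if score > 0 else -1.0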

Parallelization of the classifier with Hadoop Streaming and Python

Expression (16) in the paper has a nice property: it supports increments (and decrements). In the paper's example there are 2 increments (and 2 decrements), but by induction there can be as many as you want:

(omega, gamma) = (I/mu + E_1.T*E_1 + .. + E_i.T*E_i).I*
                 (E_1.T*D_1*e + .. + E_i.T*D_i*e)

where

E.T*E = E_1.T*E_1 + .. + E_i.T*E_i

and

E.T*D*e = E_1.T*D_1*e + .. + E_i.T*D_i*e

This means that we can parallelize the calculation of E.T*E and E.T*D*e by having Hadoop mappers calculate each of the elements of the sums, as in the Python map() code below (sent to the reducers as tuples).

(figure: map() and reduce() dataflow – basic case)
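
A quick way to convince yourself of the incremental property is to split the rows of E (and the corresponding blocks of D and e) and check that the partial products sum to the full products; a small numpy sketch with made-up data:

import numpy

A = numpy.matrix(numpy.random.rand(6, 3))        # 6 examples, 3 features
e = numpy.matrix(numpy.ones((6, 1)))
D = numpy.matrix(numpy.diag([1.0, -1.0, 1.0, -1.0, 1.0, 1.0]))
E = numpy.matrix(numpy.append(A, -e, axis=1))

# split the rows into two "mapper" blocks of 3 training examples each
E_1, E_2 = E[:3], E[3:]
D_1, D_2 = D[:3, :3], D[3:, 3:]
e_1, e_2 = e[:3], e[3:]

# the per-block products sum to the full products
assert numpy.allclose(E.T*E, E_1.T*E_1 + E_2.T*E_2)
assert numpy.allclose(E.T*D*e, E_1.T*D_1*e_1 + E_2.T*D_2*e_2)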

2.3 The mapper

import base64
import pickle
import numpy

def map(key, value):
    # input key = class for one training example, e.g. "-1.0"
    classes = [float(item) for item in key.split(",")]   # e.g. [-1.0]
    D = numpy.diag(classes)

    # input value = feature vector for one training example, e.g. "3.0, 7.0, 2.0"
    featurematrix = [float(item) for item in value.split(",")]
    A = numpy.matrix(featurematrix)

    # create matrix E and vector e
    e = numpy.matrix(numpy.ones(len(A)).reshape(len(A), 1))
    E = numpy.matrix(numpy.append(A, -e, axis=1))

    # create a tuple with the values to be used by the reducer
    # and encode it with base64 to avoid potential trouble with '\t' and '\n'
    # used as default separators in Hadoop Streaming
    producedvalue = base64.b64encode(pickle.dumps((E.T*E, E.T*D*e)))

    # note: a single constant key "producedkey" sends everything to one reducer
    # somewhat "atypical" due to the low degree of parallelism on the reducer side
    print "producedkey\t%s" % (producedvalue)

2.4 The reducer

def reduce(key, values, mu=0.1):
    sumETE = None
    sumETDe = None

    # each item from the mapper arrives as a (key, value) pair;
    # the key isn't used, so it is ignored with _ (underscore)
    for _, value in values:
        # unpickle the (E.T*E, E.T*D*e) tuple from the mapper
        ETE, ETDe = pickle.loads(base64.b64decode(value))
        if sumETE is None:
            # initialize the sum with I/mu, which has the correct dimensions
            sumETE = numpy.matrix(numpy.eye(ETE.shape[1])/mu)
        sumETE += ETE

        if sumETDe is None:
            # initialize sumETDe with the correct dimensions
            sumETDe = ETDe
        else:
            sumETDe += ETDe

    # note: omega = result[:-1] and gamma = result[-1],
    # but the entire vector is printed as output
    result = sumETE.I*sumETDe
    print "%s\t%s" % (key, str(result.tolist()))

2.5 Mapper and Reducer Utility Code

Code used to run the map() and reduce() methods, inspired by the iterator/generator approach from this mapreduce tutorial.

import sys
from itertools import groupby
from operator import itemgetter

def read_input(file, separator="\t"):
    for line in file:
        yield line.rstrip().split(separator)

def run_mapper(map, separator="\t"):
    data = read_input(sys.stdin, separator)
    for (key, value) in data:
        map(key, value)

def run_reducer(reduce, separator="\t"):
    data = read_input(sys.stdin, separator)
    for key, values in groupby(data, itemgetter(0)):
        reduce(key, values)
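
To run this with Hadoop Streaming, the map() and reduce() functions still need small entry points that call run_mapper() and run_reducer(); a minimal sketch (the file name and command-line argument are assumptions for illustration, not part of the project code):

# hypothetical entry point, e.g. in a file mapreduce_proximal_svm.py:
# run with "python mapreduce_proximal_svm.py map" as the streaming mapper
# and "python mapreduce_proximal_svm.py reduce" as the streaming reducer
if __name__ == "__main__":
    if sys.argv[1] == "map":
        run_mapper(map)
    else:
        run_reducer(reduce)

The Hadoop Streaming job would then point its -mapper and -reducer options at this script with the respective argument.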

3. Finished?

Assume your running time goes through the roof even with the above parallel approach – what do you do?

3.1 Mapper Increment Size really makes a difference!

Since there is only one reducer in the presented implementation, it is useful to let the mappers do most of the job. The size of the increment matrices E.T*E and E.T*D*e given as input to the reducer is independent of the number of training examples, but dependent on the number of classification features. The workload on the reducer is, however, dependent on the number of matrices received from the mappers (i.e. on the increment size). E.g. if you have 1000 mappers, each processing one billion examples with 100 features each, the reducer would need to sum one trillion 101×101 matrices and one trillion 101×1 vectors if the mappers sent one matrix pair per training example; but if each mapper sent only one pair of E.T*E and E.T*D*e representing all of its billion training examples, the reducer would only need to sum 1000 matrix pairs.
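
A sketch of what such an accumulating mapper could look like, reusing read_input() and the imports from the code above (the structure is an assumption for illustration, not code from the project):

def run_accumulating_mapper(separator="\t"):
    # sum E.T*E and E.T*D*e over every training example seen by this mapper
    # and emit only a single pair when the input is exhausted
    sumETE, sumETDe = None, None
    for key, value in read_input(sys.stdin, separator):
        classes = [float(item) for item in key.split(",")]
        D = numpy.diag(classes)
        A = numpy.matrix([float(item) for item in value.split(",")])
        e = numpy.matrix(numpy.ones((len(A), 1)))
        E = numpy.matrix(numpy.append(A, -e, axis=1))
        if sumETE is None:
            sumETE, sumETDe = E.T*E, E.T*D*e
        else:
            sumETE += E.T*E
            sumETDe += E.T*D*e
    if sumETE is not None:
        print "producedkey\t%s" % base64.b64encode(pickle.dumps((sumETE, sumETDe)))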

3.2 Avoid stressing the reducer

Add more intermediate reducers (combiners) that calculate partial sums of matrices. When there are many small increments (and correspondingly many matrices), it can be useful to add an intermediate step that calculates sums of E.T*E and E.T*D*e in parallel before sending the sums to the final reducer; this means the final reducer gets fewer matrices to sum before calculating the final answer, see figure below.
(figure: flow with intermediate mapreduce step)
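
Since the partial sums have the same form as the mapper output, such an intermediate reducer (combiner) can be a stripped-down version of reduce() above that only sums and re-emits, without adding I/mu and without inverting; a sketch:

def combine(key, values):
    # partial-sum the (E.T*E, E.T*D*e) pairs and re-emit them in the same
    # encoded format, so the final reducer sees fewer, already-summed pairs
    sumETE, sumETDe = None, None
    for _, value in values:
        ETE, ETDe = pickle.loads(base64.b64decode(value))
        if sumETE is None:
            sumETE, sumETDe = ETE, ETDe
        else:
            sumETE += ETE
            sumETDe += ETDe
    print "%s\t%s" % (key, base64.b64encode(pickle.dumps((sumETE, sumETDe))))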

3.3 Parallelize (or replace) the matrix inversion in the reduction step

If someone comes along with a training data set with a very high feature dimension (e.g. in recommender systems, bioinformatics or text classification), the matrix inversion in the reducer can become a real bottleneck, since such algorithms typically are O(n^3) (with a lower bound of Omega(n^2 lg n)), where n is the number of features. A solution can be to use or develop a hadoop/mapreduce-based parallel matrix inversion, e.g. Apache Hama, or to not invert the matrix at all and solve the linear system directly instead.
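
For moderately sized feature dimensions, a cheaper and numerically more stable alternative to explicit inversion is to solve the linear system directly; the last step of the reducer could, for example, be replaced with numpy.linalg.solve:

# instead of explicitly inverting:
#   result = sumETE.I*sumETDe
# solve the linear system (I/mu + E.T*E) * result = E.T*D*e directly:
result = numpy.linalg.solve(sumETE, sumETDe)

This avoids forming the explicit inverse and is typically faster and more accurate for a single right-hand side.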

3.4 Feature Dimensionality Reduction

Another approach for training data with a high feature dimension could be to reduce the feature dimensionality; for more info check out Latent Semantic Indexing (and Analysis), Singular Value Decomposition, or t-Distributed Stochastic Neighbor Embedding.
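
As a rough illustration of the SVD route, the feature matrix A can be projected down to k dimensions before building E; a minimal numpy sketch (k and the data are made up):

import numpy

A = numpy.matrix(numpy.random.rand(100, 50))   # 100 examples, 50 features
k = 10                                         # target dimensionality

# thin SVD: A = U * diag(s) * Vt
U, s, Vt = numpy.linalg.svd(A, full_matrices=False)

# keep the k strongest components and project the training data onto them
A_reduced = A*numpy.matrix(Vt[:k]).T           # now 100 x k
print A_reduced.shape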

3.5 Reduce IO between mappers and reducers with compression

Twitter presented using LZO compression (on the Cloudera blog) to speed up Hadoop. Inspired by this, in the case of high feature dimension, i.e. large E.T*E and E.T*D*e matrices, one could compress the output in the mapper and decompress it in the reducer by replacing the base64 encoding/decoding and pickling above with:

producedvalue = base64.b64encode(lzo.compress(pickle.dumps((E.T*E, E.T*D*e)), level=1))

and

ETE, ETDe = pickle.loads(lzo.decompress(base64.b64decode(value)))

3.6 Do more work with approximately the same computing resources

The D matrix above represents binary classification, with a value of +1 or -1 representing each class. It is quite common to have classification problems with more than 2 classes. Supporting multiple classes is usually done by training several classifiers, either 1-against-all (1 classifier trained per class) or 1-against-1 (1 classifier trained per unique pair of classes), and then running a tournament between them and picking the most confident. In the case of 1-against-all classification the mapper could probably send multiple E.T*D_c*e – with one D_c per class – and keep the same E.T*E; the reducer would then need to calculate (I/mu + E.T*E).I once and independently multiply it with the several E.T*D_c*e sums to create a set of (omega, gamma) classifiers. For 1-against-1 classification it becomes somewhat more complicated, because it involves creating several E matrices, since in the 1-against-1 case only the rows in E where the 2 competing classes occur are relevant.
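
A sketch of how the final reduction step could look for 1-against-all with several classes, reusing a single inversion of (I/mu + E.T*E) for all classes (the function and its inputs are assumptions for illustration, not code from the project):

def train_one_against_all(sumETE, sumETDe_per_class, mu=0.1):
    # sumETE: the summed E.T*E matrix (without I/mu added yet)
    # sumETDe_per_class: dict mapping class label -> summed E.T*D_c*e vector
    lhs = numpy.matrix(numpy.eye(sumETE.shape[1])/mu) + sumETE
    inv = lhs.I                     # invert once, reuse for every class
    classifiers = {}
    for label, sumETDe in sumETDe_per_class.items():
        result = inv*sumETDe
        classifiers[label] = (result[:-1], result[-1])   # (omega_c, gamma_c)
    return classifiers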

4. Code

(Early) Python code for the algorithm presented above can be found at http://github.com/atbrox/Snabler (open source with Apache License). Please let me know if you want to contribute to the project, e.g. with implementations of mapreduce and hadoop algorithms from academic papers.

5. More resources about machine learning with Hadoop/Mapreduce?

Atbrox on LinkedIn

Best regards,

Amund Tveit, co-founder of Atbrox


Nov 11

Elastic Mapreduce's default behavior is to read from and store to S3. When you need to access other AWS services, e.g. SQS queues or the database services SimpleDB and RDS (MySQL), the best approach from Python is to use Boto. To get Boto to work with Elastic Mapreduce you need to dynamically load Boto on each mapper and reducer. Cloudera's Jeff Hammerbacher outlined how to do that using Hadoop Distributed Cache, and Peter Skomoroch suggested how to load Boto to access Elastic Blockstore (EBS); this posting is based on those ideas and gives a detailed description of how to do it.

How to combine Elastic Mapreduce with other AWS Services

This posting shows how to load boto in an Elastic Mapreduce mapper and gives a simple example of how to use SimpleDB from the same mapper. For accessing other AWS services, e.g. SQS, from Elastic Mapreduce, check out the Boto documentation (it is quite easy once the boto + emr integration is in place).

Other tools used (prerequisites):

Step 1 – getting and preparing the Boto library

wget http://boto.googlecode.com/files/boto-1.8d.tar.gz
# note: using virtualenv can be useful if you want to
# keep your local Python installation clean
tar -zxvf boto-1.8d.tar.gz ; cd boto-1.8d ; python setup.py install
cd /usr/local/lib/python2.6/dist-packages/boto-1.8d-py2.6.egg
zip -r boto.mod boto

Step 2 – mapper that loads boto.mod and uses it to access SimpleDB

# this was tested by adding the code underneath to the mapper
# s3://elasticmapreduce/samples/wordcount/wordSplitter.py
import sys

# get the boto library
sys.path.append(".")
import zipimport
importer = zipimport.zipimporter('boto.mod')
boto = importer.load_module('boto')

# access simpledb
sdb = boto.connect_sdb("YourAWSKey", "YourSecretAWSKey")
sdb_domain = sdb.create_domain("mymapreducedomain") # or sdb.get_domain()
# ..
# write words to simpledb, inside the mapper's existing loop over input lines
  for word in pattern.findall(line):
      item = sdb_domain.create_item(word)
      item["reversedword"] = word[::-1]
      item.save()
      # ...

Step 3 – json config file – bototest.json – for Elastic Mapreduce Ruby Client

[
  {
    "Name": "Step 1: testing boto with elastic mapreduce",
    "ActionOnFailure": "<action_on_failure>",
    "HadoopJarStep": {
      "Jar": "/home/hadoop/contrib/streaming/hadoop-0.18-streaming.jar",
      "Args": [
        "-input", "s3n://elasticmapreduce/samples/wordcount/input",
        "-output", "s3n://yours3bucket/result",
        "-mapper", "s3://yours3bucket/botoWordSplitter.py",
        "-cacheFile", "s3n://yours3bucket/boto.mod#boto.mod"
      ]
    }
  }
]

Step 4 – Copy necessary files to s3

s3cmd put boto.mod s3://yours3bucket
s3cmd put botoWordSplitter.py s3://yours3bucket

Step 5 – And run your Elastic Mapreduce job

 elastic-mapreduce --create \
                   --stream \
                   --json bototest.json \
                   --param "<action_on_failure>=TERMINATE_JOB_FLOW"

Conclusion
This showed how to dynamically load boto and use it to access one other AWS service – SimpleDB – from Elastic Mapreduce. Boto supports most AWS services, so the same integration approach should also work for other AWS services, e.g. SQS (Queuing Service), RDS (MySQL Service) and EC2; check out the Boto API documentation or Controlling the Cloud with Python for details.
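
As an illustration, a mapper that has loaded boto with the zipimport trick from step 2 could talk to SQS roughly like this (a minimal, untested sketch; the queue name and message body are made up, and the calls assume the boto SQS API of that generation):

# assumes boto has been loaded via zipimport as in step 2
sqs = boto.connect_sqs("YourAWSKey", "YourSecretAWSKey")
queue = sqs.create_queue("mymapreducequeue")   # or get_queue()

# send one message per word instead of (or in addition to) writing to SimpleDB
message = queue.new_message("some word from the mapper")
queue.write(message)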

Note: a very similar integration approach should work for most Python libraries, also those that use/wrap C/C++ code (e.g. machine learning libraries such as PyML and others), but then it might be necessary to do step 1 on Debian AMIs similar to the ones Elastic Mapreduce uses; check out a previous posting for more info about such AMIs.


Do you need help with Hadoop/Mapreduce?
A good start could be to read this book, or contact Atbrox if you need help with development or parallelization of algorithms for Hadoop/Mapreduce – info@atbrox.com. See our posting for an example of parallelizing and implementing a machine learning algorithm for Hadoop/Mapreduce.

Oct 01

The newest and most up-to-date version (May 2010) of this blog post is available at http://mapreducebook.org

An updated and extended version of this blog post can be found here.

Motivation
Learn from the academic literature how the mapreduce parallel model and the hadoop implementation are used to solve algorithmic problems.

Disclaimer: this is work in progress (look for updates)

Input Data – Academic Papers
Google Scholar lists 981 papers citing the original Mapreduce paper from 2004 – approximately 10 thousand pages of citing papers (~ the size of a typical encyclopedia).

What types of papers cite the mapreduce paper?

  1. Algorithmic papers
  2. General cloud overview papers
  3. Cloud infrastructure papers
  4. Future work sections in papers (e.g. “we plan to implement this with Hadoop”)

=> Looked at category 1 papers and skipped the rest

Who wrote the papers?

Search/Internet companies/organizations: eBay, Google, Microsoft, Wikipedia, Yahoo and Yandex.
IT companies: Hewlett Packard and Intel
Universities: Carnegie Mellon Univ., TU Dresden, Univ. of Pennsylvania, Univ. of Central Florida, National Univ. of Ireland, Univ. of Missouri, Univ. of Arizona, Univ. of Glasgow, Berkeley Univ. and National Tsing Hua Univ., Univ. of California, Poznan Univ.

Which areas do the papers cover?

Conclusion
Of the papers looked at, most are focused on IT-related areas; there is a lot left unwritten in academia about mapreduce and hadoop applied to algorithms in other business and technology areas.

Opportunities for following up this posting could be to: 1) describe the algorithms in more detail (e.g. input/output formats), 2) try to classify them by patterns (e.g. similar code structure), 3) offer the opportunity to simulate them in the browser (on toy-sized data sets), and 4) provide links to Hadoop implementations of them.


Do you need help with Hadoop/Mapreduce?
A good start could be to read this book, or contact Atbrox if you need help with development or parallelization of algorithms for Hadoop/Mapreduce – info@atbrox.com. See our posting for an example of parallelizing and implementing a machine learning algorithm for Hadoop/Mapreduce.

Sep 21

If you are new to virtualenv, Fabric or pip, Alex Clemesha’s excellent “Tools of the Modern Python Hacker” is a must-read.

In short: virtualenv lets you switch seamlessly between isolated Python environments, Fabric automates remote deployment, while pip takes care of installing required packages and dependencies. If you have ever had to wrestle with more than one development project at the same time, then virtualenv is one of those tools that, once mastered, you can’t see yourself living without. Fabric and pip are somewhat immature, but still highly useful in their present shapes. It is likely that you will end up learning them anyway. Best of all, these three tools play very nicely together.

Except on Cygwin.

Here at Atbrox, we spend quite a lot of our time on Windows platforms. While Cygwin adds a fair amount of unix functionality to Windows, configuring certain applications can be difficult. This article describes the steps we go through to get an operational virtualenv, Fabric and pip setup on Windows Vista. It also gives you a brief taster of how virtualenv and Fabric work.

Step 1 – Install Cygwin: If you haven’t already, Cygwin can be installed from this page. Click the “View” button once to get a full list of available packages. Make sure to include at least the following packages (the numbers in the parentheses indicate the versions used at the time of writing):

  • python (2.5.2-1)
  • python-paramiko (1.7.4-1)
  • python-crypto (2.0.1-1)
  • gcc (3.4.4-999)
  • wget (1.11.4-3)
  • openssh (5.1p1-10)

Now would also be a good time to install other common packages such as vim, git, etc.—but you can always go back and install them at a later time.

Note that we are using Cygwin Python rather than the standard Windows Python. I had nothing but trouble trying to get Windows Python to play nicely along with virtualenv and Fabric, so this is a compromise. The downside is that you are stuck with a rather dated and somewhat buggy version of Python. If someone manages to get this setup working with Windows Python, then let me know!

Step 2 – Get paramiko working: The python-paramiko and python-crypto packages are required to get Fabric deployment over SSH working properly. If you are lucky, paramiko should work out of the box. If you don’t get the following error message when importing paramiko then skip the rest of this step:

$ python
Python 2.5.2 (r252:60911, Dec  2 2008, 09:26:14)
[GCC 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import paramiko
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "__init__.py", line 69, in <module>
 File "transport.py", line 32, in <module>
 File "util.py", line 31, in <module>
 File "common.py", line 101, in <module>
 File "rng.py", line 69, in __init__
 File "randpool.py", line 87, in __init__
 File "randpool.py", line 120, in _randomize
IOError: [Errno 0] Error

According to the discussion here, this appears to be a lingering Cygwin bug. The workaround is to change line 120 in /usr/lib/python2.5/site-packages/Crypto/Util/randpool.py from


if num!=2 : raise IOError, (num, msg)

to

if num!=2 and num!=0 : raise IOError, (num, msg)

Paramiko should now import without any complaints.

Step 3 – Install setuptools: Setuptools are required for installing the rest of the required Python packages. Instructions for Cygwin are found on the setuptools pages—but just enter the following and you’ll be all set:

$ wget http://pypi.python.org/packages/2.5/s/setuptools/setuptools-0.6c9-py2.5.egg
$ sh setuptools-0.6c9-py2.5.egg

Step 4 – Install pip, virtualenv and virtualenvwrapper: We haven’t said anything about virtualenvwrapper so far. This extension to virtualenv streamlines working with multiple environments and is well recommended:

$ easy_install pip
$ easy_install virtualenv
$ easy_install virtualenvwrapper
$ mkdir ~/.virtualenvs

That last line creates a working directory for your virtual Python environments. When e.g. working with an environment named myenv, all packages will be installed in ~/.virtualenvs/myenv.

I find it useful to create and activate a default environment called sandbox. This helps prevent package installations to the default Python site-packages. It’s a good strategy in general to avoid polluting the main package directory so that almost all package installations are per project and virtual environment. Run the following commands to create the sandbox environment:

$ export WORKON_HOME=$HOME/.virtualenvs
$ export PIP_VIRTUALENV_BASE=$WORKON_HOME
$ source /usr/bin/virtualenvwrapper_bashrc
$ mkvirtualenv sandbox

mkvirtualenv is a virtualenvwrapper command that creates the given environment. If you get an IOError: [Errno 2] No such file or directory: '/usr/local/bin/python2.5' you will have to add a symbolic link to the Python executable:

$ ln -s /usr/bin/python2.5.exe /usr/bin/python2.5

Note that whenever you execute a shell command, the bash prompt will remind you of the active environment:

$ echo "foo"
foo
(sandbox)

To make the sandbox activation permanent, append the following lines to your ~/.bashrc:

export WORKON_HOME=$HOME/.virtualenvs
export PIP_VIRTUALENV_BASE=$WORKON_HOME
source /usr/bin/virtualenvwrapper_bashrc
workon sandbox

workon is another virtualenvwrapper command that switches you to the given environment. To get a full list of available environments, type workon without an argument. Other useful commands are deactivate to step out of the currently active environment, and rmvirtualenv to delete an environment. Refer to the virtualenvwrapper documentation for the whole story.

As a sanity check, try exiting and restarting the Cygwin shell. If you have paid attention so far, you should now automatically end up in the sandbox environment.

Step 5 – Install Fabric: From this point on, all installed packages, including Fabric, will end up in a virtual environment. Fabric is undergoing a major rewrite right now, so given that its interface is quite unstable it is preferable to have a per-project installation anyway.

First we create a test environment named myproject:

$ mkvirtualenv myproject

We have to make some modifications to the Fabric source code, so we can’t use pip for installing it. Make sure to use version 0.9 or higher, as version 0.1 is already quite outdated:

$ mkdir ~/tmp
$ cd ~/tmp
$ wget http://git.fabfile.org/cgit.cgi/fabric/snapshot/fabric-0.9b1.tar.gz
$ tar xzf fabric-0.9b1.tar.gz
$ cd fabric-0.9b1

Fabric is run using the fab command, but if we install it as is and try to run it, the following error might show up:

$ fab
Traceback (most recent call last):
 File "/home/brox/.virtualenvs/myproject/bin/fab", line 8, in <module>
   load_entry_point('Fabric==0.1.1', 'console_scripts', 'fab')()
 File "/home/brox/.virtualenvs/myproject/lib/python2.5/site-packages/setuptools-0.6c9-py2.5.egg/pkg_resources.py", line 277, in load_entry_point
 File "/home/brox/.virtualenvs/myproject/lib/python2.5/site-packages/setuptools-0.6c9-py2.5.egg/pkg_resources.py", line 2180, in load_entry_point
 File "/home/brox/.virtualenvs/myproject/lib/python2.5/site-packages/setuptools-0.6c9-py2.5.egg/pkg_resources.py", line 1913, in load
 File "/home/brox/.virtualenvs/myproject/lib/python2.5/site-packages/fabric.py", line 53, in <module>
   import win32api
ImportError: No module named win32api

At the time of writing there is a small bug in Fabric that is likely to be fixed in the near future. For now you have to manually modify the file fabric/state.py before you install. Change the line that says

win32 = sys.platform in ['win32', 'cygwin']

to

win32 = sys.platform in ['win32']

This is just to tell Fabric that Cygwin isn’t really Windows and that the win32api module therefore isn’t available. Having made the necessary change, do a regular installation from source:

$ python setup.py install

The following error message about paramiko not being found might pop up; just ignore it:

No local packages or download links found for paramiko==1.7.4
error: Could not find suitable distribution for Requirement.parse('paramiko==1.7.4')

And that’s it! You should now have a fully functional virtualenv/Fabric/pip setup. To verify that Fabric works, create a file called fabfile.py:

from fabric.api import local, run

def local_test():
    local('echo "foo"')

def remote_test():
    run('uname -s')

This file, of course, only scratches the surface of what you can do with Fabric—refer to the latest documentation for more information.

To test the fabfile, type the following:

$ fab local_test
[localhost] run: echo "foo"

Done.

The biggest issue is that of getting Fabric to play along with your SSH installation so that you can deploy on remote servers. (You did install the openssh package, right?). Try the following command, substituting test@atbrox.com with one of your own accounts:

$ fab remote_test
No hosts found. Please specify (single) host string for connection: test@atbrox.com
[test@atbrox.com] run: uname -s
Password:
[test@atbrox.com] out: Linux

Done.
Disconnecting from test@atbrox.com... done.

The next step would be to set up password-less logins, but that is a different story.

Afterthoughts: While Cygwin is a lifesaver, it has some quirks and annoyances that may or may not be an issue depending on your system configuration. For instance, on my setup the following error tends to show up randomly when using Fabric for remote deployment:

sem_init: Resource temporarily unavailable
Traceback (most recent call last):
 File "build/bdist.cygwin-1.5.25-i686/egg/fabric/main.py", line 454, in main
 File "/cygdrive/c/Users/brox/workspace/quote_finder/fabfile.py", line 187, in deploy
   _prepare_host_global()
 File "/cygdrive/c/Users/brox/workspace/quote_finder/fabfile.py", line 137, in _prepare_host_global
   if not exists(u'/usr/bin/virtualenvwrapper_bashrc'):
 File "build/bdist.cygwin-1.5.25-i686/egg/fabric/contrib/files.py", line 32, in exists
 File "/usr/lib/python2.5/contextlib.py", line 33, in __exit__
   self.gen.throw(type, value, traceback)
 File "/usr/lib/python2.5/contextlib.py", line 118, in nested
   yield vars
 File "build/bdist.cygwin-1.5.25-i686/egg/fabric/contrib/files.py", line 32, in exists
 File "build/bdist.cygwin-1.5.25-i686/egg/fabric/network.py", line 371, in host_prompting_wrapper
 File "build/bdist.cygwin-1.5.25-i686/egg/fabric/operations.py", line 422, in run
 File "channel.py", line 297, in recv_exit_status
 File "/usr/lib/python2.5/threading.py", line 368, in wait
   self.__cond.wait(timeout)
 File "/usr/lib/python2.5/threading.py", line 210, in wait
   waiter = _allocate_lock()
thread.error: can't allocate lock

This is a known problem that is not likely to go away anytime soon, due to an inherent race condition in Cygwin’s implementation of sem_init. Still, having a functional virtualenv/Fabric/pip environment on Windows is all in all pretty convenient.

There is a slew of useful articles out there if you need more information on the tools described in this article. These are my current favorites:

