Author Archives: Amund Tveit

Atbrox Customer Case Study – Scalable Language Processing with Elastic Mapreduce (Hadoop)

Posted on November 14, 2009 by Amund Tveit

We developed a tool for scalable language processing for our customer Lingit using Amazon’s Elastic Mapreduce. More details are available at http://aws.amazon.com/solutions/case-studies/atbrox/ and you can contact us if you need help with Hadoop/Elastic Mapreduce.

Posted in cloud computing | Tagged amazon, aws, data processing, elastic mapreduce, hadoop, language processing, nlp | 2 Comments

How to combine Elastic Mapreduce/Hadoop with other Amazon Web Services

Posted on November 11, 2009 by Amund Tveit

Elastic Mapreduce’s default behavior is to read from and write to S3. When you need to access other AWS services, e.g. SQS queues or the database services SimpleDB and RDS (MySQL), the best approach from Python is to use Boto. To … Continue reading →

Posted in cloud computing, Hadoop and Mapreduce, infrastructure | Tagged amazon, aws, hadoop, mapreduce, python, simpledb, sqs | 4 Comments

Unstructured Search for Amazon’s SimpleDB

Posted on October 27, 2009 by Amund Tveit

SimpleDB is a service primarily for storing and querying structured data (it can, for example, be used for a product catalog with descriptive features per product, or an academic event service with extracted features such as event dates, locations, organizers and topics). … Continue reading →

Posted in cloud computing | Tagged amazon, aws, hadoop, latency, python, s3, search, simpledb, storage, structured search, unstructured search | 2 Comments

How to use C++ Compiled Python for Amazon’s Elastic Mapreduce (Hadoop)

Posted on October 7, 2009 by Amund Tveit

Sometimes it can be useful to compile Python code for Amazon’s Elastic Mapreduce into C++ and then into a binary. The motivation could be to integrate with (existing) C or C++ code, or to increase performance for CPU-intensive mapper or … Continue reading →

Posted in cloud computing | Tagged amazon, aws, c++, elastic mapreduce, hadoop, mapreduce, python, shedskin | 7 Comments
