Oct 27

SimpleDB is a service primarily for storing and querying structured data (it can e.g. be used for a product catalog with descriptive features per product, or an academic event service with extracted features such as event dates, locations, organizers and topics). (If one wants “heavier data” in SimpleDB, e.g. video or images, a good approach would be to store paths to Hadoop DFS or S3 objects in the attributes instead of storing the data directly.)

Unstructured Search for SimpleDB

This posting presents an approach for adding (flexible) unstructured search support to SimpleDB (with some preliminary query latency numbers below – and very preliminary Python code). The motivation is to:
  1. Support unstructured search with very low maintenance
  2. Combine structured and unstructured search
  3. Figure out the feasibility of unstructured search on top of SimpleDB

The Structure of SimpleDB

SimpleDB is roughly a persistent hashtable of hashtables, where each row (a named item in the outer hashtable) has another hashtable with up to 256 key-value pairs (called attributes). Each attribute value can be 1024 bytes, so 256 kilobytes in total of values per row (note: twice that amount if you also store data as part of the attribute names, plus 1024 bytes in the item name). Check out Wikipedia for detailed SimpleDB storage characteristics.
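
To make the data model concrete, here is a minimal sketch using the boto library (the domain name, item name and attribute values are made up for illustration, and boto is assumed to be configured with AWS credentials):

    import boto

    sdb = boto.connect_sdb()                 # assumes AWS credentials are configured for boto
    domain = sdb.create_domain('products')   # 'products' is a made-up domain name

    # one item (a key in the outer hashtable) with a few attributes (the inner hashtable)
    domain.put_attributes('item-0042', {
        'title': 'Example product',
        'price': '19.90',                                 # values are strings, max 1024 bytes each
        'video': 's3://mybucket/videos/item-0042.mp4',    # pointer to "heavier data", not the data itself
    })

    item = domain.get_attributes('item-0042')
    print item['title'], item['price']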

Inverted files

Inverted files are a common way of representing indices for unstructured search. In their basic form they (logically) map each word to a list of the pages or files the word occurs in. When a query arrives, one looks up each query word in the inverted file and finds the pages or files where the words occur. (note: if you are curious about inverted file representations, check out the survey – Inverted files for text search engines)
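
As a tiny illustration of that basic form (plain in-memory Python, no SimpleDB involved):

    from collections import defaultdict

    documents = {
        'url-1': 'cheap flights to oslo',
        'url-2': 'cheap hotels in oslo',
    }

    # build: term -> set of documents containing the term
    inverted = defaultdict(set)
    for url, text in documents.items():
        for term in text.split():
            inverted[term].add(url)

    # query: intersect the posting lists of the query terms
    def search(query):
        postings = [inverted.get(term, set()) for term in query.split()]
        return set.intersection(*postings) if postings else set()

    print search('cheap oslo')   # -> both url-1 and url-2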

One way of representing inverted files on SimpleDB is to map the inverted file onto the attributes, i.e. have one SimpleDB domain where each item is a word (term), and let the attributes store the list of URLs containing that word. Since each URL contains many words, it can be useful to have a separate SimpleDB domain mapping from URL hash to URL, and store the URL hash in the inverted file (this keeps the inverted file smaller). In the draft code we created 250 key-value attributes, where each key was a string from “0” to “249” and each corresponding value contained URL hashes (and positions of the term) joined with two different string separators. If that gives too little space per item – e.g. for stop words – one could “wrap” the inverted file entry by adding the same term with an incremental postfix (note: if that also gives too little space one could additionally wrap across SimpleDB domains).
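
A hedged sketch of how one inverted file entry could be packed into such numbered attributes (the separator characters, helper names and limits below are illustrative; the actual draft code is linked further down):

    import hashlib

    ENTRY_SEP = ';'    # separates (url hash, positions) entries within one attribute value
    POS_SEP = ','      # separates the url hash from the term positions
    MAX_ATTRS = 250    # numbered attribute keys "0" .. "249"
    MAX_VALUE = 1024   # SimpleDB limit per attribute value, in bytes

    def url_hash(url):
        return hashlib.md5(url).hexdigest()

    def pack_postings(postings):
        """postings: list of (url, [positions]) -> dict of attribute name -> attribute value."""
        attrs, current, key = {}, [], 0
        for url, positions in postings:
            entry = POS_SEP.join([url_hash(url)] + [str(p) for p in positions])
            if current and len(ENTRY_SEP.join(current + [entry])) > MAX_VALUE:
                attrs[str(key)] = ENTRY_SEP.join(current)
                current, key = [], key + 1
                if key >= MAX_ATTRS:
                    break   # here one would "wrap" to term + incremental postfix
            current.append(entry)
        if current and key < MAX_ATTRS:
            attrs[str(key)] = ENTRY_SEP.join(current)
        return attrs

    # the entry for a term could then be stored with domain.put_attributes(term, attrs)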

Preliminary query latency results

Warning: the data set used was NLTK‘s inaugural address collection, which is far from the biggest.

Inverted File Entry Fetch latency Distribution (in seconds)

Conclusion: the results from 1000 fetches of inverted file entries are relatively stable and clustered around 0.020s (20 milliseconds), which is promising enough to pursue further (though it is still early to decide, given only tests on small data sets so far). Combining this with e.g. memcached could also be explored, in order to get the average fetch time even lower.
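
For reference, a simplified sketch of how such fetch timings could be collected (the domain name and term are placeholders; the actual preliminary code and timing results are linked below):

    import time
    import boto

    sdb = boto.connect_sdb()
    domain = sdb.get_domain('invertedindex')    # placeholder domain holding the inverted file

    timings = []
    for _ in range(1000):
        start = time.time()
        entry = domain.get_attributes('someterm')   # fetch one inverted file entry
        timings.append(time.time() - start)

    timings.sort()
    print 'median fetch time: %.3fs' % timings[len(timings) // 2]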

Preliminary Python code including timing results (this was run on a Fedora large EC2 node somewhere in a US east coast data center).

Oct 07

Sometimes it can be useful to compile Python code for Amazon’s Elastic Mapreduce into C++ and then into a binary. The motivation for that could be to integrate with (existing) C or C++ code, or to increase performance for CPU-intensive mapper or reducer methods. Here follows a description of how to do that:

  1. Start a small EC2 node with AMI similar to the one Elastic Mapreduce is using (Debian Lenny Linux)
  2. Skim quickly through the Shedskin tutorial
  3. Log into the EC2 node and install the Shedskin Python compiler
  4. Write your Python mapper or reducer program and compile it into C++ with Shedskin
    • E.g. the command python ss.py mapper.py would generate the C++ files mapper.hpp and mapper.cpp, a Makefile and an annotated Python file mapper.ss.py (see the mapper sketch after this list).
  5. Optionally update the C++ code generated by Shedskin to use other C or C++ libraries
    • note: with Fortran-to-C you can probably integrate your Python code with existing Fortran code (e.g. numerical/high performance computing libraries). Similarly for Cobol (e.g. in the financial industry) with OpenCobol (which compiles Cobol into C). Please let us know if you try this or need help with it.
  6. Add -static as the first CCFLAGS parameter in the generated Makefile to make it a static executable
  7. Compile the C++ code into a binary with make and check that you don’t get a dynamic executable with ldd (you want a static executable)
  8. Run strip on the binary to make it smaller
  9. Upload your (ready) binary to a chosen location in Amazon S3
  10. Read Elastic Mapreduce Documentation on how to use the binary to run Elastic Mapreduce jobs.
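
To make step 4 concrete, here is a minimal word-count style mapper of the kind Shedskin can typically compile (Shedskin handles a restricted, implicitly statically typed subset of Python, so the sketch sticks to plain builtins; it reads lines from stdin and writes tab-separated key/value pairs, which is what Hadoop Streaming expects):

    import sys

    def main():
        # emit "<word>\t1" for every word on stdin; Hadoop groups and sorts
        # these by key before they reach the reducer
        for line in sys.stdin:
            for word in line.split():
                print word + '\t1'

    if __name__ == '__main__':
        main()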

Note: if you skip the shedskin-related steps this approach would also work if you are looking for how to use C or C++ mappers or reducers with Elastic Mapreduce.

Note: this approach should probably work also with Cloudera’s distribution for Hadoop.


Do you need help with Hadoop/Mapreduce?
A good start could be to read this book, or contact Atbrox if you need help with development or parallelization of algorithms for Hadoop/Mapreduce – info@atbrox.com. See our posting for an example of parallelizing and implementing a machine learning algorithm for Hadoop/Mapreduce
Oct 03

Other recommended writeups:

Location: Roosevelt Hotel, NYC

12:35 – Joe Cunningham – Visa – Large scale transaction analysis
– responsible for Visa Technology Strategy and Innovation
been playing with Hadoop for 9 months
probably many in audience learning and starting out with Hadoop

Agenda:
1) VisaNet overview
2) Value-added information products
3) Hadoop@Visa – research results

About Visa:
– 60 Billion market cap
– well-known card products, and also behind the scene information products
– Visa brand has high trust
– For a card-holder a Visa-card means global acceptance
– For a shop owner, if you get a Visa payment approval you will be paid

VisaNet
VisaNet is the largest, most advanced payment network in the world
characteristics:
28M locations,
130M authorizations/day,
1500 endpoints,
Processes transactions faster than 1s
1.4M ATMs,
Processes in 175 currencies,
Less than 2s unavailability per year (!)
– according to my calculations roughly seven 9s (0.9999999366)
16300 financial institutions

Visa Processing Architecture
Security/Access Services -> Message|File|Web
VisaNet Services Integration -> Authorization|Clearing&Settlement
Dispute handling, Risk, Information
Scoring every transaction (used for issuer to approve/decline transaction)

Value added Info products
– Info services
Client: Portfolio Analysis, Visa Incentive Network
Accountholder: transaction alerts, account updater, tailored rewards
– Risk management services
Account monitoring
Authentication
Encryption

Hadoop@Visa
Run a pipeline of prototypes in lab facility in SF
Any technology taken into Visa needs to match scalability and reliability requirements

Research Lab Setup
– VM System:
Custom Analytic Stacks
Encryption Processing
Relational Database
– Hadoop Systems
Management Stack
Hadoop #1 ~40TB / 42 nodes (2 years of raw transaction data)
Hadoop #2 ~300TB / 28 nodes

Risk Product Use Case
Create critical data model elements, such as keys and transaction statistics, which feed our real-time risk-scoring systems
Input: Transactions – Merchant Category, Country/Zip
Output: Key & Statistics – MCCZIP Key – stats related to account, trans. type, approval, fraud, IP address etc.
Research Sample: 500M distinct accounts, 100M transactions per day, 200 bytes per transaction, 2 years – 73B transactions (36TB)
Processing time from 1 month to 13 minutes! (note: ~3000 times faster)
(Generate synthetic transactions used to test the model)

Financial Enterprise Fit
– key questions under research:
– what will the Hadoop Solution Stack(s) look like?
– File system, Transaction Sample System, Relational Back-end (integration path), Analytics Processing
– Internal vs external cloud
– How do I get data into a cloud in a secure way?
– How does HSM and security integration work in Hadoop?
– What are the missing pieces?

Why Hadoop@Visa?
– analyze volumes of data with response times that are not possible today
– requirement: need to fit with existing solutions

Cross Data Center Log Processing – Stu Hood, Rackspace

(Email and apps division, work on search team)

Agenda
Use Case Background
– “Rackapps” – Hybrid Mail Hosting, 40% use a mix of exchange and rackspace mail

Use Case: Log Types

Use Case: Querying
– was the mail delivered?
– spam – why was it (not) marked as spam
– access – who checked/failed to check mail?
more advanced questions:
– which delivery routes have the highest latency?
– which are the spammiest IPs?
– Where in the world do customers log in from?
Elsewhere:
– billing

Previous Solutions
– 1999-2006 – go to where log files are generated, querying with grep
– 2006-2007 / bulk load to MySQL – worked for a year

Hadoop Solution
– V3 – lucene indexes in Hadoop
– 2007- present
– store 7 days uncompressed
– queries take seconds
– long term queries with mapreduce (6 months of data available for MR queries)
– all 3 datacenters

Alternatives considered:
– Splunk – good for realtime, but not great for archiving
– Data warehouse package – not realtime, but fantastic for longterm analysis
– Partitioned MySQL – half-baked solution
=> Hadoop hit the sweet spot

Hadoop Implementation
SW
– collect data using syslog-ng (considering Scribe)
– storage: deposits into Hadoop (scribe will remove that)
HW
– 2-4 collector machines per datacenter
– hundreds of source machines
20 solr nodes

Implementation: Indexing/Querying
– indexing – unique processing code for each schema
– querying
– “realtime”
– sharded lucene/solr instances merge index chunks from Hadoop
– using Solr-API
– raw logs
– using Hadoop Streaming and unix grep
– Mapreduce

Implementation: Timeframe
– development – 1.5 people in 3 months
– deployments – using Cloudera’s distribution
– roadblocks – bumped into job-size limits

Have run close to 1 million jobs on our cluster, and it has not gone down (except for other reasons such as maintenance)

Advantages – storage
– all storage in one place
Raw logs: 3 days, in HDFS
Indexes: 7 days
Archived Indexes: 6 months

Advantages – analysis
– Java Mapreduce API
– Apache Pig
– ideal for one-off queries
– Hadoop Streaming

Pig Example – whitehouse.gov mail spoofing

Advantages – Scalability, Cost, Community
– scalability – easy to add nodes
– cost – only hardware
– community – cloudera has been a benefit, deployment is trivial

Data Processing for Financial Services – Peter Krey and Sin Lee, JP Morgan Chase

Innovation & Shared Services, Firmwide Engineering & Architecture

note: there are certain constraints on what can be shared due to regulations

JPMorgan Chase + Open Source
– Qpid (AMQP) – top level Apache project
– Tyger – Apache + Tomcat + Spring

Hadoop in the Enterprise – Economics Driven
– attractive: economics
– Many big lessons from Web 2.0 community
– Potential for Large Capex and Opex “Dislocation”
– reduce consumption of enterprise premium resources
– grid computing economics brought to data intensive computing
– stagnant data innovation
– Enabling & potentially disruptive platform
– many historical similarities
– java, linux, tomcat, web/internet
– minis to client/server, client/server to web, solaris to linux, ..
– Key question: what can be built on top of Hadoop?
Back to economics driven – very cost-effective

Hadoop in the Enterprise – Choice Driven
– Overuse of relational database containers
– institutional “Muscle memory” – not too much else to choose from
– increasingly large percentage of static data stored in proprietary transactional DBs
– Over-Normalized Schemas: does this still make sense with cheap compute & storage?

– Enterprise Storage “Prisoners”
– Captive to the economics & technology of “a few” vendors
– Developers need more choice
– Too much proprietary, single-source data infrastructure
– increasing need for minimal/no systems + storage admins

Hadoop in the Enterprise – Other Drivers
– Growing developer interest in “Reduced RDBMS” Data technologies
– open source, distributed, non-relational databases
– growing influence of web 2.0 technologies & thinking of enterprise
– hadoop, cassandra, hbase, hive, couchdb, hadoopDB, .. , others
– memcached for caching

FSI Industry Drivers
– Increased regulatory oversight + reporting = more data needed over a longer period of time
– triple data amounts from 2007 to 2009
– growing need for less expensive data repository/store
– increased need to support “one off” analysis on large data

Active POC Pipeline
– Growing stream of real projects to gauge hadoop “goodness of fit”
– broad spectrum of use cases
– driven by need to impact/dislocate OPEX+CAPEX
– looking for orders of magnitude
– evaluated on metric based performance, functional and economic measures
– avoid the “data falling on the floor” phenomenon
– tools are really really important, keep tools and programming models simple

Hadoop Positioning
– Latency x Storage amount curve,

Cost comparisons
– SAN vs Hadoop HDFS cost comparison (GB/month)
– Hadoop much cheaper

Hadoop Additions and Must Haves:
– Improved SQL Front-End Tool Interoperability
– Improved Security & ACL enforcement – Kerberos Integration
– Grow Developer Programming Model Skill Sets
– Improve Relational Container Integration & Interop for Data Archival
– Management & Monitoring Tools
– Improved Developer & Debugging Tools
– Reduce Latency via integration with open source data caching
– memcached – others
– Invitation to FSI or Enterprise roundtable

Protein Alignment – Paul Brown, Booz Allen

Biological information
– Body – Cells – Chromosomes – Gene – DNA/RNA

Bioinformatics – The Pain
– too much data

So What? Querying a database of sequences for similar sequences
– one-to-many comparison
– 58000 proteins in PDB
– Protein alignment frequently used in the development of medicines
– Looking for a certain sequence across species, helps indicate function
Implementation in Hadoop
– distribute database sequence across each node
– send query seq. inside Mapreduce (or dist.cache)
– scales well
– existing algorithms port easily

So What? Comparing sequences in bulk
– many-to-many
– DNA hybridization (reconstruction)
Ran on AWS
Hadoop:
– if the whole dataset fits on one computer:
– Used distributed cache, assign each node a piece of the list
– But if the data does not fit on one computer…
– pre-join all possible pairs with one MapReduce

So What? Analyzing really big sequences
– one big sequence to many small sequences
– scanning dna for structure
– population genetics
– hadoop implementation

Demonstration Implementation: Smith-Waterman Alignment
– one of the more computationally intensive matching and alignment techniques
– big matrix – (sequences to compare on row and column and calculations within)
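
For reference, a minimal (single-node, unoptimized) Smith-Waterman scoring sketch in Python, filling the kind of matrix the slide refers to; the scoring parameters are arbitrary and traceback is omitted:

    def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
        """Return the best local alignment score between sequences a and b."""
        rows, cols = len(a) + 1, len(b) + 1
        h = [[0] * cols for _ in range(rows)]   # scoring matrix
        best = 0
        for i in range(1, rows):
            for j in range(1, cols):
                diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
                h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
                best = max(best, h[i][j])
        return best

    print smith_waterman('ACACACTA', 'AGCACACA')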

Amazon implementation
– 250 machines
– EC2
– run in 10 minutes for a single sequence. Runs in 24hrs for NxN comparison
– cost $40/hr

==> very cool 3D video of amazon ec2 nodes
– failing job due to 10% of nodes stuck on something (e.g. very long sequences)

Real-time Business Intelligence, Bradford Stephens

Topics
– Scalability and BI
– Costs and Abilities
– Search as BI

Tools: Zookeeper, Hbase, Katta (dist.search on Hadoop) and Bobo (faceted search for lucene)
– http://sourceforge.net/projects/bobo-browse/
– http://sourceforge.net/projects/katta/develop

100TB structured and unstructured data – Oracle $100M, Hadoop and Katta $250K

Building data cubes in real time (with faceted search)

Real-time Mapreduce on HBase
Search/BI as a platform – “google my datawarehouse”

Counting, Clustering and other data tricks, Derek Gottfried, New York Times

back in 2007 – would like to try as many EC2 instances as possible

Problem
– freeing up historical archives of NYTimes.com (1851-1922)
(in the public domain)

Currently:
– 2009 – web analytics
3 big data buckets:
1) registration/demographics
2) articles 1851-today
– a lot of metadata about each article
– unique data, extract people, places, .. to each article => high precision search
3) usage data/web logs
– biggest piece – piles up

How do we merge the 3 datasets?

Using EC2 – 20 machines
Hadoop 0.20.0
12 TB of data
Straight MR in Java
(mostly java + postprocessing in python)

combining weblog data with demographic data, e.g. Twitter click-backs by age group


Do you need help with Hadoop/Mapreduce?
A good start could be to read this book, or contact Atbrox if you need help with development or parallelization of algorithms for Hadoop/Mapreduce – info@atbrox.com. See our posting for an example of parallelizing and implementing a machine learning algorithm for Hadoop/Mapreduce

Oct 02

Location: Roosevelt Hotel NYC

09:11 – Christophe Bisciglia (Cloudera)

Announcement about the HBase and UI Birds of a Feather (BOF) sessions
Hadoop history overview
happenings during the last year: Hive, Pig, Sqoop (data import) ++
yesterday: Vertica announced mapreduce support for their database system
Walkthrough of Cloudera’s distribution for Hadoop
ANNOUNCEMENT: deploy Cloudera’s distribution for Hadoop on SoftLayer and Rackspace

09:23 – Jeff Hammerbacher (Cloudera)
started his career at Bear Stearns
– Cloudera is a software company with Apache Hadoop at the core
There is a lot more sw to be built:
1) collection,
2) processing
3) report and analysis

The Apache Hadoop community is the center of innovation for Big Data
– Yahoo pushing the envelope on scalability
– Large clusters for academic research (Yahoo, HP and Intel’s Open Cirrus)
– NSF, IBM and Google’s CluE
– sigmod best paper award: Pig team from Yahoo
– worldwide – Hadoop world beijing

Cloudera Desktop
4 applications running on this desktop (inside the browser)
1) HDFS Web Interface
– file browser
2) Hadoop Mapreduce Web Interface (can potentially debug)
– Job Browser (Cluster Detail)
3) Cluster Health
– pulls in all kinds of metrics from a hadoop cluster
4) Job Designer
– makes it easier to use for non-tech users
note: available for free (can be locally modified), but not redistributed
window manager based on MooTools

Cloudera Desktop API
– building a reusable API for developing desktop applications
– would like to capture innovation of ecosystem in a single interface
– desktop-api-s

09:40 – Peter Sirota (Amazon, general manager Amazon Elastic Mapreduce – EMR)

motivation: large scale data processing has a lot of MUCK, wanted to fix that.

Use cases for EMR:
– data mining (log processing, clicks analysis)
– bioinformatics (genome analysis)
– financial simulation (monte carlo)
– file processing (resize jpegs, ocr) – a bit unexpected
– web indexing

Customer feedback:
Pros: easy to use and reliable
Challenges: require fluency in mapreduce, and hard to debug

New features:
support for Apache Pig (batch and interactive mode), August 2009
support for Apache Hive 0.4 (batch and interactive mode), TODAY
– extended language to support S3
– specify off-instance-metadata store
– optimized data writes to S3
– reference resources on S3

ANNOUNCEMENT TODAY – Karmasphere Studio for Hadoop – NetBeans IDE
– deploy hadoop jobs to EMR
– monitor progress of EMR job flows
– amazon S3 file browser

ANNOUNCEMENT TODAY – Support for Cloudera’s Hadoop distribution
– can specify Cloudera’s distribution (and get support from Cloudera)
– in private beta

09:51 – Amazon EMR case – eHarmony – Carlos – will present use case for matchmaking system
data: 20 million users, 320 item questionnaire => big data
results: 2% of US marriages
Using Amazon S3 and Elastic Mapreduce
Hive is interesting for doing the analysis

09:58 – Amazon EMR IDE Support – Karmasphere IDE for Hadoop
works with all versions of Hadoop
tightly integrated with EMR (e.g. monitoring and files)

10:05 – Eric Baldeschwieler – Yahoo
Largest contributor, tester and user of Hadoop
Hadoop is driving 2% of marriages in the US!
4 tiers of Hadoop clusters:
1) dev. testing and QA (10% of HW)
– continuous integration and testing
2) proof of concepts and ad-hoc work (10% of HW)
– run the latest version, currently 0.20
3) science and research (60% of HW)
– runs more stable versions, currently 0.20
4) production (20% of HW)
– the most stable version of Hadoop, currently 0.18.3

Yahoo has more than 25000 nodes with Hadoop (4000 nodes per cluster), 82 Petabytes of data.

Why Hadoop@Yahoo?
– 500M users, billions of “transactions”/day, Many petabytes of data
– analysis and data processing key to our business
– need to do this cost effectively
=> Hadoop provides solution to this

Previous job: chief architect for web search at Yahoo
Yahoo frontpage example (use of Hadoop):
– content optimization, search index, ads optimization, spam filters, rss feeds,

Webmap 2008-2009
– 70 hours runtime => 73 hours runtime
– 300TB shuffling => 490TB shuffling
– 200TB output => 280TB output (+55% HW, but more analysis)

Sort benchmark 2008-2009
– 1 terabyte 209 seconds => 62 seconds on 1500 nodes
– 1 petabyte sorted – 16.25 hours, 3700 nodes

Hadoop has Impact on productivity
– research questions answered in days, not months
– moved from research to prod easily

Major factors:
– don’t need to find new HW to experiment
– can work with all your data
– prod. and research on same framework
– no need for R&D to do IT, clusters just work

Search Assist (index for search suggest)
3 years of log-data, 20 steps of mapreduce
before hadoop: 26 days runtime (SMP box), C++, 2-3 weeks dev.time
after hadoop: 20 minutes runtime, python, 2-3 days dev.time

Current Yahoo Development
Hadoop:
– simplifies porting effort (between hadoop versions), freeze APIs, Avro
– GridMix3, Mumak simulator – for performance tuning
– quality engineering
Pig
– Pig – SQL and Metadata, Zebra – column-oriented storage access layer, Multi-query, lots of other optimizations
Oozie

10:35 – Rod Smith, IBM

Customer Scenarios
– BBC Digital Democracy project
– Thomson Reuters
– IBM Emerging Technology Projects: M2 (renamed later to M42)
– insight engine for ad-hoc business insights running on top of Hadoop and Pig
– macro-support (e.g. extract patent information)
– collections (probably renamed to worksheets later)
– visualization (tag cloud)
– example 1: evaluate companies with patent information (1.4 million patents)
– using American Express as case study
– counting patent citations
– example 2: patents in litigation
– quote: “in god we trust, everybody else bring data”

11:04 – Ashish Thusoo – Facebook – Hive data warehousing system

Hadoop
Pros: superior in availability/scalability/manageability, open system, scalable cost
Cons: programmability and metadata, mapreduce hard to program (users know sql/bash/python/perl), need to publish in well-known schemas
=> solution: Hive

Hive: Open and Extensible
– query your own formats and types with serializer/deserializer
– extend SQL functionality through user defined functions
– do any non-SQL TRANSFORM operator (e.g. embed Python)
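
As a small illustration of the TRANSFORM point: the script is just a program reading tab-separated rows on stdin and writing rows to stdout. A hypothetical Python example (the table, column and file names are made up):

    import sys

    # hypothetical script extract_domain.py, wired in with something like:
    #   SELECT TRANSFORM(userid, url) USING 'python extract_domain.py'
    #   AS (userid, domain) FROM page_views;
    for line in sys.stdin:
        fields = line.rstrip('\n').split('\t')
        if len(fields) != 2:
            continue
        userid, url = fields
        domain = url.split('/')[2] if '://' in url else url.split('/')[0]
        print userid + '\t' + domain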

Hive: Smart Execution Plans for Performance
– Hash-based Aggregations
– Map-Side Joins
– Predicate Pushdown
– Partition Pruning
– ++

Interoperability
– JDBC and ODBC interfaces available
– integrations with some traditional SQL tools (e.g. Microstrategy for reports within Facebook) with some minor modifications
– ++

Hive Information
– subproject of Hadoop

— Data Warehousing @ Hadoop —

Data Flow Architecture at Facebook
web server logs -> Scribe -> filers (Hadoop clusters)
to save cost: Scribe/Hadoop integration
Federated MySQL also connected to the Production Hive/Hadoop Cluster
Connected to Oracle RAC and also replicated to an AdHoc Hive cluster

Showed a Picture of Yahoo cluster/datacenter 😀

Dimensions:
4800 cores, 5.5 PB,

Statistics per day:
– 4TB compr.data/day
– 135TB scanned per day
– 7500 Hive jobs/day
– 80K compute hours per day

Hive Simplifies Hadoop:
– New engineers go through a Hive training session
– 200 people/month use it

Applications:
– reporting (daily/weekly aggregations of impression/click counts)
– measures of user engagement
– Microstrategy dashboards

Ad hoc analysis
– how many group admins broken down by state/country

Machine learning (assembling training data)
– ad optimization
– e.g. user engagement as function of user attributes

Facebook Hive contributions
– Hive, HDFS features, Scheduler work
– Talks by Dhruba Borthakur and Zheng Shao in the dev. track

Q from audience: relation to Cassandra?
A: Cassandra serving live traffic,

Q from audience: when to use Pig or Hive?
A: Hive has more SQL support, but Pig is also getting more of that. Hive is very intuitive. If you want interoperability (e.g. with Microstrategy) there are advantages to using Hive. Pig has some nice primitives and supports a more unstructured data model


Do you need help with Hadoop/Mapreduce?
A good start could be to read this book, or contact Atbrox if you need help with development or parallelization of algorithms for Hadoop/Mapreduce – info@atbrox.com. See our posting for an example of parallelizing and implementing a machine learning algorithm for Hadoop/Mapreduce

Oct 01

The newest and most up-to-date version (May 2010) of this blog post is available at http://mapreducebook.org

An updated and extended version of this blog post can be found here.

Motivation
Learn from the academic literature how the mapreduce parallel model and the hadoop implementation are used to solve algorithmic problems.

Disclaimer: this is work in progress (look for updates)

Input Data – Academic Papers
Google Scholar lists 981 papers citing the original Mapreduce paper from 2004 – approximately 10 thousand pages of citing papers (~ the size of a typical encyclopedia)

What types of papers cite the mapreduce paper?

  1. Algorithmic papers
  2. General cloud overview papers
  3. Cloud infrastructure papers
  4. Future work sections in papers (e.g. “we plan to implement this with Hadoop”)

=> Looked at category 1 papers and skipped the rest

Who wrote the papers?

Search/Internet companies/organizations: eBay, Google, Microsoft, Wikipedia, Yahoo and Yandex.
IT companies: Hewlett Packard and Intel
Universities: Carnegie Mellon Univ., TU Dresden, Univ. of Pennsylvania, Univ. of Central Florida, National Univ. of Ireland, Univ. of Missouri, Univ. of Arizona, Univ. of Glasgow, Berkeley Univ. and National Tsing Hua Univ., Univ. of California, Poznan Univ.

Which areas do the papers cover?

Conclusion
Of the papers looked at, most are focused on IT-related areas; there is a lot left unwritten in academia about mapreduce and hadoop applied to algorithms in other business and technology areas.

Opportunities for following up this posting could be to: 1) describe the algorithms in more detail (e.g. input/output formats), 2) try to classify them by patterns (e.g. similar code structure), 3) offer the opportunity to simulate them in the browser (on toy-sized data sets) and 4) provide links to Hadoop implementations of them.


Do you need help with Hadoop/Mapreduce?
A good start could be to read this book, or contact Atbrox if you need help with development or parallelization of algorithms for Hadoop/Mapreduce – info@atbrox.com. See our posting for an example of parallelizing and implementing a machine learning algorithm for Hadoop/Mapreduce
