Tag Archives: hadoop
Preliminary Experiences Crawling with 80legs
80legs is a company specializing in the crawling and preprocessing part of search: you can upload your seed URLs (where to start crawling), configure your crawl job (depth, domain restrictions, etc.) and also run existing or custom analysis code …
Posted in cloud computing
Tagged cloud computing, crawling, hadoop, mapreduce, search, web services
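The crawl-job settings mentioned in the 80legs excerpt (seed URLs, depth, domain restrictions) can be illustrated with a small breadth-first crawl. This is a hypothetical sketch and not 80legs' API: `crawl_order` and its parameters are invented for illustration, and an in-memory `links` dict stands in for actually fetching pages.

```python
from collections import deque
from urllib.parse import urlparse


def crawl_order(seeds, links, max_depth, allowed_domains):
    """Hypothetical breadth-first crawl over an in-memory link graph.

    seeds: start URLs (depth 0)
    links: dict mapping a URL to its outgoing URLs (stands in for fetching)
    max_depth: links are followed until this depth is reached
    allowed_domains: hostnames the crawl is restricted to
    Returns URLs in the order they would be visited.
    """
    visited = []
    seen = set()
    queue = deque()
    for url in seeds:
        if urlparse(url).hostname in allowed_domains and url not in seen:
            seen.add(url)
            queue.append((url, 0))
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if depth == max_depth:
            continue  # depth limit reached: do not expand this page's links
        for nxt in links.get(url, []):
            # domain restriction: skip links that leave the allowed hosts
            if nxt not in seen and urlparse(nxt).hostname in allowed_domains:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return visited
```

With `links = {"http://a.com/": ["http://a.com/x", "http://b.com/"], "http://a.com/x": ["http://a.com/y"], "http://a.com/y": ["http://a.com/z"]}`, calling `crawl_order(["http://a.com/"], links, 2, {"a.com"})` visits `http://a.com/`, `http://a.com/x` and `http://a.com/y`. The `b.com` link is dropped by the domain restriction and `/z` by the depth limit.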
Unstructured Search for Amazon’s SimpleDB
SimpleDB is a service primarily for storing and querying structured data (it can, e.g., be used for a product catalog with descriptive features per product, or for an academic event service with extracted features such as event dates, locations, organizers and topics). …
Posted in cloud computing
Tagged amazon, aws, hadoop, latency, python, s3, search, simpledb, storage, structured search, unstructured search
How to use C++ Compiled Python for Amazon’s Elastic Mapreduce (Hadoop)
Sometimes it can be useful to compile Python code for Amazon's Elastic Mapreduce into C++ and then into a binary. The motivation could be to integrate with (existing) C or C++ code, or to increase performance of CPU-intensive mapper or …
Posted in cloud computing
Tagged amazon, aws, c++, elastic mapreduce, hadoop, mapreduce, python, shedskin
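For context on the compiled-Python post: a Hadoop Streaming mapper is simply a program that reads input records as lines on stdin and writes tab-separated key/value lines to stdout. Shedskin compiles only a restricted, implicitly statically typed subset of Python. The sketch below uses word count as a hypothetical example job, and the names `map_line` and `run` are illustrative. It sticks to plain, consistently typed code in the spirit of that subset, but whether a particular Shedskin version accepts it unchanged is an assumption.

```python
import sys


def map_line(line):
    """Return (word, 1) pairs for one input line (hypothetical word-count job)."""
    pairs = []
    for word in line.strip().lower().split():
        pairs.append((word, 1))
    return pairs


def run(instream, outstream):
    # Hadoop Streaming passes one record per line on stdin and expects
    # "key<TAB>value" lines on stdout; the framework sorts by key before reducing.
    for line in instream:
        for word, count in map_line(line):
            outstream.write(word + "\t" + str(count) + "\n")


if __name__ == "__main__":
    run(sys.stdin, sys.stdout)
```

Keeping every variable at a single type (strings stay strings, counts stay ints) is what lets a static type inferencer such as Shedskin's map the code onto C++ types.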
Hadoop World 2009 – some notes from application session
Other recommended write-ups:
- Hadoop World NYC (Hilary Mason)
- The View from HadoopWorld (Stephen O'Grady)
- Post Hadoop World Thoughts (Deepak Singh)
- Hadoop World, NYC 2009 (Dan Milstein)
- Hadoop World Impressions (Steve Laniel)
Location: Roosevelt Hotel, NYC
1235 Joe Cunningham …
Hadoop World 2009 – some notes from morning session
Location: Roosevelt Hotel, NYC
09:11 – Christophe Bisciglia (Cloudera)
- Announcement about BOFs: HBase and UI Birds of a Feather
- Hadoop history overview
- Happenings during the last year: Hive, Pig, Sqoop (data import) ++
- Yesterday: Vertica announced MapReduce support for their …