Category Archives: Hadoop and Mapreduce

Word Count with MapReduce on a GPU – A Python Example

Posted on August 20, 2010 by Amund Tveit

Atbrox is a startup company providing technology and services for Search and Mapreduce/Hadoop. Our background is from Google, IBM and research. A GPU (Graphics Processing Unit), such as the NVIDIA Tesla, is fascinating hardware, in particular regarding extreme parallelism (hundreds of … Continue reading →

Posted in Hadoop and Mapreduce | Tagged cuda, gpu, mapreduce, nvidia, pycuda, tesla | 9 Comments

Mapreduce & Hadoop Algorithms in Academic Papers (3rd update)

Posted on May 8, 2010 by Amund Tveit

Atbrox is a startup company providing technology and services for Search and Mapreduce/Hadoop. Our background is from Google, IBM and research. Contact us if you need help with algorithms for mapreduce. This posting is the May 2010 update to the similar … Continue reading →

Posted in cloud computing, Hadoop and Mapreduce | Tagged google, hadoop, machinelearning, mapreduce, yahoo | 22 Comments

Mapreduce & Hadoop Algorithms in Academic Papers (updated)

Posted on February 12, 2010 by Amund Tveit

The newest and most up-to-date version (May 2010) of this blog post is available at http://mapreducebook.org. Atbrox is a startup company providing technology and services for Search and Mapreduce/Hadoop. Our background is from Google, IBM and research. This posting is an … Continue reading →

Posted in cloud computing, Hadoop and Mapreduce | Tagged algorithms, hadoop, machinelearning, mapreduce, search | 7 Comments

Parallel Machine Learning for Hadoop/Mapreduce – A Python Example

Posted on February 8, 2010 by Amund Tveit

Atbrox is a startup providing technology and services for Search and Mapreduce/Hadoop. Our background is from Google, IBM and research. Update 2010-June-17: Code for this posting is now on github at http://github.com/atbrox/Snabler. This posting gives an example of how to use … Continue reading →

Posted in cloud computing, Hadoop and Mapreduce, infrastructure | Tagged github, hadoop, machine learning, machinelearning, mapreduce, open source, python, ridge regression, svm | 14 Comments

How to combine Elastic Mapreduce/Hadoop with other Amazon Web Services

Posted on November 11, 2009 by Amund Tveit

Elastic Mapreduce's default behavior is to read from and store to S3. When you need to access other AWS services, e.g. SQS queues or the database services SimpleDB and RDS (MySQL), the best approach from Python is to use Boto. To … Continue reading →

Posted in cloud computing, Hadoop and Mapreduce, infrastructure | Tagged amazon, aws, hadoop, mapreduce, python, simpledb, sqs | 4 Comments
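
As a rough illustration of the idea in the last excerpt above (reaching another AWS service from a Python mapper via Boto), here is a minimal sketch. It is not taken from the original post: the function names `make_sqs_sender` and `run_mapper`, the queue name `emr-status`, and the word-count mapper body are all hypothetical, and it assumes the classic `boto` (version 2) package is installed on the cluster nodes with AWS credentials available through the environment or the boto config.

```python
import sys


def make_sqs_sender(queue_name):
    """Return a function that sends a string to the named SQS queue.

    Assumption: the classic boto (v2) library is installed and AWS
    credentials are available from the environment or boto config.
    """
    # Imported lazily so the mapper logic below can be tested without boto.
    import boto
    from boto.sqs.message import Message

    conn = boto.connect_sqs()
    # create_queue hands back the existing queue when one with this name
    # (and the same attributes) already exists.
    queue = conn.create_queue(queue_name)

    def send(body):
        msg = Message()
        msg.set_body(body)
        queue.write(msg)

    return send


def run_mapper(lines, send, out=sys.stdout):
    """Streaming-style mapper: emit "word<TAB>1" per word, then report
    the number of processed lines through the supplied send function."""
    count = 0
    for line in lines:
        for word in line.split():
            out.write("%s\t1\n" % word)
        count += 1
    send("mapper processed %d lines" % count)
    return count


# Hypothetical use inside a streaming job (queue name is made up):
#   run_mapper(sys.stdin, make_sqs_sender("emr-status"))
```

Passing the sender in as a plain function keeps the mapper logic independent of AWS, so it can be exercised locally with any callable (for example `list.append`) before running it on a cluster.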