MapReduce & Hadoop Algorithms in Academic Papers

An updated and extended version of this blog post (May 2010) is available.

Learn from the academic literature how the MapReduce parallel model and its Hadoop implementation are used to solve algorithmic problems.

Disclaimer: this is a work in progress (look for updates).

Input Data – Academic Papers
Google Scholar lists 981 papers citing the original MapReduce paper from 2004. Together they add up to roughly 10 thousand pages (about the size of a typical encyclopedia).

What types of papers cite the MapReduce paper?

  1. Algorithmic papers
  2. General cloud overview papers
  3. Cloud infrastructure papers
  4. Future work sections in papers (e.g. “we plan to implement this with Hadoop”)

=> This post looks only at category 1 (algorithmic papers) and skips the rest.

Who wrote the papers?

Search/Internet companies/organizations: eBay, Google, Microsoft, Wikipedia, Yahoo and Yandex.
IT companies: Hewlett Packard and Intel.
Universities: Carnegie Mellon Univ., TU Dresden, Univ. of Pennsylvania, Univ. of Central Florida, National Univ. of Ireland, Univ. of Missouri, Univ. of Arizona, Univ. of Glasgow, Berkeley Univ., National Tsing Hua Univ., Univ. of California and Poznan Univ.

Which areas do the papers cover?

Most of the papers looked at focus on IT-related areas. Much remains unwritten in academia about MapReduce and Hadoop applied to algorithms in other business and technology areas.

Opportunities for following up on this posting include: 1) describing the algorithms in more detail (e.g. input/output formats), 2) classifying them by patterns (e.g. similar code structure), 3) offering a way to simulate them in the browser (on toy-sized data sets) and 4) providing links to Hadoop implementations of them.
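To give a feel for what simulating an algorithm on a toy-sized data set could look like, here is a minimal single-process sketch of the map, shuffle and reduce phases, written in Python rather than for the browser. It is not taken from any of the papers and does not use the Hadoop API. The names run_mapreduce, word_count_mapper and word_count_reducer are hypothetical, and word count is used only because it is the customary introductory example.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Simulate map, shuffle and reduce sequentially in one process.

    Hypothetical helper for illustration only; a real Hadoop job
    distributes these phases across many machines.
    """
    # Map phase: every input record may emit any number of (key, value) pairs.
    # Shuffle phase: collect all values that share a key into one list.
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)

    # Reduce phase: fold the list of values for each key into one result.
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


def word_count_mapper(line):
    # Emit (word, 1) for every whitespace-separated word in the line.
    for word in line.lower().split():
        yield word, 1


def word_count_reducer(word, counts):
    # Sum the partial counts emitted for this word.
    return sum(counts)


# Toy-sized input: two short lines of text.
toy_lines = [
    "hadoop runs mapreduce jobs",
    "mapreduce jobs map then reduce",
]

word_counts = run_mapreduce(toy_lines, word_count_mapper, word_count_reducer)
print(word_counts)
```

Swapping in a different mapper and reducer pair leaves run_mapreduce unchanged, which is also one way to compare algorithms by code structure, as suggested in point 2.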

Do you need help with Hadoop/MapReduce?
A good start could be to read this book, or to contact Atbrox if you need help with development or parallelization of algorithms for Hadoop/MapReduce. See our posting for an example of parallelizing and implementing a machine learning algorithm for Hadoop/MapReduce.
