May 16

It’s been a year since I last updated the mapreduce algorithms posting, and it has truly been an excellent year for mapreduce and hadoop. The number of commercial vendors supporting it has multiplied, e.g. with 5 announcements at EMC World last week alone (Greenplum, Mellanox, Datastax, NetApp, and Snaplogic) and today’s Datameer funding announcement, which benefits the mapreduce and hadoop ecosystem as a whole (even small fish like us here at Atbrox). The work-horse in mapreduce is the algorithm. This update adds 35 new papers compared to the prior posting; new ones are marked with *. I’ve also added 2 new categories since the last update: astronomy and social networking.

Learn from academic literature how the mapreduce parallel model and its hadoop implementation are used to solve algorithmic problems.

Which areas do the papers cover?

Author organizations and companies?
Companies: China Mobile, eBay, Google, Hewlett Packard and Intel, Microsoft, Wikipedia, Yahoo and Yandex.
Government Institutions and Universities: US National Security Agency (NSA), Carnegie Mellon University, TU Dresden, University of Pennsylvania, University of Central Florida, National University of Ireland, University of Missouri, University of Arizona, University of Glasgow, Berkeley University and National Tsing Hua University, University of California, Poznan University, Florida International University, Zhejiang University, Texas A&M University, University of California at Irvine, University of Illinois, Chinese Academy of Sciences, Vrije Universiteit, Engenharia University, State University of New York, Palacky University, University of Texas at Dallas

Atbrox on LinkedIn

Btw: I would like to recommend:

  1. Mapreduce bibliography maintained by (Cloudera co-founder) Jeff Hammerbacher
  2. (the excellent) book – Data-Intensive Text Processing with Mapreduce by (UMD’s/Twitter’s) Jimmy Lin and Christopher Dyer.

Let me know if you have input/corrections/feedback on this posting – amund @\h@ – or @atveit or @atbrox on Twitter.

Best regards,
Amund Tveit (Atbrox co-founder)

14 Responses to “Mapreduce & Hadoop Algorithms in Academic Papers (4th update – May 2011)”

  1. Mr. Gunn Says:

    Mapreduce is the #2 computer science paper of all time on Mendeley, I found:

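  2. Collection of Mapreduce and Hadoop papers | 大規模計算ドットコム Says:

    [...] A collection of links to papers on Mapreduce and Hadoop has been published at atbrox.com. The papers are grouped by category, for example advertising and e-commerce, astronomy, and social networks. Figure 1 is from the paper Improving Ad Relevance in Sponsored Search. Figure 1: Accuracy of various learning models (Improving Ad Relevance in Sponsored Search) [...]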

  3. Mapreduce & Hadoop Algorithms in Academic Papers (4th update) « Another Word For It Says:

    [...] Mapreduce & Hadoop Algorithms in Academic Papers (4th update – May 2011) [...]

  4. MapReduce: Links, News and Resources (1) « Angel “Java” Lopez on Blog Says:

    [...] Mapreduce & Hadoop Algorithms in Academic Papers [...]

  5. Hadoop: Links, News and Resources (1) « Angel “Java” Lopez on Blog Says:

    [...] Mapreduce & Hadoop Algorithms in Academic Papers (4th update – May 2011) [...]

  6. Twitted by renjithkv2010 Says:

    [...] This post was Twitted by renjithkv2010 [...]

  7. Twitted by cxcaixinster Says:

    [...] This post was Twitted by cxcaixinster [...]

  8. Mapreduce & Hadoop Algorithms in Academic Papers (4th update – May 2011) « Another Word For It Says:

    [...] Mapreduce & Hadoop Algorithms in Academic Papers (4th update – May 2011) by Amund Tveit. [...]

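  9. Group report summary « cxcaixinster Says:

    [...] I did not make slides for the presentation, because time was tight and there was a lot of material to cover. The rough plan was to start with Google’s paper and explain the basic concepts and execution steps of MapReduce, using the examples from that paper. Next came the paper that Dr. Wei presented two days earlier, Semi-Supervised Ranking on Very Large Graph with Rich Metadata, which mentions two examples of computations done with MapReduce: a matrix-vector product and a Kronecker product. Then I showed a Stanford paper, Map-Reduce for Machine Learning on Multicore (NIPS 06), which proposes a Statistical Query Model and shows that common machine learning algorithms that fit this model can all be accelerated with MapReduce. Finally, I showed some further reading. One item is Mapreduce & Hadoop Algorithms in Academic Papers (4th update – May 2011), which collects recent papers that apply MapReduce and is a good reference. The other is a blog post, Demo:Writing An Hadoop MapReduce Program In Python, which uses the word-count example and explains in detail how to implement it in Python with HadoopStreaming. [...]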

  10. Quora Says:

    What are some good books on MapReduce problem solving techniques?…

    * Data Intensive Text Processing with MapReduce by Jimmy Lin and Chris Dyer is phenomenal. I recommend it to anyone who’s just learning how to use Hadoop or other MapReduce systems. Unlike a basic “Hadoop” book, it’s more about problem solving and …

  11. Open source Big Data Smart City algorithms … Says:

    [...] ideas from other domains to city level problems. Atbrox has a very good set of resources for mapreduce-hadoop algorithms. These include – Search, Behavioural targeting, Astronomy,  Social Networks, [...]

  12. Class projects for Hadoop | Digital thoughts Says:

    [...] Amund Tveit‘s links:… [...]

  13. Hadoop Resources at one place | Hadoop talk Says:

    [...]…What MapReduce (Hadoop) can’t solve [...]

  14. Hadoop Resources | Big Data Analytics News Says:

    […]… […]
