The prior update of this posting was in May, and a lot has happened around MapReduce and Hadoop since then, e.g.
1) big software companies (Microsoft and Oracle) have started offering Hadoop-based software, 2) Hadoop startups have raised record amounts, and 3) the NoSQL landscape is becoming increasingly data-warehouse-ish and SQL-ish, with a focus on high-level data processing platforms and query languages.
Personally, I have rediscovered Hadoop Pig and combine it with UDFs and streaming as my primary way to implement MapReduce algorithms here at Atbrox.
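As a rough illustration of the streaming style of writing MapReduce code (not the actual Atbrox code), here is a minimal word-count sketch in Python. The function names `mapper`, `reducer` and `run_local` are made up for this example, and the shuffle/sort step that Hadoop performs between the two phases is simulated with `sorted()` so the whole thing runs in a single process.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Map phase: emit a (word, 1) pair for every whitespace-separated token."""
    for line in lines:
        for word in line.strip().lower().split():
            yield word, 1


def reducer(sorted_pairs):
    """Reduce phase: sum the counts per word. Input must be sorted by key."""
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def run_local(lines):
    """Simulate map -> sort (stand-in for the shuffle) -> reduce locally."""
    return dict(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    print(run_local(["hadoop pig streaming", "pig udf pig"]))
```

With Hadoop streaming, the mapper and reducer would instead be two separate scripts that read lines from stdin and write tab-separated key/value lines to stdout, with the framework doing the sorting in between.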
Best regards,
Amund Tveit (twitter.com/atveit)
The change from the prior postings is that this posting only includes _new_ papers (2011):
Artificial Intelligence/Machine Learning/Data Mining

NIMBLE: a toolkit for the implementation of parallel data mining and machine learning algorithms on mapreduce
Distributed Evolutionary Algorithm Using the MapReduce Paradigm–A Case Study for Data Compaction Problem
On Using Pattern Matching Algorithms in MapReduce Applications
Using Variational Inference and MapReduce to Scale Topic Modeling
A MapReduce-based distributed SVM algorithm for automatic image annotation
Scalable and Parallel Boosting with MapReduce
Master-Slave Parallel Genetic Algorithm Based on MapReduce Using Cloud Computing
Fast clustering using MapReduce
K-Means Clustering with Bagging and MapReduce
In-situ MapReduce for Log Processing
Clustering Very Large Multidimensional Datasets with MapReduce
Large Scale Fuzzy pD* Reasoning Using MapReduce
MapReduce network enabled algorithms for classification based on association rules
PARABLE: A PArallel RAndom-partition Based HierarchicaL ClustEring Algorithm for the MapReduce Framework
A MapReduce based parallel SVM for large scale spam filtering
Clustering Systems with Kolmogorov Complexity and MapReduce
Bioinformatics/Medical Informatics

Rapid parallel genome indexing with MapReduce
CloudAligner: A fast and full-featured MapReduce based tool for sequence mapping
Nephele: genotyping via complete composition vectors and MapReduce
Genome Analysis with MapReduce
Parallel Metagenomic Sequence Clustering via Sketching and Maximal Quasi-clique Enumeration on Mapreduce Clouds
Hadoop-GIS: A High Performance Query System for Analytical Medical Imaging with MapReduce
Image and Video Processing

Multi-layer graph-based semi-supervised learning for large-scale image datasets using mapreduce
Skyline web service selection with MapReduce
HIPI: A Hadoop Image Processing Interface for Image-based MapReduce Tasks
An Approach for Processing Large and Non-uniform Media Objects on MapReduce-Based Clusters
Building Wavelet Histograms on Large Data in MapReduce
Statistics and Numerical Mathematics

Solving Linear Programs in MapReduce
Gaussian Deconvolution and MapReduce Approach for ChIP-seq Analysis
Design and implementation of parallel statistical algorithm based on Hadoop’s MapReduce model
A MapReduce framework for on-road mobile fossil fuel combustion CO2 emission estimation
Search and Information Retrieval

Fast personalized PageRank on MapReduce
MapReduce for Experimental Search
Full-text indexing for optimizing selection operations in large-scale data analytics
Sets & Graphs

MapReduce in MPI for Large-scale Graph Algorithms
Design Distributed Digraph Algorithms using MapReduce
An Intermediate Algebra for Optimizing RDF Graph Pattern Matching on MapReduce
Processing theta-joins using MapReduce
Clause-Iteration with MapReduce to Scalably Query Data Graphs in the SHARD Graph-Store
Mining Tera-Scale Graphs with MapReduce: Theory, Engineering and Discoveries
Filtering: a method for solving graph problems in MapReduce
Colorful Triangle Counting and a MapReduce Implementation
A parallel computing model for large-graph mining with MapReduce
Simulation

Molecular Dynamics Simulation Based on Hadoop Mapreduce
TH‐E‐BRC‐04: Monte‐Carlo Simulation in a Cloud Computing Environment with MapReduce
Distributed simulation of P systems by means of mapreduce: first steps with hadoop and P-Lingua
Social Networks
Spatial Data Processing

SDPPF—A MapReduce based parallel processing framework for spatial data
MRGIR: Open geographical information retrieval using MapReduce
Scalable Local Regression for Spatial Analytics
Research on Parallel DBSCAN Algorithm Design Based on MapReduce
Text Processing

P2LSA and P2LSA+: two paralleled probabilistic latent semantic analysis algorithms based on the mapreduce model
Processing Wikipedia Dumps: A Case-Study comparing the XGrid and MapReduce Approaches
MapReduce for HITS Algorithm with Application to Chinese Word Networks
Implementing MapReduce over language and literature data over the UK National Grid Service
Representing n-gram language models for compact storage and fast retrieval
Pingback: Mapreduce & Hadoop Algorithms in Academic Papers (5th update – Nov 2011) « Another Word For It
For those who are interested in MapReduce, these two papers may be interesting:
1) “A Study on Using Uncertain Time Series Matching Algorithms in MapReduce Applications”
http://arxiv.org/abs/1112.5505
2) “MapReduce Implementation of Prestack Kirchhoff Time Migration (PKTM) on Seismic Data”
http://www.computer.org/portal/web/csdl/doi/10.1109/PDCAT.2011.50
Hello,
can anyone help me find code or a methodology for Hadoop clustering in text mining? Please...
Thanks! Hadoop enables resilient, distributed processing of massive unstructured data sets across commodity computer clusters, in which each node of the cluster includes its own storage. MapReduce serves two essential functions: it parcels out work to the various nodes within the cluster (the map step), and it organizes and reduces the results from each node into a cohesive answer to a query (the reduce step). More at http://www.youtube.com/watch?v=1jMR4cHBwZEa