The prior update of this posting was in May, and a lot has happened around MapReduce and Hadoop since then, e.g.
1) big software companies (Microsoft and Oracle) have started offering Hadoop-based software, 2) Hadoop startups have raised record amounts of funding, and 3) the NoSQL landscape has become increasingly data-warehouse-ish and SQL-ish, with a focus on high-level data processing platforms and query languages.
Personally, I have rediscovered Hadoop Pig, and combined with UDFs and streaming it is now my primary way to implement MapReduce algorithms here at Atbrox.
Best regards,
Amund Tveit (twitter.com/atveit)
The change from the prior postings is that this one includes only _new_ papers (from 2011):
Artificial Intelligence/Machine Learning/Data Mining
-
NIMBLE: a toolkit for the implementation of parallel data mining and machine learning algorithms on mapreduce
Distributed Evolutionary Algorithm Using the MapReduce Paradigm–A Case Study for Data Compaction Problem
On Using Pattern Matching Algorithms in MapReduce Applications
Using Variational Inference and MapReduce to Scale Topic Modeling
A MapReduce-based distributed SVM algorithm for automatic image annotation
Scalable and Parallel Boosting with MapReduce
Master-Slave Parallel Genetic Algorithm Based on MapReduce Using Cloud Computing
Fast clustering using MapReduce
K-Means Clustering with Bagging and MapReduce
In-situ MapReduce for Log Processing
Clustering Very Large Multi-dimensional Datasets with MapReduce
Large Scale Fuzzy pD* Reasoning Using MapReduce
MapReduce network enabled algorithms for classification based on association rules
PARABLE: A PArallel RAndom-partition Based HierarchicaL ClustEring Algorithm for the MapReduce Framework
A MapReduce based parallel SVM for large scale spam filtering
Clustering Systems with Kolmogorov Complexity and MapReduce
Bioinformatics/Medical Informatics
-
Rapid parallel genome indexing with MapReduce
CloudAligner: A fast and full-featured MapReduce based tool for sequence mapping
Nephele: genotyping via complete composition vectors and MapReduce
Genome Analysis with MapReduce
Parallel Metagenomic Sequence Clustering via Sketching and Maximal Quasi-clique Enumeration on Map-reduce Clouds
Hadoop-GIS: A High Performance Query System for Analytical Medical Imaging with MapReduce
Image and Video Processing
-
Multi-layer graph-based semi-supervised learning for large-scale image datasets using mapreduce
Skyline web service selection with MapReduce
HIPI: A Hadoop Image Processing Interface for Image-based MapReduce Tasks
An Approach for Processing Large and Non-uniform Media Objects on MapReduce-Based Clusters
Building Wavelet Histograms on Large Data in MapReduce
Statistics and Numerical Mathematics
-
Solving Linear Programs in MapReduce
Gaussian Deconvolution and MapReduce Approach for Chipseq Analysis
Design and implementation of parallel statistical algorithm based on Hadoop’s MapReduce model
A MapReduce framework for on-road mobile fossil fuel combustion CO2 emission estimation
Search and Information Retrieval
-
Fast personalized PageRank on MapReduce
MapReduce for Experimental Search
Full-text indexing for optimizing selection operations in large-scale data analytics
Sets & Graphs
-
MapReduce in MPI for Large-scale Graph Algorithms
Design Distributed Digraph Algorithms using MapReduce
An Intermediate Algebra for Optimizing RDF Graph Pattern Matching on MapReduce
Processing theta-joins using MapReduce
Clause-Iteration with MapReduce to Scalably Query Data Graphs in the SHARD Graph-Store
Mining Tera-Scale Graphs with MapReduce: Theory, Engineering and Discoveries
Filtering: a method for solving graph problems in MapReduce
Colorful Triangle Counting and a MapReduce Implementation
A parallel computing model for large-graph mining with MapReduce
Simulation
-
Molecular Dynamics Simulation Based on Hadoop Mapreduce
TH‐E‐BRC‐04: Monte‐Carlo Simulation in a Cloud Computing Environment with MapReduce
Distributed simulation of P systems by means of map-reduce: first steps with hadoop and P-lingua
Social Networks
Spatial Data Processing
-
SDPPF—A MapReduce based parallel processing framework for spatial data
MRGIR: Open geographical information retrieval using MapReduce
Scalable Local Regression for Spatial Analytics
Research on Parallel DBSCAN Algorithm Design Based on MapReduce
Text Processing
-
P2LSA and P2LSA+: two paralleled probabilistic latent semantic analysis algorithms based on the MapReduce model
Processing Wikipedia Dumps: A Case-Study comparing the XGrid and MapReduce Approaches
MapReduce for HITS Algorithm with Application to Chinese Word Networks
Implementing MapReduce over language and literature data over the UK National Grid Service
Representing n-gram language models for compact storage and fast retrieval