Mapreduce & Hadoop Algorithms in Academic Papers (5th update – Nov 2011)

The prior update of this posting was in May, and a lot has happened related to MapReduce and Hadoop since then, e.g.:
1) big software companies have started offering Hadoop-based software (Microsoft and Oracle), 2) Hadoop startups have raised record amounts of funding, and 3) the NoSQL landscape is becoming increasingly data-warehouse-like and SQL-like, with the focus on high-level data processing platforms and query languages.

Personally I have rediscovered Hadoop Pig and combine it with UDFs and streaming as my primary way to implement MapReduce algorithms here at Atbrox.
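For readers who have not used the streaming approach mentioned above: Hadoop Streaming runs any executable as the mapper or reducer, piping input records over stdin and collecting tab-separated key/value pairs from stdout. Below is a minimal sketch in Python, assuming a simple word-count task; the `wc.py` script name and the `hadoop jar` invocation in the comment are illustrative, not from the post:

```python
#!/usr/bin/env python
# Minimal Hadoop Streaming word count. Hadoop pipes input lines to the
# mapper on stdin, sorts the mapper's tab-separated output by key, and
# pipes the sorted stream to the reducer.
import sys
from itertools import groupby

def mapper(lines):
    """Emit one tab-separated (word, 1) pair per word in the input."""
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word

def reducer(pairs):
    """Sum the counts per word; assumes input is sorted by key,
    which Hadoop guarantees between the map and reduce phases."""
    split = (p.split("\t", 1) for p in pairs)
    for word, group in groupby(split, key=lambda kv: kv[0]):
        yield "%s\t%d" % (word, sum(int(count) for _, count in group))

if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical invocation: Hadoop would run this script twice, e.g.
    #   hadoop jar hadoop-streaming.jar -mapper 'wc.py map' \
    #       -reducer 'wc.py reduce' -input in/ -output out/
    step = mapper if sys.argv[1] == "map" else reducer
    for record in step(line.rstrip("\n") for line in sys.stdin):
        print(record)
```

Pig UDFs cover the cases where built-in operators fall short, while a streaming script like this is the quickest way to drop an arbitrary program into a pipeline.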

Best regards,
Amund Tveit (twitter.com/atveit)

The change from the prior postings is that this posting only includes _new_ papers (2011):

Artificial Intelligence/Machine Learning/Data Mining

Bioinformatics/Medical Informatics

Image and Video Processing

Statistics and Numerical Mathematics

Search and Information Retrieval

Sets & Graphs

Simulation

Social Networks

Spatial Data Processing

Text Processing

This entry was posted in hadoop, machine learning, mapreduce.

7 Responses to Mapreduce & Hadoop Algorithms in Academic Papers (5th update – Nov 2011)

  1. Pingback: 30 Hadoop and Big Data Spelunkers Worth Following | My Blog

  2. Pingback: Mapreduce & Hadoop Algorithms in Academic Papers (5th update – Nov 2011) « Another Word For It

  3. Nikzad says:

    For those who are interested in MapReduce, these two papers may be interesting:
    1) “A Study on Using Uncertain Time Series Matching Algorithms in Map-Reduce Applications”
    http://arxiv.org/abs/1112.5505

    2) “MapReduce Implementation of Prestack Kirchhoff Time Migration (PKTM) on Seismic Data”
    http://www.computer.org/portal/web/csdl/doi/10.1109/PDCAT.2011.50

  4. Karthikeyan says:

    Hello,
    can anyone help me find the code or methodology for Hadoop clustering in text mining? Please.

  5. Pingback: 09CST-FYP交流平台 » 数据分析与数据挖掘相关资源整理

  6. Pingback: Hadoop Learning Resources | hadoop4u

  7. Ratnesh says:

    Thanks! Hadoop enables resilient, distributed processing of massive unstructured data sets across commodity computer clusters, in which each node of the cluster includes its own storage. MapReduce serves two essential functions: it parcels out work to the various nodes within the cluster (the map step), and it organizes and reduces the results from each node into a cohesive answer to a query (the reduce step). More at http://www.youtube.com/watch?v=1jMR4cHBwZEa
