Sometimes it can be useful to compile Python code for Amazon’s Elastic Mapreduce into C++ and then into a binary. The motivation could be to integrate with (existing) C or C++ code, or to increase performance for CPU-intensive mapper or reducer methods. Here follows a description of how to do that:
- Shedskin requires a few libraries: 1) the Boehm-Demers-Weiser garbage collector for C++, 2) PCRE (Perl Compatible Regular Expressions). See the Shedskin tutorial for detailed install instructions.
- note: The Alestic Debian AMI is fairly slim, so we had to add some more software to make Shedskin work, e.g. GDB.
- E.g. the command "python ss.py mapper.py" would generate the C++ files mapper.hpp and mapper.cpp, a Makefile and an annotated Python file mapper.ss.py.
- note: with Fortran-to-C you can probably integrate your Python code with existing Fortran code (e.g. numerical/high performance computing libraries). The same probably holds for Cobol (e.g. in the financial industry) with OpenCobol (which compiles Cobol into C). Please let us know if you try this or need help with it.
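As a rough sketch of what a mapper.py for the Shedskin step above might look like, here is a small word-count mapper. It assumes the job runs in streaming fashion (lines read from standard input, tab-separated records written to standard output), and it sticks to plain functions, lists and strings so that every variable keeps a single type. The word-count task and the names map_line, run_mapper and main are illustrative assumptions, not part of the original setup, and the file has not been checked against any particular Shedskin release.

```python
import sys


def map_line(line):
    # Emit one "word<TAB>1" record per whitespace-separated token.
    records = []
    for word in line.strip().split():
        records.append(word.lower() + "\t1")
    return records


def run_mapper(lines):
    # Collect the output records for a sequence of input lines.
    output = []
    for line in lines:
        for record in map_line(line):
            output.append(record)
    return output


def main():
    # Streaming-style entry point: read stdin line by line, write records to stdout.
    for line in sys.stdin:
        for record in map_line(line):
            print(record)
```

To use it as an actual mapper script, add a call to main() at the end of the file. Keeping the per-line logic in map_line makes it easy to check the mapper in plain Python before compiling it.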
Note: if you skip the Shedskin-related steps, this approach also works if you are looking for how to use C or C++ mappers or reducers with Elastic Mapreduce.
Note: this approach should probably also work with Cloudera’s distribution for Hadoop.