Attended Accel Partners Big Data conference last week. It was a good event with many interesting people, a very crude estimate of distribution: 1/3 VCs/investors, 1/3 startup tech people, 1/3 big corp tech people.
My personal 2 key takeaways from the conference:
- Realtime processing: hot topic with many companies creating their own custom solutions, but wouldn’t object having an exceptionally good opensource solution to gather around.
- Low-latency storage: emerging topic – or as quoted from the talk by Andy Becholsteim’s (Sun/Arista/Granite/Kealia/HighBAR co-founder and early Google-investor): “Hard Disk Drives are not keeping up. Flash solving this problem just in time”. The academic session had also interesting discussions regarding RAM-based storage.
I think Andy Becholsteim’s table titled “Memory Hierarchi is Not Changing” sums up the low-latency storage discussion quite good. I’ve taken the liberty to add a column with rough prices per Petabyte-month (calculation: estimated purchase-price divided by 12, note only the storage itself – not including all the hardware/network in order to run it) for RAM and SSD which are the only ones fit for low-latency AND big data. Note: I think mr. Becholsteim could have added up to petabytes for both SSD and RAM.
Type of memory | Size | Latency | $ per Petabyte-month* (k$) |
---|---|---|---|
L1 cache | 64 KB | ~4 cycles (2 ns) | |
L2 cache | 256 KB | ~10 cycles (5 ns) | |
L3 cache (shared) | 8 MB | 35-40+ cycles (20 ns) | |
Main memory | GBs up to terabytes | 100-400 cycles | 411 (non-ECC) 1,197 (ECC) |
Solid state memory | GBs up to terabytes | 5,000 cycles | 94 |
Disk | Up to petabytes | 1,000,000 cycles |
*Storage price sources and calculations used
RAM (non-ECC): 16GB non-ECC (2x8GB) – price: $79, i.e. $79/16 per GB, $(79/16)K per TB, $(79/16)M per PB, $(79/16)M/12 per PB-month
RAM (ECC): 16GB ECC (1x16GB) – price: $229.98, i.e. $230/16 per GB, $(230/16)K per TB, $(230/16)M per PB, $(230/16)/12 per PB-month.
SSD: 512GB – price $579.99, i.e. $580/512 per GB, $(580/512)K per TB, $(580/512)M per PB, $(580/512)/12 per PB-month.
Conclusion
Since RAM-based storage is up to 50 times faster than SSD (latency-wise) but only roughly 4.3 to 12 times more expensive than SSD it is likely to become high on the agenda in settings where latency matter$ (all types of serving infrastructure, search, finance etc.). In absolute terms the costs for petabytes RAM have become within reach for all Fortune 1000 companies, i.e. about $1.1M per month for the storage alone (ECC RAM). One interesting thing about using RAM only is that for most systems using SSD or Disks there is also a big RAM component in addition, e.g. using memcached or caches various nosql storages, and by moving to RAM-only things might become simpler (i.e. avoiding dealing with memory-vs-disk/ssd-coherency and latency variations when not hitting the memory cache).
Note 1: If you have other sources for interesting large-scale RAM and SSD prices I would appreciate if you could add links to them in the comments below.
Note 2: If you’re interested in large-scale RAM-based key-value stores, check out our opensource project Atbr – github page: https://github.com/atbrox/atbr
Best regards,
Amund Tveit co-founder of Atbrox (@atbrox)
Pingback: A large-scale in-memory storage example