Main takeaways from Accel’s Big Data Conference

I attended Accel Partners’ Big Data conference last week. It was a good event with many interesting people. A very crude estimate of the audience: roughly 1/3 VCs and investors, 1/3 startup tech people, and 1/3 big-corp tech people.

My two key takeaways from the conference:

  1. Realtime processing: a hot topic, with many companies building their own custom solutions, though they wouldn’t object to having an exceptionally good open-source solution to gather around.
  2. Low-latency storage: an emerging topic. As Andy Bechtolsheim (co-founder of Sun/Arista/Granite/Kealia/HighBAR and early Google investor) put it in his talk: “Hard Disk Drives are not keeping up. Flash solving this problem just in time”. The academic session also had interesting discussions about RAM-based storage.

I think Andy Bechtolsheim’s table titled “Memory Hierarchy is Not Changing” sums up the low-latency storage discussion quite well. I’ve taken the liberty of adding a column with rough prices per petabyte-month for RAM and SSD, which are the only ones fit for both low latency and big data. The calculation is the estimated purchase price divided by 12; note that it covers only the storage itself, not the other hardware and networking needed to run it. Note: I think Mr. Bechtolsheim could have listed sizes up to petabytes for both SSD and RAM.

| Type of memory | Size | Latency | $ per Petabyte-month* (k$) |
|---|---|---|---|
| L1 cache | 64 KB | ~4 cycles (2 ns) | |
| L2 cache | 256 KB | ~10 cycles (5 ns) | |
| L3 cache (shared) | 8 MB | 35-40+ cycles (20 ns) | |
| Main memory | GBs up to terabytes | 100-400 cycles | 411 (non-ECC), 1,197 (ECC) |
| Solid state memory | GBs up to terabytes | 5,000 cycles | 94 |
| Disk | Up to petabytes | 1,000,000 cycles | |

*Storage price sources and calculations used

RAM (non-ECC): 16GB non-ECC (2x8GB) – price: $79, i.e. $79/16 per GB, $(79/16)K per TB, $(79/16)M per PB, $(79/16)M/12 per PB-month.
RAM (ECC): 16GB ECC (1x16GB) – price: $229.98, i.e. $230/16 per GB, $(230/16)K per TB, $(230/16)M per PB, $(230/16)M/12 per PB-month.
SSD: 512GB – price: $579.99, i.e. $580/512 per GB, $(580/512)K per TB, $(580/512)M per PB, $(580/512)M/12 per PB-month.
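
The price arithmetic above can be reproduced with a few lines of Python. This is only a sketch: the helper name `kusd_per_pb_month` is made up for illustration, and it assumes decimal units (1 PB = 1,000,000 GB) and the 12-month write-off used in this post.

```python
# Prices and capacities are the ones quoted in the post.
# Assumptions: decimal units (1 PB = 1,000,000 GB) and a 12-month write-off.
GB_PER_PB = 1_000_000
WRITE_OFF_MONTHS = 12


def kusd_per_pb_month(price_usd: float, capacity_gb: float) -> float:
    """Return the storage-only cost in k$ per petabyte-month (hypothetical helper)."""
    usd_per_gb = price_usd / capacity_gb
    usd_per_pb = usd_per_gb * GB_PER_PB
    return usd_per_pb / WRITE_OFF_MONTHS / 1000


# (purchase price in USD, capacity in GB)
quotes = {
    "RAM (non-ECC)": (79.00, 16),
    "RAM (ECC)": (229.98, 16),
    "SSD": (579.99, 512),
}

costs = {name: kusd_per_pb_month(price, gb) for name, (price, gb) in quotes.items()}

for name, cost in costs.items():
    print(f"{name}: {cost:,.1f} k$ per PB-month")
```

The table entries are these values with the fraction dropped.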

Conclusion

Since RAM-based storage is up to 50 times faster than SSD (latency-wise) but only roughly 4.4 to 12.7 times more expensive, it is likely to move high up the agenda in settings where latency matters (all types of serving infrastructure, search, finance, etc.). In absolute terms, the cost of a petabyte of RAM has come within reach of all Fortune 1000 companies: about $1.2M per month for the storage alone (ECC RAM). One interesting aspect of going RAM-only is that most systems built on SSDs or disks also have a big RAM component, e.g. memcached or the caches of various NoSQL stores. Moving to RAM-only might therefore make things simpler, i.e. it avoids dealing with memory-vs-disk/SSD coherency and with the latency variations that occur when a request misses the memory cache.
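
As a rough check of the speed-versus-price comparison, the ratios can be computed directly from the table figures. This is a sketch only; the variable names are illustrative, and the inputs are the rounded numbers from the table (latency in CPU cycles, price in k$ per petabyte-month).

```python
# Figures taken from the table in this post.
RAM_LATENCY_CYCLES = (100, 400)  # best case, worst case
SSD_LATENCY_CYCLES = 5_000
COST_KUSD = {"ram_non_ecc": 411, "ram_ecc": 1_197, "ssd": 94}

# How many times faster RAM is than SSD, latency-wise.
best_speedup = SSD_LATENCY_CYCLES / RAM_LATENCY_CYCLES[0]
worst_speedup = SSD_LATENCY_CYCLES / RAM_LATENCY_CYCLES[1]

# How many times more expensive RAM is than SSD per petabyte-month.
cost_ratio_non_ecc = COST_KUSD["ram_non_ecc"] / COST_KUSD["ssd"]
cost_ratio_ecc = COST_KUSD["ram_ecc"] / COST_KUSD["ssd"]

print(f"RAM vs SSD latency: {worst_speedup:.1f}x to {best_speedup:.0f}x faster")
print(f"RAM vs SSD price:   {cost_ratio_non_ecc:.1f}x to {cost_ratio_ecc:.1f}x more expensive")
```

Note that the 50x figure is the best case; at the slow end of the main-memory latency range the advantage shrinks to about the same factor as the ECC price premium.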

Note 1: If you have other sources for interesting large-scale RAM and SSD prices, I would appreciate it if you could add links to them in the comments below.

Note 2: If you’re interested in large-scale RAM-based key-value stores, check out our open-source project Atbr – GitHub page: https://github.com/atbrox/atbr

Best regards,

Amund Tveit, co-founder of Atbrox (@atbrox)


3 Responses to Main takeaways from Accel’s Big Data Conference

  1. Alexander Kjeldaas says:

    You are forgetting the electricity bill.

    1 petabyte * (3W / GB) * (0.1 USD / kWh) in USD/month
    = 230 kUSD/month
    (the query works on Google).

    I am pulling the 3W/GB number out of thin air. It can probably be quite a bit lower than that, but I am guessing that a lower bound is around 0.5W + cooling. See the Micron document referenced below.

    This is a fairly significant part of the equation – even when you’re pretty aggressive at writing off the RAM investment over 12 months.

    Micron has a spreadsheet for estimating RAM power consumption.
    http://www.micron.com/~/media/Documents/Products/Technical%20Note/DRAM/4292TN41_01DDR3_Power.ashx

    This should be part of standard profiling tools these days 😉

    Here are some links to SSD power consumption.
    http://www.tomshardware.com/reviews/ssd-nand-reliability,3021-7.html
    http://www.anandtech.com/show/4253/the-crucial-m4-micron-c400-ssd-review/11

    Based on the AnandTech numbers, it seems that SSDs can give almost two orders of magnitude less W/GB. Thus, ignoring power consumption skews the results in favor of the faster parts of the system.

  2. Amund Tveit says:

    Good comment. But 230kUSD/month (with 3W/GB) is only 1/4th of the cost of the RAM per month. If the lower limit for power usage is about 0.5W + cooling, let us assume 1W, which puts the power cost at 76kUSD/month, i.e. only 6.3% (76/1197) of the RAM price, which is almost negligible compared to the price of RAM.

    B.R,
    Amund

  3. Pingback: A large-scale in-memory storage example
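
The electricity arithmetic from the comments above can be sketched as follows. The function name is hypothetical, and two things are assumptions rather than facts from the thread: that the petabyte and gigabyte are binary units (1 PB = 2^20 GB, which appears to be what reproduces the 230 kUSD figure) and that a month is 1/12 of a 365.25-day year.

```python
# Assumptions (not stated in the thread): binary units and an average month length.
GB_PER_PB = 2**20
HOURS_PER_MONTH = 365.25 * 24 / 12  # 730.5 hours


def power_cost_kusd_per_month(watts_per_gb: float, usd_per_kwh: float = 0.10) -> float:
    """Return the monthly electricity cost in kUSD for one petabyte (hypothetical helper)."""
    kilowatts = watts_per_gb * GB_PER_PB / 1000
    usd_per_month = kilowatts * usd_per_kwh * HOURS_PER_MONTH
    return usd_per_month / 1000


# The three power draws mentioned in the comments: 3 W/GB, 1 W/GB and 0.5 W/GB.
for watts in (3.0, 1.0, 0.5):
    cost = power_cost_kusd_per_month(watts)
    print(f"{watts} W/GB: {cost:.1f} kUSD/month")
```

With decimal units instead (1 PB = 1,000,000 GB), the 3 W/GB case comes out a few percent lower, so the conclusion in the thread does not depend on that choice.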
