May 14

I attended the Accel Partners Big Data conference last week. It was a good event with many interesting people. A very crude estimate of the audience: about 1/3 VCs/investors, 1/3 startup tech people and 1/3 big-corp tech people.

My two key takeaways from the conference:

  1. Realtime processing: a hot topic, with many companies creating their own custom solutions, though they wouldn’t object to having an exceptionally good open-source solution to gather around.
  2. Low-latency storage: an emerging topic, or as quoted from the talk by Andy Bechtolsheim (Sun/Arista/Granite/Kealia/HighBAR co-founder and early Google investor): “Hard Disk Drives are not keeping up. Flash solving this problem just in time”. The academic session also had interesting discussions about RAM-based storage.

I think Andy Bechtolsheim’s table titled “Memory Hierarchy is Not Changing” sums up the low-latency storage discussion quite well. I’ve taken the liberty of adding a column with rough prices per petabyte-month for RAM and SSD, which are the only types fit for both low latency AND big data. The calculation is the estimated purchase price divided by 12, and it covers only the storage itself, not the hardware and network needed to run it. Note: I think Mr. Bechtolsheim could have listed “up to petabytes” for both SSD and RAM.

| Type of memory | Size | Latency | $ per Petabyte-month* (k$) |
|---|---|---|---|
| L1 cache | 64 KB | ~4 cycles (2 ns) | |
| L2 cache | 256 KB | ~10 cycles (5 ns) | |
| L3 cache (shared) | 8 MB | 35-40+ cycles (20 ns) | |
| Main memory | GBs up to terabytes | 100-400 cycles | 411 (non-ECC), 1,197 (ECC) |
| Solid state memory | GBs up to terabytes | 5,000 cycles | 94 |
| Disk | Up to petabytes | 1,000,000 cycles | |

*Storage price sources and calculations used

RAM (non-ECC): 16GB non-ECC (2x8GB) – price: $79, i.e. $79/16 per GB, $(79/16)K per TB, $(79/16)M per PB, $(79/16)M/12 per PB-month
RAM (ECC): 16GB ECC (1x16GB) – price: $229.98, i.e. $230/16 per GB, $(230/16)K per TB, $(230/16)M per PB, $(230/16)M/12 per PB-month.
SSD: 512GB – price: $579.99, i.e. $580/512 per GB, $(580/512)K per TB, $(580/512)M per PB, $(580/512)M/12 per PB-month.
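
The arithmetic above can be reproduced with a few lines of Python. This is only a sketch: the names `PARTS` and `usd_per_pb_month` are made up for illustration, it assumes decimal units (1 PB = 1,000,000 GB, matching the GB to TB to PB steps above), and it uses the exact list prices rather than the rounded $230 and $580.

```python
# Purchase price (USD) and capacity (GB) for each part, as quoted in the post.
PARTS = {
    "RAM (non-ECC)": (79.00, 16),
    "RAM (ECC)": (229.98, 16),
    "SSD": (579.99, 512),
}

GB_PER_PB = 1_000_000  # assumption: decimal units, as in the K/M steps above
MONTHS = 12            # purchase price spread over 12 months


def usd_per_pb_month(price_usd, capacity_gb):
    """Storage-only cost of one petabyte for one month, in USD."""
    return price_usd / capacity_gb * GB_PER_PB / MONTHS


for name, (price, capacity) in PARTS.items():
    cost_k = usd_per_pb_month(price, capacity) / 1000
    print(f"{name}: {cost_k:,.0f} k$ per PB-month")
```

With normal rounding the ECC figure comes out as 1,198 rather than the 1,197 shown in the table, which appears to be truncated. The other two values match.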

Conclusion

Since RAM-based storage is up to 50 times faster than SSD (latency-wise: 100-400 cycles vs. 5,000 cycles) but only roughly 4.4 to 12.7 times as expensive, it is likely to move high on the agenda in settings where latency matters (all types of serving infrastructure, search, finance etc.). In absolute terms the cost of a petabyte of RAM has come within reach of all Fortune 1000 companies, i.e. about $1.2M per month for the storage alone (ECC RAM). One interesting thing about going RAM-only is that most systems using SSDs or disks also have a big RAM component on top, e.g. memcached or the caches of various NoSQL stores. Moving to RAM-only might make things simpler, i.e. no need to deal with memory-vs-disk/SSD coherency or with latency variations when a request misses the memory cache.
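
As a rough check of the speed-versus-price comparison, here is a small Python sketch using only the latency and price figures from the table. The function names are hypothetical, and the latencies are the coarse cycle counts from the slide, not measurements.

```python
# Figures taken from the table: latency in CPU cycles, cost in k$ per PB-month.
RAM_LATENCY_CYCLES = (100, 400)  # best case, worst case
SSD_LATENCY_CYCLES = 5_000
RAM_COST_K = {"non-ECC": 411, "ECC": 1_197}
SSD_COST_K = 94


def latency_advantage():
    """Return (min, max) factor by which RAM latency beats SSD latency."""
    best, worst = RAM_LATENCY_CYCLES
    return SSD_LATENCY_CYCLES / worst, SSD_LATENCY_CYCLES / best


def cost_ratio(kind):
    """Return how many times more RAM of the given kind costs than SSD."""
    return RAM_COST_K[kind] / SSD_COST_K


low, high = latency_advantage()
print(f"RAM is {low:.1f}x to {high:.1f}x faster than SSD")
for kind in RAM_COST_K:
    print(f"{kind} RAM costs {cost_ratio(kind):.1f}x as much as SSD")
```

Under these numbers the latency advantage (12.5x to 50x) is at least as large as the price premium (about 4.4x to 12.7x), which is the core of the argument above.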

Note 1: If you have other sources for interesting large-scale RAM and SSD prices, I would appreciate it if you could add links to them in the comments below.

Note 2: If you’re interested in large-scale RAM-based key-value stores, check out our open-source project Atbr (GitHub page: https://github.com/atbrox/atbr).

Best regards,

Amund Tveit, co-founder of Atbrox (@atbrox)


3 Responses to “Main takeaways from Accel’s Big Data Conference”

  1. Alexander Kjeldaas Says:

    You are forgetting the electricity bill.

    1 petabyte * (3W / GB) * (0.1 USD / kWh) in USD/month
    = 230 kUSD/month
    (the query works on Google).

    I am pulling the 3W/GB number out of thin air. It can probably be quite a bit lower than that, but I am guessing that a lower bound is around 0.5W + cooling. See the Micron document referenced below.

    This is a fairly significant part of the equation – even when you’re pretty aggressive at writing off the RAM investment over 12 months.

    Micron has a spreadsheet for estimating RAM power consumption.
    http://www.micron.com/~/media/Documents/Products/Technical%20Note/DRAM/4292TN41_01DDR3_Power.ashx

    This should be part of standard profiling tools these days ;-)

    Here are some links to SSD power consumption.
    http://www.tomshardware.com/reviews/ssd-nand-reliability,3021-7.html
    http://www.anandtech.com/show/4253/the-crucial-m4-micron-c400-ssd-review/11

    Based on the AnandTech numbers, it seems that SSDs can draw almost two orders of magnitude less W/GB. Thus ignoring power consumption skews the results in favor of the faster parts of the system.

  2. Amund Tveit Says:

    Good comment. But 230 kUSD/month (with 3W/GB) is only about 1/5 of the monthly cost of the ECC RAM (230/1197). If the lower limit for power usage is about 0.5W + cooling, let us assume 1W per GB. That puts the power cost at about 76 kUSD/month, i.e. only 6.3% (76/1197) of the RAM price, which is almost negligible compared to the price of the RAM.

    B.R,
    Amund

  3. A large-scale in-memory storage example Says:

    [...] Main takeaways from Accel’s Big Data Conference May 16 [...]
