Hi Kai,

On Fri, May 08, 2009 at 11:42:17AM +0100, Kai Hendry wrote:

> I don't think Google use RAID on their high availability
> configurations.

A couple of years ago I attended a Google talk at a UKUUG event in
Manchester, just after the publication of Google's research on hard
drive reliability. At the talk, the Google employee (whose name now
escapes me) said something along the lines of:

- their servers are built in-house from commodity components

- each server has several disk drives, each used independently

- if a disk drive dies, they use hdparm to power it down and keep
  running the server regardless

- once all the disks are dead, the server is taken out of service and
  scheduled for rebuild

- losing a disk takes some data offline, but there is enough mirroring
  at the rack, suite and datacentre level to carry on; another copy of
  the data is brought online elsewhere according to the redundancy
  requirements of that data

However, it is important to remember that Google has hundreds of
thousands of servers and operates at a scale that simply does not
apply to most other environments. At extremes of scale like that, the
trade-offs are skewed quite significantly.

> Anyone who has listened to http://blog.stackoverflow.com/ or like me
> has had some PAINFUL experiences with RAID, can't help feel cynical
> towards RAID.

Well, it isn't perfect, but on the whole I love it. It's saved my
bacon many, many times, and without it it's very difficult to achieve
good IO performance.

> Hence I recommend one drive in each machine.

The problem is that a single conventional disk drive is very limited
in its performance. It's much more efficient to have several of them
attached to a fast bus, since at least then there can be some
parallelism while the rest of the machine is waiting for one IO
request to be handled.

You could instead scale horizontally with many smaller machines that
each have fewer disks, but disks are so slow that even a modest IO
requirement would leave such a server throwing away CPU power for lack
of IO. It's hard to buy a cheaper CPU than a bottom-of-the-range
dual-core 64-bit model, yet one core of that can do quite a lot of
work if supplied with enough IO. If you increased IO by buying an
entire new machine, you would have to pay for another CPU that is
largely wasted; it is a lot more efficient to put several disks into
one machine.

If you can design your software and/or processes to work with multiple
mount points, no RAID, and to survive the loss of some proportion of
those mount points, then that is great. It leaves you only with the
risk of lost productivity from having to rebuild machines whose system
disks die. Most people need to use general-purpose software or more
conventional filesystem setups, though, and would spend more time
fixing their machines, so for them RAID does make sense.

Disks are of course not the only components that can break, so the
need for multiple machines is still there regardless.

Cheers,
Andy

-- 
http://bitfolk.com/ -- No-nonsense VPS hosting
Encrypted mail welcome - keyid 0x604DE5DB
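
For concreteness, here is a minimal sketch of the "multiple mount
points, no RAID" approach discussed above. It assumes a Python
application that simply writes each object to a couple of whichever
independently-mounted data disks are still usable and skips the dead
ones; the mount point paths, the replica count and the is_usable()
check are all invented for illustration and are not anything Google
(or anyone else) has described.

    #!/usr/bin/env python3
    """Sketch: write each object to a few of whichever independent
    data disks are still usable, and carry on when some are lost.
    Paths, replica count and is_usable() are illustrative only."""

    import os

    # Hypothetical independently-mounted data disks.
    MOUNT_POINTS = ["/data/disk0", "/data/disk1",
                    "/data/disk2", "/data/disk3"]
    REPLICAS = 2  # how many copies of each object to keep


    def is_usable(mp):
        """Crude health check: the mount point exists and is writable."""
        return os.path.isdir(mp) and os.access(mp, os.W_OK)


    def store(name, data):
        """Write data under name on up to REPLICAS usable mount points."""
        written = []
        for mp in MOUNT_POINTS:
            if len(written) >= REPLICAS:
                break
            if not is_usable(mp):
                continue            # disk is dead or unmounted: carry on
            path = os.path.join(mp, name)
            try:
                with open(path, "wb") as f:
                    f.write(data)
            except OSError:
                continue            # treat an IO error as a lost disk
            written.append(path)
        if not written:
            raise RuntimeError("no usable mount points left")
        return written


    def fetch(name):
        """Return the first surviving copy of name."""
        for mp in MOUNT_POINTS:
            try:
                with open(os.path.join(mp, name), "rb") as f:
                    return f.read()
            except OSError:
                continue
        raise FileNotFoundError(name)

store() stands in for what would otherwise be a single write to a
RAID-backed filesystem, and fetch() falls back across the surviving
copies. The hdparm step mentioned in the talk was presumably something
along the lines of "hdparm -Y /dev/sdX", which puts a drive into its
lowest-power sleep state, though the exact invocation Google used was
not given.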