Jul 11
Data Sharding and Replication in Ruby
Filed Under apple, ruby, technology
The good people at FiveRuns have just released a gem to the Ruby community that gives your ActiveRecord-managed data models built-in sharding and/or replication, with just a few lines of code!
What do I mean by data sharding? Here is how FiveRuns explains it:
Specifically we needed two features to scale our mysql database: application-level sharding and master/slave replication. Sharding is the process of splitting a dataset across many independent databases. This often happens based on geographical region (e.g. craigslist) or user account (e.g. flickr). Replication provides a near-real-time copy of a database which can be used for fault tolerance and to reduce load on the master node. Combined, you get a scalable database solution which does not require huge hardware to scale to huge volumes.
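To make the two ideas in that description concrete, here is a minimal plain-Ruby sketch. It is not DataFabric's API or implementation. The class names `ShardRouter` and `ReplicatedDatabase`, the shard names, the modulo/CRC32 routing scheme, and the SQL-prefix check for writes are all hypothetical choices made for illustration, and the "connections" are plain strings rather than real database handles.

```ruby
require 'zlib'

# Hypothetical illustration only: NOT DataFabric's API.
# Sharding: pick one independent database for a given key.
# Integer account ids use modulo; other keys (e.g. a region name such as
# "austin") use a stable CRC32 hash so the same key always maps to the
# same shard, in every process.
class ShardRouter
  def initialize(shard_names)
    raise ArgumentError, 'need at least one shard' if shard_names.empty?
    @shard_names = shard_names
  end

  def shard_for(key)
    index = key.is_a?(Integer) ? key : Zlib.crc32(key.to_s)
    @shard_names[index % @shard_names.size]
  end
end

# Hypothetical master/slave routing: statements that change data go to
# the master; reads are spread round-robin across the slaves, falling
# back to the master when there are no slaves.
class ReplicatedDatabase
  # Deliberately simplistic: classifies a statement by its first keyword.
  WRITE_STATEMENTS = /\A\s*(INSERT|UPDATE|DELETE|REPLACE|CREATE|ALTER|DROP)\b/i

  def initialize(master, slaves = [])
    @master = master
    @slaves = slaves
    @next_slave = 0
  end

  def connection_for(sql)
    return @master if sql.match?(WRITE_STATEMENTS) || @slaves.empty?

    slave = @slaves[@next_slave % @slaves.size]
    @next_slave += 1
    slave
  end
end

# Usage
router = ShardRouter.new(%w[users_shard_0 users_shard_1 users_shard_2])
router.shard_for(42)                              # => "users_shard_0" (42 % 3 == 0)

db = ReplicatedDatabase.new('master', %w[slave_a slave_b])
db.connection_for('SELECT * FROM users')          # => "slave_a"
db.connection_for('SELECT * FROM users')          # => "slave_b"
db.connection_for("UPDATE users SET name = 'x'")  # => "master"
```

Two caveats apply to a scheme like this. Because replication is only near-real-time, a read sent to a slave immediately after a write may not see that write yet. Plain modulo routing also remaps most keys whenever a shard is added, which is why real deployments often use a lookup table or consistent hashing instead.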
They call the new gem DataFabric. It makes it super simple (and DRY) to scale your application across multiple database shards, or simply to replicate your data to other database servers if that is all you need.
For me, the idea of having something as complex as data sharding built right into ActiveRecord is absolutely fascinating! There is no MySQL Proxy business and no strange DRb services running in the background; it's essentially plug and play.
Comments
That’s really awesome, except when people ask how you did it and you respond “I sharded”
Glad you like it Derek. Let me know if you ever use it in anger, I love to hear war stories.
@Mike,
I haven’t had a chance to use it yet, but I am really interested in this approach to solving the problem. I am just glad that a team smarter than I has solved it first :)
great work!