Archive for the ‘Replication’ category

Similarities between Replication and Event or Message Bus

January 24th, 2012

A key component of many distributed software systems is the concept of a Message or Event Bus. Any good messaging architecture should be able to accomplish some basic things when it comes to transporting your messages. I have outlined them below:

  • Publish-Subscribe: Modules may subscribe to certain message types. Whenever a module publishes a message to the bus, it will be delivered to all modules that subscribed to its message type.
  • Broadcast: The message will be delivered to all (other) modules.
  • Point-to-point: The message has one and only one recipient.
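To make the three delivery styles concrete, here is a minimal in-memory sketch of a bus. The `MessageBus` class and its method names are made up for illustration and are not taken from any particular messaging product.

```python
from typing import Any, Callable, Dict, List

# A handler receives the message type and its payload.
Handler = Callable[[str, Any], None]


class MessageBus:
    """Toy in-memory bus covering the three delivery styles (hypothetical API)."""

    def __init__(self) -> None:
        self._modules: Dict[str, Handler] = {}
        self._subscriptions: Dict[str, List[str]] = {}

    def register(self, name: str, handler: Handler) -> None:
        """Attach a module to the bus under a unique name."""
        self._modules[name] = handler

    def subscribe(self, name: str, message_type: str) -> None:
        """Record that a module wants every message of the given type."""
        self._subscriptions.setdefault(message_type, []).append(name)

    def publish(self, message_type: str, payload: Any) -> None:
        """Publish-Subscribe: deliver only to modules subscribed to this type."""
        for name in self._subscriptions.get(message_type, []):
            self._modules[name](message_type, payload)

    def broadcast(self, sender: str, message_type: str, payload: Any) -> None:
        """Broadcast: deliver to every module except the sender."""
        for name, handler in self._modules.items():
            if name != sender:
                handler(message_type, payload)

    def send(self, recipient: str, message_type: str, payload: Any) -> None:
        """Point-to-point: deliver to exactly one named module."""
        self._modules[recipient](message_type, payload)
```

A module here is just a callback; a real bus would add queuing, persistence and network transport on top of the same three routing rules.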

In doing my research into distributed systems I keep wanting to compare replication and a message bus as two techniques for transporting messages around in a distributed system. At its most basic, replication takes care of the Broadcast scenario very easily, right out of the box; that is the very definition of replication. The other two types, Publish-Subscribe and Point-to-Point, were simply not possible with replication up until this point without a lot of application-level logic, which just doesn’t make much sense.

Although CouchDB, with its introduction of selective replication, seems to accomplish this, I don’t want to focus this on Couch. I just want to pose Selective Replication as an alternative to using a message bus, and I would like to see what the pros and cons are for each. This is what prompted me to write the CouchDB admin tool: when you have selective replication, you need to administer the replication documents and filters that tell the databases where the documents need to go. I think needing an admin tool is a drawback of Selective Replication, but on the flip side you will have a lot of code or configuration in the event bus system, particularly in writing and maintaining the many modules you need to get the publishers and subscribers in place.

I think a large pro in favor of Selective or Full Replication is how easy it is to make backups or replicas of your data. Instead of having to pass your messages through the bus, as you would in a Message Bus system, with replication you just bring your db online and it will do the replication for you, so I would expect a significant performance advantage for replication.

As far as Publish-Subscribe messaging goes, this is where Filtered or Selective Replication comes into play: you can set up sets of filters that tell your database where to send the documents and exactly which ones to send. The Point-to-Point scenario is accomplished via this method as well. You just have a single point of replication, and whether you choose to filter the documents is up to you.
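As a rough sketch of how this could look, the Python below builds the two JSON documents involved: a design document holding a filter function (CouchDB filters are written in JavaScript, so it appears here as a string) and a replication document that references it through the `filter` and `query_params` fields. The design document name `routing`, the filter name `by_type`, the `type` field on documents and the database URLs are all assumptions made for the example.

```python
from typing import Any, Dict, Optional


def make_filter_design_doc() -> Dict[str, Any]:
    """Design document to store in the SOURCE database (names are hypothetical)."""
    return {
        "_id": "_design/routing",
        "filters": {
            # Pass only documents whose "type" matches the ?type= query parameter.
            "by_type": "function(doc, req) { return doc.type === req.query.type; }"
        },
    }


def make_replication_doc(
    source: str,
    target: str,
    doc_type: Optional[str] = None,
    continuous: bool = True,
) -> Dict[str, Any]:
    """Replication document; adding a filter turns it into a 'subscription'."""
    doc: Dict[str, Any] = {
        "source": source,
        "target": target,
        "continuous": continuous,
    }
    if doc_type is not None:
        doc["filter"] = "routing/by_type"          # "<design doc>/<filter name>"
        doc["query_params"] = {"type": doc_type}   # exposed to the filter as req.query
    return doc


# Publish-Subscribe: one filtered replication per interested target.
orders_subscription = make_replication_doc(
    "http://hq.example:5984/master", "http://store1.example:5984/orders", "order"
)

# Point-to-point: a single replication to a single target, filter optional.
point_to_point = make_replication_doc(
    "http://hq.example:5984/master", "http://store1.example:5984/inbox"
)
```

Each "subscriber" becomes one replication document, which is exactly the administration overhead described earlier.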

I hope this document can be of use to somebody. I just wanted to get this out there for anybody else looking into options for building their next distributed system. I think both technologies have their advantages and disadvantages and should be used where they fit.

Update on CouchDB Admin Tool

January 20th, 2012

I have included some screenshots of my new and improved version of the CouchDB admin tool. Upon further research I have noticed that CouchDB will only run one replicator document per target and source pair. Once you make one replicator document, if you make another and assign it the same target and source, CouchDB won’t ever kick that second one off until you change either the target or the source.
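An admin tool could warn about this before saving. The helper below is only a sketch based on the behavior described above: it groups replication documents by their (source, target) pair and reports pairs used more than once. The function name is made up, and the exact rule CouchDB applies may take more fields into account than these two.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def find_duplicate_pairs(
    replication_docs: List[Dict[str, Any]],
) -> Dict[Tuple[str, str], List[str]]:
    """Map each (source, target) pair used more than once to the ids using it."""
    by_pair: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for doc in replication_docs:
        by_pair[(doc["source"], doc["target"])].append(doc["_id"])
    # Keep only the pairs that appear in two or more replication documents.
    return {pair: ids for pair, ids in by_pair.items() if len(ids) > 1}


docs = [
    {"_id": "rep1", "source": "master", "target": "store1"},
    {"_id": "rep2", "source": "master", "target": "store1"},
    {"_id": "rep3", "source": "master", "target": "store2"},
]
duplicates = find_duplicate_pairs(docs)
```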

In this new version I have busted out the UI into more tabs so it makes more sense. I also created a DocTypeViewModel to expose the replication documents you create to the UI. The code has been updated at http://code.google.com/p/couchdb-replication-admin-tool/

CouchDB Replication Admin Tool

January 16th, 2012

I whipped up this admin tool in WPF to show how you can manage users along with replication of your documents. You will have to have the Couchbase server installed before you can run the example though; you can download that HERE. The code is pretty simple; the key is the redbranch-hammock library that I have been working with and modifying to my needs. At this point the project is a WPF project. It would probably be better implemented as an ASP.NET site, but I come from a desktop background and I was working with a local copy of Couch on my machine, so I went with WPF for now. I will probably port it over to some sort of web project down the road. Here is the link to the Google Code SVN repository: http://code.google.com/p/couchdb-replication-admin-tool/

Filtered Replication Scenario for distributed systems.

January 10th, 2012

As you may know, I have been studying CouchDB, specifically its ability to replicate your data, and trying to get a better understanding of how all this works. I have modeled out the basic data flow for selective, or filtered, replication of the data and I have attached the diagram below. I will give a little bit of an explanation of what is going on here to hopefully make more sense of how things are going to work.

We have an admin tool, most likely residing on a web server somewhere, that can do the standard CRUD operations on users as well as on the CouchDB documents that filter the data. Those changes will all go into a database that is housed on the main server on a per-organization basis; say the organization is your local store and the Master DB is your HQ. That is the admin piece, now on to the server side.

The server, or Master DB, will just be the central repository of data. In my current idea the filters will all sit in the Filtered DBs, and we will let those Filtered DBs make all the decisions on where the data goes when it comes to the server.

The client will have the full ability to create all the documents on the local or client side, for simplicity at this point. (I am defining the client as an actual application that will reside on the desktop, mobile device, whatever.) The Filtered DB will grab all the data from the client and hold it for filtered replication to the server later; this also seems like a nice way to have a backup of your local data. The idea at this point is to have the client side create the Filtered DB on the server side, which is simple enough via CouchDB.
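Creating a database in CouchDB is a single HTTP `PUT` to the database name, so the client-side step could look roughly like the sketch below. The server URL and the database name `store42_filtered` are assumptions for illustration, and the request is only built, not sent, so the snippet runs without a server.

```python
import urllib.parse
import urllib.request


def build_create_db_request(server_url: str, db_name: str) -> urllib.request.Request:
    """Build (but do not send) the PUT request that creates a CouchDB database."""
    # Percent-encode the name so characters such as "/" stay inside one path segment.
    encoded_name = urllib.parse.quote(db_name, safe="")
    url = server_url.rstrip("/") + "/" + encoded_name
    return urllib.request.Request(url, method="PUT")


# Hypothetical server and database name.
request = build_create_db_request("http://localhost:5984", "store42_filtered")

# To actually create the database, send it:
#     urllib.request.urlopen(request)
```

A real client would also need to authenticate and to handle the error response CouchDB returns when the database already exists.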

That pretty much sums it up; I have the diagram below.

CouchDB presentation @ StrangeLoop

January 9th, 2012

Here is a good little video of a presentation that Benjamin Young of Couchbase gave at the StrangeLoop conference. If you are considering writing a CouchDB app or migrating a current application to Couch, I recommend watching this video. He gives a good rundown of the features of the NoSQL engine.

http://www.infoq.com/presentations/Why-CouchDB

My first contribution to the open source world

January 9th, 2012

I have been working with an excellent library for CouchDB development called relax-net, located here:  http://code.google.com/p/relax-net/.

After having some discussions with Nick, the creator of the library and an overall really nice guy, I found it was lacking some support for Replication and Replication Filters, so I took the time to write them and include that support in the library. Thanks for including me in the acknowledgement, Nick!!!

UPDATE: I found that the code I committed into the project was not complete. When you add a replication document to the _replicator database, you have to make sure you are doing a POST instead of a PUT. The code as committed was just placing documents in the system and not kicking any kind of replication off.
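For anyone who wants to see the shape of that request outside of relax-net, here is a small sketch that builds a POST of a replication document to the `_replicator` database. The server URL and the source and target names are assumptions, and the request is only constructed, not sent.

```python
import json
import urllib.request
from typing import Any, Dict


def build_replicator_post(
    server_url: str, replication_doc: Dict[str, Any]
) -> urllib.request.Request:
    """Build (but do not send) a POST of a replication document to _replicator."""
    body = json.dumps(replication_doc).encode("utf-8")
    return urllib.request.Request(
        server_url.rstrip("/") + "/_replicator",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical source and target databases.
replication_doc = {"source": "store42_filtered", "target": "master", "continuous": True}
post_request = build_replicator_post("http://localhost:5984", replication_doc)

# To start the replication against a running server:
#     urllib.request.urlopen(post_request)
```

With POST, CouchDB assigns the document id itself, so the caller does not need to pick one.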

Database Replication vs. Message-Based system

January 9th, 2012

Replication is a topic I have been keenly interested in for the past month or so. I have been studying various NoSQL database engines, in particular RavenDB, MongoDB and CouchDB. Each one of the technologies has its strengths and weaknesses, but when it comes to replication in particular, CouchDB seems to be way ahead of the game with its implementation of selective, or filtered, replication. I wonder what the benefits are of using a message-based system to get your JSON documents to your database versus just letting the database handle them via replication. Any thoughts or comments are welcome.