Archive for the ‘Filtered Replication’ category

Similarities Between Replication and an Event or Message Bus

January 24th, 2012

A key component of many distributed software systems is a Message or Event Bus. Any good messaging architecture should be able to handle a few basic delivery patterns when transporting your messages; I have outlined them below:

  • Publish-Subscribe: Modules may subscribe to certain message types. Whenever a module publishes a message to the bus, it will be delivered to all modules that subscribed to its message type.
  • Broadcast: The message will be delivered to all (other) modules.
  • Point-to-point: The message has one and only one recipient.
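To make the three delivery modes concrete, here is a minimal in-process sketch in Python. `MessageBus`, `register`, `subscribe`, `publish`, `broadcast` and `send` are hypothetical names chosen for illustration, not the API of any real messaging product; a real bus would add queues, persistence and a network transport.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class MessageBus:
    """Toy bus illustrating the three delivery modes (hypothetical, not a real library)."""

    def __init__(self) -> None:
        self._modules: Dict[str, Handler] = {}                          # module name -> handler
        self._subscriptions: Dict[str, List[str]] = defaultdict(list)  # message type -> module names

    def register(self, name: str, handler: Handler) -> None:
        self._modules[name] = handler

    def subscribe(self, name: str, message_type: str) -> None:
        self._subscriptions[message_type].append(name)

    def publish(self, message: dict) -> None:
        # Publish-Subscribe: deliver only to modules subscribed to this message type.
        for name in self._subscriptions.get(message["type"], []):
            self._modules[name](message)

    def broadcast(self, sender: str, message: dict) -> None:
        # Broadcast: deliver to every registered module except the sender.
        for name, handler in self._modules.items():
            if name != sender:
                handler(message)

    def send(self, recipient: str, message: dict) -> None:
        # Point-to-point: exactly one recipient.
        self._modules[recipient](message)


inbox: Dict[str, List[dict]] = defaultdict(list)
bus = MessageBus()
for module in ("billing", "shipping", "audit"):
    # Each module just records what it receives; m=module binds the loop variable.
    bus.register(module, lambda msg, m=module: inbox[m].append(msg))

bus.subscribe("billing", "order_placed")
bus.publish({"type": "order_placed", "id": 1})   # reaches billing only
bus.broadcast("audit", {"type": "shutdown"})     # reaches billing and shipping
bus.send("shipping", {"type": "ship", "id": 1})  # reaches shipping only
```

Every routing decision here lives in application code, which is the kind of logic the rest of this post weighs against replication.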

In researching distributed systems, I keep wanting to compare two techniques for moving messages around a distributed system: a message bus and database replication. At its most basic, replication handles the Broadcast scenario easily, right out of the box; that is the very definition of replication. The other two types, Publish-Subscribe and Point-to-Point, were simply not possible up until now without a lot of application-level logic, which just doesn't make much sense to write.

CouchDB, with its introduction of selective replication, seems to accomplish this, but I don't want to focus this post on Couch. I just want to pose Selective Replication as an alternative to a message bus and see what the pros and cons of each are. This is what prompted me to write the CouchDB admin tool: once you have selective replication, you need to administer the replication documents and filters that tell the databases where each document needs to go. I think the need for that admin tool is a drawback of Selective Replication, but on the flip side, an event bus system carries a lot of code or configuration of its own, particularly in writing and maintaining the many modules needed to get the publishers and subscribers in place.
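For a concrete picture of the documents such an admin tool would manage, here is a hedged Python sketch that builds them as plain dicts. The field names `filters`, `source`, `target`, `filter`, `query_params` and `continuous` follow CouchDB's design-document and replication-document format; the design document `_design/routing`, the filter `by_type`, the `type` field and the database URLs are assumptions for illustration. The filter has to live in a design document in the source database.

```python
import json

# Hypothetical design document holding a CouchDB filter function.
# CouchDB filter functions are JavaScript stored as strings under "filters";
# the function returns true for each document that should be replicated.
ROUTING_DDOC = {
    "_id": "_design/routing",
    "filters": {
        "by_type": "function(doc, req) { return doc.type === req.query.type; }"
    },
}


def subscription(source: str, target: str, message_type: str) -> dict:
    """Build a _replicator document that behaves like a topic subscription.

    source/target are database URLs; message_type reaches the filter
    through query_params (available as req.query inside the filter).
    """
    return {
        "_id": f"{target.rsplit('/', 1)[-1]}-{message_type}",
        "source": source,
        "target": target,
        "filter": "routing/by_type",  # <design doc name>/<filter name>
        "query_params": {"type": message_type},
        "continuous": True,           # keep replicating as new documents arrive
    }


doc = subscription("http://localhost:5984/hq", "http://localhost:5984/store_42", "order")
print(json.dumps(doc, indent=2))
```

Each subscriber gets one such document, which is exactly the administrative burden described above.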

I think a large pro in favor of Selective or Full Replication is how easy it makes backups or replicas of your data. Instead of passing every message through the bus, as in a Message Bus system, with replication you just bring your database online and it replicates the data for you, which I believe gives replication a significant performance advantage.
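As an illustration of how little is involved in making a replica, this Python sketch builds a one-shot full replication request for CouchDB's `POST /_replicate` endpoint. `source`, `target` and `create_target` are real request fields; the node URL and the database names `hq` and `hq_backup` are assumptions. The request is only constructed, not sent, since the endpoint usually needs credentials.

```python
import json
import urllib.request

COUCH = "http://localhost:5984"  # assumption: a local CouchDB node


def backup_request(source_db: str, backup_db: str) -> urllib.request.Request:
    """Build (but do not send) a one-shot replication that copies every
    document in source_db into backup_db, creating the target if needed."""
    body = {
        "source": f"{COUCH}/{source_db}",
        "target": f"{COUCH}/{backup_db}",
        "create_target": True,  # let CouchDB create the backup database
    }
    return urllib.request.Request(
        f"{COUCH}/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = backup_request("hq", "hq_backup")
# urllib.request.urlopen(req) would start the replication against a live node.
```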

Publish-Subscribe messaging is where Filtered or Selective Replication comes into play: you can set up a set of filters that tell your database exactly which documents to send and where to send them. The Point-to-Point scenario is accomplished the same way; you simply have a single replication target, and whether you filter the documents is up to you.
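The routing described above can be modeled without CouchDB at all. This plain-Python sketch treats a database as a dict of documents and `replicate` as a one-way copy with an optional filter predicate; the names and the `type`/`to` fields are illustrative assumptions, and it ignores the revisions, conflicts and deletions that real replication handles.

```python
from typing import Callable, Dict, Optional

Doc = dict
Database = Dict[str, Doc]  # _id -> document; a stand-in for a real database


def replicate(source: Database, target: Database,
              doc_filter: Optional[Callable[[Doc], bool]] = None) -> None:
    """One-way, one-shot replication: copy every source document that passes
    doc_filter (or all of them when no filter is given) into target."""
    for doc_id, doc in source.items():
        if doc_filter is None or doc_filter(doc):
            target[doc_id] = dict(doc)


hq: Database = {
    "1": {"type": "order", "to": "store_1"},
    "2": {"type": "price_update", "to": None},
    "3": {"type": "order", "to": "store_2"},
}

# Broadcast: unfiltered replication, so the target receives everything.
mirror: Database = {}
replicate(hq, mirror)

# Publish-Subscribe: each subscriber's filter selects the message types it wants.
pricing: Database = {}
replicate(hq, pricing, lambda d: d["type"] == "price_update")

# Point-to-point: a single replication target, filtered on the intended recipient.
store_2: Database = {}
replicate(hq, store_2, lambda d: d["to"] == "store_2")
```

The three patterns differ only in the filter attached to each replication, not in any subscriber code.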

I hope this post can be of use to somebody; I just wanted to get it out there for anybody else looking into options for building their next distributed system. I think both technologies have their advantages and disadvantages, and each should be used where it fits.

Filtered Replication Scenario for Distributed Systems

January 10th, 2012

As you may know, I have been studying CouchDB, specifically its ability to replicate your data, and trying to get a better understanding of how it all works. I have modeled the basic data flow for selective, or filtered, replication and attached the diagram below. I will explain a little of what is going on here to hopefully make it clearer how things are going to work.

We have an admin tool, most likely residing on a web server somewhere, that can perform the standard CRUD operations on users as well as on the CouchDB documents used for filtering the data. Those changes all go into a database housed on the main server on a per-organization basis; say the organization is your local store and the Master DB is your HQ. That is the admin piece; now on to the server side.
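The admin tool's CRUD operations map onto plain CouchDB HTTP requests. This Python sketch builds (but never sends) them with `urllib.request`. The `_users` database, the `org.couchdb.user:<name>` id convention and the `?rev=` requirement on deletes are CouchDB's; the helper names, the per-organization database `org_store_42`, the credentials and the revision string are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

COUCH = "http://localhost:5984"  # assumption: the main server's CouchDB endpoint


def _request(method: str, path: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build (not send) a JSON request against the CouchDB HTTP API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(COUCH + path, data=data, method=method,
                                  headers={"Content-Type": "application/json"})


def create_user(name: str, password: str, roles: list) -> urllib.request.Request:
    # CouchDB keeps users in the _users database under "org.couchdb.user:<name>".
    doc_id = urllib.parse.quote(f"org.couchdb.user:{name}", safe="")
    return _request("PUT", f"/_users/{doc_id}",
                    {"name": name, "password": password, "roles": roles, "type": "user"})


def save_filter_doc(org_db: str, ddoc: dict) -> urllib.request.Request:
    # Create or update a design document holding filters; updates must include "_rev".
    return _request("PUT", f"/{org_db}/{ddoc['_id']}", ddoc)


def read_doc(org_db: str, doc_id: str) -> urllib.request.Request:
    return _request("GET", f"/{org_db}/{doc_id}")


def delete_doc(org_db: str, doc_id: str, rev: str) -> urllib.request.Request:
    # CouchDB rejects deletes that do not name the document's current revision.
    return _request("DELETE", f"/{org_db}/{doc_id}?rev={urllib.parse.quote(rev)}")


routing = {"_id": "_design/routing",
           "filters": {"by_type": "function(doc, req) { return doc.type === req.query.type; }"}}
for r in (create_user("alice", "s3cret", ["store_42"]),
          save_filter_doc("org_store_42", routing),
          read_doc("org_store_42", "_design/routing"),
          delete_doc("org_store_42", "_design/routing", "1-abc")):  # "1-abc" is a placeholder rev
    print(r.get_method(), r.full_url)
```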

The server, or Master DB, will just be the central repository of data. In my current idea, the filters will all sit in the Filtered DBs, and we will let those Filtered DBs make all the decisions about which data gets sent on to the server.

For simplicity at this point, the client will be able to create all documents locally, on the client side. (I am defining the client as an actual application that resides on a desktop, mobile device, or whatever.) The Filtered DB will collect all the data from the client and hold it for filtered replication to the server later; this seems like a nice way to get a backup of your local data. The idea at this point is to have the client side create the Filtered DB on the server side, which is simple enough with CouchDB, since creating a database takes a single PUT request to its URL.
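The setup described above could be expressed as a short sequence of CouchDB HTTP calls. This Python sketch only builds the calls as `(method, url, body)` tuples and sends nothing. The server and client URLs, database names, the `routing/to_master` filter and the `store` field are hypothetical. Creating a database with `PUT /{db}` and defining replications as documents in `_replicator` are standard CouchDB, but configuring step 3 as a push from the client, and letting the client write to the server's `_replicator`, are assumptions; in practice the admin tool might own the server-side steps.

```python
import json
from typing import List, Optional, Tuple

SERVER = "https://hq.example.com:6984"  # hypothetical main server
LOCAL = "http://localhost:5984"         # hypothetical CouchDB on the client device

Step = Tuple[str, str, Optional[dict]]  # (HTTP method, URL, JSON body)


def client_setup(client_db: str, filtered_db: str, master_db: str,
                 store_id: str) -> List[Step]:
    """Return the HTTP calls that wire up the flow:
    client DB -> Filtered DB (full copy) -> Master DB (filtered)."""
    filtered_url = f"{SERVER}/{filtered_db}"
    return [
        # 1. Create the per-client Filtered DB on the server.
        ("PUT", filtered_url, None),
        # 2. Store the filter in the Filtered DB, the source of step 4.
        ("PUT", f"{filtered_url}/_design/routing", {
            "filters": {"to_master":
                        "function(doc, req) { return doc.store === req.query.store; }"},
        }),
        # 3. Copy everything from the client into the Filtered DB (doubles as a backup).
        ("PUT", f"{LOCAL}/_replicator/{client_db}-to-{filtered_db}", {
            "source": f"{LOCAL}/{client_db}",
            "target": filtered_url,
            "continuous": True,
        }),
        # 4. Forward only the documents that pass the filter to the Master DB.
        ("PUT", f"{SERVER}/_replicator/{filtered_db}-to-{master_db}", {
            "source": filtered_url,
            "target": f"{SERVER}/{master_db}",
            "filter": "routing/to_master",
            "query_params": {"store": store_id},
            "continuous": True,
        }),
    ]


steps = client_setup("local_orders", "filtered_store_42", "master", "store_42")
for method, url, body in steps:
    print(method, url, json.dumps(body) if body else "")
```

Keeping the filter in the Filtered DB, as described above, means the Master DB never needs to know about individual clients.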

That pretty much sums it up; the diagram is below.