A key component of many distributed software systems is the message bus, or event bus. Any good messaging architecture should be able to transport your messages in a few basic ways, which I have outlined below:
- Publish-Subscribe: Modules may subscribe to certain message types. Whenever a module publishes a message to the bus, it will be delivered to all modules that subscribed to its message type.
- Broadcast: The message will be delivered to all (other) modules.
- Point-to-point: The message has one and only one recipient.
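To make the three delivery modes above concrete, here is a minimal in-memory sketch in Python. It is not taken from any particular bus product; the `MessageBus` class, its method names, and the module names (`billing`, `shipping`, `audit`) are all made up for illustration.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

# A handler receives the message type and its payload.
Handler = Callable[[str, Any], None]


class MessageBus:
    """Hypothetical in-process bus showing the three delivery modes."""

    def __init__(self) -> None:
        self._modules: Dict[str, Handler] = {}
        self._subscriptions: Dict[str, List[str]] = defaultdict(list)

    def register(self, name: str, handler: Handler) -> None:
        self._modules[name] = handler

    def subscribe(self, name: str, message_type: str) -> None:
        if name not in self._subscriptions[message_type]:
            self._subscriptions[message_type].append(name)

    def publish(self, message_type: str, payload: Any) -> None:
        # Publish-Subscribe: only modules subscribed to this type receive it.
        for name in tuple(self._subscriptions.get(message_type, [])):
            self._modules[name](message_type, payload)

    def broadcast(self, sender: str, message_type: str, payload: Any) -> None:
        # Broadcast: every module except the sender receives it.
        for name, handler in tuple(self._modules.items()):
            if name != sender:
                handler(message_type, payload)

    def send(self, recipient: str, message_type: str, payload: Any) -> None:
        # Point-to-point: exactly one recipient.
        self._modules[recipient](message_type, payload)


# Usage: each module just records what it receives.
received = defaultdict(list)
bus = MessageBus()
for name in ("billing", "shipping", "audit"):
    bus.register(name, lambda t, p, n=name: received[n].append((t, p)))

bus.subscribe("billing", "order.created")
bus.subscribe("audit", "order.created")

bus.publish("order.created", {"id": 1})     # billing and audit only
bus.broadcast("billing", "shutdown", None)  # everyone except billing
bus.send("shipping", "order.packed", {"id": 1})  # shipping only
```

A real bus would add queuing, persistence, and network transport, but the routing decisions are the same three shown here.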
In doing my research into distributed systems, I keep wanting to compare two techniques for moving messages around a distributed system: a message bus and database replication. In its most basic form, replication takes care of the Broadcast scenario right out of the box, since copying everything everywhere is the very definition of replication. The other two types, Publish-Subscribe and Point-to-Point, were simply not possible with replication until now without a lot of application-level logic, and at that point it doesn't make much sense.
CouchDB, with its introduction of selective replication, seems to accomplish this, but I don't want to focus on Couch. I just want to pose Selective Replication as an alternative to using a message bus and see what the pros and cons are for each. This is what prompted me to write the CouchDB admin tool: once you have selective replication, you need to administer the replication documents and filters that tell the databases where the documents need to go. I think the need for an admin tool is a drawback of Selective Replication. On the flip side, an event bus system carries a lot of code or configuration of its own, particularly in writing and maintaining the many modules needed to get the publishers and subscribers in place.
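For a sense of what there is to administer, here is a rough sketch of the two pieces involved in a CouchDB filtered replication, written as Python dictionaries and dumped to JSON. The field names (`filters`, `source`, `target`, `filter`, `query_params`, `continuous`) are the ones CouchDB uses; the design document name `routing`, the filter name `by_type`, the database URLs, and the document type are assumptions made up for this example.

```python
import json

# Design document holding a filter function. CouchDB filter functions are
# JavaScript stored as strings; "routing" and "by_type" are hypothetical names.
design_doc = {
    "_id": "_design/routing",
    "filters": {
        "by_type": "function (doc, req) { return doc.type === req.query.type; }"
    },
}

# Replication document, as it would be stored in the _replicator database.
# The URLs and the document type are placeholders, not real endpoints.
replication_doc = {
    "_id": "orders-to-billing",
    "source": "http://localhost:5984/events",
    "target": "http://localhost:5984/billing",
    "filter": "routing/by_type",              # "<design doc>/<filter name>"
    "query_params": {"type": "order.created"},
    "continuous": True,
}

print(json.dumps(design_doc, indent=2))
print(json.dumps(replication_doc, indent=2))
```

Every route between databases needs a document like the second one, which is why some kind of admin tooling becomes necessary as the number of routes grows.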
I think a big pro in favor of Selective or Full Replication is how easy it is to make backups or replicas of your data. Instead of passing every message through the bus, as you would in a Message Bus system, you just bring your db online and it does the replication for you, which I think gives replication a significant performance advantage here.
As for Publish-Subscribe messaging, this is where Filtered or Selective Replication comes into play: you set up filters that tell your database exactly which documents to send and where to send them. The Point-to-Point scenario is accomplished the same way. You just have a single replication target, and whether you filter the documents is up to you.
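The mapping from replication to the three messaging modes can be sketched with plain dictionaries standing in for databases. This is a deliberately simplified, one-shot copy, not how any real database replicates (there are no revisions, no incremental checkpoints, and no conflict handling); `replicate`, `by_type`, and the database names are all hypothetical.

```python
from typing import Callable, Dict, Optional

Database = Dict[str, dict]          # document id -> document
DocFilter = Callable[[dict], bool]  # returns True if the doc should be copied


def replicate(source: Database, target: Database,
              doc_filter: Optional[DocFilter] = None) -> int:
    """Copy documents from source to target, optionally filtered.

    Returns the number of documents copied.
    """
    copied = 0
    for doc_id, doc in source.items():
        if doc_filter is None or doc_filter(doc):
            target[doc_id] = dict(doc)  # copy so targets do not share state
            copied += 1
    return copied


def by_type(doc_type: str) -> DocFilter:
    """Build a filter that selects documents of a single type."""
    return lambda doc: doc.get("type") == doc_type


events: Database = {
    "e1": {"type": "order.created", "id": 1},
    "e2": {"type": "order.shipped", "id": 1},
    "e3": {"type": "order.created", "id": 2},
}
backup: Database = {}
billing: Database = {}
audit: Database = {}
shipping: Database = {}

# Broadcast: full replication, every document goes to the replica.
replicate(events, backup)

# Publish-Subscribe: several targets "subscribe" to the same document type.
replicate(events, billing, by_type("order.created"))
replicate(events, audit, by_type("order.created"))

# Point-to-point: a single replication target for this document type.
replicate(events, shipping, by_type("order.shipped"))
```

In this model a "subscription" is nothing more than a filter attached to a replication target, which is the whole idea behind using selective replication in place of a bus.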
I hope this document is of use to somebody. I just wanted to get this out there for anybody else looking into options for building their next distributed system. I think both technologies have their advantages and disadvantages, and each should be used where it fits.