Many thriving web 2.0/3.0 companies have built their success around workflow engines and messaging frameworks, e.g. Airbnb’s Airflow, LinkedIn’s Kafka and Netflix’s Meson. The cornerstone of any enterprise-grade workflow engine is a messaging framework that allows for distributed execution and guarantees message delivery.
A workflow engine encompasses various software components serving the purpose of automatically executing tasks based on user-definable business processes. Generally speaking, workflows consist of smaller tasks called activities, which form a set of atomic and re-usable building blocks performing specific actions. Activities cannot be arbitrarily chained together, as some of them depend on specific data or a predefined state in order to complete their task.
So far, so good, no need for messaging frameworks yet… The true complexity is hidden behind the requirement that the workflow engine must not only be fail-safe but also support distributed execution, so that workflow orchestration is separated from any heavy-duty computation that might max out the host’s resources. You might argue that this could also be done through web services, but just think about the hassle of ensuring that activities are reliably executed after a system failure… Here you go, just to name a few:
By now, I am sure you are also inclined to offload these tasks to a tried and tested messaging framework that gives you all these features for free. This blog post will focus on the usage of RabbitMQ as messaging framework and address two of the aforementioned issues when implementing a distributed workflow engine:
Let’s have a look at how activities within a workflow can trigger each other using a topic exchange and adequate routing keys. But first let’s briefly recap how topic exchanges work:
Example:
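To make the matching rules tangible, here is a minimal sketch in plain Python that mimics how a topic exchange compares a routing key against a binding key: words are separated by dots, `*` stands for exactly one word and `#` for zero or more words. The function name `topic_matches` is my own invention and not part of RabbitMQ or any client library; the broker does this matching for you, so treat it purely as an illustration.

```python
def topic_matches(binding_key: str, routing_key: str) -> bool:
    """Return True if routing_key satisfies binding_key under topic-exchange rules."""
    pattern = binding_key.split(".")
    words = routing_key.split(".")

    def match(p: int, w: int) -> bool:
        # Pattern exhausted: it is a match only if all words were consumed too.
        if p == len(pattern):
            return w == len(words)
        if pattern[p] == "#":
            # '#' may absorb zero or more words, so try every possible split.
            return any(match(p + 1, k) for k in range(w, len(words) + 1))
        # Any other pattern token needs a word to consume.
        if w == len(words):
            return False
        if pattern[p] == "*" or pattern[p] == words[w]:
            return match(p + 1, w + 1)
        return False

    return match(0, 0)


# Hypothetical keys, chosen only for illustration.
print(topic_matches("A1.*", "A1.done"))
print(topic_matches("A1.*", "A1.done.late"))
print(topic_matches("A1.#", "A1"))
```

Note how `A1.*` rejects a key with a third word while `A1.#` even accepts the bare `A1`; that difference is easy to overlook when designing bindings.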
Now, let’s have a look at how the same concept can be used to make Activity A1 selectively trigger A2 or A3.
Example:
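The selective triggering can be sketched without a broker at all. The snippet below uses a tiny in-memory stand-in for an exchange (the class `TinyExchange`, the threshold rule inside `activity_a1` and the message fields are all hypothetical and only serve the illustration). Each downstream activity owns a queue bound with its own activity ID, and A1 picks the next activity simply by choosing the routing key. With a real RabbitMQ client the queues and bindings would live on the broker instead.

```python
from collections import defaultdict, deque


class TinyExchange:
    """In-memory stand-in for an exchange; binding keys are exact activity IDs."""

    def __init__(self):
        # binding key -> list of queues bound with that key
        self.bindings = defaultdict(list)

    def bind(self, queue, binding_key):
        self.bindings[binding_key].append(queue)

    def publish(self, routing_key, body):
        # Deliver a copy of the message to every queue bound with this key;
        # messages without a matching binding are silently dropped.
        for queue in self.bindings.get(routing_key, []):
            queue.append(body)


def activity_a1(exchange, order_total):
    """Hypothetical activity A1: decides which activity runs next."""
    next_activity = "A2" if order_total >= 1000 else "A3"
    exchange.publish(next_activity, {"from": "A1", "order_total": order_total})
    return next_activity


exchange = TinyExchange()
queue_a2, queue_a3 = deque(), deque()
exchange.bind(queue_a2, "A2")  # A2 consumes from queue_a2
exchange.bind(queue_a3, "A3")  # A3 consumes from queue_a3

activity_a1(exchange, 1500)  # routed to A2
activity_a1(exchange, 20)    # routed to A3
```

A1 never needs to know where A2 or A3 run or how many consumers are attached; it only needs to know the identifier of the activity it wants to trigger.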
As you surely figured out already, the trick is to use a unique activity identifier as routing key. The basic example above can be refined by adding more structure to the routing key, e.g. {Workflow ID}.{Activity ID}.{Message Type}. The additional workflow key separates all messages based on which workflow they belong to. Note that instead of extending the routing key, you could also use distinct exchanges (X) to properly route the messages. The question then is which discriminating criterion to use for the exchanges: Message Type? Workflow ID? Both are valid choices, though there is one anti-pattern that should be avoided when designing exchanges and routing keys: don’t deliver unwanted messages to a queue only to have the consumer filter them out again.
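One way to keep such structured routing keys consistent is to build and parse them in a single place. The following is a minimal sketch under my own assumptions: the names `RoutingKey` and `binding_for` as well as the example IDs are hypothetical, and the three-segment layout simply mirrors the {Workflow ID}.{Activity ID}.{Message Type} scheme described above.

```python
from typing import NamedTuple


class RoutingKey(NamedTuple):
    """Routing key of the form {Workflow ID}.{Activity ID}.{Message Type}."""

    workflow_id: str
    activity_id: str
    message_type: str

    def encode(self) -> str:
        for part in self:
            # Dots and wildcards inside a segment would break topic matching.
            if not part or any(c in part for c in ".*#"):
                raise ValueError(f"invalid routing key segment: {part!r}")
        return ".".join(self)

    @classmethod
    def decode(cls, raw: str) -> "RoutingKey":
        parts = raw.split(".")
        if len(parts) != 3:
            raise ValueError(f"expected 3 segments, got {len(parts)}: {raw!r}")
        return cls(*parts)


def binding_for(workflow_id: str = "*", activity_id: str = "*",
                message_type: str = "*") -> str:
    """Build a binding pattern; segments left out match any single word."""
    return ".".join((workflow_id, activity_id, message_type))


key = RoutingKey("wf42", "A2", "start").encode()
print(key)                                                 # wf42.A2.start
print(binding_for(workflow_id="wf42"))                     # wf42.*.*
print(binding_for(activity_id="A2", message_type="start")) # *.A2.start
```

Binding a queue with a pattern such as `*.A2.start` lets the broker do the filtering, so the consumer behind that queue only ever sees messages it actually wants, which is exactly what the anti-pattern above warns about getting wrong.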
As you can see in the example above, basic remoting capabilities come out of the box since RabbitMQ-clients can connect to a remote message broker instance and interact with it as if it were running on the same machine.
Although very easy to implement, the approach above has a serious limitation: Machine 1, which hosts the message broker, could fail and leave the workflow engine hosted on Machine 2 completely idle until the message queues are available again. To remedy this limitation, why not run replicated message broker instances on Machine 1 and Machine 2? Sure, but how can we keep the queue states in sync? RabbitMQ supports different types of distribution (source):
As unremarkable as they may seem, message queues are true workhorses when it comes to enabling reliable distributed computing. RabbitMQ is just one good example among many equally good alternatives that you might come across in the wild.