Message ordering in the face of failure

Message ordering in the face of failure

Another day learning and coding and another interesting problem!

While designing a distributed message queue, I decided to use vector clocks to preserve message ordering when producers are talking to a different quorum of nodes. You can see some of my conclusions in a previous post titled Vector clocks and quorum consensus.

Vector clocks do seem to solve my ordering problems pretty well and don’t require me to have perfectly synchronized wall clock times between nodes to accomplish it. However, a problem arises with my original implementation in the case of a node failure.

Originally, I intended to associate counter values in vector clocks with a particular queue. This would mean that the vector clock’s value would be representative of the state of just the given queue, and when that queue was collected due to a lack of messages, all counter values would be reset to zero. After finally getting to the point where I was coding these portions of the design, I realized that this creates a problem in even the most basic and obvious scenarios:

  • Producer 1 (p1) queues message “Hi” to nodes A and B. Resulting clock is (A:1, B:1)
  • Consumer 1 (c1) consumes message “Hi” and sets its clock to (A:1, B:1)
  • [… TTL passes…]  Queue is garbage collected since all messages have expired. Clocks are reset.
  • p1 queues message “How are you?” to nodes A and B. Resulting clock is again (A:1, B:1)
  • c1 asks for all messages since “Hi” (A:1, B1) and gets nothing.
  • Oops.

Since the client is using the vector clock to ask for new messages, we can’t simply arbitrarily reset the clock. So instead we move the values of the clock to the server instance using a 64 bit unsigned integer to ensure we don’t get overflow. This solves the problem.. Well kinda, in a perfect world. Lets examine what is still wrong.

  • Producer 1 (p1) queues message “Hi” to nodes A and B. Resulting clock is (A:1, B:1)
  • Consumer 1 (c1) consumes message “Hi” and sets its clock to (A:1, B:1)
  • [… TTL passes…]  Queue is garbage collected since all messages have expired. But the clocks are not reset since they are based on the count for the node.
  • p1 queues message “How are you?” to nodes A and B. Resulting clock is (A:2, B:2)
  • c1 asks for all messages since “Hi” (A:1, B1) and gets “How are you?” with clock (A:2, B:2)
  • Yay!

So far, so good. But now what happens when A dies and then comes back with its clock reset? Well.. Nothing good.

  • Node A dies
  • Node A comes back to life with clocks reset
  • p1 queues message “I’m doing well!”. Resulting clock is (A:1, B:3)
  • c1 asks for all messages since “How are you?” with clock (A:2, B:2)
  • Uhoh. Clock (A:1, B:3) “I’m doing well!” does not follow (A:2, B:2) “How are you?”. We get no messages 😦

What do we do? We can keep our vector clock up to date every time we receive a message and for example write the value to disk, but that involves a lot of writes on the disk as well as being prone to not being up to date due to a crash before the updated clock can be written.

What I’ve decided to do is keep tract of the number of times a node is restarted and send that number with each node clock. That way, if a node restarts and the event clock is reset, we can use the restart count to know that any message with a lower restart count comes before a message with a greater restart count, even if the event clock is greater. Lets examine the above scenario with the new rules:

  • We set up clocks as (restart count/event count since last restart)
  • Producer 1 (p1) queues message “Hi” to nodes A and B. Resulting clock is (A:1/1, B:1/1)
  • Consumer 1 (c1) consumes message “Hi” and sets its clock to (A:1/1, B:1/1)
  • [… TTL passes…]  Queue is garbage collected since all messages have expired. But the clocks are not reset since they are based on the count for the node.
  • p1 queues message “How are you?” to nodes A and B. Resulting clock is (A:1/2, B:1/2)
  • c1 asks for all messages since “Hi” (A:1/1, B1/1) and gets “How are you?” with clock (A:1/2, B:1/2)
  • Yay!
  • Node A dies
  • Node A comes back to life with its event clock reset
  • p1 queues message “I’m doing well!”. Resulting clock is (A:2/1, B:1/3)
  • c1 asks for all messages since “How are you?” with clock (A:1/2, B:1/2)
  • Success! Clock (A:2/1, B:1/3) follow clock (A:1/2, B:1/2) since A’s restart count is 2 vs 1 and B’s event count is 2 vs 1.

In this way we can still determine message ordering after a node restart and all is right in the world.. until the next challenge.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s