6.2.4 Fine-grained locking
When we introduced locks and locking, we only worried about providing the same
type of locking granularity as the available WATCH command—on the level of the market
key that we were updating. But because we’re constructing locks manually, and
we’re less concerned about the market in its entirety than we are with whether an item
is still in the market, we can actually lock on a finer level of detail. If we replace the
market-level lock with one specific to the item to be bought or sold, we can reduce
lock contention and increase performance.
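The change amounts to locking a name derived from the specific listing instead of the one shared market name. The sketch below illustrates the idea with in-memory locks standing in for the chapter's SETNX-based Redis lock helpers; the helper names and the exact lock-name scheme here are assumptions for illustration, not the book's listing.

```python
import threading
from collections import defaultdict

# In-memory stand-in for the SETNX-based Redis lock helpers; real code
# would call acquire_lock(conn, lockname) against Redis instead.
_locks = defaultdict(threading.Lock)

def acquire_lock(lockname):
    _locks[lockname].acquire()   # blocking sketch; the Redis version polls SETNX
    return True

def release_lock(lockname):
    _locks[lockname].release()

def purchase_lockname(itemid, seller, fine_grained=True):
    # Coarse lock: every purchase contends on the single "market:" lock.
    # Fine-grained lock: only purchases of the same listing contend.
    if fine_grained:
        return 'market:%s.%s' % (itemid, seller)
    return 'market:'

def purchase_item(conn, buyerid, itemid, seller, fine_grained=True):
    lockname = purchase_lockname(itemid, seller, fine_grained)
    if not acquire_lock(lockname):
        return False
    try:
        # ... the lock-protected buy logic from the earlier listing goes here ...
        return True
    finally:
        release_lock(lockname)
```

Because two purchases of different listings now acquire different lock names, they never wait on each other; only buyers racing for the same item contend.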
Let’s look at the results in table 6.3, which shows the same simulation that produced
table 6.2, only with locks over just the individual items being listed or sold, and not
over the entire market.
|                                             | Listed items | Bought items | Purchase retries | Average wait per purchase |
|---------------------------------------------|--------------|--------------|------------------|---------------------------|
| 1 lister, 1 buyer, no lock                  | 145,000      | 27,000       | 80,000           | 14ms                      |
| 1 lister, 1 buyer, with lock                | 51,000       | 50,000       | 0                | 1ms                       |
| 1 lister, 1 buyer, with fine-grained lock   | 113,000      | 110,000      | 0                | <1ms                      |
| 5 listers, 1 buyer, no lock                 | 331,000      | <200         | 50,000           | 150ms                     |
| 5 listers, 1 buyer, with lock               | 68,000       | 13,000       | <10              | 5ms                       |
| 5 listers, 1 buyer, with fine-grained lock  | 192,000      | 36,000       | 0                | <2ms                      |
| 5 listers, 5 buyers, no lock                | 206,000      | <600         | 161,000          | 498ms                     |
| 5 listers, 5 buyers, with lock              | 21,000       | 20,500      | 0                | 14ms                      |
| 5 listers, 5 buyers, with fine-grained lock | 116,000      | 111,000      | 0                | <3ms                      |
With fine-grained locking, we’re performing 220,000–230,000 listing and buying operations
regardless of the number of listing and buying processes. We have no retries,
and even under a full load, we’re seeing less than 3 milliseconds of latency. Our
listed-to-sold ratio is again almost exactly the same as our ratio of listing-to-buying
processes.
Even better, we never get into a situation like we did without locks where there’s so
much contention that latencies shoot through the roof and items are rarely sold.
Let’s take a moment to look at our data as a few graphs so that we can see the relative
scales. In figure 6.3, we can see that both locking methods result in much higher
numbers of items being purchased over all relative loads than the WATCH-based method.
Looking at figure 6.4, we can see that the WATCH-based method has to perform many
thousands of expensive retries in order to complete what few sales are completed.
And in figure 6.5, we can see that because of the WATCH contention, which caused the
huge number of retries and the low number of purchase completions, latency without
using a lock went up significantly.
What these simulations and these charts show overall is that when under heavy
load, using a lock can reduce retries, reduce latency, improve performance, and be
tuned at the granularity that we need.
Our simulation is limited. One major case that it doesn’t simulate is where many
more buyers are unable to buy items because they’re waiting for others. It also doesn’t
simulate an effect known as dogpiling, when, as transactions take longer to complete,
more transactions are overlapping and trying to complete. That will increase the time
it takes to complete an individual transaction, and subsequently increase the chances
for a time-limited transaction to fail. This will substantially increase the failure and
retry rates for all transactions, but it’s especially harmful in the WATCH-based version of
our market buying and selling.
The choice to use a lock over an entire structure, or over just a small portion of a
structure, can be easy. In our case, the critical data that we were watching was a small
piece of the whole (one item in a marketplace), so locking that small piece made
sense. There are situations where it’s not just one small piece, or when it may make
sense to lock multiple parts of structures. That’s where the decision to choose locks
over small pieces of data or an entire structure gets difficult; the use of multiple small
locks can lead to deadlocks, which can prevent any work from being performed at all.
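A standard way to avoid such deadlocks when several small locks must be held together (a general technique, not something this chapter's simulation exercises) is to always acquire the locks in a single global order, such as sorted by lock name. A minimal in-memory sketch, with hypothetical helper names; with Redis, the same ordering idea applies to SETNX-acquired lock names:

```python
import threading
from collections import defaultdict

# Hypothetical registry of named in-memory locks.
locks = defaultdict(threading.Lock)

def acquire_all(names):
    """Acquire several named locks in sorted-name order. Because every
    caller uses the same global order, two callers that need overlapping
    sets of locks may wait on each other, but can never deadlock."""
    ordered = sorted(set(names))
    for name in ordered:
        locks[name].acquire()
    return ordered

def release_all(ordered):
    # Release in reverse acquisition order.
    for name in reversed(ordered):
        locks[name].release()
```

With this convention, a transaction needing locks on two listings and a transaction needing one of those listings plus a third can never each hold a lock the other is waiting for.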