Saturday, March 18, 2017

Unfair Locks - A Tale of ReadLocks Blocking WriteLock Causing Memory Bloat

Fair locks are a good thing. Especially fair reentrant read-write locks. Here, I recount how a minor oversight of a constructor parameter while instantiating a lock manifested itself as memory bloat.

Imagine an in-memory cache that is constantly updated. Specifically, we add new key-value pairs or update the values of existing keys (say, the number of units sold for an item). This can be done safely, as shown below, by using a data structure such as TrieMap (in Scala), which supports concurrent operations.
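A minimal sketch of such a cache (the names UnitsSoldCache, itemId and unitsSold are illustrative placeholders, not the original code):

    import scala.collection.concurrent.TrieMap

    object UnitsSoldCache {
      // TrieMap supports thread-safe point operations, so concurrent
      // puts from many update threads are safe without extra locking.
      private val cache = TrieMap.empty[String, Long]

      // Add a new key-value pair or overwrite the value of an existing key.
      def update(itemId: String, unitsSold: Long): Unit =
        cache(itemId) = unitsSold
    }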


Now, imagine that the contents of this cache need to be flushed periodically to something more durable, such as disk or Amazon S3. One way of doing this would be to have a flusher thread acquire a lock over the cache, take a snapshot, clear the cache, release the lock, and then continue with dumping the snapshot to disk/S3. A lock needs to be taken over the cache because we don't want updates that land between the instant the snapshot is taken and the instant the cache is cleared to be lost.
As a corollary, the update operation on the cache also needs to acquire the same lock. However, we already have a thread-safe way of updating a TrieMap concurrently, so all the updates can proceed in parallel by acquiring a 'read' lock - a read lock can be held simultaneously by multiple threads.
The flusher thread acquires a 'write' lock periodically, as it needs to block all updates. Since update threads acquire a read lock before updating, they are blocked whenever the periodic flusher thread holds the write lock. As you must have already noticed, we are acquiring a 'read' lock for doing an update (because many threads can hold a read lock at once) and a 'write' lock for the snapshot-and-clear operation (because a write lock bars all read locks). That doesn't happen very often in life. In code, all this looks something like below.
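Here is a sketch of how the pieces might fit together (the names FlushableCache and dump are hypothetical; note that the lock is created with the default constructor, i.e. unfair):

    import java.util.concurrent.locks.ReentrantReadWriteLock
    import scala.collection.concurrent.TrieMap

    class FlushableCache(dump: Map[String, Long] => Unit) {
      // Default constructor => an UNFAIR ReentrantReadWriteLock (this matters later).
      private val lock  = new ReentrantReadWriteLock()
      private val cache = TrieMap.empty[String, Long]

      // Updates take the READ lock: many updaters may hold it at once,
      // and TrieMap keeps the individual writes safe among themselves.
      def update(itemId: String, unitsSold: Long): Unit = {
        lock.readLock().lock()
        try cache(itemId) = unitsSold
        finally lock.readLock().unlock()
      }

      // The periodic flusher takes the WRITE lock so that no update can sneak
      // in between taking the snapshot and clearing the cache.
      def flush(): Unit = {
        lock.writeLock().lock()
        val snapshot =
          try {
            val s = cache.toMap // immutable copy of the current contents
            cache.clear()
            s
          } finally lock.writeLock().unlock()
        dump(snapshot) // write to disk / S3 outside the lock
      }
    }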

Now coming to the raison d'être of this post, imagine a situation where there is a multitude of updates, and against this horde of read-lock requests comes a tiny, puny, single flusher thread requesting a write lock, say once every 5 minutes. If the ReadWriteLock had been fair (FIFO), the flusher thread would have got its write lock in due time (as per its request time). However, we didn't specify the lock to be fair while instantiating it (new ReentrantReadWriteLock(true)) - by default, locks are unfair. Therefore a single request by the flusher thread for a write lock may be up against thousands of requests for read locks without any hope of justice. Depending on the volume of updates to the cache, the flusher thread may be blocked indefinitely (behind read locks), causing the in-memory cache to keep bloating in the meantime. The JavaDocs have this gem on non-fair locks that supports this theory.

A nonfair lock that is continuously contended may indefinitely postpone one or more reader or writer threads, but will normally have higher throughput than a fair lock.

So, in a nutshell - we acquire read locks for updates to the cache so that they can run concurrently. The flusher acquires a write lock to block any updates while it takes a snapshot and clears the cache. However, because the lock itself is unfair, the flusher's request for a write lock may be indefinitely postponed, causing the in-memory cache to bloat in the meanwhile. That's an interesting turn of events.
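The fix, once spotted, is a one-argument change at construction time (shown here against the hypothetical sketch above):

    // Pass true for fairness: waiting threads, including the flusher's write
    // request, are granted the lock roughly in arrival order, so a steady
    // stream of readers can no longer starve the writer.
    private val lock = new ReentrantReadWriteLock(true)

As the JavaDoc quote above suggests, the fairness flag trades some throughput for the guarantee that the flusher's write lock eventually gets its turn.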