fencing token pattern
When a process holding a lock or lease may be slow, paused, or stuck, have the lock service issue a monotonically increasing token with every acquisition. The resource server rejects any write carrying a stale token, even if the holder still believes it has the lock.
symptoms
- data corruption from a stuck-but-recovered process
- two clients both think they're the leader
- writes arriving from a node that was presumed dead
causes
- GC pauses, network partitions, or stop-the-world events
- trusting wall-clock timestamps to decide whether a lease is still valid
- no validation at the resource server
fixes
- monotonic token from the lock service
- resource server rejects any write whose token < highest-seen
- tokens issued by a strongly-consistent store
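
The fixes above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a production design: LockService, ResourceServer, and StaleTokenError are hypothetical names, the counter stands in for a strongly-consistent store, and lease expiry is assumed to happen off-screen.

```python
import itertools
import threading


class StaleTokenError(Exception):
    """Raised when a write carries a token older than the highest seen."""


class LockService:
    """Toy stand-in for a strongly-consistent lock service (hypothetical)."""

    def __init__(self):
        self._counter = itertools.count(1)
        self._mutex = threading.Lock()

    def acquire(self):
        # Every grant returns a token strictly larger than all earlier ones.
        with self._mutex:
            return next(self._counter)


class ResourceServer:
    """Toy resource server that validates the token on every write."""

    def __init__(self):
        self._highest_seen = 0
        self._data = {}
        self._mutex = threading.Lock()

    def write(self, token, key, value):
        with self._mutex:
            # Reject token < highest-seen; an equal token is the current holder.
            if token < self._highest_seen:
                raise StaleTokenError(
                    f"token {token} is older than {self._highest_seen}"
                )
            self._highest_seen = token
            self._data[key] = value

    def read(self, key):
        with self._mutex:
            return self._data.get(key)


locks = LockService()
store = ResourceServer()

token_a = locks.acquire()  # client A gets the lock, then stalls (e.g. GC pause)
token_b = locks.acquire()  # A's lease expires; client B gets a newer token

store.write(token_b, "config", "from B")

try:
    # A wakes up, still believing it holds the lock, and tries to write.
    store.write(token_a, "config", "from A")
    rejected = False
except StaleTokenError:
    rejected = True
```

The check lives in the resource server, not the client, because the paused holder cannot be trusted to notice that its lease has expired.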
you might say
- fencing token
- stale lock holder
- the GC pause stole our lease