One of the common issues in web services is the network timeout.
As a simple example, consider a user registration scenario. The user enters all his details and clicks the Register button, only to see the web page not responding. He clicks again a few times, and after one of those clicks he is redirected to the Welcome page.
Everything looks fine, but we may end up with multiple entries for the same user in the backend.
There are two approaches to avoid this problem:
1. Disable the button until the callback of the first click arrives.
2. Use transactional access to the datastore and rely on a unique constraint violation to catch the duplicate call.
The first approach is client-centric: it relies on the client making the call only once, at some cost to the user experience. Moreover, the backend service has no defense if there is a loophole in the client code.
The second approach is more robust and forms the basis of idempotency handling.
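To make the second approach concrete, here is a minimal sketch in Java. It assumes an in-memory map as a stand-in for a users table with a unique constraint on email. UserTable, RegistrationService and DuplicateKeyException are hypothetical names introduced for illustration, not part of any real library.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical exception standing in for a database's unique-constraint violation.
class DuplicateKeyException extends RuntimeException {
    DuplicateKeyException(String key) {
        super("duplicate key: " + key);
    }
}

// In-memory stand-in for a users table with a unique constraint on email.
class UserTable {
    private final Map<String, String> rowsByEmail = new ConcurrentHashMap<>();

    // Atomic insert: fails if a row with the same email already exists.
    void insert(String email, String name) {
        if (rowsByEmail.putIfAbsent(email, name) != null) {
            throw new DuplicateKeyException(email);
        }
    }

    int size() {
        return rowsByEmail.size();
    }
}

class RegistrationService {
    private final UserTable users;

    RegistrationService(UserTable users) {
        this.users = users;
    }

    // Returns true if this call created the user, false if it was a duplicate.
    boolean register(String email, String name) {
        try {
            users.insert(email, name);
            return true;
        } catch (DuplicateKeyException e) {
            return false; // repeated click or retry: the user already exists
        }
    }
}
```

With this shape, calling register twice with the same email leaves a single row in the table no matter how many times the client retries, so the backend no longer depends on the client behaving well.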
In the world of microservices this issue is even more prevalent, since multiple microservices may handle a single user action. As an example, consider buying a product from an e-commerce website.
A single-click checkout could involve invoking multiple services: a service to get the user's address information, a service to check serviceability/availability of the product, a service for fetching and applying offers, a service for selecting and invoking payment channels, and a service for storing the order.
The above call flow is a high-level description. In reality the call graph can get more complex: it may contain loops and is not always a DAG, and in such cases idempotency becomes all the more important.
Before rushing to make all your APIs idempotent, it's important to consider when idempotency is needed.
For services that provide serviceability and availability lookups, idempotency handling is of no use. But for services that record a placed order or a payment request, it is needed to prevent duplicate orders/payments from being processed.
A rule of thumb: if your API makes only lookup or GET-style calls, idempotency handling is not needed. It is needed only when you make a PUT/POST call that creates or updates a persistent entity.
Idempotency implementation: (Not as simple as it looks)
An important aspect that is often overlooked when implementing idempotency is that the idempotency check itself must be transactional.
Let's consider a simple implementation:
if (idemStore.contains(idemKey)) {
    return idemStore.get(idemKey); // may choose to get or throw IdemAborted exception here
}
// else: execute the business logic, then store the result under idemKey
This simple implementation suffers from a race condition: two concurrent request threads can evaluate the if statement at around the same time, both see false, and both execute the logic in the else block.
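The race can be reproduced deterministically. The sketch below is an illustration under stated assumptions: the store is a plain in-memory map, the key "order-42" is made up, and a CyclicBarrier is used only to force the unlucky interleaving in which both requests finish the check before either one writes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

class CheckThenActRace {
    // Runs two requests with the same key and returns how many times the logic executed.
    static int run() {
        Map<String, String> idemStore = new ConcurrentHashMap<>();
        AtomicInteger executions = new AtomicInteger();
        // Both threads must arrive here after their check before either may continue.
        CyclicBarrier bothChecked = new CyclicBarrier(2);

        Runnable request = () -> {
            boolean seen = idemStore.containsKey("order-42"); // the unprotected check
            try {
                bothChecked.await();
            } catch (Exception e) {
                throw new IllegalStateException(e);
            }
            if (!seen) {
                executions.incrementAndGet();         // the "else block" logic
                idemStore.put("order-42", "created");
            }
        };

        Thread t1 = new Thread(request);
        Thread t2 = new Thread(request);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return executions.get();
    }

    public static void main(String[] args) {
        System.out.println("executions = " + run()); // prints "executions = 2"
    }
}
```

In production there is no barrier, so the duplicate execution shows up only occasionally, which is exactly what makes this bug hard to spot.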
A simple fix to the above logic is to use a mutex.
synchronized (lock) {
    if (idemStore.contains(idemKey)) return idemStore.get(idemKey); // may choose to get or throw IdemAborted exception here
    // else: execute the business logic, then store the result under idemKey
}
This prevents the race condition but increases latency due to lock contention.
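A runnable version of the mutex approach might look like the following sketch. MutexIdempotentExecutor and MutexDemo are hypothetical names, the store is an in-memory HashMap, and the business logic is passed in as a Supplier purely for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

class MutexIdempotentExecutor {
    private final Object lock = new Object();
    private final Map<String, String> idemStore = new HashMap<>();
    private int executions = 0;

    // The check, the logic and the store update all run under one lock.
    String execute(String idemKey, Supplier<String> logic) {
        synchronized (lock) {
            if (idemStore.containsKey(idemKey)) {
                return idemStore.get(idemKey);
            }
            executions++;
            String result = logic.get();
            idemStore.put(idemKey, result);
            return result;
        }
    }

    int executions() {
        synchronized (lock) {
            return executions;
        }
    }
}

class MutexDemo {
    // Fires threadCount concurrent requests with the same key and
    // returns how many times the logic actually ran.
    static int executionsAfterConcurrentCalls(int threadCount) {
        MutexIdempotentExecutor executor = new MutexIdempotentExecutor();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> executor.execute("order-42", () -> "created"));
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return executor.executions();
    }
}
```

Because every request, regardless of its key, queues on the same lock for the full duration of the logic, the lock hold time grows with the cost of the logic itself.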
One improvement over the above approach is to do all the logic outside the synchronized block and synchronize only the update of the data in the store, so that contention is limited to the latency of the store operation. The trade-off is that the logic itself may still run more than once under concurrency; only one result ends up in the store.
if (idemStore.contains(idemKey)) {
    return idemStore.get(idemKey); // may choose to get or throw IdemAborted exception here
}
// execute the business logic here, outside the lock
synchronized (lock) {
    if (idemStore.contains(idemKey)) {
        return idemStore.get(idemKey); // may choose to get or throw IdemAborted exception here
    }
    idemStore.update(idemKey);
}
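The narrower locking can be sketched as a small class. This is an illustration, not a definitive implementation: NarrowLockIdempotentExecutor is a hypothetical name, the store is an in-memory HashMap, and the first read is also briefly synchronized because a plain HashMap is not safe to read while another thread writes to it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

class NarrowLockIdempotentExecutor {
    private final Object lock = new Object();
    private final Map<String, String> idemStore = new HashMap<>();

    String execute(String idemKey, Supplier<String> logic) {
        String cached = lookup(idemKey);             // first check
        if (cached != null) {
            return cached;
        }

        String result = logic.get();                 // runs outside the lock

        synchronized (lock) {                        // lock held only for the store update
            String winner = idemStore.get(idemKey);  // second check
            if (winner != null) {
                return winner;                       // someone else stored first; drop our result
            }
            idemStore.put(idemKey, result);
            return result;
        }
    }

    private String lookup(String idemKey) {
        synchronized (lock) {
            return idemStore.get(idemKey);
        }
    }
}
```

If a second request with the same key completes while the first is still running its logic, the first request returns the stored result of the second and discards its own. This only works when the logic has no external side effects that must happen exactly once.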
An alternative to pessimistic locking is optimistic locking with a compare-and-swap (CAS) strategy, if your datastore offers one, such as checkAndPut/checkAndMutate in HBase or SET with the NX option in Redis (the snippet below uses a generic checkAndUpsert for it). The idea is to store the idempotency key along with the data and a version using a CAS update. If the CAS fails because the stored version doesn't match the version you read, you recheck the data in the store for the idempotency key and throw an idempotency abort if the key exists.
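if (store.contains(idemKey)) {
    return store.get(idemKey); // may choose to get or throw IdemAborted exception here
}
// execute the business logic to produce data
boolean success = store.checkAndUpsert(data, idemKey, version);
if (!success) {
    return store.get(idemKey); // another request got in first; return what it stored
}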
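The versioned CAS flow can be sketched end to end with an in-memory stand-in for the datastore. Everything named here (AccountRecord, VersionedAccountStore, CreditService) is hypothetical, and an AtomicReference plays the role of the store's CAS primitive. Retrying after a failed CAS when the key is still absent is an assumption of this sketch; the key check on every pass is the part described above.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

// Immutable snapshot of the stored entity: data, version, and applied idempotency keys.
final class AccountRecord {
    final long version;
    final long balance;
    final Set<String> appliedKeys;

    AccountRecord(long version, long balance, Set<String> appliedKeys) {
        this.version = version;
        this.balance = balance;
        this.appliedKeys = Collections.unmodifiableSet(new HashSet<>(appliedKeys));
    }
}

// In-memory stand-in for a datastore with a compare-and-swap write.
class VersionedAccountStore {
    private final AtomicReference<AccountRecord> current =
            new AtomicReference<>(new AccountRecord(0, 0, Collections.<String>emptySet()));

    AccountRecord get() {
        return current.get();
    }

    // Succeeds only if nobody has replaced the record we read. Every write creates
    // a new record with version + 1, so the reference check doubles as a version check.
    boolean checkAndUpsert(AccountRecord expected, AccountRecord updated) {
        return current.compareAndSet(expected, updated);
    }
}

class CreditService {
    private final VersionedAccountStore store;

    CreditService(VersionedAccountStore store) {
        this.store = store;
    }

    // Returns true if the credit was applied, false if idemKey had already been applied.
    boolean credit(String idemKey, long amount) {
        while (true) {
            AccountRecord seen = store.get();
            if (seen.appliedKeys.contains(idemKey)) {
                return false; // duplicate request: abort without touching the data
            }
            Set<String> keys = new HashSet<>(seen.appliedKeys);
            keys.add(idemKey);
            AccountRecord next =
                    new AccountRecord(seen.version + 1, seen.balance + amount, keys);
            if (store.checkAndUpsert(seen, next)) {
                return true;
            }
            // CAS failed: another writer got in first. Loop to re-read and recheck the key.
        }
    }
}
```

Since the idempotency key is written in the same atomic update as the data, there is no window in which the data has changed but the key is not yet recorded.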
The last approach is the one primarily used in practice in distributed systems, since the first two approaches require a distributed lock when running multiple instances of a stateless app service. Tools such as ZooKeeper and Hazelcast can be used to build a distributed lock, but distributed locking at high scale proves very inefficient. It is best to handle this at the persistence layer: if the store is well distributed and well sharded, concurrent requests for the same key are serialized at the owning partition, which reduces the problem to the simpler in-memory mutex case.