Recommended study 1: After 30 days of hard work, I have produced this [Distributed Book: Current Limit + Cache + Communication]. Qiu Talent Job Recruitment is expected to be 2: First release on the entire network! Internal Sharing of Ma Soldiers—page 1658 "Java Interview Assault

2025/04/1003:08:45 hotcomm 1713


    1. What is a distributed lock:


    A distributed lock is a lock used in a distributed system. In a monolithic application, we use locks to control access to shared resources; a distributed lock solves the same problem across a distributed system. Unlike in a monolithic application, the smallest unit competing for a shared resource in a distributed system is no longer the thread but the process.

    2. What conditions should a distributed lock have:

    • In a distributed environment, a method can be executed by only one thread on one machine at any given time.
    • Acquiring and releasing the lock is highly available.
    • Acquiring and releasing the lock is high-performance.
    • It is reentrant: the thread that already holds the lock can acquire it again without blocking on itself.
    • It has a lock expiry mechanism, that is, automatic unlocking, to prevent deadlocks.
    • It is non-blocking: if the lock cannot be acquired, the call returns a failure immediately.

    3. The implementation method of distributed lock:

    A distributed lock can be implemented on top of a database, Zookeeper, or Redis. This article briefly introduces each of these implementations, focusing on the Redis-based one.

    2. Distributed lock based on database:

    There are two ways to implement a lock on a database: one inserts and deletes rows in a lock table, and the other uses the database's exclusive locks.

    1. Based on inserting and deleting table rows:

    Inserting and deleting rows is the simplest approach. First, create a lock table whose main fields are the fully qualified class name + method name, a timestamp, and so on.

    Usage: to lock a method, insert a row for it into the table. The fully qualified class name + method name is subject to a unique constraint, so if several requests reach the database at the same time, the database guarantees that only one insert succeeds. The thread whose insert succeeded is considered to hold the lock for that method and may execute the method body. When execution finishes, it must delete the row.

    (This is only a brief introduction. The scheme can be improved in several ways: run the database as a master-slave pair with two-way synchronization and switch to the standby quickly if the primary goes down; run a scheduled task that periodically cleans up timed-out rows; retry the insert in a while loop until it succeeds; and, for reentrancy, record the host and thread information of the current lock holder, query the table first on the next acquisition, and grant the lock directly if the requesting host and thread match the stored record.)

    2. Based on the database exclusive lock:

    With the MySQL InnoDB engine, locking can be implemented as follows:

    public void lock() throws SQLException, InterruptedException {
        connection.setAutoCommit(false);
        int count = 0;
        while (count < 4) {
            try (Statement stmt = connection.createStatement();
                 ResultSet result = stmt.executeQuery(
                     "select * from `lock` where lock_name = 'xxx' for update")) {
                if (result.next()) {
                    return; // a row came back: the lock has been obtained
                }
            } catch (SQLException e) {
                // treated as "lock not obtained"
            }
            // empty result or exception: the lock has not been obtained
            Thread.sleep(1000);
            count++;
        }
        throw new LockException();
    }

    Adding for update to the query makes the database place an exclusive lock on the matching rows while the query runs. (InnoDB takes row-level locks only when the lookup uses an index; otherwise it ends up locking every row it scans.) The thread that obtains the exclusive lock is considered to hold the distributed lock and may execute the method's business logic. When the method finishes, release the lock by calling connection.commit(). While one thread holds the exclusive lock on a record, other threads cannot acquire it and are blocked.

    3. Advantages and disadvantages of database locks:

    Both methods rely on a database table: one uses the presence of a row to indicate that the lock is held, and the other uses the database's exclusive lock.

    • Advantage: it builds directly on the database, so it is simple and easy to understand.
    • Disadvantage: every lock operation goes through the database, which adds overhead, so performance has to be considered.

    3. Distributed lock based on Zookeeper

    A distributed lock can be built on Zookeeper's ephemeral sequential nodes. When a client locks a method, it creates a unique ephemeral sequential node under the Zookeeper directory node that corresponds to that method. Deciding who holds the lock is simple: the client whose node has the smallest sequence number holds it. To release the lock, the client deletes its ephemeral node. Because ephemeral nodes disappear when the client's session ends, this also avoids deadlocks caused by a service going down without releasing its lock. (Curator is a third-party library for this; its InterProcessMutex is a distributed lock implementation.)

    The distributed lock implemented with Zookeeper has two disadvantages:

    • (1) Performance may be lower than with a cache service, because every acquire and release creates and destroys an ephemeral node. In ZK, nodes can only be created and deleted through the Leader server, which then synchronizes the change to all Follower machines.
    • (2) Concurrency safety: network jitter can break the session between a client and the ZK cluster. The cluster then assumes the client has died and deletes its ephemeral node, at which point another client can obtain the lock while the first one still believes it holds it.
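
    The ephemeral sequential node scheme described in this section can be modeled in a few lines. This is only a sketch under stated assumptions: SequentialNodeLock is a hypothetical in-memory stand-in, not Zookeeper or Curator code, and it leaves out sessions and watches. It shows just the rule that the smallest sequence number holds the lock.

```java
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.atomic.AtomicLong;

/**
 * In-memory model of a lock directory that holds ephemeral sequential nodes.
 * Hypothetical stand-in for illustration; it does not talk to Zookeeper.
 */
class SequentialNodeLock {
    private final AtomicLong nextSequence = new AtomicLong();
    private final ConcurrentSkipListSet<Long> nodes = new ConcurrentSkipListSet<>();

    /** Models creating an ephemeral sequential node; returns its sequence number. */
    long createNode() {
        long seq = nextSequence.getAndIncrement();
        nodes.add(seq);
        return seq;
    }

    /** A client holds the lock while its node exists and no node has a smaller number. */
    boolean holdsLock(long seq) {
        return nodes.contains(seq) && nodes.lower(seq) == null;
    }

    /** Models deleting the node, on explicit release or when the client's session ends. */
    void deleteNode(long seq) {
        nodes.remove(seq);
    }
}
```

    A waiting client would normally watch the node just ahead of its own and re-check holdsLock when that node disappears, rather than polling.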

    4. Redis-based distributed lock:

    Redis command description:

    (1) setnx command: "set if not exists". The key is set to value if and only if the key does not exist; if the key already exists, setnx does nothing.

    • Returns 1: the key was set to value, and the process has obtained the lock.
    • Returns 0: another process already holds the lock, and this process cannot enter the critical section.

    command format: setnx lock.key lock.value

    (2) get command: returns the value of the key if it exists, or nil if it does not.

    command format: get lock.key

    (3) getset command: atomically sets the key to newValue and returns the key's old value.

    command format: getset lock.key newValue

    (4) del command: deletes the specified key in Redis.

    command format: del lock.key

    Solution 1: Distributed lock based on the set command

    1. Lock: use setnx to lock. A return value of 1 means the lock was obtained.

    2. Unlock: when the thread holding the lock has finished its task, it releases the lock with the del command so that other threads can obtain it with setnx.

    (1) Problem: if the thread that holds the lock crashes while executing its task, before it has had a chance to run del explicitly, the lock is never released. The threads competing for it can never proceed, which is a deadlock.

    (2) Solution: set a timeout on the lock.

    3. Set the lock timeout: the key set by setnx must have a timeout, so that the lock is released automatically after a certain time even if it is never explicitly released. The expire command can be used to set it.

    (1) Problem:

    setnx and expire are two separate commands, so together they are not atomic. Suppose a thread runs setnx and obtains the lock, but the server crashes before expire is executed. The lock then has no expiration time and becomes a deadlock: no other thread can ever obtain it.

    (2) Solution: Redis's set command can set the key's expiration time in the same command that acquires the lock.

    4. Use the set command to lock and set the lock's expiration time:

    command format: set lock.key lock.value nx ex expireTime

    For details, refer to the Redis usage document:
    http://doc.redisfans.com/string/set.html

    (1) Problem:

    ① Suppose thread A obtains the lock with a timeout of 30 seconds. If for some reason thread A runs very slowly and has not finished after 30 seconds, the lock expires and is released automatically, and thread B obtains it.

    ② Thread A then finishes its task and runs del to release the lock. Thread B has not finished yet, so thread A actually deletes the lock that thread B added.

    (2) Solution:

    Before releasing the lock with del, check that the current lock is the one you added yourself. When locking, store the current thread ID as the value, and before deleting, verify that the value stored under the key is your own thread ID. This introduces a new problem, however: the get-and-compare and the del are two independent operations and together are not atomic. A Lua script can be used to make the whole check-and-delete atomic.
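
    The ownership check in step 4 can be sketched as follows. The assumptions are mine: OwnedLockStore is a hypothetical in-memory stand-in for Redis, and the lock value is an opaque owner token rather than the bare thread ID, because thread IDs are not unique across machines. UNLOCK_SCRIPT shows the usual shape of the compare-and-delete Lua script that a real client would send with EVAL; here its effect is modeled by an atomic map operation.

```java
import java.util.concurrent.ConcurrentHashMap;

/**
 * In-memory stand-in for a Redis instance (hypothetical, for illustration only).
 * Timed expiry is replaced by an explicit expire() call to keep the demo deterministic.
 */
class OwnedLockStore {
    /** Compare-and-delete script a real client would run with EVAL (KEYS[1]=key, ARGV[1]=token). */
    static final String UNLOCK_SCRIPT =
        "if redis.call('get', KEYS[1]) == ARGV[1] then "
      + "return redis.call('del', KEYS[1]) else return 0 end";

    private final ConcurrentHashMap<String, String> keys = new ConcurrentHashMap<>();

    /** Models "set key token nx": succeeds only if the key is absent. */
    boolean tryLock(String key, String ownerToken) {
        return keys.putIfAbsent(key, ownerToken) == null;
    }

    /** Models the script: deletes the key only if it still holds our token, in one atomic step. */
    boolean unlock(String key, String ownerToken) {
        return keys.remove(key, ownerToken);
    }

    /** Models the TTL running out while the holder is still working. */
    void expire(String key) {
        keys.remove(key);
    }
}
```

    With this check in place, the scenario from the problem description plays out safely: after A's lock expires and B acquires it, A's unlock call returns false and leaves B's lock untouched.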

    5. Lock renewal (this mechanism is similar to Redisson's watchdog mechanism, which will be explained in detail later in the article):

    Although step 4 keeps thread A from deleting a key it no longer owns, threads A and B can still be inside the protected code block at the same time, which is not acceptable. What can be done? The thread that acquires the lock can start a daemon thread that "renews" the lock shortly before it expires.

    ① Suppose thread A has not finished after 29 seconds. The daemon thread then runs the expire command to renew the lock for another 20 seconds. The daemon thread first runs at the 29th second and then every 20 seconds.

    ② Case 1: when thread A completes the task, it explicitly stops the daemon thread.

    ③ Case 2: if the server suddenly loses power, the daemon thread stops too, because it runs in the same process as thread A. Nothing renews the lock any more, so it is released automatically when it expires.
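
    A minimal sketch of the renewal idea, under stated assumptions: LeasedKey is a hypothetical in-memory stand-in for a Redis key with a TTL, LockWatchdog and WatchdogDemo are illustrative names, and the intervals are shortened to milliseconds so the demo finishes quickly. It is not Redisson's implementation.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** In-memory stand-in for one Redis key with a TTL (hypothetical, illustration only). */
class LeasedKey {
    private final AtomicLong expiresAtMillis = new AtomicLong();

    LeasedKey(long ttlMillis) {
        expire(ttlMillis);
    }

    /** Models the expire command: the key now lives for ttlMillis from this moment. */
    void expire(long ttlMillis) {
        expiresAtMillis.set(System.currentTimeMillis() + ttlMillis);
    }

    boolean isAlive() {
        return System.currentTimeMillis() < expiresAtMillis.get();
    }
}

/** Renews the lease periodically on a daemon thread until closed. */
class LockWatchdog implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(runnable -> {
            Thread thread = new Thread(runnable, "lock-watchdog");
            thread.setDaemon(true); // dies with the process, so a crashed holder stops renewing
            return thread;
        });

    LockWatchdog(LeasedKey key, long ttlMillis, long periodMillis) {
        scheduler.scheduleAtFixedRate(
            () -> key.expire(ttlMillis), periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    /** Case 1 in the text: the holder has finished, so renewal stops. */
    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}

class WatchdogDemo {
    /** Runs a "slow task" three times longer than the TTL and reports whether the lock survived. */
    static boolean lockSurvivesSlowTask(boolean withWatchdog) {
        LeasedKey key = new LeasedKey(500);
        LockWatchdog watchdog = withWatchdog ? new LockWatchdog(key, 500, 100) : null;
        try {
            Thread.sleep(1500);
            return key.isAlive();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            if (watchdog != null) {
                watchdog.close();
            }
        }
    }
}
```

    Against a real Redis, the renewal step should extend the key only if it still holds the caller's own value, for the same reason the delete in step 4 needs that check.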

    Solution 2: Distributed lock based on setnx, get, getset

    1. Implementation principle:

    (1) Call setnx(lockkey, current time + lock timeout). If it returns 1, the lock has been obtained; if it returns 0, the lock was not obtained, so go to step (2).

    (2) Call get(lockkey) to read the value oldExpireTime and compare it with the current system time. If it is less than the current system time, the lock is considered to have timed out and other requests may acquire it, so go to step (3).

    (3) Compute newExpireTime = current time + lock timeout, then call getset(lockkey, newExpireTime), which returns the value currently stored under lockkey, currentExpireTime.

    (4) Check whether currentExpireTime equals oldExpireTime. If they are equal, this getset won the race and the lock has been obtained. If they differ, another request took the lock first; the current request can either return failure immediately or keep retrying.

    (5) After obtaining the lock, the current thread performs its business processing. When it finishes, it compares its own processing time with the lock timeout. If the processing time is less than the lock timeout, it runs del to release the lock (after first checking that the thread holding the lock is the current thread); if it is greater, the lock has already expired and no further handling of the lock is needed.

    2. Code implementation:

    (1) Implementation of acquiring the lock:

    public boolean lock(long acquireTimeout, TimeUnit timeUnit) throws InterruptedException {
        acquireTimeout = timeUnit.toMillis(acquireTimeout);
        long acquireTime = acquireTimeout + System.currentTimeMillis();
        // Use J.U.C's ReentrantLock; acquireTimeout is already in milliseconds here
        if (!threadLock.tryLock(acquireTimeout, TimeUnit.MILLISECONDS)) {
            return false;
        }
        try {
            // Loop and retry
            while (true) {
                // Call tryLock
                boolean hasLock = tryLock();
                if (hasLock) {
                    // Acquired the lock successfully
                    return true;
                } else if (acquireTime < System.currentTimeMillis()) {
                    break;
                }
                Thread.sleep(sleepTime);
            }
        } finally {
            if (threadLock.isHeldByCurrentThread()) {
                threadLock.unlock();
            }
        }
        return false;
    }

    public boolean tryLock() {
        long currentTime = System.currentTimeMillis();
        String expires = String.valueOf(timeout + currentTime);
        // Set the mutex
        if (redisHelper.setNx(mutex, expires) > 0) {
            // Acquired the lock, record the timeout
            setLockStatus(expires);
            return true;
        } else {
            String currentLockTime = redisUtil.get(mutex);
            // Check whether the existing lock has timed out
            if (Objects.nonNull(currentLockTime) && Long.parseLong(currentLockTime) < currentTime) {
                // Set the mutex and get the old lock time
                String oldLockTime = redisHelper.getSet(mutex, expires);
                // If the old value is unchanged, this request won the race
                if (Objects.nonNull(oldLockTime) && Objects.equals(oldLockTime, currentLockTime)) {
                    // Acquired the lock, record the timeout
                    setLockStatus(expires);
                    return true;
                }
            }
            return false;
        }
    }

    The main logic is as follows: lock calls tryLock in a loop. Its parameters are the acquisition timeout and its unit; within that timeout the calling thread keeps retrying, sleeping between attempts, until the current holder releases the lock or the timeout elapses.

    (2) Implementation of releasing the lock:

    public boolean unlock() {
        // Only the thread holding the lock can unlock
        if (lockHolder == Thread.currentThread()) {
            // Delete the mutex only if the lock has not yet timed out
            if (lockExpiresTime > System.currentTimeMillis()) {
                redisHelper.del(mutex);
                logger.info("Delete the mutex[{}]", mutex);
            }
            lockHolder = null;
            logger.info("Release [{}] Lock successfully", mutex);
            return true;
        } else {
            throw new IllegalMonitorStateException("The thread that has not obtained the lock cannot perform the unlocking operation");
        }
    }

    Problems:

    (1) The core of this lock depends on System.currentTimeMillis(). If the clocks of the servers are inconsistent, the lock misbehaves. This can be avoided entirely at the server operations level, and if server clocks do differ, any time-dependent logic will be affected.

    (2) If several servers request the lock just as the previous lock times out, they all execute getset at the same time, and the expiration time is overwritten. This does not affect the correctness of the result, however.

    (3) Several threads can hold the lock at the same time: if thread A's task takes longer than the lock's expiration time, another thread can obtain the lock while A is still running. As in Solution 1, this can be solved with lock renewal.

    The problems with the first two redis distributed locks

    From a high-availability point of view, the first two Redis lock implementations still fall short. When Redis is a single point and it fails, distributed locks become unavailable for the entire business.

    To improve availability we can use master-slave mode or sentinel mode, but a problem remains. In either mode, a successful lock on the master is normally replicated to the slave asynchronously. If the master crashes before replication completes and a failover takes place, the slave is promoted to master without ever having received the lock from the old master. The lock is lost, and several clients can then hold the same lock at the same time. Let's take a look at a picture to think about this process:

    [Figure: lock loss during a master-slave failover]

    So how can this be avoided? Redis officially provides a highly available distributed lock solution based on deploying multiple Redis instances: RedLock. It is introduced in detail in Solution 3. (Note: if the business can tolerate several clients holding the lock at the same time while the master node is down, RedLock is not required.)

    Solution 3: RedLock-based distributed lock

    redLock official document address:
    https://redis.io/topics/distlock

    The Redlock algorithm is a high-availability mode that Redis author antirez built on top of the single-node Redis lock. Redlock must be used together with the single-node distributed lock algorithm, which is its foundation.

    1. Locking implementation principle:

    Assume there are 5 Redis master nodes (an odd number greater than 3), which basically ensures that they will not all go down at the same time. To acquire and release the lock, the client performs the following steps:

    (1) Get the current Unix time in milliseconds and choose the lock timeout TTL.

    The TTL should be greater than the normal business execution time + the time spent acquiring the lock on all Redis instances + clock drift.

    (2) Try to acquire the lock on the 5 instances in turn, using the same key and the same unique value. For each request, the client sets a network connection and response timeout that is less than the lock TTL, so that it does not wait indefinitely on a node that is down. For example, if the TTL is 5s and the per-instance timeout is 1s, the client gives up on an instance that has not granted the lock within one second and moves on to the next one.

    (3) After trying all instances, the client takes the current time and subtracts the time from step (1) to get the time spent acquiring the lock. The lock counts as acquired only if that time is less than the TTL and the lock was obtained on more than half of the Redis nodes.

    (4) If the lock is acquired, the real validity time of the key = TTL - time spent acquiring the lock - clock drift. For example, if the TTL is 5s and acquiring the locks took 2s, the actual validity time is 3s (before subtracting clock drift).

    (5) If acquisition fails for any reason (the lock was not obtained on more than half of the instances, or the acquisition time exceeded the validity time), the client should unlock on all Redis instances, whether or not it believes the lock was set there, because an instance may have set the lock even though its response was lost.

    Imagine a case where the client's lock request reaches a Redis node and the node performs the SET successfully, but the response packet it sends back is lost. From the client's point of view the request failed with a timeout, but from Redis's point of view the lock has been set. So when releasing the lock, the client should also send the release to the Redis nodes where acquisition appeared to fail. This situation is possible in an asynchronous communication model: the client can reach the server normally while the opposite direction has a problem.

    (6) Retry on failure: when the client cannot acquire the lock, it should retry after a random delay, and the number of retries should be limited.

    The random delay mainly prevents too many clients from trying to acquire the lock at the same moment, which could leave none of them able to acquire it. The schematic diagram of the algorithm is as follows:

    [Figure: schematic diagram of the RedLock algorithm]
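
    Steps (1) to (5) above can be sketched as follows. This is an illustration under stated assumptions, not the official algorithm code or Redisson's implementation: LockNode, InMemoryNode and RedLockSketch are hypothetical names, the nodes are in-memory stand-ins that ignore the TTL, and the clock drift is passed in as a plain parameter.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical view of one Redis master; a real version would wrap a Redis client. */
interface LockNode {
    boolean setIfAbsent(String key, String token, long ttlMillis); // models: set key token nx px ttl
    void deleteIfOwner(String key, String token);                  // models the compare-and-delete script
}

/** In-memory node for the demo; a "down" node rejects every lock request. The TTL is ignored. */
class InMemoryNode implements LockNode {
    private final ConcurrentHashMap<String, String> keys = new ConcurrentHashMap<>();
    private final boolean down;

    InMemoryNode(boolean down) {
        this.down = down;
    }

    @Override
    public boolean setIfAbsent(String key, String token, long ttlMillis) {
        return !down && keys.putIfAbsent(key, token) == null;
    }

    @Override
    public void deleteIfOwner(String key, String token) {
        keys.remove(key, token);
    }

    boolean holds(String key) {
        return keys.containsKey(key);
    }
}

class RedLockSketch {
    /** Returns the remaining validity time in ms, or -1 if the lock was not acquired. */
    static long tryLock(List<? extends LockNode> nodes, String key, String token,
                        long ttlMillis, long clockDriftMillis) {
        long start = System.currentTimeMillis();                // step (1)
        int acquired = 0;
        for (LockNode node : nodes) {                           // step (2)
            if (node.setIfAbsent(key, token, ttlMillis)) {
                acquired++;
            }
        }
        long elapsed = System.currentTimeMillis() - start;      // step (3)
        long validity = ttlMillis - elapsed - clockDriftMillis; // step (4)
        int quorum = nodes.size() / 2 + 1;                      // more than half of the nodes
        if (acquired >= quorum && validity > 0) {
            return validity;
        }
        for (LockNode node : nodes) {                           // step (5): unlock on every node
            node.deleteIfOwner(key, token);
        }
        return -1;
    }
}
```

    With 5 nodes the quorum is 3, so the lock survives two unreachable nodes. With three nodes down, the call fails and also clears the key from the two nodes that did accept it.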

    2. RedLock performance and crash recovery:

    As long as a majority of the N Redis nodes are working, Redlock keeps working, so in theory its availability is higher. The safety problem described earlier for single-point Redis locks (lock loss on failover) no longer exists in RedLock. However, a node that crashes and restarts can still affect the safety of the lock. The exact impact depends on the Redis persistence configuration:

    (1) If Redis has no persistence, and the nodes restart after clientA has acquired the lock, the lock data is gone and clientB can acquire the same lock, which violates mutual exclusion.

    (2) With AOF persistence enabled, things are better. For example, because Redis expiry is based on Unix timestamps, a key still expires at the specified time after a restart, so the business is not affected. However, AOF is flushed to disk once per second by default, so a power failure within that second loses data, and an immediate restart breaks mutual exclusion. Setting the flush policy to Always (every write command is synchronized to disk) causes a sharp drop in performance. A trade-off between full lock safety and performance is therefore unavoidable.

    (3) To get both full lock safety and good performance, antirez proposed the concept of "delayed restart". Redis keeps the default one-second flush to disk. After a Redis node crashes (whether one node or all of them), do not restart it immediately; wait for the TTL to pass before restarting. All locks that involved this node will have expired by then, so the restart cannot affect any existing lock. The disadvantage is that the node is effectively out of service for the duration of the TTL.

    3. Implementation of RedLock in Redisson:

    The Java Redisson package wraps RedLock, implemented mainly through the Redis client and Lua scripts. Lua scripts are used so that the unlock verification and the unlock itself execute as one atomic step.

    (1) Generation of the unique ID:

    In a distributed lock, the storage node acts as the central node and must be able to identify the holder of a lock, so that the lock cannot be released by a non-holder by mistake. Each client node that issues requests therefore needs a globally unique ID. A UUID is the usual choice, and Redisson does the same. On top of that, Redisson appends the thread ID, which avoids the performance cost of each thread repeatedly generating its own UUID.

    protected final UUID id = UUID.randomUUID();

    String getLockName(long threadId) {
        return id + ":" + threadId;
    }

    (2) Locking logic:

    The core locking code in Redisson is easy to follow: it takes the TTL and the unique ID and requests the lock for that period of time.

    Recommended learning

    • 1: Recommended study 1: After 30 days of hard work, I have produced this [Distributed Book: Current Limit + Cache + Communication]. Qiu Talent Job Recruitment is expected to be 2: First release on the entire network! Internal Sharing of Ma Soldiers—page 1658

      1. What is a distributed lock:

      1. What is a distributed lock:

      distributed lock, that is, a lock in a distributed system. In a single-body application, we solve the problem of controlling shared resources to access through locks, and distributed locks solve the problem of controlling shared resources to access in the distributed system . Unlike monolithic applications, the minimum granularity of competing for shared resources in distributed systems has been upgraded from the thread to the process.

      2. What conditions should a distributed lock have:

      • In a distributed system environment, a method can only be executed by one thread of a machine at the same time by at the same time. The high-availability acquisition lock and release lock.
      • . The high-performance acquisition lock and release lock.
      • has the reentry feature (it can be understood as re-entry, used concurrently by more than one task, without worrying about data errors)
      • has the lock failure mechanism, that is, automatic unlocking, preventing deadlocks
      • . It has the characteristics of non-blocking lock, that is, if the lock is not acquired, it will directly return to the lock failed.

      3. The implementation method of distributed lock:

      Implementation of distributed lock based on database. Implementation of distributed lock based on zookeeper Implementation of distributed lock based on reids. This article briefly introduces the implementation of these distributed locks, focusing on the distributed lock based on redishml9.

      2. Distributed lock based on database:

      There are two ways to implement lock based on databases, one is based on the addition and deletion of database tables, and the other is based on the exclusive lock based on database.

      1. Adding and deleting based on database tables:

      Adding and deleting based on database tables is the easiest way. First, create a locked table mainly contains the following fields: the full path name + method name of the class, timestamp and other fields.

      Specific usage method: When you need to lock a method, insert a related record into the table. The full path name + method name of the class is unique. If multiple requests are submitted to the database at the same time, the database will ensure that only one operation can be successful. Then we believe that the thread that has successfully operated has obtained the lock of the method and can execute the method body content. After the execution is completed, the record needs to be deleted.

      (I just briefly introduce it here. The above solution can be optimized, such as: apply the master-slave database, and the data is synchronized in two directions; once it is hung up, quickly switch to the backup library; do a timed task, clean the timeout data in the database every certain time; use while loop until insert is successful and then return successfully; record the host information and thread information of the machine that currently obtains the lock, and query the database first when obtaining the lock next time, if the current machine's main machine is the main machine If the machine information and thread information can be found in the database, just assign the lock to it to realize the reentrant lock)

      2. Based on the database exclusive lock:

      Based on the MySql InnoDB engine, the following methods can be used to implement locking operation:

      public void lock(){connection.setAutoCommit(false)int count = 0;while(count 4){try{select * from lock where lock_name=xxx for update;if(result is not empty){//represents that the lock has been obtained;}}catch(Exception e){ }//If it is empty or an exception is thrown, it means that the lock has not been obtained sleep(1000);count++;}throw new LockException();}

      adds for update after the query statement, and the database will add exclusive locks to the database table during the query process. The thread that obtains the exclusive lock can obtain a distributed lock. After obtaining the lock, the business logic of the method can be executed. After executing the method, release the lock connection.commit(). When an exclusive lock is added to a record, other threads cannot acquire the exclusive lock and are blocked.

      3. Advantages and disadvantages of database locks:

      The above two methods rely on database tables. One is to judge whether there is currently a lock through the records in the table, and the other is to realize distributed locks through the exclusive lock of the database. The advantage of

      • is that it is simple and easy to understand with the help of the database. The disadvantage of
      • is that operating the database requires a certain amount of overhead and performance issues need to be considered.

      3. Distributed lock based on Zookeeper

      distributed locks that can be realized based on zookeeper temporary ordered nodes. When each client locks a method, a unique instantaneously ordered node is generated in the directory of the specified node corresponding to the method on zookeeper. The way to determine whether to acquire a lock is very simple. You only need to judge the one with the smallest serial number in the ordered node. When releasing the lock, just delete this instantaneous node. At the same time, it can avoid the locks that cannot be released due to service downtime, resulting in deadlock problems. (The third-party library includes Curator. InterProcessMutex provided by Curator is an implementation of distributed locks) The distributed lock implemented by

      Zookeeper has two disadvantages:

      • (1) may not be as high as the cache service in terms of performance, because every time the lock is created and released, instantaneous nodes must be created and destroyed to implement the lock function. Creating and deleting nodes in ZK can only be performed through the Leader server, and then synchronizing the data to all Follower machines.
      • (2) Concurrency security problem of zookeeper: Because there may be network jitter, the session connection between the client and the ZK cluster is broken, and the zk cluster thinks that the client has hung up, and will delete the temporary node. At this time, other clients can obtain the distributed lock.

      4. Redis-based distributed lock:

      Redis command description:

      (1) setnx command: "set if not exists". The value of key is set to value if and only if the key does not exist. If the given key already exists, SETNX does nothing.

      • Returns 1: the process has obtained the lock, and the value of key has been set to value.
      • Returns 0: another process already holds the lock, and this process cannot enter the critical section.

      command format: setnx lock.key lock.value

      (2) get command: gets the value of key. If the key exists, its value is returned; if not, nil is returned.

      command format: get lock.key

      (3) getset command: atomically sets the value of key to newValue and returns the old value of key.

      command format: getset lock.key newValue

      (4) del command: deletes the specified key in Redis.

      command format: del lock.key
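      To make the return values of these four commands concrete, here is a minimal in-memory stand-in written in Java. It is not a Redis client: the class name MiniRedis is hypothetical, and a ConcurrentHashMap is assumed as the backing store so that each operation is atomic, as it is in Redis.

```java
import java.util.concurrent.ConcurrentHashMap;

/** In-memory stand-in for the four Redis string commands described above (not a Redis client). */
final class MiniRedis {
    private final ConcurrentHashMap<String, String> data = new ConcurrentHashMap<>();

    /** SETNX: returns 1 if the key was absent and is now set, 0 if it already existed. */
    long setnx(String key, String value) {
        return data.putIfAbsent(key, value) == null ? 1 : 0;
    }

    /** GET: returns the value, or null (Redis nil) when the key does not exist. */
    String get(String key) {
        return data.get(key);
    }

    /** GETSET: atomically stores newValue and returns the previous value (null if none). */
    String getset(String key, String newValue) {
        return data.put(key, newValue);
    }

    /** DEL: returns the number of keys removed (0 or 1 here). */
    long del(String key) {
        return data.remove(key) == null ? 0 : 1;
    }
}
```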

      Solution 1: Distributed lock based on the set command

      1. Lock: use setnx to lock. When the command returns 1, the lock has been acquired.

      2. Unlock: after the thread that holds the lock has completed its task, it uses the del command to release the lock so that other threads can execute setnx and acquire it.

      (1) Problem: suppose the thread that acquired the lock crashes while executing its task, before it has a chance to run the del command explicitly. The threads competing for the lock can then never acquire it, which results in a deadlock.

      (2) Solution: set a timeout on the lock.

      3. Set the lock timeout: the key created by setnx must be given a timeout so that the lock is released automatically after a certain period even if it is never released explicitly. The expire command can be used to set the timeout.

      (1) Problem:

      setnx and expire together are not an atomic operation. Suppose a thread executes setnx and acquires the lock, but the server crashes before expire is executed. The lock then has no expiration time and becomes a deadlock: no other thread can ever acquire it.

      (2) Solution: the Redis set command can set the expiration time of the key in the same step that acquires the lock.

      4. Use the set command to lock and set the lock's expiration time in one step:

      Command format: set lock.key lock.value nx ex expireTime

      For details, refer to the Redis documentation:
      http://doc.redisfans.com/string/set.html
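      The effect of combining NX and EX in one command can be modeled in memory. The sketch below is an assumption-laden illustration, not a Redis client: ExpiringLockStore and setNxEx are hypothetical names, and the clock is injected so that expiry can be shown without waiting.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** In-memory model of "SET key value NX EX seconds" with an injectable clock (not a Redis client). */
final class ExpiringLockStore {
    private static final class Entry {
        final String value;
        final long expiresAtMillis;
        Entry(String value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> data = new HashMap<>();
    private final LongSupplier clockMillis;

    ExpiringLockStore(LongSupplier clockMillis) {
        this.clockMillis = clockMillis;
    }

    /** Sets the key only if it is absent or expired, attaching the TTL in the same step. */
    synchronized boolean setNxEx(String key, String value, long ttlSeconds) {
        long now = clockMillis.getAsLong();
        Entry current = data.get(key);
        if (current != null && current.expiresAtMillis > now) {
            return false; // still held by someone else
        }
        data.put(key, new Entry(value, now + ttlSeconds * 1000));
        return true;
    }

    /** Returns the live value, or null when the key is missing or has expired. */
    synchronized String get(String key) {
        Entry e = data.get(key);
        return (e == null || e.expiresAtMillis <= clockMillis.getAsLong()) ? null : e.value;
    }
}
```

      Because the value and the expiry are written in a single synchronized step, there is no window in which the key exists without a timeout, which is the property the combined set command provides.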

      (1) Problem:

      ① Suppose thread A acquires the lock with a timeout of 30 seconds. If for some reason thread A runs very slowly and has not finished after 30 seconds, the lock expires and is released automatically, and thread B acquires it.

      ② Thread A then completes its task and executes the del command to release the lock. But thread B has not finished yet, so thread A actually deletes the lock that thread B added.

      (2) Solution:

      Before releasing the lock with del, check whether the current lock is the one you added yourself. When locking, store an identifier of the current thread as the value (a thread ID on its own can collide across machines, so it is usually combined with something unique to the client), and before deleting, verify that the value stored under the key is your own identifier. This introduces a new problem, however: reading the value, comparing it, and deleting the key are separate operations and are not atomic. To make them atomic, a Lua script can be used.
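      A minimal sketch of the owner check follows. The Lua text is the widely used compare-and-delete script for this purpose; the surrounding Java class is hypothetical and uses an in-memory map instead of Redis, where ConcurrentHashMap.remove(key, value) plays the role of the atomic script.

```java
import java.util.concurrent.ConcurrentHashMap;

/** Owner-checked unlock: the Lua script for Redis plus an in-memory equivalent for illustration. */
final class OwnerCheckedLock {
    /** Deletes the key only if it still holds the caller's token; Redis runs the script atomically. */
    static final String UNLOCK_SCRIPT =
            "if redis.call('get', KEYS[1]) == ARGV[1] then "
          + "return redis.call('del', KEYS[1]) "
          + "else return 0 end";

    private final ConcurrentHashMap<String, String> locks = new ConcurrentHashMap<>();

    boolean tryLock(String key, String ownerToken) {
        return locks.putIfAbsent(key, ownerToken) == null;
    }

    /** Local equivalent of the script: compares and deletes in one atomic step. */
    boolean unlock(String key, String ownerToken) {
        return locks.remove(key, ownerToken);
    }

    /** Simulates the key reaching its timeout. */
    void expire(String key) {
        locks.remove(key);
    }

    String holder(String key) {
        return locks.get(key);
    }
}
```

      With this check, a late unlock by thread A returns false and leaves thread B's lock untouched.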

      5. Lock renewal: (this mechanism is similar to Redisson's watchdog mechanism, which is explained in detail later in the article)

      Step 4 keeps thread A from deleting thread B's key, but threads A and B can still be inside the critical section at the same time, which is far from ideal. What can be done? The thread that acquires the lock can start a daemon thread that "renews" a lock that is about to expire.

      ① Suppose thread A has not finished after 29 seconds. The daemon thread then runs the expire command to renew the lock for another 20 seconds. The daemon thread starts running at the 29th second and runs again every 20 seconds.

      ② Case 1: when thread A completes its task, it explicitly stops the daemon thread.

      ③ Case 2: if the server suddenly loses power, the daemon thread stops as well, because thread A and the daemon thread are in the same process. Once the lock reaches its timeout, nothing renews it and it is released automatically.
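      The renewal idea can be sketched with a scheduled daemon thread. This is an illustration under stated assumptions, not production code: LeaseRenewer is a hypothetical class, and the lease is a local timestamp standing in for the TTL of the Redis key, where a real implementation would issue an owner-checked expire call instead.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Sketch of a renewal task; the lease is a local timestamp standing in for the key's TTL. */
final class LeaseRenewer implements AutoCloseable {
    private final AtomicLong expiresAtMillis = new AtomicLong();
    private final long ttlMillis;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "lock-renewer");
                t.setDaemon(true); // dies with the process, so a crashed holder stops renewing
                return t;
            });

    LeaseRenewer(long ttlMillis) {
        this.ttlMillis = ttlMillis;
        renew();
    }

    /** One renewal step: pushes the expiry out to now + TTL. */
    void renew() {
        expiresAtMillis.set(System.currentTimeMillis() + ttlMillis);
    }

    /** Starts periodic renewal; the interval should be shorter than the TTL. */
    void start(long intervalMillis) {
        scheduler.scheduleAtFixedRate(this::renew, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    long remainingMillis() {
        return expiresAtMillis.get() - System.currentTimeMillis();
    }

    /** Case 1 in the text: the holder finished, so renewal is switched off explicitly. */
    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

      Using a daemon thread covers case 2 as well: if the process dies, the renewer dies with it and the lease simply runs out.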

      Solution 2: Distributed lock based on setnx, get, getset

      1. Implementation principle:

      (1) setnx(lockkey, current time + lock timeout). If 1 is returned, the lock has been acquired; if 0 is returned, the lock has not been acquired, so go to step (2).

      (2) get(lockkey) to obtain the value oldExpireTime and compare it with the current system time. If it is less than the current system time, the lock is considered to have timed out and may be acquired by another request, so go to step (3).

      (3) Compute newExpireTime = current time + lock timeout, then call getset(lockkey, newExpireTime), which returns the current value of lockkey, currentExpireTime.

      (4) Check whether currentExpireTime equals oldExpireTime. If they are equal, this getset took effect first and the lock has been acquired. If they differ, another request acquired the lock in between; the current request can either return failure directly or keep retrying.

      (5) After acquiring the lock, the current thread runs its business logic. When it finishes, it compares its processing time with the lock timeout. If the processing time is less than the timeout, it executes del to release the lock (before releasing, it must check that the thread holding the lock is the current thread). If the processing time is greater than the timeout, the lock has already expired and the thread does not need to release it.

      2. Code implementation:

      (1) Implementation method of obtaining lock:

      // threadLock, redisHelper, mutex, timeout, sleepTime and setLockStatus(...) belong to the
      // enclosing lock class, which the article does not show.
      public boolean lock(long acquireTimeout, TimeUnit timeUnit) throws InterruptedException {
          acquireTimeout = timeUnit.toMillis(acquireTimeout);
          long acquireTime = acquireTimeout + System.currentTimeMillis();
          // Use J.U.C's ReentrantLock so that only one local thread competes for the Redis lock.
          // acquireTimeout is already in milliseconds, so the unit must be MILLISECONDS.
          if (!threadLock.tryLock(acquireTimeout, TimeUnit.MILLISECONDS)) {
              return false;
          }
          try {
              // Keep trying until the lock is acquired or the deadline passes
              while (true) {
                  boolean hasLock = tryLock();
                  if (hasLock) {
                      // Lock acquired
                      return true;
                  } else if (acquireTime < System.currentTimeMillis()) {
                      break;
                  }
                  Thread.sleep(sleepTime);
              }
          } finally {
              if (threadLock.isHeldByCurrentThread()) {
                  threadLock.unlock();
              }
          }
          return false;
      }

      public boolean tryLock() {
          long currentTime = System.currentTimeMillis();
          String expires = String.valueOf(timeout + currentTime);
          // Try to set the mutex
          if (redisHelper.setNx(mutex, expires) > 0) {
              // Lock acquired; record its expiration time
              setLockStatus(expires);
              return true;
          } else {
              String currentLockTime = redisHelper.get(mutex);
              // Check whether the existing lock has timed out
              if (Objects.nonNull(currentLockTime) && Long.parseLong(currentLockTime) < currentTime) {
                  // Set the new expiration time and fetch the old one atomically
                  String oldLockTime = redisHelper.getSet(mutex, expires);
                  // If the old value is unchanged, no other request got in first
                  if (Objects.nonNull(oldLockTime) && Objects.equals(oldLockTime, currentLockTime)) {
                      // Lock acquired; record its expiration time
                      setLockStatus(expires);
                      return true;
                  }
              }
              return false;
          }
      }

      The main logic is as follows: the lock method takes the acquisition timeout and its unit as parameters and calls tryLock in a loop. Within the timeout, the calling thread keeps spinning (sleeping between attempts) until the current holder releases the lock or the timeout elapses.

      (2) Implementation of releasing locks:

      public boolean unlock() {
          // Only the thread holding the lock may unlock
          if (lockHolder == Thread.currentThread()) {
              // If the lock has not timed out yet, delete the mutex
              if (lockExpiresTime > System.currentTimeMillis()) {
                  redisHelper.del(mutex);
                  logger.info("Delete the mutex[{}]", mutex);
              }
              lockHolder = null;
              logger.info("Release [{}] Lock successfully", mutex);
              return true;
          } else {
              throw new IllegalMonitorStateException("The thread that has not obtained the lock cannot perform the unlocking operation");
          }
      }

      Problems:

      (1) The core of this lock is System.currentTimeMillis(). If the clocks of the servers are inconsistent, the lock misbehaves. This can be avoided at the operations level by keeping server clocks synchronized, and when server clocks do differ, any time-dependent logic is affected, not only this lock.

      (2) If several servers request the lock just as the previous lock times out, they all execute getset at about the same time, and the expiration time is overwritten. This does not affect correctness, however: only the request whose getset returned the old value acquires the lock.

      (3) Several threads can hold the lock at the same time: if thread A's task runs longer than the lock's expiration time, another thread can acquire the lock, so both hold it at once. As in Solution 1, this can be solved with "lock renewal".

      Problems with the first two Redis distributed locks

      Both of the Redis distributed locks above still fall short in terms of high availability. If Redis is a single point and it fails, distributed locking becomes unavailable for the entire business.

      To improve availability, master-slave mode or sentinel mode can be used, but a problem remains. In either mode, when a lock is acquired successfully on the master node, it is replicated to the corresponding slave node asynchronously. If the master crashes before replication completes and a failover promotes the slave to master, the new master never received the lock from the old master. The lock is lost, and multiple clients can hold the same lock at the same time.
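      The race can be reproduced with a toy model of asynchronous replication. This is a deliberately simplified sketch: FailoverDemo is a hypothetical class, two in-memory maps stand in for the master and the slave, and replication only happens when replicate() is called.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of asynchronous master-to-slave replication; illustrates the race, not real Redis. */
final class FailoverDemo {
    private Map<String, String> master = new HashMap<>();
    private final Map<String, String> replica = new HashMap<>();

    /** SET NX against the current master; the write is NOT yet copied to the replica. */
    boolean tryLock(String key, String owner) {
        return master.putIfAbsent(key, owner) == null;
    }

    /** Asynchronous replication catching up. */
    void replicate() {
        replica.putAll(master);
    }

    /** The master crashes; the replica is promoted with whatever it had received so far. */
    void failover() {
        master = new HashMap<>(replica);
    }
}
```

      If client A locks, the master fails over before replicate() runs, and client B then locks, both calls return true: two clients believe they hold the same lock.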

      So how can this situation be avoided? Redis officially provides a highly available distributed lock solution based on several independent Redis deployments: RedLock. It is introduced in detail in Solution 3. (Note: if the business can tolerate multiple clients holding the lock at the same time while the master node is down, RedLock is not required.)

      Solution 3: RedLock-based distributed lock

      redLock official document address:
      https://redis.io/topics/distlock

      The Redlock algorithm is a high-availability mode that the Redis author antirez built on top of the single-node lock. Redlock should be implemented together with the single-node distributed lock algorithm, because that algorithm is its foundation.

      1. Locking implementation principle:

      Assume there are 5 Redis master nodes (an odd number of at least 3), which practically ensures that they will not all go down at the same time. To acquire and release the lock, the client performs the following operations:

      (1) Get the current Unix time in milliseconds and set the lock timeout TTL.

      The TTL must be greater than: normal business execution time + time spent acquiring the lock on all Redis nodes + clock drift.

      (2) Try to acquire the lock on the 5 instances in turn, using the same key and the same unique value. When requesting the lock from a Redis instance, the client should set a network connection and response timeout that is shorter than the lock's TTL, so that it does not wait indefinitely on a node that is down. For example, if the TTL is 5 s, set the per-node timeout to at most 1 s; if the lock cannot be acquired on a node within 1 s, give up on that node and try the next one.

      (3) The client takes the time after all acquirable locks have been obtained and subtracts the time from step (1) to get the lock acquisition time. The lock counts as acquired only if the acquisition time is less than the TTL and the lock was obtained on more than half of the Redis nodes.

      (4) If the lock is acquired, the true validity time of the key = TTL - lock acquisition time - clock drift. For example, if the TTL is 5 s and acquiring all the locks took 2 s, the lock is actually valid for 3 s.

      (5) If acquisition fails for any reason (the lock was not obtained on more than half of the instances, or the acquisition time exceeded the validity time), the client should unlock on all Redis instances, regardless of whether locking succeeded on a given instance, because a node may have set the lock even though its response was lost.

      Imagine that a client's lock request reaches a Redis node and the node performs the SET successfully, but the response packet it sends back is lost. From the client's point of view the request failed with a timeout, but from Redis's point of view the lock has been set. Therefore, when releasing the lock, the client should also send the release request to the nodes on which acquisition appeared to fail. This situation is possible in an asynchronous communication model: communication from client to server works normally, but there is a problem in the opposite direction.

      (6) Retry on failure: when the client cannot acquire the lock, it should retry after a random delay, and the number of retries must be limited.

      Retrying after a random delay mainly prevents many clients from trying to acquire the lock at the same moment, which would leave none of them with a majority.
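      Steps (2) to (5) can be sketched over in-memory nodes. This is a simplified illustration rather than the official algorithm text: RedlockSketch and its members are hypothetical names, per-node timeouts are modeled by an "available" flag, and the elapsed acquisition time and clock drift are passed in as parameters so that the arithmetic of step (4) is visible.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

/** Simplified Redlock acquisition over in-memory nodes; timing values are supplied by the caller. */
final class RedlockSketch {
    /** One independent node; available=false models a node that is down or timing out. */
    static final class Node {
        final ConcurrentHashMap<String, String> data = new ConcurrentHashMap<>();
        boolean available = true;

        boolean setNx(String key, String token) {
            return available && data.putIfAbsent(key, token) == null;
        }

        /** Removes the key only if it still carries this client's token. */
        void releaseIfOwner(String key, String token) {
            data.remove(key, token);
        }
    }

    /** Returns the remaining validity in ms on success, or -1 after releasing on every node. */
    static long acquire(List<Node> nodes, String key, String token,
                        long ttlMillis, long elapsedMillis, long driftMillis) {
        int locked = 0;
        for (Node node : nodes) {
            if (node.setNx(key, token)) {
                locked++;
            }
        }
        long validity = ttlMillis - elapsedMillis - driftMillis;  // step (4)
        boolean majority = locked >= nodes.size() / 2 + 1;        // step (3)
        if (majority && validity > 0) {
            return validity;
        }
        for (Node node : nodes) {                                 // step (5): unlock everywhere
            node.releaseIfOwner(key, token);
        }
        return -1;
    }
}
```

      With 5 nodes of which 3 respond, a TTL of 5000 ms and 2000 ms spent acquiring, the call returns 3000, which matches the example in step (4).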

      2. RedLock performance and crash recovery:

      RedLock keeps working as long as a majority of the N Redis nodes work normally, so in theory its availability is higher. The failover safety problem described earlier for a single master no longer exists with RedLock, but a node that crashes and restarts can still affect the safety of the lock. The exact impact depends on the Redis persistence configuration:

      (1) If Redis has no persistence and the nodes restart after clientA has acquired the lock, clientB can acquire the same lock again, which violates the mutual exclusion of the lock.

      (2) With AOF persistence enabled, things are better. Redis expiration is based on Unix timestamps, so after a restart a restored key still expires at the originally specified time, which does not affect the business. However, the default AOF policy syncs to disk once per second, so a power failure within that second loses data, and an immediate restart can then break the mutual exclusion of the lock. If the sync policy is set to always (every write command is synced to disk), performance drops sharply. A trade-off therefore has to be made between lock safety and performance.

      (3) To keep locks safe without giving up performance, antirez proposed the concept of a "delayed restart". AOF syncing stays at the default of once per second. After a Redis node crashes (whether one node or all of them), it is not restarted immediately; instead, wait for at least the TTL before restarting. Any lock the node took part in will have expired by then, so the restart cannot affect existing locks. The drawback is that the node is effectively out of service for the TTL period.

      3. Implementation of RedLock in Redisson:

      The Java Redisson package provides an encapsulation of RedLock, implemented mainly through a Redis client and Lua scripts. Lua scripts are used so that the ownership check and the unlock are executed atomically.

      (1) Generation of the unique ID:

      In a distributed lock, the storage node (the central node) must be able to identify the holder of the lock so that the lock cannot be released by a non-holder. Each client node that initiates a request therefore needs a globally unique id. A UUID is usually used for this, and Redisson does the same. On top of that, Redisson appends the thread id, which avoids the performance cost of every thread generating its own UUID.

      protected final UUID id = UUID.randomUUID();

      String getLockName(long threadId) {
          return id + ":" + threadId;
      }

      (2) Locking logic:

      The core locking code in Redisson is easy to follow: it takes the TTL and the unique id and requests the lock for that period of time. The following is the implementation logic of the reentrant lock:

      <T> RFuture<T> tryLockInnerAsync(long leaseTime, TimeUnit unit, long threadId, RedisStrictCommand<T> command) {
          internalLockLeaseTime = unit.toMillis(leaseTime);
          // Command sent to the 5 Redis instances when acquiring the lock
          return commandExecutor.evalWriteAsync(getName(), LongCodec.INSTANCE, command,
                  // Check whether the KEY of the distributed lock already exists. If it does not, execute
                  // hset (hset REDLOCK_KEY uuid+threadId 1) and set the expiration time (the lease time of
                  // the lock) through pexpire
                  "if (redis.call('exists', KEYS[1]) == 0) then " +
                      "redis.call('hset', KEYS[1], ARGV[2], 1); " +
                      "redis.call('pexpire', KEYS[1], ARGV[1]); " +
                      "return nil; " +
                  "end; " +
                  // If the KEY already exists, check the unique id. If it matches, the lock is held by the
                  // current thread: increase the reentry count by 1 and reset the expiration time
                  "if (redis.call('hexists', KEYS[1], ARGV[2]) == 1) then " +
                      "redis.call('hincrby', KEYS[1], ARGV[2], 1); " +
                      "redis.call('pexpire', KEYS[1], ARGV[1]); " +
                      "return nil; " +
                  "end; " +
                  // Otherwise return the remaining time to live of the lock's KEY
                  "return redis.call('pttl', KEYS[1]);",
                  // KEYS[1] is the key of the distributed lock; ARGV[1] is the TTL; ARGV[2] is the unique id
                  Collections.<Object>singletonList(getName()), internalLockLeaseTime, getLockName(threadId));
      }

      (3) Release lock logic:

      protected RFuture<Boolean> unlockInnerAsync(long threadId) {
          // Execute the following command on all 5 Redis instances
          return commandExecutor.evalWriteAsync(getName(), LongCodec.INSTANCE, RedisCommands.EVAL_BOOLEAN,
                  // If the distributed lock KEY does not exist, publish a message to the channel
                  "if (redis.call('exists', KEYS[1]) == 0) then " +
                      "redis.call('publish', KEYS[2], ARGV[1]); " +
                      "return 1; " +
                  "end;" +
                  // If the lock exists but the unique id does not match, the lock is held by another client
                  "if (redis.call('hexists', KEYS[1], ARGV[3]) == 0) then " +
                      "return nil;" +
                  "end; " +
                  // The current thread holds the lock: decrease the reentry count by 1
                  "local counter = redis.call('hincrby', KEYS[1], ARGV[3], -1); " +
                  // If the count is still greater than 0, the lock was re-entered: only reset the
                  // expiration time, do not delete
                  "if (counter > 0) then " +
                      "redis.call('pexpire', KEYS[1], ARGV[2]); " +
                      "return 0; " +
                  "else " +
                      // If the count has dropped to 0, delete the lock and publish the unlock message
                      "redis.call('del', KEYS[1]); " +
                      "redis.call('publish', KEYS[2], ARGV[1]); " +
                      "return 1; "+
                  "end; " +
                  "return nil;",
                  // KEYS[1] is the key of the lock, KEYS[2] is the channel name, ARGV[1] is the unlock
                  // message, ARGV[2] is the TTL, ARGV[3] is the unique id
                  Arrays.<Object>asList(getName(), getChannelName()), LockPubSub.unlockMessage, internalLockLeaseTime, getLockName(threadId));
      }
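      The bookkeeping that the two scripts perform (one hash per lock, with the holder id as the field and the reentry count as the value) can be mirrored in memory. The sketch below is an illustrative model only: ReentrantLockModel is a hypothetical class, and TTL handling and the publish step are left out to keep the counting logic visible.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory model of the reentrant-lock scripts: field = holder id, value = hold count. */
final class ReentrantLockModel {
    private final Map<String, Map<String, Integer>> hashes = new HashMap<>();

    /** Mirrors the lock script: true on success, false if another holder owns the lock. */
    synchronized boolean tryLock(String key, String holderId) {
        Map<String, Integer> hash = hashes.get(key);
        if (hash == null) {                          // exists == 0 -> hset key holderId 1
            hash = new HashMap<>();
            hash.put(holderId, 1);
            hashes.put(key, hash);
            return true;
        }
        if (hash.containsKey(holderId)) {            // hexists == 1 -> hincrby key holderId 1
            hash.merge(holderId, 1, Integer::sum);
            return true;
        }
        return false;                                // the script would return the remaining pttl
    }

    /** Mirrors the unlock script: null = not the holder, false = still held, true = released. */
    synchronized Boolean unlock(String key, String holderId) {
        Map<String, Integer> hash = hashes.get(key);
        if (hash == null) {
            return true;                             // key already gone
        }
        if (!hash.containsKey(holderId)) {
            return null;                             // held by someone else
        }
        int counter = hash.merge(holderId, -1, Integer::sum);
        if (counter > 0) {
            return false;                            // re-entered: keep the lock
        }
        hashes.remove(key);                          // del key
        return true;
    }
}
```

      A holder that locks twice must unlock twice before another holder can acquire the lock, which is the reentrant behavior the hash counter provides.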

      (4) Use of RedLock in redisson:

      Config config = new Config();
      config.useSentinelServers()
              .addSentinelAddress("127.0.0.1:6369", "127.0.0.1:6379", "127.0.0.1:6389")
              .setMasterName("masterName")
              .setPassword("password")
              .setDatabase(0);
      RedissonClient redissonClient = Redisson.create(config);
      RLock redLock = redissonClient.getLock("REDLOCK_KEY");
      try {
          // Try to lock: wait for up to 500 ms, and unlock automatically 10 seconds after locking
          boolean isLock = redLock.tryLock(500, 10000, TimeUnit.MILLISECONDS);
          if (isLock) {
              // Lock acquired; execute the corresponding business logic
          }
      } catch (Exception e) {
          e.printStackTrace();
      } finally {
          // unlock() throws IllegalMonitorStateException if the current thread does not hold the lock
          if (redLock.isHeldByCurrentThread()) {
              redLock.unlock();
          }
      }

      As can be seen, the Redisson implementation verifies the client's identity inside the Lua script when unlocking, so no extra identity check has to be written by hand. One caveat: calling unlock() on a lock that the current thread does not hold throws IllegalMonitorStateException, so the finally block should still check isHeldByCurrentThread() before unlocking. With that guard in place, it is close to usable out of the box. Note also that getLock returns Redisson's ordinary reentrant RLock on the Sentinel-managed master; Redisson's RedLock variant (RedissonRedLock) is built from several RLock objects obtained from independent Redis instances.

      Similarly, a distributed lock based on RedLock still has the problem that the client may not finish its business logic within the TTL after acquiring the lock. The lock is then released automatically, and multiple threads can hold the lock at the same time. Redisson takes this problem into account: it provides a "watchdog" feature that keeps extending the lifetime of the lock key when the lock is about to expire and has not been released. (The implementation principle is introduced in Scheme 4.)

      Scheme 4: Distributed lock based on the Redisson watchdog

      As mentioned earlier, if for some reason the thread holding the lock has not completed its task within the lock's expiration time and the lock is released automatically because it has timed out, multiple threads end up holding the lock at the same time. To solve this problem, "lock renewal" can be performed. The Java Redisson package has a "watchdog" mechanism that already implements this.

      1. Redisson principle:

      After acquiring the lock, Redisson maintains a watchdog thread. When the lock is about to expire and has not been released, the watchdog keeps extending the lifetime of the lock key.

      2. Locking mechanism:

      When a thread tries to acquire the lock and succeeds: the Lua script is executed and the data is saved to the Redis database.

      When a thread tries to acquire the lock and fails: it keeps trying to acquire the lock in a while loop. Once the acquisition succeeds, the Lua script is executed and the data is saved to the Redis database.

      3. Watchdog automatic renewal mechanism:

      A running watchdog has some impact on overall performance. The watchdog is only active when the lock is acquired without an explicit expiration (lease) time; if an expiration time is passed when locking with Redisson, the watchdog mechanism does not take effect for that lock.

      After acquiring the lock, Redisson maintains a watchdog thread. Each time one third of the lock's expiration time has elapsed, if the thread has not yet completed its task, the watchdog extends the validity period of the lock. The watchdog's lock timeout is 30 seconds by default and can be changed through the lockWatchdogTimeout parameter.

      The default lock time is 30 seconds. If the locked business logic has not finished, a renewal is performed every 30 ÷ 3 = 10 seconds, resetting the lock to 30 seconds so that it does not expire before it is unlocked.

      What if the business machine goes down? If it crashes, the watchdog thread can no longer run and the lock is no longer renewed, so the lock expires on its own at most 30 seconds later.
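      The timeline arithmetic can be written out as a small calculation. This is a sketch under simplifying assumptions, not Redisson code: WatchdogTimeline is a hypothetical class, times are whole seconds since the lock was acquired, and renewals are assumed to fire exactly on each interval boundary.

```java
/** Arithmetic behind the watchdog timeline; times are seconds since the lock was acquired. */
final class WatchdogTimeline {
    /** Renewal interval: one third of the watchdog timeout (30 s -> 10 s). */
    static long renewIntervalSeconds(long watchdogTimeoutSeconds) {
        return watchdogTimeoutSeconds / 3;
    }

    /**
     * Time at which the key expires if the holder's process dies at crashAtSeconds:
     * the last renewal before the crash reset the TTL to the full timeout.
     */
    static long expiryAfterCrashSeconds(long watchdogTimeoutSeconds, long crashAtSeconds) {
        long interval = renewIntervalSeconds(watchdogTimeoutSeconds);
        long lastRenewal = (crashAtSeconds / interval) * interval; // 0 = only the initial acquisition
        return lastRenewal + watchdogTimeoutSeconds;
    }
}
```

      For a 30-second timeout and a crash at second 25, the last renewal happened at second 20, so the key expires at second 50, which is 25 seconds after the crash and within the 30-second bound.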

      4. Key points of the Redisson distributed lock:

      a. No expiration time is specified for the key when locking; instead, Redisson maintains a watchdog after the lock is acquired. The watchdog monitors the lock and renews it automatically when it has not been released and is about to expire, ensuring that the lock does not expire before it is unlocked.

      b. Locking and unlocking are made atomic through Lua scripts. By recording the client id of the lock holder, each lock attempt can determine whether the current client already holds the lock, which makes the lock reentrant.

      5. Redisson usage:

      Solution 3 already showed a usage example based on Redisson. Redisson also provides reentrant locks, fair locks, MultiLocks, RedLocks, ReadWriteLocks, semaphores, PermitExpirableSemaphores, CountDownLatch, and more. For details, refer to the official Redisson documentation on distributed locks and synchronizers.

      Attachment: RedLock's official document translation

      Author: Zhang Weipeng

      Original link:
      https://blog.csdn.net/a745233700/article/details/88084219
