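I. Locks in PostgreSQL
PostgreSQL uses three kinds of locks, chosen according to the object being protected and the usage scenario: SpinLock, LWLock, and Lock.
1. SpinLock
A SpinLock (spin lock) is a mechanism for protecting shared resources under concurrency (multiple processes or threads). It is the cheapest of the three to implement and is normally built on a hardware TAS (test-and-set) operation. Its distinguishing feature is that the process requesting the lock keeps retrying, and can only obtain the lock after the current holder releases it. While waiting, the process does not enter the kernel and sleep; it busy-waits, looping ("spinning") until the lock becomes available again, so it keeps consuming CPU in user mode. The lock has a single mode: exclusive.
It therefore suits cases where the lock is held only briefly and the critical section is short and performs only simple accesses to the shared resource.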
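To make the busy-wait idea concrete, here is a minimal sketch of a test-and-set spin lock written with C11 atomics and POSIX threads. It is not PostgreSQL code: the names demo_spinlock, demo_spin_lock, demo_spin_unlock, and demo_run are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical type for illustration; PostgreSQL's own type is slock_t. */
typedef struct { atomic_flag held; } demo_spinlock;

static void demo_spin_init(demo_spinlock *l)
{
    atomic_flag_clear(&l->held);
}

static void demo_spin_lock(demo_spinlock *l)
{
    /* test-and-set returns the previous value: true means someone holds it */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;                       /* busy-wait in user space, no sleeping */
}

static void demo_spin_unlock(demo_spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

static demo_spinlock demo_counter_lock;
static long demo_counter;

static void *demo_worker(void *arg)
{
    long n = *(long *) arg;

    for (long i = 0; i < n; i++)
    {
        demo_spin_lock(&demo_counter_lock);
        demo_counter++;         /* short critical section: a single increment */
        demo_spin_unlock(&demo_counter_lock);
    }
    return NULL;
}

/* Start nthreads workers that each add iters to the shared counter. */
static long demo_run(int nthreads, long iters)
{
    pthread_t tids[16];

    assert(nthreads > 0 && nthreads <= 16);
    demo_spin_init(&demo_counter_lock);
    demo_counter = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, demo_worker, &iters);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return demo_counter;
}
```

Calling demo_run(4, 100000) should return 400000, because every increment happens while the lock is held. The critical section is a single increment, which is the kind of work a spin lock is meant for.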
How PostgreSQL uses SpinLock:
1. TAS via the CPU instruction set:
/*
* s_lock(lock) - platform-independent portion of waiting for a spinlock.
*/
int
s_lock(volatile slock_t *lock, const char *file, int line, const char *func)
{
SpinDelayStatus delayStatus;
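// Initialize the bookkeeping used while spinning on this lock
init_spin_delay(&delayStatus, file, line, func);
while (TAS_SPIN(lock)) // TAS is invoked here
{
// Each failed attempt inserts a CPU-level delay; once the number of spins
// exceeds 100, this function sleeps for a random interval between 1ms and 1s
perform_spin_delay(&delayStatus);
}
// After the lock is obtained, tune the spin count that triggers a sleep:
// if we never slept while acquiring it, the threshold can be raised; if we did
// sleep, contention is high, so the threshold is lowered to reduce CPU consumption.
finish_spin_delay(&delayStatus);
return delayStatus.delays;
}
Implementation of the TAS function: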
#ifdef __x86_64__ /* AMD Opteron, Intel EM64T */
#define HAS_TEST_AND_SET
typedef unsigned char slock_t;
#define TAS(lock) tas(lock)
/*
* On Intel EM64T, it's a win to use a non-locking test before the xchg proper,
* but only when spinning.
*
* See also Implementing Scalable Atomic Locks for Multi-Core Intel(tm) EM64T
* and IA32, by Michael Chynoweth and Mary R. Lee. As of this writing, it is
* available at:
* http://software.intel.com/en-us/articles/implementing-scalable-atomic-locks-for-multi-core-intel-em64t-and-ia32-architectures
*/
#define TAS_SPIN(lock) (*(lock) ? 1 : TAS(lock))
static __inline__ int
tas(volatile slock_t *lock)
{
register slock_t _res = 1;
__asm__ __volatile__(
" lock \n"
" xchgb %0,%1 \n"
: "+q"(_res), "+m"(*lock)
: /* no inputs */
: "memory", "cc");
return (int) _res;
}
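The two ideas above, a plain read before the atomic exchange (TAS_SPIN) and a sleep once spinning has gone on too long (perform_spin_delay), can be sketched in portable C11. This is a simplified illustration, not the PostgreSQL implementation: all demo_ names are hypothetical, the threshold and delay bounds are taken from the description above, and the delay simply doubles here rather than growing by a random factor.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <time.h>

#define DEMO_SPINS_PER_DELAY 100        /* assumed spin threshold */
#define DEMO_MIN_DELAY_USEC  1000L      /* 1 ms */
#define DEMO_MAX_DELAY_USEC  1000000L   /* 1 s */

typedef struct
{
    int  spins;             /* failed attempts since the last sleep */
    int  delays;            /* number of sleeps so far */
    long cur_delay_usec;    /* length of the next sleep */
} demo_delay_status;

static atomic_uchar demo_lock;

/* TAS convention: returns 0 on success, nonzero if the lock was already held */
static int demo_tas(atomic_uchar *lock)
{
    return atomic_exchange_explicit(lock, 1, memory_order_acquire);
}

/* Test-and-test-and-set: a cheap read first, the atomic exchange only if free */
static int demo_tas_spin(atomic_uchar *lock)
{
    return atomic_load_explicit(lock, memory_order_relaxed) ? 1 : demo_tas(lock);
}

static void demo_spin_delay(demo_delay_status *st)
{
    struct timespec ts;

    if (++st->spins < DEMO_SPINS_PER_DELAY)
        return;                         /* keep spinning for now */

    ts.tv_sec = st->cur_delay_usec / 1000000L;
    ts.tv_nsec = (st->cur_delay_usec % 1000000L) * 1000L;
    nanosleep(&ts, NULL);
    st->delays++;

    /* Simplification: double the delay and wrap around at the maximum */
    st->cur_delay_usec *= 2;
    if (st->cur_delay_usec > DEMO_MAX_DELAY_USEC)
        st->cur_delay_usec = DEMO_MIN_DELAY_USEC;
    st->spins = 0;
}

/* Acquire the lock; returns how many times we had to sleep */
static int demo_s_lock(atomic_uchar *lock)
{
    demo_delay_status st = {0, 0, DEMO_MIN_DELAY_USEC};

    while (demo_tas_spin(lock))
        demo_spin_delay(&st);
    return st.delays;
}

static void demo_s_unlock(atomic_uchar *lock)
{
    atomic_store_explicit(lock, 0, memory_order_release);
}
```

The read in demo_tas_spin avoids issuing a bus-locking exchange while the lock is visibly held, which is the point of the Intel note quoted in the comment above.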
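2. Implementation with semaphores
If the platform the database runs on has no test-and-set instruction, SpinLock is implemented on top of PGSemaphore. By default PostgreSQL sets aside 128 semaphores for this purpose (the NUM_SPINLOCK_SEMAPHORES bound in the code below). For comparison, cat /proc/sys/kernel/sem on Linux shows the kernel's System V semaphore limits; on many systems the last field, the maximum number of semaphore sets, has historically defaulted to 128 as well. The semaphore-based locking logic is as follows: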
int
tas_sema(volatile slock_t *lock)
{
int lockndx = *lock;
if (lockndx <= 0 || lockndx > NUM_SPINLOCK_SEMAPHORES)
elog(ERROR, "invalid spinlock number: %d", lockndx);
/* Note that TAS macros return 0 if *success* */
return !PGSemaphoreTryLock(SpinlockSemaArray[lockndx - 1]);
}
/*
* PGSemaphoreTryLock
*
* Lock a semaphore only if able to do so without blocking
*/
bool
PGSemaphoreTryLock(PGSemaphore sema)
{
int errStatus;
/*
* Note: if errStatus is -1 and errno == EINTR then it means we returned
* from the operation prematurely because we were sent a signal. So we
* try and lock the semaphore again.
*/
do
{
errStatus = sem_trywait(PG_SEM_REF(sema));
} while (errStatus < 0 && errno == EINTR);
if (errStatus < 0)
{
if (errno == EAGAIN || errno == EDEADLK)
return false; /* failed to lock it */
/* Otherwise we got trouble */
elog(FATAL, "sem_trywait failed: %m");
}
return true;
}
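The same "try without blocking" idea can be reproduced with a plain POSIX semaphore. The sketch below is an illustration under assumptions, not PostgreSQL code: the demo_ names are hypothetical, and the semaphore is process-private (pshared = 0), whereas PostgreSQL's semaphores live in shared memory and are shared between processes.

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <stdbool.h>

static sem_t demo_sema;

/* A semaphore with initial value 1 behaves like an unlocked lock. */
static int demo_sema_init(sem_t *sema)
{
    return sem_init(sema, 0, 1);
}

/* Lock the semaphore only if that is possible without blocking. */
static bool demo_sema_try_lock(sem_t *sema)
{
    int rc;

    do
    {
        rc = sem_trywait(sema);
    } while (rc < 0 && errno == EINTR);     /* interrupted by a signal: retry */

    return rc == 0;                         /* EAGAIN means it is already held */
}

/* TAS convention: 0 means success, nonzero means the lock was busy. */
static int demo_tas_sema(sem_t *sema)
{
    return !demo_sema_try_lock(sema);
}

static void demo_sema_unlock(sem_t *sema)
{
    sem_post(sema);
}
```

Because demo_tas_sema follows the same return convention as a hardware TAS, a caller can spin on it in exactly the same loop as in s_lock.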
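Because a SpinLock must not be held for long, PostgreSQL uses SpinLocks mainly to control concurrent access to critical variables; the critical sections they protect are usually simple assignments, reads, and the like.
2. LWLock
LWLock stands for Lightweight Lock; "lightweight" is relative to the third kind, Lock. Earlier implementations were built on top of a SpinLock, while the version shown here keeps the lock state in an atomic variable (the state field below). Besides exclusive (mutual exclusion) mode, it adds a shared mode and one special mode.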
typedef enum LWLockMode
{
LW_EXCLUSIVE,
LW_SHARED,
LW_WAIT_UNTIL_FREE /* A special mode used in PGPROC->lwlockMode,
* when waiting for lock to become free. Not
* to be used as LWLockAcquire argument */
} LWLockMode;
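LWLocks mainly protect shared-memory data structures by serializing access to them, for example the Clog buffers (transaction commit status cache), shared buffers (data page cache), and WAL buffers (WAL cache).
The LWLock data structure is defined as: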
typedef struct LWLock
{
uint16 tranche; /* tranche ID */
pg_atomic_uint32 state; /* state of exclusive/nonexclusive lockers */
proclist_head waiters; /* list of waiting PGPROCs */
#ifdef LOCK_DEBUG
pg_atomic_uint32 nwaiters; /* number of waiters */
struct PGPROC *owner; /* last exclusive owner of the lock */
#endif
} LWLock;
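How a single 32-bit state word can carry both an exclusive flag and a count of shared holders can be sketched as follows. The bit layout and every demo_ name are assumptions made for this illustration (one bit for the exclusive holder, the low bits as a shared-holder count); they are modeled loosely on the state field above and are not copied from lwlock.c.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: bit 24 = exclusive holder, bits 0..23 = shared holder count */
#define DEMO_VAL_EXCLUSIVE ((uint32_t) 1 << 24)
#define DEMO_VAL_SHARED    ((uint32_t) 1)
#define DEMO_LOCK_MASK     (((uint32_t) 1 << 25) - 1)

typedef enum { DEMO_EXCLUSIVE, DEMO_SHARED } demo_mode;

static _Atomic uint32_t demo_state;

/*
 * Try to take the lock once, without waiting.
 * Returns true if the caller would have to wait, false if the lock was taken.
 */
static bool demo_attempt_lock(_Atomic uint32_t *state, demo_mode mode)
{
    uint32_t old = atomic_load(state);

    for (;;)
    {
        uint32_t desired = old;
        bool     lock_free;

        if (mode == DEMO_EXCLUSIVE)
        {
            /* exclusive needs no holders of any kind */
            lock_free = (old & DEMO_LOCK_MASK) == 0;
            if (lock_free)
                desired += DEMO_VAL_EXCLUSIVE;
        }
        else
        {
            /* shared only needs the absence of an exclusive holder */
            lock_free = (old & DEMO_VAL_EXCLUSIVE) == 0;
            if (lock_free)
                desired += DEMO_VAL_SHARED;
        }

        /* On failure 'old' is refreshed with the current value and we retry */
        if (atomic_compare_exchange_weak(state, &old, desired))
            return !lock_free;
    }
}

static void demo_release(_Atomic uint32_t *state, demo_mode mode)
{
    atomic_fetch_sub(state,
                     mode == DEMO_EXCLUSIVE ? DEMO_VAL_EXCLUSIVE : DEMO_VAL_SHARED);
}
```

Two shared requests succeed one after the other, an exclusive request then reports that it must wait, and once both shared holders release, the exclusive request succeeds.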
Depending on where they are used, PostgreSQL groups LWLocks into a number of submodules, called tranches:
/*
* Every tranche ID less than NUM_INDIVIDUAL_LWLOCKS is reserved; also,
* we reserve additional tranche IDs for builtin tranches not included in
* the set of individual LWLocks. A call to LWLockNewTrancheId will never
* return a value less than LWTRANCHE_FIRST_USER_DEFINED.
*/
typedef enum BuiltinTrancheIds
{
LWTRANCHE_CLOG_BUFFERS = NUM_INDIVIDUAL_LWLOCKS,
LWTRANCHE_COMMITTS_BUFFERS,
LWTRANCHE_SUBTRANS_BUFFERS,
LWTRANCHE_MXACTOFFSET_BUFFERS,
LWTRANCHE_MXACTMEMBER_BUFFERS,
LWTRANCHE_ASYNC_BUFFERS,
LWTRANCHE_OLDSERXID_BUFFERS,
LWTRANCHE_WAL_INSERT,
LWTRANCHE_BUFFER_CONTENT,
LWTRANCHE_BUFFER_IO_IN_PROGRESS,
LWTRANCHE_REPLICATION_ORIGIN,
LWTRANCHE_REPLICATION_SLOT_IO_IN_PROGRESS,
LWTRANCHE_PROC,
LWTRANCHE_BUFFER_MAPPING,
LWTRANCHE_LOCK_MANAGER,
LWTRANCHE_PREDICATE_LOCK_MANAGER,
LWTRANCHE_PARALLEL_HASH_JOIN,
LWTRANCHE_PARALLEL_QUERY_DSA,
LWTRANCHE_SESSION_DSA,
LWTRANCHE_SESSION_RECORD_TABLE,
LWTRANCHE_SESSION_TYPMOD_TABLE,
LWTRANCHE_SHARED_TUPLESTORE,
LWTRANCHE_TBM,
LWTRANCHE_PARALLEL_APPEND,
LWTRANCHE_FIRST_USER_DEFINED
} BuiltinTrancheIds;
const char *const MainLWLockNames[] = {
"<unassigned:0>",
"ShmemIndexLock",
"OidGenLock",
"XidGenLock",
"ProcArrayLock",
"SInvalReadLock",
"SInvalWriteLock",
"WALBufMappingLock",
"WALWriteLock",
"ControlFileLock",
"CheckpointLock",
"CLogControlLock",
"SubtransControlLock",
"MultiXactGenLock",
"MultiXactOffsetControlLock",
"MultiXactMemberControlLock",
"RelCacheInitLock",
"CheckpointerCommLock",
"TwoPhaseStateLock",
"TablespaceCreateLock",
"BtreeVacuumLock",
"AddinShmemInitLock",
"AutovacuumLock",
"AutovacuumScheduleLock",
"SyncScanLock",
"RelationMappingLock",
"AsyncCtlLock",
"AsyncQueueLock",
"SerializableXactHashLock",
"SerializableFinishedListLock",
"SerializablePredicateLockListLock",
"OldSerXidLock",
"SyncRepLock",
"BackgroundWorkerLock",
"DynamicSharedMemoryControlLock",
"AutoFileLock",
"ReplicationSlotAllocationLock",
"ReplicationSlotControlLock",
"CommitTsControlLock",
"CommitTsLock",
"ReplicationOriginLock",
"MultiXactTruncationLock",
"OldSnapshotTimeMapLock",
"LogicalRepWorkerLock",
"CLogTruncationLock"
};
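LWLock initialization:
When PostgreSQL initializes shared memory and semaphores, it also initializes the LWLock array (CreateLWLocks).
The steps are:
- Compute the shared memory the LWLocks need: the number of fixed LWLocks plus those of each requested named tranche (judging from InitializeLWLocks below, the fixed locks allocated at startup are the individually named locks plus the buffer_mapping, lock_manager, and predicate_lock_manager partitions), the size of each LWLock (LWLOCK_PADDED_SIZE), a small integer counter (as far as the source indicates, the allocation counter used by LWLockNewTrancheId rather than a count of shared lockers), and the space for the tranche information.
- Allocate the memory, aligned to the cache line.
- Call LWLockInitialize on each LWLock in turn, which sets the lock's state to LW_FLAG_RELEASE_OK.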
/*
* Initialize LWLocks that are fixed and those belonging to named tranches.
*/
static void
InitializeLWLocks(void)
{
int numNamedLocks = NumLWLocksByNamedTranches();
int id;
int i;
int j;
LWLockPadded *lock;
/* Initialize all individual LWLocks in main array */
/* Initialize the individually named LWLocks (the ones listed in MainLWLockNames[]) */
for (id = 0, lock = MainLWLockArray; id < NUM_INDIVIDUAL_LWLOCKS; id++, lock++)
LWLockInitialize(&lock->lock, id);
/* Initialize buffer mapping LWLocks in main array */
lock = MainLWLockArray + NUM_INDIVIDUAL_LWLOCKS;
for (id = 0; id < NUM_BUFFER_PARTITIONS; id++, lock++)
LWLockInitialize(&lock->lock, LWTRANCHE_BUFFER_MAPPING);
/* Initialize lmgrs' LWLocks in main array */
lock = MainLWLockArray + NUM_INDIVIDUAL_LWLOCKS + NUM_BUFFER_PARTITIONS;
for (id = 0; id < NUM_LOCK_PARTITIONS; id++, lock++)
LWLockInitialize(&lock->lock, LWTRANCHE_LOCK_MANAGER);
/* Initialize predicate lmgrs' LWLocks in main array */
lock = MainLWLockArray + NUM_INDIVIDUAL_LWLOCKS +
NUM_BUFFER_PARTITIONS + NUM_LOCK_PARTITIONS;
for (id = 0; id < NUM_PREDICATELOCK_PARTITIONS; id++, lock++)
LWLockInitialize(&lock->lock, LWTRANCHE_PREDICATE_LOCK_MANAGER);
/* Initialize named tranches. */
if (NamedLWLockTrancheRequests > 0)
{
char *trancheNames;
NamedLWLockTrancheArray = (NamedLWLockTranche *)
&MainLWLockArray[NUM_FIXED_LWLOCKS + numNamedLocks];
trancheNames = (char *) NamedLWLockTrancheArray +
(NamedLWLockTrancheRequests * sizeof(NamedLWLockTranche));
lock = &MainLWLockArray[NUM_FIXED_LWLOCKS];
for (i = 0; i < NamedLWLockTrancheRequests; i++)
{
NamedLWLockTrancheRequest *request;
NamedLWLockTranche *tranche;
char *name;
request = &NamedLWLockTrancheRequestArray[i];
tranche = &NamedLWLockTrancheArray[i];
name = trancheNames;
trancheNames += strlen(request->tranche_name) + 1;
strcpy(name, request->tranche_name);
tranche->trancheId = LWLockNewTrancheId();
tranche->trancheName = name;
for (j = 0; j < request->num_lwlocks; j++, lock++)
LWLockInitialize(&lock->lock, tranche->trancheId);
}
}
}
- LWLockRegisterTranche registers every tranche whose LWLocks have been initialized, both the built-in ones (BuiltinTrancheIds) and the user-defined ones.
/*
* Register named tranches and tranches for fixed LWLocks.
*/
static void
RegisterLWLockTranches(void)
{
int i;
if (LWLockTrancheArray == NULL)
{
LWLockTranchesAllocated = 128;
LWLockTrancheArray = (const char **)
MemoryContextAllocZero(TopMemoryContext,
LWLockTranchesAllocated * sizeof(char *));
Assert(LWLockTranchesAllocated >= LWTRANCHE_FIRST_USER_DEFINED);
}
for (i = 0; i < NUM_INDIVIDUAL_LWLOCKS; ++i)
/* register the members of the MainLWLockNames[] array */
LWLockRegisterTranche(i, MainLWLockNames[i]);
LWLockRegisterTranche(LWTRANCHE_BUFFER_MAPPING, "buffer_mapping");
LWLockRegisterTranche(LWTRANCHE_LOCK_MANAGER, "lock_manager");
LWLockRegisterTranche(LWTRANCHE_PREDICATE_LOCK_MANAGER,
"predicate_lock_manager");
LWLockRegisterTranche(LWTRANCHE_PARALLEL_QUERY_DSA,
"parallel_query_dsa");
LWLockRegisterTranche(LWTRANCHE_SESSION_DSA,
"session_dsa");
LWLockRegisterTranche(LWTRANCHE_SESSION_RECORD_TABLE,
"session_record_table");
LWLockRegisterTranche(LWTRANCHE_SESSION_TYPMOD_TABLE,
"session_typmod_table");
LWLockRegisterTranche(LWTRANCHE_SHARED_TUPLESTORE,
"shared_tuplestore");
LWLockRegisterTranche(LWTRANCHE_TBM, "tbm");
LWLockRegisterTranche(LWTRANCHE_PARALLEL_APPEND, "parallel_append");
LWLockRegisterTranche(LWTRANCHE_PARALLEL_HASH_JOIN, "parallel_hash_join");
LWLockRegisterTranche(LWTRANCHE_SXACT, "serializable_xact");
/* Register named tranches. */
for (i = 0; i < NamedLWLockTrancheRequests; i++)
LWLockRegisterTranche(NamedLWLockTrancheArray[i].trancheId,
NamedLWLockTrancheArray[i].trancheName);
}
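The registration step boils down to keeping a table that maps a tranche ID to a name. A minimal sketch of such a table, growing on demand, might look like the following. The demo_ names, the initial size of 8, and the "unnamed" fallback are assumptions for illustration only; PostgreSQL allocates its array in TopMemoryContext, not with realloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

static const char **demo_tranche_names;     /* indexed by tranche ID */
static int          demo_tranches_allocated;

/* Remember the name of a tranche; returns 0 on success, -1 if out of memory. */
static int demo_register_tranche(int id, const char *name)
{
    assert(id >= 0);

    if (id >= demo_tranches_allocated)
    {
        int          newalloc = demo_tranches_allocated > 0 ? demo_tranches_allocated : 8;
        const char **grown;

        while (newalloc <= id)
            newalloc *= 2;

        grown = realloc(demo_tranche_names, (size_t) newalloc * sizeof(*grown));
        if (grown == NULL)
            return -1;

        /* new slots start out unnamed */
        for (int i = demo_tranches_allocated; i < newalloc; i++)
            grown[i] = NULL;

        demo_tranche_names = grown;
        demo_tranches_allocated = newalloc;
    }

    demo_tranche_names[id] = name;  /* the caller keeps the string alive */
    return 0;
}

/* Look up a tranche name, with a fallback for IDs nobody registered. */
static const char *demo_tranche_name(int id)
{
    if (id < 0 || id >= demo_tranches_allocated || demo_tranche_names[id] == NULL)
        return "unnamed";
    return demo_tranche_names[id];
}
```

Registering ID 3 as "buffer_mapping" and ID 40 as "my_extension" makes both names retrievable, while an ID that was never registered falls back to "unnamed". A table of this kind is what lets a wait on an LWLock be reported by name.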
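Using LWLocks:
1. Acquiring a lock
Call LWLockAcquire(LWLock *lock, LWLockMode mode) to take the lock; mode is either LW_SHARED (shared) or LW_EXCLUSIVE (exclusive).
The function first tries to take the lock directly: it inspects the state field of the LWLock and, if the lock is available in the requested mode, updates state with an atomic compare-and-swap. If that fails, the backend adds itself to the lock's wait queue and tries once more; if the second attempt succeeds it removes itself from the queue, otherwise it sleeps until it is woken up and then retries.
LWLockConditionalAcquire(LWLock *lock, LWLockMode mode) can also be used; unlike LWLockAcquire, it returns immediately when the lock cannot be obtained instead of sleeping.
LWLockAcquireOrWait waits if the lock cannot be obtained, but once the lock becomes free it returns without acquiring it. It is currently used for WALWriteLock: a backend that needs to flush WAL takes WALWriteLock and, while flushing, also writes out the WAL produced by other backends, so the backends that were waiting for the lock in order to flush usually no longer need to flush anything themselves.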
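The difference between the conditional variant and the "acquire or wait" variant is easy to see in a small model built on a pthread mutex and condition variable. This is only a sketch of the semantics described above, not the LWLock implementation; every demo_ name is hypothetical, and the waiters field exists solely so the scenario function can tell when the follower is really waiting.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

typedef struct
{
    pthread_mutex_t mtx;
    pthread_cond_t  released;
    bool            held;
    int             waiters;
} demo_wlock;

static void demo_wlock_init(demo_wlock *l)
{
    pthread_mutex_init(&l->mtx, NULL);
    pthread_cond_init(&l->released, NULL);
    l->held = false;
    l->waiters = 0;
}

/* Like a conditional acquire: never waits, reports whether the lock was taken */
static bool demo_conditional_acquire(demo_wlock *l)
{
    bool got;

    pthread_mutex_lock(&l->mtx);
    got = !l->held;
    if (got)
        l->held = true;
    pthread_mutex_unlock(&l->mtx);
    return got;
}

/* Take the lock if it is free; otherwise wait for its release and do NOT take it */
static bool demo_acquire_or_wait(demo_wlock *l)
{
    bool got = false;

    pthread_mutex_lock(&l->mtx);
    if (!l->held)
    {
        l->held = true;
        got = true;
    }
    else
    {
        l->waiters++;
        while (l->held)
            pthread_cond_wait(&l->released, &l->mtx);
        l->waiters--;
    }
    pthread_mutex_unlock(&l->mtx);
    return got;
}

static void demo_wlock_release(demo_wlock *l)
{
    pthread_mutex_lock(&l->mtx);
    l->held = false;
    pthread_cond_broadcast(&l->released);
    pthread_mutex_unlock(&l->mtx);
}

static int demo_wlock_waiters(demo_wlock *l)
{
    int n;

    pthread_mutex_lock(&l->mtx);
    n = l->waiters;
    pthread_mutex_unlock(&l->mtx);
    return n;
}

typedef struct { demo_wlock *lock; bool got_lock; } demo_follower_arg;

static void *demo_follower(void *p)
{
    demo_follower_arg *a = p;

    a->got_lock = demo_acquire_or_wait(a->lock);
    return NULL;
}

/* The leader holds the lock; the follower waits and returns without the lock. */
static bool demo_follower_skips_flush(void)
{
    demo_wlock        l;
    demo_follower_arg arg = { &l, true };
    pthread_t         t;
    bool              leader_got;

    demo_wlock_init(&l);
    leader_got = demo_acquire_or_wait(&l);
    pthread_create(&t, NULL, demo_follower, &arg);
    while (demo_wlock_waiters(&l) == 0)
        sched_yield();          /* wait until the follower is really queued */
    demo_wlock_release(&l);
    pthread_join(t, NULL);
    return leader_got && !arg.got_lock;
}
```

In the scenario the leader plays the backend that flushes WAL; the follower comes back with false, which is its cue to check whether its own WAL has already been flushed before trying again.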
/*
* LWLockAcquire - acquire a lightweight lock in the specified mode
*
* If the lock is not available, sleep until it is. Returns true if the lock
* was available immediately, false if we had to sleep.
*
* Side effect: cancel/die interrupts are held off until lock release.
*/
bool
LWLockAcquire(LWLock *lock, LWLockMode mode)
{
PGPROC *proc = MyProc;
bool result = true;
int extraWaits = 0;
#ifdef LWLOCK_STATS
lwlock_stats *lwstats;
lwstats = get_lwlock_stats_entry(lock);
#endif
AssertArg(mode == LW_SHARED || mode == LW_EXCLUSIVE);
PRINT_LWDEBUG("LWLockAcquire", lock, mode);
#ifdef LWLOCK_STATS
/* Count lock acquisition attempts */
if (mode == LW_EXCLUSIVE)
lwstats->ex_acquire_count++;
else
lwstats->sh_acquire_count++;
#endif /* LWLOCK_STATS */
/*
* We can't wait if we haven't got a PGPROC. This should only occur
* during bootstrap or shared memory initialization. Put an Assert here
* to catch unsafe coding practices.
*/
Assert(!(proc == NULL && IsUnderPostmaster));
/* Ensure we will have room to remember the lock */
if (num_held_lwlocks >= MAX_SIMUL_LWLOCKS)
elog(ERROR, "too many LWLocks taken");
/*
* Lock out cancel/die interrupts until we exit the code section protected
* by the LWLock. This ensures that interrupts will not interfere with
* manipulations of data structures in shared memory.
*/
HOLD_INTERRUPTS();
/*
* Loop here to try to acquire lock after each time we are signaled by
* LWLockRelease.
*
* NOTE: it might seem better to have LWLockRelease actually grant us the
* lock, rather than retrying and possibly having to go back to sleep. But
* in practice that is no good because it means a process swap for every
* lock acquisition when two or more processes are contending for the same
* lock. Since LWLocks are normally used to protect not-very-long
* sections of computation, a process needs to be able to acquire and
* release the same lock many times during a single CPU time slice, even
* in the presence of contention. The efficiency of being able to do that
* outweighs the inefficiency of sometimes wasting a process dispatch
* cycle because the lock is not free when a released waiter finally gets
* to run. See pgsql-hackers archives for 29-Dec-01.
*/
/* main loop */
for (;;)
{
bool mustwait;
/*
* Try to grab the lock the first time, we're not in the waitqueue
* yet/anymore.
*/
/* First attempt: LWLockAttemptLock returns false when the lock was obtained */
mustwait = LWLockAttemptLock(lock, mode);
/* mustwait == false means we got the lock, so leave the loop */
if (!mustwait)
{
LOG_LWDEBUG("LWLockAcquire", lock, "immediately acquired lock");
break; /* got the lock */
}
/*
* Ok, at this point we couldn't grab the lock on the first try. We
* cannot simply queue ourselves to the end of the list and wait to be
* woken up because by now the lock could long have been released.
* Instead add us to the queue and try to grab the lock again. If we
* succeed we need to revert the queuing and be happy, otherwise we
* recheck the lock. If we still couldn't grab it, we know that the
* other locker will see our queue entries when releasing since they
* existed before we checked for the lock.
*/
/* The first attempt failed, so add this backend to the lock's wait queue */
/* add to the queue */
LWLockQueueSelf(lock, mode);
/* we're now guaranteed to be woken up if necessary */
mustwait = LWLockAttemptLock(lock, mode);
/* If the second attempt succeeded, remove ourselves from the wait queue again */
/* ok, grabbed the lock the second time round, need to undo queueing */
if (!mustwait)
{
LOG_LWDEBUG("LWLockAcquire", lock, "acquired, undoing queue");
LWLockDequeueSelf(lock);
break;
}
/*
* Wait until awakened.
*
* Since we share the process wait semaphore with the regular lock
* manager and ProcWaitForSignal, and we may need to acquire an LWLock
* while one of those is pending, it is possible that we get awakened
* for a reason other than being signaled by LWLockRelease. If so,
* loop back and wait again. Once we've gotten the LWLock,
* re-increment the sema by the number of additional signals received,
* so that the lock manager or signal manager will see the received
* signal when it next waits.
*/
/* The actual wait follows; it is implemented with PGSemaphoreLock */
LOG_LWDEBUG("LWLockAcquire", lock, "waiting");
#ifdef LWLOCK_STATS
lwstats->block_count++;
#endif
/* Report the current wait event: type LWLock, with lock->tranche identifying the specific event (the tranche enum was shown earlier) */
LWLockReportWaitStart(lock);
TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode);
/* wait for the lock */
for (;;)
{ /* block on this process's semaphore */
PGSemaphoreLock(proc->sem);
if (!proc->lwWaiting)
break;
extraWaits++;
}
/* Retrying, allow LWLockRelease to release waiters again. */
pg_atomic_fetch_or_u32(&lock->state, LW_FLAG_RELEASE_OK);
#ifdef LOCK_DEBUG
{
/* not waiting anymore */
uint32 nwaiters PG_USED_FOR_ASSERTS_ONLY = pg_atomic_fetch_sub_u32(&lock->nwaiters, 1);
Assert(nwaiters < MAX_BACKENDS);
}
#endif
TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), mode);
LWLockReportWaitEnd();
/* the wait is over */
LOG_LWDEBUG("LWLockAcquire", lock, "awakened");
/* Now loop back and try to acquire lock again. */
result = false;
}
TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(lock), mode);
/* Add lock to list of locks held by this backend */
held_lwlocks[num_held_lwlocks].lock = lock;
held_lwlocks[num_held_lwlocks++].mode = mode;
/*
* Fix the process wait semaphore's count for any absorbed wakeups.
*/
/* give back the absorbed wakeups by unlocking (incrementing) the semaphore */
while (extraWaits-- > 0)
PGSemaphoreUnlock(proc->sem);
return result;
}
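The extraWaits bookkeeping at the end of LWLockAcquire is easy to overlook, so here is a small model of it using a POSIX semaphore and threads instead of processes. It is a sketch under assumptions: demo_proc, demo_wait_until_released, and the waker that posts two unrelated wakeups are all hypothetical, chosen only to show why absorbed wakeups must be handed back.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct
{
    sem_t       sem;        /* stands in for proc->sem */
    atomic_bool waiting;    /* stands in for proc->lwWaiting */
} demo_proc;

/* Sleep until 'waiting' is cleared; hand back wakeups that were not for us. */
static int demo_wait_until_released(demo_proc *p)
{
    int extra = 0;
    int absorbed;

    for (;;)
    {
        while (sem_wait(&p->sem) < 0 && errno == EINTR)
            ;                       /* interrupted by a signal: retry */
        if (!atomic_load(&p->waiting))
            break;                  /* this wakeup was the real one */
        extra++;                    /* a wakeup meant for someone else */
    }

    absorbed = extra;
    while (extra-- > 0)
        sem_post(&p->sem);          /* restore the semaphore count */
    return absorbed;
}

static void *demo_waker(void *arg)
{
    demo_proc *p = arg;

    sem_post(&p->sem);              /* unrelated wakeup 1 */
    sem_post(&p->sem);              /* unrelated wakeup 2 */
    atomic_store(&p->waiting, false);
    sem_post(&p->sem);              /* the wakeup for our lock */
    return NULL;
}

/* Returns the semaphore value left behind once the waiter has finished. */
static int demo_run_wakeups(void)
{
    demo_proc p;
    pthread_t t;
    int       value = -1;

    sem_init(&p.sem, 0, 0);
    atomic_init(&p.waiting, true);
    pthread_create(&t, NULL, demo_waker, &p);
    demo_wait_until_released(&p);
    pthread_join(t, NULL);
    sem_getvalue(&p.sem, &value);
    sem_destroy(&p.sem);
    return value;
}
```

Whatever the thread interleaving, the waiter consumes exactly one wakeup on balance, so the two unrelated ones remain in the semaphore for whoever waits on it next.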
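- Waiting for a lock:
Waiting is done by PGSemaphoreLock. A backend that could not take the lock waits on its semaphore proc->sem; it sleeps while waiting and consumes no CPU.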
/*
* PGSemaphoreLock
*
* Lock a semaphore (decrement count), blocking if count would be < 0
*/
void
PGSemaphoreLock(PGSemaphore sema)
{
int errStatus;
/* See notes in sysv_sema.c's implementation of PGSemaphoreLock. */
do
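{ /* Call sem_wait to wait on the semaphore. If its value is greater than 0, */
/* the value is decremented by 1 and the call returns at once; if it is 0, the caller blocks. */
/* This is the classic P operation. It returns 0 on success and -1 on failure. */
/* sema must refer to a semaphore that was initialized earlier with sem_init. */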
errStatus = sem_wait(PG_SEM_REF(sema));
} while (errStatus < 0 && errno == EINTR);
if (errStatus < 0)
elog(FATAL, "sem_wait failed: %m");
}
- Releasing a lock:
Done by LWLockRelease(LWLock *lock).
3. Lock
Lock is PostgreSQL's heavyweight lock. It is mainly used to lock database objects, and its modes are:
/* NoLock is not a lock mode, but a flag value meaning "don't get a lock" */
#define NoLock 0
#define AccessShareLock 1 /* SELECT */
#define RowShareLock 2 /* SELECT FOR UPDATE/FOR SHARE */
#define RowExclusiveLock 3 /* INSERT, UPDATE, DELETE */
#define ShareUpdateExclusiveLock 4 /* VACUUM (non-FULL),ANALYZE, CREATE INDEX
* CONCURRENTLY */
#define ShareLock 5 /* CREATE INDEX (WITHOUT CONCURRENTLY) */
#define ShareRowExclusiveLock 6 /* like EXCLUSIVE MODE, but allows ROW
* SHARE */
#define ExclusiveLock 7 /* blocks ROW SHARE/SELECT...FOR UPDATE */
#define AccessExclusiveLock 8 /* ALTER TABLE, DROP TABLE, VACUUM FULL,
* and unqualified LOCK TABLE */
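Which of these modes block each other is usually expressed as a conflict table. The sketch below encodes the table-level conflict matrix from the PostgreSQL documentation as one bitmask per mode. The demo_ identifiers are hypothetical and the mode constants are redefined locally so the example compiles on its own; treat it as an illustration of the idea rather than a copy of lock.c.

```c
#include <assert.h>
#include <stdbool.h>

enum
{
    NoLock = 0, AccessShareLock, RowShareLock, RowExclusiveLock,
    ShareUpdateExclusiveLock, ShareLock, ShareRowExclusiveLock,
    ExclusiveLock, AccessExclusiveLock
};

#define DEMO_BIT(mode) (1 << (mode))

/* demo_conflicts[m] = set of held modes that a request for mode m conflicts with */
static const int demo_conflicts[] = {
    0,                                              /* NoLock */
    DEMO_BIT(AccessExclusiveLock),                  /* AccessShareLock */
    DEMO_BIT(ExclusiveLock) | DEMO_BIT(AccessExclusiveLock),    /* RowShareLock */
    DEMO_BIT(ShareLock) | DEMO_BIT(ShareRowExclusiveLock) |     /* RowExclusiveLock */
    DEMO_BIT(ExclusiveLock) | DEMO_BIT(AccessExclusiveLock),
    DEMO_BIT(ShareUpdateExclusiveLock) | DEMO_BIT(ShareLock) |  /* ShareUpdateExclusiveLock */
    DEMO_BIT(ShareRowExclusiveLock) | DEMO_BIT(ExclusiveLock) |
    DEMO_BIT(AccessExclusiveLock),
    DEMO_BIT(RowExclusiveLock) | DEMO_BIT(ShareUpdateExclusiveLock) |   /* ShareLock */
    DEMO_BIT(ShareRowExclusiveLock) | DEMO_BIT(ExclusiveLock) |
    DEMO_BIT(AccessExclusiveLock),
    DEMO_BIT(RowExclusiveLock) | DEMO_BIT(ShareUpdateExclusiveLock) |   /* ShareRowExclusiveLock */
    DEMO_BIT(ShareLock) | DEMO_BIT(ShareRowExclusiveLock) |
    DEMO_BIT(ExclusiveLock) | DEMO_BIT(AccessExclusiveLock),
    DEMO_BIT(RowShareLock) | DEMO_BIT(RowExclusiveLock) |       /* ExclusiveLock */
    DEMO_BIT(ShareUpdateExclusiveLock) | DEMO_BIT(ShareLock) |
    DEMO_BIT(ShareRowExclusiveLock) | DEMO_BIT(ExclusiveLock) |
    DEMO_BIT(AccessExclusiveLock),
    DEMO_BIT(AccessShareLock) | DEMO_BIT(RowShareLock) |        /* AccessExclusiveLock */
    DEMO_BIT(RowExclusiveLock) | DEMO_BIT(ShareUpdateExclusiveLock) |
    DEMO_BIT(ShareLock) | DEMO_BIT(ShareRowExclusiveLock) |
    DEMO_BIT(ExclusiveLock) | DEMO_BIT(AccessExclusiveLock)
};

/* Does a request for 'requested' conflict with any mode in 'held_mask'? */
static bool demo_lock_conflicts(int requested, int held_mask)
{
    assert(requested > NoLock && requested <= AccessExclusiveLock);
    return (demo_conflicts[requested] & held_mask) != 0;
}
```

For instance, a plain SELECT (AccessShareLock) does not conflict with a concurrent UPDATE (RowExclusiveLock), whereas a ShareLock request conflicts with a held ExclusiveLock.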
Lock is used in many places. Here we take the call stack of an UPDATE statement to walk through how a lock is acquired and waited for.
[postgres@postgres_zabbix ~]$ pstack 21318
#0 0x00007f1706ee95e3 in __epoll_wait_nocancel () from /lib64/libc.so.6
#1 0x0000000000853571 in WaitEventSetWaitBlock (set=0x2164778, cur_timeout=-1, occurred_events=0x7ffcac6f15b0, nevents=1) at latch.c:1080
#2 0x000000000085344c in WaitEventSetWait (set=0x2164778, timeout=-1, occurred_events=0x7ffcac6f15b0, nevents=1, wait_event_info=50331652) at latch.c:1032
#3 0x0000000000852d38 in WaitLatchOrSocket (latch=0x7f17001cf5c4, wakeEvents=33, sock=-1, timeout=-1, wait_event_info=50331652) at latch.c:407
#4 0x0000000000852c03 in WaitLatch (latch=0x7f17001cf5c4, wakeEvents=33, timeout=0, wait_event_info=50331652) at latch.c:347
#5 0x0000000000867ccb in ProcSleep (locallock=0x2072938, lockMethodTable=0xb8f5a0 <default_lockmethod>) at proc.c:1289
#6 0x0000000000861f98 in WaitOnLock (locallock=0x2072938, owner=0x2081d40) at lock.c:1768
#7 0x00000000008610be in LockAcquireExtended (locktag=0x7ffcac6f1a90, lockmode=5, sessionLock=false, dontWait=false, reportMemoryError=true, locallockp=0x0) at lock.c:1050
#8 0x0000000000860713 in LockAcquire (locktag=0x7ffcac6f1a90, lockmode=5, sessionLock=false, dontWait=false) at lock.c:713
#9 0x000000000085f592 in XactLockTableWait (xid=501, rel=0x7f1707c8fb10, ctid=0x7ffcac6f1b44, oper=XLTW_Update) at lmgr.c:658
#10 0x00000000004c99c9 in heap_update (relation=0x7f1707c8fb10, otid=0x7ffcac6f1e60, newtup=0x2164708, cid=1, crosscheck=0x0, wait=true, tmfd=0x7ffcac6f1d60, lockmode=0x7ffcac6f1d5c) at heapam.c:3228
#11 0x00000000004d411c in heapam_tuple_update (relation=0x7f1707c8fb10, otid=0x7ffcac6f1e60, slot=0x2162108, cid=1, snapshot=0x2074500, crosscheck=0x0, wait=true, tmfd=0x7ffcac6f1d60, lockmode=0x7ffcac6f1d5c, update_indexes=0x7ffcac6f1d5b) at heapam_handler.c:332
#12 0x00000000006db007 in table_tuple_update (rel=0x7f1707c8fb10, otid=0x7ffcac6f1e60, slot=0x2162108, cid=1, snapshot=0x2074500, crosscheck=0x0, wait=true, tmfd=0x7ffcac6f1d60, lockmode=0x7ffcac6f1d5c, update_indexes=0x7ffcac6f1d5b) at ../../../src/include/access/tableam.h:1275
#13 0x00000000006dce83 in ExecUpdate (mtstate=0x2160b40, tupleid=0x7ffcac6f1e60, oldtuple=0x0, slot=0x2162108, planSlot=0x21613a0, epqstate=0x2160c38, estate=0x21607c0, canSetTag=true) at nodeModifyTable.c:1311
#14 0x00000000006de36c in ExecModifyTable (pstate=0x2160b40) at nodeModifyTable.c:2222
#15 0x00000000006b2b07 in ExecProcNodeFirst (node=0x2160b40) at execProcnode.c:445
#16 0x00000000006a8ce7 in ExecProcNode (node=0x2160b40) at ../../../src/include/executor/executor.h:239
#17 0x00000000006ab063 in ExecutePlan (estate=0x21607c0, planstate=0x2160b40, use_parallel_mode=false, operation=CMD_UPDATE, sendTuples=false, numberTuples=0, direction=ForwardScanDirection, dest=0x2146860, execute_once=true) at execMain.c:1646
#18 0x00000000006a91c4 in standard_ExecutorRun (queryDesc=0x2152ab0, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:364
#19 0x00000000006a9069 in ExecutorRun (queryDesc=0x2152ab0, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:308
#20 0x000000000088017a in ProcessQuery (plan=0x2146780, sourceText=0x204b040 "update test_tbl set id=4 where id=3;", params=0x0, queryEnv=0x0, dest=0x2146860, completionTag=0x7ffcac6f2270 "") at pquery.c:161
#21 0x00000000008818c1 in PortalRunMulti (portal=0x20b6570, isTopLevel=true, setHoldSnapshot=false, dest=0x2146860, altdest=0x2146860, completionTag=0x7ffcac6f2270 "") at pquery.c:1283
#22 0x0000000000880efb in PortalRun (portal=0x20b6570, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x2146860, altdest=0x2146860, completionTag=0x7ffcac6f2270 "") at pquery.c:796
#23 0x000000000087b28f in exec_simple_query (query_string=0x204b040 "update test_tbl set id=4 where id=3;") at postgres.c:1215
#24 0x000000000087f30f in PostgresMain (argc=1, argv=0x207a6e0, dbname=0x207a578 "postgres", username=0x207a558 "postgres") at postgres.c:4247
#25 0x00000000007e6a9e in BackendRun (port=0x20702a0) at postmaster.c:4437
#26 0x00000000007e629d in BackendStartup (port=0x20702a0) at postmaster.c:4128
#27 0x00000000007e293d in ServerLoop () at postmaster.c:1704
#28 0x00000000007e21fd in PostmasterMain (argc=1, argv=0x2045c00) at postmaster.c:1377
#29 0x000000000070f76d in main (argc=1, argv=0x2045c00) at main.c:228
[postgres@postgres_zabbix ~]$
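This is a blocked UPDATE statement that is currently waiting for a lock (it is waiting for the transaction that already holds the lock to commit).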
postgres=# select pid,wait_event_type,wait_event,query from pg_stat_activity where pid=21318;
-[ RECORD 1 ]---+-------------------------------------
pid | 21318
wait_event_type | Lock
wait_event | transactionid
query | update test_tbl set id=4 where id=3;
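Acquiring the lock:
The call is LockAcquire (locktag=0x7ffcac6f1a90, lockmode=5, sessionLock=false, dontWait=false); the requested LockMode is 5, i.e. ShareLock.
LockAcquire is fairly long, so only its overall logic is outlined here:
1) Look up a hash table using the information in locktag about the object to be locked. Because the same lock may be acquired several times, these locks are cached in a hash table to speed up access.
For example:
When locking a table, the database OID, the table OID, and related information are stored in the locktag;
When locking a tuple, the database OID, table OID, block number, and offset within the block are stored in the locktag. The SET_LOCKTAG_XXX macros fill in the corresponding LOCKTAG;
So the first step is to look up the LOCALLOCK hash table and act on the result;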
locallock = (LOCALLOCK *) hash_search(LockMethodLocalHash,
(void *) &localtag,
HASH_ENTER, &found);
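2) Check whether the requested lock is already held on this object;
3) Under certain conditions, write a WAL record for the lock request;
4) Check for lock conflicts;
5) If there is no conflict, grant the lock and record this holder's use of it; if there is a conflict, wait for the lock with WaitOnLock (which is implemented on top of epoll on this Linux system);
6) Report the result of the request to the caller.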
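Step 1 above, building a tag that identifies the object and looking it up in a backend-local table, can be sketched like this. Everything here is a simplified assumption: the demo_locktag fields, the fixed-size open-addressing table, and the hash function are hypothetical stand-ins for LOCKTAG, the LOCALLOCK hash table, and hash_search with HASH_ENTER.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { DEMO_TAG_RELATION, DEMO_TAG_TUPLE } demo_tag_type;

typedef struct
{
    uint32_t      db_oid;   /* database */
    uint32_t      rel_oid;  /* table */
    uint32_t      block;    /* block number, tuples only */
    uint16_t      offset;   /* offset within the block, tuples only */
    demo_tag_type type;
} demo_locktag;

static void demo_set_relation_tag(demo_locktag *t, uint32_t db, uint32_t rel)
{
    t->db_oid = db; t->rel_oid = rel; t->block = 0; t->offset = 0;
    t->type = DEMO_TAG_RELATION;
}

static void demo_set_tuple_tag(demo_locktag *t, uint32_t db, uint32_t rel,
                               uint32_t block, uint16_t offset)
{
    t->db_oid = db; t->rel_oid = rel; t->block = block; t->offset = offset;
    t->type = DEMO_TAG_TUPLE;
}

typedef struct { demo_locktag tag; int mode; int nlocks; bool used; } demo_locallock;

#define DEMO_TABLE_SIZE 64
static demo_locallock demo_table[DEMO_TABLE_SIZE];

/* FNV-1a style mixing over the fields of the key (tag plus lock mode) */
static uint32_t demo_hash(const demo_locktag *t, int mode)
{
    uint32_t parts[] = { t->db_oid, t->rel_oid, t->block, t->offset,
                         (uint32_t) t->type, (uint32_t) mode };
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < sizeof(parts) / sizeof(parts[0]); i++)
        h = (h ^ parts[i]) * 16777619u;
    return h;
}

static bool demo_same_key(const demo_locallock *e, const demo_locktag *t, int mode)
{
    return e->mode == mode && e->tag.type == t->type &&
           e->tag.db_oid == t->db_oid && e->tag.rel_oid == t->rel_oid &&
           e->tag.block == t->block && e->tag.offset == t->offset;
}

/* Find the entry for (tag, mode), creating it if needed; NULL if the table is full */
static demo_locallock *demo_local_enter(const demo_locktag *t, int mode, bool *found)
{
    uint32_t i = demo_hash(t, mode) % DEMO_TABLE_SIZE;

    for (int probes = 0; probes < DEMO_TABLE_SIZE; probes++, i = (i + 1) % DEMO_TABLE_SIZE)
    {
        if (!demo_table[i].used)
        {
            demo_table[i].tag = *t;
            demo_table[i].mode = mode;
            demo_table[i].nlocks = 0;
            demo_table[i].used = true;
            *found = false;
            return &demo_table[i];
        }
        if (demo_same_key(&demo_table[i], t, mode))
        {
            *found = true;
            return &demo_table[i];
        }
    }
    return NULL;
}

/* Record one more acquisition; returns how often this backend now holds the lock */
static int demo_local_acquire(const demo_locktag *t, int mode)
{
    bool            found;
    demo_locallock *entry = demo_local_enter(t, mode, &found);

    return entry ? ++entry->nlocks : -1;
}
```

Acquiring the same table lock twice only bumps the local count to 2, which is the reason for caching locks locally: a repeated request can be satisfied without going back to the shared lock table.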
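Waiting for the lock:
The call is WaitOnLock (locallock=0x2072938, owner=0x2081d40).
Underneath, the wait is implemented with epoll: WaitOnLock calls ProcSleep, which waits in WaitLatch, and the top frame of the stack is inside epoll_wait: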
#0 0x00007f1706ee95e3 in __epoll_wait_nocancel () from /lib64/libc.so.6
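To show what sleeping in epoll_wait amounts to, the following Linux-only sketch waits on the read end of a pipe and is woken by a write to the other end. It only illustrates the system calls involved; the function demo_wait_on_pipe is hypothetical and the pipe merely stands in for whatever file descriptor the latch code actually registers.

```c
#include <assert.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for the pipe to become readable.
 * If wake_first is nonzero, a byte is written beforehand, playing the role
 * of another process waking us up. Returns the number of ready events
 * (1 = woken, 0 = timed out) or -1 on a setup error.
 */
static int demo_wait_on_pipe(int wake_first, int timeout_ms)
{
    int                fds[2];
    int                ep;
    int                nready = -1;
    struct epoll_event ev;
    struct epoll_event out;

    if (pipe(fds) < 0)
        return -1;

    ep = epoll_create1(0);
    if (ep >= 0)
    {
        memset(&ev, 0, sizeof(ev));
        ev.events = EPOLLIN;            /* wake up when there is data to read */
        ev.data.fd = fds[0];

        if (epoll_ctl(ep, EPOLL_CTL_ADD, fds[0], &ev) == 0 &&
            (!wake_first || write(fds[1], "x", 1) == 1))
        {
            /* The process sleeps here without using CPU until woken or timed out */
            nready = epoll_wait(ep, &out, 1, timeout_ms);
        }
        close(ep);
    }

    close(fds[0]);
    close(fds[1]);
    return nready;
}
```

With a byte already written the call returns 1 immediately; without it the call sleeps for the timeout and returns 0. In the stack above timeout=-1 is passed, so the backend sleeps until something wakes it.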
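Releasing the lock:
The call is LockRelease(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock).
It is not analyzed in detail here; the lock is released after the transaction commits or rolls back.
II. Comparison of the locks in PostgreSQL
SpinLock
Main characteristics: a very lightweight lock with only an exclusive mode and no wait queue; waiting is a busy-wait that burns CPU cycles.
Use cases: the lock is held only briefly and the critical section is short and performs only simple accesses to the shared resource, typically simple assignments, reads, and the like.
LWLock
Main characteristics: a lightweight lock that offers a shared mode in addition to exclusive mode. It has a wait queue, and waiting is implemented with sem_wait.
Use cases: longer critical sections with more involved logic and more complex operations on the shared resource, such as working with the Clog buffers (transaction commit status cache), shared buffers (data page cache), and WAL buffers (WAL cache).
Lock
Main characteristics: a heavyweight lock that may be held for a long time; waiting is implemented with epoll.
Use cases: operations on all database objects, for example inserts, deletes, updates, and queries on tables.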
References:
https://zhuanlan.zhihu.com/p/73517810
http://www.postgres.cn/news/viewone/1/241