Locks in PostgreSQL: SpinLock, LWLock, and Lock

I. Locks in PostgreSQL

Depending on the object being protected and the usage scenario, PostgreSQL uses three kinds of locks: SpinLock, LWLock, and Lock.

1. SpinLock

SpinLock is the familiar spin lock, a mechanism for protecting shared resources under concurrency (multiple processes/threads). It has the lowest implementation cost and is usually built on a hardware test-and-set (TAS) instruction. Its distinguishing characteristic is that a process requesting the lock keeps retrying until the current holder releases it. While waiting, the process does not enter the kernel and sleep; instead it busy-waits, spinning in a loop until the lock becomes available again, so it keeps consuming CPU in user mode. The lock supports only one mode: exclusive.

It is therefore suited to cases where the lock is held briefly, the critical resource is accessed in a simple way, and the critical section is short.

How PostgreSQL uses SpinLock

1. Using the CPU's TAS instruction:


/*
 * s_lock(lock) - platform-independent portion of waiting for a spinlock.
 */
int
s_lock(volatile slock_t *lock, const char *file, int line, const char *func)
{
	SpinDelayStatus delayStatus;

        // Initialize the SpinLock delay state
	init_spin_delay(&delayStatus, file, line, func);

	while (TAS_SPIN(lock))  // TAS is invoked here
	{
                // Spin: each iteration includes a CPU-level delay; once the
                // spin count exceeds 100, this function sleeps for a random
                // interval between 1 ms and 1 s.
		perform_spin_delay(&delayStatus);
	}

        // After acquiring the lock, adjust the spin threshold for sleeping
        // based on how the wait went: if we never slept while acquiring the
        // lock, raise the threshold; if we did sleep, contention is high, so
        // lower the threshold to reduce CPU consumption.
	finish_spin_delay(&delayStatus);

	return delayStatus.delays;
}

The TAS implementation:
#ifdef __x86_64__		/* AMD Opteron, Intel EM64T */
#define HAS_TEST_AND_SET

typedef unsigned char slock_t;

#define TAS(lock) tas(lock)

/*
 * On Intel EM64T, it's a win to use a non-locking test before the xchg proper,
 * but only when spinning.
 *
 * See also Implementing Scalable Atomic Locks for Multi-Core Intel(tm) EM64T
 * and IA32, by Michael Chynoweth and Mary R. Lee. As of this writing, it is
 * available at:
 * http://software.intel.com/en-us/articles/implementing-scalable-atomic-locks-for-multi-core-intel-em64t-and-ia32-architectures
 */
#define TAS_SPIN(lock)    (*(lock) ? 1 : TAS(lock))

static __inline__ int
tas(volatile slock_t *lock)
{
	register slock_t _res = 1;

	__asm__ __volatile__(
		"	lock			\n"
		"	xchgb	%0,%1	\n"
:		"+q"(_res), "+m"(*lock)
:		/* no inputs */
:		"memory", "cc");
	return (int) _res;
}

2. Semaphore-based implementation

If the platform the database runs on has no test-and-set instruction, SpinLock is implemented with PG semaphores. By default PostgreSQL reserves 128 semaphores for SpinLocks (the system's default maximum number of simultaneously available semaphores is 128; check with cat /proc/sys/kernel/sem). The semaphore-based locking logic is as follows:

int
tas_sema(volatile slock_t *lock)
{
	int			lockndx = *lock;

	if (lockndx <= 0 || lockndx > NUM_SPINLOCK_SEMAPHORES)
		elog(ERROR, "invalid spinlock number: %d", lockndx);
	/* Note that TAS macros return 0 if *success* */
	return !PGSemaphoreTryLock(SpinlockSemaArray[lockndx - 1]);
}
/*
 * PGSemaphoreTryLock
 *
 * Lock a semaphore only if able to do so without blocking
 */
bool
PGSemaphoreTryLock(PGSemaphore sema)
{
	int			errStatus;

	/*
	 * Note: if errStatus is -1 and errno == EINTR then it means we returned
	 * from the operation prematurely because we were sent a signal.  So we
	 * try and lock the semaphore again.
	 */
	do
	{
		errStatus = sem_trywait(PG_SEM_REF(sema));
	} while (errStatus < 0 && errno == EINTR);

	if (errStatus < 0)
	{
		if (errno == EAGAIN || errno == EDEADLK)
			return false;		/* failed to lock it */
		/* Otherwise we got trouble */
		elog(FATAL, "sem_trywait failed: %m");
	}

	return true;
}

Because SpinLock must not be held for long, PostgreSQL uses it mainly to control concurrent access to critical variables; the protected critical sections are usually simple assignment statements, reads, and the like.
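As an illustration of that pattern, here is a minimal sketch of typical SpinLock usage, built on the SpinLockAcquire/SpinLockRelease macros from storage/spin.h; the MySharedCounter struct and bump_counter function are hypothetical, not PostgreSQL code:

#include "postgres.h"
#include "storage/spin.h"

typedef struct MySharedCounter		/* hypothetical shared-memory struct */
{
	slock_t		mutex;				/* protects the field below; initialized
									 * with SpinLockInit() at shmem setup */
	uint64		value;
} MySharedCounter;

static uint64
bump_counter(MySharedCounter *c)
{
	uint64		result;

	SpinLockAcquire(&c->mutex);		/* busy-waits via s_lock() on contention */
	result = ++c->value;			/* keep the critical section tiny */
	SpinLockRelease(&c->mutex);

	return result;
}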

2. LWLock

LWLock stands for Lightweight Lock; "lightweight" is relative to the third kind of lock, Lock. It is built on top of SpinLock and, besides the exclusive (mutual-exclusion) mode, adds a shared mode and a special mode.

typedef enum LWLockMode
{
	LW_EXCLUSIVE,
	LW_SHARED,
	LW_WAIT_UNTIL_FREE			/* A special mode used in PGPROC->lwlockMode,
								 * when waiting for lock to become free. Not
								 * to be used as LWLockAcquire argument */
} LWLockMode;

LWLocks are mainly used to protect shared-memory data structures with mutually exclusive access, such as the clog buffers (transaction commit-status cache), shared buffers (data page cache), the WAL buffers, and so on.
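Before looking at the implementation, here is a minimal caller-side sketch of the shared/exclusive usage pattern; the MyStats struct and both functions are hypothetical, not PostgreSQL code:

#include "postgres.h"
#include "storage/lwlock.h"

typedef struct MyStats				/* hypothetical shared-memory struct */
{
	LWLock	   *lock;				/* assigned when shared memory is set up */
	uint64		hits;
	uint64		misses;
} MyStats;

static void
record_access(MyStats *stats, bool hit)
{
	/* Writers take the lock exclusively so both fields stay consistent. */
	LWLockAcquire(stats->lock, LW_EXCLUSIVE);
	if (hit)
		stats->hits++;
	else
		stats->misses++;
	LWLockRelease(stats->lock);
}

static double
hit_ratio(MyStats *stats)
{
	uint64		hits;
	uint64		total;

	/* Readers can share the lock; many backends may read concurrently. */
	LWLockAcquire(stats->lock, LW_SHARED);
	hits = stats->hits;
	total = stats->hits + stats->misses;
	LWLockRelease(stats->lock);

	return total == 0 ? 0.0 : (double) hits / total;
}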

The LWLock data structure is defined as:

typedef struct LWLock
{
	uint16		tranche;		/* tranche ID */
	pg_atomic_uint32 state;		/* state of exclusive/nonexclusive lockers */
	proclist_head waiters;		/* list of waiting PGPROCs */
#ifdef LOCK_DEBUG
	pg_atomic_uint32 nwaiters;	/* number of waiters */
	struct PGPROC *owner;		/* last exclusive owner of the lock */
#endif
} LWLock;

Depending on where they are used, PostgreSQL subdivides LWLocks into a number of tranches (sub-modules):

/*
 * Every tranche ID less than NUM_INDIVIDUAL_LWLOCKS is reserved; also,
 * we reserve additional tranche IDs for builtin tranches not included in
 * the set of individual LWLocks.  A call to LWLockNewTrancheId will never
 * return a value less than LWTRANCHE_FIRST_USER_DEFINED.
 */
typedef enum BuiltinTrancheIds
{
	LWTRANCHE_CLOG_BUFFERS = NUM_INDIVIDUAL_LWLOCKS,
	LWTRANCHE_COMMITTS_BUFFERS,
	LWTRANCHE_SUBTRANS_BUFFERS,
	LWTRANCHE_MXACTOFFSET_BUFFERS,
	LWTRANCHE_MXACTMEMBER_BUFFERS,
	LWTRANCHE_ASYNC_BUFFERS,
	LWTRANCHE_OLDSERXID_BUFFERS,
	LWTRANCHE_WAL_INSERT,
	LWTRANCHE_BUFFER_CONTENT,
	LWTRANCHE_BUFFER_IO_IN_PROGRESS,
	LWTRANCHE_REPLICATION_ORIGIN,
	LWTRANCHE_REPLICATION_SLOT_IO_IN_PROGRESS,
	LWTRANCHE_PROC,
	LWTRANCHE_BUFFER_MAPPING,
	LWTRANCHE_LOCK_MANAGER,
	LWTRANCHE_PREDICATE_LOCK_MANAGER,
	LWTRANCHE_PARALLEL_HASH_JOIN,
	LWTRANCHE_PARALLEL_QUERY_DSA,
	LWTRANCHE_SESSION_DSA,
	LWTRANCHE_SESSION_RECORD_TABLE,
	LWTRANCHE_SESSION_TYPMOD_TABLE,
	LWTRANCHE_SHARED_TUPLESTORE,
	LWTRANCHE_TBM,
	LWTRANCHE_PARALLEL_APPEND,
	LWTRANCHE_FIRST_USER_DEFINED
}			BuiltinTrancheIds;

const char *const MainLWLockNames[] = {
	"<unassigned:0>",
	"ShmemIndexLock",
	"OidGenLock",
	"XidGenLock",
	"ProcArrayLock",
	"SInvalReadLock",
	"SInvalWriteLock",
	"WALBufMappingLock",
	"WALWriteLock",
	"ControlFileLock",
	"CheckpointLock",
	"CLogControlLock",
	"SubtransControlLock",
	"MultiXactGenLock",
	"MultiXactOffsetControlLock",
	"MultiXactMemberControlLock",
	"RelCacheInitLock",
	"CheckpointerCommLock",
	"TwoPhaseStateLock",
	"TablespaceCreateLock",
	"BtreeVacuumLock",
	"AddinShmemInitLock",
	"AutovacuumLock",
	"AutovacuumScheduleLock",
	"SyncScanLock",
	"RelationMappingLock",
	"AsyncCtlLock",
	"AsyncQueueLock",
	"SerializableXactHashLock",
	"SerializableFinishedListLock",
	"SerializablePredicateLockListLock",
	"OldSerXidLock",
	"SyncRepLock",
	"BackgroundWorkerLock",
	"DynamicSharedMemoryControlLock",
	"AutoFileLock",
	"ReplicationSlotAllocationLock",
	"ReplicationSlotControlLock",
	"CommitTsControlLock",
	"CommitTsLock",
	"ReplicationOriginLock",
	"MultiXactTruncationLock",
	"OldSnapshotTimeMapLock",
	"LogicalRepWorkerLock",
	"CLogTruncationLock"
};

LWLock initialization:
When PostgreSQL initializes shared memory and semaphores, it also initializes the LWLock array (CreateLWLocks).
Specifically:

  1. Compute the shared-memory space LWLocks need: the number of fixed LWLocks plus those of each requested named tranche (the fixed LWLocks allocated at system initialization include buffer_mapping, lock_manager, predicate_lock_manager, parallel_query_dsa, tbm), the size of each LWLock (LWLOCK_PADDED_SIZE plus a counter that tracks the number of shared holders), and the space taken by the tranche metadata.
  2. Allocate that memory, aligned to cache lines.
  3. LWLockInitialize initializes each LWLock in turn and sets its state to LW_FLAG_RELEASE_OK.
/*
 * Initialize LWLocks that are fixed and those belonging to named tranches.
 */
static void
InitializeLWLocks(void)
{
	int			numNamedLocks = NumLWLocksByNamedTranches();
	int			id;
	int			i;
	int			j;
	LWLockPadded *lock;

	/* Initialize all individual LWLocks in main array */
	/* Initialize the NUM_INDIVIDUAL_LWLOCKS fixed LWLocks named in MainLWLockNames[] */
	for (id = 0, lock = MainLWLockArray; id < NUM_INDIVIDUAL_LWLOCKS; id++, lock++)
		LWLockInitialize(&lock->lock, id);

	/* Initialize buffer mapping LWLocks in main array */
	lock = MainLWLockArray + NUM_INDIVIDUAL_LWLOCKS;
	for (id = 0; id < NUM_BUFFER_PARTITIONS; id++, lock++)
		LWLockInitialize(&lock->lock, LWTRANCHE_BUFFER_MAPPING);

	/* Initialize lmgrs' LWLocks in main array */
	lock = MainLWLockArray + NUM_INDIVIDUAL_LWLOCKS + NUM_BUFFER_PARTITIONS;
	for (id = 0; id < NUM_LOCK_PARTITIONS; id++, lock++)
		LWLockInitialize(&lock->lock, LWTRANCHE_LOCK_MANAGER);

	/* Initialize predicate lmgrs' LWLocks in main array */
	lock = MainLWLockArray + NUM_INDIVIDUAL_LWLOCKS +
		NUM_BUFFER_PARTITIONS + NUM_LOCK_PARTITIONS;
	for (id = 0; id < NUM_PREDICATELOCK_PARTITIONS; id++, lock++)
		LWLockInitialize(&lock->lock, LWTRANCHE_PREDICATE_LOCK_MANAGER);

	/* Initialize named tranches. */
	if (NamedLWLockTrancheRequests > 0)
	{
		char	   *trancheNames;

		NamedLWLockTrancheArray = (NamedLWLockTranche *)
			&MainLWLockArray[NUM_FIXED_LWLOCKS + numNamedLocks];

		trancheNames = (char *) NamedLWLockTrancheArray +
			(NamedLWLockTrancheRequests * sizeof(NamedLWLockTranche));
		lock = &MainLWLockArray[NUM_FIXED_LWLOCKS];

		for (i = 0; i < NamedLWLockTrancheRequests; i++)
		{
			NamedLWLockTrancheRequest *request;
			NamedLWLockTranche *tranche;
			char	   *name;

			request = &NamedLWLockTrancheRequestArray[i];
			tranche = &NamedLWLockTrancheArray[i];

			name = trancheNames;
			trancheNames += strlen(request->tranche_name) + 1;
			strcpy(name, request->tranche_name);
			tranche->trancheId = LWLockNewTrancheId();
			tranche->trancheName = name;

			for (j = 0; j < request->num_lwlocks; j++, lock++)
				LWLockInitialize(&lock->lock, tranche->trancheId);
		}
	}
}
  4. LWLockRegisterTranche registers every tranche whose LWLocks have been initialized, both the built-in ones (BuiltinTrancheIds) and user-defined ones (a usage sketch for extension-defined tranches follows the code below).
/*
 * Register named tranches and tranches for fixed LWLocks.
 */
static void
RegisterLWLockTranches(void)
{
	int			i;

	if (LWLockTrancheArray == NULL)
	{
		LWLockTranchesAllocated = 128;
		LWLockTrancheArray = (const char **)
			MemoryContextAllocZero(TopMemoryContext,
								   LWLockTranchesAllocated * sizeof(char *));
		Assert(LWLockTranchesAllocated >= LWTRANCHE_FIRST_USER_DEFINED);
	}

	for (i = 0; i < NUM_INDIVIDUAL_LWLOCKS; ++i)
		/* register the members of the MainLWLockNames[] array */
		LWLockRegisterTranche(i, MainLWLockNames[i]);

	LWLockRegisterTranche(LWTRANCHE_BUFFER_MAPPING, "buffer_mapping");
	LWLockRegisterTranche(LWTRANCHE_LOCK_MANAGER, "lock_manager");
	LWLockRegisterTranche(LWTRANCHE_PREDICATE_LOCK_MANAGER,
						  "predicate_lock_manager");
	LWLockRegisterTranche(LWTRANCHE_PARALLEL_QUERY_DSA,
						  "parallel_query_dsa");
	LWLockRegisterTranche(LWTRANCHE_SESSION_DSA,
						  "session_dsa");
	LWLockRegisterTranche(LWTRANCHE_SESSION_RECORD_TABLE,
						  "session_record_table");
	LWLockRegisterTranche(LWTRANCHE_SESSION_TYPMOD_TABLE,
						  "session_typmod_table");
	LWLockRegisterTranche(LWTRANCHE_SHARED_TUPLESTORE,
						  "shared_tuplestore");
	LWLockRegisterTranche(LWTRANCHE_TBM, "tbm");
	LWLockRegisterTranche(LWTRANCHE_PARALLEL_APPEND, "parallel_append");
	LWLockRegisterTranche(LWTRANCHE_PARALLEL_HASH_JOIN, "parallel_hash_join");
	LWLockRegisterTranche(LWTRANCHE_SXACT, "serializable_xact");

	/* Register named tranches. */
	for (i = 0; i < NamedLWLockTrancheRequests; i++)
		LWLockRegisterTranche(NamedLWLockTrancheArray[i].trancheId,
							  NamedLWLockTrancheArray[i].trancheName);
}
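For completeness, an extension can request its own named tranche. The following is a minimal sketch of the standard pattern; the tranche name "my_ext", the my_ext_attach helper, and the single-lock layout are hypothetical, and RequestNamedLWLockTranche only works when the library is loaded via shared_preload_libraries, while shared memory is still being sized:

#include "postgres.h"
#include "fmgr.h"
#include "storage/lwlock.h"

PG_MODULE_MAGIC;

static LWLock *my_ext_lock = NULL;

void
_PG_init(void)
{
	/* Ask for one LWLock in a tranche named "my_ext"; the request ends up in
	 * NamedLWLockTrancheRequestArray and is honored by InitializeLWLocks(). */
	RequestNamedLWLockTranche("my_ext", 1);
}

static void
my_ext_attach(void)
{
	/* Once shared memory exists, fetch the first (and only) lock of the tranche. */
	my_ext_lock = &GetNamedLWLockTranche("my_ext")[0].lock;
}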

Using LWLocks:
1. Acquiring the lock

Call LWLockAcquire(LWLock *lock, LWLockMode mode) to acquire the lock, where mode can be LW_SHARED (shared) or LW_EXCLUSIVE (exclusive).
To acquire, the backend inspects the LWLock's state field to decide whether the lock can be granted; if it can, the state is updated with an atomic compare-and-set. If the first attempt fails, the backend queues itself on the lock's wait queue and tries once more; if that second attempt succeeds it removes itself from the queue, otherwise it must wait for the lock.
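The following is a simplified sketch of that state manipulation (not the actual LWLockAttemptLock source; the function name and the return convention here are invented for illustration):

#include "postgres.h"
#include "storage/lwlock.h"

/* These values mirror the private definitions in storage/lmgr/lwlock.c. */
#define LW_VAL_EXCLUSIVE	((uint32) 1 << 24)
#define LW_VAL_SHARED		1
#define LW_LOCK_MASK		((uint32) ((1 << 25) - 1))

static bool
lwlock_try_once(LWLock *lock, LWLockMode mode)
{
	uint32		old_state = pg_atomic_read_u32(&lock->state);

	for (;;)
	{
		uint32		desired_state = old_state;
		bool		lock_free;

		if (mode == LW_EXCLUSIVE)
		{
			/* exclusive mode requires that nobody holds the lock at all */
			lock_free = (old_state & LW_LOCK_MASK) == 0;
			desired_state += LW_VAL_EXCLUSIVE;
		}
		else
		{
			/* shared mode only requires the absence of an exclusive holder */
			lock_free = (old_state & LW_VAL_EXCLUSIVE) == 0;
			desired_state += LW_VAL_SHARED;
		}

		if (!lock_free)
			return false;		/* caller must queue itself and wait */

		/* On failure, old_state is refreshed with the current value and we retry. */
		if (pg_atomic_compare_exchange_u32(&lock->state, &old_state, desired_state))
			return true;		/* state updated atomically: lock acquired */
	}
}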

You can also use LWLockConditionalAcquire(LWLock *lock, LWLockMode mode); unlike LWLockAcquire, it returns immediately if the lock cannot be obtained instead of sleeping and waiting.

There is also LWLockAcquireOrWait: if the lock cannot be acquired it waits, but once the lock becomes free it returns without acquiring it. It is currently used for WALWriteLock: when a backend needs to flush WAL it takes WALWriteLock and, while it is at it, also flushes WAL produced by other backends, so the other backends that were waiting in order to flush WAL no longer need to do the flush themselves.

/*
 * LWLockAcquire - acquire a lightweight lock in the specified mode
 *
 * If the lock is not available, sleep until it is.  Returns true if the lock
 * was available immediately, false if we had to sleep.
 *
 * Side effect: cancel/die interrupts are held off until lock release.
 */
bool
LWLockAcquire(LWLock *lock, LWLockMode mode)
{
	PGPROC	   *proc = MyProc;
	bool		result = true;
	int			extraWaits = 0;
#ifdef LWLOCK_STATS
	lwlock_stats *lwstats;

	lwstats = get_lwlock_stats_entry(lock);
#endif

	AssertArg(mode == LW_SHARED || mode == LW_EXCLUSIVE);

	PRINT_LWDEBUG("LWLockAcquire", lock, mode);

#ifdef LWLOCK_STATS
	/* Count lock acquisition attempts */
	if (mode == LW_EXCLUSIVE)
		lwstats->ex_acquire_count++;
	else
		lwstats->sh_acquire_count++;
#endif							/* LWLOCK_STATS */

	/*
	 * We can't wait if we haven't got a PGPROC.  This should only occur
	 * during bootstrap or shared memory initialization.  Put an Assert here
	 * to catch unsafe coding practices.
	 */
	Assert(!(proc == NULL && IsUnderPostmaster));

	/* Ensure we will have room to remember the lock */
	if (num_held_lwlocks >= MAX_SIMUL_LWLOCKS)
		elog(ERROR, "too many LWLocks taken");

	/*
	 * Lock out cancel/die interrupts until we exit the code section protected
	 * by the LWLock.  This ensures that interrupts will not interfere with
	 * manipulations of data structures in shared memory.
	 */
	HOLD_INTERRUPTS();

	/*
	 * Loop here to try to acquire lock after each time we are signaled by
	 * LWLockRelease.
	 *
	 * NOTE: it might seem better to have LWLockRelease actually grant us the
	 * lock, rather than retrying and possibly having to go back to sleep. But
	 * in practice that is no good because it means a process swap for every
	 * lock acquisition when two or more processes are contending for the same
	 * lock.  Since LWLocks are normally used to protect not-very-long
	 * sections of computation, a process needs to be able to acquire and
	 * release the same lock many times during a single CPU time slice, even
	 * in the presence of contention.  The efficiency of being able to do that
	 * outweighs the inefficiency of sometimes wasting a process dispatch
	 * cycle because the lock is not free when a released waiter finally gets
	 * to run.  See pgsql-hackers archives for 29-Dec-01.
	 */
	 
	 /* main loop */
	for (;;)
	{
		bool		mustwait;

		/*
		 * Try to grab the lock the first time, we're not in the waitqueue
		 * yet/anymore.
		 */
		 /* First attempt to take the lock; on success it returns false and we break out of the loop */
		mustwait = LWLockAttemptLock(lock, mode);
        /* mustwait is false: the lock was acquired, so exit the loop */
		if (!mustwait)
		{
			LOG_LWDEBUG("LWLockAcquire", lock, "immediately acquired lock");
			break;				/* got the lock */
		}

		/*
		 * Ok, at this point we couldn't grab the lock on the first try. We
		 * cannot simply queue ourselves to the end of the list and wait to be
		 * woken up because by now the lock could long have been released.
		 * Instead add us to the queue and try to grab the lock again. If we
		 * succeed we need to revert the queuing and be happy, otherwise we
		 * recheck the lock. If we still couldn't grab it, we know that the
		 * other locker will see our queue entries when releasing since they
		 * existed before we checked for the lock.
		 */
        /* The first attempt failed, so add ourselves to the lock's wait queue */
		/* add to the queue */
		LWLockQueueSelf(lock, mode);

		/* we're now guaranteed to be woken up if necessary */
		mustwait = LWLockAttemptLock(lock, mode);
        /* The second attempt succeeded; after acquiring the lock, remove ourselves from the wait queue */
		/* ok, grabbed the lock the second time round, need to undo queueing */
		if (!mustwait)
		{
			LOG_LWDEBUG("LWLockAcquire", lock, "acquired, undoing queue");

			LWLockDequeueSelf(lock);
			break;
		}

		/*
		 * Wait until awakened.
		 *
		 * Since we share the process wait semaphore with the regular lock
		 * manager and ProcWaitForSignal, and we may need to acquire an LWLock
		 * while one of those is pending, it is possible that we get awakened
		 * for a reason other than being signaled by LWLockRelease. If so,
		 * loop back and wait again.  Once we've gotten the LWLock,
		 * re-increment the sema by the number of additional signals received,
		 * so that the lock manager or signal manager will see the received
		 * signal when it next waits.
		 */
        /* From here on is the wait-for-lock logic, implemented via PGSemaphoreLock */
		LOG_LWDEBUG("LWLockAcquire", lock, "waiting");

#ifdef LWLOCK_STATS
		lwstats->block_count++;
#endif
        /* Record the current wait-event type as LWLock and pass the specific event via lock->tranche (the tranche enum was shown earlier in this article) */
		LWLockReportWaitStart(lock);
		TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode);
        
        /* wait for the lock */
		for (;;)
		{   /* take the semaphore */
			PGSemaphoreLock(proc->sem);
			if (!proc->lwWaiting)
				break;
			extraWaits++;
		}

		/* Retrying, allow LWLockRelease to release waiters again. */
		pg_atomic_fetch_or_u32(&lock->state, LW_FLAG_RELEASE_OK);

#ifdef LOCK_DEBUG
		{
			/* not waiting anymore */
			uint32		nwaiters PG_USED_FOR_ASSERTS_ONLY = pg_atomic_fetch_sub_u32(&lock->nwaiters, 1);

			Assert(nwaiters < MAX_BACKENDS);
		}
#endif
       
		TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), mode);
		LWLockReportWaitEnd();
        /* done waiting for the lock */
		LOG_LWDEBUG("LWLockAcquire", lock, "awakened");

		/* Now loop back and try to acquire lock again. */
		result = false;
	}

	TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(lock), mode);

	/* Add lock to list of locks held by this backend */
	held_lwlocks[num_held_lwlocks].lock = lock;
	held_lwlocks[num_held_lwlocks++].mode = mode;

	/*
	 * Fix the process wait semaphore's count for any absorbed wakeups.
	 */
    /* unlock the semaphore */
	while (extraWaits-- > 0)
		PGSemaphoreUnlock(proc->sem);

	return result;
}
2. Waiting for the lock

Waiting is performed by PGSemaphoreLock: when the lock cannot be acquired, the backend waits on the semaphore proc->sem (it sleeps and does not consume CPU).

/*
 * PGSemaphoreLock
 *
 * Lock a semaphore (decrement count), blocking if count would be < 0
 */
void
PGSemaphoreLock(PGSemaphore sema)
{
	int			errStatus;

	/* See notes in sysv_sema.c's implementation of PGSemaphoreLock. */
	do
	{   /* Call sem_wait to wait on the semaphore: if its value is greater */
	    /* than zero, decrement it and return at once; if it is zero, the  */
	    /* caller blocks. This is the classic P operation: 0 on success,   */
	    /* -1 on failure. sema refers to a semaphore set up by sem_init.   */
		errStatus = sem_wait(PG_SEM_REF(sema));
	} while (errStatus < 0 && errno == EINTR);

	if (errStatus < 0)
		elog(FATAL, "sem_wait failed: %m");
}
3. Releasing the lock
This is done by LWLockRelease(LWLock *lock).
3. Lock

Lock is PostgreSQL's heavyweight lock, used mainly on database objects. Its modes are:

/* NoLock is not a lock mode, but a flag value meaning "don't get a lock" */
#define NoLock					0

#define AccessShareLock			1	/* SELECT */
#define RowShareLock			2	/* SELECT FOR UPDATE/FOR SHARE */
#define RowExclusiveLock		3	/* INSERT, UPDATE, DELETE */
#define ShareUpdateExclusiveLock 4	/* VACUUM (non-FULL),ANALYZE, CREATE INDEX
									 * CONCURRENTLY */
#define ShareLock				5	/* CREATE INDEX (WITHOUT CONCURRENTLY) */
#define ShareRowExclusiveLock	6	/* like EXCLUSIVE MODE, but allows ROW
									 * SHARE */
#define ExclusiveLock			7	/* blocks ROW SHARE/SELECT...FOR UPDATE */
#define AccessExclusiveLock		8	/* ALTER TABLE, DROP TABLE, VACUUM FULL,
									 * and unqualified LOCK TABLE */

Lock is used in many scenarios; here we take the execution stack of an UPDATE statement to walk through how the lock is acquired and waited for.


[postgres@postgres_zabbix ~]$ pstack 21318
#0  0x00007f1706ee95e3 in __epoll_wait_nocancel () from /lib64/libc.so.6
#1  0x0000000000853571 in WaitEventSetWaitBlock (set=0x2164778, cur_timeout=-1, occurred_events=0x7ffcac6f15b0, nevents=1) at latch.c:1080
#2  0x000000000085344c in WaitEventSetWait (set=0x2164778, timeout=-1, occurred_events=0x7ffcac6f15b0, nevents=1, wait_event_info=50331652) at latch.c:1032
#3  0x0000000000852d38 in WaitLatchOrSocket (latch=0x7f17001cf5c4, wakeEvents=33, sock=-1, timeout=-1, wait_event_info=50331652) at latch.c:407
#4  0x0000000000852c03 in WaitLatch (latch=0x7f17001cf5c4, wakeEvents=33, timeout=0, wait_event_info=50331652) at latch.c:347
#5  0x0000000000867ccb in ProcSleep (locallock=0x2072938, lockMethodTable=0xb8f5a0 <default_lockmethod>) at proc.c:1289
#6  0x0000000000861f98 in WaitOnLock (locallock=0x2072938, owner=0x2081d40) at lock.c:1768
#7  0x00000000008610be in LockAcquireExtended (locktag=0x7ffcac6f1a90, lockmode=5, sessionLock=false, dontWait=false, reportMemoryError=true, locallockp=0x0) at lock.c:1050
#8  0x0000000000860713 in LockAcquire (locktag=0x7ffcac6f1a90, lockmode=5, sessionLock=false, dontWait=false) at lock.c:713
#9  0x000000000085f592 in XactLockTableWait (xid=501, rel=0x7f1707c8fb10, ctid=0x7ffcac6f1b44, oper=XLTW_Update) at lmgr.c:658
#10 0x00000000004c99c9 in heap_update (relation=0x7f1707c8fb10, otid=0x7ffcac6f1e60, newtup=0x2164708, cid=1, crosscheck=0x0, wait=true, tmfd=0x7ffcac6f1d60, lockmode=0x7ffcac6f1d5c) at heapam.c:3228
#11 0x00000000004d411c in heapam_tuple_update (relation=0x7f1707c8fb10, otid=0x7ffcac6f1e60, slot=0x2162108, cid=1, snapshot=0x2074500, crosscheck=0x0, wait=true, tmfd=0x7ffcac6f1d60, lockmode=0x7ffcac6f1d5c, update_indexes=0x7ffcac6f1d5b) at heapam_handler.c:332
#12 0x00000000006db007 in table_tuple_update (rel=0x7f1707c8fb10, otid=0x7ffcac6f1e60, slot=0x2162108, cid=1, snapshot=0x2074500, crosscheck=0x0, wait=true, tmfd=0x7ffcac6f1d60, lockmode=0x7ffcac6f1d5c, update_indexes=0x7ffcac6f1d5b) at ../../../src/include/access/tableam.h:1275
#13 0x00000000006dce83 in ExecUpdate (mtstate=0x2160b40, tupleid=0x7ffcac6f1e60, oldtuple=0x0, slot=0x2162108, planSlot=0x21613a0, epqstate=0x2160c38, estate=0x21607c0, canSetTag=true) at nodeModifyTable.c:1311
#14 0x00000000006de36c in ExecModifyTable (pstate=0x2160b40) at nodeModifyTable.c:2222
#15 0x00000000006b2b07 in ExecProcNodeFirst (node=0x2160b40) at execProcnode.c:445
#16 0x00000000006a8ce7 in ExecProcNode (node=0x2160b40) at ../../../src/include/executor/executor.h:239
#17 0x00000000006ab063 in ExecutePlan (estate=0x21607c0, planstate=0x2160b40, use_parallel_mode=false, operation=CMD_UPDATE, sendTuples=false, numberTuples=0, direction=ForwardScanDirection, dest=0x2146860, execute_once=true) at execMain.c:1646
#18 0x00000000006a91c4 in standard_ExecutorRun (queryDesc=0x2152ab0, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:364
#19 0x00000000006a9069 in ExecutorRun (queryDesc=0x2152ab0, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:308
#20 0x000000000088017a in ProcessQuery (plan=0x2146780, sourceText=0x204b040 "update test_tbl set id=4 where id=3;", params=0x0, queryEnv=0x0, dest=0x2146860, completionTag=0x7ffcac6f2270 "") at pquery.c:161
#21 0x00000000008818c1 in PortalRunMulti (portal=0x20b6570, isTopLevel=true, setHoldSnapshot=false, dest=0x2146860, altdest=0x2146860, completionTag=0x7ffcac6f2270 "") at pquery.c:1283
#22 0x0000000000880efb in PortalRun (portal=0x20b6570, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x2146860, altdest=0x2146860, completionTag=0x7ffcac6f2270 "") at pquery.c:796
#23 0x000000000087b28f in exec_simple_query (query_string=0x204b040 "update test_tbl set id=4 where id=3;") at postgres.c:1215
#24 0x000000000087f30f in PostgresMain (argc=1, argv=0x207a6e0, dbname=0x207a578 "postgres", username=0x207a558 "postgres") at postgres.c:4247
#25 0x00000000007e6a9e in BackendRun (port=0x20702a0) at postmaster.c:4437
#26 0x00000000007e629d in BackendStartup (port=0x20702a0) at postmaster.c:4128
#27 0x00000000007e293d in ServerLoop () at postmaster.c:1704
#28 0x00000000007e21fd in PostmasterMain (argc=1, argv=0x2045c00) at postmaster.c:1377
#29 0x000000000070f76d in main (argc=1, argv=0x2045c00) at main.c:228
[postgres@postgres_zabbix ~]$

This is a blocked UPDATE statement, currently waiting for a lock (waiting for the transaction that already holds the lock to commit).

postgres=# select pid,wait_event_type,wait_event,query from pg_stat_activity where pid=21318;
-[ RECORD 1 ]---+-------------------------------------
pid             | 21318
wait_event_type | Lock
wait_event      | transactionid
query           | update test_tbl set id=4 where id=3;

Acquiring the lock:

LockAcquire(locktag=0x7ffcac6f1a90, lockmode=5, sessionLock=false, dontWait=false) is called; the requested lock mode is 5, i.e. ShareLock.

The body of LockAcquire is fairly long, so only the rough logic is outlined here:
1) Look up the hash table using the information in locktag that identifies the object to be locked. Because the same lock may be acquired multiple times, these locks are cached in a hash table to speed up access.
For example:
when locking a table, the database OID, the relation OID and related information are stored in the locktag;
when locking a tuple, the database OID, relation OID, block number and offset are set in the locktag. The SET_LOCKTAG_XXX macros fill in the corresponding LOCKTAG fields (see the sketch after this list).
So the first step is to search the LOCALLOCK hash table and act on the result:

locallock = (LOCALLOCK *) hash_search(LockMethodLocalHash,
										  (void *) &localtag,
										  HASH_ENTER, &found);

2) Check whether the requested lock on this object is already held;
3) Depending on the circumstances, write WAL for the lock acquisition;
4) Perform lock-conflict detection;
5) If there is no conflict, grant the lock and record how the resource is using it; if a conflict is found, enter lock waiting via WaitOnLock (implemented on top of epoll);
6) Report the result of the lock request.
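As referenced in the list above, here is a minimal sketch of how a relation lock request is assembled, loosely following what lmgr.c does; the function name is hypothetical and error handling is omitted:

#include "postgres.h"
#include "storage/lock.h"

static void
lock_relation_sketch(Oid dbId, Oid relId, LOCKMODE lockmode)
{
	LOCKTAG		tag;

	/* Fill in the LOCKTAG that identifies the relation. */
	SET_LOCKTAG_RELATION(tag, dbId, relId);

	/*
	 * sessionLock = false: the lock belongs to the current transaction and is
	 * released at commit/abort; dontWait = false: on conflict, sleep in
	 * WaitOnLock (epoll) until the holder releases the lock.
	 */
	(void) LockAcquire(&tag, lockmode, false, false);
}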

Waiting for the lock:
WaitOnLock(locallock=0x2072938, owner=0x2081d40) is called.
The underlying wait is implemented with epoll; you can see that the top stack frame is in epoll_wait:

#0  0x00007f1706ee95e3 in __epoll_wait_nocancel () from /lib64/libc.so.6
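A minimal sketch of the latch wait underneath ProcSleep (parameters simplified; the real call also encodes the lock type into wait_event_info, which is why the stack shows wait_event_info=50331652, i.e. PG_WAIT_LOCK plus LOCKTAG_TRANSACTION):

#include "postgres.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "storage/latch.h"

static void
wait_for_lock_sketch(void)
{
	/*
	 * Sleep until another backend sets our latch, e.g. when LockRelease
	 * grants us the lock; the wait itself ends up in epoll_wait, as the
	 * stack above shows.
	 */
	(void) WaitLatch(MyLatch,
					 WL_LATCH_SET | WL_POSTMASTER_DEATH,
					 0,						/* no WL_TIMEOUT flag, so ignored */
					 PG_WAIT_LOCK);
	ResetLatch(MyLatch);
}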

Releasing the lock:

LockRelease(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock) is called.

We will not analyze it in detail here; the lock is released after the transaction commits or rolls back.

II. Comparison of PostgreSQL Locks

SpinLock
Key traits: the lightest lock; only an exclusive mode; no wait queue; waiting is busy-waiting, so the CPU spins idly.

Use case: the lock is held briefly, the critical resource is accessed in a simple way, and the critical section is short, typically simple assignments, reads, and the like.

LWLock
Key traits: lightweight lock; besides the exclusive mode there is also a shared mode; there is a wait queue, and waiting is implemented via sem_wait.

Use case: longer critical sections with more complex logic and more involved operations on the critical resource, for example the clog buffers (transaction commit-status cache), shared buffers (data page cache), and the WAL buffers.

Lock

Key traits: heavyweight lock; can be held for a long time; waiting is implemented via epoll.

Use case: operations on any database object, for example inserting into, deleting from, updating, and querying tables.

