在此之前需要了解的名詞:
page,物理文件的單位,默認大小爲8K。
tuple,PG中物理行。
xid,事務號,執行操作時的順序id。
pageinspect裏邊有三個函數是本文用到的,他們分別是:
a.get_raw_page,根據參數表明、數據文件類型(main、fsm、vm)以及page位置,將當前表文件中的page內容返回。還有一個函數於此同名,只有兩個參數,是將第二個參數省略,直接使用'main'。
b.page_header,參數是函數get_raw_page的返回值,返回值是將本page結構中的PageHeaderData詳細信息。
c.heap_page_items,參數是函數get_raw_page的返回值,返回值是將page內的項指針(ItemIddata)以及HeapTupleHeaderData的詳細信息。
下面通過例子來對page的PageHeaderData、ItemIddata和HeapTupleHeaderData進行說明。
a.建表mvcc
[postgres@localhost bin]$ ./psql
psql (9.4.5)
Type "help" for help.
postgres=# create table mvcc(id int);
CREATE TABLE
postgres=# insert into mvcc values (1),(2);
INSERT 0 2
b.PageHeaderData
postgres=# select * from page_header(get_raw_page('mvcc', 'main', 0));
lsn | checksum | flags | lower | upper | special | pagesize | version | p
rune_xid
-----------+----------+-------+-------+-------+---------+----------+---------+--
---------
0/300F4D8 | 0 | 0 | 32 | 8128 | 8192 | 8192 | 4 |
0
(1 row)
這是表mvcc的PageHeaderData信息,現在比對源碼對各個字段進行解釋:typedef struct PageHeaderData
{
/* XXX LSN is member of *any* block, not only page-organized ones */
PageXLogRecPtr pd_lsn; /* LSN: next byte after last byte of xlog
* record for last change to this page */
uint16 pd_checksum; /* checksum */
uint16 pd_flags; /* flag bits, see below */
LocationIndex pd_lower; /* offset to start of free space */
LocationIndex pd_upper; /* offset to end of free space */
LocationIndex pd_special; /* offset to start of special space */
uint16 pd_pagesize_version;
TransactionId pd_prune_xid; /* oldest prunable XID, or zero if none */
ItemIdData pd_linp[1]; /* beginning of line pointer array */
} PageHeaderData;
pg_lsn:記錄最後一次對page修改的xlog記錄id。pg_checksum:頁面的校驗和,主要是通過函數pg_checksum_block函數生成的,0也是有效地,參數爲PageHeaderData和BLCKSZ(page's size)。當校驗和驗證失敗,即認爲當前頁面無效。
pg_flags:page的flags,具體值爲,可以疊加:
#define PD_HAS_FREE_LINES 0x0001 /* are there any unused line pointers? */
#define PD_PAGE_FULL 0x0002 /* not enough free space for new
* tuple? */
#define PD_ALL_VISIBLE 0x0004 /* all tuples on page are visible to
* everyone */
#define PD_VALID_FLAG_BITS 0x0007 /* OR of all valid pd_flags bits */
pg_lower和pg_upper:最後一個項指針的位置和最新的tuple位置。主要進行查找空閒位置,進行插入工作。pg_special:page預留的位置,可以存儲索引等信息。
pg_pagesize_version:page大小以及當前版本。page大小可以通過configure進行設置。version的意思是
/*
* Page layout version number 0 is for pre-7.3 Postgres releases.
* Releases 7.3 and 7.4 use 1, denoting a new HeapTupleHeader layout.
* Release 8.0 uses 2; it changed the HeapTupleHeader layout again.
* Release 8.1 uses 3; it redefined HeapTupleHeader infomask bits.
* Release 8.3 uses 4; it changed the HeapTupleHeader layout again, and
* added the pd_flags field (by stealing some bits from pd_tli),
* as well as adding the pd_prune_xid field (which enlarges the header).
*
* As of Release 9.3, the checksum version must also be considered when
* handling pages.
*/
pg_prune_xid:一般是最後一次刪除或者更新的xid。 pg_linp:項指針。
c.ItemIddata
postgres=# select lp,lp_off,lp_flags,lp_len from heap_page_items(get_raw_page('mvcc', 'main', 0));
lp | lp_off | lp_flags | lp_len
----+--------+----------+--------
1 | 8160 | 1 | 28
2 | 8128 | 1 | 28
(2 rows)
這是表mvcc的項指針的信息,一樣通過結合源碼進行介紹:
typedef struct ItemIdData
{
unsigned lp_off:15, /* offset to tuple (from start of page) */
lp_flags:2, /* state of item pointer, see below */
lp_len:15; /* byte length of tuple */
} ItemIdData;
lp:這是插件自己定義的列,在源碼中其實沒有,這個是項指針的順序。
lp_off:tuple在page中的位置。
lp_flags:tuple的flags,具體爲
#define LP_UNUSED 0 /* unused (should always have lp_len=0) */
#define LP_NORMAL 1 /* used (should always have lp_len>0) */
#define LP_REDIRECT 2 /* HOT redirect (should have lp_len=0) */
#define LP_DEAD 3 /* dead, may or may not have storage */
lp_len:
HeapTupleHeaderData
的長度+Oid的長度(8,因爲要數據對齊,所以在這裏會比原來預計的多4)。
d.HeapTupleHeaderData
postgres=# select * from heap_page_items(get_raw_page('mvcc', 'main', 0));
lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infom
ask2 | t_infomask | t_hoff | t_bits | t_oid
----+--------+----------+--------+--------+--------+----------+--------+--------
-----+------------+--------+--------+-------
1 | 8160 | 1 | 28 | 1831 | 0 | 0 | (0,1) |
1 | 2048 | 24 | |
2 | 8128 | 1 | 28 | 1831 | 0 | 0 | (0,2) |
1 | 2048 | 24 | |
(2 rows)
同樣結合源碼進行介紹:
typedef struct HeapTupleFields
{
TransactionId t_xmin; /* inserting xact ID */
TransactionId t_xmax; /* deleting or locking xact ID */
union
{
CommandId t_cid; /* inserting or deleting command ID, or both */
TransactionId t_xvac; /* old-style VACUUM FULL xact ID */
} t_field3;
} HeapTupleFields;
typedef struct DatumTupleFields
{
int32 datum_len_; /* varlena header (do not touch directly!) */
int32 datum_typmod; /* -1, or identifier of a record type */
Oid datum_typeid; /* composite type OID, or RECORDOID */
/*
* Note: field ordering is chosen with thought that Oid might someday
* widen to 64 bits.
*/
} DatumTupleFields;
struct HeapTupleHeaderData
{
union
{
HeapTupleFields t_heap;
DatumTupleFields t_datum;
} t_choice;
ItemPointerData t_ctid; /* current TID of this or newer tuple */
/* Fields below here must match MinimalTupleData! */
uint16 t_infomask2; /* number of attributes + various flags */
uint16 t_infomask; /* various flag bits, see below */
uint8 t_hoff; /* sizeof header incl. bitmap, padding */
/* ^ - 23 bytes - ^ */
bits8 t_bits[1]; /* bitmap of NULLs -- VARIABLE LENGTH */
/* MORE DATA FOLLOWS AT END OF STRUCT */
};
xmin和xmax是插入、刪除和更新時的事務ID,插入時會在xmin內寫入當前事務ID,當刪除時就會在xmax寫入當前事務ID。更新是進行刪除後再插入。
t_cid:這個是指一個事務內的命令ID,每個事務都是從0開始。
t_ctid:這個是指物理ID,結構如下:
typedef struct ItemPointerData
{
BlockIdData ip_blkid;
OffsetNumber ip_posid;
}
typedef struct BlockIdData
{
uint16 bi_hi;
uint16 bi_lo;
} BlockIdData;
存儲的爲bi_hi(文件號) << 16 | bi_lo(page號),來獲取磁盤順序,ip_posid是在page的中序號。以此來準確定位數據。
t_infomask2:表字段的個數以及一些flags,flags如下:
#define HEAP_NATTS_MASK 0x07FF /* 11 bits for number of attributes */
/* bits 0x1800 are available */
#define HEAP_KEYS_UPDATED 0x2000 /* tuple was updated and key cols
* modified, or tuple deleted */
#define HEAP_HOT_UPDATED 0x4000 /* tuple was HOT-updated */
#define HEAP_ONLY_TUPLE 0x8000 /* this is heap-only tuple */
#define HEAP2_XACT_MASK 0xE000 /* visibility-related bits */
t_infomask:tuple的flags,如下:#define HEAP_HASNULL 0x0001 /* has null attribute(s) */
#define HEAP_HASVARWIDTH 0x0002 /* has variable-width attribute(s) */
#define HEAP_HASEXTERNAL 0x0004 /* has external stored attribute(s) */
#define HEAP_HASOID 0x0008 /* has an object-id field */
#define HEAP_XMAX_KEYSHR_LOCK 0x0010 /* xmax is a key-shared locker */
#define HEAP_COMBOCID 0x0020 /* t_cid is a combo cid */
#define HEAP_XMAX_EXCL_LOCK 0x0040 /* xmax is exclusive locker */
#define HEAP_XMAX_LOCK_ONLY 0x0080 /* xmax, if valid, is only a locker */
/* xmax is a shared locker */
#define HEAP_XMAX_SHR_LOCK (HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_KEYSHR_LOCK)
#define HEAP_LOCK_MASK (HEAP_XMAX_SHR_LOCK | HEAP_XMAX_EXCL_LOCK | \
HEAP_XMAX_KEYSHR_LOCK)
#define HEAP_XMIN_COMMITTED 0x0100 /* t_xmin committed */
#define HEAP_XMIN_INVALID 0x0200 /* t_xmin invalid/aborted */
#define HEAP_XMIN_FROZEN (HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID)
#define HEAP_XMAX_COMMITTED 0x0400 /* t_xmax committed */
#define HEAP_XMAX_INVALID 0x0800 /* t_xmax invalid/aborted */
#define HEAP_XMAX_IS_MULTI 0x1000 /* t_xmax is a MultiXactId */
#define HEAP_UPDATED 0x2000 /* this is UPDATEd version of row */
#define HEAP_MOVED_OFF 0x4000 /* moved to another place by pre-9.0
t_hoff:
HeapTupleHeaderData長度,如果有Oid會增加4,但由於受到對齊的影響,會增加8。t_bits:具體數據,可以參照 PostgreSQL的基礎數據類型分析記錄。