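Choosing a Prefix Index Length for MySQL String Columns

Overview

String columns vary in length: one value may be empty while another runs to 1,024 characters. When such a column is queried frequently, an index helps, but indexing the full variable-length value is wasteful. It is better to index only a prefix of a suitable length. The steps below show how to find that length.

Finding the best prefix length

Create the table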

create table city (
	id int(11) not null auto_increment,
	cname varchar(255),
	primary key(id)
);

# Insert data
insert into city(cname) values('New York');
insert into city(cname) values('Beijing');
insert into city(cname) values('Guangzhou');
insert into city(cname) values('Shanghai');
insert into city(cname) values('Shenzhen');
insert into city(cname) values('Chongqing');
insert into city(cname) values('London');
insert into city(cname) values('Tokyo');
insert into city(cname) values('Soul');
insert into city(cname) values('Garden Grove');
insert into city(cname) values('Escobar');
insert into city(cname) values('Amroha');
insert into city(cname) values('Tegal');
insert into city(cname) values('Lancaster');
insert into city(cname) values('Jelets');
insert into city(cname) values('Ambattur');
insert into city(cname) values('Yingkou');
insert into city(cname) values('Monclova');
insert into city(cname) values('Dazhou');
insert into city(cname) values('Guangan');
insert into city(cname) values('Hangzhou');

# Duplicate the data (run several times)
insert into city(cname) select cname from city;

# Make the distribution uneven (each run deletes 1,000 random rows)
DELETE FROM city ORDER BY RAND() LIMIT 1000;

Analyzing the data

  • List the 10 most frequent cities.
select count(*) as cnt, cname from city group by cname order by cnt desc limit 10;

Result

+-----+-----------+
| cnt | cname     |
+-----+-----------+
| 654 | Guangan   |
| 652 | Lancaster |
| 645 | Shanghai  |
| 633 | Amroha    |
| 607 | Chongqing |
| 578 | Beijing   |
| 574 | Soul      |
| 574 | Ambattur  |
| 552 | Hangzhou  |
| 549 | New York  |
+-----+-----------+
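  • Compute the selectivity of the full column (distinct values divided by total rows), then look for a prefix length whose selectivity comes close to it.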
SELECT count(distinct cname)/count(*) FROM city;

Result

+--------------------------------+
| COUNT(distinct cname)/count(*) |
+--------------------------------+
|                         0.0018 |
+--------------------------------+
  • Compare the selectivity of several prefix lengths.
# If the data is evenly distributed, these values will also be close to one another.
SELECT count(distinct LEFT(cname, 3))/count(*) as factor3,
			 count(distinct LEFT(cname, 4))/count(*) as factor4,
			 count(distinct LEFT(cname, 5))/count(*) as factor5,
			 count(distinct LEFT(cname, 6))/count(*) as factor6,
			 count(distinct LEFT(cname, 7))/count(*) as factor7,
			 count(distinct LEFT(cname, 8))/count(*) as factor8,
			 count(distinct LEFT(cname, 9))/count(*) as factor9
FROM city;

Result

+---------+---------+---------+---------+---------+---------+---------+
| factor3 | factor4 | factor5 | factor6 | factor7 | factor8 | factor9 |
+---------+---------+---------+---------+---------+---------+---------+
|  0.0017 |  0.0017 |  0.0017 |  0.0018 |  0.0018 |  0.0018 |  0.0018 |
+---------+---------+---------+---------+---------+---------+---------+
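The same comparison can be sketched outside the database. The Python below is a minimal illustration, assuming the names have already been loaded into a list. The function `prefix_selectivity` and the sample data are hypothetical and are not the table used in this post.

```python
def prefix_selectivity(values, length=None):
    """Distinct prefixes divided by row count.

    length=None measures the full value, like COUNT(DISTINCT cname)/COUNT(*).
    """
    if not values:
        return 0.0
    keys = {v if length is None else v[:length] for v in values}
    return len(keys) / len(values)


# Hypothetical sample: six names, each repeated 100 times.
names = ["Guangan", "Guangzhou", "Shanghai", "Shenzhen", "Amroha", "Ambattur"] * 100

full = prefix_selectivity(names)
# Mirrors the factor3 .. factor9 columns of the SQL query.
by_length = {n: prefix_selectivity(names, n) for n in range(3, 10)}

for n, value in by_length.items():
    print(f"factor{n}: {value:.4f}")
print(f"full:    {full:.4f}")
```

In this sample the value stays below the full-column figure until length 6, because Guangan and Guangzhou only differ at the sixth character.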
  • List the 10 most frequent prefixes of length 4.
SELECT COUNT(1) cnt, LEFT(cname,4) AS pref FROM city GROUP BY pref ORDER BY cnt DESC LIMIT 10; 

Result

+------+------+
| cnt  | pref |
+------+------+
| 1188 | Guan |
|  652 | Lanc |
|  645 | Shan |
|  633 | Amro |
|  607 | Chon |
|  578 | Beij |
|  574 | Soul |
|  574 | Amba |
|  552 | Hang |
|  549 | New  |
+------+------+
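To see how a short prefix merges distinct values into one large group, the grouping can be reproduced with a counter. This is only a sketch: `top_prefixes` is a hypothetical helper and the row counts are invented for illustration.

```python
from collections import Counter


def top_prefixes(values, length, limit=10):
    """Return (prefix, count) pairs, most frequent first."""
    return Counter(v[:length] for v in values).most_common(limit)


# Hypothetical sample with made-up counts.
names = ["Guangan"] * 3 + ["Guangzhou"] * 2 + ["Lancaster"] * 4

top4 = top_prefixes(names, 4)  # Guangan and Guangzhou merge into "Guan"
top6 = top_prefixes(names, 6)  # they separate into "Guanga" and "Guangz"
print(top4)
print(top6)
```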

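Choosing the length: in my table, 6 is the shortest prefix length whose selectivity (0.0018) matches that of the full column. Shorter prefixes collide: at length 4, Guangan and Guangzhou share the prefix "Guan" and collapse into a single group of 1188 rows, nearly twice the size of any other group. At length 6 the distribution is the most even, so 6 is a good choice.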
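The selection rule can be written down as a small search. The sketch below assumes the goal is the shortest length whose distinct-prefix count reaches a chosen fraction of the distinct full values. The name `shortest_prefix_length` and the `min_ratio` parameter are hypothetical, and the sample list is not the table from this post.

```python
def shortest_prefix_length(values, max_length, min_ratio=1.0):
    """Shortest prefix length whose distinct count reaches
    min_ratio * (distinct count of the full values).

    Falls back to max_length if no shorter prefix qualifies.
    """
    if not values:
        raise ValueError("values must not be empty")
    distinct_full = len(set(values))
    for n in range(1, max_length + 1):
        distinct_prefix = len({v[:n] for v in values})
        if distinct_prefix >= min_ratio * distinct_full:
            return n
    return max_length


# Hypothetical sample.
names = ["Guangan", "Guangzhou", "Shanghai", "Shenzhen", "Amroha", "Ambattur"]

exact = shortest_prefix_length(names, 255)       # must match the full column
close = shortest_prefix_length(names, 255, 0.8)  # accept 80% of it
print(exact, close)
```

Accepting a ratio slightly below 1.0 trades a few collisions for a smaller index, but how much to give up depends on the data and the queries.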

Create the index on the cname column

ALTER TABLE city add key (cname(6));

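Summary

A prefix index is an effective way to make lookups faster while keeping the index small. The drawback is that MySQL cannot use a prefix index for GROUP BY or ORDER BY, and it cannot use one as a covering index.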
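The ordering limitation can be illustrated with a plain sort. This is only an analogy, not a description of how the storage engine works: sorting by a 4-character key stands in for an index that stores just the prefix, and the sample names are hypothetical.

```python
names = ["Guangzhou", "Guangan", "Beijing"]

# Ordering by the 4-character prefix, which is all a prefix index stores.
# Entries that share a prefix keep their original relative order.
by_prefix = sorted(names, key=lambda v: v[:4])

# Ordering by the full value, which is what ORDER BY cname requires.
fully_sorted = sorted(names)

print(by_prefix)
print(fully_sorted)
```

The two results differ because "Guan" cannot distinguish Guangan from Guangzhou. For the same reason the prefix alone cannot return the full column value, so the table rows still have to be read.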
