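Overview
A string column has variable length: a value may be 0 characters long or 1024. When we query such a column frequently, we can build an index on it, but the index should not store the full variable-length value. Instead, we should pick a suitable prefix length and index only that prefix. The steps below show how to find the best length.
Finding the best prefix index length
Create the table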
create table city (
id int(11) not null auto_increment,
cname varchar(255),
primary key(id)
);
# Insert data
insert into city(cname) values('New York');
insert into city(cname) values('Beijing');
insert into city(cname) values('Guangzhou');
insert into city(cname) values('Shanghai');
insert into city(cname) values('Shenzhen');
insert into city(cname) values('Chongqing');
insert into city(cname) values('London');
insert into city(cname) values('Tokyo');
insert into city(cname) values('Soul');
insert into city(cname) values('Garden Grove');
insert into city(cname) values('Escobar');
insert into city(cname) values('Amroha');
insert into city(cname) values('Tegal');
insert into city(cname) values('Lancaster');
insert into city(cname) values('Jelets');
insert into city(cname) values('Ambattur');
insert into city(cname) values('Yingkou');
insert into city(cname) values('Monclova');
insert into city(cname) values('Dazhou');
insert into city(cname) values('Guangan');
insert into city(cname) values('Hangzhou');
# Duplicate the data (run several times)
insert into city(cname) select cname from city;
# Randomize the data (each run deletes 1000 random rows)
DELETE FROM city ORDER BY RAND() LIMIT 1000;
Analyzing the data
- Show the 10 most frequent cities.
select count(*) as cnt, cname from city group by cname order by cnt desc limit 10;
Result
+-----+-----------+
| cnt | cname     |
+-----+-----------+
| 654 | Guangan |
| 652 | Lancaster |
| 645 | Shanghai |
| 633 | Amroha |
| 607 | Chongqing |
| 578 | Beijing |
| 574 | Soul |
| 574 | Ambattur |
| 552 | Hangzhou |
| 549 | New York |
+-----+-----------+
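- Compute the selectivity of the full column (distinct values divided by total rows), then choose a prefix length whose selectivity is close to that value.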
SELECT count(distinct cname)/count(*) FROM city;
Result
+--------------------------------+
| COUNT(distinct cname)/count(*) |
+--------------------------------+
|                         0.0018 |
+--------------------------------+
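To make the ratio in that query concrete, here is a minimal Python sketch of the same calculation. It is only an illustration: the function name `selectivity` and the four-row sample list are made up for this example and are not taken from the table above. It also ignores NULL handling, which `COUNT(DISTINCT ...)` in SQL treats specially.

```python
def selectivity(values):
    """Return distinct count / total count, like COUNT(DISTINCT col)/COUNT(*)."""
    values = list(values)
    if not values:
        raise ValueError("selectivity is undefined for an empty column")
    return len(set(values)) / len(values)


# Hypothetical sample (not the city table): 4 rows, 3 distinct names.
sample = ["Beijing", "Beijing", "Tokyo", "London"]
print(selectivity(sample))  # 0.75
```

A value close to 1 means almost every row is unique. A very small value, like the 0.0018 above, means each distinct value is repeated many times.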
- Compare the selectivity of several prefix lengths.
# If the data is evenly distributed, these results will be fairly even as well.
SELECT count(distinct LEFT(cname, 3))/count(*) as factor3,
count(distinct LEFT(cname, 4))/count(*) as factor4,
count(distinct LEFT(cname, 5))/count(*) as factor5,
count(distinct LEFT(cname, 6))/count(*) as factor6,
count(distinct LEFT(cname, 7))/count(*) as factor7,
count(distinct LEFT(cname, 8))/count(*) as factor8,
count(distinct LEFT(cname, 9))/count(*) as factor9
FROM city;
Result
+---------+---------+---------+---------+---------+---------+---------+
| factor3 | factor4 | factor5 | factor6 | factor7 | factor8 | factor9 |
+---------+---------+---------+---------+---------+---------+---------+
|  0.0017 |  0.0017 |  0.0017 |  0.0018 |  0.0018 |  0.0018 |  0.0018 |
+---------+---------+---------+---------+---------+---------+---------+
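The comparison above can also be automated. The following Python sketch looks for the shortest prefix length whose selectivity reaches that of the full column. The helper names `prefix_selectivity` and `shortest_good_prefix`, the `tolerance` parameter, and the five-city sample are all assumptions made for this illustration; in practice the numbers would come from the SQL query, not from Python.

```python
def prefix_selectivity(values, length):
    """Selectivity of the first `length` characters, like LEFT(col, length)."""
    return len({v[:length] for v in values}) / len(values)


def shortest_good_prefix(values, max_length, tolerance=0.0):
    """Return the shortest prefix length whose selectivity is within
    `tolerance` of the full column, or None if none qualifies."""
    full = len(set(values)) / len(values)
    for length in range(1, max_length + 1):
        if full - prefix_selectivity(values, length) <= tolerance:
            return length
    return None


# Hypothetical sample: "Guangzhou" and "Guangan" share their first 5 characters.
cities = ["Guangzhou", "Guangan", "Shanghai", "Shenzhen", "Beijing"]
print(shortest_good_prefix(cities, max_length=9))  # 6
```

With a non-zero `tolerance` the function accepts a slightly less selective but shorter prefix, which is the trade-off between index size and selectivity.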
- Show the 10 most frequent prefixes of length 4.
SELECT COUNT(1) cnt, LEFT(cname,4) AS pref FROM city GROUP BY pref ORDER BY cnt DESC LIMIT 10;
Result
+------+------+
| cnt  | pref |
+------+------+
| 1188 | Guan |
|  652 | Lanc |
|  645 | Shan |
|  633 | Amro |
|  607 | Chon |
|  578 | Beij |
|  574 | Soul |
|  574 | Amba |
|  552 | Hang |
|  549 | New  |
+------+------+
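The jump to 1188 for "Guan" happens because a prefix of 4 merges two different cities into one bucket. A small Python sketch of the same grouping shows the effect. The function name `top_prefixes` and the sample rows are hypothetical and serve only as an illustration.

```python
from collections import Counter


def top_prefixes(values, length, limit=10):
    """Count rows per prefix, like GROUP BY LEFT(col, length) ORDER BY cnt DESC."""
    return Counter(v[:length] for v in values).most_common(limit)


# Hypothetical sample: two cities collapse into the same 4-character prefix.
sample = ["Guangzhou", "Guangan", "Guangan", "Shanghai"]
print(top_prefixes(sample, 4))  # [('Guan', 3), ('Shan', 1)]
print(top_prefixes(sample, 6))  # [('Guanga', 2), ('Guangz', 1), ('Shangh', 1)]
```

When the top prefix counts are much larger than the top counts of the full values, the prefix is too short to separate them.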
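Choosing the result: in my table, a prefix length of 6 gives the most even distribution, and it is the shortest prefix whose selectivity matches the full column (0.0018), so it is a good choice.
Create the index on the cname column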
ALTER TABLE city add key (cname(6));
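Summary
A prefix index is an effective way to make lookups faster while keeping the index small. The drawback is that MySQL cannot use a prefix index for GROUP BY or ORDER BY, and cannot use it as a covering index.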