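Watermelon dataset 2.0
ID,color,root,knock sound,texture,navel,touch,good melon
1,green,curled,muffled,clear,sunken,hard-smooth,yes
2,dark,curled,dull,clear,sunken,hard-smooth,yes
3,dark,curled,muffled,clear,sunken,hard-smooth,yes
4,green,curled,dull,clear,sunken,hard-smooth,yes
5,light,curled,muffled,clear,sunken,hard-smooth,yes
6,green,slightly curled,muffled,clear,slightly sunken,soft-sticky,yes
7,dark,slightly curled,muffled,slightly blurry,slightly sunken,soft-sticky,yes
8,dark,slightly curled,muffled,clear,slightly sunken,hard-smooth,yes
9,dark,slightly curled,dull,slightly blurry,slightly sunken,hard-smooth,no
10,green,stiff,crisp,clear,flat,soft-sticky,no
11,light,stiff,crisp,blurry,flat,hard-smooth,no
12,light,curled,muffled,blurry,flat,soft-sticky,no
13,green,slightly curled,muffled,slightly blurry,sunken,hard-smooth,no
14,light,slightly curled,dull,slightly blurry,sunken,hard-smooth,no
15,dark,slightly curled,muffled,clear,slightly sunken,soft-sticky,no
16,light,curled,muffled,blurry,flat,hard-smooth,no
17,green,curled,dull,slightly blurry,slightly sunken,hard-smooth,no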
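Computing information entropy
import math

def ent(*ps: float) -> float:
    """Return the entropy (in bits) of the class probabilities ps."""
    total = 0.0  # named total so the built-in sum is not shadowed
    for p in ps:
        # A class with probability 0 contributes nothing (0 * log 0 is taken as 0).
        if p > 0.0:
            total += -p * math.log(p, 2)
    return total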
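For reference, the function above matches the usual definition of information entropy, and the sections below rank attributes by the usual information gain. The symbols here (D for a sample set, p_k for the proportion of class k, D^v for the subset where attribute a takes its v-th value) are notation assumed for this note, not taken from the original text.

```latex
\mathrm{Ent}(D) = -\sum_{k} p_k \log_2 p_k,
\qquad
\mathrm{Gain}(D, a) = \mathrm{Ent}(D) - \sum_{v} \frac{|D^v|}{|D|}\,\mathrm{Ent}(D^v)
```

In the calculations that follow, each per-attribute total such as Ent1 or Ent2 is the weighted sum in the second formula, so the gain is the initial entropy Ent minus that total.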
First split
# Entropy of the full dataset (8 good melons, 9 bad melons)
p1 = 8 / 17
p2 = 9 / 17
Ent = ent(p1, p2)  # 0.9975025463691153
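# Entropy of the color = green subset (3 good, 3 bad)
p1 = 3 / 6
p2 = 3 / 6
Ent1_1 = ent(p1, p2)  # 1.0
# Entropy of the color = dark subset (4 good, 2 bad)
p1 = 4 / 6
p2 = 2 / 6
Ent1_2 = ent(p1, p2)  # 0.9182958340544896
# Entropy of the color = light subset (1 good, 4 bad)
p1 = 1 / 5
p2 = 4 / 5
Ent1_3 = ent(p1, p2)  # 0.7219280948873623
# Weighted entropy after splitting on color
Ent1 = 6 / 17 * Ent1_1 + 6 / 17 * Ent1_2 + 5 / 17 * Ent1_3  # 0.88937738110375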
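# Entropy of the root = curled subset (5 good, 3 bad)
p1 = 5 / 8
p2 = 3 / 8
Ent2_1 = ent(p1, p2)  # 0.9544340029249649
# Entropy of the root = slightly curled subset (3 good, 4 bad)
p1 = 3 / 7
p2 = 4 / 7
Ent2_2 = ent(p1, p2)  # 0.9852281360342516
# Entropy of the root = stiff subset (0 good, 2 bad)
p1 = 0.0
p2 = 2 / 2
Ent2_3 = ent(p1, p2)  # 0.0, the subset is pure
# Weighted entropy after splitting on root
Ent2 = 8 / 17 * Ent2_1 + 7 / 17 * Ent2_2 + 2 / 17 * Ent2_3  # 0.8548275868023224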
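# Entropy of the knock sound = muffled subset (6 good, 4 bad)
p1 = 6 / 10
p2 = 4 / 10
Ent3_1 = ent(p1, p2)  # 0.9709505944546686
# Entropy of the knock sound = dull subset (2 good, 3 bad)
p1 = 2 / 5
p2 = 3 / 5
Ent3_2 = ent(p1, p2)  # 0.9709505944546686
# Entropy of the knock sound = crisp subset (0 good, 2 bad)
p1 = 0.0
p2 = 2 / 2
Ent3_3 = ent(p1, p2)  # 0.0, the subset is pure
# Weighted entropy after splitting on knock sound
Ent3 = 10 / 17 * Ent3_1 + 5 / 17 * Ent3_2 + 2 / 17 * Ent3_3  # 0.8567211127541194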
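# Entropy of the texture = clear subset (7 good, 2 bad)
p1 = 7 / 9
p2 = 2 / 9
Ent4_1 = ent(p1, p2)  # 0.7642045065086203
# Entropy of the texture = slightly blurry subset (1 good, 4 bad)
p1 = 1 / 5
p2 = 4 / 5
Ent4_2 = ent(p1, p2)  # 0.7219280948873623
# Entropy of the texture = blurry subset (0 good, 3 bad)
p1 = 0.0
p2 = 3 / 3
Ent4_3 = ent(p1, p2)  # 0.0, the subset is pure
# Weighted entropy after splitting on texture
Ent4 = 9 / 17 * Ent4_1 + 5 / 17 * Ent4_2 + 3 / 17 * Ent4_3  # 0.6169106490008467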
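# Entropy of the navel = sunken subset (5 good, 2 bad)
p1 = 5 / 7
p2 = 2 / 7
Ent5_1 = ent(p1, p2)  # 0.863120568566631
# Entropy of the navel = slightly sunken subset (3 good, 3 bad)
p1 = 3 / 6
p2 = 3 / 6
Ent5_2 = ent(p1, p2)  # 1.0
# Entropy of the navel = flat subset (0 good, 4 bad)
p1 = 0.0
p2 = 4 / 4
Ent5_3 = ent(p1, p2)  # 0.0, the subset is pure
# Weighted entropy after splitting on navel
Ent5 = 7 / 17 * Ent5_1 + 6 / 17 * Ent5_2 + 4 / 17 * Ent5_3  # 0.7083437635274363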
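# Entropy of the touch = hard-smooth subset (6 good, 6 bad)
p1 = 6 / 12
p2 = 6 / 12
Ent6_1 = ent(p1, p2)  # 1.0
# Entropy of the touch = soft-sticky subset (2 good, 3 bad)
p1 = 2 / 5
p2 = 3 / 5
Ent6_2 = ent(p1, p2)  # 0.9709505944546686
# Weighted entropy after splitting on touch
Ent6 = 12 / 17 * Ent6_1 + 5 / 17 * Ent6_2  # 0.9914560571925497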
Texture has the largest information gain (0.9975025463691153 - 0.6169106490008467 = 0.3805918973682686), so texture is chosen as the splitting attribute.
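The hand calculation above can be cross-checked by computing the gain of every attribute directly from the table. This is a minimal sketch, not the original author's code: the English attribute values and the helper names `entropy` and `conditional_entropy` are assumptions introduced for illustration.

```python
import math
from collections import Counter

# Watermelon dataset 2.0 (values translated to English); the last field is the label.
ATTRS = ["color", "root", "knock", "texture", "navel", "touch"]
DATA = """\
green,curled,muffled,clear,sunken,hard-smooth,yes
dark,curled,dull,clear,sunken,hard-smooth,yes
dark,curled,muffled,clear,sunken,hard-smooth,yes
green,curled,dull,clear,sunken,hard-smooth,yes
light,curled,muffled,clear,sunken,hard-smooth,yes
green,slightly curled,muffled,clear,slightly sunken,soft-sticky,yes
dark,slightly curled,muffled,slightly blurry,slightly sunken,soft-sticky,yes
dark,slightly curled,muffled,clear,slightly sunken,hard-smooth,yes
dark,slightly curled,dull,slightly blurry,slightly sunken,hard-smooth,no
green,stiff,crisp,clear,flat,soft-sticky,no
light,stiff,crisp,blurry,flat,hard-smooth,no
light,curled,muffled,blurry,flat,soft-sticky,no
green,slightly curled,muffled,slightly blurry,sunken,hard-smooth,no
light,slightly curled,dull,slightly blurry,sunken,hard-smooth,no
dark,slightly curled,muffled,clear,slightly sunken,soft-sticky,no
light,curled,muffled,blurry,flat,hard-smooth,no
green,curled,dull,slightly blurry,slightly sunken,hard-smooth,no
"""

def entropy(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    return sum(-c / n * math.log2(c / n) for c in Counter(labels).values())

def conditional_entropy(rows, col):
    """Weighted entropy of the labels after grouping rows by column col."""
    groups = {}
    for row in rows:
        groups.setdefault(row[col], []).append(row[-1])
    return sum(len(g) / len(rows) * entropy(g) for g in groups.values())

rows = [line.split(",") for line in DATA.strip().splitlines()]
base = entropy([r[-1] for r in rows])
gains = {a: base - conditional_entropy(rows, i) for i, a in enumerate(ATTRS)}
best = max(gains, key=gains.get)

for a in ATTRS:
    print(f"{a}: gain = {gains[a]:.4f}")
print("chosen attribute:", best)
```

Grouping the rows by attribute value avoids typing the per-subset counts by hand, which is where transcription mistakes are most likely.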
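Second split
Every sample in the texture = blurry branch is a bad melon, so that branch needs no further splitting. The texture = slightly blurry branch (samples 7, 9, 13, 14 and 17) is worked through below as an example.
# Entropy of the slightly blurry branch (1 good, 4 bad)
p1 = 1 / 5
p2 = 4 / 5
Ent = ent(p1, p2)  # 0.7219280948873623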
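# Entropy of the color = green subset (0 good, 2 bad)
p1 = 0.0
p2 = 2 / 2
Ent1_1 = 0.0  # the subset is pure
# Entropy of the color = dark subset (1 good, 1 bad)
p1 = 1 / 2
p2 = 1 / 2
Ent1_2 = ent(p1, p2)  # 1.0
# Entropy of the color = light subset (0 good, 1 bad)
p1 = 0.0
p2 = 1 / 1
Ent1_3 = 0.0  # the subset is pure
# Weighted entropy after splitting on color
Ent1 = 2 / 5 * Ent1_1 + 2 / 5 * Ent1_2 + 1 / 5 * Ent1_3  # 0.4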
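# Entropy of the root = curled subset (0 good, 1 bad)
p1 = 0.0
p2 = 1 / 1
Ent2_1 = 0.0  # the subset is pure
# Entropy of the root = slightly curled subset (1 good, 3 bad)
p1 = 1 / 4
p2 = 3 / 4
Ent2_2 = ent(p1, p2)  # 0.8112781244591328
# Entropy of the root = stiff subset (no samples in this branch)
p1 = 0.0
p2 = 0.0
Ent2_3 = 0.0  # an empty subset has weight 0 and contributes nothing
# Weighted entropy after splitting on root
Ent2 = 1 / 5 * Ent2_1 + 4 / 5 * Ent2_2 + 0 * Ent2_3  # 0.64902249956730624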
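# Entropy of the knock sound = muffled subset (1 good, 1 bad)
p1 = 1 / 2
p2 = 1 / 2
Ent3_1 = ent(p1, p2)  # 1.0
# Entropy of the knock sound = dull subset (0 good, 3 bad)
p1 = 0.0
p2 = 3 / 3
Ent3_2 = 0.0  # the subset is pure
# Entropy of the knock sound = crisp subset (no samples in this branch)
p1 = 0.0
p2 = 0.0
Ent3_3 = 0.0  # an empty subset has weight 0 and contributes nothing
# Weighted entropy after splitting on knock sound
Ent3 = 2 / 5 * Ent3_1 + 3 / 5 * Ent3_2 + 0.0 * Ent3_3  # 0.4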
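# Entropy of the navel = sunken subset (0 good, 2 bad)
p1 = 0.0
p2 = 2 / 2
Ent5_1 = 0.0  # the subset is pure
# Entropy of the navel = slightly sunken subset (1 good, 2 bad)
p1 = 1 / 3
p2 = 2 / 3
Ent5_2 = ent(p1, p2)  # 0.9182958340544896
# Entropy of the navel = flat subset (no samples in this branch)
p1 = 0.0
p2 = 0.0
Ent5_3 = 0.0  # an empty subset has weight 0 and contributes nothing
# Weighted entropy after splitting on navel
Ent5 = 2 / 5 * Ent5_1 + 3 / 5 * Ent5_2 + 0 * Ent5_3  # 0.5509775004326938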
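# Entropy of the touch = hard-smooth subset (0 good, 4 bad)
p1 = 0.0
p2 = 4 / 4
Ent6_1 = 0.0  # the subset is pure
# Entropy of the touch = soft-sticky subset (1 good, 0 bad)
p1 = 1 / 1
p2 = 0.0
Ent6_2 = 0.0  # the subset is pure
# Weighted entropy after splitting on touch
Ent6 = 4 / 5 * Ent6_1 + 1 / 5 * Ent6_2  # 0.0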
Touch has the largest information gain (0.7219280948873623 - 0.0 = 0.7219280948873623), so touch is chosen as the splitting attribute.
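The gain for touch equals the full entropy of the branch because both of its child subsets are pure. A quick way to see this is to tabulate the class counts per attribute value for the five samples in this branch. The sketch below is illustrative only: the helper name `class_counts` and the English attribute values are assumptions, not part of the original notes.

```python
from collections import Counter

# Attributes still available in this branch (texture has already been used).
ATTRS = ["color", "root", "knock", "navel", "touch"]
# Samples 7, 9, 13, 14, 17 (texture = slightly blurry); the last field is the label.
SUBSET = [
    ("dark", "slightly curled", "muffled", "slightly sunken", "soft-sticky", "yes"),
    ("dark", "slightly curled", "dull", "slightly sunken", "hard-smooth", "no"),
    ("green", "slightly curled", "muffled", "sunken", "hard-smooth", "no"),
    ("light", "slightly curled", "dull", "sunken", "hard-smooth", "no"),
    ("green", "curled", "dull", "slightly sunken", "hard-smooth", "no"),
]

def class_counts(rows, col):
    """Map each value of column col to a Counter of the labels seen with it."""
    counts = {}
    for row in rows:
        counts.setdefault(row[col], Counter())[row[-1]] += 1
    return counts

tables = {a: class_counts(SUBSET, i) for i, a in enumerate(ATTRS)}
# An attribute separates the branch perfectly when every value maps to one label.
pure = [a for a, t in tables.items() if all(len(c) == 1 for c in t.values())]

for a, t in tables.items():
    print(a, {v: dict(c) for v, c in t.items()})
print("attributes whose children are all pure:", pure)
```

Only touch ends up in `pure`, which is why its weighted entropy is 0 and no other attribute can match its gain in this branch.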
Final result
The remaining splits follow the same procedure as the two splits above, so they are not repeated here.
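Since the remaining splits repeat the same steps, the whole procedure can be written as a short recursion. The following is a sketch under stated assumptions rather than the original author's implementation: ties in information gain are broken by attribute order, branches are created only for values that actually occur in the current subset, and the names `build`, `split`, `gain` and `predict` are hypothetical helpers.

```python
import math
from collections import Counter

ATTRS = ["color", "root", "knock", "texture", "navel", "touch"]
# Watermelon dataset 2.0 (values translated to English); the last field is the label.
DATA = """\
green,curled,muffled,clear,sunken,hard-smooth,yes
dark,curled,dull,clear,sunken,hard-smooth,yes
dark,curled,muffled,clear,sunken,hard-smooth,yes
green,curled,dull,clear,sunken,hard-smooth,yes
light,curled,muffled,clear,sunken,hard-smooth,yes
green,slightly curled,muffled,clear,slightly sunken,soft-sticky,yes
dark,slightly curled,muffled,slightly blurry,slightly sunken,soft-sticky,yes
dark,slightly curled,muffled,clear,slightly sunken,hard-smooth,yes
dark,slightly curled,dull,slightly blurry,slightly sunken,hard-smooth,no
green,stiff,crisp,clear,flat,soft-sticky,no
light,stiff,crisp,blurry,flat,hard-smooth,no
light,curled,muffled,blurry,flat,soft-sticky,no
green,slightly curled,muffled,slightly blurry,sunken,hard-smooth,no
light,slightly curled,dull,slightly blurry,sunken,hard-smooth,no
dark,slightly curled,muffled,clear,slightly sunken,soft-sticky,no
light,curled,muffled,blurry,flat,hard-smooth,no
green,curled,dull,slightly blurry,slightly sunken,hard-smooth,no
"""

def entropy(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    return sum(-c / n * math.log2(c / n) for c in Counter(labels).values())

def split(rows, col):
    """Group rows by their value in column col."""
    groups = {}
    for row in rows:
        groups.setdefault(row[col], []).append(row)
    return groups

def gain(rows, col):
    """Information gain obtained by splitting rows on column col."""
    parts = split(rows, col).values()
    cond = sum(len(p) / len(rows) * entropy([r[-1] for r in p]) for p in parts)
    return entropy([r[-1] for r in rows]) - cond

def build(rows, cols):
    """Return a label for a leaf, or {attribute: {value: subtree}} for a split."""
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1 or not cols:
        return Counter(labels).most_common(1)[0][0]
    best = max(cols, key=lambda c: gain(rows, c))  # first maximum wins on ties
    rest = [c for c in cols if c != best]
    return {ATTRS[best]: {v: build(p, rest) for v, p in split(rows, best).items()}}

def predict(tree, row):
    """Follow the branches of tree for one row until a leaf label is reached."""
    while isinstance(tree, dict):
        name, branches = next(iter(tree.items()))
        tree = branches[row[ATTRS.index(name)]]
    return tree

rows = [line.split(",") for line in DATA.strip().splitlines()]
tree = build(rows, list(range(len(ATTRS))))
print(tree)
```

The root of the resulting tree is texture, and its slightly blurry branch splits on touch, matching the two hand-worked splits. Because several attributes tie in some of the deeper branches, a different tie-breaking rule can produce a different but equally valid tree there.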