ID3 Decision Tree (Based on Watermelon Dataset 2.0)

Watermelon Dataset 2.0

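ID,color,root,sound,texture,navel,touch,good melon
1,   green,curly,muffled,clear,hollow,hard-smooth,yes
2,   dark,curly,dull,clear,hollow,hard-smooth,yes
3,   dark,curly,muffled,clear,hollow,hard-smooth,yes
4,   green,curly,dull,clear,hollow,hard-smooth,yes
5,   light,curly,muffled,clear,hollow,hard-smooth,yes
6,   green,slightly curly,muffled,clear,slightly hollow,soft-sticky,yes
7,   dark,slightly curly,muffled,slightly blurry,slightly hollow,soft-sticky,yes
8,   dark,slightly curly,muffled,clear,slightly hollow,hard-smooth,yes
9,   dark,slightly curly,dull,slightly blurry,slightly hollow,hard-smooth,no
10,  green,stiff,crisp,clear,flat,soft-sticky,no
11,  light,stiff,crisp,blurry,flat,hard-smooth,no
12,  light,curly,muffled,blurry,flat,soft-sticky,no
13,  green,slightly curly,muffled,slightly blurry,hollow,hard-smooth,no
14,  light,slightly curly,dull,slightly blurry,hollow,hard-smooth,no
15,  dark,slightly curly,muffled,clear,slightly hollow,soft-sticky,no
16,  light,curly,muffled,blurry,flat,hard-smooth,no
17,  green,curly,dull,slightly blurry,slightly hollow,hard-smooth,no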

Computing the information entropy

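    import math

    def ent(*ps: float) -> float:
        """Return the information entropy (in bits) of the probabilities ps."""
        total = 0.0  # avoid shadowing the built-in sum()
        for p in ps:
            # By convention 0 * log2(0) = 0, so zero probabilities add nothing.
            if p > 0.0:
                total += -1 * p * math.log(p, 2)
        return total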
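
For reference, the quantities computed in the splits that follow are the standard ID3 entropy and information gain. The symbols $D$, $p_k$, $a$, and $D^v$ are notation introduced here for illustration and do not appear in the original post.

```latex
\mathrm{Ent}(D) = -\sum_{k} p_k \log_2 p_k
\qquad
\mathrm{Gain}(D, a) = \mathrm{Ent}(D) - \sum_{v} \frac{|D^v|}{|D|}\,\mathrm{Ent}(D^v)
```

Here $D$ is the current sample set, $p_k$ is the fraction of samples in class $k$ (good or bad melon), and $D^v$ is the subset of $D$ whose attribute $a$ takes the value $v$. Terms with $p_k = 0$ are taken to be 0.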

First split

    # Entropy of the full data set (8 good melons, 9 bad melons)
    p1 = 8 / 17
    p2 = 9 / 17
    Ent = ent(p1, p2)  # 0.9975025463691153

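    # Entropy of the color = green branch
    p1 = 3 / 6
    p2 = 3 / 6
    Ent1_1 = ent(p1, p2)  # 1.0
    # Entropy of the color = dark branch
    p1 = 4 / 6
    p2 = 2 / 6
    Ent1_2 = ent(p1, p2)  # 0.9182958340544896
    # Entropy of the color = light branch
    p1 = 1 / 5
    p2 = 4 / 5
    Ent1_3 = ent(p1, p2)  # 0.7219280948873623
    # Conditional entropy after splitting on color (branch entropies weighted by branch size)
    Ent1 = 6 / 17 * Ent1_1 + 6 / 17 * Ent1_2 + 5 / 17 * Ent1_3  # 0.88937738110375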

    # Entropy of the root = curly branch
    p1 = 5 / 8
    p2 = 3 / 8
    Ent2_1 = ent(p1, p2)  # 0.9544340029249649
    # Entropy of the root = slightly curly branch
    p1 = 3 / 7
    p2 = 4 / 7
    Ent2_2 = ent(p1, p2)  # 0.9852281360342516
    # Entropy of the root = stiff branch (both samples are bad melons)
    p1 = 0.0
    p2 = 2 / 2
    Ent2_3 = ent(p1, p2)  # 0.0
    # Conditional entropy after splitting on root
    Ent2 = 8 / 17 * Ent2_1 + 7 / 17 * Ent2_2 + 2 / 17 * Ent2_3  # 0.8548275868023224

    # Entropy of the sound = muffled branch
    p1 = 6 / 10
    p2 = 4 / 10
    Ent3_1 = ent(p1, p2)  # 0.9709505944546686
    # Entropy of the sound = dull branch
    p1 = 2 / 5
    p2 = 3 / 5
    Ent3_2 = ent(p1, p2)  # 0.9709505944546686
    # Entropy of the sound = crisp branch (both samples are bad melons)
    p1 = 0.0
    p2 = 2 / 2
    Ent3_3 = ent(p1, p2)  # 0.0
    # Conditional entropy after splitting on sound
    Ent3 = 10 / 17 * Ent3_1 + 5 / 17 * Ent3_2 + 2 / 17 * Ent3_3  # 0.8567211127541194

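    # Entropy of the texture = clear branch
    p1 = 7 / 9
    p2 = 2 / 9
    Ent4_1 = ent(p1, p2)  # 0.7642045065086203
    # Entropy of the texture = slightly blurry branch
    p1 = 1 / 5
    p2 = 4 / 5
    Ent4_2 = ent(p1, p2)  # 0.7219280948873623
    # Entropy of the texture = blurry branch (all three samples are bad melons)
    p1 = 0.0
    p2 = 3 / 3
    Ent4_3 = ent(p1, p2)  # 0.0
    # Conditional entropy after splitting on texture
    Ent4 = 9 / 17 * Ent4_1 + 5 / 17 * Ent4_2 + 3 / 17 * Ent4_3  # 0.6169106490008467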

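    # Entropy of the navel = hollow branch
    p1 = 5 / 7
    p2 = 2 / 7
    Ent5_1 = ent(p1, p2)  # 0.863120568566631
    # Entropy of the navel = slightly hollow branch
    p1 = 3 / 6
    p2 = 3 / 6
    Ent5_2 = ent(p1, p2)  # 1.0
    # Entropy of the navel = flat branch (all four samples are bad melons)
    p1 = 0.0
    p2 = 4 / 4
    Ent5_3 = ent(p1, p2)  # 0.0
    # Conditional entropy after splitting on navel
    Ent5 = 7 / 17 * Ent5_1 + 6 / 17 * Ent5_2 + 4 / 17 * Ent5_3  # 0.7083437635274363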

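    # Entropy of the touch = hard-smooth branch
    p1 = 6 / 12
    p2 = 6 / 12
    Ent6_1 = ent(p1, p2)  # 1.0
    # Entropy of the touch = soft-sticky branch
    p1 = 2 / 5
    p2 = 3 / 5
    Ent6_2 = ent(p1, p2)  # 0.9709505944546686
    # Conditional entropy after splitting on touch
    Ent6 = 12 / 17 * Ent6_1 + 5 / 17 * Ent6_2  # 0.9914560571925497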

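The information gain of an attribute is the entropy of the full data set minus the conditional entropy after splitting on that attribute, so the attribute with the smallest conditional entropy has the largest gain. Texture has the largest information gain (0.9975025463691153 - 0.6169106490008467 = 0.3805918973682686), so texture is chosen as the splitting attribute.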
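
The hand calculation above can be cross-checked with a short script. This is a minimal sketch, not the original author's code: the names `ATTRS`, `DATA`, `entropy`, and `gain` are hypothetical, and the attribute values are English renderings of the values in the data set.

```python
import math
from collections import Counter, defaultdict

ATTRS = ["color", "root", "sound", "texture", "navel", "touch"]

# Samples 1-17 of Watermelon Dataset 2.0; the last field is the label.
DATA = """
green,curly,muffled,clear,hollow,hard-smooth,yes
dark,curly,dull,clear,hollow,hard-smooth,yes
dark,curly,muffled,clear,hollow,hard-smooth,yes
green,curly,dull,clear,hollow,hard-smooth,yes
light,curly,muffled,clear,hollow,hard-smooth,yes
green,slightly curly,muffled,clear,slightly hollow,soft-sticky,yes
dark,slightly curly,muffled,slightly blurry,slightly hollow,soft-sticky,yes
dark,slightly curly,muffled,clear,slightly hollow,hard-smooth,yes
dark,slightly curly,dull,slightly blurry,slightly hollow,hard-smooth,no
green,stiff,crisp,clear,flat,soft-sticky,no
light,stiff,crisp,blurry,flat,hard-smooth,no
light,curly,muffled,blurry,flat,soft-sticky,no
green,slightly curly,muffled,slightly blurry,hollow,hard-smooth,no
light,slightly curly,dull,slightly blurry,hollow,hard-smooth,no
dark,slightly curly,muffled,clear,slightly hollow,soft-sticky,no
light,curly,muffled,blurry,flat,hard-smooth,no
green,curly,dull,slightly blurry,slightly hollow,hard-smooth,no
"""
ROWS = [line.split(",") for line in DATA.strip().splitlines()]


def entropy(labels):
    """Information entropy (in bits) of a list of class labels."""
    n = len(labels)
    return sum(-c / n * math.log2(c / n) for c in Counter(labels).values())


def gain(rows, col):
    """Information gain obtained by splitting rows on column index col."""
    branches = defaultdict(list)
    for row in rows:
        branches[row[col]].append(row[-1])
    cond = sum(len(b) / len(rows) * entropy(b) for b in branches.values())
    return entropy([row[-1] for row in rows]) - cond


gains = {attr: gain(ROWS, i) for i, attr in enumerate(ATTRS)}
for attr, g in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{attr}: {g:.4f}")

best = max(gains, key=gains.get)
print("split on:", best)
```

Grouping labels by attribute value avoids counting each branch by hand, which is where mistakes in a manual tally are most likely.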

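Second split

The blurry branch contains only bad melons, so it needs no further splitting. The slightly blurry branch (samples 7, 9, 13, 14, and 17) is used as the example below.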

    # Entropy of the slightly blurry branch (1 good melon, 4 bad melons)
    p1 = 1 / 5
    p2 = 4 / 5
    Ent = ent(p1, p2)  # 0.7219280948873623

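    # Entropy of the color = green branch (both samples are bad melons)
    p1 = 0.0
    p2 = 2 / 2
    Ent1_1 = ent(p1, p2)  # 0.0
    # Entropy of the color = dark branch
    p1 = 1 / 2
    p2 = 1 / 2
    Ent1_2 = ent(p1, p2)  # 1.0
    # Entropy of the color = light branch (the single sample is a bad melon)
    p1 = 0.0
    p2 = 1 / 1
    Ent1_3 = ent(p1, p2)  # 0.0
    # Conditional entropy after splitting on color
    Ent1 = 2 / 5 * Ent1_1 + 2 / 5 * Ent1_2 + 1 / 5 * Ent1_3  # 0.4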

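    # Entropy of the root = curly branch (the single sample is a bad melon)
    p1 = 0.0
    p2 = 1 / 1
    Ent2_1 = ent(p1, p2)  # 0.0
    # Entropy of the root = slightly curly branch
    p1 = 1 / 4
    p2 = 3 / 4
    Ent2_2 = ent(p1, p2)  # 0.8112781244591328
    # The root = stiff branch has no samples here, so its weight is 0
    p1 = 0.0
    p2 = 0.0
    Ent2_3 = 0.0
    # Conditional entropy after splitting on root
    Ent2 = 1 / 5 * Ent2_1 + 4 / 5 * Ent2_2 + 0 * Ent2_3  # 0.64902249956730624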

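    # Entropy of the sound = muffled branch
    p1 = 1 / 2
    p2 = 1 / 2
    Ent3_1 = ent(p1, p2)  # 1.0
    # Entropy of the sound = dull branch (all three samples are bad melons)
    p1 = 0.0
    p2 = 3 / 3
    Ent3_2 = ent(p1, p2)  # 0.0
    # The sound = crisp branch has no samples here, so its weight is 0
    p1 = 0.0
    p2 = 0.0
    Ent3_3 = 0.0
    # Conditional entropy after splitting on sound
    Ent3 = 2 / 5 * Ent3_1 + 3 / 5 * Ent3_2 + 0.0 * Ent3_3  # 0.4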

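    # Entropy of the navel = hollow branch (both samples are bad melons)
    p1 = 0.0
    p2 = 2 / 2
    Ent5_1 = ent(p1, p2)  # 0.0
    # Entropy of the navel = slightly hollow branch
    p1 = 1 / 3
    p2 = 2 / 3
    Ent5_2 = ent(p1, p2)  # 0.9182958340544896
    # The navel = flat branch has no samples here, so its weight is 0
    p1 = 0.0
    p2 = 0.0
    Ent5_3 = 0.0
    # Conditional entropy after splitting on navel
    Ent5 = 2 / 5 * Ent5_1 + 3 / 5 * Ent5_2 + 0 * Ent5_3  # 0.5509775004326938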

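    # Entropy of the touch = hard-smooth branch (all four samples are bad melons)
    p1 = 0.0
    p2 = 4 / 4
    Ent6_1 = ent(p1, p2)  # 0.0
    # Entropy of the touch = soft-sticky branch (the single sample is a good melon)
    p1 = 1 / 1
    p2 = 0.0
    Ent6_2 = ent(p1, p2)  # 0.0
    # Conditional entropy after splitting on touch
    Ent6 = 4 / 5 * Ent6_1 + 1 / 5 * Ent6_2  # 0.0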

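Touch has the largest information gain (0.7219280948873623 - 0.0 = 0.7219280948873623), so touch is chosen as the splitting attribute. Both of its branches are pure, so the slightly blurry branch needs no further splitting.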
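
The same check can be run on just the five samples of the slightly blurry branch. The sketch below hard-codes those rows with the texture column dropped. The names `SUBSET`, `cond`, and `gains` are hypothetical, and the attribute values are English renderings of the data set's values.

```python
import math
from collections import Counter, defaultdict

ATTRS = ["color", "root", "sound", "navel", "touch"]

# Samples with texture = slightly blurry; the last field is the label.
SUBSET = [
    ("dark", "slightly curly", "muffled", "slightly hollow", "soft-sticky", "yes"),  # 7
    ("dark", "slightly curly", "dull", "slightly hollow", "hard-smooth", "no"),      # 9
    ("green", "slightly curly", "muffled", "hollow", "hard-smooth", "no"),           # 13
    ("light", "slightly curly", "dull", "hollow", "hard-smooth", "no"),              # 14
    ("green", "curly", "dull", "slightly hollow", "hard-smooth", "no"),              # 17
]


def entropy(labels):
    """Information entropy (in bits) of a list of class labels."""
    n = len(labels)
    return sum(-c / n * math.log2(c / n) for c in Counter(labels).values())


base = entropy([row[-1] for row in SUBSET])
cond, gains = {}, {}
for i, attr in enumerate(ATTRS):
    # Group the labels by the value this attribute takes.
    branches = defaultdict(list)
    for row in SUBSET:
        branches[row[i]].append(row[-1])
    cond[attr] = sum(len(b) / len(SUBSET) * entropy(b) for b in branches.values())
    gains[attr] = base - cond[attr]
    print(f"{attr}: conditional entropy {cond[attr]:.4f}, gain {gains[attr]:.4f}")

best = max(gains, key=gains.get)
print("split on:", best)
```

Attribute values that do not occur in the subset (for example root = stiff) never show up as a group, which matches giving them weight 0 in the hand calculation.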

Final result

The remaining splits follow exactly the same procedure as the two splits above, so they are not worked through here.
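
Repeating the procedure by hand is tedious, so the recursion can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the original author's implementation: the names `id3`, `gain`, and `entropy` are hypothetical, ties in information gain are broken by attribute order, and a node only gets branches for attribute values that actually occur in its subset.

```python
import math
from collections import Counter

ATTRS = ["color", "root", "sound", "texture", "navel", "touch"]
DATA = """
green,curly,muffled,clear,hollow,hard-smooth,yes
dark,curly,dull,clear,hollow,hard-smooth,yes
dark,curly,muffled,clear,hollow,hard-smooth,yes
green,curly,dull,clear,hollow,hard-smooth,yes
light,curly,muffled,clear,hollow,hard-smooth,yes
green,slightly curly,muffled,clear,slightly hollow,soft-sticky,yes
dark,slightly curly,muffled,slightly blurry,slightly hollow,soft-sticky,yes
dark,slightly curly,muffled,clear,slightly hollow,hard-smooth,yes
dark,slightly curly,dull,slightly blurry,slightly hollow,hard-smooth,no
green,stiff,crisp,clear,flat,soft-sticky,no
light,stiff,crisp,blurry,flat,hard-smooth,no
light,curly,muffled,blurry,flat,soft-sticky,no
green,slightly curly,muffled,slightly blurry,hollow,hard-smooth,no
light,slightly curly,dull,slightly blurry,hollow,hard-smooth,no
dark,slightly curly,muffled,clear,slightly hollow,soft-sticky,no
light,curly,muffled,blurry,flat,hard-smooth,no
green,curly,dull,slightly blurry,slightly hollow,hard-smooth,no
"""
# Each row becomes a dict keyed by attribute name, plus the label under "good".
ROWS = [dict(zip(ATTRS + ["good"], line.split(",")))
        for line in DATA.strip().splitlines()]


def entropy(rows):
    """Information entropy (in bits) of the labels in rows."""
    n = len(rows)
    counts = Counter(r["good"] for r in rows)
    return sum(-c / n * math.log2(c / n) for c in counts.values())


def gain(rows, attr):
    """Information gain obtained by splitting rows on attr."""
    cond = 0.0
    for value in dict.fromkeys(r[attr] for r in rows):  # ordered, de-duplicated
        part = [r for r in rows if r[attr] == value]
        cond += len(part) / len(rows) * entropy(part)
    return entropy(rows) - cond


def id3(rows, attrs):
    """Return a nested dict {attr: {value: subtree}} or a leaf label."""
    labels = [r["good"] for r in rows]
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]  # pure node or majority vote
    best = max(attrs, key=lambda a: gain(rows, a))
    rest = [a for a in attrs if a != best]
    return {best: {value: id3([r for r in rows if r[best] == value], rest)
                   for value in dict.fromkeys(r[best] for r in rows)}}


tree = id3(ROWS, ATTRS)
print(tree)
```

The root of the returned tree is texture, and its slightly blurry branch splits on touch, which agrees with the two splits worked out above.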