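Association Analysis: Annotated Python Code for the Apriori Algorithm

A few things I worked out for myself; there may be some small mistakes, so comments are welcome ^_^

Finding frequent itemsets

Main idea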


Python code

def loadDataSet():
    return [[1,3,4],[2,3,5],[1,2,3,5],[2,5]]

createC1(dataSet) collects all the level-1 candidate itemsets, i.e. one itemset per distinct item.


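def createC1(dataSet):
    C1 = []
    for transaction in dataSet:
        for item in transaction:
            if [item] not in C1:
                C1.append([item])

    C1.sort()
    # frozenset so the itemsets can be used as dict keys;
    # list() because map() returns a one-shot iterator in Python 3
    return list(map(frozenset,C1))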
# scanD uses the transactions D to decide which candidate itemsets in Ck are frequent.

def scanD(D,Ck,minSupport):
    ssCnt = {}
    for tid in D:
        for can in Ck:
            if can.issubset(tid):
                # dict.has_key() no longer exists in Python 3
                ssCnt[can] = ssCnt.get(can,0) + 1
    numItems = float(len(D))
    retList = []
    supportData = {}
    for key in ssCnt:
        support = ssCnt[key] / numItems
        if support >= minSupport:
            retList.insert(0,key)
        supportData[key] = support
    return retList,supportData
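# Build the next level of candidates by merging itemsets from the previous level.
# For example, from {1,2} {3,4} {1,3} we get {1,2,3}: only {1,2} and {1,3} agree on their first k-2 items.
# Note that a candidate produced this way is not necessarily frequent; it still has to be checked (scanD does that).
def aprioriGen(Lk,k):
    retList = []
    lenLk = len(Lk)
    for i in range(lenLk):
        for j in range(i+1,lenLk):
            # sort first, then slice: set iteration order is not guaranteed
            L1=sorted(Lk[i])[:k-2];L2=sorted(Lk[j])[:k-2]
            if L1==L2:
                retList.append(Lk[i] | Lk[j])
    return retList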
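Why merge only itemsets that agree on their first k-2 items? Here is a small sketch of my own (naive_join and prefix_join are names I made up for illustration) comparing the prefix-based merge with a naive union of every pair. As far as I can tell, the point is that the prefix rule produces each candidate only once.

```python
from itertools import combinations

def naive_join(prev_level, k):
    """Union every pair and keep results of size k (duplicates are possible)."""
    return [a | b for a, b in combinations(prev_level, 2) if len(a | b) == k]

def prefix_join(prev_level, k):
    """Union only the pairs whose first k-2 sorted items are identical."""
    candidates = []
    for a, b in combinations(prev_level, 2):
        if sorted(a)[:k - 2] == sorted(b)[:k - 2]:
            candidates.append(a | b)
    return candidates

level2 = [frozenset([0, 1]), frozenset([0, 2]), frozenset([1, 2])]
naive = naive_join(level2, 3)    # {0,1,2} shows up once per pair
prefix = prefix_join(level2, 3)  # {0,1,2} shows up once

# The example from the comment: {1,2} {3,4} {1,3}
example = prefix_join([frozenset([1, 2]), frozenset([3, 4]), frozenset([1, 3])], 3)

print(len(naive), len(prefix), example)
```

The naive version would need an extra de-duplication pass; the prefix version does not.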
# Main function: given the data, return the frequent itemsets
def apriori(dataSet,minSupport=0.5):
    C1 = list(createC1(dataSet))
    # keep D as a list: scanD walks over it at every level and calls len(D)
    D = list(map(set,dataSet))
    L1,supportData = scanD(D,C1,minSupport)
    L = [L1]
    k = 2
    while (len(L[k-2]) > 0):
        Ck = aprioriGen(L[k-2],k)
        Lk,supK=scanD(D,Ck,minSupport)
        supportData.update(supK)
        L.append(Lk)
        k += 1
    return L,supportData
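To check what apriori should return on the sample data, one option is a brute-force enumeration that tests every possible itemset. This is only a sanity check I wrote for small inputs (brute_force_frequent is my own helper name, not part of the code above); it is exponential in the number of items.

```python
from itertools import combinations

def brute_force_frequent(transactions, min_support=0.5):
    """Return {itemset: support} for every itemset that meets min_support."""
    transactions = [set(t) for t in transactions]
    items = sorted(set().union(*transactions))
    n = float(len(transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            # count the transactions that contain the whole candidate
            count = sum(1 for t in transactions if candidate <= t)
            if count / n >= min_support:
                frequent[candidate] = count / n
    return frequent

data = [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]
frequent = brute_force_frequent(data, 0.5)
for itemset in sorted(frequent, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), frequent[itemset])
```

On this data set it finds nine frequent itemsets, the largest being {2,3,5} with support 0.5. The item 4 appears in only one of the four transactions, so it is dropped at the first level.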

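Deriving association rules from frequent itemsets

Main idea

If you look only at the right-hand side of the rules, the procedure turns out to be the same one used earlier to find frequent itemsets.
A rule defined on a frequent itemset must use all of its elements, so once the right-hand side of a rule is fixed, the left-hand side = frequent itemset - right-hand side. Below, H stands for the possible right-hand sides of the rules.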
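To make "left-hand side = frequent itemset - right-hand side" concrete, here is a small sketch for one frequent itemset, {2,3,5}. The support values are the ones the sample data set gives (4 transactions); rules_for is a helper name I made up, and it simply tries every possible right-hand side instead of growing H level by level.

```python
from itertools import combinations

# Supports of {2,3,5} and its subsets in the sample data set.
support = {
    frozenset([2]): 0.75, frozenset([3]): 0.75, frozenset([5]): 0.75,
    frozenset([2, 3]): 0.5, frozenset([2, 5]): 0.75, frozenset([3, 5]): 0.5,
    frozenset([2, 3, 5]): 0.5,
}

def rules_for(freq_set, support, min_conf=0.7):
    """Try every non-empty proper subset of freq_set as a right-hand side."""
    rules = []
    for size in range(1, len(freq_set)):
        for combo in combinations(sorted(freq_set), size):
            rhs = frozenset(combo)
            lhs = freq_set - rhs  # the left-hand side is whatever remains
            conf = support[freq_set] / support[lhs]
            if conf >= min_conf:
                rules.append((lhs, rhs, conf))
    return rules

rules = rules_for(frozenset([2, 3, 5]), support)
for lhs, rhs, conf in rules:
    print(sorted(lhs), '-->', sorted(rhs), 'conf:', conf)
```

Only {3,5} --> {2} and {2,3} --> {5} reach the 0.7 threshold here; every other split of {2,3,5} has confidence 0.5 / 0.75.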

Python code

# Main function. Initially the right-hand side of each rule, H, contains a single item.
def generateRules(L,supportData,minConf=0.7):
    bigRuleList=[]
    for i in range(1,len(L)):
        for freqSet in L[i]:
            H1 = [frozenset([item]) for item in freqSet]
            if i > 1:
                rulesFromConseq(freqSet,H1,supportData,
                                bigRuleList,minConf)
            else:
                calcConf(freqSet,H1,supportData,bigRuleList,
                         minConf)
    return bigRuleList
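# Check whether each rule's confidence meets the threshold. Returns prunedH, the right-hand sides that passed; brl collects every rule that meets the requirement.
def calcConf(freqSet,H,supportData,brl,minConf=0.7):
    prunedH = []
    for conseq in H:
        conf = supportData[freqSet] / supportData[freqSet-conseq]
        if conf >= minConf:
            # one formatted string, so the output is the same on Python 2 and 3
            print('%s --> %s conf: %s' % (freqSet-conseq,conseq,conf))
            brl.append((freqSet-conseq,conseq,conf))
            prunedH.append(conseq)
    return prunedH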
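# As with frequent itemsets, try to merge the right-hand sides in H and generate new rules from them.
def rulesFromConseq(freqSet,H,supportData,brl,minConf=0.7):
    m = len(H[0])
    if (len(freqSet) > m):
        # test the right-hand sides in H first, so that single-item right-hand
        # sides of larger itemsets are checked too, then merge the survivors
        Hmp1 = calcConf(freqSet,H,supportData,brl,minConf)
        if (len(Hmp1)>1):
            Hmp1 = aprioriGen(Hmp1,m+1)
            if Hmp1:
                rulesFromConseq(freqSet,Hmp1,supportData,brl,minConf)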




Notes

apriori

Quoted from Henry:
At each level k, you have k-item sets which are frequent (have sufficient support).

At the next level, the (k+1)-item sets you need to consider must have the property that each of their subsets must be frequent (have sufficient support). This is the apriori property: any subset of a frequent itemset must be frequent.

So if you know at level 2 that the sets {1,2}, {1,3}, {1,5} and {3,5} are the only sets with sufficient support, then at level 3 you join these with each other to produce {1,2,3}, {1,2,5}, {1,3,5} and {2,3,5} but you need only consider {1,3,5} further: the others each have subsets with insufficient support (such as {2,3} or {2,5}).
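The pruning in that quote can be sketched in a few lines. This is my own illustration (has_infrequent_subset is a name I picked), using exactly the level-2 sets and level-3 candidates from the example.

```python
from itertools import combinations

def has_infrequent_subset(candidate, frequent_prev):
    """True if some (k-1)-subset of the k-item candidate is not frequent."""
    k = len(candidate)
    return any(frozenset(sub) not in frequent_prev
               for sub in combinations(candidate, k - 1))

# The only 2-item sets with sufficient support in the example.
frequent_2 = {frozenset(s) for s in [(1, 2), (1, 3), (1, 5), (3, 5)]}

# The 3-item candidates obtained by joining them.
candidates_3 = [frozenset(s) for s in [(1, 2, 3), (1, 2, 5), (1, 3, 5), (2, 3, 5)]]

survivors = [c for c in candidates_3
             if not has_infrequent_subset(c, frequent_2)]
print(survivors)
```

Only {1,3,5} survives, so it is the only level-3 candidate whose support has to be counted against the data.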

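Maximal frequent itemsets

A frequent itemset none of whose proper supersets is frequent.

Closed frequent itemsets

A frequent itemset whose proper supersets all have a smaller support count than it does.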
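Both definitions are easy to check on the sample data. The sketch below is my own (the function names are made up); it starts from the support counts of the nine frequent itemsets at minSupport = 0.5. It only compares against frequent supersets, which should be enough: a superset with the same count as a frequent itemset would be frequent too.

```python
# Support counts of the frequent itemsets in the sample data (4 transactions).
counts = {
    frozenset([1]): 2, frozenset([2]): 3, frozenset([3]): 3, frozenset([5]): 3,
    frozenset([1, 3]): 2, frozenset([2, 3]): 2, frozenset([2, 5]): 3,
    frozenset([3, 5]): 2, frozenset([2, 3, 5]): 2,
}

def maximal_itemsets(counts):
    """Frequent itemsets with no frequent proper superset."""
    return [s for s in counts if not any(s < t for t in counts)]

def closed_itemsets(counts):
    """Frequent itemsets with no proper superset of equal support count."""
    return [s for s in counts
            if not any(s < t and counts[t] == counts[s] for t in counts)]

maximal = maximal_itemsets(counts)
closed = closed_itemsets(counts)
print(sorted(sorted(s) for s in maximal))  # [[1, 3], [2, 3, 5]]
print(sorted(sorted(s) for s in closed))   # [[1, 3], [2, 3, 5], [2, 5], [3]]
```

Every maximal itemset is also closed, but not the other way round: {3} and {2,5} are closed without being maximal.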

Exercises

1

2

(a) s({e}) = 0.8 s({b,d}) = 0.2 s({b,d,e}) = 0.2

3

(a) c(∅ → A) = s(A)
(b) c1 > c2, c2 < c3 -> c1 >= c2, c2 <= c3
(c) The rules have the same confidence -> the same support, i.e. for each left->right the support of {left, right} is the same

6

(a) 3^6 - 2^6*2 + 1 = 602
(b) 4
(c) 5+C(4,3)+1+C(4,3) -> C(6,3)
(d) butter, bread
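The 602 in (a) agrees with the general count of association rules over d items, 3^d - 2^(d+1) + 1, for d = 6. I am assuming here that the exercise's data set has 6 distinct items. A brute-force count (count_rules is my own helper) gives every item one of three roles and keeps the assignments in which both sides of the rule are non-empty:

```python
from itertools import product

def count_rules(d):
    """Count rules X -> Y over d items with X, Y non-empty and disjoint."""
    # role 0: item unused, role 1: left-hand side, role 2: right-hand side
    return sum(1 for roles in product((0, 1, 2), repeat=d)
               if 1 in roles and 2 in roles)

d = 6  # assumed number of distinct items
total = count_rules(d)
formula = 3 ** d - 2 ** (d + 1) + 1
print(total, formula)
```

Both numbers come out as 602: 3^6 assignments in total, minus the 2 * 2^6 with an empty side, plus 1 for the all-unused assignment that was subtracted twice.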

7

(b) {1,2,3,4},{1,2,3,5},{1,2,4,5},{1,3,4,5},{2,3,4,5}
(c) {1,2,3,4},{1,2,3,5}, // no {1,4,5}, no {2,4,5}

8

  1. When drawing the diagram, note that N has to be marked downward not only below a node labelled I, but also below a node labelled N.
  2. F/total
  3. I/total