Some of my own notes; there may be small mistakes, feedback welcome ^_^
Finding frequent itemsets
Main idea
Python code
def loadDataSet():
    return [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]
createC1(dataSet) builds all first-level candidate itemsets (every itemset of size one):
def createC1(dataSet):
    C1 = []
    for transaction in dataSet:
        for item in transaction:
            if [item] not in C1:
                C1.append([item])
    C1.sort()
    # return a list (not a bare map iterator) so the candidates
    # can be scanned repeatedly under Python 3
    return [frozenset(item) for item in C1]
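In isolation, the first-level candidates can be built like this (a sketch with my own variable names, mirroring createC1 above):

```python
dataSet = [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]

# Collect every distinct item, sorted, each wrapped as a size-1 frozenset
# so it can later serve as a dictionary key.
items = sorted({item for transaction in dataSet for item in transaction})
C1 = [frozenset([item]) for item in items]
```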
# scanD scans the transactions D and keeps the itemsets in Ck
# whose support reaches minSupport.
def scanD(D, Ck, minSupport):
    ssCnt = {}
    for tid in D:
        for can in Ck:
            if can.issubset(tid):
                if can not in ssCnt:
                    ssCnt[can] = 1
                else:
                    ssCnt[can] += 1
    numItems = float(len(D))
    retList = []
    supportData = {}
    for key in ssCnt:
        support = ssCnt[key] / numItems
        if support >= minSupport:
            retList.insert(0, key)
        supportData[key] = support
    return retList, supportData
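The support-counting step can be sketched on its own with collections.Counter (variable names mirror scanD, but this block is self-contained):

```python
from collections import Counter

dataSet = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
C1 = [frozenset([i]) for i in (1, 2, 3, 4, 5)]
minSupport = 0.5

# Count how many transactions contain each candidate itemset,
# then keep the candidates whose relative support reaches the threshold.
ssCnt = Counter(can for tid in dataSet for can in C1 if can <= tid)
supportData = {can: ssCnt[can] / len(dataSet) for can in C1}
L1 = [can for can in C1 if supportData[can] >= minSupport]
```

On this toy data {4} appears in only one of four transactions, so it is the single candidate that gets dropped.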
# Merge itemsets from the previous level to build the next level's
# candidates, e.g. {1,2} and {1,3} merge into {1,2,3}.
# Note that candidates built this way are not necessarily frequent;
# their support still has to be checked.
def aprioriGen(Lk, k):
    retList = []
    lenLk = len(Lk)
    for i in range(lenLk):
        for j in range(i + 1, lenLk):
            # join two (k-1)-itemsets only when their first k-2 sorted
            # elements agree (sort before slicing: frozensets are unordered)
            L1 = sorted(Lk[i])[:k - 2]
            L2 = sorted(Lk[j])[:k - 2]
            if L1 == L2:
                retList.append(Lk[i] | Lk[j])
    return retList
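The prefix-join step can be seen in isolation (variable names here are my own):

```python
# Two frequent (k-1)-itemsets are merged only when their first k-2 sorted
# elements agree, so each k-candidate is generated exactly once.
L2 = [frozenset(s) for s in ([1, 2], [1, 3], [1, 5], [3, 5])]
k = 3
C3 = []
for i in range(len(L2)):
    for j in range(i + 1, len(L2)):
        if sorted(L2[i])[:k - 2] == sorted(L2[j])[:k - 2]:
            C3.append(L2[i] | L2[j])
```

Only the pairs sharing the prefix {1} join, so {3,5} produces nothing here and {2,3,5} is never generated.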
# Main routine: given the data, return all frequent itemsets.
def apriori(dataSet, minSupport=0.5):
    C1 = createC1(dataSet)
    D = [set(t) for t in dataSet]   # a list, not a map iterator, for Python 3
    L1, supportData = scanD(D, C1, minSupport)
    L = [L1]
    k = 2
    while len(L[k - 2]) > 0:
        Ck = aprioriGen(L[k - 2], k)
        Lk, supK = scanD(D, Ck, minSupport)
        supportData.update(supK)
        L.append(Lk)
        k += 1
    return L, supportData
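To sanity-check the output, here is a brute-force enumeration of all frequent itemsets on the toy data, written independently of the functions above (the helper name brute_force_frequent is my own):

```python
from itertools import combinations

# Enumerate every subset of the item universe and keep those whose
# relative support reaches the threshold. Exponential, so toy data only.
def brute_force_frequent(dataSet, minSupport=0.5):
    transactions = [set(t) for t in dataSet]
    items = sorted({i for t in transactions for i in t})
    frequent = {}
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            s = set(cand)
            support = sum(s <= t for t in transactions) / len(transactions)
            if support >= minSupport:
                frequent[frozenset(s)] = support
    return frequent

freq = brute_force_frequent([[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]])
```

At minSupport = 0.5 this yields nine frequent itemsets, the largest being {2,3,5}; the result from apriori above should agree.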
Deriving association rules from the frequent itemsets
Main idea
If you look only at the right-hand sides of the rules, growing them is exactly the same procedure as growing frequent itemsets.
Every rule defined from a frequent itemset must use all of its elements, so once a rule's right-hand side is fixed, the left-hand side is simply the frequent itemset minus the right-hand side. Below, H holds the candidate right-hand sides.
Python code
# Main routine. Initially the right-hand side H contains single elements only.
def generateRules(L, supportData, minConf=0.7):
    bigRuleList = []
    for i in range(1, len(L)):          # rules need itemsets of size >= 2
        for freqSet in L[i]:
            H1 = [frozenset([item]) for item in freqSet]
            if i > 1:
                rulesFromConseq(freqSet, H1, supportData, bigRuleList, minConf)
            else:
                calcConf(freqSet, H1, supportData, bigRuleList, minConf)
    return bigRuleList
# Check whether each rule's confidence meets the threshold. Returns prunedH,
# the right-hand sides that passed; brl accumulates all accepted rules.
def calcConf(freqSet, H, supportData, brl, minConf=0.7):
    prunedH = []
    for conseq in H:
        conf = supportData[freqSet] / supportData[freqSet - conseq]
        if conf >= minConf:
            print(freqSet - conseq, '-->', conseq, 'conf:', conf)
            brl.append((freqSet - conseq, conseq, conf))
            prunedH.append(conseq)
    return prunedH
# As with frequent itemsets, try merging the right-hand sides in H
# to produce larger consequents and hence new rules.
def rulesFromConseq(freqSet, H, supportData, brl, minConf=0.7):
    m = len(H[0])
    if len(freqSet) > (m + 1):          # room left to grow the consequent
        Hmp1 = aprioriGen(H, m + 1)
        Hmp1 = calcConf(freqSet, Hmp1, supportData, brl, minConf)
        if len(Hmp1) > 1:
            rulesFromConseq(freqSet, Hmp1, supportData, brl, minConf)
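For a single frequent itemset the confidence test can be sketched on its own, using the support values of the toy data (variable names mirror calcConf, but the block is self-contained):

```python
# conf(A -> B) = s(A ∪ B) / s(A); support values below are those of the
# toy dataset [[1,3,4],[2,3,5],[1,2,3,5],[2,5]].
supportData = {
    frozenset([2]): 0.75, frozenset([3]): 0.75, frozenset([5]): 0.75,
    frozenset([2, 3]): 0.5, frozenset([2, 5]): 0.75, frozenset([3, 5]): 0.5,
    frozenset([2, 3, 5]): 0.5,
}
freqSet = frozenset([2, 3, 5])
minConf = 0.7
rules = []
for conseq in [frozenset([i]) for i in freqSet]:
    conf = supportData[freqSet] / supportData[freqSet - conseq]
    if conf >= minConf:
        rules.append((freqSet - conseq, conseq, conf))
```

Here {3,5} -> {2} and {2,3} -> {5} both have confidence 1.0, while {2,5} -> {3} has 0.5/0.75 ≈ 0.67 and is pruned.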
Notes
Apriori
Adapted from Henry:
At each level k, you have k-item sets which are frequent (have sufficient support).
At the next level, the (k+1)-item sets you need to consider must have the property that each of their subsets is frequent (has sufficient support). This is the apriori property: any subset of a frequent itemset must be frequent.
So if you know at level 2 that the sets {1,2}, {1,3}, {1,5} and {3,5} are the only sets with sufficient support, then at level 3 you join these with each other to produce {1,2,3}, {1,2,5}, {1,3,5} and {2,3,5}, but you need only consider {1,3,5} further: the others each have a subset with insufficient support (such as {2,3} or {2,5}).
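The pruning described here can be sketched directly (my own variable names):

```python
from itertools import combinations

# Keep a k-candidate only if every one of its (k-1)-subsets is itself
# frequent -- the apriori property in code form.
L2 = {frozenset(s) for s in ([1, 2], [1, 3], [1, 5], [3, 5])}
candidates = [frozenset(s)
              for s in ([1, 2, 3], [1, 2, 5], [1, 3, 5], [2, 3, 5])]
pruned = [c for c in candidates
          if all(frozenset(sub) in L2 for sub in combinations(c, len(c) - 1))]
```

Only {1,3,5} survives, matching the worked example above.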
Maximal frequent itemset
No superset of it is frequent.
Closed frequent itemset
Every superset of it has a strictly smaller support count.
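A small sketch of both definitions on the toy data's support counts (counts out of 4 transactions; the variable names are my own):

```python
# Absolute support counts of the frequent itemsets containing 2, 3, 5
# in the toy dataset [[1,3,4],[2,3,5],[1,2,3,5],[2,5]].
frequent = {
    frozenset([2]): 3, frozenset([3]): 3, frozenset([5]): 3,
    frozenset([2, 3]): 2, frozenset([2, 5]): 3, frozenset([3, 5]): 2,
    frozenset([2, 3, 5]): 2,
}
# Maximal: no frequent proper superset exists.
maximal = [s for s in frequent if not any(s < t for t in frequent)]
# Closed: no proper superset has the same support count.
closed = [s for s in frequent
          if not any(s < t and frequent[t] == frequent[s] for t in frequent)]
```

Here {2,3,5} is the only maximal itemset, while {3}, {2,5} and {2,3,5} are closed, e.g. {2} is not closed because its superset {2,5} has the same count 3.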
Exercises
1
2
(a) s({e}) = 0.8 s({b,d}) = 0.2 s({b,d,e}) = 0.2
3
(a)
(b) c1 > c2, c2 < c3 -> c1 >= c2, c2 <= c3
(c) The rules have the same confidence -> support;
that is, for left -> right, the support of {left, right} is the same.
6
(a)
(b) 4
(c) 5+C(4,3)+1+C(4,3) -> C(6,3)
(d) butter, bread
7
(b) {1,2,3,4},{1,2,3,5},{1,2,4,5},{1,3,4,5},{2,3,4,5}
(c) {1,2,3,4},{1,2,3,5} (no {1,4,5}, no {2,4,5})
8
- When drawing the lattice, note that you must mark N below a node not only when the node is I, but also when it is N.
- F/total
- I/total