http://www.cs.uic.edu/~liub/teach/cs583-spring-12/cs583.html
I have implemented the first two passes of Apriori Algorithm as a part of an academic assignment. It was for constructing attack graphs which are used for threat prediction and vulnerability assessment. The algorithm is implemented in python and its very simple. It can be extended for k passes of the algorithm.
Here is the code:
minsup = 0.3
minconf = 0.8
def count_first(transactions):
adict = {}
for t in transactions:
for item in t:
if item in adict:
adict[item] += 1
else:
adict[item] = 1
return adict
def find_frequent(Candidate, minsup, no_of_lines):
adict={}
for key in Candidate:
if ((float)(Candidate[key]/no_of_lines)) >= minsup:
adict[key] = Candidate[key]
return adict
def candidate_gen(keys):
adict={}
for i in keys:
for j in keys:
if i != j and (j,i) not in adict:
adict[tuple([min(i,j),max(i,j)])] = 0
return adict
def add_frequency(Candidate, transactions):
for key in Candidate:
for t in transactions:
if key[0] in t and key[1] in t:
Candidate[key] += 1
return Candidate
f = open("testcase.txt","r")
transactions = []
no_of_lines=0
for line in f:
split_line = line.split()
transactions.append(split_line)
no_of_lines = no_of_lines + 1
print(no_of_lines)
#First iteration
C1 = count_first(transactions)
F1 = find_frequent(C1,minsup,no_of_lines)
#Second iteration
C2 = candidate_gen(F1.keys())
C2 = add_frequency(C2,transactions)
F2 = find_frequent(C2,minsup,no_of_lines)
print(F2)
It accepts input data from the file "testcase.txt". In this file, each new line represents one transaction. Each such transaction contains items represented by 'integers' and separated by spaces. So, essentially, each line is a collection of integers separated by spaces.
The program makes two passes of apriori over input data. It outputs pairs of frequent item sets along with their 'support' metric values.
Comments and suggestions are welcome :)
what does the testcast.txt look like ?
ReplyDeleteCan i have implemented algorithm code??
ReplyDeleteThis is basic functions defined and its algorithmic idea.
I want a full source code in programming language.