Combining Attribute Oriented Induction and Graph Visualization to Enhance Association Rules Interpretation

Abstract: Data mining comprises many methods, one of which is association rule mining. Association rule mining produces a huge number of rules, and the analyst spends considerable time searching through them to find the interesting ones. One solution to this problem is to combine an association rule visualization method with a generalization method. The visualization method used here is the graph-based method, and the generalization method is the Attribute Oriented Induction (AOI) algorithm. After the combination, AOI is called Modified AOI because some steps of the traditional AOI are removed and others are changed. The graph technique is called the grouped graph method because it displays the aggregated rules that result from AOI. The result of this paper is a compression ratio that improves the clarity of the visualization. These results let the analyst examine the rules by drilling down, or understand them by rolling up.
Index Terms: Data mining, Association rules, Visualization, AOI.


I. INTRODUCTION
Data mining has a number of common methods, one of which is association rule mining. Apriori is an example of an association rule algorithm. Association rule mining with Apriori produces a huge number of rules. The analyst then spends considerable time searching through these rules to find the interesting ones and to interpret and evaluate them.
Therefore, the problem of dealing with these rules is the basis of this paper. One solution to this problem is visualization, which lets the audience interact with the rules. Visualization of association rules lets the analyst focus on the main components of the rules: the items in the rules, the relations between the items, and the interest measures used to evaluate the rules. Researchers have introduced many visualization techniques. This paper deals with one of them, graph-based visualization, which is characterized by a view that makes the rules easy for the user to interpret. This technique is combined with AOI in order to view large rule sets.
In this paper, a number of AOI steps are removed and the others are modified. After the combination, AOI reduces the huge number of rules to produce aggregated rules, and the graph visualization takes the results of AOI as its input. The resulting AOI is called Modified AOI, and the graph technique is called the grouped graph method.
The result of this paper is a compression ratio that improves the clarity of the visualization. These results let the analyst examine the rules by drilling down, or understand them by rolling up. This paper is organized into six sections. Section one is this introduction. Section two surveys the literature related to the subject. Section three introduces the preliminaries of the method. Section four presents how rule sets are grouped by the new Modified AOI and then visualized with the new grouped graph visualization technique. Section five discusses the results, and section six gives the conclusions.

II. RELATED WORKS
This paper revolves around two classes of topics. The first class is visualization techniques. The scatter plot technique uses the support and confidence measures for the axes and the lift measure for point shading [1], while the two-key plot uses the order measure for point shading [2]. The double decker plot is used for displaying a single rule [3]. Parallel coordinates use the items and their positions in the rules for the axes and arrows for the rules [4]. The matrix-based visualization technique uses the antecedent and consequent for the axes and the interest measures for colored rectangles [5], while matrix3D uses 3D bars instead of colored rectangles. Hahsler et al. [6] proposed the grouped matrix-based visualization technique, which enhances the matrix-based technique by grouping the antecedents of the rules. Other techniques, such as graph-based visualization, use vertices for items or itemsets and edges for relationships [7], [8], [9].
All rights reserved © UOITC www.uoitc.edu.iq
The second class is the clustering of association rules. In this context, Gupta et al. [10] proposed a new distance measure between association rules based on a conditional probability estimate, as in Eq. (1), where the set BS is the union of the items in the left-hand and right-hand sides of rule i. This measure is called the Conditional Market-Basket Probability (CMBP) distance, and it is used by the Agglomerative Chain Clustering algorithm to find the clusters. Klemettinen et al. [11] pruned the rules by extracting a subset, called a rule cover, from the original set of rules; this method reduces the number of rules by eliminating redundancy.
Lent et al. [12] introduced the clustered association rule, a rule formed by combining similar "adjacent" association rules into a few general rules. Instead of a set of (attribute = value) equalities, a clustered rule has a set of value ranges expressed using inequalities, as in Eq. (2). The association rules are clustered in a two-dimensional space, where each axis represents one attribute of the antecedent, or Left Hand Side (LHS), and the consequent, or Right Hand Side (RHS), satisfies the segmentation criteria.
The graph-based method [7], [8], [9] represents the items or itemsets by vertices and the relationships in the rules by edges. In Fig. 1a, vertices represent itemsets and directed edges between the itemsets represent rules. In Fig. 1b, vertices represent both items and rules, and rules that share items are connected through those items. This method is selected as the basis for the proposed visualization method in the next section.
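As a minimal sketch of the two graph representations described above, the following Python code builds both graphs from a small rule list. The example rules, the function names and the rule-vertex labels are hypothetical and are not taken from [7], [8], [9].

```python
# Hypothetical rules: (antecedent itemset, consequent itemset).
rules = [
    (frozenset({"milk"}), frozenset({"bread"})),
    (frozenset({"milk", "butter"}), frozenset({"bread"})),
]


def itemset_graph(rules):
    """Fig. 1a style: vertices are itemsets, one directed edge per rule."""
    vertices = set()
    edges = []
    for lhs, rhs in rules:
        vertices.update([lhs, rhs])
        edges.append((lhs, rhs))
    return vertices, edges


def item_rule_graph(rules):
    """Fig. 1b style: vertices are items and rules, edges run item -> rule -> item."""
    vertices = set()
    edges = []
    for idx, (lhs, rhs) in enumerate(rules):
        rule_vertex = f"rule{idx}"  # hypothetical label for the rule vertex
        vertices.add(rule_vertex)
        for item in lhs:
            vertices.add(item)
            edges.append((item, rule_vertex))
        for item in rhs:
            vertices.add(item)
            edges.append((rule_vertex, item))
    return vertices, edges


itemset_vertices, itemset_edges = itemset_graph(rules)
item_vertices, item_edges = item_rule_graph(rules)
```

In the first form the number of edges equals the number of rules, while in the second form each rule adds one vertex plus one edge per item it contains.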

IV. THE PROPOSED SYSTEM
Figure 2 illustrates the flow chart of the proposed system, which enhances the interpretability of association rules through the following steps:
1. The system takes the large number of rules produced by the Apriori algorithm, together with the lift, confidence and support interest measures.
2. The proposed Modified AOI algorithm (Fig. 3) is performed to produce aggregated rules, which are fewer than the original rules.
3. The aggregated rules are viewed through a subjective approach, such as visualization, to determine the interesting rules; in particular, the graph method is used.
4. The proposed visualization method is called the grouped graph method because it views the aggregated rules.
5. The analyst can evaluate the system by using a number of measures.
The proposed system is divided into two main stages: the Modified AOI algorithm and the grouped graph visualization method. These two stages are discussed in the next sections.

A. Modified Attribute Oriented Induction
The AOI technique [13] is used to produce general rules or patterns from a large set of rules or patterns. This induction technique is performed in two steps: attribute removal and attribute generalization [14].
The AOI algorithm has a number of steps: generalization on the smallest attributes; removal of a distinct attribute when it lacks a higher-level concept; concept tree ascension; vote accumulation when identical tuples are merged during generalization; threshold control on the maximum number of distinct values of an attribute; generalization threshold control on the number of distinct tuples of the generalized relation in the target class; and conversion of each tuple to a conjunctive formula and of the set of tuples to a disjunctive formula. In this paper, these steps are modified to satisfy the goal of the paper in the graph visualization method, as in the following algorithm. In Fig. 3, the inputs to the algorithm are the hierarchy trees, which are built before the generalization step and contain a number of predefined levels, and the Generalization Level, which gives the number of levels to generalize and is entered into the proposed algorithm. In addition, the large number of rules produced by the Apriori algorithm is taken as input to the algorithm.
The rules are reduced by aggregating them; this aggregation is the output of the algorithm and the focus of this paper. The resulting algorithm is called Modified AOI.

For each rule R_i (1 <= i <= n, where n is the number of rules) do
3. Substitute each itemset k in the antecedent and consequent of R_i by its corresponding parent in H_k.
4. Merge identical rules.
End.
Fig. 3 The proposed Modified AOI algorithm.

The loop and substitution steps replace each itemset by its corresponding parent in the hierarchy tree. The result of these steps is a set of redundant rules, and the redundancy is removed by merging identical rules in step 4 to produce the aggregated rules. In steps 6 and 7, the interest measures of the generalized rules that result from the previous steps, such as support, confidence and lift, are recomputed.
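The generalization, merging and recomputation steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation used in this paper: the hierarchy, the transactions, the rules and the function names are hypothetical, only one level of generalization is applied, the interest measures are recomputed from the generalized transactions with the standard definitions of support, confidence and lift, and rules whose consequent falls inside the antecedent after generalization are dropped.

```python
# Hypothetical one-level hierarchy: each item maps to its parent concept.
PARENT = {
    "whole milk": "dairy",
    "yogurt": "dairy",
    "rolls/buns": "bakery",
    "brown bread": "bakery",
}


def generalize(itemset, parent):
    """Replace every item by its parent; items without a parent are kept."""
    return frozenset(parent.get(item, item) for item in itemset)


def modified_aoi(rules, transactions, parent):
    """Generalize rules by one level, merge duplicates, recompute measures."""
    g_transactions = [generalize(t, parent) for t in transactions]
    n = len(g_transactions)

    def support(itemset):
        # Fraction of generalized transactions that contain the itemset.
        return sum(itemset <= t for t in g_transactions) / n

    # Substitution and merging: a set keeps one copy of identical rules.
    merged = set()
    for lhs, rhs in rules:
        g_lhs, g_rhs = generalize(lhs, parent), generalize(rhs, parent)
        # Assumption: a rule that becomes trivial (RHS inside LHS) is dropped.
        if not g_rhs <= g_lhs:
            merged.add((g_lhs, g_rhs))

    # Recompute the measures (assumes the rules were mined from `transactions`,
    # so the supports of LHS and RHS are non-zero).
    result = {}
    for lhs, rhs in merged:
        supp = support(lhs | rhs)
        conf = supp / support(lhs)
        result[(lhs, rhs)] = {
            "support": supp,
            "confidence": conf,
            "lift": conf / support(rhs),
        }
    return result


# Hypothetical data for illustration only.
transactions = [
    {"whole milk", "rolls/buns"},
    {"yogurt", "brown bread"},
    {"whole milk", "yogurt"},
    {"rolls/buns"},
]
rules = [
    ({"whole milk"}, {"rolls/buns"}),
    ({"yogurt"}, {"brown bread"}),
    ({"whole milk"}, {"yogurt"}),
]
aggregated = modified_aoi(rules, transactions, PARENT)
```

In this hypothetical data the first two rules generalize to the same rule and are merged, while the third becomes trivial and is dropped under the stated assumption.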
Table 1 illustrates the differences between the traditional AOI algorithm and the Modified AOI by indicating whether each step is performed or not.

B. Grouped Graph Visualization
The second main stage is the visualization of the resulting rules. This stage takes the output rules of the Modified AOI and visualizes them. The proposed method is called Grouped Graph Visualization because some vertices in the graph represent a collection of rules instead of a single rule, as in the previous graph method.
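A minimal sketch of this idea, assuming the item-and-rule vertex layout of Fig. 1b, is given below. Each group vertex stores the original rules it stands for, so that a drill down can list them. The hierarchy, the rules and all names are hypothetical.

```python
from collections import defaultdict

# Hypothetical hierarchy and rules, for illustration only.
PARENT = {
    "whole milk": "dairy",
    "yogurt": "dairy",
    "rolls/buns": "bakery",
    "brown bread": "bakery",
    "citrus fruit": "fruit",
}
rules = [
    (frozenset({"whole milk"}), frozenset({"rolls/buns"})),
    (frozenset({"yogurt"}), frozenset({"brown bread"})),
    (frozenset({"whole milk"}), frozenset({"citrus fruit"})),
]


def group_rules(rules, parent):
    """Map each generalized rule to the list of original rules behind it."""
    groups = defaultdict(list)
    for lhs, rhs in rules:
        key = (
            frozenset(parent.get(item, item) for item in lhs),
            frozenset(parent.get(item, item) for item in rhs),
        )
        groups[key].append((lhs, rhs))
    return groups


def grouped_graph(groups):
    """Build item vertices, group vertices and the edges between them."""
    item_vertices = set()
    rule_vertices = {}  # group label -> original rules (used for drill down)
    edges = []
    for idx, ((lhs, rhs), members) in enumerate(groups.items()):
        label = f"group{idx}"  # hypothetical label for the group vertex
        rule_vertices[label] = members
        for item in lhs:
            item_vertices.add(item)
            edges.append((item, label))
        for item in rhs:
            item_vertices.add(item)
            edges.append((label, item))
    return item_vertices, rule_vertices, edges


items, rule_vertices, edges = grouped_graph(group_rules(rules, PARENT))
```

Here a roll up corresponds to drawing only the group vertices, and a drill down corresponds to reading the rules stored under one group label.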
The grouped graph can also visualize every level in the hierarchy tree of the aggregated rules. The user can either drill down through the levels to see more detail about the rules, or roll up through the levels to see more general rules, which helps the user understand a large data set and get an idea of the nature of the data. Example 1 illustrates this new visualization method.
Example 1: First, four transactions are taken randomly from the Groceries data set, as in Table 2. The Apriori algorithm is then performed on these transactions to produce 13 rules, as in Table 3. Table 3 is visualized in Fig. 4a.
Secondly, the rules in Table 3 are aggregated by the Modified AOI algorithm with the levels in Table 4, and the result is shown in the grouped graph visualization. The aggregation of Table 3 yields 8 rules at level 2, as in Table 5, and 6 rules at level 1 (the more general level), as in Table 6. Table 5 is visualized in Fig. 4b and Table 6 in Fig. 4c. Finally, a general overview of the different data sets is given in Table 7, the proposed visualization method is compared with other visualization methods so that it can be evaluated by the audience, the proposed system is evaluated with the reduction ratio measure, and the performance of the proposed system is monitored.
Secondly, the visualization of Association Rules (AR) often needs four parameters: the set of LHS items, the set of RHS items, the relation between LHS and RHS, and the Interesting Measures (IMs) such as support, confidence and lift.
The representation of the AR is obtained from the first three parameters, while the evaluation of the AR is obtained from the fourth. Visualization methods differ in these parameters, so the comparison between them is performed according to these parameters. Four criteria are used to distinguish the visualization methods:
-Appearance of RHS items: this criterion is the same as the appearance of LHS items criterion, but applied to the RHS items.
-The value of IM: showing the value of the IM in the visualization method, instead of shading, gives accuracy in evaluating the AR.
Table 8 illustrates these criteria.
Thirdly, the Reduction Ratio is the compression ratio of an operation [18], as in Eq. (3). This measure is applied to the result of the grouped graph method to show the compression achieved by the aggregation of the Modified AOI technique. Table 9 reports this measure for the number of nodes and edges of the graph and for the number of rules that result from the aggregation of the Modified AOI technique.
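As a hedged illustration, the sketch below assumes that the reduction ratio takes the common form 1 - (size after aggregation) / (size before aggregation); this form is an assumption, and the definition in [18] may differ. The rule counts are those of Example 1, and the function name is hypothetical.

```python
def reduction_ratio(original_size, reduced_size):
    """Assumed form: fraction of the original elements removed by aggregation."""
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    return 1 - reduced_size / original_size


# Rule counts from Example 1: 13 original rules, 8 at level 2, 6 at level 1.
for level, count in [("level 2", 8), ("level 1", 6)]:
    print(f"{level}: {reduction_ratio(13, count):.3f}")
```

The same function can be applied to the number of nodes or edges of the graph before and after aggregation.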

Fig. 1 Graph-based visualization with itemsets as vertices or with items and rules as vertices.

Fig. 2 The general architecture of the proposed system.

Fig. 4 Graph visualization of the original rules and of the aggregated rules at level 1 and level 2.

Fig. 6 Monitoring of memory usage for each level of the Groceries data set.

Table 1: Comparison between traditional AOI and modified AOI.

Table 4: The levels of the Groceries data set.

Table 5: Aggregation into 8 rules at level 2 from the Groceries data set containing 13 rules.

Table 6: Aggregation into 6 rules at level 1 from the Groceries data set containing 13 rules.

Table 7: Data set description.
-Appearance of LHS items: the appearance of the LHS items of the AR in the visualization method. A clear appearance of the LHS items helps the analyst to view and evaluate the AR. This criterion ranges over [-1, 1], where 1 means the appearance is good and -1 means the appearance is bad.

Table 8: Comparison between visualization methods in representation and evaluation of the AR.

Table 9: Reduction ratio of the proposed system on different datasets.