Extensible Environment for Knowledge Representation …web.info.uvt.ro/~danielpop/misc/Ref2.pdf ·...

20

Click here to load reader

Transcript of Extensible Environment for Knowledge Representation …web.info.uvt.ro/~danielpop/misc/Ref2.pdf ·...

Page 1: Extensible Environment for Knowledge Representation …web.info.uvt.ro/~danielpop/misc/Ref2.pdf · Extensible Environment for Knowledge Representation and Reasoning Mediu extensibil

FACULTATEA DE MATEMATICĂ UNIVERSITATEA DE VEST ŞI INFORMATICĂ DIN TIMIŞOARA

Extensible Environment for Knowledge Representation and Reasoning

Mediu extensibil pentru reprezentare si rationament

bazat pe cunostinte

- REFERAT - DOCTORAND CONDUCĂTOR ŞTIINŢIFIC As. Daniel POP Prof. Dr. Ştefan MĂRUŞTER

Page 2: Extensible Environment for Knowledge Representation …web.info.uvt.ro/~danielpop/misc/Ref2.pdf · Extensible Environment for Knowledge Representation and Reasoning Mediu extensibil

CONTENTS

1. Introduction.......................................................................................................................................................3

2. Knowledge Representation ..............................................................................................................................4

2.1. The Equivalence of Representation Forms ..................................................................................................5 2.2. Rules and Decision Table ............................................................................................................................6 2.3. Decision Table and Decision Tree...............................................................................................................6 2.4. Extensions to Standard Knowledge Representation Forms .........................................................................8

3. Expert System Creator Overview....................................................................................................................9

3.1 Decision Frame Designer ...........................................................................................................................10 3.2 Decision Table Designer.............................................................................................................................11 3.3 Decision Tree Designer ..............................................................................................................................11

4. Verification, Validation and Debugging .......................................................................................................12

4.1 Rules Set......................................................................................................................................................12 4.2 Decision Table ............................................................................................................................................13 4.3 Decision Tree..............................................................................................................................................14

5. Embedding Expert Systems in Host Systems ...............................................................................................14

5.1. Code Generator Module ............................................................................................................................14 5.2. Use of Dictionaries ....................................................................................................................................15

6. Database Integration ......................................................................................................................................15

6.1. ESDB Modules’ Description......................................................................................................................16 6.2. Working With ESDB ..................................................................................................................................18 6.3. Exception Handling ...................................................................................................................................18

7. Applications, Future Research and Extensions............................................................................................19

References............................................................................................................................................................20

Page 3: Extensible Environment for Knowledge Representation …web.info.uvt.ro/~danielpop/misc/Ref2.pdf · Extensible Environment for Knowledge Representation and Reasoning Mediu extensibil

West University of Timisoara 3

1. Introduction

Recent explosive growth in data and the availability of large datasets in multiple disciplines generates the need for new techniques and tools that can intelligently transform the data into useful information and knowledge. The machine learning algorithms synthesize various knowledge representation forms, most of them equivalent and inter-changeable. A new approach is needed to support and combine the knowledge representation form with various data sources (represented by database management systems). In this paper, a powerful, visual CASE tool for knowledge management is introduced. It supports the integration of high-level knowledge “beans” into host projects. A scalable approach to the problem of data integration from conventional database systems in expert systems is also overviewed.

The domain of intelligent database systems has been evolving since the late 70s with contributors both from database systems and from artificial intelligence fields. Database designers and manufacturers are interested in adding semantic support in databases, while artificial intelligence practitioners are willing to apply their theories and algorithms to real life sets of data (considerably larger than experimental ones).

Previous works in database fields deal with semantic and hyper-semantic data models ([11][12]), object-oriented data models [13] and especially active database systems. Both experimental active database systems (Starburst rule system being the most notable [6]) and commercial approaches (e.g. Oracle, Sybase or DB2) are available.

There are many works, experiments, projects and commercial products in the field of AI, which try to integrate database support into existing reasoning systems and vice-versa. In [2] the authors classify the solutions to the above-mentioned problem in four categories: general bridge approaches, extending KBSs with components proper to DBMS, extending DBMSs with components proper to a KBS and the integration of KBS and DBMS. Our approach falls under the ‘full (general) bridge’ architecture. Similar previous works include DBCL [8], DIFEAD [1] and KADBASE [7] systems. The DBCL (DataBase Call Language) is a relatively simple example of the general bridge architecture concerning the ‘tight coupling’ between Prolog-based expert systems and a relational DBMS. The DIFEAD (Dictionary Interface For Expert systems And Databases) approach uses a data dictionary in the communication channel between DBMS and KBSs. The KADBASE prototype has a distributed architecture, which integrates information contained in the schemata of the individual engineering databases into a single global schema. The network data access manager maps requests from the KBS’s data manipulation language and structure to the global data manipulation language and structure.

Even the coupling knowledge-based systems (KBSs) with ‘standard’ database management systems (DBMSs) had its moment of glory in the mid-1980s, recent advances in ES shells (JESS - Java Expert System Shell for example) and emerging new DBMSs technologies (like JDBC or ADO) raise new problems and provide solutions that are more flexible.

An approach towards developing a software engineering tool for knowledge management by merging conventional CASE tool facilities with the expert system technology is introduced. The Expert System Creator assists the human designer by efficient encoding and by reusing the expert knowledge. One of the most well known tools for the development of rule-based expert systems is CLIPS (C Language Integrated Production System) [3] environment from NASA. In the late 90s, its Java counterpart, the JESS (Java Expert System Shell) [6], received great interest from both commercial and academic environments. More recently, a family of software CREATOR expert systems has been developed [10][5]. Although an application of these systems is assisting the human designer when using a

Page 4: Extensible Environment for Knowledge Representation …web.info.uvt.ro/~danielpop/misc/Ref2.pdf · Extensible Environment for Knowledge Representation and Reasoning Mediu extensibil

West University of Timisoara 4

conventional CASE tool, they do not support the translation between different knowledge representation forms nor the debugging or profiling phases, in contrast to the Expert System Creator suite.

Expert System Creator is a software tool for the development, testing and debugging, profiling and optimization of expert system based applications. Its main facilities include: • representing domain knowledge using rules set, decision tables or decision trees • additional representation power, such as using pictorial elements and designers'

comments • automatic code generation for declarative, functional or object-oriented programming

languages • integration with “native” external programs and with expert systems shells (such as JESS

or CLIPS)

After the main facilities are overviewed, the integration of expert systems with external programs is presented. The next section presents the knowledge representation forms and their equivalence. Section 3 outlines the Expert System Creator architecture and briefly describes its modules. In section 4 aspects related to code generation and integration with host projects are detailed. The database integration issue is addressed in section 5. The last section presents several applications, as well as and future research directions and extensions. 2. Knowledge Representation

For the representation of knowledge in expert systems, a number of forms are used, such

as: rules set (production rules, association rules, rules with exceptions), decision tables, classification and regression trees, instance-based representations, and clusters. Table 1 synthesizes the main features of each knowledge representation form. Each representation has its advantages and drawbacks. Expert System Creator is endowed with advanced graphical manipulation tools for three of the above forms: decision table, decision tree and rules set. While a human expert can build in a straightforward way an expert system based on classification or association rules, the decision table representation allows easy automatic analysis and consistency checking. A decision tree can be very easily translated into any common programming language (such as C/C++, Java, Pascal etc.). As these three forms are equivalent [1][14], an expert system built in one of them can be translated into any other one. The rest of the section will briefly describe the knowledge representation forms (see [19][1] for a detailed presentation).

Rules set - A rule-based system form is a set of one or more rules. A rule has two parts - an antecedent and a consequent – and has the following form: if antecedent then consequent, where the antecedent of a rule is a conjunction of elementary constraints, and the consequent is a sequence of elementary actions. Rules may be production rules, association rules and rules with exceptions.

Decision table - A decision table consists of a two-dimensional array of cells, where the columns contain the system’s constraints and each row makes a classification according to each cell’s value (case of condition).

Decision tree - A decision tree consists of a set of nodes and a set of arcs [15]. The set of nodes is divided into two classes: decision nodes and classification/action nodes (leaf nodes). Each decision node is associated with a constraint of the system and each leaf node (classification node) makes the classification based on the cases of the constraints from the decision nodes. Each arc has as its source a deciding node, and is associated with a case corresponding to the constraint from the source decision node, and the destination is a decision or classification node.

Page 5: Extensible Environment for Knowledge Representation …web.info.uvt.ro/~danielpop/misc/Ref2.pdf · Extensible Environment for Knowledge Representation and Reasoning Mediu extensibil

West University of Timisoara 5

A special type of decision trees is the binary tree, which consists of a set of decision nodes and a set of leaf nodes. A decision node is associated with a boolean constraint. A boolean constraint can have two cases: true and false. There are algorithms which implement a decision table in a programming language using binary decision trees [4].

Regression tree – When it comes to predict numeric quantities, the same kind of tree representation as above can be used, but the leaf nodes of the tree would contain a numeric value which is the average of all the training set values that the leaf applies to.

Clusters – In case of clusters, the knowledge takes the form of a diagram that shows how the instances fall into clusters. In the simplest case this involves associating a cluster number with each instance. Instances can be associated probabilistic to more than one cluster. Clustering is often followed by a stage where a decision tree or rule set is inferred.

Table 1. Knowledge representation forms

Knowledge Form Features Decision Tables - automatic correctness analysis

- compact, but rudimentary, form - easy to visualize and understand

Decision trees Regression Trees

- procedural or object-oriented programming language code can be easily generated

- automatically built from large datasets using data mining and statistical algorithms

Rules: Production rules Association rules Rules with exceptions

- easily built and assimilate by human experts - important, mature and large systems are already in use

Clusters - unsupervised learning - uses data visualization techniques - often followed by a decision tree (rule set) inferring stage

2.1. The Equivalence of Representation Forms

When a decision object is transformed from one form into another, the new object can be far larger than the original (it has more parts – rules, rows or nodes - than the original), and far less compact (not all of its parts will be used in practice), caveat named inflation problem. For example, applying brute-force algorithms to the well-known medical expert system Garvan ES1 [9] constructed as a set of production rules results in a decision table with only 13% useful rows [1]. The decision tree obtained from this decision table had 3% of useful parts. These larger objects are side effects of the fact that these decision objects are constructed as total functions on the attribute space (the space defined by the distinct possible choices for the variables monitoring of the classification system). This observation suggests that in systems with large attribute spaces the real-world process can be confined to a very small region. The knowledge of the experts is confined to this small region, which is called the region of experience. A natural way to represent the characterization K of the region of experience is by a set of constraints stating that the values of certain variables are determined by the values of others (for example if a patient is a male, pregnant variable will always be false, although sex and pregnant are independent variables). These constraints are called partial functional dependencies (PFD) [1], [2]. PFD could be constructed by the domain experts as part of the process of building a decision object. There are also proposals for estimating this region using algorithms from rough sets theory or algorithms from data mining literature, more specifically association rules discovery.

Page 6: Extensible Environment for Knowledge Representation …web.info.uvt.ro/~danielpop/misc/Ref2.pdf · Extensible Environment for Knowledge Representation and Reasoning Mediu extensibil

West University of Timisoara 6

The main idea is to represent the knowledge only over the region of experience instead of the entire attribute space. A case is consistent with K if it is consistent with at least one PFD of K (i.e. there is one PFD which verifies the case). The translation algorithms from one representation form to another implemented in the Expert System Creator suite are based on the above-mentioned ideas.

Implementation details and experimental results for decision tables and decision trees are presented in [14]. The most distinct and representative knowledge representation forms are discussed below: decision tree, rules set and decision table.

Let us suppose that the characterization K of the region of experience was obtained using one of the methods presented above. We present translation algorithms modified so that they use K, the set of PFDs. A good estimate of K would not only control inflation, but would act as a filter detecting cases which are either spurious or perhaps legitimate but outside the experience of the domain experts

2.2. Rules and Decision Table

An acyclic rule system defined on the region of the attribute space covered by K is

equivalent to a decision table defined on the same region of the attribute space. Construction Algorithm: 1. Table => Rule based system Each row of the table is translated into a rule in the rule-based system as follows:

1. the antecedent is the conjunction of all cell’s values from one row 2. its consequent is represented by the classification performed by the row Rule based system => Table

For each rule of the system, the antecedent is unfolded so that it is represented as a conjunction of cases. In the unfolding process, each case is tested for consistency with respect to K and inconsistent cases will be removed. The resulting antecedent will form a new row in decision table. A column in the decision table will hold the consequent of each rule. The constructed decision table will be therefore consistent with respect to K.

2.3. Decision Table and Decision Tree

A decision tree defined on the region of the attribute space covered by K is equivalent to

an unambiguous decision table defined on the same region of the attribute space, which is complete with respect to K if the tree is.

Construction Algorithm: a. Decision tree => Decision Table

Each path through the decision tree consistent with K is included in the decision table. If the path is inconsistent with K, no valid case will be consistent with it and path will not be included in decision table. As each classification (leaf) node of the decision tree performs a unique classification, the resulting decision table is unambiguous.

If the tree is complete with respect to K, then for every case C covered by K, there is one path through the tree consistent with C. This path P(C) is consistent with K by definition of K, so is included in the table. Thus, every case C covered by K is consistent with at least one row of the table and therefore the table is consistent with respect to K.

b. Decision Table => Decision tree (by induction on the number of attributes associated with the table) Basis step: If any of the following is the case, then the induction terminates with the

indicated action:

Page 7: Extensible Environment for Knowledge Representation …web.info.uvt.ro/~danielpop/misc/Ref2.pdf · Extensible Environment for Knowledge Representation and Reasoning Mediu extensibil

West University of Timisoara 7

1. The number of attributes is zero. Create a leaf node whose classification is the table's single classification, and whose leaf proposition is TRUE (depends on the table being unambiguous);

2. The number of rows is zero. Create a leaf node whose classification elementary proposition is arbitrary and whose leaf proposition is FALSE;

3. The number of distinct classifications is one. Create a leaf node whose classification elementary proposition is the table's unique classification, and whose leaf proposition is the disjunction of propositions created from each row by the conjunction of the cell propositions.

Induction step: There is at least one attribute, at least one row, and at least two distinct classifications in the table T.

1. Select an attribute A by some method (this point is discussed below); 2. Create a deciding node associated with this attribute; 3. For each v in the value set of a such that the path proposition in the tree down to A

(P(A, v)) is consistent with K, create: a. an arc whose source is the deciding node created and whose arc proposition is A = v; b. a decision table T(A, v) with one row derived from each row r of T in which the cell

associated with A is consistent with A = v, and such that P(A, v) & r consistent with K, by removing the cell associated with A.

T(A, v) has one fewer attribute in its associated attribute set than T, so the induction proceeds.

This proves that the tree is equivalent to the table on K, since no branch pruned is consistent with K, nor is any row excluded from the T(A, v). The tree is therefore complete with respect to K if the table is.

Table 2. Decision table to decision tree algorithm

Tree Table2Tree(Table T) if number_of_constraints == 0 or number_of_rows == 0 or number_of_distinct_classifications == 1 then return new Tree(LeafNode()); Constraint sc = SelectSplit(T); DecNode dn = new DecNode(sc); Foreach v in Valueset(sc) do dn.AddChild(sc==v, Table2Tree(DeriveTable(T, sc, v))); Endfor Return new Tree(dn); End DeriveTable procedure constructs a new decision table T(sc, v) with one row derived from each row r of T in which the cell associated with sc is consistent with sc = v, by removing the cell associated with sc.

The key to the algorithm developed above is in the method of choosing an attribute in part 1 of the induction step. Shwayder [17], [18] uses a heuristic based on equalizing the number of rows in the T(A, v) (maximum dispersion), giving one of the standard algorithms for translating from a decision table to a decision tree. Quinlan [15] uses a heuristic based on minimizing the maximum number of distinct classifications in the T(A, v) (maximum entropy gain). With the maximum entropy gain selection procedure the algorithms gives a variant of the ID3 algorithm.

Page 8: Extensible Environment for Knowledge Representation …web.info.uvt.ro/~danielpop/misc/Ref2.pdf · Extensible Environment for Knowledge Representation and Reasoning Mediu extensibil

West University of Timisoara 8

In order to compare various heuristics (in SelectSplit procedure), we implemented three selection procedures based on ID3, C4.5 and MDV (maximum distinct values, the attribute with the most distinct values is chosen first). The translation algorithm was implemented based on the construction presented above. The results are presented in Table 2:

Table 3. Decision table to decision tree algorithms performances

Decision Table (rows)

MDV (cn/dn/l)

ID3 (cn/dn/l)

C4.5 (cn/dn/l)

37 37/38/8 37/29/7 37/74/16

59 59/52/9 59/52/8 59/139/17

100 100/92/ 17

100/92/8 100/249/ 17

where cn is the number of classification nodes of the tree, dn represents the number of

decision nodes of the tree and l is the number of tree levels. 2.4. Extensions to Standard Knowledge Representation Forms

For the representation of knowledge in expert systems, a number of forms are used, such as: rules set (production rules, association rules, rules with exceptions), decision tables, classification and regression trees, instance-based representations, and clusters. Each representation has its advantages and drawbacks. Expert System Creator is endowed with advanced graphical tools for three of the above forms: decision table, classification tree and rules set. As these three forms are equivalent [9], an expert system built in one of them can be translated into any other one. In order to overcome the limitations of the “standard” decision table and tree representation, we have introduced several extensions to the basic models: pre-actions, intermediate-actions and scoring. These will be presented in this section.

The Decision Frame Designers handles the rule-based expert systems. The most known shells for rule-based expert systems are CLIPS (C Language Integrated Production System) [2] and JESS (Java Expert System Shell) [5]. Both shells are integrated in the Decision Frame Designer and users can switch from one to another at any moment. The Decision Frame Designer [9] is a project-based, user-friendly application for the development, verification, debugging and profiling of rule-based systems. A rule-based expert system can be integrated in C++/Java host projects. The calling code is automatically generated by the Decision Frame Designer.

A decision table consists of a two-dimensional array of cells, where the columns contain the system’s constraints and each row makes a classification according to each cell’s value (case of condition). We are proposing an extended model for the decision table. Each condition has associated two new concepts: pre-actions and score. A pre-action denotes a specific action that is executed before the condition is evaluated. Each table’s condition (column) has associated a list of pre-actions that is executed prior to condition’s evaluation. A pre-action may is represented by variable initialization, user input/output handling etc. The Decision Table Designer let users to construct a decision table object in a visual way.

A classification tree consists of a set of nodes and a set of arcs [11]. The set of nodes is divided into three classes: decision nodes, intermediate-action nodes and classification nodes (leaf nodes). Each decision node is associated with a constraint of the system and each leaf

Page 9: Extensible Environment for Knowledge Representation …web.info.uvt.ro/~danielpop/misc/Ref2.pdf · Extensible Environment for Knowledge Representation and Reasoning Mediu extensibil

West University of Timisoara 9

node (classification node) makes the classification based on the cases of the constraints from the decision nodes. Each arc has as its source a deciding node, and is associated with a case corresponding to the constraint from the source decision node, and the destination is a decision or classification node. The standard decision tree model was extended with a new type of nodes – intermediate-action nodes – that let users to specify actions to be executed before the decision node’s evaluation. The semantic of intermediate-action is similar to decision table’s pre-action. The Decision Tree Designer offers the possibility to develop classification trees, providing a structured designer interface that let users to group tree’s nodes at each level, thus saving design space.

To support inexact reasoning, the decision table/tree were enhanced with scores. A score, represented by a real number between 0 and 1, is attached to each condition of a decision table/tree. The score is a measure of user’s confidence in the specific condition. The final rule’s score (confidence) is obtained by combining the conditions’ scores. Various combination functions can be implemented, but for now, experiments using cumulative and multiplicative functions have been made. 3. Expert System Creator Overview

An overview of the Expert System Creator is shown in Fig. 1. The human expert prepares an initial design using one of the graphic designers: Decision Frame, Decision Table or Decision Tree. This is converted into programming language code by the Code Generator module. The designers can compile the generated code and report any existing errors. In order to use symbols (constants, variables, functions, user types) from other software projects (denoted as host projects) a Dictionary is introduced. The code generated by the Code Generator module can be automatically integrated in the host project. The integrated debuggers in Decision Frame/Table/Tree Debugger can be used to debug the generated code in its original form (as a frame, table or tree). The database integration is ensured by a set of four components (DBRepository, DBEngine, DBWizard and DBMiddleware) that inter-communicate and that have external interfaces for other system’s modules (see Section 5 for details). The communication between Expert System Creator and the host project is detailed in next section.

The Expert System Creator suite supports all the phases of a project’s life cycle: analysis and documentation, development and implementation, testing, debugging and profiling.

Three powerful graphic designers support the analysis and development phases: • Decision Frame Designer – rule-based expert system development • Decision Table Designer – decision table based expert systems construction • Decision Tree Designer – decision tree based expert systems development

The graphical elements provided by each designer, help the human expert to express his/her knowledge in the most suitable form. To add a new rule, or a new constraint is as simple as selecting an option from designer’s menu. Although the user can manually enter the text for rule (table, tree)’s constraints or actions, the system offers graphical code editors. Graphical code editors present the user with the list of defined objects in system, the list of available operators, the list of imported data and data types in Dictionary, etc.

The system is entirely implemented in Java using Java2™ SDK 1.3. The Decision Frame module works together with CLIPS [3] or JESS [8] expert system shells that perform the knowledge-based reasoning process. Interfacing with external programs written in C/C++ is carried out using JNI specifications [8]. The Code Generator module supports the following programming languages: C/C++, Java (for decision tables and trees) and CLIPS/JESS (for decision frames).

Page 10: Extensible Environment for Knowledge Representation …web.info.uvt.ro/~danielpop/misc/Ref2.pdf · Extensible Environment for Knowledge Representation and Reasoning Mediu extensibil

West University of Timisoara 10

Code generation Send/Receive values Execution

Figure 1: Expert System Creator Architecture 3.1 Decision Frame Designer

It is a project based GUI application. A project contains one or more modules, each

module being composed of templates, global variables, functions and rules. Fig. 2 illustrates a snapshot of a Decision Frame work session. The left pane is divided in two areas: Project Browser in the upper left and Property Panel in the bottom left. The Project Browser displays all objects defined in the current module of decision frame. Each time an object is selected, the Property Panel shows its properties. Using available buttons, the graphical editors are invoked to add new constraints (conditions) or new actions. The right pane, the Graphic Pad, displays system entities and the semantic graph (see below for details). The structured graphical designer implemented in the Graphic Pad uses blocks to group entities in higher-level structures.

The Decision Frame Debugger offers the possibility to debug the expert system in the form of rule-based system. Its advanced facilities (breakpoints at rule level, variables inspection, rule-by-rule execution, step-by-step execution of rule’s consequent) are presented to the user in a visually, easy to use way.

The rule-based system profiler helps users to find time-consuming rules in order to optimize their systems. It records the execution context formed by the antecedent and the consequent in the trace files. The Trace Viewer visualizes trace files.

The knowledge base completion and correctness are key issues in designing large knowledge base systems. Expert System Creator includes appropriate mechanisms for visualizing and testing the correctness of constructed knowledge bases.

For decision frames, the semantic graph is visualized as can be seen in Fig. 2. The semantic graph is a pair SG=(N, A), where N – the set of nodes – is represented by system’s rules, and A is the set of arcs. An arc is defined with source N1 and target N2 if the consequent of the rule N1 asserts a fact that appears in the antecedent of the rule N2.

DESIGNERS

Decision Table

Decision Tree

Decision Frame

CODE GENERATOR

DEBUGGERS

TRACEVIEWER TRACE FILE

HOST PROJECT

DIC

TIO

NA

RY

C++

/Jav

a C

ode

Use

r-de

fined

dat

a an

d da

ta ty

pes

Relational DBMS

Relational DBMS

DATABASE INTEGRATION

DBWizard

DBEngine

DBMiddleware

DBRepository

Page 11: Extensible Environment for Knowledge Representation …web.info.uvt.ro/~danielpop/misc/Ref2.pdf · Extensible Environment for Knowledge Representation and Reasoning Mediu extensibil

West University of Timisoara 11

Figure 2. Expert System Creator: Decision Frame Designer

3.2 Decision Table Designer

The Decision Table Designer it is also a project based application, each project containing one or more tables. Its user interface is similar to Decision Frame Designer’s. It integrates the Dictionary Manager module in order to import definitions from host projects (see section 4 for details). The designer generates C++/Java code for the tables of current project using a pre-conversion to decision tree form. In addition to debugging the generated code with appropriate tools for target programming languages (C++/Java), the integrated debugger supports the testing and debugging of constructed system in the form of decision table. Its advanced facilities (breakpoints, variables inspection and step-by-step execution) are presented to the user in a visually, easy to use way.

In order to check the correctness and completeness of the decision table, the Table Analyzer tool highlights the duplicated and ambiguous rules that exist in the table. Fig. 5 shows two ambiguous rules (the 3rd and 6th rules from left to right) in the engine diagnose decision table. 3.3 Decision Tree Designer

Similar to Decision Table and Frame Designers, Decision Tree Designer it is project based as well, each project containing one or more trees. The structured designer offers the users the possibility to block the tree at each level, thus saving vital designing space. As for the Decision Table Designer, it also imports the definitions from the host project and it generates C++/Java code for selected trees. In order to offer a better view of the system, it displays all the rules encapsulated in the tree in a distinct panel.

The integrated debugger helps you to debug by setting breakpoints at node level and by inspecting variables. Fig. 3 shows a debugging session snapshot. The “red node” is the current activated breakpoint. The profiler uses trace files, which record all executed branches for a decision tree. The execution context (identified by the variable values for decision tree nodes) is also stored in the trace file. The Trace Viewer visualizes trace files in order to find system bottlenecks or time-consuming nodes.

Page 12: Extensible Environment for Knowledge Representation …web.info.uvt.ro/~danielpop/misc/Ref2.pdf · Extensible Environment for Knowledge Representation and Reasoning Mediu extensibil

West University of Timisoara 12

Figure 3. Expert System Creator: Decision Tree Debugger 4. Verification, Validation and Debugging

The knowledge base completion and correctness are key issues in designing large knowledge base systems. Expert System Creator includes appropriate mechanisms for visualizing and testing the correctness of constructed knowledge bases and for debugging the execution of expert systems. Fig. 1 shows the general architecture of the entire application. Some modules (like those of the database integration subsystem) are not described in this paper. The reader is referred to [11] for details about these modules.

The integrated debuggers in Decision Frame/Table/Tree Designer can be used to debug the final system in its original form (as a rules set, table or tree), instead of “classical” C++/Java/Jess debugging. This new approach helps domain users and software developers to visually test and repair the constructed expert system.

4.1 Rules Set

For rule-based systems, the semantic graph (SG) highlights the relationships between rules

and templates. The semantic graph is a pair SG = (N, A)

where N – the set of nodes– is represented by system’s rules, and A is the set of arcs. An arc is defined with source N1 and target N2 if the consequent of the rule N1 asserts a fact that appears in the antecedent of the rule N2.

The visual representation of this graph reveals main or isolated rules and templates. The graphical representation is a better approach to computing various numeric metrics that measure the quality of system’s design. For users confident to numbers, a set of base metrics is also computed.

Despite CLIPS’s age, there are no integrated development environments offering “standard” debugging techniques, like step-by-step execution or breakpoints management for it. Decision Frame Debugger implements these debugging techniques by means of an easy-

Page 13: Extensible Environment for Knowledge Representation …web.info.uvt.ro/~danielpop/misc/Ref2.pdf · Extensible Environment for Knowledge Representation and Reasoning Mediu extensibil

West University of Timisoara 13

to-use, visual user interface. The Decision Frame Debugger includes the following features: rule-by-rule execution, breakpoints management, step into rule RHS’s actions (procedural debugging), variables inspection, display of facts and agenda memories. It supports both CLIPS and JESS inference engines. In debugging mode, the system is automatically executed in a rule-by-rule manner, stopping on each breakpoint. In case of using CLIPS inference engine, Java Native Interface (JNI) technology is used for bridging between Decision Frame Debugger (Java environment) and CLIPS engine (native environment). The communication between the Debugger and the inference engine is described by some general interfaces. For the moment, CLIPS and JESS interface instances are implemented, but more inference engines can be easily plugged in (see Fig. 4).

Figure 4. Decision Frame Debugger

In order to find the time-consuming rules, rule-based profiler traces the system’s execution . The system records the execution context in the trace files and using the Trace Viewer these files are visualized. In case of rules set system, the execution context is formed by the antecedent and the consequent of the executed rule.

Figure 5. Decision Table Analyzer 4.2 Decision Table

The automatically check of the correctness and completeness of the decision table is carried out by the Table Analyzer tool (embedded in Decision Table Designer), which highlights the duplicated and ambiguous rules that exist in the table. Fig. 5 shows two ambiguous rules (the 3rd and 6th rules from left to right) in the sample decision table. To measure the completeness of a decision table, the completion ratio (CR) is computed as follows:

CR = [Possible Rules] / [Actual Rules]

Decision Frame Debugger

Inference Engine Interface

CLIPS

JESS

Page 14: Extensible Environment for Knowledge Representation …web.info.uvt.ro/~danielpop/misc/Ref2.pdf · Extensible Environment for Knowledge Representation and Reasoning Mediu extensibil

West University of Timisoara 14

where [Possible Rules] is the number of possible rules and [Actual Rules] is the number of rules of the decision table. The number of possible rules is computed as the product of cardinalities of all attribute domains.

The Decision Table Debugger lets you debug the system as a decision table. You can set breakpoints on table’s cell that will stop the execution of the system. While the execution is paused, you can inspect variables status. To stop the system’s execution on a breakpoint, the Code Generator module generates additional Java/C++ code for each table’s cell. Before the execution of a Java/C++ statement in the “host” project, the Decision Table Debugger is interrogated whether or not a breakpoint is hit. If a breakpoint is hit, the execution control is passed to the Expert System Creator thread and the current values of all watched variables are updated. When the user continues the program execution (from Decision Table Debugger), the control is regained by the host project thread and the watch variables’ values (possibly modified by the user during the debugging session in Decision Table Debugger) are sent back to the host project. 4.3 Decision Tree

For a better highlighting of the rules induced by a decision tree, the Decision Tree Designer displays all the rules encapsulated in the tree in a distinct panel. It also offers statistics regarding the number of nodes in tree, the number of nodes in each category (decision nodes, action nodes, leaf nodes), the number of induced rules, the number of incomplete decision nodes etc.

The Decision Tree Debugger module offers the possibility to debug the expert system in the decision tree form. Similar to the decision table debugging, the Code Generator module generates additional code for each tree’s node. See the section 3.2 for details regarding the communication between host project and Decision Tree Debugger and for control of the execution as well.

In order to find out the time-consuming rules, the Code Generator module optionally generates addition code for tracing the host program execution. During a “traced” execution of the host program a trace file is created. It contains the execution context for each visited node. The execution context is composed of the timestamp and variables’ values. The trace file is visualized by the Trace Viewer module that embosses the “bottlenecks” nodes. 5. Embedding Expert Systems in Host Systems An important issue in developing knowledge-based systems is their embedding and communication with a host project. The knowledge system is a form of representing the logic of a particular field, a form that is most suitable for the way of perception of the domain for a human expert. Embedding the field’s logic with the user interface and database access modules in large software application is still an open issue. A new solution, made up of two components – the Code Generator module and the use of dictionaries - is proposed here. In the following two subsections, the components are presented in more detail. 5.1. Code Generator Module

After the expert system has been built (as a rules set, decision table or decision tree) it can be translated into a common programming language, such as C/C++, Java or Basic. Although the generation of programming code for decision trees and rules sets is straightforward, the following two steps are necessary for decision tables: • Translation of decision tables into decision trees (see [14] for details)

Page 15: Extensible Environment for Knowledge Representation …web.info.uvt.ro/~danielpop/misc/Ref2.pdf · Extensible Environment for Knowledge Representation and Reasoning Mediu extensibil

West University of Timisoara 15

• C++ code generation for the obtained decision trees

The Code Generator module can be easily customized for any structured, functional or object-oriented programming language, offering an easy way of embedding expert systems into host projects. The generated code can have sections for debugging or tracing purposes. If tracing is enabled when generating the code, then at the execution time of the expert system a trace file will log all the fired rules (branches in decision trees). This file is later used in the Expert System Creator‘s Trace Viewer module in order to discover time-consuming rules, frequent (or spurious) rules or rule-sequences (a set of rules activated in the same order). All this information helps you improve the performances of your expert system. 5.2. Use of Dictionaries

A Dictionary is a collection of external interfaces imported into expert systems from a host project for communication with the project’s data and data structures. In order to use the data types, variables or constants defined inside the host project, they have to be imported into the expert system frame. A dictionary contains the following: • Data and data types imported from the host project: user-defined types (classes or

structures), constants and variables, user-defined or system native functions • Local data: collection of variables defined only in the expert system modules; local data

are variables of primitive type (integer, char etc) or of user-defined data types The Code Generator module gets the information about the module in which the constant / variable / function is defined in the host project, and includes the necessary information in the generated files. The local data is also exported in the generated code. The Dictionary Builder module is able to read user-defined data types, constants and variables from C/C++ header files and Java files. 6. Database Integration Large databases of digital information are available in all fields of human activity. Reasoning using information stored in these large warehouses is a demand of our age. Expert System Creator offers direct support for database access for all three forms: rules set, decision table and decision tree.

In the case of decision tables and decision trees, users can use their preferred database access libraries by importing them in the dictionary component. Using database access functions from within the constructed decision table or decision tree is straightforward and requires no specific handling.

In the case of rules set (or decision frames), the problem of reasoning on facts residing in conventional relational database systems requires more attention. A major objective of database integration is to provide independence of both inference engine and DBMS. The ESDB (Expert System DataBase) is an independent subsystem which acts as a communication channel between DBMSs and a decision frame based expert system. The subsystem presents an architecture (see Fig. 6) where an independent subsystem acts as a communication channel between DBMSs and KBSs (ESs). Note that access to the database and KBS/ES can be achieved by the users through the standard DBMS/KBS languages, therefore preserving the generality and power of DBMSs and KBSs.

Page 16: Extensible Environment for Knowledge Representation …web.info.uvt.ro/~danielpop/misc/Ref2.pdf · Extensible Environment for Knowledge Representation and Reasoning Mediu extensibil

West University of Timisoara 16

Figure 6. ESDB Architecture

6.1. ESDB Modules’ Description

The data dictionary stores information independent of the physical storage, for example, sources of data, relationships to other data and rules assuring the data integrity among the facts. Although it controls the location of data in the DB, it is autonomous from the DBMSs data models.

Database tables and views are mapped to so-called Table Template Definitions (TTDs) on the data dictionary level. A table (or view) in the database can be mapped to one or more TTDs in the data dictionary, whereas a TTD can refer a single database table (or view). Database relationships are mapped to so-called Relationship Rule Definitions (RRDs) in data dictionary level. A RRD rule captures the constraints imposed on data in the database and preserves them for facts in the inference engine’s fact database.

Both TTDs and RRDs are saved in an internal file format representation. In the case of CLIPS and JESS, the CLP syntax is used and the data dictionary is composed of one or more CLP files containing TTDs and RRDs. For TTD the deftemplate construct is used, whilst for RRD the defrule is used. All data dictionary constructs are encapsulated in a separate module, namely the ESDB module.

For example, the Employee table in a RDBMS is mapped to the following template in data dictionary: (deftemplate ESDB :: Employee (slot _DBConnection_ (default ?*DBxxx*)) (slot ID (type Integer)) (slot Name (type String)) (slot Address (type String)) (slot DeptID (type Integer)))

Note that the _DBConnection_ slot holds the database connection information and is automatically assigned by DBWizard to a fact (labeled by global variables of form ?*DBxxx*) having the following template:

Page 17: Extensible Environment for Knowledge Representation …web.info.uvt.ro/~danielpop/misc/Ref2.pdf · Extensible Environment for Knowledge Representation and Reasoning Mediu extensibil

West University of Timisoara 17

(deftemplate ESDB :: DBConnection (slot URL (type STRING)) (slot JDBCDriver (type STRING)) (slot User (type STRING)) (slot Password (type STRING)))

For each database there is a single object holding connection properties, which is then used in all database transfer operations.

DBWizard is a GUI tool allowing the user to create, update and maintain the data dictionary. It defines and updates the TTDs and RRDs in the data dictionary for each DB object. The user may select the columns of the table/view or the relationships he/she is interested in, having the ability to reduce the size of data that will be transferred to/ from the DBMS.

The DBEngine module is responsible for fetching the data from the DB and for updating the DB. It also controls the translation between the internal DB format and KBS formats. In particular, it contains the functions that implement database operations, such as: loading facts from the database, saving facts in the database and transaction control. In all these operations, DBEngine relies on data dictionary information. DBEngine functions may be used in both LHS (‘left hand side’) and RHS (‘right hand side’) of a production rule.

After the KBS/ES request has been mapped, the DBEngine identifies the target DBs that contain information required to answer the query, and generates subqueries to those DBs to gather that information. The DBMiddleware module translates each subquery to a target DB to the syntax of the specific database, and the corresponding DBMS is invoked to process resultant subqueries. The records fetched from the DBMS are converted into KBS/ES facts, more precisely instances of TTD templates.

KBS Interface (KBSI) is responsible for interfacing the DBEngine module with external KBS/ES systems, containing KBS/ES specific code. DBEngine functions prototypes are: (dbOpen <JdbcDriverFile> <JdbcDriver> <Url> <User> <Password>) (dbClose ?*DB*) (dbSelect [?*DB*] <Table-Template> | <TableName> [<Slots-list>] [WHERE <SQL-clause>]) (dbInsert [?*DB*] <Facts-list>) (dbDelete [?*DB*] <Facts-list>) (dbStartTransaction ?*DB*) (dbCommit ?*DB*) (dbRollback ?*DB*)

where ?*DB* is a DBConnection instance, square brackets denote an optional parameter, whilst angle brackets are user-provided values.

Note that for the dbSelect, dbInsert and dbDelete functions, the ?*DB* parameter is optional; if not specified, the database instance is retrieved from the _DBConnection_ slot of asserted facts.

The DBMiddleware module ensures database autonomy for the DBWizard and DBEngine modules, being responsible for the whole database access code. The DBMiddleware component achieves the transparence between the KBS level and the data source level represented by various DBMS vendors. It provides Data Manipulation Language (DML) features (retrieving | updating | deleting data from tables or views), automatic data type

Page 18: Extensible Environment for Knowledge Representation …web.info.uvt.ro/~danielpop/misc/Ref2.pdf · Extensible Environment for Knowledge Representation and Reasoning Mediu extensibil

West University of Timisoara 18

conversion between the DBMS data types and the KBS data types and many other database-specific operations. 6.2. Working With ESDB A normal working session involves the following steps: 1 > Call dbOpen to open the database(s) that contains the table(s). 2 > Call dbStartTransaction to begin transaction(s) on the database(s). 3 > Use dbSelect function to download records from the database table(s) into KBSs. The

downloaded records are automatically converted into instances of their corresponding TTD (JESS/CLIPS facts). An SQL SELECT query is created and only the columns that correspond to attribute slots in TTD are downloaded. By default, all records in a table are downloaded. An optional SQL WHERE clause can be supplied to the dbSelect function to qualify the record selection (WHERE Price > 30).

4 > Reasoning phase. 5 > The function dbCommit to update the records in the DB to reflect changes made to

values on the slots that correspond to columns in the facts; this step requires that the application is able to write to any table(s) that is to be updated. ESDB builds and executes the necessary DB commands.

6 > Call dbClose to close the database(s). 6.3. Exception Handling

A communication fault or database error that occurred while data manipulation is carried out by the DBEngine is signaled to the inference engine by asserting a special purpose fact in the inference engine’s fact database. DBException template describes the error as follows: (deftemplate ESDB :: DBException (slot errorID (type NUMERIC)) (slot errorString (type STRING)) (slot errorDB)) A default exception-handling rule is implemented in the data dictionary, as follows: (defrule ESDB :: DefaultExceptionHandling (declare (salience MAX_SALIENCE)) ?ex<-(DBException (errorID ?errorID) (errorString ?errorStr)) => (retract ?ex) (printout err “[Error ” ?errorID “] “ ?errorStr crlf) (halt) (user-exception-handler ?errorID ?errorStr))

It simply prints to error stream an error description message and calls a user-defined function for special error handling. It is possible to replace the default treatment with user-defined exception handling rules or functions.

The ExceptionOccured function, implemented in the DBEngine module, tests the fact database for DBException instances and returns its identifier or null if no exception arose.

Page 19: Extensible Environment for Knowledge Representation …web.info.uvt.ro/~danielpop/misc/Ref2.pdf · Extensible Environment for Knowledge Representation and Reasoning Mediu extensibil

West University of Timisoara 19

The ESDB full-bridge is implemented in Java using JDBC2 standard to access various database management systems. The Java/JDBC2 approach was preferred because of its large acceptance (JDBC2-compliant drivers are available for both commercial and free relational database systems), its portability over various operating systems, its support for legacy databases through the JDBC-ODBC Bridge. A mapping between SQL data types and inference engine supported data types is carried out by ESDB.

The DBMiddleware, DBWizard and DBEngine modules are implemented in a standard Java package, and a thin JNI (Java Native Interface) layer is used to interface ESDB with C/C++ programs and libraries (e.g. CLIPS). An intrinsic advantage of the ESDB approach coming from its Java implementation is its capability to be embedded in J2EE platform components. Having support for various relational DBMSs and KBS/ESs, it can be integrated in a more general heterogeneous environment. 7. Applications, Future Research and Extensions

Expert systems offer several advantages as decision tools in the performance of complex tasks, such as accounting, planning or decision-making in various domains. Expert System Creator gives users the opportunity to develop an expert system based decision aid that is geared towards their specific needs. It also assists users who do not possess technical expertise in computers in building powerful expert systems.

An effective decision support system must have accurate data, user-friendly interface, reliable knowledge base, and good inference mechanism. Expert System Creator helps you combine these requirements and assists programmers and software engineers in building resource planning systems.

Future research includes the optimization of the decision table to decision tree conversion based on automatic building of the characterization of region of experience.

Expert System Creator has an open architecture for building and testing the decision objects. Further extensions are planned to integrate the rule-based expert systems with fuzzy logic engines such as FuzzyCLIPS [12] or FuzzyJ [13]. The automatic growth of decision trees for classification from large data sets [7] is another important feature to be added in the near future.

The system is entirely implemented in Java using Java2™ SDK 1.3. The Decision Frame module supports CLIPS 6.1 and JESS 6 expert system shells [5] that perform the knowledge-based reasoning process. The Code Generator outputs C++ and Java code for decision tables and trees, whilst the rule-based systems are generated using CLIPS/JESS syntax.

Page 20: Extensible Environment for Knowledge Representation …web.info.uvt.ro/~danielpop/misc/Ref2.pdf · Extensible Environment for Knowledge Representation and Reasoning Mediu extensibil

References [1] Colomb R.M.Representation of Propositional Expert Systems as Partial Functions,

Artificial Intelligence 1999, 109: 187-209. [2] Colomb R. M, Chung C.Y. Strategies for Building Propositional Expert Systems,

International Journal of Intelligent Systems 1995, 10: 295 – 328. [3] Culbert C, Riley G, Donnell B. CLIPS Reference Manual, Volume 1-3, Johnson Space

Center NASA; 1993. [4] Dial R.B. Decision Table Translation, Collected algorithms from CACM, 1970. [5] Far B.H, Takizawa T, Koono Z. Software Creation: An SDL-Based Expert System for

Automatic Software Design. In Faergemand O, Sarma A. editors. Proceedings of SDL '93; Elsevier Publishing Co, North-Holland, 1993; p. 399-410.

[6] Friedman-Hill E. JESS: The Rule Engine for the Java Platform. http://herzberg.ca.sandia.gov/jess [3/11/2002]

[7] Gehrke J, Ganti V, Ramakrishnan R, Wei-Yin Loh. BOAT-Optimistic Decision Tree Construction, Proceedings of SIGMOD Conference 1999, 169-180.

[8] Gordon R. Essential JNI: Java Native Interface, Prentince Hall PTR; 1998. [9] Horn K.A, Compton P, Lazarus L, Quinlan J.R. An Expert Computer System for the

Interpretation of Thyroid Assays in a Clinical Laboratory, Australian Computer Journal 1985, 17:7-11.

[10] Koono Z, Far B.H, Takizawa T, Ohmori M, Hatae K, Baba T. Software Creation: Implementation and Application of Design Process Knowledge in Automatic Software Design, Proceedings of the 5th Int. Conference on Software Eng. and Knowledge Eng., SEKE, 1993 June; CA, USA; p. 332-336.

[11] Optimal Solution web site. http://www.optsol.at [3/21/2002] [12] Orchard R.A. FuzzyCLIPS User’s Guide, Integrated Reasoing Institute for Information

Technology National Research Council Canada; 1998. [13] Orchard R.A. NRC FuzzyJ Toolkit for the Java™ Platform. User’s Guide; 2001.

http://www.iit.nrc.ca/IR_public/fuzzy/fuzzyJDocs [3/11/2002] [14] Pop D. and Negru V. Intertranslability of Representation Forms in Classification,

Proceedings of 10th SINTES Conference; 2000 May; Craiova, Romania. [15] Quinlan J.R. Introduction of Decision Trees, Machine Learning 1; 1984. [16] Riley G, Donnell B. CLIPS Arhitecture Manual, Johnson Space Center NASA; 1993. [17] Shwayder K. Conversion of limited-entry decision tables to computer programs - a

proposed modification to Pollack's algorithm. Communications of the ACM 14, pp. 69-73, 1971.

[18] Shwayder K. Extending the information theory approach to converting limited-entry decision tables to computer programs. Communications of the ACM 17, pp. 532-537, 1974.

[19] Witten I.H, Frank E. Data Mining. Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufman Publishers; 2000.