2002 (Volume 12)
Interval mathematics for analysis of multi-level granularity
(University of Texas, USA)
(University of Houston-Downtown, USA)
The more complex the problem, the more complex the system necessary to solve it. For very complex problems, it is no longer possible to design the corresponding system at a single resolution level; it becomes necessary to have multi-level, multi-resolution systems with multi-level granulation. When analyzing such systems - e.g., when estimating their performance and/or their intelligence - it is reasonable to exploit their multi-level character: first, we analyze the system at the low-resolution level, and then we sharpen the results of the low-resolution analysis by considering higher-resolution representations of the analyzed system. The analysis at the low-resolution level provides an approximate value of the desired performance characteristic. To draw a definite conclusion, we need to know the accuracy of this approximation. In this paper, we describe interval mathematics - a methodology for estimating such accuracy. The resulting interval approach is also extremely important for tessellating the search space when searching for optimal control. We overview the corresponding theoretical results and present several case studies.
keywords: interval mathematics, multi-resolution granulation, space division methods, multi-D generalisations.
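The accuracy estimation the abstract refers to can be illustrated with elementary interval arithmetic: evaluating an expression over intervals yields guaranteed bounds on the result. A minimal sketch (the `Interval` class and its operations are our own illustration, not the paper's formalism):

```python
class Interval:
    """A closed interval [lo, hi] with exact endpoint arithmetic."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        # Sum of intervals: endpoints add.
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        # Product: take the extremes over all endpoint combinations.
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))

    def __repr__(self):
        return f"[{self.lo}, {self.hi}]"

# A low-resolution analysis yields interval estimates of the inputs;
# interval evaluation then bounds the performance characteristic.
x = Interval(1.0, 2.0)
y = Interval(0.5, 1.5)
print(x + y)   # [1.5, 3.5]
print(x * y)   # [0.5, 3.0]
```

Any value of the true expression, for inputs anywhere inside the input intervals, is guaranteed to lie inside the computed interval.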
Effect of fuzzy discretization in fuzzy rule-based systems for classification problems with continuous attributes
|Hisao Ishibuchi and Takashi Yamamoto|
(Osaka Prefecture University, Japan)
Continuous attributes are usually discretized into intervals in machine learning and data mining. Our knowledge representation, however, is not always based on such discretization. For example, we usually use linguistic terms (e.g., young, middle-aged, and old) to divide ages into categories with fuzzy boundaries. In this paper, we examine the effect of fuzzy discretization on the classification performance of fuzzy rule-based systems through computer simulations on simple numerical examples and real-world pattern classification problems. For these simulations, we introduce a control parameter that specifies the overlap grade between adjacent antecedent fuzzy sets (i.e., linguistic terms) in fuzzy discretization. Interval discretization can be viewed as a special case of fuzzy discretization with no overlap. Computer simulations are performed using fuzzy discretization with various specifications of the overlap grade. Simulation results show that fuzzy rules have high generalization ability even when the domain interval of each continuous attribute is homogeneously partitioned into linguistic terms. In contrast, the generalization ability of rule-based systems strongly depends on the choice of threshold values in the case of interval discretization.
keywords: pattern classification, discretization of continuous attributes, rule extraction, data mining, fuzzy rules, rule weights.
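A homogeneous fuzzy partition with an overlap-grade parameter can be sketched as follows. This is a hypothetical formulation (the parameter names and the trapezoidal shape are our own, not necessarily the paper's): `overlap = 0` recovers crisp interval discretization, and for `0 < overlap <= 1` adjacent terms overlap so that memberships of interior points sum to one.

```python
def membership(x, k, K, lo=0.0, hi=1.0, overlap=0.5):
    """Trapezoidal membership of x in the k-th of K terms over [lo, hi]."""
    width = (hi - lo) / K
    left, right = lo + k * width, lo + (k + 1) * width
    fuzz = overlap * width / 2.0          # half-width of the fuzzy boundary
    if fuzz == 0.0:                        # crisp interval discretization
        return 1.0 if left <= x < right or (k == K - 1 and x == hi) else 0.0
    if left + fuzz <= x <= right - fuzz:   # flat top of the trapezoid
        return 1.0
    if x < left + fuzz:                    # rising edge
        if k == 0:                         # first term is flat at the boundary
            return 1.0 if x >= lo else 0.0
        return max(0.0, (x - (left - fuzz)) / (2 * fuzz))
    if k == K - 1:                         # last term is flat at the boundary
        return 1.0 if x <= hi else 0.0
    return max(0.0, ((right + fuzz) - x) / (2 * fuzz))
```

For example, with `K = 2` terms over [0, 1], a point at 0.45 belongs to both terms (memberships 0.7 and 0.3), whereas with `overlap = 0` it belongs only to the first interval.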
An Algorithm of granulation on numeric attributes for association rules mining
|Been-Chian Chien, Zin-Long Lin, Yi-Xue Chen|
(I-Shou University, Taiwan)
(National University of Kaohsiung, Taiwan)
Mining association rules from numeric data is more difficult than mining from categorical data. The main reason is that the domain of real numbers lacks the user's abstraction of reality. In this paper, we propose an algorithm that granulates numeric intervals automatically. The proposed method defines two threshold factors, information density-similarity and information closeness, to determine whether two granules should be merged, and constructs an abstraction hierarchy of intervals. To select the best level of intervals from the hierarchy automatically, we develop a determination function based on the threshold factors. After the intervals are determined, the fuzzy membership functions for each interval can be generated. An algorithm for mining fuzzy association rules can then be used to mine qualified association rules from the fuzzy intervals.
keywords: data analysis, information granulation, data mining, fuzzy association rule, clustering.
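The bottom-up construction of the interval hierarchy can be sketched roughly as below. The two measures used here (a density ratio and a normalized gap) are simplified stand-ins for the paper's information density-similarity and information closeness factors, whose exact definitions are not given in the abstract; intervals are assumed non-degenerate (`hi > lo`).

```python
def merge_granules(granules, density_sim=0.5, closeness=0.5):
    """granules: sorted list of (lo, hi, count) triples.
    Returns the abstraction hierarchy as a list of levels, coarsest last."""
    levels = [granules]
    current = granules
    while len(current) > 1:
        merged, changed = [], False
        i = 0
        while i < len(current):
            if i + 1 < len(current):
                (lo1, hi1, n1), (lo2, hi2, n2) = current[i], current[i + 1]
                d1 = n1 / (hi1 - lo1)              # point density of granule 1
                d2 = n2 / (hi2 - lo2)              # point density of granule 2
                sim = min(d1, d2) / max(d1, d2)    # density similarity in (0, 1]
                gap = (lo2 - hi1) / (hi2 - lo1)    # gap relative to joint span
                if sim >= density_sim and gap <= closeness:
                    merged.append((lo1, hi2, n1 + n2))
                    changed = True
                    i += 2
                    continue
            merged.append(current[i])
            i += 1
        if not changed:
            break
        levels.append(merged)
        current = merged
    return levels
```

Each pass merges qualifying adjacent granules, producing one level of the hierarchy; a determination function (not sketched here) would then pick the best level.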
Generation of interpretable fuzzy granules by a double-clustering technique
|Giovanna Castellano, Anna Maria Fanelli and Corrado Mencar|
(University of Bari, Italy)
This paper proposes an approach to derive fuzzy granules from numerical data. Granules are first formed by means of a double-clustering technique and then properly fuzzified so as to obtain interpretable granules, in the sense that they can be described by linguistic labels. The double-clustering technique involves two steps. First, information granules are induced in the space of numerical data via the FCM algorithm. In the second step, the prototypes obtained in the first step are further clustered along each dimension via hierarchical clustering, in order to obtain one-dimensional granules that are afterwards quantified as fuzzy sets. The derived fuzzy sets can be used as building blocks of a fuzzy rule-based model. The approach is illustrated with the aid of a benchmark classification example that provides insight into the interpretability of the induced granules and their effect on the classification results.
keywords: information granulation, fuzzy clustering, hierarchical clustering, fuzzy rule-based model.
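The second step can be sketched as follows, assuming the FCM prototypes from step one are already available. This is our own simplification: a single-linkage pass groups the projections of the prototypes on one dimension, and each group yields a triangular fuzzy set centred on the group mean, with its support reaching the neighbouring centres.

```python
def cluster_1d(values, max_gap):
    """Single-linkage 1-D clustering: split sorted values at gaps > max_gap."""
    vs = sorted(values)
    groups, current = [], [vs[0]]
    for v in vs[1:]:
        if v - current[-1] > max_gap:
            groups.append(current)
            current = [v]
        else:
            current.append(v)
    groups.append(current)
    return groups

def triangular_sets(prototypes, dim, max_gap):
    """One triangular fuzzy set (left, centre, right) per 1-D group of the
    prototypes' `dim`-th coordinates."""
    groups = cluster_1d([p[dim] for p in prototypes], max_gap)
    centres = [sum(g) / len(g) for g in groups]
    sets = []
    for i, c in enumerate(centres):
        left = centres[i - 1] if i > 0 else c
        right = centres[i + 1] if i < len(centres) - 1 else c
        sets.append((left, c, right))
    return sets
```

The resulting one-dimensional fuzzy sets are the candidates for linguistic labels such as "low" or "high" on each feature.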
Granulating XML information
(Universita di Milano, Italy)
(LaTrobe University, Australia)
The eXtensible Mark-up Language (XML) is the standard mark-up language for representing, exchanging and publishing information on the Web. The XML data model, called Infoset, represents XML documents as multi-sorted graphs, including nodes belonging to a variety of types. XML-based formats are increasingly used as languages for interoperability and agents' communication on the Internet, raising the need for techniques capable of extracting and organizing heterogeneous XML messages and data while tolerating variations in their internal structure. This paper presents a technique for organizing well-formed XML information items around user-provided graph patterns. Our approach is based on a graph granulation technique that allows agents to extract XML data at different levels of detail, using the edges of XML graphs as hints to the semantic relations between nodes. The design and implementation of a software tool for XML data granulation are also discussed.
keywords: XML information granulation, multi-sorted graphs, graph granulation, multiple levels of detail.
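The idea of viewing an XML graph at different levels of detail can be illustrated with a toy depth-based granulation (much simpler than the paper's pattern-based technique): every subtree below a chosen depth is folded into a single text granule.

```python
import xml.etree.ElementTree as ET

def granulate(elem, depth):
    """Return a copy of `elem` with subtrees below `depth` collapsed into
    a single text granule."""
    out = ET.Element(elem.tag, elem.attrib)
    if depth == 0:
        out.text = "".join(elem.itertext()).strip()  # collapsed granule
        return out
    out.text = elem.text
    for child in elem:
        out.append(granulate(child, depth - 1))
    return out

doc = ET.fromstring(
    "<order><customer><name>Ann</name><city>Bari</city></customer>"
    "<item>book</item></order>")
coarse = granulate(doc, 1)  # the <customer> subtree becomes one granule
```

An agent working at this coarser level sees `customer` as an atomic item; increasing the depth recovers its internal structure.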
Granular computing as an abstraction of data aggregation - a view on optical music recognition
(Warsaw University of Technology, Poland)
In this paper, optical music recognition (OMR) is considered as an example of paper-to-computer-memory data flow. This specific area of interest forces specific methods to be applied in data processing but, in principle, offers a perspective on the merits of data aggregation. The process of paper-to-computer-memory music data flow is presented from the perspective of acquiring information from plain low-level data. The discussion outlines an interpretation of this process as a metaphor of granular computing. The stages of data aggregation and data abstraction are shown as steps leading to the formation of knowledge granules and to the recovery of dependencies between knowledge granules and between the information included in them. The influence of the granular world of music notation on the design of a computer program is presented. The presentation is related to a real computer program for music notation recognition and for music knowledge representation and processing. The relationship between the granular structure of music knowledge and the user interface of the program is outlined.
keywords: data aggregation, data abstraction, granular computing, information granules, knowledge representation, music notation, music recognition, music representation, user interface design.
Granular Entropy and Granulation Process
(Jerusalem College of Technology, Israel)
People use granulation to represent original data as a set of entities that are better suited for managing the resulting subtasks. The concepts introduced in this paper are based on two features of granulation: the relevance of all the points in a granule and the relative number of points in each granule. Based on these features, several concepts are introduced, including granular relevance, defined as the sum of the relevancies of all the points in a granule, and granular entropy, which is similar to information entropy and reflects the dispersion of relevant points across granules. Using these concepts, the granulation process is represented as the solution of an optimization problem whose objective function is granular entropy. To this end, a theorem is proved that shows how a change in the relevance of the points in a granule affects granular entropy. The last two sections of the paper show how leverage over the granulation process can be achieved by using t-norm and uni-norm operators.
keywords: granulation, entropy, clustering, control, t-norm, uni-norm operators.
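The two quantities named above can be sketched directly. Granular relevance is the sum of the per-point relevancies inside a granule; for granular entropy we take a Shannon-style entropy over the normalized granular relevancies (this normalization is our assumption, since the abstract does not spell out the exact formula).

```python
from math import log

def granular_relevance(granule):
    """granule: list of per-point relevance values in [0, 1]."""
    return sum(granule)

def granular_entropy(granules):
    """Shannon-style entropy of the relevance distribution across granules."""
    rs = [granular_relevance(g) for g in granules]
    total = sum(rs)
    ps = [r / total for r in rs if r > 0]   # skip empty/irrelevant granules
    return -sum(p * log(p) for p in ps)
```

Two granules of equal total relevance give the maximal entropy `log(2)`; concentrating all relevance in one granule drives the entropy to zero, which is the kind of behaviour the optimization in the paper exploits.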
Data granulation through optimization of similarity measure
(The Nottingham Trent University, UK)
(University of Alberta, Canada)
(Tokyo Institute of Technology, Japan)
We introduce a logic-driven clustering in which prototypes are formed and evaluated in a sequential manner. Structure in the data is revealed by maximizing a certain performance index (objective function) that takes into consideration an overall level of matching (to be maximized) and a similarity level between the prototypes (a component to be minimized). It is shown how the relevance of the prototypes translates into their granularity. The clustering method helps identify and quantify the anisotropy of the feature space. We also show how each prototype is equipped with its own weight vector describing this anisotropy, thus implying a ranking of the features in the data space.
keywords: logic-based clustering, information granulation, t- and s-norms, similarity index, granular prototypes, relevance, data mining, direct and inverse matching problem.
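The sequential formation of prototypes can be sketched greedily as below. The objective here (total Gaussian-like matching to the data minus a penalty for similarity to already-selected prototypes) is our own simplified stand-in for the paper's performance index, and the anisotropy weights are omitted.

```python
from math import exp

def similarity(a, b, scale=1.0):
    """Gaussian-like similarity between two feature vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return exp(-d2 / scale)

def select_prototypes(data, n_protos, scale=1.0):
    """Greedily pick prototypes one at a time, maximizing overall matching
    while penalizing similarity to the prototypes chosen so far."""
    protos = []
    for _ in range(n_protos):
        best, best_score = None, float("-inf")
        for candidate in data:
            match = sum(similarity(candidate, x, scale) for x in data)
            redundancy = sum(similarity(candidate, p, scale) for p in protos)
            score = match - len(data) * redundancy
            if score > best_score:
                best, best_score = candidate, score
        protos.append(best)
    return protos
```

On data with two well-separated tight groups, the redundancy penalty forces the second prototype into the group the first one did not cover.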