Skip to content

Latest commit

 

History

History
508 lines (425 loc) · 26.5 KB

File metadata and controls

508 lines (425 loc) · 26.5 KB

OpenCAESAR Vocabulary Bundle Closure

Release

J S Jenkins
Systems Engineering Division
Jet Propulsion Laboratory
California Institute of Technology

2024-05-31

Introduction

One of the key features of the Semantic Web (and its formal foundations in Description Logic) is the so-called open-world semantics, which means simply that any unasserted claim may be true or false. The world of ordinary human interaction is open in just this way; knowledge is often incomplete.

Open-world semantics typically do not apply in traditional database applications. If my name does not appear in the table of customers, then an application is justified in concluding I am not (yet) a customer.

Neither open- nor closed-world semantics are “correct”, per se. Each has its appropriate uses, and it is important to be aware of the semantics in effect and draw proper conclusions.

One aspect of open-world semantics that is sometimes surprising to people familiar with object-oriented software development has to do with implicit disjointness. Suppose, for example, in Java, we declare classes as follows:

public class Vehicle {}
public class Person {}

We naturally expect that an object may be a Vehicle or it may be a Person, but it cannot be both. That is, the sets of objects of type Vehicle and Person are disjoint. Formally, two sets are disjoint if their intersection is empty.

If instead, however, we want to declare classes that are not disjoint because one is a subclass of the other, Java requires us to declare the relation explicitly:

public class Vehicle {}
public class Car extends Vehicle {}

The situation with OWL and the Semantic Web is somewhat different. Suppose we declare OWL classes as follows:

Declaration(Class(Vehicle))
Declaration(Class(Person))
Declaration(Class(Car))

If we assert nothing further, then any pair of classes may have a nonempty intersection. To assert the the situation from the Java example, we must add two axioms:

DisjointClasses(Vehicle Person)
SubClassOf(Car Vehicle)

Note that we do not need to assert

DisjointClasses(Car Person)

That fact follows from the definition of disjointness--a DL reasoner will infer it. Note also that, while it is probably not what we mean, it would not be logically inconsistent to assert

DisjointClasses(Car Vehicle)

A reasoner will include that the set of cars is empty, but the empty set is a valid set. A class that can have no members is said to be unsatisfiable.

Disjointness Maintenance

The general problem of disjointness maintenance is the augmentation of a taxonomy with disjointness axioms that encode a specific policy for vocabulary closure. It is of utmost importance to note that these disjointness axioms are in no sense implied by the taxonomy itself; indeed, the open world interpretation is that two classes are not considered to be disjoint unless explicitly declared to be so or if their disjointness is implied by mutually exclusive constraints such as property range or cardinality restrictions.

Disjointness maintenance can be simplified by declaring a policy regarding disjointness and then implementing that policy in code that can reliably generate the prescribed disjointness axioms.

We propose here a simple policy: any two classes that have no common named subclass are considered to be disjoint. A simple corollary is that, if B is a subclass of A, then A and B are not disjoint because A and B have a common subclass, namely B. Also note that if A and B are disjoint, then every subclass of A is disjoint with every subclass of B. We can use this fact to reduce the number of disjointness axioms.

Note that one direction of implication for the policy is simply a consequence of the definition of disjointness. If any two (nonempty) classes A and B are disjoint, then no element e can belong to both A and B. Consequently there can exist no nonempty common subclass of A and B. The policy is simply to make the implication bidirectional.

This policy is inappropriate for, say, biological taxonomies in which we seek to classify objects as they exist in the real world, without teleological context. In that case, disjointness is a feature to be discovered (or not). Developing a vocabulary for engineering, in contrast, involves identifying important concepts and noting that, in many cases, these concepts are disjoint. By definition, an engineering requirement, the system component bound by that requirement, and the supplier of that component cannot be the same thing; they belong to disjoint categories of things entirely. It is appropriate in this cases to declare our intent that the ontological classes Requirement, Component, and Supplier are disjoint.

The implemented policy simply makes disjointness the default. Exceptions must be stated explicitly.

The objectives of a disjointness maintenance algorithm are threefold:

  1. to implement the disjointness policy,
  2. to minimize the number of disjointness axioms generated, and
  3. to generate disjointness axioms of tractable computational complexity for a reasoner.

The final item is beyond the expertise of the author. We focus on the first two and hope for the best with the third.

Finally, there are circumstances in which applying the policy is inappropriate. In OML, we distinguish between concepts and aspects. The operational difference is precisely that: the concepts taxonomy is subjected to vocabulary bundle closure and the aspects taxonomy is not. In practice, concepts are used to denote classes such as Person and Vehicle in which the policy is clearly applicable, and aspects denote classes such as ThingWithVolume and ThingWithMass, in which it is equally clearly not applicable.

A terminological note: when we speak of two classes having a common subclass, we mean an explicitly-named subclass in the taxonomy; that is, defined and related to its superclasses by axioms.

The Simplest Case

Consider the case of a taxonomy that is a directed rooted tree in the graph-theoretic sense. A tree is an undirected graph that is connected and acyclic. (An equivalent condition is that there is exactly one path between any two vertices.) A directed tree is a tree in which the edges are directed, and a rooted tree is a directed tree in which a single vertex is designated the root. For this discussion we will take edge direction to be from subclass to superclass; the parents of a vertex correspond to its superclasses and its children correspond to its subclasses.

Theorem

If the taxonomy is a rooted tree, declaring all sibling subclasses of every class disjoint satisfies the disjointness policy.

Proof

Suppose two distinct classes A and B have no common subclass. Then the path from A to the root does not pass through B, and vice versa. Consequently, there exists some lowest common ancestor L (which may be the root) and sibling classes of L A' and B' such that AA' and BB'. If sibling subclasses A' and B' are declared disjoint, then A and B are disjoint. ☐

Illustration

The General Case

In the general case, we cannot assume the taxonomy is a tree. There may be explicitly-asserted common subclasses, and these invalidate the assumptions that led to the simple algorithm in the simple case.

It is possible, however, to apply the disjointness policy to a general taxonomy in a sequence of graph transformations that transform the taxonomy into an tree representing a subset of the taxonomic relationships implied by the original taxonomy and the consequences of the policy. After such transformation, we simply apply the foregoing simple algorithm to generate disjointness axioms.

The complete taxonomic semantics of the original taxonomy plus the disjointness policy are conveyed in the original taxonomy, the transformed tree, and the generated disjointness axioms. All must be supplied to the reasoner to infer discover all entailed disjointness.

The subclass relation is transitive, that is, if AB and BC, then AC. Consequently, any arbitrary taxonomy represents the same relations after transitive reduction.

Bypass-Reduce-Isolate Algorithm

Theory

Let T be a taxonomy and let G be a graph whose edges represent the is-subclass-of relation in T. If G is not single-rooted, we can without penalty insert a unique root so that it is single-rooted. If G contains any directed cycles (or more precisely, any strongly connected components), we can replace G by its condensation. The reasoner will find all classes in a strongly connected component to be equivalent. Consequently, we can substitute any class from the component for the component in the condensation. Finally, we can perform transitive reduction on G. Therefore we assume in the following that G is a directed acyclic rooted graph in transitive reduction form.

Suppose there exists a class C such that CB1, CB2, …, CBk (where k > 1), D1C, D2C, …, DlC (where l ≥ 0). Then there exists more than one path from C to the root, and therefore G is not a tree. If more than one such multi-parent vertex exists, those vertices can be topologically sorted and there must be at least one that is not a superclass of any other. (If that were not true then G would contain at least one directed cycle.) Let C be that vertex. Then the graph rooted at C is a tree.

Multi-parent class

The fact that G is in reduced form implies that no Bn is a sub- or superclass of Bi for ni.

Consider a particular parent of C (call it B). B cannot be the root, otherwise G would not be in transitive reduction form. B must therefore have at least one parent vertex (call them A1, A1, …, Aj).

Detail for Bi

The first step of the procedure makes use of the identity B ≡ (B \C) ⋃ (BC). Because the graph rooted at C is a tree, there can be no edges from C or any of its descendants to any En. The disjointness policy therefore implies that En is disjoint from C, and therefore EnB \C for every En. Moreover, CB implies that BCC. We can therefore substitute as shown below:

Step 2

Now note that the subclass relations between B and its parents and that between B and C are asserted in the original taxonomy (which will be provided to the reasoner) and need not be repeated in the transformed taxonomy. Finally note that B \CB by definition. That relation may also be omitted from the transformed taxonomy.

Step 3

Now observe that after this step is applied to each parent Bi of C, C may still have multiple parents, but the length of its longest path to the root has now been reduced by one. Recalling that by assumption G is rooted, repeated applications of Steps 1-3 will terminate eventually because a vertex with unit distance to the root has only a single parent, the root. Therefore, repeated applications of the algorithm to C will converge. The procedure eliminates the multi-parentedness of C and creates no other multi-parent children, so repeated applications to each multi-parent child will eliminate all multi-parent children. The resulting transformed graph is a tree.

Relation to Partition Refinement

This algorithm bears some similarity to the algorithm for relational coarsest partition described in R. Paige and R. E. Tarjan, Three Partition Refinement Algorithms, SIAM Journal on Computing, Society for Industrial and Applied Mathematics (SIAM), Jan 1987. The definitive version is available at https://doi.org/10.1137/0216062.

Algorithm Implementation

The transformation code does not follow the proof steps one-by-one. An outer while-loop selects, by depth-first search, the first multi-parent child C. If none, then G is a tree and the algorithm terminates. If C is found, then the subgraph rooted at C is a tree (because the search is depth-first) and the following steps are applied.

Bypass

The edge from C to each of its parents (Bi) is replaced by an edge to each of its grandparents (Ai,n). That is, the parents of C are bypassed.

Reduce

The bypass step may leave G no longer in transitive reduction form. The reduce step removes any redundant edges among ancestors of C.

Isolate

The isolate step replaces each Bi with Bi \ C.

Class Expression Library

The fundamental data structure underlying the algorithm is a graph of class expressions involving complement, difference, intersection, and union of named (unitary) classes. The library that implements these expressions employs a handful of theorems to simplify certain cases. For example:

  • A ⋃ (BC) ≡ (ABC)
  • (A \ B) \ CA \ (BC)
  • A \ A ≡ ∅

These simplifications are syntactic only; the algorithm does not perform full set-theoretic reasoning, which is after all the job of the reasoner for which the bundle closure algorithm produces input. Non-minimality of expressions has so far presented no difficulties in practice.

Simple Example

Non-Trivial Example

This example is drawn from an experiment to encode the taxonomy of the KerML Libraries. It is an extract of that larger problem that is complex enough to have a non-obvious solution but small enough that the solution can be calculated directly for comparison.

Original Taxonomy

Exhaustive Disjointness Enumeration

It is a straightforward calculation to find all descendant subclasses of each of the 20 classes in this taxonomy, and to compute the intersection of the sets of descendants of all 190 pairs of distinct classes. If that intersection is empty, the classes are deemed disjoint by policy. We expect our bundle closure algorithm to conclude that those pairs, and only those pairs, are disjoint.

The result of that computation follows.

Disjoint Class Pairs

class 1 class 2
BinaryLink ChangeMonitor
BinaryLink Clock
BinaryLink DefaultMonitorLife
BinaryLink Life
BinaryLink UniversalClockLife
BinaryLinkObject ChangeMonitor
BinaryLinkObject Clock
BinaryLinkObject DefaultMonitorLife
BinaryLinkObject FlowTransfer
BinaryLinkObject FlowTransferBefore
BinaryLinkObject HappensBefore
BinaryLinkObject HappensLink
BinaryLinkObject Life
BinaryLinkObject MessageTransfer
BinaryLinkObject Performance
BinaryLinkObject Transfer
BinaryLinkObject TransferBefore
BinaryLinkObject UniversalClockLife
ChangeMonitor Performance
Clock Performance
DefaultMonitorLife Performance
FlowTransfer ChangeMonitor
FlowTransfer Clock
FlowTransfer DefaultMonitorLife
FlowTransfer Life
FlowTransfer MessageTransfer
FlowTransfer Object
FlowTransfer UniversalClockLife
FlowTransferBefore ChangeMonitor
FlowTransferBefore Clock
FlowTransferBefore DefaultMonitorLife
FlowTransferBefore Life
FlowTransferBefore MessageTransfer
FlowTransferBefore Object
FlowTransferBefore UniversalClockLife
HappensBefore ChangeMonitor
HappensBefore Clock
HappensBefore DefaultMonitorLife
HappensBefore Life
HappensBefore MessageTransfer
HappensBefore Object
HappensBefore UniversalClockLife
HappensLink ChangeMonitor
HappensLink Clock
HappensLink DefaultMonitorLife
HappensLink Life
HappensLink MessageTransfer
HappensLink Object
HappensLink UniversalClockLife
Link ChangeMonitor
Link Clock
Link DefaultMonitorLife
Link Life
Link UniversalClockLife
LinkObject ChangeMonitor
LinkObject Clock
LinkObject DefaultMonitorLife
LinkObject FlowTransfer
LinkObject FlowTransferBefore
LinkObject HappensBefore
LinkObject HappensLink
LinkObject Life
LinkObject MessageTransfer
LinkObject Performance
LinkObject TransferBefore
LinkObject UniversalClockLife
MessageTransfer ChangeMonitor
MessageTransfer Clock
MessageTransfer DefaultMonitorLife
MessageTransfer Life
MessageTransfer Object
MessageTransfer UniversalClockLife
Object Performance
Performance Life
Transfer ChangeMonitor
Transfer Clock
Transfer DefaultMonitorLife
Transfer Life
Transfer LinkObject
Transfer Object
Transfer UniversalClockLife
TransferBefore ChangeMonitor
TransferBefore Clock
TransferBefore DefaultMonitorLife
TransferBefore Life
TransferBefore MessageTransfer
TransferBefore Object
TransferBefore UniversalClockLife
UniversalClockLife ChangeMonitor
UniversalClockLife DefaultMonitorLife
UniversalClockLife Performance

Non-Disjoint Class Pairs

class 1 class 2
Anything BinaryLink
Anything BinaryLinkObject
Anything ChangeMonitor
Anything Clock
Anything DefaultMonitorLife
Anything FlowTransfer
Anything FlowTransferBefore
Anything HappensBefore
Anything HappensLink
Anything Life
Anything Link
Anything LinkObject
Anything MessageTransfer
Anything Object
Anything Occurrence
Anything Performance
Anything Transfer
Anything TransferBefore
Anything UniversalClockLife
BinaryLink BinaryLinkObject
BinaryLink FlowTransfer
BinaryLink FlowTransferBefore
BinaryLink HappensBefore
BinaryLink HappensLink
BinaryLink LinkObject
BinaryLink MessageTransfer
BinaryLink Object
BinaryLink Occurrence
BinaryLink Performance
BinaryLink Transfer
BinaryLink TransferBefore
BinaryLinkObject LinkObject
BinaryLinkObject Object
BinaryLinkObject Occurrence
ChangeMonitor Life
ChangeMonitor Occurrence
Clock ChangeMonitor
Clock DefaultMonitorLife
Clock Life
Clock Occurrence
Clock UniversalClockLife
DefaultMonitorLife ChangeMonitor
DefaultMonitorLife Life
DefaultMonitorLife Occurrence
FlowTransfer Occurrence
FlowTransfer Performance
FlowTransferBefore FlowTransfer
FlowTransferBefore Occurrence
FlowTransferBefore Performance
HappensBefore FlowTransfer
HappensBefore FlowTransferBefore
HappensBefore Occurrence
HappensBefore Performance
HappensBefore TransferBefore
HappensLink FlowTransfer
HappensLink FlowTransferBefore
HappensLink HappensBefore
HappensLink Occurrence
HappensLink Performance
HappensLink TransferBefore
Link BinaryLink
Link BinaryLinkObject
Link FlowTransfer
Link FlowTransferBefore
Link HappensBefore
Link HappensLink
Link LinkObject
Link MessageTransfer
Link Object
Link Occurrence
Link Performance
Link Transfer
Link TransferBefore
LinkObject Object
LinkObject Occurrence
MessageTransfer Occurrence
MessageTransfer Performance
Object ChangeMonitor
Object Clock
Object DefaultMonitorLife
Object Life
Object Occurrence
Object UniversalClockLife
Occurrence Life
Occurrence Performance
Transfer FlowTransfer
Transfer FlowTransferBefore
Transfer HappensBefore
Transfer HappensLink
Transfer MessageTransfer
Transfer Occurrence
Transfer Performance
Transfer TransferBefore
TransferBefore FlowTransfer
TransferBefore FlowTransferBefore
TransferBefore Occurrence
TransferBefore Performance
UniversalClockLife Life
UniversalClockLife Occurrence

Exhaustive enumeration is impractical for anything other than relatively small problems because the number of class pairs grows quadratically in the number of classes. The full KerML libraries taxonomy from which this example is drawn, for example, contains 417 classes, which implies 86,736 class pairs.

Moreover, the straightforward enumeration approach produces disjointness axioms for pairs of classes, which is not concise and likely redundant. Reducing these axioms to a smaller number of mutual disjointness axioms is equivalent to finding all cliques, a well-known problem in computer science with worse-than-polynomial complexity.

Bundle Closure Results

The OpenCAESAR Java bundle closure application completes in 14 iterations of the outer loop and produces two results:

Transformed Taxonomy

Disjointness axioms

  • Disjoint Children of Anything:
    • Link ∖ (BinaryLinkObject ∪ LinkObject ∪ (Transfer ∖ (FlowTransferBefore ∪ TransferBefore)))
    • Occurrence ∖ (FlowTransferBefore ∪ TransferBefore ∪ (Transfer ∖ (FlowTransferBefore ∪ TransferBefore)))
    • Transfer ∖ (FlowTransferBefore ∪ TransferBefore)
  • Disjoint Children of Link ∖ (BinaryLinkObject ∪ LinkObject ∪ (Transfer ∖ (FlowTransferBefore ∪ TransferBefore))):
    • BinaryLink ∖ (BinaryLinkObject ∪ FlowTransferBefore ∪ (Transfer ∖ (FlowTransferBefore ∪ TransferBefore)) ∪ (TransferBefore ∖ FlowTransferBefore))
    • FlowTransferBefore
    • TransferBefore ∖ FlowTransferBefore
  • Disjoint Children of Object ∖ (BinaryLinkObject ∪ LinkObject):
    • ChangeMonitor ∖ DefaultMonitorLife
    • Clock ∖ (UniversalClockLife ∪ DefaultMonitorLife)
    • DefaultMonitorLife
    • UniversalClockLife
  • Disjoint Children of Occurrence ∖ (FlowTransferBefore ∪ TransferBefore ∪ (Transfer ∖ (FlowTransferBefore ∪ TransferBefore))):
    • BinaryLinkObject
    • Life ∖ (UniversalClockLife ∪ DefaultMonitorLife)
    • LinkObject ∖ BinaryLinkObject
    • Object ∖ (BinaryLinkObject ∪ LinkObject)
    • Performance ∖ (FlowTransferBefore ∪ (Transfer ∖ (FlowTransferBefore ∪ TransferBefore)) ∪ (TransferBefore ∖ FlowTransferBefore))
  • Disjoint Children of Transfer ∖ (FlowTransferBefore ∪ TransferBefore):
    • FlowTransfer ∖ FlowTransferBefore
    • MessageTransfer

Results Verification

It is not immediately apparent that the subclass relations in the transformed taxonomy and the disjointness axioms suffice to conclude disjointness of class pairs as enumerated above. Drawing inferences that follow logically from axioms, especially when those inferences are not obvious, however, is a primary use case for a reasoner. We can employ the reasoner to check the results by creating a subclass of both classes of each pair. The reasoner should find each subclass unsatisfiable if and only if the corresponding pair of classes are disjoint.

OML provides a convenient syntax for the test model. OML concepts correspond to OWL classes that are subject to vocabulary bundle closure. We create a vocabulary expressing the original taxonomy and include it in a bundle. We then extend that vocabulary with 190 test concepts using this pattern:

concept BinaryLinkObject_FlowTransfer < example-vocabulary:BinaryLinkObject, example-vocabulary:FlowTransfer

Running the standard OpenCAESAR workflow produces a reasoning report with the expected result. These (and only these) 91 classes are unsatisfiable:

class
BinaryLinkObject_ChangeMonitor
BinaryLinkObject_Clock
BinaryLinkObject_DefaultMonitorLife
BinaryLinkObject_FlowTransferBefore
BinaryLinkObject_FlowTransfer
BinaryLinkObject_HappensBefore
BinaryLinkObject_HappensLink
BinaryLinkObject_Life
BinaryLinkObject_MessageTransfer
BinaryLinkObject_Performance
BinaryLinkObject_TransferBefore
BinaryLinkObject_Transfer
BinaryLinkObject_UniversalClockLife
BinaryLink_ChangeMonitor
BinaryLink_Clock
BinaryLink_DefaultMonitorLife
BinaryLink_Life
BinaryLink_UniversalClockLife
ChangeMonitor_Performance
Clock_Performance
DefaultMonitorLife_Performance
FlowTransferBefore_ChangeMonitor
FlowTransferBefore_Clock
FlowTransferBefore_DefaultMonitorLife
FlowTransferBefore_Life
FlowTransferBefore_MessageTransfer
FlowTransferBefore_Object
FlowTransferBefore_UniversalClockLife
FlowTransfer_ChangeMonitor
FlowTransfer_Clock
FlowTransfer_DefaultMonitorLife
FlowTransfer_Life
FlowTransfer_MessageTransfer
FlowTransfer_Object
FlowTransfer_UniversalClockLife
HappensBefore_ChangeMonitor
HappensBefore_Clock
HappensBefore_DefaultMonitorLife
HappensBefore_Life
HappensBefore_MessageTransfer
HappensBefore_Object
HappensBefore_UniversalClockLife
HappensLink_ChangeMonitor
HappensLink_Clock
HappensLink_DefaultMonitorLife
HappensLink_Life
HappensLink_MessageTransfer
HappensLink_Object
HappensLink_UniversalClockLife
LinkObject_ChangeMonitor
LinkObject_Clock
LinkObject_DefaultMonitorLife
LinkObject_FlowTransferBefore
LinkObject_FlowTransfer
LinkObject_HappensBefore
LinkObject_HappensLink
LinkObject_Life
LinkObject_MessageTransfer
LinkObject_Performance
LinkObject_TransferBefore
LinkObject_UniversalClockLife
Link_ChangeMonitor
Link_Clock
Link_DefaultMonitorLife
Link_Life
Link_UniversalClockLife
MessageTransfer_ChangeMonitor
MessageTransfer_Clock
MessageTransfer_DefaultMonitorLife
MessageTransfer_Life
MessageTransfer_Object
MessageTransfer_UniversalClockLife
Object_Performance
Performance_Life
TransferBefore_ChangeMonitor
TransferBefore_Clock
TransferBefore_DefaultMonitorLife
TransferBefore_Life
TransferBefore_MessageTransfer
TransferBefore_Object
TransferBefore_UniversalClockLife
Transfer_ChangeMonitor
Transfer_Clock
Transfer_DefaultMonitorLife
Transfer_Life
Transfer_LinkObject
Transfer_Object
Transfer_UniversalClockLife
UniversalClockLife_ChangeMonitor
UniversalClockLife_DefaultMonitorLife
UniversalClockLife_Performance