Relational Object Project

written 26 April 2003 being revised May 2006

Google site search

ContentsIntroductionBasicIntermediateAdvancedFuturePolicyInfrastructure

1   Introduction

This project is oriented towards Maths students and those interested in Computer Science. Relations are fundamental to mathematics and are the main application of set theory. They are also the underlying principle behind relational database.

In this project tcl code is developed which provides a basis for the Micro RDB project.

This package of code is developed using the notion of relation as an object to ensure data integrity in all applications which use relations.

Relations make good objects because an application usually only deals with a few relations at a time and they have a reasonably small base of well-known procedures. In other words the object "relation" is very well understood and well-behaved in a strict mathematical sense.

The code developed here may be sufficienct in many instances to be used in a database context for a small appilaction without invoking the more general approach of the RDB.

2   What is a relation?

A relation is a set of ordered n-tuples over an ordered n-tuple of domains.

A domain is a conceptual notion, it may be numbers, products, dates, colours. In order to compare things they must have the same domain. We cannot compare dates with colours. Some domains, such as numbers, are more general in concept than others such as date. We can consider a date to be a kind of number and compare a date with a number, but really we are looking at the number as though it was a date when we do this, otherwise the comparison has no useful meaning to us.

This means that when we do things with different relations we must always take care that the domains match up. Managing this is an important part of the procedures that we develop.

There are two very common forms of relation, the 1-tuple and the binary relation. The 1-tuple is the same as a simple set, a collection of values from a domain of such values. It may be the even integers, dark colours, or good movies.

The binary relation is a 2-tuple where both domains are the same. It is also called a graph. The word graph here is not used in the sense of plotting data, but is used in the mathematical meaning of graph theory. Graph theory is concerned with the way things connect, or are dependant on one another. A computer's file (hierarchical) structure is a special example of a graph, called a tree.

In relational databases each table is a relation. The domains are things like customer code, address, phone number, outstanding balance.

The order of the domains is important and relates directly to the order of the values in each n-tuple. If the first domain is cumstomer number, then the first value in every tuple is a customer number. The n-tuples are not just lists that can be any length and contain any information. Each n-tuple must have the same length as the domain n-tuple, and every value must correspond to its domain according to its position in the n-tuple.

We can represent a relation in tcl as a list with two components: the domain tuple and the list of element tuples. So the list length of a relation is 2.

The domain tuple is never empty. It is represented by a simple tcl list. A particular domain can appear more than once in the domain list. The length of the domain list dictates the length of the tuples in the second part of the list, the element list.

The element list is a proper list. It can have any number of items in the list from 0 up, so the list can be empty. Every item in the list must be an n-tuple: a list of values of the same length as the domain list. Since this list represents a set, there should be no duplicate tuples in the list.

Any list with this structure can be construed to represent a relation. This means that we can look at any tcl list and decide if it is appropriate to be used in the context of relational operations.

We could put the domain list first or second, but for human readability purposes, say when we peek at a file with a relation stored in it, having the domain list first makes it easy for us to quickly identify what kind of relation it is.

2.1   Questions

  1. [1] Which of the following tcl lists are relations, using the representation outlined above ?
  2. [1] Write a representation of an empty set. Empty means there are no values in the tuple list.
  3. [1] Write a representation of an empty binary relation.
  4. [2] Write a representation of some pairs of complementary colours. Be careful here and note that if yellow is the complement of blue, then blue is also the complement of yellow.

3   What is a domain

A domain can be any collection of values. But it should represent a well defined set. Because all the relations in this implementation are finite with explicit value, we do not get into difficulty with logic. A domain is an absolute kind of animal, that is it should be something that we all recognise in common, such as strings, numbers, colours. Given any value we should be able to tell of it is a valid value of the domain or not.

4   Binary relations - Relations on a single domain

Before looking at the more general case of an n-tuple of domains, it is helpful to consider just one domain and relational operators on it. These are called binary relations and are a special case of relations handled by ::rel. There are two main kinds of relational operator, equality and order."

4.1   Equality

Equality tests whether two objects are "the same" as far as we are concerned. Sometimes we require precision when we decide, sometimes we are happy to approximate. A procedure [equalop a b] that tests for equality typically obeys the convention that if a=b it returns 1, else it returns 0. When we write a procedure [equalop a b] that tests for equality, we need to write a procedure that has the following properties: The last condition has the effect of forcing all the values in the domain to 'bunch together' into sets in which all the values are considered to be the same. Examples are { 1, 1.0, 2.0e0, one } and { yellow, #ffff00, (255,255,0) }

When we use the == sign in tcl we are assuming it has these properties. Its useage within tcl conforms to this in all cases as far as I know. We as users have a common understanding that the == sign behaves this way.

4.2   Similar or Close

'Similar' is like equality but not so rigourous. It drops the last condition listed for equality. This means it behaves in a fuzzy way. There are many examples of similar operators in practice. We may compare image files for similarity. When we compare real numbers on a computer we use fuzzy comparison that attempts to make allowance for rounding errors (checkout tcl's ::math::fuzzy ). Similarity is the same as the concept of closeness. A can be close to B, B can be close to C, but A may not be close to C if C is on the far side of B from A.

The [close] operator should return 1 if the elements are close, 0 otherwise. When we use a 'closeness' operator there is often a parameter of degree included. How close, how similar. These operators typically have 3 parameters. The degree should be the first parameter so that when the procedure is used in ::rel the two element arguments can be appended to it.

The [closeop] procedure which you define should return 1 if a is close to b, 0 otherwise. This means that a [closeop] procedure behaves the same as an [equalop] as far as the ::rel procedures are concerned.

4.3   Order

Order is a binary relation that tells us if something is "less than" something else. In normal useage if a < b and b < a then a is equal to b. Order relations are anti-reflexive, a is NOT < a. They are anti-symmetric, if a < b then b is NOT < a. But they are transitive. If a < b and b < c then a < c.

Because order includes a notion of equality the procedure for an order relation typically returns a range of values. [orderop a b] returns a value < 0 (typically -1) if a < b, returns 0 if a = b, and returns a value > 0 (typically 1 ) if a > b.

4.4   Other binary relations

In general, any binary relation can be in relational join and restriction operations. This includes a binary relation created by ::rel procedures. When using these binary operators for join or restriction you must be careful about the order of the join. Generally the join of a and b will be different to the join of b and a.

General binary relations have the same protocol as [equalop] and [closeop] procedures. They return 1 if a is related to b, 0 if not.

4.5   Summary of relational protocols

The discussion of binary relations shows as that we use 2 protocols for binary operators. 2-valued { 0, 1 } and 3-valued { <0, 0, >0 }. ::rel provides a slot for each protocol per domain. The proc [ ::rel::domain domain options ] provides two options for specifying the two procedures. When the relation join or restiction operators are invoked they have a choice of which operator to use. They use the -orderop proc if it has been defined, oherwise they use the -relop procedure.

4.6   Exercises

  1. [5]Write ::rel::domain to store the procedure invocations in an array ::rel::domain(domain). Each array element is a list { relop orderop } which contains the procedure invocations, minus the (last) 2 element parameters.
  2. [5]Write a relop for character strings, and a relop for numbers.

4.6.1   Variables

Once a tcl variable is assigned a relational list value, that variable is in a sense a relation object, though we dont treat it any differently to any other tcl variable. In particular, the value of its domain tuple should never change. Changing the domain tuple means that the meaning of the relation itself has changed. This would not normally occur in a program. We can see from this that relations are good candidates to be objects rather than just variables.

We start by writing some procedures based on relations as variables. Later we recast them as relation objects with methods.

4.6.2   Answer

########################
#
#  "rel domainop" domain [ -relop command ] [ -orderop command ]
#  "rel is" <rel> - returns <rel'> or {}
#
# Relational routines
#
# ::rel::is tests the parameter for relation form
# it returns a relation in a pseudo-canonical form in the form (domainlist, valuelist)
# a relation in the form (valuelist, domainlist) is swapped around
# in addition the value list is tested to ensure correct tuple size and put thru [ lsort -uinque ]
# if ::rel::is failes to recognise a valid relation it returns {}
# otherwise it returns the pseudo-canonical relation.
# see notes at the bottom of this file regards the pseudo-canonical status
#
# a relation is a variable which is a list of two components.
# 0 - value
# 1 - domain spec
# there is also a reldomain array with an element for each domain type. 
# Each array element has a value which is a list of 3 functional expressions { is == and } specified as lambda expressions
# if the value is {} for any of these the tcl (c) equivalent is assumed.
########################

proc "rel domainop" {a args} {
    global "rel domain"
    set arg(-relop) {}
    set arg(-orderop) {}
    array get arg $args
    if { ![ info exists "rel domain($a)" ] } { set "rel domain($a)" { {} {} } }
    if { [ info exists arg(-relop) ] } { set "rel domain($a)" [ list $arg(-relop) [ lindex "rel domain($a)" 1 ] ]
    if  { [ info exists arg(-orderop) ] } { set "rel domain($a)" [ list [ lindex "rel domain($a)" 0 ] $arg(-orderop) ] ]
    return "rel domain($a)" 
}

# "rel is" returns 1 if the list is a normal relation which may contain {} elements
# later extend to verify that the tuple values are elements of their domains
proc "rel is" rel {
    if { [ llength $rel ] != 2 } { return {} }

    set error 0
    set l [llength [lindex $rel 1 ] ]
    foreach i [ lindex $rel 0] {
        if { $i != {} && [ llength $i ] != $l } { 
            set error 1 
            break
        }
    }
    if { $error == 0 } { return [ list [ lsort -unique [ lindex $rel 0 ] ] [ lindex $rel 1 ] ] }
    set l [llength [lindex $rel 0 ]
    foreach i [ lindex $rel 1 ] {
        if { $i == {} } continue
        if { [ llength $i ] == $l } continue
        return {}
    }
    return [ list  [ lsort -unique [ lindex $rel 1 ] ] [lindex $rel 0 ] ]
}

# at this stage this procedure is not totally correct
# Uniqueness is only determined on a character basis, rather than using the equality relation based on the domain type.
While this procedure looks ok it is not strictly correct. Ensuring each element in the relation is unique is a non-trivial task as we shall see when we look at the relational operators.

5   Relation Procedures

We can treat tcl relations just like any other variable. We need some special procedures to manipulate them. All these procedures are held under ::rel as a package. For now we write general procedures. Later we write [::rel::create] which creates a relation object. This object then comes with relation methods which call on the procedures we are writing here.

These procedures take the convention that the domain is first and the set of tuples second in the list.

Time invariant attributes

::rel::dimension rel - returns the dimension of the relation
::rel::domains rel - returns the domain list of a
::rel::domain rel n - returns the domain name at index n

Time variant attribute

::rel::value rel - returns the list of tuple values

Binary operators

::rel::cross a b - cross product (not strictly a binary operator as it returns a new kind of domain)
::rel::or a b - set union
::rel::and a b - set intersection
::rel::less a b - set difference

Relational operators

::rel::proj rel { j1 j2 j3 . . } - relational projection
::rel::join a b da relop db - relational join
This is the join of a with b where the domain represented by the integer index da is compared to the domain db from b using the operator relop which is = != < <= > >=.
::rel::div a b - relational division
::rel::restriction a d1 relop d2 - relational restriction

5.1   Exercise [5]

Write the procedures for dimension, domains, domain, and value.

5.2   Answers

########################
#
#    "rel dimension" <rel>
#    "rel domains" <rel>
#    "rel domain" <rel> <index>
#    "rel value" <rel>
#
# Relational routines
#
# These routines are mainly accessed from the rdb routines and it is assumed by these routines 
# that their parameters are union compatible normal relations with the exception that some elements 
# may be null  == {}
#
# a relation is a variable which is a list of two components.
# 0 - domain spec
# 1 - value - a list of n-tuples

proc "rel dimension" a {
    return [llength [ lindex $a 1 ] ]
}
proc "rel domains" a {
    return [ lindex $a 1 ]
}

proc "rel domain" {a i } {
    return [ lindex ["rel domains" $a] $i ]
}

proc "rel value" a {
    return [ lindex $a 0 ]
}

6   Domain relation operators (relops)

The binary and relational operators on relations require that we compare elements of the relations for equality or order. In order to perform these operations in a well-behaved manner we need a canonical form for a relation. Each domain in a relation has its own identity and order relations which must be specified. We specify these using the protocol used in tcl in commands such as lsort. We combine both identity and order in one procedure [relop { a b }] that returns ( <0 , 0, >0 ) for ( a < b , a = b, a > b ).

The relop for a domain is registered against the domain name using the procedure [::rel::relop domain proc_name]. If a domain is not registered its relop defaults to ascii order which is the default used for [lsort] and [string compare]. This means that ALL numeric domains must be registered.

The normal relop for numbers is:

proc numrelop { a b } { expr {$x -$y} }

Not all domains have order relations on them. For example we can compare colors in one of 5 formats understood by tcl , name, #rgb, #rrggbb, #rrrgggbbb and #rrrrggggbbbb forms, for equality. But there is no commonly understood order for colour. We do not say red < blue. This allows us to use color as a domain in relations as long as we do not invoke order in the join and restriction operations. Domains with no order relation must return -1 if the elements are not equal, and 0 if there are equal. This is different from normal usage in which equality usually returns 1 and inequality returns 0.

In order to develop this structure further we need to do some ground work and provide a procedure [::rel::tuplecomp domainlist tuple1 tuple2] that can compare two tuples over a domain for equality or order. We also use a simpler version for comparing domain lists on a purely character basis, [::rel::domainsequal]. Domains are another example of objects that have no order; we dont say colour < size or time < zipcode. Although it is called domainsequal it can be used for any character based test of tuple equality.

6.1   Exercises

7   Binary Operators

These procedures are called binary operators because the operate on two relations to produce another relation of the same domain as a result.

The cross product is a kind of multiplier. It produces a new relation by catenating every element of the first relation with every element of the second. THis means it is not strictly a binary operator. If the relations have domain lengths a and b, then the result has domain length (a+b). If the relations have na and nb elements then the result has (na*nb) elements. A cross product relation can easily become huge so take all possible precautions to limit the size of any cross product that you may need to use.

The set union, intersection and differecne are self explanatory. They require that the domains of the two operators match, and the result has the same domains.

8   The Relation Object rel

A relation consists of a domain n-tuple, and a set of values defined in that n-tuple. Null relations with no domain are not permitted, and so for convenience in applications, the value list is allowed to contain occurences of the null tuple {}.

Do we really need this ?This is used as a means to delete elements from the list of tuples without changing the absolute positions of elements in the list which might upset indexing algorithms which expect to find them in particular positions.

rel create a { d1 d2 . . . }

This procedure creates a relational object with the name a. This object exists as a variable and as an associated group of procedures. So it is possible to simply assign the value of one relation to another, in which case the data structure is copied, but the target must already be a table. The object has a number of attributes, functions and operations. d1 d2 etc are domain names, not neccesarily unique. For most of the relational operators domain names of relations must match. ( In other words we must be sure we are talking about the same kinds of things when we make comparisons). This implies that the domains should have some order operation on them. This is trivially the case in a computer system using a literal character based comparison, however certain types of domains may have special order relations which should be applied when checking for identity or order.

Nullary operator

a destroy - destroys the object a

Time invariant attributes

a dimension - returns the dimension of the relation
a domains - returns the domain list of a
a domain n - returns the domain name

Time variant attribute

a get - returns the list of tuple values
Note that there is no explicit inverse - at best we could define something which is never evaluated.

Binary operators

a cross b - cross product
a or b - set union
a and b - set intersection
a less b - set difference

Relational Operators

a proj { j1 j2 j3 . . } - relational projection
a join b da relop db - relational join
This is the join of a with b where the domain represented by the integer index da iscompared to the domain db from b using the operator relop which is = != < <= > >=.
a div b - relational division
a restriction d1 relop d2 - relational restriction

9   Questions

Do some stuff here on sets and logic
  1. Are "even integers", "dark colours", or "good movies" an example of a set ? Use your knowledge of logic to explain your answer for each one.

10   Further work

Extend this suite of procedures to include algebraically defined relations.

©2000 - 2006 WEBSCOOL This page last updated 11 May 2006. All rights reserved - including copying or distribution of any portion of this document in any form or on any medium without authorisation. For more regarding the copyright.