Relational Object Projectwritten 26 April 2003 being revised May 2006 |
In this project tcl code is developed which provides a basis for the Micro RDB project.
This package of code is developed using the notion of relation as an object to ensure data integrity in all applications which use relations.
Relations make good objects because an application usually only deals with a few relations at a time and they have a reasonably small base of well-known procedures. In other words the object "relation" is very well understood and well-behaved in a strict mathematical sense.
The code developed here may be sufficienct in many instances to be used in a database context for a small appilaction without invoking the more general approach of the RDB.
A domain is a conceptual notion, it may be numbers, products, dates, colours. In order to compare things they must have the same domain. We cannot compare dates with colours. Some domains, such as numbers, are more general in concept than others such as date. We can consider a date to be a kind of number and compare a date with a number, but really we are looking at the number as though it was a date when we do this, otherwise the comparison has no useful meaning to us.
This means that when we do things with different relations we must always take care that the domains match up. Managing this is an important part of the procedures that we develop.
There are two very common forms of relation, the 1-tuple and the binary relation. The 1-tuple is the same as a simple set, a collection of values from a domain of such values. It may be the even integers, dark colours, or good movies.
The binary relation is a 2-tuple where both domains are the same. It is also called a graph. The word graph here is not used in the sense of plotting data, but is used in the mathematical meaning of graph theory. Graph theory is concerned with the way things connect, or are dependant on one another. A computer's file (hierarchical) structure is a special example of a graph, called a tree.
In relational databases each table is a relation. The domains are things like customer code, address, phone number, outstanding balance.
The order of the domains is important and relates directly to the order of the values in each n-tuple. If the first domain is cumstomer number, then the first value in every tuple is a customer number. The n-tuples are not just lists that can be any length and contain any information. Each n-tuple must have the same length as the domain n-tuple, and every value must correspond to its domain according to its position in the n-tuple.
We can represent a relation in tcl as a list with two components: the domain tuple and the list of element tuples. So the list length of a relation is 2.
The domain tuple is never empty. It is represented by a simple tcl list. A particular domain can appear more than once in the domain list. The length of the domain list dictates the length of the tuples in the second part of the list, the element list.
The element list is a proper list. It can have any number of items in the list from 0 up, so the list can be empty. Every item in the list must be an n-tuple: a list of values of the same length as the domain list. Since this list represents a set, there should be no duplicate tuples in the list.
Any list with this structure can be construed to represent a relation. This means that we can look at any tcl list and decide if it is appropriate to be used in the context of relational operations.
We could put the domain list first or second, but for human readability purposes, say when we peek at a file with a relation stored in it, having the domain list first makes it easy for us to quickly identify what kind of relation it is.
When we use the == sign in tcl we are assuming it has these properties. Its useage within tcl conforms to this in all cases as far as I know. We as users have a common understanding that the == sign behaves this way.
The [close] operator should return 1 if the elements are close, 0 otherwise. When we use a 'closeness' operator there is often a parameter of degree included. How close, how similar. These operators typically have 3 parameters. The degree should be the first parameter so that when the procedure is used in ::rel the two element arguments can be appended to it.
The [closeop] procedure which you define should return 1 if a is close to b, 0 otherwise. This means that a [closeop] procedure behaves the same as an [equalop] as far as the ::rel procedures are concerned.
Because order includes a notion of equality the procedure for an order relation typically returns a range of values. [orderop a b] returns a value < 0 (typically -1) if a < b, returns 0 if a = b, and returns a value > 0 (typically 1 ) if a > b.
General binary relations have the same protocol as [equalop] and [closeop] procedures. They return 1 if a is related to b, 0 if not.
We start by writing some procedures based on relations as variables. Later we recast them as relation objects with methods.
######################## # # "rel domainop" domain [ -relop command ] [ -orderop command ] # "rel is" <rel> - returns <rel'> or {} # # Relational routines # # ::rel::is tests the parameter for relation form # it returns a relation in a pseudo-canonical form in the form (domainlist, valuelist) # a relation in the form (valuelist, domainlist) is swapped around # in addition the value list is tested to ensure correct tuple size and put thru [ lsort -uinque ] # if ::rel::is failes to recognise a valid relation it returns {} # otherwise it returns the pseudo-canonical relation. # see notes at the bottom of this file regards the pseudo-canonical status # # a relation is a variable which is a list of two components. # 0 - value # 1 - domain spec # there is also a reldomain array with an element for each domain type. # Each array element has a value which is a list of 3 functional expressions { is == and } specified as lambda expressions # if the value is {} for any of these the tcl (c) equivalent is assumed. ######################## proc "rel domainop" {a args} { global "rel domain" set arg(-relop) {} set arg(-orderop) {} array get arg $args if { ![ info exists "rel domain($a)" ] } { set "rel domain($a)" { {} {} } } if { [ info exists arg(-relop) ] } { set "rel domain($a)" [ list $arg(-relop) [ lindex "rel domain($a)" 1 ] ] if { [ info exists arg(-orderop) ] } { set "rel domain($a)" [ list [ lindex "rel domain($a)" 0 ] $arg(-orderop) ] ] return "rel domain($a)" } # "rel is" returns 1 if the list is a normal relation which may contain {} elements # later extend to verify that the tuple values are elements of their domains proc "rel is" rel { if { [ llength $rel ] != 2 } { return {} } set error 0 set l [llength [lindex $rel 1 ] ] foreach i [ lindex $rel 0] { if { $i != {} && [ llength $i ] != $l } { set error 1 break } } if { $error == 0 } { return [ list [ lsort -unique [ lindex $rel 0 ] ] [ lindex $rel 1 ] ] } set l [llength [lindex $rel 0 ] foreach i [ lindex $rel 1 ] { if { $i == {} } continue if { [ llength $i ] == $l } continue return {} } return [ list [ lsort -unique [ lindex $rel 1 ] ] [lindex $rel 0 ] ] } # at this stage this procedure is not totally correct # Uniqueness is only determined on a character basis, rather than using the equality relation based on the domain type. |
These procedures take the convention that the domain is first and the set of tuples second in the list.
######################## # # "rel dimension" <rel> # "rel domains" <rel> # "rel domain" <rel> <index> # "rel value" <rel> # # Relational routines # # These routines are mainly accessed from the rdb routines and it is assumed by these routines # that their parameters are union compatible normal relations with the exception that some elements # may be null == {} # # a relation is a variable which is a list of two components. # 0 - domain spec # 1 - value - a list of n-tuples proc "rel dimension" a { return [llength [ lindex $a 1 ] ] } proc "rel domains" a { return [ lindex $a 1 ] } proc "rel domain" {a i } { return [ lindex ["rel domains" $a] $i ] } proc "rel value" a { return [ lindex $a 0 ] } |
The relop for a domain is registered against the domain name using the procedure [::rel::relop domain proc_name]. If a domain is not registered its relop defaults to ascii order which is the default used for [lsort] and [string compare]. This means that ALL numeric domains must be registered.
The normal relop for numbers is:
proc numrelop { a b } { expr {$x -$y} }
Not all domains have order relations on them. For example we can compare colors in one of 5 formats understood by tcl , name, #rgb, #rrggbb, #rrrgggbbb and #rrrrggggbbbb forms, for equality. But there is no commonly understood order for colour. We do not say red < blue. This allows us to use color as a domain in relations as long as we do not invoke order in the join and restriction operations. Domains with no order relation must return -1 if the elements are not equal, and 0 if there are equal. This is different from normal usage in which equality usually returns 1 and inequality returns 0.
In order to develop this structure further we need to do some ground work and provide a procedure [::rel::tuplecomp domainlist tuple1 tuple2] that can compare two tuples over a domain for equality or order. We also use a simpler version for comparing domain lists on a purely character basis, [::rel::domainsequal]. Domains are another example of objects that have no order; we dont say colour < size or time < zipcode. Although it is called domainsequal it can be used for any character based test of tuple equality.
The cross product is a kind of multiplier. It produces a new relation by catenating every element of the first relation with every element of the second. THis means it is not strictly a binary operator. If the relations have domain lengths a and b, then the result has domain length (a+b). If the relations have na and nb elements then the result has (na*nb) elements. A cross product relation can easily become huge so take all possible precautions to limit the size of any cross product that you may need to use.
The set union, intersection and differecne are self explanatory. They require that the domains of the two operators match, and the result has the same domains.
Do we really need this ?This is used as a means to delete elements from the list of tuples without changing the absolute positions of elements in the list which might upset indexing algorithms which expect to find them in particular positions.
rel create a { d1 d2 . . . }
This procedure creates a relational object with the name a. This object exists as a variable and as an associated group of procedures. So it is possible to simply assign the value of one relation to another, in which case the data structure is copied, but the target must already be a table. The object has a number of attributes, functions and operations. d1 d2 etc are domain names, not neccesarily unique. For most of the relational operators domain names of relations must match. ( In other words we must be sure we are talking about the same kinds of things when we make comparisons). This implies that the domains should have some order operation on them. This is trivially the case in a computer system using a literal character based comparison, however certain types of domains may have special order relations which should be applied when checking for identity or order.