Chu Spaces

being revised May 2006



1   What is a Chu Space?

Having experimented with the Chu Space calculator, one might wonder how Chu Spaces differ from other similar objects such as matrices, the representation of a relation, or of a monoid or group (a binary operation).

For starters, Chu Spaces are more general, and also more abstract. Binary relations and binary operations are square matrices, and there is also an identification between the rows and columns. One way to see the difference is to look at the isomorphisms of small finite spaces and note the differences.

The smallest useful comparison can be made with the alphabet {0,1}, 2 points and 2 states. This gives us 16 matrices.

00 00 00 00 01 01 01 01 10 10 10 10 11 11 11 11
00 01 10 11 00 01 10 11 00 01 10 11 00 01 10 11

All the representations admit renaming the alphabet as an isomorphic transformation. This gives us:

11 11 11 11 10 10 10 10 01 01 01 01 00 00 00 00
11 10 01 00 11 10 01 00 11 10 01 00 11 10 01 00

As can be seen, for an alphabet of 2 characters, this is a simple folding.

-o Implication

also written *, # and =>

Implication is the most important complex operator on Chu spaces, and it is also the main justification for them. Implication enumerates the continuous functions from one space to another. Representing the result as another Chu space might be considered a little artificial, as it is not clear that the resulting Chu space is of any algebraic significance within the general context of Chu spaces. For instance, there is no sign that implication forms interesting algebraic structures. However, the operation itself is extremely interesting, in that it tells us all the morphisms from one structure to another, including isomorphisms. So implication can tell us whether two spaces are non-trivially isomorphic.

Given A -o B, the result is a matrix of rows. Each row represents a mapping from A into B, not necessarily onto B. Each row of A -o B is read as a list of elements (rows) of B. There are as many elements in the row as there are rows in A, and each element is written out as a row of B, so the length of a row in A -o B is |rows of A| * |cols of B|. There is an implied correspondence between the ith row of A and the ith element in the rows of A -o B: the function f represented by row i of A -o B maps row j of A to element j of that row.

Implication is calculated by attempting to form a matrix of size (rows of A, cols of B). A matrix represents a morphism when all the columns of the matrix are columns of A and all the rows are rows of B.
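This membership condition can be sketched directly. A minimal Python check (illustrative only, not the Tcl implementation), with a space written as a list of row strings over its alphabet:

```python
def columns(space):
    """The set of columns of a space written as a list of row strings."""
    return {"".join(row[j] for row in space) for j in range(len(space[0]))}

def is_morphism(M, A, B):
    """M (|rows of A| x |cols of B|) represents a morphism from A to B
    when every column of M is a column of A and every row of M is a row of B."""
    return columns(M) <= columns(A) and set(M) <= set(B)
```

For the 2x2 space A = ["01", "10"], the identity matrix ["01", "10"] passes the test, while the constant matrix ["00", "00"] fails because its columns do not occur in A.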

To exhaustively enumerate all these matrices, every possible combination of characters is tried in every position of the candidate matrix. Given an alphabet K the number of possibilities is truly huge even for small K, A, B. The formula is |K|^(|rows of A| * |cols of B|). A table with A and B being square matrices of size n gives us |K|^(n^2). The following table gives some values for |K| and n and some calculation times assuming 1 million tests per second.

          n = 2      n = 3       n = 4       n = 5                   n = 6
|K| = 2   16         512         65,536      33,554,432 (~33 secs)   ~19 hrs
|K| = 3                          ~43 secs    ~10 days                ~4759 years
|K| = 4                          ~71 mins    ~35 years
|K| = 5              ~2 secs     ~42 hours
|K| = 6              ~10 secs    ~32 days

Table 1
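The entries in Table 1 follow directly from the |K|^(n^2) formula; a quick sketch of the arithmetic (the helper name is illustrative):

```python
def search_cost(k, n, tests_per_sec=1_000_000):
    """Number of candidate matrices |K|**(n*n) for square spaces of size n,
    and the time to test them all at an assumed 1 million tests per second."""
    tests = k ** (n * n)
    return tests, tests / tests_per_sec

# |K| = 2, n = 5: 33,554,432 candidates, about 33 seconds
tests, secs = search_cost(2, 5)
```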

Typically a |K|=4, n=4 implication may find only 4 morphisms out of roughly 4 billion possibilities, so implication is a real needle-in-a-haystack search. The following diagram is a flowchart for the basic logic of implication.

The candidate matrix M is filled with "."s to start with. The dots mean there is no current valid candidate for this position in M. (In fact the "."s are used to facilitate regexp searching in the Tcl implementation.) The matrix has |r| rows and |c| columns, and the alphabet has |k| symbols. Rows, columns and symbols are numbered from 0, 1, ... to |r|-1, |c|-1, and |k|-1 respectively. A path is walked through the cells of the matrix. In theory any path could be chosen, but it is fixed for the life of the search. In each cell the symbols 0, ..., k-1 are tried in order.

An Initialisation procedure sets up the M matrix with its "."s, and sets up slightly modified copies of the two spaces which make regular expression matching by rows easy. It also initialises the pointer to the first cell.

The "incr symbol(cell)" action simply increments the value of the current cell symbol, treating "." as though it were -1. Inspection of the flowchart diagram reveals four columns. Notice that the leftmost column has only one action, "Step back one cell". This is mirrored by the rightmost column which also has only one action "Step forward one cell". The left half of the flowchart is concerned with finding a new cell value worthy of a candidacy test.
The cell order could be row-major or column-major, or any fixed sequence of cells. For maximum efficiency a path is selected which hopefully excludes as many pointless traverses as possible. A good candidate seems to be the herringbone path: the first row, followed by the rest of the first column, followed by the rest of the second row, followed by the rest of the second column, and so on. This path has a well-defined stepwise order regardless of the number of rows and columns, and it eliminates pointless excursions by setting up a row-column framework which maximises interdependencies and then strengthens it. Using this tactic, probably the only improvement would be an alternating herringbone path, which is a rather more difficult path to calculate.
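The herringbone path can be sketched as a cell-order generator. This is one reading of the description above, in Python, with cells as (row, column) pairs:

```python
def herringbone(nrows, ncols):
    """Herringbone cell order: all of row 0, then the rest of column 0,
    then the rest of row 1, then the rest of column 1, and so on."""
    path = []
    for d in range(max(nrows, ncols)):
        if d < nrows:
            path += [(d, c) for c in range(min(d, ncols), ncols)]  # rest of row d
        if d < ncols:
            path += [(r, d) for r in range(d + 1, nrows)]          # rest of column d
    return path

# herringbone(3, 3) -> [(0,0),(0,1),(0,2),(1,0),(2,0),(1,1),(1,2),(2,1),(2,2)]
```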

The candidacy test is whether the current symbol(cell) is valid in the context of its column; that is, is there a column in A that looks like this candidate column? The column for this cell is used as a regular expression and this pattern is sought among the columns of A. Similarly the row for this cell is used as a regular expression to search for matching rows in B. If a matching A column and a matching B row exist, then this cell is OK for now. The search proceeds until there are no matches, in which case the M matrix is unwound back down its path, or else we get to the last cell in the path which is OK, so that all cells are OK and a morphism is identified.

When the cell path has been completely unwound back to the initial cell, we know the search has been exhaustive. And very neatly, the M matrix is returned to being all "."s, and the cell pointer is returned to the first cell.
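The flowchart logic above can be sketched end to end. A minimal Python version (illustrative: it walks a simple row-major path rather than the herringbone path, and uses plain wildcard matching where the Tcl implementation uses regexps):

```python
def imply(A, B, alphabet):
    """Enumerate all morphism matrices M (|rows of A| x |cols of B|):
    every column of M must be a column of A, every row a row of B.
    '.' marks a cell with no current valid candidate."""
    nr, nc = len(A), len(B[0])
    a_cols = ["".join(row[j] for row in A) for j in range(len(A[0]))]

    def ok(r, c):
        # Candidacy test: does column c of M (dots as wildcards) still
        # match some column of A, and row r some row of B?
        col = [M[i][c] for i in range(nr)]
        return (any(all(x in (".", y) for x, y in zip(col, ac)) for ac in a_cols)
                and any(all(x in (".", y) for x, y in zip(M[r], br)) for br in B))

    M = [["."] * nc for _ in range(nr)]
    path = [(r, c) for r in range(nr) for c in range(nc)]  # row-major path
    results, i = [], 0
    while i >= 0:
        r, c = path[i]
        nxt = 0 if M[r][c] == "." else alphabet.index(M[r][c]) + 1  # '.' acts as -1
        if nxt == len(alphabet):
            M[r][c] = "."       # symbols exhausted: unwind one cell
            i -= 1
        else:
            M[r][c] = alphabet[nxt]
            if ok(r, c):
                if i == len(path) - 1:
                    results.append(["".join(row) for row in M])  # all cells OK: a morphism
                else:
                    i += 1      # step forward one cell
    return results
```

For the 2-point space ["01", "10"] over the alphabet "01", this search finds exactly the two permutation morphisms, and it terminates with M returned to all "."s as described above.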

Given the table of staggering figures above, just how long does it take to perform these calculations? I have run a test using a sample square matrix which I called E, for no better reason than that it followed on after some other test matrices called A, B, C and D. The E space is defined for an alphabet of size k over an n*n space: cell E(i,j) = min{ (i+j) mod n, k-1 }. Note that E = E_|_, which ensures at least one function, the identity. Rows and columns of E start with all letters of the alphabet, as far as is possible, and so candidate search paths exist over a wide range of values on the search path. In particular cell E(0,0) satisfies the candidacy test for the alphabet. For k=2, E(2,n) represents the space of n discrete points, and E(2,n) -o E(2,n) represents the discrete topology with n! points. This explains the factorial time behaviour for k=2. For k=3, E(3,n) represents the points of a directed cycle of n points. E(3,n) -o E(3,n) represents the cyclic group on the ring and also has n points.
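The E space as defined above can be sketched in Python (rows as strings, assuming k <= 10 so each symbol is one digit):

```python
def E(k, n):
    """The n x n test space over k symbols: cell (i, j) = min((i+j) mod n, k-1).
    The formula is symmetric in i and j, so E equals its transpose (E = E_|_)."""
    return ["".join(str(min((i + j) % n, k - 1)) for j in range(n))
            for i in range(n)]

# E(2, 3) -> ['011', '110', '101'], the 3-point discrete space
```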

Times in seconds for executing E(k,n) -o E(k,n) (the cases of Table 1) on my laptop:

secs      n = 2    n = 3    n = 4     n = 5   n = 6   n = 7
|K| = 2   0.011    0.051    0.29      1.9     15      180
|K| = 3   0.016    0.043    0.18      0.47    1.
|K| = 4   0.016    0.055    0.26      0.58    1.
|K| = 5   0.022    0.099    0.30      0.64    1.
|K| = 6   0.021    0.13     0.36      0.74    1.
|K| = 7   0.024    0.15     0.41      0.86    1.
|K| = 8   0.028    0.17     0.48      1.

The times for the new version of imply are:

secs      n = 2     n = 3    n = 4    n = 5   n = 6   n = 7   n = 8   n = 9
|K| = 2   0.0054    0.025    0.15     1.1     8.7     115
|K| = 3   0.0055    0.018    0.051    0.16    0.32    0.61    1.
|K| = 4   0.0049    0.016    0.058    0.20    0.36    0.62    0.99    1.
|K| = 5   0.0050    0.017    0.065    0.21    0.39    0.65    1.
|K| = 6   0.0050    0.016    0.059    0.20    0.40    0.67    1.
|K| = 7   0.0049    0.016    0.059    0.20    0.40    0.71    1.
|K| = 8   0.0055    0.016    0.059    0.20    0.38    0.68    1.

Table 2

These times do not match the trends which one would expect to see in theory from Table 1 above. For |K| = 2 the time does indeed rise exponentially w.r.t. n, but for the other k the times seem to grow at most O(n^2). How do we explain this behaviour?

One answer is that E(2,n) -o E(2,n) returns n! morphisms. For k>2, E(k,n) -o E(k,n) has only n morphisms for n>=k, and only 1 morphism otherwise. This explains the almost linear look of the times in the table. The implication here is that dead-ends in the search path are dispensed with very quickly, and that the algorithm spends most of its time "rubber-stamping" a candidate morphism. So the time it takes to calculate an implication is mainly dependent on the size of the answer rather than on the size of the arguments.

The unary operator Query (?)

For this reason, the usual algorithm for the unary operator query (?) is in a sense the worst algorithm for the job. The standard algorithm is to extend the space with all constant rows over the alphabet to produce an extended space S+. Then (_|_ S+) -o S+ is calculated, and for each matrix returned the diagonal is added to S+ if it is not there already. This procedure is repeated until no more new diagonals are produced (closure). However (_|_ S+) -o S+ returns several matrices for each diagonal, and in addition returns all the existing rows of S+ as well as any new ones.
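The first step of that procedure, forming S+ by appending the constant rows, can be sketched as follows (illustrative Python, spaces as lists of row strings):

```python
def extend_with_constants(S, alphabet):
    """S+ : the space S extended with every constant row over the
    alphabet that is not already present."""
    width = len(S[0])
    out = list(S)
    for a in alphabet:
        row = a * width  # the constant row of symbol a
        if row not in out:
            out.append(row)
    return out

# extend_with_constants(["01", "10"], "01") -> ['01', '10', '00', '11']
```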

Analysis of the implication algorithm

Implication is recognised as being an NP-hard problem, of order O(k^(n*m)). And obviously this is the case, with some expressions returning answers of size n*n*n!. But this begs a question: so what?

Simply calculating a factorial is an NP-hard problem, but factorials are calculated as part of binomial coefficients and binomial probabilities all the time. What matters is not the potential size of all the possible answers, but the time it takes to get the "interesting" answers. An answer of order n! is not particularly interesting, because it is just a numeric way of saying "every" combination, which is not a particularly useful result. So a more interesting question in relation to the implication algorithm is:

How long does it take to tell if there are 0 morphisms?

To this we can then add:

How long does it take to tell if there is only 1 morphism (typically the identity) and no others?

How long does it take to tell if there are only order n morphisms?

One way of addressing these issues is to first understand the nature and anatomy of morphisms. The first issue is: how many morphisms are there between two spaces? Given a space A = (k_a, p_a, s_a) and a space B = (k_b, p_b, s_b), since A -o B consists of symbols that must occur in both A and B, take k = min(k_a, k_b). The minimum number of output rows (points p) is of course 0. The maximum number is p_b^p_a. But it is possible that this can only occur for spaces which are non-unique to start with, and then perhaps only for constant spaces. The following is a table of results for A -o A where A is (k, n, m). This is the same class of spaces as was considered in the timing section in Table 1. In this case however we are not simply enumerating them, but performing A -o A on them, which takes longer than 1 microsecond on my laptop. The execution time for each cell in Table 3 below is the product of the corresponding cells in Tables 1 and 2! For A(2,4) it is 65,536 * 0.21 secs = 3.8 hours. For A(2,5) it is 33,554,432 * 1.2 secs = 15 months.

Table 3 The number of morphisms from A to A where A is n rows and m columns over an alphabet of |k| symbols
k = 2:

n = 2:
  m = 2: 6 spaces non-unique; 10 had 2 morphisms
  m = 3: 40 spaces non-unique; 12 had 2 morphisms; 12 had 3 morphisms
  m = 4: 232 spaces non-unique; 24 had 4 morphisms
  m = 5: ALL spaces are non-unique

n = 3:
  m = 2: 40 spaces non-unique; 12 had 2 morphisms; 12 had 3 morphisms
  m = 3: 248 spaces non-unique; 72 had 2 morphisms; 72 had 3 morphisms; 84 had 6 morphisms; 36 had 7 morphisms
  m = 4: 2488 spaces non-unique; 144 had 1 morphism; 288 had 2 morphisms; 288 had 3 morphisms; 288 had 5 morphisms; 168 had 6 morphisms; 288 had 9 morphisms; 144 had 10 morphisms
  m = 5: same as k=2 n=5 m=3
  m = 6: same as k=2 n=6 m=3

n = 4:
  m = 2: 232 spaces non-unique; 24 had 4 morphisms
  m = 3: 2488 spaces non-unique; 144 had 1 morphism; 288 had 2 morphisms; 288 had 3 morphisms; 288 had 5 morphisms; 168 had 6 morphisms; 288 had 9 morphisms; 144 had 10 morphisms
  m = 4: 31672 spaces non-unique; 3456 had 1 morphism; 5184 had 2 morphisms; 4608 had 3 morphisms; 4752 had 4 morphisms; 3456 had 5 morphisms; 960 had 6 morphisms; 2304 had 7 morphisms; 2880 had 8 morphisms; 1152 had 14 morphisms; 2568 had 16 morphisms; 2304 had 20 morphisms; 48 had 24 morphisms; 192 had 34 morphisms
  m = 5: same as k=2 n=5 m=4

n = 5:
  m = 2: ALL spaces non-unique
  m = 3: 26048 spaces non-unique; 1440 had 2 morphisms; 1440 had 3 morphisms; 1440 had 6 morphisms; 240 had 9 morphisms; 2160 had 11 morphisms
  m = 4: 564736 spaces non-unique; 80640 had 1 morphism; 89280 had 2 morphisms; 51840 had 3 morphisms; 33840 had 4 morphisms; 37440 had 5 morphisms; 25920 had 6 morphisms; 28800 had 7 morphisms; 28800 had 8 morphisms; 8640 had 9 morphisms; 8640 had 10 morphisms; 5760 had 11 morphisms; 8640 had 12 morphisms; 5760 had 13 morphisms; 5760 had 14 morphisms; 2880 had 15 morphisms; 720 had 16 morphisms; 5760 had 18 morphisms; 2880 had 19 morphisms; 7200 had 20 morphisms; 2880 had 21 morphisms; 11520 had 22 morphisms; 5760 had 23 morphisms; 480 had 24 morphisms; 5760 had 25 morphisms; 5760 had 26 morphisms; 5760 had 27 morphisms; 2880 had 33 morphisms; 2880 had 35 morphisms; 960 had 37 morphisms

n = 6:
  m = 3: 241984 spaces non-unique; 4320 had 2 morphisms; 5040 had 6 morphisms; 4320 had 11 morphisms; 4320 had 12 morphisms; 2160 had 17 morphisms

n = 7:
  m = 3: 2056832 spaces non-unique; 10080 had 6 morphisms; 30240 had 17 morphisms

n = 8:
n = 9:

©2000 - 2006 WEBSCOOL This page last updated 11 May 2006. All rights reserved - including copying or distribution of any portion of this document in any form or on any medium without authorisation.