nty.sp Specification of the Notch TYpe routine. Version 6. R. A. Parker 10.10.07 ----------xxxxxxxxxx----------xxxxxxxxxx----------xxxxxxxxxx----------xx General ------- The nty routine holds two entities - relators and notch-types. A relator is a cyclical word, considered to be equal to all its rotations and inversions. A notch-type, on the other hand, is a word - not cyclical. In general, the nty routine is expected to know all there is to know about the list of notch-types currently in the (possibly virtual) list and yet be able to react effectively to inserts and deletes of relators on the fly. This almost impossible problem may ultimately be solved, but moderately efficient inplementations of the nty module can be made by holding words in alpabetical order and creating various indexes and chains to find the required words fairly quickly. Usually it is the ntynexnts and ntyprents routines that form the performance critical heart of a program using nty, and the inserts and deletes that create the table are of lesser importance. It is expected in due course that an ntyfind routine will be necessary, but this and indeed other routines will be added to this specification as further uses of nty arise. Words are handled in the interface to nty as ints. It is expected (but not required) that the nty module is built on top of the alw module. ----------xxxxxxxxxx----------xxxxxxxxxx----------xxxxxxxxxx----------xx 1. nty shall provide a header file nty.h that shall provide 1.1 The typedef of a struct NTY suitable for identifying and holding the full context of a list of words closed under rotation and inversion. 1.3 Prototypes or macros for all the routines listed in section 2 ----------xxxxxxxxxx----------xxxxxxxxxx----------xxxxxxxxxx----------xx 2. nty shall provide the source code or nty.h provide macros to implement all of the following functions. 2.1 void ntycons(NTY * nt) to allocate the memory needed to service a list of words as specified here. ntycons does not prepare the space for use - ntystart() must be called as the next function to use the nt. The structure nt is of a size known at compile time, essentially specified in nty.h. 2.2 void ntystart(NTY * nt, int ngens, int * inv) to reset the nt to the empty set of words. This function can be called at any time, and returns the nt to the initial state. It must be called after ntycons to prepare the tables for use. All data previously held in the nt is wiped out. A pointer to the (constant) inverse table must be given in the third parameter. This table must be populated before the call to ntystart, and must not be changed - nty does not (need to) copy this table. Hence inv[1] is the inverse of generator 1. inv[0] is not used. 2.3 void ntydelete(NTY * nt, int relh) to delete the relator identified by the handle relh. 2.4 int ntyinsert(NTY * nt, int * rel, int len) to insert the cyclical relator pointed to by rel, of length len. The word rel is copied into the areas controlled by nt, and rel can be trashed by the calling routine as soon as ntyinsert has returned. If the relator (or its inverse or one of its rotations) is already in the list, 0 is returned and the nt remains unchanged. If the relator is not in the list, it is inserted and the relator handle (an int) is returned. This handle remains valid until ntystart is called, or the relator is deleted with ntydelete(). Once deleted, the handle may be re-used by a later ntyinsert. The notch types inserted consist of all rotations and inversions of the given word that are distinct. The relator must have length at least 3. 2.5 int ntyrelnt(NTY * nt, int relh) to provide the handle of "the" notch type of the relator specified by the handle relh. The notch type returned shall be precisely the word that was ntyinserted. It shall be the first in the "ntynexntr" chain. 2.6 int ntynexntr(NTY * nt, int nth) to provide the handle of the next notch type of the same relator as nty. If there are no more notch types for that relator, 2 is returned. Hence to loop through the notch types for a relator, ntyrelnt is called, and then ntynexrel is repatedly called to get the handles of the other notch types of the relator, until 2 is returned. nty may use any sequence for the rotations and inversions in this chain. 2.7 int ntyntrel(NTY * nt, int nth) to provide the handle of the relator associated with the notch-type nth. 2.8 int * ntyntptr(NTY * nt, int nth) to provide a pointer to the word whose handle is nth. 2.9 int ntyntlen(NTY * nt, int nth) to provide the length of the notch type whose handle is nth. 2.10 int ntyrot(NTY * nt, int nth, int rot) to return the handle of the notch-type that is the rotation of the notch-type nth by rot steps "to the left". rot may be any signed integer. The pointer is advanced rot steps to the right, making the rotation as a whole leftwards. Hence if rot is 1, the first letter of the word is moved to become the last letter, and all other letters are shifted left one position. 2.11 int ntynexnts(NTY * nt, int nth, int len) returns the handle of the first (in the main ordering) notch type whose length is no more than len following the notch type nth. If len is -1, the next notch-type (regardless of length) shall be returned. If there are no such notch-types, 2 is returned (see 3.2 below). 2.12 int ntyprents(NTY * nt, int nth, int len) returns the handle of the last (in the main ordering) notch type whose length is no more than len preceeding the notch type nth. If len is -1, the previous notch-type (regardless of length) shall be returned. If there are no such notch-types, 1 is returned (see 3.2 below). 2.13 int ntynexrel(NTY * nt, int relh) to return the handle of the next relator. If there are no more relators, 2 is returned (see 3.2 below). ----------xxxxxxxxxx----------xxxxxxxxxx----------xxxxxxxxxx----------xx 3. Conventions and further specifications. 3.1 Ordering. The notch-types shall be given an ordering such that if A, B and C are notch types occurring in that order, the common initial substring between A and B, and also between B and C shall be no shorter than the common initial substring of A and C. This ordering shall be used by ntynexnts. There is no special requirement on the ordering of relators. This ordering is stable. 3.2 Relator and notch-type handles. The handles 1 and 2 shall always exist and shall have the meaning that they are the one before the first, and the one after the last. No real handle shall take a value of less than 3. These rules apply to both relators and to notch types, and are intended to allow the listing of all the relators and all the notch-types by starting with 1, and calling ntynexnts or ntynexrel until the return value is 2. 3.3 Relators are distinct cyclical words. The set of relators must all be distinct as cyclical words. This is enforced by the failure of ntyinsert should the cyclical word be already in the list. Similarly a relator will fail to insert if its inverse is already present. 3.4 At any time, the notch type list shall consist precisely of the distinct words that occur as rotations and inversions of the relators. 3.5 Stability of handles. The handle of a notch-type or relator shall remain valid across all calls except 2.1-2.3. The handle of a notch-type or relator shall remain valid also across a call to ntydelete unless the relator (or the relator from which the notch-type was derived) was itself the one deleted. 3.6 Stability of pointers. A pointer to a notch-type returned by ntyntptr shall remain valid and point to the same word across all calls except 2.1-2.3. The pointer shall also remain valid across a call to ntydelete unless the relator from which the notch-type was derived was itself the one deleted. 3.7 Generator numbers. These shall be from 1 upwards. ----------xxxxxxxxxx----------xxxxxxxxxx----------xxxxxxxxxx----------xx Section 4 - Implementations nty1 implements the full above interface on top of alw. Notice that alw provides an ordering satisfying the contiditons in 3.1 above. There are two files - nty1.h and nty.c. By "knowing" the structure of the NTY struct (and the ALW struct that a member of NTY struct poits to) macros can be implemented in nty1.h for many of the routines, namely ntyrelnt, ntynexntr, ntyntrel, ntyntptr, ntyntlen, ntynexrel. To compile a program with the routine nty1, the file nty1.h must be copied to nty.h prior to compilation. nty needs to allocate a large space to hold the words. I suggest #define WORDSTORE 10000000 as a reasonable method. ntyinsert should check that this size is not transgressed, and print then exit if it is. The ntyinsert module should first find out if the word is already there. if not, the word is concatenated with itself into the the store. alw can then be called starting at the first, second etc. "letter" until the word is already there. All the linkages must be maintained as specified above. ntydelete does not bother to reclaim the space in the wordstore. ntyrot is almost a macro. Notice that the rotation may be negative! ntyrot needs to repeatedly follow the links, which must be both forwards and backwards so that negative rotations can be done. ntynexnts and ntyprents work by repeatedly following the next/previous chains (maintained by alw) until one of a suitable length is found, or the end (or beginning) of the list is reached. ----------xxxxxxxxxx----------xxxxxxxxxx----------xxxxxxxxxx----------xx Section 5 - problems and discussion 5.1 Measuring common initial segment. -------------------------------------- Many of the common functions of performance-critical steps are going to need to know how long the common initial substring is between pairs of notch-types. In view of this, it seems appropriate to try to provide a routine to measure this in the nty module. This is currently not available. 5.2 Doesn't search very well ----------------------------- Given a word, or a cyclic word, there are problems to be solved and methods available for their solution in the area of searching through or round the word that the interface does not allow for. 5.3 Stability of pointers to words. ------------------------------------ Now the pointer is defined to be stable. Although I think this is the correct decision, it seems appropriate to note here that this requires the nty routine to store words such that "flat" pointers must be at all times available, and never moved. This may turn out to be too harsh. 5.4 Do we need ntyprents? -------------------------- It seems that the vast majority of algorithms are working with unordered pairs of words with a long common initial string. As such it may never be necessary to call ntyprents, and if this is indeed the case, the function should be deleted. 5.5 Finding if the relator is there. ------------------------------------- In many of the programs, the main performance-critical work consists of making relators and checking that they are already in the list. It is not clear at present whether always, if it is not in the list, we want to insert it. I feel that we may well want to batch up changes to the relator list (to allow functions unstable against insert and delete to build up information) and that the checking that the relator is there will not be followed unconditionally by an insert. 5.6 Mass deletion ------------------ Usually when a new short relator is found, the result is a relatively massive set of deletions. It is not clear that the ntydelete function, which does them one-by-one, is the approriate way to do this. ----------xxxxxxxxxx----------xxxxxxxxxx----------xxxxxxxxxx----------xx