Signal codes as ship identifiers in databases
(Revised 8 August 2004)
Introduction
Elsewhere on this site I have documented the allocation of government-sponsored ship identification signal codes during the second half of the 19th century and beyond (refer Codes). The Marryat codes, used widely earlier in the century and for an overlapping period, are also documented elsewhere (refer Marryat).
This item elaborates how signal codes could be used as surrogates for official numbers for European ships in databases and poses some questions for feedback. It is assumed that readers have already read the background coverage of the history and allocation of signal codes at Codes or are familiar with its subject content.
List of headings:
● The desirability of using pre-existing identifiers where possible in databases
● The potential role for signal codes as identifiers in databases
● Principles of using signal codes as identifiers in databases
● Combining signal code and year of construction
● Differentiating mode of propulsion
● Indicating the allocating nation
● Implications of combining letters and numerals in identifiers
● The critical conclusion
● Summary and Conclusions
● Questions outstanding
The desirability of using pre-existing identifiers where possible in databases
The context for this exploration is the desirability of the principle of building upon pre-existing information as much as possible when identifying ships in databases in order to facilitate comparing records between databases and exchanging and sharing information between them. I see this principle as independent of the degree of development of ship identification and information exchange tools such as the Global Ship Numbering project or the degree of their adoption as no external system of identification can ever be directly incorporated into all primary documentation.
There must always be intermediate steps between primary data and externally provided identifiers and many sets of records and databases that would require years of part-time work to incorporate other identifiers and which may never be able to fully incorporate them. I think that there will always be a need to use pre-existing identifying information directly, as an auxiliary to any new universal identification system, as an intermediate step towards the implementation of such a system and as a means to utilising its outputs. Making the maximum use of pre-existing identifiers should simplify matters, reduce the effort involved and the potential for error and thereby enable and encourage individuals who would not otherwise be able to do so to participate in data sharing and contribute to database development. There is great “survival value” – in all sorts of ways – in decentralised (but consistent and compatible) approaches to information recording and sharing.
Using the information in the primary documents effectively will facilitate co-operative information recording and sharing at various levels that can feed later into more comprehensive and sophisticated database networks. In the early stages of information recording – as the ballast goes out and the first slings of cargo come aboard, if a nautical metaphor will help – one must necessarily start with the “cargo” as it is. Processing comes during manufacture after delivery. “Optimum stowage on the first passage and during transhipment” is what this item is primarily about, but it is also relevant further along in the process.
Recognition of the advantages of pre-existing identifiers is demonstrated by the degree of consensus on utilising LR/IMO numbers and British and American official numbers in databases as much as possible and generating database-specific codes only when necessary (“necessary” of course varying with the project and the circumstances). Once sophisticated centralised database networks are operative, questions arise of using the information effectively. This will often involve taking it back out into “intermediate” and auxiliary projects involving flows of information that cannot be anticipated in detail, because it is in the creative development of new ways of using information that progress in understanding arises. That leads us back to making the most effective use of identifiers already embedded in the data, as this is likely to maximise the opportunities for creative use of information for minimum (or zero) additional effort.
The potential role for signal codes as identifiers in databases
There is no need to use ship signal codes as identifiers when numerical identifiers are already available that require less adaptation for database purposes. Such identifiers are available for the majority of ships worldwide since the late 1960’s, for two major jurisdictions since the middle of the 19th century and for a few small nations for almost as long.
The Lloyd’s Register standard ship identification numbers that have now been adopted by the IMO* are truly international identifiers and “tailor-made” for serving as unique identifiers in relational databases of multinational maritime information. There is a general consensus among colleagues with particular interest and experience in database development and linking, that LR/IMO numbers should be used to identify a ship in databases wherever possible. LR/IMO numbers were introduced during the 1960’s and are therefore extremely relevant and useful for recording information about ships in the second half of the 20th century and currently. Many ships built in the 1950’s and 1960’s and some built quite early in the 20th century survived long enough to be allocated an LR/IMO number. However, for all practical purposes the LR/IMO numbers are irrelevant to the study of merchant sailing ships.
* termed LR/IMO numbers hereafter and elsewhere on this site.
For ships lost or scrapped before 1969, the systems of official numbers introduced by the British Empire in 1855, the United States in 1867, by Japan shortly before 1886 and by Sweden slightly later, are equally useful, requiring only a simple modification to indicate nationality to allow them to be used in datasets together with each other and with LR/IMO numbers. Upward of half of all world commercial shipping from the 1860’s through to the 1960’s was allocated an official number by one or more of these nations for all or some of its career*. Most of the remainder will have been registered with one of the other major maritime nations and if engaged in foreign trade, are likely to have been allocated a signal code for identification.
* For discussion of ships being allocated official numbers under more than one jurisdiction refer to “Babel”. Significant numbers of ships that did not commence their careers under British or American jurisdiction spent some part of their careers under one or both of these flags and were therefore issued an official number which is potentially as useful for identifying them as those launched with an identifying number.
The following discussion is therefore carried out within the context of a working principle of giving first priority to LR/IMO numbers in the allocation of database identifiers and suitably adapted national official numbers the second priority where practicable. The point at issue is the potential usefulness of signal codes as identifiers for the residual. It has already been established that signal codes were allocated to a large proportion of ships that existed during the late 19th century that were not allocated an official number (broadly speaking, primarily those of the European nations).
Various numbers will be observed in the “official numbers” field of Lloyd’s Register for ships of some European nations in some periods. As far as I can ascertain, with the exception of Sweden (and, at a comparatively late date, the Republic of Ireland), they all relate to registration at a particular port and are not national official numbers, at least up to and beyond World War I and therefore throughout the period during which merchant sail constituted any significant percentage of the world merchant fleet. While port registration numbers do have utility for identification and research, an individual ship may have had more than one within the same national jurisdiction so, at best, they will be inferior as database identifiers to any identifiers applicable throughout a ship’s registration under a single jurisdiction. I have not investigated the period after WWI in the same depth but, as far as I am aware, my analysis of the period up to WWI is applicable for some time after WWI, possibly until LR/IMO numbers were introduced. The signal codes do appear to be the only available candidate for identifying large numbers of European ships.
Principles of using signal codes as identifiers in databases
The ideal database identifier should be
● Specific to a ship throughout its registration, at least while under a single jurisdiction
● Not re-allocated to any other ship by the allocating jurisdiction after the demise of the first ship allocated the identifier
LR/IMO numbers and national official numbers meet these criteria apart from the comparatively rare allocation of a second identifier in error and debatable borderline decisions concerning what constitutes a “new” ship in cases of major rebuilding.
Signal codes, in isolation, do not invariably meet the second criterion under the administrative practices of at least some European nations in the 1886-1913 period. However, that failing is easily corrected by using the year of construction in conjunction with the signal code. Adding the year of construction to the signal code provides ship-specific identifiers of apparently equal utility to official numbers. The utilisation of signal codes as database identifiers is therefore potentially another way of permitting anyone, anywhere at any time to compare their records with those in other sources and to exchange data reliably with anyone else who applies the same practices to the same primary and secondary information sources.
Subject to more detailed research for more nations for more years, there appears to have been sufficient stability of the codes of the leading European nations for them to be useful as database identifiers for a lengthy period of shipping. They certainly appear to be satisfactory for two leading maritime nations. A number of reallocations would not necessarily overturn the general conclusion. Obviously, it would be highly desirable to clarify the codes’ utility for the 20 years preceding 1886 as well. I would welcome contact from anyone with access to the necessary records for this period who is interested in collaborating to resolve this.
Given that the circumstances in which two ships with the same year of construction could be issued the same national signal letters are so improbable* as to be impossible for all practical purposes, the simple expedient of combining the year of construction with the signal code is sufficient to create the same condition of uniqueness provided by official numbers. The necessary criteria of a unique identifier can therefore be met.
* (1) A second ship built in the same nation and in the same year as an earlier ship that was destroyed in the year in which both were launched. This will not create a problem if the issuing nations allowed a period of time to elapse before reissuing a code, as appears to be the case.
(2) A foreign-built ship purchased by a citizen of the issuing nation and allocated the same code as that previously allocated to another ship built in the same year as the foreign-built ship, which is possible but extremely improbable.
Issues of national identifiers and the complications that may arise from combining letters and numbers in identifiers are addressed in following sections.
Combining signal code and year of construction
As an example of the reallocation of national signal codes, the French signal letters HBGL were used for the 184 ton brigantine Achille-Celestine built at
In this example the addition of 11 and 76 to HBGL to form HBGL11 and HBGL76 is sufficient to distinguish the Achille of 1911 from the Achille-Celestine of 1876 and from any ship of either name or any other French ship, and to continue to identify the same ships if their names were changed. That might or might not be the most convenient style to express the combination of the two pieces of information into a single identifier, but for the moment that is immaterial – the essential point is that the combination of the two pieces of information together with nationality are sufficient to uniquely identify the individual ship.
If the starting point was a complete record of a nation’s ships for the last 150 years or more, one could simply number multiple allocations of a signal code as 1, 2, 3……x, but the practice of attaching the year of construction provides unique identifiers at any stage of recording a nation’s ships and long before one has transcribed all records – a critical practical consideration.
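The combination described above can be sketched in a few lines of Python. This is only an illustration of the principle, not a recommended format: the function name code_with_year and the choice to append the two-digit year directly to the letters are assumptions made for the example.

```python
def code_with_year(signal_code: str, year_built: int) -> str:
    """Append the last two digits of the year of construction to a signal code.

    The function name and the exact output style are illustrative assumptions.
    """
    code = signal_code.strip().upper()
    if len(code) != 4 or not code.isalpha():
        raise ValueError(f"expected a four-letter signal code, got {signal_code!r}")
    return f"{code}{year_built % 100:02d}"


# The two French ships discussed above share the letters HBGL
# but receive different identifiers once the year is attached.
print(code_with_year("HBGL", 1876))  # HBGL76
print(code_with_year("HBGL", 1911))  # HBGL11
```

Because the identifier is derived only from what is printed in the source record, two researchers transcribing different registers would arrive at the same string without consulting each other.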
Differentiating mode of propulsion
The European nations used this system of signal codes for approximately 100 years before LR/IMO numbers were introduced – the European signal codes date from the 1860’s and the Lloyd’s Register standard numbers first appear in their present 7-digit form in the 1969-70 Register*. As long as one is considering solely merchant sailing ships, it should be sufficient to use only two digits to identify the year of construction. However, some ships allocated signal codes in the 1860’s will have been built as early as the 1840’s or even earlier. LR/IMO numbers were allocated to ships built, in some cases, much earlier in the 20th century. There is therefore a risk of two ships built several decades apart being allocated the same identifier if the year of construction is indicated by only the last two digits of the year. This could be provided for using the year in full or the last three digits. Alternatively, it is likely that using only the last two digits of the year will be sufficient if you also distinguish the ship’s original mode of propulsion as sail, steam or motor. By the 1930’s the construction of merchant sailing ships allocated signal codes had ceased so distinguishing sail from powered vessels could be expected to suffice. Any merchant sailing vessels built in the 1930’s or later (as were a few of the Portuguese Grand Banks’ fishing schooners) will generally have had auxiliary power and could be treated as a powered vessel for the purposes of database identifiers.
* a 6-digit preliminary version was used in the 1966-67 to 1968-69 registers and translated into 7-digit numbers in 1969-70 but there were probably a few thousand ships that did not survive from 1966 to 1969. Their 6-digit numbers can be adapted to allow the starting date of the modern international identification system to be pushed back three years which is worth doing for the sake of documenting those ships but does not extend the coverage long enough to eliminate the possibility of ships built many years apart having the same code if two digits are used for the year of construction.
Taking the three most widely used maritime languages, the words for sail and steam start with the same initial letter as each other in English (S) and in French (V, from Voiliers and Vapeurs), but not in German. The words for sail in both English and German start with S. A pragmatic option would be to use S for sail, from the English Sail and German Segelschiff on the majority principle, and D for steam, from the German Dampfer, on the basis that D does not begin the word for sail in any of the three languages. It may or may not be worth also differentiating motor vessels rather than adhering to the principle followed in records for many years of including both motor vessels and steamers under the heading steam. The words for motorship commence with M in all three languages.
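A small sketch may help to show why a propulsion letter rescues the two-digit year. The letters S, D and M are those suggested above; the function name year_and_propulsion, the placing of the letter after the year and the two example ships are assumptions for illustration only.

```python
# Letters suggested in the text: S for sail, D for steam (Dampfer), M for motor.
PROPULSION_LETTERS = {"sail": "S", "steam": "D", "motor": "M"}


def year_and_propulsion(year_built: int, mode: str) -> str:
    """Return the two-digit year followed by the propulsion letter.

    Placing the letter after the year is an assumption of this sketch.
    """
    return f"{year_built % 100:02d}{PROPULSION_LETTERS[mode.lower()]}"


# Two hypothetical ships built a century apart share the digits "68",
# but the propulsion letter keeps the two suffixes distinct.
print(year_and_propulsion(1868, "sail"))   # 68S
print(year_and_propulsion(1968, "motor"))  # 68M
```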
Differentiating the allocating nations
It is also necessary to distinguish the allocating nations as until the 1930’s many signal letters were allocated by a dozen or more nations and were intended to be used from the outset in conjunction with an indicator of nationality such as a national flag (refer Letters).
For example, adding a prefix Fr for France (or some alternative) to HBGL11 to form FrHBGL11 would identify the ship built in 1911 that was allocated the French signal code HBGL, which should distinguish it from any other ship at any time, French or otherwise. It is less important whether this is the most suitable style than that the database codes are combined from these three ingredients in some systematic way. Provided that criterion is met, it should not be difficult to translate from one coding system to another based on the same principle by reliable, economical, electronic means. A further prefix or suffix s for Sail would cover the remote possibility that a steamship or motorship built in 1911 was also issued the identical French code.
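The point that the style matters less than the ingredients can be illustrated by holding the ingredients separately and formatting them on demand. The class name SignalCodeRecord, its field names and the second "dashed" style are hypothetical, chosen only to show that one style can be translated mechanically into another.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignalCodeRecord:
    """The ingredients of an identifier, held separately (hypothetical structure)."""
    nation: str           # prefix for the allocating nation, e.g. "Fr"
    signal_code: str      # the four signal letters as published
    year_built: int       # year of construction in full
    propulsion: str = ""  # optional suffix, e.g. "s" for sail

    def identifier(self) -> str:
        """Style used in the text: nation prefix, letters, two-digit year, suffix."""
        return (f"{self.nation}{self.signal_code.upper()}"
                f"{self.year_built % 100:02d}{self.propulsion}")

    def dashed(self) -> str:
        """An alternative, purely illustrative style built from the same ingredients."""
        return f"{self.signal_code.upper()}-{self.year_built}-{self.nation.upper()}"


achille = SignalCodeRecord("Fr", "HBGL", 1911)
print(achille.identifier())  # FrHBGL11
print(achille.dashed())      # HBGL-1911-FR

achille_sail = SignalCodeRecord("Fr", "HBGL", 1911, "s")
print(achille_sail.identifier())  # FrHBGL11s
```

Recording the year in full and shortening it only when the identifier is formatted keeps the option of a three- or four-digit year open.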
Implications of combining letters and numerals in identifiers
It seems to be widely held that it is preferable to avoid combining letters and numerals in identifiers for database purposes.
LR/IMO numbers identify ships according to wholly numeric identifiers of seven digits commencing with 5.
American, British, Japanese and Swedish official numbers identify ships according to numerals of up to six digits. As all four nations issued official numbers in the lowest number ranges it is necessary to distinguish each jurisdiction. This can be done conveniently and efficiently by padding them out to six digits with leading zeros and prefixing them with the numerals 1 to 4 which enables them to be combined in datasets with LR/IMO numbers (which are of seven digits commencing with 5) and with each other without fear of duplication. The
In the above applications, only numerals are involved. The suggestions which I have made for converting signal letter codes into unique identifiers for European ships combine letters and numerals and therefore cannot be confused with any wholly numerical identifiers but they conflict with any preference for codes to be wholly numeric or wholly alphabetic.
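The padding and prefixing of official numbers described above might be sketched as follows. Which of the numerals 1 to 4 is given to which jurisdiction is not stated here, so the assignment in JURISDICTION_PREFIX is purely an assumption, as are the function name and the sample official number.

```python
# ASSUMPTION: the mapping of numerals 1 to 4 to jurisdictions is illustrative only.
JURISDICTION_PREFIX = {
    "british": "1",
    "american": "2",
    "japanese": "3",
    "swedish": "4",
}


def numeric_identifier(jurisdiction: str, official_number: int) -> int:
    """Pad an official number to six digits and prefix the jurisdiction numeral."""
    if not 0 < official_number <= 999_999:
        raise ValueError("official number must have between one and six digits")
    prefix = JURISDICTION_PREFIX[jurisdiction.lower()]
    return int(prefix + f"{official_number:06d}")


# The same (hypothetical) official number issued by two jurisdictions
# yields two distinct seven-digit identifiers.
print(numeric_identifier("british", 1234))   # 1001234
print(numeric_identifier("american", 1234))  # 2001234
```

Every result has seven digits and a leading numeral between 1 and 4, so it cannot coincide with a seven-digit number commencing with 5.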
However, while alphanumeric codes appear to be perfectly workable in the same field as purely numeric identifiers in spreadsheet based datasets, there is resistance to the practice of combining numeric and alphanumeric codes in Access relational databases which are necessarily something of a de facto international standard given the universality of Microsoft products*. It is inevitable that sooner or later any mixed identifiers would wind up in some relational database. I do not yet have sufficient basis to reach a firm conclusion on how critical that consideration is.
* Spreadsheets are convenient for their simplicity but data contained in them can be more widely used if work in spreadsheets is designed to facilitate their eventual importation as tables into relational databases such as Access. Spreadsheet work should therefore take account of what works best in relational databases.
It is possible to convert the suggested alphanumeric identifiers to wholly numeric identifiers by using two digit numerical codes for the nation and converting each alphabet letter of the original code to a two digit numeral using 01 for A, 02 for B … etc. With the two digits for the year of construction, the result for a four-letter code would be a wholly numeric code of twelve digits (two for the nation, eight for the letters and two for the year), compared with the seven of LR/IMO numbers and the GSN project’s eight. However, such a wholly numeric code would be less convenient and reliable to check and compare with its source components than identifiers that represent only minimal adaptation of the original form of publication. This is a consideration in checking data entry and also in conveniently sharing information with people who have not adopted one’s coding system.
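To make the conversion concrete, here is a minimal sketch. The two-digit nation number 01 for France is a hypothetical value, as are the function names. For a four-letter code the output has two digits for the nation, eight for the letters and two for the year, twelve in all, and is returned as text so that leading zeros are preserved.

```python
# ASSUMPTION: "01" for France is a made-up nation number for illustration.
NATION_NUMBERS = {"Fr": "01"}


def to_numeric(nation: str, signal_code: str, year_built: int) -> str:
    """Convert nation, signal letters and year to a wholly numeric string."""
    code = signal_code.upper()
    if not (code.isascii() and code.isalpha()):
        raise ValueError("signal code must consist of the letters A to Z")
    # A becomes 01, B becomes 02, and so on up to Z as 26.
    letters = "".join(f"{ord(ch) - ord('A') + 1:02d}" for ch in code)
    return f"{NATION_NUMBERS[nation]}{letters}{year_built % 100:02d}"


def letters_from_numeric(digits: str) -> str:
    """Recover the signal letters from the middle of the numeric string."""
    body = digits[2:-2]  # strip the nation number and the year
    return "".join(chr(int(body[i:i + 2]) + ord("A") - 1)
                   for i in range(0, len(body), 2))


numeric = to_numeric("Fr", "HBGL", 1911)
print(numeric)                        # 010802071211
print(letters_from_numeric(numeric))  # HBGL
```

The numeric form is reversible, but comparing 010802071211 against a printed register by eye is plainly harder than comparing FrHBGL11, which is the drawback noted above.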
Solutions include holding alphanumeric data in a separate field from purely numeric data or translation files (which could exist in any one or more components of a network of co-operating databases) into which one could enter the easily readable and interpretable alphanumeric code and extract an internationally adopted numerical identifier of seven or eight digits (such as GSN) that one could use to ensure uniformity with other databases with which one might wish to exchange data. I invite feedback and suggestions on this point.
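The first of these solutions, separate fields for numeric and alphanumeric identifiers, can be sketched with Python's built-in sqlite3 module standing in for whatever relational database is actually used. The table and column names are hypothetical, and the third row is an invented ship included only to show a numeric identifier alongside the signal-code ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE ship (
        ship_key   INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        numeric_id INTEGER UNIQUE,  -- LR/IMO or adapted official number, if any
        signal_id  TEXT UNIQUE      -- signal-code identifier, if any
    )
    """
)
conn.executemany(
    "INSERT INTO ship (name, numeric_id, signal_id) VALUES (?, ?, ?)",
    [
        ("Achille-Celestine", None, "FrHBGL76"),
        ("Achille", None, "FrHBGL11"),
        ("Hypothetical British ship", 1001234, None),  # invented example row
    ],
)

# Each kind of identifier is looked up in its own field.
rows = conn.execute(
    "SELECT name FROM ship WHERE signal_id = ?", ("FrHBGL11",)
).fetchall()
print(rows)  # [('Achille',)]
```

A translation file would take the same shape: one column for the readable alphanumeric code and one for the numeric identifier adopted by the co-operating databases.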
The critical conclusion
One need not necessarily even combine the information into an identifier in one’s own records until actually necessary to do so provided that the decision is made to record the necessary information.
The critical thing is to have identified that 4-letter signal codes, used in conjunction with other basic identifying information which one will generally record in any case, can generate unique identifiers for ships of similar usefulness to official numbers in databases and other records.
The practical consequence is to establish the usefulness of recording signal code letters together with the allocating nation for ships that did not have official numbers allocated to them. Many researchers – myself included hitherto – have not perceived signal letter codes to be worth recording. My analysis of how widely signal codes were allocated during the second half of the 19th century, and the demonstration that these can be useful as surrogates for official numbers, provide substantial grounds for changing that earlier perception. I am progressively adding signal codes for European ships to my datasets in order to use them as identifiers as I believe that the effort will be worthwhile. (I am also now doing this for American ships, as American signal code lists include the number of decks and number of masts, which are not included in the principal published record of American ships, and also offer a convenient quick and dirty way to track the survival of a ship.)
Multiple identifiers
A ship registered under the jurisdiction of two or more nations that allocated signal codes would have two or more possible codes compiled in the way I have suggested from which to choose, each uniquely specific to the same ship. The obvious convention is to select the first as the primary identifier in multinational databases. Exactly the same issue arises when deriving multinational identifiers from national official numbers and the same solution is appropriate.
The difficulty may arise that at a certain stage of research one may have access only to information relating to a ship’s second or third national career and initially be unaware of the first signal code allocated to it. Using the approach outlined here means that this does not matter unduly in practice. You can safely start off using the earliest identifier that you currently know for the ship in the confidence that it cannot be allocated to any other ship and that subsequent research will necessarily lead to the only possible original, which likewise cannot be allocated to any other ship (typographic errors and suchlike excepted, as always, of course). This is the beauty of deriving identifiers wholly from generally available, pre-existing primary and secondary records for a ship – or any other object of research, for that matter – rather than from identifiers created independently of, and bearing no systematic relationship to, the original documentation. With care in the selection of procedures, the alternative identifiers cannot be duplicated and further research will place them in the proper sequence: they can only lead to each other and nowhere else, and nothing else can lead to them. The approach is thereby “self-regulating and self-correcting”, a major practical advantage.
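The “self-correcting” behaviour can be illustrated with a small sketch in which each ship keeps every identifier found for it so far and the primary identifier is simply the earliest allocated. The class name ShipAliases, the identifiers FrABCD80 and NoWXYZ80 and the allocation years are all invented for the example.

```python
class ShipAliases:
    """Known identifiers for one ship, each tagged with the year it was allocated."""

    def __init__(self) -> None:
        self._year_by_identifier = {}  # identifier -> year of allocation

    def add(self, identifier: str, year_allocated: int) -> None:
        self._year_by_identifier[identifier] = year_allocated

    def primary(self) -> str:
        """Return the earliest-allocated identifier known so far."""
        return min(self._year_by_identifier, key=self._year_by_identifier.get)

    def all_identifiers(self) -> list:
        """Return every known identifier in order of allocation."""
        return sorted(self._year_by_identifier, key=self._year_by_identifier.get)


ship = ShipAliases()
ship.add("NoWXYZ80", 1895)     # code from a second national career, found first
print(ship.primary())          # NoWXYZ80

ship.add("FrABCD80", 1880)     # original code, discovered by later research
print(ship.primary())          # FrABCD80
print(ship.all_identifiers())  # ['FrABCD80', 'NoWXYZ80']
```

Records keyed on the later identifier remain linked to the ship after the original is found, because both identifiers belong to the same set and to no other ship.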
Summary and Conclusions
Despite a degree of “recycling” of ships’ signal letter codes, alphabetic signal codes can be adapted to provide a useful, practical and reliable means for uniquely identifying many European ships, paralleling the established use of official numbers for British and American and some other ships. The original signal codes can be adapted for the purpose in a simple way that can be applied by anyone anywhere without risk of confusion or duplication, which has the advantage that people can work independently in confidence of being able to link their work efficiently in the future with that of other researchers whom they may not yet have even heard of.
By extension, this conclusion confirms the value of recording signal codes for European ships as a reference tool even if you are not developing computer databases yourself as sooner or later, and increasingly, you are likely to be combining information with other information from a computerised database. Even if you will only ever combine information from a book or a letter with your handwritten records, the principle still applies. The efficiency of databases and information exchange is an issue for everybody.
Like most maritime historians of my acquaintance, locally and internationally, I have hitherto never bothered to record signal codes on the presumption that their only possible relevance was the unlikely wish to identify an unnamed ship in a painting from its signal hoist which one could happily leave to someone else. However, my analysis convinces me of the value of recording signal codes where available for ships never allocated official numbers or LR/IMO numbers and using these to supplement other means of identifying ships for information recording and data exchange.
Signal letter codes cannot completely bridge the gap between ships with LR/IMO or official numbers on the one hand, and all world shipping on the other (because not all ships were allocated signal codes) but they can make a contribution towards significantly reducing that gap and therefore the residual that can only be handled by more arbitrary and therefore less readily universalised means.
Deriving unique identifiers from the original records ensures that anyone using the same approach anywhere at any time will derive the same code and ensures that the resulting codes cannot be unintentionally duplicated. Moreover, they are self-regulating and self-correcting in that if you accidentally adopt the second rather than the first identifier applicable to a ship, subsequent research and data matching will lead you to the first, and neither can be allocated to any other ship, so that misidentification is impossible.
If the approach suggested here for deriving unique identifiers for European ships is adopted for the period from the 1860’s to World War I and beyond, an eventual by-product of collaboration would be one or more databases recording the succession of signal codes allocated to a ship by successive national jurisdictions (and in some cases also official numbers issued by some other nation). Such databases would not only enable the quick identification of the preferred identifier but potentially provide a means of linking all signal code based identifiers to alternative identifier codes used in other co-operating databases and also to other nation-specific records whether computerised or printed. Through such linkages, networks within networks of co-operating databases may evolve and the most convenient management practices emerge through experience with little waste of resources, as such structures and means of translation will ensure information exchange and convertibility by electronic means.
Outstanding issues
Information not yet available to me concerning how the European nations administered the allocation of signal codes in the 1865-1885 period may require revision of the suggested approach.
I do not yet have sufficient experience of operating relational databases to form a final view of whether alphanumeric codes should be (a) converted to wholly numeric codes even if avoidably large and less easy to check, (b) retained but used in a separate field from wholly numeric codes or (c) used solely in an auxiliary capacity to provide a reliable, convenient and easy means to link to and import alternative wholly numeric codes such as GSN’s. The last may well be the best and would certainly work but there need not actually be any single best – different procedures may be used in different circumstances depending on what is optimum for the task without necessarily recreating a
I have not investigated the stability of signal codes in the period after World War I through to the internationalisation of the allocation system in the 1930’s or the utility of signal codes as ship identifiers in databases from then through to the introduction of LR/IMO numbers in the 1960’s. Feedback on these points would be appreciated.
Please direct comments and information to j_lowe@ihug.co.nz