| Shortcuts |
| TOP |
| Morse |
| Baudot |
| Murray |
| ITA2 |
| FIELDATA |
| ASCII-1963 |
| ASCII-1967 |
| Functions |
| Colors |
| Messages |
| Notes |
| Sources |
| BOTTOM |
by Tom Jennings
email: tomj (at) wps . com
most recently revised 29 October, 2004
revision history
Entire contents copyright Tom Jennings 1999-2004. All rights
reserved.
ASCII is not art. It's a code, a way of hiding things within a smaller thing.
This document is about character codes, specifically a history of ASCII(1), the American Standard Code for Information Interchange, and its immediate ancestors: FIELDATA, ITA2, Murray's telegraphy code, Baudot's telegraphy code, and Morse's telegraphy code. It involves some forensic bitology.
ASCII, born at the dawn of the modern computer age (1958--1965), is perfectly representative of the period; clean,
The codes covered here are the beginning of a crude alphabet for our new machines' pidgin, a baby language, for better and worse, mindlessly mumbled sub-atomic particles of thoughts. There is a thread of research that believes that the internal dialog of human thought is formed by language, not the reverse, and I tend to agree with them. Our character codes certainly shape the things we express and think of electronically.
Character codes are a form of information compression, to accommodate the extreme lack of bandwidth available in paper, ink, or the tapping armature of a telegraph. The concept of characters and character-codes in ASCII is utterly inseparable from our Western, roman alphabet culture. You need the "one time pad"(2) of Western culture to understand it or make use of it at all.
If this history seems too conveniently linear -- it is. My approach was to start with the survivor -- ASCII-1967 -- and trace its direct lineage backwards, then write from the oldest forward. And I vastly simplified things, otherwise it would require a thousand pages and a large grant to pull off(3). This isn't a detailed history of the development of character codes per se, but of the codes themselves, the specific meaning of the individual character codes.
Background.
The history of electrical or electronic communications really
means the history of serial communications. Serial means
a symbol at a time, one after the other, in an agreed-upon
sequence. The concept isn't arbitrary, it seems to be inherent in
human language. Words are spoken one at a time, words have a
beginning and an end. While vision is "parallel", broad-side,
both alphabet-based and iconic languages look at one symbol or
ideograph at a time.
Character-based communication is fundamentally different from things like telephony. Characters reduce communications to discrete symbols (incorrectly called "digital"), while things like telephones and facsimile ("fax") are continuously variable (reasonably called "analog", as the vibrations in an earphone are an analogy, a flawed copy, of the vibrations your voice makes in a microphone).
The fantastic advantage of discrete symbolic communications is that the meaning can be modified mechanically. A trivial and silly example: every time you write "PLEASE SEND ME 9 FRUITCAKES" a machine transporting your email could change it to "PLEASE SEND ME 900 FRUITCAKES". (It also helps that there are so many layers of mediation you can't tell if a person or a machine wrote the symbols "9", "0", "0", etc.) This is because "meaning" is accessible; it is agreed that "9" is a number, a quantity. The meaning in the spoken-sound "nine" is quite well hidden to machines, so far.
Character codes are human codes, for surrogate machine organs. The history of each code is as complex as any human endeavor, only a lot more boring. But if you've persisted so far you might as well continue.
And finally, I will utterly ignore the most obvious end-result of all this electrical communication; the printed word on paper, because at that point, as far as character codes go, it all stops and leaves the mechanical/electrical/electronic realm.
We need to be clear on a few things before we start. While I've tried to limit jargon to an absolute minimum, I need to be clear on a few definitions used in the text:
The standard story has it that Samuel Morse invented the electric recording telegraph in 1837. Since there's a code called Morse code you'd think he'd created that too, and at the same time, but it's not how it happened.
Morse's original signalling scheme didn't involve the transmitting of codes for characters at all; he went one step further, and transmitted what was essentially just a numeric code. At each end of the telegraph, the receiver and the sender both would have a giant dictionary of words, each word numbered. Intelligence would be transported by the sender looking up each word in the "dictionary", obtaining its numerical code, and transmitting the numeric code for each word, for every word in the message. (It may have been his intent to automate this process, with each transported number controlling a mechanism that located each word.) The receiver would obviously perform the opposite function, receiving the numerical codes and converting them to words using the big dictionary. It was a reasonable enough approach, considering there was no (zero) experience to fall back on, but it seems to have disappeared by 1844, when the famous "WHAT HATH GOD WROUGHT" message was sent. Morse's assistant, Alfred Vail, allegedly worked up the code we think of as "Morse's code", which though still cryptic, had a small, finite(9) set of symbols that were already well understood -- the roman alphabet.
What may not be obvious is that Morse's system is also a recording system; it recorded its signals on a narrow strip of paper with a pen, making little wiggles when the voltage on the wire changed; the operator would decode these afterwards. The practice of decoding Morse's code by ear didn't happen for a few decades, and only once it became a character code. Later systems pricked holes with a needle. (This appears to be the direct and immediate predecessor to all "paper tape" storage systems nearly always associated with teleprinter equipment.) One scheme raised bumps on paper; Morse used it in 1844 to receive the famous "WHAT HATH GOD WROUGHT" message from the Supreme Court in Washington D.C. to Baltimore, shown below. (The original tape is in the Smithsonian Museum in Washington D.C.)
(There's an interesting story within this early use of "tele-communications", in the inscription Samuel Morse wrote along the top of this historic artifact:
This sentence was written from Washington by me at the Baltimore Terminus at 8h. 45 min. on Friday, May 24th. 1844, being the first ever transmitted from Washington to Baltimore by telegraph, and was indited by my much loved friend Annie G. Ellsworth. Signed...
He appears just as confused about 'here-and-there' as any novice internet emailer, with the "from Washington by me at...Baltimore", repeated in the same brief sentence.)
| symbols | characters |
| 1 | . E | _ T |
| 2 | .. I | ._ A | _. N | _ _ M |
| 3 | ... S | .._ U | ._. R | ._ _ W | _.. D | _._ K | _ _. G | _ _ _ O |
| 4 | .... H | ..._ V | .._. F | ._.. L | _... B | ._ _. P | _.._ X | _._. C | _ _.. Z | _._ _ Y | ._ _ _ J | _ _._ Q |
| 5 | ..... 5 | ...._ 4 | _.... 6 | ..._ _ 3 | _ _... 7 | ._._. + | _..._ = | _.._. / | .._ _ _ 2 | _._ _. ( | _ _ _.. 8 | ._ _ _ _ 1 | _ _ _ _. 9 | _ _ _ _ _ 0 |
| 6 | .._ _.. ? | ._._._ . | _...._ - | _._ _._ ) |
SPECIAL NOTES ON THIS TABLE: Putting telegraphy code into a table is problematic; unlike the rest of the codes in this document, imposing modern base-two conventions on telegraphy codes isn't so easy. For comparison purposes, I've arranged the code into a table based solely upon the number of asserted elements (short and long, aka dot and dash, or dit and dah, etc.) in each character, to attempt a correlation with later codes and to see if there are any length-based criteria for the seemingly arbitrary arrangement of characters in the table. You decide if it was worth it. Suggestions welcome. Also, shown here is the modern International Morse Code; rather than attempt to document the original numeric code, I thought it more useful to start with the first character-based code. From what I can see of the 1844 tape, shown further above, the alphabet looks the same; the International code varies mostly in a number of punctuation symbols, which are contentious anyway. Most of the punctuation symbols are six or more elements. Further, for comparison purposes, I've only included the punctuation symbols that were used in the later Baudot and early teleprinter codes, which are hopefully more representative of the period than the current rich set.
Telegraphy is about pushing symbols over a distance using a long wire. Everything else is secondary to this goal.
In 1837, things of precision were made of wood and brass and steel, purely mechanical. Electricity had recently become regularized ("understood" is too strong a word), and the electro-magnet (copper wire wound 'round a steel core, with a movable armature hung closely over it but not touching) was the arbiter between the electrical and the mechanical; a switch, metal moved by a human hand or machine, was the arbiter between the mechanical and the electrical world. A long and vastly expensive wire between the two, with a battery to provide electrical power, provided instantaneous communication between places possibly miles apart.
Samuel Morse's system is today called "digital" but is more accurately called "discrete"; with a switch, controlled by a human hand or a machine, battery voltage is applied or removed from the wire, and detected instantaneously at the other end. The states are "discrete" in that they do not rely on finely measuring the voltage; its mere presence or non-presence suffices.
Skipping theory, information is sent from one end to the other by changing the state of the wire over time, in a manner agreed-upon at both ends.
Morse's first system, in 1837, wasn't very nice. The sender used a cumbersome code, involving a metal slug with coded notches, one for each possible word(16)(7). The sending machine "felt" the notches across the top of the slug, and impressed on/off voltages on the wire. The far-away receiver ticked up and down in time with the on/off voltages, inscribing marks on a strip of paper. The marks were looked up in a book, a "dictionary" of codes and matching words, the words written down, and the message deciphered.
By 1844 this awful scheme was abandoned in favor of a simple alphabetic code, still making scratches on paper strips, but no longer needing tedious, bulky and expensive dictionaries and confusing metal slugs. We need not think of it ever again.
The new alphabetic code, now called "Morse's Code" (even though Morse didn't create it, it was used on his hardware and he got the credit) recognizes four different states of the wire: voltage-on long ("dah" or dash), voltage-on brief ("dit" or dot), voltage-off long (space between characters and words), voltage-off brief (space between dits and dahs)(5). (When sent by hand, it's casually considered to have only two symbols, "dit" and "dah".)
Like all modern codes, Morse code is built up from smaller symbols; characters (letters, numbers, punctuation, etc) are encoded as a series of dits, dahs, and spaces between.
| N | E | W | S | A | T | 1 | 1 | ||
| -· | · | ·-- | ··· | ·- | - | ·---- | ·---- |
It is a variable-length code, designed so that the most common characters are short -- the letter "E" is a single symbol, while "1", occurring less often, is five symbols (considering that for human purposes the dit or dah and the brief space that follows it is a unit). The vowels are all brief and simple, and less-common letters use longer sequences, not that there was much science behind the letter-frequencies, apparently, as there isn't much correspondence between modern, machine-counted letter frequencies and the Morse code lengths beyond the first two or three characters.
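The variable-length idea can be sketched in a few lines of Python. This is only an illustration: the dictionary holds the International Morse letters and digits from the table above, written with "." and "-", and the `encode` helper and its " / " word separator are conventions assumed here, not part of any telegraph standard.

```python
# International Morse letters and digits, "." = dit, "-" = dah.
MORSE = {
    "A": ".-",   "B": "-...", "C": "-.-.", "D": "-..",  "E": ".",
    "F": "..-.", "G": "--.",  "H": "....", "I": "..",   "J": ".---",
    "K": "-.-",  "L": ".-..", "M": "--",   "N": "-.",   "O": "---",
    "P": ".--.", "Q": "--.-", "R": ".-.",  "S": "...",  "T": "-",
    "U": "..-",  "V": "...-", "W": ".--",  "X": "-..-", "Y": "-.--",
    "Z": "--..",
    "0": "-----", "1": ".----", "2": "..---", "3": "...--", "4": "....-",
    "5": ".....", "6": "-....", "7": "--...", "8": "---..", "9": "----.",
}


def encode(message):
    """Encode a message; characters are separated by a space and
    words by " / " (a hypothetical convention for this sketch)."""
    words = message.upper().split()
    return " / ".join(" ".join(MORSE[ch] for ch in word) for word in words)


print(encode("NEWS AT 11"))
# Compare element counts for a common letter and a digit.
print(len(MORSE["E"]), len(MORSE["1"]))
```

Printing `len(MORSE[ch])` for each character is a quick way to compare code lengths against any letter-frequency table you care to try.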
Note that there are no "format effectors", that is, codes that control how the transmitted symbols are to be displayed; that's a little too modern yet. There is also no non-printing space character yet; like the Arabic invention of zero, it wasn't needed until a positional notation came into use with mechanical teleprinters, some 50 years in the future. In telegraphy, you simply pause briefly.
The beginnings of modern serial communications
In France, Emile Baudot designed his own "printing telegraph" system in 1874. The code itself was developed by two of his cohorts, Johann Gauss and Wilhelm Weber as part of the overall system(), though apparently the code was used for cryptography by Sir Francis Bacon as early as 1605(15). Unlike Morse's code, all of the symbols in "Baudot's" code are the same length -- five symbols, making mechanical encoding, and more importantly, decoding, vastly easier(11). The design is quite alien by today's standards, a multiple-wire synchronous multiplex system, where the human operators did the "time slicing". Codes were generated by a device with five piano-like keys, operated with two fingers on the left hand, and three from the right. Synchronization with the "network" was done by the human operator listening for a "cadence signal", and took quite some skill to operate. Printing was done automatically and mechanically. As crude as the hardware is, the fixed-length code was a breakthrough (if an obvious one in hindsight).
Another important side effect of Baudot's method is that it relies on only two states of the wire; the presence of a voltage, or not. This provides an added level of reliability by lowering the possibility of errors, and has a sound mathematical basis later expanded upon by Claude Shannon in the mid 20th century.
| LTRS | ||||||||
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | |
| none | undef | A | E | É | I | O | U | Y |
| IV | FIGS | J | G | H | B | C | F | D |
| V | LTRS | C | X | Z | S | T | W | V |
| both | (note) | K | M | L | R | Q | Z | P |
| FIGS | ||||||||
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | |
| none | undef | 1 | 2 | & | 3 | 4 | O | 5 |
| IV | FIGS | 6 | 7 | H | 8 | 9 | F | 0 |
| V | LTRS | . | , | : | ; | ! | ? | ' |
| both | (note) | ( | ) | = | - | / | No | % |
A schematic Baudot keyboard is shown to the right;
note how the fingers are labelled. The fingers of the left hand,
IV and V, denote rows in the
table to the left; the three fingers of the right hand,
I, II and III,
form the column number; eg. finger I by itself
is 1 ("A"), II
by itself is 2 ("E"), both
I and II are 3
("É"), etc. V is the most
significant digit (har har); I the least. (Note
that Eric Fischer has some problems with this
table(12).)
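The finger arithmetic just described can be sketched as a lookup. The table contents are copied from the LTRS case as given in this document; the weights 1, 2 and 4 for fingers I, II and III follow the description above, while the names `LTRS_TABLE` and `lookup` are made up for this illustration.

```python
# LTRS case of the Baudot table as laid out in this document.
# Rows are chosen by the left-hand fingers (IV, V), columns by the right.
LTRS_TABLE = {
    "none": ["undef", "A", "E", "É", "I", "O", "U", "Y"],
    "IV":   ["FIGS", "J", "G", "H", "B", "C", "F", "D"],
    "V":    ["LTRS", "C", "X", "Z", "S", "T", "W", "V"],
    "both": ["(note)", "K", "M", "L", "R", "Q", "Z", "P"],
}

# Right-hand fingers form the column number: I is the least significant.
RIGHT_WEIGHTS = {"I": 1, "II": 2, "III": 4}


def lookup(fingers):
    """Return the LTRS-case symbol for a set of pressed fingers."""
    pressed = set(fingers)
    column = sum(w for f, w in RIGHT_WEIGHTS.items() if f in pressed)
    left = pressed & {"IV", "V"}
    if len(left) == 2:
        row = "both"
    elif left:
        row = next(iter(left))
    else:
        row = "none"
    return LTRS_TABLE[row][column]


print(lookup(["I"]))        # A
print(lookup(["II"]))       # E
print(lookup(["I", "II"]))  # É
```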
There are two sub-tables, marked FIGS and LTRS. The table from which the character indicated by the finger-code comes depends on the most-recently-pressed FIGS or LTRS key; these two keys specify which code table both the sender and receiver should use.
It's not much worse than having to remember to press the SHIFT key on your computer keyboard to get the % character above the 5 key. The alternative was using a sixth finger, which was probably deemed to be even more cumbersome.
(Note) Not falling into any particular category are two codes that indicate "error" or "erasure", by pressing the two left-hand keys only. These indicate that the last character should be ignored, and produce a character similar to an ASCII asterisk *. This function mutated into DEL over the next few decades.
Telegraphy codes are not "numerical". These codes are not random, scrambled, insensible, etc. There is subtlety here recent computer weenies can't grasp easily. It's easy to complain that machine collation is "impossible" with these codes, but keep in mind that it simply wasn't on the agenda to do so; and that machines even capable of such manipulations of character streams were more than half a century in the future.
Where Morse's code was asymmetrical and frequent letters brief, Baudot's code is arranged to minimize hand and finger motion and fatigue, and to "make sense" to the human hand. Forcing it into this table destroys that vision, but is done for my modern comparison purposes, and with the assumption you are not using it to learn to send the Baudot code; it makes the code appear random and badly designed; this is not true. (This is exactly the sort of thing Thomas Kuhn wrote about in "The Structure of Scientific Revolutions").
I cribbed the code from two unattributed scans of apparently old graphics occasionally seen on the net, found at (7) (10). In these tables the finger positions are labelled I II III ... V. Today this numbering implies that I is given a weight of 1, II a weight of 2, etc., and this is further supported by the order of characters in the table of International Telegraph Alphabet 1 (ITA1), where A, code "1" in my table and the forefinger of the right hand, has an impulse in the first (out of five) position only. (This further assumes that ITA1 characters were sent least-significant impulse first; with no previous experience before them, it's hard to imagine that the designers of ITA1 didn't issue first the impulse labelled "1st impulse", followed by the remaining four.) This pattern holds true for the rest of the characters in the ITA1 table.
Note that there is no explicit non-printing space character; FIGURE BLANK and LETTER BLANK were used for this purpose(10), so there were in fact two "space" characters, in addition to their case-select functions. There are no "format effectors", such as CR or LF, though ITA1 does contain both.
Many of the unusual characters varied from implementation to implementation, such as É and others. However a common subset of uncontested characters remains constant throughout all of the codes in this article.
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | |
| LTRS | ||||||||||||||||
| undef | A | E | É | I | O | U | Y | FIGS | J | G | H | B | C | F | D | |
| LTRS | C | X | Z | S | T | W | V | (note) | K | M | L | R | Q | Z | P | |
| FIGS | ||||||||||||||||
| undef | 1 | 2 | & | 3 | 4 | O | 5 | FIGS | 6 | 7 | H | 8 | 9 | F | 0 | |
| LTRS | . | , | : | ; | ! | ? | ' | (note) | ( | ) | = | - | / | No | % | |
An astute (or obsessive; your call) person by now has noticed that there are only 32 combinations possible using five fingers, not enough for all the codes in the tables above. There are actually 64 symbol positions in the Baudot code. This is handled by splitting the codes into two "cases", and stealing two codes to specify which case to use. Think of an old mechanical typewriter, where "SHIFT" actually moves the paper and platen up or down, to get at the different cases, or rows of characters; "SHIFT" actually doubles the number of available characters using the same number of print hammers. These special symbols are named FIGS ("figures") and LTRS ("letters"). To type COST 700 DOLLARS you would press the following keys:
| C | O | S | T | space | [FIGS-SHIFT] | 7 | 0 | 0 | space | [LTRS-SHIFT] | D | O | L | L | A | R | S |
Here "space" is the space bar, and FIGS-SHIFT and LTRS-SHIFT act like the typewriter's "SHIFT" key. This is quite practical, because as this paragraph shows, written communication is mostly letters, so it isn't as awful as it sounds; you send a "LTRS" code, then all of the codes that follow are assumed to be in the "letters" code table; when a "FIGS" code is sent, all of the codes that follow are taken from the "figures" table. Therefore, in current ITA2 code, code number 6 means either I if LTRS was the last case-code sent, or 8 if FIGS was last sent.
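The case-shift mechanism can be sketched as a tiny encoder and decoder. The two lists are the LTRS and FIGS cases of the first ITA2 table in the ITA2 section of this document, and the code number is assumed to be row times 16 plus column, as that table is laid out. The function names are hypothetical, and starting in the LTRS case is an assumption made for the sketch.

```python
# ITA2 cases as tabulated in this document; list index = code number.
LTRS_CASE = ["BLANK", "E", "LF", "A", "sp", "S", "I", "U",
             "CR", "D", "R", "J", "N", "F", "C", "K",
             "T", "Z", "L", "W", "H", "Y", "P", "Q",
             "O", "B", "G", "FIGS", "M", "X", "V", "LTRS"]
FIGS_CASE = ["BLANK", "3", "LF", "-", "sp", "'", "8", "7",
             "CR", "WRU", "4", "BEL", ",", "undef", ":", "(",
             "5", "+", ")", "2", "undef", "6", "0", "1",
             "9", "?", "undef", "FIGS", ".", "/", "=", "LTRS"]
TABLES = {"LTRS": LTRS_CASE, "FIGS": FIGS_CASE}


def encode(text):
    """Turn text into code numbers, inserting shift codes as needed."""
    case = "LTRS"  # assumed starting case
    codes = []
    for ch in text:
        name = "sp" if ch == " " else ch
        if name not in TABLES[case]:
            other = "FIGS" if case == "LTRS" else "LTRS"
            if name not in TABLES[other]:
                raise ValueError(f"no code for {ch!r}")
            codes.append(TABLES[case].index(other))  # send the shift code
            case = other
        codes.append(TABLES[case].index(name))
    return codes


def decode(codes):
    """Turn code numbers back into text, tracking the current case."""
    case = "LTRS"
    out = []
    for code in codes:
        name = TABLES[case][code]
        if name in TABLES:          # LTRS or FIGS: switch tables
            case = name
        elif name == "sp":
            out.append(" ")
        else:
            out.append(name)
    return "".join(out)


codes = encode("COST 700 DOLLARS")
print(codes)
print(decode(codes))
```

Note that the space needs no shift of its own because it sits at the same code number in both cases.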
It may seem odd that this wasn't used to generate "upper" and "lower" case letters, eg. a vs. A, b vs. B etc., at least in the electrical communications world, for nearly a half-century. It was used to cram the minimum number of symbols into five digits to handle basic communications; letters, numbers, punctuation. This technique was in common use until the advent of the ASCII code, covered later.
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | |
| LTRS | ||||||||||||||||
| 0 | BLANK | E | LF | A | LTRS | S | I | U | CR | D | R | J | N | F | C | K |
| 1 | T | Z | L | W | H | Y | P | Q | O | B | G | FIGS | M | X | V | DEL |
| FIGS | ||||||||||||||||
| 0 | BLANK | 3 | LF | undef | LTRS | ' | 8 | 7 | CR | 2 | 4 | 7/ | - | 1/ | ( | 9/ |
| 1 | 5 | . | / | 2 | 5/ | 6 | 0 | 1 | 9 | ? | 3/ | FIGS | , | £ | ) | DEL |
Between 1899 and 1901, Donald Murray, an agricultural college graduate who joined the New Zealand Herald newspaper when farming "proved unsuitable" for him(18), developed an automatic telegraphy system, using what he thought were the best features of the Baudot multiplex system. Rather than the difficult "piano" key encoding system, his scheme used a more reasonable (to modern souls) typewriter-like keyboard mechanism that automatically generated the bit-level codes, and presumably handled the synchronization.
Since people didn't have to impress bit patterns onto the wires with their fingers, he was free to arrange his code for the benefit of the machinery; all that the operators had to do was press the appropriately-labeled key top, the machinery did the dirty work.
Murray's criterion was to minimize the number of mechanical operations per character; the most common characters have codes that contain the fewest 0-to-1 transitions. The letter E, with only one of five bit positions having a 1, moves only one bit's worth of mechanism per character; and punches only one hole in a paper tape (one lever, one punch/die movement), reducing wear on the machinery, no small matter when you consider that a single-spaced, typed page of text is approximately 2000 characters, or more than 10,000 character "bits", each bit having at least one mechanical component that moves, needs oiling, adjustment, etc.
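A rough way to check this against the table is to count the 1 bits (holes) in each letter's code. The sketch below assumes the code number is row times 16 plus column in the Murray LTRS table above, and that each 1 bit corresponds to one punched hole; it counts holes rather than 0-to-1 transitions, which is a simplification. The names `MURRAY_LTRS` and `holes` are invented for the example.

```python
# LTRS case of Murray's code as tabulated above; list index = code number.
MURRAY_LTRS = ["BLANK", "E", "LF", "A", "LTRS", "S", "I", "U",
               "CR", "D", "R", "J", "N", "F", "C", "K",
               "T", "Z", "L", "W", "H", "Y", "P", "Q",
               "O", "B", "G", "FIGS", "M", "X", "V", "DEL"]


def holes(symbol):
    """Number of 1 bits (punched holes) in the symbol's five-bit code."""
    return bin(MURRAY_LTRS.index(symbol)).count("1")


letters = [s for s in MURRAY_LTRS if len(s) == 1 and s.isalpha()]
by_holes = {}
for letter in sorted(letters):
    by_holes.setdefault(holes(letter), []).append(letter)

for count in sorted(by_holes):
    print(count, " ".join(by_holes[count]))

# The page arithmetic from the text: 2000 characters of five bits each.
print(2000 * 5)
```

Under these assumptions only E and T get away with a single hole, which fits the wear-reduction story for the two most common English letters.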
Two items of note here: the codes shown above as LF and CR are shown in my meager data, here and here, as COL and LINE PAGE, respectively. I admit it is flimsy evidence, but it is probably not a coincidence that in the subsequent ITA2 code, these same codes have those functions. So I let my assertion stand(0).
Western Union Telegraph Company purchased the American rights to Murray's design(15), and after modifying the FIGURES case (dumping the peculiar fractionals and other obsolete characters in favor of their own) used the code through the 1950's. Murray's code is the last ad-hoc character code of historical note in this thread; at this point, telegraphy networks were large enough to not tolerate hacker meddling with "better" systems, instead favoring lumbering, international-committee codes with infrequent change. Murray's code, as modified by Western Union, and with the exception of a few "national use" characters, was adopted by CCITT (International Telegraph and Telephone Consultative Committee, "CCITT" if you're French) as the ITA #2 code (International Telegraphy Alphabet), covered next in our story.
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | |
| LTRS | ||||||||||||||||
| 0 | BLANK | E | LF | A | sp | S | I | U | CR | D | R | J | N | F | C | K |
| 1 | T | Z | L | W | H | Y | P | Q | O | B | G | FIGS | M | X | V | LTRS |
| FIGS | ||||||||||||||||
| 0 | BLANK | 3 | LF | - | sp | ' | 8 | 7 | CR | WRU | 4 | BEL | , | undef | : | ( |
| 1 | 5 | + | ) | 2 | undef | 6 | 0 | 1 | 9 | ? | undef | FIGS | . | / | = | LTRS |
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | ||
| LTRS | |||||||||||||||||
| 0 | BLANK | E | LF | A | sp | S | I | U | CR | D | R | J | N | F | C | K | |
| 1 | T | Z | L | W | H | Y | P | Q | O | B | G | FIGS | M | X | V | LTRS | |
| FIGS | |||||||||||||||||
| 0 | BLANK | 3 | LF | - | sp | BEL | 8 | 7 | CR | $ | 4 | ' | , | ! | : | ( | |
| 1 | 5 | " | ) | 2 | # (f) | 6 | 0 | 1 | 9 | ? | & | FIGS | . | / | ; | LTRS | |
ITA2 (International Telegraph Alphabet #2) is the real name for the code often called "Baudot", though it retains the gross characteristics of the Baudot and Murray codes before it, with its five-level code and "case" concept. No teleprinter was ever made that used Baudot's code. Even the ARRL's Radio Amateur Handbook(1) calls ITA "Baudot", casually. Baudot's code was replaced by Murray's code in 1901. And ITA2 replaced both by the early 1930's, so virtually all "teletype" equipment made in the U.S. uses ITA2 or the U.S.-national version of the code.
Though ITA2 is structurally similar, it departs from Baudot in a number of ways. The printing characters are again scrambled, in a so-far-mysterious way(6). "Format effectors" appear for the first time; codes that do not cause printing of a symbol, but control the physical arrangement of characters on a page, specifically, CR and LF, and I suppose you might call BEL one, in that it might cause a human operator to do something useful. The European characters were dropped, and an explicit non-print space code was added.
Since the five-bit scheme of telegraphy was retained, the table above is read the same way as before; when LTRS is received, codes that follow are taken from the first two rows of the table, marked "LTRS"; when a FIGS code is received, codes that follow are assumed to be in the last two rows, marked "FIGS". You can see that some symbols and functions appear in both cases, CR, LF, space, NUL, and of course the case control codes LTRS and FIGS.
In the real world of smelly, oily, metal machinery, the handling of LTRS and FIGS is more complicated; some machines revert to LTRS state after the end of a line, and some after every space. This is for various error-recovery reasons and isn't part of the code. Note, however, that if you were to look at a stream of ITA2 codes, such as generated by amateur radio users, you must know to "unshift on space" (as it's called) and "unshift on linefeed", that is, explicitly insert the implicit LTRS code. At a typical speed of 45 baud, shortcuts are common!
ITA2 and its relatives have very few control codes, reflecting their telegraphy and non-automatic roots; the only real transmission control is WRU, the "WHO ARE YOU?" function, and arguably BEL, which rings the bell. Teletypes and similar ITA2-coded machinery were pressed into service in early computing simply because they were the only symbolic machinery around; it wasn't like anyone really liked them; they were horribly slow amongst other things, and besides, in the primordial age of computing (1930 - 1960), only a few visionaries like Alan Turing or Vannevar Bush saw any need for computers to even have the ability to process alphabetic symbols (most assumed computers were for calculating with numbers, imagine!).
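Here is a minimal sketch of what "unshift on space" does to a decoder. It reuses the first ITA2 table from this section, with code numbers assumed to be row times 16 plus column; the `unshift_on_space` flag is a made-up name, and "unshift on linefeed" would be handled the same way for LF.

```python
# ITA2 cases as tabulated in this section; list index = code number.
LTRS_CASE = ["BLANK", "E", "LF", "A", "sp", "S", "I", "U",
             "CR", "D", "R", "J", "N", "F", "C", "K",
             "T", "Z", "L", "W", "H", "Y", "P", "Q",
             "O", "B", "G", "FIGS", "M", "X", "V", "LTRS"]
FIGS_CASE = ["BLANK", "3", "LF", "-", "sp", "'", "8", "7",
             "CR", "WRU", "4", "BEL", ",", "undef", ":", "(",
             "5", "+", ")", "2", "undef", "6", "0", "1",
             "9", "?", "undef", "FIGS", ".", "/", "=", "LTRS"]


def decode(codes, unshift_on_space=False):
    """Decode code numbers; optionally fall back to LTRS after a space."""
    case = LTRS_CASE
    out = []
    for code in codes:
        name = case[code]
        if name == "LTRS":
            case = LTRS_CASE
        elif name == "FIGS":
            case = FIGS_CASE
        elif name == "sp":
            out.append(" ")
            if unshift_on_space:
                case = LTRS_CASE  # the implicit LTRS the sender skipped
        else:
            out.append(name)
    return "".join(out)


# FIGS 7 0 0 space T O, sent with no explicit LTRS after the space.
stream = [27, 7, 22, 22, 4, 16, 24]
print(decode(stream, unshift_on_space=True))   # 700 TO
print(decode(stream, unshift_on_space=False))  # 700 59
```

The same seven codes read as two different messages depending on a convention that appears nowhere in the stream itself, which is exactly why you have to know what the sending machine did.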
Codes designed to cover both traditional communication and new-fangled computers, such as FIELDATA, added many control functions, and untangled some of the by-now-annoying features such as jumbled alphabets.
Differences between ITA2 and U.S. TTY(1)
The two codes are nearly identical, differing only in the FIGURES case.
This was the electro-mechanical age, where it was far easier to change a teletype's print-head than it was to translate codes from one to another. (I have a type basket for my Model 28 Teletype, circa 1964, that has a very rich character set, but it is scrambled to be compatible with some long-dead IBM punched-card equipment, and is hence unusable.)
Buried in the ITA code is a remnant of what is likely a seventy-five year old compatibility war, probably between two large equipment manufacturers.
The alleged controversy is over the ordering of bits across a row of paper tape, the storage medium of the time. I unearthed this corpse whilst trying to convert some old amateur-radio 5-level tapes to modern disk files. To make a long story short, after reading the physical tapes into my computer, I found that all of the bit patterns on the 5-level tape were reversed, left-to-right. After rigorously double-checking hardware and coding conventions I began to suspect that certain manufacturers' equipment punched holes left-to-right, and some right-to-left. As long as you read tapes on the same brand of equipment they were punched on the bits came out just fine; the problem is when you punch a tape on Brand X, then read it on Brand Y's reader -- everything is backwards!
| Character | Defined | Reversed | Result |
| BLANK | 0 0 0 0 0 | 0 0 0 0 0 | symmetrical |
| space | 0 0 1 0 0 | 0 0 1 0 0 | symmetrical |
| LTRS | 1 1 1 1 1 | 1 1 1 1 1 | symmetrical |
| FIGS | 1 1 0 1 1 | 1 1 0 1 1 | symmetrical |
| CR | 0 1 0 0 0 | 0 0 0 1 0 | equals LF |
| LF | 0 0 0 1 0 | 0 1 0 0 0 | equals CR |
Now this isn't necessarily fatal; if data on a Brand X tape is transmitted through a network built with Brand Y equipment, while the data was en route it would appear scrambled; but upon reaching its destination, it would be just fine when finally read on a Brand X tape reader.
This problem appears to have been solved with a compromise; the characters that are "transmission control" related, the ones that would most affect the movement of this data through the wrong-brand network, are bit-wise symmetrical -- the codes for FIGS, LTRS, space and BLANK -- are the same reversed left to right! Further, the codes for CR and LF, equal each other when reversed left to right!
The CR/LF reversibility is useful because CR followed by LF produces the same result as LF followed by CR on page printers.
Other symmetrical characters include C, R, Y and Z. I'd be curious to know if any of these characters were used in de facto protocols.
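The symmetry claim is easy to check mechanically. The sketch below reverses each five-bit code and lists the ones that survive unchanged; it assumes the ITA2 LTRS table from this section with code number equal to row times 16 plus column, which agrees with the bit patterns in the table above (space is 0 0 1 0 0, for instance). `reverse5` is a name invented here.

```python
# LTRS case of ITA2 as tabulated in this section; list index = code number.
ITA2_LTRS = ["BLANK", "E", "LF", "A", "sp", "S", "I", "U",
             "CR", "D", "R", "J", "N", "F", "C", "K",
             "T", "Z", "L", "W", "H", "Y", "P", "Q",
             "O", "B", "G", "FIGS", "M", "X", "V", "LTRS"]


def reverse5(code):
    """Mirror a five-bit code left to right, as a backwards tape reader would."""
    return int(format(code, "05b")[::-1], 2)


symmetric = [name for code, name in enumerate(ITA2_LTRS)
             if reverse5(code) == code]
print(symmetric)

# CR and LF swap places when the tape is read backwards.
cr, lf = ITA2_LTRS.index("CR"), ITA2_LTRS.index("LF")
print(ITA2_LTRS[reverse5(cr)], ITA2_LTRS[reverse5(lf)])
```

Only eight of the 32 five-bit patterns can be palindromes, and under these assumptions they are exactly BLANK, space, R, C, Z, Y, FIGS and LTRS.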
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | |
| 0 | IDL | CUC | CLC | CHT | CCR | CSP | a | b | c | d | e | f | g | h | i | j |
| 1 | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z |
| 2 | D0 | D1 | D2 | D3 | D4 | D5 | D6 | D7 | D8 | D9 | SCB | SBK | undef | undef | undef | undef |
| 3 | RTT | RTR | NRR | EBE | EBK | EOF | ECB | ACK | RPT | undef | INS | NIS | CWF | SAC | SPC | DEL |
| 4 | MS | UC | LC | HT | CR | sp | A | B | C | D | E | F | G | H | I | J |
| 5 | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z |
| 6 | ) | - | + | < | = | > | _ | $ | * | ( | " | : | ? | ! | , | ST |
| 7 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ' | ; | / | . | SPEC | BS |
(NOTE: The names of FIELDATA supervisory codes are not standardized.)
The FIELDATA character code is part of an Army communications system that existed from 1957 through the early/mid 1960's; while it saw no use in commercial communications equipment as far as I can tell, it had an enormous influence on the design of ASCII. ASCII's design was well under way when FIELDATA was deployed, and at least one person worked on both standards (Leubbert (9) (13) ).
FIELDATA isn't just a code; it's a design for "information interchange" that includes an electrical specification and adapters(8) to make peripheral equipment (teletypes, etc.) compatible with FIELDATA computers, such as the MOBIDIC(14) (Sylvania) and BASICPAC and LOGICPAC (Philco).
All of the FIELDATA equipment is long obsolete, though the code itself lingers to this day, unfortunately, in legacy COBOL software (UNIVAC computers used a mushed-up version of FIELDATA as their internal character code); can you say "Y2K"(7)? For all intents and purposes "FIELDATA" today refers to the character code. It, or a minor variant, is sometimes called the "DoD standard 8-bit code".
FIELDATA explicitly incorporates for the first time the concept of "control codes" (called in FIELDATA Supervisory codes), for in-band signalling. For this purpose, a seventh bit (called the "tag") determines which code table to use; 1 for the alphabetic set, 0 for the Supervisory set. As a computer internal code, these are combined into one table, seven bits in width.
A lot of effort went into transmission error analysis, which appears to have influenced the supervisory code choices. In-band signalling was addressed directly, and while the codes today appear quite haphazard, they introduced the concept in a workable way. Message-format functions were included (SCB, ECB, SBK, EBK, EBE), as were error correction/flow control functions (RTT, RTR, NRR, EOF, RPT).
Alphabetic and numerical characters are in collation order; simple arithmetic comparisons perform traditional sorting, with the non-printing-space character positioned before A. Characters in the table are arranged such that the alphabet, numbers, and "math" and graphic character sub-sets are isolatable with simple bit-masks. (This was later taken to an extreme in ASCII.)
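Both properties can be tried out in a few lines. This is only a sketch, and it assumes that a character's numeric value is row × 16 + column in the reference table above (so the space is 0x45 and A is 0x46). The names `ROWS`, `ENCODE`, `fieldata_key`, `is_letter`, `is_digit` and `is_graphic` are mine, for illustration, and come from no FIELDATA document.

```python
# Alphabetic half of the reference table (rows 4-7), transcribed row by row.
# Assumption: code value = row * 16 + column, as the table is laid out here.
ROWS = {
    4: ["MS", "UC", "LC", "HT", "CR", " ", *"ABCDEFGHIJ"],
    5: list("KLMNOPQRSTUVWXYZ"),
    6: [*')-+<=>_$*(":?!,', "ST"],
    7: [*"0123456789';/.", "SPEC", "BS"],
}

ENCODE = {
    name: row * 16 + col
    for row, names in ROWS.items()
    for col, name in enumerate(names)
}


def fieldata_key(text):
    """Sort key: the sequence of code values, compared arithmetically."""
    return [ENCODE[ch] for ch in text]


def is_letter(code):
    """A..Z occupy one unbroken run of codes, 0x46 through 0x5F."""
    return 0x46 <= code <= 0x5F


def is_digit(code):
    """Digits share row 7; the low four bits are the digit's value."""
    return (code & 0x70) == 0x70 and (code & 0x0F) <= 9


def is_graphic(code):
    """Row 6 (the "math" and graphic characters) falls out of a single mask."""
    return (code & 0x70) == 0x60


words = ["BAKER", "ABLE 2", "ABLE", "ABLE A"]
print(sorted(words, key=fieldata_key))
```

Under that assumption the space sorts ahead of A, and letters sort ahead of digits, so "ABLE A" lands before "ABLE 2"; ASCII orders those two the other way around.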
It is a reasonably well-designed code, considering that it broke so much new ground, and we can assume the Army's experience affected the then-current design of ASCII. It was a far-thinking solution to a perennial problem, and the Army had the large equipment base, the need, and most importantly the budget to pull it off.
Needless to say, it didn't solve all problems; FIELDATA is riddled with now-obvious bad ideas, redundancy, missing functions (present in codes before, and after, FIELDATA), etc. But don't be too harsh, it was long long ago in a galaxy far far away (there were probably no more than a few thousand commercial computers in existence at this time).
It is important to note that unlike today, where a character code is assumed to be a single, atomic unit, in FIELDATA the definition is different, and reflects the state-of-the-art of the time. While FIELDATA is essentially a 6-bit code, the definition(9) states that there is an underlying 4-bit "detail" (the column of the table above), two "indicator" bits that select one of four rows within an alphabet (Supervisory or Alphabetic), and the "tag" bit, making it variably a six or seven bit code. (Computers of the time considered characters to be six bits in width, which is why many computers had register and memory widths of 18 and 36 bits.)
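The three fields can be picked apart with shifts and masks. The sketch below assumes the bit order implied by the table layout: tag in the most significant bit, then the two indicator bits, then the four detail bits, with detail selecting the position within a row. The source documents may number or order the bits differently, and the names `Fields`, `split_code` and `join_fields` are hypothetical.

```python
from typing import NamedTuple


class Fields(NamedTuple):
    tag: int        # 1 = Alphabetic set, 0 = Supervisory set
    indicator: int  # 0-3: which of the four rows within that set
    detail: int     # 0-15: position within the row


def split_code(code: int) -> Fields:
    """Split a seven-bit combined code into tag, indicator and detail.

    Assumed layout, most significant bit first: T II DDDD.
    """
    if not 0 <= code <= 0x7F:
        raise ValueError("expected a seven-bit value")
    return Fields(
        tag=(code >> 6) & 0b1,
        indicator=(code >> 4) & 0b11,
        detail=code & 0b1111,
    )


def join_fields(fields: Fields) -> int:
    """Inverse of split_code: rebuild the seven-bit combined code."""
    return (fields.tag << 6) | (fields.indicator << 4) | fields.detail


print(split_code(0x46))  # 'A' in the reference table: row 4, column 6
print(split_code(0x37))  # ACK in the reference table: row 3, column 7
```

Dropping the tag (`code & 0x3F`) leaves the six-bit form, which is all a device needs if the tag reaches it some other way.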
FIELDATA, and to a lesser extent the ASCII that follows, were designed for hardware decoding. Keep in mind that the free-for-all of symbol and character manipulation by computer didn't happen on a large scale until the mid-1970's; printing and "input/output" was something that was done "off line", computer time being considered too precious for mere printing of tables and such. Separate machinery -- often partially mechanical -- was used to read tapes produced by computers, and render them into human-readable form. FIELDATA is designed for that environment, and at least one document(9) explicitly describes the decoding of character codes with a separate wire per character-bit. For example, the seventh bit, called the "tag" bit, which determines which alphabet to use (Supervisory or Alphabetic), could be a wire leading to machinery or circuitry; but when used as an "internal code" (e.g. computer character code) it can all be contained in one storage word, as is assumed today.
FIELDATA variants
Many thanks to Eric Fischer for providing these variant FIELDATA code tables and the discussion about them.
Eric found the following three variants of the FIELDATA code in the 1968 edition of Reference Data for Radio Engineers by Howard Sams Publishing. Cribbed from an appendix, no additional data on these code tables was provided.
So-called "STANDARD FORM"
This variant differs from the reference version by the irritating insertion of codes in the supervisory alphabet, rather than simple substitution, and introduces a number of un-expanded and mysterious acronyms. No attempt has been made to decode these particular puzzles; that is left as an exercise for the reader.
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
| 0 | IDL | TST | TCL | TAB | CCR | CSP | a | b | c | d | e | f | g | h | i | j |
| 1 | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z |
| 2 | Dial 0 | Dial 1 | Dial 2 | Dial 3 | Dial 4 | Dial 5 | Dial 6 | Dial 7 | Dial 8 | Dial 9 | SOC | SOB | SOD | undef | undef | Stop |
| 3 | RTT | RTR | NRR | EOBK | EOB | EOF | EOC | AKR | RBK | ISN | NISN | CWF | undef | SAC | SPC | DEL |
| 4 | MS | UC | LC | HT | CR | sp | A | B | C | D | E | F | G | H | I | J |
| 5 | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z |
| 6 | ) | - | + | < | = | > | _ | $ | * | ( | " | : | ? | ! | , | ST |
| 7 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ' | ; | / | . | SPEC | BS |
From the same appendix as the previous table, no other information is available on this variant, nor its obscure code acronyms nor their purpose, though I arrogantly assume that codes of the same name as the reference table have the same meaning.
With a name like "COMLOGNET" it's a safe bet this is a military code.
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
| 0 | undef | undef | undef | undef | undef | undef | undef | undef | undef | undef | undef | undef | undef | undef | undef | undef |
| 1 | undef | undef | undef | undef | undef | undef | undef | undef | undef | undef | undef | undef | undef | undef | undef | undef |
| 2 | β | # | t | OWD | undef | @ | % | ¢ | Bell | & | ∑ | ≠ | ≢ | ° | &plusz; | &minusz; |
| 3 | undef | ACK1 | ACK2 | REQ | WBT | REP | SOML | ER | DM | EOM | SOLB | EDB | EOLB | RM | SOMH | undef |
| 4 | MS | UC | LC | HT | CR | sp | A | B | C | D | E | F | G | H | I | J |
| 5 | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z |
| 6 | ) | - | + | < | = | > | _ | $ | * | ( | " | : | ? | ! | , | ST |
| 7 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ' | ; | / | . | SPEC | BS |
Another likely-military variant of FIELDATA from the same source as above, this one is annoying to tabularize because it does not have nice neat acronyms, but instead a series of small messes (sorry, I could not resist). One imagines "Mess" stands for "Message".
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
| 0 | undef | undef | IDL | EOD | EOA | SOM | ACK | Begin Interr | undef | End Interr | D* | undef | Line Good Char | undef | Interr Mess | undef |
| 1 | undef | Line Good Mess | Main Alarm | undef | undef | Display | Alert Loop | Last Older Mess | Send # Mess | Take Over | undef | undef | undef | Last Resort | undef | undef |
| 2 | AA | undef | undef |