By Patrick Lockerby | July 1st 2009 02:38 PM | 15 comments

More The Chatter Box articles


About Patrick Lockerby

Retired engineer, 60+ years young.
Computer builder and programmer.
Linguist specialising in language acquisition and computational linguistics.
Interested in every human endeavour except the...

Error Handling And Error Codes


This article is a brief explanation of how errors can be handled in computer code.  The basic idea is that if an error arises then it should be obvious, and easily corrected.

The concept of error-handling and correction is relevant to the suggestion in my series,  A Science Of Human Language,  that what we see as the grammar of a language is simply an error-handling component of language.  I suggest that the function of the grammatical aspects of language is this: if an error arises then it is made consciously or subconsciously obvious, and is easily corrected.


The ASCII code

ASCII, the American Standard Code for Information Interchange, pronounced ass-key, is a set of numeric codes for representing text characters.  Computers use binary numbers, most commonly grouped into 8-bit bytes, giving a range of values from 0000 0000 to 1111 1111, or 0 to 255 in decimal.  The ASCII codes use only 7 bits, giving 128 values in the range 0 to 127.  This leaves one unused bit position in a standard eight-bit computer byte.
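A quick way to see the spare bit is to print the 8-bit binary form of a few ASCII codes. This is a minimal Python sketch; the function name `ascii_bits` is just an illustrative choice. Every code from 0 to 127 leaves the leftmost bit of the byte at 0.

```python
def ascii_bits(text: str) -> list[str]:
    """Return the 8-bit binary form of each character's ASCII code."""
    result = []
    for ch in text:
        code = ord(ch)
        if code > 127:
            raise ValueError(f"{ch!r} is outside the 7-bit ASCII range")
        result.append(format(code, "08b"))  # pad to a full 8-bit byte
    return result

for ch, bits in zip("Hi!", ascii_bits("Hi!")):
    print(ch, bits)  # the leftmost bit is always 0
```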

The unused bit can be used for error detection.  The procedure is to count the ones in a code and to set the spare bit to match, or reflect, the odd or even nature of the result.  The match, or parity, between the number of ones in the ASCII code and the parity bit is tested at the receiving end of a communications link.  If there is a mismatch, the receiver asks for the code to be re-sent.
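The parity scheme described above can be sketched in a few lines of Python. This sketch assumes even parity (the spare bit is set so that every byte carries an even number of ones) with the parity bit in the top position; the function names and the `"RESEND"` marker are hypothetical stand-ins for whatever a real link would do.

```python
def add_even_parity(code: int) -> int:
    """Store an even-parity bit in bit 7 (the spare bit) of a 7-bit ASCII code."""
    if not 0 <= code <= 127:
        raise ValueError("expected a 7-bit ASCII code")
    parity = bin(code).count("1") % 2   # 1 if the code has an odd number of ones
    return code | (parity << 7)         # the full byte now has an even number of ones

def parity_ok(byte: int) -> bool:
    """Receiver's test: an even-parity byte must contain an even number of ones."""
    return bin(byte).count("1") % 2 == 0

def receive(byte: int) -> str:
    """Return the character, or "RESEND" if the parity check fails."""
    if not parity_ok(byte):
        return "RESEND"                 # mismatch: ask the sender to send it again
    return chr(byte & 0x7F)             # drop the parity bit to recover the code

sent = add_even_parity(ord("C"))        # 'C' is 1000011: three ones, so parity bit = 1
print(receive(sent))                    # C
print(receive(sent ^ 0b0000100))        # one bit flipped in transit: RESEND
print(receive(sent ^ 0b0000110))        # two bits flipped: passes the check, prints E
```

The last case shows the limit of a single parity bit: flipping two bits leaves the count of ones even, so the damaged byte passes the check and decodes as the wrong letter. Simple parity catches any single-bit error but misses any even number of flipped bits.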



More complex codes can be used so that the receiving computer can not only detect an error but also correct it automatically.
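One classic example of such a code is the Hamming (7,4) code, which adds three check bits to every four data bits and can locate and fix any single flipped bit. The article does not name a particular code, so treat this Python sketch as one illustration rather than the scheme used on any specific link.

```python
def hamming74_encode(d: list[int]) -> list[int]:
    """Encode 4 data bits as a 7-bit codeword; check bits sit in positions 1, 2 and 4."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4          # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4          # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]

def hamming74_decode(c: list[int]) -> list[int]:
    """Correct up to one flipped bit, then return the 4 data bits."""
    c = list(c)                          # leave the caller's list untouched
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_pos = s1 + 2 * s2 + 4 * s4     # the failed checks spell out the bad position
    if error_pos:
        c[error_pos - 1] ^= 1            # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]]

word = hamming74_encode([1, 0, 1, 1])    # [0, 1, 1, 0, 0, 1, 1]
word[4] ^= 1                             # noise flips the bit in position 5
print(hamming74_decode(word))            # [1, 0, 1, 1]
```

The price of automatic correction is redundancy: three of every seven bits carry no message, compared with one in eight for simple parity.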

In the case of human language, the error-handler is the grammar.  When language users conform to the norms of word and sentence building, of spelling and pronunciation, of intonation and of conversation, then errors of reading or hearing are made prominent and are easily corrected.  Most often, the language-handling parts of the brain correct the error, so that the hearer or reader is entirely unaware that there ever was any error.

Thatz Y U can reed this evn wiv orl of teh miss takes.

Comments

Gerhard Adam
Patrick

As an example, in large systems there was a double complement method of detecting bits that were "stuck" in a particular value and correcting the data.

Suppose that the bit pattern should be:        110101
It would be read in (with the error) as:       111101  (the third bit is in error)
Take the complement and write it back:         000010
Read it again (the error still exists):        001010
Complement that to get the original result:    110101

This kind of error was considered a hard error (reading from memory) since the particular bit was effectively "stuck" and therefore the detection mechanism would help identify what needed to be corrected.

Note that this approach only works with bits stuck in the wrong state.  If they were stuck in the correct state, then errors are still produced, but this example is just intended to show how one type of error detection/correction works.
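Gerhard's double-complement procedure can be simulated in Python. The `StuckBitMemory` class is a hypothetical model of a memory word with one bit stuck at a fixed value, and the 6-bit width matches his example. A healthy bit changes between the two reads because its complement was written back; a stuck bit reads the same both times, which is how the sketch locates it.

```python
WIDTH = 6                       # word width used in Gerhard's example
MASK = (1 << WIDTH) - 1

class StuckBitMemory:
    """Hypothetical memory word whose bits under stuck_mask always read as stuck_value.

    stuck_value is a bit pattern aligned with stuck_mask: mask 0b001000 with
    value 0b001000 means "the third bit from the left is stuck at 1".
    """
    def __init__(self, stuck_mask: int, stuck_value: int):
        self.stuck_mask = stuck_mask
        self.stuck_value = stuck_value & stuck_mask
        self.stored = 0

    def write(self, word: int) -> None:
        self.stored = word & MASK

    def read(self) -> int:
        # Healthy bits return what was stored; stuck bits return their fixed value.
        return (self.stored & ~self.stuck_mask) | self.stuck_value

def complement(word: int) -> int:
    return ~word & MASK

def double_complement_read(mem: StuckBitMemory) -> tuple[int, int]:
    """Read, write back the complement, read again; return (recovered word, stuck-bit mask)."""
    first = mem.read()
    mem.write(complement(first))
    second = mem.read()
    stuck = complement(first ^ second)   # healthy bits differ between reads, stuck ones do not
    return complement(second), stuck

mem = StuckBitMemory(stuck_mask=0b001000, stuck_value=0b001000)
mem.write(0b110101)                      # the intended pattern
word, stuck = double_complement_read(mem)
print(format(word, "06b"), format(stuck, "06b"))  # 110101 001000
```

Running the same procedure on 111101, where the stuck bit already holds the correct value, still flags the stuck bit but "recovers" 110101, which is exactly the limitation described above.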

logicman
Thanks Gerhard.  I'd forgotten about stuck bits.  Hardware is so much more reliable these days.

The general principle of error correction is: the lower the desired error rate, the more data needs to be transmitted.  By using 64 bits per byte it's possible to reduce errors almost to vanishing point.  On the other hand, by accepting errors, as in lossy picture compression, the data rate can be increased dramatically.  In speech, the redundancy (error-correction components) varies from about 50% for average speech to about 90% in a silent auditorium with perfect elocution by the speaker.
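The trade-off between redundancy and error rate can be made concrete with the simplest error-correcting code, a repetition code: send each bit n times and take a majority vote. This is a swapped-in illustration, not a code discussed above, and it assumes each copy is flipped independently with probability p (0.01 here, an arbitrary choice).

```python
from math import comb

def repetition_error_rate(p: float, n: int) -> float:
    """Chance that a majority vote over n independent copies of one bit gives the wrong bit."""
    if n < 1 or n % 2 == 0:
        raise ValueError("use an odd number of copies so the vote cannot tie")
    # The vote fails when more than half of the n copies are flipped.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n // 2 + 1, n + 1))

for n in (1, 3, 5, 7):
    print(f"{n} copies: {n}x the data, error rate {repetition_error_rate(0.01, n):.1e}")
```

Each step up in redundancy multiplies the amount of data sent, while the chance of a wrong decision falls steeply: from 1 in 100 with one copy to roughly 3 in 10,000 with three copies.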

briantaylor
Watched Nova Now last night.
A section of the show was about the young prof who invented the Kapchka.
Can't remember who or where, or how to spell Kapchka...
(If you don't know, a Kapchka is that curvy word that a computer can't read but a human can. They are used to determine the form is being completed BY a human.)
Interesting to note was that this young man was also largely responsible for solving another problem, via the same technology.
The university he works for is instrumental in the slow process of digitizing every last stinking word published on this planet.
During the process of these scans, the computers occasionally come across words they can't read, or recognise.
To fix this, websites seeking security can now take part in the solving of this problem via the folks who visit the sites.
They have added a word to the usual Kapchka.
The first word is irrelevant to the security; it is just one of the "mystery" words from some long-forgotten tome that the computer can't read. (The second word is the actual Kapchka.)
The user enters what they think is the first word, and regardless of whether or not it is correct, it gets sent back to the university's computers to be inserted back into the original text in question.
(I'm presuming that someone is looking at these to decipher if they are correct...)

So, we've got about 750 million people working on a problem they didn't even know existed.
Pretty sweet, eh?

logicman
Brian: the second test you refer to is called reCAPTCHA.  It works by presenting people with known and unknown words.  The known words filter out guessers and increase the confidence level for the unknown words. If enough people agree about an unknown word it is accepted as known.
http://news.bbc.co.uk/1/hi/technology/7023627.stm

The biggest problem with both types of CAPTCHA is that a spammer can copy the CAPTCHA to a 'research' site where unsuspecting members of the public are invited to break the code.  There is no such thing as a secure computer - unless, perhaps, you first melt it in a microwave oven and then drop it into the Mariana Trench.
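The voting idea behind reCAPTCHA, as described above, can be modelled in a few lines of Python. Everything in this sketch is hypothetical: the class name, the rule that a wrong control word discards the whole response, and the threshold of three agreeing answers are illustrative assumptions, not the real service's rules.

```python
from collections import Counter, defaultdict

AGREEMENT_NEEDED = 3  # hypothetical threshold for accepting an unknown word

class WordVoting:
    """Toy model: each response pairs a known control word with an unknown scanned word."""

    def __init__(self) -> None:
        self.votes = defaultdict(Counter)  # unknown word id -> answers seen so far
        self.resolved = {}                 # unknown word id -> accepted reading

    def submit(self, control_answer: str, control_truth: str,
               unknown_id: str, unknown_answer: str) -> bool:
        # A wrong control word suggests a guesser or a bot, so its other answer is discarded.
        if control_answer.strip().lower() != control_truth.strip().lower():
            return False
        answer = unknown_answer.strip().lower()
        self.votes[unknown_id][answer] += 1
        # Once enough people agree, the unknown word is accepted as known.
        if self.votes[unknown_id][answer] >= AGREEMENT_NEEDED:
            self.resolved[unknown_id] = answer
        return True

votes = WordVoting()
votes.submit("upon", "upon", "scan-17", "morning")
votes.submit("xqz", "upon", "scan-17", "mourning")   # control word wrong: ignored
votes.submit("upon", "upon", "scan-17", "morning")
votes.submit("Upon", "upon", "scan-17", "Morning ")
print(votes.resolved)  # {'scan-17': 'morning'}
```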

briantaylor
I gave up on security a long time ago, now I just keep my secrets in me head.
My wise old granddad taught me that one should never commit to record that which he wouldn't want the whole world to read.
(A mental drumroll awaits the joke to be inserted here........................:)
Thanks for refreshing my memory.
My grandmother taught me that if you play with matches you pee the bed.

jlafay
Interesting post, and the information is presented at a high, abstract level. If you don't mind me asking, why only use the example of ASCII? You seem to incorporate serial data communications, and that works on a different layer than ASCII characters. The truncated bit from your proposed octet (octets work better in bit parsing/manipulation) could enhance error correction, I suppose, but we're also talking about a standard where every value in the 7-bit field represents a valid ASCII character. You would have to go "outside the box" a lot more for validation, because a processor does not care that a word is misspelled in a message or on a web page.

TCP already provides error correction for data transmitted over networks: it requests retransmissions and completes "handshakes" between two hosts, no matter their locations on a network. I was hoping that this post would give some insight on your solution to the grammatic problem as opposed to ill received bits that are automatically corrected or re-transmitted from the originating transmitting host. Do you propose a standard core dictionary as part of all operating system releases? Do you include slang and, conversely, commonly misspelled words in their erroneous forms? You would also have to incorporate excellent grammatical rules/structures to validate against, and even then good grammar can be subjective. A human decision still has to be involved, which hinders correction; detection would be the easier part of the puzzle.

The introduction of CAPTCHA has helped reduce web spam quite a bit, though; it's an excellent technology. It's too "fuzzy" for most malicious code. It could be the key reason for the spike in image processing research in the CS field, or maybe those research efforts are highlighted because of our awareness of CAPTCHA. Who knows. I'll be reading the rest of your blog series because I'm a senior CS student with a strong interest in programming languages and lexical analysis. Thanks for the food for thought.

Jeff
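Strictly speaking, TCP detects damaged segments with a 16-bit one's-complement checksum and corrects them by retransmission. This Python sketch computes a checksum in the style of RFC 1071; a real TCP checksum also covers a pseudo-header taken from the IP layer, which is omitted here, and the sample segment is made up.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"                               # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]         # add each 16-bit big-endian word
        total = (total & 0xFFFF) + (total >> 16)      # fold the carry back in (end-around carry)
    return ~total & 0xFFFF                            # one's complement of the sum

segment = b"Hello, TCP!"
checksum = internet_checksum(segment)                 # computed by the sender
damaged = b"Hellp, TCP!"                              # one byte corrupted in transit
print(internet_checksum(segment) == checksum)         # True: accept the segment
print(internet_checksum(damaged) == checksum)         # False: discard it; the sender retransmits
```

As with parity, the checksum only detects the damage; the correction comes from sending the data again, which matches the point that errors are corrected at the originating host.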

Gerhard Adam
I was hoping that this post would give some insight on your solution to the grammatic problem as opposed to ill received bits that are automatically corrected or re-transmitted from the originating transmitting host.

If I'm understanding your suggestion correctly, you're considering that the system should also correct such spelling errors?

If true, this would be highly undesirable since spelling and grammar are subject to context and have no place in data transmission considerations.  During data transmission (of any type), it is absolutely critical that data NOT be modified or compromised in any way from its original source.  Errors would need to be corrected at the origin and not during any transfer mechanism.  To do otherwise throws into question the integrity of the data exchange.

jlafay
Mr. Adam: that is somewhat my point. I don't believe that error handling should be done at an ASCII level, that is very undesirable. I'm trying to gauge what problem and what solution the author is proposing because it wasn't clear (to me anyways). If this is to be included in a blog series for language then bit correction isn't necessary. There is a lot more beyond that level to consider if grammatical correction is needed. Is it supposed to be an overall solution to a big problem that computers aren't versed in? Does the scope include web, electronic documents, telecommunications, etc.? I only proposed a very basic solution to see how the author responds and either discredit my train of thought and construct a new view or build on top of it as necessary.

Integrity should remain intact for data transmission and not involve grammar rules or business rules. There are protocols in place that manage the intricacies of traveling bits, and they may not need to be included in this idea. However, transfer mechanisms do play a part in networking; otherwise there would be no way to know that a problem occurred in a session. Errors are corrected at the originating host when a retransmit is needed in a TCP connection; handshakes, connection states, and associated transfer confirmations keep it all in check very nicely.

logicman
Jeff: thank you very much for your thoughtful comments.  My main purpose in talking about parity checking was to show people who may never have heard of error-correction processes how a simple method works.

In the arena of human languages, my theory is that most words - nuons - function primarily to carry information to be shared consciously between users, whilst other words and affixes - quons - function primarily to carry information to be used unconsciously in the prevention and correction of communication errors.  Despite noisy environments, poor hearing, mumbling etc., we manage to achieve a remarkably error-free level of communication through speech.  I suggest that this is achieved purely by virtue of the huge redundancy - the excess of information - in human language.  The redundancy in all spoken languages is also incorporated into all forms of writing.

The advantage of knowing how language really works can be applied to provide solutions to problems of computer use.  Take the example of scanning old documents into an archive, or the equivalent problem: providing a book-reading program for the blind.  Current spelling and grammar checkers fail to accurately deal with OCR errors.  A sufficiently advanced program, based on a socially evolved  natural grammar rather than 'school-book' grammar could deal with most of these errors without supervision. 

Although most books are written in a manner which conforms strongly with conventional ideas of 'good grammar', where they contain reported colloquial speech, the conventional grammar fails entirely to correct OCR errors, and may produce false positives.  For an example of this sort of problem, see Mark Twain's Tom Sawyer.

"She'd never noticed if it hadn't been for Sid. Confound it! sometimes she sews it with white, and sometimes she sews it with black. I wish to geeminy she'd stick to one or t'other--I can't keep the run of 'em. But I bet you I'll lam Sid for that. I'll learn him!"

http://www.gutenberg.org/etext/74

Gerhard Adam
Thatz Y U can reed this evn wiv orl of teh miss takes

Patrick;  it's interesting when you really analyze this sentence and see what's happening regarding the error detection and correction.

Thatz doesn't produce a problem because reading and pronouncing it relies on the overlapping sound of the letter 'z' with the 's'; they are redundant in this usage.

Y and U are equally redundant since the name of these letters is a word.
NOTE:  What is interesting is that these two letters next to each other make me constantly want to use the word 'YOU' and I have to strictly correct myself to not modify the sentence to read "That's so you can read ...."

can and this are spelled correctly and reed is phonetically identical, so the meaning is also redundant.

evn is more interesting because it is a problem we encounter whenever we have adjacent consonants.  It appears that our pronunciation of the letters themselves introduces a vowel.  For example, 'n' is really 'en' and letters like 'k' are really 'ka' when we attempt to represent the sound they make.

wiv (to me) translated to wif which sounded like a mispronunciation of "with" so it's almost like you could follow the error correction path with this one.

orl  - this one only makes sense when you vocalize it within the context of the sentence and see that what should be there is 'all'.

teh - visually this jumps out as a transposition error and is corrected visually rather than phonetically.

miss and takes - Visually this is meaningless, but when it is spoken out loud, the relationship to the next sound becomes clear and we understand that it is only half of a total word.

In any case, it's an interesting exercise to see which error handling mechanism seems to dominate in something like this.
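The observation that 'teh' is caught visually, as a transposition, while most of the other errors need a phonetic reading, can be checked with a small Python sketch. It tries every swap of two adjacent letters against a word list; the tiny `WORDS` set is hypothetical and stands in for a real dictionary.

```python
def transposition_fixes(word: str, dictionary: set[str]) -> list[str]:
    """Return dictionary words that differ from `word` by one swap of adjacent letters."""
    fixes = []
    for i in range(len(word) - 1):
        swapped = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        if swapped in dictionary and swapped not in fixes:
            fixes.append(swapped)
    return fixes

# A tiny hypothetical word list standing in for a real dictionary.
WORDS = {"that", "so", "you", "can", "read", "this", "even", "with", "all", "of", "the", "mistakes"}

for raw in "Thatz Y U can reed this evn wiv orl of teh miss takes".split():
    word = raw.lower()
    if word not in WORDS:
        print(f"{raw}: {transposition_fixes(word, WORDS) or 'no single-swap fix'}")
```

Only 'teh' is repaired by a single swap; 'reed', 'wiv' and 'orl' survive because they are sound-alike errors rather than letter-order errors, which fits the split between visual and phonetic correction described above.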

jlafay
So it appears that I over-analyzed just a bit then. I was thinking more about applied computer science than your underlying message. I saw the ASCII example and my brain went into overdrive thinking "Ok yeah, but why even use an extra bit for a standard that defines a character for each value?"

Moving back towards CS (my apologies, I really can't help myself) you may even notice single bits in data types used to represent signed values (positive and negative) which works well. Single bits can also be used in transmission frames to represent start/stop flags. They certainly have their use, Alan Turing really was brilliant with his realization of the potential for the binary digit.

OCR doesn't translate well to a standard such as ASCII because the source can be a hand drawing, a written note, a printed document, etc. Interpreting any given OCR scan involves a lot of preliminary processing followed by a series of comparisons/matching. Computers do well with the defined and not so well with the abstract. I believe that one day there will be a computer revolution in the (hopefully near) future that brings computation closer to human thought.

-Jeff
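Start/stop flags and the article's parity bit meet in ordinary asynchronous serial links. A common format, often written 7E1, sends a start bit, seven ASCII data bits (least significant first), an even-parity bit and a stop bit. The Python functions below are a hypothetical sketch of building and checking one such frame, not code from any particular UART driver.

```python
def frame_7e1(ch: str) -> list[int]:
    """Build one 7E1 serial frame: start bit, 7 data bits, even parity, stop bit."""
    code = ord(ch)
    if code > 127:
        raise ValueError("7E1 framing carries 7-bit ASCII only")
    data = [(code >> i) & 1 for i in range(7)]   # least significant bit goes first
    parity = sum(data) % 2                       # makes the total number of ones even
    return [0] + data + [parity] + [1]           # start bit 0 ... stop bit 1

def unframe_7e1(frame: list[int]) -> str:
    """Check the start, stop and parity bits, then rebuild the character."""
    if len(frame) != 10 or frame[0] != 0 or frame[-1] != 1:
        raise ValueError("framing error")
    data, parity = frame[1:8], frame[8]
    if (sum(data) + parity) % 2:
        raise ValueError("parity error")
    return chr(sum(bit << i for i, bit in enumerate(data)))

frame = frame_7e1("A")
print(frame)               # [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]
print(unframe_7e1(frame))  # A
```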

Gerhard Adam
Jeff:  Not to sound too cynical, but I'd be happy if they bring most humans closer to human thought. :))

jlafay
Good call, I like it! :)

logicman
Gerhard: your analysis of my deliberate errors example is spot on!  I should comment that 'wiv' is a common British, mainly London, pronunciation of 'with'.  People who turn word-final th into V commonly turn word-initial th into F, so thick becomes 'fick', for example.

When presented with an example of errors we can focus on the error-correction measures that are available.  What is not commonly realised is that we error-correct every time we use language, but we are not usually conscious of it.  More accurately, if there are no errors, then the error-handling codes are used as validation codes and then discarded, never triggering any conscious awareness that they were ever a part of the conversation stream.

I believe that one day there will be a computer revolution in the (hopefully near) future that brings computation closer to human thought.

Jeff:  hold that thought!  Before computers can reliably read documents out loud they must be able to handle human language according to the rules of semantics and pragmatics.  Natural Language Processing (NLP) is also a goal that cannot be reached by syntax alone.

Please keep those comments coming - let's learn from each other, that's what language is for.

Gerhard Adam
What is not commonly realised is that we error-correct every time we use language, but we are not usually conscious of it. 

Actually this is also interesting from the perspective of people who learn English (or some other language) as a second language.  They tend to "error-correct" in their native tongue, so where words or letters are common between the two, the pronunciation is often that of the original language.

As you know, in German the 'w' is pronounced as a 'v' sound, 'v' as an 'f', and 'j' as a 'y'.  These substitutions account for much of the typical German accent heard when a German speaker reads or speaks English.
