Showing posts with label semiotics. Show all posts

Tuesday, 14 July 2009

Creating the message

In my last post I showed how an extended semiotic framework could resolve the old data/information chestnut. I showed that during decoding the recipient must use several kinds of knowledge, eg of grammar and coding, that were also used by the sender. Now I want to look at the process of composition. Here, as with decoding, the levels are applied successively, but from top to bottom. At each level the sender makes one or more choices on the basis of what he believes the recipient also knows. Specific steps, technical or otherwise, may be needed to ensure that the recipient does know it.
The starting point is always an intention. This may be to transfer information but is often to produce action, eg giving a command or placing an order. In either case the sender needs to know what the intended recipient already knows about the topic.

For instance, if I wish to explain collateralised debt obligations to you I need to know how much you already know about markets and derivatives. Similarly, if I wish to order a garden hammock from you I should first establish that you sell such hammocks. At the pragmatic level this provides the shared context within which communication will occur.

(Human beings routinely negotiate these issues but IT systems lack this flexibility. Therefore, where the sender and recipient are IT applications, such as an ERP or online order-taking system, these matters must be negotiated between the parties unless relevant standards already exist. The pragmatic context may include agreements on the legal significance of messages, the speed with which orders should be honoured, etc. At least some of the context may be stated in commercial terms and conditions, in contracts or in national law.)

From the context and his intent the sender must decide what information he needs to transfer. In the general case of inter-personal communication, including negotiating, teaching, etc., this is, in fact, the most difficult step. In IT systems it's often strongly constrained by obvious needs, eg to specify the product being ordered, and by practical limitations, eg everything has to be typed by a telesales agent.

This information is now passed to the semantic level where the sender selects the natural language and/or coding system to be used. The natural language should, obviously, be one that the recipient understands and this also applies to any technical terms used. In the case of coding systems there is often an obvious choice, eg the Gregorian calendar for dates, WGS 84 for latitude and longitude, but other systems remain in use so it may be necessary to specify the system in use. The choice of units of measurement is also part of the semantic level.
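The need to state the convention in use can be seen even with something as familiar as a written date. The sketch below is only an illustration: the date string is invented, and treating day/month order as part of the agreed coding system is an assumption made for the example.

```python
from datetime import date, datetime

raw = "03/04/2009"  # invented value; nothing in the string says which convention applies

# The same characters yield two different dates under two plausible conventions.
day_first = datetime.strptime(raw, "%d/%m/%Y").date()    # read as 3 April 2009
month_first = datetime.strptime(raw, "%m/%d/%Y").date()  # read as 4 March 2009

# Agreeing a single convention removes the ambiguity; ISO 8601 is one such choice.
unambiguous = day_first.isoformat()
```

Unless sender and recipient have agreed which reading applies, the recipient cannot recover the intended date from the characters alone.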

(Where the sender and recipient are IT applications these decisions may be taken by the designer of one or the other or negotiated between them. They will not generally be taken by any individual user. They may be documented in a data dictionary but as natural language text as there are no commonly used notations for expressing semantic choices.)

At the syntactic level the sender will apply his knowledge of grammar to create grammatically correct sentences in the chosen natural language. The grammar generally follows from the choice of natural language.

At the lexical level the words are converted into letters and punctuation marks. In most cases the lexical rules follow from the choice of natural language but there are a few languages, eg Serbo-Croat, that are written in more than one script.

At the coding level the symbols are converted into bytes (this is almost always automatic).

Finally, the bytes are inserted into the fields in a pre-agreed structure (the format level).

(Where the sender and recipient are IT applications the format may be decided by the designer of one or the other or negotiated between them. There are several commonly used notations for defining such structures. It may be documented in a data dictionary and also stored in a database schema.)

We now have a sequence of bytes that can be sent electronically as a message with confidence that the recipient will be able to recover the intended meaning.
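The last two steps, coding and format, are mechanical enough to sketch in a few lines of Python. Everything specific here is a hypothetical choice made for illustration: the field layout in `ORDER_FORMAT`, the use of ASCII, and the function names `compose_order` and `read_order`.

```python
import struct
from typing import Tuple

# Hypothetical pre-agreed structure: an 8-byte product code followed by a
# 2-byte unsigned quantity, big-endian, with no padding between fields.
ORDER_FORMAT = ">8sH"


def compose_order(product_code: str, quantity: int) -> bytes:
    """Sender side: turn symbols into bytes, then place them in the fields."""
    # Coding level: characters become bytes under the agreed character coding.
    code_bytes = product_code.encode("ascii")
    # Format level: the bytes are inserted into the agreed structure.
    # struct pads a short code with zero bytes up to the field width.
    return struct.pack(ORDER_FORMAT, code_bytes, quantity)


def read_order(message: bytes) -> Tuple[str, int]:
    """Recipient side: the same two kinds of knowledge, applied in reverse."""
    code_bytes, quantity = struct.unpack(ORDER_FORMAT, message)
    return code_bytes.rstrip(b"\x00").decode("ascii"), quantity


message = compose_order("HAMMOCK1", 2)
```

The recipient can only recover the product code and quantity because it holds the same `ORDER_FORMAT` and character coding as the sender, which is the point of agreeing them in advance.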

Tuesday, 2 June 2009

Information IS data in context

There has been a lot of discussion about the meaning of data, information, knowledge, wisdom and metadata. Though often entertaining, this discussion usually generates more heat than light. Here I will use an expanded version of the semiotic framework to resolve the confusion. I find that information IS "data in context" and data IS an encoded form of information. However, the context has several layers, each of which contributes to the encoding of information as data. Much of the confusion is due to ignoring the layered nature of the context.

I will consider the steps needed to fully understand a message (or stored record). I’ll assume that the message is digitally encoded – though this is almost irrelevant to the analysis. Initially, let’s assume that the message is English text.

a) I receive a message comprising a string of bytes.

b) I can convert this into printed, or displayed, characters. But to do this properly I must know which character coding system, eg EBCDIC, was used to create the message.

c) Next I use my lexical knowledge to recognise words and punctuation marks.

d) And then my syntactic knowledge to parse it. I now see the text as a structure, specifically a sentence, with a subject, clauses, etc.

e) Adding my semantic knowledge (of what the words mean) gives me the meaning of the sentence.

f) Finally I relate this meaning to other relevant knowledge (the context for the sentence) and see the significance of the message.
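Steps a) to c) can be sketched in Python to show how much depends on knowledge the receiver brings. This is a rough illustration only: the sample sentence is invented, `cp037` is used as one common EBCDIC code page, and the regular expression is a crude stand-in for real lexical rules.

```python
import re

# Step a: the message arrives as a string of bytes.
# (Produced here by encoding an invented sentence in an EBCDIC code page.)
message = "The boat is late.".encode("cp037")

# Step b: the bytes only become readable characters if the receiver
# applies the same character coding that the sender used.
right = message.decode("cp037")
wrong = message.decode("latin-1")  # same bytes, wrong assumption about the coding

# Step c: lexical knowledge splits the characters into words and punctuation.
tokens = re.findall(r"\w+|[^\w\s]", right)
```

Decoding with the wrong coding raises no error here. It simply yields different characters, so nothing in the bytes themselves tells the receiver that a mistake has been made.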

At each step the decoder, whether human or electronic, must apply information it already knows. This information may be called metadata. Here’s a summary:

| Semiotic level | What is added by the receiver | The result |
| --- | --- | --- |
| 6 Pragmatic | Contextual knowledge | Significance of the information |
| 5 Semantic | Meanings of words (from dictionary) | Meaning of the sentence |
| 4 Syntax | English grammar | Parsed sentence |
| 3 Lexical | Lexical rules | Words |
| 2 Coding | Character coding | Characters |
| Input | n/a | Bytes |

Now let’s generalise.

Numbers

Suppose the message consists of several distinct fields, some textual and some numerical. Then we’ll need to divide the message into fields before we apply the character code for text since numbers may use non-text coding. This will be level 1. And we’ll still need level 2 to turn the bytes into numbers.

There’s no lexical or syntactic level for numbers but we will need to know the semantics. Many numbers are measurements or predictions of measurements and for these we need to know what is measured and the units. We may even need to know how the measurement was made and by whom.

Other numbers are ratings or rankings and this also needs to be known.

Some of this may, of course, be given explicitly by other data items. Taken together this information allows us to convert the number into a sentence of known meaning, eg

  • 17.63 => The height of the mast is 17.63 m.
  • 623 => This MP’s expenses were the 623rd largest.

Finally, at the pragmatic level, we add contextual information to see the significance of the information, eg

  • The boat is too tall to pass under the bridge.
  • This MP is probably honest.
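The mast example can be traced through the levels in a short Python sketch. The record layout, the `FIELD_MEANING` lookup and the bridge clearance figure are all invented for illustration. They stand in for whatever record structure, semantic knowledge and context the receiver actually holds.

```python
import struct

# Levels 1 and 2 (assumed layout): a 4-byte ASCII tag, then an 8-byte
# big-endian IEEE 754 double holding the number.
RECORD_FORMAT = ">4sd"

# Level 5: semantic knowledge held by the receiver (hypothetical lookup
# giving what is measured and the units).
FIELD_MEANING = {"MAST": ("The height of the mast", "m")}

# Level 6: contextual knowledge (an invented figure for the bridge).
BRIDGE_CLEARANCE_M = 15.0


def interpret(message: bytes):
    # Level 1 divides the message into fields; level 2 turns bytes into values.
    tag_bytes, value = struct.unpack(RECORD_FORMAT, message)
    tag = tag_bytes.decode("ascii")
    # Level 5 converts the bare number into a sentence of known meaning.
    subject, unit = FIELD_MEANING[tag]
    sentence = f"{subject} is {value} {unit}."
    # Level 6 relates that meaning to context to give its significance.
    if value > BRIDGE_CLEARANCE_M:
        significance = "The boat is too tall to pass under the bridge."
    else:
        significance = "The boat can pass under the bridge."
    return sentence, significance


message = struct.pack(RECORD_FORMAT, b"MAST", 17.63)
sentence, significance = interpret(message)
```

The twelve bytes of `message` contain neither the units nor the bridge. Both the sentence and its significance come from what the receiver adds.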

Image

Now suppose that we have fields containing image data. Processing is similar to numbers. The image coding, eg JPEG, is used at level 2 and there is no lexical or syntactic processing. To get the meaning of the image (semantic level) we need to know what sort of image it is, eg an aerial photograph or an X-ray image, its scale and perhaps other details about the equipment and settings used.

Finally, as before, the pragmatic level adds contextual information to let us see the significance of the image.

Composite model

We can put all this together in a composite model as shown in the table. If necessary the Images column can be generalised to cover the result of any sensing system, eg video, radar images, seismograph output.


| Semiotic level | Text | Numbers | Images |
| --- | --- | --- | --- |
| 6 Pragmatic | Add contextual knowledge to get significance. | (as for text) | (as for text) |
| 5 Semantic | Add dictionary knowledge to get meaning. | As for text, and units. | As for text, and scale, part of spectrum sensed. |
| 4 Syntax | Add NL grammar to get a parsed sentence. | xx | xx |
| 3 Lexical | Add lexical rules to get words. | xx | xx |
| 2 Coding | Add character coding to get characters. | Add format rules to get numbers. | Add format rules to get 2D image. |
| 1 Record structure | Add record structure knowledge to divide message into fields. | (as for text) | (as for text) |
| Input | Bytes | Bytes | Bytes |

I started by noting the muddle about the words data, information, etc. In fact these words are often used interchangeably and in different ways by people with different backgrounds and interests. We IT people, however, should say ‘data’ when we want to discuss bits and bytes and their decoding and processing at lexical and syntactic levels. We should say ‘information’ when discussing the semantics and significance of the data.