
The Speech Chain

The Speech Chain is the title of a book by Peter Denes and Elliot Pinson. The concept of a speech chain is a good place to start thinking about speech. The chain consists of the following links, in which a thought is expressed in different forms as it is born in a speaker's mind and eventually gives rise to understanding in a listener's mind (see Figure 1 for two diagrams from Rabiner and Juang which compare the links in the speech chain with their computer analogs):

  1. Intention The speaker first decides to say something to another human being (or to a machine). This event takes place in the higher centers of the mind/brain.
  2. Language The desired thought passes through the language centers of the brain where it is given expression in words which are assembled together in the proper order and given final phonetic, intonational, and durational form.
  3. Motor Program and Muscle Movement The results of the language-production centers of the brain may be considered a speech motor program which executes over time by conveying firing sequences to the lower neurological centers, which in turn impart motion to all of the muscles responsible for speech production: the diaphragm, the larynx, the tongue, the jaw, the lips, and so on. Much if not all of this activity is subconscious, and involves constant corrective feedback.
  4. Airstream in the Vocal Tract As a result of the muscle movements, a stream of air emerges from the lungs, passes through the vocal cords where a phonation type (e.g. normal voicing, whispering, aspiration, creaky voice, or no shaping whatsoever) is developed, and receives its final shape in the vocal tract before emerging from the mouth and the nose and through the tissues of the face.
  5. Sound Wave in Air The vibrations caused by the vocal apparatus of the speaker radiate through the air as a sound wave.
  6. Electronic Transduction The sound wave may be converted to analog or digital form for storage or transmission, and in the form of electric waves may be transported thousands of miles to its destination, where the information in the electric waves is converted back to the form of sound. It is in the form of an electronic copy of the original sound wave that automatic speech recognition by computer gains access to speech data.
  7. Hearing The sound wave, which may have passed through electronic coding and decoding, eventually strikes the eardrums of another human being, where it is first converted to waves on the surface of the tympanic membranes, next to mechanical motion via the ossicles of the middle ear, then to fluid pressure waves in the medium bathing the basilar membrane of the inner ear, and finally to firings in the roughly 30,000 neural fibers which combine to form the auditory nerve.
  8. Auditory and Language Processing The lower centers of the brainstem, the thalamus, the auditory cortex, and the language centers of the brain all cooperate in the recognition of the phonemes which convey meaning, the intonational and durational contours which provide additional information, and the vocal quality which allows the listener to recognize who is speaking and to gain insight into the speaker's health, emotional state, and intention in speaking.
  9. Understanding The higher centers of the brain, both conscious and subconscious, bring to this incoming auditory and language data all the experience of the listener in the form of previous memories and understanding of the current context, allowing the listener to "manufacture" in his or her mind a more or less faithful "replica" of the thought which was originally formulated in the speaker's consciousness and to update the listener's description of the current state of the world. The listener may in turn become the speaker, and vice versa, and the speech chain will then operate in reverse.
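
As a rough illustration of step 6, the conversion of a sound wave to digital form can be sketched as sampling followed by quantization. The sketch below is only an assumption-laden toy: the function name digitize, the 8000 Hz sampling rate, the 16-bit depth, and the 440 Hz test tone are all illustrative choices, not values taken from this course.

```python
import math


def digitize(signal, duration_s, sample_rate_hz=8000, bits=16):
    """Sample a continuous-time signal and quantize each sample.

    `signal` is any function of time (seconds) returning a value in [-1, 1].
    The sampling rate and bit depth are illustrative defaults.
    """
    max_level = 2 ** (bits - 1) - 1          # largest signed integer level
    n_samples = int(round(duration_s * sample_rate_hz))
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz               # time of the n-th sample
        x = max(-1.0, min(1.0, signal(t)))   # clip to the legal range
        samples.append(int(round(x * max_level)))
    return samples


def tone(t):
    """A hypothetical stand-in for a sound wave: a 440 Hz sine at half amplitude."""
    return 0.5 * math.sin(2 * math.pi * 440.0 * t)


pcm = digitize(tone, duration_s=0.01)
```

Here pcm is a list of integers, one per sampling instant. This list of numbers is the "electronic copy" of the waveform that a recognizer would actually work with.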

In this course we will have nothing to say about the higher processing levels mentioned above: that is, steps 1 and 2 in the speaker and 8 and 9 in the listener. We will concentrate on the following two areas:

We will also have a little to say about step 7 above, in which the sound wave is converted back into neuronal activity, but this time in the ear of the listener rather than in the vocal tract of the speaker. The branch of science which studies this form of the speech signal is called auditory neurology.

This course is entitled The Structure of Spoken Language, but it bears the subtitle Speech Spectrogram Reading. This is because the spectrogram, which will be introduced below, can be viewed as a central meeting point of all the aspects of speech science which we will study in this course. It contains the imprint of the vocal tract of the person who produced the utterance; it is derived by digital signal processing from that utterance; and it is very likely that the human ear creates something very much like a spectrogram as its first step in decoding the utterance. In this course we will focus our attention on trying to decode the message contained in the spectrogram by using our sense of vision rather than our sense of hearing. That is why we call the course "spectrogram reading."
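
As a preview of what "derived by digital signal processing" can mean, here is a minimal sketch of one common way to compute a spectrogram: cut the sampled waveform into overlapping frames, apply a window to each frame, and take the magnitude of a discrete Fourier transform. This is not necessarily the method used to produce the spectrograms in this course; the function name spectrogram, the frame length, the hop size, the Hann window, and the 1000 Hz test tone are all assumptions chosen for illustration.

```python
import cmath
import math


def spectrogram(samples, frame_len=64, hop=32):
    """Return one list of DFT magnitudes (bins 0..frame_len//2) per frame."""
    # Hann window: tapers each frame toward zero at its edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT for bin k (slow, but needs no external library).
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


sample_rate_hz = 8000  # assumed sampling rate
test_tone = [math.sin(2 * math.pi * 1000.0 * n / sample_rate_hz)
             for n in range(256)]
spec = spectrogram(test_tone)

# Index of the strongest frequency bin in each frame.
peak_bins = [max(range(len(frame)), key=frame.__getitem__) for frame in spec]
```

Bin k corresponds to the frequency k * sample_rate_hz / frame_len, so with these settings each bin is 125 Hz wide. Plotting the rows of spec as columns of an image, with time on the horizontal axis, frequency on the vertical axis, and magnitude as darkness, gives the kind of picture this course is concerned with reading.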

Before we can understand what a spectrogram is, however, we need to say something about sound, and about the sound waveform.


Ed Kaiser
Sat Mar 15 00:01:27 PST 1997