Current CSLU Research Projects

CSLU conducts a wide range of research projects, including projects focused on core speech processing and natural language processing algorithms (technology research projects) and projects focused on biomedical applications (biomedical research projects), specifically on the creation of diagnostic, remedial, and assistive methods for neurodevelopmental and neurodegenerative disorders and diseases.


I. Technology Research Projects

  • Discriminative Syntactic Language Modeling: Automatic Feature Selection and Efficient Annotation
[Brian Roark] The focus of this NSF-funded project is on the effective use of parser-derived and tagger-derived features within discriminative approaches to language modeling for automatic speech recognition. Discriminative language modeling approaches provide a tremendous amount of flexibility in defining features, but the size of the potential parser-derived feature space requires efficient feature annotation and selection algorithms. The project has four specific aims. The first aim is to develop a set of efficient, general, and scalable syntactic feature selection algorithms for use with various kinds of annotation and several parameter estimation techniques. The second aim is to develop general tree and grammar transformation algorithms designed to preserve selected feature annotations yet lead to faster parsing or even tagging approximations to parsing. The third aim is to evaluate a broad range of feature selection and grammar transformation approaches on a large vocabulary continuous speech recognition (LVCSR) task, namely Switchboard. The final aim is to design and package the algorithms to straightforwardly support future research into other applications, such as machine translation (MT), and into other languages, such as Chinese and Arabic. The algorithms developed as a part of this project are expected to contribute to improvements in LVCSR accuracy and in applications that rely upon this technology. The algorithms are being packaged into a publicly available software library, enabling researchers working in many application areas -- including LVCSR and MT -- and various languages to investigate best practices in syntactic language modeling for their specific task, without having to hand-select and evaluate feature sets.

  • Multi-Threaded Dialogues For Real-Time Applications
[Peter Heeman]. The goal of this NSF-funded project is to create a speech interface that supports a user in interacting with multiple real-time devices at the same time, where the interaction with each device is a separate dialogue thread. The first aim is to show, using a human-computer study, that the simple way to implement a speech interface for managing multiple threads is not effective. The second aim is to run a human-human study to show that people can inherently manage multiple dialogue threads, and to determine what conventions they use. The third aim is to build a speech interface that implements the conventions that were found.
    The main impact of this work is the development of a model that accounts for how people deal with multi-threaded dialogues. This model will be demonstrated in an implemented speech interface. This work will create a technology that will be useful in interacting with the pervasive electronic devices that we can expect to see in the future.

  • Small Footprint Speech Synthesis
This NSF Small Business Technology Transfer Phase I project is led by Alexander Kain at Biospeech Inc., a CSLU startup, and Jan van Santen. The project aims to develop and implement a new algorithm in the area of text-to-speech synthesis (TTS) that will lead to (i) dramatic decreases in disk and memory requirements at a given speech quality level and (ii) minimization of the amount of voice recordings needed to create a new synthetic voice. Most current TTS systems operate by concatenating segments of recorded speech ([acoustic] units). A challenge for TTS is coarticulation: the dependency of the acoustic manifestations of a phoneme on its neighbors. Current TTS systems use multi-phone acoustic units such as diphones, which preserve coarticulatory patterns naturally present in speech. However, this approach requires a large amount of recordings and generates systems with large footprints. Biospeech proposes a uniphone approach that addresses coarticulation processes with an explicit model. The method uses complex spectral vectors (basis vectors) representing brief segments of speech inside single phonemes, and decomposes these into two components: a formant vector and a spectral balance vector. To generate speech, the formant and spectral balance vectors derived from the basis vectors corresponding to successive phonemes are subjected to separate--and hence generally asynchronous--interpolation operations using time-varying weights; the formant and spectral balance vector trajectories thus created are re-combined to create a trajectory in complex spectral space; finally, this trajectory is converted into output speech with the inverse Fourier transform. Asynchronicity is necessitated by the quasi-independence of articulators underlying different spectral features (e.g., frication, formant frequencies). The proposed work has implications for other speech technologies, including Automatic Speech Recognition (ASR).
Current ASR technologies address coarticulation by using multi-phone units, typically triphones. The number of triphones in English is over 70,000, so training requires a large amount of recordings. The proposed model could dramatically reduce the amount of recordings required for system training. Second, TTS has generally recognized societal benefits for universal access, education, and information access by voice. For example, TTS-based augmentative devices are available for individuals who have lost their voice, and reading machines for the blind have been available for several decades. Third, the approach will make higher-quality TTS more available for smaller devices. For example, voice-based caller ID on low-end mobile telephones is currently not possible due to memory limitations. Fourth, it enables voice adaptation with a minimum of recordings. This will enable building personalized TTS systems for individuals with speech disorders who can only intermittently produce normal speech sounds or for individuals who are about to undergo surgery that will irreversibly alter their speech. The method proffered by Biospeech only requires recordings of valid samples of each of (fewer than 50) phonemes instead of each of (2000 or more) diphones.
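The separate, asynchronous interpolation of formant and spectral balance vectors described for this project can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the sigmoid weight shape, the function names, and the toy vectors are hypothetical and not taken from the project. The re-combination into complex spectral space and the inverse Fourier transform are not shown.

```python
import math


def sigmoid_weights(n_frames, midpoint, slope=1.0):
    """Weight of the second phoneme at each frame, rising from near 0 to near 1.

    The sigmoid shape is an assumption; the source only says the weights are time varying.
    """
    return [1.0 / (1.0 + math.exp(-slope * (t - midpoint))) for t in range(n_frames)]


def interpolate(vec_a, vec_b, weights):
    """Blend two vectors frame by frame: (1 - w) * a + w * b."""
    return [[(1.0 - w) * a + w * b for a, b in zip(vec_a, vec_b)] for w in weights]


def asynchronous_trajectories(formant_a, formant_b, balance_a, balance_b,
                              n_frames, formant_mid, balance_mid):
    """Interpolate the two components with their own weight functions.

    Different midpoints make the formant transition and the spectral balance
    transition happen at different times, i.e. asynchronously.
    """
    formant_traj = interpolate(formant_a, formant_b, sigmoid_weights(n_frames, formant_mid))
    balance_traj = interpolate(balance_a, balance_b, sigmoid_weights(n_frames, balance_mid))
    return formant_traj, balance_traj
```

With `formant_mid=3` and `balance_mid=7` over 11 frames, the formant trajectory is already most of the way to the second phoneme at frame 5 while the spectral balance trajectory has barely started to move.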

  • Objective Methods for Predicting and Optimizing Synthetic Speech Quality

This NSF-funded project focuses on how humans perceive acoustic discontinuities in speech. Current text-to-speech synthesis ("TTS") technology operates by retrieving intervals of stored digitized speech ("units") from a database and splicing ("concatenating") them to form the output utterance. Unavoidably, there are acoustic discontinuities at the time points where the successive speech intervals meet. An unsolved problem is how to predict from the quantitative, acoustic properties of two to-be-concatenated units whether humans will hear a discontinuity. This is of immediate relevance for TTS systems that select units at run time from a large speech corpus. During selection, the system searches through the space of all possible sequences of units that can be used for the utterance and selects the sequence that has the lowest overall objective cost measure, such as the Euclidean distance between the final frame and initial frame of two units. However, research has already shown that this method and related methods do not predict well whether humans will hear a discontinuity. The current research, by being explicitly focused on perceptually optimized objective cost measures, will directly contribute to the perceptual accuracy of cost measures and hence to synthesis quality.
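The baseline selection procedure described above, which searches over unit sequences and minimizes a Euclidean join cost, can be sketched as a small dynamic-programming search. This is an illustration of the conventional cost measure that the project aims to improve on, not the project's own method. The `Unit` class, the function names, and the use of a join cost alone (with no target cost) are assumptions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    name: str
    first_frame: tuple  # acoustic feature vector at the start of the unit
    last_frame: tuple   # acoustic feature vector at the end of the unit


def join_cost(left, right):
    """Euclidean distance between the final frame of one unit and the initial frame of the next."""
    return math.dist(left.last_frame, right.first_frame)


def select_units(candidates):
    """Pick one unit per position so that the summed join cost is minimal.

    candidates: list of non-empty lists of Unit, one list per position in the utterance.
    Returns (total_cost, chosen_units).
    """
    # best[j] holds (cost, path) for the cheapest sequence ending in the j-th candidate.
    best = [(0.0, [unit]) for unit in candidates[0]]
    for position in candidates[1:]:
        new_best = []
        for unit in position:
            cost, path = min(
                ((c + join_cost(p[-1], unit), p) for c, p in best),
                key=lambda item: item[0],
            )
            new_best.append((cost, path + [unit]))
        best = new_best
    return min(best, key=lambda item: item[0])
```

A perceptually optimized cost measure would replace `join_cost` while leaving the search unchanged.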
     
  • Prosody Generation for Child Oriented Speech Synthesis

This NSF-funded project [joint with Alan Black at Carnegie Mellon University and Richard Sproat at the University of Illinois at Urbana-Champaign] focuses on innovative algorithms for generating highly expressive synthetic speech. Generating expressive speech involves three hard research problems. (i) Computation of abstract tags that specify, e.g., which words need emphasis, and phrasing (e.g., where to pause). (ii) Based on these tags, the system has to compute a fundamental frequency contour. (iii) Severe modification of the stored speech fragments ("acoustic units") to obtain these contours. The central goal of the project is to address these research problems and create a TTS system that will make the next generation of TTS-based language remediation systems viable.

  • Creating the Next Generation of Intelligent Animated Conversational Agents

The goal of this NSF-funded project [joint with Ron Cole at the University of Colorado and Javier Movellan at the University of California at San Diego] is to improve reading achievement of children with reading problems by designing computer-based interactive reading tutors that incorporate new speech and language technologies. The reading tutors will help English- and Spanish-speaking children learn to read by providing classroom teachers and reading specialists with tools to instruct and exercise the set of auditory, visual and linguistic skills needed to read: speech discrimination, speech production, phonological awareness, sound-to-letter mappings, vocabulary, fluency and comprehension. The tutors will be designed, tested and refined in collaboration with reading specialists and instructional designers, and tested with children in special education programs in elementary schools in Boulder, Colorado.


II. Biomedical Research Projects

  • Expressive and Receptive Prosody in Autism
This NIH-supported project, led by Jan van Santen and Lois Black, and in collaboration with Rhea Paul and Fred Volkmar at Yale's Child Study Center and Larry Shriberg at the University of Wisconsin's Waisman Center, focuses on automated technologies for assessment of prosodic ability in autism. Autistic Spectrum Disorders (ASD) form a group of neuropsychiatric conditions whose core behavioral features include impairments in reciprocal social interaction, in communication, and repetitive, stereotyped, or restricted interests and behaviors. The importance of prosodic deficits in the adaptive communicative competence of speakers with ASD, as well as for a fuller understanding of the social disabilities central to these disorders, is generally recognized; yet current studies are few in number and have significant methodological limitations. The objective of the proposed project is to detail prosodic deficits in young speakers with ASD through a series of experiments that address these disabilities and related areas of function. Key features of the project include: 1) the application of innovative technology: the study will apply computer-based speech and language technologies for quantifying expressive prosody, for computing dialogue structure, and for generating acoustically controlled speech stimuli for measuring receptive prosody; moreover, all experiments will be delivered via computer to ensure consistency of stimuli and accuracy of recording responses; 2) broad coverage of the dimensions of prosody: all three functions of prosody (grammatical, pragmatic, and affective) will be addressed; expressive and receptive tasks are included; and both contextualized tasks (dialogue, story comprehension and memory) and decontextualized tasks (e.g., vocal affect recognition) will be used; 3) inclusion of neuropsychological assessment and classification methodologies to address within-group heterogeneity and obtain a detailed characterization of the groups; 4) inclusion of two comparison groups: children with typical development and those with Developmental Language Disorder; 5) inclusion of an experimental treatment program to enhance the prosodic abilities of speakers with ASD. A student fellowship for this project is supported by Autism Speaks.

  • In Your Own Voice: Personal Augmentative and Alternative Communication Voices for Minimally Verbal Children with Autism Spectrum Disorders
[Jan van Santen, Lois Black, Nancy Lurie Marks Family Foundation]. Many children with autism who have limited verbal abilities use Augmentative and Alternative Communication (AAC) devices to help them communicate with others. Often, these devices produce speech output. Necessarily, the voice of such a system does not resemble in any way the voice of the child who uses the system. This project is for children who have at least some speech capability, such as saying a few isolated words. The investigators will develop technology that performs a voice transplant of the child's natural voice onto the AAC device, so that the device's voice will sound like the child. The investigators hypothesize that an AAC device with a personalized voice that mimics the child's voice will psychologically reinforce powerful motivational factors and a sense of ownership of communication, so that the frequency and richness of AAC use, and its acceptance by family members and friends, will be enhanced. In addition, as a tool for improving a child's speech capabilities, a system that speaks with a voice similar to the child's own voice is likely to be more effective than a system that speaks with a default synthetic voice, because the computer provides a model that is closer to the child's speech and hence is easier for the child to emulate. To create the system, the investigators will build on the most recent voice transformation, speech synthesis, and other speech technologies that have been developed in their lab.

  • Automated Measurement of Dialogue Structure in Autism
[Brian Roark, Lois Black, Jan van Santen, AutismSpeaks]. This project seeks to bring the power of machine-based sensing and computation to improve the study of speech patterns in individuals with autism. By combining technologies stemming from natural language processing methods and prosodic analysis methods, the investigators expect to find aspects of speech that could be used as clinical markers. Manual measures of narrative coherence are not only difficult to obtain and extremely time-consuming; it is also unclear whether a human coder can even detect the statistical degree of semantic similarity that a machine can. This research will analyze recordings being collected from two narrative recall tests that have the potential to uncover a wider range of speech differences between individuals with ASD and others. The hope is that this will clinically define children with ASD relative to typically developing children and differentiate ASD from other groups who also have communication impairments, i.e., children with developmental language delay (DLD), as well as differentiate speech characteristics or markers that might better discriminate subtypes within the ASD umbrella (e.g., HFA vs. Asperger's). The investigators expect that speech and language technologies will not only make critical diagnostic speech features easier to document but may also uncover distinguishing speech features in autism and autistic subtypes that have previously gone undetected.

  • ERP Based Communication Device for Nonverbal Children on the Autism Spectrum
[Deniz Erdogmus, Lois Black, Nancy Lurie Marks Family Foundation]. Children with Autism Spectrum Disorders (ASD) exhibit varying levels of communication abilities. In this project, the investigators will address the communication needs of the subset that: 1) lack expressive speech and language; 2) lack the ability to operate a keyboard, pointing device, or other typical assistive interface; and 3) are assumed to have adequate cognition, literacy, and receptive language understanding. This research aims to develop a communication system for such children. The resulting technology could also benefit other children and adults with adequate cognition but limited communication options. The investigators will develop an assistive communication facilitation device referred to as the RSVP Keyboard. It unites three technologies: 1) rapid serial visual presentation (RSVP, with individually adjustable presentation rates) of letters/words/phrases; 2) a yes/no intent detection mechanism based on detecting evoked-response potentials (ERP) in the brain to determine which target letter or letters the child wants to convey; 3) a dynamic sequencing optimization procedure, based on a statistical language model, that computes which letter needs to be presented next to take advantage of regularities in language. The system will operate by showing the sequence of candidate letters on the screen as well as previously typed text, such that words and phrases are formed naturally by adding selected letters. The first goal is to test the viability of the basic concept of facilitated communication through the RSVP Keyboard System. Upon demonstration of feasibility through neuroimaging and statistical analysis of brain responses to RSVP stimulus sequences, the investigators will evaluate the performance of typically developing children and nonverbal children with ASD in three interactive cognitive tasks.
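The idea of using a statistical language model to decide which letter to present next can be illustrated with a minimal sketch. The letter-bigram model, the add-one smoothing, the tiny word list, and all function names below are assumptions chosen for illustration; the project description does not specify the model, and ERP-based intent detection is not modeled here at all.

```python
from collections import Counter, defaultdict

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def train_bigram_counts(words):
    """Count how often each letter follows the previous one ("^" marks a word start)."""
    counts = defaultdict(Counter)
    for word in words:
        padded = "^" + word
        for prev, nxt in zip(padded, padded[1:]):
            counts[prev][nxt] += 1
    return counts


def presentation_order(counts, typed, alphabet=ALPHABET):
    """Order candidate letters so the most probable next letter is presented first.

    Probabilities use add-one smoothing so unseen letters remain selectable.
    Ties are broken alphabetically.
    """
    prev = typed[-1] if typed else "^"
    seen = counts.get(prev, Counter())
    total = sum(seen.values()) + len(alphabet)
    probs = {ch: (seen[ch] + 1) / total for ch in alphabet}
    return sorted(alphabet, key=lambda ch: (-probs[ch], ch))
```

Given counts trained on a word list in which "t" is usually followed by "h", calling `presentation_order(counts, "t")` places "h" at the front of the presentation sequence, so a likely target is reached after fewer stimuli.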

  • Comparing Standardized and Spontaneous Measures of Language
[Amy Costanza-Smith (Child Development and Rehabilitation Center, OHSU), Lois Black, Jan van Santen, Medical Research Foundation of Oregon].  This research focuses on the markers of childhood language disorders, and on the possibility of automated scoring of those markers. The manner in which testing for language disorders typically takes place - standardized assessments administered by a clinician - often bears little resemblance to real-world communication. Instead, it is proposed to use children's real-life utterances to develop new markers - e.g. vocabulary, grammar, number of errors - that will improve the accuracy of diagnosis. The ultimate goal is to use recent advances in speech technology to automate the processing of these utterances, currently performed through transcription and manual analysis.


  • Diagnostic Markers for Childhood Apraxia of Speech

This NIH-supported project, led by John-Paul Hosom (PI) and in collaboration with Larry Shriberg at the University of Wisconsin's Waisman Center, focuses on automated methods for assessment of Childhood Apraxia of Speech. This disorder is highly controversial due to a lack of consensus on the features that define it and the etiologic conditions that explain its origin. The term Suspected Apraxia of Speech (sAOS) has been proposed as an interim term for this putative clinical entity. The point prevalence of sAOS in young children has been estimated at approximately 0.1%. The long-term objective of this proposal is to develop a valid, reliable, and efficient means to classify children as positive for sAOS. In addition to the contributions to theoretical explication of AOS, the software-based diagnostic tools resulting from this work will allow any certified speech-language pathologist to determine if a child's speech includes prosodic features that fall within a 95% confidence interval supporting the diagnosis of sAOS. The aim for this first period of planned programmatic research is to develop automated diagnostic markers for sAOS with clinically adequate sensitivity and specificity (> 90% positive and negative likelihood ratios). The four specific aims are: (a) to automate and improve the sensitivity and specificity of two existing (manually derived) prosodic markers, (b) to develop four additional automatic, prosody-based diagnostic markers, (c) to derive a single diagnostic index based on a statistical derivative from the six individual markers, and (d) to validate the composite diagnostic marker using classification data obtained from expert clinical researchers. Procedures are divided into four phases. In Year 1, automated versions of existing markers will be developed that determine speech-event locations using automatic speech recognition (ASR). Based on two pilot studies, this technique is expected to yield results equivalent to published data. The sensitivity of the markers will be improved by methods including normalizing by speaking rate and vowel identity. In Year 2, new automated markers will be created based on ASR and speech-signal processing techniques. These markers will measure variation in interstress timing, linguistic rhythm, speaking rate, and glottal-source characteristics. In the first part of Year 3, results from all six markers will be combined into a single diagnostic index using multi-layer perceptrons. In the latter part of Year 3, per-child errors will be evaluated to determine relationships between specific prosodic factors and the diagnosis of sAOS, providing insight into the features and definition of sAOS.

  • Voice Transformation for Dysarthria - Phase I
[Jan van Santen, PI; Alexander Kain, co-PI; NIH]. Software will be developed in a collaborative project with BioSpeech Inc., supported by the NIH, that transforms speech compromised by dysarthria into easier-to-understand and more natural-sounding speech. The software will reside on laptop computers, with microphone input and amplified speaker or line output. Such software and hardware solutions will assist individuals with dysarthria to better communicate by voice, whether face-to-face or by telephone; they will also help these individuals when interacting with voice-controlled services and devices, which are increasingly popular. The system operates in "Interpreter Mode", meaning that output will take place after a brief processing delay once the speaker has completed an utterance. The software is based on a multi-step formant re-synthesis process: (i) robust extraction of formant, energy, spectral balance, and pitch trajectories from input speech; (ii) modification of extracted trajectories by imposition of smoothness and shape-based constraints, and by bringing these trajectories in closer proximity to trajectories of normal speech; (iii) conversion of the trajectories into a speech signal by formant synthesis. Results obtained with a prototype, personal computer based system show that this process is robust, enhances intelligibility, and completely eliminates "vocal fry", i.e., distortions caused by irregularities in the temporal pattern of the vocal folds. In Phase I, the core algorithms performing these steps will be improved and extended, and the software will be ported to a pocketable computer; the resulting system will be evaluated on multiple speakers and listeners; and feedback will be obtained from potential users and their partners about desired features, usability, and functionality.
In Phase II, acceptable processing delays will be achieved using known methods for optimizing memory and processing speed; further enhancement capabilities will be added, and the system will be evaluated. The currently targeted product will be the first in a family of speech enhancement products with continually expanding functionality, capitalizing on ongoing algorithmic and hardware improvements. Usage of standard hardware and software platforms, which in turn are compatible with a wide range of headsets and wearable amplified speakers or telephones, puts this software in a strong competitive position. A large percentage of the more than 2.5 million adult Americans with significant disability due to chronic neurological impairment present with dysarthria or speech impairment as one of their disabling conditions. There are no cures for speech impairments. Dysarthric individuals report losses to employment, educational opportunities, social integration, and quality of life. Individuals are taught strategies that compensate for their impairments, but the isolation caused by communication impairment is pervasive. The project goal is to develop a system that uses a wearable computer to transform speech compromised by dysarthria into easier-to-understand and more natural-sounding speech, and will thereby enable dysarthric individuals to communicate more effectively by telephone or in face-to-face contexts.

  • User Adaptation of AAC Device Voices - Phase I
[Jan van Santen, PI; Esther Klabbers, co-PI; NIH]. A wide range of individuals cannot communicate by voice. Voice-enabled Augmentative and Alternative Communication (AAC) devices are often the only channel available by which these individuals can communicate. While many voice-enabled AAC devices are currently available, they lack the important ability to generate customized speech that mimics aspects of the user's past or intermittently available speech. Modern "concatenative" speech synthesis technology can mimic a given speaker's voice by excising speech fragments from a recorded speech database ("acoustic inventory") and recombining these into output speech using sophisticated algorithms. It requires, however, a large amount of recordings and a high degree of consistency of pronunciation from the speaker. Many AAC users cannot meet these requirements because they have already lost the capability to speak or they cannot speak with adequate consistency of pronunciation. A new type of technology, voice transformation (VT) technology, is available that can transform speech spoken by a "source" speaker into speech that is perceived as spoken by a specific "target" speaker. To tune the transformation system, parallel "training recordings" of the same text are needed from the source and target speakers. The amount of training recordings is far less than what is needed for a high-quality acoustic inventory. In this joint project with BioSpeech Inc., supported by the NIH, we propose to use VT in combination with speech synthesis to convert the synthesis system's acoustic inventory into an acoustic inventory that mimics the target speaker's voice. The training recordings can consist of old home videos, or fragmented recordings produced during periods of intact speech, provided that they contain at least one sample of each phoneme. In Phase I, we will develop and evaluate a VT-based synthesis system.
The project will use high-quality and home-video quality recordings from male and female adults and children to create limited acoustic inventories (adequate to generate a specific set of test sentences) and VT training recordings. Perceptual experiments will be conducted to evaluate voice quality and perceived speaker identity. Phase II will focus on developing complete acoustic inventories for several canonical speakers that will be selected to cover a range of speaker characteristics, and on producing portable, user-friendly software. The anticipated commercial offering consists of (i) software components to be licensed to AAC vendors and (ii) a service consisting of collection and processing of recordings and creation of personalized acoustic inventories. Speech communication ability is impaired or absent in millions of Americans due to neurological disorders and diseases and to trauma, including autism, Parkinson's disease, and stroke. Augmentative and Alternative Communication (AAC) devices that are operated via switches, keyboards, and a broad range of other input devices, and that have synthetic speech as output, are often the only manner in which these individuals can communicate. Without AAC devices, these individuals may suffer from severe social and psychological isolation, and may be unable to lead productive lives. A psychologically important feature that no currently available system has is the ability to speak with the user's voice, i.e., the ability to produce speech that mimics the individual's pre-morbid speech or speech that the individual may be able to intermittently produce. The proposed project will use voice transformation (VT) technology to accomplish this goal.
VT technology requires recordings of the user to be available, but there is substantial flexibility as to the nature and quantity of these recordings; they may consist of home videos or of fragmentary speech, provided that at least some samples are available of each speech sound in the language. The goal of the application is to develop a synthetic voice for an AAC system that sounds like the individual using the system (before they lost the ability to speak), without requiring very much recorded data from the original talker. The system works by first creating a synthetic "base" voice (or set of base voices) using professional actors who must provide a fairly large inventory of speech data. Using the base voice and a small sample from the target talker (i.e., containing at least one instance of each phoneme), a new synthetic voice is created by essentially modulating parameters in the base voice so that it takes on characteristics of the target talker. The ability to create a voice that sounds like the original talker without much data from the original talker would be a significant advantage.

  • Novel Computerized Behavioral Assessment Methods for Attention Deficit Hyperactivity Disorder

This internally funded exploratory project, conducted by Lois Black, Holly Jimison (Biomedical Engineering Department and Department of Medical Informatics and Clinical Epidemiology), Leeza Maron (Psychiatry), Misha Pavel (Biomedical Engineering Department), and Jan van Santen (PI), focuses on building a computerized assessment system that has the following features:
    1. A clear understanding of which neuropsychological functions are measured.
    2. Interactivity (the computer adapts its behavior instantly to the subjects’ responses, thereby being able to operate at a level of optimal sensitivity).
    3. Instantaneous and timed measurement of a range of behavioral responses including the force dynamics of button pushing and eye movements.
    4. Mathematical modeling of the underlying cognitive processes in order to derive “purer” measures of the neuropsychological functions.
    5. A more motivating and shorter assessment process.


  • Pilot Study for Word Recognition of Children with Speech Delay
John-Paul Hosom, PI, Medical Research Foundation of Oregon. Children with speech delay of unknown origin (hereafter referred to as “speech delay”) are characterized by a number of language problems, including reduced vocabulary size, atypical grammar, and highly unintelligible speech. The long-term objective of the proposed research is to enable children with speech delay to communicate more effectively. This proposal presents only the first step in realizing this long-term objective. In this first step, speech data from a limited number of children with speech delay will be analyzed to evaluate the feasibility of automatically identifying acoustic features in the speech signal that may be used to identify intended phonemes. The hypothesis of the proposed research is that there are correlations between intended phonemes and certain acoustic features of children with speech delay, when the intended phoneme is not the same as the phoneme actually spoken. Such correlations could then be used to assist in the automatic word recognition of an intended utterance.

  • Making Dysarthric Speech Intelligible

[Jan van Santen, PI]. This NSF-funded project [joint with Melanie Fried-Oken at the Child Development and Rehabilitation Center at the Oregon Health & Science University] will develop new algorithms that will enable dysarthric individuals to be more easily understood. Currently available devices are essentially spectral filters and amplifiers that enhance certain parts of the spectrum. While these can help with certain types of dysarthria, many dysarthric persons suffer from speech problems that require forms of speech modification that are much more profound and complex, such as: irregular sub-glottal pressure, resulting in loudness bursts that can be difficult to adjust to; absence, or poor control, of voicing; systematic mispronunciation of certain phoneme groups, resulting in certain sounds becoming indistinguishable or unrecognizable; variable mispronunciation; and poor prosody (pitch control, timing, and loudness). For these difficult problems, new approaches are needed that do not merely filter the speech signal but analyze it at acoustic, articulatory, phonetic, and linguistic levels.
     

  • Differentiating between Autism Spectrum Disorder and Developmental Language Disorders via Story Recall Analysis
Brian Roark, PI, Medical Research Foundation of Oregon. The analysis of elicited spoken language samples plays a key role in the diagnosis of a wide range of linguistic and cognitive impairments, from developmental impairments, such as Developmental Language Disorders (DLD) or Autism Spectrum Disorder (ASD), to degenerative cognitive impairments, such as dementia. Perhaps the most popular means of eliciting such a sample is through a narrative recall task, where the subject is told a story of sufficient length to preclude verbatim recall, and then asked, either immediately or after some delay, to retell the story they have been told. Most clinical uses of such tests involve a very simple scoring mechanism, in which the recall of specific items in the story is noted by the administering clinician (as the story is being re-told), and summary scores are calculated based on the number of these recalled items. The resulting summary score fails to capture much of the potentially relevant information available in the spoken language sample, e.g., grammatical complexity, pause frequency, or the ordering of recalled items. The long-term objective of the proposed work is to identify multiple complex markers, derived from open and cued responses to narrative recall tasks, for differentiating between: (1) children broadly diagnosed with ASD; (2) children broadly diagnosed with DLD; and (3) normally developing children. In the proposed study, narrative retellings produced by a relatively limited number of children will be analyzed to assess the feasibility of automatically extracting markers from the spoken language samples that effectively discriminate between the three groups.

  • Automatic spoken language analysis for detecting cognitive impairment

  • [Brian Roark, PI]. Clinical research into Alzheimer's disease (AD) and the mild cognitive impairment (MCI) that precedes its full onset is increasingly focused on early diagnosis and treatment that can delay or even prevent full onset of AD. Effective diagnosis requires differentiating between changes in cognitive and linguistic abilities that occur during normal aging and those that are due to impairment. Both manual linguistic analyses of spoken language samples and orally administered clinical exams are effective but costly methods for discriminating between healthy and MCI subjects. For widespread testing of the growing elderly population for markers of MCI, automation of testing procedures will be required.
        The objective of this NIH-Roybal-funded project is to develop statistical speech and language analysis techniques to automatically extract features from spoken language samples recorded during clinical examinations. Healthy and MCI elderly subjects of ongoing studies at the Layton Center of OHSU take full neuropsychological examinations annually for life. We will request their permission to record and analyze these sessions, which include several tests of particular interest, including a delayed story recall test and a picture description task. We will transcribe the words and annotate syntactic structure for selected tests, and develop algorithms for automatically deriving features from the spoken language samples. These automatically derived speech- and language-based features will then be used to build classifiers for discriminating between healthy and MCI subjects. In addition to test automation, the statistical speech and language processing techniques will provide two benefits of primary importance: inclusion of approximations to previously researched manually derived features; and the use of unexplored features derived from statistical characteristics of the samples, such as a number of entropy-based features.
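As a rough illustration of what an entropy-based feature could look like, the sketch below computes the Shannon entropy of the word distribution in a single transcript. This is an assumption made for illustration only: the project description does not say which entropy-based features are used, and the function name `unigram_entropy` and the whitespace tokenization are hypothetical.

```python
import math
from collections import Counter


def unigram_entropy(transcript: str) -> float:
    """Shannon entropy, in bits per word, of the word distribution in a transcript.

    Hypothetical feature for illustration; tokenization is a plain
    lowercase whitespace split, which a real system would likely refine.
    """
    words = transcript.lower().split()
    if not words:
        return 0.0
    counts = Counter(words)
    total = len(words)
    # Sum p * log2(1/p) over the distinct words observed in the transcript.
    return sum((c / total) * math.log2(total / c) for c in counts.values())
```

A transcript that repeats a single word yields an entropy of 0, while one that uses many different words equally often yields a higher value. A feature like this could be one input among many to a classifier.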

  • Automated Test of Word Recognition - Phase II
[Robert Margolis, University of Minnesota, PI]. Over 5 million word recognition tests are administered annually by audiologists in the United States with an associated cost of more than $100 million. These tests are currently performed manually by highly trained audiologists. This NIH-funded project describes the Phase II development of automated clinical speech recognition tests using clinical test recordings and an automated speech recognition system to score the subjects' responses. A method for automatically interpreting the test scores will also be evaluated. The objectives are to increase the accuracy and efficiency of these clinical tests, substantially reduce the cost, and provide an objective, automatic, evidence-based method for interpreting the results. The automated speech recognition test in combination with the automated pure tone audiogram (currently an STTR Phase II project) will perform diagnostic testing of a majority of audiology patients, freeing the audiologists' time for activities that require their training and skill. Contemporary changes in training and reimbursement patterns create a high demand for automated clinical procedures. The automated procedures are implemented on existing commercial audiometers with a personal computer that controls the audiometer delivery and routing of stimuli. Phase I results were obtained with automatic speech recognizers that were trained on a limited number of subjects (n=9). Estimates of the agreement between human and machine scoring ranged from 82% to 93%. Additional refinements with benefits that are predictable from prior experience will increase recognizer performance to a level that equals or exceeds human-human agreement and provide the basis for efficient and accurate clinical tests. In Phase II, an automatic speech recognition threshold test will be compared to the manual method used in routine clinical practice.
Two different recognizer scoring strategies will be developed, one that requires more test time but is independent of individual speaker differences and is easily adaptable to other languages, and one that requires less time but may not be applicable to all patients. A pilot study will test the method on a Spanish-language speech-recognition test.
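
One simple way to quantify agreement between human and machine scoring is item-level percent agreement, sketched below. The project description does not state which agreement measure was used, so this is only an assumed illustration; the function name `percent_agreement` and the per-item correct/incorrect representation are hypothetical.

```python
from typing import Sequence


def percent_agreement(human: Sequence[bool], machine: Sequence[bool]) -> float:
    """Percentage of test items on which two scorers gave the same score.

    Each element marks one test word as scored correct (True) or
    incorrect (False). Hypothetical helper for illustration only.
    """
    if len(human) != len(machine):
        raise ValueError("both scorers must score the same number of items")
    if not human:
        raise ValueError("at least one scored item is required")
    matches = sum(h == m for h, m in zip(human, machine))
    return 100.0 * matches / len(human)
```

For example, if the audiologist and the recognizer agree on three of four test words, the function returns 75.0. The same function applied to two human scorers gives a human-human baseline for comparison.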

  • Speech Supplemented Word Prediction Program - Phase II
[Thomas Jakobs, InvoTek, PI]. Commercial speech recognition software offers many people with physical limitations an important computer access method. While this access method is reasonably reliable for people with typical speech, people with motor speech disorders (dysarthria) are presently not able to use this technology reliably. The purpose of this NIH-funded research is to provide these people with a unique assistive-device access method that utilizes their speech. We will accomplish this by continuing to develop a Speech Supplemented Word Prediction Program (SSWPP) that enables people with dysarthria to use their speech capabilities to interact with personal computers, with an emphasis on assisted writing. The central element of the SSWPP is custom speech-recognition software used in conjunction with word prediction. The feasibility results for the SSWPP developed during Phase I are exciting. The average keystroke savings achieved by people with dysarthria on typical sentences was 68%. Commercially available word prediction programs achieved no better than 47% keystroke savings on the same text. Phase II design activities include improving the speech recognition engine, developing an optimized microphone interface, integrating the SSWPP into Microsoft Word, and developing a speech-to-text display for use in face-to-face communication. People with disabilities will evaluate the new SSWPP. The Speech Supplemented Word Prediction Program is a tool for people with disabilities who also have difficult-to-understand speech. It enables them to use their speech to reduce the amount of work required to enter text into a computer and to communicate verbally more effectively.
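
Keystroke savings is commonly defined as the percentage of keystrokes a user avoids compared with typing every character of the text. The sketch below assumes that definition; the project description does not spell out how the figures above were computed, and the function name `keystroke_savings` is hypothetical.

```python
def keystroke_savings(chars_in_text: int, keystrokes_used: int) -> float:
    """Percentage of keystrokes saved relative to typing every character.

    Assumes the common definition: 100 * (1 - keystrokes_used / chars_in_text).
    Hypothetical helper for illustration only.
    """
    if chars_in_text <= 0:
        raise ValueError("chars_in_text must be positive")
    if keystrokes_used < 0:
        raise ValueError("keystrokes_used must not be negative")
    return 100.0 * (chars_in_text - keystrokes_used) / chars_in_text
```

Under this definition, a 68% saving means that a 100-character sentence was entered with 32 keystrokes, while a 47% saving means the same sentence took 53 keystrokes.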

  • Automated voice-based cognitive assessment and spoken language-based markers for neurodegenerative diseases
This project (Tamara Hayes, PI), funded under a new program of Intel's Digital Health Group called the Behavioral Assessment and Intervention Commons, is aimed at initiating and accelerating research into behavioral markers of disease, such as changes in walking, speech, and performance on computer games, that eventually translate into health-related products and services. CSLU is developing voice-enabled, automated, "kiosk"-based versions of standard neurocognitive tasks (e.g., digit span) and speech- and language-based markers for neurodegenerative diseases. The kiosk is also being developed in the context of the Alzheimer's Disease Cooperative Study (ADCS) program.