Decoding Feedforward and Feedback Speech Control in Human Cortex
Ran Wang
Xupeng Chen
Amirhossein Khalilian-Gourtani
Leyao Yu
Adeen Flinker
Yao Wang


Comparison of original and decoded speech produced by the model. Spectrograms of decoded (left) and original (right) speech for exemplar words.


Abstract

Decoding speech from human neural activity can enable neuroprosthetics, and recent advances in neural speech decoding have achieved notable success using deep learning. The scarcity of training data, however, hinders complete brain-computer interface (BCI) applications. In this work, we propose a novel speech neural decoding framework that maps neural signals to either stimulus (perceived) speech or produced speech during various language tasks. Through our network design, we tease apart feedforward and feedback speech control in cortex. Our proposed model produces natural speech, with quantitative metrics that substantially exceed both our previous work and the state of the art. Furthermore, we evaluate the contribution of each cortical region to decoding performance, revealing the feedforward and feedback contributions for decoding produced speech, as well as the feedback contributions for decoding perceived speech. Our analysis provides evidence of a mixed cortical organization: prefrontal regions, classically believed to be involved in motor planning and execution, were engaged both in predicting future motor actions (feedforward) and in processing the perceived feedback from speech (feedback). Similarly, within the superior temporal gyrus, which is classically involved in speech perception, anterior regions predicted future speech (feedforward) while more posterior regions processed reafferent speech feedback. Our findings are the first to systematically disentangle the dynamics of feedforward and feedback processing during speech, and they provide evidence for a surprisingly mixed cortical architecture within temporal and frontal cortices. Our approach offers a promising new avenue for using deep neural networks in neuroscience studies of complex dynamic behaviors.



Generated Audio Demo


Audio types:

  • Original
  • Decoded
  • Merged

We provide a selection of audio samples. The "Decoded" audio is speech decoded from ECoG recordings (ECoG-to-Speech). The "Merged" audio combines the original and decoded audio within each trial.
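As an illustration, a merged clip can be built by playing the original utterance and its decoded counterpart back to back. This is a minimal sketch assuming that layout, mono WAV input, and hypothetical file names; it is not necessarily the exact procedure used to produce the demo clips.

    import numpy as np
    import soundfile as sf  # pip install soundfile

    def merge_trial(original_wav, decoded_wav, out_wav):
        """Concatenate the original utterance and its decoded counterpart,
        separated by a short silence, so the two can be compared by ear."""
        orig, sr = sf.read(original_wav)
        dec, _ = sf.read(decoded_wav)
        gap = np.zeros(int(0.3 * sr))  # 300 ms pause between the two clips
        sf.write(out_wav, np.concatenate([orig, gap, dec]), sr)

    # Hypothetical file names for one trial.
    merge_trial("trial01_original.wav", "trial01_decoded.wav", "trial01_merged.wav")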



Paper and Supplementary Material

Ran Wang, Xupeng Chen, Amirhossein Khalilian-Gourtani, Leyao Yu, Adeen Flinker, and Yao Wang.
Decoding Feedforward and Feedback Speech Control in Human Cortex.

(HOSTED HERE; the password is ecognyu2021.)


Model Architecture


Overall structure of the ECoG-to-Speech decoding pipeline.

A convolutional ECoG decoder maps the high-gamma components of the recorded ECoG signals to a set of speech parameters. These parameters then drive a speech synthesizer that generates the speech spectrogram and the corresponding speech waveform.
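A minimal PyTorch sketch of this two-stage pipeline follows. The channel counts, kernel sizes, and the 18-dimensional speech-parameter set are illustrative assumptions, and the synthesizer is reduced to a single linear projection onto mel-spectrogram frames; the actual model uses a dedicated speech synthesizer rather than this toy stand-in.

    import torch
    import torch.nn as nn

    class ECoGDecoder(nn.Module):
        """Convolutional decoder mapping high-gamma ECoG features
        (electrodes x time) to per-frame speech parameters.
        Layer sizes and the 18-parameter output are illustrative."""
        def __init__(self, n_electrodes=64, n_params=18):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv1d(n_electrodes, 128, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.Conv1d(128, 128, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.Conv1d(128, n_params, kernel_size=1),
            )

        def forward(self, ecog):           # ecog: (batch, electrodes, time)
            return self.net(ecog)          # -> (batch, n_params, time)

    class SpeechSynthesizer(nn.Module):
        """Toy differentiable synthesizer: projects the decoded speech
        parameters to mel-spectrogram frames."""
        def __init__(self, n_params=18, n_mels=80):
            super().__init__()
            self.proj = nn.Linear(n_params, n_mels)

        def forward(self, params):                    # (batch, n_params, time)
            return self.proj(params.transpose(1, 2))  # -> (batch, time, n_mels)

    decoder, synthesizer = ECoGDecoder(), SpeechSynthesizer()
    ecog = torch.randn(1, 64, 200)          # 1 trial, 64 electrodes, 200 frames
    spectrogram = synthesizer(decoder(ecog))
    print(spectrogram.shape)                # torch.Size([1, 200, 80])

The low-dimensional speech-parameter bottleneck between the two stages is what makes the decoded representation compact and interpretable before synthesis.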



Distribution of Feedback and Feedforward Contributions


Spatio-temporal receptive field of feedforward and feedback processing during speech production.
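Per-electrode contribution maps like the one summarized above can be estimated with a generic occlusion analysis: zero out one electrode's channel, re-run the decoder, and measure how much the reconstruction error grows. The sketch below reuses the decoder and synthesizer sketched in the Model Architecture section; it illustrates a standard attribution technique, not necessarily the authors' exact procedure.

    import torch

    def electrode_contribution(decoder, synthesizer, ecog, target_spec):
        """Occlusion analysis: contribution of each electrode, measured as
        the increase in mean-squared spectrogram error when that channel
        is zeroed out."""
        with torch.no_grad():
            base = torch.mean((synthesizer(decoder(ecog)) - target_spec) ** 2).item()
            scores = []
            for e in range(ecog.shape[1]):      # iterate over electrode channels
                occluded = ecog.clone()
                occluded[:, e, :] = 0.0
                err = torch.mean((synthesizer(decoder(occluded)) - target_spec) ** 2).item()
                scores.append(err - base)       # larger = electrode mattered more
        return scores

Occluding time windows instead of channels would yield a temporal profile, and combining both axes gives a spatio-temporal map analogous to the figure above.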