Neural Speech Decoding

A Neural Speech Decoding Framework Leveraging Deep Learning and Speech Synthesis


Our Paper is Online in Nature Machine Intelligence!

Decoding human speech from neural signals is essential for brain-computer interface (BCI) technologies restoring speech in populations with neurological deficits. However, it remains a highly challenging task, compounded by the scarce availability of neural signals with corresponding speech, data complexity, and high dimensionality. Here, we present a novel deep learning-based neural speech decoding framework that includes an ECoG Decoder that translates electrocorticographic (ECoG) signals from the cortex into interpretable speech parameters and a novel differentiable Speech Synthesizer that maps speech parameters to spectrograms. We developed a companion speech-to-speech auto-encoder consisting of a speech encoder and the same speech synthesizer to generate reference speech parameters to facilitate the ECoG decoder training. This framework generates natural-sounding speech and is highly reproducible across a cohort of 48 participants. Our experimental results show that our models can decode speech with high correlation even when limited to only causal operations, which is necessary for adoption by real-time neural prostheses. Lastly, we successfully decode speech in participants with either left or right hemisphere coverage, which could lead to speech prostheses in patients with deficits resulting from left hemisphere damage.
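The restriction to causal operations can be illustrated with a toy example. The sketch below is not the paper's ECoG Decoder. It is a minimal pure-Python illustration in which the function names (`causal_conv1d`, `centered_conv1d`), the kernel, and the signal are all hypothetical. It shows the property that matters for real-time use: a causal filter's output at time t depends only on samples up to t, whereas a centered (non-causal) filter also reads future samples.

```python
from typing import List, Sequence


def causal_conv1d(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Causal 1-D convolution (illustrative, hypothetical helper).

    kernel[0] weights the current sample, kernel[1] the previous one, and so on.
    Samples before the start of the signal are treated as zero.
    """
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            if t - k >= 0:  # only the present and the past are visible
                acc += w * signal[t - k]
        out.append(acc)
    return out


def centered_conv1d(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Non-causal, zero-padded convolution centered on the current sample."""
    half = len(kernel) // 2
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t + half - k  # reaches up to `half` samples into the future
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


# Toy input standing in for one feature channel over time (made-up values).
x = [0.0, 1.0, 0.0, 2.0, 0.0, 0.0]
kernel = [0.5, 0.3, 0.2]

# Same signal, but with the final (future) sample changed.
x_future_changed = x[:-1] + [9.0]

# Causal outputs before the last time step are unaffected by the change.
causal_unchanged = (
    causal_conv1d(x, kernel)[:-1] == causal_conv1d(x_future_changed, kernel)[:-1]
)

# The centered filter at t = 4 already "sees" the sample at t = 5.
centered_changed = (
    centered_conv1d(x, kernel)[4] != centered_conv1d(x_future_changed, kernel)[4]
)

print("causal outputs unaffected by future sample:", causal_unchanged)
print("centered output affected by future sample:", centered_changed)
```

A streaming prosthesis cannot wait for future neural samples, so only the first kind of operation is usable frame by frame. The non-causal variant is applicable only when the whole recording is available offline.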

Speech Decoding Examples

[Media table. Columns: Hybrid-density, Low-density, Left hemisphere (low-density), Right hemisphere (low-density). Rows: Decoded, Original/Decoded.]
Video Demo

Cite Our Work

@article{chen2023neural,
    title={A Neural Speech Decoding Framework Leveraging Deep Learning and Speech Synthesis},
    author={Chen, Xupeng and Wang, Ran and Khalilian-Gourtani, Amirhossein and Yu, Leyao and 
           Dugan, Patricia and Friedman, Daniel and Doyle, Werner and Devinsky, Orrin and 
           Wang, Yao and Flinker, Adeen},
    journal={Nature Machine Intelligence},
    year={2024},
    publisher={Nature Publishing Group UK London}
}