Special Sessions & Challenges

The Organizing Committee of INTERSPEECH 2021 proudly announces the following special sessions and challenges.

Special sessions and challenges focus on ‘special’ topics that may not be covered in regular conference sessions.

Papers must be submitted following the same schedule and procedure as regular papers, and they undergo the same review process by anonymous, independent reviewers.

Speech Recognition of Atypical Speech

While speech recognition systems generally work well on the average population with typical speech characteristics, performance on subgroups with unique speaking patterns is usually significantly worse.

Speech that contains non-standard speech patterns (acoustic-phonetic, phonotactic, lexical, and prosodic patterns) is particularly challenging, both because of the small population with these speech patterns and because of the generally higher variance of such speech. In the case of dysarthric speech, which is often correlated with mobility or other accessibility limitations, the accuracy of existing speech recognition systems is often particularly poor, rendering the technology unusable for many speakers who could benefit the most.

In this oral session, we seek to promote interdisciplinary collaborations between researchers and practitioners addressing this problem, to build community and stimulate research. We invite papers analyzing and improving systems dealing with atypical speech.

Topics of interest include, but are not limited to:

  • Automatic Speech Recognition (ASR) of atypical speech
  • Speech-to-Speech conversion/normalization (e.g. from atypical to typical)
  • Voice enhancement and convergence to improve intelligibility of spoken content of atypical speech
  • Automated classification of atypical speech conditions
  • Robustness of speech processing systems for atypical speech in common application scenarios
  • Data augmentation techniques to deal with data sparsity
  • Creating, managing the quality of, and sharing datasets of atypical speech
  • Multi-modal integration (e.g. video and voice) and its application

  • https://sites.google.com/view/atypicalspeech-interspeech2021
  • Jordan R. Green, MGH Institute of Health Professions, Harvard University
  • Michael P. Brenner, Harvard University, Google
  • Fadi Biadsy, Google
  • Bob MacDonald, Google
  • Katrin Tomanek, Google

Oriental Language Recognition

Oriental languages are rich and complex. With their great diversity in terms of both acoustics and linguistics, oriental languages are a treasure for multilingual research. The Oriental Language Recognition (OLR) challenge has been conducted for five years with great success and has demonstrated many novel and interesting techniques devised by the participants.

The main goal of this special session is to summarize the technical advances of OLR 2020, but it welcomes all submissions related to language recognition and multilingual speech processing.

  • http://cslt.riit.tsinghua.edu.cn/mediawiki/index.php/Interspeech_2021_Special_Session
  • Dong Wang (Tsinghua University)
  • Qingyang Hong (Xiamen University)
  • Xiaolei Zhang (Northwestern Polytechnical University)
  • Ming Li (Duke Kunshan University)
  • Yufeng Hao (Speechocean)

Far-field Multi-Channel Speech Enhancement Challenge for Video Conferencing (ConferencingSpeech 2021)

The ConferencingSpeech 2021 challenge is proposed to stimulate research in multi-channel speech enhancement, aiming at processing far-field speech captured by microphone arrays in video conferencing rooms. Targeting real video conferencing applications, the ConferencingSpeech 2021 challenge database is recorded from real speakers. The number of speakers and the distances between speakers and microphone arrays vary with the size of the meeting room. Multiple microphone arrays with three different geometric topologies are placed in each recording environment.

The challenge will have two tasks:

  • Task 1 is multi-channel speech enhancement with a single microphone array, focusing on practical applications with a real-time requirement.
  • Task 2 is multi-channel speech enhancement with multiple distributed microphone arrays. This is a non-real-time track without constraints, so participants can explore any algorithms to obtain high speech quality.

To keep the focus on the development of algorithms, the challenge requires a closed training condition: only the provided lists of open-source clean speech datasets and noise datasets may be used for training. In addition, the challenge provides a development set, scripts for simulating the training data, and baseline systems for participants to develop their systems. The final ranking of the challenge will be decided by subjective evaluation, performed using Absolute Category Ratings (ACR) to estimate a Mean Opinion Score (MOS) through the Tencent Online Media Subjective Evaluation platform.
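
The final ranking thus reduces to averaging listeners' ACR ratings into a MOS. As a minimal illustration of that computation (the ratings below are hypothetical, and the Tencent evaluation platform itself is not public):

```python
import statistics

def mos(acr_ratings):
    """Mean Opinion Score: the mean of Absolute Category Ratings (1-5),
    where 1 = bad, 2 = poor, 3 = fair, 4 = good, 5 = excellent.
    Also returns a rough 95% confidence half-width."""
    n = len(acr_ratings)
    mean = statistics.fmean(acr_ratings)
    ci95 = 1.96 * statistics.stdev(acr_ratings) / n ** 0.5
    return mean, ci95

# Hypothetical ratings for one enhanced clip from ten listeners.
score, ci = mos([4, 3, 4, 5, 3, 4, 4, 2, 4, 3])
print(f"MOS = {score:.2f} +/- {ci:.2f}")
```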

More details about the data and the challenge can be found in the evaluation plan of the ConferencingSpeech 2021 challenge.

Besides papers related to the ConferencingSpeech 2021 challenge itself, papers on multi-channel speech enhancement in general are encouraged for submission to this special session.

  • https://tea-lab.qq.com/conferencingspeech-2021
  • Wei Rao, Tencent Ethereal Audio Lab, China
  • Lei Xie, Northwestern Polytechnical University, China
  • Yannan Wang, Tencent Ethereal Audio Lab, China
  • Tao Yu, Tencent Ethereal Audio Lab, USA
  • Shinji Watanabe, Associate Professor, Carnegie Mellon University / Johns Hopkins University, USA
  • Zheng-Hua Tan, Aalborg University, Denmark
  • Hui Bu, AISHELL foundation, China
  • Shidong Shang, Tencent Ethereal Audio Lab, China

Voice quality characterization for clinical voice assessment: Voice production, acoustics, and auditory perception

The appraisal of voice quality is relevant to the clinical care of disordered voices. It contributes to the selection and optimization of clinical treatment as well as to the assessment of the outcome of the treatment. Levels of description of voice quality include the biomechanics of the vocal folds and their kinematics, temporal and spectral acoustic features, as well as the auditory scoring of hoarseness, hyper- and hypo-functionality, creakiness, diplophonia, harshness, etc. Broad and fuzzy definitions of terms regarding voice quality are in use, which impede scientific and clinical communication.

The aim of this special session is to contribute to the improvement of the clinical assessment of voice quality via a translational approach, which focuses on quantifying and explaining relationships between these levels of description. The objective is to gather new insights, advances in knowledge, and practical tools to assist researchers and clinicians in obtaining effective descriptions of voice quality and reliable measures of its acoustic correlates. Topics of interest include, but are not limited to:

  • the statistical analysis and automatic classification of distinct types of voice quality via non-obtrusively recorded features, possibly relying on state-of-the-art machine learning approaches;
  • the analysis and simulation of vocal fold vibrations by means of analytical, kinematic, or mechanical modelling;
  • the interpretation and modeling of acoustic emissions and/or high-speed video recordings such as videolaryngoscopy and videokymography;
  • the synthesis of disordered voices jointly with auditory experimentation involving synthetic and natural disordered voice stimuli.
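
As one concrete example of "reliable measures of acoustic correlates", two classical cycle-to-cycle perturbation measures are jitter (variability of the fundamental period) and shimmer (variability of the peak amplitude). Below is a minimal sketch of their standard "local" definitions, computed from hypothetical per-cycle measurements rather than any particular analysis toolchain:

```python
def local_perturbation(values):
    """Local perturbation (%): mean absolute difference between consecutive
    cycle values, divided by the mean value. Applied to fundamental periods
    this yields local jitter; applied to peak amplitudes, local shimmer."""
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    return 100 * (sum(diffs) / len(diffs)) / (sum(values) / len(values))

# Hypothetical glottal-cycle measurements (periods in ms, amplitudes arbitrary).
periods = [5.02, 5.10, 4.97, 5.21, 5.05, 5.12]
amplitudes = [0.81, 0.79, 0.84, 0.76, 0.82, 0.80]
print(f"local jitter  = {local_perturbation(periods):.2f}%")
print(f"local shimmer = {local_perturbation(amplitudes):.2f}%")
```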

Automatic Speech Recognition in Air Traffic Management (ASR-ATM)

Air-traffic management (ATM) is a specialized domain in which, in addition to the voice signal, other contextual information (e.g., air traffic surveillance data and meteorological data) plays an important role. Automatic speech recognition is the first challenge in the whole chain. Further processing usually requires transforming the recognized word sequence into a conceptual form, which is the more important application in ATM. This also means that the usual metrics for evaluating ASR systems (e.g., word error rate) matter less, and other performance criteria are employed: objective ones such as command recognition error rate, callsign detection accuracy, overall algorithmic delay, real-time factor, or reduced flight times, and subjective ones such as a decrease in user workload.
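
To make the contrast between metrics concrete, here is a minimal sketch (our own illustration, not an official ATM metric implementation) of word error rate next to a concept-level callsign detection accuracy:

```python
def wer(ref, hyp):
    """Word error rate: (substitutions + deletions + insertions) / N,
    computed via word-level Levenshtein distance."""
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)

def callsign_accuracy(gold, extracted):
    """Fraction of utterances whose extracted callsign matches the gold one."""
    return sum(g == e for g, e in zip(gold, extracted)) / len(gold)

ref = "lufthansa three two four descend flight level two four zero"
hyp = "lufthansa three two four descend to flight level two four zero"
print(f"WER = {wer(ref, hyp):.1%}")                # 10.0%: one inserted word
print(callsign_accuracy(["DLH324"], ["DLH324"]))    # 1.0: concept still correct
```

Note that the hypothesis above has a nonzero word error rate, yet the callsign concept is extracted perfectly; operationally, the latter is what matters.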

The main objective of the special session is to bring together ATM players (both academic and industrial) interested in ASR and ASR researchers looking for new challenges. This can accelerate near-future R&D plans to enable the integration of speech technologies into the challenging but highly safety-oriented air-traffic management domain.

  • https://www.haawaii.de/wp/interspeech-2021-agenda-for-special-session-on-automatic-speech-recognition-in-air-traffic-management-is-now-online/
  • Hartmut Helmke (DLR)
  • Pavel Kolcarek (Honeywell)
  • Petr Motlicek (Idiap Research Institute)

Alzheimer's Dementia Recognition through Spontaneous Speech: The ADReSS Challenge

Dementia is a category of neurodegenerative diseases that entails a long-term, usually gradual decline in cognitive functioning. The main risk factor for dementia is age, and its greatest incidence is therefore among the elderly. Given the severity of the situation worldwide, institutions and researchers are investing considerably in dementia prevention and early detection, focusing on disease progression. There is a need for cost-effective and scalable methods for the detection of dementia, from its most subtle forms, such as the preclinical stage of Subjective Memory Loss (SML), to more severe conditions like Mild Cognitive Impairment (MCI) and Alzheimer's Dementia (AD) itself.

The ADReSSo Challenge (ADReSS, speech only) targets a difficult automatic prediction problem of societal and medical relevance: the detection of Alzheimer's Dementia (AD). The challenge builds on the success of the ADReSS Challenge (Luz et al., 2020), the first such shared-task event focused on AD, which attracted 34 teams from across the world. While a number of researchers have proposed speech processing and natural language processing approaches to AD recognition through speech, their studies have used different, often unbalanced and acoustically varied datasets, hindering the reproducibility and comparability of approaches. The ADReSSo Challenge will provide a forum for these research groups to test their existing methods (or develop novel approaches) on a new shared standardized dataset.

The approaches that performed best on the original ADReSS dataset employed features extracted from the manual transcripts that were provided. The ADReSSo challenge offers a more challenging and improved spontaneous speech dataset and requires building models directly from speech, without manual transcription. In keeping with the objectives of AD prediction evaluation, the ADReSSo dataset will be statistically balanced to mitigate common biases often overlooked in evaluations of AD detection methods, including repeated occurrences of speech from the same participant (common in longitudinal datasets), variations in audio quality, and imbalances in gender and age distribution. The task focuses on AD recognition using spontaneous speech, marking a departure from neuropsychological and clinical evaluation approaches. Spontaneous speech analysis has the potential to enable novel applications of speech technology for longitudinal, unobtrusive monitoring of cognitive health, in line with the theme of this year's INTERSPEECH, "Speech Everywhere!".
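
As an illustration of the balancing described above (our own sketch with hypothetical metadata columns, not the organizers' actual procedure), one simple approach is to downsample within strata so that each gender-and-age stratum contains equally many AD and control speakers:

```python
import pandas as pd

# Hypothetical speaker metadata: one row per participant.
df = pd.DataFrame({
    "speaker": [f"s{i}" for i in range(8)],
    "label":   ["AD", "AD", "AD", "AD", "CN", "CN", "CN", "CN"],
    "gender":  ["F", "F", "M", "M", "F", "F", "M", "M"],
    "age_bin": ["60s", "70s", "60s", "70s", "60s", "70s", "60s", "70s"],
})

def balance_stratum(g):
    """Within one (gender, age_bin) stratum, downsample every diagnostic
    group to the stratum's smallest class size."""
    n = g["label"].value_counts().min()
    return g.groupby("label", group_keys=False).sample(n=n, random_state=0)

balanced = df.groupby(["gender", "age_bin"], group_keys=False).apply(balance_stratum)
print(balanced[["speaker", "label", "gender", "age_bin"]])
```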

Important Dates

  • January 18, 2021: ADReSSo Challenge announced.
  • March 20, 2021 : Model submission deadline.
  • March 26, 2021 : Paper submission deadline.
  • April 2, 2021 : Paper update deadline.
  • June 2, 2021 : Paper acceptance/rejection notification.
  • August 31 - September 3, 2021 : INTERSPEECH 2021.

  • https://edin.ac/3p1cyaI
  • Saturnino Luz, Usher Institute, University of Edinburgh
  • Fasih Haider, University of Edinburgh
  • Sofia de la Fuente, University of Edinburgh
  • Davida Fromm, Carnegie Mellon University
  • Brian MacWhinney, Carnegie Mellon University

SdSV Challenge 2021: Analysis and Exploration of New Ideas on Short-Duration Speaker Verification

Are you searching for new challenges in speaker recognition? Join the SdSV Challenge 2021, which focuses on the analysis and exploration of new ideas for short-duration speaker verification.

Following the success of the SdSV Challenge 2020, the SdSV Challenge 2021 focuses on systematic benchmarking and analysis of the effect of varying degrees of phonetic variability on short-duration speaker recognition. The challenge consists of two tasks.

  • Task 1 is defined as speaker verification in text-dependent mode where the lexical content (in both English and Persian) of the test utterances is also taken into consideration.
  • Task 2 is defined as speaker verification in text-independent mode with same- and cross-language trials.

The main purpose of this challenge is to encourage participants to build single but competitive systems, to perform analyses, and to explore new ideas, such as multi-task learning, unsupervised/self-supervised learning, single-shot learning, and disentangled representation learning, for short-duration speaker verification. Participating teams will get access to a training set and a test set drawn from the DeepMine corpus, the largest public corpus designed for short-duration speaker verification, with voice recordings of 1,800 speakers. The challenge leaderboard is hosted on CodaLab.
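
Systems in such evaluations are typically scored with detection metrics, of which the equal error rate (EER) is the most common summary. A minimal sketch computing EER from hypothetical trial scores follows (the challenge's official metric and scoring tools are defined in its evaluation plan):

```python
import numpy as np

def eer(scores, labels):
    """Equal error rate: the operating point where the false-rejection and
    false-acceptance rates cross, swept over all score thresholds."""
    scores = np.asarray(scores, float)
    labels = np.asarray(labels, bool)
    labels = labels[np.argsort(scores)]
    n_tgt, n_non = labels.sum(), (~labels).sum()
    frr = np.cumsum(labels) / n_tgt          # targets rejected below threshold
    far = 1 - np.cumsum(~labels) / n_non     # non-targets accepted above it
    i = np.argmin(np.abs(frr - far))
    return (frr[i] + far[i]) / 2

# Hypothetical trial scores (higher = "same speaker"); 1 marks target trials.
scores = [2.1, 0.3, 1.7, -0.5, 0.9, -1.2, 1.1, 0.0]
labels = [1, 0, 1, 0, 1, 0, 1, 0]
print(f"EER = {eer(scores, labels):.2%}")
```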

  • For more information visit: https://sdsvc.github.io/
  • Evaluation plan:
  • Contact: [email protected]
  • Hossein Zeinali (Amirkabir University of Technology, Iran)
  • Kong Aik Lee (I2R, A*STAR, Singapore)
  • Jahangir Alam (CRIM, Canada)
  • Lukáš Burget (Brno University of Technology, Czech Republic)

Acoustic Echo Cancellation (AEC) Challenge

The INTERSPEECH 2021 Acoustic Echo Cancellation (AEC) challenge is designed to stimulate research in the AEC domain by open-sourcing a large training dataset, a test set, and a subjective evaluation framework. We provide two new open-source datasets for training AEC models. The first is a real dataset captured through a large-scale crowdsourcing effort; it consists of real recordings collected from more than 5,000 diverse audio devices and environments. The second is a synthetic dataset with added room impulse responses and background noise derived from the INTERSPEECH 2020 DNS Challenge. An initial test set will be released for researchers to use during development, and a blind test set will be released near the end, which will be used to decide the final competition winners. We believe these datasets are large enough to facilitate deep learning and representative enough for practical usage in shipping telecommunication products.
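
For background on how such synthetic AEC training examples are typically constructed (a generic sketch under our own assumptions, not the challenge's released scripts): the far-end signal is convolved with a room impulse response to form the echo, then mixed with near-end speech and noise at a chosen SNR.

```python
import numpy as np

def make_aec_example(farend, nearend, rir, noise, snr_db=20.0):
    """Synthesize a microphone signal for AEC training:
    mic = (far-end convolved with RIR) + near-end speech + scaled noise."""
    echo = np.convolve(farend, rir)[: len(farend)]      # simulated echo path
    mic = echo + nearend
    snr = 10 ** (snr_db / 10)
    noise = noise[: len(mic)]
    noise = noise * np.sqrt(np.mean(mic**2) / (snr * np.mean(noise**2) + 1e-12))
    return mic + noise, echo

# Hypothetical one-second signals at 16 kHz and a decaying synthetic RIR.
rng = np.random.default_rng(0)
farend = rng.standard_normal(16000)
nearend = 0.5 * rng.standard_normal(16000)              # double-talk scenario
rir = 0.1 * np.exp(-np.arange(2000) / 300) * rng.standard_normal(2000)
mic, echo = make_aec_example(farend, nearend, rir, rng.standard_normal(16000))
```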

The dataset and rules are available at the link below.

Please feel free to reach out to us if you have any questions or need clarification about any aspect of the challenge.

  • https://aka.ms/aec-challenge
  • Ross Cutler, Microsoft Corp, USA
  • Ando Saabas, Microsoft Corp, Tallinn
  • Tanel Parnamaa, Microsoft Corp, Tallinn
  • Markus Loide, Microsoft Corp, Tallinn
  • Sten Sootla, Microsoft Corp, Tallinn
  • Hannes Gamper, Microsoft Corp, USA
  • Sebastian Braun, Microsoft Corp, USA
  • Karsten Sorensen, Microsoft Corp, USA
  • Robert Aichner, Microsoft Corp, USA
  • Sriram Srinivasan, Microsoft Corp, USA

Non-Autoregressive Sequential Modeling for Speech Processing

Non-autoregressive modeling is a newly emerging direction in speech processing research. One advantage of non-autoregressive models is their decoding speed: decoding consists only of forward propagation through a neural network, so complicated left-to-right beam search is unnecessary. In addition, they do not assume a left-to-right generation order and thus represent a paradigm shift in speech processing, where left-to-right, autoregressive models have long been considered the standard. This special session aims to facilitate knowledge sharing among researchers working on non-autoregressive modeling across speech processing fields, including, but not limited to, automatic speech recognition, speech translation, and text-to-speech, via panel discussions with leading researchers followed by a poster session.
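
To illustrate the speed argument, here is a toy sketch of one common non-autoregressive style: a CTC-like model emits per-frame distributions in a single forward pass, after which decoding is just an argmax with repeats and blanks collapsed. (The probabilities below are hypothetical; this is not any particular system.)

```python
import numpy as np

def ctc_greedy_decode(log_probs, blank=0):
    """Non-autoregressive greedy decoding: take the per-frame argmax,
    collapse consecutive repeats, and drop blank symbols."""
    out, prev = [], blank
    for t in log_probs.argmax(axis=-1):
        if t != prev and t != blank:
            out.append(int(t))
        prev = t
    return out

# Hypothetical per-frame probabilities over {blank, 'a', 'b'} for six frames.
log_probs = np.log(np.array([
    [0.1, 0.8, 0.1],   # 'a'
    [0.1, 0.8, 0.1],   # 'a' (repeat -> collapsed)
    [0.8, 0.1, 0.1],   # blank
    [0.1, 0.1, 0.8],   # 'b'
    [0.8, 0.1, 0.1],   # blank
    [0.1, 0.1, 0.8],   # 'b'
]))
print(ctc_greedy_decode(log_probs))  # [1, 2, 2] -> "a", "b", "b"
```

An autoregressive decoder would instead run the network once per emitted token, conditioning on the growing prefix, which is exactly what makes left-to-right beam search costly.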

  • https://sw005320.github.io/INTERSPEECH21_SS_NAR_SP/
  • Katrin Kirchhoff (Amazon)
  • Shinji Watanabe (Carnegie Mellon University)
  • Yuya Fujita (Yahoo Japan Corporation)

DiCOVA: Diagnosis of COVID-19 using Acoustics

The COVID-19 pandemic has resulted in more than 93 million infections and more than 2 million deaths. Large-scale testing, social distancing, and face masks have been critical measures to help contain the spread of the infection. While the list of symptoms is regularly updated, it is established that, in symptomatic cases, COVID-19 seriously impairs the normal functioning of the respiratory system. Does this alter the acoustic characteristics of breath, cough, and speech sounds produced through the respiratory system? This is an open question waiting for answers. A COVID-19 diagnosis methodology based on acoustic signal analysis, if successful, could provide a remote, scalable, and economical means of testing individuals, supplementing existing nucleotide-based COVID-19 testing methods such as RT-PCR and RAT.

The DiCOVA Challenge is designed to find answers to this question by enabling participants to analyze an acoustic dataset gathered from COVID-19-positive and non-COVID-19 individuals. The findings will be presented in a special session at INTERSPEECH 2021. The timeliness and global societal importance of the challenge warrant a focused effort from researchers across the globe, including those in the medical and respiratory sciences, the mathematical sciences, and machine learning. We look forward to your participation!
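
As a hedged sketch of the kind of acoustic baseline such an analysis often starts from (file names and labels below are hypothetical; the challenge's own baseline and protocol are defined on its website): summarize each recording with MFCC statistics and train a linear classifier.

```python
import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def mfcc_stats(path, sr=16000):
    """Represent a recording by the mean and std of its MFCCs."""
    y, _ = librosa.load(path, sr=sr)
    m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    return np.concatenate([m.mean(axis=1), m.std(axis=1)])

# Hypothetical (wav path, label) lists; 1 = COVID-19-positive.
train = [("cough_001.wav", 1), ("cough_002.wav", 0)]   # ... more files
val = [("cough_101.wav", 1), ("cough_102.wav", 0)]

X = np.stack([mfcc_stats(p) for p, _ in train])
y = np.array([l for _, l in train])
clf = LogisticRegression(max_iter=1000).fit(X, y)

Xv = np.stack([mfcc_stats(p) for p, _ in val])
yv = np.array([l for _, l in val])
print("validation AUC:", roc_auc_score(yv, clf.predict_proba(Xv)[:, 1]))
```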

  • http://dicova2021.github.io/
  • Neeraj Sharma (Indian Institute of Science, Bangalore, India)
  • Prasanta Kumar Ghosh (Indian Institute of Science, Bangalore, India)
  • Srikanth Raj Chetupalli (Indian Institute of Science, Bangalore, India)
  • Sriram Ganapathy (Indian Institute of Science, Bangalore, India)

Deep Noise Suppression Challenge – INTERSPEECH 2021

The Deep Noise Suppression (DNS) challenge is designed to foster innovation in noise suppression to achieve superior perceptual speech quality. We recently organized DNS challenge special sessions at INTERSPEECH 2020 and ICASSP 2020, open-sourcing training and test datasets for the wideband scenario together with a subjective evaluation framework based on ITU-T standard P.808, which was used to evaluate the challenge submissions. Many researchers from academia and industry made significant contributions to push the field forward, yet even the best noise suppressor was far from achieving superior speech quality in challenging scenarios. In this edition of the challenge, organized at INTERSPEECH 2021, we are expanding both the training and test datasets to accommodate full-band scenarios. The two tracks of this challenge focus on real-time denoising for (i) wideband and (ii) full-band scenarios. We are also making available DNSMOS, a reliable non-intrusive objective speech quality metric for wideband, for participants to use during their development phase. The final evaluation will be based on the ITU-T P.835 subjective evaluation framework, which rates the quality of speech and noise in addition to the overall quality.

We will have two tracks in this challenge:

  • Track 1: Real-Time Denoising track for the wideband scenario. The noise suppressor must take less than the stride time Ts (in ms) to process a frame of size T (in ms) on an Intel Core i5 quad-core machine clocked at 2.4 GHz or an equivalent processor; for example, Ts = T/2 for 50% overlap between frames. The total algorithmic latency, including the frame size T, the stride time Ts, and any look-ahead, must be less than or equal to 40 ms. For example, for a real-time system that receives 20 ms audio chunks, a frame length of 20 ms with a stride of 10 ms gives an algorithmic latency of 30 ms and satisfies the requirement, whereas a frame of 32 ms with a stride of 16 ms gives an algorithmic latency of 48 ms and does not. If your frame size plus stride, T1 = T + Ts, is less than 40 ms, then you can use up to (40 − T1) ms of future information (see the latency checker after this list).
  • Track 2: Real-Time Denoising track for the full-band scenario. Satisfy the Track 1 requirements, but at 48 kHz.
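
The latency rule above is plain arithmetic; the small checker below (our own paraphrase of the stated rule, not official challenge code) reproduces the worked examples:

```python
def check_dns_latency(frame_ms, stride_ms, lookahead_ms=0.0, budget_ms=40.0):
    """Check the stated rule: frame size + stride + look-ahead <= 40 ms.
    Returns (satisfies rule, total latency, remaining look-ahead allowance)."""
    total = frame_ms + stride_ms + lookahead_ms
    allowance = max(0.0, budget_ms - frame_ms - stride_ms)
    return total <= budget_ms, total, allowance

print(check_dns_latency(20, 10))  # (True, 30.0, 10.0): 10 ms look-ahead left
print(check_dns_latency(32, 16))  # (False, 48.0, 0.0): exceeds the 40 ms budget
```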

More details about the datasets and the challenge are available in the paper and on the challenge GitHub page. Participants must adhere to the rules of the challenge.

  • https://www.microsoft.com/en-us/research/academic-program/deep-noise-suppression-challenge-interspeech-2021/
  • Chandan K A Reddy (Microsoft Corp, USA)
  • Hari Dubey (Microsoft Corp, USA)
  • Kazuhito Koishada (Microsoft Corp, USA)
  • Arun Nair (Johns Hopkins University, USA)
  • Vishak Gopal (Microsoft Corp, USA)
  • Ross Cutler (Microsoft Corp, USA)
  • Robert Aichner (Microsoft Corp, USA)
  • Sebastian Braun (Microsoft Research, USA)
  • Hannes Gamper (Microsoft Research, USA)
  • Sriram Srinivasan (Microsoft Corp, USA)

Privacy-preserving Machine Learning for Audio, Speech and Language Processing

This special session focuses on privacy-preserving machine learning (PPML) techniques in speech, language, and audio processing, including centralized, distributed, and on-device processing approaches. Novel contributions and overviews on the theory and applications of PPML in speech, language, and audio are invited. We also encourage submissions on the ethical and regulatory aspects of PPML in this context.

Sending speech, language, or audio data to a cloud server exposes private information. One approach, called anonymization, preprocesses the data so as to hide information that could identify the user by disentangling it from other useful attributes. PPML is a different approach, which solves this problem by moving computation close to the clients. Thanks to recent advances in edge computing and neural processing units on mobile devices, PPML is now a feasible technology for most speech, language, and audio applications, enabling companies to train on customer data without requiring customers to share that data.

With PPML, data can stay on a customer's device, where it is used for model training. During training, models from several clients are typically shared with aggregator nodes that perform model averaging and sync the new model back to each client; the averaged model is then used for further training on each client. This process repeats and lets each client benefit from the training data of all other clients, which was not possible in conventional audio/speech ML. In addition, high-quality synthetic data can now be used for training, thanks to advances in speech, text, and audio synthesis.
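
In its simplest form, the aggregation step described above is federated averaging: a data-weighted average of client model parameters. A minimal generic sketch (not tied to any particular toolkit) is shown below.

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """FedAvg aggregation: average each parameter tensor across clients,
    weighted by how much local data each client trained on."""
    total = sum(client_sizes)
    avg = [np.zeros_like(w) for w in client_weights[0]]
    for weights, size in zip(client_weights, client_sizes):
        for i, w in enumerate(weights):
            avg[i] += w * (size / total)
    return avg

# Hypothetical round: three clients, each holding a two-tensor model.
clients = [[np.full((2, 2), c), np.full(2, c)] for c in (1.0, 2.0, 3.0)]
sizes = [100, 200, 700]   # local dataset sizes
new_model = federated_average(clients, sizes)
print(new_model[0])       # weighted toward client 3, which holds the most data
```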

  • https://sites.google.com/view/ppmlforaudio
  • Harishchandra Dubey (Microsoft)
  • Amin Fazel (Amazon, Alexa)
  • Mirco Ravanelli (MILA,Université de Montréal)
  • Emmanuel Vincent (Inria)

Computational Paralinguistics ChallengE (ComParE) - COVID-19 Cough, COVID-19 Speech, Escalation & Primates

Interspeech ComParE is an open Challenge dealing with states and traits of speakers as manifested in their speech signal’s properties. In this 13th edition, we introduce four new tasks and Sub-Challenges:

  • COVID-19 Cough based recognition,
  • COVID-19 Speech based recognition,
  • Escalation level assessment in spoken dialogues,
  • Classification of primates based on their vocalisations.

Sub-Challenges allow contributors to find their own features and use their own machine learning algorithms. However, a standard feature set and tools, including recent deep learning approaches, are provided and may be used. Participants have five trials on the test set per Sub-Challenge. Participation must be accompanied by a paper presenting the results, which undergoes the Interspeech peer review.
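
For orientation, the standard feature set provided in past ComParE editions has been the openSMILE ComParE acoustic feature set. Below is a minimal sketch of extracting it with the open-source opensmile Python package and fitting a linear classifier (file names and labels are hypothetical, and the official baseline recipe differs in detail):

```python
import opensmile
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Extract the 6,373-dimensional ComParE 2016 functionals, one row per file.
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.ComParE_2016,
    feature_level=opensmile.FeatureLevel.Functionals,
)

train_files = ["train_0001.wav", "train_0002.wav"]   # hypothetical lists
train_labels = ["positive", "negative"]

X = smile.process_files(train_files)
clf = make_pipeline(StandardScaler(), LinearSVC(C=1e-4))
clf.fit(X, train_labels)
print(clf.predict(smile.process_files(["test_0001.wav"])))
```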

Contributions using the provided or equivalent data are sought, including (but not limited to):

  • Participation in a Sub-Challenge
  • Contributions around the Challenge topics

Results of the Challenge and prizes will be presented at Interspeech 2021 in Brno, Czechia.

  • http://www.compare.openaudio.eu/now/
  • Björn Schuller (University of Augsburg, Germany / Imperial College, UK)
  • Anton Batliner (University of Augsburg, Germany)
  • Christian Bergler (FAU, Germany)
  • Cecilia Mascolo (University of Cambridge, UK)
  • Jing Han (University of Cambridge, UK)
  • Iulia Lefter (Delft University of Technology, The Netherlands)
  • Heysem Kaya (Utrecht University, The Netherlands)

OpenASR20 and Low Resource ASR Development

The goal of the OpenASR (Open Automatic Speech Recognition) Challenge is to assess the state of the art of ASR technologies for low-resource languages.

The OpenASR Challenge is an open challenge created out of the IARPA (Intelligence Advanced Research Projects Activity) MATERIAL (Machine Translation for English Retrieval of Information in Any Language) program, which encompasses further tasks including CLIR (cross-language information retrieval), domain classification, and summarization. For every year of MATERIAL, NIST supports a simplified, smaller-scale evaluation open to all, focusing on a particular technology aspect of MATERIAL. The capabilities tested in the open challenges are expected to ultimately support the MATERIAL task of effective triage and analysis of large volumes of data in a variety of less-studied languages.

The special session aims to bring together researchers from all sectors working on ASR for low-resource languages to discuss the state of the art and future directions. It will allow for fruitful exchanges between OpenASR20 Challenge participants and other researchers working on low-resource ASR. We invite contributions from OpenASR20 participants, MATERIAL performers, as well as any other researchers with relevant work in the low-resource ASR problem space. Relevant topics include:

  • Cross-lingual training techniques to compensate for the ten-hour training condition
  • Factors influencing ASR performance on low-resource languages, such as gender and dialect
  • Resource conditions used for the unconstrained development condition
  • Low-resource ASR tailored to MATERIAL's Cross-Language Information Retrieval evaluation
  • Genre mismatch between training speech data and evaluation data
  • Other topics focused on low-resource ASR challenges and solutions

  • https://www.nist.gov/itl/iad/mig/openasr-challenge
  • Peter Bell, University of Edinburgh
  • Jayadev Billa, University of Southern California Information Sciences Institute
  • William Hartmann, Raytheon BBN Technologies
  • Kay Peterson, National Institute of Standards and Technology
