My ICIP 2021 Schedule

Paper Detail

Paper ID ARS-6.7
Paper Title SPEAKER-INDEPENDENT LIPREADING BY DISENTANGLED REPRESENTATION LEARNING
Authors Qun Zhang, Shilin Wang, Gongliang Chen, Shanghai Jiao Tong University, China
Session ARS-6: Image and Video Interpretation and Understanding 1
Location Area H
Session Time: Tuesday, 21 September, 15:30 - 17:00
Presentation Time: Tuesday, 21 September, 15:30 - 17:00
Presentation Poster
Topic Image and Video Analysis, Synthesis, and Retrieval: Image & Video Interpretation and Understanding
Abstract With the development of deep learning technology, automatic lipreading based on deep neural networks can achieve reliable results for speakers who appear in the training dataset. However, speaker-independent lipreading, i.e. lipreading for unseen speakers, is still a challenging task, especially when the training samples are quite limited. To improve the recognition performance in the speaker-independent scenario, a new deep neural network structure, named Disentangled Visual Speech Recognition Network (DVSR-Net), is proposed in this paper. DVSR-Net is designed to disentangle the identity-related features and the content-related features from the lip image sequence. To further eliminate the identity information that remains in the content features, a content feature refinement stage is designed in network optimization. In this way, the extracted features are closely related to the content information and independent of the speakers' varying talking styles, and thus the speech recognition performance for unseen speakers can be improved. Experiments on two widely used datasets have demonstrated the effectiveness of the proposed network in the speaker-independent scenario.
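
The speaker-independent scenario mentioned in the abstract, where test speakers never appear in training, can be illustrated with a speaker-disjoint data split. This is only a minimal sketch and not the authors' evaluation protocol: the function name `split_by_speaker`, the `(speaker_id, clip, label)` sample layout, and the `held_out_ratio` parameter are all hypothetical.

```python
import random
from collections import defaultdict


def split_by_speaker(samples, held_out_ratio=0.25, seed=0):
    """Split samples so that no speaker appears in both train and test.

    samples: list of (speaker_id, clip, label) tuples (assumed layout).
    Returns (train, test) lists of the same tuples.
    """
    # Group every sample under its speaker.
    by_speaker = defaultdict(list)
    for sample in samples:
        by_speaker[sample[0]].append(sample)

    # Shuffle the speaker IDs (not the samples) reproducibly.
    speakers = sorted(by_speaker)
    random.Random(seed).shuffle(speakers)

    # Hold out at least one speaker entirely for testing.
    n_test = max(1, round(len(speakers) * held_out_ratio))
    test_speakers = speakers[:n_test]
    train_speakers = speakers[n_test:]

    train = [s for spk in train_speakers for s in by_speaker[spk]]
    test = [s for spk in test_speakers for s in by_speaker[spk]]
    return train, test
```

Splitting on speaker IDs rather than on individual clips is what makes the test set measure performance on unseen speakers; a clip-level split would leak each speaker's talking style into training.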