Contact information

Tel: +86 (0755) 2603-6870

Email:

Address: Room 1701, Information Building, Tsinghua Campus, The University Town

Office Hours:

  • Biography

    Education

    Jul. 2001 – Jun. 2005, Ph.D. in Computer Science and Technology, Tsinghua University

    Jul. 1999 – Jul. 2001, Master's in Computer Science and Technology, Tsinghua University

    Jul. 1995 – Jul. 1999, Bachelor's in Computer Science and Technology, Tsinghua University


    Professional Experience

    Nov. 2018 – present, Research Associate, Shenzhen International Graduate School, Tsinghua University

    May 2008 – present, Honorary Research Associate, Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong

    Dec. 2008 – Nov. 2018, Research Associate, Graduate School at Shenzhen, Tsinghua University

    Sep. 2007 – Dec. 2008, Lecturer, Graduate School at Shenzhen, Tsinghua University

    May 2005 – Sep. 2007, Postdoctoral Fellow, Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong



    Additional Positions

    1. Committee Member, Speech Dialogue and Auditory Processing Technical Committee, China Computer Federation (CCF TFSDAP);

    2. Member, China Computer Federation (CCF);

    3. Member, Institute of Electrical and Electronics Engineers (IEEE);

    4. Member, International Speech Communication Association (ISCA);

    5. Reviewer, IEEE/ACM Transactions on Audio, Speech, and Language Processing, Speech Communication, Multimedia Tools and Applications;

    6. Reviewer, INTERSPEECH, ICASSP, ISCSLP, NCMMSC, ACL, IJCNLP, NeurIPS, AAAI, IJCAI.

    Openings

    Personal Webpage

    Download CV

  • Current Courses

    1. Speech Signal Digital Processing

    2. Big Data Analysis (B)


    Master’s & Ph.D. Advising

  • Research Interests

    1. Speech signal processing

    2. Audio-visual speech processing

    3. Expressive text-to-audio-visual speech synthesis

    4. Natural language understanding and generation

    5. Multimedia applications

    6. Affective computing

    7. Machine learning


    Projects

    1. National Natural Science Foundation of China (62076144): Paralinguistic Speech Attributes Disentangled Representation Learning and Controllable Speech Synthesis for Intelligent Speech Interaction

    2. National Natural Science Foundation of China – Research Grants Council (Hong Kong) Joint Research Scheme (61531166002, N_CUHK404/15): Interactive Attribute Mining and Animation Speech Synthesis for Web-based Spoken Dialog Interactions

    3. National Natural Science Foundation of China Key Project (61433018): Psychological Mechanism and Computational Modeling for Internet Discourse Understanding

    4. National Natural Science Foundation of China (61375027): Perception and Generation of Deep Information for Natural Spoken Dialog Interaction


    Research Output

  • Selected Publications

    [1] Xixin WU, Yuewen CAO, Hui LU, Songxiang LIU, Disong WANG, Zhiyong WU, Xunying LIU, Helen MENG, Speech Emotion Recognition Using Sequential Capsule Networks, IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), vol. 29, pp. 3280-3291, 2021. (SCI, EI) (CCF A)

    [2] Xixin WU, Yuewen CAO, Hui LU, Songxiang LIU, Shiyin KANG, Zhiyong WU, Xunying LIU, Helen MENG, Exemplar-Based Emotive Speech Synthesis, IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), vol. 29, pp. 874-886, 2021. (SCI, EI) (CCF A)

    [3] Yingmei GUO, Linjun SHOU, Jian PEI, Ming GONG, Mingxing XU, Zhiyong WU and Daxin JIANG, Learning from Multiple Noisy Augmented Data Sets for Better Cross-Lingual Spoken Language Understanding, [in] Proc. EMNLP, pp. 1-12. Punta Cana, Dominican Republic, 7-11 November, 2021. (EI) (THU A)

    [4] Yaohua BU, Tianyi MA, Weijun LI, Hang ZHOU, Jia JIA, Shengqi CHEN, Kaiyuan XU, Dachuan SHI, Haozhe WU, Zhihan YANG, Kun LI, Zhiyong WU, Yuanchun SHI, Xiaobo LU, Ziwei LIU, PTeacher: a Computer-Aided Personalized Pronunciation Training System with Exaggerated Audio-Visual Corrective Feedback, [in] Proc. CHI, pp. 1-14. Yokohama, Japan, 8-13 May, 2021. (EI) (CCF A)

    [5] Suping ZHOU, Jia JIA, Zhiyong WU, Zhihan YANG, Yanfeng WANG, Wei CHEN, Fanbo MENG, Shuo HUANG, Jialie SHEN, Xiaochuan WANG, Inferring Emotion from Large-Scale Internet Voice Data: A Semi-supervised Curriculum Augmentation based Deep Learning Approach, [in] Proc. AAAI, pp. 6039-6047. 2-9 February, 2021. (EI) (CCF A)

    [6] Runnan LI, Zhiyong WU, Jia JIA, Yaohua BU, Sheng ZHAO, Helen MENG, Towards Discriminative Representation Learning for Speech Emotion Recognition, [in] Proc. IJCAI, pp. 5060-5066. Macao, China, 10-16 August, 2019. (EI) (CCF A)

    [7] Yishuang NING, Sheng HE, Zhiyong WU, Chunxiao XING, Liangjie ZHANG, A Review of Deep Learning Based Speech Synthesis, Applied Sciences-Basel, vol. 9, no. 19, pp. 4050, September 2019. (SCI, EI)

    [8] Runnan LI, Zhiyong WU, Jia JIA, Jingbei LI, Wei CHEN, Helen MENG, Inferring User Emotive State Changes in Realistic Human-Computer Conversational Dialogs, [in] Proc. ACM Multimedia, pp. 136-144. Seoul, Korea, 22-26 October, 2018. (EI) (CCF A)

    [9] Kun LI, Shaoguang MAO, Xu LI, Zhiyong WU, Helen MENG, Automatic Lexical Stress and Pitch Accent Detection for L2 English Speech using Multi-Distribution Deep Neural Networks, Speech Communication, vol. 96, pp. 28-36, Elsevier, February 2018. (SCI, EI) (CCF B)

    [10] Yishuang NING, Jia JIA, Zhiyong WU, Runnan LI, Yongsheng AN, Yanfeng WANG, Helen MENG, Multi-task Deep Learning for User Intention Understanding in Speech Interaction Systems, [in] Proc. AAAI, pp. 161-167. San Francisco, USA, 4-9 February, 2017. (EI) (CCF A)

    [11] Zhiyong WU, Yishuang NING, Xiao ZANG, Jia JIA, Fanbo MENG, Helen MENG, Lianhong CAI, Generating Emphatic Speech with Hidden Markov Model for Expressive Speech Synthesis, Multimedia Tools and Applications, vol. 74, pp. 9909-9925, Springer, 2015. (SCI, EI) (CCF C)

    [12] Zhiyong WU, Kai ZHAO, Xixin WU, Xinyu LAN, Helen MENG, Acoustic to Articulatory Mapping with Deep Neural Network, Multimedia Tools and Applications, vol. 74, pp. 9889-9907, Springer, 2015. (SCI, EI) (CCF C)

    [13] Qi LYU, Zhiyong WU, Jun ZHU, Polyphonic Music Modelling with LSTM-RTRBM, [in] Proc. ACM Multimedia, pp. 991-994. Brisbane, Australia, 26-30 October, 2015. (EI) (CCF A)

    [14] Qi LYU, Zhiyong WU, Jun ZHU, Helen MENG, Modelling High-dimensional Sequences with LSTM-RTRBM: Application to Polyphonic Music Generation, [in] Proc. IJCAI, pp. 4138-4139. Buenos Aires, Argentina, 25-31 July, 2015. (EI) (CCF A)

    [15] Jia JIA, Zhiyong WU, Shen ZHANG, Helen MENG, Lianhong CAI, Head and Facial Gestures Synthesis using PAD Model for an Expressive Talking Avatar, Multimedia Tools and Applications, vol. 73, no. 1, pp. 439-461, Springer, 2014. (SCI, EI) (CCF C)

    [16] Zhiyong WU, Helen M. MENG, Hongwu YANG, Lianhong CAI, Modeling the Expressivity of Input Text Semantics for Chinese Text-to-Speech Synthesis in a Spoken Dialog System, IEEE Transactions on Audio, Speech, and Language Processing (TASLP), vol. 17, no. 8, pp. 1567-1577, November 2009. (SCI, EI) (CCF A)


    Books

    Patents

    Others

  • Awards and Honors

    [1](2009) Ministry of Education (MoE) Higher Education Outstanding Scientific Research Output Award in Technological Advancements for “Research and Applications of Multimodal Multilingual Speech and Language Interaction”

    [2](2016) Ministry of Education (MoE) Higher Education Outstanding Scientific Research Output Award in Technological Advancements for “Chinese Speech Perception and Interaction Modeling and Applications”

    [3](2017) First prize in the “Spoofing Attack Task” at the GeekPwn 2017 Shanghai Contest

    [4](2020) Annual Teaching Excellence Award of Tsinghua University