Ji WU, Ph.D., Professor

Department of Electronic Engineering, Tsinghua University, Beijing 100084, China

Tel: +86-10-62781706

Fax: +86-10-62770317

E-mail: wuji_ee@tsinghua.edu.cn

Wu Ji was born in Wuxi, Jiangsu Province, China. He received his B.S. and Ph.D. degrees from the Department of Electronic Engineering, Tsinghua University, in 1996 and 2001, respectively. He is currently a professor and Vice Chairman of the Department of Electronic Engineering, Tsinghua University. Since 2006, Prof. Wu has been the director of the Tsinghua-iFlyTek Joint Lab for Speech Technologies. He is also the leader of the TWG (Technical Work Group) of the Speech Industry Alliance of China. His research interests include speech recognition, natural language processing, pattern recognition, machine learning, and data mining. Prof. Wu has published 60 peer-reviewed papers. He has served as a standing committee member of the National Conference on Man-Machine Speech Communication (NCMMSC) since 2009, and he is a paper reviewer for several international and domestic journals and academic conferences.

Education background

1991-1996, Department of Electronic Engineering, Tsinghua University, majoring in Radio Technology and Information System, Bachelor's degree

1996-2001, Department of Electronic Engineering, Tsinghua University, majoring in Signal and Information Processing, Ph.D. degree

Experience

Wu took a teaching position at Tsinghua University in 2001.

Concurrent Academic Positions

Member of the Steering Committee of NCMMSC (National Conference on Man-Machine Speech Communication), Aug. 2009 – present

Member of the Technical Program Committee of ISCSLP 2012, ISCSLP 2010 and ISCSLP 2008.

Member of the Technical Program Committee of NCMMSC 2013, NCMMSC 2011, NCMMSC 2009 and NCMMSC 2007.

Social service

Leader of the TWG (Technical Work Group) of the Speech Industry Alliance of China, Aug. 2012 – present


Member of the SWG (Standard Work Group) of Chinese Voice Interactive Technologies, MIIT, China, Apr. 2004 – present


Research Interests / Research Projects

Speech Recognition and Man-Machine Interaction

Content-based Analysis and Indexing for Speech

Natural Language Processing

Data Mining, Machine Learning and Pattern Recognition

Research Status

(1) NSFC  61170197   1/1/2012 – 12/30/2015  PI (PI: Wu Ji)

Auto Summarization for Spoken Documents

(2) Sub-Project of 863 Hi-Tech Key Project  2012AA011004   9/1/2012 – 12/30/2014

Co-PI of Sub-Project (PI: Wang Shengjin; co-PIs: Huang Yongfeng, Wu Ji)

Subject: Integrated Management and Analysis Technologies on Massive Uncertain Heterogeneous Data.

Project: Platform and Application of Extraction, Integration, Analysis and Management for Massive Web Data in the Open Environment.

(3) Sub-Project of 863 Hi-Tech Key Project  2006AA010104   6/1/2006 – 10/30/2010

PI of Sub-Project (PI: Wu Ji)

Subject: Research on personalized speech synthesis based on statistical modeling.

Project: Key technology research and application development for multilingual speech synthesis.

(4) 863 Hi-Tech Project  2006AA01Z149   11/1/2006 – 12/30/2008

PI (PI: Wu Ji, co-PI: Zhijian Ou)

Research on high performance content-based Speech Indexing Technology

(5) 863 Hi-Tech Project  2001AA114071   10/1/2001 – 12/30/2003

PI (PI: Wu Ji, co-PI: Jiasong Sun)

Intelligence Platform for Chinese Speech Information Processing

(6) Director of LST   2/21/2006 – 6/30/2015

Director (Director: Wu Ji, Vice Director: Wang Zhiguo)

Tsinghua University (Department of Electronic Engineering)-AnHui USTC iFlytek Co., Ltd. Joint Laboratory for Speech Technologies.

Three contracts, each lasting 3 years and worth ¥5,000,000.

(7) Tencent R&D Project   2/15/2012 – 2/14/2013

PI (PI: Wu Ji)

Joint Research and Development for Speech Recognition

(8) Project from Beijing Samsung Telecom R&D Center   8/1/2012 – 12/30/2012

PI (PI: Wu Ji)

Text Database Collection and Language Model Training

(9) Project from Beijing Toshiba R&D Center   4/1/2004 – 5/31/2005

PI (PI: Wu Ji)

Research on Robust Speech Recognition Technologies

Honors And Awards

National Science and Technology Progress Award, Second Class (eighth-listed contributor), China, 2011

Project: The Development Platform for Intelligent Voice Interaction Key Technologies and Applications

Academic Achievement

Journal Papers:

[1] Wu J, Liu X, Zhang X, et al. Master clinical medical knowledge at certificated-doctor-level with deep learning model[J]. Nature Communications, 2018, 9(1): 4352.

[2] Dong F, Tao C, Wu J, et al. Detection of cervical lymph node metastasis from oral cavity cancer using a non-radiating, noninvasive digital infrared thermal imaging system[J]. Scientific reports, 2018, 8(1): 7219.

[3] Zhang T, Wu J. Learning long-term filter banks for audio source separation and audio scene classification[J]. EURASIP Journal on Audio, Speech, and Music Processing, 2018, 2018(1): 4.

[4] Zhang T, Wu J. Discriminative frequency filter banks learning with neural networks. EURASIP Journal on Audio, Speech, and Music Processing. 2019 Dec;2019(1):1.

[5] Zhang, K., Wu, J., Chen, H., & Lyu, P. (2018). An effective teeth recognition method using label tree with cascade network structure. Computerized Medical Imaging and Graphics, 68, 61-70.

[6] Shi J, Wu J, Lv P, et al. BreastNet: Entropy-Regularized Transferable Multi-task Learning for Classification with Limited Breast Data[J]. International Journal of Bioscience, Biochemistry and Bioinformatics, 2018, 9(1)

[7] He, Z., Wu, J., & Lv, P. (2017). Multi-label text classification based on the label correlation mixture model. Intelligent Data Analysis, 21(6), 1371-1392.

[8] Chen, Q., Wu, J., Li, S., Lyu, P., Wang, Y., & Li, M. (2016). An ontology-driven, case-based clinical decision support model for removable partial denture design. Scientific reports, 6, 27855.

[9] Wu J, Li M, Lee C H. A probabilistic framework for representing dialog systems and entropy-based dialog management through dynamic stochastic state evolution[J]. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 2015, 23(11): 2026-2035.

[10] He Z, Wu J, Li T. Label correlation mixture model: a supervised generative approach to multilabel spoken document categorization[J]. IEEE Transactions on Emerging Topics in Computing, 2015, 3(2): 235-245.

[11] Zhang X L, Wu J. Deep belief networks based voice activity detection[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2013, 21(4): 697-710.

[12] Yang L, Wu J, Lv P. Construction Algorithm of Sub-word Unit in Speech Retrieval[J]. Computer Engineering, 2012, 38(24): 251-253.

[13] Zhang X L, Wu J. Linearithmic time sparse and convex maximum margin clustering[J]. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2012, 42(6): 1669-1692.

[14] YU X, WU J, KONG F, et al. Fusing Multi-information for Automatic Story Segmentation of Broadcast News[J]. Journal of Chinese Information Processing, 2012 (2): 21.

[15] Wu J, Zhang X L. Sparse Kernel Maximum Margin Clustering[J]. EN,2011,21(6).

[16] Li Wei, Wu Ji, Lv Ping. Query expansion based high performance Chinese voice retrieval. Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, August 2011, 24(4), pp. 561-566.

[17] Zhang X, Wu J, Lv P. Support vector machine based VAD using the multiple observation compound feature[J]. Journal of Tsinghua University, 2011, 51(9): 1209-1214.

[18] Wu, Ji, and Xiao-Lei Zhang. "Efficient multiple kernel support vector machine based voice activity detection." IEEE Signal Processing Letters 18.8 (2011): 466-469.

[19] Wu, J., & Zhang, X. L. (2011). An efficient voice activity detection algorithm by combining statistical model and energy detection. EURASIP Journal on Advances in Signal Processing, 2011(1), 18.

[20] Wu, J., & Zhang, X. L. (2011). Maximum margin clustering based statistical VAD with multiple observation compound feature. IEEE Signal Processing Letters, 18(5), 283-286.

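[21] Li Wei, Wu Ji, Lv Ping. A composition algorithm for weighted finite-state transducers with low space complexity (in Chinese). Application Research of Computers, 2011, 28(8): 2931-2934.

[22] Li Wei, Wu Ji, Lv Ping. A speech sensitive-information detection system for massive data (in Chinese)[J]. Journal of Information Engineering University, 2010, 11(5).

[23] Li Wei, Wu Ji, Lv Ping. A word lattice generation algorithm for speech recognition based on forward and backward language models (in Chinese)[J]. Journal of Computer Applications, 2010, 30(10): 2563-2566.

[24] Su Tengrong, Wu Ji, Wang Zuoying. Acoustic model training based on spatial correlation transformation (in Chinese)[J]. Journal of Electronics & Information Technology, 2010, 32(4): 1003-1007.

[25] Su Tengrong, Wu Ji, Wang Zuoying, et al. An improved HMM using spatial correlation (in Chinese)[J]. Computer Engineering and Design, 2010, 31(5): 1023-1026.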

Conference Papers:

[1] Yu Hao, Xien Liu, Ji Wu, et al. Exploiting Sentence Embedding for Medical Question Answering. AAAI 2019 (acceptance rate: 16.2%).

[2] Zhang T, Zhang K, Wu J. Temporal transformer networks for acoustic scene classification[C]. Proc. Interspeech 2018, 2018: 1349-1353.

[3] Zhang T, Zhang K, Wu J. Data independent sequence augmentation method for acoustic scene classification, Interspeech 2018.

[4] Zhang T, Zhang K, Wu J. Multi-modal attention mechanisms in LSTM and its application to acoustic scene classification, Interspeech 2018.

[5] Zhang, X., Wu, J., He, Z., Liu, X., & Su, Y. (2018, April). Medical exam question answering with large-scale reading comprehension. In Thirty-Second AAAI Conference on Artificial Intelligence.

[6] Zhang, T., Zhou, X., & Wu, J. (2018, July). Dropframe Scheme in Recurrent Neural Networks for Time Series Modeling. In 2018 International Conference on Audio, Language and Image Processing (ICALIP) (pp. 355-360). IEEE.

[7] Li, M., & Wu, J. (2017). The MSIIP system for dialog state tracking challenge 4. In Dialogues With Social Robots (pp. 465-474). Springer, Singapore.

[8] Chen, Z., & Wu, J. (2017). A Rescoring Approach for Keyword Search Using Lattice Context Information. INTERSPEECH 2017 (pp. 3592-3596).

[9] Wang, H. D., Zhang, T., & Wu, J. (2017). The monkeytyping solution to the youtube-8m video understanding challenge. CVPR 2017 Workshop on YouTube-8M Large-Scale Video Understanding.

[10] He, Z., Liu, X., Lv, P., & Wu, J. (2016). Hidden softmax sequence model for dialogue structure analysis. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (Vol. 1, pp. 2063-2072).

[11] Zhang T, Chen Z, Wu J, et al. Objective Evaluation Methods for Chinese Text-To-Speech Systems[C]//INTERSPEECH. 2016: 332-336.

[12] Wang, H. D., & Wu, J. (2015, December). Collaborative filtering of call for papers. In 2015 IEEE Symposium Series on Computational Intelligence (pp. 963-970). IEEE.

[13] Wang, H. D., & Wu, J. (2015, December). Optimizing seed set for new user cold start. In 2015 IEEE Symposium Series on Computational Intelligence (pp. 957-962). IEEE.

[14] Zhang, T., & Wu, J. (2015, July). Speech emotion recognition with i-vector feature and RNN model. In 2015 IEEE China Summit and International Conference on Signal and Information Processing (ChinaSIP) (pp. 524-528). IEEE.

[15] Huang Z, Li J, Siniscalchi S M, et al. Rapid adaptation for deep neural networks through multi-task learning[C]//Sixteenth Annual Conference of the International Speech Communication Association. 2015.

[16] Wu J, Li M, Lee C H. An entropy minimization framework for goal-driven dialogue management[C]//Sixteenth Annual Conference of the International Speech Communication Association. 2015.

[17] Ding H, Wu J. Predicting retweet scale using log-normal distribution[C]//2015 IEEE International Conference on Multimedia Big Data. IEEE, 2015: 56-63.

[18] Ding H, Wu J. A Retweet Scale Prediction Model Based on Truncated Distribution Estimation. WSDM-BData 2015.

[19] Zhang T, Wu J, Wang D, et al. Audio retrieval based on perceptual similarity[C]//10th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing. IEEE, 2014: 342-348.

[20] Ma Y, Wu J. Combining n-gram and dependency word pair for multi-document summarization[C]//2014 IEEE 17th International Conference on Computational Science and Engineering. IEEE, 2014: 27-31.

[21] He Z, Wu J, Lv P. Label correlation mixture model for multi-label text categorization[C]//2014 IEEE Spoken Language Technology Workshop (SLT). IEEE, 2014: 83-88.

[22] Chen Z, Zhang T, Wu J. Subword scheme for keyword search[C]//2014 IEEE Spoken Language Technology Workshop (SLT). IEEE, 2014: 483-488.

[23] He Z, Lv P, Wu J. An effective and robust approach to Mandarin spoken language understanding in specific domain[C]//The 9th International Symposium on Chinese Spoken Language Processing. IEEE, 2014: 604-608.

[24] He Z, Lv P, Wu J. Minimum classification error rate training of supervised topic mixture model for multi-label text categorization[C]//The 9th International Symposium on Chinese Spoken Language Processing. IEEE, 2014: 39-43.

[25] Chen Z, He Z, Lv P, et al. Improving keyword search by query expansion in a probabilistic framework[C]//The 9th International Symposium on Chinese Spoken Language Processing. IEEE, 2014: 187-191.

[26] Li M, Ding H, Wu J. Global discriminative model for dependency parsing in NLP pipeline[C]//The 9th International Symposium on Chinese Spoken Language Processing. IEEE, 2014: 614-618.

[27] Li S, He Z, Wu J. An ontology semantic tree based natural language interface[C]//The 9th International Symposium on Chinese Spoken Language Processing. IEEE, 2014: 226-230.

[28] Zhang X L, Wu J. Denoising deep neural networks based voice activity detection[C]//2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2013: 853-857.

[29] Zhang X L, Wu J. Weight optimization and layered clustering-based ECOC[C]//2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2013: 3477-3481.

[30] He Z, Lv P, Li W, et al. A synchronized pruning composition algorithm of weighted finite state transducers for large vocabulary speech recognition[C]//2012 8th International Symposium on Chinese Spoken Language Processing. IEEE, 2012: 11-15.

[31] Wu Q, Zhang X, Lv P, et al. Perceptual similarity between audio clips and feature selection for its measurement[C]//2012 8th International Symposium on Chinese Spoken Language Processing. IEEE, 2012: 387-391.

[32] Zhang X L, Wu J, Chen Z P, et al. Optimized weighted decoding for error-correcting output codes[C]//2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2012: 2101-2104.

[33] Du Z, Li X, Wu J. Accelerating the Training of HTK on GPU with CUDA[C]//2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum. IEEE, 2012: 1907-1914.

[34] Li Wei, He Zhiyang, Lv Ping, Wu Ji. Topology-related ε-Removal Algorithm for Weighted Finite-state Transducer. National Conference on Man-Machine Speech Communication, NCMMSC 2011, Xi’an, 2011, 10.

[35] Yu Xiaojie, Shao Yang, Wu Ji, Wang Xia. A Fusion Summarization Framework based on SVM and MMR. National Conference on Man-Machine Speech Communication, NCMMSC 2011, Xi’an, 2011, 10.

[36] Wu, J., He, Z., & Lv, P. (2011). An active learning approach to task adaptation. In Twelfth Annual Conference of the International Speech Communication Association.

[37] Li, W., Wu, J., & Lv, P. (2010, November). High performance Chinese Spoken Term Detection based on term expansion. In 2010 7th International Symposium on Chinese Spoken Language Processing (pp. 430-434). IEEE.

[38] Shen, W., Wu, J., & Li, W. (2010, November). Web-based keyword adapted Language Modeling for Keyword Spotting. In 2010 7th International Symposium on Chinese Spoken Language Processing (pp. 251-255). IEEE.

[39] Wu, J., Zhang, X. L., & Li, W. (2010). A new VAD framework using statistical model and human knowledge based empirical rule. In Eleventh Annual Conference of the International Speech Communication Association.