Talk title: Large-Scale Topic Modeling on Twitter
Speaker: Shuanghong Yang, Ph.D., Lead Scientist, MLInfra@Twitter
We aim to provide a topic-aware, multi-channel experience on Twitter that facilitates content creation, discovery, and consumption. This requires organizing, in real time, a continuous stream of sparse and noisy texts (i.e., tweets) into hundreds of topics with measurable and stringently high precision. We present a spectrum of techniques that contribute to a deployed tweet topic modeling system. These include scaling up LDA to real-time inference at full Twitter scale, high-precision topic filtering, taxonomy construction, non-topical tweet detection, automatic acquisition of labeled data, evaluation with human computation, diagnostic and corrective learning, and, most importantly, high-precision topic prediction. I will briefly introduce these techniques and the machine learning infrastructure behind them.
Shuanghong is a Senior Researcher & Lead Scientist at Twitter, where he leads the machine learning infrastructure team. Before joining Twitter, he worked on machine learning and predictive analytics at Microsoft Research and Yahoo! Labs. He earned his Ph.D. from the Georgia Institute of Technology in 2012 and has published actively at leading academic conferences and in journals. He won the Yahoo! Key Scientific Challenge Award (2011), was a finalist for the Facebook Fellowship (2011), and received the ACM SIGIR 2011 Best Student Paper Award and the PAKDD 2008 Best Student Paper Award. He was also nominated for the UAI 2010 Best Student Paper Award.