PKU

Statistical Learning (统计学习)
Spring 2013


Course Information

Synopsis (摘要)

This course is open to graduate students and senior undergraduates in applied mathematics, statistics, and engineering who are involved in learning from data. It covers selected topics in statistical learning and features several in-class projects in computational advertising, bioinformatics, and social networks.
Prerequisites: linear algebra, basic probability and multivariate statistics, convex optimization; familiarity with R and Matlab (C/C++ is a plus).

Reference (参考教材)

The Elements of Statistical Learning. 2nd Ed. By Hastie, Tibshirani, and Friedman

Instructors:

Jinzhu Jia and Yuan Yao

Time and Place:

Thursday 6:40-9:30pm;
From the second week: Room 309, Science Teaching Building (理教 309). (First week only: Rm 1114, Sci. Bldg 1st, Tuesday 2/26/2013)
2-hour lecture plus 1-hour discussion

Homework and Projects:

Homework with projects, assigned at irregular intervals, and a final major project. No final exam.

Teaching Assistants (助教):

Chen, Xinyu (陈薪羽) Email: xycbaker (add "AT gmail DOT com" afterwards)

YAN, Bowei (闫博巍) Email: bwyan (add "AT pku DOT edu DOT cn" afterwards)

Schedule (时间表)

Date Topic Instructor Scribe
02/26/2013, Tue Lecture 01: Introduction
    [Speaker's Bio]
  • Xuehua Shen received a B.S. in Computer Science from Nanjing University, China, and a Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign, USA. His Ph.D. thesis was on personalized search. After his Ph.D., he worked on search quality at Google in Mountain View, CA, on personalized search and on a live-experiment platform for search quality based on real user interactions. He then worked at BlueKai, the biggest data exchange and data management platform (DMP) in Silicon Valley, using the Hadoop cloud-computing platform for personalized ads and predictive modeling. He is now co-founder and CTO of iPinyou (www.ipinyou.com), the leader in real-time advertising and audience targeting in China.
Y.Y.
Dr. Xuehua Shen
03/07/2013, Thu Lecture 02: Overview on Supervised Learning [ Lecture 2 slides ]
Y.Y.
03/14/2013, Thu Lecture 03: Linear Models for Classification [ Lecture 3 slides ]
Y.Y.
03/21/2013, Thu Lecture 04: Linear Models for Regression [ Lecture 4 slides ]
Y.Y.
03/28/2013, Thu Lecture 05: Machine Learning in Sponsored Search and Online Advertisement
    [Invited Talk I] Click Prediction in Sponsored Search and Online Advertisement
  • [Speaker] Taifeng Wang and Jiang Bian, Microsoft Research Asia
  • [Abstract] In this talk, we will first give a brief introduction to online advertising and sponsored search and point out interesting computational problems in the area. We will then discuss the problem of click prediction in sponsored search in more detail. In particular, we will explain why click prediction is critical for sponsored search and introduce some state-of-the-art research, followed by a couple of open challenges in this direction.
  • [Bio] Taifeng Wang is an associate researcher in the Internet Economics and Computational Advertising Group (IECA), Microsoft Research Asia (MSRA). His research focuses on modeling users' behavior in ad systems and helping the search engine deliver better ads. His current research topics include ad click prediction, user modeling, and ad optimization. He is also interested in machine learning, distributed systems, and HCI design. Prior to joining Microsoft, he received his Master's and Bachelor's degrees from the University of Science and Technology of China. He has served as a PC member for multiple top conferences including WWW, SIGIR, KDD, IJCAI, and AIRS.
  • Jiang Bian is an associate researcher in the Internet Economics and Computational Advertising Group (IECA), Microsoft Research Asia (MSRA). His research interests include computational advertising, information retrieval, machine learning, and data mining. Prior to joining Microsoft, he worked as a scientist at Yahoo! Labs in the US. Jiang received his Ph.D. in Computer Science from the Georgia Institute of Technology, U.S., in 2010. He has served as a PC member for several international conferences, such as WWW, SIGIR, and KDD, and as a peer reviewer for several journals, such as TOIS, TKDE, TIST, and IPM.
    [Invited Talk II] Sponsored Search Auctions: A Brief Introduction
  • [Speaker] Tao Qin, Microsoft Research Asia
  • [Abstract] I will give a brief introduction to sponsored search auctions. Topics include: sponsored search market, several basic concepts in game theory, current practice of keyword auctions in industry, research focus on sponsored search auctions, and an outlook for future research directions.
  • [Bio] Dr. Tao Qin is a researcher in the Internet Economics and Computational Advertising Group (IECA), Microsoft Research Asia (MSRA). His research interests include mechanism design, computational advertising, information retrieval, machine learning, and data mining. Prior to joining Microsoft, he received both his Ph.D. and Bachelor's degrees from Tsinghua University. He has served as a co-chair for multiple international workshops on online advertising, as an area chair for SIGIR 2012 and SIGIR 2013, and as a PC member for multiple top conferences including WWW, SIGIR, EC, CIKM, and SDM.
    [Discussion] DSP Competition of Online Advertisement: New Training Dataset [Slides]
  • [Speaker] Xuehua Shen
Taifeng Wang,
Jiang Bian,
Tao Qin,
Xuehua Shen

04/11/2013, Thu

Lecture 06: Basis Expansions and Regularization [ Lecture 6 slides ]

[Homework] See the last slide in  Lecture 6 slides . Due: April 25, 2013.

[Readings] 1. Chapter 5, The Elements of Statistical Learning.

           2. Wahba, G. (1990). Spline Models for Observational Data, SIAM, Philadelphia.

           3. Lin, Y. and Zhang, H. (2006). Component selection and smoothing in smoothing spline analysis of variance models. Annals of Statistics 34(5), 2272-2297.

           4. Michal Aharon, Michael Elad, and Alfred Bruckstein. K-SVD: Design of Dictionaries for Sparse Representation.

           5. Julien Mairal, Francis Bach, Jean Ponce, Guillermo Sapiro, and Andrew Zisserman (2008). Supervised Dictionary Learning.

 

Jinzhu Jia

 

04/18/2013, Thu

Continue with Lecture 06

Students give presentations on the homework projects

Jinzhu Jia

 

04/25/2013, Thu

Lecture 07: Basis Expansions and Regularization [ Lecture 7 slides ]

[Homework] See the last slide in  Lecture 7 slides . Due: May 9, 2013.

[Readings]  Chapter 6, The Elements of Statistical Learning.

Jinzhu Jia

 

05/02/2013, Thu

Lecture 08: Model Assessment and Selection [ Lecture 8 slides ]

[Homework] See the last slide in  Lecture 8 slides . Due: May 16, 2013.

[Readings]  Chapter 7, The Elements of Statistical Learning.

Jinzhu Jia

 

05/09/2013, Thu

Lecture 09: Model Inference and Averaging [ Lecture 9 slides ]

[Homework] See the last slide in  Lecture 9 slides . Due: May 23, 2013.

[Readings]  1. Chapter 8, The Elements of Statistical Learning.

2. Chapter 1, Computational Statistics.

Jinzhu Jia

 

05/16/2013, Thu

Lecture 10: Model Inference and Averaging [ Lecture 9 slides ]

[Homework] See the last slide in  Lecture 9 slides . Due: May 30, 2013.

[Project II] Due: May 30, 2013

Project 2 Description

Project 2 assignments

 

[Readings]  1. Chapter 8, The Elements of Statistical Learning.

2. Chapter 1, Computational Statistics.

Jinzhu Jia

Yuan Yao

 

05/23/2013, Thu

Lecture 11: Trees and Boosting [ Lecture 11 slides ]

[Readings]  1. Chapter 9, The Elements of Statistical Learning.

2. Chapter 10, The Elements of Statistical Learning.

3. Friedman, J. H. (1991). "Multivariate Adaptive Regression Splines" (with discussion). Annals of Statistics 19, 1. (software)

4. Friedman, J. H., Hastie, T. and Tibshirani, R. "Additive Logistic Regression: a Statistical View of Boosting." (Aug. 1998)

5. Freund, Y. and Schapire, R. E. (1997). "A decision-theoretic generalization of on-line learning and an application to boosting." Journal of Computer and System Sciences.

Jinzhu Jia

 

05/30/2013, Thu Lecture 12: Machine Learning from Computer Science and Boosting, by Prof. Liwei Wang
    [Yuan Yao] An application of LASSO: Robust Ranking with Hodge Decomposition
Liwei Wang,
Yuan Yao
06/06/2013, Thu Lecture 13: Final Projects [project_final.pdf]
    [ML_Rush Team] Introduction to Winning Stage I of the iPinyou Global RTB Competition
    [Yuan Yao] Graphical Model: Protein Folding Prediction by Sequence Variations
    [Reference]:
  • Morcos et al., Direct-coupling analysis of residue coevolution captures native contacts across many protein families, PNAS, 2011, 108(49): E1293-E1301. [weblink][matlab DCA package]
  • Jones et al. PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments, Bioinformatics. 2012, 28(2):184-90. [weblink] [PSICOV source]
  • Meinshausen and Bühlmann, High-dimensional graphs and variable selection with the Lasso, Ann. Statist. 2006, 34(3): 1436-1462. [Paperlink]
  • Banerjee, O., El Ghaoui, L. and d'Aspremont, A. (2008). Model selection through sparse maximum likelihood estimation. J. Machine Learning Research, 9, 485-516. [paperlink]
  • Friedman, J., Hastie, T. and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical Lasso. Biostatistics, 9, 432-441. [Paperlink] [Graphical Lasso package]
  • Ravikumar, Wainwright and Lafferty (2010). High-dimensional Ising model selection using l1-regularized logistic regression. Ann. Stat. 2010, 38(3): 1287-1319. [Paperlink]
Yuan Yao;
Xingqiang Wang et al. (CAS)
06/13/2013, Thu Lecture 14: Neural Networks and Deep Learning [project_final.pdf]
    [Dr. Lei JIA (贾磊), Baidu Co. Ltd.] Deep Learning and Speech Technology at Baidu (slides are not for circulation)
Yuan Yao;
Lei Jia (Baidu)

Datasets


by YAO, Yuan.