PKU

Mathematical Introduction to Data Science (数据中的数学)
Fall 2013


Course Information

Synopsis (摘要)

This course is open to graduate students and senior undergraduates in applied mathematics and statistics who work with data. It covers topics in high-dimensional statistics, manifold learning, diffusion geometry, random walks on graphs, concentration of measure, random matrix theory, and geometric and topological methods.
Prerequisites: linear algebra, basic probability and multivariate statistics, basic stochastic processes (Markov chains); familiarity with Matlab or R.

Lecture Notes

[pdf download]

Time and Place:

Wednesday, 3:10-6:00pm

The 3rd Lecture Hall (三教) Rm 103

Homework and Projects:

We plan to have weekly homework, monthly mini-projects, and a major final project. There is no final exam. Scribers will get bonus credit for their work!

Teaching Assistants (助教):

HUANG, Junliang (黄俊亮) Email: jlhwung (add "AT gmail DOT com" afterwards)
WANG, Qing (王擎) Email: wangqing.linus (add "AT gmail DOT com" afterwards)

Schedule (时间表)

Date Topic Instructor Scriber
09/11/2013, Wed Lecture 01: Introduction to Course Syllabus
Yuan Yao
09/18/2013, Wed Seminar: Distributed Sparse Optimization [slides]
  • Speaker: Professor Wotao Yin, UCLA
  • Time: 2013.9.18 Wed 3:00pm
  • Venue: The 3rd Lecture Hall (三教) Rm 103, PKU
  • Abstract: Sparse optimization has found interesting applications in many data-processing areas such as compressed sensing, machine learning, signal processing, medical imaging, and finance. After reviewing compressed sensing and sparse optimization, this talk introduces novel algorithms tailored for very large-scale sparse optimization problems with very big data. Besides the typical complexity analysis, we analyze the overhead due to parallel and distributed computing. Numerical results are presented to demonstrate the scalability of the parallel codes for handling problems with hundreds of gigabytes of data in under 2 minutes on the Amazon EC2 cloud. The work is joint with Zhimin Peng and Ming Yan.

Lecture 02: Sample Mean and Covariance, Principal Component Analysis
    [Homework 1]:
  • Homework 1 [pdf]. Deadline: 09/25/2013, Wednesday. Two ways to submit:
  • Submit an electronic version with source code to the TAs by email before the deadline; or
  • Hand in a paper version with source code to the TA in class on 09/25/2013, Wednesday.
  • Mark at the top of your homework: Name - Student ID
Wotao Yin (UCLA); Yuan Yao
09/25/2013, Wed Lecture 03: Stein's Phenomenon and the James-Stein Estimator [lecture note]
    [Homework 2]:
  • Homework 2 [pdf]. Deadline: 10/09/2013, Wednesday. Mark at the top of your homework: Name - Student ID.
Jingshu Wang (Stanford); Yuan Yao Qing Wang; Junxin Zhang
10/09/2013, Wed Lecture 04: Multidimensional Scaling (MDS) and ISOMAP
    [Reference]:
  • MDS: Chapter 2
  • ISOMAP: Chapter 5.2
Jian Sun (Tsinghua)
10/16/2013, Wed Lecture 05: Markov Chains on Graphs [lecture note]
Jian Sun (Tsinghua)
10/23/2013, Wed Lecture 06: Markov Chains on Graphs
Jian Sun (Tsinghua)
10/30/2013, Wed Lecture 07: An Introduction to Convex Optimization [slides]
    [Reference]:
  • Lieven Vandenberghe's lecture notes on the gradient method [pdf]
Zaiwen Wen
11/06/2013, Wed Lecture 08: Cheeger's Inequality and Lumpability of Markov Chains
    I corrected an error in the old lecture note, thanks to Jiechao Xiong. See the updated version.
11/13/2013, Wed Lecture 09: Diffusion Map, Commute Time Map, and Optimal Lumpable Reduction
    [Reference]:
  • Diffusion Map: Chapter 7.1-7.2
  • Commute Time: Chapter 6.6, 7.3
  • Optimal Lumpable Reduction: Chapter 6.5
    [Homework 3]:
  • Homework 3 [pdf]. Deadline: 11/20/2013, Wednesday. Mark at the top of your homework: Name - Student ID.
11/20/2013, Wed Lecture 10: Semi-supervised Learning from Transition Path Theory, and Combinatorial Hodge Theory
    [Reference]:
  • Transition Path Theory: Chapter 6.7
  • Semi-supervised Learning: Chapter 8
  • Combinatorial Hodge Theory: Chapter 9
    [Homework 4]:
  • Homework 4 [pdf]. Thanks to Weiming Li for pointing out a typo, now corrected in red. Deadline: 11/27/2013, Wednesday. Mark at the top of your homework: Name - Student ID.
11/27/2013, Wed Lecture 11: Compressed Sensing and Algorithms
    [Homework 5]:
  • Homework 5 [pdf]. Deadline: 12/4/2013, Wednesday. Mark at the top of your homework: Name - Student ID.
12/4/2013, Wed Lecture 12: A Unified Framework for Regularized M-estimators in High-Dimensional Statistics
    [Homework 6]:
  • Homework 6 [pdf]. Deadline: 12/11/2013, Wednesday. Mark at the top of your homework: Name - Student ID.
12/11/2013, Wed Lecture 13: Robust/Sparse PCA and Partial MDS: SDP extensions [ebanshu]
    [Reference]:
  • [RPCA]: Robust Principal Component Analysis.
  • [SPCA]: Sparse Principal Component Analysis formulated as a semidefinite program.
  • [Parrilo_SIAM09]: Robust PCA from the viewpoint of convex algebraic geometry.
  • Emmanuel Candes talk at PKU, Oct 2011
  • [Ye06]: a semidefinite programming (SDP) approach for MDS with missing values (Sensor Network Localization).
  • [Ye11]: Yinyu Ye's talk at the Fields Institute (2011) on Universal Rigidity and SDP, with some state-of-the-art open problems.
  • [MVU]: another use of SDP in manifold learning, Maximum Variance Unfolding (MVU).
    [Homework 7]:
  • Homework 7 [pdf]. Deadline: 12/18/2013, Wednesday. Mark at the top of your homework: Name - Student ID.
12/18/2013, Wed Lecture 14: Stochastic Approximations [ebanshu]
    [Reference]:
  • Steve Wright's talk on [Sparse Optimization]: See Part II for Stochastic Approximation, Robust Stochastic Approximation, and Mirror Descent.
  • Sasha Rakhlin's lecture in a Berkeley course on [Online convex optimization], with a regret analysis.
  • A. Beck and M. Teboulle. Mirror Descent and Nonlinear Projected Subgradient Methods for Convex Optimization. Operations Research Letters, 31 (2003), 167-175.

  • [Seminar] Big Data and Deep Machine Learning
  • Professor Tong Zhang, Rutgers University and Baidu Inc.
Yuan Yao;
Tong Zhang (Rutgers)
12/25/2013, Wed Lecture 15: Final Project Description [pdf] Deadline: 1/12/2014, Sunday.
Yuan Yao


by YAO, Yuan.