Mathematics for Data Science (数据中的数学)
Fall 2011

Course Information

Synopsis (摘要)

This course is open to graduates and senior undergraduates in applied mathematics and statistics who are involved in dealing with data. It covers some topics on high dimensional statistics, manifold learning, diffusion geometry, random walks on graphs, concentration of measure, random matrix theory, geometric and topological methods, etc.
Prerequisite: linear algebra, basic probability and multivariate statistics, basic stochastic process (Markov chains).

Time and Place:

Tue 10:10-12:00pm;
Fri (odd weeks) 10:10-12:00pm
(Possibly change later!) Rm 425, Ying Jie Exchange Center; 数院本科生机房, 英杰交流中心 425

Homework and Projects:

We are targeting bi-weekly homeworks with mini-projects, and a final major project. No final exam. Scribers will get bonus credit for their wonderful work!

Teaching Assistant (助教):

YAN, Bowei (闫博巍) Email: bwyan (add "AT pku DOT edu DOT cn" afterwards)

Schedule (时间表)

Date Topic Instructor Scriber
09/06/2011, Tue Lecture 01: Introduction: Data Representation, Sample Mean, Variance, and PCA [lecture note 1.pdf]
  • For PCA, see [ESL] Chapter 14.5
  • For SVD, see [Matrix] Chapter 2.5 etc.
Y.Y. Yan, Bowei
09/09/2011, Fri Lecture 02: Stein's Phenomenon and Shrinkage [lecture note 2 by Sheng,Hu.pdf]
  • [Tsybakov09] Chapter 3.4 [pdf] describes Stein's phenomenon and derives James-Stein's estimators.
  • Charles Stein (1956) [Stein56] firstly defines the inadmissiblity and finds better estimators than sample mean in terms of uniformly smaller mean square error, which starts a new Odyssey toward high dimensional inference.
  • [EfronMorris74] [pdf] gives 3 data examples that JS estimator is significantly better than sample mean.
Y.Y. Sheng, Hu.
Luo, Wulin.
Lv, Yuan.
09/13/2011, Tue Lecture 03: Random Matrix Theory and PCA [lecture note 3.pdf]
  • [Johnstone06] shows that in high-dimensional inference when p/n is fixed, PCA will not converge by random matrix theory.
  • [Nadakuditi10] gives a brief treatment on phase transitions appeared in PCA when p/n is fixed.
    [Homework 1]:
  • Homework 1 [pdf]. Deadline: 09/30/2011, Friday. Two ways for submissions:
  • Submit your electronic version to TA (Yan, Bowei) by email before deadline; or
  • Hand in your paper version to TA on the class 09/27/2011, Tuesday.
  • Mark on the head of your homework: Name - Student ID
Y.Y. Tengyuan Liang;
Bowei Yan
09/20/2011, Tue Lecture 04: Diffusion Map, an introduction [lecture note 4.pdf, version 2]
  • [Coifman05] An introduction to Diffusion Map.
  • [CoifmanLafon06] Asymptotic theory of Diffusion Map.
  • [Tao] Sec. 2.4.3 (the end of page 169) for the definition of Stietljes transform of a density p(t)dt on R (the book is using s(z) instead of m(z) in class). From the equation of m and z, say m^2 + zm + 1 =0, one can take derivative of z on both side to obtain m'(z) in terms of m and z.
Xiuyuan Cheng
Peng Luo; Wei Jin
09/23/2011, Fri Lecture 05: Diffusion Map, convergence theory [lecture note 5.pdf, version 4]
  • [CoifmanLafon06] Convergence of Diffusion Map with nonuniform distributed data to Fokker-Planck, Laplacian-Beltrami etc.
  • [Singer06] Improvement with fast convergence rates to Laplacian, with bias-variance decomposition
Xiuyuan Cheng
Jun Yin; Ya'ning Liu
09/27/2011, Tue Lecture 06: Diffusion Distance [lecture note 6.pdf]
    [Homework 2]:
  • Homework 2 [pdf]. Deadline: 10/12/2011, Tuesday. Note: Typoes on Problem 5 corrected! (10/01/2011)
  • Mark on the head of your homework: Name - Student ID
Y.Y. Lei Huang; Yue Zhao
10/11/2011, Tue Lecture 07: Random Walk on Graphs: Perron-Frobenius Vector and PageRank [lecture note 7.pdf]
  • [PageRank]: A book on mathematics for Google's PageRank. Perron-Frobenius Theory on pp. 168-174.
  • [Meyer]: Chapter 8 Perron-Frobenius Theory for Nonnegative Matrices.
  • [BrinPage98] Have you ever read the original paper by Brin-Page on PageRank?
  • [Kleinberg99] Another important class of ranking of authorities and hubs, based on singular value decomposition of link matrix, by Jon Kleinberg.
  • [Chang08]: Chang-Pearson-Zhang generalizes this to nonnegative tensors. Can you develop it into an application as "PageRank" on hypergraphs?
Y.Y. Yuan Lu; Bowei Yan
10/18/2011, Tue Lecture 08: Random Walk on Graphs: Fiedler Vector, Cheeger inequality and spectral bipartition [lecture note 8.pdf]
Y.Y. Zhiming Wang; Feng Lin
10/21/2011, Fri Lecture 9: Random Walk on Graphs: Lumpability (metastability), piecewise constant right Eigenvectors and Multiple Spectral Clustering (MNcut) [lecture note 9.pdf]
  • [KemenySnell76]: Chapter 6.3, 6.4 give definitions of lumpability.
  • [MeilaShi01]: relationship between lumpability and multiple spectral clustering (MNcut).
  • [ShiMalik00]: spectral clustering and image segmentation (Ncut).
  • [Luxburg07]: a tutorial on spectral clustering.
  • [Hochbaum10]: shows that a variant Ncut without 1/2 volume constraint, is surprisingly of polyonomial complexity although original one is NP-hard.
Y.Y. Hong Cheng; Ping Qin
10/25/2011, Tue Lecture 10: Random Walk on Graphs: Diffusion Distance and Commute Time Distance [lecture note 10.pdf]
  • [QiuHancock07]: diffusion distance vs. commute time distance, in applications of spectral embedding and clustering.
  • [Fouss06-CommuteDistance]: shows applications of average commute time distance, or the pseudoinverse of the Laplacian matrix of the graph, in measuring stochastic similiary between nodes on large graphs.
  • [RadLuxHei09]: however, shows in large geometric random graphs commute distance converges to something meaningless for similarity measure!
  • [GobelJagers74-CommuteDistance]: shows that the average commuting time derived from mean first passage time is in fact an Euclidean distance metric
  • [KleinRandic93]: shows that effective resistance is a distance, which is upto a constant equivalent to average commuting time distance.
    [Homework 3]:
  • Homework 3 [pdf]. (New pdate 11/1/2011) Deadline: 11/15/2011, Tuesday.
  • Mark on the head of your homework: Name - Student ID
Y.Y. Tangjie Lv; Longlong Jiang
11/01/2011, Tue Lecture 11: PCA vs. MDS: Schoenberg Theory
  • [Bavaud10]: a survey on MDS and Schoeberg Transformation in data analysis.
Y.Y. Yanzhen Deng; Jie Ren
11/04/2011, Fri Lecture 12: Random Projections and Metric: Johnson-Lindenstrauss Theory
11/08/2011, Tue Lecture 13: MDS with uncertainty: Graph Realization
11/15/2011, Tue Lecture 14: Manifold Learning (Nonlinear Dimensionality Reduction): ISOMAP vs. LLE
11/18/2011, Fri Lecture 15: Other Manifold Learning Techniques: Laplacian, Hessian, LTSA
11/22/2011, Tue Lecture 16: Multiscale SVD and Wavelets on Graphs
11/29/2011, Tue Lecture 17: Sparsity in High Dimensional Statistics
12/02/2011, Fri Lecture 18:
Weinan E
12/06/2011, Tue Lecture 19:
Weinan E
12/13/2011, Tue Lecture 20:
12/16/2011, Fri Lecture 21:
12/20/2011, Tue Lecture 22: Final Project Report


by YAO, Yuan.