2011 Fall: Mathematics for Data Science

Mathematics for Data Science (数据中的数学)
Fall 2011

Course Information

Synopsis (摘要)

This course is open to graduates and senior undergraduates in applied mathematics and statistics who are involved in dealing with data. It covers some topics on high dimensional statistics, manifold learning, diffusion geometry, random walks on graphs, concentration of measure, random matrix theory, geometric and topological methods, etc.
Prerequisite: linear algebra, basic probability and multivariate statistics, basic stochastic process (Markov chains).

Time and Place:

Tue 10:10-12:00pm;
Fri (odd weeks) 10:10-12:00pm
(Possibly change later!) Rm 425, Ying Jie Exchange Center; 数院本科生机房, 英杰交流中心 425

Homework and Projects:

We are targeting bi-weekly homeworks with mini-projects, and a final major project. No final exam. Scribers will get bonus credit for their wonderful work!

Teaching Assistant (助教):

YAN, Bowei (闫博巍) Email: bwyan (add "AT pku DOT edu DOT cn" afterwards)

Schedule (时间表)

Date	Topic	Instructor	Scriber
09/06/2011, Tue	Lecture 01: Introduction: Data Representation, Sample Mean, Variance, and PCA [lecture note 1.pdf] [Reference]: For PCA, see [ESL] Chapter 14.5 For SVD, see [Matrix] Chapter 2.5 etc. [data]: Handwritten digit 3	Y.Y.	Yan, Bowei
09/09/2011, Fri	Lecture 02: Stein's Phenomenon and Shrinkage [lecture note 2 by Sheng,Hu.pdf] [Reference]: [Tsybakov09] Chapter 3.4 [pdf] describes Stein's phenomenon and derives James-Stein's estimators. Charles Stein (1956) [Stein56] firstly defines the inadmissiblity and finds better estimators than sample mean in terms of uniformly smaller mean square error, which starts a new Odyssey toward high dimensional inference. [EfronMorris74] [pdf] gives 3 data examples that JS estimator is significantly better than sample mean.	Y.Y.	Sheng, Hu. Luo, Wulin. Lv, Yuan.
09/13/2011, Tue	Lecture 03: Random Matrix Theory and PCA [lecture note 3.pdf] [Reference]: [Johnstone06] shows that in high-dimensional inference when p/n is fixed, PCA will not converge by random matrix theory. [Nadakuditi10] gives a brief treatment on phase transitions appeared in PCA when p/n is fixed. [Homework 1]: Homework 1 [pdf]. Deadline: 09/30/2011, Friday. Two ways for submissions: Submit your electronic version to TA (Yan, Bowei) by email before deadline; or Hand in your paper version to TA on the class 09/27/2011, Tuesday. Mark on the head of your homework: Name - Student ID	Y.Y.	Tengyuan Liang; Bowei Yan
09/20/2011, Tue	Lecture 04: Diffusion Map, an introduction [lecture note 4.pdf, version 2] [Reference]: [Coifman05] An introduction to Diffusion Map. [CoifmanLafon06] Asymptotic theory of Diffusion Map. [Tao] Sec. 2.4.3 (the end of page 169) for the definition of Stietljes transform of a density p(t)dt on R (the book is using s(z) instead of m(z) in class). From the equation of m and z, say m^2 + zm + 1 =0, one can take derivative of z on both side to obtain m'(z) in terms of m and z. [data]: UMIST face data subset (33 faces of the same person in different angles)	Xiuyuan Cheng Princeton	Peng Luo; Wei Jin
09/23/2011, Fri	Lecture 05: Diffusion Map, convergence theory [lecture note 5.pdf, version 4] [Reference]: [CoifmanLafon06] Convergence of Diffusion Map with nonuniform distributed data to Fokker-Planck, Laplacian-Beltrami etc. [Singer06] Improvement with fast convergence rates to Laplacian, with bias-variance decomposition	Xiuyuan Cheng Princeton	Jun Yin; Ya'ning Liu
09/27/2011, Tue	Lecture 06: Diffusion Distance [lecture note 6.pdf] [Reference]: Amit Singer's Note on Diffusion Distances: An introduction to Diffusion Distances, which was used in the class. [Coifman05] The first paper on Diffusion Map and Diffusion Distance. [Homework 2]: Homework 2 [pdf]. Deadline: 10/12/2011, Tuesday. Note: Typoes on Problem 5 corrected! (10/01/2011) Mark on the head of your homework: Name - Student ID	Y.Y.	Lei Huang; Yue Zhao
10/11/2011, Tue	Lecture 07: Random Walk on Graphs: Perron-Frobenius Vector and PageRank [lecture note 7.pdf] [Reference]: [PageRank]: A book on mathematics for Google's PageRank. Perron-Frobenius Theory on pp. 168-174. [Meyer]: Chapter 8 Perron-Frobenius Theory for Nonnegative Matrices. [BrinPage98] Have you ever read the original paper by Brin-Page on PageRank? [Kleinberg99] Another important class of ranking of authorities and hubs, based on singular value decomposition of link matrix, by Jon Kleinberg. [Chang08]: Chang-Pearson-Zhang generalizes this to nonnegative tensors. Can you develop it into an application as "PageRank" on hypergraphs? [data]: Chinese (mainland) University Weblink during 12/2001-1/2002, more can be found at The Academic Web Link Database Matlab codes for PageRank, HITS etc.	Y.Y.	Yuan Lu; Bowei Yan
10/18/2011, Tue	Lecture 08: Random Walk on Graphs: Fiedler Vector, Cheeger inequality and spectral bipartition [lecture note 8.pdf] [Reference]: [Chung97]: a classic on Spectral Graph Theory, whose first four chapters can be found here. [Chung07]: four proofs for the cheeger inequality Jim Demmel's Lecture notes on Fiedler Theory at UC Berkeley: why we use unnormalized Laplacian eigenvectors for spectral partition [ChungLu06]: if you wanna read more random graph theory and complex networks. [data]: Zachary's Karate Club network: 34-by-34 adjacency matrix A, A(i,j) is 0 or 1.	Y.Y.	Zhiming Wang; Feng Lin
10/21/2011, Fri	Lecture 9: Random Walk on Graphs: Lumpability (metastability), piecewise constant right Eigenvectors and Multiple Spectral Clustering (MNcut) [lecture note 9.pdf] [Reference]: [KemenySnell76]: Chapter 6.3, 6.4 give definitions of lumpability. [MeilaShi01]: relationship between lumpability and multiple spectral clustering (MNcut). [ShiMalik00]: spectral clustering and image segmentation (Ncut). [Luxburg07]: a tutorial on spectral clustering. [Hochbaum10]: shows that a variant Ncut without 1/2 volume constraint, is surprisingly of polyonomial complexity although original one is NP-hard.	Y.Y.	Hong Cheng; Ping Qin
10/25/2011, Tue	Lecture 10: Random Walk on Graphs: Diffusion Distance and Commute Time Distance [lecture note 10.pdf] [Reference]: [QiuHancock07]: diffusion distance vs. commute time distance, in applications of spectral embedding and clustering. [Fouss06-CommuteDistance]: shows applications of average commute time distance, or the pseudoinverse of the Laplacian matrix of the graph, in measuring stochastic similiary between nodes on large graphs. [RadLuxHei09]: however, shows in large geometric random graphs commute distance converges to something meaningless for similarity measure! [GobelJagers74-CommuteDistance]: shows that the average commuting time derived from mean first passage time is in fact an Euclidean distance metric [KleinRandic93]: shows that effective resistance is a distance, which is upto a constant equivalent to average commuting time distance. [data]: Handwritten digits dataset [Homework 3]: Homework 3 [pdf]. (New pdate 11/1/2011) Deadline: 11/15/2011, Tuesday. Mark on the head of your homework: Name - Student ID	Y.Y.	Tangjie Lv; Longlong Jiang
11/01/2011, Tue	Lecture 11: PCA vs. MDS: Schoenberg Theory [Reference]: [Bavaud10]: a survey on MDS and Schoeberg Transformation in data analysis. [data]: Pairwise Distances among 9 Cities in US	Y.Y.	Yanzhen Deng; Jie Ren
11/04/2011, Fri	Lecture 12: Random Projections and Metric: Johnson-Lindenstrauss Theory [Reference]:	Y.Y.
11/08/2011, Tue	Lecture 13: MDS with uncertainty: Graph Realization [Reference]:	Y.Y.
11/15/2011, Tue	Lecture 14: Manifold Learning (Nonlinear Dimensionality Reduction): ISOMAP vs. LLE [Reference]:	Y.Y.
11/18/2011, Fri	Lecture 15: Other Manifold Learning Techniques: Laplacian, Hessian, LTSA [Reference]:	Y.Y.
11/22/2011, Tue	Lecture 16: Multiscale SVD and Wavelets on Graphs [Reference]:	Y.Y.
11/29/2011, Tue	Lecture 17: Sparsity in High Dimensional Statistics [Reference]:	Y.Y.
12/02/2011, Fri	Lecture 18: [Reference]:	Weinan E
12/06/2011, Tue	Lecture 19: [Reference]:	Weinan E
12/13/2011, Tue	Lecture 20: [Reference]:	Y.Y.
12/16/2011, Fri	Lecture 21: [Reference]:	Y.Y.
12/20/2011, Tue	Lecture 22: Final Project Report [Reference]:	Y.Y.

Reference

Books

[Chung97] Chung, Fan R.K. Spectral Graph Theory. 1997, AMS-CBMS. [Chapter 1-4]

[ChungLu06] Chung and Lu. Complex Graphs and Networks. 2006, AMS-CBMS. [Some chapters]

[ESL] Hastie, Tibshirani, and Friedman. The Elements of Statistical Learning. 2008, Springer. [Online book]

[KemenySnell76] Kemeny and Snell. Finite Markov Chains. 1976. [pdf]

[Kleinberg10] Easley and Kleinberg. Networks, Crowds, and Markets. 2010, Cambridge.

[Matrix] Golub and Van Loan. Matrix Computation. 1996.

[PageRank] Langville and Meyer. Google's PageRank and Beyond: The Science of Search Engine Rankings. 2006, Princeton University Press. [Google book]

[SHDD] Buhlmann and van de Geer. Statistics for High-Dimensional Data. 2011, Springer.

[Tao] Tao, Terrence. Topics in random matrix theory. Lecture Notes in UCLA. [pdf] [Online Book from Terry's BLOG]

[Tsybakov09] Tsybakov. Introduction to Nonparametric Estimation. 2009, Springer. [Online Book]

Papers

[Arun87] Arun, K. S., Huang, T. S., and Blostein, S. D. (1987) Least-squares fitting of two 3-D point sets. IEEE Transactions on Pattern Analysis and Machine Intelligence, 9 (5), pp. 698-700. [pdf].

[Bavaud10] Bavaud, Francois (2010) "On the Schoenberg Transformations in Data Analysis: Theory and Illustrations". [arXiv].

[BrinPage98] Sergey Brin, Larry Page (1998) "The Anatomy of a Large-Scale Hypertextual Web Search Engine". Proceedings of the 7th international conference on World Wide Web (WWW). Brisbane, Australia. pp. 107-117. [pdf].

[Chang08] Chang, Kung Ching, Kelly Pearson, and Tan Zhang (2008) "Perron-Frobenius theorem for nonnegative tensors". Commun. Math. Sci. Volume 6, Number 2 (2008), 507-520. [pdf].

[Chung07] Chung, Fan R.K. (2007) "Four proofs for the Cheeger inequality and graph partition algorithms". ICCM 2007. [pdf].

[Coifman05] Coifman, Lafon, Lee, Maggioni, Nadler, Warner, and Zucker (2005) Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps I; Multiscale Methods II. PNAS [I.pdf][II.pdf].

[CoifmanLafon06] Coifman and Lafon (2006) Diffusion maps. Applied and Computational Harmonic Analysis. [pdf].

[EfronMorris74] Efron, Bradley and Carl Morris (1974). Data Analysis using Stein's Estimator and Its Generalizations. [pdf]

[FanHoffman55] Fan, K. and Hoffman, A. J. (1987) Some Metric Inequalities in the Space of Matrices. Proceedings of the American Mathematical Society, 6 (1), pp. 111-116. [pdf].

[Fouss07-CommuteDistance] Fouss, Francois, Alain Pirotte, Jean-michel Renders, and Marco Saerens (2007) Random-walk computation of similarities between nodes of a graph, with application to collaborative recommendation. IEEE Transactions on Knowledge and Data Engineering, 19(3), pp. 355-369. [pdf].

[GobelJagers74] Gobel, F. and A. Jagers (1974). "Random Walks on Graphs". Stochastic Processes and Their Applications, 2: 311-336. [pdf].

[Hein05] Hein, M., J. Audibert, and U. von Luxburg (2005) From graphs to manifolds: weak and strong pointwise consistency of graph Laplacians, COLT, 2005. [pdf].

[Hochbaum10] Hochbaum, Dorit (2010) "Polynomial Time Algorithms for Ratio Regions and a Variant of Normalized Cut". IEEE Trans. Pattern Analysis and Machine Intelligence, 32, 2010. [pdf].

[Johnstone06] Johnstone, I (2006) High Dimensional Statistical Inference and Random Matrices. arXiv:0611589.

[Keller75] Keller, J. B. (1975) Closest Unitary, Orthogonal and Hermitian Operators to a Given Operator. Mathematics Magazine, 48 (4), pp. 192-197. [pdf].

[Kleinberg99] Kleinberg, Jon (1999). "Authoritative sources in a hyperlinked environment". Journal of the ACM 46 (5): 604-632. [pdf].

[KleinRandic93] Klein, D.J. and M. Randic (1993). "Resistance Distance". J. Math. Chemistry 12: 81-95. [pdf].

[Luxburg07] Ulrike von Luxburg (2007). A tutorial on spectral clustering. [pdf]

[MeilaShi01] Meila and Shi (2001). "A random walk view of spectral segmentation". AISTAT'01 [pdf, 7.7 MB].

[Nadakuditi10] Nadakuditi, R. R. and F. Benaych-Georges (2010) The breakdown point of signal subspace estimation. IEEE Sensor Array and Multichannel Signal Processing Workshop (October 2010), pg. 177-180 [pdf].

[QiuHancock07] Qiu, Huaijun, and E.R. Hancock (2007) "Clustering and Embedding Using Commute Times", IEEE Trans. Pattern Analysis and Machine Intelligence, 29(11): 1873-1890. [pdf].

[RadLuxHei09] Radl, Agnes, Ulrike von Luxburg, and Matthias Hein (2007) The Resistance Distance is Meaningless for Large Random Geometric Graphs. Workshop on Analyzing Networks and Learning with Graphs NIPS 2009 [pdf].

[LLE] Roweis, Sam T. and Saul K. Lawrence (2000) Locally Linear Embedding. Science. [LLE Website].

[ShiMalik00] Shi, Jianbo and Jitendra Malik (2000). "Normalized Cuts and Image Segmentation". IEEE Transactions on Pattern Analysis and Machine Intelligence,22(8): 888-905. [pdf].

[Singer06] Singer, Amit (2006) From graph to manifold Laplacian: The convergence rate. Applied and Computational Harmonic Analysis. [pdf].

[Stein56] Stein, Charles (1956). Inadmissibility of the usual estimator for the mean of a multivariate distribution. 1974. Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability. 1. pp. 197-206. [pdf]

[ISOMAP] Tenenbaum, J.B., V. de Silva and J. C. Langford (2000). A Global Geometric Framework for Nonlinear Dimensionality Reduction. Science. [ISOMAP Website]

Datasets

[Data_Univ] The Academic Web Link Database

[Data_ESL] Datasets for "The Elements of Statistical Learning"

[Data_Roweis] Sam Roweis's Data for Matlab Hackers

[Data_Stanford] Stanford Large Network Dataset Collection

Latex Template for Lecture Notes

[lecture00] lecture00.tex .

by YAO, Yuan.