2013 Fall: Mathematical Introduction to Data Science

Mathematical Introduction to Data Science (数据中的数学)
Fall 2013

Course Information

Synopsis (摘要)

This course is open to graduate students and senior undergraduates in applied mathematics and statistics who work with data. It covers topics in high dimensional statistics, manifold learning, diffusion geometry, random walks on graphs, concentration of measure, random matrix theory, geometric and topological methods, etc.
Prerequisites: linear algebra, basic probability and multivariate statistics, basic stochastic processes (Markov chains); familiarity with Matlab or R.

Lecture Notes

[pdf download]

Time and Place:

Wednesday 3:10-6:00pm;

The 3rd Lecture Hall (三教) Rm 103

Homework and Projects:

We plan weekly homework assignments, monthly mini-projects, and a final major project. There is no final exam. Scribers will get bonus credit for their work!

Teaching Assistant (助教):

HUANG, Junliang (黄俊亮) Email: jlhwung (add "AT gmail DOT com" afterwards)
WANG, Qing (王擎) Email: wangqing.linus (add "AT gmail DOT com" afterwards)

Schedule (时间表)

Date	Topic	Instructor	Scriber
09/11/2013, Wed	Lecture 01: Introduction to Course Syllabus	Yuan Yao
09/18/2013, Wed	Seminar: Distributed Sparse Optimization [slides]. Speaker: Professor Wotao Yin, UCLA. Time: 2013.9.18 Wed 3:00pm. Venue: The 3rd Lecture Hall (三教) Rm 103, PKU. Abstract: Sparse optimization has found interesting applications in many data-processing areas such as compressed sensing, machine learning, signal processing, medical imaging, finance, etc. After reviewing compressed sensing and sparse optimization, this talk introduces novel algorithms tailored for very large scale sparse optimization problems with very big data. Besides the typical complexity analysis, we analyze the overhead due to parallel and distributed computing. Numerical results are presented to demonstrate the scalability of the parallel codes for handling problems with hundreds of gigabytes of data in under 2 minutes on the Amazon EC2 cloud. The work is joint with Zhimin Peng and Ming Yan. Lecture 02: Sample Mean and Covariance, Principal Component Analysis. [Reference]: For PCA, see [ESL] Chapter 14.5. For SVD, see [Matrix] Chapter 2.5 etc. PCA in the analysis of SNPs: Li et al. Science 319(5866):1100-1104, 2008. [data]: Handwritten digit 3; 1258_by_452 stock closing prices for 4 years, S&P 500; 650K-SNPs_by_1000-persons, Human Genome Diversity Project. [Homework 1]: Homework 1 [pdf]. Deadline: 09/25/2013, Wednesday. Two ways to submit: email an electronic version with source code to the TAs before the deadline, or hand in a paper version with source code to the TA in class on 09/25/2013, Wednesday. Mark at the top of your homework: Name - Student ID.	Wotao Yin (UCLA); Yuan Yao
09/25/2013, Wed	Lecture 03: Stein's Phenomenon and James-Stein's Estimator [lecture note]. [Reference]: For James-Stein's Estimator, see Johnstone's draft [GE] Chapter 2, esp. 2.5 and 2.6 etc. For the example in class, see Efron's book, Chapter 1: Empirical Bayes and the James-Stein Estimator. A lecture note by Emmanuel Candes at Stanford: James-Stein Estimate. [Homework 2]: Homework 2 [pdf]. Deadline: 10/09/2013, Wednesday. Mark at the top of your homework: Name - Student ID. [Project 1]: Project 1 [pdf]. Deadline: 10/16/2013, Wednesday.	Jingshu Wang (Stanford); Yuan Yao	Qing Wang; Junxin Zhang
10/09/2013, Wed	Lecture 04: Multidimensional Scaling (MDS) and ISOMAP [Reference]: MDS: Chapter 2 ISOMAP: Chapter 5.2	Jian Sun (Tsinghua)
10/16/2013, Wed	Lecture 05: Markov Chains on Graphs [lecture note]	Jian Sun (Tsinghua)
10/23/2013, Wed	Lecture 06: Markov Chains on Graphs	Jian Sun (Tsinghua)
10/30/2013, Wed	Lecture 07: An Introduction to Convex Optimization [slides] [Reference]: Lieven Vandenberghe's lecture notes on the gradient method [pdf]	Zaiwen Wen
11/06/2013, Wed	Lecture 08: Cheeger's Inequality and Lumpability of Markov Chains. I corrected an error in the old lecture note, thanks to Jiechao Xiong. See the updated version.
11/13/2013, Wed	Lecture 09: Diffusion Map, Commute Time Map, and Optimal Lumpable Reduction. [Reference]: Diffusion Map: Chapter 7.1-7.2; Commute Time: Chapter 6.6, 7.3; Optimal Lumpable Reduction: Chapter 6.5. [Homework 3]: Homework 3 [pdf]. Deadline: 11/20/2013, Wednesday. Mark at the top of your homework: Name - Student ID.
11/20/2013, Wed	Lecture 10: Semi-supervised Learning from Transition Path Theory, and Combinatorial Hodge Theory. [Reference]: Transition Path Theory: Chapter 6.7; Semi-supervised Learning: Chapter 8; Combinatorial Hodge Theory: Chapter 9. [Homework 4]: Homework 4 [pdf]. Thanks to Weiming Li for pointing out a typo, corrected in red. Deadline: 11/27/2013, Wednesday. Mark at the top of your homework: Name - Student ID.
11/27/2013, Wed	Lecture 11: Compressed Sensing and Algorithms. [Reference]: ebanshu: Application of Hodge Theory: Lecture 11-1; Compressed Sensing and Algorithms: Lecture 11-2. Joel Tropp, Greedy is Good: Algorithmic Results for Sparse Approximation. IEEE Trans. Inform. Theory, 2004. For the (linearized) Bregman iterative procedure, see Wotao Yin's website at Rice. Feng Ruan's R package on LInearized BRegman Algorithms (LIBRA) [Download] [Manual]. [Homework 5]: Homework 5 [pdf]. Deadline: 12/4/2013, Wednesday. Mark at the top of your homework: Name - Student ID.
12/4/2013, Wed	Lecture 12: Unified Framework for Regularized M-estimator in High Dimensional Statistics. [Reference]: ebanshu: l2-consistency: Lecture 12 at [ebanshu]. Negahban, Ravikumar, Wainwright and Yu (2012) A Unified Framework for High-Dimensional Analysis of M-estimators with Decomposable Regularizers. Statistical Science 27(4):538-557. Wainwright's talk slides at CAS summer school 2013 [Download]. [Homework 6]: Homework 6 [pdf]. Deadline: 12/11/2013, Wednesday. Mark at the top of your homework: Name - Student ID.
12/11/2013, Wed	Lecture 13: Robust/Sparse PCA and Partial MDS: SDP extensions [ebanshu]. [Reference]: [RPCA]: Robust Principal Component Analysis. [SPCA]: Sparse Principal Component Analysis formulated as a semidefinite program. [Parrilo_SIAM09]: Robust PCA from the viewpoint of convex algebraic geometry. Emmanuel Candes' talk at PKU, Oct 2011. [Ye06]: a semidefinite programming (SDP) approach for MDS with missing values (Sensor Network Localization). [Ye11]: Yinyu Ye's talk at the Fields Institute (2011) on Universal Rigidity and SDP, with some state-of-the-art open problems. [MVU]: another use of SDP in manifold learning, Maximum Variance Unfolding (MVU). [Matlab]: testRPCA.m: my Matlab code for RPCA, based on CVX. testSPCA.m: my Matlab code for SPCA, based on CVX. CVX: Matlab software for Disciplined Convex Programming, a basic package for semidefinite programming. Yi Ma's webpage on Low-Rank Matrix Recovery at UIUC: many references and Matlab codes. SNLSDP: SDP for SNL problems with up to 200 sensors. DISCO: SDP for anchor-free SNL problems with a few thousand sensors. [Homework 7]: Homework 7 [pdf]. Deadline: 12/18/2013, Wednesday. Mark at the top of your homework: Name - Student ID.
12/18/2013, Wed	Lecture 14: Stochastic Approximations [ebanshu]. [Reference]: Steve Wright's talk on [Sparse Optimization]: see Part II for Stochastic Approximation, Robust Stochastic Approximation, and Mirror Descent. Sasha Rakhlin's talk at a Berkeley course on [Online convex optimization], with a regret analysis. A. Beck and M. Teboulle. Mirror Descent and Nonlinear Projected Subgradient Methods for Convex Optimization, Operations Research Letters, 31 (2003), 167-175. [Seminar]: Big Data and Deep Machine Learning, Professor Tong Zhang, Rutgers University and Baidu Inc.	Yuan Yao; Tong Zhang (Rutgers)
12/25/2013, Wed	Lecture 15: Final Project Description [pdf]. Deadline: 1/12/2014, Sunday.	Yuan Yao

Reference

Books

[Boyd09] Boyd and Vandenberghe. Convex Optimization, 7th printing (2009). [pdf]

[Chung97] Chung, Fan R.K. Spectral Graph Theory. 1997, AMS-CBMS. [Chapter 1-4]

[ChungLu06] Chung and Lu. Complex Graphs and Networks. 2006, AMS-CBMS. [Some chapters]

[ESL] Hastie, Tibshirani, and Friedman. The Elements of Statistical Learning. 2008, Springer. [Online book]

[GE] Iain Johnstone. Gaussian Estimation: Sequence and Wavelet Models. Draft June 11, 2013. [Online book]

[KemenySnell76] Kemeny and Snell. Finite Markov Chains. 1976.

[Kleinberg10] Easley and Kleinberg. Networks, Crowds, and Markets. 2010, Cambridge.

[Matrix] Golub and Van Loan. Matrix Computations. 1996.

[PageRank] Langville and Meyer. Google's PageRank and Beyond: The Science of Search Engine Rankings. 2006, Princeton University Press. [Google book]

[SHDD] Buhlmann and van de Geer. Statistics for High-Dimensional Data. 2011, Springer.

[Tao] Tao, Terence. Topics in Random Matrix Theory. Lecture notes, UCLA. [pdf] [Online Book from Terry's BLOG]

[Tsybakov09] Tsybakov. Introduction to Nonparametric Estimation. 2009, Springer. [Online Book]

Papers

[Achlioptas01] Achlioptas, Dimitris (2001) "Database-friendly Random Projections". Proc 20th ACM Symp Principles of Database Systems, Santa Barbara, CA, 2001, 274-281. [pdf].

[Arun87] Arun, K. S., Huang, T. S., and Blostein, S. D. (1987) Least-squares fitting of two 3-D point sets. IEEE Transactions on Pattern Analysis and Machine Intelligence, 9 (5), pp. 698-700. [pdf].

[Bavaud10] Bavaud, Francois (2010) "On the Schoenberg Transformations in Data Analysis: Theory and Illustrations". [arXiv].

[Laplacian] Belkin, M. and P. Niyogi (2003) "Laplacian eigenmaps for dimensionality reduction and data representation.". Neural Computation 15:1373-1396. [pdf].

[Belkin_Niyogi_NIPS2002] Belkin, M. and P. Niyogi (2002) "Using Manifold Structure for Partially Labelled Classification". NIPS 2002 [pdf].

[Ye06] P. Biswas, T.-C. Liang, K.-C. Toh, T.-C. Wang, and Y. Ye (2006) "Semidefinite programming approaches for sensor network localization with noisy distance measurements". IEEE Transactions on Automation Science and Engineering, 3, pp. 360-371. [pdf].

[BrinPage98] Sergey Brin, Larry Page (1998) "The Anatomy of a Large-Scale Hypertextual Web Search Engine". Proceedings of the 7th international conference on World Wide Web (WWW). Brisbane, Australia. pp. 107-117. [pdf].

[RPCA] E. J. Candes, X. Li, Y. Ma, and J. Wright (2009) "Robust Principal Component Analysis?". Journal of the ACM, 58(3), 1-37. [pdf].

[Parrilo_SIAM09] V. Chandrasekaran, S. Sanghavi, P.A. Parrilo, A. Willsky (2009) "Rank-Sparsity Incoherence for Matrix Decomposition". http://arxiv.org/pdf/0906.2220 . [pdf].

[Chang08] Chang, Kung Ching, Kelly Pearson, and Tan Zhang (2008) "Perron-Frobenius theorem for nonnegative tensors". Commun. Math. Sci. Volume 6, Number 2 (2008), 507-520. [pdf].

[Chung07] Chung, Fan R.K. (2007) "Four proofs for the Cheeger inequality and graph partition algorithms". ICCM 2007. [pdf].

[Coifman05] Coifman, Lafon, Lee, Maggioni, Nadler, Warner, and Zucker (2005) Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps I; Multiscale Methods II. PNAS [I.pdf][II.pdf].

[CoifmanLafon06] Coifman and Lafon (2006) Diffusion maps. Applied and Computational Harmonic Analysis. [pdf].

[Dasgupta99] Dasgupta and Gupta (1999) "An Elementary Proof of the Johnson-Lindenstrauss Lemma". Technical Report, ICSI Berkeley. Extended version in Random Structures and Algorithms, 2003, 22(1):60-65.

[SPCA_SDP] A. d'Aspremont, L. El Ghaoui, M. Jordan, and G. Lanckriet (2006) "A Direct Formulation of Sparse PCA using Semidefinite Programming". Preprint arxiv.org/pdf/cs/0406021. Published in SIAM Review, vol. 49, no. 3, 2007.

[Hessian] Donoho, D.L. and C. Grimes (2003) "Hessian Eigenmaps: New Locally Linear Embedding techniques for high dimensional data". PNAS 100 (10):5591-5596.

[EfronMorris74] Efron, Bradley and Carl Morris (1974). Data Analysis using Stein's Estimator and Its Generalizations. [pdf]

[FanHoffman55] Fan, K. and Hoffman, A. J. (1955) Some Metric Inequalities in the Space of Matrices. Proceedings of the American Mathematical Society, 6 (1), pp. 111-116. [pdf].

[Fouss07-CommuteDistance] Fouss, Francois, Alain Pirotte, Jean-michel Renders, and Marco Saerens (2007) Random-walk computation of similarities between nodes of a graph, with application to collaborative recommendation. IEEE Transactions on Knowledge and Data Engineering, 19(3), pp. 355-369. [pdf].

[GobelJagers74] Gobel, F. and A. Jagers (1974). "Random Walks on Graphs". Stochastic Processes and Their Applications, 2: 311-336. [pdf].

[Hein05] Hein, M., J. Audibert, and U. von Luxburg (2005) From graphs to manifolds: weak and strong pointwise consistency of graph Laplacians, COLT, 2005. [pdf].

[Hochbaum10] Hochbaum, Dorit (2010) "Polynomial Time Algorithms for Ratio Regions and a Variant of Normalized Cut". IEEE Trans. Pattern Analysis and Machine Intelligence, 32, 2010. [pdf].

[Hunter06] Hunter, J.J. (2006) "Variances of first passage times in a Markov chain with applications to mixing times". Res. Lett. Inf. Math. Sci., 10:17-48, 2006. [pdf].

[Indyk98] Indyk, P. and R. Motwani (1998) "Approximate nearest neighbors: Towards removing the curse of dimensionality". Proc 30th Annu ACM Symp Theory of Computing, Dallas, TX, 1998, pp. 604-613. [pdf].

[Johnstone06] Johnstone, I. (2006) High Dimensional Statistical Inference and Random Matrices. arXiv:math/0611589.

[Jones11] Peter Wilcox Jones, Andrei Osipov, and Vladimir Rokhlin (2011) Randomized Approximate Nearest Neighbors Algorithm. PNAS, 2011 [pdf].

[Keller75] Keller, J. B. (1975) Closest Unitary, Orthogonal and Hermitian Operators to a Given Operator. Mathematics Magazine, 48 (4), pp. 192-197. [pdf].

[Kleinberg99] Kleinberg, Jon (1999). "Authoritative sources in a hyperlinked environment". Journal of the ACM 46 (5): 604-632. [pdf].

[KleinRandic93] Klein, D.J. and M. Randic (1993). "Resistance Distance". J. Math. Chemistry 12: 81-95. [pdf].

[Li2008] Li J.Z., et al. (2008). "Worldwide Human Relationships Inferred from Genome-Wide Patterns of Variation". Science 319(5866):1100-1104, 2008.

[Luxburg07] Ulrike von Luxburg (2007). A tutorial on spectral clustering. [pdf]

[Luxburg08] Ulrike von Luxburg, Mikhail Belkin, and Olivier Bousquet (2008). Consistency of Spectral Clustering. Ann. Stat. 36(2): 555-586. [pdf]

[MeilaShi01] Meila and Shi (2001). "A random walk view of spectral segmentation". AISTATS'01 [pdf, 7.7 MB].

[Nadler_Srebro_NIPS2009] Nadler, Boaz, Nathan Srebro, and Xueyuan Zhou (2009) "Semi-Supervised Learning with the Graph Laplacian: The Limit of Infinite Unlabelled Data". NIPS 2009 [pdf].

[Nadakuditi10] Nadakuditi, R. R. and F. Benaych-Georges (2010) The breakdown point of signal subspace estimation. IEEE Sensor Array and Multichannel Signal Processing Workshop (October 2010), pg. 177-180 [pdf].

[QiuHancock07] Qiu, Huaijun, and E.R. Hancock (2007) "Clustering and Embedding Using Commute Times", IEEE Trans. Pattern Analysis and Machine Intelligence, 29(11): 1873-1890. [pdf].

[RadLuxHei09] Radl, Agnes, Ulrike von Luxburg, and Matthias Hein (2009) The Resistance Distance is Meaningless for Large Random Geometric Graphs. Workshop on Analyzing Networks and Learning with Graphs, NIPS 2009 [pdf].

[LLE] Roweis, Sam T. and Lawrence K. Saul (2000) Nonlinear Dimensionality Reduction by Locally Linear Embedding. Science, 290:2323-2326. [LLE Website].

[ShiMalik00] Shi, Jianbo and Jitendra Malik (2000). "Normalized Cuts and Image Segmentation". IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8): 888-905. [pdf].

[Singer06] Singer, Amit (2006) From graph to manifold Laplacian: The convergence rate. Applied and Computational Harmonic Analysis. [pdf].

[Stein56] Stein, Charles (1956). Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability. 1. pp. 197-206. [pdf]

[ISOMAP] Tenenbaum, J.B., V. de Silva and J. C. Langford (2000). A Global Geometric Framework for Nonlinear Dimensionality Reduction. Science, 290:2319-2323. [ISOMAP Website]

[MVU] Weinberger, Killian Q. and Lawrence K. Saul (2006). "Unsupervised Learning of Image Manifolds by Semidefinite Programming". International Journal of Computer Vision 70(1), 77-90, 2006 [pdf]

[ZhaZha09] Hongyuan Zha and Zhenyue Zhang (2009). "Spectral properties of the alignment matrices in manifold learning". SIAM Review. [pdf]

[LTSA] Zhenyue Zhang and Hongyuan Zha (2005). "Principal Manifolds and Nonlinear Dimension Reduction via Local Tangent Space Alignment". SIAM Journal on Scientific Computing 26(1). [pdf]

[ZhuLaf_ICML2003] Xiaojin Zhu, Zoubin Ghahramani and John Lafferty (2003). "Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions". ICML 2003 [pdf]

Datasets

[Anzhen Heart Data] Heart Operation Effect Prediction, provided by Dr. Jinwen Wang, Anzhen Hospital

[DSP-Bidding Competition Data] Contest Website, provided by Dr. Xuehua Shen, iPinyou Co. Ltd.

[Keywords Pricing] Keywords and profit index in paid search advertising, by Hansheng Wang (Guanghua, PKU). [sample file] [readme.txt] [data in csv]

[Protein Folding] Protein Folding prediction from sequence variations, by Steve Smale and Massimo Andreatta (CityUHK). [compressed .zip] [readme.txt]

[Dream of the Red Chamber (红楼梦) character-event matrix] a 376-by-475 matrix (374-by-475 updated by WAN, Mengting) of character-event appearances in A Dream of Red Mansions (Xueqin Cao) [374 Characters.txt (for R/read.table)] [HongLouMeng374.csv] [HongLouMeng376.xls] [.mat] [readme.m]

[Data_Gonewind] Gone with the Wind interaction network in Matlab, by Xiuyuan Cheng, 'A: 68-by-68 adjacency matrix; name: 68 character names'

[Journey to the West (西游记)] character-scene occurrence matrices for 100 chapters [data in Matlab (302-by-408 matrix)]

chap001-005	chap006-009	chap010-013	chap014-017	chap018-021	chap022-025
chap026-029	chap030-033	chap034-037	chap038-041	chap042-045	chap046-049
chap050-053	chap054-057	chap058-061	chap062-065	chap066-069	chap070-073
chap074-077	chap078-081	chap082-085	chap086-088	chap089-091	chap092-094
chap095-097	chap098-100	All in TXT	readData.m