PKU

Mathematical Introduction to Data Science (数据中的数学)
Fall 2012


Course Information

Synopsis (摘要)

This course is open to graduate and senior undergraduate students in applied mathematics and statistics who deal with data. It covers topics in high-dimensional statistics, manifold learning, diffusion geometry, random walks on graphs, concentration of measure, random matrix theory, geometric and topological methods, etc.
Prerequisites: linear algebra, basic probability and multivariate statistics, basic stochastic processes (Markov chains); familiarity with Matlab or R.

Lecture Notes

[pdf download]

Time and Place:

Monday 3:10-5:00pm;
Thursday (odd weeks) 1:00-2:50pm
 
Rm 316, Classroom Building No. 2 (二教 316)
Biweekly seminars (Fridays) will be announced separately.

Homework and Projects:

We plan weekly homework, monthly mini-projects, and a final major project. There is no final exam. Scribes will get bonus credit for their work!

Teaching Assistant (助教):

LV, Yuan (吕渊) Email: shirleybluely (add "AT gmail DOT com" afterwards)

Schedule (时间表)

Date Topic Instructor Scribe
09/10/2012, Mon Lecture 01: Introduction: Data Representation, Sample Mean, Variance, and PCA [lecture note 1.pdf]
09/11/2012, Tue Seminar: Recent Progress on Linear Programming and the Simplex Method
    Abstract: Linear programming (LP), together with the simplex method, has remained a core topic in Operations Research, Computer Science, and Mathematics since 1947. Thanks to relentless research effort, a linear program can be solved today one million times faster than thirty years ago. Businesses, large and small, now use LP models to control manufacturing inventories, price commodities, design civil/communication networks, and plan investments. LP has even become a popular subject in undergraduate, graduate, and MBA curricula, advancing human knowledge and promoting science education. The aim of the talk is to describe several recent exciting advances on LP and the simplex method, including counterexamples to the Hirsch conjecture, pivoting rules and their exponential behavior, strongly polynomial-time bounds for the simplex and policy-iteration methods for solving Markov decision processes (MDPs) and turn-based zero-sum games with any constant discount factor, the strongly polynomial-time complexity of the simplex method for solving deterministic MDPs regardless of discounts, etc.
    Bio: Yinyu Ye is currently a full Professor in Management Science and Engineering and the Institute for Computational and Mathematical Engineering, and the Director of the MS&E Industrial Affiliates Program, at Stanford University. He received the B.S. degree in System Engineering from the Huazhong University of Science and Technology, China, and the M.S. and Ph.D. degrees in Engineering-Economic Systems and Operations Research from Stanford University. His current research interests include Continuous and Discrete Optimization, Algorithm Design and Analysis, Computational Game/Market Equilibrium, Metric Distance Geometry, Dynamic Resource Allocation, and Stochastic and Robust Decision Making. He is an INFORMS (the Institute for Operations Research and the Management Sciences) Fellow, and has received several research awards, including the inaugural 2012 ISMP Tseng Lectureship Prize for outstanding contributions to continuous optimization, the 2009 John von Neumann Theory Prize for fundamental sustained contributions to theory in Operations Research and the Management Sciences, the inaugural 2006 Farkas Prize in optimization, and the 2009 IBM Faculty Award. He has supervised numerous doctoral students at Stanford who received the 2008 Nicholson Prize and the 2006 and 2010 INFORMS Optimization Prizes for Young Researchers.
09/13/2012, Thu Lecture 02: Stein's Phenomenon and Shrinkage [lecture note 2.pdf]
    [Reference]:
  • [Tsybakov09] Chapter 3.4 [pdf] describes Stein's phenomenon and derives the James-Stein estimator.
  • Charles Stein (1956) [Stein56] first defined inadmissibility and found estimators better than the sample mean, with uniformly smaller mean squared error, starting a new odyssey toward high-dimensional inference.
  • [EfronMorris74] [pdf] gives three data examples in which the James-Stein estimator is significantly better than the sample mean (a toy Monte Carlo comparison is sketched below).
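    [Matlab sketch]:
  • A minimal Monte Carlo comparison (my own illustration, not from the lecture notes) of the maximum likelihood estimator X with the positive-part James-Stein estimator for a single observation X ~ N(theta, I_p), p >= 3; the true mean theta here is a hypothetical choice:

      p = 10; ntrials = 10000;
      theta = ones(p,1);                   % hypothetical true mean
      mse_mle = 0; mse_js = 0;
      for t = 1:ntrials
          X = theta + randn(p,1);          % one draw from N(theta, I_p)
          c = max(0, 1 - (p-2)/(X'*X));    % positive-part James-Stein shrinkage
          mse_mle = mse_mle + norm(X - theta)^2;
          mse_js  = mse_js  + norm(c*X - theta)^2;
      end
      fprintf('MSE(MLE)=%.3f  MSE(JS)=%.3f\n', mse_mle/ntrials, mse_js/ntrials);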
    [Homework 1]:
  • Homework 1 [pdf]. Deadline: 09/24/2012, Monday. Two ways to submit:
  • Email your electronic version to the TA before the deadline; or
  • Hand in your paper version to the TA in class on Monday, 09/24/2012.
  • Mark your name and student ID at the top of your homework.
Y.Y.
09/17/2012, Mon Lecture 03: Random Matrix Theory and PCA [lecture note 3.pdf]
    [Reference]:
  • [Johnstone06] uses random matrix theory to show that in high-dimensional inference, when the ratio p/n is fixed, PCA is inconsistent.
  • [Nadakuditi10] gives a brief treatment of the phase transitions that appear in PCA when p/n is fixed (see the Matlab sketch below).
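    [Matlab sketch]:
  • A small simulation (my own, not from the notes) of the phase transition under the spiked covariance model Sigma = I + lambda*u*u' with gamma = p/n fixed: the top sample eigenvalue escapes the Marchenko-Pastur bulk edge (1+sqrt(gamma))^2 only when the spike strength lambda exceeds sqrt(gamma):

      n = 1000; p = 500; gamma = p/n;
      u = [1; zeros(p-1,1)];                         % spike direction
      for lambda = [0.3, sqrt(gamma), 2.0]           % below, at, above the threshold
          Sh = eye(p) + (sqrt(1+lambda)-1)*(u*u');   % square root of I + lambda*u*u'
          X = randn(n,p) * Sh;                       % rows ~ N(0, Sigma)
          S = (X'*X)/n; S = (S+S')/2;                % sample covariance, symmetrized
          fprintf('lambda=%.2f: top eig=%.3f, MP edge=%.3f\n', ...
              lambda, max(eig(S)), (1+sqrt(gamma))^2);
      end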
Y.Y. Tengyuan Liang;
Bowei Yan
09/21/2012, Fri Seminar: The Amazing Growth of the Real-time Advertising Ecosystem
  • Speaker: Dr. Xuehua Shen
  • Location: 1560 Science Building 1st (理科一号楼 1560)
  • Time: 2-3pm, Friday 9/21
    Abstract: We have been witnessing the amazing growth of real-time bidding in the USA over the past two years, and in China since late 2011. In this talk, I discuss three things about real-time bidding in display advertising: first, what real-time bidding (RTB) is and the important players in this ecosystem, such as DSPs and ad exchanges; second, the current status of RTB in the USA and China and its technical challenges; third, how we can grow the RTB ecosystem even bigger. In the third part, I particularly point out the differences between big-data and cloud-computing work in academia and in industry, and describe three challenging research problems in computational advertising.
    Bio: Xuehua Shen received a B.S. in Computer Science from Nanjing University, China, and a Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign, USA. His Ph.D. thesis was on personalized search. After his Ph.D., he worked on Google search quality in Mountain View, CA, doing personalized search and a search-quality live experiment platform based on real user interactions. He then worked at BlueKai, the biggest data exchange and data management platform (DMP) in Silicon Valley, using the Hadoop cloud-computing platform for personalized ads and predictive modeling. Now, he is co-founder and CTO of iPinyou (www.ipinyou.com), the leader in real-time advertising and audience targeting in China.
09/24/2012, Mon Lecture 04: Random Matrix Theory and PCA (continued)
    [Reference]:
  • [Johnstone06] uses random matrix theory to show that in high-dimensional inference, when the ratio p/n is fixed, PCA is inconsistent.
  • [Nadakuditi10] gives a brief treatment of the phase transitions that appear in PCA when p/n is fixed.
    [Mini-Project 1]:
  • Mini-Project 1 [pdf]. Deadline: 10/8/2012, Monday.
  • Mark your name and student ID at the top of your report.
  • Submit your report with source code (as an appendix in the report or as a .zip file).
09/26/2012, Wed Seminar: Geometric Inference Using Distance-like Functions [slides]
  • Speaker: Prof. Frederic Chazal (INRIA Saclay -- Ile-de-France)
  • Location: 1st Science Building, Rm 1273
  • Time: 10:30-11:30am, Wednesday 9/26
    Abstract: Data often comes in the form of a point cloud sampled from an unknown compact subset of Euclidean space. The general goal of geometric inference is then to recover geometric and topological features of this subset from the approximating point cloud data. In recent years, it has become apparent that the study of distance functions makes it possible to address many of these questions successfully. However, one of the main limitations of this framework is that it does not cope well with outliers or with background noise. In this talk, we will show how to extend the framework of distance functions to overcome this problem. Replacing compact subsets by probability measures, we will introduce a notion of distance-to-measure functions. These functions share many properties with classical distance functions, which makes them suitable for inference purposes. In particular, by considering appropriate level sets of these distance functions, it is possible to associate topological and geometric features to a probability measure in a robust way.
    Bio: Frederic Chazal is a Senior Researcher in the GEOMETRICA team at INRIA Saclay, France. He obtained his Ph.D. in Mathematics from the University of Burgundy. His research interests are in computational geometry and topology, where he has made significant contributions. His work includes geometric inference for probability measures, sampling theory for compact sets in Euclidean spaces, the stability of persistence diagrams, etc.
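    [Matlab sketch]:
  • A toy empirical distance-to-measure, under my reading of the talk (the usual k-nearest-neighbor estimator with mass parameter m = k/N; the data, query points, and use of pdist2 from the Statistics Toolbox are my own assumptions). Unlike the distance to the nearest sample point, it is robust to a few outliers:

      X = [randn(200,2); 10*rand(10,2)];   % samples plus a few hypothetical outliers
      k = 20;                              % k = m*N nearest neighbors
      q = [0 0; 5 5];                      % query points
      D = pdist2(q, X);                    % distances from queries to all samples
      Ds = sort(D, 2);                     % sort each row ascending
      dtm = sqrt(mean(Ds(:,1:k).^2, 2));   % average squared distance to k nearest
      disp(dtm);                           % small near the data, large in the noise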
09/27/2012, Thu Lecture 5: Multidimensional Scaling
    [Reference]:
  • [Bavaud10]: a survey of MDS and the Schoenberg transformation in data analysis (a minimal classical MDS sketch follows).
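    [Matlab sketch]:
  • A minimal classical MDS sketch (my own illustration; the data are hypothetical): double-center the squared-distance matrix and embed with the top eigenvectors of the resulting Gram matrix:

      X0 = randn(20, 5);                        % hypothetical data in R^5
      D2 = squareform(pdist(X0)).^2;            % squared pairwise distances
      n = size(D2, 1);
      J = eye(n) - ones(n)/n;                   % double-centering matrix
      B = -0.5 * J * D2 * J;                    % Gram matrix of centered data
      [V, E] = eig((B + B')/2);
      [e, ord] = sort(diag(E), 'descend');      % largest eigenvalues first
      Y = V(:, ord(1:2)) * diag(sqrt(e(1:2)));  % 2-dimensional embedding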
10/08/2012, Mon Lecture 6: Random Projections and Almost Isometry: Johnson-Lindenstrauss Lemma
    [Reference]:
  • [Dasgupta99]: an elementary proof of the Johnson-Lindenstrauss Lemma (a quick empirical check is sketched after this list).
  • [Achlioptas01]: easy-to-operate random projections in database search.
  • [Indyk98]: random projections for approximate nearest neighbors.
  • [Jones11]: randomized approximate nearest neighbors.
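    [Matlab sketch]:
  • A quick empirical check of the Johnson-Lindenstrauss Lemma (my own toy example, using pdist from the Statistics Toolbox): a Gaussian random projection to roughly k = 4*log(n)/eps^2 dimensions nearly preserves all pairwise distances:

      n = 50; d = 1000; eps0 = 0.2;
      k = ceil(4*log(n)/eps0^2);         % target dimension from the JL bound
      X = randn(n, d);                   % n points in R^d
      R = randn(d, k)/sqrt(k);           % Gaussian random projection
      Y = X*R;                           % projected points in R^k
      ratio = pdist(Y)./pdist(X);        % pairwise distance distortions
      fprintf('distance ratios in [%.3f, %.3f]\n', min(ratio), max(ratio));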
Y.Y. Y.Y.
10/11/2012, Thu Lecture 7: Robust and Sparse PCA: SDP Extensions
Y.Y. Y.Y.
10/12/2012, Fri Seminar: Topological Landscape of Complex Networks
  • Speaker: Yuan Yao (PKU)
  • Location: 1114 Science Building 1st (理科一号楼 1114)
  • Time: 10-11am, Friday 10/12
    Abstract: Topological landscape is introduced for networks with functions defined on the nodes. By extending the notion of gradient flows to the network setting, critical nodes of different indices are defined. This leads to a concise and hierarchical representation of the network. Persistent homology from computational topology is used to design efficient algorithms for performing such analysis. Applications to some examples in social and biological networks are demonstrated, which show that critical nodes carry important information about structures and dynamics of such networks. This is joint work with Weinan E, Jianfeng Lu, et al.
10/15/2012, Monday Presentation of the first project
    [Homework 4]:
  • Homework 4 [pdf]. Deadline: 10/22/2012, Monday. (problems marked by * are optional)
10/16/2012, Tuesday Seminar: Learning Theory for Kernel and Metric Learning
  • Speaker: Prof. Yiming Ying (University of Exeter, UK)
  • Location: 1st Science Building, Rm 1560
  • Time: 3-4pm, Tuesday 10/16
    Abstract: The performance of many machine learning algorithms largely depends on the data representation via the choice of kernel function or distance metric. Hence, one central issue is the problem of learning a kernel and metric from data. In this talk, I will present our work on theoretical analysis of kernel learning methods, and also present our recent results on the analysis of metric and similarity learning.
    Bio: Dr. Yiming Ying received his B.S. degree in mathematics from Hangzhou University, Hangzhou, China, in 1997, and his Ph.D. degree in mathematics from Zhejiang University, Hangzhou, China, in 2002. Currently he is a Lecturer (Assistant Professor) in Computer Science in the School of Engineering, Computing and Mathematics at the University of Exeter, United Kingdom. His research interests include machine learning, learning theory, optimization, probabilistic graphical models, and applications to computer vision, bioinformatics, and multimedia data analysis.
10/18/2012, Thursday Seminar: Multi-component models for object detection
  • Speaker: Dr. Chunhui Gu (Google)
  • Location: 1st Science Building, Rm 1114
  • Time: 2-3pm, Thursday 10/18
    Abstract: In this talk, I will present a multi-component approach for object detection. Rather than attempting to represent an object category with a monolithic model, or pre-defining a reduced set of aspects, we form visual clusters from the data that are tight in appearance and configuration spaces. We train individual classifiers for each component, and then learn a second classifier that operates at the category level by aggregating responses from multiple components. In order to reduce computation cost during detection, we adopt the idea of object window selection, and our segmentation-based selection mechanism produces fewer than 500 windows per image while preserving high object recall. When compared to the leading methods on the challenging PASCAL VOC 2010 dataset, our multi-component approach obtains highly competitive results. Furthermore, unlike monolithic detection methods, our approach allows the transfer of finer-grained semantic information from the components, such as keypoint location and segmentation masks.
    Bio: Chunhui Gu's research focuses on computer vision and machine learning, specifically object detection and segmentation. He joined Google in January 2012 and works on applying computer vision techniques to various Google products. Before that, he received his Ph.D. in Electrical Engineering and Computer Sciences from UC Berkeley in 2012, and his bachelor's degree in Electrical Engineering from the California Institute of Technology in 2006.
10/22/2012, Monday Lecture 8: Random Projections and Compressed Sensing (Chapter 3.4)
    [Homework 5]:
  • Homework 5 [pdf]. Deadline: 10/29/2012, Monday. (problems marked by * are optional)
10/25/2012, Thursday Lecture 9: MDS with uncertainty: SDP embedding [Chapter 4.5-4.7]
    [Reference]:
  • [Ye06]: a semidefinite programming (SDP) approach to MDS with missing values (Sensor Network Localization).
  • [MVU]: another use of SDP in manifold learning, Maximum Variance Unfolding (MVU).
  • [Ye11]: Yinyu Ye's talk at the Fields Institute (2011) on universal rigidity and SDP, with some state-of-the-art open problems.
    [Matlab]:
  • SNLSDP: SDP solver for the SNL problem with up to 200 sensors
  • DISCO: SDP solver for the anchor-free SNL problem with a few thousand sensors
10/29/2012, Mon Lecture 10: Manifold Learning (Nonlinear Dimensionality Reduction): ISOMAP vs. LLE [my earlier slides]
    [Reference]:
  • [ISOMAP]: a Science (2000) paper on MDS with geodesic distances (graph shortest-path distances); a rough pipeline is sketched below;
  • [LLE]: a Science (2000) paper on Locally Linear Embedding, i.e., local PCA (on the complement) with global alignment.
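    [Matlab sketch]:
  • A rough ISOMAP pipeline on a Swiss-roll sample (my own condensation, assuming the kNN graph is connected; pdist/squareform are from the Statistics Toolbox): approximate geodesics by graph shortest paths, then run classical MDS on them:

      N = 500;
      t = 3*pi/2*(1 + 2*rand(N,1)); h = 10*rand(N,1);
      X = [t.*cos(t), h, t.*sin(t)];               % Swiss-roll point cloud in R^3
      k = 8;
      D = squareform(pdist(X));                    % Euclidean distances
      [~, idx] = sort(D, 2);
      W = inf(N); W(1:N+1:end) = 0;
      for i = 1:N                                  % kNN graph edge lengths
          W(i, idx(i,2:k+1)) = D(i, idx(i,2:k+1));
      end
      W = min(W, W');                              % symmetrize the graph
      for m = 1:N                                  % Floyd-Warshall shortest paths
          W = min(W, bsxfun(@plus, W(:,m), W(m,:)));
      end
      J = eye(N) - ones(N)/N;                      % classical MDS on geodesics
      B = -0.5 * J * (W.^2) * J;
      [V, E] = eigs((B + B')/2, 2);                % top 2 eigenpairs
      Y = V * sqrt(E);                             % 2-D embedding coordinates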
Sun, Jian (Tsinghua)
11/05/2012, Mon Lecture 11: Other Manifold Learning Techniques: Laplacian, Diffusion, Hessian, LTSA [slides]
    [Matlab]:
  • hlle.m : Hessian LLE (eigenmap) Matlab code.
  • mani.m : Todd Wittman's manifold learning demo (MDS, ISOMAP, LLE, Hessian LLE, Laplacian, Diffusion, LTSA)
    [Mini-Project 2]:
  • Mini-Project 2 [pdf]. Deadline: 11/19/2012, Monday.
  • Mark your name and student ID at the top of your report.
  • Submit your report with source code (as an appendix in the report or as a .zip file).
11/12/2012, Mon Lecture 12: Vector Laplacian and Diffusion Map [lecture notes]
    [Reference]:
  • [VDM]: Singer and Wu, Vector diffusion maps and the connection Laplacian, Comm. Pure Appl. Math. 65(8):1067-1144, 2012;
  • [ABT]: Aswani, Bickel and Tomlin, Regression on manifolds: Estimation of the exterior derivative, Ann. Stat. 39(1):48-81, 2011;
  • [MALLER]: Cheng and Wu, Local linear regression on manifolds and its geometric interpretation, arxiv.org/abs/1201.0327.
Yuwei Jiang; Chendi Huang
11/19/2012, Mon Lecture 13: Random Walk on Graphs: Perron-Frobenius Vector and PageRank
    [Reference]:
  • [PageRank]: a book on the mathematics of Google's PageRank; Perron-Frobenius theory is on pp. 168-174 (a power-iteration sketch follows this list).
  • [Meyer]: Chapter 8 Perron-Frobenius Theory for Nonnegative Matrices.
  • [BrinPage98] Have you ever read the original paper by Brin-Page on PageRank?
  • [Kleinberg99] Another important class of ranking of authorities and hubs, based on singular value decomposition of link matrix, by Jon Kleinberg.
  • [Chang08]: Chang-Pearson-Zhang generalizes this to nonnegative tensors. Can you develop it into an application as "PageRank" on hypergraphs?
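    [Matlab sketch]:
  • A minimal PageRank power iteration (the standard algorithm; the 4-node link matrix is my own toy example): PageRank is the Perron-Frobenius left eigenvector of the damped random-walk matrix:

      A = [0 1 1 0; 0 0 1 0; 1 0 0 1; 0 0 1 0];  % A(i,j)=1 if page i links to page j
      n = size(A, 1); alpha = 0.85;              % damping factor
      P = bsxfun(@rdivide, A, sum(A, 2));        % row-stochastic transition matrix
      G = alpha*P + (1-alpha)*ones(n)/n;         % add uniform teleportation
      pr = ones(1, n)/n;                         % start from the uniform distribution
      for t = 1:100
          pr = pr * G;                           % power iteration toward Perron vector
      end
      disp(pr);                                  % stationary distribution = PageRank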
Y.Y.
11/22/2012, Thu Lecture 14: Random Walk on Graphs: Fiedler Theory and Cheeger inequality
Y.Y.
11/26/2012, Mon Lecture 15: Random Walk on Graphs: Lumpability (metastability) and MNcut
    [Reference]:
  • [KemenySnell76]: Chapters 6.3 and 6.4 give the definitions of lumpability.
  • [MeilaShi01]: relationship between lumpability and multiple spectral clustering (MNcut).
  • [ShiMalik00]: spectral clustering and image segmentation (Ncut); a minimal sketch follows this list.
  • [Luxburg07]: a tutorial on spectral clustering.
  • [Hochbaum10]: shows that a variant of Ncut without the 1/2 volume constraint is, surprisingly, of polynomial complexity, although the original problem is NP-hard.
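    [Matlab sketch]:
  • A minimal normalized spectral clustering sketch in the spirit of [ShiMalik00] and [Luxburg07] (my own toy data; kmeans is from the Statistics Toolbox): embed with the top eigenvectors of the random-walk matrix D^{-1}W, i.e. the bottom eigenvectors of the random-walk Laplacian, then cluster:

      X = [randn(50,2); randn(50,2) + 5];         % two well-separated toy clusters
      W = exp(-squareform(pdist(X)).^2 / 2);      % Gaussian similarity matrix
      P = bsxfun(@rdivide, W, sum(W, 2));         % random-walk matrix D^{-1} W
      [V, E] = eig(P);
      [~, ord] = sort(real(diag(E)), 'descend');  % eigenvalues of P are real here
      U = real(V(:, ord(1:2)));                   % top 2 eigenvectors
      labels = kmeans(U, 2);                      % k-means in the spectral embedding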
Y.Y.
11/28/2012, Wed Seminar: Laplacian and Commute Time of Directed Graphs [lecture note.pdf]
Tianqi Wu (Tsinghua) Foling Zou
Liying Li
12/3/2012, Mon Lecture 16: Diffusion Map and Diffusion Distance
Y.Y.
12/6/2012, Thu Lecture 17: Commute Time Map and Distance
    [Reference]:
  • [QiuHancock07]: diffusion distance vs. commute time distance, in applications of spectral embedding and clustering.
  • [Fouss06-CommuteDistance]: shows applications of the average commute time distance, computed from the pseudoinverse of the graph Laplacian, in measuring stochastic similarity between nodes of large graphs (see the Matlab sketch below).
  • [RadLuxHei09]: shows, however, that in large geometric random graphs the commute distance converges to something meaningless as a similarity measure!
  • [GobelJagers74-CommuteDistance]: shows that the average commute time derived from mean first passage times is in fact a Euclidean distance metric.
  • [KleinRandic93]: shows that effective resistance is a distance, which is, up to a constant, equivalent to the average commute time distance.
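    [Matlab sketch]:
  • Commute time from the pseudoinverse of the graph Laplacian, as in [Fouss06-CommuteDistance] (the standard formula C(i,j) = vol(G)*(L+(i,i) + L+(j,j) - 2*L+(i,j)); the 4-node graph is my own toy example):

      W = [0 1 1 0; 1 0 1 0; 1 1 0 1; 0 0 1 0];     % toy adjacency matrix
      L = diag(sum(W,2)) - W;                       % graph Laplacian
      Lp = pinv(L);                                 % Moore-Penrose pseudoinverse
      vol = sum(W(:));                              % graph volume (sum of degrees)
      dLp = diag(Lp);
      C = vol * (bsxfun(@plus, dLp, dLp') - 2*Lp);  % commute time matrix
      disp(C);                                      % sqrt(C) is a Euclidean metric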
Y.Y.
12/10/2012, Mon Lecture 18: Introduction to Topic Models [slides]
    [Homework 6]:
  • Homework 6 [pdf]. Deadline: 12/17/2012, Monday. (problems marked by * are optional)
Xin Zhao (PKU)
12/17/2012, Mon Lecture 19: Smoothed Sparse Optimization and Parallel LASSO
    [Reference]:
  • As the slides contain some unpublished content, they are withheld from the public at the speaker's request. Please email me if you attended the class and would like the slides.
Wotao Yin (Rice)
12/20/2012, Thu Lecture 20: Project 2 presentation
12/24/2012, Mon Lecture 21: Final Project
    [Final Project]:
  • Final Project [pdf]. Deadline: 1/3/2013, Thursday. You may send me the report after the presentation on Jan 3rd, 2013.
1/3/2013, Thu Lecture 22: Final Project Presentation

Reference


by YAO, Yuan.