Speaker: Dr. Xiuyuan Cheng (Duke University)
Venue: Online (Tencent Meeting)
Abstract: Graph Laplacians are widely used in today's data analysis and machine learning applications. For example, classical spectral clustering and spectral embedding methods use the leading eigenvectors of graph Laplacian matrices to identify clusters in the data and to produce dimension-reduced features from high-dimensional input data. An important class of graph Laplacian methods uses kernelized affinities: the weight on edge (i,j) is given by K(x_i, x_j), where K(x,y) is a kernel function and the x_i's are data samples. This talk starts with preliminaries on kernelized graph Laplacians, reviewing their theoretical advantage in recovering the intrinsic data geometry under a manifold data setting, as well as several limitations of the approach. In particular, we explain the difficulty of truncating the spectrum and/or selecting useful eigenvectors in practice, for example when unbalanced clusters are present in the data. Another difficulty, inherent in kernel methods generally, is the choice of the kernel bandwidth parameter. We then introduce two recent works that aim to overcome these issues. First, the so-called spectral embedding norm, defined at each node i, sums the squared entries at node i of a possibly large number of leading eigenvectors of the graph Laplacian; a matrix perturbation analysis shows it to be more stable than individual eigenvectors. Second, for the existing algorithm that computes adaptive bandwidths from k-nearest-neighbor (kNN) distances, known as self-tuning, we analyze convergence to the manifold operator and show that kNN self-tuning provably reduces the variance error. As a by-product, the analysis reveals an interesting distinction between the (fixed-bandwidth) kernel density estimator and the kNN estimator in the relative error of the estimated density. We also introduce an application of kNN estimators to detecting different biological states in single-cell RNA sequencing data.
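The two ingredients mentioned in the abstract can be illustrated with a minimal NumPy sketch: a graph Laplacian built from kernelized affinities with kNN self-tuned bandwidths, and the spectral embedding norm computed from its leading eigenvectors. The Gaussian kernel form, the choice k=7, the number of eigenvectors m, and the toy data are illustrative assumptions, not the speaker's exact construction.

```python
import numpy as np

def self_tuned_laplacian(X, k=7):
    """Symmetric normalized graph Laplacian with kNN self-tuned bandwidths.

    Affinity: W_ij = exp(-||x_i - x_j||^2 / (sigma_i * sigma_j)), where
    sigma_i is the distance from x_i to its k-th nearest neighbor
    (the self-tuning bandwidth rule referenced in the abstract).
    """
    # Pairwise squared Euclidean distances.
    D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # sigma_i: distance to the k-th nearest neighbor (column 0 is the point itself).
    sigma = np.sqrt(np.sort(D2, axis=1)[:, k])
    W = np.exp(-D2 / (sigma[:, None] * sigma[None, :]))
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    # Symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
    return np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

def spectral_embedding_norm(L, m):
    """At each node i, the sum of squared entries of the m leading
    (smallest-eigenvalue) eigenvectors of L at node i."""
    _, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return np.sum(vecs[:, :m] ** 2, axis=1)

# Toy data: two Gaussian clusters of unequal size (an "unbalanced" setting).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (40, 2)),
               rng.normal(3.0, 0.3, (10, 2))])
L = self_tuned_laplacian(X, k=7)
norms = spectral_embedding_norm(L, m=2)
print(norms.shape)  # (50,)
```

Since the eigenvectors form an orthonormal basis, each node's norm lies in [0, 1]; aggregating m leading eigenvectors this way is what makes the quantity less sensitive to the ordering or selection of individual eigenvectors.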
Finally, we conclude the talk with challenges and opportunities for kernel and spectral methods and, more broadly, for incorporating data-geometric properties into learning algorithms.
Meeting ID: 358 746 637