With the rapid development of modern technology, massive amounts of data with complex patterns are being generated. Gaussian process models, which can easily fit nonlinearity in data, are becoming increasingly popular. Often only a few features in a dataset are important, or active. However, unlike in classical linear models, it is challenging to identify active variables in Gaussian process models. One of the most commonly used methods for variable selection in Gaussian process models is automatic relevance determination, which is known to be open-ended: there is no rule of thumb for determining the threshold for dropping features, which makes variable selection in Gaussian process models ambiguous. In this work, we propose two variable selection algorithms for Gaussian process models that use artificial nuisance columns as a baseline for identifying the active features. Moreover, the proposed methods work for both regression and classification problems. The algorithms are demonstrated using comprehensive simulation experiments and an application to multi-subject electroencephalography data from a study of the alcoholic levels of experimental subjects.
Keywords: Automatic relevance determination, Electroencephalography data, Gaussian process, Principal component analysis, Variable selection
Note:
This seminar is part of the joint colloquium series with the University of Connecticut. The Zoom link for this seminar is https://umass-amherst.zoom.us/j/93295628670.