Information Theoretic Learning
Source of Funding: NSF, ONR
Information Theoretic Learning (ITL) was initiated in the late 90s at CNEL and has been a centerpiece of the research effort. ITL uses descriptors from information theory (entropy and divergences), estimated directly from the data, in place of the conventional statistical descriptors of variance and covariance. ITL can be used in the adaptation of linear or nonlinear filters and also in unsupervised and supervised machine learning applications.
Visit the ITL wiki for more information, including slides and code.
Correntropy Dependence Measure
Correntropy was defined as a generalization of the correlation of random processes. The name indicates that its mean value over the lags yields the information potential, the argument of the logarithm in quadratic Renyi's entropy. As can be expected, correntropy includes not only second-order but also higher-order moment information about the random variables. From correntropy we can derive the centered correntropy and the correntropy coefficient, which are the counterparts of the covariance and the correlation coefficient, respectively. A novel parametric correntropy is defined as the correntropy between a shifted and a scaled random variable. The supremum of the parametric correntropy coefficient over all possible shifts and scales gives rise to a new dependence measure with very interesting properties: it leads to new tests of independence and is also able to quantify the degree of dependence among random variables.
- Rao M., Xu J., Seth S., Chen Y., Tagare M., Principe J., Correntropy Dependence Measure, submitted to J. Machine Learning Research, 2008.
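As an illustration of these quantities, here is a minimal sample-estimator sketch (assuming a Gaussian kernel; the kernel size `sigma` is a free parameter the user must choose, and the function names are ours, not from the paper):

```python
import numpy as np

def gauss(d, sigma):
    # Gaussian kernel evaluated on differences d
    return np.exp(-d**2 / (2 * sigma**2))

def correntropy(x, y, sigma=1.0):
    # V(X,Y) = E[k(X - Y)], estimated over paired samples
    return np.mean(gauss(x - y, sigma))

def centered_correntropy(x, y, sigma=1.0):
    # U(X,Y) = E_XY[k(X - Y)] - E_X E_Y[k(X - Y)],
    # the correntropy analogue of the covariance
    cross = np.mean(gauss(x[:, None] - y[None, :], sigma))
    return correntropy(x, y, sigma) - cross

def correntropy_coefficient(x, y, sigma=1.0):
    # analogue of the correlation coefficient: 1 for identical
    # variables, near 0 for independent ones
    u = centered_correntropy(x, y, sigma)
    return u / np.sqrt(centered_correntropy(x, x, sigma) *
                       centered_correntropy(y, y, sigma))
```

The parametric version applies a shift and scale to one argument before estimation; the dependence measure then takes the supremum of the coefficient over those parameters.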
Nonlinearity tests based on Correntropy
The inclusion of second- and higher-order information in correntropy makes it particularly useful in distinguishing between linear and nonlinear signal sources. We have created a simple procedure for nonlinearity testing based on the correntropy spectral density (CSD) and surrogates. If the examined time series was created by linear dynamics, the underlying distribution of its CSD and that of its surrogates should be the same. On the other hand, if the two underlying distributions are different, we deduce that the time series contains nonlinear structure not present in its surrogates. Normalizing the CSD by its total value converts correntropy per frequency into a pdf and allows the use of the two-sample Kolmogorov-Smirnov goodness-of-fit test.
- Aysegul Gunduz, Jose Principe, Correntropy as a Novel Measure for Nonlinearity Tests (submitted)
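A compact sketch of the procedure (phase-randomized surrogates, a normalized CSD, and the Kolmogorov-Smirnov distance between the two normalized spectra; the lag count and kernel size below are illustrative choices, not the paper's settings):

```python
import numpy as np

def autocorrentropy(x, max_lag, sigma=1.0):
    # centered correntropy of the series with its lagged copies
    n = len(x)
    mean_k = np.mean(np.exp(-(x[:, None] - x[None, :])**2 / (2*sigma**2)))
    return np.array([np.mean(np.exp(-(x[m:] - x[:n-m])**2 / (2*sigma**2)))
                     - mean_k for m in range(max_lag)])

def csd_pdf(x, max_lag=128, sigma=1.0):
    # correntropy spectral density, normalized to sum to one
    # (correntropy per frequency as a pdf)
    s = np.abs(np.fft.rfft(autocorrentropy(x, max_lag, sigma)))
    return s / s.sum()

def phase_surrogate(x, rng):
    # surrogate with the same power spectrum but randomized phases,
    # which destroys any nonlinear structure
    X = np.fft.rfft(x)
    phases = rng.uniform(0, 2*np.pi, len(X))
    phases[0] = phases[-1] = 0.0   # keep DC and Nyquist bins real
    return np.fft.irfft(np.abs(X) * np.exp(1j*phases), n=len(x))

def ks_distance(p, q):
    # two-sample Kolmogorov-Smirnov distance between the two spectra
    return np.max(np.abs(np.cumsum(p) - np.cumsum(q)))
```

A large `ks_distance` between the CSD pdf of the series and those of its surrogates is then evidence of nonlinear structure.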
Pitch Detection Based on Correntropy
Another very interesting property of correntropy is its higher temporal resolution for similarity, a consequence of the higher-order moment information, which is controlled by the kernel size. We were able to show that correntropy can be used to advantage in pitch detection algorithms based on cochlear filters and the correlogram, and in general in any application that requires a time-bandwidth product better than conventional nonparametric spectral estimation.
- Jianwu Xu and Jose C. Principe, A Novel Pitch Determination Algorithm Based on Generalized Correlation Function (accepted)
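The idea can be sketched on a synthetic tone (the kernel size and lag search range here are illustrative): the correntropy function over lags peaks sharply at the pitch period.

```python
import numpy as np

def correntropy_function(x, max_lag, sigma=0.5):
    # generalized correlation over lags: V[m] = E[k(x_n - x_{n-m})]
    n = len(x)
    return np.array([np.mean(np.exp(-(x[m:] - x[:n-m])**2 / (2*sigma**2)))
                     for m in range(1, max_lag + 1)])

def pitch_period(x, lo, hi, sigma=0.5):
    # pitch period = lag of the correntropy peak in the candidate range
    v = correntropy_function(x, hi, sigma)
    return lo + int(np.argmax(v[lo-1:hi]))

# a 200 Hz tone sampled at 8 kHz has a 40-sample period
fs, f0 = 8000, 200
x = np.sin(2 * np.pi * f0 * np.arange(1600) / fs)
```

Because the Gaussian kernel discounts sample pairs more than a kernel-size apart, the peak at the true period is much narrower than the corresponding autocorrelation peak.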
Nonlinear Granger Causality based on correntropy
Correntropy defines an RKHS (reproducing kernel Hilbert space) nonlinearly related to the data space. Therefore, Wiener filters in the correntropy RKHS are nonlinear filters in the data space. We have applied this idea to derive a nonlinear causality test based on Granger's causality in the correntropy RKHS. Preliminary results show that, for certain time series, the method outperforms its linear counterpart.
- Il Park, Jose C. Principe, Correntropy based Granger Causality, in Proc. ICASSP, April 2008.
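A rough sketch of such a test (not the paper's exact formulation: kernel ridge regression over delay embeddings stands in for the Wiener filter in the RKHS, and all names and parameter values are illustrative) compares prediction errors with and without the candidate cause:

```python
import numpy as np

def rbf(A, B, sigma=1.0):
    # Gaussian Gram matrix between the rows of embedding matrices A, B
    d2 = ((A[:, None, :] - B[None, :, :])**2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

def pred_error(Z, target, sigma=1.0, lam=1.0):
    # in-sample residual power of a (regularized) kernel regressor
    K = rbf(Z, Z, sigma)
    alpha = np.linalg.solve(K + lam * np.eye(len(Z)), target)
    return np.mean((target - K @ alpha)**2)

def granger_index(x, y, p=3, sigma=1.0):
    # positive when the past of y improves the prediction of x
    n = len(x)
    X = np.column_stack([x[p-1-i:n-1-i] for i in range(p)])
    Y = np.column_stack([y[p-1-i:n-1-i] for i in range(p)])
    t = x[p:]
    return np.log(pred_error(X, t, sigma) /
                  pred_error(np.hstack([X, Y]), t, sigma))
```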
Compressive sampling based on correntropy
Correntropy induces a nonlinear metric in the sample space that is very interesting because it changes from an L2 metric to L1, and finally to L0, depending on the distance between the samples. This flexibility can be exploited to approximate the L0-norm solution required in compressive sampling.
- Seth S., Principe J., Compressed Signal Reconstruction Using the Correntropy Induced Metric, in Proc. ICASSP 2008, Las Vegas.
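The behavior of the correntropy induced metric (CIM) is easy to verify numerically (Gaussian kernel assumed; `sigma` sets where the transitions occur): it is proportional to the L2 distance for nearby points, and it saturates for distant ones, so large components simply count, as in an L0 norm.

```python
import numpy as np

def cim(x, y, sigma=1.0):
    # correntropy induced metric between two sample vectors
    k = np.exp(-(x - y)**2 / (2 * sigma**2))
    return np.sqrt(np.mean(1.0 - k))

# near the origin CIM behaves like a scaled L2 distance ...
d = 1e-3
near = cim(np.array([0.0]), np.array([d]))

# ... while far away it saturates: components much larger than sigma
# each contribute the same amount regardless of magnitude, so CIM^2
# effectively counts the nonzero components (L0-like behavior)
x = np.array([0.0, 0.0, 0.0, 5.0, 50.0])
far = cim(x, np.zeros(5))
```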
ITL Feature Extraction for mine recognition
ITL descriptors of divergence (the Euclidean and Cauchy-Schwarz distances) are particularly useful for estimating distances in probability spaces. This project exploits this advantage to extract features in sonar (mine recognition). Two methods are being compared: one uses the natural metric of the ITL RKHS to create infinite-capacity associative memories; the other uses the Cauchy-Schwarz quadratic mutual information to project the image snippets into a low-dimensional space, where the projections are used as features for each image.
Matlab scripts for the "Information Theoretic Shape Matching" paper are available.
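A sketch of the Cauchy-Schwarz divergence estimator that underlies such comparisons (one-dimensional data and Gaussian Parzen kernels of width `sigma`; the product of two such kernels integrates to a sqrt(2)-widened Gaussian, hence the kernel below):

```python
import numpy as np

def cross_ip(a, b, sigma=1.0):
    # cross information potential: Parzen estimate of the integral of
    # the product of the two densities (two width-sigma kernels
    # convolve into one of width sigma*sqrt(2))
    d2 = (a[:, None] - b[None, :])**2
    return np.mean(np.exp(-d2 / (4 * sigma**2))) / (2 * sigma * np.sqrt(np.pi))

def cs_divergence(x, y, sigma=1.0):
    # D_CS = -log( IP(x,y)^2 / (IP(x,x) IP(y,y)) ); nonnegative by the
    # Cauchy-Schwarz inequality, zero when the estimated densities match
    return -np.log(cross_ip(x, y, sigma)**2 /
                   (cross_ip(x, x, sigma) * cross_ip(y, y, sigma)))
```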
The Principle of Relevant Entropy
We have created a multi-objective cost function for unsupervised learning that yields clustering, principal curves, and vector quantization as special cases corresponding to particular values of a parameter. One term is the entropy of the data; the other is the Cauchy-Schwarz distance between the original data set and the processed data. As a weighted combination of these two terms, the cost is completely specified by just two parameters: one defining the goal or task of learning and the other the resolution of the analysis. This new framework has striking similarities with the popular information bottleneck method. Further, there is a fast fixed-point algorithm that implements all of these cases, thus avoiding step-size issues altogether.
- Sudhir Rao, Allan de Medeiros Martins, Weifeng Liu, Jose C. Principe, Information Theoretic Mean Shift Algorithm, in Proc. Intl. Workshop on Neural Networks for Signal Processing, Sept. 2006.
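The entropy term of this cost has a familiar fixed-point update: with the Cauchy-Schwarz anchor term omitted, lowering the Renyi quadratic entropy by moving each sample to the kernel-weighted mean of the current set is the Gaussian blurring mean shift update. A sketch of that component alone (the full algorithm adds the weighted attraction to the original data, which is not shown here):

```python
import numpy as np

def renyi_h2(X, sigma):
    # quadratic Renyi entropy estimate: -log of the information
    # potential (Parzen kernels of width sigma convolve pairwise)
    W = np.exp(-(X[:, None] - X[None, :])**2 / (4 * sigma**2))
    return -np.log(np.mean(W) / (2 * sigma * np.sqrt(np.pi)))

def gbms_step(X, sigma):
    # Gaussian blurring mean shift: each point moves to the kernel-
    # weighted mean of the current point set, lowering the entropy
    W = np.exp(-(X[:, None] - X[None, :])**2 / (2 * sigma**2))
    return (W @ X) / W.sum(axis=1)

rng = np.random.default_rng(4)
X = np.concatenate([rng.normal(0.0, 0.1, 50), rng.normal(5.0, 0.1, 50)])
Z = X.copy()
for _ in range(50):
    Z = gbms_step(Z, 0.5)   # the two clusters collapse to two points
```

With the anchor term restored and the weighting parameter swept, the same style of fixed point recovers clustering (as above), principal curves, and vector quantization.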
On-line learning in RKHS
Funding Source: NSF
On-line learning in RKHS provides a new class of nonlinear filters that are adapted one sample at a time and approximate the required nonlinearity incrementally. When the kernel is Gaussian, they are growing RBF networks whose weights are related to the error at each sample. Unlike neural networks, this class of nonlinear filters does not suffer from local minima. They have many potential applications in nonlinear signal processing, and the methods can also be applied to large machine learning problems.
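A minimal sketch of such a filter, the kernel LMS with a Gaussian kernel (scalar inputs for brevity; the step size `eta` and kernel size `sigma` are illustrative choices): each incoming sample becomes a new RBF center whose weight is the scaled prediction error.

```python
import numpy as np

class KLMS:
    # growing Gaussian RBF network adapted one sample at a time
    def __init__(self, eta=0.5, sigma=1.0):
        self.eta, self.sigma = eta, sigma
        self.centers, self.coefs = [], []

    def predict(self, u):
        if not self.centers:
            return 0.0
        c = np.array(self.centers)
        k = np.exp(-(u - c)**2 / (2 * self.sigma**2))
        return float(np.dot(self.coefs, k))

    def update(self, u, d):
        # allocate a new center at u, weighted by the prediction error
        e = d - self.predict(u)
        self.centers.append(u)
        self.coefs.append(self.eta * e)
        return e

# identify a static nonlinearity from a stream of samples
rng = np.random.default_rng(5)
model = KLMS()
errors = [abs(model.update(u, np.tanh(2 * u)))
          for u in rng.uniform(-2, 2, 400)]
```

Note that the only free design choices are the step size and the kernel size; there is no explicit regularization term in the update.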
On-line KLMS is intrinsically regularized
Regularization seems always to be required in kernel-space systems. However, we proved that KLMS is well posed in the sense of Hadamard. Since the KLMS algorithm always adapts the filter weights in the direction of the gradient, the solution never leaves the data manifold; hence KLMS does not need explicit regularization. The step size controls how regularized the solution is.
- Weifeng Liu, P. P. Pokharel, Jose C. Principe, The Kernel Least Mean Square Algorithm, IEEE Trans. on Signal Processing, Vol. 56, No. 2, pp. 543-554, Feb. 2008.
Nonlinear adaptive filters in RKHS
The beauty of implementing linear adaptive filters in RKHS is that they are nonlinear in the input space. We propose the class of Kernel Affine Projection Algorithms (KAPA) as a general framework for on-line algorithms in RKHS. We have recently been able to develop a kernelized version of the extended RLS algorithm (Ex-KRLS) for tracking, which has a scalar state in the RKHS.
- Liu W., Principe J., Kernel Affine Projection Algorithms, EURASIP J. on Advances in Signal Processing, Special Issue on Machine Learning for Signal Processing, 2008.
- Liu W., Principe J., Extended Recursive Least Squares in RKHS, in Proc. First Workshop on Cognitive Signal Processing, Santorini, Greece, 2008.
Active Learning Strategies
One of the issues with on-line learning filters is that the filter grows with each new sample. Intuitively, once sufficiently many samples have been used, it is wasteful to create a new center for each new sample because of redundancy. How to choose the relevant samples is a nontrivial matter. We are investigating a class of algorithms based on an instantaneous information cost to decide whether each new sample should be incorporated into the filter. The selection criterion can be estimated exactly using Gaussian process theory.
- Liu W., Principe J., Active Online Gaussian Process Regression Based on Conditional Information, submitted to NIPS 2008.
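A simplified stand-in for this kind of criterion (Gaussian-process predictive variance as a proxy for the conditional information a candidate sample carries; the class name, thresholds, and parameter values are all illustrative, not the paper's): a sample is admitted as a new center only when the current dictionary cannot already predict it confidently.

```python
import numpy as np

class InfoSelector:
    # keep a sample only if its GP predictive variance, a proxy for
    # the conditional information it would add, exceeds a threshold
    def __init__(self, sigma=1.0, noise=1e-2, thresh=0.05):
        self.sigma, self.noise, self.thresh = sigma, noise, thresh
        self.centers = []

    def predictive_variance(self, x):
        # prior variance k(x,x) = 1 minus the variance explained by
        # the current dictionary (plus the observation noise)
        if not self.centers:
            return 1.0 + self.noise
        c = np.array(self.centers)
        k = np.exp(-(x - c)**2 / (2 * self.sigma**2))
        K = (np.exp(-np.subtract.outer(c, c)**2 / (2 * self.sigma**2))
             + self.noise * np.eye(len(c)))
        return 1.0 + self.noise - k @ np.linalg.solve(K, k)

    def consider(self, x):
        # admit x into the dictionary only if it is informative enough
        if self.predictive_variance(x) > self.thresh:
            self.centers.append(x)
            return True
        return False

rng = np.random.default_rng(6)
sel = InfoSelector()
admitted = sum(sel.consider(float(u)) for u in rng.uniform(-3, 3, 300))
```

On the stream above, only a small fraction of the 300 samples is retained: once the input space is covered at the resolution set by the kernel size, new samples are redundant and are rejected.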