Wednesday, November 21, 2012

A Tutorial Introduction to Reproducing Kernel Hilbert Spaces - Part I

In this post we will study different spaces encountered in Linear Algebra and Functional Analysis. Our final goal is to understand what is meant by Reproducing Kernel Hilbert Spaces or RKHS which have a wide range of applications in machine learning and related fields. We begin with vector spaces, which any engineering undergraduate student is familiar with.

Vector Space

In simple terms, a vector space is a set whose elements have two characteristic properties:

1) Two elements can be added to obtain an element in the same set.
2) A given element can be multiplied by a scalar to obtain an element in the same set.

Elements of such a set are dubbed vectors. Perhaps, the simplest example of a vector space is $\mathbb{R}^3$ with elements of $\mathbb{R}$ as scalars. Addition and multiplication in this vector space can be demonstrated as follows.

\[\text{Addition:}\hspace{1cm} \begin{bmatrix} 1 \\ 2 \\ 3 \\ \end{bmatrix} + \begin{bmatrix} 4 \\ 5 \\ 6 \\ \end{bmatrix} = \begin{bmatrix} 5 \\ 7 \\ 9 \\ \end{bmatrix} \hspace{3cm} \text{Scalar Multiplication:} \hspace{1cm} 0.2 \begin{bmatrix} 9 \\ 8 \\ 7 \\ \end{bmatrix} = \begin{bmatrix} 1.8 \\ 1.6 \\ 1.4 \\ \end{bmatrix} \]

Formally, a vector space over a field $\mathbb{F}$ is a set $\mathcal{V}$ with two binary operations: addition (denoted by $+$) and scalar multiplication (denoted by $.$ or by writing two objects close to each other), that satisfy number of axioms for all $a, b \in \mathbb{F}$ and $\mathbf{u}, \mathbf{v} \in \mathcal{V}$. We omit these axioms here for the sake of brevity. Virtually in all cases $\mathbb{F}$ is either the field of real numbers ($\mathbb{R}$) or complex numbers ($\mathbb{C}$).

As you may have already noted, we use simple letters ($a, b, $ etc.) to denote elements of $\mathbb{F}$, known as scalars and boldface letters ($\mathbf{u}, \mathbf{v}$, etc.) to denote elements of $\mathcal{V}$, known as vectors. Hereafter, we may use $\mathcal{V}$ to denote a vector space without explicit reference to the field or the two operations.

Vector space introduces a very rich structure for a set. However, to perform mathematical analysis on a vector space, we need additional tools on the vector space such as a vector lengths, distance between vectors and angle measures. Note that a plain vector space lacks all these ideas. We shall first consider the notion of length of a vector in a vector space.

Normed Vector Space

Simply put, a normed vector space is a vector space whose vectors have lengths. The length or the norm of a vector $\mathbf{v}$ is denoted by $\|\mathbf{v}\|$. The norm has to satisfy several axioms as formalized by the following definition of a normed vector space.

Definition (Normed Vector Space):
A normed vector space $(\mathcal{V}, \|.\|)$, is a vector space $\mathcal{V}$ over a field $\mathbb{F}$, endowed with a norm, $\|.\| : \mathcal{V} \to \mathbb{R}$ that satisfies following axioms for all $\mathbf{v} \in \mathcal{V}$ and $a \in \mathbb{F}$.
  1. $\|\mathbf{v}\| \ge 0$.
  2. $\|\mathbf{v}\| = 0$ if and only if $\mathbf{v} = \mathbf{0}$.
  3. $\|a\mathbf{v}\| = |a| \|\mathbf{v}\|$.
  4. $\|\mathbf{u} + \mathbf{v}\| \le \|\mathbf{u}\| + \|\mathbf{v}\|$.
Defining a norm on a vector space enables us to measure distance between vectors in the space. Distance between two vectors $\mathbf{u}, \mathbf{v} \in \mathcal{V}$ is equal to the the length of $\mathbf{u} - \mathbf{v}$ 1. Formally, $d(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\|$ where $d(., .)$ is the distance function.

In fact, the distance between points can be defined on any nonempty set which is not necessarily a vector space. A set with a distance measure between its elements is known as a metric space. We now present the formal definition of a metric space.

Metric Space

Definition (Metric Space):
A metric space $(M, d)$ is a set $M$ along with a function $d : M \times M \to \mathbb{R}$ that satisfies following axioms for all $x, y, z \in M$.
  1. $d(x, y) \ge 0$.
  2. $d(x, y) = 0$ if and only if $x = y$.
  3. $d(x, y) = d(y, x)$.
  4. $d(x, z) \le d(x,y) + d(y, z)$.
The function $d$ that satisfies above axioms is known as the metric or the distance function. The last axiom (or property when it's satisfied by a metric), known as the triangle inequality, is perhaps the most interesting property of a metric.

Every normed vector space is a metric space. But the converse is not true; we don't require a vector space structure on a set to define a valid metric on it.

Banach Space

A Banach space is a complete normed vector space. What exactly is completeness? To define completeness, we first need to understand what is meant by a Cauchy sequence.

Cauchy Sequence

A Cauchy sequence is a sequence of elements from a set where nearby elements get increasingly close to each other (see Figure 1). For the above sentence to make sense, we need a way to measure the closeness (or the distance) between two elements of the set. Hence, Cauchy sequences are defined on metric spaces. Below we give the definition of a Cauchy sequence in a normed vector space and a more general metric space. The reader can easily see that the first definition is just a special case of the second.

Definition (Cauchy Sequence):
Normed Vector Space: In a normed vector space $(\mathcal{V}, \|.\|)$ a sequence of vectors $\mathbf{x}_1,\mathbf{x}_2,\mathbf{x}_3,\ldots$ is called a Cauchy sequence if for all $\varepsilon > 0$, there exists an $N \in \mathbb{Z}^+$ such that, for all $n, m > N$, $\|\mathbf{x}_n - \mathbf{x}_m\| < \varepsilon$.

Metric Space: In a metric space $(M, d)$, a sequence $x_1, x_2, x_3, \ldots$ where $x_i \in M$ for $i = 1,2,3,\ldots$ is called a Cauchy sequence if for all $\varepsilon > 0$, there exists an $N \in \mathbb{Z}^+$ such that, for all $n, m > N$, $d(x_n, x_m) < \varepsilon$.

Intuitively, a Cauchy sequence is a sequence that converges or has a limit as $n$ goes to infinity. However, for some spaces and for some Cauchy sequences, this limit might not be an element of the space on which the sequence is defined. Therefore, such a limit does not exist, and the Cauchy sequence does not converge.

A Cauchy sequence on R
Figure 1. A Cauchy sequence on $\mathbb{R}$. As $n$ grows, $x_n$ s become arbitrary close to each other and to the limit of the sequence (1 in this case).


A metric space is called complete if every Cauchy sequence defined on it is convergent, i.e., every Cauchy sequence has a limit which is a point in the same space. It's worth noting again that we must have a metric defined on the space to talk about Cauchy sequences and hence about completeness. Therefore, the completeness is always defined with respect to a metric.

An example for a complete space is $\mathbb{R}^n$ with respect to the usual distance measure (Euclidean distance). Let's consider $\mathbb{R}$, to be more specific. Euclidean distance narrows down to absolute value in this case. Every Cauchy sequence defined on $\mathbb{R}$ with respect to this metric converges to an element of $\mathbb{R}$, hence $\mathbb{R}$ is complete. 

Consider the set of rational numbers, $\mathbb{Q}$ with absolute value as the metric. The sequence defined recursively on $\mathbb{Q}$ as $x_1 = 1$, $x_n = x_n/2 + 1/x_n$ is Cauchy but does not have a limit in $\mathbb{Q}$. In fact, if the same sequence is defined on $\mathbb{R}$, its limit is identified as $\sqrt{2}$ which is not an element of $\mathbb{Q}$.

Now, coming back to the definition of a Banach space, we understand that a Banach space is a normed vector space where all Cauchy sequences converge (or equivalently, have limits). We can think of an incomplete space as a space with holes. A complete space, on the other hand, does not have any holes. Completing a space (something we will encounter shortly), is similar to filling holes in the space by adding missing elements to it.

In the next article we shall continue the tutorial to discuss inner product spaces, Hilbert spaces and Reproducing Kernel Hilbert Spaces.

1 Subtraction 
$-$ in a vector space is defined as $\mathbf{u} - \mathbf{v} := \mathbf{u} + (-\mathbf{v})$ where $-\mathbf{v}$ is the additive inverse of $\mathbf{v}$