Saturday, May 31, 2014

A Tutorial Introduction to Reproducing Kernel Hilbert Spaces - Part III

In this article of the series, we provide a summary of the theory of Reproducing Kernel Hilbert Spaces (RKHS), which was originally established by Aronszajn in his 1950 paper [1].

In the following, unless otherwise stated, we use $\mathcal{X}$ to denote an arbitrary nonempty set and $\mathcal{H}$ to denote a Hilbert space of complex-valued functions defined on $\mathcal{X}$. Therefore, $\mathcal{H}$ is a subset of $\mathbb{C}^\mathcal{X}$, the set of all complex-valued functions on $\mathcal{X}$. Although the general theory is presented for more general complex-valued Hilbert spaces, one can think of real-valued Hilbert spaces for ease of understanding. The following example provides an example for a real-valued Hilbert space over a non-empty set.

Example: 

Let $\mathcal{X} = [0, 1]$ and $\mathcal{H}$ be the set of real-valued, square-integrable functions on $\mathcal{X}$. That is,
$$
\mathcal{H} = \mathcal{F} = \{f | f : [0, 1] \to \mathbb{R}, \int_{0}^{1} |f(x)|^2 dx < \infty \}.
$$
As was hinted in Exercise 2 of the previous post, $\mathcal{F}$ defined above is a Hilbert space. Recall that a Hilbert space is a vector space which has an inner product (additionally, it should also be complete with respect to the norm induced by the inner product). Therefore, a Hilbert space should have three key operations: addition & scalar multiplication, inherited from vector space, and inner product. To convince the reader that $\mathcal{F}$ is indeed an Hilbert space, we now summarize these three key operations in $\mathcal{F}$. In the following, think of $\mathcal{F}$ as a vector space and $f, g \in \mathcal{F}$ as vectors (although, at the same time, they are functions as well).

1. Addition - For each $f \in \mathcal{H}$ and $g \in \mathcal{H}$, we can define an element $(f + g) \in \mathcal{H}$ as, $(f + g) : [0, 1] \to \mathbb{R} : (f + g)(x) = f(x) + g(x)$
2. Scalar multiplication - For each $f \in \mathcal{H}$ and $a \in \mathbb{R}$, the element $(af) \in \mathcal{H}$ is defined as: $(af) : [0, 1] \to \mathbb{R} : (af)(x) = a f(x)$
3. Inner product - For each $f \in \mathcal{H}$ and $g \in \mathcal{H}$, the inner product in $\mathcal{H}$ is defined as,
$$
\langle f, g \rangle_\mathcal{H} = \int_0^1 f(x)g(x) dx
$$

Kernels

Throughout this article, we use the term kernel to identify a real or complex-valued bivariate function defined on an arbitrary set. We interchangebly use terms a kernel on $\mathcal{X}$ and a kernel on $\mathcal{X} \times \mathcal{X}$ for a function $k : \mathcal{X} \times \mathcal{X} \to \mathbb{C}$ (or $\mathbb{R}$, depending on the context). For example, if $\mathcal{X} = \mathbb{R}$, $k : \mathbb{R} \times \mathbb{R} \to \mathbb{R} : k(x, y) = xy$ is a real-valued kernel.

We now define reproducing kernels, a special type of kernel.

Definition (Reproducing Kernel)

Let $\mathcal{X}$ be a nonempty set and $\mathcal{H} \subseteq \mathbb{C}^\mathcal{X}$ be a Hilbert space of complex-valued functions defined on $\mathcal{X}$. A kernel $k : \mathcal{X} \times \mathcal{X} \to \mathbb{C}$ is called a reproducing kernel of $\mathcal{H}$ if it has the two properties:
  1. For every $x_0 \in \mathcal{X}$, $k(y, x_0)$ as a function of $y$ belongs to $\mathcal{H}$.
  2. The reproducing property: for each $x_0 \in \mathcal{X}$ and $f \in \mathcal{H}$, $f(x_0) = \langle f(.), k(.,x_0) \rangle_\mathcal{H}$

The first property is easy to understand: since $k(y, x)$ is a bivariate function, when we fix the second variable $x$ at $x_0$, we get a function of the first variable $y$, which goes from $\mathcal{X}$ to $\mathbb{C}$. Now, the first property requires this function to be a member of $\mathcal{H}$ (a subset of all functions from $\mathcal{X}$ to $\mathbb{C}$) for all $x_0 \in \mathcal{X}$. The second property can now be understood as follows: Pick any element $x_0$ of $\mathcal{X}$ and $f$ of $\mathcal{H}$. From the first property, $k(.,x_0)$ is in $\mathcal{H}$. Therefore, we should be able to calculate the inner product between two elements of $\mathcal{H}$, $f$ and $k(.,x_0)$. The second property requires this value (a complex number) to be equal to $f(x_0)$.

Evaluation Functional

A Hilbert space is a set endowed with vector space structure and an inner product. When we consider $\mathcal{H}$ as a conventional set, there is nothing unusual in defining functions from $\mathcal{H}$ to $\mathbb{C}$. However, since elements of $\mathcal{H}$ are also functions, this can be little confusing at times. We therefore use the term functional to refer to a function from $\mathcal{H}$ to $\mathbb{C}$, i.e., a function of a function, and reserve the term function to refer to a member of $\mathcal{H}$, i.e., a function from $\mathcal{X}$ to $\mathbb{C}$.

Pick an element $x_0$ from $\mathcal{X}$. Now, one can map each $f \in \mathcal{H}$ to the element $f(x_0) \in \mathbb{C}$. Because $f$ is a function from $\mathcal{X}$ to $\mathbb{C}$, only one $f(x_0)$ exists. Therefore, the mapping $f \mapsto f(x_0)$ is a function from $\mathcal{H}$ to $\mathbb{C}$ (or a functional in our terminology). This functional is known as the point evaluation functional at $x_0$. The formal definition is given below.

Point evaluation functional - The point evaluation functional $L_x : \mathcal{H} \to \mathbb{C}$ at a point $x \in \mathcal{X}$ is defined as $L_x(f) := f(x)$.

We are now ready to study the formal definition of an RKHS.

Definition (Reproducing Kernel Hilbert Space)

A Hilbert space $\mathcal{H}$ of complex-valued functions on a nonempty set $\mathcal{X}$ is called a reproducing kernel Hilbert space if the point evaluation functional $L_x$ is a bounded (equivalently, continuous) linear operator for all $x \in \mathcal{X}$.

The above definition should be self-explanatory at this point since we have previously discussed all related terms. Definition of a bounded linear operator was discussed in the previous post. Note that a linear operator between two normed vector spaces is bounded if and only if it is continuous.

We next state and prove the following theorem to get more insight into the concept of an RKHS.

Theorem
A Hilbert space $\mathcal{H}$ is an RKHS if and only if it has a reproducing kernel.

Proof:
First, assume that $\mathcal{H}$ is an RKHS. Denote the dual space of $\mathcal{H}$ that consists of all continuous linear functionals from $\mathcal{H}$ to $\mathbb{C}$ by $\mathcal{H}^*$. Since $\mathcal{H}$ is an RKHS, $L_x$ belongs to $\mathcal{H}^*$ for any $x \in \mathcal{X}$. Therefore, from the Riesz representation theorem, any $L_x$ has a unique representation of the form
$$
L_x(f) = \langle f, k_x \rangle_\mathcal{H}
$$
where $k_x \in \mathcal{H}$ depends only on $x$. Now, define the bivariate function $k : \mathcal{X} \times \mathcal{X} \to \mathbb{C}$ as $k(y, x) = k_x(y)$. By its definition, $k(y, x)$ gets the first property of a reproducing kernel. It also has the reproducing property since,
$$
f(x) = L_x(f) = \langle f(.), k(.,x) \rangle_\mathcal{H}\;,
$$
for all $x \in \mathcal{X}$ and $f \in \mathcal{H}$. Therefore, $k(y, x)$ is a reproducing kernel of $\mathcal{H}$.

To prove the converse, assume the existence of a reproducing kernel $k(y, x)$ for the Hilbert space $\mathcal{H}$. Recall the definition of a bounded linear operator from the previous post. Note that $L_x$ is a function from $\mathcal{H}$ to $\mathbb{C}$, the norm in the former induced by its inner product, while the norm in $\mathbb{C}$ is simply the absolute value.

Let $f \in \mathcal{H}$, then,

$|L_x(f)| = |f(x)| = |\langle f(.), k(.,x) \rangle_\mathcal{H}|$
             $\le \|f(.)\|_\mathcal{H} \|k(.,x)\|_\mathcal{H}$ (Cauchy-Schwarz inequality)
             $= \|f(.)\|_\mathcal{H} \langle k(.,x), k(.,x) \rangle_\mathcal{H}^{1/2}$
             $= \|f\|_\mathcal{H}k(x, x)^{1/2} \quad$ (from the second property of a reproducing kernel).

Therefore, from the definition, $L_x$ is a bounded linear operator for all $x \in \mathcal{X}$ and hence $\mathcal{H}$ is an RKHS.

For a given RKHS, the reproducing kernel is unique. We refer the reader to [1] for the proof of this uniqueness.

It is known that a reproducing kernel is characterized by the property of positive definiteness. In the next article, we shall discuss positive and negative definite kernels in detail and state their connection with reproducing kernel Hilbert spaces.


References

[1] Aronszajn, N. Theory of Reproducing Kernels. Transactions of the American Mathematical Society,(1950).

11 comments:

  1. Hi sadeep!

    Your explanation is quite neat, clear and to the point. A very good job!!!

    ReplyDelete
  2. Thank you for an excellent tutorial.

    ReplyDelete
  3. Wow, amazing work! Exactly what I was looking for as most other explanations (e.g. Wiki or Papers) are hard to understand.
    Thank you and keep us up to date!

    ReplyDelete
  4. Very well-written tutorial! Any plan on the next article?

    ReplyDelete
  5. Thank you for your helpful tutorials.
    In fact, Hilbert space and Reproducing Kernel Hilbert Spaces (RKHS) are related to Hilbert-Schmit Independence Criterion (HSIC).
    Would you please provide a helpful tutorials about HSIC?

    ReplyDelete