I recently attended an ICML tutorial on submodular optimization and I enjoyed it so much that I continued to read a review/tutorial paper: “Submodularity in Action: From Machine Learning to Signal Processing Applications”.

As I’m reading the paper, I just wanted to summarize what I’m learning on this blog. Initially, I was very ambitious and wanted to recreate something similar to the ICML tutorial. Soon, I realized that that will require probably one entire semester, not a week. Now, I’m just breaking things down into much smaller posts 🙂

Let me begin with the definition of submodular functions. The core concept in submodular function is diminishing return:

    \[f(e|\mathcal{A}) \leq f(e|\mathcal{B}) \;\; \text{ if } \mathcal{B} \subseteq \mathcal{A},\]

where f(e|\mathcal{S}) is defined as:

    \[f(e|\mathcal{S}) = f(e \cup \mathcal{S}) - f(\mathcal{S}) \;\; \text{ for } e \in \mathcal{S}^C.\]

So, if you consider f as some reward function, f(e|\mathcal{S}) is an additional reward you get if you add e into the set \mathcal{S}.

The best way to visualize this is through the “covering” problem, e.g, covering area with Internet access. As you add more transmission towers, there is more area covered, i.e, that has Internet access. Initially, every new tower you build, you have a new area that newly gains Internet access. However, once you have a few towers, the area covered by a new tower will start to have overlap with already-covered areas, and the amount of newly covered area decreases. See the image below:

  Image borrowed from: http://iid.yale.edu/icml/icml-20.md/

The concept of submodularity is important for two reasons:

  1. “Diminishing returns” scenario occurs frequently in real-world. Especially, it gained a lot of attention recently due to its applications in machine learning. For instance, feature selection can be modeled as a submodular optimization problem where you want to find features that explain the variance of the output the most. Clustering can also be formulated as a submodular optimization problem: choosing subsets that maximize the combinatorial dependence function.
  2. Submodularity is a discrete-function counterpart of convexity. Thanks to this similarity, some submodular optimization problems can be solved efficiently. For unconstrained submodular minimization, we can obtain an exact solution by utilizing convexity. For many constrained submodular optimization problems, there are greedy algorithms that provide approximation guarantees. (Will cover this in later posts!)

In this post, I am just going to talk about one thing:

Any set function can be represented as a difference between two submodular functions.

This is a really useful property because it lets you use techniques for maximizing the difference of two convex functions to maximize an arbitrary set function.

Now let’s prove why this is true!

 

 

Theorem 1. For any set function f: 2^{\mathcal{N} \rightarrow \mathbb{R}  defined over a ground set \mathcal{N}, there exist submodular set functions g: 2^{\mathcal{N} \rightarrow \mathbb{R} and h: 2^{\mathcal{N} \rightarrow \mathbb{R} such that:

(1)   \begin{equation*} f(\mathcal{S}) = g(\mathcal{S}) - h(\mathcal{S}) \quad \forall \mathcal{S} \subseteq \mathcal{N}. \end{equation*}

(Proof of Theorem 1)

Suppose f is any set function on \mathcal{N} and h is a submodular function on \mathcal{N}. And, let g be such that:

    \[g(\mathcal{S}) = f(\mathcal{S}) + h(\mathcal{S}).\]

If

(2)   \begin{equation*}  g(e|\mathcal{A})  \leq g(e|\mathcal{B}) \end{equation*}

is satisfied for every e \in \mathcal{N}, \mathcal{A} \subset \mathcal{B} \subset \mathcal{N}, we are all set!

If not, find e \in \mathcal{N}, \mathcal{A} \subset \mathcal{B} \subset \mathcal{N} that violates (2), i.e.,

(3)   \begin{equation*}  g(e|\mathcal{A}) > g(e|\mathcal{B}) \text{ for } \mathcal{B} \subset \mathcal{A}. \end{equation*}

We will show that we can change h that will remove the pair (e, \mathcal{A}, \mathcal{B}) that violates the condition (2) while maintaining the submodularity of h.

To do that, we will first prove a small lemma.

Lemma 1. Let h be a submodular function on \mathcal{N}. Let h' be another set function on \mathcal{N} such that:

    \[h'(\mathcal{S}) = \begin{cases} h(\mathcal{S}) -c  \text{ for } \mathcal{S} \in \mathcal{B}  \\ h(\mathcal{S}) \text{ otherwise.} \end{cases}\]

Then, h' is also a submodular function on \mathcal{N}.

(Proof of Lemma 1) 

We have to show that for all e \in \mathcal{N} and for all \mathcal{C} \subset \mathcal{A} \subseteq \mathcal{N},

(4)   \begin{equation*}  h'(e|\mathcal{A}) \leq h'(e|\mathcal{C}). \end{equation*}

Let’s divide this into three cases: i) \mathcal{B} \subset \mathcal{C} \subseteq \mathcal{A},  ii) \mathcal{C} \subseteq \mathcal{B} \subseteq \mathcal{A}, iii) \mathcal{C} \subseteq \mathcal{A} \subseteq \mathcal{B}.

i) h'(e|\mathcal{C}) = h(e|\mathcal{C}) and h'(e|\mathcal{A}) = h(e|mathcal{A}). Hence, (4) holds.

ii)

    \[h'(e|\mathcal{C}) = h'(\mathcal{C} \cup \{ e \}) - h'(\mathcal{C}) = h(\mathcal{C}') - h(\mathcal{C}) + c.\]

    \[h'(e|\mathcal{A}) = \begin{cases} h(\mathcal{A}') - h(\mathcal{A}) + c \quad \text{if} \mathcal{A} = \mathcal{B} \\ h(e|\mathcal{A}) \quad \text{otherwise}. \end{cases}\]

Hence, h'(e|\mathcal{A}) - h'(e|\mathcal{C}) \leq h(e|\mathcal{A}) - h(e|\mathcal{C}) = 0.

iii)

    \[h'(e|\mathcal{C}) = h'(e \cup \mathcal{C}) - h'(\mathcal{C}) = \begin{cases} h(\mathcal{C}') - h(\mathcal{C}), \quad \text { if } e \in \mathcal{B}\setminus \mathcal{A} \\ h(\mathcal{C}') - h(\mathcal{C}) + c \quad \text{otherwise} \end{cases},\]

where \mathcal{C}' = \mathcal{C} \cup \{ e \}, and, the same will hold for \mathcal{A}. I.e.,

    \[h'(e|\mathcal{A}) = h'(e \cup \mathcal{A}) - h'(\mathcal{A}) = \begin{cases} h(\mathcal{A}') - h(\mathcal{A}) \quad \text { if } e \in \mathcal{B}\setminus \mathcal{A} \\ h(\mathcal{A}') - h(\mathcal{A}) + c \quad \text{otherwise} \end{cases}\]

.

Hence, for either case, h'(e|\mathcal{A}) - h'(e|\mathcal{C}) = h(e|\mathcal{A}) - h(e|\mathcal{C}).

\square

Now, let \delta(e, \mathcal{A}, \mathcal{B}) = g(e|\mathcal{A}) - g(e|\mathcal{B}). Let

    \[h'(\mathcal{S}) = \begin{cases} h(\mathcal{S}) -\delta{e, \mathcal{A}, \mathcal{B})  \text{ for } \mathcal{S} \in \mathcal{B}  \\ h(\mathcal{S}) \text{ otherwise.} \end{cases}\]

Then, h' is still a submodular function following Lemma 1. Now let g' = f + h'.

    \begin{align*} g'(e|\mathcal{A}) - g'(e|\mathcal{B}) &= g(e|\mathcal{A}) - (g'(e \cup \mathcal{B}) - g'(\mathcal{B})) \\ &= g(e|\mathcal{A})  - (g(e \cup \mathcal{B}) - (g(\mathcal{B}) - \delta(e,\mathcal{A}, \mathcal{B}) )) \\ &= g(e|\mathcal{A}) - g(e|\mathcal{B}) - \delta (e,\mathcal{A}, \mathcal{B})  \\ & = \delta(e, \mathcal{A}, \mathcal{B}) - \delta (e,\mathcal{A}, \mathcal{B})  = 0. \end{align*}

 

We can repeat this process until there is no such pair (e, \mathcal{A}, \mathcal{B}) that violates (2).

\square

 

 

 

That’s it for today, and I hope I can come back with another submodular optimization post or quantum algorithm post soon 🙂