<h1>Beyond Convexity #1: Introduction to Cross-Convexity</h1>

<p>In this blog post, we introduce a generalized notion of convexity for functions, which we call <a href="https://arxiv.org/abs/2309.15298">“cross-convexity”</a>, yielding inequalities that involve additional interaction terms compared to <a href="https://en.wikipedia.org/wiki/Convex_function">standard convexity</a>.</p>
<p><u>Definition:</u> A function $F:\mathbb{R}^d \rightarrow \mathbb{R}$ is said to be cross-convex if there exist $S\ge 1$ <a href="https://en.wikipedia.org/wiki/Logarithmically_concave_function">log-concave functions</a> $p_1,\dots,p_S$ such that
\(\forall x\in \mathbb{R}^d , \ F(x) = -\log\left( \sum_{s=1}^S p_s(x) \right) .\)</p>
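<p>As a quick illustration, here is a minimal numerical sketch of how such an $F$ can be evaluated stably via a log-sum-exp over the component log-densities. The Gaussian components and all names (<code>cross_convex_F</code>, <code>means</code>) are illustrative choices, not from the paper:</p>

```python
import numpy as np
from scipy.special import logsumexp

# Minimal sketch (illustrative names, Gaussian components assumed):
# F(x) = -log(sum_s p_s(x)) evaluated stably via log-sum-exp.
def cross_convex_F(x, means, sigma=1.0):
    # log p_s(x) for an unnormalized Gaussian bump centered at each mean
    log_ps = [-np.sum((x - m) ** 2) / (2 * sigma ** 2) for m in means]
    return -logsumexp(log_ps)

means = [np.zeros(2), np.array([3.0, 1.0])]
print(cross_convex_F(np.array([1.0, 0.5]), means))  # a single evaluation of F
```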
<p>If $S=1$, then a cross-convex function is simply convex.
In the general case $S\ge 1$, we argue that this family of functions is still a natural one to consider as it includes the negative log-likelihood of the Gaussian mixture model.</p>
<p>Let us first recall that a differentiable function $f:\mathbb{R}^d \rightarrow \mathbb{R}$ is convex if it dominates all its tangent hyperplanes:
\(\forall a, \forall x, f(x) \ge f(a) + \nabla f(a)^\top (x-a).\)</p>
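<p>For intuition, here is a tiny sanity check of this inequality on the toy convex function $f(x)=\lVert x\rVert^2$, for which $\nabla f(a) = 2a$ (all names are illustrative):</p>

```python
import numpy as np

# Sanity check of the tangent-plane inequality for the toy convex
# function f(x) = ||x||^2, whose gradient is grad f(a) = 2a.
rng = np.random.default_rng(0)
f = lambda x: np.sum(x ** 2)
grad_f = lambda a: 2 * a

for _ in range(1000):
    a, x = rng.normal(size=2), rng.normal(size=2)
    assert f(x) >= f(a) + grad_f(a) @ (x - a) - 1e-12
print("tangent lower bound holds on all samples")
```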
<p>$\bullet$ <u>Summation/Affine Closure:</u> A very important characteristic of the family of convex functions is that it is closed under (i) summation and (ii) affine reparametrization.
Formally, (i) if $f_1,f_2$ are two convex functions, then $f_1+f_2$ is also convex; and (ii) if $f$ is convex, then $z \mapsto f(Az+b)$ is also convex for any matrix $A$ and vector $b$ of compatible dimensions.
These two closure properties are very important in machine learning, where the objective function is typically a sum over the dataset of losses involving affine (neuron-style) transformations of the input data.
A first challenge when trying to generalize convexity for ML applications is to pick a family of functions satisfying such closedness under summation and affine reparametrization.
For instance, while the notion of <a href="https://en.wikipedia.org/wiki/Quasiconvex_function">quasi-convexity</a> is often cited as a natural extension of standard convexity, it is unfortunately not closed under summation: the sum of two quasi-convex functions is not necessarily quasi-convex. For example, $x \mapsto x^3$ and $x \mapsto -x$ are both quasi-convex on $\mathbb{R}$ (being monotone), yet their sum has the disconnected sublevel set $\{x : x^3 - x \le 0\} = (-\infty,-1] \cup [0,1]$.</p>
<p><u>Proposition:</u> Let $F = -\log\left( \sum_{s=1}^S p_s \right)$ and $\tilde F = -\log\left( \sum_{s=1}^{\tilde S} \tilde p_s \right)$ be two cross-convex functions with $S,\tilde S \ge 1$ and $p_s,\tilde p_s : \mathbb{R}^d \rightarrow (0,+\infty)$ log-concave functions.
Then, the sum $F+\tilde F$ is also cross-convex. Indeed,
\(F+\tilde F = -\log\left( \sum_{s=1}^S \sum_{s'=1}^{\tilde S} p_s \tilde p_{s'} \right) ,\)
where the product of two log-concave functions $p_s \tilde p_{s'}$ is also log-concave.</p>
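<p>Here is a small numerical sketch of this closure argument, with arbitrary illustrative log-concave components (all names are mine); a product of components corresponds to a sum of their logs:</p>

```python
import numpy as np
from scipy.special import logsumexp

# Sketch of the closure argument (illustrative component choices): writing each
# component via its log, F + F_tilde = -log( sum over (s, s') of p_s * p_tilde_s' ).
def F(x, log_components):
    return -logsumexp([lc(x) for lc in log_components])

log_p = [lambda x: -x ** 2, lambda x: -(x - 1) ** 2]            # S = 2
log_p_tilde = [lambda x: -np.abs(x), lambda x: -(x + 2) ** 2]   # S_tilde = 2

x = 0.7
lhs = F(x, log_p) + F(x, log_p_tilde)
# products p_s * p_tilde_s' correspond to sums of the component logs
log_products = [lambda t, a=a, b=b: a(t) + b(t) for a in log_p for b in log_p_tilde]
rhs = F(x, log_products)
print(np.isclose(lhs, rhs))  # True: F + F_tilde is cross-convex with S * S_tilde terms
```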
<p>Moreover, the class of cross-convex functions is also closed under affine reparametrization: this follows from the fact that log-concave functions are themselves closed under affine reparametrization.</p>
<p>$\bullet$ <u>Generalized convexity inequality:</u> Note that the right-hand side in the convexity inequality, namely “$f(a) + \nabla f(a)^\top (x-a)$”, represents the tangent hyperplane of $f$ at the point $a$.
In particular, this lower bound is a critical ingredient in the analysis of gradient descent (see e.g. <a href="https://www.di.ens.fr/~fbach/ltfp_book.pdf">Bach’s LTFP book</a>).</p>
<p><u>Key Remark:</u> For any element $\mu \in [0,1]^S$ of the probability simplex (i.e., $\lVert \mu \rVert_1=1$), the KL-regularized log-loss
\(\ell_\mu(x) = -\log( p_1(x)+\dots+p_S(x) ) + D_{\text{KL}}\left( \mu \Bigg\| \left[ \frac{p_s(x)}{ \sum_{s'} p_{s'}(x) } \right]_{s} \right)\)
is convex. Indeed, since $\sum_s \mu_s = 1$, the KL term cancels the log-sum term: $\ell_\mu(x) = \sum_s \mu_s \log \mu_s - \sum_s \mu_s \log p_s(x)$, which is, up to an additive constant, a convex combination of the convex functions $-\log p_s$.</p>
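<p>A quick numerical sanity check of this remark, assuming Gaussian components (all names illustrative): thanks to the cancellation above, $\ell_\mu$ should pass a midpoint convexity test at random points.</p>

```python
import numpy as np
from scipy.special import logsumexp

# Numerical sketch (Gaussian components assumed, names are mine):
# test the midpoint convexity inequality for the KL-regularized log-loss.
rng = np.random.default_rng(1)
means = [np.zeros(2), np.array([2.0, -1.0])]
mu = np.array([0.3, 0.7])

def ell_mu(x):
    log_p = np.array([-np.sum((x - m) ** 2) / 2 for m in means])
    log_w = log_p - logsumexp(log_p)  # log of the normalized weights p_s / sum p
    return -logsumexp(log_p) + np.sum(mu * (np.log(mu) - log_w))

for _ in range(1000):
    x, y = rng.normal(size=2), rng.normal(size=2)
    assert ell_mu((x + y) / 2) <= (ell_mu(x) + ell_mu(y)) / 2 + 1e-9
print("midpoint convexity holds on all samples")
```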
<p>In the following, we explain step by step how to obtain a similar lower bound in the cross-convex case.
For simplicity, we focus on the very specific case $S=2, d=2$ and $p_1(x,y)=p_2(y,x)=p(x)$ for some log-concave function $p: \mathbb{R} \rightarrow (0,+\infty)$, which already unveils phenomena unseen in the convex scenario. We refer the reader to Lemma 1 in <a href="https://arxiv.org/abs/2309.15298">“Beyond Log-Concavity: Theory and Algorithm for Sum-Log-Concave Optimization”</a> for the more general statement.</p>
<h3 id="steps">Steps</h3>
<ul>
<li>Given a log-concave function $p$</li>
<li>Form the cross-convex function $F(x,y)=-\log(p(x)+p(y))$</li>
<li>Tangent lower bound $\mathcal{T}_{a,b}(x, y) \le F(x,y)$ at the point $(a,b)$ (verified numerically in the sketch after this list):
\(\mathcal{T}_{a,b}(x, y) = F(a,b) + \nabla F(a,b)^\top \begin{pmatrix} x-a \\ y-b \end{pmatrix} - D_{\text{KL}}\left( \begin{pmatrix} \frac{p(a)}{p(a)+p(b)} \\ \frac{p(b)}{p(a)+p(b)} \end{pmatrix} \, \Bigg \| \, \begin{pmatrix} \frac{p(x)}{p(x)+p(y)} \\ \frac{p(y)}{p(x)+p(y)} \end{pmatrix} \right)\)</li>
<li><u>Note:</u> <em>Actually, the negative sign in front of the KL term is bad news for the analysis of gradient descent… Check out my paper to see how to solve that issue by considering a reweighted version of the gradient.</em></li>
</ul>
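<p>Here is the promised numerical sketch of the generalized tangent inequality $\mathcal{T}_{a,b} \le F$, with the illustrative choice $p(t)=e^{-t^2/2}$ (all names are mine):</p>

```python
import numpy as np
from scipy.special import logsumexp

# Numerical sketch (illustrative choice p(t) = exp(-t^2/2), names are mine):
# verify T_{a,b}(x, y) <= F(x, y) for F(x, y) = -log(p(x) + p(y)).
log_p = lambda t: -t ** 2 / 2
grad_log_p = lambda t: -t

def F(x, y):
    return -logsumexp([log_p(x), log_p(y)])

def weights(x, y):
    lp = np.array([log_p(x), log_p(y)])
    return np.exp(lp - logsumexp(lp))  # (p(x), p(y)) / (p(x) + p(y))

def T(a, b, x, y):
    w_ab, w_xy = weights(a, b), weights(x, y)
    grad_F = -w_ab * np.array([grad_log_p(a), grad_log_p(b)])  # chain rule
    kl = np.sum(w_ab * (np.log(w_ab) - np.log(w_xy)))
    return F(a, b) + grad_F @ np.array([x - a, y - b]) - kl

rng = np.random.default_rng(2)
for _ in range(1000):
    a, b, x, y = rng.normal(size=4)
    assert T(a, b, x, y) <= F(x, y) + 1e-9
print("generalized tangent bound holds on all samples")
```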
<p>$\bullet$ \(\mathfrak{B}\)<u>estiary:</u> Collection of illustrations of cross-convex functions (in red) with their tangent surface (in green): <a href="https://github.com/mastane/TheXCB">https://github.com/mastane/TheXCB</a>. The green tangent surface represents the lower bound $\mathcal{T}_{0,0}$ in the generalized convexity inequality.</p>
<ul>
<li>
<p>\(\mathcal{G}\)<u>aussian mixture</u>
<img src="https://mastane.github.io/images/XCB/gifs/rotating_plot_00_Gaussian.gif" width="100%" height="100%" /></p>
</li>
<li>
<p>\(\mathcal{L}\)<u>ogistic mixture</u>
<img src="https://mastane.github.io/images/XCB/gifs/rotating_plot_00_Logistic.gif" width="100%" height="100%" /></p>
</li>
<li>
<p>\(\mathcal{H}\)<u>yperbolic</u> \(\mathcal{S}\)<u>ecant mixture</u>
<img src="https://mastane.github.io/images/XCB/gifs/rotating_plot_00_Sech.gif" width="100%" height="100%" /></p>
</li>
<li>
<p>\(\mathcal{G}\)<u>umbel mixture</u>
<img src="https://mastane.github.io/images/XCB/gifs/rotating_plot_00_Gumbel.gif" width="100%" height="100%" /></p>
</li>
</ul>