<p><strong>Thinning PRMs</strong><br>Steven Soojin Kim (steven@ssk.im), 2016-03-06, http://ssk.im/blog/poisson</p>
<p>Let <script type="math/tex">X</script> be a Poisson random variable with rate <script type="math/tex">\lambda</script>, and let <script type="math/tex">Y</script> be an independent Poisson random variable with rate <script type="math/tex">\mu</script>. Then, a short calculation shows that <script type="math/tex">X+Y</script> is a Poisson random variable with rate <script type="math/tex">\lambda + \mu</script>. This is known as the <em>superposition</em> property of the Poisson distribution. In the opposite direction, suppose <script type="math/tex">X</script> is a Poisson random variable with rate <script type="math/tex">\lambda</script>, and let <script type="math/tex">Z_1,Z_2,\dots</script> be a sequence of i.i.d. Bernoulli random variables with success probability <script type="math/tex">p</script>, independent of <script type="math/tex">X</script>. Then,
<script type="math/tex">\sum_{i=1}^\infty Z_i \mathbb{I}_{\{i \le X\}}</script>
is Poisson distributed with rate <script type="math/tex">\lambda p</script>. This is known as the <em>thinning</em> property. It turns out that this latter property is far more general, and the goal of this post is to illustrate thinning for general Poisson random measures.</p>
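<p>For completeness, here is a sketch of the calculation behind the thinning property. Write <script type="math/tex">S = \sum_{i=1}^{X} Z_i</script> for the sum above (the name <script type="math/tex">S</script> is introduced here only for convenience). Since the <script type="math/tex">Z_i</script> do not depend on <script type="math/tex">X</script>, conditional on <script type="math/tex">X = n</script> the variable <script type="math/tex">S</script> is binomial with parameters <script type="math/tex">n</script> and <script type="math/tex">p</script>. Hence, for each integer <script type="math/tex">k \ge 0</script>,</p>
<script type="math/tex; mode=display">
\begin{aligned}
\mathbb{P}(S = k)
&= \sum_{n=k}^\infty e^{-\lambda} \frac{\lambda^n}{n!} \binom{n}{k} p^k (1-p)^{n-k} \\
&= e^{-\lambda} \frac{(\lambda p)^k}{k!} \sum_{n=k}^\infty \frac{(\lambda(1-p))^{n-k}}{(n-k)!} \\
&= e^{-\lambda} \frac{(\lambda p)^k}{k!} \, e^{\lambda(1-p)}
= e^{-\lambda p} \frac{(\lambda p)^k}{k!},
\end{aligned}
</script>
<p>which is the Poisson probability mass function with rate <script type="math/tex">\lambda p</script>.</p>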
<p>For a given <script type="math/tex">\sigma</script>-finite measure space <script type="math/tex">(E,\mathcal{E},\mu)</script>, a <strong>Poisson random measure (PRM)</strong> with intensity <script type="math/tex">\mu</script> is a random measure <script type="math/tex">\omega\mapsto N_\omega(\cdot)</script> defined on some probability space <script type="math/tex">(\Omega,\mathcal{F},\mathbb{P})</script> such that:</p>
<ol>
<li>for all <script type="math/tex">\omega\in\Omega</script>, <script type="math/tex">N_\omega(\cdot)</script> is a measure on <script type="math/tex">(E,\mathcal{E})</script>;</li>
<li>for all <script type="math/tex">A \in \mathcal{E}</script> with <script type="math/tex">\mu(A) < \infty</script>, the random variable <script type="math/tex">\omega\mapsto N_\omega(A)</script> is Poisson distributed with rate <script type="math/tex">\mu(A)</script>;</li>
<li>if <script type="math/tex">A_1,\dots, A_k \in \mathcal{E}</script> are pairwise disjoint, then <script type="math/tex">N(A_1),\dots,N(A_k)</script> are mutually independent.</li>
</ol>
<p>(In the remainder of this post, I’ll omit <script type="math/tex">\omega</script> from the notation.)</p>
<blockquote>
<p><strong>Example.</strong> Suppose <script type="math/tex">E=\mathbb{R}_+</script>, let <script type="math/tex">\mathcal{E}</script> be the Borel <script type="math/tex">\sigma</script>-algebra, and <script type="math/tex">\mu(dx) = \lambda \,dx</script>, where <script type="math/tex">dx</script> is the Lebesgue measure. Then, the process <script type="math/tex">t \mapsto N([0,t])</script> is a Poisson process with rate <script type="math/tex">\lambda</script>.</p>
</blockquote>
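<p>A minimal simulation sketch of this example, using only the Python standard library. It relies on the classical fact that the gaps between consecutive jumps of a rate-<script type="math/tex">\lambda</script> Poisson process are i.i.d. exponential with parameter <script type="math/tex">\lambda</script>. The function names <code>sample_jump_times</code> and <code>count_up_to</code>, the seed, and the parameter values are hypothetical choices made for illustration.</p>

```python
import bisect
import random


def sample_jump_times(rate, horizon, rng):
    """Jump times on [0, horizon] of a Poisson process with the given rate.

    Uses the classical fact that the gaps between consecutive jumps are
    i.i.d. exponential random variables with parameter `rate`.
    """
    times = []
    t = rng.expovariate(rate)
    while t <= horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times


def count_up_to(jump_times, t):
    """Evaluate t -> N([0, t]) for a sorted list of jump times."""
    return bisect.bisect_right(jump_times, t)


rng = random.Random(0)  # arbitrary seed, for reproducibility only
jumps = sample_jump_times(rate=2.0, horizon=10.0, rng=rng)
print(count_up_to(jumps, 5.0), count_up_to(jumps, 10.0))
```

<p>Here <code>count_up_to(jumps, t)</code> plays the role of <script type="math/tex">N([0,t])</script>, and its average over many runs should be close to <script type="math/tex">\lambda t</script>.</p>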
<p>More generally, (integrals with respect to) Poisson random measures offer a convenient way to describe and represent stochastic processes with jumps. Compare this with the setting of processes with continuous paths, where the Itô integral plays the corresponding role.</p>
<p>One approach to proving large deviations results for stochastic processes is to characterize certain changes of measure as “control” problems for random paths. In the continuous setting, the Girsanov theorem indicates that such changes of measure are achieved by imposing drift via adapted processes. In the jump setting, such “control” is achieved through a generalization of the “thinning” mechanism described above.</p>
<p>Instead of getting into details, I’d like to just visualize this thinning mechanism in a very simple case: with constant “control”. Consider the plot below, which displays the outcome of a Poisson random measure <script type="math/tex">N</script> on <script type="math/tex">[0,40] \times [0,1]</script> whose intensity is the Lebesgue measure. The blue and grey points together represent the outcome of <script type="math/tex">N</script>. The blue dots alone represent a thinning of <script type="math/tex">N</script>; that is, the outcome of the PRM <script type="math/tex">\mathbb{I}_{[0,c]}(x) N(dt\, dx)</script>, where I have chosen <script type="math/tex">c=0.65</script> (the dashed horizontal line in the first plot). The second plot shows the associated homogeneous Poisson processes: in grey, the total process <script type="math/tex">t\mapsto N([0,t]\times [0,1])</script> (with rate 1), and in blue, the thinned process <script type="math/tex">t\mapsto N([0,t]\times [0,c])</script> (with rate 0.65). In particular, every jump of the blue path (respectively, grey path) corresponds to a blue point (respectively, a blue or grey point). Lastly, the dashed lines in the second plot represent the “average” behavior of the associated Poisson processes, namely the means <script type="math/tex">t \mapsto t</script> and <script type="math/tex">t \mapsto 0.65\,t</script>.</p>
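<p>The thinning just described can be sketched in a few lines of Python. This is an independent illustration using only the standard library, not the code that produced the figure; the names <code>sample_prm</code> and <code>thin</code> and the seed are hypothetical. It assumes the standard construction in which the time coordinates form a rate-1 Poisson process and each point carries an independent uniform mark in <script type="math/tex">[0,1]</script>.</p>

```python
import random


def sample_prm(horizon, rng):
    """Points (t, x) of a PRM on [0, horizon] x [0, 1] with Lebesgue intensity.

    The time coordinates form a rate-1 Poisson process (i.i.d. exponential
    gaps), and each point receives an independent uniform mark x in [0, 1].
    """
    points = []
    t = rng.expovariate(1.0)
    while t <= horizon:
        points.append((t, rng.random()))
        t += rng.expovariate(1.0)
    return points


def thin(points, c):
    """Keep the points with mark x <= c, i.e. the atoms of 1_{[0,c]}(x) N(dt dx)."""
    return [(t, x) for (t, x) in points if x <= c]


rng = random.Random(2016)  # arbitrary seed, for reproducibility only
all_points = sample_prm(horizon=40.0, rng=rng)  # blue and grey points together
kept_points = thin(all_points, c=0.65)          # blue points only

print(len(all_points), len(kept_points))
```

<p>Counting the kept points with time coordinate at most <script type="math/tex">s</script> gives the value of the thinned path at time <script type="math/tex">s</script>; by the thinning property, its mean is <script type="math/tex">cs</script>.</p>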
<div id="fig_el5597744156283687044080157"></div>
<script>
function mpld3_load_lib(url, callback){
var s = document.createElement('script');
s.src = url;
s.async = true;
s.onreadystatechange = s.onload = callback;
s.onerror = function(){console.warn("failed to load library " + url);};
document.getElementsByTagName("head")[0].appendChild(s);
}
if(typeof(mpld3) !== "undefined" && mpld3._mpld3IsLoaded){
// already loaded: just create the figure
!function(mpld3){
mpld3.draw_figure("fig_el5597744156283687044080157", {"axes": [{"xlim": [0.0, 40.0], "yscale": "linear", "axesbg": "#FFFFFF", "texts": [{"v_baseline": "hanging", "h_anchor": "middle", "color": "#000000", "text": "time", "coordinates": "axes", "zorder": 3, "alpha": 1, "fontsize": 12.0, "position": [0.5, -0.13177083333333339], "rotation": -0.0, "id": "el559774415805520"}, {"v_baseline": "auto", "h_anchor": "middle", "color": "#000000", "text": "thinning dimension", "coordinates": "axes", "zorder": 3, "alpha": 1, "fontsize": 12.0, "position": [-0.059538810483870969, 0.5], "rotation": -90.0, "id": "el559774467029968"}], "zoomable": true, "images": [], "xdomain": [0.0, 40.0], "ylim": [0.0, 1.0], "paths": [], "sharey": [], "sharex": [], "axesbgalpha": null, "axes": [{"scale": "linear", "tickformat": null, "grid": {"gridOn": false}, "fontsize": 12.0, "position": "bottom", "nticks": 9, "tickvalues": null}, {"scale": "linear", "tickformat": null, "grid": {"gridOn": false}, "fontsize": 12.0, "position": "left", "nticks": 6, "tickvalues": null}], "lines": [{"color": "#7F7F7F", "yindex": 1, "coordinates": "data", "dasharray": "2,2", "zorder": 2, "alpha": 0.5, "xindex": 0, "linewidth": 2, "data": "data01", "id": "el559774467432656"}], "markers": [], "id": "el559774415628176", "ydomain": [0.0, 1.0], "collections": [{"paths": [[[[0.0, -0.5], [0.13260155, -0.5], [0.25978993539242673, -0.44731684579412084], [0.3535533905932738, -0.3535533905932738], [0.44731684579412084, -0.25978993539242673], [0.5, -0.13260155], [0.5, 0.0], [0.5, 0.13260155], [0.44731684579412084, 0.25978993539242673], [0.3535533905932738, 0.3535533905932738], [0.25978993539242673, 0.44731684579412084], [0.13260155, 0.5], [0.0, 0.5], [-0.13260155, 0.5], [-0.25978993539242673, 0.44731684579412084], [-0.3535533905932738, 0.3535533905932738], [-0.44731684579412084, 0.25978993539242673], [-0.5, 0.13260155], [-0.5, 0.0], [-0.5, -0.13260155], [-0.44731684579412084, -0.25978993539242673], [-0.3535533905932738, 
-0.3535533905932738], [-0.25978993539242673, -0.44731684579412084], [-0.13260155, -0.5], [0.0, -0.5]], ["M", "C", "C", "C", "C", "C", "C", "C", "C", "Z"]]], "edgecolors": ["#000000"], "edgewidths": [1.0], "offsets": "data02", "yindex": 1, "id": "el559774467431376", "pathtransforms": [[7.027283689263066, 0.0, 0.0, 7.027283689263066, 0.0, 0.0]], "pathcoordinates": "display", "offsetcoordinates": "data", "zorder": 1, "xindex": 0, "alphas": [0.5], "facecolors": ["#1199EE"]}, {"paths": [[[[0.0, -0.5], [0.13260155, -0.5], [0.25978993539242673, -0.44731684579412084], [0.3535533905932738, -0.3535533905932738], [0.44731684579412084, -0.25978993539242673], [0.5, -0.13260155], [0.5, 0.0], [0.5, 0.13260155], [0.44731684579412084, 0.25978993539242673], [0.3535533905932738, 0.3535533905932738], [0.25978993539242673, 0.44731684579412084], [0.13260155, 0.5], [0.0, 0.5], [-0.13260155, 0.5], [-0.25978993539242673, 0.44731684579412084], [-0.3535533905932738, 0.3535533905932738], [-0.44731684579412084, 0.25978993539242673], [-0.5, 0.13260155], [-0.5, 0.0], [-0.5, -0.13260155], [-0.44731684579412084, -0.25978993539242673], [-0.3535533905932738, -0.3535533905932738], [-0.25978993539242673, -0.44731684579412084], [-0.13260155, -0.5], [0.0, -0.5]], ["M", "C", "C", "C", "C", "C", "C", "C", "C", "Z"]]], "edgecolors": ["#000000"], "edgewidths": [1.0], "offsets": "data03", "yindex": 1, "id": "el559774467433360", "pathtransforms": [[7.027283689263066, 0.0, 0.0, 7.027283689263066, 0.0, 0.0]], "pathcoordinates": "display", "offsetcoordinates": "data", "zorder": 1, "xindex": 0, "alphas": [0.25], "facecolors": ["#7F7F7F"]}], "xscale": "linear", "bbox": [0.125, 0.53636363636363638, 0.77500000000000002, 0.36363636363636365]}, {"xlim": [0.0, 40.0], "yscale": "linear", "axesbg": "#FFFFFF", "texts": [{"v_baseline": "hanging", "h_anchor": "middle", "color": "#000000", "text": "time", "coordinates": "axes", "zorder": 3, "alpha": 1, "fontsize": 12.0, "position": [0.5, -0.13177083333333334], "rotation": 
-0.0, "id": "el559774467608528"}, {"v_baseline": "auto", "h_anchor": "middle", "color": "#000000", "text": "number of jumps", "coordinates": "axes", "zorder": 3, "alpha": 1, "fontsize": 12.0, "position": [-0.051411290322580641, 0.5], "rotation": -90.0, "id": "el559774467706512"}], "zoomable": true, "images": [], "xdomain": [0.0, 40.0], "ylim": [0.0, 60.0], "paths": [], "sharey": [], "sharex": [], "axesbgalpha": null, "axes": [{"scale": "linear", "tickformat": null, "grid": {"gridOn": false}, "fontsize": 12.0, "position": "bottom", "nticks": 9, "tickvalues": null}, {"scale": "linear", "tickformat": null, "grid": {"gridOn": false}, "fontsize": 12.0, "position": "left", "nticks": 7, "tickvalues": null}], "lines": [{"color": "#7F7F7F", "yindex": 1, "coordinates": "data", "dasharray": "10,0", "zorder": 2, "alpha": 0.3, "xindex": 0, "linewidth": 1.5, "data": "data04", "id": "el559774468088784"}, {"color": "#1199EE", "yindex": 1, "coordinates": "data", "dasharray": "10,0", "zorder": 2, "alpha": 0.6, "xindex": 0, "linewidth": 1.5, "data": "data05", "id": "el559774468146832"}, {"color": "#7F7F7F", "yindex": 0, "coordinates": "data", "dasharray": "6,6", "zorder": 2, "alpha": 0.5, "xindex": 0, "linewidth": 1.0, "data": "data01", "id": "el559774468148560"}, {"color": "#1199EE", "yindex": 2, "coordinates": "data", "dasharray": "6,6", "zorder": 2, "alpha": 0.5, "xindex": 0, "linewidth": 1.0, "data": "data01", "id": "el559774468150160"}], "markers": [], "id": "el559774467525840", "ydomain": [0.0, 60.0], "collections": [], "xscale": "linear", "bbox": [0.125, 0.099999999999999978, 0.77500000000000002, 0.36363636363636365]}], "height": 480.0, "width": 640.0, "plugins": [{"type": "reset"}, {"enabled": false, "button": true, "type": "zoom"}, {"enabled": false, "button": true, "type": "boxzoom"}], "data": {"data04": [[0.0, 0.0], [1.042884762249856, 0.0], [1.042884762249856, 1.0], [2.508217998455069, 1.0], [2.508217998455069, 2.0], [2.862900439748115, 2.0], [2.862900439748115, 3.0], 
[3.5834466806438803, 3.0], [3.5834466806438803, 4.0], [3.8961033730992645, 4.0], [3.8961033730992645, 5.0], [5.620597519400867, 5.0], [5.620597519400867, 6.0], [5.653249874122497, 6.0], [5.653249874122497, 7.0], [5.724751483373458, 7.0], [5.724751483373458, 8.0], [5.808782685799034, 8.0], [5.808782685799034, 9.0], [5.884039621625585, 9.0], [5.884039621625585, 10.0], [5.932809093025075, 10.0], [5.932809093025075, 11.0], [5.97153978397869, 11.0], [5.97153978397869, 12.0], [8.75855465488424, 12.0], [8.75855465488424, 13.0], [8.890785405075356, 13.0], [8.890785405075356, 14.0], [12.0541773076252, 14.0], [12.0541773076252, 15.0], [12.108560366533467, 15.0], [12.108560366533467, 16.0], [12.842873808832994, 16.0], [12.842873808832994, 17.0], [13.612250993832653, 17.0], [13.612250993832653, 18.0], [14.581638265699336, 18.0], [14.581638265699336, 19.0], [15.657330426681062, 19.0], [15.657330426681062, 20.0], [15.996054166457654, 20.0], [15.996054166457654, 21.0], [20.873825062267628, 21.0], [20.873825062267628, 22.0], [23.81746713175772, 22.0], [23.81746713175772, 23.0], [24.809394880531475, 23.0], [24.809394880531475, 24.0], [27.482616276913966, 24.0], [27.482616276913966, 25.0], [27.622807374916114, 25.0], [27.622807374916114, 26.0], [28.27880121219315, 26.0], [28.27880121219315, 27.0], [31.85770714654026, 27.0], [31.85770714654026, 28.0], [32.93600494115783, 28.0], [32.93600494115783, 29.0], [35.32034892280397, 29.0], [35.32034892280397, 30.0], [35.78855252700181, 30.0], [35.78855252700181, 31.0], [36.94701158325401, 31.0], [36.94701158325401, 32.0], [37.01769689882119, 32.0], [37.01769689882119, 33.0], [37.02121093843795, 33.0], [37.02121093843795, 34.0], [37.02121093843795, 34.0], [40.0, 34.0]], "data05": [[0.0, 0.0], [1.042884762249856, 0.0], [1.042884762249856, 1.0], [2.508217998455069, 1.0], [2.508217998455069, 2.0], [2.862900439748115, 2.0], [2.862900439748115, 3.0], [3.5834466806438803, 3.0], [3.5834466806438803, 4.0], [3.8961033730992645, 4.0], 
[3.8961033730992645, 5.0], [5.620597519400867, 5.0], [5.620597519400867, 6.0], [5.724751483373458, 6.0], [5.724751483373458, 7.0], [5.808782685799034, 7.0], [5.808782685799034, 8.0], [5.97153978397869, 8.0], [5.97153978397869, 9.0], [8.890785405075356, 9.0], [8.890785405075356, 10.0], [12.0541773076252, 10.0], [12.0541773076252, 11.0], [12.108560366533467, 11.0], [12.108560366533467, 12.0], [12.842873808832994, 12.0], [12.842873808832994, 13.0], [15.657330426681062, 13.0], [15.657330426681062, 14.0], [15.996054166457654, 14.0], [15.996054166457654, 15.0], [20.873825062267628, 15.0], [20.873825062267628, 16.0], [23.81746713175772, 16.0], [23.81746713175772, 17.0], [27.622807374916114, 17.0], [27.622807374916114, 18.0], [28.27880121219315, 18.0], [28.27880121219315, 19.0], [31.85770714654026, 19.0], [31.85770714654026, 20.0], [32.93600494115783, 20.0], [32.93600494115783, 21.0], [35.32034892280397, 21.0], [35.32034892280397, 22.0], [35.78855252700181, 22.0], [35.78855252700181, 23.0], [36.94701158325401, 23.0], [36.94701158325401, 24.0], [37.01769689882119, 24.0], [37.01769689882119, 25.0], [37.02121093843795, 25.0], [37.02121093843795, 26.0], [37.02121093843795, 26.0], [40.0, 26.0]], "data02": [[20.873825062267628, 0.4030171294761066], [12.108560366533467, 0.22738502786384973], [12.842873808832994, 0.4118494283627656], [32.93600494115783, 0.6382523758414013], [3.5834466806438803, 0.5990599362916212], [5.724751483373458, 0.5157221242988332], [3.8961033730992645, 0.25811442056748835], [15.996054166457654, 0.06909933131627022], [2.862900439748115, 0.584970234780849], [37.01769689882119, 0.40200532940451483], [1.042884762249856, 0.14386112303336585], [15.657330426681062, 0.5518721816399371], [2.508217998455069, 0.5452332412064024], [35.78855252700181, 0.44564143016623525], [27.622807374916114, 0.42407250510415717], [35.32034892280397, 0.0810107235317542], [31.85770714654026, 0.4255356763149597], [5.97153978397869, 0.14365832036453197], [28.27880121219315, 
0.3579038076563412], [23.81746713175772, 0.05169913933821235], [8.890785405075356, 0.631032737022811], [37.02121093843795, 0.21472710786442462], [5.620597519400867, 0.5398613783911531], [5.808782685799034, 0.5613868764037775], [12.0541773076252, 0.003291303813164781], [36.94701158325401, 0.5850159235692868]], "data03": [[8.75855465488424, 0.7545165483929938], [5.884039621625585, 0.7224708127832236], [13.612250993832653, 0.8073563014484035], [14.581638265699336, 0.9756929978672928], [24.809394880531475, 0.9136537482524745], [5.932809093025075, 0.7847700309365135], [5.653249874122497, 0.9417487254455423], [27.482616276913966, 0.9197689785166068]], "data01": [[0.0, 0.65, 0.0], [40.0, 0.65, 26.0]]}, "id": "el559774415628368"});
}(mpld3);
}else if(typeof define === "function" && define.amd){
// require.js is available: use it to load d3/mpld3
require.config({paths: {d3: "https://mpld3.github.io/js/d3.v3.min"}});
require(["d3"], function(d3){
window.d3 = d3;
mpld3_load_lib("https://mpld3.github.io/js/mpld3.v0.2.js", function(){
mpld3.draw_figure("fig_el5597744156283687044080157", {"axes": [{"xlim": [0.0, 40.0], "yscale": "linear", "axesbg": "#FFFFFF", "texts": [{"v_baseline": "hanging", "h_anchor": "middle", "color": "#000000", "text": "time", "coordinates": "axes", "zorder": 3, "alpha": 1, "fontsize": 12.0, "position": [0.5, -0.13177083333333339], "rotation": -0.0, "id": "el559774415805520"}, {"v_baseline": "auto", "h_anchor": "middle", "color": "#000000", "text": "thinning dimension", "coordinates": "axes", "zorder": 3, "alpha": 1, "fontsize": 12.0, "position": [-0.059538810483870969, 0.5], "rotation": -90.0, "id": "el559774467029968"}], "zoomable": true, "images": [], "xdomain": [0.0, 40.0], "ylim": [0.0, 1.0], "paths": [], "sharey": [], "sharex": [], "axesbgalpha": null, "axes": [{"scale": "linear", "tickformat": null, "grid": {"gridOn": false}, "fontsize": 12.0, "position": "bottom", "nticks": 9, "tickvalues": null}, {"scale": "linear", "tickformat": null, "grid": {"gridOn": false}, "fontsize": 12.0, "position": "left", "nticks": 6, "tickvalues": null}], "lines": [{"color": "#7F7F7F", "yindex": 1, "coordinates": "data", "dasharray": "2,2", "zorder": 2, "alpha": 0.5, "xindex": 0, "linewidth": 2, "data": "data01", "id": "el559774467432656"}], "markers": [], "id": "el559774415628176", "ydomain": [0.0, 1.0], "collections": [{"paths": [[[[0.0, -0.5], [0.13260155, -0.5], [0.25978993539242673, -0.44731684579412084], [0.3535533905932738, -0.3535533905932738], [0.44731684579412084, -0.25978993539242673], [0.5, -0.13260155], [0.5, 0.0], [0.5, 0.13260155], [0.44731684579412084, 0.25978993539242673], [0.3535533905932738, 0.3535533905932738], [0.25978993539242673, 0.44731684579412084], [0.13260155, 0.5], [0.0, 0.5], [-0.13260155, 0.5], [-0.25978993539242673, 0.44731684579412084], [-0.3535533905932738, 0.3535533905932738], [-0.44731684579412084, 0.25978993539242673], [-0.5, 0.13260155], [-0.5, 0.0], [-0.5, -0.13260155], [-0.44731684579412084, -0.25978993539242673], [-0.3535533905932738, 
-0.3535533905932738], [-0.25978993539242673, -0.44731684579412084], [-0.13260155, -0.5], [0.0, -0.5]], ["M", "C", "C", "C", "C", "C", "C", "C", "C", "Z"]]], "edgecolors": ["#000000"], "edgewidths": [1.0], "offsets": "data02", "yindex": 1, "id": "el559774467431376", "pathtransforms": [[7.027283689263066, 0.0, 0.0, 7.027283689263066, 0.0, 0.0]], "pathcoordinates": "display", "offsetcoordinates": "data", "zorder": 1, "xindex": 0, "alphas": [0.5], "facecolors": ["#1199EE"]}, {"paths": [[[[0.0, -0.5], [0.13260155, -0.5], [0.25978993539242673, -0.44731684579412084], [0.3535533905932738, -0.3535533905932738], [0.44731684579412084, -0.25978993539242673], [0.5, -0.13260155], [0.5, 0.0], [0.5, 0.13260155], [0.44731684579412084, 0.25978993539242673], [0.3535533905932738, 0.3535533905932738], [0.25978993539242673, 0.44731684579412084], [0.13260155, 0.5], [0.0, 0.5], [-0.13260155, 0.5], [-0.25978993539242673, 0.44731684579412084], [-0.3535533905932738, 0.3535533905932738], [-0.44731684579412084, 0.25978993539242673], [-0.5, 0.13260155], [-0.5, 0.0], [-0.5, -0.13260155], [-0.44731684579412084, -0.25978993539242673], [-0.3535533905932738, -0.3535533905932738], [-0.25978993539242673, -0.44731684579412084], [-0.13260155, -0.5], [0.0, -0.5]], ["M", "C", "C", "C", "C", "C", "C", "C", "C", "Z"]]], "edgecolors": ["#000000"], "edgewidths": [1.0], "offsets": "data03", "yindex": 1, "id": "el559774467433360", "pathtransforms": [[7.027283689263066, 0.0, 0.0, 7.027283689263066, 0.0, 0.0]], "pathcoordinates": "display", "offsetcoordinates": "data", "zorder": 1, "xindex": 0, "alphas": [0.25], "facecolors": ["#7F7F7F"]}], "xscale": "linear", "bbox": [0.125, 0.53636363636363638, 0.77500000000000002, 0.36363636363636365]}, {"xlim": [0.0, 40.0], "yscale": "linear", "axesbg": "#FFFFFF", "texts": [{"v_baseline": "hanging", "h_anchor": "middle", "color": "#000000", "text": "time", "coordinates": "axes", "zorder": 3, "alpha": 1, "fontsize": 12.0, "position": [0.5, -0.13177083333333334], "rotation": 
-0.0, "id": "el559774467608528"}, {"v_baseline": "auto", "h_anchor": "middle", "color": "#000000", "text": "number of jumps", "coordinates": "axes", "zorder": 3, "alpha": 1, "fontsize": 12.0, "position": [-0.051411290322580641, 0.5], "rotation": -90.0, "id": "el559774467706512"}], "zoomable": true, "images": [], "xdomain": [0.0, 40.0], "ylim": [0.0, 60.0], "paths": [], "sharey": [], "sharex": [], "axesbgalpha": null, "axes": [{"scale": "linear", "tickformat": null, "grid": {"gridOn": false}, "fontsize": 12.0, "position": "bottom", "nticks": 9, "tickvalues": null}, {"scale": "linear", "tickformat": null, "grid": {"gridOn": false}, "fontsize": 12.0, "position": "left", "nticks": 7, "tickvalues": null}], "lines": [{"color": "#7F7F7F", "yindex": 1, "coordinates": "data", "dasharray": "10,0", "zorder": 2, "alpha": 0.3, "xindex": 0, "linewidth": 1.5, "data": "data04", "id": "el559774468088784"}, {"color": "#1199EE", "yindex": 1, "coordinates": "data", "dasharray": "10,0", "zorder": 2, "alpha": 0.6, "xindex": 0, "linewidth": 1.5, "data": "data05", "id": "el559774468146832"}, {"color": "#7F7F7F", "yindex": 0, "coordinates": "data", "dasharray": "6,6", "zorder": 2, "alpha": 0.5, "xindex": 0, "linewidth": 1.0, "data": "data01", "id": "el559774468148560"}, {"color": "#1199EE", "yindex": 2, "coordinates": "data", "dasharray": "6,6", "zorder": 2, "alpha": 0.5, "xindex": 0, "linewidth": 1.0, "data": "data01", "id": "el559774468150160"}], "markers": [], "id": "el559774467525840", "ydomain": [0.0, 60.0], "collections": [], "xscale": "linear", "bbox": [0.125, 0.099999999999999978, 0.77500000000000002, 0.36363636363636365]}], "height": 480.0, "width": 640.0, "plugins": [{"type": "reset"}, {"enabled": false, "button": true, "type": "zoom"}, {"enabled": false, "button": true, "type": "boxzoom"}], "data": {"data04": [[0.0, 0.0], [1.042884762249856, 0.0], [1.042884762249856, 1.0], [2.508217998455069, 1.0], [2.508217998455069, 2.0], [2.862900439748115, 2.0], [2.862900439748115, 3.0], 
[3.5834466806438803, 3.0], [3.5834466806438803, 4.0], [3.8961033730992645, 4.0], [3.8961033730992645, 5.0], [5.620597519400867, 5.0], [5.620597519400867, 6.0], [5.653249874122497, 6.0], [5.653249874122497, 7.0], [5.724751483373458, 7.0], [5.724751483373458, 8.0], [5.808782685799034, 8.0], [5.808782685799034, 9.0], [5.884039621625585, 9.0], [5.884039621625585, 10.0], [5.932809093025075, 10.0], [5.932809093025075, 11.0], [5.97153978397869, 11.0], [5.97153978397869, 12.0], [8.75855465488424, 12.0], [8.75855465488424, 13.0], [8.890785405075356, 13.0], [8.890785405075356, 14.0], [12.0541773076252, 14.0], [12.0541773076252, 15.0], [12.108560366533467, 15.0], [12.108560366533467, 16.0], [12.842873808832994, 16.0], [12.842873808832994, 17.0], [13.612250993832653, 17.0], [13.612250993832653, 18.0], [14.581638265699336, 18.0], [14.581638265699336, 19.0], [15.657330426681062, 19.0], [15.657330426681062, 20.0], [15.996054166457654, 20.0], [15.996054166457654, 21.0], [20.873825062267628, 21.0], [20.873825062267628, 22.0], [23.81746713175772, 22.0], [23.81746713175772, 23.0], [24.809394880531475, 23.0], [24.809394880531475, 24.0], [27.482616276913966, 24.0], [27.482616276913966, 25.0], [27.622807374916114, 25.0], [27.622807374916114, 26.0], [28.27880121219315, 26.0], [28.27880121219315, 27.0], [31.85770714654026, 27.0], [31.85770714654026, 28.0], [32.93600494115783, 28.0], [32.93600494115783, 29.0], [35.32034892280397, 29.0], [35.32034892280397, 30.0], [35.78855252700181, 30.0], [35.78855252700181, 31.0], [36.94701158325401, 31.0], [36.94701158325401, 32.0], [37.01769689882119, 32.0], [37.01769689882119, 33.0], [37.02121093843795, 33.0], [37.02121093843795, 34.0], [37.02121093843795, 34.0], [40.0, 34.0]], "data05": [[0.0, 0.0], [1.042884762249856, 0.0], [1.042884762249856, 1.0], [2.508217998455069, 1.0], [2.508217998455069, 2.0], [2.862900439748115, 2.0], [2.862900439748115, 3.0], [3.5834466806438803, 3.0], [3.5834466806438803, 4.0], [3.8961033730992645, 4.0], 
[3.8961033730992645, 5.0], [5.620597519400867, 5.0], [5.620597519400867, 6.0], [5.724751483373458, 6.0], [5.724751483373458, 7.0], [5.808782685799034, 7.0], [5.808782685799034, 8.0], [5.97153978397869, 8.0], [5.97153978397869, 9.0], [8.890785405075356, 9.0], [8.890785405075356, 10.0], [12.0541773076252, 10.0], [12.0541773076252, 11.0], [12.108560366533467, 11.0], [12.108560366533467, 12.0], [12.842873808832994, 12.0], [12.842873808832994, 13.0], [15.657330426681062, 13.0], [15.657330426681062, 14.0], [15.996054166457654, 14.0], [15.996054166457654, 15.0], [20.873825062267628, 15.0], [20.873825062267628, 16.0], [23.81746713175772, 16.0], [23.81746713175772, 17.0], [27.622807374916114, 17.0], [27.622807374916114, 18.0], [28.27880121219315, 18.0], [28.27880121219315, 19.0], [31.85770714654026, 19.0], [31.85770714654026, 20.0], [32.93600494115783, 20.0], [32.93600494115783, 21.0], [35.32034892280397, 21.0], [35.32034892280397, 22.0], [35.78855252700181, 22.0], [35.78855252700181, 23.0], [36.94701158325401, 23.0], [36.94701158325401, 24.0], [37.01769689882119, 24.0], [37.01769689882119, 25.0], [37.02121093843795, 25.0], [37.02121093843795, 26.0], [37.02121093843795, 26.0], [40.0, 26.0]], "data02": [[20.873825062267628, 0.4030171294761066], [12.108560366533467, 0.22738502786384973], [12.842873808832994, 0.4118494283627656], [32.93600494115783, 0.6382523758414013], [3.5834466806438803, 0.5990599362916212], [5.724751483373458, 0.5157221242988332], [3.8961033730992645, 0.25811442056748835], [15.996054166457654, 0.06909933131627022], [2.862900439748115, 0.584970234780849], [37.01769689882119, 0.40200532940451483], [1.042884762249856, 0.14386112303336585], [15.657330426681062, 0.5518721816399371], [2.508217998455069, 0.5452332412064024], [35.78855252700181, 0.44564143016623525], [27.622807374916114, 0.42407250510415717], [35.32034892280397, 0.0810107235317542], [31.85770714654026, 0.4255356763149597], [5.97153978397869, 0.14365832036453197], [28.27880121219315, 
0.3579038076563412], [23.81746713175772, 0.05169913933821235], [8.890785405075356, 0.631032737022811], [37.02121093843795, 0.21472710786442462], [5.620597519400867, 0.5398613783911531], [5.808782685799034, 0.5613868764037775], [12.0541773076252, 0.003291303813164781], [36.94701158325401, 0.5850159235692868]], "data03": [[8.75855465488424, 0.7545165483929938], [5.884039621625585, 0.7224708127832236], [13.612250993832653, 0.8073563014484035], [14.581638265699336, 0.9756929978672928], [24.809394880531475, 0.9136537482524745], [5.932809093025075, 0.7847700309365135], [5.653249874122497, 0.9417487254455423], [27.482616276913966, 0.9197689785166068]], "data01": [[0.0, 0.65, 0.0], [40.0, 0.65, 26.0]]}, "id": "el559774415628368"});
});
});
}else{
// require.js not available: dynamically load d3 & mpld3
mpld3_load_lib("https://mpld3.github.io/js/d3.v3.min.js", function(){
mpld3_load_lib("https://mpld3.github.io/js/mpld3.v0.2.js", function(){
mpld3.draw_figure("fig_el5597744156283687044080157", {"axes": [{"xlim": [0.0, 40.0], "yscale": "linear", "axesbg": "#FFFFFF", "texts": [{"v_baseline": "hanging", "h_anchor": "middle", "color": "#000000", "text": "time", "coordinates": "axes", "zorder": 3, "alpha": 1, "fontsize": 12.0, "position": [0.5, -0.13177083333333339], "rotation": -0.0, "id": "el559774415805520"}, {"v_baseline": "auto", "h_anchor": "middle", "color": "#000000", "text": "thinning dimension", "coordinates": "axes", "zorder": 3, "alpha": 1, "fontsize": 12.0, "position": [-0.059538810483870969, 0.5], "rotation": -90.0, "id": "el559774467029968"}], "zoomable": true, "images": [], "xdomain": [0.0, 40.0], "ylim": [0.0, 1.0], "paths": [], "sharey": [], "sharex": [], "axesbgalpha": null, "axes": [{"scale": "linear", "tickformat": null, "grid": {"gridOn": false}, "fontsize": 12.0, "position": "bottom", "nticks": 9, "tickvalues": null}, {"scale": "linear", "tickformat": null, "grid": {"gridOn": false}, "fontsize": 12.0, "position": "left", "nticks": 6, "tickvalues": null}], "lines": [{"color": "#7F7F7F", "yindex": 1, "coordinates": "data", "dasharray": "2,2", "zorder": 2, "alpha": 0.5, "xindex": 0, "linewidth": 2, "data": "data01", "id": "el559774467432656"}], "markers": [], "id": "el559774415628176", "ydomain": [0.0, 1.0], "collections": [{"paths": [[[[0.0, -0.5], [0.13260155, -0.5], [0.25978993539242673, -0.44731684579412084], [0.3535533905932738, -0.3535533905932738], [0.44731684579412084, -0.25978993539242673], [0.5, -0.13260155], [0.5, 0.0], [0.5, 0.13260155], [0.44731684579412084, 0.25978993539242673], [0.3535533905932738, 0.3535533905932738], [0.25978993539242673, 0.44731684579412084], [0.13260155, 0.5], [0.0, 0.5], [-0.13260155, 0.5], [-0.25978993539242673, 0.44731684579412084], [-0.3535533905932738, 0.3535533905932738], [-0.44731684579412084, 0.25978993539242673], [-0.5, 0.13260155], [-0.5, 0.0], [-0.5, -0.13260155], [-0.44731684579412084, -0.25978993539242673], [-0.3535533905932738, 
-0.3535533905932738], [-0.25978993539242673, -0.44731684579412084], [-0.13260155, -0.5], [0.0, -0.5]], ["M", "C", "C", "C", "C", "C", "C", "C", "C", "Z"]]], "edgecolors": ["#000000"], "edgewidths": [1.0], "offsets": "data02", "yindex": 1, "id": "el559774467431376", "pathtransforms": [[7.027283689263066, 0.0, 0.0, 7.027283689263066, 0.0, 0.0]], "pathcoordinates": "display", "offsetcoordinates": "data", "zorder": 1, "xindex": 0, "alphas": [0.5], "facecolors": ["#1199EE"]}, {"paths": [[[[0.0, -0.5], [0.13260155, -0.5], [0.25978993539242673, -0.44731684579412084], [0.3535533905932738, -0.3535533905932738], [0.44731684579412084, -0.25978993539242673], [0.5, -0.13260155], [0.5, 0.0], [0.5, 0.13260155], [0.44731684579412084, 0.25978993539242673], [0.3535533905932738, 0.3535533905932738], [0.25978993539242673, 0.44731684579412084], [0.13260155, 0.5], [0.0, 0.5], [-0.13260155, 0.5], [-0.25978993539242673, 0.44731684579412084], [-0.3535533905932738, 0.3535533905932738], [-0.44731684579412084, 0.25978993539242673], [-0.5, 0.13260155], [-0.5, 0.0], [-0.5, -0.13260155], [-0.44731684579412084, -0.25978993539242673], [-0.3535533905932738, -0.3535533905932738], [-0.25978993539242673, -0.44731684579412084], [-0.13260155, -0.5], [0.0, -0.5]], ["M", "C", "C", "C", "C", "C", "C", "C", "C", "Z"]]], "edgecolors": ["#000000"], "edgewidths": [1.0], "offsets": "data03", "yindex": 1, "id": "el559774467433360", "pathtransforms": [[7.027283689263066, 0.0, 0.0, 7.027283689263066, 0.0, 0.0]], "pathcoordinates": "display", "offsetcoordinates": "data", "zorder": 1, "xindex": 0, "alphas": [0.25], "facecolors": ["#7F7F7F"]}], "xscale": "linear", "bbox": [0.125, 0.53636363636363638, 0.77500000000000002, 0.36363636363636365]}, {"xlim": [0.0, 40.0], "yscale": "linear", "axesbg": "#FFFFFF", "texts": [{"v_baseline": "hanging", "h_anchor": "middle", "color": "#000000", "text": "time", "coordinates": "axes", "zorder": 3, "alpha": 1, "fontsize": 12.0, "position": [0.5, -0.13177083333333334], "rotation": 
-0.0, "id": "el559774467608528"}, {"v_baseline": "auto", "h_anchor": "middle", "color": "#000000", "text": "number of jumps", "coordinates": "axes", "zorder": 3, "alpha": 1, "fontsize": 12.0, "position": [-0.051411290322580641, 0.5], "rotation": -90.0, "id": "el559774467706512"}], "zoomable": true, "images": [], "xdomain": [0.0, 40.0], "ylim": [0.0, 60.0], "paths": [], "sharey": [], "sharex": [], "axesbgalpha": null, "axes": [{"scale": "linear", "tickformat": null, "grid": {"gridOn": false}, "fontsize": 12.0, "position": "bottom", "nticks": 9, "tickvalues": null}, {"scale": "linear", "tickformat": null, "grid": {"gridOn": false}, "fontsize": 12.0, "position": "left", "nticks": 7, "tickvalues": null}], "lines": [{"color": "#7F7F7F", "yindex": 1, "coordinates": "data", "dasharray": "10,0", "zorder": 2, "alpha": 0.3, "xindex": 0, "linewidth": 1.5, "data": "data04", "id": "el559774468088784"}, {"color": "#1199EE", "yindex": 1, "coordinates": "data", "dasharray": "10,0", "zorder": 2, "alpha": 0.6, "xindex": 0, "linewidth": 1.5, "data": "data05", "id": "el559774468146832"}, {"color": "#7F7F7F", "yindex": 0, "coordinates": "data", "dasharray": "6,6", "zorder": 2, "alpha": 0.5, "xindex": 0, "linewidth": 1.0, "data": "data01", "id": "el559774468148560"}, {"color": "#1199EE", "yindex": 2, "coordinates": "data", "dasharray": "6,6", "zorder": 2, "alpha": 0.5, "xindex": 0, "linewidth": 1.0, "data": "data01", "id": "el559774468150160"}], "markers": [], "id": "el559774467525840", "ydomain": [0.0, 60.0], "collections": [], "xscale": "linear", "bbox": [0.125, 0.099999999999999978, 0.77500000000000002, 0.36363636363636365]}], "height": 480.0, "width": 640.0, "plugins": [{"type": "reset"}, {"enabled": false, "button": true, "type": "zoom"}, {"enabled": false, "button": true, "type": "boxzoom"}], "data": {"data04": [[0.0, 0.0], [1.042884762249856, 0.0], [1.042884762249856, 1.0], [2.508217998455069, 1.0], [2.508217998455069, 2.0], [2.862900439748115, 2.0], [2.862900439748115, 3.0], 
[3.5834466806438803, 3.0], [3.5834466806438803, 4.0], [3.8961033730992645, 4.0], [3.8961033730992645, 5.0], [5.620597519400867, 5.0], [5.620597519400867, 6.0], [5.653249874122497, 6.0], [5.653249874122497, 7.0], [5.724751483373458, 7.0], [5.724751483373458, 8.0], [5.808782685799034, 8.0], [5.808782685799034, 9.0], [5.884039621625585, 9.0], [5.884039621625585, 10.0], [5.932809093025075, 10.0], [5.932809093025075, 11.0], [5.97153978397869, 11.0], [5.97153978397869, 12.0], [8.75855465488424, 12.0], [8.75855465488424, 13.0], [8.890785405075356, 13.0], [8.890785405075356, 14.0], [12.0541773076252, 14.0], [12.0541773076252, 15.0], [12.108560366533467, 15.0], [12.108560366533467, 16.0], [12.842873808832994, 16.0], [12.842873808832994, 17.0], [13.612250993832653, 17.0], [13.612250993832653, 18.0], [14.581638265699336, 18.0], [14.581638265699336, 19.0], [15.657330426681062, 19.0], [15.657330426681062, 20.0], [15.996054166457654, 20.0], [15.996054166457654, 21.0], [20.873825062267628, 21.0], [20.873825062267628, 22.0], [23.81746713175772, 22.0], [23.81746713175772, 23.0], [24.809394880531475, 23.0], [24.809394880531475, 24.0], [27.482616276913966, 24.0], [27.482616276913966, 25.0], [27.622807374916114, 25.0], [27.622807374916114, 26.0], [28.27880121219315, 26.0], [28.27880121219315, 27.0], [31.85770714654026, 27.0], [31.85770714654026, 28.0], [32.93600494115783, 28.0], [32.93600494115783, 29.0], [35.32034892280397, 29.0], [35.32034892280397, 30.0], [35.78855252700181, 30.0], [35.78855252700181, 31.0], [36.94701158325401, 31.0], [36.94701158325401, 32.0], [37.01769689882119, 32.0], [37.01769689882119, 33.0], [37.02121093843795, 33.0], [37.02121093843795, 34.0], [37.02121093843795, 34.0], [40.0, 34.0]], "data05": [[0.0, 0.0], [1.042884762249856, 0.0], [1.042884762249856, 1.0], [2.508217998455069, 1.0], [2.508217998455069, 2.0], [2.862900439748115, 2.0], [2.862900439748115, 3.0], [3.5834466806438803, 3.0], [3.5834466806438803, 4.0], [3.8961033730992645, 4.0], 
[3.8961033730992645, 5.0], [5.620597519400867, 5.0], [5.620597519400867, 6.0], [5.724751483373458, 6.0], [5.724751483373458, 7.0], [5.808782685799034, 7.0], [5.808782685799034, 8.0], [5.97153978397869, 8.0], [5.97153978397869, 9.0], [8.890785405075356, 9.0], [8.890785405075356, 10.0], [12.0541773076252, 10.0], [12.0541773076252, 11.0], [12.108560366533467, 11.0], [12.108560366533467, 12.0], [12.842873808832994, 12.0], [12.842873808832994, 13.0], [15.657330426681062, 13.0], [15.657330426681062, 14.0], [15.996054166457654, 14.0], [15.996054166457654, 15.0], [20.873825062267628, 15.0], [20.873825062267628, 16.0], [23.81746713175772, 16.0], [23.81746713175772, 17.0], [27.622807374916114, 17.0], [27.622807374916114, 18.0], [28.27880121219315, 18.0], [28.27880121219315, 19.0], [31.85770714654026, 19.0], [31.85770714654026, 20.0], [32.93600494115783, 20.0], [32.93600494115783, 21.0], [35.32034892280397, 21.0], [35.32034892280397, 22.0], [35.78855252700181, 22.0], [35.78855252700181, 23.0], [36.94701158325401, 23.0], [36.94701158325401, 24.0], [37.01769689882119, 24.0], [37.01769689882119, 25.0], [37.02121093843795, 25.0], [37.02121093843795, 26.0], [37.02121093843795, 26.0], [40.0, 26.0]], "data02": [[20.873825062267628, 0.4030171294761066], [12.108560366533467, 0.22738502786384973], [12.842873808832994, 0.4118494283627656], [32.93600494115783, 0.6382523758414013], [3.5834466806438803, 0.5990599362916212], [5.724751483373458, 0.5157221242988332], [3.8961033730992645, 0.25811442056748835], [15.996054166457654, 0.06909933131627022], [2.862900439748115, 0.584970234780849], [37.01769689882119, 0.40200532940451483], [1.042884762249856, 0.14386112303336585], [15.657330426681062, 0.5518721816399371], [2.508217998455069, 0.5452332412064024], [35.78855252700181, 0.44564143016623525], [27.622807374916114, 0.42407250510415717], [35.32034892280397, 0.0810107235317542], [31.85770714654026, 0.4255356763149597], [5.97153978397869, 0.14365832036453197], [28.27880121219315, 
0.3579038076563412], [23.81746713175772, 0.05169913933821235], [8.890785405075356, 0.631032737022811], [37.02121093843795, 0.21472710786442462], [5.620597519400867, 0.5398613783911531], [5.808782685799034, 0.5613868764037775], [12.0541773076252, 0.003291303813164781], [36.94701158325401, 0.5850159235692868]], "data03": [[8.75855465488424, 0.7545165483929938], [5.884039621625585, 0.7224708127832236], [13.612250993832653, 0.8073563014484035], [14.581638265699336, 0.9756929978672928], [24.809394880531475, 0.9136537482524745], [5.932809093025075, 0.7847700309365135], [5.653249874122497, 0.9417487254455423], [27.482616276913966, 0.9197689785166068]], "data01": [[0.0, 0.65, 0.0], [40.0, 0.65, 26.0]]}, "id": "el559774415628368"});
})
});
}
</script>
<p>For reference, the plot was produced in Python with matplotlib, and then ported to the web with <a href="http://mpld3.github.io">mpld3</a>. To zoom in and pan, use the icons in the lower left of the figure. The code can be found in my <a href="https://github.com/mikss/poissthin">GitHub repo</a>.</p>
<p><a href="http://ssk.im/blog/poisson/">Thinning PRMs</a> was originally published by Steven Soojin Kim at <a href="http://ssk.im">Steven Soojin Kim</a> on 2016.03.06.</p><![CDATA[Control theory today]]>http://ssk.im/blog/control2015-10-10T00:00:00-04:002015-10-10T00:00:00-04:00Steven Soojin Kimhttp://ssk.imsteven@ssk.im
<p>Optimal control theory is a rich mathematical field with a surprisingly interesting history. It dates back to the <a href="https://en.wikipedia.org/wiki/Brachistochrone_curve#History">brachistochrone problem</a> of Johann Bernoulli in 1696, but it genuinely boomed during the <a href="http://www.emis.ams.org/journals/DMJDMV/vol-ismp/48_pesch-hans-josef-cold-war.pdf">Cold War</a> through independent developments in the Soviet Union (Steklov Institute) and the United States (RAND Corporation). But at some point, I came across <a href="http://blog.sciencenet.cn/home.php?mod=space&uid=1565&do=blog&id=329153">Yu-Chi Ho’s blog post</a>, from 2010, where he reports the bold pronouncement of an NSF program director: <strong>“Control is dead!”</strong></p>
<p>Professor Ho explains that perhaps “mature” is a better word, but even this might be seen as a strong claim. As a graduate student, I am in no place to judge the “life” or “death” of such a broad field, but my bias towards my home department compels me to promote and celebrate control theory, which played an important role in the growth of the <a href="http://www.brown.edu/academics/applied-mathematics/origin">Division of Applied Mathematics</a> and the development of the associated Lefschetz Center for Dynamical Systems. That is, I would like to believe that this significant part of Brown’s history still plays a serious role in the mathematical community today.</p>
<p>The goal of this post is to point out some modern appearances of control theory, particularly in seemingly unexpected areas (at least, unexpected to this amateur author). Of course, control theory has long played a leading role in applied probability: e.g., in <a href="http://robertcmerton.com/continuous-time-finance-8.html">financial engineering</a>, <a href="http://www.meyn.ece.ufl.edu/archive/spm_files/CTCN/CTCN.html">queueing networks</a>, and <a href="https://books.google.com/books?hl=en&lr=&id=LbxTJHEO4agC&oi=fnd&pg=PP1&dq=Stochastic+control+of+partially+observable+systems&ots=f8KavJm6x2&sig=GruhewUjpwAVdPD_IWkLq4ycxaY#v=onepage&q&f=false">filtering</a>. But these are somewhat well-recognized as the usual stomping grounds of control theory, and I would like to highlight some other (possibly more surprising) connections.</p>
<p>In particular, the first three applications described below invoke the following variational formula from <a href="https://projecteuclid.org/euclid.aop/1022855876" title="A variational representation for certain functionals of Brownian motion">Boué, Dupuis (AoP’98)</a>. For $W$ a standard $d$-dimensional Brownian motion on $[0,1]$, and $f: C([0,1];\mathbb{R}^d) \rightarrow \mathbb{R}$ measurable and bounded from above, we have</p>
<script type="math/tex; mode=display"> -\log \mathbb{E} e^{-f(W)} = \inf_{u} \,\, \mathbb{E}\left[ \frac{1}{2}\int_0^1 |u_s|^2 ds + f\left(W + \int_0^\cdot u_s\,ds\right) \right], \tag{$\star$} </script>
<p>where the infimum is over the space of <em>controls</em> $u$ which are progressively measurable with respect to the augmented Brownian filtration. One should view the first term in the infimum as a “running cost” for the effort exerted by the control $u$, and the second term as a “state occupation cost”. In particular, if $f$ only depends on the time 1 state of the controlled input process, then it can be interpreted as the usual “terminal cost”. Under the preceding interpretations, $-\log\mathbb{E} e^{-f(W)}$ is a representation for the value function of the associated stochastic control problem. In fact, formulas like $(\star)$ arose even earlier in the control literature; e.g., in <a href="http://link.springer.com/article/10.1007%2FBF01442148" title="Exit probabilities and optimal stochastic control">Fleming (AMO’77)</a>.</p>
<p>As promised, here are a few “modern” links to control theory:</p>
<ol>
<li>
<p><strong>Functional inequalities:</strong> <a href="https://projecteuclid.org/euclid.aihp/1372772648" title="Representation formula for the entropy and functional inequalities">Lehec (AIHP’13)</a> derives what is essentially the dual formulation of $(\star)$: for $\gamma$ the Wiener measure on $C([0,1];\mathbb{R}^d)$,
<script type="math/tex"> \begin{align} H( \mu \| \gamma ) = \min_{u} \mathbb{E}\left[\frac{1}{2}\int_0^1 |u_s|^2 ds\right], \end{align}</script>
where the minimum is over all controls $u$ such that the process $W + \int_0^\cdot u_s ds$ has law $\mu$. That is, the control $u$ is related to the <em>optimal</em> change of measure from $\gamma$ to $\mu$. An analysis of an optimizing $u$, combined with basic martingale arguments, yields straightforward proofs of Talagrand’s transportation cost inequality, the log Sobolev inequality, and the Brascamp-Lieb inequality (for the Wiener measure). Similar control-like principles (for the standard Gaussian measure on $\mathbb{R}^d$ instead of the Wiener measure on path space) are employed in <a href="http://arxiv.org/pdf/1410.3887v2.pdf" title="Regularization under diffusion and anti-concentration of temperature">Eldan, Lee (preprint’14)</a> to establish uniform decay of the level sets of the Gaussian measure under the Ornstein-Uhlenbeck semigroup.</p>
</li>
<li>
<p><strong>Spin glasses:</strong> In seminal work by <a href="http://annals.math.princeton.edu/2006/163-1/p04" title="The Parisi formula">Talagrand (AoM’06)</a> and <a href="http://projecteuclid.org/euclid.aop/1395838120" title="The Parisi formula for mixed p-spin models">Panchenko (AoP’14)</a>, it was established that the thermodynamic limit of the free energy of the Sherrington-Kirkpatrick model (and associated mixed $p$-spin model) is given by a minimization problem involving the “Parisi functional”, the solution to a particular nonlinear PDE. Inspired by the variational formula $(\star)$, it is shown in <a href="http://link.springer.com/article/10.1007%2Fs00220-014-2254-z" title="The Parisi formula has a unique minimizer">Auffinger, Chen (CMP’14)</a> that the Parisi functional is strictly convex, and thus a unique “Parisi measure” characterizes the limiting free energy of the SK model. The proof of strict convexity is simplified in <a href="http://arxiv.org/abs/1502.04398" title="A Dynamic Programming Approach to the Parisi Functional">Jagannath, Tobasco (PAMS’15)</a>, by explicitly appealing to the dynamic programming principle from stochastic control theory. This theme of a control-theoretic approach to analysis of the Parisi functional is continued in <a href="http://arxiv.org/abs/1501.06635" title="Variational representations for the Parisi functional and the two-dimensional Guerra-Talagrand bound">Chen (preprint’15)</a>.</p>
</li>
<li>
<p><strong>KPZ and rough paths:</strong> In Section 7 of <a href="http://arxiv.org/pdf/1508.03877v1.pdf" title="KPZ reloaded">Gubinelli, Perkowski (preprint’15)</a>, the authors use a generalized version of $(\star)$ to frame the KPZ equation as the value function of a stochastic control problem. This representation is in turn used to prove certain a priori estimates which yield global existence of solutions to the KPZ equation, complementing Hairer’s approach via regularity structures.</p>
</li>
<li>
<p><strong>First-passage percolation:</strong> <a href="http://arxiv.org/pdf/1311.0316v2.pdf" title="Variational formula for the time-constant of first-passage percolation">Krishnan (preprint’14)</a> views the first-passage time on $\mathbb{Z}^d$ as a discrete control problem, where the canonical basis vectors $\{\pm e_1, \dots, \pm e_d\}$ act as the “controls” of a minimizing path between two points. This in turn yields a characterization of the associated time constant as the solution to a discrete Hamilton-Jacobi equation.</p>
</li>
</ol>
<p>I suppose the overarching theme is that many mathematical problems are just (highly sophisticated) optimization problems which, with some work, can be massaged into control problems. In particular, the preceding examples show that adopting a control-theoretic perspective can lead to insightful, meaningful, and productive reformulations of existing problems!</p>
<p><a href="http://ssk.im/blog/control/">Control theory today</a> was originally published by Steven Soojin Kim at <a href="http://ssk.im">Steven Soojin Kim</a> on 2015.10.10.</p><![CDATA[Applied Math Retreat]]>http://ssk.im/blog/apma-retreat2015-09-22T00:00:00-04:002015-09-22T00:00:00-04:00Steven Soojin Kimhttp://ssk.imsteven@ssk.im
<p>This past weekend, 20 of the applied mathematics graduate students gathered in Franklin, NH for our first (annual?) department retreat, organized by Michael Snarski and myself. The goals of the retreat were to stimulate scientific interactions, introduce first-years to the mathematical energy of the department, and refresh our minds in a beautiful natural environment.</p>
<p>I personally had a great time! The fresh air lent the whole weekend a very positive atmosphere which seemed conducive to mathematical activity. The retreat was also a great way to catch up with what others are working on, and to get to know the new students in our department.</p>
<p>We spent the mornings doing math, with awesome workshops led by <a href="http://aguang.github.io">August Guang</a> and Michael, and some short research talks given by <a href="http://ivanapetrovic.weebly.com">Ivana Petrovic</a>, Leroy Jia, Melissa McGuirl, Clark Bowman, and <a href="http://www.dam.brown.edu/people/volkening/home.html">Alexandria Volkening</a>. In the afternoons, we enjoyed the surroundings with some relaxing hikes and excursions on the lake.</p>
<p>Special shoutouts to: our department chair <a href="http://www.dam.brown.edu/people/sandsted/">Björn Sandstede</a> for his feedback, support, and encouragement; <a href="http://www.gautamkamath.com">Gautam Kamath</a> from MIT for the initial idea and some preliminary help with logistical aspects of the trip; Guo-Jhen Wu for a lot of behind-the-scenes help and early brainstorming; and all the participants for sharing their mathematical ideas, driving, cooking, cleaning, and basically making this weekend happen.</p>
<p>Here are a few pictures!</p>
<div class="popup-gallery">
<a href="http://ssk.im/images/ret1-house.jpg" title="Our wonderful home for two nights!"><img src="http://ssk.im/images/ret1-house_s.jpg" width="100" height="100" /></a>
<a href="http://ssk.im/images/ret2-island.jpg" title="A ton of fog in the morning..."><img src="http://ssk.im/images/ret2-island_s.jpg" width="100" height="100" /></a>
<a href="http://ssk.im/images/ret3-michael.jpg" title="Thanks to Michael (pictured) and August, who gave two great workshops with lots of back-and-forth discussion."><img src="http://ssk.im/images/ret3-michael_s.jpg" width="100" height="100" /></a>
<a href="http://ssk.im/images/ret4-leroy.jpg" title="We were also fortunate to hear crisp and informative short talks from Ivana, Leroy (pictured), and Melissa, all outdoors!"><img src="http://ssk.im/images/ret4-leroy_s.jpg" width="100" height="100" /></a>
<a href="http://ssk.im/images/ret5-lunch.jpg" title="Math makes you hungry."><img src="http://ssk.im/images/ret5-lunch_s.jpg" width="100" height="100" /></a>
<a href="http://ssk.im/images/ret6-fire.jpg" title="The fire is where all of the deep thinking happens."><img src="http://ssk.im/images/ret6-fire_s.jpg" width="100" height="100" /></a>
<a href="http://ssk.im/images/ret7-clark.jpg" title="We wrapped up with two indoor talks by Clark (pictured) and Al, who used the TV to display nice graphics."><img src="http://ssk.im/images/ret7-clark_s.jpg" width="100" height="100" /></a>
<a href="http://ssk.im/images/ret8-lake.jpg" title="Can you think of a more perfect place to do math?"><img src="http://ssk.im/images/ret8-lake_s.jpg" width="100" height="100" /></a>
</div>
<p><a href="http://ssk.im/blog/apma-retreat/">Applied Math Retreat</a> was originally published by Steven Soojin Kim at <a href="http://ssk.im">Steven Soojin Kim</a> on 2015.09.22.</p><![CDATA[Great Pens]]>http://ssk.im/blog/great-pens2015-06-22T00:00:00-04:002015-06-22T00:00:00-04:00Steven Soojin Kimhttp://ssk.imsteven@ssk.im
<p>A quality pen makes writing a joy. Sure, a cheap 10¢ BiC pen suffices for the mundane task of placing ink onto paper, but a pen which flows freely and naturally makes you <em>want</em> to write. Here are a few things I look for in pens:</p>
<ol>
<li>no dry “chunks” <sup id="fnref:chunks"><a href="#fn:chunks" class="footnote">1</a></sup> </li>
<li>smooth “write-feel” without any “scratching” sensation</li>
<li>no bleeding, blotting, or smudging </li>
<li>color with sharp contrast and no fading</li>
<li>good ergonomic feel</li>
</ol>
<p>Currently, I’m a fan of <a href="http://www.jetpens.com/Uni-ball-Signo-UM-151-Gel-Ink-Pen-0.38-mm-Green-Black/pd/314">uni-ball UM-151 0.38mm</a> and <a href="http://www.amazon.com/uni-ball-Retractable-Pens-Micro-Point/dp/B001P1ZDTG">uni-ball 207 Micro</a>.</p>
<p><img src="/images/uni-pens.jpg" alt="uni-ball pens" /></p>
<p>Neither is particularly ergonomic, since they both have lightweight plastic bodies. But they both have amazing “write-feel”. The 207 is particularly creamy.</p>
<p>To be updated as I find more favorites…</p>
<div class="footnotes">
<ol>
<li id="fn:chunks">
<p>For some pens, a strange dried-ink gunk can build up after repeated use. Maybe ball-point pens are at higher risk for this? <a href="#fnref:chunks" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>
<p><a href="http://ssk.im/blog/great-pens/">Great Pens</a> was originally published by Steven Soojin Kim at <a href="http://ssk.im">Steven Soojin Kim</a> on 2015.06.22.</p><![CDATA[Random Schrodinger Operators]]>http://ssk.im/blog/schrodinger-density2014-11-15T00:00:00-05:002014-11-15T00:00:00-05:00Steven Soojin Kimhttp://ssk.imsteven@ssk.im
<p>Consider the one-dimensional, discrete Schrodinger operator on $\ell^2(\mathbb{Z})$:</p>
<script type="math/tex; mode=display">
(Hu)(n) := u(n+1) + u(n-1) + V(n) u(n) \, , \quad n \in \mathbb{Z}.
</script>
<p>This is a toy model for the behavior of an electron in a large domain. In order to model electron dynamics in the presence of a random disordered environment, we can place assumptions on the randomness of the <em>potential</em> $V$. From the point of view of matrices, $H$ is an infinite Jacobi matrix with random elements along the diagonal. While this model is very simplified, random Schrodinger operators and their spectra turn out to have many interesting mathematical properties, including the prediction/replication of physically observed phenomena – see Chapter 9 of <a href="http://books.google.com/books/about/Schrödinger_Operators.html?id=HR_-P2mxkSkC">Cycon, Froese, Kirsch, Simon</a> for an introduction which is fairly gentle, aside from a few typos. I will focus on two nice results: the <em>integrated density of states</em> and the <em>Thouless formula</em>.</p>
<h2 id="integrated-density-of-states">Integrated Density of States</h2>
<p>Let $(\Omega,\mathcal{F},\mathbb{P})$ be the canonical probability space associated with $V$, and suppose the potential $V$ is <em>stationary</em> and <a href="http://en.wikipedia.org/wiki/Measure-preserving_dynamical_system"><em>ergodic</em></a>. For a given outcome $\omega\in\Omega$, let <script type="math/tex">V_\omega</script> denote the potential, and <script type="math/tex">H_\omega</script> the associated operator.</p>
<p>By elementary manipulations and applications of the ergodic property, it is possible to show that $\sigma(H_\omega)$, the spectrum of <script type="math/tex">H_\omega</script>, equals a deterministic set, call it $\Sigma$, $\mathbb{P}$-a.s. However, it is interesting to ask how the spectrum is distributed along this set $\Sigma$. Let $\delta_0\in \ell^2(\mathbb{Z})$ be the unit vector with value 1 in the 0th coordinate, and 0 elsewhere. Define the measure $dk$ by</p>
<script type="math/tex; mode=display"> \int_\mathbb{R} f(\lambda) dk(\lambda) := \mathbb{E} [\langle \delta_0, f(H_\omega) \delta_0 \rangle ]. </script>
<blockquote>
<p><strong>Theorem.</strong> The support of $dk$ is $\Sigma$.</p>
</blockquote>
<p>From just this definition and the claim above, it is not clear why $dk$ is in any way related to the “distribution” of the spectrum. Note that the spectrum of <script type="math/tex">H_\omega</script> is an infinite set, so it is not possible to naively bucket and histogram to describe the distribution… but it essentially is! That is, we will restrict <script type="math/tex">H_\omega</script> to a finite interval $[-L,L]$, compute the empirical density of the eigenvalues in this interval, and then see what happens as we take $L\rightarrow\infty$.</p>
<p>Let <script type="math/tex">\{\mathcal{E}_\Delta(\omega)\}_{\Delta\subset\mathbb{R}}</script> represent the family of spectral projections associated with <script type="math/tex">H_\omega</script>, let $\chi_L$ be the indicator function of $[-L,L]$, and define the measure $dk_L$ by</p>
<script type="math/tex; mode=display"> \int_A dk_L := \frac{1}{2L+1} \text{dim Range}(\chi_L \mathcal{E}_A(\omega) \chi_L) = \frac{1}{2L+1} \text{tr}(\mathcal{E}_A(\omega)\chi_L). </script>
<blockquote>
<p><strong>Theorem.</strong> As $L\rightarrow\infty$, $dk_L$ converges vaguely to $dk$, $\mathbb{P}$-a.s.</p>
</blockquote>
<p><em>Idea of Proof.</em> First, prove that for a <em>given</em> bounded measurable function $f$, $\int f dk_L \rightarrow \int f dk$, $\mathbb{P}$-a.s. Doing so requires an application of Birkhoff’s ergodic theorem. Then, for each bounded measurable function $f$, we have a set $\Omega_f$ of measure 1 on which the desired behavior occurs. The conclusion of the proof is a classical approximation argument which exploits the separability of $C_0$ to stitch together countably many $\Omega_f$ to get a set of measure 1 on which the statement is true for all $f\in C_0$. $\square$</p>
<p>Note that the prelimit measures $dk_L$ are random, so it is <em>not</em> a priori obvious (at least, to me) that $dk$ <em>should</em> be a deterministic measure! This offers a nice parallel to other results in random matrix theory, where taking the limit of empirical spectral distributions can give an unexpectedly explicit (and universal!) limiting measure – e.g., the semicircle law for Wigner matrices, or the circular law for matrices with iid elements.</p>
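<p>To make the “bucket and histogram” idea concrete, here is a minimal Python sketch. It rests on several assumptions of mine rather than anything in the references: the potential is i.i.d. uniform on $[-1,1]$, the operator is simply truncated to a $(2L+1)\times(2L+1)$ Jacobi matrix (which differs slightly from the trace formula defining $dk_L$), and eigenvalues are counted with a standard pivot-sign (Sturm sequence) argument instead of being computed explicitly. The function names are hypothetical.</p>

```python
import random


def count_eigenvalues_below(potential, energy):
    """Count eigenvalues of the truncated Jacobi matrix strictly below `energy`.

    The matrix has `potential` on the diagonal and 1 on the off-diagonals.
    By Sylvester's law of inertia, the count equals the number of negative
    pivots in the LDL^T factorization of (H - energy).
    """
    count = 0
    pivot = None
    for v in potential:
        if pivot is None:
            pivot = v - energy
        else:
            pivot = (v - energy) - 1.0 / pivot
        if pivot == 0.0:
            pivot = 1e-300  # nudge an exactly singular pivot to avoid dividing by zero
        if pivot < 0.0:
            count += 1
    return count


def integrated_density_of_states(potential, energy):
    """Fraction of eigenvalues of the truncated matrix lying below `energy`."""
    return count_eigenvalues_below(potential, energy) / len(potential)


# Assumed toy model: i.i.d. uniform potential on [-1, 1], sites -L, ..., L.
rng = random.Random(0)
L = 2000
potential = [rng.uniform(-1.0, 1.0) for _ in range(2 * L + 1)]
for energy in (-3.5, -1.0, 0.0, 1.0, 3.5):
    print(energy, integrated_density_of_states(potential, energy))
```

<p>Re-running with a different seed should change the printed values only slightly for large $L$, which is the numerical face of the deterministic limit discussed above.</p>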
<h2 id="thouless-formula">Thouless Formula</h2>
<p>Part of the joy of the one-dimensional assumption is that solutions to the eigenvalue problem $(H-E)u=0$ can be written in terms of $2\times 2$ <em>transfer matrices</em> since any solution is determined by its value at two adjacent points in $\mathbb{Z}$. That is, let $\mathbf{u}(n) = (u(n+1), u(n))$, and define</p>
<script type="math/tex; mode=display">% <![CDATA[
A_n(E,\omega) := \begin{pmatrix} E - V_\omega(n) & -1 \\ 1 & 0 \end{pmatrix}
%]]></script>
<p>Then,</p>
<script type="math/tex; mode=display">
\begin{array}{c}
u(n+1) + u(n-1) + (V_\omega(n) - E) u(n) = 0\\
\Updownarrow\\
\mathbf{u}(n+1) = A_{n+1}(E,\omega) \mathbf{u}(n).
\end{array}
</script>
<p>Note that $\mathbf{u}(n)$ can be written in terms of the product of the random matrices <script type="math/tex">A_n(E)</script> applied to some initial condition. Then, Furstenberg’s theorem tells us that for $E\in\mathbb{R}$ and $\mathbb{P}$-a.s. $\omega\in\Omega$, there exists $\gamma(E)$ such that</p>
<script type="math/tex; mode=display">
\gamma(E) := \lim_{N\rightarrow \pm \infty} \frac{1}{|N|} \log \left\| \prod_{i=0}^N A_i(E,\omega) \right\|
</script>
<blockquote>
<p><strong>Theorem.</strong>
<script type="math/tex">\gamma(E) = \int \log |E - E'| dk(E') </script>.</p>
</blockquote>
<p>To me, this relationship is incredible! Unfortunately, I don’t have much intuition as to <em>why</em> it should be true, but the proof involves showing that a similar result holds in the finite $N$ case, and then exploiting the subharmonicity of $\gamma$. Moreover, this is not merely a nice connection between $\gamma$ and $k$, but also a fundamental ingredient of the proof that under certain conditions on $V$, the spectrum $\sigma(H_\omega)$ has no absolutely continuous part.</p>
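<p>As a numerical sanity check, here is a small Python sketch for the free case $V \equiv 0$, where both sides can be computed independently. The choices below are my own assumptions: the Lyapunov exponent is estimated from the growth of a single vector under the transfer matrices (rather than the matrix norm), the density of states is replaced by the eigenvalues $2\cos(k\pi/(n+1))$ of the truncated free operator, and the energy $E=3$ lies outside the spectrum $[-2,2]$. For this case both quantities should be close to $\text{arccosh}(E/2)$.</p>

```python
import math


def lyapunov_exponent(potential, energy):
    """Estimate gamma(E) from the growth of u under the transfer matrices.

    Starting from (u(1), u(0)) = (1, 0), each step applies
    A_n = [[E - V(n), -1], [1, 0]] and renormalizes to avoid overflow.
    """
    a, b = 1.0, 0.0
    log_norm = 0.0
    for v in potential:
        a, b = (energy - v) * a - b, a
        norm = math.hypot(a, b)
        log_norm += math.log(norm)
        a, b = a / norm, b / norm
    return log_norm / len(potential)


def thouless_integral_free(n, energy):
    """Average of log|E - E_k| over the eigenvalues of the V = 0 matrix of size n."""
    total = 0.0
    for k in range(1, n + 1):
        eigenvalue = 2.0 * math.cos(k * math.pi / (n + 1))
        total += math.log(abs(energy - eigenvalue))
    return total / n


n = 5000
energy = 3.0
gamma = lyapunov_exponent([0.0] * n, energy)
integral = thouless_integral_free(n, energy)
print(gamma, integral, math.acosh(energy / 2.0))
```

<p>The same <code>lyapunov_exponent</code> function accepts a random potential, but then the right-hand side requires the eigenvalues of the truncated random matrix, which this sketch does not compute.</p>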
<p><a href="http://ssk.im/blog/schrodinger-density/">Random Schrodinger Operators</a> was originally published by Steven Soojin Kim at <a href="http://ssk.im">Steven Soojin Kim</a> on 2014.11.15.</p><![CDATA[Weak solutions of SDE]]>http://ssk.im/blog/weak-solutions2014-09-17T00:00:00-04:002014-09-17T00:00:00-04:00Steven Soojin Kimhttp://ssk.imsteven@ssk.im
<ol id="markdown-toc">
<li><a href="#background">Background</a> <ol>
<li><a href="#strong-solution">Strong solution</a></li>
<li><a href="#weak-solution">Weak solution</a></li>
<li><a href="#simple-example">Simple example</a></li>
</ol>
</li>
<li><a href="#weak-but-not-strong">Weak but not strong</a> <ol>
<li><a href="#tanaka-example">Tanaka example</a></li>
<li><a href="#tsirelson-example">Tsirelson Example</a></li>
<li><a href="#discrete-time-analog">Discrete time analog</a></li>
</ol>
</li>
</ol>
<h2 id="background">Background</h2>
<p>Let $b:[0,\infty)\times\mathbb{R}^d \rightarrow \mathbb{R}^d$ and $\sigma:[0,\infty)\times\mathbb{R}^d \rightarrow \mathbb{R}^{d\times r}$ be Borel-measurable functions. We would like to “solve” the following SDE,</p>
<script type="math/tex; mode=display">% <![CDATA[
dX_t = b(t,X_t) dt + \sigma(t, X_t) dW_t,\quad 0\le t < \infty, \tag{$\star$} %]]></script>
<p>where $W$ is an $r$-dimensional Brownian motion, and $X$, a suitable stochastic process with continuous sample paths and values in $\mathbb{R}^d$, is the “solution” to the equation.</p>
<h3 id="strong-solution">Strong solution</h3>
<p>Fix a filtered probability space $(\Omega,\mathcal{F},\{\mathcal{F}_t\}, P)$<sup id="fnref:usual"><a href="#fn:usual" class="footnote">1</a></sup>. Recall that a <em>strong solution</em> to $(\star)$ w.r.t. fixed Brownian motion $W$ and initial condition $\xi$ is a process $X$ with continuous sample paths such that:</p>
<ol>
<li>$X$ is $\{\mathcal{F}_t\}$-adapted;</li>
<li>$P(X_0 = \xi) = 1$;</li>
<li>
<p>for every $1 \le i \le d$, $1\le j \le r$, and $0\le t < \infty$,
<script type="math/tex">% <![CDATA[
P\left( \int_0^t \left\{ \lvert b_i(s,X_s)\rvert + \sigma_{ij}^2 (s, X_s)\right\} ds < \infty\right) =1; %]]></script></p>
</li>
<li>the integral version of $(\star)$ holds – that is, $P$-a.s.</li>
</ol>
<script type="math/tex; mode=display">% <![CDATA[
X_t = X_0 + \int_0^t b(s,X_s) ds + \int_0^t \sigma(s,X_s) dW_s; \quad 0\le t< \infty. %]]></script>
<p>The key to this definition is the adaptedness condition 1., which says that $X_t$ depends only on $W_s$ for $s$ up to time $t$. On the other hand, there is an alternative notion of solution which is in some sense less “pathwise” and more “distributional”.</p>
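<p>A rough way to picture the adaptedness condition is through the Euler-Maruyama scheme, which builds an approximate path of $X$ as an explicit function of the Brownian increments. The Python sketch below is only an illustration under my own assumptions (scalar case, fixed step size, a hypothetical function name); it says nothing about when the scheme converges to a strong solution.</p>

```python
import math
import random


def euler_maruyama(b, sigma, x0, increments, dt):
    """Euler-Maruyama path for dX = b(t, X) dt + sigma(t, X) dW (scalar case).

    `increments` are the Brownian increments over consecutive steps of size dt.
    The value at step k uses only the first k increments, which is a
    discrete form of adaptedness to the driving noise.
    """
    path = [x0]
    t = 0.0
    for dw in increments:
        x = path[-1]
        path.append(x + b(t, x) * dt + sigma(t, x) * dw)
        t += dt
    return path


# Assumed example: mean-reverting drift b(t, x) = -x with unit diffusion.
rng = random.Random(1)
dt = 0.01
increments = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(1000)]
path = euler_maruyama(lambda t, x: -x, lambda t, x: 1.0, 0.0, increments, dt)
print(path[-1])
```

<p>Changing a late increment leaves the earlier part of the path untouched, mirroring the statement that $X_t$ depends only on $W_s$ for $s\le t$.</p>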
<h3 id="weak-solution">Weak solution</h3>
<p>A <em>weak solution</em> to $(\star)$ is a pairing of $(X,W)$ and $(\Omega,\mathcal{F},\{\mathcal{F}_t\}, P)$ such that:</p>
<ol>
<li>$(\Omega,\mathcal{F},\{\mathcal{F}_t\}, P)$ is a filtered probability space satisfying the usual conditions;</li>
<li>$X$ is a continuous, adapted $\mathbb{R}^d$-valued process and $W$ is an $r$-dimensional Brownian motion;</li>
<li>(see 3. for strong solutions);</li>
<li>(see 4. for strong solutions).</li>
</ol>
<p>Note that for the case of strong solutions, a probability space was <em>given</em>; on the other hand, for the case of weak solutions, a probability space must be provided as part of the solution! </p>
<h3 id="simple-example">Simple example</h3>
<p>An application of Girsanov’s theorem provides a nice example of a weak solution. Suppose we would like to find a solution to the SDE </p>
<script type="math/tex; mode=display"> dX_t = b(t,X_t)dt + dW_t , \quad 0\le t \le T \tag{$\dagger$} </script>
<p>where $T < \infty$ is a fixed positive number and $b:[0,T]\times\mathbb{R}^d\rightarrow \mathbb{R}^d$ is a measurable function with at most linear growth</p>
<script type="math/tex; mode=display"> \|b(t,x)\| \le K(1+ \|x\|); \quad 0\le t \le T, \, x\in\mathbb{R}^d. </script>
<p>Let $(\Omega,\mathcal{F},P)$ be a probability space which supports a Brownian motion $X$, and let $\{\mathcal{F}_t\}$ be the (augmented) Brownian filtration (generated by $X$). Define the process $Z$ as</p>
<script type="math/tex; mode=display"> Z_t := \exp\left( \int_0^t b(s,X_s) dX_s - \tfrac{1}{2} \int_0^t \|b(s,X_s)\|^2 ds \right), \quad 0\le t \le T.</script>
<p>Due to the Benes condition (see Karatzas & Shreve, Corollary 3.5.16), $Z$ is a martingale under $P$. Define the measure $Q$ via its Radon-Nikodym derivative $\frac{dQ}{dP} = Z_T$. By the Girsanov theorem, the process $W$ defined by </p>
<script type="math/tex; mode=display"> W_t := X_t - X_0 - \int_0^t b(s,X_s)ds, \quad 0\le t \le T </script>
<p>is a Brownian motion with $Q(W_0 = 0) = 1$. It is easy to check that $(X,W)$ and $(\Omega,\mathcal{F}, \{\mathcal{F}_t\}, Q)$ constitute a weak solution to $(\dagger)$. </p>
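<p>As a quick sanity check (a worked special case of my own, not part of the construction), take $d=1$, $X_0 = 0$, and constant drift $b \equiv c$. Then $Z_T = \exp(cX_T - c^2T/2)$, and since $X_T$ is $N(0,T)$-distributed under $P$,</p>
<script type="math/tex; mode=display"> \mathbb{E}_Q[X_T] = \mathbb{E}_P\left[ X_T \, e^{cX_T - c^2 T/2} \right] = cT, </script>
<p>which is what one expects from $X_T = cT + W_T$ with $W$ a Brownian motion under $Q$.</p>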
<p>This example demonstrates the peculiarity of adaptedness. For a strong solution, we require $X_t$ to be $\mathcal{F}_t^W = \sigma(W_s;s\le t)$-measurable. On the other hand, the weak solution constructed above gives the “opposite” in some sense; here, $W_t$ is $\mathcal{F}_t^X = \sigma(X_s;s\le t)$-measurable. This distinction is important, and it leads to different interpretations of solutions. Weak solutions provide a very probabilistic interpretation, since to provide a weak solution is essentially to construct a measure. On the other hand, strong solutions allow us to make parallels to deterministic dynamical systems. That is, if an SDE has a strong solution, it can reasonably be interpreted as “an ODE with noise”, and Wong-Zakai type approximations should hold.</p>
<h2 id="weak-but-not-strong">Weak but not strong</h2>
<p>Given that we have two notions of solution, we should have some examples which explicitly demonstrate that they are in fact distinct. There is also an important philosophical question of what it means to be a weak solution; that is, if the randomness of $X$ cannot be explained entirely through $W$, then where is this “extra randomness” coming from, and how should we interpret it? </p>
<h3 id="tanaka-example">Tanaka example</h3>
<p>Consider the SDE with drift $b(x) \equiv 0$ and diffusion $\sigma(x) = \text{sgn}(x)$. That is,</p>
<script type="math/tex; mode=display"> X_t = \int_0^t \text{sgn}(X_s) dW_s. \tag{$\ddagger$} </script>
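<p>Note that any solution $X$ is a continuous local martingale with quadratic variation $\langle X \rangle_t = t$, so by Lévy’s characterization of Brownian motion, $X$ is a Brownian motion. This suggests building a weak solution in reverse: start with a Brownian motion $X$ (started at zero) on some probability space $(\Omega,\mathcal{F},P)$, and define $W$ as:</p>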
<script type="math/tex; mode=display"> W_t = \int_0^t \text{sgn}(X_s) dX_s. </script>
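<p>By Lévy’s characterization again, $W$ is a Brownian motion, and $\int_0^t \text{sgn}(X_s) dW_s = \int_0^t \text{sgn}(X_s)^2 dX_s = X_t$, so $(X,W)$ and $(\Omega,\mathcal{F}, \{\mathcal{F}_t^X\}, P)$ form a weak solution to $(\ddagger)$. Note that as in the case of weak solutions formed via the Girsanov theorem, $\mathcal{F}_t^W \subset \mathcal{F}_t^X$.</p>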
<p>In fact, $(\ddagger)$ admits no strong solution at all! Suppose it did. First, recall the <em>Tanaka formula</em> for local time (at zero). For a Brownian motion $B$ starting at zero, where $L_t^B$ is its local time at 0,</p>
<script type="math/tex; mode=display">% <![CDATA[
2L_t^B = |B_t| - \int_0^t \text{sgn}(B_s) dB_s; \quad 0\le t < \infty. %]]></script>
<p>Then, since $X$ is a Brownian motion started at zero,</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
W_t &= \int_0^t \text{sgn}(X_s) dX_s \\
&= |X_t| - 2L_t^X\\
&= |X_t| - \lim_{\epsilon\downarrow 0} \frac{1}{2\epsilon} \text{meas} \{ 0\le s \le t: \lvert X_s \rvert \le \epsilon \}; \quad 0\le t < \infty, P-\text{a.s.}
\end{align*}
%]]></script>
<p>Combined with the adaptedness condition for strong solutions, this implies </p>
<script type="math/tex; mode=display">\mathcal{F}_t^X \subset \mathcal{F}_t^W \subset \mathcal{F}_t^{|X|},</script>
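<p>which is a contradiction: the sign of $X_t$ cannot be recovered from the path of $|X|$, so $\mathcal{F}_t^{|X|}$ is strictly smaller than $\mathcal{F}_t^X$.</p>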
<h3 id="tsirelson-example">Tsirelson Example</h3>
<p>One might think that if the coefficients $b$ and $\sigma$ are sufficiently well-behaved, then one can obtain a weak solution using Girsanov theorem, and then somehow prove pathwise uniqueness to show that such a solution is in fact strong. In principle, this is true, but let’s consider an example where $\sigma \equiv 1$ (avoiding the issue of the sign function) and $b$ is bounded (which would presumably prevent any sort of explosion), except now we let $b:[0,\infty)\times C([0,\infty);\mathbb{R})\rightarrow \mathbb{R}$ progressively measurable, meaning the drift depends on the entire past history of $X$ instead of just on a single point $X_t$. That is, we wish to solve the functional SDE:</p>
<script type="math/tex; mode=display">% <![CDATA[
dX_t = b(t,X) dt + dW_t,\quad 0\le t < \infty. \tag{$\S$} %]]></script>
<p>The following discussion is taken from Revuz and Yor, p. 392. Let <script type="math/tex">(t_k)_{k\in -\mathbb{N}}</script> be a strictly increasing sequence such that $0 < t_k < 1$ for $k <0$, $t_0 = 1$, and $\lim_{k\rightarrow -\infty} t_k = 0$. Then, set</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
\tau (t,z) &= \left[ \frac{ z(t_k) - z(t_{k-1}) }{ t_k - t_{k-1} } \right] \quad \text{ if } t_k < t\le t_{k+1} \\
&= 0 \quad \text{ if } t= 0 \text{ or } t > 1,
\end{align*}
%]]></script>
<p>where $[x]$ indicates the fractional part of a real number $x$. Let $(X,W)$ denote a solution to $(\S)$ with $b= \tau$. For <script type="math/tex">% <![CDATA[
t_k < t \le t_{k+1} %]]></script>, let <script type="math/tex">\eta_t = \frac{X_t - X_{t_k}}{t - t_k}</script> and let <script type="math/tex">\epsilon_t = \frac{W_t - W_{t_k}}{t- t_k}</script>. Then,</p>
<script type="math/tex; mode=display"> \eta_t = \epsilon_t + [\eta_{t_k}]. </script>
<p>Note that $\mathcal{F}_t^X = \sigma([\eta_t]) \vee \mathcal{F}_t^W$. The lack of a strong solution follows from the following claim, which can be proved through some elementary calculations involving conditional expectations and characteristic functions:</p>
<blockquote>
<p><strong>Lemma.</strong> For $t\in [0,1]$, the random variable $[\eta_t]$ is uniformly distributed on $[0,1]$, and independent of $\mathcal{F}_1^W$.</p>
</blockquote>
<h3 id="discrete-time-analog">Discrete time analog</h3>
<p>In this section, we analyze a discrete-time version of the Tanaka example, for which a “strong solution” does exist! We proceed as in <a href="http://arxiv.org/abs/math.PR/9911115/">Warren 1999</a>. Let <script type="math/tex">X=(X_n)_{n\in\mathbb{N}_0}</script> be the symmetric nearest neighbor random walk on $\mathbb{Z}$, and define <script type="math/tex">W= (W_n)_{n\in\mathbb{N}_0}</script> by setting $W_0 = 0$ and</p>
<script type="math/tex; mode=display"> W_{n+1} - W_n = \text{sgn}(X_n)(X_{n+1} - X_n). </script>
<p>Note that $W$ is also a symmetric nearest neighbor random walk on $\mathbb{Z}$, and we can write</p>
<script type="math/tex; mode=display"> X_n = \sum_{k=0}^{n-1} \text{sgn}(X_k) (W_{k+1} - W_k), \tag{$\parallel$}</script>
<p>the discrete version of the SDE $(\ddagger)$. Moreover, we can obtain discrete versions of the Tanaka formula. Let $L_0=0$ and define <script type="math/tex">L_n = \sum_{k=0}^{n-1} \mathbb{1}_{\{X_k,X_{k+1} \in \{0,-1\}\}}</script>. Then,</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
|X_n + \tfrac{1}{2}| - \tfrac{1}{2} &= \sum_{k=0}^{n-1} \text{sgn}(X_k) (X_{k+1} - X_k) + L_n\\
&= W_n + \sup_{k\le n} (-W_k).
\end{align*}
%]]></script>
<p>In light of this formula, we can show that in the discrete setting, $X$ is fully determined by $W$. To do so, it remains to show that $\text{sgn}(X_n + \tfrac{1}{2})$ can be determined from $\{W_k\}_{k\le n}$. For $n\in\mathbb{N}$, define</p>
<script type="math/tex; mode=display"> m_n := \sup \left\{ m \in \{0,1,\cdots, n\} : W_m = -\sup_{k \le m} (-W_k) \right\} .</script>
<p>Note that <script type="math/tex">X_{m_n} \in \{0, -1\}</script>, and for all $ m_n < \ell \le n$, we know <script type="math/tex">W_\ell > - \sup_{k\le \ell} (-W_k)</script>, meaning <script type="math/tex">\lvert X_\ell + \tfrac{1}{2}\rvert > \tfrac{1}{2}</script>. This implies that</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
X_{m_n} + \tfrac{1}{2} = +\tfrac{1}{2} &\quad \Rightarrow \quad X_\ell + \tfrac{1}{2} > 0, \\
X_{m_n} + \tfrac{1}{2} = -\tfrac{1}{2} &\quad \Rightarrow \quad X_\ell + \tfrac{1}{2} < 0.
\end{align*}
%]]></script>
<p>Thus, $X_n$ is measurable with respect to $\mathcal{F}_n^W = \sigma(W_k : k \le n)$. That is, this adaptedness property gives us a “strong solution” to $(\parallel)$, even though no such strong solution exists for the SDE $(\ddagger)$!</p>
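<p>The reconstruction argument above can be checked numerically. The following is a minimal Python sketch, assuming the convention $\text{sgn}(0)=+1$ (so that $\text{sgn}(x)$ is the sign of $x+\tfrac{1}{2}$), under which the discrete Tanaka formula holds. The function names are hypothetical, and the parity rule used for the sign (the walk $X$ switches between $0$ and $-1$ each time $W$ reaches a new minimum) is one way to read the argument, not a formula quoted from Warren.</p>

```python
import random

def sgn(x):
    # Assumed convention: sgn(0) = +1, i.e. sgn(x) = sign(x + 1/2).
    return 1 if x >= 0 else -1

def simulate(n, seed=0):
    """Simulate the walk X and the driving noise W with
    W_{k+1} - W_k = sgn(X_k) (X_{k+1} - X_k)."""
    rng = random.Random(seed)
    X, W = [0], [0]
    for _ in range(n):
        step = rng.choice((-1, 1))
        W.append(W[-1] + sgn(X[-1]) * step)
        X.append(X[-1] + step)
    return X, W

def reconstruct(W):
    """Recover X from W alone, using
    |X_n + 1/2| - 1/2 = W_n + sup_{k<=n}(-W_k)."""
    X_hat, running_sup = [], 0
    for w in W:
        running_sup = max(running_sup, -w)   # L_n = sup_{k<=n} (-W_k)
        r = w + running_sup                  # |X_n + 1/2| - 1/2
        # Parity of L_n tells us whether the last visit to {0, -1} was at 0 or -1.
        sign = 1 if running_sup % 2 == 0 else -1
        # Solve X + 1/2 = sign * (r + 1/2) in integer arithmetic.
        X_hat.append((sign * (2 * r + 1) - 1) // 2)
    return X_hat

X, W = simulate(1000, seed=1)
print(reconstruct(W) == X)
```

<p>The point of the sketch is that <code>reconstruct</code> never looks at $X$: every quantity it uses is a function of $\{W_k\}_{k\le n}$, which is exactly the adaptedness property claimed above.</p>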
<p>One interpretation of this phenomenon is that there is some connection between the “loss of information” about the sign, and the fact that $x\mapsto \text{sgn}(x)$ is “noise sensitive”. As described by Warren, suppose we have a pair of random walks $W$ and $W'$ such that the step sizes have correlation $\rho \in (0,1)$. From this pair of noises, define $X$ and $X'$ as in $(\parallel)$. In the asymptotic limit as $n$ grows large, it is possible to show that $\text{sgn}(X_n)$ and $\text{sgn}(X_n')$ are uncorrelated, regardless of the correlation $\rho \in (0,1)$, indicating that the sign function is asymptotically sensitive to any non-zero perturbation of the noise $W$. </p>
<p>The fundamental question is: <em>What is happening in the scaling limit?!</em> Questions of this nature are discussed in brief in the Warren paper, and can also be found in some works of <a href="http://arxiv.org/abs/math/0301237">Tsirelson</a>.</p>
<div class="footnotes">
<ol>
<li id="fn:usual">
<p>In order to avoid dealing with completions or augmentations of any kind, we will always assume the usual conditions on any filtration discussed. That is, the filtration $\{\mathcal{F}_t\}$ is right-continuous and $\mathcal{F}_0$ contains all $P$-null sets in $\mathcal{F}$. <a href="#fnref:usual" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>
<p><a href="http://ssk.im/blog/weak-solutions/">Weak solutions of SDE</a> was originally published by Steven Soojin Kim at <a href="http://ssk.im">Steven Soojin Kim</a> on 2014.09.17.</p><![CDATA[Sparse PCA]]>http://ssk.im/blog/sparse-PCA2014-09-16T00:00:00-04:002014-09-16T00:00:00-04:00Steven Soojin Kimhttp://ssk.imsteven@ssk.im
<p>I recently sat in on the applied probability topics seminar at MIT, which will spend the semester covering some very modern topics in statistics (in particular: sparse PCA, matrix completion, and community detection). This past Friday, two students gave a very nice overview of the statistical and computational aspects of the semidefinite relaxation developed in <a href="http://arxiv.org/abs/cs/0406021">d’Aspremont, El Ghaoui, Jordan, Lanckriet 2007</a>. As a brief addendum to my previous post on <a href="/blog/optimization-relaxations">relaxations</a>, and as a reminder to myself, I’d like to review sparse principal component analysis (PCA), or at least the basic problem formulation.</p>
<p>Recall the setup of PCA. For $n$ observations with $p$ features, denote the data matrix by $X \in \mathbb{R}^{n\times p}$, and the sample covariance by</p>
<script type="math/tex; mode=display"> S = \frac{1}{n} X^T X - \frac{1}{n^2} X^T \mathbb{1}_n \mathbb{1}_n^T X \in \mathbb{R}^{p\times p}, </script>
<p>where $\mathbb{1}_n$ is the $n\times 1$ vector of ones. The notion behind (classical) PCA is to find an orthonormal subset of vectors which will “explain” much of the data. To be precise, for $j=1,\cdots, p$, the $j$-th principal component is defined as follows:</p>
<script type="math/tex; mode=display"> v_j = \arg \max_{\substack{ v \in \mathbb{R}^p \\ \text{s.t. } \| v\|_2 = 1, \\ v \perp v_1, \cdots, v_{j-1} } } v^T S v.</script>
<p>For some intuition on the objective function, note that $v^T S v$ is the empirical variance of $X v$, the $n$ samples projected onto the vector $v$. By selecting vectors to maximize empirical variance, we are finding the most influential aspects of the data. From one perspective, this optimization problem is already somewhat hard, since it is doubly non-convex: we are asked to maximize a convex function $v\mapsto v^T S v$, and we are optimizing over the sphere $\|v \|_2 = 1$, a non-convex set. On the other hand, if we combine the unit norm constraint with the objective function, this problem becomes one of maximizing the Rayleigh quotient. That is, the $j$-th principal component is the eigenvector associated with $\lambda_j$, the $j$-th largest eigenvalue of $S$, so PCA becomes a problem of eigenvalue decomposition <sup id="fnref:psd"><a href="#fn:psd" class="footnote">1</a></sup>. Note that the “total variance” of the data can be written as $\text{tr} (S) = \lambda_1 + \cdots + \lambda_p$.</p>
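<p>To make the variance interpretation concrete, here is a small Python sketch. The data matrix is made up for illustration, the function names are hypothetical, and power iteration is simply one convenient way to find the top eigenvector (it assumes a gap between the two largest eigenvalues and a starting vector not orthogonal to the top eigenvector); it is not a method taken from the seminar or the paper.</p>

```python
import math

def sample_covariance(X):
    """S = (1/n) X^T X - (1/n^2) X^T 1 1^T X, computed by centering the columns."""
    n, p = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(p)]
    return [[sum((row[a] - mean[a]) * (row[b] - mean[b]) for row in X) / n
             for b in range(p)] for a in range(p)]

def power_iteration(S, iters=200):
    """Approximate the top eigenvector of S and its Rayleigh quotient v^T S v."""
    p = len(S)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(S[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    rayleigh = sum(v[a] * S[a][b] * v[b] for a in range(p) for b in range(p))
    return v, rayleigh

def projected_variance(X, v):
    """Empirical variance of the n samples projected onto v."""
    z = [sum(x * c for x, c in zip(row, v)) for row in X]
    m = sum(z) / len(z)
    return sum((zi - m) ** 2 for zi in z) / len(z)

# Hypothetical data: n = 4 observations, p = 2 features.
X = [[3.0, 1.0], [1.0, 2.0], [-1.0, 1.0], [1.0, 0.0]]
S = sample_covariance(X)
v1, lam1 = power_iteration(S)
print(lam1, projected_variance(X, v1))
```

<p>The two printed numbers should agree up to floating-point error, since the Rayleigh quotient $v^T S v$ and the empirical variance of $Xv$ are the same quantity.</p>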
<p>We would like to use PCA not only for dimension reduction (i.e., by projecting onto $p' \ll p$ dimensions which capture much of the variance of the data), but also for interpretation of the $p$ factors. The trouble is that in general, each principal component $v_i$ has non-zero elements at each coordinate. What if we would like to have only $k \ll p$ non-zeros? Then, for the first principal component, we would like to solve:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*} \tag{$\star$}
\text{ max } \, & v^T S v \\
\text{ s.t. } & \|v\|_2 = 1\\
& \text{card}(v) \le k.
\end{align*}
%]]></script>
<p>The cardinality constraint makes this problem even more difficult due to the additional combinatorial aspect involved. Note that we can rewrite $v^T S v = \text{tr}(Svv^T)$, which inspires the following rewrite of $(\star)$, which is more conducive to a semidefinite relaxation:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*} \tag{$\dagger$}
\text{ max } \,& \text{tr}(SV) \\
\text{ s.t. } & V \succeq 0\\
& \text{tr}(V) = 1\\
& \text{card}(V) \le k^2\\
& \text{rank}(V) = 1.
\end{align*}
%]]></script>
<p>Note that this formulation is already quite nice: the objective is linear in $V$ instead of quadratic in $v$, and the constraint $\|v\|_2=1$ has been changed into convex constraints on $V$ (positive semi-definiteness and trace = 1). However, the cardinality and rank constraints are still combinatorial in nature. To this end, since $\text{tr}(V) = 1$ and $\text{rank}(V) = 1$ give $\|V\|_F = 1$, note that by the Cauchy-Schwarz inequality the cardinality constraint $\text{card}(V) \le k^2$ implies</p>
<script type="math/tex; mode=display">
\mathbb{1}_p^T |V| \mathbb{1}_p = \|\text{vec}(V)\|_1 \le k \|\text{vec}(V)\|_2 = k\|V\|_F = k.
</script>
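<p>For intuition, here is the same bound worked out directly in the rank-one case. This is only a sanity check, assuming $V = vv^T$ with $\|v\|_2 = 1$ and $\text{card}(v)\le k$, i.e., $V$ comes from a feasible point of $(\star)$:</p>
<script type="math/tex; mode=display">
\mathbb{1}_p^T |V| \mathbb{1}_p = \sum_{i,j=1}^p |v_i|\,|v_j| = \|v\|_1^2 \le \text{card}(v)\, \|v\|_2^2 \le k.
</script>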
<p>As for the rank constraint, simply drop it to obtain the relaxation:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*} \tag{$\ddagger$}
\text{ max } & \text{tr}(SV) \\
\text{ s.t. } & V \succeq 0\\
& \text{tr}(V) = 1\\
& \mathbb{1}_p^T |V| \mathbb{1}_p \le k.
\end{align*}
%]]></script>
<p>Note that $(\ddagger)$ is an SDP, with variable $V \in \mathbb{S}^p$. The optimal value achieved by this optimization problem acts as an upper bound on the optimal value of the original problem $(\star)$.</p>
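<p>The upper-bound claim rests on the fact that every feasible $v$ for $(\star)$ yields a feasible $V = vv^T$ for $(\ddagger)$ with the same objective value. The Python sketch below checks this on one example; the matrix <code>S</code>, the vector <code>v</code>, and the helper name are all made up for illustration, and positive semi-definiteness of $V$ is not tested since it is automatic for an outer product.</p>

```python
import math

# Hypothetical 3 x 3 covariance matrix (symmetric, diagonally dominant).
S = [[4.0, 1.0, 0.5],
     [1.0, 3.0, 0.2],
     [0.5, 0.2, 1.0]]
k = 2

# A feasible point of the original problem: unit norm, at most k non-zeros.
v = [1 / math.sqrt(2), -1 / math.sqrt(2), 0.0]

def lift(v):
    """Return the rank-one matrix V = v v^T."""
    return [[a * b for b in v] for a in v]

V = lift(v)
p = len(v)

trace_V = sum(V[i][i] for i in range(p))                        # should be 1
l1_V = sum(abs(V[i][j]) for i in range(p) for j in range(p))    # should be <= k
tr_SV = sum(S[i][j] * V[j][i] for i in range(p) for j in range(p))
quad = sum(v[i] * S[i][j] * v[j] for i in range(p) for j in range(p))

print(trace_V, l1_V, tr_SV, quad)
```

<p>Since the lifted point is feasible for the relaxation and has the same objective value, maximizing over the larger feasible set of $(\ddagger)$ can only do at least as well.</p>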
<p>Such is the problem setup. There are several questions one can ask regarding this problem, none of which I will go into in detail:</p>
<ol>
<li>Does solving this relaxed problem produce feasible solutions (i.e., sparse vectors)? Numerical experiments suggest that this is frequently the case.</li>
<li>What if instead of the constrained problem, we analyze the penalized problem, with objective $v^T S v - \rho \,\text{card}(v)$? It turns out that the dual of the relaxed penalized problem can be interpreted as a “worst-case” computation of the maximum eigenvalue of a perturbed version of $S$.</li>
<li>How can we actually solve the associated SDP? One might immediately turn to interior point methods, but the $O(p^2)$ constraints make Newton’s method too costly. On the other hand, first-order methods have cheap iterations, low memory requirements, and lend themselves to parallelization. The cost is that they converge slowly (typically something like $O(1/\epsilon)$ for $\epsilon$ precision); but this cost is somewhat artificial since the statistical nature of this problem means that we only care about achieving computational error up to some statistical threshold.</li>
</ol>
<div class="footnotes">
<ol>
<li id="fn:psd">
<p>Note that since $S$ is a symmetric and positive semi-definite real matrix, the eigenvalues are all real and non-negative. <a href="#fnref:psd" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>
<p><a href="http://ssk.im/blog/sparse-PCA/">Sparse PCA</a> was originally published by Steven Soojin Kim at <a href="http://ssk.im">Steven Soojin Kim</a> on 2014.09.16.</p><![CDATA[Relaxations in Optimization]]>http://ssk.im/blog/optimization-relaxations2014-08-31T00:00:00-04:002014-08-31T00:00:00-04:00Steven Soojin Kimhttp://ssk.imsteven@ssk.im
<ol id="markdown-toc">
<li><a href="#mathbbr-vs-mathbbz-lp-relaxations-of-ip">$\mathbb{R}$ vs. $\mathbb{Z}$: LP Relaxations of IP</a></li>
<li><a href="#the-great-watershed-convex-relaxation">The Great Watershed: Convex Relaxation</a> <ol>
<li><a href="#semidefinite-programming">Semidefinite programming</a></li>
<li><a href="#goemans-williamson-max-cut">Goemans, Williamson (MAX CUT)</a></li>
</ol>
</li>
<li><a href="#sparsity-statistics-and-selection">Sparsity, Statistics, and Selection</a> <ol>
<li><a href="#lasso">LASSO</a></li>
<li><a href="#matrix-completion">Matrix Completion</a></li>
</ol>
</li>
<li><a href="#related-notions">Related Notions</a></li>
</ol>
<p>I’m excited to announce that I am tentatively assigned to be a TA for <a href="https://courses.brown.edu/courses/fall-2014/apma-1210-s01">APMA 1210</a> in Fall 2014. The course will introduce the elements of operations research, with a focus on deterministic optimization methods. I am glad that Brown offers an undergraduate course on this subject, since it’s been my experience that many mathematicians overlook the structural beauty, historical importance, and practical relevance of optimization theory.</p>
<p>Since the semester is fast approaching, I decided to review a little bit for myself. I find myself particularly amazed by the recurring theme of <em>relaxation</em> (or rather, how well relaxations manage to work). For those new to the concept, recall that mathematicians are known for simplifying: they like to turn mathematically “hard” problems into mathematically “simple” ones. Analogously, computer scientists like to turn computationally hard problems into computationally tractable ones. The trick is, in both cases, upon solving the tractable problem, one must check whether it tells you anything useful about the original hard problem.</p>
<p>To be a little bit more precise, let $X$ be some set and $f: X\rightarrow\mathbb{R}$ some function. Then, we wish to compute </p>
<script type="math/tex; mode=display"> \min_{x\in X} f(x), </script>
<p>and also find the minimizing value(s) <script type="math/tex">x^*</script>. But suppose that finding this optimal solution is quite hard; the broad idea of relaxation is to settle for a suboptimal solution that can be found more easily. That is, consider an alternative set $\tilde{X}$ that is “similar” to $X$, and an alternative function $\tilde{f}:\tilde{X}\rightarrow\mathbb{R}$ that is “similar” to $f$, and then compute</p>
<script type="math/tex; mode=display"> \min_{x\in \tilde{X}} \tilde{f}(x) </script>
<p>and the minimizing value <script type="math/tex">\tilde{x}^*</script>. A miracle happens when finding the solution to the relaxed problem $\tilde{x}^* \in \tilde{X}$ can produce an approximate solution $x' \in X$ that works reasonably well for the original problem.</p>
<p>As a side note, I think the timing is perfect for an undergraduate to learn about optimization, relaxation, and approximation. On the practical side, optimization is an increasingly important part of our computational and data-oriented world. On the theoretical side, the recent ICM 2014 provided a showcase of some recent aspects of optimization and complexity (as it has in previous years). In particular, the <a href="http://www.mathunion.org/fileadmin/IMU/Prizes/2014/news_release_khot.pdf">Nevanlinna Prize</a> was recently awarded to Subhash Khot for his work on the <a href="http://en.wikipedia.org/wiki/Unique_games_conjecture">Unique Games Conjecture</a>, which offers a lens through which one can analyze the critical frontier of approximate computability. Also at the ICM, there was a <a href="https://www.youtube.com/watch?v=W-b4aDGsbJk">plenary lecture</a> given by Emmanuel Candes, whose work on the computational end of compressed sensing sparked many modern developments in approximation and relaxation theory. Moreover, ICERM will be hosting a workshop on <a href="http://icerm.brown.edu/sp-f14-w2/">Approximation, Integration, and Optimization</a> in Fall 2014. While APMA 1210 will only offer a peek at optimization, it should build up some of the classical foundations underlying the very modern mathematics described above.</p>
<p>In the remainder of the blog post, I will survey a few examples of relaxation, with a particular focus on turning <em>combinatorial optimization</em> problems into <em>linear/convex optimization</em> problems.</p>
<h2 id="mathbbr-vs-mathbbz-lp-relaxations-of-ip">$\mathbb{R}$ vs. $\mathbb{Z}$: LP Relaxations of IP</h2>
<p>Recall the typical form of a <em>linear program (LP)</em>. Given $m,n\in\mathbb{N}$, $A \in \mathbb{R}^{m\times n}$, $b\in \mathbb{R}^m$, $c\in \mathbb{R}^n$, we wish to find $x^*\in\mathbb{R}^n$ which solves:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
\text{ min } & c^T x\\
\text{ s.t. } & Ax \le b \\
& x_i \ge 0, \quad i=1,\cdots, n.
\end{align*}
%]]></script>
<p>This problem has an incredibly rich mathematical history, from the war-changing work of Kantorovich, to the legal issues on patentability that arose as a result of Karmarkar’s algorithm. The linearity assumption imposes strong structural constraints on this optimization problem. In particular, a linear function on a convex polytope has the nice property that it will achieve its optima at the polytope’s corners. There are several ways to solve such an LP: e.g., Dantzig’s simplex method (which works well empirically) and Khachiyan’s ellipsoid method (which has a “better” theoretical guarantee). For the practitioner who wishes to use LP as a black box, or for the mathematician who wishes to reduce harder problems to LP, the most important fact is that an LP can be solved in polynomial time with respect to the number of variables $n$. </p>
<p>Of course, we should ask what is meant by “polynomial time”, since we are dealing with real-valued solutions<sup id="fnref:poly"><a href="#fn:poly" class="footnote">1</a></sup>. There are (at least) two notions of complexity when dealing with optimization. </p>
<ol>
<li><em>Rational Arithmetic Model</em> – The computational cost is measured in terms of the number of arithmetic operations and comparisons on rational numbers (which can be represented as finite-length binary words). This is reasonably similar to a physical model of computation. Under this model, Khachiyan’s ellipsoid method showed that an LP can be solved in time polynomial with respect to $n$, the number of variables, and $L$, the size of the problem in terms of bits required to represent it.</li>
<li><em>Information Complexity</em> – Here, complexity is measured in terms of number of calls to an “oracle” which takes input $x$ and outputs the objective $f(x)$ and its gradient $\nabla f(x)$. There is also dependence on $\epsilon$, the level of precision desired for a solution. Roughly speaking, this model measures not the number of computations, but the number of “iterations” an algorithm takes. This measure of complexity is particularly natural in statistics for two reasons: first, optimization problems in statistics typically have an unknown objective function for which there is limited data; second, there is little point in finding an exact minimum, since in addition to computational error of imprecise minimization, there will always be a level of statistical error as quantified by whatever generalization bound exists for the problem.</li>
</ol>
<p>For a more expansive discussion on posing the question of complexity in the context of optimization, see Nemirovski’s notes on <a href="http://www2.isye.gatech.edu/~nemirovs/OPTI_LectureNotes.pdf">linear optimization</a> §6.1 and <a href="http://www2.isye.gatech.edu/~nemirovs/Lect_IPM.pdf">convex optimization</a> §1.2.</p>
<p>Without going into further detail about complexity, one should be happy if a problem can be reduced to an LP. In practice, many problems are “almost” an LP, but not quite. For a concrete (but not very serious) example, suppose you go to a restaurant and set a 50-dollar budget for yourself. There are various items you can order, each of which gives you a certain amount of happiness per item (e.g., <script type="math/tex">h_{\text{pâté}}, h_{\text{ceviche}}, h_{\text{shortrib}}</script>), but comes at a certain price level (e.g., <script type="math/tex">p_{\text{pâté}}, p_{\text{ceviche}}, p_{\text{shortrib}}</script>). You want to maximize your happiness <script type="math/tex">\sum_{i \in \text{menu}} h_i x_i</script>, given that you must stay within your budget <script type="math/tex">\sum_{i \in \text{menu}} p_i x_i \le 50</script>. Unfortunately, the restaurant doesn’t let you order a non-integer amount of short rib dishes, so you must select $x_i \in \mathbb{Z}$. The same applies to other practical problems like vehicle scheduling, employee assignment, and resource allocation. </p>
<p>The situation described above is an example of a combinatorial problem known as <em>integer programming (IP)</em>. The general setup is similar to LP, but with one additional constraint:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
\text{ min } & c^T x\\
\text{ s.t. } & Ax \le b \\
& x_i \ge 0\\
& x_i \in \mathbb{Z}, \quad i=1,\cdots, n.
\end{align*}
%]]></script>
<p>As one might guess from the combinatorial nature of integer programming, it is in general NP-hard to find a solution<sup id="fnref:RvZ"><a href="#fn:RvZ" class="footnote">2</a></sup>. It turns out that several algorithmic problems can be formulated as integer programs: TSP, cover problems, and satisfiability problems to name a few. This shouldn’t come as a surprise, since all of these problems are vaguely questions of assigning elements under certain constraints.</p>
<p>Given this computational difficulty, and the fact that LP is so “easy” in comparison, it is natural to try to simply discard the $x\in\mathbb{Z}$ condition and reduce an IP to an LP. This is known as <em>linear programming relaxation</em>, since we “relax” the integer constraint. The solution to the LP provides a lower bound on the optimal value of the IP (since it is minimizing over a larger set, with fewer constraints). However, solving the LP does not immediately provide a candidate solution to the IP, since the resulting optimal $x$ will, in general, <em>not</em> be integer-valued. Here we should usurp the words of John Tukey:</p>
<blockquote>
<p>Far better an approximate answer to the right question … than an exact answer to the wrong question.</p>
</blockquote>
<p>To this end, we can consider <a href="http://en.wikipedia.org/wiki/Randomized_rounding#Set_Cover_example">randomized rounding</a> of a solution to the LP to obtain an integer-valued candidate. The general idea is to round each coordinate of the LP solution $\tilde{x}^*$ according to some rule, in order to obtain a vector of integers $x'$. Then, with some probability (or, always, via derandomization), $x'$ is a candidate for the original integer program which is “almost as good” as the optimal solution. </p>
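<p>Before turning to a real rounding scheme, it may help to see the relaxation bound on the restaurant example. The Python sketch below uses made-up happiness values and prices, hypothetical function names, and the simplest possible rounding rule (rounding down, which is deterministic and is <em>not</em> the randomized rounding discussed above). Since the restaurant problem is a maximization, the LP value is an upper bound on the IP value rather than a lower bound. The LP is solved by hand here: with a single budget constraint and $x_i \ge 0$, the fractional optimum spends the whole budget on the item with the best happiness per dollar.</p>

```python
from itertools import product
from math import floor

# Hypothetical menu: name -> (happiness per dish, price per dish in dollars).
menu = {"pate": (3, 10), "ceviche": (7, 20), "shortrib": (10, 26)}
budget = 50

def value(order, menu):
    return sum(menu[name][0] * count for name, count in order.items())

def cost(order, menu):
    return sum(menu[name][1] * count for name, count in order.items())

def ip_optimum(menu, budget):
    """Brute-force the integer program (only sensible for tiny menus)."""
    names = list(menu)
    ranges = [range(budget // menu[name][1] + 1) for name in names]
    best = dict.fromkeys(names, 0)
    for counts in product(*ranges):
        order = dict(zip(names, counts))
        if cost(order, menu) <= budget and value(order, menu) > value(best, menu):
            best = order
    return best

def lp_optimum(menu, budget):
    """LP relaxation: put the entire budget on the best happiness-per-dollar item."""
    best = max(menu, key=lambda name: menu[name][0] / menu[name][1])
    return {best: budget / menu[best][1]}

def round_down(order):
    """Naive rounding: always feasible here, but possibly far from optimal."""
    return {name: floor(count) for name, count in order.items()}

ip_order = ip_optimum(menu, budget)
lp_order = lp_optimum(menu, budget)
rounded = round_down(lp_order)
print(value(rounded, menu), value(ip_order, menu), value(lp_order, menu))
```

<p>On this instance the rounded-down order is feasible but worth less than the integer optimum, which in turn sits below the LP value. Closing that first gap is what a more careful rounding rule is for.</p>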
<p>Consider the set cover problem: fix a set of elements $U=\{1,\cdots, M\}$, a collection $S$ of $n$ sets whose union equals $U$, and costs $c_1,\cdots,c_n$ attached to each set in $S$; find a subset of $S$ whose union equals $U$ while minimizing total cost. This problem is NP-hard. However, rounding the solution from an LP relaxation can efficiently provide a solution which is not bad:</p>
<blockquote>
<p><strong>Theorem.</strong> A (derandomized) rounding scheme for the set cover problem can return a candidate of cost $O(\log M)$ times the cost of the optimal set cover.</p>
</blockquote>
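<p>As a rough illustration of what rounding looks like for set cover, here is a Python sketch on a toy instance. Everything in it is an assumption made for illustration: the instance, the function name, the number of rounds, and the fractional solution <code>x_lp</code>, which is supplied by hand instead of by an LP solver (for this instance, setting every variable to $\tfrac{1}{2}$ covers each element with total weight exactly 1). The scheme is a simplified randomized one that retries until it finds a cover; it is not the derandomized scheme of the theorem.</p>

```python
import math
import random

# Toy instance: three elements, three sets of unit cost.
universe = {1, 2, 3}
sets = [{1, 2}, {2, 3}, {1, 3}]
costs = [1.0, 1.0, 1.0]

# Fractional solution of the LP relaxation, supplied by hand (cost 1.5).
x_lp = [0.5, 0.5, 0.5]

def randomized_round(universe, sets, x_lp, seed=0):
    """Pick set i with probability x_lp[i], independently, over several rounds;
    retry from scratch until the chosen sets cover the universe."""
    rng = random.Random(seed)
    rounds = max(1, math.ceil(2 * math.log(len(universe))))
    while True:
        chosen = set()
        for _ in range(rounds):
            for i, x in enumerate(x_lp):
                if rng.random() < x:
                    chosen.add(i)
        covered = set().union(*(sets[i] for i in chosen))
        if covered == universe:
            return sorted(chosen)

cover = randomized_round(universe, sets, x_lp, seed=0)
print(cover, sum(costs[i] for i in cover))
```

<p>The number of rounds grows like $\log M$, which is where the logarithmic factor in the approximation guarantee comes from in analyses of schemes of this type.</p>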
<p>One might think the set cover problem is just a lucky special case where the math happens to work out, but hopefully with the additional examples below, it becomes clear that relaxation is a broadly applicable approach towards obtaining approximate solutions in a much shorter time.</p>
<h2 id="the-great-watershed-convex-relaxation">The Great Watershed: Convex Relaxation</h2>
<p>Based on the above discussion on linear programs, one might naturally wonder about nonlinear programs. In dynamical systems and PDE, “linearity” proves to be a very useful property, and “nonlinearity” is more difficult to analyze. This is somewhat true for optimization as well, and in fact, one can find a vast literature that references “nonlinear programming”. However, I tend to agree with the following quote from R.T. Rockafellar:</p>
<blockquote>
<p>In fact the great watershed in optimization isn’t between linearity and nonlinearity, but convexity and nonconvexity.</p>
</blockquote>
<p>Recall the typical form of a <em>convex program (CP)</em>. Given a convex set $D\subset \mathbb{R}^n$ (or, more generally, a real vector space), and a convex function $f: D\rightarrow \mathbb{R}$, we wish to find $x^*\in D$ which solves: </p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
\text{ min } & f(x)\\
\text{ s.t. } & x\in D
\end{align*}
%]]></script>
<p>Of course, convex optimization is certainly harder than linear optimization: abstractly, linearity is a trivial type of convexity; practically, convex functions can achieve their minima on the interior as well as on the boundary. Nonetheless, convexity is an incredibly strong assumption when searching for optima, since for a convex function on a convex domain, any local optimum is a global optimum. Because of this, a great many algorithms for solving convex programs rely on the fundamental principle of <em>gradient descent</em>; that is, since “local” searches are provably asymptotically correct, an algorithm can keep taking small steps in the direction of a region of lower potential. Of course, there is a great deal of elegance in choosing the precise step size and direction in order to achieve optimal error, but I won’t go into detail here. Just as in the case of LP, one should be happy if a problem can be reduced to a CP.</p>
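<p>The principle is easy to see on a toy problem. The following Python sketch runs plain gradient descent with a fixed step size on a made-up convex quadratic; the objective, the step size, and the iteration count are all assumptions chosen for illustration, and no attempt is made at the careful step-size selection alluded to above.</p>

```python
def grad_f(x):
    """Gradient of the hypothetical objective f(x) = (x0 - 1)^2 + 2 (x1 + 3)^2,
    a convex function whose unique minimizer is (1, -3)."""
    return [2.0 * (x[0] - 1.0), 4.0 * (x[1] + 3.0)]

def gradient_descent(grad, x0, step=0.1, iters=200):
    """Repeatedly take a small step in the direction of steepest descent."""
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

x_star = gradient_descent(grad_f, [0.0, 0.0])
print(x_star)
```

<p>Because the objective is convex, the point this local search settles on is a global minimizer; on a non-convex objective the same loop could stall at a merely local one.</p>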
<h3 id="semidefinite-programming">Semidefinite programming</h3>
<p>A particularly interesting case of convex optimization is <em>semidefinite programming (SDP)</em>. Essentially, the task is to optimize over matrices instead of typical vectors. Let $\mathbb{S}^n$ denote the set of $n\times n$ real symmetric matrices. For $A,B \in \mathbb{S}^n$, define the inner product $\langle A, B\rangle = \text{tr}(A^TB)$. Fix $m$. For $C, A_1,\cdots, A_m \in \mathbb{S}^n$, and $b_1,\cdots, b_m \in \mathbb{R}$:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
\text{ min } & \langle C, X\rangle \\
\text{ s.t. } & X \in \mathbb{S}^n \\
& \langle A_k, X\rangle \le b_k, \quad k=1,\cdots,m \\
& X \succeq 0.
\end{align*}
%]]></script>
<p>I know very little about the details of algorithms which can solve SDPs, but the guarantee as I know it is this: to obtain a solution up to additive error $\epsilon$, an algorithm can output a solution in time polynomial in $n$ and $\log(1/\epsilon)$. </p>
<h3 id="goemans-williamson-max-cut">Goemans, Williamson (MAX CUT)</h3>
<p>One of the most famous and well-cited examples of convex relaxation to an SDP is the application to the <a href="http://dl.acm.org/citation.cfm?id=227684">MAX CUT</a> problem by Goemans and Williamson. Given a graph $G= ([n],E)$ and weights <script type="math/tex">W_{ij}= W_{ji}</script> for $(i,j) \in E$, the <em>maximum cut</em> problem is to find a subset $S\subset [n]$ such that the weight of the edges in the cut $(S, S^c)$ (that is, the sum of the weights of the edges between $S$ and $S^c$) is maximized. This problem is known to be NP-complete, so we should consider approximate solutions, again through relaxation and randomization. To formulate our problem precisely, we want to solve:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*} \tag{$\star$}
\text{ max } & \frac{1}{2} \sum_{i,j=1}^n W_{ij} (x_i-x_j)^2\\
\text{ s.t. } & x_i \in \{-1, +1\}, \quad i=1 ,\cdots ,n
\end{align*}
%]]></script>
<p>Let <script type="math/tex">W = (W_{ij})_{i,j}</script> be the weighted adjacency matrix of $G$. Let $D$ be the diagonal matrix with $i$-th entry <script type="math/tex">\sum_{j=1}^n W_{ij}</script>. Then, we define $L=D-W$ to be the <em>graph Laplacian</em>. Then, note that <script type="math/tex">\frac{1}{2} \sum_{i,j=1}^n W_{ij} (x_i-x_j)^2 = x^T L x</script>. Using the inner product on matrix space, we can further rewrite $x^TL x= \langle L, xx^T\rangle$. Thus, we can rewrite the above problem as</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*} \tag{$\dagger$}
\text{ max } & \langle L, xx^T\rangle \\
\text{ s.t. } & x\in \{-1,+1\}^n
\end{align*}
%]]></script>
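<p>The Laplacian identity is easy to verify by brute force on a small graph. In the Python sketch below, the weight matrix is made up and the function names are hypothetical. Note that with this normalization the objective equals four times the weight of the cut, which does not change the maximizer.</p>

```python
from itertools import product

# Hypothetical symmetric weight matrix on 4 vertices (0 means no edge).
W = [[0, 2, 1, 0],
     [2, 0, 3, 1],
     [1, 3, 0, 4],
     [0, 1, 4, 0]]
n = len(W)

# Graph Laplacian L = D - W, where D is the diagonal matrix of weighted degrees.
L = [[(sum(W[i]) if i == j else 0) - W[i][j] for j in range(n)] for i in range(n)]

def quad_form(x):
    """x^T L x."""
    return sum(L[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def pairwise_form(x):
    """(1/2) * sum_{i,j} W_ij (x_i - x_j)^2."""
    return sum(W[i][j] * (x[i] - x[j]) ** 2 for i in range(n) for j in range(n)) / 2

def cut_weight(x):
    """Total weight of edges whose endpoints receive different signs."""
    return sum(W[i][j] for i in range(n) for j in range(i + 1, n) if x[i] != x[j])

# Exhaustive search over all sign vectors (only feasible for tiny n).
best = max(product((-1, 1), repeat=n), key=quad_form)
print(best, quad_form(best), cut_weight(best))
```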
<p>Then, a natural convex relaxation is to move from the combinatorial problem of optimizing over matrices $xx^T$ such that <script type="math/tex">x\in \{-1,+1\}^n</script>, to a convex problem by searching over a larger space of matrices.</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*} \tag{$\ddagger$}
\text{ max } & \langle L, X\rangle \\
\text{ s.t. } & X \in \mathbb{S}^n\\
& X \succeq 0\\
& X_{ii} =1, \quad i=1,\cdots,n
\end{align*}
%]]></script>
<p>In essence, the Goemans-Williamson algorithm is: solve the relaxed problem $(\ddagger)$, generate a uniformly random hyperplane in $\mathbb{R}^n$, separate the vertices $[n]$ by seeing on which side of the hyperplane the column vectors of $X$ fall. The interesting thing is how much such an approach helps!
To be precise (and taking a slightly different, but equivalent, geometric view): let $X$ be the solution to $(\ddagger)$, let $\xi \sim N(0,X)$, and let <script type="math/tex">\zeta = \text{sign}(\xi) \in \{-1,+1\}^n</script>. Note that $\zeta$ is a random candidate for a cut of $[n]$.</p>
<blockquote>
<p><strong>Theorem.</strong> $\mathbb{E} \langle L, \zeta \zeta^T\rangle \ge 0.878 \cdot \text{ optimal solution value of MAX CUT }$</p>
</blockquote>
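<p>The rounding step can be sketched in a few lines of Python. To avoid depending on an SDP solver, the example below takes the unweighted triangle and a feasible point of $(\ddagger)$ supplied by hand: three unit vectors at 120 degrees, whose Gram matrix has ones on the diagonal and $-\tfrac{1}{2}$ elsewhere. If $g$ is a standard Gaussian vector, then the inner products $\xi_i = \langle v_i, g\rangle$ are jointly Gaussian with covariance equal to that Gram matrix, which matches the description of $\xi$ above. The function names are hypothetical, and this illustrates the rounding step only, not the full algorithm.</p>

```python
import math
import random

# Unit vectors at 120 degrees: Gram matrix has 1 on the diagonal, -1/2 off it.
vectors = [(math.cos(2 * math.pi * i / 3), math.sin(2 * math.pi * i / 3))
           for i in range(3)]
edges = [(0, 1), (1, 2), (0, 2)]   # the unweighted triangle

def hyperplane_round(vectors, rng):
    """Draw a Gaussian direction g and label each vertex by the sign of <v_i, g>."""
    g = [rng.gauss(0.0, 1.0) for _ in vectors[0]]
    return [1 if sum(a * b for a, b in zip(v, g)) >= 0 else -1 for v in vectors]

def cut_weight(zeta, edges):
    """Number of edges whose endpoints fall on opposite sides of the hyperplane."""
    return sum(1 for i, j in edges if zeta[i] != zeta[j])

rng = random.Random(0)
zeta = hyperplane_round(vectors, rng)
print(zeta, cut_weight(zeta, edges))
```

<p>The triangle is a degenerate case: the three vectors sum to zero, so a random hyperplane essentially never puts them all on one side, and every rounded cut has two edges, which is the best possible for a triangle.</p>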
<p>For reasons I don’t quite understand, I have read that this approximation algorithm (and the 0.878 ratio) is in some sense optimal, assuming the validity of the Unique Games Conjecture!</p>
<h2 id="sparsity-statistics-and-selection">Sparsity, Statistics, and Selection</h2>
<p>The fundamental concept behind the combinatorial problems described above is the <em>selection</em> of an optimal subset. It turns out that problems of selection are also common in statistics and computational learning theory. Suppose we have data which is high-dimensional in some sense (in that there are far more parameters than data points); solving such a problem is in general quite hard, but it becomes much more tractable if we can assume there is some sort of sparse low-dimensional structure. The key is to properly select the low dimensions.</p>
<h3 id="lasso">LASSO</h3>
<p>Consider the classical problem of <em>regression</em>. Fix $n$ samples and $p$ dimensions such that $p \gg n$. Given data $(X, y)$ and coefficients $\theta$, where </p>
<script type="math/tex; mode=display">% <![CDATA[
X = \begin{pmatrix}
x_{11} & \cdots & x_{1p} \\
\vdots & \ddots & \vdots \\
x_{n1} & \cdots & x_{np}
\end{pmatrix} \in \mathbb{R}^{n\times p},
%]]></script>
<p>$y \in \mathbb{R}^n$, and $\theta \in \mathbb{R}^p$, we assume:</p>
<script type="math/tex; mode=display">
y = X \theta + \epsilon,
</script>
<p>where $\epsilon \in \mathbb{R}^n$ is some noise<sup id="fnref:linearnote"><a href="#fn:linearnote" class="footnote">3</a></sup>. Our goal is to estimate $\theta$ given the data. The standard estimator is the <em>ordinary least squares (OLS)</em> estimator:</p>
<script type="math/tex; mode=display"> \hat{\theta}^{\text{LS}} = \arg\min_{\theta \in \mathbb{R}^p} \| y - X\theta \|_2^2 ,</script>
<p>where <script type="math/tex">\|z\|_q = \left(\sum_{j=1}^m |z_j|^q\right)^{1/q}</script> is the $\ell^q$-norm of a vector $z \in \mathbb{R}^m$. This is a theoretically and practically simple estimator, but OLS estimates typically suffer from the problem of <em>overfitting</em>. That is, OLS estimates typically have low or no bias, but high variance. These estimates are too specific to a particular data instance to provide good predictions. On the other hand, prediction error can actually be reduced by shrinking some of the coefficients $\theta$. Moreover, one might argue that anything which happens in the universe is affected in some small magnitude by everything else in the universe – but for modeling purposes, we should select only the few parameters which exhibit the most significant effects.</p>
<p>One way to resolve these problems is to set coefficients of $\theta$ to zero when they are sufficiently small, and to scale them down otherwise. This point of view leads to <em>thresholding estimators</em>, which is a useful characterization when $X$ is an orthonormal matrix.</p>
<p>However, to fit with the theme of relaxation, we will take an alternate but equivalent view. We would like to select $\theta$ with minimal residual squared error among a small subset of $\mathbb{R}^p$. The Lagrangian formulation of such an approach is <em>penalized least-squares</em>. That is, we would like to analyze estimates of the sort:</p>
<script type="math/tex; mode=display"> \hat{\theta}^{g} = \arg\min_{\theta \in \mathbb{R}^p} \| y - X\theta \|_2^2 + g(\theta), </script>
<p>where $g:\mathbb{R}^p \rightarrow \mathbb{R}$ is some sort of penalty on large, overfitting values of $\theta$. </p>
<p>The classical example of such a penalty is:</p>
<script type="math/tex; mode=display"> g(\theta) = \|\theta\|_2^2.</script>
<p>This penalty leads to a method known as <em>ridge regression</em>, which produces estimates of smaller $\ell^2$ norm and lower variance than OLS, at the cost of some bias. Moreover, the optimization problem is still a convex (indeed, quadratic) program, and thus rather simple to solve. Unfortunately, this process will not set any coefficients to zero, so it does not produce the “low-dimensional” model we would like.</p>
<p>An alternate approach is to choose the “$\ell^0$ norm”<sup id="fnref:l0foot"><a href="#fn:l0foot" class="footnote">4</a></sup> as the penalty:</p>
<script type="math/tex; mode=display"> g(\theta) = \|\theta\|_0 = \# \{ i \in [p] : \theta_i \ne 0 \} </script>
<p>This penalty produces estimates with only a few non-zero coefficients, but it tends to be very unstable in that slight perturbations to the data lead to highly different selected models. Moreover, the optimization problem becomes one of combinatorial optimization since one must select a subset of $p$ possible dimensions. In fact, solving the penalized optimization problem with $\ell^0$ penalty is in general NP-hard, which one might expect since exhaustive search over all subsets of columns of $X$ has exponential complexity in $p$.</p>
<p>Given that $\ell^2$-penalization provides a tractable optimization problem which provides shrinkage but no selection, and $\ell^0$-penalization provides a difficult optimization problem which provides selection but no shrinkage, it is natural to ask what lies in between. To this end, set </p>
<script type="math/tex; mode=display"> g(\theta) = \|\theta\|_1.</script>
<p>This penalty produces the <em>least absolute shrinkage and selection operator (lasso)</em>, developed by <a href="http://algomagic.s3.amazonaws.com/algomagic_1f64_lasso.pdf">Tibshirani</a> in 1996. Fortuitously, the $\ell^1$ penalty produces behavior which interpolates between the behavior of the $\ell^0$ and $\ell^2$ penalties: the lasso shrinks some coefficients and sets others to zero. Moreover, using the $\ell^1$ norm turns penalized least-squares into a convex optimization problem. That is, the lasso is the convex relaxation of the $\ell^0$-penalized estimator. </p>
<p>For a suggestion of why, consider the “unit balls” in the plane <script type="math/tex">\{x \in \mathbb{R}^2 : \|x\|_q \le 1 \}</script> for various values of $q$. For $q=0$, the unit ball looks like a cross <strong>+</strong> (with arms extended out to infinity); clearly this is not convex. For $0 < q < 1$, the unit ball looks like a concave diamond <strong>⟡</strong>. For $q=1$, the unit ball is convex and looks like a typical diamond <strong>◆</strong>. In fact, $q=1$ is the smallest value of $q\ge 0$ for which the unit ball is a convex set, and correspondingly, the smallest value of $q$ for which the norm is a convex function.</p>
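<p>A quick way to see the loss of convexity below $q=1$ is to test a single midpoint. In this small sketch (the helper <code>lq</code> is my own name), the points $(1,0)$ and $(0,1)$ lie on the unit sphere for every $q>0$, and we ask whether their midpoint stays in the unit ball.</p>

```python
def lq(x, q):
    """(sum_j |x_j|^q)^(1/q) for q > 0; this is a norm only when q >= 1."""
    return sum(abs(t) ** q for t in x) ** (1.0 / q)

a, b = (1.0, 0.0), (0.0, 1.0)               # on the unit sphere for every q > 0
mid = tuple((s + t) / 2 for s, t in zip(a, b))

for q in (0.5, 1.0, 2.0):
    # A convex unit ball must contain the midpoint of any two of its points.
    print(q, lq(mid, q), lq(mid, q) <= 1.0)
```

<p>The midpoint has size $2^{1/q - 1}$, which exceeds $1$ exactly when $q$ is below $1$.</p>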
<p>Geometrically, this “minimal” amount of convexity is quite important. The reason the $\ell^1$-norm still provides selection is due to the presence of corners. The penalized/constrained optimization problem of computing an estimator can be seen geometrically as finding the intersection of the norm ball and the contours of the squared error function; given that the corners of the $\ell^1$ ball are on the axes (representing few non-zero coefficients), such a penalization will provide a sparse estimator. Below is a figure from Tibshirani 1996.</p>
<figure>
<img src="/images/relax-fig2.png" />
</figure>
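<p>The shrink-versus-select behavior of the three penalties is easiest to see when $X$ has orthonormal columns, since the penalized problem then splits into one scalar problem per coordinate of the OLS estimate $z$. The sketch below assumes that setting and introduces a tuning weight <code>lam</code> in front of the penalty (the formulas above implicitly take it to be 1); the input numbers are made up.</p>

```python
def ridge(z, lam):
    """argmin_t (z - t)^2 + lam * t^2: pure shrinkage, never exactly zero."""
    return z / (1.0 + lam)

def hard_threshold(z, lam):
    """argmin_t (z - t)^2 + lam * [t != 0]: keep z or kill it, no shrinkage."""
    return z if z * z > lam else 0.0

def soft_threshold(z, lam):
    """argmin_t (z - t)^2 + lam * |t|: shrink by lam / 2, then clip at zero."""
    shrunk = max(abs(z) - lam / 2.0, 0.0)
    return shrunk if z >= 0 else -shrunk

z_ols = [3.0, -1.5, 0.4, -0.2]    # hypothetical OLS coefficients
lam = 1.0
ridge_fit = [ridge(z, lam) for z in z_ols]           # l^2 penalty
l0_fit = [hard_threshold(z, lam) for z in z_ols]     # "l^0" penalty
lasso_fit = [soft_threshold(z, lam) for z in z_ols]  # l^1 penalty
print(ridge_fit, l0_fit, lasso_fit)
```

<p>Ridge rescales every coordinate, hard thresholding keeps or deletes, and soft thresholding does a bit of both, which is the one-dimensional picture behind the corners of the $\ell^1$ ball.</p>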
<blockquote>
<p><strong>Theorem.</strong> With high probability, the mean squared error of the lasso estimator is within a $\log p$ factor of the “best-possible” estimator (i.e., the estimator we could compute if we knew in advance which coefficients were nonzero).</p>
</blockquote>
<p>For precise results on the performance of the lasso, see <a href="http://arxiv.org/abs/math/0506081">Candes, Tao 2007</a> and <a href="http://arxiv.org/abs/0801.1095">Bickel, Ritov, Tsybakov 2009</a>.</p>
<h3 id="matrix-completion">Matrix Completion</h3>
<p>This post has already gotten quite long, and there is fortunately a wealth of popular science literature on matrix completion, so I will keep this next section particularly vague and imprecise for the sake of brevity. </p>
<p>The matrix completion problem is commonly introduced in the context of the <em>Netflix prize</em> (hence the image for this post). Suppose we have an array where each column represents a film, each row represents a user, and each element is the user’s rating of a particular film. We might have many users, but each user is likely to rate only a small subset of all the films available. Netflix had a contest in the mid-2000s to develop a method to predict users’ ratings for films. This might seem quite hard, but it’s natural to assume that among several million users, there are perhaps only a few thousand “typical user” profiles which constitute the preferences of all the other users.</p>
<p>Mathematically, let $M\in \mathbb{R}^{n_1\times n_2}$. Suppose we only observe $M_{ij}$ for $(i,j) \in N \subset [n_1]\times [n_2]$. The problem is to reconstruct the rest of $M$ given these few observations. In general, this is impossible! But the “typical user” assumption in the context of the Netflix prize can be interpreted as a <em>low rank</em> assumption. That is, we wish to solve the optimization problem:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
\text{ min } & \text{rank}(X) \\
\text{ s.t. } & X \in \mathbb{R}^{n_1\times n_2}\\
& X_{ij} = M_{ij} \quad \forall (i,j) \in N
\end{align*}
%]]></script>
<p>Once again, we have a combinatorial selection problem, this time due to the rank, and once again, we can attempt to solve the convexification:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
\text{ min } & \|X\|_* \\
\text{ s.t. } & X \in \mathbb{R}^{n_1\times n_2}\\
& X_{ij} = M_{ij} \quad \forall (i,j) \in N,
\end{align*}
%]]></script>
<p>where <script type="math/tex">\|X\|_*</script> is the <em>nuclear norm</em> of $X$, the sum of its singular values. </p>
<blockquote>
<p><strong>Theorem.</strong> Under some regularity conditions on the matrix $M$, and assuming we observe sufficiently many random entries of $M$, nuclear norm minimization will exactly recover $M$ with high probability. </p>
</blockquote>
<p>For precise results, see <a href="http://arxiv.org/abs/math/0502327">Candes, Tao 2004</a>, <a href="http://arxiv.org/abs/0805.4471">Candes, Recht 2009</a> and <a href="http://arxiv.org/abs/0903.1476">Candes, Tao 2010</a>.</p>
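<p>As a toy illustration only (far from the random, large-scale regime of the theorem), take a $2\times 2$ matrix with one missing entry. For $2 \times 2$ matrices the nuclear norm has a closed form, so a grid search over the missing entry suffices. The instance and the function name below are hypothetical.</p>

```python
import math

def nuclear_norm_2x2(a, b, c, d):
    """Sum of singular values of [[a, b], [c, d]].

    Uses (s1 + s2)^2 = s1^2 + s2^2 + 2*s1*s2 = ||A||_F^2 + 2*|det A|.
    """
    return math.sqrt(a * a + b * b + c * c + d * d + 2.0 * abs(a * d - b * c))

# Hypothetical instance: M = [[1, 1], [1, ?]] with the last entry unobserved.
grid = [k / 100.0 for k in range(-300, 301)]
best_t = min(grid, key=lambda t: nuclear_norm_2x2(1.0, 1.0, 1.0, t))
print(best_t, nuclear_norm_2x2(1.0, 1.0, 1.0, best_t))
```

<p>The minimizer fills in the value that makes the matrix rank one, which is also what the rank-minimization problem would choose here.</p>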
<h2 id="related-notions">Related Notions</h2>
<p>The principle of convexification occurs in at least two other contexts, neither of which I will go into any detail on at all:</p>
<ol>
<li>The <em>Bethe approximation</em> of the log partition function of a Gibbs measure is not necessarily convex, but a “convexified” version can give an upper bound, as analyzed by <a href="http://arxiv.org/abs/1301.0610">Wainwright, Jaakkola, Willsky 2005</a>.</li>
<li>Convexification by applying the Legendre-Fenchel transform is a technique used in <a href="http://en.wikipedia.org/wiki/Calculus_of_variations">Calculus of Variations</a>, an area of analysis focusing on infinite-dimensional optimization problems.</li>
</ol>
<p>For a broad view of convex relaxation and associated computational and statistical problems, see <a href="http://arxiv.org/abs/1211.1073">Chandrasekaran, Jordan 2013</a>.</p>
<div class="footnotes">
<ol>
<li id="fn:poly">
<p>Indeed, even loading an arbitrary real number into a computer would take infinite time and memory! <a href="#fnref:poly" class="reversefootnote">↩</a></p>
</li>
<li id="fn:RvZ">
<p>Naive intuition would suggest the opposite: that by restricting to $\mathbb{Z}$, there are <em>fewer</em> $x$ to “pick from” as compared with searching through $\mathbb{R}$, so it should be easier to find an optimal solution. Unfortunately, the additional constraint tends to make the optimization more difficult. The key is to remember that the structure which made LP tractable is removed with the addition of an integer constraint. <a href="#fnref:RvZ" class="reversefootnote">↩</a></p>
</li>
<li id="fn:linearnote">
<p>This is the general setup of <em>linear</em> regression, but for more general problems, we can fix a <em>dictionary</em> of functions <script type="math/tex">\varphi_1,\cdots, \varphi_{p^*} : \mathbb{R}^p \rightarrow \mathbb{R}</script>, and solve the linear regression problem in $p^*$ dimensions. <a href="#fnref:linearnote" class="reversefootnote">↩</a></p>
</li>
<li id="fn:l0foot">
<p>Note that this is <em>not</em> a norm due to the lack of positive homogeneity. However, it is graced with such a name since <script type="math/tex">\lim_{q\rightarrow 0} \|\theta\|_q^q = \|\theta\|_0</script>. <a href="#fnref:l0foot" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>
<p><a href="http://ssk.im/blog/optimization-relaxations/">Relaxations in Optimization</a> was originally published by Steven Soojin Kim at <a href="http://ssk.im">Steven Soojin Kim</a> on 2014.08.31.</p><![CDATA[Poincaré Inequalities]]>http://ssk.im/blog/poincare-inequalities2014-07-23T00:00:00-04:002014-07-23T00:00:00-04:00Steven Soojin Kimhttp://ssk.imsteven@ssk.im
<ol id="markdown-toc">
<li><a href="#the-classical-poincar-inequality">The (classical) Poincaré inequality</a> <ol>
<li><a href="#the-poincar-constant">The Poincaré constant</a></li>
</ol>
</li>
<li><a href="#probabilistic-poincar-inequality">Probabilistic Poincaré inequality</a></li>
<li><a href="#gaussian-poincar-inequality">Gaussian Poincaré inequality</a> <ol>
<li><a href="#proof-by-efron-stein-inequality">Proof by Efron-Stein inequality</a></li>
<li><a href="#proof-by-markov-semigroups">Proof by Markov semigroups</a></li>
</ol>
</li>
<li><a href="#other-poincar-inequalities">Other Poincaré inequalities</a></li>
</ol>
<h2 id="the-classical-poincar-inequality">The (classical) Poincaré inequality</h2>
<p>In functional analysis, Sobolev inequalities and Morrey’s inequalities are a collection of useful estimates which quantify the tradeoff between integrability and smoothness. The ability to compare such properties is particularly useful when studying regularity of PDEs, or when attempting to show boundedness in a particular space in order to apply the direct method in the calculus of variations. </p>
<p>The Poincaré inequality is an example of this kind of estimate. Let $1\le p \le \infty$ and $U$ a bounded, connected, open subset of $\mathbb{R}^n$, with $C^1$ boundary. For $f:U\rightarrow\mathbb{R}$, denote by $(f)_U = \frac{1}{\vert U\vert} \int_U f(x)\,dx$ the average of $f$ over $U$.</p>
<blockquote>
<p><strong>Theorem.</strong> There exists a constant $c=c(n,p,U)$ such that</p>
<script type="math/tex; mode=display"> \lVert f - (f)_U\rVert_{L^p(U)} \le c \, \lVert\nabla f\rVert_{L^p(U)}, </script>
<p>for all $f \in W^{1,p}(U)$.</p>
</blockquote>
<p>A simple case is when $n=1$, $p=2$, $f \in C^1$, and $U=[-r,r]$. Using the intermediate value theorem, the fundamental theorem of calculus and Hölder’s inequality, one can easily prove the result with the constant $c= 2r$. For a proof of the general result, see Evans §5.8.1.</p>
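<p>The one-dimensional case is easy to sanity-check numerically. The sketch below approximates both sides with a midpoint rule; the test function $f = \exp$ and the value of $r$ are arbitrary choices of mine, and nothing here replaces the proof.</p>

```python
import math

def poincare_sides(f, df, r, m=20000):
    """Midpoint-rule estimates of ||f - (f)_U||_2 and ||f'||_2 on U = [-r, r]."""
    h = 2.0 * r / m
    xs = [-r + (k + 0.5) * h for k in range(m)]
    mean = sum(f(x) for x in xs) * h / (2.0 * r)      # the average (f)_U
    lhs = math.sqrt(sum((f(x) - mean) ** 2 for x in xs) * h)
    rhs = math.sqrt(sum(df(x) ** 2 for x in xs) * h)
    return lhs, rhs

r = 1.5
lhs, rhs = poincare_sides(math.exp, math.exp, r)   # exp is its own derivative
ratio = lhs / rhs                                  # the theorem predicts ratio <= 2 * r
print(ratio, 2 * r)
```

<p>The observed ratio sits well below $2r$, consistent with $2r$ being a convenient constant rather than the optimal one.</p>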
<p>One way to interpret the Poincaré inequality is as an isoperimetric inequality applied to the level sets of $f$. That is, just as one can control the area of a set via its perimeter (for example, a circle is the unique maximizer of area for a given perimeter), one can control the norm of $f$ via the norm of $\nabla f$. For a discussion of how one might obtain the Poincaré inequality from isoperimetry, see <a href="http://maze5.net/?page_id=790">Nick Alger’s blog</a>. To see how to recover the isoperimetric inequality from the Poincaré inequality, see <a href="http://cornellmath.wordpress.com/2008/05/16/two-cute-proofs-of-the-isoperimetric-inequality/">Peter Luthy’s blog post</a>.</p>
<h3 id="the-poincar-constant">The Poincaré constant</h3>
<p>Aside from the trivial $n=1$ case, what can we say about the constant $c$? When can we recover the <em>optimal</em> constant? Consider the case $p=2$, so the square of the right-hand side of the Poincaré inequality is, up to the constant, the Dirichlet energy of $f$. The min-max principle says that the first eigenvalue of the negative Laplacian on $H_0^1(U)$ is the minimum of the Rayleigh quotient. That is, where $0 < \lambda_1 \le \lambda_2 \le \cdots$ are the eigenvalues of $-\Delta$,</p>
<script type="math/tex; mode=display"> \lambda_1 = \inf_{f \ne 0} \frac{\int_U \vert \nabla f\vert^2 dx}{\int_U \vert f \vert^2 dx}. </script>
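<p>This recovers a Poincaré-type inequality $\lVert f\rVert_{L^2(U)}^2 \le \lambda_1^{-1} \lVert \nabla f\rVert_{L^2(U)}^2$ for $f \in H_0^1(U)$, with (optimal) constant $\lambda_1^{-1}$; in the unsquared form used above, this is $c = \lambda_1^{-1/2}$. </p>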
<p>Note that $\lambda_1$ is related to the <a href="http://en.wikipedia.org/wiki/Cheeger_constant">Cheeger constant</a>, so we have yet another way to make the connection to isoperimetry. For an exposition of this connection in the (relatively) simple case of graphs, see §2.3 of Chung’s book on <a href="http://www.math.ucsd.edu/~fan/research/cb/ch2.pdf">Spectral Graph Theory</a>.</p>
<h2 id="probabilistic-poincar-inequality">Probabilistic Poincaré inequality</h2>
<p>What if we want to prove a Poincaré-type inequality, except we would like to integrate over a measure $\mu$ other than Lebesgue measure? In the PDE and numerical analysis literature, one can find references to “weighted Poincaré inequalities”, where $d\mu(x)=w(x)dx$ is absolutely continuous with respect to Lebesgue measure with density (or “weight”) $w$ on some bounded domain. Typically, these estimates follow from analytical methods. </p>
<p>Instead, let’s try a probabilistic approach. Let $(M,d)$ be a metric space, $\mu$ a probability measure on its Borel sets. Consider the case $p=2$, so that the squared $L^2(\mu)$ norm of $f - \mathbb{E}_\mu f$ is just the variance of the random variable $f$. For some class of measurable functions $f$ on $(M,d)$, we’d like to show that there exists a constant $C$ such that</p>
<script type="math/tex; mode=display">
\begin{equation} \label{mupoinc}
\textrm{Var}_\mu (f) \le C\,\mathbb{E}_\mu \vert \nabla f \vert^2 .
\end{equation}
</script>
<p>This inequality claims that the fluctuations of $f$ are controlled by how quickly $f$ can change. In particular, if $f$ is Lipschitz, then the variance is bounded by a constant! Analogous to the PDE case, “regularity” or “smoothness” (via the first derivative) gives an energy bound which restricts the possible behaviors of the system/function $f$.</p>
<p>Bounds on variance can be manipulated to develop <em>concentration inequalities</em> which bound the probability of $f$ deviating too far from its mean/median. Such estimates are a fundamental tool in statistics and machine learning (e.g., PAC learning, VC theory). </p>
<h2 id="gaussian-poincar-inequality">Gaussian Poincaré inequality</h2>
<blockquote>
<p><strong>Theorem (GPI).</strong>
Let $\mu$ be the standard Gaussian measure on $M=\mathbb{R}^n$. Assume $f:\mathbb{R}^n\rightarrow\mathbb{R}$ is $C^1$. Then, \eqref{mupoinc} holds with $C=1$. </p>
</blockquote>
<p>Note that this inequality is tight! Consider $f(x) = x_1 + x_2 + \cdots + x_n$ and note that $\mu$ is a product measure so there is no covariance between cross terms.</p>
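<p>Both the inequality and its tightness can be eyeballed with a quick Monte Carlo experiment in dimension one. This is only a sketch: the sample size, the seed and the test function $\sin$ are my own choices, and the estimates carry sampling error.</p>

```python
import math
import random

def gpi_sides(f, df, samples):
    """Sample estimates of Var f(X) and E[f'(X)^2] for X ~ N(0, 1)."""
    vals = [f(x) for x in samples]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    energy = sum(df(x) ** 2 for x in samples) / len(samples)
    return var, energy

rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(100_000)]

var_sin, energy_sin = gpi_sides(math.sin, math.cos, samples)           # strict inequality
var_lin, energy_lin = gpi_sides(lambda x: x, lambda x: 1.0, samples)   # equality case
print(var_sin, energy_sin, var_lin, energy_lin)
```

<p>For comparison, the exact values are $\textrm{Var}\,\sin X = (1-e^{-2})/2$ and $\mathbb{E}\cos^2 X = (1+e^{-2})/2$, while the linear function gives $1$ on both sides.</p>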
<p><strong>Applications:</strong></p>
<ol>
<li>Consider the $(1+1)$-dimensional Gaussian polymer model. The variance of the ground state energy (the minimum of sums of Gaussians) is bounded by $n+1$, which is surprising given that the expected size grows linearly in $n$. </li>
<li>Consider the Sherrington-Kirkpatrick model with inverse temperature $\beta$. The variance of its free energy (the normalized log partition function) is bounded by $C(\beta)n$.</li>
<li>Let $(g_1,\cdots,g_n)$ be jointly Gaussian (possibly correlated). Then the variance of the maximum of $(g_i)$ is bounded by the maximum of the variances.</li>
</ol>
<p>For these examples and others, the bounds produced by the Gaussian Poincaré inequality are known to be suboptimal. For more on when/why this occurs, take a look at these notes on <a href="http://cims.nyu.edu/~nica/CPSS_2012_Notes.pdf">superconcentration</a>.</p>
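<p>The third application is a nice place to see this suboptimality empirically. The sketch below takes the simplest case of independent standard Gaussians (my choice; the statement allows correlations) and estimates the variance of the maximum by simulation.</p>

```python
import random

def variance_of_max(n, trials, rng):
    """Monte Carlo estimate of Var(max(g_1, ..., g_n)) for i.i.d. N(0, 1)."""
    maxima = [max(rng.gauss(0.0, 1.0) for _ in range(n)) for _ in range(trials)]
    mean = sum(maxima) / trials
    return sum((m - mean) ** 2 for m in maxima) / trials

rng = random.Random(0)
estimates = {n: variance_of_max(n, 5000, rng) for n in (1, 10, 100)}
print(estimates)
```

<p>The Poincaré bound is $1$ for every $n$, whereas the simulated variance decreases as $n$ grows.</p>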
<h3 id="proof-by-efron-stein-inequality">Proof by Efron-Stein inequality</h3>
<p>We first prove the Efron-Stein inequality, which will allow us to analyze the variability of $f$ in a coordinate-wise manner. Extending a result from one dimension to arbitrary dimension is an example of “tensorization”.</p>
<p>Let $X_1,\cdots,X_n$ be independent random variables such that $X_k$ takes values in some space $\mathcal{X}_k$, and $f:\prod_k \mathcal{X}_k \rightarrow\mathbb{R}$ a measurable function. Let $\mathbb{E}_i$ denote the expectation with respect to the $i$-th coordinate; that is, </p>
<script type="math/tex; mode=display">\mathbb{E}_i f(X) = \mathbb{E}[f(X) \vert X_1,\cdots, X_{i-1}, X_{i+1}, \cdots, X_n],</script>
<p>and let $\textrm{Var}_i$ denote the conditional variance,</p>
<script type="math/tex; mode=display"> \textrm{Var}_i f(X) = \mathbb{E}_i \left[ (f(X) - \mathbb{E}_i f(X))^2 \right] .</script>
<blockquote>
<p><strong>Theorem.</strong></p>
<script type="math/tex; mode=display"> \textrm{Var} \,f(X) \le \mathbb{E} \sum_{i=1}^n \textrm{Var}_i f(X)</script>
</blockquote>
<p><strong>Proof.</strong> First note that if $f(x) = \sum_{i=1}^n x_i$, we have an exact equality since $X_i - \mathbb{E} X_i$ are orthogonal in $L^2$. More generally, we’d like to bound the variance of $f$ by expressing $f(X) - \mathbb{E} f(X)$ as the sum of martingale differences, and somehow exploit the orthogonality of those differences.</p>
<p>Let $Z = f(X)$, $Y = Z - \mathbb{E}Z$, and </p>
<script type="math/tex; mode=display"> Y_i = \mathbb{E}_{(i+1):n} Z - \mathbb{E}_{i:n} Z </script>
<p>for $i=1,\cdots, n$. Then,</p>
<script type="math/tex; mode=display"> \textrm{Var} Z = \mathbb{E}\,Y^2 = \sum_{i=1}^n \mathbb{E}\, Y_i^2 + \sum_{i \ne j} \mathbb{E} Y_i Y_j </script>
<p>The cross terms evaluate to zero due to elementary properties of conditional expectation. As for the first sum, by Jensen’s inequality,</p>
<script type="math/tex; mode=display"> Y_i^2 = ( \mathbb{E}_{(i+1):n} ( Z -\mathbb{E}_{i} Z) )^2 \le \mathbb{E}_{(i+1):n} [ (Z - \mathbb{E}_{i} Z)^2] </script>
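<p>Taking expectations and using the tower property, $\mathbb{E}\,Y_i^2 \le \mathbb{E}\,(Z - \mathbb{E}_i Z)^2 = \mathbb{E}\,\textrm{Var}_i f(X)$; summing over $i$ gives the claim. $ \square$</p>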
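<p>On a small discrete space the Efron-Stein inequality can be checked exactly by enumeration. The sketch below assumes independent fair bits (my choice of distribution) and compares the two sides for the maximum and for the sum.</p>

```python
from itertools import product

def efron_stein_sides(f, n):
    """Exact Var f(X) and E sum_i Var_i f(X) for X uniform on {0, 1}^n."""
    points = list(product((0, 1), repeat=n))
    mean = sum(f(x) for x in points) / len(points)
    var = sum((f(x) - mean) ** 2 for x in points) / len(points)
    bound = 0.0
    for x in points:
        for i in range(n):
            # Resample coordinate i while holding the other coordinates fixed.
            a = f(x[:i] + (0,) + x[i + 1:])
            b = f(x[:i] + (1,) + x[i + 1:])
            bound += (a - b) ** 2 / 4.0    # variance of a fair two-point law
    return var, bound / len(points)

var_max, bound_max = efron_stein_sides(max, 3)
var_sum, bound_sum = efron_stein_sides(sum, 3)
print(var_max, bound_max, var_sum, bound_sum)
```

<p>The maximum gives a strict inequality ($7/64$ against $3/16$), while the sum attains equality, as noted at the start of the proof.</p>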
<p><strong>Proof of GPI.</strong></p>
<p>Suppose $X \sim \mu$, so that <script type="math/tex">\textrm{Var}_\mu(f) = \textrm{Var} f(X)</script>. Assume $\mathbb{E} \lvert \nabla f(X) \rvert^2 < \infty$, since otherwise the result holds trivially. It is sufficient to consider the case $n=1$ since the Efron-Stein inequality allows us to analyze each coordinate separately. Also, suppose $f$ has compact support and is $C^2$ – otherwise, just approximate. </p>
<p>The key insight here is that a Gaussian random variable $X$ is just the scaling limit of a sum of mean 0 variance 1 random variables. Thus, to study the variance of a function of $X$, we can try to study the variance of a function of finite sums. To this end, let $\epsilon_1,\cdots,\epsilon_n$ be independent Rademacher random variables, and let $S_n = \frac{1}{\sqrt{n}} \sum_{j=1}^n \epsilon_j$. For all $i$,</p>
<script type="math/tex; mode=display">
\textrm{Var}_i f(S_n) = \tfrac{1}{4} \left[ f( S_n - \tfrac{\epsilon_i}{\sqrt{n}} + \tfrac{1}{\sqrt{n}} ) - f( S_n- \tfrac{\epsilon_i}{\sqrt{n}} - \tfrac{1}{\sqrt{n}} )\right]^2.
</script>
<p>Apply Efron-Stein to get</p>
<script type="math/tex; mode=display"> \textrm{Var} f(S_n) \le \tfrac{1}{4}\sum_{i=1}^n \mathbb{E}\left[ f( S_n - \tfrac{\epsilon_i}{\sqrt{n}} + \tfrac{1}{\sqrt{n}} ) - f( S_n- \tfrac{\epsilon_i}{\sqrt{n}} - \tfrac{1}{\sqrt{n}} )\right]^2 .</script>
<p>The central limit theorem says $S_n \Rightarrow N(0,1)$, so $\textrm{Var} f(S_n) \rightarrow \textrm{Var} f(X)$. Let $K=\sup_x \vert f''(x)\vert$. By Taylor’s theorem,</p>
<script type="math/tex; mode=display"> \left\vert f( S_n - \tfrac{\epsilon_i}{\sqrt{n}} + \tfrac{1}{\sqrt{n}} ) - f( S_n - \tfrac{\epsilon_i}{\sqrt{n}} - \tfrac{1}{\sqrt{n}} ) \right\vert \le \tfrac{2}{\sqrt{n}} \vert f'(S_n)\vert + \tfrac{2K}{n} . </script>
<p>Then apply the CLT again,</p>
<script type="math/tex; mode=display"> \limsup_{n\rightarrow\infty}\tfrac{1}{4}\sum_{i=1}^n \mathbb{E}\left[ \left( f( S_n - \tfrac{\epsilon_i}{\sqrt{n}} + \tfrac{1}{\sqrt{n}} ) - f( S_n- \tfrac{\epsilon_i}{\sqrt{n}} - \tfrac{1}{\sqrt{n}} ) \right)^2\right] \le \mathbb{E}[f'(X)^2] .</script>
<p>$\square$</p>
<p>This proof (along with several related results) can be found in the book by <a href="http://books.google.com/books/about/Concentration_Inequalities.html?id=koNqWRluhP0C">Boucheron, Lugosi, Massart</a>.</p>
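<p>The Rademacher approximation in this proof can be computed exactly for a fixed $n$, since $S_n$ takes finitely many values with binomial weights. The sketch below does this for $f = \sin$ (my choice; it is smooth but not compactly supported, so it only illustrates the mechanics) and compares the Efron-Stein bound with its Gaussian limit.</p>

```python
import math

def rademacher_sides(n):
    """Exact Var sin(S_n) and the Efron-Stein bound for f = sin.

    S_n = n^{-1/2} * (sum of n independent signs). Removing one sign leaves
    T = n^{-1/2} * (sum of n - 1 signs), so every term of the bound equals
    (1/4) * E[(sin(T + h) - sin(T - h))^2] with h = n^{-1/2}.
    """
    h = 1.0 / math.sqrt(n)

    def expect(g, m):
        # E g((2k - m) * h) with k ~ Binomial(m, 1/2)
        return sum(math.comb(m, k) / 2 ** m * g((2 * k - m) * h) for k in range(m + 1))

    mean = expect(math.sin, n)
    var = expect(lambda s: (math.sin(s) - mean) ** 2, n)
    bound = n / 4.0 * expect(lambda t: (math.sin(t + h) - math.sin(t - h)) ** 2, n - 1)
    return var, bound

var_n, bound_n = rademacher_sides(400)
limit_bound = (1.0 + math.exp(-2.0)) / 2.0    # E[cos(X)^2] for X ~ N(0, 1)
print(var_n, bound_n, limit_bound)
```

<p>Already at $n=400$ the discrete bound is close to $\mathbb{E}[f'(X)^2]$, which is the limit the proof extracts via the CLT.</p>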
<h3 id="proof-by-markov-semigroups">Proof by Markov semigroups</h3>
<p>For a proof of a rather different flavor, we take a more dynamical view, and think of the Gaussian measure $\mu$ as the invariant measure of some system.</p>
<p>Let $(X_t)$ be a Markov process in some state space, $P_t$ its semigroup, $L$ its generator, $f$ an element of some appropriate domain of test functions. Suppose $(X_t)$ has an invariant measure $\mu$. Note that $\mu$ defines a natural $L^2$ space and an inner product, but we can define another bilinear form $\mathcal{E}$, the <em>Dirichlet form</em>:</p>
<script type="math/tex; mode=display"> \mathcal{E}(f,g) := -(f,Lg) = -\int f Lg \, d\mu. </script>
<p>If $L$ is self-adjoint (with respect to $\mu$ inner-product), the Dirichlet form is symmetric. Note that $L$ is self-adjoint when the Markov process $(X_t)$ is reversible. A related bilinear form is the covariance,</p>
<script type="math/tex; mode=display"> \mathrm{Cov}_\mu(f,g) := \int fg\,d\mu - \int f\,d\mu \, \int g\,d\mu </script>
<blockquote>
<p><strong>Covariance Lemma.</strong>
Let $f,g\in L^2(\mu)$. Suppose when differentiating $(f,P_tg)$ with respect to $t$, we can move the derivative inside. Also, assume the “heat equation” $\partial_t P_t g = L P_t g$ holds. Then,</p>
<script type="math/tex; mode=display"> \mathrm{Cov}_\mu(f,g) = \int_0^\infty \mathcal{E}(f,P_tg)\,dt .</script>
</blockquote>
<p><strong>Proof.</strong>
Note that $P_tg$ tends to $\mathbb{E}_\mu g$ in $L^2$ as $t\rightarrow\infty$. Thus,</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
\mathrm{Cov}_\mu(f,g) &= (f,g) - (f,\mathbb{E}_\mu g)\\
&= \lim_{t\rightarrow\infty}\left[ (f,P_0g) - (f,P_tg)\right] \\
&= - \int_0^\infty \partial_t (f,P_tg)\, dt\\
&= - \int_0^\infty (f,\partial_t P_tg) \,dt\\
&= - \int_0^\infty (f,L P_tg)\,dt \\
&= \int_0^\infty \mathcal{E}(f,P_tg)\,dt
\end{align*}
%]]></script>
<p>$\square$</p>
<p>What does $\mathcal{E}$ look like? Consider gradient diffusions. That is, for some potential $V$,</p>
<script type="math/tex; mode=display"> Lf = \nabla V \cdot \nabla f + \Delta f </script>
<p>One can show that this is the generator for the diffusion $dX_t = \nabla V(X_t) dt + \sqrt{2} dB_t$, and has invariant measure $\gamma(dx) = e^{V(x)}\,dx$. That is, $\int Lf \, e^V \,dx = 0$ for all $f$, or $L^\star e^V =0$. Using integration by parts (assuming appropriate boundary conditions on $f,g$),</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
\mathcal{E}(f,g) &= - (f, Lg)_\gamma\\
&= -(f, \nabla V \cdot \nabla g + \Delta g)_\gamma\\
&= -(f,\nabla V \cdot \nabla g)_\gamma + (\nabla f + (\nabla V)f, \nabla g)_\gamma\\
&= -(f,\nabla V \cdot \nabla g)_\gamma + (f, \nabla V \cdot \nabla g)_\gamma + (\nabla f , \nabla g)_\gamma\\
&= (\nabla f, \nabla g)_\gamma
\end{align*}
%]]></script>
<p><strong>Proof of GPI</strong></p>
<p>Consider the example of the OU operator, $L= -x \cdot \nabla + \Delta$. Then, the stationary distribution is the standard Gaussian $\mu$. Note that we can explicitly write the semigroup as $P_t f(x) = \mathbb{E}[f(e^{-t}x + \sqrt{1-e^{-2t}}Z)]$, where $Z \sim \mu$ is a standard Gaussian, so $\nabla P_t f = e^{-t} P_t \nabla f$. By the covariance lemma,</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
\textrm{Var}_{\mu} f &= \int_0^\infty \mathcal{E}(f,P_tf) \,dt\\
&= \int_0^\infty (\nabla f , \nabla P_t f)_\mu\,dt \\
&= \int_0^\infty e^{-t} \mathbb{E}_{\mu} (\nabla f \cdot P_t \nabla f)\, dt\\
(\text{Cauchy-Schwarz}) &\le \int_0^\infty e^{-t} \mathbb{E}_{\mu} \vert\nabla f\vert\, \vert P_t \nabla f\vert\, dt\\
(\text{Hölder}) &\le \int_0^\infty e^{-t} \left(\mathbb{E}_{\mu} \vert\nabla f\vert^2 \, \mathbb{E}_{\mu} \vert P_t \nabla f\vert^2\right)^{1/2}dt \\
(\text{Jensen on } P_t) &\le \int_0^\infty e^{-t} \left(\mathbb{E}_{\mu} \vert\nabla f\vert^2 \, \mathbb{E}_{\mu} P_t \vert\nabla f\vert^2\right)^{1/2}dt\\
&=\mathbb{E}_{\mu}\vert\nabla f\vert^2 \int_0^\infty e^{-t} dt \\
&= \mathbb{E}_{\mu}\vert\nabla f\vert^2
\end{align*}
%]]></script>
<p>$\square$</p>
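<p>The commutation identity $\nabla P_t f = e^{-t} P_t \nabla f$ that drives this proof can be checked numerically in one dimension. The sketch below approximates the explicit formula for $P_t$ by an average over a fixed set of Gaussian draws; the test function $\sin$, the point $x$, the time $t$ and the step $h$ are all arbitrary choices of mine.</p>

```python
import math
import random

def ou_semigroup(f, t, x, zs):
    """Monte Carlo P_t f(x) = E f(e^{-t} x + sqrt(1 - e^{-2t}) Z) over fixed draws zs."""
    a, s = math.exp(-t), math.sqrt(1.0 - math.exp(-2.0 * t))
    return sum(f(a * x + s * z) for z in zs) / len(zs)

rng = random.Random(0)
zs = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
t, x, h = 0.7, 0.3, 1e-4

# Left side: d/dx of P_t f by a central difference, reusing the same draws.
lhs = (ou_semigroup(math.sin, t, x + h, zs) - ou_semigroup(math.sin, t, x - h, zs)) / (2 * h)
# Right side: e^{-t} P_t f' with f' = cos.
rhs = math.exp(-t) * ou_semigroup(math.cos, t, x, zs)
print(lhs, rhs)
```

<p>Because both sides use the same draws, they agree up to the finite-difference error rather than up to Monte Carlo error.</p>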
<h2 id="other-poincar-inequalities">Other Poincaré inequalities</h2>
<p>Earlier, we claimed that we want to prove \eqref{mupoinc} for certain measures $\mu$. The GPI is one example, but it could be more useful to think of it as a special case of an inequality like</p>
<p>\begin{equation} \label{genpoinc}
\textrm{Var}_\mu f \le C \, \mathcal{E}(f,f)
\end{equation}
where $\mathcal{E}$ is the Dirichlet form associated with some Markov process with invariant measure $\mu$. </p>
<p>For example, in the case of finite-state Markov chains, one can analyze the constant $C$ using the <em>canonical paths</em> method. For Poincaré inequalities on Markov random fields, see <a href="http://projecteuclid.org/euclid.aop/1163517230">Wu AoP 2006</a>.</p>
<p>It is possible to prove that $-L$ is positive semi-definite, so it is reasonable to ask whether its spectrum $\lambda_0 \le \lambda_1 \le \cdots $ encodes nice properties. Note that constant functions are in the null space of $L$, and in fact one can prove that the eigenspace of $\lambda_0= 0$ is one-dimensional. Using Plancherel identity in $L^2(\mu)$, one can show that a Poincaré inequality \eqref{genpoinc} holds iff $\lambda_1$ is strictly positive, in which case the optimal constant is $C = \frac{1}{\lambda_1}$. </p>
<p>The persistent appearance of the spectral gap $\lambda_1$ is suggestive of the deep connection between <em>concentration</em> and <em>ergodicity</em>. In some sense, both notions encode some sort of rigidity of a system. For example,</p>
<blockquote>
<p><strong>Theorem.</strong> Let $P_t$ be a Markov semigroup with stationary measure $\mu$. Let $f$ be in the domain of $L$, and $C> 0$. The following are equivalent:</p>
<ol>
<li><em>(Poincaré inequality)</em> $\mu$ satisfies \eqref{genpoinc} with constant $C$.</li>
<li><em>(Exponential decay)</em> For all $t$,</li>
</ol>
<script type="math/tex; mode=display"> \left\lVert P_tf - \mathbb{E}_\mu f \right \rVert_{L^2(\mu)} \le e^{- t/C} \lVert f - \mathbb{E}_\mu f\rVert_{L^2(\mu)}.</script>
</blockquote>
<p><strong>Proof.</strong></p>
<p>Assume 1. Then, </p>
<script type="math/tex; mode=display">\frac{d}{dt}\textrm{Var}_\mu (P_tf ) = -2\mathcal{E}(P_tf,P_tf) \le -\frac{2}{C} \textrm{Var}_\mu (P_tf) </script>
<p>where the equality comes from the definition of the Dirichlet form and the stationarity of $\mu$, and the inequality is the Poincaré inequality applied to $P_tf$. Thus, by Grönwall’s inequality, <script type="math/tex">\textrm{Var}_\mu (P_t f) \le e^{-2t/C} \textrm{Var}_\mu f</script>, and note that <script type="math/tex">\mathbb{E}_\mu f = \mathbb{E}_\mu P_tf</script>.</p>
<p>Assume 2. Then, using the equality from the previous part again at time $t=0$,</p>
<script type="math/tex; mode=display"> 2 \mathcal{E}(f,f) = - \lim_{t\downarrow 0} \frac{\textrm{Var}_\mu(P_tf) - \textrm{Var}_\mu f }{t} \ge \textrm{Var}_\mu f \, \lim_{t\downarrow 0} \frac{1 - e^{-2t/C}}{t} = \frac{2}{C} \textrm{Var}_\mu f</script>
<p>$\square$.</p>
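<p>The equivalence is easy to watch on the smallest possible example, a two-state continuous-time Markov chain. In the sketch below the jump rates and the test function are hypothetical, and $P_t$ is approximated by Euler steps rather than computed exactly.</p>

```python
import math

a, b = 2.0, 1.0                       # hypothetical jump rates 0 -> 1 and 1 -> 0
L = [[-a, a], [b, -b]]                # generator
pi = [b / (a + b), a / (a + b)]       # stationary distribution: pi L = 0

def apply(Lmat, f):
    """Matrix-vector product (L f)_i = sum_j L_ij f_j."""
    return [sum(Lmat[i][j] * f[j] for j in range(2)) for i in range(2)]

def variance(f):
    mean = sum(p * v for p, v in zip(pi, f))
    return sum(p * (v - mean) ** 2 for p, v in zip(pi, f))

def dirichlet(f):
    """E(f, f) = -(f, L f) in L^2(pi)."""
    return -sum(p * v * w for p, v, w in zip(pi, f, apply(L, f)))

def semigroup(f, t, steps=10_000):
    """Euler approximation of P_t f, i.e. (I + (t / steps) L)^steps f."""
    h = t / steps
    for _ in range(steps):
        f = [v + h * w for v, w in zip(f, apply(L, f))]
    return f

f = [1.0, -2.0]
C = variance(f) / dirichlet(f)        # with two states this ratio is the same for every non-constant f
t = 0.5
decayed = variance(semigroup(f, t))
print(C, decayed, math.exp(-2 * t / C) * variance(f))
```

<p>Here $C = 1/(a+b)$ is the inverse of the spectral gap, and the variance of $P_tf$ decays at the rate $e^{-2t/C}$ predicted by the theorem.</p>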
<p>For more on this connection, see <a href="http://projecteuclid.org/download/pdf_1/euclid.aop/1176991408">Liggett AoP 1989</a>. For a more recent paper on this interplay, see <a href="http://www.sciencedirect.com/science/article/pii/S0022123601937760">Röckner, Wang JFA 2001</a> or <a href="http://www.sciencedirect.com/science/article/pii/S0022123607004259">Bakry, Cattiaux, Guillin JFA 2008</a>.</p>
<p>A great overall reference is the recent book by <a href="http://link.springer.com/book/10.1007%2F978-3-319-00227-9">Bakry, Gentil, Ledoux</a>.</p>
<p><a href="http://ssk.im/blog/poincare-inequalities/">Poincaré Inequalities</a> was originally published by Steven Soojin Kim at <a href="http://ssk.im">Steven Soojin Kim</a> on 2014.07.23.</p><![CDATA[Welcome!]]>http://ssk.im/blog/intro2014-07-22T00:00:00-04:002014-07-22T00:00:00-04:00Steven Soojin Kimhttp://ssk.imsteven@ssk.im
<p>Hello! I plan to use this blog as a little cache where I can tuck away things which I find interesting. This might include ideas of my own, summaries of papers, surveys of classical material, or just cool tidbits I learn from others. I’m sure the direction and tone will develop over time.</p>
<p>For the curious (and as a reminder to myself), I built this blog using <a href="http://jekyllrb.com">Jekyll</a> and the <a href="http://mademistakes.com/articles/so-simple-jekyll-theme/">So Simple Theme</a>. I’m hosted on <a href="https://pages.github.com">Github Pages</a>. The math on this site is rendered through <a href="https://pages.github.com">MathJax</a>. The domain <strong>.im</strong> is the ccTLD for the <a href="http://en.wikipedia.org/wiki/Isle_of_Man">Isle of Man</a>.</p>
<blockquote>
<p><strong>Theorem</strong> Blogging is cool.</p>
</blockquote>
<p><strong>Proof:</strong> Trivial.</p>
<p><a href="http://ssk.im/blog/intro/">Welcome!</a> was originally published by Steven Soojin Kim at <a href="http://ssk.im">Steven Soojin Kim</a> on 2014.07.22.</p>