Paper Summary 1: Online Convex Programming

0 Paper: Zinkevich, Martin. "Online convex programming and generalized infinitesimal gradient ascent." In Proceedings of the 20th international conference on machine learning (icml-03), pp. 928-936. 2003.

1 Motivation: A general setting is like this. Imagine a farmer who decides what to plant each year. She has certain restrictions on her resources, both land and labor, as well as restrictions on the output she is allowed to produce. How can she select which crops to grow without knowing in advance what the prices will be?

 2 Problem statement

Given a known feasible set  F\subseteq R^n and an infinite sequence of convex functions\left\{ c^1, c^2, ... \right\}  that  c^t:F\rightarrow R at step t. By the time a decision x\in F is made, a cost function c^t associate with the decision is assigned. How to make the decision so as to minimize the cost? And how to evaluate the algorithm that makes the decision?

3 Regret of an algorithm:

Regret is a metric to assess the performance of a decision making algorithm . Give an algorithm Aand its selected decision \{ x^1, x^2, ...\}, the regret of A can be defined as follows.

The cost of A
The cost of a static solution x in F


The regret of A

\min\limits_{x \in F} \ C_x(T) means that select a fixed x \in F such that the sum of costs received by A with x as input is minimized. 

4 Important Notations

1) \| x \| = \sqrt{x \cdot x}

2) d(x,y)=\| x-y\|

3) projection P(y)=\mathop{\arg\min}_{x \in F} d(x,y)

4) \| F \| = \max_{x,y\in F} d(x,y)

5)  \| \nabla  c  \| = \sup_{x\in F, t \in\{1,2,...\}} \| \nabla  c^t(x)  \|

5 Algorithm 1: Greedy Projection 

1) Definition 

Greedy Projection

where, \eta _tis the learning rate. What P(\cdot) does is to map the gradient descent update into the feasible set by projection. 

2) Upper bound of regret

For \eta _t=t^{-1/2}

The upper bound of regret of Greedy Projection

6 Algorithm 2: Lazy Projection 

1) Definition 

For an arbitrary start y^1=x^1 \in F,

Lazy projection

2) Upper bound of regret

For fixed learning rate,


The upper bound of regret of Lazy Projection

7 Algorithm 3: Generalized Infinitesimal Gradient Ascent

0) Background: Repeated Games

Algorithm 3 is formulated from the concept of repeated games. Consider a matching game that a player needs to match two sets of actions A=\{a_1, a_2, a_3\} and Y=\{y_1, y_2, y_3\}, where a utility function u:A\times Y \rightarrow \R can be defined as u(a_1, y_1)=u(a_2, y_2)=u(a_3, y_4)=1 and otherwise u(a_i, y_i)=0 

(1) history: a sequence of joint actions.

(1)' H^t=(A\times Y)^t is the set of all histories of length t

(2)' |h| is the length of history h

(3)' e.g. of h with length |h|=3

h=\{(a_3,y_1), (a_1,y_2), (a_2,y_3), (a_2,y_2), (a_3,y_3), (a_1,y_2)\}

(4)' utility of a history 

To access the history, define h_i to be the i-th joint action. E.g., h_3=(a_2, y_3)and h_{3,1}=a_2

utility of history h

(5)' history change

 h^{*\rightarrow a}=\{(a_i,y_i) \in h| (a_i,y_i)=(a,y_i)\}

Namely, replace each action of h with an action a 

(6)' regret of not playing action a


 regret of not playing action  a

(7)' the maximum regret,or just simply regret

regret of repeated games

The most important aspect of this definition of regret is that regret is a function of the resulting history, independent of the strategies that generated that history.

(8)' behavior \sigma

Let \Delta (S) be the set of all probabilities over S

Let \Pr_{x \in D}[P(x)] be the probability that the boolean predicate P(x) is true for x is selected from a distribution D

Then \sigma : H \rightarrow \Delta(A) is a function from histories of past actions to distributions over the next action of the player.

(9)' environment \rho

\rho: H \rightarrow \Delta (Y) is a function from the history of past actions to distributions over the next action of the environment.

(10)' universally consistent

a behavior \sigma  is universally consistent if for any \epsilon >0 there exists a T such that for all \rho :


universally consistent

In other words, after some time, with high probability the average regret never again exceeds \epsilon

1) Definition 

For an arbitrary starty^1=x^1 \in F, do the followings

(1) Play according to x^t: play action i with probability x^t_{i}.

(2) Observe the action h_(t,2) \in Y of the other player and calculate 


update of action

2) upper bound of regret

Regarding the special example presented in (0) background, we have \| F\|\leq \sqrt{2}, and \|\nabla c \| \leq \sqrt{\|A\|}|u|, where 


definition of abs(u)

Then for \eta _t=t^{-1/2}, the expected regret of Generalized Infinitesimal Gradient Ascent for all obvious determinsitic environments \rho, for all a \in A, for all T is 


upper bound of  regret of generalized infinitesimal gradient ascent

Note: An environment is an oblivious deterministic environment if it plays the same sequence of actions regardless of the actions of the player.

8 Summary

The paper formulated online convex programing and introduced several algorithms including the three algorithms presented in this post. A index called regret was utilized to evaluate the performance of the algorithms. Finally, the upper bound of regret of the discussed algorithms were derived. 

最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 204,793评论 6 478
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 87,567评论 2 381
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 151,342评论 0 338
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 54,825评论 1 277
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 63,814评论 5 368
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 48,680评论 1 281
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 38,033评论 3 399
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 36,687评论 0 258
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 42,175评论 1 300
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 35,668评论 2 321
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 37,775评论 1 332
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 33,419评论 4 321
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 39,020评论 3 307
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 29,978评论 0 19
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 31,206评论 1 260
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 45,092评论 2 351
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 42,510评论 2 343

推荐阅读更多精彩内容

  • 夜莺2517阅读 127,709评论 1 9
  • 版本:ios 1.2.1 亮点: 1.app角标可以实时更新天气温度或选择空气质量,建议处女座就不要选了,不然老想...
    我就是沉沉阅读 6,876评论 1 6
  • 我是一名过去式的高三狗,很可悲,在这三年里我没有恋爱,看着同龄的小伙伴们一对儿一对儿的,我的心不好受。怎么说呢,高...
    小娘纸阅读 3,375评论 4 7
  • 这些日子就像是一天一天在倒计时 一想到他走了 心里就是说不出的滋味 从几个月前认识他开始 就意识到终究会发生的 只...
    栗子a阅读 1,613评论 1 3