“`” 参考回答:
给定的训练样本是<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552656677625_FF34E061732DAAD6F56E628A04C973E6"">,样例间独立,我们想找到每个样例隐含的类别z,能使得p(x,z)最大。p(x,z)的最大似然估计如下:
<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552656704748_9089EB58C9CA105E9B5104A9D94C7AA7"">
第一步是对极大似然取对数,第二步是对每个样例的每个可能类别z求联合分布概率和。但是直接求<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552656732709_6BFE2F1945BF6A79249C6D6DDDEE333B"">一般比较困难,因为有隐藏变量z存在,但是一般确定了z后,求解就容易了。
EM是一种解决存在隐含变量优化问题的有效方法。竟然不能直接最大化<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552656754223_12F0C02F17079F8B944E7694DC211CB3"">,我们可以不断地建立<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552656770652_D2CDC96BBFB62EEF0487ACD1DC5264D3"">的下界(E步),然后优化下界(M步)。这句话比较抽象,看下面的。
对于每一个样例i,让表示该样例隐含变量z的某种分布,<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552656827857_2EE502CE04932EF4279481D28018B177"">满足的条件是<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552656804373_EC9091B97CCB534D853840BD04FE6A7F"">。(如果z是连续性的,那么<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552656848034_C658E5EB67ECE128C8D07E8CC71149DA"">是概率密度函数,需要将求和符号换做积分符号)。比如要将班上学生聚类,假设隐藏变量z是身高,那么就是连续的高斯分布。如果按照隐藏变量是男女,那么就是伯努利分布了。
可以由前面阐述的内容得到下面的公式:
<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552656886322_1E1FD7D35C2B1DCAFEB70CABF0239E03"">
(1)到(2)比较直接,就是分子分母同乘以一个相等的函数。(2)到(3)利用了Jensen不等式,考虑到<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552656911226_61F97563CECD1A6F6E5028F9524E6541"">是凹函数(二阶导数小于0),而且
<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552656925875_72041D6C6CC34609D29F55069B0311A2"">
就是<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552656966099_2B43AED945D3B62E88A8D38952145614"">的期望(回想期望公式中的Lazy Statistician规则)
设Y是随机变量X的函数<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552656992885_40F1E38D81E7721E017831E71763F32F"">(g是连续函数),那么
(1) X是离散型随机变量,它的分布律为<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552657014300_70EDAB8FC8EFDAA1942F72ADE1D43390"">,k=1,2,…。若<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552657030794_F1079F5DA0EA513EE47824D37795038E"">绝对收敛,则有
<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552657050271_B07B91CEB729EA962F28412AEBFCB6C7"">
(2) X是连续型随机变量,它的概率密度为<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552657068967_464BB8DF4799DC22DE520628B0F42624"">,若<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552657087871_34B55AE81571311B22B43DF2F49D3F0F"">绝对收敛,则有
<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552657107386_9952B4D00DB6F50BF841E077515A3AA9"">
对应于上述问题,Y是<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552657125940_C25B7BCDF1E4894B22AC1FAF5FDD7AD2"">,X是<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552657141501_FDB405EA6E266038BD5EC5F99B0C0939"">,<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552657159215_E240C1B6CD76B3FC1518B7BE52A03B69"">是<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552657180118_B84C74766069C38C760D55C4E5709BB1"">,g是<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552657197885_93D6433C47DE87FDB33F04C1616D4ED8"">到<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552657211255_6665CAB3A3F4293F4531C6BEF6AC158E"">的映射。这样解释了式子(2)中的期望,再根据凹函数时的Jensen不等式:
<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552657231203_3FDC52400C0A2EBCFC47E0CE42A3EF3F"">
可以得到(3)。
这个过程可以看作是对<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552657260170_076D779534DB894413D97466ADF992EB"">求了下界。对于<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552657274597_01DDF1500C0F149B8D61AFA16F268C53"">的选择,有多种可能,那种更好的?假设<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552657292040_D1B69B77582B1C699434A12C4362D163"">已经给定,那么<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552657306876_AE49C56181F81364C52C4320C096A65F"">的值就决定于<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552657321955_465152844F4D5F9BDCB7D7D43D7AA23C"">和<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552657337012_AA79E45F5BB3F5BF77BED9B4C6C999D3"">了。我们可以通过调整这两个概率使下界不断上升,以逼近<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552657360598_6B3983E303AC7B5A489F85F0803A84B1"">的真实值,那么什么时候算是调整好了呢?当不等式变成等式时,说明我们调整后的概率能够等价于<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552657377762_5D1A1F0A842C55BC853101A3A7E9F9F2"">了。按照这个思路,我们要找到等式成立的条件。根据Jensen不等式,要想让等式成立,需要让随机变量变成常数值,这里得到:
<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552656623279_51F811FC3B63504DD5431AF7002964B3"">
c为常数,不依赖于<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552656602182_0D31BBDD73A76C383BB2A5C9E399CCE3"">。对此式子做进一步推导,我们知道<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552656587658_E7EAAFAC1B564EB02FB34B7C011B3A95"">,那么也就有<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552656564973_1544DF6BE05D68029AD76D2D4B702998"">,(多个等式分子分母相加不变,这个认为每个样例的两个概率比值都是c),那么有下式:
<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552656540364_98DFAD890ECD62189D5536AA9DB847A9"">
至此,我们推出了在固定其他参数<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552656518837_318849129B615DE36D0A83C9FD5C1C81"">后,<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552656505656_E712403E822207687A520953DAEDA0A7"">的计算公式就是后验概率,解决了<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552656489287_C6C098B994B79A7AF323D95D295E8072"">如何选择的问题。这一步就是E步,建立<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552656461569_895AAA9A0A0E2CD4A8B03DE08B3084DF"">的下界。接下来的M步,就是在给定<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552656446220_27BD5599218262EDAA8A5001AB587C17"">后,调整<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552656431985_2EFEF4BCCDC11C4BAF9D1EBB838FF680"">,去极大化<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552656409137_05F42387CE0EDBA59A60B1E5C4811EE0"">的下界(在固定<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552656392960_E0DD2E188790E5A02EC754BCFACB62AF"">后,下界还可以调整的更大)。那么一般的EM算法的步骤如下:
循环重复直到收敛{
(E步)对于每一个i,计算
<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552656340146_9C63ED721D886E77D5FE6E2338B71032"">
(M步)计算
<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552656322624_4C09F6A58029239D1F23589B927D49A8"">
那么究竟怎么确保EM收敛?假定<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552656309240_BF8BCAE96917ACD0A8DD21D12FACD0BE"">和<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552656288880_23AD4B453566E704D4D2E202F108B030"">是EM第t次和t+1次迭代后的结果。如果我们证明了<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552656266616_45A20D0A9862DB6760A8E6FED66FF022"">,也就是说极大似然估计单调增加,那么最终我们会到达最大似然估计的最大值。下面来证明,选定<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552656245516_9A3B5E337162684A35963B34DD2664F8"">后,我们得到E步
<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552656220402_ADB8903117E980E303E440463D8B8241"">
这一步保证了在给定<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552656204236_8E6FC4314DE15D388F98BD3DF3E51DAC"">时,Jensen不等式中的等式成立,也就是
<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552656172149_602749E68CEB5CE1B57F555BEE7A34C9"">
然后进行M步,固定<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552656157638_D920156D24954BA2C5B89763F65C15F3"">,并将<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552656141597_9C2812809D5788A879AF146456C3F9E4"">视作变量,对上面的<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552656122019_C0D8C9E00395775C9797AEAF5E09C046"">求导后,得到<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552656104966_30B30C0B0612F65E8F2D82F5B7C35FF4"">,这样经过一些推导会有以下式子成立:
<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552656069961_A1C4792A0C76FCE7E8D385FD03CB0F02"">
解释第(4)步,得到<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552656053181_0F000183E0D3A734CFC84AE5596DEB58"">时,只是最大化<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552656036127_1FC3995EAB08ED094DD4DFC1FA1C04DB"">,也就是<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552656017395_CAE8E56F913E3E851B1CBD9DD0091A14"">的下界,而没有使等式成立,等式成立只有是在固定<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552655991598_2895D0185B0F5A2CA83A52CCE36A2B7C"">,并按E步得到<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552655973151_CF419F037C10C1CE1567B88065FB90FB"">时才能成立。
况且根据我们前面得到的下式,对于所有的<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552655942553_9B8C6D511BFF35C8A749D00197468740"">和<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552655923695_CCD81F07EB9E6C74DA25248291C51583"">都成立
<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552655841514_59FF926517000E3489AB302F701396FF"">
第(5)步利用了M步的定义,M步就是将<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552655803749_6A67769E416B886272A59347BE217460"">调整到<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552655779212_9389B3273638C142D32F989091FFF783"">,使得下界最大化。因此(5)成立,(6)是之前的等式结果。这样就证明了<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552655760860_777D4809B1E4212B0FA8992D6DD39F0A"">会单调增加。一种收敛方法是<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552655738327_F6088594CD21E495E728AE032DD5440F"">不再变化,还有一种就是变化幅度很小。
再次解释一下(4)、(5)、(6)。首先(4)对所有的参数都满足,而其等式成立条件只是在固定<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552655685749_F81E0FE60A972524B16733C5D081A2CC"">,并调整好Q时成立,而第(4)步只是固定Q,调整<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552655605718_922CBCB515D74D41675482672DE098F0"">,不能保证等式一定成立。(4)到(5)就是M步的定义,(5)到(6)是前面E步所保证等式成立条件。也就是说E步会将下界拉到与<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552655632422_341A9542C89D756E1ECF40E98D437BAF"">一个特定值(这里<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552655514310_C24BD4F81FE2BC277BBA048ACB319ACE"">)一样的高度,而此时发现下界仍然可以上升,因此经过M步后,下界又被拉升,但达不到与<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552655537580_09189906EEE453215C5A4209DA8489AD"">另外一个特定值一样的高度,之后E步又将下界拉到与这个特定值一样的高度,重复下去,直到最大值。
如果我们定义
<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552655051313_C4EBEEB2A4ECD0C589E64DACF8E59A21"">
从前面的推导中我们知,EM可以看作是J的坐标上升法,E步固定<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552655442602_F6434E4F72AA10C3D4B71164C3909C3B"">,优化<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552655421168_D31E3FC278C55C6FFE20B4AC4D6A70A8"">,M步固定<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552655394365_8607A96CD03E4088717E3246934622C1"">。优化<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552655322029_D418144E5B109EDDF8B6325FDBAF467C"">
Jensen不等式表述如下:
如果f是凸函数,X是随机变量,那么:<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552655171001_A815CFF6D10A553AEDC2BB002E0FFB97"">,特别地,如果f是严格凸函数,当且
仅当X是常量时,上式取等号。所以其下界是<img alt=""img"" referrerpolicy=""no-referrer"" src=""https://uploadfiles.nowcoder.com/images/20190315/311436_1552655011455_32E13B7D17BDE8B45B6D788808C74D17"">。
<pre><code> "“`
Was this helpful?
0 /
0