Reference answer:
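Consider a multi-class classification problem in which the target variable y can take any one of k discrete values. For example, a mail classifier might sort messages into personal mail, work mail, and spam. y is still discrete; compared with the binary case handled by logistic regression, there are simply more classes. We therefore model y with a multinomial distribution.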
Suppose the samples fall into k classes, with class probabilities <img alt="img" referrerpolicy="no-referrer" src="https://uploadfiles.nowcoder.com/images/20190317/311436_1552779904083_FA834A2A351FCECCB390A113766A14E6">. Because <img alt="img" referrerpolicy="no-referrer" src="https://uploadfiles.nowcoder.com/images/20190317/311436_1552779889154_068563C4CCE6ED4D30EA77C148651ED8">, k-1 parameters <img alt="img" referrerpolicy="no-referrer" src="https://uploadfiles.nowcoder.com/images/20190317/311436_1552779871865_012CD98F3DD13B73A2ABC8CF62140079"> are usually enough:
<img alt="img" referrerpolicy="no-referrer" src="https://uploadfiles.nowcoder.com/images/20190317/311436_1552779851931_EE0E04B90B197D01FC72D54872475928">, <img alt="img" referrerpolicy="no-referrer" src="https://uploadfiles.nowcoder.com/images/20190317/311436_1552779837979_E0D3F029CD923FC44CE5930507D3F8CD">
To carry out the derivation, introduce the following expression:
<img alt="img" referrerpolicy="no-referrer" src="https://uploadfiles.nowcoder.com/images/20190317/311436_1552779821121_9E13972C928FDDD8E237A5BA7FBE8425">
Here T(y) is a (k-1)-dimensional column vector, with y = 1, 2, …, k.
T(y)_i denotes the i-th element of the vector T(y).
We also introduce the indicator expression <img alt="img" referrerpolicy="no-referrer" src="https://uploadfiles.nowcoder.com/images/20190317/311436_1552779797793_6E0A875948F3AF072512EAB55646B2B6">: it equals 1 if the statement inside the braces is true and 0 otherwise. For example, 1{2=3} = 0 and 1{3=3} = 1.
The k vectors above can then be written as <img alt="img" referrerpolicy="no-referrer" src="https://uploadfiles.nowcoder.com/images/20190317/311436_1552779774034_FF03AB84CED37544257D8E35FB25C6AB">
Because y belongs to exactly one class, at most one element of T(y) equals 1 and all the others are 0 (for y = k, every element of the (k-1)-dimensional vector is 0). The expectation of each of the k-1 elements is therefore: <img alt="img" referrerpolicy="no-referrer" src="https://uploadfiles.nowcoder.com/images/20190317/311436_1552779757600_1F26E44B81F04A2A4B746C4E4AD7240C">
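The indicator encoding and its expectation can be checked numerically. The following is a minimal sketch, assuming T(y)_i = 1{y = i} for i = 1, …, k-1 as described above; the function names `T` and `expected_T` and the example probabilities are illustrative, not part of the original answer.

```python
def T(y, k):
    """Return T(y) as a list of length k-1 with T(y)_i = 1{y = i}, for y in 1..k."""
    if not 1 <= y <= k:
        raise ValueError("y must be in 1..k")
    return [1 if y == i else 0 for i in range(1, k)]


def expected_T(phi):
    """E[T(y)] when class y occurs with probability phi[y-1] (phi has length k)."""
    k = len(phi)
    expectation = [0.0] * (k - 1)
    for y in range(1, k + 1):
        t = T(y, k)
        for i in range(k - 1):
            expectation[i] += phi[y - 1] * t[i]
    return expectation


# Hypothetical class probabilities for k = 3 (they sum to 1).
phi = [0.5, 0.3, 0.2]
print(T(1, 3))          # class 1: first element is 1
print(T(3, 3))          # class k: all k-1 elements are 0
print(expected_T(phi))  # the first k-1 class probabilities
```

Under this encoding the i-th component of E[T(y)] is just the probability of class i, which is why only the first k-1 probabilities appear.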
Define: <img alt="img" referrerpolicy="no-referrer" src="https://uploadfiles.nowcoder.com/images/20190317/311436_1552779740293_C2E8175C478475C6EA262AC68F6CAB20">
where i = 1, 2, …, k. Then:
<img alt="img" referrerpolicy="no-referrer" src="https://uploadfiles.nowcoder.com/images/20190317/311436_1552779722440_3DD00AD5C16193C84289DB5E3FDDC4EE">
It follows easily that <img alt="img" referrerpolicy="no-referrer" src="https://uploadfiles.nowcoder.com/images/20190317/311436_1552779707511_DA251D425D8A2FF247168F4E14CAD26E">. Combining this with the earlier equation <img alt="img" referrerpolicy="no-referrer" src="https://uploadfiles.nowcoder.com/images/20190317/311436_1552779693940_5F222A0B6F85FEA592E9F8DAF500A449"> gives <img alt="img" referrerpolicy="no-referrer" src="https://uploadfiles.nowcoder.com/images/20190317/311436_1552779676255_0DC3386AA07BF5AE5D8CBD5307DFC3B7">. This function is the softmax function.
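As a quick numerical check of this step, here is a minimal sketch that maps class probabilities to natural parameters and back through softmax. It assumes the usual definition eta_i = log(phi_i / phi_k), which the equation images are taken to show; subtracting the maximum before exponentiating is a common numerical-stability trick added here and is not part of the derivation.

```python
import math


def natural_params(phi):
    """eta_i = log(phi_i / phi_k); the last entry is log(1) = 0."""
    return [math.log(p / phi[-1]) for p in phi]


def softmax(eta):
    """phi_i = exp(eta_i) / sum_j exp(eta_j)."""
    m = max(eta)  # shifting by the max leaves the ratios unchanged
    exps = [math.exp(e - m) for e in eta]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical class probabilities for k = 3.
phi = [0.5, 0.3, 0.2]
eta = natural_params(phi)
recovered = softmax(eta)
print(eta)
print(recovered)  # should match phi up to floating-point rounding
```

The round trip recovering phi illustrates that softmax inverts the log-ratio definition of the natural parameters.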
Next, assume that <img alt="img" referrerpolicy="no-referrer" src="https://uploadfiles.nowcoder.com/images/20190317/311436_1552779658531_C36B3EC6D43EBE296DAFDA84705C8470"> and <img alt="img" referrerpolicy="no-referrer" src="https://uploadfiles.nowcoder.com/images/20190317/311436_1552779643255_8520C7E0FC27AAE84E33AAF72B960BE3"> are linearly related, that is, <img alt="img" referrerpolicy="no-referrer" src="https://uploadfiles.nowcoder.com/images/20190317/311436_1552779622779_D6D168A0622DB60E18D8A68A6640538E">
From a probabilistic point of view, we then have:
<img alt="img" referrerpolicy="no-referrer" src="https://uploadfiles.nowcoder.com/images/20190317/311436_1552779602408_70E4F65AA49CB3510018331DB1AA3D2D">
where <img alt="img" referrerpolicy="no-referrer" src="https://uploadfiles.nowcoder.com/images/20190317/311436_1552779584405_08483E0FEB6D39010C80FCE23C9FC076">. This model is softmax regression, a generalization of logistic regression.
Our output is then:
<img alt="img" referrerpolicy="no-referrer" src="https://uploadfiles.nowcoder.com/images/20190317/311436_1552779560159_909CF0609C6A5802CF430C8EB9F3293E">
which gives the probability that x belongs to each of the classes 1, 2, …, k-1. The probability of the k-th class is then: <img alt="img" referrerpolicy="no-referrer" src="https://uploadfiles.nowcoder.com/images/20190317/311436_1552779542826_72C55CD19B591B3525D91B588554A7DF">
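The output described above can be sketched as follows. This is an illustration under stated assumptions: eta_i is taken to be the dot product of a parameter vector theta_i with x, and theta_k is fixed at 0 so that eta_k = 0 (consistent with needing only k-1 sets of parameters). The function name `hypothesis` and the numeric values are hypothetical.

```python
import math


def hypothesis(theta, x):
    """theta holds k-1 parameter vectors; returns [p(y=1|x), ..., p(y=k-1|x)]."""
    etas = [sum(t_j * x_j for t_j, x_j in zip(t, x)) for t in theta]
    etas.append(0.0)  # eta_k = 0, assuming theta_k = 0
    m = max(etas)     # shift for numerical stability
    exps = [math.exp(e - m) for e in etas]
    total = sum(exps)
    return [e / total for e in exps[:-1]]


# Hypothetical parameters for k = 3 classes and 2 features.
theta = [[1.0, -0.5], [0.2, 0.3]]
x = [1.0, 2.0]

probs = hypothesis(theta, x)  # probabilities of classes 1..k-1
p_k = 1.0 - sum(probs)        # probability of class k
print(probs, p_k)
```

Returning only k-1 values mirrors the text; the last class is recovered from the constraint that the probabilities sum to 1.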
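Next, fit the parameters.
As before, maximize the log-likelihood with respect to the parameters θ:
<img alt="img" referrerpolicy="no-referrer" src="https://uploadfiles.nowcoder.com/images/20190317/311436_1552779522104_AFD11AE3F759576D9399D481FC8D7C20">
Either a gradient method (gradient ascent on the log-likelihood, or equivalently gradient descent on its negative) or Newton's method can be used here.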
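The fitting step can be sketched with plain batch gradient ascent. This is a minimal illustration, not a definitive implementation: it assumes theta_k is fixed at 0, uses the standard gradient of the softmax log-likelihood (the sum over samples of (1{y = l} - p(y = l | x)) times x), and the toy data, learning rate, and step count are made up for the example.

```python
import math


def class_probs(theta, x):
    """All k class probabilities, with theta_k fixed at 0 (assumption of this sketch)."""
    etas = [sum(a * b for a, b in zip(t, x)) for t in theta] + [0.0]
    m = max(etas)
    exps = [math.exp(e - m) for e in etas]
    total = sum(exps)
    return [e / total for e in exps]


def log_likelihood(theta, X, Y):
    """Sum over samples of log p(y | x; theta); labels y are in 1..k."""
    return sum(math.log(class_probs(theta, x)[y - 1]) for x, y in zip(X, Y))


def fit(X, Y, k, lr=0.05, steps=500):
    """Batch gradient ascent on the log-likelihood."""
    n = len(X[0])
    theta = [[0.0] * n for _ in range(k - 1)]
    for _ in range(steps):
        grad = [[0.0] * n for _ in range(k - 1)]
        for x, y in zip(X, Y):
            p = class_probs(theta, x)
            for l in range(k - 1):
                coeff = (1.0 if y == l + 1 else 0.0) - p[l]
                for j in range(n):
                    grad[l][j] += coeff * x[j]
        for l in range(k - 1):
            for j in range(n):
                theta[l][j] += lr * grad[l][j]  # ascent: move along the gradient
    return theta


# Toy data: first feature is a constant 1 (intercept), labels in {1, 2, 3}.
X = [[1.0, 0.0], [1.0, 0.2], [1.0, 1.0], [1.0, 1.2], [1.0, 2.0], [1.0, 2.2]]
Y = [1, 1, 2, 2, 3, 3]

theta0 = [[0.0, 0.0], [0.0, 0.0]]
theta = fit(X, Y, k=3)
print(log_likelihood(theta0, X, Y), log_likelihood(theta, X, Y))
```

Newton's method would replace the fixed step with one scaled by the inverse Hessian; it typically needs fewer iterations but each one costs more.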