python逻辑回归怎么求正系数

2023-02-25 19:40:02Python031

python逻辑回归怎么求正系数,第1张

Python 逻辑回归求正系数的方法可以分为两种：

1. 使用线性模型的求解方法：可以使用sklearn中的LogisticRegression类来求解正系数，调用其中的fit()方法就可以求解出正系数。

2. 使用梯度下降法：可以自己实现梯度下降法，通过不断迭代更新正系数，最终获得最优的正系数。

是的，python逻辑回归可以画出列线图。列线图是一种用来表示目标变量与自变量之间关系的可视化图表。可以使用python的matplotlib库来绘制列线图，以更好地理解模型的结果。

以下为python代码，由于训练数据比较少，这边使用了批处理梯度下降法，没有使用增量梯度下降法。

##author:lijiayan##data:2016/10/27

##name:logReg.pyfrom numpy import *import matplotlib.pyplot as pltdef loadData(filename):

data = loadtxt(filename)

m,n = data.shape print 'the number of examples:',m print 'the number of features:',n-1 x = data[:,0:n-1]

y = data[:,n-1:n] return x,y#the sigmoid functiondef sigmoid(z): return 1.0 / (1 + exp(-z))#the cost functiondef costfunction(y,h):

y = array(y)

h = array(h)

J = sum(y*log(h))+sum((1-y)*log(1-h)) return J# the batch gradient descent algrithmdef gradescent(x,y):

m,n = shape(x) #m: number of training examplen: number of features x = c_[ones(m),x] #add x0 x = mat(x) # to matrix y = mat(y)

a = 0.0000025 # learning rate maxcycle = 4000 theta = zeros((n+1,1)) #initial theta J = [] for i in range(maxcycle):

h = sigmoid(x*theta)

theta = theta + a * (x.T)*(y-h)

cost = costfunction(y,h)

J.append(cost)

plt.plot(J)

plt.show() return theta,cost#the stochastic gradient descent (m should be large,if you want the result is good)def stocGraddescent(x,y):

m,n = shape(x) #m: number of training examplen: number of features x = c_[ones(m),x] #add x0 x = mat(x) # to matrix y = mat(y)

a = 0.01 # learning rate theta = ones((n+1,1)) #initial theta J = [] for i in range(m):

h = sigmoid(x[i]*theta)

theta = theta + a * x[i].transpose()*(y[i]-h)

cost = costfunction(y,h)

J.append(cost)

plt.plot(J)

plt.show() return theta,cost#plot the decision boundarydef plotbestfit(x,y,theta):

plt.plot(x[:,0:1][where(y==1)],x[:,1:2][where(y==1)],'ro')

plt.plot(x[:,0:1][where(y!=1)],x[:,1:2][where(y!=1)],'bx')

x1= arange(-4,4,0.1)

x2 =(-float(theta[0])-float(theta[1])*x1) /float(theta[2])

plt.plot(x1,x2)

plt.xlabel('x1')

plt.ylabel(('x2'))

plt.show()def classifyVector(inX,theta):

prob = sigmoid((inX*theta).sum(1)) return where(prob >= 0.5, 1, 0)def accuracy(x, y, theta):

m = shape(y)[0]

x = c_[ones(m),x]

y_p = classifyVector(x,theta)

accuracy = sum(y_p==y)/float(m) return accuracy

调用上面代码：

from logReg import *

x,y = loadData("horseColicTraining.txt")

theta,cost = gradescent(x,y)print 'J:',cost

ac_train = accuracy(x, y, theta)print 'accuracy of the training examples:', ac_train

x_test,y_test = loadData('horseColicTest.txt')

ac_test = accuracy(x_test, y_test, theta)print 'accuracy of the test examples:', ac_test

学习速率=0.0000025，迭代次数=4000时的结果：

似然函数走势（J = sum(y*log(h))+sum((1-y)*log(1-h))），似然函数是求最大值，一般是要稳定了才算最好。

下图为计算结果，可以看到训练集的准确率为73%，测试集的准确率为78%。

这个时候，我去看了一下数据集，发现没个特征的数量级不一致，于是我想到要进行归一化处理：

归一化处理句修改列loadData(filename)函数：

def loadData(filename):

data = loadtxt(filename)

m,n = data.shape print 'the number of examples:',m print 'the number of features:',n-1 x = data[:,0:n-1]

max = x.max(0)

min = x.min(0)

x = (x - min)/((max-min)*1.0) #scaling y = data[:,n-1:n] return x,y

在没有归一化的时候，我的学习速率取了0.0000025（加大就会震荡，因为有些特征的值很大，学习速率取的稍大，波动就很大），由于学习速率小，迭代了4000次也没有完全稳定。现在当把特征归一化后（所有特征的值都在0~1之间），这样学习速率可以加大，迭代次数就可以大大减少，以下是学习速率=0.005，迭代次数=500的结果：

此时的训练集的准确率为72%，测试集的准确率为73%

从上面这个例子，我们可以看到对特征进行归一化操作的重要性。

速率系数梯度线图特征

# 上一篇：c语言sum是什么意思

# 下一篇：为什么说 Python 是强类型语言？