How to Find the Extrema of a Function in R

2023-02-24 08:59:02


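Happy New Year! The steps for finding extrema with the second derivative are:

1. Compute the first derivative and set it equal to 0. The solutions are the candidate extremum points. Such a point is called a stationary point (驻点 in Chinese).

2. Compute the second derivative and substitute the coordinate of each stationary point into it. If the result is greater than 0, substituting the stationary point into the original function gives a local minimum; if it is less than 0, it gives a local maximum. If the result equals 0, the test is inconclusive and the point has to be checked another way.

If the second derivative is a positive constant (as for x^2, 2x^2, 3x^2, 4x^2, ...), the value of the original function at the stationary point is the minimum. If the second derivative is a negative constant (as for -x^2, -2x^2, -3x^2, -4x^2, ...), that value is the maximum.

To illustrate, the worked example (originally shown as an image) asks: what is the product of the maximum and the minimum?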
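The two steps can be sketched in R roughly as follows. The function f(x) = x^3 - 3x is a hypothetical example chosen for illustration (the function from the original image is not available), and the search brackets passed to `uniroot()` are assumptions that fit this particular function.

```r
# Hypothetical example function (not from the original post): f(x) = x^3 - 3x
f_expr  <- quote(x^3 - 3 * x)
d1_expr <- D(f_expr, "x")    # symbolic first derivative
d2_expr <- D(d1_expr, "x")   # symbolic second derivative

f  <- function(x) eval(f_expr,  list(x = x))
d1 <- function(x) eval(d1_expr, list(x = x))
d2 <- function(x) eval(d2_expr, list(x = x))

# Step 1: solve f'(x) = 0 on brackets where f' changes sign
stationary <- c(
  uniroot(d1, c(-2, 0), tol = 1e-10)$root,
  uniroot(d1, c(0, 2),  tol = 1e-10)$root
)

# Step 2: classify each stationary point by the sign of f''
kind <- ifelse(d2(stationary) > 0, "local minimum",
        ifelse(d2(stationary) < 0, "local maximum", "inconclusive"))

result <- data.frame(x = stationary, fx = f(stationary), kind = kind)
print(result)

# Product of the local maximum and the local minimum
extrema_product <- prod(result$fx)
```

For this function the stationary points are x = -1 (local maximum, f = 2) and x = 1 (local minimum, f = -2), so the product is -4. When no closed-form derivative is needed, `optimize(f, c(0, 2))` locates the same minimum numerically.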

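1. The k-nearest-neighbor (KNN) classification algorithm is a theoretically mature method and one of the simplest machine learning algorithms. Its idea is: if most of the k samples most similar to a given sample (that is, its nearest neighbors in feature space) belong to a certain class, then the sample belongs to that class as well.

2. In KNN, the selected neighbors are all objects that have already been correctly classified. The classification decision depends only on the classes of the nearest one or few samples. Although KNN relies on limit theorems in principle, the decision for a class involves only a very small number of neighboring samples. Because KNN determines the class mainly from a limited number of surrounding samples rather than by discriminating class domains, it is better suited than other methods to sample sets whose class domains cross or overlap heavily.

3. KNN can be used for regression as well as classification. By finding the k nearest neighbors of a sample and assigning the average of their attribute values to the sample, we obtain that sample's attribute. A more useful approach gives neighbors at different distances different weights, for example weights inversely proportional to distance.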
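As a minimal sketch of the regression variant with inverse-distance weights: the helper `knn_regress()` and its `eps` parameter are hypothetical names introduced here, and the toy data are made up.

```r
# Distance-weighted k-NN regression for a single query point (hypothetical helper)
knn_regress <- function(train_x, train_y, query, k = 3, eps = 1e-8) {
  train_x <- as.matrix(train_x)
  # Euclidean distance from the query to every training row
  d <- sqrt(rowSums(sweep(train_x, 2, query)^2))
  nn <- order(d)[seq_len(k)]      # indices of the k nearest neighbours
  w <- 1 / (d[nn] + eps)          # closer neighbours get larger weights
  sum(w * train_y[nn]) / sum(w)   # weighted average of the neighbours' responses
}

# Toy data (made up): y = 2 * x
x <- matrix(c(1, 2, 3, 4, 5), ncol = 1)
y <- c(2, 4, 6, 8, 10)
pred <- knn_regress(x, y, query = 2.5, k = 2)
```

Here the two nearest neighbours (x = 2 and x = 3) are equally far from the query, so they get equal weights and the prediction is 5. The small `eps` only guards against division by zero when the query coincides with a training point.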

In short, an unlabeled case is assigned to the class of the labeled cases most similar to it.

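Principle and Example

How it works: for every record in the sample set we know which class it belongs to. When new, unlabeled data arrive, each feature of the new data is compared with the corresponding feature of the training data to find the k records that are nearest by "distance" (usually k < 20). The class that occurs most often among these k records becomes the class of the new data.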

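Algorithm Description

1. Compute the distance between each point in the known data set and the current point.

2. Sort by increasing distance.

3. Select the K points nearest to the current point.

4. Count how often each class occurs among these K points.

5. Return the most frequent class as the prediction for the current point.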
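The five steps above can be sketched in base R as follows. `knn_classify()` is a hypothetical helper written for illustration (it is not the `knn()` function of the `class` package), it assumes Euclidean distance, and it breaks ties by taking the first class.

```r
knn_classify <- function(train_x, train_labels, query, k = 3) {
  train_x <- as.matrix(train_x)
  # Step 1: distance from every known point to the current point
  d <- sqrt(rowSums(sweep(train_x, 2, query)^2))
  # Steps 2-3: sort by increasing distance and keep the k nearest
  nn <- order(d)[seq_len(k)]
  # Step 4: frequency of each class among the k neighbours
  votes <- table(train_labels[nn])
  # Step 5: the most frequent class is the prediction (ties: first one wins)
  names(votes)[which.max(votes)]
}

# Usage on the built-in iris data; the query uses the measurements of a setosa flower
x <- as.matrix(iris[, 1:4])
pred <- knn_classify(x, iris$Species, query = c(5.0, 3.4, 1.5, 0.2), k = 5)
```

This brute-force version computes all distances for each query, which is fine for small data sets such as iris.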

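Available distance measures include "euclidean" (Euclidean distance), "maximum" (Chebyshev distance), "manhattan" (absolute-value distance), "canberra" (Canberra distance, also called Lance-Williams distance), and "minkowski" (Minkowski distance). These are method names accepted by R's dist() function.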
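To see how these measures differ, here is a small sketch using base R's `dist()` on two made-up points, (0, 0) and (3, 4). The exponent `p = 3` for the Minkowski distance is an arbitrary choice for illustration.

```r
pts <- rbind(a = c(0, 0), b = c(3, 4))

d_euclid    <- as.numeric(dist(pts, method = "euclidean"))         # sqrt(3^2 + 4^2) = 5
d_manhattan <- as.numeric(dist(pts, method = "manhattan"))         # 3 + 4 = 7
d_chebyshev <- as.numeric(dist(pts, method = "maximum"))           # max(3, 4) = 4
d_canberra  <- as.numeric(dist(pts, method = "canberra"))          # 3/3 + 4/4 = 2
d_minkowski <- as.numeric(dist(pts, method = "minkowski", p = 3))  # (3^3 + 4^3)^(1/3)
```

The Minkowski distance with p = 1 equals the Manhattan distance and with p = 2 equals the Euclidean distance, which is why kknn exposes a single `distance` parameter for it.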

Usage

knn(train, test, cl, k = 1, l = 0, prob = FALSE, use.all = TRUE)

Arguments

train

matrix or data frame of training set cases.

test

matrix or data frame of test set cases. A vector will be interpreted as a row vector for a single case.

cl

factor of true classifications of training set

k

number of neighbours considered.

l

minimum vote for definite decision, otherwise doubt. (More precisely, less than k-l dissenting votes are allowed, even if k is increased by ties.)

prob

If this is true, the proportion of the votes for the winning class are returned as attribute prob.

use.all

controls handling of ties. If true, all distances equal to the kth largest are included. If false, a random selection of distances equal to the kth is chosen to use exactly k neighbours.

kknn(formula = formula(train), train, test, na.action = na.omit(), k = 7, distance = 2, kernel = "optimal", ykernel = NULL, scale=TRUE, contrasts = c('unordered' = "contr.dummy", ordered = "contr.ordinal"))

Arguments:

formula A formula object.

train Matrix or data frame of training set cases.

test Matrix or data frame of test set cases.

na.action A function which indicates what should happen when the data contain ’NA’s.

k Number of neighbors considered.

distance Parameter of Minkowski distance.

kernel Kernel to use. Possible choices are "rectangular" (which is standard unweighted knn), "triangular", "epanechnikov" (or beta(2,2)), "biweight" (or beta(3,3)), "triweight" (or beta(4,4)), "cos", "inv", "gaussian", "rank" and "optimal".

ykernel Window width of an y-kernel, especially for prediction of ordinal classes.

scale Logical, scale variable to have equal sd.

contrasts A vector containing the ’unordered’ and ’ordered’ contrasts to use

kknn returns the following values:

fitted.values Vector of predictions.

CL Matrix of classes of the k nearest neighbors.

W Matrix of weights of the k nearest neighbors.

D Matrix of distances of the k nearest neighbors.

C Matrix of indices of the k nearest neighbors.

prob Matrix of predicted class probabilities.

response Type of response variable, one of continuous, nominal or ordinal.

distance Parameter of Minkowski distance.

call The matched call.

terms The ’terms’ object used.

library(ggvis)

iris %>% ggvis(~Sepal.Length, ~Sepal.Width, fill = ~Species) %>% layer_points()

library(kknn)

data(iris)

dim(iris)

m <- dim(iris)[1]

# Randomly pick one third of the rows for the test set

val <- sample(1:m, size = round(m/3), replace = FALSE, prob = rep(1/m, m))

# Build the training set

iris.train <- iris[-val, ]

# Build the test set

iris.test <- iris[val, ]

# Define the formula before calling kknn:

# Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width (written as Species ~ . here)

iris.kknn <- kknn(Species ~ ., iris.train, iris.test, distance = 1, kernel = "triangular")

summary(iris.kknn)

# Get the fitted values

fit <- fitted(iris.kknn)

# Cross-tabulate true and predicted classes to check accuracy

table(iris.test$Species, fit)

# Scatter plot matrix; misclassified points are highlighted in red

pcol <- as.character(as.numeric(iris.test$Species))

pairs(iris.test[1:4], pch = pcol, col = c("green3", "red")[(iris.test$Species != fit) + 1])

II. The knn Algorithm in R

install.packages("class")

library(class)

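For a new test case, its K nearest neighbors are found with a distance-based similarity rule, and the class of the new test case is decided by majority vote among those K neighbors.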

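1. Get the data

2. Understand the data

Explore the data, for example with scatter plots

(as in the example above)

3. Identify the type of problem: classification

4. Machine learning algorithm: knn

5. Data preparation: normalize the data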

normalize <- function(x){

num <- x - min(x)

denom <- max(x) - min(x)

return(num/denom)

}

iris_norm <-as.data.frame(lapply(iris[,1:4], normalize))

summary(iris_norm)

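6. Selecting the training and test sets

The split is usually about 2:1 or 3:1; the code below uses roughly 2:1 (67% training, 33% test).

Method 1:

set.seed(1234)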

ind <- sample(2,nrow(iris), replace=TRUE, prob=c(0.67, 0.33))

iris_train <-iris[ind==1, 1:4]

iris_test <-iris[ind==2, 1:4]

train_label <-iris[ind==1, 5]

test_label <-iris[ind==2, 5]

Method 2:

ind <- sample(1:150, 50)

iris_test <- iris[ind, 1:4]

iris_train <- iris[-ind, 1:4]

train_label <- iris[-ind, 5]

test_label <- iris[ind, 5]

7. Build the KNN model

iris_pred<-knn(train=iris_train,test=iris_test,cl=train_label,k=3)

8. Evaluate the model

Cross-tabulation (contingency table)

table(test_label, iris_pred)
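Beyond reading the contingency table by eye, the overall accuracy can be computed from it. The following is a self-contained sketch that repeats steps 5 to 8 in compact form; it uses the normalized features and a fixed seed, both of which are choices made here for illustration rather than part of the walkthrough above.

```r
library(class)

set.seed(1234)

# Min-max normalization, as in step 5
normalize <- function(x) (x - min(x)) / (max(x) - min(x))
iris_norm <- as.data.frame(lapply(iris[, 1:4], normalize))

# Hold out 50 of the 150 rows as the test set (Method 2)
ind <- sample(1:150, 50)
iris_pred <- knn(train = iris_norm[-ind, ], test = iris_norm[ind, ],
                 cl = iris$Species[-ind], k = 3)
test_label <- iris$Species[ind]

# Rows: true class, columns: predicted class
confusion <- table(actual = test_label, predicted = iris_pred)

# Accuracy = correctly classified cases (the diagonal) / all test cases
accuracy <- sum(diag(confusion)) / sum(confusion)
print(confusion)
print(accuracy)
```

The off-diagonal cells show which species are confused with each other; repeating the run with different values of `k` is a simple way to compare settings.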

Example 2

Dataset

http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data

Import the data

dir <- 'http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data'

wdbc.data <- read.csv(dir, header = FALSE)

names(wdbc.data) <- c('ID','Diagnosis','radius_mean','texture_mean','perimeter_mean','area_mean','smoothness_mean','compactness_mean','concavity_mean','concave points_mean','symmetry_mean','fractal dimension_mean','radius_sd','texture_sd','perimeter_sd','area_sd','smoothness_sd','compactness_sd','concavity_sd','concave points_sd','symmetry_sd','fractal dimension_sd','radius_max_mean','texture_max_mean','perimeter_max_mean','area_max_mean','smoothness_max_mean','compactness_max_mean','concavity_max_mean','concave points_max_mean','symmetry_max_mean','fractal dimension_max_mean')

table(wdbc.data$Diagnosis)## M = malignant, B = benign

wdbc.data$Diagnosis <- factor(wdbc.data$Diagnosis,levels =c('B','M'),labels = c(B ='benign',M ='malignant'))

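Biostatistics is the earliest major branch of biomathematics to take shape. It emerged from applying statistical principles and methods to objective phenomena and problems in biology, and problems in biology have in turn driven the further development of most of its basic methods. Biostatistics is a branch of applied statistics that applies statistical methods to medicine and biology. Researchers and students in biomedicine can only analyze and interpret research data accurately, and reach more convincing conclusions, if they understand the basic methods and principles of statistics. This series uses R, the open-source statistics tool, to systematically introduce the principles and use of statistics in biomedicine.

Law of large numbers: as long as the experiment is repeated enough times, the sample mean approaches the expected value of the population.

Central limit theorem: the sum of many small random factors tends toward a normal distribution. Whatever the population distribution (provided its variance is finite), once the sample size is large enough the sample mean can be treated as approximately normally distributed.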
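Both statements can be checked with a short simulation. This is only a sketch: the exponential distribution, the sample size of 50, and the number of repetitions are arbitrary choices made here for illustration.

```r
set.seed(42)

# Population: Exponential(rate = 1), which has mean 1 and sd 1 and is strongly skewed
draws <- rexp(100000, rate = 1)

# Law of large numbers: the running mean settles near the expected value 1
running_mean <- cumsum(draws) / seq_along(draws)
lln_error <- abs(running_mean[c(10, 1000, 100000)] - 1)
print(lln_error)

# Central limit theorem: means of many samples of size n are roughly normal,
# centred at 1 with standard deviation close to 1 / sqrt(n)
n <- 50
sample_means <- replicate(5000, mean(rexp(n, rate = 1)))
clt_mean <- mean(sample_means)
clt_sd   <- sd(sample_means)
print(c(clt_mean, clt_sd, 1 / sqrt(n)))
```

Calling `hist(sample_means)` shows a bell-shaped histogram even though the individual draws are far from normal.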

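Data distributions

Discrete variables:

Smoking status

Frequency table

Pie chart

Relationship between two variables

Two continuous variables (height and weight):

Scatter plot

One discrete and one continuous variable:

Smoking and height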
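The summaries listed above could be produced in R along these lines. The `survey` data frame is made up purely for illustration, and summarizing height by smoking status with group means is an assumption, since the outline does not say which summary it had in mind.

```r
# Made-up data purely for illustration
survey <- data.frame(
  smoker = factor(c("yes", "no", "no", "yes", "no", "no")),
  height = c(174, 165, 180, 170, 175, 160),  # cm
  weight = c(70, 58, 82, 64, 74, 52)         # kg
)

# Draw to a null device so the sketch also runs non-interactively;
# drop this line (and dev.off() below) in an interactive session
pdf(NULL)

# Discrete variable: frequency table and pie chart of smoking status
smoke_counts <- table(survey$smoker)
pie(smoke_counts)

# Two continuous variables: scatter plot of weight against height
plot(survey$height, survey$weight, xlab = "Height (cm)", ylab = "Weight (kg)")

# One discrete and one continuous variable: mean height per smoking group
height_by_smoker <- tapply(survey$height, survey$smoker, mean)

invisible(dev.off())
print(smoke_counts)
print(height_by_smoker)
```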