Networks: igraph, the basic R package for network analysis

2023-03-02 17:29:01


Graphs have the class 'igraph'. Here is an example, a ring graph created with make_ring:
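The code for this example did not survive in the text, so the following is a minimal sketch of what it may have looked like. The ring size of 10 is an assumption chosen for illustration.

```r
library(igraph)

# A ring of 10 vertices: each vertex is joined to the next, and the last to the first.
# The size (10) is an illustrative assumption, not taken from the original post.
g <- make_ring(10)

class(g)   # graphs are objects of class "igraph"
g          # printing the object shows a compact summary of the graph
```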

To see the edges of the graph as well, use the print_all function:
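A short sketch of this call, again assuming a 10-vertex ring as the example graph. as_edgelist is added here only as another way to inspect the same edges as a plain matrix.

```r
library(igraph)

g <- make_ring(10)        # assumed example graph

print_all(g)              # full printout, including the edge list

el <- as_edgelist(g)      # the same edges as a two-column matrix (from, to)
el
```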

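To create small graphs with a given structure, the graph_from_literal function is probably the easiest choice. It uses R's formula interface, and its manual page contains many examples. Another option is the graph function, which takes numeric vertex ids directly. graph.atlas creates graphs from the Graph Atlas, and the make_graph function can create some special graphs.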
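The constructors described above can be sketched as follows. The vertex names and edge lists are made up for illustration, and make_graph is used for the numeric-id form as well, on the assumption that a recent igraph release is installed.

```r
library(igraph)

# Formula-style literal: dashes denote undirected edges, D is an isolated vertex
g1 <- graph_from_literal(A-B, B-C, C-A, D)

# Numeric vertex ids, read pairwise: 1-2, 2-3, 3-1
g2 <- make_graph(c(1, 2, 2, 3, 3, 1), directed = FALSE)

# A named "special" graph: Zachary's karate club network
g3 <- make_graph("Zachary")

c(vcount(g1), ecount(g1))
c(vcount(g2), ecount(g2))
c(vcount(g3), ecount(g3))
```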

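igraph has many functions for creating graphs, both deterministic and stochastic; the stochastic graph constructors are called 'games'.

To create graphs from field data, graph_from_edgelist, graph_from_data_frame and graph_from_adjacency_matrix are probably the best choices.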
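As a sketch of these three entry points, the example below builds small graphs from an edge list matrix, from a pair of data frames, and from an adjacency matrix. All names and values are invented for illustration.

```r
library(igraph)

# 1. Two-column edge list (character entries become vertex names)
el <- matrix(c("alice", "bob",
               "bob",   "carol",
               "carol", "alice"), ncol = 2, byrow = TRUE)
g_el <- graph_from_edgelist(el, directed = FALSE)

# 2. Data frames: extra edge columns become edge attributes,
#    extra vertex columns become vertex attributes
edges <- data.frame(from = c("alice", "bob"),
                    to = c("bob", "carol"),
                    weight = c(1.5, 2))
nodes <- data.frame(name = c("alice", "bob", "carol", "dave"),
                    age = c(30, 25, 41, 19))
g_df <- graph_from_data_frame(edges, directed = TRUE, vertices = nodes)

# 3. Symmetric 0/1 adjacency matrix
adj <- matrix(c(0, 1, 1,
                1, 0, 0,
                1, 0, 0), nrow = 3, byrow = TRUE)
g_adj <- graph_from_adjacency_matrix(adj, mode = "undirected")
```

Note that "dave" appears only in the vertex table, so he ends up as an isolated vertex in g_df.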

igraph includes some classic random graphs, such as the Erdos-Renyi GNP and GNM graphs (sample_gnp, sample_gnm), as well as some recently popular models, such as preferential attachment (sample_pa) and the small-world model (sample_smallworld).
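A brief sketch of the four generators named above. The sizes, probabilities and seed are arbitrary illustration values.

```r
library(igraph)

set.seed(42)   # make the random graphs reproducible

g_gnp <- sample_gnp(n = 100, p = 0.05)              # each possible edge present with probability p
g_gnm <- sample_gnm(n = 100, m = 200)               # exactly m edges chosen uniformly at random
g_pa  <- sample_pa(n = 100, m = 2, directed = FALSE) # preferential attachment
g_sw  <- sample_smallworld(dim = 1, size = 100, nei = 3, p = 0.05)  # rewired ring lattice

sapply(list(gnp = g_gnp, gnm = g_gnm, pa = g_pa, sw = g_sw), ecount)
```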

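Vertices and edges both have numeric ids in igraph. Vertex ids start at 1 and are always consecutive: for a graph with n vertices, the vertex ids are between 1 and n. If an operation changes the number of vertices in the graph, for example when induced_subgraph creates a subgraph, the vertices are renumbered to satisfy this condition.

The same is true for edges: edge ids are always between 1 and m, where m is the total number of edges in the graph.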
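The renumbering can be seen in a small sketch. The attribute orig_id is a hypothetical helper introduced here to remember where each vertex came from.

```r
library(igraph)

g <- make_ring(10)
V(g)$orig_id <- 1:10            # hypothetical attribute recording the original ids

# Keep only vertices 3, 4, 5 and 8 together with the edges among them
sub <- induced_subgraph(g, c(3, 4, 5, 8))

new_ids <- as_ids(V(sub))       # ids in the subgraph are consecutive again
new_ids
sort(V(sub)$orig_id)            # the attribute still identifies the original vertices
ecount(sub)                     # only the edges 3-4 and 4-5 survive
```

Storing the original id as an attribute is one simple way to map results on a subgraph back to the full graph.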

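In igraph, attributes can be assigned to the vertices or edges of a graph, or to the graph itself. igraph provides flexible constructs for selecting a set of vertices or edges based on their attribute values; see vertex_attr, V and E for details.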
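A minimal sketch of setting attributes at the three levels and then selecting by attribute value. The attribute names and values are illustrative.

```r
library(igraph)

g <- make_ring(6)

g$title     <- "toy ring"                    # graph attribute
V(g)$color  <- rep(c("red", "blue"), 3)      # vertex attribute
E(g)$weight <- c(1, 5, 2, 7, 3, 9)           # edge attribute

vertex_attr(g, "color")                      # query a vertex attribute by name

# Inside V()[...] and E()[...] attribute names can be used directly
red_vertices <- V(g)[color == "red"]
heavy_edges  <- E(g)[weight > 4]

red_vertices
heavy_edges
```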

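Some vertex, edge and graph attributes are treated specially. One of them is the "name" attribute. If it exists, it is used instead of the numeric ids when the graph is printed. In all igraph functions, vertex names can also be used to specify a vector or set of vertices. For example, degree has a v argument that gives the vertices whose degree is calculated. This argument can be given as a character vector of vertex names.

Edges can also have a "name" attribute, and this is treated specially as well. Just like vertices, edges can be selected by their names, for example in delete_edges and other functions.

Note that vertex names can also be used to select edges. The form "from|to", where "from" and "to" are vertex names, selects a single, possibly directed, edge going from "from" to "to". The two forms can also be mixed in the same edge selector.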
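The name-based selection described above can be sketched as follows. The graph and the edge names e1, e2, ... are invented for illustration.

```r
library(igraph)

g <- graph_from_literal(A-B, B-C, C-D, D-A)    # vertex names come from the literal

# Vertex names as the v argument of degree()
deg <- degree(g, v = c("A", "C"))
deg

# Give the edges names, then delete one edge by its name
E(g)$name <- paste0("e", seq_len(ecount(g)))
g_by_name <- delete_edges(g, "e1")

# Delete an edge by the names of its endpoints, using the "from|to" form
g_by_ends <- delete_edges(g, "A|B")
are_adjacent(g_by_ends, "A", "B")
```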

If you use save and load to store and retrieve graphs, all attribute values are preserved.
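A small round-trip sketch of this behaviour, using a temporary file. The attribute values are illustrative.

```r
library(igraph)

g <- make_ring(5)
V(g)$name   <- LETTERS[1:5]
E(g)$weight <- 1:5
g$title     <- "saved ring"

path <- tempfile(fileext = ".RData")
save(g, file = path)   # store the graph object in R's own format

rm(g)                  # drop it from the workspace
load(path)             # restores an object called g

V(g)$name
E(g)$weight
g$title
```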

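igraph provides three different ways of visualizing graphs. The first is the plot.igraph function. (Actually you do not need to write plot.igraph; plot is enough.) This function uses regular R graphics and can be used with any R device.

The second function is tkplot, which uses a Tk GUI for basic interactive graph manipulation. (Tk is quite resource hungry, so do not try this on very large graphs.)

The third way requires the rgl package and uses OpenGL.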
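A minimal sketch of the first approach. The layout and styling arguments are illustrative choices. The interactive calls are left commented out because they need a display; rglplot is the igraph function for the rgl-based view, which the text above does not name explicitly.

```r
library(igraph)

g <- make_ring(10)

# Compute a layout once so that the same coordinates can be reused
lay <- layout_in_circle(g)

# plot() dispatches to plot.igraph for igraph objects
plot(g, layout = lay,
     vertex.size = 20,
     vertex.color = "lightblue",
     vertex.label.cex = 0.8)

# tkplot(g)    # interactive Tk window; needs Tcl/Tk support
# rglplot(g)   # 3D OpenGL view; needs the rgl package
```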

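igraph can handle various graph file formats, usually for both reading and writing. We recommend the GraphML file format for graphs, unless the graphs are too big. For bigger graphs a simpler format is recommended. See read_graph and write_graph for details.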
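A round-trip sketch for both cases, assuming an igraph build with GraphML support. The temporary file names are arbitrary, and the plain edge list stands in for "a simpler format".

```r
library(igraph)

g <- make_ring(5)
V(g)$name <- LETTERS[1:5]

# GraphML keeps attributes and suits small to medium graphs
graphml_path <- tempfile(fileext = ".graphml")
write_graph(g, graphml_path, format = "graphml")
g_graphml <- read_graph(graphml_path, format = "graphml")

# A bare edge list is much simpler, but it stores only the edges
edgelist_path <- tempfile(fileext = ".txt")
write_graph(g, edgelist_path, format = "edgelist")
g_edgelist <- read_graph(edgelist_path, format = "edgelist", directed = FALSE)

c(vcount(g_graphml), ecount(g_graphml))
c(vcount(g_edgelist), ecount(g_edgelist))
```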

igraph development team

igraph Tutorials

Related articles in this series:

Drawing dynamic networks with networkD3

R study notes: cluster analysis

Packages required for k-means clustering:

factoextra

cluster

#Load the packages

library(factoextra)

library(cluster)

#Data preparation

Use the built-in R dataset USArrests.

#load the dataset

data("USArrests")

#remove any missing value (i.e, NA values for not available)

#That might be present in the data

USArrests <- na.omit(USArrests)

#view the first 6 rows of the data

head(USArrests, n=6)

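In this dataset, the columns are variables and the rows are observations.

Before clustering, we can run some basic checks on the data, namely descriptive statistics such as the mean and standard deviation.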

desc_stats <- data.frame( Min=apply(USArrests, 2, min),#minimum

Med=apply(USArrests, 2, median),#median

Mean=apply(USArrests, 2, mean),#mean

SD=apply(USArrests, 2, sd),#Standard deviation

Max=apply(USArrests, 2, max)#maximum

)

desc_stats <- round(desc_stats, 1) #keep one decimal place

head(desc_stats)

When the variables differ greatly in mean and variance, they need to be standardized.

df <- scale(USArrests)
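As a quick check of what scale() does here, the sketch below confirms that each column ends up with mean 0 and standard deviation 1, and reproduces the result by hand. The name manual is introduced only for this illustration.

```r
df <- scale(USArrests)

round(colMeans(df), 10)   # column means after scaling
apply(df, 2, sd)          # column standard deviations after scaling

# The same transformation by hand: subtract the column mean, divide by the column sd
x <- as.matrix(USArrests)
manual <- sweep(sweep(x, 2, colMeans(x)), 2, apply(x, 2, sd), "/")

max(abs(manual - df))     # largest difference between the two versions
```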

#Assessing clustering tendency

Use get_clust_tendency() to compute the Hopkins statistic.

res <- get_clust_tendency(df, 40, graph = TRUE)

res$hopkins_stat

## [1] 0.3440875

#Visualize the dissimilarity matrix

res$plot

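The Hopkins statistic is below 0.5, which indicates that the data is highly clusterable. (This reading follows the convention of the factoextra version that produced the output above; some releases report the statistic on the opposite scale, where values close to 1 indicate clusterable data, so check ?get_clust_tendency for the installed version.) The plot also suggests that the data is clusterable.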

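#Estimating the number of clusters

Because k-means clustering requires the number of clusters to be specified in advance, we use the function clusGap() to compute the gap statistic, which is used to estimate the optimal number of clusters. The function fviz_gap_stat() is used for visualization.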

set.seed(123)

## Compute the gap statistic

gap_stat <- clusGap(df, FUN = kmeans, nstart = 25, K.max = 10, B = 500)

# Plot the result

fviz_gap_stat(gap_stat)

The plot suggests that four clusters (k = 4) is the best choice.

#Running the clustering
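The number of clusters can also be read off numerically rather than from the plot. The sketch below uses maxSE() from the cluster package with the "firstSEmax" rule; both the rule and the smaller B (chosen to keep the run short) are assumptions, so the result may differ from the figure above.

```r
library(cluster)

set.seed(123)
df <- scale(USArrests)

# Gap statistic for k = 1..10; B is the number of reference bootstrap samples
gap_stat <- clusGap(df, FUN = kmeans, nstart = 25, K.max = 10, B = 50)

# Pick k from the gap values and their simulation standard errors
k_hat <- maxSE(f = gap_stat$Tab[, "gap"],
               SE.f = gap_stat$Tab[, "SE.sim"],
               method = "firstSEmax")
k_hat
```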

set.seed(123)

km.res <- kmeans(df, 4, nstart = 25)

head(km.res$cluster, 20)

# Visualize clusters using factoextra

fviz_cluster(km.res, USArrests)
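Besides the plot, the fitted object can be inspected directly. This sketch, which uses only base R, lists the cluster sizes and the per-cluster means of the original, unscaled variables; the name cluster_means is introduced here for illustration.

```r
set.seed(123)
df <- scale(USArrests)
km.res <- kmeans(df, 4, nstart = 25)

km.res$size      # number of observations in each cluster
km.res$centers   # cluster centers on the standardized scale

# Means of the original variables within each cluster
cluster_means <- aggregate(USArrests,
                           by = list(cluster = km.res$cluster),
                           FUN = mean)
cluster_means
```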

#Checking the cluster silhouette plot

Recall that the silhouette width (S_i) measures how similar an object i is to the other objects in its own cluster versus those in the neighbor cluster. S_i values range from 1 to -1:

A value of S_i close to 1 indicates that the object is well clustered. In other words, the object i is similar to the other objects in its group.

A value of S_i close to -1 indicates that the object is poorly clustered, and that assignment to some other cluster would probably improve the overall results.

sil <- silhouette(km.res$cluster, dist(df))

rownames(sil) <- rownames(USArrests)

head(sil[, 1:3])

#Visualize

fviz_silhouette(sil)

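The plot shows that some silhouette widths are negative; the output of silhouette() can be used to find which observations they belong to.

neg_sil_index <- which(sil[, "sil_width"] < 0)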

sil[neg_sil_index, , drop = FALSE]

##          cluster neighbor   sil_width
## Missouri       3        2 -0.07318144
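To make the silhouette width concrete, the sketch below recomputes it by hand and compares it with silhouette(). It assumes the usual definition S_i = (b_i - a_i) / max(a_i, b_i), where a_i is the mean distance from i to the other members of its own cluster and b_i is the smallest mean distance from i to the members of any other cluster. The name manual_sil is introduced for illustration.

```r
library(cluster)

set.seed(123)
df <- scale(USArrests)
cl <- kmeans(df, 4, nstart = 25)$cluster
d  <- as.matrix(dist(df))

manual_sil <- sapply(seq_len(nrow(df)), function(i) {
  own <- cl == cl[i]
  if (sum(own) == 1) return(0)            # a singleton cluster gets width 0
  # mean distance to the other members of the own cluster (d[i, i] is 0)
  a <- sum(d[i, own]) / (sum(own) - 1)
  # smallest mean distance to the members of any other cluster
  b <- min(sapply(setdiff(unique(cl), cl[i]),
                  function(k) mean(d[i, cl == k])))
  (b - a) / max(a, b)
})

sil <- silhouette(cl, dist(df))
all.equal(as.numeric(manual_sil), as.numeric(sil[, "sil_width"]))
```

A negative width therefore means that the observation is, on average, closer to the members of a neighboring cluster than to those of its own.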

#eclust(): enhanced cluster analysis

Compared with other cluster analysis packages, eclust() has the following advantages: