R语言igraph学习路径

远客岂知今再到,老僧能记昔相逢。

Posted by Chevy on March 1, 2023

本文讨论igraph的使用

准备工作

首先我们需要使用igraph是因为在protein-protein-interaction或者disease-gene中networking visualization是很重要的,而igraph就是一个很好的用来展示的方法

Protein-protein-interaction或者disease-gene的相互关系来自DisGeNET data/STRING data

使用disgenet2r 包可以提取gene-disease associations (GDAs)

  • https://www.disgenet.org/static/disgenet2r/disgenet2r.html

在DO官网可以查询Disease-Ontology来pull down gene sets

  • https://disease-ontology.org/?id=DOID:863

作图

拿到关系数据后,使用igraph就可以构建network

library(devtools)
library(tidyverse)
library(disgenet2r)
library(igraph)

# ==============================================================================
# disease
# ==============================================================================
neurological_metabolic <- disease2gene( disease = c("D009422"),vocabulary = "MESH",
                                        database = "CURATED",
                                        score = c(0,1))
neurological_metabolic <- disease2gene( disease = c("0014667", "863"),vocabulary = "DO",
                                        database = "CURATED",
                                        score = c(0,1))

plot(neurological_metabolic, prop = 50)
results <- disgenet2r::extract(neurological_metabolic)
results %>% head
# ==============================================================================
# plot
# ==============================================================================
nodes <- data.frame(c(results$disease_class_name %>% unique(), results$gene_symbol), 
                    c("Nutritional and Metabolic Diseases", "Nervous System Diseases",
                      ifelse(results$gene_symbol %in% potential_targets, "Potential MGD targets", "Non-MGD targets")))
colnames(nodes)<-c("label","location")

### 构建network
net_pc <- results %>% dplyr::select(disease_class_name, gene_symbol, gene_dsi) %>% 
  rename(from = disease_class_name, to = gene_symbol, weight=gene_dsi) %>% 
  igraph::graph_from_data_frame(directed = T, vertices = nodes)

plot(net_pc)

###计算节点的度
deg <- degree(net_pc, mode="all")
deg[1:2] <- c(2, 3)

###设置颜色
vcolor <- c("orange","lightblue", "tomato","red")


###指定节点的颜色
V(net_pc)$color <- vcolor[factor(V(net_pc)$location)]

###指定边的颜色
E(net_pc)$width <- E(net_pc)$weight/2

pdf(file = "Neurological_Metabolic Disease.pdf", width = 10, height = 10)
plot(net_pc,
     vertex.size=5*deg,
     vertex.label.cex = 0.7,
     vertex.label.family = "sans", 
     vertex.label.dist= 1, 
     vertex.label.font = 2,
     edge.arrow.mode = 1, 
     layout = l, 
     main = "Degron Potential MGD Targets",
     edge.color="gray50", 
     edge.arrow.size=.3,
     edge.curved=.3)
legend(x=-1.4, y=1, levels(factor(V(net_pc)$location)), pch = 21, col="#777777", pt.bg = vcolor)
legend(x=-1.4, y=0.5, c("DiseaseOntologyID:863", "DiseaseOntologyID:0014667"), pch = 21, col="#777777", pt.bg = c("orange", "tomato"))
dev.off()

参考作图代码:

  • https://zhuanlan.zhihu.com/p/126126781
  • https://r-graph-gallery.com/248-igraph-plotting-parameters.html
par(bg="black")

plot(network, 
    
    # === vertex
    vertex.color = rgb(0.8,0.4,0.3,0.8),          # Node color
    vertex.frame.color = "white",                 # Node border color
    vertex.shape="circle",                        # One of “none”, “circle”, “square”, “csquare”, “rectangle” “crectangle”, “vrectangle”, “pie”, “raster”, or “sphere”
    vertex.size=14,                               # Size of the node (default is 15)
    vertex.size2=NA,                              # The second size of the node (e.g. for a rectangle)
    
    # === vertex label
    vertex.label=LETTERS[1:10],                   # Character vector used to label the nodes
    vertex.label.color="white",
    vertex.label.family="Times",                  # Font family of the label (e.g.“Times”, “Helvetica”)
    vertex.label.font=2,                          # Font: 1 plain, 2 bold, 3, italic, 4 bold italic, 5 symbol
    vertex.label.cex=1,                           # Font size (multiplication factor, device-dependent)
    vertex.label.dist=0,                          # Distance between the label and the vertex
    vertex.label.degree=0 ,                       # The position of the label in relation to the vertex (use pi)
    
    # === Edge
    edge.color="white",                           # Edge color
    edge.width=4,                                 # Edge width, defaults to 1
    edge.arrow.size=1,                            # Arrow size, defaults to 1
    edge.arrow.width=1,                           # Arrow width, defaults to 1
    edge.lty="solid",                             # Line type, could be 0 or “blank”, 1 or “solid”, 2 or “dashed”, 3 or “dotted”, 4 or “dotdash”, 5 or “longdash”, 6 or “twodash”
    edge.curved=0.3    ,                          # Edge curvature, range 0-1 (FALSE sets it to 0, TRUE to 0.5)
    )

引申

可以尝试使用ggraph画更好看的network plot

文献引用

Large-scale analysis of disease pathways in the human interactome

Human protein-protein interaction network.

We use the human PPI network compiled by Menche et al.18 and Chatr-Aryamontri et al.25 Culled from 15 databases, the network contains physical interactions experimentally documented in humans, such as metabolic enzyme-coupled interactions and signaling interactions. The network is unweighted and undirected with n = 21,557 proteins and m = 342,353 experimentally validated physical interactions. Proteins are mapped to genes and the largest connected component of the PPI network is used in the analysis. We also investigate two other PPI network datasets to make sure that our findings are not specific to the version of the PPI network we are using. Unless specified, results in the paper are stated with respect to the first dataset. The other two PPI networks are from the BioGRID database25 and the STRING database.26 Both of these networks are restricted to those edges that have been experimentally verified.

Protein-disease associations.

A protein-disease association is a tuple (u, d) indicating that alteration of protein u is linked to disease d. Protein-disease associations are pulled from DisGeNET, a platform that centralized the knowledge on Mendelian and complex diseases.2 We examine over 21,000 proteindisease associations, which are split among the 519 diseases that each has at least 10 associated disease proteins. The diseases range greatly in complexity and scope; the median number of associations per disease is 21, but the more complex diseases, e.g., cancers, have hundreds of associations.

Disease categories.

Diseases are subdivided into categories and subcategories using the Disease Ontology.27 The diseases in the ontology are each mapped to one or more Unified Medical Language System (UMLS) codes, and of the 519 diseases pulled from DisGeNET, 290 have a UMLS code that maps to one of the codes in the ontology. For the purposes of this study, we examine the second-level of the ontology; this level consists of 10 categories, such as cancers (68 diseases), nervous system diseases (44), cardiovascular system diseases (33), and immune system diseases (21). Altogether, we use human disease and PPI network information that is more comprehensive than in previous works,7,18,22 which focused on smaller sets of diseases and proteins.