本文讨论igraph
的使用
准备工作
首先我们需要使用igraph是因为在protein-protein-interaction或者disease-gene中networking visualization是很重要的,而igraph就是一个很好的用来展示的方法
Protein-protein-interaction或者disease-gene的相互关系来自DisGeNET data/STRING data
使用disgenet2r 包可以提取gene-disease associations (GDAs)
- https://www.disgenet.org/static/disgenet2r/disgenet2r.html
在DO官网可以查询Disease-Ontology来pull down gene sets
- https://disease-ontology.org/?id=DOID:863
作图
拿到关系数据后,使用igraph就可以构建network
library(devtools)
library(tidyverse)
library(disgenet2r)
library(igraph)
# ==============================================================================
# disease
# ==============================================================================
neurological_metabolic <- disease2gene( disease = c("D009422"),vocabulary = "MESH",
database = "CURATED",
score = c(0,1))
neurological_metabolic <- disease2gene( disease = c("0014667", "863"),vocabulary = "DO",
database = "CURATED",
score = c(0,1))
plot(neurological_metabolic, prop = 50)
results <- disgenet2r::extract(neurological_metabolic)
results %>% head
# ==============================================================================
# plot
# ==============================================================================
nodes <- data.frame(c(results$disease_class_name %>% unique(), results$gene_symbol),
c("Nutritional and Metabolic Diseases", "Nervous System Diseases",
ifelse(results$gene_symbol %in% potential_targets, "Potential MGD targets", "Non-MGD targets")))
colnames(nodes)<-c("label","location")
### 构建network
net_pc <- results %>% dplyr::select(disease_class_name, gene_symbol, gene_dsi) %>%
rename(from = disease_class_name, to = gene_symbol, weight=gene_dsi) %>%
igraph::graph_from_data_frame(directed = T, vertices = nodes)
plot(net_pc)
###计算节点的度
deg <- degree(net_pc, mode="all")
deg[1:2] <- c(2, 3)
###设置颜色
vcolor <- c("orange","lightblue", "tomato","red")
###指定节点的颜色
V(net_pc)$color <- vcolor[factor(V(net_pc)$location)]
###指定边的颜色
E(net_pc)$width <- E(net_pc)$weight/2
pdf(file = "Neurological_Metabolic Disease.pdf", width = 10, height = 10)
plot(net_pc,
vertex.size=5*deg,
vertex.label.cex = 0.7,
vertex.label.family = "sans",
vertex.label.dist= 1,
vertex.label.font = 2,
edge.arrow.mode = 1,
layout = l,
main = "Degron Potential MGD Targets",
edge.color="gray50",
edge.arrow.size=.3,
edge.curved=.3)
legend(x=-1.4, y=1, levels(factor(V(net_pc)$location)), pch = 21, col="#777777", pt.bg = vcolor)
legend(x=-1.4, y=0.5, c("DiseaseOntologyID:863", "DiseaseOntologyID:0014667"), pch = 21, col="#777777", pt.bg = c("orange", "tomato"))
dev.off()
参考作图代码:
- https://zhuanlan.zhihu.com/p/126126781
- https://r-graph-gallery.com/248-igraph-plotting-parameters.html
par(bg="black")
plot(network,
# === vertex
vertex.color = rgb(0.8,0.4,0.3,0.8), # Node color
vertex.frame.color = "white", # Node border color
vertex.shape="circle", # One of “none”, “circle”, “square”, “csquare”, “rectangle” “crectangle”, “vrectangle”, “pie”, “raster”, or “sphere”
vertex.size=14, # Size of the node (default is 15)
vertex.size2=NA, # The second size of the node (e.g. for a rectangle)
# === vertex label
vertex.label=LETTERS[1:10], # Character vector used to label the nodes
vertex.label.color="white",
vertex.label.family="Times", # Font family of the label (e.g.“Times”, “Helvetica”)
vertex.label.font=2, # Font: 1 plain, 2 bold, 3, italic, 4 bold italic, 5 symbol
vertex.label.cex=1, # Font size (multiplication factor, device-dependent)
vertex.label.dist=0, # Distance between the label and the vertex
vertex.label.degree=0 , # The position of the label in relation to the vertex (use pi)
# === Edge
edge.color="white", # Edge color
edge.width=4, # Edge width, defaults to 1
edge.arrow.size=1, # Arrow size, defaults to 1
edge.arrow.width=1, # Arrow width, defaults to 1
edge.lty="solid", # Line type, could be 0 or “blank”, 1 or “solid”, 2 or “dashed”, 3 or “dotted”, 4 or “dotdash”, 5 or “longdash”, 6 or “twodash”
edge.curved=0.3 , # Edge curvature, range 0-1 (FALSE sets it to 0, TRUE to 0.5)
)
引申
可以尝试使用ggraph
画更好看的network plot
文献引用
Large-scale analysis of disease pathways in the human interactome
Human protein-protein interaction network.
We use the human PPI network compiled by Menche et al.18 and Chatr-Aryamontri et al.25 Culled from 15 databases, the network contains physical interactions experimentally documented in humans, such as metabolic enzyme-coupled interactions and signaling interactions. The network is unweighted and undirected with n = 21,557 proteins and m = 342,353 experimentally validated physical interactions. Proteins are mapped to genes and the largest connected component of the PPI network is used in the analysis. We also investigate two other PPI network datasets to make sure that our findings are not specific to the version of the PPI network we are using. Unless specified, results in the paper are stated with respect to the first dataset. The other two PPI networks are from the BioGRID database25 and the STRING database.26 Both of these networks are restricted to those edges that have been experimentally verified.
Protein-disease associations.
A protein-disease association is a tuple (u, d) indicating that alteration of protein u is linked to disease d. Protein-disease associations are pulled from DisGeNET, a platform that centralized the knowledge on Mendelian and complex diseases.2 We examine over 21,000 proteindisease associations, which are split among the 519 diseases that each has at least 10 associated disease proteins. The diseases range greatly in complexity and scope; the median number of associations per disease is 21, but the more complex diseases, e.g., cancers, have hundreds of associations.
Disease categories.
Diseases are subdivided into categories and subcategories using the Disease Ontology.27 The diseases in the ontology are each mapped to one or more Unified Medical Language System (UMLS) codes, and of the 519 diseases pulled from DisGeNET, 290 have a UMLS code that maps to one of the codes in the ontology. For the purposes of this study, we examine the second-level of the ontology; this level consists of 10 categories, such as cancers (68 diseases), nervous system diseases (44), cardiovascular system diseases (33), and immune system diseases (21). Altogether, we use human disease and PPI network information that is more comprehensive than in previous works,7,18,22 which focused on smaller sets of diseases and proteins.