Protein-protein-interaction或者disease-gene的相互关系来自DisGeNET data/STRING data
使用disgenet2r 包可以提取gene-disease associations (GDAs)
在DO官网可以查询Disease-Ontology来pull down gene sets
library(devtools) library(tidyverse) library(disgenet2r) library(igraph) # ============================================================================== # disease # ============================================================================== neurological_metabolic <- disease2gene( disease = c("D009422"),vocabulary = "MESH", database = "CURATED", score = c(0,1)) neurological_metabolic <- disease2gene( disease = c("0014667", "863"),vocabulary = "DO", database = "CURATED", score = c(0,1)) plot(neurological_metabolic, prop = 50) results <- disgenet2r::extract(neurological_metabolic) results %>% head # ============================================================================== # plot # ============================================================================== nodes <- data.frame(c(results$disease_class_name %>% unique(), results$gene_symbol), c("Nutritional and Metabolic Diseases", "Nervous System Diseases", ifelse(results$gene_symbol %in% potential_targets, "Potential MGD targets", "Non-MGD targets"))) colnames(nodes)<-c("label","location") ### 构建network net_pc <- results %>% dplyr::select(disease_class_name, gene_symbol, gene_dsi) %>% rename(from = disease_class_name, to = gene_symbol, weight=gene_dsi) %>% igraph::graph_from_data_frame(directed = T, vertices = nodes) plot(net_pc) ###计算节点的度 deg <- degree(net_pc, mode="all") deg[1:2] <- c(2, 3) ###设置颜色 vcolor <- c("orange","lightblue", "tomato","red") ###指定节点的颜色 V(net_pc)$color <- vcolor[factor(V(net_pc)$location)] ###指定边的颜色 E(net_pc)$width <- E(net_pc)$weight/2 pdf(file = "Neurological_Metabolic Disease.pdf", width = 10, height = 10) plot(net_pc, vertex.size=5*deg, vertex.label.cex = 0.7, vertex.label.family = "sans", vertex.label.dist= 1, vertex.label.font = 2, edge.arrow.mode = 1, layout = l, main = "Degron Potential MGD Targets", edge.color="gray50", edge.arrow.size=.3, edge.curved=.3) legend(x=-1.4, y=1, levels(factor(V(net_pc)$location)), pch = 21, col="#777777", pt.bg = vcolor) legend(x=-1.4, y=0.5, c("DiseaseOntologyID:863", "DiseaseOntologyID:0014667"), pch = 21, col="#777777", pt.bg = c("orange", "tomato")) dev.off()
par(bg="black") plot(network, # === vertex vertex.color = rgb(0.8,0.4,0.3,0.8), # Node color vertex.frame.color = "white", # Node border color vertex.shape="circle", # One of “none”, “circle”, “square”, “csquare”, “rectangle” “crectangle”, “vrectangle”, “pie”, “raster”, or “sphere” vertex.size=14, # Size of the node (default is 15) vertex.size2=NA, # The second size of the node (e.g. for a rectangle) # === vertex label vertex.label=LETTERS[1:10], # Character vector used to label the nodes vertex.label.color="white", vertex.label.family="Times", # Font family of the label (e.g.“Times”, “Helvetica”) vertex.label.font=2, # Font: 1 plain, 2 bold, 3, italic, 4 bold italic, 5 symbol vertex.label.cex=1, # Font size (multiplication factor, device-dependent) vertex.label.dist=0, # Distance between the label and the vertex vertex.label.degree=0 , # The position of the label in relation to the vertex (use pi) # === Edge edge.color="white", # Edge color edge.width=4, # Edge width, defaults to 1 edge.arrow.size=1, # Arrow size, defaults to 1 edge.arrow.width=1, # Arrow width, defaults to 1 edge.lty="solid", # Line type, could be 0 or “blank”, 1 or “solid”, 2 or “dashed”, 3 or “dotted”, 4 or “dotdash”, 5 or “longdash”, 6 or “twodash” edge.curved=0.3 , # Edge curvature, range 0-1 (FALSE sets it to 0, TRUE to 0.5) )
Human protein-protein interaction network.
We use the human PPI network compiled by Menche et al.18 and Chatr-Aryamontri et al.25 Culled from 15 databases, the network contains physical interactions experimentally documented in humans, such as metabolic enzyme-coupled interactions and signaling interactions. The network is unweighted and undirected with n = 21,557 proteins and m = 342,353 experimentally validated physical interactions. Proteins are mapped to genes and the largest connected component of the PPI network is used in the analysis. We also investigate two other PPI network datasets to make sure that our findings are not specific to the version of the PPI network we are using. Unless specified, results in the paper are stated with respect to the first dataset. The other two PPI networks are from the BioGRID database25 and the STRING database.26 Both of these networks are restricted to those edges that have been experimentally verified.
A protein-disease association is a tuple (u, d) indicating that alteration of protein u is linked to disease d. Protein-disease associations are pulled from DisGeNET, a platform that centralized the knowledge on Mendelian and complex diseases.2 We examine over 21,000 proteindisease associations, which are split among the 519 diseases that each has at least 10 associated disease proteins. The diseases range greatly in complexity and scope; the median number of associations per disease is 21, but the more complex diseases, e.g., cancers, have hundreds of associations.
Diseases are subdivided into categories and subcategories using the Disease Ontology.27 The diseases in the ontology are each mapped to one or more Unified Medical Language System (UMLS) codes, and of the 519 diseases pulled from DisGeNET, 290 have a UMLS code that maps to one of the codes in the ontology. For the purposes of this study, we examine the second-level of the ontology; this level consists of 10 categories, such as cancers (68 diseases), nervous system diseases (44), cardiovascular system diseases (33), and immune system diseases (21). Altogether, we use human disease and PPI network information that is more comprehensive than in previous works,7,18,22 which focused on smaller sets of diseases and proteins.