Cohesive blocking

From InterSciWiki

Jump to: navigation, search
Southern Women event network: Davis.net N=32 Peter McMahan igraph vertex.size=20 Moody-White (2003) cohesive blocks algorithm for finding cohesive subgroups. There are not two cohesive subgroups here as claimed by most analysts, but one. See 30 physicists can be wrong
Southern Women event network: Davis.net N=32 Peter McMahan igraph vertex.size=20 Moody-White (2003) cohesive blocks algorithm for finding cohesive subgroups. There are not two cohesive subgroups here as claimed by most analysts, but one. See 30 physicists can be wrong

Contents

[edit] Introduction to network cohesion

(Author: Douglas R. White). Wasserman and Faust (1994:249) devote two chapters to network cohesion, one on one-mode networks (Chapter 7, Cohesive Subgroups), which treats properties of pairwise ties (relatively strong, direct, intense, frequent or positive), and the other on two-mode networks (Chapter 8, Affiliations and Overlapping Subgroups), which treats ties existing among actors through their joint memberships in collectivities. Their Chapter 4 (Graphs, 4.2.8 pp. 155-116 Connectivity) gives definitions of line and node connectivity in graphs. These are treated by White and Harary (2001) as giving criteria for subgraph adhesion (line dependent connectivity) and subgraph cohesion (node dependent connectivity):

Line connectivity λ is the minimum number of lines that must be removed to disconnect a network (adhesion)
Node connectivity κ is the minimum number of nodes that must be removed to disconnect a network (cohesion, multiconnectivity)

Borgatti's slides (2006:19-20) on Graph cohesion refer to these definitions but use only pairwise multiconnectivity as computed by the White and Newman algorithm (2005) for

Pairwise node connectivity κ is the minimum number of nodes that must be removed to disconnect two nodes in a network (pairwise multiconnectivity).
Fig. 1: Borgatti's Science camp network with three 3-connected blocks within a bicomponent. Russ, Gery and John are only 2-cohesive however: John and Gery are a 2-node cut of the graph, and so forth
Fig. 1: Borgatti's Science camp network with three 3-connected blocks within a bicomponent. Russ, Gery and John are only 2-cohesive however: John and Gery are a 2-node cut of the graph, and so forth

Borgatti's (2006:7) blocks are not consistently node-connected by the removal criteria. His p. 20 "Graph connectivity" matrix is incorrectly labeled: This is Arc connectivity used in computing maxflow, not node connectivity.

[edit] Calculation

http://intersci.ss.uci.edu/wiki//Vlado/BoCoRaw.net.R - 18 node Science camp network - just click, copy and paste into R. Fig. 2 Moody's SAS drawing of Fig. 1 using the method of Moody and White 2003.

Note that the tree-structure graph in Peter's algorithm has one extra 3-block that is not correctly computed. This will be fixed, but the three red clusters are correct. The Moody-White SAS algorithm computes the tree graph correctly. Note that the data are called inside the R program from http://intersci.ss.uci.edu/wiki/Vlado/BoCoRaw.net and when downloaded the network can be examined directly in Pajek or Ucinet.

James Moody comments: You're right, this is "Incorrect, it should be 3, as there are only 3 distinct k-components and his graph looks like 4 sets coming from the whole graph."
"The only room for an alternative is whether you posit an "intermediary" 2-component nested in the largest 2-component that contains Holly plus the other 6 in that 3-component. If so, then you have 3 distinct 3-components, but two are nested at level 2 (just below the full graph) and 1 is nested at level 3 -- below the intermediary 2-connected set (like the hierarchy subgraph in the image I sent)."

DRW: So now we need to get Peter to amend or defend his algorithm or justify the difference in results.

[edit] Alternate measures

  • A k-component (with node connectivity κ) is the maximal subgraph for which k is the minimum number of nodes that must be removed to disconnect it (structural k-cohesion, k-multiconnected group, k-cohesive block).
  • A k-component (with node connectivity κ) is the maximal subgraph for which every pair of nodes is connected by at least k non-independent paths, i.e., direct links or indirect paths between the pair of nodes with no common intermediate nodes (structural k-cohesion, k-multiconnected group, k-cohesive block).
  • The first and second definition were proven to be equivalent by Karl Menger (Wikipedia:Menger's theorem, 1927), and in each case the maximal subgraphs in each case are unique.
  • A clique of n nodes is a k-component where k=n-1, i.e., a special case of k-connectivity and thus not a measure of cohesion.

[edit] Insufficient measures

  • A k-core is a maximal subgraph in which each node has at least degree k. This is a necessary condition for a k-component but not a sufficient condition. A k-core need not be connected and is thus not a measure of cohesion.
  • A k-plex is a maximal subgraph of size n in which each node has at least degree n-k. This is a necessary condition for a k-component but not a sufficient condition. A k-plex need not be connected and is thus not a measure of cohesion.
  • Distance criteria for cohesion are independent of k-connectivity except that k-cohesive nodes for k>1 will have finite graph distances.
  • Density criteria for cohesion are independent of k-connectivity except that density is non-zero.
  • Community detection density criteria as to within-group versus between-group ties are independent of k-cohesion except that the k-cohesion within a k-cohesive block is maximal with respect to any larger block. Similarly for LS and Lambda sets.
James Moody comment: -- Does raise the question of the relation between various k-cohesion levels and the modularity score. My sense is that there is no simple relation, since modularity is edge-level sorting and a high k-component nested inside a lower k-component will appear to have lots of "cross-group" ties, precisely because modularity misses the nested nature of the groups. Could emphasize this as important - most community detection features are non nested, and thus treat all groups as equal.
  • In general, k-cohesive subsets are not identical to known outcomes of matrix operations (PCA, SVD, etc.) or multidimensional scaling.

[edit] Weighted measures for valued networks?

  • Pairwise properties of nodes (strength of tie, direct/indirect ties, intensity, frequency, positive sign) are independent of k-connectivity except that when ties are signed they much be positive to be cohesive.
  • It is not clear how these could be used as weightings because they are not uniform across the ties of a cohesive block.
  • It is better to use these as alternate or competing measures, e.g., in multiple regression.

[edit] Cohesion measures for directed networks (nonsymmetric ties)

  • There are some phenomena in which k-cohesion does not act as "a rising tide for all boats" (nodes) in a cohesion subset because the ties are directed but do not form directed cycles, hence some of the relations within the cohesive block may be hierarchical.
  • In the case where directed cycles are thought to have greater cohesive effects than semicycles in directed graphs it may be useful to:
1. Compute the cohesive blocks for semicycles.
2. Compute the directed cycles and compute the nodes not in these cycles but in the semicycles (this assumes that not all ties in the cohesive blocks are symmetric).
3. Disqualify those those nodes in the cohesive blocks.
4. Recompute cohesive blocks for the reduced set of nodes.

James Moody comment: Rick Grannis is working on / has a paper on directed cohesion. It's all based on pair-wise connections, but does make a case for why direction would be useful.

Doug found Grannis draft, 2007 Strong and Unilateral Structural Cohesion.... "In this article, I recast structural cohesion in terms of directed social flow and identify four distinct ways of measuring it: recursive, strong, unilateral, and weak."

Earlier work on some of these aspects of cohesion is found in Structural Models: An Introduction to the Theory of Directed Graphs, 1966. Frank Harary, Robert Z. Norman, and Dorwin Cartwright.

[edit] Cohesion measures for multiple ties

  • It is always better to compute cohesion independently for each type of positive ties, and test the independent and weighted multiplicative effects of each independently.
James Moody comment: I'm just not sure about this; guess it will depend ultimately on the nature of the group. If you have 3 "positive" relations ("Friend" "Coworker" "Kin"), you could imagine easily that the cohesion of the union network would be very different than the cohesion of any particular part, and this would contribute to multi-connectivity.
DRW: Ok, I take the point. This opens up to more interesting questions even if more complex. Do both, and more: (1) compute cohesion independently for each type of positive ties, and test the independent and weighted multiplicative effects of each independently; (2) compute cohesion for each combined pair of ties, threesome combinations, and all types of ties combined into one network. (3) If effects are found, then examine specific what kinds of cycles have effects: either homogeneous in their ties, heterogeneous, or in different cyclic orders. This is the topic of my 2004 article on "Ring Cohesion."
JM: I think the second point "Justification..." is right on; since the simplest thing to do is to treat them seperately.
  • Justification is needed to validate combining multiple types of tie into a single combined tie, then measure the cohesive blocks for the combined tie network, and then test the effects of combined-tie cohesion.

[edit] Membership in multiple overlapping cohesive subsets

  • Each node will have a "highest measure" for the most cohesive subset of which it is a measure.

[edit] Very large networks

  • Cohesive blocks are easily computed for relatively small, sparse networks. Computation is NP-Complete, meaning non-polynomial of an order so complex that if any NPC problem can be solved in polynomial time, then all NPC problems can be solved in polynomial time by a new but unknown procedure thought to be impossible.
  • Moody's spring embedding approximation method may be used as follows:
1. Run a spring embedding algorithm.
2. Divide the resultant scaling space into equal-size quadrants.
3. Count the density of nodes placed in each quadrant as a proxy for cohesive level. Monotonicity of this measure with k-cohesion levels has not been proven.
4. Density clines can be used to identify suspected high-cohesion subnetworks.
5. Such subnetworks can be extracted and run through the k-cohesion algorithm.

[edit] Speed-up

  • For large or dense graphs, it is useful to run the k-core algorthm that assigns an integer k to each node (see above).
  • The subgraph with nodes with k or higher k-cores can be extracted, and the k-cohesion algorithm run on that smaller subgraph.
  • The k-cohesion blocks in a network will always be subgraphs of the corresponding k-core.
  • Where this k-cores are too large or dense, James Moody also suggests:
  • 1. Identify the k-cores of the graph
  • 2. Then run a pair-wise node-connectivity on a random sub-set of pairs within each k-core, to see if you're likely to find node-connectivity less than the k-core level.
  • This is just an approximation, but will likely get you what you want if you're trying to test whether, as in a random graph, k-cores are highly likely to be k-components.

[edit] Cloud computing

On 25 May 2009 at 16:03, drwhite@uci.edu wrote to Jim Moody, Scott White, Peter McMahan:

>> We have an offer for free cloud computing that is worth looking into. Back in the olden days (1985-86) I parallelized my FORTRAN code for the UCSD supercomputer to do the Smith and White paper which had N**4 complexity. Parallelizing code for cohesive blocking might be a way to go given the complexity of the algorithm and the free cloud computing offer. If we can do this we can also do other things by way of ongoing big, complex computing tasks.

>From Jim: I do think there are parallel algorithms for node-connectivity -- we didn't employ them in the SAS code because it's not parallel equipped. But I think it would likely mean re-writing the code from scratch -- but that shouldn't be too hard to do from published papers on the problem. See: http://portal.acm.org/citation.cfm?id=237826

Heck, we could probably hire a talented duke comp sci undergrad to pull together the code... I've CC'd Peter Mucha, in applied math over at UNC, who's the go-to guy for making slow problems faster, he may have some insights on this....

PTs, Jim

[edit] k-Node connectivity

DRW: Let's break this down for cloud computing. The problem of finding all the k-node components is Wikipedia:NP-complete, but finding all the k-separators is #P-complete, meaning a polynomial-time counting reduction. So let's say we take the largest k-core subgraph of a network and find all the k-separators, i.e., sets of j=1-k nodes whose removal disconnects the subgraph. This can be done in computation time O(k2n2n + k3n1.5n). For each successive j the smallest set of separated nodes is cleaved off, leaving this j-node connectivity set for the highest j, along with other fragments. Following the Moody-White (2003) logic we classify the k-components.

[edit] References

Borgatti (2008) graphcohesion.pdf @ Group Cohesion & Shape MGT 780: Social Network Analysis.
Wasserman and Faust (1994). Social Network Analysis. Cambridge: Cambridge University Press.
White and Harary (2001)
White and Newman (2005)

[edit] Peter McMahan's brief example of use (interactively in R) of cohesive blocking

The following example is for an interactive R session, and should apply to any platform.

[edit] Installation and loading

First, if you haven't done so, install the 'igraph' package. You only need to do this if you haven't installed it already or if there's an update to the package (see igraph home page):

install.packages("igraph",dep=TRUE)

Now, you'll need to load igraph into your R environment:

library(igraph)

igraph comes with a very slightly outdated version of the algorithm. It should be updated soon, but until then either following command will load up the most recent version.

source("http://www.charting1968.net/CohesiveBlocks.R") #this does not run a dataset it is just the program
source("http://intersci.ss.uci.edu/wiki/pub/CohesiveBlocks.R") #this does not run a dataset it is just the program
Install R, then cut and paste into R as a command
source("http://intersci.ss.uci.edu/wiki/Vlado/DavisNet32.R") #this runs a dataset and gives results for the Davis, G&G Southern Women

[edit] Simple use example

Now we'll run through an example using the SanJuanSur.net dataset. First we need to load the Pajek-format file as an igraph object:

g <- read.graph(file="http://intersci.ss.uci.edu/wiki/Vlado/SanJuanSur.net", format="pajek")

The 'file' argument to read.graph can take a URL like above or a file on your machine. (A useful trick for local files is to use 'read.graph(file=file.choose(),format="pajek")' to bring up a filesystem browser).

The object 'g' is now an igraph object, so commands like

summary(g)                         ## basic info about the graph
plot(g,layout=layout.kamada.kawai) ## plot the graph using Kamada-Kawai layout algorithm
graph.density(g)                   ## edge density
hist(degree(g))                    ## plot a histogram of the degree distribution

work nicely.

To run the Moody-White cohesive blocking algorithm, run a command like

cb <- cohesive.blocks(g)

(This takes about 20 seconds on my machine). You can see a plot of the results with different layout algorithms:

plot(cb) ## uniform random layout
plot(cb,layout=layout.kamada.kawai)
plot(cb,layout=layout.fruchterman.reingold)

These plots show the network on the left and tree showing the cohesive block inclusion structure on the right. The vertex colors in the full graph plot represent the most cohesive block to which the vertices belong. This example has a simple hierarchy of three cohesive blocks, each a subgraph of the previous one.

If you simply want to see the cohesion of the deepest-nested subset of each vertex, use

max.cohesion(cb)
## if you are using the version of the function that comes
## with igraph, you may need to run the following command
## to 'export' max.cohesion before the above call will work:
##   max.cohesion <- igraph:::maxcohesion

Finally, if you want to export the results to Pajek, use one of the following commands:

write.pajek.bgraph(cb, filename="myBlocks")
write.pajek.bgraph(cb, filename="myBlocks", hierarchy=TRUE)

The first of these commands creates two files. 'myBlocks.net' is a straightforward Pajek network file, and should be very similar to the original 'SanJuanSur.net' we started with. 'myBlocks.clu', the second file, is a Pajek cluster file that groups the vertices based on the connectivity their maximally cohesive subgraph (same as vertex coloring in 'plot(cb)' above).

The second command, with 'hierarchy=TRUE', will output a further set of Pajek cluster files named like 'myBlocks_block2(1).clu', where the first number refers to the (arbitrary) block number and the number in parenthesis refers to that block's cohesion. Finally, the tree representing the hierarchy (like in the plot) is written to 'myBlocks_tree.net'.

[edit] Optional arguments

The main algorithm function, 'cohesive.blocks()', can take a number of optional arguments. These are explained in detail in the help file, which you can read by typing

?cohesive.blocks

Two of the more useful arguments are illustrated with this command:

cb <- cohesive.blocks(g, verbose=TRUE, cutsetHeuristic=FALSE)

The 'verbose' argument has the function print information about what it's doing as it runs. This is useful to see if the algorithm is hanging at a particular spot, or if it just has to iterate through a large number of subgraphs.

The 'cutsetHeuristic' turns off a specific heuristic used in the cutset-finding algorithm. This heuristic can speed up very large graphs (hundreds of vertices), but tends to slow down smaller graphs like the one used in this example. It's usually not a problem, but if you're having trouble definitely try switching this. Some graphs hang indefinitely with the heuristic on, but take just a few seconds with it off. (On my machine, turning the heuristic off on the example graph above decreases the run time from 20 to 16 seconds.)

Also, be sure to check out the help files for further assistance:

?cohesive.blocks
?plot.bgraph ## cohesive-block specific arguments
?plot.igraph ## general graph-plotting argumetns
?write.pajek.bgraph
?read.graph ## for getting graph from other types of data files

[edit] Working with Pajek graphs 4-27-09

When importing from Pajek .net files (as in the example above), igraph tries to import any edge and vertex attributes. These attributes are initially loaded into the imported graph, and should be carried through to the output of cohesive.blocks(). As an example, we can import the same graph and run the same cohesive.blocks() function as above:

g <- read.graph(file="http://intersci.ss.uci.edu/wiki/Vlado/SanJuanSur.net", format="pajek")
cb <- cohesive.blocks(g)

Now use the following igraph commands to see what vertex and edge attributes were imported (these commands work on g or on cb):

list.vertex.attributes(cb)
# [1] "id" "x"  "y"  "z" 
list.edge.attributes(cb)
# [1] "weight" "color" 

In igraph, vertex attributes are usually accessed through the V() function, and edge attributes through the E() function:

V(cb)$id
E(cb)$color

When plotting, igraph normally labels the vertices with their internal identifiers (integers starting at zero). If you'd rather use the 'id' attribute imported from Pajek, you simply need to define the 'label' vertex attribute:

V(cb)$label <- V(cb)$id
plot(cb)
## or, if the labels are hard to read
plot(cb,vertex.size=10)

Notice that the edge colors are already being used, because igraph understands the 'color' edge attribute as imported from Pajek.

If you want to use the imported Pajek layout, you'll need to create a layout matrix giving the x-y coordinates of each node. You can do this using the cbind function (short for column-bind) on the 'x' and 'y' vertex attributes:

pajLayout <- cbind( V(g)$x, V(g)$y)
plot(cb,layout=pajLayout)

Finally, as in the examples above you can use "cb$blocks" to get a list of the cohesive blocks. However this uses the igraph identifiers. To translate this to the pajek 'id' attribute, use the following (somewhat complicated) call:

lapply(cb$blocks,function(i){V(cb)[i]$id})

or for a fully verbose printout:

blockLists <- lapply(cb$blocks,function(i){V(cb)[i]$id})
for(i in seq_along(blockLists)){
 cat(cb$block.cohesion[[i]],"- cohesive:\n",blockLists[[i]],"\n\n")
}

[edit] Tests of Peter McMahan implementation of Moody-White algorithm

MAKE SURE, after igraph installs, that it is installed correctly and that it is opened as a library

install.packages("igraph",dep=TRUE)
library(igraph)

Restart R to clear workspace, cut and paste into R as a command SIMPLE THE FOLLOWING
source("http://intersci.ss.uci.edu/wiki/pub/MW.R")  #caution - under other conditions this does not work. Then to enlarge the nodes, try:
plot.bgraph(gBlocks,layout=layout.fruchterman.reingold, vertex.size = 25)
Copy http://intersci.ss.uci.edu/wiki/pub/mwExample1.net to your directory, then

Try another - takes longer and needed some changes - (Obsolete: dont use the Snow package from Cran that speeds up computationally demanding statistical procedures; although cohesive blocking may speed up significantly by using several connected computers in parallel.)

[edit] Limitations

Cohesive blocking is for sparse networks where blocks have relatively low levels of cohesion. For example, Jason Owen-Smith ran a network that doesn't fit cohesive blocking well that ran on the order of weeks and "STOPPED AT A 104 NODE CLUSTER THAT IS 67-COHESIVE" and gave an error message. So beware using this for medium to large highly cohesive graphs! Perhaps someone will do an approximation algorithm, and it may be the case that James Moody's SAS algorithm would run faster and complete the task for Jason's graph.

Hey Jason -

Happy to send along the SAS if you want; may be faster (may not too!).

Were it me, I might try:

  • a) Identify the k-cores of the graph
  • b) Then run a pair-wise node-connectivity on a random sub-set of pairs within each k-core, to see if you're likely to find node-connectivity less than the k-core level.

This is just an approximation, but will likely get you what you want.... (this suggestion copied to the introduction above).

PTs, Jim

On 14 May 2009 at 9:43, drwhite@uci.edu wrote: JIM - GOOD SUGGESTION - Pair-wise using Ucinet, yes? May I post?

Hey Doug -

Yeah, feel free to post. UCINET can do it, but also we can do it in SAS. The pairwise problem is actually much simpler than the full-connectivity problem -- just a lot less book keeping you have to do! But, the one trick is to make sure you only look *within* a given k- core, rather than subset the simple pairwise based on the global network (since pairwise paths could be outside the set if you don't select first). That is, k-core first, then connectivity; not pairwise connectivity then k-core.

PTs, Jim

[edit] Wayne Zachary karate club

[edit] Examples for Random graph and modified graph

structural cohesion in 20 node graph with 38 random edges
structural cohesion in 20 node graph with 38 random edges
structural cohesion in 20 node graph with 33 random edges and 3 nonrandom edges to show separation and overlap of 3-components
structural cohesion in 20 node graph with 33 random edges and 3 nonrandom edges to show separation and overlap of 3-components

[edit] San Juan Sur

SanJuanSur.net   *Branch 0:  1-cohesive; 75(74) vertices; 1 sub-branches; *Branch 1:  2-cohesive; 74(60) vertices; 2 sub-branches; *Branch 2:  2-cohesive; 6(0) vertices; 0 sub-branches; *Branch 3:  2-cohesive; 56(55) vertices; 1 sub-branches; *Branch 4:  3-cohesive; 55(0) vertices; 0 sub-branches;
SanJuanSur.net *Branch 0: 1-cohesive; 75(74) vertices; 1 sub-branches; *Branch 1: 2-cohesive; 74(60) vertices; 2 sub-branches; *Branch 2: 2-cohesive; 6(0) vertices; 0 sub-branches; *Branch 3: 2-cohesive; 56(55) vertices; 1 sub-branches; *Branch 4: 3-cohesive; 55(0) vertices; 0 sub-branches;

This community network has 75 nodes and an average density of 4 to 5. Takes less than a minute to run. Uses cutsetHeuristic=FALSE to avoid an algorithm error.

http://intersci.ss.uci.edu/wiki/Vlado/SanJuanSur.net
 source("http://intersci.ss.uci.edu/wiki/Vlado/MW_SanJuanSurNet.R") #works but takes several minutes
The chapter 3 San Juan network data from de Nooy, Batagelj, and Mrvar 2005 (Jim Moody drawing)
The chapter 3 San Juan network data from de Nooy, Batagelj, and Mrvar 2005 (Jim Moody drawing)

Vlado Batagelj extracted only the kinship visiting portion of the San Juan Sur network, which works with the default (cutsetHeuristic=FALSE)

 source("http://intersci.ss.uci.edu/wiki/Vlado/MW_SanJuanSur_kin.R") #works in under a minute

[edit] Davis net

Davis.net N=32 Moody-White Moody drawing
Davis.net N=32 Moody-White Moody drawing

Successful test on the Davis network, took 2 min 10 seconds, with the default.

http://intersci.ss.uci.edu/wiki/Vlado/Davis.net
 source("http://intersci.ss.uci.edu/wiki/Vlado/DavisNet.R") #works but not the right dataset
Davis.net N=32 Peter McMahan igraph vertex.size=20
Davis.net N=32 Peter McMahan igraph vertex.size=20

There is a difference between the Peter McMahan strict structural cohesion algorithm and the Moody-White cohesive embedding algorithm. M-W does a further refinement. This result needs to be checked out against http://intersci.ss.uci.edu/wiki/index.php/Image:DRWFig1.png. The data for the two studies however need to be the same. In Moody's data event 13 has 4 ties, in Ours, only 3. In Davis2mode, 4. It looks like the DRWFig1.png image is off but maybe there is an unseen link from event 13 through Nora to Helen.

Cohesion  Structural  Structural       Structural     Structural 
  Level    Cohesion    Embedding        Cohesion       Embedding
    4         26         23
    3          3          5            7,18,19       7,18,19, 30,31
    2          3          4            16,17,28      28,      15,16,17    
       N=32

For a comparison with a graph of the network see http://intersci.ss.uci.edu/wiki/index.php/Image:DRWFig1.png Events 1-14 in that figure are 19-32 above

    3          3          5              1              1,    12*,13
    2          3          4              10**          10**
12* at level 4 not 3?
10** at level 2 but degree 4
 1*** level 3

[edit] Interim problems for large networks

Adjusting SQLite, Vista, etc. here: Interim - will later be updated

[edit] NETWORK TESTING N=36 Davis net 3

 source("http://intersci.ss.uci.edu/wiki/Vlado/DavisNet3.R") #works but not the correct Davis dataset
Davis.net N-36 NETWORK TESTING NET (3) Peter McMahan igraph vertex.size=20
Davis.net N-36 NETWORK TESTING NET (3) Peter McMahan igraph vertex.size=20

[edit] Other tests

iGraph objects -- Peter McMahan is the author of this program, which implements Moody and White (2003) in R: Structural cohesion


[edit] Other functions in the algorithm

On Feb 26, 2009, at 5:19 PM, drwhite@uci.edu wrote:

> what command gives a list of such functions in cohesive.blocking?

There's no way to list them, but you can check out the source file at: http://www.charting1968.net/CohesiveBlocks.R The last bunch of functions (just after it says: "The following are specific utility functions for dealing with the bgraphs output by the cohesive blocking algorithm.") are about all there is. I think there are only two you might not already know about:

  plot.kCore.bgraph

acts like the regular plot, but uses k-cores instead of cohesive blocks. I've used it in the past to compare k-cores with cohesive blocks on specific networks -- to see if the computationally cheaper k- cores give significanlty different results. Similarly, there is

  write.pajek.kCore.bgraph

which is the same as write.pajek.bgraph, but it ignores the cohesive blocks and uses k-cores instead.

The only other thing you might want to check out is:

  help(package="igraph")

This will list all the igraph functions, and any of them that take a graph object will also take the output from cohesive.blocks.

Peter

On Feb 26, 2009, at 11:03 AM, drwhite@uci.edu wrote:
> How do I pull out the vector of cohesion values for each node, i.e.,
> the highest k-component for each in the network of N nodes?
> Doug White
Not obvious at all. There's a function:
  max.cohesion(cb)
that lists them. I'll add it to the example on the wiki.
Peter

[edit] Setup

getwd()
setwd("C:/Program Files/R//R-2.6.2/")
list.files()
library(igraph)
library(digest)
library(RSQLite)
#help(cohesive.blocks) #graph object of class igraph
example(cohesive.blocks) #creates gBlocks
plot(gBlocks) #
#plot(gBlocks, layout=layout.spring) #igraph Network spring embedding in R
plot.bgraph(gBlocks,layout=layout.kamada.kawai) #igraph 
plot.bgraph(gBlocks,layout=layout.fruchterman.reingold) #igraph
require(igraph)
require(digest)
require(RSQLite)
example(cohesive.blocks)
plot.bgraph(gBlocks,layout=layout.kamada.kawai)
write.pajek.bgraph(gBlocks,file="gBlocks")
require(igraph)
require(digest)
require(RSQLite)
g <- read.graph(file="mwExample1.net", format="pajek")
gBlocks <- cohesive.blocks(g)                           
plot.bgraph(gBlocks,layout=layout.kamada.kawai)
write.pajek.bgraph(gBlocks,file="gBlocks")
source("C:\\Program Files\\R\\R-2.6.2\\MW.R")
source("http://intersci.ss.uci.edu/wiki/pub/MW.R")

[edit] From Pajek to Cohesive Blocks

g <- read.graph(file="davis.net", format="pajek")
g <- read.graph(file="Garfa.net", format="pajek")
g <- read.graph(file="livorno.net", format="pajek")
g <- read.graph(file="TARO.NET", format="pajek")
g <- read.graph(file="methodscamp.net", format="pajek")
gBlocks <- cohesive.blocks(g)                           #igraph  help(cohesive.blocks)
plot.bgraph(gBlocks,layout=layout.fruchterman.reingold) #igraph  help(plot.bgraph)  Excellent layout
plot.bgraph(gBlocks,layout=layout.kamada.kawai)         #igraph  Not as good
plot(gBlocks, layout=layout.spring)                     #not well separated -- shown as a clump
plot(g, layout=layout.spring) #just the original graph  
Network spring embedding in R

[edit] Mark Handcock R programs to get from Netdata (Network package) to Cohesive Blocks

Netdata60
#The problem is to get cohesive.blocks working inside the network package, reading netdata for example.
#library(igraph, warn.conflicts = FALSE, quietly=TRUE)
#
#library(digest)
#library(RSQLite)
#
#library(network)
#library(netdata)
#library(RBGL)
#tte packages need to be installed but will be activated by cutting and pasting these two lines
source("http://intersci.ss.uci.edu/wiki/pub/CohesiveBlocks.network.R")
source("http://intersci.ss.uci.edu/wiki/pub/CohesiveBlocksExamples.R")
mwExample1.clu                          
mwExample1.net                          
mwExample2.clu                          
mwExample2.net 
#Dont use these
http://www.csde.washington.edu/~handcock/CohesiveBlocks.network.R
http://www.csde.washington.edu/~handcock/CohesiveBlocksExamples.R

MW example

The key function is network2igraph (see the code). The plotting in igraph does not work for me. --Mark Handcock

[edit] Carter Butts' proposal for data conversion

There's not yet a routine to convert network objects into igraph objects. However, one quick and dirty approach is to convert the network into a matrix, and then build an igraph object from that. Here are two ways to put a network object into adjacency matrix form:

as.sociomatrix(mynet)  #Converts to sociomatrix
mynet[,]               #This works, too.

as.sociomatrix has some extra options, and the [,] operator allows for fancy overloading, but both work similarly for simple data. You can also convert the object into an edge list if the network is extremely large -- check out the as.matrix.network help pages for more options.

Hope that helps! --Carter

[edit] Peter McMahan's proposal for data conversion

Is the edgelist an object from the "network" package in R? if so then it's probably best to convert to an adjacency matrix then to an igraph object. Starting with "network" object G

require(igraph);require(network)
g <- graph.adjacency(as.matrix.network(G))

should give you an igraph graph object g. This will not preserve vertex attributes.

If the edgelist is a text file that looks like what you sent, then it will take some complicated text parsing to do the trick.

Check out igraph's file import capabilities:

?read.graph

the ability to read a number of existing file formats is built in to that function. -- Peter

[edit] Update from Peter for Mark

DRW: Some enhanced collaboration here!) `Oh, great. It looks like the only change Mark Handcock made was to add a function "network2igraph", which does what I described but looks like it does a good job of keeping vertex attributes intact.

It also looks like he's using an older version of the cohesive.blocks() code, so that may explain the strange saving behavior. The code in the attached file should implement his conversion without modifying the original function. This way it should keep up with any improvements to the cohesive.blocks() funtion that come along with new versions of igraph (such as a significant speed imporovement upcoming in v.0.5). To use it just source the file ( source("CB.networkCompatibility.R") ) and it defines a new function cohesive.blocks.network() that takes network objects rather than igraph objects. So, with network object G:

source('CB.networkCompatibility.R') ## if not done already - ELSE source('http://intersci.ss.uci.edu/wiki/pub/CB.networkCompatibility.R)
cb <- cohesive.blocks.network(G)
plot(cb,layout=layout.kamada.kawai)
write.pajek.bgraph(H,"G",T)

The output object is an igraph object still, so write.pajek.bgraph whould work fine. -- Peter 13:33, 26 February 2008 (PST)

[edit] Vertex cohesion

g <- erdos.renyi.game(50, 5/50)
g <- as.directed(g)
g <- subgraph(g, subcomponent(g, 1))
graph.cohesion(g)
#gBlocks <- cohesive.blocks(g)                           #igraph  help(cohesive.blocks)
plot.bgraph(g,layout=layout.fruchterman.reingold) #igraph  help(plot.bgraph)  Excellent layout


g <- CEOs  #not igraph
gg=g[,1]
cohesive.blocks(gg)
  1. HighTech.paj - multirelational network with 21 vertices and 190+102+20 arcs
  2. Padgett.paj - multirelational network with 16 vertices and 30+40 arcs
  3. EIES.paj - multirelational network with 48/32 vertices and 695+830/460 arcs
  4. Trade.paj - multirelational network with 24 vertices and 307+307+310+135+369 arcs
  5. CEOs.net - two-mode network with 41 vertices and 98 edges.

==Comment from Jim Moody to Pajek listserve]] Subject: Re: [Pajek] Structural cohesion

Hey Folks - happy to see a discussion of k-connectivity on this list!

But, just to be clear, UCINET calculates pairwise node connectivity, not structural cohesion per se. Every k-component is pairwise k-connected, but not every pairwise-k-connected set is a k-component. The key is that there can be node-independent paths connecting two nodes, but these paths may not be in the component. Consider this graph:

a -- b -- c
a -- d -- c
a -- e -- c

nodes a and c are 3-connected, but there is no 3-component in the network. So if you just pulled out sets that share the maximum k-point-connectivity, you'd put {a c} together, but that would be an error.

In practice, I'm betting it'll be a very good approximation, particularly if you first extract the set then re-run to ensure that all paths are within the set.

Best, Jim

[edit] Forget this stuff for now

test.3 <- read.paj("Garfa.net")   #Import/Export igraph network
plot(test.3,main=test.3$gal$title)
plot(test.3,layout=layout.fruchterman.reingold)
plot(test.3,layout=layout.kamada.kawai)
g <- read.graph("test.3", format="edgelist")
cohesive.blocks(test.3)

Netdata60

  1. read.graph("filename.net","pajek") #(igraph) read.graph(file, format = c("edgelist", "pajek", "ncol", "lgl","graphml", "dimacs", "graphdb", "gml"), ...)

R plot networks

[edit] Links

[edit] Ramiro Berardo

Cut and paste the contents of the following

  1. twomodeFunding176.net.R central node of connected 176, only bicomponents
  2. twomode175NOcenter.R delete central node leaving disconnected 175, large components
  3. twomode136_NO_centerComponent.R delete central node giant component, 136 nodes with three connected bicomponents
  • Because there are no tricomponents or higher levels of cohesion, all this can be done in Pajek (Image is for 3 above)
  • In these 2-mode matrices, rhe rows are organizations and the columns are projects. So a 1 (arrow, arc) means that the organization provided assistance to secure funding for the project in the column.
Personal tools