MCSN Tuesday, 22-Nov-11

From CCE wiki archived
Jump to: navigation, search


  • Today: course evaluations (11:00 to 11:20), then review:
    • 5.7 (if questions remain)
    • chapter 6
    • results from last Thursday's self-guided session (how did it go?)
    • Note: I've reserved office hour time for you tomorrow, 1:45 - 2:30 pm

5.7 exercises

Chapter 6: centrality in undirected networks

Note that in discussing centrality we generally assume an undirected network; chapter 9 centers on the same issues for directed networks (prestige)

  • Observation: the "energize" command enables visualization (under assumption that connecting lines of equal value are of equal length), but doesn't measure centrality precisely. Like other visualization techniques, it uses the power of visual cognition to suggest patterns for exploration, but doesn't measure anything precisely...
  • Metric: a precise technique for measurement. There are a number of ways one might define centrality. (Can you suggest others, beyond what the chapter discusses?)
  • Basic questions:
    • centrality (egocentric): how central is a particular vertex?
      • Intuitively: a vertex is central if it's well-connected.
      • Maximum centrality: center of a star network
    • centralization (sociocentric): how centralized is the network as a whole?
      • Intuitively: a network is centralized if a well-defined segment is central, and the rest is not.
      • Maximum centralization: a start network
  • ESNAP offers 3 metrics (measures) for vertex centrality and network centralization:
    • Degree: local view, as message sender/receiver
    • Closeness: global view, as message sender/receiver
    • Betweenness: global view, as message interceptor
  • All three metrics on a network of size N require a similar sequence of computations:
    • Raw score
      • degree of a vertex
      • reciprocal of total geodesic distance from vertex (=1/tgd)
      • # geodesics passing through a vertex (ng)
    • Normalized score
      • degree/(N-1)
      • (1/tgd) /(1/(N-1)) = (N-1)/tgd
      • ng/(N*(N-1)/2)
    • identify vertex with maximum centrality
    • sum differences from this maximum (variation from maximum, or variation)
    • maximum possible variation on network of this size (one vertex at max, others close to zero) = variation for star network
    • centralization = variation divided by maximum variation

Degree centrality

  • Intuition: Let's start with the local view. Those who are well-connected get information faster, and send it out faster. They're more central. So, let's measure centrality simply based on number of neighbors.
  • (This is a naive intuition, since I might have lots of neighbors who are not well-connected, but it's a starting point, and it's easy to compute with limited information, e.g. ego network approach)
  • In a simple network of N vertices, number of neighbors = degree.
  • Star formation provides maximum centrality
  • Centralization:
    • first compute degree variation as follows:
      • find the most central vertex, and use its centrality as a comparison point, C
      • compute the difference between C and ever other vertex's centrality
      • add these differences together
    • then divide by the maximum degree variation on N vertices (Note: this will be the variation on a star network, with one vertex at the center, and (N-1) around the edge; the former has degree (N-1) and the latter all have degree 1. So the max is (N-1)*[(N-1)-1] = (N-1)*(N-2). Thus for N=5, the maximum variation is 12.
  • Pajek command: Net>partitions>degree>all.

Closeness centrality

  • Intuition: Let's now adopt a more global view. Instead of looking at how well-connected a vertex is at one hop (neighbors), let's look at how well-connected it is to the entire network. How many other vertices can I reach in (a) one hop? (b) two hops? (c) three hops? Etc. In other words, we want to check on how long the paths are connecting our selected vertex to everyone else. Those with short paths are closer to the network as a whole - they can send and receive messages more quickly: they're more central.
  • Geodesic: shortest path between 2 vertices
  • Distance from vertex A to B (call it d(A,B)) is the length of the geodesic from A to B.
  • Note that d(A,B) = d(B,A) for undirected networks (symmetry), since all lines are bidirectional.
  • In a star network, the center is maximally closeness-central - it's at one hop from everyone. So we compute closeness centrality of any vertex by comparing to the value for such a star-center.
  • More precisely: for each vertex in a network of size N
    • compute the sum of the distances to every other vertex (note that the metric fails if the network contains more than one component)
    • divide this value into the number of other vertices (N-1), which represents the sum for the star-center (meaning the normalized value is at most 1)
  • Compute centralization as before:
    • compute degree variation
      • find the most central vertex, and use its centrality as a comparison point, C
      • compute the difference between C and ever other vertex's centrality
      • add all these differences
    • divide by the star network degree variation
  • Note that this technique can only work if the network is connected! (one component)
  • Pajek command: Net>vector>centrality>closeness>all

Betweenness centrality

  • Intuition: a vertex is central if lots of geodesics pass through it (the vertex can intercept messages)
  • Define betweenness centrality of a vertex as the percentage of geodesics between other pairs of vertices that pass through it.
  • Betweenness centralization is the variation in this centrality, divided by the maximum possible variation.
  • Pajek command: Net>vector>centrality