MCSN Tuesday, 8-Nov-11

From CCE wiki archived
Revision as of 11:28, 8 November 2011 by Michaelf (talk | contribs) (Applications: creating and manipulating two mode networks)

Schedule

  • office hrs Wed Nov 9 from 2:30 to 3:15 (I can stay until just before 4. No office hours next week however...)
  • no class Thursday Nov 10 (Remembrance Day)
  • short class Tuesday Nov 15 (new drafts of proposals due; intro to chapter 6 and network game)
  • self-guided class Thurs Nov 17
  • course evaluation on Tuesday Nov 22
  • quiz on Thursday November 24 (to cover material up to that point)
  • presentations on Tuesday November 29 (presentations: 10 minutes each)
  • self-guided class Dec 1 (polish your music compositions for possible performance)
  • class on Dec 6 (last class - more presentations)

Project discussions

For those who didn't talk about their projects last time and would like class feedback, suggestions, or discussion (but briefly, so we can wrap up chapter 5 today).

Chapter 5: Affiliation networks

Concepts

Basic ideas

  • People affiliate to groups (often defined by space, like the University of Alberta), and events (typically defined by space-time, like this class session), whether by choice or circumstance.
  • Such affiliations define bipartite networks comprising two kinds of vertex, which we can call actors and events (don't be confused - events could be more like groups).
  • In a bipartite network there are two kinds of vertex, type A and type B. All lines connect a type A vertex to a type B vertex - there are no direct connections between vertices of type A, nor are there direct connections between vertices of type B.
  • A bipartite network is also called "two-mode", since there are two kinds of vertex, and is represented by a rectangular matrix rather than a square one (see this in Excel).
  • Affiliation networks are bipartite (or two-mode), but the reverse doesn't hold (e.g. a network of links between keys and locks is bipartite, but not an affiliation network in the usual sense). An affiliation network is a special interpretation of a bipartite network.
  • Affiliations define overlapping social circles (the term comes from sociologist Georg Simmel).
  • Network representation of identity as a model for social belonging:
    • Culture model (common in traditional ethnomusicology, and applicable to small-scale societies): each individual belongs to one "complex whole", as Tylor put it in 1871. This "complex whole" might be identified as the sum of many social circles, but they overlap heavily: everyone in a small community belongs to many or most of them (work circle, religious circle, etc.), with divisions primarily on the basis of age and gender.
    • Identity model (more common in sociology and contemporary ethnomusicology, and more applicable to large-scale societies): individuality is the sum total of multiple "simple parts", each person summing them in a slightly different way. These "parts" can be viewed as social circles whose intersection is the individual.
    • Note: social identity can't be captured in a single Pajek partition. Why? A partition assigns each vertex to exactly one class, so the concept of partition is closer to the traditional "culture" model of exclusive, all-encompassing identities.
  • Social circles may also imply power circles with critical implications for relationships among "events" (groups). Example: Interlocking directorates
  • Degree of a vertex indicates the scope of the corresponding social circle:
    • Degree of an event (group): size of the event (group)
    • Degree of an actor: rate of participation of the actor
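The two readings of degree can be illustrated with a minimal Python sketch. The actor and event names below are invented for illustration and do not come from the course data.

```python
from collections import Counter

# Hypothetical affiliation data: one (actor, event) pair per line of the network.
affiliations = [
    ("Ann", "Choir"),
    ("Ann", "Chess club"),
    ("Bob", "Choir"),
    ("Cat", "Choir"),
    ("Cat", "Chess club"),
    ("Cat", "Band"),
]

# Degree of an actor: number of events the actor takes part in (rate of participation).
actor_degree = Counter(actor for actor, _ in affiliations)

# Degree of an event: number of actors affiliated to it (size of the event).
event_degree = Counter(event for _, event in affiliations)

print(actor_degree)
print(event_degree)
```

Here "Cat" has the highest rate of participation and "Choir" is the largest event.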

Typical assumptions about affiliation networks

  • The book states these as facts (see p. 101), but you should critique them in theory and test them in your projects!
  1. Affiliations are institutional or structural - less personal than friendships or sentiments. [What do you think? How could we test this?]
  2. "Although membership lists do not tell us exactly which people interact, communicate, and like each other, we may assume that there is a fair chance that they will." [what factors might impact the chances of actual dyadic interaction?]
  3. Actors at the intersection of multiple social circles...
    1. tend to interact even more
    2. become bridges enabling indirect communication/control between the circles as a whole.
  4. "Joint membership in a social circle often comes with similarities in other social domains." (i.e. the homophily principle: "birds of a feather flock together". Is this a cause of common affiliation, or an effect? Understanding the difference might require different methods: (a) temporal network analysis, or (b) qualitative (interview, observation) analysis.)

Representations via matrix or edge list

Matrices are useful conceptual tools, but Pajek relies on lists of edges.

  • Matrices represent one or two mode networks very naturally
  • One-mode networks are naturally represented using
    • upper triangular matrix, no diagonal (undirected simple)
    • upper triangular matrix (undirected with loops)
    • square matrix (directed with loops)
  • Two-mode networks are naturally represented using rectangular matrices
    • Rows represent first mode (e.g. actors)
    • Columns represent second mode (e.g. events)
  • Deriving one-mode network from two-mode network.
    • Mapping the "hidden networks" implied by two-mode network (under assumptions above) can be highly significant
    • One-mode network derived from rows (e.g. actors)
    • One-mode network derived from columns (e.g. events)
  • One can also represent two-mode networks with lists of edges
    • Actors and events must be clearly differentiated
    • Simply listing edges may violate condition that actors can't link to actors, or events to events
    • Thus it's necessary to provide a simple means of identifying which vertices are rows (or, conversely, which vertices are columns)
    • If we number the vertices, it's easiest to separate rows and columns by assuming that the first N vertices are rows (and so the rest are columns). This is the approach taken in Pajek...
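As a rough illustration of the numbering convention just described, the sketch below converts a small rectangular 0/1 matrix into an edge list in which rows are numbered first and columns afterwards, and checks that no line joins two rows or two columns. The function names and the sample matrix are assumptions made for this example, not part of Pajek.

```python
def matrix_to_edges(matrix):
    """Convert a rectangular 0/1 affiliation matrix into a two-mode edge list.

    Rows become vertices 1..n_rows; columns become vertices
    n_rows+1..n_rows+n_cols (the "first N vertices are rows" convention).
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    edges = [
        (r + 1, n_rows + c + 1)
        for r in range(n_rows)
        for c in range(n_cols)
        if matrix[r][c]
    ]
    return n_rows, n_cols, edges


def is_two_mode(edges, n_rows):
    """True if every edge joins a row vertex (<= n_rows) to a column vertex (> n_rows)."""
    return all((u <= n_rows) != (v <= n_rows) for u, v in edges)


# Hypothetical data: 3 actors (rows) by 3 events (columns).
matrix = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
]
n_rows, n_cols, edges = matrix_to_edges(matrix)
print(edges)
print(is_two_mode(edges, n_rows))
```

Because the split point N is the only extra information needed, the same edge list can be read back as a two-mode network without storing a separate type for each vertex.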

Applications: creating and manipulating two mode networks

  • Two-mode network in Pajek
    • The *Vertices line is followed by two numbers: (a) the total number of vertices; (b) the number of rows, i.e. vertices of the first mode (whether actors or events)
    • When Pajek sees two numbers instead of one, it generates a matching affiliation partition that assigns each vertex to class 1 (rows) or class 2 (columns).
  • Using txt2pajek to generate a sample two-mode network
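To make the two-number vertex line concrete, here is a small Python sketch that writes a two-mode network as Pajek-style text and builds the matching affiliation partition. It is a standalone illustration: it does not reproduce txt2pajek or its input format, and the helper names and labels are hypothetical.

```python
def to_pajek_two_mode(row_labels, col_labels, edges):
    """Return the text of a two-mode network file in Pajek's .net layout.

    The first line carries two numbers: the total number of vertices and
    the number of rows (first-mode vertices).
    """
    n_rows = len(row_labels)
    labels = list(row_labels) + list(col_labels)
    lines = [f"*Vertices {len(labels)} {n_rows}"]
    lines += [f'{i} "{label}"' for i, label in enumerate(labels, start=1)]
    lines.append("*Edges")
    lines += [f"{u} {v}" for u, v in edges]
    return "\n".join(lines) + "\n"


def affiliation_partition(n_vertices, n_rows):
    """Class 1 for rows (the first n_rows vertices), class 2 for columns."""
    return [1 if v <= n_rows else 2 for v in range(1, n_vertices + 1)]


# Hypothetical data: three actors (rows) and two events (columns).
text = to_pajek_two_mode(
    ["Ann", "Bob", "Cat"],
    ["Choir", "Band"],
    [(1, 4), (2, 4), (3, 4), (3, 5)],
)
partition = affiliation_partition(5, 3)
print(text)
print(partition)
```

The partition here follows the class numbering given in these notes (1 for rows, 2 for columns).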

Corporate interlocks in Scotland, 1904-5

  • Early 20th century: joint stock companies began to form
    • owned by shareholders
    • represented by boards of directors
  • Interlocking directorates linked the companies (and companies linked the directors)
  • Data: 136 multiple directors for 108 largest joint stock companies, of various types:
    • non-financial firms (64)
    • banks (8)
    • insurance companies (14)
    • investment and property companies (22)
  • Partition: indicates industry type
  1. oil & mining
  2. railway
  3. engineering & steel
  4. electricity & chemicals
  5. domestic products
  6. banks
  7. insurance
  8. investment
  • Vector: indicates total capital in 1,000 pounds sterling

Analyzing Scotland.paj

  • Info->network
    • Number of vertices
    • Number of lines
  • Affiliation partition separates firms and directors (examine)
  • Drawing and energizing. Note bipartite property.
  • Degree partition (sizes of events and rates of participation); can be displayed as vertex size (convert to vector)
  • Components

Deriving one mode nets from affiliation nets

  • Derived networks: Each two-mode network induces two one-mode networks: (a) by events (groups), (b) by actors, as follows:
    • By events (groups): events are linked by one line per shared actor
    • By actors: actors are linked by one line per shared event (group)
    • Note: loops represent size of events, participation rates of actors:
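      • in the network of events, each event (group) shares each of its actors with itself, so it gets one loop per participating actor: the number of loops equals the size of the event
      • in the network of actors, each actor shares each of its events (groups) with itself, so it gets one loop per event attended: the number of loops equals the actor's rate of participation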
  • Derived networks are typically not simple, but one can replace multiple lines by a single line with value = number of lines replaced. This value is called line multiplicity and the resulting network is called a valued network.
  • IMPORTANT:
    • Every event (group) will define a clique among actors. Conversely, every actor will define a clique among events (groups). There will be lots and lots of overlapping cliques, and not all of them are significant.
    • What is significant is lines of higher value, indicating multiple shared affiliations, so we concentrate on these prior to any further analysis (whether components or k-cores or cliques).
    • However note that a clique of actors in which every line is of multiplicity two does not mean that all the actors participate together in two events (groups). All it means is that every pair does!
  • m-slices
    • m-slice is derived by deleting all lines of multiplicity less than m, and then deleting all isolated vertices. Detect isolated vertices by creating a degree partition, then deleting the zero cluster.
    • vertices of the m-slice are precisely those that are attached to at least one line of multiplicity m or more.
    • m-slices are therefore nested (like k-cores): a vertex in the 3-slice (m=3) is automatically in the 2-slice (m=2), and a line in the 3-slice is automatically in the 2-slice.
      • 1-slice contains the 2-slice
      • 2-slice contains the 3-slice
      • 3-slice contains the 4-slice
      • etc.
      • The slices can thus be represented with contour lines (as in cartography)
    • Note: m-slices need not be connected - so after finding an m-slice, run component analysis to find its "pieces"
  • valued core
    • useful when we want to derive a partition from values on incident lines
    • first we decide if we want to derive this value from the maximum or sum of incident lines
    • then we sort each vertex's maximum or sum into one of several "bins", each representing a range of values
    • the bin numbers then form our vertex partition
  • valued core for m-slices
    • note that the highest m-slice to which a vertex belongs is simply the highest incident line multiplicity
    • so we can use valued core on the "max" setting, with bins defined as one per multiplicity (this is the default)
    • vertices are now partitioned by m-slice, but in order to find a particular m-slice we need to both (a) extract clusters, and (b) delete lines with value under m.
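The ideas in this section (derived one-mode network, line multiplicity, m-slices, and the "max" valued core) can be sketched in plain Python. This is an illustrative sketch under assumed data, not a description of how Pajek implements these commands; the event and actor names and the function names are invented.

```python
from collections import Counter
from itertools import combinations


def project(memberships):
    """Derive the one-mode network of actors from event -> set-of-actors data.

    Returns a Counter mapping each actor pair to its line multiplicity,
    i.e. the number of events the two actors share (loops are omitted).
    """
    multiplicity = Counter()
    for actors in memberships.values():
        for a, b in combinations(sorted(actors), 2):
            multiplicity[(a, b)] += 1
    return multiplicity


def m_slice(multiplicity, m):
    """Keep lines of multiplicity >= m and only the vertices incident with them."""
    lines = {pair: w for pair, w in multiplicity.items() if w >= m}
    vertices = {v for pair in lines for v in pair}
    return vertices, lines


def highest_slice(multiplicity, vertices):
    """Partition vertices by their highest incident line multiplicity (0 if isolated)."""
    best = {v: 0 for v in vertices}
    for (a, b), w in multiplicity.items():
        best[a] = max(best[a], w)
        best[b] = max(best[b], w)
    return best


# Hypothetical affiliation data.
memberships = {
    "E1": {"A", "B", "C"},
    "E2": {"A", "B"},
    "E3": {"C", "D"},
    "E4": {"A", "B", "D"},
    "E5": {"E"},
}
multiplicity = project(memberships)
all_actors = set().union(*memberships.values())
print(multiplicity)
print(m_slice(multiplicity, 2))
print(highest_slice(multiplicity, all_actors))
```

In this toy data only the line between A and B has multiplicity above 1, so the 2-slice and the 3-slice both reduce to that single pair, which also shows the nesting property.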

Analyzing Scotland.paj

  • We can convert Scotland.net into a one-mode network of firms (settings: no loops, no multiple lines).
    • Lines between firms now represent the number of shared directors.
    • View line values (info->network->line values)
    • Energize with line values = similarities
  • Let's add degree information from the original network
    • create a degree partition for the full 2-mode network
    • extract a degree partition for firms by setting the degree partition as #1, and using the affiliation partition as #2
    • Derive the 2-slice directly
      • delete lines of multiplicity under 2
      • run degree partition
      • delete class 0
      • examine
      • construct components and reexamine
  • Run valued core and examine results with "info" and "draw"
    • Derive the 2-slice again from valued core result
    • Use valued core partition to derive Industrial Categories partition and Capital vector.
    • Examine resulting sociogram
  • 3D views
    • Note: SVG is no longer supported; VRML may work on PCs
    • Two techniques will work:
      • Energize in 2D, then apply Layers command in Draw menu
        • Draw network of firms, with valued core partition selected
        • Select Layers>Type of layout>3D
        • Run Layers>In Z direction
        • Energize in 3D and rotate or spin
      • Energize with Fruchterman Reingold 3D
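The "derive the 2-slice directly" steps listed earlier in this section (delete lines of multiplicity under 2, run a degree partition, delete class 0, construct components) can be mirrored outside Pajek with a short Python sketch. The valued lines below are made up for illustration and are not taken from Scotland.paj; the function name is hypothetical.

```python
from collections import defaultdict


def slice_components(valued_lines, n_vertices, m=2):
    """Follow the manual m-slice recipe and return the components of the result."""
    # Step 1: delete lines of multiplicity under m.
    kept = [(u, v) for u, v, w in valued_lines if w >= m]

    # Step 2: degree partition of the reduced network.
    degree = {v: 0 for v in range(1, n_vertices + 1)}
    neighbours = defaultdict(set)
    for u, v in kept:
        degree[u] += 1
        degree[v] += 1
        neighbours[u].add(v)
        neighbours[v].add(u)

    # Step 3: delete class 0 (isolated vertices).
    remaining = [v for v in degree if degree[v] > 0]

    # Step 4: construct components with a depth-first search.
    seen = set()
    components = []
    for start in remaining:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            x = stack.pop()
            component.append(x)
            for y in neighbours[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        components.append(sorted(component))
    return components


# Hypothetical one-mode network of six firms: (firm, firm, shared directors).
valued_lines = [(1, 2, 2), (2, 3, 3), (3, 4, 1), (4, 5, 2), (5, 6, 1)]
components = slice_components(valued_lines, n_vertices=6, m=2)
print(components)
```

In this example the 2-slice falls into two pieces, which is why a component analysis is worth running after extracting a slice.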