MCSN Thursday, 24-Nov-11

From CCE wiki archived
Revision as of 12:18, 26 November 2011 by Michaelf (talk | contribs) (Variable relations)

Download:

  • Fig.59.net
  • Korea.paj
  • SanJuanSur2.paj
  • Galesburg2.paj


Announcements

  • Quiz (chapters 4,5,6,9): to be distributed on Friday November 25 by email; due on Tuesday Nov 29 11 am. The quiz is open book, but you are on your honor not to work together!
  • Presentations:
    • Tuesday Nov 29: Lauren, Nadine, Geoff, Raimundo
    • Tuesday Dec 6: Chee Meng, Jill, Chris, Allison, Alyssa
    • Who didn't sign up?
    • Please time your presentation - you have 10 minutes to explain your results. We'll add 2-3 minutes for discussion. You may like to take notes so that you can incorporate questions, critiques, and suggestions into your final paper.
  • Final concert: when can we perform? You have a final rehearsal next Thursday Dec 1 (I will be at a conference for Middle East studies, in Washington DC)

Centrality in undirected networks (chapter 6)

[1]

Prestige in directed networks (chapter 9)

Definitions, basics

  • Prestige can be defined in two different ways:
    • Structural prestige. As a function of social network structure. Positive choices are generally a sign of prestige, and being chosen by a prestigious vertex is generally a sign of greater prestige. SNA uses these ideas as a means of defining prestige.
    • Social prestige. Other measures of prestige simply depend on combining what many different people think of a particular vertex (representing a person or association), without any explicit reference to social networks. This social prestige thus defines an attribute variable that could be represented in Pajek by a partition or vector.
  • As usual, the task of SNA lies in determining the relation between network and attribute properties - here between structural and social prestige.
    • Is it possible that social prestige in fact derives from structural prestige? (choices result in prestige)
    • Or perhaps structural prestige derives from social prestige? (prestige results in choices)
    • Either is likely, but the relation may not be simple - it's a topic for exploratory SNA!
  • Popularity of a vertex is its indegree (analogous to degree centrality).
    • Note that this procedure presumes no multiple arcs (simple network)
    • Multiple arcs and line values suggest a different operation
  • Note that the choice relation must be "positive" (e.g. "likes")
    • if negative ("dislikes"), prestige may be interpreted as a kind of "infamy"
    • if reversible ("lends money to") use outdegree instead (or transpose the network matrix, reversing all arc directions)
  • Pajek techniques:
    • Net>partitions>degree>input to generate input degree partition and normalized input degree vector
    • Net>transform>transpose to transpose network
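
The popularity and transpose operations above can be sketched outside Pajek. This is a minimal illustration, not Pajek's implementation: the function names and the tiny "likes" network are hypothetical, multiple arcs are collapsed so that the network is simple, and indegree is normalized by N-1 on the assumption that there are no loops.

```python
from collections import Counter

def indegree(arcs, vertices):
    """Return {vertex: number of distinct vertices choosing it}."""
    simple = set(arcs)  # collapse multiple arcs: each chooser counts once
    counts = Counter(target for _, target in simple)
    return {v: counts.get(v, 0) for v in vertices}

def normalized_indegree(arcs, vertices):
    """Indegree divided by N-1 (assumption: simple network without loops)."""
    deg = indegree(arcs, vertices)
    return {v: deg[v] / (len(vertices) - 1) for v in vertices}

def transpose(arcs):
    """Reverse every arc, so indegree of the result is the original outdegree."""
    return [(target, source) for source, target in arcs]

# Hypothetical "likes" network; the arc a->b appears twice on purpose.
vertices = ["a", "b", "c", "d"]
arcs = [("a", "b"), ("c", "b"), ("d", "b"), ("b", "a"), ("a", "b")]

popularity = indegree(arcs, vertices)
outdegree = indegree(transpose(arcs), vertices)
```

Here b is chosen by all three other vertices, so its normalized indegree reaches the maximum of 1. For a reversible relation such as "lends money to", the `outdegree` result is the one to read as popularity.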

Variable relations

  • Relating two variables: two variables are related (or associated) if their values don't occur independently of one another: the value of one variable biases the possible values of the other (for instance: "height" and "weight" are certainly not independent, whereas "height" and "musical preference" probably are). Statistical tests are used to measure the significance of these relations or associations. The kind of test used depends on the kind of variable.
    • Variable types:
      • Nominal (categorical): variable values cannot be ordered; are fundamentally non-numeric (e.g. red, blue, green)
      • Ordinal or rank: variable values are ordered, without implication of distance (e.g. first, second, third)
      • Interval: variable values occur on a number line, but the zero point is arbitrary (e.g. temperature Celsius) so the ratio between two values is meaningless (20 degrees isn't twice as hot as 10 degrees)
      • Ratio: variable values on number line with a significant zero point, so that ratios are meaningful (e.g. weight: 20 kg is really twice as heavy as 10 kg)
    • Pajek and variable representations
      • Partitions: Nominal and ordinal
      • Vectors: Interval and ratio
    • Different statistical tests are required to detect relations between different kinds of variables
      • Comparing two nominal variables: Chi-squared and cross tables - use Partitions>info>Cramer's V, Rajski
      • Comparing two rank variables: Spearman rank correlation - use Partitions>info>Spearman's rank
      • Comparing two interval or ratio variables: Pearson correlation - use Vectors>info
      • Spearman's takes only rank into account, and works best when few cases share the same rank. Pearson's measures the degree of linear association between two variables and is more precise for interval or ratio data, but it may miss a relation that is monotonic without being linear. Generally, if Pearson's suggests a relation, so will Spearman's. The reverse doesn't hold: Spearman's may suggest a relation when Pearson's doesn't.
  • San Juan Sur data
    • Network of visits, partitions identifying (a) social prestige rank and (b) prestige leaders (families 23, 39, 47, 61, 66).
    • Relation between input degree and prestige leadership: draw partition-vector ... no clear relation?
    • Relation between indegree and social status groupings
      • Run Spearman rank test on two partitions: social status and indegree
      • Run Pearson's test on two vectors: social status (converted to vector) and indegree
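
To see how the two correlation tests above can disagree, here is a small pure-Python sketch. It is an illustration under stated assumptions, not what Pajek runs internally: the function names are hypothetical, the data are made up (not the San Juan Sur values), and tied values receive the mean of their ranks.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation: degree of linear association between x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Made-up data: the second variable rises with the first, but not linearly.
indeg = [0, 1, 2, 3, 4, 5]
score = [d ** 3 for d in indeg]

rho = spearman(indeg, score)  # perfect monotonic relation
r = pearson(indeg, score)     # noticeably below 1: relation is not linear
```

Because `score` always increases with `indeg`, the rank correlation is perfect, while Pearson's coefficient is pulled below 1 by the curvature.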

Domains

  • Domains
    • Popularity is defined in terms of direct choices only (indegree)
    • Analogous to closeness centrality, we may wish to weight choices by popular vertices higher.
    • Input domain of a vertex V: the set of vertices connected by a path to V (extended popularity - if an arc represents choice (?)) [The book also defines input domain as: "the percentage of vertices connected by a path to V" which is a bit confusing. Context will suggest which definition the authors intend.]
    • Restricted input domain: the set of vertices connected by a path to V within a fixed maximum path length (e.g. restricted to 2 hops). This is useful if all input domains are otherwise the same.
    • Output domain of a vertex V: the set of vertices connected by a path from V (extended influence - if an arc represents communication (?)). As with the input domain, its size may be expressed as a percentage of all other vertices.
    • Overall domain of a vertex V: the union of input and output domains.
  • Pajek:
    • Net>k-neighbors>input generates a partition indicating distance from the selected vertex (the value 9999998 is used for vertices outside the input domain)
    • Net>partitions>domain>input generates a partition and two vectors:
      • partition: number of vertices in input domain for each vertex
      • vector 1: size of input domain as a proportion of all vertices minus the vertex itself (i.e. normalized to N-1)
      • vector 2: average distance to vertices in the input domain (measure of closeness). Vertices with no input domain are assigned 999998 (infinity).
  • Note that a vertex V's strong component is always part of its domain, but the domain can be bigger than the strong component (since it includes vertices with a path to V, but for which there may be no return path)
  • San Juan Sur data
    • Prestige leaders all have large input domains, except for f61. Is this significant?
    • Test the relation by using a statistical test (which one would you select?)
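
The input domain, its restricted variant, and the three quantities reported for it (size, proportion of N-1, average distance) can be sketched as a breadth-first search that follows arcs backwards. This is a minimal sketch with hypothetical function names and a made-up network, and it returns `None` for the average distance of an empty domain rather than Pajek's large placeholder number.

```python
from collections import deque

def input_domain(arcs, v):
    """Return {u: distance from u to v} for every vertex u with a path to v."""
    preds = {}
    for source, target in arcs:
        preds.setdefault(target, set()).add(source)
    dist = {v: 0}
    queue = deque([v])
    while queue:  # breadth-first search along reversed arcs
        cur = queue.popleft()
        for p in preds.get(cur, ()):
            if p not in dist:
                dist[p] = dist[cur] + 1
                queue.append(p)
    del dist[v]  # the vertex itself is not part of its own domain
    return dist

def domain_summary(arcs, vertices, v, max_dist=None):
    """Return (size, proportion of the N-1 other vertices, average distance)."""
    dist = input_domain(arcs, v)
    if max_dist is not None:  # restricted input domain
        dist = {u: d for u, d in dist.items() if d <= max_dist}
    size = len(dist)
    proportion = size / (len(vertices) - 1)
    average = sum(dist.values()) / size if size else None
    return size, proportion, average

# Made-up network: a->b->c and d->c; e is isolated.
vertices = ["a", "b", "c", "d", "e"]
arcs = [("a", "b"), ("b", "c"), ("d", "c")]

full = domain_summary(arcs, vertices, "c")
restricted = domain_summary(arcs, vertices, "c", max_dist=1)
```

The output domain follows from the same function applied to the transposed arc list.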

Proximity prestige

  • We'd like to differentiate large input domains to V where vertices are connected by long paths to V, from those where vertices are connected by short paths to V. The latter indicates more prestige for V.
  • One approach: use restricted domains. But the choice of maximum path length is arbitrary. (In the case of path length=1, we have indegree or popularity prestige)
  • Let's try to weight vertices in the input domain so that those which are close count more than those which are far away.
  • For a particular vertex V, we do this by dividing the normalized size of V's input domain (as a proportion of all other vertices) by the average distance to V.
    • Fixing average distance, a bigger input domain implies greater prestige.
    • Fixing input domain size, a lower average distance implies greater prestige.
    • Maximal prestige: when all other vertices (maximal input domain) directly choose V (minimal average distance of 1), i.e. a star network. Proximity prestige is then 1.
  • Proximity prestige: proportion of vertices in the input domain divided by the average distance from the vertices in the input domain to V.
    • This is what we want!
  • Pajek: assign the two vectors and divide using Vectors>divide first by second
    • Note: there are a number of arithmetic operations available that you may find useful under "Vectors"
  • San Juan Sur data
    • Compute proximity prestige
    • Convert resulting vector to partition using Vector>Make partition>by intervals>first threshold and step (use 0.01 for both in this case). NOTE: BE CAREFUL HOW YOU CONVERT VECTORS TO PARTITIONS. You want to capture the densities in the data. If you make the intervals too small, every partition cluster will contain just 0 or 1 values. You want intervals that are wide enough to capture multiple values, yet not so wide that they capture too many. The same principles apply for the command Info>vector. Examine your vector before setting parameters for its conversion to a partition, and examine the resulting partition afterwards to be sure you don't wind up with a 0/1 partition.
    • Is proximity prestige related to prestige leaders? Use Cramer's V/Rajski test and examine crosstabs
    • Is proximity prestige related to social status? Use Spearman's rank test.
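
The proximity prestige calculation described above (proportion of vertices in the input domain divided by the average distance to V) can be sketched as follows. This is an illustrative sketch, not Pajek's code: the function name and the two toy networks are hypothetical, loops are ignored, every arc endpoint is assumed to be listed in `vertices`, and a vertex with an empty input domain is assigned a prestige of 0 by assumption.

```python
from collections import deque

def proximity_prestige(arcs, vertices):
    """Return {v: (input domain size / (N-1)) / average distance to v}."""
    preds = {v: set() for v in vertices}
    for source, target in arcs:
        if source != target:  # ignore loops
            preds[target].add(source)
    n = len(vertices)
    prestige = {}
    for v in vertices:
        dist = {v: 0}
        queue = deque([v])
        while queue:  # breadth-first search along reversed arcs
            cur = queue.popleft()
            for p in preds[cur]:
                if p not in dist:
                    dist[p] = dist[cur] + 1
                    queue.append(p)
        reached = [d for u, d in dist.items() if u != v]
        if not reached:
            prestige[v] = 0.0  # assumption: empty input domain means no prestige
            continue
        proportion = len(reached) / (n - 1)
        average = sum(reached) / len(reached)
        prestige[v] = proportion / average
    return prestige

# Star: everyone chooses the hub directly, the maximal case.
star = proximity_prestige(
    [("a", "hub"), ("b", "hub"), ("c", "hub")], ["hub", "a", "b", "c"]
)

# Chain a->b->c->d: d has the whole network in its domain, but at longer distances.
chain = proximity_prestige(
    [("a", "b"), ("b", "c"), ("c", "d")], ["a", "b", "c", "d"]
)
```

In the star the hub scores 1, while in the chain vertex d also has every other vertex in its input domain but scores only 0.5, because the average distance to it is 2.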