MCSN Thursday, 20-Oct-11
Preliminaries
- Quiz
- Thursday, 27-Oct-11 Submit project proposal: to analyze a social network using public online data, comprising five short paragraphs and a bibliography (to be resubmitted until accepted). The whole thing is under 2 pages. Please submit via the Moodle.
- Identify: Identify a phenomenon from MCSN. Discuss the general aim and value of your study, framed as a contribution to ethnomusicology.
- Contextualize: Provide background (topical and theoretical), citing relevant literature
- Query: List a few research questions inquiring about this phenomenon. What do you want to understand? Provide tentative hypotheses if you have any.
- Model: Theorize your questions using SNA to model the phenomenon. How can you frame the question in terms of social networks? Cite relevant literature, if appropriate.
- Method: What sort of procedure can you propose for gathering data, analyzing it (with Pajek), and answering the questions? Cite relevant literature, if appropriate.
- Bibliography (can be short at this initial stage), including web sites. You may find that INSNA contains useful sources, but your bibliography can include musical as well as SNA materials.
Sentiments and friendship (chapter 4)
- Line attributes: We already know that lines can have values, and that values may be interpreted in various ways, either more "partition-like" or more "vector-like", i.e....
- qualitatively (kind of relation, e.g. 1st or 2nd choice, friend or kin visits...), in which case negative and fractional values probably don't make much sense, or
- quantitatively: strength of a relation (degree of friendship, amount of interaction), in which case negative and fractional values may make some sense; the former may indicate a negative interaction or emotion.
- How can we generalize our search for cohesive subgroups given line values that vary from negative to positive?
- If lines values represent strength of relation, they may help us to detect cohesive subgroups.
- Ideally those subgroups should be tightly interconnected with (high value) positive lines, and clearly separated by (high value) negative lines, so that it's neither the case that (a) a single subgroup could be divided, or (b) two subgroups could be elided.
- In SNA terms: is it possible to detect cohesive subgroups within which strong negative relations are minimized, and between which strong negative relations are maximized? Detecting such subgroups is the goal of signed graphs and balance theory.
- Intuition about social stress:
- If I like A, I tend to identify with A, so may tend to (want to) share A's likes
- A likes B, so I like B
- A dislikes C, so I dislike C
- If I dislike A, I may tend to (want to) avoid A's likes
- A likes B, so I dislike B
- A dislikes C, so I like C
- If I can't do this, I feel stressed....and I want to avoid stress
- If I like A, I tend to identify with A, so may tend to (want to) share A's likes
- Balance theory
- Social psychology of 1940s (Fritz Heider)
- a person feels uncomfortable disagreeing with another person about something, when that other person is a friend.
- if this happens, the person may adjust the situation, either
- agreeing with the friend
- convincing self that they don't really disagree, after all
- deciding that the other person in fact isn't a friend
- Note that Heider considered everything from the point of view of one person (attribution), so edges suffice
- Cartwright and Harary later developed mathematical theory of signed graphs and balance for social networks.
- Here edges don't suffice, since affect need not be reciprocated
- In particular A might like B, but B doesn't like A (creating a cycle with a single negative arc - very stressful!)
- Viewing from the point of view of a single actor A, the cycle will tend to go in one direction, except the final arc (from A to the endpoint), which goes in the opposite direction.
- Generalization of the mathematical "transitive" relation: if A r B, and B r C, then A r C (e.g. r could be "=", "equals" or ">", "greater than")
- Sentiments, friendship, and transitivity, given the "low stress" requirement
- Transitive relation: if A r B, and B r C, then A r C
- Consider a triangle including possible positive relation p, negative relation, n:
- if A p B, and B p C, then A p C
- if A p B, and B n C, then A n C
- if A n B, and B p C, then A n C
- if A n B, and B n C, then A p C
- In other words, the "outcome" relation A to C is the product of the two intermediary relations, A to B, and B to C, where "p" is a positive, and "n" is a negative (1 x 1 = 1, 1 x -1 = -1, -1 x -1 = 1).
- With a longer "chain" of relations (called a semicycle) (A1, A2, A3, A4,....An, A1) by dividing into triangles we can see that the final relation (from A1 to An) is the product of the previous ones, meaning the final relation is negative if preceded by an odd number of negatives, and positive if preceded by an even number of negatives up to and including A(n-1).
- This means that the number of negatives is always even, if we are to avoid stress.
- Balance theory definitions:
- Signed graph: a graph in which each line carries a positive or negative sign (e.g. 1 or -1) or value.
- (semi)cycle: closed (semi)path (starts and ends on the same vertex)
- Balanced (semi)cycle: contains even number of negative arcs
- Balanced signed graph: contains only balanced semicycles
- Balance theory theorem: A signed graph is balanced iff it can be partitioned into two clusters such that all positive ties are contained within the clusters and all negative ties are situated between the clusters. Sketch of proof:
- If a signed graph can be so partitioned, it's easy to see that it must be balanced
- If a signed graph is balanced, we can begin to sketch the construction of the two clusters...
- Thought questions:
- Does this "two cluster" result account for social dualism, i.e. tendency of groups to divide into two?
- But in reality we know that groups often divide into more than two.
- Perhaps this is because each subgroup subsequently subdivides (when we recalibrate line values) - we live in a binary but hierarchical world?
- Or perhaps the balance requirement must be too strict...(we don't live in a binary world)
- We relax it with the "clusterable" requirement
- Clusterability theory definitions:
- Unclusterable (semi)cycle: contains exactly one negative arc
- Clusterable (semi)cycle: isn't unclusterable (can contain 2 or more negative arcs). (Thus balance implies clusterability, but not the reverse. Balanced (semi)cycle contains an even number of negative arcs, clusterable semicycle contains 0 or more than 1.)
- Clusterable signed graph: contains only clusterable semicycles
- Clusterability theorem: A signed graph is clusterable iff it can be partitioned into clusters such that all positive ties are contained within clusters and all negative ties are situated between clusters.
- If a signed graph can be so partitioned, it's easy to see that it must be clusterable.
- If a signed graph is clusterable, we can begin to sketch the construction of the clusters...but this is a bit more complex than the balanced case.
- Thought question: how does all this apply to musical taste? In a network of people and genres, do we expect balance? Clusterability?
- Optimization procedures: finding clusters
- Optimization is typically a heuristic algorithm that searches for a local optimum, i.e. attempting to make an error value as small as possible.
- Typically sensitive to starting point, "initial conditions"
- Creeps around from the starting point, in the direction of improvement (reduced error)
- May not find a solution far from the starting poing (which is why one might randomize the initial condition).
- Analogous to finding a minimum in calculus through incremental steps: one may find a local rather than global minimum.
- Two optimization techniques for detecting clusters in signed graphs:
- Energize, after setting "options->value of lines->similarities". In this way negative lines will be made longer, positive lines shorter, which is what we want. Clustering may appear as a result. Note that energizing is an optimization problem, so every time you run it, you may get slightly different results. This approach depends on visualization...from here you may need to create a partition by hand to represent the clusters.
- Use the Balance command:
- Create a random partition: Partition->Create random partition. These are your initial conditions. You may like to begin with 2 or 3, and check how well the algorithm proceeds.
- Run Operations->Balance. Specify error assigned to forbidden intracluster negative arc (which determines the error for forbidden intercluster positive arc).
- Optimal solutions are saved as partitions.