MCSN Tuesday, 18-Oct-11
Notes
- When completing multiple choice questions (e.g. 3.8), be sure to add your explanation for your answer, in your own words. Don't just write down the letter response, and don't copy the book's text. Convince me that you've thought about the problem and that you understand why a particular answer is correct. This is more important than giving me the correct answers.
- Please check that you've submitted assignments via the Moodle on the right day, otherwise I may not find them. Web examples should also be added on days when they're due, so we can easily retrieve them.
- Please remember that your assignments will be downgraded a quarter point per day late, so do submit on time!
- Quiz on Thursday: mainly on chapter 3, resembling the kinds of questions presented in 3.8 (or 2.8).
- Due today: 3.9 and your compositions
Cohesive subgroups: (end of chapter 3)
- Cliques and complete subnetworks
- Degree of matching between two partitions, and statistical tests
- This is very important for your work, so be sure you understand the procedure and significance of the test results (though you need not understand the underlying statistical theory)
- Each partition defines a variable on every vertex (e.g. "family-friendship group", "component")
- Cross-tabulations indicate degree of variable independence (like a scatterplot).
- Cramer V (phi) (related to the chi-squared test for independence): measures the extent to which a cross-tabulation deviates from what one would expect if the two variables were independent.
- Rajski: informational predictive measure: to what extent does knowing the cluster value in one partition allow us to predict a cluster value in another partition? (C1->C2, C2->C1, C1<->C2)
- Try this for some random partitions.
- Flow chart, p. 78
- Assignments 3.8, 3.9
Sentiments and friendship (chapter 4)
- Line attributes: We already know that lines can have values, and that values may be interpreted in various ways, either...
- qualitatively (kind of relation, e.g. 1st or 2nd choice, friend or kin visits...), in which case negative values probably don't make much sense, or
- quantitatively: strength of a relation (degree of friendship, amount of interaction), in which case negative values may indicate a negative interaction or emotion.
- How can we generalize our search for cohesive subgroups given line values that vary from negative to positive?
- If lines values represent strength of relation, they may help us to detect cohesive subgroups.
- Ideally those subgroups should be tightly interconnected with positive lines, and clearly separated by negative lines, so that it's neither the case that (a) a single subgroup could be divided, or (b) two subgroups could be elided.
- In SNA terms: is it possible to detect cohesive subgroups within which negative relations are minimized, and between which negative relations are maximized? Detecting such subgroups is the goal of signed graphs and balance theory.
- Intuition about social stress:
- If I like A, I tend to identify with A, so may tend to (want to) share A's likes, to avoid stress
- A likes B, so I like B
- A dislikes C, so I dislike C
- If I dislike A, I may tend to (want to) avoid A's likes
- A likes B, so I dislike B
- A dislikes C, so I like C
- If I like A, I tend to identify with A, so may tend to (want to) share A's likes, to avoid stress
An Ewe proverb from West Africa says:
Ame deka medzea xorlor futor eve o, "one person does not make friends with two enemies...
This proverbial wisdom leads to...
- Balance theory
- Social psychology of 1940s (Fritze Heider)
- a person feels uncomfortable disagreeing with another person who is a friend about something.
- if this happens, the person may adjust the situation, either
- agreeing with the friend
- deciding that the other person in fact isn't a friend
- convincing self that they don't really disagree, after all
- Note that Heider considered everything from the point of view of one person (attribution), so edges suffice
- Cartwright and Harary later developed mathematical theory of signed graphs and balance for social networks.
- Here edges don't suffice, since affect need not be reciprocated
- In particular A might like B, but B doesn't like A (creating a cycle with a single negative arc - very stressful!)
- Viewing from the point of view of a single actor A, the cycle will tend to go in one direction, except the final arc (from A to the endpoint), which goes in the opposite direction.
- Generalization of the "transitive" relation: if A r B, and B r C, then A r C
- Sentiments, friendship, and transitivity
- Transitive relation: if A r B, and B r C, then A r C
- Consider a triangle including possible positive relation p, negative relation, n:
- if A p B, and B p C, then A p C
- if A p B, and B n C, then A n C
- if A n B, and B p C, then A n C
- if A n B, and B n C, then A p C
- In other words, the "outcome" relation A to C is the product of the two intermediary relations, A to B, and B to C.
- With a longer "chain" of relations (called a cycle) (A1, A2, A3, A4,....An, A1) by dividing into triangles we can see that the final relation (from A1 to An) is the product of the previous ones, meaning it is negative with an odd number of negatives, and positive with an even number of negatives up to and including A(n-1).
- This means that the number of negatives is always even, if we are to avoid stress.
- Balance theory definitions:
- Signed graph: a graph in which each line carries a positive or negative sign (e.g. 1 or -1) or value.
- (semi)cycle: closed (semi)path (starts and ends on the same vertex)
- Balanced (semi)cycle: contains even number of negative arcs
- Balanced signed graph: contains only balanced semicycles
- Balance theory theorem: A signed graph is balanced iff it can be partitioned into two clusters such that all positive ties are contained within the clusters and all negative ties are situated between the clusters. Sketch of proof:
- If a signed graph can be so partitioned, it's easy to see that it must be balanced
- If a signed graph is balanced, we can begin to sketch the construction of the two clusters...
- Thought questions:
- Does this "two cluster" result account for social dualism, i.e. tendency of groups to divide into two?
- But in reality we know that groups often divide into more than two.
- Perhaps this is because each subgroup subsequently subdivides (when we recalibrate line values)?
- Or perhaps the balance requirement must be too strict...
- We relax it with the "clusterable" requirement
- Clusterability theory definitions:
- Unclusterable (semi)cycle: contains exactly one negative arc
- Clusterable (semi)cycle: isn't unclusterable (can contain 2 or more negative arcs)
- Clusterable signed graph: contains only clusterable semicycles
- Clusterability theorem: A signed graph is clusterable iff it can be partitioned into clusters such that all positive ties are contained within clusters and all negative ties are situated between clusters.
- If a signed graph can be so partitioned, it's easy to see that it must be clusterable.
- If a signed graph is clusterable, we can begin to sketch the construction of the clusters...but this is a bit more complex than the balanced case.
- Thought question: how does all this apply to musical taste? In a network of people and genres, do we expect balance? Clusterability?
- Optimization procedures: finding clusters
- Optimization is typically a heuristic algorithm that searches for a local optimum, i.e. attempting to make an error value as small as possible.
- Typically sensitive to starting point, "initial conditions"
- Creeps around from the starting point, in the direction of improvement (reduced error)
- May not find a solution far from the starting poing (which is why one might randomize the initial condition).
- Analogous to finding a minimum in calculus through incremental steps: one may find a local rather than global minimum.
- Two optimization techniques for detecting clusters in signed graphs:
- Energize, after setting "options->value of lines->similarities". In this way negative lines will be made longer, positive lines shorter, which is what we want. Clustering may appear as a result. Note that energizing is an optimization problem, so every time you run it, you may get slightly different results. This approach depends on visualization...from here you may need to create a partition by hand to represent the clusters.
- Use the Balance command:
- Create a random partition: Partition->Create random partition. These are your initial conditions. You may like to begin with 2 or 3, and check how well the algorithm proceeds.
- Run Operations->Balance. Specify error assigned to forbidden intracluster negative arc (which determines the error for forbidden intercluster positive arc).
- Optimal solutions are saved as partitions.