MCSN Thursday, 13-Oct-11
1) Pajek data structures and tabular data
- Think of all vectors and partitions pertaining to a particular network as columns in a table or spreadsheet. The rows represent vertices, and each column contains properties for each vertex, either integer (partition) or real (vector) values.
- In Pajek we don't enter this table as a single data structure, but rather each column is treated separately. But you can create your dataset as a table, then cut and paste individual columns into separate files to create your partitions and vectors.
- When we extract a subnetwork based on a partition, we also need to extract from the other partitions and vectors. Hence: Partitions->extract second from first, and Vector->extract subvector commands. These are equivalent to selecting rows using conditions on one table column (for instance, selecting all rows for which that column=1).
2) Your projects should not involve research with human subjects. Data will typically come from the web, though you could use other sources (e.g. a database, newspapers, etc.). But networks need not be purely "virtual" - for instance, consider the relation between performers, songs, festivals, and concert attendees.
- Find dense subnetworks.
- 100% dense subnetwork = complete network.
- But we can relax this condition and search for, say, 75% dense subnetworks.
- See example 6node.net
- Density of the whole?
- Identify 100% dense subnets (complete subnets)
- Identify 75% dense subnets
- Identify 50% dense subnets
- Find maximal dense subnetworks
- 100%: "clique". Identify cliques in 6node.net
- 70%: 'relaxed' clique, based on density.
- Find high degree subnetworks
- Find subnetworks containing minimal degree nodes. Problem: not necessarily connected! (e.g. hubs)
- k-core: require minimum degree within the subnet! The k-core is a maximal subnetwork in which each vertex's degree is at least k.
- how to find them? Pajek does this for you. But it's instructive to come up with an algorithm: Start with a node of minimal degree, add neighbors of minimal degree, and check if they can be admitted (perhaps by checking their neighbors). Then maximize by checking which other vertices can be added, and iterate for those new vertices. let's do this for 6node.net
- e.g. 0-core: all vertices
- e.g. 1-core: any vertex with at least one incident line, e.g. all vertices in this case.
- e.g. 2-core: start with E, add A, F; maximize by adding D, C, B
- e.g. 3-core: start with A, add C, F, D
- Note two important facts:
- For each k there is only ever one k-core (by definition).
- However, the k-core is not necessarily connected (we should subdivide by components)
- Each core contains a subnetwork, which is typically smaller than the whole
- The 1-core contains the 2-core, the 2-core contains the 3-core, etc. because the degree condition is "minimum". Generally: the k-core contains the k+1-core. We say the cores are nested.
- Therefore vertices can belong to multiple cores. When we want to create a partition, we assign each vertex to its highest core.
- Attiro: symmetrize, then run Net->partitions->core->all, then draw-partition. Then Operations->extract from network to examine only higher cores.
- Your web examples?
Now, back to cliques...
- A clique is a maximal complete subnetwork of order 3 or more.
- Cliques will tend to overlap, and the connected components of such overlapping cliques are considered social circles.
- Cliques are network "bones"; structure of overlapping cliques is network "skeleton".
- Problem: Clique detection algorithm is slow. Therefore we instead search for complete subnetworks (without worrying about whether they're maximal).
- Triads: complete subnetworks of order 3 (triangles).
- Pajek: set a model subnetwork as "First network", and network to analyze as "Second network", then Fragment (1 in 2), all in Nets menu. T