Pajek help

From CCE wiki archived
Revision as of 10:36, 11 November 2011 by Michaelf (talk | contribs)

quick link: http://bit.ly/pajekhelp

Installing Pajek on your computer

See these instructions.

Pajek data files

Pajek reads and saves network data by using special files with extensions such as "net", "clu", "vec", and "paj". The "net" file stores a network; "clu" and "vec" store cluster (integer) and vector (real) arrays, assigning one value per node. A "paj" file can combine other files, saving the time of reading them all in separately. You may create networks from within Pajek, in which case Pajek will create these files for you when you save. Or you may wish to create the files yourself. But be careful: these files must be pure text, without the additional formatting information that word processors like Word usually add, or they won't work.

Creating Pajek files in Pajek

Pajek is capable of creating and editing networks, which can then be saved in the form of Pajek files. The easiest way to create a network from Pajek is to use the command:

Net->Random Network->Total number of arcs

Tell Pajek how many vertices you want, and then ask it to create zero arcs. The result is a graph with no arcs. Draw the graph, and you can add the arcs yourself. Or you can edit the graph in Pajek's main screen under "Network".


Editing Pajek files

Pajek files are pure text, and easily readable or writable. The trick is to keep them pure text.

Do not use tab characters! Formatting using spaces is much safer!!

On Windows, create data files using a simple text editor that saves text only, such as Notepad (WordPad also works, provided you choose the plain-text format when saving).

For Mac, follow these instructions:

  • Create a folder on your Desktop called "AAA Pajek Data", with as many subfolders as you like. You can create an alias to this folder and keep it anywhere else, but having this folder on the Desktop will save time, because the Mac version of Pajek requires that you navigate from the top level of your directory hierarchy.
  • I suggest that you create your Pajek files using Word. Be sure to turn off "Smart Quotes" (so that quotation marks appear "straight"); in MS Word 2008 for Mac this option appears under menus Tools/AutoCorrect/AutoFormat as You Type (other versions may place it elsewhere). Be sure to enter a carriage return ("return") after every line, and leave no blank lines. Save your files using output format "Text only with line breaks", or "Plain text" (with option "MS DOS"). This will force the file into the required plain-text PC format (where each line break consists of a carriage return followed by a newline, unlike Unix, which uses a newline only). Word wants to add the ".txt" extension to such files; delete that and use the required file extension instead. Other text editors may also be able to create readable Pajek files: try and see what works.
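If you have Python available, the pitfalls listed above (tabs, smart quotes, blank lines, a missing final line break) can be checked automatically before you hand a file to Pajek. The following is only a sketch: the function name `check_pajek_text` is made up for this page, and the checks cover just the rules mentioned here, not everything Pajek might reject.

```python
def check_pajek_text(text):
    """Return a list of problems found in the raw text of a Pajek file."""
    problems = []
    if "\t" in text:
        problems.append("contains tab characters")
    # Curly quotes are what Word inserts when "Smart Quotes" is on.
    if any(ch in text for ch in "\u201c\u201d\u2018\u2019"):
        problems.append("contains smart (curly) quotes")
    if not text.endswith("\n"):
        problems.append("no line break after the last line")
    lines = text.replace("\r\n", "\n").split("\n")
    # Drop the empty string produced by the final line break itself.
    if lines and lines[-1] == "":
        lines.pop()
    if any(line.strip() == "" for line in lines):
        problems.append("contains blank lines")
    return problems
```

To check a file on disk, read it with `open(path, newline="")` so that Python does not silently change the line breaks, then pass the text to `check_pajek_text`. An empty list means none of these particular problems were found.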

Renaming file extensions

For either Mac or Windows, saving as pure text inevitably adds a ".txt" file extension. You'll need to change this extension to what Pajek wants (e.g. from "txt" to "net", "clu", "vec", or "paj", as needed), or it won't read the files. This is simply a matter of editing the file name. The problem is that operating systems like to hide extensions from you (thinking you don't want to see them!). Here's what to do:

For Windows: from the Start menu, go to Control Panel, select Folder Options, select the View Tab, and then uncheck Hide extensions for known file types.

see http://maximumpcguides.com/windows-7/show-file-extensions/

On the Mac, go to Finder, select Preferences, Advanced, and check Show all file extensions.

http://www.fileinfo.com/help/mac_show_extensions

Once you can see the file extensions, you can change them by renaming the file (usually a right or control-click will do it). Delete txt and insert the correct Pajek extension instead.
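If you have many files to rename, a short script can do the same job regardless of whether the operating system shows extensions. This is a sketch only; the function name `to_pajek_extension` is hypothetical, and it assumes the file is not open in another program.

```python
from pathlib import Path

PAJEK_EXTENSIONS = {".net", ".clu", ".vec", ".paj"}

def to_pajek_extension(path, extension):
    """Rename e.g. friends.txt to friends.net and return the new path."""
    if extension not in PAJEK_EXTENSIONS:
        raise ValueError(f"not a Pajek extension: {extension}")
    source = Path(path)
    if source.suffix == ".txt" and Path(source.stem).suffix == extension:
        # The editor turned "friends.net" into "friends.net.txt":
        # just strip the trailing ".txt".
        target = source.with_suffix("")
    else:
        target = source.with_suffix(extension)
    source.rename(target)
    return target
```

For example, `to_pajek_extension("friends.txt", ".net")` renames the file to `friends.net` in the same folder.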

Pajek file formats

Pajek uses primarily three file types: .net, .clu, .vec (network, partition, vector). The .paj type allows you to combine multiple networks, partitions, and vectors in one file. You'll generally generate .paj files using Pajek. However, you may create .net, .clu, and .vec files yourself using spreadsheets and word or text processors.

Network files

Network files (extension ".net") define a network, as in the following:


*Vertices 3
1 "Doc1" 0.0 0.0 0.0 ic Green bc Brown
2 "Doc2" 0.0 0.0 0.0 ic Green bc Brown
3 "Doc3" 0.0 0.0 0.0 ic Green bc Brown
*Arcs
1 2 3 c Green
2 3 5 c Black
*Edges
1 3 4 c Green


This example defines 3 vertices (Doc1, Doc2 and Doc3) denoted by numbers 1, 2 and 3. The (fill) color of these nodes is Green and the border color is Brown. The initial layout location of the nodes is (0,0,0). Note that the (x,y,z) values can be changed interactively after drawing.

There are two arcs (directed edges). The first goes from node 1 (Doc1) to node 2 (Doc2) with a weight of 3, in color Green. The second goes from node 2 (Doc2) to node 3 (Doc3) with a weight of 5, in color Black.

There is one edge (undirected), between node 1 (Doc1) and node 3 (Doc3), with a weight of 4, colored Green.

Note that you must include a carriage return/new line after the last line of a Pajek file!
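A network file like the one above can also be generated by a script, which avoids stray tabs and guarantees the final line break. The sketch below is an illustration, not an official tool: the function names are made up, and it assumes that the coordinates and color settings can simply be left out of the file. It writes PC-style line breaks (carriage return plus newline), as described earlier on this page.

```python
def format_net(labels, arcs=(), edges=()):
    """Build the text of a .net file.

    labels: vertex labels, in vertex-number order (vertex 1 first).
    arcs, edges: (from_vertex, to_vertex, weight) triples using vertex numbers.
    """
    lines = [f"*Vertices {len(labels)}"]
    for number, label in enumerate(labels, start=1):
        lines.append(f'{number} "{label}"')
    if arcs:
        lines.append("*Arcs")
        lines += [f"{s} {t} {w}" for s, t, w in arcs]
    if edges:
        lines.append("*Edges")
        lines += [f"{s} {t} {w}" for s, t, w in edges]
    # Every line, including the last, ends with a PC-style line break.
    return "\r\n".join(lines) + "\r\n"

def write_net(path, labels, arcs=(), edges=()):
    # newline="" stops Python from altering the line breaks we wrote;
    # encoding="ascii" (my choice) raises an error on curly quotes and the like.
    with open(path, "w", newline="", encoding="ascii") as handle:
        handle.write(format_net(labels, arcs, edges))
```

Calling `write_net("docs.net", ["Doc1", "Doc2", "Doc3"], arcs=[(1, 2, 3), (2, 3, 5)], edges=[(1, 3, 4)])` produces the example network above, minus coordinates and colors.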

Partition files

Partition files (extension ".clu") divide the vertices into classes, called partitions. These partitions cannot overlap, and yet must cover all the vertices. In other words, every vertex is assigned to one and only one partition. Here's an example:


*Vertices 3
4
8
8


In this example, the three vertices are assigned to two different partitions - vertex #1 in partition #4, and the other two in partition #8 (note that the vertex numbers do not appear, but are simply inferred by ordering).

Vector files

Vector files (extension ".vec") assign a number to every vertex.

Here's an example:


*Vertices 3
0.35
8.9
100.0


In this example, the first vertex is assigned a value of .35, the second 8.9, and the third 100.0. Again, the vertex numbers do not appear, but are simply inferred by ordering.
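Because partition and vector files share the same simple layout (a `*Vertices` line followed by one value per vertex), one pair of helper functions can read and write both. This is a sketch with made-up function names; it handles only the plain layout shown in the examples on this page.

```python
def format_values(values):
    """Build the text of a .clu or .vec file from a list of values."""
    lines = [f"*Vertices {len(values)}"] + [str(v) for v in values]
    return "\r\n".join(lines) + "\r\n"

def parse_values(text, kind=float):
    """Read values back; use kind=int for .clu and kind=float for .vec."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    if header[0].lower() != "*vertices":
        raise ValueError("first line must start with *Vertices")
    count = int(header[1])
    values = [kind(item) for item in lines[1:]]
    if len(values) != count:
        raise ValueError(f"header says {count} vertices, found {len(values)} values")
    return values
```

The position in the list plays the role of the vertex number: the first value belongs to vertex 1, the second to vertex 2, and so on.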

What's the difference between vec and clu?

Numerically, vector values are continuous (real numbers), while partition values are discrete (integers). More importantly, they have different interpretations. A vector assigns a real value to each vertex - these values are not expected to repeat, or define classes. A partition divides the vertices into classes, most typically with more than one vertex per class.

For instance, if people are vertices, a vector might define their weights or heights. We don't expect these values to repeat exactly. A partition might divide them into classes based on educational level, assigning the value 0 for those who haven't finished elementary school, 1 for those who have completed only elementary school, 2 for high school, and 3 for college.

But Pajek can convert partitions to vectors (Partition->Make Vector) easily enough (integer values are also real numbers). Slightly more complex is converting a vector to a partition (Vector->Make Partition) since real values must first be "rounded" to integers. Ensuring that these conversions make sense is up to the user!
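To see why one direction is harmless and the other loses information, here is the idea in a few lines of Python. This is an illustration only, not Pajek's own procedure: the rounding rule (round half up) is my assumption, so check what Pajek actually does with your data.

```python
import math

def partition_to_vector(partition):
    """Every integer class number is already a valid real value."""
    return [float(cluster) for cluster in partition]

def vector_to_partition(vector):
    """Round each real value to the nearest integer (halves round up)."""
    return [math.floor(value + 0.5) for value in vector]

# Education levels survive the trip to a vector unchanged...
levels = partition_to_vector([0, 3, 2])

# ...but heights in metres collapse into a single class when rounded.
height_classes = vector_to_partition([1.62, 1.75, 1.88])
```

In the second case all three people land in class 2, which is probably not a meaningful partition. Rescaling the values first (for instance, to centimetres) would give a different result, which is exactly the kind of decision left to the user.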

Pajek datasets for ESNAP

Download these sample data sets for use with the textbook, ESNAP. Store them in your Pajek Data directory for future use.

Converting lists of vertex labels to Pajek files

Pajek files require you to number vertices, and define arcs and edges by vertex number, not label, requiring you to keep track of the number/label relationship. The following two programs are very useful for generating Pajek files out of data comprising a list of label pairs. Each label automatically defines a vertex. Be careful of misspellings! Note also the difference between 1-mode and 2-mode, and between arc and edge networks. Text2pajek is actually more powerful, but you can use either.

text2pajek

excel2Pajek
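To show what these converters do in principle, here is a rough Python sketch of the label-to-number bookkeeping for a 1-mode network. It is not the code of text2pajek or excel2Pajek, the function name is made up, and giving every line a weight of 1 is my assumption.

```python
def pairs_to_net(pairs, directed=True):
    """Turn a list of (label, label) pairs into the text of a .net file."""
    pairs = list(pairs)
    numbers = {}  # label -> vertex number, in order of first appearance
    for source, target in pairs:
        for label in (source, target):
            if label not in numbers:
                numbers[label] = len(numbers) + 1
    lines = [f"*Vertices {len(numbers)}"]
    lines += [f'{number} "{label}"' for label, number in numbers.items()]
    lines.append("*Arcs" if directed else "*Edges")
    lines += [f"{numbers[s]} {numbers[t]} 1" for s, t in pairs]
    return "\r\n".join(lines) + "\r\n"
```

Note that labels are compared exactly: "Bob" and "Bob " (with a trailing space) become two different vertices, which is why misspellings matter so much.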


Using Pajek...

Note that ESNAP contains an overview of Pajek use in Appendix 1, and an index of Pajek commands from page 327 on...please consult these sections as needed. What follows is more specific advice that may be useful in your projects.

Creating

  • Creating networks
    • Create networks quickly using text2pajek, which will create either one-mode or two-mode networks.
    • Or create a random network (Net>Random network>total number of arcs) with no lines, then add them yourself by clicking with the mouse in the Draw window (the book explains how to do that in detail). Mac users take note: you must enable the "secondary click" feature of your trackpad in order to right click a vertex; set this under System Preferences>Trackpad.
    • Or type and edit the .net file yourself - use any of the book's sample files as a starting point to get the format right. Note that for two mode networks you'll need to specify both the number of vertices, and the number in the first mode (see chapter 5).
    • Another way is to use the matrix input format - this can be handy if you find a table containing all network connections and simply want to read it into Pajek. ESNAP explains this at the back of the book, in Appendix 1.
    • To edit vertex labels, use the Partition section of the Pajek main screen. Edit any partition (or create a null partition using Partition->Create null partition). Click the edit button (small hand holding pencil). You can then edit vertex labels.
  • Creating partitions and vectors:
    • Create partitions and vectors quickly by using commands Partition>Create null partition or Vector>Create identity vector, then editing these (by clicking the hand/pencil icon) to insert the values you want.
    • Or, for bigger networks, you can simply edit the file yourself (.clu for partitions; .vec for vectors), using any of the book's sample files as a starting point and model. If you can cut and paste the data from a website, this way may be faster.
    • Remember that the partition must be the same size as the network, so that every vertex gets a value. If partition and network size don't match, you'll get an error when trying to Draw-partition (and likewise for vectors), for a simple reason: Pajek can't assign each vertex a value from the partition (or vector), as it should be able to do. Likewise, other operations requiring a network and a partition (or vector) won't work. So be sure they match. When you extract a subnetwork, this means you have to perform a corresponding extraction on associated partitions and vectors. See below on how to do this.
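If you build your files by hand, you can catch a size mismatch before Pajek complains by comparing the `*Vertices` lines of the two files. The helper names below are hypothetical, and the sketch only compares the declared counts; it does not verify that the files really contain that many entries.

```python
def declared_vertex_count(text):
    """Return the number N from the first '*Vertices N' line in the text."""
    for line in text.splitlines():
        parts = line.split()
        if parts and parts[0].lower() == "*vertices":
            return int(parts[1])
    raise ValueError("no *Vertices line found")

def sizes_match(net_text, data_text):
    """True if a network and a partition (or vector) declare the same size."""
    return declared_vertex_count(net_text) == declared_vertex_count(data_text)
```

Pass in the full text of the .net file and of the .clu or .vec file; a result of `False` means the pair cannot be used together.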

Extracting

Typically you need to reduce your initial network. Maybe you wish to delete vertices that are isolated from the rest, or delete low-value lines, or delete everything that's not in the 4-core. How to do this? These techniques are very important, so read carefully (and try it!):

  • You can delete lines of low value using Net>transform>remove>lines with value>lower than. You may wind up with a network containing isolated vertices which you want to "clean up". To do so, it's easiest to create a degree partition, then eliminate the zero cluster by extracting a subnetwork comprising only the higher clusters (1-999)...to do that, read on...
  • To extract a subnetwork (a set of vertices together with their connecting lines) you first need an appropriate partition, as you will be extracting based on vertex attributes stored in that partition. For instance, you might create a degree partition (network data) - that prepares you to eliminate vertices of low degree. Or you might create a gender partition (attribute data) that prepares you to examine the subnetwork containing only males or only females.
  • Let's call the partition determining the selection the selection partition. You'll use it several times:
    • To reduce the network itself: Once you have the selection partition selected, use Operations>extract from network>partition. Type the partition clusters (or ranges) you wish to extract, and Pajek does the rest.
    • Now you may need to reduce an associated partition or vector in the same way. Let's call this the data partition or data vector. For instance, if you used a degree partition to eliminate vertices of degree zero, you may need gender information for the remaining vertices. How do you do this? Obviously you've got to eliminate partition values that correspond to the vertices of degree zero. Here's how to do that:
      • For partitions: set the data partition as the "first partition" using Partitions>First partition. Set the selection partition as the "second partition" using Partitions>Second partition. Then execute the command: Partitions>extract second from first, specifying the same partition clusters you used to extract the subnetwork.
      • For vectors, select the selection partition in the dropdown list of partitions, and the data vector in the dropdown list of vectors. Then use Vector>extract subvector, again specifying the same partition clusters you used to extract the subnetwork.
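The logic behind these extraction steps may be easier to follow in code. The sketch below mimics the idea in plain Python: lines are (from, to, value) triples using vertex numbers, and partitions are lists with one entry per vertex. All function names are made up, and this is my reconstruction of the concept, not Pajek's implementation.

```python
def degree_partition(vertex_count, lines):
    """Cluster number of each vertex = number of line ends touching it."""
    degree = [0] * vertex_count
    for source, target, _value in lines:
        degree[source - 1] += 1
        degree[target - 1] += 1
    return degree

def extract_subnetwork(labels, lines, selection, keep):
    """Keep vertices whose selection-partition cluster is in `keep`.

    Returns the new labels, the renumbered lines, and the list of
    original vertex numbers that were kept (needed for the next step).
    """
    kept = [v for v in range(1, len(labels) + 1) if selection[v - 1] in keep]
    renumber = {old: new for new, old in enumerate(kept, start=1)}
    new_labels = [labels[old - 1] for old in kept]
    new_lines = [(renumber[s], renumber[t], value)
                 for s, t, value in lines
                 if s in renumber and t in renumber]
    return new_labels, new_lines, kept

def extract_values(values, kept):
    """Reduce a data partition or data vector to the kept vertices."""
    return [values[old - 1] for old in kept]
```

The key point is that the same list of kept vertices drives both the network reduction and the reduction of every associated partition or vector, so that everything stays the same size afterwards.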

Comparing

Ultimately your work probably boils down to exploring in order to locate significant relationships between network data (derived from the connections among vertices) and attribute data (properties of vertices and lines that can't be derived from the network connections themselves). Testing for significance in turn requires statistical tests. Luckily these are available in Pajek, and can be applied without understanding exactly how they're computed.

  • Comparing multiple partitions. Typically, after exploring using the various available tools, you'll want to test if there's a significant relationship between two or more partitions - one derived from attribute data external to the network itself (for instance, gender, age, musical genre, etc.) and the other resulting from a Pajek analysis of network connections (for instance, specifying, for each vertex, which component it's in - or its degree, or which component of the 4-core (or 4-slice) it's in, or how many triads it participates in). How to do this?
    • Make sure you have both partitions available in the drop-down list.
    • Select one of the two.
    • Choose Partitions>First partition
    • Select the second of the two
    • Choose Partitions>Second partition
    • Run Partitions>Info>Cramers V/Rajski
    • Examine the resulting cross-table and interpret the statistical results, as per ESNAP
  • Comparing multiple vectors. You may want to do the same sort of comparison between vectors. In this case, the test is a Pearson correlation, which tests for linearity between the two. Simply set vectors #1 and #2, using the Vectors menu command, and then Vectors>info.
  • If the data you want is contained in both partitions and vectors, you should convert partitions to vectors, or vice versa, in order to compare them.
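You don't need to compute these statistics yourself, but seeing them spelled out may help with interpretation. Below is a sketch of the standard textbook formulas for Cramér's V (for two partitions) and the Pearson correlation (for two vectors). It does not reproduce Pajek's report, it omits Rajski's measures, and the function names are my own.

```python
from collections import Counter
from math import sqrt

def cramers_v(first, second):
    """Cramér's V for two partitions given as equal-length lists."""
    n = len(first)
    table = Counter(zip(first, second))   # the cross-table
    rows, cols = Counter(first), Counter(second)
    k = min(len(rows), len(cols)) - 1
    if k == 0:
        raise ValueError("each partition needs at least two clusters")
    chi2 = 0.0
    for r in rows:
        for c in cols:
            expected = rows[r] * cols[c] / n
            chi2 += (table[(r, c)] - expected) ** 2 / expected
    return sqrt(chi2 / (n * k))

def pearson(x, y):
    """Pearson correlation for two vectors given as equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)
```

Cramér's V ranges from 0 (no association between the two partitions) to 1 (one partition fully determines the other); the Pearson correlation ranges from -1 to 1.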

Cleaning, copying, and saving data

  • Deleting networks, partitions, or vectors. Sometimes you've generated too much stuff - you don't want to keep it all. Just use: File>Network>Dispose, or File>Partition>Dispose, or File>Vector>Dispose, and they'll go away. Be sure you don't want them before doing this - you can't get them back!
  • Saving data structures. After you've done lots of analysis, you may want to save all the Pajek data structures - your initial data, along with analyses. Just use: File>Pajek project file>save to create a .paj file containing everything. When you read this back in using File>Pajek project file>read everything will be as it was. Very handy.
  • Saving reports. You may also wish to save your report files, so that you can incorporate cross tabs or partition summaries in your papers. You'll notice the report window has a File command - use this to save the contents of the report window to a file, which you can then open with any text editor or wordprocessor.
  • Saving drawings. There are two export commands that work well from the Draw window:
    • First, note that whatever appears is what will be saved. If you don't want line values, for instance, be sure they're hidden.
    • Second, set options, using Export>Options from the Draw window. There are lots of options, and generally the initial settings work. But if something's not quite right (you want bigger vertices, or a different color) you should experiment to find out how to achieve what you'd prefer.
    • Third, choose one of the following two commands:
      • Export>EPS/PS. This command creates the most beautiful graphs. Technically, it generates an "encapsulated postscript" file. On my Mac, double-clicking such a file automatically generates a PDF, but you may need a different procedure. It may be possible to insert either a PDF or a postscript file directly into your Word document, or you can print and scan or append to the printed paper. These are "vector" graphs, which means you can enlarge them without losing resolution. They also draw curved lines, and loops can be displayed.
      • Export>Bitmap. This command essentially just preserves the bit information on your screen, so the graph looks exactly like the screen - and will be grainy if you enlarge it. Still, it may be good enough, especially if you've drawn your vertices large. You should be able to paste into a wordprocessor document. Note that loops won't display.
    • Another approach is to save the screen while the graph is displayed - on the Mac you have a utility (Applications>Utilities) called "Grab.app" which allows you to grab any region of your screen and convert to a .tiff file. PCs have similar features for saving screen shots.
  • I recommend you save everything to the desktop, or a folder on the desktop, because it takes time to navigate elsewhere using Pajek.

Exploring with Pajek

When you explore, I suggest that you keep a detailed log (paper or electronic, on your computer or online - your choice) with entries labelled by date/time, and explaining:

  • exactly what you did, in what order
  • where data came from
  • how you created files and where they reside
  • what files you input (.net, .clu, .vec, .paj)
  • what operations you performed (and exactly how you set parameters)
  • what files resulted and where you saved them
  • what you discovered
  • what worked, what didn't work
  • what you want to try next time

I also suggest that you organize your results using folders.

For greater flexibility, keep all your files on Google Docs so you can work from anywhere.

Like any laboratory work, results can accumulate and quickly become confusing if you're not organized, so you should keep track of everything carefully, so it isn't all a blur...

Further Pajek documentation and tutorials

Pajek installation

Pajek wiki

Pajek website