Pajek help

From CCE wiki archived
Revision as of 12:30, 13 September 2011 by Michaelf (talk | contribs)

Installing Pajek on your computer

See these instructions.

Pajek data files

Pajek reads and saves network data by using special files with extensions such as "net", "clu", "vec", and "paj". The "net" file stores a network; "clu" and "vec" store cluster (integer) and vector (real) arrays, assigning one value per node. A "paj" file can combine other files, saving the time of reading them all in separately. You may create networks from within Pajek, in which case Pajek will create these files for you when you save. Or you may wish to create the files yourself. But be careful: these files must be pure text, without the additional formatting information that word processors like Word usually add, or they won't work.

Creating Pajek files in Pajek

Pajek is capable of creating and editing networks, which can then be saved in the form of Pajek files. The easiest way to create a network from Pajek is to use the command:

Net->Random Network->Total number of arcs

Tell Pajek how many vertices you want, and then ask it to create zero arcs. The result is a graph with no arcs. Draw the graph, and you can add the arcs yourself. Or you can edit the graph in Pajek's main screen under "Network".


Editing Pajek files

Pajek files are pure text, and easily readable or writable. The trick is to keep them pure text.

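On Windows, create data files using a simple text editor that saves text only, such as Notepad. Wordpad also works, provided you choose a plain-text format when saving (its default format is rich text).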

For Mac, follow these instructions:

  • Create a folder on your Desktop called "AAA Pajek Data", with as many subfolders as you like. You can create an alias to this folder and keep it anywhere else, but having this folder on the Desktop will save time, because the Mac version of Pajek requires that you navigate from the top level of your directory hierarchy.
  • I suggest that you create your Pajek files using Word. Be sure to turn off "Smart Quotes" (so that quotation marks appear "straight"); in MS Word 2008 for Mac this option appears under menus Tools/AutoCorrect/AutoFormat as You Type (other versions may place it elsewhere). Be sure to enter a carriage return ("return") after every line, and leave no blank lines. Save your files using output format “Text only with line breaks”. This will force the file into the required plaintext PC format (where line breaks contain both a carriage return and a newline, unlike unix which uses newline only). Word wants to add the ".txt" extension to such files - delete that and replace with the required file extension instead. Other text editors may also be able to create readable Pajek files - try and see what works.
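The pitfalls listed above (smart quotes, blank lines, a missing final line break) can be checked automatically before loading a file into Pajek. The following is a minimal sketch in Python; the helper name `check_pajek_text` is hypothetical and not part of Pajek, and it only tests the conditions mentioned on this page.

```python
# Hypothetical helper (not part of Pajek): flag the plain-text problems
# described above in the text of a candidate Pajek file.
SMART_QUOTES = "\u201c\u201d\u2018\u2019"  # curly double and single quotes


def check_pajek_text(text):
    """Return a list of human-readable problems found in the file text."""
    problems = []
    if any(ch in text for ch in SMART_QUOTES):
        problems.append("contains smart (curly) quotes")
    if any(line.strip() == "" for line in text.splitlines()):
        problems.append("contains blank lines")
    if not text.endswith("\n"):
        problems.append("last line has no line break")
    return problems
```

An empty list means none of these problems were found; it does not guarantee that Pajek will accept the file.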

Renaming file extensions

For either Mac or Windows, saving as pure text usually adds a ".txt" file extension. You'll need to change this extension to what Pajek wants (e.g. from "txt" to "net", "clu", "vec", or "paj", as needed), or it won't read the files. This is simply a matter of editing the file name. The problem is that operating systems like to hide extensions from you (assuming you don't want to see them!). Here's what to do:

For Windows: from the Start menu, go to Control Panel, select Folder Options, select the View Tab, and then uncheck Hide extensions for known file types.

see http://maximumpcguides.com/windows-7/show-file-extensions/

On the Mac, go to Finder, select Preferences, then Advanced, and check Show all file extensions.

http://www.fileinfo.com/help/mac_show_extensions

Once you can see the file extensions, you can change them by renaming the file (usually a right or control-click will do it). Delete txt and insert the correct Pajek extension instead.
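If you have many files to rename, the same change can be scripted. This is a small sketch using Python's standard `pathlib` module; the function name `change_extension` is made up for illustration.

```python
from pathlib import Path


def change_extension(path, new_ext):
    """Rename e.g. 'docs.txt' to 'docs.net' and return the new path."""
    source = Path(path)
    # with_suffix replaces only the final extension of the file name.
    target = source.with_suffix("." + new_ext.lstrip("."))
    source.rename(target)
    return target
```

For example, `change_extension("docs.txt", "net")` would rename the file to "docs.net" in the same folder. The file's contents are not touched, so it must already be pure text.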

Pajek file formats

Pajek primarily uses three file types: .net, .clu, and .vec (network, partition, vector). The .paj type allows you to combine multiple networks, partitions, and vectors in one file. You'll generally generate .paj files using Pajek. However, you may create .net, .clu, and .vec files using spreadsheets and word or text processors.

Network files

Network files (extension ".net") define a network, as in the following:


*Vertices 3
1 "Doc1" 0.0 0.0 0.0 ic Green bc Brown
2 "Doc2" 0.0 0.0 0.0 ic Green bc Brown
3 "Doc3" 0.0 0.0 0.0 ic Green bc Brown
*Arcs
1 2 3 c Green
2 3 5 c Black
*Edges
1 3 4 c Green


This example defines 3 vertices (Doc1, Doc2 and Doc3) denoted by numbers 1, 2 and 3. The (fill) color of these nodes is Green and the border color is Brown. The initial layout location of the nodes is (0,0,0). Note that the (x,y,z) values can be changed interactively after drawing.

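There are two arcs (directed edges). The first goes from node 1 (Doc1) to node 2 (Doc2) with a weight of 3 and in color Green. The second goes from node 2 (Doc2) to node 3 (Doc3) with a weight of 5 and in color Black.

There is one edge (undirected) between node 1 (Doc1) and node 3 (Doc3), with a weight of 4 and in color Green.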
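To make the layout of the example concrete, here is a rough Python sketch that reads such a file into vertices, arcs, and edges. It is not how Pajek itself parses files: it assumes every arc and edge line carries a weight as its third field (as in the example), ignores coordinates and colors, and the name `parse_net` is hypothetical.

```python
import shlex


def parse_net(text):
    """Split .net text into ({number: label}, arcs, edges)."""
    vertices, arcs, edges = {}, [], []
    section = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("*"):
            # Section headers: *Vertices, *Arcs, *Edges
            section = line.split()[0].lower()
            continue
        fields = shlex.split(line)  # keeps quoted labels together
        if section == "*vertices":
            vertices[int(fields[0])] = fields[1]
        elif section in ("*arcs", "*edges"):
            link = (int(fields[0]), int(fields[1]), float(fields[2]))
            (arcs if section == "*arcs" else edges).append(link)
    return vertices, arcs, edges
```

Applied to the example above, this would give three labeled vertices, two arcs, and one edge.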

Note that you must include a carriage return/new line after the last line of a Pajek file!
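One way to avoid forgetting the final line break is to generate the file with a script. The sketch below, in Python, writes the vertices, arcs, and edges in the layout shown above, ending every line (including the last) with a carriage return plus newline. The function names are hypothetical, and leaving out the color attributes is an assumption that Pajek treats them as optional.

```python
def format_net(labels, arcs=(), edges=()):
    """Build .net text; arcs and edges are (from, to, weight) triples."""
    lines = [f"*Vertices {len(labels)}"]
    for number, label in enumerate(labels, start=1):
        # Initial layout position (0,0,0), as in the example above.
        lines.append(f'{number} "{label}" 0.0 0.0 0.0')
    lines.append("*Arcs")
    lines += [f"{src} {dst} {weight}" for src, dst, weight in arcs]
    lines.append("*Edges")
    lines += [f"{a} {b} {weight}" for a, b, weight in edges]
    # Every line, including the last one, ends with CR + LF.
    return "\r\n".join(lines) + "\r\n"


def write_net(path, labels, arcs=(), edges=()):
    # newline="" stops Python from translating the line endings again.
    with open(path, "w", encoding="ascii", newline="") as handle:
        handle.write(format_net(labels, arcs, edges))
```

For instance, `write_net("docs.net", ["Doc1", "Doc2", "Doc3"], arcs=[(1, 2, 3), (2, 3, 5)], edges=[(1, 3, 4)])` would reproduce the structure of the example without the colors.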

Partition files

Partition files (extension ".clu") divide the vertices into classes, called partitions. These partitions cannot overlap, and yet must cover all the vertices. In other words, every vertex is assigned to one and only one partition. Here's an example:


*Vertices 3
4
8
8


In this example, the three vertices are assigned to two different partitions: vertex #1 is in partition #4, and the other two are in partition #8. Note that the vertex numbers do not appear, but are simply inferred from the ordering.
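Because the format is just a header followed by one integer per line, a partition file is easy to generate from a list. A minimal Python sketch, with the hypothetical name `format_clu`, where the position in the list stands for the vertex number:

```python
def format_clu(classes):
    """classes[i] is the partition value of vertex i + 1."""
    if any(not isinstance(value, int) for value in classes):
        raise ValueError("partition values must be integers")
    lines = [f"*Vertices {len(classes)}"] + [str(value) for value in classes]
    # CR + LF line endings, with a line break after the last line.
    return "\r\n".join(lines) + "\r\n"
```

Calling `format_clu([4, 8, 8])` produces the text of the example above.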

Vector files

Vector files (extension ".vec") assign a number to every vertex.

Here's an example:


*Vertices 3
0.35
8.9
100.0


In this example, the first vertex is assigned a value of .35, the second 8.9, and the third 100.0. Again, the vertex numbers do not appear, but are simply inferred by ordering.
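The inference of vertex numbers from line order can be spelled out in a few lines of Python. This sketch (the name `parse_vec` is hypothetical) reads the text of a vector file and pairs each value with its vertex number, checking that the count in the header matches the number of values.

```python
def parse_vec(text):
    """Return {vertex number: value} from the text of a .vec file."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    if header[0].lower() != "*vertices":
        raise ValueError("expected a *Vertices header")
    count = int(header[1])
    values = [float(item) for item in lines[1:]]
    if len(values) != count:
        raise ValueError(f"header says {count} vertices, found {len(values)}")
    # Vertex numbers are implied by position, starting at 1.
    return dict(enumerate(values, start=1))
```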

What's the difference between vec and clu?

Numerically, vector values are continuous (real numbers), while partition values are discrete (integers). More importantly, they have different interpretations. A vector assigns a real value to each vertex; these values are not expected to repeat or to define classes. A partition divides the vertices into classes, most typically with more than one vertex per class.

For instance, if people are vertices, a vector might define their weights or heights. We don't expect these values to repeat exactly. A partition might divide them into classes based on educational level, assigning the value 0 for those who haven't finished elementary school, 1 for those who have completed only elementary school, 2 for high school, and 3 for college.

But Pajek can convert partitions to vectors (Partition->Make Vector) easily enough (integer values are also real numbers). Slightly more complex is converting a vector to a partition (Vector->Make Partition) since real values must first be "rounded" to integers. Ensuring that these conversions make sense is up to the user!
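The asymmetry between the two conversions can be illustrated outside Pajek. In the Python sketch below, rounding to the nearest integer (halves rounded up) is an assumption made for illustration; this page does not say which rounding rule Pajek applies, and both function names are hypothetical.

```python
import math


def partition_to_vector(classes):
    """Integers are already real numbers, so nothing is lost."""
    return [float(value) for value in classes]


def vector_to_partition(values):
    """Round each real value to the nearest integer (assumed rule)."""
    return [math.floor(value + 0.5) for value in values]
```

With the vector from the example above, 0.35 and 8.9 become classes 0 and 9, so the original values cannot be recovered from the partition.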

Pajek tips

Converting datasets to Pajek files

These two programs are very useful for generating Pajek files out of data presented in a different format:

[1]

[2]


Documentation and Tutorials

[3]