Computing

The Orlando Project is exploring how advanced computing systems can be used to organise and deliver scholarly work in the humanities in ways that open doors to new research and to a new understanding of texts in their multiple contexts.

The Orlando Textbase
The Orlando Textbase, published online by Cambridge University Press in June 2006, has been developed as an integrated part of the Orlando Project’s literary history. The overall research question it has been addressing since its beginning is this: how can the interpretive, critical work of this history be structured and delivered online in such a way as to be capable of responding to complex scholarly queries?

Development of the Orlando Textbase has fallen into three major phases. In the first, the Orlando team adopted the Standard Generalized Markup Language (SGML) and developed a new application of it. In the second, the team developed its production system, which is capable of exploiting Orlando’s very large SGML-encoded textbase. In the third, the project’s SGML DTDs were converted to XML RELAX NG schemas.

SGML: A Text Encoding Language
Orlando bases its computing work on SGML, which provides a way of encoding features of interest in a document. SGML encoding makes it possible to use the same electronic document for many different purposes. It also helps ensure that the documents will outlast the computer systems on which they were created: because SGML documents and the DTDs that define them are stored as standard text files, and because SGML separates a document’s structure and content from its formatting, the documents are not tied to any particular platform or software package.
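To illustrate what separating structure from formatting makes possible, here is a minimal Python sketch that renders one encoded fragment in two ways, as plain reading text and as HTML. It uses XML rather than SGML because Python’s standard library parses XML; the element names `name` and `title` and the tag-to-HTML mapping are hypothetical, not Orlando’s actual schema.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical fragment: the tag names are illustrative only,
# not Orlando's actual schema.
FRAGMENT = "<p><name>George Eliot</name> wrote <title>Middlemarch</title>.</p>"

# One possible (hypothetical) mapping from semantic tags to presentational HTML.
HTML_FOR_TAG = {"name": "span", "title": "cite"}


def to_plain_text(elem: ET.Element) -> str:
    """Reading view: drop every tag and keep only the words."""
    return "".join(elem.itertext())


def to_html(elem: ET.Element) -> str:
    """Web view: map each child's semantic tag to an HTML tag.

    Any markup nested inside a child is flattened to its text.
    """
    parts = [html.escape(elem.text or "")]
    for child in elem:
        tag = HTML_FOR_TAG.get(child.tag, "span")
        parts.append(f"<{tag}>{html.escape(to_plain_text(child))}</{tag}>")
        parts.append(html.escape(child.tail or ""))
    return "<p>" + "".join(parts) + "</p>"


doc = ET.fromstring(FRAGMENT)
print(to_plain_text(doc))  # George Eliot wrote Middlemarch.
print(to_html(doc))        # <p><span>George Eliot</span> wrote <cite>Middlemarch</cite>.</p>
```

The encoded source never mentions italics or spans; each output format decides for itself how a `title` or `name` should look.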

Orlando has used and built upon the encoding principles devised by the ground-breaking Text Encoding Initiative (TEI), but it has also used SGML in an innovative, ambitious way that departs substantially from the TEI. The Orlando Project is not encoding existing texts (for instance, the novels of Jane Austen) but new textual material created by team members for the Orlando history. Encoding therefore plays a central role in our composition process. Orlando’s challenge has been how to represent and deliver online a text that is interpretive, critical literary history. To meet it, the Orlando Project created SGML specifications (Document Type Definitions, or DTDs) for encoding interpretive information about women authors’ lives and writing, the project bibliography, and other important information, such as historical events. These are described in some detail on the DTDs page of this site.

How and Why Do We Encode Text?
Textual encoding allows us to build a fairly complex analytical structure behind our texts without compromising the readability of the encoded prose. The embedded structure then opens the material to multiple uses – not just straightforward reading of the text but also detailed searching and even on-the-fly restructuring.
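The three uses named above (reading, searching, restructuring) can be sketched against a single encoded entry. This is a minimal illustration, assuming a hypothetical `<title>` element with a `genre` attribute; it relies on the limited XPath subset supported by Python’s `xml.etree.ElementTree`, not on the Orlando system’s own search machinery.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoded entry; element and attribute names are illustrative.
ENTRY = ET.fromstring(
    '<entry person="Jane Austen">'
    '<p>She wrote <title genre="novel">Emma</title>.</p>'
    '<p>Many of her letters do not survive.</p>'
    '<p>She also wrote <title genre="novel">Persuasion</title>.</p>'
    '</entry>'
)

# 1. Reading: the prose reads normally once the tags are ignored.
reading_text = " ".join("".join(p.itertext()) for p in ENTRY.iter("p"))

# 2. Searching: filter on element name and attribute value.
novels = [t.text for t in ENTRY.findall(".//title[@genre='novel']")]

# 3. Restructuring: build a new document (a list of works) from the same markup.
works = ET.Element("works", person=ENTRY.get("person"))
for title in novels:
    ET.SubElement(works, "work", genre="novel").text = title

print(reading_text)
print(novels)  # ['Emma', 'Persuasion']
print(ET.tostring(works, encoding="unicode"))
```

The same tags that stay invisible in the reading view drive both the search and the restructured output.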

Encoding allows us, for instance, to create chronologies by pulling material out of all the different documents in the system. The system can identify, say, all the women who lived or wrote in a specific place; all the writers who wrote in specific genres; or all the writers who wrote in a specific genre in a specific place over a specific period of time.
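A query of the kind described above can be sketched as a filter over encoded entries. This minimal sketch assumes that each entry records writing activity as hypothetical `<writing>` elements carrying `year`, `place` and `genre` attributes; the writers and values are invented sample data, and Orlando’s own markup is considerably richer.

```python
import xml.etree.ElementTree as ET

# Invented sample documents; writer names, tags and attributes are hypothetical.
DOCS = [
    '<entry person="Writer A">'
    '<writing year="1790" place="London" genre="novel"/>'
    '<writing year="1810" place="Bath" genre="poetry"/>'
    '</entry>',
    '<entry person="Writer B">'
    '<writing year="1795" place="London" genre="poetry"/>'
    '</entry>',
    '<entry person="Writer C">'
    '<writing year="1802" place="London" genre="novel"/>'
    '</entry>',
]


def writers_matching(docs, place=None, genre=None, start=None, end=None):
    """Return writers with at least one writing event that meets every given filter.

    All filters are checked against the same event, so place, genre and
    date range must coincide.
    """
    matches = []
    for text in docs:
        entry = ET.fromstring(text)
        for event in entry.iter("writing"):
            year = int(event.get("year"))
            if place is not None and event.get("place") != place:
                continue
            if genre is not None and event.get("genre") != genre:
                continue
            if start is not None and year < start:
                continue
            if end is not None and year > end:
                continue
            matches.append(entry.get("person"))
            break  # one matching event is enough for this writer
    return matches


print(writers_matching(DOCS, genre="poetry"))  # ['Writer A', 'Writer B']
print(writers_matching(DOCS, place="London", genre="novel", start=1780, end=1800))  # ['Writer A']
```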

The network of data that makes it possible to search for networks of writers, or of literary and social influences, can produce important historical insight. The sheer extent of the information (currently 8 million words of text) and the rich searchability of the textbase convey not the traditional sense of the artist working in isolation but a consciousness of networks and connections of various kinds between writers. The Orlando Textbase will support a view of many writers operating in connection with one another, and of literary production as shaped by the circulation of words and ideas.
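One simple way to surface such networks is to link two writers whenever their entries tag a common entity. The Python sketch below assumes the tagged entities have already been extracted from each entry into sets of hypothetical `kind:value` strings; the writers and entities are invented, and the shared-entity rule is an illustrative choice, not necessarily how Orlando computes connections.

```python
from collections import defaultdict
from itertools import combinations

# Invented sample: each writer's entry reduced to the entities tagged in it.
TAGGED = {
    "Writer A": {"org:Society X", "place:London"},
    "Writer B": {"org:Society X", "place:Edinburgh"},
    "Writer C": {"place:Edinburgh"},
    "Writer D": {"place:Dublin"},
}


def shared_entity_network(tagged):
    """Return {(writer, writer): {shared entities}} for every linked pair."""
    # Invert the data: which writers mention each entity?
    by_entity = defaultdict(set)
    for writer, entities in tagged.items():
        for entity in entities:
            by_entity[entity].add(writer)
    # Every pair of writers sharing an entity becomes an edge labelled by it.
    edges = defaultdict(set)
    for entity, writers in by_entity.items():
        for a, b in combinations(sorted(writers), 2):
            edges[(a, b)].add(entity)
    return dict(edges)


for pair, via in sorted(shared_entity_network(TAGGED).items()):
    print(pair, sorted(via))
```

Writer D shares no tagged entity with anyone and so stays unconnected; labelling each edge with the shared entities keeps the reason for every connection visible.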

Hyperlinking
One exciting Orlando use of SGML is in highlighting relationships between writers on the basis of their connections to places, organizations, literary works, and other people. The system automatically hyperlinks any tagged name, organization, place, or title that also occurs elsewhere in the system. These links are grouped by the category of the context in which each mention occurs. For example, the tagged title of George Eliot’s Middlemarch returns more than seventy-five links to other mentions of the text, in the categories “Intertextuality and Influence,” “Reception,” “Textual Features,” and “Publication.”
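The linking rule described above (link a tagged item only if it also occurs elsewhere, and group the links by the category of their context) can be sketched as an index from titles to categories to entries. The entry ids, the `<section category=…>` and `<title>` markup, and the sample content are hypothetical; only the category labels and the Middlemarch example come from the text above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented sample entries keyed by hypothetical ids. Each <section> carries
# the category of its context; tag and attribute names are illustrative.
ENTRIES = {
    "writer-a": (
        '<entry>'
        '<section category="Reception"><title>Middlemarch</title></section>'
        '<section category="Intertextuality and Influence">'
        '<title>Middlemarch</title></section>'
        '</entry>'
    ),
    "writer-b": (
        '<entry>'
        '<section category="Reception">'
        '<title>Middlemarch</title><title>Another Work</title>'
        '</section>'
        '</entry>'
    ),
}


def build_link_index(entries):
    """Map each tagged title to {category: [ids of entries mentioning it]}."""
    index = defaultdict(lambda: defaultdict(list))
    for entry_id, text in entries.items():
        for section in ET.fromstring(text).iter("section"):
            category = section.get("category")
            for title in section.iter("title"):
                ids = index[title.text][category]
                if entry_id not in ids:
                    ids.append(entry_id)
    return index


def links_for(index, title, current_id):
    """Links for a title displayed in `current_id`, grouped by category.

    Mentions in the current entry are excluded, so a title that occurs
    nowhere else gets no links at all.
    """
    grouped = {}
    for category, ids in index.get(title, {}).items():
        others = [i for i in ids if i != current_id]
        if others:
            grouped[category] = others
    return grouped


index = build_link_index(ENTRIES)
print(links_for(index, "Middlemarch", "writer-b"))
print(links_for(index, "Another Work", "writer-b"))  # {}
```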

Chronology
The SGML tagging of Orlando’s text makes possible the creation of dynamic chronologies of several different kinds. The system can draw together a broad range of discrete events from social and political history and combine them with chronological information drawn from accounts of writers’ lives and writing careers. The result is a vast timeline base that can instantly generate more focused chronologies on, for example, particular writers, texts, genres, movements, or issues.
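Merging the two kinds of chronological material and then deriving narrower chronologies can be sketched as a sort followed by a filter. In this minimal Python sketch the `ChronItem` record and its subject tags are hypothetical; the three sample items are well-known dates chosen for illustration, not entries drawn from the Orlando Textbase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChronItem:
    """One dated item on the merged timeline (a hypothetical structure)."""
    year: int
    text: str
    source: str      # "history" or "writer" in this sketch
    tags: frozenset  # hypothetical subject tags used for filtering


# Sample items; the tags are invented for illustration.
HISTORY = [
    ChronItem(1832, "Reform Act passed", "history", frozenset({"politics"})),
    ChronItem(1918, "Representation of the People Act passed", "history",
              frozenset({"politics", "suffrage"})),
]
LIVES = [
    ChronItem(1859, "George Eliot publishes Adam Bede", "writer",
              frozenset({"George Eliot", "novel"})),
]


def merge_timelines(*sources):
    """Combine several lists of items into one list ordered by year."""
    return sorted((item for source in sources for item in source),
                  key=lambda item: item.year)


def focus(items, tag=None, start=None, end=None):
    """Derive a narrower chronology by subject tag and/or year range."""
    return [
        item for item in items
        if (tag is None or tag in item.tags)
        and (start is None or item.year >= start)
        and (end is None or item.year <= end)
    ]


timeline = merge_timelines(HISTORY, LIVES)
print([item.year for item in timeline])  # [1832, 1859, 1918]
print([item.text for item in focus(timeline, tag="politics")])
```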

The Orlando Delivery System
Orlando’s innovative use of SGML requires a robust delivery system designed specifically to open up the intricately structured textbase to users. Orlando has developed an exciting new approach to delivering complex encoded materials on the web via XML (Extensible Markup Language). The unprecedentedly dynamic delivery of scholarly materials enabled by Orlando software tests the possibilities of electronic delivery for literary history.

The Orlando delivery system provides entries on authors’ lives and writing careers, contextual material, timelines, sets of internal links, and bibliographies. Users can navigate according to their interests, drawing on the uniquely structured materials.

The Orlando delivery system is built on Java and Java-related technologies on the server side, which makes it platform independent. At its heart is an Apache Tomcat servlet container, which interacts with a web browser client (e.g. Microsoft Internet Explorer, Mozilla Firefox, or Google Chrome). The servlets extract information from a PostgreSQL database and present it to the user. The database stores the XML prose composed by Orlando Project members, plus some added information that improves the system’s performance; an index on the XML text and attribute values is one example. XSLT is used to transform the XML data into HTML.
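The index on XML text and attribute values mentioned above can be illustrated with a small in-memory inverted index. This is a sketch only: the real system keeps its data in PostgreSQL, whereas here Python dictionaries stand in for database tables, and the documents, ids and markup are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented sample documents keyed by hypothetical ids.
DOCS = {
    "doc1": '<entry><p>She wrote <title genre="novel">Emma</title>.</p></entry>',
    "doc2": '<entry><p>A <title genre="poetry">Volume</title> of verse.</p></entry>',
}

WORD = re.compile(r"\w+")


def build_index(docs):
    """Index lower-cased words and (tag, attribute, value) triples by document id."""
    words = defaultdict(set)
    attributes = defaultdict(set)
    for doc_id, text in docs.items():
        root = ET.fromstring(text)
        # Join text nodes with spaces so words in adjacent elements don't merge.
        for token in WORD.findall(" ".join(root.itertext()).lower()):
            words[token].add(doc_id)
        for elem in root.iter():
            for name, value in elem.attrib.items():
                attributes[(elem.tag, name, value)].add(doc_id)
    return words, attributes


words, attributes = build_index(DOCS)
# Combine a text query with an attribute query by intersecting document sets.
hits = words["wrote"] & attributes[("title", "genre", "novel")]
print(sorted(hits))  # ['doc1']
```

Precomputing both kinds of lookup means a query never has to re-parse the XML; a database index serves the same purpose at scale.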