
3.2. tools and platforms

The database management system chosen for the REKn prototype was PostgreSQL. As a standard system commonly used by the academic community, PostgreSQL allows for future collaboration with other researchers and integration with other projects. Because PostgreSQL is open source, custom functions and indexes can be written where they cannot be supplied by other means. Moreover, PostgreSQL supports scaling and clustering of both database systems and the data they hold. Redundancy is also possible with PostgreSQL: if one server in a cluster crashes, the others will continue processing queries and data uninterrupted.

A similar rationale dictated writing the web service in PHP, since PHP is a commonly used and well-understood language for database access via the Internet, in addition to being open source. The data-entry application likewise consists of Perl scripts that use the web service as a database access proxy; Perl is also open source software and is well suited to string processing.

3.3. gathering primary and secondary materials

The gathering of primary materials for the knowledgebase was initially accomplished by pulling down content from open-access archives of Renaissance texts, and by requesting materials from various partners (researchers, publishers, scholarly centers) interested in the project. These materials included a total of some 12,830 texts in the public domain or otherwise generously donated by EEBO-TCP (9,533), Chadwyck-Healey (1,820), Text Analysis Computing Tools (311), the Early and Middle English Collections from the University of Virginia Electronic Text Center (273 and 27 respectively), the Brown Women Writers Project (241), the Oxford Text Archive (241), the Early Tudor Textbase (180), Renascence Editions (162), the Christian Classics Ethereal Library (65), Elizabethan Authors (21), the Norwegian University of Science and Technology (8), the Richard III Society (5), the University of Nebraska School of Music (4), Project Bartleby (2), and Project Gutenberg (2). A master list of the primary text titles and their sources is included as Appendix 2. The harvesting and initial integration of these materials took a year, during which time almost 4 gigabytes of files in various formats were standardized into a basic TEI-compliant XML format. Roughly a dozen different implementations of XML, SGML, COCOA, HTML, plain text, and more eclectic encoding systems were accommodated.

For example, accommodating the TEI P4-conforming XML documents obtained from the University of Virginia Electronic Text Center’s Early English Collection required the following three-step process:

  1. EarlyUVaStepOne.xsl: Application of an XSL transformation to remove the unnecessary XML tags, and to restructure the document using our internal-use tags. This step also derived a minimal set of metadata necessary for identifying the document with bibliographic MARC records.
  2. EarlyUVaStepTwo.xsl: This step, applied to the cleaned, stripped and possibly restructured documents resulting from step one, transformed the XML list of our metadata into an HTML list, built links to the HTML and XML files, and provided some rudimentary navigation and statistics.
  3. EarlyUVaToHTML.xsl: Applied to either the source document or to the result of the first transformation, this process produced HTML suitable for web browsers. These transformations are very simple, producing only a minimum of HTML tagging. When we wish to serve more polished products to web browsers, this XSLT will serve as a starting point.
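
The kind of work done in step one can be illustrated with a minimal sketch. It is written in Python rather than XSLT, and it is not the project's stylesheet: the tag mapping, the internal-use tag names (`document`, `section`, `heading`, `para`), and the choice of title and author as the minimal metadata are all assumptions made for illustration. The sketch renames the tags it knows about, unwraps every other tag while keeping its text, and reads two metadata fields from the header.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from source tags to internal-use tags.
# Any tag not listed here is unwrapped: its text is kept, the tag is dropped.
TAG_MAP = {"div1": "section", "head": "heading", "p": "para"}


def _append_text(parent, text):
    """Append text at the current end of parent's content."""
    if not text:
        return
    if len(parent) == 0:
        parent.text = (parent.text or "") + text
    else:
        last = parent[-1]
        last.tail = (last.tail or "") + text


def restructure(src, dest):
    """Copy the content of src into dest, renaming or unwrapping tags."""
    _append_text(dest, src.text)
    for child in src:
        if child.tag in TAG_MAP:
            renamed = ET.SubElement(dest, TAG_MAP[child.tag])
            restructure(child, renamed)
        else:
            # Unwrap: keep the child's content, discard the tag itself.
            restructure(child, dest)
        _append_text(dest, child.tail)


def step_one(xml_source):
    """Return (metadata, cleaned document) for one source document."""
    root = ET.fromstring(xml_source)
    # Assumed minimal metadata; the fields actually derived are not listed
    # in the text above.
    metadata = {
        "title": root.findtext(".//titleStmt/title", default=""),
        "author": root.findtext(".//titleStmt/author", default=""),
    }
    cleaned = ET.Element("document")
    body = root.find(".//body")
    if body is not None:
        restructure(body, cleaned)
    return metadata, cleaned
```

Given a source paragraph such as `<p>Some <hi rend="italic">old</hi> text<pb n="2"/> here.</p>`, the sketch yields `<para>Some old text here.</para>`: the highlighting and page-break tags are removed, but no text is lost. Steps two and three would then operate on this cleaned output.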

Source:  OpenStax, Online humanities scholarship: the shape of things to come. OpenStax CNX. May 08, 2010 Download for free at http://cnx.org/content/col11199/1.1