Data: integration and understanding

Archaeology produces vast amounts of data of many different types. These include artefact descriptions, measurements, site plans, context plans, photographs, and cartographic and spatial data. Every excavation presents particular challenges for data gathering and recording, and the responses open to excavators are constrained by scale, resources, the type of material, and so on. There is no question that e-Research methods offer enormous potential for supporting such processes, and few archaeologists would doubt the desirability of integrating data from different sites. Eiteljorg writes of ‘the hope that data storehouses could be used by scholars to retrieve and analyze information from related excavations, thus permitting broader syntheses’ (Eiteljorg 2004: 22). Broader synthesis is at the core of academic archaeology, and is vital for any interpretation that spans site, inter-site, or regional scales. However, there is an obvious tension between the structures and standards any database must impose in order to be useful, and the unordered (and incomplete) nature of the archaeological record (see Lock 2003: 85-98).

e-Research technologies can support researchers faced with such problems in a number of ways. One approach is the construction of domain-specific ontologies and controlled vocabularies, which can describe and link concepts, and map between different groups of concepts. Thus if an artefact of type A is found at site X, a linked ontological system should be able to identify further examples of type A at site Y, even if those artefacts have been recorded or described differently. This approach has limitations: those producing data still have to describe and/or annotate the information in a way that conforms with, or can be adapted to, the ontology, and this imposes extra costs on already-overburdened resources. On the other hand, standardized metadata and data storage systems can be immensely useful and easy to implement if they are backed by centralized support services and repositories such as the Archaeology Data Service.
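The idea of linking differently recorded artefacts through a shared concept can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the site names, local terms, concept identifiers, and find numbers are all hypothetical and do not come from any real ontology or excavation archive.

```python
from dataclasses import dataclass

# Hypothetical mapping from (site, local term) to a shared concept identifier.
# In practice this mapping is the costly part: someone has to curate it.
LOCAL_TO_CONCEPT = {
    ("site_x", "penannular brooch"): "concept:type_a",
    ("site_y", "brooch, open-ring"): "concept:type_a",
    ("site_y", "spindle whorl"): "concept:type_b",
}


@dataclass(frozen=True)
class Record:
    site: str
    local_term: str  # the term as written down by the excavators
    find_id: str


def concept_of(record):
    """Return the shared concept for a record, or None if it is unmapped."""
    return LOCAL_TO_CONCEPT.get((record.site, record.local_term))


def find_by_concept(records, concept):
    """Return every record whose local term maps to the given concept."""
    return [r for r in records if concept_of(r) == concept]


# Hypothetical finds from two sites that use different recording conventions.
RECORDS = [
    Record("site_x", "penannular brooch", "X-001"),
    Record("site_y", "brooch, open-ring", "Y-017"),
    Record("site_y", "spindle whorl", "Y-018"),
    Record("site_y", "unclassified object", "Y-019"),
]

matches = find_by_concept(RECORDS, "concept:type_a")
print([r.find_id for r in matches])  # ['X-001', 'Y-017']
```

The record with the local term "unclassified object" is never returned by any query, which illustrates the limitation noted above: data that has not been described in a way that can be mapped to the ontology stays invisible to it.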

Other approaches seek to apply Natural Language Processing (NLP) technologies to primary archaeological material and secondary archives. One such example is the Archaeotools project, conducted as part of the AHRC-JISC-EPSRC Arts and Humanities e-Science Initiative by the universities of York and Sheffield. Archaeotools identifies and extracts references to ‘what’, ‘when’ and ‘where’ entities in so-called ‘grey literature’. Grey literature refers to reports of (usually small-scale) archaeological investigations that have been produced and archived, often never to be seen again. The NLP process allows information to be tagged systematically according to ‘what’, ‘where’ and ‘when’, and structured into facets for faceted browsing. It should therefore be possible, for example, to search across a range of disparate archaeological reports for references to data concerning Early Medieval coins from North Eastern England [ when, what, where ], even if the information was not tagged or described in such terms at the point of being recorded. In another important development, Archaeotools uses NLP-generated entities to search for information according to the terms in existing controlled vocabularies such as Sites and Monuments Records (SMRs). As will be seen below, integrating e-Research methods within existing practices is essential for archaeology, so allowing researchers to search using the terms and conventions they are already familiar with is critical.
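The faceted search described above can be sketched as follows. This is not the Archaeotools implementation: it assumes the entity-extraction step has already run, and the report identifiers and tag values are invented for illustration. It shows only how ‘what’, ‘when’ and ‘where’ tags, once extracted, can be combined into a single query.

```python
from collections import defaultdict

# Hypothetical output of an NLP tagging step: for each report, the set of
# 'what', 'when' and 'where' entities found in its text.
TAGGED_REPORTS = {
    "report_001": {
        "what": {"coin"},
        "when": {"early medieval"},
        "where": {"north east england"},
    },
    "report_002": {
        "what": {"coin", "pottery"},
        "when": {"roman"},
        "where": {"north east england"},
    },
    "report_003": {
        "what": {"coin"},
        "when": {"early medieval"},
        "where": {"south west england"},
    },
}


def build_facet_index(tagged):
    """Map each (facet, value) pair to the set of reports that mention it."""
    index = defaultdict(set)
    for report_id, facets in tagged.items():
        for facet, values in facets.items():
            for value in values:
                index[(facet, value)].add(report_id)
    return index


def facet_search(index, all_ids, **criteria):
    """Return the sorted ids of reports that match every given facet value."""
    results = set(all_ids)
    for facet, value in criteria.items():
        # Intersect with the reports carrying this facet value (none if unknown).
        results &= index.get((facet, value), set())
    return sorted(results)


INDEX = build_facet_index(TAGGED_REPORTS)
print(
    facet_search(
        INDEX,
        TAGGED_REPORTS,
        when="early medieval",
        what="coin",
        where="north east england",
    )
)  # ['report_001']
```

Each additional facet narrows the result set by intersection, which is what lets a researcher move from all reports mentioning coins to only those that are Early Medieval and from North Eastern England.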

Source:  OpenStax, Research in a connected world. OpenStax CNX. Nov 22, 2009 Download for free at http://cnx.org/content/col10677/1.12
