Project Update

Stewart Waller
ADS Curatorial Officer




Following the announcement in our last newsletter, the ADS, in collaboration with the Department of Computer Science at the University of Sheffield, have begun work on Archaeotools - a major research project funded by the e-Science Research Grants scheme (AHRC-EPSRC-JISC). Although covering a number of different areas, the project will focus on data mining, natural language processing and ontology development. In particular, the project comprises three inter-linked but self-contained work-packages:

1. Development of advanced geospatial browser and faceted classification index

The aim of this part of the project is to classify the ADS's ArchSearch metadata records using sophisticated knowledge organisation techniques developed by the Sheffield group. Whilst the folk at Sheffield have been busy developing the classification engine, the ADS have been investigating various taxonomies and information schemas in order to develop a suitable archaeological ontology to underpin the classification process. To achieve the best possible classification accuracy, the system will make use of the ontology together with a set of classification rules. The classification system will adopt a multi-faceted approach, allowing the user to browse the ArchSearch catalogue in an intuitive way, rather than blindly searching for records that may or may not exist. The facets themselves will represent 'When', 'What', and 'Where' concepts - the three most common types of search employed by archaeological researchers.
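To give a feel for how an ontology and a set of rules might work together, here is a minimal sketch in Python. It is not the Sheffield classification engine: the miniature ontology, the trigger terms and the `classify` function are all invented for illustration, and the only "rule" is a simple substring match on a record's description.

```python
# Hypothetical miniature ontology: facet -> concept -> trigger terms.
# All concepts and terms are invented for illustration only.
ONTOLOGY = {
    "When": {
        "Roman": ["roman", "romano-british"],
        "Medieval": ["medieval", "norman"],
    },
    "What": {
        "Villa": ["villa"],
        "Castle": ["castle", "motte"],
    },
    "Where": {
        "Yorkshire": ["york", "yorkshire"],
        "Scotland": ["scotland", "edinburgh"],
    },
}


def classify(description):
    """Assign a record description to concepts under each facet.

    The rule used here is deliberately naive: a concept applies if any
    of its trigger terms occurs as a substring of the lower-cased text.
    """
    text = description.lower()
    result = {}
    for facet, concepts in ONTOLOGY.items():
        result[facet] = [
            concept
            for concept, terms in concepts.items()
            if any(term in text for term in terms)
        ]
    return result


if __name__ == "__main__":
    print(classify("Excavation of a Roman villa near York"))
```

A real system would need far richer rules (period date ranges, synonyms, place-name gazetteers), but the shape is the same: the ontology supplies the concepts, and the rules decide which concepts a record belongs to.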

A screen shot of the CIE Demonstrator's interface, showing how sites can be searched by clicking on facets; the number of sites for each facet is given in brackets. The demonstrator can be viewed at: http://ads.ahds.ac.uk/project/cie/index.html
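The facet counts shown in brackets can be illustrated with a small Python sketch. This is an assumption about how such an interface could be driven, not a description of the demonstrator's code: the sample records and the `select` and `facet_counts` helpers are hypothetical.

```python
from collections import Counter

# Hypothetical, already-classified records (invented for illustration).
RECORDS = [
    {"name": "Site A", "When": "Roman", "What": "Villa", "Where": "Yorkshire"},
    {"name": "Site B", "When": "Roman", "What": "Fort", "Where": "Yorkshire"},
    {"name": "Site C", "When": "Medieval", "What": "Castle", "Where": "Scotland"},
    {"name": "Site D", "When": "Roman", "What": "Villa", "Where": "Scotland"},
]


def select(records, **chosen):
    """Keep only the records matching every facet value clicked so far."""
    return [r for r in records if all(r[f] == v for f, v in chosen.items())]


def facet_counts(records, facet):
    """Count how many of the given records fall under each value of a facet."""
    return Counter(r[facet] for r in records)


if __name__ == "__main__":
    # The user clicks 'Roman' under 'When'; the 'What' facet is then
    # redrawn with the number of remaining sites shown in brackets.
    roman = select(RECORDS, When="Roman")
    for value, n in sorted(facet_counts(roman, "What").items()):
        print(f"{value} ({n})")
```

Because every option is listed with its count, the user can see in advance that a click will return results, which is the point of browsing rather than searching blindly.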

2. Data-mining of grey literature reports

The aim of this work package is to mine much richer metadata from a sample of around 1000 archaeological grey literature reports archived by the ADS. The 'When', 'What' and 'Where' facets used in work package one can be re-employed, and the deeper access this provides can be incorporated within the ArchSearch index.

3. Data mining of historic journal literature

The final work-package extends work package two into fully published antiquarian literature, using the Proceedings of the Society of Antiquaries of Scotland as a test case resource. This is likely to be more challenging because of the greater variation in structure and vocabulary found in 19th- and early 20th-century reports. A particular area for investigation will be the extraction of geospatial referencing from antiquarian reports.

Data sources:

For Work-package One, records in the ArchSearch catalogue have been harvested as MidasXML fragments, which will be fed into the system as the primary data source. Because the system takes MidasXML as its input, external data sets supplied in the same form could also become classification targets. Work has not yet commenced on the second and third work packages, so the success of these later investigations is as yet unknown. Ultimately, however, we fully expect the system to facilitate a major enhancement of the ADS ArchSearch facility and make more, and richer, archaeological information available to our users - watch this space!

Archaeotools: http://ads.ahds.ac.uk/project/archaeotools/
