Scanning New Horizons:
The `Big Data' Project

In May this year we launched a new research project funded by English Heritage to develop policies for the preservation of unusually large data sets that are becoming more common among researchers. The Preservation and Management Strategies for Exceptionally Large Data Formats Project, `Big Data' for short, has begun investigating why, when it comes to preservation and access, size matters.

It is a fact of life that the technology for gathering, processing and storing data keeps getting smaller, faster and more ubiquitous. As our collective capacity to generate and process data grows, so does the size of our data sets. This project is concerned with data produced by research methods that give rise to exceptionally large file sizes.

The technologies associated with the preservation, storage and delivery of such data are problematic for any organisation that has a responsibility for archiving these data and making them available for the future; and the ADS is most definitely one of them.

In recent years we have reaped the benefits of digital technologies in archaeology, allowing us to survey large areas of topography from the air using LiDAR or at sea using various scanning techniques. We have also seen a proliferation of applications for 3D scanning in the analysis and presentation of anything from artefacts to rock art or Egyptian tombs. Maritime archaeology in particular generates a great deal of `big data', both through sea bed scanning and modelling techniques such as side scan sonar and through the extensive use of video to record maritime excavation. The proliferation of data capture and analysis technologies has not been matched by our understanding of the implications for preservation, dissemination, reuse and access. This is exacerbated by the proprietary nature of many of the formats created by the new research technologies, and thus their dependency on specific, little-used software.

The `Big Data' project will be managed by the ADS, exploring the issues through three practical case studies. A recently completed 3D laser scanning project from Durham University, entitled Breaking through Rock Art, will provide a test case for 3D laser scanning.

Side-scanning sonar image of `HMS A1', Britain's first submarine: image by kind permission of Wessex Archaeology

The Breaking through Rock Art project has a full range of scanning data, having scanned all the stones of Castlerigg in Cumbria. Meanwhile, Wessex Archaeology have offered to provide their Wrecks on the Seabed Project as an example of marine data concerning the investigation of submerged archaeological sites using geophysical tools and diver-based techniques. Our third data set comes from the Where Rivers Meet Project at the University of Birmingham. This project has made extensive use of airborne remote sensing devices (LiDAR) to survey the landscape at the confluence of the Trent and Tame rivers in Staffordshire.

During the 18 months of the `Big Data' project, ADS staff will audit the archives created by these three projects and make recommendations on archiving the data and, most importantly, on options for allowing access to these data and others like them. Highlights of the project will be a workshop for specialists in November and a paper at IFA in 2006. We will, of course, publish the `Big Data' activity online.

Jon Kenny and Tony Austin

jk18@york.ac.uk and afa2@york.ac.uk

For more on the `Big Data' project see:
http://ads.ahds.ac.uk/project/bigdata/
