Overview

The Open Context project aims to demonstrate a practical, open-access, web-based method for organizing, sharing, and enhancing research on archaeological data. Open Context is an open source project built with commonplace open source web technologies, including PHP ([1]), MySQL ([2]), and Dojo Ajax ([3]). The project has been funded by the William and Flora Hewlett Foundation ([4]), and has recently won additional funding from the US National Endowment for the Humanities. All Open Context content is licensed with Creative Commons ([5]) copyright licenses.

Most humanities collection managers have little capacity to develop their own customized, web-accessible database solutions. Because Open Context can accommodate datasets from multiple projects and collections, it significantly reduces the costs of data dissemination. Open Context provides a common portal for browsing, simple “Google-like” searches, complex Boolean queries, data summary, data export, and tagging of several pooled datasets. These datasets range from archaeological field projects to geological science and zooarchaeological datasets. Open Context also provides access to museum reference collections to facilitate the identification and comparison of specimens recovered in current field projects.

The largest and most significant single dataset now served by Open Context includes some fifteen years of excavation documentation and artifact analysis from the Brown University excavations at the Great Temple of Nabataean Petra (a UNESCO World Heritage Site). Several more collections will be added to Open Context shortly, most notably an NEH-funded initiative to publish UC Berkeley excavations at Nineveh (a major ancient Assyrian urban center in northern Iraq).

Open Context's flexibility in accommodating such varied datasets stems from over 20 years of development and field-testing of database designs by David Schloen, lead of the University of Chicago OCHRE (“Online Cultural Heritage Research Environment”) system. While OCHRE provides sophisticated data management tools targeted at active research projects, Open Context uses a subset of the OCHRE data structure (ArchaeoML) to support streamlined, web-based access and community organization of diverse cultural heritage content. Schloen designed the ArchaeoML schemas to accommodate cultural heritage datasets without imposing predetermined standard vocabularies or recording systems. Overly rigid predetermined standards may inhibit innovation in research design and poorly accommodate “legacy” datasets.

Thus, Open Context now demonstrates the potential to pool diverse cultural heritage datasets in a common platform. It provides powerful tools and services to browse, retrieve, and analyze these datasets. However, Open Context development thus far has mainly focused on validation of the ArchaeoML schema and the technologies required to manage multiple datasets expressed in this schema. To support innovative research, Open Context requires continued development focused on user needs and experience requirements. In addition, significant community building must be accomplished.

Continuing Technical Development

Current development efforts aim to make Open Context easier to adapt to the user experience requirements of different applications and user communities. To support greater flexibility in interface composition, Open Context development continues to implement XML/XSLT technologies. These will allow great flexibility in customizing content presentation and user interface tools and will enable Open Context to support the rapid prototyping and iterative design techniques typical of user experience optimization methods. Finally, a major motivation for the XML/XSLT development focus is to facilitate the creation of simple, RESTful web services that will enable machine-to-machine interoperability and facilitate “mashups” of Open Context content with content from other data sources on the Web.

In addition, Open Context will shortly be updated to enhance speed and performance, especially as more content gets added to the system. It now runs on a shared server provided by a commercial web hosting service. This hosting service maintains, patches, and troubleshoots the server, thus limiting administrative costs. While the web hosting service continually upgrades server performance, such upgrades may not come fast enough to support some use cases and features for Open Context. For example, the faceted browsing feature is computationally intensive because it dynamically calculates sub-totals for many different “facets” of Open Context’s MySQL database. This feature allows users to find items based on more than one dimension and according to project-specific taxonomies, which is ideal for exploratory browsing. As Open Context’s collections grow, this faceted browsing feature will suffer greater performance problems. To meet these performance challenges, Open Context developers are now finalizing implementation of the Apache Solr faceted indexing application. Solr has impressive performance characteristics, and is specifically designed for running faceted browse applications. The Open Context Solr implementation will also support RESTful web services based on the Atom Syndication Format. Initial public deployment of the new Solr-based web services is targeted for the last quarter of 2008.
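To illustrate why faceted browsing is costly, the following is a minimal Python sketch of computing facet sub-totals over a filtered result set. It is not Open Context's implementation: the records, the field names (`project`, `category`, `material`), and the `facet_counts` function are hypothetical, chosen only for illustration. Every change of filter requires recounting every facet over the matching records, which is the work a dedicated faceted index is designed to speed up.

```python
from collections import Counter

# Hypothetical records; field names and values are illustrative only.
RECORDS = [
    {"project": "Petra Great Temple", "category": "pottery", "material": "ceramic"},
    {"project": "Petra Great Temple", "category": "coin", "material": "bronze"},
    {"project": "Petra Great Temple", "category": "pottery", "material": "ceramic"},
    {"project": "Reference Collection", "category": "bone", "material": "bone"},
]


def facet_counts(records, facet_fields, filters=None):
    """Return {field: Counter(value -> count)} for records matching all filters."""
    filters = filters or {}
    # Keep only the records that satisfy every active filter.
    matching = [
        r for r in records
        if all(r.get(key) == value for key, value in filters.items())
    ]
    # Recount each requested facet over the filtered records.
    return {
        field: Counter(r[field] for r in matching if field in r)
        for field in facet_fields
    }


counts = facet_counts(
    RECORDS,
    ["category", "material"],
    filters={"project": "Petra Great Temple"},
)
for field, totals in counts.items():
    print(field, dict(totals))
```

Selecting a further value (for example, adding `"category": "pottery"` to `filters`) narrows the result set and changes every sub-total, so the counts cannot simply be computed once and reused.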

In January 2009, a test site was launched to try out and demonstrate the new faceted search and Atom syndication services. The test site will see continual roll-out of new functionality over the course of 2009.

ArchaeoML and Semantic Data Integration

Cultural heritage collections span the humanities (history, classics, languages and the arts) and social sciences (anthropology, linguistics, architecture, and area studies), and frequently overlap with the natural sciences (geology, zoology, botany, and other environmental sciences). This array of disciplinary perspectives encourages a diversity of documentation needs and methods. Data, evidence, interpretations, and syntheses all have very different roles across this widely varying community. Such heterogeneity (not to mention uneven levels of technical expertise and funding) makes the construction of a digital dissemination infrastructure for cultural heritage collections a great challenge.

Open Context adopts the Archaeological Markup Language (ArchaeoML) standard, developed by David Schloen for the University of Chicago Online Cultural Heritage Research Environment project. ArchaeoML provides a common framework for expressing archaeological observations, their descriptive properties, and their contextual relationships. However, its inherently flexible item-based structure ensures that the organization and description of the content is not predetermined. ArchaeoML's key features include:

  • Flexibility in Scale: An ArchaeoML item can be any type of archaeological observation at any scale, ranging from a region, to a site, to a specific deposit, to an artifact, ecofact, or even microscopic observation. Each item has its own unique label (site name, context ID, bone ID, etc.) created at the discretion of the researcher.
  • Flexibility in Description: Similarly, the names, terminologies, and values of the descriptive properties of each item are also created at the discretion of the individual researcher. For instance, one is free to describe the composition of pottery with a property like fabric, ware type, or any other set of variables. In other words, descriptive variables and terminologies are left to the researcher's discretion, and are not hard-coded into the data structure. Multiple media, including video, images, GIS, etc., also can be used in addition to alphanumeric text to describe specific items.
  • Accommodates Heterogeneity: New descriptive variables can be tailor-made for a specific unit without changing the descriptive framework for a whole class of artifacts. Researchers can create new observational criteria and descriptive properties very easily if they encounter unexpected or unique items.
  • Multiple Observations and Observers: ArchaeoML easily represents multiple observations (even contradictory observations) made on a single item. Each observation can be authored individually, thus explicating much of the process of knowledge construction. This feature also enables ArchaeoML to represent multiple descriptions of items created for multiple purposes. Museum catalogue data and archaeological contextual observations and descriptions can coexist on the same system.
  • Expresses Contextual Relationships: Extrinsic contextual relationships organize the mass of individual items into archaeologically meaningful structures. These relationships include spatial hierarchies (some items contain smaller items, which contain even smaller items), stratigraphic relationships of sequences of deposition (shown graphically in stratigraphic flow charts such as Harris matrices), and relationships of spatial adjacency. These archaeologically meaningful structures (many of which are recursive) provide the framework that guides searches and analytically powerful queries.
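
The features listed above can be sketched as a small data structure. The Python below is a hypothetical model written for illustration, not the ArchaeoML schema itself: the class names (`Item`, `Observation`), their attributes, and the sample labels are all assumptions. It shows items at any scale, free-form descriptive properties, multiple authored observations on one item, and a recursive spatial hierarchy.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Observation:
    """One observer's description of an item, using free-form variables."""
    observer: str
    properties: dict


@dataclass
class Item:
    """Any unit of observation: a region, site, deposit, artifact, and so on."""
    label: str
    parent: Optional["Item"] = None  # the item that spatially contains this one
    observations: list = field(default_factory=list)

    def context_path(self):
        """Return labels from the outermost container down to this item."""
        path = []
        node = self
        while node is not None:
            path.append(node.label)
            node = node.parent
        return list(reversed(path))


# Hypothetical labels, created at the researcher's discretion.
site = Item("Site A")
trench = Item("Trench 5", parent=site)
sherd = Item("Sherd 0042", parent=trench)

# Two observers describe the same sherd with different variables.
sherd.observations.append(Observation("Observer 1", {"fabric": "coarse"}))
sherd.observations.append(Observation("Observer 2", {"ware type": "fine ware"}))

print(" / ".join(sherd.context_path()))
```

Because properties are plain name-value pairs attached to an observation, adding a new descriptive variable for one unusual item requires no change to the structure used by any other item.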

As demonstrated by OCHRE’s pilot projects as well as the content now publicly available in Open Context, ArchaeoML is sufficiently generalized to accommodate datasets from wide-ranging archaeological and museum collections. ArchaeoML is conceptualized as a very generalized, item-based model, where individual atomic units of observation are related to each other and their descriptive attributes. Each item does not belong to a predetermined observational class (pottery, bone, deposit, grave good, etc.).

Users can group items into multiple and overlapping classifications defined according to their changing interests and assumptions. For Open Context, each location and object (site, building, context, artifact) has its own unique URL reference. Each item can therefore be further annotated with folksonomy tags or described with more formal taxonomies and ontologies. Thus, ArchaeoML’s high level of abstraction and simplicity has an important advantage: it can be used in conjunction with many different classification systems and ontologies, including the Getty Art and Architecture Thesaurus or the CIDOC-CRM. Folksonomy systems offer a less formal and less costly (though less precise) route to greater semantic integration. Therefore, the choice between ArchaeoML and other standards is not a binary decision. Indeed, through implementation of the ArchaeoML global schema, Open Context can be readily extended to enable multiple individuals and groups to apply multiple ontologies as they see fit.
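
As a rough sketch of how URL-identified items can carry several overlapping classifications at once, the Python below keeps annotations separate from the items themselves. The `Annotations` class, the scheme names, the terms, and the `example.org` URLs are all hypothetical; this is not Open Context's tagging implementation.

```python
from collections import defaultdict


class Annotations:
    """Hypothetical store linking item URLs to terms from any number of schemes."""

    def __init__(self):
        self._by_item = defaultdict(set)  # item URL -> {(scheme, term)}
        self._by_term = defaultdict(set)  # (scheme, term) -> {item URL}

    def annotate(self, item_url, scheme, term):
        """Attach a term from a named scheme (folksonomy, thesaurus, ...) to an item."""
        self._by_item[item_url].add((scheme, term))
        self._by_term[(scheme, term)].add(item_url)

    def items_for(self, scheme, term):
        """Return the sorted URLs of items annotated with this term."""
        return sorted(self._by_term.get((scheme, term), set()))

    def terms_for(self, item_url):
        """Return the sorted (scheme, term) pairs attached to this item."""
        return sorted(self._by_item.get(item_url, set()))


# Placeholder URLs; real item URLs are not given in this article.
lamp = "http://example.org/items/lamp-17"
bowl = "http://example.org/items/bowl-03"

notes = Annotations()
notes.annotate(lamp, "folksonomy", "nabataean")
notes.annotate(bowl, "folksonomy", "nabataean")
notes.annotate(lamp, "thesaurus", "lamps")

print(notes.items_for("folksonomy", "nabataean"))
print(notes.terms_for(lamp))
```

The same item appears under an informal tag and a formal thesaurus term without either classification being built into the item record, which is the sense in which the choice of scheme is not binary.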

Data Mapping, Import, and Publishing with Open Context

Open Context serves as a data-publishing tool for individual researchers and small institutions. Because of the nature of this community, standards for data archiving cannot be overly complex or prescriptive. Because ArchaeoML data structures are highly abstracted and generalized, mapping a given project’s database schema into the ArchaeoML global schema is relatively simple and fast. The web-based Penelope import tool asks users both to map their data schema to ArchaeoML and to provide high-level descriptive metadata (Dublin Core elements). Penelope then sends the imported dataset to Open Context database managers for review and incorporation into the public site. A researcher's (or collection manager’s) original idiosyncratic vocabularies, observational variables, and values are all retained. Thus, while ArchaeoML is a standard, researchers retain full flexibility to develop and continually refine their methods, vocabularies, and recording systems. Data can make a round-trip into and out of expression in ArchaeoML (for example, Open Context users may “dump” data out into a familiar Excel table).
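
The round-trip idea can be sketched in a few lines of Python. This is an assumption-laden illustration, not the Penelope tool or the ArchaeoML format: the function names, the sample table, and the choice of (item, variable, value) triples as the generalized form are all hypothetical. It shows a researcher's own column names and values surviving the trip from a table into an item-based form and back out again.

```python
import csv
import io


def table_to_triples(rows, label_column):
    """Flatten table rows into (item label, variable, value) triples."""
    triples = []
    for row in rows:
        label = row[label_column]
        for variable, value in row.items():
            # Skip the label column itself and any empty cells.
            if variable != label_column and value != "":
                triples.append((label, variable, value))
    return triples


def triples_to_table(triples, label_column):
    """Rebuild column names and rows from triples, keeping first-seen order."""
    rows = {}
    columns = [label_column]
    for label, variable, value in triples:
        rows.setdefault(label, {label_column: label})[variable] = value
        if variable not in columns:
            columns.append(variable)
    return columns, list(rows.values())


def to_csv(columns, rows):
    """Write rows as CSV text, leaving missing values as empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical source table using the researcher's own column names.
SOURCE = "Bone ID,Taxon,Element\nB-1,Ovis aries,femur\nB-2,Bos taurus,\n"

rows = list(csv.DictReader(io.StringIO(SOURCE)))
triples = table_to_triples(rows, "Bone ID")
columns, rebuilt = triples_to_table(triples, "Bone ID")
print(to_csv(columns, rebuilt))
```

One limitation of this simplified form: an item whose cells are all empty produces no triples and would be dropped on the way back, so a fuller version would need to record items separately from their properties.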

Related Open Access Publications

Keywords

  • Content
  • DataIntegration
  • ArchaeoML
  • Datasharing
  • Database
  • OpenAccess
  • OpenData
  • Software
  • Projects