
More recently, this team has also been at the forefront of federating repositories. While a solid foundation within an institution may be essential for the longevity and sustainability of the material, from the point of view of a researcher the location of a dataset is irrelevant. A researcher, or indeed any user, wants to be able to locate, access, reuse and combine datasets independently of where they happen to be managed. The eCrystals Federation is establishing an international partnership of crystallography data repositories implemented using heterogeneous technologies, with the aim of integrating them into a broader framework for the curation and dissemination of crystallographic data.
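To make the idea of federation concrete, here is a minimal Python sketch, assuming a simple design in which each member repository, whatever its underlying technology, is wrapped in an adapter that returns records in one shared shape. The names `DatasetRecord`, `DictBackedRepo`, `TupleBackedRepo` and `federated_search` are hypothetical and do not describe the eCrystals implementation; a real federation would typically harvest or query members over a standard protocol rather than call in-memory objects.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Protocol


@dataclass(frozen=True)
class DatasetRecord:
    """Common record shape used by the hypothetical federation layer."""
    identifier: str
    title: str
    source: str


class RepositoryAdapter(Protocol):
    """Anything with a search() method can join the federation."""
    def search(self, term: str) -> Iterable[DatasetRecord]: ...


class DictBackedRepo:
    """Stand-in for a repository whose native records are dictionaries."""

    def __init__(self, name: str, records: list[dict]) -> None:
        self.name = name
        self.records = records

    def search(self, term: str) -> Iterator[DatasetRecord]:
        for rec in self.records:
            if term.lower() in rec["title"].lower():
                yield DatasetRecord(rec["id"], rec["title"], self.name)


class TupleBackedRepo:
    """Stand-in for a repository that stores (title, identifier) tuples."""

    def __init__(self, name: str, rows: list[tuple[str, str]]) -> None:
        self.name = name
        self.rows = rows

    def search(self, term: str) -> Iterator[DatasetRecord]:
        for title, ident in self.rows:
            if term.lower() in title.lower():
                yield DatasetRecord(ident, title, self.name)


def federated_search(repos: list[RepositoryAdapter], term: str) -> list[DatasetRecord]:
    """Query every member repository and merge the results.

    A dataset exposed by more than one member (same identifier) is
    reported once, attributed to the first repository that returned it.
    """
    seen: set[str] = set()
    merged: list[DatasetRecord] = []
    for repo in repos:
        for rec in repo.search(term):
            if rec.identifier not in seen:
                seen.add(rec.identifier)
                merged.append(rec)
    return merged
```

The user sees one result list and never needs to know which member holds a dataset, which is the point made above about location being irrelevant to the researcher. The identifiers used when trying this out (for example `ex:x1`) are made up for illustration.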

There is also a recognised need for linking research publications in institutional repositories with the primary scientific datasets on which the research is based, which will in general be held in separate repositories such as the ones described here. A number of projects – for example StORe and CLADDIER – have been investigating the potential for such linkage.

Repositories of workflows

There is a need to curate scientific processes as well as the data. myExperiment uses digital repository systems to curate and share scientific workflows as part of a social networking environment in which researchers can annotate, discuss and execute these workflows. Initially developed as a community resource for Taverna users, the scope of myExperiment has expanded to incorporate users of other workflow systems.

myExperiment is more than a website for sharing these objects – it directly addresses many of the issues around publication in more traditional digital repositories, including object versioning and, importantly for a repository of published artefacts, author attribution and credit, making use of the trust models implicit in the Web 2.0 world. A workflow is as much an intellectual product of research as a traditional publication. myExperiment is currently being enhanced with further JISC funding to support additional content types and to provide a richer range of functionality, such as user-defined vocabularies for more effective tagging and discovery, and a mechanism for better expressing relationships between items in the repository, which can be used, for example, for provenance capture.
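The combination of versioning, attribution and typed relationships can be illustrated with a small in-memory sketch. This is not myExperiment's data model: `WorkflowRepository`, the `derivedFrom` relation name and the rule that the first depositor remains the credited author are all assumptions chosen for illustration. Relationships are stored as (subject, relation, object) triples, and following `derivedFrom` links recovers a simple provenance chain.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """A repository item (e.g. a workflow) with its credited author and versions."""
    item_id: str
    author: str
    versions: list[str] = field(default_factory=list)


class WorkflowRepository:
    """Hypothetical in-memory repository: versioning, attribution, typed links."""

    def __init__(self) -> None:
        self.items: dict[str, Item] = {}
        # Relationships stored as (subject, relation, object) triples.
        self.relations: list[tuple[str, str, str]] = []

    def deposit(self, item_id: str, author: str, content: str) -> int:
        """Store a new version; the first depositor stays the credited author."""
        if item_id not in self.items:
            self.items[item_id] = Item(item_id, author)
        item = self.items[item_id]
        item.versions.append(content)
        return len(item.versions)  # version numbers start at 1

    def relate(self, subject: str, relation: str, obj: str) -> None:
        """Record a typed relationship between two items already deposited."""
        if subject not in self.items or obj not in self.items:
            raise KeyError("both items must already be in the repository")
        self.relations.append((subject, relation, obj))

    def provenance(self, item_id: str) -> list[str]:
        """Follow 'derivedFrom' links back towards the original item."""
        chain = [item_id]
        while True:
            parents = [o for s, r, o in self.relations
                       if s == chain[-1] and r == "derivedFrom"]
            # Stop at an item with no recorded parent, or if a cycle appears.
            if not parents or parents[0] in chain:
                return chain
            chain.append(parents[0])
```

Keeping every version, rather than overwriting, is what allows a published workflow to be cited stably while its author continues to improve it.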

Repositories within workflows

The RepoMMan project developed an environment that allowed users to interact with a repository as part of their natural workflow. The repository is not just viewed as a stand-alone “silo” of information into which researchers explicitly deposit completed research outputs (and then perhaps forget about them), but is rather a tool that supports researchers by providing services for managing digital content throughout their research activities, from initial conception and experimental work through to archiving and publication. This was implemented by using a workflow engine to orchestrate web services supporting common repository-related tasks.
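The orchestration idea can be sketched as a minimal workflow engine that runs a sequence of steps, each standing in for a call to one repository web service. This is an illustration only: the step names (`create_object`, `attach_metadata`, `archive`, `publish`), the status values and the rule that only archived objects may be published are hypothetical, not RepoMMan's actual services, and real steps would make network calls rather than transform a dictionary.

```python
from typing import Callable

State = dict
Step = Callable[[State], State]


# Each function stands in for a call to one repository web service.
def create_object(state: State) -> State:
    slug = state["title"].lower().replace(" ", "-")
    return {**state, "object_id": f"obj-{slug}", "status": "draft"}


def attach_metadata(state: State) -> State:
    meta = {"title": state["title"], "creator": state["creator"]}
    return {**state, "metadata": meta}


def archive(state: State) -> State:
    return {**state, "status": "archived"}


def publish(state: State) -> State:
    if state.get("status") != "archived":
        raise ValueError("only archived objects can be published")
    return {**state, "status": "published"}


def run_workflow(steps: list[Step], state: State) -> tuple[State, list[str]]:
    """Minimal engine: run steps in order, logging each step and resulting status."""
    log = []
    for step in steps:
        state = step(state)
        log.append(f"{step.__name__}: {state['status']}")
    return state, log
```

Because each step only receives and returns the object's state, the same services can be recombined into different workflows for different stages of a project, from early drafts through to archiving and publication.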

Although RepoMMan was developed in the context of creating and managing documents rather than complex material and research data, this restriction is not essential. Indeed, this pioneering work is being continued in other current projects, notably Hydra, which is developing a Scholar’s Workbench with a repository at its core, and CLIF, which is investigating the repository as an embedded part of an environment supporting the lifecycle of digital content in various forms.

Repositories as virtual research environments

The term Virtual Research Environment (VRE) covers a range of different systems, and this is not the place for a definitive definition. However, as virtually all research deals with data in some form, a repository is likely to be an important component of any such environment.

An example is provided by the eSciDoc system, one of the aims of which was to provide a digital library/archive ensuring permanent access to the variety of research outputs of the Max Planck Society (MPG). For present purposes, however, its use as a generic framework for building virtual research environments for specific research communities within the MPG is more interesting.

eSciDoc provides a generic infrastructure and a rich set of services, built around a digital repository, to support the entire lifecycle of a research project, including visualisation, manipulation, processing and publication of the various information objects used by researchers. This framework is used to develop “solutions”, which are researcher-centric applications for supporting particular research communities; a number of these have been developed by the library service of the MPG.

eSciDoc is a system with very rich functionality and potential; as a consequence, developing these solutions for specific communities requires a significant amount of programming effort. At the other end of the spectrum, the Islandora project is developing a VRE framework based around the Fedora repository system, which, like eSciDoc, allows distinct environments to be created for individual research groups around a common repository infrastructure. It is, however, a much more lightweight solution, developed as plug-ins for an open source content management system, and allows new environments to be created in a rapid and agile fashion.

A final example illustrates the potential synergies between repositories and wider e-infrastructure technologies, specifically the gLite grid middleware. The gCube system allows the ad hoc creation of data-centric virtual research environments that are tailored to the needs of specific resource groups and built on top of gLite-based grid infrastructures such as EGEE (see Steve Newhouse’s chapter). These environments provide virtual repositories that allow pre-existing data resources, diverse in terms of formats and metadata standards, to be combined, manipulated and annotated.

Summary: the value of repositories in research

We have seen how repositories have evolved greatly from their beginnings as a “home” for research papers, both in the material that they hold and in the uses to which they are put, and our examples are far from exhaustive. Crucially, repositories are being integrated with broader processes and infrastructures, not only within research groups and institutions but globally. In particular, the architecture of the web is being exploited to facilitate the exposure and re-use of digital material in repositories. This movement towards a more joined-up world may be expected to continue, making use of approaches such as Linked Data, which aims to use web technologies, and in particular Semantic Web technologies, to forge connections between related data, leading to a vision of a digital ecosystem in which repositories form a key component.
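The way Linked Data forges connections across repositories can be shown with plain (subject, predicate, object) triples. In this sketch two hypothetical repositories each publish their own statements: one describes a paper and records that it references a dataset, the other describes that dataset. The predicate URIs are Dublin Core terms (`dcterms:title`, `dcterms:references`); the `example.org` resource URIs are made up, and a real deployment would use an RDF serialisation and toolkit rather than Python sets. Because both repositories use the same URI for the dataset, simply merging the two graphs links the paper to the data behind it.

```python
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical resource URIs in two independent repositories.
PAPER = "https://repo-a.example.org/paper/42"
DATASET = "https://repo-b.example.org/dataset/7"

Graph = set[tuple[str, str, str]]

# Triples published independently by each repository.
repo_a_graph: Graph = {
    (PAPER, DCTERMS + "title", "A study of compound X"),
    (PAPER, DCTERMS + "references", DATASET),
}
repo_b_graph: Graph = {
    (DATASET, DCTERMS + "title", "Diffraction data for compound X"),
}


def objects(graph: Graph, subject: str, predicate: str) -> list[str]:
    """All objects of triples matching (subject, predicate, ?), sorted."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def datasets_behind(graph: Graph, paper: str) -> dict[str, list[str]]:
    """Map each resource the paper references to its known title(s)."""
    return {
        ref: objects(graph, ref, DCTERMS + "title")
        for ref in objects(graph, paper, DCTERMS + "references")
    }


# Shared URIs are what join the two graphs once they are merged.
merged = repo_a_graph | repo_b_graph
```

Queried against repository A alone, the paper's reference resolves to a bare URI with no description; against the merged graph it also yields the dataset's title from repository B.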

Source:  OpenStax, Research in a connected world. OpenStax CNX. Nov 22, 2009 Download for free at http://cnx.org/content/col10677/1.12