Meeting Report

The data needed to build taxonomic revisions on the web

WP6 Workshop: 26-27 April 2007, The Natural History Museum, London, UK

Introduction

About 40 interested people from all over Europe attended the first EDIT-WP6 workshop to explore commonalities in content structure. For two days EDIT members met with colleagues from related projects and with representatives from the user community. A list of data needed to build taxonomic web revisions and a list of key conclusions regarding challenges for implementing web taxonomies were successfully assembled.

Exemplar groups

Dave Roberts gave an overview of WP6, emphasising the diversity of data that needs to be stored, and presented a model of the data warehouse. Presentations followed on the palms exemplar group by Soraya Villalba, the Lepidoptera/Diptera exemplar groups by Irina Brake and the Compositae exemplar group by Eckhard von Raab-Straube. The exemplar groups differ widely in the amount, type and format of data currently available, but aim to provide very similar content, with the main differences being taxon-specific.

Related projects

Malcolm Scoble presented the CATE project, which aims to create a web-based consensus taxonomy of two major taxa (Sphingidae, Araceae) that can be updated continuously through a peer-review and editing system. Other projects presented included Fishbase (Gert Boden) and the Solanaceae Source project (Lisa Walley). Fishbase provided a contrast to EDIT, as it is not based on promoting collaborative taxonomy; even so, the kinds of content and tools it provides are highly relevant to EDIT’s exemplar groups. The Solanaceae project and the exemplar groups also show a high overlap in content structure and presentation.

Relevant technology

David Remsen presented the GBIF perspective (ECAT) and focused on the process of compiling, normalising and importing checklists. Vince Smith talked about community editing and presented Scratchpads as a way to foster the development of communities that create and share data about biological organisms on the web. Markus Döring (WP5) presented EDIT’s Internet Platform for Cybertaxonomy, which is to consist of interoperable, independent platform components such as applications and services. These will be assembled, wherever possible, from existing software made interoperable through interfaces and generic data standards. He emphasised the need for a common data model across all services.

The user community

The second day focused on the user community. Christos Arvanitidis presented the PROPE-taxon Responsive Mode Project, which is very similar to EDIT in that one of its main goals is to promote communication and collaboration in taxonomic research. Users of PROPE are most interested in keys and literature, followed by species lists and classification, and tools. Charles Godfray talked about the relationship between ecology and taxonomy. He emphasised the need for taxonomy to link the diverse information about a taxon and to present both a plurality of scientific views and an expert consensus for the general public.

Rod Page talked about mashups, where data from multiple sources are machine-aggregated in one place. He emphasised the need for consistent identifiers (GUIDs), simple APIs (application programming interfaces), meaningful rewards for contributors (e.g. publications) and a cultural change with regard to sharing data.
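
The aggregation step of a mashup can be sketched as follows. This is a minimal illustration only: it assumes that every source returns records carrying a shared identifier under a `guid` key, and the function name, field names and identifier values are hypothetical and were not part of the workshop.

```python
from collections import defaultdict


def aggregate_by_guid(*sources):
    """Merge records from several sources into one entry per GUID.

    Each source is an iterable of dicts with a 'guid' key (an assumption
    of this sketch). Values are never overwritten: every field keeps a
    list of (source_index, value) pairs, so differing views stay visible.
    """
    merged = defaultdict(lambda: defaultdict(list))
    for index, records in enumerate(sources):
        for record in records:
            guid = record["guid"]
            for field_name, value in record.items():
                if field_name != "guid":
                    merged[guid][field_name].append((index, value))
    # Convert the nested defaultdicts into plain dicts for the caller.
    return {guid: dict(fields) for guid, fields in merged.items()}


# Hypothetical records from two independent providers.
names = [{"guid": "urn:example:taxon:1", "name": "Aus bus"}]
images = [
    {"guid": "urn:example:taxon:1", "image": "aus_bus.jpg"},
    {"guid": "urn:example:taxon:2", "image": "cus_dus.jpg"},
]

combined = aggregate_by_guid(names, images)
```

The sketch only works because both providers use the same identifier for the same taxon, which is the point of asking for consistent GUIDs.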

The presentation by Klaus Riede (WP7) on the requirements of ATBI+M included the need for geo-referenced lists of primary occurrence data with voucher specimens, well-documented observations, images (geo-referenced photos) and sound data.

Alessandro Minelli focussed on community building and David Agassiz on the amateur perspective. These presentations emphasised the importance of human factors in building successful and inclusive working groups of taxonomists to deliver web resources.

Conclusions and Key Issues

The workshop closed with a final discussion. Overall the following key issues emerged from the workshop as challenges for web taxonomy:

  1. There is high overlap among the exemplar groups in the main content elements they are aiming to provide in their web portals. The structure of the database, however, should be flexible and allow for creation of content fields as they are needed.
  2. The exemplar groups differ widely in the amount and format of the initially available content. A mechanism is needed to translate different data sources into a format that allows for easy import of data into the database, including bulk data import (data parsing, etc.). TDWG standards should be implemented.
  3. There seems to be general agreement among the attendees that, on the one hand, it is desirable to provide a consensus taxonomy (a preferred “expert” view) for the benefit of non-specialist users seeking information about a taxon. On the other hand, scientists and taxonomic experts also need to access and work with different taxonomic concepts of equal standing, which could be confusing for other users.
  4. For quality control, online peer review by an editorial board is generally favoured over a community-based system.
  5. Sharing a common data structure is essential to develop the cyberplatform (EDIT WP5), which will include independent interoperable components that work either online or offline. The web portals are a component of the cyberplatform.
  6. Connectivity with other online sites and data providers is a key element in the web portals. We need mechanisms to extract and feed data into global and specialised websites, as well as a system for automatic notification of changes to the interested parties.
  7. There are incentives to encourage taxonomists to use the web: provision of easy tools, rapid dissemination and better visibility of information, and better communication with colleagues.
  8. It is essential to have a mechanism for crediting contributions to the taxonomic web portals in a way that translates into traditional measures of impact.
  9. Some taxonomists with key knowledge will not use the web. They can be engaged by offering manuscript and publication mark-up and extraction techniques for information in MS Word and other common formats. The aim is to make the technology fit the taxonomists rather than vice-versa.
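
Issues 1 and 2 above can be illustrated with a minimal sketch of an import step. It assumes, purely for illustration, that each data source is described by a mapping from its own column names to a shared set of field names. The source names, columns and field names are hypothetical; they are not TDWG standards and not the EDIT data model.

```python
# Hypothetical per-source mappings: source column -> shared field name.
FIELD_MAPS = {
    "palms_spreadsheet": {"Species": "taxon_name", "Author": "authority"},
    "diptera_database": {"name": "taxon_name", "auth": "authority"},
}


def to_common_format(source, row):
    """Translate one source row into the shared format.

    Columns without a mapping are kept under 'extra' rather than dropped,
    so new content fields can be introduced later without losing data.
    """
    mapping = FIELD_MAPS[source]
    record = {"source": source, "extra": {}}
    for column, value in row.items():
        if column in mapping:
            record[mapping[column]] = value
        else:
            record["extra"][column] = value
    return record


def bulk_import(source, rows):
    """Translate every row of a source in one pass."""
    return [to_common_format(source, row) for row in rows]


# Invented example row; 'Habitat' has no mapping and ends up in 'extra'.
rows = [{"Species": "Aus bus", "Author": "Smith, 1900", "Habitat": "rainforest"}]
imported = bulk_import("palms_spreadsheet", rows)
```

Keeping unmapped columns in an open-ended `extra` field is one possible reading of the call for a flexible structure; a real implementation would map onto an agreed standard instead.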

Preliminary list of content for data warehouse

  • Taxonomic and nomenclatural data
    • Taxon names
    • Parental relationships
    • Synonymies
    • Name relationships
    • Authorities
    • Bibliographic data
    • Information source
  • Common names
  • Type information
    • Type status, images, references
  • Protologues
  • Descriptions
    • Character state data
  • Keys
  • Locality data
    • Distribution
    • Habitat
    • Stratigraphy
  • Specimen data
  • Images
  • Biology & ecology
    • Conservation status
    • Uses
  • Phylogenies with associated metadata
  • Genetics
  • Molecular data
  • Barcodes
  • Glossary
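
A few of the elements above can be pictured as a simple data structure, sketched below. This is not the data warehouse model presented at the workshop: the class, field and function names are assumptions chosen for illustration, and only a handful of the listed elements are covered.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TaxonRecord:
    """Hypothetical record holding a subset of the listed content."""
    name: str                                   # taxon name
    authority: str = ""                         # authority for the name
    parent: Optional[str] = None                # parental relationship
    synonyms: List[str] = field(default_factory=list)
    common_names: List[str] = field(default_factory=list)
    other: Dict[str, str] = field(default_factory=dict)  # open-ended fields


def lineage(records, name):
    """Follow parental relationships from a taxon up to the root.

    'records' maps taxon names to TaxonRecord objects. The 'seen' set
    stops the walk if the data contain a cyclic parent link.
    """
    path = []
    seen = set()
    while name is not None and name not in seen:
        seen.add(name)
        path.append(name)
        record = records.get(name)
        name = record.parent if record else None
    return path


# Invented placeholder taxa.
records = {
    "Aus": TaxonRecord(name="Aus"),
    "Aus bus": TaxonRecord(
        name="Aus bus",
        parent="Aus",
        synonyms=["Aus cus"],
        other={"conservation_status": "not evaluated"},
    ),
}
path = lineage(records, "Aus bus")
```

Elements such as images, keys or molecular data would need their own structures and links to the taxon record; they are left out here to keep the sketch short.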

List of participants

David Agassiz, The Natural History Museum, London
Christos Arvanitidis, Hellenic Centre for Marine Research
Bill Baker, Royal Botanic Gardens, Kew
Vladimir Blagoderov, The Natural History Museum, London
Gert Boden, Royal Museum for Central Africa
Finn Borchsenius, University of Aarhus
Irina Brake, The Natural History Museum, London
Isabel Calabuig, University of Copenhagen
Ben Clark, University of Oxford
Markus Döring, Botanic Garden and Botanical Museum Berlin-Dahlem
Daphne Duin, Muséum National d’Histoire Naturelle
Henrik Enghoff, University of Copenhagen
Charles Godfray, University of Oxford
András Gubányi, Hungarian Natural History Museum
Anna Haigh, Royal Botanic Gardens, Kew
Kenan Harman, Royal Botanic Gardens, Kew
Charles Hussey, The Natural History Museum, London
Mark Jackson, Royal Botanic Gardens, Kew
Norbert Kilian, Botanic Garden and Botanical Museum Berlin-Dahlem
Paul M. Kirk, CABI UK
Niels P. Kristensen, Natural History Museum of Denmark, Copenhagen (SNM)
Chris Lyal, The Natural History Museum, London
Simon Mayo, Royal Botanic Gardens, Kew
Alessandro Minelli, University of Padova
Rod Page, University of Glasgow
Andrew Polaszek, International Commission on Zoological Nomenclature
Eckhard von Raab-Straube, Botanic Garden and Botanical Museum Berlin-Dahlem
David Remsen, Global Biodiversity Information Facility - GBIF, Copenhagen
Dave Roberts, The Natural History Museum, London
Simon Rycroft, The Natural History Museum, London
Malcolm Scoble, The Natural History Museum, London
Ole Seberg, University of Copenhagen
Vince Smith, The Natural History Museum, London
Piet Stoffelen, National Botanic Garden of Belgium
David Taylor, Royal Botanic Gardens, Kew
Soraya Villalba, Royal Botanic Gardens, Kew
Lisa Walley, The Natural History Museum, London
Piotr Wegrzynowicz, Museum and Institute of Zoology – Polish Academy of Sciences
Julius Welby, The Natural History Museum, London
Geoff Woodward, Royal Botanic Gardens, Kew