The capture and encoding of bibliographic reference information

Content added via parsing of existing documents

The data import system would ideally be able to recognise, parse and annotate bibliographic references within input documents, probably for later extraction by the analyser and addition to the data warehouse.

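In TaxonX, bibliographic references are marked up using the <bibref> element, without atomisation (that is, without splitting the reference into separate author, year, title and journal fields):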

<tax:bibref>
Donisthorpe, H. St. J. K. 1946. The ants of Mauritius. Ann. Mag. Natur. Hist. (11) 13: 25 - 35.
</tax:bibref>

In its current state of development, GoldenGATE (GG) would contribute to the bibliographic markup process only by annotating the references, either as a document section or individually, for later processing.
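As a rough illustration of annotating references individually, the following Python sketch wraps a raw reference string in a <tax:bibref> element without atomising it. It is not GoldenGATE code: the function name annotate_reference is hypothetical, and the namespace URI is a placeholder chosen for the example, not the real TaxonX namespace.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: placeholder namespace URI for illustration only;
# substitute the namespace declared by the actual TaxonX schema.
TAX_NS = "http://example.org/taxonx"
ET.register_namespace("tax", TAX_NS)


def annotate_reference(text):
    """Wrap one reference string in a <tax:bibref> element (hypothetical helper)."""
    elem = ET.Element(f"{{{TAX_NS}}}bibref")
    # Collapse line breaks and repeated spaces left over from the source document.
    elem.text = " ".join(text.split())
    return elem


raw = """Donisthorpe, H. St. J. K. 1946. The ants of Mauritius.
Ann. Mag. Natur. Hist. (11) 13: 25 - 35."""
bibref = annotate_reference(raw)
print(ET.tostring(bibref, encoding="unicode"))
```

The reference text is stored whole, so any later atomisation step still has the original string to work from.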

Future development

Tony Catapano intends to extend the TaxonX schema by adding sub-elements to the <bibref> content model. This may be accomplished by adding MODS or Dublin Core (in which all elements are global) to <bibref>.

Presumably, once these additional elements have been added to the schema, development work will follow to support the automated parsing and annotation of bibliographic references within GoldenGATE, using built-in functionality, a third-party library or a web service to populate the additional elements.
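To give a feel for what atomising a reference involves, here is a minimal Python sketch that splits the Donisthorpe example into fields with a single regular expression. It is an assumption-laden illustration, not the ParaTools algorithm or any planned GoldenGATE feature: the function name parse_reference and the field names are hypothetical, and the pattern only handles references laid out as "Authors Year. Title. Journal (series) volume: first - last."

```python
import re

# ASSUMPTION: this pattern covers only one simple journal-article layout.
# Real reference lists vary far more and need a dedicated parser.
REFERENCE_PATTERN = re.compile(
    r"""
    ^(?P<authors>.+?)\s+            # everything before the year
    (?P<year>\d{4})\.\s+            # four-digit year followed by a full stop
    (?P<title>.+?)\.\s+             # title, up to the first full stop plus space
    (?P<journal>.+?)\s+             # abbreviated journal name
    (?:\((?P<series>\d+)\)\s+)?     # optional series number in parentheses
    (?P<volume>\d+)\s*:\s*          # volume number
    (?P<first_page>\d+)\s*-\s*      # first page
    (?P<last_page>\d+)\.?$          # last page and optional final full stop
    """,
    re.VERBOSE,
)


def parse_reference(text):
    """Return a dict of fields, or None if the layout is not recognised."""
    match = REFERENCE_PATTERN.match(text.strip())
    return match.groupdict() if match else None


fields = parse_reference(
    "Donisthorpe, H. St. J. K. 1946. The ants of Mauritius. "
    "Ann. Mag. Natur. Hist. (11) 13: 25 - 35."
)
for name, value in fields.items():
    print(f"{name}: {value}")
```

Returning None for unrecognised layouts lets a caller fall back to the unatomised <bibref> annotation rather than storing wrongly split fields.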

Available tools

There is an existing Perl library for reference parsing, ParaTools, which Rod Page has used in his demonstration of a prototype reference parsing and DOI retrieval service.

See also: Automatic extraction of references from a paper and ParaCite.

TaxonX reference mark-up

Discussions relating to TaxonX and the treatment of bibliographic content can be found here.
