Revision of The capture and encoding of bibliographic reference information from Wed, 2007-05-30 15:31

Bibliographic content

The Web Revisions system as a whole will need to be able to handle bibliographic references.

Content added via the main data input process

The requirements for this are still to be confirmed.

Content added via parsing of existing documents

The data import system would ideally be able to recognise, parse and annotate bibliographic references within input documents, probably for later extraction by the analyser and addition to the data warehouse.
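
As a rough illustration of the "recognise and parse" step, the sketch below matches a single, assumed citation layout ("Authors (Year) Title. Journal Volume: Pages.") with a regular expression and returns the parts as named fields. The pattern, the function name parse_reference and the sample citation are all invented for illustration; real reference lists vary far more than this, and this is not how ParaTools or GoldenGATE work internally.

```python
import re

# Hypothetical pattern covering ONE assumed citation layout:
#   Authors (Year) Title. Journal Volume: FirstPage-LastPage.
REFERENCE_PATTERN = re.compile(
    r"^(?P<authors>.+?)\s+\((?P<year>\d{4})[a-z]?\)\s+"
    r"(?P<title>.+?)\.\s+"
    r"(?P<journal>.+?)\s+(?P<volume>\d+)\s*:\s*"
    r"(?P<pages>\d+\s*-\s*\d+)\.?$"
)


def parse_reference(line):
    """Return a dict of citation fields, or None if the line does not
    look like a reference in the assumed layout."""
    match = REFERENCE_PATTERN.match(line.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    # Made-up citation, used only to exercise the pattern.
    sample = ("Smith, J. & Jones, A. (2005) A revision of the genus Example. "
              "Journal of Examples 12: 34-56.")
    print(parse_reference(sample))
    print(parse_reference("This is an ordinary sentence."))
```

Lines that do not fit the assumed layout simply return None, so a caller could pass every line of a document through the function and keep only the hits for later extraction.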

It seems wise to look for existing components that could carry out this step, either as a pre- or post-GoldenGATE stage in the data flow.

The role of GoldenGATE (GG) in the bibliographic mark-up process may simply be to annotate the references, either as a single document section or individually, for later processing.
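
The two annotation granularities mentioned above can be sketched as follows. The element names referenceSection and reference are placeholders chosen for this example only; they are not taken from TaxonX or from GoldenGATE's own annotation vocabulary.

```python
from xml.sax.saxutils import escape


def annotate_section(lines):
    """Wrap the whole reference list in one section-level element
    (hypothetical element name)."""
    body = "\n".join(escape(line) for line in lines)
    return "<referenceSection>\n" + body + "\n</referenceSection>"


def annotate_individually(lines):
    """Wrap each reference in its own element (hypothetical element name)."""
    return "\n".join(
        "<reference>" + escape(line) + "</reference>" for line in lines
    )


if __name__ == "__main__":
    # Made-up reference strings, for illustration only.
    refs = [
        "Smith, J. & Jones, A. (2005) A revision of the genus Example.",
        "Brown, B. (1999) Notes on Example species.",
    ]
    print(annotate_section(refs))
    print(annotate_individually(refs))
```

Section-level annotation is cheaper to produce but leaves the splitting of individual entries to a later stage; per-reference annotation does that work up front.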

It is conceivable that GG could call a third-party library to carry out the automatic detection, parsing and mark-up of bibliographic entries. I do not believe that this is a high priority for the developers at this time.

Available tools

There is an existing Perl library for reference parsing, ParaTools, which Rod Page has used in his demonstration of a prototype reference parsing and DOI retrieval service.

See also: Automatic extraction of references from a paper and ParaCite.

TaxonX reference mark-up

Discussions relating to TaxonX and the treatment of bibliographic content can be found here.
