Bibliographic content
The Web Revisions system in general would need to be able to handle bibliographic references.
The requirements for this are still to be confirmed.
The data import system would ideally be able to recognise, parse and annotate bibliographic references within input documents, probably for later extraction by the analyser and addition to the data warehouse.
It seems wise to look for existing components which could be used to carry out this step, either as a pre- or post-GoldenGATE stage in the import process. (Can external components be easily called from within GG?)
There is an existing Perl library for reference extraction; see Rod Page's demonstration of a reference parsing service.
(This function may instead be carried out by a module to be written in WP5.)
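As a rough illustration of the recognise, parse and annotate step described above, the following Python sketch handles one simple citation style with a regular expression. It is not the Perl library, GoldenGATE or any WP5 module. The pattern, the function names (parse_reference, annotate_reference) and the tag names (bibRef, authors, year and so on) are all assumptions made up for this example, and real references would need a far more robust parser.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical pattern covering a single citation layout only:
#   Authors (Year) Title. Journal Volume: Pages.
REFERENCE_PATTERN = re.compile(
    r"^(?P<authors>.+?)\s+\((?P<year>\d{4})[a-z]?\)\.?\s+"
    r"(?P<title>.+?)\.\s+"
    r"(?P<journal>.+?)\s+(?P<volume>\d+)\s*:\s*"
    r"(?P<pages>\d+\s*[-–]\s*\d+)\.?$"
)


def parse_reference(line):
    """Return a dict of fields if the line matches the pattern, else None."""
    match = REFERENCE_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def annotate_reference(line):
    """Wrap a recognised reference in illustrative tags.

    Lines that are not recognised are returned unchanged, so the
    function can be mapped over every line of an input document.
    """
    fields = parse_reference(line)
    if fields is None:
        return line
    # Tag names are placeholders, not TaxonX or GoldenGATE markup.
    parts = "".join(
        f"<{name}>{escape(value)}</{name}>" for name, value in fields.items()
    )
    return f"<bibRef>{parts}</bibRef>"


if __name__ == "__main__":
    # Invented citation, used only to exercise the sketch.
    sample = ("Smith, J. & Jones, K. (1999) A revision of the genus Aus. "
              "Journal of Examples 12: 34-56.")
    print(annotate_reference(sample))
```

Annotating in place, rather than extracting straight away, matches the idea that the analyser picks the references up later. An existing component would replace parse_reference while the annotation step stays the same.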
See also: Automatic extraction of references from a paper and ParaCite.
Discussions relating to TaxonX and the treatment of bibliographic content can be found here.