Specific usability issues have been noted as follows:
The following comments refer to the more complex functions that require user interaction after they have been invoked, such as the spell-checker and the FAT function.
Ideally, in terms of interaction, the GG user interface would behave more like familiar applications such as Word.
In particular, the use of modal windows for interaction within functions can cause problems. A modal window does not allow the user to interact with the file being worked on until input is finished and the dialogue window is closed. This makes it difficult or impossible to check the surrounding context when deciding how to treat the text. In some cases this makes other applications the more sensible choice for certain processes (e.g. spell checking).
One improvement would be to show more surrounding context wherever tokens are displayed in the dialogue window for checking or correction. (The Edit/Slide Annotations function, which displays each instance of a specified annotation type singly and with surrounding text, already uses this approach. I would like to see this extended to address the modality issue.)
Another possibility would be to remove the modal dialogue box completely and work within the main document, moving from example to example as changes are approved.
Another example of modality is the differing behaviour available to users depending on whether the mark-up of an element is shown or only its content is highlighted. In highlight mode, elements can be removed, but tokens cannot be added to or removed from an element, even where there is no ambiguity about which element is meant.
Adding elements to the text causes their element names to be added to the right-hand column, where their display can optionally be turned on. New users may see no visible effect from their actions unless they notice the new element names appearing in the list. This may be confusing, and may also prompt them to apply the same function more than once. I would suggest a user preference option that allows newly added tags to be shown.
Guido tells me that a user preference for this exists, which turns on highlighting of new elements when these are added.
On one occasion I saved a document to a file and later found that it contained only the text content, with no mark-up at all. This was probably caused by having the 'save as' output set to 'text'. If this preference persists during a session, perhaps the user should be warned of the impending loss of mark-up? This is arguably a case of user error, but a warning would be more user-friendly.
Update: Guido reports that this is less of an issue now, as files are saved with different file extensions depending on the output type selected. However, mistakenly saving as 'text' can still result in lost mark-up if the user then treats the saved-as file as containing the full content of the original. The same applies when saving only selected elements, so users need to be aware of this.
Importing a large file (~1 MB, 120 pages in Word) into GG takes 2 to 3 minutes, as does each subsequent screen redraw, which occurs after most edits and after any window resize. (This is on a PC with a 3.2 GHz Xeon processor and 1 GB RAM, a fast machine.)
From a user perspective, this is too slow to enable work on long documents (> approx. 200kB), unless a job can be largely or completely automated as a batch.
There may need to be an upper limit on the size of the input file, though specifying it may be problematic, as the processing overhead may be more directly related to the size and complexity of the document tree than to the absolute file size.
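To make the tree-complexity point concrete, a limit could be based on something like element count and nesting depth rather than bytes. The Java sketch below (Java because GG itself runs on it) measures both using the JDK's standard SAX parser. The class name TreeComplexity is hypothetical, and I have no information on which measure, if any, best predicts GG's processing time; any threshold would have to be established by testing.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: measures element count and maximum nesting depth of an XML document. */
public class TreeComplexity {

    /** Simple holder for the two measurements. */
    public static final class Stats {
        public int elementCount;
        public int maxDepth;
    }

    public static Stats measure(InputStream in) throws Exception {
        final Stats stats = new Stats();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        // SAX streams the document, so the measurement itself needs little memory.
        parser.parse(in, new DefaultHandler() {
            private int depth = 0;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                stats.elementCount++;
                depth++;
                stats.maxDepth = Math.max(stats.maxDepth, depth);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                depth--;
            }
        });
        return stats;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc><section><p>a</p><p>b</p></section><section/></doc>";
        Stats s = measure(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(s.elementCount + " elements, max depth " + s.maxDepth);
    }
}
```

A streaming parser is used so that checking a file against such a limit stays cheap even for the ~1 MB documents mentioned above.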
GG has the capability to handle files as 'parts', i.e. split them into multiple files for individual editing and later rejoining.
The user specifies an element to use as the unit for splitting, e.g. section. This goes a long way towards reducing the speed problem. However, a suitable set of elements must still exist in the file, in appropriate numbers and locations, for the document to be split sensibly. If they do not, they must be added, which may still mean editing the large file in GG unless some other application is used.
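For illustration, splitting of this kind can be sketched in Java with the JDK's DOM and identity-transform APIs. This is a hypothetical sketch (the names PartSplitter and splitByElement are mine), not GG's implementation of 'parts'. It emits one fragment per top-level occurrence of the chosen element and leaves nested occurrences inside their enclosing part. Content lying outside every occurrence of the chosen element is not captured, which mirrors the problem that suitable splitting elements must already exist in the file.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical illustration of splitting a document into parts at a chosen element. */
public class PartSplitter {

    public static List<String> splitByElement(String xml, String unitName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Transformer out = TransformerFactory.newInstance().newTransformer();
        // Omit the XML declaration so each part is a bare fragment.
        out.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        List<String> parts = new ArrayList<>();
        NodeList matches = doc.getElementsByTagName(unitName);
        for (int i = 0; i < matches.getLength(); i++) {
            Element unit = (Element) matches.item(i);
            if (hasAncestorNamed(unit, unitName)) {
                continue; // nested units stay inside their enclosing part
            }
            StringWriter buf = new StringWriter();
            out.transform(new DOMSource(unit), new StreamResult(buf));
            parts.add(buf.toString());
        }
        return parts;
    }

    private static boolean hasAncestorNamed(Node node, String name) {
        for (Node p = node.getParentNode(); p != null; p = p.getParentNode()) {
            if (p.getNodeType() == Node.ELEMENT_NODE && name.equals(p.getNodeName())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc><section>A<section>A.1</section></section><section>B</section></doc>";
        for (String part : splitByElement(xml, "section")) {
            System.out.println(part);
        }
    }
}
```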
I have extracted just the main section of the file (about 30 pages in the Word document) and I have exported this section as filtered HTML. This has reduced the 'refresh' lag to 10 seconds or so (on this fast PC), which is still not ideal, but is workable. It would probably be best to break the document down still further into species chunks. I will give this a go in GG, as it allows me to export to document 'parts'.
The use of predefined tools and pipelines reduces the impact of screen redraws for scriptable edits, as multiple edits are carried out automatically in a batch. Human driven edits are still affected, however.
Guido tells me that one option would be to render only the visible part of the document, though there is no immediate plan to implement this. Alternatively, a move to a faster GUI component might help (GG currently uses JTextPane), though whether a suitable replacement component is available is unclear. I will leave this issue to wiser heads than mine.
Increasing the Java heap size by editing the -Xms and -Xmx arguments in the GoldenGATE.bat did not resolve the problem. The process appears to be CPU limited, not memory limited.
For now, large documents are probably best worked on by saving them into several chunks in another application, and working on them individually. This may to some extent reduce the accuracy of the automatic detection and mark-up of taxon names using FAT.
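Files saved as XML appear to be ANSI encoded, and have no encoding declared. In the absence of an encoding declaration or byte-order mark, XML parsers assume UTF-8, so any non-ASCII characters in these files are likely to be misread or rejected by other XML tools.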
Opening UTF-8 encoded XML files in testing caused certain characters to display incorrectly. It is unclear whether this is just a display issue, or whether characters are also handled incorrectly at other levels of GG processing. This is a particular issue for texts using non-Latin characters, and for author and place names containing non-ANSI characters.
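For comparison, the following hypothetical Java sketch shows the general remedy on the saving side: write the file with an explicit UTF-8 charset rather than the platform default, and declare that same encoding in the XML declaration. Utf8XmlSaver is an invented name, and I do not know how GG's save routine is structured internally.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical example: save XML as UTF-8 and declare that encoding explicitly. */
public class Utf8XmlSaver {

    public static void save(Path target, String xmlBody) throws IOException {
        // An explicit charset avoids the platform default (e.g. an "ANSI" code page on Windows).
        try (Writer w = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            // The declaration must match the bytes actually written.
            w.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
            w.write(xmlBody);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("gg-test", ".xml");
        // Unicode escapes keep this source file independent of the compiler's encoding.
        save(tmp, "<author>M\u00fcller, S\u00e3o Paulo</author>");
        String back = new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8);
        System.out.println(back);
        Files.delete(tmp);
    }
}
```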
Attribute Taxon Names function
The Find Taxon Names tool in GG may miss some names, which must then be annotated manually. Rather than requiring the user to add the required attributes to these taxonName elements by hand, a time-consuming and repetitive process, a function has been provided that analyses the content annotated with <taxonName> tags and attempts to populate the element attributes automatically.
The user is presented with a UI panel listing the content of all of the taxonName elements, and the results of parsing these into attribute values.
This parsing process appears to be significantly less reliable than the results of the FAT tool. For example, FAT usually handles genus name expansion in taxonName attributes properly, so a name string like S. earlei in the Simuliidae file will be marked up as follows:
<taxonomicName _evidence="knownData" genus="Simulium" genus.bestMatchDistance="86" genus.bestMatchVote="2" genus.innerRound="1" genus.outerRound="1" rank="species" species="earlei">
S. earlei
</taxonomicName>
Note the expanded genus attribute.
In testing, the Attribute Taxon Names function instead offered a genus value of 'S', which had to be corrected using a drop-down selection, once for each element needing correction. This is time-consuming and prone to error.
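One possible heuristic for this genus expansion is to take the nearest preceding full genus name in the document that begins with the abbreviated letters. The Java sketch below assumes such a list of preceding genera is available (for instance, from names already marked up). It is hypothetical and much simpler than whatever FAT does, and it will pick the wrong genus when two genera sharing an initial are interleaved in the text.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical heuristic: expand an abbreviated genus such as "S." from genera seen earlier in the document. */
public class GenusExpander {

    /**
     * @param abbreviated     genus token as written, e.g. "S." or "Pr."
     * @param precedingGenera full genus names found earlier in the document, in document order
     * @return the nearest preceding genus starting with the abbreviated letters, if any
     */
    public static Optional<String> expand(String abbreviated, List<String> precedingGenera) {
        String prefix = abbreviated.endsWith(".")
                ? abbreviated.substring(0, abbreviated.length() - 1)
                : abbreviated;
        if (prefix.isEmpty()) {
            return Optional.empty();
        }
        // Walk backwards: the nearest earlier genus is the likeliest referent.
        for (int i = precedingGenera.size() - 1; i >= 0; i--) {
            String genus = precedingGenera.get(i);
            if (genus.length() > prefix.length() && genus.startsWith(prefix)) {
                return Optional.of(genus);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> seen = List.of("Prosimulium", "Simulium");
        System.out.println(expand("S.", seen));
    }
}
```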
There is an option in the UI to select any of the displayed elements for tag removal, but there is no option to ignore an element. Removing the tags may be the best way to work, as FAT can then be rerun. This combination works, but it requires repeating the post-FAT manual checking stage that has already been done, and it introduces opportunities for error.
My suggestions would be:
In the section of the GG manual "Workflow to generate a valid TaxonX XML document up to Level 1" (p. 7), the Get LSIDs for Taxa custom function is used.
For reference: the TaxonX discussion regarding the handling of page numbers (see the Pages/Page Breaks section) says "Page numbers are part of the minimal information requested to stay with the treatments in traditional publications." A page break element of the form <pb n="373" url="http://foo/bar.html"/> has been added to the TaxonX schema.
Should such an element be required as part of the Web Revisions data, this will need to be borne in mind, particularly as the input document will not necessarily be a scanned/OCRed version of the document as published. In that case, if the published page information is needed, a means of adding the <pb/> tags with their associated attributes will be required.
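A minimal sketch of such a means, assuming the published page-start positions are already known as character offsets into the text (obtaining them is the hard part and is not addressed here). PageBreakInserter is a hypothetical name, and the url attribute from the schema example is omitted for brevity.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical helper: insert TaxonX-style <pb n="..."/> milestones at known character offsets. */
public class PageBreakInserter {

    /**
     * @param text       document text (assumed to be already escaped where necessary)
     * @param pageStarts map from character offset to the published page number starting there
     */
    public static String insertPageBreaks(String text, NavigableMap<Integer, String> pageStarts) {
        StringBuilder sb = new StringBuilder(text);
        // Insert from the last offset backwards so earlier offsets are not shifted by our insertions.
        for (Map.Entry<Integer, String> e : pageStarts.descendingMap().entrySet()) {
            int offset = e.getKey();
            if (offset < 0 || offset > text.length()) {
                throw new IllegalArgumentException("offset out of range: " + offset);
            }
            sb.insert(offset, "<pb n=\"" + e.getValue() + "\"/>");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        NavigableMap<Integer, String> pages = new TreeMap<>();
        pages.put(0, "372");
        pages.put(11, "373");
        System.out.println(insertPageBreaks("first page second page", pages));
    }
}
```

Working backwards through the offsets is the design choice that matters here: each insertion lengthens the text, so inserting front-to-back would invalidate every later offset.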
Moving the file to my Mac allowed me to run the spellcheck. The UI is awkward: it is modal, so you are either in 'spellcheck mode' or not, with no ability to edit the file manually or scroll through it to check something. There is also no highlighting of the text being corrected, making it difficult to be sure exactly which instance of a word is being corrected at any particular time, or even to find it at all.
I gave up on GG's spellchecker very quickly. Until this aspect of the UI is improved, it is probably worth carrying out spell checking in an external text editor and re-importing the corrected file.