This can be overridden by custom harvesters that want to support non-ISO
formats (like FGDC). It is called when the original_document and
original_format harvest object extras are present. Custom harvesters are
responsible for transforming the original document to ISO.
Prior to the merging of the new spatial harvesters, the existing ones
based on Gemini and UKLP have been moved to their own namespace
(ckanext.spatial.harvesters.gemini). The plugin points have been updated
so users currently using these harvesters will still be able to use them
as normal.
The base harvester (SpatialHarvester) has been updated with new methods,
most significantly '_get_package_dict' and 'import_stage'. Note that
SpatialHarvester now extends HarvesterBase from ckanext-harvest, which had
some of its methods updated.
TODO: still some geo.data.gov specific bits!
Changes in multiplicity to support the ISO 19115 spec rather than just
the Gemini 2 one. Thanks to @dread for his help on this.
Summary of the changes:
* dataset-reference-date: Set to 1..*
Note that there was a bug with multiple values allowed per date.
Returned object should now be like:
"dataset-reference-date": [{"type": "creation", "value": "2004-02"},
{"type": "revision", "value": "2006-07-03"}]
* metadata-language: Set to 0..1
* resource-type: Set to *. That means that a list is now returned
* bbox: Set to *. Note that bboxes are now returned as objects such as:
[{"north":xxx, "south":xxx, "east":xxx, "west":xxx}, {"north":xxx,
"south":xxx, "east":xxx, "west":xxx}]
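As a rough illustration of how calling code might cope with the new multiplicities, the sketch below reads the list-shaped values described in the summary. The `iso_values` dict, its sample values, and the helper names are hypothetical and only mirror the shapes shown above; they are not taken from the harvester code.

```python
# Hypothetical parsed values, shaped like the examples in the summary.
# The concrete dates, resource type and coordinates are made up.
iso_values = {
    "dataset-reference-date": [
        {"type": "creation", "value": "2004-02"},
        {"type": "revision", "value": "2006-07-03"},
    ],
    "resource-type": ["dataset"],
    "bbox": [
        {"north": "61.0", "south": "49.9", "east": "1.8", "west": "-8.2"},
    ],
}


def first_date_of_type(values, date_type):
    """Return the first reference date of the given type, or None."""
    for date in values.get("dataset-reference-date", []):
        if date.get("type") == date_type:
            return date.get("value")
    return None


def first_bbox(values):
    """Return the first bounding box with float coordinates, or None."""
    boxes = values.get("bbox", [])
    if not boxes:
        return None
    keys = ("north", "south", "east", "west")
    return {key: float(boxes[0][key]) for key in keys}
```

Code that previously expected a single value for these fields would need a similar "pick one or iterate" step.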
The existing Gemini based harvesters and validators have been adapted,
and all tests pass.
Add new mandatory fields when creating sources, status dict has new
keys, CKAN lowercases formats, take into account harvest source
datasets.
Added a local getcapabilities response to avoid remote 404s.
Note that the TestValidation tests need to be fixed, as 27c4ee81e
removed the validation from the gather stage.
- Added a config option ('ckanext.spatial.use_postgis_sorting') to
activate this as this behaviour will be deprecated in the future
in favour of Solr 4 spatial sorting capabilities.
Also fixed the tests
Conflicts:
ckanext/spatial/plugin.py
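A minimal sketch of how the 'ckanext.spatial.use_postgis_sorting' option mentioned above could be read from a config mapping. The helper name, the accepted truthy strings, and the default of "off" are assumptions made for illustration, not the plugin's actual code.

```python
def use_postgis_sorting(config):
    """Return True if PostGIS based sorting is enabled in the config.

    `config` is any dict-like object. The option is assumed to be
    disabled when it is missing.
    """
    value = config.get("ckanext.spatial.use_postgis_sorting", False)
    if isinstance(value, str):
        # Config files store values as strings, so accept common spellings.
        return value.strip().lower() in ("true", "yes", "on", "1")
    return bool(value)
```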
To make it easier to filter and display errors on the UI, the validators
have been modified to return the message and line number separately. The
return format for validators is now:
(is_valid, [(error_message_string, error_line_number)])
Also, the XSD based validators were returning only the last error found in
the document, instead of iterating over the whole error log. Harvesters
should create a harvest object error for each of these validation errors.
Tests have been adapted to these changes.
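The new return format can be consumed roughly as follows. This is only a sketch: `collect_validation_errors` and the sample result are hypothetical, and a real harvester would create one harvest object error per entry rather than build plain dicts.

```python
def collect_validation_errors(validation_result):
    """Turn a validator result into one record per validation error.

    `validation_result` follows the format described above:
    (is_valid, [(error_message_string, error_line_number)])
    """
    is_valid, errors = validation_result
    if is_valid:
        return []
    # One entry per error, keeping message and line number separate so
    # they can be filtered and displayed independently.
    return [{"message": message, "line": line} for message, line in errors]


# Hypothetical validator output with two errors.
sample_result = (False, [
    ("Element 'foo': This element is not expected.", 12),
    ("Missing mandatory element 'bar'.", 40),
])
sample_errors = collect_validation_errors(sample_result)
```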
Sometimes, even when requesting 10 records, the CSW server returns fewer
(see e.g. http://goo.gl/b7Rdj, only 9 records returned). The current
check made the process stop in this case, missing other identifiers.
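The paging problem can be sketched as below. This is not the actual fix: `fetch_page`, the fake server, and the stop condition based on the total number of matches are assumptions chosen to show why a short page should not end the loop.

```python
def gather_identifiers(fetch_page, page_size=10):
    """Collect identifiers page by page.

    `fetch_page(start, page_size)` is a hypothetical callable returning
    (records, total_matches). A page shorter than `page_size` does not
    stop the loop; only an empty page or reaching the total does.
    """
    identifiers = []
    start = 0
    while True:
        records, total_matches = fetch_page(start, page_size)
        identifiers.extend(records)
        start += page_size
        if not records or start >= total_matches:
            break
    return identifiers


def fake_fetch_page(start, page_size):
    """Fake server with 25 records whose first page is one record short."""
    all_ids = ["id-%d" % i for i in range(25)]
    window = all_ids[start:start + page_size]
    if start == 0:
        window = window[:-1]  # simulate 9 records returned instead of 10
    return window, len(all_ids)


gathered = gather_identifiers(fake_fetch_page)
```

A check such as "stop when fewer than `page_size` records come back" would have ended this loop after the first page, with only 9 identifiers gathered.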