Commit Graph

369 Commits

Author SHA1 Message Date
amercader 32a93b8ec8 [#8] Add get_extra to SpatialHarvester class 2013-02-15 14:56:00 +00:00
amercader 4964ce9ec3 [#8] Provide a config option to continue after validation errors
This can be set instance-wide in the ini file with

ckanext.spatial.harvest.continue_on_validation_errors

or per source, adding continue_on_validation_errors=true to the source
config.
2013-02-15 14:47:25 +00:00
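The two places this option can be set (the ini file and the per-source config) could be reconciled roughly as follows. This is a minimal sketch, not the extension's actual code. The helper names, the plain dict standing in for the ini settings, the assumption that the source config arrives as a JSON string, and the rule that a per-source value wins over the instance-wide one are all illustrative assumptions.

```python
import json

# Option name taken from the commit message above.
INI_KEY = "ckanext.spatial.harvest.continue_on_validation_errors"


def _as_bool(value):
    """Interpret ini-style strings such as 'true' or '1' as booleans."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in ("true", "1", "yes", "on")


def continue_on_validation_errors(ini_settings, source_config_json):
    """Hypothetical helper: should harvesting continue after validation errors?

    Assumption: a per-source value, when present, takes precedence over
    the instance-wide ini value. The default (neither set) is False.
    """
    source_config = json.loads(source_config_json) if source_config_json else {}
    if "continue_on_validation_errors" in source_config:
        return _as_bool(source_config["continue_on_validation_errors"])
    return _as_bool(ini_settings.get(INI_KEY, False))
```

For example, `continue_on_validation_errors({INI_KEY: "true"}, None)` falls back to the ini value, while a source config of `{"continue_on_validation_errors": false}` would switch the behaviour off for that source only.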
amercader 596bdbf5d0 [#8] Minor tweak to source_config 2013-02-15 12:50:09 +00:00
amercader 2ff5a11911 [#8] Use site user for harvesting actions
You don't need to create a 'harvest' sysadmin user any more.
By default this will be the internal site admin user. This is the
recommended setting, but if necessary it can be overridden by
the `ckanext.spatial.harvest.user_name` config option, e.g. to
support the old hardcoded 'harvest' user.
2013-02-15 12:28:58 +00:00
amercader baf7b5da67 [#8] Rename harvested document model to ISODocument
GeminiDocument has been kept for backwards compatibility.
2013-02-13 19:16:36 +00:00
amercader dadd174293 [#8] Make sure only list and dict extras are dumped as JSON 2013-02-13 18:48:53 +00:00
amercader 00a0b5946b [#8] Add transform_to_iso method
This can be overridden by custom harvesters wishing to support non-ISO
formats (like FGDC). It is called when the original_document and
original_format harvest object extras are present. Custom harvesters are
responsible for transforming the original document to ISO.
2013-02-13 18:33:58 +00:00
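A custom harvester overriding transform_to_iso might look roughly like the sketch below. It is self-contained, so it uses a stand-in base class rather than the real SpatialHarvester. The method signature, the "fgdc" format value, the class names, and the convention of returning None when no transformation applies are assumptions, not confirmed by the commit. The output is a deliberately tiny ISO fragment carrying only the title, not a valid ISO 19139 record.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"


class BaseHarvesterStandIn(object):
    """Stand-in for the real base harvester so the sketch runs on its own."""

    def transform_to_iso(self, original_document, original_format, harvest_object):
        # Assumed default: the base class performs no transformation.
        return None


class FGDCHarvester(BaseHarvesterStandIn):
    """Hypothetical custom harvester accepting FGDC documents."""

    def transform_to_iso(self, original_document, original_format, harvest_object):
        if original_format != "fgdc":  # assumed format label
            return None
        # Read the title from its usual FGDC CSDGM location.
        title = ET.fromstring(original_document).findtext(
            "idinfo/citation/citeinfo/title"
        )
        # Build a minimal gmd:MD_Metadata fragment holding only the title.
        root = ET.Element("{%s}MD_Metadata" % GMD)
        node = root
        for tag in ("identificationInfo", "MD_DataIdentification",
                    "citation", "CI_Citation", "title"):
            node = ET.SubElement(node, "{%s}%s" % (GMD, tag))
        text = ET.SubElement(node, "{%s}CharacterString" % GCO)
        text.text = title
        return ET.tostring(root, encoding="unicode")
```

A real transformation would more likely apply a full XSLT mapping. The point here is only the shape of the hook: original document and format in, ISO document out.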
amercader 641ffad589 Remove silly debug message 2013-02-13 17:49:15 +00:00
amercader 305951aeb5 [#8] Make get_package_dict public and document
So it is more obvious that it can be overridden by custom harvesters.
2013-02-13 17:06:01 +00:00
amercader 1d8a4c17c4 [#8] Update harvesters for CsW, WAF and Doc sources
These are the new versions of the spatial harvesters with significant
improvement over previous ones.
2013-02-12 18:29:30 +00:00
amercader f153b0f4ba [#8] Minor fixes in base harvester 2013-02-12 18:26:03 +00:00
amercader c7d872af7e [#8] Rename source config object and method to avoid confusion 2013-02-12 18:07:05 +00:00
amercader 24270cb4cb [#8] Move Gemini harvesters and updated base harvester to own files
Prior to the merging of the new spatial harvesters, the existing ones
based on Gemini and UKLP have been moved to their own namespace
(ckanext.spatial.harvesters.gemini). The plugin points have been updated
so users currently using these harvesters will still be able to use them
as normal.

The base harvester (SpatialHarvester) has been updated with new methods,
most significantly '_get_package_dict' and 'import_stage'. Note that
SpatialHarvester now extends HarvesterBase on ckanext-harvest, which had
some of its methods updated.

TODO: still some geo.data.gov specific bits!
2013-02-12 17:40:41 +00:00
amercader b7f486ce04 [#8] Adapt model parsing code to make it ISO 19115 friendly
Changes in multiplicity to support the ISO 19115 spec rather than just
the Gemini 2 one. Thanks to @dread for his help on this.

Summary of the changes:

* dataset-reference-date: Set to 1..*
Note that there was a bug with multiple values allowed per date.
Returned object should now be like:
 "dataset-reference-date": [{"type": "creation", "value": "2004-02"},
{"type": "revision", "value": "2006-07-03"}]

* metadata-language: Set to 0..1

* resource-type: Set to *. That means that a list is now returned

* bbox: Set to *. Note that bboxes are now returned as objects such as:
[{"north":xxx, "south":xxx, "east":xxx, "west":xxx}, {"north":xxx,
  "south":xxx, "east":xxx, "west":xxx}]

The existing Gemini based harvesters and validators have been adapted,
all tests pass.
2013-02-11 17:35:06 +00:00
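Code consuming the parsed values now has to handle lists for dataset-reference-date and bbox. The sketch below shows one way to do that, using the shapes quoted in the commit message. The function names and the sample coordinates are illustrative assumptions. The extent calculation is a plain min/max and does not handle boxes crossing the antimeridian.

```python
def dates_by_type(reference_dates):
    """Group dataset-reference-date entries by type ('creation', 'revision', ...)."""
    grouped = {}
    for entry in reference_dates:
        grouped.setdefault(entry["type"], []).append(entry["value"])
    return grouped


def overall_extent(bboxes):
    """Return the smallest box enclosing every bbox, or None for an empty list."""
    if not bboxes:
        return None
    return {
        "north": max(float(b["north"]) for b in bboxes),
        "south": min(float(b["south"]) for b in bboxes),
        "east": max(float(b["east"]) for b in bboxes),
        "west": min(float(b["west"]) for b in bboxes),
    }


# Date values as shown in the commit message; coordinates are made up.
parsed = {
    "dataset-reference-date": [
        {"type": "creation", "value": "2004-02"},
        {"type": "revision", "value": "2006-07-03"},
    ],
    "bbox": [
        {"north": "55.0", "south": "50.0", "east": "2.0", "west": "-6.0"},
        {"north": "60.0", "south": "54.0", "east": "1.0", "west": "-8.0"},
    ],
}

dates = dates_by_type(parsed["dataset-reference-date"])
extent = overall_extent(parsed["bbox"])
```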
amercader 672e168bfa [#8] Adapt harvest tests to CKAN 2.0
Add new mandatory fields when creating sources, status dict has new
keys, CKAN lower cases formats, take into account harvest source
datasets.

Added a local getcapabilities response to avoid remote 404s.

Note that the TestValidation tests need to be fixed, as 27c4ee81e
removed the validation from the gather stage.
2013-02-11 16:57:38 +00:00
amercader 456d127967 Remove dataset_extent_map from test-core.ini 2013-02-08 17:50:08 +00:00
David Read b0312ed3a5 Conflicts:
ckanext/spatial/harvesters.py
	ckanext/spatial/tests/test_harvest.py
2013-02-08 17:47:32 +00:00
David Read fe2ebe016f Conflicts:
ckanext/spatial/harvesters.py
2013-02-08 17:40:19 +00:00
David Read f7d23dd576 #154 Gemini schematron 1.3 has been accepted, so loses the "a" suffix. 2013-02-08 17:38:41 +00:00
David Read aa080e9f75 #noticket No functionality has changed! Factored out responsible_organisation stuff into a separate method to add tests to show what it does. 2013-02-08 17:38:05 +00:00
David Read 9daff6a5b2 #noticket Tests added to clarify license URL extraction. 2013-02-08 17:36:34 +00:00
David Read e20080e69d Latest schematron added. FCSC is a good test of it. 2013-02-08 17:35:31 +00:00
David Read 6e23ae55c8 Relaxed "Multiplicity Check" so that it does not raise Exceptions any more - just log errors. This is because they are simply duplicates of the Gemini Schematron. Adria agreed these will be deleted anyway in 2.0. 2013-02-08 17:34:43 +00:00
David Read 779e00cd75 [xs] Improve docstrings and error messages. 2013-02-08 17:33:42 +00:00
David Read 5bcffdf14b More debug logging added to WAF harvester. 2013-02-08 17:33:16 +00:00
David Read 44728f12f7 Add XSL for converting Gemini XML to nice HTML, used in controllers/api.py. 2013-02-08 17:31:34 +00:00
David Read bcdf360b01 Spatial query can now be ordered. Does not play nicely with SOLR options - just uses that to get the facets counts and return each result. Have added performance tests for two alternative queries.
- Added a config option ('ckanext.spatial.use_postgis_sorting') to
activate this as this behaviour will be deprecated in the future
in favour of Solr 4 spatial sorting capabilities.
Also fixed the tests

Conflicts:

	ckanext/spatial/plugin.py
2013-02-08 17:28:37 +00:00
David Read fb4b041b30 Adding Parslow constraints schema previously missed. 2013-02-08 16:41:44 +00:00
David Read 8e0f7c7148 Added lower level tests for bbox search (at the lib level), complementing the API level ones.
Conflicts:

	ckanext/spatial/plugin.py
2013-02-08 16:41:02 +00:00
David Read 46fb0030a5 Comments about cardinality/multiplicity. 2013-02-08 16:38:14 +00:00
David Read d2c97fe3cc Added new Parslow Constraints Schematron to test. Added command to validate on the command-line. 2013-02-08 16:37:44 +00:00
David Read 9ea3295b46 Get the validation XML to be included in the distribution. 2013-02-08 16:36:36 +00:00
David Read 892b44a3b3 Added useful logging to the validation report. Useful to have the date (i.e. version) in the name of the Eden schema. 2013-02-08 16:35:59 +00:00
amercader 3142794524 Merge branch '2.0-validation-changes' into release-v2.0 2013-02-07 12:45:02 +00:00
David Read c6ac9494a2 Revert "#276 Coupled resources - first iteration."
Reverting changes for "#276 Coupled Resource" on master as it is INSPIRE-specific. Moving to datagovuk/ckanext-spatial branch dgu.

This reverts commit 91e547a622.
2013-02-04 16:07:19 +00:00
David Read f4e3cfad00 Revert "#276 Coupled resources - second iteration. Just need to update harvester now."
Reverting changes for "#276 Coupled Resource" on master as it is INSPIRE-specific. Moving to datagovuk/ckanext-spatial branch dgu.

This reverts commit ecd6036efe.
2013-02-04 16:07:03 +00:00
David Read c771a76e3d Revert "#276 Coupled Resource table now gets updated during harvest."
Reverting changes for "#276 Coupled Resource" on master as it is INSPIRE-specific. Moving to datagovuk/ckanext-spatial branch dgu.

This reverts commit 01536873b9.

Conflicts:

	ckanext/spatial/harvesters.py
2013-02-04 16:03:50 +00:00
David Read 84b75ea759 Revert "#276 Coupled Resource - fix to not show withdrawn packages in list of coupled resources."
Reverting changes for "#276 Coupled Resource" on master as it is INSPIRE-specific. Moving to datagovuk/ckanext-spatial branch dgu.

This reverts commit 3f627d9700.
2013-02-04 16:00:34 +00:00
David Read 27c4ee81e2 #287 Avoid doing extra validation for WAF in the gather stage. 2013-02-02 00:06:33 +00:00
David Read 3f627d9700 #276 Coupled Resource - fix to not show withdrawn packages in list of coupled resources. 2013-02-01 23:10:58 +00:00
David Read 3eee0be135 #noticket Bugfix - diff (when harvest content changed without timestamp change) displayed raw html. 2013-02-01 18:09:12 +00:00
David Read cdabab12cd #154 Gemini schematron 1.3 has been accepted, so loses the "a" suffix. 2013-02-01 14:00:51 +00:00
David Read 73af616b81 #noticket No functionality has changed! Factored out responsible_organisation stuff into a separate method to add tests to show what it does. 2013-02-01 12:04:50 +00:00
David Read 01536873b9 #276 Coupled Resource table now gets updated during harvest. 2013-01-31 18:05:48 +00:00
David Read 2cef9d6dae #noticket Tests added to clarify license URL extraction. 2013-01-31 14:32:19 +00:00
David Read ecd6036efe #276 Coupled resources - second iteration. Just need to update harvester now. 2013-01-28 23:56:12 +00:00
David Read 91e547a622 #276 Coupled resources - first iteration. 2013-01-28 21:53:15 +00:00
David Read 775a57d1f5 Latest schematron added. FCSC is a good test of it. 2013-01-24 11:30:23 +00:00
David Read 70c3eccdf5 Relaxed "Multiplicity Check" so that it does not raise Exceptions any more - just log errors. This is because they are simply duplicates of the Gemini Schematron. Adria agreed these will be deleted anyway in 2.0. 2013-01-21 17:22:24 +00:00
amercader 461607f06f Merge branch 'release-v2.0' into 2.0-validation-changes 2013-01-21 16:30:59 +00:00