205 Commits

Author SHA1 Message Date
amercader 9760546d13 [#8] Add extra xpath for getting resources
According to the ISO spec gmd:MD_DigitalTransferOptions can be accessed
via a gmd:MD_Distributor tag
2013-03-27 17:39:51 +00:00
amercader 0c98e6ec4c [#8] Minor fix in single doc harvester 2013-03-27 17:38:42 +00:00
amercader fede0b0831 [#15] Ensure that bounding boxes are defined counter-clockwise
To return correct results on a spatial query, rectangle geometries must
be defined in counter-clockwise order [1]. This changeset adds a small
sanity check to before_index when we are dealing with a Polygon geometry
that has 5 coordinate pairs. Shapely is used to generate a LinearRing
from the polygon coordinates and check if they are ccw. If not, they are
reordered and a new polygon is generated so the WKT sent to Solr is
properly ordered.

The GeoJSON template used for extents in the base spatial harvester has
been also updated to define the coordinates counter-clockwise.

[1] http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4#JTS_.2BAC8_WKT_.2BAC8_Polygon_notes
2013-03-23 19:28:31 +00:00
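
The orientation check described in fede0b0831 can be illustrated with a minimal sketch. The commit itself uses Shapely's LinearRing; the version below swaps in the shoelace (signed area) formula in pure Python so it runs without dependencies. The function names `signed_area` and `ensure_ccw` are hypothetical and do not come from the extension.

```python
def signed_area(ring):
    """Shoelace formula over a closed ring: positive means counter-clockwise."""
    return 0.5 * sum(
        x1 * y2 - x2 * y1
        for (x1, y1), (x2, y2) in zip(ring, ring[1:])
    )


def ensure_ccw(ring):
    """Return the closed ring in counter-clockwise order, reversing if needed."""
    return ring if signed_area(ring) > 0 else ring[::-1]


# A closed rectangle (5 coordinate pairs) defined clockwise.
clockwise = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]
ccw = ensure_ccw(clockwise)
```

Reversing a closed ring keeps it closed, so the result can be passed on to whatever builds the WKT.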
amercader 40967385b0 [#8] Fix typo in WMS format detection 2013-03-21 17:53:38 +00:00
amercader 0e0b5a2cc2 [#8] Fix bug that prevented setting a default resource name 2013-03-21 14:34:42 +00:00
amercader 627c4c58e0 [#8] Fix bug that prevented setting a default resource name 2013-03-21 14:34:23 +00:00
amercader c7a9cc660f [#15] Add support for Solr spatial indexing and querying
When the 'ckanext.spatial.search_backend' config option is set to
'solr', the extension will index geometries stored in the 'spatial'
extra on the spatial field of the Solr index (named 'spatial_geom').
This is done on the 'before_index' extension point.

Also, when doing a query, if the same config option is in place, the
necessary fq parameter will be set to pass the spatial query to Solr.
2013-03-20 16:54:55 +00:00
amercader a727aa815b Add helper functions useful to format extras coming from the spatial harvesters 2013-03-18 15:59:20 +00:00
amercader a7fc19768b [#8] Remove print commands from WAF harvester 2013-03-14 17:45:34 +00:00
amercader eb201e1759 [#8] Waf harvester: improve exception and return empty list if no records 2013-03-14 17:35:51 +00:00
amercader 0aafffc8dc [#8] Capture exceptions during request in WAF harvester 2013-03-14 14:56:42 +00:00
amercader d2723c3020 [#8] resource-type not always present 2013-03-14 14:30:16 +00:00
amercader a76a8d2ca7 [#8] Don't use object id so messages can be grouped 2013-03-14 12:37:35 +00:00
amercader 4638b3899f Revert "[#8] Don't use object id so messages can be grouped"
This reverts commit 032cc4d961.
2013-03-14 12:36:39 +00:00
amercader 032cc4d961 [#8] Don't use object id so messages can be grouped 2013-03-14 12:34:20 +00:00
amercader da1dc02c7e [#8] Improve fields returned in the package dict
Make them less uklp specific and more parse friendly. Helper functions
should be used in the UI to format them nicely.
2013-03-08 18:57:30 +00:00
amercader 724ef6ed7c [#8] Fix gemini harvester after change in spatial field 2013-03-08 18:56:03 +00:00
amercader e7f70c4f85 [#8] Fix KeyError in point template 2013-03-05 18:38:55 +00:00
amercader 7c5071bfc2 [#8] Sanitize bbox before creating spatial extra
Some common problems:
* Whitespace, tabs, line feeds and plus signs: should be handled by
  float()
* Text: log error and skip creation of spatial extra
* Same set of 2 coords for extent: create point instead of polygon

Note that the bbox values are stored as they are in the bbox-xx-yy
extras
2013-03-05 18:31:49 +00:00
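
A rough sketch of the sanitizing rules listed in 7c5071bfc2, assuming the four bbox values arrive as strings. The function name `bbox_to_geometry` and the returned tuple shape are hypothetical; the real harvester builds a GeoJSON extra and logs the error.

```python
def bbox_to_geometry(west, south, east, north):
    """Return ('Point' | 'Polygon', coords), or None if a value is not numeric."""
    try:
        # float() already tolerates surrounding whitespace, tabs,
        # line feeds and a leading plus sign.
        w, s, e, n = (float(v) for v in (west, south, east, north))
    except (TypeError, ValueError):
        # Text or missing values: the commit logs an error and skips the extra.
        return None
    if w == e and s == n:
        # Both corners are the same coordinate pair: a point, not a polygon.
        return ("Point", (w, s))
    # Closed ring, defined counter-clockwise.
    ring = [(w, s), (e, s), (e, n), (w, n), (w, s)]
    return ("Polygon", ring)
```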
amercader d158a6c684 [#8] Change 'Resource locator' string for unnamed resources
'Resource locator' was confusing, has been replaced by 'Unnamed
resource' and made translatable. Also don't set description if not
present, set name.
2013-03-04 17:55:34 +00:00
amercader d43cbb8800 [#8] Improve resource format detection
The 'guess_resource_format' function looks for common patterns in popular
geospatial services and file extensions. It just looks at the provided URL,
it does not attempt to perform any remote check. By default, it will use the
mimetypes module if no match was found before to try to guess the format.

On the previous version, all resources in documents of type 'service' were
queried to see if they were actually WMS. This is no longer the case,
but services flagged as 'wms' can be verified if the following setting
is set to True: ckanext.spatial.harvest.validate_wms
2013-03-04 17:44:18 +00:00
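
The URL-only approach described in d43cbb8800 might look roughly like the sketch below. The pattern table and the name `guess_format_sketch` are assumptions made for illustration; the actual patterns used by 'guess_resource_format' are not shown in this log.

```python
import mimetypes

# Hypothetical patterns, for illustration only.
URL_PATTERNS = {
    "wms": ("service=wms", "/wms"),
    "wfs": ("service=wfs", "/wfs"),
    "kml": (".kml",),
}


def guess_format_sketch(url, use_mimetypes=True):
    """Guess a format from the URL text alone; no remote request is made."""
    lowered = url.lower()
    for fmt, patterns in URL_PATTERNS.items():
        if any(p in lowered for p in patterns):
            return fmt
    if use_mimetypes:
        # Fall back to the standard library when no pattern matched.
        mime, _encoding = mimetypes.guess_type(url)
        return mime
    return None
```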
amercader da5b37bc45 [#8] Fix typo in GeoJSON template 2013-03-01 17:42:51 +00:00
amercader d5461477aa [#8] Do not add XML declaration when storing the content
Rather than store the XML declaration in the DB, we add it if not present
when outputting the contents (also in ckanext-harvest's show_object).
2013-03-01 17:33:53 +00:00
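
The "add it if not present" step from d5461477aa can be sketched as follows. The helper name and the exact declaration string are assumptions, not taken from the extension.

```python
# Assumed declaration; the extension may use a different encoding or form.
XML_DECLARATION = '<?xml version="1.0" encoding="UTF-8"?>'


def with_xml_declaration(content):
    """Prepend an XML declaration on output unless the content already has one."""
    if content.lstrip().startswith("<?xml"):
        return content
    return XML_DECLARATION + "\n" + content
```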
amercader d7d49c37e5 [#8] Check if tags exist on get_package_dict 2013-02-25 16:03:36 +00:00
amercader cc60327d0b [#10] Improve harvested metadata API
Some improvements on the endpoints that return the contents of the
harvest objects:

* Nicer URLs with redirects to the old ones
* Returning the raw harvest object content is available on the main
 harvest extension, so just redirect there
* Support for showing the original document of a harvest object, if
 present
* Support for defining a custom XSLT for the HTML view, via

ckanext.spatial.harvest.xslt_html_content
ckanext.spatial.harvest.xslt_html_content_original
2013-02-19 18:38:15 +00:00
amercader 8647f90cb6 [#8] Clean up base harvester, docstrings, some pep8 2013-02-18 18:17:36 +00:00
amercader 6783d58006 [#8] Add support for import command to new harvester
The import CLI reruns the import stage for the last current objects, so
when running it, the previous objects don't need to be changed. Any
date check is overridden to force the update of the package.
2013-02-18 17:19:14 +00:00
amercader 1ffe8fc902 [#8] Use plugins toolkit whenever possible 2013-02-15 15:03:03 +00:00
amercader 32a93b8ec8 [#8] Add get_extra to SpatialHarvester class 2013-02-15 14:56:00 +00:00
amercader 4964ce9ec3 [#8] Provide a config option to continue after validation errors
This can be set instance wide on the ini file with

ckanext.spatial.harvest.continue_on_validation_errors

or per source, adding continue_on_validation_errors=true to the source
config.
2013-02-15 14:47:25 +00:00
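
One way the two settings from 4964ce9ec3 could be resolved is sketched below. The commit does not state which one wins, so the precedence here (per-source value over the ini value) is an assumption, as are the function name and the plain-dict arguments.

```python
INI_KEY = "ckanext.spatial.harvest.continue_on_validation_errors"
SOURCE_KEY = "continue_on_validation_errors"


def should_continue(source_config, ini_config):
    """Assumed precedence: per-source setting first, then the ini file."""
    if SOURCE_KEY in source_config:
        return bool(source_config[SOURCE_KEY])
    raw = ini_config.get(INI_KEY, "false")
    return str(raw).strip().lower() in ("true", "1", "yes")
```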
amercader 596bdbf5d0 [#8] Minor tweak to source_config 2013-02-15 12:50:09 +00:00
amercader 2ff5a11911 [#8] Use site user for harvesting actions
You don't need to create a 'harvest' sysadmin user any more.
By default this will be the internal site admin user. This is the
recommended setting, but if necessary it can be overridden by
the `ckanext.spatial.harvest.user_name` config option, eg to
support the old hardcoded 'harvest' user
2013-02-15 12:28:58 +00:00
amercader baf7b5da67 [#8] Rename harvested document model to ISODocument
GeminiDocument has been kept for backwards compatibility.
2013-02-13 19:16:36 +00:00
amercader dadd174293 [#8] Make sure only list and dict extras are dumped as JSON 2013-02-13 18:48:53 +00:00
amercader 00a0b5946b [#8] Add transform_to_iso method
This can be overridden by custom harvesters willing to support non ISO
formats (like FGDC). It is called when the original_document and
original_format harvest object extras are present. Custom harvesters are
responsible for transforming the original document to ISO.
2013-02-13 18:33:58 +00:00
amercader 641ffad589 Remove silly debug message 2013-02-13 17:49:15 +00:00
amercader 305951aeb5 [#8] Make get_package_dict public and document
So it is more obvious that it can be overridden by custom harvesters.
2013-02-13 17:06:01 +00:00
amercader 1d8a4c17c4 [#8] Update harvesters for CsW, WAF and Doc sources
These are the new versions of the spatial harvesters with significant
improvement over previous ones.
2013-02-12 18:29:30 +00:00
amercader f153b0f4ba [#8] Minor fixes in base harvester 2013-02-12 18:26:03 +00:00
amercader c7d872af7e [#8] Rename source config object and method to avoid confusion 2013-02-12 18:07:05 +00:00
amercader 24270cb4cb [#8] Move Gemini harvesters and updated base harvester to own files
Prior to the merging of the new spatial harvesters, the existing ones
based on Gemini and UKLP have been moved to their own namespace
(ckanext.spatial.harvesters.gemini). The plugin points have been updated
so users currently using these harvesters will still be able to use them
as normal.

The base harvester (SpatialHarvester) has been updated with new methods,
most significantly '_get_package_dict' and 'import_stage'. Note that
SpatialHarvester now extends HarvesterBase on ckanext-harvest, which had
some of its methods updated.

TODO: still some geo.data.gov specific bits!
2013-02-12 17:40:41 +00:00
amercader b7f486ce04 [#8] Adapt model parsing code to make it ISO 19115 friendly
Changes in multiplicity to support the ISO 19115 spec rather than just
the Gemini 2 one. Thanks to @dread for his help on this.

Summary of the changes:

* dataset-reference-date: Set to 1..*
Note that there was a bug with multiple values allowed per date.
Returned object should now be like:
 "dataset-reference-date": [{"type": "creation", "value": "2004-02"},
{"type": "revision", "value": "2006-07-03"}]

* metadata-language: Set to 0..1

* resource-type: Set to *. That means that a list is now returned

* bbox: Set to *. Note that bboxes are now returned as objects such as:
[{"north":xxx, "south":xxx, "east":xxx, "west":xxx}, {"north":xxx,
  "south":xxx, "east":xxx, "west":xxx}]

The existing Gemini based harvesters and validators have been adapted,
all tests pass.
2013-02-11 17:35:06 +00:00
amercader 672e168bfa [#8] Adapt harvest tests to CKAN 2.0
Add new mandatory fields when creating sources, status dict has new
keys, CKAN lower cases formats, take into account harvest source
datasets.

Added a local getcapabilities response to avoid remote 404s.

Note that the TestValidation tests need to be fixed, as 27c4ee81e
removed the validation from the gather stage.
2013-02-11 16:57:38 +00:00
David Read b0312ed3a5 Conflicts:
ckanext/spatial/harvesters.py
	ckanext/spatial/tests/test_harvest.py
2013-02-08 17:47:32 +00:00
David Read fe2ebe016f Conflicts:
ckanext/spatial/harvesters.py
2013-02-08 17:40:19 +00:00
David Read f7d23dd576 #154 Gemini schematron 1.3 has been accepted, so loses the "a" suffix. 2013-02-08 17:38:41 +00:00
David Read aa080e9f75 #noticket No functionality has changed! Factored out responsible_organisation stuff into a separate method to add tests to show what it does. 2013-02-08 17:38:05 +00:00
David Read 9daff6a5b2 #noticket Tests added to clarify license URL extraction. 2013-02-08 17:36:34 +00:00
David Read e20080e69d Latest schematron added. FCSC is a good test of it. 2013-02-08 17:35:31 +00:00
David Read 6e23ae55c8 Relaxed "Multiplicity Check" so that it does not raise Exceptions any more - just log errors. This is because they are simply duplicates of the Gemini Schematron. Adria agreed these will be deleted anyway in 2.0. 2013-02-08 17:34:43 +00:00
David Read 779e00cd75 [xs] Improve docstrings and error messages. 2013-02-08 17:33:42 +00:00
David Read 5bcffdf14b More debug logging added to WAF harvester. 2013-02-08 17:33:16 +00:00
David Read 44728f12f7 Add XSL for converting Gemini XML to nice HTML, used in controllers/api.py. 2013-02-08 17:31:34 +00:00
David Read bcdf360b01 Spatial query can now be ordered. Does not play nicely with SOLR options - just uses that to get the facets counts and return each result. Have added performance tests for two alternative queries.
- Added a config option ('ckanext.spatial.use_postgis_sorting') to
activate this as this behaviour will be deprecated in the future
in favour of Solr 4 spatial sorting capabilities.
Also fixed the tests

Conflicts:

	ckanext/spatial/plugin.py
2013-02-08 17:28:37 +00:00
David Read fb4b041b30 Adding Parslow constraints schema previously missed. 2013-02-08 16:41:44 +00:00
David Read 8e0f7c7148 Added lower level tests for bbox search (at the lib level), complementing the API level ones.
Conflicts:

	ckanext/spatial/plugin.py
2013-02-08 16:41:02 +00:00
David Read 46fb0030a5 Comments on cardinality/multiplicity. 2013-02-08 16:38:14 +00:00
David Read d2c97fe3cc Added new Parslow Constraints Schematron to test. Added command to validate on the command-line. 2013-02-08 16:37:44 +00:00
David Read 9ea3295b46 Get the validation XML to be included in the distribution. 2013-02-08 16:36:36 +00:00
David Read 892b44a3b3 Added useful logging to the validation report. Useful to have the date (i.e. version) in the name of the Eden schema. 2013-02-08 16:35:59 +00:00
amercader 461607f06f Merge branch 'release-v2.0' into 2.0-validation-changes 2013-01-21 16:30:59 +00:00
amercader fd1071959e [#6] Move to Leaflet for dataset map widget
For this particular use case Leaflet offered the best option. Also
solves the issue when showing extent covering the whole world.
2013-01-18 15:12:09 +00:00
amercader 66b72163d5 Remove stuff from html.py
The rest will eventually go when we migrate the spatial search snippet.
2013-01-18 15:11:06 +00:00
amercader 70f7f6144b Factor out DGU code from the dataset map
It now lives in the DGU extension:

https://github.com/datagovuk/ckanext-dgu/blob/master/ckanext/dgu/theme/public/scripts/dgu-dataset-map.js
2013-01-18 13:32:30 +00:00
amercader 3da5807eb4 [#6] Update dataset map to be a pure snippet
No need to load an extension.
2013-01-18 13:07:26 +00:00
amercader 21a85a6b3f [#4] Register resources only once for all plugins
spatial_metadata will load the resources (public, templates and
resources) for all plugins to use, as it needs to be loaded anyway.
2013-01-15 20:00:46 +00:00
amercader 7abfb4eb61 Use plugins toolkit whenever possible on plugin.py 2013-01-15 19:57:31 +00:00
amercader da7bb48eb5 Merge branch 'release-v2.0' into 2.0-validation-changes 2013-01-14 14:03:04 +00:00
amercader 711391e971 [#4] Rewrite WMS preview plugin for ResourcePreview interface
Much simplified plugin for previewing WMS. It requires the
resource_proxy plugin to work.

Also clean up public and template dir to mimic core layout.
2013-01-14 13:59:15 +00:00
kindly 4e47141717 add extra resource locator 2012-12-24 10:43:44 +00:00
amercader c927d8b6ab Add method for adding custom validators
This probably needs to be done properly, adding them once on startup
somehow.
2012-12-20 18:26:40 +00:00
amercader 615a58ce93 Reduce log level for harvest object errors 2012-11-21 16:22:55 +00:00
amercader 9ea721e256 Encode remote documents from CSW servers as unicode 2012-11-20 15:42:07 +00:00
amercader 6cf7f79942 Save line if present when storing object errors 2012-11-20 11:47:04 +00:00
amercader 7113466760 Update harvesters to new validator outputs 2012-11-19 18:12:40 +00:00
amercader e12e38cab0 Improvements on the validation code
To make it easier to filter and display errors on the UI, the validators
have been modified to return the message and line number separately. The
return format for validators is now:

(is_valid, [(error_message_string, error_line_number)])

Also the XSD based validators were returning only the last error found on
the document, instead of iterating the whole error log. Harvesters should
create a harvest object error for each of these validation errors.

Tests have been adapted to these changes.
2012-11-19 17:15:16 +00:00
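
The return format from e12e38cab0 can be illustrated with a small sketch. The `error_log` input (a list of dicts with `message` and `line` keys) is a stand-in invented for this example; the real validators read the error log of the XML library they use.

```python
def build_validation_result(error_log):
    """Return (is_valid, [(error_message_string, error_line_number)])."""
    # Walk the whole log rather than keeping only the last entry.
    errors = [(entry["message"], entry["line"]) for entry in error_log]
    return (len(errors) == 0, errors)


is_valid, errors = build_validation_result([
    {"message": "Element 'title' is missing", "line": 12},
    {"message": "Invalid date value", "line": 40},
])
```

A harvester consuming this result would then create one harvest object error per tuple in `errors`.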
amercader 0379852fe0 Merge branch '2641-spatial-widgets-new-theme' into release-v2.0 2012-11-12 16:44:05 +00:00
amercader b82dd4a9c0 Merge branch 'csw-harvester-enhancements' into release-v2.0 2012-11-12 16:43:52 +00:00
amercader a84268abf6 Don't trust the number of records returned by the remote server
Sometimes, even when requesting 10 records, the CSW server returns fewer
of them (see eg http://goo.gl/b7Rdj, only 9 records returned). The
current check made the process stop on this case, missing other
identifiers.
2012-11-02 11:12:46 +00:00
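
The idea behind a84268abf6 can be sketched as a paging loop that does not treat a short page as the end of the results. The stop condition (an empty page), the fixed page step and the `fetch_page` callable are all assumptions for illustration; the log does not show the actual check that replaced the old one.

```python
def collect_identifiers(fetch_page, page_size=10):
    """Page through results, stopping on an empty page rather than a short one."""
    identifiers = []
    start = 1
    while True:
        page = fetch_page(start, page_size)
        if not page:
            break
        identifiers.extend(page)
        # Advance by the requested size, not by the number of records returned.
        start += page_size
    return identifiers


def fake_server(start, size):
    """Stand-in server with 25 records that silently omits record 7."""
    return [i for i in range(start, min(start + size, 26)) if i != 7]


found = collect_identifiers(fake_server)
```

With a check like `len(page) < page_size`, the loop would have stopped after the first page of the fake server and missed records 11 to 25.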
amercader 0444c14da2 Better check for config object 2012-10-30 16:40:32 +00:00
amercader 99dc2a7c55 Allow to define the validation profiles via source config
The profiles used are decided as follows:

1. 'validator_profiles' property of the harvest source config
object
2. 'ckan.spatial.validator.profiles' configuration option in
the ini file
3. Default value as defined in DEFAULT_VALIDATOR_PROFILES
2012-10-30 14:18:01 +00:00
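
The three-step lookup from 99dc2a7c55 could be written roughly as below. The default value shown, the space-separated ini format and the function name are assumptions; only the order of precedence comes from the commit message.

```python
# Placeholder default; the real DEFAULT_VALIDATOR_PROFILES is not shown in the log.
DEFAULT_VALIDATOR_PROFILES = ["iso19139"]


def get_validator_profiles(source_config, ini_config):
    """Resolve profiles: source config, then ini option, then the default."""
    if source_config.get("validator_profiles"):
        return list(source_config["validator_profiles"])
    raw = ini_config.get("ckan.spatial.validator.profiles")
    if raw:
        # Assumes a whitespace-separated list in the ini file.
        return raw.split()
    return list(DEFAULT_VALIDATOR_PROFILES)
```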
amercader ac7947549e Remove unused imports 2012-10-29 16:35:52 +00:00
amercader 9e9048c272 Add validator for FGDC XSD schema 2012-10-29 14:34:29 +00:00
amercader 9488ecd5a9 Add validator for ISO 19139 NGDC XSD schema 2012-10-29 14:28:58 +00:00
amercader c1d2a479f2 Add traceback to exception when getting CSW identifiers 2012-10-23 18:57:11 +01:00
amercader 92b781d0f1 Minor model tweaks to support parsing generic ISO documents 2012-10-23 13:03:53 +01:00
amercader aeb7d27bab Fix failing tests
The WMS one is just skipped, until we have a more clear way of how the
previews will work.
2012-10-22 19:44:33 +01:00
amercader 7f58374ac7 Enable command line interface for validation, useful for debug 2012-10-22 19:39:07 +01:00
amercader d95602eaff Fix wrong resource paths on validation 2012-10-22 19:37:54 +01:00
amercader 019cb3b45f Fix wrong imports and docs 2012-10-22 19:36:03 +01:00
David Read d90114cf07 Added ability to produce report into validation errors, for when changing validation. Added report infrastructure. 2012-10-19 18:20:32 +01:00
David Read 0e8a62fe1e Reorganise XML test files into more sensible directory names. Add lower level validation tests. 2012-10-19 14:23:34 +01:00
David Read 2d6f497720 Missed off some files from the previous commit. 2012-10-19 12:14:09 +01:00
David Read 20e8f12615 Merged in ckanext-inspire.
Tests are passing, apart from a couple which didn't work before:
* test_functional.py -> functional/test_package.py (3 failures in 4)
* functional/test_dataset_map.py (1 fail in 1)
There may be some code errors still untested.
Renamed Validator -> Validators to make more sense.
2012-10-19 11:19:01 +01:00
David Read 58fa06051d Added EDEN ISO19139 schema that was missed off before. 2012-10-17 17:08:20 +01:00
David Read 8181b3d3bf Merged in ckanext-csw @44d5a04656dff084e6bca57dda7b63deec69778c. Not tested yet. 2012-10-17 16:59:02 +01:00
amercader a84034d902 Merge branch '2641-spatial-widgets-new-theme' of github.com:okfn/ckanext-spatial into 2641-spatial-widgets-new-theme 2012-10-17 13:03:11 +01:00
amercader 3bb174d56d Update WMS viewer prototype to work with the new iframe on resource read page 2012-10-17 13:02:57 +01:00
amercader 80b8bc33c6 Update jquery path 2012-09-28 12:20:37 +01:00
amercader efa2307ee5 [wms] Fix small bug 2012-08-14 12:48:39 +01:00