* Harvest Metadata API - a way for a user to view the harvested metadata XML, either as a raw file or styled to view in a web browser. (`spatial_harvest_metadata_api`)
Every time a dataset is created, updated or deleted, the extension will synchronize
the information stored in the extra with the geometry table.
Spatial Search Widget
---------------------
**Note**: this plugin requires CKAN 1.6 or higher.
To enable the search map widget you need to add the `spatial_query_widget` plugin to your
ini file (See `Configuration`_). You also need to load both the `spatial_metadata`
and the `spatial_query` plugins.
When the plugin is enabled, a map widget will be shown in the dataset search form,
where users can refine their searches by drawing an area of interest.
Dataset Extent Map
------------------
To enable the dataset map you need to add the `dataset_extent_map` plugin to your
ini file (See `Configuration`_). You also need to load the `spatial_metadata` plugin.
When the plugin is enabled, if datasets contain a 'spatial' extra like the one
described in the previous section, a map will be shown on the dataset details page.
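The plugin dependencies stated above can be checked mechanically. The following is a minimal sketch and not part of the extension: the ``REQUIRES`` table and the ``missing_plugins`` helper are hypothetical names, and the input is assumed to be the space-separated plugin list taken from your ini file::

    # Hypothetical helper, not part of ckanext-spatial.
    # Dependencies as stated in the two sections above.
    REQUIRES = {
        "spatial_query_widget": {"spatial_metadata", "spatial_query"},
        "dataset_extent_map": {"spatial_metadata"},
    }


    def missing_plugins(plugin_setting):
        """Map each enabled plugin to the dependencies it is missing."""
        enabled = set(plugin_setting.split())
        problems = {}
        for plugin, needed in REQUIRES.items():
            if plugin in enabled and not needed <= enabled:
                problems[plugin] = sorted(needed - enabled)
        return problems


    print(missing_plugins("spatial_metadata dataset_extent_map spatial_query_widget"))
    # prints {'spatial_query_widget': ['spatial_query']}

An empty result means every enabled plugin has the plugins it depends on.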
WMS Preview
-----------
To enable the WMS previewer you need to add the `wms_preview` plugin to your
ini file (See `Configuration`_).
Please note that this is an experimental plugin and may be unstable.
When the plugin is enabled, if datasets contain a resource that has 'WMS' format,
a 'View available WMS layers' link will be displayed on the dataset details page.
It forwards to a simple map viewer that will attempt to load the remote service
layers, based on the GetCapabilities response.
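To illustrate what "based on the GetCapabilities response" involves, here is a rough Python sketch that extracts the named layers from a capabilities document. It is not the previewer's own code: the sample document is made up, the WMS 1.3.0 namespace is assumed, and ``list_named_layers`` is a hypothetical helper::

    import xml.etree.ElementTree as ET

    # Assumption: a WMS 1.3.0 response, which uses this XML namespace.
    WMS_NS = "http://www.opengis.net/wms"

    # Made-up, heavily trimmed capabilities document for illustration.
    SAMPLE_CAPABILITIES = """<?xml version="1.0"?>
    <WMS_Capabilities xmlns="http://www.opengis.net/wms" version="1.3.0">
      <Capability>
        <Layer>
          <Title>Root</Title>
          <Layer><Name>rivers</Name><Title>Rivers</Title></Layer>
          <Layer><Name>roads</Name><Title>Roads</Title></Layer>
        </Layer>
      </Capability>
    </WMS_Capabilities>"""


    def list_named_layers(capabilities_xml):
        """Return the names of all layers that declare a Name element."""
        root = ET.fromstring(capabilities_xml.strip())
        names = []
        for layer in root.iter("{%s}Layer" % WMS_NS):
            name = layer.findtext("{%s}Name" % WMS_NS)
            if name:  # container layers without a Name cannot be requested
                names.append(name)
        return names


    print(list_named_layers(SAMPLE_CAPABILITIES))
    # prints ['rivers', 'roads']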
CSW Server
----------
CSW (Catalogue Service for the Web) is an OGC standard for a web interface that gives access to metadata, that is, records that describe data or services.
The currently supported methods with this CSW Server are:
* GetCapabilities
* GetRecords
* GetRecordById
ckanext-csw provides the CSW service at ``/csw``.
For example, you can request the capabilities of the CSW server installed into CKAN running on 127.0.0.1:5000 like this::
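As a rough Python sketch of such a request, the snippet below builds a key-value-pair URL for the ``/csw`` endpoint. The ``csw_request_url`` helper is a hypothetical name; the ``service`` and ``request`` parameters are the standard CSW query parameters, and other operations may need additional parameters that are not shown here::

    from urllib.parse import urlencode


    def csw_request_url(site_url, request, **extra_params):
        """Build a CSW request URL for a CKAN site serving CSW at /csw."""
        params = {"service": "CSW", "request": request}
        params.update(extra_params)
        return site_url.rstrip("/") + "/csw?" + urlencode(params)


    print(csw_request_url("http://127.0.0.1:5000", "GetCapabilities"))
    # prints http://127.0.0.1:5000/csw?service=CSW&request=GetCapabilities

The resulting URL can be opened in a browser or fetched with any HTTP client.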
These harvesters are designed to harvest metadata records in the GEMINI2 format, which is an XML spatial metadata format very similar to ISO19139. They were developed for the UK Location Programme and GEMINI2, but it would be simple to adapt them for other INSPIRE or ISO19139-based metadata.
The harvesters get the metadata from these types of server:
* GeminiCswHarvester - CSW server
* GeminiWafHarvester - WAF file server - An index page with links to GEMINI resources
* GeminiDocHarvester - HTTP file server - An individual GEMINI resource
The GEMINI-specific parts of the code are restricted to the fields imported into CKAN, so it would be relatively simple to generalise these to other INSPIRE profiles.
Each contains code to do the three stages of harvesting:
* gather_stage - Submits a request to Harvest Sources and assembles a list of all the metadata URLs (possibly because each CSW record can recursively refer to more records). Some processing or validation of the XML may occur.
* fetch_stage - Fetches all the GEMINI metadata.
* import_stage - Validates all the GEMINI metadata, converts each record to a CKAN Package and saves it in CKAN.
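The flow through the three stages can be sketched as follows. This is a toy illustration only: the functions are hypothetical, they do not use the real ckanext-harvest interfaces, and a dictionary stands in for the remote server::

    import xml.etree.ElementTree as ET

    # Stand-in for a remote WAF-style server (made-up URLs and records).
    FAKE_SERVER = {
        "http://example.com/waf/": [
            "http://example.com/waf/a.xml",
            "http://example.com/waf/b.xml",
        ],
        "http://example.com/waf/a.xml": "<record><title>Rivers</title></record>",
        "http://example.com/waf/b.xml": "<record></record>",
    }


    def gather_stage(source_url):
        """Ask the source which metadata URLs need harvesting."""
        return list(FAKE_SERVER[source_url])


    def fetch_stage(record_url):
        """Download the metadata document for one URL."""
        return FAKE_SERVER[record_url]


    def import_stage(xml_text):
        """Validate a record and convert it to a package-like dict."""
        title = ET.fromstring(xml_text).findtext("title")
        if not title:
            return None  # toy validation: a record must have a title
        return {"title": title}


    packages = []
    for url in gather_stage("http://example.com/waf/"):
        package = import_stage(fetch_stage(url))
        if package is not None:
            packages.append(package)

    print(packages)
    # prints [{'title': 'Rivers'}]

Keeping the stages separate means a failure in one record's fetch or import does not affect the other records that were gathered.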
CswService is a client that lets Python software (such as the CSW Harvester in ckanext-inspire) access a CSW server conveniently, using the same three methods that the CSW Server supports. It wraps OWSLib's CSW client and handles the details of the calls and responses, which makes it simpler to use than OWSLib on its own.
This library can validate metadata records. It currently supports ISO19139 / INSPIRE / GEMINI2 formats, validating them with XSD and Schematron schemas. It is easily extensible.
To specify which validators to use during harvesting, specify their names in CKAN config. e.g.::
When ckanext-csw is installed, it provides a command-line tool, ``cswinfo``, which queries CSW servers and returns the results as nicely formatted JSON. This may be more convenient than using, for example, curl.
Currently available queries are:
* getcapabilities
* getidentifiers
* getrecords
* getrecordbyid
For details, type::

    cswinfo csw -h
There are options for querying by only certain types, keywords and typenames as well as configuring the ElementSetName.
The equivalent of the example above for requesting the capabilities is::
NOTE: The ISO19139 XSD Validator requires system library ``libxml2`` v2.9 (released Sept 2012). If you intend to use this validator then see the section below about installing libxml2.
Setup
=====
Install Python
--------------
Install this extension into your Python environment (where CKAN is also installed) in the normal way::
`cswserver` requires that ckanext-harvest is also installed (and enabled) - see https://github.com/okfn/ckanext-harvest
The components of this extension require several Python modules. To install them all, use::

    (pyenv) $ pip install -r pip-requirements.txt
Install System Packages
-----------------------
Some system packages are also required:
* PostGIS must be installed and the database must have spatial features enabled in order to use Spatial Search. See the "Setting up PostGIS" section for details.
* The Validator for ISO19139 requires a particular version of libxml2 to be installed. See "Installing libxml2" for full details.
Configuration
-------------
Once PostGIS is installed and configured in your database (see the "Setting up PostGIS" section for details), you need to create some DB tables for the spatial search by running the following command (with your Python environment activated)::
The Dataset Extent Map displays only on certain routes. By default it is just the 'Package' controller, 'read' method. To display it on other routes you can specify it in a space separated list like this::
The Dataset Extent Map provides two different map types. It defaults to 'osm' but if you have a license and apikey for 'os' then you can use that map type using this configuration::
The Dataset Extent Map will be inserted by default at the end of the dataset page. This can be changed by supplying an alternative element_id to the default::
If using Spatial Query functionality then there is an additional SOLR/Lucene setting that should be used to set the limit on number of datasets searchable with a spatial value.
The setting is ``maxBooleanClauses`` in the solrconfig.xml and the value is the number of datasets spatially searchable. The default is ``1024`` and this could be increased to, say, ``16384``. For a SOLR single core this will probably be at `/etc/solr/conf/solrconfig.xml`. For a multiple core set-up, there will be several solrconfig.xml files a couple of levels below `/etc/solr`. In that case, *ALL* of the cores' `solrconfig.xml` files should have this setting at the new value.
Example::

    <maxBooleanClauses>16384</maxBooleanClauses>
This setting is needed because PostGIS spatial query results are fed into SOLR using a Boolean expression, and the parser for that expression has a limit on the number of clauses. If your spatial area contains more datasets than the limit (1024 by default), you will get this error::

    Dataset search error: ('SOLR returned an error running query...
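To check the configured limit before hitting this error, a short script can read it from a ``solrconfig.xml``. This is a minimal sketch under assumptions: the sample document is made up and heavily trimmed, the helper names are hypothetical, and the fallback of 1024 is the default quoted above::

    import xml.etree.ElementTree as ET

    DEFAULT_LIMIT = 1024  # default stated in this section

    # Made-up, trimmed solrconfig.xml content for illustration.
    SAMPLE_SOLRCONFIG = """<config>
      <query>
        <maxBooleanClauses>16384</maxBooleanClauses>
      </query>
    </config>"""


    def max_boolean_clauses(solrconfig_xml):
        """Return the configured limit, or the default if it is not set."""
        element = ET.fromstring(solrconfig_xml).find(".//maxBooleanClauses")
        if element is None or not (element.text or "").strip():
            return DEFAULT_LIMIT
        return int(element.text)


    def fits_within_limit(dataset_count, solrconfig_xml):
        """True if a spatial result of this size stays within the limit."""
        return dataset_count <= max_boolean_clauses(solrconfig_xml)


    print(max_boolean_clauses(SAMPLE_SOLRCONFIG))      # prints 16384
    print(fits_within_limit(2000, "<config/>"))        # prints False

For a multiple core set-up, the same check would need to be run against each core's file.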
The spatial model has not been loaded. You probably forgot to add the `spatial_metadata` plugin to your ini configuration file.
::

    InternalError: (InternalError) Operation on two geometries with different SRIDs
The spatial reference system of the database geometry column and the one used by CKAN differ. Remember, if you are using a different spatial reference system from the default one (WGS 84 lat/lon, EPSG:4326), you must define it in the configuration file as follows::
In some places in this extension, ALL exceptions are caught and reported as errors. Since these could be basic coding errors, you can aid debugging during development by requesting that exceptions be re-raised, which is done by setting the DEBUG environment variable::
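The pattern described here can be sketched in Python as follows. This is an illustration and not the extension's actual code: ``run_reporting_errors`` is a hypothetical helper, and treating any non-empty ``DEBUG`` value as "set" is an assumption::

    import os


    def run_reporting_errors(func, errors):
        """Call func(); report any exception, or re-raise it when DEBUG is set."""
        try:
            return func()
        except Exception as exc:
            if os.environ.get("DEBUG"):
                raise  # development: surface the original traceback
            errors.append(str(exc))  # normal operation: report and carry on
            return None


    errors = []
    os.environ.pop("DEBUG", None)
    run_reporting_errors(lambda: 1 / 0, errors)
    print(errors)
    # prints ['division by zero']

With ``DEBUG`` set in the environment, the same call raises ``ZeroDivisionError`` instead of appending to ``errors``.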
Version 2.9 is required for the ISO19139 XSD validation.
With CKAN you have probably installed an older version from your distribution (e.g. with ``sudo apt-get install libxml2-dev``). You need to find the SO files for the old version::

    $ find /usr -name "libxml2.so"

For example, it may show it here: ``/usr/lib/x86_64-linux-gnu/libxml2.so``. The directory containing the SO file is passed as a parameter to ``configure`` in the next step.