===========
CSW support
===========

The extension provides support for the CSW_ standard, a specification from
the Open Geospatial Consortium for exposing geospatial catalogues over the web.
This support consists of:

* Ability to import records from CSW servers with the CSW harvester. See
  :doc:`harvesters` for more details.

* Integration with pycsw_ to provide a fully compliant CSW interface for
  harvested records. This integration is described in the following sections.

ckan-pycsw
----------

The spatial extension offers the ``ckan-pycsw`` command, which lets you expose
the spatial datasets harvested from other sources through a CSW interface. This
is powered by pycsw_, which fully implements the OGC CSW specification.

How it works
++++++++++++

The current implementation is based on CKAN and pycsw being loosely integrated
via the CKAN API. pycsw will generally be installed on the same server as CKAN
(although it can also run on a separate one), and the synchronization
command will be run regularly to keep the records in the pycsw repository up to
date. The command uses the CKAN API to get all the dataset identifiers (more
precisely, those of datasets that have been harvested) and then decides
which ones need to be created, updated or deleted in the pycsw repository. For
those that need to be created or updated, the original harvested spatial
document (i.e. ISO 19139) is requested from CKAN, and it is then imported using
pycsw's internal functions::

    Harvested
    datasets
        +
        |
        v
    +--------+                 +---------+
    |        |    CKAN API     |         |
    |  CKAN  | +------------>  |  pycsw  | +------> CSW
    |        |                 |         |
    +--------+                 +---------+

Remember, only datasets that were harvested with the :doc:`harvesters`
can currently be exposed via pycsw.
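
The create/update/delete decision described above can be sketched as a plain
set comparison. This is only an illustration, not the code that ``ckan-pycsw``
runs: the function name ``plan_sync`` and the use of a last-modified string as
the change marker are assumptions made for the example.

```python
def plan_sync(ckan_records, pycsw_records):
    """Decide which records to create, update or delete in pycsw.

    Both arguments map a record identifier to a value that changes whenever
    the record changes (here a hypothetical last-modified date string).
    """
    ckan_ids = set(ckan_records)
    pycsw_ids = set(pycsw_records)

    # In CKAN but not yet in pycsw: create
    to_create = sorted(ckan_ids - pycsw_ids)
    # In pycsw but no longer in CKAN: delete
    to_delete = sorted(pycsw_ids - ckan_ids)
    # In both, but the change marker differs: update
    to_update = sorted(
        identifier
        for identifier in ckan_ids & pycsw_ids
        if ckan_records[identifier] != pycsw_records[identifier]
    )
    return {"create": to_create, "update": to_update, "delete": to_delete}


if __name__ == "__main__":
    ckan = {"a": "2021-01-01", "b": "2021-02-01", "c": "2021-03-01"}
    pycsw = {"b": "2021-01-15", "c": "2021-03-01", "d": "2020-12-01"}
    print(plan_sync(ckan, pycsw))
```

Records that exist on both sides with an unchanged marker (``c`` in the sample
data) are left alone, which is what makes repeated runs cheap.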

All necessary tasks are done with the ``ckan-pycsw`` command. To get more
details of its usage, run the following::

    cd /usr/lib/ckan/default/src/ckanext-spatial
    python bin/ckan_pycsw.py --help

Setup
+++++

1. Install pycsw. There are several options for this, depending on your
   server setup; check the `pycsw documentation`_.

   .. note:: CKAN integration requires at least pycsw version 1.8.0. In
      general, use the latest stable version.

   The following instructions assume that you have installed CKAN via a
   `package install`_ and that you run the commands as root. The steps are the
   same if you are setting it up in another location::

       cd /usr/lib/ckan/default/src
       source ../bin/activate

       # From now on the virtualenv should be activated

       git clone https://github.com/geopython/pycsw.git
       cd pycsw
       # always use the latest stable version
       git checkout 1.10.4
       pip install -e .
       python setup.py build
       python setup.py install

2. Create a database for pycsw. In theory you can use the same database that
   CKAN is using, but if you want to keep them separate, use the following
   command to create a new one (we'll use the same default user though)::

       sudo -u postgres createdb -O ckan_default pycsw -E utf-8

   It is strongly recommended that you install PostGIS in the pycsw database,
   so that pycsw can use its spatial functions.

3. Configure pycsw. An example configuration file is included in the source::

       cp default-sample.cfg default.cfg

   To keep things tidy we will create a symlink to this file in the CKAN
   configuration directory::

       ln -s /usr/lib/ckan/default/src/pycsw/default.cfg /etc/ckan/default/pycsw.cfg

   Open the file with your favourite editor. The main settings you should tweak
   are ``server.home`` and ``repository.database``::

       [server]
       home=/usr/lib/ckan/default/src/pycsw
       ...
       [repository]
       database=postgresql://ckan_default:pass@localhost/pycsw

   The rest of the options are described in the
   `pycsw configuration documentation <http://docs.pycsw.org/en/latest/configuration.html>`_.

4. Set up the pycsw table. This is done with the ``ckan-pycsw`` script
   (remember to have the virtualenv activated when running it)::

       cd /usr/lib/ckan/default/src/ckanext-spatial
       python bin/ckan_pycsw.py setup -p /etc/ckan/default/pycsw.cfg

   At this point you should be ready to run pycsw with the wsgi script that it
   includes::

       cd /usr/lib/ckan/default/src/pycsw
       python csw.wsgi

   This will run pycsw at http://localhost:8000. Visiting the following URL
   should return the Capabilities document:

   http://localhost:8000/?service=CSW&version=2.0.2&request=GetCapabilities

5. Load the CKAN datasets into pycsw. Again, we will use the ``ckan-pycsw``
   command for this::

       cd /usr/lib/ckan/default/src/ckanext-spatial
       python bin/ckan_pycsw.py load -p /etc/ckan/default/pycsw.cfg

   When the loading is finished, check that results are returned when visiting
   this link:

   http://localhost:8000/?request=GetRecords&service=CSW&version=2.0.2&resultType=results&outputSchema=http://www.isotc211.org/2005/gmd&typeNames=csw:Record&elementSetName=summary

   The ``numberOfRecordsMatched`` should match the number of harvested datasets
   in CKAN (minus import errors). If you run the command again, new or updated
   datasets will be synchronized, and datasets deleted from CKAN will be removed
   from pycsw as well.
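
The two checks in steps 4 and 5 can also be scripted. The sketch below builds
the request URLs and reads ``numberOfRecordsMatched`` from a GetRecords
response. The helper names (``csw_url``, ``records_matched``) and the sample
response are made up for illustration; a real response from pycsw contains
more elements and would be fetched over HTTP rather than embedded as a string.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespace used by CSW 2.0.2 responses
CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"


def csw_url(base_url, request, **extra):
    """Build a CSW 2.0.2 key-value-pair request URL."""
    params = {"service": "CSW", "version": "2.0.2", "request": request}
    params.update(extra)
    return base_url + "?" + urlencode(params)


def records_matched(xml_text):
    """Return numberOfRecordsMatched from a GetRecords response document."""
    root = ET.fromstring(xml_text)
    results = root.find("{%s}SearchResults" % CSW_NS)
    if results is None:
        raise ValueError("no csw:SearchResults element in response")
    return int(results.get("numberOfRecordsMatched"))


# Hypothetical, heavily trimmed response used only to exercise the parser
SAMPLE_RESPONSE = """
<csw:GetRecordsResponse xmlns:csw="http://www.opengis.net/cat/csw/2.0.2">
  <csw:SearchStatus timestamp="2021-05-28T13:27:12Z"/>
  <csw:SearchResults numberOfRecordsMatched="42"
                     numberOfRecordsReturned="10"
                     nextRecord="11"
                     elementSet="summary"/>
</csw:GetRecordsResponse>
"""

if __name__ == "__main__":
    print(csw_url("http://localhost:8000/", "GetCapabilities"))
    print(records_matched(SAMPLE_RESPONSE))
```

Comparing the returned count with the number of harvested datasets in CKAN
gives a quick sanity check after each load.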

Setting Service Metadata Keywords
+++++++++++++++++++++++++++++++++

The CSW standard allows administrators to set CSW service metadata. These
values can be set in the ``metadata:main`` section of the pycsw configuration.
If you would like the CSW service metadata keywords to reflect the CKAN
tags, run the following convenience command::

    cd /usr/lib/ckan/default/src/ckanext-spatial
    python bin/ckan_pycsw.py set_keywords -p /etc/ckan/default/pycsw.cfg

Note that you must have privileges to write to the pycsw configuration file.
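
To make the effect on the configuration file concrete, here is a rough sketch
of rewriting the keywords in an INI-style configuration with Python's
``configparser``. It is not the implementation of ``set_keywords``: the helper
name ``with_keywords`` is hypothetical, and the option name
``identification_keywords`` is assumed from pycsw's sample configuration, so
check it against your own ``pycsw.cfg``.

```python
import configparser
import io

# Trimmed, hypothetical configuration; a real pycsw.cfg has many more options
SAMPLE_CFG = """\
[server]
home=/usr/lib/ckan/default/src/pycsw

[metadata:main]
identification_keywords=catalogue,discovery

[repository]
database=postgresql://ckan_default:pass@localhost/pycsw
"""


def with_keywords(cfg_text, keywords):
    """Return the configuration text with the service keywords replaced."""
    # Interpolation is disabled so '%' characters in values are left alone
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    # Assumed option name, taken from pycsw's sample configuration
    parser.set("metadata:main", "identification_keywords", ",".join(keywords))
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    print(with_keywords(SAMPLE_CFG, ["geology", "hydrology"]))
```

Writing the result back to disk is the step that needs write access to the
configuration file.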

Running it on a production site
+++++++++++++++++++++++++++++++

On a production site you probably want to run the load command regularly to
keep CKAN and pycsw in sync, and serve pycsw with Apache + mod_wsgi like CKAN.

* To run the load command regularly you can set up a cron job. Type ``crontab -e``
  and copy the following lines::

    # m h dom mon dow command
    0 * * * * /usr/lib/ckan/default/bin/python /usr/lib/ckan/default/src/ckanext-spatial/bin/ckan_pycsw.py load -p /etc/ckan/default/pycsw.cfg

  This particular example will run the load command every hour. You can of
  course change this frequency, for instance running it less often on very
  large instances. This `Wikipedia page <http://en.wikipedia.org/wiki/Cron#CRON_expression>`_
  has a good overview of the crontab syntax.

* To run pycsw under Apache check the pycsw `installation documentation <http://docs.pycsw.org/en/latest/installation.html#running-on-wsgi>`_
  or follow these quick steps (they assume the paths used in previous steps):

  - Edit ``/etc/apache2/sites-available/ckan_default`` and add the following
    line just before the existing ``WSGIScriptAlias`` directive::

      WSGIScriptAlias /csw /usr/lib/ckan/default/src/pycsw/csw.wsgi

  - Edit the ``/usr/lib/ckan/default/src/pycsw/csw.wsgi`` file and add these
    lines just after the imports at the top of the file::

      activate_this = '/usr/lib/ckan/default/bin/activate_this.py'
      # execfile() only exists on Python 2; exec() works on Python 2 and 3
      with open(activate_this) as f:
          exec(f.read(), {"__file__": activate_this})

    These lines activate the virtualenv in which pycsw was installed.

  - Restart Apache::

      service apache2 restart

  pycsw should now be accessible at http://localhost/csw

.. _pycsw: http://pycsw.org
.. _pycsw documentation: http://docs.pycsw.org/en/latest/installation.html
.. _package install: http://docs.ckan.org/en/latest/install-from-package.html
.. _CSW: http://www.opengeospatial.org/standards/cat