[#24] Document CKAN-pycsw integration
An overview of how it works, how to set it up and how to deploy it on a production server. Conflicts: README.rst
parent
b2a653d90f
commit
b2759a83c4
180
README.rst
|
@ -11,7 +11,7 @@ The following plugins are currently available:
|
||||||
* `Harvest Metadata API`_ - a way for a user to view the harvested metadata XML, either as a raw file or styled to view in a web browser. (``spatial_harvest_metadata_api``)
|
* `Harvest Metadata API`_ - a way for a user to view the harvested metadata XML, either as a raw file or styled to view in a web browser. (``spatial_harvest_metadata_api``)
|
||||||
* `GeoJSON Preview`_ - a GeoJSON previewer (``geojson_preview``).
|
* `GeoJSON Preview`_ - a GeoJSON previewer (``geojson_preview``).
|
||||||
* `WMS Preview`_ - a Web Map Service (WMS) previewer (``wms_preview``).
|
* `WMS Preview`_ - a Web Map Service (WMS) previewer (``wms_preview``).
|
||||||
* `CSW Server`_ - a basic CSW server - to server metadata from the CKAN instance (``cswserver``)
|
* `CSW Server`_ - a basic CSW server - to serve metadata from the CKAN instance (``cswserver``). **Deprecated:** Please see `ckan-pycsw`_.
|
||||||
|
|
||||||
These snippets (to be used with CKAN>=2.0):
|
These snippets (to be used with CKAN>=2.0):
|
||||||
|
|
||||||
|
@ -26,8 +26,10 @@ These libraries:
|
||||||
|
|
||||||
And these command-line tools:
|
And these command-line tools:
|
||||||
|
|
||||||
|
* `ckan-pycsw`_ - a command for integrating CKAN with `pycsw <http://pycsw.org>`_, a fully compliant CSW server.
|
||||||
* `cswinfo`_ - a command-line tool to help making requests of any CSW server
|
* `cswinfo`_ - a command-line tool to help making requests of any CSW server
|
||||||
|
|
||||||
|
|
||||||
As of October 2012, ckanext-csw and ckanext-inspire were merged into this extension.
|
As of October 2012, ckanext-csw and ckanext-inspire were merged into this extension.
|
||||||
|
|
||||||
About the components
|
About the components
|
||||||
|
@ -313,12 +315,181 @@ from the spatial extension).
|
||||||
When the plugin is enabled, if datasets contain a resource that has 'gjson' or 'geojson'
|
When the plugin is enabled, if datasets contain a resource that has 'gjson' or 'geojson'
|
||||||
format, the resource page will load simple map viewer that will show the features on a map.
|
format, the resource page will load simple map viewer that will show the features on a map.
|
||||||
|
|
||||||
|
|
||||||
.. _resource_proxy: http://docs.ckan.org/en/latest/data-viewer.html#viewing-remote-resources-the-resource-proxy
|
.. _resource_proxy: http://docs.ckan.org/en/latest/data-viewer.html#viewing-remote-resources-the-resource-proxy
|
||||||
|
|
||||||
|
|
||||||
|
ckan-pycsw
|
||||||
|
----------
|
||||||
|
|
||||||
|
The spatial extension offers the ``ckan-pycsw`` command, which lets you expose
|
||||||
|
the spatial datasets harvested from other sources in a CSW interface. This is
|
||||||
|
powered by `pycsw <http://pycsw.org>`_, which fully implements the OGC CSW
|
||||||
|
specification.
|
||||||
|
|
||||||
|
How it works
|
||||||
|
++++++++++++
|
||||||
|
|
||||||
|
|
||||||
|
The current implementation is based on CKAN and pycsw being loosely integrated
|
||||||
|
via the CKAN API. pycsw will generally be installed on the same server as CKAN
|
||||||
|
(although it can also be run on a separate one), and the synchronization
|
||||||
|
command will be run regularly to keep the records on the pycsw repository up to
|
||||||
|
date. This is done using the CKAN API to get all the dataset identifiers (more
|
||||||
|
precisely the ones from datasets that have been harvested) and then deciding
|
||||||
|
which ones need to be created, updated or deleted on the pycsw repository. For
|
||||||
|
those that need to be created or updated, the original harvested spatial
|
||||||
|
document (i.e. ISO 19139) is requested from CKAN, and it is then imported using
|
||||||
|
pycsw internal functions::
|
||||||
|
|
||||||
|
Harvested
|
||||||
|
datasets
|
||||||
|
+
|
||||||
|
|
|
||||||
|
v
|
||||||
|
+--------+ +---------+
|
||||||
|
| | CKAN API | |
|
||||||
|
| CKAN | +------------> | pycsw | +------> CSW
|
||||||
|
| | | |
|
||||||
|
+--------+ +---------+
|
||||||
|
|
||||||
|
|
||||||
|
Remember, only datasets that were harvested with the `Spatial Harvesters`_
|
||||||
|
can currently be exposed via pycsw.
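The create/update/delete decision described above can be sketched in a few lines of Python. This is only an illustration, not the code that ``ckan-pycsw`` runs: the function name ``plan_sync`` is hypothetical, and comparing a per-dataset modification marker to detect updates is an assumption, since the text does not say how changed records are detected.

```python
def plan_sync(ckan_records, pycsw_records):
    """Decide which records to create, update or delete in pycsw.

    Both arguments map a dataset identifier to a modification marker
    (any comparable value, e.g. a timestamp string). Using such a
    marker to detect updates is an assumption of this sketch.
    """
    ckan_ids = set(ckan_records)
    pycsw_ids = set(pycsw_records)

    # In CKAN but not yet in pycsw: must be created.
    to_create = sorted(ckan_ids - pycsw_ids)
    # In pycsw but no longer in CKAN: must be deleted.
    to_delete = sorted(pycsw_ids - ckan_ids)
    # In both, but with a different marker: must be updated.
    to_update = sorted(
        identifier
        for identifier in ckan_ids & pycsw_ids
        if ckan_records[identifier] != pycsw_records[identifier]
    )
    return {"create": to_create, "update": to_update, "delete": to_delete}


if __name__ == "__main__":
    # Made-up identifiers and dates, for illustration only.
    ckan = {"a": "2013-05-01", "b": "2013-05-03", "c": "2013-05-02"}
    pycsw = {"b": "2013-05-01", "c": "2013-05-02", "d": "2013-04-30"}
    print(plan_sync(ckan, pycsw))
```

Working with sets of identifiers keeps the comparison cheap: only the records listed under ``create`` and ``update`` would need their full ISO document fetched from CKAN.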
|
||||||
|
|
||||||
|
All necessary tasks are done with the ``ckan-pycsw`` command. To get more
|
||||||
|
details of its usage, run the following::
|
||||||
|
|
||||||
|
cd /usr/lib/ckan/default/src/ckanext-spatial
|
||||||
|
paster ckan-pycsw --help
|
||||||
|
|
||||||
|
|
||||||
|
Setup
|
||||||
|
+++++
|
||||||
|
|
||||||
|
1. Install pycsw. There are several options for this, depending on your
|
||||||
|
server setup; check the `pycsw documentation <http://pycsw.org/docs/installation.html>`_.
|
||||||
|
|
||||||
|
The following instructions assume that you have installed CKAN via a
|
||||||
|
`package install <http://docs.ckan.org/en/latest/install-from-package.html>`_
|
||||||
|
and should be run as root, but the steps are the same if you are setting
|
||||||
|
it up in another location::
|
||||||
|
|
||||||
|
cd /usr/lib/ckan/default/src
|
||||||
|
source ../bin/activate
|
||||||
|
|
||||||
|
# From now on the virtualenv should be activated
|
||||||
|
|
||||||
|
git clone https://github.com/geopython/pycsw.git
|
||||||
|
cd pycsw
|
||||||
|
pip install -e . && pip install -r requirements.txt
|
||||||
|
python setup.py build
|
||||||
|
python setup.py install
|
||||||
|
|
||||||
|
2. Create a database for pycsw. In theory you can use the same database that
|
||||||
|
CKAN is using, but if you want to keep them separated, use the following
|
||||||
|
command to create a new one (we'll use the same default user though)::
|
||||||
|
|
||||||
|
sudo -u postgres createdb -O ckan_default pycsw -E utf-8
|
||||||
|
|
||||||
|
3. Configure pycsw. An example configuration file is included in the source::
|
||||||
|
|
||||||
|
cp default-sample.cfg default.cfg
|
||||||
|
|
||||||
|
To keep things tidy we will create a symlink to this file in the CKAN
|
||||||
|
configuration directory::
|
||||||
|
|
||||||
|
ln -s /usr/lib/ckan/default/src/pycsw/default.cfg /etc/ckan/default/pycsw.cfg
|
||||||
|
|
||||||
|
Open the file with your favourite editor. The main settings you should tweak
|
||||||
|
are ``server.home`` and ``repository.database``::
|
||||||
|
|
||||||
|
[server]
|
||||||
|
home=/usr/lib/ckan/default/src/pycsw
|
||||||
|
...
|
||||||
|
[repository]
|
||||||
|
database=postgresql://ckan_default:pass@localhost/pycsw
|
||||||
|
|
||||||
|
The rest of the options are described `here <http://pycsw.org/docs/configuration.html>`_.
|
||||||
|
|
||||||
|
4. Set up the pycsw table. This is done with the ``ckan-pycsw`` paster command
|
||||||
|
(remember to have the virtualenv activated when running it)::
|
||||||
|
|
||||||
|
cd /usr/lib/ckan/default/src/ckanext-spatial
|
||||||
|
paster ckan-pycsw setup -p /etc/ckan/default/pycsw.cfg
|
||||||
|
|
||||||
|
At this point you should be ready to run pycsw with the wsgi script that it
|
||||||
|
includes::
|
||||||
|
|
||||||
|
cd /usr/lib/ckan/default/src/pycsw
|
||||||
|
python csw.wsgi
|
||||||
|
|
||||||
|
This will run pycsw at http://localhost:8000. Visiting the following URL
|
||||||
|
should return the Capabilities file:
|
||||||
|
|
||||||
|
http://localhost:8000?service=CSW&version=2.0.2&request=GetCapabilities
|
||||||
|
|
||||||
|
5. Load the CKAN datasets into pycsw. Again we will use the ``ckan-pycsw``
|
||||||
|
command for this::
|
||||||
|
|
||||||
|
cd /usr/lib/ckan/default/src/ckanext-spatial
|
||||||
|
paster ckan-pycsw load -p /etc/ckan/default/pycsw.cfg
|
||||||
|
|
||||||
|
When the loading is finished, check that results are returned when visiting
|
||||||
|
this link:
|
||||||
|
|
||||||
|
http://localhost:8000/?request=GetRecords&service=CSW&version=2.0.2&resultType=results&outputSchema=http://www.isotc211.org/2005/gmd&typeNames=csw:Record&elementSetName=summary
|
||||||
|
|
||||||
|
The ``numberOfRecordsMatched`` should match the number of harvested datasets
|
||||||
|
in CKAN (minus import errors). If you run the command again, new or updated
|
||||||
|
datasets will be synchronized, and datasets deleted from CKAN will be removed
|
||||||
|
from pycsw as well.
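The ``numberOfRecordsMatched`` check in step 5 can also be scripted. The following is a minimal Python 3 sketch (an assumption: the commands above target a Python 2 era CKAN). The helper names and the sample response are made up for illustration; the sample only mimics the shape of a CSW 2.0.2 ``GetRecordsResponse``, and fetching the URL over HTTP is left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespace used by CSW 2.0.2 responses.
CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"

# Made-up response body, trimmed to the element this check needs.
SAMPLE = """<csw:GetRecordsResponse
    xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" version="2.0.2">
  <csw:SearchStatus timestamp="2013-05-01T12:00:00Z"/>
  <csw:SearchResults numberOfRecordsMatched="42"
      numberOfRecordsReturned="10" nextRecord="11" elementSet="summary"/>
</csw:GetRecordsResponse>"""


def get_records_url(base_url):
    """Build the GetRecords URL from step 5 for a given pycsw endpoint."""
    params = [
        ("request", "GetRecords"),
        ("service", "CSW"),
        ("version", "2.0.2"),
        ("resultType", "results"),
        ("outputSchema", "http://www.isotc211.org/2005/gmd"),
        ("typeNames", "csw:Record"),
        ("elementSetName", "summary"),
    ]
    # urlencode percent-encodes ':' and '/' in the parameter values.
    return base_url.rstrip("/") + "/?" + urlencode(params)


def records_matched(response_xml):
    """Return numberOfRecordsMatched from a GetRecords response body."""
    root = ET.fromstring(response_xml)
    results = root.find("{%s}SearchResults" % CSW_NS)
    if results is None:
        raise ValueError("no csw:SearchResults element in the response")
    return int(results.attrib["numberOfRecordsMatched"])


if __name__ == "__main__":
    print(get_records_url("http://localhost:8000"))
    print(records_matched(SAMPLE))
```

Comparing the returned number with the count of harvested datasets in CKAN gives a quick sanity check after each load.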
|
||||||
|
|
||||||
|
Running it on a production site
|
||||||
|
+++++++++++++++++++++++++++++++
|
||||||
|
|
||||||
|
On a production site you probably want to run the load command regularly to
|
||||||
|
keep CKAN and pycsw in sync, and serve pycsw with Apache + mod_wsgi like CKAN.
|
||||||
|
|
||||||
|
* To run the load command regularly you can set up a cron job. Type ``crontab -e``
|
||||||
|
and copy the following lines::
|
||||||
|
|
||||||
|
# m h dom mon dow command
|
||||||
|
0 * * * * /usr/lib/ckan/default/bin/paster --plugin=ckanext-spatial ckan-pycsw load -p /etc/ckan/default/pycsw.cfg
|
||||||
|
|
||||||
|
This particular example will run the load command every hour. You can of
|
||||||
|
course change this schedule, for instance running it less often on huge instances.
|
||||||
|
This `Wikipedia page <http://en.wikipedia.org/wiki/Cron#CRON_expression>`_
|
||||||
|
has a good overview of the crontab syntax.
|
||||||
|
|
||||||
|
* To run pycsw under Apache, check the pycsw `installation documentation <http://pycsw.org/docs/installation.html#running-on-wsgi>`_
|
||||||
|
or follow these quick steps (they assume the paths used in the previous steps):
|
||||||
|
|
||||||
|
- Edit ``/etc/apache2/sites-available/ckan_default`` and add the following
|
||||||
|
line just before the existing ``WSGIScriptAlias`` directive::
|
||||||
|
|
||||||
|
WSGIScriptAlias /csw /usr/lib/ckan/default/src/pycsw/csw.wsgi
|
||||||
|
|
||||||
|
- Edit the ``/usr/lib/ckan/default/src/pycsw/csw.wsgi`` file and add these two
|
||||||
|
lines just after the imports at the top of the file::
|
||||||
|
|
||||||
|
activate_this = os.path.join('/usr/lib/ckan/default/bin/activate_this.py')
|
||||||
|
execfile(activate_this, {"__file__":activate_this})
|
||||||
|
|
||||||
|
These lines activate the virtualenv into which we installed pycsw.
|
||||||
|
|
||||||
|
- Restart Apache::
|
||||||
|
|
||||||
|
service apache2 restart
|
||||||
|
|
||||||
|
pycsw should now be accessible at http://localhost/csw
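The two lines added to ``csw.wsgi`` in the steps above rely on ``execfile``, which exists only in Python 2. Should the script ever run under Python 3, a rough equivalent could look like the sketch below. The helper name ``run_file`` is hypothetical, and the sketch assumes the virtualenv still provides an ``activate_this.py`` script (environments created with ``python -m venv`` do not).

```python
import os
import tempfile


def run_file(path, global_vars):
    """Execute the Python file at `path` in the given namespace.

    A stand-in for Python 2's execfile(path, global_vars).
    """
    with open(path) as source:
        code = compile(source.read(), path, "exec")
    exec(code, global_vars)


if __name__ == "__main__":
    # Demonstrate with a throwaway script instead of a real virtualenv.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as tmp:
        tmp.write("seen_file = __file__\n")
    namespace = {"__file__": tmp.name}
    run_file(tmp.name, namespace)
    print(namespace["seen_file"] == tmp.name)
    os.remove(tmp.name)
```

In ``csw.wsgi`` the call would then take the place of the ``execfile`` line, for example ``run_file(activate_this, {"__file__": activate_this})``.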
|
||||||
|
|
||||||
|
|
||||||
CSW Server
|
CSW Server
|
||||||
----------
|
----------
|
||||||
|
|
||||||
|
.. note:: **Deprecated:** The old csw plugin has been deprecated; please see `ckan-pycsw`_
|
||||||
|
for details on how to integrate with pycsw.
|
||||||
|
|
||||||
CSW (Catalogue Service for the Web) is an OGC standard for a web interface that allows you to access metadata (which are records that describe data or services)
|
CSW (Catalogue Service for the Web) is an OGC standard for a web interface that allows you to access metadata (which are records that describe data or services)
|
||||||
|
|
||||||
The currently supported methods with this CSW Server are:
|
The currently supported methods with this CSW Server are:
|
||||||
|
@ -454,7 +625,7 @@ To specify which validators to use during harvesting, specify their names in CKA
|
||||||
cswinfo
|
cswinfo
|
||||||
-------
|
-------
|
||||||
|
|
||||||
When ckanext-csw is installed, it provides a command-line tool ``cswinfo``, for making queries on CSW servers and returns the info in nicely formatted JSON. This may be more convenient to type than using, for example, curl.
|
The command-line tool ``cswinfo`` lets you query CSW servers and returns the results as nicely formatted JSON. This may be more convenient to type than using, for example, curl.
|
||||||
|
|
||||||
Currently available queries are:
|
Currently available queries are:
|
||||||
* getcapabilities
|
* getcapabilities
|
||||||
|
@ -570,6 +741,9 @@ the EPSG code as an integer (e.g 4326, 4258, 27700, etc). It defaults to
|
||||||
Configuration - CSW Server
|
Configuration - CSW Server
|
||||||
--------------------------
|
--------------------------
|
||||||
|
|
||||||
|
.. note:: **Deprecated:** The old csw plugin has been deprecated; please see `ckan-pycsw`_
|
||||||
|
for details on how to integrate with pycsw.
|
||||||
|
|
||||||
Configure the CSW Server with the following keys in your CKAN config file (default values are shown)::
|
Configure the CSW Server with the following keys in your CKAN config file (default values are shown)::
|
||||||
|
|
||||||
cswservice.title = Untitled Service - set cswservice.title in config
|
cswservice.title = Untitled Service - set cswservice.title in config
|
||||||
|
|
|
@ -264,15 +264,18 @@ Manages the CKAN-pycsw integration
|
||||||
python ckan-pycsw.py load [-p] -u
|
python ckan-pycsw.py load [-p] -u
|
||||||
Loads CKAN datasets as records into the pycsw db.
|
Loads CKAN datasets as records into the pycsw db.
|
||||||
|
|
||||||
|
python ckan-pycsw.py clear [-p]
|
||||||
|
Removes all records from the pycsw table.
|
||||||
|
|
||||||
All commands require the pycsw configuration file. By default it will try
|
All commands require the pycsw configuration file. By default it will try
|
||||||
to find a file called 'default.cfg' in the same directory, but you'll
|
to find a file called 'default.cfg' in the same directory, but you'll
|
||||||
probably need to provide the actual location via the -p option:
|
probably need to provide the actual location via the -p option:
|
||||||
|
|
||||||
paster ckan-pycsw setup -p /etc/pycsw/default.cfg
|
paster ckan-pycsw setup -p /etc/ckan/default/pycsw.cfg
|
||||||
|
|
||||||
The load command requires a CKAN URL from where the datasets will be pulled:
|
The load command requires a CKAN URL from where the datasets will be pulled:
|
||||||
|
|
||||||
paster ckan-pycsw setup -p /etc/pycsw/default.cfg -u http://localhost
|
paster ckan-pycsw load -p /etc/ckan/default/pycsw.cfg -u http://localhost
|
||||||
|
|
||||||
'''
|
'''
|
||||||
|
|
||||||
|
@ -301,11 +304,11 @@ if __name__ == '__main__':
|
||||||
|
|
||||||
parser.add_argument('-p', '--pycsw_config',
|
parser.add_argument('-p', '--pycsw_config',
|
||||||
action='store', default='default.cfg',
|
action='store', default='default.cfg',
|
||||||
help='Path to pycsw config file')
|
help='pycsw config file to use.')
|
||||||
|
|
||||||
parser.add_argument('-u', '--ckan_url',
|
parser.add_argument('-u', '--ckan_url',
|
||||||
action='store',
|
action='store',
|
||||||
help='CKAN URL')
|
help='CKAN instance to import the datasets from.')
|
||||||
|
|
||||||
if len(sys.argv) <= 1:
|
if len(sys.argv) <= 1:
|
||||||
parser.print_usage()
|
parser.print_usage()
|
||||||
|
|
|
@ -14,17 +14,20 @@ class Pycsw(script.command.Command):
|
||||||
ckan-pycsw load [-p] [-u]
|
ckan-pycsw load [-p] [-u]
|
||||||
Loads CKAN datasets as records into the pycsw db.
|
Loads CKAN datasets as records into the pycsw db.
|
||||||
|
|
||||||
|
ckan-pycsw clear [-p]
|
||||||
|
Removes all records from the pycsw table.
|
||||||
|
|
||||||
All commands require the pycsw configuration file. By default it will try
|
All commands require the pycsw configuration file. By default it will try
|
||||||
to find a file called 'default.cfg' in the same directory, but you'll
|
to find a file called 'default.cfg' in the same directory, but you'll
|
||||||
probably need to provide the actual location with the -p option.
|
probably need to provide the actual location with the -p option.
|
||||||
|
|
||||||
paster ckan-pycsw setup -p /etc/pycsw/default.cfg
|
paster ckan-pycsw setup -p /etc/ckan/default/pycsw.cfg
|
||||||
|
|
||||||
The load command requires a CKAN URL from where the datasets will be pulled.
|
The load command requires a CKAN URL from where the datasets will be pulled.
|
||||||
By default it is set to 'http://localhost', but you can define it with the -u
|
By default it is set to 'http://localhost', but you can define it with the -u
|
||||||
option:
|
option:
|
||||||
|
|
||||||
paster ckan-pycsw setup -p /etc/pycsw/default.cfg -u http://ckan.instance.org
|
paster ckan-pycsw load -p /etc/ckan/default/pycsw.cfg -u http://ckan.instance.org
|
||||||
|
|
||||||
'''
|
'''
|
||||||
|
|
||||||
|
@ -32,7 +35,7 @@ option:
|
||||||
parser.add_option('-p', '--pycsw-config', dest='pycsw_config',
|
parser.add_option('-p', '--pycsw-config', dest='pycsw_config',
|
||||||
default='default.cfg', help='pycsw config file to use.')
|
default='default.cfg', help='pycsw config file to use.')
|
||||||
parser.add_option('-u', '--ckan-url', dest='ckan_url',
|
parser.add_option('-u', '--ckan-url', dest='ckan_url',
|
||||||
default='http://localhost', help='pycsw config file to use.')
|
default='http://localhost', help='CKAN instance to import the datasets from.')
|
||||||
|
|
||||||
summary = __doc__.split('\n')[0]
|
summary = __doc__.split('\n')[0]
|
||||||
usage = __doc__
|
usage = __doc__
|
||||||
|
|