================================================
ckanext-harvest - Remote harvesting extension
================================================

This extension will contain all harvesting-related code, which is
currently spread across CKAN core, ckanext-dgu and ckanext-csw.

Configuration
=============

Run the following command (in the ckanext-harvest directory) to create
the necessary tables in the database::

    paster harvester initdb --config=../ckan/development.ini

The extension needs a user with sysadmin privileges to perform the
harvesting jobs. You can create such a user by running these two commands
in the ckan directory::

    paster user add harvest
    paster sysadmin add harvest

The user's API key must be defined in the CKAN
configuration file (.ini) in the [app:main] section::

    ckan.harvest.api_key = 4e1dac58-f642-4e54-bbc4-3ea262271fe2

The API URL can also be defined in the ini file (it defaults to
http://localhost:5000/)::

    ckan.api_url = <api_url>
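
As a quick sanity check, the following sketch tests whether
``ckan.harvest.api_key`` is set inside the ``[app:main]`` section of an
ini file. The function name ``check_harvest_config`` is hypothetical and
not part of the extension; it only inspects the text of the file and does
not verify that the key belongs to a sysadmin user.

```shell
#!/bin/sh
# Hypothetical helper (not part of ckanext-harvest): succeed only if
# ckan.harvest.api_key has a non-empty value inside [app:main].
check_harvest_config() {
    ini_file="$1"
    awk '
        # Track whether the current section is [app:main].
        /^\[/ { in_main = ($0 ~ /^\[app:main\]/) }
        # Accept the key only when it appears in that section with a value.
        in_main && /^ckan\.harvest\.api_key[ \t]*=[ \t]*[^ \t]/ { found = 1 }
        END { exit(found ? 0 : 1) }
    ' "$ini_file"
}
```

Calling ``check_harvest_config ../ckan/development.ini`` returns a zero
exit status when the key is present and a non-zero status otherwise.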

Tests
=====

To run the tests, use this basic command::

    $ nosetests --ckan tests/

Or with postgres::

    $ nosetests --ckan --with-pylons=../ckan/test-core.ini tests/

(See the CKAN README for more information.)

Command line interface
======================

The following operations can be run from the command line using the
``paster harvester`` command::

      harvester initdb
        - Creates the necessary tables in the database

      harvester source {url} {type} [{active}] [{user-id}] [{publisher-id}]
        - create new harvest source

      harvester rmsource {id}
        - remove (inactivate) a harvester source

      harvester sources [all]
        - lists harvest sources
          If 'all' is defined, it also shows the Inactive sources

      harvester job {source-id}
        - create new harvest job

      harvester jobs
        - lists harvest jobs

      harvester run
        - runs harvest jobs

      harvester gather_consumer
        - starts the consumer for the gathering queue

      harvester fetch_consumer
        - starts the consumer for the fetching queue

The commands should be run from the ckanext-harvest directory and expect
a development.ini file to be present. Most of the time you will specify
the config explicitly though::

    paster harvester sources --config=../ckan/development.ini
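
Because nearly every invocation repeats the ``--config`` option, a small
shell wrapper can save typing. The sketch below is a convenience only and
is not shipped with the extension: the function name ``harvester`` and the
``HARVEST_CONFIG`` and ``DRY_RUN`` variables are hypothetical names chosen
for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of ckanext-harvest): appends the
# --config option to every "paster harvester" call.
# HARVEST_CONFIG and DRY_RUN are illustrative variable names.
HARVEST_CONFIG="${HARVEST_CONFIG:-../ckan/development.ini}"

harvester() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only print the command that would be executed.
        echo "paster harvester $* --config=$HARVEST_CONFIG"
    else
        paster harvester "$@" --config="$HARVEST_CONFIG"
    fi
}
```

With this in place, ``harvester sources all`` stands in for
``paster harvester sources all --config=../ckan/development.ini``, and
setting ``DRY_RUN=1`` prints the command instead of running it.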