Update README for 2.0

General clean-up, mention redis, new auth, run command

This commit is contained in:
parent 1714e55110
commit b316cc26a2

README.rst | 67
@@ -8,13 +8,17 @@ and adds a CLI and a WUI to CKAN to manage harvesting sources and jobs.
 Installation
 ============
 
-The harvest extension uses Message Queuing to handle the different gather
-stages.
+The harvest extension can use two different backends. You can choose whichever
+you prefer depending on your needs:
 
-You will need to install the RabbitMQ server::
+* `RabbitMQ <http://www.rabbitmq.com/>`_: To install it, run::
 
   sudo apt-get install rabbitmq-server
 
+* `Redis <http://redis.io/>`_: To install it, run::
+
+  sudo apt-get install redis-server
+
 Clone the repository and set up the extension::
 
   git clone https://github.com/okfn/ckanext-harvest
@@ -27,6 +31,11 @@ well as the harvester for CKAN instances (included with the extension)::
 
   ckan.plugins = harvest ckan_harvester
 
+Also define the backend that you are using with the ``ckan.harvest.mq.type``
+option (it defaults to ``rabbitmq``)::
+
+  ckan.harvest.mq.type = redis
+
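Taken together, a minimal plugin and backend configuration in the CKAN ini file might look like this (the ``[app:main]`` section name follows the standard CKAN config layout; the values shown are illustrative and should be adjusted to your deployment):

```ini
[app:main]
# Enable the extension and the built-in CKAN-to-CKAN harvester
ckan.plugins = harvest ckan_harvester
# Pick the messaging backend: 'rabbitmq' (the default) or 'redis'
ckan.harvest.mq.type = redis
```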
 
 Configuration
 =============
@@ -35,13 +44,7 @@ Run the following command to create the necessary tables in the database::
 
   paster --plugin=ckanext-harvest harvester initdb --config=mysite.ini
 
-The extension needs a user with sysadmin privileges to perform the
-harvesting jobs. You can create such a user running this command::
-
-  paster --plugin=ckan sysadmin add harvest
-
-After installation, the harvest interface should be available under /harvest
-if you're logged in with sysadmin permissions, eg.
+After installation, the harvest source listing should be available under /harvest, eg:
 
   http://localhost:5000/harvest
 
@@ -55,7 +58,7 @@ The following operations can be run from the command line using the
 harvester initdb
   - Creates the necessary tables in the database
 
-harvester source {url} {type} [{active}] [{user-id}] [{publisher-id}]
+harvester source {url} {type} [{config}] [{active}] [{user-id}] [{publisher-id}] [{frequency}]
   - create new harvest source
 
 harvester rmsource {id}
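For illustration, creating a source with the new signature could look like the following sketch (the URL is a hypothetical example, only the two required positional arguments are shown, and the optional ones would follow in the documented order):

```
paster --plugin=ckanext-harvest harvester source \
    http://demo.ckan.org/ ckan --config=mysite.ini
```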
@@ -80,16 +83,26 @@ The following operations can be run from the command line using the
 harvester fetch_consumer
   - starts the consumer for the fetching queue
 
-harvester import [{source-id}]
-  - perform the import stage with the last fetched objects, optionally
-    belonging to a certain source.
-    Please note that no objects will be fetched from the remote server.
-    It will only affect the last fetched objects already present in the
-    database.
+harvester purge_queues
+  - removes all jobs from the fetch and gather queues
+
+harvester [-j] [--segments={segments}] import [{source-id}]
+  - perform the import stage with the last fetched objects, optionally belonging to a certain source.
+    Please note that no objects will be fetched from the remote server. It will only affect
+    the last fetched objects already present in the database.
+
+    If the -j flag is provided, the objects are not joined to existing datasets. This may be useful
+    when importing objects for the first time.
+
+    The --segments flag allows you to define a string of hex digits that represents which of
+    the 16 harvest object segments to import, e.g. 15af will run segments 1, 5, a and f.
 
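As a rough sketch of how such a segments string selects segments (assuming the 16 segments are simply indexed 0-15 by hex digit, which mirrors the ``15af`` example above rather than the extension's internal hashing):

```shell
# Expand a --segments hex string into the segment indices it selects
segments="15af"
selected=""
for (( i=0; i<${#segments}; i++ )); do
    selected="$selected $((16#${segments:i:1}))"
done
echo "selected segments:$selected"
```

Under that assumption, ``15af`` would select segments 1, 5, 10 (``a``) and 15 (``f``).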
 harvester job-all
   - create new harvest jobs for all active sources.
 
+harvester reindex
+  - reindexes the harvest source datasets
+
 The commands should be run with the pyenv activated and refer to your site's configuration file (mysite.ini in this example)::
 
   paster --plugin=ckanext-harvest harvester sources --config=mysite.ini
@@ -97,8 +110,15 @@ The commands should be run with the pyenv activated and refer to your sites conf
 Authorization
 =============
 
-TODO
+Starting from CKAN 2.0, harvest sources behave exactly the same as datasets
+(they are actually internally implemented as a dataset type). That means that
+they can be searched and faceted, and that the same authorization rules can be
+applied to them. The default authorization settings are based on organizations
+(equivalent to the `publisher profile` found in old versions).
+
+Have a look at the `Authorization <http://docs.ckan.org/en/latest/authorization.html>`_
+documentation on CKAN core to see how to configure your instance depending on
+your needs.
 
 The CKAN harvester
 ===================
@@ -347,11 +367,12 @@ pending harvesting jobs::
 
   paster --plugin=ckanext-harvest harvester run --config=mysite.ini
 
-Note: If you don't have the `synchronous_search` plugin loaded, you will need
-to update the search index after the harvesting in order for the packages to
-appear in search results::
-
-  paster --plugin=ckan search-index rebuild
+The ``run`` command not only starts any pending harvesting jobs, but also
+flags those that are finished, allowing new jobs to be created on that particular
+source, and refreshes the source statistics. That means that you will need to run
+this command before being able to create a new job on a source that was being
+harvested (on a production site you will typically have a cron job that runs the
+command regularly, see the next section).
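Such a cron job might look something like this sketch (the 15-minute interval is an assumption, and a real crontab should use absolute paths for both ``paster`` and the config file):

```
# Every 15 minutes, run pending harvest jobs and flag finished ones
*/15 * * * * paster --plugin=ckanext-harvest harvester run --config=mysite.ini
```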
 
 Setting up the harvesters on a production server