diff --git a/README.rst b/README.rst
index 4bdd029..1789fb5 100644
--- a/README.rst
+++ b/README.rst
@@ -9,7 +9,7 @@ Dependencies
============
The harvest extension uses Message Queuing to handle the different gather
-stages.
+stages.
You will need to install the RabbitMQ server::
@@ -23,12 +23,12 @@ The extension uses `carrot` as messaging library::
Configuration
=============
-Run the following command (in the ckanext-harvest directory) to create
+Run the following command (in the ckanext-harvest directory) to create
the necessary tables in the database::
paster harvester initdb --config=../ckan/development.ini
-The extension needs a user with sysadmin privileges to perform the
+The extension needs a user with sysadmin privileges to perform the
harvesting jobs. You can create such a user running these two commands in
the ckan directory::
@@ -53,25 +53,25 @@ Or with postgres::
Command line interface
======================
-The following operations can be run from the command line using the
+The following operations can be run from the command line using the
``paster harvester`` command::
harvester initdb
- Creates the necessary tables in the database
- harvester source {url} {type} [{active}] [{user-id}] [{publisher-id}]
+ harvester source {url} {type} [{active}] [{user-id}] [{publisher-id}]
- create new harvest source
harvester rmsource {id}
- remove (inactivate) a harvester source
- harvester sources [all]
+ harvester sources [all]
- lists harvest sources
If 'all' is defined, it also shows the Inactive sources
harvester job {source-id}
- create new harvest job
-
+
harvester jobs
- lists harvest jobs
@@ -83,9 +83,9 @@ The following operations can be run from the command line using the
harvester fetch_consumer
- starts the consumer for the fetching queue
-
+
The commands should be run from the ckanext-harvest directory and expect
-a development.ini file to be present. Most of the time you will specify
+a development.ini file to be present. Most of the time you will specify
the config explicitly though::
paster harvester sources --config=../ckan/development.ini
@@ -103,18 +103,18 @@ Extensions can implement the harvester interface to perform harvesting
operations. The harvesting process takes place on three stages:
1. The **gather** stage compiles all the resource identifiers that need to
- be fetched in the next stage (e.g. in a CSW server, it will perform a
+ be fetched in the next stage (e.g. in a CSW server, it will perform a
`GetRecords` operation).
2. The **fetch** stage gets the contents of the remote objects and stores
- them in the database (e.g. in a CSW server, it will perform n
+ them in the database (e.g. in a CSW server, it will perform n
`GetRecordById` operations).
3. The **import** stage performs any necessary actions on the fetched
resource (generally creating a CKAN package, but it can be anything the
extension needs).
-Plugins willing to implement the harvesting interface must provide the
+Plugins willing to implement the harvesting interface must provide the
following methods::
from ckan.plugins.core import SingletonPlugin, implements
@@ -126,17 +126,32 @@ following methods::
'''
implements(IHarvester)
- def get_type(self):
+ def info(self):
'''
- Plugins must provide this method, which will return a string with the
- Harvester type implemented by the plugin (e.g ``CSW``,``INSPIRE``, etc).
- This will ensure that they only receive Harvest Jobs and Objects
- relevant to them.
+ Harvesting implementations must provide this method, which will return a
+ dictionary containing different descriptors of the harvester. The
+ returned dictionary should contain:
- returns: A string with the harvester type
+ * name: machine-readable name. This will be the value stored in the
+ database, and the one used by ckanext-harvest to call the appropriate
+ harvester.
+ * title: human-readable name. This will appear in the form's select box
+ in the WUI.
+ * description: a short description of what the harvester does. This will
+ appear on the form as guidance to the user.
+
+ A complete example may be::
+
+ {
+ 'name': 'csw',
+ 'title': 'CSW Server',
+ 'description': "A server that implements OGC's Catalog Service "
+ "for the Web (CSW) standard"
+ }
+
+ returns: A dictionary with the harvester descriptors
'''
-
def gather_stage(self, harvest_job):
'''
The gather stage will recieve a HarvestJob object and will be
@@ -172,7 +187,7 @@ following methods::
'''
The import stage will receive a HarvestObject object and will be
responsible for:
- - performing any necessary action with the fetched object (e.g
+ - performing any necessary action with the fetched object (e.g
create a CKAN package).
Note: if this stage creates or updates a package, a reference
to the package should be added to the HarvestObject.
@@ -196,7 +211,7 @@ Running the harvest jobs
The harvesting extension uses two different queues, one that handles the
gathering and another one that handles the fetching and importing. To start
-the consumers run the following command from the ckanext-harvest directory
+the consumers run the following command from the ckanext-harvest directory
(make sure you have your python environment activated)::
paster harvester gather_consumer --config=../ckan/development.ini
diff --git a/ckanext/harvest/controllers/view.py b/ckanext/harvest/controllers/view.py
index 1b0662f..1695a40 100644
--- a/ckanext/harvest/controllers/view.py
+++ b/ckanext/harvest/controllers/view.py
@@ -9,7 +9,7 @@ from ckan.logic import NotFound, ValidationError
from ckanext.harvest.logic.schema import harvest_source_form_schema
from ckanext.harvest.lib import create_harvest_source, edit_harvest_source, \
get_harvest_source, get_harvest_sources, \
- create_harvest_job, get_registered_harvesters_types
+ create_harvest_job, get_registered_harvesters_info
import logging
log = logging.getLogger(__name__)
@@ -39,7 +39,7 @@ class ViewController(BaseController):
errors = errors or {}
error_summary = error_summary or {}
#TODO: Use new description interface to build the types select and descriptions
- vars = {'data': data, 'errors': errors, 'error_summary': error_summary, 'types': get_registered_harvesters_types()}
+ vars = {'data': data, 'errors': errors, 'error_summary': error_summary, 'harvesters': get_registered_harvesters_info()}
c.form = render('source/new_source_form.html', extra_vars=vars)
return render('source/new.html')
@@ -61,7 +61,7 @@ class ViewController(BaseController):
abort(400, 'Integrity Error')
except ValidationError,e:
errors = e.error_dict
- error_summary = e.error_summary if 'error_summary' in e else None
+ error_summary = e.error_summary if hasattr(e,'error_summary') else None
return self.new(data_dict, errors, error_summary)
def edit(self, id, data = None,errors = None, error_summary = None):
@@ -79,7 +79,7 @@ class ViewController(BaseController):
errors = errors or {}
error_summary = error_summary or {}
#TODO: Use new description interface to build the types select and descriptions
- vars = {'data': data, 'errors': errors, 'error_summary': error_summary, 'types': get_registered_harvesters_types()}
+ vars = {'data': data, 'errors': errors, 'error_summary': error_summary, 'harvesters': get_registered_harvesters_info()}
c.form = render('source/new_source_form.html', extra_vars=vars)
return render('source/edit.html')
@@ -99,7 +99,7 @@ class ViewController(BaseController):
abort(404, _('Harvest Source not found'))
except ValidationError,e:
errors = e.error_dict
- error_summary = e.error_summary if 'error_summary' in e else None
+ error_summary = e.error_summary if hasattr(e,'error_summary') else None
return self.edit(id,data_dict, errors, error_summary)
def _check_data_dict(self, data_dict):
diff --git a/ckanext/harvest/harvesters.py b/ckanext/harvest/harvesters.py
index 22116c5..fabbd14 100644
--- a/ckanext/harvest/harvesters.py
+++ b/ckanext/harvest/harvesters.py
@@ -75,8 +75,12 @@ class CKANHarvester(SingletonPlugin):
err.save()
log.error(message)
- def get_type(self):
- return 'CKAN'
+ def info(self):
+ return {
+ 'name': 'ckan',
+ 'title': 'CKAN',
+ 'description': 'Harvests remote CKAN instances'
+ }
def gather_stage(self,harvest_job):
log.debug('In CKANHarvester gather_stage')
diff --git a/ckanext/harvest/interfaces.py b/ckanext/harvest/interfaces.py
index d59bbfb..9d47883 100644
--- a/ckanext/harvest/interfaces.py
+++ b/ckanext/harvest/interfaces.py
@@ -6,17 +6,32 @@ class IHarvester(Interface):
'''
- def get_type(self):
+ def info(self):
'''
- Plugins must provide this method, which will return a string with the
- Harvester type implemented by the plugin (e.g ``CSW``,``INSPIRE``, etc).
- This will ensure that they only receive Harvest Jobs and Objects
- relevant to them.
+ Harvesting implementations must provide this method, which will return a
+ dictionary containing different descriptors of the harvester. The
+ returned dictionary should contain:
- returns: A string with the harvester type
+ * name: machine-readable name. This will be the value stored in the
+ database, and the one used by ckanext-harvest to call the appropriate
+ harvester.
+ * title: human-readable name. This will appear in the form's select box
+ in the WUI.
+ * description: a short description of what the harvester does. This will
+ appear on the form as guidance to the user.
+
+ A complete example may be::
+
+ {
+ 'name': 'csw',
+ 'title': 'CSW Server',
+ 'description': "A server that implements OGC's Catalog Service "
+ "for the Web (CSW) standard"
+ }
+
+ returns: A dictionary with the harvester descriptors
'''
-
def gather_stage(self, harvest_job):
'''
The gather stage will recieve a HarvestJob object and will be
@@ -55,7 +70,7 @@ class IHarvester(Interface):
'''
The import stage will receive a HarvestObject object and will be
responsible for:
- - performing any necessary action with the fetched object (e.g
+ - performing any necessary action with the fetched object (e.g
create a CKAN package).
Note: if this stage creates or updates a package, a reference
to the package should be added to the HarvestObject.
diff --git a/ckanext/harvest/lib/__init__.py b/ckanext/harvest/lib/__init__.py
index 30b94b3..d6fa701 100644
--- a/ckanext/harvest/lib/__init__.py
+++ b/ckanext/harvest/lib/__init__.py
@@ -196,7 +196,6 @@ def _prettify(field_name):
return field_name.replace('_', ' ')
def _error_summary(error_dict):
-
error_summary = {}
for key, error in error_dict.iteritems():
error_summary[_prettify(key)] = error[0]
@@ -373,7 +372,7 @@ def import_last_objects(source_id=None):
if obj.guid != last_obj_guid:
imported_objects.append(obj)
for harvester in PluginImplementations(IHarvester):
- if harvester.get_type() == obj.job.source.type:
+ if harvester.info()['name'] == obj.job.source.type:
if hasattr(harvester,'force_import'):
harvester.force_import = True
harvester.import_stage(obj)
@@ -381,9 +380,14 @@ def import_last_objects(source_id=None):
return imported_objects
-def get_registered_harvesters_types():
+def get_registered_harvesters_info():
# TODO: Use new description interface when implemented
- available_types = []
+ available_harvesters = []
for harvester in PluginImplementations(IHarvester):
- available_types.append(harvester.get_type())
- return available_types
+ info = harvester.info()
+ if not info or 'name' not in info:
+ log.error('Harvester %r does not provide the harvester name in the info response' % str(harvester))
+ continue
+ available_harvesters.append(info)
+
+ return available_harvesters
diff --git a/ckanext/harvest/logic/validators.py b/ckanext/harvest/logic/validators.py
index 85c1454..9f7343c 100644
--- a/ckanext/harvest/logic/validators.py
+++ b/ckanext/harvest/logic/validators.py
@@ -66,7 +66,12 @@ def harvest_source_type_exists(value,context):
# Get all the registered harvester types
available_types = []
for harvester in PluginImplementations(IHarvester):
- available_types.append(harvester.get_type())
+ info = harvester.info()
+ if not info or 'name' not in info:
+ log.error('Harvester %r does not provide the harvester name in the info response' % str(harvester))
+ continue
+ available_types.append(info['name'])
+
if not value in available_types:
raise Invalid('Unknown harvester type: %s. Have you registered a harvester for this type?' % value)
diff --git a/ckanext/harvest/public/ckanext/harvest/style.css b/ckanext/harvest/public/ckanext/harvest/style.css
index 2aa3cfa..de04d84 100644
--- a/ckanext/harvest/public/ckanext/harvest/style.css
+++ b/ckanext/harvest/public/ckanext/harvest/style.css
@@ -9,3 +9,7 @@
#harvest-sources th.action{
font-style: italic;
}
+
+.harvester-title{
+ font-weight: bold;
+}
diff --git a/ckanext/harvest/queue.py b/ckanext/harvest/queue.py
index 944e6e9..8029a94 100644
--- a/ckanext/harvest/queue.py
+++ b/ckanext/harvest/queue.py
@@ -77,7 +77,7 @@ def gather_callback(message_data,message):
# matches
harvester_found = False
for harvester in PluginImplementations(IHarvester):
- if harvester.get_type() == job.source.type:
+ if harvester.info()['name'] == job.source.type:
harvester_found = True
# Get a list of harvest object ids from the plugin
job.gather_started = datetime.datetime.now()
@@ -123,7 +123,7 @@ def fetch_callback(message_data,message):
# the Harvester interface, only if the source type
# matches
for harvester in PluginImplementations(IHarvester):
- if harvester.get_type() == obj.source.type:
+ if harvester.info()['name'] == obj.source.type:
# See if the plugin can fetch the harvest object
obj.fetch_started = datetime.datetime.now()
diff --git a/ckanext/harvest/templates/source/edit.html b/ckanext/harvest/templates/source/edit.html
index 96465b5..ca811c7 100644
--- a/ckanext/harvest/templates/source/edit.html
+++ b/ckanext/harvest/templates/source/edit.html
@@ -8,6 +8,7 @@