diff --git a/README.rst b/README.rst index 4bdd029..1789fb5 100644 --- a/README.rst +++ b/README.rst @@ -9,7 +9,7 @@ Dependencies ============ The harvest extension uses Message Queuing to handle the different gather -stages. +stages. You will need to install the RabbitMQ server:: @@ -23,12 +23,12 @@ The extension uses `carrot` as messaging library:: Configuration ============= -Run the following command (in the ckanext-harvest directory) to create +Run the following command (in the ckanext-harvest directory) to create the necessary tables in the database:: paster harvester initdb --config=../ckan/development.ini -The extension needs a user with sysadmin privileges to perform the +The extension needs a user with sysadmin privileges to perform the harvesting jobs. You can create such a user running these two commands in the ckan directory:: @@ -53,25 +53,25 @@ Or with postgres:: Command line interface ====================== -The following operations can be run from the command line using the +The following operations can be run from the command line using the ``paster harvester`` command:: harvester initdb - Creates the necessary tables in the database - harvester source {url} {type} [{active}] [{user-id}] [{publisher-id}] + harvester source {url} {type} [{active}] [{user-id}] [{publisher-id}] - create new harvest source harvester rmsource {id} - remove (inactivate) a harvester source - harvester sources [all] + harvester sources [all] - lists harvest sources If 'all' is defined, it also shows the Inactive sources harvester job {source-id} - create new harvest job - + harvester jobs - lists harvest jobs @@ -83,9 +83,9 @@ The following operations can be run from the command line using the harvester fetch_consumer - starts the consumer for the fetching queue - + The commands should be run from the ckanext-harvest directory and expect -a development.ini file to be present. Most of the time you will specify +a development.ini file to be present. 
Most of the time you will specify the config explicitly though:: paster harvester sources --config=../ckan/development.ini @@ -103,18 +103,18 @@ Extensions can implement the harvester interface to perform harvesting operations. The harvesting process takes place on three stages: 1. The **gather** stage compiles all the resource identifiers that need to - be fetched in the next stage (e.g. in a CSW server, it will perform a + be fetched in the next stage (e.g. in a CSW server, it will perform a `GetRecords` operation). 2. The **fetch** stage gets the contents of the remote objects and stores - them in the database (e.g. in a CSW server, it will perform n + them in the database (e.g. in a CSW server, it will perform n `GetRecordById` operations). 3. The **import** stage performs any necessary actions on the fetched resource (generally creating a CKAN package, but it can be anything the extension needs). -Plugins willing to implement the harvesting interface must provide the +Plugins willing to implement the harvesting interface must provide the following methods:: from ckan.plugins.core import SingletonPlugin, implements @@ -126,17 +126,32 @@ following methods:: ''' implements(IHarvester) - def get_type(self): + def info(self): ''' - Plugins must provide this method, which will return a string with the - Harvester type implemented by the plugin (e.g ``CSW``,``INSPIRE``, etc). - This will ensure that they only receive Harvest Jobs and Objects - relevant to them. + Harvesting implementations must provide this method, which will return a + dictionary containing different descriptors of the harvester. The + returned dictionary should contain: - returns: A string with the harvester type + * name: machine-readable name. This will be the value stored in the + database, and the one used by ckanext-harvest to call the appropriate + harvester. + * title: human-readable name. This will appear in the form's select box + in the WUI. 
+ * description: a short description of what the harvester does. This will + appear on the form as guidance to the user. + + A complete example may be:: + + { + 'name': 'csw', + 'title': 'CSW Server', + 'description': "A server that implements OGC's Catalog Service " + "for the Web (CSW) standard" + } + + returns: A dictionary with the harvester descriptors ''' - def gather_stage(self, harvest_job): ''' The gather stage will recieve a HarvestJob object and will be @@ -172,7 +187,7 @@ following methods:: ''' The import stage will receive a HarvestObject object and will be responsible for: - - performing any necessary action with the fetched object (e.g + - performing any necessary action with the fetched object (e.g create a CKAN package). Note: if this stage creates or updates a package, a reference to the package should be added to the HarvestObject. @@ -196,7 +211,7 @@ Running the harvest jobs The harvesting extension uses two different queues, one that handles the gathering and another one that handles the fetching and importing. 
To start -the consumers run the following command from the ckanext-harvest directory +the consumers run the following command from the ckanext-harvest directory (make sure you have your python environment activated):: paster harvester gather_consumer --config=../ckan/development.ini diff --git a/ckanext/harvest/controllers/view.py b/ckanext/harvest/controllers/view.py index 1b0662f..1695a40 100644 --- a/ckanext/harvest/controllers/view.py +++ b/ckanext/harvest/controllers/view.py @@ -9,7 +9,7 @@ from ckan.logic import NotFound, ValidationError from ckanext.harvest.logic.schema import harvest_source_form_schema from ckanext.harvest.lib import create_harvest_source, edit_harvest_source, \ get_harvest_source, get_harvest_sources, \ - create_harvest_job, get_registered_harvesters_types + create_harvest_job, get_registered_harvesters_info import logging log = logging.getLogger(__name__) @@ -39,7 +39,7 @@ class ViewController(BaseController): errors = errors or {} error_summary = error_summary or {} #TODO: Use new description interface to build the types select and descriptions - vars = {'data': data, 'errors': errors, 'error_summary': error_summary, 'types': get_registered_harvesters_types()} + vars = {'data': data, 'errors': errors, 'error_summary': error_summary, 'harvesters': get_registered_harvesters_info()} c.form = render('source/new_source_form.html', extra_vars=vars) return render('source/new.html') @@ -61,7 +61,7 @@ class ViewController(BaseController): abort(400, 'Integrity Error') except ValidationError,e: errors = e.error_dict - error_summary = e.error_summary if 'error_summary' in e else None + error_summary = e.error_summary if hasattr(e,'error_summary') else None return self.new(data_dict, errors, error_summary) def edit(self, id, data = None,errors = None, error_summary = None): @@ -79,7 +79,7 @@ class ViewController(BaseController): errors = errors or {} error_summary = error_summary or {} #TODO: Use new description interface to build the types select 
and descriptions - vars = {'data': data, 'errors': errors, 'error_summary': error_summary, 'types': get_registered_harvesters_types()} + vars = {'data': data, 'errors': errors, 'error_summary': error_summary, 'harvesters': get_registered_harvesters_info()} c.form = render('source/new_source_form.html', extra_vars=vars) return render('source/edit.html') @@ -99,7 +99,7 @@ class ViewController(BaseController): abort(404, _('Harvest Source not found')) except ValidationError,e: errors = e.error_dict - error_summary = e.error_summary if 'error_summary' in e else None + error_summary = e.error_summary if hasattr(e,'error_summary') else None return self.edit(id,data_dict, errors, error_summary) def _check_data_dict(self, data_dict): diff --git a/ckanext/harvest/harvesters.py b/ckanext/harvest/harvesters.py index 22116c5..fabbd14 100644 --- a/ckanext/harvest/harvesters.py +++ b/ckanext/harvest/harvesters.py @@ -75,8 +75,12 @@ class CKANHarvester(SingletonPlugin): err.save() log.error(message) - def get_type(self): - return 'CKAN' + def info(self): + return { + 'name': 'ckan', + 'title': 'CKAN', + 'description': 'Harvests remote CKAN instances' + } def gather_stage(self,harvest_job): log.debug('In CKANHarvester gather_stage') diff --git a/ckanext/harvest/interfaces.py b/ckanext/harvest/interfaces.py index d59bbfb..9d47883 100644 --- a/ckanext/harvest/interfaces.py +++ b/ckanext/harvest/interfaces.py @@ -6,17 +6,32 @@ class IHarvester(Interface): ''' - def get_type(self): + def info(self): ''' - Plugins must provide this method, which will return a string with the - Harvester type implemented by the plugin (e.g ``CSW``,``INSPIRE``, etc). - This will ensure that they only receive Harvest Jobs and Objects - relevant to them. + Harvesting implementations must provide this method, which will return a + dictionary containing different descriptors of the harvester. 
The + returned dictionary should contain: - returns: A string with the harvester type + * name: machine-readable name. This will be the value stored in the + database, and the one used by ckanext-harvest to call the appropriate + harvester. + * title: human-readable name. This will appear in the form's select box + in the WUI. + * description: a short description of what the harvester does. This will + appear on the form as guidance to the user. + + A complete example may be:: + + { + 'name': 'csw', + 'title': 'CSW Server', + 'description': "A server that implements OGC's Catalog Service " + "for the Web (CSW) standard" + } + + returns: A dictionary with the harvester descriptors ''' - def gather_stage(self, harvest_job): ''' The gather stage will recieve a HarvestJob object and will be @@ -55,7 +70,7 @@ class IHarvester(Interface): ''' The import stage will receive a HarvestObject object and will be responsible for: - - performing any necessary action with the fetched object (e.g + - performing any necessary action with the fetched object (e.g create a CKAN package). Note: if this stage creates or updates a package, a reference to the package should be added to the HarvestObject. 
diff --git a/ckanext/harvest/lib/__init__.py b/ckanext/harvest/lib/__init__.py index 30b94b3..d6fa701 100644 --- a/ckanext/harvest/lib/__init__.py +++ b/ckanext/harvest/lib/__init__.py @@ -196,7 +196,6 @@ def _prettify(field_name): return field_name.replace('_', ' ') def _error_summary(error_dict): - error_summary = {} for key, error in error_dict.iteritems(): error_summary[_prettify(key)] = error[0] @@ -373,7 +372,7 @@ def import_last_objects(source_id=None): if obj.guid != last_obj_guid: imported_objects.append(obj) for harvester in PluginImplementations(IHarvester): - if harvester.get_type() == obj.job.source.type: + if harvester.info()['name'] == obj.job.source.type: if hasattr(harvester,'force_import'): harvester.force_import = True harvester.import_stage(obj) @@ -381,9 +380,14 @@ def import_last_objects(source_id=None): return imported_objects -def get_registered_harvesters_types(): +def get_registered_harvesters_info(): # TODO: Use new description interface when implemented - available_types = [] + available_harvesters = [] for harvester in PluginImplementations(IHarvester): - available_types.append(harvester.get_type()) - return available_types + info = harvester.info() + if not info or 'name' not in info: + log.error('Harvester %r does not provide the harvester name in the info response' % str(harvester)) + continue + available_harvesters.append(info) + + return available_harvesters diff --git a/ckanext/harvest/logic/validators.py b/ckanext/harvest/logic/validators.py index 85c1454..9f7343c 100644 --- a/ckanext/harvest/logic/validators.py +++ b/ckanext/harvest/logic/validators.py @@ -66,7 +66,12 @@ def harvest_source_type_exists(value,context): # Get all the registered harvester types available_types = [] for harvester in PluginImplementations(IHarvester): - available_types.append(harvester.get_type()) + info = harvester.info() + if not info or 'name' not in info: + log.error('Harvester %r does not provide the harvester name in the info response' % 
str(harvester)) + continue + available_types.append(info['name']) + if not value in available_types: raise Invalid('Unknown harvester type: %s. Have you registered a harvester for this type?' % value) diff --git a/ckanext/harvest/public/ckanext/harvest/style.css b/ckanext/harvest/public/ckanext/harvest/style.css index 2aa3cfa..de04d84 100644 --- a/ckanext/harvest/public/ckanext/harvest/style.css +++ b/ckanext/harvest/public/ckanext/harvest/style.css @@ -9,3 +9,7 @@ #harvest-sources th.action{ font-style: italic; } + +.harvester-title{ + font-weight: bold; +} diff --git a/ckanext/harvest/queue.py b/ckanext/harvest/queue.py index 944e6e9..8029a94 100644 --- a/ckanext/harvest/queue.py +++ b/ckanext/harvest/queue.py @@ -77,7 +77,7 @@ def gather_callback(message_data,message): # matches harvester_found = False for harvester in PluginImplementations(IHarvester): - if harvester.get_type() == job.source.type: + if harvester.info()['name'] == job.source.type: harvester_found = True # Get a list of harvest object ids from the plugin job.gather_started = datetime.datetime.now() @@ -123,7 +123,7 @@ def fetch_callback(message_data,message): # the Harvester interface, only if the source type # matches for harvester in PluginImplementations(IHarvester): - if harvester.get_type() == obj.source.type: + if harvester.info()['name'] == obj.source.type: # See if the plugin can fetch the harvest object obj.fetch_started = datetime.datetime.now() diff --git a/ckanext/harvest/templates/source/edit.html b/ckanext/harvest/templates/source/edit.html index 96465b5..ca811c7 100644 --- a/ckanext/harvest/templates/source/edit.html +++ b/ckanext/harvest/templates/source/edit.html @@ -8,6 +8,7 @@ hide-sidebar +
diff --git a/ckanext/harvest/templates/source/new.html b/ckanext/harvest/templates/source/new.html index 16315fa..12ba71a 100644 --- a/ckanext/harvest/templates/source/new.html +++ b/ckanext/harvest/templates/source/new.html @@ -8,6 +8,7 @@ hide-sidebar +
diff --git a/ckanext/harvest/templates/source/new_source_form.html b/ckanext/harvest/templates/source/new_source_form.html index 9dd05b6..578a233 100644 --- a/ckanext/harvest/templates/source/new_source_form.html +++ b/ckanext/harvest/templates/source/new_source_form.html @@ -22,18 +22,17 @@
${errors.get('type', '')}
Which type of source does the URL above represent? - -
    -
  • A server's CSW interface
  • -
  • A Web Accessible Folder (WAF) displaying a list of GEMINI 2.1 documents
  • -
  • A single GEMINI 2.1 document
  • +
      + +
    • ${harvester.title}: ${harvester.description}
    • +
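Review note: the change replaces ``get_type()`` with ``info()`` and matches harvesters on ``info()['name']``. The following is a minimal, self-contained sketch of that lookup pattern, not the extension's actual code. It assumes a plain list in place of ``PluginImplementations(IHarvester)``, and ``ExampleHarvester``, ``BrokenHarvester``, ``collect_harvesters_info`` and ``find_harvester`` are hypothetical names introduced only for illustration.

```python
import logging

log = logging.getLogger(__name__)


class ExampleHarvester(object):
    """Hypothetical harvester; a real one would also implement IHarvester."""

    def info(self):
        return {
            'name': 'example',
            'title': 'Example',
            'description': 'Illustrative harvester used only in this sketch',
        }


class BrokenHarvester(object):
    """Hypothetical harvester whose info() omits the required 'name' key."""

    def info(self):
        return {'title': 'Broken'}


def collect_harvesters_info(harvesters):
    """Return the info dicts of all harvesters that declare a 'name'."""
    available = []
    for harvester in harvesters:
        info = harvester.info()
        if not info or 'name' not in info:
            # Same guard as in the diff: log the problem and skip the plugin.
            log.error('Harvester %r does not provide a name', harvester)
            continue
        available.append(info)
    return available


def find_harvester(harvesters, source_type):
    """Return the first harvester whose info()['name'] equals source_type."""
    for harvester in harvesters:
        info = harvester.info()
        # .get() avoids a KeyError when a plugin omits 'name'.
        if info and info.get('name') == source_type:
            return harvester
    return None


# Plain list standing in for the registered plugin implementations.
registered = [ExampleHarvester(), BrokenHarvester()]
infos = collect_harvesters_info(registered)
names = [info['name'] for info in infos]
match = find_harvester(registered, 'example')
```

One design point this sketch raises: the callbacks in ``queue.py`` and ``import_last_objects`` index ``info()['name']`` directly, so a plugin that omits ``name`` would raise ``KeyError`` there, while the form and validator code paths log and skip it. Using ``.get('name')`` as above, or a shared helper, might keep the two paths consistent.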