Add a new info method to the harvester interface so implementations can provide details. Use this to build the WUI form.

Adrià Mercader 2011-05-13 18:39:36 +01:00
parent ce71379d25
commit 565eaf3d0a
11 changed files with 100 additions and 52 deletions


@ -9,7 +9,7 @@ Dependencies
============ ============
The harvest extension uses Message Queuing to handle the different gather The harvest extension uses Message Queuing to handle the different gather
stages. stages.
You will need to install the RabbitMQ server:: You will need to install the RabbitMQ server::
@ -23,12 +23,12 @@ The extension uses `carrot` as messaging library::
Configuration Configuration
============= =============
Run the following command (in the ckanext-harvest directory) to create Run the following command (in the ckanext-harvest directory) to create
the necessary tables in the database:: the necessary tables in the database::
paster harvester initdb --config=../ckan/development.ini paster harvester initdb --config=../ckan/development.ini
The extension needs a user with sysadmin privileges to perform the The extension needs a user with sysadmin privileges to perform the
harvesting jobs. You can create such a user by running these two commands in harvesting jobs. You can create such a user by running these two commands in
the ckan directory:: the ckan directory::
@ -53,25 +53,25 @@ Or with postgres::
Command line interface Command line interface
====================== ======================
The following operations can be run from the command line using the The following operations can be run from the command line using the
``paster harvester`` command:: ``paster harvester`` command::
harvester initdb harvester initdb
- Creates the necessary tables in the database - Creates the necessary tables in the database
harvester source {url} {type} [{active}] [{user-id}] [{publisher-id}] harvester source {url} {type} [{active}] [{user-id}] [{publisher-id}]
- create new harvest source - create new harvest source
harvester rmsource {id} harvester rmsource {id}
- remove (inactivate) a harvester source - remove (inactivate) a harvester source
harvester sources [all] harvester sources [all]
- lists harvest sources - lists harvest sources
If 'all' is defined, it also shows the Inactive sources If 'all' is defined, it also shows the Inactive sources
harvester job {source-id} harvester job {source-id}
- create new harvest job - create new harvest job
harvester jobs harvester jobs
- lists harvest jobs - lists harvest jobs
@ -83,9 +83,9 @@ The following operations can be run from the command line using the
harvester fetch_consumer harvester fetch_consumer
- starts the consumer for the fetching queue - starts the consumer for the fetching queue
The commands should be run from the ckanext-harvest directory and expect The commands should be run from the ckanext-harvest directory and expect
a development.ini file to be present. Most of the time you will specify a development.ini file to be present. Most of the time you will specify
the config explicitly though:: the config explicitly though::
paster harvester sources --config=../ckan/development.ini paster harvester sources --config=../ckan/development.ini
@ -103,18 +103,18 @@ Extensions can implement the harvester interface to perform harvesting
operations. The harvesting process takes place on three stages: operations. The harvesting process takes place on three stages:
1. The **gather** stage compiles all the resource identifiers that need to 1. The **gather** stage compiles all the resource identifiers that need to
be fetched in the next stage (e.g. in a CSW server, it will perform a be fetched in the next stage (e.g. in a CSW server, it will perform a
`GetRecords` operation). `GetRecords` operation).
2. The **fetch** stage gets the contents of the remote objects and stores 2. The **fetch** stage gets the contents of the remote objects and stores
them in the database (e.g. in a CSW server, it will perform n them in the database (e.g. in a CSW server, it will perform n
`GetRecordById` operations). `GetRecordById` operations).
3. The **import** stage performs any necessary actions on the fetched 3. The **import** stage performs any necessary actions on the fetched
resource (generally creating a CKAN package, but it can be anything the resource (generally creating a CKAN package, but it can be anything the
extension needs). extension needs).
Plugins willing to implement the harvesting interface must provide the Plugins willing to implement the harvesting interface must provide the
following methods:: following methods::
from ckan.plugins.core import SingletonPlugin, implements from ckan.plugins.core import SingletonPlugin, implements
@ -126,17 +126,32 @@ following methods::
''' '''
implements(IHarvester) implements(IHarvester)
def get_type(self): def info(self):
''' '''
Plugins must provide this method, which will return a string with the Harvesting implementations must provide this method, which will return a
Harvester type implemented by the plugin (e.g ``CSW``,``INSPIRE``, etc). dictionary containing different descriptors of the harvester. The
This will ensure that they only receive Harvest Jobs and Objects returned dictionary should contain:
relevant to them.
returns: A string with the harvester type * name: machine-readable name. This will be the value stored in the
database, and the one used by ckanext-harvest to call the appropriate
harvester.
* title: human-readable name. This will appear in the form's select box
in the WUI.
* description: a small description of what the harvester does. This will
appear on the form as a guidance to the user.
A complete example may be::
{
'name': 'csw',
'title': 'CSW Server',
'description': "A server that implements OGC's Catalog Service "
"for the Web (CSW) standard"
}
returns: A dictionary with the harvester descriptors
''' '''
def gather_stage(self, harvest_job): def gather_stage(self, harvest_job):
''' '''
The gather stage will receive a HarvestJob object and will be The gather stage will receive a HarvestJob object and will be
@ -172,7 +187,7 @@ following methods::
''' '''
The import stage will receive a HarvestObject object and will be The import stage will receive a HarvestObject object and will be
responsible for: responsible for:
- performing any necessary action with the fetched object (e.g. - performing any necessary action with the fetched object (e.g.
create a CKAN package). create a CKAN package).
Note: if this stage creates or updates a package, a reference Note: if this stage creates or updates a package, a reference
to the package should be added to the HarvestObject. to the package should be added to the HarvestObject.
@ -196,7 +211,7 @@ Running the harvest jobs
The harvesting extension uses two different queues, one that handles the The harvesting extension uses two different queues, one that handles the
gathering and another one that handles the fetching and importing. To start gathering and another one that handles the fetching and importing. To start
the consumers run the following command from the ckanext-harvest directory the consumers run the following command from the ckanext-harvest directory
(make sure you have your python environment activated):: (make sure you have your python environment activated)::
paster harvester gather_consumer --config=../ckan/development.ini paster harvester gather_consumer --config=../ckan/development.ini


@ -9,7 +9,7 @@ from ckan.logic import NotFound, ValidationError
from ckanext.harvest.logic.schema import harvest_source_form_schema from ckanext.harvest.logic.schema import harvest_source_form_schema
from ckanext.harvest.lib import create_harvest_source, edit_harvest_source, \ from ckanext.harvest.lib import create_harvest_source, edit_harvest_source, \
get_harvest_source, get_harvest_sources, \ get_harvest_source, get_harvest_sources, \
create_harvest_job, get_registered_harvesters_types create_harvest_job, get_registered_harvesters_info
import logging import logging
log = logging.getLogger(__name__) log = logging.getLogger(__name__)
@ -39,7 +39,7 @@ class ViewController(BaseController):
errors = errors or {} errors = errors or {}
error_summary = error_summary or {} error_summary = error_summary or {}
#TODO: Use new description interface to build the types select and descriptions #TODO: Use new description interface to build the types select and descriptions
vars = {'data': data, 'errors': errors, 'error_summary': error_summary, 'types': get_registered_harvesters_types()} vars = {'data': data, 'errors': errors, 'error_summary': error_summary, 'harvesters': get_registered_harvesters_info()}
c.form = render('source/new_source_form.html', extra_vars=vars) c.form = render('source/new_source_form.html', extra_vars=vars)
return render('source/new.html') return render('source/new.html')
@ -61,7 +61,7 @@ class ViewController(BaseController):
abort(400, 'Integrity Error') abort(400, 'Integrity Error')
except ValidationError,e: except ValidationError,e:
errors = e.error_dict errors = e.error_dict
error_summary = e.error_summary if 'error_summary' in e else None error_summary = e.error_summary if hasattr(e,'error_summary') else None
return self.new(data_dict, errors, error_summary) return self.new(data_dict, errors, error_summary)
def edit(self, id, data = None,errors = None, error_summary = None): def edit(self, id, data = None,errors = None, error_summary = None):
@ -79,7 +79,7 @@ class ViewController(BaseController):
errors = errors or {} errors = errors or {}
error_summary = error_summary or {} error_summary = error_summary or {}
#TODO: Use new description interface to build the types select and descriptions #TODO: Use new description interface to build the types select and descriptions
vars = {'data': data, 'errors': errors, 'error_summary': error_summary, 'types': get_registered_harvesters_types()} vars = {'data': data, 'errors': errors, 'error_summary': error_summary, 'harvesters': get_registered_harvesters_info()}
c.form = render('source/new_source_form.html', extra_vars=vars) c.form = render('source/new_source_form.html', extra_vars=vars)
return render('source/edit.html') return render('source/edit.html')
@ -99,7 +99,7 @@ class ViewController(BaseController):
abort(404, _('Harvest Source not found')) abort(404, _('Harvest Source not found'))
except ValidationError,e: except ValidationError,e:
errors = e.error_dict errors = e.error_dict
error_summary = e.error_summary if 'error_summary' in e else None error_summary = e.error_summary if hasattr(e,'error_summary') else None
return self.edit(id,data_dict, errors, error_summary) return self.edit(id,data_dict, errors, error_summary)
def _check_data_dict(self, data_dict): def _check_data_dict(self, data_dict):


@ -75,8 +75,12 @@ class CKANHarvester(SingletonPlugin):
err.save() err.save()
log.error(message) log.error(message)
def get_type(self): def info(self):
return 'CKAN' return {
'name': 'ckan',
'title': 'CKAN',
'description': 'Harvests remote CKAN instances'
}
def gather_stage(self,harvest_job): def gather_stage(self,harvest_job):
log.debug('In CKANHarvester gather_stage') log.debug('In CKANHarvester gather_stage')


@ -6,17 +6,32 @@ class IHarvester(Interface):
''' '''
def get_type(self): def info(self):
''' '''
Plugins must provide this method, which will return a string with the Harvesting implementations must provide this method, which will return a
Harvester type implemented by the plugin (e.g ``CSW``,``INSPIRE``, etc). dictionary containing different descriptors of the harvester. The
This will ensure that they only receive Harvest Jobs and Objects returned dictionary should contain:
relevant to them.
returns: A string with the harvester type * name: machine-readable name. This will be the value stored in the
database, and the one used by ckanext-harvest to call the appropriate
harvester.
* title: human-readable name. This will appear in the form's select box
in the WUI.
* description: a small description of what the harvester does. This will
appear on the form as a guidance to the user.
A complete example may be::
{
'name': 'csw',
'title': 'CSW Server',
'description': "A server that implements OGC's Catalog Service "
"for the Web (CSW) standard"
}
returns: A dictionary with the harvester descriptors
''' '''
def gather_stage(self, harvest_job): def gather_stage(self, harvest_job):
''' '''
The gather stage will receive a HarvestJob object and will be The gather stage will receive a HarvestJob object and will be
@ -55,7 +70,7 @@ class IHarvester(Interface):
''' '''
The import stage will receive a HarvestObject object and will be The import stage will receive a HarvestObject object and will be
responsible for: responsible for:
- performing any necessary action with the fetched object (e.g. - performing any necessary action with the fetched object (e.g.
create a CKAN package). create a CKAN package).
Note: if this stage creates or updates a package, a reference Note: if this stage creates or updates a package, a reference
to the package should be added to the HarvestObject. to the package should be added to the HarvestObject.


@ -196,7 +196,6 @@ def _prettify(field_name):
return field_name.replace('_', ' ') return field_name.replace('_', ' ')
def _error_summary(error_dict): def _error_summary(error_dict):
error_summary = {} error_summary = {}
for key, error in error_dict.iteritems(): for key, error in error_dict.iteritems():
error_summary[_prettify(key)] = error[0] error_summary[_prettify(key)] = error[0]
@ -373,7 +372,7 @@ def import_last_objects(source_id=None):
if obj.guid != last_obj_guid: if obj.guid != last_obj_guid:
imported_objects.append(obj) imported_objects.append(obj)
for harvester in PluginImplementations(IHarvester): for harvester in PluginImplementations(IHarvester):
if harvester.get_type() == obj.job.source.type: if harvester.info()['name'] == obj.job.source.type:
if hasattr(harvester,'force_import'): if hasattr(harvester,'force_import'):
harvester.force_import = True harvester.force_import = True
harvester.import_stage(obj) harvester.import_stage(obj)
@ -381,9 +380,14 @@ def import_last_objects(source_id=None):
return imported_objects return imported_objects
def get_registered_harvesters_types(): def get_registered_harvesters_info():
# TODO: Use new description interface when implemented # TODO: Use new description interface when implemented
available_types = [] available_harvesters = []
for harvester in PluginImplementations(IHarvester): for harvester in PluginImplementations(IHarvester):
available_types.append(harvester.get_type()) info = harvester.info()
return available_types if not info or 'name' not in info:
log.error('Harvester %r does not provide the harvester name in the info response' % str(harvester))
continue
available_harvesters.append(info)
return available_harvesters
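
The filtering done by `get_registered_harvesters_info` can be tried outside CKAN with a minimal sketch. It assumes plain Python objects in place of `PluginImplementations(IHarvester)`; the names `CSWHarvester`, `BrokenHarvester` and `collect_harvesters_info` are hypothetical and exist only for this illustration.

```python
import logging

log = logging.getLogger(__name__)


class CSWHarvester(object):
    """Hypothetical harvester that provides all three descriptors."""

    def info(self):
        return {
            'name': 'csw',
            'title': 'CSW Server',
            'description': "A server that implements OGC's Catalog Service "
                           "for the Web (CSW) standard",
        }


class BrokenHarvester(object):
    """Hypothetical harvester whose info() omits the required 'name'."""

    def info(self):
        return {'title': 'Broken'}


def collect_harvesters_info(harvesters):
    """Return the info dicts of every harvester that declares a 'name'.

    `harvesters` is any iterable of objects with an info() method; in the
    extension this role is played by PluginImplementations(IHarvester).
    """
    available = []
    for harvester in harvesters:
        info = harvester.info()
        if not info or 'name' not in info:
            # Skip harvesters that cannot be referenced from the database.
            log.error('Harvester %r does not provide the harvester name '
                      'in the info response', harvester)
            continue
        available.append(info)
    return available


infos = collect_harvesters_info([CSWHarvester(), BrokenHarvester()])
names = [info['name'] for info in infos]
```

Only the harvester that declares a `name` ends up in `infos`, which is the list the form template iterates over to build the select box and the descriptions.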


@ -66,7 +66,12 @@ def harvest_source_type_exists(value,context):
# Get all the registered harvester types # Get all the registered harvester types
available_types = [] available_types = []
for harvester in PluginImplementations(IHarvester): for harvester in PluginImplementations(IHarvester):
available_types.append(harvester.get_type()) info = harvester.info()
if not info or 'name' not in info:
log.error('Harvester %r does not provide the harvester name in the info response' % str(harvester))
continue
available_types.append(info['name'])
if not value in available_types: if not value in available_types:
raise Invalid('Unknown harvester type: %s. Have you registered a harvester for this type?' % value) raise Invalid('Unknown harvester type: %s. Have you registered a harvester for this type?' % value)


@ -9,3 +9,7 @@
#harvest-sources th.action{ #harvest-sources th.action{
font-style: italic; font-style: italic;
} }
.harvester-title{
font-weight: bold;
}


@ -77,7 +77,7 @@ def gather_callback(message_data,message):
# matches # matches
harvester_found = False harvester_found = False
for harvester in PluginImplementations(IHarvester): for harvester in PluginImplementations(IHarvester):
if harvester.get_type() == job.source.type: if harvester.info()['name'] == job.source.type:
harvester_found = True harvester_found = True
# Get a list of harvest object ids from the plugin # Get a list of harvest object ids from the plugin
job.gather_started = datetime.datetime.now() job.gather_started = datetime.datetime.now()
@ -123,7 +123,7 @@ def fetch_callback(message_data,message):
# the Harvester interface, only if the source type # the Harvester interface, only if the source type
# matches # matches
for harvester in PluginImplementations(IHarvester): for harvester in PluginImplementations(IHarvester):
if harvester.get_type() == obj.source.type: if harvester.info()['name'] == obj.source.type:
# See if the plugin can fetch the harvest object # See if the plugin can fetch the harvest object
obj.fetch_started = datetime.datetime.now() obj.fetch_started = datetime.datetime.now()
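
Both callbacks now select the harvester by comparing `info()['name']` with the source type. A rough sketch of that lookup follows, assuming plain objects instead of CKAN plugins; `find_harvester` and `NamelessHarvester` are hypothetical names. Unlike the code in this commit, the sketch reads the name with `.get()`, so a harvester that omits `name` is skipped instead of raising `KeyError`.

```python
class CKANHarvester(object):
    """Stand-in for the real plugin; returns the same descriptors."""

    def info(self):
        return {
            'name': 'ckan',
            'title': 'CKAN',
            'description': 'Harvests remote CKAN instances',
        }


class NamelessHarvester(object):
    """Hypothetical harvester that forgot to declare a 'name'."""

    def info(self):
        return {'title': 'No name'}


def find_harvester(harvesters, source_type):
    """Return the first harvester whose name matches source_type, else None."""
    for harvester in harvesters:
        info = harvester.info() or {}
        # .get() is a defensive variation; the commit indexes ['name'] directly.
        if info.get('name') == source_type:
            return harvester
    return None


ckan = CKANHarvester()
registered = [NamelessHarvester(), ckan]
match = find_harvester(registered, 'ckan')
missing = find_harvester(registered, 'csw')
```

Whether the queue consumers should tolerate a missing `name` this way, or fail loudly as the direct indexing does, is a design choice this sketch does not settle.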


@ -8,6 +8,7 @@
<py:def function="body_class">hide-sidebar</py:def> <py:def function="body_class">hide-sidebar</py:def>
<py:def function="optional_head"> <py:def function="optional_head">
<link rel="stylesheet" href="${g.site_url}/css/forms.css" type="text/css" media="screen, print" /> <link rel="stylesheet" href="${g.site_url}/css/forms.css" type="text/css" media="screen, print" />
<link type="text/css" rel="stylesheet" media="all" href="/ckanext/harvest/style.css" />
</py:def> </py:def>
<div py:match="content"> <div py:match="content">


@ -8,6 +8,7 @@
<py:def function="body_class">hide-sidebar</py:def> <py:def function="body_class">hide-sidebar</py:def>
<py:def function="optional_head"> <py:def function="optional_head">
<link rel="stylesheet" href="${g.site_url}/css/forms.css" type="text/css" media="screen, print" /> <link rel="stylesheet" href="${g.site_url}/css/forms.css" type="text/css" media="screen, print" />
<link type="text/css" rel="stylesheet" media="all" href="/ckanext/harvest/style.css" />
</py:def> </py:def>
<div py:match="content"> <div py:match="content">


@ -22,18 +22,17 @@
<dt><label class="field_req" for="type">Source Type *</label></dt> <dt><label class="field_req" for="type">Source Type *</label></dt>
<dd> <dd>
<select id="type" name="type"> <select id="type" name="type">
<py:for each="type in types"> <py:for each="harvester in harvesters">
<option value="${type}" py:attrs="{'selected': 'selected' if data.get('type', '') == type else None}" >${type}</option> <option value="${harvester.name}" py:attrs="{'selected': 'selected' if data.get('type', '') == harvester.name else None}" >${harvester.title}</option>
</py:for> </py:for>
</select> </select>
</dd> </dd>
<dd class="field_error" py:if="errors.get('type', '')">${errors.get('type', '')}</dd> <dd class="field_error" py:if="errors.get('type', '')">${errors.get('type', '')}</dd>
<dd class="instructions basic">Which type of source does the URL above represent? <dd class="instructions basic">Which type of source does the URL above represent?
<!--TODO: get these from the harvesters--> <ul>
<ul> <py:for each="harvester in harvesters">
<li>A server's CSW interface</li> <li><span class="harvester-title">${harvester.title}</span>: ${harvester.description}</li>
<li>A Web Accessible Folder (WAF) displaying a list of GEMINI 2.1 documents</li> </py:for>
<li>A single GEMINI 2.1 document</li>
</ul> </ul>
</dd> </dd>
<dt><label class="field_opt" for="description">Description</label></dt> <dt><label class="field_opt" for="description">Description</label></dt>