Add a new info method to the harvester interface so implementations can provide details. Use this to build the WUI form

parent ce71379d25
commit 565eaf3d0a

README.rst

@@ -9,7 +9,7 @@ Dependencies
 ============

 The harvest extension uses Message Queuing to handle the different gather
-stages.
+stages.

 You will need to install the RabbitMQ server::

@@ -23,12 +23,12 @@ The extension uses `carrot` as messaging library::
 Configuration
 =============

-Run the following command (in the ckanext-harvest directory) to create
+Run the following command (in the ckanext-harvest directory) to create
 the necessary tables in the database::

     paster harvester initdb --config=../ckan/development.ini

-The extension needs a user with sysadmin privileges to perform the
+The extension needs a user with sysadmin privileges to perform the
 harvesting jobs. You can create such a user running these two commands in
 the ckan directory::

@@ -53,25 +53,25 @@ Or with postgres::
 Command line interface
 ======================

-The following operations can be run from the command line using the
+The following operations can be run from the command line using the
 ``paster harvester`` command::

     harvester initdb
         - Creates the necessary tables in the database

-    harvester source {url} {type} [{active}] [{user-id}] [{publisher-id}]
+    harvester source {url} {type} [{active}] [{user-id}] [{publisher-id}]
         - create new harvest source

     harvester rmsource {id}
         - remove (inactivate) a harvester source

-    harvester sources [all]
+    harvester sources [all]
         - lists harvest sources
           If 'all' is defined, it also shows the Inactive sources

     harvester job {source-id}
         - create new harvest job

     harvester jobs
         - lists harvest jobs

@@ -83,9 +83,9 @@ The following operations can be run from the command line using the

     harvester fetch_consumer
         - starts the consumer for the fetching queue

 The commands should be run from the ckanext-harvest directory and expect
-a development.ini file to be present. Most of the time you will specify
+a development.ini file to be present. Most of the time you will specify
 the config explicitly though::

     paster harvester sources --config=../ckan/development.ini

@@ -103,18 +103,18 @@ Extensions can implement the harvester interface to perform harvesting
 operations. The harvesting process takes place on three stages:

 1. The **gather** stage compiles all the resource identifiers that need to
-   be fetched in the next stage (e.g. in a CSW server, it will perform a
+   be fetched in the next stage (e.g. in a CSW server, it will perform a
    `GetRecords` operation).

 2. The **fetch** stage gets the contents of the remote objects and stores
-   them in the database (e.g. in a CSW server, it will perform n
+   them in the database (e.g. in a CSW server, it will perform n
    `GetRecordById` operations).

 3. The **import** stage performs any necessary actions on the fetched
    resource (generally creating a CKAN package, but it can be anything the
    extension needs).

-Plugins willing to implement the harvesting interface must provide the
+Plugins willing to implement the harvesting interface must provide the
 following methods::

     from ckan.plugins.core import SingletonPlugin, implements

@@ -126,17 +126,32 @@ following methods::
         '''
         implements(IHarvester)

-        def get_type(self):
+        def info(self):
             '''
-            Plugins must provide this method, which will return a string with the
-            Harvester type implemented by the plugin (e.g ``CSW``,``INSPIRE``, etc).
-            This will ensure that they only receive Harvest Jobs and Objects
-            relevant to them.
+            Harvesting implementations must provide this method, which will return a
+            dictionary containing different descriptors of the harvester. The
+            returned dictionary should contain:

-            returns: A string with the harvester type
+            * name: machine-readable name. This will be the value stored in the
+              database, and the one used by ckanext-harvest to call the appropriate
+              harvester.
+            * title: human-readable name. This will appear in the form's select box
+              in the WUI.
+            * description: a short description of what the harvester does. This will
+              appear on the form as guidance to the user.
+
+            A complete example may be::
+
+                {
+                    'name': 'csw',
+                    'title': 'CSW Server',
+                    'description': "A server that implements OGC's Catalog Service "
+                                   "for the Web (CSW) standard"
+                }
+
+            returns: A dictionary with the harvester descriptors
             '''


         def gather_stage(self, harvest_job):
             '''
             The gather stage will receive a HarvestJob object and will be

@@ -172,7 +187,7 @@ following methods::
             '''
             The import stage will receive a HarvestObject object and will be
             responsible for:
-                - performing any necessary action with the fetched object (e.g
+                - performing any necessary action with the fetched object (e.g
                   create a CKAN package).
                   Note: if this stage creates or updates a package, a reference
                   to the package should be added to the HarvestObject.

@@ -196,7 +211,7 @@ Running the harvest jobs

 The harvesting extension uses two different queues, one that handles the
 gathering and another one that handles the fetching and importing. To start
-the consumers run the following command from the ckanext-harvest directory
+the consumers run the following command from the ckanext-harvest directory
 (make sure you have your python environment activated)::

     paster harvester gather_consumer --config=../ckan/development.ini


@@ -9,7 +9,7 @@ from ckan.logic import NotFound, ValidationError
 from ckanext.harvest.logic.schema import harvest_source_form_schema
 from ckanext.harvest.lib import create_harvest_source, edit_harvest_source, \
                                 get_harvest_source, get_harvest_sources, \
-                                create_harvest_job, get_registered_harvesters_types
+                                create_harvest_job, get_registered_harvesters_info

 import logging
 log = logging.getLogger(__name__)

@@ -39,7 +39,7 @@ class ViewController(BaseController):
         errors = errors or {}
         error_summary = error_summary or {}
         #TODO: Use new description interface to build the types select and descriptions
-        vars = {'data': data, 'errors': errors, 'error_summary': error_summary, 'types': get_registered_harvesters_types()}
+        vars = {'data': data, 'errors': errors, 'error_summary': error_summary, 'harvesters': get_registered_harvesters_info()}

         c.form = render('source/new_source_form.html', extra_vars=vars)
         return render('source/new.html')

@@ -61,7 +61,7 @@ class ViewController(BaseController):
             abort(400, 'Integrity Error')
         except ValidationError,e:
             errors = e.error_dict
-            error_summary = e.error_summary if 'error_summary' in e else None
+            error_summary = e.error_summary if hasattr(e,'error_summary') else None
             return self.new(data_dict, errors, error_summary)

     def edit(self, id, data = None,errors = None, error_summary = None):

@@ -79,7 +79,7 @@ class ViewController(BaseController):
         errors = errors or {}
         error_summary = error_summary or {}
         #TODO: Use new description interface to build the types select and descriptions
-        vars = {'data': data, 'errors': errors, 'error_summary': error_summary, 'types': get_registered_harvesters_types()}
+        vars = {'data': data, 'errors': errors, 'error_summary': error_summary, 'harvesters': get_registered_harvesters_info()}

         c.form = render('source/new_source_form.html', extra_vars=vars)
         return render('source/edit.html')

@@ -99,7 +99,7 @@ class ViewController(BaseController):
             abort(404, _('Harvest Source not found'))
         except ValidationError,e:
             errors = e.error_dict
-            error_summary = e.error_summary if 'error_summary' in e else None
+            error_summary = e.error_summary if hasattr(e,'error_summary') else None
             return self.edit(id,data_dict, errors, error_summary)

     def _check_data_dict(self, data_dict):
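The controller change replaces a membership test with an attribute test. The sketch below is written for Python 3 and uses a hypothetical `ValidationError` stand-in (not CKAN's class) plus a hypothetical `summary_of` helper. `hasattr` looks up an attribute, while `'error_summary' in e` treats the exception as a container. Python 3 exceptions are not containers, so the old test raises `TypeError`. On Python 2, which this code targets, exceptions could be indexed like their `args`, so the old test most likely searched the constructor arguments and evaluated to False instead of detecting the attribute.

```python
class ValidationError(Exception):
    """Hypothetical stand-in for ckan.logic.ValidationError (illustration only)."""

    def __init__(self, error_dict, error_summary=None):
        super().__init__(error_dict)
        self.error_dict = error_dict
        # Only set the attribute when a summary is supplied, so hasattr() matters.
        if error_summary is not None:
            self.error_summary = error_summary


def summary_of(e):
    # Attribute lookup, as in the new controller code.
    return e.error_summary if hasattr(e, 'error_summary') else None


with_summary = ValidationError({'url': ['Missing value']}, {'URL': 'Missing value'})
without_summary = ValidationError({'url': ['Missing value']})

print(summary_of(with_summary))     # {'URL': 'Missing value'}
print(summary_of(without_summary))  # None

# The old membership test: an Exception subclass defines no __contains__,
# __iter__ or __getitem__ on Python 3, so "in" raises TypeError.
try:
    'error_summary' in with_summary
except TypeError as exc:
    print('TypeError:', exc)
```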

@@ -75,8 +75,12 @@ class CKANHarvester(SingletonPlugin):
         err.save()
         log.error(message)

-    def get_type(self):
-        return 'CKAN'
+    def info(self):
+        return {
+            'name': 'ckan',
+            'title': 'CKAN',
+            'description': 'Harvests remote CKAN instances'
+        }

     def gather_stage(self,harvest_job):
         log.debug('In CKANHarvester gather_stage')
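The `info()` contract can be exercised without the CKAN plugin machinery. Below is a sketch of a hypothetical stand-alone class that returns the three descriptors the interface docstring describes, plus a small checker. `ExampleHarvester`, `REQUIRED_INFO_KEYS` and `missing_info_keys` are illustrative names, not part of ckanext-harvest. The code in this commit only rejects harvesters that omit `name`. Requiring all three keys is an assumption made here for illustration.

```python
# Hypothetical names for illustration; none of these exist in ckanext-harvest.
REQUIRED_INFO_KEYS = ('name', 'title', 'description')


class ExampleHarvester(object):
    """Plain-Python stand-in for an IHarvester implementation (no CKAN imports)."""

    def info(self):
        # 'name' is the machine-readable value matched against the source type;
        # 'title' and 'description' are what the WUI form displays.
        return {
            'name': 'example',
            'title': 'Example Server',
            'description': 'Harvests records from a hypothetical example server',
        }


def missing_info_keys(info):
    """Return the descriptor keys that an info() result fails to provide."""
    if not info:
        return list(REQUIRED_INFO_KEYS)
    return [key for key in REQUIRED_INFO_KEYS if key not in info]


print(missing_info_keys(ExampleHarvester().info()))  # []
print(missing_info_keys({'name': 'csw'}))            # ['title', 'description']
```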

@@ -6,17 +6,32 @@ class IHarvester(Interface):

     '''

-    def get_type(self):
+    def info(self):
         '''
-        Plugins must provide this method, which will return a string with the
-        Harvester type implemented by the plugin (e.g ``CSW``,``INSPIRE``, etc).
-        This will ensure that they only receive Harvest Jobs and Objects
-        relevant to them.
+        Harvesting implementations must provide this method, which will return a
+        dictionary containing different descriptors of the harvester. The
+        returned dictionary should contain:

-        returns: A string with the harvester type
+        * name: machine-readable name. This will be the value stored in the
+          database, and the one used by ckanext-harvest to call the appropriate
+          harvester.
+        * title: human-readable name. This will appear in the form's select box
+          in the WUI.
+        * description: a short description of what the harvester does. This will
+          appear on the form as guidance to the user.
+
+        A complete example may be::
+
+            {
+                'name': 'csw',
+                'title': 'CSW Server',
+                'description': "A server that implements OGC's Catalog Service "
+                               "for the Web (CSW) standard"
+            }
+
+        returns: A dictionary with the harvester descriptors
         '''


     def gather_stage(self, harvest_job):
         '''
         The gather stage will receive a HarvestJob object and will be

@@ -55,7 +70,7 @@ class IHarvester(Interface):
         '''
         The import stage will receive a HarvestObject object and will be
         responsible for:
-            - performing any necessary action with the fetched object (e.g
+            - performing any necessary action with the fetched object (e.g
               create a CKAN package).
               Note: if this stage creates or updates a package, a reference
               to the package should be added to the HarvestObject.


@@ -196,7 +196,6 @@ def _prettify(field_name):
     return field_name.replace('_', ' ')

 def _error_summary(error_dict):
-
     error_summary = {}
     for key, error in error_dict.iteritems():
         error_summary[_prettify(key)] = error[0]

@@ -373,7 +372,7 @@ def import_last_objects(source_id=None):
         if obj.guid != last_obj_guid:
             imported_objects.append(obj)
             for harvester in PluginImplementations(IHarvester):
-                if harvester.get_type() == obj.job.source.type:
+                if harvester.info()['name'] == obj.job.source.type:
                     if hasattr(harvester,'force_import'):
                         harvester.force_import = True
                     harvester.import_stage(obj)

@@ -381,9 +380,14 @@ def import_last_objects(source_id=None):

     return imported_objects

-def get_registered_harvesters_types():
+def get_registered_harvesters_info():
     # TODO: Use new description interface when implemented
-    available_types = []
+    available_harvesters = []
     for harvester in PluginImplementations(IHarvester):
-        available_types.append(harvester.get_type())
-    return available_types
+        info = harvester.info()
+        if not info or 'name' not in info:
+            log.error('Harvester %r does not provide the harvester name in the info response' % str(harvester))
+            continue
+        available_harvesters.append(info)
+
+    return available_harvesters

@@ -66,7 +66,12 @@ def harvest_source_type_exists(value,context):
     # Get all the registered harvester types
     available_types = []
     for harvester in PluginImplementations(IHarvester):
-        available_types.append(harvester.get_type())
+        info = harvester.info()
+        if not info or 'name' not in info:
+            log.error('Harvester %r does not provide the harvester name in the info response' % str(harvester))
+            continue
+        available_types.append(info['name'])
+

     if not value in available_types:
         raise Invalid('Unknown harvester type: %s. Have you registered a harvester for this type?' % value)


@@ -9,3 +9,7 @@
 #harvest-sources th.action{
     font-style: italic;
 }
+
+.harvester-title{
+    font-weight: bold;
+}


@@ -77,7 +77,7 @@ def gather_callback(message_data,message):
         # matches
         harvester_found = False
         for harvester in PluginImplementations(IHarvester):
-            if harvester.get_type() == job.source.type:
+            if harvester.info()['name'] == job.source.type:
                 harvester_found = True
                 # Get a list of harvest object ids from the plugin
                 job.gather_started = datetime.datetime.now()

@@ -123,7 +123,7 @@ def fetch_callback(message_data,message):
         # the Harvester interface, only if the source type
         # matches
         for harvester in PluginImplementations(IHarvester):
-            if harvester.get_type() == obj.source.type:
+            if harvester.info()['name'] == obj.source.type:

                 # See if the plugin can fetch the harvest object
                 obj.fetch_started = datetime.datetime.now()

@@ -8,6 +8,7 @@
   <py:def function="body_class">hide-sidebar</py:def>
   <py:def function="optional_head">
     <link rel="stylesheet" href="${g.site_url}/css/forms.css" type="text/css" media="screen, print" />
+    <link type="text/css" rel="stylesheet" media="all" href="/ckanext/harvest/style.css" />
   </py:def>

   <div py:match="content">


@@ -8,6 +8,7 @@
   <py:def function="body_class">hide-sidebar</py:def>
   <py:def function="optional_head">
     <link rel="stylesheet" href="${g.site_url}/css/forms.css" type="text/css" media="screen, print" />
+    <link type="text/css" rel="stylesheet" media="all" href="/ckanext/harvest/style.css" />
   </py:def>

   <div py:match="content">


@@ -22,18 +22,17 @@
       <dt><label class="field_req" for="type">Source Type *</label></dt>
       <dd>
         <select id="type" name="type">
-          <py:for each="type in types">
-            <option value="${type}" py:attrs="{'selected': 'selected' if data.get('type', '') == type else None}" >${type}</option>
+          <py:for each="harvester in harvesters">
+            <option value="${harvester.name}" py:attrs="{'selected': 'selected' if data.get('type', '') == harvester.name else None}" >${harvester.title}</option>
           </py:for>
         </select>
       </dd>
       <dd class="field_error" py:if="errors.get('type', '')">${errors.get('type', '')}</dd>
       <dd class="instructions basic">Which type of source does the URL above represent?
-        <!--TODO: get these from the harvesters-->
-        <ul>
-          <li>A server's CSW interface</li>
-          <li>A Web Accessible Folder (WAF) displaying a list of GEMINI 2.1 documents</li>
-          <li>A single GEMINI 2.1 document</li>
+        <ul>
+          <py:for each="harvester in harvesters">
+            <li><span class="harvester-title">${harvester.title}</span>: ${harvester.description}</li>
+          </py:for>
         </ul>
       </dd>
       <dt><label class="field_opt" for="description">Description</label></dt>
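The sketch below approximates in plain Python (not Genshi) what the new form markup produces from the `info()` dictionaries. It is hedged: `render_type_options` and `render_type_descriptions` are hypothetical helpers. Values are escaped with the standard library's `html.escape`, roughly what Genshi does for `${...}` substitutions. The option value is `name`, which is what gets stored as the source type, while the visible label is `title`.

```python
from html import escape


def render_type_options(harvesters, selected_type=''):
    """Build the <option> elements that the form's py:for loop emits."""
    options = []
    for info in harvesters:
        # Mirrors py:attrs: the attribute is omitted unless the type matches.
        selected = ' selected="selected"' if info['name'] == selected_type else ''
        options.append('<option value="%s"%s>%s</option>'
                       % (escape(info['name']), selected, escape(info['title'])))
    return '\n'.join(options)


def render_type_descriptions(harvesters):
    """Build the <ul> of titles and descriptions shown as form guidance."""
    items = ['<li><span class="harvester-title">%s</span>: %s</li>'
             % (escape(info['title']), escape(info['description']))
             for info in harvesters]
    return '<ul>\n%s\n</ul>' % '\n'.join(items)


harvesters = [
    {'name': 'ckan', 'title': 'CKAN', 'description': 'Harvests remote CKAN instances'},
    {'name': 'csw', 'title': 'CSW Server',
     'description': "A server that implements OGC's Catalog Service for the Web (CSW) standard"},
]
print(render_type_options(harvesters, selected_type='csw'))
print(render_type_descriptions(harvesters))
```

`get_registered_harvesters_info` only filters on `name`. A harvester that omits `title` or `description` would make a strict renderer like this one raise `KeyError`, so a real form may want `.get()` with a fallback.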