from ckan.plugins.interfaces import Interface


class IHarvester(Interface):
    '''
    Common harvesting interface
    '''

    def info(self):
        '''
        Harvesting implementations must provide this method, which returns
        a dictionary containing different descriptors of the harvester. The
        returned dictionary should contain:

        * name: machine-readable name. This is the value stored in the
          database, and the one used by ckanext-harvest to call the
          appropriate harvester.
        * title: human-readable name. This appears in the form's select box
          in the web UI (WUI).
        * description: a short description of what the harvester does. This
          appears on the form as guidance for the user.

        A complete example may be::

            {
                'name': 'csw',
                'title': 'CSW Server',
                'description': "A server that implements OGC's Catalog "
                               "Service for the Web (CSW) standard"
            }

        :returns: A dictionary with the harvester descriptors
        '''

    def validate_config(self, config):
        '''
        [optional]

        Harvesters can provide this method to validate the configuration
        entered in the form. It should return a single string, which will be
        stored in the database. Any exceptions raised will be shown in the
        form's error messages.

        :param config: Config string coming from the form
        :returns: A string with the validated configuration options
        '''

    def get_original_url(self, harvest_object_id):
        '''
        [optional]

        This optional but highly recommended method allows harvesters to
        return the URL of the original remote document, given a Harvest
        Object id. Note that once you have retrieved the harvest object, you
        have access to its guid as well as to the object's source, which has
        the URL.
        This URL will be used in error reports to help publishers link to the
        original document that has the errors. If this method is not provided
        or no URL is returned, only a link to the local copy of the remote
        document will be shown.

        Examples:
            * For a CKAN record: http://{ckan-instance}/api/rest/{guid}
            * For a WAF record: http://{waf-root}/{file-name}
            * For a CSW record: http://{csw-server}/?Request=GetRecordById&Id={guid}&...

        :param harvest_object_id: HarvestObject id
        :returns: A string with the URL to the original document
        '''

    def gather_stage(self, harvest_job):
        '''
        The gather stage will receive a HarvestJob object and will be
        responsible for:
            - gathering all the necessary objects to fetch at a later
              stage (e.g. for a CSW server, perform a GetRecords request).
            - creating the necessary HarvestObjects in the database,
              specifying the guid and a reference to its job. The
              HarvestObjects need a reference date with the last modified
              date for the resource; this may need to be set in a different
              stage depending on the type of source.
            - creating and storing any suitable HarvestGatherErrors that may
              occur.
            - returning a list with all the ids of the created HarvestObjects.
            - to abort the harvest, creating a HarvestGatherError and raising
              an exception. Any created HarvestObjects will be deleted.

        :param harvest_job: HarvestJob object
        :returns: A list of HarvestObject ids
        '''

    def fetch_stage(self, harvest_object):
        '''
        The fetch stage will receive a HarvestObject object and will be
        responsible for:
            - getting the contents of the remote object (e.g. for a CSW server,
              perform a GetRecordById request).
            - saving the content in the provided HarvestObject.
            - creating and storing any suitable HarvestObjectErrors that may
              occur.
            - returning True if everything is OK (i.e. the object should now
              be imported), "unchanged" if the object didn't need harvesting
              after all (i.e. no error, but don't continue to the import
              stage) or False if there were errors.

        :param harvest_object: HarvestObject object
        :returns: True if successful, "unchanged" if there is nothing to
                  import after all, False if not successful
        '''

    def import_stage(self, harvest_object):
        '''
        The import stage will receive a HarvestObject object and will be
        responsible for:
            - performing any necessary action with the fetched object (e.g.
              create, update or delete a CKAN package).
              Note: if this stage creates or updates a package, a reference
              to the package should be added to the HarvestObject.
            - setting the HarvestObject.package (if there is one)
            - setting the HarvestObject.current for this harvest:
               - True if successfully created/updated
               - False if successfully deleted
            - setting HarvestObject.current to False for previous harvest
              objects of this harvest source if the action was successful.
            - creating and storing any suitable HarvestObjectErrors that may
              occur.
            - creating the HarvestObject - Package relation (if necessary)
            - returning True if the action was done, "unchanged" if the object
              didn't need harvesting after all or False if there were errors.

        NB: You can run this stage repeatedly using 'paster harvest import'.

        :param harvest_object: HarvestObject object
        :returns: True if the action was done, "unchanged" if the object didn't
                  need harvesting after all or False if there were errors.
        '''
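# ---------------------------------------------------------------------------
# Illustrative sketch only; it is not part of the IHarvester interface.
#
# The docstrings above describe what info(), validate_config() and
# fetch_stage() are expected to return. The classes below are a minimal,
# hypothetical illustration of those return values, under these assumptions:
#   * the names ExampleHarvester and _StandInHarvestObject are made up here;
#   * the configuration string is assumed to be JSON (the interface only
#     requires "a single string");
#   * the "remote server" is an in-memory dict, and "unchanged" is decided by
#     comparing content, which is one possible criterion among many;
#   * plugin registration with CKAN and database access are omitted.
# ---------------------------------------------------------------------------
import json


class _StandInHarvestObject(object):
    '''Hypothetical stand-in exposing only the guid and content attributes.'''

    def __init__(self, guid, content=None):
        self.guid = guid
        self.content = content


class ExampleHarvester(object):
    '''Hypothetical harvester showing the shape of the expected return values.'''

    # Hypothetical in-memory "remote server": guid -> document content.
    _remote_documents = {'doc-1': '<record>one</record>'}

    def info(self):
        # The three descriptors listed in IHarvester.info().
        return {
            'name': 'example',
            'title': 'Example Server',
            'description': 'A hypothetical source used for illustration',
        }

    def validate_config(self, config):
        # An empty configuration is passed through untouched.
        if not config:
            return config
        # json.loads raises ValueError on malformed input; per the docstring
        # above, exceptions raised here end up in the form's error messages.
        config_obj = json.loads(config)
        if not isinstance(config_obj, dict):
            raise ValueError('Configuration must be a JSON object')
        # Return a single normalised string, as the interface requires.
        return json.dumps(config_obj, sort_keys=True)

    def fetch_stage(self, harvest_object):
        remote = self._remote_documents.get(harvest_object.guid)
        if remote is None:
            # A real harvester would also store a HarvestObjectError here.
            return False
        if remote == harvest_object.content:
            # No error, but nothing new to hand to the import stage.
            return 'unchanged'
        harvest_object.content = remote
        return True


# Usage, e.g. from an interactive session:
#   harvester = ExampleHarvester()
#   obj = _StandInHarvestObject('doc-1')
#   harvester.fetch_stage(obj)   # first call stores the content
#   harvester.fetch_stage(obj)   # second call finds nothing new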