Document method and cleanup the interface file
Parent: 676c7d34b6
Commit: 8431182f01
README.rst (51 changes)
@@ -224,6 +224,7 @@ following methods::
 '''
 implements(IHarvester)
 
+
 def info(self):
 '''
 Harvesting implementations must provide this method, which will return a
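The `info()` method in this hunk returns a plain dictionary describing the harvester type. Below is a minimal sketch that reuses the descriptor values from the README's CSW example. The class name `CSWHarvester` is hypothetical, and a real plugin would also declare `implements(IHarvester)` as the README does. The long description is split using implicit string concatenation, because a single-quoted Python literal cannot span two lines the way the docstring example does.

```python
class CSWHarvester(object):
    """Hypothetical harvester; only the info() contract is sketched here."""

    def info(self):
        # Descriptors for this harvester type, mirroring the README example.
        return {
            'name': 'csw',
            'title': 'CSW Server',
            # Adjacent string literals are joined into one string.
            'description': ("A server that implements OGC's Catalog Service "
                            "for the Web (CSW) standard"),
        }


if __name__ == '__main__':
    descriptors = CSWHarvester().info()
    print(descriptors['name'], '-', descriptors['title'])
```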
@@ -237,30 +238,53 @@ following methods::
 in the WUI.
 * description: a small description of what the harvester does. This will
 appear on the form as a guidance to the user.
-* form_config_interface [optional]: Harvesters willing to store configuration
-values in the database must provide this key. The only supported value is
-'Text'. This will enable the configuration text box in the form. See also
-the ``validate_config`` method.
 
 A complete example may be::
 
 {
 'name': 'csw',
 'title': 'CSW Server',
-'description': 'A server that implements OGC\'s Catalog Service
+'description': 'A server that implements OGC's Catalog Service
 for the Web (CSW) standard'
 }
 
-returns: A dictionary with the harvester descriptors
+:returns: A dictionary with the harvester descriptors
 '''
 
 def validate_config(self, config):
 '''
+
+[optional]
+
 Harvesters can provide this method to validate the configuration entered in the
 form. It should return a single string, which will be stored in the database.
 Exceptions raised will be shown in the form's error messages.
 
-returns A string with the validated configuration options
+:param harvest_object_id: Config string coming from the form
+:returns: A string with the validated configuration options
+'''
+
+def get_original_url(self, harvest_object_id):
+'''
+
+[optional]
+
+This optional but very recommended method allows harvesters to return
+the URL to the original remote document, given a Harvest Object id.
+Note that getting the harvest object you have access to its guid as
+well as the object source, which has the URL.
+This URL will be used on error reports to help publishers link to the
+original document that has the errors. If this method is not provided
+or no URL is returned, only a link to the local copy of the remote
+document will be shown.
+
+Examples:
+* For a CKAN record: http://{ckan-instance}/api/rest/{guid}
+* For a WAF record: http://{waf-root}/{file-name}
+* For a CSW record: http://{csw-server}/?Request=GetElementById&Id={guid}&...
+
+:param harvest_object_id: HarvestObject id
+:returns: A string with the URL to the original document
 '''
 
 def gather_stage(self, harvest_job):
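`validate_config()` receives the text typed into the source form. It must return a single string to store, and it signals problems by raising an exception. The sketch below is a minimal version that makes two hypothetical assumptions: the configuration is a JSON object, and `max_records` and `user` are the only accepted keys. The method's argument is named `config`, even though the new docstring's `:param` line calls it `harvest_object_id`.

```python
import json


class ExampleHarvester(object):
    """Hypothetical harvester; only validate_config() is sketched."""

    # Assumption for this sketch: the only keys the harvester understands.
    ALLOWED_KEYS = {'max_records', 'user'}

    def validate_config(self, config):
        if not config:
            return config  # nothing entered in the form
        try:
            parsed = json.loads(config)
        except ValueError as e:
            # Per the docstring, raised exceptions appear in the form's errors.
            raise ValueError('Configuration is not valid JSON: %s' % e)
        if not isinstance(parsed, dict):
            raise ValueError('Configuration must be a JSON object')
        unknown = sorted(set(parsed) - self.ALLOWED_KEYS)
        if unknown:
            raise ValueError('Unknown configuration keys: %s' % ', '.join(unknown))
        # Return one normalised string; this is what gets stored.
        return json.dumps(parsed, sort_keys=True)
```

Normalising with `sort_keys=True` means two equivalent configurations entered in a different key order are stored as the same string.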
@@ -270,7 +294,10 @@ following methods::
 - gathering all the necessary objects to fetch on a later.
 stage (e.g. for a CSW server, perform a GetRecords request)
 - creating the necessary HarvestObjects in the database, specifying
-the guid and a reference to its source and job.
+the guid and a reference to its job. The HarvestObjects need a
+reference date with the last modified date for the resource, this
+may need to be set in a different stage depending on the type of
+source.
 - creating and storing any suitable HarvestGatherErrors that may
 occur.
 - returning a list with all the ids of the created HarvestObjects.
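The gather-stage duties listed in this hunk can be sketched with in-memory stand-ins for the harvest models. Everything in this sketch is hypothetical:

* `HarvestJob`, `HarvestObject` and `HarvestGatherError` are plain dataclasses rather than the extension's database models.
* The `reference_date` field name is only illustrative.
* The list of `(guid, last_modified)` pairs stands in for the result of a remote listing such as a CSW GetRecords request.

Records without an identifier become gather errors instead of objects.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime
from typing import Iterable, List, Optional, Tuple


@dataclass
class HarvestJob:
    id: str


@dataclass
class HarvestObject:
    guid: str
    job_id: str  # reference to the job that created the object
    # Last modified date of the remote resource, when known at gather time;
    # for other source types it may only be set in a later stage.
    reference_date: Optional[datetime] = None
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


@dataclass
class HarvestGatherError:
    message: str
    job_id: str


class ExampleHarvester:
    def __init__(self, remote_records: Iterable[Tuple[str, Optional[datetime]]]):
        # (guid, last_modified) pairs standing in for a remote listing.
        self.remote_records = remote_records
        self.objects: List[HarvestObject] = []  # stands in for the database
        self.gather_errors: List[HarvestGatherError] = []

    def gather_stage(self, harvest_job: HarvestJob) -> List[str]:
        ids = []
        for guid, last_modified in self.remote_records:
            if not guid:
                self.gather_errors.append(
                    HarvestGatherError('Remote record has no identifier',
                                       harvest_job.id))
                continue
            obj = HarvestObject(guid=guid, job_id=harvest_job.id,
                                reference_date=last_modified)
            self.objects.append(obj)
            ids.append(obj.id)
        # The gather stage returns the ids of the HarvestObjects it created.
        return ids
```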
@@ -301,8 +328,7 @@ following methods::
 - performing any necessary action with the fetched object (e.g
 create a CKAN package).
 Note: if this stage creates or updates a package, a reference
-to the package must be added to the HarvestObject.
-Additionally, the HarvestObject must be flagged as current.
+to the package should be added to the HarvestObject.
 - creating the HarvestObject - Package relation (if necessary)
 - creating and storing any suitable HarvestObjectErrors that may
 occur.
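The import-stage duties in this hunk can be sketched in the same style, again using hypothetical in-memory stand-ins. The sketch makes three assumptions:

* The fetch stage left JSON text in `harvest_object.content`.
* A dict plays the role of the CKAN package store.
* Package ids are derived from the guid.

On success it records the package reference on the HarvestObject and returns `True`. When the content cannot be used, it records a HarvestObjectError and returns `False`.

```python
import json
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class HarvestObject:
    guid: str
    content: Optional[str] = None     # what the fetch stage stored
    package_id: Optional[str] = None  # reference to the created/updated package


@dataclass
class HarvestObjectError:
    message: str
    harvest_object: HarvestObject


class ExampleHarvester:
    def __init__(self):
        self.packages: Dict[str, dict] = {}  # stands in for CKAN packages
        self.object_errors: List[HarvestObjectError] = []

    def _error(self, message, harvest_object):
        self.object_errors.append(HarvestObjectError(message, harvest_object))
        return False

    def import_stage(self, harvest_object: HarvestObject) -> bool:
        if not harvest_object.content:
            return self._error('Empty content', harvest_object)
        try:
            package_dict = json.loads(harvest_object.content)
        except ValueError as e:
            return self._error('Could not parse content: %s' % e, harvest_object)
        # Create or update the package. The id is derived from the guid,
        # so re-importing the same record updates the same entry.
        package_id = 'pkg-' + harvest_object.guid
        self.packages[package_id] = package_dict
        # Keep a reference to the package on the HarvestObject.
        harvest_object.package_id = package_id
        return True
```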
@@ -312,6 +338,7 @@ following methods::
 :returns: True if everything went right, False if errors were found
 '''
 
+
 See the CKAN harvester for an example of how to implement the harvesting
 interface:
 
@@ -466,8 +493,8 @@ following steps with the one you are using.
 you defined in the `stdout_logfile` section to see what happened. Common errors include::
 
 `socket.error: [Errno 111] Connection refused`
 RabbitMQ is not running::
 
 sudo service rabbitmq-server start
 
 4. Once we have the two consumers running and monitored, we just need to create a cron job
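`[Errno 111] Connection refused` means nothing accepted the TCP connection at the broker's address. The small check below can be run before restarting the consumers. It assumes the broker runs on `localhost` and listens on RabbitMQ's default AMQP port, 5672; adjust both for your setup.

```python
import socket


def broker_reachable(host='localhost', port=5672, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except socket.error:
        # Covers "[Errno 111] Connection refused" as well as timeouts.
        return False
    conn.close()
    return True


if __name__ == '__main__':
    if not broker_reachable():
        print('Nothing listening on localhost:5672; is rabbitmq-server running?')
```

If it returns `False`, start the broker with the `service` command shown in the hunk and check again.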
@@ -29,7 +29,43 @@ class IHarvester(Interface):
 for the Web (CSW) standard'
 }
 
-returns: A dictionary with the harvester descriptors
+:returns: A dictionary with the harvester descriptors
+'''
+
+def validate_config(self, config):
+'''
+
+[optional]
+
+Harvesters can provide this method to validate the configuration entered in the
+form. It should return a single string, which will be stored in the database.
+Exceptions raised will be shown in the form's error messages.
+
+:param harvest_object_id: Config string coming from the form
+:returns: A string with the validated configuration options
+'''
+
+def get_original_url(self, harvest_object_id):
+'''
+
+[optional]
+
+This optional but very recommended method allows harvesters to return
+the URL to the original remote document, given a Harvest Object id.
+Note that getting the harvest object you have access to its guid as
+well as the object source, which has the URL.
+This URL will be used on error reports to help publishers link to the
+original document that has the errors. If this method is not provided
+or no URL is returned, only a link to the local copy of the remote
+document will be shown.
+
+Examples:
+* For a CKAN record: http://{ckan-instance}/api/rest/{guid}
+* For a WAF record: http://{waf-root}/{file-name}
+* For a CSW record: http://{csw-server}/?Request=GetElementById&Id={guid}&...
+
+:param harvest_object_id: HarvestObject id
+:returns: A string with the URL to the original document
 '''
 
 def gather_stage(self, harvest_job):
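Below is a sketch of `get_original_url()` that follows the three URL patterns in the new docstring. It rests on several hypothetical assumptions:

* Plain dataclasses and a dict lookup replace loading the HarvestObject from the database.
* A `source_type` field selects which URL pattern to use.
* A WAF record's guid is its file name.

For CSW it builds a request for the standard `GetRecordById` operation with CSW 2.0.2 key-value parameters; the docstring's example spells the request `GetElementById`. Returning `None` corresponds to the documented fallback of linking only to the local copy.

```python
from dataclasses import dataclass
from typing import Dict, Optional
from urllib.parse import quote, urlencode


@dataclass
class HarvestSource:
    url: str
    source_type: str  # 'ckan', 'waf' or 'csw' in this sketch


@dataclass
class HarvestObject:
    id: str
    guid: str
    source: HarvestSource


class ExampleHarvester:
    def __init__(self, objects: Dict[str, HarvestObject]):
        # Stands in for looking up HarvestObjects by id in the database.
        self.objects = objects

    def get_original_url(self, harvest_object_id: str) -> Optional[str]:
        obj = self.objects.get(harvest_object_id)
        if obj is None:
            return None  # no URL: only the local copy gets linked
        base = obj.source.url.rstrip('/')
        if obj.source.source_type == 'ckan':
            # http://{ckan-instance}/api/rest/{guid}
            return '%s/api/rest/%s' % (base, obj.guid)
        if obj.source.source_type == 'waf':
            # http://{waf-root}/{file-name}, assuming guid == file name
            return '%s/%s' % (base, quote(obj.guid))
        if obj.source.source_type == 'csw':
            # http://{csw-server}/?service=CSW&...&request=GetRecordById&id={guid}
            query = urlencode({
                'service': 'CSW',
                'version': '2.0.2',
                'request': 'GetRecordById',
                'id': obj.guid,
            })
            return '%s/?%s' % (base, query)
        return None
```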
@@ -82,4 +118,3 @@ class IHarvester(Interface):
 :param harvest_object: HarvestObject object
 :returns: True if everything went right, False if errors were found
 '''
-