It will return the counts for the 20 most common errors for that
particular job. These will be available when calling harvest_job_show.
Also refactor the harvest source status object to just call
harvest_job_dictize on the 'last_job' key, as it has all the
interesting fields anyway.
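
For illustration only, a minimal sketch of how such a "top 20 errors"
count could be computed. This is not the query used by the extension:
the function name top_error_counts and the sample messages are
hypothetical, and the real implementation works against the database
rather than an in-memory list.

```python
from collections import Counter


def top_error_counts(messages, limit=20):
    """Return (message, count) pairs for the most frequent messages.

    `messages` is any iterable of error message strings; `limit` caps
    the number of distinct messages returned (20, as in the text above).
    """
    return Counter(messages).most_common(limit)


# Hypothetical error messages collected for a single job.
errors = [
    "Timeout",
    "Invalid XML",
    "Timeout",
    "Missing title",
    "Timeout",
    "Invalid XML",
]
top = top_error_counts(errors)
```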
The authorization functions have been refactored to take into account
both the new organization-based authorization in CKAN core and the
harvest source datasets.
At the source level, authorization checks are forwarded to the
relevant package auth function (package_create, package_update, etc.),
which will check for organization membership, sysadmin status, etc.
Also we only use functions available on the plugins toolkit whenever
possible.
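
A rough sketch of the forwarding idea, under stated assumptions. It
does not import CKAN: package_update_auth below is a hypothetical
stand-in for the core package_update auth function, and the
"sysadmins" and "org_members" keys are invented for the example. The
only CKAN convention relied on is that auth functions return a dict
with a "success" key.

```python
def package_update_auth(context, data_dict):
    """Hypothetical stand-in for the core package_update auth function.

    Allows sysadmins and members of the owning organization. The
    'sysadmins' and 'org_members' keys exist only for this sketch.
    """
    user = context["user"]
    allowed = (
        user in context.get("sysadmins", ())
        or user in data_dict.get("org_members", ())
    )
    return {"success": allowed}


def harvest_source_update_auth(context, data_dict):
    """Source-level check: defer entirely to the package-level check."""
    return package_update_auth(context, data_dict)


# Hypothetical harvest source dataset owned by an organization.
source = {"id": "my-source", "org_members": ["alice"]}
member = harvest_source_update_auth({"user": "alice"}, source)
outsider = harvest_source_update_auth({"user": "bob"}, source)
sysadmin = harvest_source_update_auth(
    {"user": "root", "sysadmins": ["root"]}, source
)
```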
Until now, harvest jobs were set to Finished just after sending all
objects to the fetch stage. Now every time the run command is run, jobs
are set to Running, and all previous Running jobs are checked to see if
all harvest objects have a state of Complete or Error. Only then is the
job flagged as Finished.
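
The job lifecycle described above can be sketched as follows. This is a
simplified model, not the extension's code: jobs are plain dicts, the
function name run is hypothetical, and the "New" job status and
"Fetching" object state are assumed names used only for the example.

```python
# Object states that count as done, as named in the text above.
FINAL_OBJECT_STATES = {"Complete", "Error"}


def run(jobs):
    """One pass of a hypothetical run command over a list of job dicts.

    Previously Running jobs become Finished only when every one of
    their harvest objects is Complete or Error. Pending jobs (assumed
    status 'New') become Running and are not re-checked in this pass.
    """
    for job in jobs:
        if job["status"] == "Running":
            if all(o["state"] in FINAL_OBJECT_STATES for o in job["objects"]):
                job["status"] = "Finished"
        elif job["status"] == "New":
            job["status"] = "Running"
    return jobs


jobs = [
    # All objects done: flagged as Finished on this pass.
    {"id": "a", "status": "Running",
     "objects": [{"state": "Complete"}, {"state": "Error"}]},
    # One object still in progress: stays Running.
    {"id": "b", "status": "Running",
     "objects": [{"state": "Complete"}, {"state": "Fetching"}]},
    # Pending job: set to Running, checked on the next pass.
    {"id": "c", "status": "New", "objects": []},
]
run(jobs)
statuses = [job["status"] for job in jobs]
```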
The status dict is added automatically to harvest source packages.
Note that the actual queries still need to be updated as they probably
won't scale.
`harvest_source_create` and `harvest_source_update` now call
`package_create` and `package_update` respectively, making sure to
define a 'harvest_source' type. The returned dict uses the db_to_form
schema.
Use case: In ckanext-dgu we want to index the harvest_object.content field. As indexing is done synchronously we need
to provide a way for that harvest_object to be accessed when the current http request is made by a non-sysadmin user.
The publisher profile allows general users to handle harvest sources
based on membership to a certain group (publisher), as opposed to the
default auth profile where only sysadmins can perform any harvesting
task.
To enable it, put this directive in your ini file:
ckan.harvest.auth.profile = publisher
TODO:
* Save publisher id / user id when creating sources
* Show publisher in form and index page
The first version of the auth layer is based on the current policy, i.e.
you need to be sysadmin to perform any action.
TODO: the CLI is still not working.