Otherwise CKAN thinks they are uploads, datastore resources, etc, which
it can cause problems eg when displaying the URL of the resource. We
are just linking to the remote resource URL.
This was caused by a combination of the auth audit leaking and the
harvester reusing the context for the package_show and package_create
actions. If the package is not found, package_show does not call
check_access, and the auth audit does not pass. This is stored in the
context (`__auth_audit`) and is raised next time that we call
get_action (when we call package_create with the same context)
It could potentially be fixed on master, but it is probably quite rare.
The currently implementation returns False when a harvest source is being harvested. This leads to an error on the harvesting job, which in turn tends to confuse users that have no idea of this special implementation. This fix ensures that harvest sources are still ignored, but silently.
If the harvest source belongs to an organization, new datasets should be added
to it. This is already the case in the spatial harvesters.
The remote orgs logic has been kept, with the difference that if for
some reason the remote org can not be assigned, the local one is used.
If the source does not have an organization, none is added.
It's hard for someone outside CKAN to make sure they're sending it in the format
we expect. And they'll also have to keep track of our name format, to keep in
sync whenever we change.
To fix this, we simply do what we already do when creating packages: use a
default name. In this case, the current one.
Remove extras whose values are not strings (e.g. dicts, lists..) from
packages before attempting to create or update the packages on the
target site.
In CKAN 1 it was possible for the values of extras to be other types,
but in CKAN 2 they must be strings, so when harvesting from a CKAN 1 site
into a CKAN 2 site SQLAlchemy would crash when trying to create packages
with non-string extras.
The fix in this commit is to simply remove any non-string extras from
the harvested package. (Alternatively, we could try to convert them to a
string using JSON.)
Fixes#42.
If neither 'only_local' or 'create' are used the remote groups property
needs to be removed, otherwise it causes an exception when the group is
not found.
I have forgotten to update one check for the api_version 1 in the code
responsible for the remote group import feature. This commit fixes that.
Signed-off-by: Konrad Reiche <konrad.reiche@fokus.fraunhofer.de>
I have added try-except clauses in order to prevent the process from
crashing if a non-parsable integer is used for the api_version option.
Signed-off-by: Konrad Reiche <konrad.reiche@fokus.fraunhofer.de>
The CKAN logic uses integers when dealing with the API version, e.g.
making checks which API version is in use. Currently, the harvester
uses strings to identify the API version. Instead of dealing with
type conversion the harvester could use integers directly.
This commit fixesokfn/ckanext-harvest#36. When the API version is
parsed from the configuration it is passed through the int() function.
This way the harvesting will still work even if a harvest source was
configured with a string API version which makes this commit backward
compatible.
Signed-off-by: Konrad Reiche <konrad.reiche@fokus.fraunhofer.de>
When a new harvest_object for a new package was being created, it
was immediately being marked as false, as all objects were marked
as false, including the new object just created and newly marked
as current=true.
Fix so that old HarvestObjects are only marked as current=False
when updating an existing package.
change check on gather stage to check for changed packages since
last job instead of current harvest job's gather_start
fix attribute look up bug
fix print_job to print 0 gather_errors instead of key error
This is a convenience class that other harvesters can extend. Updates
include a cleanup of old functions and porting of enhancements from the
spatial harvesters.