This prevents exceptions from appearing in the log from Jinja:
[error] [client 1.2.3.4] Error - <class 'jinja2.exceptions.UndefinedError'>: 'dict object' has no attribute 'status'
This is especially needed if you create a new harvest source which does not have all the optional arguments. Before this lead to a KeyError after the creation of the source. Now this simply output 'None'.
At some point we may want to transform these to local time at the
dictization level. We will need a library like dateutil to handle it
properly though.
Remove extras whose values are not strings (e.g. dicts, lists..) from
packages before attempting to create or update the packages on the
target site.
In CKAN 1 it was possible for the values of extras to be other types,
but in CKAN 2 they must be strings, so when harvesting from a CKAN 1 site
into a CKAN 2 site SQLAlchemy would crash when trying to create packages
with non-string extras.
The fix in this commit is to simply remove any non-string extras from
the harvested package. (Alternatively, we could try to convert them to a
string using JSON.)
Fixes#42.
If neither 'only_local' or 'create' are used the remote groups property
needs to be removed, otherwise it causes an exception when the group is
not found.
I have forgotten to update one check for the api_version 1 in the code
responsible for the remote group import feature. This commit fixes that.
Signed-off-by: Konrad Reiche <konrad.reiche@fokus.fraunhofer.de>
I have added try-except clauses in order to prevent the process from
crashing if a non-parsable integer is used for the api_version option.
Signed-off-by: Konrad Reiche <konrad.reiche@fokus.fraunhofer.de>
The CKAN logic uses integers when dealing with the API version, e.g.
making checks which API version is in use. Currently, the harvester
uses strings to identify the API version. Instead of dealing with
type conversion the harvester could use integers directly.
This commit fixesokfn/ckanext-harvest#36. When the API version is
parsed from the configuration it is passed through the int() function.
This way the harvesting will still work even if a harvest source was
configured with a string API version which makes this commit backward
compatible.
Signed-off-by: Konrad Reiche <konrad.reiche@fokus.fraunhofer.de>
Current implementation only checked for the first source to exist and
didn't allow to rerun the migration for other sources if there was an
error. With the new one, all non existing sources are migrated each
time.
When deleting a source, if clear_source equals true in the context,
harvest_source_clear will be called. Default is false. The UI shows a
select with the two options.
See issue for full details. Basically we don't want to catch any
exception at the queue.py level, as they prevent debugging. Harvesters
should deal with them and return a list of ids or an empty list if no
objects need to be fetched.
Also improved the debug messages.
Due to #607 in CKAN core, once a source was deleted you could not
reactivate it again. As a workaround, if the source is deleted the
Delete button is not shown and the state select is, so you can set it to
'active'.
Also fixed wrong redirect after deletion.
The harvest_source_delete logic function proxies to package delete,
which will delete the harvest source dataset. The harvest plugin then
hooks to the after_delete extension point in order to inactivate the
actual HarvestSource object and abort any pending jobs.
Also added the Delete button to the harvest source form.
When a new harvest_object for a new package was being created, it
was immediately being marked as false, as all objects were marked
as false, including the new object just created and newly marked
as current=true.
Fix so that old HarvestObjects are only marked as current=False
when updating an existing package.