Commit Graph

65 Commits

Author SHA1 Message Date
Stefan Oderbolz 0fd38e0e54 Use _get_group as a fallback for remote orgs
First try to get a remote org from the remote Action API, if this fails
try to use the old rest api call, which works on older CKAN versions.

Only if both options fail, its currently not possible to get the remote
organization.
2015-01-14 00:10:27 +01:00
Stefan Oderbolz f214577872 Fetch remote organization via action api
Organizations used to be returned by /api/2/rest/group, this is what the
old implementation used to fetch the information to create the remote
organization on the local instance of CKAN.

With this commit the Action API is used to fetch the same information.
2015-01-13 14:46:53 +01:00
amercader 8cf254f112 Merge branch '99-all-non-ascii-tags' of https://github.com/morty/ckanext-harvest into morty-99-all-non-ascii-tags 2014-08-29 14:40:43 +01:00
Tom Mortimer-Jones 8a2c072d4e [101] Use name from database when reharvesting package 2014-08-12 11:18:48 +01:00
Tom Mortimer-Jones 65cfade420 [99] Remove empty tags produced by munging all non-ascii tags
I thought this way of filtering was easier to read than filter(None, tags)
2014-08-07 17:05:16 +01:00
amercader fbde0b8dc1 [#87] Remove remote url_type from resources
Otherwise CKAN thinks they are uploads, datastore resources, etc, which
it can cause problems eg when displaying the URL of the resource. We
are just linking to the remote resource URL.
2014-02-11 17:27:19 +00:00
amercader 2a07a144fc [#84] Fix auth audit exception when creating datasets
This was caused by a combination of the auth audit leaking and the
harvester reusing the context for the package_show and package_create
actions. If the package is not found, package_show does not call
check_access, and the auth audit does not pass. This is stored in the
context (`__auth_audit`) and is raised next time that we call
get_action (when we call package_create with the same context)

It could potentially be fixed on master, but it is probably quite rare.
2014-02-10 18:22:48 +00:00
amercader 5b677b6099 [#83] Fix key error when using default_groups 2014-02-10 13:16:58 +00:00
Rachel Knowler bf11e4d330 Moved clean_tags check into _create_or_update_package method. 2014-02-10 09:29:01 +01:00
Rachel Knowler 2ba9908653 Config option to munge tags changed to be consistent with other config options in this extension, and noted in README. 2014-01-29 10:55:51 +01:00
Rachel Knowler 5e1aef1d08 Removed extra newline. 2014-01-29 10:06:32 +01:00
Rachel Knowler 7d71b0a00b Wrap tag munging code in config option, defaulting to False. 2014-01-29 10:02:16 +01:00
Stefan Oderbolz c52085006a [#61] Truly ignore harvest sources
The currently implementation returns False when a harvest source is being harvested. This leads to an error on the harvesting job, which in turn tends to confuse users that have no idea of this special implementation. This fix ensures that harvest sources are still ignored, but silently.
2013-10-23 07:40:55 +02:00
amercader c18d9dc3af [#71] CKAN harvester: Add datasets to source organization
If the harvest source belongs to an organization, new datasets should be added
to it. This is already the case in the spatial harvesters.

The remote orgs logic has been kept, with the difference that if for
some reason the remote org can not be assigned, the local one is used.

If the source does not have an organization, none is added.
2013-10-22 16:24:43 +01:00
amercader bd62b62764 Merge branch 'metaodi-add-harvesting-of-organizations' 2013-10-15 17:50:04 +01:00
Stefan Oderbolz 8b5d70c6fe Only try to create/match a organization if there is a remote_org 2013-10-11 18:08:32 +02:00
amercader 0f5624822c Use remote name if present when creating datasets on CKAN harvester 2013-10-11 16:50:25 +01:00
Stefan Oderbolz dd1acd0c6b Use remote_orgs for organizations 2013-10-07 11:22:19 +02:00
Stefan Oderbolz d50eb6fca8 Harvesting of remote organisations similar to remote groups 2013-10-04 16:37:52 +02:00
Vitor Baptista f028375ad3 [#62] Use current name when updating package, if the user haven't sent a new one
It's hard for someone outside CKAN to make sure they're sending it in the format
we expect. And they'll also have to keep track of our name format, to keep in
sync whenever we change.

To fix this, we simply do what we already do when creating packages: use a
default name. In this case, the current one.
2013-08-18 12:08:30 -03:00
amercader 01ca5c0dfd [#61] Ignore harvest sources on the CKAN harvester 2013-08-15 14:38:33 +01:00
amercader b25fffda93 [#36] Fix bug on API version checking 2013-08-15 14:37:55 +01:00
amercader 39ad78d90a [#59] Ignore auth in the CKAN harvester 2013-08-15 14:37:12 +01:00
amercader 584c340583 Merge branch '42-remove-non-string-extras' 2013-06-03 10:33:59 +01:00
Sean Hammond 01df3a1db4 [#42] Dump non-string extras with json
Convert any non-string extra values to strings using json.dumps(),
instead of just deleting them.
2013-05-31 20:35:06 +02:00
amercader 3a31db59b6 [#36] Move validation code to validate_config
This ensures it is checked whenever the source is edited or created.
2013-05-31 17:23:40 +01:00
amercader a6a0196a4e Merge branch 'api-version-fix' of git://github.com/fraunhoferfokus/ckanext-harvest into fraunhoferfokus-api-version-fix 2013-05-31 17:15:43 +01:00
Sean Hammond 85a013f2c9 [#42] Remove non-string extras from packages
Remove extras whose values are not strings (e.g. dicts, lists..) from
packages before attempting to create or update the packages on the
target site.

In CKAN 1 it was possible for the values of extras to be other types,
but in CKAN 2 they must be strings, so when harvesting from a CKAN 1 site
into a CKAN 2 site SQLAlchemy would crash when trying to create packages
with non-string extras.

The fix in this commit is to simply remove any non-string extras from
the harvested package. (Alternatively, we could try to convert them to a
string using JSON.)

Fixes #42.
2013-05-31 15:43:42 +02:00
amercader 361abcfc07 [#17] Fix bug with remote groups handling
If neither 'only_local' or 'create' are used the remote groups property
needs to be removed, otherwise it causes an exception when the group is
not found.
2013-05-30 18:06:15 +01:00
Konrad Reiche 87cae31c75 Fix api_version check in the group importer code
I have forgotten to update one check for the api_version 1 in the code
responsible for the remote group import feature. This commit fixes that.

Signed-off-by: Konrad Reiche <konrad.reiche@fokus.fraunhofer.de>
2013-05-27 13:36:56 +02:00
Konrad Reiche c858b9fe9f Add exception handling for the API version parsing
I have added try-except clauses in order to prevent the process from
crashing if a non-parsable integer is used for the api_version option.

Signed-off-by: Konrad Reiche <konrad.reiche@fokus.fraunhofer.de>
2013-05-27 13:12:05 +02:00
Konrad Reiche 05094090af Change type of the API version to integer
The CKAN logic uses integers when dealing with the API version, e.g.
making checks which API version is in use. Currently, the harvester
uses strings to identify the API version. Instead of dealing with
type conversion the harvester could use integers directly.

This commit fixes okfn/ckanext-harvest#36. When the API version is
parsed from the configuration it is passed through the int() function.
This way the harvesting will still work even if a harvest source was
configured with a string API version which makes this commit backward
compatible.

Signed-off-by: Konrad Reiche <konrad.reiche@fokus.fraunhofer.de>
2013-05-27 12:51:48 +02:00
amercader 3d2867ca04 [#17] Remove ckanclient dependency as it is not used 2013-05-24 17:55:37 +01:00
amercader f1d11c1307 [#17] Import remote groups in CKAN harvester
This is a cleaner commit of the great work done by @platzhirsch
implementing remote groups import on the CKAN harvester.
2013-05-24 16:55:05 +01:00
amercader 1efd7ab4cd Ignore remote orgs in CKAN harvester
If #17 progresses we can do somethign similar for them, although it amy
be more complicated because of authorization issues.
2013-05-16 17:30:54 +01:00
kindly c754479014 #29 make new idatasets form work with harvest source form 2013-03-25 17:38:07 +00:00
joetsoi 7257258ca4 mark new harvest objects as current
When a new harvest_object for a new package was being created, it
was immediately being marked as false, as all objects were marked
as false, including the new object just created and newly marked
as current=true.

Fix so that old HarvestObjects are only marked as current=False
when updating an existing package.
2013-03-07 20:27:27 +00:00
joetsoi 9432368bea fix gather_stage if there is a previous job
change check on gather stage to check for changed packages since
last job instead of current harvest job's gather_start

fix attribute look up bug

fix print_job to print 0 gather_errors instead of key error
2013-02-28 19:06:21 +00:00
joetsoi ba486a9482 add indexing of datasets whilst harvesting 2013-02-27 11:34:09 +00:00
joetsoi f97e3b4c6c add return True to import stage of ckanharvester
Was causing queue.py to report that the import had errored.
2013-02-22 10:13:36 +00:00
amercader 177349fd76 Update HarvesterBase
This is a convenience class that other harvesters can extend. Updates
include a cleanup of old functions and porting of enhancements from the
spatial harvesters.
2013-02-12 16:10:13 +00:00
amercader 871eae94b6 [ckan harvester] Fix bug on force all check 2012-03-15 11:31:12 +00:00
amercader f210455aef [ckan harvester] Replace title on default extras 2012-03-13 12:38:14 +00:00
amercader e0bef2ef9c [base] Minor fix for harvesters without config 2012-03-12 14:46:28 +00:00
amercader 50537a6738 Merge branch 'master' into enh-1726-harvesting-model-update 2012-02-15 12:01:15 +00:00
amercader 9ed152cbea [ckan harvester] Add support for forcing gathering of all remote packages 2012-02-03 17:54:34 +00:00
amercader 479750da09 [#1726][base harvester] Set current field when importing 2012-02-02 13:18:43 +00:00
amercader eb646b3385 [ckan harvester] Add support for defining default extras 2012-01-10 17:07:19 +00:00
amercader ae51093213 [ckan harvester] Ignore __junk field, was causing imports to fail 2012-01-10 14:46:12 +00:00
Adrià Mercader da469ab08e [base harvester] Custom tag munge function. TODO: check with flexible tags 2011-11-23 11:05:52 +00:00