Commit Graph

133 Commits

Author SHA1 Message Date
amercader 753d29d5ff Merge branch 'master' of https://github.com/etri-odp/ckanext-harvest into etri-odp-master 2018-10-25 12:47:26 +02:00
seitenbau-govdata 938d0322f5 Add config option for dataset name append type 2018-06-01 21:43:37 +02:00
MinhChau 0f8d448864 feat: add groups filter config 2018-04-17 13:22:26 +09:00
Knud Möller 717fdb35dd move _last_error_free_job from CKANHarvester to HarvesterBase 2017-11-10 12:19:25 +01:00
etj d1dd4eb227 303: fix clean_tags with tags dict (fixes requested by review) 2017-11-08 13:46:44 +01:00
etj 41aa9c121f 303: fix clean_tags with tags dict 2017-11-07 14:14:50 +01:00
etj 207ac81d70 [289] Fix default_extras 2017-05-29 20:11:49 +02:00
David Read cc438786de More explicit checks of the exception thrown when checking harvest config. Also the default_groups test was checking the wrong thing completely. 2017-05-02 20:39:49 +00:00
Adrià Mercader e3b4854b07 Merge pull request #270 from GovDataOfficial/improve-resolving-local-groups
Improve resolving local groups
2017-03-01 12:29:29 +00:00
Mark Gregson dc06f92ec7 Return an empty list when no CKAN datasets are gathered 2017-02-21 12:22:02 +11:00
seitenbau-govdata 0f951d9fc0 Improve resolving local groups
Improve resolving local groups by searching for group additionally by name.
2016-11-15 22:38:27 +01:00
David Read 78933fb775 [#253] Fix default_groups by saving the dicts to the config object, since saving it to the harvester object doesn't work in the real world. This is a lot more efficient than doing group_show for every dataset imported. 2016-06-27 12:01:35 +01:00
Jardel Weyrich e8f539a45e Don't let the user specify mutually exclusive configuration options:
- organizations_filter_include
- organizations_filter_exclude
2016-06-14 11:35:38 -03:00
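A minimal sketch of the kind of check this commit describes, assuming the harvest source config arrives as a JSON string. Only the two option names come from the commit message; the helper name `validate_org_filters` and the choice of `ValueError` are hypothetical.

```python
import json


def validate_org_filters(config_str):
    """Reject a harvest config that sets both organization filters.

    Hypothetical helper: the name and the ValueError are illustrative,
    not taken from ckanext-harvest.
    """
    config = json.loads(config_str) if config_str else {}
    if ("organizations_filter_include" in config
            and "organizations_filter_exclude" in config):
        raise ValueError(
            "organizations_filter_include and organizations_filter_exclude "
            "are mutually exclusive"
        )
    return config
```

A config that sets only one of the two options passes through unchanged; one that sets both is rejected before any harvesting starts.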
David Read f1742fb51a Fix default_groups. It accepted a list of package_name/ids and was trying to add this to the package, but the package needs a dict. Added test. 2016-06-10 09:16:32 +00:00
David Read bfc9b8e0d9 [#249] Test and fix docs for default_tags. Needed to improve error handling when saving ValidationError in a HOE. 2016-06-09 22:11:03 +00:00
amercader 5e1512f717 Don't reuse contexts on ckan harvester
Reusing the same context on all calls can lead to hard to debug failures
like

Action function organization_show did not call its auth function

In this case that was caused because the first organization/group_show
raised a NotFound so the auth audit was still in the context. When
organization/group_show was called again at the end of
organization/group_create the auth audit exception was raised.

This commit makes sure that each call has its own context.
2016-05-23 12:20:08 +01:00
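The failure mode in this commit can be illustrated with a simplified simulation. Nothing below is CKAN code: `fake_show`, `make_context`, the exception classes and the exact bookkeeping are assumptions that only mimic the idea of an action leaving audit state behind in a shared context when it raises.

```python
class NotFound(Exception):
    pass


class AuthAuditError(Exception):
    pass


def fake_show(context, data_dict):
    """Stand-in for an action such as organization_show (not CKAN code)."""
    if context.get("__auth_audit"):
        # Leftover state from an earlier call that raised
        raise AuthAuditError("stale auth audit entry in context")
    context["__auth_audit"] = ["fake_show"]
    if data_dict["id"] != "known-org":
        raise NotFound(data_dict["id"])  # the audit entry is left behind
    context.pop("__auth_audit")
    return {"id": data_dict["id"]}


def make_context():
    """Build a fresh context dict for every call (hypothetical helper)."""
    return {"user": "harvest", "ignore_auth": True}


def show_with_shared_context(ids):
    """Reuses one context for all calls: breaks after the first NotFound."""
    context = make_context()
    results = []
    for org_id in ids:
        try:
            results.append(fake_show(context, {"id": org_id}))
        except NotFound:
            results.append(None)
    return results


def show_with_fresh_contexts(ids):
    """Gives each call its own context, so no state leaks between calls."""
    results = []
    for org_id in ids:
        try:
            results.append(fake_show(make_context(), {"id": org_id}))
        except NotFound:
            results.append(None)
    return results
```

With the ids `["missing", "known-org"]`, the shared-context variant raises `AuthAuditError` on the second call, while the fresh-context variant returns a result for it.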
Petar Efnushev c16ecea7f0 reverted change in default groups validation 2016-05-20 20:15:54 +02:00
Petar Efnushev c154365371 Fixed creation/import of groups and organizations when harvesting from remote ckan instance 2016-05-20 16:38:48 +02:00
amercader 9dfeb154eb [#158] Tone down log message 2016-02-17 10:05:57 +00:00
David Read 84b0462979 No need to go back twice 2016-02-15 15:36:02 +00:00
David Read 794fc93230 Maintain compatibility with rest-style updates 2016-02-15 15:23:39 +00:00
David Read f22100e6c2 Merge remote-tracking branch 'origin' into 157-version-three-apify 2016-02-15 15:20:33 +00:00
David Read bf0d1fd779 Fix name error 2016-02-15 13:54:58 +00:00
David Read 4516bfe44e PEP8 and lint, extracted from PR158 2016-02-15 13:50:18 +00:00
David Read 385b369148 Error-free jobs now include ones where an object was not modified. 2016-02-15 13:16:23 +00:00
David Read f63140354d Fix logic error in previous commit 2016-02-15 12:28:46 +00:00
David Read 52c071dbe9 Improved error handling. e.g. if the site it harvests just returns errors. 2016-02-15 12:10:44 +00:00
David Read 331ad84272 Deal with worry about datasets on the remote CKAN being added/removed during harvest. 2016-02-12 18:00:00 +00:00
David Read 7096b7ddf2 Merge branch 'master' of github.com:ckan/ckanext-harvest into 157-version-three-apify 2016-02-12 16:51:26 +00:00
amercader 9d06820bcd Merge branch 'error_creation_moved_to_model' 2015-12-10 13:25:05 +00:00
amercader 04162ce9e4 Merge branch 'munge-tag' 2015-12-10 13:15:17 +00:00
David Read 07c76b0cbf Docs & pep8 2015-12-02 16:23:54 +00:00
David Read c7021933a0 Move creation of errors to the model as that's a more natural home. Provide backwards compatibility. 2015-12-02 08:15:13 +00:00
David Read 392c13d828 If no revisions then we get a 404, so deal with that better. 2015-11-23 21:36:45 +00:00
David Read 4405066fab Catch exceptions from urllib2.urlopen more comprehensively. I think 400 errors were from CKAN 0.6 or something like that - ignore now. 2015-11-23 21:26:32 +00:00
David Read 4b5014d381 Fix test for older ckan. 2015-11-23 18:27:04 +00:00
David Read 3b4daf0609 fix typo 2015-11-23 17:40:35 +00:00
David Read bc26159fb6 tag_munge from ckan 2.2 fails the test with dashes, so use the harvest one for this ckan version. 2015-11-23 17:31:20 +00:00
David Read 52f7e0dd07 Use the ckan version of munge_tag if available, but provide a fallback for older ckans. 2015-11-23 12:48:05 +00:00
Stefan Oderbolz 129b1a0cf5 Enable custom solution to detect existing packages
With this change, all harvesters that extend the base harvester have the
possibility to use the very useful create_or_update method, but still
define their own way of detecting what package is the existing one.

This is very useful for harvest sources that have no knowledge of the
CKAN internal id, but have another way of finding previous package.
2015-11-20 16:31:47 +01:00
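The extension point described in this commit can be sketched as an overridable lookup method. This is a toy model, not the real base harvester: the class names, the in-memory `store`, the hook name `find_existing_package` and the `identifier` field are all assumptions made for illustration.

```python
class BaseHarvester:
    """Toy base class; not the real ckanext-harvest base harvester."""

    def __init__(self, store):
        self.store = store  # maps package id -> package dict

    def find_existing_package(self, package_dict):
        # Default lookup: by the internal package id
        return self.store.get(package_dict.get("id"))

    def create_or_update(self, package_dict):
        existing = self.find_existing_package(package_dict)
        if existing is None:
            self.store[package_dict["id"]] = dict(package_dict)
            return "created"
        # Keep the stored id, update everything else
        existing.update({k: v for k, v in package_dict.items() if k != "id"})
        return "updated"


class IdentifierHarvester(BaseHarvester):
    """Matches packages on an external identifier instead of the id."""

    def find_existing_package(self, package_dict):
        for pkg in self.store.values():
            if pkg.get("identifier") == package_dict.get("identifier"):
                return pkg
        return None
```

A subclass only replaces the lookup; the create-or-update logic in the base class is reused as is.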
David Read ae7c500745 Merge branch 'master' into yhteentoimivuuspalvelut-job-reporting-fixes 2015-11-17 12:35:59 +00:00
Raphael Stolt 084723abb7 Catch JSONDecodeError when no JSON content 2015-11-16 10:59:18 +01:00
David Read c7fac36c1c [#107] "unchanged" response tested and related fixes
* fix "existing_package_dict" which wasn't containing metadata_modified (because of the schema in the context) so you never skipped an object.
* fix IntegrityError due to resource revision_id being harvested. No idea why this hasn't caused errors before now.
* "unchanged" is now checked in base instead of ckanharvester - makes sense. Looking at other harvesters, it's normal to return from the import_stage with the value returned from base._create_or_update_package so I've continued with that.
* "unchanged" response is now documented
* better report_status tests in test_queue2.
2015-11-03 00:22:53 +00:00
David Read e59760fefe Merge branch 'job-reporting-fixes' of https://github.com/yhteentoimivuuspalvelut/ckanext-harvest into yhteentoimivuuspalvelut-job-reporting-fixes 2015-11-02 21:25:32 +00:00
David Read 24415844e0 [#158] Fix revision_id problem in second harvest. 2015-11-02 18:13:29 +00:00
David Read b7552ba700 [#158] Try harder to use the "get datasets since time X" method of harvesting. Go back to the last completely successful harvest, rather than just consider the previous one. And that had a bug, because fetch errors were ignored, meaning one fetch error could mean that dataset never got harvested again. 2015-11-02 16:59:19 +00:00
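The idea of going back to the last completely successful harvest, rather than just the previous one, might look like the sketch below. The job dictionaries and their keys (`finished`, `gather_errors`, `fetch_errors`) are hypothetical and do not reflect the real harvest job model.

```python
def last_error_free_job(jobs):
    """Return the most recent job with no gather or fetch errors, or None.

    `jobs` is a list of dicts ordered oldest to newest; the dict keys are
    hypothetical and only serve this sketch.
    """
    for job in reversed(jobs):
        if job["gather_errors"] == 0 and job["fetch_errors"] == 0:
            return job
    return None
```

Counting fetch errors matters here: if a job with a failed fetch were treated as successful, the next harvest would start from that job's time and the dataset that failed might never be requested again.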
David Read 1a680f3fd3 [#158] Fix spaces encoding broken in previous merge. Tested with data.gov.uk. 2015-10-29 17:31:04 +00:00
David Read e2ab9e58e7 Merge remote-tracking branch 'origin/master' into 157-version-three-apify
Conflicts:
	ckanext/harvest/harvesters/ckanharvester.py
2015-10-28 14:34:27 +00:00
David Read 55245b5091 [#158] PEP8/formatting. 2015-10-27 17:43:11 +00:00
David Read 2a79873855 [#158] Use package search to get all datasets. Add paging search results. Store pkg_dict from search in the object rather than request it again in fetch_stage. 2015-10-27 17:33:22 +00:00
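Paging through search results, as this commit describes, can be sketched as a generator. The `search` callable is an assumption: it stands in for whatever issues the remote search request and is expected to return a list of dataset dicts for a given `start` offset and `rows` page size.

```python
def iter_all_datasets(search, rows=100):
    """Yield every dataset by requesting one page of results at a time.

    `search(start=..., rows=...)` is a hypothetical callable returning a
    list of dataset dicts; an empty list signals the end of the results.
    """
    start = 0
    while True:
        page = search(start=start, rows=rows)
        if not page:
            break
        for pkg in page:
            yield pkg
        start += len(page)
```

Because each yielded dict already holds the dataset metadata, a caller could store it alongside the harvest object instead of requesting the same dataset again later.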