seitenbau-govdata
938d0322f5
Add config option for dataset name append type
2018-06-01 21:43:37 +02:00
Knud Möller
717fdb35dd
move _last_error_free_job from CKANHarvester to HarvesterBase
2017-11-10 12:19:25 +01:00
etj
d1dd4eb227
303: fix clean_tags with tags dict (fixes requested by review)
2017-11-08 13:46:44 +01:00
etj
41aa9c121f
303: fix clean_tags with tags dict
2017-11-07 14:14:50 +01:00
etj
207ac81d70
[289] Fix default_extras
2017-05-29 20:11:49 +02:00
David Read
cc438786de
More explicit checks of the exception thrown when checking harvest config. Also the default_groups test was checking the wrong thing completely.
2017-05-02 20:39:49 +00:00
Adrià Mercader
e3b4854b07
Merge pull request #270 from GovDataOfficial/improve-resolving-local-groups
Improve resolving local groups
2017-03-01 12:29:29 +00:00
Mark Gregson
dc06f92ec7
Return an empty list when no CKAN datasets are gathered
2017-02-21 12:22:02 +11:00
seitenbau-govdata
0f951d9fc0
Improve resolving local groups
Improve resolving local groups by searching for group additionally by name.
2016-11-15 22:38:27 +01:00
David Read
78933fb775
[ #253 ] Fix default_groups by saving the dicts to the config object, since saving them to the harvester object doesn't work in the real world. This is a lot more efficient than doing group_show for every dataset imported.
2016-06-27 12:01:35 +01:00
Jardel Weyrich
e8f539a45e
Don't let the user specify mutually exclusive configuration options:
- organizations_filter_include
- organizations_filter_exclude
2016-06-14 11:35:38 -03:00
David Read
f1742fb51a
Fix default_groups. It accepted a list of package_name/ids and was trying to add this to the package, but the package needs a dict. Added test.
2016-06-10 09:16:32 +00:00
David Read
bfc9b8e0d9
[ #249 ] Test and fix docs for default_tags. Needed to improve error handling when saving ValidationError in a HOE.
2016-06-09 22:11:03 +00:00
amercader
5e1512f717
Don't reuse contexts on ckan harvester
Reusing the same context on all calls can lead to hard-to-debug failures
like:
"Action function organization_show did not call its auth function"
In this case that was caused because the first organization/group_show
raised a NotFound so the auth audit was still in the context. When
organization/group_show was called again at the end of
organization/group_create the auth audit exception was raised.
This commit makes sure that each call has its own context.
2016-05-23 12:20:08 +01:00
Petar Efnushev
c16ecea7f0
reverted change in default groups validation
2016-05-20 20:15:54 +02:00
Petar Efnushev
c154365371
Fixed creation/import of groups and organizations when harvesting from remote ckan instance
2016-05-20 16:38:48 +02:00
amercader
9dfeb154eb
[ #158 ] Tone down log message
2016-02-17 10:05:57 +00:00
David Read
84b0462979
No need to go back twice
2016-02-15 15:36:02 +00:00
David Read
794fc93230
Maintain compatibility with rest-style updates
2016-02-15 15:23:39 +00:00
David Read
f22100e6c2
Merge remote-tracking branch 'origin' into 157-version-three-apify
2016-02-15 15:20:33 +00:00
David Read
bf0d1fd779
Fix name error
2016-02-15 13:54:58 +00:00
David Read
4516bfe44e
PEP8 and lint, extracted from PR158
2016-02-15 13:50:18 +00:00
David Read
385b369148
Error-free jobs now include ones where an object was not modified.
2016-02-15 13:16:23 +00:00
David Read
f63140354d
Fix logic error in previous commit
2016-02-15 12:28:46 +00:00
David Read
52c071dbe9
Improved error handling, e.g. if the site it harvests just returns errors.
2016-02-15 12:10:44 +00:00
David Read
331ad84272
Deal with worry about datasets on the remote CKAN being added/removed during harvest.
2016-02-12 18:00:00 +00:00
David Read
7096b7ddf2
Merge branch 'master' of github.com:ckan/ckanext-harvest into 157-version-three-apify
2016-02-12 16:51:26 +00:00
amercader
9d06820bcd
Merge branch 'error_creation_moved_to_model'
2015-12-10 13:25:05 +00:00
amercader
04162ce9e4
Merge branch 'munge-tag'
2015-12-10 13:15:17 +00:00
David Read
07c76b0cbf
Docs & pep8
2015-12-02 16:23:54 +00:00
David Read
c7021933a0
Move creation of errors to the model as that's a more natural home. Provide backwards compatibility.
2015-12-02 08:15:13 +00:00
David Read
392c13d828
If there are no revisions then we get a 404, so deal with that better.
2015-11-23 21:36:45 +00:00
David Read
4405066fab
Catch exceptions from urllib2.urlopen more comprehensively. I think 400 errors were from CKAN 0.6 or something like that - ignore now.
2015-11-23 21:26:32 +00:00
David Read
4b5014d381
Fix test for older ckan.
2015-11-23 18:27:04 +00:00
David Read
3b4daf0609
fix typo
2015-11-23 17:40:35 +00:00
David Read
bc26159fb6
tag_munge from ckan 2.2 fails the test with dashes, so use the harvest one for this ckan version.
2015-11-23 17:31:20 +00:00
David Read
52f7e0dd07
Use the ckan version of munge_tag if available, but provide a fallback for older ckans.
2015-11-23 12:48:05 +00:00
Stefan Oderbolz
129b1a0cf5
Enable custom solution to detect existing packages
With this change, all harvesters that extend the base harvester have the
possibility to use the very useful create_or_update method, but still
define their own way of detecting what package is the existing one.
This is very useful for harvest sources that have no knowledge of the
CKAN internal id, but have another way of finding the previous package.
2015-11-20 16:31:47 +01:00
David Read
ae7c500745
Merge branch 'master' into yhteentoimivuuspalvelut-job-reporting-fixes
2015-11-17 12:35:59 +00:00
Raphael Stolt
084723abb7
Catch JSONDecodeError when no JSON content
2015-11-16 10:59:18 +01:00
David Read
c7fac36c1c
[ #107 ] "unchanged" response tested and related fixes
* fix "existing_package_dict" which wasn't containing metadata_modified (because of the schema in the context) so you never skipped an object.
* fix IntegrityError due to resource revision_id being harvested. No idea why this hasn't caused errors before now.
* "unchanged" is now checked in base instead of ckanharvester - makes sense. Looking at other harvesters, it's normal to return from the import_stage with the value returned from base._create_or_update_package so I've continued with that.
* "unchanged" response is now documented
* better report_status tests in test_queue2.
2015-11-03 00:22:53 +00:00
David Read
e59760fefe
Merge branch 'job-reporting-fixes' of https://github.com/yhteentoimivuuspalvelut/ckanext-harvest into yhteentoimivuuspalvelut-job-reporting-fixes
2015-11-02 21:25:32 +00:00
David Read
24415844e0
[ #158 ] Fix revision_id problem in second harvest.
2015-11-02 18:13:29 +00:00
David Read
b7552ba700
[ #158 ] Try harder to use the "get datasets since time X" method of harvesting. Go back to the last completely successful harvest, rather than just consider the previous one. And that had a bug, because fetch errors were ignored, meaning one fetch error could mean that dataset never got harvested again.
2015-11-02 16:59:19 +00:00
David Read
1a680f3fd3
[ #158 ] Fix spaces encoding broken in previous merge. Tested with data.gov.uk.
2015-10-29 17:31:04 +00:00
David Read
e2ab9e58e7
Merge remote-tracking branch 'origin/master' into 157-version-three-apify
Conflicts:
ckanext/harvest/harvesters/ckanharvester.py
2015-10-28 14:34:27 +00:00
David Read
55245b5091
[ #158 ] PEP8/formatting.
2015-10-27 17:43:11 +00:00
David Read
2a79873855
[ #158 ] Use package search to get all datasets. Add paging search results. Store pkg_dict from search in the object rather than request it again in fetch_stage.
2015-10-27 17:33:22 +00:00
David Read
b56fae8aed
Fixes and tests
* Fix extras as a list of dicts
* Fix SOLR dates syntax - needed a Z
* Basic tests for this updated ckan harvester
* Now require CKAN 2.0 to be able to save these packages in package_show form. Take advantage of this now that we are sure various imports are definitely available, such as munge_tag.
* Add back compatibility for other harvesters supplying restful-like package_dicts to _create_or_update_package
TODO add back in the ability to harvest pre 2.0 CKANs with the RESTful calls (fallback or maybe configurable)
2015-10-23 17:30:28 +00:00
David Read
caeeace8dc
Merge branch 'master' into 157-version-three-apify
2015-10-23 14:39:48 +01:00