Commit Graph

115 Commits

Author SHA1 Message Date
amercader 9dfeb154eb [#158] Tone down log message 2016-02-17 10:05:57 +00:00
David Read 84b0462979 No need to go back twice 2016-02-15 15:36:02 +00:00
David Read 794fc93230 Maintain compatibility with rest-style updates 2016-02-15 15:23:39 +00:00
David Read f22100e6c2 Merge remote-tracking branch 'origin' into 157-version-three-apify 2016-02-15 15:20:33 +00:00
David Read bf0d1fd779 Fix name error 2016-02-15 13:54:58 +00:00
David Read 4516bfe44e PEP8 and lint, extracted from PR158 2016-02-15 13:50:18 +00:00
David Read 385b369148 Error-free jobs now include ones where an object was not modified. 2016-02-15 13:16:23 +00:00
David Read f63140354d Fix logic error in previous commit 2016-02-15 12:28:46 +00:00
David Read 52c071dbe9 Improved error handling. e.g. if the site it harvests just returns errors. 2016-02-15 12:10:44 +00:00
David Read 331ad84272 Deal with worry about datasets on the remote CKAN being added/removed during harvest. 2016-02-12 18:00:00 +00:00
David Read 7096b7ddf2 Merge branch 'master' of github.com:ckan/ckanext-harvest into 157-version-three-apify 2016-02-12 16:51:26 +00:00
amercader 9d06820bcd Merge branch 'error_creation_moved_to_model' 2015-12-10 13:25:05 +00:00
amercader 04162ce9e4 Merge branch 'munge-tag' 2015-12-10 13:15:17 +00:00
David Read 07c76b0cbf Docs & pep8 2015-12-02 16:23:54 +00:00
David Read c7021933a0 Move creation of errors to the model as that's a more natural home. Provide backwards compatibility. 2015-12-02 08:15:13 +00:00
David Read 392c13d828 If there are no revisions then we get a 404, so deal with that better. 2015-11-23 21:36:45 +00:00
David Read 4405066fab Catch exceptions from urllib2.urlopen more comprehensively. I think 400 errors were from CKAN 0.6 or something like that - ignore now. 2015-11-23 21:26:32 +00:00
David Read 4b5014d381 Fix test for older ckan. 2015-11-23 18:27:04 +00:00
David Read 3b4daf0609 fix typo 2015-11-23 17:40:35 +00:00
David Read bc26159fb6 tag_munge from ckan 2.2 fails the test with dashes, so use the harvest one for this ckan version. 2015-11-23 17:31:20 +00:00
David Read 52f7e0dd07 Use the ckan version of munge_tag if available, but provide a fallback for older ckans. 2015-11-23 12:48:05 +00:00
Stefan Oderbolz 129b1a0cf5 Enable custom solution to detect existing packages
With this change, all harvesters that extend the base harvester have the
possibility to use the very useful create_or_update method, but still
define their own way of detecting what package is the existing one.

This is very useful for harvest sources that have no knowledge of the
CKAN internal id, but have another way of finding previous package.
2015-11-20 16:31:47 +01:00
David Read ae7c500745 Merge branch 'master' into yhteentoimivuuspalvelut-job-reporting-fixes 2015-11-17 12:35:59 +00:00
Raphael Stolt 084723abb7 Catch JSONDecodeError when no JSON content 2015-11-16 10:59:18 +01:00
David Read c7fac36c1c [#107] "unchanged" response tested and related fixes
* fix "existing_package_dict" which wasn't containing metadata_modified (because of the schema in the context) so you never skipped an object.
* fix IntegrityError due to resource revision_id being harvested. No idea why this hasn't caused errors before now.
* "unchanged" is now checked in base instead of ckanharvester - makes sense. Looking at other harvesters, it's normal to return from the import_stage with the value returned from base._create_or_update_package so I've continued with that.
* "unchanged" response is now documented
* better report_status tests in test_queue2.
2015-11-03 00:22:53 +00:00
David Read e59760fefe Merge branch 'job-reporting-fixes' of https://github.com/yhteentoimivuuspalvelut/ckanext-harvest into yhteentoimivuuspalvelut-job-reporting-fixes 2015-11-02 21:25:32 +00:00
David Read 24415844e0 [#158] Fix revision_id problem in second harvest. 2015-11-02 18:13:29 +00:00
David Read b7552ba700 [#158] Try harder to use the "get datasets since time X" method of harvesting. Go back to the last completely successful harvest, rather than just consider the previous one. And that had a bug, because fetch errors were ignored, meaning one fetch error could mean that dataset never got harvested again. 2015-11-02 16:59:19 +00:00
David Read 1a680f3fd3 [#158] Fix spaces encoding broken in previous merge. Tested with data.gov.uk. 2015-10-29 17:31:04 +00:00
David Read e2ab9e58e7 Merge remote-tracking branch 'origin/master' into 157-version-three-apify
Conflicts:
	ckanext/harvest/harvesters/ckanharvester.py
2015-10-28 14:34:27 +00:00
David Read 55245b5091 [#158] PEP8/formatting. 2015-10-27 17:43:11 +00:00
David Read 2a79873855 [#158] Use package search to get all datasets. Add paging search results. Store pkg_dict from search in the object rather than request it again in fetch_stage. 2015-10-27 17:33:22 +00:00
David Read b56fae8aed Fixes and tests
* Fix extras as a list of dicts
* Fix SOLR dates syntax - needed a Z
* Basic tests for this updated ckan harvester
* Now require CKAN 2.0 to be able to save these packages in package_show form. Take advantage of this now that we are sure various imports are definitely available, such as munge_tag.
* Add back compatibility for other harvesters supplying restful-like package_dicts to _create_or_update_package

TODO add back in the ability to harvest pre 2.0 CKANs with the RESTful calls (fallback or maybe configurable)
2015-10-23 17:30:28 +00:00
David Read caeeace8dc Merge branch 'master' into 157-version-three-apify 2015-10-23 14:39:48 +01:00
David Read bc49149d5e Merge branch 'master' into include-exclude-org 2015-10-23 14:36:53 +01:00
David Read eb9aa17862 Include/exclude orgs functionality based on work by memaldi and ross. 2015-10-21 16:33:16 +00:00
David Read be3e88086a Generating unique names improved
* Harvesters that change the name when the title changes have had a
  problem when the change is small and a number was unnecessarily
  appended. e.g. "Trees "->"Trees" meant _gen_new_name("Trees") returned
  "trees1". Now you can specify the existing value and it will return
  that if it still holds.
* Maximum dataset name length is now adhered to.
* To make a name unique, a sequential number is now added, since for
  users that is more understandable and pleasant. However hex digits are
  still an option, for those that want to harvest concurrently.
2015-10-01 17:53:03 +01:00
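The naming rules described in be3e88086a can be sketched roughly as follows. This is an illustration only, not the code from the commit: `munge_title`, `unique_name`, the `taken` set and the 100-character limit are all hypothetical names and values chosen for the example.

```python
import itertools
import re

MAX_NAME_LENGTH = 100  # assumed limit, for illustration only


def munge_title(title):
    """Turn a title into a lowercase, URL-safe candidate name (hypothetical helper)."""
    return re.sub(r"[^a-z0-9_-]+", "-", title.strip().lower()).strip("-")


def unique_name(title, taken, existing_name=None, max_length=MAX_NAME_LENGTH):
    """Return a name for `title` that does not clash with `taken`.

    If the dataset's current name (`existing_name`) is still a valid
    candidate, it is returned unchanged, so a trivial title edit such as
    "Trees " -> "Trees" does not produce "trees1".
    """
    ideal = munge_title(title)[:max_length]

    def candidates():
        yield ideal
        # Sequential suffixes; trim the base so the result fits max_length.
        for n in itertools.count(1):
            suffix = str(n)
            yield ideal[:max_length - len(suffix)] + suffix

    for candidate in candidates():
        if candidate == existing_name or candidate not in taken:
            return candidate
```

Passing the dataset's own current name as `existing_name` is what keeps `"trees"` stable across a whitespace-only title change, while a different dataset with the same title would move on to `"trees1"`.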
Ross Jones 6dd40bfcf9 Changes the gather stage to use v3 API
Rather than using the revisions in v2 API this now uses the
package_search API so that we can extend it with proper filters in
future.
2015-09-10 18:53:16 +01:00
amercader 673dfc9882 [#127] Use site user on the CKAN harvester
Add missing call
2015-06-11 10:38:33 +01:00
amercader d3a3f09ad1 [#127] Use site user on the CKAN harvester
To avoid having to create a 'harvest' sysadmin explicitly. It will still
be used if present, but if not the site user will be used. You can also
define the user to use via a config option.
2015-06-11 10:19:07 +01:00
Jari Voutilainen 859133fe36 move detecting unchanged datasets to ckanharvester and queue.py 2015-03-10 14:48:41 +02:00
David Read b3ed6cae5a Merge pull request #121 from metaodi/120-create-remote-orgs
Fetch remote organization via action api
2015-01-15 10:49:09 +00:00
Stefan Oderbolz c1bcee9684 Use str() to get the error message 2015-01-15 11:36:15 +01:00
Stefan Oderbolz 191c39ce5c Catch the more general URLError instead of HTTPError
HTTPError is a subclass of URLError, so catching URLError is enough. I
think the HTTP error code is not as important in this situation, so
catching the more generic error seems like the best solution.
2015-01-15 10:57:24 +01:00
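The subclass relationship that 191c39ce5c relies on can be checked with a short sketch. The commit concerned Python 2's `urllib2`; the example below assumes Python 3's `urllib.error`, where the same hierarchy holds. `describe_fetch_failure` and the example URL are hypothetical and not part of the harvester.

```python
from urllib.error import HTTPError, URLError


def describe_fetch_failure(exc):
    """Handle any urllib fetch error through the general URLError handler."""
    try:
        raise exc
    except URLError as e:
        # HTTPError instances land here too, because HTTPError subclasses URLError.
        return "fetch failed: %s" % str(e)


# An HTTP-level failure and a connection-level failure take the same path.
http_failure = HTTPError("http://example.com", 404, "Not Found", None, None)
network_failure = URLError("connection refused")

print(issubclass(HTTPError, URLError))
print(describe_fetch_failure(http_failure))
print(describe_fetch_failure(network_failure))
```

The trade-off is the one the commit message names: a single `except URLError` clause is simpler, at the cost of no longer branching on the HTTP status code.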
Stefan Oderbolz b978c26e70 Use ContentFetchError instead of generic Exception 2015-01-15 00:49:11 +01:00
Stefan Oderbolz 935b9dda01 Munge group name before fetching remote group
The API call /api/2/rest/package/<id> returns the display name of the
group instead of its ID. To properly match the group, munge the name
before calling /api/2/rest/group
2015-01-15 00:44:53 +01:00
Stefan Oderbolz ef35c21e2a Improve exception handling with custom exception
1. Try whenever possible to catch specific exceptions
2. Raise custom exception where appropriate
3. Fix the exception handling in _get_group and _get_organization
2015-01-15 00:44:45 +01:00
Stefan Oderbolz 0fd38e0e54 Use _get_group as a fallback for remote orgs
First try to get a remote org from the remote Action API, if this fails
try to use the old rest api call, which works on older CKAN versions.

Only if both options fail is it currently not possible to get the remote
organization.
2015-01-14 00:10:27 +01:00
Stefan Oderbolz f214577872 Fetch remote organization via action api
Organizations used to be returned by /api/2/rest/group, this is what the
old implementation used to fetch the information to create the remote
organization on the local instance of CKAN.

With this commit the Action API is used to fetch the same information.
2015-01-13 14:46:53 +01:00
Stefan Oderbolz ea9debf714 Fix logic of conditional and make it more pythonic 2014-12-18 16:03:33 +01:00