harvester-d4science

Commit Graph

Author	SHA1	Message	Date
etj	41aa9c121f	303: fix clean_tags with tags dict	2017-11-07 14:14:50 +01:00
etj	207ac81d70	[289] Fix default_extras	2017-05-29 20:11:49 +02:00
David Read	cc438786de	More explicit checks of the exception thrown when checking harvest config. Also the default_groups test was checking the wrong thing completely.	2017-05-02 20:39:49 +00:00
Adrià Mercader	e3b4854b07	Merge pull request #270 from GovDataOfficial/improve-resolving-local-groups Improve resolving local groups	2017-03-01 12:29:29 +00:00
Mark Gregson	dc06f92ec7	Return an empty list when no CKAN datasets are gathered	2017-02-21 12:22:02 +11:00
seitenbau-govdata	0f951d9fc0	Improve resolving local groups Improve resolving local groups by searching for group additionally by name.	2016-11-15 22:38:27 +01:00
David Read	78933fb775	[#253 ] Fix default_groups by saving the dicts to the config object, since saving it to the harvester object doesn't work in the real world. This is a lot more efficient than doing group_show for every dataset imported.	2016-06-27 12:01:35 +01:00
Jardel Weyrich	e8f539a45e	Don't let the user specify mutually exclusive configuration options: - organizations_filter_include - organizations_filter_exclude	2016-06-14 11:35:38 -03:00
David Read	f1742fb51a	Fix default_groups. It accepted a list of package_name/ids and was trying to add this to the package, but the package needs a dict. Added test.	2016-06-10 09:16:32 +00:00
David Read	bfc9b8e0d9	[#249 ] Test and fix docs for default_tags. Needed to improve error handling when saving ValidationError in a HOE.	2016-06-09 22:11:03 +00:00
amercader	5e1512f717	Don't reuse contexts on ckan harvester Reusing the same context on all calls can lead to hard to debug failures like Action function organization_show did not call its auth function In this case that was caused because the first organization/group_show raised a NotFound so the auth audit was still in the context. When organization/group_show was called again at the end of organization/group_create the auth audit exception was raised. This commit makes sure that each call has its own context.	2016-05-23 12:20:08 +01:00
Petar Efnushev	c16ecea7f0	reverted change in default groups validation	2016-05-20 20:15:54 +02:00
Petar Efnushev	c154365371	Fixed creation/import of groups and organizations when harvesting from remote ckan instance	2016-05-20 16:38:48 +02:00
amercader	9dfeb154eb	[#158 ] Tone down log message	2016-02-17 10:05:57 +00:00
David Read	84b0462979	No need to go back twice	2016-02-15 15:36:02 +00:00
David Read	794fc93230	Maintain compatibility with rest-style updates	2016-02-15 15:23:39 +00:00
David Read	f22100e6c2	Merge remote-tracking branch 'origin' into 157-version-three-apify	2016-02-15 15:20:33 +00:00
David Read	bf0d1fd779	Fix name error	2016-02-15 13:54:58 +00:00
David Read	4516bfe44e	PEP8 and lint, extracted from PR158	2016-02-15 13:50:18 +00:00
David Read	385b369148	Error-free jobs now include ones where an object was not modified.	2016-02-15 13:16:23 +00:00
David Read	f63140354d	Fix logic error in previous commit	2016-02-15 12:28:46 +00:00
David Read	52c071dbe9	Improved error handling. e.g. if the site it harvests just returns errors.	2016-02-15 12:10:44 +00:00
David Read	331ad84272	Deal with worry about datasets on the remote CKAN being added/removed during harvest.	2016-02-12 18:00:00 +00:00
David Read	7096b7ddf2	Merge branch 'master' of github.com:ckan/ckanext-harvest into 157-version-three-apify	2016-02-12 16:51:26 +00:00
amercader	9d06820bcd	Merge branch 'error_creation_moved_to_model'	2015-12-10 13:25:05 +00:00
amercader	04162ce9e4	Merge branch 'munge-tag'	2015-12-10 13:15:17 +00:00
David Read	07c76b0cbf	Docs & pep8	2015-12-02 16:23:54 +00:00
David Read	c7021933a0	Move creation of errors to the model as that's a more natural home. Provide backwards compatibility.	2015-12-02 08:15:13 +00:00
David Read	392c13d828	If no revisions then we get a 404, so deal with that better.	2015-11-23 21:36:45 +00:00
David Read	4405066fab	Catch exceptions from urllib2.urlopen more comprehensively. I think 400 errors were from CKAN 0.6 or something like that - ignore now.	2015-11-23 21:26:32 +00:00
David Read	4b5014d381	Fix test for older ckan.	2015-11-23 18:27:04 +00:00
David Read	3b4daf0609	fix typo	2015-11-23 17:40:35 +00:00
David Read	bc26159fb6	tag_munge from ckan 2.2 fails the test with dashes, so use the harvest one for this ckan version.	2015-11-23 17:31:20 +00:00
David Read	52f7e0dd07	Use the ckan version of munge_tag if available, but provide a fallback for older ckans.	2015-11-23 12:48:05 +00:00
Stefan Oderbolz	129b1a0cf5	Enable custom solution to detect existing packages With this change, all harvesters that extend the base harvester have the possibility to use the very useful create_or_update method, but still define their own way of detecting what package is the existing one. This is very useful for harvest sources that have no knowledge of the CKAN internal id, but have another way of finding previous package.	2015-11-20 16:31:47 +01:00
David Read	ae7c500745	Merge branch 'master' into yhteentoimivuuspalvelut-job-reporting-fixes	2015-11-17 12:35:59 +00:00
Raphael Stolt	084723abb7	Catch JSONDecodeError when no JSON content	2015-11-16 10:59:18 +01:00
David Read	c7fac36c1c	[#107 ] "unchanged" response tested and related fixes * fix "existing_package_dict" which wasn't containing metadata_modified (because of the schema in the context) so you never skipped an object. * fix IntegrityError due to resource revision_id being harvested. No idea why this hasn't caused errors before now. * "unchanged" is now checked in base instead of ckanharvester - makes sense. Looking at other harvesters, it's normal to return from the import_stage with the value returned from base._create_or_update_package so I've continued with that. * "unchanged" response is now documented * better report_status tests in test_queue2.	2015-11-03 00:22:53 +00:00
David Read	e59760fefe	Merge branch 'job-reporting-fixes' of https://github.com/yhteentoimivuuspalvelut/ckanext-harvest into yhteentoimivuuspalvelut-job-reporting-fixes	2015-11-02 21:25:32 +00:00
David Read	24415844e0	[#158 ] Fix revision_id problem in second harvest.	2015-11-02 18:13:29 +00:00
David Read	b7552ba700	[#158 ] Try harder to use the "get datasets since time X" method of harvesting. Go back to the last completely successful harvest, rather than just consider the previous one. And that had a bug, because fetch errors were ignored, meaning one fetch error could mean that dataset never got harvested again.	2015-11-02 16:59:19 +00:00
David Read	1a680f3fd3	[#158 ] Fix spaces encoding broken in previous merge. Tested with data.gov.uk.	2015-10-29 17:31:04 +00:00
David Read	e2ab9e58e7	Merge remote-tracking branch 'origin/master' into 157-version-three-apify Conflicts: ckanext/harvest/harvesters/ckanharvester.py	2015-10-28 14:34:27 +00:00
David Read	55245b5091	[#158 ] PEP8/formatting.	2015-10-27 17:43:11 +00:00
David Read	2a79873855	[#158 ] Use package search to get all datasets. Add paging search results. Store pkg_dict from search in the object rather than request it again in fetch_stage.	2015-10-27 17:33:22 +00:00
David Read	b56fae8aed	Fixes and tests * Fix extras as a list of dicts * Fix SOLR dates syntax - needed a Z * Basic tests for this updated ckan harvester * Now require CKAN 2.0 to be able to save these packages in package_show form. Take advantage of this now we are sure various imports are definitely available, such as munge_tag. * Add back compatibility for other harvesters supplying restful-like package_dicts to _create_or_update_package TODO add back in the ability to harvest pre 2.0 CKANs with the RESTful calls (fallback or maybe configurable)	2015-10-23 17:30:28 +00:00
David Read	caeeace8dc	Merge branch 'master' into 157-version-three-apify	2015-10-23 14:39:48 +01:00
David Read	bc49149d5e	Merge branch 'master' into include-exclude-org	2015-10-23 14:36:53 +01:00
David Read	eb9aa17862	Include/exclude orgs functionality based on work by memaldi and ross.	2015-10-21 16:33:16 +00:00
David Read	be3e88086a	Generating unique names improved * Harvesters that change the name when the title changes have had a problem when the change is small and a number was unnecessarily appended. e.g. "Trees "->"Trees" meant _gen_new_name("Trees") returned "trees1". Now you can specify the existing value and it will return that if it still holds. * Maximum dataset name length is now adhered to. * To make a name unique, a sequential number is now added, since for users that is more understandable and pleasant. However hex digits are still an option, for those that want to harvest concurrently.	2015-10-01 17:53:03 +01:00


128 Commits