Commit Graph

52 Commits

Author SHA1 Message Date
etj d1dd4eb227 303: fix clean_tags with tags dict (fixes requested by review) 2017-11-08 13:46:44 +01:00
etj 41aa9c121f 303: fix clean_tags with tags dict 2017-11-07 14:14:50 +01:00
David Read cc438786de More explicit checks of the exception thrown when checking harvest config. Also the default_groups test was checking the wrong thing completely. 2017-05-02 20:39:49 +00:00
David Read 794fc93230 Maintain compatibility with rest-style updates 2016-02-15 15:23:39 +00:00
David Read 385b369148 Error-free jobs now include ones where an object was not modified. 2016-02-15 13:16:23 +00:00
David Read 7096b7ddf2 Merge branch 'master' of github.com:ckan/ckanext-harvest into 157-version-three-apify 2016-02-12 16:51:26 +00:00
amercader 9d06820bcd Merge branch 'error_creation_moved_to_model' 2015-12-10 13:25:05 +00:00
David Read 07c76b0cbf Docs & pep8 2015-12-02 16:23:54 +00:00
David Read c7021933a0 Move creation of errors to the model as thats a more natural home. Provide backwards compatibility. 2015-12-02 08:15:13 +00:00
David Read 4b5014d381 Fix test for older ckan. 2015-11-23 18:27:04 +00:00
David Read 3b4daf0609 fix typo 2015-11-23 17:40:35 +00:00
David Read bc26159fb6 tag_munge from ckan 2.2 fails the test with dashes, so use the harvest one for this ckan version. 2015-11-23 17:31:20 +00:00
David Read 52f7e0dd07 Use the ckan version of munge_tag if available, but provide a fallback for older ckans. 2015-11-23 12:48:05 +00:00
Stefan Oderbolz 129b1a0cf5 Enable custom solution to detect existing packages
With this change, all harvesters that extend the base harvester have the
possibility to use the very useful create_or_update method, but still
define their own way of detecting what package is the existing one.

This is very useful for harvest sources that have no knowledge of the
CKAN internal id, but have another way of finding previous package.
2015-11-20 16:31:47 +01:00
David Read c7fac36c1c [#107] "unchanged" response tested and related fixes
* fix "existing_package_dict" which wasn't containing metadata_modified (because of the schema in the context) so you never skipped an object.
* fix IntegrityError due to resource revision_id being harvested. No idea why this hasn't caused errors before now.
* "unchanged" is now checked in base instead of ckanharvester - makes sense. Looking at other harvesters, it's normal to return from the import_stage with the value returned from base._create_or_update_package so I've continued with that.
* "unchanged" response is now documented
* better report_status tests in test_queue2.
2015-11-03 00:22:53 +00:00
David Read 24415844e0 [#158] Fix revision_id problem in second harvest. 2015-11-02 18:13:29 +00:00
David Read b56fae8aed Fixes and tests
* Fix extras as a list of dicts
* Fix SOLR dates syntax - needed a Z
* Basic tests for this updated ckan harvester
* Now require CKAN 2.0 to be able to be able to save these packages in package_show form. Take advantage of this now we are such various imports from are definitely available, such as munge_tag.
* Add back compatibility for other harvesters supplying restful-like package_dicts to _create_or_update_package

TODO add back in the ability to harvest pre 2.0 CKANs with the RESTful calls (fallback or maybe configurable)
2015-10-23 17:30:28 +00:00
David Read caeeace8dc Merge branch 'master' into 157-version-three-apify 2015-10-23 14:39:48 +01:00
David Read be3e88086a Generating unique names improved
* Harvesters that change the name when the title changes have had a
  problem when the change is small and a number was unnecessarily
  appended. e.g. "Trees "->"Trees" meant _gen_new_name("Trees") returned
  "trees1". Now you can specify the existing value and it will return
  that if it still holds.
* Maximum dataset name length is now adhered to.
* To make a name unique, a sequential number is now added, since for
  users that is more understandable and pleasant. However hex digits are
  still an option, for those that want to harvest concurrently.
2015-10-01 17:53:03 +01:00
Ross Jones 6dd40bfcf9 Changes the gather state to use v3 API
Rather than using the revisions in v2 API this now uses the
package_search API so that we can extend it with proper filters in
future.
2015-09-10 18:53:16 +01:00
amercader d3a3f09ad1 [#127] Use site user on the CKAN harvester
To avoid having to create a 'harvest' sysadmin explicitly. It will still
be used if present, but if not the site user will be used. You can also
define to user to use via a config option.
2015-06-11 10:19:07 +01:00
Stefan Oderbolz ea9debf714 Fix logic of conditional and make it more pythonic 2014-12-18 16:03:33 +01:00
Stefan Oderbolz 08930d01bf Make sure for new packages get a unique 'name' 2014-12-16 15:02:36 +01:00
amercader 8cf254f112 Merge branch '99-all-non-ascii-tags' of https://github.com/morty/ckanext-harvest into morty-99-all-non-ascii-tags 2014-08-29 14:40:43 +01:00
Tom Mortimer-Jones 8a2c072d4e [101] Use name from database when reharvesting package 2014-08-12 11:18:48 +01:00
Tom Mortimer-Jones 65cfade420 [99] Remove empty tags produced by munging all non-ascii tags
I thought this way of filtering was easier to read than filter(None, tags)
2014-08-07 17:05:16 +01:00
amercader 2a07a144fc [#84] Fix auth audit exception when creating datasets
This was caused by a combination of the auth audit leaking and the
harvester reusing the context for the package_show and package_create
actions. If the package is not found, package_show does not call
check_access, and the auth audit does not pass. This is stored in the
context (`__auth_audit`) and is raised next time that we call
get_action (when we call package_create with the same context)

It could potentially be fixed on master, but it is probably quite rare.
2014-02-10 18:22:48 +00:00
Rachel Knowler bf11e4d330 Moved clean_tags check into _create_or_update_package method. 2014-02-10 09:29:01 +01:00
Rachel Knowler 2ba9908653 Config option to munge tags changed to be consistent with other config options in this extension, and noted in README. 2014-01-29 10:55:51 +01:00
Rachel Knowler 5e1aef1d08 Removed extra newline. 2014-01-29 10:06:32 +01:00
Rachel Knowler 7d71b0a00b Wrap tag munging code in config option, defaulting to False. 2014-01-29 10:02:16 +01:00
amercader 0f5624822c Use remote name if present when creating datasets on CKAN harvester 2013-10-11 16:50:25 +01:00
Vitor Baptista f028375ad3 [#62] Use current name when updating package, if the user haven't sent a new one
It's hard for someone outside CKAN to make sure they're sending it in the format
we expect. And they'll also have to keep track of our name format, to keep in
sync whenever we change.

To fix this, we simply do what we already do when creating packages: use a
default name. In this case, the current one.
2013-08-18 12:08:30 -03:00
amercader 39ad78d90a [#59] Ignore auth in the CKAN harvester 2013-08-15 14:37:12 +01:00
Konrad Reiche c858b9fe9f Add exception handling for the API version parsing
I have added try-except clauses in order to prevent the process from
crashing if a non-parsable integer is used for the api_version option.

Signed-off-by: Konrad Reiche <konrad.reiche@fokus.fraunhofer.de>
2013-05-27 13:12:05 +02:00
Konrad Reiche 05094090af Change type of the API version to integer
The CKAN logic uses integers when dealing with the API version, e.g.
making checks which API version is in use. Currently, the harvester
uses strings to identify the API version. Instead of dealing with
type conversion the harvester could use integers directly.

This commit fixes okfn/ckanext-harvest#36. When the API version is
parsed from the configuration it is passed through the int() function.
This way the harvesting will still work even if a harvest source was
configured with a string API version which makes this commit backward
compatible.

Signed-off-by: Konrad Reiche <konrad.reiche@fokus.fraunhofer.de>
2013-05-27 12:51:48 +02:00
kindly c754479014 #29 make new idatasets form work with harvest source form 2013-03-25 17:38:07 +00:00
joetsoi 7257258ca4 mark new harvest objects as current
When a new harvest_object for a new package was being created, it
was immediately being marked as false, as all objects were marked
as false, including the new object just created and newly marked
as current=true.

Fix so that old HarvestObjects are only marked as current=False
when updating an existing package.
2013-03-07 20:27:27 +00:00
joetsoi ba486a9482 add indexing of datasets whilst harvesting 2013-02-27 11:34:09 +00:00
amercader 177349fd76 Update HarvesterBase
This is a convenience class that other harvesters can extend. Updates
include a cleanup of old functions and porting of enhancements from the
spatial harvesters.
2013-02-12 16:10:13 +00:00
amercader f210455aef [ckan harvester] Replace title on default extras 2012-03-13 12:38:14 +00:00
amercader e0bef2ef9c [base] Minor fix for harvesters without config 2012-03-12 14:46:28 +00:00
amercader 479750da09 [#1726][base harvester] Set current field when importing 2012-02-02 13:18:43 +00:00
amercader ae51093213 [ckan harvester] Ignore __junk field, was causing imports to fail 2012-01-10 14:46:12 +00:00
Adrià Mercader da469ab08e [base harvester] Custom tag munge function. TODO: check with flexible tags 2011-11-23 11:05:52 +00:00
Adrià Mercader 0ab5c53b47 [ckan harvester] Fix typo 2011-11-18 17:53:01 +00:00
Adrià Mercader c939d90dbb [ckan harvester] Support for defining a custom user to do the harvesting 2011-11-18 14:12:30 +00:00
Adrià Mercader 2018d9e513 [ckan harvester] Support for default tags and groups 2011-11-18 13:20:41 +00:00
Adrià Mercader c04d80e27e Use get_action function instead of directly calling the action functions 2011-10-26 17:26:18 +01:00
Adrià Mercader 7927329536 Make harvesters work with latest ckan release 2011-07-29 11:31:03 +01:00