Commit Graph

838 Commits

Author SHA1 Message Date
David Read 77f98d5b18 Compatibility with ckan 2.2. 2015-12-10 16:33:09 +00:00
David Read 8f54421c99 [#214] ckan.legacy_templates=True is really harmless, so just warn if it is set. 2015-12-10 16:11:56 +00:00
David Read 260cd1f2b7 Merge branch 'master' of github.com:ckan/ckanext-harvest into 214-remove-genshi 2015-12-10 16:02:50 +00:00
David Read fa1ec64cc7 Rename templates_new dir to templates. 2015-12-10 15:52:46 +00:00
David Read 8f3006f494 [#214] Config for legacy templates removed. 2015-12-10 15:46:57 +00:00
David Read 41975a93d8 [#214] Removed genshi templates 2015-12-10 15:32:20 +00:00
amercader 468a195a25 Merge branch 'import-guid' 2015-12-10 13:37:23 +00:00
amercader 5ff3ef9d17 Merge branch '205-inconsistent-stats' 2015-12-10 13:36:15 +00:00
amercader 9d06820bcd Merge branch 'error_creation_moved_to_model' 2015-12-10 13:25:05 +00:00
amercader b7a0343763 Merge branch 'abort-by-job' 2015-12-10 13:22:40 +00:00
amercader 82fe2d0e53 Merge branch 'fetch_unchanged' 2015-12-10 13:19:45 +00:00
amercader 80b82ee968 Merge branch 'factory-job' 2015-12-10 13:18:19 +00:00
amercader 52fe7fb21d Merge branch 'named-exceptions' 2015-12-10 13:17:18 +00:00
amercader 072698b4bb Merge branch 'validator-raise' 2015-12-10 13:16:26 +00:00
amercader 4cc39e5ef4 Merge branch 'test-guid-fix' 2015-12-10 13:15:54 +00:00
amercader 04162ce9e4 Merge branch 'munge-tag' 2015-12-10 13:15:17 +00:00
David Read 4ca4b3a2f2 Merge pull request #213 from ckan/212-module-import-error
[#212] Fixes #212 - auth for harvest_job_create was broken.
2015-12-09 15:52:37 +00:00
David Read 8b8086fe48 [#212] Fixes #212 - auth for harvest_job_create was broken. 2015-12-09 15:50:05 +00:00
amercader 84fb3e3325 Add support for the new ITranslation interface
Move the i18n files to the expected place, load the interface if CKAN >=
2.5.0
2015-12-09 14:46:12 +00:00
David Read 031e680b6c Add option to re-import based on guid. 2015-12-08 16:17:39 +00:00
David Read 18169c5133 [#205] "detailed" removed in a couple of other benign places. 2015-12-08 10:18:36 +00:00
David Read 05d4baf040 [#205] "detailed" source status is removed. Now we have lost harvest_source_for_a_dataset it is not possible to call it. And it returned the wrong keys anyway. 2015-12-08 10:14:15 +00:00
David Read 67366093fe [#205] Remove harvest_source_for_a_dataset action. It has been deprecated since Mar 6 16:54:33 2013 and returns wrong harvest stats keys. 2015-12-08 10:06:42 +00:00
David Read 07c76b0cbf Docs & pep8 2015-12-02 16:23:54 +00:00
David Read 121e8bd918 Merge pull request #198 from ckan/catch-exceptions
Catch exceptions from urllib2.urlopen more comprehensively
2015-12-02 16:22:53 +00:00
David Read c7021933a0 Move creation of errors to the model as that's a more natural home. Provide backwards compatibility. 2015-12-02 08:15:13 +00:00
David Read b53682f267 You can abort a job by specifying the ID of the job, rather than the source. This is helpful since the "harvest run" command returns a list of still running job ids. 2015-12-02 07:59:08 +00:00
David Read f67029b993 Hint about rerunning the import_stage. 2015-12-01 17:59:11 +00:00
David Read b8c7d39e1a Document existing functionality for aborting during the gather stage. 2015-12-01 17:56:24 +00:00
David Read 25301e2152 Update README from interfaces.py. PEP8. Mention HarvestObject - Package relation in interface. 2015-12-01 17:51:35 +00:00
David Read b0780b2062 Fetch stage can also return "unchanged", same as the import stage. Used by DGU. It is useful to skip an object like this, to avoid saving the fetched content in a HarvestObject (saves disk usage). 2015-12-01 17:38:57 +00:00
David Read 414c33ac6b Easy fix to stop test_queue.py using MockHarvester in test_queue2.py by mistake. 2015-12-01 16:55:42 +00:00
David Read 673a64a820 Merge pull request #200 from ckan/fix-error-report-status
Fix job stats key
2015-11-30 17:49:14 +00:00
David Read 8163ec4d39 HarvestObject factory was creating an extra field containing the job, by mistake. PEP8 2015-11-27 14:21:16 +00:00
David Read 1f947471eb Take advantage of named exceptions. 2015-11-27 11:57:40 +00:00
Stefan Oderbolz 798866d872 Fix name of class to enable automatic conversion of datetimes
When ckan/ckan#2505 was finally merged, the class name used to mark datetimes
with a time part that can be automatically converted to the user's
timezone changed.
This commit makes sure this change is reflected in ckanext-harvest.
2015-11-26 01:49:11 +01:00
David Read 4fbaec0986 Improve harvester type error message. Add docstring. 2015-11-25 20:58:50 +00:00
David Read 6c3efe53df Fix job stats key - it is "errored" not "errors" - the keys are from report_status. And lots of PEP8. 2015-11-25 20:55:32 +00:00
David Read f0a2e9fb8e CKAN revision API returns package ids not names (for v2 of the API). This ensures harvest guid is always the ID rather than the name. 2015-11-24 16:41:43 +00:00
David Read 392c13d828 If there are no revisions then we get a 404, so deal with that better. 2015-11-23 21:36:45 +00:00
David Read 4405066fab Catch exceptions from urllib2.urlopen more comprehensively. I think 400 errors were from CKAN 0.6 or something like that - ignore now. 2015-11-23 21:26:32 +00:00
David Read 4b5014d381 Fix test for older ckan. 2015-11-23 18:27:04 +00:00
David Read 3b4daf0609 fix typo 2015-11-23 17:40:35 +00:00
David Read bc26159fb6 tag_munge from ckan 2.2 fails the test with dashes, so use the harvest one for this ckan version. 2015-11-23 17:31:20 +00:00
David Read 52f7e0dd07 Use the ckan version of munge_tag if available, but provide a fallback for older ckans. 2015-11-23 12:48:05 +00:00
Stefan Oderbolz 129b1a0cf5 Enable custom solution to detect existing packages
With this change, all harvesters that extend the base harvester have the
possibility to use the very useful create_or_update method, but still
define their own way of detecting what package is the existing one.

This is very useful for harvest sources that have no knowledge of the
CKAN internal id, but have another way of finding previous package.
2015-11-20 16:31:47 +01:00
amercader e71cf35504 Fix queue tests 2015-11-20 14:25:31 +00:00
amercader f1ba2bcfb3 Namespace Redis keys to avoid conflicts between instances
The `ckan.site_id` config option (or `default` if missing) is used to
namespace the Redis keys: routing key and persistence key. Consumers
will only get the relevant keys for their instance.
2015-11-20 14:17:25 +00:00
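A minimal sketch of the namespacing idea described in the commit above, not the extension's actual code. The helper name `namespaced_key`, the `:` separator, and the example suffix are assumptions made for illustration; only the use of `ckan.site_id` with a `default` fallback comes from the commit message.

```python
def namespaced_key(config, suffix):
    """Prefix a Redis key with the site id so instances do not collide."""
    # Hypothetical helper: fall back to 'default' when ckan.site_id is unset,
    # as the commit message describes.
    site_id = config.get('ckan.site_id', 'default')
    # The ':' separator is an assumed layout, chosen only for readability.
    return '{0}:{1}'.format(site_id, suffix)


# Two instances sharing one Redis server end up with distinct keys.
print(namespaced_key({'ckan.site_id': 'site_a'}, 'routing_key'))
print(namespaced_key({}, 'routing_key'))
```

With keys built this way, a consumer that only reads keys carrying its own prefix never sees another instance's jobs.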
amercader 920df684ae Merge branch 'db-error' 2015-11-20 12:29:37 +00:00
amercader ede50aa3fb Merge branch 'immediate-harvest' 2015-11-20 12:28:35 +00:00
amercader 3f42eb6ba0 Merge branch 'revision-id-fix'
Conflicts:
	ckanext/harvest/tests/harvesters/test_ckanharvester.py
2015-11-20 12:28:17 +00:00
David Read 60c4371df4 Add "not modified" to the stats always returned. 2015-11-17 12:45:00 +00:00
David Read ae7c500745 Merge branch 'master' into yhteentoimivuuspalvelut-job-reporting-fixes 2015-11-17 12:35:59 +00:00
Stefan Oderbolz 8e02aedc65 Fix tests 2015-11-17 13:29:25 +01:00
Stefan Oderbolz f9b87fff0c Make sure all possible status are always returned
It makes it harder to parse the API response if you always have to check
if 'deleted' is set and what value it has. I think simply returning 0
for all status values is good practice.
2015-11-17 11:43:11 +01:00
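The idea in the commit above can be sketched as follows. This is an illustration under assumptions, not the code from the commit: the function name `fill_missing_statuses` is hypothetical, and while 'not modified', 'errored' and 'deleted' appear in this log, 'added' and 'updated' are assumed status names.

```python
# 'not modified', 'errored' and 'deleted' are named in this commit log;
# 'added' and 'updated' are assumed here for illustration.
ALL_STATUSES = ('added', 'updated', 'not modified', 'errored', 'deleted')


def fill_missing_statuses(stats):
    """Return a copy of stats in which every known status has a count."""
    # Start every status at 0, then overlay whatever was actually counted.
    full = dict.fromkeys(ALL_STATUSES, 0)
    full.update(stats)
    return full


print(fill_missing_statuses({'added': 3, 'errored': 1}))
```

A client can then read `stats['deleted']` directly instead of checking whether the key exists first.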
Raphael Stolt 084723abb7 Catch JSONDecodeError when no JSON content 2015-11-16 10:59:18 +01:00
David Read c0a865e64e Revert ok_ - makes it slightly less readable for little benefit. 2015-11-13 13:45:56 +00:00
David Read 42ab55cb6d No longer need uuid since we clear db between tests now. Added ignore_missing because of occasional failures. 2015-11-13 13:32:55 +00:00
David Read b150b50887 Move the SkipTest to include inherited tests too. 2015-11-13 12:44:27 +00:00
David Read 01a4bfd314 Patch test should skip if ckan version is wrong, rather than ignore all exceptions from posts of all tests. Remove FunctionalTestBaseWithoutClearBetweenTests now the tests are modernized. 2015-11-13 12:33:18 +00:00
David Read 1288a4d9e7 Reflow text to 79 char width. Warning not necessary with an exception I think. 2015-11-13 12:32:13 +00:00
Stefan Oderbolz 0ce3748153 Do not use ActionError as this does not yet exist in CKAN 2.2 2015-11-13 12:01:19 +01:00
Stefan Oderbolz c33c6e8c13 Raise an error instead of falling back to harvest_source_update
As the behaviour of *_patch is clearly different from the *_update we
should raise an error if this action is called on a CKAN instance, where
the action is not available.
2015-11-13 11:41:53 +01:00
Stefan Oderbolz 644fa49dd4 Make tests independent from cls.sysadmin
Generate unique harvest sources
2015-11-11 19:49:49 +01:00
Stefan Oderbolz ffca5cc3da Use new factory style for fixtures
- Remove default_source_dict from tests
- Replace setup_class with constructor
- Create mixin for harvest source fixture
- Replace assert with assert_equal where appropriate
- Replace assert with ok_
- Remove dependency to global SOURCE_DICT
- General refactoring of tests
2015-11-11 19:07:59 +01:00
Stefan Oderbolz 136fcb87d5 Make sure package_patch has a fallback for package_update on CKAN < 2.3 2015-11-11 11:37:23 +01:00
Stefan Oderbolz 359da2eb69 Add test class for harvest_source_patch 2015-11-11 11:34:01 +01:00
Stefan Oderbolz 3f09010039 Add harvest_source_patch to API 2015-11-11 05:39:29 +01:00
David Read 735ab3e286 [#157] Try to fix test for ckan 2.2 - cf 91afc0e928 2015-11-04 11:37:03 +00:00
David Read a0742d69b0 Merge branch 'master' of github.com:ckan/ckanext-harvest into 157-version-three-apify
Conflicts:
	ckanext/harvest/logic/action/update.py
2015-11-04 09:50:00 +00:00
David Read 679ed421e9 Merge branch 'master' of github.com:ckan/ckanext-harvest into immediate-harvest
Conflicts:
	ckanext/harvest/logic/action/update.py
2015-11-04 09:44:05 +00:00
David Read f0207ad38f Merge branch 'master' of github.com:ckan/ckanext-harvest into yhteentoimivuuspalvelut-job-reporting-fixes 2015-11-04 09:36:55 +00:00
David Read cbe9b40e66 Merge branch 'master' of github.com:ckan/ckanext-harvest into revision-id-fix 2015-11-04 09:36:08 +00:00
David Read f9da3654f8 [#184] Fix tests for older ckan versions. 2015-11-03 23:27:52 +00:00
David Read 5fba056c59 [#184] Add tests 2015-11-03 23:19:05 +00:00
David Read 77e5b89a01 Blank line needed. 2015-11-03 22:23:04 +00:00
David Read 8c1f7619cb Fix code style to be more ckan-like whilst still pep8. 2015-11-03 22:08:46 +00:00
David Read 20531c0dda Merge branch 'master' of github.com:ckan/ckanext-harvest into LondonAppDev-master
Conflicts:
	ckanext/harvest/logic/action/update.py
	ckanext/harvest/logic/validators.py
2015-11-03 22:02:49 +00:00
David Read 10685badb5 PEP8 based on #174
Conflicts:
	ckanext/harvest/logic/action/delete.py
	ckanext/harvest/logic/action/update.py
	ckanext/harvest/logic/validators.py
2015-11-03 21:56:06 +00:00
David Read 5a5260ff0b Add test for harvest_source_clear since the PEP8 changes were quite a lot there. 2015-11-03 21:42:39 +00:00
David Read 4f71612002 PEP8 based on #174 2015-11-03 20:30:11 +00:00
Mark Winterbottom 208d1c4185 Setting back to master. 2015-11-03 17:31:00 +00:00
David Read 91afc0e928 [#178] Fix test for ckan 2.2, which does not suffer problem #180. 2015-11-03 10:01:47 +00:00
David Read f4f124c181 [#178] Fix resouce_revision_id_fkey error. Fixes #178. 2015-11-03 07:38:01 +00:00
David Read 59be6e2c71 Merge branch 'master' into db-error
Conflicts:
	ckanext/harvest/queue.py
2015-11-03 00:57:14 +00:00
David Read 8a7bc9e1d8 Merge remote-tracking branch 'origin/master' into immediate-harvest
Conflicts:
	README.rst
	ckanext/harvest/commands/harvester.py
	ckanext/harvest/logic/action/create.py
	ckanext/harvest/logic/action/update.py
	ckanext/harvest/logic/auth/update.py
2015-11-03 00:40:25 +00:00
David Read c7fac36c1c [#107] "unchanged" response tested and related fixes
* fix "existing_package_dict" which wasn't containing metadata_modified (because of the schema in the context) so you never skipped an object.
* fix IntegrityError due to resource revision_id being harvested. No idea why this hasn't caused errors before now.
* "unchanged" is now checked in base instead of ckanharvester - makes sense. Looking at other harvesters, it's normal to return from the import_stage with the value returned from base._create_or_update_package so I've continued with that.
* "unchanged" response is now documented
* better report_status tests in test_queue2.
2015-11-03 00:22:53 +00:00
David Read e59760fefe Merge branch 'job-reporting-fixes' of https://github.com/yhteentoimivuuspalvelut/ckanext-harvest into yhteentoimivuuspalvelut-job-reporting-fixes 2015-11-02 21:25:32 +00:00
David Read 24415844e0 [#158] Fix revision_id problem in second harvest. 2015-11-02 18:13:29 +00:00
David Read d495e269e7 [#158] Fix tests 2015-11-02 17:29:45 +00:00
David Read 14f372aec6 Merge branch 'master' of github.com:ckan/ckanext-harvest into 157-version-three-apify
Conflicts:
	README.rst
2015-11-02 17:01:22 +00:00
Mark Winterbottom 7ffd6748f3 Corrected docstring params field, duplicate if statement and deleting keys
for blank values.
2015-11-02 16:59:43 +00:00
David Read b7552ba700 [#158] Try harder to use the "get datasets since time X" method of harvesting. Go back to the last completely successful harvest, rather than just consider the previous one. And that had a bug, because fetch errors were ignored, meaning one fetch error could mean that dataset never got harvested again. 2015-11-02 16:59:19 +00:00
Mark Winterbottom 443d690ac8 Fixed big typo error. 2015-11-02 16:45:16 +00:00
Mark Winterbottom 53f692b802 Merge remote-tracking branch 'remotes/upstream/master' 2015-11-02 16:00:14 +00:00
Mark Winterbottom 1702cf2f09 Remove ', None' on .get() calls because it's the default value. 2015-11-02 15:51:25 +00:00
Mark Winterbottom 0c19acba78 Changed double quotes to single quotes in docstrings. 2015-11-02 15:50:04 +00:00
Mark Winterbottom a6069d93db Fixed bug where the harvest source url validator would validate against
all harvest sources that were ever created instead of just sources that
were currently enabled.
2015-10-30 16:59:04 +00:00
Mark Winterbottom 3f37ae5f45 Corrected docstring. 2015-10-30 16:11:25 +00:00
Mark Winterbottom 02b81187df Fixed bug with deleting harvest sources which have a custom
configuration. Added PEP-8 compliance.
2015-10-30 15:15:41 +00:00
Mark Winterbottom 55325f5940 Updated harvest source url validator to allow for duplicate URLs with
unique configs.
2015-10-30 11:59:24 +00:00
Mark Winterbottom 2c41293c9c Updated the validator to check for unique sets as well as URL. 2015-10-29 18:30:51 +00:00
David Read 1a680f3fd3 [#158] Fix spaces encoding broken in previous merge. Tested with data.gov.uk. 2015-10-29 17:31:04 +00:00
Mark Winterbottom 39ce744368 Modified to make PEP-8 compliant. 2015-10-29 17:18:51 +00:00
David Read f1d2d5fdc4 [#111] Run jobs straight away. 2015-10-28 21:58:36 +00:00
David Read 421e6da660 Add run_test, job_abort, source commands
* run_test - for running a whole harvest on the command-line
* job_abort - for aborting a limbo job
* source - for showing a single harvest source
* allowing a source to be specified by name in several commands
2015-10-28 17:51:58 +00:00
David Read e2ab9e58e7 Merge remote-tracking branch 'origin/master' into 157-version-three-apify
Conflicts:
	ckanext/harvest/harvesters/ckanharvester.py
2015-10-28 14:34:27 +00:00
David Read 3f74c29c99 Merge branch 'master' into 157-version-three-apify 2015-10-27 17:45:27 +00:00
David Read 55245b5091 [#158] PEP8/formatting. 2015-10-27 17:43:11 +00:00
David Read 2a79873855 [#158] Use package search to get all datasets. Add paging search results. Store pkg_dict from search in the object rather than request it again in fetch_stage. 2015-10-27 17:33:22 +00:00
amercader 86630adab7 Merge branch 'include-exclude-org' 2015-10-27 15:52:55 +00:00
David Read b56fae8aed Fixes and tests
* Fix extras as a list of dicts
* Fix SOLR dates syntax - needed a Z
* Basic tests for this updated ckan harvester
* Now require CKAN 2.0 to be able to save these packages in package_show form. Take advantage of this now we are sure various imports are definitely available, such as munge_tag.
* Add back compatibility for other harvesters supplying restful-like package_dicts to _create_or_update_package

TODO add back in the ability to harvest pre 2.0 CKANs with the RESTful calls (fallback or maybe configurable)
2015-10-23 17:30:28 +00:00
amercader 24574f485b Setup harvest model in harvester tests 2015-10-23 15:43:01 +01:00
David Read caeeace8dc Merge branch 'master' into 157-version-three-apify 2015-10-23 14:39:48 +01:00
David Read bc49149d5e Merge branch 'master' into include-exclude-org 2015-10-23 14:36:53 +01:00
David Read 0c0a996b85 Merge branch 'master' into db-error
Conflicts:
	ckanext/harvest/queue.py
2015-10-23 13:33:44 +01:00
amercader 2f4adfb338 Merge branch 'tests' 2015-10-23 13:18:15 +01:00
amercader 3c6cc55be0 Only flush keys on the current Redis database 2015-10-23 11:52:22 +01:00
amercader fdbade465f Merge branch 'master' into purge 2015-10-23 11:33:43 +01:00
amercader d950b13400 Merge branch 'unique-names-improved' 2015-10-23 11:02:49 +01:00
amercader 501edffe2d Merge branch 'master' into migration-states 2015-10-23 10:59:04 +01:00
David Read 3e4a9933ce Remove prints. 2015-10-21 16:52:19 +00:00
David Read dc7af5d150 Remove prints. 2015-10-21 16:38:03 +00:00
David Read eb9aa17862 Include/exclude orgs functionality based on work by memaldi and ross. 2015-10-21 16:33:16 +00:00
David Read f70c16bce7 Add framework for testing harvesters. Modernize existing tests. 2015-10-21 16:26:57 +00:00
David Read d1f84295f8 purge_queues command now has warning about impact of Redis flushall, plus add some (log) output when you run a purge. 2015-10-21 16:12:40 +00:00
David Read 6360681a8f [#105] Fix order of deletes, as agreed with @florianm. 2015-10-12 15:57:27 +01:00
David Read 82bdff2f34 Add tests 2015-10-01 17:59:17 +01:00
David Read be3e88086a Generating unique names improved
* Harvesters that change the name when the title changes have had a
  problem when the change is small and a number was unnecessarily
  appended. e.g. "Trees "->"Trees" meant _gen_new_name("Trees") returned
  "trees1". Now you can specify the existing value and it will return
  that if it still holds.
* Maximum dataset name length is now adhered to.
* To make a name unique, a sequential number is now added, since for
  users that is more understandable and pleasant. However hex digits are
  still an option, for those that want to harvest concurrently.
2015-10-01 17:53:03 +01:00
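A simplified sketch of the naming behaviour described in the commit above, under stated assumptions. It is not the extension's `_gen_new_name`: the function `unique_name`, the `taken` set standing in for a database lookup, and the length limit of 100 are all hypothetical. The hex-digit option mentioned in the commit is left out.

```python
MAX_NAME_LENGTH = 100  # assumed limit, for illustration only


def unique_name(ideal_name, taken, existing_name=None,
                max_length=MAX_NAME_LENGTH):
    """Pick a dataset name that is unique and no longer than max_length.

    taken is a set of names already in use (a stand-in for a DB query).
    """
    ideal_name = ideal_name[:max_length]
    # If the dataset already holds the ideal name, keep it rather than
    # appending a number just because the name is "taken" by itself.
    if existing_name is not None and existing_name == ideal_name:
        return existing_name
    if ideal_name not in taken:
        return ideal_name
    # Otherwise append a sequential number, trimming the base so the
    # result still fits within max_length.
    counter = 1
    while True:
        suffix = str(counter)
        candidate = ideal_name[:max_length - len(suffix)] + suffix
        if candidate not in taken:
            return candidate
        counter += 1


taken = {'trees'}
print(unique_name('trees', taken, existing_name='trees'))
print(unique_name('trees', taken))
```

Passing `existing_name` covers the "Trees " to "Trees" case from the commit message: the munged name is unchanged, so no number is appended.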
David Read 1a6dca7c00 [#148] Catch a more specific exception. 2015-10-01 12:30:40 +01:00
Ross Jones 6dd40bfcf9 Changes the gather state to use v3 API
Rather than using the revisions in v2 API this now uses the
package_search API so that we can extend it with proper filters in
future.
2015-09-10 18:53:16 +01:00
Florian Mayer a6cdda0a14 set max version to 2.4.99 2015-08-19 08:41:42 +00:00
florianm 1905caa961 upgrade harvest_source_clear to not delete from authz models removed in migration 078 2015-08-19 10:25:20 +08:00
David Read de17e0ae8c Catch, record and recover from temporary db problems. 2015-07-22 10:25:11 +01:00
David Read 46f7b32b04 Merge branch 'master' of github.com:okfn/ckanext-harvest into migration-states 2015-07-22 10:13:55 +01:00
David Read 2da918c2e4 Fix migration for old harvests so that ones that errored are correctly marked. Added helpful comments in model. 2015-07-22 10:13:02 +01:00
Stefan Oderbolz ab76830e85 [#145] Throw + catch a custom exception if there are no jobs to run
If there are no harvesting jobs to run, there was always an ugly
exception message when using the paster command. This replaces the ugly
output with a proper message and uses a custom exception to allow others
to deal with this error differently.
2015-07-20 18:41:50 +02:00
Stefan Oderbolz 83dd0b4b68 [#138] Add data attributes to support timezone conversion 2015-07-09 22:35:54 +02:00
Stefan Oderbolz 4dc2f7367d [#139] Delete package relationships when clearing a harvest source 2015-06-26 17:20:23 +02:00
amercader 88d9ba0397 [#136] Fix broken RabbitMQ queue names
The harvester command was still using the old ones.
Use specific ones for testing.
2015-06-11 13:56:22 +01:00
amercader 673dfc9882 [#127] Use site user on the CKAN harvester
Add missing call
2015-06-11 10:38:33 +01:00
amercader d3a3f09ad1 [#127] Use site user on the CKAN harvester
To avoid having to create a 'harvest' sysadmin explicitly. It will still
be used if present, but if not the site user will be used. You can also
define the user to use via a config option.
2015-06-11 10:19:07 +01:00
amercader b17c3269b5 Merge branch 'clear-command' of https://github.com/metaodi/ckanext-harvest into metaodi-clear-command 2015-06-10 15:32:37 +01:00
Stefan Oderbolz 64ff0f3a3a Use single quotes to be consistent 2015-06-10 16:22:04 +02:00
Stefan Oderbolz 2a2d85f60c Wording changes for clearsource and rmsource 2015-06-10 16:19:23 +02:00
joetsoi 92b93c53fc add some translation strings 2015-06-10 12:14:20 +01:00
Stefan Oderbolz 8ebb843052 Add documentation for clearsource command 2015-06-10 11:29:24 +02:00
Stefan Oderbolz 61bc150ae6 Expose clear harvester source as a paster command 2015-06-10 11:19:10 +02:00
amercader 9f8aae3a18 Append site id to queue name
This allows multiple CKAN sites to share the same RabbitMQ exchange
(For the Redis backend this is handled via different Redis databases)
2015-06-01 17:54:22 +01:00
amercader 3e21ea4f82 Fix tests, set up Travis
TODO: sort out the tests properly, avoiding imports from the legacy ones
2015-04-07 13:31:45 +01:00
amercader f72d6da521 Change toolkit import
Apparently on package installs this is not well supported

from ckan.plugins.toolkit import check_ckan_version

But this works:

from ckan.plugins import toolkit

toolkit.check_ckan_version(...
2015-03-19 12:48:46 +00:00
amercader 7a20e93716 Raise on startup import errors so we don't mask problems
Otherwise if there was eg an actual ImportError we just got

2015-03-19 12:30:08,430 DEBUG [ckanext.harvest.plugin] No auth module
for action "update"

on the log
2015-03-19 12:48:15 +00:00
Jari Voutilainen 859133fe36 move detecting unchanged datasets to ckanharvester and queue.py 2015-03-10 14:48:41 +02:00
David Read d6e9b80496 Merge pull request #118 from clementmouchet/114-remove_resource_groups
Removed ResourceGroup from query when using CKAN 2.3 or above
2015-02-24 09:56:44 +00:00
clementmouchet ead9e67a33 updated def harvest_source_clear() to delete resource views, resource revisions & resources in CKAN >= 2.3 2015-02-23 17:02:21 +00:00
David Read b3ed6cae5a Merge pull request #121 from metaodi/120-create-remote-orgs
Fetch remote organization via action api
2015-01-15 10:49:09 +00:00
Stefan Oderbolz c1bcee9684 Use str() to get the error message 2015-01-15 11:36:15 +01:00
Stefan Oderbolz 191c39ce5c Catch the more general URLError instead of HTTPError
HTTPError is a subclass of URLError, so catching URLError is enough. I
think the HTTP error code is not as important in this situation, so
catching the more generic error seems like the best solution.
2015-01-15 10:57:24 +01:00
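The subclass relationship this commit relies on can be checked directly. The sketch below uses Python 3's `urllib.error` rather than the `urllib2` module the commit itself was written against, and the helper `fetch_error_message` is a hypothetical name used only for illustration.

```python
import io
from email.message import Message
from urllib.error import HTTPError, URLError


def fetch_error_message(exc):
    """Turn any urllib fetch failure into a message via str()."""
    try:
        raise exc
    except URLError as e:
        # HTTPError is a subclass of URLError, so both land here.
        return str(e)


print(issubclass(HTTPError, URLError))
print(fetch_error_message(URLError('timed out')))
# Build an HTTPError by hand; the URL is a placeholder.
http_error = HTTPError('http://example.invalid', 404, 'Not Found',
                       Message(), io.BytesIO())
print(fetch_error_message(http_error))
```

One `except URLError` clause therefore handles both connection-level failures and HTTP status errors, at the cost of not branching on the status code.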
Stefan Oderbolz b978c26e70 Use ContentFetchError instead of generic Exception 2015-01-15 00:49:11 +01:00
Stefan Oderbolz 935b9dda01 Munge group name before fetching remote group
The API call /api/2/rest/package/<id> returns the display name of the
group instead of its ID. To properly match the group, munge the name
before calling /api/2/rest/group
2015-01-15 00:44:53 +01:00
Stefan Oderbolz ef35c21e2a Improve exception handling with custom exception
1. Try whenever possible to catch specific exceptions
2. Raise custom exception where appropriate
3. Fix the exception handling in _get_group and _get_organization
2015-01-15 00:44:45 +01:00
Stefan Oderbolz 0fd38e0e54 Use _get_group as a fallback for remote orgs
First try to get a remote org from the remote Action API, if this fails
try to use the old rest api call, which works on older CKAN versions.

Only if both options fail, it's currently not possible to get the remote
organization.
2015-01-14 00:10:27 +01:00
Stefan Oderbolz f214577872 Fetch remote organization via action api
Organizations used to be returned by /api/2/rest/group, this is what the
old implementation used to fetch the information to create the remote
organization on the local instance of CKAN.

With this commit the Action API is used to fetch the same information.
2015-01-13 14:46:53 +01:00
Stefan Oderbolz ea9debf714 Fix logic of conditional and make it more pythonic 2014-12-18 16:03:33 +01:00
Stefan Oderbolz 08930d01bf Make sure new packages get a unique 'name' 2014-12-16 15:02:36 +01:00
clementmouchet 82c7988bf3 Removed ResourceGroup from query when using CKAN 2.3 or above 2014-12-12 13:10:40 +00:00
amercader a3affc9702 Fix validators on harvest_source_show schema
Remove validators on several keys so they don't get stripped during the
show validation.
2014-10-08 12:02:26 +01:00
amercader 098b54f1e5 Merge branch 'clear-source-delete-related' of https://github.com/waldvogel/ckanext-harvest into waldvogel-clear-source-delete-related 2014-09-29 13:49:19 +01:00
amercader e60e2eee03 Fix output for harvest_source_create/update
They were using an incorrect schema, so not returning a harvest source
like dict.
2014-09-29 12:43:37 +01:00
waldvogel c9b4e10506 delete records from related and related_dataset when clearing source 2014-09-12 10:56:37 +02:00
Jari Voutilainen 1e0376cff6 fix typo 2014-09-10 10:33:13 +03:00
Jari Voutilainen f6c1456abe fix job reporting to have job finished timestamp when there were zero datasets to gather 2014-09-10 09:22:55 +03:00
Jari Voutilainen 97f09913cf fix job reporting all datasets deleted when actually nothing changed during last two harvests 2014-09-10 09:22:44 +03:00
amercader 8cf254f112 Merge branch '99-all-non-ascii-tags' of https://github.com/morty/ckanext-harvest into morty-99-all-non-ascii-tags 2014-08-29 14:40:43 +01:00
amercader 546159744e Merge branch '101-modified-package-name' of https://github.com/morty/ckanext-harvest into morty-101-modified-package-name 2014-08-29 14:38:33 +01:00
amercader 039ac7c0ad Always remove harvest extras on after_show if there
Up until now we were relying on `for_edit` being present in the
context, but this is only added on the controllers. It's better to be
safe and remove them always. If needed (at index time) they will be
added afterwards.
2014-08-14 15:31:39 +01:00
Tom Mortimer-Jones 8a2c072d4e [101] Use name from database when reharvesting package 2014-08-12 11:18:48 +01:00
Tom Mortimer-Jones 65cfade420 [99] Remove empty tags produced by munging all non-ascii tags
I thought this way of filtering was easier to read than filter(None, tags)
2014-08-07 17:05:16 +01:00
amercader 13dbb1eea4 Fix variable not defined 2014-07-30 15:49:02 +01:00
amercader 58a873ac7a [#91] Remove config fields from source dict before indexing
We don't need them and will avoid indexing errors
2014-06-27 16:54:39 +01:00
amercader a59ab4b5ff [#91] Consolidate all harvest source reindex code in a single action
Make it available to users with permissions on the harvest source
2014-06-27 16:48:14 +01:00
amercader 7459358fa1 Support for single import commands
We are now able to run `paster harvester import` for a single harvest
object or for a single dataset, providing ids or name.
2014-05-15 16:30:30 +01:00
amercader 2c6aaf5bb1 Merge branch 'master' into 96-harvest-object-encoding-errors 2014-05-15 15:52:13 +01:00
amercader 43f1d08255 [#97] Persistent endpoint for datasets harvest objects
Contrary to `/harvest/object/xxx`, this endpoint is passed the dataset
id, so it does not depend on a particular object but uses the most recent one.
2014-04-30 17:45:07 +01:00
amercader 1b458b1772 [#96] Handle encoding errors on harvest object endpoint
When parsing the harvest object content to see if it is an XML file,
etree.fromstring would fail if there are incorrect unicode errors.
2014-04-28 12:48:09 +01:00
Richard Claydon e3492b57e7 Update plugin.py
Updating plugin.py to check for the existence of the extras key in the data_dict.
2014-02-27 16:05:39 +00:00
amercader d3cf5e58d1 [#86] Fix duplicate extras 2014-02-11 18:16:49 +00:00
amercader fbde0b8dc1 [#87] Remove remote url_type from resources
Otherwise CKAN thinks they are uploads, datastore resources, etc, which
can cause problems eg when displaying the URL of the resource. We
are just linking to the remote resource URL.
2014-02-11 17:27:19 +00:00
amercader 5739e541d7 [#80] Support for Python 2.6 when handling xml exceptions 2014-02-10 18:44:46 +00:00
amercader 2a07a144fc [#84] Fix auth audit exception when creating datasets
This was caused by a combination of the auth audit leaking and the
harvester reusing the context for the package_show and package_create
actions. If the package is not found, package_show does not call
check_access, and the auth audit does not pass. This is stored in the
context (`__auth_audit`) and is raised next time that we call
get_action (when we call package_create with the same context)

It could potentially be fixed on master, but it is probably quite rare.
2014-02-10 18:22:48 +00:00
amercader 5b677b6099 [#83] Fix key error when using default_groups 2014-02-10 13:16:58 +00:00
Rachel Knowler bf11e4d330 Moved clean_tags check into _create_or_update_package method. 2014-02-10 09:29:01 +01:00
Rachel Knowler 2ba9908653 Config option to munge tags changed to be consistent with other config options in this extension, and noted in README. 2014-01-29 10:55:51 +01:00
Rachel Knowler 5e1aef1d08 Removed extra newline. 2014-01-29 10:06:32 +01:00
Rachel Knowler 7d71b0a00b Wrap tag munging code in config option, defaulting to False. 2014-01-29 10:02:16 +01:00
amercader 2b803a3f66 [#77] Use auth_allow_anonymous_access decorator
Starting from 2.2 you need to explicitly flag auth functions that
allow anonymous access with the p.toolkit.auth_allow_anonymous_access
decorator. A local version of the decorator is used to ensure we only
use it on CKAN>=2.2
2014-01-20 13:47:37 +00:00
amercader 4cc56f51ab [#76] Use harvest_source_show on reindex command 2014-01-14 17:04:34 +00:00
amercader 95d0ef0f01 [#76] Add extra fields to the source schema
Add 'private' and its core validators, and 'metadata_modified' and
'metadata_created'.

Also ignore '__extras'
2014-01-14 17:01:25 +00:00
amercader 467fb7bb8f Fix resource updating for harvested datasets
Starting from 2.2, resource_update calls package_show before updating
the resource via a package_update call. The dict passed had the harvest
extras (eg harvest_object_id) added which made the update call fail due
to duplicated extra keys. To fix it we now remove any harvest extras
on after_show if there is a 'for_edit' property on the context.
2014-01-13 10:30:52 +00:00
amercader 278a8e1ada Merge branch 'master' of github.com:okfn/ckanext-harvest 2014-01-10 13:49:38 +00:00