Commit Graph

69 Commits

Author SHA1 Message Date
Ken Tsang 0cbb3579a9 Update Redis exception to drop stack trace
A full stack trace is probably not needed, as the Redis data corruption is probably happening somewhere else; the error log should make it easier to investigate when it does happen.
2019-05-08 11:26:05 +01:00
Ken Tsang edcf80c944 Add try/except around Redis set to handle corrupt Redis data
- prevents bad data from stopping harvest processing
- log the error for investigation, full stack trace output as part of log
2019-05-03 13:13:31 +01:00
Edward Kerry fb0cf32546
Merge pull request #4 from alphagov/add-not-modified-status
Add `Not modified` statistic report for unchanged imports.
2019-04-15 13:17:28 +01:00
Ken Tsang 93a839efce Use == rather than `is` for string comparison 2019-04-12 11:31:04 +01:00
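The distinction behind this change can be shown in a minimal sketch. The variable names and the status value are illustrative and not taken from the repository. `is` compares object identity while `==` compares values, so two equal strings built at runtime are not guaranteed to be the same object.

```python
# Illustrative only: the names and the "unchanged" value are hypothetical.
expected = "unchanged"

# Build an equal string at runtime so it is very likely a distinct object.
status = "".join(["un", "changed"])

# Value comparison: True whenever the contents match.
values_equal = status == expected

# Identity comparison: the result depends on interpreter internals
# (string interning), so it should not be relied on for strings.
same_object = status is expected
```

Only `values_equal` has a guaranteed result here; `same_object` may differ between Python implementations.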
Jari Voutilainen 98364723d7 Apply flake8 2019-03-06 13:19:05 +02:00
Gavin Cannizzaro 9f8b98e54f Use Redis password from configuration when present. 2018-07-12 10:14:24 -04:00
Denis Laxalde 7bb9a2b5e4 Catch sqlalchemy's DatabaseError in fetch and gather callback
I sometimes see "connection timed out" messages, which are reported as
sqlalchemy.exc.DatabaseError, so catching that exception keeps the
harvester from getting stuck in a "limbo" state.

As DatabaseError is a superclass of OperationalError, the latter will
still be caught.
2017-11-02 17:20:31 +01:00
Denis Laxalde cc44d03a41 Drop log.error() call redundant with prior log.exception() call
logging.exception() already logs an ERROR message with exception
information, so there's no need to call both log.exception() and
log.error().

Along the way, make messages uniform in fetch_callback() and
gather_callback().
2017-11-02 17:17:00 +01:00
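The reasoning in this commit can be checked with a short sketch using only the standard `logging` module. The logger name, the message text and the exception are made up for illustration. A single `log.exception()` call inside an `except` block emits an ERROR-level record and appends the traceback, which is why a following `log.error()` would be redundant.

```python
import io
import logging

# Capture log output in memory so it can be inspected.
stream = io.StringIO()
log = logging.getLogger("harvest_example")  # hypothetical logger name
log.setLevel(logging.DEBUG)
log.propagate = False
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
log.addHandler(handler)

try:
    raise ValueError("boom")
except ValueError:
    # One call: logs at ERROR level and includes the traceback.
    log.exception("Fetch failed")

output = stream.getvalue()
```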
Florian Brucker 2602de9094 [#257] Purge only our own Redis data.
Previously purging the queue on the Redis backend would clear the whole
database, making it hard to share the same database with other parts of
CKAN. With this commit, only the keys that belong to ckanext-harvest and
the current CKAN instance are purged.
2016-07-20 16:24:13 +02:00
David Read b0780b2062 Fetch stage can also return "unchanged", same as the import stage. Used by DGU. It is useful to skip an object like this, to avoid saving the fetched content in a HarvestObject (saves disk usage). 2015-12-01 17:38:57 +00:00
amercader f1ba2bcfb3 Namespace Redis keys to avoid conflicts between instances
The `ckan.site_id` config option (or `default` if missing) is used to
namespace the Redis keys: routing key and persistence key. Consumers
will only get the relevant keys for their instance.
2015-11-20 14:17:25 +00:00
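The namespacing idea can be sketched as follows. This is not the code from the commit: the `site_id:name` key format and both helper names are assumptions made for illustration, and plain lists stand in for a Redis connection.

```python
def namespaced_key(site_id, name):
    """Prefix a key name with the site id, falling back to 'default'.

    The 'site_id:name' layout is an assumed format, not the real one.
    """
    return "{0}:{1}".format(site_id or "default", name)


def own_keys(all_keys, site_id):
    """Return only the keys that carry this instance's prefix."""
    prefix = "{0}:".format(site_id or "default")
    return [key for key in all_keys if key.startswith(prefix)]


# Two hypothetical instances sharing one key space.
keys = [
    namespaced_key("site_a", "harvest_job_id"),
    namespaced_key("site_b", "harvest_job_id"),
    namespaced_key(None, "harvest_job_id"),
]
site_a_keys = own_keys(keys, "site_a")
```

With a prefix per instance, a consumer or a purge can filter on its own prefix and leave the other instances' keys alone.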
amercader 920df684ae Merge branch 'db-error' 2015-11-20 12:29:37 +00:00
amercader ede50aa3fb Merge branch 'immediate-harvest' 2015-11-20 12:28:35 +00:00
David Read 59be6e2c71 Merge branch 'master' into db-error
Conflicts:
	ckanext/harvest/queue.py
2015-11-03 00:57:14 +00:00
David Read 8a7bc9e1d8 Merge remote-tracking branch 'origin/master' into immediate-harvest
Conflicts:
	README.rst
	ckanext/harvest/commands/harvester.py
	ckanext/harvest/logic/action/create.py
	ckanext/harvest/logic/action/update.py
	ckanext/harvest/logic/auth/update.py
2015-11-03 00:40:25 +00:00
David Read e59760fefe Merge branch 'job-reporting-fixes' of https://github.com/yhteentoimivuuspalvelut/ckanext-harvest into yhteentoimivuuspalvelut-job-reporting-fixes 2015-11-02 21:25:32 +00:00
David Read f1d2d5fdc4 [#111] Run jobs straight away. 2015-10-28 21:58:36 +00:00
David Read 421e6da660 Add run_test, job_abort, source commands
* run_test - for running a whole harvest on the command-line
* job_abort - for aborting a limbo job
* source - for showing a single harvest source
* allowing a source to be specified by name in several commands
2015-10-28 17:51:58 +00:00
David Read 0c0a996b85 Merge branch 'master' into db-error
Conflicts:
	ckanext/harvest/queue.py
2015-10-23 13:33:44 +01:00
amercader 2f4adfb338 Merge branch 'tests' 2015-10-23 13:18:15 +01:00
amercader 3c6cc55be0 Only flush keys on the current Redis database 2015-10-23 11:52:22 +01:00
amercader fdbade465f Merge branch 'master' into purge 2015-10-23 11:33:43 +01:00
David Read f70c16bce7 Add framework for testing harvesters. Modernize existing tests. 2015-10-21 16:26:57 +00:00
David Read d1f84295f8 purge_queues command now has warning about impact of Redis flushall, plus add some (log) output when you run a purge. 2015-10-21 16:12:40 +00:00
David Read 1a6dca7c00 [#148] Catch a more specific exception. 2015-10-01 12:30:40 +01:00
David Read de17e0ae8c Catch, record and recover from temporary db problems. 2015-07-22 10:25:11 +01:00
David Read 46f7b32b04 Merge branch 'master' of github.com:okfn/ckanext-harvest into migration-states 2015-07-22 10:13:55 +01:00
David Read 2da918c2e4 Fix migration for old harvests so that ones that errored are correctly marked. Added helpful comments in model. 2015-07-22 10:13:02 +01:00
amercader 9f8aae3a18 Append site id to queue name
This allows multiple CKAN sites to share the same RabbitMQ exchange
(For the Redis backend this is handled via different Redis databases)
2015-06-01 17:54:22 +01:00
Jari Voutilainen 859133fe36 move detecting unchanged datasets to ckanharvester and queue.py 2015-03-10 14:48:41 +02:00
Jari Voutilainen 97f09913cf fix job reporting all datasets deleted when actually nothing changed during last two harvests 2014-09-10 09:22:44 +03:00
amercader 55d2b4e304 Fix purge command 2013-10-16 12:59:23 +01:00
amercader f89f12203c Merge branch 'fix/rename-ampq-to-amqp' of git://github.com/opendatatrentino/ckanext-harvest into opendatatrentino-fix/rename-ampq-to-amqp 2013-10-04 17:24:53 +01:00
Samuele Santi 611b9aab6d Fixed typo: ampq -> amqp 2013-09-19 11:43:03 +02:00
amercader cc3f3d3426 [#50] Fix objects deletion on gather exceptions 2013-07-05 13:29:11 +01:00
amercader e2696b98bb [#50] Save all dates as UTC in the database
At some point we may want to transform these to local time at the
dictization level. We will need a library like dateutil to handle it
properly though.
2013-07-04 14:59:27 +01:00
amercader 9041f3f3ad Changes in Redis consumer to make tests work 2013-04-22 18:08:19 +01:00
kindly dcfd201cdd [#32] redis queue support 2013-04-21 17:04:57 +01:00
kindly 0ce59a29b6 delete instead of update harvest objects when error 2013-04-12 12:32:33 +01:00
kindly 7d7657f94a make gather phase as finished if there is an error 2013-04-12 10:35:08 +01:00
kindly 0b5c3c608a catch and raise gather exception, acking the message 2013-03-25 11:57:57 +00:00
kindly 634a0bbd30 return instead of continue 2013-03-19 01:21:20 +00:00
kindly 3adf38105e re-add code from old branch separating the fetch and import logic 2013-03-19 01:16:43 +00:00
amercader d77f16aba9 [#21] Improve gather stage error handling
See issue for full details. Basically we don't want to catch any
exception at the queue.py level, as they prevent debugging. Harvesters
should deal with them and return a list of ids or an empty list if no
objects need to be fetched.
Also improved the debug messages.
2013-03-14 17:31:07 +00:00
amercader 5c17a525c1 Refresh session after each harvest stage
Otherwise, for example, the source config got cached and you needed to
restart the consumers to refresh it.
2013-03-01 12:55:59 +00:00
kindly ebe246fe99 make report emit added so shows up on front end 2013-02-22 17:32:33 +00:00
kindly acb17ff3b0 capture errors more cleanly 2013-01-10 10:48:48 +00:00
kindly 36389e7ce0 make sure gather phase finishes job if there is a severe error 2012-12-24 12:21:21 +00:00
kindly 6b42d96fe0 add report_status field 2012-12-17 23:50:26 +00:00
amercader 0dde483992 Set job status to Finished when actually finishing it
Until now, harvest jobs were set to Finished just after sending all
objects to the fetch stage. Now every time the run command is run, jobs
are set to Running, and all previous Running jobs are checked to see if
all harvest objects have a state of Complete or Error. Only then is the
job flagged as Finished.
2012-12-13 18:19:22 +00:00
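The completion rule described in this commit can be sketched as a small predicate. The function name and the exact spelling of the state strings are assumptions for illustration; the real implementation queries the harvest objects in the database.

```python
# States that count as "done" for a harvest object (spelling assumed
# from the commit message, not from the code).
FINAL_STATES = {"Complete", "Error"}


def job_is_finished(object_states):
    """Return True when every harvest object has reached a final state.

    Note: an empty sequence also returns True, because all() of nothing
    is True; how the real code treats jobs without objects is not stated.
    """
    return all(state in FINAL_STATES for state in object_states)


still_running = job_is_finished(["Complete", "Fetching", "Error"])
finished = job_is_finished(["Complete", "Error", "Complete"])
```

Here `"Fetching"` is a made-up intermediate state used only to show a job that must stay Running.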