Commit Graph

203 Commits

Author SHA1 Message Date
David Read 4f71612002 PEP8 based on #174 2015-11-03 20:30:11 +00:00
Mark Winterbottom 208d1c4185 Setting back to master. 2015-11-03 17:31:00 +00:00
David Read 8a7bc9e1d8 Merge remote-tracking branch 'origin/master' into immediate-harvest
Conflicts:
	README.rst
	ckanext/harvest/commands/harvester.py
	ckanext/harvest/logic/action/create.py
	ckanext/harvest/logic/action/update.py
	ckanext/harvest/logic/auth/update.py
2015-11-03 00:40:25 +00:00
David Read e59760fefe Merge branch 'job-reporting-fixes' of https://github.com/yhteentoimivuuspalvelut/ckanext-harvest into yhteentoimivuuspalvelut-job-reporting-fixes 2015-11-02 21:25:32 +00:00
David Read d495e269e7 [#158] Fix tests 2015-11-02 17:29:45 +00:00
David Read 14f372aec6 Merge branch 'master' of github.com:ckan/ckanext-harvest into 157-version-three-apify
Conflicts:
	README.rst
2015-11-02 17:01:22 +00:00
Mark Winterbottom 7ffd6748f3 Corrected docstring params field, duplicate if statement and deleting keys
for blank values.
2015-11-02 16:59:43 +00:00
David Read b7552ba700 [#158] Try harder to use the "get datasets since time X" method of harvesting. Go back to the last completely successful harvest, rather than just consider the previous one. And that had a bug, because fetch errors were ignored, meaning one fetch error could mean that dataset never got harvested again. 2015-11-02 16:59:19 +00:00
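
The selection rule described in this commit can be sketched as follows. This is only an illustration under stated assumptions, not the extension's actual code: the function name and the job fields (`gather_started`, `gather_errors`, `object_errors`) are hypothetical, and jobs are assumed to be ordered newest first.

```python
def since_time_for_next_harvest(previous_jobs):
    """Return the start time of the newest error-free job, or None.

    previous_jobs is assumed to be a list of dicts ordered newest first.
    The field names are hypothetical and chosen for illustration.
    """
    for job in previous_jobs:
        # A job only counts as completely successful if it had no gather
        # errors and no object (fetch/import) errors.
        if job['gather_errors'] == 0 and job['object_errors'] == 0:
            return job['gather_started']
    # No clean job found: the caller would fall back to a full harvest.
    return None


jobs = [
    {'gather_started': '2015-11-02', 'gather_errors': 0, 'object_errors': 1},
    {'gather_started': '2015-11-01', 'gather_errors': 0, 'object_errors': 0},
    {'gather_started': '2015-10-31', 'gather_errors': 0, 'object_errors': 0},
]
since = since_time_for_next_harvest(jobs)
```

Skipping the job that had a fetch error means the failed dataset falls inside the next "since" window and gets another chance to be harvested.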
Mark Winterbottom 443d690ac8 Fixed big typo error. 2015-11-02 16:45:16 +00:00
Mark Winterbottom 53f692b802 Merge remote-tracking branch 'remotes/upstream/master' 2015-11-02 16:00:14 +00:00
Mark Winterbottom 1702cf2f09 Remove ', None' on .get() calls because it's the default value. 2015-11-02 15:51:25 +00:00
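
A small illustration of why the explicit default is redundant. The `config` dict and its keys are made up for this example.

```python
# Hypothetical config dict, for illustration only.
config = {'url': 'http://example.com'}

# dict.get() already returns None for a missing key,
# so passing None as the default changes nothing.
explicit = config.get('frequency', None)
implicit = config.get('frequency')
```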
Mark Winterbottom 0c19acba78 Changed double quotes to single quotes in docstrings. 2015-11-02 15:50:04 +00:00
Mark Winterbottom a6069d93db Fixed bug where the harvest source url validator would validate against
all harvest sources that were ever created instead of just sources that
were currently enabled.
2015-10-30 16:59:04 +00:00
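
A rough sketch of the validation rule after this fix, combined with the URL-plus-config uniqueness from the earlier validator commits. It is not the extension's validator: the function names and the `active`, `url` and `config` fields are assumptions made for illustration.

```python
import json


def _normalise(config):
    """Return a canonical string for a JSON config, or None if empty."""
    if not config:
        return None
    return json.dumps(json.loads(config), sort_keys=True)


def source_already_exists(new_url, new_config, sources):
    """Check a new source against currently active sources only.

    Two sources clash when both URL and (normalised) config match.
    """
    new_key = (new_url, _normalise(new_config))
    for source in sources:
        if not source['active']:
            continue  # sources that are no longer enabled are ignored
        if (source['url'], _normalise(source.get('config'))) == new_key:
            return True
    return False


sources = [
    {'url': 'http://a.example', 'config': None, 'active': False},
    {'url': 'http://b.example', 'config': '{"x": 1}', 'active': True},
]
```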
Mark Winterbottom 3f37ae5f45 Corrected docstring. 2015-10-30 16:11:25 +00:00
Mark Winterbottom 02b81187df Fixed bug with deleting harvest sources which have a custom
configuration. Added PEP-8 compliance.
2015-10-30 15:15:41 +00:00
Mark Winterbottom 55325f5940 Updated harvest source url validator to allow for duplicate URLs with
unique configs.
2015-10-30 11:59:24 +00:00
Mark Winterbottom 2c41293c9c Updated the validator to check for unique sets as well as URL. 2015-10-29 18:30:51 +00:00
Mark Winterbottom 39ce744368 Modified to make PEP-8 compliant. 2015-10-29 17:18:51 +00:00
David Read f1d2d5fdc4 [#111] Run jobs straight away. 2015-10-28 21:58:36 +00:00
David Read 421e6da660 Add run_test, job_abort, source commands
* run_test - for running a whole harvest on the command-line
* job_abort - for aborting a limbo job
* source - for showing a single harvest source
* allowing a source to be specified by name in several commands
2015-10-28 17:51:58 +00:00
David Read f70c16bce7 Add framework for testing harvesters. Modernize existing tests. 2015-10-21 16:26:57 +00:00
David Read 6360681a8f [#105] Fix order of deletes, as agreed with @florianm. 2015-10-12 15:57:27 +01:00
Florian Mayer a6cdda0a14 set max version to 2.4.99 2015-08-19 08:41:42 +00:00
florianm 1905caa961 upgrade harvest_source_clear to not delete from authz models removed in migration 078 2015-08-19 10:25:20 +08:00
Stefan Oderbolz ab76830e85 [#145] Throw + catch a custom exception if there are no jobs to run
If there are no harvesting jobs to run, there was always an ugly
exception message when using the paster command. This replaces the ugly
output with a proper message and uses a custom exception to allow others
to deal with this error differently.
2015-07-20 18:41:50 +02:00
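
The pattern described in this commit might look roughly like the sketch below. The exception name, the message text and both functions are hypothetical and do not reproduce the extension's code.

```python
class NoNewHarvestJobError(Exception):
    """Raised when there are no harvest jobs to run (hypothetical name)."""


def run_jobs(pending_jobs):
    """Run pending jobs, or raise if there is nothing to do."""
    if not pending_jobs:
        raise NoNewHarvestJobError('There are no new harvesting jobs')
    return len(pending_jobs)


def run_command(pending_jobs):
    """Command-line wrapper: turn the exception into a readable message."""
    try:
        count = run_jobs(pending_jobs)
    except NoNewHarvestJobError as e:
        return str(e)
    return 'Ran %d job(s)' % count
```

Because the error has its own class, other callers of `run_jobs` can catch it and react differently instead of printing a message.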
Stefan Oderbolz 4dc2f7367d [#139] Delete package relationships when clearing a harvest source 2015-06-26 17:20:23 +02:00
amercader f72d6da521 Change toolkit import
Apparently on package installs this is not well supported

from ckan.plugins.toolkit import check_ckan_version

But this works:

from ckan.plugins import toolkit

toolkit.check_ckan_version(...
2015-03-19 12:48:46 +00:00
clementmouchet ead9e67a33 updated def harvest_source_clear() to delete resource views, resource revisions & resources in CKAN >= 2.3 2015-02-23 17:02:21 +00:00
clementmouchet 82c7988bf3 Removed ResourceGroup from query when using CKAN 2.3 or above 2014-12-12 13:10:40 +00:00
amercader a3affc9702 Fix validators on harvest_source_show schema
Remove validators on several keys so they don't get stripped during the
show validation.
2014-10-08 12:02:26 +01:00
amercader 098b54f1e5 Merge branch 'clear-source-delete-related' of https://github.com/waldvogel/ckanext-harvest into waldvogel-clear-source-delete-related 2014-09-29 13:49:19 +01:00
amercader e60e2eee03 Fix output for harvest_source_create/update
They were using an incorrect schema, so not returning a harvest-source-like
dict.
2014-09-29 12:43:37 +01:00
waldvogel c9b4e10506 delete records from related and related_dataset when clearing source 2014-09-12 10:56:37 +02:00
Jari Voutilainen 1e0376cff6 fix typo 2014-09-10 10:33:13 +03:00
Jari Voutilainen f6c1456abe fix job reporting to have job finished timestamp when there were zero datasets to gather 2014-09-10 09:22:55 +03:00
amercader 13dbb1eea4 Fix variable not defined 2014-07-30 15:49:02 +01:00
amercader 58a873ac7a [#91] Remove config fields from source dict before indexing
We don't need them and will avoid indexing errors
2014-06-27 16:54:39 +01:00
amercader a59ab4b5ff [#91] Consolidate all harvest source reindex code in a single action
Make it available to users with permissions on the harvest source
2014-06-27 16:48:14 +01:00
amercader 7459358fa1 Support for single import commands
We are now able to run `paster harvester import` for a single harvest
object or for a single dataset, providing ids or name.
2014-05-15 16:30:30 +01:00
amercader 43f1d08255 [#97] Persistent endpoint for datasets harvest objects
Contrary to `/harvest/object/xxx`, this endpoint is passed the dataset
id, so it does not depend on a particular object but on the most recent one.
2014-04-30 17:45:07 +01:00
amercader 2b803a3f66 [#77] Use auth_allow_anonymous_access decorator
Starting from 2.2 you need to explicitly flag auth functions that
allow anonymous access with the p.toolkit.auth_allow_anonymous_access
decorator. A local version of the decorator is used to ensure we only
use it on CKAN>=2.2
2014-01-20 13:47:37 +00:00
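
A minimal, self-contained sketch of the "local decorator" idea from this commit. It does not import CKAN: the stand-in decorator simply sets an attribute on the function, and the attribute name, the version tuple argument and the `conditional_anonymous_access` helper are all assumptions made for illustration.

```python
def auth_allow_anonymous_access(auth_function):
    """Local stand-in: flag an auth function as open to anonymous users."""
    auth_function.auth_allow_anonymous_access = True
    return auth_function


def conditional_anonymous_access(ckan_version):
    """Return the flagging decorator on CKAN >= 2.2, otherwise a no-op."""
    if ckan_version >= (2, 2):
        return auth_allow_anonymous_access
    return lambda auth_function: auth_function


@conditional_anonymous_access((2, 2))
def source_show_new(context, data_dict):
    return {'success': True}


@conditional_anonymous_access((2, 1))
def source_show_old(context, data_dict):
    return {'success': True}
```

On the older version the function is returned untouched, so the same auth module can be loaded on both sides of the 2.2 boundary.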
amercader 4cc56f51ab [#76] Use harvest_source_show on reindex command 2014-01-14 17:04:34 +00:00
amercader 95d0ef0f01 [#76] Add extra fields to the source schema
Add 'private' and its core validators, and 'metadata_modified' and
'metadata_created'.

Also ignore '__extras'
2014-01-14 17:01:25 +00:00
Mikko Koho 51e842ee6e Add quotes to harvest_source_id in Solr query when clearing harvest sources 2013-11-22 11:01:26 +02:00
joetsoi da2fd45e80 [#65] make harvest_job_exists validator return model object
return the model in the validator instead of checking that it exists in
the validator, returning the id and then fetching it again in the action
function
2013-10-03 15:51:37 +01:00
joetsoi 9b3199b41b [#65] remove unused code 2013-09-17 17:02:38 +01:00
joetsoi 5da153c6b6 [#65] harvest_object_create action
update to use schema and validators. Also accept more parameters to
data_dict.
2013-09-17 16:49:19 +01:00
joetsoi 1b663bbff4 add harvest_object_create action 2013-09-04 14:17:01 +01:00
Vitor Baptista 70e53a7833 Fix bug where source was being treated as an object, when it's a dict 2013-07-29 07:06:58 -03:00
kindly a42991b8c9 fix so that non sysadmins can edit harvest sources of organizations they
are admins or editors of.
2013-06-27 12:16:11 +01:00
amercader 751409ab7d [#34] Integrate clear command with delete source
When deleting a source, if clear_source equals true in the context,
harvest_source_clear will be called. Default is false. The UI shows a
select with the two options.
2013-05-20 14:30:22 +01:00
amercader b9e2613458 [#34] Allow all authorized users for a source to clear it 2013-05-16 17:57:59 +01:00
amercader 7b652542e7 [#34] Fix harvest_source_clear action
Some typos in the SQL statements, and also the source needs to be
reindexed to update the status with the counts.
2013-05-16 17:33:39 +01:00
kindly 1714e55110 simplify harvest_clear queries so they do not lock on big db 2013-04-30 13:59:23 +01:00
kindly a2b8ab1994 make harvest source clear not create table 2013-04-30 12:40:46 +01:00
kindly dcfd201cdd [#32] redis queue support 2013-04-21 17:04:57 +01:00
kindly bd761498f0 make sure config dict is not jsonified if it contains an error 2013-04-08 18:52:36 +01:00
amercader 5414b6c08d Merge branch '29-new-idataset-form' into release-v2.0 2013-04-08 13:23:41 +01:00
amercader 99bd17401c Handle wrong JSON in harvest_source_extra_validator 2013-03-28 16:19:16 +00:00
kindly a9b8be8f01 harvest source index clear 2013-03-28 15:36:44 +00:00
amercader fbc8ecde97 [#29] Fix some imports on actions and plugin 2013-03-28 15:00:44 +00:00
kindly c754479014 #29 make new idatasets form work with harvest source form 2013-03-25 17:38:07 +00:00
kindly 845c9927a8 add harvest source clear 2013-03-25 11:39:00 +00:00
amercader c2a6bd14eb Add auth function for harvest_source_show_status 2013-03-18 16:48:27 +00:00
amercader c76b7d95f3 Only count public datasets on the source status
This is more in line with what is done on the orgs/groups pages
2013-03-18 16:41:01 +00:00
amercader 23d1d5742c [#18] Update delete harvest source functionality
The harvest_source_delete logic function proxies to package delete,
which will delete the harvest source dataset. The harvest plugin then
hooks to the after_delete extension point in order to inactivate the
actual HarvestSource object and abort any pending jobs.
Also added the Delete button to the harvest source form.
2013-03-12 13:14:07 +00:00
amercader 949bb6fe6a [#16] Add organization to source dict 2013-03-08 14:47:11 +00:00
amercader 2ee27164c3 [#13] Remove or deprecate unused code
Mostly in controllers, dictization and plugin, either related to the old
templates pre-dataset type or old authorization.
2013-03-06 16:54:33 +00:00
amercader 74633d0803 Fix error count in job stats
We want to take into account objects with errors that were created or
updated anyway (eg bbox errors), so we basically query for the number of
objects that have object errors.

Also add the number of gather errors to this count.
2013-03-06 13:44:04 +00:00
amercader bec31a611e Fix empty job finished date 2013-03-06 13:42:35 +00:00
amercader 04710fd1c6 Revert removal of filter in job list action in 7544d5c 2013-03-06 12:19:20 +00:00
John Martin 7544d5c5ef [#7] Removed faceted navigation for unneeded toggles in job reports 2013-03-05 15:23:42 +00:00
amercader 217d58d3a4 Merge branch 'source_extra_config_validation' of github.com:okfn/ckanext-harvest into source_extra_config_validation 2013-02-28 16:03:27 +00:00
amercader f28dc97f79 Fix bug in harvest job reports 2013-02-28 15:47:56 +00:00
kindly 871576f89c Merge remote-tracking branch 'remotes/origin/source_extra_config_validation' into source_extra_config_validation 2013-02-28 13:48:58 +00:00
kindly 9cef777e7b make sure config is also on top level 2013-02-28 13:46:16 +00:00
amercader e82410724a Merge branch '7-harvest-source-templates' into source_extra_config_validation 2013-02-28 12:18:09 +00:00
amercader f7cba69fe6 Merge branch '2.0-dataset-sources' into 7-harvest-source-templates 2013-02-28 12:17:47 +00:00
amercader a86d91c3f0 [#11] Make get actions side_effect_free 2013-02-28 12:17:15 +00:00
amercader fe6952ed00 Merge branch '7-harvest-source-templates' into source_extra_config_validation 2013-02-27 15:45:33 +00:00
kindly 5b50126670 source extras field type 2013-02-25 18:07:34 +00:00
amercader efe977512b Include gather errors on job summaries and reports 2013-02-25 17:17:08 +00:00
amercader 57b3739dd4 [#7] Return most recent job on source status, not just finished 2013-02-25 15:32:39 +00:00
amercader eaa8988440 [#4] Changes in schema to accommodate organizations
Basically handle the 'owner_org' field in form_to_db and db_to_form.
Added 'owner_org', 'frequency' (has default) and 'config' to surplus
keys in check_data_dict.
Also remove schema tweaks to let package_show call the appropriate schema
function.
2013-02-11 16:34:52 +00:00
amercader 3c50a40a76 [#5] Fix auth for harvest_job_list (should forward to harvest_source_update) 2013-02-05 16:41:29 +00:00
amercader e1ce0b7267 [#5] Allow not returning error summary on job dictize 2013-02-04 18:28:45 +00:00
amercader 8576ad6784 [#5] Add job listing page 2013-02-04 18:20:58 +00:00
amercader 42bace3628 [#5] Add new finished field for harvest job
When the run command flags a job as finished, it will query the most
recent harvest object for this job and use its import_finished value as
the job finishing time.
2013-01-28 17:19:28 +00:00
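
The rule for the new finished field can be sketched like this. Harvest objects are represented as plain dicts here, and the function name is hypothetical. Only `import_finished` comes from the commit message.

```python
from datetime import datetime


def job_finished_time(harvest_objects):
    """Return the latest import_finished value, or None if there is none."""
    finished = [obj['import_finished'] for obj in harvest_objects
                if obj.get('import_finished') is not None]
    return max(finished) if finished else None


objects = [
    {'id': 'a', 'import_finished': datetime(2013, 1, 28, 17, 0)},
    {'id': 'b', 'import_finished': datetime(2013, 1, 28, 17, 5)},
    {'id': 'c', 'import_finished': None},
]
finished_at = job_finished_time(objects)
```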
amercader c8e7086567 [#5] Change default auth for showing and listing jobs
Forward auth checks to harvest_source_update instead of
harvest_source_show, as job reports should only be visible to users that
can manage sources.
2013-01-28 16:31:11 +00:00
amercader ab78bf21b9 [#5] Fix typo in delete auth function 2013-01-28 16:15:38 +00:00
amercader 676c7d34b6 [#5] Add method for returning the original URL for a document
Harvesters implementing IHarvester can define a `get_original_url`
method that should return a URL pointing to the original location of a
document in the remote server. If present, this URL will be used on the
job reports.

Examples:
* For a CKAN record: http://{ckan-instance}/api/rest/{guid}
* For a WAF record: http://{waf-root}/{file-name}
* For a CSW record: http://{csw-server}/?Request=GetElementById&Id={guid}&...
2013-01-24 18:35:43 +00:00
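
A hedged sketch of what such a method could look like for the CKAN example above. The class, its constructor and the in-memory `guids` lookup are invented for illustration, and the method signature (taking a harvest object id) is an assumption. A real harvester would look the object up in the database.

```python
class ExampleCKANHarvester(object):
    """Illustrative stand-in, not a real IHarvester implementation."""

    def __init__(self, base_url, guids):
        self.base_url = base_url.rstrip('/')
        # Hypothetical mapping of harvest object id -> remote guid.
        self.guids = guids

    def get_original_url(self, harvest_object_id):
        """Return the remote URL of the document, or None if unknown."""
        guid = self.guids.get(harvest_object_id)
        if guid is None:
            return None
        return '%s/api/rest/%s' % (self.base_url, guid)


harvester = ExampleCKANHarvester('http://ckan.example.org/', {'obj-1': 'abc'})
url = harvester.get_original_url('obj-1')
```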
amercader daa9a385ff Update job keys changed on 9ba6e8f 2013-01-24 17:36:58 +00:00
amercader 30d58b2b7b [#5] Preliminary job report logic function and page (WIP) 2013-01-23 18:04:19 +00:00
amercader b2b89dfd61 Add command to reindex all harvest sources 2013-01-22 16:43:36 +00:00
amercader 9ba6e8f3b3 [#5] Add error summary to harvest_job_dictize
It will return the counts for the 20 most common errors for that
particular job. These will be available when calling harvest_job_show.

Also refactor the harvest source status object to just call
harvest_job_dictize on the 'last_job' key, as it has all the
interesting fields anyway.
2013-01-22 13:13:24 +00:00
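
The "20 most common errors" summary can be illustrated with the standard library. This sketch counts plain message strings. The function name and input shape are assumptions, and the real code presumably works on database rows instead.

```python
from collections import Counter


def error_summary(messages, limit=20):
    """Return (message, count) pairs for the most common error messages."""
    return Counter(messages).most_common(limit)


summary = error_summary([
    'Invalid bbox',
    'Missing title',
    'Invalid bbox',
])
```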
amercader 30c9eedf5f Improve harvest source status creation
Use report_status field to improve speed, remove unnecessary fields.
2013-01-17 15:43:45 +00:00
amercader 2ab10afcf9 [#4] Fix typo in auth functions 2013-01-16 12:56:58 +00:00
amercader 2bb669af21 [#4] Add owner_org field to schema and form
This should store the owner organization id.

Also added the errors box on the form.
2013-01-10 12:23:01 +00:00
amercader e49dd94b34 [#4] Remove authorization functions for the publisher profile
The different profiles will now be configured via the harvest source
datasets on CKAN core, so they are no longer needed.
2013-01-09 17:35:47 +00:00
amercader 058dcad435 [#4] Minor change on the state field to fix a bug on harvest_source_show 2013-01-09 17:31:30 +00:00