Commit Graph

4093 Commits

Author SHA1 Message Date
Miriam Baglioni 009730b3d1 added properties file in the folder for the workflow of orcid propagation. Changed the path in the classes implementing the propagation. Changed the path to the parameter file in the class for entitytoorganization propagation 2023-12-22 11:42:09 +01:00
Miriam Baglioni 89f269c7f4 changed the path to the parameter file in the class for entitytoorganization propagation 2023-12-22 11:37:50 +01:00
Miriam Baglioni b06aea0adf adding the bulkTag parameter file in the folder for the oozie workflow for bulkTagging. Changes the path in the class 2023-12-22 11:35:37 +01:00
Miriam Baglioni 3afd4aa57b adjustments for country propagation 2023-12-22 11:27:30 +01:00
dimitrispie ffdd03d2f4 Monitor Irish Stats WF
Parameters (with examples):
stats_db_name=openaire_beta_stats_20231208
monitor_irish_db_name=openaire_beta_stats_monitor_ie_20231208b
monitor_irish_db_prod_name=openaire_beta_stats_monitor_ie
graph_db_name=openaire_beta_20231208
monitor_irish_db_shadow_name=openaire_beta_stats_monitor_ie_shadow
hive_timeout=150000
hadoop_user_name=dnet.beta
resumeFrom=Step1-buildIrishMonitorDB
2023-12-22 11:05:24 +02:00
dimitrispie 40b98d8182 Changes to indicators and funders definition
- Changes result_refereed definition
- Added result_country indicator
- Added indi_pub_green_with_license indicator
- Added country from jurisdiction to funders
2023-12-22 10:29:20 +02:00
Claudio Atzori 62104790ae added metaresourcetype to the result hive DB view 2023-12-21 12:27:10 +01:00
Miriam Baglioni 5011c4d11a refactoring after compilation 2023-12-20 15:57:26 +01:00
Miriam Baglioni 4740c808f7 - 2023-12-20 14:26:54 +01:00
Miriam Baglioni d410ea8a41 added needed parameter 2023-12-19 12:15:01 +01:00
Sandro La Bruzzo 15fd93a2b6 uploaded input parameters on CreateBaseline WF 2023-12-18 12:21:55 +01:00
Sandro La Bruzzo 9d342a47da updated the transformation Baseline workflow to include mdstore rollback/commit action 2023-12-18 11:48:57 +01:00
Miriam Baglioni 3eca5d2e1c - 2023-12-18 09:55:27 +01:00
Miriam Baglioni 01ce0b9c76 [doiboost - preprocess] remove transition to orcid preparation from sequence of steps at the beginning of the workflow 2023-12-15 12:24:55 +01:00
Miriam Baglioni 0d8e496a63 - 2023-12-15 12:16:43 +01:00
Claudio Atzori ff924215b8 [graph provision] added tests for new peerreviewed field 2023-12-12 11:21:30 +01:00
Claudio Atzori 7e8eff40c1 [graph provision] added tests for the new model fields 2023-12-12 08:54:15 +01:00
Miriam Baglioni 8752d275fa removed not needed parameter 2023-12-09 15:24:45 +01:00
Miriam Baglioni d4eedada71 adjusting workflow definition 2023-12-09 15:20:11 +01:00
Claudio Atzori cb71a7936b [graph cleaning] avoid stack overflow error when navigating Oaf objects declaring an Enum 2023-12-07 23:09:54 +01:00
Claudio Atzori 70eb1796b2 logging typo 2023-12-07 14:08:04 +01:00
Claudio Atzori c381bacee0 [enrichment] passing the community API base URL 2023-12-07 14:07:11 +01:00
Miriam Baglioni 336fb31d87 [community_result_propagation] adjusting starting point of workflow 2023-12-07 10:27:25 +01:00
Miriam Baglioni c0cde53bf6 [bulktagging] setting first step of bulktagging as the copy of the entities and relations not involved in the tagging 2023-12-07 10:08:35 +01:00
Miriam Baglioni 616622d2bb first version of the workflow single step 2023-12-07 09:59:52 +01:00
Claudio Atzori 259c69e446 [orcid enrichment] fixed workflow definition 2023-12-06 19:41:53 +01:00
Claudio Atzori 431c6bb08a [dedup] added isLookupUrl to the graph consistency workflow definition, required now by the entity grouping phase 2023-12-06 11:06:46 +01:00
Claudio Atzori 321922772b added serialization for the new fields imported for the Irish tender 2023-12-05 16:37:04 +01:00
Claudio Atzori c5b7253130 [community_organization propagation] fixed workflow parameters 2023-12-05 09:13:33 +01:00
Claudio Atzori 3c3bdb8318 [bulktagging] fixed workflow parameters 2023-12-05 09:08:48 +01:00
Claudio Atzori 2a233a89aa [graph grouping] added isLookupUrl to the workflow definition, passed to the grouping spark action 2023-12-03 13:32:52 +01:00
Claudio Atzori 178a14c491 code formatting 2023-12-03 13:31:58 +01:00
Sandro La Bruzzo 3caf6ff27e Extracted the correct original type to pass to instanceTypeMapping in Crossref Mapping 2023-12-01 16:33:56 +01:00
Claudio Atzori 511a98dd80 fixed doiboost process workflow, removed references to the ProcessORCID step 2023-12-01 16:21:53 +01:00
Claudio Atzori 09d061e90b Merge branch 'beta' into orcid_import 2023-12-01 15:05:35 +01:00
Claudio Atzori 93a700742a Merge pull request 'Changes for tables and creation of the new indicator indi_is_result_accessible' (#363) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #363
2023-12-01 15:05:23 +01:00
Claudio Atzori 0c3c9ea43d Merge pull request 'StatsDB workflow to export actionsets about OA routes, diamond, and publicly-funded' (#355) from dimitris.pierrakos/dnet-hadoop:beta into beta
Reviewed-on: #355
2023-12-01 15:03:56 +01:00
Claudio Atzori 33cb483c75 using objectSubType as originalType in Crossref2Oaf, code formatting 2023-12-01 15:03:05 +01:00
dimitrispie c9d995dde0 New institutions added 2023-12-01 15:44:35 +02:00
dimitrispie a397112cb8 Add new indicator
Add indi_pub_publicly_funded
2023-12-01 15:00:18 +02:00
dimitrispie 76594ded23 Changes to indicators
Fixes on open access colours indicators
- indi_pub_green_oa
- indi_pub_gold_oa
- indi_pub_hybrid
- indi_pub_bronze_oa
- indi_pub_diamond
2023-12-01 13:38:19 +02:00
Claudio Atzori 622fafbd2e Merge branch 'beta' into orcid_import 2023-12-01 12:28:14 +01:00
Sandro La Bruzzo bf0fd27c36 Removed unused function
Applied PR Comment of Giambattista in the PR
2023-12-01 12:16:42 +01:00
dimitrispie 48430a32a6 Update StatsAtomicActionsJob.java
Added indi_funded_result_with_fundref indicator
2023-12-01 11:35:01 +02:00
Sandro La Bruzzo cdfb7588dd code formatting 2023-11-30 15:31:42 +01:00
Sandro La Bruzzo 5e22b67b8a Merge remote-tracking branch 'origin/beta' into orcid_import 2023-11-30 15:27:46 +01:00
Sandro La Bruzzo f718caaac9 Added copy of the untouched entities of the graph 2023-11-30 14:51:00 +01:00
Sandro La Bruzzo 7b5e04f37e removed Orcid intersection on DOIBoost 2023-11-30 14:36:50 +01:00
Claudio Atzori 6f10791e77 Merge branch 'beta' into propagationapi 2023-11-30 14:20:18 +01:00
Claudio Atzori 4e1aac2e2f resolved conflict in pom.xml before applying the changes from [COAR based resource types & Irish tender] #350 2023-11-29 14:37:52 +01:00
Sandro La Bruzzo 86b5775e08 added vocabulary in instanceTypeMapping for
- DOIBoost
- Datacite
- PubMed
- Scholexplorer Datasource
2023-11-29 13:15:43 +01:00
Sandro La Bruzzo c96ff54b45 Merge remote-tracking branch 'origin/resource_types' into resource_types 2023-11-29 12:45:41 +01:00
Sandro La Bruzzo af1c2634b3 added instanceTypeMapping original field in the mapping of
- DOIBoost
- Datacite
- PubMed
- Scholexplorer Datasource
2023-11-29 12:45:30 +01:00
Sandro La Bruzzo 279100fa52 added test 2023-11-29 11:17:58 +01:00
Sandro La Bruzzo 59111713fa added comment 2023-11-28 09:00:48 +01:00
Sandro La Bruzzo 6f4d0c05ea Implemented Author Merger for ORCID that takes into account the case when name and surname are swapped 2023-11-28 08:43:56 +01:00
Miriam Baglioni 8eb70e6657 refactoring 2023-11-27 15:13:15 +01:00
Miriam Baglioni e3cce9a5a0 merging with branch beta 2023-11-27 15:10:55 +01:00
Miriam Baglioni 48e0427a23 changed the parameter from production to baseURL. Fixed issue in tagging configuration 2023-11-27 15:10:27 +01:00
Sandro La Bruzzo 34a4b3cbdf Implemented ORCID Enrichment 2023-11-24 12:39:58 +01:00
dimitrispie 359e81b7a6 Update StatsAtomicActionsJob.java
Bug fix for duplicate bronze checks
2023-11-23 10:48:55 +02:00
Claudio Atzori 2c77638bf5 Merge branch 'beta' into cleaning_8898 2023-11-22 14:00:10 +01:00
Claudio Atzori 745039ad5b Merge branch 'beta' into 9117_pubmed_affiliations 2023-11-22 13:52:53 +01:00
Claudio Atzori 11a1207f9c [graph cleaning] applying coar based vocabularies in bulk 2023-11-22 12:22:14 +01:00
dimitrispie a94a54a2d0 Changes for tables and creation of the new indicator indi_is_result_accessible
- Drop table statements for all tables to avoid duplicates in case of wf rerun
- Add pdfsaggregated step to create the indi_is_result_accessible table. This step is executed on the new impala cluster only, since the pdfaggregation_i is updated on this cluster.
2023-11-15 14:32:18 +02:00
Miriam Baglioni eaf0a702de - 2023-11-14 14:53:34 +01:00
Sandro La Bruzzo 6ce36b3e41 Implemented ORCID Workflow on DHP-Aggregation for retrieving ORCID DUMP and generating tables 2023-11-14 12:04:29 +01:00
dimitrispie d524e30866 Changes to actionsets
Resolve comments from
#355
2023-11-14 09:46:52 +02:00
Miriam Baglioni 5bc97615d5 - 2023-11-03 15:35:10 +01:00
Miriam Baglioni 7b1e34f159 refactoring 2023-11-03 15:30:01 +01:00
Miriam Baglioni 638ad9e74f changing test for new implementation 2023-11-03 15:06:50 +01:00
Miriam Baglioni edcb17ca98 refactoring and test 2023-11-03 13:01:14 +01:00
Miriam Baglioni 937ff6a7c7 - 2023-10-31 15:56:08 +01:00
Miriam Baglioni a737dd47b6 removed not needed test class 2023-10-31 15:54:49 +01:00
Miriam Baglioni c80b768af0 test for project propagation 2023-10-31 15:49:42 +01:00
Miriam Baglioni e9a20fc8f6 merging with branch beta 2023-10-31 14:36:03 +01:00
Claudio Atzori 262d7c581b [graph cleaning] implemented further suggestions from https://support.openaire.eu/issues/8898 2023-10-31 14:34:10 +01:00
Serafeim Chatzopoulos 2090003ea9 Adjust tests to new WF input params 2023-10-26 13:47:06 -07:00
Serafeim Chatzopoulos a82aaf57b2 Renaming input param for crossref input path 2023-10-25 12:05:02 -07:00
Claudio Atzori b3a61ea955 Merge branch 'beta' into url_validation 2023-10-25 14:22:56 +02:00
dimitrispie 89c4dfbaf4 StatsDB workflow to export actionsets about OA routes, diamond, and publicly-funded
A new oozie workflow capable of reading from the stats db to produce a new actionSet for updating results with:
- green_oa ={true, false}
- openAccesColor = {gold, hybrid, bronze}
- in_diamond_journal={true, false}
- publicly_funded={true, false}

Inputs:

- outputPath
- statsDB
2023-10-24 09:48:23 +03:00
Claudio Atzori 7fc621cdec added defaults to the graph resolution workflow config-default.xml 2023-10-20 22:28:12 +02:00
Serafeim Chatzopoulos aad5982bf1 Change the description of the workflow 2023-10-20 12:48:21 +03:00
Miriam Baglioni a4214ced1e fixing issue on propagation organization. added --config to workflow definition. added oozie_app to community project 2023-10-20 10:14:20 +02:00
Serafeim Chatzopoulos 6b19dcee80 Add actionset creation for pubmed affiliations 2023-10-19 19:58:25 +03:00
Claudio Atzori 2b9d0416ec [graph raw] URL Validator to accept double slashes 2023-10-19 16:26:37 +02:00
Claudio Atzori b0fed1725e avoid NPEs 2023-10-19 12:13:45 +02:00
Miriam Baglioni f1b898c6b4 merging with branch beta 2023-10-19 09:04:35 +02:00
Claudio Atzori 6dfcd0c9a2 [raw graph] mapping original resource types 2023-10-16 12:57:18 +02:00
Claudio Atzori 39d24d5469 Merge branch 'beta' into resource_types 2023-10-16 11:56:38 +02:00
Sandro La Bruzzo a5a89a702f new spark parameter updated 2023-10-16 11:46:12 +02:00
Miriam Baglioni 159388f9c2 testing and fix some issues 2023-10-16 11:26:07 +02:00
Claudio Atzori 03670bb9ce [dedup] use common saveParquet and save methods to ensure outputs are compressed 2023-10-16 10:55:47 +02:00
Claudio Atzori 54fbf09ac6 [raw graph] WIP: mapping original resource types 2023-10-16 08:57:47 +02:00
Claudio Atzori 6cf64d5d8b [SWH] renamed 'Software Heritage Identifier' to 'Software Hash Identifier' 2023-10-13 10:09:26 +02:00
Claudio Atzori 76447958bb cleanup & docs 2023-10-12 12:23:20 +02:00
Claudio Atzori dda602fff7 [AMF] docs 2023-10-12 10:05:46 +02:00
Miriam Baglioni 8e9493fad9 merging with branch beta 2023-10-11 18:18:09 +02:00
Miriam Baglioni 89184d5b4f used the API instead of the IS for bulktagging and propagation for community through organization. Added a new propagation step for communities through projects. Still using the API and not the IS 2023-10-11 18:17:35 +02:00
Claudio Atzori 554551682d [raw graph] adopting the new COAR based vocabularies for the resource typing 2023-10-11 16:09:19 +02:00