Miriam Baglioni
9f966b59d4
added properties file in the folder for the workflow of result to community from semrel propagation. Changes the path in the classes implementing the propagation
2023-12-22 14:11:47 +01:00
Miriam Baglioni
2f3b5a133d
added properties file in the folder for the workflow of result to community from organization propagation. Changes the path in the classes implementing the propagation
2023-12-22 13:56:40 +01:00
Miriam Baglioni
2f7b9ad815
added properties file in the folder for the workflow of project to result propagation. Changes the path in the classes implementing the propagation
2023-12-22 11:46:15 +01:00
Miriam Baglioni
f2352e8a78
changed in the classes the path for the property files for the propagation of community from project
2023-12-22 11:43:34 +01:00
Miriam Baglioni
009730b3d1
added properties file in the folder for the workflow of orcid propagation. Changes the path in the classes implementing the propagation. Changed the path to the parameter file in the class for entitytoorganization propagation
2023-12-22 11:42:09 +01:00
Miriam Baglioni
89f269c7f4
changed the path to the parameter file in the class for entitytoorganization propagation
2023-12-22 11:37:50 +01:00
Miriam Baglioni
b06aea0adf
adding the bulkTag parameter file in the folder for the oozie workflow for bulkTagging. Changes the path in the class
2023-12-22 11:35:37 +01:00
Miriam Baglioni
3afd4aa57b
adjustments for country propagation
2023-12-22 11:27:30 +01:00
dimitrispie
ffdd03d2f4
Monitor Irish Stats WF
...
Parameters (with examples):
stats_db_name=openaire_beta_stats_20231208
monitor_irish_db_name=openaire_beta_stats_monitor_ie_20231208b
monitor_irish_db_prod_name=openaire_beta_stats_monitor_ie
graph_db_name=openaire_beta_20231208
monitor_irish_db_shadow_name=openaire_beta_stats_monitor_ie_shadow
hive_timeout=150000
hadoop_user_name=dnet.beta
resumeFrom=Step1-buildIrishMonitorDB
2023-12-22 11:05:24 +02:00
dimitrispie
40b98d8182
Changes to indicators and funders definition
...
- Changes result_refereed definition
- Added result_country indicator
- Added indi_pub_green_with_license indicator
- Added country from jurisdiction to funders
2023-12-22 10:29:20 +02:00
Claudio Atzori
62104790ae
added metaresourcetype to the result hive DB view
2023-12-21 12:27:10 +01:00
Miriam Baglioni
5011c4d11a
refactoring after compilation
2023-12-20 15:57:26 +01:00
Miriam Baglioni
4740c808f7
-
2023-12-20 14:26:54 +01:00
Miriam Baglioni
d410ea8a41
added needed parameter
2023-12-19 12:15:01 +01:00
Miriam Baglioni
624f5f3f21
[Transformative Agreement] added check to verify the APC were paid by the IReL funder
2023-12-18 15:28:19 +01:00
Miriam Baglioni
354e02e6a9
[Transformative Agreement] removed not needed class. Read directly the json and no need to pass from the csv
2023-12-18 15:20:27 +01:00
Miriam Baglioni
b00771c7cc
[Transformative Agreement] added code to extract relations from the transformative agreement file for the IE products got from OpenAPC
2023-12-18 15:12:44 +01:00
Sandro La Bruzzo
15fd93a2b6
uploaded input parameters on CreateBaseline WF
2023-12-18 12:21:55 +01:00
Sandro La Bruzzo
9d342a47da
updated the transformation Baseline workflow to include mdstore rollback/commit action
2023-12-18 11:48:57 +01:00
Miriam Baglioni
3eca5d2e1c
-
2023-12-18 09:55:27 +01:00
Miriam Baglioni
01ce0b9c76
[doiboost - preprocess] remove transition to orcid preparation from sequence of steps at the beginning of the workflow
2023-12-15 12:24:55 +01:00
Miriam Baglioni
0d8e496a63
-
2023-12-15 12:16:43 +01:00
Claudio Atzori
ff924215b8
[graph provision] added tests for new peerreviewed field
2023-12-12 11:21:30 +01:00
Claudio Atzori
7e8eff40c1
[graph provision] added tests for the new model fields
2023-12-12 08:54:15 +01:00
Miriam Baglioni
8752d275fa
removed not needed parameter
2023-12-09 15:24:45 +01:00
Miriam Baglioni
d4eedada71
adjusting workflow definition
2023-12-09 15:20:11 +01:00
Claudio Atzori
cb71a7936b
[graph cleaning] avoid stack overflow error when navigating Oaf objects declaring an Enum
2023-12-07 23:09:54 +01:00
Claudio Atzori
70eb1796b2
logging typo
2023-12-07 14:08:04 +01:00
Claudio Atzori
c381bacee0
[enrichment] passing the community API base URL
2023-12-07 14:07:11 +01:00
Miriam Baglioni
336fb31d87
[community_result_propagation] adjusting starting point of workflow
2023-12-07 10:27:25 +01:00
Miriam Baglioni
c0cde53bf6
[bulktagging] setting first step of bulktagging as the copy of the entities and relations not involved in the tagging
2023-12-07 10:08:35 +01:00
Miriam Baglioni
616622d2bb
first version of the workflow single step
2023-12-07 09:59:52 +01:00
Claudio Atzori
259c69e446
[orcid enrichment] fixed workflow definition
2023-12-06 19:41:53 +01:00
Claudio Atzori
431c6bb08a
[dedup] added isLookupUrl to the graph consistency workflow definition, required now by the entity grouping phase
2023-12-06 11:06:46 +01:00
Claudio Atzori
321922772b
added serialization for the new fields imported for the Irish tender
2023-12-05 16:37:04 +01:00
Claudio Atzori
c5b7253130
[community_organization propagation] fixed workflow parameters
2023-12-05 09:13:33 +01:00
Claudio Atzori
3c3bdb8318
[bulktagging] fixed workflow parameters
2023-12-05 09:08:48 +01:00
Claudio Atzori
2a233a89aa
[graph grouping] added isLookupUrl to the workflow definition, passed to the grouping spark action
2023-12-03 13:32:52 +01:00
Claudio Atzori
178a14c491
code formatting
2023-12-03 13:31:58 +01:00
Sandro La Bruzzo
3caf6ff27e
Extracted the correct original type to pass to instanceTypeMapping in Crossref Mapping
2023-12-01 16:33:56 +01:00
Claudio Atzori
511a98dd80
fixed doiboost process workflow, removed references to the ProcessORCID step
2023-12-01 16:21:53 +01:00
Claudio Atzori
09d061e90b
Merge branch 'beta' into orcid_import
2023-12-01 15:05:35 +01:00
Claudio Atzori
93a700742a
Merge pull request 'Changes for tables and creation of the new indicator indi_is_result_accessible' ( #363 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#363
2023-12-01 15:05:23 +01:00
Claudio Atzori
0c3c9ea43d
Merge pull request 'StatsDB workflow to export actionsets about OA routes, diamond, and publicly-funded' ( #355 ) from dimitris.pierrakos/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#355
2023-12-01 15:03:56 +01:00
Claudio Atzori
33cb483c75
using objectSubType as originalType in Crossref2Oaf, code formatting
2023-12-01 15:03:05 +01:00
dimitrispie
c9d995dde0
New institutions added
2023-12-01 15:44:35 +02:00
dimitrispie
a397112cb8
Add new indicator
...
Add indi_pub_publicly_funded
2023-12-01 15:00:18 +02:00
dimitrispie
76594ded23
Changes to indicators
...
Fixes on open access colours indicators
- indi_pub_green_oa
- indi_pub_gold_oa
- indi_pub_hybrid
- indi_pub_bronze_oa
- indi_pub_diamond
2023-12-01 13:38:19 +02:00
Claudio Atzori
622fafbd2e
Merge branch 'beta' into orcid_import
2023-12-01 12:28:14 +01:00
Sandro La Bruzzo
bf0fd27c36
Removed unused function
...
Applied PR Comment of Giambattista in the PR
2023-12-01 12:16:42 +01:00
dimitrispie
48430a32a6
Update StatsAtomicActionsJob.java
...
Added indi_funded_result_with_fundref indicator
2023-12-01 11:35:01 +02:00
Sandro La Bruzzo
cdfb7588dd
code formatting
2023-11-30 15:31:42 +01:00
Sandro La Bruzzo
5e22b67b8a
Merge remote-tracking branch 'origin/beta' into orcid_import
2023-11-30 15:27:46 +01:00
Sandro La Bruzzo
f718caaac9
Added copy of the untouched entities of the graph
2023-11-30 14:51:00 +01:00
Sandro La Bruzzo
7b5e04f37e
removed Orcid intersection on DOIBoost
2023-11-30 14:36:50 +01:00
Claudio Atzori
6f10791e77
Merge branch 'beta' into propagationapi
2023-11-30 14:20:18 +01:00
Claudio Atzori
4e1aac2e2f
resolved conflict in pom.xml before applying the changes from [COAR based resource types & Irish tender] #350
2023-11-29 14:37:52 +01:00
Sandro La Bruzzo
86b5775e08
added vocabulary in instanceTypeMapping for
...
- DOIBoost
- Datacite
- PubMed
- Scholexplorer Datasource
2023-11-29 13:15:43 +01:00
Sandro La Bruzzo
c96ff54b45
Merge remote-tracking branch 'origin/resource_types' into resource_types
2023-11-29 12:45:41 +01:00
Sandro La Bruzzo
af1c2634b3
added instanceTypeMapping original field in the mapping of
...
- DOIBoost
- Datacite
- PubMed
- Scholexplorer Datasource
2023-11-29 12:45:30 +01:00
Sandro La Bruzzo
279100fa52
added test
2023-11-29 11:17:58 +01:00
Sandro La Bruzzo
59111713fa
added comment
2023-11-28 09:00:48 +01:00
Sandro La Bruzzo
6f4d0c05ea
Implemented Author Merger for ORCID that takes into account the case when name and surname are swapped
2023-11-28 08:43:56 +01:00
Miriam Baglioni
8eb70e6657
refactoring
2023-11-27 15:13:15 +01:00
Miriam Baglioni
e3cce9a5a0
merging with branch beta
2023-11-27 15:10:55 +01:00
Miriam Baglioni
48e0427a23
changed the parameter from production to baseURL. Fixed issue in tagging configuration
2023-11-27 15:10:27 +01:00
Sandro La Bruzzo
34a4b3cbdf
Implemented ORCID Enrichment
2023-11-24 12:39:58 +01:00
dimitrispie
359e81b7a6
Update StatsAtomicActionsJob.java
...
Bug fix for duplicate bronze checks
2023-11-23 10:48:55 +02:00
Claudio Atzori
2c77638bf5
Merge branch 'beta' into cleaning_8898
2023-11-22 14:00:10 +01:00
Claudio Atzori
745039ad5b
Merge branch 'beta' into 9117_pubmed_affiliations
2023-11-22 13:52:53 +01:00
Claudio Atzori
11a1207f9c
[graph cleaning] applying coar based vocabularies in bulk
2023-11-22 12:22:14 +01:00
dimitrispie
a94a54a2d0
Changes for tables and creation of the new indicator indi_is_result_accessible
...
- Drop table statements for all tables to avoid duplicates in case of wf rerun
- Add pdfsaggregated step to create the indi_is_result_accessible table. This step is executed on the new impala cluster only, since the pdfaggregation_i is updated on this cluster.
2023-11-15 14:32:18 +02:00
Miriam Baglioni
eaf0a702de
-
2023-11-14 14:53:34 +01:00
Sandro La Bruzzo
6ce36b3e41
Implemented ORCID Workflow on DHP-Aggregation for retrieving ORCID DUMP and generating tables
2023-11-14 12:04:29 +01:00
dimitrispie
d524e30866
Changes to actionsets
...
Resolve comments from
D-Net/dnet-hadoop#355
2023-11-14 09:46:52 +02:00
Miriam Baglioni
5bc97615d5
-
2023-11-03 15:35:10 +01:00
Miriam Baglioni
7b1e34f159
refactoring
2023-11-03 15:30:01 +01:00
Miriam Baglioni
638ad9e74f
changing test for new implementation
2023-11-03 15:06:50 +01:00
Miriam Baglioni
edcb17ca98
refactoring and test
2023-11-03 13:01:14 +01:00
Miriam Baglioni
937ff6a7c7
-
2023-10-31 15:56:08 +01:00
Miriam Baglioni
a737dd47b6
removed not needed test class
2023-10-31 15:54:49 +01:00
Miriam Baglioni
c80b768af0
test for project propagation
2023-10-31 15:49:42 +01:00
Miriam Baglioni
e9a20fc8f6
merging with branch beta
2023-10-31 14:36:03 +01:00
Claudio Atzori
262d7c581b
[graph cleaning] implemented further suggestions from https://support.openaire.eu/issues/8898
2023-10-31 14:34:10 +01:00
Serafeim Chatzopoulos
2090003ea9
Adjust tests to new WF input params
2023-10-26 13:47:06 -07:00
Serafeim Chatzopoulos
a82aaf57b2
Renaming input param for crossref input path
2023-10-25 12:05:02 -07:00
Claudio Atzori
b3a61ea955
Merge branch 'beta' into url_validation
2023-10-25 14:22:56 +02:00
dimitrispie
89c4dfbaf4
StatsDB workflow to export actionsets about OA routes, diamond, and publicly-funded
...
A new oozie workflow capable of reading from the stats db to produce a new actionSet for updating results with:
- green_oa ={true, false}
- openAccesColor = {gold, hybrid, bronze}
- in_diamond_journal={true, false}
- publicly_funded={true, false}
Inputs:
- outputPath
- statsDB
2023-10-24 09:48:23 +03:00
Claudio Atzori
7fc621cdec
added defaults to the graph resolution workflow config-default.xml
2023-10-20 22:28:12 +02:00
Serafeim Chatzopoulos
aad5982bf1
Change the description of the workflow
2023-10-20 12:48:21 +03:00
Miriam Baglioni
a4214ced1e
fixing issue on propagation organization. added --config to workflow definition. added oozie_app to community project
2023-10-20 10:14:20 +02:00
Serafeim Chatzopoulos
6b19dcee80
Add actionset creation for pubmed affiliations
2023-10-19 19:58:25 +03:00
Claudio Atzori
2b9d0416ec
[graph raw] URL Validator to accept double slashes
2023-10-19 16:26:37 +02:00
Claudio Atzori
b0fed1725e
avoid NPEs
2023-10-19 12:13:45 +02:00
Miriam Baglioni
f1b898c6b4
merging with branch beta
2023-10-19 09:04:35 +02:00
Claudio Atzori
6dfcd0c9a2
[raw graph] mapping original resource types
2023-10-16 12:57:18 +02:00
Claudio Atzori
39d24d5469
Merge branch 'beta' into resource_types
2023-10-16 11:56:38 +02:00
Sandro La Bruzzo
a5a89a702f
new spark parameter updated
2023-10-16 11:46:12 +02:00
Miriam Baglioni
159388f9c2
testing and fix some issues
2023-10-16 11:26:07 +02:00
Claudio Atzori
03670bb9ce
[dedup] use common saveParquet and save methods to ensure outputs are compressed
2023-10-16 10:55:47 +02:00
Claudio Atzori
54fbf09ac6
[raw graph] WIP: mapping original resource types
2023-10-16 08:57:47 +02:00
Claudio Atzori
6cf64d5d8b
[SWH] renamed 'Software Heritage Identifier' to 'Software Hash Identifier'
2023-10-13 10:09:26 +02:00
Claudio Atzori
76447958bb
cleanup & docs
2023-10-12 12:23:20 +02:00
Claudio Atzori
dda602fff7
[AMF] docs
2023-10-12 10:05:46 +02:00
Miriam Baglioni
8e9493fad9
merging with branch beta
2023-10-11 18:18:09 +02:00
Miriam Baglioni
89184d5b4f
used the API instead of the IS for bulktagging and propagation for community through organization. Added a new propagation step for communities through projects. Still using the API and not the IS
2023-10-11 18:17:35 +02:00
Claudio Atzori
554551682d
[raw graph] adopting the new COAR based vocabularies for the resource typing
2023-10-11 16:09:19 +02:00
Claudio Atzori
a460ebe215
[UnresolvedEntities] updated action name
2023-10-10 15:50:11 +02:00
Claudio Atzori
66064e99fe
Merge branch 'beta' into fos
2023-10-10 15:07:21 +02:00
Miriam Baglioni
a431b04814
leftover for the properties and removal of bipfinder
2023-10-10 12:53:57 +02:00
Claudio Atzori
ed9282ef2a
removed module dhp-stats-monitor-update
2023-10-10 09:52:03 +02:00
Miriam Baglioni
110ce4b40f
extend the fos model to include the level4 and the scores for level3 and level4. removed bip indicators from the instance
2023-10-10 09:46:40 +02:00
Claudio Atzori
204404b0e3
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2023-10-10 09:36:13 +02:00
Claudio Atzori
9a98f408b3
code formatting
2023-10-10 09:36:11 +02:00
Claudio Atzori
4e6fccf4f6
Merge pull request 'Beta stats wf updated' ( #332 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#332
2023-10-10 09:35:32 +02:00
Miriam Baglioni
a3d01ccb24
refactoring
2023-10-09 14:52:17 +02:00
Miriam Baglioni
8448b9ebfb
merging with branch beta
2023-10-09 14:27:23 +02:00
Miriam Baglioni
3d6be20989
changes to use the API instead of the IS to get the information for the communities to be used during bulktagging and context propagation
2023-10-09 14:26:33 +02:00
dimitrispie
17586f0ff8
Update step20-createMonitorDB.sql
...
Add result_orcid table to monitor dbs
2023-10-09 14:21:31 +03:00
dimitrispie
489a082f04
Update step16-createIndicatorsTables.sql
...
Change scripts for gold, hybrid, bronze indicators
2023-10-09 14:00:50 +03:00
Claudio Atzori
ef833840c3
[Doiboost] removed linkage to SFI unidentified project
2023-10-06 15:48:18 +02:00
Claudio Atzori
84a58802ab
[OC] using the common pid cleaning function
2023-10-06 14:48:05 +02:00
Claudio Atzori
46034630cf
[OC] compress the output actionset
2023-10-06 14:42:02 +02:00
Claudio Atzori
3bc44fbf1d
Merge branch 'beta' into irish_funder
2023-10-06 14:26:41 +02:00
Claudio Atzori
3c23d5f9bc
Merge branch 'beta' into SWH_integration
2023-10-06 14:15:38 +02:00
Claudio Atzori
858931ccb6
[SWH] compress the output actionset
2023-10-06 14:03:33 +02:00
Claudio Atzori
f759b18bca
[SWH] aligned parameter name
2023-10-06 13:43:20 +02:00
Claudio Atzori
eed9fe0902
code formatting
2023-10-06 12:31:17 +02:00
Claudio Atzori
73c49b8d26
Merge branch 'beta' into SWH_integration
2023-10-06 12:21:51 +02:00
Sandro La Bruzzo
42a2dad975
implemented relation to irish funder from a Json list
2023-10-06 11:52:33 +02:00
Serafeim Chatzopoulos
1bb83b9188
Add prefix in SWH ID
2023-10-04 20:31:45 +03:00
Claudio Atzori
ee8a39e7d2
cleanup and refinements
2023-10-04 12:32:05 +02:00
Serafeim Chatzopoulos
e9f24df21c
Move SWH API Key from constants to workflow param
2023-10-03 20:57:57 +03:00
Serafeim Chatzopoulos
cae75fc75d
Add SWH in the collectedFrom field
2023-10-03 16:55:10 +03:00
Serafeim Chatzopoulos
b49a3ac9b2
Add actionsetsPath as a global WF param
2023-10-03 15:43:38 +03:00
Serafeim Chatzopoulos
24c43e0c60
Restructure workflow parameters
2023-10-03 15:11:58 +03:00
Serafeim Chatzopoulos
9f73d93e62
Add param for limiting repo Urls
2023-10-03 14:39:08 +03:00
Claudio Atzori
5919e488dd
Merge branch 'beta' into importpoci
2023-10-03 10:43:53 +02:00
Serafeim Chatzopoulos
839a8524e7
Add action for creating actionsets
2023-10-02 23:50:38 +03:00
Miriam Baglioni
d7fccdc64b
fixed paths in wf to match the req of the pathname
2023-10-02 14:10:57 +02:00
Miriam Baglioni
9898470b0e
Addressing comments in D-Net/dnet-hadoop#340 #issuecomment-10592
2023-10-02 12:54:16 +02:00
Claudio Atzori
7b403a920f
Merge branch 'beta' into consistency_keep_mergerels
2023-10-02 11:26:00 +02:00
Claudio Atzori
dc86018a5f
Merge branch 'merge_entities_job' into beta
2023-10-02 11:24:48 +02:00
Claudio Atzori
7f244d9a7a
code formatting
2023-10-02 11:04:36 +02:00
Giambattista Bloisi
e239b81740
Fix defect #8997 : GenerateEventsJob is generating huge amounts of logs because broker entity similarity calculation consistently failed
2023-10-02 11:04:18 +02:00
Miriam Baglioni
e84f5b5e64
extended existing code to accommodate import of POCI from open citation
2023-10-02 09:25:16 +02:00
Serafeim Chatzopoulos
ab0d70691c
Add step for archiving repoUrls to SWH
2023-09-28 20:56:18 +03:00
Serafeim Chatzopoulos
ed9c81a0b7
Add steps to collect last visit data && archive not found repository URLs
2023-09-27 19:00:54 +03:00
Alessia Bardi
0935d7757c
Use v5 of the UNIBI Gold ISSN list in test
2023-09-20 15:41:35 +02:00
Alessia Bardi
cc7204a089
tests for d4science catalog
2023-09-20 15:38:32 +02:00
dimitrispie
9ef971a146
Update step16-createIndicatorsTables.sql
...
Fix int year for:
indi_org_openess_year
indi_org_fairness_year
indi_org_findable_year
2023-09-19 14:25:42 +03:00
Serafeim Chatzopoulos
9d44418d38
Add collecting software code repository URLs
2023-09-14 18:43:25 +03:00
Serafeim Chatzopoulos
395a4af020
Run CC and RAM sequentially in dhp-impact-indicators WF
2023-09-13 08:59:40 +02:00
Claudio Atzori
4786aa0e09
added Archive ouverte UNIGE (ETHZ.UNIGENF, opendoar____::1400) to the Datacite hostedBy_map
2023-09-07 11:21:07 +02:00
dimitrispie
5f90cc11e9
Update step16-createIndicatorsTables.sql
...
Fix indi_pub_bronze_oa
2023-09-06 14:14:38 +03:00
Claudio Atzori
adec6692ca
Merge branch 'beta' into invisible_relations
2023-09-04 16:13:06 +02:00
Claudio Atzori
15666e86a8
added collectedfrom to the affiliation relations imported from Crossref
2023-09-04 15:56:06 +02:00
Claudio Atzori
5b06c9d06f
[graph raw] datainfo.invisible set as true only for entities
2023-09-04 15:15:24 +02:00
Serafeim Chatzopoulos
7de0164c26
Fix import of affiliations relations from Crossref
2023-09-04 16:04:41 +03:00
Giambattista Bloisi
2caaaec42d
Include SparkCleanRelation logic in SparkPropagateRelation
...
SparkPropagateRelation includes merge relations
Revised tests for SparkPropagateRelation
2023-09-04 11:33:20 +02:00
dimitrispie
964c2f553e
Changes in indicators step, monitor step
...
- graduatedoctorates for observatory
- result_apc_affiliations table
- new indicators
indi_is_funder_plan_s
indi_funder_fairness
indi_ris_fairness
indi_funder_openess
indi_ris_openess
indi_funder_findable
indi_ris_findable
indi_is_project_result_after
- cast year to int in composite indicators
- new institutions
-- Universidade Católica Portuguesa
-- Iscte - Instituto Universitário de Lisboa
-- Munster Technological University
-- Cardiff University
-- Leibniz Institute of Ecological Urban and Regional Development
2023-09-01 10:57:02 +03:00
Giambattista Bloisi
6cc7d8ca7b
GroupEntities and DispatchEntities are now merged in GroupEntitiesSparkJob
2023-08-30 10:43:31 +02:00
Giambattista Bloisi
6b1c05d118
Add sparkExecutorMemoryOverhead workflow config to set off-heap memory for Spark actions. If not explicitly set it is defaulted to 1Gb
2023-08-29 16:04:19 +02:00
Claudio Atzori
bf35280ea6
code formatting
2023-08-29 11:11:00 +02:00
Claudio Atzori
58665a246c
Merge branch 'beta' into propagate_relation_rewrite
2023-08-29 10:47:02 +02:00
Claudio Atzori
f437be80ad
[impact indicators] adjusted paths in the bip ranker wf parameters
2023-08-29 09:03:03 +02:00
Giambattista Bloisi
d012aec0b3
Revert PropagateRelation's argument name from outputPath to graphOutputPath in consistency workflow ( #8964 )
2023-08-28 22:44:54 +02:00
Giambattista Bloisi
a860e19423
Fix: ensure all relations are written out, not only those managed by dedup
2023-08-28 15:36:02 +02:00
Giambattista Bloisi
0d7b2bf83d
Rewrite SparkPropagateRelation exploiting Dataframe API
2023-08-28 10:34:54 +02:00
Miriam Baglioni
9c8b41475a
Merge pull request '8172_impact_indicators_workflow' ( #284 ) from 8172_impact_indicators_workflow into beta
...
Reviewed-on: D-Net/dnet-hadoop#284
2023-08-14 15:50:48 +02:00
Serafeim Chatzopoulos
97c1ba8918
Merge actionsets of results and projects
2023-08-11 15:56:53 +03:00
Giambattista Bloisi
95cd2b9b1e
Make filterInvisible a mandatory parameter of DispatchEntitiesSparkJob
...
Make filterInvisible a mandatory parameter of both dedup/consistency and graph/group oozie workflows
2023-08-10 11:53:48 +02:00
Giambattista Bloisi
fab9920271
DispatchEntitiesSparkJob: manage all entity types together, support filtering by dataInfo.invisible flag
2023-08-09 15:41:43 +02:00
Miriam Baglioni
c25ac21e5e
Merge pull request 'graph cleaning, suggestions from ticket 8898' ( #325 ) from cleaning_8898 into beta
...
Reviewed-on: D-Net/dnet-hadoop#325
2023-08-08 11:14:19 +02:00
Miriam Baglioni
c334fe2438
Merge pull request 'Add a "CleanRelation" action after the PropagateRelation to filter out all relations that have been deleted by inference or that are pointing to dangling entities' ( #328 ) from cleanup_relations_after_dedup into beta
...
Reviewed-on: D-Net/dnet-hadoop#328
2023-08-08 09:49:12 +02:00
Miriam Baglioni
0e2f855807
Merge pull request 'Updates Promotion DBs' ( #321 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#321
2023-08-07 12:09:16 +02:00
Miriam Baglioni
18fbe52b20
Merge pull request 'Import affiliation relations from Crossref' ( #320 ) from 8876 into beta
...
Reviewed-on: D-Net/dnet-hadoop#320
2023-08-07 10:45:30 +02:00
Giambattista Bloisi
97b6d1dc45
Filter ids by dataInfo.deletedbyinference and DataInfo.invisible flags
...
Filter relations also by dataInfo.invisible flag
2023-08-07 10:24:11 +02:00
Giambattista Bloisi
af49424b59
Add a "CleanRelation" action after the PropagateRelation to filter out all relations that have been deleted by inference or that are pointing to dangling entities
2023-08-04 14:27:39 +02:00
Claudio Atzori
11ffb9bd68
rule out records with NULL dataInfo
2023-07-31 12:35:33 +02:00
Serafeim Chatzopoulos
7cefe2665b
Remove unnecessary classes
2023-07-28 19:14:39 +03:00
Serafeim Chatzopoulos
26a92ce762
Merge branch '8876' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8876
2023-07-28 19:03:57 +03:00
Serafeim Chatzopoulos
ebfba38ab6
Add changes from code review
2023-07-28 19:03:47 +03:00
Serafeim Chatzopoulos
eb8684a8cf
Merge branch 'beta' into 8876
2023-07-28 13:39:33 +02:00
Claudio Atzori
a72b9e96ac
expand the instance level fulltext in the XML records
2023-07-27 14:57:38 +02:00
Claudio Atzori
270df939c4
partial implementation of the suggestions from https://support.openaire.eu/issues/8898
2023-07-25 17:29:50 +02:00
Giambattista Bloisi
e64c2854a3
Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface
...
JsonPath cache contention fixed by using a ConcurrentHashMap
Blacklist filtering performance improvement
Minor performance improvements when evaluating similarity
Sorting in clustered elements is deterministic (by ordering and identity field, instead of ordering field only)
2023-07-24 15:36:24 +02:00
Giambattista Bloisi
bb5b845e3c
Use scala.binary.version property to resolve scala maven dependencies
...
Ensure consistent usage of maven properties
Profile for compiling with scala 2.12 and Spark 3.4
2023-07-24 11:13:48 +02:00
Serafeim Chatzopoulos
3a0f09774a
Add script to find score limits
2023-07-21 17:55:41 +03:00
Ilias Kanellos
06b9b71c4e
Merge branch '8172_impact_indicators_workflow' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8172_impact_indicators_workflow
2023-07-21 17:42:49 +03:00
Ilias Kanellos
2374f445a9
Produce additional bip update specific files
2023-07-21 17:42:46 +03:00
Serafeim Chatzopoulos
cb0f3c50f6
Format workflow.xml
2023-07-21 16:07:10 +03:00
Serafeim Chatzopoulos
c64e5e588f
Merge branch '8172_impact_indicators_workflow' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8172_impact_indicators_workflow
2023-07-21 15:27:02 +03:00
Serafeim Chatzopoulos
2cc5b1a39b
Fixes in workflow.xml
2023-07-21 15:26:50 +03:00
Ilias Kanellos
0f96af5d56
Merge branch '8172_impact_indicators_workflow' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8172_impact_indicators_workflow
2023-07-21 13:42:35 +03:00
Ilias Kanellos
03da965162
Format bip-score based file without doi references
2023-07-21 13:42:30 +03:00
Giambattista Bloisi
f03153823a
Update testCitationRelations number of expected citations according to changes made in 0559d8b4
(monodirectional citations)
2023-07-21 10:48:28 +02:00
Giambattista Bloisi
54c1eacef1
SparkJobTest was failing because testing workingdir was not cleaned up after each test
2023-07-21 10:42:24 +02:00
Giambattista Bloisi
5e15f20e6e
Fix entityMerger that was excluding the authors of the first entity in the list to merge
2023-07-21 00:46:54 +02:00
Giambattista Bloisi
0210a14e43
Ignore timestamp differences in PromoteActionPayloadForGraphTableJobTest
2023-07-20 23:45:57 +02:00
Giambattista Bloisi
dba34505de
Fix SparkStatsTest bug where parquet tables were incorrectly read as text files leading to unpredictable count() values
2023-07-19 14:24:52 +02:00
Giambattista Bloisi
e47ed1fdb2
Use DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES in json mapper to avoid that tests fail if they encounter unmapped properties
2023-07-19 14:21:40 +02:00
Serafeim Chatzopoulos
db4ca43ee8
Resolve conflict
2023-07-18 18:38:26 +03:00
Serafeim Chatzopoulos
be320ba3c1
Indentation fixes
2023-07-17 16:04:21 +03:00
dimitrispie
be4856ef35
Update step15.sql
2023-07-17 15:33:58 +03:00
Serafeim Chatzopoulos
bc1a4611aa
Minor changes
2023-07-17 11:17:53 +03:00
dimitrispie
163b2ee2a8
Changes
...
1. Monitor updates
2. Bug fixes during copy to impala cluster
2023-07-13 15:25:00 +03:00
dimitrispie
76901a25f9
Updates Promotion DBs
...
- Add a step for promoting the split monitor DBs
2023-07-12 22:49:08 +03:00
Serafeim Chatzopoulos
4eba14a80e
Add oozie workflow
2023-07-06 21:07:50 +03:00
Serafeim Chatzopoulos
c2998a14e8
Add basic tests for affiliation relations
2023-07-06 20:28:16 +03:00
Serafeim Chatzopoulos
bc7b00bcd1
Add bi-directional affiliation relations
2023-07-06 18:29:15 +03:00
Serafeim Chatzopoulos
12528ed2ef
Refactor PrepareAffiliationRelations.java to use OafMapperUtils common functions
2023-07-06 18:08:33 +03:00
Serafeim Chatzopoulos
bbc245696e
Prepare actionsets for BIP affiliations
2023-07-06 15:56:12 +03:00
Ilias Kanellos
0c433eccdd
Fix scores & Workflow
2023-07-06 15:06:28 +03:00
Ilias Kanellos
d5c39a1059
Fix map scores to doi
2023-07-06 15:04:48 +03:00
Ilias Kanellos
772d5f0aab
Make PR and AttRank serial
2023-07-06 13:47:51 +03:00
Giambattista Bloisi
801da2fd4a
New sources formatted by maven plugin
2023-07-06 10:28:53 +02:00
Giambattista Bloisi
bd3fcf869a
rename dnet-pace-core into dhp-pace-core module and use it as dependency in other modules
2023-07-06 10:02:23 +02:00
Serafeim Chatzopoulos
347a889b20
Read affiliation relations
2023-07-06 00:51:01 +03:00
Miriam Baglioni
4c9bc4c3a5
refactoring
2023-06-30 19:05:15 +02:00
Miriam Baglioni
7738372125
[UsageCount] fixed typo in attribute name for datasource table
2023-06-30 18:56:41 +02:00
Miriam Baglioni
55ea485783
[UsageCount] split the count for result at the level of the datasource. For each indicator one unit is specified for each datasource contributing to that indicator value. The datasource key is the value of the key element in the unit for the measure, while the count for that datasource is in the value
2023-06-30 18:39:30 +02:00
Sandro La Bruzzo
9963fd6d29
updated log to add subentity
2023-06-28 13:36:05 +02:00
Sandro La Bruzzo
ed7e2ab6d1
reverted mistake on commit workflow.xml
2023-06-28 11:40:19 +02:00
Sandro La Bruzzo
9910ce06ae
added to CreateSimRel the feature to write time log
2023-06-28 11:38:16 +02:00
Miriam Baglioni
2717edafb7
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2023-06-28 11:25:14 +02:00
Miriam Baglioni
2f04c9d149
[BulkTagging] fixing left over for test
2023-06-28 11:24:42 +02:00
Sandro La Bruzzo
bd17c3edc8
added to CreateSimRel the feature to write time log
2023-06-28 11:20:58 +02:00
Serafeim Chatzopoulos
60f25b780d
Minor fixes in workflow.xml and job.properties
2023-06-23 12:51:50 +03:00
Michele Artini
88a1cbc37d
fixed a datasource id
2023-06-22 07:56:33 +02:00
Claudio Atzori
b0ebf56367
Merge pull request 'Update step15_5.sql' ( #314 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#314
2023-06-21 10:33:22 +02:00
dimitrispie
2b6370eaee
Update step15_5.sql
...
Bug fix
2023-06-21 11:31:10 +03:00
Claudio Atzori
35e42a86ed
Merge pull request 'Update step15_5.sql' ( #313 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#313
2023-06-21 10:26:16 +02:00
dimitrispie
74cb060bfe
Update step15_5.sql
...
Add "if not exists" clause
2023-06-21 11:24:06 +03:00
Claudio Atzori
85e016df17
Merge pull request 'Update step16-createIndicatorsTables.sql' ( #312 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#312
2023-06-21 09:52:33 +02:00
dimitrispie
a475cfcb7b
Update step16-createIndicatorsTables.sql
...
Rename a field in indi_pub_interdisciplinarity
2023-06-21 10:42:02 +03:00
Claudio Atzori
979cf9cd87
Merge pull request 'Update step15.sql' ( #311 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#311
2023-06-21 09:20:01 +02:00
dimitrispie
4648cd88d4
Update step15.sql
...
Cast score to double
2023-06-21 10:02:19 +03:00
dimitrispie
94d2573c77
Update step15.sql
...
Bug Fix
2023-06-21 09:22:39 +03:00
Claudio Atzori
0561362de2
Merge pull request 'Update step20-createMonitorDB_institutions.sql' ( #309 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#309
2023-06-20 15:07:09 +02:00
Claudio Atzori
50d7dc0078
[graph enrichment] fixed projectOrganizationPath not being passed to the apply_resulttoorganization_propagation node
2023-06-19 15:42:44 +02:00
Claudio Atzori
fbd9bf704e
indent
2023-06-19 15:41:22 +02:00
dimitrispie
be2caedb04
Update step20-createMonitorDB_institutions.sql
...
Add openorgs____::1624ff7c01bb641b91f4518539a0c28a Vrije Universiteit Amsterdam
2023-06-19 12:12:17 +03:00
dimitrispie
36e0a8fec4
Changes to Promotion Stats WF
...
1. Add new cluster host at impala-shell commands
2. Add a step for splitting monitor dbs
3. Update workflow.xml to include the new splitting monitor dbs step
2023-06-19 09:44:34 +03:00
dimitrispie
4c770a5e29
Update finalizeImpalaCluster.sh
...
Drop views in shadow dbs before dropping the db
2023-06-15 13:25:37 +03:00
dimitrispie
e06d962a6a
Update step15.sql
2023-06-15 12:20:35 +03:00
dimitrispie
afcad08396
Update step20-createMonitorDB_institutions.sql
...
Added openorgs____::c0b262bd6eab819e4c994914f9c010e2 -- National Institute of Geophysics and Volcanology
2023-06-15 10:28:49 +03:00
Claudio Atzori
b9748763e2
Merge pull request '[stats wf] Bug fixes' ( #308 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#308
2023-06-14 21:57:03 +02:00
dimitrispie
42b8ce2ba4
Update copyDataToImpalaCluster.sh
2023-06-14 19:23:42 +03:00
dimitrispie
2032b0df40
Bug fixes
...
1. Remove tables/views from old databases in the new cluster, before dropping the dbs
2. Fix id in result_accessroute, indi_impact_measures, indi_pub_bronze_oa
2023-06-14 19:09:09 +03:00
Claudio Atzori
b76a47b103
[aggregator graph] added column alias when mapping organization PIDs from the OpenOrgs database
2023-06-13 11:38:10 +02:00
Claudio Atzori
ad04f14b81
Merge branch 'beta' into distinct_pids_from_openorgs_beta
2023-06-12 09:58:21 +02:00
Claudio Atzori
55f002f1e9
Merge branch 'beta' into propagationProjectThroughParentChils
2023-06-12 09:56:53 +02:00
Claudio Atzori
4b00a76271
Merge branch 'beta' into fulltext_url_validation
2023-06-12 09:55:25 +02:00
Claudio Atzori
de225c71cd
Merge branch 'beta' into removeTaggingCondition
2023-06-12 09:50:40 +02:00
Claudio Atzori
e1409ffe80
update sql query to return distinct pids
2023-06-12 09:47:45 +02:00
Claudio Atzori
da7b66c542
Merge pull request '[stats wf] Added memory to hive' ( #305 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#305
2023-06-08 08:58:48 +02:00
dimitrispie
c5f42c7f5b
Added memory to hive
2023-06-07 18:18:23 +03:00
Claudio Atzori
afb76ebf0f
Merge pull request '[stats wf] Bug fix on indicators step' ( #304 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#304
2023-06-07 16:49:09 +02:00
dimitrispie
fa24e2e18f
Bug fix on indicators step
...
indi_pub_gold_oa table was missing during the creation of other indicators
2023-06-07 17:43:37 +03:00
Claudio Atzori
01c67e697d
Merge pull request '[ stats wf] Bug fix' ( #303 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#303
2023-06-07 14:41:44 +02:00
dimitrispie
28272c1b0e
Bug fix
2023-06-07 15:34:01 +03:00
Alessia Bardi
d5be6a13e9
Updated officialname of pangaea in hostedbymap for Datacite to avoid duplicate entries in the source filter of the portal
2023-06-06 14:43:32 +02:00
Claudio Atzori
8f651f1225
Merge pull request 'Changes to beta stats wf' ( #300 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#300
2023-06-06 11:41:36 +02:00
dimitrispie
ad07fbf053
Add names to organizations for collaboration indicators
2023-06-02 14:13:10 +03:00
dimitrispie
2324670714
Split Monitor DBs-Interdisciplinarity indicators
...
- Split DBs Monitor for faster rendering of visualizations
- Add interdisciplinarity indicators from result_fos
2023-06-02 13:34:16 +03:00
Miriam Baglioni
daf4d7971b
refactoring
2023-05-31 18:56:58 +02:00
Miriam Baglioni
97d72d41c3
finalization of implementation and testing
2023-05-31 18:53:22 +02:00
Miriam Baglioni
0389b57ca7
added propagation for project to organization
2023-05-31 11:06:58 +02:00
Claudio Atzori
e45777e7e1
[aggregator graph] added validation for URLs mapped from oaf:fulltext
2023-05-26 11:33:42 +02:00
dimitrispie
ebe586b1d1
Impact indicators/Unpaywall
...
- Added Impact indicators
- Added unpaywall open access colours
2023-05-26 10:25:28 +03:00
dimitrispie
d6102dd576
Update step16-createIndicatorsTables.sql
...
- Add org names to indi_project_collab_org
- Add indi_pub_bronze_oa
- Changes to indi_pub_hybrid_oa_with_cc
2023-05-25 14:52:34 +03:00
Miriam Baglioni
9097e71853
Added assertion in test
2023-05-24 16:30:53 +02:00
Miriam Baglioni
9567c13bc3
refactoring
2023-05-24 16:20:05 +02:00
Miriam Baglioni
34172455d1
[BulkTag] Adding remove constraints to specify when a community must not appear in the context of a result.
2023-05-24 09:56:23 +02:00
Ilias Kanellos
a1b9187039
Fix syntax error on workflow.xml
2023-05-23 17:17:12 +03:00
Ilias Kanellos
6a7e370a21
Remove unnecessary counts in graph creation
2023-05-23 16:48:58 +03:00
Ilias Kanellos
ec4e010687
End after rankings | Create graph debugged
2023-05-23 16:44:04 +03:00
Claudio Atzori
a235d2a24a
Merge pull request 'Updates to steps related to transfer data to impala cluster' ( #295 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#295
2023-05-18 08:46:15 +02:00
dimitrispie
86f4f63daf
Updates to steps related to transfer data to impala cluster
...
1. Remove external table definitions in stats_ext
2. Fix the issue where some views are not created.
3. Added two workflow parameters for also copying the usage stats dbs
2023-05-18 09:33:05 +03:00
Claudio Atzori
909729a2fc
[dedup] tweaking num partitions, minor changes
2023-05-17 10:16:22 +02:00
Ilias Kanellos
38020e242a
Merge branch '8172_impact_indicators_workflow' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8172_impact_indicators_workflow
2023-05-16 17:34:53 +03:00
Ilias Kanellos
3d69f33c84
Fix selection of columns in graph creation
2023-05-16 17:34:42 +03:00
Ilias Kanellos
3c38f7ba6f
Fix selection of columns in graph creation
2023-05-16 17:32:53 +03:00
Serafeim Chatzopoulos
8ef718c363
Fix workflow application path
2023-05-16 16:28:48 +03:00
Serafeim Chatzopoulos
26328e2a0d
Move job.properties
2023-05-16 14:39:53 +03:00
Serafeim Chatzopoulos
4eec3e7052
Add jobTracker, nameNode && spark2Lib as global params in oozie wf
2023-05-15 22:28:48 +03:00
Serafeim Chatzopoulos
b83135c252
Add missing kill nodes in workflow.xml
2023-05-15 19:55:35 +03:00
Serafeim Chatzopoulos
45f2aa0867
Move end node ... at the end in workflow.xml
2023-05-15 17:52:20 +03:00
Claudio Atzori
8acad52a0c
Merge branch 'beta' into apc_affiliation
2023-05-15 15:47:33 +02:00
Claudio Atzori
8a463cc3e8
fixed organization id created when mapping APC affiliations. Factored out ROR constants in dhp-common
2023-05-15 15:44:46 +02:00
Serafeim Chatzopoulos
12a57e1f58
Resolve conflicts
2023-05-15 16:20:11 +03:00
Serafeim Chatzopoulos
82e2a96f51
Resolve conflicts
2023-05-15 15:53:12 +03:00
Serafeim Chatzopoulos
b8e8c959fe
Update workflow.xml && job.properties
2023-05-15 15:50:23 +03:00
Ilias Kanellos
4a905932a3
Spark properties from job.properties
2023-05-15 15:24:22 +03:00
Claudio Atzori
0c314d5e09
Merge pull request 'Update copyDataToImpalaCluster.sh' ( #293 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#293
2023-05-15 12:05:54 +02:00
Serafeim Chatzopoulos
07818131ef
Update documentation
2023-05-15 13:04:44 +03:00
dimitrispie
b3f9633205
Update copyDataToImpalaCluster.sh
...
Added option --user to impala-shell command
2023-05-15 12:51:44 +03:00
Miriam Baglioni
78b07400c0
changed test classes
2023-05-15 11:37:08 +02:00
Miriam Baglioni
86fe886c1a
removed the inverse of the Citing relation
2023-05-15 11:20:51 +02:00