Commit Graph

3703 Commits

Author SHA1 Message Date
Miriam Baglioni ecd398fe51 refactoring 2023-01-20 14:23:45 +01:00
Miriam Baglioni 0a5c6010b0 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2023-01-13 16:14:46 +01:00
dimitrispie dd70c32ad7 Bug fixes 2023-01-12 17:18:05 +02:00
dimitrispie 51f7ab5864 Bug fixes 2023-01-12 17:15:06 +02:00
dimitrispie 686580a220 - New Monitor DB workflow
- New Organization added
2023-01-12 11:18:03 +02:00
Claudio Atzori 0a58bc7ba7 [broker] prevent NPEs 2023-01-11 14:44:14 +01:00
Claudio Atzori 04cb96001c [broker] d40e20f437 adapted to the beta graph model 2023-01-11 10:10:12 +01:00
Michele Artini 91b845f611 Considering instance pids and alteternative identifiers 2023-01-11 09:58:54 +01:00
Miriam Baglioni 1f367122e4 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2023-01-11 09:47:44 +01:00
Michele Artini 7b7520850b fixed an invalid char 2023-01-11 09:22:18 +01:00
Miriam Baglioni d6895f0387 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2023-01-09 17:28:38 +01:00
dimitrispie becb242c17 Monitor DB only Workflow 2023-01-04 16:50:29 +02:00
dimitrispie 592013d5dd Added more steps in decision node 2022-12-23 09:43:16 +02:00
dimitrispie 6449ff4207 1. Added a decision node to enables the workflow to make a selection on the execution path to follow
2. Added new organization
3. Added 5 new tables from Eurostast
2022-12-22 10:18:21 +02:00
Miriam Baglioni 8893389895 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2022-12-21 12:42:27 +01:00
Sandro La Bruzzo 3c9826f186 updated lines function to it's implementation linesWithSeparators.map(l => l.stripLineEnd) in this way we force scala plugin compiler to consider this pipeline scala code and not java.string.lines() pipeline 2022-12-21 11:21:17 +01:00
Claudio Atzori 6aa91204a5 [orcid propagation] skip empty directories 2022-12-20 14:15:46 +01:00
Miriam Baglioni 6674cccb94 [BulkTag] description of parameters more comprehensive for those who do not implement it 2022-12-16 15:33:20 +01:00
Miriam Baglioni f37113a941 [BulkTag] moving xquery to get community configuration in dedicated file 2022-12-16 15:32:26 +01:00
Miriam Baglioni 8685eaa706 [Clean Country] added test to verify remove of country 2022-12-16 15:31:25 +01:00
Miriam Baglioni dc0ec88a58 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2022-12-16 13:18:32 +01:00
Miriam Baglioni d791840b82 [Clean Country] added test to verify remove of country: 2022-12-16 13:18:29 +01:00
Claudio Atzori 7b80b24f82 [cleaning] country cleaning must use both PID and AlternateIdentifier fields 2022-12-15 14:49:04 +01:00
Claudio Atzori b8bafab8a0 [cleaning] improved vocabulary based mapping, specialization for the strict vocab cleaning 2022-12-12 14:43:03 +01:00
Sandro La Bruzzo 5e4866d033 implemented synch for single mdstore 2022-12-12 11:29:46 +01:00
Claudio Atzori c18b8048c3 [cleaning] avoid NPE 2022-12-10 11:41:38 +01:00
Claudio Atzori 8b44afe5e5 [cleaning] avoid NPE 2022-12-09 15:44:57 +01:00
Claudio Atzori 389dd25430 [cleaning] avoid NPE 2022-12-08 18:40:48 +01:00
Claudio Atzori 730228d73d [cleaning] align wf parameter names in test 2022-12-08 18:40:22 +01:00
Claudio Atzori 2094fa6db0 [cleaning] align wf parameter names 2022-12-08 17:22:26 +01:00
Miriam Baglioni a485a94956 [Cleaning] fixed parameter name in property file 2022-12-08 16:59:34 +01:00
Miriam Baglioni 3d99b78d94 [Cleaning] fixed error in parameter (workingPath to workingDir) 2022-12-08 10:25:02 +01:00
Claudio Atzori 1b8488976b code formatting 2022-12-07 10:45:38 +01:00
Claudio Atzori cd1b58483e [bulk tag] fixed Community configuration parsing to void NPE 2022-12-07 10:39:00 +01:00
Claudio Atzori 062abfd669 fixed NPE, removed unused stuff 2022-12-06 12:04:00 +01:00
dimitrispie 2a52a42169 Added 4 institutions:
-University of Modena and Reggio Emilia
-Bilkent University
-Saints Cyril and Methodius University of Skopje
-University of Milan
2022-12-06 10:10:21 +02:00
Claudio Atzori 8248da40d9 Merge branch 'beta' into graph_cleaning 2022-12-02 14:49:00 +01:00
Claudio Atzori ddf065756f Merge pull request 'Two organizations are added for monitor' (#258) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #258
2022-12-02 14:45:27 +01:00
Sandro La Bruzzo 5a48a2fb18 implemented synch for single mdstore 2022-12-01 11:34:43 +01:00
Claudio Atzori a38116546d Merge branch 'beta' into deduptesting 2022-11-30 11:27:29 +01:00
Miriam Baglioni ce020f2c83 [EOSC FUTURE] added resources and test for review 2022-11-30 09:57:30 +01:00
Miriam Baglioni bb0ddc1c44 [BulkTag] adding verb starts_with 2022-11-30 09:56:24 +01:00
Claudio Atzori 8e3edba318 [graph cleaning] testing the collectedfron and hostedby patch procedure 2022-11-29 16:07:09 +01:00
Claudio Atzori 58c05731f9 [graph cleaning] WIP: testing the collectedfron and hostedby patch procedure 2022-11-29 11:21:51 +01:00
Miriam Baglioni 9c70c5dbd6 [Bulk Tag horizontal] added new path in definition of constraint (to recognize fos subjects) - changed test and resource class to test this new aspect 2022-11-28 14:51:20 +01:00
Miriam Baglioni 0628df7a3a resolving conflicts 2022-11-28 10:44:56 +01:00
Claudio Atzori 11695ba649 [graph cleaning] patch also the result's collectedfrom and hostedby datasource name according to the datasource master-duplicate mapping 2022-11-28 10:18:43 +01:00
Claudio Atzori 6082d235d3 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into graph_cleaning 2022-11-28 09:54:48 +01:00
Claudio Atzori 24ef301cc1 [graph cleaning] patch the result's collectedfrom and hostedby identifiers according to the datasource master-duplicate mapping 2022-11-28 09:54:18 +01:00
Alessia Bardi 90c8f9cb61 tests for EOSC Future 2022-11-23 12:18:44 +01:00
Miriam Baglioni 0e3edc5018 [Bulk Tag] fixed issue in verb name 2022-11-23 11:26:36 +01:00
Claudio Atzori a79c47522d updated ORCID datasource identifier 2022-11-23 10:17:49 +01:00
Alessia Bardi 2832117f23 added eoscifguidelines in test 2022-11-22 18:01:12 +01:00
Alessia Bardi 3c08269a4d Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2022-11-22 17:31:00 +01:00
Alessia Bardi 2687fc9f73 tests for EOSC Future review - ROhub 2022-11-22 17:30:56 +01:00
Claudio Atzori 1d5143b0b6 Merge branch 'beta' into deduptesting 2022-11-22 10:21:30 +01:00
Claudio Atzori 0aa725083f extended dedup testing 2022-11-17 16:13:43 +01:00
Claudio Atzori 3dbc637d3e code formatting 2022-11-17 09:55:41 +01:00
Claudio Atzori ddff0e8999 merging duplicates using IdentifierComparator 2022-11-11 16:10:25 +01:00
Claudio Atzori 5af5a8ae42 added IdentifierComparator 2022-11-09 14:20:59 +01:00
Claudio Atzori 7c3390ac10 Merge branch 'beta' into eoscifguidelines-from-mdstores 2022-11-07 12:18:40 +01:00
dimitrispie 992fc5b628 Added McMaster University Institution 2022-11-03 11:02:18 +02:00
dimitrispie 7fda05e380 Added Autonomous University of Barcelona 2022-11-01 13:59:40 +02:00
Claudio Atzori 22873c9172 Merge pull request 'Added fields: totalcost, fundedamount, currency, in project table' (#257) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #257
2022-10-31 13:49:27 +01:00
dimitrispie 7861c472e0 Hive memory parameters 2022-10-28 19:00:32 +03:00
dimitrispie 5df9c63963 Added fields: totalcost, fundedamount, currency, in project table 2022-10-27 16:44:26 +03:00
Sandro La Bruzzo 2b9a20a4a3 Changed the way Scholexplorer filter the relationships, I found that filter all relation coming from openCitation is wrong, because we loose a lot of relation than intersect OpenCitation, but they don't come only from there 2022-10-24 12:53:47 +02:00
Alessia Bardi 208ed32315 fixed xpath for semantic relation 2022-10-23 18:18:13 +02:00
Alessia Bardi ee759ac92d file format after mvn compile 2022-10-23 18:09:47 +02:00
Alessia Bardi 31a10f000b Map the field oaf:eoscifguidelines from mdstores. Currently we can find it in ROHub metadata 2022-10-23 18:05:37 +02:00
Claudio Atzori ec39b84898 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2022-10-19 15:21:02 +02:00
Claudio Atzori bca4a61710 suppressing hyper verbose spark logs during unit test execution 2022-10-19 15:20:58 +02:00
Sandro La Bruzzo 72f0d88d6c formatted code 2022-10-19 14:18:42 +02:00
Claudio Atzori 9b449110c6 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2022-10-14 15:48:04 +02:00
Claudio Atzori ae7cd0735a [graph2hive] more partitions 2022-10-14 15:47:58 +02:00
Sandro La Bruzzo 135cf81151 Merge remote-tracking branch 'origin/beta' into beta 2022-10-13 11:47:25 +02:00
Sandro La Bruzzo a1f94530a3 added documentation 2022-10-13 11:47:11 +02:00
Claudio Atzori b47aaf4dd1 [cleaning] subjects declared as belonging to specific vocabularies whose values are not found in the vocab are set to type keyword 2022-10-13 11:23:43 +02:00
Claudio Atzori 6163ecbf63 [cleaning] renamed parameters in wf action 2022-10-11 11:20:03 +02:00
Claudio Atzori b301e9fdff [cleaning] renamed action name/description 2022-10-11 11:08:52 +02:00
Claudio Atzori ece40adc09 [cleaning] fixing NPE in the country cleaning phase 2022-10-11 10:10:20 +02:00
Claudio Atzori d51275a965 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2022-10-07 09:52:49 +02:00
Claudio Atzori 8d97949316 [cleaning] fixed loop in wf nodes 2022-10-07 09:52:45 +02:00
Miriam Baglioni 4d8339614b Revert "[BipFinder] Fixed issue for wrong escaped char in doi"
This reverts commit 188f25eefa.
2022-10-04 14:29:47 +02:00
Miriam Baglioni 7324853a17 Revert "[BipFinder] refactoring"
This reverts commit 28dc317350.
2022-10-04 14:29:39 +02:00
Miriam Baglioni 28dc317350 [BipFinder] refactoring 2022-10-04 09:47:27 +02:00
Miriam Baglioni 188f25eefa [BipFinder] Fixed issue for wrong escaped char in doi 2022-10-03 12:42:52 +02:00
Claudio Atzori 89f7007080 Merge pull request '[stats wf] misc changes' (#254) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #254
2022-10-03 10:32:05 +02:00
dimitrispie 2c0c3f1806 Cast amount to float for table result_apcs 2022-09-28 19:33:24 +03:00
Alessia Bardi 49360770d7 map w3id as instance url 2022-09-28 14:16:39 +02:00
dimitrispie bdc46e3eaa Remove denormalization of results to fix downloads numbers in monitor 2022-09-28 14:59:08 +03:00
dimitrispie 2ebb1459a9 Fixed type in no_downloads 2022-09-28 14:36:57 +03:00
Miriam Baglioni b5b5a4c192 [CleanCountry] fixed issue 2022-09-28 12:42:51 +02:00
Miriam Baglioni f1d7d45cf7 [BulkTag] fixed issue 2022-09-28 12:01:43 +02:00
Miriam Baglioni 3ec044600d [BulkTag] fixed conflicts 2022-09-28 11:58:28 +02:00
Miriam Baglioni 1cb79719a7 [BulkTag] fixed issues 2022-09-28 11:44:55 +02:00
Claudio Atzori f3f7604e6c trying to fix a test that fails only on Jenkins 2022-09-27 15:21:37 +02:00
Claudio Atzori 3f90d159e3 code formatting 2022-09-27 15:08:00 +02:00
Claudio Atzori 0b3e44e521 Merge branch 'beta' into relation-from-odf 2022-09-27 14:57:01 +02:00
Claudio Atzori 57dbeb08d2 code formatting 2022-09-27 14:55:10 +02:00
Claudio Atzori b60985cf68 Merge branch 'beta' into horizontalConstraints 2022-09-27 14:39:31 +02:00
Claudio Atzori 3b60642ef9 Merge pull request 'Synchronize indicators in stats-db with monitor-db' (#249) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #249
2022-09-27 14:37:33 +02:00
Claudio Atzori 25e9d92aad Merge branch 'beta' into clean_country 2022-09-27 14:27:49 +02:00
Alessia Bardi fd63e9bfac Mapping all relationships supported in ModelConstants and ModelSupport 2022-09-26 11:24:13 +02:00
Miriam Baglioni ca216a92ad [BulkTagging] changed the query to the IS to insert values for FOS and SDG as subject in the configuration used for the tagging 2022-09-23 17:06:07 +02:00
Miriam Baglioni 3e6b0f58bb [BulkTagging] changed the query to the IS to get also the information for the advancedConstraint from the profile 2022-09-23 16:47:19 +02:00
Miriam Baglioni 4a3e119b73 mergin with branch beta 2022-09-23 16:16:06 +02:00
Miriam Baglioni f0e303abf9 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2022-09-23 16:15:32 +02:00
Miriam Baglioni 55da4d8715 [BulkTagging] modifying code to represent constraints horizontally on all the results. Added subject to the set of field used to express the constraint. Modified resorces to test the new approach. Modified test calss 2022-09-23 16:02:19 +02:00
Alessia Bardi c5eb722170 relationships from relatedIdentifier whose target id type is one of the pid type with an authority 2022-09-23 15:47:05 +02:00
Claudio Atzori c86cc53520 suppressing hyper verbose spark logs during unit test execution 2022-09-23 15:20:40 +02:00
Alessia Bardi ba33ff71fd refactoring for the generation of relationships from related identifier of type 'OPENAIRE' 2022-09-23 15:17:13 +02:00
Alessia Bardi 982bcc1e35 test wrid pid and record identifier 2022-09-23 12:06:06 +02:00
Miriam Baglioni 960cb861a0 refactoring 2022-09-23 11:14:04 +02:00
Claudio Atzori c42850328e fixed semantic (subreltype) for ServiceOrganization relations 2022-09-22 16:23:25 +02:00
Miriam Baglioni 33bb79459e Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2022-09-22 15:55:17 +02:00
dimitrispie dcd85f8cd7 - Synchronize indicators in stats-db with monitor-db
- added new openorg id for Nanyang Technological University
- changed openorg id for University of Helsinki #8088 ticket
2022-09-22 13:33:07 +03:00
Claudio Atzori e45ec15221 Merge branch 'beta' into clean_country 2022-09-19 11:34:02 +02:00
Claudio Atzori 26e1badded added instance.url syntactical validation, avoid creating multiple duplicated URLs 2022-09-19 11:19:10 +02:00
Miriam Baglioni 5240ac3d7b [EOSC Tag] remove addition of eosc context for result with eosc if guidelines set 2022-09-19 11:02:18 +02:00
Claudio Atzori 192215a18e merged from branch discard-non-wellformed 2022-09-19 10:17:10 +02:00
Claudio Atzori e370e940d8 [aggregator graph] save invalid records aside for further inspection 2022-09-16 14:06:28 +02:00
Claudio Atzori 465e941214 Merge pull request '[stats wf] Changes to indicators tables' (#244) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #244
2022-09-16 10:13:58 +02:00
Claudio Atzori 1e42d984e1 [aggregator graph] save invalid records aside for further inspection 2022-09-15 10:49:42 +02:00
Alessia Bardi 9e7ec4198f fixed test 2022-09-14 18:08:56 +02:00
Claudio Atzori c48f6e9c57 [aggregator graph] save invalid records aside for further inspection 2022-09-14 17:11:26 +02:00
dimitrispie 3bf3127251 Changes to monitor and indicator scripts 2022-09-14 16:36:19 +03:00
Claudio Atzori a0919ed495 [aggregator graph] save invalid records aside for further inspection 2022-09-14 13:27:39 +02:00
Alessia Bardi b99a011345 return empty Oaf list if record cannot be parsed 2022-09-13 11:51:55 +02:00
Alessia Bardi 27af5122d2 logs for non well formed XML files 2022-09-12 14:25:23 +02:00
Claudio Atzori ff6f789b6d code formatting 2022-09-09 15:16:31 +02:00
Claudio Atzori b5d6966c01 Merge branch 'beta' into clean_country 2022-09-09 12:20:19 +02:00
Claudio Atzori b5f7bd30be Merge branch 'beta' into clean_subjects 2022-09-09 12:20:04 +02:00
Alessia Bardi f14107ad77 Merge branch 'handle_as_instance_urls' of https://code-repo.d4science.org/D-Net/dnet-hadoop into handle_as_instance_urls 2022-09-09 12:17:19 +02:00
Alessia Bardi a539c6ccaf https for handle URLs 2022-09-09 12:16:28 +02:00
dimitrispie 71b069ca90 Changes to indicator and monitor scripts 2022-09-09 13:15:58 +03:00
Claudio Atzori 1203378441 Merge branch 'beta' into clean_subjects 2022-09-09 10:38:47 +02:00
Claudio Atzori 14dc909a14 Merge branch 'beta' into clean_country 2022-09-09 10:38:17 +02:00
Claudio Atzori 853c996fa2 Merge branch 'beta' into handle_as_instance_urls 2022-09-09 09:47:16 +02:00
Claudio Atzori a431e01383 Merge pull request 'orcid_multipleworks_download' (#242) from enrico.ottonello/dnet-hadoop:orcid_multipleworks_download into beta
Reviewed-on: #242
2022-09-09 08:45:02 +02:00
Alessia Bardi 9ef063d502 #7861#note-8 instance url from handle 2022-09-07 17:29:54 +03:00
Alessia Bardi 5c45d52af3 testing for RiuNet 2022-09-07 15:40:57 +03:00
dimitrispie 2b5f8c9c9a comment out duplicate table creation 2022-09-06 12:27:53 +03:00
Alessia Bardi a11eb38065 testing for RO-Hub 2022-09-02 16:07:36 +02:00
Enrico Ottonello bfdf2dc390 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid_multipleworks_download 2022-08-25 12:07:54 +02:00
Enrico Ottonello da1cf561e6 alignment with beta 2022-08-25 11:57:20 +02:00
Enrico Ottonello 27445ccdaa cleaned log 2022-08-25 11:56:14 +02:00
Claudio Atzori b7c387c21f cleaning of subjects: avoid duplicated subjects, prioritise collected vs inferred or other sources 2022-08-12 15:09:16 +02:00
Claudio Atzori adb526b0e1 Merge branch 'beta' into clean_subjects 2022-08-12 10:51:17 +02:00
Claudio Atzori cb7c07c54e [scholix] added step to create tar archive 2022-08-11 11:25:24 +02:00