Commit Graph

3672 Commits

Author SHA1 Message Date
Miriam Baglioni 3d99b78d94 [Cleaning] fixed error in parameter (workingPath to workingDir) 2022-12-08 10:25:02 +01:00
Claudio Atzori 1b8488976b code formatting 2022-12-07 10:45:38 +01:00
Claudio Atzori cd1b58483e [bulk tag] fixed Community configuration parsing to avoid NPE 2022-12-07 10:39:00 +01:00
Claudio Atzori 062abfd669 fixed NPE, removed unused stuff 2022-12-06 12:04:00 +01:00
dimitrispie 2a52a42169 Added 4 institutions:
-University of Modena and Reggio Emilia
-Bilkent University
-Saints Cyril and Methodius University of Skopje
-University of Milan
2022-12-06 10:10:21 +02:00
Claudio Atzori 8248da40d9 Merge branch 'beta' into graph_cleaning 2022-12-02 14:49:00 +01:00
Claudio Atzori ddf065756f Merge pull request 'Two organizations are added for monitor' (#258) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #258
2022-12-02 14:45:27 +01:00
Sandro La Bruzzo 5a48a2fb18 implemented synch for single mdstore 2022-12-01 11:34:43 +01:00
Claudio Atzori a38116546d Merge branch 'beta' into deduptesting 2022-11-30 11:27:29 +01:00
Miriam Baglioni ce020f2c83 [EOSC FUTURE] added resources and test for review 2022-11-30 09:57:30 +01:00
Miriam Baglioni bb0ddc1c44 [BulkTag] adding verb starts_with 2022-11-30 09:56:24 +01:00
Claudio Atzori 8e3edba318 [graph cleaning] testing the collectedfrom and hostedby patch procedure 2022-11-29 16:07:09 +01:00
Claudio Atzori 58c05731f9 [graph cleaning] WIP: testing the collectedfrom and hostedby patch procedure 2022-11-29 11:21:51 +01:00
Miriam Baglioni 9c70c5dbd6 [Bulk Tag horizontal] added new path in definition of constraint (to recognize fos subjects) - changed test and resource class to test this new aspect 2022-11-28 14:51:20 +01:00
Miriam Baglioni 0628df7a3a resolving conflicts 2022-11-28 10:44:56 +01:00
Claudio Atzori 11695ba649 [graph cleaning] patch also the result's collectedfrom and hostedby datasource name according to the datasource master-duplicate mapping 2022-11-28 10:18:43 +01:00
Claudio Atzori 6082d235d3 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into graph_cleaning 2022-11-28 09:54:48 +01:00
Claudio Atzori 24ef301cc1 [graph cleaning] patch the result's collectedfrom and hostedby identifiers according to the datasource master-duplicate mapping 2022-11-28 09:54:18 +01:00
Alessia Bardi 90c8f9cb61 tests for EOSC Future 2022-11-23 12:18:44 +01:00
Miriam Baglioni 0e3edc5018 [Bulk Tag] fixed issue in verb name 2022-11-23 11:26:36 +01:00
Claudio Atzori a79c47522d updated ORCID datasource identifier 2022-11-23 10:17:49 +01:00
Alessia Bardi 2832117f23 added eoscifguidelines in test 2022-11-22 18:01:12 +01:00
Alessia Bardi 3c08269a4d Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2022-11-22 17:31:00 +01:00
Alessia Bardi 2687fc9f73 tests for EOSC Future review - ROhub 2022-11-22 17:30:56 +01:00
Claudio Atzori 1d5143b0b6 Merge branch 'beta' into deduptesting 2022-11-22 10:21:30 +01:00
Claudio Atzori 0aa725083f extended dedup testing 2022-11-17 16:13:43 +01:00
Claudio Atzori 3dbc637d3e code formatting 2022-11-17 09:55:41 +01:00
Claudio Atzori ddff0e8999 merging duplicates using IdentifierComparator 2022-11-11 16:10:25 +01:00
Claudio Atzori 5af5a8ae42 added IdentifierComparator 2022-11-09 14:20:59 +01:00
Claudio Atzori 7c3390ac10 Merge branch 'beta' into eoscifguidelines-from-mdstores 2022-11-07 12:18:40 +01:00
dimitrispie 992fc5b628 Added McMaster University Institution 2022-11-03 11:02:18 +02:00
dimitrispie 7fda05e380 Added Autonomous University of Barcelona 2022-11-01 13:59:40 +02:00
Claudio Atzori 22873c9172 Merge pull request 'Added fields: totalcost, fundedamount, currency, in project table' (#257) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #257
2022-10-31 13:49:27 +01:00
dimitrispie 7861c472e0 Hive memory parameters 2022-10-28 19:00:32 +03:00
dimitrispie 5df9c63963 Added fields: totalcost, fundedamount, currency, in project table 2022-10-27 16:44:26 +03:00
Sandro La Bruzzo 2b9a20a4a3 Changed the way Scholexplorer filters the relationships. I found that filtering out all relations coming from openCitation is wrong, because we lose a lot of relations that intersect OpenCitation but don't come only from there 2022-10-24 12:53:47 +02:00
Alessia Bardi 208ed32315 fixed xpath for semantic relation 2022-10-23 18:18:13 +02:00
Alessia Bardi ee759ac92d file format after mvn compile 2022-10-23 18:09:47 +02:00
Alessia Bardi 31a10f000b Map the field oaf:eoscifguidelines from mdstores. Currently we can find it in ROHub metadata 2022-10-23 18:05:37 +02:00
Claudio Atzori ec39b84898 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2022-10-19 15:21:02 +02:00
Claudio Atzori bca4a61710 suppressing hyper verbose spark logs during unit test execution 2022-10-19 15:20:58 +02:00
Sandro La Bruzzo 72f0d88d6c formatted code 2022-10-19 14:18:42 +02:00
Claudio Atzori 9b449110c6 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2022-10-14 15:48:04 +02:00
Claudio Atzori ae7cd0735a [graph2hive] more partitions 2022-10-14 15:47:58 +02:00
Sandro La Bruzzo 135cf81151 Merge remote-tracking branch 'origin/beta' into beta 2022-10-13 11:47:25 +02:00
Sandro La Bruzzo a1f94530a3 added documentation 2022-10-13 11:47:11 +02:00
Claudio Atzori b47aaf4dd1 [cleaning] subjects declared as belonging to specific vocabularies whose values are not found in the vocab are set to type keyword 2022-10-13 11:23:43 +02:00
Claudio Atzori 6163ecbf63 [cleaning] renamed parameters in wf action 2022-10-11 11:20:03 +02:00
Claudio Atzori b301e9fdff [cleaning] renamed action name/description 2022-10-11 11:08:52 +02:00
Claudio Atzori ece40adc09 [cleaning] fixing NPE in the country cleaning phase 2022-10-11 10:10:20 +02:00
Claudio Atzori d51275a965 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2022-10-07 09:52:49 +02:00
Claudio Atzori 8d97949316 [cleaning] fixed loop in wf nodes 2022-10-07 09:52:45 +02:00
Miriam Baglioni 4d8339614b Revert "[BipFinder] Fixed issue for wrong escaped char in doi"
This reverts commit 188f25eefa.
2022-10-04 14:29:47 +02:00
Miriam Baglioni 7324853a17 Revert "[BipFinder] refactoring"
This reverts commit 28dc317350.
2022-10-04 14:29:39 +02:00
Miriam Baglioni 28dc317350 [BipFinder] refactoring 2022-10-04 09:47:27 +02:00
Miriam Baglioni 188f25eefa [BipFinder] Fixed issue for wrong escaped char in doi 2022-10-03 12:42:52 +02:00
Claudio Atzori 89f7007080 Merge pull request '[stats wf] misc changes' (#254) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #254
2022-10-03 10:32:05 +02:00
dimitrispie 2c0c3f1806 Cast amount to float for table result_apcs 2022-09-28 19:33:24 +03:00
Alessia Bardi 49360770d7 map w3id as instance url 2022-09-28 14:16:39 +02:00
dimitrispie bdc46e3eaa Remove denormalization of results to fix downloads numbers in monitor 2022-09-28 14:59:08 +03:00
dimitrispie 2ebb1459a9 Fixed type in no_downloads 2022-09-28 14:36:57 +03:00
Miriam Baglioni b5b5a4c192 [CleanCountry] fixed issue 2022-09-28 12:42:51 +02:00
Miriam Baglioni f1d7d45cf7 [BulkTag] fixed issue 2022-09-28 12:01:43 +02:00
Miriam Baglioni 3ec044600d [BulkTag] fixed conflicts 2022-09-28 11:58:28 +02:00
Miriam Baglioni 1cb79719a7 [BulkTag] fixed issues 2022-09-28 11:44:55 +02:00
Claudio Atzori f3f7604e6c trying to fix a test that fails only on Jenkins 2022-09-27 15:21:37 +02:00
Claudio Atzori 3f90d159e3 code formatting 2022-09-27 15:08:00 +02:00
Claudio Atzori 0b3e44e521 Merge branch 'beta' into relation-from-odf 2022-09-27 14:57:01 +02:00
Claudio Atzori 57dbeb08d2 code formatting 2022-09-27 14:55:10 +02:00
Claudio Atzori b60985cf68 Merge branch 'beta' into horizontalConstraints 2022-09-27 14:39:31 +02:00
Claudio Atzori 3b60642ef9 Merge pull request 'Synchronize indicators in stats-db with monitor-db' (#249) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #249
2022-09-27 14:37:33 +02:00
Claudio Atzori 25e9d92aad Merge branch 'beta' into clean_country 2022-09-27 14:27:49 +02:00
Alessia Bardi fd63e9bfac Mapping all relationships supported in ModelConstants and ModelSupport 2022-09-26 11:24:13 +02:00
Miriam Baglioni ca216a92ad [BulkTagging] changed the query to the IS to insert values for FOS and SDG as subject in the configuration used for the tagging 2022-09-23 17:06:07 +02:00
Miriam Baglioni 3e6b0f58bb [BulkTagging] changed the query to the IS to get also the information for the advancedConstraint from the profile 2022-09-23 16:47:19 +02:00
Miriam Baglioni 4a3e119b73 merging with branch beta 2022-09-23 16:16:06 +02:00
Miriam Baglioni f0e303abf9 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2022-09-23 16:15:32 +02:00
Miriam Baglioni 55da4d8715 [BulkTagging] modifying code to represent constraints horizontally on all the results. Added subject to the set of fields used to express the constraint. Modified resources to test the new approach. Modified test class 2022-09-23 16:02:19 +02:00
Alessia Bardi c5eb722170 relationships from relatedIdentifier whose target id type is one of the pid type with an authority 2022-09-23 15:47:05 +02:00
Claudio Atzori c86cc53520 suppressing hyper verbose spark logs during unit test execution 2022-09-23 15:20:40 +02:00
Alessia Bardi ba33ff71fd refactoring for the generation of relationships from related identifier of type 'OPENAIRE' 2022-09-23 15:17:13 +02:00
Alessia Bardi 982bcc1e35 test wrid pid and record identifier 2022-09-23 12:06:06 +02:00
Miriam Baglioni 960cb861a0 refactoring 2022-09-23 11:14:04 +02:00
Claudio Atzori c42850328e fixed semantic (subreltype) for ServiceOrganization relations 2022-09-22 16:23:25 +02:00
Miriam Baglioni 33bb79459e Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2022-09-22 15:55:17 +02:00
dimitrispie dcd85f8cd7 - Synchronize indicators in stats-db with monitor-db
- added new openorg id for Nanyang Technological University
- changed openorg id for University of Helsinki #8088 ticket
2022-09-22 13:33:07 +03:00
Claudio Atzori e45ec15221 Merge branch 'beta' into clean_country 2022-09-19 11:34:02 +02:00
Claudio Atzori 26e1badded added instance.url syntactical validation, avoid creating multiple duplicated URLs 2022-09-19 11:19:10 +02:00
Miriam Baglioni 5240ac3d7b [EOSC Tag] remove addition of eosc context for result with eosc if guidelines set 2022-09-19 11:02:18 +02:00
Claudio Atzori 192215a18e merged from branch discard-non-wellformed 2022-09-19 10:17:10 +02:00
Claudio Atzori e370e940d8 [aggregator graph] save invalid records aside for further inspection 2022-09-16 14:06:28 +02:00
Claudio Atzori 465e941214 Merge pull request '[stats wf] Changes to indicators tables' (#244) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #244
2022-09-16 10:13:58 +02:00
Claudio Atzori 1e42d984e1 [aggregator graph] save invalid records aside for further inspection 2022-09-15 10:49:42 +02:00
Alessia Bardi 9e7ec4198f fixed test 2022-09-14 18:08:56 +02:00
Claudio Atzori c48f6e9c57 [aggregator graph] save invalid records aside for further inspection 2022-09-14 17:11:26 +02:00
dimitrispie 3bf3127251 Changes to monitor and indicator scripts 2022-09-14 16:36:19 +03:00
Claudio Atzori a0919ed495 [aggregator graph] save invalid records aside for further inspection 2022-09-14 13:27:39 +02:00
Alessia Bardi b99a011345 return empty Oaf list if record cannot be parsed 2022-09-13 11:51:55 +02:00
Alessia Bardi 27af5122d2 logs for non well formed XML files 2022-09-12 14:25:23 +02:00
Claudio Atzori ff6f789b6d code formatting 2022-09-09 15:16:31 +02:00
Claudio Atzori b5d6966c01 Merge branch 'beta' into clean_country 2022-09-09 12:20:19 +02:00
Claudio Atzori b5f7bd30be Merge branch 'beta' into clean_subjects 2022-09-09 12:20:04 +02:00
Alessia Bardi f14107ad77 Merge branch 'handle_as_instance_urls' of https://code-repo.d4science.org/D-Net/dnet-hadoop into handle_as_instance_urls 2022-09-09 12:17:19 +02:00
Alessia Bardi a539c6ccaf https for handle URLs 2022-09-09 12:16:28 +02:00
dimitrispie 71b069ca90 Changes to indicator and monitor scripts 2022-09-09 13:15:58 +03:00
Claudio Atzori 1203378441 Merge branch 'beta' into clean_subjects 2022-09-09 10:38:47 +02:00
Claudio Atzori 14dc909a14 Merge branch 'beta' into clean_country 2022-09-09 10:38:17 +02:00
Claudio Atzori 853c996fa2 Merge branch 'beta' into handle_as_instance_urls 2022-09-09 09:47:16 +02:00
Claudio Atzori a431e01383 Merge pull request 'orcid_multipleworks_download' (#242) from enrico.ottonello/dnet-hadoop:orcid_multipleworks_download into beta
Reviewed-on: #242
2022-09-09 08:45:02 +02:00
Alessia Bardi 9ef063d502 #7861#note-8 instance url from handle 2022-09-07 17:29:54 +03:00
Alessia Bardi 5c45d52af3 testing for RiuNet 2022-09-07 15:40:57 +03:00
dimitrispie 2b5f8c9c9a comment out duplicate table creation 2022-09-06 12:27:53 +03:00
Alessia Bardi a11eb38065 testing for RO-Hub 2022-09-02 16:07:36 +02:00
Enrico Ottonello bfdf2dc390 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid_multipleworks_download 2022-08-25 12:07:54 +02:00
Enrico Ottonello da1cf561e6 alignment with beta 2022-08-25 11:57:20 +02:00
Enrico Ottonello 27445ccdaa cleaned log 2022-08-25 11:56:14 +02:00
Claudio Atzori b7c387c21f cleaning of subjects: avoid duplicated subjects, prioritise collected vs inferred or other sources 2022-08-12 15:09:16 +02:00
Claudio Atzori adb526b0e1 Merge branch 'beta' into clean_subjects 2022-08-12 10:51:17 +02:00
Claudio Atzori cb7c07c54e [scholix] added step to create tar archive 2022-08-11 11:25:24 +02:00
Claudio Atzori 2aa16d0432 [scholix] fixed OpenCitation dump procedure 2022-08-10 17:39:29 +02:00
Miriam Baglioni 7dbdd4a0fe [Clean Country] changes related to #241 (comment) 2022-08-10 15:13:10 +02:00
Claudio Atzori 51ad93e545 [scholix] fixed OpenCitation dump procedure 2022-08-10 11:57:56 +02:00
Miriam Baglioni 62d2138806 [Clean Context] changed the logic a bit. Added the check not to have results hosted by a datasource of type institutional repository from NL. Added also the check that the country should have been included in the result via propagation for it to be removed 2022-08-08 14:10:47 +02:00
Claudio Atzori 3418ce50ac cleaning of subjects: perform the cleaning when the given value is equivalent to one of the terms in the vocabulary 2022-08-08 12:48:47 +02:00
Claudio Atzori a78028dabc Merge branch 'beta' into clean_subjects 2022-08-08 12:34:33 +02:00
Miriam Baglioni 390013a4b2 merging with branch beta 2022-08-08 12:30:31 +02:00
Claudio Atzori 3937ff04de Merge branch 'beta' into tagEosc 2022-08-08 09:57:23 +02:00
Claudio Atzori a4815f6bec Merge branch 'beta' into clean_subjects 2022-08-05 16:57:03 +02:00
Claudio Atzori 29c4cde42e Merge branch 'clean_subjects' of https://code-repo.d4science.org/D-Net/dnet-hadoop into clean_subjects 2022-08-05 16:56:37 +02:00
Claudio Atzori 4eaa063b1f cleaning of subjects 2022-08-05 16:56:09 +02:00
Claudio Atzori 84598c7535 Merge pull request 'restored some collab indicators' (#240) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #240
2022-08-05 15:50:39 +02:00
Antonis Lempesis fcef5294e2 restored some collab indicators 2022-08-05 13:45:01 +03:00
Claudio Atzori 844f6eb465 Merge branch 'beta' into clean_subjects 2022-08-05 12:39:05 +02:00
Claudio Atzori 32cee1f619 WIP: cleaning of subjects 2022-08-05 12:32:08 +02:00
Claudio Atzori c1f2ffc53d Merge pull request 'commenting out the collab indicators because they still fail' (#237) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #237
2022-08-05 11:57:36 +02:00
Antonis Lempesis 227e10f4b3 commenting out the collab indicators because they still fail 2022-08-05 12:54:36 +03:00
Claudio Atzori 6c0fd9284b merge from beta 2022-08-05 10:42:53 +02:00
Claudio Atzori b78889a0ce WIP: cleaning of subjects 2022-08-05 09:11:37 +02:00
Miriam Baglioni a7a18d7630 [Graph Dump] removed code for the dump from the project. Fixed issues in tests when possible 2022-08-04 17:40:40 +02:00
Claudio Atzori 499826ead1 serialising the eoscifguidelines field in the Solr XML records 2022-08-04 12:40:48 +02:00
Claudio Atzori 27a91841e7 WIP: cleaning of subjects 2022-08-04 11:39:39 +02:00
Antonis Lempesis b09d7ddc74 fixed the datasourceOrganization relations 2022-08-03 12:26:50 +02:00
Claudio Atzori e62018e95d [aggregator graph] added more assertions in test 2022-08-03 12:26:05 +02:00
Claudio Atzori efd96e7e66 Merge pull request 'fixed the datasourceOrganization relations' (#233) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #233
2022-08-03 12:25:05 +02:00
Antonis Lempesis 8b0407d8ec fixed the datasourceOrganization relations 2022-08-03 12:26:59 +03:00
Claudio Atzori eb53b52f7c code formatting 2022-08-02 13:24:47 +02:00
Claudio Atzori 27681cf6bf Merge pull request '[stats wf] latest version of indicators + added FOS classification' (#232) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #232
2022-08-02 12:57:15 +02:00
Antonis Lempesis 1778d40c40 latest version of indicators 2022-08-02 13:39:34 +03:00
Claudio Atzori 209c7e9dab [datacite] avoid UnsupportedOperationException 2022-08-01 09:05:35 +02:00
Enrico Ottonello 64311b8be4 removed unuseful accumulator 2022-07-31 01:03:29 +02:00