Commit Graph

5687 Commits

Author SHA1 Message Date
Claudio Atzori 0139f23d66 Merge pull request 'organization type from OpenOrgs' (#445) from import_openorg_type into beta
Reviewed-on: #445
2024-06-07 12:17:31 +02:00
Michele Artini c726572418 changed some parameters in OSF test 2024-06-07 12:03:26 +02:00
Claudio Atzori ec79405cc9 [graph raw] set organization type from openorgs 2024-06-07 11:30:31 +02:00
Miriam Baglioni 1477406ecc [bulkTag] fixed issue that made project disappear in graph_10_enriched 2024-06-06 10:45:41 +02:00
Claudio Atzori 92c3abd5a4 [graph cleaning] use sparkExecutorMemory to define also the memoryOverhead 2024-06-06 10:44:33 +02:00
Claudio Atzori ce2364743a applying changes from PR#442: Fix for missing collectedfrom after dedup 2024-06-06 10:43:43 +02:00
Claudio Atzori f70dc76b61 minor 2024-06-06 10:43:10 +02:00
Claudio Atzori efc1632e16 code formatting 2024-06-06 09:25:26 +02:00
Claudio Atzori 91b49366c6 [graph provision] align serialisation of the usage count measures to the agreed specifications 2024-06-05 16:34:40 +02:00
Claudio Atzori 5e05385d35 minor 2024-06-05 16:31:58 +02:00
Miriam Baglioni c4d9b5b9d2 [downloadsAndViews] update the test file to consider the new serialization for downloads and views 2024-06-05 16:30:15 +02:00
Miriam Baglioni bf9a5e6314 [downloadsAndViews] changed the test file to check the indicators are not there if their value is 0 2024-06-05 16:29:40 +02:00
Miriam Baglioni 9d79ddb3dd [bulkTag] fixed issue that made project disappear in graph_10_enriched 2024-06-05 16:20:40 +02:00
Miriam Baglioni 907aa28c6c [downloadsAndViews] fixed issue 2024-06-05 16:19:29 +02:00
Miriam Baglioni 3955ceaa76 [downloadsAndViews] changed the serialization for downloads and views 2024-06-05 16:18:46 +02:00
Miriam Baglioni 128c143394 [downloadsAndViews] extended test file with measures for downloads and views 2024-06-05 16:17:59 +02:00
Claudio Atzori 5133993ee5 Merge branch 'beta_to_master_may2024' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta_to_master_may2024 2024-06-05 12:17:48 +02:00
Claudio Atzori 73bd1938a5 [graph2hive] use sparkExecutorMemory to define also the memoryOverhead 2024-06-05 12:17:35 +02:00
Claudio Atzori 5cf259a851 [graph2hive] use sparkExecutorMemory to define also the memoryOverhead 2024-06-05 12:17:16 +02:00
Claudio Atzori e1828fc60e Merge pull request '[PROD] Irish oaipmh exporter' (#444) from irish-oaipmh-exporter into beta_to_master_may2024
Reviewed-on: #444
2024-06-05 10:56:20 +02:00
Claudio Atzori da5c1e73a4 Merge pull request 'Irish oaipmh exporter' (#443) from irish-oaipmh-exporter into beta
Reviewed-on: #443
2024-06-05 10:55:09 +02:00
Claudio Atzori 81090ad593 [IE OAIPMH] added oozie workflow, minor changes, code formatting 2024-06-05 10:03:33 +02:00
Claudio Atzori 56920b447d Merge pull request 'Fix for missing collectedfrom after dedup' (#442) from fix_mergedcliquesort into beta_to_master_may2024
Reviewed-on: #442
2024-06-03 15:34:01 +02:00
Giambattista Bloisi 3feab5d92d Fix MergeUtils.mergeGroup: it could get rid of some records and did not consider all PID authorities while sorting records.
ResultTypeComparator is now renamed to MergeEntitiesComparator and can be used as a general comparator for merging groups of records
2024-06-03 15:13:40 +02:00
Claudio Atzori a02f3f0d2b code formatting 2024-05-30 10:21:18 +02:00
Alessia Bardi eadfd8d71d Merge pull request 'Updated XMLIterator for splitting on different nodes' (#436) from dblp_collection_plugin into beta
Reviewed-on: #436
2024-05-29 16:05:06 +02:00
Alessia Bardi 05ee783c07 Merge branch 'beta' into dblp_collection_plugin 2024-05-29 16:04:39 +02:00
Alessia Bardi fe9fb59c90 Merge pull request 'Rest collector plugin on hadoop supports a new param to pass request headers' (#441) from rest-collector-request-header-map into beta
Reviewed-on: #441
2024-05-29 15:54:39 +02:00
Claudio Atzori c272c4ad68 code formatting 2024-05-29 15:50:07 +02:00
Alessia Bardi c5f4da16a4 Merge branch 'beta' into rest-collector-request-header-map 2024-05-29 15:46:23 +02:00
Alessia 1b165a14a0 Rest collector plugin on hadoop supports a new param to pass request headers 2024-05-29 15:41:36 +02:00
Michele Artini e996787be2 OSF test 2024-05-29 15:05:17 +02:00
Claudio Atzori 6be783caec [graph cleaning] use sparkExecutorMemory to define also the memoryOverhead 2024-05-29 14:36:49 +02:00
Claudio Atzori 62716141c5 Merge pull request 'Miscellaneous updates to the copying operation to Impala Cluster' (#440) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #440
2024-05-29 14:34:51 +02:00
Claudio Atzori b703f94f09 Merge pull request 'changes in copy script - beta2master' (#439) from antonis.lempesis/dnet-hadoop:beta into beta_to_master_may2024
Reviewed-on: #439
2024-05-29 14:29:26 +02:00
Miriam Baglioni 5d85b70e1f [NOAMI] removed Ireland funder id 501100011103. ticket 9635 2024-05-29 11:55:00 +02:00
Miriam Baglioni 14f275ffaf [NOAMI] removed Ireland funder id 501100011103. ticket 9635 2024-05-29 11:54:17 +02:00
Claudio Atzori a428e7be7e graph cleaning to implement ugly hardcoded rules, avoid NPEs 2024-05-29 09:26:12 +02:00
Lampros Smyrnaios e3f28338c1 Miscellaneous updates to the copying operation to Impala Cluster:
- Assign the WRITE and EXECUTE permissions to the DBs' HDFS-directories, in order to be able to create tables on top of them, in the Impala Cluster.
- Make sure the "copydb" function returns early, when it encounters a fatal error, while respecting the "SHOULD_EXIT_WHOLE_SCRIPT_UPON_ERROR" config.
2024-05-28 17:51:45 +03:00
Claudio Atzori 8e45c5baa8 graph cleaning to implement ugly hardcoded rules 2024-05-28 15:28:42 +02:00
Claudio Atzori db5e18c784 hostedby patching to work with the updated Crossref contents 2024-05-28 15:28:13 +02:00
Giambattista Bloisi 73316d8c83 Add jaxb and jaxws dependencies when compiling with spark-34 profile as they are required to run with jdk > 8 2024-05-28 14:14:51 +02:00
Miriam Baglioni 75d5ddb999 Update to include a blackList that filters out the results we know are wrongly associated to IE - update workflow definition - the blacklist parameter 2024-05-27 12:01:28 +02:00
Miriam Baglioni 87c9c61b41 Update to include a blackList that filters out the results we know are wrongly associated to IE - refactoring 2024-05-27 12:01:16 +02:00
Miriam Baglioni b55fed09f8 Update to include a blackList that filters out the results we know are wrongly associated to IE 2024-05-27 12:01:01 +02:00
Claudio Atzori 107d958b89 [org dedup] avoid NPEs in SparkPrepareNewOrgs 2024-05-27 11:59:54 +02:00
Claudio Atzori 3a7a6ecc32 [org dedup] avoid NPEs in SparkPrepareOrgRels 2024-05-27 11:59:45 +02:00
Claudio Atzori 1af4224d3d [org dedup] avoid NPEs in SparkPrepareOrgRels 2024-05-27 11:59:33 +02:00
Claudio Atzori 0d5bdb2db0 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2024-05-27 11:59:02 +02:00
Claudio Atzori 66548e6a83 Merge pull request 'changes in copy script' (#438) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #438
2024-05-27 11:54:03 +02:00