Commit Graph

5229 Commits

Author SHA1 Message Date
Giambattista Bloisi 85c1eae7e0 Fixes for pagination strategy looping at end of download 2024-06-10 19:03:58 +02:00
Claudio Atzori b0eba210c0 [actionset promotion] use sparkExecutorMemory to define also the memoryOverhead 2024-06-10 16:15:24 +02:00
Claudio Atzori 3776327a8c hostedby patching to work with the updated Crossref contents, resolved conflict 2024-06-10 15:24:12 +02:00
Claudio Atzori 0139f23d66 Merge pull request 'organization type from OpenOrgs' (#445) from import_openorg_type into beta
Reviewed-on: D-Net/dnet-hadoop#445
2024-06-07 12:17:31 +02:00
Michele Artini c726572418 changed some parameters in OSF test 2024-06-07 12:03:26 +02:00
Claudio Atzori ec79405cc9 [graph raw] set organization type from openorgs 2024-06-07 11:30:31 +02:00
Miriam Baglioni 1477406ecc [bulkTag] fixed issue that made project disappear in graph_10_enriched 2024-06-06 10:45:41 +02:00
Claudio Atzori 92c3abd5a4 [graph cleaning] use sparkExecutorMemory to define also the memoryOverhead 2024-06-06 10:44:33 +02:00
Claudio Atzori ce2364743a applying changes from PR#442: Fix for missing collectedfrom after dedup 2024-06-06 10:43:43 +02:00
Claudio Atzori f70dc76b61 minor 2024-06-06 10:43:10 +02:00
Claudio Atzori 73bd1938a5 [graph2hive] use sparkExecutorMemory to define also the memoryOverhead 2024-06-05 12:17:35 +02:00
Claudio Atzori da5c1e73a4 Merge pull request 'Irish oaipmh exporter' (#443) from irish-oaipmh-exporter into beta
Reviewed-on: D-Net/dnet-hadoop#443
2024-06-05 10:55:09 +02:00
Claudio Atzori 81090ad593 [IE OAIPHM] added oozie workflow, minor changes, code formatting 2024-06-05 10:03:33 +02:00
Claudio Atzori a02f3f0d2b code formatting 2024-05-30 10:21:18 +02:00
Alessia Bardi eadfd8d71d Merge pull request 'Updated XMLIterator for splitting on different nodes' (#436) from dblp_collection_plugin into beta
Reviewed-on: D-Net/dnet-hadoop#436
2024-05-29 16:05:06 +02:00
Alessia Bardi 05ee783c07 Merge branch 'beta' into dblp_collection_plugin 2024-05-29 16:04:39 +02:00
Alessia Bardi fe9fb59c90 Merge pull request 'Rest collector plugin on hadoop supports a new param to pass request headers' (#441) from rest-collector-request-header-map into beta
Reviewed-on: D-Net/dnet-hadoop#441
2024-05-29 15:54:39 +02:00
Claudio Atzori c272c4ad68 code formatting 2024-05-29 15:50:07 +02:00
Alessia Bardi c5f4da16a4 Merge branch 'beta' into rest-collector-request-header-map 2024-05-29 15:46:23 +02:00
Alessia 1b165a14a0 Rest collector plugin on hadoop supports a new param to pass request headers 2024-05-29 15:41:36 +02:00
Michele Artini e996787be2 OSF test 2024-05-29 15:05:17 +02:00
Claudio Atzori 62716141c5 Merge pull request 'Miscellaneous updates to the copying operation to Impala Cluster' (#440) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#440
2024-05-29 14:34:51 +02:00
Miriam Baglioni 5d85b70e1f [NOAMI] removed Ireland funder id 501100011103. ticket 9635 2024-05-29 11:55:00 +02:00
Lampros Smyrnaios e3f28338c1 Miscellaneous updates to the copying operation to Impala Cluster:
- Assign the WRITE and EXECUTE permissions to the DBs' HDFS-directories, in order to be able to create tables on top of them, in the Impala Cluster.
- Make sure the "copydb" function returns early, when it encounters a fatal error, while respecting the "SHOULD_EXIT_WHOLE_SCRIPT_UPON_ERROR" config.
2024-05-28 17:51:45 +03:00
Giambattista Bloisi 73316d8c83 Add jaxb and jaxws dependencies when compiling with spark-34 profile as they are required to run with jdk > 8 2024-05-28 14:14:51 +02:00
Miriam Baglioni 75d5ddb999 Update to include a blackList that filters out the results we know are wrongly associated to IE - update workflow definition - the blacklist parameter 2024-05-27 12:01:28 +02:00
Miriam Baglioni 87c9c61b41 Update to include a blackList that filters out the results we know are wrongly associated to IE - refactoring 2024-05-27 12:01:16 +02:00
Miriam Baglioni b55fed09f8 Update to include a blackList that filters out the results we know are wrongly associated to IE 2024-05-27 12:01:01 +02:00
Claudio Atzori 107d958b89 [org dedup] avoid NPEs in SparkPrepareNewOrgs 2024-05-27 11:59:54 +02:00
Claudio Atzori 3a7a6ecc32 [org dedup] avoid NPEs in SparkPrepareOrgRels 2024-05-27 11:59:45 +02:00
Claudio Atzori 1af4224d3d [org dedup] avoid NPEs in SparkPrepareOrgRels 2024-05-27 11:59:33 +02:00
Claudio Atzori 0d5bdb2db0 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2024-05-27 11:59:02 +02:00
Claudio Atzori 66548e6a83 Merge pull request 'changes in copy script' (#438) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#438
2024-05-27 11:54:03 +02:00
Antonis Lempesis 15b54a345a added fos lvl4 2024-05-24 13:21:28 +03:00
Lampros Smyrnaios b48ed6e617 Change configuration in the copy-operation to Impala Cluster:
Set the "SHOULD_EXIT_WHOLE_SCRIPT_UPON_ERROR" parameter to "false".
2024-05-23 16:58:12 +03:00
Lampros Smyrnaios 68322843e2 Small updates to the copy-operation to Impala Cluster:
- Add a configuration-"switch" to control whether the script exits upon an error or not.
- Allow the script to exit when a table could not be created.
- Show the elapsed time for processing each database.
2024-05-23 15:07:49 +03:00
Lampros Smyrnaios c7b32bbacc Update CopyDataToImpalaCluster:
Update the code of acquiring the entities from Ocean cluster, through hive, in order to optimize the process and account for additional reserved keywords in Impala.

Co-authored-by: Antonis Lempesis <antleb@di.uoa.gr>
2024-05-23 13:00:19 +03:00
Giambattista Bloisi 1b2357e10a Merge pull request 'Changes in maven poms to build and test the project using Spark 3.4.x and scala 2.12' (#327) from spark34-integration into beta
Reviewed-on: D-Net/dnet-hadoop#327
2024-05-23 09:20:28 +02:00
Sandro La Bruzzo f1fe363b19 merged again from beta (I hope for the last time) 2024-05-22 11:08:52 +02:00
Sandro La Bruzzo 66c1ffc866 merged again from beta (I hope for the last time) 2024-05-22 11:02:46 +02:00
Claudio Atzori 1ea67eba82 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2024-05-21 13:48:48 +02:00
Claudio Atzori f9fb2fef6e Merge pull request 'Modification of Microsoft Academic Graph Mapping' (#435) from mag_only_doi into beta
Reviewed-on: D-Net/dnet-hadoop#435
2024-05-21 13:48:42 +02:00
Claudio Atzori 834461ba26 [graph provision]fixed wf definition, revised serialization of the usage counts measures 2024-05-21 13:48:06 +02:00
Sandro La Bruzzo e8a61d5dd5 removed plugin, use only FileGZip plugin 2024-05-21 13:45:29 +02:00
Sandro La Bruzzo ca9414b737 Implement multiple node name splitter on GZipCollectorPlugin and all nodes that use XMLIterator. If the splitter name contains is a comma separated values it splits for all the values 2024-05-21 09:11:13 +02:00
Sandro La Bruzzo 032bcc8279 since last beta workflow we decide to introduce in the graph only MAG item with DOI and set them invisible ( this should be the same behaviour of the previous DOIBoost mapping).
This commit apply this type of mapping
2024-05-20 09:24:15 +02:00
Sandro La Bruzzo 103e2652b3 merged beta 2024-05-17 14:43:07 +02:00
Sandro La Bruzzo a87f9ea643 fixed scholexplorer bug 2024-05-17 14:16:43 +02:00
Sandro La Bruzzo 6efab4d88e fixed scholexplorer bug 2024-05-16 16:19:18 +02:00
Claudio Atzori 92f018d196 [graph provision] fixed path pointing to an intermediate data store in the working directory 2024-05-15 15:39:18 +02:00