Commit Graph

  • b19643f6eb Dedup aliases, created when a dedup in a previous build has been merged into a new dedup, need to be marked as "deletedbyinference", since they are "merged" in the new dedup Giambattista Bloisi 2024-02-08 15:12:16 +0100
  • e6bdee86d1 Merge pull request 'Support for the PromoteAction strategy' (#389) from promote_actions_join_type into beta Claudio Atzori 2024-02-08 15:08:05 +0100
  • dd4c27f4f3 added 2 new institutions in monitor Antonis Lempesis 2024-02-08 12:57:57 +0200
  • 38c9001147 fixed import of ORPs stored on HDFS in the internal graph format (e.g. Datacite) Claudio Atzori 2024-02-07 17:02:05 +0100
  • fd17c1f17c [actionsets] fixed join type Claudio Atzori 2024-02-05 16:55:36 +0200
  • 009dcf6aea [actionsets] introduced support for the PromoteAction strategy Claudio Atzori 2024-02-05 16:43:40 +0200
  • 4fd242155e Use the "nameservice1" virtual name which should resolve automatically to the active node in the Ocean Cluster. Lampros Smyrnaios 2024-02-05 16:38:27 +0200
  • ad99940c3b Upgrade the script to copy stats-DB from Ocean to Impala cluster: use the "Hive" CLI in the Ocean Cluster, as "impala-shell" will be removed; select the active node of the Impala Cluster before performing any actions there; invalidate the metadata of tables after creating their schema or filling them with data; code polishing. Lampros Smyrnaios 2024-02-05 15:18:48 +0200
  • bb82052c40 [graph cleaning] rule out datasources without an officialname Claudio Atzori 2024-02-05 14:59:06 +0200
  • aaa7d3cf86 Merge remote-tracking branch 'main/beta' into continuous_validation2 Lampros Smyrnaios 2024-02-05 11:57:45 +0200
  • 42f5506306 [orcid enrichment] fixed directory cleanup before distcp Claudio Atzori 2024-02-05 09:44:56 +0200
  • cbe7c6734a Add documentation; code polishing/cleanup. Lampros Smyrnaios 2024-02-02 14:18:46 +0200
  • b5f4d37827 Merge branch 'beta' of https://code-repo.d4science.org/lsmyrnaios/dnet-hadoop into continuous_validation2 Lampros Smyrnaios 2024-01-31 13:01:42 +0200
  • f2a08d8cc2 test for Italian records from IRS repositories Alessia Bardi 2024-01-30 19:20:14 +0100
  • a512ead447 changed ORCID ids to all uppercase Antonis Lempesis 2024-01-30 16:54:47 +0200
  • 07a373a0bd [bulkTagging] removing checks while performing the substring action so that an exception is thrown if the parameters are set incorrectly Miriam Baglioni 2024-01-30 13:51:11 +0100
  • ead08b0dd4 merging with branch beta Miriam Baglioni 2024-01-30 12:19:10 +0100
  • bb10a22290 merged changes from dnet-hadoop Antonis Lempesis 2024-01-29 21:51:47 +0200
  • a5995ab557 [orcid-enrichment] change the value of parameters. Miriam Baglioni 2024-01-29 18:19:48 +0100
  • a418dacb47 [UsageCount] extended the code to also include the name of the datasource Miriam Baglioni 2024-01-29 18:12:33 +0100
  • e9131f4e4a merging with branch beta Miriam Baglioni 2024-01-29 16:27:18 +0100
  • 9aebca77a0 Added exception throwing in Hadoop transformation when TR is not syntactically valid Sandro La Bruzzo 2024-01-29 14:41:02 +0100
  • f804c58bc7 Merge pull request 'Use SparkSQL in place of Hive for executing step16-createIndicatorsTables.sql of stats update wf' (#386) from stats_with_spark_sql into beta Claudio Atzori 2024-01-29 09:11:59 +0100
  • 926903b06b Merge branch 'beta' into stats_with_spark_sql Claudio Atzori 2024-01-29 09:11:45 +0100
  • 078df0b4d1 Use SparkSQL in place of Hive for executing step16-createIndicatorsTables.sql of stats update wf Giambattista Bloisi 2024-01-26 20:19:52 +0100
  • bf99c424fa Merge pull request 'Fixed problem on missing author in crossref Mapping' (#383) from crossref_missing_author_fix into beta Claudio Atzori 2024-01-26 15:57:23 +0100
  • ce3200263e Merge branch 'beta' into crossref_missing_author_fix Claudio Atzori 2024-01-26 15:57:04 +0100
  • e889808daa Fixed problem on missing author in crossref Mapping Sandro La Bruzzo 2024-01-26 12:19:04 +0100
  • 9e8fc6aa88 [collection] increased logging from the oai-pmh metadata collection process Claudio Atzori 2024-01-26 09:17:20 +0100
  • c548796463 Changed step16-createIndicatorsTables to use a spark oozie action instead of hive Antonis Lempesis 2024-01-26 02:04:48 +0200
  • 0386f36385 Added workflow to update ORCID and replaced some parsing, because the works and employments XML in the update differs from that in the dump. Sandro La Bruzzo 2024-01-25 19:40:59 +0100
  • 8d0ed7d414 Continuous-Validation updates: update the "uoa-validator-engine2" dependency; update the "installProject.sh" script to account for potential conflicts with previous builds; add documentation. Lampros Smyrnaios 2024-01-25 18:23:14 +0200
  • a7115cfa9e max mem of joins (hive.mapjoin.followby.gby.localtask.max.memory.usage) now 80%, up from 55%. Antonis Lempesis 2024-01-25 15:06:34 +0100
  • fd43b0e84a max mem of joins (hive.mapjoin.followby.gby.localtask.max.memory.usage) now 80%, up from 55%. Antonis Lempesis 2024-01-25 15:06:34 +0100
  • 2838a9b630 Update 'CONTRIBUTING.md' Claudio Atzori 2024-01-24 16:07:05 +0100
  • da944a5c55 Merge pull request 'code of conduct and contributing' (#382) from contributing into beta Claudio Atzori 2024-01-24 15:40:26 +0100
  • eee085c104 Force a check for updated releases and snapshots on remote repositories, when building the Continuous-Validation workflow. Lampros Smyrnaios 2024-01-24 14:08:32 +0200
  • 0c97a3a81a minor Claudio Atzori 2024-01-24 10:56:33 +0100
  • 2c1e6849f0 added code of conduct and contributing files Claudio Atzori 2024-01-24 10:36:41 +0100
  • 9b13c22e5d [graph provision] retrieve all the context information by adding all=true to the requests issued to the API Claudio Atzori 2024-01-23 15:36:08 +0100
  • 3e96777cc4 [collection] increased logging from the oai-pmh metadata collection process Claudio Atzori 2024-01-23 15:21:03 +0100
  • 43e0bba7ed added logging during download Sandro La Bruzzo 2024-01-23 15:04:49 +0100
  • f7d06dc661 compilation after merging Miriam Baglioni 2024-01-23 11:43:08 +0100
  • 6e58d79623 merging with branch beta Miriam Baglioni 2024-01-23 11:36:47 +0100
  • e0ec800d7e [BulkTagging] extend the definition of the pathMap to also include actions that should be performed on the value extracted from the result before applying the constraint Miriam Baglioni 2024-01-23 11:34:53 +0100
  • 9812406589 Merge pull request '[graph provision] updated param specification for the XML converter job' (#380) from provision_community_api into beta Claudio Atzori 2024-01-23 08:55:59 +0100
  • f87f3a6483 [graph provision] updated param specification for the XML converter job Claudio Atzori 2024-01-23 08:54:37 +0100
  • 6fd25cf549 code formatting Claudio Atzori 2024-01-23 08:47:12 +0100
  • bd187ec6e7 Merge pull request 'Implements pivots table update oozie workflow' (#376) from update_pivots_table into beta Claudio Atzori 2024-01-22 16:37:30 +0100
  • f76852f385 Merge branch 'beta' into update_pivots_table Claudio Atzori 2024-01-22 16:37:22 +0100
  • b9fcc5ad5e Merge pull request 'Context API update' (#379) from provision_community_api into beta Claudio Atzori 2024-01-22 15:55:33 +0100
  • 1c6db320f4 [graph provision] obtain context info from the context API instead of from the ISLookUp service Claudio Atzori 2024-01-22 15:53:17 +0100
  • 2655eea5bc [orcid enrichment] drop paths before copying the unmodified contents Claudio Atzori 2024-01-19 16:28:05 +0100
  • c6b3401596 increased shuffle partitions for publications in the country propagation workflow Claudio Atzori 2024-01-19 10:15:39 +0100
  • bcc0a13981 [enrichment single step] adding <end> element in wf definition Miriam Baglioni 2024-01-18 17:39:14 +0100
  • ff47a941f5 Add the "installProject.sh" script; show the Job-ID or potential deployment error logs right after the deployment of the workflow; code polishing. Lampros Smyrnaios 2024-01-18 18:06:50 +0200
  • 6af536541d [enrichment single step] moving parameter file in correct location Miriam Baglioni 2024-01-18 15:35:40 +0100
  • a12a3eb143 - Miriam Baglioni 2024-01-18 15:18:10 +0100
  • 00644ef487 Fix the "NoSuchFieldError", caused by library conflicts, by introducing the "oozie.libpath" property in "workflow.xml"; fix the value of the "outputPath" property in "workflow.xml". Lampros Smyrnaios 2024-01-18 15:46:27 +0200
  • 628fdfb5eb Merge pull request '[enrichment single step]' (#378) from enrichmentSingleStepFixed into beta Claudio Atzori 2024-01-18 09:41:09 +0100
  • 82e9e262ee [enrichment single step] remove parameter from execution Miriam Baglioni 2024-01-17 17:38:03 +0100
  • 23ec57c670 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into continuous_validation2 Lampros Smyrnaios 2024-01-17 18:16:11 +0200
  • c17834dddf Use "KryoSerializer" for Spark and register some "result" classes; code polishing and cleanup. Lampros Smyrnaios 2024-01-17 18:08:05 +0200
  • 67ce2d54be [enrichment single step] refactoring to fix issues in disappeared result type Miriam Baglioni 2024-01-17 16:50:00 +0100
  • 59eaccbd87 [enrichment single step] refactoring to fix issue in disappeared result type Miriam Baglioni 2024-01-15 17:49:54 +0100
  • 21a14fcd80 Reusable RunSQLSparkJob for executing SQL in Spark through Oozie Spark Actions; implements the pivots table update Oozie workflow Giambattista Bloisi 2024-01-15 00:08:07 +0100
  • e0753f19da Fixed connection timeout error Sandro La Bruzzo 2024-01-13 09:27:08 +0100
  • e328bc0ade fixed missing parameter on download update sandro.labruzzo 2024-01-12 16:18:20 +0100
  • 2d302e6827 Merge pull request '[FoS integration]fix issue on FoS integration. Removing the null values from FoS' (#375) from fosPreparationBeta into beta Claudio Atzori 2024-01-12 10:27:28 +0100
  • f612125939 fix issue on FoS integration. Removing the null values from FoS Miriam Baglioni 2024-01-12 10:20:28 +0100
  • c67467723b Merge pull request 'refined mapping for the extraction of the original resource type' (#374) from resource_types into beta Claudio Atzori 2024-01-11 16:29:47 +0100
  • cb9e739484 Merge branch 'beta' into resource_types Claudio Atzori 2024-01-11 16:29:41 +0100
  • 2753044d13 refined mapping for the extraction of the original resource type Claudio Atzori 2024-01-11 16:28:26 +0100
  • a88dce5bf3 Merge pull request 'Improvements and refactoring in Dedup' (#367) from dedup_increasenumofblocks into beta Giambattista Bloisi 2024-01-11 11:24:06 +0100
  • 3c66e3bd7b Create dedup records for "merged" pivots; do not create dedup records for groups that have more than 20 different acceptance dates Giambattista Bloisi 2023-12-22 09:57:30 +0100
  • 10e135db1e Use dedup_wf_002 in place of dedup_wf_001 to make explicit that a different algorithm has been used to generate those kinds of ids Giambattista Bloisi 2023-12-22 09:55:10 +0100
  • 831cc1fdde Generate "merged" dedup id relations also for records that are filtered out by the cut parameters Giambattista Bloisi 2023-12-14 11:51:02 +0100
  • 1287315ffb No longer use dedupId information from the pivotHistory database Giambattista Bloisi 2023-12-11 21:26:05 +0100
  • 02636e802c SparkCreateSimRels: create dedup blocks from the complete queue of records matching the cluster key instead of truncating the results; clean titles once before clustering and similarity comparisons; added support for filtered fields in the model; added support for sorting List fields in the model; added new JSONListClustering and numAuthorsTitleSuffixPrefixChain clustering functions; added new maxLengthMatch comparator function; use reduced-complexity Levenshtein with threshold in levensteinTitle; use reduced-complexity AuthorsMatch with threshold early-quit; use incremental Connected Component to decrease comparisons in similarity match in BlockProcessor; use new clusterings configuration in Dedup tests Giambattista Bloisi 2023-10-02 09:25:12 +0200
  • e024718f73 creating result_instances even when no pids exist for the instance Antonis Lempesis 2024-01-10 22:25:50 +0100
  • 859babf722 added some useful comment Sandro La Bruzzo 2024-01-10 19:51:13 +0100
  • 39ebb60b38 Merge remote-tracking branch 'origin/beta' into orcid_update Sandro La Bruzzo 2024-01-10 19:50:00 +0100
  • 9d5a7c3b22 code refactor Sandro La Bruzzo 2024-01-10 19:42:34 +0100
  • 8f61063201 Added workflow Sandro La Bruzzo 2024-01-10 19:42:22 +0100
  • 1a42a5c10d Implemented Download update of ORCID Sandro La Bruzzo 2024-01-10 18:03:20 +0100
  • 16d858fbf0 Merge pull request 'enrichmentSingleStep' (#373) from enrichmentSingleStep into beta Claudio Atzori 2024-01-10 16:58:49 +0100
  • e711a05229 fixed conflicts Miriam Baglioni 2024-01-10 11:03:42 +0100
  • 71d6f30711 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta Miriam Baglioni 2024-01-10 10:59:58 +0100
  • 32e02247bc Merge branch 'continuous_validation2' of https://code-repo.d4science.org/lsmyrnaios/dnet-hadoop into continuous_validation2 Lampros Smyrnaios 2024-01-09 17:05:07 +0200
  • eaa070f1e6 Code cleanup. Lampros Smyrnaios 2024-01-09 17:03:35 +0200
  • fc35b44e22 bumped version of uoa-validator-engine2 to 0.9.3 Claudio Atzori 2024-01-09 15:56:56 +0100
  • b920307bdd Changes to indicators dimitrispie 2024-01-09 00:47:09 +0200
  • 8b2cbb611e Changes to beta db names dimitrispie 2024-01-09 00:40:56 +0200
  • 2e4cab026c fixed the result_country definition Antonis Lempesis 2024-01-08 16:01:26 +0200
  • 6b823100ae Update buildIrishMonitorDB.sql dimitrispie 2024-01-07 22:54:39 +0200
  • 75bfde043c Historical Snapshots Workflow dimitrispie 2024-01-04 15:11:04 +0200
  • cb14470ba6 added properties file in the folder for the workflow of result to organization from inst repo propagation. Changed the path in the classes implementing the propagation Miriam Baglioni 2023-12-22 14:50:05 +0100
  • 9f966b59d4 added properties file in the folder for the workflow of result to community from semrel propagation. Changed the path in the classes implementing the propagation Miriam Baglioni 2023-12-22 14:11:47 +0100
  • 2f3b5a133d added properties file in the folder for the workflow of result to community from organization propagation. Changed the path in the classes implementing the propagation Miriam Baglioni 2023-12-22 13:56:40 +0100
  • 2f7b9ad815 added properties file in the folder for the workflow of project to result propagation. Changed the path in the classes implementing the propagation Miriam Baglioni 2023-12-22 11:46:15 +0100