Commit Graph

  • 9aebca77a0 Added exception throwing in Hadoop transformation when TR is not syntactically valid Sandro La Bruzzo 2024-01-29 14:41:02 +0100
  • f804c58bc7 Merge pull request 'Use SparkSQL in place of Hive for executing step16-createIndicatorsTables.sql of stats update wf' (#386) from stats_with_spark_sql into beta Claudio Atzori 2024-01-29 09:11:59 +0100
  • 926903b06b Merge branch 'beta' into stats_with_spark_sql Claudio Atzori 2024-01-29 09:11:45 +0100
  • 078df0b4d1 Use SparkSQL in place of Hive for executing step16-createIndicatorsTables.sql of stats update wf Giambattista Bloisi 2024-01-26 20:19:52 +0100
  • 4d0c59669b merged changes from beta Claudio Atzori 2024-01-26 16:08:54 +0100
  • bf99c424fa Merge pull request 'Fixed problem on missing author in crossref Mapping' (#383) from crossref_missing_author_fix into beta Claudio Atzori 2024-01-26 15:57:23 +0100
  • ce3200263e Merge branch 'beta' into crossref_missing_author_fix Claudio Atzori 2024-01-26 15:57:04 +0100
  • e889808daa Fixed problem on missing author in crossref Mapping Sandro La Bruzzo 2024-01-26 12:19:04 +0100
  • 9e8fc6aa88 [collection] increased logging from the oai-pmh metadata collection process Claudio Atzori 2024-01-26 09:17:20 +0100
  • c548796463 Changed step16-createIndicatorsTables to use a spark oozie action instead of hive Antonis Lempesis 2024-01-26 02:04:48 +0200
  • 0386f36385 Added workflow to update ORCID and replaced some parsing, because the update works and employments xml differs from the dump one. Sandro La Bruzzo 2024-01-25 19:40:59 +0100
  • a7115cfa9e max mem of joins (hive.mapjoin.followby.gby.localtask.max.memory.usage) now 80%, up from 55%. Antonis Lempesis 2024-01-25 15:06:34 +0100
  • fd43b0e84a max mem of joins (hive.mapjoin.followby.gby.localtask.max.memory.usage) now 80%, up from 55%. Antonis Lempesis 2024-01-25 15:06:34 +0100
  • 2838a9b630 Update 'CONTRIBUTING.md' Claudio Atzori 2024-01-24 16:07:05 +0100
  • da944a5c55 Merge pull request 'code of conduct and contributing' (#382) from contributing into beta Claudio Atzori 2024-01-24 15:40:26 +0100
  • 0c97a3a81a minor Claudio Atzori 2024-01-24 10:56:33 +0100
  • 2c1e6849f0 added code of conduct and contributing files Claudio Atzori 2024-01-24 10:36:41 +0100
  • 9b13c22e5d [graph provision] retrieve all the context information by adding all=true to the requests issued to the API Claudio Atzori 2024-01-23 15:36:08 +0100
  • 3e96777cc4 [collection] increased logging from the oai-pmh metadata collection process Claudio Atzori 2024-01-23 15:21:03 +0100
  • 43e0bba7ed logging added during download Sandro La Bruzzo 2024-01-23 15:04:49 +0100
  • f7d06dc661 compilation after merging Miriam Baglioni 2024-01-23 11:43:08 +0100
  • 6e58d79623 merging with branch beta Miriam Baglioni 2024-01-23 11:36:47 +0100
  • e0ec800d7e [BulkTagging] extend the definition of the pathMap to also include actions that should be performed on the value extracted from the result before applying the constraint Miriam Baglioni 2024-01-23 11:34:53 +0100
  • 9812406589 Merge pull request '[graph provision] updated param specification for the XML converter job' (#380) from provision_community_api into beta Claudio Atzori 2024-01-23 08:55:59 +0100
  • f87f3a6483 [graph provision] updated param specification for the XML converter job Claudio Atzori 2024-01-23 08:54:37 +0100
  • 6fd25cf549 code formatting Claudio Atzori 2024-01-23 08:47:12 +0100
  • bd187ec6e7 Merge pull request 'Implements pivots table update oozie workflow' (#376) from update_pivots_table into beta Claudio Atzori 2024-01-22 16:37:30 +0100
  • f76852f385 Merge branch 'beta' into update_pivots_table Claudio Atzori 2024-01-22 16:37:22 +0100
  • b9fcc5ad5e Merge pull request 'Context API update' (#379) from provision_community_api into beta Claudio Atzori 2024-01-22 15:55:33 +0100
  • 1c6db320f4 [graph provision] obtain context info from the context API instead of from the ISLookUp service Claudio Atzori 2024-01-22 15:53:17 +0100
  • 2655eea5bc [orcid enrichment] drop paths before copying the non-modified contents Claudio Atzori 2024-01-19 16:28:05 +0100
  • c6b3401596 increased shuffle partitions for publications in the country propagation workflow Claudio Atzori 2024-01-19 10:15:39 +0100
  • bcc0a13981 [enrichment single step] adding <end> element in wf definition Miriam Baglioni 2024-01-18 17:39:14 +0100
  • 6af536541d [enrichment single step] moving parameter file in correct location Miriam Baglioni 2024-01-18 15:35:40 +0100
  • a12a3eb143 - Miriam Baglioni 2024-01-18 15:18:10 +0100
  • 628fdfb5eb Merge pull request '[enrichment single step]' (#378) from enrichmentSingleStepFixed into beta Claudio Atzori 2024-01-18 09:41:09 +0100
  • 82e9e262ee [enrichment single step] remove parameter from execution Miriam Baglioni 2024-01-17 17:38:03 +0100
  • 22eaf211e8 Last commit usage-stats-export-wf-v2 dimitrispie 2024-01-17 18:02:33 +0200
  • 67ce2d54be [enrichment single step] refactoring to fix issues in disappeared result type Miriam Baglioni 2024-01-17 16:50:00 +0100
  • 59eaccbd87 [enrichment single step] refactoring to fix issue in disappeared result type Miriam Baglioni 2024-01-15 17:49:54 +0100
  • 21a14fcd80 Reusable RunSQLSparkJob for executing SQL in Spark through Oozie Spark Actions. Implements pivots table update oozie workflow Giambattista Bloisi 2024-01-15 00:08:07 +0100
  • e0753f19da Fixed error of connection timeout Sandro La Bruzzo 2024-01-13 09:27:08 +0100
  • e328bc0ade fixed missing parameter on download update sandro.labruzzo 2024-01-12 16:18:20 +0100
  • 2d302e6827 Merge pull request '[FoS integration]fix issue on FoS integration. Removing the null values from FoS' (#375) from fosPreparationBeta into beta Claudio Atzori 2024-01-12 10:27:28 +0100
  • f612125939 fix issue on FoS integration. Removing the null values from FoS Miriam Baglioni 2024-01-12 10:20:28 +0100
  • c67467723b Merge pull request 'refined mapping for the extraction of the original resource type' (#374) from resource_types into beta Claudio Atzori 2024-01-11 16:29:47 +0100
  • cb9e739484 Merge branch 'beta' into resource_types Claudio Atzori 2024-01-11 16:29:41 +0100
  • 2753044d13 refined mapping for the extraction of the original resource type Claudio Atzori 2024-01-11 16:28:26 +0100
  • a88dce5bf3 Merge pull request 'Improvements and refactoring in Dedup' (#367) from dedup_increasenumofblocks into beta Giambattista Bloisi 2024-01-11 11:24:06 +0100
  • 3c66e3bd7b Create dedup record for "merged" pivots. Do not create dedup records for groups that have more than 20 different acceptance dates Giambattista Bloisi 2023-12-22 09:57:30 +0100
  • 10e135db1e Use dedup_wf_002 in place of dedup_wf_001 to make explicit a different algorithm has been used to generate those kind of ids Giambattista Bloisi 2023-12-22 09:55:10 +0100
  • 831cc1fdde Generate "merged" dedup id relations also for records that are filtered out by the cut parameters Giambattista Bloisi 2023-12-14 11:51:02 +0100
  • 1287315ffb No longer use dedupId information from pivotHistory Database Giambattista Bloisi 2023-12-11 21:26:05 +0100
  • 02636e802c SparkCreateSimRels: Giambattista Bloisi 2023-10-02 09:25:12 +0200
      - Create dedup blocks from the complete queue of records matching cluster key instead of truncating the results
      - Clean titles once before clustering and similarity comparisons
      - Added support for filtered fields in model
      - Added support for sorting List fields in model
      - Added new JSONListClustering and numAuthorsTitleSuffixPrefixChain clustering functions
      - Added new maxLengthMatch comparator function
      - Use reduced complexity Levenshtein with threshold in levensteinTitle
      - Use reduced complexity AuthorsMatch with threshold early-quit
      - Use incremental Connected Component to decrease comparisons in similarity match in BlockProcessor
      - Use new clusterings configuration in Dedup tests
  • e024718f73 creating result_instances even when no pids exist for the instance Antonis Lempesis 2024-01-10 22:25:50 +0100
  • 859babf722 added some useful comment Sandro La Bruzzo 2024-01-10 19:51:13 +0100
  • 39ebb60b38 Merge remote-tracking branch 'origin/beta' into orcid_update Sandro La Bruzzo 2024-01-10 19:50:00 +0100
  • 9d5a7c3b22 code refactor Sandro La Bruzzo 2024-01-10 19:42:34 +0100
  • 8f61063201 Added workflow Sandro La Bruzzo 2024-01-10 19:42:22 +0100
  • 1a42a5c10d Implemented Download update of ORCID Sandro La Bruzzo 2024-01-10 18:03:20 +0100
  • 16d858fbf0 Merge pull request 'enrichmentSingleStep' (#373) from enrichmentSingleStep into beta Claudio Atzori 2024-01-10 16:58:49 +0100
  • e711a05229 fixed conflicts Miriam Baglioni 2024-01-10 11:03:42 +0100
  • 71d6f30711 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta Miriam Baglioni 2024-01-10 10:59:58 +0100
  • b920307bdd Changes to indicators dimitrispie 2024-01-09 00:47:09 +0200
  • 8b2cbb611e Changes to beta db names dimitrispie 2024-01-09 00:40:56 +0200
  • 2e4cab026c fixed the result_country definition Antonis Lempesis 2024-01-08 16:01:26 +0200
  • 6b823100ae Update buildIrishMonitorDB.sql dimitrispie 2024-01-07 22:54:39 +0200
  • 75bfde043c Historical Snapshots Workflow dimitrispie 2024-01-04 15:11:04 +0200
  • cb14470ba6 added properties file in the folder for the workflow of result to organization from inst repo propagation. Changes the path in the classes implementing the propagation Miriam Baglioni 2023-12-22 14:50:05 +0100
  • 9f966b59d4 added properties file in the folder for the workflow of result to community from semrel propagation. Changes the path in the classes implementing the propagation Miriam Baglioni 2023-12-22 14:11:47 +0100
  • 2f3b5a133d added properties file in the folder for the workflow of result to community from organization propagation. Changes the path in the classes implementing the propagation Miriam Baglioni 2023-12-22 13:56:40 +0100
  • 2f7b9ad815 added properties file in the folder for the workflow of project to result propagation. Changes the path in the classes implementing the propagation Miriam Baglioni 2023-12-22 11:46:15 +0100
  • f2352e8a78 changed in the classes the path for the property files for the propagation of community from project Miriam Baglioni 2023-12-22 11:43:34 +0100
  • 009730b3d1 added properties file in the folder for the workflow of orcid propagation. Changes the path in the classes implementing the propagation. Changed the path to the parameter file in the class for entitytoorganization propagation Miriam Baglioni 2023-12-22 11:42:09 +0100
  • 89f269c7f4 changed the path to the parameter file in the class for entitytoorganization propagation Miriam Baglioni 2023-12-22 11:37:50 +0100
  • b06aea0adf adding the bulkTag parameter file in the folder for the oozie workflow for bulkTagging. Changes the path in the class Miriam Baglioni 2023-12-22 11:35:37 +0100
  • 3afd4aa57b adjustments for country propagation Miriam Baglioni 2023-12-22 11:27:30 +0100
  • ffdd03d2f4 Monitor Irish Stats WF dimitrispie 2023-12-22 11:05:24 +0200
  • 40b98d8182 Changes to indicators and funders definition dimitrispie 2023-12-22 10:29:20 +0200
  • 62104790ae added metaresourcetype to the result hive DB view Claudio Atzori 2023-12-21 12:26:19 +0100
  • 106968adaa Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop Claudio Atzori 2023-12-21 12:26:29 +0100
  • a8a4db96f0 added metaresourcetype to the result hive DB view Claudio Atzori 2023-12-21 12:26:19 +0100
  • 5011c4d11a refactoring after compilation Miriam Baglioni 2023-12-20 15:57:26 +0100
  • 4740c808f7 - Miriam Baglioni 2023-12-20 14:26:54 +0100
  • d410ea8a41 added needed parameter Miriam Baglioni 2023-12-19 12:15:01 +0100
  • 37e36baf76 updated workflow for generation of Scholix Datasource's to use mdstore transactions Sandro La Bruzzo 2023-12-18 16:05:35 +0100
  • 624f5f3f21 [Transformative Agreement] added check to verify the APCs were paid by the IReL funder Miriam Baglioni 2023-12-18 15:28:19 +0100
  • 354e02e6a9 [Transformative Agreement] removed unneeded class. Read the json directly, with no need to pass through the csv Miriam Baglioni 2023-12-18 15:20:27 +0100
  • b00771c7cc [Transformative Agreement] added code to extract relations from the transformative agreement file for the IE products got from OpenAPC Miriam Baglioni 2023-12-18 15:12:44 +0100
  • 9d39845d1f uploaded input parameters on CreateBaseline WF Sandro La Bruzzo 2023-12-18 12:23:12 +0100
  • 15fd93a2b6 uploaded input parameters on CreateBaseline WF Sandro La Bruzzo 2023-12-18 12:21:55 +0100
  • 9d342a47da updated the transformation Baseline workflow to include mdstore rollback/commit action Sandro La Bruzzo 2023-12-18 11:48:57 +0100
  • 1fbd4325f5 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop Sandro La Bruzzo 2023-12-18 11:47:17 +0100
  • 1f1a6a5f5f updated the transformation Baseline workflow to include mdstore rollback/commit action Sandro La Bruzzo 2023-12-18 11:47:00 +0100
  • 3eca5d2e1c - Miriam Baglioni 2023-12-18 09:55:27 +0100
  • 01ce0b9c76 [doiboost - preprocess] remove transition to orcid preparation from sequence of steps at the beginning of the workflow Miriam Baglioni 2023-12-15 12:24:55 +0100
  • 0d8e496a63 - Miriam Baglioni 2023-12-15 12:16:43 +0100
  • c4ec35b6cd Merge pull request 'Master branch updates from beta December 2023' (#369) from beta_to_master_dicember2023 into master Claudio Atzori 2023-12-15 11:18:30 +0100
  • 1726f49790 code formatting Claudio Atzori 2023-12-15 10:37:02 +0100
  • a59be5779e Merge pull request '9078_xml_records_irish_tender' (#368) from 9078_xml_records_irish_tender into beta Claudio Atzori 2023-12-12 12:34:43 +0100