Commit Graph

  • 1275a07d45 Merge pull request '[graph indexing] expand the instance level fulltext in the XML records' (#326) from instance_fulltext_xml into beta Claudio Atzori 2023-07-27 15:02:07 +0200
  • a72b9e96ac expand the instance level fulltext in the XML records Claudio Atzori 2023-07-27 14:57:38 +0200
  • d512df8612 code formatting Claudio Atzori 2023-07-26 09:14:08 +0200
  • d8435a6512 inverted condition Claudio Atzori 2023-07-25 17:39:57 +0200
  • 59764145bb cherry picked & fixed commit 270df939c4 Claudio Atzori 2023-07-25 17:39:00 +0200
  • 270df939c4 partial implementation of the suggestions from https://support.openaire.eu/issues/8898 Claudio Atzori 2023-07-25 17:29:50 +0200
  • 8c63e4a864 Merge pull request 'Refactor Dedup using Spark Dataframe API, initial support for scala 2.12 and Spark 3.4' (#324) from dedup-with-dataframe-2 into beta Claudio Atzori 2023-07-25 10:17:17 +0200
  • e64c2854a3 Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface; JsonPath cache contention fixed by using a ConcurrentHashMap; Blacklist filtering performance improvement; Minor performance improvements when evaluating similarity; Sorting in clustered elements is deterministic (by ordering and identity field, instead of ordering field only) Giambattista Bloisi 2023-07-18 11:38:56 +0200
  • bb5b845e3c Use scala.binary.version property to resolve scala maven dependencies; Ensure consistent usage of maven properties; Profile for compiling with scala 2.12 and Spark 3.4 Giambattista Bloisi 2023-07-17 17:18:46 +0200
  • 002b24e06f Merge pull request '[graph cleaning] fixed regex behaviour for cleaning ROR and GRID identifiers, added tests' (#315) from pid_cleaning into beta Claudio Atzori 2023-07-24 10:49:44 +0200
  • c754397a19 Merge branch 'beta' into pid_cleaning Claudio Atzori 2023-07-24 10:49:31 +0200
  • f0678cda09 Merge pull request 'fix_beta_tests' (#323) from fix_beta_tests into beta Claudio Atzori 2023-07-24 10:47:35 +0200
  • 3a0f09774a Add script to find score limits Serafeim Chatzopoulos 2023-07-21 17:55:41 +0300
  • 06b9b71c4e Merge branch '8172_impact_indicators_workflow' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8172_impact_indicators_workflow Ilias Kanellos 2023-07-21 17:42:49 +0300
  • 2374f445a9 Produce additional bip update specific files Ilias Kanellos 2023-07-21 17:42:46 +0300
  • cb0f3c50f6 Format workflow.xml Serafeim Chatzopoulos 2023-07-21 16:07:10 +0300
  • c64e5e588f Merge branch '8172_impact_indicators_workflow' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8172_impact_indicators_workflow Serafeim Chatzopoulos 2023-07-21 15:27:02 +0300
  • 2cc5b1a39b Fixes in workflow.xml Serafeim Chatzopoulos 2023-07-21 15:26:50 +0300
  • 0f96af5d56 Merge branch '8172_impact_indicators_workflow' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8172_impact_indicators_workflow Ilias Kanellos 2023-07-21 13:42:35 +0300
  • 03da965162 Format bip-score based file without doi references Ilias Kanellos 2023-07-21 13:42:30 +0300
  • f03153823a Update testCitationRelations number of expected citations according to changes made in 0559d8b4 (monodirectional citations) Giambattista Bloisi 2023-07-21 10:48:28 +0200
  • 54c1eacef1 SparkJobTest was failing because testing workingdir was not cleaned up after each test Giambattista Bloisi 2023-07-21 10:42:24 +0200
  • 5e15f20e6e Fix entityMerger that was excluding the authors of the first entity in the list to merge Giambattista Bloisi 2023-07-21 00:46:54 +0200
  • 0210a14e43 Ignore timestamp differences in PromoteActionPayloadForGraphTableJobTest Giambattista Bloisi 2023-07-20 23:45:57 +0200
  • dba34505de Fix SparkStatsTest bug where parquet tables were incorrectly read as text files leading to unpredictable count() values Giambattista Bloisi 2023-07-19 14:24:52 +0200
  • e47ed1fdb2 Use DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES in json mapper so that tests do not fail when they encounter unmapped properties Giambattista Bloisi 2023-07-19 14:21:40 +0200
  • 38dfebfbe6 Disable MdStoreClientTest test as it requires a local mongodb running and it does not perform any assertions Giambattista Bloisi 2023-07-19 14:18:56 +0200
  • 9e8e39f78a - Miriam Baglioni 2023-07-19 11:35:58 +0200
  • 373a5f2c83 Merge pull request 'Master branch updates from beta July 2023' (#317) from master_july23 into master Claudio Atzori 2023-07-18 18:22:04 +0200
  • db4ca43ee8 Resolve conflict Serafeim Chatzopoulos 2023-07-18 18:38:26 +0300
  • be320ba3c1 Indentation fixes Serafeim Chatzopoulos 2023-07-17 16:04:21 +0300
  • be4856ef35 Update step15.sql dimitrispie 2023-07-17 15:33:58 +0300
  • bc1a4611aa Minor changes Serafeim Chatzopoulos 2023-07-17 11:17:53 +0300
  • 8af129b0c7 merged stats promotion step from antonis/promotion-prod-only Claudio Atzori 2023-07-13 15:03:28 +0200
  • 706092bc19 Update updateProductionViews.sh dimitrispie 2023-07-13 15:48:12 +0300
  • aedd279f78 Updates Promotion DBs dimitrispie 2023-07-12 22:49:08 +0300
  • 163b2ee2a8 Changes dimitrispie 2023-07-13 15:25:00 +0300
  • 76901a25f9 Updates Promotion DBs dimitrispie 2023-07-12 22:49:08 +0300
  • ef493681d9 Merge pull request 'Import dnet-pace-core module in this project and use it after renaming to dhp-pace-core' (#319) from beta_with_pace_core into beta Giambattista Bloisi 2023-07-11 14:03:15 +0200
  • 4eba14a80e Add oozie workflow Serafeim Chatzopoulos 2023-07-06 21:07:50 +0300
  • c2998a14e8 Add basic tests for affiliation relations Serafeim Chatzopoulos 2023-07-06 20:28:16 +0300
  • bc7b00bcd1 Add bi-directional affiliation relations Serafeim Chatzopoulos 2023-07-06 18:29:15 +0300
  • 12528ed2ef Refactor PrepareAffiliationRelations.java to use OafMapperUtils common functions Serafeim Chatzopoulos 2023-07-06 18:08:33 +0300
  • bbc245696e Prepare actionsets for BIP affiliations Serafeim Chatzopoulos 2023-07-06 15:56:12 +0300
  • 0c433eccdd Fix scores & Workflow Ilias Kanellos 2023-07-06 15:06:28 +0300
  • d5c39a1059 Fix map scores to doi Ilias Kanellos 2023-07-06 15:04:48 +0300
  • 772d5f0aab Make PR and AttRank serial Ilias Kanellos 2023-07-06 13:47:51 +0300
  • 801da2fd4a New sources formatted by maven plugin Giambattista Bloisi 2023-07-06 10:28:53 +0200
  • bd3fcf869a rename dnet-pace-core into dhp-pace-core module and use it as dependency in other modules Giambattista Bloisi 2023-07-06 10:02:23 +0200
  • 347a889b20 Read affiliation relations Serafeim Chatzopoulos 2023-07-06 00:51:01 +0300
  • 3b35db5fbd Import dnet-pace-core module from dnet-dedup repository Giambattista Bloisi 2023-07-05 22:13:22 +0200
  • 8dcd028eed [UsageCount] fixed typo in attribute name for datasource table Miriam Baglioni 2023-07-01 16:07:22 +0200
  • 8621377917 [UsageCount] fixed typo in attribute name for datasource table Miriam Baglioni 2023-06-30 19:02:44 +0200
  • ef2dd7a980 resolved conflicts Miriam Baglioni 2023-06-30 18:59:47 +0200
  • 7738372125 [UsageCount] fixed typo in attribute name for datasource table Miriam Baglioni 2023-06-30 18:56:41 +0200
  • 9963fd6d29 updated log to add subentity Sandro La Bruzzo 2023-06-28 13:36:05 +0200
  • f3a85e224b merged from branch beta the bulk tagging (single step, negative constraints), the cleaning workflow (single step, pid type based cleaning), instance level fulltext Claudio Atzori 2023-06-28 13:33:57 +0200
  • 4ef0f2ec26 added dependency commons-validator:commons-validator:1.7 Claudio Atzori 2023-06-28 13:32:01 +0200
  • ed7e2ab6d1 reverted mistake on commit workflow.xml Sandro La Bruzzo 2023-06-28 11:40:19 +0200
  • 9910ce06ae added to CreateSimRel the feature to write time log Sandro La Bruzzo 2023-06-28 11:38:16 +0200
  • 2717edafb7 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta Miriam Baglioni 2023-06-28 11:25:14 +0200
  • 2f04c9d149 [BulkTagging] fixing left over for test Miriam Baglioni 2023-06-28 11:24:42 +0200
  • bd17c3edc8 added to CreateSimRel the feature to write time log Sandro La Bruzzo 2023-06-28 11:20:58 +0200
  • b195da3a83 Added utility to write time logs during the deduplication phase Sandro La Bruzzo 2023-06-28 11:20:09 +0200
  • 288ec0b7d6 [doiboost] merged workflow from branch beta Claudio Atzori 2023-06-28 09:15:37 +0200
  • 5f32edd9bf adopting dhp-schema:3.17.1 Claudio Atzori 2023-06-27 16:57:17 +0200
  • e10ce92fe5 [stats wf] merged workflows from branch beta Claudio Atzori 2023-06-27 14:32:48 +0200
  • b93e1541aa Merge pull request 'update sql query to return distinct pids' (#301) from distinct_pids_from_openorgs into master Claudio Atzori 2023-06-27 12:24:47 +0200
  • d029bf0b94 Merge branch 'master' into distinct_pids_from_openorgs Claudio Atzori 2023-06-27 12:24:35 +0200
  • 0f5a819f44 [graph cleaning] fixed regex behaviour for cleaning ROR and GRID identifiers, added tests Claudio Atzori 2023-06-23 16:10:49 +0200
  • 60f25b780d Minor fixes in workflow.xml and job.properties Serafeim Chatzopoulos 2023-06-23 12:51:50 +0300
  • 88a1cbc37d fixed a datasource id Michele Artini 2023-06-22 07:56:33 +0200
  • 009d7f312f fixed a datasource Id Michele Artini 2023-06-21 16:17:34 +0200
  • e4b27182d0 [master] refactoring Miriam Baglioni 2023-06-21 11:15:53 +0200
  • b0ebf56367 Merge pull request 'Update step15_5.sql' (#314) from antonis.lempesis/dnet-hadoop:beta into beta Claudio Atzori 2023-06-21 10:33:22 +0200
  • 2b6370eaee Update step15_5.sql dimitrispie 2023-06-21 11:31:10 +0300
  • 35e42a86ed Merge pull request 'Update step15_5.sql' (#313) from antonis.lempesis/dnet-hadoop:beta into beta Claudio Atzori 2023-06-21 10:26:16 +0200
  • 74cb060bfe Update step15_5.sql dimitrispie 2023-06-21 11:24:06 +0300
  • 85e016df17 Merge pull request 'Update step16-createIndicatorsTables.sql' (#312) from antonis.lempesis/dnet-hadoop:beta into beta Claudio Atzori 2023-06-21 09:52:33 +0200
  • a475cfcb7b Update step16-createIndicatorsTables.sql dimitrispie 2023-06-21 10:42:02 +0300
  • 979cf9cd87 Merge pull request 'Update step15.sql' (#311) from antonis.lempesis/dnet-hadoop:beta into beta Claudio Atzori 2023-06-21 09:20:01 +0200
  • 4648cd88d4 Update step15.sql dimitrispie 2023-06-21 10:02:19 +0300
  • 94d2573c77 Update step15.sql dimitrispie 2023-06-21 09:22:39 +0300
  • 0561362de2 Merge pull request 'Update step20-createMonitorDB_institutions.sql' (#309) from antonis.lempesis/dnet-hadoop:beta into beta Claudio Atzori 2023-06-20 15:07:09 +0200
  • 50d7dc0078 [graph enrichment] fixed projectOrganizationPath not being passed to the apply_resulttoorganization_propagation node Claudio Atzori 2023-06-19 15:42:44 +0200
  • fbd9bf704e indent Claudio Atzori 2023-06-19 15:41:22 +0200
  • 758e662ab8 Revert "Remove duplicated code and ensure that load and initialization is done through "DedupConfig.load" method" Giambattista Bloisi 2023-06-19 13:08:10 +0200
  • 485f9d18cb Remove duplicated code and ensure that load and initialization is done through "DedupConfig.load" method Giambattista Bloisi 2023-06-19 13:00:02 +0200
  • 6210f6ee48 Merge pull request 'Precompile blacklists patterns before evaluating clustering criteria' (#1) from optimized-clustering into master Claudio Atzori 2023-06-19 12:43:49 +0200
  • be2caedb04 Update step20-createMonitorDB_institutions.sql dimitrispie 2023-06-19 12:12:17 +0300
  • 36e0a8fec4 Changes to Promotion Stats WF dimitrispie 2023-06-19 09:44:34 +0300
  • b0ade43608 Precompile blacklists patterns before evaluating clustering criteria; Enable Junit 5 tests in maven builds; Make path comparisons platform-independent; Read String resource files assuming they are encoded in UTF-8; Fix a few test conditions Giambattista Bloisi 2023-06-16 09:41:11 +0200
  • 4c770a5e29 Update finalizeImpalaCluster.sh dimitrispie 2023-06-15 13:25:37 +0300
  • e06d962a6a Update step15.sql dimitrispie 2023-06-15 12:20:35 +0300
  • afcad08396 Update step20-createMonitorDB_institutions.sql dimitrispie 2023-06-15 10:28:49 +0300
  • b9748763e2 Merge pull request '[stats wf] Bug fixes' (#308) from antonis.lempesis/dnet-hadoop:beta into beta Claudio Atzori 2023-06-14 21:57:03 +0200
  • 42b8ce2ba4 Update copyDataToImpalaCluster.sh dimitrispie 2023-06-14 19:23:42 +0300
  • 2032b0df40 Bug fixes dimitrispie 2023-06-14 19:09:09 +0300
  • a92206dab5 re-added the name of a column (pid) Michele Artini 2023-06-13 11:43:10 +0200
  • b76a47b103 [aggregator graph] added column alias when mapping organization PIDs from the OpenOrgs database Claudio Atzori 2023-06-13 11:38:10 +0200