395a4af020 Run CC and RAM sequentially in dhp-impact-indicators WF
Serafeim Chatzopoulos
2023-09-12 22:31:50 +0300
c2f179800c Merge pull request 'Run CC and RAM sequentially in dhp-impact-indicators WF' (#338) from run_cc_and_ram_sequentially into master
Claudio Atzori
2023-09-13 08:52:53 +0200
2aed5a74be Run CC and RAM sequentially in dhp-impact-indicators WF
#338
Serafeim Chatzopoulos
2023-09-12 22:31:50 +0300
8a6892cc63 [graph dedup] consistency wf should not remove the relations while dispatching the entities
Claudio Atzori
2023-09-12 14:34:28 +0200
9f5d16624c Merge pull request '[graph raw] datainfo.invisible set as true only for entities' (#336) from invisible_relations into beta
#337
Claudio Atzori
2023-09-04 16:14:47 +0200
15666e86a8 added collectedfrom to the affiliation relations imported from Crossref
Claudio Atzori
2023-09-04 15:56:06 +0200
7d6bd4f20b Merge pull request 'Fix import of affiliations relations from Crossref' (#335) from 8876_fix_crossref_affiliation_relations_import into beta
Claudio Atzori
2023-09-04 15:19:58 +0200
5b06c9d06f [graph raw] datainfo.invisible set as true only for entities
Claudio Atzori
2023-09-04 15:15:24 +0200
7de0164c26 Fix import of affiliations relations from Crossref
#335
Serafeim Chatzopoulos
2023-09-04 16:04:41 +0300
2caaaec42d Include SparkCleanRelation logic in SparkPropagateRelation. SparkPropagateRelation includes merge relations. Revised tests for SparkPropagateRelation
Giambattista Bloisi
2023-09-01 09:32:57 +0200
964c2f553e Changes in indicators step, monitor step
dimitrispie
2023-09-01 10:57:02 +0300
488d9a1cea Merge pull request 'Add sparkExecutorMemoryOverhead workflow config to set off-heap memory for Spark actions. If not explicitly set it is defaulted to 1Gb' (#331) from consistencywf_memoryoverhead_conf into beta
Claudio Atzori
2023-08-29 16:31:36 +0200
6b1c05d118 Add sparkExecutorMemoryOverhead workflow config to set off-heap memory for Spark actions. If not explicitly set it is defaulted to 1Gb
#331
Giambattista Bloisi
2023-08-29 16:04:19 +0200
f437be80ad [impact indicators] adjusted paths in the bip ranker wf parameters
Claudio Atzori
2023-08-29 09:03:03 +0200
d012aec0b3 Revert PropagateRelation's argument name from outputPath to graphOutputPath in consistency workflow (#8964)
Giambattista Bloisi
2023-08-28 22:44:54 +0200
a860e19423 Fix ensure all relations are written out, not only those managed by dedup
Giambattista Bloisi
2023-08-28 15:36:02 +0200
35b8deb2c6 Merge pull request 'DispatchEntitiesSparkJob: manage all entity types together, support filtering by dataInfo.invisible flag' (#329) from dispatch_filter_invisible_entities into beta
Miriam Baglioni
2023-08-10 12:56:18 +0200
95cd2b9b1e Make filterInvisible a mandatory parameter of DispatchEntitiesSparkJob. Make filterInvisible a mandatory parameter of both dedup/consistency and graph/group oozie workflows
#329
Giambattista Bloisi
2023-08-10 11:53:48 +0200
fab9920271 DispatchEntitiesSparkJob: manage all entity types together, support filtering by dataInfo.invisible flag
Giambattista Bloisi
2023-08-08 15:52:20 +0200
c25ac21e5e Merge pull request 'graph cleaning, suggestions from ticket 8898' (#325) from cleaning_8898 into beta
Miriam Baglioni
2023-08-08 11:14:19 +0200
c334fe2438 Merge pull request 'Add a "CleanRelation" action after the PropagateRelation to filter out all relations that have been deleted by inference or that are pointing to dangling entities' (#328) from cleanup_relations_after_dedup into beta
Miriam Baglioni
2023-08-08 09:49:12 +0200
0e2f855807 Merge pull request 'Updates Promotion DBs' (#321) from antonis.lempesis/dnet-hadoop:beta into beta
Miriam Baglioni
2023-08-07 12:09:16 +0200
18fbe52b20 Merge pull request 'Import affiliation relations from Crossref' (#320) from 8876 into beta
Miriam Baglioni
2023-08-07 10:45:30 +0200
97b6d1dc45 Filter ids by dataInfo.deletedbyinference and DataInfo.invisible flags. Filter relations also by dataInfo.invisible flag
#328
Giambattista Bloisi
2023-08-07 10:24:11 +0200
af49424b59 Add a "CleanRelation" action after the PropagateRelation to filter out all relations that have been deleted by inference or that are pointing to dangling entities
Giambattista Bloisi
2023-08-04 14:27:39 +0200
1275a07d45 Merge pull request '[graph indexing] expand the instance level fulltext in the XML records' (#326) from instance_fulltext_xml into beta
Claudio Atzori
2023-07-27 15:02:07 +0200
8c63e4a864 Merge pull request 'Refactor Dedup using Spark Dataframe API, initial support for scala 2.12 and Spark 3.4' (#324) from dedup-with-dataframe-2 into beta
Claudio Atzori
2023-07-25 10:17:17 +0200
e64c2854a3 Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface. JsonPath cache contention fixed by using a ConcurrentHashMap. Blacklist filtering performance improvement. Minor performance improvements when evaluating similarity. Sorting in clustered elements is deterministic (by ordering and identity field, instead of ordering field only)
#324
Giambattista Bloisi
2023-07-18 11:38:56 +0200
bb5b845e3c Use scala.binary.version property to resolve scala maven dependencies. Ensure consistent usage of maven properties. Profile for compiling with scala 2.12 and Spark 3.4
Giambattista Bloisi
2023-07-17 17:18:46 +0200
002b24e06f Merge pull request '[graph cleaning] fixed regex behaviour for cleaning ROR and GRID identifiers, added tests' (#315) from pid_cleaning into beta
Claudio Atzori
2023-07-24 10:49:44 +0200
03da965162 Format bip-score based file without doi references
Ilias Kanellos
2023-07-21 13:42:30 +0300
f03153823a Update testCitationRelations number of expected citations according to changes made in 0559d8b4 (monodirectional citations)
#323
Giambattista Bloisi
2023-07-21 10:48:28 +0200
54c1eacef1 SparkJobTest was failing because testing workingdir was not cleaned up after each test
Giambattista Bloisi
2023-07-21 10:42:24 +0200
5e15f20e6e Fix entityMerger that was excluding the authors of the first entity in the list to merge
Giambattista Bloisi
2023-07-21 00:46:54 +0200
0210a14e43 Ignore timestamp differences in PromoteActionPayloadForGraphTableJobTest
Giambattista Bloisi
2023-07-20 23:45:57 +0200
74fcea66e6 Merge branch 'dedup-with-dataframe-spark34' of code-repo.d4science.org:D-Net/dnet-hadoop into dedup-with-dataframe-spark34
Sandro La Bruzzo
2023-07-19 16:55:19 +0200
e4feedd67e improved scholix generation using bean
Sandro La Bruzzo
2023-07-19 16:53:28 +0200
dba34505de Fix SparkStatsTest bug where parquet tables were incorrectly read as text files leading to unpredictable count() values
Giambattista Bloisi
2023-07-19 14:24:52 +0200
e47ed1fdb2 Use DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES in json mapper to avoid that tests fail if they encounter unmapped properties
Giambattista Bloisi
2023-07-19 14:21:40 +0200
38dfebfbe6 Disable MdStoreClientTest test as it requires a local mongodb running and it does not perform any assertions
Giambattista Bloisi
2023-07-19 14:18:56 +0200
617ef05e15 Update commons.lang.version to 3.12.0 to match spark 3.4 version and fix an incompatibility when running with Java 11
Giambattista Bloisi
2023-07-17 17:01:07 +0200