• september-2023 77a2199837

    claudio.atzori released this 2023-09-11 16:07:49 +02:00 | 386 commits to master since this release

    This release is based on 265180bfd2 and features the following changes:

    NEW!!!

    • #284 impact indicators workflow #8172
    • #320 Import affiliation relations from Crossref and related fix #335

    Misc

    • #323 fixed various unit tests
    • #328 Add a "CleanRelation" action after the PropagateRelation to filter out all relations that have been deleted by inference or that are pointing to dangling entities
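
The filtering rule described for the "CleanRelation" action in #328 can be illustrated with a minimal, plain-Java sketch. This is not the project's implementation (which runs as a Spark job); the `Relation` class, its field names and the `clean` method are hypothetical and only mirror the two criteria stated above: drop relations deleted by inference and drop relations whose source or target is not a known entity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class CleanRelationSketch {

    // Hypothetical, simplified relation; the real data model carries more fields.
    static final class Relation {
        final String source;
        final String target;
        final boolean deletedByInference;

        Relation(String source, String target, boolean deletedByInference) {
            this.source = source;
            this.target = target;
            this.deletedByInference = deletedByInference;
        }
    }

    // Keeps a relation only if it is not deleted by inference and
    // both its source and target identifiers belong to existing entities.
    static List<Relation> clean(List<Relation> relations, Set<String> entityIds) {
        List<Relation> kept = new ArrayList<>();
        for (Relation r : relations) {
            boolean dangling = !entityIds.contains(r.source) || !entityIds.contains(r.target);
            if (!r.deletedByInference && !dangling) {
                kept.add(r);
            }
        }
        return kept;
    }
}
```

In this sketch the set of entity identifiers is held in memory; at graph scale the same rule would more plausibly be expressed as joins between the relation and entity datasets.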

    Raw Graph creation

    • #336 datainfo.invisible set as true only for entities

    Deduplication workflow

    • #319 Import dnet-pace-core module in this project and use it after renaming to dhp-pace-core
    • #324 Refactor Dedup using Spark Dataframe API, initial support for scala 2.12 and Spark 3.4
    • #329 DispatchEntitiesSparkJob: manage all entity types together, support filtering by dataInfo.invisible flag
    • #330 Rewrite SparkPropagateRelation exploiting Dataframe API
    • #331 Add sparkExecutorMemoryOverhead workflow config to set off-heap memory for Spark actions. If not explicitly set, it defaults to 1 GB
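
The defaulting behaviour described in #331 can be sketched as follows. This is an illustration under assumptions, not the workflow's actual wiring: the class and method names are hypothetical, and it is assumed here that the workflow parameter is ultimately passed to Spark through the standard `spark.executor.memoryOverhead` property.

```java
import java.util.Map;

final class MemoryOverheadSketch {

    // Default taken from the release note above (1 GB), written in Spark's size notation.
    static final String DEFAULT_OVERHEAD = "1G";

    // Builds the spark-submit option for executor memory overhead,
    // falling back to the default when the workflow parameter is missing or blank.
    static String sparkConfArg(Map<String, String> workflowConfig) {
        String value = workflowConfig.get("sparkExecutorMemoryOverhead");
        if (value == null || value.trim().isEmpty()) {
            value = DEFAULT_OVERHEAD;
        }
        return "--conf spark.executor.memoryOverhead=" + value.trim();
    }
}
```

For example, an empty configuration map yields the default option, while a map containing `sparkExecutorMemoryOverhead=4G` yields an option carrying `4G`.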

    Graph cleaning

    • #325 1st implementation of the suggestions from ticket #8898 for new cleaning criteria

    Graph indexing

    • #326 expand the instance level fulltext in the XML records

    Stats update workflow

    • #321, #322 [stats wf] Changes for promotion of production DBs to the new cluster