• 2024-10-BETA 69aee609ef

    claudio.atzori released this 2024-10-29 16:19:14 +01:00 | 106 commits to beta since this release

    Redmine ticket #10118.

    The graph internal schema module version in use is 9.0.0

    The implementation of the Graph pipeline has the following changes:

    Datasources

    Affiliations

    • #494 - AffRo to include affiliation provenance. Affiliations from WebCrawl to include (1) a wider set of contents and (2) the publisher websites.

    Crossref

    • e75326d6ec - Mapping extended to consider DFG project acknowledgements
    • 6a097abc89 - Anything that has an "is-review-of" relationship must be mapped as a publication of type "Review". The hostedby of records with DOI prefix 10.3410 or 10.12703 is forced to the H1 Connect data source.
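
    The two rules from 6a097abc89 can be pictured with a small Python sketch. It is illustrative only: the record layout (`doi`, `relations`, `type`, `hostedby`) and the function name `apply_crossref_rules` are assumptions made for this example, not the names used in the actual Crossref mapping code.

```python
# Hypothetical record layout, for illustration only.
H1_CONNECT_PREFIXES = ("10.3410/", "10.12703/")


def apply_crossref_rules(record: dict) -> dict:
    """Return a copy of the record with the two mapping rules applied."""
    mapped = dict(record)

    # Rule 1: any "is-review-of" relationship makes the record a Review.
    relations = record.get("relations") or []
    if any(rel.get("type") == "is-review-of" for rel in relations):
        mapped["type"] = "Review"

    # Rule 2: the two DOI prefixes force hostedby to H1 Connect.
    doi = (record.get("doi") or "").lower()
    if doi.startswith(H1_CONNECT_PREFIXES):
        mapped["hostedby"] = "H1 Connect"

    return mapped
```

    Records matching neither rule are returned unchanged, so the function is safe to apply to every Crossref record in a batch.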

    Deduplication

    • #475 - avoid NPEs in the countryInference dedup utility
    • #485 - blacklist filtering moved before the cleanup phase so that case-sensitive regexes can be applied
    • #500 - Fill mergedIds field and filter mergerels with dedup records actually created
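
    One possible reading of #500, sketched in Python: merge relations are kept only when their dedup record was actually created, and each surviving dedup record gets its list of merged identifiers. The tuple layout `(dedup_id, merged_id)` and the function name `filter_mergerels` are assumptions for this example, not taken from the dedup workflow itself.

```python
from collections import defaultdict


def filter_mergerels(mergerels, created_dedup_ids):
    """Keep merge relations whose dedup record exists; build mergedIds per dedup record.

    mergerels: iterable of (dedup_id, merged_id) pairs (hypothetical layout).
    created_dedup_ids: ids of the dedup records that were actually created.
    """
    created = set(created_dedup_ids)

    # Drop merge relations that point at a dedup record that was never created.
    kept = [(dedup_id, merged_id) for dedup_id, merged_id in mergerels if dedup_id in created]

    # Collect the merged identifiers for each surviving dedup record.
    merged_ids = defaultdict(list)
    for dedup_id, merged_id in kept:
        merged_ids[dedup_id].append(merged_id)

    return kept, {dedup_id: sorted(ids) for dedup_id, ids in merged_ids.items()}
```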

    Changes to the graph pipeline

    • #471 - impact indicators workflow optimisation
    • #476 - include claimed affiliation relationships, redmine ticket #9839
    • #490 - cleaning of PIDs
    • #497 - Person records management: adds links to projects (added to the action set) and extends the propagation workflows to also extract relations from the orcid_pending present in the graph.
    • d5867a1992 - improved cleaning of PIDs
    • #468 - person entities through the graph

    Graph provision

    • #498 & #499 - dhp-schema upgrade & provision mapping

    Graph stats

    • #489 - datasource table creation split in steps
    • #472 - Latest institutions in monitor dbs

    Misc

• september-2023 77a2199837

    claudio.atzori released this 2023-09-11 16:07:49 +02:00 | 386 commits to master since this release

    This release is based on 265180bfd2 and features the following changes:

    NEW!!!

    • #284 impact indicators workflow #8172
    • #320 Import affiliation relations from Crossref and related fix #335

    Misc

    • #323 fixed various unit tests
    • #328 Add a "CleanRelation" action after the PropagateRelation to filter out all relations that have been deleted by inference or that are pointing to dangling entities
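
    The filter described in #328 can be sketched as follows. This is a minimal Python illustration, not the Spark action itself: the relation fields (`source`, `target`, `deletedbyinference`) and the function name `clean_relations` are assumptions for this example.

```python
def clean_relations(relations, entity_ids):
    """Drop relations deleted by inference or pointing at dangling entities.

    relations: iterable of dicts with hypothetical keys
               "source", "target" and "deletedbyinference".
    entity_ids: ids of the entities that actually exist in the graph.
    """
    known = set(entity_ids)
    return [
        rel
        for rel in relations
        if not rel.get("deletedbyinference", False)  # not deleted by inference
        and rel["source"] in known                   # source entity exists
        and rel["target"] in known                   # target entity exists
    ]
```

    A relation survives only if both of its endpoints resolve to an existing entity, which is what makes it non-dangling in this sketch.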

    Raw Graph creation

    • #336 datainfo.invisible set to true only for entities

    Deduplication workflow

    • #319 Import dnet-pace-core module in this project and use it after renaming to dhp-pace-core
    • #324 Refactor Dedup using Spark Dataframe API, initial support for scala 2.12 and Spark 3.4
    • #329 DispatchEntitiesSparkJob: manage all entity types together, support filtering by dataInfo.invisible flag
    • #330 Rewrite SparkPropagateRelation exploiting Dataframe API
    • #331 Add sparkExecutorMemoryOverhead workflow config to set off-heap memory for Spark actions. If not explicitly set, it defaults to 1Gb
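
    The defaulting behaviour of #331 can be pictured with a short Python sketch. How the workflow actually wires the property is not stated in these notes: the mapping to Spark's `spark.executor.memoryOverhead` setting, the `"1G"` literal, and the function name `resolve_memory_overhead` are assumptions for this example.

```python
# Assumed textual form of the 1Gb default mentioned in the release note.
DEFAULT_MEMORY_OVERHEAD = "1G"


def resolve_memory_overhead(workflow_props: dict) -> dict:
    """Return the Spark setting for executor memory overhead.

    Falls back to the default when sparkExecutorMemoryOverhead
    is missing or empty in the workflow properties.
    """
    overhead = workflow_props.get("sparkExecutorMemoryOverhead") or DEFAULT_MEMORY_OVERHEAD
    return {"spark.executor.memoryOverhead": overhead}
```

    With no property set, the sketch yields the default; an explicit value such as `"2G"` is passed through unchanged.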

    Graph cleaning

    • #325 1st implementation of the suggestions from ticket #8898 for new cleaning criteria

    Graph indexing

    • #326 expand the instance level fulltext in the XML records

    Stats update workflow

    • #321, #322 [stats wf] Changes for promotion of production DBs to the new cluster