-
released this 2024-10-29 16:19:14 +01:00 | 106 commits to beta since this release
Redmine ticket #10118.
The graph internal schema module version in use is 9.0.0.
The implementation of the Graph pipeline has the following changes:
Datasources
Affiliations
- #494 - AffRo to include affiliation provenance. Affiliations from WebCrawl to include (1) a wider set of contents and (2) the publisher websites.
Crossref
- e75326d6ec - Included mapping to consider DFG projects acks
- 6a097abc89 - Anything that has a relationship "is-review-of" must be mapped as a publication of type "Review". Force the hostedby of records with DOI prefixes 10.3410 and 10.12703 to the H1 Connect data source.
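The two Crossref mapping rules above can be illustrated with a small sketch. This is not the project's mapping code: the `Record` class, its field names, the default values, and the `apply_crossref_rules` function are hypothetical and exist only to show the logic. Only the relationship name, the "Review" type, the two DOI prefixes, and the H1 Connect data source come from the release note.

```python
from dataclasses import dataclass, field

# DOI prefixes named in the release note.
H1_CONNECT_PREFIXES = ("10.3410", "10.12703")
# Label used here for the data source; the real identifier is not given in the note.
H1_CONNECT = "H1 Connect"


@dataclass
class Record:
    """Hypothetical, simplified stand-in for a mapped Crossref record."""
    doi: str
    relation_types: list = field(default_factory=list)
    result_type: str = "publication"
    instance_type: str = "Article"
    hosted_by: str = "Unknown Repository"


def apply_crossref_rules(record: Record) -> Record:
    """Apply the two rules described in the release note to one record."""
    # Rule 1: an "is-review-of" relationship makes the record a publication of type "Review".
    if "is-review-of" in record.relation_types:
        record.result_type = "publication"
        record.instance_type = "Review"
    # Rule 2: records under the two DOI prefixes are forced to the H1 Connect data source.
    prefix = record.doi.lower().split("/", 1)[0]
    if prefix in H1_CONNECT_PREFIXES:
        record.hosted_by = H1_CONNECT
    return record
```

For example, a record with DOI `10.3410/f.1234` and an `is-review-of` relationship would end up with `instance_type == "Review"` and `hosted_by == "H1 Connect"`, while a record under any other prefix is left unchanged.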
Deduplication
- #475 - avoid NPEs in the countryInference dedup utility
- #485 - blacklist filtering moved before the cleanup phase in order to have case sensitive regex
- #500 - Fill mergedIds field and filter mergerels with dedup records actually created
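The reason for the ordering change in #485 can be sketched as follows. The sketch assumes that the cleanup phase normalises case, which the release note implies but does not state; the `BLACKLIST` pattern and the `cleanup` function are hypothetical stand-ins, not the dedup module's actual configuration or code.

```python
import re

# Hypothetical blacklist: a case-sensitive pattern that targets upper-case "TEST" only.
BLACKLIST = [re.compile(r"^TEST\b")]


def cleanup(value: str) -> str:
    """Stand-in for the cleanup phase: lower-case and collapse whitespace (assumption)."""
    return " ".join(value.lower().split())


def is_blacklisted(value: str) -> bool:
    """True if any blacklist pattern matches the value."""
    return any(pattern.search(value) for pattern in BLACKLIST)


def filter_then_clean(values):
    """New order: blacklist on the raw value, then clean the survivors."""
    return [cleanup(v) for v in values if not is_blacklisted(v)]


def clean_then_filter(values):
    """Old order: clean first, so case-sensitive patterns see lower-cased text."""
    return [c for c in map(cleanup, values) if not is_blacklisted(c)]
```

With the input `["TEST  record", "Test driven development"]`, filtering first drops only the upper-case entry, whereas cleaning first lower-cases both values and the case-sensitive pattern no longer matches anything.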
Changes to the graph pipeline
- #471 - impact indicators workflow optimisation
- #476 - include claimed affiliation relationships, redmine ticket #9839
- #490 - cleaning of PIDs
- #497 - Person records management: adds links to projects (added to the action set) and extends the propagation workflows to also extract relations from the orcid_pending present in the graph.
- d5867a1992 - improved cleaning of PIDs
- #468 - person entities through the graph
Graph provision
Graph stats
Misc
- #496 Minor fixes
-
[PROD] September 2023 Stable
released this 2023-09-11 16:07:49 +02:00 | 386 commits to master since this release
This release is based on 265180bfd2 and features the following changes:
NEW!!!
- #284 impact indicators workflow #8172
- #320 Import affiliation relations from Crossref and relative fix #335
Misc
- #323 fixed various unit tests
- #328 Add a "CleanRelation" action after the PropagateRelation to filter out all relations that have been deleted by inference or that are pointing to dangling entities
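The filtering performed by a "CleanRelation" step such as the one in #328 can be sketched in plain Python. This is only an illustration under stated assumptions: the real action is a Spark job, and the dictionary layout, the `deletedbyinference`, `source`, and `target` keys, and the `clean_relations` function are hypothetical names chosen for this example.

```python
def clean_relations(relations, entity_ids):
    """Keep only relations that are not deleted by inference and whose
    source and target both refer to an existing entity."""
    kept = []
    for rel in relations:
        # Drop relations flagged as deleted by inference.
        if rel.get("deletedbyinference", False):
            continue
        # Drop relations pointing to dangling (missing) entities.
        if rel["source"] not in entity_ids or rel["target"] not in entity_ids:
            continue
        kept.append(rel)
    return kept


entities = {"pub1", "pub2", "proj1"}
relations = [
    {"source": "pub1", "target": "proj1"},
    {"source": "pub2", "target": "proj1", "deletedbyinference": True},
    {"source": "pub1", "target": "missing"},
]
cleaned = clean_relations(relations, entities)
```

Here `cleaned` contains only the first relation: the second is flagged as deleted by inference and the third points to an entity that is not in the graph.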
Raw Graph creation
- #336 datainfo.invisible set as true only for entities
Deduplication workflow
- #319 Import dnet-pace-core module in this project and use it after renaming to dhp-pace-core
- #324 Refactor Dedup using Spark Dataframe API, initial support for scala 2.12 and Spark 3.4
- #329 DispatchEntitiesSparkJob: manage all entity types together, support filtering by dataInfo.invisible flag
- #330 Rewrite SparkPropagateRelation exploiting Dataframe API
- #331 Add sparkExecutorMemoryOverhead workflow config to set off-heap memory for Spark actions. If not explicitly set, it defaults to 1Gb
Graph cleaning
Graph indexing
- #326 expand the instance level fulltext in the XML records
Stats update workflow