Commit Graph

501 Commits

Author SHA1 Message Date
Miriam Baglioni 86fb19a7f0 [orcipPropagation] refactoring after compilation and fixed issue in path for propagation constants in incremental module 2024-12-20 14:56:06 +01:00
Miriam Baglioni 071cc95afb [orcipPropagation] test to verify the merge for multiple enrichment from multiple sources. It covers also the check for persistency of other identifiers types 2024-12-20 14:41:09 +01:00
Miriam Baglioni 29611b2091 [orcipPropagation]rewritten in scala. generategraph again not abstract 2024-12-20 12:40:33 +01:00
Miriam Baglioni 7752d47776 [orcipPropagation]changes for merge for propagation 2024-12-20 10:20:32 +01:00
Miriam Baglioni a9ccd00483 [orcidPropagatio] - 2024-12-20 08:35:17 +01:00
Miriam Baglioni 3021dfda77 [orcidPropagatio] - 2024-12-19 15:14:09 +01:00
Miriam Baglioni ec4a90f669 [orcidPropagatio] added specific classid and classname for pid qualifier 2024-12-19 11:03:09 +01:00
Miriam Baglioni 60cfaf119b [orcidPropagatio] added specific orcid propagation classid and classname in the qualifier 2024-12-19 11:02:11 +01:00
Miriam Baglioni df5f1caa7a [orcidPropagatio] added classid and classname for qualifier of the pid 2024-12-19 11:01:45 +01:00
Giambattista Bloisi 71fe0374dc Revise propagation tests 2024-12-17 16:01:03 +01:00
Giambattista Bloisi d095b31ea8 [orcidenrichment] Fix lambda to avoid requiring serialization on enclosing class 2024-12-17 15:17:11 +01:00
Giambattista Bloisi 6260526fa1 [orcidenrichment] Fix imports and formatting 2024-12-17 15:17:11 +01:00
Giambattista Bloisi 64f4d7fb71 [orcidenrichment] When comparing authors manage the case of hyphenation and punctuations characters and normalizes utf strings 2024-12-17 15:17:11 +01:00
Giambattista Bloisi e03e8a39c0 [orcidenrichment] Do not match in case of ambiguity: two authors match and at least one of them has affiliation string 2024-12-17 15:17:11 +01:00
Miriam Baglioni 1b4bbb2691 [orcidenrichment] refactoring 2024-12-17 15:17:11 +01:00
Miriam Baglioni da9bbdede4 [orcidenrichment] refactoring 2024-12-17 15:17:11 +01:00
Giambattista Bloisi 36ca0b123e Move AuthorMatchers in dhp-common 2024-12-17 15:17:11 +01:00
Claudio Atzori dade7d5bb8 minor changes 2024-12-06 10:02:07 +01:00
Michele De Bonis bde59a7c8f implementation of the utilities for the inclusion of raids in the graph 2024-12-05 11:09:30 +01:00
Claudio Atzori b95672b420 mergeUtils set the result identifier when enforcing the result type 2024-11-15 09:16:18 +01:00
Claudio Atzori 4a3b173ca2 defaults to 0000 - Unknown in case the instance type lookup in the dnet:result_typologies doesn't find a corresponding result type binding 2024-11-13 16:27:00 +01:00
Claudio Atzori 07f267bb10 fix vocabulary lookup in mergeutils 2024-11-13 08:14:26 +01:00
Claudio Atzori 8088943399 Merge pull request 'enforce resulttype' (#506) from merge_resulttypes into beta (Reviewed-on: #506) 2024-11-12 14:20:22 +01:00
Claudio Atzori 6c5df761e2 enforce resulttype based on the dnet:result_typologies vocabulary and upon merge 2024-11-12 14:18:04 +01:00
Giambattista Bloisi 8f5171557e Remove ORCID information when the same ORCID ID is used multiple times in the same result for different authors 2024-11-07 12:22:34 +01:00
Claudio Atzori a877c76d70 make MergeUtils.selectOldestDate less prone to errors when receiving invalid date formats 2024-10-30 11:24:25 +01:00
Claudio Atzori 26cdc7e439 Avoid NPEs in MergeUtils 2024-10-30 07:35:47 +01:00
Claudio Atzori 5ca031c8d6 [graph raw] rule out empty PIDs 2024-10-29 13:48:41 +01:00
Claudio Atzori e4abe55988 merged person_through_the_graph & code formatting 2024-10-28 11:01:49 +01:00
Miriam Baglioni 0fb6af5586 Updated main pom dependency against dhp-schema, from 8.0.1 to 9.0.0. The new fields included in the updated schema module are populated by the Solr JSON payload mapping, which also limits the number of authors serialised to 200. 2024-10-25 16:28:50 +02:00
Miriam Baglioni c921cf7ee0 [personEntity] removed the deletedbyinference results (not indexed, but still in the graph). Changed the writing mode: append instead of overwrite 2024-10-24 09:57:20 +02:00
Giambattista Bloisi 6bc741715c Fix OafMapperUtilsTest.testMergePubs 2024-10-23 14:02:45 +02:00
Claudio Atzori d5867a1992 merged #490 2024-10-08 15:39:59 +02:00
Giambattista Bloisi c45cae447a Fix: invert the "natural" order when ordering by id lexicographically 2024-09-26 17:08:02 +02:00
Claudio Atzori 3fcafc7ed6 Merge pull request 'Latest institutions in monitor dbs' (#472) from antonis.lempesis/dnet-hadoop:beta into beta (Reviewed-on: #472) 2024-09-26 09:49:01 +02:00
Claudio Atzori 535a7b99f1 the metadata collection plugins using the HttpConnector2 class shall now retry instead of failing in case of UnknownHostException 2024-09-25 11:35:34 +02:00
Claudio Atzori d1cadc77c9 [graph provision] person serialisation, limit the number of authorships and coauthorships before expanding the payloads 2024-09-24 10:57:20 +02:00
Claudio Atzori e0ff84baf0 [graph provision] person serialisation, limit the number of authorships and coauthorships before expanding the payloads 2024-09-23 10:29:46 +02:00
Claudio Atzori 23e0ab3a7c run mergeResultsOfDifferentTypes only when checkDelegatedAuthority is true 2024-09-17 15:36:10 +02:00
Claudio Atzori bfd05cdab2 run mergeResultsOfDifferentTypes only when checkDelegatedAuthority is true 2024-09-17 10:49:32 +02:00
Claudio Atzori 9486e21a44 copy or process the person records throughout the graph pipeline 2024-07-30 14:25:31 +02:00
Claudio Atzori 5aa7847ea6 consider the transformative agreement text when merging results 2024-07-16 10:38:50 +02:00
Claudio Atzori 1180d78b71 make entity level pids unique by pidType:pidValue 2024-07-04 09:41:12 +02:00
Claudio Atzori 7d3292551b ignore dates containing 'null's 2024-07-02 15:44:31 +02:00
Lampros Smyrnaios fe2275a9b0 Merge branch 'beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into convert_hive_to_spark_actions (Conflicts: dhp-workflows/dhp-stats-update/src/main/resources/eu/dnetlib/dhp/oa/graph/stats/oozie_app/scripts/step14.sql) 2024-06-25 20:17:47 +03:00
Claudio Atzori a8d68c9d29 avoid NPEs 2024-06-11 14:19:24 +02:00
Claudio Atzori ce2364743a applying changes from PR#442: Fix for missing collectedfrom after dedup 2024-06-06 10:43:43 +02:00
Claudio Atzori f70dc76b61 minor 2024-06-06 10:43:10 +02:00
Lampros Smyrnaios a644a6f4fe Catch Spark-sql errors and show a log with the statement that failed. 2024-05-29 12:10:11 +03:00
Giambattista Bloisi 73316d8c83 Add jaxb and jaxws dependencies when compiling with spark-34 profile as they are required to run with jdk > 8 2024-05-28 14:14:51 +02:00