474 Commits

Author SHA1 Message Date
Claudio Atzori 5ca031c8d6 [graph raw] rule out empty PIDs 2024-10-29 13:48:41 +01:00
Claudio Atzori e4abe55988 merged person_through_the_graph & code formatting 2024-10-28 11:01:49 +01:00
Miriam Baglioni 0fb6af5586 Updated main pom dependency against dhp-schema, from 8.0.1 to 9.0.0. The new fields included in the updated schema module are populated by the Solr JSON payload mapping, which also limits the number of authors serialised to 200. 2024-10-25 16:28:50 +02:00
Miriam Baglioni c921cf7ee0 [personEntity] removed the deletedbyinference results (not indexed, but still in the graph). Changed the writing mode: append instead of overwrite 2024-10-24 09:57:20 +02:00
Giambattista Bloisi 6bc741715c Fix OafMapperUtilsTest.testMergePubs 2024-10-23 14:02:45 +02:00
Claudio Atzori d5867a1992 merged #490 2024-10-08 15:39:59 +02:00
Giambattista Bloisi c45cae447a Fix: invert the "natural" order when ordering by id lexicographically 2024-09-26 17:08:02 +02:00
Claudio Atzori 3fcafc7ed6 Merge pull request 'Latest institutions in monitor dbs' (#472) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #472
2024-09-26 09:49:01 +02:00
Claudio Atzori 535a7b99f1 the metadata collection plugins using the HttpConnector2 class shall now retry instead of failing in case of UnknownHostException 2024-09-25 11:35:34 +02:00
Claudio Atzori d1cadc77c9 [graph provision] person serialisation, limit the number of authorships and coauthorships before expanding the payloads 2024-09-24 10:57:20 +02:00
Claudio Atzori e0ff84baf0 [graph provision] person serialisation, limit the number of authorships and coauthorships before expanding the payloads 2024-09-23 10:29:46 +02:00
Claudio Atzori 23e0ab3a7c run mergeResultsOfDifferentTypes only when checkDelegatedAuthority is true 2024-09-17 15:36:10 +02:00
Claudio Atzori bfd05cdab2 run mergeResultsOfDifferentTypes only when checkDelegatedAuthority is true 2024-09-17 10:49:32 +02:00
Claudio Atzori 9486e21a44 copy or process the person records throughout the graph pipeline 2024-07-30 14:25:31 +02:00
Claudio Atzori 5aa7847ea6 consider the transformative agreement text when merging results 2024-07-16 10:38:50 +02:00
Claudio Atzori 1180d78b71 make entity level pids unique by pidType:pidValue 2024-07-04 09:41:12 +02:00
Claudio Atzori 7d3292551b ignore dates containing 'null's 2024-07-02 15:44:31 +02:00
Lampros Smyrnaios fe2275a9b0 Merge branch 'beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into convert_hive_to_spark_actions
# Conflicts:
#	dhp-workflows/dhp-stats-update/src/main/resources/eu/dnetlib/dhp/oa/graph/stats/oozie_app/scripts/step14.sql
2024-06-25 20:17:47 +03:00
Claudio Atzori a8d68c9d29 avoid NPEs 2024-06-11 14:19:24 +02:00
Claudio Atzori ce2364743a applying changes from PR#442: Fix for missing collectedfrom after dedup 2024-06-06 10:43:43 +02:00
Claudio Atzori f70dc76b61 minor 2024-06-06 10:43:10 +02:00
Lampros Smyrnaios a644a6f4fe Catch Spark-sql errors and show a log with the statement that failed. 2024-05-29 12:10:11 +03:00
Giambattista Bloisi 73316d8c83 Add jaxb and jaxws dependencies when compiling with spark-34 profile as they are required to run with jdk > 8 2024-05-28 14:14:51 +02:00
Sandro La Bruzzo f1fe363b19 merged again from beta (I hope for the last time) 2024-05-22 11:08:52 +02:00
Sandro La Bruzzo 66c1ffc866 merged again from beta (I hope for the last time) 2024-05-22 11:02:46 +02:00
Sandro La Bruzzo 103e2652b3 merged beta 2024-05-17 14:43:07 +02:00
Sandro La Bruzzo 6efab4d88e fixed scholexplorer bug 2024-05-16 16:19:18 +02:00
Claudio Atzori a5d13d5d27 code formatting 2024-05-03 14:14:34 +02:00
Giambattista Bloisi 69c5efbd8b Fix: when applying enrichments with no instance information the resulting merge entity was generated with no instance instead of keeping the original information 2024-05-03 13:57:56 +02:00
Sandro La Bruzzo db358ad0d2 code formatted 2024-05-02 15:25:57 +02:00
Sandro La Bruzzo 26bf8e763a merged from beta 2024-05-02 15:20:23 +02:00
Sandro La Bruzzo 0646d0d064 Updated main sparkApplication to avoid requiring the master variable 2024-05-02 15:15:03 +02:00
Claudio Atzori 4355f64810 reverted to version 1.2.5-SNAPSHOT 2024-05-02 11:23:53 +02:00
Claudio Atzori 66680b8b9a refactoring of common utilities 2024-05-02 11:16:58 +02:00
Claudio Atzori dcf23b3d06 Merge branch 'beta' into beta-release-1.2.5 2024-05-02 10:01:49 +02:00
Sandro La Bruzzo 9cd3bc0f10 Added a new generation of the dump for scholexplorer tested with last version of spark, and strongly refactored 2024-04-26 16:02:07 +02:00
Claudio Atzori e2937db385 Merge branch 'beta' into misc_fixes_merge_entities 2024-04-24 08:55:28 +02:00
Giambattista Bloisi 1878199dae Miscellaneous fixes:
- in Merge By ID pick by preference those records coming from delegated Authorities
- fix various tests
- close spark session in SparkCreateSimRels
2024-04-24 08:12:45 +02:00
Sandro La Bruzzo 0d628cd62b merged again from beta 2024-04-23 17:34:55 +02:00
Claudio Atzori c3053ef34d using version 1.2.5-beta for the release 2024-04-23 14:52:32 +02:00
Claudio Atzori b5bcab13ec using version 1.2.5-beta for the release 2024-04-23 14:36:39 +02:00
Claudio Atzori 425c9afc36 using version 1.2.5-beta for the release 2024-04-23 14:30:04 +02:00
Claudio Atzori 24a83fc24f avoid NPEs in common Oaf merge utilities 2024-04-22 11:39:44 +02:00
Claudio Atzori 5857fd38c1 avoid NPEs in common Oaf merge utilities 2024-04-21 08:29:09 +02:00
Sandro La Bruzzo b84ad0c06e merged beta 2024-04-19 14:39:59 +02:00
Sandro La Bruzzo 342cb6189b fixed problem on changed signature on RowEncoder
removed property dhp.schema.artifact
2024-04-19 12:13:26 +02:00
Claudio Atzori ac8747582c Merge branch 'beta' into doidoost_dismiss 2024-04-17 12:01:01 +02:00
Giambattista Bloisi 8ac167e420 Refinements to PR #404: refactoring the Oaf records merge utilities into dhp-common 2024-04-16 17:18:28 +02:00
Miriam Baglioni 9eeb9f5d32 merging with branch beta 2024-04-16 15:24:40 +02:00
Giambattista Bloisi da333e9f4d Merge pull request 'Enhance Dedup authors matching with algorithms used for ORCID enhancements (task 9690)' (#419) from dedup_authorsmatch_bytoken into beta
Reviewed-on: #419
2024-04-16 10:24:11 +02:00