Commit Graph

14 Commits

Author SHA1 Message Date
Sandro La Bruzzo 26bf8e763a merged from beta 2024-05-02 15:20:23 +02:00
Claudio Atzori 66680b8b9a refactoring of common utilities 2024-05-02 11:16:58 +02:00
Sandro La Bruzzo 9cd3bc0f10 Added a new generation of the dump for scholexplorer tested with last version of spark, and strongly refactored 2024-04-26 16:02:07 +02:00
Giambattista Bloisi 21a14fcd80 Reusable RunSQLSparkJob for executing SQL in Spark through Oozie Spark Actions
Implements pivots table update oozie workflow
2024-01-15 10:18:14 +01:00
Claudio Atzori 11a1207f9c [graph cleaning] applying coar based vocabularies in bulk 2023-11-22 12:22:14 +01:00
Giambattista Bloisi 6cc7d8ca7b GroupEntities and DispatchEntites are now merged in GroupEntitiesSparkJob 2023-08-30 10:43:31 +02:00
Giambattista Bloisi fab9920271 DispatchEntitiesSparkJob: manage all entity types together, support filtering by dataInfo.invisible flag 2023-08-09 15:41:43 +02:00
Claudio Atzori cb7c07c54e [scholix] added step to create tar archive 2022-08-11 11:25:24 +02:00
Claudio Atzori 316b0fd73c added 'von' to the name particles file 2022-06-27 09:36:51 +02:00
Claudio Atzori b295a40d9c restored use of name_particles when parsing author names 2022-06-16 12:20:43 +02:00
Claudio Atzori 391aa1373b added unit test 2022-01-19 17:13:21 +01:00
Claudio Atzori 943b961cf6 introduced PidBlacklist 2020-12-02 09:30:34 +01:00
Claudio Atzori 893ac4a77b GenerateEntitiesApplication can be configured to hash the id value or not 2020-12-02 09:30:06 +01:00
Sandro La Bruzzo addaaa091f migrate relation from RDD to Dataset 2020-03-13 09:13:20 +01:00