Claudio Atzori
|
11a1207f9c
|
[graph cleaning] applying COAR based vocabularies in bulk
|
2023-11-22 12:22:14 +01:00 |
Claudio Atzori
|
a24178cb93
|
Merge branch 'beta' into resource_types
|
2023-10-17 11:09:50 +02:00 |
Claudio Atzori
|
d28b7085f6
|
more NPE checks
|
2023-10-17 11:09:31 +02:00 |
Giambattista Bloisi
|
0e44b037a5
|
FIX: GroupEntitiesSparkJob deletes whole graph outputPath instead of its temporary folder
|
2023-10-17 07:54:01 +02:00 |
Claudio Atzori
|
39d24d5469
|
Merge branch 'beta' into resource_types
|
2023-10-16 11:56:38 +02:00 |
Claudio Atzori
|
05ee7d8b09
|
[graph cleaning] avoid NPEs
|
2023-10-12 09:13:42 +02:00 |
Claudio Atzori
|
554551682d
|
[raw graph] adopting the new COAR based vocabularies for the resource typing
|
2023-10-11 16:09:19 +02:00 |
Claudio Atzori
|
8108491722
|
Merge branch 'beta' into peer_reviewed
|
2023-10-06 14:21:52 +02:00 |
Giambattista Bloisi
|
2f3cf6d0e7
|
Fix cleaning of Pmid where parsing of numbers stopped at the first non-leading '0' character
|
2023-10-06 14:20:15 +02:00 |
Claudio Atzori
|
eed9fe0902
|
code formatting
|
2023-10-06 12:31:17 +02:00 |
Claudio Atzori
|
73c49b8d26
|
Merge branch 'beta' into SWH_integration
|
2023-10-06 12:21:51 +02:00 |
Claudio Atzori
|
c9a5ad6a02
|
extending the coverage of the instances with a non-unknown refereed value
|
2023-10-02 16:28:42 +02:00 |
Serafeim Chatzopoulos
|
ab0d70691c
|
Add step for archiving repoUrls to SWH
|
2023-09-28 20:56:18 +03:00 |
Serafeim Chatzopoulos
|
ed9c81a0b7
|
Add steps to collect last visit data && archive not found repository URLs
|
2023-09-27 19:00:54 +03:00 |
Claudio Atzori
|
8a6892cc63
|
[graph dedup] consistency wf should not remove the relations while dispatching the entities
|
2023-09-12 21:27:05 +02:00 |
Giambattista Bloisi
|
6cc7d8ca7b
|
GroupEntities and DispatchEntities are now merged in GroupEntitiesSparkJob
|
2023-08-30 10:43:31 +02:00 |
Claudio Atzori
|
bf35280ea6
|
code formatting
|
2023-08-29 11:11:00 +02:00 |
Giambattista Bloisi
|
95cd2b9b1e
|
Make filterInvisible a mandatory parameter of DispatchEntitiesSparkJob
Make filterInvisible a mandatory parameter of both dedup/consistency and graph/group oozie workflows
|
2023-08-10 11:53:48 +02:00 |
Giambattista Bloisi
|
fab9920271
|
DispatchEntitiesSparkJob: manage all entity types together, support filtering by dataInfo.invisible flag
|
2023-08-09 15:41:43 +02:00 |
Miriam Baglioni
|
c25ac21e5e
|
Merge pull request 'graph cleaning, suggestions from ticket 8898' (#325) from cleaning_8898 into beta
Reviewed-on: D-Net/dnet-hadoop#325
|
2023-08-08 11:14:19 +02:00 |
Claudio Atzori
|
b9dddbfe54
|
rule out records with NULL dataInfo, except for Relations
|
2023-07-31 17:53:54 +02:00 |
Claudio Atzori
|
11ffb9bd68
|
rule out records with NULL dataInfo
|
2023-07-31 12:35:33 +02:00 |
Claudio Atzori
|
d8435a6512
|
inverted condition
|
2023-07-25 17:39:57 +02:00 |
Claudio Atzori
|
270df939c4
|
partial implementation of the suggestions from https://support.openaire.eu/issues/8898
|
2023-07-25 17:29:50 +02:00 |
Giambattista Bloisi
|
bb5b845e3c
|
Use scala.binary.version property to resolve scala maven dependencies
Ensure consistent usage of maven properties
Profile for compiling with scala 2.12 and Spark 3.4
|
2023-07-24 11:13:48 +02:00 |
Claudio Atzori
|
c754397a19
|
Merge branch 'beta' into pid_cleaning
|
2023-07-24 10:49:31 +02:00 |
Giambattista Bloisi
|
38dfebfbe6
|
Disable MdStoreClientTest test as it requires a local mongodb running and it does not perform any assertions
|
2023-07-19 14:18:56 +02:00 |
Giambattista Bloisi
|
bd3fcf869a
|
rename dnet-pace-core into dhp-pace-core module and use it as dependency in other modules
|
2023-07-06 10:02:23 +02:00 |
Sandro La Bruzzo
|
9910ce06ae
|
added to CreateSimRel the feature to write time log
|
2023-06-28 11:38:16 +02:00 |
Sandro La Bruzzo
|
b195da3a83
|
Added utility to write time logs during the deduplication phase
|
2023-06-28 11:20:09 +02:00 |
Claudio Atzori
|
0f5a819f44
|
[graph cleaning] fixed regex behaviour for cleaning ROR and GRID identifiers, added tests
|
2023-06-23 16:10:49 +02:00 |
Claudio Atzori
|
1d33074fd1
|
WIP: pid cleaning
|
2023-06-09 16:47:25 +02:00 |
Claudio Atzori
|
8a463cc3e8
|
fixed organization id created when mapping APC affiliations. Factored out ROR constants in dhp-common
|
2023-05-15 15:44:46 +02:00 |
Claudio Atzori
|
d02916ef82
|
code formatting
|
2023-05-02 11:05:37 +02:00 |
Claudio Atzori
|
851f664bd9
|
Merge branch 'beta' into graph_cleaning_refactoring
|
2023-05-02 09:55:40 +02:00 |
Miriam Baglioni
|
73f77575bd
|
[ZenodoApiClient] align with master version
|
2023-04-18 10:25:27 +02:00 |
Claudio Atzori
|
2a6ba29b64
|
[graph cleaning] unit tests & cleanup
|
2023-04-04 12:34:51 +02:00 |
Claudio Atzori
|
6d3d18d8b5
|
[graph cleaning] WIP: refactoring of the cleaning stages
|
2023-03-16 17:23:36 +01:00 |
Sandro La Bruzzo
|
0b9819f1ab
|
Code formatted
|
2023-02-08 10:32:33 +01:00 |
Sandro La Bruzzo
|
6c81a161d2
|
Merge remote-tracking branch 'origin/beta' into 8231-mdstore-synch-improve
|
2023-02-08 10:29:09 +01:00 |
Claudio Atzori
|
9cf0a98699
|
[cleaning] set the common subject classid/name
|
2022-12-20 10:17:33 +01:00 |
Claudio Atzori
|
b8bafab8a0
|
[cleaning] improved vocabulary based mapping, specialization for the strict vocab cleaning
|
2022-12-12 14:43:03 +01:00 |
Sandro La Bruzzo
|
5a48a2fb18
|
implemented synch for single mdstore
|
2022-12-01 11:34:43 +01:00 |
Claudio Atzori
|
11695ba649
|
[graph cleaning] patch also the result's collectedfrom and hostedby datasource name according to the datasource master-duplicate mapping
|
2022-11-28 10:18:43 +01:00 |
Claudio Atzori
|
24ef301cc1
|
[graph cleaning] patch the result's collectedfrom and hostedby identifiers according to the datasource master-duplicate mapping
|
2022-11-28 09:54:18 +01:00 |
Claudio Atzori
|
b47aaf4dd1
|
[cleaning] subjects declared as belonging to specific vocabularies whose values are not found in the vocab are set to type keyword
|
2022-10-13 11:23:43 +02:00 |
Claudio Atzori
|
b7c387c21f
|
cleaning of subjects: avoid duplicated subjects, prioritise collected vs inferred or other sources
|
2022-08-12 15:09:16 +02:00 |
Claudio Atzori
|
adb526b0e1
|
Merge branch 'beta' into clean_subjects
|
2022-08-12 10:51:17 +02:00 |
Claudio Atzori
|
cb7c07c54e
|
[scholix] added step to create tar archive
|
2022-08-11 11:25:24 +02:00 |
Claudio Atzori
|
3418ce50ac
|
cleaning of subjects: perform the cleaning when the given value is equivalent to one of the terms in the vocabulary
|
2022-08-08 12:48:47 +02:00 |