Claudio Atzori
|
488d9a5eaa
|
[graph cleaning] WIP: refactoring of the cleaning stages, unit tests
|
2023-03-23 10:41:13 +01:00 |
Claudio Atzori
|
4f5ba0ed52
|
[graph cleaning] WIP: refactoring of the cleaning stages, unit tests
|
2023-03-21 14:41:20 +01:00 |
Claudio Atzori
|
6d3d18d8b5
|
[graph cleaning] WIP: refactoring of the cleaning stages
|
2023-03-16 17:23:36 +01:00 |
Claudio Atzori
|
518618f1a9
|
[graph cleaning] avoid overwriting the subject class to 'keyword' for those with provenance 'subject:fos'
|
2023-03-14 15:22:47 +01:00 |
Claudio Atzori
|
e28d395e87
|
[aggregator graph] using dedicated path to sync claims, adjusted paths with wildcards
|
2023-03-08 21:16:52 +01:00 |
Claudio Atzori
|
5b8fd37314
|
[aggregator graph] using dedicated path to sync claims
|
2023-03-08 15:28:14 +01:00 |
Claudio Atzori
|
7fd89566c2
|
[aggregator graph] handle paths including wildcards
|
2023-03-08 12:43:00 +01:00 |
Claudio Atzori
|
8ec0d62d91
|
pre-group the records in each table before joining the contents from BETA and PROD together
|
2023-03-02 14:49:19 +01:00 |
Claudio Atzori
|
6f488547a7
|
ignore non processable records
|
2023-03-01 14:49:51 +01:00 |
Claudio Atzori
|
7d263f265e
|
adjusted logs
|
2023-03-01 11:58:07 +01:00 |
Claudio Atzori
|
9c59dac859
|
followup changes reorganising the mdstore synchronisation mechanism
|
2023-03-01 10:16:20 +01:00 |
Sandro La Bruzzo
|
78e51c182a
|
Added missing parameter to raw all workflow
|
2023-02-28 10:16:01 +01:00 |
Michele Artini
|
fddcf701e9
|
updated the order of the compatibilities
|
2023-02-22 12:07:09 +01:00 |
Sandro La Bruzzo
|
8920932dd8
|
Code formatted
|
2023-02-08 11:34:18 +01:00 |
Sandro La Bruzzo
|
6c81a161d2
|
Merge remote-tracking branch 'origin/beta' into 8231-mdstore-synch-improve
|
2023-02-08 10:29:09 +01:00 |
Miriam Baglioni
|
d6895f0387
|
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
|
2023-01-09 17:28:38 +01:00 |
Sandro La Bruzzo
|
3c9826f186
|
updated the lines function to its implementation, linesWithSeparators.map(l => l.stripLineEnd); in this way we force the Scala plugin compiler to treat this pipeline as Scala code and not as a Java String.lines() pipeline
|
2022-12-21 11:21:17 +01:00 |
Miriam Baglioni
|
8685eaa706
|
[Clean Country] added test to verify removal of country
|
2022-12-16 15:31:25 +01:00 |
Miriam Baglioni
|
dc0ec88a58
|
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
|
2022-12-16 13:18:32 +01:00 |
Miriam Baglioni
|
d791840b82
|
[Clean Country] added test to verify removal of country
|
2022-12-16 13:18:29 +01:00 |
Claudio Atzori
|
7b80b24f82
|
[cleaning] country cleaning must use both PID and AlternateIdentifier fields
|
2022-12-15 14:49:04 +01:00 |
Claudio Atzori
|
b8bafab8a0
|
[cleaning] improved vocabulary based mapping, specialization for the strict vocab cleaning
|
2022-12-12 14:43:03 +01:00 |
Sandro La Bruzzo
|
5e4866d033
|
implemented synch for single mdstore
|
2022-12-12 11:29:46 +01:00 |
Claudio Atzori
|
c18b8048c3
|
[cleaning] avoid NPE
|
2022-12-10 11:41:38 +01:00 |
Claudio Atzori
|
8b44afe5e5
|
[cleaning] avoid NPE
|
2022-12-09 15:44:57 +01:00 |
Claudio Atzori
|
389dd25430
|
[cleaning] avoid NPE
|
2022-12-08 18:40:48 +01:00 |
Claudio Atzori
|
730228d73d
|
[cleaning] align wf parameter names in test
|
2022-12-08 18:40:22 +01:00 |
Claudio Atzori
|
2094fa6db0
|
[cleaning] align wf parameter names
|
2022-12-08 17:22:26 +01:00 |
Miriam Baglioni
|
a485a94956
|
[Cleaning] fixed parameter name in property file
|
2022-12-08 16:59:34 +01:00 |
Miriam Baglioni
|
3d99b78d94
|
[Cleaning] fixed error in parameter (workingPath to workingDir)
|
2022-12-08 10:25:02 +01:00 |
Sandro La Bruzzo
|
5a48a2fb18
|
implemented synch for single mdstore
|
2022-12-01 11:34:43 +01:00 |
Claudio Atzori
|
8e3edba318
|
[graph cleaning] testing the collectedfrom and hostedby patch procedure
|
2022-11-29 16:07:09 +01:00 |
Claudio Atzori
|
58c05731f9
|
[graph cleaning] WIP: testing the collectedfrom and hostedby patch procedure
|
2022-11-29 11:21:51 +01:00 |
Claudio Atzori
|
11695ba649
|
[graph cleaning] patch also the result's collectedfrom and hostedby datasource name according to the datasource master-duplicate mapping
|
2022-11-28 10:18:43 +01:00 |
Claudio Atzori
|
24ef301cc1
|
[graph cleaning] patch the result's collectedfrom and hostedby identifiers according to the datasource master-duplicate mapping
|
2022-11-28 09:54:18 +01:00 |
Alessia Bardi
|
3c08269a4d
|
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
|
2022-11-22 17:31:00 +01:00 |
Alessia Bardi
|
2687fc9f73
|
tests for EOSC Future review - ROhub
|
2022-11-22 17:30:56 +01:00 |
Claudio Atzori
|
7c3390ac10
|
Merge branch 'beta' into eoscifguidelines-from-mdstores
|
2022-11-07 12:18:40 +01:00 |
Sandro La Bruzzo
|
2b9a20a4a3
|
Changed the way Scholexplorer filters the relationships: filtering out all relations coming from OpenCitation is wrong, because we lose a lot of relations that intersect OpenCitation but do not come only from there
|
2022-10-24 12:53:47 +02:00 |
Alessia Bardi
|
208ed32315
|
fixed xpath for semantic relation
|
2022-10-23 18:18:13 +02:00 |
Alessia Bardi
|
ee759ac92d
|
file format after mvn compile
|
2022-10-23 18:09:47 +02:00 |
Alessia Bardi
|
31a10f000b
|
Map the field oaf:eoscifguidelines from mdstores. Currently we can find it in ROHub metadata
|
2022-10-23 18:05:37 +02:00 |
Claudio Atzori
|
ae7cd0735a
|
[graph2hive] more partitions
|
2022-10-14 15:47:58 +02:00 |
Claudio Atzori
|
b47aaf4dd1
|
[cleaning] subjects declared as belonging to specific vocabularies whose values are not found in the vocab are set to type keyword
|
2022-10-13 11:23:43 +02:00 |
Claudio Atzori
|
6163ecbf63
|
[cleaning] renamed parameters in wf action
|
2022-10-11 11:20:03 +02:00 |
Claudio Atzori
|
b301e9fdff
|
[cleaning] renamed action name/description
|
2022-10-11 11:08:52 +02:00 |
Claudio Atzori
|
ece40adc09
|
[cleaning] fixing NPE in the country cleaning phase
|
2022-10-11 10:10:20 +02:00 |
Claudio Atzori
|
8d97949316
|
[cleaning] fixed loop in wf nodes
|
2022-10-07 09:52:45 +02:00 |
Alessia Bardi
|
49360770d7
|
map w3id as instance url
|
2022-09-28 14:16:39 +02:00 |
Miriam Baglioni
|
b5b5a4c192
|
[CleanCountry] fixed issue
|
2022-09-28 12:42:51 +02:00 |
Claudio Atzori
|
3f90d159e3
|
code formatting
|
2022-09-27 15:08:00 +02:00 |
Claudio Atzori
|
0b3e44e521
|
Merge branch 'beta' into relation-from-odf
|
2022-09-27 14:57:01 +02:00 |
Claudio Atzori
|
57dbeb08d2
|
code formatting
|
2022-09-27 14:55:10 +02:00 |
Claudio Atzori
|
25e9d92aad
|
Merge branch 'beta' into clean_country
|
2022-09-27 14:27:49 +02:00 |
Alessia Bardi
|
fd63e9bfac
|
Mapping all relationships supported in ModelConstants and ModelSupport
|
2022-09-26 11:24:13 +02:00 |
Alessia Bardi
|
c5eb722170
|
relationships from relatedIdentifier whose target id type is one of the pid types with an authority
|
2022-09-23 15:47:05 +02:00 |
Claudio Atzori
|
c86cc53520
|
suppressing hyper verbose spark logs during unit test execution
|
2022-09-23 15:20:40 +02:00 |
Alessia Bardi
|
ba33ff71fd
|
refactoring for the generation of relationships from related identifier of type 'OPENAIRE'
|
2022-09-23 15:17:13 +02:00 |
Alessia Bardi
|
982bcc1e35
|
test w3id pid and record identifier
|
2022-09-23 12:06:06 +02:00 |
Claudio Atzori
|
c42850328e
|
fixed semantic (subreltype) for ServiceOrganization relations
|
2022-09-22 16:23:25 +02:00 |
Claudio Atzori
|
e45ec15221
|
Merge branch 'beta' into clean_country
|
2022-09-19 11:34:02 +02:00 |
Claudio Atzori
|
26e1badded
|
added instance.url syntactical validation, avoid creating multiple duplicated URLs
|
2022-09-19 11:19:10 +02:00 |
Claudio Atzori
|
192215a18e
|
merged from branch discard-non-wellformed
|
2022-09-19 10:17:10 +02:00 |
Claudio Atzori
|
e370e940d8
|
[aggregator graph] save invalid records aside for further inspection
|
2022-09-16 14:06:28 +02:00 |
Claudio Atzori
|
1e42d984e1
|
[aggregator graph] save invalid records aside for further inspection
|
2022-09-15 10:49:42 +02:00 |
Alessia Bardi
|
9e7ec4198f
|
fixed test
|
2022-09-14 18:08:56 +02:00 |
Claudio Atzori
|
c48f6e9c57
|
[aggregator graph] save invalid records aside for further inspection
|
2022-09-14 17:11:26 +02:00 |
Claudio Atzori
|
a0919ed495
|
[aggregator graph] save invalid records aside for further inspection
|
2022-09-14 13:27:39 +02:00 |
Alessia Bardi
|
b99a011345
|
return empty Oaf list if record cannot be parsed
|
2022-09-13 11:51:55 +02:00 |
Alessia Bardi
|
27af5122d2
|
logs for non well formed XML files
|
2022-09-12 14:25:23 +02:00 |
Claudio Atzori
|
ff6f789b6d
|
code formatting
|
2022-09-09 15:16:31 +02:00 |
Claudio Atzori
|
b5d6966c01
|
Merge branch 'beta' into clean_country
|
2022-09-09 12:20:19 +02:00 |
Claudio Atzori
|
b5f7bd30be
|
Merge branch 'beta' into clean_subjects
|
2022-09-09 12:20:04 +02:00 |
Alessia Bardi
|
a539c6ccaf
|
https for handle URLs
|
2022-09-09 12:16:28 +02:00 |
Claudio Atzori
|
1203378441
|
Merge branch 'beta' into clean_subjects
|
2022-09-09 10:38:47 +02:00 |
Claudio Atzori
|
14dc909a14
|
Merge branch 'beta' into clean_country
|
2022-09-09 10:38:17 +02:00 |
Alessia Bardi
|
9ef063d502
|
#7861#note-8 instance url from handle
|
2022-09-07 17:29:54 +03:00 |
Alessia Bardi
|
5c45d52af3
|
testing for RiuNet
|
2022-09-07 15:40:57 +03:00 |
Alessia Bardi
|
a11eb38065
|
testing for RO-Hub
|
2022-09-02 16:07:36 +02:00 |
Claudio Atzori
|
b7c387c21f
|
cleaning of subjects: avoid duplicated subjects, prioritise collected vs inferred or other sources
|
2022-08-12 15:09:16 +02:00 |
Claudio Atzori
|
adb526b0e1
|
Merge branch 'beta' into clean_subjects
|
2022-08-12 10:51:17 +02:00 |
Claudio Atzori
|
cb7c07c54e
|
[scholix] added step to create tar archive
|
2022-08-11 11:25:24 +02:00 |
Claudio Atzori
|
2aa16d0432
|
[scholix] fixed OpenCitation dump procedure
|
2022-08-10 17:39:29 +02:00 |
Miriam Baglioni
|
7dbdd4a0fe
|
[Clean Country] changes related to D-Net/dnet-hadoop#241 (comment)
|
2022-08-10 15:13:10 +02:00 |
Claudio Atzori
|
51ad93e545
|
[scholix] fixed OpenCitation dump procedure
|
2022-08-10 11:57:56 +02:00 |
Miriam Baglioni
|
62d2138806
|
[Clean Context] slightly changed the logic. Added a check that the result is not hosted by a datasource of type institutional repository from NL. Also added a check that the country must have been added to the result via propagation for it to be removed
|
2022-08-08 14:10:47 +02:00 |
Claudio Atzori
|
3418ce50ac
|
cleaning of subjects: perform the cleaning when the given value is equivalent to one of the terms in the vocabulary
|
2022-08-08 12:48:47 +02:00 |
Miriam Baglioni
|
390013a4b2
|
merging with branch beta
|
2022-08-08 12:30:31 +02:00 |
Claudio Atzori
|
4eaa063b1f
|
cleaning of subjects
|
2022-08-05 16:56:09 +02:00 |
Claudio Atzori
|
32cee1f619
|
WIP: cleaning of subjects
|
2022-08-05 12:32:08 +02:00 |
Claudio Atzori
|
6c0fd9284b
|
merge from beta
|
2022-08-05 10:42:53 +02:00 |
Claudio Atzori
|
b78889a0ce
|
WIP: cleaning of subjects
|
2022-08-05 09:11:37 +02:00 |
Miriam Baglioni
|
a7a18d7630
|
[Graph Dump] removed code for the dump from the project. Fixed issues in tests when possible
|
2022-08-04 17:40:40 +02:00 |
Claudio Atzori
|
27a91841e7
|
WIP: cleaning of subjects
|
2022-08-04 11:39:39 +02:00 |
Claudio Atzori
|
e62018e95d
|
[aggregator graph] added more assertions in test
|
2022-08-03 12:26:05 +02:00 |
Claudio Atzori
|
f62c4e05cd
|
code formatting
|
2022-07-29 11:56:01 +02:00 |
Claudio Atzori
|
1dd1e4fe3a
|
extended test for mapping project_organization relations
|
2022-07-28 11:27:08 +02:00 |
Claudio Atzori
|
09ccc7b472
|
Merge branch 'beta' into project_organization_contribution
|
2022-07-28 09:49:59 +02:00 |
Miriam Baglioni
|
5968ec018d
|
[Clean Country] modified workflow and added param file
|
2022-07-22 16:48:38 +02:00 |
Miriam Baglioni
|
a12d28c644
|
[Clean Country] added logic not to remove the country from a result if a hosting datasource with that country exists. Moreover, the country is removed only if it was added via propagation
|
2022-07-22 16:23:12 +02:00 |