Giambattista Bloisi
|
3f22c101d9
|
Merge pull request 'Enrich authors with ORCID info using new matching algorithm' (#398) from new_orcid_enhancement into beta
Reviewed-on: #398
|
2024-03-22 17:29:20 +01:00 |
Claudio Atzori
|
aaa73f89d1
|
refactoring the Oaf records merge utilities into dhp-common
|
2024-03-22 16:34:03 +01:00 |
Sandro La Bruzzo
|
58dbe71d39
|
update crossref mapping to be runnable separately as a single datasource outside doiboost
|
2024-03-20 17:04:52 +01:00 |
Giambattista Bloisi
|
664a381d31
|
Unify merge logic of entities in MergeUtils.class
|
2024-03-18 16:04:49 +01:00 |
Claudio Atzori
|
af154d4456
|
implemented changes from #9497: sort abstracts by string length, included author fullnames in the related results, expanded instance details within each children/result XML element
|
2024-03-14 16:21:23 +01:00 |
Sandro La Bruzzo
|
c532831718
|
Moved Crossref mapping to dhp-aggregations,
refactored code: avoid using the utilities defined in DOIBoostMappingUtils to create parts of the OAF, use the utilities in OafMappingUtils instead
|
2024-03-13 06:56:10 +01:00 |
Giambattista Bloisi
|
9092075760
|
Enrich authors with ORCID info using new matching algorithm
|
2024-03-11 13:23:59 +01:00 |
Claudio Atzori
|
bb82052c40
|
[graph cleaning] rule out datasources without an officialname
|
2024-02-05 14:59:27 +02:00 |
Claudio Atzori
|
e8630a6d03
|
[graph cleaning] rule out datasources without an officialname
|
2024-02-05 14:59:06 +02:00 |
Claudio Atzori
|
4d0c59669b
|
merged changes from beta
|
2024-01-26 16:08:54 +01:00 |
Claudio Atzori
|
9e8fc6aa88
|
[collection] increased logging from the oai-pmh metadata collection process
|
2024-01-26 09:17:20 +01:00 |
Claudio Atzori
|
3e96777cc4
|
[collection] increased logging from the oai-pmh metadata collection process
|
2024-01-23 15:21:03 +01:00 |
Claudio Atzori
|
6fd25cf549
|
code formatting
|
2024-01-23 08:47:12 +01:00 |
Claudio Atzori
|
f76852f385
|
Merge branch 'beta' into update_pivots_table
|
2024-01-22 16:37:22 +01:00 |
Claudio Atzori
|
1c6db320f4
|
[graph provision] obtain context info from the context API instead of from the ISLookUp service
|
2024-01-22 15:53:17 +01:00 |
Giambattista Bloisi
|
21a14fcd80
|
Reusable RunSQLSparkJob for executing SQL in Spark through Oozie Spark Actions
Implements pivots table update oozie workflow
|
2024-01-15 10:18:14 +01:00 |
Claudio Atzori
|
1726f49790
|
code formatting
|
2023-12-15 10:37:02 +01:00 |
Claudio Atzori
|
98cce5bfb2
|
code formatting
|
2023-12-12 09:59:05 +01:00 |
Claudio Atzori
|
84d54643cf
|
[cleaning] allow enriched orcids to pass the cleaning, rule out non-orcid author pids
|
2023-12-12 09:57:00 +01:00 |
Claudio Atzori
|
aba95ed1d1
|
code formatting
|
2023-12-08 17:06:19 +01:00 |
Claudio Atzori
|
34abd0fc43
|
Merge branch 'beta' into clean_license_publisher
|
2023-12-08 16:58:27 +01:00 |
Giambattista Bloisi
|
613ec5ffce
|
Add profiles for different spark versions: spark-24, spark-34, spark-35
|
2023-12-05 19:11:06 +01:00 |
Claudio Atzori
|
7c3041b276
|
avoid NPEs
|
2023-12-03 16:49:49 +01:00 |
Claudio Atzori
|
74b185d07b
|
avoid NPEs
|
2023-12-03 16:18:20 +01:00 |
Claudio Atzori
|
e6086efc53
|
avoid NPEs in Vocabulary.getTermBySynonym
|
2023-12-03 13:33:20 +01:00 |
Claudio Atzori
|
d33f578e54
|
code formatting
|
2023-12-01 15:14:17 +01:00 |
Claudio Atzori
|
622fafbd2e
|
Merge branch 'beta' into orcid_import
|
2023-12-01 12:28:14 +01:00 |
Sandro La Bruzzo
|
bf0fd27c36
|
Removed unused function
Applied Giambattista's PR comment
|
2023-12-01 12:16:42 +01:00 |
Sandro La Bruzzo
|
cdfb7588dd
|
code formatting
|
2023-11-30 15:31:42 +01:00 |
Sandro La Bruzzo
|
5e22b67b8a
|
Merge remote-tracking branch 'origin/beta' into orcid_import
|
2023-11-30 15:27:46 +01:00 |
Claudio Atzori
|
4e1aac2e2f
|
resolved conflict in pom.xml before applying the changes from [COAR based resource types & Irish tender] #350
|
2023-11-29 14:37:52 +01:00 |
Sandro La Bruzzo
|
aa239ec673
|
Changed implementation of check similarity to verify exact match of name instead of the first char
|
2023-11-29 11:17:41 +01:00 |
Sandro La Bruzzo
|
59111713fa
|
added comment
|
2023-11-28 09:00:48 +01:00 |
Sandro La Bruzzo
|
6f4d0c05ea
|
Implemented Author Merger for ORCID that takes into account the case when name and surname are swapped
|
2023-11-28 08:43:56 +01:00 |
Sandro La Bruzzo
|
34a4b3cbdf
|
Implemented ORCID Enrichment
|
2023-11-24 12:39:58 +01:00 |
Claudio Atzori
|
1ba582de3c
|
[graph cleaning] added cleaning for result.publisher and result.instance.license
|
2023-11-23 16:27:19 +01:00 |
Claudio Atzori
|
11a1207f9c
|
[graph cleaning] applying coar based vocabularies in bulk
|
2023-11-22 12:22:14 +01:00 |
Claudio Atzori
|
5f1ed61c1f
|
merging from bulkTag branch
|
2023-11-03 12:51:37 +01:00 |
Claudio Atzori
|
dde2fec035
|
[graph cleaning] cleanup
|
2023-10-31 14:35:33 +01:00 |
Claudio Atzori
|
262d7c581b
|
[graph cleaning] implemented further suggestions from https://support.openaire.eu/issues/8898
|
2023-10-31 14:34:10 +01:00 |
Miriam Baglioni
|
0097f4e64b
|
Removed Query community testing. Removed package from common related to the interaction with Zenodo since it was moved to the dump-project
|
2023-10-26 09:38:09 +02:00 |
Miriam Baglioni
|
a9ede1e989
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2023-10-20 10:14:43 +02:00 |
Claudio Atzori
|
b0fed1725e
|
avoid NPEs
|
2023-10-19 12:13:45 +02:00 |
Claudio Atzori
|
a24178cb93
|
Merge branch 'beta' into resource_types
|
2023-10-17 11:09:50 +02:00 |
Claudio Atzori
|
d28b7085f6
|
more NPE checks
|
2023-10-17 11:09:31 +02:00 |
Giambattista Bloisi
|
0e44b037a5
|
FIX: GroupEntitiesSparkJob deletes whole graph outputPath instead of its temporary folder
|
2023-10-17 07:54:01 +02:00 |
Claudio Atzori
|
39d24d5469
|
Merge branch 'beta' into resource_types
|
2023-10-16 11:56:38 +02:00 |
Claudio Atzori
|
05ee7d8b09
|
[graph cleaning] avoid NPEs
|
2023-10-12 09:13:42 +02:00 |
Claudio Atzori
|
554551682d
|
[raw graph] adopting the new COAR based vocabularies for the resource typing
|
2023-10-11 16:09:19 +02:00 |
Claudio Atzori
|
8108491722
|
Merge branch 'beta' into peer_reviewed
|
2023-10-06 14:21:52 +02:00 |
Giambattista Bloisi
|
2f3cf6d0e7
|
Fix cleaning of Pmid where parsing of numbers stopped at the first non-leading '0' character
|
2023-10-06 14:20:15 +02:00 |
Giambattista Bloisi
|
2c235e82ad
|
Fix cleaning of Pmid where parsing of numbers stopped at the first non-leading '0' character
|
2023-10-06 12:35:54 +02:00 |
Claudio Atzori
|
eed9fe0902
|
code formatting
|
2023-10-06 12:31:17 +02:00 |
Claudio Atzori
|
73c49b8d26
|
Merge branch 'beta' into SWH_integration
|
2023-10-06 12:21:51 +02:00 |
Claudio Atzori
|
c9a5ad6a02
|
extending the coverage of the peer non-unknown refereed instances
|
2023-10-02 16:28:42 +02:00 |
Serafeim Chatzopoulos
|
ab0d70691c
|
Add step for archiving repoUrls to SWH
|
2023-09-28 20:56:18 +03:00 |
Serafeim Chatzopoulos
|
ed9c81a0b7
|
Add steps to collect last visit data && archive not found repository URLs
|
2023-09-27 19:00:54 +03:00 |
Claudio Atzori
|
8a6892cc63
|
[graph dedup] consistency wf should not remove the relations while dispatching the entities
|
2023-09-12 21:27:05 +02:00 |
Claudio Atzori
|
dc80ab14d3
|
[graph dedup] consistency wf should not remove the relations while dispatching the entities
|
2023-09-12 14:34:28 +02:00 |
Giambattista Bloisi
|
6cc7d8ca7b
|
GroupEntities and DispatchEntities are now merged in GroupEntitiesSparkJob
|
2023-08-30 10:43:31 +02:00 |
Claudio Atzori
|
bf35280ea6
|
code formatting
|
2023-08-29 11:11:00 +02:00 |
Giambattista Bloisi
|
95cd2b9b1e
|
Make filterInvisible a mandatory parameter of DispatchEntitiesSparkJob
Make filterInvisible a mandatory parameter of both dedup/consistency and graph/group oozie workflows
|
2023-08-10 11:53:48 +02:00 |
Giambattista Bloisi
|
fab9920271
|
DispatchEntitiesSparkJob: manage all entity types together, support filtering by dataInfo.invisible flag
|
2023-08-09 15:41:43 +02:00 |
Miriam Baglioni
|
599828ce35
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2023-08-09 13:07:13 +02:00 |
Miriam Baglioni
|
c25ac21e5e
|
Merge pull request 'graph cleaning, suggestions from ticket 8898' (#325) from cleaning_8898 into beta
Reviewed-on: #325
|
2023-08-08 11:14:19 +02:00 |
Claudio Atzori
|
0bc74e2000
|
code formatting
|
2023-08-02 11:52:10 +02:00 |
Claudio Atzori
|
7180911ded
|
[graph cleaning] fixed regex behaviour for cleaning ROR and GRID identifiers, added tests
|
2023-08-02 11:44:14 +02:00 |
Claudio Atzori
|
b9dddbfe54
|
rule out records with NULL dataInfo, except for Relations
|
2023-07-31 17:53:54 +02:00 |
Claudio Atzori
|
da1727f93f
|
rule out records with NULL dataInfo, except for Relations
|
2023-07-31 17:52:56 +02:00 |
Claudio Atzori
|
11ffb9bd68
|
rule out records with NULL dataInfo
|
2023-07-31 12:35:33 +02:00 |
Claudio Atzori
|
ccac6a7f75
|
rule out records with NULL dataInfo
|
2023-07-31 12:35:05 +02:00 |
Claudio Atzori
|
d512df8612
|
code formatting
|
2023-07-26 09:14:08 +02:00 |
Claudio Atzori
|
d8435a6512
|
inverted condition
|
2023-07-25 17:39:57 +02:00 |
Claudio Atzori
|
59764145bb
|
cherry picked & fixed commit 270df939c4
|
2023-07-25 17:39:00 +02:00 |
Claudio Atzori
|
270df939c4
|
partial implementation of the suggestions from https://support.openaire.eu/issues/8898
|
2023-07-25 17:29:50 +02:00 |
Claudio Atzori
|
c754397a19
|
Merge branch 'beta' into pid_cleaning
|
2023-07-24 10:49:31 +02:00 |
Giambattista Bloisi
|
38dfebfbe6
|
Disable MdStoreClientTest test as it requires a local mongodb running and it does not perform any assertions
|
2023-07-19 14:18:56 +02:00 |
Miriam Baglioni
|
9e8e39f78a
|
-
|
2023-07-19 11:35:58 +02:00 |
Claudio Atzori
|
f3a85e224b
|
merged from branch beta the bulk tagging (single step, negative constraints), the cleaning workflow (single step, pid type based cleaning), instance level fulltext
|
2023-06-28 13:33:57 +02:00 |
Sandro La Bruzzo
|
9910ce06ae
|
added to CreateSimRel the feature to write time logs
|
2023-06-28 11:38:16 +02:00 |
Sandro La Bruzzo
|
b195da3a83
|
Added utility to write time logs during the deduplication phase
|
2023-06-28 11:20:09 +02:00 |
Claudio Atzori
|
0f5a819f44
|
[graph cleaning] fixed regex behaviour for cleaning ROR and GRID identifiers, added tests
|
2023-06-23 16:10:49 +02:00 |
Miriam Baglioni
|
e4b27182d0
|
[master] refactoring
|
2023-06-21 11:15:53 +02:00 |
Claudio Atzori
|
1d33074fd1
|
WIP: pid cleaning
|
2023-06-09 16:47:25 +02:00 |
Miriam Baglioni
|
d9506035e4
|
[ZenodoApi] went back to okhttp3 to send the payload.
|
2023-06-09 12:05:02 +02:00 |
Claudio Atzori
|
8a463cc3e8
|
fixed organization id created when mapping APC affiliations. Factored out ROR constants in dhp-common
|
2023-05-15 15:44:46 +02:00 |
Claudio Atzori
|
d02916ef82
|
code formatting
|
2023-05-02 11:05:37 +02:00 |
Claudio Atzori
|
851f664bd9
|
Merge branch 'beta' into graph_cleaning_refactoring
|
2023-05-02 09:55:40 +02:00 |
Miriam Baglioni
|
9fc8ebe98b
|
refactoring
|
2023-04-19 09:32:13 +02:00 |
Miriam Baglioni
|
73f77575bd
|
[ZenodoApiClient] align with master version
|
2023-04-18 10:25:27 +02:00 |
Miriam Baglioni
|
24c41806ac
|
[ZenodoApiClienttest] change test to mirror the change in the implementation
|
2023-04-18 09:08:09 +02:00 |
Miriam Baglioni
|
087b5a7973
|
[ZenodoAPIClient] new version of the API to connect to Zenodo (change the http client)
|
2023-04-17 18:59:22 +02:00 |
Miriam Baglioni
|
c6a7602b3e
|
refactoring after compilation
|
2023-04-06 14:45:01 +02:00 |
Claudio Atzori
|
2a6ba29b64
|
[graph cleaning] unit tests & cleanup
|
2023-04-04 12:34:51 +02:00 |
Miriam Baglioni
|
9a9cc6a1dd
|
changed the way the tar archive is built to support renaming in case we need to change .tt.gz into .json.gz
|
2023-04-04 11:40:58 +02:00 |
Claudio Atzori
|
6d3d18d8b5
|
[graph cleaning] WIP: refactoring of the cleaning stages
|
2023-03-16 17:23:36 +01:00 |
Miriam Baglioni
|
32870339f5
|
refactoring after compile
|
2023-02-13 13:06:48 +01:00 |
Sandro La Bruzzo
|
0b9819f1ab
|
Code formatted
|
2023-02-08 10:32:33 +01:00 |
Sandro La Bruzzo
|
6c81a161d2
|
Merge remote-tracking branch 'origin/beta' into 8231-mdstore-synch-improve
|
2023-02-08 10:29:09 +01:00 |
Miriam Baglioni
|
b713132db7
|
[Cleaning] adding missing classes
|
2022-12-21 12:49:08 +01:00 |