Commit Graph

2788 Commits

Author SHA1 Message Date
miconis 680bfa490f Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids 2021-09-17 11:27:25 +02:00
miconis 43ac539414 minor change 2021-09-17 11:27:06 +02:00
Miriam Baglioni c26980f1c4 Adding spark.close() to avoid the "Only one SparkContext may be running in this JVM" error while running tests on Jenkins, and fixed issue 2021-07-13 10:33:00 +02:00
Miriam Baglioni 4f309e625c Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids 2021-07-12 10:07:01 +02:00
Miriam Baglioni 1ea66e8917 some more tests for authormerger 2021-07-12 10:06:29 +02:00
Claudio Atzori ae2b47b29d [broker] added coalesce(1) on the stats dataset before storing it on postgres 2021-07-09 15:47:51 +02:00
Miriam Baglioni c30f3ce647 merge doi normalization 2021-07-08 19:20:02 +02:00
Miriam Baglioni 6e987fc084 Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids 2021-07-08 18:57:25 +02:00
Miriam Baglioni b0d86d32b0 added list of author to be merged 2021-07-08 18:56:29 +02:00
Miriam Baglioni abe546e5ba added resource files for test author merger for empty crossref and other merging providers (related to DoiBoostAuthorMerger) 2021-07-08 18:55:55 +02:00
Miriam Baglioni bf24f588e2 Added test for empty author list for crossref and other merging providers (related to DoiBoostAuthorMerger) 2021-07-08 18:55:13 +02:00
Miriam Baglioni 96255fa647 - 2021-07-08 18:54:27 +02:00
Miriam Baglioni 0e47e94099 Added variable to verify if crossref is base for the merging of authors (related to DoiBoostAuthorMerger) 2021-07-08 18:54:07 +02:00
Miriam Baglioni 434aa6380b Adding description of the merging process for DoiBoost (related to DoiBoostAuthorMerger) - to be refined 2021-07-08 18:53:15 +02:00
Miriam Baglioni e0e80cde22 Added class to store the most similar author list to be enriched w.r.t. one enriching author (related to DoiBoostAuthorMerger) 2021-07-08 18:52:25 +02:00
Miriam Baglioni 97e0c27db9 Added check for empty author list. If crossref is empty, the longest from all the merging providers is taken. If crossref is not empty, crossref is chosen as base for the enrichment 2021-07-08 15:27:05 +02:00
Claudio Atzori b7b8e0986e [raw_all] The claim merge procedure includes the claimed contexts in the merged result 2021-07-08 10:42:31 +02:00
Claudio Atzori fdcff42e46 [raw_all] Aggregator graph creation merges claims (updates) with the corresponding entity 2021-07-07 19:01:59 +02:00
Claudio Atzori 777536ce91 [aggregation] string values used as regular expressions in the OAI collection classes are defined in a single point as constants, to be reused across the code (PR#122) 2021-07-07 11:23:48 +02:00
Claudio Atzori bc014023c8 Merge pull request 'to solve the scala SI-3623' (#122) from andreas.czerniak/BrStableId_dnet-hadoop:stable_ids into stable_ids (Reviewed-on: D-Net/dnet-hadoop#122) 2021-07-07 11:13:51 +02:00
Claudio Atzori 32bdfdccbc [raw_all] Aggregator graph creation merges claims (updates) with the corresponding entity 2021-07-07 11:08:27 +02:00
Andreas Czerniak ebf3f47a02 from&until more OAI2.0 compl., adding tfs 2021-07-07 09:29:49 +02:00
Claudio Atzori f580cb77e1 added mapping for claim relation 'resultResult_publicationDataset_isRelatedTo' (present on BETA) 2021-07-06 21:11:11 +02:00
Andreas Czerniak 3531802710 to solve the scala SI-3623 2021-07-06 11:30:56 +02:00
Claudio Atzori 70ded407bb HttpClient used in metadata collection retries also on 404 2021-07-05 18:04:30 +02:00
Miriam Baglioni 3ed90420e4 Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids 2021-07-05 16:48:19 +02:00
Miriam Baglioni 7498e63174 added resource files for testing of DoiBoostAuthorMerger 2021-07-05 16:26:46 +02:00
Miriam Baglioni 22ce947335 added resource files for testing of DoiBoostAuthorMerger 2021-07-05 16:26:17 +02:00
Miriam Baglioni f64f5d9e23 first implementation and test class for the specific Author Merger for doiboost. First change: crossref as base to be enriched. Modified the normalization function to remove accents from words 2021-07-05 16:24:47 +02:00
Miriam Baglioni 238d692a0a apply specific AuthorMerger for doiboost 2021-07-05 16:23:33 +02:00
Miriam Baglioni 7177c25261 added check for null value during doi normalization 2021-07-05 16:22:38 +02:00
Miriam Baglioni 0892cad4e8 the normalization of the content of value was not visible outside the block. Moved doi normalization operation while returning value 2021-07-05 16:21:42 +02:00
Claudio Atzori 350a0823bd Merge pull request 'using organization ids instead of names in monitor db creation' (#121) from antonis.lempesis/dnet-hadoop:stable_ids into stable_ids (Reviewed-on: D-Net/dnet-hadoop#121) 2021-07-05 11:07:39 +02:00
Antonis Lempesis 89e6f46682 using organization ids instead of names in monitor db creation 2021-07-05 12:00:00 +03:00
Miriam Baglioni bc34347643 added assertions to verify doi normalization 2021-06-30 14:37:08 +02:00
Miriam Baglioni 86f47afcc7 slight modification of the resource to accommodate also doi normalization tests 2021-06-30 14:36:49 +02:00
Miriam Baglioni 03767ea8e6 slight modification of the resource to accommodate also doi normalization tests 2021-06-30 13:21:24 +02:00
Miriam Baglioni f8eec0ca9a added resource to test the normalization of doi during the import of MAG 2021-06-30 13:19:54 +02:00
Miriam Baglioni 149f85ddf5 added tests for the normalization of the dois 2021-06-30 13:00:52 +02:00
Miriam Baglioni e487b5544c added tests for the normalization of the dois 2021-06-30 12:57:11 +02:00
Miriam Baglioni 1503ccbbb5 added tests for the normalization of the dois 2021-06-30 12:55:37 +02:00
Miriam Baglioni 1299bfb357 Added class to test the normalization of doi 2021-06-30 12:53:27 +02:00
Miriam Baglioni cf758f4f91 added normalization step for the doi 2021-06-30 10:03:15 +02:00
Miriam Baglioni 801763a0fa there is no longer any need to lower case the doi since it is done in the first step. Also changed the creation of the id by using the factory 2021-06-29 19:07:23 +02:00
Miriam Baglioni a74de1cda2 added normalization step to the doi 2021-06-29 18:51:11 +02:00
Miriam Baglioni 06074ea7d3 added normalization step to the doi 2021-06-29 18:46:08 +02:00
Miriam Baglioni 8b8ffe82dc added step of normalization for the doi 2021-06-29 18:41:39 +02:00
Miriam Baglioni 50cc21d92e Added method to normalize doi values (lower case, remove all preceding 10., filtering out doi not starting with 10.) 2021-06-29 18:35:28 +02:00
Claudio Atzori 6d3f960238 Merge pull request 'added the missing indicators files' (#120) from antonis.lempesis/dnet-hadoop:stable_ids into stable_ids (Reviewed-on: D-Net/dnet-hadoop#120) 2021-06-29 15:57:39 +02:00
Antonis Lempesis ae18171212 Merge branch 'stable_ids' into stable_ids 2021-06-29 15:33:39 +02:00