Commit Graph

2530 Commits

Author SHA1 Message Date
Sandro La Bruzzo 31d2d6d41e Scholexplorer: introduction of dedup openaire 2021-07-21 18:09:32 +02:00
Miriam Baglioni b226ba4439 mergin with branch beta 2021-07-21 09:46:40 +02:00
Claudio Atzori 10d7b4f0b4 filtering 'old' OpenAIRE ids from the entity.originalId[] array in the OAF -> XML searialization procedure 2021-07-20 11:52:05 +02:00
Miriam Baglioni 83fe31c92e changed the name of the workflows 2021-07-19 18:19:14 +02:00
Miriam Baglioni dd81c36b60 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2021-07-19 18:18:14 +02:00
Miriam Baglioni 54acc5373b changed the name of the workflows 2021-07-19 18:18:09 +02:00
Miriam Baglioni b420b11ed3 duplicate the number of partitions in ProcessMag 2021-07-19 18:16:23 +02:00
Claudio Atzori 65934888a1 adding record identifier among the originalIds regardless of what IdentifierFactory produces 2021-07-19 17:52:52 +02:00
Claudio Atzori 0977baf41d contents mapped from the stores with 'claim' interpretation will not change their identifier along their way towards the graph 2021-07-19 17:43:52 +02:00
Miriam Baglioni 662c396354 duplicate the number of partitions in ConvertCrossrefToOaf 2021-07-19 12:41:14 +02:00
Miriam Baglioni 59530a14fb DoiBoost AccessRigh #4362 - set BestAccessRight with the ususal comparator 2021-07-19 12:34:35 +02:00
Miriam Baglioni 199123b74b DoiBoost AccessRigh #4362 - Fixed issue on date formatting. Added test method and associated resource 2021-07-16 17:30:27 +02:00
Miriam Baglioni 3bc9a05bc9 mergin with branch beta 2021-07-16 10:32:27 +02:00
Miriam Baglioni 34506df1b6 DoiBoost AccessRigh #4362 - if the journal is open, the OPEN access right is set to all instances and color is GOLD (overwrite if the color was already set in one of the previous steps) 2021-07-16 10:29:51 +02:00
Claudio Atzori bf9e0d2d4f Merge pull request 'orcid-no-doi' (#123) from enrico.ottonello/dnet-hadoop:orcid-no-doi into beta
Reviewed-on: D-Net/dnet-hadoop#123 2021-07-15 17:59:41 +02:00
Sandro La Bruzzo 7e2caafe84 Scholexplorer: fixed mapping typologies 2021-07-15 09:53:12 +02:00
Miriam Baglioni 4da46bb62f mergin with branch beta 2021-07-14 15:08:52 +02:00
Miriam Baglioni 09ad7b2a9e DoiBoost AccessRigh #4362 - Unpaywall mapped to OAF with OPEN instance (non oa are filtered out) (unknown hostedby) + map the color as it is 2021-07-14 14:45:21 +02:00
Miriam Baglioni f4f7c6f9d3 DoiBoost AccessRigh #4362 - Unpaywall mapped to OAF with OPEN instance (non oa are filtered out) (unknown hostedby) + map the color as it is 2021-07-14 14:44:54 +02:00
Miriam Baglioni 6222adf176 DoiBoost AccessRigh #4362 - added resources and test for crossref mapping (licence part included) 2021-07-14 14:42:34 +02:00
Miriam Baglioni 981b1018f6 DoiBoost AccessRigh #4362 - decide access right according to licence. Default access right is Unknown 2021-07-14 14:42:06 +02:00
Sandro La Bruzzo 3d8e2aa146 Code refactor:
- removed old workflows in doiboost
- splitted workflow of doiboost in preprocess and process 2021-07-14 14:37:06 +02:00
Miriam Baglioni 441701c85c DoiBoost AccessRigh #4362 - If multiple licenses are available, take the one applied to 'vor' 2021-07-14 14:14:50 +02:00
Sandro La Bruzzo c35c117601 fixed process doiboost workflow:
- splitted OrcidToOAF into two phase preprocess and process
- updated workflow used in production 2021-07-14 12:48:01 +02:00
Sandro La Bruzzo bbe8193930 merged stable ids 2021-07-12 17:00:43 +02:00
Claudio Atzori ae2b47b29d [broker] added coalesce(1) on the stats dataset before storing it on postgres 2021-07-09 15:47:51 +02:00
Sandro La Bruzzo 57c74c73c6 fixed mistakes in oozie workflow 2021-07-09 12:28:09 +02:00
Sandro La Bruzzo 61ccb54fde removed wrong loop on oozie wf 2021-07-09 12:17:57 +02:00
Sandro La Bruzzo 9f5a0f3ab6 moved wf indexing of Scholexplorer in dhp-graph-provision 2021-07-09 12:06:43 +02:00
Sandro La Bruzzo 09fccf8000 added workflow to serialize scholix and summary in json 2021-07-09 11:01:42 +02:00
Sandro La Bruzzo 0ea576745f updated CreateInputGraph because ggenerics don't work on Spark Dataset 2021-07-09 10:29:24 +02:00
Sandro La Bruzzo cd17e19044 implemented branch workflow to import datacite and crossref in scholexplorer 2021-07-08 21:20:19 +02:00
Miriam Baglioni c30f3ce647 merge doi normalization 2021-07-08 19:20:02 +02:00
Sandro La Bruzzo 8a034e46e1 updated baseline workflow 2021-07-08 11:11:41 +02:00
Claudio Atzori b7b8e0986e [raw_all] The claim merge procedure includes the claimed contexts in the merged result 2021-07-08 10:42:31 +02:00
Sandro La Bruzzo 0799ac9fb6 fixed wrong path 2021-07-08 10:36:37 +02:00
Sandro La Bruzzo 4d53402712 extended ebiLinks to create a dataset before generation of OAF 2021-07-08 10:26:21 +02:00
Sandro La Bruzzo a4a54a3786 code refactor 2021-07-08 09:08:25 +02:00
Sandro La Bruzzo a01dbe0ab0 completed workflow of generation of scholix and summaries 2021-07-07 23:10:34 +02:00
Claudio Atzori fdcff42e46 [raw_all] Aggregator graph creation merges claims (updates) with the corresponding entity 2021-07-07 19:01:59 +02:00
Claudio Atzori 777536ce91 [aggregation] string values used as regular expressions in the OAI collection classes are defined in a single point as constants, to be reused across the code (PR#122) 2021-07-07 11:23:48 +02:00
Claudio Atzori bc014023c8 Merge pull request 'to solve the scala SI-3623' (#122) from andreas.czerniak/BrStableId_dnet-hadoop:stable_ids into stable_ids
Reviewed-on: D-Net/dnet-hadoop#122 2021-07-07 11:13:51 +02:00
Claudio Atzori 32bdfdccbc [raw_all] Aggregator graph creation merges claims (updates) with the corresponding entity 2021-07-07 11:08:27 +02:00
Andreas Czerniak ebf3f47a02 from&until more OAI2.0 compl., adding tfs 2021-07-07 09:29:49 +02:00
Claudio Atzori f580cb77e1 added mapping for claim relation 'resultResult_publicationDataset_isRelatedTo' (present on BETA) 2021-07-06 21:11:11 +02:00
Sandro La Bruzzo ed684874f2 deleted old scholix project 2021-07-06 17:20:08 +02:00
Sandro La Bruzzo 8535506c22 added scholix generation 2021-07-06 17:18:06 +02:00
Sandro La Bruzzo 4c54bd8742 add test to verify merge scholix on source 2021-07-06 11:32:14 +02:00
Andreas Czerniak 3531802710 to solve the scala SI-3623 2021-07-06 11:30:56 +02:00
Sandro La Bruzzo 7d8db2eb8a betterRenamingMethod 2021-07-06 09:56:32 +02:00
Sandro La Bruzzo c952c8d236 generate first side of scholix mapping 2021-07-06 09:53:14 +02:00
Claudio Atzori 70ded407bb HttpClient used in metadata collection retries also on 404 2021-07-05 18:04:30 +02:00
Miriam Baglioni 7177c25261 added check for null value during doi normalization 2021-07-05 16:22:38 +02:00
Miriam Baglioni 0892cad4e8 the normalization of the content of value was not visible outside the block. Moved doi normalization operation while returning value 2021-07-05 16:21:42 +02:00
Antonis Lempesis 89e6f46682 using organization ids instead of names in monitor db creation 2021-07-05 12:00:00 +03:00
Sandro La Bruzzo e4b84ef5d6 fixed mapping OAF to Scholix summary 2021-07-02 16:48:48 +02:00
Sandro La Bruzzo c6fa8598e1 massive code refactor:
removed modules dhp-*-scholexplorer 2021-07-01 22:13:45 +02:00
Antonis Lempesis 829caee4fd added the missing indicators files 2021-06-30 17:31:33 +02:00
Sandro La Bruzzo 84b834c893 added test dataset test for pangaea 2021-06-30 17:31:09 +02:00
Sandro La Bruzzo 1a6b398968 implemented Creation of Raw Graph and Resolution 2021-06-30 17:27:55 +02:00
Miriam Baglioni bc34347643 added assertions to verify doi normalization 2021-06-30 14:37:08 +02:00
Miriam Baglioni 86f47afcc7 slight modification of the resource to accomodate also doi normalization tests 2021-06-30 14:36:49 +02:00
Miriam Baglioni 03767ea8e6 slight modification of the resource to accomodate also doi normalization tests 2021-06-30 13:21:24 +02:00
Miriam Baglioni f8eec0ca9a added resource to test the normalization of doi during the import of MAG 2021-06-30 13:19:54 +02:00
Miriam Baglioni 149f85ddf5 added tests for the normalization of the dois 2021-06-30 13:00:52 +02:00
Miriam Baglioni e487b5544c added tests for the normalization of the dois 2021-06-30 12:57:11 +02:00
Miriam Baglioni 1503ccbbb5 added tests for the normalization of the dois 2021-06-30 12:55:37 +02:00
Miriam Baglioni 1299bfb357 Added class to test the normalization of doi 2021-06-30 12:53:27 +02:00
Sandro La Bruzzo 623a0c4edb code Refactor, renaming packages 2021-06-30 11:09:30 +02:00
Miriam Baglioni cf758f4f91 added normalization step for the doi 2021-06-30 10:03:15 +02:00
Miriam Baglioni 801763a0fa there is no more the need to lower case the doi since it is done in the first step. Also changed the creation of the id by using the factory 2021-06-29 19:07:23 +02:00
Miriam Baglioni a74de1cda2 added normalization step to the doi 2021-06-29 18:51:11 +02:00
Miriam Baglioni 06074ea7d3 added normalization step to the doi 2021-06-29 18:46:08 +02:00
Miriam Baglioni 8b8ffe82dc added step of normalization for the doi 2021-06-29 18:41:39 +02:00
Miriam Baglioni 50cc21d92e Added method to normalize doi values (lower case, remove all preceeding 10., filtering out doi not starting with 10.) 2021-06-29 18:35:28 +02:00
Antonis Lempesis 87f14a3899 added the missing indicators files 2021-06-29 16:31:51 +03:00
Sandro La Bruzzo db933ebd21 Merge remote-tracking branch 'origin/stable_ids' into stable_id_scholexplorer 2021-06-29 14:16:12 +02:00
Sandro La Bruzzo 7e08655e5f added relation dates in all scholexplorer Datasources 2021-06-29 12:02:03 +02:00
Sandro La Bruzzo 075055eaca added relation dates in bio mapping 2021-06-29 10:33:09 +02:00
Sandro La Bruzzo f36f92287d implemented mapping from Crossref Event Data to Oaf 2021-06-29 10:21:23 +02:00
Antonis Lempesis 018c4eb52c copied latest changes from old fork: indicators+monitor institutions 2021-06-28 23:46:52 +03:00
Sandro La Bruzzo 511ec14c63 implemented mapping from EBI and Scholix Resolved to OAF 2021-06-28 22:04:22 +02:00
Claudio Atzori af42377d0e HttpClient used in metadata collection retries on 502, 503, 504 2021-06-28 09:34:30 +02:00
Sandro La Bruzzo ad50415167 Merge remote-tracking branch 'origin/stable_ids' into stable_id_scholexplorer 2021-06-24 17:20:50 +02:00
Sandro La Bruzzo 80e15cc455 implemented mapping from uniprot, pdb and ebi links 2021-06-24 17:20:00 +02:00
Claudio Atzori 2e8fd2c531 cleanup 2021-06-23 14:38:24 +02:00
Claudio Atzori 4dc9ebf217 [raw_all] fixed unit test 2021-06-23 14:38:07 +02:00
Claudio Atzori 50fc5a64a0 [raw_all] Aggregator graph creation merges claims (updates) with the corresponding entity 2021-06-23 11:49:42 +02:00
Claudio Atzori 5edcc6832a applying sonarLint suggestions 2021-06-23 09:53:29 +02:00
Sandro La Bruzzo 080a280bea added pdb to Oaf Transformation 2021-06-21 16:23:59 +02:00
Sandro La Bruzzo 1dc0c59e20 merged fix thai dates from stable_ids 2021-06-21 10:39:46 +02:00
Sandro La Bruzzo dc66cf615b Merge branch 'stable_id_scholexplorer' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_id_scholexplorer 2021-06-21 09:38:33 +02:00
Sandro La Bruzzo 507e42102a added pdb to oaf class 2021-06-21 09:36:40 +02:00
Sandro La Bruzzo a167543637 Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_id_scholexplorer 2021-06-21 09:14:11 +02:00
Sandro La Bruzzo 4fe7b75644 renamed packages 2021-06-18 16:41:24 +02:00
Sandro La Bruzzo 3990165d05 changed typologies of unresolved relation 2021-06-18 11:43:59 +02:00
Miriam Baglioni 180d671127 Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids 2021-06-18 09:46:18 +02:00
Miriam Baglioni 13c96622c9 - 2021-06-18 09:45:16 +02:00
Miriam Baglioni b486ae498f added test and test resource to verify the generation of the date of acceptance from the input extracted from the dump 2021-06-18 09:43:32 +02:00
Miriam Baglioni 464c2ddde3 changed to split in two steps the generation of the crossref dataset 2021-06-18 09:42:31 +02:00