Commit Graph

372 Commits

Author SHA1 Message Date
Miriam Baglioni b2bb8d9d79 [DOIBoost - Mapping] selecting the url from Crossref containing the doi 2021-11-03 15:44:57 +01:00
Miriam Baglioni 779318961c [DOIBoost - Mapping] removed the url from crossref containing the api.elsevier.com... string in the url 2021-11-03 14:38:52 +01:00
Miriam Baglioni 2480e590d1 [DOIBoost - Mapping] changed the type on which to map dissertation from Crossref: from 006 Doctoral thesis to 0044 Thesis since dissertation could be either Doctoral or master thesis 2021-11-03 14:25:23 +01:00
Miriam Baglioni 46f82c7c8f removed not needed folder deletion 2021-10-18 10:53:16 +02:00
Miriam Baglioni 5d9cc2452d changed the working path parameter value as dependant from the dnet-workflow working dir parameter 2021-10-13 15:33:50 +02:00
Miriam Baglioni 16b28494a9 added new parameter in the doiboost process workflow to specify a folder for the process of MAG dataset 2021-10-13 11:34:24 +02:00
Sandro La Bruzzo 8f99d2af86 Make the node of doiBoost to point to the correct OpenAire Organization in relations 2021-10-08 08:35:12 +02:00
Sandro La Bruzzo be79d74e3d Fixed DoiBoost generation to point to correct organization in affiliation relation 2021-09-27 16:57:04 +02:00
Enrico Ottonello 92a63f78fe multiple download attempts handling if a connection to orcid server fails 2021-09-20 18:25:00 +02:00
Enrico Ottonello 8b804e7fe1 removed unused imports 2021-09-14 17:30:52 +02:00
Enrico Ottonello aefa36c54b other task executions go ahead if UnknownHostException happens on a single task 2021-09-14 17:26:15 +02:00
Miriam Baglioni 882abb40e4 CrossrefDump - 2021-08-20 11:12:53 +02:00
Miriam Baglioni 45c62609af CrossrefDump - modified because parameter file was moved 2021-08-20 11:12:31 +02:00
Miriam Baglioni 35880c0e7b CrossrefDump - changed the wf to be able to resume from one of the steps 2021-08-20 11:11:35 +02:00
Miriam Baglioni f3b6c392c1 CrossrefDump - moving parameter file under folder crossref_dump_reader 2021-08-20 11:10:58 +02:00
Miriam Baglioni 65822400ce CrossrefDump - added new parameter file that was missing 2021-08-20 11:10:35 +02:00
Claudio Atzori bc7068106c added crossref download oozie workflow 2021-08-13 17:19:44 +02:00
Claudio Atzori 2c0a05f11a manually merged PR#139 2021-08-13 17:15:53 +02:00
Miriam Baglioni 5856ca8a7b merging with branch beta - resolved conflicts 2021-08-13 16:45:45 +02:00
Miriam Baglioni 6fec71e8d2 removed the specific of the infra we are running the wf from the wf name 2021-08-13 16:39:02 +02:00
Miriam Baglioni ed7e28490a change in sh 2021-08-13 16:19:01 +02:00
Miriam Baglioni 6eb7508995 mergin with branch beta 2021-08-13 16:07:04 +02:00
Miriam Baglioni dc8b05b39e Hosted By Map - changed the association with the datasource id for the hostedby element: there is no more the need to compute it. With the new HBM it is already the id in the graph 2021-08-13 10:18:25 +02:00
Miriam Baglioni ac417ca798 removed not needed test resource 2021-08-11 17:50:33 +02:00
Miriam Baglioni e33daaeee8 reverting 2021-08-11 17:46:19 +02:00
Miriam Baglioni 95e5482bbb removing not needed dependency 2021-08-11 17:42:26 +02:00
Miriam Baglioni b6b58bba28 reverting 2021-08-11 17:25:37 +02:00
Miriam Baglioni 52c18c2697 removed not needed test class. Teh functionality has been moved 2021-08-11 16:16:55 +02:00
Miriam Baglioni 8da3a25cf6 merging with branch beta 2021-08-11 15:55:34 +02:00
Claudio Atzori 9f4db73f30 updated/fixed unit tests 2021-08-11 15:02:51 +02:00
Claudio Atzori 2ee21da43b suggestions from SonarLint 2021-08-11 12:13:22 +02:00
Miriam Baglioni da20fceaf7 removed all the part related to the crossref dump download since it is done in a separate workflow 2021-08-09 11:53:45 +02:00
Miriam Baglioni 54a6cbb244 CrossrefDump - put token among the parameters 2021-08-09 11:41:10 +02:00
Miriam Baglioni b7079804cb CrossrefDump - put token among the parameters 2021-08-09 11:34:35 +02:00
Miriam Baglioni bd096f5170 removed not needed param file 2021-08-05 10:55:43 +02:00
Miriam Baglioni 5faeefbda8 added script to download the dump,changed the workflow input paramenters 2021-08-05 10:54:03 +02:00
Miriam Baglioni 1965e4eece new workflow for downloading the dump of crossref and unpack it 2021-08-04 18:29:03 +02:00
Miriam Baglioni b4eb026c8b mergin with branch beta 2021-08-04 10:21:37 +02:00
Miriam Baglioni 9831725073 Hosted By Map - remove from workflow a step not needed. The hbm will be take care also of the integration of the unibi list of gold openaccess journals 2021-08-03 11:02:17 +02:00
Miriam Baglioni 1d6ac3715b merge branch with beta 2021-07-30 11:58:29 +02:00
Miriam Baglioni baad01cadc hostedbymap 2021-07-29 13:04:39 +02:00
Miriam Baglioni 3d2bba3d5d removing not needed classes 2021-07-28 11:25:43 +02:00
Miriam Baglioni 80d5b3b4de DoiBoost AccessRigh #4362 - removing commented code 2021-07-28 11:16:49 +02:00
Miriam Baglioni 5fe016dcbc DoiBoost AccessRigh #4362 - related to https://code-repo.d4science.org/D-Net/dnet-hadoop/pulls/126/files#issuecomment-4194 2021-07-28 11:14:28 +02:00
Miriam Baglioni 43e62fcae9 DoiBoost AccessRigh #4362 - related to https://code-repo.d4science.org/D-Net/dnet-hadoop/pulls/126/files#issuecomment-4193 2021-07-28 11:04:55 +02:00
Miriam Baglioni eb07f7f40f Hosted By Map 2021-07-27 12:27:26 +02:00
Miriam Baglioni 63553a76b3 added code to download gold issn list from unibi 2021-07-22 12:01:48 +02:00
Miriam Baglioni 1a5b114906 DoiBoost AccessRigh #4362 - refactoring 2021-07-22 12:00:23 +02:00
Miriam Baglioni b226ba4439 mergin with branch beta 2021-07-21 09:46:40 +02:00
Miriam Baglioni 83fe31c92e changed the name of the workflows 2021-07-19 18:19:14 +02:00
Miriam Baglioni 54acc5373b changed the name of the workflows 2021-07-19 18:18:09 +02:00
Miriam Baglioni b420b11ed3 duplicate the number of partitions in ProcessMag 2021-07-19 18:16:23 +02:00
Miriam Baglioni 662c396354 duplicate the number of partitions in ConvertCrossrefToOaf 2021-07-19 12:41:14 +02:00
Miriam Baglioni 59530a14fb DoiBoost AccessRigh #4362 - set BestAccessRight with the ususal comparator 2021-07-19 12:34:35 +02:00
Miriam Baglioni 199123b74b DoiBoost AccessRigh #4362 - Fixed issue on date formatting. Added test method and associated resource 2021-07-16 17:30:27 +02:00
Miriam Baglioni c4b18e6ccb changed the download.sh, added skip step to allow to not execute one phase and changed the workflow sequence of steps 2021-07-16 15:01:25 +02:00
Miriam Baglioni acd6056330 added shell action to automatically download the new dump and put it in a specified hdfs location 2021-07-16 12:47:10 +02:00
Miriam Baglioni 3bc9a05bc9 mergin with branch beta 2021-07-16 10:32:27 +02:00
Miriam Baglioni 34506df1b6 DoiBoost AccessRigh #4362 - if the journal is open, the OPEN access right is set to all instances and color is GOLD (overwrite if the color was already set in one of the previous steps) 2021-07-16 10:29:51 +02:00
Claudio Atzori bf9e0d2d4f Merge pull request 'orcid-no-doi' (#123) from enrico.ottonello/dnet-hadoop:orcid-no-doi into beta
Reviewed-on: D-Net/dnet-hadoop#123
2021-07-15 17:59:41 +02:00
Miriam Baglioni 4da46bb62f mergin with branch beta 2021-07-14 15:08:52 +02:00
Miriam Baglioni 09ad7b2a9e DoiBoost AccessRigh #4362 - Unpaywall mapped to OAF with OPEN instance (non oa are filtered out) (unknown hostedby) + map the color as it is 2021-07-14 14:45:21 +02:00
Miriam Baglioni f4f7c6f9d3 DoiBoost AccessRigh #4362 - Unpaywall mapped to OAF with OPEN instance (non oa are filtered out) (unknown hostedby) + map the color as it is 2021-07-14 14:44:54 +02:00
Miriam Baglioni 6222adf176 DoiBoost AccessRigh #4362 - added resources and test for crossref mapping (licence part included) 2021-07-14 14:42:34 +02:00
Miriam Baglioni 981b1018f6 DoiBoost AccessRigh #4362 - decide access right according to licence. Default access right is Unknown 2021-07-14 14:42:06 +02:00
Sandro La Bruzzo 3d8e2aa146 Code refactor:
- removed old workflows in doiboost
 - splitted workflow of doiboost in preprocess and process
2021-07-14 14:37:06 +02:00
Miriam Baglioni 441701c85c DoiBoost AccessRigh #4362 - If multiple licenses are available, take the one applied to 'vor' 2021-07-14 14:14:50 +02:00
Sandro La Bruzzo c35c117601 fixed process doiboost workflow:
- splitted OrcidToOAF into two phase preprocess and process
- updated workflow used in production
2021-07-14 12:48:01 +02:00
Sandro La Bruzzo bbe8193930 merged stable ids 2021-07-12 17:00:43 +02:00
Miriam Baglioni 7177c25261 added check for null value during doi normalization 2021-07-05 16:22:38 +02:00
Miriam Baglioni 0892cad4e8 the normalization of the content of value was not visible outside the block. Moved doi normalization operation while returning value 2021-07-05 16:21:42 +02:00
Sandro La Bruzzo c6fa8598e1 massive code refactor:
removed modules dhp-*-scholexplorer
2021-07-01 22:13:45 +02:00
Miriam Baglioni bc34347643 added assertions to verify doi normalization 2021-06-30 14:37:08 +02:00
Miriam Baglioni 86f47afcc7 slight modification of the resource to accomodate also doi normalization tests 2021-06-30 14:36:49 +02:00
Miriam Baglioni 03767ea8e6 slight modification of the resource to accomodate also doi normalization tests 2021-06-30 13:21:24 +02:00
Miriam Baglioni f8eec0ca9a added resource to test the normalization of doi during the import of MAG 2021-06-30 13:19:54 +02:00
Miriam Baglioni 149f85ddf5 added tests for the normalization of the dois 2021-06-30 13:00:52 +02:00
Miriam Baglioni e487b5544c added tests for the normalization of the dois 2021-06-30 12:57:11 +02:00
Miriam Baglioni 1503ccbbb5 added tests for the normalization of the dois 2021-06-30 12:55:37 +02:00
Miriam Baglioni 1299bfb357 Added class to test the normalization of doi 2021-06-30 12:53:27 +02:00
Miriam Baglioni cf758f4f91 added normalization step for the doi 2021-06-30 10:03:15 +02:00
Miriam Baglioni 801763a0fa there is no more the need to lower case the doi since it is done in the first step. Also changed the creation of the id by using the factory 2021-06-29 19:07:23 +02:00
Miriam Baglioni a74de1cda2 added normalization step to the doi 2021-06-29 18:51:11 +02:00
Miriam Baglioni 06074ea7d3 added normalization step to the doi 2021-06-29 18:46:08 +02:00
Miriam Baglioni 8b8ffe82dc added step of normalization for the doi 2021-06-29 18:41:39 +02:00
Miriam Baglioni 50cc21d92e Added method to normalize doi values (lower case, remove all preceeding 10., filtering out doi not starting with 10.) 2021-06-29 18:35:28 +02:00
Sandro La Bruzzo 80e15cc455 implemented mapping from uniprot, pdb and ebi links 2021-06-24 17:20:00 +02:00
Sandro La Bruzzo a167543637 Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_id_scholexplorer 2021-06-21 09:14:11 +02:00
Miriam Baglioni 13c96622c9 - 2021-06-18 09:45:16 +02:00
Miriam Baglioni b486ae498f added test and test resource to verify the generation of the date of acceptance from the input extracted from the dump 2021-06-18 09:43:32 +02:00
Miriam Baglioni 464c2ddde3 changed to split in two steps the generation of the crossref dataset 2021-06-18 09:42:31 +02:00
Miriam Baglioni 6aca0d8ebb added kryo encoding for input files 2021-06-18 09:42:07 +02:00
Miriam Baglioni 3585e53da3 changed to split in two steps the generation of the crossref dataset 2021-06-18 09:41:23 +02:00
Sandro La Bruzzo 3100166d29 Merge remote-tracking branch 'origin/stable_ids' into stable_id_scholexplorer 2021-06-16 16:22:16 +02:00
Miriam Baglioni 95885bcf12 forces executor Executor memory and driver executor memory to be 7G (trying to avoid OOM) 2021-06-16 10:17:52 +02:00
Miriam Baglioni 2550a73981 - 2021-06-16 10:04:41 +02:00
Miriam Baglioni 1c47c0d786 modified the number of executors trying to avoid OOM exception 2021-06-15 21:05:39 +02:00
Miriam Baglioni 7deac55138 added one option for resume from in the wf 2021-06-15 18:38:20 +02:00
Miriam Baglioni 66e7ef892f changed the parameter name 2021-06-15 11:08:54 +02:00
Miriam Baglioni 4f47ad0891 no need to rename the folders, just write in overwrite mode, so I changed the name of the output folder 2021-06-15 09:28:31 +02:00