Commit Graph

2722 Commits

Author SHA1 Message Date
Miriam Baglioni ab8abd61bb GetCSV refactoring - refactoring due to movement of classes 2021-08-12 18:11:07 +02:00
Miriam Baglioni 335a824e34 GetCSV refactoring - fixed issue 2021-08-12 18:10:10 +02:00
Miriam Baglioni f0845e9865 GetCSV refactoring - refactoring due to movement of classes 2021-08-12 18:04:58 +02:00
Miriam Baglioni 7a789423aa GetCSV refactoring - refactoring due to movement of classes 2021-08-12 18:04:27 +02:00
Miriam Baglioni e9fc3ef3bc GetCSV refactoring - changed to use the new class to get and write the csv file 2021-08-12 18:03:41 +02:00
Miriam Baglioni 4317211a2b GetCSV refactoring - refactoring due to movement 2021-08-12 18:03:14 +02:00
Miriam Baglioni b62cd656a7 GetCSV refactoring - changed the model to store only the information needed 2021-08-12 18:01:10 +02:00
Miriam Baglioni d36e925277 GetCSV refactoring - moved under model package 2021-08-12 18:00:21 +02:00
Miriam Baglioni 6e84b3951f GetCSV refactoring - moving classes to dhp-common that have dependency with GetCSV class (that was located in graph-mapper) 2021-08-12 17:57:41 +02:00
Claudio Atzori 9587d4aee8 Merge branch 'beta' into hostedbymap 2021-08-12 17:04:30 +02:00
Claudio Atzori 86d940044c added test to verify bad records from FWF-E-Book-Library 2021-08-12 11:32:56 +02:00
Claudio Atzori 8cdce59e0e [graph raw] let the mapping exceptions propagate 2021-08-12 11:32:26 +02:00
Miriam Baglioni 08dd2b2102 moving the dependency version to the external pom file 2021-08-11 18:09:41 +02:00
Miriam Baglioni ac417ca798 removed not needed test resource 2021-08-11 17:50:33 +02:00
Miriam Baglioni e33daaeee8 reverting 2021-08-11 17:46:19 +02:00
Miriam Baglioni 785db1d5b2 refactoring 2021-08-11 17:44:07 +02:00
Miriam Baglioni 95e5482bbb removing not needed dependency 2021-08-11 17:42:26 +02:00
Miriam Baglioni b966329833 reverting 2021-08-11 17:37:00 +02:00
Miriam Baglioni 8ad7c71417 reverting 2021-08-11 17:36:12 +02:00
Miriam Baglioni 0e1a6bec20 reverting 2021-08-11 17:32:29 +02:00
Miriam Baglioni c6a2a780a9 reverting 2021-08-11 17:30:17 +02:00
Miriam Baglioni b6b58bba28 reverting 2021-08-11 17:25:37 +02:00
Miriam Baglioni 804589eb30 reverting 2021-08-11 17:23:35 +02:00
Miriam Baglioni d688749ad9 reverting 2021-08-11 17:22:28 +02:00
Miriam Baglioni 524c06e028 reverting 2021-08-11 17:20:30 +02:00
Miriam Baglioni 7aa3260729 reverting 2021-08-11 17:18:45 +02:00
Miriam Baglioni 55fc500d8d reverting 2021-08-11 17:17:48 +02:00
Miriam Baglioni 8229632839 adding assertions to the mapping of the unibi part of gold list 2021-08-11 16:36:01 +02:00
Miriam Baglioni b1c6140ebf removed all comments in Italian 2021-08-11 16:23:33 +02:00
Miriam Baglioni 52c18c2697 removed not needed test class. Teh functionality has been moved 2021-08-11 16:16:55 +02:00
Miriam Baglioni 8da3a25cf6 merging with branch beta 2021-08-11 15:55:34 +02:00
Claudio Atzori 9f4db73f30 updated/fixed unit tests 2021-08-11 15:02:51 +02:00
Claudio Atzori 61d811ba53 suggestions from intellij 2021-08-11 12:18:20 +02:00
Claudio Atzori 2ee21da43b suggestions from SonarLint 2021-08-11 12:13:22 +02:00
Miriam Baglioni b954fe9ba8 mergin with branch beta 2021-08-11 10:12:46 +02:00
Miriam Baglioni b688567db5 hostedbymap - modified part of test to check the bestaccessright changed 2021-08-11 10:12:10 +02:00
Miriam Baglioni 9731a6144a hostedbymap - in case the journal is open access the access may be changed also for the best access right in the result 2021-08-10 17:49:45 +02:00
Miriam Baglioni 2efa5abda5 refactoring 2021-08-09 12:28:36 +02:00
Claudio Atzori 577f3b1ac8 added dnet workflows responsible for the graph construction, enrichment, provision 2021-08-09 11:53:58 +02:00
Miriam Baglioni da20fceaf7 removed all the part related to the crossref dump download since it is done in a separate workflow 2021-08-09 11:53:45 +02:00
Claudio Atzori 964f97ed4d cleanup 2021-08-09 11:53:06 +02:00
Miriam Baglioni 54a6cbb244 CrossrefDump - put token among the parameters 2021-08-09 11:41:10 +02:00
Miriam Baglioni b7079804cb CrossrefDump - put token among the parameters 2021-08-09 11:34:35 +02:00
Miriam Baglioni a5f82f442b Merge branch 'beta' into doiboost_wf 2021-08-09 11:17:51 +02:00
Miriam Baglioni b6dcf89d22 mergin with branch beta 2021-08-09 11:14:43 +02:00
Claudio Atzori 66f398fe6f Merge pull request '[stats] fixed a typo' (#133) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #133
2021-08-06 14:29:57 +02:00
Miriam Baglioni bd096f5170 removed not needed param file 2021-08-05 10:55:43 +02:00
Miriam Baglioni 5faeefbda8 added script to download the dump,changed the workflow input paramenters 2021-08-05 10:54:03 +02:00
Miriam Baglioni 1965e4eece new workflow for downloading the dump of crossref and unpack it 2021-08-04 18:29:03 +02:00
Claudio Atzori 83c04e5d28 mapping test for dataset records adapted to reflect the delegated pid authority (zenodo) 2021-08-04 10:37:57 +02:00
Miriam Baglioni b4eb026c8b mergin with branch beta 2021-08-04 10:21:37 +02:00
Miriam Baglioni c7b71647c6 Hosted By Map - modification of the resource for testing the presence of only one entry per datasource id 2021-08-04 10:20:02 +02:00
Miriam Baglioni eb8c3f8594 Hosted By Map - test modified because of the application of the new aggregator on datasources 2021-08-04 10:19:17 +02:00
Miriam Baglioni e94ae0b1de Hosted By Map - extention of the workflow to consider also the application of the map to publications and datasources 2021-08-04 10:18:11 +02:00
Miriam Baglioni 67ba4c40e0 Hosted By Map - added parameter resources 2021-08-04 10:17:28 +02:00
Miriam Baglioni eccf3851b0 Hosted By Map - refactoring 2021-08-04 10:16:30 +02:00
Miriam Baglioni 1e952cccf6 Hosted By Map - refactoring and deletion of not needed methods 2021-08-04 10:15:43 +02:00
Miriam Baglioni 8ba8c77f92 Hosted By Map - refactoring 2021-08-04 10:14:57 +02:00
Miriam Baglioni 8f7623e77a Hosted By Map - refactoring and application of the new aggregator 2021-08-04 10:14:20 +02:00
Sandro La Bruzzo 3fc820203b fixed wrong test file 2021-08-04 10:13:59 +02:00
Miriam Baglioni a7bf314fd2 Hosted By Map - added new aggregator to get just one result per datasource id 2021-08-04 10:13:30 +02:00
Miriam Baglioni 9831725073 Hosted By Map - remove from workflow a step not needed. The hbm will be take care also of the integration of the unibi list of gold openaccess journals 2021-08-03 11:02:17 +02:00
Miriam Baglioni 100e54e6c8 mergin with branch beta 2021-08-03 10:47:11 +02:00
Miriam Baglioni 461b8a29a0 removed not needed class 2021-08-03 10:46:51 +02:00
Miriam Baglioni 327cddde33 Hosted By Map - refactoring 2021-08-03 10:44:13 +02:00
Miriam Baglioni 17292c6641 Hosted By Map - resources for testing purposes 2021-08-02 19:37:08 +02:00
Miriam Baglioni ee7ccb98dc Hosted By Map - test class to verify the application of the hbm to results and datasource 2021-08-02 19:36:18 +02:00
Miriam Baglioni 90e91486e2 Hosted By Map - test class to verify each step in the preparation process 2021-08-02 19:35:52 +02:00
Miriam Baglioni 1e859706a3 Hosted By Map - Classes to apply the HBM to results and datasources 2021-08-02 19:35:23 +02:00
Miriam Baglioni 72df8f9232 Hosted By Map - removed the aggregator for the datasource (it is no more needed) and added a new aggregator for the results. Changed also the hostedBYMap aggregator 2021-08-02 19:34:44 +02:00
Miriam Baglioni ff1ce75e33 Hosted By Map - modification in the code to prepare the info needed to apply the HostedByMap. There is no need to join datasources with the hbm: all the information needed is in the hosted by map already 2021-08-02 19:32:59 +02:00
Claudio Atzori e826aae848 using constants from ModelConstants 2021-08-02 14:28:59 +02:00
Antonis Lempesis 117c3d5c67 fixed a typo 2021-08-02 12:15:58 +03:00
Miriam Baglioni 1695d45bd4 Hosted By Map - Test class to verify the preparation of the intermediate information 2021-07-30 17:57:01 +02:00
Miriam Baglioni 7c6ea2f4c7 Hosted By Map - first attempt for the creation of intermedia information to be used to applu the hosted by map on the graph entities 2021-07-30 17:56:27 +02:00
Miriam Baglioni d8b9b0553b Hosted By Map - model classes to store the intermediate information to be used to apply the hosted by map 2021-07-30 17:55:39 +02:00
Miriam Baglioni 613bd3bde0 Hosted By Map - refactor of the first attemp to prepare a new hosted by map dependent on the datasource in the graph and on two external sources: the gold list from unibi ad the doaj list of open access journal. Both the lists are downloaded from provided url parameter 2021-07-30 17:54:45 +02:00
Miriam Baglioni d1807781c0 mergin with branch beta 2021-07-30 14:34:07 +02:00
Miriam Baglioni 1d6ac3715b merge branch with beta 2021-07-30 11:58:29 +02:00
Claudio Atzori 19620eed46 applying PR#131, Patch the identifiers (source/target) in the relations, refinements 2021-07-30 11:09:32 +02:00
Claudio Atzori 55e6470f44 Merge pull request 'added the sprint 2 indicators in monitor db' (#129) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #129
2021-07-30 10:11:46 +02:00
Sandro La Bruzzo 6358f92c3a added sleep to solve problem of lost request of creating index 2021-07-30 08:54:37 +02:00
Antonis Lempesis 26af0320d0 added the sprint 2 indicators in monitor db 2021-07-30 00:31:33 +03:00
Claudio Atzori 7b172e7cd9 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2021-07-29 13:57:06 +02:00
Claudio Atzori c53d106e80 [provision] lowercase relation filter 2021-07-29 13:57:00 +02:00
Sandro La Bruzzo b1b0cc3f15 fixed wrong package name 2021-07-29 13:55:08 +02:00
Miriam Baglioni baad01cadc hostedbymap 2021-07-29 13:04:39 +02:00
Claudio Atzori 5d08ad86ae [raw_all] patching relation identifier phase to be run at the end, i.e. includes also claimed relations 2021-07-29 13:03:16 +02:00
Claudio Atzori e87e1805c4 [raw_all] added extra workflow step for patching the identifiers in the relations, given an id mapping dataset 2021-07-29 12:13:06 +02:00
Claudio Atzori dc55ed4acd Merge pull request '[beta] stats update workflow' (#128) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #128
2021-07-29 11:13:21 +02:00
Claudio Atzori 908f57a475 code formatting 2021-07-29 10:49:39 +02:00
Sandro La Bruzzo 3721df7aa6 refactoring create actionset of scholexplorer, moved on package dhp-aggregation 2021-07-29 10:45:35 +02:00
Antonis Lempesis 4afa5215a9 fixed a NPE? 2021-07-28 21:59:12 +03:00
Antonis Lempesis 3d1580fa9b fixed a typo 2021-07-28 18:50:31 +03:00
Claudio Atzori 4c5a71ba2f [broker] updated relation descriptors, making use of constant values 2021-07-28 17:11:18 +02:00
Claudio Atzori e1797c0a42 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2021-07-28 16:21:36 +02:00
Claudio Atzori 6dddad86ee [cleaning] title cleaning based on the me.xuender:unidecode library 2021-07-28 16:21:29 +02:00
Sandro La Bruzzo 3d8f0f629b implemented workflow of creation action set for scholexplorer 2021-07-28 16:15:34 +02:00
Antonis Lempesis 9b181ffa73 added the h2020 classification scheme for projects 2021-07-28 16:31:29 +03:00
Alessia Bardi df8715a1ec format code after mvn compile 2021-07-28 11:58:26 +02:00
Michele Artini 3e2a2d6e71 added new fields in xml 2021-07-28 11:56:55 +02:00
Alessia Bardi c806387d4b tests for enermaps 2021-07-28 11:54:36 +02:00
Claudio Atzori 2fff24df55 code formatting 2021-07-28 11:34:19 +02:00
Michele Artini 9f1c7b8e17 tests 2021-07-28 11:32:34 +02:00
Antonis Lempesis 4a9741825d added result_orcid, result_project provenance, issn in datasources 2021-07-28 12:28:04 +03:00
Miriam Baglioni 3d2bba3d5d removing not needed classes 2021-07-28 11:25:43 +02:00
Miriam Baglioni cc0d3d8a7b mergin with branch beta 2021-07-28 11:24:46 +02:00
Michele Artini e6f1773d63 mapping of new eosc fields 2021-07-28 11:17:11 +02:00
Miriam Baglioni 80d5b3b4de DoiBoost AccessRigh #4362 - removing commented code 2021-07-28 11:16:49 +02:00
Miriam Baglioni 5fe016dcbc DoiBoost AccessRigh #4362 - related to https://code-repo.d4science.org/D-Net/dnet-hadoop/pulls/126/files#issuecomment-4194 2021-07-28 11:14:28 +02:00
Miriam Baglioni 73ed7374a9 mergin with branch beta 2021-07-28 11:05:16 +02:00
Miriam Baglioni 43e62fcae9 DoiBoost AccessRigh #4362 - related to https://code-repo.d4science.org/D-Net/dnet-hadoop/pulls/126/files#issuecomment-4193 2021-07-28 11:04:55 +02:00
Michele Artini c72c960ffb added eosc fields 2021-07-28 11:03:15 +02:00
Michele Artini 1fb572a33a added eosc fields 2021-07-28 10:52:24 +02:00
Miriam Baglioni 708d0ade34 Merge branch 'beta' into hostedbymap 2021-07-28 10:37:22 +02:00
Sandro La Bruzzo 16c91203bd implemented workflow of creation action set for scholexplorer 2021-07-28 10:30:49 +02:00
Miriam Baglioni 6c936943aa mergin with branch beta 2021-07-28 10:24:48 +02:00
Miriam Baglioni 0424f47494 HostedByMap fixing issues 2021-07-28 10:24:13 +02:00
Michele Artini 52e2315ba2 removed trick for datasourcetypeui 2021-07-28 10:23:00 +02:00
Sandro La Bruzzo 825d9f0289 fixed datacite workflow starting from Importing delta 2021-07-27 16:09:46 +02:00
Claudio Atzori 5aa7d16d1b updated assertions in eu.dnetlib.dhp.oa.graph.raw.MappersTest 2021-07-27 15:11:58 +02:00
Antonis Lempesis 1a28a69cac changed the citeee in *_citations to cites 2021-07-27 15:14:09 +03:00
Miriam Baglioni 74f801b689 mergin with branch beta 2021-07-27 13:18:31 +02:00
Miriam Baglioni eb07f7f40f Hosted By Map 2021-07-27 12:27:26 +02:00
Antonis Lempesis ed185fd7ed added missing colons 2021-07-27 11:42:47 +03:00
Antonis Lempesis f3b9570354 properly invalidating metadata 2021-07-26 13:00:16 +03:00
Sandro La Bruzzo 848aabbb6c minor fix 2021-07-25 12:06:41 +02:00
Sandro La Bruzzo 8fac10c91e fixed defintion wf of creation final infospace of scholexplorer 2021-07-25 11:15:37 +02:00
Sandro La Bruzzo 3920c69bc8 change implementation of resolve Relation to generate jsonRdd in output 2021-07-25 09:51:36 +02:00
Antonis Lempesis f9fbb0f261 added indicators second sprint 2021-07-24 16:40:28 +03:00
Claudio Atzori a0393607a7 mapping funding relations from Datacite should be done according to the actual result identifier 2021-07-23 18:15:08 +02:00
Sandro La Bruzzo d9e3b89937 implemented last part of workflows to generate scholixGraph 2021-07-23 16:38:32 +02:00
Sandro La Bruzzo cfde63a7c3 fixed resolve relation join 2021-07-23 14:17:29 +02:00
Sandro La Bruzzo 4a439c3863 NPE fixed 2021-07-23 14:17:29 +02:00
Sandro La Bruzzo ca74e8dd02 create a separate wf for resolving relation 2021-07-23 11:40:06 +02:00
Sandro La Bruzzo 43e9380cd3 update resolve relation to use the same format of openaire graph 2021-07-23 11:25:18 +02:00
Sandro La Bruzzo 058b636d4d added control to check if the entity exists 2021-07-22 16:08:54 +02:00
Sandro La Bruzzo 62ae36a3d2 fixed NPE 2021-07-22 15:41:38 +02:00
Miriam Baglioni 63553a76b3 added code to download gold issn list from unibi 2021-07-22 12:01:48 +02:00
Miriam Baglioni 1a5b114906 DoiBoost AccessRigh #4362 - refactoring 2021-07-22 12:00:23 +02:00
Sandro La Bruzzo 31d2d6d41e Scholexplorer: introduction of dedup openaire 2021-07-21 18:09:32 +02:00
Miriam Baglioni b226ba4439 mergin with branch beta 2021-07-21 09:46:40 +02:00
Claudio Atzori 10d7b4f0b4 filtering 'old' OpenAIRE ids from the entity.originalId[] array in the OAF -> XML searialization procedure 2021-07-20 11:52:05 +02:00
Miriam Baglioni 83fe31c92e changed the name of the workflows 2021-07-19 18:19:14 +02:00
Miriam Baglioni dd81c36b60 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2021-07-19 18:18:14 +02:00
Miriam Baglioni 54acc5373b changed the name of the workflows 2021-07-19 18:18:09 +02:00
Miriam Baglioni b420b11ed3 duplicate the number of partitions in ProcessMag 2021-07-19 18:16:23 +02:00
Claudio Atzori 65934888a1 adding record identifier among the originalIds regardless of what IdentifierFactory produces 2021-07-19 17:52:52 +02:00
Claudio Atzori 0977baf41d contents mapped from the stores with 'claim' interpretation will not change their identifier along their way towards the graph 2021-07-19 17:43:52 +02:00
Miriam Baglioni 662c396354 duplicate the number of partitions in ConvertCrossrefToOaf 2021-07-19 12:41:14 +02:00