Commit Graph

423 Commits

Author SHA1 Message Date
Miriam Baglioni 6c65c93c0e refactoring 2020-08-07 17:45:35 +02:00
Miriam Baglioni 68adf86fe4 refactoring 2020-08-07 17:43:20 +02:00
Miriam Baglioni 26d2ad6ebb refactoring 2020-08-07 17:41:56 +02:00
Miriam Baglioni 9675af7965 refactoring 2020-08-07 17:41:07 +02:00
Miriam Baglioni 346a91f4d9 Added constants 2020-08-07 17:35:39 +02:00
Miriam Baglioni d52b0e1797 no use of IsLookUp. The query is done once and its result stored on HDFS. The path to the result is given instead of the isLookUpUrl 2020-08-07 17:34:40 +02:00
Miriam Baglioni ae1b7fbfdb changed method signature from set of mapkey entries to String representing path on file system where to find the map 2020-08-07 17:32:27 +02:00
Miriam Baglioni 931fa2ff00 removed dependencies 2020-08-07 16:46:37 +02:00
Miriam Baglioni 545ea9f77e moved in common. Zenodo response model and APIClient to deposit in Zenodo 2020-08-07 16:44:51 +02:00
Miriam Baglioni da9b012c15 fixed description 2020-08-06 11:55:44 +02:00
Miriam Baglioni 6dbadcf181 the new schema for the dumped result 2020-08-06 11:05:56 +02:00
Miriam Baglioni adf0ca5aa7 test to send is from hdfs 2020-08-05 14:24:43 +02:00
Miriam Baglioni 14eda4f46e added method to try to put inputstream to zenodo 2020-08-05 14:18:25 +02:00
Miriam Baglioni e737a47270 added classes to try to send input stream to zenodo for the upload 2020-08-05 14:17:40 +02:00
Miriam Baglioni 873e9cd50c changed hadoop setting to connect to s3 2020-08-04 15:37:25 +02:00
Miriam Baglioni 5b651abf82 merge branch with master 2020-08-04 10:14:07 +02:00
Miriam Baglioni 901ae37f7b added step to workflow 2020-08-03 18:12:54 +02:00
Miriam Baglioni fa38cdb10b added resource 2020-08-03 18:11:12 +02:00
Miriam Baglioni e9fcc0b2f1 commented test unit - to decide change for mirroring the changed logics 2020-08-03 18:10:53 +02:00
Miriam Baglioni e43aeb139a added new property file and changed some parameter to old files 2020-08-03 18:07:28 +02:00
Miriam Baglioni aa9f3d9698 changed logic for save in s3 directly 2020-08-03 18:06:18 +02:00
Miriam Baglioni d465f0eec9 added fulltext to result 2020-08-03 18:03:27 +02:00
Miriam Baglioni ec4b392d12 added new dependencies for writing on s3 2020-08-03 17:57:04 +02:00
Miriam Baglioni c892c7dfa7 changed to query for community map just once and save the result for remaining executions 2020-08-03 17:56:31 +02:00
Alessia Bardi 8cc067fe76 specific test for claims 2020-08-03 11:17:50 +02:00
Michele Artini 652b13abb6 Merge branch 'master' into nsprefix_blacklist 2020-07-31 07:58:37 +02:00
Claudio Atzori cd631bb5bc defaults fixed in the cleaning workflow forces result.publisher to NULL when result.publisher.value is empty 2020-07-30 17:03:53 +02:00
Miriam Baglioni 872d7783fc - 2020-07-30 16:50:36 +02:00
Miriam Baglioni 57c87b7653 re-implemented to fix issue on not serializable Set<String> variable 2020-07-30 16:43:43 +02:00
Miriam Baglioni ef8e5957b5 added specific directory where to save results 2020-07-30 16:42:46 +02:00
Miriam Baglioni 75f3361c85 - 2020-07-30 16:41:31 +02:00
Miriam Baglioni 3f695b25fa refactoring 2020-07-30 16:40:15 +02:00
Miriam Baglioni e623f12bef refactoring 2020-07-30 16:32:59 +02:00
Miriam Baglioni ff7d05abb4 added support class to store the couple organizationId representativeId got from sql query on hive 2020-07-30 16:32:04 +02:00
Miriam Baglioni cf6d80b2ab added command to close the writer 2020-07-30 16:31:22 +02:00
Miriam Baglioni f985bca37b added USER_CLAIM constant value 2020-07-30 16:25:26 +02:00
Claudio Atzori 4bbfcf1ac6 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-07-30 16:25:06 +02:00
Claudio Atzori 4ff8007518 added function to set the missing vocabulary names, used in the cleaning workflow as a pre-cleaning step 2020-07-30 16:24:39 +02:00
Miriam Baglioni 6f1c40a933 - 2020-07-30 16:24:28 +02:00
Miriam Baglioni 2b66a93f9e added property file that was missing 2020-07-30 16:24:17 +02:00
Michele Artini bdece15ca0 blacklist of nsprefix 2020-07-30 16:13:38 +02:00
Sandro La Bruzzo c97c8f0c44 implemented new oozie job to extract entities in a separate dataset 2020-07-30 12:13:58 +02:00
Sandro La Bruzzo 3010a362bc updated changing in the workflow of provision in the phase of aggregation. Removed serialization in JSON RDD and used spark Dataset 2020-07-30 09:25:56 +02:00
Sandro La Bruzzo 487226f669 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-07-30 09:25:39 +02:00
Sandro La Bruzzo 16ae3c9ccf updated changing in the workflow of provision in the phase of aggregation. Removed serialization in JSON RDD and used spark Dataset 2020-07-30 09:25:32 +02:00
Miriam Baglioni ee8420c6b3 added resource for datasource test 2020-07-29 18:28:43 +02:00
Miriam Baglioni 76bcab98ce added code to filter out null originalId from the dump 2020-07-29 18:28:21 +02:00
Miriam Baglioni ef1d8aef17 added one test to verify the dump for the datasources 2020-07-29 18:27:46 +02:00
Miriam Baglioni 86bab79512 - 2020-07-29 18:20:22 +02:00
Miriam Baglioni 31791dcf3d fixed wrong property file path name 2020-07-29 18:20:08 +02:00