Commit Graph

1473 Commits

Author SHA1 Message Date
Miriam Baglioni a16bbf3202 changed test resource to mirror the change in the XQuery that produces the data to be parsed. The main Zenodo community is no longer provided in a separate element; it is now part of <zenodocommunities> 2020-08-11 17:48:44 +02:00
Miriam Baglioni 25f4fbceea draft of test and resources 2020-08-11 17:37:22 +02:00
Miriam Baglioni 30a2b19b65 changed metadata for deposition of covid-19 dump in Zenodo 2020-08-11 17:36:56 +02:00
Miriam Baglioni 49788b532a changed to mirror changes in the schema 2020-08-11 16:05:03 +02:00
Miriam Baglioni b08511287b - 2020-08-11 16:01:36 +02:00
Miriam Baglioni 7e81a17068 changed the XQUERY to mirror the change in the code 2020-08-11 16:00:33 +02:00
Miriam Baglioni 37ad2f28e9 removed added | in prefix for datasource 2020-08-11 15:55:06 +02:00
Miriam Baglioni f31c2e9461 enabled test 2020-08-11 15:49:25 +02:00
Miriam Baglioni 2d67476417 merge branch with master 2020-08-11 15:46:04 +02:00
Miriam Baglioni 77a390878c merge upstream 2020-08-11 15:45:48 +02:00
Miriam Baglioni 6d3804e24c - 2020-08-11 15:45:12 +02:00
Miriam Baglioni 0603ec4757 changed test to upload the dump for covid-19 community 2020-08-11 15:43:25 +02:00
Miriam Baglioni 7dfd56df9d - 2020-08-11 15:42:35 +02:00
Miriam Baglioni a169d7e7c1 added test file for the MakeTar class 2020-08-11 15:40:41 +02:00
Miriam Baglioni acb0926b2e json schemas for the dumped entities and relation 2020-08-11 15:39:48 +02:00
Miriam Baglioni ff52c51f92 added the communityMapPath parameter and removed the isLookUpUrl parameter 2020-08-11 15:39:22 +02:00
Miriam Baglioni 6f43acda5e added the maketar and send to zenodo step. Adjusted wf parameters 2020-08-11 15:38:20 +02:00
Miriam Baglioni ddc19de2e9 removed the isLookUpUrl among the parameters 2020-08-11 15:37:47 +02:00
Miriam Baglioni 592a8ea573 added parameter file for maketar class 2020-08-11 15:37:14 +02:00
Miriam Baglioni 77a0951b32 added the make archive step in the workflow 2020-08-11 15:32:32 +02:00
Miriam Baglioni cf4d918787 added description, changed parameter name and added method 2020-08-11 15:27:31 +02:00
Miriam Baglioni dc5fc5366d Creation of an archive for each related dump part 2020-08-11 15:26:06 +02:00
Miriam Baglioni 0ce49049d6 added description 2020-08-11 15:25:11 +02:00
Miriam Baglioni 9bae991167 added description of the class 2020-08-11 11:20:43 +02:00
Miriam Baglioni 341dc59ead removed the repartition(1). Added code for the creation of an archive containing all the parts dumped for each community 2020-08-11 11:18:58 +02:00
Sandro La Bruzzo fe8d640aee fixed error on oozie workflow 2020-08-11 09:43:03 +02:00
Sandro La Bruzzo 304590e854 updated workflow of indexing to start from begin 2020-08-11 09:17:47 +02:00
Sandro La Bruzzo eaf0dc68a2 fixed indexing 2020-08-11 09:17:03 +02:00
Miriam Baglioni 1991a49f70 removed reference to isLookUp to get the communityMap 2020-08-10 18:02:56 +02:00
Miriam Baglioni c378c38546 disabled test. The testing functionalities for the upload to Zenodo are moved to common 2020-08-10 12:41:11 +02:00
Miriam Baglioni 63ad0ed209 changed to use communityMapPath instead of IsLookUp 2020-08-10 12:40:19 +02:00
Miriam Baglioni cec795f2ea changed resources to mirror changes in the model 2020-08-10 12:39:35 +02:00
Miriam Baglioni f50e3e7333 changed the class for which to generate the schema 2020-08-10 12:03:49 +02:00
Miriam Baglioni b8c26f656c test using communityMapPath instead of isLookUp 2020-08-10 12:02:55 +02:00
Miriam Baglioni fe88904df0 changed the wf definition 2020-08-10 12:01:14 +02:00
Miriam Baglioni 87856467e2 removed isLookUpUrl and added code to read from HDFS the communitymap 2020-08-10 11:38:41 +02:00
Miriam Baglioni 1cf7043e26 removed isLookUpUrl from the parameters 2020-08-10 11:38:03 +02:00
Claudio Atzori cf6b68ce5a Merge pull request 'data provision workflow: add nodes to perform DELETE BY QUERY before the indexing begins and COMMIT after the indexing is completed' (#36) from provision_indexing into master 2020-08-10 11:16:29 +02:00
Sandro La Bruzzo 0ade33ad15 updated mergeFrom function for DLI Unknown 2020-08-10 10:18:35 +02:00
Miriam Baglioni 46986aae2d added the new parameter for newdeposion/newversion and concept_record_id 2020-08-07 18:00:06 +02:00
Miriam Baglioni 3aedfdf0d6 added option to do a new deposition or new version of an old deposition 2020-08-07 17:49:14 +02:00
Miriam Baglioni 1b3ad1bce6 filter out authors pid (only orcid). Added check to get unique provenance for context id. filter out countries with code UNKNOWN 2020-08-07 17:48:18 +02:00
Miriam Baglioni 5ceb8c5f0a moved constants from graph.Constants 2020-08-07 17:46:47 +02:00
Miriam Baglioni 6c65c93c0e refactoring 2020-08-07 17:45:35 +02:00
Miriam Baglioni 68adf86fe4 refactoring 2020-08-07 17:43:20 +02:00
Miriam Baglioni 26d2ad6ebb refactoring 2020-08-07 17:41:56 +02:00
Miriam Baglioni 9675af7965 refactoring 2020-08-07 17:41:07 +02:00
Miriam Baglioni 346a91f4d9 Added constants 2020-08-07 17:35:39 +02:00
Miriam Baglioni d52b0e1797 no use of IsLookUp. The query is done once and its result stored on HDFS. The path to the result is given instead of the isLookUpUrl 2020-08-07 17:34:40 +02:00
Miriam Baglioni ae1b7fbfdb changed method signature from set of mapkey entries to String representing path on file system where to find the map 2020-08-07 17:32:27 +02:00
Miriam Baglioni 931fa2ff00 removed dependencies 2020-08-07 16:46:37 +02:00
Miriam Baglioni 545ea9f77e moved in common. Zenodo response model and APIClient to deposit in Zenodo 2020-08-07 16:44:51 +02:00
Sandro La Bruzzo ddb1446ceb fixed test 2020-08-07 11:34:33 +02:00
Sandro La Bruzzo 718bc7bbc8 implemented provision workflows using the new implementation with Dataset 2020-08-07 11:05:18 +02:00
Miriam Baglioni da9b012c15 fixed description 2020-08-06 11:55:44 +02:00
Miriam Baglioni 6dbadcf181 the new schema for the dumped result 2020-08-06 11:05:56 +02:00
Sandro La Bruzzo a44e5abaa7 reformat code 2020-08-06 10:30:22 +02:00
Sandro La Bruzzo 4fb1821fab Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-08-06 10:28:31 +02:00
Sandro La Bruzzo 9d9e9edbd2 improved extractEntity Relation workflows using dataset 2020-08-06 10:28:24 +02:00
Miriam Baglioni adf0ca5aa7 test to send is from hdfs 2020-08-05 14:24:43 +02:00
Miriam Baglioni 14eda4f46e added method to try to put inputstream to zenodo 2020-08-05 14:18:25 +02:00
Miriam Baglioni e737a47270 added classes to try to send input stream to zenodo for the upload 2020-08-05 14:17:40 +02:00
Miriam Baglioni 873e9cd50c changed hadoop setting to connect to s3 2020-08-04 15:37:25 +02:00
Alessia Bardi a29565ff57 code formatting 2020-08-04 12:55:27 +02:00
Alessia Bardi 01db29e208 fixes redmine issue #5846: datacite and its different namespace declarations 2020-08-04 12:53:48 +02:00
Alessia Bardi b4e4e5f858 do not duplicate result PIDs 2020-08-04 12:52:14 +02:00
Alessia Bardi 09a323d18d testing a dataset from Nakala 2020-08-04 12:50:52 +02:00
Alessia Bardi c35bf486cc added handle among the possible PIDs 2020-08-04 12:50:12 +02:00
Miriam Baglioni 5b651abf82 merge branch with master 2020-08-04 10:14:07 +02:00
Miriam Baglioni 88e4c3b751 added default trust to context bulktagged 2020-08-04 10:13:25 +02:00
Miriam Baglioni f9342cb484 added constant 2020-08-03 18:32:35 +02:00
Miriam Baglioni 96c3c891f4 added trust 2020-08-03 18:32:17 +02:00
Miriam Baglioni 53656600ad changed XQuery to select only community and ri with status not hidden 2020-08-03 18:29:30 +02:00
Miriam Baglioni b34177d8ef merge upstream 2020-08-03 18:13:42 +02:00
Miriam Baglioni 901ae37f7b added step to workflow 2020-08-03 18:12:54 +02:00
Miriam Baglioni fa38cdb10b added resource 2020-08-03 18:11:12 +02:00
Miriam Baglioni e9fcc0b2f1 commented test unit - to decide change for mirroring the changed logics 2020-08-03 18:10:53 +02:00
Miriam Baglioni e43aeb139a added new property file and changed some parameter to old files 2020-08-03 18:07:28 +02:00
Miriam Baglioni aa9f3d9698 changed logic for save in s3 directly 2020-08-03 18:06:18 +02:00
Miriam Baglioni d465f0eec9 added fulltext to result 2020-08-03 18:03:27 +02:00
Miriam Baglioni ec4b392d12 added new dependencies for writing on s3 2020-08-03 17:57:04 +02:00
Miriam Baglioni c892c7dfa7 changed to query for community map just once and save the result for remaining executions 2020-08-03 17:56:31 +02:00
Claudio Atzori 3a11a387a9 data provision workflow enhancement: added nodes to perform DELETE BY QUERY before the indexing begins and COMMIT after the indexing is completed 2020-08-03 14:28:08 +02:00
Alessia Bardi 8cc067fe76 specific test for claims 2020-08-03 11:17:50 +02:00
Claudio Atzori a89b6cc3ba Merge pull request 'nsprefix_blacklist' (#34) from nsprefix_blacklist into master 2020-07-31 11:52:23 +02:00
Sandro La Bruzzo 0c3bc9ea4b Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-07-31 09:07:18 +02:00
Sandro La Bruzzo 168bfb496a adopted dedup to the new schema 2020-07-31 09:06:57 +02:00
Michele Artini 652b13abb6 Merge branch 'master' into nsprefix_blacklist 2020-07-31 07:58:37 +02:00
Claudio Atzori cd631bb5bc defaults fixed in the cleaning workflow forces result.publisher to NULL when result.publisher.value is empty 2020-07-30 17:03:53 +02:00
Miriam Baglioni 872d7783fc - 2020-07-30 16:50:36 +02:00
Miriam Baglioni 57c87b7653 re-implemented to fix issue on not serializable Set<String> variable 2020-07-30 16:43:43 +02:00
Miriam Baglioni ef8e5957b5 added specific directory where to save results 2020-07-30 16:42:46 +02:00
Miriam Baglioni 75f3361c85 - 2020-07-30 16:41:31 +02:00
Miriam Baglioni 3f695b25fa refactoring 2020-07-30 16:40:15 +02:00
Miriam Baglioni e623f12bef refactoring 2020-07-30 16:32:59 +02:00
Miriam Baglioni ff7d05abb4 added support class to store the couple organizationId representativeId got from sql query on hive 2020-07-30 16:32:04 +02:00
Miriam Baglioni cf6d80b2ab added command to close the writer 2020-07-30 16:31:22 +02:00
Miriam Baglioni f985bca37b added USER_CLAIM constant value 2020-07-30 16:25:26 +02:00
Claudio Atzori 4bbfcf1ac6 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-07-30 16:25:06 +02:00
Claudio Atzori 4ff8007518 added function to set the missing vocabulary names, used in the cleaning workflow as a pre-cleaning step 2020-07-30 16:24:39 +02:00