Commit Graph

1556 Commits

Author SHA1 Message Date
Miriam Baglioni fcd10f452c changed because of D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:55:32 +02:00
Miriam Baglioni fd48ae3b85 changed because of D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:19:15 +02:00
Miriam Baglioni 04a3e1ab38 disabled tests 2020-08-13 12:18:13 +02:00
Miriam Baglioni 2ede397933 Apply change because of D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:16:39 +02:00
Miriam Baglioni bfd1fcde6d removed not useful method and changed because of D-Net/dnet-hadoop#40 (comment) and D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:14:37 +02:00
Miriam Baglioni 7fd8397123 apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:13:15 +02:00
Miriam Baglioni 753d448cc9 apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:12:58 +02:00
Miriam Baglioni c0e071fa26 apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:12:40 +02:00
Miriam Baglioni 526db915bc apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:12:16 +02:00
Miriam Baglioni b0fab0d138 apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:11:57 +02:00
Miriam Baglioni 1b6320b251 apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:11:41 +02:00
Miriam Baglioni 743d31be22 apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:11:22 +02:00
Miriam Baglioni 65b48df652 apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:11:06 +02:00
Miriam Baglioni 90b54d3efb apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:08:24 +02:00
Miriam Baglioni 69bbb9592a apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:07:39 +02:00
Miriam Baglioni 945323299a apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:07:24 +02:00
Miriam Baglioni e04c993247 apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:07:07 +02:00
Miriam Baglioni ed0812d0ce apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:06:49 +02:00
Miriam Baglioni d55cfe0ea5 apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:06:20 +02:00
Miriam Baglioni 80866bec7d apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:06:05 +02:00
Miriam Baglioni 1400978c0a apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:05:44 +02:00
Miriam Baglioni 7b941a2e0a apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:05:17 +02:00
Miriam Baglioni f7474f50fe apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:04:52 +02:00
Miriam Baglioni 367203f412 apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:04:33 +02:00
Miriam Baglioni 3ab4809d31 apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:04:10 +02:00
Miriam Baglioni 02a4986e7b Applying changed from code reviews D-Net/dnet-hadoop#40 (comment) and D-Net/dnet-hadoop#40 (comment) and D-Net/dnet-hadoop#40 (comment) 2020-08-13 11:53:01 +02:00
Miriam Baglioni 235d4e4d6e moved Context as relevant for Communities dump 2020-08-12 18:16:45 +02:00
Miriam Baglioni adf9f96a67 test for extraction of relation between organizations and context 2020-08-12 10:04:47 +02:00
Miriam Baglioni 7400cd019d removed not needed variable 2020-08-12 10:03:33 +02:00
Miriam Baglioni 98d28bab5c fixed missing _ in context nsprefix 2020-08-12 10:00:18 +02:00
Miriam Baglioni 8f48cb29f4 changed resource because of a change in the XQuery that returned the XML to be parsed. The main Zenodo community is no more a separate element, but part of the <zenodocommunities> element 2020-08-11 18:04:38 +02:00
Miriam Baglioni c3672b162b merge branch with master 2020-08-11 17:53:04 +02:00
Miriam Baglioni a16bbf3202 changed test resource to mirror change in the Xquery that produced data to be parsed. The main Zenodo community it is no more provided in a different element, but it is part of the <zenodocommunities> 2020-08-11 17:48:44 +02:00
Miriam Baglioni 25f4fbceea draft of test and resources 2020-08-11 17:37:22 +02:00
Miriam Baglioni 30a2b19b65 changed metadata for deposition od covid-19 dump in Zenodo 2020-08-11 17:36:56 +02:00
Claudio Atzori f7cc52ab02 Merge pull request 'enrichment_wfs' (#39) from enrichment_wfs into master
LGTM
2020-08-11 17:26:13 +02:00
Miriam Baglioni 49788b532a changed to mirror changes in the schema 2020-08-11 16:05:03 +02:00
Miriam Baglioni b08511287b - 2020-08-11 16:01:36 +02:00
Miriam Baglioni 7e81a17068 changed the XQUERY to mirror the change in the code 2020-08-11 16:00:33 +02:00
Miriam Baglioni 37ad2f28e9 removed added | in prefix for datasource 2020-08-11 15:55:06 +02:00
Miriam Baglioni f31c2e9461 enabled test 2020-08-11 15:49:25 +02:00
Miriam Baglioni 2d67476417 merge branch with master 2020-08-11 15:46:04 +02:00
Miriam Baglioni 77a390878c merge upstream 2020-08-11 15:45:48 +02:00
Miriam Baglioni 6d3804e24c - 2020-08-11 15:45:12 +02:00
Miriam Baglioni 0603ec4757 changed test to upload the dump for covid-19 community 2020-08-11 15:43:25 +02:00
Miriam Baglioni 7dfd56df9d - 2020-08-11 15:42:35 +02:00
Miriam Baglioni a169d7e7c1 added test file for the MakeTar class 2020-08-11 15:40:41 +02:00
Miriam Baglioni acb0926b2e json schemas for the dumped entities and relation 2020-08-11 15:39:48 +02:00
Miriam Baglioni ff52c51f92 added the communityMapPath parameter and removed the isLookUpUrl parameter 2020-08-11 15:39:22 +02:00
Miriam Baglioni 6f43acda5e added the maketar and send to zenodo step. Adjusted wf parameters 2020-08-11 15:38:20 +02:00
Miriam Baglioni ddc19de2e9 removed the isLookUpUrl among the parameters 2020-08-11 15:37:47 +02:00
Miriam Baglioni 592a8ea573 added parameter file for maketar class 2020-08-11 15:37:14 +02:00
Miriam Baglioni 77a0951b32 added the make archive step in the workflow 2020-08-11 15:32:32 +02:00
Miriam Baglioni cf4d918787 added description, changed parameter name and added method 2020-08-11 15:27:31 +02:00
Miriam Baglioni dc5fc5366d Creation of an archive for each related dump part 2020-08-11 15:26:06 +02:00
Miriam Baglioni 0ce49049d6 added description 2020-08-11 15:25:11 +02:00
Miriam Baglioni 9bae991167 added description of the class 2020-08-11 11:20:43 +02:00
Miriam Baglioni 341dc59ead removed the repartition(1). Added code for the creation of an archive containing all the parts dumped for each community 2020-08-11 11:18:58 +02:00
Sandro La Bruzzo fe8d640aee fixed error on oozie workflow 2020-08-11 09:43:03 +02:00
Sandro La Bruzzo 304590e854 updated workflow of indexing to start from begin 2020-08-11 09:17:47 +02:00
Sandro La Bruzzo eaf0dc68a2 fixed indexing 2020-08-11 09:17:03 +02:00
Miriam Baglioni 1991a49f70 removed reference to isLookUp to get the communityMap 2020-08-10 18:02:56 +02:00
Miriam Baglioni c378c38546 disabled test. The testing functionalities for hte upload in Zenode are moved to common 2020-08-10 12:41:11 +02:00
Miriam Baglioni 63ad0ed209 changed to use communityMapPath instead of IsLookUp 2020-08-10 12:40:19 +02:00
Miriam Baglioni cec795f2ea changed resources to mirror changes in the model 2020-08-10 12:39:35 +02:00
Miriam Baglioni f50e3e7333 changed the class for which to generate the schema 2020-08-10 12:03:49 +02:00
Miriam Baglioni b8c26f656c test using communityMapPath instead of isLookUp 2020-08-10 12:02:55 +02:00
Miriam Baglioni fe88904df0 changed the wf definition 2020-08-10 12:01:14 +02:00
Miriam Baglioni 87856467e2 removed isLookUpUrl and added code to read from HDSF the communitymap 2020-08-10 11:38:41 +02:00
Miriam Baglioni 1cf7043e26 removed isLookUoUrl from the parameters 2020-08-10 11:38:03 +02:00
Claudio Atzori cf6b68ce5a Merge pull request 'data provision workflow: add nodes to perform DELETE BY QUERY before the indexing begins and COMMIT after the indexing is completed' (#36) from provision_indexing into master 2020-08-10 11:16:29 +02:00
Sandro La Bruzzo 0ade33ad15 updated mergeFrom function for DLI Unknown 2020-08-10 10:18:35 +02:00
Miriam Baglioni 46986aae2d added the new parameter for newdeposion/newversion and concept_record_id 2020-08-07 18:00:06 +02:00
Miriam Baglioni 3aedfdf0d6 added option to do a new deposition or new version of an old deposition 2020-08-07 17:49:14 +02:00
Miriam Baglioni 1b3ad1bce6 filter out authors pid (only orcid). Added check to get unique provenance for context id. filtr out countries with code UNKNOWN 2020-08-07 17:48:18 +02:00
Miriam Baglioni 5ceb8c5f0a moved constants from graph.Constants 2020-08-07 17:46:47 +02:00
Miriam Baglioni 6c65c93c0e refactoring 2020-08-07 17:45:35 +02:00
Miriam Baglioni 68adf86fe4 refactoring 2020-08-07 17:43:20 +02:00
Miriam Baglioni 26d2ad6ebb refactoring 2020-08-07 17:41:56 +02:00
Miriam Baglioni 9675af7965 refactoring 2020-08-07 17:41:07 +02:00
Miriam Baglioni 346a91f4d9 Added constants 2020-08-07 17:35:39 +02:00
Miriam Baglioni d52b0e1797 no use of IsLookUp. The query is done once and its result stored on HDFS. The path to the result is given instead of the isLookUpUrl 2020-08-07 17:34:40 +02:00
Miriam Baglioni ae1b7fbfdb changed method signature from set of mapkey entries to String representing path on file system where to find the map 2020-08-07 17:32:27 +02:00
Miriam Baglioni 931fa2ff00 removed dependencies 2020-08-07 16:46:37 +02:00
Miriam Baglioni 545ea9f77e moved in common. Zenodo response model and APIClient to deposit in Zenodo 2020-08-07 16:44:51 +02:00
Sandro La Bruzzo ddb1446ceb fixed test 2020-08-07 11:34:33 +02:00
Sandro La Bruzzo 718bc7bbc8 implemented provision workflows using the new implementation with Dataset 2020-08-07 11:05:18 +02:00
Miriam Baglioni da9b012c15 fixed dewcription 2020-08-06 11:55:44 +02:00
Miriam Baglioni 6dbadcf181 the new schema for the dumped result 2020-08-06 11:05:56 +02:00
Sandro La Bruzzo a44e5abaa7 reformat code 2020-08-06 10:30:22 +02:00
Sandro La Bruzzo 4fb1821fab Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-08-06 10:28:31 +02:00
Sandro La Bruzzo 9d9e9edbd2 improved extractEntity Relation workflows using dataset 2020-08-06 10:28:24 +02:00
Miriam Baglioni adf0ca5aa7 test to send is from hdfs 2020-08-05 14:24:43 +02:00
Miriam Baglioni 14eda4f46e added method to try to put inputstream to zenodo 2020-08-05 14:18:25 +02:00
Miriam Baglioni e737a47270 added classes to try to send input stream to zenodo for the upload 2020-08-05 14:17:40 +02:00
Miriam Baglioni 873e9cd50c changed hadoop setting to connect to s3 2020-08-04 15:37:25 +02:00
Alessia Bardi a29565ff57 code formatting 2020-08-04 12:55:27 +02:00
Alessia Bardi 01db29e208 fixes redmine issue #5846: datacite and its different namespace declarations 2020-08-04 12:53:48 +02:00
Alessia Bardi b4e4e5f858 do not duplicate result PIDs 2020-08-04 12:52:14 +02:00
Alessia Bardi 09a323d18d testing a dataset from Nakala 2020-08-04 12:50:52 +02:00
Alessia Bardi c35bf486cc added handle among the possible PIDs 2020-08-04 12:50:12 +02:00
Miriam Baglioni 5b651abf82 merge branch with master 2020-08-04 10:14:07 +02:00
Miriam Baglioni 88e4c3b751 added default trust to context bulktagged 2020-08-04 10:13:25 +02:00
Miriam Baglioni f9342cb484 added constant 2020-08-03 18:32:35 +02:00
Miriam Baglioni 96c3c891f4 added trust 2020-08-03 18:32:17 +02:00
Miriam Baglioni 53656600ad changed XQuery to select only community and ri with status not hidden 2020-08-03 18:29:30 +02:00
Miriam Baglioni b34177d8ef merge upstream 2020-08-03 18:13:42 +02:00
Miriam Baglioni 901ae37f7b added step to workflow 2020-08-03 18:12:54 +02:00
Miriam Baglioni fa38cdb10b added resource 2020-08-03 18:11:12 +02:00
Miriam Baglioni e9fcc0b2f1 commented test unit - to decide change for mirroring the changed logics 2020-08-03 18:10:53 +02:00
Miriam Baglioni e43aeb139a added new property file and changed some parameter to old files 2020-08-03 18:07:28 +02:00
Miriam Baglioni aa9f3d9698 changed logic for save in s3 directly 2020-08-03 18:06:18 +02:00
Miriam Baglioni d465f0eec9 added fulltext to result 2020-08-03 18:03:27 +02:00
Miriam Baglioni ec4b392d12 added new dependencies for writing on s3 2020-08-03 17:57:04 +02:00
Miriam Baglioni c892c7dfa7 changed to query for community map just once and save the result for remaining executions 2020-08-03 17:56:31 +02:00
Claudio Atzori 3a11a387a9 data provision workflow enhancement: added nodes to perform DELETE BY QUERY before the indexing begins and COMMIT after the indexing is completed 2020-08-03 14:28:08 +02:00
Alessia Bardi 8cc067fe76 specific test for claims 2020-08-03 11:17:50 +02:00
Claudio Atzori a89b6cc3ba Merge pull request 'nsprefix_blacklist' (#34) from nsprefix_blacklist into master 2020-07-31 11:52:23 +02:00
Sandro La Bruzzo 0c3bc9ea4b Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-07-31 09:07:18 +02:00
Sandro La Bruzzo 168bfb496a adopted dedup to the new schema 2020-07-31 09:06:57 +02:00
Michele Artini 652b13abb6 Merge branch 'master' into nsprefix_blacklist 2020-07-31 07:58:37 +02:00
Claudio Atzori cd631bb5bc defaults fixed in the cleaning workflow forces result.publisher to NULL when result.publisher.value in empty 2020-07-30 17:03:53 +02:00
Miriam Baglioni 872d7783fc - 2020-07-30 16:50:36 +02:00
Miriam Baglioni 57c87b7653 re-implemented to fix issue on not serializable Set<String> variable 2020-07-30 16:43:43 +02:00
Miriam Baglioni ef8e5957b5 added specific directory where to save results 2020-07-30 16:42:46 +02:00
Miriam Baglioni 75f3361c85 - 2020-07-30 16:41:31 +02:00
Miriam Baglioni 3f695b25fa refactoring 2020-07-30 16:40:15 +02:00
Miriam Baglioni e623f12bef refactoring 2020-07-30 16:32:59 +02:00
Miriam Baglioni ff7d05abb4 added support class to store the couple organizationId representativeId gaot from sql query on hive 2020-07-30 16:32:04 +02:00
Miriam Baglioni cf6d80b2ab added command to close the writer 2020-07-30 16:31:22 +02:00
Miriam Baglioni f985bca37b added USER_CLAIM constant value 2020-07-30 16:25:26 +02:00
Claudio Atzori 4bbfcf1ac6 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-07-30 16:25:06 +02:00
Claudio Atzori 4ff8007518 added function to set the missing vocabulary names, used in the cleaning workflow as a pre-cleaning step 2020-07-30 16:24:39 +02:00
Miriam Baglioni 6f1c40a933 - 2020-07-30 16:24:28 +02:00
Miriam Baglioni 2b66a93f9e added property file that was missing 2020-07-30 16:24:17 +02:00
Michele Artini bdece15ca0 blacklist of nsprefix 2020-07-30 16:13:38 +02:00
Sandro La Bruzzo c97c8f0c44 implemented new oozie job to extract entities in a separate dataset 2020-07-30 12:13:58 +02:00
Sandro La Bruzzo 3010a362bc updated changing in the workflow of provision in the phase of aggregation. Removed serialization in JSON RDD and used spark Dataset 2020-07-30 09:25:56 +02:00
Sandro La Bruzzo 487226f669 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-07-30 09:25:39 +02:00
Sandro La Bruzzo 16ae3c9ccf updated changing in the workflow of provision in the phase of aggregation. Removed serialization in JSON RDD and used spark Dataset 2020-07-30 09:25:32 +02:00
Miriam Baglioni ee8420c6b3 added resource for datasource test 2020-07-29 18:28:43 +02:00
Miriam Baglioni 76bcab98ce added code to filter out null originalId from the dump 2020-07-29 18:28:21 +02:00
Miriam Baglioni ef1d8aef17 added one test to verify the dump for the datasources 2020-07-29 18:27:46 +02:00
Miriam Baglioni 86bab79512 - 2020-07-29 18:20:22 +02:00
Miriam Baglioni 31791dcf3d fixed wrong property file path name 2020-07-29 18:20:08 +02:00
Miriam Baglioni 9e722aa1ef - 2020-07-29 18:00:08 +02:00
Miriam Baglioni d22f106f27 added constant to identify datasource associated to funders 2020-07-29 17:56:55 +02:00
Miriam Baglioni 40e194fe2f added check to not dump datasources related to funders 2020-07-29 17:56:18 +02:00
Miriam Baglioni b48934f6df changed the workflow name 2020-07-29 17:43:43 +02:00
Miriam Baglioni 1433db825d refactorign 2020-07-29 17:43:24 +02:00