Commit Graph

1746 Commits

Author SHA1 Message Date
Miriam Baglioni 609ff17cfc now the commission give us the framework programme (FP7 - H2020) so use this information to filter out programmes not associated to H2020 2020-09-24 15:19:31 +02:00
Miriam Baglioni b66f930466 Added optional1 and optional2 information to the files read from the db. Optional1 contains the topic code and optional2 contains the topic description 2020-09-24 15:16:56 +02:00
Miriam Baglioni 860e6d38a6 added topic description to the CSV project variables 2020-09-24 15:15:26 +02:00
Claudio Atzori 044d3a0214 fixed query used to load datasources in the Graph 2020-09-24 13:48:58 +02:00
Claudio Atzori 27df1cea6d code formatting 2020-09-24 12:16:00 +02:00
Claudio Atzori fb22f4d70b included values for projects fundedamount and totalcost fields in the mapping tests. Swapped expected and actual values in junit test assertions 2020-09-24 12:10:59 +02:00
Claudio Atzori 42f55395c8 fixed order of the ISSNs returned by the SQL query 2020-09-24 12:09:58 +02:00
Claudio Atzori fadf5c7c69 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-09-24 10:42:52 +02:00
Claudio Atzori 9a7e72d528 using concat_ws to join textual columns from PSQL. When using || to perform the concatenation, a Null column makes the result of the operation Null 2020-09-24 10:42:47 +02:00
Claudio Atzori 9e3e93c6b6 setting the correct issn type in the datasource.journal element 2020-09-24 10:39:16 +02:00
Miriam Baglioni 0d83f47166 merge branch with master 2020-09-23 17:33:49 +02:00
Miriam Baglioni 39eb8ab25b changed the dump to move from h2020programme to h2020classification 2020-09-23 17:33:00 +02:00
Miriam Baglioni 1d84cf19a6 added new line to resource file 2020-09-23 17:32:22 +02:00
Miriam Baglioni f0c476b6c9 modification to the test classes to consider h2020classification 2020-09-23 17:31:49 +02:00
Miriam Baglioni 2cba3cb484 modification to the classes building the actionset to consider the h2020classification 2020-09-23 17:31:15 +02:00
Miriam Baglioni 1069cf243a modification to the schema to consider the H2020classification of the programme. The field Programme has been moved inside the H2020classification that is now associated to the Project. Programme is no longer associated directly to the Project but via H2020Classification 2020-09-22 14:38:00 +02:00
Enrico Ottonello a97ad20c7b exception is now propagated (PR review) 2020-09-22 10:46:34 +02:00
Enrico Ottonello fefbcfb106 dependency version moved to main pom (PR review) 2020-09-22 10:20:25 +02:00
Michele Artini 9e681609fd stats to sql file 2020-09-17 15:51:22 +02:00
Michele Artini 51321c2701 partition of events by opedoarId 2020-09-17 11:38:07 +02:00
Claudio Atzori cf2ce1a09b code formatting 2020-09-15 15:58:03 +02:00
Enrico Ottonello 9e8e7fe6ef add comments 2020-09-15 11:32:49 +02:00
Miriam Baglioni c2b5c780ff - 2020-09-14 14:34:03 +02:00
Miriam Baglioni e2ceefe9be - 2020-09-14 14:33:28 +02:00
Miriam Baglioni 1f893e63dc - 2020-09-14 14:33:10 +02:00
Enrico Ottonello 538f299767 merged 2020-09-14 12:35:16 +02:00
Enrico Ottonello eb8c9b2348 Merge remote-tracking branch 'upstream/master' into orcid-no-doi 2020-09-14 12:00:56 +02:00
Michele Artini 9b0c12f5d3 send notifications 2020-09-11 12:06:16 +02:00
Michele Artini 028613b751 remove old notifications 2020-09-09 15:32:06 +02:00
Michele Artini 9cfc124ac5 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-09-08 16:39:54 +02:00
Michele Artini a597a218ab * forall topics 2020-09-08 16:39:40 +02:00
Claudio Atzori 8a523474b7 code formatting 2020-09-07 11:40:16 +02:00
Michele Artini bb459caf69 support for all topic subscriptions 2020-08-27 11:01:21 +02:00
Michele Artini 82ed8edafd notification indexing 2020-08-26 15:10:48 +02:00
Miriam Baglioni b72a7dad46 resource for pid graph dump 2020-08-24 17:09:01 +02:00
Miriam Baglioni 8694bb9b31 refactoring due to compilation 2020-08-24 17:07:34 +02:00
Miriam Baglioni 8a069a4fea - 2020-08-24 17:01:30 +02:00
Miriam Baglioni 34fa96f3b1 - 2020-08-24 17:00:20 +02:00
Miriam Baglioni 5fb2949cb8 added utils methods 2020-08-24 17:00:09 +02:00
Miriam Baglioni 2a540b6c01 added constants for the pid graph dump 2020-08-24 16:55:35 +02:00
Miriam Baglioni da103c399a resources for the pid graph dump test 2020-08-24 16:52:07 +02:00
Miriam Baglioni 630a6a1fe7 first tests for the pid graph dump 2020-08-24 16:51:26 +02:00
Miriam Baglioni 40c8d2de7b test resources for the dump of the pids graph 2020-08-24 16:50:39 +02:00
Miriam Baglioni bef79d3bdf first attempt to the dump of pids graph 2020-08-24 16:49:38 +02:00
Michele Artini da470422d3 deleting events 2020-08-21 14:52:48 +02:00
Michele Artini 6e60bf026a indexing only a subset of events 2020-08-19 12:39:22 +02:00
Miriam Baglioni 85203c16e3 merge branch with master 2020-08-19 11:49:03 +02:00
Miriam Baglioni 2c783793ba removed the affiliation from the author to mirror the changes in the model 2020-08-19 11:48:12 +02:00
Miriam Baglioni f6bf888016 removed affiliation from author to mirror the changes in the model 2020-08-19 11:41:41 +02:00
Miriam Baglioni 66d0e0d3f2 - 2020-08-19 11:31:50 +02:00
Miriam Baglioni 1c593a9cfe - 2020-08-19 11:29:51 +02:00
Miriam Baglioni e42b2f5ae2 - 2020-08-19 11:29:09 +02:00
Miriam Baglioni f81ee22418 changed to mirror the changes in the model (Instance, CommunityInstance, GraphResult) 2020-08-19 11:28:26 +02:00
Miriam Baglioni 387be43fd4 changed to discriminate if dumping all the results type together or each one in its own archive 2020-08-19 11:25:27 +02:00
Miriam Baglioni c5858afb88 added parameter to guide the dump for the result (resultAggregation). true if all the result types should be dump together, false otherwise. 2020-08-19 11:24:14 +02:00
Miriam Baglioni d407852ac2 changed to reflect the changed in the model 2020-08-19 11:15:05 +02:00
Miriam Baglioni 47c21a8961 refactoring due to compilation 2020-08-19 11:11:57 +02:00
Miriam Baglioni 5570678c65 changed parameter name from hfdsNameNode to nameNode 2020-08-19 10:59:26 +02:00
Miriam Baglioni dc5096a327 refactoring due to compilation 2020-08-19 10:57:36 +02:00
Miriam Baglioni 55e24c2547 relclass for relation and corresponding values have been put to lower case (isSupplementedBy wrote as IsSupplementedBy - orcid propagation) 2020-08-18 16:42:08 +02:00
Miriam Baglioni f44dd5d886 changed in mapping the result semantic name as it will be visible in the relclass Relation: from IsSupplementedBy to isSupplementedBy 2020-08-17 17:15:09 +02:00
Miriam Baglioni bc6b5d5b34 removed leftover parameter 2020-08-15 11:22:35 +02:00
Miriam Baglioni 200cd5c730 removed leftover parameter 2020-08-15 11:22:19 +02:00
Miriam Baglioni 96600ed04a modified test resource for mirroring the deletion of affiliation from author parameters 2020-08-14 20:41:49 +02:00
Miriam Baglioni 09f5b92763 added specific reference to class 2020-08-14 20:00:09 +02:00
Miriam Baglioni 37e7c43652 changed parameter name from hdfsNaemNode to nameNode 2020-08-14 18:18:25 +02:00
Claudio Atzori 5b994d7ccf Merge branch 'dump' of https://code-repo.d4science.org/miriam.baglioni/dnet-hadoop into resolve_conflicts_pr40_dump 2020-08-14 15:32:29 +02:00
Miriam Baglioni de995970ea try again to solve clash with master 2020-08-14 15:24:36 +02:00
Miriam Baglioni 5040d72d5e changed to make it equal to master branch 2020-08-14 15:20:17 +02:00
Miriam Baglioni be8106c339 added space to avoid conflicts with master branch 2020-08-14 15:16:27 +02:00
Claudio Atzori 1871d1c6f6 solve error java.lang.NoSuchFieldError: INSTANCE when instantiating Solr client 2020-08-14 11:18:30 +02:00
Miriam Baglioni d2a8a4961a refactoring 2020-08-13 18:50:33 +02:00
Miriam Baglioni a5043de5da added method to get the mapped instance 2020-08-13 18:45:50 +02:00
Miriam Baglioni b7e49aee8d removed commented code 2020-08-13 18:44:07 +02:00
Miriam Baglioni f439a6231e added missing constraint in XQuery (verify the status of the RC/RI different from hidden) 2020-08-13 15:30:55 +02:00
Miriam Baglioni 0fe800b1c9 modified because of D-Net/dnet-hadoop#40\#issuecomment-1902 2020-08-13 15:17:12 +02:00
Miriam Baglioni 270c89489c fixed issue created while renaming subject to subjects in community configuration xml 2020-08-13 15:16:04 +02:00
Miriam Baglioni fcd10f452c changed because of D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:55:32 +02:00
Miriam Baglioni fd48ae3b85 changed because of D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:19:15 +02:00
Miriam Baglioni 04a3e1ab38 disabled tests 2020-08-13 12:18:13 +02:00
Miriam Baglioni 2ede397933 Apply change because of D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:16:39 +02:00
Miriam Baglioni bfd1fcde6d removed not useful method and changed because of D-Net/dnet-hadoop#40 (comment) and D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:14:37 +02:00
Miriam Baglioni 7fd8397123 apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:13:15 +02:00
Miriam Baglioni 753d448cc9 apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:12:58 +02:00
Miriam Baglioni c0e071fa26 apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:12:40 +02:00
Miriam Baglioni 526db915bc apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:12:16 +02:00
Miriam Baglioni b0fab0d138 apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:11:57 +02:00
Miriam Baglioni 1b6320b251 apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:11:41 +02:00
Miriam Baglioni 743d31be22 apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:11:22 +02:00
Miriam Baglioni 65b48df652 apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:11:06 +02:00
Miriam Baglioni 90b54d3efb apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:08:24 +02:00
Miriam Baglioni 69bbb9592a apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:07:39 +02:00
Miriam Baglioni 945323299a apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:07:24 +02:00
Miriam Baglioni e04c993247 apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:07:07 +02:00
Miriam Baglioni ed0812d0ce apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:06:49 +02:00
Miriam Baglioni d55cfe0ea5 apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:06:20 +02:00
Miriam Baglioni 80866bec7d apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:06:05 +02:00
Miriam Baglioni 1400978c0a apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:05:44 +02:00
Miriam Baglioni 7b941a2e0a apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:05:17 +02:00
Miriam Baglioni f7474f50fe apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:04:52 +02:00
Miriam Baglioni 367203f412 apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:04:33 +02:00
Miriam Baglioni 3ab4809d31 apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:04:10 +02:00
Miriam Baglioni 02a4986e7b Applying changed from code reviews D-Net/dnet-hadoop#40 (comment) and D-Net/dnet-hadoop#40 (comment) and D-Net/dnet-hadoop#40 (comment) 2020-08-13 11:53:01 +02:00
Miriam Baglioni 235d4e4d6e moved Context as relevant for Communities dump 2020-08-12 18:16:45 +02:00
Miriam Baglioni adf9f96a67 test for extraction of relation between organizations and context 2020-08-12 10:04:47 +02:00
Miriam Baglioni 7400cd019d removed not needed variable 2020-08-12 10:03:33 +02:00
Miriam Baglioni 98d28bab5c fixed missing _ in context nsprefix 2020-08-12 10:00:18 +02:00
Miriam Baglioni 8f48cb29f4 changed resource because of a change in the XQuery that returned the XML to be parsed. The main Zenodo community is no more a separate element, but part of the <zenodocommunities> element 2020-08-11 18:04:38 +02:00
Miriam Baglioni c3672b162b merge branch with master 2020-08-11 17:53:04 +02:00
Miriam Baglioni a16bbf3202 changed test resource to mirror change in the Xquery that produced data to be parsed. The main Zenodo community it is no more provided in a different element, but it is part of the <zenodocommunities> 2020-08-11 17:48:44 +02:00
Miriam Baglioni 25f4fbceea draft of test and resources 2020-08-11 17:37:22 +02:00
Miriam Baglioni 30a2b19b65 changed metadata for deposition of covid-19 dump in Zenodo 2020-08-11 17:36:56 +02:00
Claudio Atzori f7cc52ab02 Merge pull request 'enrichment_wfs' (#39) from enrichment_wfs into master. LGTM 2020-08-11 17:26:13 +02:00
Miriam Baglioni 49788b532a changed to mirror changes in the schema 2020-08-11 16:05:03 +02:00
Miriam Baglioni b08511287b - 2020-08-11 16:01:36 +02:00
Miriam Baglioni 7e81a17068 changed the XQUERY to mirror the change in the code 2020-08-11 16:00:33 +02:00
Miriam Baglioni 37ad2f28e9 removed added | in prefix for datasource 2020-08-11 15:55:06 +02:00
Miriam Baglioni f31c2e9461 enabled test 2020-08-11 15:49:25 +02:00
Miriam Baglioni 2d67476417 merge branch with master 2020-08-11 15:46:04 +02:00
Miriam Baglioni 77a390878c merge upstream 2020-08-11 15:45:48 +02:00
Miriam Baglioni 6d3804e24c - 2020-08-11 15:45:12 +02:00
Miriam Baglioni 0603ec4757 changed test to upload the dump for covid-19 community 2020-08-11 15:43:25 +02:00
Miriam Baglioni 7dfd56df9d - 2020-08-11 15:42:35 +02:00
Miriam Baglioni a169d7e7c1 added test file for the MakeTar class 2020-08-11 15:40:41 +02:00
Miriam Baglioni acb0926b2e json schemas for the dumped entities and relation 2020-08-11 15:39:48 +02:00
Miriam Baglioni ff52c51f92 added the communityMapPath parameter and removed the isLookUpUrl parameter 2020-08-11 15:39:22 +02:00
Miriam Baglioni 6f43acda5e added the maketar and send to zenodo step. Adjusted wf parameters 2020-08-11 15:38:20 +02:00
Miriam Baglioni ddc19de2e9 removed the isLookUpUrl among the parameters 2020-08-11 15:37:47 +02:00
Miriam Baglioni 592a8ea573 added parameter file for maketar class 2020-08-11 15:37:14 +02:00
Miriam Baglioni 77a0951b32 added the make archive step in the workflow 2020-08-11 15:32:32 +02:00
Miriam Baglioni cf4d918787 added description, changed parameter name and added method 2020-08-11 15:27:31 +02:00
Miriam Baglioni dc5fc5366d Creation of an archive for each related dump part 2020-08-11 15:26:06 +02:00
Miriam Baglioni 0ce49049d6 added description 2020-08-11 15:25:11 +02:00
Miriam Baglioni 9bae991167 added description of the class 2020-08-11 11:20:43 +02:00
Miriam Baglioni 341dc59ead removed the repartition(1). Added code for the creation of an archive containing all the parts dumped for each community 2020-08-11 11:18:58 +02:00
Sandro La Bruzzo fe8d640aee fixed error on oozie workflow 2020-08-11 09:43:03 +02:00
Sandro La Bruzzo 304590e854 updated workflow of indexing to start from begin 2020-08-11 09:17:47 +02:00
Sandro La Bruzzo eaf0dc68a2 fixed indexing 2020-08-11 09:17:03 +02:00
Miriam Baglioni 1991a49f70 removed reference to isLookUp to get the communityMap 2020-08-10 18:02:56 +02:00
Miriam Baglioni c378c38546 disabled test. The testing functionalities for the upload in Zenodo are moved to common 2020-08-10 12:41:11 +02:00
Miriam Baglioni 63ad0ed209 changed to use communityMapPath instead of IsLookUp 2020-08-10 12:40:19 +02:00
Miriam Baglioni cec795f2ea changed resources to mirror changes in the model 2020-08-10 12:39:35 +02:00
Miriam Baglioni f50e3e7333 changed the class for which to generate the schema 2020-08-10 12:03:49 +02:00
Miriam Baglioni b8c26f656c test using communityMapPath instead of isLookUp 2020-08-10 12:02:55 +02:00
Miriam Baglioni fe88904df0 changed the wf definition 2020-08-10 12:01:14 +02:00
Miriam Baglioni 87856467e2 removed isLookUpUrl and added code to read from HDFS the communitymap 2020-08-10 11:38:41 +02:00
Miriam Baglioni 1cf7043e26 removed isLookUpUrl from the parameters 2020-08-10 11:38:03 +02:00
Claudio Atzori cf6b68ce5a Merge pull request 'data provision workflow: add nodes to perform DELETE BY QUERY before the indexing begins and COMMIT after the indexing is completed' (#36) from provision_indexing into master 2020-08-10 11:16:29 +02:00
Sandro La Bruzzo 0ade33ad15 updated mergeFrom function for DLI Unknown 2020-08-10 10:18:35 +02:00
Miriam Baglioni 46986aae2d added the new parameter for newdeposion/newversion and concept_record_id 2020-08-07 18:00:06 +02:00
Miriam Baglioni 3aedfdf0d6 added option to do a new deposition or new version of an old deposition 2020-08-07 17:49:14 +02:00
Miriam Baglioni 1b3ad1bce6 filter out authors pid (only orcid). Added check to get unique provenance for context id. filter out countries with code UNKNOWN 2020-08-07 17:48:18 +02:00
Miriam Baglioni 5ceb8c5f0a moved constants from graph.Constants 2020-08-07 17:46:47 +02:00
Miriam Baglioni 6c65c93c0e refactoring 2020-08-07 17:45:35 +02:00
Miriam Baglioni 68adf86fe4 refactoring 2020-08-07 17:43:20 +02:00
Miriam Baglioni 26d2ad6ebb refactoring 2020-08-07 17:41:56 +02:00
Miriam Baglioni 9675af7965 refactoring 2020-08-07 17:41:07 +02:00
Miriam Baglioni 346a91f4d9 Added constants 2020-08-07 17:35:39 +02:00
Miriam Baglioni d52b0e1797 no use of IsLookUp. The query is done once and its result stored on HDFS. The path to the result is given instead of the isLookUpUrl 2020-08-07 17:34:40 +02:00
Miriam Baglioni ae1b7fbfdb changed method signature from set of mapkey entries to String representing path on file system where to find the map 2020-08-07 17:32:27 +02:00
Miriam Baglioni 931fa2ff00 removed dependencies 2020-08-07 16:46:37 +02:00
Miriam Baglioni 545ea9f77e moved in common. Zenodo response model and APIClient to deposit in Zenodo 2020-08-07 16:44:51 +02:00
Sandro La Bruzzo ddb1446ceb fixed test 2020-08-07 11:34:33 +02:00
Sandro La Bruzzo 718bc7bbc8 implemented provision workflows using the new implementation with Dataset 2020-08-07 11:05:18 +02:00
Miriam Baglioni da9b012c15 fixed description 2020-08-06 11:55:44 +02:00
Miriam Baglioni 6dbadcf181 the new schema for the dumped result 2020-08-06 11:05:56 +02:00
Sandro La Bruzzo a44e5abaa7 reformat code 2020-08-06 10:30:22 +02:00
Sandro La Bruzzo 4fb1821fab Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-08-06 10:28:31 +02:00
Sandro La Bruzzo 9d9e9edbd2 improved extractEntity Relation workflows using dataset 2020-08-06 10:28:24 +02:00
Miriam Baglioni adf0ca5aa7 test to send is from hdfs 2020-08-05 14:24:43 +02:00
Miriam Baglioni 14eda4f46e added method to try to put inputstream to zenodo 2020-08-05 14:18:25 +02:00
Miriam Baglioni e737a47270 added classes to try to send input stream to zenodo for the upload 2020-08-05 14:17:40 +02:00
Miriam Baglioni 873e9cd50c changed hadoop setting to connect to s3 2020-08-04 15:37:25 +02:00
Alessia Bardi a29565ff57 code formatting 2020-08-04 12:55:27 +02:00
Alessia Bardi 01db29e208 fixes redmine issue #5846: datacite and its different namespace declarations 2020-08-04 12:53:48 +02:00
Alessia Bardi b4e4e5f858 do not duplicate result PIDs 2020-08-04 12:52:14 +02:00
Alessia Bardi 09a323d18d testing a dataset from Nakala 2020-08-04 12:50:52 +02:00
Alessia Bardi c35bf486cc added handle among the possible PIDs 2020-08-04 12:50:12 +02:00
Miriam Baglioni 5b651abf82 merge branch with master 2020-08-04 10:14:07 +02:00
Miriam Baglioni 88e4c3b751 added default trust to context bulktagged 2020-08-04 10:13:25 +02:00
Miriam Baglioni f9342cb484 added constant 2020-08-03 18:32:35 +02:00
Miriam Baglioni 96c3c891f4 added trust 2020-08-03 18:32:17 +02:00
Miriam Baglioni 53656600ad changed XQuery to select only community and ri with status not hidden 2020-08-03 18:29:30 +02:00
Miriam Baglioni b34177d8ef merge upstream 2020-08-03 18:13:42 +02:00
Miriam Baglioni 901ae37f7b added step to workflow 2020-08-03 18:12:54 +02:00
Miriam Baglioni fa38cdb10b added resource 2020-08-03 18:11:12 +02:00
Miriam Baglioni e9fcc0b2f1 commented test unit - to decide change for mirroring the changed logics 2020-08-03 18:10:53 +02:00
Miriam Baglioni e43aeb139a added new property file and changed some parameter to old files 2020-08-03 18:07:28 +02:00
Miriam Baglioni aa9f3d9698 changed logic for save in s3 directly 2020-08-03 18:06:18 +02:00
Miriam Baglioni d465f0eec9 added fulltext to result 2020-08-03 18:03:27 +02:00
Miriam Baglioni ec4b392d12 added new dependencies for writing on s3 2020-08-03 17:57:04 +02:00
Miriam Baglioni c892c7dfa7 changed to query for community map just once and save the result for remaining executions 2020-08-03 17:56:31 +02:00
Claudio Atzori 3a11a387a9 data provision workflow enhancement: added nodes to perform DELETE BY QUERY before the indexing begins and COMMIT after the indexing is completed 2020-08-03 14:28:08 +02:00
Alessia Bardi 8cc067fe76 specific test for claims 2020-08-03 11:17:50 +02:00
Claudio Atzori a89b6cc3ba Merge pull request 'nsprefix_blacklist' (#34) from nsprefix_blacklist into master 2020-07-31 11:52:23 +02:00
Sandro La Bruzzo 0c3bc9ea4b Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-07-31 09:07:18 +02:00
Sandro La Bruzzo 168bfb496a adopted dedup to the new schema 2020-07-31 09:06:57 +02:00
Michele Artini 652b13abb6 Merge branch 'master' into nsprefix_blacklist 2020-07-31 07:58:37 +02:00
Enrico Ottonello 0377b40fba output to one parquet file 2020-07-30 18:38:07 +02:00
Claudio Atzori cd631bb5bc defaults fixed in the cleaning workflow forces result.publisher to NULL when result.publisher.value in empty 2020-07-30 17:03:53 +02:00
Miriam Baglioni 872d7783fc - 2020-07-30 16:50:36 +02:00
Miriam Baglioni 57c87b7653 re-implemented to fix issue on not serializable Set<String> variable 2020-07-30 16:43:43 +02:00
Miriam Baglioni ef8e5957b5 added specific directory where to save results 2020-07-30 16:42:46 +02:00
Miriam Baglioni 75f3361c85 - 2020-07-30 16:41:31 +02:00
Miriam Baglioni 3f695b25fa refactoring 2020-07-30 16:40:15 +02:00
Miriam Baglioni e623f12bef refactoring 2020-07-30 16:32:59 +02:00
Miriam Baglioni ff7d05abb4 added support class to store the couple organizationId representativeId got from sql query on hive 2020-07-30 16:32:04 +02:00
Miriam Baglioni cf6d80b2ab added command to close the writer 2020-07-30 16:31:22 +02:00
Miriam Baglioni f985bca37b added USER_CLAIM constant value 2020-07-30 16:25:26 +02:00
Claudio Atzori 4bbfcf1ac6 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-07-30 16:25:06 +02:00
Claudio Atzori 4ff8007518 added function to set the missing vocabulary names, used in the cleaning workflow as a pre-cleaning step 2020-07-30 16:24:39 +02:00
Miriam Baglioni 6f1c40a933 - 2020-07-30 16:24:28 +02:00
Miriam Baglioni 2b66a93f9e added property file that was missing 2020-07-30 16:24:17 +02:00
Michele Artini bdece15ca0 blacklist of nsprefix 2020-07-30 16:13:38 +02:00
Enrico Ottonello 196f36c6ed fix publication dataset creation 2020-07-30 13:38:33 +02:00
Sandro La Bruzzo c97c8f0c44 implemented new oozie job to extract entities in a separate dataset 2020-07-30 12:13:58 +02:00
Sandro La Bruzzo 3010a362bc updated changing in the workflow of provision in the phase of aggregation. Removed serialization in JSON RDD and used spark Dataset 2020-07-30 09:25:56 +02:00
Sandro La Bruzzo 487226f669 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-07-30 09:25:39 +02:00
Sandro La Bruzzo 16ae3c9ccf updated changing in the workflow of provision in the phase of aggregation. Removed serialization in JSON RDD and used spark Dataset 2020-07-30 09:25:32 +02:00
Miriam Baglioni ee8420c6b3 added resource for datasource test 2020-07-29 18:28:43 +02:00
Miriam Baglioni 76bcab98ce added code to filter out null originalId from the dump 2020-07-29 18:28:21 +02:00
Miriam Baglioni ef1d8aef17 added one test to verify the dump for the datasources 2020-07-29 18:27:46 +02:00
Miriam Baglioni 86bab79512 - 2020-07-29 18:20:22 +02:00
Miriam Baglioni 31791dcf3d fixed wrong property file path name 2020-07-29 18:20:08 +02:00
Miriam Baglioni 9e722aa1ef - 2020-07-29 18:00:08 +02:00
Miriam Baglioni d22f106f27 added constant to identify datasource associated to funders 2020-07-29 17:56:55 +02:00
Miriam Baglioni 40e194fe2f added check to not dump datasources related to funders 2020-07-29 17:56:18 +02:00
Miriam Baglioni b48934f6df changed the workflow name 2020-07-29 17:43:43 +02:00
Miriam Baglioni 1433db825d refactoring 2020-07-29 17:43:24 +02:00
Miriam Baglioni 074e9ab75e refactoring 2020-07-29 17:42:50 +02:00
Miriam Baglioni 8ad8dac7d4 merge branch with fork master 2020-07-29 17:38:28 +02:00
Miriam Baglioni 9e997e63a2 merge upstream 2020-07-29 17:38:14 +02:00
Miriam Baglioni 9fa82dc93b fixed issue 2020-07-29 17:36:16 +02:00
Miriam Baglioni 8907648d6a - 2020-07-29 17:35:47 +02:00
Miriam Baglioni 536e7f6352 added and changed resources for testing of the whole graph dump and of community related products dumps 2020-07-29 17:33:34 +02:00
Miriam Baglioni 4d7f590493 testings for the whole graph dump 2020-07-29 17:32:37 +02:00
Miriam Baglioni a2f73ec2c7 changed due to changes in the model 2020-07-29 17:32:02 +02:00
Miriam Baglioni 481585e9d3 - 2020-07-29 17:31:41 +02:00
Miriam Baglioni 40a8dafbdc - 2020-07-29 17:30:44 +02:00
Miriam Baglioni de2ebb467e changed due to changes in the model 2020-07-29 17:08:02 +02:00
Miriam Baglioni d0ff2a56fb - 2020-07-29 17:06:53 +02:00
Miriam Baglioni b96dedb56b changed due to changes in the model 2020-07-29 17:05:31 +02:00
Miriam Baglioni 6d0f08277b classes to implement the dump of the whole graph. 2020-07-29 17:03:19 +02:00
Miriam Baglioni 8d4327b292 input parameters and workflow definition for the dump of the whole graph 2020-07-29 17:00:34 +02:00
Miriam Baglioni b5f995ab12 refactoring 2020-07-29 16:59:48 +02:00
Miriam Baglioni f7a87cc447 added new constants value 2020-07-29 16:58:40 +02:00
Miriam Baglioni b71d12cf26 refactoring 2020-07-29 16:52:44 +02:00
Miriam Baglioni a8d65b68cb changed to delete the part to check if it was a test or a real execution 2020-07-29 16:47:57 +02:00
Miriam Baglioni 3ec2392904 Added new class to move the place the split is effectively run 2020-07-29 16:46:50 +02:00
Michele Artini 8ba94833bd added an es prop 2020-07-29 14:16:08 +02:00