WIP: dump of the OpenAIRE graph - Changes #103

Closed
miriam.baglioni wants to merge 77 commits from miriam.baglioni/dnet-hadoop:dump into master

77 Commits

Author SHA1 Message Date
Miriam Baglioni 197471972c resolved conflicts 2021-05-20 11:26:43 +02:00
Miriam Baglioni e0ca4cead0 - 2021-05-20 10:44:03 +02:00
Miriam Baglioni f5ecd0d3ab added new parameters 2021-05-20 10:41:29 +02:00
Miriam Baglioni 85bd6ab5d0 added verification that one result linked to more than one project is saved just once 2021-05-20 10:40:09 +02:00
Miriam Baglioni b8f25fbf39 changed the way to find the funders: we select the distinct nsp of the projects 2021-05-20 10:30:53 +02:00
Miriam Baglioni c6cadacd24 Changed the way to find results linked to projects. We verify that the project is actually in the graph before selecting the result 2021-05-20 10:29:13 +02:00
Miriam Baglioni 25c5dd744f new definition of the workflow for the dump as sub workflows 2021-05-20 10:19:49 +02:00
Miriam Baglioni e902699afa refactoring 2021-04-23 13:48:27 +02:00
Miriam Baglioni bcbadeb107 added a decision node to choose whether to upload results to Zenodo 2021-04-23 12:42:06 +02:00
Miriam Baglioni 952d6dc2fb modified the code to allow the dump for a single community (independently of its status) 2021-04-23 12:41:32 +02:00
Miriam Baglioni 3c3e3537e0 Merge branch 'singleCommunityDump' into dump 2021-04-23 12:12:43 +02:00
Miriam Baglioni 1416a49b63 merge branch with master 2021-04-23 12:11:15 +02:00
Miriam Baglioni 8981a82011 - 2021-04-23 11:55:20 +02:00
Miriam Baglioni eb0762622c added decision node to upload on zenodo or not 2021-04-23 11:54:54 +02:00
Miriam Baglioni a469d79b84 test for the creation of relationships between context and projects when the funding contains h2020 2021-04-23 11:52:27 +02:00
Miriam Baglioni 251178aca8 the new json schema for the result 2021-04-23 11:51:27 +02:00
Miriam Baglioni 7cf1f49d5e if the funding does not start with H2020 but contains it, the nsp should be corda__h2020 2021-04-23 11:50:26 +02:00
Miriam Baglioni 7465fa3f20 dumping only the communities with status "all". We decided those with status manager will be available on demand 2021-04-23 11:49:45 +02:00
Miriam Baglioni bc501f41f6 added test class for community removal from the set to be dumped 2021-04-13 16:40:24 +02:00
Miriam Baglioni 80a7170794 - 2021-04-13 16:39:55 +02:00
Miriam Baglioni 08e731916b removed parameter communityMap when sending data to Zenodo 2021-04-13 16:38:59 +02:00
Miriam Baglioni 50d13a1d74 changed the workflow for the dump of a single community 2021-04-13 16:33:00 +02:00
Miriam Baglioni 8c4c74a640 changed logic to be able to create a dump for a single community at a time 2021-04-13 16:32:19 +02:00
Miriam Baglioni 6179deb836 removed the part after part-x- in the file name generated by spark. It was too long and created problems while creating the tar entries 2021-04-13 16:30:59 +02:00
Miriam Baglioni 04a0d1ba6e added test method to check the creation of relations between context and projects 2021-04-09 12:49:51 +02:00
Miriam Baglioni 6b51b69cf7 added the creation of the openaireId from funder and grant number if the element is not present in the context profile 2021-04-09 12:49:07 +02:00
Miriam Baglioni bd4b6b053d replaced classid with classname in the construction of provenance for the dump 2021-04-09 12:48:09 +02:00
Miriam Baglioni 26b34201ec refactoring 2021-04-09 12:47:03 +02:00
Miriam Baglioni 3d94c12d6e refactoring 2021-04-09 12:45:45 +02:00
Miriam Baglioni 95c5f97259 added the part for the extraction of relations versus projects 2021-04-09 11:31:37 +02:00
Miriam Baglioni eaf86828e6 refactoring 2021-04-09 11:30:30 +02:00
Miriam Baglioni c58206c3ba added test for the creation of relations with funders 2021-04-09 11:30:07 +02:00
Miriam Baglioni 3e3a45d930 refactoring 2021-04-08 10:44:37 +02:00
Miriam Baglioni 46a322b770 renamed originalId to acronym 2021-04-08 10:40:06 +02:00
Miriam Baglioni f95ec49a59 changed the substring to be ok for communities of arbitrary name length 2021-04-07 13:22:54 +02:00
Miriam Baglioni c52355b516 refactoring 2021-04-07 12:13:45 +02:00
Miriam Baglioni e1af14833d refactoring 2021-04-07 12:13:00 +02:00
Miriam Baglioni 22f4930479 refactoring 2021-04-07 12:12:04 +02:00
Miriam Baglioni 7f9b7cfcf6 removing from the dump organizations that have been deleted by inference 2021-04-07 12:11:36 +02:00
Miriam Baglioni 66d64947af merge branch with master 2021-04-07 10:38:18 +02:00
Miriam Baglioni ad6d0ca9eb added to all the entities the check that deletedbyinference = false 2021-04-07 10:37:49 +02:00
Miriam Baglioni 26cf32c066 changed the test to mirror the change in the logic of the code 2021-04-01 18:22:57 +02:00
Miriam Baglioni 5022f1b50d removing organization deletedbyinference from the dump 2021-04-01 18:16:40 +02:00
Miriam Baglioni 0421f5e1d8 added a check to avoid adding empty APCs 2021-04-01 17:38:30 +02:00
Miriam Baglioni 2c209e1140 added resources for testing selection of valid relations 2021-04-01 16:57:34 +02:00
Miriam Baglioni b3f02083e7 refactoring 2021-04-01 16:56:58 +02:00
Miriam Baglioni 8d28ca9815 added test class for the selection of valid relations 2021-04-01 16:56:32 +02:00
Miriam Baglioni 152ba8e2ef added description 2021-04-01 16:55:57 +02:00
Miriam Baglioni c0c225f3b2 added logic to select only the valid relations: those not deletedbyinference and having both parts of the relation as entities in the graph 2021-04-01 16:53:33 +02:00
Miriam Baglioni daabc370c5 changed the workflow to add the step for selecting the valid relations 2021-04-01 16:52:39 +02:00
Miriam Baglioni f93356f690 refactoring 2021-04-01 16:24:08 +02:00
Miriam Baglioni f7714645d2 merge with dump 2021-03-30 16:27:38 +02:00
Miriam Baglioni 4632795f25 merge branch with master 2021-03-30 16:27:23 +02:00
Miriam Baglioni 870ee28dd6 refactoring 2021-03-30 12:55:48 +02:00
Miriam Baglioni 08f8dd9454 refactoring 2021-03-30 12:53:07 +02:00
Miriam Baglioni e5463fea01 added resource for apc dump 2021-03-30 12:47:07 +02:00
Miriam Baglioni 16c1a27852 added test for APC dump 2021-03-30 12:46:42 +02:00
Miriam Baglioni d0c94462e4 refactoring 2021-03-30 12:45:34 +02:00
Miriam Baglioni a896febc02 added APC in the dumped information 2021-03-30 11:13:07 +02:00
Miriam Baglioni 5dea729de3 added article processing charges and modified description 2021-03-30 10:49:39 +02:00
Miriam Baglioni 200e7e9c46 modified description 2021-03-30 10:49:15 +02:00
Miriam Baglioni 931b2a2e15 merge branch with master 2021-03-30 10:27:32 +02:00
Miriam Baglioni 330343937c - 2021-02-24 12:49:27 +01:00
Miriam Baglioni defbb71561 extended resource info to match the new test 2021-02-24 11:52:44 +01:00
Miriam Baglioni 17049f8bde extended the test class - should have done it from the start :) 2021-02-24 11:52:07 +01:00
Miriam Baglioni cc11ee1cb9 changed the param value to directly upload on Zenodo 2021-02-24 11:51:40 +01:00
Miriam Baglioni 871e5bea29 should have fixed for real now 2021-02-24 11:51:20 +01:00
Miriam Baglioni 5d92df0627 tried again to fix issue for croatian funder 2021-02-24 10:49:55 +01:00
Miriam Baglioni 9841086ef3 modified code to split the Croatian funders 2021-02-23 18:09:14 +01:00
Miriam Baglioni d4ad740c98 merge branch with master 2021-02-23 11:10:41 +01:00
Miriam Baglioni a684e1065e merge branch with master 2021-02-23 10:45:42 +01:00
Miriam Baglioni f7c35e6311 merge branch with master 2021-02-08 10:39:10 +01:00
Miriam Baglioni 9bdadd4ddb merge branch with master 2021-01-22 11:55:27 +01:00
Miriam Baglioni 0d76e039cf changed the logic to verify if the community is contained in the list of contexts of a result 2020-12-16 10:53:10 +01:00
Miriam Baglioni 7c86e66697 merge branch with master 2020-12-16 10:51:18 +01:00
Miriam Baglioni bc09d37e8c used constants in ModelConstants class 2020-12-10 10:01:23 +01:00
Miriam Baglioni 815c7c11aa merge branch with master 2020-12-03 11:28:01 +01:00
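Commits c0c225f3b2 and ad6d0ca9eb describe the rule for keeping a relation in the dump: it must not be deleted by inference, and both its source and its target must be entities present in the graph. The branch does this as a Spark step; what follows is only a minimal plain-Python sketch of the same filter. The `Relation` record, its field names, and the example ids are hypothetical stand-ins, not the dnet-hadoop model classes.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Relation:
    # Hypothetical, simplified relation record (not the dnet-hadoop model).
    source: str
    target: str
    rel_class: str
    deleted_by_inference: bool = False


def select_valid_relations(relations: Iterable[Relation],
                           entity_ids: Set[str]) -> List[Relation]:
    """Keep relations not deleted by inference whose both ends are graph entities.

    entity_ids is assumed to already exclude entities with deletedbyinference = true.
    """
    return [
        r for r in relations
        if not r.deleted_by_inference
        and r.source in entity_ids
        and r.target in entity_ids
    ]


# Made-up ids, for illustration only.
relations = [
    Relation("res1", "proj1", "isProducedBy"),
    Relation("res1", "missing", "isProducedBy"),  # target not in the graph
    Relation("res2", "proj1", "isProducedBy", deleted_by_inference=True),
]
print(select_valid_relations(relations, {"res1", "res2", "proj1"}))
```

Filtering on the set of surviving entity ids covers both conditions in one pass; in Spark the equivalent would be a pair of joins against the entity tables.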
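Commits 7cf1f49d5e and 6b51b69cf7 state that a funding which contains H2020 (even if it does not start with it) maps to the nsp `corda__h2020`, and that a missing openaireId is built from funder and grant number. The sketch below illustrates both rules under stated assumptions: the lookup table for other funders, the example funding strings, and the id layout `40|<nsp>::md5(<grant number>)` are assumptions made here, not taken from the branch.

```python
import hashlib
from typing import Mapping


def namespace_prefix(funding: str, funder_nsp: Mapping[str, str]) -> str:
    """Return the namespace prefix (nsp) for a project funding string."""
    # Rule from the commit: funding containing H2020 anywhere, not only at the
    # start, maps to corda__h2020 (checked case-insensitively here).
    if "H2020" in funding.upper():
        return "corda__h2020"
    # Hypothetical fallback: look up the funder short name (text before the first ':').
    return funder_nsp[funding.split(":", 1)[0].strip()]


def project_openaire_id(nsp: str, grant_number: str) -> str:
    """Build a project id from nsp and grant number.

    Assumed layout: '40|' + nsp + '::' + md5 hex digest of the grant number.
    """
    digest = hashlib.md5(grant_number.encode("utf-8")).hexdigest()
    return f"40|{nsp}::{digest}"


# Example funding string and grant number are made up.
nsp = namespace_prefix("EC::H2020::RIA", {})
print(project_openaire_id(nsp, "123456"))
```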
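Commits c6cadacd24 and 85bd6ab5d0 change how results linked to projects are selected: a link counts only if the project is actually in the graph, and a result linked to several projects is saved once. A minimal sketch of that selection over (result id, project id) pairs follows; the function name and the made-up ids are hypothetical, and the real branch works on Spark datasets rather than Python lists.

```python
from typing import Iterable, List, Set, Tuple


def results_linked_to_projects(links: Iterable[Tuple[str, str]],
                               project_ids: Set[str]) -> List[str]:
    """Return each result id once, considering only links to projects in the graph."""
    seen: Set[str] = set()
    selected: List[str] = []
    for result_id, project_id in links:
        # Skip links whose project is not in the graph.
        if project_id not in project_ids:
            continue
        # A result linked to several projects is emitted only once.
        if result_id not in seen:
            seen.add(result_id)
            selected.append(result_id)
    return selected


links = [("r1", "p1"), ("r1", "p2"), ("r2", "p9"), ("r3", "p1")]
print(results_linked_to_projects(links, {"p1", "p2"}))  # ['r1', 'r3']
```

Checking project existence before deduplicating matters: if `r2` had also been linked to an existing project, it would still be kept, but never because of the dangling link to `p9`.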