Commit Graph

473 Commits

Author SHA1 Message Date
Miriam Baglioni 7cf1f49d5e if the funding does not start with H2020 but contains it the nsp should be corda__h2020 2021-04-23 11:50:26 +02:00
Miriam Baglioni 7465fa3f20 dumping only the communities with status "all". We decided those with status manager wil be available on demand 2021-04-23 11:49:45 +02:00
Miriam Baglioni 8c4c74a640 changed logic to be able to create a dump for a single community at a time 2021-04-13 16:32:19 +02:00
Miriam Baglioni 6179deb836 removed the part after part-x- in the file name generated by spark. It was too long and created problems while creating the tar entries 2021-04-13 16:30:59 +02:00
Miriam Baglioni 6b51b69cf7 added the creation of the openaireId from funder and grant number if the element is not present in the context profile 2021-04-09 12:49:07 +02:00
Miriam Baglioni bd4b6b053d changed classid with classname in the construction of provenance for the dump 2021-04-09 12:48:09 +02:00
Miriam Baglioni 95c5f97259 added the part for the extraction of relations versus projects 2021-04-09 11:31:37 +02:00
Miriam Baglioni 3e3a45d930 refactoring 2021-04-08 10:44:37 +02:00
Miriam Baglioni f95ec49a59 changed the substring to be pk for communities of arbitrary name length 2021-04-07 13:22:54 +02:00
Miriam Baglioni c52355b516 refactoring 2021-04-07 12:13:45 +02:00
Miriam Baglioni e1af14833d refactoring 2021-04-07 12:13:00 +02:00
Miriam Baglioni 22f4930479 refactoring 2021-04-07 12:12:04 +02:00
Miriam Baglioni 7f9b7cfcf6 removing from the dump organization that have been deleted by inference 2021-04-07 12:11:36 +02:00
Miriam Baglioni ad6d0ca9eb added to all the entities the check that deletedbyinference = false 2021-04-07 10:37:49 +02:00
Miriam Baglioni 5022f1b50d removing organization deletedbyinference from the dump 2021-04-01 18:16:40 +02:00
Miriam Baglioni 0421f5e1d8 added check to verify not to add void APC 2021-04-01 17:38:30 +02:00
Miriam Baglioni 152ba8e2ef added description 2021-04-01 16:55:57 +02:00
Miriam Baglioni c0c225f3b2 added logic to select only the valid relations: those not deletedbyinference and having both part of the relation as entities in the graph 2021-04-01 16:53:33 +02:00
Miriam Baglioni f93356f690 refactoring 2021-04-01 16:24:08 +02:00
Miriam Baglioni f7714645d2 merge with dump 2021-03-30 16:27:38 +02:00
Miriam Baglioni 08f8dd9454 refactoring 2021-03-30 12:53:07 +02:00
Miriam Baglioni d0c94462e4 refactoring 2021-03-30 12:45:34 +02:00
Miriam Baglioni a896febc02 added APC in the dumped information 2021-03-30 11:13:07 +02:00
Miriam Baglioni 871e5bea29 should have fixed for real now 2021-02-24 11:51:20 +01:00
Miriam Baglioni 5d92df0627 tried again to fix issue for croatian funder 2021-02-24 10:49:55 +01:00
Miriam Baglioni 9841086ef3 modified code to split the Croazian funders 2021-02-23 18:09:14 +01:00
Miriam Baglioni d4ad740c98 merge branch with master 2021-02-23 11:10:41 +01:00
Claudio Atzori 885e0dd926 [Cleaning] filter authors not providing word characters in the fullname 2021-01-26 09:48:53 +01:00
Claudio Atzori 2890511613 [Cleaning] normalise missing Result.country 2021-01-26 09:41:44 +01:00
Claudio Atzori 4eb9ed35b1 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2021-01-25 18:12:24 +01:00
Claudio Atzori cd379eb5e3 [Cleaning] trying to avoid NPEs, this time by ruling out authors without a defined fullname 2021-01-25 18:11:49 +01:00
Alessia Bardi 505477f36f format code 2021-01-25 18:02:49 +01:00
Alessia Bardi ded6ed8d7d no ',' author, if there are no author in ODF records 2021-01-25 17:57:51 +01:00
Claudio Atzori 3465c8ccee [Cleaning] trying to avoid NPEs 2021-01-25 16:54:53 +01:00
Claudio Atzori 07a0ccfc96 [Cleaning] trying to avoid NPEs 2021-01-25 13:36:01 +01:00
Claudio Atzori 34d653de41 [Cleaning] updated cleaning rule for DOIs 2021-01-22 14:16:33 +01:00
Miriam Baglioni 9bdadd4ddb merge branch with master 2021-01-22 11:55:27 +01:00
Claudio Atzori 26e9d55c13 code formatting 2021-01-05 09:59:26 +01:00
Claudio Atzori 7185158942 ignore missing properties 2020-12-29 11:06:28 +01:00
Claudio Atzori 28460c2cd1 using com.fasterxml.jackson.databind.ObjectMapper instead of org.codehaus.jackson.map.ObjectMapper 2020-12-23 16:59:52 +01:00
Claudio Atzori 723b01f9e9 trivial: the less magic numbers and values around, the better 2020-12-23 12:22:48 +01:00
Claudio Atzori 6cb0dc3f43 extended OCRID cleaning procedure 2020-12-21 11:40:17 +01:00
Miriam Baglioni 0d76e039cf changed to logic to verify in community is contained in the list of context of a result 2020-12-16 10:53:10 +01:00
Miriam Baglioni 7c86e66697 merge branch with master 2020-12-16 10:51:18 +01:00
Michele Artini 991e675dc6 validation in claim rels 2020-12-14 15:41:25 +01:00
Miriam Baglioni bc09d37e8c used constants in ModelConstants class 2020-12-10 10:01:23 +01:00
Claudio Atzori 4705144918 Merge pull request 'rel_project_validation' (#69) from rel_project_validation into master
LGTM
2020-12-09 19:01:20 +01:00
Claudio Atzori ada21ad920 Merge pull request 'dump of the results related to at least one project' (#61) from miriam.baglioni/dnet-hadoop:dump into master
LGTM
2020-12-09 17:22:56 +01:00
Michele Artini 1bc9adc10d default trust for validated rels 2020-12-09 16:18:37 +01:00
Michele Artini 5f21a356fd reindent 2020-12-09 11:24:30 +01:00