369 Commits

Author SHA1 Message Date
Miriam Baglioni 5402062ff5 changed parameter file with the one associated with the job 2020-11-18 16:58:20 +01:00
Miriam Baglioni a172a37ad1 fixed typo 2020-11-18 16:55:07 +01:00
Miriam Baglioni 46ba3793f6 code, workflow and parameters for the dump of the results associated to funders 2020-11-18 16:47:31 +01:00
Alessia Bardi 10e673660f Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-11-18 10:01:23 +01:00
Alessia Bardi be7b310cef rel semantics ignore case 2020-11-18 10:01:20 +01:00
Michele Artini 33da2e3d6c xpaths for dateOfCollection and dateOfTransformation 2020-11-18 09:26:20 +01:00
Alessia Bardi 8f87020a50 #56: map relevantDates from aggregated ODF records 2020-11-17 18:42:09 +01:00
Claudio Atzori cfc01f136e PID filtering based on a blacklist 2020-11-17 12:27:06 +01:00
Claudio Atzori 6ab1ce53c9 fixed condition in result pid cleaning; cleanup 2020-11-16 10:09:17 +01:00
Claudio Atzori 768bc5304c Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-11-13 15:40:34 +01:00
Claudio Atzori 93f7b7974f Merge pull request 'trust truncated to 3 decimals' (#24) from trunc_trust into master. LGTM 2020-11-13 15:40:02 +01:00
Claudio Atzori 528231a287 grouping graph entities by id turned out to be an easy extension for the already existing cleaning workflow 2020-11-13 15:37:48 +01:00
Claudio Atzori 2bed29eb09 WIP: added oozie workflow for grouping graph entities by id 2020-11-13 10:05:12 +01:00
Claudio Atzori 13e36a4da0 WIP: added oozie workflow for grouping graph entities by id 2020-11-13 10:05:02 +01:00
Michele Artini 40160d171f organizations pids 2020-11-09 12:58:36 +01:00
Claudio Atzori d10447e747 re-packaged graph dump workflow sources 2020-11-05 17:38:18 +01:00
Miriam Baglioni f8e9bda24c merge branch with master 2020-11-05 16:31:18 +01:00
Miriam Baglioni be5ed8f554 added check to avoid sending empty metadata. 2020-11-05 16:10:17 +01:00
Claudio Atzori 4625b7486e code formatting 2020-11-04 18:12:43 +01:00
Miriam Baglioni e9ac471ae9 removed dependency from classes for the pid graph dump 2020-11-04 18:04:42 +01:00
Miriam Baglioni bac307155a removed properties specific for pid graph dump 2020-11-04 17:28:04 +01:00
Miriam Baglioni 9c9d50f486 removed code specific for pid graph dump 2020-11-04 17:26:22 +01:00
Miriam Baglioni 5669890934 removed commented lines 2020-11-04 17:15:21 +01:00
Miriam Baglioni 6a89f59be9 removed commented lines 2020-11-04 17:13:59 +01:00
Miriam Baglioni 56150d7e5e removed all code related to the dump of pids graph 2020-11-04 17:13:12 +01:00
Miriam Baglioni 16c54a96f8 removed pid dump 2020-11-04 17:11:32 +01:00
Miriam Baglioni 8ec7a61188 merge branch with master 2020-11-03 16:59:08 +01:00
Miriam Baglioni 7d2eda43ca added new non-mandatory property publish to determine whether to publish the upload or leave it pending. Default value false 2020-11-03 16:57:01 +01:00
Miriam Baglioni cbbb1bdc54 moved business logic to new class in common for handling the zip of the archives 2020-11-03 16:55:50 +01:00
Claudio Atzori 5310e56dba remove empty PIDs 2020-11-03 11:52:10 +01:00
Miriam Baglioni dabb33e018 changed the discriminant used to split the file 2020-10-30 17:52:22 +01:00
Miriam Baglioni 0fba08eae4 max allowed size per file 10 Gb 2020-10-30 16:05:55 +01:00
Miriam Baglioni b828587252 prevent the code from cycling indefinitely 2020-10-30 15:01:25 +01:00
Miriam Baglioni f747e303ac classes for dumping of the graph as ttl file 2020-10-30 14:13:45 +01:00
Miriam Baglioni 16baf5b69e formatting 2020-10-30 14:13:14 +01:00
Miriam Baglioni a9eef9c852 added check for possible Optional value in relation dataInfo 2020-10-30 14:12:28 +01:00
Miriam Baglioni 5f4de9a962 formatting 2020-10-30 14:11:40 +01:00
Miriam Baglioni 14bf2e7238 added option to split dumps bigger than 40Gb into different files 2020-10-30 14:09:04 +01:00
Miriam Baglioni d2374e3b9e added code to handle cases where the funding tree is not existing 2020-10-27 16:15:21 +01:00
Miriam Baglioni 5d3012eeb4 changed code to dump only the programme list and not the classification list 2020-10-27 16:14:18 +01:00
Miriam Baglioni 3241ec1777 added connection timeout and socket timeout 600 sec 2020-10-27 16:12:11 +01:00
Claudio Atzori a3f37a9414 javadoc 2020-10-07 16:44:22 +02:00
Claudio Atzori 8d85a2fced [BETA wf only] datasources involved in the merge operation don't obey the infra precedence policy, but rely on a custom behaviour that, given two datasources from beta and prod, returns the one from prod with the highest compatibility among the two 2020-10-07 16:28:52 +02:00
Miriam Baglioni ae08b3c0dd merge branch with master 2020-10-05 11:35:55 +02:00
Miriam Baglioni 32bffb0134 changed the name from communities_infrastructures to communities_infrastuctures.json 2020-10-05 11:24:17 +02:00
Miriam Baglioni 25cbcf6114 changed to solve issues about names. context renamed communities_infrastructure.json and removed the double json.gz extension from the name of the part in the tar 2020-10-02 12:17:46 +02:00
Claudio Atzori 49ae3450a9 code formatting 2020-10-02 09:43:24 +02:00
Claudio Atzori c2a6e2a9bf fixed mapping for datasource journal info (ISSNs) 2020-10-02 09:37:08 +02:00
Miriam Baglioni cfb5766c6b removed double json.gz from names of files in the tar 2020-10-01 17:18:34 +02:00
Miriam Baglioni fcaedac980 merge branch with master 2020-10-01 16:46:59 +02:00
Miriam Baglioni c6e6ed1bd8 merge branch with master 2020-10-01 16:24:41 +02:00
Claudio Atzori 2e9e13444d author pids made unique by value 2020-10-01 12:50:40 +02:00
Claudio Atzori e265c3e125 cleaning functions factored out in a dedicated class 2020-10-01 10:50:15 +02:00
Miriam Baglioni 7b6a7333e6 merge branch with master 2020-09-25 16:42:07 +02:00
Miriam Baglioni ed5239f9ec added new code to handle the new possibility to upload files to an already open deposition 2020-09-25 16:34:32 +02:00
Miriam Baglioni 3a8c524fce refactor 2020-09-25 16:34:02 +02:00
Miriam Baglioni de6c4d46d8 fixed conflicts 2020-09-24 15:35:01 +02:00
Claudio Atzori 9e3e93c6b6 setting the correct issn type in the datasource.journal element 2020-09-24 10:39:16 +02:00
Miriam Baglioni 39eb8ab25b changed the dump to move from h2020programme to h2020classification 2020-09-23 17:33:00 +02:00
Miriam Baglioni 1f893e63dc - 2020-09-14 14:33:10 +02:00
Claudio Atzori 8a523474b7 code formatting 2020-09-07 11:40:16 +02:00
Miriam Baglioni 8694bb9b31 refactoring due to compilation 2020-08-24 17:07:34 +02:00
Miriam Baglioni 8a069a4fea - 2020-08-24 17:01:30 +02:00
Miriam Baglioni 34fa96f3b1 - 2020-08-24 17:00:20 +02:00
Miriam Baglioni 5fb2949cb8 added utils methods 2020-08-24 17:00:09 +02:00
Miriam Baglioni 2a540b6c01 added constants for the pid graph dump 2020-08-24 16:55:35 +02:00
Miriam Baglioni bef79d3bdf first attempt at the dump of the pids graph 2020-08-24 16:49:38 +02:00
Miriam Baglioni 85203c16e3 merge branch with master 2020-08-19 11:49:03 +02:00
Miriam Baglioni 1c593a9cfe - 2020-08-19 11:29:51 +02:00
Miriam Baglioni e42b2f5ae2 - 2020-08-19 11:29:09 +02:00
Miriam Baglioni f81ee22418 changed to mirror the changes in the model (Instance, CommunityInstance, GraphResult) 2020-08-19 11:28:26 +02:00
Miriam Baglioni 387be43fd4 changed to discriminate if dumping all the results type together or each one in its own archive 2020-08-19 11:25:27 +02:00
Miriam Baglioni dc5096a327 refactoring due to compilation 2020-08-19 10:57:36 +02:00
Miriam Baglioni 09f5b92763 added specific reference to class 2020-08-14 20:00:09 +02:00
Miriam Baglioni a5043de5da added method to get the mapped instance 2020-08-13 18:45:50 +02:00
Miriam Baglioni fcd10f452c changed because of D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:55:32 +02:00
Miriam Baglioni bfd1fcde6d removed not useful method and changed because of D-Net/dnet-hadoop#40 (comment) and D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:14:37 +02:00
Miriam Baglioni 7fd8397123 apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:13:15 +02:00
Miriam Baglioni 753d448cc9 apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:12:58 +02:00
Miriam Baglioni c0e071fa26 apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:12:40 +02:00
Miriam Baglioni 526db915bc apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:12:16 +02:00
Miriam Baglioni b0fab0d138 apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:11:57 +02:00
Miriam Baglioni 1b6320b251 apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:11:41 +02:00
Miriam Baglioni 743d31be22 apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:11:22 +02:00
Miriam Baglioni 65b48df652 apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:11:06 +02:00
Miriam Baglioni 90b54d3efb apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:08:24 +02:00
Miriam Baglioni 69bbb9592a apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:07:39 +02:00
Miriam Baglioni 945323299a apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:07:24 +02:00
Miriam Baglioni e04c993247 apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:07:07 +02:00
Miriam Baglioni ed0812d0ce apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:06:49 +02:00
Miriam Baglioni d55cfe0ea5 apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:06:20 +02:00
Miriam Baglioni 80866bec7d apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:06:05 +02:00
Miriam Baglioni 1400978c0a apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:05:44 +02:00
Miriam Baglioni 7b941a2e0a apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:05:17 +02:00
Miriam Baglioni f7474f50fe apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:04:52 +02:00
Miriam Baglioni 367203f412 apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:04:33 +02:00
Miriam Baglioni 3ab4809d31 apply changes in D-Net/dnet-hadoop#40 (comment) 2020-08-13 12:04:10 +02:00
Miriam Baglioni 235d4e4d6e moved Context as relevant for Communities dump 2020-08-12 18:16:45 +02:00
Miriam Baglioni 7400cd019d removed not needed variable 2020-08-12 10:03:33 +02:00
Miriam Baglioni 98d28bab5c fixed missing _ in context nsprefix 2020-08-12 10:00:18 +02:00