Commit Graph

1838 Commits

Author SHA1 Message Date
Miriam Baglioni be5ed8f554 added check to avoid sending empty metadata. 2020-11-05 16:10:17 +01:00
Claudio Atzori 2148a51fae minor changes 2020-11-05 11:24:12 +01:00
Claudio Atzori 4625b7486e code formatting 2020-11-04 18:12:43 +01:00
Claudio Atzori f5f346dd2b Merge pull request 'dump' (#50) from miriam.baglioni/dnet-hadoop:dump into master (LGTM) 2020-11-04 18:07:01 +01:00
Miriam Baglioni e9ac471ae9 removed dependency from classes for the pid graph dump 2020-11-04 18:04:42 +01:00
Miriam Baglioni b90a945c49 removed property files for pid graph dump 2020-11-04 17:28:33 +01:00
Miriam Baglioni bac307155a removed properties specific for pid graph dump 2020-11-04 17:28:04 +01:00
Miriam Baglioni 9c9d50f486 removed code specific for pid graph dump 2020-11-04 17:26:22 +01:00
Miriam Baglioni 5669890934 removed commented lines 2020-11-04 17:15:21 +01:00
Miriam Baglioni 6a89f59be9 removed commented lines 2020-11-04 17:13:59 +01:00
Miriam Baglioni 56150d7e5e removed all code related to the dump of pids graph 2020-11-04 17:13:12 +01:00
Miriam Baglioni 16c54a96f8 removed pid dump 2020-11-04 17:11:32 +01:00
Miriam Baglioni 0cac5436ff Merge branch 'dump' of code-repo.d4science.org:miriam.baglioni/dnet-hadoop into dump 2020-11-04 13:21:11 +01:00
Alessia Bardi 51808b5afd Updated descriptions 2020-11-04 12:29:48 +01:00
Alessia Bardi e6becf8659 Updated descriptions 2020-11-04 12:17:57 +01:00
Alessia Bardi 0abe0eee33 Updated descriptions 2020-11-04 12:15:30 +01:00
Alessia Bardi f6ab238f5d Updated descriptions 2020-11-04 11:50:47 +01:00
Sandro La Bruzzo 3581244daf Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-11-04 09:04:22 +01:00
Sandro La Bruzzo 66efb39634 implemented merge scholix 2020-11-04 09:04:01 +01:00
Miriam Baglioni c010a8442f fixed issue on test code 2020-11-03 17:26:51 +01:00
Miriam Baglioni 8ec7a61188 merge branch with master 2020-11-03 16:59:08 +01:00
Miriam Baglioni c209284ca7 new schemas for the entities in the dump with added descriptions 2020-11-03 16:58:08 +01:00
Miriam Baglioni 08806deddf added the splitSize non mandatory parameter. Default size 10G 2020-11-03 16:57:34 +01:00
Miriam Baglioni 7d2eda43ca added new non mandatory property publish to determine whether to publish the upload or leave it pending. Default value false 2020-11-03 16:57:01 +01:00
Miriam Baglioni cbbb1bdc54 moved business logic to new class in common for handling the zip of the archives 2020-11-03 16:55:50 +01:00
Miriam Baglioni d4382b54df moved the tar archive with max size to common module 2020-11-03 16:54:50 +01:00
Claudio Atzori 5310e56dba remove empty PIDs 2020-11-03 11:52:10 +01:00
Sandro La Bruzzo 754c86f33e fixed test to work on jenkins 2020-11-02 09:35:01 +01:00
Sandro La Bruzzo 39337d8a8a fixed test 2020-11-02 09:26:25 +01:00
Miriam Baglioni dabb33e018 changed the discriminant by which to split the file 2020-10-30 17:52:22 +01:00
Claudio Atzori c5dda3a00c Merge pull request 'h2020classification' (#49) from miriam.baglioni/dnet-hadoop:h2020classification into master (LGTM) 2020-10-30 17:10:05 +01:00
Miriam Baglioni 4905739be6 changed resource file to mirror change in business logic 2020-10-30 17:02:57 +01:00
Miriam Baglioni b40360ebfb changed the code to mirror the changed decision in the classification level and programme description labels 2020-10-30 17:02:30 +01:00
Miriam Baglioni 696409fb9f disabled tests because they need a remote resource 2020-10-30 17:01:48 +01:00
Miriam Baglioni 0fba08eae4 max allowed size per file 10 Gb 2020-10-30 16:05:55 +01:00
Miriam Baglioni b828587252 prevent the code from cycling indefinitely 2020-10-30 15:01:25 +01:00
Miriam Baglioni f747e303ac classes for dumping of the graph as ttl file 2020-10-30 14:13:45 +01:00
Miriam Baglioni 16baf5b69e formatting 2020-10-30 14:13:14 +01:00
Miriam Baglioni a9eef9c852 added check for possible Optional value in relation dataInfo 2020-10-30 14:12:28 +01:00
Miriam Baglioni 5f4de9a962 formatting 2020-10-30 14:11:40 +01:00
Miriam Baglioni 14bf2e7238 added option to split dumps bigger than 40Gb on different files 2020-10-30 14:09:04 +01:00
Miriam Baglioni 78fdb11c3f merge branch with master 2020-10-29 12:55:22 +01:00
Sandro La Bruzzo 1d9fdb7367 fixed spark memory issue in SparkSplitOafTODLIEntities 2020-10-28 12:30:32 +01:00
Miriam Baglioni d2374e3b9e added code to handle cases where the funding tree is not existing 2020-10-27 16:15:21 +01:00
Miriam Baglioni 5d3012eeb4 changed code to dump only the programme list and not the classification list 2020-10-27 16:14:18 +01:00
Miriam Baglioni 3241ec1777 added connection timeout and socket timeout 600 sec 2020-10-27 16:12:11 +01:00
Enrico Ottonello 9818e74a70 added dependency version in main pom.xml for orcid no doi 2020-10-22 16:38:00 +02:00
Enrico Ottonello 210a50e4f4 replaced null value 2020-10-22 16:24:42 +02:00
Enrico Ottonello b0290dbcb7 moved all dependencies version to main pom.xml 2020-10-22 16:20:46 +02:00
Enrico Ottonello a38ab57062 let run test methods 2020-10-22 15:43:50 +02:00
Enrico Ottonello 1139d6568d replaced null value with a more safe empty string as return value 2020-10-22 15:32:26 +02:00
Enrico Ottonello c58db1c8ea added filter on null value after map function 2020-10-22 15:11:02 +02:00
Enrico Ottonello 846ba30873 if typologies mapping fails, an exception will be propagated 2020-10-22 14:36:18 +02:00
Enrico Ottonello c3114ba0ae replaced null as return value with a more safe empty string 2020-10-22 14:21:31 +02:00
Enrico Ottonello c295c71ca0 added comment 2020-10-22 14:07:26 +02:00
Enrico Ottonello ab083f9946 propagate exception on parsing work (PR request) 2020-10-22 14:02:32 +02:00
sandro 3a81a940b7 solved bug on merge publication 2020-10-21 22:41:55 +02:00
Miriam Baglioni a2ce527fae changed to match the requirements for short titles in level and long titles in classification 2020-10-20 17:03:25 +02:00
Sandro La Bruzzo 346ed65e2c added upload to zenodo node 2020-10-20 16:59:55 +02:00
sandro 271b4db450 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-10-20 16:09:49 +02:00
sandro d58d02d448 added workflow upload on zenodo 2020-10-20 16:09:07 +02:00
Alessia Bardi 1425d810a8 testing mapping 2020-10-19 17:46:14 +02:00
Sandro La Bruzzo fed711da80 Merge remote-tracking branch 'origin/master' into merge_record_to_common 2020-10-13 15:32:45 +02:00
Sandro La Bruzzo 34bf64c94f fixed export Scholexplorer to OpenAire 2020-10-13 08:47:58 +02:00
Alessia Bardi 8775a64bc1 Merge pull request 'Merging different compatibility levels (pinocchio operator)' (#47) from merge_graph into master 2020-10-09 14:44:52 +02:00
Claudio Atzori e751c1402f Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-10-09 13:53:21 +02:00
Claudio Atzori b961dc7d1e added originalid to the fields in the result graph view 2020-10-09 13:53:15 +02:00
Sandro La Bruzzo 734934e2eb fixed error on empty intersection with publication and relation on export to OAF 2020-10-08 17:29:29 +02:00
Sandro La Bruzzo eec418cd26 moved AuthoreMerger into dhp-common 2020-10-08 10:33:55 +02:00
Sandro La Bruzzo fe0a7870e6 Added test to check if merge authors works 2020-10-08 10:33:12 +02:00
Sandro La Bruzzo cd9c377d18 adapted scholexplorer Dump generation to the new Dataset definition 2020-10-08 10:10:13 +02:00
Claudio Atzori a3f37a9414 javadoc 2020-10-07 16:44:22 +02:00
Claudio Atzori 8d85a2fced [BETA wf only] datasources involved in the merge operation don't obey the infra precedence policy, but rely on a custom behaviour that, given two datasources from beta and prod, returns the one from prod with the highest compatibility among the two 2020-10-07 16:28:52 +02:00
Claudio Atzori 5f7b75f5c5 code formatting 2020-10-07 13:22:54 +02:00
miconis 5a8bc329c5 bug fix in the result merge: it takes the correct bestaccessright based on the license instead of the trust 2020-10-06 15:26:44 +02:00
Miriam Baglioni 061527f06e adding short description 2020-10-05 13:54:39 +02:00
Miriam Baglioni 0c12d7bdd8 adding short description 2020-10-05 11:39:55 +02:00
Miriam Baglioni ae08b3c0dd merge branch with master 2020-10-05 11:35:55 +02:00
Miriam Baglioni 11b7eaae09 changed the name of the folder where to store the context entity from context to communities_infrastructures 2020-10-05 11:24:54 +02:00
Miriam Baglioni 32bffb0134 changed the name from communities_infrastructures to communities_infrastuctures.json 2020-10-05 11:24:17 +02:00
Claudio Atzori 23f64d9eb4 updated dedup tests following the dnet-pace-core library update 2020-10-02 14:30:53 +02:00
Miriam Baglioni fc2f7636be removed not used code 2020-10-02 12:33:52 +02:00
Miriam Baglioni 25cbcf6114 changed to solve issues about names. context renamed communities_infrastructure.json and removed the double json.gz extension from the name of the part in the tar 2020-10-02 12:17:46 +02:00
Claudio Atzori 9db0f88fb8 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-10-02 09:43:35 +02:00
Claudio Atzori 49ae3450a9 code formatting 2020-10-02 09:43:24 +02:00
Claudio Atzori c2a6e2a9bf fixed mapping for datasource journal info (ISSNs) 2020-10-02 09:37:08 +02:00
Miriam Baglioni 01117a46e1 whole workflow activated 2020-10-01 17:19:21 +02:00
Miriam Baglioni cfb5766c6b removed double json.gz from names of files in the tar 2020-10-01 17:18:34 +02:00
Miriam Baglioni fcaedac980 merge branch with master 2020-10-01 16:46:59 +02:00
Miriam Baglioni c6e6ed1bd8 merge branch with master 2020-10-01 16:24:41 +02:00
Miriam Baglioni 4aec347351 refactoring 2020-10-01 16:23:52 +02:00
Miriam Baglioni 61946b4092 refactoring 2020-10-01 16:22:48 +02:00
Miriam Baglioni 7e6d35e56c added the link to the excel file related to topic 2020-10-01 15:53:31 +02:00
Sandro La Bruzzo 1a0a44e85a Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-10-01 15:46:53 +02:00
Sandro La Bruzzo c4a3c52e45 fixed Doiboost bug in the identifier 2020-10-01 15:46:44 +02:00
Miriam Baglioni 43cbd62c2b added classpath.first in the configuration 2020-10-01 15:46:34 +02:00
Miriam Baglioni cd69c6b023 added dependency for the topic file path 2020-10-01 15:45:59 +02:00
Miriam Baglioni 771cde3d05 moved the library version to global pom 2020-10-01 15:43:47 +02:00
Miriam Baglioni 632351c0da modified test resources to mirror the changes in the code 2020-10-01 15:43:02 +02:00
Miriam Baglioni ebc1c5513f modified test resources to mirror the changes in the code 2020-10-01 15:42:29 +02:00
Miriam Baglioni 3a374c34b6 fixed null pointer exception 2020-10-01 15:41:01 +02:00
Miriam Baglioni 83ea746163 added check to the test 2020-10-01 15:40:28 +02:00
Claudio Atzori 2e9e13444d author pids made unique by value 2020-10-01 12:50:40 +02:00
Miriam Baglioni 6e5db85b32 - 2020-10-01 11:51:11 +02:00
Miriam Baglioni a46179f61c refactoring 2020-10-01 11:22:01 +02:00
Miriam Baglioni b90bee124b removing rows that are empty from those imported 2020-10-01 11:16:49 +02:00
Miriam Baglioni c107f193c9 refactoring 2020-10-01 11:16:22 +02:00
Claudio Atzori e265c3e125 cleaning functions factored out in a dedicated class 2020-10-01 10:50:15 +02:00
Miriam Baglioni 706a80a29a added test to check that separator '-' (not hyphen) will be recognized 2020-10-01 10:38:31 +02:00
Miriam Baglioni 3dca586b3b refactoring 2020-10-01 10:34:48 +02:00
Miriam Baglioni 416bda6066 changed the programme.desxcription by using the same value used in the classification instead of the short title or the title 2020-10-01 10:31:33 +02:00
Miriam Baglioni f6587c91f3 added comparison to a char that looks like '-' but is not 2020-10-01 10:30:26 +02:00
Claudio Atzori 4287164aba include relevantdate field in the result view 2020-10-01 10:28:55 +02:00
Miriam Baglioni 7e73bb88b3 changed the logic to add the topic description to the project 2020-09-28 17:21:43 +02:00
Miriam Baglioni 0a035e3630 - 2020-09-28 17:20:49 +02:00
Miriam Baglioni 16bee2084d added the topic code to the project subset 2020-09-28 17:20:11 +02:00
Miriam Baglioni 0bf2d0db52 added to the workflow the download of the topic excel file and one property needed to get the input path of the topic file in the hdfs filesystem 2020-09-28 12:17:22 +02:00
Miriam Baglioni c2abde4d9f changed the implementation of Atomic Actions creation by exploiting the topic information obtained from the cordis excel file 2020-09-28 12:16:34 +02:00
Miriam Baglioni d930b8d3fc changed the query to get only the code of the project and not the optional1 (topic code) and optional2 (topic description) 2020-09-28 12:15:48 +02:00
Miriam Baglioni f8f5cfd5cc removed the part added to set the topic code and description in the step of project preparation 2020-09-28 12:13:33 +02:00
Miriam Baglioni 9e19c9a221 remove the topic description from the values in the CSVProject class 2020-09-28 12:11:03 +02:00
Miriam Baglioni 6d8b932e40 refactoring 2020-09-28 12:06:56 +02:00
Miriam Baglioni b77f166549 changed the package name from csvutils to utils 2020-09-28 12:05:47 +02:00
Miriam Baglioni e33e3277de added needed dependency to read the excel file 2020-09-28 12:03:14 +02:00
Miriam Baglioni f4739a371a code to get the information related to the topic association between code and description. 2020-09-28 12:02:48 +02:00
Miriam Baglioni 7b6a7333e6 merge branch with master 2020-09-25 16:42:07 +02:00
Miriam Baglioni 983a12ed15 temporary modification to allow the upload of files in the sandbox without the need to recreate the mapping from scratch 2020-09-25 16:41:51 +02:00
Miriam Baglioni 8b36d19182 added property depositionId and changed property newVersion, which became string from boolean to handle the three possible distinct values 2020-09-25 16:41:15 +02:00
Miriam Baglioni ed5239f9ec added new code to handle the new possibility to upload files to an already open deposition 2020-09-25 16:34:32 +02:00
Miriam Baglioni 3a8c524fce refactor 2020-09-25 16:34:02 +02:00
Miriam Baglioni 2ac2b537b6 merge branch with master 2020-09-25 14:40:47 +02:00
Miriam Baglioni 54800fb9b0 enabled only the step to upload in zenodo 2020-09-25 14:40:22 +02:00
Miriam Baglioni 12c2dfc268 modified the resource to consider the information added to the model 2020-09-25 14:17:23 +02:00
Miriam Baglioni 969fa8d96e fixed issue and changed the transformation of the programme file to consider the new model 2020-09-25 13:32:34 +02:00
Michele Artini c171fdebe1 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-09-25 09:03:09 +02:00
Michele Artini c96598aaa4 opendoar partition 2020-09-25 09:02:58 +02:00
Miriam Baglioni de6c4d46d8 fixed conflicts 2020-09-24 15:35:01 +02:00
Miriam Baglioni e917281822 - 2020-09-24 15:24:05 +02:00
Miriam Baglioni 9f54f69e6d added topic information 2020-09-24 15:23:35 +02:00
Miriam Baglioni d6206d6e63 add the topic description to the action set associated to the project 2020-09-24 15:22:40 +02:00
Miriam Baglioni 6b50226f3b added topic code and topic description 2020-09-24 15:21:49 +02:00
Miriam Baglioni 15af1f527e modified to consider the topic information 2020-09-24 15:20:56 +02:00
Miriam Baglioni 609ff17cfc now the commission gives us the framework programme (FP7 - H2020) so use this information to filter out programmes not associated to H2020 2020-09-24 15:19:31 +02:00
Miriam Baglioni b66f930466 Added optional1 and optional2 information to the files read from the db. Optional1 contains the topic code and optional2 contains the topic description 2020-09-24 15:16:56 +02:00
Miriam Baglioni 860e6d38a6 added topic description to the CSV project variables 2020-09-24 15:15:26 +02:00
Claudio Atzori 044d3a0214 fixed query used to load datasources in the Graph 2020-09-24 13:48:58 +02:00
Claudio Atzori 27df1cea6d code formatting 2020-09-24 12:16:00 +02:00
Claudio Atzori fb22f4d70b included values for projects fundedamount and totalcost fields in the mapping tests. Swapped expected and actual values in junit test assertions 2020-09-24 12:10:59 +02:00
Claudio Atzori 42f55395c8 fixed order of the ISSNs returned by the SQL query 2020-09-24 12:09:58 +02:00
Claudio Atzori fadf5c7c69 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-09-24 10:42:52 +02:00
Claudio Atzori 9a7e72d528 using concat_ws to join textual columns from PSQL. When using || to perform the concatenation, Null columns make the operation result Null 2020-09-24 10:42:47 +02:00
Claudio Atzori 9e3e93c6b6 setting the correct issn type in the datasource.journal element 2020-09-24 10:39:16 +02:00
Miriam Baglioni 0d83f47166 merge branch with master 2020-09-23 17:33:49 +02:00
Miriam Baglioni 39eb8ab25b changed the dump to move from h2020programme to h2020classification 2020-09-23 17:33:00 +02:00
Miriam Baglioni 1d84cf19a6 added new line to resource file 2020-09-23 17:32:22 +02:00
Miriam Baglioni f0c476b6c9 modification to the test classes to consider h2020classification 2020-09-23 17:31:49 +02:00
Miriam Baglioni 2cba3cb484 modification to the classes building the actionset to consider the h2020classification 2020-09-23 17:31:15 +02:00
Miriam Baglioni 1069cf243a modification to the schema to consider the H2020classification of the programme. The field Programme has been moved inside the H2020classification that is now associated to the Project. Programme is no longer associated directly to the Project but via H2020CLassification 2020-09-22 14:38:00 +02:00
Enrico Ottonello a97ad20c7b exception is now propagated (PR review) 2020-09-22 10:46:34 +02:00
Enrico Ottonello fefbcfb106 dependency version moved to main pom (PR review) 2020-09-22 10:20:25 +02:00
Michele Artini 9e681609fd stats to sql file 2020-09-17 15:51:22 +02:00
Michele Artini 51321c2701 partition of events by opedoarId 2020-09-17 11:38:07 +02:00
Claudio Atzori cf2ce1a09b code formatting 2020-09-15 15:58:03 +02:00
Enrico Ottonello 9e8e7fe6ef add comments 2020-09-15 11:32:49 +02:00
Miriam Baglioni c2b5c780ff - 2020-09-14 14:34:03 +02:00
Miriam Baglioni e2ceefe9be - 2020-09-14 14:33:28 +02:00
Miriam Baglioni 1f893e63dc - 2020-09-14 14:33:10 +02:00
Enrico Ottonello 538f299767 merged 2020-09-14 12:35:16 +02:00
Enrico Ottonello eb8c9b2348 Merge remote-tracking branch 'upstream/master' into orcid-no-doi 2020-09-14 12:00:56 +02:00
Michele Artini 9b0c12f5d3 send notifications 2020-09-11 12:06:16 +02:00
Michele Artini 028613b751 remove old notifications 2020-09-09 15:32:06 +02:00
Michele Artini 9cfc124ac5 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-09-08 16:39:54 +02:00
Michele Artini a597a218ab * forall topics 2020-09-08 16:39:40 +02:00
Claudio Atzori 8a523474b7 code formatting 2020-09-07 11:40:16 +02:00
Michele Artini bb459caf69 support for all topic subscriptions 2020-08-27 11:01:21 +02:00
Michele Artini 82ed8edafd notification indexing 2020-08-26 15:10:48 +02:00
Miriam Baglioni b72a7dad46 resource for pid graph dump 2020-08-24 17:09:01 +02:00
Miriam Baglioni 8694bb9b31 refactoring due to compilation 2020-08-24 17:07:34 +02:00
Miriam Baglioni 8a069a4fea - 2020-08-24 17:01:30 +02:00
Miriam Baglioni 34fa96f3b1 - 2020-08-24 17:00:20 +02:00
Miriam Baglioni 5fb2949cb8 added utils methods 2020-08-24 17:00:09 +02:00
Miriam Baglioni 2a540b6c01 added constants for the pid graph dump 2020-08-24 16:55:35 +02:00
Miriam Baglioni da103c399a resources for the pid graph dump test 2020-08-24 16:52:07 +02:00
Miriam Baglioni 630a6a1fe7 first tests for the pid graph dump 2020-08-24 16:51:26 +02:00
Miriam Baglioni 40c8d2de7b test resources for the dump of the pids graph 2020-08-24 16:50:39 +02:00
Miriam Baglioni bef79d3bdf first attempt to the dump of pids graph 2020-08-24 16:49:38 +02:00
Michele Artini da470422d3 deleting events 2020-08-21 14:52:48 +02:00
Michele Artini 6e60bf026a indexing only a subset of events 2020-08-19 12:39:22 +02:00
Miriam Baglioni 85203c16e3 merge branch with master 2020-08-19 11:49:03 +02:00
Miriam Baglioni 2c783793ba removed the affiliation from the author to mirror the changes in the model 2020-08-19 11:48:12 +02:00
Miriam Baglioni f6bf888016 removed affiliation from author to mirror the changes in the model 2020-08-19 11:41:41 +02:00
Miriam Baglioni 66d0e0d3f2 - 2020-08-19 11:31:50 +02:00
Miriam Baglioni 1c593a9cfe - 2020-08-19 11:29:51 +02:00
Miriam Baglioni e42b2f5ae2 - 2020-08-19 11:29:09 +02:00
Miriam Baglioni f81ee22418 changed to mirror the changes in the model (Instance, CommunityInstance, GraphResult) 2020-08-19 11:28:26 +02:00
Miriam Baglioni 387be43fd4 changed to discriminate whether to dump all the result types together or each one in its own archive 2020-08-19 11:25:27 +02:00
Miriam Baglioni c5858afb88 added parameter to guide the dump for the result (resultAggregation). true if all the result types should be dumped together, false otherwise. 2020-08-19 11:24:14 +02:00
Miriam Baglioni d407852ac2 changed to reflect the changes in the model 2020-08-19 11:15:05 +02:00
Miriam Baglioni 47c21a8961 refactoring due to compilation 2020-08-19 11:11:57 +02:00
Miriam Baglioni 5570678c65 changed parameter name from hfdsNameNode to nameNode 2020-08-19 10:59:26 +02:00