Commit Graph

354 Commits

Author SHA1 Message Date
Sandro La Bruzzo 524e5f3092 Improved parallelization on transformation wf on hadoop 2021-04-19 15:17:25 +02:00
Sandro La Bruzzo cdfe01bbae improved parallelization on transformation job 2021-04-19 15:14:52 +02:00
Claudio Atzori 3125cef545 code formatting 2021-04-14 09:11:54 +02:00
Andreas Czerniak 3b694074ff add xslt, personname cleaner 2021-04-13 07:04:27 +02:00
Claudio Atzori 7941d7be29 WIP: using common definitions from ModelConstants 2021-03-31 18:33:57 +02:00
Claudio Atzori 879e8cc7ef WIP: using common definitions from ModelConstants 2021-03-31 17:12:01 +02:00
Claudio Atzori 72ce741ea6 WIP: using common definitions from ModelConstants 2021-03-31 17:07:13 +02:00
Sandro La Bruzzo 616d2ecce2 splitted workflow collecting datacite into two workflows.
Released on beta
2021-03-31 15:45:58 +02:00
Sandro La Bruzzo 1dfda3624e improved workflow importing datacite 2021-03-26 13:56:29 +01:00
Claudio Atzori 8db248aa13 avoiding error on jenkins compilations: java.net.BindException: Cannot assign requested address: Service 'sparkDriver' failed after 16 retries (on a random free port)! 2021-03-23 09:56:34 +01:00
Sandro La Bruzzo c73072079d fix conflicts 2021-03-22 16:36:31 +01:00
Claudio Atzori 61a2551e74 migrated last changes from svn (dnet45) 2021-03-15 17:17:55 +01:00
Claudio Atzori acbe3119a4 RestCollectorPlugin imported from dne45 2021-03-08 09:44:09 +01:00
Claudio Atzori fa7930d2e2 merging contributions from PR#97 2021-03-05 15:45:28 +01:00
Claudio Atzori 36f750cd1d removed unused classes 2021-03-03 10:22:29 +01:00
Claudio Atzori b73dce3e3a more logging on the MDStore mongodb client. Forcing UTF_8 encoding on the content 2021-03-03 10:17:16 +01:00
Claudio Atzori e76c4f62c1 MetadataRecord moved in dhp-schemas 2021-02-26 10:58:48 +01:00
Claudio Atzori 7df2461ccc indent XML records collected from oai-pmh endpoints 2021-02-25 16:19:12 +01:00
Claudio Atzori b830e33392 mdstore collector plugin 2021-02-25 12:30:30 +01:00
Claudio Atzori 271e88537b code formatting 2021-02-25 12:28:56 +01:00
Claudio Atzori 9c899f4433 cleanup on transformation functions and the relative tests 2021-02-24 15:07:59 +01:00
Claudio Atzori fc3fa5e343 implemented mdstore collector plugin 2021-02-24 15:07:24 +01:00
Claudio Atzori e7eba9f7e7 WIP: transformation workflow error reporting; cleanup 2021-02-17 16:54:08 +01:00
Claudio Atzori 58467aaf1e WIP: transformation workflow error reporting 2021-02-17 16:14:41 +01:00
Claudio Atzori cc88701f29 retry for any Socket exception 2021-02-17 16:13:54 +01:00
Claudio Atzori 545f8f3e48 using jackson objectmapper instead of GSon to serialise the aggregation report 2021-02-17 12:15:00 +01:00
Claudio Atzori b592d78bb4 WIP: collectorWorker error reporting, generalised reported implementation 2021-02-17 10:28:01 +01:00
Claudio Atzori cf27905a71 WIP: collectorWorker error reporting, added report messages 2021-02-16 16:53:14 +01:00
Claudio Atzori 1abe6d1ad7 WIP: collectorWorker error reporting, added report messages 2021-02-15 15:08:59 +01:00
Claudio Atzori 523a6bfa97 Merge pull request 'first commit to the correct branch' (#94) from andreas.czerniak/BrAggr_dnet-hadoop:hadoop_aggregator into hadoop_aggregator
Looks good to me, thanks Andreas!
2021-02-15 12:15:31 +01:00
Sandro La Bruzzo 7edcc87ed4 changed xslt behaviour on failure 2021-02-12 17:27:08 +01:00
Sandro La Bruzzo 6a37c7f175 merge fixed 2021-02-12 16:38:47 +01:00
Sandro La Bruzzo b3f5c2351d Merge branch 'hadoop_aggregator' of code-repo.d4science.org:D-Net/dnet-hadoop into hadoop_aggregator
 Conflicts:
	dhp-workflows/dhp-aggregation/src/test/java/eu/dnetlib/dhp/transformation/TransformationJobTest.java
2021-02-12 16:37:14 +01:00
Sandro La Bruzzo f216277219 Implemented cleaning date 2021-02-12 16:34:52 +01:00
Andreas Czerniak 5a9017cf18 clone, min. changes, test, run 2021-02-12 14:32:36 +01:00
Claudio Atzori aa55dedb8a Merge branch 'hadoop_aggregator' of https://code-repo.d4science.org/D-Net/dnet-hadoop into hadoop_aggregator 2021-02-12 12:31:05 +01:00
Claudio Atzori 29c6f7e255 classes related to the collection workflow moved into common package; implemented MongoDB collection plugins 2021-02-12 12:31:02 +01:00
Sandro La Bruzzo 17e6f1934e fixed NPE on cleaner 2021-02-12 11:48:11 +01:00
Sandro La Bruzzo ebcc3ec14f updated wrong datacite identifier in trasformation 2021-02-11 16:25:51 +01:00
Claudio Atzori bae029f828 collection_java_xmx allows to declare the heap size allocated for the java actions involved in the metadata collectionw workflow 2021-02-08 18:07:23 +01:00
Claudio Atzori bebc54d5bf seq file storing native records is now compressed 2021-02-08 18:06:25 +01:00
Claudio Atzori 50add4c61b added requestDelay to HttpConnector2 configuration; Aggregation workflow constants moved in dhp-common 2021-02-08 12:19:38 +01:00
Claudio Atzori 40df0f987d better logging, WIP: collectorWorker error reporting; common functions moved in DHPUtils 2021-02-06 20:12:00 +01:00
Claudio Atzori a8a758925e better logging, WIP: collectorWorker error reporting 2021-02-05 19:18:05 +01:00
Claudio Atzori 730973679a Merge branch 'hadoop_aggregator' of https://code-repo.d4science.org/D-Net/dnet-hadoop into hadoop_aggregator 2021-02-04 17:25:00 +01:00
Claudio Atzori deb85706db imported HttpConnector from https://svn.driver.research-infrastructures.eu/driver/dnet45/modules/dnet-modular-collector-service/trunk/src/main/java/eu/dnetlib/data/collector/plugins/HttpConnector.java as HttpConnector2 2021-02-04 17:24:52 +01:00
Sandro La Bruzzo 4dae5e605d implemented messaging btween collection worker and Dnet 2021-02-04 15:51:15 +01:00
Claudio Atzori 40764cf626 better logging, WIP: collectorWorker error reporting 2021-02-04 14:06:02 +01:00
Sandro La Bruzzo 69c253710b fixed test 2021-02-04 10:30:49 +01:00
Claudio Atzori e04045089f better logging, WIP: collectorWorker error reporting 2021-02-03 17:58:22 +01:00
Claudio Atzori 0e8a4f9f1a better logging, WIP: collectorWorker error reporting 2021-02-03 12:33:41 +01:00
Claudio Atzori 53884d12c2 code formatting 2021-02-02 14:38:03 +01:00
Claudio Atzori ac46c247d2 code formatting 2021-02-02 14:24:00 +01:00
Claudio Atzori bde14b149a fixed transformation target paths 2021-02-02 12:49:29 +01:00
Claudio Atzori ca4391aa1c minor changes 2021-02-02 12:44:04 +01:00
Claudio Atzori bb89b99b24 code formatting 2021-02-02 12:34:14 +01:00
Claudio Atzori 75807ea5ae factored out constants 2021-02-02 12:28:21 +01:00
Sandro La Bruzzo 0634674add implemented transformation test 2021-02-02 12:12:14 +01:00
Claudio Atzori 8eaa1fd4b4 WIP: metadata collection in INCREMENTAL mode and relative test 2021-02-01 19:29:10 +01:00
Sandro La Bruzzo bead34d11a code refactor 2021-02-01 14:58:06 +01:00
Sandro La Bruzzo 6ff234d81b Implemented a first prototype of incremental harvesting and trasformation using readlock 2021-02-01 13:56:05 +01:00
Sandro La Bruzzo b6b835ef49 update transformation Factory to get Transformation Rule by Id and not by Title 2021-02-01 08:49:42 +01:00
Sandro La Bruzzo e423634cb6 RollBack in case of error WORKS!!! 2021-01-29 17:21:42 +01:00
Sandro La Bruzzo 8ee82576c6 Collection on Refresh WORKS!!! 2021-01-29 17:02:46 +01:00
Sandro La Bruzzo 0276180039 WIP mdstore
transaction implemented on hadoop side
2021-01-29 16:42:41 +01:00
Sandro La Bruzzo 0f8e2ecce6 Merged Datacite transfrom into this branch 2021-01-29 10:45:07 +01:00
Sandro La Bruzzo 99cf3a8ea4 Merged Datacite transfrom into this branch 2021-01-28 16:34:46 +01:00
Sandro La Bruzzo 98b9498b57 Removed old messaging system not quite used from collection and Transformation workflow
code refactor
2021-01-28 09:51:17 +01:00
Sandro La Bruzzo 184e7b3856 Implemented new Transformation using spark 2021-01-27 15:43:08 +01:00
Sandro La Bruzzo ffb092b8d3 removed duplicate code HttpConnector.java 2021-01-25 15:05:37 +01:00
Claudio Atzori 41500669e2 [BIP! Scores integration] merged missing classes from bipFinder branch 2021-01-11 14:39:47 +01:00
Claudio Atzori 2a7a10809e [BIP! Scores integration] merged missing classes from bipFinder branch 2021-01-11 10:05:02 +01:00
Claudio Atzori d6686dd7cf merged from master 2021-01-08 18:16:12 +01:00
Claudio Atzori 34229970e6 [BIP! Scores integration] Create updates as Result rather than subclasses; Result considers also metrics in the mergeFrom operation 2021-01-08 16:29:17 +01:00
Claudio Atzori 1361c9eb0c [BIP! Scores integration] Create updates as Result rather than subclasses; Result considers also metrics in the mergeFrom operation 2021-01-07 10:07:30 +01:00
Claudio Atzori 2e503ee101 code formatting 2020-12-17 13:47:38 +01:00
Claudio Atzori 03319d3bd9 Revert "Merge pull request 'Creation of the action set to include the bipFinder! score' (#62) from miriam.baglioni/dnet-hadoop:bipFinder into master"
This reverts commit add7e1693b, reversing
changes made to f9a8fd8bbd.
2020-12-17 12:23:58 +01:00
Miriam Baglioni 888175baf7 added java doc 2020-12-01 18:36:29 +01:00
Miriam Baglioni 3d62d99d5d fixed issue in workflow variable 2020-12-01 15:02:49 +01:00
Miriam Baglioni 17680296b9 removed unnecessary variable and unused method 2020-12-01 15:02:31 +01:00
Miriam Baglioni 5b3ed70808 refactoring 2020-12-01 14:31:34 +01:00
Miriam Baglioni 62ff4999e3 added workflow and last step of collection and save 2020-12-01 14:30:56 +01:00
Miriam Baglioni 45d06c45c7 collecting all the atoic actions for result type and save them all in the AS path 2020-12-01 14:29:18 +01:00
Miriam Baglioni 0051ebede5 extending test 2020-12-01 12:43:03 +01:00
Miriam Baglioni 719da15f04 added test resources 2020-12-01 12:42:30 +01:00
Miriam Baglioni db36e11912 classes test classes and resources for production of the actionset to include bipFinder score in results 2020-11-30 20:14:23 +01:00
Sandro La Bruzzo 66efb39634 implemented merge scholix 2020-11-04 09:04:01 +01:00
Miriam Baglioni 4905739be6 changed resource file to mirror change in business logic 2020-10-30 17:02:57 +01:00
Miriam Baglioni b40360ebfb changed the code to mirror the changed decision in the classification level and prodramme description labels 2020-10-30 17:02:30 +01:00
Miriam Baglioni 696409fb9f disabled tests because needing remote resource 2020-10-30 17:01:48 +01:00
Miriam Baglioni a2ce527fae changed to match the requirements for short titles in level and long titles in classification 2020-10-20 17:03:25 +02:00
Claudio Atzori 5f7b75f5c5 code formatting 2020-10-07 13:22:54 +02:00
Miriam Baglioni 061527f06e adding short description 2020-10-05 13:54:39 +02:00
Miriam Baglioni 0c12d7bdd8 adding short description 2020-10-05 11:39:55 +02:00
Miriam Baglioni fc2f7636be removed not used code 2020-10-02 12:33:52 +02:00
Miriam Baglioni 4aec347351 refactoring 2020-10-01 16:23:52 +02:00
Miriam Baglioni 61946b4092 refactoring 2020-10-01 16:22:48 +02:00
Miriam Baglioni 7e6d35e56c added the link to the excel file related to topic 2020-10-01 15:53:31 +02:00
Miriam Baglioni 43cbd62c2b added classpath.first in the configuration 2020-10-01 15:46:34 +02:00
Miriam Baglioni cd69c6b023 added dependency for the topic file path 2020-10-01 15:45:59 +02:00
Miriam Baglioni 632351c0da modified test resources to mirror the changed in the code 2020-10-01 15:43:02 +02:00
Miriam Baglioni ebc1c5513f modified test resources to mirror the changed in the code 2020-10-01 15:42:29 +02:00
Miriam Baglioni 3a374c34b6 fixed null pointer exception 2020-10-01 15:41:01 +02:00
Miriam Baglioni 83ea746163 added check to the test 2020-10-01 15:40:28 +02:00
Miriam Baglioni 6e5db85b32 - 2020-10-01 11:51:11 +02:00
Miriam Baglioni a46179f61c refactoring 2020-10-01 11:22:01 +02:00
Miriam Baglioni b90bee124b removing raws that are empy from thos imported 2020-10-01 11:16:49 +02:00
Miriam Baglioni c107f193c9 refactoring 2020-10-01 11:16:22 +02:00
Miriam Baglioni 706a80a29a added test to check that separator '-' (not hyphen) will be recognized 2020-10-01 10:38:31 +02:00
Miriam Baglioni 3dca586b3b refactoring 2020-10-01 10:34:48 +02:00
Miriam Baglioni 416bda6066 changed the programme.desxcription by using the same value used in the classification instead of the short title or the title 2020-10-01 10:31:33 +02:00
Miriam Baglioni f6587c91f3 added comparison to a char that seems - but it is not 2020-10-01 10:30:26 +02:00
Miriam Baglioni 7e73bb88b3 changed the logic to add the topic description to the project 2020-09-28 17:21:43 +02:00
Miriam Baglioni 0a035e3630 - 2020-09-28 17:20:49 +02:00
Miriam Baglioni 16bee2084d added the topic code to the project subset 2020-09-28 17:20:11 +02:00
Miriam Baglioni 0bf2d0db52 added to the workflow the download of the topic excel file and one property needed to get the input path of the topic file in the hdfs filesystem 2020-09-28 12:17:22 +02:00
Miriam Baglioni c2abde4d9f changed the implementation of Atomic Actions creation by exploiting the topic information get from the cordis excel file 2020-09-28 12:16:34 +02:00
Miriam Baglioni d930b8d3fc changed the query to get only the code of the project and not the optional1 (topic code) and optional2 (topic description) 2020-09-28 12:15:48 +02:00
Miriam Baglioni f8f5cfd5cc removed the part added to set the topic code and description in the step of project preparation 2020-09-28 12:13:33 +02:00
Miriam Baglioni 9e19c9a221 remove the topic description from the values in the CSVProject class 2020-09-28 12:11:03 +02:00
Miriam Baglioni 6d8b932e40 refactoring 2020-09-28 12:06:56 +02:00
Miriam Baglioni b77f166549 changed the package name from csvutils to utils 2020-09-28 12:05:47 +02:00
Miriam Baglioni f4739a371a code to get the information related to the topic association between code and description. 2020-09-28 12:02:48 +02:00
Miriam Baglioni 12c2dfc268 modified the resource to consider the information added to the model 2020-09-25 14:17:23 +02:00
Miriam Baglioni 969fa8d96e fixed issue and changed the transformation of the programme file to consider the new model 2020-09-25 13:32:34 +02:00
Miriam Baglioni e917281822 - 2020-09-24 15:24:05 +02:00
Miriam Baglioni 9f54f69e6d added topic information 2020-09-24 15:23:35 +02:00
Miriam Baglioni d6206d6e63 add the topic description to the action set associated to the project 2020-09-24 15:22:40 +02:00
Miriam Baglioni 6b50226f3b added topic code and topic description 2020-09-24 15:21:49 +02:00
Miriam Baglioni 15af1f527e modified to consider the topic information 2020-09-24 15:20:56 +02:00
Miriam Baglioni 609ff17cfc now the commission give us the framework programme (FP7 - H2020) so use this information to filter out programmes not associated to H2020 2020-09-24 15:19:31 +02:00
Miriam Baglioni b66f930466 Added optionl1 and optional2 information to the files red from the db. Optional1 contains the topic code and optional2 contains the topic description 2020-09-24 15:16:56 +02:00
Miriam Baglioni 860e6d38a6 added topic description to the CSV project variables 2020-09-24 15:15:26 +02:00
Miriam Baglioni 1d84cf19a6 added new line to resource file 2020-09-23 17:32:22 +02:00
Miriam Baglioni f0c476b6c9 modification to the test classes to consider h2020classification 2020-09-23 17:31:49 +02:00
Miriam Baglioni 2cba3cb484 modification to the classes building the actionset to consider the h2020classification 2020-09-23 17:31:15 +02:00
Miriam Baglioni 1069cf243a modification to the schema to consider the H2020classification of the programme. The filed Programme has been moved inside the H2020classification that is now associated to the Project. Programme is no more associated directly to the Project but via H2020CLassification 2020-09-22 14:38:00 +02:00
Claudio Atzori 306669209f code formatting 2020-06-16 16:54:44 +02:00
Claudio Atzori 603b1bd0bb Merge branch 'master' into dhp_oaf_model 2020-06-16 15:43:59 +02:00
Claudio Atzori a2fdf85ba1 WIP: graph cleaner implementation 2020-06-09 19:52:53 +02:00
Miriam Baglioni dfa4997a4f removed commented code 2020-05-29 10:45:18 +02:00
Miriam Baglioni 6f1eea28b6 changed message in log 2020-05-29 10:41:39 +02:00
Miriam Baglioni 8b6e886fb6 added new resource for testing 2020-05-28 23:54:31 +02:00
Miriam Baglioni 6989fb9c8a changed the project test according to the newly introduced join with the db project codes 2020-05-28 23:53:24 +02:00
Miriam Baglioni 782984d8e5 added needed parameter 2020-05-28 23:52:41 +02:00
Miriam Baglioni 01f7876595 fix issue with flatMap - the return type must not be null 2020-05-28 23:50:32 +02:00
Miriam Baglioni 773735f870 added the path to the file containing the projects code from the db 2020-05-28 17:30:45 +02:00
Miriam Baglioni 6a15067a64 added one step in the workflow 2020-05-28 17:30:09 +02:00
Miriam Baglioni 5309a99a70 modified the PrepareProjects to consider those in the db 2020-05-28 17:29:53 +02:00
Miriam Baglioni b737ed8236 added part to read projects from the openaire db to filter out those in the csv file that are not in the db 2020-05-28 17:29:21 +02:00