Commit Graph

287 Commits

Author SHA1 Message Date
Sandro La Bruzzo 6424cd9062 Added passing of the following parameters:
-varDataSourceId
-varOfficialName

in Each transformation Rule
2021-05-11 15:17:38 +02:00
Sandro La Bruzzo 073dcea2aa Added passing of the following parameters:
-varDataSourceId
-varOfficialName

in Each transformation Rule
2021-05-11 15:05:58 +02:00
Claudio Atzori 3797543600 MDStoreManager model classes moved in dhp-schemas 2021-05-10 14:32:05 +02:00
Michele Artini d82071ba6c originalId with prefix 2021-05-06 15:34:48 +02:00
Claudio Atzori 923d19ea8e mdstore read lock/unlock when bulk copying records from mongodb to hdfs 2021-05-04 18:06:21 +02:00
Claudio Atzori ba86835951 using common constants from ModelConstants 2021-05-04 11:51:52 +02:00
Michele Artini a278d67175 parse input file 2021-04-29 11:34:47 +02:00
Michele Artini f77ba34126 pid types 2021-04-29 09:50:05 +02:00
Michele Artini 7c5cd86927 annotations and tests 2021-04-29 09:29:19 +02:00
Michele Artini b5cf505cc6 partial implementation of the ROR->actionset workflow 2021-04-28 16:00:24 +02:00
Claudio Atzori 5afa7d3e0c core utilities in dhp-common moved in external module dhp-schemas 2021-04-27 15:44:01 +02:00
Sandro La Bruzzo 63c0303137 removed unused import, add log 2021-04-27 12:17:23 +02:00
Claudio Atzori fa42026590 fixed PersonCleaner extension functions 2021-04-27 10:10:06 +02:00
Claudio Atzori d0d477cca3 code formatting 2021-04-20 12:50:34 +02:00
Sandro La Bruzzo dbe0d0378e resolved ticket #6377 2021-04-20 09:44:44 +02:00
Sandro La Bruzzo 524e5f3092 Improved parallelization on transformation wf on hadoop 2021-04-19 15:17:25 +02:00
Sandro La Bruzzo cdfe01bbae improved parallelization on transformation job 2021-04-19 15:14:52 +02:00
Andreas Czerniak 3b694074ff add xslt, personname cleaner 2021-04-13 07:04:27 +02:00
Claudio Atzori 7941d7be29 WIP: using common definitions from ModelConstants 2021-03-31 18:33:57 +02:00
Claudio Atzori 879e8cc7ef WIP: using common definitions from ModelConstants 2021-03-31 17:12:01 +02:00
Claudio Atzori 72ce741ea6 WIP: using common definitions from ModelConstants 2021-03-31 17:07:13 +02:00
Sandro La Bruzzo 616d2ecce2 split the workflow collecting datacite into two workflows.
Released on beta
2021-03-31 15:45:58 +02:00
Sandro La Bruzzo 1dfda3624e improved workflow importing datacite 2021-03-26 13:56:29 +01:00
Sandro La Bruzzo c73072079d fix conflicts 2021-03-22 16:36:31 +01:00
Claudio Atzori 61a2551e74 migrated last changes from svn (dnet45) 2021-03-15 17:17:55 +01:00
Claudio Atzori acbe3119a4 RestCollectorPlugin imported from dnet45 2021-03-08 09:44:09 +01:00
Claudio Atzori 36f750cd1d removed unused classes 2021-03-03 10:22:29 +01:00
Claudio Atzori b73dce3e3a more logging on the MDStore mongodb client. Forcing UTF_8 encoding on the content 2021-03-03 10:17:16 +01:00
Claudio Atzori e76c4f62c1 MetadataRecord moved in dhp-schemas 2021-02-26 10:58:48 +01:00
Claudio Atzori 7df2461ccc indent XML records collected from oai-pmh endpoints 2021-02-25 16:19:12 +01:00
Claudio Atzori b830e33392 mdstore collector plugin 2021-02-25 12:30:30 +01:00
Claudio Atzori 271e88537b code formatting 2021-02-25 12:28:56 +01:00
Claudio Atzori 9c899f4433 cleanup on transformation functions and the relative tests 2021-02-24 15:07:59 +01:00
Claudio Atzori fc3fa5e343 implemented mdstore collector plugin 2021-02-24 15:07:24 +01:00
Claudio Atzori e7eba9f7e7 WIP: transformation workflow error reporting; cleanup 2021-02-17 16:54:08 +01:00
Claudio Atzori 58467aaf1e WIP: transformation workflow error reporting 2021-02-17 16:14:41 +01:00
Claudio Atzori cc88701f29 retry for any Socket exception 2021-02-17 16:13:54 +01:00
Claudio Atzori 545f8f3e48 using jackson objectmapper instead of GSon to serialise the aggregation report 2021-02-17 12:15:00 +01:00
Claudio Atzori b592d78bb4 WIP: collectorWorker error reporting, generalised reported implementation 2021-02-17 10:28:01 +01:00
Claudio Atzori cf27905a71 WIP: collectorWorker error reporting, added report messages 2021-02-16 16:53:14 +01:00
Claudio Atzori 1abe6d1ad7 WIP: collectorWorker error reporting, added report messages 2021-02-15 15:08:59 +01:00
Claudio Atzori 523a6bfa97 Merge pull request 'first commit to the correct branch' (#94) from andreas.czerniak/BrAggr_dnet-hadoop:hadoop_aggregator into hadoop_aggregator
Looks good to me, thanks Andreas!
2021-02-15 12:15:31 +01:00
Sandro La Bruzzo 7edcc87ed4 changed xslt behaviour on failure 2021-02-12 17:27:08 +01:00
Sandro La Bruzzo b3f5c2351d Merge branch 'hadoop_aggregator' of code-repo.d4science.org:D-Net/dnet-hadoop into hadoop_aggregator
 Conflicts:
	dhp-workflows/dhp-aggregation/src/test/java/eu/dnetlib/dhp/transformation/TransformationJobTest.java
2021-02-12 16:37:14 +01:00
Sandro La Bruzzo f216277219 Implemented cleaning date 2021-02-12 16:34:52 +01:00
Andreas Czerniak 5a9017cf18 clone, min. changes, test, run 2021-02-12 14:32:36 +01:00
Claudio Atzori aa55dedb8a Merge branch 'hadoop_aggregator' of https://code-repo.d4science.org/D-Net/dnet-hadoop into hadoop_aggregator 2021-02-12 12:31:05 +01:00
Claudio Atzori 29c6f7e255 classes related to the collection workflow moved into common package; implemented MongoDB collection plugins 2021-02-12 12:31:02 +01:00
Sandro La Bruzzo 17e6f1934e fixed NPE on cleaner 2021-02-12 11:48:11 +01:00
Sandro La Bruzzo ebcc3ec14f updated wrong datacite identifier in transformation 2021-02-11 16:25:51 +01:00
Claudio Atzori bae029f828 collection_java_xmx allows declaring the heap size allocated for the java actions involved in the metadata collection workflow 2021-02-08 18:07:23 +01:00
Claudio Atzori bebc54d5bf seq file storing native records is now compressed 2021-02-08 18:06:25 +01:00
Claudio Atzori 50add4c61b added requestDelay to HttpConnector2 configuration; Aggregation workflow constants moved in dhp-common 2021-02-08 12:19:38 +01:00
Claudio Atzori 40df0f987d better logging, WIP: collectorWorker error reporting; common functions moved in DHPUtils 2021-02-06 20:12:00 +01:00
Claudio Atzori a8a758925e better logging, WIP: collectorWorker error reporting 2021-02-05 19:18:05 +01:00
Claudio Atzori 730973679a Merge branch 'hadoop_aggregator' of https://code-repo.d4science.org/D-Net/dnet-hadoop into hadoop_aggregator 2021-02-04 17:25:00 +01:00
Claudio Atzori deb85706db imported HttpConnector from https://svn.driver.research-infrastructures.eu/driver/dnet45/modules/dnet-modular-collector-service/trunk/src/main/java/eu/dnetlib/data/collector/plugins/HttpConnector.java as HttpConnector2 2021-02-04 17:24:52 +01:00
Sandro La Bruzzo 4dae5e605d implemented messaging between collection worker and Dnet 2021-02-04 15:51:15 +01:00
Claudio Atzori 40764cf626 better logging, WIP: collectorWorker error reporting 2021-02-04 14:06:02 +01:00
Claudio Atzori e04045089f better logging, WIP: collectorWorker error reporting 2021-02-03 17:58:22 +01:00
Claudio Atzori 0e8a4f9f1a better logging, WIP: collectorWorker error reporting 2021-02-03 12:33:41 +01:00
Claudio Atzori ac46c247d2 code formatting 2021-02-02 14:24:00 +01:00
Claudio Atzori bde14b149a fixed transformation target paths 2021-02-02 12:49:29 +01:00
Claudio Atzori ca4391aa1c minor changes 2021-02-02 12:44:04 +01:00
Claudio Atzori bb89b99b24 code formatting 2021-02-02 12:34:14 +01:00
Claudio Atzori 75807ea5ae factored out constants 2021-02-02 12:28:21 +01:00
Sandro La Bruzzo 0634674add implemented transformation test 2021-02-02 12:12:14 +01:00
Claudio Atzori 8eaa1fd4b4 WIP: metadata collection in INCREMENTAL mode and relative test 2021-02-01 19:29:10 +01:00
Sandro La Bruzzo bead34d11a code refactor 2021-02-01 14:58:06 +01:00
Sandro La Bruzzo 6ff234d81b Implemented a first prototype of incremental harvesting and transformation using readlock 2021-02-01 13:56:05 +01:00
Sandro La Bruzzo b6b835ef49 update transformation Factory to get Transformation Rule by Id and not by Title 2021-02-01 08:49:42 +01:00
Sandro La Bruzzo 8ee82576c6 Collection on Refresh WORKS!!! 2021-01-29 17:02:46 +01:00
Sandro La Bruzzo 0276180039 WIP mdstore
transaction implemented on hadoop side
2021-01-29 16:42:41 +01:00
Sandro La Bruzzo 0f8e2ecce6 Merged Datacite transform into this branch 2021-01-29 10:45:07 +01:00
Sandro La Bruzzo 99cf3a8ea4 Merged Datacite transform into this branch 2021-01-28 16:34:46 +01:00
Sandro La Bruzzo 98b9498b57 Removed old messaging system not quite used from collection and Transformation workflow
code refactor
2021-01-28 09:51:17 +01:00
Sandro La Bruzzo 184e7b3856 Implemented new Transformation using spark 2021-01-27 15:43:08 +01:00
Sandro La Bruzzo ffb092b8d3 removed duplicate code HttpConnector.java 2021-01-25 15:05:37 +01:00
Claudio Atzori 2a7a10809e [BIP! Scores integration] merged missing classes from bipFinder branch 2021-01-11 10:05:02 +01:00
Claudio Atzori d6686dd7cf merged from master 2021-01-08 18:16:12 +01:00
Claudio Atzori 34229970e6 [BIP! Scores integration] Create updates as Result rather than subclasses; Result considers also metrics in the mergeFrom operation 2021-01-08 16:29:17 +01:00
Claudio Atzori 1361c9eb0c [BIP! Scores integration] Create updates as Result rather than subclasses; Result considers also metrics in the mergeFrom operation 2021-01-07 10:07:30 +01:00
Claudio Atzori 2e503ee101 code formatting 2020-12-17 13:47:38 +01:00
Claudio Atzori 03319d3bd9 Revert "Merge pull request 'Creation of the action set to include the bipFinder! score' (#62) from miriam.baglioni/dnet-hadoop:bipFinder into master"
This reverts commit add7e1693b, reversing
changes made to f9a8fd8bbd.
2020-12-17 12:23:58 +01:00
Miriam Baglioni 888175baf7 added java doc 2020-12-01 18:36:29 +01:00
Miriam Baglioni 17680296b9 removed unnecessary variable and unused method 2020-12-01 15:02:31 +01:00
Miriam Baglioni 5b3ed70808 refactoring 2020-12-01 14:31:34 +01:00
Miriam Baglioni 45d06c45c7 collecting all the atomic actions for result type and saving them all in the AS path 2020-12-01 14:29:18 +01:00
Miriam Baglioni db36e11912 classes, test classes and resources for production of the actionset to include bipFinder score in results 2020-11-30 20:14:23 +01:00
Miriam Baglioni a2ce527fae changed to match the requirements for short titles in level and long titles in classification 2020-10-20 17:03:25 +02:00
Claudio Atzori 5f7b75f5c5 code formatting 2020-10-07 13:22:54 +02:00
Miriam Baglioni 061527f06e adding short description 2020-10-05 13:54:39 +02:00
Miriam Baglioni 0c12d7bdd8 adding short description 2020-10-05 11:39:55 +02:00
Miriam Baglioni fc2f7636be removed not used code 2020-10-02 12:33:52 +02:00
Miriam Baglioni 4aec347351 refactoring 2020-10-01 16:23:52 +02:00
Miriam Baglioni 61946b4092 refactoring 2020-10-01 16:22:48 +02:00
Miriam Baglioni 3a374c34b6 fixed null pointer exception 2020-10-01 15:41:01 +02:00
Miriam Baglioni 6e5db85b32 - 2020-10-01 11:51:11 +02:00
Miriam Baglioni b90bee124b removing rows that are empty from those imported 2020-10-01 11:16:49 +02:00
Miriam Baglioni 416bda6066 changed the programme.description by using the same value used in the classification instead of the short title or the title 2020-10-01 10:31:33 +02:00
Miriam Baglioni f6587c91f3 added comparison to a char that looks like - but is not 2020-10-01 10:30:26 +02:00
Miriam Baglioni 7e73bb88b3 changed the logic to add the topic description to the project 2020-09-28 17:21:43 +02:00
Miriam Baglioni 0a035e3630 - 2020-09-28 17:20:49 +02:00
Miriam Baglioni 16bee2084d added the topic code to the project subset 2020-09-28 17:20:11 +02:00
Miriam Baglioni c2abde4d9f changed the implementation of Atomic Actions creation by exploiting the topic information obtained from the cordis excel file 2020-09-28 12:16:34 +02:00
Miriam Baglioni d930b8d3fc changed the query to get only the code of the project and not the optional1 (topic code) and optional2 (topic description) 2020-09-28 12:15:48 +02:00
Miriam Baglioni f8f5cfd5cc removed the part added to set the topic code and description in the step of project preparation 2020-09-28 12:13:33 +02:00
Miriam Baglioni 9e19c9a221 remove the topic description from the values in the CSVProject class 2020-09-28 12:11:03 +02:00
Miriam Baglioni 6d8b932e40 refactoring 2020-09-28 12:06:56 +02:00
Miriam Baglioni b77f166549 changed the package name from csvutils to utils 2020-09-28 12:05:47 +02:00
Miriam Baglioni f4739a371a code to get the information related to the topic association between code and description. 2020-09-28 12:02:48 +02:00
Miriam Baglioni 969fa8d96e fixed issue and changed the transformation of the programme file to consider the new model 2020-09-25 13:32:34 +02:00
Miriam Baglioni 9f54f69e6d added topic information 2020-09-24 15:23:35 +02:00
Miriam Baglioni d6206d6e63 add the topic description to the action set associated to the project 2020-09-24 15:22:40 +02:00
Miriam Baglioni 6b50226f3b added topic code and topic description 2020-09-24 15:21:49 +02:00
Miriam Baglioni 15af1f527e modified to consider the topic information 2020-09-24 15:20:56 +02:00
Miriam Baglioni 609ff17cfc now the commission gives us the framework programme (FP7 - H2020) so use this information to filter out programmes not associated to H2020 2020-09-24 15:19:31 +02:00
Miriam Baglioni b66f930466 Added optional1 and optional2 information to the files read from the db. Optional1 contains the topic code and optional2 contains the topic description 2020-09-24 15:16:56 +02:00
Miriam Baglioni 860e6d38a6 added topic description to the CSV project variables 2020-09-24 15:15:26 +02:00
Miriam Baglioni 2cba3cb484 modification to the classes building the actionset to consider the h2020classification 2020-09-23 17:31:15 +02:00
Claudio Atzori 306669209f code formatting 2020-06-16 16:54:44 +02:00
Miriam Baglioni 6f1eea28b6 changed message in log 2020-05-29 10:41:39 +02:00
Miriam Baglioni 01f7876595 fix issue with flatMap - the return type must not be null 2020-05-28 23:50:32 +02:00
Miriam Baglioni 5309a99a70 modified the PrepareProjects to consider those in the db 2020-05-28 17:29:53 +02:00
Miriam Baglioni b737ed8236 added part to read projects from the openaire db to filter out those in the csv file that are not in the db 2020-05-28 17:29:21 +02:00
Miriam Baglioni df44db686a refactoring 2020-05-28 10:07:00 +02:00
Miriam Baglioni 87b07f4af8 removed unused variables 2020-05-28 10:05:43 +02:00
Miriam Baglioni 96d1a3c431 deleted the file where the csv files were stored 2020-05-28 10:04:10 +02:00
Miriam Baglioni 669c05c771 added groupBy before creating Actions 2020-05-28 10:00:45 +02:00
Miriam Baglioni 473c6d3a23 produces AtomicActions instead of Projects 2020-05-22 15:26:57 +02:00
Miriam Baglioni 4589c428b1 generate action sets and saves them in the hdfs path for the actions sets 2020-05-21 16:30:39 +02:00
Miriam Baglioni 75491482de added a new preparation step to replicate each project for the programme it is associated to 2020-05-20 10:28:56 +02:00
Miriam Baglioni 9447d78ef3 added preparation classes 2020-05-19 18:42:50 +02:00
Miriam Baglioni abc45f2708 added dnet-45 HttpConnector and related Classes, produced the POJO for projects and programme 2020-05-18 13:04:06 +02:00
Miriam Baglioni 22cb9e0da7 simple code to get file from URL 2020-05-15 18:18:01 +02:00
Claudio Atzori 0825321d0b improved unit tests in dhp-aggregation 2020-05-05 12:39:04 +02:00
Claudio Atzori 439c6255a2 cleanup 2020-04-29 19:09:07 +02:00
Claudio Atzori 6f5b899038 reformatted code according to the updated style descriptor 2020-04-28 11:23:29 +02:00
Claudio Atzori a0bdbacdae switched automatic code formatting plugin to net.revelc.code.formatter:formatter-maven-plugin 2020-04-27 14:52:31 +02:00
Claudio Atzori 7a3f8085f7 switched automatic code formatting plugin to net.revelc.code.formatter:formatter-maven-plugin 2020-04-27 14:45:40 +02:00
Claudio Atzori 9147af7fed actionsets migration workflow moved in dhp-workflows/dhp-actionmanager 2020-04-20 15:24:33 +02:00
Claudio Atzori d714bfb4d4 collectedfrom field moved in common parent class Oaf.java 2020-04-20 12:25:19 +02:00
Claudio Atzori ad7a131b18 introduced common project code formatting plugin, works on the commit hook, based on https://github.com/Cosium/git-code-format-maven-plugin, applied to each java class in the project 2020-04-18 12:42:58 +02:00
Claudio Atzori 6b5f9ca9cb raw graph creation workflow moved under dhp-graph-mapper, claims integration is included 2020-04-10 17:53:07 +02:00
Claudio Atzori 7061d07727 ActionSets migration serialize the output as plain text files instead of SequenceFiles 2020-04-01 14:58:22 +02:00
Michele Artini 408be3c632 test and fixed a problem with datacite namespaces 2020-03-27 11:44:50 +01:00
Michele Artini fd57722c69 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-03-25 15:56:49 +01:00
Michele Artini 0fda2c3a30 some tests on db records 2020-03-25 09:43:58 +01:00
Claudio Atzori ecb64e4998 Merge branch 'migration_wfs_regular_all_steps' 2020-03-23 08:57:01 +01:00
Michele Artini 15160032bd fixed a bug setting some organization fields 2020-03-23 08:39:14 +01:00