Commit Graph

328 Commits

Author SHA1 Message Date
Miriam Baglioni d36e925277 GetCSV refactoring - moved under model package 2021-08-12 18:00:21 +02:00
Miriam Baglioni 6e84b3951f GetCSV refactoring - moving classes to dhp-common that have dependency with GetCSV class (that was located in graph-mapper) 2021-08-12 17:57:41 +02:00
Miriam Baglioni 8da3a25cf6 merging with branch beta 2021-08-11 15:55:34 +02:00
Claudio Atzori 9f4db73f30 updated/fixed unit tests 2021-08-11 15:02:51 +02:00
Claudio Atzori 2ee21da43b suggestions from SonarLint 2021-08-11 12:13:22 +02:00
Miriam Baglioni 1d6ac3715b merge branch with beta 2021-07-30 11:58:29 +02:00
Sandro La Bruzzo b1b0cc3f15 fixed wrong package name 2021-07-29 13:55:08 +02:00
Sandro La Bruzzo 3721df7aa6 refactoring create actionset of scholexplorer, moved on package dhp-aggregation 2021-07-29 10:45:35 +02:00
Sandro La Bruzzo 3d8f0f629b implemented workflow of creation action set for scholexplorer 2021-07-28 16:15:34 +02:00
Miriam Baglioni cc0d3d8a7b mergin with branch beta 2021-07-28 11:24:46 +02:00
Miriam Baglioni 708d0ade34 Merge branch 'beta' into hostedbymap 2021-07-28 10:37:22 +02:00
Sandro La Bruzzo 16c91203bd implemented workflow of creation action set for scholexplorer 2021-07-28 10:30:49 +02:00
Sandro La Bruzzo 825d9f0289 fixed datacite workflow starting from Importing delta 2021-07-27 16:09:46 +02:00
Miriam Baglioni 74f801b689 mergin with branch beta 2021-07-27 13:18:31 +02:00
Claudio Atzori a0393607a7 mapping funding relations from Datacite should be done according to the actual result identifier 2021-07-23 18:15:08 +02:00
Miriam Baglioni 63553a76b3 added code to download gold issn list from unibi 2021-07-22 12:01:48 +02:00
Sandro La Bruzzo bbe8193930 merged stable ids 2021-07-12 17:00:43 +02:00
Sandro La Bruzzo cd17e19044 implemented branch workflow to import datacite and crossref in scholexplorer 2021-07-08 21:20:19 +02:00
Claudio Atzori 777536ce91 [aggregation] string values used as regular expressions in the OAI collection classes are defined in a single point as constants, to be reused across the code (PR#122) 2021-07-07 11:23:48 +02:00
Claudio Atzori bc014023c8 Merge pull request 'to solve the scala SI-3623' (#122) from andreas.czerniak/BrStableId_dnet-hadoop:stable_ids into stable_ids
Reviewed-on: D-Net/dnet-hadoop#122
2021-07-07 11:13:51 +02:00
Andreas Czerniak ebf3f47a02 from&until more OAI2.0 compl., adding tfs 2021-07-07 09:29:49 +02:00
Claudio Atzori 70ded407bb HttpClient used in metadata collection retries also on 404 2021-07-05 18:04:30 +02:00
Sandro La Bruzzo db933ebd21 Merge remote-tracking branch 'origin/stable_ids' into stable_id_scholexplorer 2021-06-29 14:16:12 +02:00
Sandro La Bruzzo 7e08655e5f added relation dates in all scholexplorer Datasources 2021-06-29 12:02:03 +02:00
Claudio Atzori af42377d0e HttpClient used in metadata collection retries on 502, 503, 504 2021-06-28 09:34:30 +02:00
Sandro La Bruzzo ad50415167 Merge remote-tracking branch 'origin/stable_ids' into stable_id_scholexplorer 2021-06-24 17:20:50 +02:00
Sandro La Bruzzo 80e15cc455 implemented mapping from uniprot, pdb and ebi links 2021-06-24 17:20:00 +02:00
Claudio Atzori 5edcc6832a applying sonarLint suggestions 2021-06-23 09:53:29 +02:00
Sandro La Bruzzo 1dc0c59e20 merged fix thai dates from stable_ids 2021-06-21 10:39:46 +02:00
Sandro La Bruzzo 3990165d05 changed typologies of unresolved relation 2021-06-18 11:43:59 +02:00
Sandro La Bruzzo aeb8132627 Merged branch stable_ids 2021-06-14 10:07:29 +02:00
Claudio Atzori e9e86a237d Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids 2021-06-11 17:00:02 +02:00
Claudio Atzori a900bfb874 delegating the date parsing to https://github.com/sisyphsu/dateparser 2021-06-11 16:53:01 +02:00
Sandro La Bruzzo dd997c49e0 fix wrong relation id
fix date thai ticket #6791
2021-06-10 14:47:18 +02:00
Sandro La Bruzzo 0cdb7ccdaa added inverse relations to datacite mapping 2021-06-04 15:10:20 +02:00
Sandro La Bruzzo 5b724d9972 added relations to datacite mapping 2021-06-04 10:14:22 +02:00
Sandro La Bruzzo 02ef46535f Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids 2021-05-31 09:50:15 +02:00
Sandro La Bruzzo aeadc5a366 updated wf Datacite Import to retrieve the block size as parameter 2021-05-31 09:49:53 +02:00
Claudio Atzori d512062b58 integrating pull #109, H2020Classification 2021-05-27 12:22:47 +02:00
Sandro La Bruzzo bced804151 updated wf Datacite Import to retrieve the block size as parameter 2021-05-26 17:06:50 +02:00
Miriam Baglioni c844877de2 changed workflow flow to possibly parallelize also the programme and project preparation steps 2021-05-21 14:41:57 +02:00
Miriam Baglioni 073d76864d refactoring 2021-05-21 14:41:03 +02:00
Miriam Baglioni 4c8b4a774c removed not needed code 2021-05-21 14:40:07 +02:00
Miriam Baglioni 1ee8f13580 refactoring and added "left" as join type to be 100% sure to get the whole set of projects 2021-05-21 11:49:05 +02:00
Miriam Baglioni e07c3ba089 due to change in the input file the filtering step is no more needed 2021-05-21 11:47:43 +02:00
Miriam Baglioni 54f6e2f693 changed to get the needed information to build the action set as parallel jobs 2021-05-21 11:47:00 +02:00
Miriam Baglioni 7180505519 removed non needed variable 2021-05-21 11:46:13 +02:00
Miriam Baglioni 2eb1a8b344 changed because the input file changed 2021-05-21 11:40:20 +02:00
Claudio Atzori 9d725efdc1 reverted implementation of the mdstore client 2021-05-20 18:26:09 +02:00
Miriam Baglioni 9610224671 added param to workflow property 2021-05-20 18:21:12 +02:00
Miriam Baglioni 052c837843 - 2021-05-20 15:54:44 +02:00
Claudio Atzori b695932ae4 integrated pull#108 2021-05-20 15:34:04 +02:00
Miriam Baglioni dc0ad8d2e0 fixed issue related to change in the file name downloaded. Added sheet name as parameter and also a check if the name should change 2021-05-20 14:53:53 +02:00
Claudio Atzori 239d0f0a9a ROR actionset import workflow backported from branch stable_ids 2021-05-18 16:12:11 +02:00
Michele Artini c1e20de7cf fixed the deserialization of a json property 2021-05-18 14:00:14 +02:00
Claudio Atzori 23b8883ab1 applied intellij code cleanup 2021-05-14 10:58:12 +02:00
Sandro La Bruzzo 6424cd9062 Added passing of the following parameters:
-varDataSourceId
-varOfficialName

in Each transformation Rule
2021-05-11 15:17:38 +02:00
Sandro La Bruzzo 073dcea2aa Added passing of the following parameters:
-varDataSourceId
-varOfficialName

in Each transformation Rule
2021-05-11 15:05:58 +02:00
Claudio Atzori 3797543600 MDStoreManager model classes moved in dhp-schemas 2021-05-10 14:32:05 +02:00
Michele Artini d82071ba6c originalId with prefix 2021-05-06 15:34:48 +02:00
Claudio Atzori 923d19ea8e mdstore read lock/unlock when bulk copying records from mongodb to hdfs 2021-05-04 18:06:21 +02:00
Claudio Atzori ba86835951 using common constants from ModelConstants 2021-05-04 11:51:52 +02:00
Michele Artini a278d67175 parse input file 2021-04-29 11:34:47 +02:00
Michele Artini f77ba34126 pid types 2021-04-29 09:50:05 +02:00
Michele Artini 7c5cd86927 annotations and tests 2021-04-29 09:29:19 +02:00
Michele Artini b5cf505cc6 partial implementation of the ROR->actionset workflow 2021-04-28 16:00:24 +02:00
Claudio Atzori 5afa7d3e0c core utilities in dhp-common moved in external module dhp-schemas 2021-04-27 15:44:01 +02:00
Sandro La Bruzzo 63c0303137 removed unused import, add log 2021-04-27 12:17:23 +02:00
Claudio Atzori fa42026590 fixed PersonCleaner extension functions 2021-04-27 10:10:06 +02:00
Sandro La Bruzzo fd29307b84 updated workflow name 2021-04-21 09:21:41 +02:00
Claudio Atzori d0d477cca3 code formatting 2021-04-20 12:50:34 +02:00
Sandro La Bruzzo e06c7f32f6 updated id figshare as described in #6377 2021-04-20 10:18:07 +02:00
Sandro La Bruzzo dbe0d0378e resolved ticket #6377 2021-04-20 09:44:44 +02:00
Sandro La Bruzzo 524e5f3092 Improved parallelization on transformation wf on hadoop 2021-04-19 15:17:25 +02:00
Sandro La Bruzzo cdfe01bbae improved parallelization on transformation job 2021-04-19 15:14:52 +02:00
Andreas Czerniak 3b694074ff add xslt, personname cleaner 2021-04-13 07:04:27 +02:00
Claudio Atzori 7941d7be29 WIP: using common definitions from ModelConstants 2021-03-31 18:33:57 +02:00
Claudio Atzori 879e8cc7ef WIP: using common definitions from ModelConstants 2021-03-31 17:12:01 +02:00
Claudio Atzori 72ce741ea6 WIP: using common definitions from ModelConstants 2021-03-31 17:07:13 +02:00
Sandro La Bruzzo 616d2ecce2 splitted workflow collecting datacite into two workflows.
Released on beta
2021-03-31 15:45:58 +02:00
Sandro La Bruzzo 1dfda3624e improved workflow importing datacite 2021-03-26 13:56:29 +01:00
Sandro La Bruzzo c73072079d fix conflicts 2021-03-22 16:36:31 +01:00
Claudio Atzori 61a2551e74 migrated last changes from svn (dnet45) 2021-03-15 17:17:55 +01:00
Claudio Atzori acbe3119a4 RestCollectorPlugin imported from dne45 2021-03-08 09:44:09 +01:00
Claudio Atzori 36f750cd1d removed unused classes 2021-03-03 10:22:29 +01:00
Claudio Atzori b73dce3e3a more logging on the MDStore mongodb client. Forcing UTF_8 encoding on the content 2021-03-03 10:17:16 +01:00
Claudio Atzori e76c4f62c1 MetadataRecord moved in dhp-schemas 2021-02-26 10:58:48 +01:00
Claudio Atzori 7df2461ccc indent XML records collected from oai-pmh endpoints 2021-02-25 16:19:12 +01:00
Claudio Atzori b830e33392 mdstore collector plugin 2021-02-25 12:30:30 +01:00
Claudio Atzori 271e88537b code formatting 2021-02-25 12:28:56 +01:00
Claudio Atzori 9c899f4433 cleanup on transformation functions and the relative tests 2021-02-24 15:07:59 +01:00
Claudio Atzori fc3fa5e343 implemented mdstore collector plugin 2021-02-24 15:07:24 +01:00
Claudio Atzori e7eba9f7e7 WIP: transformation workflow error reporting; cleanup 2021-02-17 16:54:08 +01:00
Claudio Atzori 58467aaf1e WIP: transformation workflow error reporting 2021-02-17 16:14:41 +01:00
Claudio Atzori cc88701f29 retry for any Socket exception 2021-02-17 16:13:54 +01:00
Claudio Atzori 545f8f3e48 using jackson objectmapper instead of GSon to serialise the aggregation report 2021-02-17 12:15:00 +01:00
Claudio Atzori b592d78bb4 WIP: collectorWorker error reporting, generalised reported implementation 2021-02-17 10:28:01 +01:00
Claudio Atzori cf27905a71 WIP: collectorWorker error reporting, added report messages 2021-02-16 16:53:14 +01:00
Claudio Atzori 1abe6d1ad7 WIP: collectorWorker error reporting, added report messages 2021-02-15 15:08:59 +01:00
Claudio Atzori 523a6bfa97 Merge pull request 'first commit to the correct branch' (#94) from andreas.czerniak/BrAggr_dnet-hadoop:hadoop_aggregator into hadoop_aggregator
Looks good to me, thanks Andreas!
2021-02-15 12:15:31 +01:00