Commit Graph

374 Commits

Author SHA1 Message Date
Sandro La Bruzzo 48923e46a1 added documentation to Pubmed Class and also added mvn site for dhp-aggregations 2021-11-15 14:32:01 +01:00
Miriam Baglioni 4ec88c718c merge with beta - resolved conflict in pom 2021-11-15 10:52:16 +01:00
Miriam Baglioni 157d33ebf9 [Bypass Action Set] Refactoring 2021-11-15 09:58:48 +01:00
Miriam Baglioni 92d0e18b55 [Bypass Action Set] used constant DOI instead of "doi" 2021-11-12 10:56:58 +01:00
Miriam Baglioni 881113743f [Bypass Action Set] refactoring 2021-11-12 10:55:50 +01:00
Miriam Baglioni 47ccb53c4f [Bypass Action Set] modification for comment #157 (comment) 2021-11-12 10:54:09 +01:00
Miriam Baglioni 716021546e [Bypass Action Set] minor fix 2021-11-12 10:18:01 +01:00
Miriam Baglioni 935062edec [Bypass Action Set] creation of unresolved entities 2021-11-11 16:11:25 +01:00
Claudio Atzori d02caef185 Merge branch 'beta' into hierarchical_orgs_relations 2021-10-27 15:36:29 +02:00
Sandro La Bruzzo 4acfa8fa2e Scholexplorer Datasource Aggregation:
- Added collectedfrom in the inverse relation generated
Relation resolution:
- increased number of partitions in workflow.xml
- using classid instead of classname to build the pid-dnetId mapping
2021-10-26 17:51:20 +02:00
Michele Artini d66e20e7ac added hierarchy rel in ROR actionset 2021-10-21 15:51:48 +02:00
Sandro La Bruzzo aeeebd573b code refactor renamed datacite package 2021-10-20 17:37:42 +02:00
Sandro La Bruzzo ae4e99a471 Adapted workflow of resolution of PID to work into OpenAIRE data workflow
- Added relations in both verse on all Scholexplorer datasources
2021-10-20 17:12:16 +02:00
Miriam Baglioni 1cc09adfaa Opencitations: chenaged the test class to mirror the creation or not of duplicate dois for .refs oc original plus added optional parameter to duplicate the relation 2021-10-18 14:11:27 +02:00
Sandro La Bruzzo 7b15b88d4c renamed wrong package, implemented last aggregation workflow for scholexplorer 2021-10-15 15:00:15 +02:00
Sandro La Bruzzo 51a03c0a50 refactor code for EBI from dhp-graph-mapper into dhp-aggregation 2021-10-14 14:23:13 +02:00
Sandro La Bruzzo 7387416e90 added params skip update to direct transform in OAF, this should be set to true in production 2021-10-12 12:36:30 +02:00
Sandro La Bruzzo 511da98d0c - fixed bug on download pmc Article
- removed unused line of code in SparkCreateActionset
2021-10-12 11:47:49 +02:00
Sandro La Bruzzo 5606014b17 code refactor see ticket #7065 2021-10-12 08:11:53 +02:00
Sandro La Bruzzo 66702b1973 Added node to update datacite 2021-09-28 08:59:06 +02:00
Miriam Baglioni 5ec69889db OpenCitations: creation of AS from OC 2021-09-27 16:02:06 +02:00
Miriam Baglioni f2118d771a first steps in the implementation of the integration of opencitations 2021-09-22 15:18:05 +02:00
Claudio Atzori 663b1556d7 manually integrating PR#140 #140 2021-09-15 16:40:25 +02:00
Sandro La Bruzzo aed29156c7 changed behavior in transformation job, that doesn't fail at first error 2021-09-07 19:05:46 +02:00
Sandro La Bruzzo 3c6fc2096c fix bug on oai iterator that skip record cleaned 2021-09-07 10:46:26 +02:00
Sandro La Bruzzo e8b3cb9147 Implemented method to download delta updates in EBI Links 2021-08-30 09:32:45 +02:00
Claudio Atzori 3359f73fcf cleanup & best practices 2021-08-13 12:00:42 +02:00
Miriam Baglioni 32fd75691f refactoring 2021-08-13 10:15:42 +02:00
Miriam Baglioni 5cd5714530 GetCSV refactoring - added ignore annotation for fields not in input csv 2021-08-13 10:06:49 +02:00
Miriam Baglioni 8769dd8eef GetCSV refactoring - refactoring due to movement of classes 2021-08-12 18:20:56 +02:00
Miriam Baglioni 6b9e1bf2e3 GetCSV refactoring - removing not needed dependency 2021-08-12 18:17:50 +02:00
Miriam Baglioni 335a824e34 GetCSV refactoring - fixed issue 2021-08-12 18:10:10 +02:00
Miriam Baglioni f0845e9865 GetCSV refactoring - refactoring due to movement of classes 2021-08-12 18:04:58 +02:00
Miriam Baglioni 7a789423aa GetCSV refactoring - refactoring due to movement of classes 2021-08-12 18:04:27 +02:00
Miriam Baglioni e9fc3ef3bc GetCSV refactoring - changed to use the new class to get and write the csv file 2021-08-12 18:03:41 +02:00
Miriam Baglioni 4317211a2b GetCSV refactoring - refactoring due to movement 2021-08-12 18:03:14 +02:00
Miriam Baglioni b62cd656a7 GetCSV refactoring - changed the model to store only the information needed 2021-08-12 18:01:10 +02:00
Miriam Baglioni d36e925277 GetCSV refactoring - moved under model package 2021-08-12 18:00:21 +02:00
Miriam Baglioni 6e84b3951f GetCSV refactoring - moving classes to dhp-common that have dependency with GetCSV class (that was located in graph-mapper) 2021-08-12 17:57:41 +02:00
Miriam Baglioni 8da3a25cf6 merging with branch beta 2021-08-11 15:55:34 +02:00
Claudio Atzori 9f4db73f30 updated/fixed unit tests 2021-08-11 15:02:51 +02:00
Claudio Atzori 2ee21da43b suggestions from SonarLint 2021-08-11 12:13:22 +02:00
Miriam Baglioni 1d6ac3715b merge branch with beta 2021-07-30 11:58:29 +02:00
Sandro La Bruzzo 3721df7aa6 refactoring create actionset of scholexplorer, moved on package dhp-aggregation 2021-07-29 10:45:35 +02:00
Sandro La Bruzzo 3d8f0f629b implemented workflow of creation action set for scholexplorer 2021-07-28 16:15:34 +02:00
Miriam Baglioni cc0d3d8a7b mergin with branch beta 2021-07-28 11:24:46 +02:00
Miriam Baglioni 708d0ade34 Merge branch 'beta' into hostedbymap 2021-07-28 10:37:22 +02:00
Sandro La Bruzzo 16c91203bd implemented workflow of creation action set for scholexplorer 2021-07-28 10:30:49 +02:00
Sandro La Bruzzo 825d9f0289 fixed datacite workflow starting from Importing delta 2021-07-27 16:09:46 +02:00
Miriam Baglioni 74f801b689 mergin with branch beta 2021-07-27 13:18:31 +02:00
Claudio Atzori a0393607a7 mapping funding relations from Datacite should be done according to the actual result identifier 2021-07-23 18:15:08 +02:00
Miriam Baglioni 63553a76b3 added code to download gold issn list from unibi 2021-07-22 12:01:48 +02:00
Sandro La Bruzzo bbe8193930 merged stable ids 2021-07-12 17:00:43 +02:00
Sandro La Bruzzo cd17e19044 implemented branch workflow to import datacite and crossref in scholexplorer 2021-07-08 21:20:19 +02:00
Claudio Atzori 777536ce91 [aggregation] string values used as regular expressions in the OAI collection classes are defined in a single point as constants, to be reused across the code (PR#122) 2021-07-07 11:23:48 +02:00
Claudio Atzori bc014023c8 Merge pull request 'to solve the scala SI-3623' (#122) from andreas.czerniak/BrStableId_dnet-hadoop:stable_ids into stable_ids
Reviewed-on: #122
2021-07-07 11:13:51 +02:00
Andreas Czerniak ebf3f47a02 from&until more OAI2.0 compl., adding tfs 2021-07-07 09:29:49 +02:00
Claudio Atzori 70ded407bb HttpClient used in metadata collection retries also on 404 2021-07-05 18:04:30 +02:00
Sandro La Bruzzo db933ebd21 Merge remote-tracking branch 'origin/stable_ids' into stable_id_scholexplorer 2021-06-29 14:16:12 +02:00
Sandro La Bruzzo 7e08655e5f added relation dates in all scholexplorer Datasources 2021-06-29 12:02:03 +02:00
Claudio Atzori af42377d0e HttpClient used in metadata collection retries on 502, 503, 504 2021-06-28 09:34:30 +02:00
Sandro La Bruzzo ad50415167 Merge remote-tracking branch 'origin/stable_ids' into stable_id_scholexplorer 2021-06-24 17:20:50 +02:00
Sandro La Bruzzo 80e15cc455 implemented mapping from uniprot, pdb and ebi links 2021-06-24 17:20:00 +02:00
Claudio Atzori 5edcc6832a applying sonarLint suggestions 2021-06-23 09:53:29 +02:00
Sandro La Bruzzo 1dc0c59e20 merged fix thai dates from stable_ids 2021-06-21 10:39:46 +02:00
Sandro La Bruzzo 3990165d05 changed typologies of unresolved relation 2021-06-18 11:43:59 +02:00
Sandro La Bruzzo aeb8132627 Merged branch stable_ids 2021-06-14 10:07:29 +02:00
Claudio Atzori e9e86a237d Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids 2021-06-11 17:00:02 +02:00
Claudio Atzori a900bfb874 delegating the date parsing to https://github.com/sisyphsu/dateparser 2021-06-11 16:53:01 +02:00
Sandro La Bruzzo dd997c49e0 fix wrong relation id
fix date thai ticket #6791
2021-06-10 14:47:18 +02:00
Sandro La Bruzzo 0cdb7ccdaa added inverse relations to datacite mapping 2021-06-04 15:10:20 +02:00
Sandro La Bruzzo 5b724d9972 added relations to datacite mapping 2021-06-04 10:14:22 +02:00
Claudio Atzori d512062b58 integrating pull #109, H2020Classification 2021-05-27 12:22:47 +02:00
Sandro La Bruzzo bced804151 updated wf Datacite Import to retrieve the block size as parameter 2021-05-26 17:06:50 +02:00
Miriam Baglioni 073d76864d refactoring 2021-05-21 14:41:03 +02:00
Miriam Baglioni 4c8b4a774c removed not needed code 2021-05-21 14:40:07 +02:00
Miriam Baglioni 1ee8f13580 refactoring and added "left" as join type to be 100% sure to get the whole set of projects 2021-05-21 11:49:05 +02:00
Miriam Baglioni e07c3ba089 due to change in the input file the filtering step is no more needed 2021-05-21 11:47:43 +02:00
Miriam Baglioni 7180505519 removed non needed variable 2021-05-21 11:46:13 +02:00
Miriam Baglioni 2eb1a8b344 changed because the input file changed 2021-05-21 11:40:20 +02:00
Claudio Atzori 9d725efdc1 reverted implementation of the mdstore client 2021-05-20 18:26:09 +02:00
Miriam Baglioni 052c837843 - 2021-05-20 15:54:44 +02:00
Claudio Atzori b695932ae4 integrated pull#108 2021-05-20 15:34:04 +02:00
Miriam Baglioni dc0ad8d2e0 fixed issue related to change in the file name downloaded. Added sheet name as parameter and also a check if the name should change 2021-05-20 14:53:53 +02:00
Claudio Atzori 239d0f0a9a ROR actionset import workflow backported from branch stable_ids 2021-05-18 16:12:11 +02:00
Michele Artini c1e20de7cf fixed the deserialization of a json property 2021-05-18 14:00:14 +02:00
Claudio Atzori 23b8883ab1 applied intellij code cleanup 2021-05-14 10:58:12 +02:00
Sandro La Bruzzo 6424cd9062 Added passing of the following parameters:
-varDataSourceId
-varOfficialName

in Each transformation Rule
2021-05-11 15:17:38 +02:00
Sandro La Bruzzo 073dcea2aa Added passing of the following parameters:
-varDataSourceId
-varOfficialName

in Each transformation Rule
2021-05-11 15:05:58 +02:00
Claudio Atzori 3797543600 MDStoreManager model classes moved in dhp-schemas 2021-05-10 14:32:05 +02:00
Michele Artini d82071ba6c originalId with prefix 2021-05-06 15:34:48 +02:00
Claudio Atzori 923d19ea8e mdstore read lock/unlock when bulk copying records from mongodb to hdfs 2021-05-04 18:06:21 +02:00
Claudio Atzori ba86835951 using common constants from ModelConstants 2021-05-04 11:51:52 +02:00
Michele Artini a278d67175 parse input file 2021-04-29 11:34:47 +02:00
Michele Artini f77ba34126 pid types 2021-04-29 09:50:05 +02:00
Michele Artini 7c5cd86927 annotations and tests 2021-04-29 09:29:19 +02:00
Michele Artini b5cf505cc6 partial implementation of the ROR->actionset workflow 2021-04-28 16:00:24 +02:00
Claudio Atzori 5afa7d3e0c core utilities in dhp-common moved in external module dhp-schemas 2021-04-27 15:44:01 +02:00
Sandro La Bruzzo 63c0303137 removed unused import, add log 2021-04-27 12:17:23 +02:00
Claudio Atzori fa42026590 fixed PersonCleaner extension functions 2021-04-27 10:10:06 +02:00
Claudio Atzori d0d477cca3 code formatting 2021-04-20 12:50:34 +02:00
Sandro La Bruzzo dbe0d0378e resolved ticket #6377 2021-04-20 09:44:44 +02:00
Sandro La Bruzzo 524e5f3092 Improved parallelization on transformation wf on hadoop 2021-04-19 15:17:25 +02:00
Sandro La Bruzzo cdfe01bbae improved parallelization on transformation job 2021-04-19 15:14:52 +02:00
Andreas Czerniak 3b694074ff add xslt, personname cleaner 2021-04-13 07:04:27 +02:00
Claudio Atzori 7941d7be29 WIP: using common definitions from ModelConstants 2021-03-31 18:33:57 +02:00
Claudio Atzori 879e8cc7ef WIP: using common definitions from ModelConstants 2021-03-31 17:12:01 +02:00
Claudio Atzori 72ce741ea6 WIP: using common definitions from ModelConstants 2021-03-31 17:07:13 +02:00
Sandro La Bruzzo 616d2ecce2 splitted workflow collecting datacite into two workflows.
Released on beta
2021-03-31 15:45:58 +02:00
Sandro La Bruzzo 1dfda3624e improved workflow importing datacite 2021-03-26 13:56:29 +01:00
Sandro La Bruzzo c73072079d fix conflicts 2021-03-22 16:36:31 +01:00
Claudio Atzori 61a2551e74 migrated last changes from svn (dnet45) 2021-03-15 17:17:55 +01:00
Claudio Atzori acbe3119a4 RestCollectorPlugin imported from dne45 2021-03-08 09:44:09 +01:00
Claudio Atzori 36f750cd1d removed unused classes 2021-03-03 10:22:29 +01:00
Claudio Atzori b73dce3e3a more logging on the MDStore mongodb client. Forcing UTF_8 encoding on the content 2021-03-03 10:17:16 +01:00
Claudio Atzori e76c4f62c1 MetadataRecord moved in dhp-schemas 2021-02-26 10:58:48 +01:00
Claudio Atzori 7df2461ccc indent XML records collected from oai-pmh endpoints 2021-02-25 16:19:12 +01:00
Claudio Atzori b830e33392 mdstore collector plugin 2021-02-25 12:30:30 +01:00
Claudio Atzori 271e88537b code formatting 2021-02-25 12:28:56 +01:00
Claudio Atzori 9c899f4433 cleanup on transformation functions and the relative tests 2021-02-24 15:07:59 +01:00
Claudio Atzori fc3fa5e343 implemented mdstore collector plugin 2021-02-24 15:07:24 +01:00
Claudio Atzori e7eba9f7e7 WIP: transformation workflow error reporting; cleanup 2021-02-17 16:54:08 +01:00
Claudio Atzori 58467aaf1e WIP: transformation workflow error reporting 2021-02-17 16:14:41 +01:00
Claudio Atzori cc88701f29 retry for any Socket exception 2021-02-17 16:13:54 +01:00
Claudio Atzori 545f8f3e48 using jackson objectmapper instead of GSon to serialise the aggregation report 2021-02-17 12:15:00 +01:00
Claudio Atzori b592d78bb4 WIP: collectorWorker error reporting, generalised reported implementation 2021-02-17 10:28:01 +01:00
Claudio Atzori cf27905a71 WIP: collectorWorker error reporting, added report messages 2021-02-16 16:53:14 +01:00
Claudio Atzori 1abe6d1ad7 WIP: collectorWorker error reporting, added report messages 2021-02-15 15:08:59 +01:00
Claudio Atzori 523a6bfa97 Merge pull request 'first commit to the correct branch' (#94) from andreas.czerniak/BrAggr_dnet-hadoop:hadoop_aggregator into hadoop_aggregator
Looks good to me, thanks Andreas!
2021-02-15 12:15:31 +01:00
Sandro La Bruzzo 7edcc87ed4 changed xslt behaviour on failure 2021-02-12 17:27:08 +01:00
Sandro La Bruzzo b3f5c2351d Merge branch 'hadoop_aggregator' of code-repo.d4science.org:D-Net/dnet-hadoop into hadoop_aggregator
 Conflicts:
	dhp-workflows/dhp-aggregation/src/test/java/eu/dnetlib/dhp/transformation/TransformationJobTest.java
2021-02-12 16:37:14 +01:00
Sandro La Bruzzo f216277219 Implemented cleaning date 2021-02-12 16:34:52 +01:00
Andreas Czerniak 5a9017cf18 clone, min. changes, test, run 2021-02-12 14:32:36 +01:00
Claudio Atzori aa55dedb8a Merge branch 'hadoop_aggregator' of https://code-repo.d4science.org/D-Net/dnet-hadoop into hadoop_aggregator 2021-02-12 12:31:05 +01:00
Claudio Atzori 29c6f7e255 classes related to the collection workflow moved into common package; implemented MongoDB collection plugins 2021-02-12 12:31:02 +01:00
Sandro La Bruzzo 17e6f1934e fixed NPE on cleaner 2021-02-12 11:48:11 +01:00
Sandro La Bruzzo ebcc3ec14f updated wrong datacite identifier in trasformation 2021-02-11 16:25:51 +01:00
Claudio Atzori bae029f828 collection_java_xmx allows to declare the heap size allocated for the java actions involved in the metadata collectionw workflow 2021-02-08 18:07:23 +01:00
Claudio Atzori bebc54d5bf seq file storing native records is now compressed 2021-02-08 18:06:25 +01:00
Claudio Atzori 50add4c61b added requestDelay to HttpConnector2 configuration; Aggregation workflow constants moved in dhp-common 2021-02-08 12:19:38 +01:00
Claudio Atzori 40df0f987d better logging, WIP: collectorWorker error reporting; common functions moved in DHPUtils 2021-02-06 20:12:00 +01:00
Claudio Atzori a8a758925e better logging, WIP: collectorWorker error reporting 2021-02-05 19:18:05 +01:00
Claudio Atzori 730973679a Merge branch 'hadoop_aggregator' of https://code-repo.d4science.org/D-Net/dnet-hadoop into hadoop_aggregator 2021-02-04 17:25:00 +01:00
Claudio Atzori deb85706db imported HttpConnector from https://svn.driver.research-infrastructures.eu/driver/dnet45/modules/dnet-modular-collector-service/trunk/src/main/java/eu/dnetlib/data/collector/plugins/HttpConnector.java as HttpConnector2 2021-02-04 17:24:52 +01:00
Sandro La Bruzzo 4dae5e605d implemented messaging btween collection worker and Dnet 2021-02-04 15:51:15 +01:00
Claudio Atzori 40764cf626 better logging, WIP: collectorWorker error reporting 2021-02-04 14:06:02 +01:00
Claudio Atzori e04045089f better logging, WIP: collectorWorker error reporting 2021-02-03 17:58:22 +01:00
Claudio Atzori 0e8a4f9f1a better logging, WIP: collectorWorker error reporting 2021-02-03 12:33:41 +01:00
Claudio Atzori ac46c247d2 code formatting 2021-02-02 14:24:00 +01:00
Claudio Atzori bde14b149a fixed transformation target paths 2021-02-02 12:49:29 +01:00