Commit Graph

276 Commits

Author SHA1 Message Date
Claudio Atzori dc98c39500 more logging 2021-02-25 12:29:18 +01:00
Claudio Atzori fc3fa5e343 implemented mdstore collector plugin 2021-02-24 15:07:24 +01:00
Claudio Atzori cf27905a71 WIP: collectorWorker error reporting, added report messages 2021-02-16 16:53:14 +01:00
Claudio Atzori 58288a95b8 WIP: collectorWorker error reporting, added report messages 2021-02-15 15:28:53 +01:00
Claudio Atzori 1abe6d1ad7 WIP: collectorWorker error reporting, added report messages 2021-02-15 15:08:59 +01:00
Claudio Atzori 29c6f7e255 classes related to the collection workflow moved into common package; implemented MongoDB collection plugins 2021-02-12 12:31:02 +01:00
Claudio Atzori 50add4c61b added requestDelay to HttpConnector2 configuration; Aggregation workflow constants moved in dhp-common 2021-02-08 12:19:38 +01:00
Claudio Atzori 40df0f987d better logging, WIP: collectorWorker error reporting; common functions moved in DHPUtils 2021-02-06 20:12:00 +01:00
Claudio Atzori a8a758925e better logging, WIP: collectorWorker error reporting 2021-02-05 19:18:05 +01:00
Michele Artini 2ee0c3e47e http entity as json string 2021-02-05 09:45:39 +01:00
Claudio Atzori 730973679a Merge branch 'hadoop_aggregator' of https://code-repo.d4science.org/D-Net/dnet-hadoop into hadoop_aggregator 2021-02-04 17:25:00 +01:00
Claudio Atzori deb85706db imported HttpConnector from https://svn.driver.research-infrastructures.eu/driver/dnet45/modules/dnet-modular-collector-service/trunk/src/main/java/eu/dnetlib/data/collector/plugins/HttpConnector.java as HttpConnector2 2021-02-04 17:24:52 +01:00
Sandro La Bruzzo 4dae5e605d implemented messaging btween collection worker and Dnet 2021-02-04 15:51:15 +01:00
Claudio Atzori 72c57b28fa switched project version to 1.2.4-branch_hadoop_aggregator-SNAPSHOT 2021-02-04 14:08:18 +01:00
Claudio Atzori 40764cf626 better logging, WIP: collectorWorker error reporting 2021-02-04 14:06:02 +01:00
Michele Artini 26d2eb946f messages sender 2021-02-04 09:45:46 +01:00
Michele Artini 1b9731632b Message Sender 2021-02-03 16:42:36 +01:00
Michele Artini 820d729e99 recover of Message and MessageType class 2021-02-03 16:20:34 +01:00
Claudio Atzori 0e8a4f9f1a better logging, WIP: collectorWorker error reporting 2021-02-03 12:33:41 +01:00
Claudio Atzori d62ea1490d cleaned up RabbitMQ stuff 2021-02-02 10:53:19 +01:00
Claudio Atzori 73d772a4b4 added method to list the known vocabulary names 2021-02-02 10:39:47 +01:00
Claudio Atzori 8eaa1fd4b4 WIP: metadata collection in INCREMENTAL mode and relative test 2021-02-01 19:29:10 +01:00
Sandro La Bruzzo 6ff234d81b Implemented a first prototype of incremental harvesting and trasformation using readlock 2021-02-01 13:56:05 +01:00
Sandro La Bruzzo 0276180039 WIP mdstore
transaction implemented on hadoop side
2021-01-29 16:42:41 +01:00
Michele Artini d942d0c77d methods toString(), hashCode() and equals() 2021-01-29 13:16:48 +01:00
Michele Artini 38f2508c87 new fields in mdstore beans 2021-01-28 08:24:45 +01:00
Sandro La Bruzzo a54848a59c Moved Vocabulary stuff to common module 2021-01-25 15:43:04 +01:00
Claudio Atzori 28460c2cd1 using com.fasterxml.jackson.databind.ObjectMapper instead of org.codehaus.jackson.map.ObjectMapper 2020-12-23 16:59:52 +01:00
Claudio Atzori 6848d0c3d7 trivial: avoid duplicated code 2020-12-23 12:21:58 +01:00
Claudio Atzori d8b5f43a7e code formatting 2020-12-22 14:59:03 +01:00
miconis 794e22b09c bug fix in the authormerge: now authors with higher size have priority, normalization of author name fixed 2020-12-21 17:51:42 +01:00
Claudio Atzori 12e2f930c8 resolved conflicts 2020-12-10 10:57:39 +01:00
Alessia Bardi 112da6d76a in theory, just auto-formatting after mvn compile 2020-12-09 20:00:27 +01:00
Miriam Baglioni 6fbc67a959 using ModelConstant.ORCID and removing not used constants 2020-12-09 17:10:20 +01:00
Claudio Atzori 3c5ce1dada code formatting 2020-12-09 17:07:20 +01:00
Miriam Baglioni 212b52614f added graph mapper versus community result without context and project in common to be used for the doiboost mapping 2020-12-09 16:59:02 +01:00
Claudio Atzori 491ad24750 introduced filtering for DOIs in graph cleaning workflow 2020-12-09 09:10:33 +01:00
Claudio Atzori 943b961cf6 introduced PidBlacklist 2020-12-02 09:30:34 +01:00
Claudio Atzori 893ac4a77b GenerateEntitiesApplication can be configured to hash the id value or not 2020-12-02 09:30:06 +01:00
Claudio Atzori 349e7246aa do not consider NCID, GBIF as PIDs candidate for the ID creation 2020-11-30 16:52:40 +01:00
Claudio Atzori 2c407e775e GenerateEntitiesApplication can be configured to hash the id value or not 2020-11-30 12:00:38 +01:00
Claudio Atzori 758d27745d cleaning tab characters from text fields 2020-11-27 16:07:24 +01:00
Claudio Atzori 596a2a459d added testing class for OafMapperUtils 2020-11-27 12:01:11 +01:00
Claudio Atzori fa66e5b6b8 ResultTypeComparator gives priority to Records collectedfrom Crossref 2020-11-26 13:09:19 +01:00
Claudio Atzori d0d5525d40 minor changes 2020-11-26 11:04:17 +01:00
Miriam Baglioni 66c0e3e574 changed because of D-Net/dnet-hadoop#61 (comment) 2020-11-25 17:52:17 +01:00
Claudio Atzori 1372a4d1bf fixed merging method 2020-11-25 16:05:51 +01:00
Claudio Atzori dfd6205b95 Consistency graph workflow merges all the entities by ID 2020-11-25 14:55:32 +01:00
Claudio Atzori e1a1bb3ee4 moved class CleaningFunctions in the correct package. Remove newlines from titles, descriptions, subjects 2020-11-24 18:34:03 +01:00
Claudio Atzori e43ab07af6 code formatting 2020-11-24 14:41:39 +01:00
Miriam Baglioni 73dbb79602 removed the checl for the community name in the common version on MakeTar 2020-11-24 14:36:15 +01:00
Claudio Atzori c016cc050a IdentifierFactory: in case a record provides more than one pid of the same type, the the lexicographically lower value is chosen as best pick 2020-11-23 19:16:40 +01:00
Claudio Atzori 3f34757c63 merged from master 2020-11-19 14:34:54 +01:00
Claudio Atzori 2bed29eb09 WIP: added oozie workflow for grouping graph entities by id 2020-11-13 10:05:12 +01:00
Claudio Atzori 13e36a4da0 WIP: added oozie workflow for grouping graph entities by id 2020-11-13 10:05:02 +01:00
Claudio Atzori 9b0fb9e958 merged from master 2020-11-12 09:27:12 +01:00
Miriam Baglioni f8e9bda24c merge branch with master 2020-11-05 16:31:18 +01:00
Miriam Baglioni 7ebdfacee9 removed commented code and added documentation to new method 2020-11-05 16:30:36 +01:00
Claudio Atzori 4625b7486e code formatting 2020-11-04 18:12:43 +01:00
Claudio Atzori e5da4ee9b1 dedup workflow using the common PidComparator 2020-11-04 15:02:02 +01:00
Claudio Atzori ea2a0ea949 IdentifierFactory considers only DOIs matching a given regex 2020-11-03 18:43:37 +01:00
Miriam Baglioni d4382b54df moved the tar archive with maz size on common module 2020-11-03 16:54:50 +01:00
Claudio Atzori 86d6fbe95b refactoring: CleaningFunctions and OafMapperUtils moved in dhp-commong 2020-11-03 12:19:46 +01:00
Claudio Atzori 3fcd669e99 result merge operation leverage on custom ResultTypeComparator in the aggregator graph construction 2020-11-03 10:53:23 +01:00
Claudio Atzori 78c3c1b62b exclude pid values set to 'none' 2020-11-02 14:25:26 +01:00
Claudio Atzori 09e44dabff Merge branch 'master' into stable_ids 2020-11-02 12:16:01 +01:00
Miriam Baglioni 10d8bbada8 changed deprecated method with non deprecated versioen 2020-10-30 14:10:10 +01:00
Claudio Atzori 58f28296ea ProvisionConstants moved as ModelHardLimits in dhp-common and applied to truncate long abstracts (len > 150000). Further filtering for empty PID values 2020-10-30 10:56:42 +01:00
Miriam Baglioni 4cf4454341 changed from deprecated method to new one 2020-10-27 17:46:19 +01:00
Miriam Baglioni c8f32dd109 - 2020-10-27 17:45:58 +01:00
Miriam Baglioni 3582eba565 - 2020-10-27 17:31:33 +01:00
Miriam Baglioni 3241ec1777 added connection timeout and socket timeout 600 sec 2020-10-27 16:12:11 +01:00
Miriam Baglioni cc68855a1e merge upstream 2020-10-27 15:54:16 +01:00
Miriam Baglioni 1cb60aede4 added connection timeout and socket timeout 600 sec 2020-10-27 15:53:02 +01:00
sandro 3a81a940b7 solved bug on merge publication 2020-10-21 22:41:55 +02:00
Claudio Atzori c188868450 Merge branch 'master' into stable_ids 2020-10-16 12:06:23 +02:00
Miriam Baglioni 959f30811e added connection timeout and socket timeout 600 sec 2020-10-16 10:52:30 +02:00
Sandro La Bruzzo 734934e2eb fixed error on empty intersection with publication and relation on export to OAF 2020-10-08 17:29:29 +02:00
Sandro La Bruzzo eec418cd26 moved AuthoreMerger into dhp-common 2020-10-08 10:33:55 +02:00
Claudio Atzori 8958f20813 code formatting 2020-10-07 13:14:31 +02:00
Claudio Atzori 1abcabb6e6 WIP stable ids: IdentifierFactory & unit test 2020-10-06 18:55:23 +02:00
Claudio Atzori 6ce340bd3d WIP stable ids: IdentifierFactory 2020-10-06 15:44:53 +02:00
Claudio Atzori 49ae3450a9 code formatting 2020-10-02 09:43:24 +02:00
Claudio Atzori 1c44182dea minor changes 2020-10-02 09:41:34 +02:00
Miriam Baglioni ccd48dd78a added new test for new method 2020-09-25 16:33:43 +02:00
Miriam Baglioni 3e5497b336 added new method to handle an open deposition to which upload data 2020-09-25 16:33:15 +02:00
Claudio Atzori 8a523474b7 code formatting 2020-09-07 11:40:16 +02:00
Miriam Baglioni c7f944a533 refactoring due to compilation 2020-08-19 10:01:26 +02:00
Miriam Baglioni 02a4986e7b Applying changed from code reviews D-Net/dnet-hadoop#40 (comment) and D-Net/dnet-hadoop#40 (comment) and D-Net/dnet-hadoop#40 (comment) 2020-08-13 11:53:01 +02:00
Miriam Baglioni 33a6a51333 Disabled Test (impossible to publish without accessToken). And applying changes from code review D-Net/dnet-hadoop#40 (comment) 2020-08-13 11:48:32 +02:00
Miriam Baglioni 306603272e removed accession token 2020-08-12 09:39:58 +02:00
Miriam Baglioni 30a2b19b65 changed metadata for deposition od covid-19 dump in Zenodo 2020-08-11 17:36:56 +02:00
Miriam Baglioni 10e2af3d3b - 2020-08-11 16:08:06 +02:00
Miriam Baglioni 0603ec4757 changed test to upload the dump for covid-19 community 2020-08-11 15:43:25 +02:00
Miriam Baglioni ecd2081f84 refactoring 2020-08-11 14:17:31 +02:00
Miriam Baglioni a6df01d329 - 2020-08-10 11:59:10 +02:00
Miriam Baglioni bcc70dce5e added dependency to POM 2020-08-07 16:46:18 +02:00
Miriam Baglioni 7e54b2189c resources for the test of the automatic deposition in Zenodo 2020-08-07 16:45:59 +02:00
Miriam Baglioni d161fa2aeb test for the automatic deposition in Zenodo 2020-08-07 16:45:28 +02:00
Miriam Baglioni 545ea9f77e moved in common. Zenodo response model and APIClient to deposit in Zenodo 2020-08-07 16:44:51 +02:00
Claudio Atzori 93052ae384 WIP: set the connect & request timeout for BindingProvider service implementation 2020-06-25 16:16:02 +02:00
Claudio Atzori 9cd27183b6 [maven-release-plugin] prepare for next development iteration 2020-06-22 11:27:44 +02:00
Claudio Atzori 1e3dab0631 [maven-release-plugin] prepare release dhp-1.2.3 2020-06-22 11:27:39 +02:00
Claudio Atzori 0d52816244 WIP: graph cleaner implementation 2020-06-13 13:06:04 +02:00
Claudio Atzori c4d9f1837f [maven-release-plugin] prepare for next development iteration 2020-06-12 12:21:08 +02:00
Claudio Atzori f0746a7605 [maven-release-plugin] prepare release dhp-1.2.2 2020-06-12 12:21:03 +02:00
Claudio Atzori 463489f59f code formatting 2020-06-12 12:03:25 +02:00
miconis fa8c5bcd39 javadoc for the PacePerson class and implementation of a unit test 2020-06-11 12:19:32 +02:00
Miriam Baglioni 54d869e618 merge upstream 2020-05-26 09:22:04 +02:00
Claudio Atzori 7582532e73 [maven-release-plugin] prepare for next development iteration 2020-05-25 19:48:18 +02:00
Claudio Atzori 01c2e93395 [maven-release-plugin] prepare release dhp-1.2.1 2020-05-25 19:48:14 +02:00
Miriam Baglioni 8f6ce970f9 moved PacePerson to dhp-common to avoid conflict in dependency with graph-mapper 2020-05-25 10:25:55 +02:00
Miriam Baglioni 2abb84877d Merge branch 'master' into blacklist 2020-05-11 10:37:49 +02:00
Miriam Baglioni 871e079b45 merged with master 2020-05-11 10:20:00 +02:00
Claudio Atzori 60c40618d3 [maven-release-plugin] prepare for next development iteration 2020-05-11 10:17:14 +02:00
Claudio Atzori c267d958d5 [maven-release-plugin] prepare release dhp-1.2.0 2020-05-11 10:17:10 +02:00
Claudio Atzori 42f1a2bf94 bumped project version to 1.2.0-SNAPSHOT 2020-05-11 10:05:57 +02:00
Claudio Atzori 0ccc864ad9 [maven-release-plugin] prepare for next development iteration 2020-05-08 17:01:31 +02:00
Claudio Atzori 6e47c724c6 [maven-release-plugin] prepare release dhp-1.1.7 2020-05-08 17:01:27 +02:00
Miriam Baglioni 4c94231cad merge with master fork 2020-05-08 12:25:57 +02:00
Miriam Baglioni 31ea05297d moved the DbClient to common and added needed dependency to pom 2020-05-04 12:22:28 +02:00
Claudio Atzori 439c6255a2 cleanup 2020-04-29 19:09:07 +02:00
Claudio Atzori 77ac995770 cleaned up poms, added descriptions 2020-04-29 18:44:17 +02:00
Miriam Baglioni f7695e833c resolved conflicts 2020-04-29 11:41:31 +02:00
Claudio Atzori 6f5b899038 reformatted code according to the updated style descriptor 2020-04-28 11:23:29 +02:00
Claudio Atzori a0bdbacdae switched automatic code formatting plugin to net.revelc.code.formatter:formatter-maven-plugin 2020-04-27 14:52:31 +02:00
Claudio Atzori 7a3f8085f7 switched automatic code formatting plugin to net.revelc.code.formatter:formatter-maven-plugin 2020-04-27 14:45:40 +02:00
Claudio Atzori ad7a131b18 introduced common project code formatting plugin, works on the commit hook, based on https://github.com/Cosium/git-code-format-maven-plugin, applied to each java class in the project 2020-04-18 12:42:58 +02:00
Claudio Atzori 038ac7afd7 relation consistency workflow separated from dedup scan and creation of CCs 2020-04-17 13:12:44 +02:00
Claudio Atzori 47f3d9b757 unit test for GraphHiveImporterJob 2020-04-08 13:24:43 +02:00
Claudio Atzori d74e128aa6 Utility classes moved in dhp-common and dhp-schemas 2020-04-07 11:56:22 +02:00
Claudio Atzori 3d1b637cab dataset based provision WIP 2020-04-04 14:03:43 +02:00
Claudio Atzori 377e1ba840 [maven-release-plugin] prepare for next development iteration 2020-03-30 20:06:00 +02:00
Claudio Atzori 76d9315129 [maven-release-plugin] prepare release dhp-1.1.6 2020-03-30 20:05:56 +02:00
Sandro La Bruzzo 0cd022ad6a merge with master 2020-03-26 14:08:29 +01:00
Claudio Atzori 19b2048109 code formatting 2020-03-25 17:40:38 +01:00
Michele Artini ebe45003d9 fixed some junit packages 2020-03-25 16:45:03 +01:00
Sandro La Bruzzo addaaa091f migrate relation from RDD to Dataset 2020-03-13 09:13:20 +01:00
Sandro La Bruzzo b021b8a2e1 Added index wf 2020-02-24 10:15:55 +01:00
Claudio Atzori 33185fd0b7 ISLookupClientFactory moved in dhp-common 2020-02-19 16:56:38 +01:00
Sandro La Bruzzo 2b8675462f refactoring code 2020-02-19 10:07:08 +01:00
Claudio Atzori 56d1810a66 working procedure for records indexing using Spark, via lib com.lucidworks.spark:spark-solr 2020-02-14 12:28:52 +01:00
Claudio Atzori 1ee1baa8c0 Merge branch 'master' into provision_indexing 2020-02-13 18:17:07 +01:00
Claudio Atzori a3d0b57b25 [maven-release-plugin] prepare for next development iteration 2020-02-13 18:11:33 +01:00
Claudio Atzori 6ed9a15bc8 [maven-release-plugin] prepare release dhp-1.1.5 2020-02-13 18:11:31 +01:00
Claudio Atzori 49e648f7c3 bumped version 2020-02-13 18:09:31 +01:00
Claudio Atzori 956da2f923 added Saxon-HE extension functions and Transformer factory class 2020-02-13 16:49:45 +01:00
Sandro La Bruzzo 19a80e4638 implemented workfow for aggregation and generation of infospace graph 2020-01-24 09:58:55 +01:00
Sandro La Bruzzo abd9034da0 implemented DedupRecord factory with the merge of publications 2019-12-11 15:43:24 +01:00
Claudio Atzori 7fe6835b47 [maven-release-plugin] prepare for next development iteration 2019-11-07 17:39:30 +01:00