Commit Graph

92 Commits

Author SHA1 Message Date
miconis f446580e9f code refactoring (useless classes and wf removed), implementation of the test for the openorgs dedup 2021-03-29 16:10:46 +02:00
miconis 2355cc4e9b minor changes and bug fix 2021-03-29 10:07:12 +02:00
miconis 28c1cdd132 merged stable_ids into openorgswf 2021-03-25 10:44:49 +01:00
miconis 5dfb66b0fa minor changes 2021-03-25 10:29:34 +01:00
miconis 348b0ef921 bug fix, implementation of the workflow for the creation of raw_organizations (openorgs dedup), addition of the pid lists to the openorgs postgres db 2021-03-24 15:51:27 +01:00
miconis 98854b0124 minor changes 2021-03-19 16:57:40 +01:00
Claudio Atzori 972d5a3d98 [dedup] Datacite should be authoritative for datasets 2021-03-19 09:04:20 +01:00
miconis 1a85020572 bug fix in graph-mapper, changes in the implementation of the openorgs wf to create relations and populate openorgs db 2021-02-26 10:19:28 +01:00
miconis 4b2124a18e implementation of the openorgs wfs, implementation of the raw_all wf to migrate openorgs db entities 2021-02-10 11:51:50 +01:00
miconis 8fea29177c refactoring, minor changes and implementation of the wf for openorgs with integration of organization phases into the scan wf 2021-01-18 16:48:08 +01:00
miconis 1e1aab83e3 implementation of the raw wf for openorgs: still not complete, some functionalities are missing 2020-12-21 11:58:21 +01:00
Claudio Atzori d9532446eb imported more diffs from master branch; code formatting 2020-12-10 16:14:16 +01:00
Claudio Atzori 1eaad89a3c do not fail on unknown properties when grouping entities by ID 2020-12-10 15:56:11 +01:00
Claudio Atzori 758d27745d cleaning tab characters from text fields 2020-11-27 16:07:24 +01:00
Claudio Atzori 5151850a19 CROSSREF and DATACITE constants moved in common ModelConstants 2020-11-26 13:08:36 +01:00
Claudio Atzori d0d5525d40 minor changes 2020-11-26 11:04:17 +01:00
Claudio Atzori 13eae4b31e GroupEntitiesSparkJob must read all graph paths but relations 2020-11-26 11:04:01 +01:00
Claudio Atzori 76363a8512 SimpleDateFormat is not thread safe; improved error reporting in case of invalid dates 2020-11-26 11:03:12 +01:00
Claudio Atzori dfd6205b95 Consistency graph workflow merges all the entities by ID 2020-11-25 14:55:32 +01:00
Claudio Atzori e5da4ee9b1 dedup workflow using the common PidComparator 2020-11-04 15:02:02 +01:00
Claudio Atzori 385214eeae code formatting 2020-10-30 15:47:05 +01:00
miconis c4a59d1b9a merge with the master to port the new packages 2020-10-20 16:07:30 +02:00
miconis 0e54803177 bug fix in the id generator and implementation of jobs for organization dedup 2020-10-20 12:19:46 +02:00
miconis 6f8720982c bug fix in the idgenerator and test implementation 2020-10-09 09:30:23 +02:00
Sandro La Bruzzo 734934e2eb fixed error on empty intersection with publication and relation on export to OAF 2020-10-08 17:29:29 +02:00
Sandro La Bruzzo eec418cd26 moved AuthoreMerger into dhp-common 2020-10-08 10:33:55 +02:00
miconis 1804c5d809 refactoring: classes moved in the right package 2020-10-06 16:44:51 +02:00
miconis 7093355487 bug fix and minor changes 2020-10-06 16:21:34 +02:00
miconis a2ac7e52fb implementation of the workflow for new organizations in openorgs 2020-10-06 13:58:09 +02:00
miconis e3f7798d1b minor changes in dedup tests, bug fix in the idgenerator and pace-core version update 2020-09-29 15:31:46 +02:00
miconis 4cf79f32eb implementation of the oozie wf to prepare the openorgs input: relations between organizations 2020-09-25 11:29:51 +02:00
miconis 259362ef47 implementation of the job to collect simrels from postgres db 2020-09-22 09:43:27 +02:00
miconis d47352cbc7 refactoring of the procedure for the id generation, minor changes and addition of a comparison on the original id and the origin datasource 2020-07-24 20:10:47 +02:00
miconis b260fee787 implementation of the dedup_id generation using pids to make the graph more stable 2020-07-22 17:29:48 +02:00
Claudio Atzori de72b1c859 cleanup 2020-07-20 09:59:11 +02:00
Claudio Atzori 805de4eca1 fix: filter the blocks with size = 1 2020-07-16 10:11:32 +02:00
Claudio Atzori b90389bac4 code formatting 2020-07-15 11:24:48 +02:00
Claudio Atzori 4e6f46e8fa filter blocks with one record only 2020-07-15 11:22:20 +02:00
Claudio Atzori 06def0c0cb SparkBlockStats allows to repartition the input rdd via the numPartitions workflow parameter 2020-07-13 20:09:06 +02:00
miconis b52c246aed merge done 2020-07-13 19:57:02 +02:00
miconis b8a45041fd minor changes 2020-07-13 19:53:18 +02:00
Claudio Atzori 66f9f6d323 adjusted parameters for the dedup stats workflow 2020-07-13 19:26:46 +02:00
miconis 03ecfa5ebd implementation of the test class for the new block stats spark action 2020-07-13 18:48:23 +02:00
miconis 10e08ccf45 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-07-13 18:22:45 +02:00
miconis 9258e4f095 implementation of a new workflow to compute statistics on the blocks 2020-07-13 18:22:34 +02:00
Claudio Atzori c6f6fb0f28 code formatting 2020-07-13 16:46:13 +02:00
Claudio Atzori 1143f426aa WIP SparkCreateMergeRels distinct relations 2020-07-13 16:13:36 +02:00
Claudio Atzori 8c67938ad0 configurable number of partitions used in the SparkCreateSimRels phase 2020-07-13 16:07:07 +02:00
Claudio Atzori c8284bab06 WIP SparkCreateMergeRels distinct relations 2020-07-13 15:54:51 +02:00
Claudio Atzori 7dd91edf43 parsing of optional parameter 2020-07-13 15:40:41 +02:00