Commit Graph

1505 Commits

Author SHA1 Message Date
Claudio Atzori ebf60020ac map results as OPRs in case of missing //CobjCategory/@type and the vocabulary dnet:result_typologies doesn't resolve the super type 2020-07-20 19:01:10 +02:00
Miriam Baglioni 355d7e426e added dump for project - not finished 2020-07-20 18:54:43 +02:00
Miriam Baglioni a2f01e5259 added getter and setter 2020-07-20 18:54:17 +02:00
Miriam Baglioni 40bbe94f7c merge with master fork 2020-07-20 18:10:03 +02:00
Miriam Baglioni 2a15494b16 merge upstream 2020-07-20 18:05:01 +02:00
Miriam Baglioni 23160b4d29 realignment of the workflow classes with the changes in the structure of the module 2020-07-20 18:04:30 +02:00
Miriam Baglioni b904e0699a - 2020-07-20 18:02:53 +02:00
Miriam Baglioni 3aab7680f6 changed the test results 2020-07-20 18:00:43 +02:00
Miriam Baglioni cde0300801 moved from projects to project 2020-07-20 17:57:35 +02:00
Miriam Baglioni 5076e4f320 changed test to comply with the modifications 2020-07-20 17:55:18 +02:00
Miriam Baglioni 08dbd99455 changed to dump the whole results graph by using classes already implemented for communities. Added class to dump also organization 2020-07-20 17:54:28 +02:00
Miriam Baglioni e47ea9349c extended some types by adding provenance as the couple (provenance, trust) and moved some classes to be used by the complete graph dump also 2020-07-20 17:46:27 +02:00
Claudio Atzori 32f5e466e3 imports cleanup 2020-07-20 17:42:58 +02:00
Claudio Atzori 54ac583923 code formatting 2020-07-20 17:37:08 +02:00
Claudio Atzori 124e7ce19c in case of missing attribute //dr:CobjCategory/@type the resulttype is derived by looking up the vocabulary dnet:result_typologies with the 1st instance type available 2020-07-20 17:33:37 +02:00
Claudio Atzori 050dda223d Merge pull request 'removed duplicated fields' (#25) from unique_field_in_lists into master
Looks good as a temporary workaround. I agree the model could seamlessly make the distinct operation by using HashSets instead of Linked (or Array) Lists.

The task to update the model in such a way is added on #9#issuecomment-1583

Thanks!
2020-07-20 12:12:50 +02:00
Claudio Atzori e0c4cf6f7b added parameter to drive the graph merge strategy: priority (BETA|PROD) 2020-07-20 10:48:01 +02:00
Claudio Atzori 94ccdb4852 Merge branch 'master' into merge_graph 2020-07-20 10:14:55 +02:00
Claudio Atzori 0937c9998f Merge branch 'deduptesting' 2020-07-20 10:00:20 +02:00
Claudio Atzori de72b1c859 cleanup 2020-07-20 09:59:11 +02:00
Michele Artini 331a3cbdd0 fixed originalId 2020-07-20 09:50:29 +02:00
Michele Artini c59c5369b1 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-07-18 09:40:54 +02:00
Michele Artini 346a1d2b5a update eventId generator 2020-07-18 09:40:36 +02:00
Sandro La Bruzzo 9116d75b3e Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-07-17 18:01:30 +02:00
Miriam Baglioni d7d84c8217 - 2020-07-17 14:03:23 +02:00
Miriam Baglioni 47c7122773 changed priority from beta to production 2020-07-17 12:56:35 +02:00
Michele Artini 442f30930c removed duplicated fields 2020-07-17 12:25:36 +02:00
Claudio Atzori 1781609508 code formatting 2020-07-16 19:06:56 +02:00
Claudio Atzori db8b90a156 renamed CORE -> BETA 2020-07-16 19:05:13 +02:00
Miriam Baglioni 44e1c40c42 merge upstream 2020-07-16 18:49:38 +02:00
Claudio Atzori 878f2b931c Merge branch 'master' into merge_graph 2020-07-16 16:34:24 +02:00
Claudio Atzori cc5d13da85 introduced parameter shouldIndex (true|false) 2020-07-16 13:46:39 +02:00
Claudio Atzori b098cc3cbe avoid repeating identical values for fields: source, description 2020-07-16 13:45:53 +02:00
Claudio Atzori 805de4eca1 fix: filter the blocks with size = 1 2020-07-16 10:11:32 +02:00
Claudio Atzori 4b9fb2ffb8 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-07-15 11:26:04 +02:00
Claudio Atzori b90389bac4 code formatting 2020-07-15 11:24:48 +02:00
Claudio Atzori 4e6f46e8fa filter blocks with one record only 2020-07-15 11:22:20 +02:00
Michele Artini 262c29463e relations with multiple datasources 2020-07-15 09:18:40 +02:00
Claudio Atzori 7d6e269b40 reverted CreateRelatedEntitiesJob_phase1 to its previous state 2020-07-13 22:54:04 +02:00
Claudio Atzori 8e97598eb4 avoid NPE in case of null instances 2020-07-13 20:46:14 +02:00
Claudio Atzori 06def0c0cb SparkBlockStats allows to repartition the input rdd via the numPartitions workflow parameter 2020-07-13 20:09:06 +02:00
miconis b52c246aed merge done 2020-07-13 19:57:02 +02:00
miconis b8a45041fd minor changes 2020-07-13 19:53:18 +02:00
Claudio Atzori 66f9f6d323 adjusted parameters for the dedup stats workflow 2020-07-13 19:26:46 +02:00
miconis 03ecfa5ebd implementation of the test class for the new block stats spark action 2020-07-13 18:48:23 +02:00
miconis 10e08ccf45 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-07-13 18:22:45 +02:00
miconis 9258e4f095 implementation of a new workflow to compute statistics on the blocks 2020-07-13 18:22:34 +02:00
Claudio Atzori c6f6fb0f28 code formatting 2020-07-13 16:46:13 +02:00
Claudio Atzori 8d2102d7d2 Merge branch 'deduptesting' 2020-07-13 16:32:43 +02:00
Claudio Atzori 344a90c2e6 updated assertions in propagateRelationTest 2020-07-13 16:32:04 +02:00
Claudio Atzori 1143f426aa WIP SparkCreateMergeRels distinct relations 2020-07-13 16:13:36 +02:00
Claudio Atzori 8c67938ad0 configurable number of partitions used in the SparkCreateSimRels phase 2020-07-13 16:07:07 +02:00
Claudio Atzori c73168b18e Merge branch 'deduptesting' of https://code-repo.d4science.org/D-Net/dnet-hadoop into deduptesting 2020-07-13 15:54:58 +02:00
Claudio Atzori c8284bab06 WIP SparkCreateMergeRels distinct relations 2020-07-13 15:54:51 +02:00
Sandro La Bruzzo 1d133b7fe6 update test 2020-07-13 15:52:41 +02:00
Michele Artini 3635d05061 poms 2020-07-13 15:52:23 +02:00
Claudio Atzori 7dd91edf43 parsing of optional parameter 2020-07-13 15:40:41 +02:00
Claudio Atzori 4c101a9d66 WIP SparkCreateMergeRels distinct relations 2020-07-13 15:31:38 +02:00
Claudio Atzori 8a612d861a WIP SparkCreateMergeRels distinct relations 2020-07-13 15:30:57 +02:00
Sandro La Bruzzo 9ef2385022 implemented test for cut of connected component 2020-07-13 15:28:17 +02:00
Sandro La Bruzzo d561b2dd21 implemented cut of connected component 2020-07-13 14:18:42 +02:00
Miriam Baglioni 8e0e090d7a merge upstream 2020-07-13 12:46:55 +02:00
Claudio Atzori e2093e42db Merge branch 'master' into deduptesting 2020-07-13 10:57:49 +02:00
Michele Artini 2c4ed9a043 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-07-13 10:55:39 +02:00
Michele Artini ccbe5c5658 fixed import of eu.dnetlib.dhp:dnet-openaire-broker-common 2020-07-13 10:55:27 +02:00
Claudio Atzori 7a3fd9f54c dedup relation aggregator moved into dedicated class 2020-07-13 10:11:36 +02:00
Alessia Bardi 7e96105947 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-07-12 19:29:12 +02:00
Alessia Bardi b7a39731a6 assert, not print 2020-07-12 19:28:56 +02:00
Miriam Baglioni f9ad6f3255 Merge branch 'dump' of code-repo.d4science.org:miriam.baglioni/dnet-hadoop into dump 2020-07-10 19:42:53 +02:00
Miriam Baglioni c27f12d6e8 avoid to consider _SUCCESS file 2020-07-10 19:42:23 +02:00
Claudio Atzori 770adc26e9 WIP aggregator to make relationships unique 2020-07-10 19:35:10 +02:00
Claudio Atzori ecf119f37a Merge branch 'master' into deduptesting 2020-07-10 19:04:16 +02:00
Claudio Atzori 31071e363f Merge branch 'provision_indexing' 2020-07-10 19:03:57 +02:00
Claudio Atzori 06c1913062 added different limits for grouping by source and by target, incremented spark.sql.shuffle.partitions for the join operations 2020-07-10 19:03:33 +02:00
Claudio Atzori cc77446dc4 added dbSchema parameter to the raw_db workflow 2020-07-10 19:01:50 +02:00
Claudio Atzori 4c3836f62e materialize the related entities before joining them 2020-07-10 19:00:44 +02:00
Michele Artini e1ae964bc4 stats 2020-07-10 16:12:08 +02:00
Claudio Atzori 752d28f8eb make the relations produced by the dedup SparkPropagateRelation job unique 2020-07-10 15:09:50 +02:00
Sandro La Bruzzo c01efed79b Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-07-10 14:44:57 +02:00
Sandro La Bruzzo a7d3977481 added generation of EBI Dataset 2020-07-10 14:44:50 +02:00
Claudio Atzori b21866a2da allow to set different relations cut points by source and by target; adjusted weight assigned to relationship types 2020-07-10 13:59:48 +02:00
Claudio Atzori ff4d6214f1 experimenting with pruning of relations 2020-07-10 10:06:41 +02:00
Miriam Baglioni faea30cda0 - 2020-07-09 14:05:21 +02:00
Michele Artini 2d742a84ae DedupConfig as json file 2020-07-09 12:53:46 +02:00
Miriam Baglioni a634794242 merge upstream 2020-07-09 11:46:51 +02:00
Michele Artini a44b9b36b9 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-07-09 11:02:31 +02:00
Michele Artini 1c6a171633 updated pom 2020-07-09 11:02:09 +02:00
Claudio Atzori 3c728aaa0c trying to overcome OOM errors during duplicate scan phase 2020-07-08 22:39:51 +02:00
Claudio Atzori 18c555cd79 Merge branch 'master' into deduptesting 2020-07-08 22:32:01 +02:00
Claudio Atzori 4365cf41d7 trying to overcome OOM errors during duplicate scan phase 2020-07-08 22:31:46 +02:00
Claudio Atzori 67e1d222b6 bulk cleaning when found null or empty, sets bestaccessrights evaluating the result instances 2020-07-08 17:53:35 +02:00
Alessia Bardi 853e8d7987 test for software merge 2020-07-08 17:03:53 +02:00
Claudio Atzori 610d377d57 first implementation of the BETA & PROD graphs merge procedure 2020-07-08 16:54:26 +02:00
Alessia Bardi 9a898c0e4c Json schema generator 2020-07-08 12:52:00 +02:00
Alessia Bardi 636f9ce7d6 json schema generator lib 2020-07-08 12:50:57 +02:00
Alessia Bardi 8f83b726fa Dump json schema compliant to json schema Draft 7 2020-07-08 12:48:46 +02:00
Claudio Atzori e2ea30f89d updated graph construction workflow definition: cleaning wf moved at the bottom to include cleaning of the information produced by the enrichment workflows 2020-07-08 12:16:24 +02:00
Miriam Baglioni 1b0b968548 fixed issue on substring 2020-07-08 12:11:51 +02:00
Miriam Baglioni 7fe00cb4fb - 2020-07-08 10:29:37 +02:00
Miriam Baglioni 375ef07d7b changed the description for the upload 2020-07-07 18:41:27 +02:00
Miriam Baglioni 35c8265793 added the json extension to filename 2020-07-07 18:29:49 +02:00
Miriam Baglioni 81434f8e5e added method newInstance 2020-07-07 18:26:10 +02:00
Miriam Baglioni 817cddfc52 - 2020-07-07 18:25:12 +02:00
Miriam Baglioni a66aa9bd83 removed unuseful tests 2020-07-07 18:25:00 +02:00
Miriam Baglioni 9b20a21b24 removed unuseful tests 2020-07-07 18:23:37 +02:00
Miriam Baglioni 8a1b42ff21 added check to verify that dump contains at least one product 2020-07-07 18:21:35 +02:00
Miriam Baglioni d86adb82a7 - 2020-07-07 18:20:51 +02:00
Miriam Baglioni b2782025f6 enabled the whole workflow to run. Added property to give priority to dependency in the classpath - to solve conflicts 2020-07-07 18:10:47 +02:00
Miriam Baglioni 83d2c84b77 added constraints to xquery so that to get only profiles with status manager or all 2020-07-07 18:09:48 +02:00
Miriam Baglioni 4c8d86493c - 2020-07-07 18:09:06 +02:00
Miriam Baglioni 0208bc18f3 added new resource for testing 2020-07-07 17:47:24 +02:00
Miriam Baglioni f5bb65c9ef the json schema for the dump of the results 2020-07-07 17:34:40 +02:00
Michele Artini dffa0b01a2 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-07-07 15:37:29 +02:00
Michele Artini efadbdb2bc fixed a bug with duplicated events 2020-07-07 15:37:13 +02:00
Claudio Atzori 8af8e7481a code formatting 2020-07-07 14:23:34 +02:00
Claudio Atzori b383ed42fa pass optional parameter relationFilter to the PrepareRelationJob implementation 2020-07-07 14:21:28 +02:00
Claudio Atzori 911894a987 Merge branch 'deduptesting' 2020-07-07 14:20:43 +02:00
Miriam Baglioni c19818a3f8 merge branch with fork master 2020-07-06 13:58:23 +02:00
Miriam Baglioni d22240c0ba merge upstream 2020-07-06 13:58:02 +02:00
Enrico Ottonello ca37d3427b separate workflow to parse orcid summaries, activities and generate dataset with no doi publications; test 2020-07-03 23:30:31 +02:00
Michele Artini edf6c6c4dc Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-07-03 11:48:24 +02:00
Michele Artini 04bebb708c some fixes 2020-07-03 11:48:12 +02:00
Enrico Ottonello 1729cc5cf3 publication conversion from json to oaf test 2020-07-02 18:46:20 +02:00
Claudio Atzori c3d67f709a adjusted dedup configuration for result entities: using new wordssuffixprefix clustering function, removed ngrampairs, adjusted queueMaxSize (800) and slidingWindowSize (80) 2020-07-02 17:35:22 +02:00
Miriam Baglioni f8bf4acd76 - 2020-07-02 16:03:11 +02:00
Miriam Baglioni e6c79d44e6 - 2020-07-02 16:02:02 +02:00
Miriam Baglioni d7f6f0c216 changed code to use other lib 2020-07-02 16:01:34 +02:00
Miriam Baglioni 8fdc9e070c added dependency to OkHttp 2020-07-02 16:01:08 +02:00
Miriam Baglioni 94500a581b merge branch with fork master 2020-07-02 14:25:39 +02:00
Miriam Baglioni c133a23cf0 merge upstream 2020-07-02 14:24:57 +02:00
Claudio Atzori 1d39f7901c Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-07-02 12:45:01 +02:00
Claudio Atzori 0f77cac4b5 fix: deduper must use queueMaxSize instead of groupMaxSize for the block definition 2020-07-02 12:43:51 +02:00
Sandro La Bruzzo 18b9330312 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-07-02 12:43:19 +02:00
Michele Artini b413db0bff white/blacklists 2020-07-02 12:43:03 +02:00
Claudio Atzori d380b85246 unit test for the preparation of the relations 2020-07-02 12:42:13 +02:00
Claudio Atzori ed1c7e5d75 fixed workflow for the import of the claims alone 2020-07-02 12:40:21 +02:00
Sandro La Bruzzo 07f0723fa7 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-07-02 12:37:49 +02:00
Sandro La Bruzzo 1d420eedb4 added generation of EBI Dataset 2020-07-02 12:37:43 +02:00
Claudio Atzori e4a29a4513 fixed workflow for the import of the claims alone 2020-07-02 12:36:33 +02:00
Enrico Ottonello 5525f57ec8 converter from orcid work json to oaf 2020-07-01 18:36:14 +02:00
Michele Artini 3bcdfbabe9 list with limits 2020-07-01 08:42:39 +02:00
Michele Artini 59a5421c24 indexing, accumulators, limited lists 2020-06-30 16:17:09 +02:00
Enrico Ottonello b7b6be12a5 fixed enriched works generation 2020-06-29 18:03:16 +02:00
Michele Artini 6f13673464 accumulators 2020-06-29 16:33:32 +02:00
Sandro La Bruzzo dab783b173 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-06-29 09:05:00 +02:00
Michele Artini a6ea432435 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-06-29 08:44:20 +02:00
Michele Artini 35ae381d28 all events matchers 2020-06-29 08:43:56 +02:00
Claudio Atzori 7817338e05 added test to verify the relation pre-processing 2020-06-26 17:58:33 +02:00
Enrico Ottonello b2213b6435 merged with dnet version 2020-06-26 17:27:34 +02:00
Enrico Ottonello c5e149c46e Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi 2020-06-26 16:15:38 +02:00
Claudio Atzori 8d59fdf34e WIP: dataset based PrepareRelationsJob 2020-06-26 14:32:58 +02:00
Michele Artini 2393d9da2f limits 2020-06-26 11:20:45 +02:00
Enrico Ottonello d6498278ed added workflow to generate seq(orcidId,work) and seq(orcidId,enrichedWork) 2020-06-25 18:43:29 +02:00
Sandro La Bruzzo 96ce124b59 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-06-25 17:00:43 +02:00
Miriam Baglioni 4a7de07ea2 refactoring 2020-06-25 16:32:40 +02:00
Miriam Baglioni 54a12978d3 fixed issue in xquery 2020-06-25 16:30:20 +02:00
Michele Artini 408165a756 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-06-25 15:53:35 +02:00
Michele Artini e8fb305f18 compilation of event map 2020-06-25 15:53:20 +02:00
Michele Artini 4eb3e109d7 compilation of event map 2020-06-25 15:45:50 +02:00
Claudio Atzori d839e88783 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-06-25 14:06:30 +02:00
Claudio Atzori 6f5771c1c9 sets author.rank when null 2020-06-25 14:06:21 +02:00
Michele Artini e28033c6d8 some fixes 2020-06-25 13:01:09 +02:00
Claudio Atzori 216975c4ec restored complete provision workflow 2020-06-25 12:55:52 +02:00
Claudio Atzori 2d77d3a388 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-06-25 12:54:30 +02:00
Claudio Atzori 93f627ea51 code formatting 2020-06-25 12:54:21 +02:00
Miriam Baglioni 05a99cfb61 change the position of value and description elements in the workflow definition 2020-06-25 12:36:08 +02:00
Claudio Atzori 7df2712824 Merge branch 'provision_indexing' 2020-06-25 12:22:41 +02:00
Claudio Atzori e62333192c WIP: prepare relation job 2020-06-25 12:22:18 +02:00
Claudio Atzori 6933ec11fb WIP: prepare relation job 2020-06-25 11:04:12 +02:00
Sandro La Bruzzo a6c0faac70 added test to verify secondary sorting 2020-06-25 10:48:15 +02:00
Claudio Atzori 69b0391708 WIP: prepare relation job 2020-06-25 10:19:56 +02:00
Michele Artini abcbebcbb4 fixed generation of ids 2020-06-25 09:50:46 +02:00
Michele Artini 77d2a1b1c4 params to choose sql queries for beta or production 2020-06-25 09:28:13 +02:00
Claudio Atzori 46e76affeb WIP: prepare relation job 2020-06-24 19:01:15 +02:00
Claudio Atzori 0e723d378b added default from vocab for missing instance.refereed; remove spurious prefixes from orcid values; WIP: prepare relation job 2020-06-24 18:34:42 +02:00
Enrico Ottonello fcbb4c1489 parser of orcid publication data from xml original dump 2020-06-24 16:29:32 +02:00
Michele Artini 202f6e62ff Splitted join wf 2020-06-24 15:47:06 +02:00
Sandro La Bruzzo 96689a8994 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-06-24 14:06:50 +02:00
Sandro La Bruzzo 46631a4421 updated mapping scholexplorer to OAF 2020-06-24 14:06:38 +02:00
Michele Artini e53dd62e87 minor changes 2020-06-24 09:24:45 +02:00
Michele Artini 8b9933b934 refactoring aggregators 2020-06-24 08:57:13 +02:00
Miriam Baglioni 3e5570de7a - 2020-06-23 15:44:54 +02:00
Michele Artini d13e3d3f68 fixed paths 2020-06-23 11:01:42 +02:00
Michele Artini 8386c6f90d filter of valid resultResult relations 2020-06-23 10:24:15 +02:00
Michele Artini 38bb45d0b6 test osf:refereed 2020-06-23 10:14:39 +02:00
Michele Artini c3286f4c37 fixed relType 2020-06-23 09:32:32 +02:00
Miriam Baglioni 507f7a94a8 added one of the main zenodo communities to the tagging conf for testing purposes 2020-06-23 08:45:27 +02:00
Michele Artini af2f7705fc partial refactoring of some joins 2020-06-23 08:37:35 +02:00
Miriam Baglioni af1d40351b changed XQuery to add also the main Zenodo community among the communities associated to the openaire community 2020-06-22 19:20:54 +02:00
Miriam Baglioni e4b21be004 - 2020-06-22 17:31:50 +02:00
Miriam Baglioni afa19b0c84 changed the way to PUT the files to the rest API 2020-06-22 17:20:07 +02:00
Miriam Baglioni 250fd1c854 merge branch with fork master 2020-06-22 16:25:48 +02:00
Claudio Atzori 8a3bc7c183 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-06-22 14:12:33 +02:00
Claudio Atzori e162ba5075 added dnet workflows to orchestrate the execution of graph2hive, updateSolr and updateStats oozie wfs 2020-06-22 14:12:28 +02:00
Michele Artini 3ce20c198e reformatting 2020-06-22 12:14:25 +02:00
Michele Artini ed787398b3 refactoring wf 2020-06-22 11:45:14 +02:00
Claudio Atzori 9cd27183b6 [maven-release-plugin] prepare for next development iteration 2020-06-22 11:27:44 +02:00
Claudio Atzori 1e3dab0631 [maven-release-plugin] prepare release dhp-1.2.3 2020-06-22 11:27:39 +02:00
Miriam Baglioni df80ae5c1b merge branch with fork master 2020-06-22 10:51:23 +02:00
Miriam Baglioni e8f914f8b3 - 2020-06-22 10:50:41 +02:00
Miriam Baglioni edeb862476 excluded dependency in module that generates conflict 2020-06-22 10:49:56 +02:00
Miriam Baglioni 185facb8e5 change the deprecated DefaultHttpClient with the CloseableHttpClient 2020-06-22 10:49:10 +02:00
Claudio Atzori 961a0d0b49 [actionset promotion] log debugging info in case of error in the action payload extraction or parsing the data 2020-06-22 10:20:45 +02:00
Claudio Atzori 5e8b922962 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-06-22 09:50:47 +02:00
Claudio Atzori 7d416f08d8 graph cleaning workflow: set hostedby to unknown repository when defined as NULL 2020-06-22 09:50:43 +02:00
Michele Artini 16c7a18435 refactoring 2020-06-22 08:51:31 +02:00
Miriam Baglioni 669a509430 - 2020-06-19 17:39:46 +02:00
Michele Artini f9fc64ffaf Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-06-19 15:24:43 +02:00
Michele Artini d88fe0ac84 join methods 2020-06-19 15:24:30 +02:00
Sandro La Bruzzo 464eeeec87 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-06-19 15:11:53 +02:00
Sandro La Bruzzo 1681de672d updated mapping scholexplorer to OAF 2020-06-19 15:11:46 +02:00
Michele Artini 4822747313 some fixes 2020-06-19 13:53:56 +02:00
Michele Artini 834f139e6e fixed some NPE 2020-06-19 12:33:29 +02:00
Claudio Atzori d0ac7514b2 cleaning workflow to include cleaning of default values 2020-06-18 19:37:25 +02:00
Miriam Baglioni 44a12d244f - 2020-06-18 18:38:54 +02:00
Michele Artini 52f62d5d8c events 2020-06-18 14:49:13 +02:00
Miriam Baglioni fb80353018 - 2020-06-18 14:21:36 +02:00
Michele Artini 61634fbfe0 removed kryo encoding 2020-06-18 14:09:58 +02:00
Michele Artini 8d2b199dd2 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-06-18 13:15:34 +02:00
Michele Artini e659b02e6b some wf fixing 2020-06-18 13:15:13 +02:00
Michele Artini 9a847b4557 some wf fixing 2020-06-18 13:14:10 +02:00
Miriam Baglioni 65bf312360 merge branch with fork master 2020-06-18 11:35:27 +02:00
Miriam Baglioni 3953f56bd3 added dependency to pom 2020-06-18 11:34:47 +02:00
Miriam Baglioni a118b66858 - 2020-06-18 11:34:30 +02:00
Miriam Baglioni f9578312b5 - 2020-06-18 11:34:15 +02:00
Miriam Baglioni 8b145e6aba - 2020-06-18 11:25:28 +02:00
Miriam Baglioni e8b3e972f2 changed the input params and the workflow definition to tackle the Result as all result product produced 2020-06-18 11:25:05 +02:00
Miriam Baglioni 3233b01089 changes due to adding all the result type under Result 2020-06-18 11:22:58 +02:00
Miriam Baglioni 5c8533d1a1 changed in the testing classes 2020-06-18 11:20:08 +02:00
Miriam Baglioni bc8611a95a added new resources for testing 2020-06-18 11:19:20 +02:00
Sandro La Bruzzo 9bf67f5de1 resolved conflicts 2020-06-17 09:15:43 +02:00
Sandro La Bruzzo 1d4275acc4 implemented first version of exportation of Scholexplorer into ActionSet 2020-06-17 09:10:38 +02:00
miconis 5233b15265 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-06-16 18:31:19 +02:00
miconis 11b77b9f4e json dumps for entity merge test modified to fit the new model. title merge adjusted to fix the error 2020-06-16 18:31:11 +02:00
Claudio Atzori 64f02de5d3 updated workflow definition to include the cleaning step 2020-06-16 17:48:51 +02:00
Claudio Atzori 306669209f code formatting 2020-06-16 16:54:44 +02:00
Claudio Atzori 1bc1d15eaf stubbing for mock datasource.identities must be typed as array 2020-06-16 16:54:28 +02:00
Claudio Atzori 631fef12a7 Merge branch 'master' into dhp_oaf_model 2020-06-16 16:11:19 +02:00
Michele Artini 9e2c23e391 partial refactoring 2020-06-16 15:55:42 +02:00
Michele Artini 113c9b1de0 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-06-16 15:53:39 +02:00
Michele Artini 76ea7607f7 partial refactoring 2020-06-16 15:53:13 +02:00
Claudio Atzori 603b1bd0bb Merge branch 'master' into dhp_oaf_model 2020-06-16 15:43:59 +02:00
Claudio Atzori 5441f01586 Merge pull request 'missing landingPage urls in instances' (#22) from instances-with-landing-page into master
Looks good, thanks!
2020-06-16 15:32:44 +02:00
Claudio Atzori 89859111ee Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-06-16 15:28:29 +02:00
Claudio Atzori 4ec262db53 included externalreference(s) in the result view on the Hive graph DB 2020-06-16 15:28:20 +02:00
Michele Artini 8a4f84f8c0 refactoring 2020-06-16 12:34:13 +02:00
Claudio Atzori 2a4f65795f WIP: graph cleaner implementation 2020-06-15 18:32:24 +02:00
Claudio Atzori c15c8c0ad0 map datasource identities (including piwik ids) as original IDs 2020-06-15 16:07:30 +02:00
Miriam Baglioni 9dd3ef22c5 merge branch with fork master 2020-06-15 11:23:26 +02:00
Miriam Baglioni 68cf0fd03f test input 2020-06-15 11:14:42 +02:00
Miriam Baglioni 0467145ae3 test for graph dump 2020-06-15 11:13:51 +02:00
Miriam Baglioni e43eedb5b0 added resources and workflow for dump of community products 2020-06-15 11:13:21 +02:00
Miriam Baglioni f96ca900e1 fixed issues while running on cluster 2020-06-15 11:12:14 +02:00
Miriam Baglioni 20b9e67728 added new class funder 2020-06-15 11:06:18 +02:00
Claudio Atzori 0d52816244 WIP: graph cleaner implementation 2020-06-13 13:06:04 +02:00
Claudio Atzori bed65a1be6 WIP: graph cleaner implementation 2020-06-12 18:25:47 +02:00
Claudio Atzori c4d9f1837f [maven-release-plugin] prepare for next development iteration 2020-06-12 12:21:08 +02:00
Claudio Atzori f0746a7605 [maven-release-plugin] prepare release dhp-1.2.2 2020-06-12 12:21:03 +02:00
Claudio Atzori 463489f59f code formatting 2020-06-12 12:03:25 +02:00
Claudio Atzori 4bcad1c9c3 Merge branch 'graph_cleaning' 2020-06-12 11:40:25 +02:00
Claudio Atzori cdb1956fe9 WIP: graph cleaner implementation 2020-06-12 11:36:59 +02:00
Alessia Bardi b347499745 do not use deprecated subreltype 2020-06-12 10:58:02 +02:00
Claudio Atzori 97b1c4057c WIP: graph cleaner implementation 2020-06-12 10:45:18 +02:00
Claudio Atzori ba8a024af9 avoid NPEs merging titles 2020-06-12 10:45:11 +02:00
Michele Artini 30ea1bda88 oozie workflow 2020-06-12 10:42:35 +02:00
Michele Artini c22cb5a3c6 refactoring 2020-06-12 09:47:55 +02:00
Michele Artini 472cf77639 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-06-11 14:30:47 +02:00
Michele Artini c6b5bb3f17 orcid events 2020-06-11 14:30:24 +02:00
Michele Artini c2e1b66e83 Revert "orcid events"
This reverts commit 48959e9a17.
2020-06-11 14:28:03 +02:00
Michele Artini 48959e9a17 orcid events 2020-06-11 14:24:02 +02:00
Miriam Baglioni e145972962 - 2020-06-11 13:08:39 +02:00
Miriam Baglioni a01800224c - 2020-06-11 13:02:04 +02:00
Miriam Baglioni 356dd582a3 map construction moved in class 2020-06-11 12:59:22 +02:00
Alessia Bardi e79943965b Fixes #5604: field oamandatepublications in XML 2020-06-11 12:49:31 +02:00
Michele Artini a41e0cb648 missing landingPage urls in instances 2020-06-11 12:28:34 +02:00
Michele Artini 04fdcacd83 results with all joined entities 2020-06-11 11:25:18 +02:00
Michele Artini 99f88e1cb8 fixed generation entities from claims 2020-06-11 10:51:57 +02:00
Miriam Baglioni db27663750 - 2020-06-11 10:49:01 +02:00
Miriam Baglioni bb9f21d0e7 job test for class producing first step of results dump 2020-06-11 10:20:05 +02:00
Claudio Atzori d1d92c4d8c fixed integration of claims in the graph 2020-06-11 10:12:00 +02:00
Claudio Atzori 953da4a427 Merge branch 'master' into graph_cleaning 2020-06-10 21:36:56 +02:00
Claudio Atzori f1bce64391 WIP: graph cleaner implementation 2020-06-10 21:36:31 +02:00
Claudio Atzori 67c7b31ba6 Merge branch 'master' into graph_cleaning 2020-06-10 15:00:35 +02:00
Claudio Atzori 3ebf81d2b0 Merge pull request 'oaf-store-interpretation' (#21) from oaf-store-interpretation into master
Looks good, thanks Michele!
2020-06-10 14:58:09 +02:00
Michele Artini 5869cb76b3 reformatting 2020-06-10 12:11:16 +02:00
Michele Artini c08e66e01e fixed a workflow parameter 2020-06-10 10:11:56 +02:00
Michele Artini 7177a32d75 import of invisible stores 2020-06-10 10:04:00 +02:00
Claudio Atzori ce12f236bb disabled test, need to update the joined_entity.json file 2020-06-09 20:07:36 +02:00
Claudio Atzori a2fdf85ba1 WIP: graph cleaner implementation 2020-06-09 19:52:53 +02:00
Alessia Bardi 4551c1082f mapping csv for orcid 2020-06-09 18:08:47 +02:00
Alessia Bardi 2d3f7d1eb4 fixed log classes to make the ORCID test run 2020-06-09 18:07:14 +02:00
Alessia Bardi a3a6755d58 mapping csv for Unpaywall 2020-06-09 17:45:44 +02:00
Claudio Atzori d9f33582c5 WIP: graph cleaner implementation 2020-06-09 17:20:40 +02:00
Alessia Bardi f3b033cf09 added csv line for funders from Crossref 2020-06-09 17:08:26 +02:00
Alessia Bardi 79969d78b9 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-06-09 17:05:39 +02:00
Alessia Bardi fc4d220964 updated function name for SNSF 2020-06-09 17:05:31 +02:00
Michele Artini baaa55f4a3 use of pace to calculate trusts 2020-06-09 16:01:31 +02:00
Alessia Bardi 33b130ec43 Mapping instructions for MAG 2020-06-09 15:57:15 +02:00
Miriam Baglioni 206abba48c merge branch with fork master 2020-06-09 15:41:14 +02:00
Miriam Baglioni a089db18f1 workflow and parameters to execute the dump 2020-06-09 15:39:38 +02:00