Commit Graph

276 Commits

Author SHA1 Message Date
Sandro La Bruzzo 3010a362bc updated the provision workflow in the aggregation phase: removed JSON serialization of RDDs and used Spark Datasets instead 2020-07-30 09:25:56 +02:00
Sandro La Bruzzo 487226f669 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-07-30 09:25:39 +02:00
Sandro La Bruzzo 16ae3c9ccf updated the provision workflow in the aggregation phase: removed JSON serialization of RDDs and used Spark Datasets instead 2020-07-30 09:25:32 +02:00
Michele Artini 35e6e9c064 tests 2020-07-28 12:02:15 +02:00
Claudio Atzori 56bbfdc65d introduced parameter 'numParitions', driving the hive DB table data partitioning. Currently specified only for table 'project' 2020-07-23 08:54:10 +02:00
Sandro La Bruzzo 9ab594ccf6 fixed test 2020-07-21 10:36:21 +02:00
Claudio Atzori ebf60020ac map results as OPRs in case of missing //CobjCategory/@type and the vocabulary dnet:result_typologies doesn't resolve the super type 2020-07-20 19:01:10 +02:00
Claudio Atzori 32f5e466e3 imports cleanup 2020-07-20 17:42:58 +02:00
Claudio Atzori 54ac583923 code formatting 2020-07-20 17:37:08 +02:00
Claudio Atzori 124e7ce19c in case of missing attribute //dr:CobjCategory/@type the resulttype is derived by looking up the vocabulary dnet:result_typologies with the 1st instance type available 2020-07-20 17:33:37 +02:00
Claudio Atzori 050dda223d Merge pull request 'removed duplicated fields' (#25) from unique_field_in_lists into master
Looks good as a temporary workaround. I agree the model could perform the distinct operation seamlessly by using HashSets instead of Linked (or Array) Lists.

The task to update the model in this way has been added to #9#issuecomment-1583

Thanks!
2020-07-20 12:12:50 +02:00
Claudio Atzori e0c4cf6f7b added parameter to drive the graph merge strategy: priority (BETA|PROD) 2020-07-20 10:48:01 +02:00
Claudio Atzori 94ccdb4852 Merge branch 'master' into merge_graph 2020-07-20 10:14:55 +02:00
Michele Artini 331a3cbdd0 fixed originalId 2020-07-20 09:50:29 +02:00
Sandro La Bruzzo 9116d75b3e Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-07-17 18:01:30 +02:00
Miriam Baglioni 47c7122773 changed priority from beta to production 2020-07-17 12:56:35 +02:00
Michele Artini 442f30930c removed duplicated fields 2020-07-17 12:25:36 +02:00
Claudio Atzori 1781609508 code formatting 2020-07-16 19:06:56 +02:00
Claudio Atzori 878f2b931c Merge branch 'master' into merge_graph 2020-07-16 16:34:24 +02:00
Claudio Atzori 31071e363f Merge branch 'provision_indexing' 2020-07-10 19:03:57 +02:00
Claudio Atzori cc77446dc4 added dbSchema parameter to the raw_db workflow 2020-07-10 19:01:50 +02:00
Michele Artini e1ae964bc4 stats 2020-07-10 16:12:08 +02:00
Sandro La Bruzzo c01efed79b Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-07-10 14:44:57 +02:00
Sandro La Bruzzo a7d3977481 added generation of EBI Dataset 2020-07-10 14:44:50 +02:00
Claudio Atzori 67e1d222b6 bulk cleaning of null or empty values; sets bestaccessrights by evaluating the result instances 2020-07-08 17:53:35 +02:00
Claudio Atzori 610d377d57 first implementation of the BETA & PROD graphs merge procedure 2020-07-08 16:54:26 +02:00
Claudio Atzori ed1c7e5d75 fixed workflow for the import of the claims alone 2020-07-02 12:40:21 +02:00
Sandro La Bruzzo 1d420eedb4 added generation of EBI Dataset 2020-07-02 12:37:43 +02:00
Claudio Atzori e4a29a4513 fixed workflow for the import of the claims alone 2020-07-02 12:36:33 +02:00
Claudio Atzori 6f5771c1c9 sets author.rank when null 2020-06-25 14:06:21 +02:00
Claudio Atzori 2d77d3a388 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-06-25 12:54:30 +02:00
Miriam Baglioni 05a99cfb61 change the position of value and description elements in the workflow definition 2020-06-25 12:36:08 +02:00
Claudio Atzori 7df2712824 Merge branch 'provision_indexing' 2020-06-25 12:22:41 +02:00
Michele Artini abcbebcbb4 fixed generation of ids 2020-06-25 09:50:46 +02:00
Michele Artini 77d2a1b1c4 params to choose sql queries for beta or production 2020-06-25 09:28:13 +02:00
Claudio Atzori 0e723d378b added default from vocab for missing instance.refereed; remove spurious prefixes from orcid values; WIP: prepare relation job 2020-06-24 18:34:42 +02:00
Michele Artini 38bb45d0b6 test osf:refereed 2020-06-23 10:14:39 +02:00
Claudio Atzori 9cd27183b6 [maven-release-plugin] prepare for next development iteration 2020-06-22 11:27:44 +02:00
Claudio Atzori 1e3dab0631 [maven-release-plugin] prepare release dhp-1.2.3 2020-06-22 11:27:39 +02:00
Claudio Atzori 7d416f08d8 graph cleaning workflow: set hostedby to unknown repository when defined as NULL 2020-06-22 09:50:43 +02:00
Claudio Atzori d0ac7514b2 cleaning workflow to include cleaning of default values 2020-06-18 19:37:25 +02:00
Sandro La Bruzzo 9bf67f5de1 resolved conflicts 2020-06-17 09:15:43 +02:00
Sandro La Bruzzo 1d4275acc4 implemented first version of the export of Scholexplorer into an ActionSet 2020-06-17 09:10:38 +02:00
Claudio Atzori 1bc1d15eaf stubbing for mock datasource.identities must be typed as array 2020-06-16 16:54:28 +02:00
Claudio Atzori 5441f01586 Merge pull request 'missing landingPage urls in instances' (#22) from instances-with-landing-page into master
Looks good, thanks!
2020-06-16 15:32:44 +02:00
Claudio Atzori 89859111ee Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-06-16 15:28:29 +02:00
Claudio Atzori 4ec262db53 included externalreference(s) in the result view on the Hive graph DB 2020-06-16 15:28:20 +02:00
Michele Artini 8a4f84f8c0 refactoring 2020-06-16 12:34:13 +02:00
Claudio Atzori 2a4f65795f WIP: graph cleaner implementation 2020-06-15 18:32:24 +02:00
Claudio Atzori c15c8c0ad0 map datasource identities (including piwik ids) as original IDs 2020-06-15 16:07:30 +02:00