Commit Graph

183 Commits

Author SHA1 Message Date
Sandro La Bruzzo 16ae3c9ccf updated changing in the workflow of provision in the phase of aggregation. Removed serialization in JSON RDD and used spark Dataset 2020-07-30 09:25:32 +02:00
Claudio Atzori 56bbfdc65d introduced parameter 'numParitions', driving the hive DB table data partitioning. Currently specified only for table 'project' 2020-07-23 08:54:10 +02:00
Claudio Atzori ebf60020ac map results as OPRs in case of missing //CobjCategory/@type and the vocabulary dnet:result_typologies doesn't resolve the super type 2020-07-20 19:01:10 +02:00
Claudio Atzori 32f5e466e3 imports cleanup 2020-07-20 17:42:58 +02:00
Claudio Atzori 54ac583923 code formatting 2020-07-20 17:37:08 +02:00
Claudio Atzori 124e7ce19c in case of missing attribute //dr:CobjCategory/@type the resulttype is derived by looking up the vocabulary dnet:result_typologies with the 1st instance type available 2020-07-20 17:33:37 +02:00
Claudio Atzori 050dda223d Merge pull request 'removed duplicated fields' (#25) from unique_field_in_lists into master
Looks good as a temporary workaround. I agree the model could seamlessly make the distinct operation by using HashSets instead of Linked (or Array) Lists.

The task to update the model in such a way is added on #9#issuecomment-1583

Thanks!
2020-07-20 12:12:50 +02:00
Claudio Atzori e0c4cf6f7b added parameter to drive the graph merge strategy: priority (BETA|PROD) 2020-07-20 10:48:01 +02:00
Claudio Atzori 94ccdb4852 Merge branch 'master' into merge_graph 2020-07-20 10:14:55 +02:00
Michele Artini 331a3cbdd0 fixed originalId 2020-07-20 09:50:29 +02:00
Sandro La Bruzzo 9116d75b3e Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-07-17 18:01:30 +02:00
Miriam Baglioni 47c7122773 changed priority from beta to production 2020-07-17 12:56:35 +02:00
Michele Artini 442f30930c removed duplicated fields 2020-07-17 12:25:36 +02:00
Claudio Atzori 1781609508 code formatting 2020-07-16 19:06:56 +02:00
Claudio Atzori 878f2b931c Merge branch 'master' into merge_graph 2020-07-16 16:34:24 +02:00
Michele Artini e1ae964bc4 stats 2020-07-10 16:12:08 +02:00
Sandro La Bruzzo c01efed79b Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-07-10 14:44:57 +02:00
Sandro La Bruzzo a7d3977481 added generation of EBI Dataset 2020-07-10 14:44:50 +02:00
Claudio Atzori 67e1d222b6 bulk cleaning when found null or empty, sets bestaccessrights evaluating the result instances 2020-07-08 17:53:35 +02:00
Claudio Atzori 610d377d57 first implementation of the BETA & PROD graphs merge procedure 2020-07-08 16:54:26 +02:00
Sandro La Bruzzo 1d420eedb4 added generation of EBI Dataset 2020-07-02 12:37:43 +02:00
Claudio Atzori 6f5771c1c9 sets author.rank when null 2020-06-25 14:06:21 +02:00
Claudio Atzori 7df2712824 Merge branch 'provision_indexing' 2020-06-25 12:22:41 +02:00
Michele Artini abcbebcbb4 fixed generation of ids 2020-06-25 09:50:46 +02:00
Michele Artini 77d2a1b1c4 params to choose sql queries for beta or production 2020-06-25 09:28:13 +02:00
Claudio Atzori 0e723d378b added default from vocab for missing instance.refereed; remove spurious prefixes from orcid values; WIP: prepare relation job 2020-06-24 18:34:42 +02:00
Claudio Atzori 7d416f08d8 graph cleaning workflow: set hostedby to unknown repository when defined as NULL 2020-06-22 09:50:43 +02:00
Claudio Atzori d0ac7514b2 cleaning workflow to include cleaning of default values 2020-06-18 19:37:25 +02:00
Sandro La Bruzzo 9bf67f5de1 resolved conflicts 2020-06-17 09:15:43 +02:00
Sandro La Bruzzo 1d4275acc4 implemented first version of exportation of Scholexplorer into ActionSet 2020-06-17 09:10:38 +02:00
Claudio Atzori 5441f01586 Merge pull request 'missing landingPage urls in instances' (#22) from instances-with-landing-page into master
Looks good, thanks!
2020-06-16 15:32:44 +02:00
Claudio Atzori 2a4f65795f WIP: graph cleaner implementation 2020-06-15 18:32:24 +02:00
Claudio Atzori c15c8c0ad0 map datasource identities (including piwik ids) as original IDs 2020-06-15 16:07:30 +02:00
Claudio Atzori 0d52816244 WIP: graph cleaner implementation 2020-06-13 13:06:04 +02:00
Claudio Atzori bed65a1be6 WIP: graph cleaner implementation 2020-06-12 18:25:47 +02:00
Claudio Atzori 463489f59f code formatting 2020-06-12 12:03:25 +02:00
Claudio Atzori 4bcad1c9c3 Merge branch 'graph_cleaning' 2020-06-12 11:40:25 +02:00
Claudio Atzori cdb1956fe9 WIP: graph cleaner implementation 2020-06-12 11:36:59 +02:00
Alessia Bardi b347499745 do not use deprecated subreltype 2020-06-12 10:58:02 +02:00
Claudio Atzori 97b1c4057c WIP: graph cleaner implementation 2020-06-12 10:45:18 +02:00
Claudio Atzori ba8a024af9 avoid NPEs merging titles 2020-06-12 10:45:11 +02:00
Michele Artini a41e0cb648 missing landingPage urls in instances 2020-06-11 12:28:34 +02:00
Michele Artini 99f88e1cb8 fixed generation entities from claims 2020-06-11 10:51:57 +02:00
Claudio Atzori d1d92c4d8c fixed integration of claims in the graph 2020-06-11 10:12:00 +02:00
Claudio Atzori 953da4a427 Merge branch 'master' into graph_cleaning 2020-06-10 21:36:56 +02:00
Claudio Atzori f1bce64391 WIP: graph cleaner implementation 2020-06-10 21:36:31 +02:00
Michele Artini 7177a32d75 import of invisible stores 2020-06-10 10:04:00 +02:00
Claudio Atzori a2fdf85ba1 WIP: graph cleaner implementation 2020-06-09 19:52:53 +02:00
Claudio Atzori d9f33582c5 WIP: graph cleaner implementation 2020-06-09 17:20:40 +02:00
Claudio Atzori b2349659cf WIP: graph property fixing implementation 2020-06-05 18:37:38 +02:00