Commit Graph

879 Commits

Author SHA1 Message Date
Miriam Baglioni 72df8f9232 Hosted By Map - removed the aggregator for the datasource (it is no more needed) and added a new aggregator for the results. Changed also the hostedBYMap aggregator 2021-08-02 19:34:44 +02:00
Miriam Baglioni ff1ce75e33 Hosted By Map - modification in the code to prepare the info needed to apply the HostedByMap. There is no need to join datasources with the hbm: all the information needed is in the hosted by map already 2021-08-02 19:32:59 +02:00
Miriam Baglioni 1695d45bd4 Hosted By Map - Test class to verify the preparation of the intermediate information 2021-07-30 17:57:01 +02:00
Miriam Baglioni 7c6ea2f4c7 Hosted By Map - first attempt for the creation of intermedia information to be used to applu the hosted by map on the graph entities 2021-07-30 17:56:27 +02:00
Miriam Baglioni d8b9b0553b Hosted By Map - model classes to store the intermediate information to be used to apply the hosted by map 2021-07-30 17:55:39 +02:00
Miriam Baglioni 613bd3bde0 Hosted By Map - refactor of the first attemp to prepare a new hosted by map dependent on the datasource in the graph and on two external sources: the gold list from unibi ad the doaj list of open access journal. Both the lists are downloaded from provided url parameter 2021-07-30 17:54:45 +02:00
Miriam Baglioni d1807781c0 mergin with branch beta 2021-07-30 14:34:07 +02:00
Miriam Baglioni 1d6ac3715b merge branch with beta 2021-07-30 11:58:29 +02:00
Claudio Atzori 19620eed46 applying PR#131, Patch the identifiers (source/target) in the relations, refinements 2021-07-30 11:09:32 +02:00
Miriam Baglioni baad01cadc hostedbymap 2021-07-29 13:04:39 +02:00
Claudio Atzori 5d08ad86ae [raw_all] patching relation identifier phase to be run at the end, i.e. includes also claimed relations 2021-07-29 13:03:16 +02:00
Claudio Atzori e87e1805c4 [raw_all] added extra workflow step for patching the identifiers in the relations, given an id mapping dataset 2021-07-29 12:13:06 +02:00
Claudio Atzori e1797c0a42 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2021-07-28 16:21:36 +02:00
Claudio Atzori 6dddad86ee [cleaning] title cleaning based on the me.xuender:unidecode library 2021-07-28 16:21:29 +02:00
Alessia Bardi c806387d4b tests for enermaps 2021-07-28 11:54:36 +02:00
Claudio Atzori 2fff24df55 code formatting 2021-07-28 11:34:19 +02:00
Miriam Baglioni 708d0ade34 Merge branch 'beta' into hostedbymap 2021-07-28 10:37:22 +02:00
Miriam Baglioni 0424f47494 HostedByMap fixing issues 2021-07-28 10:24:13 +02:00
Claudio Atzori 5aa7d16d1b updated assertions in eu.dnetlib.dhp.oa.graph.raw.MappersTest 2021-07-27 15:11:58 +02:00
Miriam Baglioni 74f801b689 mergin with branch beta 2021-07-27 13:18:31 +02:00
Miriam Baglioni eb07f7f40f Hosted By Map 2021-07-27 12:27:26 +02:00
Sandro La Bruzzo 848aabbb6c minor fix 2021-07-25 12:06:41 +02:00
Sandro La Bruzzo 8fac10c91e fixed defintion wf of creation final infospace of scholexplorer 2021-07-25 11:15:37 +02:00
Sandro La Bruzzo 3920c69bc8 change implementation of resolve Relation to generate jsonRdd in output 2021-07-25 09:51:36 +02:00
Sandro La Bruzzo d9e3b89937 implemented last part of workflows to generate scholixGraph 2021-07-23 16:38:32 +02:00
Sandro La Bruzzo cfde63a7c3 fixed resolve relation join 2021-07-23 14:17:29 +02:00
Sandro La Bruzzo 4a439c3863 NPE fixed 2021-07-23 14:17:29 +02:00
Sandro La Bruzzo ca74e8dd02 create a separate wf for resolving relation 2021-07-23 11:40:06 +02:00
Sandro La Bruzzo 43e9380cd3 update resolve relation to use the same format of openaire graph 2021-07-23 11:25:18 +02:00
Sandro La Bruzzo 62ae36a3d2 fixed NPE 2021-07-22 15:41:38 +02:00
Sandro La Bruzzo 31d2d6d41e Scholexplorer: introduction of dedup openaire 2021-07-21 18:09:32 +02:00
Claudio Atzori 65934888a1 adding record identifier among the originalIds regardless of what IdentifierFactory produces 2021-07-19 17:52:52 +02:00
Claudio Atzori 0977baf41d contents mapped from the stores with 'claim' interpretation will not change their identifier along their way towards the graph 2021-07-19 17:43:52 +02:00
Sandro La Bruzzo 7e2caafe84 Scholexplorer: fixed mapping typologies 2021-07-15 09:53:12 +02:00
Sandro La Bruzzo bbe8193930 merged stable ids 2021-07-12 17:00:43 +02:00
Sandro La Bruzzo 09fccf8000 added workflow to serialize scholix and summary in json 2021-07-09 11:01:42 +02:00
Sandro La Bruzzo 0ea576745f updated CreateInputGraph because ggenerics don't work on Spark Dataset 2021-07-09 10:29:24 +02:00
Sandro La Bruzzo cd17e19044 implemented branch workflow to import datacite and crossref in scholexplorer 2021-07-08 21:20:19 +02:00
Sandro La Bruzzo 8a034e46e1 updated baseline workflow 2021-07-08 11:11:41 +02:00
Claudio Atzori b7b8e0986e [raw_all] The claim merge procedure includes the claimed contexts in the merged result 2021-07-08 10:42:31 +02:00
Sandro La Bruzzo 0799ac9fb6 fixed wrong path 2021-07-08 10:36:37 +02:00
Sandro La Bruzzo 4d53402712 extended ebiLinks to create a dataset before generation of OAF 2021-07-08 10:26:21 +02:00
Sandro La Bruzzo a4a54a3786 code refactor 2021-07-08 09:08:25 +02:00
Sandro La Bruzzo a01dbe0ab0 completed workflow of generation of scholix and summaries 2021-07-07 23:10:34 +02:00
Claudio Atzori fdcff42e46 [raw_all] Aggregator graph creation merges claims (updates) with the corresponding entity 2021-07-07 19:01:59 +02:00
Claudio Atzori bc014023c8 Merge pull request 'to solve the scala SI-3623' (#122) from andreas.czerniak/BrStableId_dnet-hadoop:stable_ids into stable_ids
Reviewed-on: #122
2021-07-07 11:13:51 +02:00
Claudio Atzori 32bdfdccbc [raw_all] Aggregator graph creation merges claims (updates) with the corresponding entity 2021-07-07 11:08:27 +02:00
Claudio Atzori f580cb77e1 added mapping for claim relation 'resultResult_publicationDataset_isRelatedTo' (present on BETA) 2021-07-06 21:11:11 +02:00
Sandro La Bruzzo 8535506c22 added scholix generation 2021-07-06 17:18:06 +02:00
Sandro La Bruzzo 4c54bd8742 add test to verify merge scholix on source 2021-07-06 11:32:14 +02:00