3182 Commits

Author SHA1 Message Date
Miriam Baglioni 69e9ea9eeb [Graph Dump] Test for extraction of rels from entities extended 2021-12-23 10:15:30 +01:00
Miriam Baglioni 31b26d48ac [Graph Dump] fixed issue on extraction of relation between entities and contexts: the relationship name and type were swapped 2021-12-23 10:09:47 +01:00
Miriam Baglioni 10579c0dd0 [FOS]fixed doi value in test 2021-12-22 23:10:16 +01:00
Miriam Baglioni 6116fc5d40 [FOS]added logic to include only different subjects. Test refactoring and extension 2021-12-22 23:04:22 +01:00
Miriam Baglioni b81efb6a9d [FOS]changed the mapping between the csv and the model. Changed Test classes and resources 2021-12-22 21:40:35 +01:00
Miriam Baglioni de6c4c8968 [FOS]creation of the unresolved entities: remove the split for the doi: no longer needed since each row is related to one doi 2021-12-22 16:44:44 +01:00
Miriam Baglioni 34ac56565d refactoring 2021-12-22 16:28:11 +01:00
Miriam Baglioni 20ef1d657f refactoring 2021-12-22 16:26:36 +01:00
Miriam Baglioni 813f856d3f [BipFinder] removing left over parameter in wf 2021-12-22 16:11:12 +01:00
Miriam Baglioni 2c126ed014 [BipFinder] create unresolved entities with measures at the level of the instance 2021-12-22 16:03:41 +01:00
Miriam Baglioni 0807fdb65a [BipFinder] remove not needed resources 2021-12-22 15:37:00 +01:00
Miriam Baglioni b5e11a3a0a [BipFinder] put in common package BipFinder model 2021-12-22 15:33:05 +01:00
Miriam Baglioni c5739c4266 [BipFinder] create action set for the measures at the level of the result 2021-12-22 15:08:33 +01:00
Miriam Baglioni da5f6260aa merging with branch beta 2021-12-22 13:12:02 +01:00
Miriam Baglioni be0acccf42 Merge branch 'beta' into dump 2021-12-22 12:39:57 +01:00
Antonis Lempesis 16539d7360 added usage stats 2021-12-22 02:54:42 +02:00
Antonis Lempesis 3edd661608 fixed column names 2021-12-21 22:55:04 +02:00
Antonis Lempesis a4c0cbb98c fixed typos in indicators. Added extra views in monitor 2021-12-21 15:54:38 +02:00
Miriam Baglioni e24a7f3496 merging with branch beta 2021-12-21 13:57:19 +01:00
Miriam Baglioni d1ae219cb4 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2021-12-21 13:55:53 +01:00
Miriam Baglioni 460e6b95d6 [Graph Dump] - 2021-12-21 13:48:03 +01:00
Sandro La Bruzzo 3920d68992 Fixed workflow generation of delta in datacite 2021-12-21 11:41:49 +01:00
Antonis Lempesis 58996972d9 added first indicator of sprint 5 2021-12-21 03:35:04 +02:00
dimitrispie c1cdec09a9 Sprint 5 and other changes 2021-12-20 19:23:57 +02:00
Miriam Baglioni 3cc1b7b153 merging with branch beta 2021-12-15 17:25:02 +01:00
Miriam Baglioni 63b648b0dd Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2021-12-15 12:41:15 +01:00
Antonis Lempesis f0b523cfa7 removed the too restrictive clause. will discuss again 2021-12-15 12:32:15 +01:00
Sandro La Bruzzo b881ee5ef8 [scholexplorer]
- implemented generation of scholix of delta update of datacite
2021-12-15 11:25:32 +01:00
Sandro La Bruzzo 63952018c0 [scholexplorer]
-moved SparkRetrieveDataciteDelta in scala folder
2021-12-15 11:25:32 +01:00
Sandro La Bruzzo e5bff64f2e [scholexplorer]
- Minor fix on SparkConvertRDDtoDataset
-first implementation of retrieve datacite dump
2021-12-15 11:25:32 +01:00
Claudio Atzori 1790fa2d44 Merge branch 'beta' into affiliationPropagation 2021-12-14 15:26:56 +01:00
Miriam Baglioni 56409d1281 [Dump] resolved conflicts with beta and merging 2021-12-14 15:03:45 +01:00
Miriam Baglioni 22d4b5619b [BipFinder Result] last changes to test and resources files 2021-12-14 14:54:13 +01:00
Miriam Baglioni 6fb6236cd4 changed the way to produce the AS for bipFinder. 2021-12-14 14:51:14 +01:00
Miriam Baglioni 573bd17cbb Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2021-12-14 11:12:25 +01:00
Miriam Baglioni 4eb8276493 - 2021-12-14 11:12:17 +01:00
Miriam Baglioni 936578aaf1 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2021-12-13 15:01:47 +01:00
Miriam Baglioni 8d755cca80 - 2021-12-13 15:01:40 +01:00
Claudio Atzori 98eb292c59 avoid NPEs merging XMLInstance(s) 2021-12-13 13:27:20 +01:00
Claudio Atzori 5e17247bb6 avoid NPEs merging XMLInstance(s) 2021-12-13 11:48:40 +01:00
Claudio Atzori b70ecccea0 avoid NPEs merging XMLInstance(s) 2021-12-12 12:37:38 +01:00
Claudio Atzori c1b6ae47cd cleaning workflow assigns the proper default instance type when a value could not be cleaned using the vocabularies 2021-12-09 16:47:41 +01:00
Claudio Atzori eb43eda42a Merge branch 'beta' into graph_cleaning 2021-12-09 16:46:48 +01:00
Claudio Atzori 41c70c607d cleaning workflow assigns the proper default instance type when a value could not be cleaned using the vocabularies 2021-12-09 16:44:28 +01:00
Alessia Bardi cba63e9f82 Merge branch 'beta' into sygma_indexing 2021-12-09 15:52:16 +01:00
Alessia Bardi e53228401b style 2021-12-09 15:46:22 +01:00
Claudio Atzori cd9c51fd7a vocabulary based cleaning considers also the term label when looking up for a synonym 2021-12-09 14:49:24 +01:00
Claudio Atzori e6e177dda0 vocabulary based cleaning considers also the term label when looking up for a synonym 2021-12-09 13:57:53 +01:00
Alessia Bardi 6b5d7688a4 #7275 serialize license information in XML records 2021-12-09 13:46:48 +01:00
Miriam Baglioni b113586207 resolved conflicts 2021-12-07 10:16:14 +01:00
Sandro La Bruzzo 5d51b3dd4a Merge pull request 'scala_refactor' (#169) from scala_refactor into beta
Reviewed-on: D-Net/dnet-hadoop#169
2021-12-06 15:33:44 +01:00
Miriam Baglioni d9836f0cf3 [OpenCitations] fixed test when executed one after the other 2021-12-06 15:27:09 +01:00
Miriam Baglioni d1df01ff1e [Graph Dump] fixed resource for test 2021-12-06 15:15:48 +01:00
Sandro La Bruzzo ed0c352799 [test-fixing] fixed wrong test 2021-12-06 15:07:41 +01:00
Miriam Baglioni 96a7d46278 [Graph Dump] fixed tests 2021-12-06 15:06:32 +01:00
Sandro La Bruzzo e9f285ec4d [scala-refactor] Module dhp-doiboost:
Moved all scala source into src/main/scala and src/test/scala
2021-12-06 14:24:03 +01:00
Sandro La Bruzzo bf880e2508 [scala-refactor] Module dhp-graph-mapper:
Moved all scala source into src/main/scala and src/test/scala
2021-12-06 13:57:41 +01:00
Sandro La Bruzzo 7af0bbd0b1 [scala-refactor] Module dhp-aggregation:
Moved all scala source into src/main/scala and src/test/scala
2021-12-06 11:26:36 +01:00
Claudio Atzori 08795cbd30 using helper method from ModelSupport to find the inverse relation descriptor 2021-12-06 10:39:56 +01:00
Miriam Baglioni f430688ff7 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2021-12-03 12:36:08 +01:00
Miriam Baglioni 4bb1d43afc - 2021-12-03 12:35:51 +01:00
Sandro La Bruzzo f7011b90d8 format code 2021-12-03 11:15:09 +01:00
Claudio Atzori dd0b2e5244 Merge branch 'beta' into instance_group_by_url 2021-12-03 09:27:58 +01:00
Claudio Atzori 863a2f9db3 avoid to filter OAF records defined as invisible = true 2021-12-03 09:08:12 +01:00
Claudio Atzori 9cac283bec implemented Instance serialization features requested in https://support.openaire.eu/issues/7156 2021-12-02 17:20:33 +01:00
Miriam Baglioni d9f80488cc [GRAPH DUMP] Add one more test to check the filtering of the relations 2021-12-02 14:15:19 +01:00
Miriam Baglioni 58bc3f223a [GRAPH DUMP] Add filtering for relation we do not want to dump. It is based on the relclass 2021-12-02 14:09:46 +01:00
Miriam Baglioni 8905a39bf3 merging with branch beta 2021-12-02 13:17:29 +01:00
Miriam Baglioni 87eedad898 - 2021-12-02 13:17:19 +01:00
Claudio Atzori 3b19821f3c added stats computation on the graph hive DB tables 2021-12-02 10:44:10 +01:00
Claudio Atzori cfa4560769 minor: fixed hive action name 2021-12-02 10:43:36 +01:00
Claudio Atzori d85af6fc25 [cleaning wf] fixed OAF record navigation, a mapping defined on a container object would have prevented the navigation to continue on its properties 2021-12-01 15:49:15 +01:00
Claudio Atzori 4fe7888817 code formatting 2021-12-01 15:48:15 +01:00
Claudio Atzori 01e5e0142a added test to verify the relation inverse lookup operation 2021-12-01 09:46:26 +01:00
Claudio Atzori 0df9574a6f Merge pull request '[stats wf] Added sprint 3&4 of indicators' (#166) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#166
2021-11-29 10:40:26 +01:00
Claudio Atzori 1de881b796 resolved conflicts for #165 2021-11-26 16:15:11 +01:00
Claudio Atzori 014e872ae1 [resolution wf] added optional parameter to skip the entity resolution 2021-11-26 15:38:56 +01:00
Claudio Atzori 5c6d328537 code formatting 2021-11-26 15:38:16 +01:00
dimitrispie 09fc2afdca Added indi_funder_country_collab
Kept only indi_pub_has_cc_licence
2021-11-26 16:13:10 +02:00
Antonis Lempesis 0b4163ee0b added sprint3,4, removed 2, chaos 2021-11-26 15:58:01 +02:00
dimitrispie 29f69f2f89 Sprint 4 2021-11-26 15:22:04 +02:00
Miriam Baglioni ac07ed8251 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2021-11-25 12:32:58 +01:00
Miriam Baglioni 5fd0e610bf [DOIBOOST Process] fix filtering to filter results with non null id 2021-11-25 12:10:45 +01:00
Sandro La Bruzzo feea154e89 remove working dir after test 2021-11-25 11:02:38 +01:00
Sandro La Bruzzo 028a8acad8 add test resources 2021-11-25 10:54:47 +01:00
Sandro La Bruzzo 2164a2a889 Datacite: Code Refactor generated a general SparkApplication Scala where all the spark scala have to inherit
Commented a little the Datacite transformation code
2021-11-25 10:54:13 +01:00
Miriam Baglioni 3f9b2ba8ce [Hosted By Map] fix issue in test 2021-11-22 16:59:43 +01:00
Sandro La Bruzzo a7cf277d98 Datacite: Removed HostedBy Patch as described on ticket #7219, Now all the records will have hosted by Unknown Repository 2021-11-22 16:03:17 +01:00
Sandro La Bruzzo 483d3039d1 entity resolution: added distcpt of missing entities in graph materialization 2021-11-22 15:55:24 +01:00
Sandro La Bruzzo 93fe8ce8b2 entity resolution: fix test 2021-11-22 15:50:43 +01:00
Sandro La Bruzzo 35e20b0647 updated resolution wf:
- generate a new version of the graph
 - changed merge from union to join
2021-11-22 11:48:55 +01:00
Miriam Baglioni fdb75b180e [Cleaning] added couple of tests for DOIBOOST publications 2021-11-21 16:35:22 +01:00
Miriam Baglioni 0506fa2654 [Graph Dump] changed to mirror the changes in the model 2021-11-19 15:56:25 +01:00
Sandro La Bruzzo 3426451d3f Merge remote-tracking branch 'origin/beta' into beta 2021-11-19 14:49:04 +01:00
Sandro La Bruzzo 4542a2338b updated site configuration to deploy on website 2021-11-19 13:44:08 +01:00
Claudio Atzori e5a2c596b2 Merge branch 'beta' into preserve_openorg_parent_child_relations 2021-11-19 11:35:46 +01:00
Claudio Atzori f4538f3c4c cleanup 2021-11-19 11:33:10 +01:00
Claudio Atzori 2b46b87f56 fixed filtering criteria applied in SparkCopyRelationsNoOpenorgs to keep the parent/child relations from OpenOrgs 2021-11-19 11:30:29 +01:00
Miriam Baglioni 9fae872181 [Graph Dump] changed to mirror the changes in the model 2021-11-19 11:25:50 +01:00
Sandro La Bruzzo fc03c99805 fixed javadocs url after deploying site 2021-11-19 10:46:33 +01:00
Sandro La Bruzzo 0c0d561bc4 added public class into tests to create correct javadoc 2021-11-19 09:54:22 +01:00
Claudio Atzori 62fa61f3cf merge from beta 2021-11-19 09:23:42 +01:00
Claudio Atzori bd9a43cefd Revert to 4094f2bb9a 2021-11-19 09:20:43 +01:00
Claudio Atzori 3a4d925386 Merge branch 'beta' into hierarchical_orgs_relations 2021-11-18 18:07:08 +01:00
Claudio Atzori 3974fa7dc1 Merge branch 'beta' into affiliationPropagation 2021-11-18 18:06:26 +01:00
Claudio Atzori a24b9f8268 [dedup] trivial refactoring 2021-11-18 17:12:02 +01:00
Claudio Atzori c0750fb17c avoid non necessary count operations over large spark datasets 2021-11-18 17:11:31 +01:00
Claudio Atzori bb5dca7979 cleanup 2021-11-18 17:10:46 +01:00
Miriam Baglioni 793b5a8e5f Update 'dhp-workflows/dhp-graph-mapper/src/main/java/eu/dnetlib/dhp/oa/graph/dump/ResultMapper.java'
Removing the dump of Measure at the level of the result. We decided not to map it
2021-11-18 14:49:38 +01:00
Miriam Baglioni 5dc5792722 [Graph Dump] Change test resource to mirror the movement of the measure element 2021-11-18 14:39:12 +01:00
Miriam Baglioni 0136a8c266 [Graph Dump] Change test to mirror that measure is at the level of the instance 2021-11-18 14:38:33 +01:00
Miriam Baglioni 1b79c0ee79 merging with branch beta 2021-11-18 11:01:00 +01:00
Antonis Lempesis cb3adb90f4 Merge branch 'beta' into beta 2021-11-17 14:33:45 +01:00
Antonis Lempesis c283406829 added Universidad Politécnica de Madrid 2021-11-17 15:33:00 +02:00
Claudio Atzori e0395719d7 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2021-11-17 14:17:27 +01:00
Claudio Atzori 82a4e4efae [cleaning wf] fixed methodology to rule out invalid result titles, based on https://support.openaire.eu/issues/7206 2021-11-17 14:17:22 +01:00
Miriam Baglioni 6d4a1c57ee [Resolve Entities] Change test dataset to mirror the modification in the creation of the map between the pids and the unresolved 2021-11-17 12:41:52 +01:00
Sandro La Bruzzo 9c82d670b8 make class public in order to create javadoc 2021-11-17 12:31:02 +01:00
Sandro La Bruzzo 1f5ee116ed code refactor, created and moved scala code on the correct maven folder under src/main/scala and src/test/scala
fixed test
2021-11-17 12:23:52 +01:00
Sandro La Bruzzo 2fd9ceac13 code refactor, created and moved scala code on the correct maven folder under src/main/scala and src/test/scala 2021-11-17 11:35:22 +01:00
Sandro La Bruzzo 2506d7a679 Merge branch 'mvn_site_documentation' of code-repo.d4science.org:D-Net/dnet-hadoop into mvn_site_documentation 2021-11-17 11:07:24 +01:00
Sandro La Bruzzo cded363b55 code refactor, created and moved scala code on the correct maven folder under src/main/scala and src/test/scala 2021-11-17 11:06:35 +01:00
Miriam Baglioni 4094f2bb9a added integration md file 2021-11-17 10:04:52 +01:00
Miriam Baglioni ec8b0219ff [Documentation] Added first page for Integration via unresolved entities generation 2021-11-16 17:41:34 +01:00
Miriam Baglioni 2bbece2ca5 merging with branch beta 2021-11-16 16:35:40 +01:00
Sandro La Bruzzo 2d67020c59 added dhp-enrichment maven site template 2021-11-16 16:01:08 +01:00
Miriam Baglioni 28ea532ece [Affiliation Propagation] moved the selection of graph relation as a preparation step 2021-11-16 15:24:19 +01:00
Sandro La Bruzzo 18c1d70ef4 Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into mvn_site_documentation 2021-11-16 15:16:49 +01:00
Sandro La Bruzzo a1cafaf2e3 added mvn site for dnet-hadoop project 2021-11-16 15:16:28 +01:00
Miriam Baglioni 7c96e3fd46 removed not useful dir 2021-11-16 13:57:26 +01:00
Miriam Baglioni c7c0c3187b [AFFILIATION PROPAGATION] Applied some SonarLint suggestions 2021-11-16 13:56:32 +01:00
Miriam Baglioni c6a9f0a1a8 merging with branch beta 2021-11-16 12:04:40 +01:00
Miriam Baglioni 99d86134f5 [Graph Dump] changed the dump since the measures have been moved at the level of the instance 2021-11-16 12:04:21 +01:00
Claudio Atzori 0a727d325d [dedup] increased number of partitions in the consistency phase 2021-11-16 08:43:41 +01:00
Claudio Atzori bafa2990f3 code formatting 2021-11-15 17:07:16 +01:00
Claudio Atzori 668ac25224 [graph resolution] using existing argument parser file name 2021-11-15 17:02:45 +01:00
Claudio Atzori 7d0a03f607 [graph resolution] minor 2021-11-15 14:45:54 +01:00
Claudio Atzori 941a50a2fc Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2021-11-15 14:42:49 +01:00
Claudio Atzori 7c804acda8 [graph resolution] minor 2021-11-15 14:42:43 +01:00
Sandro La Bruzzo efa09057db Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta 2021-11-15 14:32:09 +01:00
Sandro La Bruzzo 48923e46a1 added documentation to Pubmed Class and also added mvn site for dhp-aggregations 2021-11-15 14:32:01 +01:00
Claudio Atzori d2c787d416 [graph resolution] fixed sequence of the workflow steps 2021-11-15 14:31:15 +01:00
Claudio Atzori 975b10b711 [actionmanager] increased spark.sql.shuffle.partitions to 5000 2021-11-15 12:31:45 +01:00
Miriam Baglioni 4ec88c718c merge with beta - resolved conflict in pom 2021-11-15 10:52:16 +01:00
Miriam Baglioni 6f1a434e90 [Bypass Action Set] Fixed test to consider the new identifier utils 2021-11-15 09:59:23 +01:00
Miriam Baglioni 157d33ebf9 [Bypass Action Set] Refactoring 2021-11-15 09:58:48 +01:00
Miriam Baglioni 6595135a1a [Dump Schemas] changed the schema of the dumped result according to the modifications in the bestAccessRight type 2021-11-12 11:45:38 +01:00
Miriam Baglioni 43cae4ad88 Merge branch 'dump' of https://code-repo.d4science.org/D-Net/dnet-hadoop into dump 2021-11-12 11:36:54 +01:00
Miriam Baglioni b3f9370125 merge with beta - resolved conflict in pom 2021-11-12 11:25:26 +01:00
Miriam Baglioni 92d0e18b55 [Bypass Action Set] used constant DOI instead of "doi" 2021-11-12 10:56:58 +01:00