Commit Graph

1370 Commits

Author SHA1 Message Date
Claudio Atzori 11695ba649 [graph cleaning] patch also the result's collectedfrom and hostedby datasource name according to the datasource master-duplicate mapping 2022-11-28 10:18:43 +01:00
Claudio Atzori 24ef301cc1 [graph cleaning] patch the result's collectedfrom and hostedby identifiers according to the datasource master-duplicate mapping 2022-11-28 09:54:18 +01:00
Alessia Bardi 3c08269a4d Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2022-11-22 17:31:00 +01:00
Alessia Bardi 2687fc9f73 tests for EOSC Future review - ROhub 2022-11-22 17:30:56 +01:00
Claudio Atzori 7c3390ac10 Merge branch 'beta' into eoscifguidelines-from-mdstores 2022-11-07 12:18:40 +01:00
Sandro La Bruzzo 2b9a20a4a3 Changed the way Scholexplorer filter the relationships, I found that filter all relation coming from openCitation is wrong, because we loose a lot of relation than intersect OpenCitation, but they don't come only from there 2022-10-24 12:53:47 +02:00
Alessia Bardi 208ed32315 fixed xpath for semantic relation 2022-10-23 18:18:13 +02:00
Alessia Bardi ee759ac92d file format after mvn compile 2022-10-23 18:09:47 +02:00
Alessia Bardi 31a10f000b Map the field oaf:eoscifguidelines from mdstores. Currently we can find it in ROHub metadata 2022-10-23 18:05:37 +02:00
Claudio Atzori ae7cd0735a [graph2hive] more partitions 2022-10-14 15:47:58 +02:00
Claudio Atzori b47aaf4dd1 [cleaning] subjects declared as belonging to specific vocabularies whose values are not found in the vocab are set to type keyword 2022-10-13 11:23:43 +02:00
Claudio Atzori 6163ecbf63 [cleaning] renamed parameters in wf action 2022-10-11 11:20:03 +02:00
Claudio Atzori b301e9fdff [cleaning] renamed action name/description 2022-10-11 11:08:52 +02:00
Claudio Atzori ece40adc09 [cleaning] fixing NPE in the country cleaning phase 2022-10-11 10:10:20 +02:00
Claudio Atzori 8d97949316 [cleaning] fixed loop in wf nodes 2022-10-07 09:52:45 +02:00
Alessia Bardi 49360770d7 map w3id as instance url 2022-09-28 14:16:39 +02:00
Miriam Baglioni b5b5a4c192 [CleanCountry] fixed issue 2022-09-28 12:42:51 +02:00
Claudio Atzori 3f90d159e3 code formatting 2022-09-27 15:08:00 +02:00
Claudio Atzori 0b3e44e521 Merge branch 'beta' into relation-from-odf 2022-09-27 14:57:01 +02:00
Claudio Atzori 57dbeb08d2 code formatting 2022-09-27 14:55:10 +02:00
Claudio Atzori 25e9d92aad Merge branch 'beta' into clean_country 2022-09-27 14:27:49 +02:00
Alessia Bardi fd63e9bfac Mapping all relationships supported in ModelConstants and ModelSupport 2022-09-26 11:24:13 +02:00
Alessia Bardi c5eb722170 relationships from relatedIdentifier whose target id type is one of the pid type with an authority 2022-09-23 15:47:05 +02:00
Claudio Atzori c86cc53520 suppressing hyper verbose spark logs during unit test execution 2022-09-23 15:20:40 +02:00
Alessia Bardi ba33ff71fd refactoring for the generation of relationships from related identifier of type 'OPENAIRE' 2022-09-23 15:17:13 +02:00
Alessia Bardi 982bcc1e35 test wrid pid and record identifier 2022-09-23 12:06:06 +02:00
Claudio Atzori c42850328e fixed semantic (subreltype) for ServiceOrganization relations 2022-09-22 16:23:25 +02:00
Claudio Atzori e45ec15221 Merge branch 'beta' into clean_country 2022-09-19 11:34:02 +02:00
Claudio Atzori 26e1badded added instance.url syntactical validation, avoid creating multiple duplicated URLs 2022-09-19 11:19:10 +02:00
Claudio Atzori 192215a18e merged from branch discard-non-wellformed 2022-09-19 10:17:10 +02:00
Claudio Atzori e370e940d8 [aggregator graph] save invalid records aside for further inspection 2022-09-16 14:06:28 +02:00
Claudio Atzori 1e42d984e1 [aggregator graph] save invalid records aside for further inspection 2022-09-15 10:49:42 +02:00
Alessia Bardi 9e7ec4198f fixed test 2022-09-14 18:08:56 +02:00
Claudio Atzori c48f6e9c57 [aggregator graph] save invalid records aside for further inspection 2022-09-14 17:11:26 +02:00
Claudio Atzori a0919ed495 [aggregator graph] save invalid records aside for further inspection 2022-09-14 13:27:39 +02:00
Alessia Bardi b99a011345 return empty Oaf list if record cannot be parsed 2022-09-13 11:51:55 +02:00
Alessia Bardi 27af5122d2 logs for non well formed XML files 2022-09-12 14:25:23 +02:00
Claudio Atzori ff6f789b6d code formatting 2022-09-09 15:16:31 +02:00
Claudio Atzori b5d6966c01 Merge branch 'beta' into clean_country 2022-09-09 12:20:19 +02:00
Claudio Atzori b5f7bd30be Merge branch 'beta' into clean_subjects 2022-09-09 12:20:04 +02:00
Alessia Bardi a539c6ccaf https for handle URLs 2022-09-09 12:16:28 +02:00
Claudio Atzori 1203378441 Merge branch 'beta' into clean_subjects 2022-09-09 10:38:47 +02:00
Claudio Atzori 14dc909a14 Merge branch 'beta' into clean_country 2022-09-09 10:38:17 +02:00
Alessia Bardi 9ef063d502 #7861#note-8 instance url from handle 2022-09-07 17:29:54 +03:00
Alessia Bardi 5c45d52af3 testing for RiuNet 2022-09-07 15:40:57 +03:00
Alessia Bardi a11eb38065 testing for RO-Hub 2022-09-02 16:07:36 +02:00
Claudio Atzori b7c387c21f cleaning of subjects: avoid duplicated subjects, prioritise collected vs inferred or other sources 2022-08-12 15:09:16 +02:00
Claudio Atzori adb526b0e1 Merge branch 'beta' into clean_subjects 2022-08-12 10:51:17 +02:00
Claudio Atzori cb7c07c54e [scholix] added step to create tar archive 2022-08-11 11:25:24 +02:00
Claudio Atzori 2aa16d0432 [scholix] fixed OpenCitation dump procedure 2022-08-10 17:39:29 +02:00
Miriam Baglioni 7dbdd4a0fe [Clean Country]changes related to #241 (comment) 2022-08-10 15:13:10 +02:00
Claudio Atzori 51ad93e545 [scholix] fixed OpenCitation dump procedure 2022-08-10 11:57:56 +02:00
Miriam Baglioni 62d2138806 [Clean Context] changed a bit the logic. Added the check not to have result hosted by a datasource of type institutional repository from NL. Added also the check that the country should have been included in the result via propagation for it to be removed 2022-08-08 14:10:47 +02:00
Claudio Atzori 3418ce50ac cleaning of subjects: perform the cleaning when the given value is equivalent to one of the terms in the vocabulary 2022-08-08 12:48:47 +02:00
Miriam Baglioni 390013a4b2 mergin with branch beta 2022-08-08 12:30:31 +02:00
Claudio Atzori 4eaa063b1f cleaning of subjects 2022-08-05 16:56:09 +02:00
Claudio Atzori 32cee1f619 WIP: cleaning of subjects 2022-08-05 12:32:08 +02:00
Claudio Atzori 6c0fd9284b merge from beta 2022-08-05 10:42:53 +02:00
Claudio Atzori b78889a0ce WIP: cleaning of subjects 2022-08-05 09:11:37 +02:00
Miriam Baglioni a7a18d7630 [Graph Dump] removed code for the dump from the project. Fixed issues in tests when possible 2022-08-04 17:40:40 +02:00
Claudio Atzori 27a91841e7 WIP: cleaning of subjects 2022-08-04 11:39:39 +02:00
Claudio Atzori e62018e95d [aggregator graph] added more assertions in test 2022-08-03 12:26:05 +02:00
Claudio Atzori f62c4e05cd code formatting 2022-07-29 11:56:01 +02:00
Claudio Atzori 1dd1e4fe3a extended test for mapping project_organization relations 2022-07-28 11:27:08 +02:00
Claudio Atzori 09ccc7b472 Merge branch 'beta' into project_organization_contribution 2022-07-28 09:49:59 +02:00
Miriam Baglioni 5968ec018d [Clean Country] modified workflow and added param file 2022-07-22 16:48:38 +02:00
Miriam Baglioni a12d28c644 [Clean Country] added logic not to remove country from result if it exist a hosting datasource with that country. Moreover the country will be removed only if added with propagation 2022-07-22 16:23:12 +02:00
Miriam Baglioni 2c933f1158 mergin with branch beta 2022-07-22 14:57:41 +02:00
Sandro La Bruzzo ddc414b258 fixed wrong json param 2022-07-22 09:43:15 +02:00
Sandro La Bruzzo 5f651f2316 changed filter relation on SubRelType 2022-07-21 10:11:48 +02:00
Miriam Baglioni 65cc736e2f [Clean Country] first implementation to remove country NL from results collected from NARCIS when doi starts with mendely prefix 2022-07-20 17:05:56 +02:00
Sandro La Bruzzo 5b76321d9c implemented oozie workflow to generate scholix dump filtering relclass semantic 2022-07-20 16:34:32 +02:00
Claudio Atzori 1138b2ac8e code formatting 2022-07-19 14:15:49 +02:00
Claudio Atzori 0c1cfee396 mapping oaf:fulltext elements in the result.fulltext field 2022-07-11 17:34:59 +02:00
Claudio Atzori 0cb1c70788 code formatting 2022-07-01 10:44:08 +02:00
Claudio Atzori 4ec13e2b66 Merge branch 'master' into dump_new_funded_products 2022-07-01 10:30:28 +02:00
Claudio Atzori 7da24c1dec added more logging 2022-06-28 13:47:49 +02:00
Miriam Baglioni 71744a1f52 [DUMP DELTA PROJECTS] refactoring 2022-06-27 18:07:58 +02:00
Miriam Baglioni 1d1fe3b151 [DUMP DELTA PROJECTS] refactoring 2022-06-27 18:04:59 +02:00
Claudio Atzori a8773af0cb Merge branch 'beta' into project_organization_contribution 2022-06-27 09:37:40 +02:00
Claudio Atzori 5130eac247 mapping by participant project contribution 2022-06-24 17:16:42 +02:00
Miriam Baglioni edddfc6c63 [DUMP DELTA PROJECTS] adding test and resource 2022-06-21 18:28:53 +02:00
Miriam Baglioni f561f13dd9 [Funder Products Dump] fixed names of parameters in workflow 2022-06-21 18:18:17 +02:00
Miriam Baglioni ff74e73369 [DUMP NEW FUNDED PRODUCTS] change in resources 2022-06-21 18:02:51 +02:00
Miriam Baglioni b98f904d48 [Funder Products Dump] new way to avoid using hive 2022-06-21 17:52:27 +02:00
Miriam Baglioni 7423577a08 [Graph DUMP] add code to produce the delta of new projects with respect to the previous delta/dump 2022-06-21 14:51:38 +02:00
Claudio Atzori b295a40d9c restored use of name_particles when parsing author names 2022-06-16 12:20:43 +02:00
Claudio Atzori 4c8e820ff0 mapping relationship from trasformed records based on oaf:relation 2022-06-14 08:49:02 +02:00
Claudio Atzori 116902c028 mapping relationship from trasformed records based on oaf:relation 2022-06-13 14:31:48 +02:00
Alessia Bardi 68bd58d6a4 tests for ROHub 2022-06-10 17:29:11 +02:00
Claudio Atzori 52cb086506 [graph grouping] drop relation target path before copying from source 2022-05-16 12:08:36 +02:00
Claudio Atzori 997c50078e [graph grouping] drop relation target path before copying from source 2022-05-16 12:07:40 +02:00
Claudio Atzori 6031acb2e3 [openorgs] fixed parent/child query, using the correct semantic labels 2022-05-16 09:20:48 +02:00
Claudio Atzori 0dc33ea391 [openorgs] fixed parent/child query, using the correct semantic labels 2022-05-16 09:20:30 +02:00
Miriam Baglioni e4eac1d20b [EOSC TAG] added code to remove EOSC Jupyter Notebook from subjects and put EOSC as classid in the qualifier 2022-05-13 11:01:33 +02:00
Sandro La Bruzzo 22f65680b9 Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta 2022-05-11 15:30:12 +02:00
Sandro La Bruzzo ca8d26bcb4 added better filter for openCitations 2022-05-11 15:29:57 +02:00
Claudio Atzori 5d3b4a9c25 [graph merge beta] merge datasource originalid, collectedfrom, and pid lists 2022-05-11 14:13:06 +02:00
Claudio Atzori 2a8e0fb72f [openorgs] mapping parent/child relations without massaging the semantic labels 2022-05-10 08:45:53 +02:00
Claudio Atzori 77bc9863e9 [openorgs] mapping parent/child relations without massaging the semantic labels 2022-05-09 16:06:04 +02:00
Claudio Atzori 378020e30a [eosc_services] unit test adaptation 2022-05-09 16:05:06 +02:00
Claudio Atzori 846975c886 [eosc_services] using the correct 'keyword' subject type, as declared in the dnet:subject_classification_typologies vocabulary 2022-05-05 11:37:58 +02:00
Claudio Atzori da611cfbbd [eosc_services] resolved merge conflicts 2022-05-03 13:37:15 +02:00
Claudio Atzori 2ade69dea6 EOSC Services - minor 2022-05-02 17:03:31 +02:00
Claudio Atzori b6a7ff3a99 EOSC Services - removed fields from mapping, testing preparation 2022-05-02 15:52:33 +02:00
Claudio Atzori a8c51f6f16 EOSC Services - fixed query and testing preparation 2022-05-02 11:09:03 +02:00
Claudio Atzori 05c1ea92e9 EOSC Services - added Service-specific fields in the XML record serialization 2022-04-29 15:56:55 +02:00
Claudio Atzori f5f532d134 EOSC Services - ongoing update 2022-04-29 12:25:24 +02:00
Claudio Atzori 5ffc24d1ba EOSC Services - ongoing update 2022-04-26 16:18:41 +02:00
Miriam Baglioni 19d90658fc [Clean Context] added description to parameters 2022-04-22 15:41:23 +02:00
Miriam Baglioni e0915061c2 [Clean Context] fixed issue in param name 2022-04-21 16:32:40 +02:00
Miriam Baglioni 9a961a0092 [Clean Context] fixed issue in param name 2022-04-21 15:12:24 +02:00
Miriam Baglioni 5b7d9e741c [Clean Context] added logic to cleaning workflow to accomodate also context cleaning 2022-04-21 13:02:14 +02:00
Miriam Baglioni ccba1a3db1 [Clean Context] added logic to cleaning workflow to accomodate also context cleaning 2022-04-21 13:00:06 +02:00
Miriam Baglioni a38f0f5ea7 mergin with branch beta 2022-04-20 15:44:18 +02:00
Miriam Baglioni dbfbe8841a [Clean Context] changed the description in input parameters 2022-04-20 15:41:03 +02:00
Michele Artini c96a8613f8 update SQL queries 2022-04-20 12:07:49 +02:00
Michele Artini 4314db55c8 migration to services: update sql queries 2022-04-19 15:05:02 +02:00
Claudio Atzori c26222623f [maven-release-plugin] prepare for next development iteration 2022-04-07 13:32:22 +02:00
Claudio Atzori 86585a6b27 [maven-release-plugin] prepare release dhp-1.2.4 2022-04-07 13:32:19 +02:00
Claudio Atzori ad85d88eaf [maven-release-plugin] rollback the release of dhp-1.2.4 2022-04-07 13:28:35 +02:00
Claudio Atzori 598e11dfd7 [maven-release-plugin] prepare for next development iteration 2022-04-07 13:27:02 +02:00
Claudio Atzori db3d9877a5 [maven-release-plugin] prepare release dhp-1.2.4 2022-04-07 13:26:58 +02:00
Claudio Atzori 3bba6d6e38 [maven-release-plugin] rollback the release of dhp-1.2.4 2022-04-07 12:23:17 +02:00
Claudio Atzori 2ac2d928bd [maven-release-plugin] prepare for next development iteration 2022-04-07 12:18:47 +02:00
Claudio Atzori 85bc722ff4 [maven-release-plugin] prepare release dhp-1.2.4 2022-04-07 12:18:43 +02:00
Claudio Atzori bc05b6168a [maven-release-plugin] rollback the release of dhp-1.2.4 2022-04-07 11:49:06 +02:00
Claudio Atzori 505420fd61 [maven-release-plugin] prepare for next development iteration 2022-04-07 11:34:06 +02:00
Claudio Atzori 66e718981e [maven-release-plugin] prepare release dhp-1.2.4 2022-04-07 11:34:02 +02:00
Claudio Atzori 05fafa1408 [graph raw] avoid NPEs importing datasource consent fields 2022-04-06 15:23:50 +02:00
Claudio Atzori 8c457f1b2c conflicts resolved, merged from beta 2022-04-06 10:27:52 +02:00
Miriam Baglioni 79336d46c5 [Clean Context] first naive implementation of a functionality to clean not wanted contextes from one result. This implementation simply verifies the main title of the results start with a given string 2022-04-04 15:52:31 +02:00
Claudio Atzori 0a0ae84c22 [graph raw] DOI based instance URLs on https 2022-03-29 10:52:58 +02:00
Claudio Atzori 741bc99c47 Merge branch 'beta' into datasource_pdf_consent 2022-03-28 09:20:48 +02:00
Miriam Baglioni 89fd275480 [HostedByMap] added left over from PR and fixed issue on workflow 2022-03-21 09:54:45 +01:00
Miriam Baglioni 0f7d8ca2e0 [HostedByMap] change on master to align to PR 201 on beta merged as 9f3036c847 2022-03-11 15:16:02 +01:00
Claudio Atzori f25407bbe2 added mapping for datasource consent fields to integrate them in the graph 2022-03-11 09:32:42 +01:00
Miriam Baglioni 2c5087d55a [HostedByMap] download of doaj from json, modification of test resources, deletion of class no more needed for the CSV download 2022-03-04 15:18:21 +01:00
Miriam Baglioni 5d608d6291 [HostedByMap] changed the model to include also oaStart date and review process that could be possibly used in the future 2022-03-04 11:06:09 +01:00
Miriam Baglioni 8a41f63348 [HostedByMap] update to download the json instead of the csv 2022-03-04 10:38:43 +01:00
Miriam Baglioni 44b0c03080 [HostedByMap] update to download the json instead of the csv 2022-03-04 10:37:59 +01:00
Claudio Atzori a87c070447 conflicts resolved, merged from beta 2022-02-24 12:51:31 +01:00
Claudio Atzori 99f5b14469 [graph raw] invisible records stored among the raw graph rather than the claimed subgraph 2022-02-18 15:20:57 +01:00
Claudio Atzori cf8443780e added processingchargeamount to the result view 2022-02-18 15:17:48 +01:00
Alessia Bardi 600ede1798 serialisation of APCs int he XML records 2022-02-11 11:00:20 +01:00
Miriam Baglioni 493caef358 [stats-wf]fixed the result_result table related to PR#191 2022-02-04 14:51:25 +01:00
Miriam Baglioni aae667e6b6 [APC at the result level] added the APC at the level of the result and modified test class 2022-02-04 12:34:25 +01:00
Alessia Bardi 2e215abfa8 test for instances with URLs for OpenAPC 2022-02-02 17:27:44 +01:00
Claudio Atzori 8eb75ca169 adapted GenerateEntitiesApplicationTest behaviour 2022-01-27 16:24:37 +01:00
Claudio Atzori af61e44acc ported changes to the GraphCleaningFunctionsTest from 8de9788308 2022-01-27 16:19:14 +01:00