Commit Graph

3459 Commits

Author SHA1 Message Date
Claudio Atzori e03c0c7794 Merge branch 'beta' into oaf_relation_mapping 2022-06-16 09:27:01 +02:00
Claudio Atzori 06b5533d4c Merge branch 'beta' into 7096-fileGZip-collector-plugin 2022-06-16 09:22:16 +02:00
Claudio Atzori 4c8e820ff0 mapping relationship from transformed records based on oaf:relation 2022-06-14 08:49:02 +02:00
Alessia Bardi 88d531dc91 exclude FAIRsharing records from Datacite 2022-06-13 16:17:17 +02:00
Claudio Atzori 116902c028 mapping relationship from transformed records based on oaf:relation 2022-06-13 14:31:48 +02:00
Claudio Atzori b8cda65487 code formatting 2022-06-13 09:20:03 +02:00
Michele Artini 634869ce95 deleted hierarchical rels from ror action set 2022-06-13 09:12:21 +02:00
Alessia Bardi 922c6d66ef Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2022-06-10 17:29:15 +02:00
Alessia Bardi 68bd58d6a4 tests for ROHub 2022-06-10 17:29:11 +02:00
Miriam Baglioni b229c6e7af Merge pull request 'beta' (#218) from antonis.lempesis/dnet-hadoop:beta into beta (Reviewed-on: D-Net/dnet-hadoop#218) 2022-06-10 11:03:48 +02:00
Antonis Lempesis ab18c9daa9 Merge branch 'beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into beta 2022-06-09 15:48:21 +03:00
Antonis Lempesis 574492c659 removed double result_apc table creation from monitor 2022-06-09 15:48:13 +03:00
Michele Artini b94a791bc5 unit tests to transform cnr explora 2022-06-09 12:25:34 +02:00
Miriam Baglioni 4b6913787b [DOI-BOOST] added one method in test of crossref mapping to aof and one resource. Related to ticket 7807 2022-06-08 14:55:19 +02:00
Antonis Lempesis db088cc69c fixed *_organization tables 2022-06-07 04:04:28 +03:00
Miriam Baglioni 31d4557e8d Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2022-06-06 11:52:29 +02:00
Claudio Atzori 5c2949a864 Merge pull request '[stats wf] added open citations & more orgs in monitor, removed collab indicator' (#213) from antonis.lempesis/dnet-hadoop:beta into beta (Reviewed-on: D-Net/dnet-hadoop#213) 2022-05-20 11:38:43 +02:00
Miriam Baglioni 5e0b8f9b5f [CountryPropagation] refactoring 2022-05-20 09:15:53 +02:00
Miriam Baglioni c298c148cb [CountryPropagation] fix NPE issue 2022-05-20 09:11:46 +02:00
Miriam Baglioni eaf9385ae5 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2022-05-17 15:09:37 +02:00
Miriam Baglioni f5207885e3 [EOSCTag] changed code to remove EOSC Jupyter Notebook and modified test to exclude galaxy + software from the tagging for Galaxy 2022-05-17 15:09:22 +02:00
Claudio Atzori d098ad0d93 [hb patch] updated map 2022-05-16 15:54:04 +02:00
Claudio Atzori 1dda11e031 [hb patch] updated map 2022-05-16 15:53:27 +02:00
Claudio Atzori 8dd5517548 code formatting 2022-05-16 14:35:24 +02:00
Claudio Atzori 52cb086506 [graph grouping] drop relation target path before copying from source 2022-05-16 12:08:36 +02:00
Claudio Atzori 6442763f97 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2022-05-16 12:07:45 +02:00
Claudio Atzori 997c50078e [graph grouping] drop relation target path before copying from source 2022-05-16 12:07:40 +02:00
Sandro La Bruzzo c1971d52c4 Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta 2022-05-16 10:30:35 +02:00
Sandro La Bruzzo 4c50f35c8b update publication Date format 2022-05-16 10:29:36 +02:00
Michele Artini 46c07e0724 deleted hierarchical rels from ror action set 2022-05-16 09:39:54 +02:00
Claudio Atzori 6031acb2e3 [openorgs] fixed parent/child query, using the correct semantic labels 2022-05-16 09:20:48 +02:00
Claudio Atzori 0dc33ea391 [openorgs] fixed parent/child query, using the correct semantic labels 2022-05-16 09:20:30 +02:00
Antonis Lempesis 3fc9efeab6 fixed typo, added open citations and apcs in monitor 2022-05-13 14:28:13 +03:00
Miriam Baglioni e4eac1d20b [EOSC TAG] added code to remove EOSC Jupyter Notebook from subjects and put EOSC as classid in the qualifier 2022-05-13 11:01:33 +02:00
Sandro La Bruzzo 22f65680b9 Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta 2022-05-11 15:30:12 +02:00
Sandro La Bruzzo ca8d26bcb4 added better filter for openCitations 2022-05-11 15:29:57 +02:00
Claudio Atzori 5d3b4a9c25 [graph merge beta] merge datasource originalid, collectedfrom, and pid lists 2022-05-11 14:13:06 +02:00
Antonis Lempesis 23334479bb removed yet another collab, added more orgs in monitor 2022-05-11 13:05:52 +03:00
Claudio Atzori 2a8e0fb72f [openorgs] mapping parent/child relations without massaging the semantic labels 2022-05-10 08:45:53 +02:00
Claudio Atzori 77bc9863e9 [openorgs] mapping parent/child relations without massaging the semantic labels 2022-05-09 16:06:04 +02:00
Claudio Atzori 378020e30a [eosc_services] unit test adaptation 2022-05-09 16:05:06 +02:00
Miriam Baglioni 89657a0b78 [UsageCount] refactoring 2022-05-09 14:43:27 +02:00
Miriam Baglioni a056f59c6e [UsageCount] make it as an action set as it should be, plus changed the test to make them work as well now 2022-05-09 12:51:35 +02:00
Antonis Lempesis 61b4c19e65 restored indi_result_org_country_collab, removed indi_result_org_collab 2022-05-06 12:52:10 +03:00
Antonis Lempesis cfbbcaf7c4 commented out indi_result_org_country_collab 2022-05-06 12:49:36 +03:00
Claudio Atzori 658450d9a3 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2022-05-05 11:38:08 +02:00
Claudio Atzori 846975c886 [eosc_services] using the correct 'keyword' subject type, as declared in the dnet:subject_classification_typologies vocabulary 2022-05-05 11:37:58 +02:00
Miriam Baglioni 8a72de4011 [EOSCTag] modified workflow to execute all the steps and not only the last one 2022-05-04 10:10:56 +02:00
Miriam Baglioni bd1108f98b merging with branch beta 2022-05-04 10:06:56 +02:00
Miriam Baglioni 3aeedd931a [EOSCTag] fixed issue in case description is null. Modified test resources and classes 2022-05-04 10:06:38 +02:00
Claudio Atzori da611cfbbd [eosc_services] resolved merge conflicts 2022-05-03 13:37:15 +02:00
Claudio Atzori 9e12cb3c92 EOSC Services - removed field knowledgegraph; depending on the released schema module 2022-05-03 11:55:45 +02:00
Miriam Baglioni a21fe310e5 [EOSCTag] last test and change in the implementation to search in title and description 2022-05-02 17:43:20 +02:00
Claudio Atzori 2ade69dea6 EOSC Services - minor 2022-05-02 17:03:31 +02:00
Claudio Atzori b6a7ff3a99 EOSC Services - removed fields from mapping, testing preparation 2022-05-02 15:52:33 +02:00
Miriam Baglioni e37177e1ce merging with branch beta 2022-05-02 12:31:50 +02:00
Claudio Atzori a8c51f6f16 EOSC Services - fixed query and testing preparation 2022-05-02 11:09:03 +02:00
Claudio Atzori 05c1ea92e9 EOSC Services - added Service-specific fields in the XML record serialization 2022-04-29 15:56:55 +02:00
Claudio Atzori f5f532d134 EOSC Services - ongoing update 2022-04-29 12:25:24 +02:00
Serafeim Chatzopoulos 623f7be26d Fix reading files from HDFS in FileCollector & FileGZipCollector plugins 2022-04-28 16:31:11 +03:00
Claudio Atzori 5ffc24d1ba EOSC Services - ongoing update 2022-04-26 16:18:41 +02:00
Sandro La Bruzzo 78015a5733 Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta 2022-04-26 09:56:34 +02:00
Sandro La Bruzzo 8c22e5c30a added fix to include date array with only year or year and month 2022-04-26 09:56:27 +02:00
Claudio Atzori 81c4496d32 Merge branch 'beta' into 7096-fileGZip-collector-plugin 2022-04-26 09:02:15 +02:00
Miriam Baglioni e342ec93f0 [EOSCTag] prepared resources for test 2022-04-22 18:35:37 +02:00
Miriam Baglioni 88562c0930 [EOSC TAG] added test for galaxy for title and description criteria 2022-04-22 18:35:03 +02:00
Miriam Baglioni dfbd2bcbea [EOSC TAG] added logic in case subject is null 2022-04-22 18:34:03 +02:00
Miriam Baglioni 27c85e901a [EOSCTag] added resources and finalized test for Jupyter Notebook tagging 2022-04-22 17:38:10 +02:00
Miriam Baglioni 87bff36d9e merging with branch beta 2022-04-22 15:52:34 +02:00
Miriam Baglioni 911ce0780a Merge branch 'cleancontext' of https://code-repo.d4science.org/D-Net/dnet-hadoop into cleancontext 2022-04-22 15:41:42 +02:00
Miriam Baglioni 19d90658fc [Clean Context] added description to parameters 2022-04-22 15:41:23 +02:00
Claudio Atzori 54162f5c4f Merge branch 'beta' into cleancontext 2022-04-22 11:49:33 +02:00
Miriam Baglioni bbb77052d3 [EOSCTag] first test 2022-04-22 11:32:57 +02:00
Claudio Atzori 30105f0722 Merge branch 'beta' into 7096-fileGZip-collector-plugin 2022-04-22 11:22:21 +02:00
Sandro La Bruzzo a82ec3aaaf code formatter 2022-04-22 11:08:13 +02:00
Sandro La Bruzzo aa12429f50 Modified last intersection since we lost many titles. 2022-04-22 11:05:08 +02:00
Miriam Baglioni 7cb7066472 [EoscTag] first "rough" implementation 2022-04-22 10:44:17 +02:00
Sandro La Bruzzo d660895b30 fixed wrong mapping type of dataset 2022-04-21 20:41:13 +02:00
Miriam Baglioni e0915061c2 [Clean Context] fixed issue in param name 2022-04-21 16:32:40 +02:00
Miriam Baglioni 6dc68c48e0 [EOSCTag] - 2022-04-21 16:19:04 +02:00
Miriam Baglioni 9a961a0092 [Clean Context] fixed issue in param name 2022-04-21 15:12:24 +02:00
Claudio Atzori 29150a5d0c code formatting 2022-04-21 13:31:56 +02:00
Miriam Baglioni 5b7d9e741c [Clean Context] added logic to cleaning workflow to accommodate also context cleaning 2022-04-21 13:02:14 +02:00
Miriam Baglioni ccba1a3db1 [Clean Context] added logic to cleaning workflow to accommodate also context cleaning 2022-04-21 13:00:06 +02:00
Miriam Baglioni 20de75ca64 [Measures] removed typo 2022-04-21 12:14:03 +02:00
Miriam Baglioni bebb2a0560 Merge branch 'eosc_dimitris' of https://code-repo.d4science.org/D-Net/dnet-hadoop into eosc_dimitris 2022-04-21 12:10:19 +02:00
Miriam Baglioni b61efd613b [Measures] addressed comments in the PR 2022-04-21 12:09:37 +02:00
Miriam Baglioni d012d125d7 [EOSCTag] - 2022-04-21 12:02:09 +02:00
Claudio Atzori 88acad76f9 Merge branch 'beta' into eosc_dimitris 2022-04-21 12:00:03 +02:00
Claudio Atzori eabb40fccc Merge branch 'beta' into 7096-fileGZip-collector-plugin 2022-04-21 11:42:43 +02:00
Miriam Baglioni c304657d91 [Measures] put the logic in common, no need to change the schema 2022-04-21 11:27:26 +02:00
Sandro La Bruzzo d580e15442 Modified last intersection since we lost many titles. this is my last resource, after that, I've to change my job 2022-04-21 11:06:08 +02:00
Miriam Baglioni 5295effc96 [Measures] fixed issue 2022-04-20 16:20:40 +02:00
Miriam Baglioni a38f0f5ea7 merging with branch beta 2022-04-20 15:44:18 +02:00
Miriam Baglioni dbfbe8841a [Clean Context] changed the description in input parameters 2022-04-20 15:41:03 +02:00
Miriam Baglioni 5feae77937 [Measures] last changes to accommodate tests 2022-04-20 15:13:09 +02:00
Miriam Baglioni 869407c6e2 [Measures] added new measure (usagecounts) as action set. Measure added at the level of the result. Ref #7587 2022-04-20 14:02:05 +02:00
Antonis Lempesis b7cd2c6ca1 added open citations 2022-04-20 14:46:55 +03:00
Michele Artini c96a8613f8 update SQL queries 2022-04-20 12:07:49 +02:00
Michele Artini 4314db55c8 migration to services: update sql queries 2022-04-19 15:05:02 +02:00
Miriam Baglioni 0012e57bf9 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2022-04-14 14:14:44 +02:00
Miriam Baglioni c5a863132c [BulkTagging] revert it 2022-04-14 14:14:13 +02:00
Sandro La Bruzzo d5b29d96a7 fix merging in crossrefAggregator which creates dataInfo null 2022-04-14 11:07:04 +02:00
Miriam Baglioni 8e8933d41a [BulkTagging] added fix if result.dataInfo is null 2022-04-14 09:04:24 +02:00
Claudio Atzori b93a141d6c [Doiboost] fixed fundingReference extraction from the Crossref records 2022-04-12 10:26:05 +02:00
Claudio Atzori 73c172926a [Doiboost] fixed fundingReference extraction from the Crossref records 2022-04-12 10:25:42 +02:00
Claudio Atzori 48b580b45c [graph enrichment] fixed country_propagation oozie workflow definition, parameter saveGraph is not needed anymore by the SparkCountryPropagationJob 2022-04-11 08:52:36 +02:00
Claudio Atzori 21f32b83c6 [graph enrichment] fixed country_propagation oozie workflow definition, parameter saveGraph is not needed anymore by the SparkCountryPropagationJob 2022-04-11 08:52:12 +02:00
Claudio Atzori 4eff7856f5 Merge pull request '[stats-wf] computing stats in each step' (#210) from antonis.lempesis/dnet-hadoop:beta into beta (Reviewed-on: D-Net/dnet-hadoop#210) 2022-04-08 14:21:01 +02:00
Serafeim Chatzopoulos d0b84d3297 Add FileCollectorPlugin and respective test 2022-04-07 15:06:38 +03:00
Serafeim Chatzopoulos bc1bf55507 Add AbstractSplittedRecordPlugin 2022-04-07 14:33:04 +03:00
Claudio Atzori c26222623f [maven-release-plugin] prepare for next development iteration 2022-04-07 13:32:22 +02:00
Claudio Atzori 86585a6b27 [maven-release-plugin] prepare release dhp-1.2.4 2022-04-07 13:32:19 +02:00
Claudio Atzori ad85d88eaf [maven-release-plugin] rollback the release of dhp-1.2.4 2022-04-07 13:28:35 +02:00
Claudio Atzori 598e11dfd7 [maven-release-plugin] prepare for next development iteration 2022-04-07 13:27:02 +02:00
Claudio Atzori db3d9877a5 [maven-release-plugin] prepare release dhp-1.2.4 2022-04-07 13:26:58 +02:00
Claudio Atzori 3bba6d6e38 [maven-release-plugin] rollback the release of dhp-1.2.4 2022-04-07 12:23:17 +02:00
Claudio Atzori 2ac2d928bd [maven-release-plugin] prepare for next development iteration 2022-04-07 12:18:47 +02:00
Claudio Atzori 85bc722ff4 [maven-release-plugin] prepare release dhp-1.2.4 2022-04-07 12:18:43 +02:00
Claudio Atzori bc05b6168a [maven-release-plugin] rollback the release of dhp-1.2.4 2022-04-07 11:49:06 +02:00
Claudio Atzori 505420fd61 [maven-release-plugin] prepare for next development iteration 2022-04-07 11:34:06 +02:00
Claudio Atzori 66e718981e [maven-release-plugin] prepare release dhp-1.2.4 2022-04-07 11:34:02 +02:00
Serafeim Chatzopoulos e612489670 Add fileGZip collector plugin and respective test 2022-04-06 19:12:44 +03:00
Claudio Atzori 4190c9f6bc [graph raw] avoid NPEs importing datasource consent fields 2022-04-06 15:34:31 +02:00
Claudio Atzori 05fafa1408 [graph raw] avoid NPEs importing datasource consent fields 2022-04-06 15:23:50 +02:00
Antonis Lempesis c442c91f89 computing stats in each step 2022-04-06 12:40:02 +03:00
Claudio Atzori 8c457f1b2c conflicts resolved, merged from beta 2022-04-06 10:27:52 +02:00
Miriam Baglioni e77d104951 [OC] added / to workflow path 2022-04-05 15:07:11 +02:00
Miriam Baglioni 79336d46c5 [Clean Context] first naive implementation of a functionality to clean unwanted contexts from one result. This implementation simply verifies that the main title of the result starts with a given string 2022-04-04 15:52:31 +02:00
Claudio Atzori 873369af1c Merge pull request '[stats wf] added apcs in monitor db' (#207) from antonis.lempesis/dnet-hadoop:beta into beta (Reviewed-on: D-Net/dnet-hadoop#207) 2022-03-29 15:40:20 +02:00
Antonis Lempesis 7112806a73 views cannot be stored as parquet... 2022-03-29 16:37:29 +03:00
Antonis Lempesis fff0b3cc19 added apcs in monitor db 2022-03-29 14:15:31 +03:00
Claudio Atzori de85367695 Merge pull request '[stats wf] fix: views cannot be stored as parquet...' (#206) from antonis.lempesis/dnet-hadoop:beta into beta (Reviewed-on: D-Net/dnet-hadoop#206) 2022-03-29 12:51:02 +02:00
Antonis Lempesis ee24f3eb2c views cannot be stored as parquet... 2022-03-29 13:47:48 +03:00
Sandro La Bruzzo 1b11010169 minor fix 2022-03-29 10:59:14 +02:00
Claudio Atzori 0a0ae84c22 [graph raw] DOI based instance URLs on https 2022-03-29 10:52:58 +02:00
Claudio Atzori 9fa3dd78fe Merge pull request '[stats wf] various fixes, organization ids for inst. dashboard' (#205) from antonis.lempesis/dnet-hadoop:beta into beta (Reviewed-on: D-Net/dnet-hadoop#205) 2022-03-28 22:03:49 +02:00
Claudio Atzori 96aa2a5d0d Merge branch 'beta' into instance_group_by_url 2022-03-28 09:23:52 +02:00
Claudio Atzori 741bc99c47 Merge branch 'beta' into datasource_pdf_consent 2022-03-28 09:20:48 +02:00
Claudio Atzori 61319b2e83 updated dhp-schema version; set entity-level dataInfo before & after merging the fields from the group of duplicates 2022-03-25 16:38:33 +01:00
Antonis Lempesis d8503cd191 added moooar organizations 2022-03-24 14:02:36 +02:00
Miriam Baglioni 7b8f85692e [Enrichment country] fixed issues with parameters and workflow args 2022-03-23 17:20:23 +01:00
Claudio Atzori 48d32466e4 instances grouped by URL expose only one refereed 2022-03-23 14:52:03 +01:00
Claudio Atzori f10066547b increased spark.sql.shuffle.partitions in affiliation_from_semrel_propagation 2022-03-23 12:22:26 +01:00
Claudio Atzori 43733c1a18 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2022-03-23 12:14:27 +01:00
Antonis Lempesis 62f91b0869 cleanup 2022-03-22 16:17:49 +02:00
Antonis Lempesis 2e8394ecf8 creating aaall tables as parquet 2022-03-22 16:16:08 +02:00
Antonis Lempesis dcfbeb8142 yet more typos 2022-03-21 12:36:03 +02:00
Miriam Baglioni 89fd275480 [HostedByMap] added left over from PR and fixed issue on workflow 2022-03-21 09:54:45 +01:00
miconis c763aded70 dependency updated to the new pace-core version 2022-03-16 16:41:50 +01:00