Commit Graph

4023 Commits

Author SHA1 Message Date
dimitrispie 86f4f63daf Updates to steps related to transfer data to impala cluster
1. Remove external table definitions in stats_ext
2. Fix the issue where some views are not created.
3. Added two workflow parameters for copying also the usage stats dbs
2023-05-18 09:33:05 +03:00
Claudio Atzori 909729a2fc [dedup] tweaking num partitions, minor changes 2023-05-17 10:16:22 +02:00
Ilias Kanellos 38020e242a Merge branch '8172_impact_indicators_workflow' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8172_impact_indicators_workflow 2023-05-16 17:34:53 +03:00
Ilias Kanellos 3d69f33c84 Fix selection of columns in graph creation 2023-05-16 17:34:42 +03:00
Ilias Kanellos 3c38f7ba6f Fix selection of columns in graph creation 2023-05-16 17:32:53 +03:00
Serafeim Chatzopoulos 8ef718c363 Fix workflow application path 2023-05-16 16:28:48 +03:00
Serafeim Chatzopoulos 26328e2a0d Move job.properties 2023-05-16 14:39:53 +03:00
Serafeim Chatzopoulos 4eec3e7052 Add jobTracker, nameNode && spark2Lib as global params in oozie wf 2023-05-15 22:28:48 +03:00
Serafeim Chatzopoulos b83135c252 Add missing kill nodes in workflow.xml 2023-05-15 19:55:35 +03:00
Serafeim Chatzopoulos 45f2aa0867 Move end node ... at the end in workflow.xml 2023-05-15 17:52:20 +03:00
Claudio Atzori 8acad52a0c Merge branch 'beta' into apc_affiliation 2023-05-15 15:47:33 +02:00
Claudio Atzori 8a463cc3e8 fixed organization id created when mapping APC affiliations. Factored out ROR constants in dhp-common 2023-05-15 15:44:46 +02:00
Serafeim Chatzopoulos 12a57e1f58 Resolve conflicts 2023-05-15 16:20:11 +03:00
Serafeim Chatzopoulos 82e2a96f51 Resolve conflicts 2023-05-15 15:53:12 +03:00
Serafeim Chatzopoulos b8e8c959fe Update workflow.xml && job.properties 2023-05-15 15:50:23 +03:00
Ilias Kanellos 4a905932a3 Spark properties from job.properties 2023-05-15 15:24:22 +03:00
Claudio Atzori 0c314d5e09 Merge pull request 'Update copyDataToImpalaCluster.sh' (#293) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #293
2023-05-15 12:05:54 +02:00
Serafeim Chatzopoulos 07818131ef Update documentation 2023-05-15 13:04:44 +03:00
dimitrispie b3f9633205 Update copyDataToImpalaCluster.sh
Added option --user to impala-shell command
2023-05-15 12:51:44 +03:00
Miriam Baglioni 78b07400c0 changed test classes 2023-05-15 11:37:08 +02:00
Miriam Baglioni 86fe886c1a removed the inverse of the Citing relation 2023-05-15 11:20:51 +02:00
Ilias Kanellos 1788ac2d4d Correct filtering for MAG records 2023-05-12 12:55:43 +03:00
Miriam Baglioni 12cd179d2d Merge pull request 'Update copyDataToImpalaCluster.sh' (#291) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #291
2023-05-12 11:36:34 +02:00
dimitrispie 00d0d162b6 Update copyDataToImpalaCluster.sh
Added a temporary folder to copy the files to avoid permission issues
2023-05-12 12:31:13 +03:00
Ilias Kanellos 5ddbb4ad10 Spark properties no longer hardcoded 2023-05-11 15:36:47 +03:00
Ilias Kanellos 3de35fd6a3 Produce 5 classes of ranking scores 2023-05-11 14:42:25 +03:00
Miriam Baglioni 8c05f49665 moved the version as it was before the change 2023-05-09 10:48:34 +02:00
Miriam Baglioni 99ac5bab46 added check to avoid NPE when checking the organization country 2023-05-04 19:38:39 +02:00
Claudio Atzori 0704e186f6 Merge pull request 'Stats wf executed on hive only' (#283) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #283
2023-05-02 14:05:12 +02:00
Claudio Atzori d8882c4481 extended mapping applied to datacite records to produce affiliations using the ROR ids. Inc ase of APCs it includes the amount and the currently in the relation 2023-05-02 11:56:51 +02:00
dimitrispie c3d58e58e1 Bug fixes 2023-05-02 11:54:07 +03:00
Claudio Atzori abd7ca0c18 Merge branch 'beta' into bulkTagRefactor 2023-05-02 10:50:01 +02:00
Claudio Atzori 45f625d14f Merge branch 'beta' into organizationToRepresentative 2023-05-02 10:46:55 +02:00
Claudio Atzori de11edca98 Merge branch 'beta' into organizationToRepresentative 2023-05-02 09:59:41 +02:00
Claudio Atzori 851f664bd9 Merge branch 'beta' into graph_cleaning_refactoring 2023-05-02 09:55:40 +02:00
dimitrispie e57ecdaf98 Update step20-createMonitorDB.sql
Add University of Manitoba
2023-04-30 17:52:23 +03:00
Ilias Kanellos 90332439ad Remove deletion of synonym folder 2023-04-28 13:45:19 +03:00
Ilias Kanellos a98da54896 Merge branch '8172_impact_indicators_workflow' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8172_impact_indicators_workflow 2023-04-28 13:23:49 +03:00
Ilias Kanellos 09485fbee3 Fixed unicode bug. Workflow ends after first script 2023-04-28 13:09:13 +03:00
Serafeim Chatzopoulos 614cc1089b Add separate forder for results && project actionsets 2023-04-27 12:37:15 +03:00
Serafeim Chatzopoulos 815a4ddbba Add actionset creation for project bip indicators in workflow 2023-04-26 20:40:06 +03:00
Serafeim Chatzopoulos ee04cf92bf Add actionsets for project impact indicators 2023-04-26 20:23:46 +03:00
dimitrispie fdb5d2b39f Bug fixes 2023-04-23 18:29:00 +03:00
dimitrispie 53ce023035 Bug fixes 2023-04-23 18:23:45 +03:00
dimitrispie 4fa750b719 Bug fixes on monitor-update 2023-04-19 17:39:53 +03:00
dimitrispie 5247cb7115 Bug fix 2023-04-19 11:11:19 +03:00
Miriam Baglioni efc4f6a658 [bulkTag] refactor to enrich each result single step 2023-04-18 17:39:31 +02:00
Serafeim Chatzopoulos 23f58a86f1 Change jar param in project impact indicators action 2023-04-18 12:26:01 +03:00
Miriam Baglioni 697a134504 - 2023-04-18 10:21:12 +02:00
Miriam Baglioni 6cc95c96a2 - 2023-04-18 09:53:11 +02:00
dimitrispie 25dafccc24 Merge branch 'hive' into beta 2023-04-12 11:36:59 +03:00
Claudio Atzori a2dcb06daf added eoscifguidelines in the result view; removed compute statistics statements 2023-04-11 10:43:32 +02:00
Serafeim Chatzopoulos 7256c8d3c7 Add script for aggregating impact indicators at the project level 2023-04-07 16:30:12 +03:00
dimitrispie c85de8fa1f -Added Technological University Dublin
-Added project_organization_contribution table
-Add   Delft University of Technology
2023-04-07 09:22:59 +03:00
dimitrispie 9b41dff33c Update step20-createMonitorDB.sql
Added Delft University of Technology
2023-04-07 09:21:38 +03:00
Miriam Baglioni 932d07d2dd [bulkTag] added filtering for datasources in eosctag 2023-04-06 15:08:27 +02:00
Miriam Baglioni 287753417d better implementation for the fix 2023-04-06 12:22:38 +02:00
Miriam Baglioni b42abc9904 fixed issue on bulktagging for the advanced constraints 2023-04-06 12:15:00 +02:00
dimitrispie 91e18ac7f4 Added project_organization_contribution table 2023-04-06 10:53:11 +03:00
Miriam Baglioni b25b401065 added test to verify the advconstraints to dth community. inserted some additional logs. 2023-04-05 12:18:39 +02:00
Claudio Atzori 864f4051d3 [graph cleaning] added missing case 2023-04-05 11:35:47 +02:00
Claudio Atzori dead87917f [graph cleaning] cleanup 2023-04-04 13:13:43 +02:00
Claudio Atzori 2a6ba29b64 [graph cleaning] unit tests & cleanup 2023-04-04 12:34:51 +02:00
dimitrispie 9e1335df4c -Added Technological University Dublin
-Added project_organization_contribution table
2023-04-04 13:22:40 +03:00
Claudio Atzori 63b8bbc015 [graph to Solr] using dedicated sparkExecutorCores, sparkExecutorMemory, sparkDriverMemory in convert_to_xml 2023-03-24 13:43:20 +01:00
Claudio Atzori b502f86523 fixed input path supplemented to GetDatasourceFromCountry; adjusted the various spark.sql.shuffle.partitions 2023-03-24 13:09:12 +01:00
Claudio Atzori c07857fa37 [graph cleaning] unit tests & cleanup 2023-03-23 15:57:47 +01:00
Claudio Atzori 90e61a8aba [graph cleaning] WIP: refactoring of the cleaning stages, unit tests 2023-03-23 15:03:26 +01:00
Claudio Atzori 308e10d102 serialising: 1. measures for all the entity types and 2. result level fulltext 2023-03-23 11:23:22 +01:00
Claudio Atzori 488d9a5eaa [graph cleaning] WIP: refactoring of the cleaning stages, unit tests 2023-03-23 10:41:13 +01:00
dimitrispie fad7fa4af8 Added Technological University Dublin 2023-03-22 09:44:00 +02:00
Serafeim Chatzopoulos 102aa5ab81 Add dependency to dhp-aggregation 2023-03-21 19:25:29 +02:00
Serafeim Chatzopoulos 3e8a4cf952 Rearrange resources folder structure 2023-03-21 18:25:55 +02:00
Serafeim Chatzopoulos f992ecb657 Checkout BIP-Ranker during 'prepare-package' && add it in the oozie-package.tar.gz 2023-03-21 18:03:55 +02:00
Ilias Kanellos 9dc8f0f05f Add ActionSet step 2023-03-21 16:14:15 +02:00
Claudio Atzori 4f5ba0ed52 [graph cleaning] WIP: refactoring of the cleaning stages, unit tests 2023-03-21 14:41:20 +01:00
Ilias Kanellos b5c252865c Add filtering based on citation source 2023-03-20 15:38:36 +02:00
Claudio Atzori 6d3d18d8b5 [graph cleaning] WIP: refactoring of the cleaning stages 2023-03-16 17:23:36 +01:00
dimitrispie 43b23a9bf3 Update step20-createMonitorDB.sql
Added Technological University Dublin
2023-03-15 09:57:12 +02:00
Serafeim Chatzopoulos 720fd19b39 Add dhp-impact-indicators workflow files 2023-03-14 19:28:27 +02:00
Serafeim Chatzopoulos c6e39b7f33 Add dhp-impact-indicators 2023-03-14 18:50:54 +02:00
Claudio Atzori 518618f1a9 [graph cleaning] avoid to overwrite the subject class to 'keyword' for those with provenance 'subject:fos' 2023-03-14 15:22:47 +01:00
Claudio Atzori 41e00bcd07 [graph provision] avoid to parse again the XML records, apparently the escaped XML characters get unescaped invalidating the record 2023-03-13 15:19:49 +01:00
Claudio Atzori 24e2fd828b code formatting 2023-03-08 21:17:08 +01:00
Claudio Atzori e28d395e87 [aggregator graph] using dedicated path to sync claims, adjusted paths with wildcards 2023-03-08 21:16:52 +01:00
Claudio Atzori 5b8fd37314 [aggregator graph] using dedicated path to sync claims 2023-03-08 15:28:14 +01:00
Claudio Atzori 7fd89566c2 [aggregator graph] handle paths including wildcards 2023-03-08 12:43:00 +01:00
Miriam Baglioni 588aca5ce4 Merge pull request 'h2020classification' (#280) from h2020classification into beta
Reviewed-on: #280
2023-03-03 09:29:10 +01:00
Claudio Atzori 8ec0d62d91 pre-group the records in each table before joning the contents from BETA and PROD together 2023-03-02 14:49:19 +01:00
Miriam Baglioni 0fff98a14c [ECclassification] removed print 2023-03-02 11:46:57 +01:00
Miriam Baglioni b0c2f7e526 [ECclassification] removed not needed resources 2023-03-02 11:44:48 +01:00
Miriam Baglioni d4fc62c2f6 mergin with branch beta 2023-03-02 11:14:54 +01:00
Miriam Baglioni de8ad1caef [ECclassification] new implementation for the H2020 classification 2023-03-02 11:14:03 +01:00
Claudio Atzori db9dad4aa7 [actionmanager] increased spark.sql.shuffle.partitions for publication, dataset, relation records 2023-03-02 09:11:37 +01:00
Miriam Baglioni c1f9848953 [ECclassification] added new classes 2023-03-01 15:29:11 +01:00
Claudio Atzori 6f488547a7 ignore non processable records 2023-03-01 14:49:51 +01:00
Claudio Atzori 7d263f265e adjusted logs 2023-03-01 11:58:07 +01:00
Claudio Atzori 16ad42e8f3 code formatting 2023-03-01 10:22:13 +01:00
Claudio Atzori 9c59dac859 followup changes reorganising the mdstore synchronisation mechanism 2023-03-01 10:16:20 +01:00
Miriam Baglioni ad745c0aa3 [CrossrefFunderMapping] fixed issueson funder name 2023-02-28 14:58:27 +01:00
Miriam Baglioni 4f2df876cd [ECclassification] new implementation first try 2023-02-28 14:44:00 +01:00
Claudio Atzori 2f7346e9cf WIP monodirectional citations, Datacite 2023-02-28 13:30:51 +01:00
Claudio Atzori 0559d8b412 WIP monodirectional citations 2023-02-28 10:57:32 +01:00
Sandro La Bruzzo 69fa616490 removed wrong content 2023-02-28 10:27:38 +01:00
Sandro La Bruzzo 832a75d012 added mapping for crossref funder 2023-02-28 10:16:34 +01:00
Sandro La Bruzzo 78e51c182a Added missing parametero to raw all workflow 2023-02-28 10:16:01 +01:00
Claudio Atzori 7aebedb43c code formatting 2023-02-27 11:51:27 +01:00
Miriam Baglioni 80987801d7 [FoS] added check for null on level1 subject 2023-02-27 11:40:22 +01:00
Claudio Atzori 31e97c2a6b [unresolved entities] updated oozie wf node labels 2023-02-27 11:38:29 +01:00
Miriam Baglioni 23112929e9 [FoS] changed the default separator from comma to tab to solve the issue in subject value split 2023-02-27 10:18:39 +01:00
Serafeim Chatzopoulos 0b5bf53b45 Remove unecessary indexed fields from Solr 2023-02-23 12:42:42 +02:00
dimitrispie 1547611246 Merge branch 'beta' into hive 2023-02-22 16:57:12 +02:00
Michele Artini fddcf701e9 updated the order of the compatibilities 2023-02-22 12:07:09 +01:00
Claudio Atzori 0c1be41b30 code formatting 2023-02-22 10:15:25 +01:00
Claudio Atzori 99cd7761aa cleanup of non necessary dhp-monitor-update workflow 2023-02-22 10:10:22 +01:00
Claudio Atzori cd3a51a15f Merge branch 'beta' into 8232-mdstore-synch-improve 2023-02-22 09:57:07 +01:00
Claudio Atzori 477a7c416f Merge branch 'beta' into UsageCountOnProjectAndDatasource 2023-02-22 09:55:51 +01:00
Claudio Atzori c20c1c9159 Merge pull request 'Added 4 institutions:' (#261) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #261
2023-02-22 09:53:45 +01:00
Miriam Baglioni d617c3e812 [DOIBoost] extended mapping for funder #8407 2023-02-20 14:45:27 +01:00
dimitrispie 90807b60c7 Changes to monitor wf 2023-02-20 10:42:24 +02:00
dimitrispie d2f9ccf934 Changes to separate monitor wf 2023-02-20 10:41:21 +02:00
dimitrispie 032a401cbf Bug fixes 2023-02-20 09:29:20 +02:00
Miriam Baglioni 016337a0f9 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2023-02-16 15:54:59 +01:00
Sandro La Bruzzo 118c1fc3b3 Merge remote-tracking branch 'origin/beta' into beta 2023-02-15 10:29:28 +01:00
Sandro La Bruzzo a8ac79fa25 Added citation relation on crossref Mapping 2023-02-15 10:29:13 +01:00
dimitrispie 595192d510 Bug fix 2023-02-14 16:24:08 +02:00
dimitrispie f3aaff3688 Remove duplicate orgs 2023-02-14 09:48:36 +02:00
Claudio Atzori 9a03f71db1 code formatting 2023-02-13 16:25:47 +01:00
Michele Artini 554df257ab null values in date range conditions 2023-02-13 16:15:32 +01:00
dimitrispie 3400133c2f Bug fix 2023-02-13 09:44:00 +02:00
dimitrispie 935db0ab25 Added organizations for Monitor 2023-02-13 09:29:09 +02:00
dimitrispie 7b78b15c81 Changes for copying to Impala Cluster 2023-02-13 09:27:00 +02:00
Miriam Baglioni 5cf902a2b0 [UsageCount] changed query to make the sum be computed via sql instead of grouping 2023-02-10 16:16:37 +01:00
Miriam Baglioni f803530df6 [UsageCount] fixed query 2023-02-10 15:50:56 +01:00
Miriam Baglioni bb5bba51b3 [UsageCount] extended test 2023-02-09 19:08:30 +01:00
Miriam Baglioni 85e53fad00 [UsageCount] addition of usagecount for Projects and datasources. Extention of the action set created for the results with new entities for projects and datasources. Extention of the resource set and modification of the testing class 2023-02-09 18:59:45 +01:00
dimitrispie d71f5672d3 Add monitor post step 2023-02-09 13:44:14 +02:00
dimitrispie 35ba8bb328 Bug fixes 2023-02-09 12:57:57 +02:00
Sandro La Bruzzo 8920932dd8 Code formatted 2023-02-08 11:34:18 +01:00
Sandro La Bruzzo 0b9819f1ab Code formatted 2023-02-08 10:32:33 +01:00
Sandro La Bruzzo 6c81a161d2 Merge remote-tracking branch 'origin/beta' into 8231-mdstore-synch-improve 2023-02-08 10:29:09 +01:00
dimitrispie 3ba11d64a1 Changes 07022023 2023-02-07 12:53:51 +02:00
dimitrispie 98c34263ed Update step20-createMonitorDB.sql
Add University of Cape Town organization
2023-02-07 08:14:48 +02:00
dimitrispie 2dc6d47270 Changes 06022023 2023-02-06 13:18:53 +02:00
dimitrispie 973d78a4d6 Update step15_5.sql
Added unpaywalls open access colors
2023-02-02 08:03:54 +02:00
Claudio Atzori d05ca53a14 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2023-01-31 14:39:53 +01:00
Miriam Baglioni e82e009b46 added missing close tag for XML produced by the xquery to get information for the community from the IS 2023-01-31 10:19:34 +01:00
Miriam Baglioni b254a0375f [Affiliation from institutionalrepo] changed the field to check to verify the datasource type. Now it is in the field jurisdiction 2023-01-26 16:51:20 +01:00
dimitrispie cf58e4a5e4 Added Arts et Métiers ParisTech 2023-01-25 16:03:16 +02:00
dimitrispie db7d625ba9 Addedd Arts et Métiers ParisTech organization 2023-01-25 12:22:21 +02:00
Claudio Atzori 505867bce9 [bulk tagging] better node naming 2023-01-20 16:13:16 +01:00
Miriam Baglioni ecd398fe51 refactoring 2023-01-20 14:23:45 +01:00
Miriam Baglioni 0a5c6010b0 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2023-01-13 16:14:46 +01:00
dimitrispie 4d7553c9f1 Bug fixes 2023-01-12 17:19:19 +02:00
dimitrispie dd70c32ad7 Bug fixes 2023-01-12 17:18:05 +02:00
dimitrispie 51f7ab5864 Bug fixes 2023-01-12 17:15:06 +02:00
dimitrispie 34d4bf727c Bug fixes 2023-01-12 11:28:37 +02:00
dimitrispie 43f6d4f296 -Monitor DB workflow 2023-01-12 11:26:47 +02:00
dimitrispie 686580a220 - New Monitor DB workflow
- New Organization added
2023-01-12 11:18:03 +02:00
Claudio Atzori 0a58bc7ba7 [broker] prevent NPEs 2023-01-11 14:44:14 +01:00
Claudio Atzori 04cb96001c [broker] d40e20f437 adapted to the beta graph model 2023-01-11 10:10:12 +01:00
Michele Artini 91b845f611 Considering instance pids and alteternative identifiers 2023-01-11 09:58:54 +01:00
Miriam Baglioni 1f367122e4 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2023-01-11 09:47:44 +01:00
Michele Artini 7b7520850b fixed an invalid char 2023-01-11 09:22:18 +01:00
Miriam Baglioni d6895f0387 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2023-01-09 17:28:38 +01:00
dimitrispie becb242c17 Monitor DB only Workflow 2023-01-04 16:50:29 +02:00
dimitrispie dcb958e146 Changes to execute the stats wf only in hive 2023-01-04 11:39:01 +02:00
dimitrispie 592013d5dd Added more steps in decision node 2022-12-23 09:43:16 +02:00
dimitrispie 2a4bf32d4c Merge branch 'hive' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into hive
# Conflicts:
#	dhp-workflows/dhp-stats-update/src/main/resources/eu/dnetlib/dhp/oa/graph/stats/oozie_app/scripts/step10.sql
#	dhp-workflows/dhp-stats-update/src/main/resources/eu/dnetlib/dhp/oa/graph/stats/oozie_app/scripts/step13.sql
#	dhp-workflows/dhp-stats-update/src/main/resources/eu/dnetlib/dhp/oa/graph/stats/oozie_app/scripts/step14.sql
#	dhp-workflows/dhp-stats-update/src/main/resources/eu/dnetlib/dhp/oa/graph/stats/oozie_app/scripts/step16_1-definitions.sql
#	dhp-workflows/dhp-stats-update/src/main/resources/eu/dnetlib/dhp/oa/graph/stats/oozie_app/scripts/step7.sql
2022-12-22 10:22:46 +02:00
dimitrispie 6449ff4207 1. Added a decision node to enables the workflow to make a selection on the execution path to follow
2. Added new organization
3. Added 5 new tables from Eurostast
2022-12-22 10:18:21 +02:00
Miriam Baglioni 8893389895 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2022-12-21 12:42:27 +01:00
Antonis Lempesis c8309fe18e addded command line params to allow hive actions to run 2022-12-21 12:41:33 +02:00
Antonis Lempesis 028873cc51 added new hive opts 2022-12-21 12:41:33 +02:00
Antonis Lempesis 1ddea4f442 removed 'stored as parquet' from views.. 2022-12-21 12:41:33 +02:00
Antonis Lempesis 2754c3dd62 moving data to impala cluster and creating shadow databases there 2022-12-21 12:41:29 +02:00
Antonis Lempesis 778a1a724f finished migration to hive only 2022-12-21 12:41:25 +02:00
Antonis Lempesis e84dd5fe26 first 2022-12-21 12:41:23 +02:00
Sandro La Bruzzo 3c9826f186 updated lines function to it's implementation linesWithSeparators.map(l => l.stripLineEnd) in this way we force scala plugin compiler to consider this pipeline scala code and not java.string.lines() pipeline 2022-12-21 11:21:17 +01:00
Claudio Atzori 6aa91204a5 [orcid propagation] skip empty directories 2022-12-20 14:15:46 +01:00
Miriam Baglioni 6674cccb94 [BulkTag] description of parameters more comprehensive for those who do not implement it 2022-12-16 15:33:20 +01:00
Miriam Baglioni f37113a941 [BulkTag] moving xquery to get community configuration in dedicated file 2022-12-16 15:32:26 +01:00
Miriam Baglioni 8685eaa706 [Clean Country] added test to verify remove of country 2022-12-16 15:31:25 +01:00
Miriam Baglioni dc0ec88a58 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2022-12-16 13:18:32 +01:00
Miriam Baglioni d791840b82 [Clean Country] added test to verify remove of country: 2022-12-16 13:18:29 +01:00
Claudio Atzori 7b80b24f82 [cleaning] country cleaning must use both PID and AlternateIdentifier fields 2022-12-15 14:49:04 +01:00
Claudio Atzori b8bafab8a0 [cleaning] improved vocabulary based mapping, specialization for the strict vocab cleaning 2022-12-12 14:43:03 +01:00
Sandro La Bruzzo 5e4866d033 implemented synch for single mdstore 2022-12-12 11:29:46 +01:00
Claudio Atzori c18b8048c3 [cleaning] avoid NPE 2022-12-10 11:41:38 +01:00
Claudio Atzori 8b44afe5e5 [cleaning] avoid NPE 2022-12-09 15:44:57 +01:00
Claudio Atzori 389dd25430 [cleaning] avoid NPE 2022-12-08 18:40:48 +01:00
Claudio Atzori 730228d73d [cleaning] align wf parameter names in test 2022-12-08 18:40:22 +01:00
Claudio Atzori 2094fa6db0 [cleaning] align wf parameter names 2022-12-08 17:22:26 +01:00
Miriam Baglioni a485a94956 [Cleaning] fixed parameter name in property file 2022-12-08 16:59:34 +01:00
Miriam Baglioni 3d99b78d94 [Cleaning] fixed error in parameter (workingPath to workingDir) 2022-12-08 10:25:02 +01:00
Claudio Atzori 1b8488976b code formatting 2022-12-07 10:45:38 +01:00
Claudio Atzori cd1b58483e [bulk tag] fixed Community configuration parsing to void NPE 2022-12-07 10:39:00 +01:00
Claudio Atzori 062abfd669 fixed NPE, removed unused stuff 2022-12-06 12:04:00 +01:00
dimitrispie 2a52a42169 Added 4 institutions:
-University of Modena and Reggio Emilia
-Bilkent University
-Saints Cyril and Methodius University of Skopje
-University of Milan
2022-12-06 10:10:21 +02:00
Claudio Atzori 8248da40d9 Merge branch 'beta' into graph_cleaning 2022-12-02 14:49:00 +01:00
Claudio Atzori ddf065756f Merge pull request 'Two organizations are added for monitor' (#258) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #258
2022-12-02 14:45:27 +01:00
Sandro La Bruzzo 5a48a2fb18 implemented synch for single mdstore 2022-12-01 11:34:43 +01:00
Claudio Atzori a38116546d Merge branch 'beta' into deduptesting 2022-11-30 11:27:29 +01:00
Miriam Baglioni ce020f2c83 [EOSC FUTURE] added resources and test for review 2022-11-30 09:57:30 +01:00
Miriam Baglioni bb0ddc1c44 [BulkTag] adding verb starts_with 2022-11-30 09:56:24 +01:00
Claudio Atzori 8e3edba318 [graph cleaning] testing the collectedfron and hostedby patch procedure 2022-11-29 16:07:09 +01:00
Claudio Atzori 58c05731f9 [graph cleaning] WIP: testing the collectedfron and hostedby patch procedure 2022-11-29 11:21:51 +01:00
Miriam Baglioni 9c70c5dbd6 [Bulk Tag horizontal] added new path in definition of constraint (to recognize fos subjects) - changed test and resource class to test this new aspect 2022-11-28 14:51:20 +01:00
Miriam Baglioni 0628df7a3a resolving conflicts 2022-11-28 10:44:56 +01:00
Claudio Atzori 11695ba649 [graph cleaning] patch also the result's collectedfrom and hostedby datasource name according to the datasource master-duplicate mapping 2022-11-28 10:18:43 +01:00
Claudio Atzori 6082d235d3 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into graph_cleaning 2022-11-28 09:54:48 +01:00
Claudio Atzori 24ef301cc1 [graph cleaning] patch the result's collectedfrom and hostedby identifiers according to the datasource master-duplicate mapping 2022-11-28 09:54:18 +01:00
Alessia Bardi 90c8f9cb61 tests for EOSC Future 2022-11-23 12:18:44 +01:00
Miriam Baglioni 0e3edc5018 [Bulk Tag] fixed issue in verb name 2022-11-23 11:26:36 +01:00
Claudio Atzori a79c47522d updated ORCID datasource identifier 2022-11-23 10:17:49 +01:00
Alessia Bardi 2832117f23 added eoscifguidelines in test 2022-11-22 18:01:12 +01:00
Alessia Bardi 3c08269a4d Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2022-11-22 17:31:00 +01:00
Alessia Bardi 2687fc9f73 tests for EOSC Future review - ROhub 2022-11-22 17:30:56 +01:00
Claudio Atzori 1d5143b0b6 Merge branch 'beta' into deduptesting 2022-11-22 10:21:30 +01:00
Claudio Atzori 0aa725083f extended dedup testing 2022-11-17 16:13:43 +01:00
Claudio Atzori 3dbc637d3e code formatting 2022-11-17 09:55:41 +01:00
Claudio Atzori ddff0e8999 merging duplicates using IdentifierComparator 2022-11-11 16:10:25 +01:00
Claudio Atzori 5af5a8ae42 added IdentifierComparator 2022-11-09 14:20:59 +01:00
Claudio Atzori 7c3390ac10 Merge branch 'beta' into eoscifguidelines-from-mdstores 2022-11-07 12:18:40 +01:00
dimitrispie 992fc5b628 Added McMaster University Institution 2022-11-03 11:02:18 +02:00
dimitrispie 7fda05e380 Added Autonomous University of Barcelona 2022-11-01 13:59:40 +02:00
Claudio Atzori 22873c9172 Merge pull request 'Added fields: totalcost, fundedamount, currency, in project table' (#257) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #257
2022-10-31 13:49:27 +01:00
dimitrispie 7861c472e0 Hive memory parameters 2022-10-28 19:00:32 +03:00
dimitrispie 5df9c63963 Added fields: totalcost, fundedamount, currency, in project table 2022-10-27 16:44:26 +03:00
Sandro La Bruzzo 2b9a20a4a3 Changed the way Scholexplorer filter the relationships, I found that filter all relation coming from openCitation is wrong, because we loose a lot of relation than intersect OpenCitation, but they don't come only from there 2022-10-24 12:53:47 +02:00
Alessia Bardi 208ed32315 fixed xpath for semantic relation 2022-10-23 18:18:13 +02:00
Alessia Bardi ee759ac92d file format after mvn compile 2022-10-23 18:09:47 +02:00
Alessia Bardi 31a10f000b Map the field oaf:eoscifguidelines from mdstores. Currently we can find it in ROHub metadata 2022-10-23 18:05:37 +02:00
Claudio Atzori ec39b84898 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2022-10-19 15:21:02 +02:00
Claudio Atzori bca4a61710 suppressing hyper verbose spark logs during unit test execution 2022-10-19 15:20:58 +02:00
Sandro La Bruzzo 72f0d88d6c formatted code 2022-10-19 14:18:42 +02:00
Claudio Atzori 9b449110c6 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2022-10-14 15:48:04 +02:00
Claudio Atzori ae7cd0735a [graph2hive] more partitions 2022-10-14 15:47:58 +02:00
Sandro La Bruzzo 135cf81151 Merge remote-tracking branch 'origin/beta' into beta 2022-10-13 11:47:25 +02:00
Sandro La Bruzzo a1f94530a3 added documentation 2022-10-13 11:47:11 +02:00
Claudio Atzori b47aaf4dd1 [cleaning] subjects declared as belonging to specific vocabularies whose values are not found in the vocab are set to type keyword 2022-10-13 11:23:43 +02:00
Claudio Atzori 6163ecbf63 [cleaning] renamed parameters in wf action 2022-10-11 11:20:03 +02:00
Claudio Atzori b301e9fdff [cleaning] renamed action name/description 2022-10-11 11:08:52 +02:00
Claudio Atzori ece40adc09 [cleaning] fixing NPE in the country cleaning phase 2022-10-11 10:10:20 +02:00
Claudio Atzori d51275a965 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2022-10-07 09:52:49 +02:00
Claudio Atzori 8d97949316 [cleaning] fixed loop in wf nodes 2022-10-07 09:52:45 +02:00
Miriam Baglioni a653e1b3ea [Enrichment - result to community through organization] reimplementation of the data preparation step using spark 2022-10-04 15:01:28 +02:00
Miriam Baglioni 4d8339614b Revert "[BipFinder] Fixed issue for wrong escaped char in doi"
This reverts commit 188f25eefa.
2022-10-04 14:29:47 +02:00
Miriam Baglioni 7324853a17 Revert "[BipFinder] refactoring"
This reverts commit 28dc317350.
2022-10-04 14:29:39 +02:00
Miriam Baglioni 28dc317350 [BipFinder] refactoring 2022-10-04 09:47:27 +02:00
Miriam Baglioni 188f25eefa [BipFinder] Fixed issue for wrong escaped char in doi 2022-10-03 12:42:52 +02:00
Claudio Atzori 89f7007080 Merge pull request '[stats wf] misc changes' (#254) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #254
2022-10-03 10:32:05 +02:00
dimitrispie 2c0c3f1806 Cast amount to float for table result_apcs 2022-09-28 19:33:24 +03:00
Alessia Bardi 49360770d7 map w3id as instance url 2022-09-28 14:16:39 +02:00
dimitrispie bdc46e3eaa Remove denormalization of results to fix downloads numbers in monitor 2022-09-28 14:59:08 +03:00
dimitrispie 2ebb1459a9 Fixed type in no_downloads 2022-09-28 14:36:57 +03:00
Miriam Baglioni b5b5a4c192 [CleanCountry] fixed issue 2022-09-28 12:42:51 +02:00
Miriam Baglioni f1d7d45cf7 [BulkTag] fixed issue 2022-09-28 12:01:43 +02:00
Miriam Baglioni 3ec044600d [BulkTag] fixed conflicts 2022-09-28 11:58:28 +02:00
Miriam Baglioni 1cb79719a7 [BulkTag] fixed issues 2022-09-28 11:44:55 +02:00
Claudio Atzori f3f7604e6c trying to fix a test that fails only on Jenkins 2022-09-27 15:21:37 +02:00
Claudio Atzori 3f90d159e3 code formatting 2022-09-27 15:08:00 +02:00
Claudio Atzori 0b3e44e521 Merge branch 'beta' into relation-from-odf 2022-09-27 14:57:01 +02:00
Claudio Atzori 57dbeb08d2 code formatting 2022-09-27 14:55:10 +02:00
Claudio Atzori b60985cf68 Merge branch 'beta' into horizontalConstraints 2022-09-27 14:39:31 +02:00
Claudio Atzori 3b60642ef9 Merge pull request 'Synchronize indicators in stats-db with monitor-db' (#249) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #249
2022-09-27 14:37:33 +02:00
Claudio Atzori 25e9d92aad Merge branch 'beta' into clean_country 2022-09-27 14:27:49 +02:00
Alessia Bardi fd63e9bfac Mapping all relationships supported in ModelConstants and ModelSupport 2022-09-26 11:24:13 +02:00
Miriam Baglioni ca216a92ad [BulkTagging] changed the query to the IS to insert values for FOS and SDG as subject in the configuration used for the tagging 2022-09-23 17:06:07 +02:00
Miriam Baglioni 3e6b0f58bb [BulkTagging] changed the query to the IS to get also the information for the advancedConstraint from the profile 2022-09-23 16:47:19 +02:00
Miriam Baglioni 4a3e119b73 mergin with branch beta 2022-09-23 16:16:06 +02:00
Miriam Baglioni f0e303abf9 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2022-09-23 16:15:32 +02:00
Miriam Baglioni 55da4d8715 [BulkTagging] modifying code to represent constraints horizontally on all the results. Added subject to the set of field used to express the constraint. Modified resorces to test the new approach. Modified test calss 2022-09-23 16:02:19 +02:00
Alessia Bardi c5eb722170 relationships from relatedIdentifier whose target id type is one of the pid type with an authority 2022-09-23 15:47:05 +02:00
Claudio Atzori c86cc53520 suppressing hyper verbose spark logs during unit test execution 2022-09-23 15:20:40 +02:00
Alessia Bardi ba33ff71fd refactoring for the generation of relationships from related identifier of type 'OPENAIRE' 2022-09-23 15:17:13 +02:00
Alessia Bardi 982bcc1e35 test wrid pid and record identifier 2022-09-23 12:06:06 +02:00
Miriam Baglioni 960cb861a0 refactoring 2022-09-23 11:14:04 +02:00
Claudio Atzori c42850328e fixed semantic (subreltype) for ServiceOrganization relations 2022-09-22 16:23:25 +02:00
Miriam Baglioni 33bb79459e Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2022-09-22 15:55:17 +02:00
dimitrispie dcd85f8cd7 - Synchronize indicators in stats-db with monitor-db
- added new openorg id for Nanyang Technological University
- changed openorg id for University of Helsinki #8088 ticket
2022-09-22 13:33:07 +03:00
Claudio Atzori e45ec15221 Merge branch 'beta' into clean_country 2022-09-19 11:34:02 +02:00
Claudio Atzori 26e1badded added instance.url syntactical validation, avoid creating multiple duplicated URLs 2022-09-19 11:19:10 +02:00
Miriam Baglioni 5240ac3d7b [EOSC Tag] remove addition of eosc context for result with eosc if guidelines set 2022-09-19 11:02:18 +02:00
Claudio Atzori 192215a18e merged from branch discard-non-wellformed 2022-09-19 10:17:10 +02:00
Claudio Atzori e370e940d8 [aggregator graph] save invalid records aside for further inspection 2022-09-16 14:06:28 +02:00
Claudio Atzori 465e941214 Merge pull request '[stats wf] Changes to indicators tables' (#244) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #244
2022-09-16 10:13:58 +02:00
Claudio Atzori 1e42d984e1 [aggregator graph] save invalid records aside for further inspection 2022-09-15 10:49:42 +02:00
Alessia Bardi 9e7ec4198f fixed test 2022-09-14 18:08:56 +02:00
Claudio Atzori c48f6e9c57 [aggregator graph] save invalid records aside for further inspection 2022-09-14 17:11:26 +02:00
dimitrispie 3bf3127251 Changes to monitor and indicator scripts 2022-09-14 16:36:19 +03:00
Claudio Atzori a0919ed495 [aggregator graph] save invalid records aside for further inspection 2022-09-14 13:27:39 +02:00
Alessia Bardi b99a011345 return empty Oaf list if record cannot be parsed 2022-09-13 11:51:55 +02:00
Alessia Bardi 27af5122d2 logs for non well formed XML files 2022-09-12 14:25:23 +02:00
Claudio Atzori ff6f789b6d code formatting 2022-09-09 15:16:31 +02:00
Claudio Atzori b5d6966c01 Merge branch 'beta' into clean_country 2022-09-09 12:20:19 +02:00
Claudio Atzori b5f7bd30be Merge branch 'beta' into clean_subjects 2022-09-09 12:20:04 +02:00
Alessia Bardi f14107ad77 Merge branch 'handle_as_instance_urls' of https://code-repo.d4science.org/D-Net/dnet-hadoop into handle_as_instance_urls 2022-09-09 12:17:19 +02:00
Alessia Bardi a539c6ccaf https for handle URLs 2022-09-09 12:16:28 +02:00
dimitrispie 71b069ca90 Changes to indicator and monitor scripts 2022-09-09 13:15:58 +03:00
Claudio Atzori 1203378441 Merge branch 'beta' into clean_subjects 2022-09-09 10:38:47 +02:00