Commit Graph

3915 Commits

Author SHA1 Message Date
dimitrispie 4648cd88d4 Update step15.sql
Cast score to double
2023-06-21 10:02:19 +03:00
dimitrispie 94d2573c77 Update step15.sql
Bug Fix
2023-06-21 09:22:39 +03:00
Claudio Atzori 0561362de2 Merge pull request 'Update step20-createMonitorDB_institutions.sql' (#309) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#309
2023-06-20 15:07:09 +02:00
Claudio Atzori 50d7dc0078 [graph enrichment] fixed projectOrganizationPath not being passed to the apply_resulttoorganization_propagation node 2023-06-19 15:42:44 +02:00
Claudio Atzori fbd9bf704e indent 2023-06-19 15:41:22 +02:00
dimitrispie be2caedb04 Update step20-createMonitorDB_institutions.sql
Add openorgs____::1624ff7c01bb641b91f4518539a0c28a Vrije Universiteit Amsterdam
2023-06-19 12:12:17 +03:00
dimitrispie 36e0a8fec4 Changes to Promotion Stats WF
1. Add new cluster host at impala-shell commands
2. Add a step for splitting monitor dbs
3. Update workflow.xml to included the new splitting monitor dbs step
2023-06-19 09:44:34 +03:00
dimitrispie 4c770a5e29 Update finalizeImpalaCluster.sh
Drop views in shadow dbs before dropping the db
2023-06-15 13:25:37 +03:00
dimitrispie e06d962a6a Update step15.sql 2023-06-15 12:20:35 +03:00
dimitrispie afcad08396 Update step20-createMonitorDB_institutions.sql
Added openorgs____::c0b262bd6eab819e4c994914f9c010e2   -- National Institute of Geophysics and Volcanology
2023-06-15 10:28:49 +03:00
Claudio Atzori b9748763e2 Merge pull request '[stats wf] Bug fixes' (#308) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#308
2023-06-14 21:57:03 +02:00
dimitrispie 42b8ce2ba4 Update copyDataToImpalaCluster.sh 2023-06-14 19:23:42 +03:00
dimitrispie 2032b0df40 Bug fixes
1. Remove tables/views from old databases in the new cluster, before dropping the dbs
2. Fix id in result_accessroute, indi_impact_measures, indi_pub_bronze_oa
2023-06-14 19:09:09 +03:00
Claudio Atzori b76a47b103 [aggregator graph] added column alias when mapping organization PIDs from the OpenOrgs database 2023-06-13 11:38:10 +02:00
Claudio Atzori ad04f14b81 Merge branch 'beta' into distinct_pids_from_openorgs_beta 2023-06-12 09:58:21 +02:00
Claudio Atzori 55f002f1e9 Merge branch 'beta' into propagationProjectThroughParentChils 2023-06-12 09:56:53 +02:00
Claudio Atzori 4b00a76271 Merge branch 'beta' into fulltext_url_validation 2023-06-12 09:55:25 +02:00
Claudio Atzori de225c71cd Merge branch 'beta' into removeTaggingCondition 2023-06-12 09:50:40 +02:00
Claudio Atzori e1409ffe80 update sql query to return distinct pids 2023-06-12 09:47:45 +02:00
Claudio Atzori da7b66c542 Merge pull request '[stats wf] Added memory to hive' (#305) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#305
2023-06-08 08:58:48 +02:00
dimitrispie c5f42c7f5b Added memory to hive 2023-06-07 18:18:23 +03:00
Claudio Atzori afb76ebf0f Merge pull request '[stats wf] Bug fix on indicators step' (#304) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#304
2023-06-07 16:49:09 +02:00
dimitrispie fa24e2e18f Bug fix on indicators step
indi_pub_gold_oa table was missing during the creation of other indicators
2023-06-07 17:43:37 +03:00
Claudio Atzori 01c67e697d Merge pull request '[ stats wf] Bug fix' (#303) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#303
2023-06-07 14:41:44 +02:00
dimitrispie 28272c1b0e Bug fix 2023-06-07 15:34:01 +03:00
Alessia Bardi d5be6a13e9 Updated officialnmae of pangaea in hostedbymap for Datacite to avoid duplicate entries in the source filter of the portal 2023-06-06 14:43:32 +02:00
Claudio Atzori 8f651f1225 Merge pull request 'Changes to beta stats wf' (#300) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#300
2023-06-06 11:41:36 +02:00
dimitrispie ad07fbf053 Add names to organizations for collaboration indicators 2023-06-02 14:13:10 +03:00
dimitrispie 2324670714 Split Monitor DBs-Interdisciplinarity indicators
- Split DBs Monitor for faster rendering of visualizations
- Add interdisciplinarity indicators from result_fos
2023-06-02 13:34:16 +03:00
Miriam Baglioni daf4d7971b refactoring 2023-05-31 18:56:58 +02:00
Miriam Baglioni 97d72d41c3 finalization of implementation and testing 2023-05-31 18:53:22 +02:00
Miriam Baglioni 0389b57ca7 added propagation for project to organization 2023-05-31 11:06:58 +02:00
Claudio Atzori e45777e7e1 [aggregator graph] added validation for URLs mapped from oaf:fulltext 2023-05-26 11:33:42 +02:00
dimitrispie ebe586b1d1 Impact indicators/Unpaywall
- Added Impact indicators
- Added unpaywall open access colours
2023-05-26 10:25:28 +03:00
dimitrispie d6102dd576 Update step16-createIndicatorsTables.sql
- Add org names to indi_project_collab_org
- Add indi_pub_bronze_oa
 - Changes to indi_pub_hybrid_oa_with_cc
2023-05-25 14:52:34 +03:00
Miriam Baglioni 9097e71853 Added assertion in test 2023-05-24 16:30:53 +02:00
Miriam Baglioni 9567c13bc3 refactoring 2023-05-24 16:20:05 +02:00
Miriam Baglioni 34172455d1 [BulkTag] Adding remove constraints to specify when a community must not appear in the context of a result. 2023-05-24 09:56:23 +02:00
Ilias Kanellos a1b9187039 Fix syntax error on workflow.xml 2023-05-23 17:17:12 +03:00
Ilias Kanellos 6a7e370a21 Remove unnecessary counts in graph creation 2023-05-23 16:48:58 +03:00
Ilias Kanellos ec4e010687 End after rankings | Create graph debugged 2023-05-23 16:44:04 +03:00
Claudio Atzori a235d2a24a Merge pull request 'Updates to steps related to transfer data to impala cluster' (#295) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#295
2023-05-18 08:46:15 +02:00
dimitrispie 86f4f63daf Updates to steps related to transfer data to impala cluster
1. Remove external table definitions in stats_ext
2. Fix the issue where some views are not created.
3. Added two workflow parameters for copying also the usage stats dbs
2023-05-18 09:33:05 +03:00
Claudio Atzori 909729a2fc [dedup] tweaking num partitions, minor changes 2023-05-17 10:16:22 +02:00
Ilias Kanellos 38020e242a Merge branch '8172_impact_indicators_workflow' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8172_impact_indicators_workflow 2023-05-16 17:34:53 +03:00
Ilias Kanellos 3d69f33c84 Fix selection of columns in graph creation 2023-05-16 17:34:42 +03:00
Ilias Kanellos 3c38f7ba6f Fix selection of columns in graph creation 2023-05-16 17:32:53 +03:00
Serafeim Chatzopoulos 8ef718c363 Fix workflow application path 2023-05-16 16:28:48 +03:00
Serafeim Chatzopoulos 26328e2a0d Move job.properties 2023-05-16 14:39:53 +03:00
Serafeim Chatzopoulos 4eec3e7052 Add jobTracker, nameNode && spark2Lib as global params in oozie wf 2023-05-15 22:28:48 +03:00
Serafeim Chatzopoulos b83135c252 Add missing kill nodes in workflow.xml 2023-05-15 19:55:35 +03:00
Serafeim Chatzopoulos 45f2aa0867 Move end node ... at the end in workflow.xml 2023-05-15 17:52:20 +03:00
Claudio Atzori 8acad52a0c Merge branch 'beta' into apc_affiliation 2023-05-15 15:47:33 +02:00
Claudio Atzori 8a463cc3e8 fixed organization id created when mapping APC affiliations. Factored out ROR constants in dhp-common 2023-05-15 15:44:46 +02:00
Serafeim Chatzopoulos 12a57e1f58 Resolve conflicts 2023-05-15 16:20:11 +03:00
Serafeim Chatzopoulos 82e2a96f51 Resolve conflicts 2023-05-15 15:53:12 +03:00
Serafeim Chatzopoulos b8e8c959fe Update workflow.xml && job.properties 2023-05-15 15:50:23 +03:00
Ilias Kanellos 4a905932a3 Spark properties from job.properties 2023-05-15 15:24:22 +03:00
Claudio Atzori 0c314d5e09 Merge pull request 'Update copyDataToImpalaCluster.sh' (#293) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#293
2023-05-15 12:05:54 +02:00
Serafeim Chatzopoulos 07818131ef Update documentation 2023-05-15 13:04:44 +03:00
dimitrispie b3f9633205 Update copyDataToImpalaCluster.sh
Added option --user to impala-shell command
2023-05-15 12:51:44 +03:00
Miriam Baglioni 78b07400c0 changed test classes 2023-05-15 11:37:08 +02:00
Miriam Baglioni 86fe886c1a removed the inverse of the Citing relation 2023-05-15 11:20:51 +02:00
Ilias Kanellos 1788ac2d4d Correct filtering for MAG records 2023-05-12 12:55:43 +03:00
Miriam Baglioni 12cd179d2d Merge pull request 'Update copyDataToImpalaCluster.sh' (#291) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#291
2023-05-12 11:36:34 +02:00
dimitrispie 00d0d162b6 Update copyDataToImpalaCluster.sh
Added a temporary folder to copy the files to avoid permission issues
2023-05-12 12:31:13 +03:00
Ilias Kanellos 5ddbb4ad10 Spark properties no longer hardcoded 2023-05-11 15:36:47 +03:00
Ilias Kanellos 3de35fd6a3 Produce 5 classes of ranking scores 2023-05-11 14:42:25 +03:00
Miriam Baglioni 8c05f49665 moved the version as it was before the change 2023-05-09 10:48:34 +02:00
Miriam Baglioni 99ac5bab46 added check to avoid NPE when checking the organization country 2023-05-04 19:38:39 +02:00
Claudio Atzori 0704e186f6 Merge pull request 'Stats wf executed on hive only' (#283) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#283
2023-05-02 14:05:12 +02:00
Claudio Atzori d8882c4481 extended mapping applied to datacite records to produce affiliations using the ROR ids. Inc ase of APCs it includes the amount and the currently in the relation 2023-05-02 11:56:51 +02:00
dimitrispie c3d58e58e1 Bug fixes 2023-05-02 11:54:07 +03:00
Claudio Atzori abd7ca0c18 Merge branch 'beta' into bulkTagRefactor 2023-05-02 10:50:01 +02:00
Claudio Atzori 45f625d14f Merge branch 'beta' into organizationToRepresentative 2023-05-02 10:46:55 +02:00
Claudio Atzori de11edca98 Merge branch 'beta' into organizationToRepresentative 2023-05-02 09:59:41 +02:00
Claudio Atzori 851f664bd9 Merge branch 'beta' into graph_cleaning_refactoring 2023-05-02 09:55:40 +02:00
dimitrispie e57ecdaf98 Update step20-createMonitorDB.sql
Add University of Manitoba
2023-04-30 17:52:23 +03:00
Ilias Kanellos 90332439ad Remove deletion of synonym folder 2023-04-28 13:45:19 +03:00
Ilias Kanellos a98da54896 Merge branch '8172_impact_indicators_workflow' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8172_impact_indicators_workflow 2023-04-28 13:23:49 +03:00
Ilias Kanellos 09485fbee3 Fixed unicode bug. Workflow ends after first script 2023-04-28 13:09:13 +03:00
Serafeim Chatzopoulos 614cc1089b Add separate forder for results && project actionsets 2023-04-27 12:37:15 +03:00
Serafeim Chatzopoulos 815a4ddbba Add actionset creation for project bip indicators in workflow 2023-04-26 20:40:06 +03:00
Serafeim Chatzopoulos ee04cf92bf Add actionsets for project impact indicators 2023-04-26 20:23:46 +03:00
dimitrispie fdb5d2b39f Bug fixes 2023-04-23 18:29:00 +03:00
dimitrispie 53ce023035 Bug fixes 2023-04-23 18:23:45 +03:00
dimitrispie 4fa750b719 Bug fixes on monitor-update 2023-04-19 17:39:53 +03:00
dimitrispie 5247cb7115 Bug fix 2023-04-19 11:11:19 +03:00
Miriam Baglioni efc4f6a658 [bulkTag] refactor to enrich each result single step 2023-04-18 17:39:31 +02:00
Serafeim Chatzopoulos 23f58a86f1 Change jar param in project impact indicators action 2023-04-18 12:26:01 +03:00
Miriam Baglioni 697a134504 - 2023-04-18 10:21:12 +02:00
Miriam Baglioni 6cc95c96a2 - 2023-04-18 09:53:11 +02:00
dimitrispie 25dafccc24 Merge branch 'hive' into beta 2023-04-12 11:36:59 +03:00
Claudio Atzori a2dcb06daf added eoscifguidelines in the result view; removed compute statistics statements 2023-04-11 10:43:32 +02:00
Serafeim Chatzopoulos 7256c8d3c7 Add script for aggregating impact indicators at the project level 2023-04-07 16:30:12 +03:00
dimitrispie c85de8fa1f -Added Technological University Dublin
-Added project_organization_contribution table
-Add   Delft University of Technology
2023-04-07 09:22:59 +03:00
dimitrispie 9b41dff33c Update step20-createMonitorDB.sql
Added Delft University of Technology
2023-04-07 09:21:38 +03:00
Miriam Baglioni 932d07d2dd [bulkTag] added filtering for datasources in eosctag 2023-04-06 15:08:27 +02:00
Miriam Baglioni 287753417d better implementation for the fix 2023-04-06 12:22:38 +02:00
Miriam Baglioni b42abc9904 fixed issue on bulktagging for the advanced constraints 2023-04-06 12:15:00 +02:00
dimitrispie 91e18ac7f4 Added project_organization_contribution table 2023-04-06 10:53:11 +03:00
Miriam Baglioni b25b401065 added test to verify the advconstraints to dth community. inserted some additional logs. 2023-04-05 12:18:39 +02:00
Claudio Atzori 864f4051d3 [graph cleaning] added missing case 2023-04-05 11:35:47 +02:00
Claudio Atzori dead87917f [graph cleaning] cleanup 2023-04-04 13:13:43 +02:00
Claudio Atzori 2a6ba29b64 [graph cleaning] unit tests & cleanup 2023-04-04 12:34:51 +02:00
dimitrispie 9e1335df4c -Added Technological University Dublin
-Added project_organization_contribution table
2023-04-04 13:22:40 +03:00
Claudio Atzori 63b8bbc015 [graph to Solr] using dedicated sparkExecutorCores, sparkExecutorMemory, sparkDriverMemory in convert_to_xml 2023-03-24 13:43:20 +01:00
Claudio Atzori b502f86523 fixed input path supplemented to GetDatasourceFromCountry; adjusted the various spark.sql.shuffle.partitions 2023-03-24 13:09:12 +01:00
Claudio Atzori c07857fa37 [graph cleaning] unit tests & cleanup 2023-03-23 15:57:47 +01:00
Claudio Atzori 90e61a8aba [graph cleaning] WIP: refactoring of the cleaning stages, unit tests 2023-03-23 15:03:26 +01:00
Claudio Atzori 308e10d102 serialising: 1. measures for all the entity types and 2. result level fulltext 2023-03-23 11:23:22 +01:00
Claudio Atzori 488d9a5eaa [graph cleaning] WIP: refactoring of the cleaning stages, unit tests 2023-03-23 10:41:13 +01:00
dimitrispie fad7fa4af8 Added Technological University Dublin 2023-03-22 09:44:00 +02:00
Serafeim Chatzopoulos 102aa5ab81 Add dependency to dhp-aggregation 2023-03-21 19:25:29 +02:00
Serafeim Chatzopoulos 3e8a4cf952 Rearrange resources folder structure 2023-03-21 18:25:55 +02:00
Serafeim Chatzopoulos f992ecb657 Checkout BIP-Ranker during 'prepare-package' && add it in the oozie-package.tar.gz 2023-03-21 18:03:55 +02:00
Ilias Kanellos 9dc8f0f05f Add ActionSet step 2023-03-21 16:14:15 +02:00
Claudio Atzori 4f5ba0ed52 [graph cleaning] WIP: refactoring of the cleaning stages, unit tests 2023-03-21 14:41:20 +01:00
Ilias Kanellos b5c252865c Add filtering based on citation source 2023-03-20 15:38:36 +02:00
Claudio Atzori 6d3d18d8b5 [graph cleaning] WIP: refactoring of the cleaning stages 2023-03-16 17:23:36 +01:00
dimitrispie 43b23a9bf3 Update step20-createMonitorDB.sql
Added Technological University Dublin
2023-03-15 09:57:12 +02:00
Serafeim Chatzopoulos 720fd19b39 Add dhp-impact-indicators workflow files 2023-03-14 19:28:27 +02:00
Serafeim Chatzopoulos c6e39b7f33 Add dhp-impact-indicators 2023-03-14 18:50:54 +02:00
Claudio Atzori 518618f1a9 [graph cleaning] avoid to overwrite the subject class to 'keyword' for those with provenance 'subject:fos' 2023-03-14 15:22:47 +01:00
Claudio Atzori 41e00bcd07 [graph provision] avoid to parse again the XML records, apparently the escaped XML characters get unescaped invalidating the record 2023-03-13 15:19:49 +01:00
Claudio Atzori 24e2fd828b code formatting 2023-03-08 21:17:08 +01:00
Claudio Atzori e28d395e87 [aggregator graph] using dedicated path to sync claims, adjusted paths with wildcards 2023-03-08 21:16:52 +01:00
Claudio Atzori 5b8fd37314 [aggregator graph] using dedicated path to sync claims 2023-03-08 15:28:14 +01:00
Claudio Atzori 7fd89566c2 [aggregator graph] handle paths including wildcards 2023-03-08 12:43:00 +01:00
Miriam Baglioni 588aca5ce4 Merge pull request 'h2020classification' (#280) from h2020classification into beta
Reviewed-on: D-Net/dnet-hadoop#280
2023-03-03 09:29:10 +01:00
Claudio Atzori 8ec0d62d91 pre-group the records in each table before joning the contents from BETA and PROD together 2023-03-02 14:49:19 +01:00
Miriam Baglioni 0fff98a14c [ECclassification] removed print 2023-03-02 11:46:57 +01:00
Miriam Baglioni b0c2f7e526 [ECclassification] removed not needed resources 2023-03-02 11:44:48 +01:00
Miriam Baglioni d4fc62c2f6 mergin with branch beta 2023-03-02 11:14:54 +01:00
Miriam Baglioni de8ad1caef [ECclassification] new implementation for the H2020 classification 2023-03-02 11:14:03 +01:00
Claudio Atzori db9dad4aa7 [actionmanager] increased spark.sql.shuffle.partitions for publication, dataset, relation records 2023-03-02 09:11:37 +01:00
Miriam Baglioni c1f9848953 [ECclassification] added new classes 2023-03-01 15:29:11 +01:00
Claudio Atzori 6f488547a7 ignore non processable records 2023-03-01 14:49:51 +01:00
Claudio Atzori 7d263f265e adjusted logs 2023-03-01 11:58:07 +01:00
Claudio Atzori 16ad42e8f3 code formatting 2023-03-01 10:22:13 +01:00
Claudio Atzori 9c59dac859 followup changes reorganising the mdstore synchronisation mechanism 2023-03-01 10:16:20 +01:00
Miriam Baglioni ad745c0aa3 [CrossrefFunderMapping] fixed issueson funder name 2023-02-28 14:58:27 +01:00
Miriam Baglioni 4f2df876cd [ECclassification] new implementation first try 2023-02-28 14:44:00 +01:00
Claudio Atzori 2f7346e9cf WIP monodirectional citations, Datacite 2023-02-28 13:30:51 +01:00
Claudio Atzori 0559d8b412 WIP monodirectional citations 2023-02-28 10:57:32 +01:00
Sandro La Bruzzo 69fa616490 removed wrong content 2023-02-28 10:27:38 +01:00
Sandro La Bruzzo 832a75d012 added mapping for crossref funder 2023-02-28 10:16:34 +01:00
Sandro La Bruzzo 78e51c182a Added missing parametero to raw all workflow 2023-02-28 10:16:01 +01:00
Claudio Atzori 7aebedb43c code formatting 2023-02-27 11:51:27 +01:00
Miriam Baglioni 80987801d7 [FoS] added check for null on level1 subject 2023-02-27 11:40:22 +01:00