Claudio Atzori
b0ebf56367
Merge pull request 'Update step15_5.sql' ( #314 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#314
2023-06-21 10:33:22 +02:00
dimitrispie
2b6370eaee
Update step15_5.sql
...
Bug fix
2023-06-21 11:31:10 +03:00
Claudio Atzori
35e42a86ed
Merge pull request 'Update step15_5.sql' ( #313 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#313
2023-06-21 10:26:16 +02:00
dimitrispie
74cb060bfe
Update step15_5.sql
...
Add "if not exists" clause
2023-06-21 11:24:06 +03:00
Claudio Atzori
85e016df17
Merge pull request 'Update step16-createIndicatorsTables.sql' ( #312 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#312
2023-06-21 09:52:33 +02:00
dimitrispie
a475cfcb7b
Update step16-createIndicatorsTables.sql
...
Rename a field in indi_pub_interdisciplinarity
2023-06-21 10:42:02 +03:00
Claudio Atzori
979cf9cd87
Merge pull request 'Update step15.sql' ( #311 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#311
2023-06-21 09:20:01 +02:00
dimitrispie
4648cd88d4
Update step15.sql
...
Cast score to double
2023-06-21 10:02:19 +03:00
dimitrispie
94d2573c77
Update step15.sql
...
Bug Fix
2023-06-21 09:22:39 +03:00
Claudio Atzori
0561362de2
Merge pull request 'Update step20-createMonitorDB_institutions.sql' ( #309 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#309
2023-06-20 15:07:09 +02:00
Claudio Atzori
50d7dc0078
[graph enrichment] fixed projectOrganizationPath not being passed to the apply_resulttoorganization_propagation node
2023-06-19 15:42:44 +02:00
Claudio Atzori
fbd9bf704e
indent
2023-06-19 15:41:22 +02:00
dimitrispie
be2caedb04
Update step20-createMonitorDB_institutions.sql
...
Add openorgs____::1624ff7c01bb641b91f4518539a0c28a Vrije Universiteit Amsterdam
2023-06-19 12:12:17 +03:00
dimitrispie
36e0a8fec4
Changes to Promotion Stats WF
...
1. Add new cluster host at impala-shell commands
2. Add a step for splitting monitor dbs
3. Update workflow.xml to include the new splitting monitor dbs step
2023-06-19 09:44:34 +03:00
dimitrispie
4c770a5e29
Update finalizeImpalaCluster.sh
...
Drop views in shadow dbs before dropping the db
2023-06-15 13:25:37 +03:00
dimitrispie
e06d962a6a
Update step15.sql
2023-06-15 12:20:35 +03:00
dimitrispie
afcad08396
Update step20-createMonitorDB_institutions.sql
...
Added openorgs____::c0b262bd6eab819e4c994914f9c010e2 -- National Institute of Geophysics and Volcanology
2023-06-15 10:28:49 +03:00
Claudio Atzori
b9748763e2
Merge pull request '[stats wf] Bug fixes' ( #308 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#308
2023-06-14 21:57:03 +02:00
dimitrispie
42b8ce2ba4
Update copyDataToImpalaCluster.sh
2023-06-14 19:23:42 +03:00
dimitrispie
2032b0df40
Bug fixes
...
1. Remove tables/views from old databases in the new cluster, before dropping the dbs
2. Fix id in result_accessroute, indi_impact_measures, indi_pub_bronze_oa
2023-06-14 19:09:09 +03:00
Claudio Atzori
b76a47b103
[aggregator graph] added column alias when mapping organization PIDs from the OpenOrgs database
2023-06-13 11:38:10 +02:00
Claudio Atzori
ad04f14b81
Merge branch 'beta' into distinct_pids_from_openorgs_beta
2023-06-12 09:58:21 +02:00
Claudio Atzori
55f002f1e9
Merge branch 'beta' into propagationProjectThroughParentChils
2023-06-12 09:56:53 +02:00
Claudio Atzori
4b00a76271
Merge branch 'beta' into fulltext_url_validation
2023-06-12 09:55:25 +02:00
Claudio Atzori
de225c71cd
Merge branch 'beta' into removeTaggingCondition
2023-06-12 09:50:40 +02:00
Claudio Atzori
e1409ffe80
update sql query to return distinct pids
2023-06-12 09:47:45 +02:00
Claudio Atzori
da7b66c542
Merge pull request '[stats wf] Added memory to hive' ( #305 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#305
2023-06-08 08:58:48 +02:00
dimitrispie
c5f42c7f5b
Added memory to hive
2023-06-07 18:18:23 +03:00
Claudio Atzori
afb76ebf0f
Merge pull request '[stats wf] Bug fix on indicators step' ( #304 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#304
2023-06-07 16:49:09 +02:00
dimitrispie
fa24e2e18f
Bug fix on indicators step
...
indi_pub_gold_oa table was missing during the creation of other indicators
2023-06-07 17:43:37 +03:00
Claudio Atzori
01c67e697d
Merge pull request '[ stats wf] Bug fix' ( #303 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#303
2023-06-07 14:41:44 +02:00
dimitrispie
28272c1b0e
Bug fix
2023-06-07 15:34:01 +03:00
Alessia Bardi
d5be6a13e9
Updated officialname of pangaea in hostedbymap for Datacite to avoid duplicate entries in the source filter of the portal
2023-06-06 14:43:32 +02:00
Claudio Atzori
8f651f1225
Merge pull request 'Changes to beta stats wf' ( #300 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#300
2023-06-06 11:41:36 +02:00
dimitrispie
ad07fbf053
Add names to organizations for collaboration indicators
2023-06-02 14:13:10 +03:00
dimitrispie
2324670714
Split Monitor DBs-Interdisciplinarity indicators
...
- Split DBs Monitor for faster rendering of visualizations
- Add interdisciplinarity indicators from result_fos
2023-06-02 13:34:16 +03:00
Miriam Baglioni
daf4d7971b
refactoring
2023-05-31 18:56:58 +02:00
Miriam Baglioni
97d72d41c3
finalization of implementation and testing
2023-05-31 18:53:22 +02:00
Miriam Baglioni
0389b57ca7
added propagation for project to organization
2023-05-31 11:06:58 +02:00
Claudio Atzori
e45777e7e1
[aggregator graph] added validation for URLs mapped from oaf:fulltext
2023-05-26 11:33:42 +02:00
dimitrispie
ebe586b1d1
Impact indicators/Unpaywall
...
- Added Impact indicators
- Added unpaywall open access colours
2023-05-26 10:25:28 +03:00
dimitrispie
d6102dd576
Update step16-createIndicatorsTables.sql
...
- Add org names to indi_project_collab_org
- Add indi_pub_bronze_oa
- Changes to indi_pub_hybrid_oa_with_cc
2023-05-25 14:52:34 +03:00
Miriam Baglioni
9097e71853
Added assertion in test
2023-05-24 16:30:53 +02:00
Miriam Baglioni
9567c13bc3
refactoring
2023-05-24 16:20:05 +02:00
Miriam Baglioni
34172455d1
[BulkTag] Adding remove constraints to specify when a community must not appear in the context of a result.
2023-05-24 09:56:23 +02:00
Ilias Kanellos
a1b9187039
Fix syntax error on workflow.xml
2023-05-23 17:17:12 +03:00
Ilias Kanellos
6a7e370a21
Remove unnecessary counts in graph creation
2023-05-23 16:48:58 +03:00
Ilias Kanellos
ec4e010687
End after rankings | Create graph debugged
2023-05-23 16:44:04 +03:00
Claudio Atzori
a235d2a24a
Merge pull request 'Updates to steps related to transfer data to impala cluster' ( #295 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#295
2023-05-18 08:46:15 +02:00
dimitrispie
86f4f63daf
Updates to steps related to transfer data to impala cluster
...
1. Remove external table definitions in stats_ext
2. Fix the issue where some views are not created.
3. Added two workflow parameters for copying also the usage stats dbs
2023-05-18 09:33:05 +03:00
Claudio Atzori
909729a2fc
[dedup] tweaking num partitions, minor changes
2023-05-17 10:16:22 +02:00
Ilias Kanellos
38020e242a
Merge branch '8172_impact_indicators_workflow' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8172_impact_indicators_workflow
2023-05-16 17:34:53 +03:00
Ilias Kanellos
3d69f33c84
Fix selection of columns in graph creation
2023-05-16 17:34:42 +03:00
Ilias Kanellos
3c38f7ba6f
Fix selection of columns in graph creation
2023-05-16 17:32:53 +03:00
Serafeim Chatzopoulos
8ef718c363
Fix workflow application path
2023-05-16 16:28:48 +03:00
Serafeim Chatzopoulos
26328e2a0d
Move job.properties
2023-05-16 14:39:53 +03:00
Serafeim Chatzopoulos
4eec3e7052
Add jobTracker, nameNode && spark2Lib as global params in oozie wf
2023-05-15 22:28:48 +03:00
Serafeim Chatzopoulos
b83135c252
Add missing kill nodes in workflow.xml
2023-05-15 19:55:35 +03:00
Serafeim Chatzopoulos
45f2aa0867
Move end node ... at the end in workflow.xml
2023-05-15 17:52:20 +03:00
Claudio Atzori
8acad52a0c
Merge branch 'beta' into apc_affiliation
2023-05-15 15:47:33 +02:00
Claudio Atzori
8a463cc3e8
fixed organization id created when mapping APC affiliations. Factored out ROR constants in dhp-common
2023-05-15 15:44:46 +02:00
Serafeim Chatzopoulos
12a57e1f58
Resolve conflicts
2023-05-15 16:20:11 +03:00
Serafeim Chatzopoulos
82e2a96f51
Resolve conflicts
2023-05-15 15:53:12 +03:00
Serafeim Chatzopoulos
b8e8c959fe
Update workflow.xml && job.properties
2023-05-15 15:50:23 +03:00
Ilias Kanellos
4a905932a3
Spark properties from job.properties
2023-05-15 15:24:22 +03:00
Claudio Atzori
0c314d5e09
Merge pull request 'Update copyDataToImpalaCluster.sh' ( #293 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#293
2023-05-15 12:05:54 +02:00
Serafeim Chatzopoulos
07818131ef
Update documentation
2023-05-15 13:04:44 +03:00
dimitrispie
b3f9633205
Update copyDataToImpalaCluster.sh
...
Added option --user to impala-shell command
2023-05-15 12:51:44 +03:00
Miriam Baglioni
78b07400c0
changed test classes
2023-05-15 11:37:08 +02:00
Miriam Baglioni
86fe886c1a
removed the inverse of the Citing relation
2023-05-15 11:20:51 +02:00
Ilias Kanellos
1788ac2d4d
Correct filtering for MAG records
2023-05-12 12:55:43 +03:00
Miriam Baglioni
12cd179d2d
Merge pull request 'Update copyDataToImpalaCluster.sh' ( #291 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#291
2023-05-12 11:36:34 +02:00
dimitrispie
00d0d162b6
Update copyDataToImpalaCluster.sh
...
Added a temporary folder to copy the files to avoid permission issues
2023-05-12 12:31:13 +03:00
Ilias Kanellos
5ddbb4ad10
Spark properties no longer hardcoded
2023-05-11 15:36:47 +03:00
Ilias Kanellos
3de35fd6a3
Produce 5 classes of ranking scores
2023-05-11 14:42:25 +03:00
Miriam Baglioni
8c05f49665
moved the version as it was before the change
2023-05-09 10:48:34 +02:00
Miriam Baglioni
99ac5bab46
added check to avoid NPE when checking the organization country
2023-05-04 19:38:39 +02:00
Claudio Atzori
0704e186f6
Merge pull request 'Stats wf executed on hive only' ( #283 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#283
2023-05-02 14:05:12 +02:00
Claudio Atzori
d8882c4481
extended mapping applied to datacite records to produce affiliations using the ROR ids. In case of APCs it includes the amount and the currency in the relation
2023-05-02 11:56:51 +02:00
dimitrispie
c3d58e58e1
Bug fixes
2023-05-02 11:54:07 +03:00
Claudio Atzori
abd7ca0c18
Merge branch 'beta' into bulkTagRefactor
2023-05-02 10:50:01 +02:00
Claudio Atzori
45f625d14f
Merge branch 'beta' into organizationToRepresentative
2023-05-02 10:46:55 +02:00
Claudio Atzori
de11edca98
Merge branch 'beta' into organizationToRepresentative
2023-05-02 09:59:41 +02:00
Claudio Atzori
851f664bd9
Merge branch 'beta' into graph_cleaning_refactoring
2023-05-02 09:55:40 +02:00
dimitrispie
e57ecdaf98
Update step20-createMonitorDB.sql
...
Add University of Manitoba
2023-04-30 17:52:23 +03:00
Ilias Kanellos
90332439ad
Remove deletion of synonym folder
2023-04-28 13:45:19 +03:00
Ilias Kanellos
a98da54896
Merge branch '8172_impact_indicators_workflow' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8172_impact_indicators_workflow
2023-04-28 13:23:49 +03:00
Ilias Kanellos
09485fbee3
Fixed unicode bug. Workflow ends after first script
2023-04-28 13:09:13 +03:00
Serafeim Chatzopoulos
614cc1089b
Add separate folder for results && project actionsets
2023-04-27 12:37:15 +03:00
Serafeim Chatzopoulos
815a4ddbba
Add actionset creation for project bip indicators in workflow
2023-04-26 20:40:06 +03:00
Serafeim Chatzopoulos
ee04cf92bf
Add actionsets for project impact indicators
2023-04-26 20:23:46 +03:00
dimitrispie
fdb5d2b39f
Bug fixes
2023-04-23 18:29:00 +03:00
dimitrispie
53ce023035
Bug fixes
2023-04-23 18:23:45 +03:00
dimitrispie
4fa750b719
Bug fixes on monitor-update
2023-04-19 17:39:53 +03:00
dimitrispie
5247cb7115
Bug fix
2023-04-19 11:11:19 +03:00
Miriam Baglioni
efc4f6a658
[bulkTag] refactor to enrich each result in a single step
2023-04-18 17:39:31 +02:00
Serafeim Chatzopoulos
23f58a86f1
Change jar param in project impact indicators action
2023-04-18 12:26:01 +03:00
Miriam Baglioni
697a134504
-
2023-04-18 10:21:12 +02:00
Miriam Baglioni
6cc95c96a2
-
2023-04-18 09:53:11 +02:00
dimitrispie
25dafccc24
Merge branch 'hive' into beta
2023-04-12 11:36:59 +03:00
Claudio Atzori
a2dcb06daf
added eoscifguidelines in the result view; removed compute statistics statements
2023-04-11 10:43:32 +02:00
Serafeim Chatzopoulos
7256c8d3c7
Add script for aggregating impact indicators at the project level
2023-04-07 16:30:12 +03:00
dimitrispie
c85de8fa1f
-Added Technological University Dublin
...
-Added project_organization_contribution table
-Add Delft University of Technology
2023-04-07 09:22:59 +03:00
dimitrispie
9b41dff33c
Update step20-createMonitorDB.sql
...
Added Delft University of Technology
2023-04-07 09:21:38 +03:00
Miriam Baglioni
932d07d2dd
[bulkTag] added filtering for datasources in eosctag
2023-04-06 15:08:27 +02:00
Miriam Baglioni
287753417d
better implementation for the fix
2023-04-06 12:22:38 +02:00
Miriam Baglioni
b42abc9904
fixed issue on bulktagging for the advanced constraints
2023-04-06 12:15:00 +02:00
dimitrispie
91e18ac7f4
Added project_organization_contribution table
2023-04-06 10:53:11 +03:00
Miriam Baglioni
b25b401065
added test to verify the advconstraints to dth community. inserted some additional logs.
2023-04-05 12:18:39 +02:00
Claudio Atzori
864f4051d3
[graph cleaning] added missing case
2023-04-05 11:35:47 +02:00
Claudio Atzori
dead87917f
[graph cleaning] cleanup
2023-04-04 13:13:43 +02:00
Claudio Atzori
2a6ba29b64
[graph cleaning] unit tests & cleanup
2023-04-04 12:34:51 +02:00
dimitrispie
9e1335df4c
-Added Technological University Dublin
...
-Added project_organization_contribution table
2023-04-04 13:22:40 +03:00
Claudio Atzori
63b8bbc015
[graph to Solr] using dedicated sparkExecutorCores, sparkExecutorMemory, sparkDriverMemory in convert_to_xml
2023-03-24 13:43:20 +01:00
Claudio Atzori
b502f86523
fixed input path supplied to GetDatasourceFromCountry; adjusted the various spark.sql.shuffle.partitions
2023-03-24 13:09:12 +01:00
Claudio Atzori
c07857fa37
[graph cleaning] unit tests & cleanup
2023-03-23 15:57:47 +01:00
Claudio Atzori
90e61a8aba
[graph cleaning] WIP: refactoring of the cleaning stages, unit tests
2023-03-23 15:03:26 +01:00
Claudio Atzori
308e10d102
serialising: 1. measures for all the entity types and 2. result level fulltext
2023-03-23 11:23:22 +01:00
Claudio Atzori
488d9a5eaa
[graph cleaning] WIP: refactoring of the cleaning stages, unit tests
2023-03-23 10:41:13 +01:00
dimitrispie
fad7fa4af8
Added Technological University Dublin
2023-03-22 09:44:00 +02:00
Serafeim Chatzopoulos
102aa5ab81
Add dependency to dhp-aggregation
2023-03-21 19:25:29 +02:00
Serafeim Chatzopoulos
3e8a4cf952
Rearrange resources folder structure
2023-03-21 18:25:55 +02:00
Serafeim Chatzopoulos
f992ecb657
Checkout BIP-Ranker during 'prepare-package' && add it in the oozie-package.tar.gz
2023-03-21 18:03:55 +02:00
Ilias Kanellos
9dc8f0f05f
Add ActionSet step
2023-03-21 16:14:15 +02:00
Claudio Atzori
4f5ba0ed52
[graph cleaning] WIP: refactoring of the cleaning stages, unit tests
2023-03-21 14:41:20 +01:00
Ilias Kanellos
b5c252865c
Add filtering based on citation source
2023-03-20 15:38:36 +02:00
Claudio Atzori
6d3d18d8b5
[graph cleaning] WIP: refactoring of the cleaning stages
2023-03-16 17:23:36 +01:00
dimitrispie
43b23a9bf3
Update step20-createMonitorDB.sql
...
Added Technological University Dublin
2023-03-15 09:57:12 +02:00
Serafeim Chatzopoulos
720fd19b39
Add dhp-impact-indicators workflow files
2023-03-14 19:28:27 +02:00
Serafeim Chatzopoulos
c6e39b7f33
Add dhp-impact-indicators
2023-03-14 18:50:54 +02:00
Claudio Atzori
518618f1a9
[graph cleaning] avoid to overwrite the subject class to 'keyword' for those with provenance 'subject:fos'
2023-03-14 15:22:47 +01:00
Claudio Atzori
41e00bcd07
[graph provision] avoid to parse again the XML records, apparently the escaped XML characters get unescaped invalidating the record
2023-03-13 15:19:49 +01:00
Claudio Atzori
24e2fd828b
code formatting
2023-03-08 21:17:08 +01:00
Claudio Atzori
e28d395e87
[aggregator graph] using dedicated path to sync claims, adjusted paths with wildcards
2023-03-08 21:16:52 +01:00
Claudio Atzori
5b8fd37314
[aggregator graph] using dedicated path to sync claims
2023-03-08 15:28:14 +01:00
Claudio Atzori
7fd89566c2
[aggregator graph] handle paths including wildcards
2023-03-08 12:43:00 +01:00
Miriam Baglioni
588aca5ce4
Merge pull request 'h2020classification' ( #280 ) from h2020classification into beta
...
Reviewed-on: D-Net/dnet-hadoop#280
2023-03-03 09:29:10 +01:00
Claudio Atzori
8ec0d62d91
pre-group the records in each table before joining the contents from BETA and PROD together
2023-03-02 14:49:19 +01:00
Miriam Baglioni
0fff98a14c
[ECclassification] removed print
2023-03-02 11:46:57 +01:00
Miriam Baglioni
b0c2f7e526
[ECclassification] removed not needed resources
2023-03-02 11:44:48 +01:00
Miriam Baglioni
d4fc62c2f6
merging with branch beta
2023-03-02 11:14:54 +01:00
Miriam Baglioni
de8ad1caef
[ECclassification] new implementation for the H2020 classification
2023-03-02 11:14:03 +01:00
Claudio Atzori
db9dad4aa7
[actionmanager] increased spark.sql.shuffle.partitions for publication, dataset, relation records
2023-03-02 09:11:37 +01:00
Miriam Baglioni
c1f9848953
[ECclassification] added new classes
2023-03-01 15:29:11 +01:00
Claudio Atzori
6f488547a7
ignore non processable records
2023-03-01 14:49:51 +01:00
Claudio Atzori
7d263f265e
adjusted logs
2023-03-01 11:58:07 +01:00
Claudio Atzori
16ad42e8f3
code formatting
2023-03-01 10:22:13 +01:00
Claudio Atzori
9c59dac859
followup changes reorganising the mdstore synchronisation mechanism
2023-03-01 10:16:20 +01:00
Miriam Baglioni
ad745c0aa3
[CrossrefFunderMapping] fixed issues on funder name
2023-02-28 14:58:27 +01:00
Miriam Baglioni
4f2df876cd
[ECclassification] new implementation first try
2023-02-28 14:44:00 +01:00