dimitrispie
86f4f63daf
Updates to steps related to transferring data to the impala cluster
1. Remove external table definitions in stats_ext
2. Fix the issue where some views are not created.
3. Added two workflow parameters for copying also the usage stats dbs
2023-05-18 09:33:05 +03:00
Claudio Atzori
909729a2fc
[dedup] tweaking num partitions, minor changes
2023-05-17 10:16:22 +02:00
Ilias Kanellos
38020e242a
Merge branch '8172_impact_indicators_workflow' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8172_impact_indicators_workflow
2023-05-16 17:34:53 +03:00
Ilias Kanellos
3d69f33c84
Fix selection of columns in graph creation
2023-05-16 17:34:42 +03:00
Ilias Kanellos
3c38f7ba6f
Fix selection of columns in graph creation
2023-05-16 17:32:53 +03:00
Serafeim Chatzopoulos
8ef718c363
Fix workflow application path
2023-05-16 16:28:48 +03:00
Serafeim Chatzopoulos
26328e2a0d
Move job.properties
2023-05-16 14:39:53 +03:00
Serafeim Chatzopoulos
4eec3e7052
Add jobTracker, nameNode && spark2Lib as global params in oozie wf
2023-05-15 22:28:48 +03:00
Serafeim Chatzopoulos
b83135c252
Add missing kill nodes in workflow.xml
2023-05-15 19:55:35 +03:00
Serafeim Chatzopoulos
45f2aa0867
Move end node ... at the end in workflow.xml
2023-05-15 17:52:20 +03:00
Claudio Atzori
8acad52a0c
Merge branch 'beta' into apc_affiliation
2023-05-15 15:47:33 +02:00
Claudio Atzori
8a463cc3e8
fixed organization id created when mapping APC affiliations. Factored out ROR constants in dhp-common
2023-05-15 15:44:46 +02:00
Serafeim Chatzopoulos
12a57e1f58
Resolve conflicts
2023-05-15 16:20:11 +03:00
Serafeim Chatzopoulos
82e2a96f51
Resolve conflicts
2023-05-15 15:53:12 +03:00
Serafeim Chatzopoulos
b8e8c959fe
Update workflow.xml && job.properties
2023-05-15 15:50:23 +03:00
Ilias Kanellos
4a905932a3
Spark properties from job.properties
2023-05-15 15:24:22 +03:00
Claudio Atzori
0c314d5e09
Merge pull request 'Update copyDataToImpalaCluster.sh' ( #293 ) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #293
2023-05-15 12:05:54 +02:00
Serafeim Chatzopoulos
07818131ef
Update documentation
2023-05-15 13:04:44 +03:00
dimitrispie
b3f9633205
Update copyDataToImpalaCluster.sh
Added option --user to impala-shell command
2023-05-15 12:51:44 +03:00
Miriam Baglioni
78b07400c0
changed test classes
2023-05-15 11:37:08 +02:00
Miriam Baglioni
86fe886c1a
removed the inverse of the Citing relation
2023-05-15 11:20:51 +02:00
Ilias Kanellos
1788ac2d4d
Correct filtering for MAG records
2023-05-12 12:55:43 +03:00
Miriam Baglioni
12cd179d2d
Merge pull request 'Update copyDataToImpalaCluster.sh' ( #291 ) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #291
2023-05-12 11:36:34 +02:00
dimitrispie
00d0d162b6
Update copyDataToImpalaCluster.sh
Added a temporary folder to copy the files to avoid permission issues
2023-05-12 12:31:13 +03:00
Ilias Kanellos
5ddbb4ad10
Spark properties no longer hardcoded
2023-05-11 15:36:47 +03:00
Ilias Kanellos
3de35fd6a3
Produce 5 classes of ranking scores
2023-05-11 14:42:25 +03:00
Miriam Baglioni
8c05f49665
moved the version as it was before the change
2023-05-09 10:48:34 +02:00
Miriam Baglioni
99ac5bab46
added check to avoid NPE when checking the organization country
2023-05-04 19:38:39 +02:00
Claudio Atzori
0704e186f6
Merge pull request 'Stats wf executed on hive only' ( #283 ) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #283
2023-05-02 14:05:12 +02:00
Claudio Atzori
d8882c4481
extended mapping applied to datacite records to produce affiliations using the ROR ids. In case of APCs it includes the amount and the currency in the relation
2023-05-02 11:56:51 +02:00
dimitrispie
c3d58e58e1
Bug fixes
2023-05-02 11:54:07 +03:00
Claudio Atzori
abd7ca0c18
Merge branch 'beta' into bulkTagRefactor
2023-05-02 10:50:01 +02:00
Claudio Atzori
45f625d14f
Merge branch 'beta' into organizationToRepresentative
2023-05-02 10:46:55 +02:00
Claudio Atzori
de11edca98
Merge branch 'beta' into organizationToRepresentative
2023-05-02 09:59:41 +02:00
Claudio Atzori
851f664bd9
Merge branch 'beta' into graph_cleaning_refactoring
2023-05-02 09:55:40 +02:00
dimitrispie
e57ecdaf98
Update step20-createMonitorDB.sql
Add University of Manitoba
2023-04-30 17:52:23 +03:00
Ilias Kanellos
90332439ad
Remove deletion of synonym folder
2023-04-28 13:45:19 +03:00
Ilias Kanellos
a98da54896
Merge branch '8172_impact_indicators_workflow' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8172_impact_indicators_workflow
2023-04-28 13:23:49 +03:00
Ilias Kanellos
09485fbee3
Fixed unicode bug. Workflow ends after first script
2023-04-28 13:09:13 +03:00
Serafeim Chatzopoulos
614cc1089b
Add separate folder for results && project actionsets
2023-04-27 12:37:15 +03:00
Serafeim Chatzopoulos
815a4ddbba
Add actionset creation for project bip indicators in workflow
2023-04-26 20:40:06 +03:00
Serafeim Chatzopoulos
ee04cf92bf
Add actionsets for project impact indicators
2023-04-26 20:23:46 +03:00
dimitrispie
fdb5d2b39f
Bug fixes
2023-04-23 18:29:00 +03:00
dimitrispie
53ce023035
Bug fixes
2023-04-23 18:23:45 +03:00
dimitrispie
4fa750b719
Bug fixes on monitor-update
2023-04-19 17:39:53 +03:00
dimitrispie
5247cb7115
Bug fix
2023-04-19 11:11:19 +03:00
Miriam Baglioni
efc4f6a658
[bulkTag] refactor to enrich each result single step
2023-04-18 17:39:31 +02:00
Serafeim Chatzopoulos
23f58a86f1
Change jar param in project impact indicators action
2023-04-18 12:26:01 +03:00
Miriam Baglioni
697a134504
-
2023-04-18 10:21:12 +02:00
Miriam Baglioni
6cc95c96a2
-
2023-04-18 09:53:11 +02:00
dimitrispie
25dafccc24
Merge branch 'hive' into beta
2023-04-12 11:36:59 +03:00
Claudio Atzori
a2dcb06daf
added eoscifguidelines in the result view; removed compute statistics statements
2023-04-11 10:43:32 +02:00
Serafeim Chatzopoulos
7256c8d3c7
Add script for aggregating impact indicators at the project level
2023-04-07 16:30:12 +03:00
dimitrispie
c85de8fa1f
-Added Technological University Dublin
-Added project_organization_contribution table
-Add Delft University of Technology
2023-04-07 09:22:59 +03:00
dimitrispie
9b41dff33c
Update step20-createMonitorDB.sql
Added Delft University of Technology
2023-04-07 09:21:38 +03:00
Miriam Baglioni
932d07d2dd
[bulkTag] added filtering for datasources in eosctag
2023-04-06 15:08:27 +02:00
Miriam Baglioni
287753417d
better implementation for the fix
2023-04-06 12:22:38 +02:00
Miriam Baglioni
b42abc9904
fixed issue on bulktagging for the advanced constraints
2023-04-06 12:15:00 +02:00
dimitrispie
91e18ac7f4
Added project_organization_contribution table
2023-04-06 10:53:11 +03:00
Miriam Baglioni
b25b401065
added test to verify the advconstraints to dth community. inserted some additional logs.
2023-04-05 12:18:39 +02:00
Claudio Atzori
864f4051d3
[graph cleaning] added missing case
2023-04-05 11:35:47 +02:00
Claudio Atzori
dead87917f
[graph cleaning] cleanup
2023-04-04 13:13:43 +02:00
Claudio Atzori
2a6ba29b64
[graph cleaning] unit tests & cleanup
2023-04-04 12:34:51 +02:00
dimitrispie
9e1335df4c
-Added Technological University Dublin
-Added project_organization_contribution table
2023-04-04 13:22:40 +03:00
Claudio Atzori
63b8bbc015
[graph to Solr] using dedicated sparkExecutorCores, sparkExecutorMemory, sparkDriverMemory in convert_to_xml
2023-03-24 13:43:20 +01:00
Claudio Atzori
b502f86523
fixed input path supplemented to GetDatasourceFromCountry; adjusted the various spark.sql.shuffle.partitions
2023-03-24 13:09:12 +01:00
Claudio Atzori
c07857fa37
[graph cleaning] unit tests & cleanup
2023-03-23 15:57:47 +01:00
Claudio Atzori
90e61a8aba
[graph cleaning] WIP: refactoring of the cleaning stages, unit tests
2023-03-23 15:03:26 +01:00
Claudio Atzori
308e10d102
serialising: 1. measures for all the entity types and 2. result level fulltext
2023-03-23 11:23:22 +01:00
Claudio Atzori
488d9a5eaa
[graph cleaning] WIP: refactoring of the cleaning stages, unit tests
2023-03-23 10:41:13 +01:00
dimitrispie
fad7fa4af8
Added Technological University Dublin
2023-03-22 09:44:00 +02:00
Serafeim Chatzopoulos
102aa5ab81
Add dependency to dhp-aggregation
2023-03-21 19:25:29 +02:00
Serafeim Chatzopoulos
3e8a4cf952
Rearrange resources folder structure
2023-03-21 18:25:55 +02:00
Serafeim Chatzopoulos
f992ecb657
Checkout BIP-Ranker during 'prepare-package' && add it in the oozie-package.tar.gz
2023-03-21 18:03:55 +02:00
Ilias Kanellos
9dc8f0f05f
Add ActionSet step
2023-03-21 16:14:15 +02:00
Claudio Atzori
4f5ba0ed52
[graph cleaning] WIP: refactoring of the cleaning stages, unit tests
2023-03-21 14:41:20 +01:00
Ilias Kanellos
b5c252865c
Add filtering based on citation source
2023-03-20 15:38:36 +02:00
Claudio Atzori
6d3d18d8b5
[graph cleaning] WIP: refactoring of the cleaning stages
2023-03-16 17:23:36 +01:00
dimitrispie
43b23a9bf3
Update step20-createMonitorDB.sql
Added Technological University Dublin
2023-03-15 09:57:12 +02:00
Serafeim Chatzopoulos
720fd19b39
Add dhp-impact-indicators workflow files
2023-03-14 19:28:27 +02:00
Serafeim Chatzopoulos
c6e39b7f33
Add dhp-impact-indicators
2023-03-14 18:50:54 +02:00
Claudio Atzori
518618f1a9
[graph cleaning] avoid overwriting the subject class to 'keyword' for those with provenance 'subject:fos'
2023-03-14 15:22:47 +01:00
Claudio Atzori
41e00bcd07
[graph provision] avoid parsing the XML records again; apparently the escaped XML characters get unescaped, invalidating the record
2023-03-13 15:19:49 +01:00
Claudio Atzori
24e2fd828b
code formatting
2023-03-08 21:17:08 +01:00
Claudio Atzori
e28d395e87
[aggregator graph] using dedicated path to sync claims, adjusted paths with wildcards
2023-03-08 21:16:52 +01:00
Claudio Atzori
5b8fd37314
[aggregator graph] using dedicated path to sync claims
2023-03-08 15:28:14 +01:00
Claudio Atzori
7fd89566c2
[aggregator graph] handle paths including wildcards
2023-03-08 12:43:00 +01:00
Miriam Baglioni
588aca5ce4
Merge pull request 'h2020classification' ( #280 ) from h2020classification into beta
Reviewed-on: #280
2023-03-03 09:29:10 +01:00
Claudio Atzori
8ec0d62d91
pre-group the records in each table before joining the contents from BETA and PROD together
2023-03-02 14:49:19 +01:00
Miriam Baglioni
0fff98a14c
[ECclassification] removed print
2023-03-02 11:46:57 +01:00
Miriam Baglioni
b0c2f7e526
[ECclassification] removed not needed resources
2023-03-02 11:44:48 +01:00
Miriam Baglioni
d4fc62c2f6
merging with branch beta
2023-03-02 11:14:54 +01:00
Miriam Baglioni
de8ad1caef
[ECclassification] new implementation for the H2020 classification
2023-03-02 11:14:03 +01:00
Claudio Atzori
db9dad4aa7
[actionmanager] increased spark.sql.shuffle.partitions for publication, dataset, relation records
2023-03-02 09:11:37 +01:00
Miriam Baglioni
c1f9848953
[ECclassification] added new classes
2023-03-01 15:29:11 +01:00
Claudio Atzori
6f488547a7
ignore non processable records
2023-03-01 14:49:51 +01:00
Claudio Atzori
7d263f265e
adjusted logs
2023-03-01 11:58:07 +01:00
Claudio Atzori
16ad42e8f3
code formatting
2023-03-01 10:22:13 +01:00
Claudio Atzori
9c59dac859
followup changes reorganising the mdstore synchronisation mechanism
2023-03-01 10:16:20 +01:00
Miriam Baglioni
ad745c0aa3
[CrossrefFunderMapping] fixed issues on funder name
2023-02-28 14:58:27 +01:00
Miriam Baglioni
4f2df876cd
[ECclassification] new implementation first try
2023-02-28 14:44:00 +01:00
Claudio Atzori
2f7346e9cf
WIP monodirectional citations, Datacite
2023-02-28 13:30:51 +01:00
Claudio Atzori
0559d8b412
WIP monodirectional citations
2023-02-28 10:57:32 +01:00
Sandro La Bruzzo
69fa616490
removed wrong content
2023-02-28 10:27:38 +01:00
Sandro La Bruzzo
832a75d012
added mapping for crossref funder
2023-02-28 10:16:34 +01:00
Sandro La Bruzzo
78e51c182a
Added missing parameter to raw all workflow
2023-02-28 10:16:01 +01:00
Claudio Atzori
7aebedb43c
code formatting
2023-02-27 11:51:27 +01:00
Miriam Baglioni
80987801d7
[FoS] added check for null on level1 subject
2023-02-27 11:40:22 +01:00
Claudio Atzori
31e97c2a6b
[unresolved entities] updated oozie wf node labels
2023-02-27 11:38:29 +01:00
Miriam Baglioni
23112929e9
[FoS] changed the default separator from comma to tab to solve the issue in subject value split
2023-02-27 10:18:39 +01:00
Serafeim Chatzopoulos
0b5bf53b45
Remove unnecessary indexed fields from Solr
2023-02-23 12:42:42 +02:00
dimitrispie
1547611246
Merge branch 'beta' into hive
2023-02-22 16:57:12 +02:00
Michele Artini
fddcf701e9
updated the order of the compatibilities
2023-02-22 12:07:09 +01:00
Claudio Atzori
0c1be41b30
code formatting
2023-02-22 10:15:25 +01:00
Claudio Atzori
99cd7761aa
cleanup of non necessary dhp-monitor-update workflow
2023-02-22 10:10:22 +01:00
Claudio Atzori
cd3a51a15f
Merge branch 'beta' into 8232-mdstore-synch-improve
2023-02-22 09:57:07 +01:00
Claudio Atzori
477a7c416f
Merge branch 'beta' into UsageCountOnProjectAndDatasource
2023-02-22 09:55:51 +01:00
Claudio Atzori
c20c1c9159
Merge pull request 'Added 4 institutions:' ( #261 ) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #261
2023-02-22 09:53:45 +01:00
Miriam Baglioni
d617c3e812
[DOIBoost] extended mapping for funder #8407
2023-02-20 14:45:27 +01:00
dimitrispie
90807b60c7
Changes to monitor wf
2023-02-20 10:42:24 +02:00
dimitrispie
d2f9ccf934
Changes to separate monitor wf
2023-02-20 10:41:21 +02:00
dimitrispie
032a401cbf
Bug fixes
2023-02-20 09:29:20 +02:00
Miriam Baglioni
016337a0f9
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2023-02-16 15:54:59 +01:00
Sandro La Bruzzo
118c1fc3b3
Merge remote-tracking branch 'origin/beta' into beta
2023-02-15 10:29:28 +01:00
Sandro La Bruzzo
a8ac79fa25
Added citation relation on crossref Mapping
2023-02-15 10:29:13 +01:00
dimitrispie
595192d510
Bug fix
2023-02-14 16:24:08 +02:00
dimitrispie
f3aaff3688
Remove duplicate orgs
2023-02-14 09:48:36 +02:00
Claudio Atzori
9a03f71db1
code formatting
2023-02-13 16:25:47 +01:00
Michele Artini
554df257ab
null values in date range conditions
2023-02-13 16:15:32 +01:00
dimitrispie
3400133c2f
Bug fix
2023-02-13 09:44:00 +02:00
dimitrispie
935db0ab25
Added organizations for Monitor
2023-02-13 09:29:09 +02:00
dimitrispie
7b78b15c81
Changes for copying to Impala Cluster
2023-02-13 09:27:00 +02:00
Miriam Baglioni
5cf902a2b0
[UsageCount] changed query to make the sum be computed via sql instead of grouping
2023-02-10 16:16:37 +01:00
Miriam Baglioni
f803530df6
[UsageCount] fixed query
2023-02-10 15:50:56 +01:00
Miriam Baglioni
bb5bba51b3
[UsageCount] extended test
2023-02-09 19:08:30 +01:00
Miriam Baglioni
85e53fad00
[UsageCount] addition of usagecount for Projects and datasources. Extension of the action set created for the results with new entities for projects and datasources. Extension of the resource set and modification of the testing class
2023-02-09 18:59:45 +01:00
dimitrispie
d71f5672d3
Add monitor post step
2023-02-09 13:44:14 +02:00
dimitrispie
35ba8bb328
Bug fixes
2023-02-09 12:57:57 +02:00
Sandro La Bruzzo
8920932dd8
Code formatted
2023-02-08 11:34:18 +01:00
Sandro La Bruzzo
0b9819f1ab
Code formatted
2023-02-08 10:32:33 +01:00
Sandro La Bruzzo
6c81a161d2
Merge remote-tracking branch 'origin/beta' into 8231-mdstore-synch-improve
2023-02-08 10:29:09 +01:00
dimitrispie
3ba11d64a1
Changes 07022023
2023-02-07 12:53:51 +02:00
dimitrispie
98c34263ed
Update step20-createMonitorDB.sql
Add University of Cape Town organization
2023-02-07 08:14:48 +02:00
dimitrispie
2dc6d47270
Changes 06022023
2023-02-06 13:18:53 +02:00
dimitrispie
973d78a4d6
Update step15_5.sql
Added unpaywalls open access colors
2023-02-02 08:03:54 +02:00
Claudio Atzori
d05ca53a14
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2023-01-31 14:39:53 +01:00
Miriam Baglioni
e82e009b46
added missing close tag for XML produced by the xquery to get information for the community from the IS
2023-01-31 10:19:34 +01:00
Miriam Baglioni
b254a0375f
[Affiliation from institutionalrepo] changed the field to check to verify the datasource type. Now it is in the field jurisdiction
2023-01-26 16:51:20 +01:00
dimitrispie
cf58e4a5e4
Added Arts et Métiers ParisTech
2023-01-25 16:03:16 +02:00
dimitrispie
db7d625ba9
Added Arts et Métiers ParisTech organization
2023-01-25 12:22:21 +02:00
Claudio Atzori
505867bce9
[bulk tagging] better node naming
2023-01-20 16:13:16 +01:00
Miriam Baglioni
ecd398fe51
refactoring
2023-01-20 14:23:45 +01:00
Miriam Baglioni
0a5c6010b0
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2023-01-13 16:14:46 +01:00
dimitrispie
4d7553c9f1
Bug fixes
2023-01-12 17:19:19 +02:00
dimitrispie
dd70c32ad7
Bug fixes
2023-01-12 17:18:05 +02:00
dimitrispie
51f7ab5864
Bug fixes
2023-01-12 17:15:06 +02:00
dimitrispie
34d4bf727c
Bug fixes
2023-01-12 11:28:37 +02:00
dimitrispie
43f6d4f296
-Monitor DB workflow
2023-01-12 11:26:47 +02:00
dimitrispie
686580a220
- New Monitor DB workflow
- New Organization added
2023-01-12 11:18:03 +02:00
Claudio Atzori
0a58bc7ba7
[broker] prevent NPEs
2023-01-11 14:44:14 +01:00
Claudio Atzori
04cb96001c
[broker] d40e20f437 adapted to the beta graph model
2023-01-11 10:10:12 +01:00
Michele Artini
91b845f611
Considering instance pids and alternative identifiers
2023-01-11 09:58:54 +01:00
Miriam Baglioni
1f367122e4
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2023-01-11 09:47:44 +01:00
Michele Artini
7b7520850b
fixed an invalid char
2023-01-11 09:22:18 +01:00
Miriam Baglioni
d6895f0387
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2023-01-09 17:28:38 +01:00
dimitrispie
becb242c17
Monitor DB only Workflow
2023-01-04 16:50:29 +02:00
dimitrispie
dcb958e146
Changes to execute the stats wf only in hive
2023-01-04 11:39:01 +02:00
dimitrispie
592013d5dd
Added more steps in decision node
2022-12-23 09:43:16 +02:00
dimitrispie
2a4bf32d4c
Merge branch 'hive' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into hive
# Conflicts:
# dhp-workflows/dhp-stats-update/src/main/resources/eu/dnetlib/dhp/oa/graph/stats/oozie_app/scripts/step10.sql
# dhp-workflows/dhp-stats-update/src/main/resources/eu/dnetlib/dhp/oa/graph/stats/oozie_app/scripts/step13.sql
# dhp-workflows/dhp-stats-update/src/main/resources/eu/dnetlib/dhp/oa/graph/stats/oozie_app/scripts/step14.sql
# dhp-workflows/dhp-stats-update/src/main/resources/eu/dnetlib/dhp/oa/graph/stats/oozie_app/scripts/step16_1-definitions.sql
# dhp-workflows/dhp-stats-update/src/main/resources/eu/dnetlib/dhp/oa/graph/stats/oozie_app/scripts/step7.sql
2022-12-22 10:22:46 +02:00
dimitrispie
6449ff4207
1. Added a decision node that enables the workflow to select the execution path to follow
2. Added new organization
3. Added 5 new tables from Eurostat
2022-12-22 10:18:21 +02:00
Miriam Baglioni
8893389895
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-12-21 12:42:27 +01:00
Antonis Lempesis
c8309fe18e
added command line params to allow hive actions to run
2022-12-21 12:41:33 +02:00
Antonis Lempesis
028873cc51
added new hive opts
2022-12-21 12:41:33 +02:00
Antonis Lempesis
1ddea4f442
removed 'stored as parquet' from views..
2022-12-21 12:41:33 +02:00
Antonis Lempesis
2754c3dd62
moving data to impala cluster and creating shadow databases there
2022-12-21 12:41:29 +02:00
Antonis Lempesis
778a1a724f
finished migration to hive only
2022-12-21 12:41:25 +02:00
Antonis Lempesis
e84dd5fe26
first
2022-12-21 12:41:23 +02:00
Sandro La Bruzzo
3c9826f186
updated lines function to its implementation linesWithSeparators.map(l => l.stripLineEnd); in this way we force the Scala compiler plugin to treat this pipeline as Scala code and not as a java.string.lines() pipeline
2022-12-21 11:21:17 +01:00
Claudio Atzori
6aa91204a5
[orcid propagation] skip empty directories
2022-12-20 14:15:46 +01:00
Miriam Baglioni
6674cccb94
[BulkTag] description of parameters more comprehensive for those who do not implement it
2022-12-16 15:33:20 +01:00
Miriam Baglioni
f37113a941
[BulkTag] moving xquery to get community configuration in dedicated file
2022-12-16 15:32:26 +01:00
Miriam Baglioni
8685eaa706
[Clean Country] added test to verify removal of country
2022-12-16 15:31:25 +01:00
Miriam Baglioni
dc0ec88a58
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-12-16 13:18:32 +01:00
Miriam Baglioni
d791840b82
[Clean Country] added test to verify removal of country
2022-12-16 13:18:29 +01:00
Claudio Atzori
7b80b24f82
[cleaning] country cleaning must use both PID and AlternateIdentifier fields
2022-12-15 14:49:04 +01:00
Claudio Atzori
b8bafab8a0
[cleaning] improved vocabulary based mapping, specialization for the strict vocab cleaning
2022-12-12 14:43:03 +01:00
Sandro La Bruzzo
5e4866d033
implemented synch for single mdstore
2022-12-12 11:29:46 +01:00
Claudio Atzori
c18b8048c3
[cleaning] avoid NPE
2022-12-10 11:41:38 +01:00
Claudio Atzori
8b44afe5e5
[cleaning] avoid NPE
2022-12-09 15:44:57 +01:00
Claudio Atzori
389dd25430
[cleaning] avoid NPE
2022-12-08 18:40:48 +01:00
Claudio Atzori
730228d73d
[cleaning] align wf parameter names in test
2022-12-08 18:40:22 +01:00
Claudio Atzori
2094fa6db0
[cleaning] align wf parameter names
2022-12-08 17:22:26 +01:00
Miriam Baglioni
a485a94956
[Cleaning] fixed parameter name in property file
2022-12-08 16:59:34 +01:00
Miriam Baglioni
3d99b78d94
[Cleaning] fixed error in parameter (workingPath to workingDir)
2022-12-08 10:25:02 +01:00
Claudio Atzori
1b8488976b
code formatting
2022-12-07 10:45:38 +01:00
Claudio Atzori
cd1b58483e
[bulk tag] fixed Community configuration parsing to avoid NPE
2022-12-07 10:39:00 +01:00
Claudio Atzori
062abfd669
fixed NPE, removed unused stuff
2022-12-06 12:04:00 +01:00
dimitrispie
2a52a42169
Added 4 institutions:
-University of Modena and Reggio Emilia
-Bilkent University
-Saints Cyril and Methodius University of Skopje
-University of Milan
2022-12-06 10:10:21 +02:00
Claudio Atzori
8248da40d9
Merge branch 'beta' into graph_cleaning
2022-12-02 14:49:00 +01:00
Claudio Atzori
ddf065756f
Merge pull request 'Two organizations are added for monitor' ( #258 ) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #258
2022-12-02 14:45:27 +01:00
Sandro La Bruzzo
5a48a2fb18
implemented synch for single mdstore
2022-12-01 11:34:43 +01:00
Claudio Atzori
a38116546d
Merge branch 'beta' into deduptesting
2022-11-30 11:27:29 +01:00
Miriam Baglioni
ce020f2c83
[EOSC FUTURE] added resources and test for review
2022-11-30 09:57:30 +01:00
Miriam Baglioni
bb0ddc1c44
[BulkTag] adding verb starts_with
2022-11-30 09:56:24 +01:00
Claudio Atzori
8e3edba318
[graph cleaning] testing the collectedfrom and hostedby patch procedure
2022-11-29 16:07:09 +01:00
Claudio Atzori
58c05731f9
[graph cleaning] WIP: testing the collectedfrom and hostedby patch procedure
2022-11-29 11:21:51 +01:00
Miriam Baglioni
9c70c5dbd6
[Bulk Tag horizontal] added new path in definition of constraint (to recognize fos subjects) - changed test and resource class to test this new aspect
2022-11-28 14:51:20 +01:00
Miriam Baglioni
0628df7a3a
resolving conflicts
2022-11-28 10:44:56 +01:00
Claudio Atzori
11695ba649
[graph cleaning] patch also the result's collectedfrom and hostedby datasource name according to the datasource master-duplicate mapping
2022-11-28 10:18:43 +01:00
Claudio Atzori
6082d235d3
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into graph_cleaning
2022-11-28 09:54:48 +01:00
Claudio Atzori
24ef301cc1
[graph cleaning] patch the result's collectedfrom and hostedby identifiers according to the datasource master-duplicate mapping
2022-11-28 09:54:18 +01:00
Alessia Bardi
90c8f9cb61
tests for EOSC Future
2022-11-23 12:18:44 +01:00
Miriam Baglioni
0e3edc5018
[Bulk Tag] fixed issue in verb name
2022-11-23 11:26:36 +01:00
Claudio Atzori
a79c47522d
updated ORCID datasource identifier
2022-11-23 10:17:49 +01:00
Alessia Bardi
2832117f23
added eoscifguidelines in test
2022-11-22 18:01:12 +01:00
Alessia Bardi
3c08269a4d
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-11-22 17:31:00 +01:00
Alessia Bardi
2687fc9f73
tests for EOSC Future review - ROhub
2022-11-22 17:30:56 +01:00
Claudio Atzori
1d5143b0b6
Merge branch 'beta' into deduptesting
2022-11-22 10:21:30 +01:00
Claudio Atzori
0aa725083f
extended dedup testing
2022-11-17 16:13:43 +01:00
Claudio Atzori
3dbc637d3e
code formatting
2022-11-17 09:55:41 +01:00
Claudio Atzori
ddff0e8999
merging duplicates using IdentifierComparator
2022-11-11 16:10:25 +01:00
Claudio Atzori
5af5a8ae42
added IdentifierComparator
2022-11-09 14:20:59 +01:00
Claudio Atzori
7c3390ac10
Merge branch 'beta' into eoscifguidelines-from-mdstores
2022-11-07 12:18:40 +01:00
dimitrispie
992fc5b628
Added McMaster University Institution
2022-11-03 11:02:18 +02:00
dimitrispie
7fda05e380
Added Autonomous University of Barcelona
2022-11-01 13:59:40 +02:00
Claudio Atzori
22873c9172
Merge pull request 'Added fields: totalcost, fundedamount, currency, in project table' ( #257 ) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #257
2022-10-31 13:49:27 +01:00
dimitrispie
7861c472e0
Hive memory parameters
2022-10-28 19:00:32 +03:00
dimitrispie
5df9c63963
Added fields: totalcost, fundedamount, currency, in project table
2022-10-27 16:44:26 +03:00
Sandro La Bruzzo
2b9a20a4a3
Changed the way Scholexplorer filters the relationships: I found that filtering out all relations coming from OpenCitation is wrong, because we lose a lot of relations that intersect OpenCitation but do not come only from there
2022-10-24 12:53:47 +02:00
Alessia Bardi
208ed32315
fixed xpath for semantic relation
2022-10-23 18:18:13 +02:00
Alessia Bardi
ee759ac92d
file format after mvn compile
2022-10-23 18:09:47 +02:00
Alessia Bardi
31a10f000b
Map the field oaf:eoscifguidelines from mdstores. Currently we can find it in ROHub metadata
2022-10-23 18:05:37 +02:00
Claudio Atzori
ec39b84898
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-10-19 15:21:02 +02:00
Claudio Atzori
bca4a61710
suppressing hyper verbose spark logs during unit test execution
2022-10-19 15:20:58 +02:00
Sandro La Bruzzo
72f0d88d6c
formatted code
2022-10-19 14:18:42 +02:00
Claudio Atzori
9b449110c6
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-10-14 15:48:04 +02:00
Claudio Atzori
ae7cd0735a
[graph2hive] more partitions
2022-10-14 15:47:58 +02:00
Sandro La Bruzzo
135cf81151
Merge remote-tracking branch 'origin/beta' into beta
2022-10-13 11:47:25 +02:00
Sandro La Bruzzo
a1f94530a3
added documentation
2022-10-13 11:47:11 +02:00
Claudio Atzori
b47aaf4dd1
[cleaning] subjects declared as belonging to specific vocabularies whose values are not found in the vocab are set to type keyword
2022-10-13 11:23:43 +02:00
Claudio Atzori
6163ecbf63
[cleaning] renamed parameters in wf action
2022-10-11 11:20:03 +02:00
Claudio Atzori
b301e9fdff
[cleaning] renamed action name/description
2022-10-11 11:08:52 +02:00
Claudio Atzori
ece40adc09
[cleaning] fixing NPE in the country cleaning phase
2022-10-11 10:10:20 +02:00
Claudio Atzori
d51275a965
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-10-07 09:52:49 +02:00
Claudio Atzori
8d97949316
[cleaning] fixed loop in wf nodes
2022-10-07 09:52:45 +02:00
Miriam Baglioni
a653e1b3ea
[Enrichment - result to community through organization] reimplementation of the data preparation step using spark
2022-10-04 15:01:28 +02:00
Miriam Baglioni
4d8339614b
Revert "[BipFinder] Fixed issue for wrong escaped char in doi"
This reverts commit 188f25eefa.
2022-10-04 14:29:47 +02:00
Miriam Baglioni
7324853a17
Revert "[BipFinder] refactoring"
This reverts commit 28dc317350.
2022-10-04 14:29:39 +02:00
Miriam Baglioni
28dc317350
[BipFinder] refactoring
2022-10-04 09:47:27 +02:00
Miriam Baglioni
188f25eefa
[BipFinder] Fixed issue for wrong escaped char in doi
2022-10-03 12:42:52 +02:00
Claudio Atzori
89f7007080
Merge pull request '[stats wf] misc changes' ( #254 ) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #254
2022-10-03 10:32:05 +02:00
dimitrispie
2c0c3f1806
Cast amount to float for table result_apcs
2022-09-28 19:33:24 +03:00
Alessia Bardi
49360770d7
map w3id as instance url
2022-09-28 14:16:39 +02:00
dimitrispie
bdc46e3eaa
Remove denormalization of results to fix downloads numbers in monitor
2022-09-28 14:59:08 +03:00
dimitrispie
2ebb1459a9
Fixed type in no_downloads
2022-09-28 14:36:57 +03:00
Miriam Baglioni
b5b5a4c192
[CleanCountry] fixed issue
2022-09-28 12:42:51 +02:00
Miriam Baglioni
f1d7d45cf7
[BulkTag] fixed issue
2022-09-28 12:01:43 +02:00
Miriam Baglioni
3ec044600d
[BulkTag] fixed conflicts
2022-09-28 11:58:28 +02:00
Miriam Baglioni
1cb79719a7
[BulkTag] fixed issues
2022-09-28 11:44:55 +02:00
Claudio Atzori
f3f7604e6c
trying to fix a test that fails only on Jenkins
2022-09-27 15:21:37 +02:00
Claudio Atzori
3f90d159e3
code formatting
2022-09-27 15:08:00 +02:00
Claudio Atzori
0b3e44e521
Merge branch 'beta' into relation-from-odf
2022-09-27 14:57:01 +02:00
Claudio Atzori
57dbeb08d2
code formatting
2022-09-27 14:55:10 +02:00
Claudio Atzori
b60985cf68
Merge branch 'beta' into horizontalConstraints
2022-09-27 14:39:31 +02:00
Claudio Atzori
3b60642ef9
Merge pull request 'Synchronize indicators in stats-db with monitor-db' ( #249 ) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #249
2022-09-27 14:37:33 +02:00
Claudio Atzori
25e9d92aad
Merge branch 'beta' into clean_country
2022-09-27 14:27:49 +02:00
Alessia Bardi
fd63e9bfac
Mapping all relationships supported in ModelConstants and ModelSupport
2022-09-26 11:24:13 +02:00
Miriam Baglioni
ca216a92ad
[BulkTagging] changed the query to the IS to insert values for FOS and SDG as subject in the configuration used for the tagging
2022-09-23 17:06:07 +02:00
Miriam Baglioni
3e6b0f58bb
[BulkTagging] changed the query to the IS to get also the information for the advancedConstraint from the profile
2022-09-23 16:47:19 +02:00
Miriam Baglioni
4a3e119b73
merging with branch beta
2022-09-23 16:16:06 +02:00
Miriam Baglioni
f0e303abf9
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-09-23 16:15:32 +02:00
Miriam Baglioni
55da4d8715
[BulkTagging] modifying code to represent constraints horizontally on all the results. Added subject to the set of fields used to express the constraint. Modified resources to test the new approach. Modified test class
2022-09-23 16:02:19 +02:00
Alessia Bardi
c5eb722170
relationships from relatedIdentifier whose target id type is one of the pid type with an authority
2022-09-23 15:47:05 +02:00
Claudio Atzori
c86cc53520
suppressing hyper verbose spark logs during unit test execution
2022-09-23 15:20:40 +02:00
Alessia Bardi
ba33ff71fd
refactoring for the generation of relationships from related identifier of type 'OPENAIRE'
2022-09-23 15:17:13 +02:00
Alessia Bardi
982bcc1e35
test wrid pid and record identifier
2022-09-23 12:06:06 +02:00
Miriam Baglioni
960cb861a0
refactoring
2022-09-23 11:14:04 +02:00
Claudio Atzori
c42850328e
fixed semantic (subreltype) for ServiceOrganization relations
2022-09-22 16:23:25 +02:00
Miriam Baglioni
33bb79459e
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-09-22 15:55:17 +02:00
dimitrispie
dcd85f8cd7
- Synchronize indicators in stats-db with monitor-db
- added new openorg id for Nanyang Technological University
- changed openorg id for University of Helsinki #8088 ticket
2022-09-22 13:33:07 +03:00
Claudio Atzori
e45ec15221
Merge branch 'beta' into clean_country
2022-09-19 11:34:02 +02:00
Claudio Atzori
26e1badded
added instance.url syntactical validation, avoid creating multiple duplicated URLs
2022-09-19 11:19:10 +02:00
Miriam Baglioni
5240ac3d7b
[EOSC Tag] remove addition of eosc context for result with eosc if guidelines set
2022-09-19 11:02:18 +02:00
Claudio Atzori
192215a18e
merged from branch discard-non-wellformed
2022-09-19 10:17:10 +02:00
Claudio Atzori
e370e940d8
[aggregator graph] save invalid records aside for further inspection
2022-09-16 14:06:28 +02:00
Claudio Atzori
465e941214
Merge pull request '[stats wf] Changes to indicators tables' ( #244 ) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #244
2022-09-16 10:13:58 +02:00
Claudio Atzori
1e42d984e1
[aggregator graph] save invalid records aside for further inspection
2022-09-15 10:49:42 +02:00
Alessia Bardi
9e7ec4198f
fixed test
2022-09-14 18:08:56 +02:00
Claudio Atzori
c48f6e9c57
[aggregator graph] save invalid records aside for further inspection
2022-09-14 17:11:26 +02:00
dimitrispie
3bf3127251
Changes to monitor and indicator scripts
2022-09-14 16:36:19 +03:00
Claudio Atzori
a0919ed495
[aggregator graph] save invalid records aside for further inspection
2022-09-14 13:27:39 +02:00
Alessia Bardi
b99a011345
return empty Oaf list if record cannot be parsed
2022-09-13 11:51:55 +02:00
Alessia Bardi
27af5122d2
logs for non well formed XML files
2022-09-12 14:25:23 +02:00
Claudio Atzori
ff6f789b6d
code formatting
2022-09-09 15:16:31 +02:00
Claudio Atzori
b5d6966c01
Merge branch 'beta' into clean_country
2022-09-09 12:20:19 +02:00
Claudio Atzori
b5f7bd30be
Merge branch 'beta' into clean_subjects
2022-09-09 12:20:04 +02:00
Alessia Bardi
f14107ad77
Merge branch 'handle_as_instance_urls' of https://code-repo.d4science.org/D-Net/dnet-hadoop into handle_as_instance_urls
2022-09-09 12:17:19 +02:00
Alessia Bardi
a539c6ccaf
https for handle URLs
2022-09-09 12:16:28 +02:00
dimitrispie
71b069ca90
Changes to indicator and monitor scripts
2022-09-09 13:15:58 +03:00
Claudio Atzori
1203378441
Merge branch 'beta' into clean_subjects
2022-09-09 10:38:47 +02:00