Commit Graph

4358 Commits

Author SHA1 Message Date
Sandro La Bruzzo edf5a780b8 minor fix 2023-08-02 12:12:20 +02:00
Sandro La Bruzzo 74fcea66e6 Merge branch 'dedup-with-dataframe-spark34' of code-repo.d4science.org:D-Net/dnet-hadoop into dedup-with-dataframe-spark34 2023-07-19 16:55:19 +02:00
Sandro La Bruzzo e4feedd67e improved scholix generation using bean 2023-07-19 16:53:28 +02:00
Giambattista Bloisi 617ef05e15 Update commons.lang.version to 3.12.0 to match spark 3.4 version and fix an incompatibility when running with Java 11 2023-07-17 17:01:07 +02:00
Giambattista Bloisi b6a8be813b oozie.launcher.mapreduce.user.classpath.first property is required to avoid launch problems 2023-07-14 16:05:14 +02:00
Sandro La Bruzzo f1ae28fe42 implemented new version of pubmed parser 2023-07-12 10:32:25 +02:00
Sandro La Bruzzo acf947442a made the project compilable 2023-07-11 11:37:32 +02:00
Giambattista Bloisi d80f12da06 Build with spark 3.4 (dedup and dependencies only tested) 2023-07-10 15:54:48 +02:00
Giambattista Bloisi 861c368e65 Code for testing other grouping strategies 2023-07-10 15:52:35 +02:00
Giambattista Bloisi 745e70e0d7 When generating similarities put as 'from' component the one with smaller lexicographic id 2023-07-10 15:45:49 +02:00
Giambattista Bloisi dcc08cc512 Use UDAF and Aggregation class for testing 2023-07-07 12:35:30 +02:00
Giambattista Bloisi df19548c56 small changes 2023-07-04 18:36:58 +02:00
Sandro La Bruzzo 890b49fb5d optimized some dedup functions 2023-06-29 14:08:58 +02:00
Giambattista Bloisi 3129c1c48b Allow processing of immutable sorted blocks in dedup 2023-06-28 14:01:04 +02:00
Giambattista Bloisi cb7ad9889c Fix maven dependencies warning while building 2023-06-28 14:01:04 +02:00
Claudio Atzori 75ff902f9d WIP: various refactors 2023-06-28 14:00:54 +02:00
Claudio Atzori 326367eccc WIP: various refactors 2023-06-28 14:00:22 +02:00
Claudio Atzori 521dd7f167 WIP: various refactors 2023-06-28 14:00:18 +02:00
Claudio Atzori 649679de8d WIP: various refactors 2023-06-28 13:59:11 +02:00
Sandro La Bruzzo 4c2dfcbdf7 Added first implementation using UDF function 2023-06-28 13:58:01 +02:00
Sandro La Bruzzo 9963fd6d29 updated log to add subentity 2023-06-28 13:36:05 +02:00
Sandro La Bruzzo ed7e2ab6d1 reverted mistake on commit workflow.xml 2023-06-28 11:40:19 +02:00
Sandro La Bruzzo 9910ce06ae added to CreateSimRel the feature to write time log 2023-06-28 11:38:16 +02:00
Miriam Baglioni 2717edafb7 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2023-06-28 11:25:14 +02:00
Miriam Baglioni 2f04c9d149 [BulkTagging] fixing left over for test 2023-06-28 11:24:42 +02:00
Sandro La Bruzzo bd17c3edc8 added to CreateSimRel the feature to write time log 2023-06-28 11:20:58 +02:00
Sandro La Bruzzo b195da3a83 Added utility to write time logs during the deduplication phase 2023-06-28 11:20:09 +02:00
Michele Artini 88a1cbc37d fixed a datasource id 2023-06-22 07:56:33 +02:00
Claudio Atzori b0ebf56367 Merge pull request 'Update step15_5.sql' (#314) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#314
2023-06-21 10:33:22 +02:00
dimitrispie 2b6370eaee Update step15_5.sql
Bug fix
2023-06-21 11:31:10 +03:00
Claudio Atzori 35e42a86ed Merge pull request 'Update step15_5.sql' (#313) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#313
2023-06-21 10:26:16 +02:00
dimitrispie 74cb060bfe Update step15_5.sql
Add "if not exists" clause
2023-06-21 11:24:06 +03:00
Claudio Atzori 85e016df17 Merge pull request 'Update step16-createIndicatorsTables.sql' (#312) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#312
2023-06-21 09:52:33 +02:00
dimitrispie a475cfcb7b Update step16-createIndicatorsTables.sql
Rename a field in indi_pub_interdisciplinarity
2023-06-21 10:42:02 +03:00
Claudio Atzori 979cf9cd87 Merge pull request 'Update step15.sql' (#311) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#311
2023-06-21 09:20:01 +02:00
dimitrispie 4648cd88d4 Update step15.sql
Cast score to double
2023-06-21 10:02:19 +03:00
dimitrispie 94d2573c77 Update step15.sql
Bug Fix
2023-06-21 09:22:39 +03:00
Claudio Atzori 0561362de2 Merge pull request 'Update step20-createMonitorDB_institutions.sql' (#309) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#309
2023-06-20 15:07:09 +02:00
Claudio Atzori 50d7dc0078 [graph enrichment] fixed projectOrganizationPath not being passed to the apply_resulttoorganization_propagation node 2023-06-19 15:42:44 +02:00
Claudio Atzori fbd9bf704e indent 2023-06-19 15:41:22 +02:00
dimitrispie be2caedb04 Update step20-createMonitorDB_institutions.sql
Add openorgs____::1624ff7c01bb641b91f4518539a0c28a Vrije Universiteit Amsterdam
2023-06-19 12:12:17 +03:00
dimitrispie 36e0a8fec4 Changes to Promotion Stats WF
1. Add new cluster host at impala-shell commands
2. Add a step for splitting monitor dbs
3. Update workflow.xml to include the new splitting monitor dbs step
2023-06-19 09:44:34 +03:00
dimitrispie 4c770a5e29 Update finalizeImpalaCluster.sh
Drop views in shadow dbs before dropping the db
2023-06-15 13:25:37 +03:00
dimitrispie e06d962a6a Update step15.sql 2023-06-15 12:20:35 +03:00
dimitrispie afcad08396 Update step20-createMonitorDB_institutions.sql
Added openorgs____::c0b262bd6eab819e4c994914f9c010e2 -- National Institute of Geophysics and Volcanology
2023-06-15 10:28:49 +03:00
Claudio Atzori b9748763e2 Merge pull request '[stats wf] Bug fixes' (#308) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#308
2023-06-14 21:57:03 +02:00
dimitrispie 42b8ce2ba4 Update copyDataToImpalaCluster.sh 2023-06-14 19:23:42 +03:00
dimitrispie 2032b0df40 Bug fixes
1. Remove tables/views from old databases in the new cluster, before dropping the dbs
2. Fix id in result_accessroute, indi_impact_measures, indi_pub_bronze_oa
2023-06-14 19:09:09 +03:00
Claudio Atzori b76a47b103 [aggregator graph] added column alias when mapping organization PIDs from the OpenOrgs database 2023-06-13 11:38:10 +02:00
Claudio Atzori 744a61a030 depending on dhp-schema:3.17.1 2023-06-12 13:49:44 +02:00