Serafeim Chatzopoulos
7cefe2665b
Remove unnecessary classes
2023-07-28 19:14:39 +03:00
Serafeim Chatzopoulos
26a92ce762
Merge branch '8876' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8876
2023-07-28 19:03:57 +03:00
Serafeim Chatzopoulos
ebfba38ab6
Add changes from code review
2023-07-28 19:03:47 +03:00
Serafeim Chatzopoulos
eb8684a8cf
Merge branch 'beta' into 8876
2023-07-28 13:39:33 +02:00
Claudio Atzori
1275a07d45
Merge pull request '[graph indexing] expand the instance level fulltext in the XML records' ( #326 ) from instance_fulltext_xml into beta
...
Reviewed-on: D-Net/dnet-hadoop#326
2023-07-27 15:02:07 +02:00
Claudio Atzori
a72b9e96ac
expand the instance level fulltext in the XML records
2023-07-27 14:57:38 +02:00
Claudio Atzori
8c63e4a864
Merge pull request 'Refactor Dedup using Spark Dataframe API, initial support for scala 2.12 and Spark 3.4' ( #324 ) from dedup-with-dataframe-2 into beta
...
Reviewed-on: D-Net/dnet-hadoop#324
2023-07-25 10:17:17 +02:00
Giambattista Bloisi
e64c2854a3
Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface
...
JsonPath cache contention fixed by using a ConcurrentHashMap
Blacklist filtering performance improvement
Minor performance improvements when evaluating similarity
Sorting in clustered elements is deterministic (by ordering and identity field, instead of ordering field only)
2023-07-24 15:36:24 +02:00
Giambattista Bloisi
bb5b845e3c
Use scala.binary.version property to resolve scala maven dependencies
...
Ensure consistent usage of maven properties
Profile for compiling with scala 2.12 and Spark 3.4
2023-07-24 11:13:48 +02:00
Claudio Atzori
002b24e06f
Merge pull request '[graph cleaning] fixed regex behaviour for cleaning ROR and GRID identifiers, added tests' ( #315 ) from pid_cleaning into beta
...
Reviewed-on: D-Net/dnet-hadoop#315
2023-07-24 10:49:44 +02:00
Claudio Atzori
c754397a19
Merge branch 'beta' into pid_cleaning
2023-07-24 10:49:31 +02:00
Claudio Atzori
f0678cda09
Merge pull request 'fix_beta_tests' ( #323 ) from fix_beta_tests into beta
...
Reviewed-on: D-Net/dnet-hadoop#323
2023-07-24 10:47:35 +02:00
Giambattista Bloisi
f03153823a
Update testCitationRelations number of expected citations according to changes made in 0559d8b4
(monodirectional citations)
2023-07-21 10:48:28 +02:00
Giambattista Bloisi
54c1eacef1
SparkJobTest was failing because testing workingdir was not cleaned up after eact test
2023-07-21 10:42:24 +02:00
Giambattista Bloisi
5e15f20e6e
Fix entityMerger that was excluding the authors of the first entity in the list to merge
2023-07-21 00:46:54 +02:00
Giambattista Bloisi
0210a14e43
Ignore timestamp differences in PromoteActionPayloadForGraphTableJobTest
2023-07-20 23:45:57 +02:00
Giambattista Bloisi
dba34505de
Fix SparkStatsTest bug where parquet tables were incorrectly read as text files leading to unpredictable count() values
2023-07-19 14:24:52 +02:00
Giambattista Bloisi
e47ed1fdb2
Use DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES in json mapper to avoid that tests fail if they encounter unmapped properties
2023-07-19 14:21:40 +02:00
Giambattista Bloisi
38dfebfbe6
Disable MdStoreClientTest test as it requires a local mongodb running and it does not perform any assertions
2023-07-19 14:18:56 +02:00
Serafeim Chatzopoulos
be320ba3c1
Indentation fixes
2023-07-17 16:04:21 +03:00
Serafeim Chatzopoulos
bc1a4611aa
Minor changes
2023-07-17 11:17:53 +03:00
Giambattista Bloisi
ef493681d9
Merge pull request 'Import dnet-pace-core module in this project and use it after renaming to dhp-pace-core' ( #319 ) from beta_with_pace_core into beta
...
Reviewed-on: D-Net/dnet-hadoop#319
2023-07-11 14:03:15 +02:00
Serafeim Chatzopoulos
4eba14a80e
Add oozie workflow
2023-07-06 21:07:50 +03:00
Serafeim Chatzopoulos
c2998a14e8
Add basic tests for affiliation relations
2023-07-06 20:28:16 +03:00
Serafeim Chatzopoulos
bc7b00bcd1
Add bi-directional affiliation relations
2023-07-06 18:29:15 +03:00
Serafeim Chatzopoulos
12528ed2ef
Refactor PrepareAffiliationRelations.java to use OafMapperUtils common functions
2023-07-06 18:08:33 +03:00
Serafeim Chatzopoulos
bbc245696e
Prepare actionsets for BIP affiliations
2023-07-06 15:56:12 +03:00
Giambattista Bloisi
801da2fd4a
New sources formatted by maven plugin
2023-07-06 10:28:53 +02:00
Giambattista Bloisi
bd3fcf869a
rename dnet-pace-core into dhp-pace-core module and use it as dependency in other modules
2023-07-06 10:02:23 +02:00
Serafeim Chatzopoulos
347a889b20
Read affiliation relations
2023-07-06 00:51:01 +03:00
Giambattista Bloisi
3b35db5fbd
Import dnet-pace-core module from dnet-dedup repository
2023-07-05 22:23:06 +02:00
Miriam Baglioni
7738372125
[UsageCount] fixed typo in attribute name for datasource table
2023-06-30 18:56:41 +02:00
Sandro La Bruzzo
9963fd6d29
updated log to add subentity
2023-06-28 13:36:05 +02:00
Sandro La Bruzzo
ed7e2ab6d1
reverted mistake on commit workflow.xml
2023-06-28 11:40:19 +02:00
Sandro La Bruzzo
9910ce06ae
added to CreateSimRel the feature to write time log
2023-06-28 11:38:16 +02:00
Miriam Baglioni
2717edafb7
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2023-06-28 11:25:14 +02:00
Miriam Baglioni
2f04c9d149
[BulkTagging] fixing left over for test
2023-06-28 11:24:42 +02:00
Sandro La Bruzzo
bd17c3edc8
added to CreateSimRel the feature to write time log
2023-06-28 11:20:58 +02:00
Sandro La Bruzzo
b195da3a83
Added utility to write time logs during the deduplication phase
2023-06-28 11:20:09 +02:00
Claudio Atzori
0f5a819f44
[graph cleaning] fixed regex behaviour for cleaning ROR and GRID identifiers, added tests
2023-06-23 16:10:49 +02:00
Michele Artini
88a1cbc37d
fixed a datasource id
2023-06-22 07:56:33 +02:00
Claudio Atzori
b0ebf56367
Merge pull request 'Update step15_5.sql' ( #314 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#314
2023-06-21 10:33:22 +02:00
dimitrispie
2b6370eaee
Update step15_5.sql
...
Bug fix
2023-06-21 11:31:10 +03:00
Claudio Atzori
35e42a86ed
Merge pull request 'Update step15_5.sql' ( #313 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#313
2023-06-21 10:26:16 +02:00
dimitrispie
74cb060bfe
Update step15_5.sql
...
Add "if not exists" clause
2023-06-21 11:24:06 +03:00
Claudio Atzori
85e016df17
Merge pull request 'Update step16-createIndicatorsTables.sql' ( #312 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#312
2023-06-21 09:52:33 +02:00
dimitrispie
a475cfcb7b
Update step16-createIndicatorsTables.sql
...
Rename a field in indi_pub_interdisciplinarity
2023-06-21 10:42:02 +03:00
Claudio Atzori
979cf9cd87
Merge pull request 'Update step15.sql' ( #311 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#311
2023-06-21 09:20:01 +02:00
dimitrispie
4648cd88d4
Update step15.sql
...
Cast score to double
2023-06-21 10:02:19 +03:00
dimitrispie
94d2573c77
Update step15.sql
...
Bug Fix
2023-06-21 09:22:39 +03:00