c25ac21e5eMerge pull request 'graph cleaning, suggestions from ticket 8898' (#325) from cleaning_8898 into betaMiriam Baglioni2023-08-08 11:14:19 +0200
c334fe2438Merge pull request 'Add a "CleanRelation" action after the PropagateRelation to filter out all relations that have been deleted by inference or that are pointing to dangling entities' (#328) from cleanup_relations_after_dedup into betaMiriam Baglioni2023-08-08 09:49:12 +0200
0e2f855807Merge pull request 'Updates Promotion DBs' (#321) from antonis.lempesis/dnet-hadoop:beta into betaMiriam Baglioni2023-08-07 12:09:16 +0200
18fbe52b20Merge pull request 'Import affiliation relations from Crossref' (#320) from 8876 into betaMiriam Baglioni2023-08-07 10:45:30 +0200
97b6d1dc45Filter ids by dataInfo.deletedbyinference and DataInfo.invisible flags. Filter relations also by dataInfo.invisible flag.
#328
Giambattista Bloisi2023-08-07 10:24:11 +0200
af49424b59Add a "CleanRelation" action after the PropagateRelation to filter out all relations that have been deleted by inference or that are pointing to dangling entitiesGiambattista Bloisi2023-08-04 14:27:39 +0200
1275a07d45Merge pull request '[graph indexing] expand the instance level fulltext in the XML records' (#326) from instance_fulltext_xml into betaClaudio Atzori2023-07-27 15:02:07 +0200
8c63e4a864Merge pull request 'Refactor Dedup using Spark Dataframe API, initial support for scala 2.12 and Spark 3.4' (#324) from dedup-with-dataframe-2 into betaClaudio Atzori2023-07-25 10:17:17 +0200
e64c2854a3Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface. JsonPath cache contention fixed by using a ConcurrentHashMap. Blacklist filtering performance improvement. Minor performance improvements when evaluating similarity. Sorting in clustered elements is deterministic (by ordering and identity field, instead of ordering field only).
#324
Giambattista Bloisi2023-07-18 11:38:56 +0200
bb5b845e3cUse scala.binary.version property to resolve scala maven dependencies; ensure consistent usage of maven properties; profile for compiling with scala 2.12 and Spark 3.4Giambattista Bloisi2023-07-17 17:18:46 +0200
002b24e06fMerge pull request '[graph cleaning] fixed regex behaviour for cleaning ROR and GRID identifiers, added tests' (#315) from pid_cleaning into betaClaudio Atzori2023-07-24 10:49:44 +0200
03da965162Format bip-score based file without doi referencesIlias Kanellos2023-07-21 13:42:30 +0300
f03153823aUpdate testCitationRelations number of expected citations according to changes made in 0559d8b4 (monodirectional citations)
#323
Giambattista Bloisi2023-07-21 10:48:28 +0200
54c1eacef1SparkJobTest was failing because testing workingdir was not cleaned up after each testGiambattista Bloisi2023-07-21 10:42:24 +0200
5e15f20e6eFix entityMerger that was excluding the authors of the first entity in the list to mergeGiambattista Bloisi2023-07-21 00:46:54 +0200
0210a14e43Ignore timestamp differences in PromoteActionPayloadForGraphTableJobTestGiambattista Bloisi2023-07-20 23:45:57 +0200
74fcea66e6Merge branch 'dedup-with-dataframe-spark34' of code-repo.d4science.org:D-Net/dnet-hadoop into dedup-with-dataframe-spark34
Sandro La Bruzzo
2023-07-19 16:55:19 +0200
e4feedd67eimproved scholix generation using bean
Sandro La Bruzzo
2023-07-19 16:53:28 +0200
dba34505deFix SparkStatsTest bug where parquet tables were incorrectly read as text files leading to unpredictable count() valuesGiambattista Bloisi2023-07-19 14:24:52 +0200
e47ed1fdb2Use DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES in json mapper so that tests do not fail when they encounter unmapped propertiesGiambattista Bloisi2023-07-19 14:21:40 +0200
38dfebfbe6Disable MdStoreClientTest test as it requires a local mongodb running and it does not perform any assertionsGiambattista Bloisi2023-07-19 14:18:56 +0200
617ef05e15Update commons.lang.version to 3.12.0 to match spark 3.4 version and fix an incompatibility when running with Java 11Giambattista Bloisi2023-07-17 17:01:07 +0200
f1ae28fe42implemented new version of pubmed parser
Sandro La Bruzzo
2023-07-12 10:32:25 +0200
d9c72356c0provide explicit types to scala lambda to solve a scala 2.11 compiler errorGiambattista Bloisi2023-07-11 14:24:12 +0200
ef493681d9Merge pull request 'Import dnet-pace-core module in this project and use it after renaming to dhp-pace-core' (#319) from beta_with_pace_core into betaGiambattista Bloisi2023-07-11 14:03:15 +0200
acf947442amade the project compilable
Sandro La Bruzzo
2023-07-11 11:37:32 +0200
7738372125[UsageCount] fixed typo in attribute name for datasource tableMiriam Baglioni2023-06-30 18:56:41 +0200
55ea485783[UsageCount] split the count for result at the level of the datasource. For each indicator, one unit is specified for each datasource contributing to that indicator value. The datasource key is the value of the key element in the unit for the measure, while the count for that datasource is in the valueMiriam Baglioni2023-06-30 18:39:30 +0200
890b49fb5doptimized some dedup functions
Sandro La Bruzzo
2023-06-29 14:08:58 +0200