Giambattista Bloisi (giambattista.bloisi)
  • Joined on 2023-06-30
giambattista.bloisi pushed to dedup-with-dataframe at D-Net/dnet-hadoop 2023-07-17 16:59:02 +02:00
b1b90868ff Simplify deduper code
giambattista.bloisi pushed to dedup-with-dataframe-spark34 at D-Net/dnet-hadoop 2023-07-14 16:05:54 +02:00
b6a8be813b oozie.launcher.mapreduce.user.classpath.first property is required to avoid launch problems
giambattista.bloisi pushed to dedup-with-dataframe at D-Net/dnet-hadoop 2023-07-14 09:29:12 +02:00
c95aa736dd Fix performance problem for high contention in accessing JsonPath library cache
giambattista.bloisi pushed to dedup-with-dataframe at D-Net/dnet-hadoop 2023-07-14 09:27:49 +02:00
df2b8ae72f Order by ordering field and identifier to make grouping deterministic
giambattista.bloisi pushed to dedup-with-dataframe at D-Net/dnet-hadoop 2023-07-11 14:24:24 +02:00
d9c72356c0 provide explicit types to scala lambda to solve a scala 2.11 compiler error
giambattista.bloisi deleted branch beta_with_pace_core from D-Net/dnet-hadoop 2023-07-11 14:03:16 +02:00
giambattista.bloisi pushed to beta at D-Net/dnet-hadoop 2023-07-11 14:03:16 +02:00
ef493681d9 Merge pull request 'Import dnet-pace-core module in this project and use it after renaming to dhp-pace-core' (#319) from beta_with_pace_core into beta
801da2fd4a New sources formatted by maven plugin
bd3fcf869a rename dnet-pace-core into dhp-pace-core module and use it as dependency in other modules
3b35db5fbd Import dnet-pace-core module from dnet-dedup repository
6210f6ee48 Merge pull request 'Precompile blacklists patterns before evaluating clustering criteria' (#1) from optimized-clustering into master
giambattista.bloisi merged pull request D-Net/dnet-hadoop#319 2023-07-11 14:03:15 +02:00
Import dnet-pace-core module in this project and use it after renaming to dhp-pace-core
giambattista.bloisi created branch dedup-with-dataframe-spark34 in D-Net/dnet-hadoop 2023-07-10 15:54:57 +02:00
giambattista.bloisi pushed to dedup-with-dataframe-spark34 at D-Net/dnet-hadoop 2023-07-10 15:54:57 +02:00
d80f12da06 Build with spark 3.4 (dedup and dependencies only tested)
giambattista.bloisi pushed to dedup-with-dataframe at D-Net/dnet-hadoop 2023-07-10 15:52:46 +02:00
861c368e65 Code for testing other grouping strategies
giambattista.bloisi pushed to dedup-with-dataframe at D-Net/dnet-hadoop 2023-07-10 15:46:31 +02:00
745e70e0d7 When generating similarities put as 'from' component the one with smaller lexicographic id
dcc08cc512 Use UDAF and Aggregation class for testing
df19548c56 small changes
giambattista.bloisi created branch scala212 in D-Net/dhp-schemas 2023-07-10 15:25:15 +02:00
giambattista.bloisi pushed to scala212 at D-Net/dhp-schemas 2023-07-10 15:25:15 +02:00
ad98cf0220 Change pom.xml to compile with scala 2.12
giambattista.bloisi created pull request D-Net/dnet-hadoop#319 2023-07-06 10:33:48 +02:00
Import dnet-pace-core module in this project and use it after renaming to dhp-pace-core
giambattista.bloisi pushed to beta_with_pace_core at D-Net/dnet-hadoop 2023-07-06 10:29:14 +02:00
801da2fd4a New sources formatted by maven plugin
giambattista.bloisi pushed to beta_with_pace_core at D-Net/dnet-hadoop 2023-07-06 10:26:00 +02:00
bd3fcf869a rename dnet-pace-core into dhp-pace-core module and use it as dependency in other modules
giambattista.bloisi created branch beta_with_pace_core in D-Net/dnet-hadoop 2023-07-05 22:25:13 +02:00
giambattista.bloisi pushed to beta_with_pace_core at D-Net/dnet-hadoop 2023-07-05 22:25:13 +02:00
3b35db5fbd Import dnet-pace-core module from dnet-dedup repository
giambattista.bloisi deleted branch import_dedup_project from D-Net/dnet-hadoop 2023-07-05 21:02:57 +02:00