Giambattista Bloisi

giambattista.bloisi created branch dispatch_filter_invisible_entities in D-Net/dnet-hadoop

2023-08-09 15:46:25 +02:00

giambattista.bloisi deleted branch cleanup_relations_after_dedup from D-Net/dnet-hadoop

2023-08-09 15:45:51 +02:00

giambattista.bloisi pushed to cleanup_relations_after_dedup at D-Net/dnet-hadoop

2023-08-07 10:24:30 +02:00

97b6d1dc45 Filter ids by dataInfo.deletedbyinference and dataInfo.invisible flags

giambattista.bloisi created pull request D-Net/dnet-hadoop#328

2023-08-04 17:28:08 +02:00

Add a "CleanRelation" action after the PropagateRelation to filter out all relations that have been deleted by inference or that are pointing to dangling entities
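The filtering rule described above can be sketched in plain Java. This is a minimal illustration, not the code from the pull request: the `Rel` class, its fields, and the `clean` method are hypothetical stand-ins, and the actual job presumably runs on Spark rather than on in-memory collections.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: names and types are assumptions, not the project's classes.
final class CleanRelationSketch {

    // Simplified stand-in for a graph relation.
    static final class Rel {
        final String source;
        final String target;
        final boolean deletedByInference;

        Rel(String source, String target, boolean deletedByInference) {
            this.source = source;
            this.target = target;
            this.deletedByInference = deletedByInference;
        }
    }

    // Keep a relation only if it is not deleted by inference
    // and both of its endpoints are known entity ids.
    static List<Rel> clean(List<Rel> relations, Set<String> entityIds) {
        List<Rel> kept = new ArrayList<>();
        for (Rel r : relations) {
            boolean dangling = !entityIds.contains(r.source) || !entityIds.contains(r.target);
            if (!r.deletedByInference && !dangling) {
                kept.add(r);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> ids = new HashSet<>(Arrays.asList("A", "B", "C"));
        List<Rel> rels = Arrays.asList(
            new Rel("A", "B", false),  // kept
            new Rel("A", "C", true),   // dropped: deleted by inference
            new Rel("B", "X", false)); // dropped: target "X" is dangling
        for (Rel r : clean(rels, ids)) {
            System.out.println(r.source + " -> " + r.target); // prints "A -> B"
        }
    }
}
```

On a distributed dataset the membership check would more likely be expressed as joins against the set of valid entity ids than as a lookup in an in-memory set.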

giambattista.bloisi created branch cleanup_relations_after_dedup in D-Net/dnet-hadoop

2023-08-04 17:25:24 +02:00

giambattista.bloisi pushed to cleanup_relations_after_dedup at D-Net/dnet-hadoop

2023-08-04 17:25:24 +02:00

af49424b59 Add a "CleanRelation" action after the PropagateRelation to filter out all relations that have been deleted by inference or that are pointing to dangling entities

giambattista.bloisi created pull request D-Net/dnet-hadoop#327

2023-08-02 18:12:11 +02:00

WIP Changes in maven poms to build and test the project using Spark 3.4.x and scala 2.12

giambattista.bloisi pushed to spark34-integration at D-Net/dnet-hadoop

2023-08-02 18:06:16 +02:00

c13df9d6c3 Changes in maven poms to build and test the project using Spark 3.4.x and scala 2.12

giambattista.bloisi created branch spark34-integration in D-Net/dnet-hadoop

2023-08-02 18:06:15 +02:00

giambattista.bloisi commented on pull request D-Net/dnet-hadoop#320

2023-07-28 15:05:41 +02:00

Import affiliation relations from Crossref

This class can be removed by using the DataFrame API approach

giambattista.bloisi commented on pull request D-Net/dnet-hadoop#320

2023-07-28 15:05:41 +02:00

Import affiliation relations from Crossref

It is advisable to compress the output file here: with /data/bip-affiliations/data.json as the input, the total size of the output on disk drops from 50 GB to 1.5 GB.
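To show why line-oriented JSON shrinks so much, here is a small JDK-only sketch that gzips a synthetic JSON-lines payload and compares sizes. The record layout and field names are made up for illustration and are not taken from the pull request. In a Spark job, compression is normally enabled through a writer option (for example `compression` set to `gzip` on the JSON writer) rather than written by hand.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: illustrates gzip on repetitive JSON lines, not the project's code.
final class GzipJsonLinesSketch {

    // Compress a byte array with gzip.
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Decompress a gzip byte array back to the original bytes.
    static byte[] gunzip(byte[] compressed) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Build synthetic JSON lines; the field names are invented for this example.
    static byte[] sampleJsonLines(int rows) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < rows; i++) {
            sb.append("{\"source\":\"doi_").append(i)
              .append("\",\"target\":\"ror_").append(i % 50)
              .append("\",\"relClass\":\"hasAuthorInstitution\"}\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = sampleJsonLines(10000);
        byte[] packed = gzip(raw);
        System.out.println("raw bytes:  " + raw.length);
        System.out.println("gzip bytes: " + packed.length);
    }
}
```

The repeated keys and values on every line are what make the format compress well. The exact ratio depends on the data, so the figures reported in the comment should not be expected from this toy input.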

giambattista.bloisi suggested changes for D-Net/dnet-hadoop#320

2023-07-28 15:05:41 +02:00

Import affiliation relations from Crossref

Hi Serafeim,

giambattista.bloisi commented on pull request D-Net/dnet-hadoop#320

2023-07-28 15:05:41 +02:00

Import affiliation relations from Crossref

That class can be removed by using the DataFrame API approach

giambattista.bloisi commented on pull request D-Net/dnet-hadoop#320

2023-07-28 15:05:41 +02:00

Import affiliation relations from Crossref

AffiliationRelationDeserializer and AffiliationRelationModel are two classes that hold an intermediate representation of the data before it is written into the generated Relation(s). Both rely on Lombok annotations to generate a few methods automatically.

giambattista.bloisi created pull request D-Net/dnet-hadoop#324

2023-07-24 15:51:11 +02:00

Refactor Dedup using Spark Dataframe API, initial support for scala 2.12 and Spark 3.4

giambattista.bloisi pushed to dedup-with-dataframe-2 at D-Net/dnet-hadoop

2023-07-24 15:37:01 +02:00

e64c2854a3 Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface

giambattista.bloisi pushed to dedup-with-dataframe-2 at D-Net/dnet-hadoop

2023-07-24 11:14:19 +02:00

45ed6e6229 Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface

bb5b845e3c Use scala.binary.version property to resolve scala maven dependencies

002b24e06f Merge pull request '[graph cleaning] fixed regex behaviour for cleaning ROR and GRID identifiers, added tests' (#315) from pid_cleaning into beta

c754397a19 Merge branch 'beta' into pid_cleaning

f0678cda09 Merge pull request 'fix_beta_tests' (#323) from fix_beta_tests into beta


giambattista.bloisi pushed to dedup-with-dataframe-2 at D-Net/dnet-hadoop

2023-07-21 13:46:33 +02:00

b21a1107ae Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface

giambattista.bloisi pushed to dedup-with-dataframe-2 at D-Net/dnet-hadoop

2023-07-21 12:23:32 +02:00

587ca0e44d Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface

giambattista.bloisi pushed to fix_beta_tests at D-Net/dnet-hadoop

2023-07-21 10:48:58 +02:00

f03153823a Update testCitationRelations number of expected citations according to changes made in 0559d8b4 (monodirectional citations)

54c1eacef1 SparkJobTest was failing because the testing workingdir was not cleaned up after each test

5e15f20e6e Fix entityMerger that was excluding the authors of the first entity in the list to merge

0210a14e43 Ignore timestamp differences in PromoteActionPayloadForGraphTableJobTest
