Giambattista Bloisi giambattista.bloisi
  • Joined on 2023-06-30
giambattista.bloisi created branch dispatch_filter_invisible_entities in D-Net/dnet-hadoop 2023-08-09 15:46:25 +02:00
giambattista.bloisi deleted branch cleanup_relations_after_dedup from D-Net/dnet-hadoop 2023-08-09 15:45:51 +02:00
giambattista.bloisi pushed to cleanup_relations_after_dedup at D-Net/dnet-hadoop 2023-08-07 10:24:30 +02:00
97b6d1dc45 Filter ids by dataInfo.deletedbyinference and DataInfo.invisible flags
giambattista.bloisi created pull request D-Net/dnet-hadoop#328 2023-08-04 17:28:08 +02:00
Add a "CleanRelation" action after the PropagateRelation to filter out all relations that have been deleted by inference or that are pointing to dangling entities
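The filter described in this PR, combined with the id filter from commit 97b6d1dc45, can be modeled in plain Java. This is a minimal sketch under stated assumptions. The record types are simplified, hypothetical stand-ins for the project's entity, relation and DataInfo classes. The in-memory set lookup stands in for what the Spark job presumably does with joins. It is not the PR's actual implementation.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical, simplified stand-ins for the graph model: the real
// dnet-hadoop classes carry many more fields.
record DataInfo(boolean deletedbyinference, boolean invisible) {}

record Entity(String id, DataInfo dataInfo) {}

record Relation(String source, String target, DataInfo dataInfo) {}

class CleanRelationSketch {

    // An entity id is valid only if the entity is neither deleted by
    // inference (e.g. merged away by dedup) nor flagged as invisible.
    static Set<String> validIds(List<Entity> entities) {
        return entities.stream()
                .filter(e -> !e.dataInfo().deletedbyinference() && !e.dataInfo().invisible())
                .map(Entity::id)
                .collect(Collectors.toSet());
    }

    // Keep a relation only if it is not itself deleted by inference and
    // both endpoints point to valid entities, i.e. it is not dangling.
    static List<Relation> clean(List<Relation> relations, List<Entity> entities) {
        Set<String> valid = validIds(entities);
        return relations.stream()
                .filter(r -> !r.dataInfo().deletedbyinference())
                .filter(r -> valid.contains(r.source()) && valid.contains(r.target()))
                .collect(Collectors.toList());
    }
}
```

At graph scale the set of valid ids would likely not fit in memory. A Spark version would more plausibly express the two `contains` checks as left-semi joins against the valid ids, one on `source` and one on `target`.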
giambattista.bloisi created branch cleanup_relations_after_dedup in D-Net/dnet-hadoop 2023-08-04 17:25:24 +02:00
giambattista.bloisi pushed to cleanup_relations_after_dedup at D-Net/dnet-hadoop 2023-08-04 17:25:24 +02:00
af49424b59 Add a "CleanRelation" action after the PropagateRelation to filter out all relations that have been deleted by inference or that are pointing to dangling entities
giambattista.bloisi created pull request D-Net/dnet-hadoop#327 2023-08-02 18:12:11 +02:00
WIP Changes in maven poms to build and test the project using Spark 3.4.x and scala 2.12
giambattista.bloisi pushed to spark34-integration at D-Net/dnet-hadoop 2023-08-02 18:06:16 +02:00
c13df9d6c3 Changes in maven poms to build and test the project using Spark 3.4.x and scala 2.12
giambattista.bloisi created branch spark34-integration in D-Net/dnet-hadoop 2023-08-02 18:06:15 +02:00
giambattista.bloisi commented on pull request D-Net/dnet-hadoop#320 2023-07-28 15:05:41 +02:00
Import affiliation relations from Crossref

This class can be removed by using the DataFrame API approach.

giambattista.bloisi commented on pull request D-Net/dnet-hadoop#320 2023-07-28 15:05:41 +02:00
Import affiliation relations from Crossref

It is advisable to compress the output file here: with /data/bip-affiliations/data.json as input, the total disk size of the output is reduced from 50 GB to 1.5 GB.
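Line-delimited JSON like this output compresses very well because every record repeats the same keys and structure. The plain-Java sketch below illustrates the effect with `java.util.zip.GZIPOutputStream` on synthetic records. The field names and values are made up, and the measured ratio says nothing about the real dataset. In the Spark job itself, compression would come from the writer rather than from hand-written code. Examples are `.option("compression", "gzip")` on a `DataFrameWriter` writing JSON, or `saveAsTextFile(path, GzipCodec.class)` for an RDD of JSON strings.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

class GzipJsonSketch {

    // Build n newline-delimited JSON records; all values are synthetic.
    static String sampleJsonLines(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append("{\"doi\":\"10.1234/example.").append(i)
              .append("\",\"ror\":\"https://ror.org/0example\",\"confidence\":0.9}\n");
        }
        return sb.toString();
    }

    // Gzip the UTF-8 bytes of the given text and return the compressed bytes.
    // Closing the stream (via try-with-resources) writes the gzip trailer.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }
}
```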

giambattista.bloisi suggested changes for D-Net/dnet-hadoop#320 2023-07-28 15:05:41 +02:00
Import affiliation relations from Crossref

Hi Serafeim,

giambattista.bloisi commented on pull request D-Net/dnet-hadoop#320 2023-07-28 15:05:41 +02:00
Import affiliation relations from Crossref

That class can be removed by using the DataFrame API approach.

giambattista.bloisi commented on pull request D-Net/dnet-hadoop#320 2023-07-28 15:05:41 +02:00
Import affiliation relations from Crossref

AffiliationRelationDeserializer and AffiliationRelationModel are two classes used to store an intermediate representation of the data that is eventually put into the generated Relation(s). Both classes rely on Lombok annotations to have a few methods generated automatically.

giambattista.bloisi created pull request D-Net/dnet-hadoop#324 2023-07-24 15:51:11 +02:00
Refactor Dedup using Spark Dataframe API, initial support for scala 2.12 and Spark 3.4
giambattista.bloisi pushed to dedup-with-dataframe-2 at D-Net/dnet-hadoop 2023-07-24 15:37:01 +02:00
e64c2854a3 Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface
giambattista.bloisi pushed to dedup-with-dataframe-2 at D-Net/dnet-hadoop 2023-07-24 11:14:19 +02:00
45ed6e6229 Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface
bb5b845e3c Use scala.binary.version property to resolve scala maven dependencies
002b24e06f Merge pull request '[graph cleaning] fixed regex behaviour for cleaning ROR and GRID identifiers, added tests' (#315) from pid_cleaning into beta
c754397a19 Merge branch 'beta' into pid_cleaning
f0678cda09 Merge pull request 'fix_beta_tests' (#323) from fix_beta_tests into beta
13 commits in this push.
giambattista.bloisi pushed to dedup-with-dataframe-2 at D-Net/dnet-hadoop 2023-07-21 13:46:33 +02:00
b21a1107ae Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface
giambattista.bloisi pushed to dedup-with-dataframe-2 at D-Net/dnet-hadoop 2023-07-21 12:23:32 +02:00
587ca0e44d Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface
giambattista.bloisi pushed to fix_beta_tests at D-Net/dnet-hadoop 2023-07-21 10:48:58 +02:00
f03153823a Update the number of expected citations in testCitationRelations according to the changes made in 0559d8b4 (monodirectional citations)
54c1eacef1 SparkJobTest was failing because the test working dir was not cleaned up after each test
5e15f20e6e Fix entityMerger that was excluding the authors of the first entity in the list to merge
0210a14e43 Ignore timestamp differences in PromoteActionPayloadForGraphTableJobTest
4 commits in this push.
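The fix in 54c1eacef1 amounts to deleting the test working directory after each test so that leftovers from one test cannot affect the next. Below is a minimal plain-Java sketch of such a cleanup. It assumes a JUnit 5 `@AfterEach` method (not shown) calls it with the test's working dir. The class and method names are hypothetical and not taken from SparkJobTest.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

class WorkingDirCleanup {

    // Recursively delete a directory tree. Sorting the walked paths in
    // reverse order visits children before their parents, so every
    // directory is already empty when Files.delete reaches it.
    static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return; // nothing to clean up
        }
        try (Stream<Path> paths = Files.walk(root)) {
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (UncheckedIOException e) {
            throw e.getCause(); // surface the original IOException
        }
    }
}
```

Calling the helper from an `@AfterEach` method rather than from `@BeforeEach` keeps each test responsible for its own mess. JUnit 5's `@TempDir` is an alternative that lets the framework create and remove the directory.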