Commit Graph

19 Commits

Author SHA1 Message Date
Giambattista Bloisi e64c2854a3 Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface
JsonPath cache contention fixed by using a ConcurrentHashMap
Blacklist filtering performance improvement
Minor performance improvements when evaluating similarity
Sorting in clustered elements is deterministic (by ordering and identity field, instead of ordering field only)
2023-07-24 15:36:24 +02:00
Sandro La Bruzzo 2b9a20a4a3 Changed the way Scholexplorer filter the relationships, I found that filter all relation coming from openCitation is wrong, because we loose a lot of relation than intersect OpenCitation, but they don't come only from there 2022-10-24 12:53:47 +02:00
Claudio Atzori 2aa16d0432 [scholix] fixed OpenCitation dump procedure 2022-08-10 17:39:29 +02:00
Claudio Atzori 51ad93e545 [scholix] fixed OpenCitation dump procedure 2022-08-10 11:57:56 +02:00
Sandro La Bruzzo 5f651f2316 changed filter relation on SubRelType 2022-07-21 10:11:48 +02:00
Sandro La Bruzzo 5b76321d9c implemented oozie workflow to generate scholix dump filtering relclass semantic 2022-07-20 16:34:32 +02:00
Miriam Baglioni e4eac1d20b [EOSC TAG] added code to remove EOSC Jupyter Notebook from subjects and put EOSC as classid in the qualifier 2022-05-13 11:01:33 +02:00
Sandro La Bruzzo ca8d26bcb4 added better filter for openCitations 2022-05-11 15:29:57 +02:00
Miriam Baglioni 2c5087d55a [HostedByMap] download of doaj from json, modification of test resources, deletion of class no more needed for the CSV download 2022-03-04 15:18:21 +01:00
Miriam Baglioni 5d608d6291 [HostedByMap] changed the model to include also oaStart date and review process that could be possibly used in the future 2022-03-04 11:06:09 +01:00
Sandro La Bruzzo 57e2c4b749 formatted code 2022-01-12 09:40:28 +01:00
Claudio Atzori 4f212652ca scalafmt: code formatting 2022-01-11 16:57:48 +01:00
Claudio Atzori 908294d86e OAF-store-graph mdstores: firther fix for PR#180 2022-01-05 15:49:05 +01:00
Claudio Atzori a6977197b3 serialise records in the OAF-store-graph mdstores in json format. Read them again in the graph construction phase using a tolerant parser to support backward compatible changes in the evolution of the schema 2022-01-03 17:25:26 +01:00
Sandro La Bruzzo 3920d68992 Fixed workflow generation of delta in datacite 2021-12-21 11:41:49 +01:00
Sandro La Bruzzo b881ee5ef8 [scholexplorer]
- implemented generation of scholix of delta update of datacite
2021-12-15 11:25:32 +01:00
Sandro La Bruzzo 63952018c0 [scholexplorer]
-moved SparkRetrieveDataciteDelta in scala folder
2021-12-15 11:25:32 +01:00
Sandro La Bruzzo e5bff64f2e [scholexplorer]
- Minor fix on SparkConvertRDDtoDataset
-first implementation of retrieve datacite dump
2021-12-15 11:25:32 +01:00
Sandro La Bruzzo bf880e2508 [scala-refactor] Module dhp-graph-mapper:
Moved all scala source into src/main/scala and src/test/scala
2021-12-06 13:57:41 +01:00