Commit Graph

252 Commits

Author SHA1 Message Date
Claudio Atzori 1763d377ad code formatting 2023-11-23 16:33:24 +01:00
Miriam Baglioni b177cd5a0a Project propagation via communityAPI instead of using IS via IIS 2023-11-14 16:25:09 +01:00
Claudio Atzori 5f1ed61c1f merging from bulkTag branch 2023-11-03 12:51:37 +01:00
Claudio Atzori 8c03c41d5d applying changes from beta 2023-11-03 12:08:39 +01:00
Miriam Baglioni 0097f4e64b Removed Query community testing. Removed package from common related to the interaction with Zenodo since it was moved to the dump-project 2023-10-26 09:38:09 +02:00
Miriam Baglioni 5c5a195e97 refactoring and fixing issue on property name 2023-10-23 11:26:17 +02:00
Miriam Baglioni 70b78a40c7 removed file from different propagation 2023-10-20 15:50:49 +02:00
Miriam Baglioni f206ff42d6 modified code to use the API. Removing not needed parameters. Rewritten the code to exploit the parallel stream on the entity types 2023-10-20 15:49:41 +02:00
Miriam Baglioni 34358afe75 modified resource file, workflow and default-config. Add 3g of memory Overhead and specified the shuffle partition in the wf configuration. Removed the multiple instantiation in the wf because of different implementation of the spark job 2023-10-20 15:48:27 +02:00
Miriam Baglioni 18bfff8af3 adding test classes and modifying test for bulktag 2023-10-20 15:47:03 +02:00
Miriam Baglioni 69dac91659 adding the new code to use the API instead of the Information Service 2023-10-20 15:45:52 +02:00
Claudio Atzori b0fed1725e avoid NPEs 2023-10-19 12:13:45 +02:00
Claudio Atzori da0e9828f7 resolved conflicts for PR#337 2023-09-06 11:28:46 +02:00
Giambattista Bloisi e64c2854a3 Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface 2023-07-24 15:36:24 +02:00
    JsonPath cache contention fixed by using a ConcurrentHashMap
    Blacklist filtering performance improvement
    Minor performance improvements when evaluating similarity
    Sorting in clustered elements is deterministic (by ordering and identity field, instead of ordering field only)
Giambattista Bloisi bb5b845e3c Use scala.binary.version property to resolve scala maven dependencies 2023-07-24 11:13:48 +02:00
    Ensure consistent usage of maven properties
    Profile for compiling with scala 2.12 and Spark 3.4
Giambattista Bloisi 54c1eacef1 SparkJobTest was failing because testing workingdir was not cleaned up after each test 2023-07-21 10:42:24 +02:00
Claudio Atzori f3a85e224b merged from branch beta the bulk tagging (single step, negative constraints), the cleaning workflow (single step, pid type based cleaning), instance level fulltext 2023-06-28 13:33:57 +02:00
Miriam Baglioni 2717edafb7 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2023-06-28 11:25:14 +02:00
Miriam Baglioni 2f04c9d149 [BulkTagging] fixing left over for test 2023-06-28 11:24:42 +02:00
Claudio Atzori 50d7dc0078 [graph enrichment] fixed projectOrganizationPath not being passed to the apply_resulttoorganization_propagation node 2023-06-19 15:42:44 +02:00
Claudio Atzori fbd9bf704e indent 2023-06-19 15:41:22 +02:00
Claudio Atzori 55f002f1e9 Merge branch 'beta' into propagationProjectThroughParentChils 2023-06-12 09:56:53 +02:00
Miriam Baglioni daf4d7971b refactoring 2023-05-31 18:56:58 +02:00
Miriam Baglioni 97d72d41c3 finalization of implementation and testing 2023-05-31 18:53:22 +02:00
Miriam Baglioni 0389b57ca7 added propagation for project to organization 2023-05-31 11:06:58 +02:00
Miriam Baglioni 9097e71853 Added assertion in test 2023-05-24 16:30:53 +02:00
Miriam Baglioni 9567c13bc3 refactoring 2023-05-24 16:20:05 +02:00
Miriam Baglioni 34172455d1 [BulkTag] Adding remove constraints to specify when a community must not appear in the context of a result. 2023-05-24 09:56:23 +02:00
Miriam Baglioni 8c05f49665 moved the version as it was before the change 2023-05-09 10:48:34 +02:00
Claudio Atzori abd7ca0c18 Merge branch 'beta' into bulkTagRefactor 2023-05-02 10:50:01 +02:00
Claudio Atzori de11edca98 Merge branch 'beta' into organizationToRepresentative 2023-05-02 09:59:41 +02:00
Miriam Baglioni efc4f6a658 [bulkTag] refactor to enrich each result single step 2023-04-18 17:39:31 +02:00
Miriam Baglioni 697a134504 - 2023-04-18 10:21:12 +02:00
Miriam Baglioni 6cc95c96a2 - 2023-04-18 09:53:11 +02:00
Miriam Baglioni 932d07d2dd [bulkTag] added filtering for datasources in eosctag 2023-04-06 15:08:27 +02:00
Miriam Baglioni c6a7602b3e refactoring after compilation 2023-04-06 14:45:01 +02:00
Miriam Baglioni 831055a1fc change of the property for test purposes, addition of two new verbs, and fix of issue for advanced constraints 2023-04-06 14:41:32 +02:00
Miriam Baglioni 287753417d better implementation for the fix 2023-04-06 12:22:38 +02:00
Miriam Baglioni cf3d0f4f83 fixed issue on bulktagging for the advanced constraints 2023-04-06 12:17:35 +02:00
Miriam Baglioni b42abc9904 fixed issue on bulktagging for the advanced constraints 2023-04-06 12:15:00 +02:00
Miriam Baglioni ecc05fe0f3 Added the code for the advancedConstraint implementation during the bulkTagging 2023-04-05 16:40:29 +02:00
Miriam Baglioni b25b401065 added test to verify the advconstraints to dth community. inserted some additional logs. 2023-04-05 12:18:39 +02:00
Claudio Atzori d05ca53a14 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2023-01-31 14:39:53 +01:00
Miriam Baglioni e82e009b46 added missing close tag for XML produced by the xquery to get information for the community from the IS 2023-01-31 10:19:34 +01:00
Miriam Baglioni b254a0375f [Affiliation from institutionalrepo] changed the field to check to verify the datasource type. Now it is in the field jurisdiction 2023-01-26 16:51:20 +01:00
Claudio Atzori 505867bce9 [bulk tagging] better node naming 2023-01-20 16:13:16 +01:00
Claudio Atzori 1b37516578 [bulk tagging] better node naming 2023-01-20 16:11:26 +01:00
Miriam Baglioni ecd398fe51 refactoring 2023-01-20 14:23:45 +01:00
Claudio Atzori 3800361033 [country propagation] fixes error 'cannot resolve countrySet given input columns: []' when there is no prepared information driving the propagation process for a given result type 2023-01-19 15:57:43 +01:00
Miriam Baglioni 8893389895 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2022-12-21 12:42:27 +01:00