Commit Graph

323 Commits

Author SHA1 Message Date
Miriam Baglioni 336fb31d87 [community_result_propagation] adjusting starting point of workflow 2023-12-07 10:27:25 +01:00
Miriam Baglioni c0cde53bf6 [bulktagging] setting first step of bulktagging as the copy of the entities and relations not involved in the tagging 2023-12-07 10:08:35 +01:00
Miriam Baglioni 616622d2bb first version of the workflow single step 2023-12-07 09:59:52 +01:00
Claudio Atzori c5b7253130 [community_organization propagation] fixed workflow parameters 2023-12-05 09:13:33 +01:00
Claudio Atzori 3c3bdb8318 [bulktagging] fixed workflow parameters 2023-12-05 09:08:48 +01:00
Miriam Baglioni 8eb70e6657 refactoring 2023-11-27 15:13:15 +01:00
Miriam Baglioni 48e0427a23 changed the parameter from production to baseURL. Fixed issue in tagging configuration 2023-11-27 15:10:27 +01:00
Claudio Atzori 1763d377ad code formatting 2023-11-23 16:33:24 +01:00
Miriam Baglioni b177cd5a0a Project propagation via communityAPI instead of using IS via IIS 2023-11-14 16:25:09 +01:00
Miriam Baglioni eaf0a702de - 2023-11-14 14:53:34 +01:00
Miriam Baglioni 7b1e34f159 refactoring 2023-11-03 15:30:01 +01:00
Miriam Baglioni 638ad9e74f changing test for new implementation 2023-11-03 15:06:50 +01:00
Miriam Baglioni edcb17ca98 refactoring and test 2023-11-03 13:01:14 +01:00
Claudio Atzori 5f1ed61c1f merging from bulkTag branch 2023-11-03 12:51:37 +01:00
Claudio Atzori 8c03c41d5d applying changes from beta 2023-11-03 12:08:39 +01:00
Miriam Baglioni 937ff6a7c7 - 2023-10-31 15:56:08 +01:00
Miriam Baglioni a737dd47b6 removed not needed test class 2023-10-31 15:54:49 +01:00
Miriam Baglioni c80b768af0 test for project propagation 2023-10-31 15:49:42 +01:00
Miriam Baglioni e9a20fc8f6 merging with branch beta 2023-10-31 14:36:03 +01:00
Miriam Baglioni 0097f4e64b Removed Query community testing. Removed package from common related to the interaction with Zenodo since it was moved to the dump-project 2023-10-26 09:38:09 +02:00
Miriam Baglioni 5c5a195e97 refactoring and fixing issue on property name 2023-10-23 11:26:17 +02:00
Miriam Baglioni 70b78a40c7 removed file from different propagation 2023-10-20 15:50:49 +02:00
Miriam Baglioni f206ff42d6 modified code to use the API. Removing not needed parameters. Rewritten the code to exploit the parallel stream on the entity types 2023-10-20 15:49:41 +02:00
Miriam Baglioni 34358afe75 modified resource file, workflow and default-config. Add 3g of memory Overhead and specified the shuffle partition in the wf configuration. Removed the multiple instantiation in the wf because of different implementation of the spark job 2023-10-20 15:48:27 +02:00
Miriam Baglioni 18bfff8af3 adding test classes and modifying test for bulktag 2023-10-20 15:47:03 +02:00
Miriam Baglioni 69dac91659 adding the new code to use the API instead of the Information Service 2023-10-20 15:45:52 +02:00
Miriam Baglioni a4214ced1e fixing issue on propagation organization. added --config to workflow definition. added oozie_app to community project 2023-10-20 10:14:20 +02:00
Claudio Atzori b0fed1725e avoid NPEs 2023-10-19 12:13:45 +02:00
Sandro La Bruzzo a5a89a702f new spark parameter updated 2023-10-16 11:46:12 +02:00
Miriam Baglioni 159388f9c2 testing and fix some issues 2023-10-16 11:26:07 +02:00
Miriam Baglioni 89184d5b4f used the API instead of the IS for bulktagging and propagation for community through organization. Added a new propagation step for communities through projects. Still using the API and not the IS 2023-10-11 18:17:35 +02:00
Miriam Baglioni a3d01ccb24 refactoring 2023-10-09 14:52:17 +02:00
Miriam Baglioni 3d6be20989 changes to use the API instead of the IS to get the information for the communities to be used during bulktagging and context propagation 2023-10-09 14:26:33 +02:00
Claudio Atzori da0e9828f7 resolved conflicts for PR#337 2023-09-06 11:28:46 +02:00
Giambattista Bloisi e64c2854a3 Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface; JsonPath cache contention fixed by using a ConcurrentHashMap; Blacklist filtering performance improvement; Minor performance improvements when evaluating similarity; Sorting in clustered elements is deterministic (by ordering and identity field, instead of ordering field only) 2023-07-24 15:36:24 +02:00
Giambattista Bloisi bb5b845e3c Use scala.binary.version property to resolve scala maven dependencies; Ensure consistent usage of maven properties; Profile for compiling with scala 2.12 and Spark 3.4 2023-07-24 11:13:48 +02:00
Giambattista Bloisi 54c1eacef1 SparkJobTest was failing because testing workingdir was not cleaned up after each test 2023-07-21 10:42:24 +02:00
Claudio Atzori f3a85e224b merged from branch beta the bulk tagging (single step, negative constraints), the cleaning workflow (single step, pid type based cleaning), instance level fulltext 2023-06-28 13:33:57 +02:00
Miriam Baglioni 2717edafb7 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2023-06-28 11:25:14 +02:00
Miriam Baglioni 2f04c9d149 [BulkTagging] fixing left over for test 2023-06-28 11:24:42 +02:00
Claudio Atzori 50d7dc0078 [graph enrichment] fixed projectOrganizationPath not being passed to the apply_resulttoorganization_propagation node 2023-06-19 15:42:44 +02:00
Claudio Atzori fbd9bf704e indent 2023-06-19 15:41:22 +02:00
Claudio Atzori 55f002f1e9 Merge branch 'beta' into propagationProjectThroughParentChils 2023-06-12 09:56:53 +02:00
Miriam Baglioni daf4d7971b refactoring 2023-05-31 18:56:58 +02:00
Miriam Baglioni 97d72d41c3 finalization of implementation and testing 2023-05-31 18:53:22 +02:00
Miriam Baglioni 0389b57ca7 added propagation for project to organization 2023-05-31 11:06:58 +02:00
Miriam Baglioni 9097e71853 Added assertion in test 2023-05-24 16:30:53 +02:00
Miriam Baglioni 9567c13bc3 refactoring 2023-05-24 16:20:05 +02:00
Miriam Baglioni 34172455d1 [BulkTag] Adding remove constraints to specify when a community must not appear in the context of a result. 2023-05-24 09:56:23 +02:00
Miriam Baglioni 8c05f49665 moved the version as it was before the change 2023-05-09 10:48:34 +02:00
Claudio Atzori abd7ca0c18 Merge branch 'beta' into bulkTagRefactor 2023-05-02 10:50:01 +02:00
Claudio Atzori de11edca98 Merge branch 'beta' into organizationToRepresentative 2023-05-02 09:59:41 +02:00
Miriam Baglioni efc4f6a658 [bulkTag] refactor to enrich each result single step 2023-04-18 17:39:31 +02:00
Miriam Baglioni 697a134504 - 2023-04-18 10:21:12 +02:00
Miriam Baglioni 6cc95c96a2 - 2023-04-18 09:53:11 +02:00
Miriam Baglioni 932d07d2dd [bulkTag] added filtering for datasources in eosctag 2023-04-06 15:08:27 +02:00
Miriam Baglioni c6a7602b3e refactoring after compilation 2023-04-06 14:45:01 +02:00
Miriam Baglioni 831055a1fc change of the property for test purposes, addition of two new verbs, and fix of issue for advanced constraints 2023-04-06 14:41:32 +02:00
Miriam Baglioni 287753417d better implementation for the fix 2023-04-06 12:22:38 +02:00
Miriam Baglioni cf3d0f4f83 fixed issue on bulktagging for the advanced constraints 2023-04-06 12:17:35 +02:00
Miriam Baglioni b42abc9904 fixed issue on bulktagging for the advanced constraints 2023-04-06 12:15:00 +02:00
Miriam Baglioni ecc05fe0f3 Added the code for the advancedConstraint implementation during the bulkTagging 2023-04-05 16:40:29 +02:00
Miriam Baglioni b25b401065 added test to verify the advconstraints to dth community. inserted some additional logs. 2023-04-05 12:18:39 +02:00
Claudio Atzori d05ca53a14 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2023-01-31 14:39:53 +01:00
Miriam Baglioni e82e009b46 added missing close tag for XML produced by the xquery to get information for the community from the IS 2023-01-31 10:19:34 +01:00
Miriam Baglioni b254a0375f [Affiliation from institutionalrepo] changed the field to check to verify the datasource type. Now it is in the field jurisdiction 2023-01-26 16:51:20 +01:00
Claudio Atzori 505867bce9 [bulk tagging] better node naming 2023-01-20 16:13:16 +01:00
Claudio Atzori 1b37516578 [bulk tagging] better node naming 2023-01-20 16:11:26 +01:00
Miriam Baglioni ecd398fe51 refactoring 2023-01-20 14:23:45 +01:00
Claudio Atzori 3800361033 [country propagation] fixes error 'cannot resolve countrySet given input columns: []' when there is no prepared information driving the propagation process for a given result type 2023-01-19 15:57:43 +01:00
Miriam Baglioni 8893389895 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2022-12-21 12:42:27 +01:00
Claudio Atzori 6aa91204a5 [orcid propagation] skip empty directories 2022-12-20 14:15:46 +01:00
Claudio Atzori 5816ded93f code formatting 2022-12-20 10:41:40 +01:00
Claudio Atzori 46972f8393 [orcid propagation] skip empty directory 2022-12-20 10:28:22 +01:00
Miriam Baglioni 6674cccb94 [BulkTag] description of parameters more comprehensive for those who do not implement it 2022-12-16 15:33:20 +01:00
Miriam Baglioni f37113a941 [BulkTag] moving xquery to get community configuration in dedicated file 2022-12-16 15:32:26 +01:00
Miriam Baglioni 3d99b78d94 [Cleaning] fixed error in parameter (workingPath to workingDir) 2022-12-08 10:25:02 +01:00
Claudio Atzori 1b8488976b code formatting 2022-12-07 10:45:38 +01:00
Claudio Atzori cd1b58483e [bulk tag] fixed Community configuration parsing to avoid NPE 2022-12-07 10:39:00 +01:00
Miriam Baglioni bb0ddc1c44 [BulkTag] adding verb starts_with 2022-11-30 09:56:24 +01:00
Miriam Baglioni 9c70c5dbd6 [Bulk Tag horizontal] added new path in definition of constraint (to recognize fos subjects) - changed test and resource class to test this new aspect 2022-11-28 14:51:20 +01:00
Miriam Baglioni 0628df7a3a resolving conflicts 2022-11-28 10:44:56 +01:00
Miriam Baglioni 33a2b1b5dc [Bulk Tag] fixed typo in test configuration 2022-11-23 11:31:17 +01:00
Miriam Baglioni c6df8327b3 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2022-11-23 11:26:57 +01:00
Miriam Baglioni 0e3edc5018 [Bulk Tag] fixed issue in verb name 2022-11-23 11:26:36 +01:00
Miriam Baglioni 935aa367d8 [BulkTag] removed commented code 2022-11-23 11:16:39 +01:00
Miriam Baglioni 43aedbdfe5 [BulkTag] changed verb name in configuration 2022-11-23 11:14:23 +01:00
Miriam Baglioni b6da9b67ff [BulkTag] fixed typo in annotation for verb name 2022-11-23 11:13:58 +01:00
Claudio Atzori a34c8b6f81 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2022-11-22 10:22:31 +01:00
Miriam Baglioni 122e75aa17 fixed conflicts 2022-11-21 18:13:12 +01:00
Miriam Baglioni cee7a45b1d [Bulk Tag Datasource] fixed issue with verb name and add new test for neanias selection for orcid 2022-11-21 18:10:20 +01:00
Claudio Atzori ed64618235 increased spark.sql.shuffle.partitions in the last join phase of the result (publication) to community through semantic relation propagation 2022-11-18 16:06:51 +01:00
Claudio Atzori 8742934843 added spark.sql.shuffle.partitions in the last join phase of the result to community through semantic relation propagation 2022-11-18 11:32:22 +01:00
Claudio Atzori 13cc592f39 code formatting 2022-11-15 09:37:57 +01:00
Claudio Atzori af15b1e48d [eosc tag] extending criteria for Jupyter Notebook (adding to ORP the same constraint) 2022-11-14 18:30:43 +01:00
Miriam Baglioni 5f9383b2d9 [EOSC TAG] remove redundant check for jupyter notebook 2022-11-11 14:06:19 +01:00
Miriam Baglioni b18bbca8af [EOSC TAG] adding search in orp for jupyter notebook criteria 2022-11-11 12:42:58 +01:00
Claudio Atzori bca4a61710 suppressing hyper verbose spark logs during unit test execution 2022-10-19 15:20:58 +02:00
Miriam Baglioni a653e1b3ea [Enrichment - result to community through organization] reimplementation of the data preparation step using spark 2022-10-04 15:01:28 +02:00
Miriam Baglioni f1d7d45cf7 [BulkTag] fixed issue 2022-09-28 12:01:43 +02:00