Miriam Baglioni
336fb31d87
[community_result_propagation] adjusting starting point of workflow
2023-12-07 10:27:25 +01:00
Miriam Baglioni
c0cde53bf6
[bulktagging] setting first step of bulktagging as the copy of the entities and relations not involved in the tagging
2023-12-07 10:08:35 +01:00
Miriam Baglioni
616622d2bb
first version of the workflow single step
2023-12-07 09:59:52 +01:00
Claudio Atzori
c5b7253130
[community_organization propagation] fixed workflow parameters
2023-12-05 09:13:33 +01:00
Claudio Atzori
3c3bdb8318
[bulktagging] fixed workflow parameters
2023-12-05 09:08:48 +01:00
Miriam Baglioni
8eb70e6657
refactoring
2023-11-27 15:13:15 +01:00
Miriam Baglioni
48e0427a23
changed the parameter from production to baseURL. Fixed issue in tagging configuration
2023-11-27 15:10:27 +01:00
Claudio Atzori
1763d377ad
code formatting
2023-11-23 16:33:24 +01:00
Miriam Baglioni
b177cd5a0a
Project propagation via communityAPI instead of using IS via IIS
2023-11-14 16:25:09 +01:00
Miriam Baglioni
eaf0a702de
-
2023-11-14 14:53:34 +01:00
Miriam Baglioni
7b1e34f159
refactoring
2023-11-03 15:30:01 +01:00
Miriam Baglioni
638ad9e74f
changing test for new implementation
2023-11-03 15:06:50 +01:00
Miriam Baglioni
edcb17ca98
refactoring and test
2023-11-03 13:01:14 +01:00
Claudio Atzori
5f1ed61c1f
merging from bulkTag branch
2023-11-03 12:51:37 +01:00
Claudio Atzori
8c03c41d5d
applying changes from beta
2023-11-03 12:08:39 +01:00
Miriam Baglioni
937ff6a7c7
-
2023-10-31 15:56:08 +01:00
Miriam Baglioni
a737dd47b6
removed not needed test class
2023-10-31 15:54:49 +01:00
Miriam Baglioni
c80b768af0
test for project propagation
2023-10-31 15:49:42 +01:00
Miriam Baglioni
e9a20fc8f6
merging with branch beta
2023-10-31 14:36:03 +01:00
Miriam Baglioni
0097f4e64b
Removed Query community testing. Removed package from common related to the interaction with Zenodo since it was moved to the dump-project
2023-10-26 09:38:09 +02:00
Miriam Baglioni
5c5a195e97
refactoring and fixing issue on property name
2023-10-23 11:26:17 +02:00
Miriam Baglioni
70b78a40c7
removed file from different propagation
2023-10-20 15:50:49 +02:00
Miriam Baglioni
f206ff42d6
modified code to use the API. Removed unneeded parameters. Rewrote the code to exploit the parallel stream on the entity types
2023-10-20 15:49:41 +02:00
Miriam Baglioni
34358afe75
modified resource file, workflow and default-config. Added 3g of memory overhead and specified the shuffle partition in the wf configuration. Removed the multiple instantiation in the wf because of different implementation of the spark job
2023-10-20 15:48:27 +02:00
Miriam Baglioni
18bfff8af3
adding test classes and modifying test for bulktag
2023-10-20 15:47:03 +02:00
Miriam Baglioni
69dac91659
adding the new code to use the API instead of the Information Service
2023-10-20 15:45:52 +02:00
Miriam Baglioni
a4214ced1e
fixing issue on propagation organization. added --config to workflow definition. added oozie_app to community project
2023-10-20 10:14:20 +02:00
Claudio Atzori
b0fed1725e
avoid NPEs
2023-10-19 12:13:45 +02:00
Sandro La Bruzzo
a5a89a702f
new spark parameter updated
2023-10-16 11:46:12 +02:00
Miriam Baglioni
159388f9c2
testing and fix some issues
2023-10-16 11:26:07 +02:00
Miriam Baglioni
89184d5b4f
used the API instead of the IS for bulktagging and propagation for community through organization. Added a new propagation step for communities through projects. Still using the API and not the IS
2023-10-11 18:17:35 +02:00
Miriam Baglioni
a3d01ccb24
refactoring
2023-10-09 14:52:17 +02:00
Miriam Baglioni
3d6be20989
changes to use the API instead of the IS to get the information for the communities to be used during bulktagging and context propagation
2023-10-09 14:26:33 +02:00
Claudio Atzori
da0e9828f7
resolved conflicts for PR#337
2023-09-06 11:28:46 +02:00
Giambattista Bloisi
e64c2854a3
Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface
JsonPath cache contention fixed by using a ConcurrentHashMap
Blacklist filtering performance improvement
Minor performance improvements when evaluating similarity
Sorting in clustered elements is deterministic (by ordering and identity field, instead of ordering field only)
2023-07-24 15:36:24 +02:00
Giambattista Bloisi
bb5b845e3c
Use scala.binary.version property to resolve scala maven dependencies
Ensure consistent usage of maven properties
Profile for compiling with scala 2.12 and Spark 3.4
2023-07-24 11:13:48 +02:00
Giambattista Bloisi
54c1eacef1
SparkJobTest was failing because testing workingdir was not cleaned up after each test
2023-07-21 10:42:24 +02:00
Claudio Atzori
f3a85e224b
merged from branch beta the bulk tagging (single step, negative constraints), the cleaning workflow (single step, pid type based cleaning), instance level fulltext
2023-06-28 13:33:57 +02:00
Miriam Baglioni
2717edafb7
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2023-06-28 11:25:14 +02:00
Miriam Baglioni
2f04c9d149
[BulkTagging] fixing left over for test
2023-06-28 11:24:42 +02:00
Claudio Atzori
50d7dc0078
[graph enrichment] fixed projectOrganizationPath not being passed to the apply_resulttoorganization_propagation node
2023-06-19 15:42:44 +02:00
Claudio Atzori
fbd9bf704e
indent
2023-06-19 15:41:22 +02:00
Claudio Atzori
55f002f1e9
Merge branch 'beta' into propagationProjectThroughParentChils
2023-06-12 09:56:53 +02:00
Miriam Baglioni
daf4d7971b
refactoring
2023-05-31 18:56:58 +02:00
Miriam Baglioni
97d72d41c3
finalization of implementation and testing
2023-05-31 18:53:22 +02:00
Miriam Baglioni
0389b57ca7
added propagation for project to organization
2023-05-31 11:06:58 +02:00
Miriam Baglioni
9097e71853
Added assertion in test
2023-05-24 16:30:53 +02:00
Miriam Baglioni
9567c13bc3
refactoring
2023-05-24 16:20:05 +02:00
Miriam Baglioni
34172455d1
[BulkTag] Adding remove constraints to specify when a community must not appear in the context of a result.
2023-05-24 09:56:23 +02:00
Miriam Baglioni
8c05f49665
moved the version as it was before the change
2023-05-09 10:48:34 +02:00
Claudio Atzori
abd7ca0c18
Merge branch 'beta' into bulkTagRefactor
2023-05-02 10:50:01 +02:00
Claudio Atzori
de11edca98
Merge branch 'beta' into organizationToRepresentative
2023-05-02 09:59:41 +02:00
Miriam Baglioni
efc4f6a658
[bulkTag] refactor to enrich each result single step
2023-04-18 17:39:31 +02:00
Miriam Baglioni
697a134504
-
2023-04-18 10:21:12 +02:00
Miriam Baglioni
6cc95c96a2
-
2023-04-18 09:53:11 +02:00
Miriam Baglioni
932d07d2dd
[bulkTag] added filtering for datasources in eosctag
2023-04-06 15:08:27 +02:00
Miriam Baglioni
c6a7602b3e
refactoring after compilation
2023-04-06 14:45:01 +02:00
Miriam Baglioni
831055a1fc
change of the property for test purposes, addition of two new verbs, and fix of issue for advanced constraints
2023-04-06 14:41:32 +02:00
Miriam Baglioni
287753417d
better implementation for the fix
2023-04-06 12:22:38 +02:00
Miriam Baglioni
cf3d0f4f83
fixed issue on bulktagging for the advanced constraints
2023-04-06 12:17:35 +02:00
Miriam Baglioni
b42abc9904
fixed issue on bulktagging for the advanced constraints
2023-04-06 12:15:00 +02:00
Miriam Baglioni
ecc05fe0f3
Added the code for the advancedConstraint implementation during the bulkTagging
2023-04-05 16:40:29 +02:00
Miriam Baglioni
b25b401065
added test to verify the advconstraints to dth community. inserted some additional logs.
2023-04-05 12:18:39 +02:00
Claudio Atzori
d05ca53a14
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2023-01-31 14:39:53 +01:00
Miriam Baglioni
e82e009b46
added missing close tag for XML produced by the xquery to get information for the community from the IS
2023-01-31 10:19:34 +01:00
Miriam Baglioni
b254a0375f
[Affiliation from institutionalrepo] changed the field to check to verify the datasource type. Now it is in the field jurisdiction
2023-01-26 16:51:20 +01:00
Claudio Atzori
505867bce9
[bulk tagging] better node naming
2023-01-20 16:13:16 +01:00
Claudio Atzori
1b37516578
[bulk tagging] better node naming
2023-01-20 16:11:26 +01:00
Miriam Baglioni
ecd398fe51
refactoring
2023-01-20 14:23:45 +01:00
Claudio Atzori
3800361033
[country propagation] fixes error 'cannot resolve countrySet given input columns: []' when there is no prepared information driving the propagation process for a given result type
2023-01-19 15:57:43 +01:00
Miriam Baglioni
8893389895
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-12-21 12:42:27 +01:00
Claudio Atzori
6aa91204a5
[orcid propagation] skip empty directories
2022-12-20 14:15:46 +01:00
Claudio Atzori
5816ded93f
code formatting
2022-12-20 10:41:40 +01:00
Claudio Atzori
46972f8393
[orcid propagation] skip empty directory
2022-12-20 10:28:22 +01:00
Miriam Baglioni
6674cccb94
[BulkTag] description of parameters more comprehensive for those who do not implement it
2022-12-16 15:33:20 +01:00
Miriam Baglioni
f37113a941
[BulkTag] moving xquery to get community configuration in dedicated file
2022-12-16 15:32:26 +01:00
Miriam Baglioni
3d99b78d94
[Cleaning] fixed error in parameter (workingPath to workingDir)
2022-12-08 10:25:02 +01:00
Claudio Atzori
1b8488976b
code formatting
2022-12-07 10:45:38 +01:00
Claudio Atzori
cd1b58483e
[bulk tag] fixed Community configuration parsing to avoid NPE
2022-12-07 10:39:00 +01:00
Miriam Baglioni
bb0ddc1c44
[BulkTag] adding verb starts_with
2022-11-30 09:56:24 +01:00
Miriam Baglioni
9c70c5dbd6
[Bulk Tag horizontal] added new path in definition of constraint (to recognize fos subjects) - changed test and resource class to test this new aspect
2022-11-28 14:51:20 +01:00
Miriam Baglioni
0628df7a3a
resolving conflicts
2022-11-28 10:44:56 +01:00
Miriam Baglioni
33a2b1b5dc
[Bulk Tag] fixed typo in test configuration
2022-11-23 11:31:17 +01:00
Miriam Baglioni
c6df8327b3
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2022-11-23 11:26:57 +01:00
Miriam Baglioni
0e3edc5018
[Bulk Tag] fixed issue in verb name
2022-11-23 11:26:36 +01:00
Miriam Baglioni
935aa367d8
[BulkTag] removed commented code
2022-11-23 11:16:39 +01:00
Miriam Baglioni
43aedbdfe5
[BulkTag] changed verb name in configuration
2022-11-23 11:14:23 +01:00
Miriam Baglioni
b6da9b67ff
[BulkTag] fixed typo in annotation for verb name
2022-11-23 11:13:58 +01:00
Claudio Atzori
a34c8b6f81
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2022-11-22 10:22:31 +01:00
Miriam Baglioni
122e75aa17
fixed conflicts
2022-11-21 18:13:12 +01:00
Miriam Baglioni
cee7a45b1d
[Bulk Tag Datasource] fixed issue with verb name and add new test for neanias selection for orcid
2022-11-21 18:10:20 +01:00
Claudio Atzori
ed64618235
increased spark.sql.shuffle.partitions in the last join phase of the result (publication) to community through semantic relation propagation
2022-11-18 16:06:51 +01:00
Claudio Atzori
8742934843
added spark.sql.shuffle.partitions in the last join phase of the result to community through semantic relation propagation
2022-11-18 11:32:22 +01:00
Claudio Atzori
13cc592f39
code formatting
2022-11-15 09:37:57 +01:00
Claudio Atzori
af15b1e48d
[eosc tag] extending criteria for Jupyter Notebook (adding to ORP the same constraint)
2022-11-14 18:30:43 +01:00
Miriam Baglioni
5f9383b2d9
[EOSC TAG] remove redundant check for jupyter notebook
2022-11-11 14:06:19 +01:00
Miriam Baglioni
b18bbca8af
[EOSC TAG] adding search in orp for jupyter notebook criteria
2022-11-11 12:42:58 +01:00
Claudio Atzori
bca4a61710
suppressing hyper verbose spark logs during unit test execution
2022-10-19 15:20:58 +02:00
Miriam Baglioni
a653e1b3ea
[Enrichment - result to community through organization] reimplementation of the data preparation step using spark
2022-10-04 15:01:28 +02:00
Miriam Baglioni
f1d7d45cf7
[BulkTag] fixed issue
2022-09-28 12:01:43 +02:00