master #11

Manually merged
claudio.atzori merged 275 commits from :master into enrichment_wfs 2020-05-11 15:14:56 +02:00

275 Commits

Author SHA1 Message Date
Miriam Baglioni bbc9b4f329 removed unused imports 2020-05-11 14:28:55 +02:00
Miriam Baglioni 757bae53ea removed unnecessary serialization points 2020-05-11 14:28:37 +02:00
Miriam Baglioni b35d57a1ac added resources for test 2020-05-11 14:15:30 +02:00
Miriam Baglioni e563e65335 moved check from join to method 2020-05-11 14:11:44 +02:00
Miriam Baglioni f5d785e096 used the DbClient moved to dhp-common 2020-05-11 13:59:42 +02:00
Miriam Baglioni 112b2cb3c3 added the test class 2020-05-11 13:58:58 +02:00
Miriam Baglioni 9a7ae523c9 update to version 1.2.1-SNAPSHOT 2020-05-11 13:57:47 +02:00
Miriam Baglioni 2abb84877d Merge branch 'master' into blacklist 2020-05-11 10:37:49 +02:00
Miriam Baglioni b0f0b24263 update to version 1.2.1-SNAPSHOT 2020-05-11 10:37:31 +02:00
Miriam Baglioni a7e91e23ba update to version 1.2.1-SNAPSHOT 2020-05-11 10:34:30 +02:00
Miriam Baglioni bb59bdd60f merge upstream 2020-05-11 10:33:17 +02:00
Miriam Baglioni 5e3548add6 - 2020-05-11 10:33:08 +02:00
Miriam Baglioni dc8c8fa480 changed the version 2020-05-11 10:20:48 +02:00
Miriam Baglioni 871e079b45 merged with master 2020-05-11 10:20:00 +02:00
Miriam Baglioni 622ba87ec2 changed the version 2020-05-11 10:10:36 +02:00
Miriam Baglioni 391b2399cc merge upstream 2020-05-11 10:08:51 +02:00
Miriam Baglioni 32301451ec merge upstream 2020-05-11 09:42:23 +02:00
Miriam Baglioni 7e66bc2527 fix a typo in the compression keyword and added some logging info in the spark job 2020-05-11 09:40:58 +02:00
Miriam Baglioni 9a29ab7508 reverted to the readPath we had before 2020-05-08 13:08:56 +02:00
Miriam Baglioni 28556507e7 - 2020-05-08 12:54:52 +02:00
Claudio Atzori b2192fdcdc simplified reset_outputpath nodes across the workflows, applied common xml formatting 2020-05-08 12:33:31 +02:00
Miriam Baglioni 4c94231cad merge with master fork 2020-05-08 12:25:57 +02:00
Miriam Baglioni 9b4c0d4b3a - 2020-05-08 11:51:45 +02:00
Miriam Baglioni 53952707b6 modified test because of new step of data preparation. It now expects to find ResultCountrySet serialization instead of DatasourceCountry 2020-05-08 11:49:19 +02:00
Miriam Baglioni d6b9de9f46 Merge branch 'master' of https://code-repo.d4science.org/miriam.baglioni/dnet-hadoop 2020-05-07 18:22:59 +02:00
Miriam Baglioni f95d288681 fixed switch of parameters 2020-05-07 18:22:32 +02:00
Claudio Atzori 166aafd936 heavy cleanup 2020-05-07 18:22:26 +02:00
Miriam Baglioni fb405275f7 merged with master 2020-05-07 11:48:21 +02:00
Miriam Baglioni e124278934 - 2020-05-07 11:47:11 +02:00
Claudio Atzori 5111671e62 cleanup 2020-05-07 11:47:00 +02:00
Miriam Baglioni 9f8855991c changed Encoders.bean to Encoders.kryo 2020-05-07 11:44:35 +02:00
Miriam Baglioni 207b899d6d merged with upstream 2020-05-07 11:43:53 +02:00
Claudio Atzori e07feb4c5f removed spurious file 2020-05-07 11:42:46 +02:00
Claudio Atzori 5b3f8a0e90 using Encoders.bean instead of kryo 2020-05-07 11:41:41 +02:00
Miriam Baglioni 182225becb Merge branch 'master' of https://code-repo.d4science.org/miriam.baglioni/dnet-hadoop 2020-05-07 11:38:17 +02:00
Miriam Baglioni 5efae3acb9 new workflow for job3 2020-05-07 11:38:10 +02:00
Claudio Atzori 73243793b2 Dataset based implementation for SparkCountryPropagationJob3 2020-05-07 11:15:24 +02:00
Claudio Atzori 128c3bf1c8 restored Author bean with simple getter/setter, author pid addition moved into dedicated implementation SparkOrcidToResultFromSemRelJob3 2020-05-07 11:14:56 +02:00
Miriam Baglioni b2fec32c87 new workflow for job3 2020-05-07 10:01:57 +02:00
Miriam Baglioni 29bc8c44b1 changes in the construction of new country set 2020-05-07 10:01:34 +02:00
Miriam Baglioni 55e825acd4 changed the test according to changes in SparkCountryPropagationJob2 2020-05-07 10:01:00 +02:00
Miriam Baglioni 16193cf0ba new workflow and parameter for country propagation 2020-05-07 09:59:58 +02:00
Miriam Baglioni 5a476c7a13 changed the xquery for the cfhb table 2020-05-07 09:58:17 +02:00
Miriam Baglioni 42ad51577a new implementation with one more serialization step 2020-05-07 09:57:49 +02:00
Miriam Baglioni dd2e698a72 added a sequentialization step on the spark job. Added new parameter 2020-05-05 17:03:43 +02:00
Miriam Baglioni 252b219dd5 changed the name of some properties 2020-05-05 10:03:32 +02:00
Miriam Baglioni 78578c3ccf fixed wrong transition name in workflow 2020-05-04 15:46:24 +02:00
Miriam Baglioni cc7d9b6b19 merge upstream 2020-05-04 13:59:09 +02:00
Miriam Baglioni 3957c815b9 changed the name of some parameters 2020-05-04 13:58:52 +02:00
Miriam Baglioni e218360f8a changed code for the move of DbClient and also removed the dependency on graph-mapper 2020-05-04 12:26:17 +02:00
Miriam Baglioni 31ea05297d moved the DbClient to common and added needed dependency to pom 2020-05-04 12:22:28 +02:00
Miriam Baglioni b7dd400e51 added check if author.pid exists or is null 2020-05-01 15:09:02 +02:00
Miriam Baglioni dbf3ba051a minor 2020-04-30 20:22:07 +02:00
Miriam Baglioni 43053a286d workflow pom with added blacklist module 2020-04-30 18:30:21 +02:00
Miriam Baglioni 0631fe548a pom.xml 2020-04-30 18:29:46 +02:00
Miriam Baglioni 38ecfd5785 the wf with all the three steps for blacklisting relations 2020-04-30 18:28:46 +02:00
Miriam Baglioni 95433e1087 parameters for the preparation phase and blacklist phase 2020-04-30 18:28:13 +02:00
Miriam Baglioni 1070790c19 minor 2020-04-30 18:26:58 +02:00
Miriam Baglioni b9d56b3ced applies the actual removal of the relations 2020-04-30 18:26:25 +02:00
Miriam Baglioni d6d6ebeae5 preparation step: creates the subset of the merges relations 2020-04-30 18:25:33 +02:00
Miriam Baglioni 13f30664ea minor 2020-04-30 15:23:49 +02:00
Miriam Baglioni 276b95b7b3 add create file instruction 2020-04-30 15:05:17 +02:00
Miriam Baglioni 65a5d67b8b minor modifications 2020-04-30 14:45:27 +02:00
Miriam Baglioni 418595fec2 removed the saveGraph parameter 2020-04-30 14:45:00 +02:00
Miriam Baglioni ce8b1d0bc3 new workflow definition to be inserted in the provision pipeline 2020-04-30 14:38:54 +02:00
Miriam Baglioni 4b0bd91012 - 2020-04-30 12:45:28 +02:00
Miriam Baglioni 2349bfd8b8 changed the job test to remove the writeUpdate option 2020-04-30 11:43:33 +02:00
Miriam Baglioni 951517f9ec new input parameters and workflow definition to be used in the provision pipeline 2020-04-30 11:32:50 +02:00
Miriam Baglioni 026f297e49 removed the writeUpdate option 2020-04-30 11:31:59 +02:00
Miriam Baglioni c89fe762b1 modified relation datasource organization 2020-04-30 11:17:03 +02:00
Miriam Baglioni 3abb76ff7a merge with upstream 2020-04-30 11:15:54 +02:00
Miriam Baglioni 638a3c465b - 2020-04-30 11:05:17 +02:00
Miriam Baglioni 354f0162be changes in the blacklist and workflow definition 2020-04-30 10:26:50 +02:00
Miriam Baglioni 564e5d6279 added new information in support of blacklist reader 2020-04-30 10:22:58 +02:00
Miriam Baglioni 3cffee74b9 merge with upstream 2020-04-29 18:25:29 +02:00
Miriam Baglioni 9ab46535e7 pom with the new blacklist module added 2020-04-29 18:17:15 +02:00
Miriam Baglioni 6a47e6191d read from blacklist and write the result as relations on hdfs 2020-04-29 18:16:01 +02:00
Miriam Baglioni 869f576273 added hash map for relationship entityType id prefix, and relation inverse 2020-04-29 18:14:52 +02:00
Miriam Baglioni b85ad7012a reads the blacklist from the blacklist db and writes it as a set of relations on hdfs 2020-04-29 17:29:49 +02:00
Miriam Baglioni f7695e833c resolved conflicts 2020-04-29 11:41:31 +02:00
Miriam Baglioni 2980e50edf merge upstream 2020-04-27 15:06:48 +02:00
Miriam Baglioni df34a4ebcc changed the configuration to add ignorecase option to each verb related to covid-19 community 2020-04-27 12:32:56 +02:00
Miriam Baglioni 7a59324ccf changed the test to check for the new ignorecase option 2020-04-27 12:31:46 +02:00
Miriam Baglioni 986c97348d added the ignorecase option to each selection verb 2020-04-27 12:31:05 +02:00
Miriam Baglioni a303fc9f73 resources for testing propagation of result to community from organization and from semrel 2020-04-27 11:14:16 +02:00
Miriam Baglioni c093d764a3 - 2020-04-27 11:12:38 +02:00
Miriam Baglioni c925e2be16 test for propagation of result to community from organization and result to community from semrel 2020-04-27 10:59:53 +02:00
Miriam Baglioni ec7f166690 changed the bl because of changes to the examples for the re-implementation of the propagation step 2020-04-27 10:58:41 +02:00
Miriam Baglioni 6135096ef1 refactoring 2020-04-27 10:57:50 +02:00
Miriam Baglioni d30e710165 fixed duplicates action name in the workflow 2020-04-27 10:52:30 +02:00
Miriam Baglioni f9ee343fc0 new parametrized workflow with preparation steps and new parameter input files 2020-04-27 10:48:31 +02:00
Miriam Baglioni e2093644dc changed in the workflow the directory where the preparedInfo and the graph generated at this step are stored 2020-04-27 10:46:44 +02:00
Miriam Baglioni 8a58bf2744 removed the writeUpdate option 2020-04-27 10:45:06 +02:00
Miriam Baglioni 5dccbe13db merge with upstream 2020-04-27 10:43:59 +02:00
Miriam Baglioni 7b6505ec69 new resources for testing propagation of project to result after the re-implementation 2020-04-27 10:42:16 +02:00
Miriam Baglioni 1b0e0bd1b5 refactoring 2020-04-27 10:40:26 +02:00
Miriam Baglioni e5a177f0a7 refactoring 2020-04-27 10:36:21 +02:00
Miriam Baglioni e000754c92 refactoring 2020-04-27 10:34:03 +02:00
Miriam Baglioni 95a54d5460 removed the writeUpdate option. The update is available in the preparedInfo path 2020-04-27 10:30:32 +02:00
Miriam Baglioni 8802e4126b re-implemented inverting the couple: from (projectId, relatedResultList) to (resultId, relatedProjectList) 2020-04-27 10:26:55 +02:00
Miriam Baglioni adcbf0e29a refactoring 2020-04-24 10:47:43 +02:00
Miriam Baglioni 0e447add66 removed unnecessary classes 2020-04-23 12:59:43 +02:00
Miriam Baglioni edb00db86a refactoring 2020-04-23 12:57:35 +02:00
Miriam Baglioni 44fab140de - 2020-04-23 12:42:07 +02:00
Miriam Baglioni 769aa8178a refactoring 2020-04-23 12:40:44 +02:00
Miriam Baglioni d8dc31d4af refactoring 2020-04-23 12:35:49 +02:00
Miriam Baglioni 8c5dac5cc3 removed unnecessary classes 2020-04-23 12:30:58 +02:00
Miriam Baglioni 15656684b9 added properties for the preparation step and actual propagation. Added the new parametrized workflow 2020-04-23 12:13:34 +02:00
Miriam Baglioni 6f35f5ca42 added the steps of reset output dir and copy information not changed by the propagation step 2020-04-23 12:12:07 +02:00
Miriam Baglioni 19cd5b85c0 changed the classname to execute 2020-04-23 12:07:41 +02:00
Miriam Baglioni fa2ff5c6f5 refactoring 2020-04-23 11:58:26 +02:00
Miriam Baglioni 540f70298b added missing property 2020-04-23 11:51:48 +02:00
Miriam Baglioni e431fe4f5b added the implements Serializable to each class 2020-04-23 11:48:47 +02:00
Miriam Baglioni 24fa81d7e8 implementation parametrized for result type 2020-04-23 11:44:19 +02:00
Miriam Baglioni ac7ec349cf removed the shaded lib module 2020-04-23 11:43:03 +02:00
Miriam Baglioni ab2a24cc2b changed the dependency to use reflections to find annotated classes 2020-04-23 11:08:47 +02:00
Miriam Baglioni 04fc223346 add method addPid 2020-04-23 11:07:44 +02:00
Miriam Baglioni 5153d88bd3 definition of workflow and properties for bulktagging 2020-04-23 11:04:53 +02:00
Miriam Baglioni 3b2e4ab670 test for bulktag 2020-04-23 10:00:10 +02:00
Miriam Baglioni 259525cb93 Merge remote-tracking branch 'upstream/master' 2020-04-21 18:33:46 +02:00
Miriam Baglioni 30e53261d0 minor 2020-04-21 18:00:53 +02:00
Miriam Baglioni 676ba49324 changed pom of dhp-build 2020-04-21 16:09:23 +02:00
Miriam Baglioni 90c768dde6 added shaded libs module 2020-04-21 16:03:51 +02:00
Miriam Baglioni e1848b7603 minor 2020-04-18 14:16:42 +02:00
Miriam Baglioni 0ff9b1ef05 added needed parameter 2020-04-18 14:16:29 +02:00
Miriam Baglioni e2dfe8b656 removed not used action 2020-04-18 14:16:07 +02:00
Miriam Baglioni 437ebbad76 refactoring 2020-04-18 14:15:09 +02:00
Miriam Baglioni 9a8876ac86 added needed parameter 2020-04-18 14:14:08 +02:00
Miriam Baglioni 9854852878 refactoring 2020-04-18 14:13:16 +02:00
Miriam Baglioni 454b8a6a29 Merge remote-tracking branch 'upstream/master' 2020-04-18 14:09:44 +02:00
Miriam Baglioni 890ec28f0f input parameters for preparation step1 2020-04-18 14:09:37 +02:00
Miriam Baglioni fbf5c27c27 Added preparation classes before actual propagation 2020-04-18 14:09:03 +02:00
Miriam Baglioni 72c63a326e removed unnecessary class 2020-04-17 17:14:51 +02:00
Miriam Baglioni 00c2ca3ee5 - 2020-04-17 17:14:25 +02:00
Miriam Baglioni 7d9fd75020 add method addPid 2020-04-17 17:13:48 +02:00
Miriam Baglioni 5cd092114f use mergeFrom method to add the new community contexts 2020-04-17 17:13:18 +02:00
Miriam Baglioni 264c82f21e minor 2020-04-17 16:54:46 +02:00
Miriam Baglioni 8c079c7a49 unit test for orcid to result propagation from semrel 2020-04-17 16:53:03 +02:00
Miriam Baglioni eacd140a98 added missing parameter(s) 2020-04-17 16:52:30 +02:00
Miriam Baglioni 390e250faf use the addPid method of the Author class to add a new pid 2020-04-17 16:52:02 +02:00
Miriam Baglioni b46b080ddc use mergeFrom method call to add the country(ies) instead of modifying the result directly. 2020-04-17 16:50:54 +02:00
Miriam Baglioni c4987dd12a minor 2020-04-17 16:49:08 +02:00
Miriam Baglioni adc11c97a7 Merge remote-tracking branch 'upstream/master' 2020-04-17 12:34:31 +02:00
Miriam Baglioni 5d772e5263 new implementation of propagation of community to result from organization that exploits the prepared info 2020-04-16 18:45:22 +02:00
Miriam Baglioni fff1e5ec39 classes to (de)serialize the data provided in the preparation step 2020-04-16 18:44:43 +02:00
Miriam Baglioni 3fd9d6b02f preparation phase for the propagation of community to result from organization 2020-04-16 18:43:55 +02:00
Miriam Baglioni a9120164aa added hive parameter and a step of reset of the working dir in the workflow 2020-04-16 18:42:04 +02:00
Miriam Baglioni 6afbd542ca changed the save mode to avoid NegativeArraySize... error. Needed to modify also the preparationstep2 2020-04-16 18:40:14 +02:00
Miriam Baglioni d60fd36046 changed the save method 2020-04-16 16:14:15 +02:00
Miriam Baglioni 951b13ac46 input parameters and workflow for new implementation of propagation of orcid to result from semrel and preparation phases 2020-04-16 16:13:10 +02:00
Miriam Baglioni 4d89f3dfed removed unnecessary classes 2020-04-16 16:11:44 +02:00
Miriam Baglioni 5e72a51f11 - 2020-04-16 16:11:20 +02:00
Miriam Baglioni c33a593381 renamed 2020-04-16 16:09:47 +02:00
Miriam Baglioni 0e5399bf74 second phase of data preparation. Groups all the possible updates by id 2020-04-16 16:08:51 +02:00
Miriam Baglioni 548ba915ac first phase of data preparation. For each result type (parallel) it produces the possible updates 2020-04-16 15:58:42 +02:00
Miriam Baglioni 243013cea3 to (de)serialize the association between the resultId and the list of authoritative authors with orcid to possibly propagate 2020-04-16 15:57:29 +02:00
Miriam Baglioni ac3ad25b36 to (de)serialize needed information of the author to determine if the orcid can be passed (name, surname, fullname (?), orcid) 2020-04-16 15:56:33 +02:00
Miriam Baglioni d6cd700a32 new implementation that exploits prepared information (the list of possible updates: resultId - possible list of orcid to be added) 2020-04-16 15:55:25 +02:00
Miriam Baglioni f077f22f73 minor 2020-04-16 15:54:16 +02:00
Miriam Baglioni fd5d792e35 refactoring 2020-04-16 15:53:34 +02:00
Miriam Baglioni 08227cfcbd resources needed for running the test on propagation of result to organization from institutional repositories 2020-04-16 11:06:10 +02:00
Miriam Baglioni a97e915c24 test unit for propagation of result to organization from institutional repository 2020-04-16 11:05:21 +02:00
Miriam Baglioni b078710924 modification to the test due to the removal of unused parameters 2020-04-16 11:04:39 +02:00
Miriam Baglioni a5e5c81a2c input parameters and workflow definition for propagation of result to organization from institutional repositories 2020-04-16 11:03:41 +02:00
Miriam Baglioni 5e1bd67680 removed unnecessary parameter 2020-04-16 11:02:01 +02:00
Miriam Baglioni eaf19ce01b removed unnecessary class 2020-04-16 10:59:33 +02:00
Miriam Baglioni 7bd49abbef commit to delete 2020-04-16 10:59:09 +02:00
Miriam Baglioni 53f418098b added the isTest checkpoint 2020-04-16 10:53:48 +02:00
Miriam Baglioni c28333d43f minor 2020-04-16 10:52:50 +02:00
Miriam Baglioni a8100baed6 changed the way to save the results to avoid NegativeArray... error 2020-04-16 10:50:09 +02:00
Miriam Baglioni 79b978ec57 refactoring 2020-04-16 10:48:41 +02:00
Miriam Baglioni 3577219127 removed unnecessary classes 2020-04-15 12:45:49 +02:00
Miriam Baglioni 964b22d418 modified the writing of the new relations. Before: read the old relations, add the new ones to them, write all the relations to the new location. Now: the first step of the wf copies the old relations to the new location; if new relations are found, they are saved in the new location in append mode. 2020-04-15 12:32:01 +02:00
Miriam Baglioni 43f0590d4b change in the testing because the business logic is changed. 2020-04-15 12:29:50 +02:00
Miriam Baglioni 473d17767c new business logic for the actual propagation. It exploits previously computed information 2020-04-15 12:25:44 +02:00
Miriam Baglioni 6a377a7582 class to compute some information needed for the actual propagation 2020-04-15 12:25:11 +02:00
Miriam Baglioni 5a3487280d classes to serialize/deserialize the prepared data 2020-04-15 12:24:36 +02:00
Miriam Baglioni 62b09be43c added correct description for parameter isSparkSessionManaged 2020-04-15 12:23:06 +02:00
Miriam Baglioni 1859ce8902 minor refactoring 2020-04-15 12:21:31 +02:00
Miriam Baglioni 27f1d3ee8f minor refactoring 2020-04-15 12:21:05 +02:00
Miriam Baglioni 3f4b579e7f new workflow. It is composed of four steps. The first removes the directory where the results are stored, the second copies the relations to the new location, the third is the preparation phase and the fourth is the actual propagation 2020-04-14 16:49:24 +02:00
Miriam Baglioni ca2b40952e minor changes 2020-04-14 16:48:02 +02:00
Miriam Baglioni 61d39e659e parameters for the project2result propagation phase 2020-04-14 16:47:39 +02:00
Miriam Baglioni 92f19fa0a0 parameters for the project2result preparation phase 2020-04-14 16:46:57 +02:00
Miriam Baglioni cadab9b81d new implementation for result to project propagation. Use the prepared info in propagation 2020-04-14 16:46:07 +02:00
Miriam Baglioni ceb1f299bf minor changes 2020-04-14 16:45:12 +02:00
Miriam Baglioni e0038bde5b Support class to serialize/deserialize the association project, set of linked results 2020-04-14 15:32:12 +02:00
Miriam Baglioni c0bebb7c35 code to compute the prepared information used in the actual propagation step. This step produces two files: one with potential updates (association between projects and a list of results), the other with already linked entities (association between projects and the list of results already linked to them) 2020-04-14 15:31:26 +02:00
Miriam Baglioni f47ee5b78e directory where to store the prepared info before the actual propagation takes place 2020-04-14 15:29:21 +02:00
Miriam Baglioni 36cc9516d8 the starting relation set for testing 2020-04-14 15:28:34 +02:00
Miriam Baglioni 4b01dc60e6 test unit for result to project propagation 2020-04-14 15:28:00 +02:00
Miriam Baglioni 8f12292daa changed the way to save the results on filesystem 2020-04-11 16:47:34 +02:00
Miriam Baglioni 87f802821e new workflow for country propagation: it is composed of the preparation step and the propagation. The propagation part runs in parallel on the result types 2020-04-11 16:40:22 +02:00
Miriam Baglioni a562080b0b parameters to be used in the prepared Job and in the actual country propagation job 2020-04-11 16:39:17 +02:00
Miriam Baglioni 1251ad4455 removed unnecessary class 2020-04-11 16:38:13 +02:00
Miriam Baglioni aef9b3aa90 new parametric implementation of country propagation. Exploits information computed before and broadcasts it to each executor 2020-04-11 16:36:59 +02:00
Miriam Baglioni a2d833d5dd step of data preparation before the actual country propagation takes place 2020-04-11 16:36:03 +02:00
Miriam Baglioni 6897c920a2 classes in support of new implementation of country propagation 2020-04-11 16:35:26 +02:00
Miriam Baglioni 85766a02d8 added dependency to use hive on local machine 2020-04-11 16:34:22 +02:00
Miriam Baglioni 79b8ea4fed prepared information to be used in actual country propagation. Subset of info 2020-04-11 16:29:41 +02:00
Miriam Baglioni 1822476613 Test for country propagation 2020-04-11 16:28:09 +02:00
Miriam Baglioni 7783b09c5b new implementation for result to project propagation. Prepare some info to be used in propagation 2020-04-11 16:26:23 +02:00
Miriam Baglioni 90469789b9 two new classes for new implementation of project to result propagation 2020-04-09 13:29:01 +02:00
Miriam Baglioni 627ad58a8b new wf definition 2020-04-09 11:33:19 +02:00
Miriam Baglioni 9c63c4840d new workflow and parameters for country propagation 2020-04-08 19:13:42 +02:00
Miriam Baglioni a2d309545b new parametrized implementation for country propagation 2020-04-08 19:12:59 +02:00
Miriam Baglioni 6dfdba9ef7 new parametrized implementation for country propagation 2020-04-08 18:14:37 +02:00
Miriam Baglioni 03f7cb6402 new parametrized implementation for country propagation 2020-04-08 18:08:41 +02:00
Miriam Baglioni df2fc4a6d7 Merge remote-tracking branch 'upstream/master' 2020-04-08 18:07:26 +02:00
Miriam Baglioni fcfef4632f input parameters for country propagation preparation job 2020-04-08 18:07:18 +02:00
Miriam Baglioni 61045e84d9 merged conflict in pom 2020-04-08 14:23:30 +02:00
Miriam Baglioni 540da4ab61 new business logic with prepared info before actual job run 2020-04-08 13:04:04 +02:00
Miriam Baglioni 8438702b3d addition in propagation constants 2020-04-08 10:54:01 +02:00
Miriam Baglioni 2afe971816 new implementation for country propagation 2020-04-08 10:49:09 +02:00
Miriam Baglioni beebbcf66b new config for countrypropagation 2020-04-08 10:31:29 +02:00
Miriam Baglioni dd011f4a95 to make them visible to Claudio 2020-03-30 10:55:47 +02:00
Miriam Baglioni b1af90a45f to make it visible to Claudio 2020-03-30 10:50:03 +02:00
Miriam Baglioni 19d7f8b51d uncommented execution for some of the result types for testing purposes 2020-03-24 16:49:46 +01:00
Miriam Baglioni ad24c8478f added missing parameter 2020-03-24 16:19:59 +01:00
Miriam Baglioni 46094a3eec bug fixing for implementation with dataset 2020-03-24 16:19:36 +01:00
Miriam Baglioni ad712f2d79 added the needed variables in the config and read the variables in the workflow 2020-03-23 17:11:36 +01:00
Miriam Baglioni f1e9fe9752 changed implementation using dataset and query on hive 2020-03-23 17:11:00 +01:00
Miriam Baglioni f09cd1e911 removed unnecessary variable in the configuration 2020-03-23 17:10:14 +01:00
Miriam Baglioni 9418e3d4fa read dataset from files instead of using hive tables 2020-03-23 17:09:27 +01:00
Miriam Baglioni a7bf037306 remove unused class 2020-03-23 14:36:43 +01:00
Miriam Baglioni 8ab8b6b0bf minor 2020-03-23 14:35:23 +01:00
Miriam Baglioni 30d58fd98c change the configuration of the workflow 2020-03-23 14:32:49 +01:00
Miriam Baglioni a440152b46 refactoring 2020-03-23 14:30:56 +01:00
Miriam Baglioni 47561f3597 changed the implementation from rdd to dataset got from sql queries (on hive) 2020-03-23 11:58:32 +01:00
Miriam Baglioni 67ea3cf3ed changed the way to read the file with info on resource or relation. From sequenceFile to textFile 2020-03-17 16:32:05 +01:00
Miriam Baglioni b4652d018c moved the creation of new dir to common class. 2020-03-17 16:31:24 +01:00
Miriam Baglioni 92f4e0001d Merge branch 'bulktag' 2020-03-16 13:33:27 +01:00
Miriam Baglioni ab08a37024 Merge remote-tracking branch 'upstream/master' 2020-03-16 12:45:23 +01:00
Miriam Baglioni c37f2bd1b5 moved some classes to package to make code clearer 2020-03-03 16:42:23 +01:00
Miriam Baglioni d9d2060561 implementation for bulk tagging 2020-03-03 16:38:50 +01:00
Miriam Baglioni e80f80ca93 properties and workflow for new propagation 2020-03-02 17:03:31 +01:00
Miriam Baglioni 50080c1b3c changed the implementation of addAll method. Before adding all the items in a collection, we check if the accumulator set is not empty 2020-03-02 16:41:37 +01:00
Miriam Baglioni 02815dd2cf update result for community moved in propagationconstants 2020-03-02 16:40:56 +01:00
Miriam Baglioni 95f8c3092f update for new propagation implementation and moving of updateResult for community business logic since the same can be used for result to community from organization and result to community from semrel 2020-03-02 16:40:17 +01:00
Miriam Baglioni 3d63f35dcb implementation of new propagation. Result to community for results linked to given organization. We exploit the hasAuthorInstitution semantic link to discover which results are related to institutions 2020-03-02 16:39:03 +01:00
Miriam Baglioni 3a4ccb26c0 New properties for the orcid to result propagation through semantic relation 2020-02-28 18:26:04 +01:00
Miriam Baglioni b50166b9ad None 2020-02-28 18:25:28 +01:00
Miriam Baglioni 550cb21c23 None 2020-02-28 18:24:39 +01:00
Miriam Baglioni b098ee0bae Changed the structure of typed row to also contain the list of authors with orcid 2020-02-28 18:23:51 +01:00
Miriam Baglioni 841f5523fe Added information and methods for the new propagation of orcid to result through semrel 2020-02-28 18:23:16 +01:00
Miriam Baglioni 2b7b05fb29 New propagation of ORCID to result exploiting the semantic relation connecting them. R has author with orcid o, R is bound by strong semantic relationship with R1 that has the same author without orcid, then o is also associated to the author in R1 2020-02-28 18:22:41 +01:00
Miriam Baglioni 833c83c694 Wrong file name 2020-02-28 18:21:01 +01:00
Miriam Baglioni a86426776a Changed from Oaf to Result the type of the updateResult method parameter, not to be forced to cast each time 2020-02-28 18:20:19 +01:00
Miriam Baglioni 3f941a2af4 Merge branch 'master' into propagationCommunityToResult 2020-02-19 18:05:22 +01:00
Miriam Baglioni b2bdc9b99b merging project to result propagation logic to master 2020-02-19 18:04:59 +01:00
Miriam Baglioni a153a07997 none 2020-02-19 18:03:13 +01:00
Miriam Baglioni d0279af630 start to implement the business logic 2020-02-19 17:59:24 +01:00
Miriam Baglioni 5f63ab1416 to query the information system to get the list of communities up to now. It will have a more general usage when introducing bulk tagging 2020-02-19 17:59:02 +01:00
Miriam Baglioni 5ceb174d24 Merge branch 'master' into propagationCommunityToResult 2020-02-19 17:13:38 +01:00
Miriam Baglioni e8af7a6b64 Merge remote-tracking branch 'upstream/master' 2020-02-19 17:03:10 +01:00
Miriam Baglioni 79ff79b0cd propagation of result to community through semantic relation: C -> R and R -> isSupplementedBy R1 => C -> R1 2020-02-19 17:02:39 +01:00
Miriam Baglioni ab84163bb3 added set accumulator in TypedRow and used it to accumulate country information in Country Propagation 2020-02-19 15:02:50 +01:00
Miriam Baglioni bb0fdf5e0a fix wrong source target in new relation 2020-02-19 15:00:46 +01:00
Miriam Baglioni 9e1678ccf8 fix error in workflow name 2020-02-19 14:59:24 +01:00
Miriam Baglioni 8aa3b4d7c0 adding to propagation constants the ones needed for propagation of project to result and addition of new accumulator Set in typed row to collect values of a type 2020-02-19 14:55:54 +01:00
Miriam Baglioni 7167673a58 implementation and configuration for propagation of project to result through semantic relation: P -> R1 and R1 -> supplemented by -> R2 => P -> R2 2020-02-19 14:54:18 +01:00
Miriam Baglioni b81e6af429 added config for new propagation 2020-02-18 17:30:44 +01:00
Miriam Baglioni b736a9581c changed relclass and reltype in relation specification for country propagation and implementation of propagation of result affiliation through institutional repositories 2020-02-18 17:27:28 +01:00
Miriam Baglioni ed262293a6 aligned to new snapshot version 1.1.6 2020-02-18 17:25:32 +01:00
Miriam Baglioni 2688a89c21 changed relclass and reltype in relation specification 2020-02-18 17:24:40 +01:00
Miriam Baglioni c0022fec9f moved to upper package to serve other propagations 2020-02-18 17:24:11 +01:00
Miriam Baglioni e0a777028a fix problem in parameters 2020-02-18 17:23:34 +01:00
Miriam Baglioni 5868ff8a86 synch fork with master 2020-02-17 18:22:27 +01:00
Miriam Baglioni 18e4092d5c change name of properties dir 2020-02-17 18:07:06 +01:00
Miriam Baglioni bd0e504b42 changes to the wf configuration 2020-02-17 18:04:15 +01:00
Miriam Baglioni 3a9d723655 adding default parameters in code 2020-02-17 16:30:52 +01:00
Miriam Baglioni a5517eee35 adding the mkdirs for creation of propagation folder under provision on tmp 2020-02-17 14:20:42 +01:00
Miriam Baglioni 9abde5cfac removed outputPath from job parameters 2020-02-17 14:19:53 +01:00
Miriam Baglioni be2421d5d8 removed wrongly pushed file 2020-02-17 12:07:26 +01:00
Miriam Baglioni c7bc73aedf country propagation for results collected from institutional repositories 2020-02-17 11:44:48 +01:00
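Commits 7167673a58, 8802e4126b and c0bebb7c35 describe project-to-result propagation through a semantic relation (P -> R1 and R1 -> isSupplementedBy -> R2 => P -> R2), with potential updates keyed by result, i.e. (resultId, relatedProjectList), rather than by project. The following is a minimal, Spark-free Java sketch of that rule, assuming relations are plain (source, relClass, target) triples. The `Rel` class and the `produces` relClass label are hypothetical stand-ins, not the dnet-hadoop schema, and skipping projects already linked to R2 is an assumption suggested by the separate "already linked" output mentioned in c0bebb7c35.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch of: P -> R1, R1 isSupplementedBy R2  =>  P -> R2. */
public class ProjectToResultPropagation {

    // Hypothetical relClass labels; the real model may use other names and directions.
    static final String PRODUCES = "produces";
    static final String IS_SUPPLEMENTED_BY = "isSupplementedBy";

    /** Minimal stand-in for a graph relation (not the dnet-hadoop Relation bean). */
    public static final class Rel {
        public final String source;
        public final String relClass;
        public final String target;

        public Rel(String source, String relClass, String target) {
            this.source = source;
            this.relClass = relClass;
            this.target = target;
        }
    }

    /**
     * Returns resultId -> projects to link (the inverted orientation).
     * Projects already linked to a result are skipped; propagation is one hop, not transitive.
     */
    public static Map<String, Set<String>> propagate(List<Rel> rels) {
        // resultId -> projects already linked to it
        Map<String, Set<String>> linked = new HashMap<>();
        List<Rel> supplements = new ArrayList<>();
        for (Rel r : rels) {
            if (PRODUCES.equals(r.relClass)) {
                linked.computeIfAbsent(r.target, k -> new HashSet<>()).add(r.source);
            } else if (IS_SUPPLEMENTED_BY.equals(r.relClass)) {
                supplements.add(r);
            }
        }
        // Sorted map/sets keep the output deterministic.
        Map<String, Set<String>> updates = new TreeMap<>();
        for (Rel s : supplements) {
            Set<String> already = linked.getOrDefault(s.target, Collections.emptySet());
            for (String project : linked.getOrDefault(s.source, Collections.emptySet())) {
                if (!already.contains(project)) {
                    updates.computeIfAbsent(s.target, k -> new TreeSet<>()).add(project);
                }
            }
        }
        return updates;
    }

    public static void main(String[] args) {
        List<Rel> rels = List.of(
                new Rel("p1", PRODUCES, "r1"),
                new Rel("r1", IS_SUPPLEMENTED_BY, "r2"),
                new Rel("p2", PRODUCES, "r2"),
                new Rel("p2", PRODUCES, "r3"),
                new Rel("r3", IS_SUPPLEMENTED_BY, "r2"));
        System.out.println(propagate(rels)); // prints {r2=[p1]}
    }
}
```

In the example, p1 reaches r2 through r1, while p2 is not proposed again for r2 because it is already linked to it. In the real jobs this would run over Spark Datasets rather than in-memory lists.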
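Commit 2b7b05fb29 states the ORCID propagation rule: if R has an author with ORCID o and R is bound by a strong semantic relation to R1, which has the same author without an ORCID, then o is also assigned to that author in R1. Commit ac3ad25b36 lists name, surname and fullname as inputs for deciding whether the ORCID can be passed, but does not state the exact matching rule. The sketch below therefore assumes a case-insensitive name + surname match; the `OrcidPropagation` and `Author` classes are hypothetical and not the dnet-hadoop `Author` bean.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: copy ORCIDs from the authors of R to matching authors of R1. */
public class OrcidPropagation {

    /** Minimal author stand-in (not the dnet-hadoop Author bean). */
    public static final class Author {
        public final String name;
        public final String surname;
        public String orcid; // null when the author has no ORCID

        public Author(String name, String surname, String orcid) {
            this.name = name;
            this.surname = surname;
            this.orcid = orcid;
        }
    }

    // Assumed matching key: case-insensitive, trimmed name and surname.
    private static String key(Author a) {
        return a.name.trim().toLowerCase(Locale.ROOT) + "|" + a.surname.trim().toLowerCase(Locale.ROOT);
    }

    /** Assigns ORCIDs from source (R) to target (R1) authors lacking one; returns how many were added. */
    public static int propagate(List<Author> source, List<Author> target) {
        Map<String, String> orcidByKey = new HashMap<>();
        for (Author a : source) {
            if (a.orcid != null) {
                orcidByKey.put(key(a), a.orcid); // if two authors share a key, the last one wins
            }
        }
        int added = 0;
        for (Author a : target) {
            String o = orcidByKey.get(key(a));
            if (a.orcid == null && o != null) {
                a.orcid = o;
                added++;
            }
        }
        return added;
    }

    public static void main(String[] args) {
        List<Author> r = List.of(
                new Author("Ada", "Lovelace", "0000-0000-0000-0001"),
                new Author("Alan", "Turing", null));
        Author a1 = new Author("ada", "LOVELACE", null);
        Author a2 = new Author("Alan", "Turing", null);
        int n = propagate(r, List.of(a1, a2));
        System.out.println(n + " " + a1.orcid + " " + a2.orcid); // prints 1 0000-0000-0000-0001 null
    }
}
```

The names and the ORCID value are placeholders. A production version would likely also need to handle ambiguous matches (two authors with the same key) and the fullname-only case, which this sketch ignores.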