Commit Graph

257 Commits

Author SHA1 Message Date
Miriam Baglioni 3329b6ce6b [EOSC TAG] added fix for NPE on subjects 2022-07-29 10:54:20 +02:00
Miriam Baglioni 35bcd9422d [EOSC Context Tagging] removed not needed specification in path 2022-07-25 15:45:22 +02:00
Miriam Baglioni 1c82acb168 [EOSC Context Tagging] refactoring: moved EOSC IF tagging in package eosc under bulkTag 2022-07-25 14:26:39 +02:00
Miriam Baglioni 68cb637832 merge with branch beta 2022-07-25 14:24:25 +02:00
Miriam Baglioni 0172bab251 [EOSC Context Tagging] refactoring 2022-07-25 14:16:45 +02:00
Miriam Baglioni 144c103b67 [EOSC Context Tagging] add check to avoid the insertion of the context if already present 2022-07-25 13:52:45 +02:00
Miriam Baglioni d091866e48 [EOSC Context Tagging] refactoring 2022-07-25 11:12:22 +02:00
Miriam Baglioni 06a95daf60 [EOSC context TAG] refactoring after compilation 2022-07-22 14:57:06 +02:00
Miriam Baglioni 627332526b [EOSC context TAG] workflow start from reset_outputpath action 2022-07-22 14:55:11 +02:00
Miriam Baglioni 7a1c1b6f53 [EOSC context TAG] Add test class and resources 2022-07-22 14:36:02 +02:00
Miriam Baglioni 317a4a56ef [EOSC context TAG] first implementation of the logic to tag results imported from datasources registered in the EOSC 2022-07-21 17:37:48 +02:00
Miriam Baglioni 3be036f290 [EOSC TAG] refactoring after compilation 2022-07-21 14:45:43 +02:00
Miriam Baglioni 56d09e6348 [EOSC TAG] before adding the tag added a step to verify the same tag is not already present 2022-07-21 14:36:48 +02:00
Miriam Baglioni 5143a80232 [EOSC TAG] modification of test class to align with new element 2022-07-21 11:56:51 +02:00
Miriam Baglioni 438abdf96f [EOSC TAG] adding eosc interoperability guidelines in the specific element in the result. Removed from subjects. Removed also the deletion of EOSC Jupyter Notebook from subject since now the criteria are searched for in a different place 2022-07-20 18:07:54 +02:00
Claudio Atzori 1138b2ac8e code formatting 2022-07-19 14:15:49 +02:00
Miriam Baglioni fae681fea1 [Country Propagation] add check to avoid NPE on datasource.getDatasourceType().getClassis() 2022-07-03 17:39:58 +02:00
Claudio Atzori 0cb1c70788 code formatting 2022-07-01 10:44:08 +02:00
Miriam Baglioni 5e0b8f9b5f [CountryPropagation] refactoring 2022-05-20 09:15:53 +02:00
Miriam Baglioni c298c148cb [CountryPropagation] fix NPE issue 2022-05-20 09:11:46 +02:00
Miriam Baglioni f5207885e3 [EOSCTag] changed code to remove EOSC Jupyter Notebook and modified test to exclude galaxy + software from the tagging for Galaxy 2022-05-17 15:09:22 +02:00
Miriam Baglioni e4eac1d20b [EOSC TAG] added code to remove EOSC Jupyter Notebook from subjects and put EOSC as classid in the qualifier 2022-05-13 11:01:33 +02:00
Miriam Baglioni 8a72de4011 [EOSCTag] modified workflow to execute all the steps and not only the last one 2022-05-04 10:10:56 +02:00
Miriam Baglioni 3aeedd931a [EOSCTag] fixed issue in case description is null. Modified test resources and classes 2022-05-04 10:06:38 +02:00
Miriam Baglioni a21fe310e5 [EOSCTag] last test and change in the implementation to search in title and description 2022-05-02 17:43:20 +02:00
Miriam Baglioni e342ec93f0 [EOSCTag] prepared resources for test 2022-04-22 18:35:37 +02:00
Miriam Baglioni 88562c0930 [EOSC TAG] added test for galaxy for title and description criteria 2022-04-22 18:35:03 +02:00
Miriam Baglioni dfbd2bcbea [EOSC TAG] added logic in case subject is null 2022-04-22 18:34:03 +02:00
Miriam Baglioni 27c85e901a [EOSCTag] added resources and finalized test for Jupyter Notebook tagging 2022-04-22 17:38:10 +02:00
Miriam Baglioni bbb77052d3 [EOSCTag] first test 2022-04-22 11:32:57 +02:00
Miriam Baglioni 7cb7066472 [EoscTag] first "rough" implementation 2022-04-22 10:44:17 +02:00
Miriam Baglioni 6dc68c48e0 [EOSCTag] - 2022-04-21 16:19:04 +02:00
Miriam Baglioni d012d125d7 [EOSCTag] - 2022-04-21 12:02:09 +02:00
Miriam Baglioni c5a863132c [BulkTagging] revert it 2022-04-14 14:14:13 +02:00
Miriam Baglioni 8e8933d41a [BulkTagging] added fix if result.dataInfo is null 2022-04-14 09:04:24 +02:00
Claudio Atzori 48b580b45c [graph enrichment] fixed country_propagation oozie workflow definition, parameter saveGraph is not needed anymore by the SparkCountryPropagationJob 2022-04-11 08:52:36 +02:00
Claudio Atzori 21f32b83c6 [graph enrichment] fixed country_propagation oozie workflow definition, parameter saveGraph is not needed anymore by the SparkCountryPropagationJob 2022-04-11 08:52:12 +02:00
Claudio Atzori c26222623f [maven-release-plugin] prepare for next development iteration 2022-04-07 13:32:22 +02:00
Claudio Atzori 86585a6b27 [maven-release-plugin] prepare release dhp-1.2.4 2022-04-07 13:32:19 +02:00
Claudio Atzori ad85d88eaf [maven-release-plugin] rollback the release of dhp-1.2.4 2022-04-07 13:28:35 +02:00
Claudio Atzori 598e11dfd7 [maven-release-plugin] prepare for next development iteration 2022-04-07 13:27:02 +02:00
Claudio Atzori db3d9877a5 [maven-release-plugin] prepare release dhp-1.2.4 2022-04-07 13:26:58 +02:00
Claudio Atzori 3bba6d6e38 [maven-release-plugin] rollback the release of dhp-1.2.4 2022-04-07 12:23:17 +02:00
Claudio Atzori 2ac2d928bd [maven-release-plugin] prepare for next development iteration 2022-04-07 12:18:47 +02:00
Claudio Atzori 85bc722ff4 [maven-release-plugin] prepare release dhp-1.2.4 2022-04-07 12:18:43 +02:00
Claudio Atzori bc05b6168a [maven-release-plugin] rollback the release of dhp-1.2.4 2022-04-07 11:49:06 +02:00
Claudio Atzori 505420fd61 [maven-release-plugin] prepare for next development iteration 2022-04-07 11:34:06 +02:00
Claudio Atzori 66e718981e [maven-release-plugin] prepare release dhp-1.2.4 2022-04-07 11:34:02 +02:00
Miriam Baglioni 7b8f85692e [Enrichment country] fixed issues with parameters and workflow args 2022-03-23 17:20:23 +01:00
Claudio Atzori f10066547b increased spark.sql.shuffle.partitions in affiliation_from_semrel_propagation 2022-03-23 12:22:26 +01:00
Claudio Atzori f430029596 cleanup 2022-03-11 14:28:28 +01:00
Miriam Baglioni 12de9acb0d [Country Propagation] left out from previous commit 2022-03-11 14:17:02 +01:00
Miriam Baglioni 4437f9345d [Country Propagation] left out from previous commit 2022-03-11 13:57:47 +01:00
Miriam Baglioni 2b643059fa [Country Propagation] changed the logic to get the collectedfrom at the result level. To fix issue when no instance is created for a result that should have the country associated. Change the code to use spark instead of hive to prepare the data needed for the propagation step. Added new tests for the intermediate steps and new verification for the propagation itself 2022-03-11 13:56:48 +01:00
Miriam Baglioni f5b0a6f89c [master to beta] fixed issues in test files 2022-02-25 10:21:57 +01:00
Miriam Baglioni 37784209c9 [dhp-schemas-] updated the version of dhp-schema to 2.10.27 for APC name and id modification 2022-02-02 12:46:31 +01:00
Miriam Baglioni dce7f5fea8 [BULK TAGGING] changed to fix issue that should have been fixed already 2022-01-31 08:20:28 +01:00
Miriam Baglioni 064f9bbd87 [AFFPropSR] added new parameter for the number of iterations and new code for just one iteration 2022-01-07 18:58:51 +01:00
Sandro La Bruzzo 3920d68992 Fixed workflow generation of delta in datacite 2021-12-21 11:41:49 +01:00
Claudio Atzori 1790fa2d44 Merge branch 'beta' into affiliationPropagation 2021-12-14 15:26:56 +01:00
Miriam Baglioni 2bbece2ca5 merging with branch beta 2021-11-16 16:35:40 +01:00
Sandro La Bruzzo 2d67020c59 added dhp-enrichment maven site template 2021-11-16 16:01:08 +01:00
Miriam Baglioni 28ea532ece [Affiliation Propagation] moved the selection of graph relation as a preparation step 2021-11-16 15:24:19 +01:00
Miriam Baglioni 7c96e3fd46 removed not useful dir 2021-11-16 13:57:26 +01:00
Miriam Baglioni c7c0c3187b [AFFILIATION PROPAGATION] Applied some SonarLint suggestions 2021-11-16 13:56:32 +01:00
Miriam Baglioni 935062edec [Bypass Action Set] creation of unresolved entities 2021-11-11 16:11:25 +01:00
Miriam Baglioni c371b23077 - 2021-11-10 17:00:37 +01:00
Miriam Baglioni 9e214ce0eb [BypassAS] addition of OC relations 2021-11-09 12:07:19 +01:00
Miriam Baglioni 6f7ca539c6 [BypassAS] update of results for bipFinder and FOS 2021-11-09 11:25:41 +01:00
Miriam Baglioni a7d50c499b [BypassAS] prepare FOS subject, test and model for FOS and BipFinder scores 2021-11-08 16:44:19 +01:00
Miriam Baglioni b9d124bb7c [Enrichment: Propagation through parent-child relationships] Added counters, and changed constraint to verify if filtering out the relation (from classname = harvested to classid != propagation) 2021-11-03 13:55:37 +01:00
Miriam Baglioni 09f36cffb8 [Enrichment: Propagation through parent-child relationships] First implementation, testing, and wf for propagation of result to organization through semantic relation 2021-10-29 11:20:03 +02:00
Miriam Baglioni d0ef7d91c5 adding test resource 2021-10-26 17:34:11 +02:00
Miriam Baglioni 652114c641 [affiliationPropagation] first try. preparation 2021-10-20 11:44:23 +02:00
Sandro La Bruzzo 5606014b17 code refactor see ticket #7065 2021-10-12 08:11:53 +02:00
Miriam Baglioni e9ccdf853f related to #132 2021-09-15 18:44:54 +02:00
Miriam Baglioni 5f674efb0c moved dependency version in external pom 2021-08-13 10:07:53 +02:00
Claudio Atzori 2ee21da43b suggestions from SonarLint 2021-08-11 12:13:22 +02:00
Claudio Atzori 741077dbca Merge pull request 'Fix in Affiliation Propagation' (#113) from miriam.baglioni/dnet-hadoop:master into stable_ids. Reviewed-on: #113 2021-06-09 18:42:42 +02:00
Miriam Baglioni 32b0c27217 Update 'dhp-workflows/dhp-enrichment/src/main/java/eu/dnetlib/dhp/resulttoorganizationfrominstrepo/PrepareResultInstRepoAssociation.java'. Fix in SQL query: the blacklist constraint used d.id to indicate the datasource id, but no alias for the datasource was defined, so I removed the alias 2021-06-09 18:36:11 +02:00
Miriam Baglioni dc07f1079b added check in case the author set to be enriched is null 2021-06-08 12:06:10 +02:00
Claudio Atzori b695932ae4 integrated pull#108 2021-05-20 15:34:04 +02:00
Miriam Baglioni 02b80cf24f resolved conflicts 2021-05-20 10:59:39 +02:00
Claudio Atzori 23b8883ab1 applied intellij code cleanup 2021-05-14 10:58:12 +02:00
Claudio Atzori 27ab8a704d adjusted poms to align with the external dhp-schema module 2021-04-27 10:12:27 +02:00
Claudio Atzori c2bb03c8b5 depending on external dhp-schemas module 2021-04-23 17:57:35 +02:00
Claudio Atzori 7ed107be53 depending on external dhp-schemas module 2021-04-23 17:52:36 +02:00
Miriam Baglioni 72e5aa3b42 refactoring 2021-04-23 12:10:30 +02:00
Miriam Baglioni fe36895c53 added datasource blacklist for the organization to result propagation through institutional repositories 2021-01-22 11:55:10 +01:00
Claudio Atzori 4766495f5b [orcid_to_result_from_semrel_propagation] fixed typo in SQL 2020-12-17 09:15:50 +01:00
Claudio Atzori 7d325e2c57 using actual result subclasses instead of their parent class 2020-12-14 14:40:54 +01:00
Claudio Atzori 152916890f renamed test name 2020-12-14 14:40:05 +01:00
Miriam Baglioni 4c58bd1c93 merge with upstream 2020-12-03 11:24:00 +01:00
Miriam Baglioni 05c452f58d merge with upstream 2020-12-03 10:26:45 +01:00
Claudio Atzori cfb55effd9 code formatting 2020-12-02 11:23:49 +01:00
Claudio Atzori 74242e450e using constants from ModelConstants 2020-12-02 11:23:35 +01:00
Miriam Baglioni d5efa6963a using constants in ModelConstants 2020-12-02 11:20:26 +01:00
Miriam Baglioni cd285e98bc using the constants defined in the ModelConstants class 2020-12-02 11:13:23 +01:00
Miriam Baglioni f8468c9c22 added extension for new author pid (orcid_pending) 2020-12-01 20:09:35 +01:00
Miriam Baglioni 55e24c2547 relclass for relation and corresponding values have been put to lower case (isSupplementedBy wrote as IsSupplementedBy - orcid propagation) 2020-08-18 16:42:08 +02:00
Miriam Baglioni bc6b5d5b34 removed leftover parameter 2020-08-15 11:22:35 +02:00
Miriam Baglioni 200cd5c730 removed leftover parameter 2020-08-15 11:22:19 +02:00
Miriam Baglioni de995970ea try again to solve clash with master 2020-08-14 15:24:36 +02:00
Miriam Baglioni 5040d72d5e changed to make it equal to master branch 2020-08-14 15:20:17 +02:00
Miriam Baglioni be8106c339 added space to avoid conflicts with master branch 2020-08-14 15:16:27 +02:00
Miriam Baglioni b7e49aee8d removed commented code 2020-08-13 18:44:07 +02:00
Miriam Baglioni 270c89489c fixed issue created while renaming subject to subjects in community configuration xml 2020-08-13 15:16:04 +02:00
Miriam Baglioni c3672b162b merge branch with master 2020-08-11 17:53:04 +02:00
Miriam Baglioni a16bbf3202 changed test resource to mirror change in the XQuery that produced data to be parsed. The main Zenodo community is no longer provided in a different element, but is part of the <zenodocommunities> 2020-08-11 17:48:44 +02:00
Miriam Baglioni 5b651abf82 merge branch with master 2020-08-04 10:14:07 +02:00
Miriam Baglioni 88e4c3b751 added default trust to context bulktagged 2020-08-04 10:13:25 +02:00
Miriam Baglioni f9342cb484 added constant 2020-08-03 18:32:35 +02:00
Miriam Baglioni 96c3c891f4 added trust 2020-08-03 18:32:17 +02:00
Miriam Baglioni 53656600ad changed XQuery to select only community and ri with status not hidden 2020-08-03 18:29:30 +02:00
Miriam Baglioni 40bbe94f7c merge with master fork 2020-07-20 18:10:03 +02:00
Miriam Baglioni b904e0699a - 2020-07-20 18:02:53 +02:00
Miriam Baglioni d7d84c8217 - 2020-07-17 14:03:23 +02:00
Miriam Baglioni faea30cda0 - 2020-07-09 14:05:21 +02:00
Miriam Baglioni 4a7de07ea2 refactoring 2020-06-25 16:32:40 +02:00
Miriam Baglioni 54a12978d3 fixed issue in xquery 2020-06-25 16:30:20 +02:00
Miriam Baglioni 507f7a94a8 added one of the main zenodo communities to the tagging conf for testing purposes 2020-06-23 08:45:27 +02:00
Miriam Baglioni af1d40351b changed XQuery to add also the main Zenodo community among the communities associated to the openaire community 2020-06-22 19:20:54 +02:00
Claudio Atzori 9cd27183b6 [maven-release-plugin] prepare for next development iteration 2020-06-22 11:27:44 +02:00
Claudio Atzori 1e3dab0631 [maven-release-plugin] prepare release dhp-1.2.3 2020-06-22 11:27:39 +02:00
Claudio Atzori c4d9f1837f [maven-release-plugin] prepare for next development iteration 2020-06-12 12:21:08 +02:00
Claudio Atzori f0746a7605 [maven-release-plugin] prepare release dhp-1.2.2 2020-06-12 12:21:03 +02:00
Claudio Atzori 55595d7235 HACK: patch NULL values with defaults found in result.datainfo.deletedbyinference and result.context 2020-05-26 10:28:35 +02:00
Miriam Baglioni 54d869e618 merge upstream 2020-05-26 09:22:04 +02:00
Miriam Baglioni eea07f4c42 refactoring 2020-05-26 09:21:49 +02:00
Claudio Atzori 7582532e73 [maven-release-plugin] prepare for next development iteration 2020-05-25 19:48:18 +02:00
Claudio Atzori 01c2e93395 [maven-release-plugin] prepare release dhp-1.2.1 2020-05-25 19:48:14 +02:00
Miriam Baglioni 74215f6d9f refactoring 2020-05-25 10:38:16 +02:00
Miriam Baglioni f754c424bd changed logic to compute only once PacePerson for each Author to be enriched 2020-05-25 10:35:02 +02:00
Miriam Baglioni 8f51af4e9b added PacePerson to get name surname for authors having only fullname set 2020-05-25 10:34:30 +02:00
Miriam Baglioni b258f99ece fix for issue that duplicated result 2020-05-25 10:26:48 +02:00
Miriam Baglioni 0d1ec1913f added fix to avoid duplication of results 2020-05-22 18:42:25 +02:00
Miriam Baglioni 29066a6b46 applied code cleanup 2020-05-22 15:38:50 +02:00
Miriam Baglioni 8610ad5142 added groupby id to fix multiple result with same id at join step 2020-05-22 15:32:55 +02:00
Miriam Baglioni 4308f31165 added fix to make test run 2020-05-22 13:13:01 +02:00
Miriam Baglioni b71fbb68b1 removed the removeOutputDir command from code. Relations are written in Append. The erase of the output dir meant to remove all the relations computed in the previous steps 2020-05-18 13:57:20 +02:00
Claudio Atzori ef9a9a9f1a remove the output path when starting 2020-05-15 22:34:19 +02:00
Claudio Atzori a832658296 code formatting 2020-05-15 10:21:09 +02:00
Miriam Baglioni f25db01664 changed in the constant from propagationconstants to modelconstants 2020-05-14 18:29:24 +02:00
Miriam Baglioni d05630d979 removed the constants added in ModelConstants 2020-05-14 18:22:50 +02:00
Miriam Baglioni e7eb4f377e Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-05-14 10:34:17 +02:00
Miriam Baglioni 8828458acf minor changes 2020-05-14 10:34:12 +02:00
Claudio Atzori ab37953332 added global properties in wf definitions to avoid repeating name-node and job-tracker in the (many) distcp actions; reintroduced output directory removal at the beginning of each spark action 2020-05-14 10:25:41 +02:00
Miriam Baglioni 43f127448d changed the package name from dhp-propagation to dhp-enrichment for the preparation phase of funding propagation 2020-05-12 18:24:26 +02:00
Claudio Atzori ec0782e582 renamed jar containing the bulktagging and propagation workflows from dhp-[bulktagging|propagation] to dhp-enrichment; adjusted xml formatting 2020-05-12 15:49:28 +02:00
Miriam Baglioni 14979f299e changed the configuration factory 2020-05-12 11:28:38 +02:00