Commit Graph

975 Commits

Author SHA1 Message Date
Miriam Baglioni dfa4997a4f removed commented code 2020-05-29 10:45:18 +02:00
Miriam Baglioni 6f1eea28b6 changed message in log 2020-05-29 10:41:39 +02:00
Miriam Baglioni 8b6e886fb6 added new resource for testing 2020-05-28 23:54:31 +02:00
Miriam Baglioni 6989fb9c8a changed the project test according to the newly introduced join with the db project codes 2020-05-28 23:53:24 +02:00
Miriam Baglioni 782984d8e5 added needed parameter 2020-05-28 23:52:41 +02:00
Miriam Baglioni 01f7876595 fix issue with flatMap - the return type must not be null 2020-05-28 23:50:32 +02:00
Miriam Baglioni 773735f870 added the path to the file containing the projects code from the db 2020-05-28 17:30:45 +02:00
Miriam Baglioni 6a15067a64 added one step in the workflow 2020-05-28 17:30:09 +02:00
Miriam Baglioni 5309a99a70 modified the PrepareProjects to consider those in the db 2020-05-28 17:29:53 +02:00
Miriam Baglioni b737ed8236 added part to read projects from the openaire db to filter out those in the csv file that are not in the db 2020-05-28 17:29:21 +02:00
Miriam Baglioni 35b7279147 changed test because data are saved as SequenceFile now, and because of the group by the number of produced updates decreases 2020-05-28 10:26:12 +02:00
Miriam Baglioni 37c155b86a merge branch with fork master 2020-05-28 10:09:51 +02:00
Miriam Baglioni df44db686a refactoring 2020-05-28 10:07:00 +02:00
Miriam Baglioni 87b07f4af8 removed unused variables 2020-05-28 10:05:43 +02:00
Miriam Baglioni 1060977272 added fs actions to remove and then create the workingDir 2020-05-28 10:04:36 +02:00
Miriam Baglioni 96d1a3c431 deleted the file where to store the csv files 2020-05-28 10:04:10 +02:00
Miriam Baglioni 669c05c771 added groupBy before creating Actions 2020-05-28 10:00:45 +02:00
Miriam Baglioni 1855453434 changed the outputdir of the last step 2020-05-27 17:59:36 +02:00
Miriam Baglioni dd1e0b93b8 added merge for Programme 2020-05-27 17:40:32 +02:00
Miriam Baglioni f3dcca0dd0 added equals for programme 2020-05-27 17:23:34 +02:00
Claudio Atzori aac1515b58 Merge pull request 'result_pids without conflicts ???' (#16) from result_pids into master. Looks good, thanks Michele 2020-05-27 12:54:52 +02:00
Michele Artini f5ce7d76e1 resolve conflicts 2020-05-27 12:49:17 +02:00
Michele Artini b81f2741d2 xquery 2020-05-27 12:10:20 +02:00
Michele Artini a25598140a result pids (new xpaths + IS vocabularies) 2020-05-27 12:10:20 +02:00
Michele Artini 7a7272d9ec result pids (new xpaths + IS vocabularies) 2020-05-27 12:10:20 +02:00
Michele Artini 3ceb2d2853 match terms with vocabularies 2020-05-27 11:34:13 +02:00
Claudio Atzori 4e36d689dd fixed XML serialization for children sub-elements (duplicates & externalreferences) 2020-05-26 18:30:40 +02:00
Miriam Baglioni 92e3a52e91 merge branch with fork master 2020-05-26 15:57:51 +02:00
Michele Artini c15d997925 xquery 2020-05-26 13:13:17 +02:00
Michele Artini c6af36496a result pids (new xpaths + IS vocabularies) 2020-05-26 13:11:09 +02:00
Michele Artini 093f1aff03 result pids (new xpaths + IS vocabularies) 2020-05-26 13:06:55 +02:00
Claudio Atzori b8e541a454 fixing repeated organization.websiteurl in organization entities (#5645) as well as project.ecinternationalorganizationeurinterests 2020-05-26 10:30:09 +02:00
Claudio Atzori 55595d7235 HACK: patch NULL values with defaults found in result.datainfo.deletedbyinference and result.context 2020-05-26 10:28:35 +02:00
Claudio Atzori 7b288a94cb code formatting 2020-05-26 09:54:13 +02:00
Claudio Atzori e87eca9300 Merge pull request 'master' (#13) from miriam.baglioni/dnet-hadoop:master into enrichment_wfs 2020-05-26 09:34:23 +02:00
Miriam Baglioni 54d869e618 merge upstream 2020-05-26 09:22:04 +02:00
Miriam Baglioni eea07f4c42 refactoring 2020-05-26 09:21:49 +02:00
Michele Artini d6aada4957 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-05-26 08:44:31 +02:00
Michele Artini b1546605e3 updated version of a dependency 2020-05-26 08:44:15 +02:00
Claudio Atzori 7582532e73 [maven-release-plugin] prepare for next development iteration 2020-05-25 19:48:18 +02:00
Claudio Atzori 01c2e93395 [maven-release-plugin] prepare release dhp-1.2.1 2020-05-25 19:48:14 +02:00
Claudio Atzori ae04234472 DataInfo.deletedbyinference is false by default 2020-05-25 19:32:48 +02:00
miconis da1e5cf557 implementation of the result title merge. main title with higher trust, distinct between the others 2020-05-25 18:02:57 +02:00
Miriam Baglioni d3d36647d2 merge upstream 2020-05-25 10:38:22 +02:00
Miriam Baglioni 74215f6d9f refactoring 2020-05-25 10:38:16 +02:00
Miriam Baglioni dbde2d243a changed due to move of PacePerson from dhp-graph-mapper to dhp-common 2020-05-25 10:35:39 +02:00
Miriam Baglioni f754c424bd changed logic to compute only once PacePerson for each Author to be enriched 2020-05-25 10:35:02 +02:00
Miriam Baglioni 8f51af4e9b added PacePerson to get name surname for authors having only fullname set 2020-05-25 10:34:30 +02:00
Miriam Baglioni b258f99ece fix for issue that duplicated result 2020-05-25 10:26:48 +02:00
Miriam Baglioni 8f6ce970f9 moved PacePerson to dhp-common to avoid conflict in dependency with graph-mapper 2020-05-25 10:25:55 +02:00