Commit Graph

1746 Commits

Author SHA1 Message Date
Claudio Atzori d9e07a242b extended XmlIndexingJob to accept an optional parameter: outputPath. When present, forces the job to write its output on the specified HDFS location 2020-11-18 14:34:55 +01:00
Claudio Atzori 29dcff0f34 spark complains about missing classes, so here they are again 2020-11-18 14:32:32 +01:00
Miriam Baglioni 57cac36898 changed the workflow name 2020-11-18 13:38:03 +01:00
Claudio Atzori 12acf25519 Merge pull request 'starting from first step...' (#57) from antonis.lempesis/dnet-hadoop:master into master 2020-11-18 11:01:49 +01:00
    No judging. Just re-deploying...
Claudio Atzori 8177ce7939 test for XmlIndexingJob based on a local miniSolrCluster 2020-11-18 10:58:05 +01:00
Alessia Bardi 10e673660f Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-11-18 10:01:23 +01:00
Alessia Bardi be7b310cef rel semantcis ignore case 2020-11-18 10:01:20 +01:00
Michele Artini 33da2e3d6c xpaths for dateOfCollection and dateOfTransformation 2020-11-18 09:26:20 +01:00
Antonis Lempesis 01a6e03989 starting from first step... 2020-11-17 23:26:47 +02:00
Alessia Bardi 8f87020a50 #56: map relevantDates from aggregated ODF records 2020-11-17 18:42:09 +01:00
Alessia Bardi 7e0a76a8ac test fr TextGrid 2020-11-17 18:39:25 +01:00
Claudio Atzori cfc01f136e PID filtering based on a blacklist 2020-11-17 12:27:06 +01:00
Claudio Atzori 6ab1ce53c9 fixed condition in result pid cleaning; cleanup 2020-11-16 10:09:17 +01:00
Claudio Atzori 4de8c8b237 fixed workflow variable name 2020-11-16 10:03:11 +01:00
Claudio Atzori 331d621800 added test resource 2020-11-14 12:16:15 +01:00
Claudio Atzori 5d4e34e26a fixed typo in variable name 2020-11-14 10:32:26 +01:00
Claudio Atzori 768bc5304c Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-11-13 15:40:34 +01:00
Claudio Atzori 93f7b7974f Merge pull request 'trust truncated to 3 decimals' (#24) from trunc_trust into master 2020-11-13 15:40:02 +01:00
    LGTM
Claudio Atzori 528231a287 grouping graph entities by id turned out to be an easy extension for the already existing cleaning workflow 2020-11-13 15:37:48 +01:00
Claudio Atzori 2bed29eb09 WIP: added oozie workflow for grouping graph entities by id 2020-11-13 10:05:12 +01:00
Claudio Atzori 13e36a4da0 WIP: added oozie workflow for grouping graph entities by id 2020-11-13 10:05:02 +01:00
Claudio Atzori 75324ae58a Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-11-12 09:23:37 +01:00
Claudio Atzori 822971f54f no need to filter relations in CreateRelatedEntitiesJob_phase1; replaced 'left outer' join with 'left' join in CreateRelatedEntitiesJob_phase2; cleanup; 2020-11-12 09:22:59 +01:00
Claudio Atzori 9841488482 Merge pull request 'latest changes in stats wf' (#54) from antonis.lempesis/dnet-hadoop:master into master 2020-11-11 16:01:51 +01:00
    LGTM, thanks!
Antonis Lempesis 99ebaee347 fixed #5913 2020-11-11 16:56:46 +02:00
Claudio Atzori e3d3481fb9 Merge pull request 'organizations pids' (#53) from organization_pids into master 2020-11-11 14:08:25 +01:00
    LGTM
Antonis Lempesis f14e65f6a3 reverted wrong change 2020-11-10 17:23:04 +02:00
Antonis Lempesis c02c7741c9 fixes in db creation 2020-11-10 17:11:30 +02:00
Antonis Lempesis e603fa5847 fixes in db creation 2020-11-10 17:11:12 +02:00
Claudio Atzori 18d9aad70c improved documentation in dhp-graph-provision 2020-11-10 11:48:55 +01:00
Michele Artini 40160d171f organizations pids 2020-11-09 12:58:36 +01:00
Sandro La Bruzzo 027ef2326c Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-11-06 17:12:42 +01:00
Sandro La Bruzzo cd27df91a1 fixed bug on missing relation in ANDS 2020-11-06 17:12:31 +01:00
Claudio Atzori d10447e747 re-packaged graph dump workflow sources 2020-11-05 17:38:18 +01:00
Miriam Baglioni f8e9bda24c merge branch with master 2020-11-05 16:31:18 +01:00
Miriam Baglioni be5ed8f554 added check to avoid sending empty metadata. 2020-11-05 16:10:17 +01:00
Claudio Atzori 2148a51fae minor changes 2020-11-05 11:24:12 +01:00
Claudio Atzori 4625b7486e code formatting 2020-11-04 18:12:43 +01:00
Claudio Atzori f5f346dd2b Merge pull request 'dump' (#50) from miriam.baglioni/dnet-hadoop:dump into master 2020-11-04 18:07:01 +01:00
    LGTM
Miriam Baglioni e9ac471ae9 removed dependency from classes for the pid graph dump 2020-11-04 18:04:42 +01:00
Miriam Baglioni b90a945c49 removed property files for pid graph dump 2020-11-04 17:28:33 +01:00
Miriam Baglioni bac307155a removed properties specific for pid graph dump 2020-11-04 17:28:04 +01:00
Miriam Baglioni 9c9d50f486 removed code specific for pid graph dump 2020-11-04 17:26:22 +01:00
Miriam Baglioni 5669890934 removed commented lines 2020-11-04 17:15:21 +01:00
Miriam Baglioni 6a89f59be9 removed commented lines 2020-11-04 17:13:59 +01:00
Miriam Baglioni 56150d7e5e removed all code related to the dump of pids graph 2020-11-04 17:13:12 +01:00
Miriam Baglioni 16c54a96f8 removed pid dump 2020-11-04 17:11:32 +01:00
Miriam Baglioni 0cac5436ff Merge branch 'dump' of code-repo.d4science.org:miriam.baglioni/dnet-hadoop into dump 2020-11-04 13:21:11 +01:00
Alessia Bardi 51808b5afd Updated descriptions 2020-11-04 12:29:48 +01:00
Alessia Bardi e6becf8659 Updated descriptions 2020-11-04 12:17:57 +01:00
Alessia Bardi 0abe0eee33 Updated descriptions 2020-11-04 12:15:30 +01:00
Alessia Bardi f6ab238f5d Updated descriptions 2020-11-04 11:50:47 +01:00
Sandro La Bruzzo 3581244daf Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-11-04 09:04:22 +01:00
Sandro La Bruzzo 66efb39634 implemented merge scholix 2020-11-04 09:04:01 +01:00
Miriam Baglioni c010a8442f fixed issue on test code 2020-11-03 17:26:51 +01:00
Miriam Baglioni 8ec7a61188 merge branch with master 2020-11-03 16:59:08 +01:00
Miriam Baglioni c209284ca7 new schemas for the entities in the dump with added descriptions 2020-11-03 16:58:08 +01:00
Miriam Baglioni 08806deddf added the splitSize non mandatory parameter. Default size 10G 2020-11-03 16:57:34 +01:00
Miriam Baglioni 7d2eda43ca added new non mandatory property publish to determine if to publish the upload or leave it pending. Default value flase 2020-11-03 16:57:01 +01:00
Miriam Baglioni cbbb1bdc54 moved business logic to new class in common for handling the zip of hte archives 2020-11-03 16:55:50 +01:00
Miriam Baglioni d4382b54df moved the tar archive with maz size on common module 2020-11-03 16:54:50 +01:00
Claudio Atzori 5310e56dba remove empy PIDs 2020-11-03 11:52:10 +01:00
Sandro La Bruzzo 754c86f33e fixed test to work on jenkins 2020-11-02 09:35:01 +01:00
Sandro La Bruzzo 39337d8a8a fixed test 2020-11-02 09:26:25 +01:00
Miriam Baglioni dabb33e018 changed the discriminant for which split the file 2020-10-30 17:52:22 +01:00
Claudio Atzori c5dda3a00c Merge pull request 'h2020classification' (#49) from miriam.baglioni/dnet-hadoop:h2020classification into master 2020-10-30 17:10:05 +01:00
    LGTM
Miriam Baglioni 4905739be6 changed resource file to mirror change in business logic 2020-10-30 17:02:57 +01:00
Miriam Baglioni b40360ebfb changed the code to mirror the changed decision in the classification level and prodramme description labels 2020-10-30 17:02:30 +01:00
Miriam Baglioni 696409fb9f disabled tests because needing remote resource 2020-10-30 17:01:48 +01:00
Miriam Baglioni 0fba08eae4 max allowed size per file 10 Gb 2020-10-30 16:05:55 +01:00
Miriam Baglioni b828587252 prevent the code to cicle indefinetly 2020-10-30 15:01:25 +01:00
Miriam Baglioni f747e303ac classes for dumping of the graph as ttl file 2020-10-30 14:13:45 +01:00
Miriam Baglioni 16baf5b69e formatting 2020-10-30 14:13:14 +01:00
Miriam Baglioni a9eef9c852 added check for possible Optional value in relation dataInfo 2020-10-30 14:12:28 +01:00
Miriam Baglioni 5f4de9a962 formatting 2020-10-30 14:11:40 +01:00
Miriam Baglioni 14bf2e7238 added option to split dumps bigger that 40Gb on different files 2020-10-30 14:09:04 +01:00
Miriam Baglioni 78fdb11c3f merge branch with master 2020-10-29 12:55:22 +01:00
Sandro La Bruzzo 1d9fdb7367 fixed spark memory issue in SparkSplitOafTODLIEntities 2020-10-28 12:30:32 +01:00
Miriam Baglioni d2374e3b9e added code to handle cases where the funding tree is not existing 2020-10-27 16:15:21 +01:00
Miriam Baglioni 5d3012eeb4 changed code to dump only the programme list and not the classification list 2020-10-27 16:14:18 +01:00
Miriam Baglioni 3241ec1777 added connection timeout and socket timeout 600 sec 2020-10-27 16:12:11 +01:00
sandro 3a81a940b7 solved bug on merge publication 2020-10-21 22:41:55 +02:00
Miriam Baglioni a2ce527fae changed to match the requirements for short titles in level and long titles in classification 2020-10-20 17:03:25 +02:00
Sandro La Bruzzo 346ed65e2c added upload to zenodo node 2020-10-20 16:59:55 +02:00
sandro 271b4db450 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-10-20 16:09:49 +02:00
sandro d58d02d448 added workflow upload on zenodo 2020-10-20 16:09:07 +02:00
Alessia Bardi 1425d810a8 testing mapping 2020-10-19 17:46:14 +02:00
Sandro La Bruzzo fed711da80 Merge remote-tracking branch 'origin/master' into merge_record_to_common 2020-10-13 15:32:45 +02:00
Sandro La Bruzzo 34bf64c94f fixed export Scholexplorer to OpenAire 2020-10-13 08:47:58 +02:00
Alessia Bardi 8775a64bc1 Merge pull request 'Merging different compatibility levels (pinocchio operator)' (#47) from merge_graph into master 2020-10-09 14:44:52 +02:00
Claudio Atzori e751c1402f Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-10-09 13:53:21 +02:00
Claudio Atzori b961dc7d1e added originalid to the fields in the result graph view 2020-10-09 13:53:15 +02:00
Sandro La Bruzzo 734934e2eb fixed error on empty intersection with publication and relation on export to OAF 2020-10-08 17:29:29 +02:00
Sandro La Bruzzo eec418cd26 moved AuthoreMerger into dhp-common 2020-10-08 10:33:55 +02:00
Sandro La Bruzzo fe0a7870e6 Added test to check if merge authors works 2020-10-08 10:33:12 +02:00
Sandro La Bruzzo cd9c377d18 adpted scholexplorer Dump generation to the new Dataset definition 2020-10-08 10:10:13 +02:00
Claudio Atzori a3f37a9414 javadoc 2020-10-07 16:44:22 +02:00
Claudio Atzori 8d85a2fced [BETA wf only] datasources involved in the merge operation doesn't obey to the infra precedence policy, but relies on a custom behaviour that, given two datasources from beta and prod returns the one from prod with the highest compatibility among the two 2020-10-07 16:28:52 +02:00
Claudio Atzori 5f7b75f5c5 code formatting 2020-10-07 13:22:54 +02:00
miconis 5a8bc329c5 bug fix in the result merge: it takes the correct bestaccessright basing on the license instead of the trust 2020-10-06 15:26:44 +02:00