1748 Commits

Author SHA1 Message Date
Miriam Baglioni a9cc70d3b0 fixed issues on wf definition and property name 2020-11-30 12:08:33 +01:00
Miriam Baglioni 02589717b0 change in wf definition to have the cleaning part follow the report part directly. Modification of the name of the properties. Adjustments in test classes 2020-11-30 12:00:26 +01:00
Miriam Baglioni 44a66ad8a6 first try in adding orcid emend after cleaning step 2020-11-27 17:28:29 +01:00
Miriam Baglioni 3d66a7c0d6 - 2020-11-25 17:41:29 +01:00
Miriam Baglioni 2f9ac993e0 merge branch with master 2020-11-25 14:42:06 +01:00
Claudio Atzori eeebd5a920 Cleaning workflow: remove newlines from titles, descriptions, subjects 2020-11-24 18:40:25 +01:00
Miriam Baglioni 33c27b6f75 - 2020-11-20 10:07:50 +01:00
Miriam Baglioni d08dca0745 merge branch with master 2020-11-19 19:17:06 +01:00
Claudio Atzori d48f388fb2 Merge branch 'provision_indexing' 2020-11-19 15:59:55 +01:00
Claudio Atzori 46bde9c13f Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-11-19 15:26:27 +01:00
Claudio Atzori 7c9feaf9e7 project attributes removed from the XML record serialization: contactfullname, contactfax, contactphone, contactemail 2020-11-19 15:26:20 +01:00
Michele Artini 293da47ad9 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-11-19 10:42:31 +01:00
Michele Artini ab08d12c46 considering abstract > MIN_LENGTH in ENRICH_MISSING_ABSTRACT 2020-11-19 10:42:10 +01:00
Claudio Atzori e503271abe fixed notification workflow name 2020-11-19 10:41:38 +01:00
Claudio Atzori 0374d34c3e introduced configuration param outputFormat: HDFS | SOLR 2020-11-19 10:34:28 +01:00
Miriam Baglioni cb3cb8df04 - 2020-11-18 18:11:27 +01:00
Claudio Atzori ede7fae6c8 Merge pull request 'XML record indexing test' (#58) from provision_indexing into master 2020-11-18 17:04:34 +01:00
Claudio Atzori 5218718e8b updated set of fields from the MDFormatDSResourceType on PROD 2020-11-18 15:00:41 +01:00
Claudio Atzori d9e07a242b extended XmlIndexingJob to accept an optional parameter: outputPath. When present, forces the job to write its output on the specified HDFS location 2020-11-18 14:34:55 +01:00
Claudio Atzori 29dcff0f34 spark complains about missing classes, so here they are again 2020-11-18 14:32:32 +01:00
Miriam Baglioni c702f8e6a3 added dependency for constraint computing, and added test 2020-11-18 14:16:48 +01:00
Miriam Baglioni f4fee8f43c changed upper bound for whitelist 2020-11-18 14:04:17 +01:00
Miriam Baglioni 96d50080b4 merge branch with master 2020-11-18 12:26:14 +01:00
Miriam Baglioni 0e407b5f23 new tests 2020-11-18 12:18:11 +01:00
Miriam Baglioni 07837e51a9 - 2020-11-18 12:16:00 +01:00
Miriam Baglioni 1f52649d13 changed workflow and added parameters (whitelist) 2020-11-18 12:14:58 +01:00
Miriam Baglioni 683275a5db added logic to filter out other possible right matches 2020-11-18 12:14:13 +01:00
Miriam Baglioni 2aa48a132a added alternative names on the report 2020-11-18 12:13:43 +01:00
Miriam Baglioni 8555082bb1 changed to allow AND logic in constraints verification 2020-11-18 12:13:13 +01:00
Miriam Baglioni a4781ddf65 changed to consider AND logic within constraints 2020-11-18 12:12:48 +01:00
Miriam Baglioni a708652093 added check for empty(null) values 2020-11-18 11:58:00 +01:00
Miriam Baglioni 457c07a522 added the ignore case for both the constraints 2020-11-18 11:56:40 +01:00
Claudio Atzori 12acf25519 Merge pull request 'starting from first step...' (#57) from antonis.lempesis/dnet-hadoop:master into master. No judging. Just re-deploying... 2020-11-18 11:01:49 +01:00
Claudio Atzori 8177ce7939 test for XmlIndexingJob based on a local miniSolrCluster 2020-11-18 10:58:05 +01:00
Alessia Bardi 10e673660f Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-11-18 10:01:23 +01:00
Alessia Bardi be7b310cef rel semantics ignore case 2020-11-18 10:01:20 +01:00
Michele Artini 33da2e3d6c xpaths for dateOfCollection and dateOfTransformation 2020-11-18 09:26:20 +01:00
Antonis Lempesis 01a6e03989 starting from first step... 2020-11-17 23:26:47 +02:00
Alessia Bardi 8f87020a50 #56: map relevantDates from aggregated ODF records 2020-11-17 18:42:09 +01:00
Alessia Bardi 7e0a76a8ac test for TextGrid 2020-11-17 18:39:25 +01:00
Claudio Atzori cfc01f136e PID filtering based on a blacklist 2020-11-17 12:27:06 +01:00
Miriam Baglioni ec5b5c3a23 added set of classes for the verification of constraints on the result. 2020-11-16 18:52:53 +01:00
Miriam Baglioni 19167c9b9d merge branch with master 2020-11-16 14:15:39 +01:00
Miriam Baglioni 0ad1e237e6 added check to avoid break when name/surname is only composed of the word dr 2020-11-16 14:15:03 +01:00
Miriam Baglioni c29d142087 - 2020-11-16 10:53:12 +01:00
Claudio Atzori 6ab1ce53c9 fixed condition in result pid cleaning; cleanup 2020-11-16 10:09:17 +01:00
Claudio Atzori 4de8c8b237 fixed workflow variable name 2020-11-16 10:03:11 +01:00
Claudio Atzori 331d621800 added test resource 2020-11-14 12:16:15 +01:00
Claudio Atzori 5d4e34e26a fixed typo in variable name 2020-11-14 10:32:26 +01:00
Claudio Atzori 768bc5304c Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-11-13 15:40:34 +01:00