Commit Graph

5459 Commits

Author SHA1 Message Date
Claudio Atzori 786c217085 Using the updated Solr JSON payload model classes 2024-06-26 11:11:33 +02:00
Lampros Smyrnaios c858c02111 - Fix not using the "export HADOOP_USER_NAME" statement in "createPDFsAggregated.sh", which caused permission-issues when creating tables with Impala.
- Remove unused "--user" parameter in "impala-shell" calls.
- Code polishing.
2024-06-26 10:11:21 +02:00
Claudio Atzori 8220e27110 Merge pull request 'Align Solr JSON records to the explore portal requirements' (#448) from json_payload into beta_to_master_may2024
Reviewed-on: #448
2024-06-25 09:57:40 +02:00
Claudio Atzori bc993d49c1 Update pom.xml
depend on released schema version
2024-06-25 09:57:06 +02:00
Claudio Atzori 1dc7458de2 added JSON payload to the SolrInputDocument, updated unit tests 2024-06-24 14:48:09 +02:00
Claudio Atzori a7a54aab47 WIP: align Solr JSON records to the explore portal requirements 2024-06-20 15:48:45 +02:00
Miriam Baglioni eaa00a4199 [IrishFunderList]make changed according to 9635 comment 20, 21, 22 and 23 2024-06-20 12:32:57 +02:00
Claudio Atzori fb731b6d46 WIP: align Solr JSON records to the explore portal requirements 2024-06-19 15:38:43 +02:00
Miriam Baglioni b6da35e736 [IrishFunderList]make changed according to 9635 comment 14, 15 and 16 2024-06-19 11:06:58 +02:00
Lampros Smyrnaios 3c9b8de892 Miscellaneous updates to the copying operation to Impala Cluster:
- Fix not breaking out of the VIEWS-infinite-loop when the "SHOULD_EXIT_WHOLE_SCRIPT_UPON_ERROR" is set to "false".
- Exit the script when no HDFS-active-node was found, independently of the "SHOULD_EXIT_WHOLE_SCRIPT_UPON_ERROR".
- Fix view_name-recognition in a log-message, by using the more advanced "Perl-Compatible Regular Expressions" in "grep".
- Add error-handling for "compute stats" errors.
2024-06-18 15:59:34 +02:00
Antonis Lempesis c67ef157d3 filtering out deletedbyinference and invinsible results from accessroute 2024-06-18 15:59:00 +02:00
Lampros Smyrnaios c23f3031ed Miscellaneous updates to the copying operation to Impala Cluster:
- Show some counts and the elapsed time for various sub-tasks.
- Code polishing.
2024-06-18 15:58:46 +02:00
Claudio Atzori 8ec151aa3d [graph indexing] comment out setting the JSON payload from the SolrInputDocuments 2024-06-18 15:53:24 +02:00
Claudio Atzori 2636936162 [IE OAI-PMH] fixed oozie wf definition 2024-06-14 11:47:37 +02:00
Miriam Baglioni ef437a8cdf [Provision]temporarily removed Json paylod from indexed records (Shadow cannot support it) 2024-06-13 16:48:03 +02:00
Miriam Baglioni 86088ef26e Merge remote-tracking branch 'origin/beta_to_master_may2024' into beta_to_master_may2024 2024-06-11 17:04:07 +02:00
Miriam Baglioni 143c525343 [WebCrawl]remove relations for pid not doi 2024-06-11 17:03:59 +02:00
Claudio Atzori c371513d43 [graph resolution] use sparkExecutorMemory to define also the memoryOverhead 2024-06-11 14:21:01 +02:00
Claudio Atzori 71927ca818 avoid NPEs 2024-06-11 12:40:50 +02:00
Giambattista Bloisi 46018dc804 Fix OperationUnsupportedException while merging two Result's contexts due to modification of an immutable collection 2024-06-11 10:39:48 +02:00
Miriam Baglioni 3efd5b1308 [SDGActionSet]remove datainfo for the result. It is not needed (qualifier.classid = UPDATE) useless since subject do not go at the level of the instance 2024-06-11 10:35:57 +02:00
Miriam Baglioni 196fa55774 Merge remote-tracking branch 'origin/beta_to_master_may2024' into beta_to_master_may2024 2024-06-11 10:26:24 +02:00
Miriam Baglioni 50805e3fc1 [FoSActionSet]remove datainfo for the result. It is not needed (qualifier.classid = UPDATE) useless since subject do not go at the level of the instance 2024-06-11 10:25:46 +02:00
Claudio Atzori d39a1054b8 [actionset promotion] use sparkExecutorMemory to define also the memoryOverhead 2024-06-10 16:15:07 +02:00
Claudio Atzori 576efc1857 hostedby patching to work with the updated Crossref contents 2024-06-10 15:22:33 +02:00
Claudio Atzori efc1632e16 code formatting 2024-06-06 09:25:26 +02:00
Claudio Atzori 91b49366c6 [graph provision] align serialisation of the usage count measures to the agrred specifications 2024-06-05 16:34:40 +02:00
Claudio Atzori 5e05385d35 minor 2024-06-05 16:31:58 +02:00
Miriam Baglioni c4d9b5b9d2 [downloadsAndViews]update the test file to consider the new serialization for downloads and views 2024-06-05 16:30:15 +02:00
Miriam Baglioni bf9a5e6314 [downloadsAndViews]changed the test file to check the indicators are not there if their value is 0 2024-06-05 16:29:40 +02:00
Miriam Baglioni 9d79ddb3dd [bulkTag] fixed issue that made project disappear in graph_10_enriched 2024-06-05 16:20:40 +02:00
Miriam Baglioni 907aa28c6c [downloadsAndViews] fixed issue 2024-06-05 16:19:29 +02:00
Miriam Baglioni 3955ceaa76 [downloadsAndViews] changed the serialization for downloads and views 2024-06-05 16:18:46 +02:00
Miriam Baglioni 128c143394 {downloadsAndViews] extended test file with measures for downloads and views 2024-06-05 16:17:59 +02:00
Claudio Atzori 5133993ee5 Merge branch 'beta_to_master_may2024' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta_to_master_may2024 2024-06-05 12:17:48 +02:00
Claudio Atzori 5cf259a851 [graph2hive] use sparkExecutorMemory to define also the memoryOverhead 2024-06-05 12:17:16 +02:00
Claudio Atzori e1828fc60e Merge pull request '[PROD] Irish oaipmh exporter' (#444) from irish-oaipmh-exporter into beta_to_master_may2024
Reviewed-on: #444
2024-06-05 10:56:20 +02:00
Claudio Atzori 81090ad593 [IE OAIPHM] added oozie workflow, minor changes, code formatting 2024-06-05 10:03:33 +02:00
Claudio Atzori 56920b447d Merge pull request 'Fix for missing collectedfrom after dedup' (#442) from fix_mergedcliquesort into beta_to_master_may2024
Reviewed-on: #442
2024-06-03 15:34:01 +02:00
Giambattista Bloisi 3feab5d92d Fix MergeUtils.mergeGroup: it could get rid of some records and did not consider all PID authorities whilke sorting records.
ResultTypeComparator is now renamed in MergeEntitiesComparator and can be used as a general comparator for merging groups of records
2024-06-03 15:13:40 +02:00
Claudio Atzori 6be783caec [graph cleaning] use sparkExecutorMemory to define also the memoryOverhead 2024-05-29 14:36:49 +02:00
Claudio Atzori b703f94f09 Merge pull request 'changes in copy script - beta2master' (#439) from antonis.lempesis/dnet-hadoop:beta into beta_to_master_may2024
Reviewed-on: #439
2024-05-29 14:29:26 +02:00
Miriam Baglioni 14f275ffaf [NOAMI] removed Ireland funder id 501100011103. ticket 9635 2024-05-29 11:54:17 +02:00
Claudio Atzori a428e7be7e graph cleaning to implement ugly hardcoded rules, avoid NPEs 2024-05-29 09:26:12 +02:00
Lampros Smyrnaios e3f28338c1 Miscellaneous updates to the copying operation to Impala Cluster:
- Assign the WRITE and EXECUTE permissions to the DBs' HDFS-directories, in order to be able to create tables on top of them, in the Impala Cluster.
- Make sure the "copydb" function returns early, when it encounters a fatal error, while respecting the "SHOULD_EXIT_WHOLE_SCRIPT_UPON_ERROR" config.
2024-05-28 17:51:45 +03:00
Claudio Atzori 8e45c5baa8 graph cleaning to implement ugly hardcoded rules 2024-05-28 15:28:42 +02:00
Claudio Atzori db5e18c784 hostedby patching to work with the updated Crossref contents 2024-05-28 15:28:13 +02:00
Claudio Atzori fb266efbcb [org dedup] avoid NPEs in SparkPrepareNewOrgs 2024-05-26 21:23:30 +02:00
Claudio Atzori d7daf54333 [org dedup] avoid NPEs in SparkPrepareOrgRels 2024-05-26 16:48:11 +02:00
Claudio Atzori f99eaa0376 Merge branch 'beta_to_master_may2024' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta_to_master_may2024 2024-05-26 15:45:41 +02:00