Claudio Atzori
c06dfdfd86
ignore dates containing 'null's
2024-07-02 15:43:11 +02:00
Claudio Atzori
b822b34abe
code formatting
2024-07-01 09:22:35 +02:00
Michele De Bonis
ea1841fbd2
implementation of countryMatch and addition of workflow parameters
2024-07-01 09:14:32 +02:00
Miriam Baglioni
4dbce39237
[AffiliationInference]Extended the affiliation ingestion from OpenAIRE to include also the links derived from web crawl. Changed the provenance from BIP! to OpenAIRE
2024-06-29 18:51:06 +02:00
Miriam Baglioni
3ee8a7d18a
[WebCrawl]moved to Constants web crawl name and id
2024-06-29 18:47:23 +02:00
Claudio Atzori
ee7deb3f60
[graph provision] publicFormat workflow parameter defined as optional
2024-06-28 14:52:43 +02:00
Claudio Atzori
157cc8be87
[graph provision] fixed serialization of the instancetypes
2024-06-28 14:21:12 +02:00
Claudio Atzori
023099a921
imported from beta
2024-06-26 11:40:16 +02:00
Claudio Atzori
786c217085
Using the updated Solr JSON payload model classes
2024-06-26 11:11:33 +02:00
Lampros Smyrnaios
c858c02111
- Fix not using the "export HADOOP_USER_NAME" statement in "createPDFsAggregated.sh", which caused permission-issues when creating tables with Impala.
- Remove unused "--user" parameter in "impala-shell" calls.
- Code polishing.
2024-06-26 10:11:21 +02:00
Claudio Atzori
8220e27110
Merge pull request 'Align Solr JSON records to the explore portal requirements' ( #448 ) from json_payload into beta_to_master_may2024
Reviewed-on: #448
2024-06-25 09:57:40 +02:00
Claudio Atzori
bc993d49c1
Update pom.xml
depend on released schema version
2024-06-25 09:57:06 +02:00
Claudio Atzori
1dc7458de2
added JSON payload to the SolrInputDocument, updated unit tests
2024-06-24 14:48:09 +02:00
Claudio Atzori
a7a54aab47
WIP: align Solr JSON records to the explore portal requirements
2024-06-20 15:48:45 +02:00
Miriam Baglioni
eaa00a4199
[IrishFunderList]made changes according to 9635 comments 20, 21, 22 and 23
2024-06-20 12:32:57 +02:00
Claudio Atzori
fb731b6d46
WIP: align Solr JSON records to the explore portal requirements
2024-06-19 15:38:43 +02:00
Miriam Baglioni
b6da35e736
[IrishFunderList]made changes according to 9635 comments 14, 15 and 16
2024-06-19 11:06:58 +02:00
Lampros Smyrnaios
3c9b8de892
Miscellaneous updates to the copying operation to Impala Cluster:
- Fix not breaking out of the VIEWS-infinite-loop when the "SHOULD_EXIT_WHOLE_SCRIPT_UPON_ERROR" is set to "false".
- Exit the script when no HDFS-active-node was found, independently of the "SHOULD_EXIT_WHOLE_SCRIPT_UPON_ERROR".
- Fix view_name-recognition in a log-message, by using the more advanced "Perl-Compatible Regular Expressions" in "grep".
- Add error-handling for "compute stats" errors.
2024-06-18 15:59:34 +02:00
Antonis Lempesis
c67ef157d3
filtering out deletedbyinference and invisible results from accessroute
2024-06-18 15:59:00 +02:00
Lampros Smyrnaios
c23f3031ed
Miscellaneous updates to the copying operation to Impala Cluster:
- Show some counts and the elapsed time for various sub-tasks.
- Code polishing.
2024-06-18 15:58:46 +02:00
Claudio Atzori
8ec151aa3d
[graph indexing] comment out setting the JSON payload from the SolrInputDocuments
2024-06-18 15:53:24 +02:00
Claudio Atzori
2636936162
[IE OAI-PMH] fixed oozie wf definition
2024-06-14 11:47:37 +02:00
Miriam Baglioni
ef437a8cdf
[Provision]temporarily removed Json payload from indexed records (Shadow cannot support it)
2024-06-13 16:48:03 +02:00
Miriam Baglioni
86088ef26e
Merge remote-tracking branch 'origin/beta_to_master_may2024' into beta_to_master_may2024
2024-06-11 17:04:07 +02:00
Miriam Baglioni
143c525343
[WebCrawl]remove relations for pid not doi
2024-06-11 17:03:59 +02:00
Claudio Atzori
c371513d43
[graph resolution] use sparkExecutorMemory to define also the memoryOverhead
2024-06-11 14:21:01 +02:00
Claudio Atzori
71927ca818
avoid NPEs
2024-06-11 12:40:50 +02:00
Giambattista Bloisi
46018dc804
Fix OperationUnsupportedException while merging two Result's contexts due to modification of an immutable collection
2024-06-11 10:39:48 +02:00
Miriam Baglioni
3efd5b1308
[SDGActionSet]remove datainfo for the result. It is not needed (qualifier.classid = UPDATE) and is useless since subjects do not go at the level of the instance
2024-06-11 10:35:57 +02:00
Miriam Baglioni
196fa55774
Merge remote-tracking branch 'origin/beta_to_master_may2024' into beta_to_master_may2024
2024-06-11 10:26:24 +02:00
Miriam Baglioni
50805e3fc1
[FoSActionSet]remove datainfo for the result. It is not needed (qualifier.classid = UPDATE) and is useless since subjects do not go at the level of the instance
2024-06-11 10:25:46 +02:00
Claudio Atzori
d39a1054b8
[actionset promotion] use sparkExecutorMemory to define also the memoryOverhead
2024-06-10 16:15:07 +02:00
Claudio Atzori
576efc1857
hostedby patching to work with the updated Crossref contents
2024-06-10 15:22:33 +02:00
Claudio Atzori
efc1632e16
code formatting
2024-06-06 09:25:26 +02:00
Claudio Atzori
91b49366c6
[graph provision] align serialisation of the usage count measures to the agreed specifications
2024-06-05 16:34:40 +02:00
Claudio Atzori
5e05385d35
minor
2024-06-05 16:31:58 +02:00
Miriam Baglioni
c4d9b5b9d2
[downloadsAndViews]update the test file to consider the new serialization for downloads and views
2024-06-05 16:30:15 +02:00
Miriam Baglioni
bf9a5e6314
[downloadsAndViews]changed the test file to check the indicators are not there if their value is 0
2024-06-05 16:29:40 +02:00
Miriam Baglioni
9d79ddb3dd
[bulkTag] fixed issue that made project disappear in graph_10_enriched
2024-06-05 16:20:40 +02:00
Miriam Baglioni
907aa28c6c
[downloadsAndViews] fixed issue
2024-06-05 16:19:29 +02:00
Miriam Baglioni
3955ceaa76
[downloadsAndViews] changed the serialization for downloads and views
2024-06-05 16:18:46 +02:00
Miriam Baglioni
128c143394
[downloadsAndViews] extended test file with measures for downloads and views
2024-06-05 16:17:59 +02:00
Claudio Atzori
5133993ee5
Merge branch 'beta_to_master_may2024' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta_to_master_may2024
2024-06-05 12:17:48 +02:00
Claudio Atzori
5cf259a851
[graph2hive] use sparkExecutorMemory to define also the memoryOverhead
2024-06-05 12:17:16 +02:00
Claudio Atzori
e1828fc60e
Merge pull request '[PROD] Irish oaipmh exporter' ( #444 ) from irish-oaipmh-exporter into beta_to_master_may2024
Reviewed-on: #444
2024-06-05 10:56:20 +02:00
Claudio Atzori
81090ad593
[IE OAIPMH] added oozie workflow, minor changes, code formatting
2024-06-05 10:03:33 +02:00
Claudio Atzori
56920b447d
Merge pull request 'Fix for missing collectedfrom after dedup' ( #442 ) from fix_mergedcliquesort into beta_to_master_may2024
Reviewed-on: #442
2024-06-03 15:34:01 +02:00
Giambattista Bloisi
3feab5d92d
Fix MergeUtils.mergeGroup: it could get rid of some records and did not consider all PID authorities while sorting records.
ResultTypeComparator is now renamed to MergeEntitiesComparator and can be used as a general comparator for merging groups of records
2024-06-03 15:13:40 +02:00
Claudio Atzori
6be783caec
[graph cleaning] use sparkExecutorMemory to define also the memoryOverhead
2024-05-29 14:36:49 +02:00
Claudio Atzori
b703f94f09
Merge pull request 'changes in copy script - beta2master' ( #439 ) from antonis.lempesis/dnet-hadoop:beta into beta_to_master_may2024
Reviewed-on: #439
2024-05-29 14:29:26 +02:00