Claudio Atzori
|
cadb5a42c2
|
removed spark.shuffle.sort.bypassMergeThreshold and increased spark.yarn.executor.memoryOverhead to 6G
|
2024-02-15 14:11:25 +01:00 |
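For context on the memoryOverhead change: on YARN, Spark requests each executor container as `spark.executor.memory` plus the off-heap overhead, which in Spark 2.x defaults to 10% of executor memory with a 384 MiB floor. The sketch below models only that arithmetic. The 8G executor size is an assumed example value (not taken from the workflow), the helper name is hypothetical, and YARN may further round the request up to its allocation increment.

```python
from typing import Optional


def yarn_container_request_mib(executor_memory_mib: int,
                               memory_overhead_mib: Optional[int] = None) -> int:
    """Memory requested from YARN per executor (heap + overhead).

    Hypothetical helper; ignores YARN's rounding to the scheduler's
    minimum allocation increment.
    """
    if memory_overhead_mib is None:
        # Spark 2.x default: 10% of executor memory, but at least 384 MiB.
        memory_overhead_mib = max(384, int(0.10 * executor_memory_mib))
    return executor_memory_mib + memory_overhead_mib


executor_mib = 8 * 1024  # assumed spark.executor.memory=8G
print(yarn_container_request_mib(executor_mib))            # 9011 (default overhead)
print(yarn_container_request_mib(executor_mib, 6 * 1024))  # 14336 (memoryOverhead=6G)
```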
Claudio Atzori
|
ef7335c867
|
setting spark.shuffle.sort.bypassMergeThreshold equal to spark.sql.shuffle.partitions, see https://spark.apache.org/docs/2.4.0/configuration.html#shuffle-behavior
|
2024-02-13 14:18:25 +01:00 |
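Per the linked Spark 2.4 docs, the sort-based shuffle skips merge-sorting only when there is no map-side aggregation and the number of reduce partitions is at most `spark.shuffle.sort.bypassMergeThreshold` (default 200). The sketch below mirrors that decision rule in plain Python. The function name is hypothetical (not a Spark API), and the partition count of 7680 is an assumed example value for `spark.sql.shuffle.partitions`.

```python
def uses_bypass_merge_sort(num_reduce_partitions: int,
                           map_side_combine: bool,
                           bypass_merge_threshold: int = 200) -> bool:
    """Model of the check Spark's sort shuffle performs before choosing
    the bypass-merge-sort writer (hypothetical helper)."""
    # Map-side aggregation always forces the regular sort-based writer.
    if map_side_combine:
        return False
    # Otherwise bypass merge-sorting when there are few enough reduce partitions.
    return num_reduce_partitions <= bypass_merge_threshold


sql_shuffle_partitions = 7680  # assumed value of spark.sql.shuffle.partitions
print(uses_bypass_merge_sort(sql_shuffle_partitions, False))  # False with default threshold
print(uses_bypass_merge_sort(sql_shuffle_partitions, False,
                             bypass_merge_threshold=sql_shuffle_partitions))  # True
```

Setting the threshold equal to the SQL shuffle partition count makes every non-aggregating shuffle take the bypass path. That writer keeps a separate serializer and file stream open per reduce partition, so it becomes costly when the partition count is large.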
Claudio Atzori
|
1416f16b35
|
[graph raw] fixed mapping of the original resource type from the Datacite format
|
2024-02-09 10:19:53 +01:00 |
Giambattista Bloisi
|
ba1a0e7b4f
|
Merge pull request 'Set deletedbyinference =true to dedup aliases, created when a dedup in a previous build has been merged in a new dedup' (#392) from fix_dedupaliases_deletedbyinference into master
Reviewed-on: D-Net/dnet-hadoop#392
|
2024-02-08 15:29:29 +01:00 |
Giambattista Bloisi
|
079085286c
|
Merge branch 'master' into fix_dedupaliases_deletedbyinference
|
2024-02-08 15:29:13 +01:00 |
Giambattista Bloisi
|
8dd666aedd
|
Dedup aliases, created when a dedup in a previous build has been merged in a new dedup, need to be marked as "deletedbyinference", since they are "merged" in the new dedup
|
2024-02-08 15:27:57 +01:00 |
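The rule described in the commit above can be sketched as follows. This is an illustration, not the dnet-hadoop implementation. It assumes each record is a dict with hypothetical `id` and `deletedbyinference` keys, and that merge relations are given as hypothetical `(new_dedup_id, merged_id)` pairs.

```python
from typing import Dict, Iterable, List, Tuple


def mark_merged_aliases(records: List[Dict],
                        merges: Iterable[Tuple[str, str]]) -> List[Dict]:
    """Return copies of the records in which every record merged into a new
    dedup (e.g. a dedup alias from a previous build) has deletedbyinference=True.

    Record layout and merge-pair format are assumptions for illustration.
    """
    merged_ids = {merged for _new_dedup, merged in merges}
    result = []
    for rec in records:
        updated = dict(rec)  # do not mutate the caller's records
        if rec["id"] in merged_ids:
            updated["deletedbyinference"] = True
        result.append(updated)
    return result


records = [
    {"id": "old_dedup", "deletedbyinference": False},  # alias from a previous build
    {"id": "new_dedup", "deletedbyinference": False},  # dedup created in this build
]
merges = [("new_dedup", "old_dedup")]
for r in mark_merged_aliases(records, merges):
    print(r["id"], r["deletedbyinference"])  # old_dedup True / new_dedup False
```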
Claudio Atzori
|
f21133229a
|
Merge pull request 'Support for the PromoteAction strategy [master]' (#391) from promote_actions_join_type_master into master
Reviewed-on: D-Net/dnet-hadoop#391
|
2024-02-08 15:12:16 +01:00 |
Claudio Atzori
|
d86b909db2
|
[actionsets] fixed join type
|
2024-02-08 15:10:55 +01:00 |
Claudio Atzori
|
08162902ab
|
[actionsets] introduced support for the PromoteAction strategy
|
2024-02-08 15:10:40 +01:00 |
Claudio Atzori
|
e8630a6d03
|
[graph cleaning] rule out datasources without an officialname
|
2024-02-05 14:59:06 +02:00 |
Claudio Atzori
|
f28c63d5ef
|
[orcid enrichment] fixed directory cleanup before distcp
|
2024-02-05 09:44:56 +02:00 |
Claudio Atzori
|
1a8b609ed2
|
code formatting
|
2024-01-30 11:34:16 +01:00 |
Miriam Baglioni
|
4c8706efee
|
[orcid-enrichment] change the value of parameters.
|
2024-01-29 18:21:36 +01:00 |
Claudio Atzori
|
4d0c59669b
|
merged changes from beta
|
2024-01-26 16:08:54 +01:00 |
Claudio Atzori
|
bf99c424fa
|
Merge pull request 'Fixed problem on missing author in crossref Mapping' (#383) from crossref_missing_author_fix into beta
Reviewed-on: D-Net/dnet-hadoop#383
|
2024-01-26 15:57:23 +01:00 |
Claudio Atzori
|
ce3200263e
|
Merge branch 'beta' into crossref_missing_author_fix
|
2024-01-26 15:57:04 +01:00 |
Sandro La Bruzzo
|
e889808daa
|
Fixed problem on missing author in crossref Mapping
|
2024-01-26 12:19:04 +01:00 |
Claudio Atzori
|
9e8fc6aa88
|
[collection] increased logging from the oai-pmh metadata collection process
|
2024-01-26 09:17:20 +01:00 |
Antonis Lempesis
|
a7115cfa9e
|
max mem of joins (hive.mapjoin.followby.gby.localtask.max.memory.usage) now 80%, up from 55%.
|
2024-01-25 15:13:16 +01:00 |
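Roughly, the Hive local task that builds the in-memory hash table for a map join followed by a group-by aborts itself once the fraction of its JVM heap in use exceeds `hive.mapjoin.followby.gby.localtask.max.memory.usage` (Hive's default is 0.55). The sketch below models only that comparison. The helper name and the heap figures are hypothetical example values.

```python
def local_task_aborts(used_heap_mib: float,
                      max_heap_mib: float,
                      max_memory_usage: float) -> bool:
    """Model of the map-join local task memory check (hypothetical helper)."""
    # Fraction of the local task's maximum heap currently in use.
    usage = used_heap_mib / max_heap_mib
    return usage > max_memory_usage


# Assumed example: 3000 MiB used out of a 4096 MiB heap (about 73%).
print(local_task_aborts(3000, 4096, 0.55))  # True: aborts with the old 55% limit
print(local_task_aborts(3000, 4096, 0.80))  # False: fits under the new 80% limit
```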
Claudio Atzori
|
2838a9b630
|
Update 'CONTRIBUTING.md'
|
2024-01-24 16:07:05 +01:00 |
Claudio Atzori
|
da944a5c55
|
Merge pull request 'code of conduct and contributing' (#382) from contributing into beta
Reviewed-on: D-Net/dnet-hadoop#382
|
2024-01-24 15:40:26 +01:00 |
Claudio Atzori
|
0c97a3a81a
|
minor
|
2024-01-24 10:56:33 +01:00 |
Claudio Atzori
|
2c1e6849f0
|
added code of conduct and contributing files
|
2024-01-24 10:36:41 +01:00 |
Claudio Atzori
|
9b13c22e5d
|
[graph provision] retrieve all the context information by adding all=true to the requests issued to the API
|
2024-01-23 15:36:08 +01:00 |
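Adding `all=true` to each request can be done with the standard library's URL helpers, as in the sketch below. The base URL and the `type=community` parameter are placeholders, not the real context API endpoint, and the helper name is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_all_flag(url: str) -> str:
    """Return the URL with all=true added (or overwritten) in its query string.

    Assumes no repeated query keys: parse_qsl output is collapsed into a dict.
    """
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query["all"] = "true"
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(query), parts.fragment))


print(with_all_flag("https://api.example.org/contexts?type=community"))
# https://api.example.org/contexts?type=community&all=true
```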
Claudio Atzori
|
3e96777cc4
|
[collection] increased logging from the oai-pmh metadata collection process
|
2024-01-23 15:21:03 +01:00 |
Claudio Atzori
|
9812406589
|
Merge pull request '[graph provision] updated param specification for the XML converter job' (#380) from provision_community_api into beta
Reviewed-on: D-Net/dnet-hadoop#380
|
2024-01-23 08:55:59 +01:00 |
Claudio Atzori
|
f87f3a6483
|
[graph provision] updated param specification for the XML converter job
|
2024-01-23 08:54:37 +01:00 |
Claudio Atzori
|
6fd25cf549
|
code formatting
|
2024-01-23 08:47:12 +01:00 |
Claudio Atzori
|
bd187ec6e7
|
Merge pull request 'Implements pivots table update oozie workflow' (#376) from update_pivots_table into beta
Reviewed-on: D-Net/dnet-hadoop#376
|
2024-01-22 16:37:30 +01:00 |
Claudio Atzori
|
f76852f385
|
Merge branch 'beta' into update_pivots_table
|
2024-01-22 16:37:22 +01:00 |
Claudio Atzori
|
b9fcc5ad5e
|
Merge pull request 'Context API update' (#379) from provision_community_api into beta
Reviewed-on: D-Net/dnet-hadoop#379
|
2024-01-22 15:55:33 +01:00 |
Claudio Atzori
|
1c6db320f4
|
[graph provision] obtain context info from the context API instead of the ISLookUp service
|
2024-01-22 15:53:17 +01:00 |
Claudio Atzori
|
2655eea5bc
|
[orcid enrichment] drop paths before copying the non-modified contents
|
2024-01-19 16:28:05 +01:00 |
Claudio Atzori
|
c6b3401596
|
increased shuffle partitions for publications in the country propagation workflow
|
2024-01-19 10:15:39 +01:00 |
Miriam Baglioni
|
bcc0a13981
|
[enrichment single step] adding <end> element in wf definition
|
2024-01-18 17:39:14 +01:00 |
Miriam Baglioni
|
6af536541d
|
[enrichment single step] moving parameter file to the correct location
|
2024-01-18 15:35:40 +01:00 |
Miriam Baglioni
|
a12a3eb143
|
-
|
2024-01-18 15:18:10 +01:00 |
Claudio Atzori
|
628fdfb5eb
|
Merge pull request '[enrichment single step]' (#378) from enrichmentSingleStepFixed into beta
Reviewed-on: D-Net/dnet-hadoop#378
|
2024-01-18 09:41:09 +01:00 |
Miriam Baglioni
|
82e9e262ee
|
[enrichment single step] remove parameter from execution
|
2024-01-17 17:38:03 +01:00 |
Miriam Baglioni
|
67ce2d54be
|
[enrichment single step] refactoring to fix issues in disappeared result type
|
2024-01-17 16:50:00 +01:00 |
Miriam Baglioni
|
59eaccbd87
|
[enrichment single step] refactoring to fix issue in disappeared result type
|
2024-01-15 17:49:54 +01:00 |
Giambattista Bloisi
|
21a14fcd80
|
Reusable RunSQLSparkJob for executing SQL in Spark through Oozie Spark Actions
Implements pivots table update oozie workflow
|
2024-01-15 10:18:14 +01:00 |
Claudio Atzori
|
2d302e6827
|
Merge pull request '[FoS integration]fix issue on FoS integration. Removing the null values from FoS' (#375) from fosPreparationBeta into beta
Reviewed-on: D-Net/dnet-hadoop#375
|
2024-01-12 10:27:28 +01:00 |
Miriam Baglioni
|
f612125939
|
fix issue on FoS integration. Removing the null values from FoS
|
2024-01-12 10:20:28 +01:00 |
Claudio Atzori
|
c67467723b
|
Merge pull request 'refined mapping for the extraction of the original resource type' (#374) from resource_types into beta
Reviewed-on: D-Net/dnet-hadoop#374
|
2024-01-11 16:29:47 +01:00 |
Claudio Atzori
|
cb9e739484
|
Merge branch 'beta' into resource_types
|
2024-01-11 16:29:41 +01:00 |
Claudio Atzori
|
2753044d13
|
refined mapping for the extraction of the original resource type
|
2024-01-11 16:28:26 +01:00 |
Giambattista Bloisi
|
a88dce5bf3
|
Merge pull request 'Improvements and refactoring in Dedup' (#367) from dedup_increasenumofblocks into beta
Reviewed-on: D-Net/dnet-hadoop#367
|
2024-01-11 11:24:06 +01:00 |
Giambattista Bloisi
|
3c66e3bd7b
|
Create dedup record for "merged" pivots
Do not create dedup records for groups that have more than 20 different acceptance dates
|
2024-01-10 22:59:52 +01:00 |
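The acceptance-date guard from the commit above can be sketched as a simple predicate over a group of duplicates. The record layout (a hypothetical `dateofacceptance` key) and the choice to ignore missing dates are assumptions. Only the limit of 20 comes from the commit message.

```python
from typing import Dict, List, Optional

MAX_DISTINCT_ACCEPTANCE_DATES = 20  # limit stated in the commit message


def should_create_dedup_record(group: List[Dict[str, Optional[str]]]) -> bool:
    """True if the group has at most 20 distinct acceptance dates.

    Assumption: records without a date are ignored rather than counted.
    """
    distinct_dates = {r.get("dateofacceptance") for r in group
                      if r.get("dateofacceptance")}
    return len(distinct_dates) <= MAX_DISTINCT_ACCEPTANCE_DATES


small = [{"id": str(i), "dateofacceptance": "2020-01-01"} for i in range(3)]
large = [{"id": str(d), "dateofacceptance": f"2020-01-{d:02d}"} for d in range(1, 22)]
print(should_create_dedup_record(small))  # True: one distinct date
print(should_create_dedup_record(large))  # False: 21 distinct dates
```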
Giambattista Bloisi
|
10e135db1e
|
Use dedup_wf_002 in place of dedup_wf_001 to make explicit that a different algorithm has been used to generate this kind of id
|
2024-01-10 22:59:52 +01:00 |