Sandro La Bruzzo
cbd4e5e4bb
update mag mapping
2024-03-08 16:31:40 +01:00
Sandro La Bruzzo
d34cef3f8d
Merge remote-tracking branch 'origin/beta' into doidoost_dismiss
2024-03-05 11:45:31 +01:00
Sandro La Bruzzo
3b837d38ce
added oozie workflow
2024-03-05 11:44:59 +01:00
Sandro La Bruzzo
f417515e43
Implemented class that generates a normalized table of MAG, which is the starting point for the creation of the mag source
2024-03-04 17:15:13 +01:00
Sandro La Bruzzo
ad0e9aa80c
added first part of refactoring of the code generating MAG,
make it more readable using spark sql queries
2024-02-29 18:16:15 +01:00
Sandro La Bruzzo
9d94648f3b
code formatted
2024-02-29 18:15:20 +01:00
Giambattista Bloisi
3cd5590f3b
When converting json to XML, remove characters that are not allowed in the XML 1.0 specs, as they will cause xpath failures even if escaped
2024-02-28 15:14:18 +01:00
Giambattista Bloisi
56dd05f85c
Merge pull request 'Revised procedure when converting json data into xml' ( #395 ) from restiterator_xmlcleanup into beta
Reviewed-on: #395
2024-02-28 10:38:54 +01:00
Sandro La Bruzzo
7d806a434c
formatted code
2024-02-28 09:31:58 +01:00
Sandro La Bruzzo
e468e99100
Merge pull request 'Orcid Update Procedure' ( #394 ) from orcid_update into beta
Reviewed-on: #394
2024-02-28 09:17:30 +01:00
Sandro La Bruzzo
b63994dcc4
Merge remote-tracking branch 'origin/beta' into orcid_update
2024-02-28 09:11:18 +01:00
Sandro La Bruzzo
915a76a796
following the comment on the pull requests:
- Added #NUM_OF_THREADS "complete" jobs to the queue at the end of the main loop to avoid deadlock
2024-02-28 09:10:55 +01:00
Giambattista Bloisi
773e856550
Revised procedure when converting json data into xml:
- json object keys are renamed to be conformant to xml tag elements, special characters are substituted or removed
- json string values are no longer post-processed as they are already escaped by the org.json.XML.toString method
2024-02-24 16:54:30 +01:00
Sandro La Bruzzo
a712df1e1d
Merge remote-tracking branch 'origin/beta' into orcid_update
2024-02-23 10:12:25 +01:00
Sandro La Bruzzo
b32a9d1994
Implemented workflow for updating table, added step to check if the newly generated table is valid
2024-02-23 10:04:28 +01:00
Michele Artini
3268570b2c
mapping of project PIDs
2024-02-22 14:47:21 +01:00
Claudio Atzori
753c2a72bd
Merge pull request 'fix import of ORPs' ( #390 ) from import_orps_fix into beta
Reviewed-on: #390
2024-02-15 15:02:08 +01:00
Claudio Atzori
a63b091bae
Merge branch 'beta' into import_orps_fix
2024-02-15 15:01:56 +01:00
Giambattista Bloisi
85aeff72f1
Merge pull request 'Revised instance type comparisons in dedup phase' ( #393 ) from revisedInstanceType into beta
Reviewed-on: #393
2024-02-15 12:15:37 +01:00
Giambattista Bloisi
d65285da7f
Promote "Research" to a wildcard instanceType in dedup comparisons
Compare "Journal" and "Part of book or chapter of book" with "Article"
2024-02-15 12:11:04 +01:00
Giambattista Bloisi
29194472a7
Promote "Research" to a wildcard instanceType in dedup comparisons
Compare Part of book or chapter of book with Article
2024-02-15 11:53:46 +01:00
Claudio Atzori
d85d2df6ad
[graph raw] fixed mapping of the original resource type from the Datacite format
2024-02-09 10:20:20 +01:00
Giambattista Bloisi
b19643f6eb
Dedup aliases, created when a dedup in a previous build has been merged in a new dedup, need to be marked as "deletedbyinference", since they are "merged" in the new dedup
2024-02-08 15:34:59 +01:00
Claudio Atzori
e6bdee86d1
Merge pull request 'Support for the PromoteAction strategy' ( #389 ) from promote_actions_join_type into beta
Reviewed-on: #389
2024-02-08 15:08:05 +01:00
Claudio Atzori
38c9001147
fixed import of ORPs stored on HDFS in the internal graph format (e.g. Datacite)
2024-02-07 17:02:05 +01:00
Claudio Atzori
fd17c1f17c
[actionsets] fixed join type
2024-02-05 16:55:36 +02:00
Claudio Atzori
009dcf6aea
[actionsets] introduced support for the PromoteAction strategy
2024-02-05 16:43:40 +02:00
Claudio Atzori
bb82052c40
[graph cleaning] rule out datasources without an officialname
2024-02-05 14:59:27 +02:00
Claudio Atzori
42f5506306
[orcid enrichment] fixed directory cleanup before distcp
2024-02-05 09:45:36 +02:00
Alessia Bardi
f2a08d8cc2
test for Italian records from IRS repositories
2024-01-30 19:20:14 +01:00
Miriam Baglioni
a5995ab557
[orcid-enrichment] change the value of parameters.
2024-01-29 18:19:48 +01:00
Claudio Atzori
f804c58bc7
Merge pull request 'Use SparkSQL in place of Hive for executing step16-createIndicatorsTables.sql of stats update wf' ( #386 ) from stats_with_spark_sql into beta
Reviewed-on: #386
2024-01-29 09:11:59 +01:00
Claudio Atzori
926903b06b
Merge branch 'beta' into stats_with_spark_sql
2024-01-29 09:11:45 +01:00
Giambattista Bloisi
078df0b4d1
Use SparkSQL in place of Hive for executing step16-createIndicatorsTables.sql of stats update wf
2024-01-26 21:56:55 +01:00
Claudio Atzori
bf99c424fa
Merge pull request 'Fixed problem on missing author in crossref Mapping' ( #383 ) from crossref_missing_author_fix into beta
Reviewed-on: #383
2024-01-26 15:57:23 +01:00
Claudio Atzori
ce3200263e
Merge branch 'beta' into crossref_missing_author_fix
2024-01-26 15:57:04 +01:00
Sandro La Bruzzo
e889808daa
Fixed problem on missing author in crossref Mapping
2024-01-26 12:19:04 +01:00
Claudio Atzori
9e8fc6aa88
[collection] increased logging from the oai-pmh metadata collection process
2024-01-26 09:17:20 +01:00
Sandro La Bruzzo
0386f36385
Added workflow to update ORCID and replaced some parsing, because the works and employments XML in the update differs from the one in the dump.
2024-01-25 19:40:59 +01:00
Antonis Lempesis
a7115cfa9e
max mem of joins (hive.mapjoin.followby.gby.localtask.max.memory.usage) now 80%, up from 55%.
2024-01-25 15:13:16 +01:00
Claudio Atzori
2838a9b630
Update 'CONTRIBUTING.md'
2024-01-24 16:07:05 +01:00
Claudio Atzori
da944a5c55
Merge pull request 'code of conduct and contributing' ( #382 ) from contributing into beta
Reviewed-on: #382
2024-01-24 15:40:26 +01:00
Claudio Atzori
0c97a3a81a
minor
2024-01-24 10:56:33 +01:00
Claudio Atzori
2c1e6849f0
added code of conduct and contributing files
2024-01-24 10:36:41 +01:00
Claudio Atzori
9b13c22e5d
[graph provision] retrieve all the context information by adding all=true to the requests issued to the API
2024-01-23 15:36:08 +01:00
Claudio Atzori
3e96777cc4
[collection] increased logging from the oai-pmh metadata collection process
2024-01-23 15:21:03 +01:00
Sandro La Bruzzo
43e0bba7ed
logging added during download
2024-01-23 15:04:49 +01:00
Claudio Atzori
9812406589
Merge pull request '[graph provision] updated param specification for the XML converter job' ( #380 ) from provision_community_api into beta
Reviewed-on: #380
2024-01-23 08:55:59 +01:00
Claudio Atzori
f87f3a6483
[graph provision] updated param specification for the XML converter job
2024-01-23 08:54:37 +01:00
Claudio Atzori
6fd25cf549
code formatting
2024-01-23 08:47:12 +01:00