UnknownHostException handling for orcid collector api
#141
Merged
claudio.atzori merged 4 commits from enrico.ottonello/dnet-hadoop:beta into beta 3 years ago
This PR is for the ORCID collector API. The workflow fails when an UnknownHostException occurs for the api.orcid.org hostname during a single download task execution.
The problem does not always occur.
With this update the workflow continues instead of failing.
Title changed from "beta" to "UnknownHostException handling for orcid collector api" 3 years ago.

Looking at the files changed, it seems that in case of UnknownHostException the procedure will return whatever is held in the downloaded variable, as a Tuple2. Considering that this kind of exception could be caused by a transient network issue (e.g. with DNS resolution), wouldn't this method cause loss of data by skipping a portion of the records? If so, it would be better to implement a backoff/retry mechanism.

I implemented a retry mechanism that takes action when a connection error occurs. The core mechanism is taken from https://code-repo.d4science.org/D-Net/dnet-hadoop/src/branch/beta/dhp-common/src/main/java/eu/dnetlib/dhp/common/collection/HttpConnector2.java and readjusted with a different authentication system and report handling.
I have already run the workflow with the modified download tasks on a small subset of entries and it worked.
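
For readers who want the gist of the backoff/retry idea discussed above, here is a minimal sketch. It is not the code from this PR or from HttpConnector2: the class name RetryingDownloader, the method withRetry, and the parameters maxAttempts and baseDelayMs are all hypothetical, and authentication and report handling are left out. It only shows that a transient UnknownHostException can be retried with an increasing delay instead of returning a partial result.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Illustrative only: names and parameters are assumptions, not taken from the PR.
public class RetryingDownloader {

    /**
     * Runs the task, retrying on IOException with exponential backoff.
     * java.net.UnknownHostException is a subclass of IOException, so a
     * failed DNS lookup for api.orcid.org is covered by the same catch.
     */
    public static <T> T withRetry(Callable<T> task, int maxAttempts, long baseDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    // Delay doubles on each failure: base, 2*base, 4*base, ...
                    Thread.sleep(baseDelayMs * (1L << (attempt - 1)));
                }
            }
        }
        // Every attempt failed: rethrow so the caller sees the error
        // rather than silently losing records.
        throw last;
    }
}
```

A download task would wrap its HTTP call in withRetry, so records are skipped only after every attempt has failed, and the failure then surfaces as an exception that can be reported.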
Merged commit df15a4dc9f into beta 3 years ago.