UnknownHostException handling for orcid collector api #141
Reference: D-Net/dnet-hadoop#141
This PR is for the ORCID collector API. The workflow (wf) fails when an UnknownHostException occurs for the api.orcid.org hostname during a single download task execution.
This problem does not always occur.
With this update the wf continues instead of failing.
Looking at the files changed, it seems that in case of an UnknownHostException the procedure will return whatever is held in the downloaded variable, as a Tuple2. Considering that this kind of exception could be caused by a transient network issue (e.g. with DNS resolution), wouldn't this method cause loss of data by skipping a portion of the records? If so, it would be better to implement a backoff/retry mechanism.

I implemented a retry mechanism that takes action when a connection error occurs. The core mechanism is taken from https://code-repo.d4science.org/D-Net/dnet-hadoop/src/branch/beta/dhp-common/src/main/java/eu/dnetlib/dhp/common/collection/HttpConnector2.java and readjusted with a different authentication system and report handling.
I have already run the wf with the modified download tasks on a small subset of entries, and it worked.
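
For readers unfamiliar with the pattern, the backoff/retry idea discussed above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR and not the logic of HttpConnector2: the class name RetrySketch, the method withRetry, and its parameters are all hypothetical. It assumes that any IOException (of which UnknownHostException is a subclass) is worth retrying, and that the delay doubles after each failed attempt.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper, for illustration only; names are not taken from dnet-hadoop.
final class RetrySketch {

    private RetrySketch() {
    }

    /**
     * Runs the task and retries it when an IOException is thrown
     * (java.net.UnknownHostException is a subclass of IOException).
     * The wait between attempts doubles each time: base, 2*base, 4*base, ...
     * If every attempt fails, the last IOException is rethrown so the caller
     * can fail the task instead of silently returning partial data.
     */
    static <T> T withRetry(Callable<T> task, int maxAttempts, long baseDelayMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    // Exponential backoff before the next attempt.
                    long delayMs = baseDelayMs << (attempt - 1);
                    Thread.sleep(delayMs);
                }
            }
        }
        // Reached only after maxAttempts failures, so last is non-null here.
        throw last;
    }
}
```

A download task would wrap its HTTP call in withRetry, for example with 5 attempts and a base delay of a few seconds, so that a transient DNS failure delays the record rather than dropping it.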