UnknowHostException handling for orcid collector api #141

Merged
claudio.atzori merged 4 commits from enrico.ottonello/dnet-hadoop:beta into beta 2021-09-22 11:51:14 +02:00

This PR concerns the ORCID collector API. The workflow fails when an `UnknownHostException` occurs for the api.orcid.org hostname during a single download task execution.
This problem does not occur on every run.
With this update the workflow continues instead of failing.

enrico.ottonello added 2 commits 2021-09-14 18:03:38 +02:00
enrico.ottonello changed title from beta to UnknowHostException handling for orcid collector api 2021-09-15 10:32:14 +02:00

Looking at the files changed, it seems that in case of an `UnknownHostException` the procedure will return whatever is held in the `downloaded` variable, as a `Tuple2`. Considering that this kind of exception could be caused by a transient network issue (e.g. with DNS resolution), wouldn't this method cause loss of data, skipping a portion of the records? If so, it would be better to implement a backoff/retry mechanism.
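
To make the suggestion concrete, here is a minimal sketch of a backoff/retry wrapper. It is not the code from this PR or from `HttpConnector2`: the class name `RetryingDownloader`, the method `withRetry`, and the parameters `maxAttempts` and `initialDelayMs` are all hypothetical, and the download itself is simulated. The idea is to retry only on `UnknownHostException`, doubling the wait between attempts, and to rethrow once the attempts are exhausted so the failure is visible rather than silently skipping records.

```java
import java.net.UnknownHostException;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper for illustration; not part of dnet-hadoop.
public class RetryingDownloader {

    /**
     * Runs the task, retrying with exponential backoff when it fails with
     * UnknownHostException. Any other exception propagates immediately.
     */
    public static <T> T withRetry(Callable<T> task, int maxAttempts, long initialDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delayMs = initialDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return task.call();
            } catch (UnknownHostException e) {
                if (attempt >= maxAttempts) {
                    // Give up: surface the failure instead of returning partial data.
                    throw e;
                }
                Thread.sleep(delayMs); // wait before the next attempt
                delayMs *= 2;          // exponential backoff
            }
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger calls = new AtomicInteger();
        // Simulated download: fails twice with a DNS error, then succeeds.
        String body = withRetry(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new UnknownHostException("api.orcid.org");
            }
            return "<record/>";
        }, 5, 10L);
        System.out.println(body + " after " + calls.get() + " attempts");
    }
}
```

In a real download task the lambda would wrap the HTTP call to the ORCID API. Whether to rethrow after the last attempt or to record the failed identifier for later reprocessing is a design choice left open here.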
enrico.ottonello added 2 commits 2021-09-20 18:25:33 +02:00
Author

I implemented a retry mechanism that is triggered when a connection error occurs. The core mechanism is taken from https://code-repo.d4science.org/D-Net/dnet-hadoop/src/branch/beta/dhp-common/src/main/java/eu/dnetlib/dhp/common/collection/HttpConnector2.java and adapted to use a different authentication system and report handling.
I have already run the workflow with the modified download tasks on a small subset of entries, and it worked.

claudio.atzori merged commit df15a4dc9f into beta 2021-09-22 11:51:14 +02:00
Reference: D-Net/dnet-hadoop#141