UnknowHostException handling for orcid collector api #141

Merged
claudio.atzori merged 4 commits from enrico.ottonello/dnet-hadoop:beta into beta 3 years ago
Collaborator

This PR concerns the ORCID collector API. The workflow fails when an UnknownHostException occurs for the api.orcid.org hostname during a single download task execution.
The problem occurs only intermittently.
With this update the workflow continues instead of failing.

enrico.ottonello added 2 commits 3 years ago
enrico.ottonello changed title from beta to UnknowHostException handling for orcid collector api 3 years ago
Owner

Looking at the files changed, it seems that in case of an `UnknownHostException` the procedure will return whatever is held in the `downloaded` variable, as a Tuple2. Considering that this kind of exception could be caused by a transient network issue (e.g. with DNS resolution), wouldn't this method cause loss of data by skipping a portion of the records? If so, it would be better to implement a backoff/retry mechanism.

enrico.ottonello added 2 commits 3 years ago
Poster
Collaborator

I implemented a retry mechanism that takes action when a connection error occurs. The core mechanism is taken from https://code-repo.d4science.org/D-Net/dnet-hadoop/src/branch/beta/dhp-common/src/main/java/eu/dnetlib/dhp/common/collection/HttpConnector2.java and adapted to use a different authentication system and report handling.
I have already run the workflow with the modified download tasks on a small subset of entries, and it worked.
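
For readers unfamiliar with the pattern, here is a minimal sketch of a retry loop with exponential backoff of the kind discussed above. It is not the code merged in this PR and not a copy of HttpConnector2; the class name `RetrySketch`, the method `withRetry`, and its parameters are hypothetical names chosen for illustration. It relies only on the fact that `java.net.UnknownHostException` is a subclass of `IOException`.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper for illustration only; not part of dnet-hadoop.
final class RetrySketch {

    private RetrySketch() {
    }

    /**
     * Runs the task and retries it when it throws an IOException
     * (UnknownHostException is a subclass of IOException).
     * The pause doubles after every failed attempt, starting at baseDelayMs.
     * When all attempts fail, the last IOException is rethrown, so the caller
     * never receives a partial result silently.
     */
    static <T> T withRetry(Callable<T> task, int maxAttempts, long baseDelayMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    // Exponential backoff: baseDelayMs, 2x, 4x, ...
                    Thread.sleep(baseDelayMs << (attempt - 1));
                }
            }
        }
        throw last;
    }
}
```

A download task could wrap its HTTP call as `RetrySketch.withRetry(() -> fetch(url), 5, 1000L)`, where `fetch` stands for whatever method performs the request. Rethrowing after the final attempt, rather than returning the records downloaded so far, is what avoids the silent data loss raised in the review comment.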

claudio.atzori merged commit df15a4dc9f into beta 3 years ago
The pull request has been merged as df15a4dc9f.
Reference: D-Net/dnet-hadoop#141