Orcid Update Procedure
#394
Merged
sandro.labruzzo
merged 13 commits from orcid_update
into beta
2 months ago
This pull request implements a new procedure to update the ORCID table from the ORCID APIs. The previous procedure was too complicated: it used Spark executors to download from the APIs and offered no way to control the request limit.
The new procedure starts a number of threads lower than the ORCID API's per-second request rate limit. Each thread sleeps when necessary so that it never makes more than one request per second.
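The per-thread pacing described above could look roughly like the sketch below. It is only an illustration, not the code in this PR: `RequestPacer`, `remainingDelayMillis` and `paced` are hypothetical names. The idea is that if each thread waits out the remainder of a one-second slot after every request, then N threads together issue at most N requests per second, so keeping N below the API limit bounds the total rate.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not from the PR): paces one thread to at most one request per second. */
final class RequestPacer {

    // Assumed slot length: one request per second per thread, as stated in the description.
    static final long MIN_INTERVAL_MILLIS = 1000L;

    /** Milliseconds still to wait after a request that took elapsedMillis. */
    static long remainingDelayMillis(long elapsedMillis) {
        return Math.max(0L, MIN_INTERVAL_MILLIS - elapsedMillis);
    }

    /** Runs the request, then sleeps for whatever is left of the one-second slot. */
    static void paced(Runnable request) throws InterruptedException {
        long start = System.nanoTime();
        request.run();
        long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        long delay = remainingDelayMillis(elapsed);
        if (delay > 0) {
            Thread.sleep(delay);
        }
    }
}
```

A request that already takes a second or longer incurs no extra sleep, which matches the "possible sleep" wording above.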
The new procedure is more efficient and easier to use than the previous version.
Furthermore, before applying the updates, the procedure checks that there is no decrease with respect to the original table, and raises an exception if there is one.
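A minimal sketch of such a guard, assuming that "decrease" refers to the number of records in the table (the description does not say which quantity is compared). `UpdateGuard` and `ensureNoDecrease` are hypothetical names, not identifiers from this PR.

```java
/** Hypothetical guard (not from the PR): refuses an update that would shrink the table. */
final class UpdateGuard {

    /**
     * Assumption: the compared quantity is a record count.
     * Throws if the updated table has fewer records than the original one.
     */
    static void ensureNoDecrease(long originalCount, long updatedCount) {
        if (updatedCount < originalCount) {
            throw new IllegalStateException(
                "Updated table has " + updatedCount
                    + " records, fewer than the original " + originalCount);
        }
    }
}
```

Calling the guard before the updated table replaces the original one means a failed check leaves the original untouched.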
Can you summarize how the caller of the workflow can control the request limit?
@ -0,0 +155,4 @@
throw new RuntimeException(e);
}
});
queue.put(ORCIDWorker.JOB_COMPLETE);
I read this line as a request to shut down one worker.
I would instead expect 22 of those puts, to shut down all the workers. I would also expect this to happen once the outermost while loop has finished, that is, before joining the workers.
When an ORCIDWorker encounters a JOB_COMPLETE message, it re-inserts the same object into the queue before shutting itself down. Do you think this behavior is incorrect, or could it lead to deadlocks or other types of bugs?
A deadlock is possible if this thread (or, in general, any producer of messages) continues to put other messages after putting the JOB_COMPLETE message. What could happen is that the last worker takes the last re-routed JOB_COMPLETE message and, before it can reinsert the message, gets pre-empted while the producers fill the queue completely. At that point the last reinsert attempt will block forever, because the queue is full and nobody is consuming from it.
If I'm reading the code correctly, the JOB_COMPLETE should in this case be put outside the while loop, which also covers the cases where the tar is empty or the last entry of the tar is not a file.
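The alternative raised in this thread (one shutdown message per worker, sent only after the producer loop has ended) can be sketched as follows. This is not the code in the PR, which re-inserts the message instead. `PoisonPillDemo` and its `JOB_COMPLETE` constant are hypothetical stand-ins for the real worker and `ORCIDWorker.JOB_COMPLETE`, and the work items are plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical demo (not from the PR): one shutdown sentinel per worker, sent after all work. */
final class PoisonPillDemo {

    // Sentinel compared by identity; stands in for ORCIDWorker.JOB_COMPLETE.
    static final String JOB_COMPLETE = new String("JOB_COMPLETE");

    /** Processes items with workerCount threads and returns how many items were consumed. */
    static int run(List<String> items, int workerCount, int queueCapacity) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(queueCapacity);
        AtomicInteger processed = new AtomicInteger();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < workerCount; i++) {
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        String item = queue.take();
                        if (item == JOB_COMPLETE) {
                            return; // each sentinel stops exactly one worker, no re-insertion
                        }
                        processed.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            worker.start();
            workers.add(worker);
        }
        try {
            for (String item : items) {
                queue.put(item);
            }
            // Outside the producer loop: runs even when there are no items at all.
            for (int i = 0; i < workerCount; i++) {
                queue.put(JOB_COMPLETE);
            }
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while feeding workers", e);
        }
        return processed.get();
    }
}
```

Because nothing is put on the queue after the sentinels, and workers keep consuming until each has taken one, the producer's puts cannot block indefinitely even with a small queue capacity.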
Thanks for your review, Jedi Master. I've updated the code.
I think I can merge the request.
@claudio.atzori I've rewritten the PR description, I hope it's clearer now.
e468e99100