Update README.md

Lampros Smyrnaios 2024-10-25 21:44:27 +02:00
parent e6d6382bd0
commit 776243e6d5
1 changed file with 2 additions and 2 deletions


@@ -6,8 +6,8 @@ in order to find whether that PID exists in the aggregator's DB and has full-tex
In detail, it does the following:
- extracts the PIDs (DOIs and PMIDs) from a JSON file
- if a "previous-results" file is provided, extracts the PIDs from it as well and reduces the original input to the PIDs which have not been processed before
- splits them into batches and, for each batch, submits each PID-evaluation job to a "ThreadPoolExecutor" with 12 threads
- for each PID pair, queries Impala to quickly retrieve the following fields: "dedupid", "id", "pid", "pid_type", "fulltext_url", "location"
- saves the results in a JSON file, including the PID that was checked (for example, if a record has both a "doi" and a "pmid" and a full text was detected at least for the "doi", then the output record uses the "doi" as its PID)
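The input-filtering and batching steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the tool's actual code: the JSON layouts of the input and "previous-results" files, the batch size, the rule that a record is skipped when any of its PIDs was already processed, and the `evaluate` callable (a hypothetical stand-in for the per-PID Impala check) are all assumptions. Python's `concurrent.futures.ThreadPoolExecutor` is used as an analogue of the executor the README mentions; only the pool size of 12 threads comes from the description.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, Optional

NUM_THREADS = 12   # pool size stated in the README
BATCH_SIZE = 1000  # hypothetical; the README does not give a batch size


def load_records(path: str) -> list[dict]:
    """Assumed input layout: a JSON array of objects with optional "doi"/"pmid" keys."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def processed_pids(path: Optional[str]) -> set[str]:
    """Collect PIDs from an optional "previous-results" file.

    Assumed layout: a JSON array of objects, each with a "pid" key.
    """
    if path is None:
        return set()
    with open(path, encoding="utf-8") as f:
        return {rec["pid"] for rec in json.load(f) if rec.get("pid")}


def remaining_records(records: list[dict], done: set[str]) -> list[dict]:
    """Keep only records none of whose PIDs appear in `done` (assumed rule)."""
    kept = []
    for rec in records:
        pids = {p for p in (rec.get("doi"), rec.get("pmid")) if p}
        if pids.isdisjoint(done):
            kept.append(rec)
    return kept


def batches(items: list, size: int) -> Iterator[list]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def evaluate_all(records: list[dict],
                 evaluate: Callable[[dict], Optional[dict]],
                 batch_size: int = BATCH_SIZE) -> list[dict]:
    """Run `evaluate` (hypothetical per-record check) on a 12-thread pool,
    one batch at a time, collecting the non-None results in input order."""
    results = []
    with ThreadPoolExecutor(max_workers=NUM_THREADS) as pool:
        for batch in batches(records, batch_size):
            futures = [pool.submit(evaluate, rec) for rec in batch]
            # Wait for the whole batch before submitting the next one.
            for fut in futures:
                res = fut.result()
                if res is not None:
                    results.append(res)
    return results
```

The steps chain as `evaluate_all(remaining_records(load_records(input_path), processed_pids(prev_path)), check)`, where `input_path`, `prev_path` and `check` are placeholders. Waiting on every future of a batch before submitting the next keeps at most one batch of jobs queued at a time; whether the real tool batches for this reason is an assumption.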
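The choice of PID for each output record can be illustrated with a small, hypothetical helper. It assumes that `lookup(pid)` stands in for the Impala query and returns a dict with the fields listed above (or `None`), that a full text counts as "detected" when `fulltext_url` is non-empty, and that the PMID is used only when the DOI has no full text; the README itself only states the DOI-first case.

```python
import json
from typing import Callable, Optional

# Hypothetical row type: a dict with the README's fields
# ("dedupid", "id", "pid", "pid_type", "fulltext_url", "location").
Row = dict


def select_output_record(record: dict,
                         lookup: Callable[[str], Optional[Row]]) -> Optional[Row]:
    """Return the row to write for this record, or None if no full text was found.

    The DOI is checked before the PMID, so a record whose DOI has a full text
    is reported under its DOI. Falling back to the PMID is an assumption.
    """
    for pid_type in ("doi", "pmid"):
        pid = record.get(pid_type)
        if not pid:
            continue
        row = lookup(pid)  # stand-in for the Impala query
        # Assumption: "full text detected" means a non-empty fulltext_url.
        if row and row.get("fulltext_url"):
            return row
    return None


def save_results(rows: list[Row], path: str) -> None:
    """Write the collected rows to a JSON file as a single JSON array."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(rows, f, indent=2)
```

Because the returned row already carries its own "pid" and "pid_type", the output record records which identifier the match was found for.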