Update README.md
Commit 776243e6d5 (parent e6d6382bd0)
@ -6,8 +6,8 @@ in order to find whether that PID exists in the aggregator's DB and has full-tex
In detail, it does the following:
- extracts the PIDs (DOIs and PMIDs) from a JSON file
- if a "previous-results" file is provided, extracts the PIDs from it as well and reduces the original input to the PIDs that have not been processed before.
- splits them into batches and, for each batch, submits each PID-evaluation job to a "ThreadPoolExecutor" that uses 12 threads.
- for each PID pair, runs an Impala query to quickly acquire the following fields: "dedupid", "id", "pid", "pid_type", "fulltext_url", "location"
- saves the results in a JSON file, including the PID that was checked (for example, if a record has both a "doi" and a "pmid" and a full text was detected for at least the "doi", the output record has the "doi" as its PID)
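
The steps above can be sketched in Python roughly as follows. This is a minimal, hypothetical sketch and not the tool's actual implementation. Only the 12-thread pool and the output field names come from the description. The input record layout (`doi`/`pmid` keys), `BATCH_SIZE`, and the `lookup` callable are assumptions. `lookup` stands in for the Impala query. Python's `concurrent.futures.ThreadPoolExecutor` plays the role of the "ThreadPoolExecutor" mentioned above. The sketch checks the DOI before the PMID, which is one way to get the "doi as PID" behaviour described in the last step.

```python
import json
from concurrent.futures import ThreadPoolExecutor

NUM_THREADS = 12   # thread count stated in the README
BATCH_SIZE = 1000  # hypothetical: the README does not give a batch size


def extract_pids(records):
    """Collect the (doi, pmid) pair of each record.

    Assumes each record is a dict with optional "doi" and "pmid" keys
    (hypothetical input layout)."""
    pairs = []
    for rec in records:
        doi, pmid = rec.get("doi"), rec.get("pmid")
        if doi or pmid:
            pairs.append((doi, pmid))
    return pairs


def drop_processed(pairs, previous_results):
    """Keep only the pairs none of whose PIDs appear in the previous results."""
    seen = {res["pid"] for res in previous_results}
    return [p for p in pairs if not any(pid in seen for pid in p if pid)]


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def evaluate_pair(pair, lookup):
    """Check the DOI first, then the PMID; return the first row with a full text.

    `lookup(pid)` is a hypothetical stand-in for the Impala query: it returns a
    dict with "dedupid", "id", "pid", "pid_type", "fulltext_url", "location",
    or None when the PID is not found."""
    for pid in pair:
        if pid is None:
            continue
        row = lookup(pid)
        if row is not None and row.get("fulltext_url"):
            return row
    return None


def run(records, previous_results, lookup, out_path):
    """Filter, batch, evaluate in a 12-thread pool, and save results as JSON."""
    pairs = drop_processed(extract_pids(records), previous_results)
    results = []
    with ThreadPoolExecutor(max_workers=NUM_THREADS) as pool:
        for batch in batches(pairs, BATCH_SIZE):
            futures = [pool.submit(evaluate_pair, pair, lookup) for pair in batch]
            # Collect in submission order, skipping pairs with no full text.
            for fut in futures:
                row = fut.result()
                if row is not None:
                    results.append(row)
    with open(out_path, "w") as fh:
        json.dump(results, fh, indent=2)
    return results
```

To try it without Impala, a plain dictionary's `get` method can be passed as `lookup`, mapping each PID to a row with the fields listed above.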