From 776243e6d513cfa0f58158345525c0b316209a00 Mon Sep 17 00:00:00 2001
From: Lampros Smyrnaios
Date: Fri, 25 Oct 2024 21:44:27 +0200
Subject: [PATCH] Update README.md

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index ba7fc52..9652dca 100644
--- a/README.md
+++ b/README.md
@@ -6,8 +6,8 @@ in order to find whether that PID exists in the aggregator's DB and has full-tex
 In detail, it does the following:
 - extracts the pids from a json-file (DOIs and PMIDs)
-- if a "previous-results" file is provided, extracts the pid from there as well and reduced the original input to the pids which have not been processed before.
-- splits them in batches and for each batch it submits each pid-evaluation-job to a "ThreadPoolExecutor", which uses 12 threads.
+- if a "previous-results" file is provided, extracts the pid from there as well and reduces the original input to the pids which have not been processed before.
+- splits them in batches, and for each batch it submits each pid-evaluation-job to a "ThreadPoolExecutor", which uses 12 threads.
 - for each one of the PID-pairs, makes a query with Impala, to quickly acquire the following: "dedupid", "id", "pid", "pid_type", "fulltext_url", "location"
 - saves the results in a json-file, including the pid for which it made the check (for example in case a record has both "doi" and "pmid" and a fulltext was detected for the "doi" (at least), then the output-record has the "doi" as its PID)