UrlsController/src/main/java/eu/openaire/urls_controller
Lampros Smyrnaios 8f9786de09 Upgrade the algorithm for finding the previously-found fulltexts, based on their md5hash:
- Use a single query with a list of the fileHashes, instead of thousands of single-md5hash-check queries (run at most 6 in parallel), which require a lot of I/O.
- Avoid checking the same fileHash multiple times, in case it is related to multiple payloads.
- In case of a database error, avoid losing all the full-texts of that worker; instead, continue processing them.
2024-03-13 11:28:37 +02:00
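The three points of this commit message can be illustrated with a small, self-contained sketch: deduplicate the fileHashes once, look them all up with one `IN (...)` query instead of one query per hash, and fall back to an empty result on a database error so processing continues. This is not the project's actual code; the `Payload` class, the `payload` table, the `md5hash`/`location` columns and the `dbLookup` function are hypothetical stand-ins, and whether the real query is parameterized is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

public class FileHashLookupSketch {

    // Hypothetical minimal payload: only the fields needed for the hash lookup.
    public static final class Payload {
        final String id;
        final String fileHash;

        public Payload(String id, String fileHash) {
            this.id = id;
            this.fileHash = fileHash;
        }

        String fileHash() { return fileHash; }
    }

    // Collect each distinct fileHash once (first-seen order), even if several payloads share it.
    static List<String> distinctHashes(List<Payload> payloads) {
        Set<String> seen = new LinkedHashSet<>();
        for (Payload p : payloads)
            if (p.fileHash() != null)
                seen.add(p.fileHash());
        return new ArrayList<>(seen);
    }

    // Build ONE parameterized query for all hashes (table and column names are hypothetical).
    static String buildLookupQuery(int numHashes) {
        String placeholders = String.join(", ", Collections.nCopies(numHashes, "?"));
        return "SELECT md5hash, location FROM payload WHERE md5hash IN (" + placeholders + ")";
    }

    // Run the single lookup; on a database error, return an empty result so the caller
    // keeps processing this worker's full-texts instead of discarding them.
    static Map<String, String> lookupPreviouslyFound(List<String> hashes,
                                                     Function<List<String>, Map<String, String>> dbLookup) {
        if (hashes.isEmpty())
            return Map.of();
        try {
            return dbLookup.apply(hashes);
        } catch (RuntimeException e) {
            System.err.println("Hash lookup failed, continuing without it: " + e.getMessage());
            return Map.of();
        }
    }

    public static void main(String[] args) {
        List<Payload> payloads = List.of(
                new Payload("p1", "aaa"), new Payload("p2", "bbb"), new Payload("p3", "aaa"));
        List<String> hashes = distinctHashes(payloads);
        System.out.println(hashes);                          // [aaa, bbb]
        System.out.println(buildLookupQuery(hashes.size())); // ... IN (?, ?)
        Map<String, String> found = lookupPreviouslyFound(hashes,
                hs -> { throw new IllegalStateException("db down"); });
        System.out.println(found.isEmpty());                 // true
    }
}
```

With a very large number of hashes, a single `IN` list may hit the database's query-size limits, so splitting it into a few large batches could still be needed; the sketch does not address that.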
components - Improve error-handling in "S3ObjectStore.emptyBucket()". 2024-03-11 16:17:32 +02:00
configuration - Rename "ImpalaConnector.java" to "DatabaseConnector.java". 2023-08-23 16:55:23 +03:00
controllers - Improve error-handling in "S3ObjectStore.emptyBucket()". 2024-03-11 16:17:32 +02:00
models - Add error-handling for the case when no payloads could be associated with a specific url which should have been in the hashMultiMap in "addUrlReportsByMatchingRecordsFromBacklog". 2024-03-11 19:48:04 +02:00
payloads - Process the WorkerReports in background Jobs and post the reportResults to the Workers. 2023-05-24 13:52:28 +03:00
security - Update the Spring-Security-code to use the "SecurityFilterChain", as the previous code was deprecated. 2022-06-27 21:41:32 +03:00
services Upgrade the algorithm for finding the previously-found fulltexts, based on their md5hash: 2024-03-13 11:28:37 +02:00
util Upgrade the algorithm for finding the previously-found fulltexts, based on their md5hash: 2024-03-13 11:28:37 +02:00
UrlsControllerApplication.java - Allow to easily change the port used by workers. 2023-12-19 23:31:42 +02:00
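The "models" entry mentions error-handling for a url that should have been in the hashMultiMap but has no associated payloads. A minimal sketch of that idea, assuming a plain `Map<String, List<String>>` in place of the real multimap and a hypothetical `matchReports` method (not the actual `addUrlReportsByMatchingRecordsFromBacklog`): a missing or empty entry is logged and skipped rather than failing the whole batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class BacklogMatchingSketch {

    // For each reported url, collect the payloads associated with it.
    // A url with no (or an empty) entry is logged, recorded in "unmatched" and skipped.
    static List<String> matchReports(List<String> reportedUrls,
                                     Map<String, List<String>> payloadsByUrl,
                                     List<String> unmatched) {
        List<String> matched = new ArrayList<>();
        for (String url : reportedUrls) {
            List<String> payloads = payloadsByUrl.getOrDefault(url, List.of());
            if (payloads.isEmpty()) {
                System.err.println("No payloads could be associated with url: " + url);
                unmatched.add(url);
                continue;
            }
            matched.addAll(payloads);
        }
        return matched;
    }

    public static void main(String[] args) {
        Map<String, List<String>> payloadsByUrl = Map.of("u1", List.of("p1", "p2"));
        List<String> unmatched = new ArrayList<>();
        System.out.println(matchReports(List.of("u1", "u2"), payloadsByUrl, unmatched)); // [p1, p2]
        System.out.println(unmatched);                                                   // [u2]
    }
}
```

If the real code uses a Guava `Multimap`, its `get()` already returns an empty collection for a missing key, so the emptiness check above is the relevant guard.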