Commit Graph

25 Commits

Author SHA1 Message Date
Lampros Smyrnaios bd0d9eb36f - Delete the transferred full-texts as soon as possible, in order to mitigate the "No space left on device"-error, which may appear, in case we have some very large files.
- Use the new "GenericUtils.clearBlockingData()" method from the "PublicationsRetriever" library.
- Remove the deprecated "getMultipleFullTexts"-endpoint, along with the Zip-related code.
2023-01-18 16:55:59 +02:00
Lampros Smyrnaios 7dd5719bff - Update a method-call, to reflect the latest changes in the "PublicationsRetriever"-software.
- Add TODOs.
2023-01-17 18:25:49 +02:00
Lampros Smyrnaios c283cb4365 Improve exception-handling in "AssignmentsHandler.postWorkerReport()". 2023-01-16 15:22:32 +02:00
Lampros Smyrnaios 778dc6e25c - Improve the stability of "UriBuilder.getPublicIP()", by using a "HttpURLConnection" to increase the connection and read timeouts and avoid timeout-exceptions.
- Show the number of assignments which are requested from the Controller, in the log-message.
- Update Spring.
2023-01-03 18:43:26 +02:00
Lampros Smyrnaios 8c1daadad0 - Increase the "requestReadTimeoutDuration" to 5 hours.
- Improve gradle's performance.
2022-12-12 17:49:14 +02:00
Lampros Smyrnaios d37cd738a0 Refactor the full-texts deletion process to reduce storage space and complexity:
- Delete the assignments-batch full-texts after the whole procedure (for each assignments-batch) is finished, either successfully or not.
- Do not check for remaining files when the Worker shuts down, since, in case of problematic handling, the files are deleted anyway.

The full-texts do not need to be kept in case of an error, since the Controller will reassign the non-downloaded id-url records to some worker (maybe a different one) and these files will be downloaded again and handled there.

Also, change the "assignmentsNumsHandled" to hold data only for assignments which are handled all the way, including the upload of the full-texts from the Controller and also the insertion of the WorkerReport to the database.
2022-12-07 12:29:05 +02:00
Lampros Smyrnaios 5f48f72f06 - Add handling for the case, when the Controller could not retrieve any assignments from the database (without an error).
- Improve exception handling.
- Remove obsolete code.
2022-12-05 16:47:15 +02:00
Lampros Smyrnaios 4b85b092fe Handle the new "HttpStatus.MULTI_STATUS"-response from the Controller, inside "AssignmentsHandler.postWorkerReport()". 2022-09-28 22:41:43 +03:00
Lampros Smyrnaios b051e10fd3 - Fix a bug that caused the domainAndPath-tracking data to be deleted after every batch, once the initial threshold was reached. Now the thresholds increase along with the processed id-urls, so that the data is cleared, e.g., every 300_000 processed id-urls, as intended.
- Use different thresholds for clearing just the "domainAndPath"-blocking-data and all-tracking-data.
2022-09-28 19:10:01 +03:00
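
The threshold fix described in commit b051e10fd3 can be illustrated with a minimal sketch. It is not the Worker's actual code: the class name, the `CLEAR_INTERVAL` constant and the `onBatchFinished()` method are hypothetical, and only the 300_000 figure comes from the commit message. The idea shown is that the threshold moves forward each time it is reached, so a clear happens once per interval instead of after every later batch.

```java
import java.util.ArrayList;
import java.util.List;

public class ClearingThresholdSketch {

    // Hypothetical interval, taken from the "every 300_000 processed id-urls" example.
    public static final long CLEAR_INTERVAL = 300_000;

    private long nextThreshold = CLEAR_INTERVAL;
    private long processedIdUrls = 0;

    /** Returns true when the tracking data should be cleared after this batch. */
    public boolean onBatchFinished(int batchSize) {
        processedIdUrls += batchSize;
        if (processedIdUrls < nextThreshold)
            return false;
        // Move the threshold past the current count. Keeping it fixed would
        // trigger a clear after every following batch (the bug described above).
        while (nextThreshold <= processedIdUrls)
            nextThreshold += CLEAR_INTERVAL;
        return true;
    }

    public static void main(String[] args) {
        ClearingThresholdSketch sketch = new ClearingThresholdSketch();
        List<Integer> batchesWithClear = new ArrayList<>();
        for (int batch = 1; batch <= 10; batch++) {
            if (sketch.onBatchFinished(100_000))
                batchesWithClear.add(batch);
        }
        System.out.println(batchesWithClear); // prints [3, 6, 9]
    }
}
```

With batches of 100_000 id-urls, the data is cleared after batches 3, 6 and 9 only.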
Lampros Smyrnaios 25070d7aba - Lower the thresholds for how often to clear the data-structures.
- Clear the "ConnSupportUtils.domainsWithConnectionData" data-structure, after each batch.
- Move the code for handling the "CookieStore" inside the "PublicationsRetrieverPlugin", as it is more related to that.
2022-07-04 18:42:05 +03:00
Lampros Smyrnaios d91732bc16 - Delete the cookies in the newly-supported CookieManager, after each batch.
- Update the Spring-Security-code to use the "SecurityFilterChain", as the previous code was deprecated.
- Update dependencies.
- Code cleanup.
2022-06-27 17:58:02 +03:00
Lampros Smyrnaios 26cbb83b51 - Add the "shutdownWorker"-endpoint to accept requests for shutting-down the Worker, gracefully, after it completes its current work (including sending the publications-files to the Controller). A user-defined "auth-code" is required.
- Add the "cancelShutdownWorker"-endpoint to cancel a previous "shutdownWorker"-request. A user-defined "auth-code" is required.
2022-06-22 18:53:27 +03:00
Lampros Smyrnaios 377b98d677 Increase the "requestReadTimeoutDuration" from 1 hour to 3. This way, each worker will handle saturation without aborting the connection, when multiple workers are waiting for the "databaseLock" in the Controller. 2022-02-22 13:29:02 +02:00
Lampros Smyrnaios edbf6461d5 - Refactor the scheduling of the "handleNewAssignments()" task. Spring already waits for the last task to get finished, before running the new one (unless Async is specifically enabled), so the "isAvailableForWork" didn't do anything (thus the bug described in a previous commit was never going to appear). Also, now we set to request the new assignments-batch immediately after the last one is finished (not after 15 mins), while dealing with potential continuous connection-errors.
- Avoid running the "deleteHandledAssignmentsFullTexts()" scheduled task on application's start.
- Optimize assignment of "requestUrl".
- Add clarity in the scheduled tasks, by using "fixedDelay" instead of "fixedRate", to signify that the time specified is counted right from the time the last task is finished (even though without enabling the "Async" there is no "danger" of running them in parallel).
- Code cleanup.
2022-02-21 12:48:21 +02:00
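
The difference between "fixedRate" and "fixedDelay" mentioned in commit edbf6461d5 can be sketched as a small deterministic simulation. It assumes a single-threaded scheduler in which tasks never overlap, as the commit message describes. The class, the method names and the sample durations are hypothetical, and no Spring code is involved.

```java
import java.util.Arrays;

public class SchedulingSketch {

    /** Start times when each run is planned at multiples of the period, but never overlaps the previous run. */
    public static long[] fixedRateStarts(long period, long[] durations) {
        long[] starts = new long[durations.length];
        for (int i = 1; i < durations.length; i++) {
            long previousEnd = starts[i - 1] + durations[i - 1];
            starts[i] = Math.max(i * period, previousEnd);
        }
        return starts;
    }

    /** Start times when the delay is counted from the moment the previous run finished. */
    public static long[] fixedDelayStarts(long delay, long[] durations) {
        long[] starts = new long[durations.length];
        for (int i = 1; i < durations.length; i++)
            starts[i] = starts[i - 1] + durations[i - 1] + delay;
        return starts;
    }

    public static void main(String[] args) {
        long[] durations = {5, 20, 5}; // hypothetical task durations, in arbitrary time units
        System.out.println(Arrays.toString(fixedRateStarts(10, durations)));  // prints [0, 10, 30]
        System.out.println(Arrays.toString(fixedDelayStarts(10, durations))); // prints [0, 15, 45]
    }
}
```

In the fixed-rate case, the third run starts immediately after the long second run ends. In the fixed-delay case, a full delay always passes between the end of one run and the start of the next.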
Lampros Smyrnaios 0d2f0b8b01 Code cleanup. 2022-02-19 17:21:51 +02:00
Lampros Smyrnaios b63ad87d00 Bug fixes:
- Fix a bug, where, in case it took too long to get the assignments from the Controller (possible when there are too many workers requesting at the same time or if the database is responding slowly), the Worker's scheduler would request new assignments in the meantime.
- Fix a bug, where, if the "maxAssignmentsBatchesToHandleBeforeRestart" was set, the Worker's scheduler could request another batch, right before the Worker was about to shut down.
- Fix a bug, where the condition of when to clear the over-sized data-structures was based on the "assignmentRequestCounter" sent by the Controller (which is increased on each request by any worker and not for each individual one), and not on the "numHandledAssignmentsBatches" kept by each individual worker. This would result in much earlier cleanup, relative to the number of the Workers.
2022-02-19 17:09:02 +02:00
Lampros Smyrnaios 73552ce079 - Handle the latest download-errors provided by the "PublicationsRetriever" program.
- Update the "test" requestUrl.
2022-02-07 14:40:33 +02:00
Lampros Smyrnaios 8912bb1cf9 Fix adding an invalid error-message in case of an "alreadyDownloaded" full-text being discovered inside the "FileUtils.dataToBeLoggedList". 2022-01-17 23:46:15 +02:00
Lampros Smyrnaios 8abb260d60 - In case of an unknown (non-documented) exception inside "LoaderAndChecker.invokeAllTasksAndWait", now it will be logged and the app will gently shut down with an error-message in the Error-stream.
- Avoid double-checking for handledAssignments -in order to delete their full-texts- when the app is about to shut down, in case the "maxAssignmentsBatchesToHandleBeforeRestart" is set above Zero.
2022-01-04 00:23:45 +02:00
Lampros Smyrnaios 92d011e8a0 - Make sure the handled assignments - full-texts are deleted before the application exits.
- When the user sets the "maxAssignmentsBatchesToHandleBeforeRestart" above zero, shutdown immediately after the last assignments-batch. Do not wait for the next scheduled check.
- Allow the user to set the "maxAssignmentsBatchesToHandleBeforeRestart" in the "installAndRun.sh" script.
- Increase the "fixedRate" for the "ScheduledTasks.deleteHandledAssignmentsFullTexts()" method to 12 hours.
- Update README.md
2021-12-31 04:09:05 +02:00
Lampros Smyrnaios 1ddfd34236 - Allow the user to set a maximum number of assignments-batches for the Worker to handle. After handling those batches, the Worker will shut down. A number of < 0 > indicates an infinite number of batches.
- Avoid converting the zero fileSize to < null >. Now, the default value is < null >, so the zero-value will indicate a zero-byte file.
- Update dependencies.
- Code cleanup.
2021-12-24 00:12:34 +02:00
Lampros Smyrnaios c46c8c448a - Upgrade the zip-file delivery by using the "InputStreamResource". This way it is more reliable, has better performance and uses less memory.
- Use the "InputStreamResource" also in "get(single)FullText"-endpoint, in order to avoid loading a big full-text file in memory.
- Decrease the system-reserved memory by 128 MB.
- Fix path-variable regexes for "getFullText"-endpoint.
- Optimize imports.
- Code cleanup.
2021-12-17 08:25:54 +02:00
Lampros Smyrnaios fd5b56e3c6 - Allow the user to set the "maxAssignmentsLimitPerBatch" value.
- Set increased lower and upper limits for the Java Heap Size.
- Update the "ServerBaseURL" to the Public IP Address of the machine which is running the app.
- Improve two log-messages.
2021-12-07 00:52:40 +02:00
Lampros Smyrnaios 018326eedd - Optimize the "FileZipper.zipMultipleFilesAndGetZip()" and "FileZipper.zipAFile()" methods.
- Improve the "getMultipleFullTexts"-endpoint. Check if the "fileNamesWithExtensions"-list is empty. Check if the baseDir for the fullTexts of a given assignments-counter is missing.
- Optimize the "PublicationsRetrieverPlugin.processAssignments()" method.
- Set a max-size limit to the amount of space the logs can use. Over that size, the older logs will be deleted.
- Show the heap size, in the beginning.
- Update Gradle.
- Code cleanup.
2021-12-03 04:09:40 +02:00
Lampros Smyrnaios 045788c728 - Use the "Timestamp" data-type instead of the "Date", in order to include more information.
- Code cleanup.
2021-11-27 02:37:33 +02:00