Lampros Smyrnaios
0ca02f3587
Change the delay values of scheduledTasks to production ones.
2023-05-24 13:56:20 +03:00
Lampros Smyrnaios
bfa569685a
- Use the "POST" method for shutdown and cancelShutdown requests.
...
- Polish some messages.
2023-05-23 22:24:49 +03:00
Lampros Smyrnaios
9fdaa9503b
- Delete any left-over full-texts after 36 hours.
...
- Upon shutting down, post a "shutdownReport" to the Controller.
2023-05-23 22:22:57 +03:00
Lampros Smyrnaios
903032f454
- After a WorkerReport has been sent, ask for new assignments immediately. This way, the Worker does not have to wait for hours while the Controller checks for duplicate files in the DB, retrieves and uploads the full-texts, and inserts the records into the DB.
...
- Special care is taken to delete the delivered full-texts as soon as possible.
- Write the workerReport to a json-file, in case something goes wrong, and keep it until the Controller notifies the Worker that the processing was successful.
2023-05-23 22:19:41 +03:00
Lampros Smyrnaios
9cb43b3d94
- Improve startup speed by using a faster remote server to get the host machine's public IP. This also reduces the risk of not being able to get the public IP at all.
...
- Set the App to gracefully shut down the WebServer and wait up to 2 minutes.
- Increase the waiting time for the "PublicationsRetriever.executor" to shut down, to 2 minutes.
2023-05-23 20:17:58 +03:00
Lampros Smyrnaios
4d90846261
- In case the specified "controllerIP" is actually a domain-name, find its IP-address, so that a proper IP-to-IP comparison can be performed and the "securityChecks" can pass.
...
- Increase the "read-timeout" when searching for the host machine's public IP.
- Update dependencies.
- Code polishing.
2023-05-22 21:25:22 +03:00
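The domain-to-IP step described in the commit above could look roughly like the following. This is a minimal sketch, not the Worker's actual code: the class `IpResolver` and its methods are hypothetical names, and only the standard `java.net.InetAddress` API is assumed.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, for illustration only.
public class IpResolver {

    // Returns the textual IP address for either an IP literal or a domain name.
    // For an IP literal, getByName() only validates the format and does no DNS lookup.
    public static String toIpAddress(String ipOrDomain) throws UnknownHostException {
        return InetAddress.getByName(ipOrDomain).getHostAddress();
    }

    // Compares the configured controller address (IP or domain name) with the
    // address a request came from, after normalizing both to IP addresses.
    public static boolean isSameHost(String configuredController, String remoteAddress) {
        try {
            return toIpAddress(configuredController).equals(toIpAddress(remoteAddress));
        } catch (UnknownHostException e) {
            // An unresolvable name can never match, so the check fails closed.
            return false;
        }
    }
}
```

Note that `getByName()` returns a single address; a domain that resolves to several addresses would probably need `InetAddress.getAllByName()` and a match against any of them.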
Lampros Smyrnaios
bd0ead816d
Make the time-out value of "restTemplateForReport" scale with "maxAssignmentsLimitPerBatch".
2023-05-16 19:08:59 +03:00
Lampros Smyrnaios
93d1aa9588
- Fix a missing change.
...
- Add todo.
2023-05-15 13:41:53 +03:00
Lampros Smyrnaios
cc55354e73
Show the worker-id when the worker starts.
2023-05-15 13:22:55 +03:00
Lampros Smyrnaios
714938531b
- Add the time-zone in the logs.
...
- Code polishing.
2023-05-11 03:14:56 +03:00
Lampros Smyrnaios
29a54f0b30
Remove the "shutDownOrCancelCode" from security checks, since we have an IP whitelisting mechanism in place.
2023-05-03 15:15:46 +03:00
Lampros Smyrnaios
4eac7c5c66
Fix typo in property's name.
2023-04-29 18:10:35 +03:00
Lampros Smyrnaios
0ea7bccadb
Leave the Max-Heap-Size at 8 GB; we assume that enough swap space will be available on the host.
...
We can still override the max-heap-size if desired.
2023-04-29 17:55:03 +03:00
Lampros Smyrnaios
d5a997ad3d
Use restTemplates with different read timeouts depending on the operation. The assignments-request needs a shorter read timeout than the worker-report. This ensures that the connection does not hang for too long when the Controller crashes before sending the assignments.
2023-04-29 17:24:16 +03:00
Lampros Smyrnaios
53ab51922a
Allow shutdown requests from the Controller.
2023-04-28 23:46:39 +03:00
Lampros Smyrnaios
fcd80a8f3f
Add the "./createSwapStorage.sh" script.
2023-04-28 20:55:58 +03:00
Lampros Smyrnaios
7b7dd59b57
- Increase the "max_heap_size".
...
- Update a dependency.
- Update README.md
2023-04-28 19:37:12 +03:00
Lampros Smyrnaios
ec4d084972
Reduce memory usage.
2023-04-28 17:59:36 +03:00
Lampros Smyrnaios
0ba15dd31a
Increase the "requestReadTimeoutDuration" to 10 hours, as the number of full-texts to be transferred to the Controller keeps getting larger.
2023-04-26 15:08:46 +03:00
Lampros Smyrnaios
344bc46e08
Update Gradle.
2023-04-22 06:44:04 +03:00
Lampros Smyrnaios
0997558347
Update dependencies.
2023-04-20 15:39:15 +03:00
Lampros Smyrnaios
796e46bc99
Update dependencies.
2023-03-27 19:44:49 +03:00
Lampros Smyrnaios
839a797124
- Improve performance of full-texts transferring to the Controller, by preloading some bytes for faster response to the Controller's read requests.
...
- Optimize the directories-creation process by eliminating the additional check for existence, as that check already takes place inside the "mkdirs()" method.
- Remove the obsolete code which switched to a different base-dir when the specific assignments' subdirectory failed to be created. Since the user-defined baseDir has already been successfully created upon initialization, any problem with creating subdirectories inside that base-directory will most likely persist even after changing the base directory. Additionally, even if the subdirectory was created successfully under the changed base-directory, the "FullTextsController.getFullTexts()" method would not use it, resulting in errors.
- Code polishing.
2023-03-08 13:12:17 +02:00
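A short illustration of the "mkdirs()" point in the commit above. This is a sketch under assumptions, not the project's code: `DirCreator` and `ensureDirectory` are hypothetical names. Per the `java.io.File` Javadoc, `mkdirs()` returns `true` only when it actually created the directory, so no `exists()` call is needed beforehand; a `false` result just has to be told apart from a real failure.

```java
import java.io.File;

// Hypothetical helper, for illustration only.
public class DirCreator {

    // Ensures the directory exists without a separate exists() pre-check.
    // mkdirs() returns false both when the directory is already there and when
    // creation failed, so isDirectory() is consulted only on that path.
    public static boolean ensureDirectory(File dir) {
        return dir.mkdirs() || dir.isDirectory();
    }
}
```

A caller would treat a `false` return as a genuine failure (for example missing permissions or a regular file occupying the path) and log it.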
Lampros Smyrnaios
4da54e7a7d
- Show a warning, in case the number of archived files is different from the number of requested files.
...
- Code polishing.
- Update Gradle.
2023-03-07 16:25:10 +02:00
Lampros Smyrnaios
ec09ecc7ff
- Refactor and Spring-ify the File-storage initialization process.
...
- Fix the problematic file-storage-path (it could not be used when the Controller was requesting the full-texts), which was produced when the user-defined path could not be created.
2023-03-07 16:21:32 +02:00
Lampros Smyrnaios
ba989484e4
Improve performance when archiving and compressing the full-texts.
2023-03-02 17:47:58 +02:00
Lampros Smyrnaios
ff4fd3d289
- Show the elapsed time for each assignments-request to be processed by the Worker.
...
- Update dependencies.
2023-03-02 17:34:44 +02:00
Lampros Smyrnaios
66d3f7bcb2
- Show a warning, in case the number of results is different from the number of the assignments (due to missing / double logging).
...
- Update Spring.
2023-02-24 23:27:02 +02:00
Lampros Smyrnaios
81b61b530f
Drastically improve performance by applying a pre-processing algorithm to the assignments-list, which opens some "space" between assignments that have the same domain; this, in turn, causes the threads to block less during execution.
...
(The threads block, due to the mandatory "politeness-delay" before reconnecting with the same domain, in order to avoid overloading the remote servers.)
2023-02-24 23:23:37 +02:00
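The commit above does not spell out the pre-processing algorithm, so the following is only one plausible way to open "space" between same-domain assignments: group the URLs by host and interleave the groups round-robin. The class `DomainSpacer` and its methods are hypothetical names, and the real implementation may work differently.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only.
public class DomainSpacer {

    // Extracts the host of a URL; falls back to the raw string if it cannot be parsed.
    public static String hostOf(String url) {
        try {
            String host = URI.create(url).getHost();
            return (host != null) ? host : url;
        } catch (IllegalArgumentException e) {
            return url;
        }
    }

    // Reorders the URLs so that consecutive entries come from different hosts
    // whenever possible: one URL is taken from each host's queue per round.
    public static List<String> spreadByDomain(List<String> urls) {
        // LinkedHashMap keeps the hosts in first-seen order, so the output is deterministic.
        Map<String, Deque<String>> byHost = new LinkedHashMap<>();
        for (String url : urls) {
            byHost.computeIfAbsent(hostOf(url), h -> new ArrayDeque<>()).add(url);
        }

        List<String> result = new ArrayList<>(urls.size());
        while (result.size() < urls.size()) {
            for (Deque<String> queue : byHost.values()) {
                String next = queue.poll();
                if (next != null) {
                    result.add(next);
                }
            }
        }
        return result;
    }
}
```

With such an ordering, a thread that picks the next assignment is less likely to hit a domain that another thread has just contacted, so less time is spent waiting on the politeness-delay. When one domain dominates the list, its remaining URLs still end up adjacent at the tail.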
Lampros Smyrnaios
84a37bd4b7
- Handle the case where one instance of a urlReport record (having the same id and sourceUrl) fails to give a docUrl due to an error, even though another instance gives the docUrl and the docFile. Without this handling, a record-instance could be assigned a "fileLocation" which was actually an error-message (comment); the real "fileLocation" would then never be assigned, and the payload would be lost.
...
- Improve exceptions-handling.
2023-02-21 15:22:49 +02:00
Lampros Smyrnaios
9888349bef
Update Gradle.
2023-02-20 19:14:16 +02:00
Lampros Smyrnaios
0dd2b6c46f
Rename "getFullTextsImproved"-endpoint to simply "getFullTexts", now that this is stable.
2023-02-16 14:23:47 +02:00
Lampros Smyrnaios
0626e85894
Update dependencies.
2023-02-15 16:18:33 +02:00
Lampros Smyrnaios
13f56d16c0
Inform the user, if a previous "shutdownWorker"-request has been given, in "GeneralController.shutdownWorkerGracefully()"-endpoint.
2023-02-01 16:41:40 +02:00
Lampros Smyrnaios
b98ea92dec
Update/improve documentation.
2023-01-27 14:27:57 +02:00
Lampros Smyrnaios
24b52fba63
- Refactor the initialization and configuration process and Spring-ify the project.
...
- Update Spring dependency.
2023-01-25 18:33:49 +02:00
Lampros Smyrnaios
d6ff62d2ef
Update the "installAndRun.sh" script:
...
- Add the ability to build and run the app without re-installing the PublicationsRetriever library. This is useful when trying a non-published version of that library.
- Fix a wrong variable-name.
2023-01-20 01:59:26 +02:00
Lampros Smyrnaios
bd0d9eb36f
- Delete the transferred full-texts as soon as possible, in order to mitigate the "No space left on device"-error, which may appear, in case we have some very large files.
...
- Use the new "GenericUtils.clearBlockingData()" method from the "PublicationsRetriever" library.
- Remove the deprecated "getMultipleFullTexts"-endpoint, along with the Zip-related code.
2023-01-18 16:55:59 +02:00
Lampros Smyrnaios
7dd5719bff
- Update a method-call, to reflect the latest changes in the "PublicationsRetriever"-software.
...
- Add TODOs.
2023-01-17 18:25:49 +02:00
Lampros Smyrnaios
c283cb4365
Improve exception-handling in "AssignmentsHandler.postWorkerReport()".
2023-01-16 15:22:32 +02:00
Lampros Smyrnaios
d96d0c68cd
Make sure the "responseCode" is "200-OK", before trying to get the InputStream in "UriBuilder.getPublicIP()".
2023-01-11 16:02:31 +02:00
Lampros Smyrnaios
fd62ac567e
- Add a new endpoint "getFullTextsImproved" which uses Facebook's [**Zstandard**]( https://facebook.github.io/zstd/ ) compression algorithm, which greatly improves both compression ratio and speed.
...
- Remove some dependencies.
2023-01-09 15:48:30 +02:00
Lampros Smyrnaios
778dc6e25c
- Improve the stability of "UriBuilder.getPublicIP()", by using an "HttpURLConnection" to increase the connection and read timeouts and avoid timeout-exceptions.
...
- Show the number of assignments which are requested from the Controller, in the log-message.
- Update Spring.
2023-01-03 18:43:26 +02:00
Lampros Smyrnaios
378db2ff2f
- Add an existence-check for the "publications_retriever"-JAR, before trying to make a backup, inside "installAndRun.sh".
...
- Add a final logging message, right before the app shuts down.
2022-12-15 14:15:24 +02:00
Lampros Smyrnaios
8c1daadad0
- Increase the "requestReadTimeoutDuration" to 5 hours.
...
- Improve gradle's performance.
2022-12-12 17:49:14 +02:00
Lampros Smyrnaios
6c17e86c70
Code polishing.
2022-12-09 12:53:08 +02:00
Lampros Smyrnaios
d37cd738a0
Refactor the full-texts deletion process to reduce storage space and complexity:
...
- Delete the assignments-batch full-texts after the whole procedure (for each assignments-batch) is finished, either successfully or not.
- Do not check for remaining files when the Worker shuts down, since the files are deleted anyway in case of problematic handling.
The full-texts do not need to be kept in case of an error, since the Controller will reassign the non-downloaded id-url records to some worker (maybe a different one) and these files will be downloaded again and handled there.
Also, change the "assignmentsNumsHandled" to hold data only for assignments which are handled all the way, including the upload of the full-texts from the Controller and also the insertion of the WorkerReport to the database.
2022-12-07 12:29:05 +02:00
Lampros Smyrnaios
326af0f12d
- Return a success-message in the response-body, of the "shutdownWorkerGracefully" and "cancelShutdownWorkerGracefully" endpoints.
...
- Apply the checks for the "totalZipBatches" param, before the Worker-related checks, in "FullTextsController.getMultipleFullTexts()"
- Show the Heap-sizes in megabytes.
2022-12-05 21:58:16 +02:00
Lampros Smyrnaios
5f48f72f06
- Add handling for the case, when the Controller could not retrieve any assignments from the database (without an error).
...
- Improve exception handling.
- Remove obsolete code.
2022-12-05 16:47:15 +02:00
Lampros Smyrnaios
182d6153d4
- Set some optimization settings for gradle.
...
- Fix error-handling in "installAndRun.sh".
- Update dependencies.
2022-11-30 16:25:57 +02:00